WORK-IN-PROGRESS: - this material is still under development
Generate code with an explicit simulacrum of the the semantic model of the DSL, so that the generated code has generic-specific separation.
When you generate code, you embed within that code the semantics of the DSL script. By using a Model-Aware Generation you replicate some form of the Semantic Model in the generated code in order to preserve the separation of generic and specific code within the generated code
The most important aspect of Model-Aware Generation is that it preserves the principle of generic-specific separation. The actual form that the model takes in the generated code is much less important, which is why I like to say that the generated code contains a simulacrum of the Semantic Model.
It's a simulacrum model for many reasons. Usually you are generating code because of limitations in the target environment - these limitation often make it harder to express a Semantic Model than you would like. As a result lots of compromises will need to be made that makes the Semantic Model less effective as a statement of the intent of the system. However it's important to realize that this isn't such a big deal, as long as you keep the generic-specific separation.
Since the simulacrum model is a self-standing version of the Semantic Model, you can, and should, build and test the model without using any code generation. Ensure the model has a simple API to populate it. The code generation will then generate configuration code that calls this API. You can then test the simulacrum model using testing scripts that use this same API. This allows you to build, test, and refine the core behavior of the target environment with running the code-generation process. You can do this with a relatively simple test population of the model, which should be easier to understand and debug.
Using a Model-Aware Generation has many advantages compared to using Model Ignorant Generation. Being able to test and build the simulacrum model without generation makes it easier to build and test because you aren't having to rerun and comprehend code generation while you are working on the simulacrum model. Since the generated code is now made up of API calls on the simulacrum model, that code is much easier to generate, which makes the generator much simpler to build and maintain.
The main reason to not use Model-Aware Generation is due to limitations in the target environment. Either it's too hard to express even a simulacrum model, or there are performance problems with having a simulacrum model at run-time.
In many cases, you are using DSLs as a front end to an existing model. In this case, if you are generating code to work with model, then you are using Model-Aware Generation.
For an example of Model-Aware Generation I'll turn again to the secret panel state machine that I started this book with. In this case I'm imagining a situation where we've run out of java-enabled toasters to run our security system and our new batch are programmed only in C. As a result we want to generate the C from the existing java semantic model.
In this write-up I won't talk about how we actually generate the code, for that take a look at the example in Transformer Generation. In this section I'll just concentrate on what the final code, both generated and hand-written, might look like with a Model-Aware Generation.
There are many ways you can implement a model like this in C. Essentially I'm doing it as a data structure with routines that will navigate over this data structure in order to produce the behavior we need. Each physical controller only controls a single device, so we can store the data structure as static data. I shall also avoid doing heap allocations and allocate all the memory I need from the beginning.
I've built the data structure as a set of nested records and arrays. At the top of the structure is a controller
typedef struct {
stateMachine *machine;
int currentState;
} Controller;
You'll notice that I represent the current state as an integer. As you'll see I use integer references in the simulacrum model to represent all the various links between different parts of the model.
The state machine has arrays for states, events, and commands.
typedef struct {
State states[NUM_STATES];
Event events[NUM_EVENTS];
Command commands[NUM_COMMANDS];
} stateMachine;
The sizes of the various arrays are set through macro defines.
#define NUM_STATES 50 #define NUM_EVENTS 50 #define NUM_TRANSITIONS 30 #define NUM_COMMANDS 30 #define NUM_ACTIONS 10 #define COMMAND_HISTORY_SIZE 50 #define NAME_LENGTH 30 #define CODE_LENGTH 4 #define EMPTY -1
Events and commands have their name and code
typedef struct {
char name[NAME_LENGTH + 1];
char code[CODE_LENGTH + 1];
} Event;
typedef struct {
char name[NAME_LENGTH + 1];
char code[CODE_LENGTH + 1];
} Command;
The state struct hold actions and transitions. Actions are just the integers corresponding to the command while transitions are just pairs of integers for the trigger event and target state.
typedef struct {
int event;
int target;
} Transition;
typedef struct {
char name[NAME_LENGTH + 1];
Transition transitions[NUM_TRANSITIONS];
int actions[NUM_COMMANDS];
} State;
Many C programmers would prefer to use pointer arithmetic rather than array indices to navigate through the array structures, but I thought I'd avoid inflicting pointer arithmetic on my non-C readers (not to mention myself, as my C was never that good even before it got rusty.)
To finish off the data structure, I declare the state machine and the controller as static variables, which means there are only one of them.
static stateMachine machine; static Controller controller;
All of these data definitions are done within a single .c file, this way I can encapsulate the data structure behind a bunch of externally declared functions. The specific code only knows about these functions and is, rightly, ignorant about the data structure itself. In this case ignorance is truly bliss.
When I initialize the state machine, I put zero bytes into the first character of string record, effectively making them blank.
void init_machine() {
int i;
for (i = 0; i < NUM_STATES; i++) {
machine.states[i].name[0] = '\0';
int t;
for (t = 0; t < NUM_TRANSITIONS; t++) {
machine.states[i].transitions[t].event = EMPTY;
machine.states[i].transitions[t].target = EMPTY;
}
int c;
for (c = 0; c < NUM_ACTIONS; c++)
machine.states[i].actions[c] = EMPTY;
}
for (i=0; i < NUM_EVENTS; i++) {
machine.events[i].name[0] = '\0';
machine.events[i].code[0] = '\0';
}
for (i=0; i < NUM_COMMANDS; i++) {
machine.commands[i].name[0] = '\0';
machine.commands[i].code[0] = '\0';
}
}
To declare a new event, I look for the first blank event and insert the data there.
void declare_event(char *name, char *code) {
assert_error(is_empty_event_slot(NUM_EVENTS - 1), "event list is full");
int i;
for (i = 0; i < NUM_EVENTS; i++) {
if (is_empty_event_slot(i)) {
strncpy(machine.events[i].name, name, NAME_LENGTH);
strncpy(machine.events[i].code, code, CODE_LENGTH);
break;
}
}
}
int is_empty_event_slot(int index) {
return ('\0' == machine.events[index].name[0]);
}
assert_error is a macro that checks the condition and
if it's false calls an error function with the message.
#define assert_error(test, message) \
do { if (!(test)) sm_error(#message); } while (0)
Commands are declared in the same way, so I'll skip that code.
States are declared through a number of functions. The first one just declares the name of the state.
void declare_state(char *name) {
assert(is_empty_state_slot(NUM_STATES - 1));
int i;
for (i = 0; i < NUM_STATES; i++) {
if (is_empty_state_slot(i)) {
strncpy(machine.states[i].name, name, NAME_LENGTH);
break;
}
}
}
int is_empty_state_slot(int index) {
return ('\0' == machine.states[index].name[0]);
}
Declaring the actions and transitions are bit more complicated as we have to look up the id of the action based on the name. Here's the actions.
void declare_action(char *stateName, char *commandName) {
int state = stateID(stateName);
assert_error(state >= 0, "unrecognized state");
int command = commandID(commandName);
assert_error(command >= 0, "unrecognized command");
assert_error(EMPTY == machine.states[state].actions[NUM_ACTIONS -1],
"too many actions on state");
int i;
for (i = 0; i < NUM_ACTIONS; i++) {
if (EMPTY == machine.states[state].actions[i]) {
machine.states[state].actions[i] = command;
break;
}
}
}
int stateID(char *stateName) {
int i;
for (i = 0; i < NUM_STATES; i++) {
if (is_empty_state_slot(i)) return EMPTY;
if (0 == strcmp(stateName, machine.states[i].name))
return i;
}
return EMPTY;
}
int commandID(char *name) {
int i;
for (i = 0; i < NUM_COMMANDS; i++) {
if (is_empty_command_slot(i)) return EMPTY;
if (0 == strcmp(name, machine.commands[i].name))
return i;
}
return EMPTY;
}
The transitions are similar.
void declare_transition (char *sourceState, char *eventName,
char *targetState)
{
int source = stateID(sourceState);
assert_error(source >= 0, "unrecognized source state");
int target = stateID(targetState);
assert_error(target >= 0, "unrecognized target state");
int event = eventID_named(eventName);
assert_error(event >=0, "unrecognized event");
int i;
for (i = 0; i < NUM_TRANSITIONS; i++){
if (EMPTY == machine.states[source].transitions[i].event) {
machine.states[source].transitions[i].event = event;
machine.states[source].transitions[i].target = target;
break;
}
}
}
int eventID_named(char *name) {
int i;
for (i = 0; i < NUM_EVENTS; i++) {
if (is_empty_event_slot(i)) break;
if (0 == strcmp(name, machine.events[i].name))
return i;
}
return EMPTY;
}
I can now use these declaration functions to define a complete state machine - in this case the familiar one for Miss Grant.
#include "sm.h"
#include "sm-pop.h"
void build_machine() {
declare_event("doorClosed", "D1CL");
declare_event("drawOpened", "D2OP");
declare_event("lightOn", "L1ON");
declare_event("doorOpened", "D1OP");
declare_event("panelClosed", "PNCL");
declare_command("lockDoor", "D1LK");
declare_command("lockPanel", "PNLK");
declare_command("unlockPanel", "PNUL");
declare_command("unlockDoor", "D1UL");
declare_state("idle");
declare_state("active");
declare_state("waitingForDraw");
declare_state("unlockedPanel");
declare_state("waitingForLight");
/* body for idle state */
declare_action("idle", "unlockDoor");
declare_action("idle", "lockPanel");
declare_transition("idle", "doorClosed", "active");
/* body for active state */
declare_transition("active", "lightOn", "waitingForDraw");
declare_transition("active", "drawOpened", "waitingForLight");
/* body for waitingForDraw state */
declare_transition("waitingForDraw", "drawOpened", "unlockedPanel");
/* body for unlockedPanel state */
declare_action("unlockedPanel", "unlockPanel");
declare_action("unlockedPanel", "lockDoor");
declare_transition("unlockedPanel", "panelClosed", "idle");
/* body for waitingForLight state */
declare_transition("waitingForLight", "lightOn", "unlockedPanel");
/* reset event transitions */
declare_transition("idle", "doorOpened", "idle");
declare_transition("active", "doorOpened", "idle");
declare_transition("waitingForDraw", "doorOpened", "idle");
declare_transition("unlockedPanel", "doorOpened", "idle");
declare_transition("waitingForLight", "doorOpened", "idle");
}
This population code is the code that would be generated by a code-generator.
There's a couple of points about this model that are worth expanding on, but before I do I think I should just show the code that makes the state machine work. In this case this is the function that's called to handle an event with a given event code.
void handle_event(char *code) {
int event = eventID_with_code(code);
if (EMPTY == event) return; //ignore unknown events
int t = get_transition_target(controller.currentState, event);
if (EMPTY == t) return; //no transition in this state so shrug
controller.currentState = t;
int i;
for (i = 0; i < NUM_ACTIONS; i++) {
int action = machine.states[controller.currentState].actions[i];
if (EMPTY == action) break;
send_command(machine.commands[action].code);
}
}
int eventID_with_code(char *code) {
int i;
for (i = 0; i < NUM_EVENTS; i++) {
if (is_empty_event_slot(i)) break;
if (0 == strcmp(code, machine.events[i].code))
return i;
}
return EMPTY;
}
int get_transition_target(int state, int event) {
int i;
for (i = 0; i < NUM_TRANSITIONS; i++) {
if (EMPTY == machine.states[state].transitions[i].event) return EMPTY;
if (event == machine.states[state].transitions[i].event) {
return machine.states[state].transitions[i].target;
}
}
return EMPTY;
}
So that's the working state machine model. There are a few points to note about it. Firstly the data structure is somewhat primitive as it involves walking through an array to look up the various codes and names. In defining the state machine, this is probably no big deal, but in running the machine we might be better off replacing the linear search with a hash function. Since the state machine is well-encapsulated, this is easy to do so I'll leave it as an exercise for the reader. Changing such implementation details of the model doesn't affect the interface of the configuration functions that define new state machines. This is an important encapsulation.
The model does not include any notion of reset events. The various reset events that are defined through the DSL scripts and the java semantic model are just turned into extra transitions in the C state machine. This makes the running state machine simpler, and is an example of a typical trade off where I prefer simplicity of operation to clearly stating intent. For the true Semantic Model I prefer to keep as much intent as I can, but for a model in a generated target environment I value capturing intent a little less.
I could go further in simplifying the executing state machine by
removing all the names for events, commands, and states. These names
are only using while configuring the machine and aren't used at all
during the execution. So I could use some look-up tables that I discard
once the machine is fully defined. Indeed the declaration functions
could just use integers, something like
declare_action(1,2);. While this isn't anywhere near as
readable, you can argue that it matters less as this code is generated
anyway. In these situations I'm inclined to keep the names as I
prefer even generated code to be readable, but more importantly it
allows the state machine to produce more useful diagnostics when
things go wrong. I'd sacrifice this, however, If space was really
tight in the target environment.
Generating code in C in the above example means that to setup a new state machine we need to recompile. Using a Model-Aware Generation also allows us to build state machines at run-time, by driving the code generation through another file.
In this case I can express the behavior of a particular state machine though a text file such as this:
event doorClosed D1CL event drawOpened D2OP event lightOn L1ON event doorOpened D1OP event panelClosed PNCL command lockDoor D1LK command lockPanel PNLK command unlockPanel PNUL command unlockDoor D1UL state idle state active state waitingForDraw state unlockedPanel state waitingForLight transition idle doorClosed active action idle unlockDoor action idle lockPanel transition active lightOn waitingForDraw transition active drawOpened waitingForLight transition waitingForDraw drawOpened unlockedPanel transition unlockedPanel panelClosed idle action unlockedPanel unlockPanel action unlockedPanel lockDoor transition waitingForLight lightOn unlockedPanel transition idle doorOpened idle transition active doorOpened idle transition waitingForDraw doorOpened idle transition unlockedPanel doorOpened idle transition waitingForLight doorOpened idle
I can then easily interpret this code using Delimiter Directed Translation with the string processing functions built into the standard C library.
The overall function to build the machine works by just opening the file and interpreting each line as it goes.
void build_machine() {
FILE * input = fopen("machine.txt", "r");
char buffer[BUFFER_SIZE];
while (NULL != fgets(buffer, BUFFER_SIZE, input)) {
interpret(buffer);
}
}
The standard C function strtok allows me to break a string into
tokens separated by white-space. I can pull the first token and then
dispatch to a specific function to interpret that kind of line.
#define DELIMITERS " \t\n"
void interpret(char * line) {
char * keyword;
keyword = strtok(line, DELIMITERS);
if (NULL == keyword) return; // ignores blank lines
if ('#' == keyword[0]) return; // comment
if (0 == strcmp("event", keyword)) return interpret_event();
if (0 == strcmp("command", keyword)) return interpret_command();
if (0 == strcmp("state", keyword)) return interpret_state();
if (0 == strcmp("transition", keyword)) return interpret_transition();
if (0 == strcmp("action", keyword)) return interpret_action();
sm_error("Unknown keyword");
}
Each specific function pulls the necessary tokens and calls the static declare functions that I defined in the previous example. I'll just show events as all the others look pretty much the same.
void interpret_event() {
char *name = strtok(NULL, DELIMITERS);
char *code = strtok(NULL, DELIMITERS);
declare_event(name, code);
}
(Repeated calls to strtok with a NULL
first argument pull further tokens from the same string as the
previous call to strtok.)
I don't consider this textual format a DSL as I designed it based on making it easy to interpret, not for readability by humans. It's useful to have a certain amount of human readability - such as using the names of states, events and commands - as that helps in debugging. But human readability was a distant second place to ease of interpretation for this case.
The point of this example is to illustrate that code generation for a static target language does not mean you cannot use run-time interpretation. By using Model-Aware Generation I can compile just the generic state machine model together with a very simple interpreter. My code generator then just generates the text file to be interpreted. This allows me to use C for my controllers but not have to recompile to make a change in the state machine. By generating a file that's designed for ease of interpretation, I can minimize the cost of the interpreter. I could, of course, go a step further and put the full DSL processor in C - but this would raise the processing demands of the C system, and require more involved C programming. Depending on the particular situation that may be a viable option, and we would no longer be in the world of Model-Aware Generation.