Overview

CodeWorker is a scripting language distributed under the GNU Lesser General Public License and devoted to making many aspects of generative programming as easy and intuitive as possible. Generative programming is a software engineering approach for producing reusable, tailor-made, evolvable and reliable IT systems with a high level of automation.

The scripting language adapts its syntax to the subject it has to handle:
- an extended-BNF syntax (declarative part of the language) for recognizing the format of the specifications to parse,
- a procedural language for easily manipulating parse trees (the only structured type admitted by 'CodeWorker'), strings, files and directories,
- a JSP-like syntax (imperative part of the language), which facilitates the writing of template-based code generation.

Thanks to this syntax adaptation, the scripting language is able to easily:
- acquire any kind of specification of the IT system to produce (often XML, but not necessarily),
- generate source code in a classical way (as Rational ROSE does), managing protected areas of text that accept hand-typed code,
- expand a source file like the class-wizard of Visual C++ (generated text is inserted at specified markups),
- translate from one format to another (LaTeX to HTML, XSL to CodeWorker, ... no limit),
- transform a source file (to instrument a source file with profiling features, ...).

These tasks are executed in a straightforward process, with no binding to an external programming language and with no translation of requirements specification.

1 Building a parse tree

CodeWorker provides two methods for parsing: BNF-driven scripts, written in an extended-BNF syntax, and procedural scripts that read tokens explicitly.

While parsing files, CodeWorker feeds an appropriate data structure called a parse tree. A tree is a convenient structure to represent a hierarchical set of nodes, as in XML for instance. The parse tree is shared by the parsing task, which is in charge of populating it, and by the source code generation, which walks through it to generate text.

We suggest using the file extension ".cwp" for extended-BNF parse scripts.

2 A universal source code/text generation

Given a specification provided in any kind of format, CodeWorker will generate source code or text as required in template-based scripts.

The source code generation can use three modes: generation, expansion or translation.

We suggest using the file extension ".cwt" for template-based scripts.

3 About the manual

Efforts are focused on improving the reliability of this documentation, both in its examples and in the reference manual (but not of its English, I'm afraid!).

A formal representation describes all functions and procedures that CodeWorker provides, with their prototype, a short explanation, an example and the list of similar functions and procedures. This formal representation is used to generate the CodeWorker source code that handles the parsing, C++ mapping and execution of each function and procedure of the scripting language. Because it conforms to what CodeWorker expects in terms of function/procedure prototypes, the same representation is reused to generate the LaTeX part of the reference manual that describes each of them. Examples are executed while generating the documentation, to be sure they are correct and to report up-to-date output.

The chapter "Getting started" is partially generated too, and every script is guaranteed to run successfully and every example file to carry the latest annotations. To ensure that, scripts are executed while generating the documentation, and example/script files contain formatted comments just before the lines to annotate. When a file is included in the chapter, its content is numbered line by line and the notes are extracted. Notes are written just after the content and refer to the line they explain.

The documentation is written in LaTeX. The great advantage of LaTeX is that it offers powerful text processing and is easy to manipulate for source code generation (it is a text format rather than a binary one, and it accepts comments). Markups are inserted into the documentation at the points where generated text must be included. A markup is a special comment that CodeWorker recognizes. This is an illustration of what was called expansion mode above.

Getting started

This chapter is intended to help you discover the scripting language and how it may serve your software development process.

CodeWorker is delivered with:

Binaries are available in the "bin" directory.

The scripting language adapts its syntax to the nature of the tasks to handle:

Example:

CodeWorker saves time when implementing source code, provided that a detailed design is available. Let's start with a tiny modeling language that only understands object types and that we create just for this example:

      // file "GettingStarted/Tiny.tml":
      1 class A {
      2 }
      3
      4 class B : A {
      5 }
      6
      7 class C {
      8     B[] b
      9 }
      10
      11 class D {
      12     A a
      13     C[] c
      14 }

line 1: we declare the class A, without attributes,
line 4: we declare the class B, which inherits from A,
line 7: we declare the class C that encapsulates an array of B instances,
line 11: we declare the class D that encapsulates an association to an instance of class A and an array of C instances,

4 The parse tree

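The role of the parsing is to populate the parse tree. Let's suppose that, for each class, we need the following attributes:
- name: the name of the class,
- parent: a reference to the parent class, if any,
- listOfAttributes: the array of its encapsulated attributes.

The description of an encapsulated attribute will require:
- name: the name of the attribute,
- class: a reference to the class of the attribute,
- isArray: worth true if the attribute is an array.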

To discover the parse tree, we'll first populate it by hand. To do that, let's run CodeWorker in console mode:

CodeWorker -console

Type the following lines into the console, and be careful not to forget the final semicolon:

insert listOfClasses["A"].name = "A";
traceObject(project);

The insert keyword is used to create new branches in the parse tree. The root is named project, but it need not be specified, and a sub-node (or attribute) listOfClasses has been added. This sub-node is quite special: it contains an array of nodes that describe classes. Items are indexed by a string and are stored in their insertion order; so, the node in charge of describing the class A is accessed via listOfClasses["A"]. The string "A" is assigned to the attribute listOfClasses["A"].name.

The procedure traceObject(project) shows us the first-level content of the root: the attribute listOfClasses and all its entries (only "A" for the moment). Let's populate the tree with the description of the class B:

set listOfClasses["B"].name = "B";

The set keyword is used to assign a value to an existing branch of the parse tree. If this branch doesn't exist yet, a warning tells you that you have perhaps made a spelling mistake, to help you avoid inserting bad nodes by accident. The node is nevertheless inserted despite the warning. As the language isn't typed, this check avoids some trouble. Let's continue:

ref listOfClasses["B"].parent = listOfClasses["A"];
traceLine(listOfClasses["B"].parent.name);

The node listOfClasses["B"].parent refers to the node listOfClasses["A"], so listOfClasses["B"].parent.name designates the same node as listOfClasses["A"].name. Let's start filling in the tree for class C:

insert listOfClasses["C"].name = "C";
pushItem listOfClasses["C"].listOfAttributes;
local myAttribute;
ref myAttribute = listOfClasses["C"].listOfAttributes#back;

The pushItem command is another way to add a new node to an array; the item is indexed by the position of the node, starting at 0. The local keyword declares a variable on the stack. This variable is also a parse tree, but it is not attached to the main parse tree project. For convenience, this variable will refer to the last element of the attribute list: myAttribute is shorter to type than listOfClasses["C"].listOfAttributes#back. Notice that the last element of an array is accessed via '#back'. Let's complete the attribute b of class C:

insert myAttribute.name = "b";
ref myAttribute.class = listOfClasses["B"];
insert myAttribute.isArray = true;

The keyword true is a predefined constant string that is worth "true". The keyword false also exists and is worth an empty string.
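The console session above can be mirrored by a small Python model, which may help to picture the data structure. This is only a sketch of the concepts, not CodeWorker's implementation: the class Node and all its method names are hypothetical, and creating array entries on first access is a simplifying assumption.

```python
import warnings


class Node:
    """Toy model of a parse-tree node: a string value, named attributes, an ordered array."""

    def __init__(self):
        self.value = ""    # every node can hold a string value
        self.attrs = {}    # named sub-nodes, e.g. "name" or "parent"
        self.items = {}    # array entries; Python dicts keep insertion order

    def attr(self, name, create=False):
        """Return the sub-node called `name`, creating it when `create` is true."""
        if name not in self.attrs:
            if not create:
                raise KeyError(name)
            self.attrs[name] = Node()
        return self.attrs[name]

    def insert(self, name, value):
        """Like 'insert': create the branch if needed, then assign the value."""
        self.attr(name, create=True).value = value

    def set(self, name, value):
        """Like 'set': assign, but warn when the branch did not exist yet."""
        if name not in self.attrs:
            warnings.warn(f"attribute '{name}' did not exist; inserting it anyway")
        self.insert(name, value)

    def ref(self, name, target):
        """Like 'ref': make the attribute point to an existing node."""
        self.attrs[name] = target

    def item(self, key):
        """Like array["key"]; assumption: the entry is created on first access."""
        return self.items.setdefault(key, Node())

    def push_item(self):
        """Like 'pushItem': append an entry keyed by its position, starting at 0."""
        return self.item(str(len(self.items)))

    def back(self):
        """Like '#back': the last entry of the array."""
        return next(reversed(self.items.values()))


project = Node()
classes = project.attr("listOfClasses", create=True)
classes.item("A").insert("name", "A")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    classes.item("B").set("name", "B")       # warns, but the node is still added
classes.item("B").ref("parent", classes.item("A"))

classes.item("C").insert("name", "C")
attributes = classes.item("C").attr("listOfAttributes", create=True)
my_attribute = attributes.push_item()
my_attribute.insert("name", "b")
my_attribute.ref("class", classes.item("B"))
my_attribute.insert("isArray", "true")       # true is the string "true"

print(classes.item("B").attr("parent").attr("name").value)
print(list(classes.items))
```

In this model a reference is simply a second name for the same Python object, which is why reading the name through parent gives the name of class A.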

Exercise:

Populate the parse tree with the description of class D.

5 Scanning our design with a BNF-driven script

Now, we'll describe the format of our tiny modeling language with a BNF grammar (see paragraph BNF syntax for more about it), as recognized by CodeWorker:

      // file "GettingStarted/Tiny-BNF.cwp":
      1 TinyBNF ::=
      2     #ignore(JAVA)
      3     [classDeclaration]*
      4     #empty
      5     => { traceLine("this file is valid"); };
      6 classDeclaration ::=
      7     IDENT:"class"
      8     IDENT
      9     [':' IDENT ]?
      10     classBody;
      11 classBody ::= '{' [attributeDeclaration]* '}';
      12 attributeDeclaration ::= IDENT ['[' ']']? IDENT;
      13 IDENT ::= #!ignore ['a'..'z'|'A'..'Z']+;

line 1: the clause TinyBNF is in charge of reading our design,
line 2: blanks and comments are allowed between tokens, conforming to the JAVA syntax ('/*' '*/' and '//'),
line 3: the clause classDeclaration is repeated as long as class declarations are encountered in the design,
line 4: once no class declaration remains, the end of file must have been reached,
line 5: the '=>' operator allows executing instructions of the scripting language in the BNF-driven script; this one is interpreted once the file has been matched successfully,
line 6: the clause classDeclaration is in charge of reading a class,
line 7: the clause IDENT reads an identifier, and the matched sequence must be worth "class",
line 8: the name of the class is expected here,
line 9: the declaration of the parent is optional and is announced by a colon,
line 11: the clause classBody reads attributes as long as attributeDeclaration matches,
line 12: the clause attributeDeclaration expects a class identifier, optionally the symbol of an array, and the name of the attribute,
line 13: the clause IDENT reads an identifier, composed of one or more letters, which cannot be separated by blanks or comments (required by the directive #!ignore),

This BNF-driven script only scans the design; it doesn't parse the data. Type the following lines into the console to scan the design "Tiny.tml":


parseAsBNF("Scripts/Tutorial/GettingStarted/Tiny-BNF.cwp", project, 
        "Scripts/Tutorial/GettingStarted/Tiny.tml");

Output:

this file is valid

But this script isn't sufficient to complete the parse tree.
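To make the idea of scanning without parsing concrete, here is a rough Python equivalent of the grammar "Tiny-BNF.cwp": a hand-written recognizer that answers only yes or no and builds no tree. It is a sketch under stated assumptions, not a description of how CodeWorker executes BNF scripts; the names tokenize and scan are hypothetical, and comments are skipped up front by a tokenizer instead of between clauses.

```python
import re

# Identifiers are made of letters only, as in the IDENT clause; plus the punctuation used.
TOKEN = re.compile(r"[A-Za-z]+|[{}:\[\]]")
# Blanks and Java-style comments, which #ignore(JAVA) allows between tokens.
IGNORED = re.compile(r"(?:\s+|//[^\n]*|/\*.*?\*/)*", re.DOTALL)


def tokenize(text):
    """Split the design into tokens, dropping blanks and comments."""
    tokens, pos = [], 0
    while True:
        pos = IGNORED.match(text, pos).end()
        if pos == len(text):
            return tokens
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(match.group())
        pos = match.end()


def scan(tokens):
    """Return True when the tokens match the TinyBNF clause, False otherwise."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def accept(expected):
        nonlocal pos
        if peek() == expected:
            pos += 1
            return True
        return False

    def ident():
        nonlocal pos
        if peek() is not None and peek().isalpha():
            pos += 1
            return True
        return False

    def attribute_declaration():       # IDENT ['[' ']']? IDENT
        if not ident():
            return False
        if accept("[") and not accept("]"):
            return False
        return ident()

    def class_declaration():           # IDENT:"class" IDENT [':' IDENT]? classBody
        if not accept("class") or not ident():
            return False
        if accept(":") and not ident():
            return False
        if not accept("{"):
            return False
        while peek() is not None and peek() != "}":
            if not attribute_declaration():
                return False
        return accept("}")

    while peek() is not None:          # [classDeclaration]* followed by #empty
        if not class_declaration():
            return False
    return True


TINY = """
class A {
}

class B : A {
}

class C {
    B[] b
}

class D {
    A a
    C[] c
}
"""

if scan(tokenize(TINY)):
    print("this file is valid")
```

Each nested function plays the role of one clause, which is the main point of the comparison: the BNF script states the same structure declaratively in thirteen lines.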

6 Parsing our design with a BNF-driven script

We have to improve the previous script, now called "Tiny-BNFparsing.cwp", to build the parse tree that represents the relevant data of the design:

      // file "GettingStarted/Tiny-BNFparsing.cwp":
      1 TinyBNF ::= #ignore(JAVA) [classDeclaration]* #empty
      2     => { traceLine("this file has been parsed successfully"); };
      3 classDeclaration ::=
      4     IDENT:"class"
      5     IDENT:sName
      6     => insert project.listOfClasses[sName].name = sName;
      7     [
      8         ':'
      9         IDENT:sParent
      10         => {
      11             if !findElement(sParent, project.listOfClasses)
      12                 error("class '" + sParent + "' should have been declared before");
      13             ref project.listOfClasses[sName].parent = project.listOfClasses[sParent];
      14         }
      15     ]?
      16     classBody(project.listOfClasses[sName]);
      17 classBody(myClass : node) ::=
      18     '{' [attributeDeclaration(myClass)]* '}';
      19 attributeDeclaration(myClass : node) ::=
      20     IDENT
      21     ['[' ']']?
      22     IDENT;
      23 IDENT ::= #!ignore ['a'..'z'|'A'..'Z']+;

line 5: the name of the class is put into the local variable sName. Note that the first time a variable is encountered after a token, it is declared as local automatically.
line 6: we populate the parse tree as we have proceeded manually,
line 9: the name of the parent class is put into the local variable sParent,
line 11: the parent class must have been declared before: the item is searched into the list of classes,
line 13: we populate the parse tree as we have proceeded manually,
line 16: clauses may accept parameters; here, the current class is passed to classBody that will populate it with attributes,
line 17: the clause classBody expects a parameter as a node; a parameter may be passed as value or node or reference,
line 19: little exercise: complete the clause attributeDeclaration, which is in charge of parsing an attribute of the class given by the argument myClass,
line 20: remember that you must parse the class name of the association here (attribute myClass.listOfAttributes#back.class refers to the associated class),
line 21: remember that you must parse the multiplicity of the association here (attribute myClass.listOfAttributes#back.isArray is worth true if '[]' is present),
line 22: remember that you must parse the name of the association here (to put into attribute myClass.listOfAttributes#back.name),

Exercise:

Complete the previous clause attributeDeclaration to populate an attribute. You'll find the solution in the file "Scripts/Tutorial/GettingStarted/Tiny-BNFparsing1.cwp".

Solution:

      // file "GettingStarted/Tiny-BNFparsing1.cwp":
      1 classBody(myClass : node) ::=
      2     '{' [attributeDeclaration(myClass)]* '}';
      3 attributeDeclaration(myClass : node) ::=
      4     IDENT:sClass
      5     => local myAttribute;
      6     => {
      7         pushItem myClass.listOfAttributes;
      8         ref myAttribute = myClass.listOfAttributes#back;
      9         if !findElement(sClass, project.listOfClasses)
      10             error("class '" + sClass + "' should have been declared before");
      11         ref myAttribute.class = project.listOfClasses[sClass];
      12     }
      13     ['[' ']' => insert myAttribute.isArray = true;]?
      14     IDENT:sName => {insert myAttribute.name = sName;};
      15
      16 IDENT ::= #!ignore ['a'..'z'|'A'..'Z']+;

line 4: the name of the class for the association is assigned to the local variable sClass,
line 5: we'll need a local variable to point to the attribute's node, for convenience,
line 7: the local variable myAttribute is not declared here, inside the braces, because it would disappear at the end of the scope (the trailing brace); a new node is added to the list of attributes,
line 8: the local variable myAttribute points to the last item of the list,
line 9: the class specifier of the association must have been declared,
line 11: we populate the parse tree as done by hand,
line 13: the attribute isArray is added only if the type of the association is an array,
line 14: we complete the attribute description by assigning its name,

Type the following lines into the console to parse the design "Tiny.tml":


parseAsBNF("Scripts/Tutorial/GettingStarted/Tiny-BNFparsing1.cwp", project, 
        "Scripts/Tutorial/GettingStarted/Tiny.tml");

Output:

this file has been parsed successfully

7 Implementing a leader script

Now, we'll implement a little function that displays the content of our parse tree. We stop using the console here, and we'll put the call to the parsing and the function into a leader script. This script will be called from the command line, as shown below.

We suggest using the file extension ".cws" for non-template and non-BNF scripts.

CodeWorker command line to execute:
-script Scripts/Tutorial/GettingStarted/Tiny-leaderScript0.cws

      // file "GettingStarted/Tiny-leaderScript0.cws":
      1 parseAsBNF("Tiny-BNFparsing1.cwp", project, "Scripts/Tutorial/GettingStarted/Tiny.tml");
      2
      3
      4 function displayParsingTree() {
      5     foreach i in project.listOfClasses {
      6         traceLine("class '" + i.name + "'");
      7         if existVariable(i.parent)
      8             traceLine("\tparent = '" + i.parent.name + "'");
      9         foreach j in i.listOfAttributes {
      10             traceLine("\tattribute '" + j.name + "'");
      11             traceLine("\t\tclass = '" + j.class.name + "'");
      12             if existVariable(j.isArray)
      13                 traceLine("\t\tarray = '" + j.isArray + "'");
      14         }
      15     }
      16 }
      17
      18 displayParsingTree();

line 4: a user-defined function without parameters,
line 5: the foreach statement iterates all items of an array; here, all classes are explored,
line 7: check whether the attribute parent exists or not,
line 9: all attributes of the current class i are iterated,
line 12: perhaps the association is multiple,
line 18: a call to the user-defined function,

Output:

this file has been parsed successfully
class 'A'
class 'B'
    parent = 'A'
class 'C'
    attribute 'b'
        class = 'B'
        array = 'true'
class 'D'
    attribute 'a'
        class = 'A'
    attribute 'c'
        class = 'C'
        array = 'true'

8 Generating code with a pattern script

The source code generation exploits the parse tree to generate any kind of output files: HTML, SQL, C++, ...

A pattern script is written in the scripting language of CodeWorker, extended so that the text to put into the output file can be mixed with the instructions to interpret. It makes template-based generation possible. Such a script looks like a JSP template: the script is embedded between the tags '<%' and '%>', or between '@' characters.

We'll start by generating a short JAVA class for each class of the design. The script translates the attributes to JAVA and generates their accessors:

      // file "Scripts/Tutorial/GettingStarted/Tiny-JAVA.cwt":
      1 package tiny;
      2
      3 public class @this.name@ @
      4 if existVariable(this.parent) {
      5     @ extends @this.parent.name@ @
      6 }
      7 @{
      8     // attributes:
      9 @
      10 function getJAVAType(myAttribute : node) {
      11     local sType = myAttribute.class.name;
      12     if myAttribute.isArray {
      13         set sType = "java.util.ArrayList/*<" + sType + ">*/";
      14     }
      15     return sType;
      16 }
      17
      18 foreach i in this.listOfAttributes {
      19     @ private @getJAVAType(i)@ _@i.name@ = null;
      20 @
      21 }
      22 @
      23     //constructor:
      24     public @this.name@() {
      25     }
      26
      27     // accessors:
      28 @
      29 foreach i in this.listOfAttributes {
      30     @ public @getJAVAType(i)@ get@toUpperString(i.name)@() { return _@i.name@; }
      31     public void set@toUpperString(i.name)@(@getJAVAType(i)@ @i.name@) { _@i.name@ = @i.name@; }
      32 @
      33 }
      34 setProtectedArea("Methods");
      35 @}

line 3: swapping to script mode: the value of this.name is put into the output file; the variable this is determined by the second parameter passed to the procedure generate (see section generate() and below). If the notation appears confusing to you (where does the text mode end and the script mode start, or the contrary?), you can choose to enclose the variables in the tags '<%' and '%>'.
line 4: swapping once again to script mode for writing the inheritance, if any,
line 7: swapping to text mode,
line 10: we'll need a function to convert a type specifier of the tiny modeling language to JAVA, which expects the attribute's node (the parameter is passed as a node, instead of by value),
line 13: we have chosen java.util.ArrayList to represent an array, why not?
line 18: swapping to script mode for declaring the attributes of the class,
line 22: swapping to text mode for putting the constructor into the output file,
line 29: swapping to script mode for implementing the accessors to the attributes of the class,
line 30: the predefined function toUpperString converts its parameter to upper case,
line 34: the procedure setProtectedArea (see section setProtectedArea()) adds a protected area that is intended for the user and that is preserved from one generation to the next,
line 35: swapping to text mode for writing the trailing brace,

The leader script must be changed to require the generation of each class in JAVA:

CodeWorker command line to execute:
-script Scripts/Tutorial/GettingStarted/Tiny-leaderScript1.cws

      // file "Scripts/Tutorial/GettingStarted/Tiny-leaderScript1.cws":
      1 parseAsBNF("Scripts/Tutorial/GettingStarted/Tiny-BNFparsing1.cwp", project, "Scripts/Tutorial/GettingStarted/Tiny.tml");
      2
      3 foreach i in project.listOfClasses {
      4     generate("Scripts/Tutorial/GettingStarted/Tiny-JAVA.cwt", i, "Scripts/Tutorial/GettingStarted/tiny/" + i.name + ".java");
      5 }
      6

line 4: the second argument expects a tree node, which will be accessed in the pattern script via the predefined variable this, encountered above,

Output:

this file has been parsed successfully

Let's have a look at the following generated file:

      // file "Scripts/Tutorial/GettingStarted/tiny/D.java":
      package tiny;
     
      public class D {
          // attributes:
          private A _a = null;
          private java.util.ArrayList/*<C>*/ _c = null;
     
          //constructor:
          public D() {
          }
     
          // accessors:
          public A getA() { return _a; }
          public void setA(A a) { _a = a; }
          public java.util.ArrayList/*<C>*/ getC() { return _c; }
          public void setC(java.util.ArrayList/*<C>*/ c) { _c = c; }
      //##protect##"Methods"
      //##protect##"Methods"
      }
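The behaviour of a protected area during regeneration can be illustrated with a short Python sketch. It rebuilds the whole file from the model each time and re-injects whatever text was found between the two //##protect##"Methods" lines of the previous version. The function names and the dictionary layout are hypothetical, and the way the previous text is recovered is an assumption made for illustration, not CodeWorker's actual algorithm.

```python
import re

PROTECT = '//##protect##"{key}"'


def protected_areas(previous_text):
    """Collect the hand-typed text found between each pair of protect markers."""
    pattern = re.compile(r'//##protect##"([^"]*)"\n(.*?)//##protect##"\1"\n', re.DOTALL)
    return {key: body for key, body in pattern.findall(previous_text)}


def java_type(attribute):
    """Same mapping as getJAVAType in the pattern script."""
    if attribute.get("isArray"):
        return f"java.util.ArrayList/*<{attribute['class']}>*/"
    return attribute["class"]


def generate(cls, previous_text=""):
    """Rewrite the whole file from the model, keeping the protected areas."""
    kept = protected_areas(previous_text)
    extends = f" extends {cls['parent']}" if cls.get("parent") else ""
    lines = ["package tiny;", "",
             f"public class {cls['name']}{extends} {{",
             "    // attributes:"]
    for a in cls["attributes"]:
        lines.append(f"    private {java_type(a)} _{a['name']} = null;")
    lines += ["", "    //constructor:", f"    public {cls['name']}() {{", "    }",
              "", "    // accessors:"]
    for a in cls["attributes"]:
        t, n = java_type(a), a["name"]
        lines.append(f"    public {t} get{n.upper()}() {{ return _{n}; }}")
        lines.append(f"    public void set{n.upper()}({t} {n}) {{ _{n} = {n}; }}")
    marker = PROTECT.format(key="Methods")
    text = "\n".join(lines) + "\n"
    text += marker + "\n" + kept.get("Methods", "") + marker + "\n"
    return text + "}\n"


d = {"name": "D", "attributes": [
    {"name": "a", "class": "A"},
    {"name": "c", "class": "C", "isArray": True},
]}

first = generate(d)
# The user types a method by hand inside the protected area...
edited = first.replace('//##protect##"Methods"\n',
                       '//##protect##"Methods"\n    public int size() { return _c.size(); }\n', 1)
# ...and a second generation keeps it.
second = generate(d, edited)
print(second)
```

Anything typed outside the two marker lines would be lost by the second call, which is the trade-off of generation mode.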

9 Expanding text with a pattern script

We'll learn about another mode of generation: expanding a file. Let's suppose that you want to insert generated code into an existing file. The way to do it is first to insert a special comment at the expected place. This comment begins with ##markup## and is followed by a sequence of characters written between double quotes and called the markup key.

Here is a little HTML file that is going to be expanded:

      // file "Scripts/Tutorial/GettingStarted/Tiny.html":
      <HTML>
          <HEAD>
          </HEAD>
          <BODY>
      <!--##markup##"classes"-->
          </BODY>
      </HTML>

The markup key is called "classes" and is put into the file like this: <!--##markup##"classes"-->.

Now, we'll implement a short script that is intended to populate the markup area with all classes of the design, displayed in tables:

      // file "Scripts/Tutorial/GettingStarted/Tiny-HTML.cwt":
      1 @
      2 if getMarkupKey() == "classes" {
      3     foreach i in project.listOfClasses {
      4         @ <TABLE>
      5             <TR>
      6                 <TD colspan=3><B>@i.name@</B></TD>
      7             </TR>
      8             <TR>
      9                 <TD><EM>Attribute</EM></TD><TD><EM>Type</EM></TD> <TD><EM>Description</EM></TD>
      10             </TR>
      11 @
      12         foreach j in i.listOfAttributes {
      13             @ <TR>
      14                 <TD><I>@j.name@</I></TD><TD><CODE>@
      15             @@j.class.name@@
      16             if j.isArray {
      17                 @[]@
      18             }
      19             @</CODE></TD><TD>@
      20             setProtectedArea(i.name + "::" + j.name);
      21             @</TD>
      22             </TR>
      23 @
      24         }
      25         @ </TABLE>
      26 @
      27     }
      28 }

line 2: the function getMarkupKey() returns the key of the markup currently being expanded,
line 3: all classes will be presented sequentially in tables of 3 columns, whose title is the name of the class and whose rows are populated with attributes,
line 12: the name, type and description of all attributes of the class are presented in the table,
line 15: the type is expressed in the syntax of our tiny modeling language,
line 20: the description of an attribute must be filled in by the user in a protected area, so as to preserve it from one expansion to the next,

The leader script has to take into account the expansion of the HTML file:

CodeWorker command line to execute:
-script Scripts/Tutorial/GettingStarted/Tiny-leaderScript2.cws

      // file "Scripts/Tutorial/GettingStarted/Tiny-leaderScript2.cws":
      1 parseAsBNF("Scripts/Tutorial/GettingStarted/Tiny-BNFparsing1.cwp", project, "Scripts/Tutorial/GettingStarted/Tiny.tml");
      2
      3 foreach i in project.listOfClasses {
      4     generate("Scripts/Tutorial/GettingStarted/Tiny-JAVA.cwt", i, "Scripts/Tutorial/GettingStarted/tiny/" + i.name + ".java");
      5 }
      6
      7 traceLine("expanding file 'Tiny0.html'...");
      8 setCommentBegin("<!--");
      9 setCommentEnd("-->");
      10 expand("Scripts/Tutorial/GettingStarted/Tiny-HTML.cwt", project, "Scripts/Tutorial/GettingStarted/Tiny0.html");
      11 //normal;

line 8: to expand a file, the interpreter has to know the format of comments used for declaring the markups. If the format isn't correct, the file will not be expanded.
line 10: be careful to call the procedure expand() and not to confuse it with generate()! Remember that a classic generation rewrites the whole file according to the directives of the pattern script and preserves protected areas, but doesn't recognize markup keys.

Output:

this file has been parsed successfully
expanding file 'Tiny0.html'...

There is little interest in presenting here the content of the HTML file once it has been expanded, but you can display it (file "Scripts/Tutorial/GettingStarted/Tiny0.html") in your browser. You'll notice in the source code that the expanded text is put between the tags <!--##begin##"classes"--> and <!--##end##"classes"-->. Don't type text into this tagged part, except in protected areas, because the next expansion will destroy the tagged part.

To discover more about CodeWorker through a more complex example, please read the next chapter. You'll learn how to translate from one format to another, how to use template functions or BNF clauses (very efficient for readability and extension!), and a lot of other things. But it is recommended to practice a little first.
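The difference from generation mode can be sketched in Python: the file is left as it is, and only the region attached to each markup is produced or replaced. The function expand below is a hypothetical model, not the CodeWorker procedure of the same name. Keeping the markup comment in place and writing the begin tag directly after it are assumptions, and protected areas inside the expanded part are not modeled.

```python
import re


def expand(text, produce, begin="<!--", end="-->"):
    """Insert produce(key) after each markup comment, replacing any earlier expansion."""
    b, e = re.escape(begin), re.escape(end)
    markup = re.compile(
        rf'({b}##markup##"([^"]*)"{e})'                        # the markup comment itself
        rf'(?:\s*{b}##begin##"\2"{e}.*?{b}##end##"\2"{e})?',    # a previous expansion, if any
        re.DOTALL,
    )

    def replace(match):
        key = match.group(2)
        return (f'{match.group(1)}{begin}##begin##"{key}"{end}'
                f'{produce(key)}{begin}##end##"{key}"{end}')

    return markup.sub(replace, text)


html = '<BODY>\n<!--##markup##"classes"-->\n</BODY>\n'
tables = {"classes": "\n<TABLE>...</TABLE>\n"}

once = expand(html, lambda key: tables.get(key, ""))
twice = expand(once, lambda key: tables.get(key, ""))   # the tagged part is rebuilt, not duplicated
print(once)
```

The begin and end arguments play the role of setCommentBegin and setCommentEnd in the leader script: with the wrong comment format the pattern never matches and the file is returned unchanged.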

Discovering more with an example

If this is your first reading, we recommend reading the previous chapter, which is more approachable, before this one.

Let's imagine that we have a design expressed in a simple modeling language, like this:

      // file "GettingStarted/SolarSystem0.sml":
      1 class Planet {
      2     double diameter;
      3     double getDistanceToSun(int day, int month, int year);
      4 }
      5
      6 class Earth : Planet {
      7     string[] countryNames;
      8 }
      9
      10 class SolarSystem {
      11     aggregate Planet[] planets;
      12 }

line 1: a class is declared with the keyword class,
line 2: declaration of attributes in a syntax close to C++ or JAVA,
line 3: declaration of methods in a syntax close to C++ or JAVA,
line 6: a class may inherit from another; the syntax looks like C++, see ':',
line 7: an attribute may be an array; the syntax looks like JAVA,
line 11: an attribute may be an object or an array of objects, and an object may be an aggregation (meaning that it belongs to the instance),

This simple modeling language conforms to a BNF grammar (see paragraph BNF syntax to obtain information about the elements of a BNF syntax):

world ::= [class_declaration]*
class_declaration ::= "class" IDENT [':' IDENT]? class_body
class_body ::= '{' [attribute_decl | method_decl]* '}'
attribute_decl ::= type_specifier IDENT ';'
method_decl ::= type_specifier IDENT '(' [parameters_decl]? ')' ';'
parameters_decl ::= parameter [',' parameters_decl]*
parameter ::= [parameter_mode]? type_specifier IDENT
parameter_mode ::= "in" | "inout" | "out"
type_specifier ::= basic_type ['[' ']']?
basic_type ::= "int" | "double" | "string" | "boolean" | class_specifier
class_specifier ::= ["aggregate"]? IDENT
IDENT ::= ['a'..'z'|'A'..'Z'|'_'] ['a'..'z'|'A'..'Z'|'_'|'0'..'9']*

Starting from the design file "SolarSystem0.sml" seen before, which conforms to the Simple Modeling Language described just above, we propose to implement the source code for the classes and a light documentation.

10 The parse tree

CodeWorker doesn't belong to the category of typed languages. It recognizes only the tree as a structured type and the string as a basic type (which may however represent an integer, a boolean, ...). Each node may contain a string as a value, and/or an array of nodes. The main tree is called project, which is the name of its root node, accessible everywhere in scripts.

Now, the best way to understand how to handle the tree is to run the console, and to practice some examples.

Type CodeWorker at the shell prompt to enter console mode. A cursor waits for your commands.

Type set a = "little"; and press enter. Don't forget the semicolon at the end of the line. If it is absent, the console waits for more input: type the expected semicolon, and all should be well.

What is the impact of the line you typed? You assigned "little" to the variable a, which didn't exist. So, a node named 'a' has been added to the main parse tree (called project, remember), and the variable a points to it. You noticed that a warning occurred. It means that you assigned a value to a node that didn't exist yet. In fact, the instruction set supposes that the variable to assign already exists, and the warning is there to guard you against a spelling error (perhaps you intended to type another variable that already exists?) or a logic mistake (at this point of the program, the variable should exist, so what happened?). It is important to offer this protection because the language isn't typed, so many errors can only be reported at runtime.

The variable a has been added despite the warning, but the instruction insert is the proper way to add a new node: type insert b = "big"; and press enter. No warning is displayed. Now, the root node project contains two sub-nodes, called 'a' and 'b', and we check it by typing traceObject(project);. The following lines are displayed:


Tracing variable 'project':
        a = "little"
        b = "big"
End of variable's trace 'project'.

Let's go further. What about storing a list of items?
Type insert classes["Planet"].name = "Planet";. A node called 'classes' has been added to project, and then an array entry called "Planet" has been pushed. This entry points to a node, to which 'name' is added, and the node 'name' is worth "Planet".

Type insert classes["Earth"].name = "Earth"; and then trace the node 'project' again. The following lines are displayed:


Tracing variable 'project':
        a = "little"
        b = "big"
        classes = ""
        classes["Planet", "Earth"]
End of variable's trace 'project'.

Notice that the node 'classes' has no value (but could have!) and contains an array of nodes where entries are "Planet" and "Earth".

To iterate items of array 'classes', type foreach i in classes traceLine("handling class '" + i.name + "'..."); and see the result:


handling class 'Planet'...
handling class 'Earth'...

Variable 'i' is an iterator and is declared locally for the duration of the foreach instruction. We'll see later that the statement local declares a tree on the stack.

What you know about the parse tree in CodeWorker is sufficient to tackle the next section.

11 Parsing our design

CodeWorker provides two different approaches for parsing files.

11.1 The parsing scripts that read tokens

Those who aren't familiar with a BNF representation will perhaps feel more confident using procedure-driven parsing, where control resides within the implementation and where every token is explicitly read by a dedicated operation. But it means, for instance, that skipping blanks and comments must be requested explicitly between the reading of tokens.

The parsing scripts that read tokens are the oldest way to parse in CodeWorker, and the fastest too. But they don't offer the same flexibility as BNF scripts, which are syntax-oriented.
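The cost of this explicit style can be seen in a small Python sketch of a token reader. The class Reader and its methods are hypothetical stand-ins that only loosely echo the kind of calls a token-reading script makes; they are not CodeWorker's functions. As a simplification, class bodies are skipped up to the closing brace instead of being parsed.

```python
class Reader:
    """Hypothetical cursor over the input text."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def skip_empty(self):
        """Skip blanks and // or /* */ comments; return True if input remains."""
        while self.pos < len(self.text):
            if self.text[self.pos].isspace():
                self.pos += 1
            elif self.text.startswith("//", self.pos):
                end = self.text.find("\n", self.pos)
                self.pos = len(self.text) if end < 0 else end
            elif self.text.startswith("/*", self.pos):
                end = self.text.find("*/", self.pos + 2)
                self.pos = len(self.text) if end < 0 else end + 2
            else:
                return True
        return False

    def read_identifier(self):
        """Return the identifier at the cursor, or "" when there is none."""
        start = self.pos
        if start < len(self.text) and self.text[start].isdigit():
            return ""
        while self.pos < len(self.text) and (
                self.text[self.pos].isalnum() or self.text[self.pos] == "_"):
            self.pos += 1
        return self.text[start:self.pos]

    def read_if_equal_to(self, literal):
        """Consume `literal` if it is next; report whether it was."""
        if self.text.startswith(literal, self.pos):
            self.pos += len(literal)
            return True
        return False


def read_class_headers(text):
    """Read 'class Name [: Parent] {' headers; bodies are skipped, not parsed."""
    reader, found = Reader(text), []
    while reader.skip_empty():
        if reader.read_identifier() != "class":
            raise SyntaxError("'class' expected")
        reader.skip_empty()                  # nothing is skipped implicitly
        name = reader.read_identifier()
        if not name:
            raise SyntaxError("class name expected")
        reader.skip_empty()
        parent = ""
        if reader.read_if_equal_to(":"):
            reader.skip_empty()
            parent = reader.read_identifier()
            if not parent:
                raise SyntaxError(f"parent name expected for class '{name}'")
            reader.skip_empty()
        if not reader.read_if_equal_to("{"):
            raise SyntaxError("'{' expected")
        close = text.find("}", reader.pos)   # simplification: jump over the body
        if close < 0:
            raise SyntaxError("'}' expected")
        reader.pos = close + 1
        found.append((name, parent))
    return found


SOLAR = """
class Planet {
    double diameter;
    double getDistanceToSun(int day, int month, int year);
}

class Earth : Planet {  // inherits from Planet
    string[] countryNames;
}
"""
print(read_class_headers(SOLAR))
```

Forgetting a single skip_empty() call makes the next read fail on a blank, which is exactly the bookkeeping that a BNF script with an ignore directive takes care of.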

Below is an example of what a script that reads tokens looks like:

      // file "GettingStarted/SimpleML-token-reading.cws":
      1 declare function readType();
      2
      3 while skipEmptyCpp() {
      4     if !readIfEqualToIdentifier("class") error("'class' expected");
      5     skipEmptyCpp();
      6     local sClassName = readIdentifier();
      7     if !sClassName error("class name expected");
      8     skipEmptyCpp();
      9     if readIfEqualTo(":") {
      10         skipEmptyCpp();
      11         local sParentName = readIdentifier();
      12         if !sParentName error("parent name expected for class '" + sClassName + "'");
      13         skipEmptyCpp();
      14     }
      15     if !readIfEqualTo("{") error("'{' expected");
      16     skipEmptyCpp();
      17     while !readIfEqualTo("}") {
      18         skipEmptyCpp();
      19         readType();
      20         skipEmptyCpp();
      21         local sMemberName = readIdentifier();
      22         if !sMemberName error("attribute or method name expected");
      23         skipEmptyCpp();
      24         if readIfEqualTo("(") {
      25             skipEmptyCpp();
      26             if !readIfEqualTo(")") {
      27                 do {
      28                     skipEmptyCpp();
      29                     local iPosition = getInputLocation();
      30                     local sMode = readIdentifier();
      31                     if !sMode error("parameter type or mode expected");
      32                     if (sMode != "in") && (sMode != "out") && (sMode != "inout") {
      33                         setInputLocation(iPosition);
      34                         set sMode = "";
      35                     }
      36                     skipEmptyCpp();
      37                     readType();
      38                     skipEmptyCpp();
      39                     local sParameterName = readIdentifier();
      40                     if !sParameterName error("parameter name expected");
      41                     skipEmptyCpp();
      42                 } while readIfEqualTo(",");
      43                 if !readIfEqualTo(")") error("')' expected");
      44             }
      45             skipEmptyCpp();
      46         }
      47         if !readIfEqualTo(";") {
      48             error("';' expected to close an attribute, instead of '" + readChar() + "'");
      49         }
      50         skipEmptyCpp();
      51     }
      52 }
      53 traceLine("the file has been read successfully");
      54
      55 function readType() {
      56     local sType = readIdentifier();
      57     if !sType error("type modifier or name expected, instead of '" + readChar() + "'");
      58     if sType == "aggregate" {
      59         skipEmptyCpp();
      60         set sType = readIdentifier();
      61         if !sType error("aggregated class name expected");
      62     }
      63     skipEmptyCpp();
      64     if readIfEqualTo("[") {
      65         skipEmptyCpp();
      66         if !readIfEqualTo("]") error("']' expected to close an array declaration");
      67     }
      68 }

line 1: forward declaration of method readType(), so as to start explanations about how to implement BNF clause world ::= [class_declaration]*,
line 3: do a loop while the end of file hasn't been reached, skipping blanks and C++ comments: skipEmptyCpp() returns false only if an error occurs while reading the stream or the file has completed,
line 4: waiting for token "class" as an identifier (doesn't accept "class" as the beginning of another identifier, such as "classes"). If not found, an error occurs. This token announces a class declaration.
line 5: a disadvantage of writing a procedure-driven reading/parsing: don't forget to skip explicitly blanks and comments by yourself,
line 6: populates a local variable with an identifier token that represents the name of the class
line 7: if an identifier token hasn't been found (token is empty), an error is thrown,
line 9: if the file location points to ":", announcing the inheritance, function readIfEqualTo(":") returns true, and the location moves after the matched expression. If it fails, the file location remains the same.
line 15: body of the class declaration expected
line 17: while inside the class body, reading of attribute and method members,
line 19: we don't conform exactly to the BNF: beginning of method and attribute declaration is factorized,
line 21: name of the attribute or method member,
line 24: no more ambiguity: the declaration continues with a parenthesis when the member is a method,
line 27: the method expects at least one parameter,
line 29: we keep the current file position, to be able to come back if the next token isn't an access mode ("in", "out" or "inout"),
line 33: we were reading a basic type, instead of a parameter access mode: we come back to the beginning of this token and the mode is set as empty (no mode). Of course, it is possible not to waste time like this, and to optimize function readType() by passing the token as a parameter. But here is the occasion of discovering how to handle the file position.
line 37: type of the current parameter is expected,
line 39: name of the current parameter is expected,
line 42: parameters are separated by commas,
line 47: both attributes and methods must finish with a semi colon,
line 48: function readChar() reads just one character, or returns an empty string if the end of file has been reached,
line 53: once the file has been read completely, a message of success is written,
line 55: user-defined function ; may return a value or not. The declaration always starts with keyword function, even if it announces a procedure (no return value). Reading a type is called at several points of the grammar, so the code is factorized in the procedure readType(). It doesn't return any value about success or failure, because an error is thrown in case of syntax mismatch.
line 58: is the keyword a modifier? If not, sType contains a basic type or a class name,
line 60: reads the name of the aggregated class,
line 64: the type may be an array, represented by [],
This script seems quite far from the BNF of our simple modeling language, although it implements it in a procedural way. It is able to read a well-formed design file, such as our solar system presented at the beginning of the chapter. It doesn't populate a parse tree yet, but it produces contextual error messages when the design file doesn't conform to the BNF.
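
To make the handling of the file position more concrete, here is a minimal Python sketch of a token reader. It is only an analogy: the class Reader, its methods and the helper read_mode are hypothetical names that imitate skipEmptyCpp(), readIdentifier(), readIfEqualTo() and getInputLocation()/setInputLocation(); they are not part of CodeWorker.

```python
import re


class Reader:
    """Hypothetical Python analogue of the token-reading functions used above."""

    _BLANKS_AND_COMMENTS = re.compile(r"(?:\s+|//[^\n]*|/\*.*?\*/)*", re.S)
    _IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

    def __init__(self, text):
        self.text = text
        self.pos = 0            # plays the role of the input location

    def skip_empty_cpp(self):
        # Skip blanks and C++ comments; report whether input remains.
        self.pos = self._BLANKS_AND_COMMENTS.match(self.text, self.pos).end()
        return self.pos < len(self.text)

    def read_identifier(self):
        # Return the identifier at the current position, or "" without moving.
        m = self._IDENTIFIER.match(self.text, self.pos)
        if not m:
            return ""
        self.pos = m.end()
        return m.group()

    def read_if_equal_to(self, token):
        # Consume `token` if it comes next; otherwise leave the position untouched.
        if self.text.startswith(token, self.pos):
            self.pos += len(token)
            return True
        return False


def read_mode(reader):
    """Read an optional access mode, rewinding if the token was something else."""
    start = reader.pos
    mode = reader.read_identifier()
    if mode not in ("in", "out", "inout"):
        reader.pos = start      # same idea as lines 29 to 34 of the script
        mode = ""
    return mode


reader = Reader("/* mode is optional */ int count")
reader.skip_empty_cpp()
mode = read_mode(reader)            # "int" is not a mode, so the reader rewinds
type_name = reader.read_identifier()
print(repr(mode), type_name)
```

Because read_mode restores the position, the type can be read afterwards as if no attempt had been made, which is what lines 33 and 37 of the script rely on.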

Let's apply the script to the design file:


parseFree("GettingStarted/SimpleML-token-reading.cws",
        project, "GettingStarted/SolarSystem0.sml");

Output:

the file has been read successfully

Now, let's improve the script so that it populates a parse tree:

      // file "GettingStarted/SimpleML-token-parsing.cws":
      1 declare function readType(myType : node);
      2
      3 while skipEmptyCpp() {
      4     if !readIfEqualToIdentifier("class") error("'class' expected");
      5     skipEmptyCpp();
      6     local sClassName = readIdentifier();
      7     if !sClassName error("class name expected");
      8     insert project.listOfClasses[sClassName].name = sClassName;
      9     skipEmptyCpp();
      10     if readIfEqualTo(":") {
      11         skipEmptyCpp();
      12         local sParentName = readIdentifier();
      13         if !sParentName error("parent name expected for class '" + sClassName + "'");
      14         insert project.listOfClasses[sClassName].parent = sParentName;
      15         skipEmptyCpp();
      16     }
      17     if !readIfEqualTo("{") error("'{' expected");
      18     skipEmptyCpp();
      19     local myClass;
      20     ref myClass = project.listOfClasses[sClassName];
      21     while !readIfEqualTo("}") {
      22         skipEmptyCpp();
      23         local myType;
      24         readType(myType);
      25         skipEmptyCpp();
      26         local sMemberName = readIdentifier();
      27         if !sMemberName error("attribute or method name expected");
      28         skipEmptyCpp();
      29         if readIfEqualTo("(") {
      30             insert myClass.listOfMethods[sMemberName].name = sMemberName;
      31             if myType.name != "void" {
      32                 setall myClass.listOfMethods[sMemberName].type = myType;
      33             }
      34             skipEmptyCpp();
      35             if !readIfEqualTo(")") {
      36                 local myMethod;
      37                 ref myMethod = myClass.listOfMethods[sMemberName];
      38                 do {
      39                     skipEmptyCpp();
      40                     local iPosition = getInputLocation();
      41                     local sMode = readIdentifier();
      42                     if !sMode error("parameter type or mode expected");
      43                     if (sMode != "in") && (sMode != "out") && (sMode != "inout") {
      44                         setInputLocation(iPosition);
      45                         set sMode = "";
      46                     }
      47                     skipEmptyCpp();
      48                     local myParameterType;
      49                     readType(myParameterType);
      50                     skipEmptyCpp();
      51                     local sParameterName = readIdentifier();
      52                     if !sParameterName error("parameter name expected");
      53                     insert myMethod.listOfParameters[sParameterName].name = sParameterName;
      54                     setall myMethod.listOfParameters[sParameterName].type = myParameterType;
      55                     if sMode {
      56                         insert myMethod.listOfParameters[sParameterName].mode = sMode;
      57                     }
      58                     skipEmptyCpp();
      59                 } while readIfEqualTo(",");
      60                 if !readIfEqualTo(")") error("')' expected");
      61             }
      62             skipEmptyCpp();
      63         } else {
      64             insert myClass.listOfAttributes[sMemberName].name = sMemberName;
      65             setall myClass.listOfAttributes[sMemberName].type = myType;
      66         }
      67         if !readIfEqualTo(";") error("';' expected to close an attribute, instead of '" + readChar() + "'");
      68         skipEmptyCpp();
      69     }
      70 }
      71 traceLine("the file has been parsed successfully");
      72
      73 function readType(myType : node) {
      74     local sType = readIdentifier();
      75     if !sType error("type modifier or name expected, instead of '" + readChar() + "'");
      76     if sType == "aggregate" {
      77         insert myType.isAggregation = true;
      78         skipEmptyCpp();
      79         set sType = readIdentifier();
      80         if !sType error("aggregated class name expected");
      81     }
      82     insert myType.name = sType;
      83     if (sType != "int") && (sType != "double") && (sType != "boolean") && (sType != "string") {
      84         insert myType.isObject = true;
      85     }
      86     skipEmptyCpp();
      87     if readIfEqualTo("[") {
      88         skipEmptyCpp();
      89         if !readIfEqualTo("]") error("']' expected to close an array declaration");
      90         insert myType.isArray = true;
      91     }
      92 }

line 8: about parsing, classes are modeled into node project.listOfClasses[sClassName]. Its attribute name contains the value of sClassName.
line 14: this class inherits from a parent, so the optional attribute parent of the class is populated with the value of sParentName,
line 19: to work easier with the current class node project.listOfClasses[sClassName], we define a reference to it, called myClass,
line 23: the class is populated with the characteristics of the member once its declaration has finished. Otherwise, it may confuse between an attribute or a method declaration. So, we should have factorized the type declaration and the name of the member into a common clause, for example.
line 30: about parsing, methods are modeled into node myClass.listOfMethods[sMemberName],
line 31: attribute name is compulsory into a type node, so if myType.name returns "void", there is no return type,
line 36: to work more easily with the current method node myClass.listOfMethods[sMemberName], we define a reference to it, called myMethod,
line 53: about parsing, parameters are modeled into node myMethod.listOfParameters[sParameterName],
line 64: about parsing, attributes are modeled into node myClass.listOfAttributes[sMemberName],
line 65: the type is allocated on the stack, so it is copied into branch type (no node reference) integrally,
line 71: once the parsing of the file has completed, a message of success is written,
line 73: function readType() requires a node into which description of type will be populated,
line 77: about parsing, myType.isAggregation contains true if the type is an aggregation,
line 82: about parsing, myType.name contains the name of basic type,
line 83: check whether the type is a basic one or a class specifier,
line 84: about parsing, myType.isObject contains true because we suppose that this type is a class specifier (by default: it isn't a basic type),
line 90: about parsing, myType.isArray contains true if type is an array,
The first version of the script was just able to read a well-formed design file written in the simple modeling language. The second version validates the file and populates the parse tree:


parseFree("GettingStarted/SimpleML-token-parsing.cws",
        project, "GettingStarted/SolarSystem0.sml");

Output:

the file has been parsed successfully
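
To picture the shape of the tree that this script builds, the Python sketch below fills nested dictionaries for one made-up declaration, class Earth : Planet { aggregate Moon[] moons; }. This declaration is an assumption for illustration and is not taken from the SolarSystem0.sml file; the function describe_type is a hypothetical helper that mimics the decisions taken by readType().

```python
BASIC_TYPES = {"int", "double", "boolean", "string"}


def describe_type(name, is_array=False, is_aggregation=False):
    """Build a dict shaped like the type node filled by readType() (sketch)."""
    node = {"name": name}
    if is_aggregation:
        node["isAggregation"] = True
    if name not in BASIC_TYPES:
        node["isObject"] = True      # anything else is assumed to be a class
    if is_array:
        node["isArray"] = True
    return node


# Hypothetical input: class Earth : Planet { aggregate Moon[] moons; }
project = {"listOfClasses": {}}
earth = project["listOfClasses"].setdefault("Earth", {"name": "Earth"})
earth["parent"] = "Planet"
earth.setdefault("listOfAttributes", {})["moons"] = {
    "name": "moons",
    "type": describe_type("Moon", is_array=True, is_aggregation=True),
}
print(project)
```

At this stage, parent and the name of the type are still plain strings: nothing has checked yet that the classes they designate exist.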

11.2 The parsing scripts that describe a BNF syntax

A BNF is more flexible and more concise than a procedural description of parsing. CodeWorker accepts parsing scripts that conform to a BNF.

For more information about the elements of syntax of a BNF, have a look at the paragraph BNF syntax.

Below is an example of what a BNF script looks like:

      // file "GettingStarted/SimpleML-reading.cwp":
      1 // syntactical clauses:
      2 world ::= #ignore(C++) [class_declaration]* #empty
      3             => { traceLine("file read successfully"); };
      4 class_declaration ::= IDENT:"class" IDENT [':' IDENT]? class_body;
      5 class_body ::= '{' [attribute_decl | method_decl]* '}';
      6 attribute_decl ::= type_specifier IDENT ';';
      7 method_decl ::= [IDENT:"void" | type_specifier] IDENT
      8                 '(' [parameters_decl]? ')' ';';
      9 parameters_decl ::= parameter [',' parameters_decl]*;
      10 parameter ::= [parameter_mode]? type_specifier IDENT;
      11 parameter_mode ::= IDENT:{"in", "inout", "out"};
      12 type_specifier ::= basic_type ['[' ']']?;
      13 basic_type ::= "int" | "boolean" | "double" | "string" | class_specifier;
      14 class_specifier ::= ["aggregate"]? IDENT;
      15
      16 // lexical clauses:
      17 IDENT ::= #!ignore ['a'..'z'|'A'..'Z'|'_']
      18                     ['a'..'z'|'A'..'Z'|'_'|'0'..'9']*;

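line 2: the world to model is composed of classes ; some special commands are used: #ignore(C++) skips blanks and C++ comments between tokens, #empty requires that the end of file has been reached, and => introduces procedural instructions executed once the preceding sequence has matched,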
line 4: a class declaration begins with the identifier "class": IDENT:"class" means that an identifier is expected and that it must be worth "class". This isn't identical to "class" IDENT, which would accept the expression "classes", with IDENT matching "es". A class has a name, read by the IDENT clause call that follows, and may inherit from a parent, read by the IDENT after ':',
line 5: the body of a class is composed of attributes and methods
line 6: the attribute is preceded by its type, and IDENT reads the name of the attribute
line 7: the method has a return type or expects void keyword, and may expect some parameters ; IDENT reads the name of the method
line 9: a comma separates parameters
line 10: an access mode may be specified to the parameter ; the type is then specified, and IDENT reads the name
line 11: a parameter may be passed as input ("in"), as output ("out") or as both ("inout"). The pattern IDENT:{"in", "inout", "out"} means that the identifier must match one of the constant strings listed between braces. It isn't identical to the pattern "in" | "inout" | "out", which would accept the beginning of "int".
line 12: a type is a basic type or an array of basic types
line 13: some basic types, including object types
line 14: IDENT reads the class name, and the object may be aggregated
line 17: this clause reads an identifier, such as pretty_pig1 ; #!ignore means that no character is ignored, even if it matches C++ comment or a blank. If we forget clause #!ignore, then IDENT will validate pretty/*comment*/_pig 1 as an identifier.
This BNF script is very close to the BNF of our simple modeling language, and is able to read a well-formed design file, such as our solar system presented at the beginning of the chapter. It doesn't populate a parse tree yet, and doesn't produce a contextual error message when the design file doesn't conform to the BNF.
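
The difference between matching a constant string and matching a whole identifier against a constant (lines 4 and 11) can be reproduced outside CodeWorker. The Python sketch below is only an analogy; match_literal and match_identifier are hypothetical helper names.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def match_literal(text, options):
    """Like the pattern "in" | "inout" | "out": succeed on any matching prefix."""
    return next((o for o in options if text.startswith(o)), None)


def match_identifier(text, options):
    """Like IDENT:{...}: read a whole identifier first, then compare it."""
    m = IDENT.match(text)
    return m.group() if m and m.group() in options else None


modes = ("in", "inout", "out")
print(match_literal("int x", modes))       # in
print(match_identifier("int x", modes))    # None
print(match_identifier("inout x", modes))  # inout
```

The first call wrongly accepts the beginning of "int" as the mode "in", whereas the identifier-based version reads "int" entirely and rejects it.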

Let's apply the BNF script to the design file:


parseAsBNF("GettingStarted/SimpleML-reading.cwp",
        project, "GettingStarted/SolarSystem0.sml");

Output:

file read successfully

Regarding differences, note that each BNF rule must end with a semicolon, and that rules have to state how they behave when encountering blanks and comments.

Now, let's improve the BNF script so that it populates a parse tree and throws an error when a syntax error occurs:

      // file "GettingStarted/SimpleML-parsing.cwp":
      1 // syntactical clauses:
      2 world ::= #ignore(C++) [class_declaration]* #empty
      3             => {
      4                 traceLine("file parsed successfully");
      5                 saveProject("Scripts/Tutorial/SolarSystem0.xml");
      6             };
      7 class_declaration ::= IDENT:"class" #continue
      8             IDENT:sClassName
      9                 => insert project.listOfClasses[sClassName].name = sClassName;
      10             [':' #continue IDENT:sParentName
      11                 => insert project.listOfClasses[sClassName].parent = sParentName;
      12             ]?
      13             class_body(project.listOfClasses[sClassName]);
      14 class_body(myClass : node) ::= '{'
      15         [attribute_decl(myClass) | method_decl(myClass)]* '}';
      16 attribute_decl(myClass : node) ::=
      17             => local myType;
      18             type_specifier(myType) IDENT:sAttributeName ';'
      19             => {
      20                 insert myClass.listOfAttributes[sAttributeName].name = sAttributeName;
      21                 setall myClass.listOfAttributes[sAttributeName].type = myType;
      22             };
      23 method_decl(myClass : node) ::=
      24             => local myType;
      25             [IDENT:"void" | type_specifier(myType)]
      26             IDENT:sMethodName '('
      27             #continue
      28                 => {
      29                     insert myClass.listOfMethods[sMethodName].name = sMethodName;
      30                     if myType.name
      31                         setall myClass.listOfMethods[sMethodName].type = myType;
      32                 }
      33             [parameters_decl(myClass.listOfMethods[sMethodName])]? ')' ';';
      34 parameters_decl(myMethod : node) ::=
      35                 parameter(myMethod)
      36                 [',' #continue parameters_decl(myMethod)]*;
      37 parameter(myMethod : node) ::=
      38             [parameter_mode]?:sMode
      39             => local myType;
      40             type_specifier(myType)
      41             IDENT:sParameterName
      42                 => {
      43                     insert myMethod.listOfParameters[sParameterName].name = sParameterName;
      44                     setall myMethod.listOfParameters[sParameterName].type = myType;
      45                     if sMode {
      46                         insert myMethod.listOfParameters[sParameterName].mode = sMode;
      47                     }
      48                 };
      49 parameter_mode ::= IDENT:{"in", "inout", "out"};
      50 type_specifier(myType : node) ::=
      51     basic_type(myType)
      52     ['[' #continue ']' => insert myType.isArray = true; ]?;
      53 basic_type(myType : node) ::=
      54     ["int" | "boolean" | "double" | "string"]:myType.name
      55         |
      56     class_specifier(myType);
      57 class_specifier(myType : node) ::=
      58     ["aggregate" => insert myType.isAggregation = true; ]?
      59     IDENT:myType.name => {insert myType.isObject = true; };
      60
      61 IDENT ::= #!ignore ['a'..'z'|'A'..'Z'|'_']
      62                     ['a'..'z'|'A'..'Z'|'_'|'0'..'9']*;

line 2: the pattern [class_declaration]* always matches with the parsed file, so the rule will continue in sequence in any case (supposing that no error has occurred into clause class_declaration) and the end of file will be checked. If it hasn't been reached, the message "file parsed successfully" isn't written,
line 7: once keyword "class" has been matched, there is no ambiguity : we are handling a class declaration and the rule must continue in sequence. To require that, instruction #continue is written after pattern "class". If a pattern of the sequence doesn't match the parsed file, the parser throws a syntax error automatically.
line 8: the identifier that matches with clause call IDENT is assigned to the local variable sClassName : contrary to other types of script, a new variable is considered as local, instead of a new attribute added to the current node this,
line 9: about parsing, classes are modeled into node project.listOfClasses[sClassName]. Its attribute name contains the value of sClassName.
line 10: if the class inherits from a parent, ':' is necessarily followed by an identifier (pattern #continue), and the identifier that matches with clause call IDENT is assigned to the local variable sParentName,
line 11: this class inherits from a parent, so the optional attribute parent of the class is populated with the value of sParentName,
line 14: clause class_body expects an argument: the class node into which the class members must be described (myClass : node),
line 16: the class is populated with the characteristics of the attribute once its declaration has finished. Otherwise, it may confuse with the beginning of a method declaration. To avoid this ambiguity, we should have factorized the type declaration and the name of the member into a common clause, for example.
line 20: about parsing, attributes are modeled into node myClass.listOfAttributes[sAttributeName],
line 21: the type is allocated on the stack, so it is copied into branch type (no node reference) integrally,
line 23: the class is populated with the characteristics of the method once the opened parenthesis is recognized,
line 27: from here, there is no doubt that we are parsing a method declaration,
line 29: about parsing, methods are modeled into node myClass.listOfMethods[sMethodName],
line 30: attribute name is compulsory into a type node, so if condition myType.name returns false, there is no return type (void),
line 36: a parameter declaration is expected after the comma,
line 43: about parsing, parameters are modeled into node myMethod.listOfParameters[sParameterName],
line 52: about parsing, myType.isArray contains true if type is an array,
line 54: about parsing, myType.name contains the name of basic type,
line 58: about parsing, myType.isAggregation contains true if the object is aggregated,
line 59: about parsing, myType.isObject contains true because this type is a class specifier,
line 61: the lexical clause IDENT recognizes identifiers and might be replaced by the predefined clause #readIdentifier, which does the same work,
The first version of the script was just able to read a well-formed design file written in the simple modeling language. The second version validates the file and populates the parse tree:


parseAsBNF("GettingStarted/SimpleML-parsing.cwp",
        project, "GettingStarted/SolarSystem0.sml");

Output:

file parsed successfully
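
The effect of #continue, described for line 7, can be summarized as a commit point: before it, a mismatch merely makes the clause fail; after it, a mismatch is a syntax error. The Python sketch below illustrates this idea on a list of tokens. It is an analogy under that reading of the text, and SyntaxErrorInDesign and parse_class_header are hypothetical names.

```python
class SyntaxErrorInDesign(Exception):
    """Raised when a sequence fails after its commit point."""


def parse_class_header(tokens):
    """Return the class name, or None if this is not a class declaration.

    Before the commit point a mismatch merely fails (the caller may try another
    alternative); after it, a mismatch is reported as a syntax error.
    """
    if not tokens or tokens[0] != "class":
        return None                       # plain failure: no commitment yet
    # From here on we behave as if #continue had been written.
    if len(tokens) < 2 or not tokens[1].isidentifier():
        raise SyntaxErrorInDesign("class name expected")
    return tokens[1]


print(parse_class_header(["class", "Planet", "{"]))
print(parse_class_header(["int", "x", ";"]))
```

Calling parse_class_header(["class", "{"]) raises SyntaxErrorInDesign, because the keyword has already committed the parser to a class declaration.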

12 Decorating the parse tree

Once our design file has been parsed (either procedure-driven or BNF-driven, we don't care), there is sometimes a little more work to accomplish on the parse tree. It may be verifying the consistency of the whole, such as checking the existence of each class referenced as an association or a parent. It may also be reorganizing the graph differently, so as to simplify the tasks of source code generation. We call it decorating the parse tree in the CodeWorker vocabulary.

The next script proposes to check the existence of each class specifier type and to keep a reference to the node that describes this class specifier. Some nodes change their nature (myClass.parent becomes a reference to the parent node, for example), some others are added (for object types, the new node myType.class keeps a reference to the class):
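
As a small illustration of the idea, the Python sketch below checks that each parent exists and then replaces its name by a reference to the parent's own description. It works on plain dictionaries rather than on a CodeWorker tree, and the function decorate and the sample data are hypothetical.

```python
def decorate(classes):
    """Replace each parent name by a reference to the parent's own dict."""
    for cls in classes.values():
        parent_name = cls.get("parent")
        if parent_name is None:
            continue
        if parent_name not in classes:
            raise ValueError(f"class '{parent_name}' doesn't exist while class "
                             f"'{cls['name']}' intends to inherit from it")
        cls["parent"] = classes[parent_name]   # now a reference, not a string


classes = {
    "Planet": {"name": "Planet"},
    "Earth": {"name": "Earth", "parent": "Planet"},
}
decorate(classes)
print(classes["Earth"]["parent"]["name"])
```

After decoration, code that walks the tree can reach the parent's description directly instead of looking its name up again.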

      // file "GettingStarted/TreeDecoration.cws":
      1 foreach myClass in project.listOfClasses {
      2     if myClass.parent {
      3         if !findElement(myClass.parent, project.listOfClasses)
      4             error("class '" + myClass.parent + "' doesn't exist while class '"
      5                   + myClass.name + "' intends to inherit from it");
      6         ref myClass.parent = project.listOfClasses[myClass.parent];
      7     }
      8     foreach myAttribute in myClass.listOfAttributes {
      9         local myType;
      10         ref myType = myAttribute.type;
      11         if myType.isObject {
      12             if !findElement(myType.name, project.listOfClasses)
      13                 error("class '" + myType.name + "' doesn't exist while attribute '"
      14                       + myClass.name + "::" + myAttribute.name + "' refers to it");
      15             ref myType.class = project.listOfClasses[myType.name];
      16         }
      17     }
      18     foreach myMethod in myClass.listOfMethods {
      19         if existVariable(myMethod.type) && myMethod.type.isObject {
      20             localref myType = myMethod.type;
      21             if !findElement(myType.name, project.listOfClasses)
      22                 error("class '" + myType.name + "' doesn't exist while method '"
      23                       + myClass.name + "::" + myMethod.name + "' refers to it");
      24             ref myType.class = project.listOfClasses[myType.name];
      25         }
      26         foreach myParameter in myMethod.listOfParameters {
      27             localref myType = myParameter.type;
      28             if myType.isObject {
      29                 if !findElement(myType.name, project.listOfClasses)
      30                     error("class '" + myType.name
      31                           + "' doesn't exist while method '"
      32                           + myClass.name + "::" + myMethod.name
      33                           + "' refers to it");
      34                 ref myType.class = project.listOfClasses[myType.name];
      35             }
      36         }