Ensō Data
I've been busy working on Ensō, but took time out to write something about what we are doing. This post introduces the Schema Schema. Next I'll talk about grammars...
Enso Introduction
I know that this is just a teaser, but it is the introduction of the paper on Ensō that Tijs and I are writing. If you want to know more or see the code, let me know.
Ensō is a theoretically sound and practical reformulation of the concepts of model-driven software development. Ensō is based on first-class structural descriptions, invertable transformations, generic operations and interpretation.
Structures in Ensō are a specialized kind of graph, whose nodes are either primitive data or collections of observable properties, whose values are either nodes or collections of nodes. From a programming language viewpoint this may seem an odd choice for data representation. However, it is essentially the Entity-Relationship (ER) model, also known as Information Models, which is widely used in the design of relational databases and is also the basis for Class Diagrams in the Unified Modeling Language (UML), which describe the structure of networks of objects. The key point is that structures in Ensō are viewed holistically as graphs, not as individual values or traditional sums-and-products data structures.
A structural description, or schema, specifies some of the observable properties of structures. Schemas are used to check the consistency structures. Some properties can be checked while the structure is being created, but other can only be checked once the structure is complete. Ensō allows modification of structures, which is necessary to create cyclic graphs, but also allows valid structures to be sealed to prevent further changes.
Invertible transformations are used to map one structure into another kind of structure, such that mapping can be inverted to (partially) recover the original structure from the result. One common kind of transformation is called a grammar, which is an invertible transformation between structures and text. Text grammars are invertable because they can be used for parsing and also rendering. Other transformations include Diagram grammars, which map structures into diagrams, were edits on the diagram are reflected in the original structure. Grammars that describe the structure and behavior of user interfaces are used to generate very natural applications. Transformations are also used for querying, template processing, and serialization.
Operations can be guided by schemas or grammars, allowing highly generic operations to be defined for comparison, differencing, merging, projecting and otherwise manipulating cyclic structures. These operations are cyclic maps, which correspond to coinductive transformations. Since schemas and transformations are also structures, they can be merged and transformed using these same generic operations. Transformations on schemas can be applied to instances, to support upgrade and change management of data. The resulting system supports powerful modularity constructs, including feature modules, inheritance, and mixins.
Ensō is based on interpretation rather than code generation. While it is possible to define transformations that generate code in conventional languages, this is rarely (if ever) done in Ensō. Instead all structural descriptions and transformations are interpreted dynamically. Although the current incarnation of Ensō does not include it, we have previously demonstrated that partial evaluation can be applied to Ensō-style interpreters to automatically generate efficient code.
Ensō is a theoretically sound and practical reformulation of the concepts of model-driven software development. Ensō is based on first-class structural descriptions, invertable transformations, generic operations and interpretation.
Structures in Ensō are a specialized kind of graph, whose nodes are either primitive data or collections of observable properties, whose values are either nodes or collections of nodes. From a programming language viewpoint this may seem an odd choice for data representation. However, it is essentially the Entity-Relationship (ER) model, also known as Information Models, which is widely used in the design of relational databases and is also the basis for Class Diagrams in the Unified Modeling Language (UML), which describe the structure of networks of objects. The key point is that structures in Ensō are viewed holistically as graphs, not as individual values or traditional sums-and-products data structures.
A structural description, or schema, specifies some of the observable properties of structures. Schemas are used to check the consistency structures. Some properties can be checked while the structure is being created, but other can only be checked once the structure is complete. Ensō allows modification of structures, which is necessary to create cyclic graphs, but also allows valid structures to be sealed to prevent further changes.
Invertible transformations are used to map one structure into another kind of structure, such that mapping can be inverted to (partially) recover the original structure from the result. One common kind of transformation is called a grammar, which is an invertible transformation between structures and text. Text grammars are invertable because they can be used for parsing and also rendering. Other transformations include Diagram grammars, which map structures into diagrams, were edits on the diagram are reflected in the original structure. Grammars that describe the structure and behavior of user interfaces are used to generate very natural applications. Transformations are also used for querying, template processing, and serialization.
Operations can be guided by schemas or grammars, allowing highly generic operations to be defined for comparison, differencing, merging, projecting and otherwise manipulating cyclic structures. These operations are cyclic maps, which correspond to coinductive transformations. Since schemas and transformations are also structures, they can be merged and transformed using these same generic operations. Transformations on schemas can be applied to instances, to support upgrade and change management of data. The resulting system supports powerful modularity constructs, including feature modules, inheritance, and mixins.
Ensō is based on interpretation rather than code generation. While it is possible to define transformations that generate code in conventional languages, this is rarely (if ever) done in Ensō. Instead all structural descriptions and transformations are interpreted dynamically. Although the current incarnation of Ensō does not include it, we have previously demonstrated that partial evaluation can be applied to Ensō-style interpreters to automatically generate efficient code.
Type Inheritance
I recently gave a lecture in my programming language class concluded by explaining that Java does not support inheritance of interfaces. It supports aggregation or composition, but not inheritance. This statement makes no sense unless you understand what inheritance is: consistent modification of self-referential structures. This is why "self" or "this" refers to the subclass in inherited methods. However, these types don't inherit:
The reason is that after applying the "next" operation, the resulting streams do not have "insert" operations. In other words, this program does not type-check.
In the literature on static typing of OO languages, what we need here are Self types. Some languages, including Eiffel and Scala, have self types.
Why would you care? Well, there are lots of cases where this kind of modification is useful. Its another concept that is difficult to think about in current languages. The OO example above is nice, but type inheritance also applies to algebraic types (See my "Data Abstraction Revisited" essay for a discussion of the differences). Consider this type, written using Haskell notation:
If we want to extend this to a new data type, for example, to include polygons and fill colors, we want to say something like this:
What this means is Diagram extended with new alternatives, such that the recursive references to Diagram are changed to ExtraDiagram. I've been suggesting a while that functional programming could benefit from inheritance. This is just another example.
interface Stream<T> {
T current();
Stream<T> next();
}
interface BiStream<T> extends Stream<T> {
void insert(T item);
}
The reason is that after applying the "next" operation, the resulting streams do not have "insert" operations. In other words, this program does not type-check.
void pushNext(BiStream<String> s) {
s.next().insert("test");
}
In the literature on static typing of OO languages, what we need here are Self types. Some languages, including Eiffel and Scala, have self types.
Why would you care? Well, there are lots of cases where this kind of modification is useful. Its another concept that is difficult to think about in current languages. The OO example above is nice, but type inheritance also applies to algebraic types (See my "Data Abstraction Revisited" essay for a discussion of the differences). Consider this type, written using Haskell notation:
data Diagram
= Rectangle Int Int Int Int
| Circle Int Int Int
| Composite [Diagram]
If we want to extend this to a new data type, for example, to include polygons and fill colors, we want to say something like this:
data ExtraDiagram inherit Diagram
with Polygon [(Int, Int)]
| Fill Color ExtraDiagram
What this means is Diagram extended with new alternatives, such that the recursive references to Diagram are changed to ExtraDiagram. I've been suggesting a while that functional programming could benefit from inheritance. This is just another example.
Expanded thoughts
A great language should let you think useful thoughts you have never been able to think before. This new project I'm working on, whose code name is "Smalltalk of Modeling", is expanding my mind.
Here is one idea. Lets generalize the idea of grammars. A grammar is normally used to parse text to create a parse tree. But if you annotate the grammar with constructors then the parse tree can be instantiated create arbitrary information structures, which we can view as the semantics of the text. If you are careful, then you will find that there can be quite a bit of divergence between the syntax and the semantics; for example, while syntax is tree-based, the semantics may be a graph. We can also run the process backwards, where we take some semantic object and render it into the grammar to create a parse tree, which can then be written out to text. This amounts to putting some clothes onto the abstract semantic structure.
Text --parse(Grammar)---> ParseTree --instantiate(Factory)--> Semantic Structure
Semantic Structure --render(Grammar)--> ParseTree --------display----------> Text
Note that parse and render, which both use the Grammar, are not inverses of each other. Instead, they both output ParseTrees, but they do it from two different directions.
Now the interesting thing is that a Grammar is really just a ParseTree with added structure: to allow for alternation, repetition, and variable data. We don't need to add sequencing, because ParseTree already has that. That is, we can view a grammar as
ParseTreeGrammarStructure = Kleene(ParseTreeStructure)
This is probably not too surprising. The interesting thing is that once we make structural descriptions be values, then we can write the Kleene function, which "grammarizes" any structure. (Its called Kleene after the person who first formalized the idea of alternation, repetition, and sequencing.)
My new language lets me think about, and write down, the Kleene function. And use it in more general ways. For example, I can now contemplate constructing
DiagramGrammarStrucutre = Kleene(DiagramStructure)
This creates a semantic structure for the grammar of diagrams. I can then use it with my generic render routine to create diagrams of any semantic structure:
Semantic Structure ---render(DiagramGrammar)---> Diagram
This brings up the question of parsing diagrams. It turns out that we will edit them rather than parse them, but it will be the topic for a future post.
Here is one idea. Lets generalize the idea of grammars. A grammar is normally used to parse text to create a parse tree. But if you annotate the grammar with constructors then the parse tree can be instantiated create arbitrary information structures, which we can view as the semantics of the text. If you are careful, then you will find that there can be quite a bit of divergence between the syntax and the semantics; for example, while syntax is tree-based, the semantics may be a graph. We can also run the process backwards, where we take some semantic object and render it into the grammar to create a parse tree, which can then be written out to text. This amounts to putting some clothes onto the abstract semantic structure.
Text --parse(Grammar)---> ParseTree --instantiate(Factory)--> Semantic Structure
Semantic Structure --render(Grammar)--> ParseTree --------display----------> Text
Note that parse and render, which both use the Grammar, are not inverses of each other. Instead, they both output ParseTrees, but they do it from two different directions.
Now the interesting thing is that a Grammar is really just a ParseTree with added structure: to allow for alternation, repetition, and variable data. We don't need to add sequencing, because ParseTree already has that. That is, we can view a grammar as
ParseTreeGrammarStructure = Kleene(ParseTreeStructure)
This is probably not too surprising. The interesting thing is that once we make structural descriptions be values, then we can write the Kleene function, which "grammarizes" any structure. (Its called Kleene after the person who first formalized the idea of alternation, repetition, and sequencing.)
My new language lets me think about, and write down, the Kleene function. And use it in more general ways. For example, I can now contemplate constructing
DiagramGrammarStrucutre = Kleene(DiagramStructure)
This creates a semantic structure for the grammar of diagrams. I can then use it with my generic render routine to create diagrams of any semantic structure:
Semantic Structure ---render(DiagramGrammar)---> Diagram
This brings up the question of parsing diagrams. It turns out that we will edit them rather than parse them, but it will be the topic for a future post.
Smalltalk and the future
I recently said that I'm working on creating the "Smalltalk of Modeling". To me, this means creating an elegant system that is practical but also embodies radical and powerful new ideas. It should also be implemented in itself and also capable of creating real applications.
But there are few things about Smalltalk that I hate. The first is the idea of an "image". I cannot stand it when my program is tangled with the IDE. It makes it difficult to collaborate, do version control, or deploy final products. I'm also not a big fan of read-eval-print loops (REPL). I like to write code in files, compile them into programs, and deploy them to customers. My IDE should help me write and debug my programs, but stay out of my way otherwise. Perhaps I did too much C programming when I was young. Some people love images, but I don't get it.
The second one is model-view-controller (MVC). I love MV, but I can do without the C. I have always suspected that it is an artifact of the Smalltalk runtime model, which was written right on the metal and did its own bit rendering and mouse tracking. I have to admit I'm not as experienced at GUIs as I am at other kinds of programming, but I've done a few. More modern toolkits don't have separate controllers. The controllers are built into the views (widgets). That is fine with me. Every time I see a library that emphasizes its adherence to MVC (with a capital C) I cringe. For example, does it really help web apps to modularize where to go next from a page?
Other than that, I love Smalltalk. On the other hand, I'm convinced that OO has run out of steam, and should be best used as the foundation for the next dominant paradigm. What is the next paradigm? Models. Yes, that's it! Not UML. Not MDA. Not EMF/QVT. Not Rails. These are good hints about what is to come, but they are still not complete. Hence my current project, to create a Smalltalk of Modeling.
But there are few things about Smalltalk that I hate. The first is the idea of an "image". I cannot stand it when my program is tangled with the IDE. It makes it difficult to collaborate, do version control, or deploy final products. I'm also not a big fan of read-eval-print loops (REPL). I like to write code in files, compile them into programs, and deploy them to customers. My IDE should help me write and debug my programs, but stay out of my way otherwise. Perhaps I did too much C programming when I was young. Some people love images, but I don't get it.
The second one is model-view-controller (MVC). I love MV, but I can do without the C. I have always suspected that it is an artifact of the Smalltalk runtime model, which was written right on the metal and did its own bit rendering and mouse tracking. I have to admit I'm not as experienced at GUIs as I am at other kinds of programming, but I've done a few. More modern toolkits don't have separate controllers. The controllers are built into the views (widgets). That is fine with me. Every time I see a library that emphasizes its adherence to MVC (with a capital C) I cringe. For example, does it really help web apps to modularize where to go next from a page?
Other than that, I love Smalltalk. On the other hand, I'm convinced that OO has run out of steam, and should be best used as the foundation for the next dominant paradigm. What is the next paradigm? Models. Yes, that's it! Not UML. Not MDA. Not EMF/QVT. Not Rails. These are good hints about what is to come, but they are still not complete. Hence my current project, to create a Smalltalk of Modeling.
Ruby GUI
I'm in Ruby GUI hell.
wxruby
I'm using ruby1.9 so I need the latest gem. But it is not compiled for 64bit architecture, so I have to build it from source. When I compile, I get an error:
Update: after much pain I got wxruby to run. Here is the post that has the clue. The basic summary is:
1) configure ruby with " --with-arch=x86_64,i386" to get both architectures
2) use Gem to install wxruby-ruby19-2.0.1-x86-darwin-9.gem
3) when you run ruby, you have to say "arch -i386 ruby ...
As I said, I hate OSX every time I have to install software.
shoes
You have to run the shoes application. Its not a library, so I don't think I can use it. But I might have to try.
bowline
Looks interesting, and things seem to work until I try to launch the application:
I guess I might have to try it...
And all the others... I haven't heard much good about them.
Its at times like this that I pine for my Windows box. It may be ugly, but there is a much better change that things will "just work" than on Un*x or OSX.
wxruby
I'm using ruby1.9 so I need the latest gem. But it is not compiled for 64bit architecture, so I have to build it from source. When I compile, I get an error:
SWIG version 1.3.31 is installed, minimum version required is 1.3.37.But when I used MacPorts to try to upgrade SWIG, it only lets me install version 1.3.31 or 2.0.3_0. But wxruby requires a version between 1.3.32 and 1.3.37 because "SWIG 1.3.39 introduced changes incompatible with swig/fixmodule.rb".
Update: after much pain I got wxruby to run. Here is the post that has the clue. The basic summary is:
1) configure ruby with " --with-arch=x86_64,i386" to get both architectures
2) use Gem to install wxruby-ruby19-2.0.1-x86-darwin-9.gem
3) when you run ruby, you have to say "arch -i386 ruby ...
As I said, I hate OSX every time I have to install software.
shoes
You have to run the shoes application. Its not a library, so I don't think I can use it. But I might have to try.
bowline
Looks interesting, and things seem to work until I try to launch the application:
no such file to load -- rubygemsJRuby/swing
./bowline_twitter/config/boot.rb:3:in `require'
./bowline_twitter/config/boot.rb:3:in `'
./bowline_twitter/script/init:2:in `require'
./bowline_twitter/script/init:2:in `'
I guess I might have to try it...
And all the others... I haven't heard much good about them.
Its at times like this that I pine for my Windows box. It may be ugly, but there is a much better change that things will "just work" than on Un*x or OSX.