Generic Syntax: Lisp parsing + C notation

I've been working with Jose Falcon, an undergrad here at UT, for almost a year on this crazy idea for an approach to syntax that combines aspects of Lisp S-Expressions and familiar Algol/C/Java notations. The idea is to use a generic syntax, as in Lisp, but extend it to include many of the common syntactic conventions found in programming languages, grammars, and style sheets. Lisp only recognizes these characters: "(.,')" and space. Our language, called Gel, recognizes {}, [], (), and arbitrary unary and infix operators. So you can write
{ a + (x ** 3) ==> x | y.z; x := 37; if: a=3 then: print(3, f[x]); }
and have it parse just as you would expect. We tag keywords with a ":", because they have to be generic too. To get operators to work right, we make spaces meaningful. Thus:
a +b == a(+b)
a+ b == (a+)b
a + b == (a)+(b)
This corresponds to common usage in Java/C and also in most grammar notations. For example,
E ::= E | ("+" E)*
also parses correctly in Gel. It's basically a "super-lexer", just as Lisp is. We will get the source code up soon so you can check it out. Here is the paper. The work will be presented at the IFIP Working Conference on Domain Specific Languages (DSL WC).
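
To make the spacing rule above concrete, here is a minimal Python sketch. It is not Gel's implementation: the function name `classify_operators`, the token pattern, and the choice to treat an operator with no space on either side (as in `a=3`) as infix are all assumptions made for illustration. It only labels each operator by the whitespace around it and does not build a tree.

```python
import re

# Hypothetical token pattern (an assumption, not Gel's real lexer):
# either a run of identifier/number characters or a run of operator characters.
TOKEN = re.compile(r"[A-Za-z_0-9]+|[+\-*/=<>|:!&]+")


def classify_operators(text):
    """Return (token, role) pairs; role is 'operand', 'prefix', 'postfix', or 'infix'."""
    result = []
    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok[0].isalnum() or tok[0] == "_":
            result.append((tok, "operand"))
            continue
        # The start and end of the input count as whitespace.
        space_before = m.start() == 0 or text[m.start() - 1].isspace()
        space_after = m.end() == len(text) or text[m.end()].isspace()
        if space_before and not space_after:
            role = "prefix"    # "a +b": the operator binds to what follows
        elif space_after and not space_before:
            role = "postfix"   # "a+ b": the operator binds to what precedes
        else:
            role = "infix"     # "a + b", or (by assumption) "a+b"
        result.append((tok, role))
    return result


for example in ["a +b", "a+ b", "a + b"]:
    print(example, "->", classify_operators(example))
```

The three examples mirror the three cases listed above: the same `+` comes out as prefix, postfix, or infix depending only on where the spaces fall.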


rgrig said...

Why would I use Gel instead of flex/bison or antlr?

William said...

Gel is good if you want to parse rich syntax but don't want to worry about it too much. Also, if you have multiple different sub-languages, Gel is useful.

You still need a tree grammar to validate the resulting Gel AST. There are several ways to do this. One is to output XML and then validate with an XML schema. Another is to write a tree grammar over the Gel AST. You could use ANTLR for that, but it would be a much simpler grammar than a full syntax grammar.

Gel is not a complete replacement for flex/bison yet. We are still working on XML output, and we are also building our own tree grammar format. But you can use Gel as a high-power structural lexer now.

rgrig said...

I see. I'll play with it next week.