Working with re2c, lessons learned

I've completed the parser for BIND 9 style config files. The parser itself is based on the lemon parser generator, and the lexer is based on re2c. While developing and testing the parser, I ran into some strange, undocumented issues.

  • Error checking is practically absent. Bad input results in bad runtime behaviour instead of errors.

  • Make sure that every condition handles all possible input. If input is encountered in a condition that has no matching rule, the generated code jumps to a (semi) random rule. Again, this doesn't result in an error.

  • RTFM about which regular expressions re2c supports. There is no error checking whatsoever.

  • Rules with a wildcard condition (<*> <regexp>) are considered in every condition, but a rule in a specific condition that matches the same input overrides the wildcard rule. This is important if a condition has a match-all rule such as [^], since that rule will take precedence over the wildcard rule.
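
To make the last two points concrete, here is a rough sketch of a rules block. It is only a fragment: the condition names (init, block), the TOK_* constants and the next_token label are made up for illustration, and the usual YYCTYPE/YYCURSOR and condition definitions are left out. It assumes re2c is run with conditions enabled (the -c flag) and that precedence works as described above.

```re2c
/*!re2c
    // Wildcard rule: whitespace is skipped in every condition.
    <*> [ \t\r\n]+    { goto next_token; }

    // Catch-all, also written as a wildcard rule. This way every
    // condition handles all input, and no condition-specific [^]
    // rule can override the whitespace rule above.
    <*> [^]           { return TOK_ERROR; }

    // Condition-specific rules.
    <init>  [a-z]+    { return TOK_WORD; }
    <init>  "{"       => block { return TOK_LBRACE; }
    <block> ";"       { return TOK_SEMICOLON; }
    <block> "}"       => init  { return TOK_RBRACE; }
*/
```

Had the catch-all been written as <init, block> [^] instead, a single whitespace character would match both it and the whitespace rule, and the condition-specific rule would presumably win, which is exactly the pitfall from the last point.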

If you keep the undocumented points above in mind, re2c makes writing lexers as easy as it gets.