Flex (not Adobe’s)
October 8, 2008
Disclaimer: This post is intended as a reminder for myself.
I need to tokenize (and parse) a language that has a very simple grammar. That job is currently done beautifully in Python. But at some point, a C or C++ implementation may be better (for the obvious performance reasons).
What open source tool can I use? This need must have been addressed a long time ago.
I’m quite sure that the parser will have to be home-grown. Meanwhile, Flex should be able to fulfill the tokenizer need.
- Unsurprisingly, AWS SimpleDB syntax is described in Backus-Naur form. I’m glad to finally know the formal name for this kind of grammar notation.
Some word definitions:
- Lexical Analyzer – This is what most programmers would call a tokenizer. It transforms its input text into a list of tokens.
- Parser – Takes in the list of tokens, checks it against the grammar, then generates output (often a tree).
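- Flex – A general-purpose generator of lexical analyzers. Given a file of regular-expression rules, it generates the tokenizer as C source code.
- Backus-Naur form – A common notation for describing the grammar of programming languages.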
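To remind myself how these pieces fit together, here is a rough Python sketch of a lexical analyzer feeding a parser. The toy language, the token names (STRING, IDENT, OP) and the functions tokenize and parse_comparison are all made up for illustration. This is not my actual grammar, and it is not what Flex generates. It only mirrors the idea of a rule table of regular expressions that a Flex specification is built around.

```python
import re

# Hypothetical token rules for a toy language (illustrative only).
# Each rule is (token name, regular expression), tried in this order.
TOKEN_SPEC = [
    ("STRING", r"'[^']*'"),                  # single-quoted literal
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),    # attribute name
    ("OP", r"=|!=|<|>"),                     # comparison operator
    ("SKIP", r"\s+"),                        # whitespace, discarded
]

# Combine the rules into one pattern with a named group per token type.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Lexical analyzer: turn input text into a list of (type, value) tokens."""
    pos = 0
    tokens = []
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_comparison(tokens):
    """Parser for a single BNF-style rule:

        <comparison> ::= IDENT OP STRING
    """
    kinds = [kind for kind, _ in tokens]
    if kinds != ["IDENT", "OP", "STRING"]:
        raise SyntaxError(f"expected IDENT OP STRING, got {kinds}")
    (_, name), (_, op), (_, literal) = tokens
    # Strip the surrounding quotes from the string literal.
    return {"attribute": name, "op": op, "value": literal[1:-1]}


if __name__ == "__main__":
    toks = tokenize("color = 'blue'")
    print(toks)
    print(parse_comparison(toks))
```

The tokenizer knows nothing about which token orders are valid. That check lives entirely in the parser, which is why the two could be swapped out independently (say, a Flex-generated tokenizer with a home-grown parser).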
References that have nothing to do with what I need: