Flex (not Adobe’s)

October 8, 2008

Disclaimer: This post is intended as a reminder for myself.

 

I need to tokenize (and parse) a language that has a very simple grammar. Python currently does that job beautifully, but at some point a C or C++ implementation may be better, for the obvious performance reasons.

What open source tool can I use? This need must have been addressed a long time ago.

I’m quite sure that the parser will have to be home-grown. Meanwhile, Flex should be able to cover the tokenizer need.
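To remind myself what the tokenizer half looks like, here is a minimal Python sketch of a rule-table tokenizer. It is not Flex and not my actual code: the toy language, the rule names (NUMBER, IDENT, OP, SKIP) and the function name tokenize are all made up for illustration. The idea it borrows from a Flex spec is pairing each pattern with a token kind. One difference to keep in mind: Flex picks the longest match and breaks ties by rule order, while Python's regex alternation simply takes the first alternative that matches.

```python
import re

# Hypothetical token rules for a toy language (assumption, not the real grammar).
# Each entry pairs a token kind with a regular expression, much like a rule
# in a Flex specification pairs a pattern with an action.
TOKEN_RULES = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/()]"),
    ("SKIP", r"\s+"),
]

# Combine all rules into one pattern with a named group per token kind.
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES)
)


def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    position = 0
    while position < len(text):
        match = MASTER_PATTERN.match(text, position)
        if match is None:
            raise ValueError(
                f"unexpected character {text[position]!r} at offset {position}"
            )
        # lastgroup is the name of the rule that matched.
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Calling tokenize("x = 42 + y1") yields five tokens: an IDENT, an OP, a NUMBER, an OP and another IDENT. A C version generated by Flex would do the same job from a rules file instead of a Python list.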

Side Note:

  • Unsurprisingly, the AWS SimpleDB syntax is described in Backus-Naur form. I’m glad to finally know the formal name for this kind of grammar notation.

Some word definitions:

  • Lexical Analyzer – This is what most programmers would call a tokenizer. It transforms its input text into a list of tokens.
  • Parser – Takes in the list of tokens, checks it against the grammar, then generates output (typically a tree).
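The split between the two definitions above can be sketched in Python. This is a hypothetical home-grown parser for a toy arithmetic grammar, not the language I actually need to handle: the grammar, the function name parse and the (kind, lexeme) tuple format are assumptions for illustration. It takes a token list as a lexical analyzer would produce it and returns a nested-tuple tree.

```python
# Toy grammar assumed for this sketch, written in Backus-Naur form:
#   <expr>   ::= <term>   | <expr> "+" <term>   | <expr> "-" <term>
#   <term>   ::= <factor> | <term> "*" <factor> | <term> "/" <factor>
#   <factor> ::= NUMBER   | IDENT  | "(" <expr> ")"
# The left recursion is handled with loops, as is usual in recursive descent.


def parse(tokens):
    """Parse a list of (kind, lexeme) tuples into a nested-tuple tree."""
    position = 0

    def peek():
        # Report a synthetic EOF token once the input is exhausted.
        return tokens[position] if position < len(tokens) else ("EOF", "")

    def advance():
        nonlocal position
        token = peek()
        position += 1
        return token

    def parse_expr():
        node = parse_term()
        while peek() in (("OP", "+"), ("OP", "-")):
            operator = advance()[1]
            node = (operator, node, parse_term())
        return node

    def parse_term():
        node = parse_factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            operator = advance()[1]
            node = (operator, node, parse_factor())
        return node

    def parse_factor():
        kind, lexeme = advance()
        if kind == "NUMBER":
            return int(lexeme)
        if kind == "IDENT":
            return lexeme
        if (kind, lexeme) == ("OP", "("):
            node = parse_expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")

    tree = parse_expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected trailing token {peek()[1]!r}")
    return tree
```

Given the tokens for 1 + 2 * x, the result nests the multiplication under the addition, so operator precedence falls out of the grammar rather than needing special handling.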

References:

  • Flex – A general-purpose lexical analyzer generator. It reads a file of patterns and emits a tokenizer (scanner) in C.
  • Backus-Naur form – A common notation for describing the grammar of programming languages.

References that have nothing to do with what I need:

  • JS/CC – A parser generator written in JavaScript.
  • Quex – A generator of C++ lexical analyzers.
You are currently reading Flex (not Adobe’s) at RAPD.