Hadoop: Open Source Map Reduce
December 29, 2007 § 2 Comments
What is Map:
In Python, map() applies a given function to each element of a list and returns a list of the results. (In Python 3, map() returns a lazy iterator instead; wrap it in list() to get a list.)
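Here is a minimal example of map(), written for Python 3 (hence the list() call). The square helper is just an illustrative name:

```python
def square(x):
    """Illustrative helper: the function map() applies to each element."""
    return x * x

numbers = [1, 2, 3, 4]

# map() calls square() once per element; list() materializes the results.
squares = list(map(square, numbers))
print(squares)  # [1, 4, 9, 16]
```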
What is Reduce:
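Superficially, reduce() looks similar because it also takes a function and a list. The difference is that the function takes two arguments and is applied cumulatively, left to right, to combine all the elements into a single value. Reduce returns one item rather than a list. (In Python 3, reduce() moved to the functools module.)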
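A short Python 3 sketch showing how reduce() folds a list into one value, with and without an initializer:

```python
from functools import reduce  # import required in Python 3
import operator

numbers = [1, 2, 3, 4]

# The two-argument function is applied cumulatively: ((1 + 2) + 3) + 4
total = reduce(operator.add, numbers)
print(total)  # 10

# The optional initializer becomes the first left-hand argument,
# and it is also what reduce() returns for an empty list.
product = reduce(operator.mul, numbers, 1)
print(product)  # 24
```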
What is MapReduce:
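MapReduce is a programming model, described in Google's MapReduce paper, for processing large data sets in parallel across a distributed cluster. What makes it special is that the map and reduce functions work on key-value pairs. Map turns each input record into intermediate key-value pairs, the framework groups those pairs by key, and reduce combines all the values for each key. Because each map call and each per-key reduce call is independent, the work can be spread across many commodity servers.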
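To make the key-value flow concrete, here is a minimal single-process sketch of the model using word counting, the running example in Google's paper. The names map_fn, reduce_fn and map_reduce are hypothetical. Nothing here is distributed; the sketch only shows the map, group-by-key ("shuffle") and reduce steps in order:

```python
from collections import defaultdict

def map_fn(document):
    """Map: emit an intermediate (word, 1) pair for every word in a record."""
    for word in document.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key into a single result."""
    return word, sum(counts)

def map_reduce(documents, mapper, reducer):
    intermediate = defaultdict(list)
    # Map phase: each input record is processed independently.
    for doc in documents:
        for key, value in mapper(doc):
            # Shuffle: group intermediate values by their key.
            intermediate[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in intermediate.items())

docs = ["the quick brown fox", "the lazy dog", "The fox"]
counts = map_reduce(docs, map_fn, reduce_fn)
print(counts["the"], counts["fox"])  # 3 2
```

Because no map call depends on another, and no reduce call depends on another, each phase can in principle be split across machines.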
What is Hadoop:
Hadoop is an open source implementation of MapReduce, written in Java.
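Hadoop itself is written in Java, but its Streaming utility can run any executable as the mapper or reducer. That program reads input lines on stdin and writes tab-separated key/value lines to stdout, and Hadoop sorts the mapper output by key before the reducer sees it. Below is a rough word-count sketch of such a Python script. The file name wordcount.py and its "map"/"reduce" command-line convention are illustrative assumptions, not part of Hadoop:

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word of input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum counts per word. Assumes the input is already sorted by key,
    which Hadoop's shuffle/sort step provides to streaming reducers."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"

# Hypothetical CLI: "python wordcount.py map" or "python wordcount.py reduce".
if __name__ == "__main__" and len(sys.argv) == 2:
    stage = {"map": mapper, "reduce": reducer}.get(sys.argv[1])
    if stage is not None:
        for out in stage(sys.stdin):
            print(out)
```

You can simulate the job locally with a shell pipeline, using sort in place of Hadoop's shuffle: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.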
Python already has map and reduce functions, so what’s left is to figure out the distributed aspect of MapReduce. Thomas B Hickey and Michael G Noll, whose blogs are listed below, have already explored MapReduce implementations in Python.
Resources:
- Google MapReduce Paper
- Hadoop
- Thomas B Hickey blog
- Nutch – the Java web-search project that Hadoop grew out of
- Michael G Noll blog
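The distributed part of MapReduce comes down to two things. The input is divided into splits, one per map task, and the intermediate keys are partitioned among R reduce tasks. Google's paper describes hash(key) mod R as the default partitioner. The sketch below runs every task in one process, so there is no real networking. The function names are hypothetical, and CRC32 is my choice of stable hash:

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    """Choose a reducer for a key. CRC32 is used because Python's built-in
    hash() of str is randomized per process, so separate workers would
    disagree on where a key belongs."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_map_task(split, num_reducers):
    """One map task: process one input split and bucket its (word, 1)
    pairs by destination reducer."""
    buckets = defaultdict(list)
    for line in split:
        for word in line.lower().split():
            buckets[partition(word, num_reducers)].append((word, 1))
    return buckets

def run_reduce_task(pairs):
    """One reduce task: sum the counts for every key assigned to it."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def distributed_word_count(splits, num_reducers=2):
    # On a cluster each map task would run on its own machine;
    # here they simply run one after another.
    map_outputs = [run_map_task(split, num_reducers) for split in splits]
    results = []
    for r in range(num_reducers):
        # Each reducer fetches its own bucket from every map task's output.
        pairs = [p for out in map_outputs for p in out.get(r, [])]
        results.append(run_reduce_task(pairs))
    return results

parts = distributed_word_count([["the quick fox"], ["the lazy dog", "the fox"]], 3)
```

Each key lands in exactly one reducer's output, so the reducers never need to talk to each other. Merging their results gives the final counts.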
Interesting article. Pretty basic info; if you want to see an implementation that does not require a Java install, try: http://skynet.rubyforge.org/
Yes, it’s Ruby, but it’s the closest thing to Python you are going to find.
@Enger:
Thanks for the info about Skynet. I’ll definitely take a look.