Thursday, September 27, 2012

MongoDB tips from "Lessons Learned from Migrating 2+ Billion Documents at Craigslist"


You should really listen to this talk "Lessons Learned from Migrating 2+ Billion Documents at Craigslist" by Jeremy Zawodny.
However, if you don't have 30 minutes to spare, these are the main items:

1. Pay attention to encoding. MongoDB uses UTF-8, so you'll need to convert your data first if it contains a mix of encodings.
2. There's a document size limit (it differs from version to version), so if some of your documents are too big you should plan how to handle them. Otherwise the load will fail when you try to insert them into MongoDB.
3. Pay attention to data types (don't store everything as a string) - otherwise you'll have trouble when querying. This is especially tricky when using dynamically typed programming languages. Also make sure that the driver you use is not silently inferring your data types for you.
4. Sharding - when you first load the data you can stop the internal load balancer (to reduce I/O), and you can also pre-split the data.
5. Consider using a file system that supports compression if you store lots of text.
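
To make items 1 and 3 a bit more concrete, here is a minimal Python sketch of cleaning one record before it is handed to a driver. It is not taken from the talk: the fallback encoding order, the field names (title, price, posted_at) and the helper names are all assumptions made up for illustration, so adjust them to whatever your legacy data actually looks like.

```python
from datetime import datetime, timezone

# Assumed fallback order (not from the talk): try UTF-8 first, then cp1252.
# latin-1 maps every possible byte, so it always succeeds as a last resort.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def to_text(raw: bytes) -> str:
    """Decode raw bytes to str, trying each candidate encoding in order."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Not reachable while latin-1 is in the list; kept as a guard.
    return raw.decode("utf-8", errors="replace")


def to_typed_document(row: dict) -> dict:
    """Convert a hypothetical legacy row (bytes and strings) into typed values."""
    return {
        # Text becomes a proper str, which a driver can encode as UTF-8.
        "title": to_text(row["title"]),
        # Numbers become int instead of staying strings like "1200".
        "price": int(row["price"]),
        # Unix timestamps become timezone-aware datetime objects.
        "posted_at": datetime.fromtimestamp(int(row["posted_at"]), tz=timezone.utc),
    }


legacy_row = {
    "title": "caf\xe9 table".encode("cp1252"),  # not valid UTF-8
    "price": "1200",
    "posted_at": "1348704000",
}
doc = to_typed_document(legacy_row)
print(doc)
```

With explicit conversions like these, a range query on price compares numbers rather than strings, and what gets stored no longer depends on how the driver guesses at your values.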

And finally - join the mailing list. It has tons of information that you will find very helpful.