Tuesday, March 23, 2010

Posts I wrote on the Microsoft Israel MCS blog (in Hebrew)

First post about what should be avoided when considering the use of design patterns: http://blogs.microsoft.co.il/blogs/mcs/archive/2009/11/04/426203.aspx

Second post about NoSQL (not-only SQL):
http://blogs.microsoft.co.il/blogs/mcs/archive/2010/03/17/nosql-not-only-sql.aspx

Unfortunately they copied the text from a Word document directly into the blog editor, which led to some rather funny mistakes in the product listing and categorization section. The real mapping is as follows:

Azure table storage -> Microsoft

MemcacheDB

Velocity -> Microsoft

Cassandra -> Facebook and Digg

Dynamo -> Amazon

Tokyo Tyrant

Berkeley DB

If you want to see how it really works, check out this blog post from a very talented person I know :)
http://drorbr.blogspot.com/2010/02/migrating-springhibernate-application.html
The usage example there is in Java but it really doesn't matter…

Wednesday, March 10, 2010

Playing with MongoDB in C#

I took an hour to start playing with MongoDB using C#. It's really easy to get it up and running, practically a matter of minutes on a single-machine configuration. Later on I installed the official .NET driver from GitHub and wrote some code.

My conclusions so far:
1. It's fast. Much faster than a relational DB on the same machine with the same data.
2. For CRUD operations it's easy to use (I haven't yet taken the time to check out the map-reduce implementation that comes with the driver).

When I tried to bulk insert more than 250,000 items I got an error from the server saying:
"Wed Mar 10 21:37:34 bad recv() len: 53888928
Wed Mar 10 21:37:34 end connection 127.0.0.1:2795"
I opened a Jira issue for this, since I couldn't find anything about it on Google, and got the following response:
"messages can't be more than 4MB (plus a little wiggle room for header)
when doing bulk inserts, need to do in batches of 4mb"
Great work by the Jira team of this project: http://jira.mongodb.org/browse/CSHARP-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
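The batching advice in that response can be sketched roughly as follows. This is only an illustration, not the driver's own mechanism: the BulkBatcher class, the headroom left for headers, and the use of the UTF-8 length of the JSON text as a size estimate are all my assumptions, and the actual insert call is left out on purpose.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BulkBatcher
{
    // Assumed budget a little below the 4 MB limit quoted above;
    // the amount of headroom needed for the header is a guess.
    public const int MaxBatchBytes = 4 * 1024 * 1024 - 16 * 1024;

    // Splits documents into batches whose estimated total size stays within maxBytes.
    // The UTF-8 byte count of the JSON text is a rough stand-in for the wire size.
    public static List<List<string>> Split(IEnumerable<string> jsonDocs, int maxBytes)
    {
        var batches = new List<List<string>>();
        var current = new List<string>();
        int currentBytes = 0;

        foreach (string doc in jsonDocs)
        {
            int size = Encoding.UTF8.GetByteCount(doc);
            if (size > maxBytes)
                throw new ArgumentException("A single document exceeds the batch budget.");

            // Close the current batch when this document would overflow the budget.
            if (current.Count > 0 && currentBytes + size > maxBytes)
            {
                batches.Add(current);
                current = new List<string>();
                currentBytes = 0;
            }

            current.Add(doc);
            currentBytes += size;
        }

        if (current.Count > 0)
            batches.Add(current);

        return batches;
    }
}
```

Each batch returned by BulkBatcher.Split(docs, BulkBatcher.MaxBatchBytes) would then be handed to the driver's bulk insert, one call per batch.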




These are the results for my machine using a single-threaded approach with one open connection (Intel Core 2 Duo, 2.66 GHz, 3.25 GB RAM, Windows XP SP2, standard single HD):



Time (ms) to create 1 jsons in memory: 0
Time (ms) to insert 1 jsons to mongo - one by one: 0
Time (ms) to insert 1 jsons to mongo - bulk operation: 0
Time (ms) to insert 1 jsons to mongo - bulk operation (safe mode): 41
Time (ms) to read 1 records: 0
***
Time (ms) to create 10 jsons in memory: 0
Time (ms) to insert 10 jsons to mongo - one by one: 2
Time (ms) to insert 10 jsons to mongo - bulk operation: 0
Time (ms) to insert 10 jsons to mongo - bulk operation (safe mode): 31
Time (ms) to read 10 records: 0
***
Time (ms) to create 100 jsons in memory: 3
Time (ms) to insert 100 jsons to mongo - one by one: 19
Time (ms) to insert 100 jsons to mongo - bulk operation: 0
Time (ms) to insert 100 jsons to mongo - bulk operation (safe mode): 42
Time (ms) to read 100 records: 4
***
Time (ms) to create 1000 jsons in memory: 22
Time (ms) to insert 1000 jsons to mongo - one by one: 197
Time (ms) to insert 1000 jsons to mongo - bulk operation: 4
Time (ms) to insert 1000 jsons to mongo - bulk operation (safe mode): 17
Time (ms) to read 1000 records: 29
***
Time (ms) to create 10000 jsons in memory: 233
Time (ms) to insert 10000 jsons to mongo - bulk operation: 58
Time (ms) to insert 10000 jsons to mongo - bulk operation (safe mode): 175
Time (ms) to read 10000 records: 247
***
Time (ms) to create 100000 jsons in memory: 2294
Time (ms) to insert 100000 jsons to mongo - bulk operation: 693
Time (ms) to insert 100000 jsons to mongo - bulk operation (safe mode): 1410
Time (ms) to read 100000 records: 2360
***
Time (ms) to create 1000000 jsons in memory: 23328
Wed Mar 10 21:37:34 bad recv() len: 53888928
Wed Mar 10 21:37:34 end connection 127.0.0.1:2795


As you can see, on this very simple configuration the time grows close to linearly with the number of items for all types of tested operations.
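For anyone who wants to take similar measurements, here is a minimal timing sketch. It is not the code that produced the numbers above: the Timing class and its method names are hypothetical, and only the in-memory creation step is shown, since the driver calls would be guesses on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time in milliseconds.
    public static long MeasureMs(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Builds n small JSON strings in memory (illustrative payload only).
    public static List<string> CreateJsons(int n)
    {
        var docs = new List<string>(n);
        for (int i = 0; i < n; i++)
            docs.Add("{ \"id\": " + i + " }");
        return docs;
    }

    // Prints one line per item count, from 1 up to maxCount in powers of ten.
    public static void Report(int maxCount)
    {
        for (int count = 1; count <= maxCount; count *= 10)
        {
            int n = count; // local copy captured by the lambda
            long ms = MeasureMs(() => CreateJsons(n));
            Console.WriteLine("Time (ms) to create {0} jsons in memory: {1}", n, ms);
        }
    }
}
```

Calling Timing.Report(100000) prints one line per power of ten. A single run per size is noisy, so repeating each measurement a few times gives a fairer picture.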

Monday, March 8, 2010

Performance snippet – arrays in C# (for x86 compilers)

Here are some rules for working with arrays in .NET when performance is critical. Some require nothing from you beyond knowing about them, with no special implementation implications; others require decisions that deserve more attention (such as using unsafe code or even unmanaged code).

It's better to use jagged arrays (arrays of arrays) rather than multidimensional arrays
Within the CLR there is an optimization for loops whose termination condition is checked against the Length property of the collection/array. However, this is not implemented for multidimensional arrays. So when the need for a multidimensional array arises, you get better performance by using a jagged array instead.
Example of a jagged array:
int[][] arrJagged = new int[][] { new int[] { 1, 2, 3, 4, 5 }, new int[] { 2, 3, 4, 5, 6 } };
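To make the difference concrete, here is a small sketch that walks the same data in both layouts. The ArraySums class and its method names are made up for illustration, and the sketch only shows the loop shapes; it does not measure anything.

```csharp
using System;

public static class ArraySums
{
    // Jagged array: every row is an ordinary one-dimensional array, so the inner
    // loop is bounded by row.Length, the pattern the optimization above targets.
    public static long SumJagged(int[][] data)
    {
        long total = 0;
        for (int r = 0; r < data.Length; r++)
        {
            int[] row = data[r];
            for (int c = 0; c < row.Length; c++)
                total += row[c];
        }
        return total;
    }

    // Multidimensional array: the bounds come from GetLength rather than Length.
    public static long SumRectangular(int[,] data)
    {
        long total = 0;
        for (int r = 0; r < data.GetLength(0); r++)
            for (int c = 0; c < data.GetLength(1); c++)
                total += data[r, c];
        return total;
    }
}
```

Both methods return the same total for the same values; only the way the loop bounds are expressed differs.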

Always use ascending loops
Well, it's a better coding practice anyway. Not only is your loop easier to read and understand, it also enjoys better performance.
In a regular ascending loop the compiler checks statically that we stay within the boundaries of the array, which removes the need for further checks at runtime. But this feature is not implemented for descending loops, which means that for each access by index to the array a dynamic range check is performed at runtime, reducing code speed…
But pay attention. If you write this code:

private double[] GenerateAndPopulate(int iSize)
{
   double[] arr = new double[iSize];
   // The loop bound here is the iSize parameter, not arr.Length
   for (int i = 0; i < iSize; i++)
   {
      arr[i] = i;
   }
   return arr;
}

You would not benefit from the elimination of dynamic bounds checks, since the loop condition tests iSize rather than the array's own length. The right way (in terms of performance) is to use arr.Length
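For comparison, a version of the same method with the loop bounded by arr.Length might look like this. The ArrayFill wrapper class is added only so the snippet compiles on its own, and whether the check is actually removed depends on the runtime you use.

```csharp
using System;

public static class ArrayFill
{
    public static double[] GenerateAndPopulate(int size)
    {
        double[] arr = new double[size];
        // The loop is bounded by arr.Length itself, the form the
        // bounds-check optimization described above is said to recognize.
        for (int i = 0; i < arr.Length; i++)
        {
            arr[i] = i;
        }
        return arr;
    }
}
```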

To be continued soon….