Sunday, October 27, 2013

Hadoop configuration - permissions confusion

I'll start from the end - HDFS permissions are managed by hadoop and are not reflected (even on creation) from the underlying file system.

To make a long story short - I installed hadoop 2.2.0 manually a few days ago on Ubuntu 12.04 LTS. Once finished I ran ./start-all.sh and everything seemed to be working. But when I stopped the services I got a message saying that the job-tracker is not running and therefore will not be stopped. The log file for the job-tracker said it did not have the permissions it needed.

In the job-tracker log I got the following error:
FATAL org.apache.hadoop.mapred.JobTracker: 
org.apache.hadoop.security.AccessControlException: 
The systemdir hdfs://localhost:54310/home/hduser/tmp/mapred/system 
is not owned by hduser
Googling it led me to tons of people having the same problem or variants of it, but none of the solutions worked. Some of them were far-fetched (re-install everything) but most of them focused on "mapred-site.xml" and file system permissions. Since my config files were OK I played with the permissions and ownership of the "tmp/mapred" folder and restarted things over and over again... to no avail. After too much time bashing my head against an imaginary wall I noticed the little "hdfs://" at the beginning of the path. Wish I had noticed it 2 hours earlier.
The solution was easy. Instead of using regular bash commands I needed to use the hadoop versions of them. So with "hadoop fs -chmod" and "hadoop fs -chown" I solved the issue in 10 seconds ("hadoop fs -chown -R hduser /home/hduser/tmp/" and "hadoop fs -chmod -R 755 /home/hduser/tmp/").
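
If you want to script the fix, here is a minimal Python sketch. It only builds the two hadoop commands from above and prints them; the function names are my own, and it assumes the "hadoop" binary is on the PATH if you switch off the dry run.

```python
import subprocess


def hdfs_fix_commands(path, owner, mode="755"):
    """Build the HDFS (not local file system) ownership/permission commands."""
    return [
        ["hadoop", "fs", "-chown", "-R", owner, path],
        ["hadoop", "fs", "-chmod", "-R", mode, path],
    ]


def apply_commands(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Assumption: the hadoop CLI is installed and on the PATH.
            subprocess.check_call(cmd)


commands = hdfs_fix_commands("/home/hduser/tmp/", "hduser")
apply_commands(commands)  # dry run: only prints the two commands
```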

Thursday, September 27, 2012

MongoDB tips from "Lessons Learned from Migrating 2+ Billion Documents at Craigslist"


You should really listen to this talk "Lessons Learned from Migrating 2+ Billion Documents at Craigslist" by Jeremy Zawodny.
However if you don't have 30 minutes to spare these are the main items:

1. Pay attention to encoding. MongoDB uses UTF-8, so you'll need to process your data if it comes in all sorts of encodings.
2. There's a document size limit (it differs from version to version), so if some of your documents are too big you should plan how to avoid this problem. Otherwise the load will fail when you try to put them into MongoDB.
3. Pay attention to data types (don't put everything in as a string) - otherwise you'll have trouble when querying. This is especially tricky when using dynamically typed programming languages. Also make sure that the driver you use is not inferring your data types.
4. Sharding - when you first load the data you can stop the internal load balancer (to reduce IO) and you can also pre-split the data in advance.
5. Consider using a file system that supports compression if you store lots of text.

And finally - join the mailing list. It has tons of information that would be very helpful.
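
Items 1-3 can be sketched as a small pre-load cleanup step. This is only a rough Python illustration, not something from the talk: the field names, the fallback encoding and the 16 MB limit are my assumptions (check the limit for your MongoDB version), and the size is estimated from JSON rather than real BSON.

```python
import json

# Assumption: 16 MB limit; the real limit depends on the MongoDB version.
MAX_DOC_BYTES = 16 * 1024 * 1024


def to_text(raw, encodings=("utf-8", "latin-1")):
    """Decode bytes by trying each encoding in turn (item 1)."""
    if isinstance(raw, str):
        return raw
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode value")


def prepare(record, int_fields=()):
    """Return a cleaned copy of the record, ready to be loaded."""
    doc = {}
    for key, value in record.items():
        if isinstance(value, bytes):
            value = to_text(value)
        if key in int_fields:
            value = int(value)  # item 3: store numbers as numbers
        doc[key] = value
    # Item 2: rough size estimate (JSON, not BSON) before loading.
    size = len(json.dumps(doc).encode("utf-8"))
    if size > MAX_DOC_BYTES:
        raise ValueError("document too big: {0} bytes".format(size))
    return doc


cleaned = prepare({"title": b"caf\xe9", "price": "42"}, int_fields=("price",))
print(cleaned)
```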

Sunday, July 1, 2012

rails with sqlite - suddenly became a little nightmare

So I decided to reinstall rails on one of my virtual machines since I really fucked it up and there was no restore image available. Thought to myself I'll just waste 3 minutes and everything will be up and running. What I did not expect is that reinstalling sqlite3, which I use for dev, would cause me such a headache.

While creating a new project using "rails new someProject" the bundler had problems with the sqlite installation, saying "an error occurred while installing sqlite3 (1.3.6)". I googled a bit about it and found so many people having the same problem, but nothing worked. The gem file and the sqlite3 installation did not want to work together. The solutions varied from tweaking every possible file related to ruby to uninstalling everything you ever had....
After trying all sorts of voodoo from stackoverflow I found the solution (among 10 others I tried), which is to install the development package of sqlite using "apt-get install libsqlite3-dev".
So first of all I posted it here so I won't forget it once I run into this weird issue again. What surprised me the most is that nothing in the error messages implied that the problem was the missing dev package, nor did the gem file give me any clue about it.

Getting back onto linux after so many years working only with windows surely takes its toll.



Wednesday, June 6, 2012

Two issues with asp.net MVC

While Microsoft pushes its MVC implementation over the old web-forms model, it seems that some of the techniques demonstrated in most presentations/introductions are quite naive and can lead developers down problematic paths. Here are two things that might look small and not very harmful, but I think they should be taken into consideration when using Asp.net MVC:


1. Security issue with the default model binding - The default model binder takes the data coming from the user's HTTP request, treats it as key-value pairs and uses it to populate the model. This is a good thing for saving ourselves from boilerplate code, but using it without thinking of the security implications can lead to security flaws in the code. For example, with a little common sense, knowledge of the internal data structures or fuzzing, an attacker can alter internal values of the model (or view-model, depending on the programmer) by adding them to the HTTP request sent from his browser to the server. This can lead to all sorts of bad things, from impersonation to data corruption and even persistent attacks. It all depends on the flow of the code. You can check out this simple example.


2. I've seen excessive use of the RedirectToAction function, which causes an HTTP 302 that Google is not very fond of. HTTP 302 means that the page has temporarily moved to a new location. This, however, is not how it is usually demonstrated: in Asp.net MVC it is usually used to show how to redirect a user to another page after performing some action. A better approach would be to use some kind of Server.Transfer, but it is missing from the current MVC implementation. You can check the workaround, but I think such a function should have been implemented inside the framework itself.
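
The first issue above (often called over-posting or mass assignment) can be sketched in a few lines of Python. This is not the asp.net model binder, just a toy illustration of the idea; the Account class, its fields and both bind functions are hypothetical.

```python
class Account:
    def __init__(self):
        self.display_name = ""
        self.email = ""
        self.is_admin = False  # internal value, never meant to come from a form


def bind_naive(model, form):
    """Copy every posted key that matches an attribute (the risky default)."""
    for key, value in form.items():
        if hasattr(model, key):
            setattr(model, key, value)
    return model


def bind_allowed(model, form, allowed):
    """Copy only the fields the page is supposed to edit."""
    for key in allowed:
        if key in form:
            setattr(model, key, form[key])
    return model


# The attacker adds an extra field to the posted data.
form = {"display_name": "mallory", "is_admin": True}

naive = bind_naive(Account(), form)
safe = bind_allowed(Account(), form, allowed=("display_name", "email"))
print(naive.is_admin, safe.is_admin)
```

The second call ignores the extra field because only the listed names are ever copied.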





Wednesday, September 7, 2011

quick & dirty basic python reference - v1

This is definitely not a tutorial, just quick notes I took while reading two python books and playing with this nice & flexible language.


Getting around:

Get help:
>>> num = 10
>>> help(10)
Help on int object:

class int(object)
| int(x[, base]) -> integer
|
| Convert a string or number to an integer, if possible. A floating
| point argument will be truncated towards zero (this does not include a
| string representation of a floating point number!) When converting a
| string, use the optional base. It is an error to supply a base when
| converting a non-string.
|
| Methods defined here:
|
| __abs__(...)
| x.__abs__() <==> abs(x)
|
| __add__(...)
| x.__add__(y) <==> x+y
.
.
.

Basic commands and concepts:

>>> print 2*3
6

>>> x = 2
>>> x += 1
>>> x
3

>>> 2 ** 3 #power
8

>>> TextFromUser = raw_input( )


import sys - imports the system library

sys.path.append - appends a folder to the python path

if "somevalue" in sys.argv: - nice way to write an if statement when asking about a list item


reload(someScript) - reloads a script\library

from datetime import datetime - imports a specific class/member

from SomeScript import var1 , var2 , var3 , var4 - imports several specific functions/members

>>> import random
>>> random.random( )
0.232425325
>>> random.choice([1,2,3,5,8,13,21,34,55])
13

>>> userdata = int(input("how old are you")) #get input from user
how old are you54
>>> userdata
54

Get a list of a type's\object's functions:

>>> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

>>> lst = [1,2,3,4,5]
>>> dir(lst)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

Data types:

string:

>>> S = 'stringy'
>>> S[1]
't'
>>> S[-2]
'g'
>>> S[1:4]
'tri'
>>> S[1:]
'tringy'
>>> S[:2]
'st'
>>> len(S)
7
>>> S*3
'stringystringystringy'

#this is how we split a long string across several lines
os.environ["CFLAGS"] = "-W -Wimplicit-int " + \
"-Wimplicit-function-declaration " + \
"-Wimplicit -Wmain -Wreturn-type -Wunused -Wswitch "

Dictionary:

>>> dict = {'felix':'cat','rex':'dog','num':2}
>>> dict['num'] -= 2
>>> dict['num']
0

list:

>>> lst1 = [1,2,3,4,5]
>>> lst1
[1, 2, 3, 4, 5]
>>> lst1[1:]
[2, 3, 4, 5]
>>> lst1[:-1]
[1, 2, 3, 4]
>>> lst1[-3:]
[3, 4, 5]
>>> lst2 = [1,2,[1,2,3,4],1,1,2,2]
>>> lst2[2]
[1, 2, 3, 4]
>>> lst2.append(1)
>>> lst2
[1, 2, [1, 2, 3, 4], 1, 1, 2, 2, 1]
>>> lst3 = lst2 * 2
>>> lst3
[1, 2, [1, 2, 3, 4], 1, 1, 2, 2, 1, 1, 2, [1, 2, 3, 4], 1, 1, 2, 2, 1]
>>> lst4 = [11,22,33,44,55]
>>> lst4.pop(1)
22
>>> lst4
[11, 33, 44, 55]

>>> multidimlist = [[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]]
>>> multidimlist[1][1]
5

>>> multidimlist = [[1,2,3],[4,5,6],[7,8,9]]
>>> [row[0] for row in multidimlist]
[1, 4, 7]

>>> [{row[0],row[1]} for row in multidimlist if row[0] > 3 and row[1] < 9]
[set([4, 5]), set([8, 7])]

tuple: (same as list just immutable):

>>> tup = (1,1,1,1,1,1,2,2,2,3,4,5,6)
>>> tup.count(2)
3

Set:

>>> s1 = set([1,2,3,4,5])
>>> s2 = set([3,4,5,6,7,8])
>>> s1 & s2
set([3, 4, 5])
>>> s1 - s2
set([1, 2])
>>> s2 - s1
set([8, 6, 7])
>>> s1 | s2
set([1, 2, 3, 4, 5, 6, 7, 8])

type: (for 'reflection\introspection' operations)

>>> sample_set = set([1,2,3])
>>> type(sample_set)
<type 'set'>

class:

class hunter:
    def __init__(self,name,age): #init when created
        self.name = name
        self.age = age
    def hunt(self,animal):
        return str(animal) + " hunted"

>>> import hunter #assuming the class above is saved in hunter.py
>>> jack = hunter.hunter('jack',22)
>>> jack.hunt('joe')
'joe hunted'


Functions:

Define a function:

>>> def fact(n):
        """ calculates the factorial """
        r = 1
        while n >0:
            r = r*n
            n = n - 1
        return r


Run the function:
>>> fact(4)
24

Define a function with named parameters and with default values:
>>> def myPower(arg1 , arg2=2):
        x = 1
        counter = arg2
        while counter > 0:
            x = arg1 * x
            counter = counter - 1
        return x

Run the function:
>>> myPower(3)
9
>>> myPower(2,8)
256
>>> myPower(arg2 = 8 , arg1=2)
256

Define a function with an undefined number of args:
>>> def Contains10(*nums):
        current = 0
        for num in nums:
            if num == 10:
                return 'true'
        return 'false'

Run the function:

>>> Contains10(10,22)
'true'
>>> Contains10(1,2,3)
'false'

Function is a first class citizen - can be passed as a parameter or set to a variable:

>>> ct = Contains10
>>> ct(10,2,3)
'true'

Function that gets another function as argument and returns a new function - decorator pattern:

>>> def logTime(func):
        from datetime import datetime
        print("setting time logging decoration to ", func.__name__)
        def wrapperFunc(*args):
            print("start " , func.__name__, " - " , datetime.now())
            func(*args) #note: the return value of func is discarded here
            print("finish " , func.__name__, " - " , datetime.now())
        return wrapperFunc

Apply decorator to a function:

>>> @logTime
    def takesLongTime():
        i = 0
        while i < 1000000000:
            i += i + 2
        return i

setting time logging decoration to takesLongTime


Run the function:
>>> takesLongTime()
start takesLongTime - 2011-09-08 00:42:33.682000
finish takesLongTime - 2011-09-08 00:42:33.710000
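
A variation on the decorator above, as a rough sketch in Python 3 syntax: it passes the return value of the wrapped function through and keeps its name using functools.wraps. The names log_time and takes_long_time are just my renamed versions of the example above.

```python
import functools
from datetime import datetime


def log_time(func):
    """Decorator that prints how long each call to func takes."""
    @functools.wraps(func)  # keeps func.__name__ and the docstring
    def wrapper(*args, **kwargs):
        start = datetime.now()
        result = func(*args, **kwargs)
        elapsed = datetime.now() - start
        print("{0} took {1}".format(func.__name__, elapsed))
        return result  # pass the original return value through
    return wrapper


@log_time
def takes_long_time():
    i = 0
    while i < 1000000000:
        i += i + 2  # i roughly doubles each round, so this ends quickly
    return i


value = takes_long_time()
```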

Working with files\web\database:

working with files:
>>> f = open(r'c:\example.txt', 'w') #raw string so the backslash is not treated as an escape
>>> f.writelines('I am the first line') #writelines does not add a newline
>>> f.write('I am some more text')
>>> f.close()

the file now contains -> "I am the first lineI am some more text"

>>> f = open(r'c:\example.txt')
>>> f.read()
'I am the first lineI am some more text'
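
The same write/read round trip can be sketched with the "with" statement, which closes the file for you even if something fails in between. This is Python 3 syntax, and it uses a temporary folder instead of c:\ so it runs anywhere.

```python
import os
import tempfile

# Assumption: a throwaway temp folder instead of the c:\ path used above.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

with open(path, "w") as f:
    # writelines takes a list of strings and adds no newlines by itself
    f.writelines(["I am the first line\n", "I am the second line\n"])
    f.write("I am some more text")

with open(path) as f:
    content = f.read()

print(content)
```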

>>> import os
>>> for root, dirs, files in os.walk('c:\\playpython'):
        print("{0} has {1} files".format(root , len(files)))

c:\playpython has 6 files
c:\playpython\1dir has 1 files
c:\playpython\2dir has 6 files



Working with web client: (urllib2)

>>> import urllib2 as u
>>> webpage = u.urlopen('http://www.cnn.com')
>>> webpage.read(400)
'CNN.com International - Breaking, World, Business, Sports, Entertainment and Video News \n...'

Working with mongodb:

>>> import pymongo
>>> from pymongo import Connection
>>> conn = Connection('localhost', 666) #connect to mongodb on local machine over port 666
>>> db = conn['myDB'] #open specific db inside mongodb server
>>> collection = db.customers #get a specific collection



Sunday, July 10, 2011

XSS - not as easy as it used to be

I am preparing a lecture about web site security and was starting with good old XSS. I did a very similar lecture two and a half years ago and demonstrated some of the OWASP top 10 on a very big Israeli community site. As it happens the guys from this site finally realized that their site was much like Swiss cheese, so they plugged most of the holes.
I thought it would be fun to create a simple asp.net site that will demonstrate the flaws I found back then in the community site. However it seems that with the new age of browsers (all major browsers) and with the new asp.net runtime it is very hard to perform even a simple XSS. I am not a big Microsoft fan (nor a fan of any other big corporation) but I must say that Microsoft did a hell of a job making asp.net almost completely idiot proof for newbie web developers. It checks the query string for XSS and throws exceptions at you, it encodes stuff, it validates http request parts - it ruins all the fun :)

But don't worry – I already made the first page demonstrating some ways to perform XSS attacks. I will upload it to this blog sometime in the near future.
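
Just to show what "it encodes stuff" means in practice, here is a tiny Python sketch of output encoding. It is not how asp.net does it internally; render_comment is a made-up helper and the only real API used is html.escape from the standard library.

```python
import html


def render_comment(user_input):
    """Encode user input before placing it inside the page markup."""
    return "<p>{0}</p>".format(html.escape(user_input))


payload = '<script>alert("xss")</script>'
safe = render_comment(payload)
print(safe)
```

The browser then shows the payload as plain text instead of running it as a script.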

Monday, July 19, 2010

A great site with tons of examples for everything you want to do with C#

http://www.java2s.com/Code/CSharp/CatalogCSharp.htm

Monday, May 3, 2010

asp.net Performance snippet - disable dynamic compilation unless you need it

disable dynamic compilation unless you need it

Should be false

Monday, April 5, 2010

Using castle dynamicProxy

I was looking for an easy dynamic proxy to work with (since it is missing from the .net framework for some unknown reason). I found castle dynamic proxy to be very elegant and useful.

Here is how to use it.

First create the class that will be accessible via the proxy:

    public  class MsgRouter
    {
        public virtual string SendMail(string from, string to, string title, string body)
        {
            return "from: " + from + Environment.NewLine + "to: " + to + Environment.NewLine + "title:" + title + Environment.NewLine + "body:" + body;
        }
 
        public virtual string SendEncryptedMail(string from, string to, string title, string body, string key)
        {
            return "from: " + from + Environment.NewLine + "to: " + to + Environment.NewLine + "title:" + title + Environment.NewLine + "body:" + body;
 
        }
 
        public virtual string SendFax(string PhoneNumber, string body)
        {
            return "phone number: " + PhoneNumber + " ,fax body:" + body;
        }
    }


Create an interceptor:

    /// <summary>
    ///     Counts the number of calls for each function.
    /// </summary>
    public class CallCounterInterceptor : IInterceptor
    {
        private Dictionary<string, int> _callsToMethods = new Dictionary<string, int>();
 
        #region IInterceptor Members
 
        public void Intercept(IInvocation invocation)
        {
            var name = invocation.MethodInvocationTarget.Name;
            if (!_callsToMethods.ContainsKey(name))
            {
                _callsToMethods.Add(name, 1);
            }
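            else
            {
                int current = _callsToMethods[name];
                _callsToMethods[name] = current + 1;
            }
            // continue down the chain so the next interceptor (or the original method) runs
            invocation.Proceed();
        }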
 
        #endregion
    }


Create another interceptor:

   /// <summary>
    ///     Clears sensitive data
    /// </summary>
    public class ClearSensitiveDataInterceptor : IInterceptor
    {
        #region IInterceptor Members
 
        public void Intercept(IInvocation invocation)
        {
            for(int i=0; i < invocation.Arguments.Length ; i++)
            {
                invocation.SetArgumentValue(i, ClearSensitiveData(invocation.Arguments[i].ToString()));
            }
            invocation.Proceed();
            // change the return value
            invocation.ReturnValue = "gone!";
        }
 
        #endregion
 
        #region Utility
 
        /// <summary>
        ///     Just for the example....
        /// </summary>
        /// <param name="sInput"></param>
        /// <returns></returns>
        public string ClearSensitiveData(string sInput)
        {
            string res =  sInput.ToLower().Replace("[secret]", "---");
            return res;
        }
 
        #endregion
    }


Create a factory method for the MsgRouter class. The factory will be in charge of creating the proxy object and binding it to one or more interceptors:

        private static MsgRouter MailerFactory()
        {
            IInterceptor[] arrInterceptors = new IInterceptor[2];
            arrInterceptors[0] = new ClearSensitiveDataInterceptor();
            arrInterceptors[1] = new CallCounterInterceptor();
 
            ProxyGenerator oProxyGenerator = new ProxyGenerator();
            var oProxy = oProxyGenerator.CreateClassProxy<MsgRouter>(new ProxyGenerationOptions(), arrInterceptors);
            return oProxy;
        }


Finally write some code to use the MsgRouter object:

            var oProxy = MailerFactory();
 
            string res = oProxy.SendEncryptedMail("me", "you", "title", "hello! this is the [secret]", "#$#%##$%^$#^$#^#^#^^$#%$#%$#^&^%&&*^&*($^%#&%#^$^");
            res = oProxy.SendMail("me", "you", "title", "hello! this is the [secret] I have told you about");
            res = oProxy.SendMail("me2", "you3", "a new title", "You get the point right?");
            res = oProxy.SendFax("6726721", "This is the body of the fax to be sent...");


Interception will be done for each virtual function call in a nested way as demonstrated:

1. call the method on the proxy object
2. first interceptor is invoked
3. second interceptor is invoked
4. original method on the original object is invoked
5. second interceptor has a chance to continue (and change for example the return value)
6. first interceptor has a chance to continue (and change for example the return value)

This is fine for what I was looking for. For more advanced implementations one should consider using a more comprehensive AOP framework.
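
The nested call order described above can be sketched without Castle at all. The following is a toy Python model of the idea, not the Castle API: Invocation, make_proxy and the two interceptors are hypothetical names I made up for the illustration.

```python
class Invocation:
    """Carries one call through the interceptor chain to the target."""

    def __init__(self, target, args, interceptors):
        self.target = target
        self.args = list(args)
        self.return_value = None
        self._interceptors = interceptors
        self._index = 0

    def proceed(self):
        # Hand over to the next interceptor, or to the real method at the end.
        if self._index < len(self._interceptors):
            interceptor = self._interceptors[self._index]
            self._index += 1
            interceptor(self)
        else:
            self.return_value = self.target(*self.args)


def make_proxy(target, interceptors):
    def proxy(*args):
        invocation = Invocation(target, args, interceptors)
        invocation.proceed()
        return invocation.return_value
    return proxy


calls = []


def first(invocation):
    calls.append("first: before")
    invocation.proceed()
    calls.append("first: after")


def second(invocation):
    calls.append("second: before")
    invocation.proceed()
    calls.append("second: after")


def send_fax(phone_number, body):
    calls.append("original method")
    return "phone number: " + phone_number + " ,fax body:" + body


proxy = make_proxy(send_fax, [first, second])
result = proxy("6726721", "This is the body of the fax")
print(calls)
```

An interceptor that never calls proceed() stops the chain, so the original method is not reached.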

Tuesday, March 23, 2010

Posts I made on Microsoft Israel MCS blog (in hebrew)

First post about what should be avoided when considering the use of design patterns: http://blogs.microsoft.co.il/blogs/mcs/archive/2009/11/04/426203.aspx

Second post about NoSQL (not-only SQL):
http://blogs.microsoft.co.il/blogs/mcs/archive/2010/03/17/nosql-not-only-sql.aspx

Unfortunately they copied the text from a Word document directly into the blog editor, which led to some rather funny mistakes in the products listing and categorization section. The real mapping is as follows:

Azure table storage -> Microsoft

Memcachedb

Velocity -> Microsoft

Cassandra -> Facebook and Digg

Dynamo -> Amazon

Tokyo Tyrant

BerkeleyDB

if you want to see how it really works check out this blog-post from a very talented person I know :)
http://drorbr.blogspot.com/2010/02/migrating-springhibernate-application.html
The usage example there is in Java but it really doesn't matter…