Sunday, October 27, 2013

Hadoop configuration - permissions confusion

I'll start from the end - HDFS permissions are managed by Hadoop itself and are not inherited (not even at creation time) from the underlying file system.

To make a long story short - I installed Hadoop 2.2.0 manually a few days ago on Ubuntu 12.04 LTS. Once finished, I ran ./start-all.sh and everything seemed to be working. But when I stopped the services, I got a message saying that the job-tracker was not running and therefore would not be stopped. The job-tracker log file complained about permissions.

In the job-tracker log I got the following error:
FATAL org.apache.hadoop.mapred.JobTracker: 
org.apache.hadoop.security.AccessControlException: 
The systemdir hdfs://localhost:54310/home/hduser/tmp/mapred/system 
is not owned by hduser
Googling it led me to tons of people with the same problem or variants of it, but none of the suggested solutions worked. Some were far-fetched (re-install everything), but most focused on "mapred-site.xml" and file system permissions. Since my config files were OK, I played with the permissions and ownership of the local "tmp/mapred" folder and restarted things over and over again... to no avail. After too much time bashing my head against an imaginary wall, I noticed the little "hdfs://" at the beginning of the path: the directory in the error lives in HDFS, not on the local disk. I wish I had noticed it 2 hours earlier.
The solution was easy. Instead of using the regular bash commands, I needed to use the Hadoop versions of them. With "hadoop fs -chown" and "hadoop fs -chmod" I solved the issue in 10 seconds ("hadoop fs -chown -R hduser /home/hduser/tmp/" and "hadoop fs -chmod -R 755 /home/hduser/tmp/").
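For anyone who wants the fix as something reusable, here is a rough sketch of it as a small shell helper. The function names (hdfs_path, fix_hdfs_dir) and the DRY_RUN variable are my own inventions for illustration, not part of Hadoop. It assumes the "hadoop" command is on the PATH and that the path has no spaces in it. By default it only prints the commands, so nothing is changed until you ask for it.

```shell
#!/bin/sh
# Sketch only: hdfs_path, fix_hdfs_dir and DRY_RUN are hypothetical names,
# not part of Hadoop. Assumes "hadoop" is on the PATH and paths have no spaces.

# Strip the "hdfs://host:port" prefix from a URI, leaving the path inside HDFS.
hdfs_path() {
    printf '%s\n' "$1" | sed -E 's#^hdfs://[^/]*##'
}

# Print (DRY_RUN=1, the default) or run the HDFS-side ownership and
# permission fix for a directory. Usage: fix_hdfs_dir OWNER HDFS_URI_OR_PATH
fix_hdfs_dir() {
    owner=$1
    dir=$(hdfs_path "$2")
    for cmd in "hadoop fs -chown -R $owner $dir" "hadoop fs -chmod -R 755 $dir"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"    # show what would be executed
        else
            $cmd           # actually run it against HDFS
        fi
    done
}
```

Calling "fix_hdfs_dir hduser hdfs://localhost:54310/home/hduser/tmp/" prints the two commands from above; running it with DRY_RUN=0 set executes them. Either way, "hadoop fs -ls /home/hduser/tmp/mapred" shows the owner and permissions as HDFS sees them, which is what the job-tracker checks, as opposed to what "ls -l" reports for the local folder.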
