Uploading files to EMR MasterNode:
ssh hadoop@ec2-54-211-65-155.compute-1.amazonaws.com -i ~/documents/AWS_home/cred/sariyaba.pem
Use Cyberduck to SFTP files from Mac to EC2 instance
hadoop jar /home/hadoop/poc/dynamodb-0.0.1-SNAPSHOT.jar com.here.poc.dynamodb.LoadDriverNew s3://ariyabala/POC/DynamoDB/sample s3://ariyabala/POC/DynamoDB/output
Deploying files and executing the jar:
scp -i ~/documents/AWS_home/cred/sariyaba.pem target/nokia.ddb-0.0.1-SNAPSHOT.jar hadoop@ec2-54-226-87-206.compute-1.amazonaws.com:/home/hadoop/
ssh -i ~/Documents/AWS_home/cred/sariyaba.pem hadoop@ec2-54-211-107-39.compute-1.amazonaws.com
inputPath="s3://ariyabala/POC/spring/nokia_text.txt" outputPath="s3://ariyabala/POC/spring/output_wc" hadoop jar nokia.ddb-0.0.1-SNAPSHOT.jar com.nokia.ddb.Job wordCountJob
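The deploy-and-run steps above can be sketched as one parameterized script. The key, host, and jar values below are the ones from these notes and are placeholders; the script only echoes the commands it builds, so it is safe to dry-run before removing the echoes to execute for real.

```shell
#!/bin/sh
# Placeholders taken from the notes above -- substitute your own key, host, and jar.
KEY="$HOME/Documents/AWS_home/cred/sariyaba.pem"
HOST="hadoop@ec2-54-211-107-39.compute-1.amazonaws.com"
JAR="nokia.ddb-0.0.1-SNAPSHOT.jar"

# Build the scp (deploy) and ssh (run) commands; echoed rather than executed
# so the sketch can be inspected without touching a real cluster.
deploy_cmd="scp -i $KEY target/$JAR $HOST:/home/hadoop/"
run_cmd="ssh -i $KEY $HOST 'inputPath=s3://ariyabala/POC/spring/nokia_text.txt outputPath=s3://ariyabala/POC/spring/output_wc hadoop jar $JAR com.nokia.ddb.Job wordCountJob'"

echo "$deploy_cmd"
echo "$run_cmd"
```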
Install S3 CLI on Mac:
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
brew install s3cmd
Upload/Download Files to/from S3:
s3cmd -c [configuration file] get --recursive s3path localpath
s3cmd -c ~/Documents/s3cfg-struct get --recursive s3://ariyabala/ddb/geoip-export/2013-10-28_07.00/ ~/Documents/AWS_home/POC/DDB/ddb_exports/
s3cmd -c [configuration file] put localpath s3path
s3cmd -c ~/Documents/s3cfg-struct put ~/Documents/workspace/*.txt s3://ariyabala/POC/spring/
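Since every call above repeats the same `-c` config flag, a tiny wrapper function saves typing. This is a sketch: the config path is the one from these notes, and the function echoes the full command instead of running it so it can be checked safely; drop the `echo` to execute for real.

```shell
#!/bin/sh
# Config file path from the notes above -- adjust to your own.
S3CFG="$HOME/Documents/s3cfg-struct"

# Hypothetical convenience wrapper: prepends `s3cmd -c $S3CFG` to any arguments.
# Echoes the command rather than running it, so this sketch needs no S3 access.
s3() {
    echo s3cmd -c "$S3CFG" "$@"
}

# Usage mirrors the get/put examples above:
s3 get --recursive s3://ariyabala/ddb/geoip-export/2013-10-28_07.00/ /tmp/ddb_exports/
s3 put /tmp/report.txt s3://ariyabala/POC/spring/
```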
Recursive Line count on files in S3:
output=$(path=`hadoop dfs -lsr s3://com.nokia.analytics.prod.deviceact/activation/processed/2012-11-04/ | awk '{print $6}' | grep 'part' | grep -v NativeS3FileSystem`; for f in $path; do count=`hadoop dfs -cat s3://com.nokia.analytics.prod.deviceact/$f | wc -l`; echo $f":"$count"\n"; done;); echo -e $output
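The one-liner above follows a simple pattern: list the part files, then count lines in each. The same loop against a local directory (a hypothetical /tmp/lc_demo, created here with sample data) makes the pattern easy to verify; swap `find`/`cat` back to `hadoop dfs -lsr`/`hadoop dfs -cat` for the S3 version.

```shell
#!/bin/sh
# Create a local stand-in for the S3 prefix, with two sample part files.
dir=/tmp/lc_demo
mkdir -p "$dir"
printf 'a\nb\nc\n' > "$dir/part-00000"
printf 'x\ny\n'    > "$dir/part-00001"

# List part files, count lines in each, print "path:count" per file.
for f in $(find "$dir" -name 'part-*' | sort); do
    count=$(wc -l < "$f" | tr -d ' ')
    echo "$f:$count"
done
# -> /tmp/lc_demo/part-00000:3
# -> /tmp/lc_demo/part-00001:2
```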
Run Script:
./elastic-mapreduce --create --alive --name "My Development Jobflow" --jar s3://elasticmapreduce/libs/script-runner/script-runner.jar --args "s3://<bucketname>/test.sh"
Mounting Instance Store:
sudo umount /dev/xvdb
sudo mkfs.ext4 /dev/xvdb
sudo mkdir -p /local/b
sudo mount /dev/xvdb /local/b
sudo umount /dev/xvdc
sudo mkfs.ext4 /dev/xvdc
sudo mkdir -p /local/c
sudo mount /dev/xvdc /local/c
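The four-command sequence above is repeated per device, so a loop covers both ephemeral disks. The device names and mount points are the ones from these notes (check `lsblk` for yours); the loop only echoes the commands, since `mkfs`/`mount` need root and real devices; drop the `echo` to run it for real.

```shell
#!/bin/sh
# Format and mount each instance-store device; xvdb -> /local/b, xvdc -> /local/c.
# Echoed rather than executed -- this sketch assumes the devices from the notes above.
for dev in xvdb xvdc; do
    mnt="/local/${dev#xvd}"          # strip the "xvd" prefix to get b or c
    echo "sudo umount /dev/$dev"
    echo "sudo mkfs.ext4 /dev/$dev"
    echo "sudo mkdir -p $mnt"
    echo "sudo mount /dev/$dev $mnt"
done
```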