
Post No. 723
Author 허진경
Posted 2017-07-17 17:28:38
Title Running the Word Count Example
Content

You need the WordCount.jar program and the data file (speech.tar.gz). Download WordCount.jar and speech.tar.gz into the ~/lab directory.

$ mkdir ~/lab
$ cd ~/lab
$ wget -O WordCount.jar http://javaspecialist.co.kr/pds/249
$ wget -O speech.tar.gz http://javaspecialist.co.kr/pds/250
$ tar -xf speech.tar.gz

Start the cluster (leave the NodeManager and the JobHistory server running).

$ cd $HADOOP_HOME/sbin
$ ./start-all.sh
$ ./yarn-daemon.sh start nodemanager
$ ./mr-jobhistory-daemon.sh start historyserver

Upload the data to HDFS (create the directory and upload the data files).

$ hdfs dfs -put ~/lab/speech/ /

Run the word count example.

$ hadoop jar WordCount.jar /speech/ /output/word_count

Check the results.

$ hdfs dfs -ls /output/word_count
$ hdfs dfs -cat /output/word_count/part-r-00000
$ hdfs dfs -cat /output/word_count/part-r-00000 | head -10
$ hdfs dfs -cat /output/word_count/part-r-00000 | tail -10

----------------------------------------------
[hadoop@master lab]$ hadoop jar WordCount.jar /speech/ /output/word_count
17/07/17 01:15:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/07/17 01:15:45 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/17 01:15:45 INFO input.FileInputFormat: Total input files to process : 14
17/07/17 01:15:45 INFO mapreduce.JobSubmitter: number of splits:14
17/07/17 01:15:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500262048189_0001
17/07/17 01:15:46 INFO impl.YarnClientImpl: Submitted application application_1500262048189_0001
17/07/17 01:15:46 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500262048189_0001/
17/07/17 01:15:46 INFO mapreduce.Job: Running job: job_1500262048189_0001
17/07/17 01:17:20 INFO mapreduce.Job: Job job_1500262048189_0001 running in uber mode : false
17/07/17 01:17:20 INFO mapreduce.Job:  map 0% reduce 0%
17/07/17 01:17:37 INFO mapreduce.Job:  map 7% reduce 0%
17/07/17 01:17:39 INFO mapreduce.Job:  map 14% reduce 0%
17/07/17 01:17:41 INFO mapreduce.Job:  map 21% reduce 0%
17/07/17 01:17:43 INFO mapreduce.Job:  map 36% reduce 0%
17/07/17 01:17:44 INFO mapreduce.Job:  map 43% reduce 0%
17/07/17 01:17:55 INFO mapreduce.Job:  map 50% reduce 0%
17/07/17 01:17:58 INFO mapreduce.Job:  map 57% reduce 0%
17/07/17 01:18:01 INFO mapreduce.Job:  map 64% reduce 0%
17/07/17 01:18:02 INFO mapreduce.Job:  map 71% reduce 0%
17/07/17 01:18:03 INFO mapreduce.Job:  map 79% reduce 0%
17/07/17 01:18:08 INFO mapreduce.Job:  map 86% reduce 0%
17/07/17 01:18:09 INFO mapreduce.Job:  map 93% reduce 0%
17/07/17 01:18:10 INFO mapreduce.Job:  map 100% reduce 29%
17/07/17 01:18:11 INFO mapreduce.Job:  map 100% reduce 100%
17/07/17 01:18:11 INFO mapreduce.Job: Job job_1500262048189_0001 completed successfully
17/07/17 01:18:11 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=349601
		FILE: Number of bytes written=2743560
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=176577
		HDFS: Number of bytes written=49110
		HDFS: Number of read operations=45
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Killed map tasks=1
		Launched map tasks=14
		Launched reduce tasks=1
		Rack-local map tasks=14
		Total time spent by all maps in occupied slots (ms)=201330
		Total time spent by all reduces in occupied slots (ms)=27018
		Total time spent by all map tasks (ms)=201330
		Total time spent by all reduce tasks (ms)=27018
		Total vcore-milliseconds taken by all map tasks=201330
		Total vcore-milliseconds taken by all reduce tasks=27018
		Total megabyte-milliseconds taken by all map tasks=206161920
		Total megabyte-milliseconds taken by all reduce tasks=27666432
	Map-Reduce Framework
		Map input records=1333
		Map output records=30154
		Map output bytes=289287
		Map output materialized bytes=349679
		Input split bytes=1843
		Combine input records=0
		Combine output records=0
		Reduce input groups=4858
		Reduce shuffle bytes=349679
		Reduce input records=30154
		Reduce output records=4858
		Spilled Records=60308
		Shuffled Maps =14
		Failed Shuffles=0
		Merged Map outputs=14
		GC time elapsed (ms)=1995
		CPU time spent (ms)=6300
		Physical memory (bytes) snapshot=3129200640
		Virtual memory (bytes) snapshot=31510302720
		Total committed heap usage (bytes)=2283995136
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=174734
	File Output Format Counters
		Bytes Written=49110
[hadoop@master lab]$ hdfs dfs -ls /output/word_count
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2017-07-17 01:18 /output/word_count/_SUCCESS
-rw-r--r--   2 hadoop supergroup      49110 2017-07-17 01:18 /output/word_count/part-r-00000
[hadoop@master lab]$ hdfs dfs -cat /output/word_count/part-r-00000 | head -10
$1	3
$12	2
$150	1
$16	2
$185	1
$2	3
$20	1
$210	1
$3	1
$30	2
[hadoop@master lab]$ hdfs dfs -cat /output/word_count/part-r-00000 | tail -10
yesterday	4
yet	12
yielding	1
yields	1
you	84
young	14
your	22
yourselves	1
youth	2
zone	1
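As a side note, the same tokenize → group → count pipeline that WordCount runs across the cluster can be emulated locally with standard Unix tools, which is handy as a sanity check on the MapReduce output. This is a sketch, not the JAR's actual code: the /tmp paths and two tiny sample files below are fabricated stand-ins for the extracted speech/ texts, and the real WordCount's tokenizer may split or normalize words slightly differently.

```shell
# Fabricated sample input standing in for ~/lab/speech/ (assumption).
mkdir -p /tmp/speech_demo
printf 'to be or not to be\n' > /tmp/speech_demo/a.txt
printf 'be yourself\n'        > /tmp/speech_demo/b.txt

# map: one word per line; shuffle/sort: group identical words;
# reduce: count each group. Output format matches part-r-00000: "word<TAB>count".
cat /tmp/speech_demo/*.txt \
  | tr -s '[:space:]' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}'
# Expected counts: be 3, not 1, or 1, to 2, yourself 1
```

Running this over the real extracted speech/ directory and diffing against part-r-00000 is a quick way to confirm the job counted what you expected.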
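The head/tail listings above show the output in alphabetical order, which is how the reducer emits it. To see the most frequent words instead, sort the fetched result file by the count column. A sketch, assuming the file has been copied locally first (e.g. with `hdfs dfs -get /output/word_count/part-r-00000 .`); the /tmp path and the four sample rows are fabricated so the command can be tried without a cluster.

```shell
# Fabricated stand-in for the fetched part-r-00000 (assumption);
# the real file uses the same "word<TAB>count" layout.
printf 'you\t84\nyet\t12\nyoung\t14\nzone\t1\n' > /tmp/part-r-00000

# Sort numerically (n) and descending (r) on the second tab-separated field.
sort -t "$(printf '\t')" -k2,2nr /tmp/part-r-00000 | head -3
```

On the real output, `head -10` here gives the ten most common words in the speeches rather than the first ten alphabetically.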