Contents
The WordCount.jar file and the data file (speech.tar.gz) are required.
Download WordCount.jar and speech.tar.gz into the ~/lab directory and extract the archive.
$ mkdir ~/lab
$ cd ~/lab
$ wget -O WordCount.jar http://javaspecialist.co.kr/pds/249
$ wget -O speech.tar.gz http://javaspecialist.co.kr/pds/250
$ tar -xf speech.tar.gz
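Before moving on, it can be worth checking that the extraction produced a speech directory of text files (the job log further down reports 14 input files, so roughly that many files should be listed):
$ ls ~/lab/speech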
Start the cluster (make sure the NodeManager and JobHistoryServer are running as well).
$ cd $HADOOP_HOME/sbin
$ ./start-all.sh
$ ./yarn-daemon.sh start nodemanager
$ ./mr-jobhistory-daemon.sh start historyserver
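As a quick sanity check, jps lists the running daemons; on a master node set up like this one the list would typically include NameNode, ResourceManager, NodeManager, and JobHistoryServer, though the exact set depends on how the cluster is laid out:
$ jps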
Upload the data files to HDFS (putting the local ~/lab/speech directory into / creates the /speech directory and copies the files in one step).
$ hdfs dfs -put ~/lab/speech/ /
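Optionally verify that the upload landed where the job expects it:
$ hdfs dfs -ls /speech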
Run the word count example.
$ hadoop jar WordCount.jar /speech/ /output/word_count
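The jar used here is prebuilt, so its source is not part of this lab. For reference, a minimal WordCount job along the lines of the standard Hadoop MapReduce example would look roughly like the sketch below; the class names are illustrative, and no combiner is configured, which matches the Combine input records=0 counter in the transcript that follows.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line on whitespace and emits (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts for each word and emits (word, total).
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // No combiner is set; the job counters below show Combine input records=0.
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /speech/
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output/word_count
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}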
Check the results.
$ hdfs dfs -ls /output/word_count
$ hdfs dfs -cat /output/word_count/part-r-00000
$ hdfs dfs -cat /output/word_count/part-r-00000 | head -10
$ hdfs dfs -cat /output/word_count/part-r-00000 | tail -10
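Note that MapReduce refuses to write into an existing output directory, so the output path has to be removed before the job can be run again:
$ hdfs dfs -rm -r /output/word_count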
----------------------------------------------
[hadoop@master lab]$ hadoop jar WordCount.jar /speech/ /output/word_count
17/07/17 01:15:44 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/07/17 01:15:45 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/07/17 01:15:45 INFO input.FileInputFormat: Total input files to process : 14
17/07/17 01:15:45 INFO mapreduce.JobSubmitter: number of splits:14
17/07/17 01:15:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1500262048189_0001
17/07/17 01:15:46 INFO impl.YarnClientImpl: Submitted application application_1500262048189_0001
17/07/17 01:15:46 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1500262048189_0001/
17/07/17 01:15:46 INFO mapreduce.Job: Running job: job_1500262048189_0001
17/07/17 01:17:20 INFO mapreduce.Job: Job job_1500262048189_0001 running in uber mode : false
17/07/17 01:17:20 INFO mapreduce.Job: map 0% reduce 0%
17/07/17 01:17:37 INFO mapreduce.Job: map 7% reduce 0%
17/07/17 01:17:39 INFO mapreduce.Job: map 14% reduce 0%
17/07/17 01:17:41 INFO mapreduce.Job: map 21% reduce 0%
17/07/17 01:17:43 INFO mapreduce.Job: map 36% reduce 0%
17/07/17 01:17:44 INFO mapreduce.Job: map 43% reduce 0%
17/07/17 01:17:55 INFO mapreduce.Job: map 50% reduce 0%
17/07/17 01:17:58 INFO mapreduce.Job: map 57% reduce 0%
17/07/17 01:18:01 INFO mapreduce.Job: map 64% reduce 0%
17/07/17 01:18:02 INFO mapreduce.Job: map 71% reduce 0%
17/07/17 01:18:03 INFO mapreduce.Job: map 79% reduce 0%
17/07/17 01:18:08 INFO mapreduce.Job: map 86% reduce 0%
17/07/17 01:18:09 INFO mapreduce.Job: map 93% reduce 0%
17/07/17 01:18:10 INFO mapreduce.Job: map 100% reduce 29%
17/07/17 01:18:11 INFO mapreduce.Job: map 100% reduce 100%
17/07/17 01:18:11 INFO mapreduce.Job: Job job_1500262048189_0001 completed successfully
17/07/17 01:18:11 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=349601
FILE: Number of bytes written=2743560
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=176577
HDFS: Number of bytes written=49110
HDFS: Number of read operations=45
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Killed map tasks=1
Launched map tasks=14
Launched reduce tasks=1
Rack-local map tasks=14
Total time spent by all maps in occupied slots (ms)=201330
Total time spent by all reduces in occupied slots (ms)=27018
Total time spent by all map tasks (ms)=201330
Total time spent by all reduce tasks (ms)=27018
Total vcore-milliseconds taken by all map tasks=201330
Total vcore-milliseconds taken by all reduce tasks=27018
Total megabyte-milliseconds taken by all map tasks=206161920
Total megabyte-milliseconds taken by all reduce tasks=27666432
Map-Reduce Framework
Map input records=1333
Map output records=30154
Map output bytes=289287
Map output materialized bytes=349679
Input split bytes=1843
Combine input records=0
Combine output records=0
Reduce input groups=4858
Reduce shuffle bytes=349679
Reduce input records=30154
Reduce output records=4858
Spilled Records=60308
Shuffled Maps =14
Failed Shuffles=0
Merged Map outputs=14
GC time elapsed (ms)=1995
CPU time spent (ms)=6300
Physical memory (bytes) snapshot=3129200640
Virtual memory (bytes) snapshot=31510302720
Total committed heap usage (bytes)=2283995136
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=174734
File Output Format Counters
Bytes Written=49110
[hadoop@master lab]$ hdfs dfs -ls /output/word_count
Found 2 items
-rw-r--r-- 2 hadoop supergroup 0 2017-07-17 01:18 /output/word_count/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 49110 2017-07-17 01:18 /output/word_count/part-r-00000
[hadoop@master lab]$ hdfs dfs -cat /output/word_count/part-r-00000 | head -10
$1 3
$12 2
$150 1
$16 2
$185 1
$2 3
$20 1
$210 1
$3 1
$30 2
[hadoop@master lab]$ hdfs dfs -cat /output/word_count/part-r-00000 | tail -10
yesterday 4
yet 12
yielding 1
yields 1
you 84
young 14
your 22
yourselves 1
youth 2
zone 1