A Makefile for Hadoop MapReduce programs


I recently needed to integrate a Hadoop MapReduce program into a large application framework written in C/C++, and wanted the MapReduce code to be compiled and packaged automatically by make.

This post walks through the details with a simple WordCount1 example. Note: the Hadoop version is 2.4.0.

The source consists of two files: WordCount1.java implements the word-counting logic, and CounterThread.java simply tracks and prints the number of input lines processed so far. Both are listed in Appendix 1. The key to the Makefile is getting all of the jars that ship with Hadoop onto the compile classpath. Many articles online do this by writing a script that copies every .jar under the Hadoop directory into one place before compiling, which is far more trouble than it needs to be; the simpler recipes that do exist mostly target old Hadoop releases such as 0.20.

In fact, Hadoop ships a command, hadoop classpath, that prints a classpath already containing all of the required jars. So compiling only takes javac -classpath "`hadoop classpath`" *.java, after which the class files can be packaged with jar -cvf.
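For reference, the two manual steps the Makefile automates look roughly like this (a minimal sketch, assuming the sources live under src/mypackage and compiled classes go to bin, as in the Makefile below):

javac -classpath "`hadoop classpath`" src/mypackage/*.java -d bin
jar -cvf WordCount.jar -C bin ./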

The full Makefile looks like this:

SRC_DIR = src/mypackage/*.java
CLASS_DIR = bin
TARGET_JAR = WordCount.jar

.PHONY: all clean

all: $(TARGET_JAR)

# Compile every source file against the Hadoop classpath, then package the
# resulting class files into the jar.
$(TARGET_JAR): $(SRC_DIR)
	mkdir -p $(CLASS_DIR)
	javac -classpath "`hadoop classpath`" $(SRC_DIR) -d $(CLASS_DIR) -Xlint
	jar -cvf $(TARGET_JAR) -C $(CLASS_DIR) ./

clean:
	rm -rf $(CLASS_DIR) *.jar
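For orientation, the directory layout this Makefile assumes is roughly the following (make is run from the project root, as in the session below; bin is created by the build):

WordCount1/
    Makefile
    src/
        mypackage/
            WordCount1.java
            CounterThread.java
    bin/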

Running make:

lichao@ubuntu:WordCount1$ make
mkdir -p bin
javac -classpath "`hadoop classpath`" src/mypackage/*.java -d bin -Xlint
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jaxb-api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/activation.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jsr173_1.0_api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/common/lib/jaxb1-impl.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb-api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/activation.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jsr173_1.0_api.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/share/hadoop/yarn/lib/jaxb1-impl.jar": no such file or directory
warning: [path] bad path element "/home/lichao/Software/hadoop/hadoop-src/hadoop-2.4.0-src/hadoop-dist/target/hadoop-2.4.0/contrib/capacity-scheduler/*.jar": no such file or directory
src/mypackage/WordCount1.java:61: warning: [deprecation] Job(Configuration,String) in Job has been deprecated
		Job job = new Job(conf, "WordCount1");                  // create a new job
		          ^
10 warnings
jar -cvf WordCount.jar -C bin ./
added manifest
adding: mypackage/(in = 0) (out= 0)(stored 0%)
adding: mypackage/WordCount1.class(in = 1970) (out= 1037)(deflated 47%)
adding: mypackage/CounterThread.class(in = 1760) (out= 914)(deflated 48%)
adding: mypackage/WordCount1$IntSumReducer.class(in = 1762) (out= 749)(deflated 57%)
adding: mypackage/WordCount1$TokenizerMapper.class(in = 1759) (out= 762)(deflated 56%)
adding: log4j.properties(in = 476) (out= 172)(deflated 63%)

There are warnings, but they do not affect the result: the [path] warnings only say that a few jars listed on the classpath do not exist in this build, and the deprecation warning comes from using the old Job(Configuration, String) constructor (Job.getInstance(Configuration, String) is the non-deprecated 2.x replacement).

After compiling, let's run a quick test.

First generate some test data: while true; do seq 1 100000 >> tmpfile; done; and press Ctrl+C once the file is big enough.

Then put the data on HDFS: hadoop fs -put tmpfile /data/

Next, run the MapReduce program: hadoop jar WordCount.jar mypackage/WordCount1 /data/tmpfile /output2

The output looks like this:

14/07/15 13:26:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/15 13:26:03 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
14/07/15 13:26:05 INFO input.FileInputFormat: Total input paths to process : 1
14/07/15 13:26:05 INFO mapreduce.JobSubmitter: number of splits:6
14/07/15 13:26:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1405397597558_0003
14/07/15 13:26:06 INFO impl.YarnClientImpl: Submitted application application_1405397597558_0003
14/07/15 13:26:06 INFO mapreduce.Job: The url to track the job: http://ubuntu:8088/proxy/application_1405397597558_0003/
14/07/15 13:26:06 INFO mapreduce.Job: Running job: job_1405397597558_0003
14/07/15 13:26:20 INFO mapreduce.Job: Job job_1405397597558_0003 running in uber mode : false
14/07/15 13:26:20 INFO mapreduce.Job:  map 0% reduce 0%
14/07/15 13:26:34 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
Input lines: 0
14/07/15 13:26:48 INFO mapreduce.Job:  map 2% reduce 0%
Input lines: 3138474
14/07/15 13:26:51 INFO mapreduce.Job:  map 5% reduce 0%
14/07/15 13:26:54 INFO mapreduce.Job:  map 6% reduce 0%
14/07/15 13:26:55 INFO mapreduce.Job:  map 8% reduce 0%
14/07/15 13:26:57 INFO mapreduce.Job:  map 9% reduce 0%
14/07/15 13:26:58 INFO mapreduce.Job:  map 11% reduce 0%
14/07/15 13:27:00 INFO mapreduce.Job:  map 12% reduce 0%
14/07/15 13:27:01 INFO mapreduce.Job:  map 13% reduce 0%
Input lines: 23383595
14/07/15 13:27:05 INFO mapreduce.Job:  map 14% reduce 0%
Input lines: 23383595
14/07/15 13:27:23 INFO mapreduce.Job:  map 15% reduce 0%
14/07/15 13:27:27 INFO mapreduce.Job:  map 16% reduce 0%
14/07/15 13:27:28 INFO mapreduce.Job:  map 18% reduce 0%
14/07/15 13:27:30 INFO mapreduce.Job:  map 19% reduce 0%
14/07/15 13:27:31 INFO mapreduce.Job:  map 21% reduce 0%
14/07/15 13:27:34 INFO mapreduce.Job:  map 24% reduce 0%
Input lines: 38430301
14/07/15 13:27:37 INFO mapreduce.Job:  map 25% reduce 0%
14/07/15 13:27:40 INFO mapreduce.Job:  map 26% reduce 0%
Input lines: 42826322
14/07/15 13:27:57 INFO mapreduce.Job:  map 27% reduce 0%
14/07/15 13:28:00 INFO mapreduce.Job:  map 29% reduce 0%
14/07/15 13:28:02 INFO mapreduce.Job:  map 30% reduce 0%
14/07/15 13:28:03 INFO mapreduce.Job:  map 32% reduce 0%
Input lines: 54513531
14/07/15 13:28:05 INFO mapreduce.Job:  map 33% reduce 0%
14/07/15 13:28:06 INFO mapreduce.Job:  map 34% reduce 0%
14/07/15 13:28:08 INFO mapreduce.Job:  map 35% reduce 0%
14/07/15 13:28:09 INFO mapreduce.Job:  map 36% reduce 0%
Input lines: 60959081
14/07/15 13:28:22 INFO mapreduce.Job:  map 42% reduce 0%
14/07/15 13:28:30 INFO mapreduce.Job:  map 43% reduce 0%
14/07/15 13:28:31 INFO mapreduce.Job:  map 44% reduce 0%
14/07/15 13:28:34 INFO mapreduce.Job:  map 45% reduce 0%
14/07/15 13:28:35 INFO mapreduce.Job:  map 46% reduce 0%
Input lines: 69936159
14/07/15 13:28:37 INFO mapreduce.Job:  map 47% reduce 0%
14/07/15 13:28:38 INFO mapreduce.Job:  map 48% reduce 0%
14/07/15 13:28:41 INFO mapreduce.Job:  map 49% reduce 0%
14/07/15 13:28:44 INFO mapreduce.Job:  map 50% reduce 0%
Input lines: 77160461
14/07/15 13:29:01 INFO mapreduce.Job:  map 51% reduce 0%
14/07/15 13:29:04 INFO mapreduce.Job:  map 52% reduce 0%
14/07/15 13:29:05 INFO mapreduce.Job:  map 53% reduce 0%
Input lines: 83000373
14/07/15 13:29:07 INFO mapreduce.Job:  map 54% reduce 0%
14/07/15 13:29:09 INFO mapreduce.Job:  map 55% reduce 0%
14/07/15 13:29:10 INFO mapreduce.Job:  map 56% reduce 0%
14/07/15 13:29:13 INFO mapreduce.Job:  map 57% reduce 0%
14/07/15 13:29:16 INFO mapreduce.Job:  map 58% reduce 0%
Input lines: 93361766
14/07/15 13:29:32 INFO mapreduce.Job:  map 59% reduce 0%
Input lines: 98194696
14/07/15 13:29:35 INFO mapreduce.Job:  map 60% reduce 0%
14/07/15 13:29:37 INFO mapreduce.Job:  map 61% reduce 0%
14/07/15 13:29:38 INFO mapreduce.Job:  map 62% reduce 0%
14/07/15 13:29:40 INFO mapreduce.Job:  map 63% reduce 0%
14/07/15 13:29:41 INFO mapreduce.Job:  map 64% reduce 0%
14/07/15 13:29:44 INFO mapreduce.Job:  map 65% reduce 0%
14/07/15 13:29:48 INFO mapreduce.Job:  map 66% reduce 0%
Input lines: 109562184
14/07/15 13:30:04 INFO mapreduce.Job:  map 67% reduce 0%
Input lines: 113362818
14/07/15 13:30:06 INFO mapreduce.Job:  map 68% reduce 0%
14/07/15 13:30:08 INFO mapreduce.Job:  map 69% reduce 0%
14/07/15 13:30:10 INFO mapreduce.Job:  map 70% reduce 0%
14/07/15 13:30:12 INFO mapreduce.Job:  map 71% reduce 0%
14/07/15 13:30:15 INFO mapreduce.Job:  map 72% reduce 0%
Input lines: 123074119
14/07/15 13:30:32 INFO mapreduce.Job:  map 76% reduce 0%
14/07/15 13:30:33 INFO mapreduce.Job:  map 80% reduce 0%
14/07/15 13:30:34 INFO mapreduce.Job:  map 83% reduce 0%
14/07/15 13:30:35 INFO mapreduce.Job:  map 84% reduce 0%
Input lines: 123074119
14/07/15 13:30:37 INFO mapreduce.Job:  map 89% reduce 0%
14/07/15 13:30:38 INFO mapreduce.Job:  map 92% reduce 0%
14/07/15 13:30:39 INFO mapreduce.Job:  map 95% reduce 0%
14/07/15 13:30:40 INFO mapreduce.Job:  map 100% reduce 0%
Input lines: 123074119
14/07/15 13:30:53 INFO mapreduce.Job:  map 100% reduce 100%
14/07/15 13:30:53 INFO mapreduce.Job: Job job_1405397597558_0003 completed successfully
14/07/15 13:30:53 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=58256119
		FILE: Number of bytes written=66039749
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=724520133
		HDFS: Number of bytes written=1088895
		HDFS: Number of read operations=21
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=2
		Launched map tasks=8
		Launched reduce tasks=1
		Data-local map tasks=8
		Total time spent by all maps in occupied slots (ms)=1528715
		Total time spent by all reduces in occupied slots (ms)=17508
		Total time spent by all map tasks (ms)=1528715
		Total time spent by all reduce tasks (ms)=17508
		Total vcore-seconds taken by all map tasks=1528715
		Total vcore-seconds taken by all reduce tasks=17508
		Total megabyte-seconds taken by all map tasks=1565404160
		Total megabyte-seconds taken by all reduce tasks=17928192
	Map-Reduce Framework
		Map input records=123074119
		Map output records=123074119
		Map output bytes=1216795535
		Map output materialized bytes=7133406
		Input split bytes=594
		Combine input records=127374119
		Combine output records=4900000
		Reduce input groups=100000
		Reduce shuffle bytes=7133406
		Reduce input records=600000
		Reduce output records=100000
		Spilled Records=5500000
		Shuffled Maps =6
		Failed Shuffles=0
		Merged Map outputs=6
		GC time elapsed (ms)=39761
		CPU time spent (ms)=1397060
		Physical memory (bytes) snapshot=1797943296
		Virtual memory (bytes) snapshot=5082316800
		Total committed heap usage (bytes)=1398800384
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=724519539
	File Output Format Counters 
		Bytes Written=1088895
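
Once the job completes, the word counts can be checked directly on HDFS (a quick sanity check; with the single reducer used here, the results land in part-r-00000, one word and its count per line, separated by a tab):

hadoop fs -ls /output2
hadoop fs -cat /output2/part-r-00000 | head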

Appendix 1: WordCount1.java and CounterThread.java

// WordCount1.java
package mypackage;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount1 {
	public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{

		private final static IntWritable one = new IntWritable(1);  // constant 1 emitted for every word
		private Text word = new Text();                              // reusable Text holding the current word

		public void map(Object key, Text value, Context context
				) throws IOException, InterruptedException {
			StringTokenizer itr = new StringTokenizer(value.toString());  // split the input line into words
			while (itr.hasMoreTokens()) {
				word.set(itr.nextToken());                                  // set word to the current token
				context.write(word, one);                                   // emit the (word, 1) pair
			}
			//System.out.println("read lines:"+context.getCounter("org.apache.hadoop.mapred.Task$Counter","MAP_INPUT_RECORDS").getValue());
			//System.out.println( "Input lines: " + context.getCounters().findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS").getValue() );
			//System.out.println( "Input lines: " + context.getCounters().findCounter("", "MAP_INPUT_RECORDS").getValue() );
		}
	}

	public static class IntSumReducer 
	extends Reducer<Text,IntWritable,Text,IntWritable> { 
		private IntWritable result = new IntWritable();                 // reusable IntWritable holding each reduce result

		public void reduce(Text key, Iterable<IntWritable> values, 
				Context context
				) throws IOException, InterruptedException {
			int sum = 0;                                                 // running total for this key
			for (IntWritable val : values) {
				sum += val.get();                                          // accumulate every value for this key
			}
			result.set(sum);                                              // store the total in result
			context.write(key, result);                                   // emit the (key, total) pair
		}
	}

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		//String[] newArgs = new String[]{"hdfs://localhost:9000/data/tmpfile","hdfs://localhost:9000/data/wc_output"};
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length != 2) {
			System.err.println("Usage: wordcount <in> <out>");
			System.exit(2);
		}
		Job job = new Job(conf, "WordCount1");                  // create a new job
		job.setJarByClass(WordCount1.class);
		job.setMapperClass(TokenizerMapper.class);              // mapper class
		job.setCombinerClass(IntSumReducer.class);              // combiner class
		job.setReducerClass(IntSumReducer.class);               // reducer class
		job.setOutputKeyClass(Text.class);                       // output key type
		job.setOutputValueClass(IntWritable.class);              // output value type
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));  // input/output paths taken from the command line
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
		
		CounterThread ct = new CounterThread(job);
		ct.start();
		
		job.waitForCompletion(true);
		
		System.exit(0);
		//System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}

// CounterThread.java
package mypackage;

import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;

public class CounterThread extends Thread{
	
	public CounterThread(Job job) {
		_job = job;
	}
	
	public void run() {
		// Poll the job every 5 seconds and print how many map input records have
		// been processed so far. The loop never exits on its own; main() ends the
		// process with System.exit() once the job is done.
		while (true) {
			try {
				Thread.sleep(1000 * 5);
			} catch (InterruptedException e1) {
				e1.printStackTrace();
			}
			try {
				if (_job.getStatus().getState() == JobStatus.State.RUNNING)
					System.out.println("Input lines: " + _job.getCounters().findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS").getValue());
			} catch (IOException e) {
				e.printStackTrace();
			} catch (InterruptedException e) {
				e.printStackTrace();
			}
		}
	}
	
	private Job _job;
}
