Hadoop Ecosystem: Python + MapReduce + WordCount




Start the Hadoop processes
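On a typical installation this step means running the sbin scripts (a minimal sketch, assuming they are on the PATH; jps afterwards lists the Java daemons to confirm they are up):

start-dfs.sh
start-yarn.sh
jps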

Upload the input files

hdfs dfs -put /home/hadoop/hadoop/input /user/hadoop/input 

Check which files now exist on HDFS

[hadoop@master0 hadoop]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2019-12-04 02:17 /user


Verify that the uploaded files are correct

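The same check can be made from the command line (a minimal sketch; the paths follow the -put command above):

hdfs dfs -ls /user/hadoop/input
hdfs dfs -cat /user/hadoop/input/* | head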

Run the job to count how many times each word appears (the full streaming command is shown in the last section below).

 

View the output

[hadoop@master0 hdfs]$ hdfs dfs -cat /user/hadoop/output/*
work,    63
worker    315
would    62
write-operations.    62
written    62
...

Write the mapper and reducer and wire them into Hadoop Streaming. Both are plain scripts: the mapper reads lines from stdin and emits tab-separated (word, 1) pairs, the framework sorts those pairs by key, and the reducer sums the counts for each word.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Software: PyCharm
# @contact: 1040691703@qq.com
# @Desc: mapper -- emit "<word>\t1" for every word read from stdin
__author__ = '未昔/AngelFate'
__date__ = '2019/12/4 20:20'
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print("%s\t%s" % (word, 1))
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Software: PyCharm
# @contact: 1040691703@qq.com
# @Desc: reducer -- sum the counts for each word; relies on the streaming
#        framework having sorted the mapper output by key
__author__ = '未昔/AngelFate'
__date__ = '2019/12/4 20:25'
import sys

current_word = None   # the previous word, used for comparison
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

# flush the total for the final word
if word == current_word:
    print("%s\t%s" % (word, current_count))
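Before submitting anything to the cluster, the pair can be tested locally; sort stands in for the shuffle-and-sort step that Hadoop performs between map and reduce (a minimal sketch, with arbitrary sample input):

echo "foo foo bar quux foo" | python3 mapper.py | sort -k1,1 | python3 reducer.py

This should print bar, foo and quux with counts 1, 3 and 1.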
[hadoop@master0 hadoop]$ bin/hadoop jar \
    share/hadoop/tools/lib/****.jar \
    -file mapper.py -mapper "python mapper.py" \
    -file reducer.py -reducer "python reducer.py" \
    -input /user/hadoop/input -output /user/hadoop/output
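The ****.jar placeholder is the Hadoop Streaming jar that ships under share/hadoop/tools/lib. Note that Hadoop refuses to start a job whose output directory already exists, so a re-run is usually preceded by something like:

hdfs dfs -rm -r /user/hadoop/output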

[hadoop@master0 hadoop]$ hadoop fs -cat output/part-00000

(excerpt of the word-count output; the full listing has one line per distinct token)
(API),    62
(C++,    62
(FUSE)    62
(HDFS)    62
(JAR)    63
...
A    369
API    124
DataNode    126
HDFS    732
Hadoop    986
Java    312
MapReduce    126
NameNode    126
The    552
...
and    1747
data    934
file    810
for    869
namenode    496
node    500
of    1622
or    562
...
secondary    250
server    188
shutdown    63