Hadoop ecosystem: Python + MapReduce + wordcount


Start the Hadoop processes
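Hadoop's daemons are typically started with `sbin/start-dfs.sh` and `sbin/start-yarn.sh`. The JDK's `jps` tool lists running Java processes as `<pid> <main class>` lines, so you can check that everything came up by parsing its output. Below is a minimal sketch. It assumes a single-node setup where NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager should all be running. `missing_daemons` is a hypothetical helper name.

```python
# Assumed daemon set for a single-node HDFS + YARN setup; adjust for your cluster.
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}


def missing_daemons(jps_output, expected=EXPECTED):
    """Return, sorted, the expected daemon names absent from `jps` output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # each jps line looks like "<pid> <MainClassName>"
        if len(parts) >= 2:
            running.add(parts[1])
    return sorted(expected - running)
```

To use it, pass `subprocess.run(["jps"], capture_output=True, text=True).stdout` to `missing_daemons`. An empty list means every expected daemon is running.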

Upload the files

hdfs dfs -put /home/hadoop/hadoop/input /user/hadoop/input 
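You can also script the same upload from Python by shelling out to the `hdfs` CLI. This is a minimal sketch that assumes `hdfs` is on `PATH`. `hdfs_put_argv` and `put_to_hdfs` are hypothetical helper names, and the only subcommands invoked are the `-put` and `-ls` already used in this post.

```python
import subprocess


def hdfs_put_argv(local_path, hdfs_path):
    """Build the argument list for `hdfs dfs -put <local> <hdfs>`."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


def put_to_hdfs(local_path, hdfs_path):
    """Upload a local file or directory, then return a listing of the destination.

    check=True raises CalledProcessError if either command exits non-zero,
    e.g. when -put refuses to overwrite an existing destination.
    """
    subprocess.run(hdfs_put_argv(local_path, hdfs_path), check=True)
    listing = subprocess.run(["hdfs", "dfs", "-ls", hdfs_path],
                             check=True, capture_output=True, text=True)
    return listing.stdout
```

For example, `print(put_to_hdfs("/home/hadoop/hadoop/input", "/user/hadoop/input"))` reproduces the command above and shows the result.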

List what is currently in HDFS

[hadoop@master0 hadoop]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2019-12-04 02:17 /user


Check that the uploaded files are correct


Run the program to count how many times each string occurs
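The job counts how often each token occurs. Entries such as `work,` in the output suggest that text is split on whitespace only, so punctuation stays attached to words. An in-memory equivalent is handy for checking results on a small sample. This is a sketch built on `collections.Counter`, not the code Hadoop runs, and `word_count` is a hypothetical name.

```python
from collections import Counter


def word_count(lines):
    """Count whitespace-separated tokens across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # str.split() with no argument splits on runs of whitespace,
        # so punctuation such as the comma in "work," stays in the token
        counts.update(line.split())
    return counts
```

For example, `word_count(["the worker would work,", "the worker"])` counts `the` and `worker` twice. It keeps `work,` as its own entry, separate from any plain `work`.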

 

View the output

[hadoop@master0 hdfs]$ hdfs dfs -cat /user/hadoop/output/*
work,	63
worker	315
would	62
write-operations.	62
written	62
...
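Each result line holds a word and its count, separated by a tab. This is the default layout of Hadoop's text output. To see the most frequent words without scrolling through the whole listing, you can parse the lines and sort them by count. This is a sketch. `top_n` is a hypothetical name, and the script is assumed to be fed with something like `hdfs dfs -cat /user/hadoop/output/* | python3 <script>`.

```python
def top_n(lines, n=10):
    """Return the n (word, count) pairs with the highest counts.

    Lines are expected to look like "word<TAB>count"; lines without a tab
    or with a non-integer count are skipped. Ties are broken alphabetically.
    """
    pairs = []
    for line in lines:
        try:
            word, count = line.rstrip("\n").split("\t", 1)
            pairs.append((word, int(count)))
        except ValueError:
            continue
    # highest count first, then alphabetical order for equal counts
    pairs.sort(key=lambda p: (-p[1], p[0]))
    return pairs[:n]
```

Inside such a script, `for word, count in top_n(sys.stdin): print("%s\t%d" % (word, count))` prints the ten most common words.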

Write the MapReduce job in Python and run it through Hadoop Streaming

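#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Software: PyCharm
# @virtualenv: workon
# @contact: 1040691703@qq.com
# @Desc: mapper.py, emits "<word>\t1" for every word read from stdin
__author__ = '未昔/AngelFate'
__date__ = '2019/12/4 20:20'

import sys

# Hadoop Streaming feeds each input line to the mapper on stdin
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        # key and value separated by a tab, the Streaming default
        print("%s\t%s" % (word, 1))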
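Moving the per-line logic of the mapper into a function lets you test it without Hadoop. Below is a sketch of the same logic as a generator. `map_words` and `main` are hypothetical names. To use this as the actual mapper script, you would call `main()` under an `if __name__ == "__main__":` guard at the bottom of the file.

```python
import sys


def map_words(lines):
    """Yield "word<TAB>1" for every whitespace-separated word in lines."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t%s" % (word, 1)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming supplies input on stdin and collects records from stdout
    for record in map_words(stdin):
        stdout.write(record + "\n")
```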
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Software: PyCharm
# @virtualenv: workon
# @contact: 1040691703@qq.com
# @Desc: reducer.py, sums the counts per word (input arrives sorted by word)
__author__ = '未昔/AngelFate'
__date__ = '2019/12/4 20:25'

import sys

current_word = None  # previous word, used to detect when the key changes
current_count = 0

for line in sys.stdin:
    line = line.strip()
    try:
        word, count = line.split('\t', 1)
        count = int(count)
    except ValueError:
        # skip malformed lines: no tab, or a count that is not an integer
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word is not None:
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

# emit the total for the last word once stdin is exhausted
if current_word is not None:
    print("%s\t%s" % (current_word, current_count))
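The reducer works only because Hadoop sorts the mapper output by key before the reduce phase. This guarantees that all lines for one word arrive one after another. The sketch below imitates that map, sort, reduce pipeline in memory, which helps when checking the logic on a small sample before submitting a job. It uses `itertools.groupby` in place of the manual `current_word` bookkeeping. `map_phase`, `reduce_phase` and `simulate_streaming` are hypothetical names. On the command line, the same check is often done with `cat file | python3 mapper.py | sort | python3 reducer.py`.

```python
from itertools import groupby


def map_phase(lines):
    """Mapper logic: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(sorted_pairs):
    """Reducer logic: sum the counts of consecutive pairs sharing a word."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def simulate_streaming(lines):
    # sorted() stands in for Hadoop's shuffle-and-sort step; without it,
    # groupby would split occurrences of one word into several groups
    return list(reduce_phase(sorted(map_phase(lines))))
```

Skipping the sort shows why that step matters. Passing `["a b a"]` through `map_phase` and then directly to `reduce_phase` returns `a` twice, each with a count of 1.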
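Hadoop refuses to write into an output directory that already exists, so point -output at a fresh path instead of the input directory (output-streaming below is only an example name):

[hadoop@master0 hadoop]$ bin/hadoop jar share/hadoop/tools/lib/****.jar \
    -file mapper.py -mapper "python mapper.py" \
    -file reducer.py -reducer "python reducer.py" \
    -input /user/hadoop/input -output /user/hadoop/output-streaming

[hadoop@master0 hadoop]$ hadoop fs -cat output-streaming/part-00000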
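If you resubmit the job often, building the streaming command in Python keeps the paths in one place. This sketch assumes it runs from the Hadoop install directory, as in the shell session above. The jar path is a parameter because the post elides its file name. `streaming_argv` and `run_job` are hypothetical helpers, and the flags are the ones the post's own command uses.

```python
import subprocess


def streaming_argv(jar, input_dir, output_dir,
                   mapper="mapper.py", reducer="reducer.py"):
    """Build a Hadoop Streaming command line with the flags used in the post."""
    return [
        "bin/hadoop", "jar", jar,
        "-file", mapper, "-mapper", "python " + mapper,
        "-file", reducer, "-reducer", "python " + reducer,
        "-input", input_dir,
        "-output", output_dir,  # must not exist yet, or the job is rejected
    ]


def run_job(jar, input_dir, output_dir):
    # check=True raises CalledProcessError if hadoop exits with a non-zero status
    subprocess.run(streaming_argv(jar, input_dir, output_dir), check=True)
```

Passing the argument list to `subprocess.run` directly, rather than a shell string, avoids quoting problems with the `"python mapper.py"` values.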

(API), 62 (C++, 62 (FUSE) 62 (HDFS) 62 (HDFS). 63 (JAR) 63 (JRE) 63 (RPC) 62 (SPoF), 62 (a,b,c) 62 (a,b,c), 62 (but 62 (either 63 (more 63 (ssh) 63 (the 62 (typically 62 (webapp) 62 (x,y,z) 62 (x,y,z). 62 1.6 63 2.0 62 2012,[66] 62 3, 62 3rd-party 62 A 369 API 124 ARchive 63 An 62 Append. 62 B 124 Because 62 C#, 62 Clients 62 Cocoa, 62 Common 126 Data 62 DataNode 126 DataNode. 63 Distributed 63 Each 62 Environment 63 Erlang, 62 Failure 62 Federation, 62 File 182 Filesystem 62 For 119 HDFS 732 HDFS, 62 HDFS-UI 62 HDFS. 62 HTTP, 62 Hadoop 986 Hadoop-compatible 63 Hadoop. 63 Haskell, 62 I/O 62 In 121 It 62 Java 312 Java, 62 Job 63 JobTracker 63 Linux 62 MapReduce 126 MapReduce/MR1 63 May 62 Moreover, 62 NameNode 126 NameNode) 62 NameNode, 189 OCaml), 62 OS 63 PHP, 62 POSIX 124 POSIX-compliant 62 POSIX-compliant, 62 Perl, 62 Point 62 Python, 62 RAID 124 Ruby, 62 Runtime 63 Secure 63 Shell 63 Similarly, 63 Single 62 Smalltalk, 62 Some 62 System 63 TCP/IP 62 Task 63 TaskTracker, 63 The 552 These 125 This 187 Thrift 62 Tracker, 126 Unix 62 Userspace 62 Web 62 When 125 With 62 YARN/MR2)[58] 63 a 1994 abstractions, 63 access 62 achieved 62 achieves 62 across 250 actions, 62 acts 63 added 62 addition, 62 advantage 124 aims 62 allowing 62 also 62 alternate 63 although 62 always 62 amount 62 an 187 and 1747 and, 63 announced 62 application 124 application. 62 applications 63 applications. 63 approach 63 architecture 63 are 437 around, 62 as 249 automatic 62 available 62 available. 125 awareness 124 awareness: 63 backbone 63 backup 62 backup. 62 be 497 because 62 become 62 been 62 between 187 block 62 blocks 62 both 63 bottleneck 124 browsed 62 builds 62 but 62 by 249 call 62 can 561 capabilities, 62 certain 62 checkpointed 62 choosing 62 client 124 cluster 376 cluster, 63 code 63 command-line 62 commands 62 communicate 62 communication. 
62 compliance 62 compute-only 63 concurrent 62 configurations 62 connects 62 consider 62 consists 126 contains 187 copies 62 corruption 63 create 62 criticality. 62 data 934 data, 62 data-intensive 62 data-only 63 data. 63 datanode 62 datanodes, 62 dedicated 63 default 62 demonstrated 62 designed 62 developing 62 differ 62 different 62 directly 62 directories. 62 directory 124 distributed 124 distributed, 62 does 124 due 124 each 124 edit 62 effective 63 engine 63 entire 62 equivalents. 63 especially 62 every 63 example: 62 execute 63 extent 62 fact, 62 fail 62 fail-over. 62 failed 62 failing 63 failure; 63 failures 63 file 810 file-system 249 file-system-specific 63 files 125 files, 62 files. 62 files[65] 62 for 869 framework. 62 from 62 fully 124 generate 125 gigabytes 62 goals 62 goes 124 hardware 63 has 186 have 125 having 124 hence 62 high-availability 62 high. 62 higher. 63 host 63 hosts 62 hosts, 62 huge 124 if 125 images 62 immutable 62 impact 125 in 560 inability 62 includes 125 incorrectly 62 increase 62 increased 62 index, 63 information 63 information, 62 instead 62 interface 62 interface, 62 interpret 62 is 622 is, 63 is. 63 issue, 62 issues 62 it 187 its 124 job 249 job-completion 62 jobs 62 jobs. 62 journal 62 keep 62 lack 62 language 62 large 124 larger 63 letting 62 level 63 libraries.A 6 libraries.File 6 libraries.For 6 libraries.HDFS 15 libraries.Hadoop 16 libraries.In 4 libraries.The 9 local 62 location 63 location. 62 log 62 loss 63 machines. 62 main 62 manage 63 managed 63 management 62 manually 62 map 186 master 126 may 62 memory 63 metadata 124 metadata, 62 method 63 methods 62 might 62 misleading 62 mostly 62 mounted 62 mounted,[62] 62 move 62 multi-node 63 multiple 312 name 125 namely, 62 namenode 496 namenode's 125 namenode, 62 namenodes. 62 namespaces 62 native 62 necessary 63 needed 63 network 249 new 62 node 500 nodes 251 nodes. 
189 nodes: 62 nominally 62 non-POSIX 62 nonstandard 63 normally 63 not 310 number 124 occurs, 63 of 1622 offline. 62 on 622 one 125 only 63 operations 62 options 62 or 562 other 248 other. 62 outage 63 over 248 package 63 package, 63 perform 124 performance 124 plus 62 point 62 portable 62 possible 63 power 63 precisely, 63 preventing 63 prevents 62 primary 248 problem 62 problem, 62 procedure 62 programming 62 project 62 protocol 62 provide 125 provides 63 rack 126 rack, 62 rack. 62 rack/switch 63 racks. 63 range 62 rebalance 62 reduce 249 reduces 125 redundancy 125 regularly 62 release 62 reliability 62 remain 63 remote 124 replaced 63 replay 62 replicating 125 replication 124 request. 62 require 125 requirements 62 requires 63 requiring 62 restart 62 running 62 same 125 saves 62 scalability 62 scalable, 62 scheduled 62 schedules 124 scheduling 126 scripts 126 secondary 250 separate 62 served 62 server 188 serves 62 set 63 shell 62 should 63 shutdown 63
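The counts above treat `HDFS`, `HDFS,` and `HDFS.` as different words because the mapper splits on whitespace only. If you want them merged, the mapper could normalize each token first. This sketch shows one possible normalization: lowercasing, and trimming leading and trailing punctuation with `str.strip(string.punctuation)`. It is an optional variant, not what produced the output above. `normalize` and `map_normalized` are hypothetical names.

```python
import string


def normalize(token):
    """Lowercase a token and trim punctuation from both ends.

    Inner punctuation is kept, so "write-operations." becomes
    "write-operations" and "(HDFS)." becomes "hdfs".
    """
    return token.strip(string.punctuation).lower()


def map_normalized(lines):
    """Like the mapper, but emits normalized words and skips empty ones."""
    for line in lines:
        for token in line.split():
            word = normalize(token)
            if word:  # a token made only of punctuation normalizes to ""
                yield "%s\t%s" % (word, 1)
```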
Publisher: 全栈程序员-用户IM. When reposting, please credit the source: https://javaforall.cn/143518.html Original link: https://javaforall.cn