Spark Quick Start Series (3): Understanding RDDs in Depth

RDD in Depth

Objective

Understand the internal logic of an RDD and its internal properties (what an RDD is made of).

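Case Study

Requirement

Given a website's access records, commonly called an access log,
compute the distinct IPs that appear in it and the number of visits from each.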
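Before turning to Spark, the requirement can be sketched in plain Python to make the expected result concrete. This is only an illustration of the logic, not Spark code: the function name count_ips and the shortened sample lines are assumptions made for this sketch, and the three steps are labeled with the RDD operations they loosely correspond to (map, reduceByKey, sortBy).

```python
from collections import Counter

# Shortened, illustrative lines in the same access-log layout:
# the client IP is the first whitespace-separated field.
sample_lines = [
    '181.64.62.158 - - [01/Nov/2017:00:02:45 +0000] "GET /a HTTP/1.1" 200 133',
    '122.54.153.240 - - [01/Nov/2017:00:03:00 +0000] "GET /b HTTP/1.1" 200 133',
    '181.64.62.158 - - [01/Nov/2017:00:03:16 +0000] "GET /c HTTP/1.1" 200 133',
    '',  # blank lines are skipped
]


def count_ips(lines):
    """Return (ip, visits) pairs, most frequent IP first."""
    # "map" step: keep only the IP field of each non-blank line
    ips = (line.split()[0] for line in lines if line.strip())
    # "reduceByKey" step: add up the occurrences of each IP
    counts = Counter(ips)
    # "sortBy" step: order by visits descending, then by IP for stable output
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


for ip, visits in count_ips(sample_lines):
    print(ip, visits)
```

The number of distinct IPs is simply the length of the returned list. A Spark version would distribute the same three steps across partitions instead of running them in a single process.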

Create a data file named access_log_sample.txt (the full data set is too large to include here, so 100 records are used for now):

190.217.63.59 - - [01/Nov/2017:00:00:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
76.114.21.96 - - [01/Nov/2017:00:00:31 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//tricolor.entravision.com/sacramento/escucha-en-vivo/&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
206.126.121.204 - - [01/Nov/2017:00:00:46 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//zone.msn.com/gameplayer/gameplayer.aspx%3Fgame%3Dfamilyfeud&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
154.121.8.18 - - [01/Nov/2017:00:01:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.google.dz%2Fsearch&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
190.238.37.217 - - [01/Nov/2017:00:01:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
147.147.163.182 - - [01/Nov/2017:00:01:31 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fs-usweb.dotomi.com%2Frenderer%2FdelPublishersCookies.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0"
200.78.93.132 - - [01/Nov/2017:00:01:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/login/device-based/regular/login/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.200.173.170 - - [01/Nov/2017:00:01:59 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/glade.js&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.252.185.4 - - [01/Nov/2017:00:02:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.google.cm%2Fblank.html&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0"
190.90.22.125 - - [01/Nov/2017:00:02:29 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.raicesdeeuropa.com/grandes-obras-de-los-principales-escritores-nacidos-durante-el-siglo-xix/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.62.158 - - [01/Nov/2017:00:02:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//bancaporinternet.interbank.com.pe/Warhol/redireccionaInicioLogueo&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
122.54.153.240 - - [01/Nov/2017:00:03:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
181.64.62.158 - - [01/Nov/2017:00:03:16 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.239.8 - - [01/Nov/2017:00:03:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Frlz%3D1C2AOHY_esPE760PE760%26source%3Dhp%26ei%3DUw_5WeGVA4TjmAHO8aCgDw%26q%3Dfb%26oq%3Dfb%26gs_l%3Dpsy-ab.3..0i131k1j0l4j0i131k1l2j0l3.1767.1916.0.2135.2.2.0.0.0.0.144.269.0j2.2.0....0...1.1.64.psy-ab..0.2.267....0.pWGbpZy6zwg%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
190.110.200.41 - - [01/Nov/2017:00:03:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/rsrc.php/v3i0KB4/ye/l/es_LA/G6VcGRK_54X.js&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
77.180.73.169 - - [01/Nov/2017:00:04:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//gomovies.co/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.146.42.248 - - [01/Nov/2017:00:04:22 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//mm-a.akamaihd.net/160/sn/assets/common/3d/particle/ns2/texture/line_040.dxt%3Fv%3D25960&cat=content-delivery-network HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
181.64.146.165 - - [01/Nov/2017:00:04:39 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/rsrc.php/yR/r/lvSDckxyoU5.ogg&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"
201.240.33.214 - - [01/Nov/2017:00:04:55 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.savefrom.net/%23url%3Dhttp%3A//youtube.com/watch%3Fv%3Dgr_3VrQC8qY%26utm_source%3Dyoutube.com%26utm_medium%3Dshort_domains%26utm_campaign%3Dssyoutube.com&cat=software-download HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.56.58 - - [01/Nov/2017:00:05:10 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//scontent.flim5-4.fna.fbcdn.net/v/t1.0-1/p32x32/22310580_351017335344058_8554274362948717253_n.jpg%3Foh%3D5da979568a22e425b79b7ba788dbc30a%26oe%3D5A65BCC3&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.192.238 - - [01/Nov/2017:00:05:26 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/%3Fgws_rd%3Dssl&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.255.225.35 - - [01/Nov/2017:00:05:41 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.ar/search%3Fq%3D886971865721%26oq%3D886971865721%26aqs%3Dchrome..69i57.719j0j7%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.211.197.246 - - [01/Nov/2017:00:05:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.netflix.com/logout%3Flocale%3Des-EC&cat=media-streaming HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.3.230.121 - - [01/Nov/2017:00:06:11 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//baixar.programanex.com.br/latest/setup_nex.exe&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
175.158.226.85 - - [01/Nov/2017:00:06:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/%3F_rdc%3D1%26_rdr&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.151.60.116 - - [01/Nov/2017:00:06:43 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//www.edhelper.com/edhelper_monthly.htm&cat=educational-institution HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.157.88 - - [01/Nov/2017:00:06:58 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//auth.kaybo1.com/member/login.html%3Fback_url%3Dhttp%3A//pb.kaybo1.com/event/evt20170301_event/event01.html&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
75.73.28.212 - - [01/Nov/2017:00:07:13 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.43.170.133 - - [01/Nov/2017:00:07:29 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.121.250.170 - - [01/Nov/2017:00:07:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//r1---sn-5mncvap8p5-a2ce.googlevideo.com/generate_204&cat=media-streaming HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
190.237.183.6 - - [01/Nov/2017:00:08:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//musicaq.biz/song.php%3Fid%3DQ2hpbml0byBEZWwgQW5kZSAtICAgIFByaW1pY2lhfGh0dHBzOi8vYXBpLnNvdW5kY2xvdWQuY29tL3RyYWNrcy83MjkxNjE2NS9zdHJlYW0%252FY2xpZW50X2lkPTBmOGZkYmJhYTIxYTliZDE4MjEwOTg2YTdkYzJkNzJj&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.67.2.102 - - [01/Nov/2017:00:08:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//lcperu.edestinos.com.pe/check-in-online&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.121.218.21 - - [01/Nov/2017:00:08:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.4kdownload.com/buy/videodownloader%3Fsource%3Dvideodownloader%26redirect-locale%3Des%26ui_source%3Dshow-on-run-3&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
76.23.172.162 - - [01/Nov/2017:00:08:48 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//open.spotify.com/&cat=music HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.101.27 - - [01/Nov/2017:00:09:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
114.186.152.178 - - [01/Nov/2017:00:09:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.234.56.200 - - [01/Nov/2017:00:09:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//toroadvertisingmedia.com/cr%3Fb%3D218558%26p%3D7550%26c%3D6608%26h%3D0d7386ae207d128d276c8fc974f8f99b%26l%3DCO%26tz%3D-5.0%26sh%3D768.0%26sw%3D1360.0%26ad.trans.id%3Dwzj9mrkhearh%26t%3D1509494794724%26u%3Dhttps%253A%252F%252Fwww.popcornvod.com%252Fwelcome.html%253Faff%253D4054%2526theme%253D0922%2526clickid%253DOCM2NjA4IzI0MyM3NTUwfDIxODU1OHxDT3wzfDF8fHx3emo5bXJraGVhcmh8fHw%2526pub%253D1400%2526sub_pub_id%253D&cat=unknown HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.233.78.10 - - [01/Nov/2017:00:09:52 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.flvto.biz/es/downloads/mp3/yt_5S-Fjz5CR5s/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.46.172.102 - - [01/Nov/2017:00:10:07 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//jkanime.net/kimi-ni-todoke-2/5/&cat=entertainment-and-art HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.148.47.237 - - [01/Nov/2017:00:10:24 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2800.0 Iron Safari/537.36"
182.251.246.12 - - [01/Nov/2017:00:10:39 +0000] "GET /webapi/getcategory?uri=yakusoku.cocoloni.jp&cat=society HTTP/1.1" 200 60 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
181.234.203.122 - - [01/Nov/2017:00:10:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//bonusbitcoin.co/faucet&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
103.4.190.242 - - [01/Nov/2017:00:11:10 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//s1-word-edit-15.cdn.office.net/we/s/1687297775_App_Scripts/2057/WordEditor.Wac.TellMeModel.js&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
72.182.173.74 - - [01/Nov/2017:00:11:23 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=http%3A//store.steampowered.com/agecheck/app/744640/&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.179.100.64 - - [01/Nov/2017:00:11:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//swx.cdn.skype.com/assets/v/0.0.300/audio/m4a/call-dialing.m4a&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
74.71.124.140 - - [01/Nov/2017:00:11:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//my.netzero.net/s/sp&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
94.189.216.28 - - [01/Nov/2017:00:12:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.nba.com/&cat=sport HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
187.131.9.222 - - [01/Nov/2017:00:12:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.google.com.mx/%3Fgfe_rd%3Dcr%26dcr%3D0%26ei%3DWRH5WeezN-bo8AeEoo7oDw&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
179.7.171.84 - - [01/Nov/2017:00:12:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
200.106.89.161 - - [01/Nov/2017:00:12:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.pnp.gob.pe%2Fadmision_EESTP_PNP%2Fprospecto_proceso_admision_ETSPNP_2017_II.pdf&cat=government HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0"
187.222.252.169 - - [01/Nov/2017:00:13:05 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//s1-word-view-15.cdn.office.net/wv/s/1687297775_resources/3082/progress16.gif&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.146.42.248 - - [01/Nov/2017:00:13:22 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//dquchx93qmjdu.cloudfront.net/s3/resources/sound/common/pickweapon_69eea0cef175a3faa11eca989f346a4c.mp3&cat=content-delivery-network HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
189.181.11.35 - - [01/Nov/2017:00:13:38 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//adexc.net/network/%3Fref_prm%3D28401%26clck%3Db0ajqvw8zzni%26pub_sd%3DM82IMGZFR%26ad_spv%3D549&cat=botnet HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
190.186.200.125 - - [01/Nov/2017:00:13:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fes-la.facebook.com%2F&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
190.171.208.228 - - [01/Nov/2017:00:14:11 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//chiquitests.com/enchinan/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
75.86.115.195 - - [01/Nov/2017:00:14:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; CrOS x86_64 9765.85.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.123 Safari/537.36"
201.240.33.221 - - [01/Nov/2017:00:14:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//cf-media.sndcdn.com/OaJxdnP5Fsen.128.mp3%3FPolicy%3DeyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vT2FKeGRuUDVGc2VuLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MDk0OTU2Nzh9fX1dfQ__%26Signature%3DSjqtGj2LWI9SCvgiIzNXs4M7P7eA-OCfi%7E%7EMwNzxFQ-Pft1DLkoDuUx1vnqf0JC0BGKRegqep0hiMxiJMUUBVLYzEtZq0jZFZKz90zO8lyfvOG38vwnbUj68Jcpb6PTTvwLK1lK9Oo8RA1DSQ-NmA1v1yj8N0DQBZmEF2RXRbmXxgh7kSledHq2OFfQ1Im-OLJyvFEH2Mq-4c3YruyvdxSPxBOkp81CL53ceEm9oAYNThc-7HXv5LPbqB%7EOrcjqXi0VihyE4MSoIou08%7E3sZBNTpq2fB4RhP8TnoNblAQtWsPMEj%7EhXTX9cJ3WrOvb9k67DV3HKf0RYfpiX-jFTfog__%26Key-Pair-Id%3DAPKAJAGZ7VMH2PFPW6UQ&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
202.151.22.3 - - [01/Nov/2017:00:15:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_WebFilter&ver=0.19.6.9&url=http%3A%2F%2Fwww.fijitimes.com%2Fstory.aspx&cat=news-and-media HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
181.176.73.81 - - [01/Nov/2017:00:15:16 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/login.php%3Flogin_attempt%3D1%26lwv%3D110&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.208.200 - - [01/Nov/2017:00:15:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fsafe%3Dstrict%26hl%3Des%26biw%3D1366%26bih%3D662%26tbm%3Disch%26sa%3D1%26ei%3DWxD5WebYNIj4wASqorPIDw%26q%3Dcontribucion%26oq%3Dcon%26gs_l%3Dpsy-ab.1.1.0i67k1l5j0l5.447088.451222.0.455291.37.12.0.0.0.0.394.1792.2-4j2.7.0....0...1.1.64.psy-ab..31.5.1471.0..0i30k1.248.GHQlbsuDZcQ%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.137.144.93 - - [01/Nov/2017:00:15:48 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.ar.avon.com/REPSuite/orderEntry.page%3Fredirected%3Dtrue%26isSuccess%3DY&cat=shopping HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
49.148.209.194 - - [01/Nov/2017:00:16:03 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
73.213.34.16 - - [01/Nov/2017:00:16:18 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//swx.cdn.skype.com/assets/v/0.0.300/audio/m4a/call-outgoing-p1.m4a&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.240.247.104 - - [01/Nov/2017:00:16:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fq%3Dcomo+prepara+par+ahacer+una+mascara+de+pantomima%26rlz%3D1C1NHXL_esPE709PE709%26oq%3Dcomo+prepara+par+ahacer+una+mascara+de+pantomima%26aqs%3Dchrome..69i57.19536j0j7%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.130.189.170 - - [01/Nov/2017:00:16:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.123rf.com/imagenes-de-archivo/ombligo.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.204.104.89 - - [01/Nov/2017:00:17:04 +0000] "GET /webapi/getcategory?uri=www.google.co.ve&cat=search-engine HTTP/1.1" 200 67 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
190.42.233.34 - - [01/Nov/2017:00:17:20 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.136.98.155 - - [01/Nov/2017:00:17:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//musicaq.biz/descargar-musica/9f352ef6-santana-the-game-of-love-ft-michelle-branch.html&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
147.147.163.182 - - [01/Nov/2017:00:17:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.worldtimebuddy.com%2F&cat=unknown HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0"
190.232.70.238 - - [01/Nov/2017:00:18:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//sv3.onlinevideoconverter.com/download%3Ffile%3De4c2d3a0e4a0c2&cat=adult-and-pornography HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.148.209.194 - - [01/Nov/2017:00:18:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.roblox.com/&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.170.192.60 - - [01/Nov/2017:00:18:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//encrypted-tbn0.gstatic.com/images%3Fq%3Dtbn%3AANd9GcQcdjN8-1NJnSeC6ptIlx7S0wZucgg1jzL4N-i7IWE_8o8-F0gmjw&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
161.18.215.235 - - [01/Nov/2017:00:18:50 +0000] "GET /webapi/getcategory?uri=www.wattpad.com&cat=personal-site-and-blog HTTP/1.1" 200 75 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
138.36.222.166 - - [01/Nov/2017:00:19:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.230.112.110 - - [01/Nov/2017:00:19:20 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//civilgeeks.com/categor%25C3%25ADa/hidraulica/&cat=personal-site-and-blog HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.234.49.7 - - [01/Nov/2017:00:19:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fq%3DPINTEREST%26oq%3DPINTERE%26aqs%3Dchrome.0.69i59j69i60j69i65j69i57j0l2.2160j0j1%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.239.11 - - [01/Nov/2017:00:19:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//scontent.flim5-3.fna.fbcdn.net/v/t1.0-1/p32x32/22687826_1976412995963948_3676302371441952941_n.jpg%3Foh%3D7bc40797d744c7b5d94dd368ae4de823%26oe%3D5A6CCB3A&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
73.193.233.55 - - [01/Nov/2017:00:20:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web2.secureinternetbank.com/pbi_pbi1151/login/Remote/221272028&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.66.152.36 - - [01/Nov/2017:00:20:19 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.answers.yahoo.com/question/index%3Fqid%3D20120715103200AAX15LS&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.12.190.248 - - [01/Nov/2017:00:20:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=http%3A//mesgmy.ebay.com/ws/eBayISAPI.dll%3FViewMyMessages%26_trksid%3Dp2057872.m2034.l3912%26CurrentPage%3DMyeBayMyMessages%26ssPageName%3DSTRK%3AME%3ALNLK%3ANone%26FClassic%3Dtrue&cat=auctions HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.88.204.14 - - [01/Nov/2017:00:20:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.espn.com.ve/futbol/resultados/_/liga/todo/fecha/20171030&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
121.208.9.139 - - [01/Nov/2017:00:21:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//contest.cartoonnetwork.com.au/mobile/&cat=entertainment-and-art HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.237.181.93 - - [01/Nov/2017:00:21:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.226.68.247 - - [01/Nov/2017:00:21:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//articulo.mercadolibre.com.ar/MLA-666799963-ipod-classic-_JM&cat=auctions HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.237.218.215 - - [01/Nov/2017:00:21:51 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fsafe%3Dstrict%26rlz%3D1C1NHXL_esPE700PE709%26ei%3DlRP5WaLvKMS1wQS-vbIw%26q%3Dsword+art+online+temporada+3+capitulo+1+sub+espa%25C3%25B1ol%26oq%3Dsword+art+online+temporada+3%26gs_l%3Dpsy-ab.1.1.0i67k1l2j0l8.4652.4951.0.6388.2.2.0.0.0.0.395.650.2-1j1.2.0....0...1.1.64.psy-ab..0.2.635....0.mr5_VTgCxKQ%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.202.159.92 - - [01/Nov/2017:00:22:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=http%3A%2F%2Fwww.excelsior.com.mx%2Feuropa%23view-1&cat=news-and-media HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
49.145.255.136 - - [01/Nov/2017:00:22:23 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//ff81k.voluumtrk2.com/8dc38b77-7604-481b-bd63-11eaca6207e4%3FID%3D74575527&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
81.103.165.211 - - [01/Nov/2017:00:22:39 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
129.7.0.190 - - [01/Nov/2017:00:22:55 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//elearning.uh.edu/bbcswebdav/pid-4102743-dt-content-rid-27567989_1/xid-27567989_1&cat=sport HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.137.235.112 - - [01/Nov/2017:00:23:12 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www1.tarjetacencosud.com.ar/sociosce/context/initPrivada.action&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
71.228.46.247 - - [01/Nov/2017:00:23:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.udacity.com/courses/data-science&cat=educational-institution HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.189.90.132 - - [01/Nov/2017:00:23:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.93.5.254 - - [01/Nov/2017:00:24:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.banesconline.com/MANTIS/WEBSITE/imagenesinhouse/imagenesinhouse.aspx&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.176.85.164 - - [01/Nov/2017:00:24:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/login.php%3Flogin_attempt%3D1%26lwv%3D110&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.229.2.7 - - [01/Nov/2017:00:24:30 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//windows-file-explorer.softonic.com/%3Fex%3DDSK-309.5&cat=software-download HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
101.102.214.204 - - [01/Nov/2017:00:24:44 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_WebFilter&ver=0.19.6.9&url=https%3A%2F%2Fwww.chatwork.com%2F%23!rid37781593&cat=computer-information HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0"
107.130.125.138 - - [01/Nov/2017:00:25:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
187.204.183.145 - - [01/Nov/2017:00:25:14 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//educacion.app.jalisco.gob.mx/cas/Default.aspx&cat=government HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
88.26.241.195 - - [01/Nov/2017:00:25:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrlLexibook?appid=android_safebrowser&ver=1.2.4&url=https://www-cdn.whatsapp.net/android/2.17.393/WhatsApp.apk&cat=internet-communication HTTP/1.1" 200 149 "-" "Dalvik/1.6.0 (Linux; U; Android 4.4.2; MFS100ES Build/KOT49H)"
190.218.173.239 - - [01/Nov/2017:00:25:43 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//offer.alibaba.com/exclusive_US_EN.html%3Ftv%3D2%26isFeature%3Dtrue%26imp%3D5b1aor1btqfgc6v2rk7%26xp%3D-baxEQ7WcvtuK1U3YXZj3e11KlWATqHSv3HPF5tfWmkCmo1TaYp8yWdHlHT3IkKE4blNtS6vAcINPyVmlLV4u-mPaUrlz_JCb14tWvEsxKI%26pid%3D1018325%26td%3DPropellerads%26cv%3D1020192%26aff_id%3D182463618%26ct%3D2%26size%3D300_250%26cn%3DPA%26an%3D50001%26bm%3Dcpa%26tp1%3D372702377464%26src%3Dsaf&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

Code

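// Imports assumed from the libraries the code uses (Spark, log4j, commons-lang3, JUnit 4)
import org.apache.commons.lang3.StringUtils
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.junit.Test

class AccessLogAgg {

  @Test
  def ipAgg(): Unit = {
    // Silence Spark's verbose logging
    Logger.getLogger("org").setLevel(Level.ERROR)
    // Create the SparkContext
    val conf = new SparkConf().setMaster("local[6]").setAppName("ip_agg")
    val sc = new SparkContext(conf)
    // Read the file to produce the dataset
    val path = "dataset/access_log_sample.txt"
    val source: RDD[String] = sc.textFile(path)
    // Extract the IP (the first space-separated field) and pair it with a count of 1
    val ipRDD: RDD[(String, Int)] = source.map(x => (x.split(" ")(0), 1))
    // Simple cleaning: drop empty records. A real job would also drop
    // invalid records and reshape the data to fit the business rules
    val cleanRDD: RDD[(String, Int)] = ipRDD.filter(x => StringUtils.isNotEmpty(x._1))
    // Sum the counts per IP
    val ipAggRDD: RDD[(String, Int)] = cleanRDD.reduceByKey(_ + _)
    // Sort by count. sortBy is ascending by default, so request descending order
    val sortRDD: RDD[(String, Int)] = ipAggRDD.sortBy(x => x._2, ascending = false)
    // Bring the top results to the driver and print them in sorted order
    sortRDD.take(10).foreach(println)
    sc.stop()
  }
}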
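The cleaning step above only drops empty keys. As a minimal sketch of what "dropping invalid records" could look like, the helper below keeps a line only when its first field is a well-formed IPv4 address. The object name `AccessLogParsing` and the function `extractIp` are hypothetical and not part of the original code.

```scala
object AccessLogParsing {
  // One IPv4 octet in the range 0..255, without leading zeros.
  private val Octet = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)"
  // Four octets separated by dots, anchored at both ends.
  private val Ipv4 = s"^$Octet\\.$Octet\\.$Octet\\.$Octet$$".r

  // Returns the leading IP of an access-log line, or None when the
  // first whitespace-separated field is not a valid IPv4 address.
  def extractIp(line: String): Option[String] = {
    val first = line.trim.split("\\s+", 2)(0)
    if (Ipv4.pattern.matcher(first).matches()) Some(first) else None
  }

  def main(args: Array[String]): Unit = {
    println(extractIp("190.217.63.59 - - [01/Nov/2017:00:00:15 +0000] \"GET / HTTP/1.1\" 200 133"))
    println(extractIp("not-an-ip - - [01/Nov/2017:00:00:15 +0000]"))
  }
}
```

In the Spark job, a helper like this could probably replace the `map` plus `filter` pair with something along the lines of `source.flatMap(line => AccessLogParsing.extractIp(line)).map(ip => (ip, 1))`.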

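Starting from this small case, we can ask six related questions, each pointing in a different direction.

1. Suppose we need to process the site's entire historical data, about 1 TB. How do we process it?

Put it on a cluster and use the cluster's many machines to process it in parallel.

2. How do we run it on a cluster?

Simply put, parallel computing means using multiple computing resources at the same time to solve one problem. It has four key requirements:

  • The problem must be decomposable into parts that can be computed concurrently
  • Each part must be able to run on a different processor at the same time
  • A mechanism for sharing memory is needed
  • An overall coordination mechanism is needed for scheduling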
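The four requirements can be illustrated on a single machine with plain Scala futures. This is only a sketch under simplifying assumptions: threads stand in for cluster nodes, the JVM heap stands in for shared memory, and `ParallelCountSketch` is a hypothetical name.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelCountSketch {
  // Counts occurrences of each key by splitting the input into chunks,
  // counting every chunk concurrently, then merging the partial counts.
  def parallelCount(keys: Seq[String], chunks: Int): Map[String, Int] = {
    require(chunks > 0, "chunks must be positive")
    val size = math.max(1, math.ceil(keys.size.toDouble / chunks).toInt)
    // Decompose the problem into independent parts.
    val parts: List[Seq[String]] = keys.grouped(size).toList
    // Run each part concurrently on the global thread pool.
    val partials: List[Future[Map[String, Int]]] = parts.map { part =>
      Future(part.groupBy(identity).map { case (k, v) => k -> v.size })
    }
    // Coordinate: wait for every part, then merge the results.
    val all = Await.result(Future.sequence(partials), 30.seconds)
    all.foldLeft(Map.empty[String, Int]) { (acc, m) =>
      m.foldLeft(acc) { case (a, (k, n)) => a.updated(k, a.getOrElse(k, 0) + n) }
    }
  }

  def main(args: Array[String]): Unit =
    println(parallelCount(Seq("a", "b", "a", "c", "a"), chunks = 2))
}
```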

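3. On a cluster, the whole computation may need to be decomposed. How?

Overview

  • A file in HDFS is divided into separate Blocks
  • The computation can therefore be divided by Block, with each Block handled by its own computing unit

Going further

  • An RDD does not actually store data. The data is read from HDFS, and only when the computation runs
  • An RDD must at least be partitionable, because files in HDFS are already split into blocks. Each RDD partition represents the computation over one split of the source dataset, and being partitionable is also what makes parallel computation possible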
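Partitions can be observed directly. The sketch below assumes Spark is on the classpath and uses an in-memory collection instead of an HDFS file so that it runs locally; `PartitionSketch` is a hypothetical name. For a file, `sc.textFile(path, minPartitions)` plays the same role, and for HDFS input the partitions typically follow the file's blocks.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionSketch {
  // Returns the elements held by each partition of a small RDD.
  def run(): Array[Array[Int]] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("partition_sketch")
    val sc = new SparkContext(conf)
    try {
      // Ask for 4 partitions explicitly.
      val rdd = sc.parallelize(1 to 8, numSlices = 4)
      // glom() turns every partition into one array, so the split is visible.
      rdd.glom().collect()
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit =
    run().zipWithIndex.foreach { case (part, i) =>
      println(s"partition $i: ${part.mkString(", ")}")
    }
}
```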

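4. "Moving computation is cheaper than moving data" is a basic optimization. How is it achieved?

Each computing unit records where its storage unit is located, and the scheduler tries to run the computation there.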
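Spark exposes this recorded location as the preferred locations of each partition. A minimal sketch, assuming Spark is on the classpath: `makeRDD` accepts (element, preferred hosts) pairs, and the host names `node1`, `node2`, `node3` are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalitySketch {
  // Returns the preferred hosts recorded for each partition.
  def run(): Seq[Seq[String]] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("locality_sketch")
    val sc = new SparkContext(conf)
    try {
      // Each element becomes one partition with its own location preference.
      // The host names are hypothetical.
      val blocks: Seq[(String, Seq[String])] = Seq(
        ("block-0", Seq("node1")),
        ("block-1", Seq("node2", "node3"))
      )
      val rdd = sc.makeRDD(blocks)
      rdd.partitions.toSeq.map(p => rdd.preferredLocations(p))
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit =
    run().zipWithIndex.foreach { case (hosts, i) =>
      println(s"partition $i prefers: ${hosts.mkString(", ")}")
    }
}
```

The preference is a hint only: in local mode there are no such hosts, so the tasks simply run wherever they can.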

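5. Running on a cluster requires many nodes to cooperate, so errors are more likely. What happens when something fails?

Suppose RDD2 fails in the chain RDD1 → RDD2 → RDD3. There are two ways to handle it:

  • Cache RDD2's data and restore RDD2 directly, similar to HDFS replication
  • Record RDD2's dependencies and recover RDD2 from its parent RDD. This approach exchanges and stores much less data

How is an RDD recovered from its parent?

  • Record that RDD2's parent is RDD1
  • Record RDD2's compute function. For example, record RDD2 = RDD1.map(…), where map(…) is the compute function
  • When computing RDD2 fails, RDD2 can be recovered from the parent RDD and the compute function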
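The recovery idea can be modelled in a few lines of plain Scala. This is a toy model, not Spark's implementation: `Node`, `Source`, and `Mapped` are hypothetical types that store no computed data, only a parent and a function.

```scala
object LineageSketch {
  // A toy stand-in for an RDD: it knows how to obtain its data.
  sealed trait Node[A] { def compute(): Seq[A] }

  // A source node reads from "storage" (here, an in-memory Seq).
  final case class Source[A](data: Seq[A]) extends Node[A] {
    def compute(): Seq[A] = data
  }

  // A derived node records its parent and the function applied to it.
  final case class Mapped[A, B](parent: Node[A], f: A => B) extends Node[B] {
    def compute(): Seq[B] = parent.compute().map(f)
  }

  def main(args: Array[String]): Unit = {
    val rdd1 = Source(Seq(1, 2, 3))
    val rdd2 = Mapped(rdd1, (x: Int) => x * 10)
    val first = rdd2.compute()
    // Pretend `first` was lost in a failure: nothing was saved for rdd2,
    // yet its parent and its function are enough to rebuild it.
    val recovered = rdd2.compute()
    println(first == recovered)
  }
}
```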

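6. If the job is very complex, the pipeline is very long, and many RDDs depend on one another, how can it be optimized?

As mentioned above, dependencies can be used for fault tolerance, but when the dependency chain is very long this becomes inefficient. In that case a different approach is needed: recording the state of the dataset.

Spark offers two ways to do this:

  • Caching
  • Checkpoint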
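Both mechanisms are available as methods on an RDD. The sketch below assumes Spark is on the classpath and uses a temporary local directory for checkpoint files; on a real cluster this would normally be a reliable store such as HDFS. `CheckpointSketch` is a hypothetical name.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  // Returns the element count and whether the RDD ended up checkpointed.
  def run(): (Long, Boolean) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("checkpoint_sketch")
    val sc = new SparkContext(conf)
    try {
      // Temporary local directory, for illustration only.
      sc.setCheckpointDir(Files.createTempDirectory("rdd_checkpoint").toString)
      val result = sc.parallelize(1 to 1000).map(_ * 2).filter(_ % 3 == 0)
      result.cache()      // keep computed partitions in memory for reuse
      result.checkpoint() // mark for saving; must be called before the first action
      val n = result.count() // the first action computes, caches, and writes the checkpoint
      (n, result.isCheckpointed)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Calling `cache()` before `checkpoint()` is a common pairing, since otherwise writing the checkpoint may recompute the RDD a second time.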

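RDD revisited

Goals

  1. Understand why RDD came about
  2. Understand the main characteristics of RDD
  3. Understand the five properties of RDD

Why did RDD come about?

Before RDD, MapReduce was the mainstream choice. So how does MapReduce run an iterative computation?

Multiple MapReduce jobs have no memory-based way to share data. They can only share it through disk.

This is clearly inefficient.

How does RDD solve the inefficiency of iterative computation?

In Spark, the final Job3 is logically computed as Job3 = (Job1.map).filter. The whole process shares data in memory, and intermediate results do not need to be written to a reliable distributed file system.

This approach offers more flexibility and faster execution while still guaranteeing fault tolerance.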
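The expression Job3 = (Job1.map).filter translates almost literally into RDD code. A small sketch, assuming Spark is on the classpath; the concrete functions (`* 3`, "keep even numbers") and the name `ChainedJobsSketch` are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ChainedJobsSketch {
  def run(): Array[Int] = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("chained_jobs_sketch")
    val sc = new SparkContext(conf)
    try {
      val job1 = sc.parallelize(1 to 10)
      val job2 = job1.map(_ * 3)         // hypothetical intermediate step
      val job3 = job2.filter(_ % 2 == 0) // Job3 = (Job1.map).filter
      // Nothing was written to a file system between the steps:
      // each partition flows through map and filter in memory.
      job3.collect()
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```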

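Characteristics of RDD

RDD is both a dataset and a programming model
RDD is a data structure that also provides a high-level API. That API closely resembles the Scala collections API and likewise consists of various operators.

RDD operators fall roughly into two categories:

  • Transformation: for example map, flatMap, filter
  • Action: for example reduce, collect, count

When a program reaches a transformation, nothing runs immediately. Only when an Action is encountered is the real execution triggered. This is called lazy evaluation.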
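Lazy evaluation can be made visible with an accumulator that counts how many elements the `map` function has touched. This is a sketch assuming Spark is on the classpath and a local run without task retries; `LazySketch` is a hypothetical name.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazySketch {
  // Returns how many elements were processed before and after the action.
  def run(): (Long, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("lazy_sketch")
    val sc = new SparkContext(conf)
    try {
      val touched = sc.longAccumulator("touched")
      // Transformation: only recorded, not executed.
      val mapped = sc.parallelize(1 to 5).map { x => touched.add(1); x * 2 }
      val before = touched.sum
      // Action: triggers the actual computation.
      mapped.count()
      val after = touched.sum
      (before, after)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```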

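RDD can be partitioned

RDD is the data abstraction of a distributed computing framework, so it must support partitioned computation. Only with partitions can it exploit the cluster's parallel computing power.

Also, an RDD need not always be materialized. That is, an RDD may hold no data as long as it has enough information to know what it was computed from. This makes for a very efficient form of fault tolerance.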

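RDD is read-only

RDD is read-only and does not allow modification of any kind. The fact that RDD and HDFS are read-only does not mean that a distributed storage system must be read-only, but a read-only design significantly reduces complexity. Because RDD needs to be fault-tolerant, lazily evaluated, and able to move computation to the data, supporting modification would be very hard.

  • RDD2 may hold no data, only dependencies and a compute function. What would there be to modify?
  • If supporting modification meant the data had to be stored, how would fault tolerance work?
  • If modification were allowed, how would the row to modify be located? RDD transformations are coarse-grained: an RDD is not aware of where each individual row is.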

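RDD is fault-tolerant

RDD has two ways to tolerate faults:

  • Save the dependencies between RDDs together with the compute functions, and recompute when an error occurs
  • Store the RDD's data directly in an external storage system and read it back when an error occurs (Checkpoint)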
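The saved dependencies can be inspected on any RDD. A small sketch, assuming Spark is on the classpath: `dependencies` lists the parents of an RDD and `toDebugString` prints the whole lineage. `DependencySketch` is a hypothetical name, and the exact text of the lineage string may differ between Spark versions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DependencySketch {
  // Returns whether rdd2 recorded rdd1 as its parent, plus the lineage text.
  def run(): (Boolean, String) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("dependency_sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd1 = sc.parallelize(1 to 4)
      val rdd2 = rdd1.map(_ + 1)
      // The first (and only) dependency of rdd2 points back at rdd1.
      val parentIsRdd1 = rdd2.dependencies.head.rdd eq rdd1
      (parentIsRdd1, rdd2.toDebugString)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = {
    val (parentIsRdd1, lineage) = run()
    println(s"parent recorded: $parentIsRdd1")
    println(lineage)
  }
}
```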

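What "resilient distributed dataset" means

Distributed

  • RDD supports partitioning and can run on a cluster

Resilient

  • RDD supports efficient fault tolerance
  • Data in an RDD can be cached in memory, on disk, or in external storage

Dataset

  • An RDD may hold no concrete data, keeping only the information needed to create itself, such as dependencies and the compute function
  • An RDD can also be cached, which amounts to storing concrete data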
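Where cached data lives is chosen through a storage level. The sketch below assumes Spark is on the classpath and shows only the memory and memory-plus-disk levels; external storage is not covered here. `PersistSketch` is a hypothetical name.

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

object PersistSketch {
  // Returns the storage levels recorded on two persisted RDDs.
  def run(): (StorageLevel, StorageLevel) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist_sketch")
    val sc = new SparkContext(conf)
    try {
      // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY).
      val inMemory = sc.parallelize(1 to 100).cache()
      // Partitions that do not fit in memory are written to disk instead.
      val spillable = sc.parallelize(1 to 100).persist(StorageLevel.MEMORY_AND_DISK)
      (inMemory.getStorageLevel, spillable.getStorageLevel)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```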

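Summary: the five properties of RDD

First, a recap of the capabilities RDD must provide, as discussed above:

  • RDD has partitions
  • RDD must be fault-tolerant through dependencies and compute functions
  • RDD must optimize for data locality
  • RDD supports MapReduce-style computation, so it must be able to shuffle data

What should an RDD contain? From the viewpoint of RDD's designer, what properties does this class need at a minimum?

  • Partition List: the list of the RDD's partitions. The number of partitions can be specified when the RDD is created, or changed by an operator that produces a new RDD
  • Compute Function: to support fault tolerance, the function that computes the RDD from its parent must be recorded
  • RDD Dependencies: the dependencies between RDDs. Each RDD records which RDDs are its parents, which enables both fault tolerance and computation
  • Partitioner: to perform a shuffle, there must be a function that decides which partition each record is sent to
  • Preferred Location: to exploit data locality, moving computation rather than data, the best place to run each partition of the RDD must be recorded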
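The five properties map onto members that a custom RDD subclass can override. The sketch below assumes Spark is on the classpath; `SlicePartition` and `RangeSketchRDD` are hypothetical classes written for illustration, and real RDDs in Spark are considerably more involved.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.{Partition, Partitioner, SparkConf, SparkContext, TaskContext}

// Hypothetical partition type: one contiguous slice of the range.
class SlicePartition(val index: Int, val from: Int, val until: Int) extends Partition

// Hypothetical RDD that yields 0 until n, split into `slices` partitions.
// Dependencies: Nil, because a source RDD has no parent.
class RangeSketchRDD(sc: SparkContext, n: Int, slices: Int) extends RDD[Int](sc, Nil) {

  // Partition list: how the data is split.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](slices) { i =>
      new SlicePartition(i, i * n / slices, (i + 1) * n / slices)
    }

  // Compute function: how to produce the data of one partition.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val p = split.asInstanceOf[SlicePartition]
    (p.from until p.until).iterator
  }

  // Partitioner: None, since this is not a key-value RDD.
  override val partitioner: Option[Partitioner] = None

  // Preferred locations: none, since the data is generated rather than read.
  override protected def getPreferredLocations(split: Partition): Seq[String] = Nil
}

object FivePropertiesSketch {
  def run(): (Int, Seq[Int]) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("five_properties_sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = new RangeSketchRDD(sc, 10, 3)
      // Ordinary operators work on the custom RDD like on any other.
      (rdd.getNumPartitions, rdd.map(_ * 2).collect().toSeq)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```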
Published by: 全栈程序员-用户IM. When reposting, please cite the source: https://javaforall.cn/158913.html Original link: https://javaforall.cn
