Spark Quick Start Series (3): Understanding RDDs in Depth



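RDDs in Depth

Goal

Understand the internal logic of an RDD and its internal properties (what an RDD is made of).

Example

Requirements

Given a website's access records, commonly called an access log,
count the distinct IPs that appear in it and the number of visits from each.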

Create a data file named access_log_sample.txt (the full dataset is too large to include here, so 100 records are used for now):

190.217.63.59 - - [01/Nov/2017:00:00:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
76.114.21.96 - - [01/Nov/2017:00:00:31 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//tricolor.entravision.com/sacramento/escucha-en-vivo/&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
206.126.121.204 - - [01/Nov/2017:00:00:46 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//zone.msn.com/gameplayer/gameplayer.aspx%3Fgame%3Dfamilyfeud&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
154.121.8.18 - - [01/Nov/2017:00:01:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.google.dz%2Fsearch&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
190.238.37.217 - - [01/Nov/2017:00:01:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
147.147.163.182 - - [01/Nov/2017:00:01:31 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fs-usweb.dotomi.com%2Frenderer%2FdelPublishersCookies.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0"
200.78.93.132 - - [01/Nov/2017:00:01:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/login/device-based/regular/login/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.200.173.170 - - [01/Nov/2017:00:01:59 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/glade.js&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.252.185.4 - - [01/Nov/2017:00:02:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.google.cm%2Fblank.html&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0"
190.90.22.125 - - [01/Nov/2017:00:02:29 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.raicesdeeuropa.com/grandes-obras-de-los-principales-escritores-nacidos-durante-el-siglo-xix/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.62.158 - - [01/Nov/2017:00:02:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//bancaporinternet.interbank.com.pe/Warhol/redireccionaInicioLogueo&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
122.54.153.240 - - [01/Nov/2017:00:03:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
181.64.62.158 - - [01/Nov/2017:00:03:16 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.239.8 - - [01/Nov/2017:00:03:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Frlz%3D1C2AOHY_esPE760PE760%26source%3Dhp%26ei%3DUw_5WeGVA4TjmAHO8aCgDw%26q%3Dfb%26oq%3Dfb%26gs_l%3Dpsy-ab.3..0i131k1j0l4j0i131k1l2j0l3.1767.1916.0.2135.2.2.0.0.0.0.144.269.0j2.2.0....0...1.1.64.psy-ab..0.2.267....0.pWGbpZy6zwg%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
190.110.200.41 - - [01/Nov/2017:00:03:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/rsrc.php/v3i0KB4/ye/l/es_LA/G6VcGRK_54X.js&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
77.180.73.169 - - [01/Nov/2017:00:04:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//gomovies.co/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.146.42.248 - - [01/Nov/2017:00:04:22 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//mm-a.akamaihd.net/160/sn/assets/common/3d/particle/ns2/texture/line_040.dxt%3Fv%3D25960&cat=content-delivery-network HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
181.64.146.165 - - [01/Nov/2017:00:04:39 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/rsrc.php/yR/r/lvSDckxyoU5.ogg&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"
201.240.33.214 - - [01/Nov/2017:00:04:55 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.savefrom.net/%23url%3Dhttp%3A//youtube.com/watch%3Fv%3Dgr_3VrQC8qY%26utm_source%3Dyoutube.com%26utm_medium%3Dshort_domains%26utm_campaign%3Dssyoutube.com&cat=software-download HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.56.58 - - [01/Nov/2017:00:05:10 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//scontent.flim5-4.fna.fbcdn.net/v/t1.0-1/p32x32/22310580_351017335344058_8554274362948717253_n.jpg%3Foh%3D5da979568a22e425b79b7ba788dbc30a%26oe%3D5A65BCC3&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.192.238 - - [01/Nov/2017:00:05:26 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/%3Fgws_rd%3Dssl&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.255.225.35 - - [01/Nov/2017:00:05:41 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.ar/search%3Fq%3D886971865721%26oq%3D886971865721%26aqs%3Dchrome..69i57.719j0j7%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.211.197.246 - - [01/Nov/2017:00:05:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.netflix.com/logout%3Flocale%3Des-EC&cat=media-streaming HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.3.230.121 - - [01/Nov/2017:00:06:11 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//baixar.programanex.com.br/latest/setup_nex.exe&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
175.158.226.85 - - [01/Nov/2017:00:06:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/%3F_rdc%3D1%26_rdr&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.151.60.116 - - [01/Nov/2017:00:06:43 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//www.edhelper.com/edhelper_monthly.htm&cat=educational-institution HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.157.88 - - [01/Nov/2017:00:06:58 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//auth.kaybo1.com/member/login.html%3Fback_url%3Dhttp%3A//pb.kaybo1.com/event/evt20170301_event/event01.html&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
75.73.28.212 - - [01/Nov/2017:00:07:13 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.43.170.133 - - [01/Nov/2017:00:07:29 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.121.250.170 - - [01/Nov/2017:00:07:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//r1---sn-5mncvap8p5-a2ce.googlevideo.com/generate_204&cat=media-streaming HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
190.237.183.6 - - [01/Nov/2017:00:08:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//musicaq.biz/song.php%3Fid%3DQ2hpbml0byBEZWwgQW5kZSAtICAgIFByaW1pY2lhfGh0dHBzOi8vYXBpLnNvdW5kY2xvdWQuY29tL3RyYWNrcy83MjkxNjE2NS9zdHJlYW0%252FY2xpZW50X2lkPTBmOGZkYmJhYTIxYTliZDE4MjEwOTg2YTdkYzJkNzJj&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.67.2.102 - - [01/Nov/2017:00:08:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//lcperu.edestinos.com.pe/check-in-online&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.121.218.21 - - [01/Nov/2017:00:08:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.4kdownload.com/buy/videodownloader%3Fsource%3Dvideodownloader%26redirect-locale%3Des%26ui_source%3Dshow-on-run-3&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
76.23.172.162 - - [01/Nov/2017:00:08:48 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//open.spotify.com/&cat=music HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.101.27 - - [01/Nov/2017:00:09:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
114.186.152.178 - - [01/Nov/2017:00:09:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.234.56.200 - - [01/Nov/2017:00:09:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//toroadvertisingmedia.com/cr%3Fb%3D218558%26p%3D7550%26c%3D6608%26h%3D0d7386ae207d128d276c8fc974f8f99b%26l%3DCO%26tz%3D-5.0%26sh%3D768.0%26sw%3D1360.0%26ad.trans.id%3Dwzj9mrkhearh%26t%3D1509494794724%26u%3Dhttps%253A%252F%252Fwww.popcornvod.com%252Fwelcome.html%253Faff%253D4054%2526theme%253D0922%2526clickid%253DOCM2NjA4IzI0MyM3NTUwfDIxODU1OHxDT3wzfDF8fHx3emo5bXJraGVhcmh8fHw%2526pub%253D1400%2526sub_pub_id%253D&cat=unknown HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.233.78.10 - - [01/Nov/2017:00:09:52 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.flvto.biz/es/downloads/mp3/yt_5S-Fjz5CR5s/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.46.172.102 - - [01/Nov/2017:00:10:07 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//jkanime.net/kimi-ni-todoke-2/5/&cat=entertainment-and-art HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.148.47.237 - - [01/Nov/2017:00:10:24 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2800.0 Iron Safari/537.36"
182.251.246.12 - - [01/Nov/2017:00:10:39 +0000] "GET /webapi/getcategory?uri=yakusoku.cocoloni.jp&cat=society HTTP/1.1" 200 60 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
181.234.203.122 - - [01/Nov/2017:00:10:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//bonusbitcoin.co/faucet&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
103.4.190.242 - - [01/Nov/2017:00:11:10 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//s1-word-edit-15.cdn.office.net/we/s/1687297775_App_Scripts/2057/WordEditor.Wac.TellMeModel.js&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
72.182.173.74 - - [01/Nov/2017:00:11:23 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=http%3A//store.steampowered.com/agecheck/app/744640/&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.179.100.64 - - [01/Nov/2017:00:11:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//swx.cdn.skype.com/assets/v/0.0.300/audio/m4a/call-dialing.m4a&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
74.71.124.140 - - [01/Nov/2017:00:11:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//my.netzero.net/s/sp&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
94.189.216.28 - - [01/Nov/2017:00:12:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.nba.com/&cat=sport HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
187.131.9.222 - - [01/Nov/2017:00:12:17 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.google.com.mx/%3Fgfe_rd%3Dcr%26dcr%3D0%26ei%3DWRH5WeezN-bo8AeEoo7oDw&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
179.7.171.84 - - [01/Nov/2017:00:12:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
200.106.89.161 - - [01/Nov/2017:00:12:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.pnp.gob.pe%2Fadmision_EESTP_PNP%2Fprospecto_proceso_admision_ETSPNP_2017_II.pdf&cat=government HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0"
187.222.252.169 - - [01/Nov/2017:00:13:05 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//s1-word-view-15.cdn.office.net/wv/s/1687297775_resources/3082/progress16.gif&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.146.42.248 - - [01/Nov/2017:00:13:22 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//dquchx93qmjdu.cloudfront.net/s3/resources/sound/common/pickweapon_69eea0cef175a3faa11eca989f346a4c.mp3&cat=content-delivery-network HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
189.181.11.35 - - [01/Nov/2017:00:13:38 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//adexc.net/network/%3Fref_prm%3D28401%26clck%3Db0ajqvw8zzni%26pub_sd%3DM82IMGZFR%26ad_spv%3D549&cat=botnet HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
190.186.200.125 - - [01/Nov/2017:00:13:56 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fes-la.facebook.com%2F&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
190.171.208.228 - - [01/Nov/2017:00:14:11 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//chiquitests.com/enchinan/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
75.86.115.195 - - [01/Nov/2017:00:14:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; CrOS x86_64 9765.85.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.123 Safari/537.36"
201.240.33.221 - - [01/Nov/2017:00:14:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//cf-media.sndcdn.com/OaJxdnP5Fsen.128.mp3%3FPolicy%3DeyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vT2FKeGRuUDVGc2VuLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MDk0OTU2Nzh9fX1dfQ__%26Signature%3DSjqtGj2LWI9SCvgiIzNXs4M7P7eA-OCfi%7E%7EMwNzxFQ-Pft1DLkoDuUx1vnqf0JC0BGKRegqep0hiMxiJMUUBVLYzEtZq0jZFZKz90zO8lyfvOG38vwnbUj68Jcpb6PTTvwLK1lK9Oo8RA1DSQ-NmA1v1yj8N0DQBZmEF2RXRbmXxgh7kSledHq2OFfQ1Im-OLJyvFEH2Mq-4c3YruyvdxSPxBOkp81CL53ceEm9oAYNThc-7HXv5LPbqB%7EOrcjqXi0VihyE4MSoIou08%7E3sZBNTpq2fB4RhP8TnoNblAQtWsPMEj%7EhXTX9cJ3WrOvb9k67DV3HKf0RYfpiX-jFTfog__%26Key-Pair-Id%3DAPKAJAGZ7VMH2PFPW6UQ&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
202.151.22.3 - - [01/Nov/2017:00:15:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_WebFilter&ver=0.19.6.9&url=http%3A%2F%2Fwww.fijitimes.com%2Fstory.aspx&cat=news-and-media HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
181.176.73.81 - - [01/Nov/2017:00:15:16 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/login.php%3Flogin_attempt%3D1%26lwv%3D110&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.208.200 - - [01/Nov/2017:00:15:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fsafe%3Dstrict%26hl%3Des%26biw%3D1366%26bih%3D662%26tbm%3Disch%26sa%3D1%26ei%3DWxD5WebYNIj4wASqorPIDw%26q%3Dcontribucion%26oq%3Dcon%26gs_l%3Dpsy-ab.1.1.0i67k1l5j0l5.447088.451222.0.455291.37.12.0.0.0.0.394.1792.2-4j2.7.0....0...1.1.64.psy-ab..31.5.1471.0..0i30k1.248.GHQlbsuDZcQ%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.137.144.93 - - [01/Nov/2017:00:15:48 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.ar.avon.com/REPSuite/orderEntry.page%3Fredirected%3Dtrue%26isSuccess%3DY&cat=shopping HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
49.148.209.194 - - [01/Nov/2017:00:16:03 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
73.213.34.16 - - [01/Nov/2017:00:16:18 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//swx.cdn.skype.com/assets/v/0.0.300/audio/m4a/call-outgoing-p1.m4a&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.240.247.104 - - [01/Nov/2017:00:16:33 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fq%3Dcomo+prepara+par+ahacer+una+mascara+de+pantomima%26rlz%3D1C1NHXL_esPE709PE709%26oq%3Dcomo+prepara+par+ahacer+una+mascara+de+pantomima%26aqs%3Dchrome..69i57.19536j0j7%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.130.189.170 - - [01/Nov/2017:00:16:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.123rf.com/imagenes-de-archivo/ombligo.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.204.104.89 - - [01/Nov/2017:00:17:04 +0000] "GET /webapi/getcategory?uri=www.google.co.ve&cat=search-engine HTTP/1.1" 200 67 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
190.42.233.34 - - [01/Nov/2017:00:17:20 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.136.98.155 - - [01/Nov/2017:00:17:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//musicaq.biz/descargar-musica/9f352ef6-santana-the-game-of-love-ft-michelle-branch.html&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
147.147.163.182 - - [01/Nov/2017:00:17:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https%3A%2F%2Fwww.worldtimebuddy.com%2F&cat=unknown HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0"
190.232.70.238 - - [01/Nov/2017:00:18:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//sv3.onlinevideoconverter.com/download%3Ffile%3De4c2d3a0e4a0c2&cat=adult-and-pornography HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.148.209.194 - - [01/Nov/2017:00:18:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.roblox.com/&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.170.192.60 - - [01/Nov/2017:00:18:36 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//encrypted-tbn0.gstatic.com/images%3Fq%3Dtbn%3AANd9GcQcdjN8-1NJnSeC6ptIlx7S0wZucgg1jzL4N-i7IWE_8o8-F0gmjw&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
161.18.215.235 - - [01/Nov/2017:00:18:50 +0000] "GET /webapi/getcategory?uri=www.wattpad.com&cat=personal-site-and-blog HTTP/1.1" 200 75 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
138.36.222.166 - - [01/Nov/2017:00:19:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.230.112.110 - - [01/Nov/2017:00:19:20 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//civilgeeks.com/categor%25C3%25ADa/hidraulica/&cat=personal-site-and-blog HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.234.49.7 - - [01/Nov/2017:00:19:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fq%3DPINTEREST%26oq%3DPINTERE%26aqs%3Dchrome.0.69i59j69i60j69i65j69i57j0l2.2160j0j1%26sourceid%3Dchrome%26ie%3DUTF-8%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.239.11 - - [01/Nov/2017:00:19:49 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//scontent.flim5-3.fna.fbcdn.net/v/t1.0-1/p32x32/22687826_1976412995963948_3676302371441952941_n.jpg%3Foh%3D7bc40797d744c7b5d94dd368ae4de823%26oe%3D5A6CCB3A&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
73.193.233.55 - - [01/Nov/2017:00:20:04 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web2.secureinternetbank.com/pbi_pbi1151/login/Remote/221272028&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.66.152.36 - - [01/Nov/2017:00:20:19 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//es.answers.yahoo.com/question/index%3Fqid%3D20120715103200AAX15LS&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.12.190.248 - - [01/Nov/2017:00:20:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=http%3A//mesgmy.ebay.com/ws/eBayISAPI.dll%3FViewMyMessages%26_trksid%3Dp2057872.m2034.l3912%26CurrentPage%3DMyeBayMyMessages%26ssPageName%3DSTRK%3AME%3ALNLK%3ANone%26FClassic%3Dtrue&cat=auctions HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.88.204.14 - - [01/Nov/2017:00:20:50 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//www.espn.com.ve/futbol/resultados/_/liga/todo/fecha/20171030&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
121.208.9.139 - - [01/Nov/2017:00:21:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//contest.cartoonnetwork.com.au/mobile/&cat=entertainment-and-art HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.237.181.93 - - [01/Nov/2017:00:21:21 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.226.68.247 - - [01/Nov/2017:00:21:35 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//articulo.mercadolibre.com.ar/MLA-666799963-ipod-classic-_JM&cat=auctions HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.237.218.215 - - [01/Nov/2017:00:21:51 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.google.com.pe/search%3Fsafe%3Dstrict%26rlz%3D1C1NHXL_esPE700PE709%26ei%3DlRP5WaLvKMS1wQS-vbIw%26q%3Dsword+art+online+temporada+3+capitulo+1+sub+espa%25C3%25B1ol%26oq%3Dsword+art+online+temporada+3%26gs_l%3Dpsy-ab.1.1.0i67k1l2j0l8.4652.4951.0.6388.2.2.0.0.0.0.395.650.2-1j1.2.0....0...1.1.64.psy-ab..0.2.635....0.mr5_VTgCxKQ%26safe%3Dhigh&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.202.159.92 - - [01/Nov/2017:00:22:06 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=http%3A%2F%2Fwww.excelsior.com.mx%2Feuropa%23view-1&cat=news-and-media HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
49.145.255.136 - - [01/Nov/2017:00:22:23 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//ff81k.voluumtrk2.com/8dc38b77-7604-481b-bd63-11eaca6207e4%3FID%3D74575527&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
81.103.165.211 - - [01/Nov/2017:00:22:39 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
129.7.0.190 - - [01/Nov/2017:00:22:55 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//elearning.uh.edu/bbcswebdav/pid-4102743-dt-content-rid-27567989_1/xid-27567989_1&cat=sport HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.137.235.112 - - [01/Nov/2017:00:23:12 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www1.tarjetacencosud.com.ar/sociosce/context/initPrivada.action&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
71.228.46.247 - - [01/Nov/2017:00:23:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.udacity.com/courses/data-science&cat=educational-institution HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.189.90.132 - - [01/Nov/2017:00:23:45 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.93.5.254 - - [01/Nov/2017:00:24:01 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//www.banesconline.com/MANTIS/WEBSITE/imagenesinhouse/imagenesinhouse.aspx&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.176.85.164 - - [01/Nov/2017:00:24:15 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//web.facebook.com/login.php%3Flogin_attempt%3D1%26lwv%3D110&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.229.2.7 - - [01/Nov/2017:00:24:30 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https%3A//windows-file-explorer.softonic.com/%3Fex%3DDSK-309.5&cat=software-download HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
101.102.214.204 - - [01/Nov/2017:00:24:44 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_WebFilter&ver=0.19.6.9&url=https%3A%2F%2Fwww.chatwork.com%2F%23!rid37781593&cat=computer-information HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0"
107.130.125.138 - - [01/Nov/2017:00:25:00 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
187.204.183.145 - - [01/Nov/2017:00:25:14 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http%3A//educacion.app.jalisco.gob.mx/cas/Default.aspx&cat=government HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
88.26.241.195 - - [01/Nov/2017:00:25:28 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrlLexibook?appid=android_safebrowser&ver=1.2.4&url=https://www-cdn.whatsapp.net/android/2.17.393/WhatsApp.apk&cat=internet-communication HTTP/1.1" 200 149 "-" "Dalvik/1.6.0 (Linux; U; Android 4.4.2; MFS100ES Build/KOT49H)"
190.218.173.239 - - [01/Nov/2017:00:25:43 +0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https%3A//offer.alibaba.com/exclusive_US_EN.html%3Ftv%3D2%26isFeature%3Dtrue%26imp%3D5b1aor1btqfgc6v2rk7%26xp%3D-baxEQ7WcvtuK1U3YXZj3e11KlWATqHSv3HPF5tfWmkCmo1TaYp8yWdHlHT3IkKE4blNtS6vAcINPyVmlLV4u-mPaUrlz_JCb14tWvEsxKI%26pid%3D1018325%26td%3DPropellerads%26cv%3D1020192%26aff_id%3D182463618%26ct%3D2%26size%3D300_250%26cn%3DPA%26an%3D50001%26bm%3Dcpa%26tp1%3D372702377464%26src%3Dsaf&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

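Code

// Imports inferred from the identifiers used below
import org.apache.commons.lang3.StringUtils
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.junit.Test

class AccessLogAgg {

  @Test
  def ipAgg(): Unit = {
    // Only show ERROR logs so the result is easy to read
    Logger.getLogger("org").setLevel(Level.ERROR)

    // 1. Create the SparkContext
    val conf = new SparkConf().setMaster("local[6]").setAppName("ip_agg")
    val sc = new SparkContext(conf)

    // 2. Read the file to produce the source dataset
    //    (forward slashes work on Windows as well as Linux/macOS)
    val path = "dataset/access_log_sample.txt"
    val source: RDD[String] = sc.textFile(path)

    // 3. Extract the IP (first space-separated field) and pair it with a count of 1
    val ipRDD: RDD[(String, Int)] = source.map(x => (x.split(" ")(0), 1))

    // 4. Simple cleaning: drop records whose IP is empty.
    //    Removing invalid records and business-specific tidying would also go here.
    val cleanRDD: RDD[(String, Int)] = ipRDD.filter(x => StringUtils.isNotEmpty(x._1))

    // 5. Aggregate the counts per IP
    val ipAggRDD: RDD[(String, Int)] = cleanRDD.reduceByKey(_ + _)

    // 6. Sort by count. sortBy is ascending by default, so ask for descending order
    val sortRDD: RDD[(String, Int)] = ipAggRDD.sortBy(x => x._2, ascending = false)

    // 7. Collect to the driver before printing: foreach on an RDD runs per partition
    //    in parallel, so it does not guarantee the sorted order in the output
    sortRDD.collect().foreach(println)

    sc.stop()
  }
}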

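This small case raises six related questions, each pointing in a different direction.

1. Suppose we need to process the site's entire history, about 1 TB of data. How do we handle it?

Put it on a cluster and use multiple machines to process it in parallel.

2. How do we run it on a cluster?

Put simply, parallel computing means using multiple computing resources at the same time to solve a single problem. There are four key points:

  • The problem must be decomposable into parts that can be computed concurrently
  • Each part must be able to run on a different processor at the same time
  • A mechanism for sharing memory is needed
  • An overall coordination mechanism is needed for scheduling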
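
The four points can be illustrated on a single machine with plain Scala, without Spark. The following is a minimal sketch, not how Spark is implemented: the object name ParallelIpCount, its methods, and the chunk size are all hypothetical. The log lines are split into independent chunks, each chunk is counted concurrently in a Future, and a final merge step plays the role of the coordinator.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper for illustration only; not part of Spark.
object ParallelIpCount {
  // Count the IPs inside one chunk. Chunks do not depend on each other,
  // so every chunk can be processed concurrently.
  def countChunk(lines: Seq[String]): Map[String, Int] =
    lines.map(_.split(" ")(0)).groupBy(identity).map { case (ip, hits) => ip -> hits.size }

  // Merge two partial results by adding the counts of matching IPs.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (ip, n)) => acc.updated(ip, acc.getOrElse(ip, 0) + n) }

  // Split the input, count every chunk in its own Future, then combine the results.
  def countInParallel(lines: Seq[String], chunkSize: Int): Map[String, Int] = {
    val partials = lines.grouped(chunkSize).toList.map(chunk => Future(countChunk(chunk)))
    val results = Await.result(Future.sequence(partials), 30.seconds)
    results.foldLeft(Map.empty[String, Int])(merge)
  }
}
```

This loosely mirrors what reduceByKey does in the case above: counts are first combined within each partition and then merged across partitions.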

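3. Running on a cluster means the whole computation may have to be decomposed. How?

Overview

  • Files in HDFS are split into Blocks
  • The computation can be divided along those Blocks, with each Block mapped to its own unit of computation

Going further

  • An RDD does not physically store data. The data lives in HDFS and is read only when the computation needs it
  • An RDD must at least be partitionable, because the files in HDFS are themselves split up. Each RDD partition represents the computation over one split of the source dataset, and being partitionable is also what makes parallel computation possible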
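
To make the Block-to-computation mapping concrete, here is a simplified model in plain Scala. It is only a sketch under assumptions: BlockSplitter and Block are hypothetical names, and real Hadoop input splits follow additional rules that are ignored here.

```scala
// Hypothetical model for illustration; real HDFS/Hadoop split logic is more involved.
object BlockSplitter {
  // One block of a file: its index, starting byte offset, and length in bytes.
  final case class Block(index: Int, offset: Long, length: Long)

  // Cut a file of `fileSize` bytes into consecutive blocks of at most `blockSize` bytes.
  // Each resulting block would correspond to one unit of computation (one partition).
  def split(fileSize: Long, blockSize: Long): Vector[Block] = {
    require(fileSize >= 0 && blockSize > 0, "fileSize must be >= 0 and blockSize > 0")
    (0L until fileSize by blockSize).zipWithIndex.map { case (offset, i) =>
      Block(i, offset, math.min(blockSize, fileSize - offset))
    }.toVector
  }
}
```

Assuming a block size of 128 MB, BlockSplitter.split(1L << 40, 128L << 20) cuts a 1 TB file into 8192 blocks, so up to 8192 units of computation could run in parallel.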

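4. Moving computation is cheaper than moving data, which makes it a basic optimization. How is it achieved?

Each unit of computation records where its unit of storage lives, and the scheduler tries to run the computation there.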
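
The idea can be sketched as a tiny scheduling rule in plain Scala. This is an illustrative model only, not Spark's scheduler: LocalityScheduler, Task, and the host names are hypothetical. In Spark itself, an RDD exposes this information through preferredLocations, which returns the preferred hosts for a partition.

```scala
// Hypothetical model for illustration; Spark's real scheduler is far more elaborate.
object LocalityScheduler {
  // A task over one partition, plus the hosts that already hold that partition's data.
  final case class Task(partition: Int, preferredHosts: Seq[String])

  // Prefer a free host that holds the data; otherwise fall back to any free host
  // (the data then has to travel over the network). None means no host is free.
  def assign(task: Task, freeHosts: Set[String]): Option[String] =
    task.preferredHosts.find(freeHosts.contains)
      .orElse(freeHosts.toSeq.sorted.headOption)
}
```

The fallback branch is the "as far as possible" part: locality is a preference, not a guarantee.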

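5. Running on a cluster requires many nodes to cooperate, so failures are more likely. What happens when something goes wrong?

Suppose RDD2 fails in the chain RDD1 → RDD2 → RDD3. There are two ways to handle it:

  • Cache the data of RDD2 and restore RDD2 directly, similar to the replication mechanism of HDFS
  • Record the dependencies of RDD2 and recover it from its parent RDD. This approach needs far less data to be transferred and stored

How is an RDD recovered from its parent?

  • Record that the parent of RDD2 is RDD1
  • Record the compute function of RDD2. For example, if RDD2 = RDD1.map(…), then map(…) is the compute function
  • When computing RDD2 fails, it can be rebuilt from the parent RDD and the compute function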
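
The three steps above can be sketched as a small model in plain Scala. It is a simplification under assumptions, not Spark's implementation: LineageModel, Dataset, Source, and Mapped are hypothetical names. A derived dataset stores no data, only its parent and its compute function, so "recovering" it simply means computing it again.

```scala
// Hypothetical model for illustration; not Spark's RDD class.
object LineageModel {
  // A dataset that stores no data itself, only the recipe for producing it.
  sealed trait Dataset[A] { def compute(): Seq[A] }

  // Root of the lineage: reads from stable storage (modelled here as a function).
  final case class Source[A](read: () => Seq[A]) extends Dataset[A] {
    def compute(): Seq[A] = read()
  }

  // A derived dataset: remembers its parent and the function applied to it.
  final case class Mapped[A, B](parent: Dataset[A], f: A => B) extends Dataset[B] {
    def compute(): Seq[B] = parent.compute().map(f)
  }
}
```

With rdd1 = Source(() => Seq(1, 2, 3)) and rdd2 = Mapped(rdd1, (x: Int) => x * 10), a lost result of rdd2 is rebuilt by calling rdd2.compute() again, which walks back to rdd1 and reapplies the function.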

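6. If the job is very complex, the pipeline very long, and many RDDs depend on one another, how can it be optimized?

As mentioned above, dependencies can be used for fault tolerance. But when the dependency chain is very long, this approach becomes inefficient. In that case a different approach is needed: recording the state of the dataset.

Spark offers two ways to do this:

  • Caching
  • Checkpoint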
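
A minimal usage sketch of both mechanisms follows. It assumes Spark is on the classpath; cache(), checkpoint(), setCheckpointDir, and isCheckpointed are real Spark API calls, while the object name CacheAndCheckpoint, the run method, the sample IPs, and the checkpoint directory are hypothetical.

```scala
import org.apache.spark.SparkContext

// Hypothetical wrapper for illustration; only the Spark calls inside are real API.
object CacheAndCheckpoint {
  // Returns the number of distinct IPs and whether the RDD ended up checkpointed.
  def run(sc: SparkContext, checkpointDir: String): (Long, Boolean) = {
    // Checkpoint files are written under this directory (HDFS in a real cluster).
    sc.setCheckpointDir(checkpointDir)

    val counts = sc.parallelize(Seq("1.1.1.1", "2.2.2.2", "1.1.1.1"))
      .map(ip => (ip, 1))
      .reduceByKey(_ + _)

    // Cache: keep the computed partitions in memory once an action has produced them.
    counts.cache()
    // Checkpoint: mark the RDD to be saved to the checkpoint directory.
    // This must be called before the first action on this RDD.
    counts.checkpoint()

    // The first action computes the RDD, fills the cache, and writes the checkpoint.
    val distinctIps = counts.count()
    (distinctIps, counts.isCheckpointed)
  }
}
```

Caching before checkpointing is a deliberate choice here: without it, writing the checkpoint would require computing the RDD a second time.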

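RDD revisited

Goals

  1. Understand why RDD came about
  2. Understand the main characteristics of RDD
  3. Understand the five properties of RDD

Why did RDD come about?

Before RDD, MapReduce was the mainstream. How does MapReduce run an iterative computation?

MapReduce jobs have no memory-based way to share data with one another; they can only share it through disk.

This is clearly inefficient.

How does RDD solve the inefficiency of iterative computation?

In Spark, the logical computation behind the final Job3 is Job3 = (Job1.map).filter. The whole process shares memory, and intermediate results do not need to be written to a reliable distributed file system.

This approach offers more flexibility and faster execution while still guaranteeing fault tolerance.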

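Characteristics of RDD

RDD is both a dataset and a programming model
RDD is a data structure that also provides a high-level API. That API closely resembles the Scala collections API: both are built from operators.

RDD operators fall roughly into two categories:

  • Transformation: for example map, flatMap, filter
  • Action: for example reduce, collect, count

When an RDD program runs, a transformation is not executed immediately. Real execution is triggered only when an Action is reached. This behavior is called lazy evaluation.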
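
Lazy evaluation can be observed without Spark by using a Scala collection view, which behaves in an analogous way. This is an analogy rather than Spark code: LazyDemo and its run method are hypothetical names, map and filter on the view stand in for Transformations, and toList stands in for an Action such as collect.

```scala
// Hypothetical demo for illustration; uses Scala collection views, not RDDs.
object LazyDemo {
  // Returns (function calls before the "action", calls after it, the final result).
  def run(): (Int, Int, List[Int]) = {
    var calls = 0
    val pipeline = List(1, 2, 3, 4).view
      .map { x => calls += 1; x * 2 } // "transformation": only recorded, not executed
      .filter(_ > 4)                  // "transformation": only recorded, not executed
    val callsBefore = calls           // nothing has run yet
    val result = pipeline.toList      // "action": forces the whole pipeline
    (callsBefore, calls, result)
  }
}
```

LazyDemo.run() reports zero calls before toList and four afterwards, with the result List(6, 8). An RDD pipeline behaves the same way: nothing runs until an Action asks for a result.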

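RDD can be partitioned

RDD is the core abstraction of a distributed computing framework, so it must support partitioned computation. Only with partitions can it exploit the parallel computing power of a cluster.

At the same time, an RDD does not need to be materialized at all times. That is, an RDD can hold no data as long as it has enough information to know what it was computed from. This makes for very efficient fault tolerance.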

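RDD is read-only

An RDD is read-only and cannot be modified in any way. The fact that RDD and HDFS are read-only does not mean every distributed storage system must be designed that way, but a read-only design reduces complexity significantly. Because an RDD has to support fault tolerance, lazy evaluation, and moving computation to the data, supporting modification would be very hard.

  • RDD2 may hold no data at all, only its dependencies and compute function, so what would be modified?
  • If supporting modification meant the data had to be stored, how would fault tolerance work?
  • If modification were allowed, how would the row to modify be located? RDD transformations are coarse-grained, which means an RDD has no knowledge of where each individual row is.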

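RDD is fault-tolerant

RDD achieves fault tolerance in two ways:

  • Keep the dependencies between RDDs together with the compute functions, and recompute when an error occurs
  • Store the data of the RDD directly in an external storage system and read it back when an error occurs; this is Checkpoint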

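What does Resilient Distributed Dataset mean?

Distributed

  • RDD supports partitioning and can run on a cluster

Resilient

  • RDD supports efficient fault tolerance
  • The data in an RDD can be cached in memory, on disk, or in external storage

Dataset

  • An RDD may hold no concrete data, only the information needed to create itself, such as its dependencies and compute function
  • An RDD can also be cached, which amounts to storing concrete data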

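Summary: the five properties of RDD

First, a recap of the capabilities an RDD must provide, as discussed above:

  • RDD has partitions
  • RDD must be fault-tolerant through dependencies and compute functions
  • RDD must be optimized for data locality
  • RDD supports MapReduce-style computation, so it must be able to shuffle data

So what should an RDD contain? From the point of view of the designer of RDD, which properties does this class need at a minimum?

  • Partition List: the list of partitions of the RDD. The number of partitions can be specified when the RDD is created, or changed by an operator that produces a new RDD
  • Compute Function: to support fault tolerance, the compute function applied in each RDD-to-RDD transformation must be recorded
  • RDD Dependencies: the dependencies between RDDs. Each RDD records its parent RDDs, which enables both fault tolerance and computation
  • Partitioner: to perform a shuffle, there must be a function that decides which partition each record is sent to
  • Preferred Location: to exploit data locality, moving computation rather than data, each partition records where it is best placed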
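
The five properties can be written down as a simplified trait in plain Scala. This is a sketch, not Spark's source: RddSketch, SimpleRdd, and CollectionRdd are hypothetical names, and the signatures are reduced for readability. The member names loosely follow the ones on Spark's RDD class (partitions, compute, dependencies, partitioner, preferredLocations).

```scala
// Hypothetical, simplified model for illustration; not Spark's RDD class.
object RddSketch {
  final case class Partition(index: Int)

  // Decides which partition a key is sent to during a shuffle.
  trait Partitioner {
    def numPartitions: Int
    def getPartition(key: Any): Int
  }

  // Hash partitioning: the key's hashCode modulo the partition count, kept non-negative.
  final class HashPartitioner(val numPartitions: Int) extends Partitioner {
    require(numPartitions > 0, "numPartitions must be positive")
    def getPartition(key: Any): Int = key match {
      case null => 0
      case _ =>
        val raw = key.hashCode % numPartitions
        if (raw < 0) raw + numPartitions else raw
    }
  }

  trait SimpleRdd[T] {
    def partitions: Seq[Partition]                              // 1. Partition List
    def compute(split: Partition): Iterator[T]                  // 2. Compute Function
    def dependencies: Seq[SimpleRdd[_]]                         // 3. RDD Dependencies
    def partitioner: Option[Partitioner] = None                 // 4. Partitioner (optional)
    def preferredLocations(split: Partition): Seq[String] = Nil // 5. Preferred Location (optional)
  }

  // A root dataset built from an in-memory collection, dealt out round-robin into slices.
  final class CollectionRdd[T](data: Seq[T], numSlices: Int) extends SimpleRdd[T] {
    val partitions: Seq[Partition] = (0 until numSlices).map(Partition(_))
    def compute(split: Partition): Iterator[T] =
      data.iterator.zipWithIndex.collect { case (x, i) if i % numSlices == split.index => x }
    val dependencies: Seq[SimpleRdd[_]] = Nil // a root has no parents
  }
}
```

The last two members have defaults because they are optional in this model: a partitioner only makes sense for key-value data, and a dataset without a known storage location has no preferred hosts.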

Published by: 全栈程序员-用户IM. When reposting, please credit the source: https://javaforall.cn/158913.html (original link: https://javaforall.cn)