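Scrapy is a popular web crawling framework. Starting with this post, I will document my whole process of learning Scrapy under Python 3.6, so that it is easy to extend and review later.

The previous post, Python Web Crawling with Scrapy (1), covered installing Scrapy, creating a project, and the basic test commands. This post explains in detail how to define, extract, and use items.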

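Defining items

An item is a container that holds the scraped data. It is used much like a dictionary, and it adds protection against undefined-field errors caused by typos. You define an item by declaring class attributes of type scrapy.Field, and you can edit the items in items.py to suit your needs.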

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html


# Container that holds the data we scrape
import scrapy

class ExampleItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    name = scrapy.Field()    # each attribute is a Field object
    population = scrapy.Field()

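Extracting items

First, recall the spider module country.py created earlier. It subclasses scrapy.Spider and defines three members:

name: identifies the Spider. The name must be unique; you cannot give different Spiders the same name.
start_urls: the list of URLs the Spider starts crawling from.
parse(): a method of the spider. When it is called, the response object generated after each start URL finishes downloading is passed to it as its only argument. The method is responsible for parsing the response data, extracting data (producing items), and generating Request objects for URLs that need further processing.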

Commonly used response attributes: url, status, headers, body, text

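Selectors

Scrapy extracts data with its own mechanism based on XPath and CSS expressions: Scrapy selectors.

Selector methods

xpath(): takes an XPath expression and returns a selector list of all nodes that match it
css(): takes a CSS expression and returns a selector list of all nodes that match it
extract(): serializes the matched nodes to unicode strings and returns them as a list
re(): extracts data with the given regular expression and returns a list of unicode strings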

Scraping from the shell

Scrapy provides a shell command for trying out extraction on a web page interactively.

Command format: scrapy shell <url>

D:\Pystu\example>scrapy shell http://example.webscraping.com/places/default/view/Afghanistan-1


>>> response.xpath('//tr//td[@class="w2p_fw"]/text()').extract()
['647,500 square kilometres', '29,121,286', 'AF', 'Afghanistan', 'Kabul', '.af',
 'AFN', 'Afghani', '93', 'fa-AF,ps,uz-AF,tk']

Using items

1. Declaring an item

class ExampleItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()    # each attribute is a Field object
    population = scrapy.Field(serializer=str)

A Field object specifies the metadata for its field, and you can attach any kind of metadata to each field.

2. Creating an item

item = ExampleItem(name="Afghanistan", population="29121262")
print(item)

3. Converting between items and dicts

Create a dict from an item

>>> dict(item) # create a dict from all populated values
{'name': 'Afghanistan', 'population': '29121262'}

Create an item from a dict

>>> ExampleItem({'name': 'Afghanistan', 'population': '29121262'})
{'name': 'Afghanistan', 'population': '29121262'}

Published by: 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/120194.html Original link: https://javaforall.cn
