How a Simple Cookie-Backed Image CAPTCHA Works

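Login forms often require a CAPTCHA alongside the credentials. Its job is to check that whoever is logging in is human: an image that a person reads easily is much harder for a machine, so checking a CAPTCHA at login blocks most crawlers and bots. Because it is cheap and effective, the image CAPTCHA is widely used. This article explores, step by step and through network analysis, how an image CAPTCHA works. The technical bar is low and every step is easy to reproduce, so it suits anyone with idle time on their hands.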

1. Search Baidu for "系统登录" ("system login") and pick any login site from the results:

http://scm.fstvgo.com/

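2. Open the site, press F12 to open the developer tools, select the Network tab, and press F5 to refresh the page. The Network tab lists the requests the page makes; two of them matter here.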

The first is the request for the page itself:

Request URL: http://scm.fstvgo.com/
Request Method: GET
Status Code: 200 OK
Remote Address: 117.40.130.60:80
Referrer Policy: unsafe-url

Request Headers:
GET / HTTP/1.1
Host: scm.fstvgo.com
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cookie: ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv

The second is the request for the CAPTCHA image:

Request URL: http://scm.fstvgo.com/Login/GetValidateCode
Request Method: GET
Status Code: 200 OK
Remote Address: 117.40.130.60:80
Referrer Policy: strict-origin-when-cross-origin

Request Headers:
Accept: image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Connection: keep-alive
Cookie: ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv
Host: scm.fstvgo.com
Referer: http://scm.fstvgo.com/
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42

3. Click "Login" without filling in anything and look at the request and response in the Network tab:

Request URL: http://scm.fstvgo.com/Login/Login
Request Method: POST
Status Code: 200 OK
Remote Address: 117.40.130.60:80
Referrer Policy: strict-origin-when-cross-origin

Request Headers:
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Connection: keep-alive
Content-Length: 57
Content-Type: application/x-www-form-urlencoded
Cookie: ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv
Host: scm.fstvgo.com
Origin: http://scm.fstvgo.com
Referer: http://scm.fstvgo.com/
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42
X-Requested-With: XMLHttpRequest


Form Data:
dlrID: 
dlrPwd: 
checkCode: 
X-Requested-With: XMLHttpRequest

Response (the message means "Incorrect verification code!"):
{"errorMsg":"验证码不正确！"}

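My guess at how the CAPTCHA works:

1. The client requests a CAPTCHA.

2. The server generates a random code, stores it, renders it as an image, and returns the image to the client.

3. The client logs in with the code it read from the image. The server compares the submitted code with the stored one and returns true if they match and false otherwise.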

Let's test this guess with Python:


import sys

import requests


def yzm():
    # Headers copied from the browser capture. Note that no Cookie is sent yet.
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Referer': 'http://scm.fstvgo.com/',
        'Host': 'scm.fstvgo.com',
        'Origin': 'http://scm.fstvgo.com',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42',
    }
    base_url = "http://scm.fstvgo.com/"
    login_url = "http://scm.fstvgo.com/Login/Login"
    yzm_url = "http://scm.fstvgo.com/Login/GetValidateCode"

    # Each bare requests.get/post call is independent: cookies are not shared.
    requests.get(base_url, headers=headers)
    cap = requests.get(yzm_url, headers=headers)
    # Save the CAPTCHA image so it can be read by a human.
    with open("cap.png", "wb") as f:
        f.write(cap.content)
    print('Open cap.png and type in the code it shows:')
    jym = sys.stdin.readline().strip()
    print(jym)

    # Same form fields as in the captured login request.
    data = {
        'dlrID': '',
        'dlrPwd': '',
        'checkCode': jym,
        'X-Requested-With': 'XMLHttpRequest'
    }
    try:
        content = requests.post(login_url, data=data, headers=headers)
        content.encoding = content.apparent_encoding
        content_text = content.text
        # Turn literal \uXXXX escapes into readable characters
        # (assumes the body is ASCII with escaped non-ASCII text).
        content_text = content_text.encode('utf-8').decode('unicode_escape')
        print(content_text)
    except Exception as e:
        print(e)


if __name__ == '__main__':
    yzm()

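The script first requests the CAPTCHA and saves it as an image. I open the image, type the code into the terminal, and the script then calls Login. If the guess above is right, the CAPTCHA check should pass.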

But...

The server still returns {"errorMsg":"验证码不正确！"}

Comparing the request headers once more, carefully: wait, what is that Cookie for?

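The analysis above was incomplete: at login, how does the server know that you are the same client that requested the CAPTCHA earlier? The answer is the Cookie. The script sent none, so to the server each of its requests looked like a new visitor.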
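As a quick check on the browser captures, the sketch below parses the Cookie header of the three requests with Python's standard http.cookies module and confirms that they all carry the same ASP.NET_SessionId. The header values are copied from the captures; the dictionary labels and the helper name session_id are made up for illustration.

```python
from http.cookies import SimpleCookie

# Cookie headers copied from the three captured browser requests.
captured = {
    "GET /": "ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv",
    "GET /Login/GetValidateCode": "ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv",
    "POST /Login/Login": "ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv",
}


def session_id(cookie_header):
    """Return the ASP.NET_SessionId value from a raw Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("ASP.NET_SessionId")
    return morsel.value if morsel is not None else None


ids = {name: session_id(header) for name, header in captured.items()}
# True when every request carries one and the same session ID.
same_session = len(set(ids.values())) == 1
print(ids)
print("same session:", same_session)
```

The browser sends the identical session ID on the page request, the CAPTCHA request, and the login request, which is exactly what the Python script did not do.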

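So a cookie-backed image CAPTCHA works like this:

1. The client requests the CAPTCHA with a Cookie in the request headers.

2. The server generates a random code, verifycode, stores it as a Cookie : verifycode key-value pair (that is, keyed by the session ID carried in the Cookie), renders the code as an image, and returns the image to the client.

3. The client logs in with the same Cookie as in step 1 and the code read from the image. The server looks up the verifycode stored for that Cookie and compares it with the submitted one, returning true on a match and false otherwise. If nothing is found for the Cookie, this particular server actually responded with a NullReferenceException...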
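The three steps can be modeled in a few lines. This is only a sketch of the idea, not the site's actual code: the class name CaptchaStore, the four-character code length, the case-insensitive comparison, and the single-use behavior are all assumptions made for illustration.

```python
import secrets
import string


class CaptchaStore:
    """Toy stand-in for server-side session storage (hypothetical)."""

    def __init__(self):
        self._codes = {}  # session ID -> expected CAPTCHA text

    def issue(self, session_id, length=4):
        """Step 2: generate a code and remember it under the session ID."""
        alphabet = string.ascii_uppercase + string.digits
        code = "".join(secrets.choice(alphabet) for _ in range(length))
        self._codes[session_id] = code
        return code  # a real server would render this into an image

    def verify(self, session_id, submitted):
        """Step 3: look the code up by session ID and compare."""
        expected = self._codes.pop(session_id, None)  # each code is single use
        if expected is None:
            return False  # unknown session: fail cleanly instead of raising
        return expected.upper() == submitted.strip().upper()


store = CaptchaStore()
code = store.issue("session-A")
print(store.verify("session-B", code))  # right code, wrong session
print(store.verify("session-A", code))  # right code, right session
print(store.verify("session-A", code))  # code was already consumed
```

Returning False for an unknown session is the design choice the observed server seems to lack: a lookup that assumes the entry exists is one plausible way to end up with a NullReferenceException.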

So once the Cookie is added to the headers, the check passes:

    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Referer': 'http://scm.fstvgo.com/',
        'Host': 'scm.fstvgo.com',
        'Origin': 'http://scm.fstvgo.com',
        'X-Requested-With': 'XMLHttpRequest',
        # Session ID copied from the browser; yours will differ.
        'Cookie': 'ASP.NET_SessionId=xf3vipkyf5vxf34n0acc5t0e',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42',
    }
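A hard-coded session ID only works until that session expires. One possible variation, sketched below, lets a session object carry the cookie automatically. The function name login_with_session and its parameters are hypothetical; session is expected to behave like requests.Session (get and post methods), and read_code is any callable that returns the text read from the image. Whether the server sets ASP.NET_SessionId on the first page request is an assumption here, not something the captures confirm.

```python
def login_with_session(session, read_code, user="", password="",
                       image_path="cap.png"):
    """Fetch the CAPTCHA and log in through one cookie-carrying session."""
    base_url = "http://scm.fstvgo.com/"
    # Assumed: the server hands out ASP.NET_SessionId on one of these requests.
    session.get(base_url)
    cap = session.get(base_url + "Login/GetValidateCode")
    # Save the image so a human can read the code.
    with open(image_path, "wb") as f:
        f.write(cap.content)
    data = {
        "dlrID": user,
        "dlrPwd": password,
        "checkCode": read_code().strip(),
        "X-Requested-With": "XMLHttpRequest",  # present in the captured form data
    }
    headers = {"X-Requested-With": "XMLHttpRequest", "Referer": base_url}
    # The session re-sends any cookie it received, so no manual Cookie header.
    return session.post(base_url + "Login/Login", data=data, headers=headers)
```

With the requests library installed, a call might look like login_with_session(requests.Session(), input), after which the returned response's text can be printed as in the original script.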

Published by 全栈程序员-用户IM. When reposting, please credit the source: https://javaforall.cn/159594.html Original link: https://javaforall.cn
