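When you log in to a system, you usually have to submit a CAPTCHA along with your credentials. The CAPTCHA checks that whoever is logging in is human: an image that a person reads easily is comparatively hard for a machine, so checking a CAPTCHA at login blocks most crawler bots. It is cheap and effective, which is why image CAPTCHAs are so widely used. This post uses network analysis to explore, step by step, how an image CAPTCHA works. The technical bar is low and the steps are easy to read and reproduce, so it suits anyone with some idle time on their hands.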
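1. Search Baidu for "系统登录" (system login) and pick an arbitrary login site:
2. Open the site, press F12 to open the developer tools, select the Network tab, and press F5 to refresh the page. The Network panel lists the requests the site makes; two of them matter most.
The first is the request for the page itself: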
Request URL: http://scm.fstvgo.com/
Request Method: GET
Status Code: 200 OK
Remote Address: 117.40.130.60:80
Referrer Policy: unsafe-url
Request Header:
GET / HTTP/1.1
Host: scm.fstvgo.com
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Cookie: ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv
The second is the request for the CAPTCHA image:
Request URL: http://scm.fstvgo.com/Login/GetValidateCode
Request Method: GET
Status Code: 200 OK
Remote Address: 117.40.130.60:80
Referrer Policy: strict-origin-when-cross-origin
Request Header:
Accept: image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Connection: keep-alive
Cookie: ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv
Host: scm.fstvgo.com
Referer: http://scm.fstvgo.com/
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42
3. Click "登录" (Log in) without filling in anything. The request and response shown in the Network panel are:
Request URL: http://scm.fstvgo.com/Login/Login
Request Method: POST
Status Code: 200 OK
Remote Address: 117.40.130.60:80
Referrer Policy: strict-origin-when-cross-origin
Request Header:
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
Connection: keep-alive
Content-Length: 57
Content-Type: application/x-www-form-urlencoded
Cookie: ASP.NET_SessionId=iqzvmskmcbhvuwwgap2kw3pv
Host: scm.fstvgo.com
Origin: http://scm.fstvgo.com
Referer: http://scm.fstvgo.com/
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42
X-Requested-With: XMLHttpRequest
Form Data:
dlrID:
dlrPwd:
checkCode:
X-Requested-With: XMLHttpRequest
{"errorMsg":"验证码不正确!"}
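As a quick sanity check on the capture above, the form body can be rebuilt and measured. This is a small sketch, assuming the browser sent the three login fields empty and sent X-Requested-With as a form field as well as a header; if that assumption holds, the encoded body should be exactly as long as the Content-Length in the request.

```python
from urllib.parse import urlencode

# Form fields as listed under "Form Data" in the capture, all left empty.
form = {
    "dlrID": "",
    "dlrPwd": "",
    "checkCode": "",
    "X-Requested-With": "XMLHttpRequest",
}

# application/x-www-form-urlencoded body, as a browser would send it.
body = urlencode(form)
print(body)                       # dlrID=&dlrPwd=&checkCode=&X-Requested-With=XMLHttpRequest
print(len(body.encode("ascii")))  # 57
```

The result matches the captured Content-Length: 57, which suggests the page really does post X-Requested-With inside the form data.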
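My guess at how the CAPTCHA works:
1. The client requests a CAPTCHA;
2. The server generates a random code and stores it, renders an image from the code, and returns the image to the client;
3. The client logs in and submits the code it read from the image; the server compares it with the stored code and returns true if they match, false if they do not.
Let's check this with Python: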
import sys

import requests


def yzm():
    # Headers modelled on the browser capture. Note that no Cookie is sent.
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Referer': 'http://scm.fstvgo.com/',
        'Host': 'scm.fstvgo.com',
        'Origin': 'http://scm.fstvgo.com',
        'X-Requested-With': 'XMLHttpRequest',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42',
    }
    base_url = "http://scm.fstvgo.com/"
    login_url = "http://scm.fstvgo.com/Login/Login"
    yzm_url = "http://scm.fstvgo.com/Login/GetValidateCode"
    requests.get(base_url, headers=headers)
    # Fetch the CAPTCHA image and save it so it can be read by eye.
    cap = requests.get(yzm_url, headers=headers)
    with open("cap.png", "wb") as f:
        f.write(cap.content)
    print('Open cap.png, then type the code it shows:')
    jym = sys.stdin.readline().strip()
    print(jym)
    # Same form fields as in the captured Login request.
    data = {
        'dlrID': '',
        'dlrPwd': '',
        'checkCode': jym,
        'X-Requested-With': 'XMLHttpRequest'
    }
    try:
        content = requests.post(login_url, data=data, headers=headers)
        content.encoding = content.apparent_encoding
        # Turn \uXXXX escapes in the body into readable characters
        # (assumes the server escapes non-ASCII text this way).
        content_text = content.text.encode('utf-8').decode('unicode_escape')
        print(content_text)
    except Exception as e:
        print(e)


if __name__ == "__main__":
    yzm()
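The script first requests the CAPTCHA and saves it as an image; you open the image, type the code into the terminal, and the script then calls Login. Following these steps, the CAPTCHA check should pass.
But...
It still returns {"errorMsg":"验证码不正确!"} ("The verification code is incorrect!").
Comparing the headers more carefully: wait, what is this Cookie for?
The analysis above was incomplete: at Login, how does the server know that you are the same client that requested the CAPTCHA earlier? The answer is the Cookie. The script sends no Cookie, and standalone requests.get / requests.post calls do not share cookies with each other, so the server has no way to link the Login to the earlier CAPTCHA request.
So the workflow of an image CAPTCHA, with the Cookie taken into account, is:
1. The client requests the CAPTCHA with a header that carries a Cookie;
2. The server generates a random code, verifycode, stores it as a Cookie : verifycode key-value pair, renders an image from the code, and returns the image to the client;
3. The client logs in with the same Cookie as in step 1 and submits the code; the server looks up the verifycode stored for that Cookie and compares it with the submitted one, returning true if they match and false if they do not. If nothing is found, this particular server even returned a NullReferenceException...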
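The lookup in steps 2 and 3 can be pictured with a toy in-memory model. This is only a sketch for illustration: CaptchaServer, its method names, the 4-character code alphabet, and the choice to return False for an unknown session (rather than raise, as the real server apparently did) are all assumptions, not the site's actual code.

```python
import secrets
import string


class CaptchaServer:
    """Toy stand-in for the server: remembers one code per session cookie."""

    ALPHABET = string.ascii_lowercase + string.digits  # assumed alphabet

    def __init__(self):
        self._codes = {}  # session id (the Cookie value) -> expected code

    def get_validate_code(self, session_id):
        # Step 2: generate a random code and store it under the session id.
        code = "".join(secrets.choice(self.ALPHABET) for _ in range(4))
        self._codes[session_id] = code
        return code  # a real server would return an image of this text

    def login(self, session_id, check_code):
        # Step 3: look up the code stored for this session and compare.
        expected = self._codes.get(session_id)
        if expected is None:
            return False  # no CAPTCHA was ever issued to this session
        return check_code == expected


server = CaptchaServer()
code = server.get_validate_code("session-A")

print(server.login("session-A", code))  # same cookie, right code
print(server.login("session-B", code))  # right code, but a different cookie
```

The second call is the situation the cookieless script was in: the code is correct, but it is presented under a session the server never issued it to.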
So once the Cookie is added to the headers sent with every request, verification passes:
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Connection': 'keep-alive',
        'Referer': 'http://scm.fstvgo.com/',
        'Host': 'scm.fstvgo.com',
        'Origin': 'http://scm.fstvgo.com',
        'X-Requested-With': 'XMLHttpRequest',
        'Cookie': 'ASP.NET_SessionId=xf3vipkyf5vxf34n0acc5t0e',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42',
    }
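A hard-coded ASP.NET_SessionId only works while that session is still alive on the server. A possible alternative, sketched below, is to let requests.Session keep the cookies: a Session stores any cookie the server sets and sends it back on later requests made through the same object. The sketch assumes the server issues the session cookie via Set-Cookie on one of the first two responses; login_with_shared_cookies, read_code, and main are hypothetical names introduced here, and the trimmed headers are a simplification of the ones used above.

```python
BASE_URL = "http://scm.fstvgo.com/"
CAPTCHA_URL = BASE_URL + "Login/GetValidateCode"
LOGIN_URL = BASE_URL + "Login/Login"


def login_with_shared_cookies(session, read_code, image_path="cap.png"):
    """Run the three requests through one session so they share cookies.

    `session` is expected to behave like requests.Session;
    `read_code` is a callable that returns the code typed by the user.
    """
    session.get(BASE_URL)                # page load; may set the session cookie
    captcha = session.get(CAPTCHA_URL)   # code is stored under that same cookie
    with open(image_path, "wb") as f:
        f.write(captcha.content)

    form = {
        "dlrID": "",
        "dlrPwd": "",
        "checkCode": read_code(),
        "X-Requested-With": "XMLHttpRequest",
    }
    reply = session.post(
        LOGIN_URL, data=form, headers={"X-Requested-With": "XMLHttpRequest"}
    )
    return reply.text


def main():
    # Imported here so the helper above has no hard dependency on requests.
    import requests

    with requests.Session() as session:
        prompt = "Open cap.png and type the code: "
        print(login_with_shared_cookies(session, lambda: input(prompt).strip()))
```

Calling main() runs the flow against the live site and prompts for the code; nothing in the script needs to know the cookie value in advance.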
Published by: 全栈程序员-用户IM. Please credit the source when reposting: https://javaforall.cn/159594.html Original link: https://javaforall.cn