Implementing HTTP Requests in Python 3 (Yinji's blog)

Thursday, 11 June 2020 00:38 (last updated Thursday, 11 June 2020 00:38)

Contents

1 Implementation with urllib

2 Implementation with requests

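1 Implementation with urllib

In Python 3, the urllib and urllib2 modules of Python 2 were merged into a single urllib package (urllib3 is a separate third-party library). The package contains the following modules:

| Module | Purpose |
|---|---|
| urllib.request | Opening and reading URLs |
| urllib.error | Exceptions raised by urllib.request |
| urllib.parse | Parsing URLs |
| urllib.robotparser | Parsing robots.txt files |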

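1.1 The complete request/response model

urllib.request provides a basic function, urlopen(), which fetches data by sending a request to the given URL. Its simplest form is: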

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

# Response
res = request.urlopen('https://www.zhihu.com')  # a timeout can be set, e.g. timeout=2
html = res.read()
print(html)
```

Output:

```
b'<!doctype html>\n<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-react...'
```

The code above can be split into two steps, first building a Request object and then opening it:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

# Request
req = request.Request('https://www.zhihu.com')
# Response
res = request.urlopen(req)
html = res.read()
print(html)
```

Both approaches above send GET requests. A POST request looks like this:

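```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request, parse

url = 'https://www.xxx.com/login'
postdata = {'username': 'miao', 'password': '123456'}
# The request body must be bytes: URL-encode the form fields, then encode to UTF-8
data = parse.urlencode(postdata).encode('utf-8')
# Request: supplying data turns it into a POST
req = request.Request(url, data)
# Response
res = request.urlopen(req)
html = res.read()
print(html)
```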

Try this against a real login URL yourself; www.xxx.com is only a placeholder.

1.2 Handling request headers

The following example adds request headers, setting User-Agent and Referer:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request, parse

url = 'https://www.xxx.com/login'
postdata = {'username': 'xxx', 'password': '******'}
data = parse.urlencode(postdata).encode('utf-8')
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
referer = 'https://www.github.com'
headers = {'User-Agent': user_agent, 'Referer': referer}
# Request: URL, body and headers
req = request.Request(url, data, headers)
# Response
res = request.urlopen(req)
html = res.read()
print(html)
```

Headers can also be added one at a time with add_header:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request, parse

url = 'https://www.xxxxxx.com/login'
postdata = {'username': 'xxx', 'password': '******'}
data = parse.urlencode(postdata).encode('utf-8')
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
referer = 'https://www.github.com'
req = request.Request(url, data)
# Add the headers after the Request has been created
req.add_header('User-Agent', user_agent)
req.add_header('Referer', referer)

res = request.urlopen(req)
html = res.read()
print(html)
```

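Note: pay particular attention to the following headers, because servers check them:

- User-Agent: some servers and proxies use this value to decide whether the request came from a browser.
- Content-Type: when a REST interface is used, the server checks this value to decide how to parse the HTTP body. When calling RESTful or SOAP services provided by a server, a wrong value can make the server refuse the request. Common values:
  - application/xml (for XML RPC calls, such as RESTful/SOAP)
  - application/json (for JSON RPC calls)
  - application/x-www-form-urlencoded (used when a browser submits a web form)
- Referer: servers sometimes check it to prevent hotlinking.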
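As a minimal sketch of the Content-Type point, assuming a hypothetical JSON endpoint at https://www.xxx.com/api, the following builds (but does not send) a urllib POST whose body is JSON and whose Content-Type tells the server how to parse it:

```python
import json
from urllib import request

def build_json_request(url, payload):
    """Build, but do not send, a POST request with a JSON body."""
    body = json.dumps(payload).encode('utf-8')
    req = request.Request(url, data=body, method='POST')
    # Content-Type tells the server to parse the body as JSON
    req.add_header('Content-Type', 'application/json')
    return req

# Hypothetical endpoint, used only for illustration
req = build_json_request('https://www.xxx.com/api', {'key': 'value'})
print(req.get_method())                 # POST
print(req.get_header('Content-type'))   # application/json
print(req.data)                         # b'{"key": "value"}'
```

Passing req to request.urlopen() would actually send it. urllib stores header names via str.capitalize(), which is why the lookup uses 'Content-type'.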

1.3 Handling cookies

To read the value of a cookie set by the server, you can do the following:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request
from http import cookiejar

# The CookieJar collects the cookies set by the server
cookie = cookiejar.CookieJar()
opener = request.build_opener(request.HTTPCookieProcessor(cookie))
# Response
res = opener.open('https://www.zhihu.com')
for item in cookie:
    print(item.name + ": " + item.value)
```

Output:

```
_xsrf: 467z...
_zap: 4f91...
KLBRSID: ed2a...
```

You can, of course, also set the cookie content manually as needed:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

cookie = ('Cookie', 'email=' + 'xxxxxxx@163.com')
opener = request.build_opener()
# Note: this replaces the opener's default headers (including its User-agent)
opener.addheaders = [cookie]
# Request
req = request.Request('https://www.zhihu.com')
# Response
res = opener.open(req)
print(res.headers)
retdata = res.read()
```

Output:

```
Date: Tue, 09 Jun 2020 06:45:54 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 49014
Connection: close
Server: CLOUD ELB 1.0.0
...
```

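1.4 Getting the HTTP status code

For a 200 OK response, calling getcode() on the object returned by urlopen gives the HTTP status code. For error codes such as 4xx and 5xx, however, urlopen raises an exception instead: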

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request, error

try:
    # Response
    res = request.urlopen('https://www.zhihu.com')
    print(res.getcode())
except error.HTTPError as e:
    # 4xx/5xx responses are raised as HTTPError; e.code holds the status
    print("Error code: ", e.code)
```
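To see both branches without depending on how zhihu.com responds, here is a minimal standard-library sketch that starts a throwaway local server (hypothetical paths: / answers 200, anything else 404) and reads the status code either from getcode() or from the HTTPError:

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib import request, error

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: '/' answers 200, any other path 404."""
    def do_GET(self):
        self.send_response(200 if self.path == '/' else 404)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

# An empty ProxyHandler makes the opener ignore proxy environment variables
opener = request.build_opener(request.ProxyHandler({}))

def fetch_status(url):
    """Return the status code whether or not urllib raises HTTPError."""
    try:
        with opener.open(url, timeout=5) as res:
            return res.getcode()
    except error.HTTPError as e:
        return e.code  # 4xx/5xx responses arrive as exceptions

server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

ok_status = fetch_status(base + '/')
missing_status = fetch_status(base + '/missing')
server.shutdown()
server.server_close()
print(ok_status, missing_status)  # 200 404
```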


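1.5 Redirects

The following code checks whether a redirect took place: urlopen follows redirects automatically, and geturl() returns the URL that was finally fetched.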

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request, error

try:
    # Response
    res = request.urlopen('https://www.zhihu.com')
    # geturl() returns the final URL after any redirects were followed
    print(res.geturl())
except error.HTTPError as e:
    print("Error code: ", e.code)
```

Output:

https://www.zhihu.com/signin?next=%2F

If you don't want redirects to be followed, you can subclass HTTPRedirectHandler:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

class RedirectHandler(request.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        # Return the 3xx response itself instead of following its Location header;
        # res.status then holds the 3xx code and res.headers['Location'] the target
        return fp
    # Treat the other common redirect codes the same way
    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = request.build_opener(RedirectHandler)
res = opener.open('https://www.zhihu.cn')
print(res)
```

Output:

<http.client.HTTPResponse object at 0x000001BEAC776160>
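The outputs above depend on how zhihu.com responds at the time. A minimal standard-library sketch against a throwaway local server (hypothetical path /old redirecting to /new) shows the difference between the default opener, which follows the redirect, and a handler that returns the 3xx response unchanged:

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib import request

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /old answers 302 -> /new, anything else 200."""
    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
        else:
            self.send_response(200)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

class NoRedirectHandler(request.HTTPRedirectHandler):
    """Hand back the 3xx response itself instead of following Location."""
    def http_error_302(self, req, fp, code, msg, headers):
        return fp
    http_error_301 = http_error_303 = http_error_307 = http_error_302

server = HTTPServer(('127.0.0.1', 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/old' % server.server_port

# Empty ProxyHandlers make both openers ignore proxy environment variables
follow = request.build_opener(request.ProxyHandler({}))
stop = request.build_opener(request.ProxyHandler({}), NoRedirectHandler)

with follow.open(url, timeout=5) as res:
    followed = (res.status, res.geturl())          # final response after the redirect
with stop.open(url, timeout=5) as res:
    stopped = (res.status, res.headers['Location'])  # the 302 itself

server.shutdown()
server.server_close()
print(followed)  # status 200, URL ending in /new
print(stopped)   # (302, '/new')
```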

1.6 Setting a proxy

For example:

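```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
from urllib import request

# Map each URL scheme to the proxy that should handle it;
# without an 'https' entry, https:// URLs would bypass the proxy
proxy = request.ProxyHandler({'http': 'http://127.0.0.1:8087',
                              'https': 'http://127.0.0.1:8087'})
opener = request.build_opener(proxy)
res = opener.open('https://www.zhihu.com/')
print(res.read())
```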


2 Implementation with requests

2.1 The complete request/response model

1) GET request:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

res = requests.get('https://www.zhihu.com')
print(res.content)
```

2) POST request:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

postdata = {'key': 'value'}
# data= sends the dict as a form-encoded request body
res = requests.post('https://www.zhihu.com', data=postdata)
print(res.content)
```

The other HTTP methods follow the same pattern:

```python
requests.put('https://www.xxxxxx.com/put', data={'key': 'value'})
requests.delete('https://www.xxxxxx.com/delete')
requests.head('https://www.xxxxxx.com/get')
requests.options('https://www.xxxxxx.com/get')
```
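To check what such calls would actually send without contacting the placeholder host, requests can build a PreparedRequest first. A minimal sketch (the www.xxxxxx.com URLs are placeholders and nothing is sent); a prepared request can later be sent with requests.Session().send():

```python
import requests

# Build requests without sending them, to inspect what would go over the wire
post = requests.Request('POST', 'https://www.xxxxxx.com/post',
                        data={'key': 'value'}).prepare()
delete = requests.Request('DELETE', 'https://www.xxxxxx.com/delete').prepare()

print(post.method, post.headers['Content-Type'], post.body)
# POST application/x-www-form-urlencoded key=value
print(delete.method, delete.body)
# DELETE None
```

This also shows that data= with a dict produces the application/x-www-form-urlencoded body mentioned in section 1.2.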

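3) For URLs with query parameters, instead of writing out the full URL, you can let requests build the query string from a dict passed as params: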

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

payload = {'Keywords': 'bolg:qiyeboy', 'pageindex': 1}
# params= builds the query string; a timeout can also be set, e.g. timeout=2
res = requests.get('https://www.zhihu.com', params=payload)
print(res.url)
```

Output:

https://www.zhihu.com/?Keywords=bolg%3Aqiyeboy&pageindex=1
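urllib has no params argument. A minimal sketch of building the same query string with the standard library's urllib.parse.urlencode, after which the URL could be passed to request.urlopen():

```python
from urllib.parse import urlencode

payload = {'Keywords': 'bolg:qiyeboy', 'pageindex': 1}
# urlencode() percent-encodes reserved characters such as ':' (-> %3A)
url = 'https://www.zhihu.com/?' + urlencode(payload)
print(url)  # https://www.zhihu.com/?Keywords=bolg%3Aqiyeboy&pageindex=1
```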

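2.2 Responses and encoding

Taking res = requests.get('https://www.zhihu.com') as an example, the response object provides:

- res.content: the body as bytes
- res.text: the body decoded to text
- res.encoding: the encoding guessed from the HTTP headers, which is used to produce res.text

Here the third-party library chardet detects the encoding from the raw bytes: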

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests
import chardet

res = requests.get('https://www.zhihu.com')
# detect() returns a dict with:
#   'encoding':   the detected encoding
#   'confidence': how confident the detector is (0 to 1)
#   'language':   the detected natural language, if any
ret_dic = chardet.detect(res.content)
# Decode res.text with the detected encoding
res.encoding = ret_dic['encoding']
print(ret_dic)
print(res.text)
```

Output:

```
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
```
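Why the encoding matters: according to the requests documentation, res.text is res.content decoded with res.encoding. The sketch below approximates that decoding step with the standard library on a hypothetical GBK-encoded body, showing that the right encoding recovers the text while a wrong one silently produces garbage:

```python
def decode_body(content, encoding):
    """Decode raw response bytes, replacing undecodable bytes (roughly what res.text does)."""
    return str(content, encoding, errors='replace')

raw = '知乎'.encode('gbk')  # hypothetical response body served as GBK

print(decode_body(raw, 'gbk'))         # 知乎
print(decode_body(raw, 'iso-8859-1'))  # wrong encoding: no error, but garbled text
```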

2.3 Handling request headers

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
res = requests.get('https://www.zhihu.com', headers=headers)
print(res.content)
```

2.4 Handling the status code and response headers

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

res = requests.get('https://www.baidu.com')
# res.status_code is the status code; requests.codes.ok is 200
if res.status_code == requests.codes.ok:
    print("Status code:", res.status_code)
    print("Response headers:", res.headers)
    print("Field lookup:", res.headers.get('content-type'))
else:
    # raise_for_status() raises HTTPError for 4xx/5xx status codes
    # and returns None otherwise
    res.raise_for_status()
```

Output:

```
Status code: 200
Response headers: {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Tue, 09 Jun 2020 13:42:42 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:52 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
Field lookup: text/html
```
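The lookup res.headers.get('content-type') works even though the server sent Content-Type, because requests stores response headers in a case-insensitive dictionary, requests.structures.CaseInsensitiveDict. A small sketch using two of the headers from the output above as a stand-in for res.headers:

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for res.headers, filled with two headers from the output above
headers = CaseInsensitiveDict({'Content-Type': 'text/html', 'Content-Encoding': 'gzip'})

print(headers.get('content-type'))  # text/html
print(headers['CONTENT-ENCODING'])  # gzip
print(list(headers.keys()))         # ['Content-Type', 'Content-Encoding'] (original case kept)
```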

2.5 Handling cookies

1) Reading the cookies set by the server:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
res = requests.get('https://www.baidu.com', headers=headers)
# res.cookies holds the cookies the server set on this response
for cookie in res.cookies.keys():
    print(cookie + ": " + res.cookies.get(cookie))
```

Output:

```
BAIDUID: D285BF54C9CC968744699A9B4F843D60:FG=1
BIDUPSID: D285BF54C9CC9687F9E45D28DB4C9F33
H_PS_PSSID: 1456_31326_21100_31069_31765_31673_30823
PSTM: 1591710519
BDSVRTM: 0
BD_HOME: 1
```

2) Sending custom cookies:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}
# Custom cookies, passed as a dict
cookies = dict(name='guangtouqiang', age='18')
res = requests.get('https://www.baidu.com', headers=headers, cookies=cookies)
print(res.text)
```
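To see what the cookies= argument turns into without contacting baidu.com, the request can be prepared and its headers inspected. A minimal sketch: requests joins the cookies into a single Cookie header, the same header that section 1.3 set by hand with urllib.

```python
import requests

cookies = dict(name='guangtouqiang', age='18')
# Prepare (but do not send) the request to see the Cookie header requests builds
prepared = requests.Request('GET', 'https://www.baidu.com', cookies=cookies).prepare()
print(prepared.headers.get('Cookie'))
```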

3) Handling cookies automatically with a Session:

```python
# coding: utf-8
import warnings
warnings.filterwarnings('ignore')
import requests

login_url = 'https://www.zhihu.com/login'
s = requests.Session()
datas = {'name': 'guangtouqiang', 'passwd': '123456'}
# Visit as a guest first so the server assigns a cookie; without this step
# the server may treat the client as an illegitimate user.
# allow_redirects=True follows redirects; any redirects are listed in res.history
s.get(login_url, allow_redirects=True)
# If the login succeeds, the session cookie is upgraded to member permissions
res = s.post(login_url, data=datas, allow_redirects=True)
print(res.text)
```

Output:

```
<html>
<head><title>400 Bad Request</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<hr><center>openresty</center>
</body>
</html>
```
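The Session above carries cookies from one request to the next. A minimal, self-contained sketch of that behaviour, assuming a hypothetical local server (built with the standard library's http.server) whose /login path sets a session cookie and whose /profile path echoes back the Cookie header it receives:

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
import requests

class CookieDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /login sets a cookie, other paths echo the Cookie header."""
    def do_GET(self):
        if self.path == '/login':
            body = b'logged in'
            self.send_response(200)
            self.send_header('Set-Cookie', 'session=abc123; Path=/')
        else:
            body = self.headers.get('Cookie', '').encode('utf-8')
            self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(('127.0.0.1', 0), CookieDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

with requests.Session() as s:
    s.trust_env = False  # ignore proxy environment variables for this local demo
    s.get(base + '/login', timeout=5)                # server sets the cookie
    echoed = s.get(base + '/profile', timeout=5).text  # session sends it back
    stored = s.cookies.get('session')

server.shutdown()
server.server_close()
print(stored, echoed)  # abc123 session=abc123
```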

2.7 Redirects and history
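A minimal sketch of how requests records redirects, assuming a hypothetical local server that answers /old with a 302 to /new: res.history lists the intermediate 3xx responses in order, and allow_redirects=False stops at the first one.

```python
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
import requests

class RedirectDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /old answers 302 -> /new, anything else 200."""
    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
        else:
            self.send_response(200)
        self.send_header('Content-Length', '0')
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(('127.0.0.1', 0), RedirectDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/old' % server.server_port

with requests.Session() as s:
    s.trust_env = False  # ignore proxy environment variables for this local demo
    followed = s.get(url, timeout=5)                        # follows the 302
    stopped = s.get(url, timeout=5, allow_redirects=False)  # stops at the 302

server.shutdown()
server.server_close()
print(followed.status_code, followed.url)                # 200, URL ending in /new
print([r.status_code for r in followed.history])        # [302]
print(stopped.status_code, stopped.headers['Location'])  # 302 /new
```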
