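For the differences among urllib, urllib2, and urllib3, see the separate comparison. In Python 3, urllib is a package; its modules are listed in the table at the start of section 1. The notes below describe, section by section, what the examples in this article do.

Section 1.1: urllib.request (urllib2 in Python 2) provides a basic function, urlopen, which fetches data by sending a request to the given URL. The first example shows the simplest form, followed by its output. The second example splits the same call into two steps: build a Request object, then pass it to urlopen. Both are GET requests. The third example shows a POST request; it is best tried against a site of your own.

Section 1.2: The first example adds request headers by setting User-Agent and Referer. The second adds the same headers with add_header. A note on headers that servers check closes the section.

Section 1.3: To read the value of a particular cookie, iterate over a CookieJar as in the first example. Cookie content can also be added by hand, as in the second.

Section 1.4: For 200 OK, calling getcode() on the object returned by urlopen is enough to get the HTTP status code. Error status codes such as 4xx and 5xx raise an exception instead.

Section 1.5: The first example checks whether a redirect took place. To avoid following redirects, define a custom HTTPRedirectHandler subclass, as in the second.

Section 1.6: The example routes requests through a proxy.

Section 2.1: The examples cover (1) a GET request, (2) a POST request, and (3) complex URLs, where requests can build the query string from a dict instead of requiring the complete URL. The other HTTP methods (requests.put, requests.delete, requests.head, requests.options) are called in the same style.

Section 2.2: Taking res = requests.get('https://www.zhihu.com') as an example, res.content holds the raw bytes, res.text the decoded text, and res.encoding the encoding used to decode it. The example uses the third-party library chardet to detect the encoding of a string or file.

Section 2.5: The examples cover (1) reading the cookies a server sets automatically, (2) sending custom cookies, and (3) letting a Session handle cookies automatically.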
1 Implementation with urllib
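| Name | Purpose |
| --- | --- |
| urllib.request | Opens and reads URLs |
| urllib.error | Handles exceptions raised by urllib.request |
| urllib.parse | Parses URLs |
| urllib.robotparser | Parses robots.txt files |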
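The table names urllib.parse as the URL parser, but the article never calls it directly. The sketch below is a minimal illustration of three of its functions; the '/explore' path is an assumption made up for the example, while the sign-in URL is the one that appears in the redirect output of section 1.5.

```python
from urllib import parse

# Split a URL into its components.
parts = parse.urlparse('https://www.zhihu.com/signin?next=%2F')
print(parts.scheme)   # https
print(parts.netloc)   # www.zhihu.com
print(parts.path)     # /signin

# Decode the query string into a dict of lists; percent-escapes are unquoted.
query = parse.parse_qs(parts.query)
print(query)          # {'next': ['/']}

# Resolve a relative link against a base URL ('/explore' is a made-up path).
joined = parse.urljoin('https://www.zhihu.com/signin', '/explore')
print(joined)         # https://www.zhihu.com/explore
```

None of these calls touch the network, so they are safe to experiment with before sending real requests.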
1.1 Implementing the full request/response model
Simplest form, a GET request with urlopen:

    # coding: utf-8
    import warnings
    from urllib import request

    warnings.filterwarnings('ignore')

    # Response; a timeout can be set, for example timeout=2
    res = request.urlopen('https://www.zhihu.com')
    html = res.read()
    print(html)

Output:

    b'<!doctype html>\n<html lang="zh" data-hairline="true" data-theme="light"><head><meta charSet="utf-8"/><title data-react...'
The same GET request in two steps:

    # coding: utf-8
    import warnings
    from urllib import request

    warnings.filterwarnings('ignore')

    # Step 1: build the request
    req = request.Request('https://www.zhihu.com')
    # Step 2: send it and read the response
    res = request.urlopen(req)
    html = res.read()
    print(html)
POST request:

    # coding: utf-8
    import warnings
    from urllib import parse, request

    warnings.filterwarnings('ignore')

    url = 'https://www.xxx.com/login'
    postdata = {'username': 'miao', 'password': '123456'}
    # Request expects the POST body as bytes, so URL-encode the form first
    data = parse.urlencode(postdata).encode('utf-8')

    # Request
    req = request.Request(url, data)
    # Response
    res = request.urlopen(req)
    html = res.read()
    print(html)
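What turns a urllib request into a POST is the presence of a body, and that body has to be bytes. The sketch below shows this without sending anything over the network; the example.com endpoint is a hypothetical placeholder, not a URL from the article.

```python
from urllib import parse, request

# Hypothetical endpoint, used only for illustration.
url = 'https://www.example.com/login'
form = {'username': 'miao', 'password': '123456'}

# urlencode() builds the form string; encode() turns it into bytes.
body = parse.urlencode(form).encode('utf-8')

get_req = request.Request(url)              # no body, so a GET
post_req = request.Request(url, data=body)  # body present, so a POST

print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
print(post_req.data)          # b'username=miao&password=123456'
```

Passing a dict straight to Request fails when the request is sent, which is why the encoding step matters.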
1.2 Handling request headers
Setting User-Agent and Referer through a headers dict:

    # coding: utf-8
    import warnings
    from urllib import parse, request

    warnings.filterwarnings('ignore')

    url = 'https://www.xxx.com/login'
    postdata = {'username': 'xxx', 'password': '******'}
    # The POST body must be bytes
    data = parse.urlencode(postdata).encode('utf-8')
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    referer = 'https://www.github.com'
    headers = {'User-Agent': user_agent, 'Referer': referer}

    # Request
    req = request.Request(url, data, headers)
    # Response
    res = request.urlopen(req)
    html = res.read()
    print(html)
Setting the same headers with add_header:

    # coding: utf-8
    import warnings
    from urllib import parse, request

    warnings.filterwarnings('ignore')

    url = 'https://www.xxxxxx.com/login'
    postdata = {'username': 'xxx', 'password': '******'}
    # The POST body must be bytes
    data = parse.urlencode(postdata).encode('utf-8')
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    referer = 'https://www.github.com'

    req = request.Request(url, data)
    # Add the headers after the request object has been created
    req.add_header('User-Agent', user_agent)
    req.add_header('Referer', referer)
    res = request.urlopen(req)
    html = res.read()
    print(html)
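Note: pay particular attention to certain headers, because servers check them. One example is Content-Type, whose common values include:
application/xml (used for XML RPC calls, such as RESTful/SOAP calls)
application/json (used for JSON RPC calls)
application/x-www-form-urlencoded (used when a browser submits a web form)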
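As a minimal sketch of the application/json case, the request below carries a JSON body and declares it in the Content-Type header. The endpoint and payload are hypothetical, and nothing is sent; the code only inspects the Request object.

```python
import json
from urllib import request

# Hypothetical JSON endpoint and payload, used only for illustration.
url = 'https://www.example.com/api/login'
payload = {'username': 'miao', 'password': '123456'}

body = json.dumps(payload).encode('utf-8')
req = request.Request(url, data=body,
                      headers={'Content-Type': 'application/json'})

# urllib stores header names via str.capitalize(), hence 'Content-type'.
print(req.get_header('Content-type'))  # application/json
print(req.get_method())                # POST
print(json.loads(req.data))            # {'username': 'miao', 'password': '123456'}
```

If the declared type and the actual body format disagree, a server that checks Content-Type may reject the request.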
1.3 Cookie handling
Reading the cookies a server sets:

    # coding: utf-8
    import warnings
    from http import cookiejar
    from urllib import request

    warnings.filterwarnings('ignore')

    cookie = cookiejar.CookieJar()
    opener = request.build_opener(request.HTTPCookieProcessor(cookie))
    # Response; the opener stores any cookies it receives in the jar
    res = opener.open('https://www.zhihu.com')
    for item in cookie:
        print(item.name + ": " + item.value)

Output:

    _xsrf: 467z...
    _zap: 4f91...
    KLBRSID: ed2a...
Adding a cookie by hand:

    # coding: utf-8
    import warnings
    from urllib import request

    warnings.filterwarnings('ignore')

    cookie = ('Cookie', 'email=' + 'xxxxxxx@163.com')
    opener = request.build_opener()
    opener.addheaders = [cookie]
    # Request
    req = request.Request('https://www.zhihu.com')
    # Response
    res = opener.open(req)
    print(res.headers)
    retdata = res.read()

Output:

    Date: Tue, 09 Jun 2020 06:45:54 GMT
    Content-Type: text/html; charset=utf-8
    Content-Length: 49014
    Connection: close
    Server: CLOUD ELB 1.0.0...
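To watch the cookie jar work without depending on an external site, the following sketch starts a throwaway local server that sets a cookie and echoes back whatever Cookie header it receives. The handler class, the cookie name and value, and the local address are all assumptions made up for this illustration.

```python
import threading
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Sets a cookie on every response and echoes the Cookie header it got."""

    def do_GET(self):
        body = (self.headers.get('Cookie') or 'none').encode('utf-8')
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123')  # made-up cookie
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(('127.0.0.1', 0), CookieEchoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

jar = cookiejar.CookieJar()
opener = request.build_opener(
    request.ProxyHandler({}),          # bypass any system proxy for the local server
    request.HTTPCookieProcessor(jar),  # store and replay cookies
)

with opener.open(url, timeout=5) as res:
    first = res.read()    # first visit: the jar is still empty
with opener.open(url, timeout=5) as res:
    second = res.read()   # second visit: the jar replays the cookie

stored = {c.name: c.value for c in jar}
print(first, second, stored)

server.shutdown()
server.server_close()
```

The first response body reports that no cookie was sent; the second shows the cookie coming back, which is the behaviour HTTPCookieProcessor adds on top of a plain opener.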
1.4 Getting the HTTP response code
Reading the status code and catching HTTP errors:

    # coding: utf-8
    import warnings
    from urllib import error, request

    warnings.filterwarnings('ignore')

    try:
        # Response
        res = request.urlopen('https://www.zhihu.com')
        print(res.getcode())
    except error.HTTPError as e:
        # HTTPError carries the status code of the failed request
        if hasattr(e, 'code'):
            print("Error code: ", e.code)

Output:

    200
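The example above only exercises the success path. The sketch below runs both paths against a throwaway local server so that the error case is reproducible; the handler, the paths '/ok' and '/missing', and the helper fetch_status are hypothetical names introduced for this illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import error, request


class DemoHandler(BaseHTTPRequestHandler):
    """Answers /ok with 200 and every other path with 404."""

    def do_GET(self):
        if self.path == '/ok':
            body = b'hello'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# An empty ProxyHandler bypasses any system proxy for the local server.
opener = request.build_opener(request.ProxyHandler({}))


def fetch_status(url):
    """Return the HTTP status code whether or not the opener raises."""
    try:
        with opener.open(url, timeout=5) as res:
            return res.getcode()
    except error.HTTPError as e:
        return e.code


server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port

ok_status = fetch_status(base + '/ok')
missing_status = fetch_status(base + '/missing')
print(ok_status, missing_status)  # 200 404

server.shutdown()
server.server_close()
```

Wrapping the call this way gives the caller one code path for both outcomes, at the cost of hiding the error body that HTTPError also carries.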
1.5 Redirects
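Checking whether a redirect happened:

    # coding: utf-8
    import warnings
    from urllib import error, request

    warnings.filterwarnings('ignore')

    try:
        # Response
        res = request.urlopen('https://www.zhihu.com')
        # geturl() returns the final URL; if it differs from the
        # requested URL, a redirect took place
        print(res.geturl())
    except error.HTTPError as e:
        if hasattr(e, 'code'):
            print("Error code: ", e.code)

Output:

    https://www.zhihu.com/signin?next=%2F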
Custom HTTPRedirectHandler:

    # coding: utf-8
    import warnings
    from urllib import request

    warnings.filterwarnings('ignore')

    class RedirectHandler(request.HTTPRedirectHandler):
        def http_error_301(self, req, fp, code, msg, headers):
            # Returning None means the 301 is not followed;
            # urllib then raises HTTPError for it
            pass

        def http_error_302(self, req, fp, code, msg, headers):
            # Follow the 302 through the parent class, then record
            # the original status code and the final URL on the result
            result = super().http_error_302(req, fp, code, msg, headers)
            result.status = code
            result.newurl = result.geturl()
            return result

    opener = request.build_opener(RedirectHandler)
    res = opener.open('https://www.zhihu.cn')
    print(res)

Output:

    <http.client.HTTPResponse object at 0x000001BEAC776160>
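A different way to refuse redirects, shown here as a sketch rather than as the article's method, is to override redirect_request so that it returns None; urllib then reports the 3xx status as an HTTPError. The local server, its '/old' and '/new' paths, and the class names are assumptions made up for the demonstration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import error, request


class RedirectingHandler(BaseHTTPRequestHandler):
    """Redirects /old to /new with a 302; every other path answers 200."""

    def do_GET(self):
        if self.path == '/old':
            self.send_response(302)
            self.send_header('Location', '/new')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'new page'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


class NoRedirect(request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # decline to build the follow-up request


server = HTTPServer(('127.0.0.1', 0), RedirectingHandler)  # any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_port
no_proxy = request.ProxyHandler({})  # bypass any system proxy

# Default behaviour: the redirect is followed transparently.
follow = request.build_opener(no_proxy)
with follow.open(base + '/old', timeout=5) as res:
    followed_url = res.geturl()

# With NoRedirect: the 302 surfaces as an HTTPError.
stay = request.build_opener(no_proxy, NoRedirect)
blocked_code = None
location = None
try:
    stay.open(base + '/old', timeout=5)
except error.HTTPError as e:
    blocked_code = e.code
    location = e.headers.get('Location')

print(followed_url, blocked_code, location)

server.shutdown()
server.server_close()
```

The Location header stays available on the exception, so the caller can still decide whether to follow the redirect by hand.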
1.6 Proxy settings
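Sending requests through a proxy:

    # coding: utf-8
    import warnings
    from urllib import request

    warnings.filterwarnings('ignore')

    # The keys are URL schemes: an https:// URL only uses the 'https' entry.
    # The address must not contain spaces.
    proxy = request.ProxyHandler({'http': '127.0.0.1:8087',
                                  'https': '127.0.0.1:8087'})
    opener = request.build_opener(proxy)
    res = opener.open('https://www.zhihu.com/')
    print(res.read())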
2 Implementation with requests
2.1 Implementing the full request/response model
1) GET request:

    # coding: utf-8
    import warnings
    import requests

    warnings.filterwarnings('ignore')

    res = requests.get('https://www.zhihu.com')
    print(res.content)
2) POST request:

    # coding: utf-8
    import warnings
    import requests

    warnings.filterwarnings('ignore')

    postdata = {'key': 'value'}
    res = requests.post('https://www.zhihu.com', data=postdata)
    print(res.content)
3) Building a complex URL from a params dict:

    # coding: utf-8
    import warnings
    import requests

    warnings.filterwarnings('ignore')

    payload = {'Keywords': 'bolg:qiyeboy', 'pageindex': 1}
    # A timeout can also be passed here
    res = requests.get('https://www.zhihu.com', params=payload)
    print(res.url)

Output:

    https://www.zhihu.com/?Keywords=bolg%3Aqiyeboy&pageindex=1
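To see where the query string in that output comes from, the sketch below rebuilds it with the standard library alone. It does not show how requests encodes params internally; it only demonstrates that urllib.parse.urlencode yields the same string for this payload.

```python
from urllib import parse

payload = {'Keywords': 'bolg:qiyeboy', 'pageindex': 1}

# urlencode() percent-escapes reserved characters such as ':' (as %3A).
query = parse.urlencode(payload)
url = 'https://www.zhihu.com/?' + query
print(url)  # https://www.zhihu.com/?Keywords=bolg%3Aqiyeboy&pageindex=1

# parse_qs() reverses the encoding; every value comes back as a list of strings.
decoded = parse.parse_qs(query)
print(decoded)  # {'Keywords': ['bolg:qiyeboy'], 'pageindex': ['1']}
```

Note that the integer 1 comes back as the string '1', since a query string carries no type information.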
2.2 Response and encoding
Detecting the encoding with chardet:

    # coding: utf-8
    import warnings
    import chardet
    import requests

    warnings.filterwarnings('ignore')

    res = requests.get('https://www.zhihu.com')
    # detect() returns a dict with:
    # - 'encoding': the detected encoding
    # - 'confidence': how confident the detection is
    # - 'language': the detected natural language, if any
    ret_dic = chardet.detect(res.content)
    # Decode the body with the detected encoding
    res.encoding = ret_dic['encoding']
    print(ret_dic)
    print(res.text)

Output:

    {'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
    <html>
    <head><title>400 Bad Request</title></head>
    <body bgcolor="white">
    <center><h1>400 Bad Request</h1></center>
    <hr><center>openresty</center>
    </body>
    </html>
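chardet guesses the encoding from the bytes themselves. When the server declares a charset in its Content-Type header, reading that declaration is a simpler alternative. The sketch below uses only the standard library and a hand-built header; the header value mirrors the one printed in section 1.3, and the UTF-8 fallback is an assumption of this sketch.

```python
from email.message import Message

# Hand-built stand-ins for a response's headers and raw body.
headers = Message()
headers['Content-Type'] = 'text/html; charset=utf-8'
raw = '知乎'.encode('utf-8')

# Fall back to UTF-8 when no charset is declared (an assumption of this sketch).
charset = headers.get_content_charset() or 'utf-8'
text = raw.decode(charset, errors='replace')

print(charset)  # utf-8
print(text)     # 知乎
```

The headers object on a urllib response is a subclass of email.message.Message, so res.headers.get_content_charset() can be called on a live response in the same way.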
2.3 Handling request headers
Setting User-Agent through the headers argument:

    # coding: utf-8
    import warnings
    import requests

    warnings.filterwarnings('ignore')

    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    res = requests.get('https://www.zhihu.com', headers=headers)
    print(res.content)
2.4 Handling the response code and response headers
Reading the status code and headers:

    # coding: utf-8
    import warnings
    import requests

    warnings.filterwarnings('ignore')

    res = requests.get('https://www.baidu.com')
    # res.status_code holds the response code;
    # comparing it with requests.codes.ok checks for 200
    if res.status_code == requests.codes.ok:
        print("Status code:", res.status_code)
        print("Response headers:", res.headers)
        print("Single field:", res.headers.get('content-type'))
    else:
        # raise_for_status() raises an exception for 4XX and 5XX codes
        # and returns None for 200
        res.raise_for_status()

Output:

    Status code: 200
    Response headers: {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Tue, 09 Jun 2020 13:42:42 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:52 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
    Single field: text/html
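The Set-Cookie value in the headers printed above packs a cookie and its attributes into one string. As a small sketch, the standard library's http.cookies module can split it apart; the header string is copied from that output, and the variable names are illustrative.

```python
from http.cookies import SimpleCookie

# Set-Cookie value copied from the response headers printed above.
raw = 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/'

parsed = SimpleCookie()
parsed.load(raw)
morsel = parsed['BDORZ']

print(morsel.value)       # 27315
print(morsel['max-age'])  # 86400
print(morsel['domain'])   # .baidu.com
print(morsel['path'])     # /
```

All attribute values come back as strings, so numeric ones such as max-age need an explicit int() before any arithmetic.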
2.5 Cookie handling
1) Reading the cookies a server sets automatically:

    # coding: utf-8
    import warnings
    import requests

    warnings.filterwarnings('ignore')

    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    res = requests.get('https://www.baidu.com', headers=headers)
    for cookie in res.cookies.keys():
        print(cookie + ": " + res.cookies.get(cookie))

Output:

    BAIDUID: D285BF54C9CC968744699A9B4F843D60:FG=1
    BIDUPSID: D285BF54C9CC9687F9E45D28DB4C9F33
    H_PS_PSSID: 1456_31326_21100_31069_31765_31673_30823
    PSTM: 1591710519
    BDSVRTM: 0
    BD_HOME: 1
2) Sending custom cookies:

    # coding: utf-8
    import warnings
    import requests

    warnings.filterwarnings('ignore')

    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    headers = {'User-Agent': user_agent}
    # Custom cookies
    cookies = dict(name='guangtouqiang', age='18')
    res = requests.get('https://www.baidu.com', headers=headers, cookies=cookies)
    print(res.text)
3) Letting a Session handle cookies automatically:

    # coding: utf-8
    import warnings
    import requests

    warnings.filterwarnings('ignore')

    login_url = 'https://www.zhihu.com/login'
    s = requests.Session()
    datas = {'name': 'guangtouqiang', 'passwd': '123456'}
    # Guest step: the server first assigns a cookie; without this step
    # the site treats the client as an illegitimate user.
    # allow_redirects=True permits redirects; if one happens,
    # res.history shows the intermediate responses.
    s.get(login_url, allow_redirects=True)
    # Once verification succeeds, the session is upgraded to member privileges
    res = s.post(login_url, data=datas, allow_redirects=True)
    print(res.text)

Output:

    <html>
    <head><title>400 Bad Request</title></head>
    <body bgcolor="white">
    <center><h1>400 Bad Request</h1></center>
    <hr><center>openresty</center>
    </body>
    </html>
2.7 Redirects and history