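I. The urllib library

1. Overview

The core job of urllib is to send requests to a server, receive the server's response, and retrieve the content of web pages. urllib is the HTTP request library built into Python 3.x. It consists of four modules, as shown in Figure 1-1.

Figure 1-1: Structure of the urllib library

- request: the HTTP request module. It simulates sending a request: you pass in a URL plus any extra parameters, and it reproduces what a browser does when it visits a page.
- error: the exception-handling module. It detects request errors so that you can catch the exceptions and retry or take other action, which keeps the program from terminating.
- parse: a utility module that provides many URL-handling functions, such as splitting, parsing, and joining URLs.
- robotparser: parses a site's robots.txt file to determine which pages may be crawled and which may not. It is used less often.

2. Common functions

(1) urllib.request.urlopen()

urlopen() opens a remote URL and returns a file-like object. You then read from that object as you would from a local file to obtain the remote data. Syntax:

urlopen(url, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT, *, cafile=None, capath=None, cadefault=False, context=None)

Parameters:

| Name | Description |
|---|---|
| `url` | The address of the page to fetch, e.g. http://www.baidu.com/ |
| `data` | Extra data to send to the server. It defaults to None, in which case the request is sent as GET. When data is given, the request is sent as POST. data must be bytes, e.g. the result of urlencode(...).encode('utf-8'). |
| `timeout` | Under poor network conditions, or when the server misbehaves, a request may be slow or fail. Setting a timeout (in seconds) keeps the program from waiting for a result indefinitely. |
| `cafile`, `capath`, `cadefault` | Used to make HTTPS requests against trusted CA certificates. They have been deprecated since Python 3.6 in favor of context. |
| `context` | An ssl.SSLContext that configures SSL/TLS encrypted transport. |

Return value: an http.client.HTTPResponse object. Its commonly used methods and attributes are:

- read(), readline(), readlines(), fileno(), close(): operate on the HTTPResponse data;
- info(): returns an HTTPMessage object containing the headers sent back by the remote server;
- getcode(): returns the HTTP status code;
- geturl(): returns the URL of the resource retrieved (useful for detecting redirects);
- getheaders(): returns the response headers as a list of (name, value) tuples;
- getheader('Server'): returns the value of the named response header, here Server;
- status: the status code;
- reason: the reason phrase returned by the server.

(2) urllib.request.urlretrieve()

urlretrieve() downloads remote data to a local file. Syntax:

urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)

Parameters:

| Name | Description |
|---|---|
| `url` | Address of the remote data. |
| `filename` | Path of the local file to save to. If omitted, the data is downloaded to a temporary file. |
| `reporthook` | Callback invoked once when the connection to the server is established and once after each data block is downloaded. It receives three arguments: the number of blocks downloaded so far, the block size in bytes, and the total size of the file (-1 if the server does not report it). It can be used to display download progress. |
| `data` | Data to POST to the server. |

urlretrieve() returns a (filename, headers) tuple. Example:

```python
import urllib.request

def Schedule(a, b, c):
    """Progress callback for urlretrieve().

    a: number of data blocks downloaded so far
    b: size of each data block in bytes
    c: total size of the remote file (-1 if unknown)
    """
    print("a", a)
    print("b", b)
    print("c", c)
    if c > 0:
        # cap at 100%: the last block is usually only partly filled
        per = min(100.0 * a * b / c, 100.0)
        print('{:.2f}%'.format(per))

url = 'https://xxx.cmg.cn/'
local = 'mylogo.png'
filename, headers = urllib.request.urlretrieve(url, local, Schedule)
print(filename)  # mylogo.png
```

(3) decode()

The decode() method decodes a bytes object using the specified encoding and returns the decoded string. Syntax:

bytes.decode(encoding="utf-8", errors="strict")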
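A short offline sketch of this signature, using an arbitrary sample string ("爬虫", Chinese for "web crawler"): the same UTF-8 bytes decode cleanly with the matching codec, raise an error with the wrong codec under the default errors="strict", and are dropped or escaped under the other handlers.

```python
# Sample bytes: the text "爬虫" encoded as UTF-8 (6 bytes, all >= 0x80).
raw = "爬虫".encode("utf-8")

# Matching codec: the original text comes back.
text = raw.decode("utf-8")

# Wrong codec with the default errors="strict": UnicodeDecodeError is raised.
try:
    raw.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Wrong codec with other error handlers: no exception, but data is lost or escaped.
ignored = raw.decode("ascii", errors="ignore")            # invalid bytes are dropped
replaced = raw.decode("ascii", errors="replace")          # each invalid byte -> U+FFFD
escaped = raw.decode("ascii", errors="backslashreplace")  # each invalid byte -> \xNN

print(text)           # 爬虫
print(strict_failed)  # True
print(repr(ignored))  # ''
print(escaped)        # \xe7\x88\xac\xe8\x99\xab
```

In practice this is why a page fetched with urlopen() is usually decoded with the charset the server declares; "ignore" and "replace" avoid exceptions but silently lose data.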
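Parameters:

| Name | Description |
|---|---|
| `encoding` | The encoding to use, such as "UTF-8" or "gbk". The default is "UTF-8". |
| `errors` | How decoding errors are handled. The default, "strict", raises a UnicodeError (more precisely, a UnicodeDecodeError). Other possible values include "ignore", "replace", and "backslashreplace". ("xmlcharrefreplace" is accepted only when encoding, not when decoding.) |

(4) urllib.parse.urlencode()

urlencode() converts a dictionary, or a sequence of two-element tuples, into a URL-encoded query string. Syntax:

urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)

Parameters:

| Name | Description |
|---|---|
| `query` | The object to convert. |
| `doseq` | Whether each element of a sequence value is converted into its own key=value pair. |
| `safe` | Characters that must not be quoted; empty by default. |
| `encoding` | Encoding used for str keys and values. |
| `errors` | Error-handling scheme for that encoding. |
| `quote_via` | Function used to quote each key and value; quote_plus by default. |

(5) Commonly used functions of the urllib.parse module

urlparse, urlunparse, urljoin, urldefrag, urlsplit, urlunsplit, urlencode, parse_qs, parse_qsl, quote, quote_plus, quote_from_bytes, unquote, unquote_plus, unquote_to_bytes.

(6) Meaning of common status codes

- 200: the request succeeded and the server returned the data normally;
- 301: permanent redirect. For example, visiting www.jingdong.com redirects to www.jd.com;
- 302: temporary redirect. For example, requesting a page that requires login while not logged in redirects to the login page;
- 404: the requested URL cannot be found on the server; in other words, the URL is wrong. (400, by contrast, means Bad Request: the server cannot process a malformed request.);
- 403: the server refuses access because of insufficient permissions;
- 500: internal server error, possibly caused by a bug on the server.

3. Basic usage

(1) Sending a request

```python
import urllib.request

r = urllib.request.urlopen("http://www.python.org/")
```

(2) Reading the response content

```python
import urllib.request

url = "http://www.python.org/"
with urllib.request.urlopen(url) as r:
    content = r.read()  # bytes; call .decode('utf-8') to get a str
```

(3) Passing URL parameters

```python
import urllib.request
import urllib.parse

params = urllib.parse.urlencode({'q': 'urllib', 'check_keywords': 'yes', 'area': 'default'})
url = "https://docs.python.org/3/search.html?{}".format(params)
r = urllib.request.urlopen(url)
```

(4) Passing Chinese parameters

```python
import urllib.request
import urllib.parse

# quote() percent-encodes the keyword (as UTF-8 by default), so Chinese text becomes URL-safe
searchword = urllib.parse.quote(input("Enter the keyword to search for: "))
url = "https://cn.bing.com/images/async?q={}&first=0&mmasync=1".format(searchword)
r = urllib.request.urlopen(url)
```

(5) Sending a POST request

```python
import urllib.request
import urllib.parse

base_url = "your request URL"  # replace with the URL you want to POST to
# parameters of the request
data = {
    "name": "www.lidihuo.com",
    "pass": "xxxx"
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE"
}
# the POST body must be bytes
postdata = urllib.parse.urlencode(data).encode('utf-8')
req = urllib.request.Request(url=base_url, headers=headers, data=postdata, method='POST')
response = urllib.request.urlopen(req)
html = response.read()
# html = response.read().decode('utf-8')  # decode() converts bytes to str
print(html)
```

(6) Sending a GET request

```python
from urllib import parse, request
import random

url = 'https://www.lidihuo.com/python/spider-test.html'
keyvalue = 'url参数'      # sample parameter value
wd = {'wd': keyvalue}    # hypothetical parameter name 'wd'
encoded_wd = parse.urlencode(wd)
new_url = url + '?' + encoded_wd
req = request.Request(new_url)
# To reduce the chance of the site blocking our IP, imitate a browser
ua_list = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
    "Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11"
]
# Pick one User-Agent at random from the list
user_agent = random.choice(ua_list)
req.add_header('User-Agent', user_agent)
response = request.urlopen(req)
print(response.read().decode('utf-8'))
```

(7) Downloading remote data to a local file

```python
import urllib.request

url = "https://www.python.org/static/img/python-logo.png"
urllib.request.urlretrieve(url, "python-logo.png")
```

(8) Using cookies

```python
import urllib.request
import http.cookiejar

url = "http://www.w3school.com.cn/"
# CookieJar keeps cookies in memory; HTTPCookieProcessor stores and sends them automatically
cjar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cjar))
# install_opener() makes every later urlopen() call use this opener
urllib.request.install_opener(opener)
r = urllib.request.urlopen(url)
```

(9) Setting headers

Commonly set request headers:

- User-Agent: some servers or proxies use this value to decide whether the request comes from a browser.
- Content-Type: when a REST interface is used, the server checks this value to determine how to parse the content of the HTTP body. Typical values:
  - application/xml: used for XML RPC calls, such as RESTful/SOAP.
  - application/json: used for JSON RPC calls.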
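  - application/x-www-form-urlencoded: used when a browser submits a web form.

The User-Agent is the identity of the request: it lets the server identify the user's operating system and version, CPU type, and browser type and version. Many websites keep a User-Agent whitelist and only serve requests whose User-Agent is on the list, so a crawler usually has to set a User-Agent that disguises it as a browser. Some servers also check the Referer header (which indicates the page the request was linked from), so you may need to set Referer as well. Example code:

```python
# A crawler disguised as a browser
from urllib import request
import re
import ssl

url = "https://www.lidihuo.com/python/spider-test.html"
# Disable certificate verification for all HTTPS requests (insecure; for testing only)
ssl._create_default_https_context = ssl._create_unverified_context
# Build the request headers
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}
req = request.Request(url, headers=header)
# Send the request and decode the response body (decode() is the inverse of encode())
response = request.urlopen(req).read().decode()
pat = r""  # regular expression for the content you want to extract
data = re.findall(pat, response)
# findall() returns a list, so take the first match with data[0]
print(data[0])
```

(10) Proxy settings

Strictly speaking, what is described here is User-Agent rotation rather than a proxy (a real proxy is configured in urllib with urllib.request.ProxyHandler). Prepare a pool of User-Agent strings in advance and, each time a request is sent, pick one at random with the choice() function of the random module. If requests are very frequent, collect more User-Agents. For example:

```python
import random

def load_page(url, form_data):
    USER_AGENTS = [
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
        "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
        "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
        "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
        "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5"
    ]
    # a different User-Agent is chosen for every call
    user_agent = random.choice(USER_AGENTS)
    headers = {
        'User-Agent': user_agent
    }
    # ...
```

II. The Requests library

1. Overview

Requests bills itself as the easiest-to-use Python HTTP library, and it is also the most widely used one. It is built on top of urllib3 but offers a much friendlier API. Requests sends HTTP/1.1 requests without requiring you to add query strings to URLs by hand or to form-encode POST data. Keep-alive and HTTP connection pooling are fully automatic, handled through urllib3. Main features:

- Keep-Alive & connection pooling
- International domains and URLs
- Sessions with cookie persistence
- Browser-style SSL verification
- Automatic content decoding
- Basic/Digest authentication
- Elegant key/value cookies
- Automatic decompression
- Unicode response bodies
- HTTP(S) proxy support
- Multipart file uploads
- Streaming downloads
- Connection timeouts
- Chunked requests
- .netrc support

2. Installation

```shell
pip install requests
```

3. Basic usage

(1) Make a request

Import the Requests module:

```python
import requests
```

Try to get a web page:

```python
r = requests.get('https://api.github.com/')
```

Fetching a resource with get() follows a basic flow. First, send the GET request and obtain the Response object r. Second, check the returned status with r.status_code: if r.status_code is 200, the response data can be used; otherwise, look into why the request failed.

A general code skeleton for crawling a web page:

```python
import requests

url = 'https://api.github.com/'
try:
    r = requests.get(url, timeout=30)
    r.raise_for_status()  # raises HTTPError if the status code is 4xx or 5xx
    r.encoding = r.apparent_encoding
    print(r.text)
except requests.RequestException:
    print('An exception occurred')
```

Try to make an HTTP POST request:

```python
r = requests.post('https://httpbin.org/post', data={'key': 'value'})
```

(2) Passing parameters in URLs

You often want to send some sort of data in the URL's query string. For example:

```python
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2
```

(3) Response content

We can read the content of the server's response. For example:

```python
r = requests.get('https://api.github.com/events')
print(r.text)  # '[{"repository":{"open_issues":0,"url":"https://github.com/...
```

Hint: Requests decodes r.text with r.encoding. If you change the encoding, Requests will use the new value of r.encoding whenever you call r.text.

(4) Binary response content

You can also access the response body as bytes, for non-text requests:

```python
print(r.content)  # b'[{"repository":{"open_issues":0,"url":"https://github.com/...
```

Hint: the gzip and deflate transfer-encodings are automatically decoded for you.

(5) JSON response content

There is also a built-in JSON decoder, in case you are dealing with JSON data:

```python
r = requests.get('https://api.github.com/events')
print(r.json())  # [{'repository': {'open_issues': 0, 'url': 'https://github.com/...
```

(6) Raw response content

In the rare case that you would like to get the raw socket response from the server, you can access r.raw. If you want to do this, make sure you set stream=True in your initial request.

(7) Custom headers

If you would like to add HTTP headers to a request, simply pass a dict to the headers parameter. For example:

```python
url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
```

Note: custom headers are given less precedence than more specific sources of information. Furthermore, Requests does not change its behavior at all based on which custom headers are specified; the headers are simply passed on into the final request.

(8) More complicated POST requests

Typically, you want to send some form-encoded data, much like an HTML form. To do this, simply pass a dictionary to the data argument. Your dictionary of data will automatically be form-encoded when the request is made:

```python
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post("https://httpbin.org/post", data=payload)
print(r.text)
# {
#   ...
#   "form": {
#     "key2": "value2",
#     "key1": "value1"
#   },
#   ...
# }
```

Note: besides a dict, data can also be a list of tuples, a string, or bytes; JSON data can be sent with the json parameter.

(9) POST a multipart-encoded file

Requests makes it simple to upload multipart-encoded files. For example:

```python
url = 'https://httpbin.org/post'
files = {'file': open('report.xls', 'rb')}
r = requests.post(url, files=files)
print(r.text)
# {
#   ...
#   "files": {
#     "file": "..."
#   },
#   ...
# }
```

You can set the filename, content_type and headers explicitly, for example:

```python
url = 'https://httpbin.org/post'
# (filename, file object, content type, extra headers for this part)
files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
r = requests.post(url, files=files)
print(r.text)
```

(10) Response status codes

You can check the response status code, for example:

```python
r = requests.get('https://httpbin.org/get')
print(r.status_code)  # 200
```

(11) Response headers

```python
print(r.headers)
```

(12) Cookies

If a response contains some cookies, you can quickly access them:

```python
url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
print(r.cookies['example_cookie_name'])  # 'example_cookie_value'
```

4. Advanced usage

(1) Session objects

The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. A Session object has all the methods of the main Requests API. Let's persist some cookies across requests, for example:

```python
s = requests.Session()
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')
print(r.text)
```

Sessions can also be used to provide default data to the request methods. This is done by setting properties on a Session object, for example:

```python
s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
# both 'x-test' and 'x-test2' are sent
s.get('https://httpbin.org/headers', headers={'x-test2': 'true'})
```

(2) Request and Response objects

Whenever a call is made to requests.get() and friends, you are doing two major things. First, you are constructing a Request object which will be sent off to a server to request or query some resource. Second, a Response object is generated once Requests gets a response back from the server. The Response object contains all of the information returned by the server and also contains the Request object you created originally.

```python
r = requests.get('https://en.wikipedia.org/wiki/Monty_Python')
```

If we want to access the headers the server sent back to us, we do this:

```python
print(r.headers)
```

(3) Prepared requests

Whenever you receive a Response object from an API call or a Session call, the request attribute is actually the PreparedRequest that was used. In some cases you may wish to do some extra work to the body or headers before sending a request. For example:

```python
from requests import Request, Session

url = 'https://httpbin.org/post'          # example endpoint
data = {'key': 'value'}
headers = {'User-Agent': 'my-app/0.0.1'}

s = Session()
req = Request('POST', url, data=data, headers=headers)
prepped = req.prepare()

# do something with prepped.body
prepped.body = b'No, I want exactly this as the body.'
# the body changed, so recompute the Content-Length header
prepped.prepare_content_length(prepped.body)

# do something with prepped.headers
del prepped.headers['Content-Type']

resp = s.send(prepped,
    stream=False,
    verify=True,
    proxies=None,
    cert=None,
    timeout=10
)
print(resp.status_code)
```

5. API reference (https://docs.python-requests.org/en/latest/api/)

(1) Main interface

All request functionality can be accessed through seven functions, each of which returns a Response object instance, as shown in Table 5-1.

Table 5-1: The seven request functions

- `requests.request(method, url, **kwargs)`: constructs and sends a request. It takes 15 parameters in total:
  - `method`: the request method, one of the seven methods GET, OPTIONS, HEAD, POST, PUT, PATCH, and DELETE;
  - `url`: the URL of the request;
  - `params`: [dict, list of tuples, or bytes] parameters passed in the URL's query string;
  - `data`: [dict, list of tuples, bytes, or file-like object] data passed in the request body;
  - `json`: [JSON-serializable Python object] JSON data passed in the request body;
  - `headers`: [dict] custom HTTP request headers;
  - `cookies`: [dict or CookieJar] cookies to send with the request;
  - `files`: [dict] files to upload;
  - `auth`: [tuple] enables HTTP authentication;
  - `timeout`: [float or (connect, read) tuple] timeout in seconds;
  - `allow_redirects`: [True (default)/False] enables or disables redirects;
  - `proxies`: [dict] proxy servers to use;
  - `verify`: [True (default)/False, or a path to a CA bundle] whether to verify the server's TLS certificate;
  - `stream`: [True/False (default)] if False, the response content is downloaded immediately;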
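  - `cert`: path to a local SSL client certificate file, or a (cert, key) tuple.
- `requests.get(url, params=None, **kwargs)`: constructs a Request object that asks the server for a resource using GET, and returns a Response object containing the server's resource. Parameters: `url` is the URL of the page to fetch; `params` holds extra parameters for the URL, as a dict or bytes [optional]; `**kwargs` are the 12 optional access-control arguments, the same as for request().
- `requests.head(url, **kwargs)`: sends a HEAD request.
- `requests.post(url, data=None, json=None, **kwargs)`: sends a POST request.
- `requests.put(url, data=None, **kwargs)`: sends a PUT request.
- `requests.patch(url, data=None, **kwargs)`: sends a PATCH request.
- `requests.delete(url, **kwargs)`: sends a DELETE request.

(2) Request sessions

class requests.Session is a request session object that provides cookie persistence, connection pooling, and configuration. Its commonly used attributes and methods are listed in Table 5-2.

Table 5-2: Attributes and methods of Session

| Name | Description |
|---|---|
| `auth = None` | [tuple or object] Default authentication to attach to requests. |
| `cert = None` | SSL client certificate default. If a string, the path to the client's .pem file; if a tuple, a (cert, key) pair. |
| `close()` | Closes all adapters and, as such, the session. |
| `cookies = None` | The cookies persisted by this session (a CookieJar). |
| `delete(url, **kwargs)` | Sends a DELETE request; returns a Response object. |
| `get(url, **kwargs)` | Sends a GET request; returns a Response object. |
| `get_adapter(url)` | [BaseAdapter] Returns the appropriate connection adapter for the given URL. |
| `get_redirect_target(resp)` | Receives a Response; returns a redirect URI or None. |
| `head(url, **kwargs)` | Sends a HEAD request. |
| `headers = None` | A case-insensitive dictionary of headers to be sent on each Request sent from this session. |
| `hooks = None` | Event-handling hooks. |
| `max_redirects = None` | Maximum number of redirects allowed. |
| `merge_environment_settings(...)` | Checks the environment and merges it with the given settings. |
| `mount(prefix, adapter)` | Registers a connection adapter to a prefix. |
| `options(url, **kwargs)` | Sends an OPTIONS request; returns a Response object. |
| `params = None` | [dict] Query-string parameters to attach to each request. |
| `patch(url, data=None, **kwargs)` | Sends a PATCH request; returns a Response object. |
| `post(url, data=None, ...)` | Sends a POST request; returns a Response object. |
| `prepare_request(request)` | Constructs a PreparedRequest from a Request, merging in the session's settings. |
| `request(method, url, ...)` | Constructs a Request, prepares it, sends it, and returns a Response object. |
| `send(request, **kwargs)` | Sends a given PreparedRequest; returns a Response object. |
| `stream = None` | Default for streaming response content. |

(3) Lower-level classes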
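class requests.Request(method=None, url=None, headers=None, files=None, data=None, params=None, auth=None, cookies=None, hooks=None, json=None)

A user-created Request object, which is used to prepare the PreparedRequest that is sent to the server.

class requests.Response

The Response object, which contains the server's response to an HTTP request. Its attributes and methods are listed in Table 5-3.

Table 5-3: Attributes and methods of Response

| Name | Description |
|---|---|
| `apparent_encoding` | The apparent encoding, provided by the charset_normalizer or chardet library: the encoding inferred from the response content itself (an alternative that is often more accurate). |
| `close()` | Releases the connection back to the pool. |
| `content` | The response body as bytes. |
| `cookies = None` | A CookieJar of cookies the server sent back. |
| `elapsed = None` | The time elapsed between sending the request and the arrival of the response (a timedelta). |
| `encoding = None` | The encoding used to decode r.text. It is guessed from the HTTP headers, i.e. the charset field of Content-Type; if a text response has no charset, ISO-8859-1 is assumed. |
| `headers = None` | [dict] Case-insensitive dictionary of response headers. |
| `history = None` | [list] A list of Response objects from the history of the request (any redirects). |
| `is_redirect` | True if this response is a well-formed HTTP redirect that could have been processed automatically. |
| `iter_content(...)` | Iterates over the response data. |
| `iter_lines(...)` | Iterates over the response data, one line at a time. |
| `json(**kwargs)` | Parses the response body as JSON and returns the result. |
| `links` | Returns the parsed header links of the response. |
| `next` | Returns a PreparedRequest for the next request in a redirect chain, if there is one. |
| `ok` | True if status_code is less than 400, False otherwise. |
| `raise_for_status()` | Raises HTTPError if one occurred, i.e. if the status code is 4xx or 5xx. |
| `raw = None` | File-like object representation of the response. |
| `reason = None` | Textual reason of the responded HTTP status, e.g. "OK" or "Not Found". |
| `request = None` | The PreparedRequest object to which this is a response. |
| `status_code = None` | Integer code of the responded HTTP status; 200 means success. |
| `text` | The response content as a str, i.e. the content of the page at the URL. |
| `url = None` | Final URL location of the response. |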
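The Response attributes above can be tried without touching the network. The sketch below is an illustration under stated assumptions, not part of the original text: it assumes Requests is installed, starts a throwaway local server with the standard library's http.server (the paths /page and /missing and the body text are made up), and then shows status_code, reason, ok, the header-derived encoding, overriding encoding before reading text, and raise_for_status() on a 404.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Sample page body (an arbitrary choice), encoded as UTF-8.
BODY = "你好, Requests".encode("utf-8")


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /page returns BODY, every other path returns 404."""

    def do_GET(self):
        if self.path == "/page":
            self.send_response(200)
            self.send_header("Content-Type", "text/html")  # deliberately no charset
            self.send_header("Content-Length", str(len(BODY)))
            self.end_headers()
            self.wfile.write(BODY)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence the default request logging


# Port 0 lets the OS choose a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:{}".format(server.server_port)

session = requests.Session()
session.trust_env = False  # ignore proxy settings from environment variables

r = session.get(base + "/page", timeout=5)
print(r.status_code, r.reason, r.ok)  # 200 OK True
header_encoding = r.encoding  # guessed from Content-Type: no charset -> ISO-8859-1
r.encoding = "utf-8"          # override before reading r.text
print(header_encoding, r.text)

missing = session.get(base + "/missing", timeout=5)
try:
    missing.raise_for_status()  # 404 is a 4xx code, so this raises
    error_status = None
except requests.HTTPError as exc:
    error_status = exc.response.status_code
print(missing.ok, error_status)  # False 404

session.close()
server.shutdown()
server.server_close()
```

Because the server omits charset, r.encoding falls back to ISO-8859-1, and reading r.text without the override would garble the Chinese text; this is exactly the case where setting r.encoding (for example to r.apparent_encoding) matters.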
class requests.PreparedRequest

The fully mutable PreparedRequest object, containing the exact bytes that will be sent to the server. Instances are generated from a Request object and should not be instantiated manually; doing so may produce undesirable effects.

Table 5-4: Attributes and methods of PreparedRequest

| Name | Description |
|---|---|
| `body = None` | Request body to send to the server. |
| `deregister_hook(event, hook)` | Deregisters a previously registered hook. Returns True if the hook existed, False if not. |
| `headers = None` | Dictionary of HTTP headers. |
| `hooks = None` | Dictionary of callback hooks, for internal usage. |
| `method = None` | HTTP verb to send to the server. |
| `path_url` | Builds the path URL to use. |
| `prepare(method=None, url=None, ...)` | Prepares the entire request with the given parameters. |
| `prepare_auth(auth, url='')` | Prepares the given HTTP auth data. |
| `prepare_body(data, files, json=None)` | Prepares the given HTTP body data. |
| `prepare_content_length(body)` | Prepares the Content-Length header based on the request method and body. |
| `prepare_cookies(cookies)` | Prepares the given HTTP cookie data. |
| `prepare_headers(headers)` | Prepares the given HTTP headers. |
| `prepare_hooks(hooks)` | Prepares the given hooks. |
| `prepare_method(method)` | Prepares the given HTTP method. |
| `prepare_url(url, params)` | Prepares the given HTTP URL. |
| `register_hook(event, hook)` | Properly registers a hook. |
| `url = None` | HTTP URL to send the request to. |

class requests.adapters.BaseAdapter

The base transport adapter. Its methods are listed in Table 5-5.

Table 5-5: Methods of BaseAdapter

| Name | Description |
|---|---|
| `close()` | Cleans up adapter-specific items. |
| `send(request, stream=False, ...)` | Sends a PreparedRequest object and returns a Response object. |

The full signature is send(request, stream=False, timeout=None, verify=True, cert=None, proxies=None):

- request: the PreparedRequest being sent;
- stream: whether to stream the request content;
- timeout (float or tuple): how long to wait for the server to send data before giving up, either as a float or as a (connect timeout, read timeout) tuple;
- verify: (1) a boolean, controlling whether the server's TLS certificate is verified, or (2) a string, which must be a path to a CA bundle to use;
- cert: any user-provided SSL certificate to be trusted;
- proxies: the proxies dictionary to apply to the request.

class requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=10, max_retries=0, pool_block=False)

The built-in HTTP adapter for urllib3. It provides a general-case interface for Requests sessions to contact HTTP and HTTPS URLs by implementing the Transport Adapter interface. This class will usually be created by the Session class under the covers.

Table 5-6: Parameters, attributes, and methods of HTTPAdapter

Parameters:

| Name | Description |
|---|---|
| `pool_connections` | The number of urllib3 connection pools to cache. |
| `pool_maxsize` | The maximum number of connections to save in the pool. |
| `max_retries` | The maximum number of retries each connection should attempt. |
| `pool_block` | Whether the connection pool should block for connections. |

Attributes and methods:

| Name | Description |
|---|---|
| `add_headers(request, **kwargs)` | Adds any headers needed by the connection. |
| `build_response(req, resp)` | Builds a Response object from a urllib3 response. This should not be called from user code. |
| `cert_verify(conn, url, verify, cert)` | Verifies an SSL certificate. |
| `close()` | Disposes of any internal state. |
| `get_connection(url, proxies=None)` | Returns a urllib3 connection for the given URL. |
| `init_poolmanager(connections, ...)` | Initializes a urllib3 PoolManager. |
| `proxy_headers(proxy)` | Returns a dictionary of the headers to add to any request sent through a proxy. |
| `proxy_manager_for(proxy, ...)` | Returns a urllib3 ProxyManager for the given proxy. |
| `request_url(request, proxies)` | Obtains the URL to use when making the final request. |
| `send(request, stream=False, ...)` | Sends a PreparedRequest object and returns a Response object. |

(4) Encodings

Table 5-7: Encoding functions

| Name | Description |
|---|---|
| `requests.utils.get_encodings_from_content(content)` | Returns encodings from the given content string. |
| `requests.utils.get_encoding_from_headers(headers)` | Returns the encoding from the given HTTP header dict. |
| `requests.utils.get_unicode_from_response(r)` | Returns the requested content back in unicode. |
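The header rule that Response.encoding relies on (the charset parameter of Content-Type, otherwise ISO-8859-1 for text types) can be checked directly with get_encoding_from_headers(). This is a small offline sketch that assumes Requests is installed; the header values are invented samples, and CaseInsensitiveDict is used because it is the mapping type of Response.headers.

```python
from requests.structures import CaseInsensitiveDict
from requests.utils import get_encoding_from_headers

# Made-up response headers, stored in the same case-insensitive mapping
# type that Response.headers uses.
gbk_headers = CaseInsensitiveDict({"Content-Type": "text/html; charset=gbk"})
plain_text_headers = CaseInsensitiveDict({"Content-Type": "text/plain"})
no_type_headers = CaseInsensitiveDict({"Server": "demo"})

gbk_enc = get_encoding_from_headers(gbk_headers)           # the charset parameter wins
plain_enc = get_encoding_from_headers(plain_text_headers)  # text/* without charset
no_type_enc = get_encoding_from_headers(no_type_headers)   # no Content-Type at all

print(gbk_enc)      # gbk
print(plain_enc)    # ISO-8859-1
print(no_type_enc)  # None
```

When the result is ISO-8859-1 for a page that is really UTF-8 or GBK, setting r.encoding (for example to r.apparent_encoding) before reading r.text avoids garbled text.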