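In Python 3, the urllib.request module can be used to fetch a web page. To save the fetched page locally, either write the response to a file yourself or call urllib.request.urlretrieve().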
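Whichever way the page is saved, it needs a local file name. As a minimal sketch, assuming the name should come from the URL's host, urllib.parse can extract it. The helper name `page_filename`, the fallback name, and the example.com URL are hypothetical and not from the original:

```python
from urllib.parse import urlsplit


def page_filename(url, default="page"):
    """Build a local .html file name from the host part of a URL.

    Hypothetical helper for illustration; falls back to `default`
    when the URL has no host (for example, when the scheme is missing).
    """
    host = urlsplit(url).hostname
    return (host or default) + ".html"


if __name__ == "__main__":
    print(page_filename("http://www.example.com/index.html"))
```

Using the whole host name keeps the result stable for URLs that contain more or fewer dots than expected.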
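Method 1:

    import urllib.request

    url_list = [""]  # URLs of the pages to fetch (left blank here)
    for urlinfo in url_list:
        # Fetch the page and read the raw response bytes
        file = urllib.request.urlopen(urlinfo)
        data = file.read()
        # Name the file after the second dot-separated part of the URL
        with open(str(urlinfo).split(".")[1] + ".html", "wb") as fileinfo:
            fileinfo.write(data)

Method 2: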
    # urlretrieve returns a (local file name, response headers) tuple
    filename, headers = urllib.request.urlretrieve("")  # URL left blank here

Check the access log of the Nginx web server. Each entry records the client IP address, time, request method, protocol, response status, and so on:

    180.156.222.228 - - [26/Nov/2017:20:02:02 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
    180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
    180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
    180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
    180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
    180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
    180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
    180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"
    180.156.222.228 - - [26/Nov/2017:20:02:03 +0800] "GET / HTTP/1.1" 200 4462 "-" "Python-urllib/3.5" "-"

Simulating a browser, Headers approach 1:

    import urllib.request

    url = ""  # page URL (left blank here)
    # A (name, value) tuple for the User-Agent header
    headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0")
    opener = urllib.request.build_opener()
    # Replace the opener's default Python-urllib User-Agent
    opener.addheaders = [headers]
    data = opener.open(url).read()
    with open("1.html", "wb") as fileinfo:
        fileinfo.write(data)

The requests after disguising the User-Agent:
    180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
    180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
    180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"
    180.156.222.228 - - [26/Nov/2017:20:57:22 +0800] "GET / HTTP/1.1" 200 4462 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0" "-"

Simulating a browser, Headers approach 2:
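    import urllib.request

    url = ""  # page URL (left blank here)
    req = urllib.request.Request(url)
    # Attach the browser User-Agent to this single request
    req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0")
    data = urllib.request.urlopen(req).read()
    print(data)

This article is reposted from tianya1993's 51CTO blog. Original link: http://blog.51cto.com/dreamlinux/2044474. To republish it, please contact the original author.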
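The header set through add_header in approach 2 can be checked locally before anything is sent. The following is a minimal sketch; the example.com URL is a hypothetical placeholder, and the snippet only builds the request object without contacting any server:

```python
import urllib.request

# Hypothetical URL for illustration; no request is sent by this snippet.
url = "http://example.com/"
user_agent = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) "
    "Gecko/20100101 Firefox/57.0"
)

req = urllib.request.Request(url)
req.add_header("User-Agent", user_agent)

# Request stores header names capitalized, so the key becomes "User-agent".
stored = req.get_header("User-agent")
print(stored)
```

Looking the header up as "User-Agent" returns None here, because add_header normalizes the name with str.capitalize() before storing it.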