So I started by using pycurl to request data from the web server, then tried asking the server to send the response gzip-compressed and decompressing it with zlib.
Example:
import pycurl
import zlib
from io import BytesIO

def getWebData(url):
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url.encode('utf-8'))
    c.setopt(pycurl.FOLLOWLOCATION, True)
    # Accept-Encoding asks the server for a compressed response.
    c.setopt(pycurl.HTTPHEADER, [
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0',
        'Accept-Language: zh-tw,zh;q=0.8,en-us;q=0.5,en;q=0.3',
        'Accept-Encoding: gzip, deflate',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    ])
    # Create an empty cookie file, then use it to load and save cookies.
    open('/tmp/pycurl', 'wb').close()
    c.setopt(pycurl.COOKIEFILE, '/tmp/pycurl')
    c.setopt(pycurl.COOKIEJAR, '/tmp/pycurl')
    # Collect the raw response body and headers as bytes.
    b = BytesIO()
    h = BytesIO()
    c.setopt(pycurl.WRITEFUNCTION, b.write)
    c.setopt(pycurl.HEADERFUNCTION, h.write)
    c.perform()
    c.close()
    body = b.getvalue()
    b.close()
    header = h.getvalue()
    h.close()
    return {'header': header, 'body': body}

target = "http://blog.changyy.org"
#data = urllib2.urlopen(target)
# 16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer.
data = zlib.decompress(getWebData(target)['body'], 16 + zlib.MAX_WBITS)
print(data)
The pycurl request header asks for a gzip-encoded response, and once the data comes back, zlib.decompress unpacks it for use.
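One thing to watch: the request advertises both gzip and deflate, and a server may also ignore the header and reply uncompressed, in which case a fixed gzip decompress would raise zlib.error. Below is a minimal sketch of choosing the decompression from the response's Content-Encoding header. The helper names decode_body and find_content_encoding are my own, not part of pycurl or zlib, and the header parsing assumes the raw header bytes collected through HEADERFUNCTION as in the example above.

```python
import gzip
import zlib


def decode_body(body, content_encoding):
    """Decompress a response body according to its Content-Encoding value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        # 16 + MAX_WBITS: expect a gzip header and trailer.
        return zlib.decompress(body, 16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        try:
            # Standard form: a zlib-wrapped deflate stream.
            return zlib.decompress(body)
        except zlib.error:
            # Some servers send raw deflate without the zlib wrapper.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # No encoding, or one this sketch does not handle: return bytes untouched.
    return body


def find_content_encoding(raw_header):
    """Return the last Content-Encoding value found in raw header bytes.

    With FOLLOWLOCATION enabled, the headers of every redirect hop are
    collected, so the last occurrence belongs to the final response.
    """
    value = None
    for line in raw_header.decode("iso-8859-1").splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip().lower() == "content-encoding":
            value = rest.strip()
    return value


if __name__ == "__main__":
    # Offline demonstration with locally compressed data.
    sample = gzip.compress(b"<html>hello</html>")
    headers = b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
    print(decode_body(sample, find_content_encoding(headers)))
```

With the function above, this would be used as decode_body(result['body'], find_content_encoding(result['header'])), where result is the dict returned by getWebData.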