第二十四個夏天後: [Python] Using cURL (PycURL)

Wednesday, October 13, 2010

[Python] Using cURL (PycURL)

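I used to hack around with PHP cURL a lot. Lately I have mostly been writing Python, and I finally hit a case that called for cURL again, so here are my notes from poking at it.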

For PycURL usage, I worked from these documents:

PycURL Documentation

PycURL: Curl Objects

libcurl - curl_easy_setopt()

After some trial and error, here is what I learned:

Every option that starts with "CURLOPT_" is written in Python with the "pycurl." prefix instead. For example, CURLOPT_URL becomes pycurl.URL.
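As a small illustration of that naming rule, the sketch below strips the prefix from a libcurl option name to get the attribute name used on the Python side. The helper `pycurl_option_name` is hypothetical (it is not part of PycURL), and it only works on strings, so it does not check that the option exists in the installed build.

```python
def pycurl_option_name(libcurl_name):
    """Turn a libcurl option name such as 'CURLOPT_URL' into 'URL'.

    Hypothetical helper for illustration only; not part of PycURL.
    """
    prefix = "CURLOPT_"
    if not libcurl_name.startswith(prefix):
        raise ValueError("not a CURLOPT_ name: %r" % libcurl_name)
    return libcurl_name[len(prefix):]


# Print the Python spelling of a few options used in this post.
for name in ("CURLOPT_URL", "CURLOPT_FOLLOWLOCATION", "CURLOPT_COOKIEJAR"):
    print("%s -> pycurl.%s" % (name, pycurl_option_name(name)))
```

With pycurl installed, `getattr(pycurl, pycurl_option_name("CURLOPT_URL"))` should give the same constant as `pycurl.URL`.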

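In PHP, setting curl_setopt( $ch , CURLOPT_RETURNTRANSFER , true ); lets you store the page in a variable with $result = curl_exec( $ch );. In PycURL you instead create a buffer with b = StringIO.StringIO(), register it with c.setopt(pycurl.WRITEFUNCTION, b.write), and read the data back with b.getvalue() after the transfer.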
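On Python 3 the `StringIO` module no longer exists and the response body arrives as bytes, so the same idea would use `io.BytesIO`. Below is a minimal sketch, assuming a handle that behaves like `pycurl.Curl()`. The name `fetch_body` is made up for this note, and the handle is passed in so the function itself does not need to import pycurl.

```python
import io


def fetch_body(curl, url):
    """Perform one transfer and return the response body as bytes.

    `curl` is assumed to be a pycurl.Curl() handle (or any object with
    the same setopt/perform interface). Hypothetical helper, not a
    PycURL API.
    """
    buf = io.BytesIO()
    curl.setopt(curl.URL, url)
    # libcurl calls buf.write() with each chunk of the body it receives.
    curl.setopt(curl.WRITEFUNCTION, buf.write)
    curl.perform()
    return buf.getvalue()


# Usage with a real handle (needs pycurl and network access):
#   import pycurl
#   c = pycurl.Curl()
#   html = fetch_body(c, 'target_url').decode('utf-8', 'replace')
#   c.close()
```

Creating a new buffer on every call plays the same role as the `b.close()` and second `StringIO.StringIO()` in the example that follows: each request gets its own clean buffer.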

A simple example:

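        # Python 2 code (StringIO module, print statement, urllib.urlencode)
        import re
        import urllib
        import pycurl
        import StringIO

        url = 'target_url'
        c = pycurl.Curl()
        c.setopt( pycurl.URL , url )
        c.setopt( pycurl.FOLLOWLOCATION , True )

        # read cookies from, and save them back to, the same file
        c.setopt( pycurl.COOKIEFILE , '/tmp/pycurl' )
        c.setopt( pycurl.COOKIEJAR , '/tmp/pycurl' )

        # collect the response body in memory
        b = StringIO.StringIO()
        c.setopt(pycurl.WRITEFUNCTION, b.write)

        c.perform()

        #print b.getvalue()
        r = b.getvalue()
        b.close()
        # fresh buffer for the next request
        b = StringIO.StringIO()
        c.setopt(pycurl.WRITEFUNCTION, b.write)

        # find <form>...</form> blocks; group 2 is the text inside the form tag
        check = re.findall( re.compile( '(<form(.*?)</form>)' , flags=(re.IGNORECASE|re.DOTALL) ) , r )
        if len(check) < 1 :
                print "No FORM DATA"
                raise SystemExit(1)  # nothing to submit

        # only the first form is used
        r = check[0][1]
        out = {}
        # match <input> tags with name before value, or value before name
        for sub_info in re.findall( re.compile( '<input(.*?)(name=[\'"]{0,1}(.*?)[\'"]{0,1}[\s]+value=[\'"]{0,1}(.*?)[\'"]{0,1}[\s>]|value=[\'"]{0,1}(.*?)[\'"]{0,1}[\s]+name=[\'"]{0,1}(.*?)[\'"]{0,1}[\s>])' , flags=re.IGNORECASE ) , r ):
                if len(sub_info) != 6 :
                        continue
                if sub_info[2] != '':
                        # name=... value=...
                        out[ sub_info[2] ] = sub_info[3]
                elif sub_info[5] != '':
                        # value=... name=...
                        out[ sub_info[5] ] = sub_info[4]

        for key in out :
                print "\t",key,":\t",out[key]

        # submit the collected fields as a POST on the same handle,
        # so the cookies from the first request are sent along
        url = 'target_form_action_url'
        c.setopt( pycurl.URL , url )
        c.setopt( pycurl.FOLLOWLOCATION , True )
        c.setopt( pycurl.POST , True )
        c.setopt( pycurl.POSTFIELDS , urllib.urlencode(out) )

        c.perform()

        #print b.getvalue()
        r = b.getvalue()
        b.close()
        # closing the handle is what writes the cookie jar file
        c.close()

        # placeholder output path, like the URLs above
        target_file = 'target_file'
        f = open( target_file , 'wb' )
        f.write( r )
        f.close()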

This example visits a page, collects the input fields from its form, uses cookies along the way, and finally sends a POST and saves the result to a file.
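The regular expressions in the example only catch inputs where name and value sit next to each other. As an alternative sketch (a different technique from the one used above, not a drop-in copy), the field collection could be done with the standard library's `html.parser` on Python 3. `FormInputCollector`, `collect_form_fields`, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormInputCollector(HTMLParser):
    """Collect name/value pairs from <input> tags inside the first <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.done = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            attr = dict(attrs)
            name = attr.get("name")
            if name:
                # An input without a value attribute is sent as "".
                self.fields[name] = attr.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self.in_form:
            # Stop after the first form, like check[0] in the example.
            self.in_form = False
            self.done = True


def collect_form_fields(html):
    """Return a dict of input name -> value for the first form in html."""
    parser = FormInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


# Hypothetical sample page, only for demonstration.
sample = """<form action="/login" method="post">
<input type="hidden" name="token" value="abc123">
<input value="guest" name="user">
<input type="submit" value="Go">
</form>"""

fields = collect_form_fields(sample)
print(fields)
print(urlencode(fields))
```

The string returned by `urlencode(fields)` is what would be passed to `pycurl.POSTFIELDS`, in place of `urllib.urlencode(out)` in the Python 2 example.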
