After the Twenty-Fourth Summer: PaaS


Sunday, February 26, 2012

[Python] Notes on Using Heroku @ Windows 7

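It's about time I played with Heroku. Like Google App Engine, it is a PaaS offering. Looking closer, I discovered that Heroku is actually built on top of EC2, which reminded me that Dropbox, another famous service, is built on S3. So quite a few successful services don't build everything themselves from the data center up! That was an interesting realization: there's no need to insist on starting from zero. One of the main reasons to use services like Heroku and GAE is to make good use of the resources already at hand rather than renting a virtual machine. A VM gives you more privileges and lets you do more, but having too many resources at the start isn't necessarily a good thing. Besides, the benefit of Heroku and GAE is that when traffic grows you can scale just by throwing money at it, which doesn't seem like a bad investment, right? The biggest drawback is having to get used to these architectures. Looked at from another angle, maybe that is the cost of the cloud, just like having to reshape what you want to do to fit the Hadoop/HBase architecture.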

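Heroku bills by process-hour. For example, a free account gets 750 free hours per month, and running one process for a whole month takes 720 hours, so it costs nothing. The regular rate is 0.05 USD per process per hour, and processes come in two kinds: web processes and background processes. However, Heroku's free database space is only 5MB, whereas GAE gives 1GB, though GAE also meters inbound/outbound data and so on. I haven't yet figured out whether Heroku meters traffic or restricts which libraries can be used :P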
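
The free-tier arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name monthly_cost_usd is made up, a 30-day month is assumed, and the figures (750 free hours, 0.05 USD per process-hour) are the ones quoted in this post, not necessarily current pricing.

```python
def monthly_cost_usd(processes, days=30, free_hours=750, rate_cents=5):
    """Estimated monthly bill for always-on processes (hypothetical helper)."""
    used_hours = processes * 24 * days          # one process = 720 hours in 30 days
    billable_hours = max(0, used_hours - free_hours)
    return billable_hours * rate_cents / 100.0  # compute in cents, convert at the end


print(monthly_cost_usd(1))  # one process fits inside the free hours
print(monthly_cost_usd(2))  # a second process is billed for the overflow
```

Working in whole cents keeps the result exact for these inputs; with two processes, 1440 hours minus the 750 free hours leaves 690 billable hours.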

OK, back to how to do this on Windows. Things to prepare:

Python 2.7

http://www.python.org/ftp/python/2.7/python-2.7.msi

http://www.python.org/ftp/python/2.7/python-2.7.amd64.msi

Python - virtualenv

virtualenv-1.7.1.2.tar.gz

Heroku Windows

heroku-toolbelt.exe

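After installing Heroku Windows and Python 2.7 in that order and extracting virtualenv-1.7.1.2.tar.gz, you're ready to go. I mainly followed Heroku's "Getting Started with Python on Heroku/Cedar" article to try things out in the simplest way. It uses Python Flask, a lightweight web microframework.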

Steps:

Python 2.7 is at C:\Python27\python.exe; virtualenv is at C:\virtualenv-1.7.1.2; heroku-toolbelt installs to C:\Program Files\Heroku by default. Once the software is installed, everything can be done directly from cmd.

Create an empty directory, e.g. helloflask

C:\Users\user>mkdir helloflask && cd helloflask
C:\Users\user\helloflask>

Create the virtualenv environment:

C:\Users\user\helloflask>C:\Python27\python.exe C:\virtualenv-1.7.1.2\virtualenv.py venv --distribute
New python executable in venv\Scripts\python.exe
Installing distribute...done.
Installing pip...done.

Activate the virtual environment:

C:\Users\user\helloflask>venv\Scripts\activate
(venv) C:\Users\user\helloflask>

Install Flask:

(venv) C:\Users\user\helloflask>pip install flask

Downloading/unpacking flask
Downloading Flask-0.8.tar.gz (494Kb): 494Kb downloaded
Running setup.py egg_info for package flask
...
Downloading/unpacking Werkzeug>=0.6.1 (from flask)
Downloading Werkzeug-0.8.3.tar.gz (1.1Mb): 1.1Mb downloaded
Running setup.py egg_info for package Werkzeug
...
Downloading/unpacking Jinja2>=2.4 (from flask)
Downloading Jinja2-2.6.tar.gz (389Kb): 389Kb downloaded
Running setup.py egg_info for package Jinja2
...
Successfully installed flask Werkzeug Jinja2
Cleaning up...

Create app.py (C:\Users\user\helloflask\app.py):

import os

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello():
  return 'Hello World!'

if __name__ == '__main__':
  # Bind to PORT if defined, otherwise default to 5000.
  port = int(os.environ.get('PORT', 5000))
  app.run(host='0.0.0.0', port=port)
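
The PORT lookup in app.py is the part that matters for Heroku: the process is expected to learn which port to listen on from an environment variable, and 5000 is only the local fallback. A minimal sketch of that lookup on its own (the helper name resolve_port is hypothetical, and passing the environment as a dict is just to make it easy to try):

```python
import os


def resolve_port(environ=None, default=5000):
    """Return the port from PORT if set, else the default (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    return int(environ.get('PORT', default))


print(resolve_port({}))                 # no PORT set: falls back to 5000
print(resolve_port({'PORT': '8123'}))   # PORT arrives as a string, so it is cast to int
```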

Create the Heroku environment files:

(venv) C:\Users\user\helloflask>pip freeze > requirements.txt
(venv) C:\Users\user\helloflask>cat requirements.txt
Flask==0.8
Jinja2==2.6
Werkzeug==0.8.3
distribute==0.6.24
(venv) C:\Users\user\helloflask>echo web: python app.py > Procfile
(venv) C:\Users\user\helloflask>cat Procfile
web: python app.py
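
A Procfile line is just a process type, a colon, and the command to run. The sketch below parses that format into a dict to make the structure explicit; parse_procfile is an illustrative helper written for this note, not part of foreman or the Heroku tooling.

```python
def parse_procfile(text):
    """Map process type -> command for each 'name: command' line."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sep, command = line.partition(':')
        if sep:
            processes[name.strip()] = command.strip()
    return processes


print(parse_procfile('web: python app.py\n'))
```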

Run the web service:

(venv) C:\Users\user\helloflask>foreman start

Then open http://localhost:5000 in a browser and you'll see "Hello World!". Press Ctrl+C to stop it.

That completes local development and testing. As for deployment, read the introduction on the official site carefully; I won't go into it here.

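Overall, I find GAE a bit simpler: you only need to install Python and the GAE SDK, whereas Heroku also requires a Ruby environment and so on. For people already used to Ruby, it probably feels right at home :D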

Sunday, February 19, 2012

[Python] Notes on the Google App Engine Data Model @ Windows 7

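Having tried basic GAE, I started poking at data storage. Working with the GAE DB isn't hard and is a lot like Django: first define a Data Model (the equivalent of a database table), and then you can start using it! This example only skims the surface; for richer usage see the official GAE documentation on data models.

Create the NewsData data model: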

from google.appengine.ext import db

class NewsData(db.Model):
  check = db.StringProperty()
  url = db.StringProperty()
  title = db.StringProperty()
  date = db.DateProperty()

Add a record:

import datetime
item = NewsData(check='1',url='http://localhost',title='TestNews',date=datetime.datetime.now().date())
item.put()

Query data:

Query using the Data Model:

q = NewsData.all()
results = q.fetch(3)
for p in results:
  print '<a href="%s">%s</a>' % (p.url,p.title)

Query using GqlQuery with SQL-like syntax:

# Query news from the last 3 days
q = db.GqlQuery("SELECT * FROM NewsData WHERE date > :1 ORDER BY date DESC", datetime.datetime.now().date() - datetime.timedelta(days=3) )
results = q.fetch(3)
for p in results:
  print '<a href="%s">%s</a>' % (p.url,p.title)
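
To make the GQL query concrete, here is a plain-Python sketch of what it asks for: keep rows whose date is later than three days ago, newest first, at most three of them. It uses an in-memory list of dicts instead of the datastore, and the helper recent_news and the sample rows are made up for illustration.

```python
import datetime


def recent_news(rows, today, days=3, limit=3):
    """Rows with date > today - days, ordered by date DESC, capped at limit."""
    cutoff = today - datetime.timedelta(days=days)
    newer = [r for r in rows if r['date'] > cutoff]
    newer.sort(key=lambda r: r['date'], reverse=True)
    return newer[:limit]


today = datetime.date(2012, 2, 19)
rows = [
    {'title': 'old', 'date': datetime.date(2012, 2, 10)},
    {'title': 'edge', 'date': datetime.date(2012, 2, 16)},   # exactly 3 days old: excluded by ">"
    {'title': 'fresh', 'date': datetime.date(2012, 2, 18)},
    {'title': 'newest', 'date': datetime.date(2012, 2, 19)},
]
for r in recent_news(rows, today):
    print(r['date'], r['title'])
```

Note the strict comparison: a row dated exactly three days back does not pass date > cutoff.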

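All of the above is straightforward. Next, try a simple MVC layout: define the DB model in mydb.py, with myput.py and myquery.py acting as CGI scripts that operate on it (I lazily merged the V and C of MVC together XD).

Directory structure: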

app.yaml
favicon.ico
index.yaml
main.py
mydb.py
myquery.py
myput.py

app.yaml:

application: engineapp
version: 1
runtime: python
api_version: 1

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico

- url: /query
  script: myquery.py

- url: /put
  script: myput.py

- url: .*
  script: main.py
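
The order of the handlers matters: the catch-all .* entry has to come last, otherwise it would swallow /query and /put. A rough sketch of first-match routing, assuming each pattern is matched against the whole request path and the first match wins (route is a hypothetical helper, not GAE code):

```python
import re

# (pattern, target) pairs in the same order as app.yaml
HANDLERS = [
    (r'/favicon\.ico', 'favicon.ico'),
    (r'/query', 'myquery.py'),
    (r'/put', 'myput.py'),
    (r'.*', 'main.py'),
]


def route(path, handlers=HANDLERS):
    """Return the target of the first handler whose pattern matches the full path."""
    for pattern, target in handlers:
        if re.fullmatch(pattern, path):
            return target
    return None


print(route('/query'))     # myquery.py
print(route('/anything'))  # main.py
```

A misspelled URL in a handler therefore fails quietly: the request simply falls through to the catch-all.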

mydb.py:

from google.appengine.ext import db
class NewsData(db.Model):
  check = db.StringProperty()
  url = db.StringProperty()
  title = db.StringProperty()
  date = db.DateProperty(auto_now_add=True)

myput.py:

# -*- coding: utf-8 -*-
print 'Content-Type: text/html'
print ''

import mydb
import cgi, hashlib, datetime, urllib
from google.appengine.ext import db

request = cgi.FieldStorage()

newsDate = datetime.datetime.now().date()
newsTitle = 'defaultTitle' if request is None or 'title' not in request or request['title'].value == '' else cgi.escape(request['title'].value)
newsURL = 'http://localhost' if request is None or 'url' not in request or request['url'].value == '' else request['url'].value
newsCheck = hashlib.md5(str(newsTitle)+str(newsURL)).hexdigest()

if mydb.NewsData.all().filter('check =',newsCheck).get() is None:
  item = mydb.NewsData(check=newsCheck,url=unicode(newsURL,'utf-8'),title=unicode(newsTitle,'utf-8'),date=newsDate)
  item.put()
  print 'Put'
else:
  print 'No Operation'
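
The check field is what prevents duplicates: it is the MD5 of title plus URL, and a record is only stored when no existing record has the same digest. The same idea in a standalone Python 3 sketch, with a plain dict standing in for the datastore (put_news and store are illustrative names):

```python
import hashlib


def put_news(store, title, url):
    """Store (title, url) keyed by md5(title + url); skip if already present."""
    check = hashlib.md5((title + url).encode('utf-8')).hexdigest()
    if check in store:
        return 'No Operation'
    store[check] = {'title': title, 'url': url}
    return 'Put'


store = {}
print(put_news(store, 'TestNews', 'http://localhost'))  # Put
print(put_news(store, 'TestNews', 'http://localhost'))  # No Operation
```

MD5 is used here only as a fingerprint for deduplication, not for anything security-related.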

myquery.py:

# -*- coding: utf-8 -*-
print 'Content-Type: text/html'
print ''

import mydb
import datetime
from google.appengine.ext import db

print """
<html>
<head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
"""

q = db.GqlQuery("SELECT * FROM NewsData WHERE date > :1 ORDER BY date DESC", datetime.datetime.now().date() - datetime.timedelta(days=3) )
results = q.fetch(50)
for p in results:
  url = p.url
  date = p.date
  check = p.check
  title = p.title
  print '<a href="%s">[%s] %s(%s)</a><br />' % ( url.encode('utf-8'), date, title.encode('utf-8'), check.encode('utf-8') )

print """
</body>
</html>
"""

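With this, you can add data by browsing to http://localhost:port/put (or http://localhost:port/put?title=123&url=www.google.com ) and display it via http://localhost:port/query. In addition, the SDK Console in the Google App Engine Launcher lets you inspect the datastore contents directly in the browser, which is really convenient.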

A common pitfall here is the encoding of the data in the DB. In myput.py I store what comes in from CGI using unicode(data,'utf-8'), and in myquery.py I use data.encode('utf-8') when printing.
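
The rule of thumb is: decode bytes to text on the way in, encode text to bytes on the way out. A Python 3 sketch of the same round trip (in Python 2 the decode step corresponds to the unicode(data,'utf-8') call); the sample title is made up:

```python
raw = '新聞標題'.encode('utf-8')   # what arrives from the request: UTF-8 bytes

text = raw.decode('utf-8')         # decode once at the boundary, store text
assert isinstance(text, str)

out = text.encode('utf-8')         # encode again only when writing the response
print(len(text), len(out))         # 4 characters, 12 bytes
```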

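Saturday, February 18, 2012

[Python] Notes on Using Google App Engine (GAE) @ Windows 7

I registered an account a long time ago but never seriously used it :P Later GAE added Cron Jobs and I still didn't use it. Time to get motivated and give it a try! This post focuses only on practicing Python on the local side.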

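Features practiced:

Set up a CLI-compatible environment (so script.py can be tested in command mode instead of through the browser every time)

Fetch data from the web with urllib2

Extract data with re (string processing with Regular Expressions)

Access web services with urlfetch (google.appengine.api)

Implement urlfetch with cookie support

Post a message with the plurk api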

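Flow:

First, on Windows 7, write some Python using the googleappengine lib, then fetch data from a URL, process the string with re, and post a message through the plurk api. Once that flow works, switch to CGI mode so it can run on GAE, and finally set up cron jobs.

Install the GAE development environment: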

GoogleAppEngine-1.6.2.msi

python-2.5.msi

npp.5.9.8.Installer.exe

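google-appengine-docs-20120131.zip (GAE offline documentation)

Install everything with the defaults, and don't forget to set the environment variable so that you can type python directly in cmd mode.

Create a GAE Project:

Local only. Everything is left at the defaults except the Project location (D:\GAE\workspace\engineapp). Once it's set up, run it to test; in theory you should be able to view this hello world program in the browser without trouble.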

Add a script (plurk.py):

Create an empty plurk.py file in the Project location, then switch to command mode to test:

C:\> D:
D:\> cd GAE\workspace\engineapp
D:\GAE\workspace\engineapp>python plurk.py
(...no output...)

Make the script able to use the GAE libs:

# -*- coding: utf-8 -*-
"""
# ImportError: No module named google.appengine.api
import sys, os
DIR_PATH = r'C:\Program Files\Google\google_appengine'
EXTRA_PATHS = [
  DIR_PATH,
  os.path.join(DIR_PATH, 'lib', 'antlr3'),
  os.path.join(DIR_PATH, 'lib', 'django'),
  os.path.join(DIR_PATH, 'lib', 'django_0_96','django'),
  os.path.join(DIR_PATH, 'lib', 'django_1_2','django'),
  os.path.join(DIR_PATH, 'lib', 'django_1_3','django'),
  os.path.join(DIR_PATH, 'lib', 'simplejson'),
  os.path.join(DIR_PATH, 'lib', 'fancy_urllib'),
  os.path.join(DIR_PATH, 'lib', 'ipaddr'),
  os.path.join(DIR_PATH, 'lib', 'webob'),
  os.path.join(DIR_PATH, 'lib', 'yaml', 'lib'),
]
sys.path = EXTRA_PATHS + sys.path

# AssertionError: No api proxy found for service "urlfetch"
from google.appengine.api import apiproxy_stub_map
from google.appengine.api import datastore_file_stub
from google.appengine.api import mail_stub
from google.appengine.api import urlfetch_stub
from google.appengine.api import user_service_stub

APP_ID = u'test_app'
#os.environ['AUTH_DOMAIN'] = AUTH_DOMAIN # gmail.com
#os.environ['USER_EMAIL'] = LOGGED_IN_USER # account@gmail.com

apiproxy_stub_map.apiproxy = apiproxy_stub_map.APIProxyStubMap()
# Use a fresh stub datastore.
stub = datastore_file_stub.DatastoreFileStub(APP_ID, '/dev/null', '/dev/null')
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', stub)
# Use a fresh stub UserService.
apiproxy_stub_map.apiproxy.RegisterStub('user',user_service_stub.UserServiceStub())
# Use a fresh urlfetch stub.
apiproxy_stub_map.apiproxy.RegisterStub('urlfetch', urlfetch_stub.URLFetchServiceStub())
# Use a fresh mail stub.
apiproxy_stub_map.apiproxy.RegisterStub('mail', mail_stub.MailServiceStub())
"""

Put this block at the very top of plurk.py. When running from command mode, just enable it by removing the opening and closing """.
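
The path setup boils down to building a list of library directories and putting them in front of sys.path. A compact sketch of that idea follows; sdk_paths is a hypothetical helper, and the SDK directory and library names are just the ones used in this post.

```python
import os
import sys


def sdk_paths(sdk_dir, libs):
    """SDK root followed by one entry per bundled library directory."""
    return [sdk_dir] + [os.path.join(sdk_dir, 'lib', *parts) for parts in libs]


SDK_DIR = r'C:\Program Files\Google\google_appengine'  # raw string keeps the backslashes literal
LIBS = [('antlr3',), ('simplejson',), ('yaml', 'lib')]

extra = sdk_paths(SDK_DIR, LIBS)
# Prepend so SDK modules are found first; skip entries that are already present.
sys.path[:0] = [p for p in extra if p not in sys.path]
print(extra[0])
```

Prepending rather than appending means the SDK's bundled copies take priority over same-named packages installed elsewhere.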

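Next, write the sample code that uses the Plurk API:

This follows the structure of my earlier post "[PHP] Building a simple bot with the official Plurk API: saving Karma with a bot, using Yahoo News as the example", ported to GAE Python. It implements three main functions: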

def getNews()

Fetch the news

def doAct()

Call a url/api

def getTinyURL(src)

Get a shortened URL

Code:

import urllib
from google.appengine.api import urlfetch

# return responseContent
def doAct( targetURL, method='POST', data=None, cookie=None, header=None ):
  rawHeader = {'User-Agent':'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'}
  if header:
      rawHeader = header

  # load cookie: send stored cookies back as "k1=v1; k2=v2"
  try:
      if cookie is not None:
          cookieData = '; '.join( c.key + '=' + c.value for c in cookie.values() )
          if len(cookieData) > 0:
              rawHeader['Cookie'] = cookieData
  except Exception, e:
      pass

  # post data
  try:
      if data and len(data) > 0:
          data = urllib.urlencode( data )
      else:
          data = None
  except Exception, e:
      data = None

  rawMethod = urlfetch.POST if method != 'GET' else urlfetch.GET
  response = urlfetch.fetch(url=targetURL,payload=data,method=rawMethod,headers=rawHeader,deadline=10)

  # remember cookies set by the server for later calls
  if cookie is not None:
      cookie.load(response.headers.get('set-cookie', ''))
  return response.content if response is not None and response.content is not None else ''
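
The cookie handling in doAct is a small round trip: parse whatever the server sent in Set-Cookie into a cookie jar, then send the stored pairs back in a Cookie header on the next request, separated by "; ". Sketched on its own in Python 3, where the Cookie module is named http.cookies (cookie_header is a hypothetical helper and the cookie values are invented):

```python
from http.cookies import SimpleCookie


def cookie_header(jar):
    """Join stored cookies as 'k1=v1; k2=v2' for a Cookie request header."""
    return '; '.join('%s=%s' % (c.key, c.value) for c in jar.values())


jar = SimpleCookie()
jar.load('session=abc123; Path=/')   # as if taken from a Set-Cookie response header
jar.load('lang=zh')                  # a second cookie from a later response
print(cookie_header(jar))
```

Attributes such as Path are kept on the stored cookie but are not echoed back; only the name=value pairs go into the request header.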

import re, urllib2

# return {'status':'ok','data':[ {'title':NewsTitle, 'url':NewsURL } , ... ]}
def getNews():
  out = {'status':'fail','data':None}
  try:
      # setting
      newsURL = 'http://tw.yahoo.com/'
      newsPatternBeginChecker = '<label class="img-border clearfix">'
      newsPatternEndChecker = '<ol class="newsad clearfix">'
      newsRePatternNewsExtract = '<h3[^>]*>[^<]*<a href="(.*?)"[^>]*>(.*?)</a></h3>'
      #newsRePatternNewsExtract = r'<h3[^>]*>[^<]*<a href="(?P<url>.*?)"[^>]*>(?P<title>.*?)</a></h3>'
      newsReOptionsNewsExtract = re.DOTALL
      newsPatternURLChecker = 'http:'

      # keep only the part of the page between the begin and end markers
      raw = urllib2.urlopen(newsURL).read()
      checker = str(raw).find(newsPatternBeginChecker)
      if checker < 0:
          out['data'] = 'newsPatternBeginChecker fail'
          return out
      raw = raw[checker+len(newsPatternBeginChecker):]
      checker = raw.find(newsPatternEndChecker)
      if checker < 0:
          out['data'] = 'newsPatternEndChecker fail'
          return out
      raw = raw[:checker]
      #print "##",raw,"##"

      # collect (url, title) pairs, then decide the status once after the loop
      m = re.findall( newsRePatternNewsExtract, raw, newsReOptionsNewsExtract )
      out['data'] = []
      for data in m:
          urlChecker = data[0].find(newsPatternURLChecker)
          if urlChecker >= 0:
              out['data'].append( {'title':data[1],'url':data[0][urlChecker:]} )
      if len(out['data']) > 0:
          out['status'] = 'ok'
      else:
          out['data'] = 'not found'
  except Exception, e:
      out['data'] = str(e)
  return out
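
The extraction step is a two-group regular expression applied with re.DOTALL so that . also matches newlines inside a headline. Here it is run in Python 3 against a small made-up HTML fragment (the real Yahoo markup has very likely changed since this was written, so treat the sample as illustrative only):

```python
import re

PATTERN = r'<h3[^>]*>[^<]*<a href="(.*?)"[^>]*>(.*?)</a></h3>'

sample = (
    '<h3 class="t"> <a href="/r?u=http://example.com/a" title="x">First\n'
    'headline</a></h3>'
    '<h3><a href="http://example.com/b">Second</a></h3>'
)

matches = re.findall(PATTERN, sample, re.DOTALL)
for url, title in matches:
    start = url.find('http:')          # drop any redirect prefix before the real URL
    if start >= 0:
        print(url[start:], '|', title.replace('\n', ' '))
```

The lazy (.*?) groups stop at the first closing quote and the first </a></h3>, which is what keeps one headline from running into the next.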

# return short url via tinyurl.com
def getTinyURL(src):
  try:
      raw = urllib2.urlopen('http://tinyurl.com/api-create.php?'+urllib.urlencode({'url':src})).read()
      return raw.strip()
  except Exception, e:
      pass
  return None
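
The only subtle part of getTinyURL is that the long URL has to be percent-encoded before it is appended as a query parameter, otherwise its own ? and & would be misread. A Python 3 sketch of just the URL construction, without any network call (tinyurl_request_url is a hypothetical helper; in Python 3 urlencode lives in urllib.parse):

```python
from urllib.parse import urlencode


def tinyurl_request_url(src):
    """Build the api-create.php URL with src safely percent-encoded."""
    return 'http://tinyurl.com/api-create.php?' + urlencode({'url': src})


print(tinyurl_request_url('http://example.com/a?b=1&c=2'))
```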

How to call it:

import Cookie
import simplejson  # assumed import; on GAE's Python 2.5 runtime "from django.utils import simplejson" is a common alternative

# main
plurkAPIKey = 'YourPlurkAPIKey'
plurkID = 'YourPlurkID'
plurkPasswd = 'YourPlurkPassword'
getNewsData = getNews()

runLog = []
if getNewsData['status'] == 'ok':
  # try login
  baseCookie = Cookie.SimpleCookie()
  loginData = {'api_key':plurkAPIKey,'username':plurkID,'password':plurkPasswd}
  checkLogin = doAct( 'http://www.plurk.com/API/Users/login', 'POST', loginData, baseCookie )
  try:
      obj = simplejson.loads(checkLogin)
      if 'error_text' in obj:
          runLog.append( 'login error: '+str(obj['error_text']) )
  except Exception, e:
      runLog.append( 'login exception: '+str(e) )
  if len(runLog) == 0:
      # try post
      for news_info in getNewsData['data']:
          formated_message = '[News] '+news_info['url']+' ('+news_info['title']+')'
          # too long for one plurk: retry with a shortened URL
          if len(formated_message) > 140:
              shortURL = getTinyURL(news_info['url'])
              if shortURL is not None:
                  formated_message = '[News] '+shortURL+' ('+news_info['title']+')'

          if len(formated_message) <= 140:
              writeData = {'api_key':plurkAPIKey,'qualifier':'shares','content':formated_message}
              checkPost = doAct( 'http://www.plurk.com/API/Timeline/plurkAdd' , 'POST' , writeData, baseCookie )

              try:
                  obj = simplejson.loads(checkPost)
                  if 'error_text' in obj and obj['error_text'] is not None:
                      runLog.append( 'post error: '+str(obj['error_text'])+', Message:'+formated_message )
              except Exception, e:
                  runLog.append( 'post exception: '+str(e)+', Message:'+formated_message )
else:
  # getNews failed: 'data' holds the error message
  runLog.append( 'getNews error:'+str(getNewsData['data']) )
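
The 140-character handling in the posting loop can be isolated into one small function: build the message with the full URL, fall back to a shortened URL if it is too long, and give up if it still does not fit. A Python 3 sketch, with a stand-in shortener instead of a real call to getTinyURL (format_message and fake_shorten are illustrative names):

```python
MAX_LEN = 140


def format_message(title, url, shorten):
    """Return a '[News] url (title)' message within MAX_LEN, or None if impossible."""
    message = '[News] %s (%s)' % (url, title)
    if len(message) > MAX_LEN:
        short = shorten(url)               # may return None on failure
        if short is not None:
            message = '[News] %s (%s)' % (short, title)
    return message if len(message) <= MAX_LEN else None


fake_shorten = lambda url: 'http://tinyurl.com/xyz'   # stand-in for getTinyURL
long_url = 'http://example.com/' + 'a' * 150
print(format_message('Title', long_url, fake_shorten))
```

Returning None for messages that cannot be made to fit lets the caller skip them, which matches how the loop silently drops over-long items.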

To test just the Plurk API, simply run these two parts in order:

# try login
baseCookie = Cookie.SimpleCookie()
loginData = {'api_key':plurkAPIKey,'username':plurkID,'password':plurkPasswd}
print doAct( 'http://www.plurk.com/API/Users/login', 'POST', loginData, baseCookie )
# try post
writeData = {'api_key':plurkAPIKey,'qualifier':'shares','content':'Hello World'}
print doAct( 'http://www.plurk.com/API/Timeline/plurkAdd' , 'POST' , writeData, baseCookie )

Turn it into CGI mode:

# ...after putting all of the code above together in order, append the following...

# CGI FORMAT for HTML
print 'Content-Type: text/html'
print ''
# report
print '<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"/></head><body>'
if len(runLog) > 0:
  print "<pre>"
  #print runLog
  print "</pre>"
  for err in runLog:
      print '<p>'+ err + '</p>'
else:
  print 'OK'
print '</body></html>'

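This way the script works both in the browser and in command mode. Running it 2~3 times in a row gives different output from the second run on, because Plurk blocks duplicate messages.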

Configure the GAE Project (engineapp):

Set the corresponding URL mapping:

engineapp\app.yaml:

application: engineapp
version: 1
runtime: python
api_version: 1

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico

- url: /PlurkPost
  script: plurk.py

- url: .*
  script: main.py

Now the script can be opened in a browser.

Set up Cron Jobs:

engineapp\cron.yaml:

cron:
- description: news job
  url: /PlurkPost
  schedule: every 20 minutes

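With this, the page is requested every 20 minutes, which runs the job each time. This part only takes effect after the app is uploaded to GAE.

Deploy notes:

The Google App Engine documentation uses Python 2.5 as its example, but deploying with it shows an "ssl module not found" message, so the upload to the server fails. The simplest fix is to install Python 2.7 and then point Google App Engine Launcher -> Edit -> Preferences at its Python Path. After that the upload goes through!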
