第二十四個夏天後: October 2010

Friday, October 22, 2010

opubWriter - An Online WYSIWYG EPUB Writer

http://www.opubwriter.com/ & http://opubwriter.wordpress.com/

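While looking for material on tinyMCE, I came across opubWriter, an online service for editing EPUB files. Once you finish editing, you can download the book.

The interface is still bare-bones and a little awkward to use. For example, to put an image into an EPUB you first upload it through [Library]->[Media]. After the upload, a path such as images/file.jpg appears under the image. Then, when editing an article in the Editor, use the insert-image function and specify that path, images/file.jpg. If you enter an image URL directly instead, readers that follow the EPUB spec will not display it.

Although opubWriter's interface is bare-bones, it has most of the features you would expect, and it really can produce an EPUB book online. Going by its To do list, the main gap is probably that EPUB validation has not been added yet. The other drawback is that the service is hosted overseas, so it is slow to use from here.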

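A brief outline of the editing workflow:

Go to Library > New to create a book. By default it contains only two chapters, Title and Copyright.

In Library > Organisation, click Content to add your own XHTML files. At the end you are asked where the file goes in the reading order: pick an existing file and choose an option such as After or Before.

In Library > Editor, pick the file you want to edit (all of them are XHTML) to enter the WYSIWYG editing environment. Remember to click Save in the upper right when you are done.

Finally, switch back to Library and choose the EPUB file to download.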

Sigil - A WYSIWYG EPUB Writer

Most of the common ways to produce an EPUB convert from format A to format B. I looked around for an open source or free EPUB writer and found one that is not too hard to use:

Sigil - A WYSIWYG ebook editor
http://code.google.com/p/sigil/

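It supports Windows, Linux, and Mac. I have only installed and tested it on Windows XP, and so far it works fine.

The green background in the screenshot above is my own easy-on-the-eyes setting; the default should be a white background.

One thing to watch out for: when editing, it is tempting to drag and drop images into the document. They do show up in the editor, but they may not be visible in the resulting EPUB. The most common case is dragging an image from a web page into the editor: the image you see is actually referenced through a hyperlink, but the EPUB spec requires images to be packaged inside the EPUB file, so readers that follow the spec will not show it. The fix is simple: download the image first, then add it through Insert > Image in the menu. The image then appears under the Images folder on the right, which means it has been packaged into the EPUB.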
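The linked-versus-packaged distinction above can be checked mechanically. Below is a minimal Python 3 sketch, not part of Sigil or opubWriter, that scans one XHTML document for img elements whose src is an http/https URL. The names RemoteImageFinder and find_remote_images are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class RemoteImageFinder(HTMLParser):
    """Collect <img> sources that point outside the EPUB package."""

    def __init__(self):
        super().__init__()
        self.remote_sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # An http/https scheme means the image is linked, not packaged.
        if urlparse(src).scheme in ("http", "https"):
            self.remote_sources.append(src)


def find_remote_images(xhtml_text):
    """Return the list of remote image URLs found in one XHTML document."""
    finder = RemoteImageFinder()
    finder.feed(xhtml_text)
    return finder.remote_sources


sample = (
    '<p><img src="images/file.jpg" alt="packaged"/>'
    '<img src="http://example.com/pic.png" alt="linked"/></p>'
)
print(find_remote_images(sample))
```

Any URL this reports is a candidate for downloading and re-adding through Insert > Image.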

dotEPUB - Converting Web Pages to EPUB

http://dotepub.com/

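I thought about building something like this around the middle of this year, and dotEPUB is one prototype of the idea! It works as a Firefox bookmarklet: when you are on a web page, clicking the bookmarklet sends the page's link to dotEPUB, which returns a packaged EPUB file.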

Thursday, October 21, 2010

Javascript K-means Clustering

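I tried writing the K-means algorithm in Javascript. Along the way it reminded me of writing Betweenness Centrality in C two years ago! It is a nice brain workout once in a while.

While doing this I noticed something. K-means sets out to split a data set into K clusters, so is it possible for it to converge with fewer than K clusters? I did see this happen with my own test data, but I am not sure whether I got the algorithm wrong somewhere or whether the problem is inherent.

Black marks the five data points; red marks the initially assumed cluster centers.

For example, suppose there are 5 data points on a 2D plane, all huddled in one corner, and with k=2 the initial centers happen to be placed at two opposite corners. When deciding which cluster each of the 5 points belongs to, one cluster ends up empty and the other gets all 5 points. From there it converges quickly, but the data ends up in a single cluster. The fix is to place the initial cluster centers directly on points that actually exist, which guarantees that each cluster starts with at least one point.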
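The empty-cluster case described above is easy to reproduce. The following is a minimal Python 3 sketch (it needs Python 3.8+ for math.dist), separate from the Javascript version; the function names, the sample points, and the choice to keep the old center when a cluster is empty are all assumptions made for illustration.

```python
import math


def assign(points, centers):
    """Return, for each center, the list of points nearest to it."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        groups[nearest].append(p)
    return groups


def kmeans(points, centers, max_runs=30):
    """Plain k-means; stops when the centers no longer move or after max_runs."""
    for _ in range(max_runs):
        groups = assign(points, centers)
        new_centers = []
        for old, members in zip(centers, groups):
            if members:
                xs, ys = zip(*members)
                new_centers.append((sum(xs) / len(members), sum(ys) / len(members)))
            else:
                # Empty cluster: keep the old center instead of dividing by zero.
                new_centers.append(old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


points = [(0, 0), (1, 0), (0, 1), (1, 1), (3, 3)]

# One center sits on the data, the other in the far opposite corner:
# the far center never captures a point.
_, groups = kmeans(points, [(0, 0), (100, 100)])
print([len(g) for g in groups])  # [5, 0]

# Seeding both centers from actual data points.
_, groups = kmeans(points, [points[0], points[-1]])
print([len(g) for g in groups])  # [4, 1]
```

Seeding from real points only guarantees non-empty clusters on the first pass, so the empty-cluster guard is still worth keeping.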

Code:

try{
        console.log( 'begin' );
}catch(err){
        console = {}
        console.log = function(){}
}

//
// raw_pointer (input, must be defined beforehand): [ { 'x':x , 'y':y } , ... ]
// pointer_list = []
// pointer_list[0] = { 'x':x , 'y':y , 'group_index':-1, 'distance':-1 }
//
var pointer_list = new Array();
for( var i=0,cnt=raw_pointer.length ;i<cnt ; i++ )
        pointer_list.push( { 'x':raw_pointer[i]['x'] , 'y':raw_pointer[i]['y'] , 'group_index': -1 , 'distance':-1 } );

var group_cnt = 100;

//
// center = []
// center[0] = { 'group_list':[] , 'x':x , 'y':y }
//
var center = new Array();
for( var i=0 ; i<group_cnt ; ++i )
        center.push( { 'group_list':[] } );

for( var i=0, cnt=pointer_list.length ; i<cnt ; ++i )
{
        var c_index = parseInt( i / ( cnt / group_cnt ) );
        if( center[ c_index ]['x'] == undefined )
        {
                center[ c_index ]['x'] = pointer_list[i]['x'];
                center[ c_index ]['y'] = pointer_list[i]['y'];
                //center[ c_index ]['group_list'].push( i );
                pointer_list[i]['group_index'] = c_index;
                pointer_list[i]['distance'] = 0;
        }
}
console.log( 'group:'+group_cnt+',pointer:'+pointer_list.length );
var wanna_finish = 0;
var run_cnt = 0;
var max_run_cnt = 30;
while( !wanna_finish && run_cnt < max_run_cnt )
{
        wanna_finish = 1;
        for( var i=0, cnt=pointer_list.length ; i<cnt ; ++i )
        {
                for( var j=0; j<group_cnt ; ++j )
                {
                        var diff_x = Math.abs( center[j]['x'] - pointer_list[i]['x'] );
                        var diff_y = Math.abs( center[j]['y'] - pointer_list[i]['y'] );
                        var diff = Math.sqrt( diff_x*diff_x + diff_y*diff_y );
                        if( pointer_list[i]['group_index'] < 0 || pointer_list[i]['distance'] > diff )
                        {
                                pointer_list[i]['group_index'] = j;
                                pointer_list[i]['distance'] = diff;
                        }
                }
                center[ pointer_list[i]['group_index'] ]['group_list'].push( i );
        }
        for( var i=0 ;i<group_cnt ; ++i )
        {

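                var cnt = center[i]['group_list'].length;
                var x = 0;
                var y = 0;
                for( var k=0 ; k<cnt ; k++ )
                {
                        x += pointer_list[ center[i]['group_list'][k] ]['x'];
                        y += pointer_list[ center[i]['group_list'][k] ]['y'];
                }
                // An empty group would give 0/0 = NaN, and since NaN != NaN the
                // convergence check below would never pass, so keep the old center.
                if( cnt > 0 )
                {
                        center[i]['x'] = x/cnt;
                        center[i]['y'] = y/cnt;
                }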

                if(
                        center[i]['old_x'] == undefined || center[i]['old_y'] == undefined
                        || center[i]['old_x'] != center[i]['x'] || center[i]['old_y'] != center[i]['y']
                 )
                {
                        wanna_finish = 0;
                        center[i]['old_x'] = center[i]['x'];
                        center[i]['old_y'] = center[i]['y'];
                }
                console.log( run_cnt+' @ Group:'+cnt+',center['+i+']:('+center[i]['x']+','+center[i]['y']+')');
        }
        if( !wanna_finish )
        {
                if( run_cnt < max_run_cnt )
                {
                        for( var i=0 ; i<group_cnt ; ++i )
                                center[ i ]['group_list'] = [];
                }
                for( var i=0, cnt=pointer_list.length ; i<cnt ; ++i )
                {
                        pointer_list[i]['group_index'] = -1;
                }
        }
        run_cnt ++;
        console.log( 'WannaFinish:' + wanna_finish + ' @ Run:' + run_cnt );
}

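Other notes:

group_cnt: the final k, 100 in this example

max_run_cnt: an extra termination condition that keeps k-means from running too long, 30 iterations in this example

center: an array holding the resulting cluster information: the x and y coordinates, plus group_list, an array of the indexes of the cluster's points in pointer_list

pointer_list: an array holding each point's x and y coordinates, the group it belongs to (stored as an index), and its distance to that group's center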

Cross-domain Ajax Query

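Out of boredom I played with an Ajax query: from http://localhost/ I called http://www.example.com/test.php to fetch data, watching the network and Javascript in Firebug. Strangely, the response was 200 OK yet it was marked with an X. I could not figure out why, and it took me thirty minutes to remember: this is the cross-domain request problem! That is, Javascript on site A querying a different domain, site B.

Honestly I had never solved this before. I had only heard that a callback function can get around it, but had no idea how. Trying it last night, I found that the callback function is used together with the JSON format. The term for this is JSONP, short for JSON with padding; details are on Wikipedia - JSON.

An example

Suppose a page on domain A makes an Ajax query for data on domain B, e.g. http://b.domain/test.php, and that http://b.domain/test.php supports both JSON and JSONP.

Suppose that requesting http://b.domain/test.php?callback=my_func switches on JSONP, while omitting the callback parameter returns plain JSON.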

Without the callback parameter, http://b.domain/test.php, the returned data might be:

{"x":1,"y":2}

With callback=my_func, http://b.domain/test.php?callback=my_func, it returns:

my_func({"x":1,"y":2})

To use the callback approach:

Define a function named my_func

Add a script element dynamically

Example:

<html>
   <head>
       <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
       <title>Use JSONP</title>
       <script type="text/javascript">
           function initialize()
           {
               var action_url = 'http://b.domain/test.php?callback=my_func';
               var script = document.createElement("script");
               script.setAttribute("src",action_url);
               script.setAttribute("type","text/javascript");
               document.body.appendChild(script);
           }
           function my_func(data)
           {
               console.log( data );
               //alert(data);
           }
       </script>
   </head>
   <body onload="initialize()">
   </body>
</html>

This way, my_func receives the object from the response, and the values can be read with data['x'] and data['y']!
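The server side of this exchange, played by test.php above, can be sketched in Python 3 as well. This is only an illustration of the JSON versus JSONP switch, not the actual test.php; render_response and the callback-name check are assumptions added here.

```python
import json
import re
from urllib.parse import parse_qs

# Only plain identifiers are accepted as callback names (an added safety check).
CALLBACK_RE = re.compile(r"^[A-Za-z_$][A-Za-z0-9_$]*$")


def render_response(query_string, data):
    """Return (content_type, body): JSON normally, JSONP when callback is given."""
    payload = json.dumps(data)
    callback = parse_qs(query_string).get("callback", [None])[0]
    if callback is None:
        return "application/json", payload
    if not CALLBACK_RE.match(callback):
        raise ValueError("invalid callback name")
    # JSONP: wrap the JSON payload in a call to the requested function.
    return "application/javascript", "%s(%s)" % (callback, payload)


print(render_response("", {"x": 1, "y": 2}))
print(render_response("callback=my_func", {"x": 1, "y": 2}))
```

Restricting the callback to an identifier keeps a request from injecting arbitrary script into the response.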

Tuesday, October 19, 2010

Writing CGI in Python

References:

cgi — Common Gateway Interface support

Penzilla.net's Python Tutorial - CGI Scripting Basics

Embedding Python In Apache2 With mod_python (Debian Etch)

Appending to Your Python Path

I previously wrote a tool in Python and wanted to pass it parameters over the web, so I decided to implement a CGI in Python! I will not go into web server configuration here; assume it is enough to put the CGI file in ~/public_html/cgi-bin or ~/public_html/.

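Notes:

The file needs execute permission, e.g. chmod 755

Calling the tool from CGI and reading its output through stdout is likely to run into encoding problems
- from os import popen
  f = popen( 'tools arg' , 'rb' )
  d = f.read()
  f.close()
  
  print d
- The fix is to import the tool instead and use the value its function returns. My guess is that popen runs with the environment variables of the user executing the CGI, which differ from those in your own shell.

If the installed Python version does not meet your needs, build your own!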

Example (my.cgi):

#!/home/user/tarball/bin/python
# -*- coding: utf-8 -*-
import cgitb
cgitb.enable()

import cgi

print "Content-Type: text/html"     # HTML is following
print                               # blank line, end of headers

form = cgi.FieldStorage()

if 'pattern' in form:
        pattern = form['pattern'].value
        page = form['page'].value if 'page' in form and int( form['page'].value ) > 0 else 1

        import sys
        sys.path.append( "/home/user/mypylib" )
        from mylibs import *
        x = online_query( query=pattern , page=int(page) )
        print x.encode( 'utf-8' )

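The script starts with /home/user/tarball/bin/python, a Python I built myself, purely because I needed PycURL; see the post on installing cURL, Python, and PycURL on an Ubuntu server.

Next comes the standard CGI routine: the output must begin with "Content-Type: text/html\n\n". In C that would be: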

#include <stdio.h>
#include <stdlib.h>

int main()
{
        printf( "Content-Type: text/html\n\n" );
        return 0;
}

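Then cgi.FieldStorage() gives access to the POST/GET data, here the two parameters pattern and page.

Finally comes my own Python code, which all lives in /home/user/mypylib/mylibs.py; sys.path.append is needed to add that directory to the module search path. The text returned by online_query has to be encoded as UTF-8 before it is printed.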
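The parameter handling is the part most likely to trip on bad input: in my.cgi a non-numeric page makes int() raise. Below is a minimal Python 3 sketch of the same pattern/page logic, assuming GET parameters only and using urllib.parse instead of the cgi module (which was removed from the standard library in Python 3.13). The name read_params is hypothetical.

```python
from urllib.parse import parse_qs


def read_params(query_string):
    """Extract 'pattern' and a positive integer 'page' (default 1)."""
    form = parse_qs(query_string)
    pattern = form.get("pattern", [None])[0]
    raw_page = form.get("page", ["1"])[0]
    try:
        page = int(raw_page)
    except ValueError:
        # Non-numeric input falls back to the first page.
        page = 1
    if page < 1:
        page = 1
    return pattern, page


print(read_params("pattern=epub&page=3"))
print(read_params("pattern=epub&page=abc"))
```

In a real CGI script the query string would come from the QUERY_STRING environment variable.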

That is about it for this simple note.

Monday, October 18, 2010

OPDS Catalog, OpenSearch, and Stanza

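I want to generate an OPDS Catalog and offer OpenSearch (see the upper right of the figure) in a form the Stanza reader understands. However, Stanza does not fully support the OPDS Catalog 1.0 spec yet (today is 2010/10/18 and Stanza is at version 3.0.3). For instance, filling the rel attribute of some links with the values the OPDS Catalog 1.0 spec calls for makes Stanza fail to read them; in my earlier tests, leaving rel empty seemed to work.

What I want to try now is OpenSearch, which is defined in section 7.5 of OPDS Catalog 1.0: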

7.5. Search

An OPDS Catalog MAY provide a search facility through an [OpenSearch] description document. Links to [OpenSearch] description documents MUST use the “search” relation value and the “application/opensearchdescription+xml” media type as defined in the “Autodiscovery” section of the [OpenSearch] specification.

<link rel="search"
      href="search.xml"
      type="application/opensearchdescription+xml"/>

In an [OpenSearch] description document, the search interface SHOULD use the media type associated to OPDS Catalogs:

<Url type="application/atom+xml;profile=opds-catalog"
     template="http://example.com/search?q={searchTerms}" />

OPDS Catalog Feed Documents MAY include elements from the [OpenSearch] namespace such as “opensearch:totalResults” or “opensearch:itemsPerPage” in [OpenSearch] responses.

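In my tests, though, that type was not recognized, whereas plain type="application/atom+xml" worked, as in the Feedbooks Catalog:

So the lazy option is simply to list both:

As for how to write the opensearch.xml description file, you can refer directly to Feedbooks OpenSearch. I am even thinking that sending everything to Feedbooks would not be bad either.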
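One reading of "list both" is a description document with two Url elements, one per media type. The Python 3 sketch below generates such a document with xml.etree.ElementTree. It is an assumption about the intended layout, not a copy of the Feedbooks file, and build_description with its arguments is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Both media types: the one from the OPDS spec and the plain Atom one.
MEDIA_TYPES = (
    "application/atom+xml;profile=opds-catalog",
    "application/atom+xml",
)


def build_description(short_name, description, template):
    """Return an OpenSearch description document as a string."""
    ET.register_namespace("", OPENSEARCH_NS)
    root = ET.Element("{%s}OpenSearchDescription" % OPENSEARCH_NS)
    ET.SubElement(root, "{%s}ShortName" % OPENSEARCH_NS).text = short_name
    ET.SubElement(root, "{%s}Description" % OPENSEARCH_NS).text = description
    for media_type in MEDIA_TYPES:
        # One Url element per media type, sharing the same search template.
        ET.SubElement(
            root,
            "{%s}Url" % OPENSEARCH_NS,
            {"type": media_type, "template": template},
        )
    return ET.tostring(root, encoding="unicode")


print(build_description(
    "My Catalog",
    "Search my catalog",
    "http://example.com/search?q={searchTerms}",
))
```

Whether a given Stanza version picks the second Url element would still need to be checked on a device.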

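Installing cURL, Python, and PycURL @ Ubuntu Server

My work desktop sits behind a firewall, so I had to find a server with outside access. Then I found that the Python it provides does not include PycURL, so I installed everything starting from cURL and am writing it down along the way.

I actually have root access, but the machine is not mine, so the less I touch the better XD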

Install Python
- $ wget http://www.python.org/ftp/python/2.6.6/Python-2.6.6.tar.bz2
- $ tar -xvf Python-2.6.6.tar.bz2
- $ cd Python-2.6.6
- $ ./configure --prefix=/home/user/tarball && make && make install

Install cURL
- $ wget http://curl.haxx.se/download/curl-7.21.2.tar.bz2
- $ tar -xvf curl-7.21.2.tar.bz2
- $ cd curl-7.21.2
- $ ./configure --prefix=/home/user/tarball --without-ssl && make && make install

Install PycURL
- $ wget http://pycurl.sourceforge.net/download/pycurl-7.19.0.tar.gz
- $ tar -xvf pycurl-7.19.0.tar.gz
- $ cd pycurl-7.19.0
- $ /home/user/tarball/bin/python setup.py install

Just when I thought everything was fine, running my program produced this message:

Fatal Python error: pycurl: libcurl link-time version is older than compile-time version

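On closer inspection, the curl already on the system is version 7.18.2 (check with curl -V), older than the one I installed, hence the failure. I first thought of changing the library location as others do, but messing around like that would amount to meddling with the system as root, so in the end I simply built curl 7.18.2 and pycurl-7.18.2.

Before that I also tried curl 7.19.0 with pycurl-7.19.0 and got the same message. The conclusion? If the system already has curl (use which curl and curl -V to check and get the version), take the lazy route and build the same version as the system's.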
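The error boils down to a version comparison: the libcurl found at run time (the system one) is older than the one PycURL was compiled against. Here is a minimal Python 3 sketch of that check. The sample curl -V line is illustrative rather than captured output, and both function names are hypothetical.

```python
import re


def parse_version(text):
    """Pull the first x.y.z version number out of a string as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def link_time_is_older(system_curl_output, built_against):
    """True when the system libcurl is older than the version compiled against."""
    return parse_version(system_curl_output) < parse_version(built_against)


# Illustrative first line of `curl -V` on a system with curl 7.18.2.
system_line = "curl 7.18.2 (i486-pc-linux-gnu) libcurl/7.18.2"

print(link_time_is_older(system_line, "7.21.2"))  # True: triggers the error
print(link_time_is_older(system_line, "7.19.0"))  # True: same error
print(link_time_is_older(system_line, "7.18.2"))  # False: versions match
```

Comparing tuples of integers avoids the trap of comparing version strings, where "7.9.0" would sort after "7.18.2".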

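Friday, October 15, 2010

Stanza - USER AGENT & OPDS Catalog

Stanza is a powerful free e-book reader for iPod Touch / iPhone / iPad; install it if you are interested.

Some e-book sites provide an OPDS Catalog but serve it only to Stanza, which is inconvenient. So once you find out Stanza's User Agent, you can imitate it and see those OPDS Catalogs!

To find out Stanza's User Agent, first write a CGI, for example in PHP: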

@index.php
<?php
file_put_contents( '/tmp/opds' , print_r( $_SERVER , true ) );
?>

Then in Stanza go to Get Books > Shared > Edit and add the URL of the CGI above. Browse to it once, and the User Agent that Stanza uses is captured!

Stanza/3.0.3 iPhone OS/4.1/iPhone catalog/3.0.3

Finally, with a browser plugin such as User Agent Switcher for Firefox, add a new user agent, and browsing certain pages will then reveal the mysterious OPDS Catalog!
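The same trick works from a script. Below is a minimal Python 3 sketch that builds a request carrying the Stanza User Agent captured above. The catalog URL is a placeholder and build_catalog_request is a hypothetical helper; nothing is actually fetched here.

```python
import urllib.request

# The User Agent string captured from Stanza 3.0.3 above.
STANZA_UA = "Stanza/3.0.3 iPhone OS/4.1/iPhone catalog/3.0.3"


def build_catalog_request(url):
    """Return a Request that presents itself as Stanza."""
    return urllib.request.Request(url, headers={"User-Agent": STANZA_UA})


req = build_catalog_request("http://example.com/opds/catalog.xml")
# urllib stores header names capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))

# To actually download the catalog, pass the request to urlopen:
#     with urllib.request.urlopen(req) as resp:
#         catalog_xml = resp.read()
```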

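Wednesday, October 13, 2010

opac.ndl.go.jp Japanese Book Data

For research, when you need book data to test with, not every data owner is that generous, especially toward people they have no connection with. Fortunately, the 国立国会図書館 (National Diet Library) offers such a service!

On http://opac.ndl.go.jp/ you can search for books by keyword. After a search, there is a small "Download" button beside the results that downloads the metadata of that query. The format is tab-delimited (tsv), and at most 200 records can be downloaded.

Processing the downloaded tsv file in Python: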

import codecs
import re

# target_file: path to the downloaded tsv file
fd = codecs.open( target_file , 'rb' , 'Shift-JIS' )
rawitems = []
while True:
   raw = fd.readline()
   if raw is None or raw == '' :
       break
   else:
       rawitems.append( raw.split( '\t' ) )
fd.close()

RE_ISBN = re.compile( '[0-9xX\-]{10,}' )

for rawdata in rawitems:

   ISBN = None
   TITLE = None
   PUBLISHER = None
   NOTE = None

   # ISBN
   if len(rawdata) < 17:
       continue
   ISBN = rawdata[16]
   ISBN = re.findall( RE_ISBN , ISBN )
   if len(ISBN) < 1 :
       continue
   ISBN = ISBN[0].replace('-', '')

   # TITLE
   if len(rawdata) < 2:
       continue
   TITLE = rawdata[1]

   # PUBLISHER
   if len(rawdata) < 9:
       continue
   PUBLISHER = rawdata[8]

   if len(rawdata) >= 16 :
       NOTE = rawdata[15]

   #
   # do something ...
   # ...


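[Python] Using cURL (PycURL)

I used to mess around with PHP cURL a lot. Lately I have been writing mostly Python and finally needed cURL again, so here are notes from my own poking around.

For PycURL usage, the documentation to cross-reference: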

PycURL Documentation

PycURL: Curl Objects

libcurl - curl_easy_setopt()

After some trial and error, my takeaways:

Every option that starts with "CURLOPT_" becomes "pycurl." in Python, e.g. CURLOPT_URL becomes pycurl.URL

In PHP, curl_setopt( $ch , CURLOPT_RETURNTRANSFER , true ); lets $result = curl_exec( $ch ); store the page content in a variable. Here you instead use b = StringIO.StringIO(), c.setopt(pycurl.WRITEFUNCTION, b.write), and b.getvalue() to get it

A simple example:

        import re
        import urllib
        import pycurl
        import StringIO

        url = 'target_url'
        c = pycurl.Curl()
        c.setopt( pycurl.URL , url )
        c.setopt( pycurl.FOLLOWLOCATION , True )

        c.setopt( pycurl.COOKIEFILE , '/tmp/pycurl' )
        c.setopt( pycurl.COOKIEJAR , '/tmp/pycurl' )

        b = StringIO.StringIO()
        c.setopt(pycurl.WRITEFUNCTION, b.write)

        c.perform()

        #print b.getvalue()
        r = b.getvalue()
        b.close()
        b = StringIO.StringIO()
        c.setopt(pycurl.WRITEFUNCTION, b.write)

        check = re.findall( re.compile( '(<form(.*?)</form>)' , flags=(re.IGNORECASE|re.DOTALL) ) , r );
        if len(check) < 1 :
                print "No FORM DATA"
                return

        r = check[0][1]
        out = {}
        for sub_info in re.findall( re.compile( '<input(.*?)(name=[\'"]{0,1}(.*?)[\'"]{0,1}[\s]+value=[\'"]{0,1}(.*?)[\'"]{0,1}[\s>]|value=[\'"]{0,1}(.*?)[\'"]{0,1}[\s]+name=[\'"]{0,1}(.*?)[\'"]{0,1}[\s>])' , flags=re.IGNORECASE ) , r ):
                if len(sub_info) != 6 :
                        continue
                if sub_info[2] != '':
                        out[ sub_info[2] ] = sub_info[3]
                elif sub_info[5] != '':
                        out[ sub_info[5] ] = sub_info[4]

        for key in out :
                print "\t",key,":\t",out[key]

        url = 'target_form_action_url'
        c.setopt( pycurl.URL , url )
        c.setopt( pycurl.FOLLOWLOCATION , True )
        c.setopt( pycurl.POST , True )
        c.setopt( pycurl.POSTFIELDS , urllib.urlencode(out) )

        c.perform()

        #print b.getvalue()
        r = b.getvalue()
        b.close()

        f = open( target_file , 'wb' )
        f.write( r )
        f.close()

This example visits a web page, collects the input fields of its form while using cookies, then issues a POST and saves the result to a file.

Monday, October 11, 2010

OPDS Catalog Validator

OPDS Catalog is an extension of Atom. Related validation tools:

The W3C Markup Validation Service

W3C Feed Validation Service, for Atom and RSS

But these are not enough to validate against the OPDS Catalog 1.0 spec. Eventually I found a way to do it:

Validator.nu

It needs a little configuration:

Address: your OPDS URL

Schemas
- http://openpub.googlecode.com/svn/trunk/schemas/atom.rnc
- http://openpub.googlecode.com/svn/trunk/schemas/opds_catalog.rnc

Preset: None

Parser: XML; load external entities

Check 'Be lax about HTTP Content-Type'

Friday, October 8, 2010

[Javascript] How Page Turning Works in EPUB Readers

A few days ago I was asked how the common Javascript EPUB readers implement their page-turning effect.

Have a look at these demos:

ePub js / ePub Zen Garden
- http://threepress.org/static/epubjs/
- http://epubzengarden.com/

rePublish
- http://romeda.org/rePublish/

Monocle
- http://monocle.inventivelabs.com.au/

Booktorious
- http://ditrw.com/booktorious/
- No pagination; you also have to upload the file yourself

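A Javascript EPUB reader works by taking each XHTML or HTML file inside the EPUB as a unit, turning the whole XHTML into an object, and placing it inside a designated DIV to display it, just like viewing the XHTML file in a browser. In other words, Javascript merely extracts the content of the XHTML and presents it.

What about page turning? Suppose the display area is 640x480. Judging from the ePub js source, it first creates a div-tmp and dynamically adds the content of one XHTML file to it, one <p> at a time, checking the height of div-tmp as it goes. Once the height exceeds 480, it takes the <p> that was just added, splits its text on whitespace, and removes words one by one from the end until the height of div-tmp is back within 480. The removed words go into a new <p> placed right after it, for use on the next page. rePublish uses a similar idea; the details are in rePublish - paginator.js.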
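The fill-then-trim procedure just described can be simulated without a browser. In the Python 3 sketch below, the measured div height is replaced by a count of wrapped text lines, which is an assumption made so the logic can run standalone. It is not the ePub js or rePublish code, and height and paginate are hypothetical names.

```python
import textwrap


def height(paragraphs, width):
    """Stand-in for the measured div height: total number of wrapped lines."""
    return sum(len(textwrap.wrap(p, width)) or 1 for p in paragraphs)


def paginate(paragraphs, width, max_lines):
    """Split paragraphs into pages of at most max_lines wrapped lines."""
    pages, page = [], []
    queue = list(paragraphs)
    while queue:
        page.append(queue.pop(0))
        if height(page, width) <= max_lines:
            continue
        # Overflow: move words from the end of the last paragraph to a new one.
        words = page.pop().split(" ")
        spill = []
        while words and height(page + [" ".join(words)], width) > max_lines:
            spill.insert(0, words.pop())
        if words:
            page.append(" ".join(words))
        if not page:
            # Nothing fits at all: emit it anyway so the loop always advances.
            page.append(" ".join(spill))
            spill = []
        pages.append(page)
        page = []
        if spill:
            # The removed words start the next page as a new paragraph.
            queue.insert(0, " ".join(spill))
    if page:
        pages.append(page)
    return pages


text = ["one two three four five six", "seven eight nine ten"]
for number, content in enumerate(paginate(text, width=10, max_lines=2), 1):
    print(number, content)
```

Because the split is on spaces, text without spaces, such as Chinese, would be moved whole or not at all.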

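Splitting on whitespace is risky, however, because it only suits languages like English that are easy to split. So I tried Monocle and found no badly split text. Looking closer, its implementation does not split and refill text at all; it uses two divs instead. I later found that it relies on the CSS Multi-column Layout Module (W3C Candidate Recommendation 17 December 2009).

The idea is to create two divs, one for the current page and one for the next. Suppose both hold the same XHTML data, and that this XHTML file takes up 640x800 when shown unpaginated. Then the first page shows roughly the first 640x480, and the second shows the remaining 640x(800-480). This could still have a boundary problem: the last row of the first page might have its characters cut in half because the window is not tall enough. In practice that does not happen. I inspected the relevant div styles with Firebug and put together a small example: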

<html>
        <head>
                <title>Monocle Test</title>
                <style type="text/css">
                        #container
                        {
                                width: 100%;
                                height: 100%;
                                position: relative;
                        }
                        #show {
                                position: absolute;
                                top: 1em; bottom: 1em; left: 1em; right: 1em;
                                overflow: hidden;
                        }
                        #content
                        {
                                position: absolute;
                                top: 0pt; bottom: 0pt; min-width: 200%;
                                -moz-column-gap: 0pt; -moz-column-width: 1219px;
                        }
                </style>
        </head>
        <body>
                <div id="container">
                        <div id="show">
                                <div id="content">
                                        <p>　　詞曰：滾滾長江東逝水，浪花淘盡英雄。是非成敗轉頭空：青山仿舊在，幾度夕陽紅。白髮漁樵江渚上，慣看秋月春風。一壺
濁酒喜相逢：古今多少事，都付笑談中。</p>
                                        <p/>
                                        <p>　　話說天下大勢，分久必合，合久必分：周末七國分爭，并入於秦。及秦滅之後，楚、漢分爭，又并入於漢。漢朝自高祖斬白蛇
而起義，一統天下。後來光武中興，傳至獻帝，遂分為三國。推其致亂之由，殆始於桓、靈二帝。桓帝禁錮善類，崇信宦官。及桓帝崩，靈帝即位，大將軍竇武、太傅陳蕃，共相輔佐。時有宦官曹節等弄權，竇武、陳蕃謀誅之，作事不密，反為所害。中涓自此愈橫。</p>
                                        <p/>
                                        <p>　　建寧二年四月望日，帝御溫德殿。方陞座，殿角狂風驟起，只見一條大青蛇，從梁上飛將下來，蟠於椅上。帝驚倒，左右急救
入宮，百官俱奔避。須臾，蛇不見了。忽然大雷大雨，加以冰雹，落到半夜方止，壞卻房屋無數。建寧四年二月，洛陽地震；又海水泛溢，沿海居民，盡被大浪捲入海中。光和元年，雌雞化雄。六月朔，黑氣十餘丈，飛入溫德殿中。秋七月，有虹見於玉堂；五原山岸，盡皆崩裂。種種不祥，非止一端。</p>

                                </div>
                        </div>
                </div>
        </body>
</html>

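Open the above in Firefox and try shrinking the window height: the last row of text is never cut in half. It is all handled by CSS. Note that the -moz-column-width in #content depends on my window size, so change it when you test.

Monocle uses exactly this technique, plus shifting a display screen. Updating the CSS above:

How it looks:

So switching pages is just a matter of moving the frame that is displayed; in the example above the left side is the first page and the right side is the second.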
