After the Twenty-Fourth Summer (第二十四個夏天後): 2024

Monday, October 14, 2024

Linux Dev Notes - Docker Ubuntu 24.04 Container Cannot Update Packages (apt update) on Ubuntu 16.04 @ Ubuntu 16.04

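After tracing it all the way down, the proper fix is to update Docker on the host. At first I was lazy, though, so I just looked the other way and took a shortcut for a while XD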

Symptoms:

```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04 LTS
Release: 16.04
Codename: xenial

$ docker run -it ubuntu:24.04 /bin/bash
root@a0f5c64ed667:/# apt update
Get:1 http://security.ubuntu.com/ubuntu noble-security InRelease [126 kB]
Get:2 http://archive.ubuntu.com/ubuntu noble InRelease [256 kB]
Err:1 http://security.ubuntu.com/ubuntu noble-security InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
Err:2 http://archive.ubuntu.com/ubuntu noble InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
Get:3 http://archive.ubuntu.com/ubuntu noble-updates InRelease [126 kB]
Err:3 http://archive.ubuntu.com/ubuntu noble-updates InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
Get:4 http://archive.ubuntu.com/ubuntu noble-backports InRelease [126 kB]
Err:4 http://archive.ubuntu.com/ubuntu noble-backports InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
Reading package lists... Done
W: GPG error: http://security.ubuntu.com/ubuntu noble-security InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
E: The repository 'http://security.ubuntu.com/ubuntu noble-security InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: GPG error: http://archive.ubuntu.com/ubuntu noble InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
E: The repository 'http://archive.ubuntu.com/ubuntu noble InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: GPG error: http://archive.ubuntu.com/ubuntu noble-updates InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
E: The repository 'http://archive.ubuntu.com/ubuntu noble-updates InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: GPG error: http://archive.ubuntu.com/ubuntu noble-backports InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 871920D1991BC93C
E: The repository 'http://archive.ubuntu.com/ubuntu noble-backports InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
E: Problem executing scripts APT::Update::Post-Invoke 'rm -f /var/cache/apt/archives/*.deb /var/cache/apt/archives/partial/*.deb /var/cache/apt/*.bin || true'
E: Sub-process returned an error code
```

Brute-force workaround:

```
root@a0f5c64ed667:/# apt update --allow-insecure-repositories
...
```

Installing software:

```
root@a0f5c64ed667:/# apt install -y --allow-unauthenticated vim wget curl git tree python3-virtualenv
...
```

The more I installed, the more painful it got XD So I went back to Docker's official installation guide: docs.docker.com/engine/install/ubuntu/#install-using-the-repository

After that, the Ubuntu 16.04 host runs the Ubuntu 24.04 Docker container correctly. Also note that Docker's official documentation now expects an Ubuntu host of 20.04 or later.

Wednesday, October 9, 2024

Finished Google Cloud AI Study Jam 2024 - Generative AI Training Program

Image: www.credly.com/users/changyy/

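And just like that, I powered through all of Google Cloud AI Study Jam 2024 in one go. From the start of the month until the evening of 10/08, I cleared at least a few items every day: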

Path 1: Beginner: Introduction to Generative AI Learning Path (5 courses)
Path 2: Intermediate: Gemini for Google Cloud Learning Path (8 courses)
Path 3: Advanced: Generative AI for Developers Learning Path (12 courses)

I originally tried Path 1 on a typhoon day, then kept going nonstop until Path 2 and Path 3 were knocked out too. I really learned a lot.

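Over the past year of AI application development and self-study, I had watched NTU Professor Hung-yi Lee's YouTube series 【生成式AI導論 2024】 (Introduction to Generative AI 2024), helped my company build several prototypes and one somewhat more formal internal service on the OpenAI API, tried training my own model for a customized application, explored RAG applications, and helped the company adopt AI-assisted tools. Even with that background, Paths 1, 2 and 3 were very rewarding and filled in quite a few gaps in my thinking: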

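My take on Google's AI progress is updated to: not falling behind! Just a high barrier to entry XD
When building AI services, think about how to develop a responsible AI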

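While working through Path 1: Beginner: Introduction to Generative AI Learning Path, I started getting familiar with the Google Cloud console. Although I have used GCP since around 2018, my workflow mostly relied on ansible + jenkins, so I had ignored how Google's offerings were evolving. This time I was pleasantly surprised, especially by the Gemini assistant that can help almost everywhere: if you're not familiar with something, just ask!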

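Path 1 impressed me with the Google Cloud web console and then filled in a lot of thinking about responsible AI. It reminded me of the recent OpenAI Superalignment episode: I had some intuition about it but didn't feel it strongly, and only after the courses could I appreciate it more fully. Fortunately, OpenAI formed a new safety committee in 2024.05.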

The Path 2: Intermediate: Gemini for Google Cloud Learning Path courses made me feel the sheer pressure AI puts on someone working as RD / DevOps / SRE. A few takeaways:

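From the employer's point of view: there's no need to hire deeply experienced people; a smart junior (a fast learner) who asks the right questions can rely on Gemini to complete the work and even build up experience quickly
From the point of view of a developer doing a side project: enjoy a one-person workflow with Gemini; you can specialize in (or only care about) one area and let Gemini simulate a 10-30 person startup for everything else, helping you turn an idea into a finished product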

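Also, Path 2's breakdown of how each role works with Gemini made me admire the organizational structure of large companies; at a small company, you do all of these roles yourself Orz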

Gemini for Application Developers
Gemini for Cloud Architects
Gemini for Data Scientists and Analysts
Gemini for Network Engineers
Gemini for Security Engineers
Gemini for DevOps Engineers
Gemini for end-to-end SDLC
Develop GenAI Apps with Gemini and Streamlit

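Path 3: Advanced: Generative AI for Developers Learning Path moves into more formal courses on the underlying principles. Although I had already watched the NTU professor's introduction to generative AI, I still picked up many concepts about how things work under the hood.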

Image: www.cloudskillsboost.google/profile/leaderboard

Image: www.cloudskillsboost.google/paths

In the end I accumulated about 20,000 points. I got a few quizzes wrong, but passed every lab on the first try.

Thursday, October 3, 2024

Trying Google Cloud AI Study Jam 2024: goo.gle/csj-tw-2024

Image: www.facebook.com/share/p/ihEZoPDaanCRqWgS/

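I saw a Google developer advocate I know sharing this event, so I used the typhoon day to try it out XD I finally finished Path 1. A bit tiring, but at least I should get a sticker as a gift? XD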

Path 1: Beginner: Introduction to Generative AI Learning Path (5 courses)
Path 2: Intermediate: Gemini for Google Cloud Learning Path (8 courses)
Path 3: Advanced: Generative AI for Developers Learning Path (12 courses)

25 learning modules in total, taking about 48 hours to complete.

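The videos can be watched at 2x speed and come with Chinese subtitles, which is nice. The overall time commitment is acceptable. Along the way I got to know the Vertex AI interface: after casually setting up a prompt, you can even export it as code, which is very convenient. If I had to name a sticking point, one course's check on a Vertex AI Studio task kept failing. I eventually guessed that the checker scans the saved prompts and matches them against English text, so I had to change my Vertex AI Studio data (the examples) to English before the task was detected as complete.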

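Overall it was a smooth way to get familiar with Google Cloud services. One course in Path 1 also involves modifying a little code; for someone with an engineering background, it's easy once you understand the task. I remember one task being a real sticking point: the instructions gave only one example, so you had to come up with a second example yourself, and then yet another sentence to use as the prompt & Test Input for experimenting (two new sentences in total). If this one doesn't go smoothly at first, you may instantly lose confidence XD but once you get through it, everything is fine. That pain also taught me something: read the instructions in a familiar language (Chinese) if you like, but open a second tab with the English version as well, so you don't miss key steps that get lost in translation. The names of menus and options in particular are much clearer in English.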
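The lab's structure (one given example, a second example you write yourself, then a test input) is basically few-shot prompting. For anyone who wants to see the shape of such a prompt, here is a minimal plain-text sketch in Python; the `input:`/`output:` labels and the function name `build_few_shot_prompt` are my own illustrative assumptions, not the format Vertex AI Studio actually stores or checks.

```python
def build_few_shot_prompt(instruction, examples, test_input):
    """Assemble a plain-text few-shot prompt (illustrative format, not Vertex AI's).

    instruction: task description placed at the top
    examples: list of (input, output) pairs demonstrating the task
    test_input: the new input the model should answer
    """
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts.append(f"input: {example_input}")
        parts.append(f"output: {example_output}")
        parts.append("")  # blank line between examples
    # Finish with the test input and an empty output slot for the model to complete
    parts.append(f"input: {test_input}")
    parts.append("output:")
    return "\n".join(parts)
```

For example, `build_few_shot_prompt("Classify the sentiment.", [("I love it", "positive"), ("Too slow", "negative")], "Works fine")` ends with `input: Works fine` followed by an empty `output:` line for the model to fill in. The hand-written examples are exactly the part that is easy to get stuck on in the lab.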

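I had long wanted to use LangChain and knew a little about it, and the course happened to include some simple Python code for it, which was great and really eye-opening.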

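For over a year I've been building services on the OpenAI API. Trying Google's offerings this time, I finally understood the differences between PaLM, Gemini and Vertex AI. When PaLM first came out, a colleague integrated it for meeting notes, but overall it still needed a lot of customization, so we soon gave up on the integration. Unexpectedly, Google's recently launched NotebookLM is a huge hit, and from Path 1 I can roughly sense how enormous Google's investment in Responsible AI is.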

See also: Finished Google Cloud AI Study Jam 2024 - Generative AI Training Program

Monday, September 23, 2024

PHP Dev Notes - Installing a PHP 7.4 Development Environment on macOS 15 @ Apple M2 / arm64

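I usually manage packages with MacPorts. After upgrading to macOS 15 and running the MacPorts migration, I found php74 was gone; digging in, it turned out it had failed to build XD After some investigation: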

---> Configuring php74
Warning: Configuration logfiles contain indications of -Wimplicit-function-declaration; check that features were not accidentally disabled:
gethostbyname_r: found in php-7.4.33/config.log
_controlfp: found in php-7.4.33/config.log
_controlfp_s: found in php-7.4.33/config.log
Warning: Configuration logfiles contain indications of -Wimplicit-int; check that features were not accidentally disabled:
found in php-7.4.33/config.log
---> Building php74
Error: Failed to build php74: command execution failed
Error: See /opt/local/var/macports/logs/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_lang_php/php74/main.log for details.
Error: Follow https://guide.macports.org/#project.tickets if you believe there is a bug.
Error: Processing of port php74 failed
---> Some of the ports you installed have notes:
db48 has the following notes:
The Java and Tcl bindings are now provided by the db48-java and
db48-tcl subports.

===

:info:build ld: warning: ignoring duplicate libraries: '-largon2', '-lxml2', '-lz'
:info:build Undefined symbols for architecture arm64:
:info:build "_res_9_dn_expand", referenced from:
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build ...
:info:build "_res_9_dn_skipname", referenced from:
:info:build _zif_dns_get_record in dns.o
:info:build _zif_dns_get_mx in dns.o
:info:build _zif_dns_get_mx in dns.o
:info:build "_res_9_init", referenced from:
:info:build _zif_dns_check_record in dns.o
:info:build _zif_dns_get_record in dns.o
:info:build _zif_dns_get_mx in dns.o
:info:build "_res_9_search", referenced from:
:info:build _zif_dns_check_record in dns.o
:info:build _zif_dns_get_record in dns.o
:info:build _zif_dns_get_mx in dns.o
:info:build ld: symbol(s) not found for architecture arm64
:info:build clang: error: linker command failed with exit code 1 (use -v to see invocation)
:info:build make: *** [sapi/cli/php] Error 1
:info:build make: *** Waiting for unfinished jobs....
:info:build ld: warning: ignoring duplicate libraries: '-largon2', '-lxml2', '-lz'
:info:build Undefined symbols for architecture arm64:
:info:build "_res_9_dn_expand", referenced from:
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build _php_parserr in dns.o
:info:build ...
:info:build "_res_9_dn_skipname", referenced from:
:info:build _zif_dns_get_record in dns.o
:info:build _zif_dns_get_mx in dns.o
:info:build _zif_dns_get_mx in dns.o
:info:build "_res_9_init", referenced from:
:info:build _zif_dns_check_record in dns.o
:info:build _zif_dns_get_record in dns.o
:info:build _zif_dns_get_mx in dns.o
:info:build "_res_9_search", referenced from:
:info:build _zif_dns_check_record in dns.o
:info:build _zif_dns_get_record in dns.o
:info:build _zif_dns_get_mx in dns.o
:info:build ld: symbol(s) not found for architecture arm64
:info:build clang: error: linker command failed with exit code 1 (use -v to see invocation)
:info:build make: *** [sapi/phpdbg/phpdbg] Error 1
:info:build make: Leaving directory `/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_lang_php/php74/work/php-7.4.33'
:info:build Command failed: cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_lang_php/php74/work/php-7.4.33" && /usr/bin/make -j8 -w all
:info:build Exit code: 2
:error:build Failed to build php74: command execution failed
:debug:build Error code: CHILDSTATUS 94983 2
:debug:build Backtrace: command execution failed
:debug:build while executing
:debug:build "system {*}$notty {*}$callback {*}$nice $fullcmdstring"
:debug:build invoked from within
:debug:build "command_exec -callback portprogress::target_progress_callback build"
:debug:build (procedure "portbuild::build_main" line 10)
:debug:build invoked from within
:debug:build "$procedure $targetname"
:error:build See /opt/local/var/macports/logs/_opt_local_var_macports_sources_rsync.macports.org_macports_release_tarballs_ports_lang_php/php74/main.log for details.

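So the main problem is the undefined _res_9_dn_* (and related) symbols. According to AI, if I wanted to keep managing things with MacPorts, the only fix was to install the bloated bind9: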

% sudo port install bind9
% sudo port clean php74
% sudo port install php74 configure.cflags="-I/opt/local/include" configure.ldflags="-L/opt/local/lib -lbind9 -ldns -lisc"

But this also failed.

Then I tried:

% brew install php@7.4
Warning: No available formula with the name "php@7.4". Did you mean php@8.2, php@8.1 or php@8.0?
==> Searching for similarly named formulae and casks...
==> Formulae
php@8.2 php@8.1 php@8.0

To install php@8.2, run:
brew install php@8.2

Digging further with AI led to making a patch file for php-7.4, but maintaining that would be too tiring: it amounts to building php74 and modifying its code myself.

In the end I settled on a compromise: whenever I need a php74 environment, I'll rely on Docker XD It's as simple as:

% docker run -it -v ~/projects/:/projects -p 8080:8080 -p 8000:8000 -p 8443:8443 --rm php:7.4-cli bash

Then run `php -S 0.0.0.0:8000` inside the container, and the result can be viewed from the host, since `-p 8000:8000` publishes the container's port 8000.

If you still want to install it directly on the system, see github.com/shivammathur/homebrew-php

Sunday, September 8, 2024

Dart Dev Notes - Building a Big5 to UTF-8 Tool and Publishing It to pub.dev

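Recently I wanted to revisit things I'd written in C++ to rekindle some passion, starting by porting small, fragmentary pieces. This time I relied purely on claude.ai for a lot of the work: README.md, CHANGELOG.md, the example and the tests were all written by it XD It took care of exactly the parts I'm laziest about. As a result, the example and tests contain Simplified Chinese, and even the README started out in Simplified Chinese, so I simply asked it for an English version instead.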

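The whole process took less than two hours and was very pleasant. Most of the time went into having it structure the tool as a library: claude.ai made many suggestions, I rejected quite a few of them, and we eventually arrived at the current result. This is where AI assistance gets really interesting, because it also offers perspectives of its own. For example, Flutter targets multiple platforms and some of them aren't suited to io operations, so the AI suggested adjusting the implementation accordingly.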

Result: pub.dev/packages/big5_utf8_converter
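For anyone curious how such a converter works in principle: Big5 is a double-byte encoding in which bytes below 0x80 are plain ASCII, while a lead byte (0x81 to 0xFE) followed by a trail byte (0x40 to 0x7E or 0xA1 to 0xFE) forms one character, so conversion is essentially a table lookup. The Python sketch below illustrates that idea. It builds its table from Python's built-in `big5` codec, which is an assumption of this sketch; the package ships its own precomputed table (assets/big5_to_utf8_lookup.bin), and the names here are hypothetical, not the package's Dart API.

```python
def build_big5_table():
    """Map each two-byte Big5 code (lead << 8 | trail) to its Unicode character."""
    table = {}
    trail_bytes = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))
    for lead in range(0x81, 0xFF):
        for trail in trail_bytes:
            try:
                char = bytes([lead, trail]).decode("big5")
            except UnicodeDecodeError:
                continue  # pair not assigned in Python's big5 codec
            if len(char) == 1:
                table[(lead << 8) | trail] = char
    return table


BIG5_TABLE = build_big5_table()


def big5_to_utf8(data, replacement="\ufffd"):
    """Convert Big5-encoded bytes to UTF-8 bytes using the lookup table."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            out.append(chr(byte))  # ASCII maps to itself
            i += 1
            continue
        # Combine the lead byte with the next byte, if there is one
        code = (byte << 8) | data[i + 1] if i + 1 < len(data) else None
        if code in BIG5_TABLE:
            out.append(BIG5_TABLE[code])
            i += 2
        else:
            out.append(replacement)  # unknown or truncated sequence: skip one byte
            i += 1
    return "".join(out).encode("utf-8")
```

In everyday Python the same conversion is a one-liner, `data.decode("big5", errors="replace").encode("utf-8")`; the explicit loop is only there to show the lookup-table structure that a ported library has to carry around.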

I wasn't yet familiar with publishing to pub.dev, so following the official guide at dart.dev.org.tw/tools/pub/publishing, I first ran:

```
big5_utf8_converter_dart % dart pub publish --dry-run
Resolving dependencies...
Downloading packages...
_fe_analyzer_shared 73.0.0 (74.0.0 available)
analyzer 6.8.0 (6.9.0 available)
macros 0.1.2-main.4 (0.1.3-main.0 available)
Got dependencies!
3 packages have newer versions incompatible with dependency constraints.
Try `dart pub outdated` for more information.
Publishing big5_utf8_converter 1.0.0 to https://pub.dev:
├── CHANGELOG.md (<1 KB)
├── LICENSE (1 KB)
├── README.md (2 KB)
├── assets
│ └── big5_to_utf8_lookup.bin (64 KB)
├── bin2dart.dart (<1 KB)
├── example
│ ├── big5_utf8_converter_example.dart (<1 KB)
│ └── big5_utf8_converter_load_table_example.dart (1 KB)
├── lib
│ ├── big5_utf8_converter.dart (<1 KB)
│ └── src
│ ├── big5_to_utf8_lookup_data.dart (260 KB)
│ └── big5_utf8_converter.dart (1 KB)
├── pubspec.yaml (<1 KB)
└── test
└── big5_decoder_test.dart (2 KB)

Total compressed archive size: 98 KB.
The server may enforce additional checks.

Package has 0 warnings.
```

Then I set up GitHub Actions. GitHub Actions' default Dart workflow is all it takes to run the test cases:

```
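# Header assumed from GitHub's default Dart starter workflow: the name must be "Dart"
# for the workflows/Dart badge URL, and the branch name "main" is an assumption
name: Dart

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: dart-lang/setup-dart@9a04e6d73cca37bd455e0608d7e5092f881fd603

      - name: Install dependencies
        run: dart pub get

      - name: Analyze project source
        run: dart analyze

      - name: Run tests
        run: dart test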
```

Finally, update README.md to add badges:

[![pub package](https://img.shields.io/pub/v/big5_utf8_converter.svg)](https://pub.dev/packages/big5_utf8_converter)
[![Build Status](https://github.com/changyy/big5_utf8_converter_dart/workflows/Dart/badge.svg)](https://github.com/changyy/big5_utf8_converter_dart/actions)

Done!

Wednesday, August 28, 2024

Repairing the Snapped Head of a Xiaomi Smart DC Inverter Fan

Image: Xiaomi smart DC inverter fan, snapped

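I'd had this fan for six years and had already sensed something was off. One morning it finally snapped because I moved it the wrong way. Searching through online discussions, I was surprised how many people mention this, including the lack of a handle for lifting it XD I guess that's minimalist design. The correct way to move it is to hold the pole at the bottom.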

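After a bit of research I found teardown videos on BiliBili and Douyin, and on closer look, Taobao sells the replacement parts! Nice. I wanted to dig in right away, but decided to wait until the parts arrived, and then...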

Once the parts arrived and I got to the final step, I discovered that the hardest part was removing the broken piece.

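This section is a plastic sleeve held in place with strong glue, so there are only two approaches: heat it, or saw it into pieces first so that each piece is easier to pull out.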

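I was too lazy to buy a saw, so I followed the videos and heated it over a flame. After a long time the plastic tube had deformed but still wouldn't come out. In the end I brute-forced the broken part out, but the inside of the tube still wasn't clean and the new part wouldn't fit. So I finally went and bought a saw, cut it in half, and forced the rest out with a flathead screwdriver and pliers. I have to say, clearing it out felt great and satisfying, a release of long pent-up frustration XD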

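Finally, the installation went smoothly: I had taken photos during disassembly, so I just followed them. Right after finishing, something felt off, and it turned out I had installed it backwards XD So I took it apart and put it together once more. By now I'm practiced enough to work on the factory assembly line.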

This photo shows the wrong assembly: installed backwards

And so, at last, the farce was over.

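A friendly reminder: when buying parts, be sure to give the seller the model number printed on the bottom of the fan. Xiaomi makes many fan models with different parts, and the model number lets the seller confirm the right one quickly. Also, if you can, just skip the repair XD Once this kind of product reaches the end of its lifespan, simply replace it!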

Friday, August 23, 2024

Python Dev Notes - Downloading Public Google Sheets Data Without a Google API Key and Exporting Each Sheet as CSV

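A work task required comparing Google Sheets data. The simplest approach is to export the sheets and compare them with git diff. I wanted AI to generate a small Python tool that, given a Google Sheets URL or the key identifier inside that URL (called the spreadsheet id here), downloads the data of every sheet in that Google Spreadsheet.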

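However, downloading a specific sheet requires knowing each sheet's gid. I asked ChatGPT-4o and Claude.ai about this for ages, including uploading the static HTML, and they still couldn't solve it, so in the end I stepped in and hand-crafted the rest myself. The approach: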

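First, download the spreadsheet's HTML
Extract the sheet names from the docs-sheet-tab-caption elements
The var bootstrapData = {...}; object contains the sheet-name and gid data
Then find each gid with the [index,0,\"gid\",[ pattern (for example [0,0,\"gid\",[ for the first sheet)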
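Steps 3 and 4 can also be expressed as a single regular expression. The sketch below is a hedged alternative to the `find()`-based loop in main.py; it assumes the escaped `[index,0,\"gid\",[` layout described above and that gids are purely numeric, and the names `GID_PATTERN` and `extract_gids` are hypothetical.

```python
import re

# Matches [index,0,\"gid\",[ inside bootstrapData. The backslashes are literal
# characters in the page source because the data is a JSON string embedded in JavaScript.
GID_PATTERN = re.compile(r'\[(\d+),0,\\"(\d+)\\",\[')


def extract_gids(config_str):
    """Return {sheet_index: gid} for every occurrence of the pattern."""
    return {int(index): gid for index, gid in GID_PATTERN.findall(config_str)}
```

The resulting indexes can then be paired with the sheet names from step 2 by position, the same way main.py does it.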

Running it:

% python3 main.py

usage: main.py [-h] (--google-spreadsheet-url GOOGLE_SPREADSHEET_URL | --google-spreadsheet-id GOOGLE_SPREADSHEET_ID) [--output OUTPUT]

main.py: error: one of the arguments --google-spreadsheet-url --google-spreadsheet-id is required

% python3 main.py --google-spreadsheet-id 'XXXXXXXXXXXXXXXXXXXXXXX'

[INFO] Downloaded sheet: sheet01 to sheets_csv/sheet01.csv

[INFO] Downloaded sheet: sheet02 to sheets_csv/sheet02.csv

[INFO] Downloaded sheet: sheet03 to sheets_csv/sheet03.csv

[INFO] Downloaded sheet: sheet04 to sheets_csv/sheet04.csv

[INFO] Downloaded sheet: sheet05 to sheets_csv/sheet05.csv

Code:

```
% cat main.py
import argparse
import os
import re

import requests


def extract_spreadsheet_id(url):
    # The spreadsheet id is the path segment right after /d/ in the URL
    match = re.search(r'/d/([a-zA-Z0-9_-]+)', url)
    return match.group(1) if match else None


def get_spreadsheet_info(spreadsheet_id):
    url = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/edit"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        html_content = response.text
    except requests.RequestException as e:
        print(f"[INFO] Error fetching the spreadsheet: {e}")
        return None

    # Extract the sheet names from the sheet tab captions
    sheet_names = re.findall(r'docs-sheet-tab-caption[^>]+>([^<]+)</div>', html_content)

    # Extract the bootstrapData object, which holds the sheet name / gid data
    config_match = re.search(r'var bootstrapData\s*=\s*({.*?});', html_content, re.DOTALL)
    if not config_match:
        print("[INFO] Could not find bootstrapData in the HTML content")
        return None

    config_str = config_match.group(1)
    sheet_info = {}
    for index, sheet_name in enumerate(sheet_names):
        # Each sheet's gid appears as [index,0,\"gid\",[ (the quotes are backslash-escaped)
        begin_pattern = f'[{index},0,\\"'
        end_pattern = '\\",['
        begin_index = config_str.find(begin_pattern)
        end_index = config_str.find(end_pattern, begin_index) if begin_index != -1 else -1
        if end_index == -1:
            print(f"[INFO] Could not find the gid of sheet: {sheet_name}")
            continue
        sheet_info[sheet_name] = config_str[begin_index + len(begin_pattern):end_index]
    return sheet_info


def download_sheet_as_csv(spreadsheet_id, sheet_name, gid, output_folder):
    csv_url = f"https://docs.google.com/spreadsheets/d/{spreadsheet_id}/export?format=csv&gid={gid}"
    csv_response = requests.get(csv_url)

    if csv_response.status_code == 200:
        # Create the output folder if needed; open() fails when it does not exist
        os.makedirs(output_folder, exist_ok=True)
        output_path = os.path.join(output_folder, f"{sheet_name}.csv")
        with open(output_path, 'wb') as f:
            f.write(csv_response.content)
        print(f"[INFO] Downloaded sheet: {sheet_name} to {output_path}")
    else:
        print(f"[INFO] Failed to download sheet: {sheet_name}. Status code: {csv_response.status_code}")


def main():
    parser = argparse.ArgumentParser(description="Extract Google Spreadsheet information")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--google-spreadsheet-url", help="Google Spreadsheet URL")
    group.add_argument("--google-spreadsheet-id", help="Google Spreadsheet ID")
    parser.add_argument('--output', type=str, default='sheets_csv', help='The directory to save the CSV files')
    args = parser.parse_args()

    if args.google_spreadsheet_url:
        spreadsheet_id = extract_spreadsheet_id(args.google_spreadsheet_url)
    else:
        spreadsheet_id = args.google_spreadsheet_id

    if not spreadsheet_id:
        print("[INFO] Invalid Google Spreadsheet URL or ID")
        return

    sheet_info = get_spreadsheet_info(spreadsheet_id)
    if sheet_info:
        for name, gid in sheet_info.items():
            download_sheet_as_csv(spreadsheet_id, name, gid, args.output)
    else:
        print("[INFO] Failed to extract sheet information")


if __name__ == "__main__":
    main()
```
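Once two snapshots have been exported (for example by running the tool twice with different `--output` folders; the folder names are up to you), the comparison step itself can also stay in Python. This sketch swaps in the standard library's `difflib` for `git diff`; `read_lines` and `diff_csv_exports` are hypothetical helper names, and a sheet that exists on only one side is treated as empty on the other.

```python
import difflib
from pathlib import Path


def read_lines(path):
    """Read a CSV export as lines, treating a missing file as empty."""
    return path.read_text(encoding="utf-8").splitlines(keepends=True) if path.exists() else []


def diff_csv_exports(old_dir, new_dir):
    """Map each CSV file name that differs between two export folders to its unified diff."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    names = sorted({p.name for p in old_dir.glob("*.csv")} | {p.name for p in new_dir.glob("*.csv")})
    diffs = {}
    for name in names:
        old_file, new_file = old_dir / name, new_dir / name
        diff = difflib.unified_diff(
            read_lines(old_file), read_lines(new_file),
            fromfile=str(old_file), tofile=str(new_file),
        )
        text = "".join(diff)
        if text:  # identical files produce no diff output at all
            diffs[name] = text
    return diffs
```

Printing the values of the returned dict gives output in the same unified format that git diff uses, so the workflow stays familiar.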

Friday, August 16, 2024

Python Dev Notes - Importing yt-dlp Extractors to Call Specific Functions

yt-dlp probably needs little introduction: it bundles extractors for all kinds of video sites, and I've studied bits and pieces of it before:

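This task was originally assigned to a colleague, but he was busy, so I gave it a try. Asking ChatGPT got me about 80% of the sample code I wanted, and after small changes to how it's invoked, I was done. With ChatGPT you hardly need to take notes anymore; if you forget something, just ask again. Of course, whether one question gets you the answer still comes down to the asker's skill.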

For example, a prompt like:

Using facebook.py in the yt-dlp extractors, list the formats available in a Facebook video, to help debug our product's media formats

ChatGPT gave me decent code; after a few tweaks, it worked:

```
% git clone https://github.com/yt-dlp/yt-dlp.git

% cat test-facebook.py
import sys
import argparse

def get_video_formats(url):
    # Imported inside the function so the yt-dlp path added to sys.path below is used
    from yt_dlp.YoutubeDL import YoutubeDL
    from yt_dlp.extractor.facebook import FacebookIE

    # Create a YoutubeDL instance
    ydl_opts = {}
    with YoutubeDL(ydl_opts) as ydl:
        # Instantiate the FacebookIE class, passing ydl as the downloader
        facebook_ie = FacebookIE(ydl)

        # Extract the video info with the extract method
        video_info = facebook_ie.extract(url)

        # Get the available formats
        formats = video_info.get('formats', [])

        # Print each format's id and resolution
        for fmt in formats:
            # Use the height as the resolution, falling back to 'N/A' when the key is missing
            # (a height of None, e.g. for audio-only formats, prints as 'Nonep')
            resolution = fmt.get('height', 'N/A')
            print(f"Format ID: {fmt['format_id']}, Resolution: {resolution}p")

if __name__ == "__main__":
    # Set up command-line argument parsing
    parser = argparse.ArgumentParser(description='Extract video formats from a Facebook video URL using yt-dlp.')
    parser.add_argument('--path', required=True, help='Path to the yt-dlp directory')
    parser.add_argument('--facebook-url', required=True, help='Facebook video URL')

    args = parser.parse_args()

    # Insert the yt-dlp path into sys.path
    sys.path.insert(0, args.path)

    # Call the function to get the video formats
    get_video_formats(args.facebook_url)

% python3 test-facebook.py --path ./yt-dlp/ --facebook-url 'https://www.facebook.com/TED/videos/1464960234158173'
[facebook] Extracting URL: https://www.facebook.com/TED/videos/1464960234158173
[facebook] 1464960234158173: Downloading webpage
Format ID: hd, Resolution: N/Ap
Format ID: sd, Resolution: N/Ap
Format ID: 492667606832062v, Resolution: 720p
Format ID: 1654227388706906v, Resolution: 720p
Format ID: 849737406808391v, Resolution: 720p
Format ID: 1263117531679597v, Resolution: 1080p
Format ID: 1036187750774836a, Resolution: Nonep
```

Done!

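From now on, whenever an extractor contains an interesting function, I can call it this way instead of reinventing the wheel. When maintenance is needed, just update the yt-dlp code to the latest version, which keeps the integration with other systems flexible.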

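Also, the yt-dlp command itself supports JSON output. If that JSON output is enough, you don't need the approach above; the note above is just one example, useful for things like getting at intermediate data structures.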
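If the JSON output is enough, here is a minimal sketch of that route: it runs the `yt-dlp` CLI with `-J` (`--dump-single-json`) and summarizes the `formats` list. It assumes the `yt-dlp` executable is installed and on PATH; `fetch_info_json` and `describe_formats` are hypothetical helper names, and showing `N/A` for a missing height is my own choice.

```python
import json
import subprocess


def fetch_info_json(url):
    """Run `yt-dlp -J <url>` and parse the single JSON document it prints.

    Assumes the yt-dlp executable is on PATH; -J simulates, so nothing is downloaded.
    """
    result = subprocess.run(["yt-dlp", "-J", url], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def describe_formats(info):
    """Summarize each entry in info["formats"] as "<format_id>: <resolution>"."""
    lines = []
    for fmt in info.get("formats", []):
        height = fmt.get("height")
        # Missing or None height (e.g. audio-only formats) is shown as N/A
        resolution = f"{height}p" if height else "N/A"
        lines.append(f"{fmt.get('format_id')}: {resolution}")
    return lines
```

Calling `describe_formats(fetch_info_json(url))` yields one line per format; a format with `height` 1080 is rendered as `<format_id>: 1080p`, while audio-only entries show `N/A` instead of the `Nonep` seen in the output above.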

Wednesday, August 7, 2024

Python Dev Notes - Deploying a Python API Service with Nginx / WSGI / Gunicorn / Flask @ Ubuntu

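Recently at work, I added a Python API service alongside an existing node.js service for an integrated application. For now I'm trying a mixed setup to squeeze more out of the machine's resources, rather than a microservice architecture.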

The setup puts Nginx in front: some requests are handed to node.js and others to the Python API service, all through proxy_pass:

$ cat /etc/nginx/conf.d/service.conf | grep location
location / {

location /node/ { rewrite ^/node/(.*)$ /$1 break; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_redirect off; proxy_http_version 1.1; proxy_read_timeout 60; proxy_pass http://localhost:3000; }

location /python/ { rewrite ^/python/(.*)$ /$1 break; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_redirect off; proxy_http_version 1.1; proxy_read_timeout 60; proxy_pass http://unix:/var/run/service-py.sock; }

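node.js listens on port 3000, while the Python API listens on unix:/var/run/service-py.sock, so the Nginx config above doubles as a note on the two different approaches (TCP port vs. Unix domain socket). Also, when requests are forwarded to node.js or the Python API, the rewrite rules deliberately strip the /node/ or /python/ prefix, which gives the backend services more flexibility.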
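To make the prefix stripping concrete, here is a small Python model of the rewrite rules above (`rewrite ^/python/(.*)$ /$1 break;` and its /node/ twin). It is only an illustration under simplifying assumptions: it reproduces the regex substitution and ignores Nginx's location matching, URI normalization and query strings; `apply_prefix_rewrite` is a hypothetical name.

```python
import re


def apply_prefix_rewrite(uri, prefix):
    """Mimic `rewrite ^/<prefix>/(.*)$ /$1 break;` for a prefix such as "python" or "node"."""
    match = re.match(rf"^/{re.escape(prefix)}/(.*)$", uri)
    # When the regex does not match, the rewrite does not fire and the URI is unchanged
    return "/" + match.group(1) if match else uri
```

So a request for /python/api/items reaches the Python service as /api/items, which is why the service's own routes never need to know about the /python/ prefix.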

Back to the Python API: this time I used the Flask framework, and running it is simple:

```
% cat app.py
...
if __name__ == '__main__':
    app.run(host='localhost', port=3001)
```

Running it directly:

% python3 app.py
* Serving Flask app 'app' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://localhost:3001
Press CTRL+C to quit

Running it with the flask command:

$ FLASK_APP=app.py flask run --host 0.0.0.0 --port 3001
* Serving Flask app 'app.py' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:3001
* Running on http://x.x.x.x:3001
Press CTRL+C to quit

Finally, for stability, I went with Nginx + WSGI, serving the app through Gunicorn:

$ cat wsgi.py
from app import app

if __name__ == "__main__":
    app.run()

$ sudo gunicorn --workers 4 --bind unix:/var/run/service-py.sock -m 777 wsgi:app
[2024-08-07] [1701223] [INFO] Starting gunicorn 20.0.4
[2024-08-07] [1701223] [INFO] Listening at: unix:/var/run/service-py.sock (1701223)
[2024-08-07] [1701223] [INFO] Using worker: sync
[2024-08-07] [1701225] [INFO] Booting worker with pid: 1701225
[2024-08-07] [1701226] [INFO] Booting worker with pid: 1701226
[2024-08-07] [1701227] [INFO] Booting worker with pid: 1701227
[2024-08-07] [1701228] [INFO] Booting worker with pid: 1701228
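What `gunicorn ... wsgi:app` does is import the module `wsgi` and call its `app` attribute once per request as a WSGI callable. The stdlib-only sketch below shows that contract with a bare callable standing in for Flask (an assumption for illustration; Flask's `app` object implements the same interface), plus a hypothetical `call_app` helper that invokes it in-process without any server.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI callable with the same interface a Flask app object exposes."""
    # PATH_INFO is the path after Nginx's rewrite has stripped the /python/ prefix
    body = f"path={environ.get('PATH_INFO', '/')}".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(path):
    """Invoke the WSGI app directly, the way a server would, and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To try it over HTTP without Gunicorn, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 3001, app).serve_forever()` serves the same callable; Gunicorn adds the worker processes and the Unix socket on top of this interface.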

To manage it as a systemd service for convenience, all you need is:

$ cat /etc/systemd/system/my-py.service
[Unit]
Description=my-py-service
After=network.target

[Service]
Type=simple
User=root
WorkingDirectory=/path/project
#ExecStart=/usr/bin/python3 app.py
#Environment=FLASK_APP=app.py
#ExecStart=/usr/bin/python3 -m flask run --port 3001 --host 0.0.0.0
ExecStart=/usr/bin/gunicorn --workers 4 --bind unix:/var/run/service-py.sock -m 777 wsgi:app
Restart=on-failure

[Install]
WantedBy=multi-user.target

From then on, it can be managed with:

$ sudo systemctl status my-py.service
$ sudo systemctl stop my-py.service
$ sudo systemctl start my-py.service
$ sudo systemctl restart my-py.service

Tuesday, August 6, 2024

PHP Dev Notes - Trying Laravel + Twill CMS Toolkit to Build a CMS Admin Service @ macOS M1

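Our company's internal services have recently been moving to the Laravel framework, and a common requirement is a CMS that lets colleagues in different departments edit and publish content. So I tried Twill, a CMS toolkit package. According to its documentation, adding it gives you an admin login system out of the box and saves you from designing the database tables yourself, while offering more flexibility than WordPress.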

Working on a MacBook M1 with MacPorts, first create a Laravel project:

```
% sudo port install php83 php83-iconv php83-intl php83-mbstring php83-openssl php83-curl php83-sqlite php83-zip php83-gd php83-exif
% alias php=php83
% wget https://getcomposer.org/download/latest-stable/composer.phar -O /tmp/composer.phar
% php /tmp/composer.phar self-update
% alias composer="php /tmp/composer.phar"
% composer create-project --prefer-dist laravel/laravel /tmp/laravel-workspace
Creating a "laravel/laravel" project at "/tmp/laravel-workspace"
Installing laravel/laravel (v11.1.4)
- Installing laravel/laravel (v11.1.4): Extracting archive
Created project in /tmp/laravel-workspace
> @php -r "file_exists('.env') || copy('.env.example', '.env');"
Loading composer repositories with package information
Updating dependencies
...
```

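Here I lazily used alias composer="php /tmp/composer.phar". Later you may run into an error like the one below, which really just means the composer command cannot be found (a shell alias only exists in the interactive shell, so a composer process spawned by PHP cannot see it):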

```
Symfony\Component\Process\Exception\ProcessStartFailedException

The command "'composer' 'dump-autoload'" failed.

Working directory: /private/tmp/laravel-workspace

Error: proc_open(): posix_spawn() failed: No such file or directory
```

So either put /tmp/composer.phar somewhere on your PATH, or run composer dump-autoload manually afterwards, and so on:

```
% cp /tmp/composer.phar ~/.bin/composer
% chmod 755 ~/.bin/composer
```

Next, install the Twill toolkit. The whole flow follows Twill's official tutorial, Building a simple page builder with Laravel Blade; here I only record the steps:

```
% cd /tmp/laravel-workspace
laravel-workspace % composer require area17/twill:"^3.0"
laravel-workspace % php artisan twill:install
...
Let's create a superadmin account!

Enter an email:
> user@example.com

Enter a password:
>

Confirm the password:
>

Your account has been created
All good!
```

Next, create the Pages admin module:

```
laravel-workspace % php artisan twill:make:module pages

Do you need to use the block editor on this module? [yes]:
[0] no
[1] yes
>

Do you need to translate content on this module? [yes]:
[0] no
[1] yes
>

Do you need to generate slugs on this module? [yes]:
[0] no
[1] yes
>

Do you need to attach images on this module? [yes]:
[0] no
[1] yes
>

Do you need to attach files on this module? [yes]:
[0] no
[1] yes
>

Do you need to manage the position of records on this module? [yes]:
[0] no
[1] yes
>

Do you need to enable revisions on this module? [yes]:
[0] no
[1] yes
>

Do you need to enable nesting on this module? [no]:
[0] no
[1] yes
>

Do you also want to generate a model factory? [yes]:
[0] no
[1] yes
>

Do you also want to generate a model seeder? [yes]:
[0] no
[1] yes
>

Migration created successfully! Add some fields!

INFO Factory [database/factories/PageFactory.php] created successfully.

Models created successfully! Fill your fillables!
Repository created successfully! Control all the things!
Controller created successfully! Define your index/browser/form endpoints options!
Form request created successfully! Add some validation rules!

Do you also want to generate the preview file? [yes]:
[0] no
[1] yes
>

INFO Seeder [database/seeders/PageSeeder.php] created successfully.

The following snippet has been added to routes/twill.php:
-----
TwillRoutes::module('pages');
-----
To add a navigation entry add the following to your AppServiceProvider BOOT method.
-----
use A17\Twill\Facades\TwillNavigation;
use A17\Twill\View\Components\Navigation\NavigationLink;

public function boot()
{
...

TwillNavigation::addLink(
NavigationLink::make()->forModule('pages')
);
}
-----
Do not forget to migrate your database after modifying the migrations.

Enjoy.
```

This means routes/twill.php already has the routing rule for the admin interface:

```
% cat routes/twill.php
<?php

use A17\Twill\Facades\TwillRoutes;

// Register Twill routes here eg.
// TwillRoutes::module('posts');

TwillRoutes::module('pages');
```

But app/Providers/AppServiceProvider.php has to be edited by hand, as follows:

```
laravel-workspace % cat app/Providers/AppServiceProvider.php
<?php

namespace App\Providers;

use Illuminate\Support\ServiceProvider;
use A17\Twill\Facades\TwillNavigation;
use A17\Twill\View\Components\Navigation\NavigationLink;

class AppServiceProvider extends ServiceProvider
{
    /**
     * Register any application services.
     */
    public function register(): void
    {
        //
    }

    /**
     * Bootstrap any application services.
     */
    public function boot(): void
    {
        TwillNavigation::addLink(
            NavigationLink::make()->forModule('pages')
        );
    }
}
```

These changes add a Pages item to the top navigation of the /admin backend. Then run:

laravel-workspace % php artisan migrate
laravel-workspace % php artisan serve --host 0.0.0.0 --port 8000

Now you can log in to the backend at http://localhost:8000/admin, click Pages to go to http://localhost:8000/admin/pages, and create a page with the Add new button. The page's frontend URL is shown as localhost/en/pages/hello-world, but no frontend route has been implemented in Laravel yet, so nothing is displayed there.

The above only sets up a basic backend. What remains:

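Enrich the backend editing interface with more components (image, text, etc.), mainly by adding form elements for editing
Fix the CSS assets missing from the backend preview (run npm install && npm run build)
Create templates for the new elements
Create the frontend URL route and the controller that handles it (PageDisplayController)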

With these in place, a page published in the backend can be viewed on the frontend. But these four items alone involve quite a few steps.

Let's continue with the flow of Twill's official tutorial, solving other problems along the way: Configuring the page module

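By default the frontend URL includes the language. For example, after creating a Hello World page, its frontend URL is localhost/en/pages/hello-world. To drop the language (and the /pages/ segment), adjust app/Http/Controllers/Twill/PageController.php: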

```
protected function setUpController(): void
{
    $this->setPermalinkBase(''); // drop the /pages/ segment from the URL
    $this->withoutLanguageInPermalink(); // drop the /en/ segment from the URL
}
```

Next, adjust the editing form. Currently each page has a title and a description, which help with SEO; to make shared posts look better, we also want an image. Edit app/Http/Controllers/Twill/PageController.php to add image upload to the form:

```
use A17\Twill\Services\Forms\Fields\Medias;
...
public function getForm(TwillModelContract $model): Form
{
    $form = parent::getForm($model);

    $form->add(
        Input::make()->name('description')->label('Description')->translatable()
    );

    $form->add(
        Medias::make()->name('cover')->label('Cover image')
    );

    return $form;
}
...
```

The main changes are importing `use A17\Twill\Services\Forms\Fields\Medias;` and adding `$form->add( Medias::make()->name('cover')->label('Cover image') );`

Since our Laravel app runs on port 8000 via artisan, images uploaded by drag and drop fail to display.

The image URLs don't include the port, and copying a URL and adding the port number makes the image display correctly, so all we need to fix is APP_URL in Laravel's .env:

```
% cat .env | grep APP_URL
APP_URL=http://localhost
```

For testing, I suggest keeping two environment configurations and choosing one at startup:

```
% cp .env .env.local
% cat .env.local | grep APP_URL
APP_URL=http://localhost:8000
% php artisan serve --host 0.0.0.0 --port 8000 --env local

INFO Server running on [http://0.0.0.0:8000].

Press Ctrl+C to stop the server
```

This also fixes the image display problem in the backend.

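Next, extend the editing environment with the Block Editor: while editing, you can drag blocks into the content, and you can develop your own blocks to turn commonly used items into components. According to Twill's official example, two blocks exist by default, an image block and a wysiwyg block, and the example then adds a small block of its own, named Text.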

First, enable the Block Editor: in app/Http/Controllers/Twill/PageController.php, import `use A17\Twill\Services\Forms\Fields\BlockEditor;` and add `$form->add( BlockEditor::make() );`

Then create a text block:

```
laravel-workspace % php artisan twill:make:block text

Should we also generate a view file for rendering the block? (yes/no) [no]:
> yes

Creating block...
File: /private/tmp/laravel-workspace/resources/views/twill/blocks/text.blade.php
Block text was created.
Block text blank render view was created.
Block is ready to use with the name 'text'

laravel-workspace % cat resources/views/twill/blocks/text.blade.php
@twillBlockTitle('Text')
@twillBlockIcon('text')
@twillBlockGroup('app')

<x-twill::input
name="title"
label="Title"
:translated="true"
/>

<x-twill::wysiwyg
name="text"
label="Text"
placeholder="Text"
:toolbar-options="[
'bold',
'italic',
['list' => 'bullet'],
['list' => 'ordered'],
[ 'script' => 'super' ],
[ 'script' => 'sub' ],
'link',
'clean'
]"
:translated="true"
/>
```

The block template has been generated as well.

However, the admin preview still shows a placeholder message:

This is a basic preview. You can use dd($block) to view the data you have access to. This preview file is located at: /private/tmp/laravel-workspace/resources/views/site/blocks/text.blade.php

Edit that preview file:

```
% cat resources/views/site/blocks/text.blade.php
<div class="prose">
<h2>{{$block->translatedInput('title')}}</h2>
{!! $block->translatedInput('text') !!}
</div>
```

Then generate an image block as well, and update its block view and preview:

```
laravel-workspace % php artisan twill:make:block image

Should we also generate a view file for rendering the block? (yes/no) [no]:
> yes

Creating block...
File: /private/tmp/laravel-workspace/resources/views/twill/blocks/image.blade.php
Block image was created.
Block image blank render view was created.
Block is ready to use with the name 'image'

laravel-workspace % cat resources/views/twill/blocks/image.blade.php
@twillBlockTitle('Image')
@twillBlockIcon('text')
@twillBlockGroup('app')

<x-twill::medias
name="highlight"
label="Highlight"
/>

laravel-workspace % cat resources/views/site/blocks/image.blade.php
<div class="py-8 mx-auto max-w-2xl flex items-center">
<img src="{{$block->image('highlight', 'desktop')}}"/>
</div>
```

For the image block to work, config/twill.php also needs crop settings:

```
% cat config/twill.php
<?php

return [
'block_editor' => [
'crops' => [
'highlight' => [
'desktop' => [
[
'name' => 'desktop',
'ratio' => 16 / 9,
],
],
'mobile' => [
[
'name' => 'mobile',
'ratio' => 1,
],
],
],
],
],
];
```
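The `ratio` values above are width divided by height (16 / 9 for desktop, 1 for mobile). As a worked example of what such a ratio means for an uploaded image, this sketch computes the largest centered crop with a given ratio. Twill's cropper lets the editor move the crop box, so `centered_crop` is a hypothetical default for illustration, not Twill's own logic.

```python
from typing import Tuple


def centered_crop(width: int, height: int, ratio: float) -> Tuple[int, int, int, int]:
    """Largest centered crop (x, y, w, h) whose w / h equals ratio."""
    if width / height > ratio:
        # Source is wider than the target: keep the full height, trim left and right
        w, h = round(height * ratio), height
    else:
        # Source is narrower (or equal): keep the full width, trim top and bottom
        w, h = width, round(width / ratio)
    return (width - w) // 2, (height - h) // 2, w, h


# A 4000 x 3000 photo cropped with the desktop and mobile ratios
print(centered_crop(4000, 3000, 16 / 9))  # (0, 375, 4000, 2250)
print(centered_crop(4000, 3000, 1))       # (500, 0, 3000, 3000)
```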

Next, add CSS styling to the pages rendered in the admin preview and on the frontend with `@vite('resources/css/app.css')`, by editing `resources/views/site/layouts/block.blade.php` and `resources/views/site/page.blade.php`:

```
laravel-workspace % cat resources/views/site/layouts/block.blade.php
<!doctype html>
<html lang="en">
<head>
<title>#madewithtwill website</title>
@vite('resources/css/app.css')
</head>
<body>
<div>
@yield('content')
</div>
</body>
</html>

laravel-workspace % cat resources/views/site/page.blade.php
<!doctype html>
<html lang="en">
<head>
<title>{{ $item->title }}</title>
@vite('resources/css/app.css')
</head>
<body>
<div class="mx-auto max-w-2xl">
{!! $item->renderBlocks() !!}
</div>
</body>
</html>
```

In addition, install the npm tooling so that resources/css/app.css can be compiled (then build it with npm run build):

```
% nvm use v20
Now using node v20.9.0 (npm v10.3.0)

% npm install

added 23 packages, and audited 24 packages in 577ms

5 packages are looking for funding
run `npm fund` for details

found 0 vulnerabilities

laravel-workspace % tree resources/css/
resources/css/
└── app.css

1 directory, 1 file
```

Now, when editing an article in the Twill admin, the Block editor can add text blocks, images and so on.

However, the frontend still isn't wired up: there is no frontend route to display pages yet. First create a PageDisplayController to handle it:

```
% php artisan make:controller PageDisplayController

INFO Controller [app/Http/Controllers/PageDisplayController.php] created successfully.
```

Update PageDisplayController to accept one parameter and immediately dump it for debugging:

```
% cat app/Http/Controllers/PageDisplayController.php
<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Illuminate\Contracts\View\View;

class PageDisplayController extends Controller
{
public function show(string $slug): View
{
dd($slug);
}
}
```

Then set up the route:

```
% cat routes/web.php
<?php

use Illuminate\Support\Facades\Route;

//Route::get('/', function () {
// return view('welcome');
//});

Route::get('{slug}', [\App\Http\Controllers\PageDisplayController::class, 'show'])->name('frontend.page');
```
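Why the permalink changes matter for this route: assuming `{slug}` behaves like a Laravel route parameter without a `->where()` constraint, it matches exactly one non-empty path segment. The regex below only models that rule (`match_page_route` is a hypothetical helper): `/hello-world` reaches PageDisplayController, while the old `/en/pages/hello-world` form and `/` do not.

```python
import re
from typing import Optional

# Assumption: {slug} matches one non-empty path segment, i.e. no "/"
PAGE_ROUTE = re.compile(r"^/(?P<slug>[^/]+)$")


def match_page_route(path: str) -> Optional[str]:
    """Return the slug for paths the '{slug}' route would handle, else None."""
    match = PAGE_ROUTE.match(path)
    return match.group("slug") if match else None


print(match_page_route("/hello-world"))           # hello-world
print(match_page_route("/en/pages/hello-world"))  # None
```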

Now visiting a frontend URL dumps the slug captured from it (dd() prints the value and stops the request).

Finally, change PageDisplayController to display the actual page data:

```
laravel-workspace % cat app/Http/Controllers/PageDisplayController.php
<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Illuminate\Contracts\View\View;
use App\Repositories\PageRepository;

class PageDisplayController extends Controller
{
//public function show(string $slug): View
//{
// dd($slug);
//}

public function show(string $slug, PageRepository $pageRepository): View
{
$page = $pageRepository->forSlug($slug);

if (!$page) {
abort(404);
}

return view('site.page', ['item' => $page]);
}
}
```

Now the frontend shows the page content, e.g. at http://localhost:8000/hello-world.

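Finally, if images don't show up on the frontend, note that an image dragged into a block needs settings for each device layout. For example, if the image is missing when browsing on a PC, the desktop crop is probably not set, since the frontend view requests the desktop crop: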

```
laravel-workspace % cat resources/views/site/blocks/image.blade.php
<div class="py-8 mx-auto max-w-2xl flex items-center">
<img src="{{$block->image('highlight', 'desktop')}}"/>
</div>
```

When dragging an image into the block editor, check that it shows a `desktop crop` entry.

Friday, August 2, 2024

Node.js Dev Notes - Calling Python Code from node.js via Pyodide, Brython, and WebAssembly @ node.js v20, python3.11

On a whim, I looked into ways to call Python code from node.js. Of course, on a Linux server you can simply use child_process to run python directly, as in the example at nodejs.org/api/child_process.html:

```
const { spawn } = require('node:child_process');
const ls = spawn('ls', ['-lh', '/usr']);

ls.stdout.on('data', (data) => {
console.log(`stdout: ${data}`);
});

ls.stderr.on('data', (data) => {
console.error(`stderr: ${data}`);
});

ls.on('close', (code) => {
console.log(`child process exited with code ${code}`);
});
```
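With child_process, a simple contract between the two sides is JSON over stdin/stdout. Below is a sketch of the Python side under that assumption; the script name (say, worker.py), the request shape `{"text": ...}` and the reply fields are all hypothetical. The Node side would `spawn('python3', ['worker.py'])`, write `JSON.stringify(...)` to `child.stdin`, call `child.stdin.end()`, collect the stdout `data` chunks and `JSON.parse` them on `close`.

```python
import json
import sys
from typing import TextIO


def handle_request(raw: str) -> str:
    """Turn one JSON request from Node.js into one JSON reply."""
    try:
        text = json.loads(raw)["text"]
        return json.dumps({"ok": True, "length": len(text)})
    except (ValueError, KeyError, TypeError) as exc:  # bad JSON, missing key, wrong type
        return json.dumps({"ok": False, "error": str(exc)})


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read the whole request from stdin and write a single reply line to stdout."""
    stdout.write(handle_request(stdin.read()) + "\n")
```

As a standalone worker, the script would end with a `main()` call; keeping the logic in `handle_request` makes it testable without spawning anything.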

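But is it possible to interpret Python directly inside node.js? Just for fun I searched, and there really are ways. The advantage is fewer variables when deploying; performance isn't necessarily good, but it lets developers of different languages blend together (just kidding). So far I've found two integration options: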

Pyodide, pyodide.org
Brython, brython.info

Brython is designed to run in a web browser (it needs the DOM), while Pyodide does not. Notes on each follow.

First, the Python code to run, which uses Python's re and json modules:

```
% cat script.py
import re
import json

def runTest(inputData):
    output = {}
    flags = 0
    pattern = r'''(?x)
        (?:
            \.get\("n"\)\)&&\(b=|
            (?:
                b=String\.fromCharCode\(110\)|
                (?P<str_idx>[a-zA-Z0-9_$.]+)&&\(b="nn"\[\+(?P=str_idx)\]
            ),c=a\.get\(b\)\)&&\(c=|
            \b(?P<var>[a-zA-Z0-9_$]+)=
        )(?P<nfunc>[a-zA-Z0-9_$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z]\)
        (?(var),[a-zA-Z0-9_$]+\.set\("n"\,(?P=var)\),(?P=nfunc)\.length)'''

    try:
        result = re.search(pattern, inputData, flags)
        if result:
            output["status"] = True
            output["data"] = result.groupdict()
    except Exception as e:
        output["error"] = str(e)
    return json.dumps(output, indent=4)

# "data" is injected as a global by the host (see the Pyodide run.js)
runTest(data)
```
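To see what the pattern actually captures, here is a standalone check in plain CPython. `find_nfunc` is a hypothetical helper and the two inputs are made-up minified-JavaScript snippets, one per branch of the pattern; "Hello World" matches nothing, which is why the runs below print an empty result.

```python
import re
from typing import Optional

# Same pattern as runTest(), with every literal parenthesis escaped as \( or \)
N_FUNC_PATTERN = re.compile(r'''(?x)
    (?:
        \.get\("n"\)\)&&\(b=|
        (?:
            b=String\.fromCharCode\(110\)|
            (?P<str_idx>[a-zA-Z0-9_$.]+)&&\(b="nn"\[\+(?P=str_idx)\]
        ),c=a\.get\(b\)\)&&\(c=|
        \b(?P<var>[a-zA-Z0-9_$]+)=
    )(?P<nfunc>[a-zA-Z0-9_$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z]\)
    (?(var),[a-zA-Z0-9_$]+\.set\("n"\,(?P=var)\),(?P=nfunc)\.length)''')


def find_nfunc(js: str) -> Optional[dict]:
    """Return the named groups of the first match, or None."""
    match = N_FUNC_PATTERN.search(js)
    return match.groupdict() if match else None


print(find_nfunc("Hello World"))
print(find_nfunc('x.get("n"))&&(b=Xyz[0](b)'))
print(find_nfunc('c=Abc(a),d.set("n",c),Abc.length'))
```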

Using Pyodide:

```
% nvm use v20
Now using node v20.10.0 (npm v10.2.3)
% npm install pyodide
% cat package.json
{
"dependencies": {
"pyodide": "^0.26.2"
}
}

% cat run.js
const fs = require('fs').promises;
const { loadPyodide } = require("pyodide");

async function main() {
const fileContent = await fs.readFile('mydata.bin', 'utf8');
let pyodide = await loadPyodide();
pyodide.globals.set("data", pyodide.toPy(fileContent));
const pythonCode = await fs.readFile('script.py', 'utf8');
let result = pyodide.runPython(pythonCode);
console.log(result);
}

main();

% echo "Hello World" > mydata.bin

% node run.js
{}
```

The usage is fairly intuitive. I lazily passed the data in through a global variable and picked up the Python result in node.js; it works fine and produces the expected result.

Next, Brython, which is designed to run in a browser environment:

% head -n 9 brython.js
// brython.js brython.info
// version [3, 11, 0, 'final', 0]
// implementation [3, 11, 3, 'dev', 0]
// version compiled from commented, indented source files at
// github.com/brython-dev/brython
var __BRYTHON__=__BRYTHON__ ||{}
try{
eval("async function* f(){}")}catch(err){console.warn("Your browser is not fully supported. If you are using "+
"Microsoft Edge, please upgrade to the latest version")}

In node.js, jsdom is needed to emulate that environment. To get brython.js and brython_stdlib.js, the official docs suggest installing the tool with pip install brython, so this flow also requires Python tooling. Python 3.12 currently shows some problems, so I pinned Python 3.11. brython.js also runs on the latest node.js v22, but then prints [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead., so I went back to node.js v20 to avoid the extra message.

The full sequence:

```
% python3.11 -m venv venv
% source venv/bin/activate
(venv) % pip install brython
Collecting brython
Using cached brython-3.11.3-py3-none-any.whl.metadata (1.0 kB)
Using cached brython-3.11.3-py3-none-any.whl (1.6 MB)
Installing collected packages: brython
Successfully installed brython
(venv) % brython-cli install
Installing Brython 3.11.3
done
(venv) % ls
README.txt brython_stdlib.js index.html venv
brython.js demo.html unicode.txt
```

Then back to node.js:

```
% nvm use v20
Now using node v20.10.0 (npm v10.2.3)
% cat package.json
{
"dependencies": {
"jsdom": "^24.1.1"
}
}
% cat run-via-dom.js
const { JSDOM } = require('jsdom');
const fs = require('fs');
const path = require('path');

const dom = new JSDOM(`<!DOCTYPE html>
<html>
<head></head>
<body>
<script></script>
</body>
</html>`, {
runScripts: "dangerously",
resources: "usable"
});

const brythonJsPath = path.join(__dirname, 'brython.js');
const brythonStdlibJsPath = path.join(__dirname, 'brython_stdlib.js');

const brythonJs = fs.readFileSync(brythonJsPath, 'utf8');
const brythonStdlibJs = fs.readFileSync(brythonStdlibJsPath, 'utf8');

try {
dom.window.eval(brythonJs);
dom.window.eval(brythonStdlibJs);
} catch (error) {
console.error('Error executing brython.js:', error);
}

const scriptPath = path.join(__dirname, 'script.py');
const pythonScript = fs.readFileSync(scriptPath, 'utf8');

const dataPath = path.join(__dirname, 'mydata.bin');
const binaryData = fs.readFileSync(dataPath, 'utf8');
const base64Data = Buffer.from(binaryData).toString('base64');

const scriptElement = dom.window.document.createElement('script');
scriptElement.type = 'text/python';
scriptElement.textContent = `
import base64
data = base64.b64decode("""${base64Data}""")

${pythonScript}

from browser import document
document.output = runTest(data)
`
dom.window.document.body.appendChild(scriptElement);
try {
dom.window.brython({debug: 1, pythonpath: ['.']})
console.log(dom.window.document.output);
} catch (error) {
console.error('Error executing dom.window.brython:', error);
}
console.log('Python script execution completed.');

% echo "Hello World" > mydata.bin

% node run-via-dom.js
{"status": false, "data": {}, "error": "not the same type for string and pattern"}
Python script execution completed.
```

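Unfortunately, the experiment with this complex Python regular expression failed under brython.js + node.js v20 + jsdom. I even tweaked index.html slightly and ran it in Chrome via python3 -m http.server (giving it a full browser environment) and got the same error, so for now I'm treating it as a failure. Still, the example above covers passing data from node.js into the Python code and getting the result back into node.js. Sharp-eyed readers will notice that the Brython version uses `document.output = runTest(data)`, which calls runTest(data) a second time, since `${pythonScript}` already calls it once. I didn't dig into how to capture that first result; since the outcome didn't match expectations anyway, I stopped there.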
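One thing worth checking before blaming the regex: in the Brython snippet, `base64.b64decode()` returns bytes, while `runTest()` searches with a str pattern. CPython rejects that combination with a TypeError, and Brython's "not the same type for string and pattern" reads like the same check, though that is my inference, not something verified against Brython's source. The sketch reproduces the mismatch in plain CPython (`search_or_error` is a hypothetical helper); if this is the cause, appending `.decode("utf-8")` to the `b64decode(...)` call in the Brython script would be the likely fix.

```python
import base64
import re

# The Brython snippet embeds the data as base64 and decodes it with b64decode
payload = base64.b64encode("Hello World\n".encode("utf-8")).decode("ascii")
raw = base64.b64decode(payload)  # bytes
text = raw.decode("utf-8")       # str, the same type as the pattern


def search_or_error(pattern: str, data):
    """Run re.search and return the exception instead of raising it."""
    try:
        return re.search(pattern, data)
    except TypeError as exc:
        return exc


print(type(raw).__name__, repr(search_or_error(r"World", raw)))
print(type(text).__name__, search_or_error(r"World", text).group())
```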

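Finally, the WebAssembly route (which is what I originally set out to study in this note before getting sidetracked): compile the Python code to wasm, then run it with wasmer, or run the wasm code from another language such as node.js.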

First install wasmer via MacPorts:

% port search wasmer
wasmer @4.3.5 (lang, devel)
The leading WebAssembly Runtime supporting WASI and Emscripten
% sudo port install wasmer

Then use py2wasm to convert script-main.py into script-main.wasm; the py2wasm website notes that only python3.11 is currently supported:

% python3.11 -m venv venv
% source venv/bin/activate
(venv) % pip install py2wasm
(venv) % py2wasm script-main.py -o script-main.wasm

The code:

```
% cat script-main.py
import re
import json

def runTest(inputData):
    output = { "status": False, "data": {}, "error": None}
    flags = 0
    pattern = r'''(?x)
        (?:
            \.get\("n"\)\)&&\(b=|
            (?:
                b=String\.fromCharCode\(110\)|
                (?P<str_idx>[a-zA-Z0-9_$.]+)&&\(b="nn"\[\+(?P=str_idx)\]
            ),c=a\.get\(b\)\)&&\(c=|
            \b(?P<var>[a-zA-Z0-9_$]+)=
        )(?P<nfunc>[a-zA-Z0-9_$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z]\)
        (?(var),[a-zA-Z0-9_$]+\.set\("n"\,(?P=var)\),(?P=nfunc)\.length)'''

    try:
        result = re.search(pattern, inputData, flags)
        if result:
            output["status"] = True
            output["data"] = result.groupdict()
    except Exception as e:
        output["error"] = str(e)
    return json.dumps(output, indent=4)

if __name__ == "__main__":
    import sys
    if len(sys.argv) < 2:
        print("Usage: python script-main.py <inputData>")
    else:
        print(runTest(sys.argv[1]))

% python3 script-main.py
Usage: python script-main.py <inputData>

% python3 script-main.py "Hello World"
{
"status": false,
"data": {},
"error": null
}
```

Testing with wasmer:

```
% wasmer run script-main.wasm
Usage: python script-main.py <inputData>

% wasmer run script-main.wasm "Hello World"
{
"status": false,
"data": {},
"error": null
}
```

Next, run it from Node.js. I pestered ChatGPT for this and tweaked the result into a reasonably usable version:

```
% cat run.js
const fs = require('fs');
const path = require('path');
const { WASI } = require('wasi');

const runWasm = async (inputData) => {
  const wasmPath = path.resolve('./script-main.wasm');
  const wasmBinary = fs.readFileSync(wasmPath);

  // Set up a WASI instance; args[0] plays the role of the program name
  const wasi = new WASI({
    args: inputData ? ['script-main.wasm', inputData] : ['script-main.wasm'],
    env: {},
    version: 'preview1'
  });

  // Compile and instantiate the module with the WASI preview1 imports
  const { instance } = await WebAssembly.instantiate(wasmBinary, {
    wasi_snapshot_preview1: wasi.wasiImport
  });

  // Run the module's _start export. Python's print() goes through WASI
  // fd_write straight to this process's stdout, so nothing has to be
  // read back out of WebAssembly memory afterwards
  wasi.start(instance);
};

// Get the inputData from the command line arguments
runWasm(process.argv[2] || null).catch(console.error);

% nvm use v20
Now using node v20.10.0 (npm v10.2.3)

% node run.js
(node:22046) ExperimentalWarning: WASI is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
Usage: python script-main.py <inputData>

% node run.js "Hello World"
(node:22050) ExperimentalWarning: WASI is an experimental feature and might change at any time
(Use `node --trace-warnings ...` to show where the warning was created)
{
"status": false,
"data": {},
"error": null
}
```

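Looking back, this all started because I wanted to make good use of open source, even across programming languages, so I looked into a few ways of integrating them. Unfortunately, the best route is still to run each language in its own runtime; consider the above a fun note.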
