2021年6月12日 星期六

Node.js 筆記 - 使用 Puppeteer 監控瀏覽網頁的 Requests 跟 HTML Code @ macOS 11.4, node v14.14.0, npm 6.14.5

緣起於在追蹤網站架構時,每次都靠 chrome browser 開發人員工具去追蹤特定的 request 還好,若需要大量瀏覽找尋一些蛛絲馬跡時,就會很煩!以前很閒時,就會跳下去寫 chrome extension 或是熱衷於 C++ 時,就會寫一點點  Chromium Embedded Framework 來寫個小型瀏覽器,最後想起來試試看 Puppeteer 吧!

使用 Puppeteer 很輕鬆,除了是很常見的 Javascript 語言外,Puppeteer 套件提供了網頁端常見的事件偵測、所發的 Request 清單。就這樣拼拼湊湊即可完工:

(async () => {
const browser = await puppeteer.launch({headless: false});
//const page = await browser.newPage();
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();
await page.emulate(device);
await page.setRequestInterception(true);

// https://pptr.dev/#?product=Puppeteer&version=v10.0.0&show=api-event-domcontentloaded
// https://developer.mozilla.org/en-US/docs/Web/API/Window/DOMContentLoaded_event
//
// The DOMContentLoaded event fires when the initial HTML document has been completely loaded and parsed, without waiting for stylesheets, images, and subframes to finish loading.
//
page.on('domcontentloaded', () => {
console.log('on.domcontentloaded');
});

// https://pptr.dev/#?product=Puppeteer&version=v10.0.0&show=api-event-domcontentloaded
// https://developer.mozilla.org/en-US/docs/Web/API/Window/load_event
//
// The load event is fired when the whole page has loaded, including all dependent resources such as stylesheets and images. This is in contrast to DOMContentLoaded, which is fired as soon as the page DOM has been loaded, without waiting for resources to finish loading.
//
page.on('load', async () => {
console.log('on.load');
watchTags(page, watch_tags);
});

// https://pptr.dev/#?product=Puppeteer&version=v10.0.0&show=api-event-framenavigated
// https://pptr.dev/#?product=Puppeteer&version=v10.0.0&show=api-class-frame
page.on('framenavigated', frame => {
console.log('on.framenavigated: '+frame.url());
});

// https://pptr.dev/#?product=Puppeteer&version=v10.0.0&show=api-event-request
// https://pptr.dev/#?product=Puppeteer&version=v10.0.0&show=api-class-httprequest
page.on('request', request => {
watchRequest(page.url(), request.url());
if (skip_resource_type[ request.resourceType() ])
request.abort();
else
request.continue();
});

// https://pptr.dev/#?product=Puppeteer&version=v10.0.0&show=api-event-response
// https://pptr.dev/#?product=Puppeteer&version=v10.0.0&show=api-class-httpresponse
page.on('response', response => {
//console.log('on.response: '+response.url());
});

while (true) {
if (url_init.length > 0) {
const target_url = url_init.shift();
await page.goto(target_url, {
waitUntil: 'networkidle0',
});
console.log('networkidle0');
}
if (url_init.length > 0)
await sleep(5000);
else
await sleep(50);
}
await browser.close();
})();

如此,搭配 command line 使用,就可很快的監控想要的 request 規則,或是 HTML DOM Tree 的變化,例如監控 <script> 的讀取位置在哪:

% node index.js -url "https://tw.yahoo.com" -tag "script.src"
...
on.load
[WATCH][TAG] script: src
Web URL: [https://tw.yahoo.com/]
[
  { src: 'https://s.yimg.com/wi/ytc.js' },
  {
    src: 'https://tw.mobi.yahoo.com/polyfill.min.js?features=array.isarray%2Carray.prototype.every%2Carray.prototype.foreach%2Carray.prototype.indexof%2Carray.prototype.map%2Cdate.now%2Cfunction.prototype.bind%2Cobject.keys%2Cstring.prototype.trim%2Cobject.defineproperty%2Cobject.defineproperties%2Cobject.create%2Cobject.freeze%2Carray.prototype.filter%2Carray.prototype.reduce%2Cobject.assign%2Cpromise%2Crequestanimationframe%2Carray.prototype.some%2Cobject.getownpropertynames%2Clocale-data-en-us%2Cintl%2Clocale-data-zh-hant-tw&version=2.1.23'
  },
  { src: 'https://s.yimg.com/aaq/yc/2.9.0/zh.js' },
  {
    src: 'https://s.yimg.com/ud/fp/js/vendor.174522d6d76b51858b93.min.js'
  },
  { src: 'https://s.yimg.com/ud/fp/js/common.js' },
  { src: 'https://s.yimg.com/oa/consent.js' },
  { src: 'https://consent.cmp.oath.com/cmpStub.min.js' },
  { src: 'https://consent.cmp.oath.com/cmp.js' },
  { src: 'https://s.yimg.com/ss/rapid3.js' },
  { src: 'https://s.yimg.com/rq/darla/boot.js' },
  {
    src: 'https://s.yimg.com/ud/fp/js/main.6622c6aeda051386afaa.min.js'
  },
  { src: 'https://mbp.yimg.com/sy/os/yaft/yaft-0.3.22.min.js' },
  {
    src: 'https://mbp.yimg.com/sy/os/yaft/yaft-plugin-aftnoad-0.1.3.min.js'
  },
  { src: 'https://s.yimg.com/aaq/vzm/perf-vitals_1.0.0.js' },
  {
    src: 'https://fc.yahoo.com/sdarla/php/client.php?dm=1&lang=zh-Hant-TW'
  },
  {
    src: 'https://s.yimg.com/aaq/c/e7fecc0.caas-abu_highlander.min.js'
  }
]
...

2021年6月4日 星期五

[Linux] 升級 Red Hat Enterprise Server 的 OpenSSL, OpenSSH 服務 @ Red Hat Enterprise Linux Server release 5.3

這是一台滿舊的機器,大概 10 年前吧。同事想要 git clone 時,出現了失敗問題,追蹤一下是機器的 openssl 過舊,由於機器有他的任務在,不好意思亂更新系統資源,怕爆!所以採用 tarball 的安裝方式。

openssl 太舊時,連用 wget/curl 去下載 https 來源時都會失敗的,只好靠 http 或是 scp 搬東西進去。

$ lsb_release -a
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 5.3 (Tikanga)
Release: 5.3
Codename: Tikanga

$ openssl version
OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008

只好挑個 /opt/tarball 安裝啦,沒想到要編譯 openssl 又會碰到 perl 版本不夠新:

$ wget http://www.cpan.org/src/5.0/perl-5.10.0.tar.gz
$ cd ~/perl-5.10.0
$ ./configure.gnu --prefix=/opt/tarball
$ make -j4 install

接著再安裝個 zlib ,採用 static 方式方便後續大家使用:

$ wget http://zlib.net/zlib-1.2.11.tar.gz
$ cd ~/zlib-1.2.11
$ ./configure --prefix=/opt/tarball --static
$ make install

編譯 openssl:

$ wget http://www.openssl.org/source/openssl-1.1.1k.tar.gz 
$ cd ~/openssl-OpenSSL_1_1_1k 
$ PATH=/opt/tarball/bin:$PATH ./config --prefix=/opt/tarball
$ make -j4 install

編譯 openssh 工具包:https://github.com/openssh/openssh-portable/releases

$ cd ~/openssh-portable-V_8_6_P1
$ autoreconf
$ LD_LIBRARY_PATH=/opt/tarball/lib:$LD_LIBRARY_PATH ./configure --prefix=/opt/tarball/ --with-ssl-dir=/opt/tarball
$ LD_LIBRARY_PATH=/opt/tarball/lib:$LD_LIBRARY_PATH make -j4 install
$ file /opt/tarball/bin/ssh
/opt/tarball/bin/ssh: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, stripped

接著偷懶拉 code 方式:

$ GIT_SSH_COMMAND="LD_LIBRARY_PATH=/opt/tarball/lib:$LD_LIBRARY_PATH /opt/tarball/bin/ssh" git clone ...

或是在環境變數添加好:

LD_LIBRARY_PATH=/opt/tarball/lib:$LD_LIBRARY_PATH 
PATH=/opt/tarball/bin:$PATH

搞定!