在此對一份類似 Access Logs 做分析,將其轉成 CSV 格式來分析,其中 CSV 裡頭有 id 跟 remote_ip 兩個欄位,將 remote_ip 分析完後直接歸類屬於哪個 country_code,程式碼很簡單:
% php -v
WARNING: PHP is not recommended
PHP is included in macOS for compatibility with legacy software.
Future versions of macOS will not include PHP.
PHP 7.3.24-(to be removed in future macOS) (cli) (built: Dec 21 2020 21:33:25) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.24, Copyright (c) 1998-2018 Zend Technologies
% cat composer.json
{
"require": {
"geoip2/geoip2": "~2.0"
}
}
% php composer.phar install
No lock file found. Updating dependencies instead of installing from lock file. Use composer update over composer install if you do not have a lock file.
Loading composer repositories with package information
Updating dependencies
Lock file operations: 4 installs, 0 updates, 0 removals
- Locking composer/ca-bundle (1.2.9)
- Locking geoip2/geoip2 (v2.11.0)
- Locking maxmind-db/reader (v1.10.0)
- Locking maxmind/web-service-common (v0.8.1)
Writing lock fileInstalling dependencies from lock file (including require-dev)
Package operations: 4 installs, 0 updates, 0 removals
- Installing composer/ca-bundle (1.2.9): Extracting archive
- Installing maxmind/web-service-common (v0.8.1): Extracting archive
- Installing maxmind-db/reader (v1.10.0): Extracting archive
- Installing geoip2/geoip2 (v2.11.0): Extracting archive
2 package suggestions were added by new dependencies, use `composer suggest` to see details.
Generating autoload files
1 package you are using is looking for funding.
Use the `composer fund` command to find out more!
% cat job.php
<?phprequire 'vendor/autoload.php';// https://github.com/maxmind/GeoIP2-phpuse GeoIp2\Database\Reader;$reader = new Reader('GeoLite2-City.mmdb');// https://www.php.net/manual/en/function.fgetcsv.php$row = 1;$header_row = NULL;$header = array();$country_group = array();$lookup = array();$tracking_time_cost_per_run = microtime(true);echo "[".date("Y-m-d H:i:s")."]\tinit\n";if (($handle = fopen("access_log.csv", "r")) !== FALSE) {while (($data = fgetcsv($handle, 10240, ",")) !== FALSE) {if (is_null($header_row)) {$header_row = $data;foreach($header_row as $index => $name) {$header[$name] = $index;}continue;}$remote_ip = $data[ $header['remote_ip'] ];$data_id = $data[ $header['id'] ];try {$record = $reader->city($remote_ip);} catch (Exception $e) {continue;}if (!is_null($record) && property_exists($record, 'country') && !is_null($record->country) && isset($record->country->isoCode)) {if (!isset($country_group[$record->country->isoCode]))$country_group[$record->country->isoCode] = array();$hash_key = $remote_ip.'-'.$record->country->isoCode;if (!isset($lookup[$hash_key])) {$lookup[$hash_key] = 1;array_push($country_group[$record->country->isoCode], $data_id);}}++$row;if ($row % 100000 == 0) {echo "[".date("Y-m-d H:i:s")."]\t".number_format($row).", time cost: ".(microtime(true) - $tracking_time_cost_per_run)."\n";$tracking_time_cost_per_run = microtime(true);file_put_contents('/tmp/lookup-result.json', json_encode($country_group));}}fclose($handle);echo "[".date("Y-m-d H:i:s")."]\t".number_format($row)."\n";echo "Total #: $row\n";print_r($header);file_put_contents('lookup-result.json', json_encode($country_group));}
產出:
% time php job.php[2021-04-09 21:03:51] init[2021-04-09 21:05:18] 100,000, time cost: 86.867056131363[2021-04-09 21:06:41] 200,000, time cost: 83.395349025726
...
% jq '' /tmp/lookup-result.json | head -n10
{
"TW": [
"1",
"68",
"101",
"121",
"147",
"193",
"236",
"259",
...
就把一堆 record id 擺在某個 country_code 下面,每 85秒處理完 10萬筆資料。
沒有留言:
張貼留言