2021年4月22日 星期四

[PHP] 下載 Maxmind GeoIP Legacy Databases 和 ngx_http_geoip_module 相關處理 @ macOS 11.2, PHP 7.3.24

由於 Maxmind 在推 GeoIP2 ,不開放 .dat 的 GeoIP DB 了,以前下載位置:

  • http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz
  • http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz
現在官方都說要改用 GeoLite2 DB ,其格式:
  • GeoIP2 Binary (.mmdb)
  • GeoIP2 CSV 
目前先偷懶不處理 nginx ,因為 nginx 採用 GeoIP Legacy Databases。

有找到一個網站 https://www.miyuru.lk/geoiplegacy,從中下載:
連續動作:

% php -v
WARNING: PHP is not recommended
PHP is included in macOS for compatibility with legacy software.
Future versions of macOS will not include PHP.
PHP 7.3.24-(to be removed in future macOS) (cli) (built: Dec 21 2020 21:33:25) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.24, Copyright (c) 1998-2018 Zend Technologies

% php composer.phar require geoip/geoip:~1.16

% wget https://dl.miyuru.lk/geoip/maxmind/city/maxmind4.dat.gz
% gunzip -d maxmind4.dat.gz

% cat test-ip-via-geoip.php
<?php
require 'vendor/autoload.php';
$gi = geoip_open("maxmind4.dat",GEOIP_STANDARD);
$country = geoip_country_code_by_addr($gi, $argv[1]);
echo "lookup [".$argv[1]."], result: [$country]\n";

% php test-ip-via-geoip.php 8.8.8.8
lookup [8.8.8.8], result: [US]

% nslookup tw.yahoo.com
Server: 8.8.8.8
Address: 8.8.8.8#53

Non-authoritative answer:
tw.yahoo.com canonical name = atsv2-fp-shed.wg1.b.yahoo.com.
Name: atsv2-fp-shed.wg1.b.yahoo.com
Address: 180.222.102.201
Name: atsv2-fp-shed.wg1.b.yahoo.com
Address: 180.222.102.202

% php test-ip-via-geoip.php 180.222.102.202
lookup [180.222.102.202], result: [IN]

% curl ipinfo.io/180.222.102.202
{
  "ip": "180.222.102.202",
  "hostname": "media-router-fp74.prod.media.vip.tp2.yahoo.com",
  "city": "Taoyuan City",
  "region": "Taiwan",
  "country": "TW",
  "loc": "24.9937,121.2970",
  "org": "AS24506 YAHOO! TAIWAN",
  "timezone": "Asia/Taipei",
  "readme": "https://ipinfo.io/missingauth"
}

% nslookup facebook.com
Server: 8.8.8.8
Address: 8.8.8.8#53

Non-authoritative answer:
Name: facebook.com
Address: 31.13.87.36

% curl ipinfo.io/31.13.87.36
{
  "ip": "31.13.87.36",
  "hostname": "edge-star-mini-shv-01-tpe1.facebook.com",
  "city": "Hong Kong",
  "region": "Central and Western",
  "country": "HK",
  "loc": "22.2783,114.1747",
  "org": "AS32934 Facebook, Inc.",
  "timezone": "Asia/Hong_Kong",
  "readme": "https://ipinfo.io/missingauth"
}

% nslookup www.gov.tw
Server: 8.8.8.8
Address: 8.8.8.8#53

Non-authoritative answer:
Name: www.gov.tw
Address: 223.200.155.55

% php test-ip-via-geoip.php 223.200.155.55
lookup [223.200.155.55], result: [TW]

% curl ipinfo.io/223.200.155.55
{
  "ip": "223.200.155.55",
  "hostname": "223-200-155-55.hinet-ip.hinet.net",
  "city": "Hualien City",
  "region": "Taiwan",
  "country": "TW",
  "loc": "23.9769,121.6044",
  "org": "AS4782 Data Communication Business Group",
  "timezone": "Asia/Taipei",
  "readme": "https://ipinfo.io/missingauth"
}

2021年4月9日 星期五

[PHP] 透過 Maxmind GeoIP DB 統計用戶資訊 @ macOS 11.2, PHP 7.3.24

真是超級久沒用 maxmind.com 的 GeoIP DB 了,一時之間還以為沒有免費使用的方式,追了一下是要註冊帳號才能下載,而 maxmind 也有提供自動化每天更新 GeoIP DB 的機制。

在此對一份類似 Access Logs 做分析,將其轉成 CSV 格式來分析,其中 CSV 裡頭有 id 跟 remote_ip 兩個欄位,將 remote_ip 分析完後直接歸類屬於哪個 country_code,程式碼很簡單:

% php -v

WARNING: PHP is not recommended
PHP is included in macOS for compatibility with legacy software.
Future versions of macOS will not include PHP.
PHP 7.3.24-(to be removed in future macOS) (cli) (built: Dec 21 2020 21:33:25) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.24, Copyright (c) 1998-2018 Zend Technologies

% cat composer.json
{
  "require": {
    "geoip2/geoip2": "~2.0"
  }
}

% php composer.phar install
No lock file found. Updating dependencies instead of installing from lock file. Use composer update over composer install if you do not have a lock file.
Loading composer repositories with package information
Updating dependencies
Lock file operations: 4 installs, 0 updates, 0 removals
- Locking composer/ca-bundle (1.2.9)
- Locking geoip2/geoip2 (v2.11.0)
- Locking maxmind-db/reader (v1.10.0)
- Locking maxmind/web-service-common (v0.8.1)
Writing lock file
Installing dependencies from lock file (including require-dev)
Package operations: 4 installs, 0 updates, 0 removals
- Installing composer/ca-bundle (1.2.9): Extracting archive
- Installing maxmind/web-service-common (v0.8.1): Extracting archive
- Installing maxmind-db/reader (v1.10.0): Extracting archive
- Installing geoip2/geoip2 (v2.11.0): Extracting archive
2 package suggestions were added by new dependencies, use `composer suggest` to see details.

Generating autoload files
1 package you are using is looking for funding.
Use the `composer fund` command to find out more!

% cat job.php
<?php

require 'vendor/autoload.php';

// https://github.com/maxmind/GeoIP2-php
use GeoIp2\Database\Reader;
$reader = new Reader('GeoLite2-City.mmdb');

// https://www.php.net/manual/en/function.fgetcsv.php
$row = 1;
$header_row = NULL;
$header = array();
$country_group = array(); 
$lookup = array();

$tracking_time_cost_per_run = microtime(true);
echo "[".date("Y-m-d H:i:s")."]\tinit\n";
if (($handle = fopen("access_log.csv", "r")) !== FALSE) {
while (($data = fgetcsv($handle, 10240, ",")) !== FALSE) {
if (is_null($header_row)) {
$header_row = $data;
foreach($header_row as $index => $name) {
$header[$name] = $index;
}
continue;
}
$remote_ip = $data[ $header['remote_ip'] ];
$data_id = $data[ $header['id'] ];

try {
$record = $reader->city($remote_ip);
} catch (Exception $e) {
continue;
}
if (!is_null($record) && property_exists($record, 'country') && !is_null($record->country) && isset($record->country->isoCode)) {
if (!isset($country_group[$record->country->isoCode]))
$country_group[$record->country->isoCode] = array();
$hash_key = $remote_ip.'-'.$record->country->isoCode;
if (!isset($lookup[$hash_key])) {
$lookup[$hash_key] = 1;
array_push($country_group[$record->country->isoCode], $data_id);
}
}
++$row;
if ($row % 100000 == 0) {
echo "[".date("Y-m-d H:i:s")."]\t".number_format($row).", time cost: ".(microtime(true) - $tracking_time_cost_per_run)."\n";
$tracking_time_cost_per_run = microtime(true);
file_put_contents('/tmp/lookup-result.json', json_encode($country_group));
}
}
fclose($handle);

echo "[".date("Y-m-d H:i:s")."]\t".number_format($row)."\n";
echo "Total #: $row\n";
print_r($header);
file_put_contents('lookup-result.json', json_encode($country_group));
}

產出:
% time php job.php
[2021-04-09 21:03:51] init
[2021-04-09 21:05:18] 100,000, time cost: 86.867056131363
[2021-04-09 21:06:41] 200,000, time cost: 83.395349025726 
...

% jq '' /tmp/lookup-result.json | head -n10

{
  "TW": [
    "1",
    "68",
    "101",
    "121",
    "147",
    "193",
    "236",
    "259", 
... 

就把一堆 record id 擺在某個 country_code 下面,每 85秒處理完 10萬筆資料。