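Any data that carries geolocation is enough to start playing with CartoDB :P It is even better if each record also has a timestamp. The easiest test data to get hold of is an Apache web server log: pick a few fields out of access.log, and get the geolocation by looking up each client IP.
Extract the IP list from access.log: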
$ grep -v "^localhost\|::1" /var/log/apache2/access.log | awk '{print $1}' | sort -u
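The same filtering can also be sketched in Python, which is handy if the list is going to be fed to a lookup script anyway. This is only an illustration: `unique_client_ips` is a hypothetical helper name, the sample log lines are made up, and unlike the grep pattern it skips `localhost` and `::1` only when they are the first field of the line.

```python
def unique_client_ips(lines, skip=("localhost", "::1")):
    """Return client addresses in first-seen order, without duplicates."""
    seen = set()
    ips = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # blank line
        ip = parts[0]  # the client address is the first field
        if ip in skip or ip in seen:
            continue
        seen.add(ip)
        ips.append(ip)
    return ips


# Made-up sample lines in Apache's access log layout
sample = [
    '8.8.8.8 - - [07/Sep/2014:22:12:43 +0800] "GET / HTTP/1.1" 200 512',
    '::1 - - [07/Sep/2014:22:12:50 +0800] "OPTIONS * HTTP/1.0" 200 126',
    '8.8.8.8 - - [07/Sep/2014:22:13:01 +0800] "GET /a HTTP/1.1" 200 99',
    '1.1.1.1 - - [07/Sep/2014:22:13:53 +0800] "GET / HTTP/1.1" 200 512',
]
print(unique_client_ips(sample))  # ['8.8.8.8', '1.1.1.1']
```

For a real log, pass an open file object instead of the sample list, e.g. `unique_client_ips(open('/tmp/access.log'))`.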
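Next, download the latest GeoLite2-City database from MaxMind (MaxMind has since started requiring a free account and license key for GeoLite2 downloads, so the direct URL below may no longer work):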
$ wget http://geolite.maxmind.com/download/geoip/database/GeoLite2-City.mmdb.gz
$ gunzip GeoLite2-City.mmdb.gz
Install the geoip-bin tool and try a lookup:
$ sudo apt-get install geoip-bin
$ geoiplookup 8.8.8.8
GeoIP Country Edition: US, United States
$ geoiplookup -f GeoLite2-City.mmdb 8.8.8.8
Error Traversing Database for ipnum = 134744072 - Perhaps database is corrupt?
Segmentation fault (core dumped)
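Ugh... geoiplookup only understands the legacy GeoIP .dat format and chokes on the newer .mmdb file, so I installed the newer MaxMind Python SDK and wrote a small script instead: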
$ sudo pip install geoip2
$ vim t.py
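import sys
import geoip2.database

# Open the GeoLite2 City database downloaded above
reader = geoip2.database.Reader('GeoLite2-City.mmdb')
try:
    # Look up the IP address given as the first command-line argument
    response = reader.city(sys.argv[1])
    print("{},{}".format(response.location.latitude, response.location.longitude))
except Exception:
    # Stay silent for malformed addresses or ones missing from the database
    pass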
$ python t.py 8.8.8.8
37.386,-122.0838
Next, I might as well process access.log in Python too XDD; doing it all on the command line felt too long-winded.
$ sudo cp /var/log/apache2/access.log /tmp/access.log
$ sudo chmod 644 /tmp/access.log
$ vim log.py
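import geoip2.database

reader = geoip2.database.Reader('GeoLite2-City.mmdb')
try:
    # Read the copied log one line at a time
    with open('/tmp/access.log', errors='replace') as log:
        for rec in log:
            fields = rec.split(' ')
            try:
                # Skip requests that came from the local machine
                if fields[0] in ('localhost', '::1'):
                    continue
                response = reader.city(fields[0])
                # fields[3] looks like "[07/Sep/2014:22:12:43"; drop the leading "["
                print("{},{},{}".format(fields[3][1:], response.location.latitude, response.location.longitude))
            except Exception:
                # Ignore malformed lines and IPs missing from the database
                pass
except Exception:
    # Silently give up if the log cannot be read
    pass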
$ python log.py > log.csv
$ cat log.csv
...
07/Sep/2014:22:12:43,35.685,139.7514
07/Sep/2014:22:13:53,39.4899,-74.4773
...
Don't worry about the time format; just upload the file to cartodb.com and let it do the parsing for you!
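If the importer ever fails to recognize the timestamps, one fallback is to normalize them to ISO 8601 before uploading. A minimal sketch, assuming the timestamps look exactly like the ones in log.csv; `apache_to_iso` is a hypothetical helper name, and the timezone offset (the next field in the Apache log) is not carried over because log.csv does not contain it.

```python
from datetime import datetime


def apache_to_iso(stamp):
    """Convert '07/Sep/2014:22:12:43' to '2014-09-07T22:12:43'."""
    # %b matches abbreviated month names in the current locale;
    # this assumes an English/C locale so that "Sep" is recognized.
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S").isoformat()


print(apache_to_iso("07/Sep/2014:22:12:43"))  # 2014-09-07T22:12:43
```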
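After the import, every column defaults to string. Set field1 to the date type and field2 and field3 to the number type, and rename them while you are at it. Then click the geo column and choose to generate it from field2 (latitude) and field3 (longitude). That completes the CartoDB table.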
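Some of this cleanup can be done before uploading. The sketch below prepends a header row and drops rows whose coordinates are not numeric; log.py prints `None` when the database has no coordinates for an IP. The column names, the `add_header` helper, and the second sample row are assumptions for illustration, and whether the importer uses the header row for column names is also an assumption, so the type settings above may still need to be applied by hand.

```python
import csv
import io


def add_header(rows_text, header=("time", "latitude", "longitude")):
    """Return CSV text with a header row, keeping only rows with numeric lat/lon."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in csv.reader(io.StringIO(rows_text)):
        if len(row) != 3:
            continue  # not a time,lat,lon record
        try:
            lat, lon = float(row[1]), float(row[2])
        except ValueError:
            continue  # e.g. "None" when no coordinates were found
        writer.writerow([row[0], lat, lon])
    return out.getvalue()


# First row is from log.csv; the second is a made-up row without coordinates
sample = "07/Sep/2014:22:12:43,35.685,139.7514\n07/Sep/2014:22:13:53,None,None\n"
print(add_header(sample), end="")
```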
Finally, go to the visualization side and pick a time-based view driven by the date column, and you get a nice-looking visualization.
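Before building the visualization, it can help to check how the records are spread over time. A rough sketch that counts rows per hour, assuming rows in the log.csv layout shown earlier; `requests_per_hour` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(rows):
    """Count log.csv rows per hour, keyed by 'YYYY-MM-DD HH:00'."""
    counts = Counter()
    for row in rows:
        stamp = row.split(",")[0]  # first column is the Apache timestamp
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S")
        counts[when.strftime("%Y-%m-%d %H:00")] += 1
    return dict(counts)


rows = [
    "07/Sep/2014:22:12:43,35.685,139.7514",
    "07/Sep/2014:22:13:53,39.4899,-74.4773",
]
print(requests_per_hour(rows))  # {'2014-09-07 22:00': 2}
```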