2015年1月29日 星期四

[SQL] select n rows from each group @ MySQL 5.6

假設有一張 table 名為 log 長這樣:

{
id INTEGER,
level VARCHAR(16),
user VARCHAR(32)
}

mysql> SELECT * FROM log
1, "SA", "admin1"
2, "SA", "admin2"
3, "SA", "admin3"
4, "SA", "admin4"
5, "RD", "programmer1"
6, "RD", "programmer2"
7, "RD", "programmer3"
8, "RD", "programmer4"
9, "RD", "programmer1"
10, "FAE", "programmerA"
11, "FAE", "programmerB"
12, "FAE", "programmerC"


有沒有一招可以撈出,讓每個 Level 只顯示 3 筆資料?假想成果:

mysql> SELECT ... FROM log GROUP BY level
1, "SA", "admin1"
2, "SA", "admin2"
3, "SA", "admin3"
5, "RD", "programmer1"
6, "RD", "programmer2"
7, "RD", "programmer3"
10, "FAE", "programmerA"
11, "FAE", "programmerB"
12, "FAE", "programmerC"


土法煉鋼法,用 UNION ALL 來處理:

mysql> SELECT * FROM (SELECT * FROM log WHERE level = 'SA' LIMIT 3) AS t UNION ALL (SELECT * FROM log WHERE level = 'RD' LIMIT 3) UNION ALL (SELECT * FROM log WHERE level = 'FAE' LIMIT 3);

所幸,問了一下強者我同學,得到個關鍵字:GROUP_CONCAT , http://dev.mysql.com/doc/refman/5.6/en/group-by-functions.html#function_group-concat

mysql> SELECT level, GROUP_CONCAT(user) FROM log GROUP BY level;
"SA", "admin1,admin2,admin3"
"RD", "programmer1, programmer2, programmer3"
"FAE", "programmerA, programmerB, programmerC"


如果想限制撈出的資料個數,要設定 group_concat_max_len:

mysql> SET group_concat_max_len = 2;
mysql> SELECT level, GROUP_CONCAT(user) FROM log GROUP BY level;
"SA", "admin1,admin2"
"RD", "programmer1, programmer2"
"FAE", "programmerA, programmerB"


雖然上述結果還不太適合再做 JOIN 來處理,但,已經算佛心了... XD

其他 Google 用的關鍵字:"select top n rows from each group",會看到一些 RANK() OVER(PARTITION BY level) ,但對 MySQL 應該不適用 XD 強者我同學說,若在 PostgreSQL 可以用:

postgrel> select level, array_aggr(user)[0:1] from table group by level;

看來該多給 PostgrelSQL 機會 XD (當初案子用到 GIS 相關 plugin 才有用它...)

此外,跟強者我同學閒聊時,發現去年的一些經驗還滿適合使用的,有些查詢很久的指令,可以考慮定期產生並儲存在另一張 tabel,降低一般 client 觸發複雜的 SQL Query,像是 JOIN, GROUP 等,這也是在大型服務中也常用到的方式。算是此次閒聊最大的心得,因為去年也有應用這個架構來處理服務,驗證自已的(偷懶)做法無誤 XDDD

1 則留言:

  1. pgsql 可以用 inheritance partition
    https://blog.engineyard.com/2013/scaling-postgresql-performance-table-partitioning

    回覆刪除