ELK Tutorial - Periodically Purging Elasticsearch Data

2017-07-27

ELK


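Once you start collecting logs with ELK, sooner or later Elasticsearch will fill up the disk.
It is a good idea to purge old logs on a schedule, and this post shows how to periodically remove stale data from Elasticsearch.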

There are two ways to delete old Elasticsearch indices:

Use the official Curator
Write your own scheduled job

Before I found out about Curator, I had already written the purge logic as a shell script...
So as not to let my own work go to waste, I am going to share the do-it-yourself scheduled approach. XD

Shell Script

1. Get Index Name

Because our company's indices are split by week, every index name ends with a -year.week suffix.
Use curl to list the Elasticsearch index names:

curl "http://localhost:9200/_cat/indices?v&h=i"

Result:

i
.kibana
ddd-rrr-dev-2017.20
ddd-rrr-dev-2017.21
filebeat-sss-2017.22
filebeat-sss-2017.27
logstash-bbb-2017.28
logstash-bbb-2017.29
logstash-ccc-2017.27
logstash-ccc-2017.25
index

Our custom indices all end in -2017.xx, where xx is the week number within 2017.

2. Filter Index

Use grep to keep only the indices whose names end with the -year.week suffix:

curl "http://localhost:9200/_cat/indices?v&h=i" | grep -P "\-\d{4}\.\d{2}$"

Finally, exclude the most recent N weeks (weeks 27 to 29 in this example). Whatever remains is the list of index names to delete:

curl "http://localhost:9200/_cat/indices?v&h=i" | grep -P "\-\d{4}\.\d{2}$" | grep -Pv "(\-2017\.27|\-2017\.28|\-2017\.29)\b"

Result:

ddd-rrr-dev-2017.20
ddd-rrr-dev-2017.21
filebeat-sss-2017.22
logstash-ccc-2017.25
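
The two-stage filter can be tried without a running cluster. The following is only a sketch: `filter_expired` is a hypothetical helper name, it uses `grep -E` instead of `grep -P` for portability, and the sample list is copied from the query result earlier in this post.

```shell
#!/bin/bash
# filter_expired (hypothetical helper): reads index names on stdin and prints
# those that end in -YYYY.WW and are NOT in the keep list passed as arguments.
filter_expired() {
  local keep_regex
  # Escape the dots and join the arguments with "|": 2017.27 -> 2017\.27
  keep_regex=$(printf '%s\n' "$@" | sed 's/\./\\./g' | paste -sd'|' -)
  grep -E -- '-[0-9]{4}\.[0-9]{2}$' | grep -Ev -- "-(${keep_regex})\$"
}

# Sample input copied from the _cat/indices output shown above.
sample_indices='i
.kibana
ddd-rrr-dev-2017.20
ddd-rrr-dev-2017.21
filebeat-sss-2017.22
filebeat-sss-2017.27
logstash-bbb-2017.28
logstash-bbb-2017.29
logstash-ccc-2017.27
logstash-ccc-2017.25
index'

# Keep weeks 27 to 29 and print everything older.
printf '%s\n' "$sample_indices" | filter_expired 2017.27 2017.28 2017.29
```

Against a real cluster, the `printf` line would be replaced by the `curl` query, with the rest of the pipeline unchanged.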

3. Delete Index

Delete each index from the result with the Elasticsearch DELETE API:

curl -XDELETE "localhost:9200/ddd-rrr-dev-2017.20"
curl -XDELETE "localhost:9200/ddd-rrr-dev-2017.21"
curl -XDELETE "localhost:9200/filebeat-sss-2017.22"
curl -XDELETE "localhost:9200/logstash-ccc-2017.25"
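
Rather than typing one command per index, the names can be fed through a loop. This is a minimal sketch, not the final script: `delete_each`, `ES_HOST`, and `DRY_RUN` are names made up for this illustration. It defaults to a dry run that only prints the commands, because a wrong filter would delete live data.

```shell
#!/bin/bash
# Hypothetical settings for this sketch; override them from the environment.
ES_HOST="${ES_HOST:-localhost:9200}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really delete

# delete_each (hypothetical helper): reads one index name per line on stdin.
delete_each() {
  local name
  while IFS= read -r name; do
    [ -z "$name" ] && continue          # skip blank lines
    if [ "$DRY_RUN" = "1" ]; then
      echo "curl -XDELETE $ES_HOST/$name"
    else
      curl -s -XDELETE "$ES_HOST/$name"
    fi
  done
}

# Dry run on two of the expired indices from the result above.
printf '%s\n' ddd-rrr-dev-2017.20 ddd-rrr-dev-2017.21 | delete_each
```

Setting `DRY_RUN=0` switches the loop to sending the real DELETE requests.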

4. Save Script

Put the logic into a shell script file. I keep it next to the Elasticsearch configuration files so it is easy to find.

vi /etc/elasticsearch/purge.sh

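#!/bin/bash
# bash is required: the script uses [[ ]], (( )), and ${var::-1}.

ES_URL_AND_PORT=localhost:9200

main() {
  # Keep the latest 2 weeks of uat-* indices and 3 weeks of prod-* indices.
  delete_indices "uat-*" 2
  delete_indices "prod-*" 3

  # Clear the read-only block that Elasticsearch sets on indices once disk
  # usage passes the flood-stage watermark.
  curl -XPUT "$ES_URL_AND_PORT/_settings" -H "Content-Type: application/json" -d '{ "index.blocks.read_only_allow_delete": "false" }'
  # Drop the monitoring indices as well.
  curl -XDELETE "$ES_URL_AND_PORT/.monitoring-*"
  # curl "$ES_URL_AND_PORT/_cat/indices"
}

delete_indices() {
  local pattern=$1
  local keep_week=$2

  local i=0
  local keeps=""
  # %G and %V are the ISO 8601 year and week number, which belong together.
  # 10# forces base 10 so that weeks 08 and 09 are not read as invalid octal.
  local year=$(date +%G)
  local temp_week=$((10#$(date +%V)))

  # Build a regex alternation of the suffixes to keep, e.g. \-2017\.29|\-2017\.28|
  while [ "$i" -lt "$keep_week" ]; do
    keeps="$keeps\-$year\.$(printf %02d "$temp_week")|"
    if [[ $temp_week -le 1 ]]; then
      # Crossing into the previous year. Week 53 may not exist, so this pass
      # is not counted and one extra week ends up being kept.
      ((year--))
      temp_week=53
    else
      ((temp_week--))
      ((i++))
    fi
  done

  if [[ $i != 0 ]]; then
    # ${keeps::-1} strips the trailing "|" from the alternation.
    EXPIRED_INDICES=$(curl -s "http://$ES_URL_AND_PORT/_cat/indices/$pattern?v&h=i" | grep -P "\-\d{4}\.\d{2}$" | grep -Pv "(${keeps::-1})\b")
    for name in $EXPIRED_INDICES
    do
      # echo "curl -XDELETE $ES_URL_AND_PORT/$name"
      curl -XDELETE "$ES_URL_AND_PORT/$name"
    done
  fi
}

main "$@"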

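My example keeps 2 weeks for uat-* and 3 weeks for prod-*; adjust the numbers to your own needs. Personally, I think logs older than three weeks no longer have any value.
If you use your logs for analysis, though, that is a different story!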
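
The trickiest part of the script is working out which year.week suffixes to keep when the window crosses New Year. As an alternative sketch, and not what the script above does, GNU `date` can do the calendar arithmetic itself. This assumes GNU coreutils (`date -d` is not available on BSD or macOS), and `keep_weeks` is a hypothetical helper name.

```shell
#!/bin/bash
# keep_weeks (hypothetical helper): print the ISO year.week label of the
# current week and the weeks before it, one per line.
#   $1 = number of weeks to keep
#   $2 = optional base date as YYYY-MM-DD (defaults to today)
keep_weeks() {
  local count=$1
  local base=${2:-$(date +%F)}
  local n
  for ((n = 0; n < count; n++)); do
    # GNU date resolves the relative offset, including year boundaries.
    date -d "$base -$n week" +%G.%V
  done
}

# Example across a year boundary: prints 2021.01, 2020.53, 2020.52
keep_weeks 3 2021-01-06
```

The output lines could be joined into the grep exclusion pattern in place of the hand-built `keeps` string. Unlike the loop in the script, this keeps exactly the requested number of weeks, since there is no need to guess whether week 53 exists.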

Scheduled Execution

Add the shell script to crontab.

vi /etc/crontab

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed
0 5 * * 1 root /bin/bash /etc/elasticsearch/purge.sh