vmetrics vmalert setup

daemon setup

vi /root/alerts.yaml

groups:
- name: alarms
  rules:

  - alert: cpu usage hits the roof

    # testing: 20%, prod: 95%
    expr: avg_over_time(log_metric_gauge_cpu_p[1m]) > 20

    # testing: 5s, prod: 5m
    for: 5s

    labels:
      # https://betterstack.com/community/guides/incident-management/severity-levels/
      severity: sev5

    annotations:
      dashboard: https://vmetrics.nethence.com/vmui/#/?g0.expr=avg_over_time%28log_metric_gauge_cpu_p%5B1m%5D%29
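
the rule file can be checked before the daemon is started. a minimal sketch, assuming the binary is named vmalert-prod and the rules live in /root/alerts.yaml, as in the start command further down; check_rules is a hypothetical helper name. the -dryRun flag only validates the rule files and exits without running vmalert.

```shell
# hypothetical helper: validate a rule file without starting the daemon
check_rules() {
    rule_file="${1:-/root/alerts.yaml}"
    if [ ! -r "$rule_file" ]; then
        echo "cannot read $rule_file" >&2
        return 2
    fi
    # -dryRun parses and validates the rules, then exits
    vmalert-prod -dryRun -rule="$rule_file"
}
```

usage: check_rules /root/alerts.yaml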

ready to go

check that Alertmanager is up and running

ping alertmanager
nmap -p 9093 alertmanager
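
ping and nmap only show that the host and the port answer. Alertmanager also exposes /-/healthy and /-/ready over HTTP. a minimal sketch, assuming curl is installed and the alertmanager:9093 address used above; am_ready is a hypothetical helper name.

```shell
# hypothetical helper: succeed only if Alertmanager answers its readiness endpoint
am_ready() {
    addr="${1:-alertmanager:9093}"
    # -f returns non-zero on HTTP errors, --max-time avoids hanging on a dead host
    curl -fsS --max-time 5 "http://$addr/-/ready" >/dev/null
}
```

usage: am_ready && echo alertmanager is ready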

vi /etc/rc.local

echo starting vmalert
# vmalert logs to stderr by default, hence 2>&1
nohup vmalert-prod -rule=/root/alerts.yaml -datasource.url=http://127.0.0.1:8428 \
    -notifier.showURL \
    -notifier.suppressDuplicateTargetErrors \
    -notifier.url=http://alertmanager:9093 \
    > /var/log/vmalert.log 2>&1 &

# to evaluate rules without sending notifications, replace -notifier.url with -notifier.blackhole

usage

tail -F /var/log/vmalert.log
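
besides the log, vmalert serves its state over HTTP. a minimal sketch, assuming the default listen address :8880 (the start command sets no -httpListenAddr) and that curl is installed; vmalert_get is a hypothetical helper name.

```shell
# hypothetical helper: fetch a vmalert HTTP API path, api/v1/alerts by default
vmalert_get() {
    path="${1:-api/v1/alerts}"
    base="${2:-http://127.0.0.1:8880}"
    # -f returns non-zero on HTTP errors, so the helper can be used in scripts
    curl -fsS --max-time 5 "$base/$path"
}
```

vmalert_get api/v1/rules should list the loaded groups and rules, and vmalert_get api/v1/alerts the pending and firing alerts.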

reload

pgrep -a vmalert
kill -HUP $(pgrep vmalert)
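
the rules can also be re-read over HTTP instead of with a signal. a minimal sketch, assuming the default :8880 listen address, that no -reloadAuthKey is set, and that curl is installed; vmalert_reload is a hypothetical helper name.

```shell
# hypothetical helper: ask vmalert to re-read its rule files over HTTP
vmalert_reload() {
    base="${1:-http://127.0.0.1:8880}"
    # non-zero exit if vmalert is unreachable or refuses the request
    curl -fsS --max-time 5 "$base/-/reload"
}
```

usage: vmalert_reload && tail /var/log/vmalert.log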

resources

https://docs.victoriametrics.com/vmalert.html

https://docs.victoriametrics.com/guides/guide-vmanomaly-vmalert.html