We use several tools to gain insight into performance at each level of our infrastructure.

Metrics are collected by collectd on the host systems and fed to InfluxDB instances. We then visualize this data with Grafana. A variety of collectd plugins gather data about Ceph, system performance, network throughput, switch interfaces (via the SNMP plugin), and more.
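
To give a sense of what one collected sample looks like once it reaches InfluxDB, here is a minimal Python sketch that renders a single point in InfluxDB's line protocol text format. It is illustrative only: the transport used between collectd and InfluxDB in this deployment is not described here, and the measurement, tag, and field names (ceph_osd_io, host, osd, write_bytes, latency_ms) are hypothetical.

```python
def _escape_key(value: str) -> str:
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(value: str) -> str:
    """Escape commas and spaces in the measurement name."""
    return value.replace(",", "\\,").replace(" ", "\\ ")


def _format_field(value) -> str:
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one line-protocol record: measurement,tags fields timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical sample: one OSD IO reading from one host.
point = to_line_protocol(
    "ceph_osd_io",
    {"host": "osd01", "osd": "12"},
    {"write_bytes": 4096, "latency_ms": 1.5},
    1500000000000000000,
)
print(point)
```

Tags (host, osd) are the dimensions a Grafana panel would typically group or filter by, while fields carry the measured values plotted over time.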

Collectd-Influx-Grafana Stack

Cluster Dashboard

Detail of OSD/Journal IO

OSD IO Detail

ELK Stack

Log collection and aggregation use the “ELK” stack, with Filebeat shipping logs to Elasticsearch:

Log collection and processing in Logstash
Log storage in an Elasticsearch Cluster
Visualization in Kibana, and also in Grafana for data processed as time series
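
The processing step in the list above amounts to turning a raw log line into a structured, timestamped document that can be stored and queried. The Python sketch below illustrates that idea only; it is not the Logstash configuration used here, and the log line layout, field names, and the parse_failure tag are assumptions made for the example.

```python
import json
import re
from datetime import datetime, timezone

# Assumed layout: "<ISO 8601 timestamp> <host> <program>[<pid>]: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) "
    r"(?P<program>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)$"
)


def parse_log_line(line: str) -> dict:
    """Turn one raw log line into a structured document.

    Lines that do not match the assumed layout are kept whole and tagged,
    so nothing is silently dropped.
    """
    raw = line.rstrip("\n")
    match = LOG_PATTERN.match(raw)
    if match is None:
        return {"message": raw, "tags": ["parse_failure"]}
    try:
        parsed = datetime.fromisoformat(match.group("timestamp"))
    except ValueError:
        return {"message": raw, "tags": ["parse_failure"]}
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    doc = {
        "@timestamp": parsed.astimezone(timezone.utc).isoformat(),
        "host": match.group("host"),
        "program": match.group("program"),
        "message": match.group("message"),
    }
    if match.group("pid") is not None:
        doc["pid"] = int(match.group("pid"))
    return doc


# Hypothetical log line for illustration.
doc = parse_log_line(
    "2017-06-01T08:00:00-04:00 osd01 ceph-osd[1234]: slow request detected"
)
print(json.dumps(doc, indent=2))
```

Normalizing every timestamp to UTC in one field is what lets the same events be charted as time series alongside the metrics.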

ELK Stack in OSiRIS