Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability.

The OSiRIS Ceph deployment spans WSU, MSU, and UM. We have currently deployed approximately 900 OSDs. Each OSD is an 8 TB or 10 TB disk, for a total of about 8 PB of raw storage.

Puppet

All of our components are deployed and managed with a Puppet module forked from a module started by the OpenStack group. The module code is available on GitHub: https://github.com/MI-OSiRIS/puppet-ceph

Ceph Metrics

To gather Ceph metrics we use collectd with a plugin that reads from the daemon admin sockets. Collectd feeds into InfluxDB, which can ingest collectd UDP data directly. We also gather system stats such as CPU, I/O time, memory, and thread counts. For an overview of this toolchain, please have a look at our monitoring and logging overview.
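As a rough illustration of what reading from an admin socket involves, the sketch below queries a daemon's performance counters and flattens them into dotted metric names. This is not the collectd plugin itself. The helper names (`read_perf_dump`, `flatten`) and the sample values are hypothetical, and the socket path in the usage note is only the common default location, which may differ on your hosts.

```python
import json
import subprocess


def read_perf_dump(socket_path):
    """Run `ceph --admin-daemon <socket> perf dump` and return the parsed JSON.

    Requires the ceph CLI and read access to the socket on the local host.
    """
    result = subprocess.run(
        ["ceph", "--admin-daemon", socket_path, "perf", "dump"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def flatten(counters, prefix=""):
    """Flatten nested counters into a dict such as {"osd.op_latency.sum": 0.6}."""
    flat = {}
    for key, value in counters.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sections and into avgcount/sum style counters.
            flat.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            flat[name] = value
    return flat


# Hand-written sample (assumed shape) so the sketch runs without a live daemon.
sample = {"osd": {"op": 120, "op_latency": {"avgcount": 120, "sum": 0.6}}}
flat_sample = flatten(sample)
```

On a storage host, something like `flatten(read_perf_dump("/var/run/ceph/ceph-osd.0.asok"))` would produce the same kind of flat mapping from a live OSD, assuming that socket path exists.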

We can then visualize this data with Grafana. For example, here are two simple dashboards showing OSD operation latency and operations per second.
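The two quantities plotted here can be derived from successive samples of cumulative counters. The following is a minimal sketch of that arithmetic, not a description of how our dashboards are actually configured. It assumes each sample has an `osd` section with an `op` counter and an `op_latency` counter made of `avgcount` and `sum` (in seconds); the function name `osd_rates` and the sample numbers are made up for illustration, and counter names should be checked against your Ceph version.

```python
def osd_rates(prev, curr, interval_s):
    """Return (ops_per_second, mean_latency_seconds) between two samples.

    prev and curr are perf-dump style dicts taken interval_s seconds apart.
    """
    prev_osd, curr_osd = prev["osd"], curr["osd"]

    # Operations completed during the interval.
    d_ops = curr_osd["op"] - prev_osd["op"]

    # Latency counters accumulate a running sum and a sample count,
    # so the mean over the interval is the ratio of the two deltas.
    d_count = curr_osd["op_latency"]["avgcount"] - prev_osd["op_latency"]["avgcount"]
    d_sum = curr_osd["op_latency"]["sum"] - prev_osd["op_latency"]["sum"]

    ops_per_second = d_ops / interval_s
    mean_latency = d_sum / d_count if d_count > 0 else 0.0
    return ops_per_second, mean_latency


# Made-up samples taken 10 seconds apart.
earlier = {"osd": {"op": 1000, "op_latency": {"avgcount": 1000, "sum": 5.0}}}
later = {"osd": {"op": 1600, "op_latency": {"avgcount": 1600, "sum": 8.0}}}
ops_per_second, mean_latency = osd_rates(earlier, later, 10)
```

Working from deltas rather than raw values matters because the counters only ever grow while the daemon runs; a time-series database can apply the same derivative-style calculation at query time.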

[Figure: OSD operation latency]

[Figure: OSD operations per second]

Cluster Dashboard

We can also combine plots into dashboards that give us an overview of our cluster.

[Figure: Ceph dashboard]