The flapping Ceph Mon service was caused by the failure of one of the system SSDs in a hardware RAID-1. It seems that the hardware RAID controller did not remove the failing SSD from the array quickly enough, which in turn destabilized the Ceph Mon service. Now that the SSD has been removed, all services on this storage node are fully operational again, and we have added the server back to the cluster.
We will replace the faulty SSD soon to fully restore the redundancy of the system RAID.
Please accept our apologies for the inconvenience this incident may have caused.