All systems are operational

Past Incidents

Monday, 14th January 2019

Storage Cluster: Minor Software Upgrade, scheduled 2 days ago

On Monday 2019-01-14 from 14:00 CET to 17:00 CET, we will install the latest storage cluster software patches. During this maintenance we expect short periods of slightly degraded performance on the storage cluster (SSD and bulk storage) as well as on our object storage. No further impact is expected.

Date / Time
Monday 2019-01-14, 14:00 CET to 17:00 CET

Expected Impact
We expect short periods of slightly degraded SSD storage, bulk storage and object storage performance during this maintenance.

We apologize for any inconvenience this may cause and thank you for your understanding.

Friday, 11th January 2019

No incidents reported

Thursday, 10th January 2019

No incidents reported

Wednesday, 9th January 2019

No incidents reported

Tuesday, 8th January 2019

Network Infrastructure: Incident Report Regarding Network Issues

Detailed Incident Report

On 2019-01-07 at 15:35 CET, we were notified by our network monitoring system that we were facing partial packet loss to various destinations on the Internet. After a short investigation, we discovered that the BGP sessions with all of our upstream providers had been reset at the same time and were re-established after a few seconds.

Between 15:33 and 15:50 CET, the BGP sessions were reset and re-established repeatedly, which caused further partial packet loss for Internet-facing connections. After that, the situation was stable again.

However, because of the numerous BGP resets within such a short period of time, our prefixes were subjected to BGP route dampening. As a result, our services may have remained unreachable via certain Internet providers for a longer period of time.
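To illustrate why dampening can outlast the resets themselves, here is a minimal sketch of the penalty model commonly used for route flap damping (described in RFC 2439). The per-flap penalty of 1000, the half-life of 15 minutes and the reuse threshold of 750 are illustrative assumptions, not values taken from any of the providers involved, and the function names are hypothetical.

```python
import math
from typing import Iterable


def decay(penalty: float, elapsed_min: float, half_life_min: float) -> float:
    """Exponentially decay a penalty: it halves once per half-life."""
    return penalty * 0.5 ** (elapsed_min / half_life_min)


def penalty_after_flaps(
    flap_times_min: Iterable[float],
    per_flap: float = 1000.0,   # assumed penalty added for each flap
    half_life: float = 15.0,    # assumed half-life in minutes
) -> float:
    """Return the accumulated penalty right after the last flap."""
    penalty = 0.0
    last = None
    for t in sorted(flap_times_min):
        if last is not None:
            # Decay what has accumulated so far before adding the new flap.
            penalty = decay(penalty, t - last, half_life)
        penalty += per_flap
        last = t
    return penalty


def minutes_until_reuse(
    penalty: float,
    reuse: float = 750.0,       # assumed reuse threshold
    half_life: float = 15.0,    # assumed half-life in minutes
) -> float:
    """Minutes until the penalty decays to the reuse threshold."""
    if penalty <= reuse:
        return 0.0
    return half_life * math.log2(penalty / reuse)
```

With these assumed defaults, a penalty of 3000 needs two half-lives, that is 30 minutes, to decay to the reuse threshold of 750. A prefix can therefore stay suppressed well after the last reset.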

At 16:25 CET, we received confirmation from various customers and our network monitoring system that our services were reachable from everywhere again.

Root Cause Analysis

Shortly after 16:00 CET, we were already in touch with the vendor to discuss potential workarounds. It quickly became clear that we were far from the only ones affected and that these BGP session resets were caused by the DISCO experiment, which triggered a bug in our routing stack.

The bug was triggered by the use of a BGP attribute reserved for development in the virtual network control (VNC) code of our routing stack. VNC was using "255" as a development value for features that were never standardized. The intent was to disable this usage for non-development use, but this did not happen.

Since "255" was a known attribute, the software tried to parse the attribute generated by the experiment and failed because it was in an unknown format. This failure in turn triggered the common error handling for attribute parsing, for which RFC 4271 mandates a session reset.
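The mechanism described above can be sketched as follows. This is a simplified illustration, not the code of our routing stack. The attribute header layout (flags, type code, one- or two-byte length) follows RFC 4271, while the "DEV0" payload format, the handler table and all function names are hypothetical.

```python
import struct
from typing import Callable, Dict, List, Tuple

FLAG_OPTIONAL = 0x80         # attribute flag bits as defined in RFC 4271
FLAG_EXTENDED_LENGTH = 0x10


class SessionReset(Exception):
    """Signals that the BGP session has to be torn down."""


def encode_attribute(flags: int, type_code: int, value: bytes) -> bytes:
    """Encode one path attribute (helper for building test input)."""
    fmt = "!BBH" if flags & FLAG_EXTENDED_LENGTH else "!BBB"
    return struct.pack(fmt, flags, type_code, len(value)) + value


def split_attributes(data: bytes) -> List[Tuple[int, int, bytes]]:
    """Split raw path attribute bytes into (flags, type, value) tuples."""
    attrs = []
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise SessionReset("truncated attribute header")
        flags, type_code = data[offset], data[offset + 1]
        if flags & FLAG_EXTENDED_LENGTH:
            if offset + 4 > len(data):
                raise SessionReset("truncated attribute header")
            (length,) = struct.unpack_from("!H", data, offset + 2)
            offset += 4
        else:
            length = data[offset + 2]
            offset += 3
        if offset + length > len(data):
            raise SessionReset("attribute value overruns message")
        attrs.append((flags, type_code, data[offset:offset + length]))
        offset += length
    return attrs


def parse_dev_attribute(value: bytes) -> bytes:
    """Hypothetical development-only format: tag b'DEV0' plus payload."""
    if not value.startswith(b"DEV0"):
        raise ValueError("unknown format")
    return value[4:]


def process_update(
    data: bytes, handlers: Dict[int, Callable[[bytes], object]]
) -> Tuple[Dict[int, object], List[int]]:
    """Parse known attributes; a parse failure resets the session."""
    parsed: Dict[int, object] = {}
    skipped: List[int] = []
    for flags, type_code, value in split_attributes(data):
        handler = handlers.get(type_code)
        if handler is None:
            if flags & FLAG_OPTIONAL:
                # Unrecognized optional attribute: not interpreted locally.
                skipped.append(type_code)
                continue
            raise SessionReset(f"unrecognized well-known attribute {type_code}")
        try:
            parsed[type_code] = handler(value)
        except ValueError as exc:
            # A recognized attribute that fails to parse is treated as a
            # malformed UPDATE, which leads to a session reset.
            raise SessionReset(f"malformed attribute {type_code}") from exc
    return parsed, skipped
```

In this sketch, an optional attribute with type 255 is skipped harmlessly as long as no handler is registered for it. Once a handler for 255 exists, as with the leftover development code, the same bytes raise SessionReset.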

Steps Taken

Between 18:00 and 19:00 CET, in an emergency maintenance window, we installed a patch containing a workaround for the VNC issue mentioned above.

The vendor is now working on a final solution to prevent this from happening again. We are considering sponsoring the implementation of RFC 7606 to contribute to BGP stability in our routing stack in the future.
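For context, RFC 7606 revises BGP error handling so that many malformed attributes lead to "treat-as-withdraw" instead of a session reset. The following toy model contrasts the two behaviors. The Session class, the function name and the prefixes (taken from documentation address ranges) are purely illustrative assumptions and do not reflect the vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Session:
    """Toy model of one BGP session and the routes learned over it."""
    established: bool = True
    routes: Dict[str, str] = field(default_factory=dict)  # prefix -> next hop


def on_malformed_update(
    session: Session, prefixes: Iterable[str], treat_as_withdraw: bool
) -> None:
    """Apply one of the two error-handling strategies to a bad UPDATE."""
    if treat_as_withdraw:
        # RFC 7606 style: only the prefixes carried in the malformed
        # UPDATE are withdrawn; the session and other routes survive.
        for prefix in prefixes:
            session.routes.pop(prefix, None)
    else:
        # Session reset: the session goes down and every route learned
        # over it is lost until it is re-established.
        session.established = False
        session.routes.clear()
```

Under treat-as-withdraw, a single malformed announcement only affects the prefixes it carries, which is why this approach limits the impact of events like this one.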

Please accept our apologies for the inconvenience this incident may have caused you and your customers. We continue to do our best to prevent such situations from happening.

Monday, 7th January 2019

Network Infrastructure: Network Issues (Emergency Maintenance Window)

After successful tests in our lab, we will now roll out a patch containing a workaround for the issues seen earlier today. During this maintenance work you may experience short periods of packet loss (up to 1-2 minutes) or higher RTTs for connections from and towards the Internet. Connections between virtual servers at cloudscale.ch will not be affected by this maintenance work.

We apologize for any inconvenience this may cause and thank you for your understanding.

Network Infrastructure: Network Issues

The situation is stable again. The incident was caused by the DISCO experiment. We will follow up with a detailed incident report later.

Network Infrastructure: Network Issues

We are currently experiencing network issues with our upstream providers. We are investigating.

Sunday, 6th January 2019

No incidents reported

Saturday, 5th January 2019

No incidents reported