Friday, 29th January 2021

Core Network Infrastructure Incident Report Regarding Connectivity Issues on 2021-01-27 due to DDoS Attack

Management Summary

In the afternoon of Wednesday, 2021-01-27, cloudscale.ch was the target of a DDoS attack which was announced by a blackmail message just minutes before the attack started. Thanks to our documented and tested procedures, we were able to avoid impact on our customers' services for much of the attack's duration. However, in addition to a short period of link saturation caused directly by the attack traffic, some of our mitigation measures had unwanted side effects, limiting connectivity for a subset of virtual servers in certain situations. As an immediate action, we have further extended our monitoring, and we will thoroughly review our DDoS mitigation strategies.

Please accept our apologies for the inconvenience this incident may have caused you and your customers.

Detailed Incident Report

13:20 - 19:15 CET: Overall incident duration

Situation

cloudscale.ch was targeted by a volume-based DDoS attack using a number of attack techniques. Inbound attack traffic started at 13:20 CET, just minutes after we received a blackmail message at 13:07 CET. Over time, the attack details changed, involving different IPv4 addresses in multiple subnets, and growing in traffic volume.

Thanks to our monitoring, including alert thresholds for utilization of individual links, we were able to immediately execute our documented mitigation procedure, responding to the attack characteristics we were observing.
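Per-link utilization alerting of this kind can be sketched in a few lines; the link names, counter values, and the 80% threshold below are purely illustrative and are not taken from our actual monitoring setup.

```python
# Illustrative sketch of per-link utilization alerting.
# Link names, sample values, and threshold are hypothetical.

def utilization(octets_delta: int, interval_s: float, capacity_bps: int) -> float:
    """Link utilization as a fraction, derived from an interface octet counter delta."""
    return (octets_delta * 8) / (interval_s * capacity_bps)

def check_links(samples: dict[str, int], interval_s: float,
                capacity_bps: int, threshold: float = 0.8) -> list[str]:
    """Return the links whose average utilization exceeds the alert threshold."""
    return [link for link, delta in samples.items()
            if utilization(delta, interval_s, capacity_bps) > threshold]

# Example: 10 Gbit/s links sampled every 60 seconds.
alerts = check_links({"uplink-a": 72_000_000_000,   # ~9.6 Gbit/s average
                      "uplink-b": 15_000_000_000},  # ~2 Gbit/s average
                     interval_s=60, capacity_bps=10_000_000_000)
# → alerts == ["uplink-a"]
```

In practice the counter deltas would come from interface statistics (e.g. via SNMP), but the threshold logic stays the same.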

At about 18:30 CET, the attack traffic faded. However, as a precautionary measure, we decided to keep and adapt our mitigation measures for a little longer.

Impact

For most of the total incident duration, thanks to our redundant, amply sized uplinks and quick mitigation efforts, there was no relevant impact on our customers' services. During specific periods, however, the attack and our countermeasures did affect some customers' external connections and/or servers (see below for details).

17:13 - 17:28 CET: Saturation of certain links

Situation

At 17:13 CET, we noticed a change in the attack pattern: it began targeting a larger number of different IP addresses in our network and grew further in traffic volume. As a consequence, certain links became fully utilized, causing congestion and connectivity failures for connections using one of those paths.

Measures taken

Building on the measures already in place, we adapted and reinforced our attack mitigation in order to move traffic away from the saturated links and to filter attack traffic more effectively based on the new target addresses as we observed them in the attack.

Impact

Saturation of certain links caused connection failures or degraded performance for connections between systems within our cloud infrastructure and external systems. These issues potentially affected all customers, but only for connections routed through one of the saturated links. Affected connections included traffic of virtual servers, DNS lookups using our resolvers, requests to our object storage from external sources as well as access to our website, Cloud Control Panel, and API.

Not affected

Network traffic within our cloud infrastructure (both using public IP addresses and connections through private networks) was not affected by the attack.

17:39 - 17:51 CET and 18:21 - 19:15 CET: Side effects from mitigation measures

Situation

The attack mitigation measures taken so far proved to be effective in keeping attack traffic under control. However, traffic distribution across links was far from ideal: some links saw significant utilization, which posed a risk to stable operation should (attack) traffic increase further, while legitimate traffic was not using available capacity on other links.

Measures taken

We tried to re-engineer traffic distribution across our uplinks through multiple changes to BGP announcements and attributes, aiming to move traffic to underutilized links while keeping in place the traffic filtering that mitigated the ongoing DDoS attack. Unfortunately, some of these changes had unwanted side effects on some of the network segments involved in the attack.
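One common inbound traffic-engineering technique is to split an aggregate into more-specific prefixes and announce them selectively per uplink. The sketch below illustrates the idea only; the prefix (taken from the RFC 5737 documentation range), the prefix lengths, and the uplink names are hypothetical, and this is not a description of the exact changes we made.

```python
import ipaddress

def more_specifics(aggregate: str, new_prefixlen: int) -> list[str]:
    """Split an aggregate prefix into more-specific subnets that can be
    announced selectively per uplink to influence inbound traffic."""
    net = ipaddress.ip_network(aggregate)
    return [str(sub) for sub in net.subnets(new_prefix=new_prefixlen)]

# Hypothetical example: spread the /26s of a /24 across two uplinks.
prefixes = more_specifics("192.0.2.0/24", 26)
uplinks = ["uplink-a", "uplink-b"]
plan = {prefix: uplinks[i % len(uplinks)] for i, prefix in enumerate(prefixes)}
# → announces 192.0.2.0/26 and 192.0.2.128/26 via uplink-a,
#   192.0.2.64/26 and 192.0.2.192/26 via uplink-b
```

Because more-specific routes win over the aggregate, remote networks deliver traffic for each /26 via the uplink it is announced on; the trade-off, as this incident showed, is that such changes interact with DDoS filtering and must be rolled out carefully.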

Impact

As a consequence of the traffic redistribution efforts, external connections from and to virtual servers in the RMA region using IPv4 addresses in either 5.102.145.0/24, 5.102.146.0/24, or 5.102.147.0/24 were not possible.
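Whether a given server address fell within the affected ranges can be checked with the Python standard library alone; this snippet simply encodes the three /24s listed above and is offered as a convenience, not as part of our tooling.

```python
import ipaddress

# The three impacted IPv4 ranges in the RMA region, as listed in this report.
AFFECTED = [ipaddress.ip_network(p) for p in
            ("5.102.145.0/24", "5.102.146.0/24", "5.102.147.0/24")]

def was_affected(address: str) -> bool:
    """True if the address lies in one of the impacted RMA IPv4 ranges.
    IPv6 addresses always return False, as IPv6 was not affected."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and any(ip in net for net in AFFECTED)

# was_affected("5.102.146.17")  → True
# was_affected("5.102.144.1")   → False
```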

Not affected

Given the specific scope of the impact mentioned above, the following use cases and variations were not affected by this issue:

  • Network traffic within our cloud infrastructure (both using public IP addresses and connections through private networks)
  • External connections from and to virtual servers in the RMA region using Floating IPs; in combination with internal traffic being unaffected (see above), this means that HA/load-balancing and similar setups remained fully available as long as external connections were established through a Floating IP
  • IPv6 connections
  • Virtual servers in the LPG region

Learnings and follow-up actions

In retrospect, our documented and tested DDoS mitigation procedures helped us to react quickly and effectively in order to keep customer impact to a minimum for most of the attack's duration. However, it became clear that depending on the attack structure and general situation, there were also limits to the approach we had chosen.

We will re-evaluate the possible mitigation strategies to better balance effective filtering of actual attack traffic on the one hand and making the best use of all available traffic paths for legitimate traffic on the other. We will also update our procedures to explicitly cover a broader set of potential, subtly different scenarios, so that we achieve the best possible mitigation effect without having to make ad-hoc decisions on the spot.

As an immediate action, we have already extended our monitoring to detect unwanted side effects such as those inadvertently introduced by our attack mitigation measures.
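A minimal form of such side-effect detection is to probe reachability of one's own service endpoints from an external vantage point after every mitigation change. The sketch below shows the core check; the host, port, and timeout are illustrative, and a real setup would probe addresses in each announced prefix from outside the network.

```python
import socket

def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout.
    Illustrative reachability probe; not our actual monitoring code."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running such probes against representative addresses in every announced prefix, and alerting when a previously reachable endpoint stops responding right after a routing or filtering change, would have surfaced the side effects described above more quickly.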