A controlled reboot of all switches has stabilized the situation. We have raised a high-priority case with our vendor, and their management is already aware of it.
The issue appears to be triggered by high CPU load on the switches themselves, which destabilizes control plane traffic. As a temporary mitigation, we have relaxed several control plane timers. However, this reduced CPU utilization only minimally.
We will continue investigating this issue together with the vendor and keep you up to date on this status page.
Please accept our sincere apologies and rest assured that we treat this case with the utmost priority.