We have experienced an instability in Kubernetes CoreDNS that caused our internal name resolution to fail, and as a consequence, our alerting pipeline was impacted, since alerts rely on webhooks that could not be delivered. As an improvement action, we will deploy CoreDNS as a DaemonSet to ensure resilience and consistent availability across all nodes. We will also add an internal DNS probe within the cluster to monitor DNS-related failures and provide early detection even when CoreDNS is degraded.
Posted Dec 08, 2025 - 12:27 GMT-03:00
Update
Communication between our services and the workload manager has been restored, and the environment is stable. We continue to monitor the situation and will update this page soon with more information about the incident.
Posted Dec 08, 2025 - 12:05 GMT-03:00
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Dec 08, 2025 - 11:59 GMT-03:00
Update
We have identified a communication failure between some of our servers and our workload manager, which affected the application distribution process.
Posted Dec 08, 2025 - 11:38 GMT-03:00
Identified
The issue has been identified and a fix is being implemented.
Posted Dec 08, 2025 - 11:22 GMT-03:00
Investigating
We are currently investigating this issue.
Posted Dec 08, 2025 - 11:07 GMT-03:00
This incident affected: PORTAL, OPERATIONS, API, INTEGRATION, WEBHOOKS, NOTIFICATION, and REPORT.