DEGRADED: Link tracking issues
Click tracking became unavailable for email recipients served by our Amsterdam data center.
When it happened
May 15th, 12:24 AM EST - May 15th, 6:06 AM EST
The cause of the issue
Our upstream provider for our link tracking servers stopped forwarding packets for the public IP in the Amsterdam region to our link routing nodes. Neither the configuration of our nodes nor the IP forwarding had been modified; the routing simply stopped working. We are working with this provider to understand why this happened, and we are taking additional steps to mitigate the issue in the future.
What we’re doing to mitigate the issue in the future
Changes to DNS failover. For our API and SMTP endpoints, we use AWS Route 53 health checks to automatically fail over those endpoints when a particular data center has issues. The click tracking system currently has a manual failover process. We will be updating it to use Route 53 health monitoring and automatic failover.
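To illustrate the kind of setup described above, here is a minimal sketch of a Route 53 failover record pair, written as a plain Python function that builds the change batch. It is not the actual configuration: the hostname track.example.com, the IP addresses (taken from documentation ranges), and the health check ID are all hypothetical placeholders. The primary record carries a health check; when that check fails, Route 53 answers with the secondary record instead.

```python
def failover_change_batch(name, primary_ip, secondary_ip,
                          primary_health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch containing a PRIMARY/SECONDARY failover pair.

    All argument values used below are hypothetical placeholders.
    """
    def record(role, ip, health_check_id=None):
        rrset = {
            "Name": name,
            "Type": "A",
            # SetIdentifier distinguishes records that share a name and type.
            "SetIdentifier": f"{name}-{role.lower()}",
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,        # a short TTL lets resolvers pick up a failover quickly
            "ResourceRecords": [{"Value": ip}],
        }
        if health_check_id is not None:
            # Route 53 stops returning this record while the check is unhealthy.
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Automatic failover for click tracking (illustrative sketch)",
        "Changes": [
            record("PRIMARY", primary_ip, primary_health_check_id),
            record("SECONDARY", secondary_ip),
        ],
    }


# Hypothetical values for illustration only.
batch = failover_change_batch(
    name="track.example.com.",
    primary_ip="192.0.2.10",
    secondary_ip="198.51.100.10",
    primary_health_check_id="hypothetical-health-check-id",
)
```

A dictionary of this shape could be passed as the ChangeBatch argument to boto3's Route 53 client method change_resource_record_sets, together with the hosted zone ID. The health check itself would need to be created separately and pointed at the primary region's tracking endpoint.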
More regular review of playbooks. We have had virtually zero downtime on the click tracking system since it was released 18 months ago. As a result, not all team members are familiar with the failover process for this system. We have documented procedures ("playbooks") for mitigating various types of service issues, but we need to review them more regularly and run more fire drills to verify that the failover and recovery processes are current and simple.