What was affected
On September 4, 2017, Postmark lost some of your activity data. Activity events from approximately 5am to 11pm Eastern were affected and are unrecoverable. Affected activity events include:
- Outbound data such as sent, opened, delivered, bounced, and clicked events
- Inbound data such as processed and blocked events
What was not affected
This did not affect email sending in any way. Every email we received was sent to its recipients. If you had webhooks set up for opens or bounces, the corresponding events were still delivered to your webhooks.
Inbound processing was delayed at times during the day, but all inbound events were sent to your webhooks. Only the post-processing record that would have been shown in your activity stream is missing.
The daily statistics we provide were not affected.
A problem recovering data from our Elasticsearch cluster caused this data loss.
At approximately 5am Eastern, our Elasticsearch cluster became unavailable. We determined that this was because the master nodes were unavailable, and we restarted the eligible master nodes in the cluster. This made the cluster available again and able to accept reads and writes. However, after the master node re-election, the cluster, although operational, was in an extremely fragile state: it no longer had our usual level of redundancy.
Our team spent the day trying several approaches to recovery, but we weren't able to resolve the issue. We then decided to perform a full cluster restart of our data nodes. This brought the cluster back to a fully operational state, but, as mentioned earlier, the cluster had not been redundant. Some nodes held the only copy of certain data, so when we forced the restart, that data was lost.
We've already started investigating alternative data stores so that we are less reliant on Elasticsearch. When we have more to share, we'll let you know on the blog.
We know that you depend on Postmark as an infrastructure product, and we take any kind of data loss very seriously. Even though none of your emails were lost, we know you rely on activity data to troubleshoot issues and to give feedback to your customers.
If you have any questions about the specifics of this incident, please reach out to us at email@example.com.