As recent struggles in production have revealed, our email infrastructure is of byzantine complexity and astounding opacity. Different products (LIP, RateTracker, Rate Notifications) handle email completely differently, and few of them produce clear metrics or logs that allow Operations to detect an email problem or troubleshoot it.
Product reliability and maintainability would improve greatly if we standardized email handling across products in a manner that provides the following:
1. Standardized configuration. This would allow Operations to ensure that all configuration is handled the same way and to know what the relevant configuration is when troubleshooting an issue.
2. Standardized metrics. Publishing metrics in some standardized format would allow Operations to put graphing and alerts around email performance, determine when queues were building up or when email flow had stopped, and appropriately size infrastructure instead of using the current "best guess" methodology.
3. Standardized sending methods. Currently some of the applications use TLS encryption on email, some use plain SMTP, some use Amazon SES, some use a relayer and some do not, etc. This makes email issues difficult to troubleshoot.
4. Standardized logging. This would allow Operations to search for key error messages and quickly identify failing applications, leading to faster resolution.
5. Standardized documentation. Revamping all the applications would allow us to get a list of all the applications that *can* send email. This is information Operations does not currently have.
6. Standardized testing. Currently QA tests none of our email output functionality; we have 0% test coverage on it because of the inherent difficulty of testing so many email solutions without being flagged as a spammer. Standardized functionality and configuration would allow us to implement good test hooks and exercise email output during product regression.
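
As a rough illustration of how several of the points above could fit together, the following Python sketch shows one shared sending wrapper. Everything in it is an assumption made for illustration: the names EmailConfig, EmailSender, and smtp_transport, the metric names, the log markers, and the dry_run flag are hypothetical and do not describe any existing product code. It only uses the standard library (smtplib, logging, email.message).

```python
import logging
import smtplib
from collections import Counter
from dataclasses import dataclass
from email.message import EmailMessage

logger = logging.getLogger("email")


@dataclass(frozen=True)
class EmailConfig:
    """One configuration shape shared by every product (field names are hypothetical)."""
    app_name: str
    smtp_host: str = "localhost"
    smtp_port: int = 587
    use_starttls: bool = True
    dry_run: bool = False  # test hook: record messages instead of sending them


def smtp_transport(config: EmailConfig, message: EmailMessage) -> None:
    """Deliver one message over SMTP, upgrading to TLS when configured."""
    with smtplib.SMTP(config.smtp_host, config.smtp_port, timeout=30) as smtp:
        if config.use_starttls:
            smtp.starttls()
        smtp.send_message(message)


class EmailSender:
    """Single entry point for sending, so config, metrics, and logs look the same everywhere."""

    def __init__(self, config: EmailConfig, transport=smtp_transport):
        self.config = config
        self.transport = transport
        self.metrics = Counter()  # stand-in for a real metrics client
        self.outbox = []          # messages captured while dry_run is enabled

    def send(self, sender: str, recipient: str, subject: str, body: str) -> bool:
        message = EmailMessage()
        message["From"] = sender
        message["To"] = recipient
        message["Subject"] = subject
        message.set_content(body)

        self.metrics["email.attempted"] += 1
        try:
            if self.config.dry_run:
                self.outbox.append(message)
            else:
                self.transport(self.config, message)
        except (smtplib.SMTPException, OSError) as exc:
            self.metrics["email.failed"] += 1
            # A fixed marker gives Operations one string to search and alert on.
            logger.error("EMAIL_SEND_FAILED app=%s to=%s error=%r",
                         self.config.app_name, recipient, exc)
            return False

        self.metrics["email.sent"] += 1
        logger.info("EMAIL_SENT app=%s to=%s", self.config.app_name, recipient)
        return True
```

With dry_run enabled, QA could assert on sender.outbox and sender.metrics during regression without any mail leaving the host. The counters here are an in-memory placeholder; in practice they would be published to whatever metrics system Operations already graphs and alerts on.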