In this Checklist for Success series, we discuss how to reduce unknowns when hosting in the cloud, specifically when using and migrating to Amazon Aurora. These tips might also apply to other database-as-a-service (DBaaS) offerings.
While DBaaS encapsulates a lot of the moving pieces, it also means you depend on the service for your long-term stability. This encapsulation is a double-edged sword: it takes away your visibility into performance outside of the service layer.
Shine a Light on Bad Queries
Bad queries are one of the top causes of downtime, and Aurora doesn't protect you against them. Performing a query review as part of a routine health check of your workload helps ensure that you do not miss looming issues. It also helps you predict the workload at specific times and during specific events. For example, if you already know that your top three queries tend to grow exponentially in volume and are read-bound, you can easily decide to increase the number of read replicas in your cluster.
Having historical query performance data makes this task easier and less stressful. While historical data allows you to look backward, it's also very valuable to have a tool that lets you look at incidents while they are in progress. Knowing which queries are running while you are suffering from performance issues reduces guesswork and helps solve problems faster.
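As a rough illustration of what a routine query review can look like, here is a minimal sketch that ranks statement digests by total execution time. It assumes the Performance Schema is enabled and that you have already fetched the rows as dictionaries with whatever MySQL client you use. The function name `summarize_digests` and the `top_n` parameter are hypothetical and chosen only for this example.

```python
# Sketch only: rank statement digests for a routine query review.
# Assumption: rows come from running TOP_DIGESTS_SQL through your own
# MySQL client and are dicts keyed by column name.

TOP_DIGESTS_SQL = """
SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT,
       SUM_ROWS_EXAMINED, SUM_ROWS_SENT
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20
"""

# Performance Schema timer columns are reported in picoseconds.
PICOSECONDS_PER_SECOND = 1_000_000_000_000


def summarize_digests(rows, top_n=3):
    """Return the top_n digests by total time (hypothetical helper)."""
    summary = []
    for row in rows:
        calls = row["COUNT_STAR"]
        total_seconds = row["SUM_TIMER_WAIT"] / PICOSECONDS_PER_SECOND
        summary.append({
            "query": row["DIGEST_TEXT"],
            "calls": calls,
            "total_seconds": total_seconds,
            "avg_seconds": total_seconds / calls if calls else 0.0,
            # A high ratio hints at queries that read far more rows than they return.
            "rows_examined_per_sent":
                row["SUM_ROWS_EXAMINED"] / max(row["SUM_ROWS_SENT"], 1),
        })
    summary.sort(key=lambda item: item["total_seconds"], reverse=True)
    return summary[:top_n]
```

Running this on a schedule and storing the output gives you a simple baseline to compare against when deciding whether read-bound queries justify more read replicas.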
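For the "incident in progress" case, a small sketch like the following can help narrow down what is running right now. It assumes you can query `information_schema.PROCESSLIST` on the affected instance and that rows arrive as dictionaries. The helper name `long_running` and the 30-second threshold are assumptions made for illustration, not recommendations.

```python
# Sketch only: pick out long-running statements during an incident.
# Assumption: rows come from running ACTIVE_QUERIES_SQL through your own
# MySQL client and are dicts keyed by column name.

ACTIVE_QUERIES_SQL = """
SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND <> 'Sleep'
ORDER BY TIME DESC
"""


def long_running(rows, min_seconds=30):
    """Return active statements running for at least min_seconds, longest first."""
    hits = [
        row for row in rows
        if row["COMMAND"] != "Sleep"      # ignore idle connections
        and row["INFO"]                   # keep only threads with a statement
        and row["TIME"] >= min_seconds
    ]
    return sorted(hits, key=lambda row: row["TIME"], reverse=True)
```

The filter is repeated in Python on purpose, so the helper still behaves sensibly if the rows come from a broader query such as `SHOW FULL PROCESSLIST`.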
Pick Your Tool(s)
There are a number of ways you can achieve query performance excellence. Performance Insights is a built-in offering from AWS that is tightly integrated with RDS. It has a seven-day free retention period, with an extra cost beyond that. It is available for each instance in a cluster. Performance Insights takes most of its metrics from the Performance Schema. It also includes specific metrics from the operating system that may not be available from regular CloudWatch metrics.
Query Analytics from Percona Monitoring and Management (PMM) uses the same source as Performance Insights: the Performance Schema. Unlike Performance Insights, though, PMM is deployed separately from the cluster. This means you can keep your metrics even if you keep recycling your cluster instances. With PMM, you can also consolidate your query reviews in a single location and monitor your cluster instances from that same location, including an extensive list of performance metrics.
You can enable Performance Insights and configure it for the default seven-day retention period, and then combine it with PMM for a longer retention period across all your cluster instances. Note, though, that PMM may incur a cost for the additional API calls it makes to retrieve Performance Insights metrics.
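If you prefer to script this rather than click through the console, the following is a minimal sketch using boto3's RDS `modify_db_instance` call with a seven-day retention period. The instance identifiers and region are placeholders you would supply, and the helper names `performance_insights_request` and `enable_for_cluster` are hypothetical. It assumes boto3 is installed and AWS credentials are already configured.

```python
# Sketch only: enable Performance Insights on every instance in a cluster.
# Assumptions: boto3 is installed, credentials are configured, and the
# instance identifiers passed in are your own.


def performance_insights_request(instance_id, retention_days=7):
    """Build the keyword arguments for rds.modify_db_instance."""
    return {
        "DBInstanceIdentifier": instance_id,
        "EnablePerformanceInsights": True,
        "PerformanceInsightsRetentionPeriod": retention_days,
        "ApplyImmediately": True,
    }


def enable_for_cluster(instance_ids, region_name):
    """Apply the setting instance by instance, since it is a per-instance option."""
    import boto3  # imported here so the request builder works without boto3

    rds = boto3.client("rds", region_name=region_name)
    for instance_id in instance_ids:
        rds.modify_db_instance(**performance_insights_request(instance_id))
```

Keeping the request builder separate from the API call makes it easy to review exactly what will be sent before anything is changed on a live instance.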
Outside of the built-in and open source alternatives, VividCortex, New Relic, and Datadog are excellent tools that do everything we discussed above and more. New Relic, for example, gives you a good view of database, application, and external request timings. This, in my opinion, is very valuable.
Bad queries are not the only potential unknowns. Deleted rows, dropped tables, crippling schema changes, and even AZ/Region failures are realities in the cloud. We will discuss them next! Stay “tuned” for part two.
Meanwhile, we’d like to hear your success stories with Amazon Aurora in the comments below!