I was recently part of a migration from AWS (Amazon Aurora, MySQL-compatible) to Azure (Azure Database for MariaDB 10.3) for a large suite of applications. This platform contains a number of Drupal 8 sites, which surface content through JSONAPI (now part of core in Drupal 8).

The issue

Drupal is extremely flexible, and creates a highly normalised table structure: 2 tables per field on a piece of content (1 for revisions, and another for active data). A given piece of content can contain dozens of fields. When loading a piece of content, it is not uncommon in Drupal to have 20+ joins on a single SQL query. End developers do not write these SQL queries by hand; Drupal abstracts this detail away through the entity API.

We were seeing SQL queries that appeared to never complete when they had lots of JOINs in them. One query I found that was 'stuck' had 53 joins. In saying that, the database's overall data size was tiny, with only 228 pieces of content in Drupal (this is very low; some Drupal sites can have millions of items of content).

Running a SHOW FULL PROCESSLIST showed these queries were all stuck in the Statistics phase. The issue seemed to disproportionately impact SQL queries with more than 40 joins in a single query. These queries did not appear to ever complete, and the CPU was pegged at 100%. Upsizing the vCPU count in the database cluster had no impact, as the 'stuck' queries simply consumed all the CPU available. We even had trouble trying to connect to the database cluster via the MySQL client, due to timeouts.

Research

After seeing queries stuck in the Statistics phase, we did some digging to see what other content had to say on this topic:

- One source seemed to indicate that Statistics phase hanging could be due to a lack of IOPS.
- Another gave the first clue about optimizer_search_depth and how this can blow out query times.
- Another indicated that tuning optimizer_search_depth to a value lower than 62 would be a good idea.
- Another contains really useful information on tuning optimizer_search_depth in general.
- Another has the background on why 62 was chosen as the default for optimizer_search_depth.

After reading the above literature, we ended up settling on the 'automatic' tuning for optimizer_search_depth. This means that queries with more than 7 JOINs may not run the best query path, but at least they will actually complete. I also manually killed every running query that was stuck in the Statistics phase. After the change to optimizer_search_depth, the CPU never went up to the same levels again.

The 53 JOIN query that used to fail to complete now completes in 268ms. Not fast, but a damn sight faster than several days. I still don't have a good explanation as to why the 53 join query had no issues executing on AWS Aurora MySQL; optimizer_search_depth is set to the default of 62 on there. I assume (like most of Aurora) that there is some special AWS sauce helping here. If anyone can shed more light on this, please let me know in the comments.
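The troubleshooting above can be sketched in SQL. A minimal example — the thread id `123` is a placeholder, and note that `KILL QUERY` terminates only the statement while a plain `KILL` closes the whole connection:

```sql
-- Find queries stuck during join-order optimisation: the State column
-- shows 'statistics' for the affected threads.
SHOW FULL PROCESSLIST;

-- Terminate a stuck statement using the Id column from the output above.
-- (123 is a placeholder thread id.)
KILL QUERY 123;
```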
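The variable change described above can be applied as below. This is a sketch, not a prescription: a value of 0 tells MariaDB/MySQL to choose the join-order search depth automatically, and on a managed service such as Azure Database for MariaDB the value may need to be set through the server parameters configuration rather than `SET GLOBAL`.

```sql
-- The default is 62: a near-exhaustive join-order search, which blows out
-- combinatorially on queries with dozens of joins.
SELECT @@optimizer_search_depth;

-- 0 = let the optimizer choose the search depth automatically.
SET GLOBAL optimizer_search_depth = 0;

-- Only new connections pick up the global value; existing sessions keep
-- whatever they started with.
```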