Decoding Magento 2.3.4: Unraveling the Mystery of Increased Database Load and the Promising Fixes

Upgrading to Magento 2.3.4 has turned out to be a double-edged sword for some stores. After the upgrade, database CPU load rises noticeably, and the increase is attributed mainly to a slow, frequently repeated query tied to the popular search term cache. This article looks at the root of the issue, its impact, and the proposed fixes and workarounds.

Understanding the Performance Anomaly: The Culprit Query

A performance anomaly plaguing the Magento 2.3.4 eCommerce platform can be traced back to a single, seemingly innocuous query: SELECT DISTINCT COUNT(*) FROM search_query WHERE (store_id = 1) AND (num_results > 0). The query is part of a mechanism to cache popular search terms, an optimization tool designed to enhance user experience by reducing the retrieval time for frequently searched items. However, instead of speeding things up, the query is causing an unexpected spike in database CPU load.

The load increase shows up with the upgrade to Magento 2.3.4 and has also been reproduced on Magento 2.4.x, persisting as recently as 2.4.5-p4. Performance problems can have many causes, but this query is the common denominator across the reported incidents, which is why it is the focus of this analysis.
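To make the query concrete, here is a minimal sketch that runs it against an in-memory SQLite table. SQLite stands in for Magento's MySQL database, the table models only the columns the query touches, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table; a simplified,
# assumed schema that keeps only the columns the query needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE search_query (
        query_id    INTEGER PRIMARY KEY,
        query_text  TEXT NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)
conn.executemany(
    "INSERT INTO search_query (query_text, store_id, num_results) VALUES (?, ?, ?)",
    [
        ("red shoes", 1, 12),  # counted: store 1, has results
        ("blue hat", 1, 3),    # counted: store 1, has results
        ("asdfgh", 1, 0),      # skipped: returned no results
        ("red shoes", 2, 7),   # skipped: belongs to another store
    ],
)

# The query as quoted in the article.
CULPRIT_QUERY = (
    "SELECT DISTINCT COUNT(*) FROM search_query "
    "WHERE (store_id = 1) AND (num_results > 0)"
)
popular_term_count = conn.execute(CULPRIT_QUERY).fetchone()[0]
print(popular_term_count)
```

With four rows the answer is immediate. The reports discussed here concern tables holding millions of search terms, where the same count is far more expensive.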

The Factor of Volume: Impact of Larger Datasets on Query Duration

The performance issue caused by the culprit query grows with the size of the dataset. Imagine the search_query table as a giant library. A librarian (the database) is asked to count the books (rows) that meet a condition, here the search terms that returned at least one result (num_results > 0). With only a few shelves, this is manageable. With millions of books and no catalogue (index) to consult, the librarian has to walk every shelf each time the question is asked, so the time spent grows in step with the collection while other work waits.

Users have reported slow query performance on live sites with millions of search terms, indicating a direct correlation between the volume of data and the query duration. As the eCommerce store grows and accumulates more search data, the strain on the database escalates, slowing down operations and negatively impacting user experience, and ultimately, sales.

Unraveling the Code: Dissecting the num_results > 0 Condition

The num_results > 0 condition is a critical part of the offending query. In plain English, it tells the database to count only the search terms that returned at least one result. However, including this condition significantly lengthens the query duration.

The reason lies in the way the database processes the query. Without the condition, the database only has to count rows, which it can usually do from an index alone. With the condition, it must check the num_results value of each candidate row before counting it, and if no index covers that column, this means reading the rows themselves. This additional step, although seemingly trivial, becomes a substantial burden on large datasets.
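The row-by-row inspection described above can be made visible by asking a database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN on a simplified, assumed version of the table. MySQL's planner and its EXPLAIN output differ, so treat this as an illustration of the idea rather than a reproduction of Magento's behaviour. The query_plan helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema: no index includes num_results.
conn.execute(
    """
    CREATE TABLE search_query (
        query_id    INTEGER PRIMARY KEY,
        query_text  TEXT NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)


def query_plan(connection, sql):
    """Return the textual steps of SQLite's plan for `sql` as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


filtered_plan = query_plan(
    conn,
    "SELECT COUNT(*) FROM search_query "
    "WHERE (store_id = 1) AND (num_results > 0)",
)
print(filtered_plan)
```

A plan step that begins with SCAN means the engine walks the whole table and tests the condition on every row, which is the cost the paragraph above describes.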

The num_results > 0 condition thus becomes a major contributor to the performance anomaly. The irony is that a mechanism intended as an optimization turns into a bottleneck: a simple relevance check becomes a processor-intensive operation.

The Ongoing Struggle: Tracing the Persistence of the Issue Across Magento Updates

Despite the efforts of developers and Magento contributors, the issue has proved stubborn across Magento versions from 2.3.4 to 2.4.5-p4. As several users have pointed out, the SELECT DISTINCT COUNT(*) FROM search_query query continues to cause slow queries in both the admin panel and the live site, with significant performance degradation as a result.

Interestingly, the issue was marked as stale due to a lack of recent activity, only to be confirmed as still relevant later, which shows how persistently it affects stores with larger datasets. Despite the fix proposed in c90edaa, the performance problem remains. The redundancy of the DISTINCT operator and the ineffectiveness of the fix in c90edaa underscore the need for a more robust solution.

Seeking the Holy Grail: Evaluating the Effectiveness of Removing the DISTINCT Operator

The DISTINCT operator has been the subject of much debate in the Magento community and has been identified as a significant source of performance degradation. In the query SELECT DISTINCT COUNT(*) FROM search_query, the operator has been deemed redundant because of the existing unique constraints on query_text and store_id. Removing DISTINCT from queries within Magento\Search\Model\ResourceModel\Query\Collection has therefore been widely recommended.
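The redundancy argument can be checked on a toy table. This sketch again uses SQLite with a simplified, assumed schema and invented rows. It shows that the unique constraint rejects a duplicate (query_text, store_id) pair, and that the count is identical with and without DISTINCT. Note also that COUNT(*) without GROUP BY returns a single row, so in this example DISTINCT has nothing to deduplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE search_query (
        query_id    INTEGER PRIMARY KEY,
        query_text  TEXT NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)
INSERT_SQL = (
    "INSERT INTO search_query (query_text, store_id, num_results) "
    "VALUES (?, ?, ?)"
)
conn.executemany(
    INSERT_SQL,
    [("red shoes", 1, 12), ("blue hat", 1, 3), ("asdfgh", 1, 0)],
)

# The unique constraint already prevents duplicate terms per store.
try:
    conn.execute(INSERT_SQL, ("red shoes", 1, 99))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

WHERE = "WHERE (store_id = 1) AND (num_results > 0)"
with_distinct = conn.execute(
    f"SELECT DISTINCT COUNT(*) FROM search_query {WHERE}"
).fetchall()
without_distinct = conn.execute(
    f"SELECT COUNT(*) FROM search_query {WHERE}"
).fetchall()

print(duplicate_rejected, with_distinct, without_distinct)
```

Whether dropping DISTINCT changes the execution cost depends on the database engine and version, so any speed-up should be measured on the actual MySQL instance.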

The elimination of the DISTINCT operator reportedly reduces the query duration significantly, providing a light at the end of the tunnel for the struggling Magento users. However, it's crucial to note that this solution may not be applicable to all projects, particularly those with high search term cardinality.

In Pursuit of a Solution: Navigating Workarounds and Potential Fixes

Given the persistent nature of the performance issue, the Magento community has proposed several alternative solutions. These include asynchronous insertion of search terms in batches, only inserting a fraction of the search terms, or even stopping tracking search terms entirely.
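As a rough illustration of the batching idea, the sketch below collects search hits in memory and writes them in one transaction per batch. Everything here is an assumption made for illustration: the SearchTermBuffer class, its method names, the batch size, and the simplified table. In a real store the flush would more likely run from a queue consumer or a scheduled job than inline, and the code would be PHP against MySQL.

```python
import sqlite3
from collections import Counter


class SearchTermBuffer:
    """Hypothetical helper: buffer search hits and write them in batches."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.hits = Counter()   # (query_text, store_id) -> hits since last flush
        self.num_results = {}   # (query_text, store_id) -> latest result count
        self.pending = 0

    def record(self, query_text, store_id, num_results):
        key = (query_text, store_id)
        self.hits[key] += 1
        self.num_results[key] = num_results
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.hits:
            return
        keys = list(self.hits)
        with self.conn:  # one transaction per batch
            # Create rows for terms seen for the first time.
            self.conn.executemany(
                "INSERT OR IGNORE INTO search_query "
                "(query_text, store_id, num_results, popularity) "
                "VALUES (?, ?, 0, 0)",
                keys,
            )
            # Apply the aggregated hit counts and latest result counts.
            self.conn.executemany(
                "UPDATE search_query "
                "SET popularity = popularity + ?, num_results = ? "
                "WHERE query_text = ? AND store_id = ?",
                [(self.hits[k], self.num_results[k], k[0], k[1]) for k in keys],
            )
        self.hits.clear()
        self.num_results.clear()
        self.pending = 0


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE search_query (
        query_id    INTEGER PRIMARY KEY,
        query_text  TEXT NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        popularity  INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)
buffer = SearchTermBuffer(conn, batch_size=3)
for term in ["red shoes", "red shoes", "blue hat", "red shoes"]:
    buffer.record(term, 1, 5)
buffer.flush()  # write whatever is left in the buffer

rows = conn.execute(
    "SELECT query_text, popularity FROM search_query ORDER BY query_text"
).fetchall()
print(rows)
```

Aggregating repeated terms before writing means four searches result in two small batches rather than four separate writes. The trade-off is that popularity figures lag behind real traffic until the next flush.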

Another possible workaround includes overriding the execute function in Magento_CatalogSearch/Controller/Result/Index. Notably, the removal of the getCacheableResult part of the function has been reported to improve the query performance. Further, adding an index on the search_query table has been suggested as a potential solution.
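The index suggestion can be sketched the same way. The reports do not say which columns the index should cover, so the composite index on (store_id, num_results) below is an assumption based on the columns in the WHERE clause, and the index name is hypothetical. SQLite is used as a stand-in, so the plan text will not match MySQL's EXPLAIN output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE search_query (
        query_id    INTEGER PRIMARY KEY,
        query_text  TEXT NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)

QUERY = (
    "SELECT DISTINCT COUNT(*) FROM search_query "
    "WHERE (store_id = 1) AND (num_results > 0)"
)


def query_plan(connection, sql):
    """Return the textual steps of SQLite's plan for `sql` as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY)

# Assumed composite index on the two filtered columns (hypothetical name).
conn.execute(
    "CREATE INDEX idx_store_num_results "
    "ON search_query (store_id, num_results)"
)
plan_after = query_plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

Once the index exists, the planner can answer the count from the index entries instead of visiting every row. In a Magento project the index would normally be declared through the module's schema definition rather than created by hand, and its effect should be confirmed with EXPLAIN on the production-sized table.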

While the issue was resolved on the Magento 2.4-develop branch, the problem persists on other versions, which prompted the issue to be reopened for reevaluation. A silver bullet remains elusive, but the search for an effective fix continues, and the Magento community keeps proposing and testing new approaches.

In short, this performance anomaly has persisted across several Magento versions and calls for a more robust fix. We looked at a range of workarounds: removing the much-debated DISTINCT operator, inserting search terms asynchronously, overriding the execute function in Magento_CatalogSearch, and others. No single remedy appears universally applicable or lastingly effective, so the right approach depends on the specifics of each store.

In conclusion:

  • The issue's resilience, even across updates, underlines the need for a robust, long-term solution.
  • The elimination of the DISTINCT operator and adding an index on the search_query table offer promise, but may not be universally applicable.
  • Workarounds, while potentially beneficial, should be considered temporary, as the pursuit of a more sustainable fix continues.

Magento's search term performance anomaly remains a challenge, but it is also an opportunity for the community to share findings and develop better approaches to the platform's performance. The progress so far shows the value of collaborative problem-solving on a persistent issue.