Decoding the Mystery: How Magento 2.3's Popular Search Term Cache is Straining Your Database and The Simple Fix You Need

Magento 2.3 is a widely used e-commerce platform, but a query tied to its popular search term cache can put a heavy, easily overlooked strain on the database. The query, SELECT DISTINCT COUNT(*) FROM search_query WHERE (store_id = 1) AND (num_results > 0), is computationally heavy and drives up database CPU load, particularly when the search_query table is large. This post examines the issue and a simple, effective fix that can noticeably improve Magento 2.3 performance.

Unveiling the Issue: The Unforeseen Strain on Your Magento Database

An efficient e-commerce platform such as Magento 2.3 depends on a well-optimized database. An unlikely culprit has been found to degrade that performance: a query associated with the popular search term cache. This query, SELECT DISTINCT COUNT(*) FROM search_query WHERE (store_id = 1) AND (num_results > 0), causes unexpected spikes in database CPU load. The issue was first noticed on Magento Commerce 2.3.4 with Elasticsearch 6.7.0 and has since been observed on the newer 2.4-develop branch. It is not specific to Elasticsearch and has also been reported on earlier Magento versions.

The danger lies in how harmless the query looks. A single count is expected to return almost instantly, but on a large search_query table it can run for a long time. The direct consequences are high database CPU usage and a slower Magento store.

Deciphering the Query: The Invisible Culprit Behind High CPU Loads

The problematic query counts the rows in the search_query table that belong to store 1 and returned at least one result (num_results > 0). The part that appears to slow it down is the num_results > 0 condition, yet that condition is essential to the popular search term cache feature.

Interestingly, the performance issue is not tied only to num_results: the DISTINCT keyword in the query is unnecessary and appears to hurt performance. In SELECT DISTINCT COUNT(*), DISTINCT applies to the result set, and COUNT(*) produces exactly one row, so there is nothing to deduplicate and the keyword cannot change the answer. Any cost it adds comes from how the database executes the statement rather than from real deduplication work, and in the reported cases that cost grew significantly as the table got larger.
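To see why DISTINCT cannot change the result, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for MySQL. The table is a simplified, assumed version of search_query filled with made-up rows. It demonstrates only that both forms of the query return the same single row; it says nothing about how either form performs on MySQL.

```python
import sqlite3

# In-memory stand-in for search_query (simplified, assumed columns; not Magento's DDL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_query ("
    " query_id INTEGER PRIMARY KEY,"
    " query_text TEXT NOT NULL,"
    " store_id INTEGER NOT NULL,"
    " num_results INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO search_query (query_text, store_id, num_results) VALUES (?, ?, ?)",
    [("shoes", 1, 12), ("boots", 1, 0), ("hats", 1, 3), ("shoes", 2, 9)],
)

where = "WHERE (store_id = 1) AND (num_results > 0)"
with_distinct = conn.execute(f"SELECT DISTINCT COUNT(*) FROM search_query {where}").fetchall()
without_distinct = conn.execute(f"SELECT COUNT(*) FROM search_query {where}").fetchall()

# COUNT(*) collapses the filtered rows into one result row, so DISTINCT
# has exactly one row to deduplicate and cannot change the answer.
print(with_distinct, without_distinct)  # [(2,)] [(2,)]
```

Only "shoes" and "hats" in store 1 have results, so both queries return a single row containing 2.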

Perils of Large search_query Tables: Tracing the Root Cause of Prolonged Query Duration

On a busy store, the search_query table can grow very large. It records each search term entered into the site's search bar, along with the number of results (num_results) that term returned. As records accumulate, queries against the table slow down and the database has to work harder.
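A minimal sketch of how such a table grows, again using sqlite3 in place of MySQL. The schema is a simplified assumption (the popularity column in particular is illustrative), and record_search is a hypothetical helper, not Magento's code. It keeps one row per (term, store) pair, mirroring the unique (query_text, store_id) constraint on search_query, and bumps a counter on repeat searches, so every new distinct term, typos included, adds a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed subset of search_query columns.
conn.execute(
    "CREATE TABLE search_query ("
    " query_id INTEGER PRIMARY KEY,"
    " query_text TEXT NOT NULL,"
    " store_id INTEGER NOT NULL,"
    " num_results INTEGER NOT NULL DEFAULT 0,"
    " popularity INTEGER NOT NULL DEFAULT 0,"
    " UNIQUE (query_text, store_id))"
)


def record_search(conn, text, store_id, num_results):
    """Hypothetical logger: one row per (term, store); repeats bump popularity."""
    conn.execute(
        "INSERT OR IGNORE INTO search_query (query_text, store_id) VALUES (?, ?)",
        (text, store_id),
    )
    conn.execute(
        "UPDATE search_query SET num_results = ?, popularity = popularity + 1"
        " WHERE query_text = ? AND store_id = ?",
        (num_results, text, store_id),
    )


# Every distinct string a shopper types, including typos, becomes a new row.
for text, hits in [("shoes", 12), ("shoes", 12), ("shoos", 0), ("red shoes", 4)]:
    record_search(conn, text, 1, hits)

rows = conn.execute("SELECT COUNT(*) FROM search_query").fetchone()[0]
print(rows)  # 3
```

Repeating "shoes" only increments its counter; the row count grows with the number of distinct terms, which on a large catalog with free-text search can reach millions.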

As noted, the larger the search_query table, the longer it takes for the query to conclude. In extreme cases, with a table containing millions of entries, the query can take up to 10 seconds to complete. This slowdown can affect not only the search feature's performance but also the overall responsiveness of the Magento interface, thereby impacting the user experience.

In e-commerce, where every second of delay counts, this lag is a serious problem. It is worth addressing at its source by optimizing the query itself.

Beyond Reverting the PR: Innovative Workarounds and Their Limitations

The community's first suggestion was to revert the pull request (PR) associated with the change. That provided temporary relief but did not fix the underlying flaw.

A more inventive workaround modified the execute() method in Magento_CatalogSearch/Controller/Result/Index so that it only uses the getNotCacheableResult path. This improved performance significantly, but it did not fully resolve the issue: the primary search bar was still slow, especially with large search_query tables.

Taming the Beast: Effective Solutions to Enhance Query Performance and Ensure Smooth Magento Operation

Because the workarounds fell short, a definitive fix targeting the root cause was needed. A key discovery was that the DISTINCT keyword in the query is unnecessary and was hurting its performance: removing it reduced the query time from about 10 seconds to 2-3 seconds.

Removing it is also safe. DISTINCT here only deduplicates the single row that COUNT(*) returns, and the search_query table already has a unique constraint on (query_text, store_id), so each term appears at most once per store anyway. The count is the same with or without the keyword, so dropping it from the Magento search module's queries improves performance without changing results.
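The effect of the unique constraint can be checked with a small sqlite3 sketch (a stand-in for MySQL, with a simplified, assumed schema and made-up rows). It shows that a duplicate (query_text, store_id) row is rejected, and that even a per-term count, COUNT(DISTINCT query_text), matches the plain COUNT(*) within one store. This checks results only, not speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_query ("
    " query_id INTEGER PRIMARY KEY,"
    " query_text TEXT NOT NULL,"
    " store_id INTEGER NOT NULL,"
    " num_results INTEGER NOT NULL DEFAULT 0,"
    " UNIQUE (query_text, store_id))"
)
conn.executemany(
    "INSERT INTO search_query (query_text, store_id, num_results) VALUES (?, ?, ?)",
    [("shoes", 1, 12), ("hats", 1, 3), ("boots", 1, 0), ("shoes", 2, 9)],
)

# The unique key rejects a second ("shoes", 1) row outright.
try:
    conn.execute(
        "INSERT INTO search_query (query_text, store_id, num_results) VALUES ('shoes', 1, 5)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

where = "WHERE store_id = 1 AND num_results > 0"
plain = conn.execute(f"SELECT COUNT(*) FROM search_query {where}").fetchone()[0]
distinct_terms = conn.execute(
    f"SELECT COUNT(DISTINCT query_text) FROM search_query {where}"
).fetchone()[0]

# Within one store every row already has a unique query_text, so even a
# per-term DISTINCT count matches the plain count.
print(duplicate_rejected, plain, distinct_terms)  # True 2 2
```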

Other proposed options include inserting search terms asynchronously in batches, recording only a fraction of search terms, or disabling search term tracking entirely.
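The first two options can be combined, as in the following sketch. It is hypothetical, not Magento code: SearchTermRecorder, sample_rate, batch_size and flush_seconds are illustrative names, sqlite3 stands in for MySQL, and the schema is a simplified assumption. The request path only samples and enqueues a term; a background thread writes accumulated terms in one transaction per batch.

```python
import queue
import random
import sqlite3
import threading

# Simplified stand-in for search_query (assumed columns, not Magento's exact DDL).
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS search_query ("
    " query_text TEXT NOT NULL, store_id INTEGER NOT NULL,"
    " num_results INTEGER NOT NULL DEFAULT 0,"
    " popularity INTEGER NOT NULL DEFAULT 0,"
    " UNIQUE (query_text, store_id))"
)


class SearchTermRecorder:
    """Hypothetical recorder: samples search terms, queues them, writes in batches."""

    def __init__(self, conn, sample_rate=0.1, batch_size=100, flush_seconds=5.0):
        # conn must be opened with check_same_thread=False; only the worker
        # thread touches it until close() returns.
        self.conn = conn
        self.sample_rate = sample_rate
        self.batch_size = batch_size
        self.flush_seconds = flush_seconds
        self.events = queue.Queue()
        self.conn.execute(SCHEMA)
        self.worker = threading.Thread(target=self._run)
        self.worker.start()

    def record(self, text, store_id, num_results):
        # Request path: keep only a sampled fraction and never touch the database.
        if random.random() < self.sample_rate:
            self.events.put((text, store_id, num_results))

    def close(self):
        self.events.put(None)  # sentinel: flush what is left, then stop
        self.worker.join()

    def _run(self):
        batch = []
        while True:
            try:
                item = self.events.get(timeout=self.flush_seconds)
            except queue.Empty:
                item = ()  # idle timeout: flush a partial batch
            if item:
                batch.append(item)
            if batch and (not item or len(batch) >= self.batch_size):
                self._flush(batch)
                batch = []
            if item is None:
                return

    def _flush(self, batch):
        with self.conn:  # one transaction per batch instead of one per search
            self.conn.executemany(
                "INSERT OR IGNORE INTO search_query (query_text, store_id) VALUES (?, ?)",
                [(t, s) for t, s, _ in batch],
            )
            self.conn.executemany(
                "UPDATE search_query SET num_results = ?, popularity = popularity + 1"
                " WHERE query_text = ? AND store_id = ?",
                [(n, t, s) for t, s, n in batch],
            )


# Usage: sample_rate=1.0 records everything so the result is deterministic.
conn = sqlite3.connect(":memory:", check_same_thread=False)
recorder = SearchTermRecorder(conn, sample_rate=1.0, batch_size=2)
for text, hits in [("shoes", 12), ("shoes", 12), ("hats", 3), ("boots", 0), ("hats", 3)]:
    recorder.record(text, 1, hits)
recorder.close()
rows = conn.execute(
    "SELECT query_text, popularity FROM search_query ORDER BY query_text"
).fetchall()
print(rows)  # [('boots', 1), ('hats', 2), ('shoes', 2)]
```

The trade-off is accuracy: with a sample_rate below 1.0, popularity becomes an estimate (roughly the true count times sample_rate), and terms still in the queue are lost if the process dies before a flush.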

The num_results Condition Debate: A Necessary Evil or an Unnecessary Bottleneck?

Even with these proposals, the num_results condition remains under debate, because it is the part that slows the query down. Is the num_results > 0 condition a necessary evil or an unnecessary bottleneck?

Removing the num_results condition shortens the query significantly, but it also changes how the popular search term cache feature behaves. And because the slowdown also appears on earlier Magento versions and is not specific to Elasticsearch, the condition needs further investigation and testing.
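The functional side of the trade-off is easy to illustrate with a sqlite3 sketch (a simplified, assumed schema with made-up rows, standing in for MySQL): without num_results > 0, searches that returned nothing, such as typos or junk input, are counted alongside real, productive terms. The sketch says nothing about query speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE search_query (query_text TEXT NOT NULL, store_id INTEGER NOT NULL,"
    " num_results INTEGER NOT NULL DEFAULT 0, UNIQUE (query_text, store_id))"
)
conn.executemany(
    "INSERT INTO search_query VALUES (?, ?, ?)",
    [("shoes", 1, 12), ("hats", 1, 3), ("shoos", 1, 0), ("xyzzy", 1, 0)],
)

with_filter = conn.execute(
    "SELECT COUNT(*) FROM search_query WHERE store_id = 1 AND num_results > 0"
).fetchone()[0]
without_filter = conn.execute(
    "SELECT COUNT(*) FROM search_query WHERE store_id = 1"
).fetchone()[0]

# Dropping the filter lets zero-result searches (typos, junk) into the count.
print(with_filter, without_filter)  # 2 4
```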

The issue has been reopened for further evaluation. The goal is to find a fix that addresses the performance problem without hindering the popular search term cache feature's functionality.

In conclusion, this issue shows why rapidly evolving e-commerce platforms need continuous monitoring and optimization. The removal of the redundant DISTINCT keyword, along with proposals such as asynchronous, batched insertion of search terms, demonstrates the power of open-source collaboration. The unresolved debate over the num_results condition, which slows the query yet is essential to the popular search term cache, shows how complex the problem is. As the Magento community continues to scrutinize, debate, and test, its collective persistence will lead to a more refined and optimized platform and further strengthen Magento's position as an industry leader.