"Boosting Magento Performance: Unraveling the Mystery of High CPU Load and the Surprising Role of Search Term Cache"
Navigating Magento performance issues can feel like solving a mystery, and one persistently perplexing case stands out: high database CPU load caused by the popular search term cache. The problem has been reported from Magento 2.3 through the 2.4.x branches, and the surprising culprit is the DISTINCT operator in queries against the search_query table. This article explores the roots of the issue and the changes that can significantly improve your Magento store's performance.
The Persistent Performance Issue: High CPU Load in Magento 2.3 and Beyond
A sudden spike in database CPU load was reported following the upgrade to Magento 2.3. The problem then persisted through the 2.4.x branches, causing distress among many in the Magento community. The culprit turned out to be right under our noses: the popular search term cache. The larger the search_query table grows, the longer the associated query takes to complete, which drives up CPU load and slows the storefront. Neither overriding the execute function in Magento_CatalogSearch/Controller/Result/Index nor adding an index on the search_query table sufficiently rectified the issue. High CPU usage was still being reported on Magento 2.4.5-p4, and slow keyword searches on 2.4.6.
Deciphering the Problem: The Role of 'DISTINCT' Operator and the search_query Table
After sleuthing through the code, Magento developers traced the issue back to the DISTINCT operator in queries against the search_query table. Queries containing it, such as SELECT DISTINCT COUNT(*) FROM search_query, took an unusually long time to execute. Worse still, the search_query table grows rapidly as new search terms are recorded, so the cost of these queries keeps climbing.
The DISTINCT operator, however, appears to be unnecessary, hampering performance rather than helping it. The search_query table already has a unique constraint on query_text and store_id, so its rows cannot contain duplicates for DISTINCT to remove. It's like adding a gate to a fenceless field: it serves no practical function and only complicates matters.
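To make the redundancy concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table is a cut-down, illustrative stand-in for search_query (the column list and the helper names are assumptions, and Magento itself runs on MySQL or MariaDB), but it keeps the unique constraint on query_text and store_id. With that constraint in place, a query that selects both columns returns the same rows with or without DISTINCT.

```python
import sqlite3

# Illustrative schema only: a simplified stand-in for search_query that keeps
# the unique constraint on (query_text, store_id) described in the article.
SCHEMA = """
CREATE TABLE search_query (
    query_id    INTEGER PRIMARY KEY,
    query_text  TEXT    NOT NULL,
    store_id    INTEGER NOT NULL,
    num_results INTEGER NOT NULL DEFAULT 0,
    UNIQUE (query_text, store_id)
)
"""


def build_demo_db():
    """Create an in-memory table with a few sample search terms."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("red shoes", 1, 12),
        ("red shoes", 2, 9),   # same text in another store: allowed
        ("blue hat", 1, 3),
    ]
    conn.executemany(
        "INSERT INTO search_query (query_text, store_id, num_results) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return conn


def rows_with_and_without_distinct(conn):
    """Run the same SELECT twice, once plain and once with DISTINCT."""
    cols = "query_text, store_id"
    plain = conn.execute(
        f"SELECT {cols} FROM search_query ORDER BY {cols}"
    ).fetchall()
    distinct = conn.execute(
        f"SELECT DISTINCT {cols} FROM search_query ORDER BY {cols}"
    ).fetchall()
    return plain, distinct


def duplicate_is_rejected(conn):
    """Return True if the unique constraint blocks a repeated (text, store)."""
    try:
        conn.execute(
            "INSERT INTO search_query (query_text, store_id) "
            "VALUES ('red shoes', 1)"
        )
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    conn = build_demo_db()
    plain, distinct = rows_with_and_without_distinct(conn)
    print("same rows:", plain == distinct)
    print("duplicate rejected:", duplicate_is_rejected(conn))
```

Note that the equivalence relies on the selected columns covering the unique key. A query that selected query_text alone could still return repeated values across stores, so it is worth checking the columns of the actual query before dropping DISTINCT.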
Exploring the Solutions: Adjustments to the Search Term Cache for Enhanced Performance
From the ocean of issues, islands of solutions began to emerge. The most promising was removing the DISTINCT operator from the query built in \Magento\Search\Model\ResourceModel\Query\Collection. In one reported case this cut the query time drastically, from 700ms down to a mere 2ms, while producing the same results.
In practice, the adjustment to the search term cache amounts to calling $select->distinct(false) on the underlying select object. Doing so improved keyword search performance, which suggests that deduplicating a table with such high search term cardinality was more of a hindrance than a help.
The rapidly expanding search_query table also prompted a second proposal: inserting search terms in batches. The idea is to blunt the slowdown caused by a sudden surge of new search terms in the table. Inserting only a fraction of the search terms, or inserting them asynchronously in batches, may improve performance.
The Potential of Asynchronous Processing: Inserting Search Terms in Batches
As the search_query table grows, the performance issues escalate, with a keyword search reported to take as long as 10 seconds in Magento 2.4.6. Given the high search term cardinality, inserting search terms asynchronously in batches is a promising way to prevent these slowdowns.
By inserting only a fraction of the search terms, the system can manage the load more efficiently, striking a balance between the availability of popular search terms and performance. A smaller, more slowly growing table also softens the impact of the DISTINCT operator and the num_results condition in the query, both of which have been identified as culprits of slow performance.
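The batching idea can be sketched as follows. This is not Magento code: it is a small Python illustration using sqlite3, and the SearchTermBuffer class, the make_db helper, the popularity column, and the batch size are all hypothetical names and values chosen for the example. Search terms are counted in memory and written to the table in one transaction once the buffer fills, rather than one write per search request.

```python
import sqlite3
from collections import Counter


def make_db():
    """Create an in-memory stand-in for search_query (illustrative columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_query ("
        " query_text TEXT NOT NULL,"
        " store_id INTEGER NOT NULL,"
        " popularity INTEGER NOT NULL DEFAULT 0,"
        " UNIQUE (query_text, store_id))"
    )
    return conn


class SearchTermBuffer:
    """Hypothetical buffer that collects search terms and writes in batches."""

    def __init__(self, conn, batch_size=100):
        self.conn = conn
        self.batch_size = batch_size
        self.pending = Counter()   # (query_text, store_id) -> hits since flush
        self.buffered = 0          # searches recorded since the last flush

    def record(self, query_text, store_id):
        """Count one search in memory; flush once the batch is full."""
        self.pending[(query_text, store_id)] += 1
        self.buffered += 1
        if self.buffered >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all buffered terms in one transaction; return rows touched."""
        if not self.pending:
            return 0
        items = list(self.pending.items())
        with self.conn:  # commits on success, rolls back on error
            # Create rows for terms not seen before; existing rows are skipped.
            self.conn.executemany(
                "INSERT OR IGNORE INTO search_query"
                " (query_text, store_id, popularity) VALUES (?, ?, 0)",
                [(text, store) for (text, store), _ in items],
            )
            # Add the buffered hit counts to each term's popularity.
            self.conn.executemany(
                "UPDATE search_query SET popularity = popularity + ?"
                " WHERE query_text = ? AND store_id = ?",
                [(hits, text, store) for (text, store), hits in items],
            )
        self.pending.clear()
        self.buffered = 0
        return len(items)
```

In a real store the flush would more likely run from a queue consumer or a scheduled job than inside the request that recorded the search; the sketch only shows the buffering and the single batched write.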
Understanding the Impact of DISTINCT Operator: Insights from Keyword Search Time Reduction
Testing the query time difference with and without the 'DISTINCT' operator reveals striking insights. The query SELECT DISTINCT COUNT(*) FROM search_query takes hundreds of milliseconds to finish, whereas the same query without the 'DISTINCT' operator takes a mere few milliseconds.
Moreover, removing the 'DISTINCT' operator does not affect the query results, yet it reduces the keyword search time significantly. This surprising discovery points to the 'DISTINCT' operator as a major performance-hampering component. It's like an unnecessary speed bump on the road of Magento's search operation, slowing down the process without yielding any added benefits.
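A comparison of this kind can be reproduced with a small timing harness. The sketch below uses Python's sqlite3 module and a synthetic table, so the helper names (make_table, timed, compare_counts) are assumptions and the timings it reports will not match those of a production MySQL or MariaDB server. What it does show is the shape of the test: run both queries, confirm they return the same count, and compare the elapsed time.

```python
import sqlite3
import time


def make_table(n_rows):
    """Build a synthetic search_query table with n_rows unique terms."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE search_query ("
        " query_text TEXT NOT NULL,"
        " store_id INTEGER NOT NULL,"
        " UNIQUE (query_text, store_id))"
    )
    conn.executemany(
        "INSERT INTO search_query VALUES (?, 1)",
        ((f"term {i}",) for i in range(n_rows)),
    )
    return conn


def timed(conn, sql):
    """Run a single-value query and return (value, elapsed_seconds)."""
    start = time.perf_counter()
    value = conn.execute(sql).fetchone()[0]
    return value, time.perf_counter() - start


def compare_counts(conn):
    """Time the count query with and without DISTINCT."""
    with_distinct = timed(conn, "SELECT DISTINCT COUNT(*) FROM search_query")
    without = timed(conn, "SELECT COUNT(*) FROM search_query")
    return with_distinct, without


if __name__ == "__main__":
    conn = make_table(200_000)
    (d_val, d_secs), (p_val, p_secs) = compare_counts(conn)
    print(f"DISTINCT COUNT(*): {d_val} rows in {d_secs * 1000:.2f} ms")
    print(f"COUNT(*):          {p_val} rows in {p_secs * 1000:.2f} ms")
```

To measure the effect on an actual store, the same two statements would need to be run against a copy of the real search_query table, ideally several times each to smooth out caching effects.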
Beyond the Fixes: Ensuring High Performance in Future Magento Versions
As we step into the future, ensuring high performance in upcoming Magento versions is paramount. While the removal of the 'DISTINCT' operator from queries within \Magento\Search\Model\ResourceModel\Query\Collection and the strategic insertion of search terms in batches are fruitful solutions, Magento developers must also be vigilant about further enhancements.
Although the performance issue has been reported across Magento versions from 2.4.1-p1 to 2.4.6, a simple fix like calling $select->distinct(false) can significantly improve keyword search performance. It's clear that a search_query table with millions of entries needs efficient management to ensure consistent performance.
Adding an index on the search_query table and overriding the execute function in Magento_CatalogSearch/Controller/Result/Index have also been tried but, as noted earlier, neither was sufficient on its own. Likewise, while the c90edaa commit was thought to be a fix, it doesn't solve the performance issue, indicating the need for alternative solutions.
In conclusion, the labyrinthine issue of high CPU load in Magento is largely attributed to the 'DISTINCT' operator in queries against the search_query table and the surprising role of the search term cache. The key adjustments are:
- Removing the 'DISTINCT' operator from queries
- Asynchronously inserting search terms in batches
- Modifying the search term cache with $select->distinct(false)
With these in place, we can significantly enhance Magento's performance, optimizing the eCommerce experience for users worldwide. Furthermore, by being proactive in managing the rapidly expanding search_query table, we can ensure smoother, faster performance in future Magento versions. The complexity of the issue underscores the importance of continually exploring and refining solutions to keep pace with the ever-evolving world of eCommerce technology. Let us use these insights as stepping stones towards a more efficient and reliable Magento experience.