Dismantling the Distinct Operator: An Inside Look at Magento 2.3's Performance Pitfall and How to Overcome It

Magento 2.3's performance has been hampered by a seemingly innocuous query with a significant flaw: the unnecessary use of the DISTINCT operator in search term queries, which increases database load and CPU usage. The culprit, the recently added Popular Search Term Cache feature, puts heavy strain on larger databases and markedly delays query execution. This article examines the problem and the potential fixes for a performance pitfall that persists even in later Magento branches.

Unveiling the Performance Dilemma: The Impact of Magento 2.3 Update

The Magento 2.3 update introduced the Popular Search Term Cache feature and, with it, a persistent performance issue. Upgraded sites saw heightened database CPU load, and the cause is a single, seemingly benign query: SELECT DISTINCT COUNT(*) FROM search_query. It runs frequently, and each execution takes long enough to raise load times noticeably.

The query runs in the path of the main search bar, where users expect swift responses and any delay hurts the experience. Caching popular search terms is valuable, but the way it was implemented created a performance bottleneck, particularly for larger search_query tables.

Inside the Issue: Understanding the Role of the DISTINCT Operator

The DISTINCT operator in SQL returns only unique rows in the output, excluding all duplicates. In the Magento 2.3 update, it appears in the problematic query that is driving up CPU usage. On closer scrutiny, the operator does not seem to be necessary: a unique constraint already exists on query_text and store_id, so the rows being counted cannot contain duplicates, and a COUNT(*) without grouping returns a single row in any case.

The query's intent is to count the rows in search_query, but the DISTINCT operator creates additional, unnecessary load on MySQL. Removing it has therefore been identified as one route towards mitigating the performance issue.
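The redundancy can be checked on a small scale. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table is a simplified, assumed version of search_query (only query_text, store_id, and num_results come from the discussion above; query_id and the sample rows are illustrative), so it shows the logic of the query rather than Magento's actual schema or MySQL's execution cost.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Simplified, hypothetical version of the search_query table.
conn.execute(
    """
    CREATE TABLE search_query (
        query_id    INTEGER PRIMARY KEY,
        query_text  TEXT    NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)

# Illustrative sample rows: the same term may appear once per store.
conn.executemany(
    "INSERT INTO search_query (query_text, store_id, num_results) VALUES (?, ?, ?)",
    [("shoes", 1, 12), ("shoes", 2, 7), ("hat", 1, 0), ("bag", 1, 3)],
)

# COUNT(*) without GROUP BY yields exactly one row, so DISTINCT has nothing to remove.
with_distinct = conn.execute("SELECT DISTINCT COUNT(*) FROM search_query").fetchall()
without_distinct = conn.execute("SELECT COUNT(*) FROM search_query").fetchall()

# The unique constraint already rejects a duplicate (query_text, store_id) pair.
try:
    conn.execute(
        "INSERT INTO search_query (query_text, store_id, num_results) "
        "VALUES ('shoes', 1, 5)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(with_distinct, without_distinct, duplicate_rejected)
```

Both queries return the same single-row result, which is why dropping DISTINCT should not change what the feature sees. How much time it saves on a real MySQL instance still has to be measured there.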

The Bottleneck: Detailed Examination of the num_results > 0 Clause

Further exacerbating the situation is the clause num_results > 0. This piece of the query puzzle is identified as the primary performance bottleneck. It's responsible for filtering results that have a num_results value greater than zero, thereby increasing the execution time, especially with larger search_query tables.

Interestingly, removing this clause has been shown to significantly improve query duration. It is an inviting prospect, hinting that the solution to Magento's performance woes might not be as complex as it first appears. However, it raises an important question: why was the clause included in the first place if it has such an adverse effect on performance? The answer may show whether removal is the ideal solution or whether it will give rise to other, unforeseen problems.

The challenge now is to balance the need for efficiency with that of accuracy and functionality. After all, any solution must respect the original intent of the feature, which is to enhance user experience by quickly returning popular search terms. But it’s clear that a rethink of the current implementation is necessary, as the current scenario of high CPU usage is untenable.
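To make the trade-off concrete, here is a small sketch, again using Python's sqlite3 module in place of MySQL and a simplified, assumed search_query table with made-up rows. It only illustrates that dropping num_results > 0 changes the number the query returns whenever terms with zero results are stored. Whether Magento depends on that difference is not established here.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE search_query (
        query_text  TEXT    NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)

# Illustrative data: two terms that found products, two that found nothing.
conn.executemany(
    "INSERT INTO search_query VALUES (?, ?, ?)",
    [("shoes", 1, 12), ("bag", 1, 3), ("hat", 1, 0), ("asdfgh", 1, 0)],
)

# Count with the filter, as in the problematic query's WHERE clause.
filtered = conn.execute(
    "SELECT COUNT(*) FROM search_query WHERE num_results > 0"
).fetchone()[0]

# Count with the filter removed.
unfiltered = conn.execute("SELECT COUNT(*) FROM search_query").fetchone()[0]

print(filtered, unfiltered)
```

The two counts differ as soon as zero-result terms exist, so removing the clause is a behavioral change as well as an optimization and should be evaluated as one.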

Analyzing Potential Fixes: Reverting the PR vs. Overriding the Execute Function

As the performance issue with Magento 2.3 became evident, several potential fixes were proposed. Two stood out: reverting the PR and overriding the execute function in Magento_CatalogSearch/Controller/Result/Index. Initially, a patch was proposed to simply revert the PR. However, reverting the PR, while straightforward, merely postponed the problem and left room for it to resurface in future updates. By contrast, overriding the execute function presented a more sustainable solution.

By overriding the execute function, performance was improved significantly by focusing on the getNotCacheableResult part and eliminating the code for the getCacheableResult part. This resolution allowed for a more streamlined operation, reducing the amount of time the query took to complete and thus improving overall performance.

The Consequence of a Mammoth Search Query Table: Implications and Solutions

The search_query table, with its approximately 2.7 million search terms, caused significant strain on the database. The sheer volume of data to be processed in search_query led to high CPU usage, impacting the performance of administrative panel operations. With the num_results > 0 portion of the query being the main bottleneck, removing this part improved the query's duration significantly.

The solution to this issue lies not in reducing the number of search terms but in streamlining how they are accessed and processed. Removing the DISTINCT operator, which is unnecessary because of the unique constraint on query_text and store_id, from queries within \Magento\Search\Model\ResourceModel\Query\Collection significantly improved performance. Additionally, adding an index on the search_query table could potentially resolve the issue, though further testing is required to determine the impact on write loads.
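The effect of an index on the filtered count can be explored with a query plan. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN. The simplified table, the generated rows, and the index name idx_search_query_num_results are all assumptions made for illustration. MySQL's optimizer and Magento's real schema may behave differently, and the cost of maintaining the index on writes is not measured here.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE search_query (
        query_id    INTEGER PRIMARY KEY,
        query_text  TEXT    NOT NULL,
        store_id    INTEGER NOT NULL,
        num_results INTEGER NOT NULL DEFAULT 0,
        UNIQUE (query_text, store_id)
    )
    """
)

# 1000 synthetic terms; every fifth one has num_results = 0.
conn.executemany(
    "INSERT INTO search_query (query_text, store_id, num_results) VALUES (?, ?, ?)",
    ((f"term {i}", 1, i % 5) for i in range(1000)),
)

QUERY = "SELECT COUNT(*) FROM search_query WHERE num_results > 0"


def query_plan(connection):
    """Return the planner's description of how QUERY would be executed."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


plan_before = query_plan(conn)
count_before = conn.execute(QUERY).fetchone()[0]

# Hypothetical index on the filtered column.
conn.execute(
    "CREATE INDEX idx_search_query_num_results ON search_query (num_results)"
)

plan_after = query_plan(conn)
count_after = conn.execute(QUERY).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
```

The count is identical before and after, while the plan switches from reading the table to reading the new index. On MySQL, the equivalent check is to run EXPLAIN on the real query before and after adding a candidate index, then compare timings under realistic write traffic.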

The Path Ahead: Needed Improvements and Future Developments for Magento

The performance issue in Magento 2.3, and its persistence in the 2.4.1 update with PHP 7.4 and MySQL 8.0, highlights the need for ongoing evaluation and improvement. The issue remains relevant and requires attention, because caching popular search terms continues to cause performance problems even with Elasticsearch enabled.

The DISTINCT operator unnecessarily burdens MySQL and can be removed. Note that the suggested fix of adding columns to the select query does not directly address the performance issue, although it may still be beneficial.

Future developments for Magento should focus on refining the Popular Search Term Cache feature, reducing the load on MySQL, and optimizing the handling of search terms. As e-commerce continues to grow, the ability of the Magento platform to adapt and improve will be crucial for maintaining its position as a leading platform in the industry.

In conclusion, the performance issues brought about by the Magento 2.3 update are not insurmountable, but rather necessitate a reevaluation of how the platform handles popular search term caching.

• The DISTINCT operator, though seemingly innocuous, places an unnecessary load on MySQL. Removing it from the frequently run SELECT DISTINCT COUNT(*) FROM search_query is a viable step towards improving efficiency.

• Equally important is a reexamination of the num_results > 0 clause. Excluding it significantly improves query speed, although further analysis is needed to ensure no unintended complications arise.

• Lastly, potential solutions such as overriding the execute function or adding an index on the search_query table offer promising avenues for sustainable performance improvement.

As Magento evolves, it is imperative these lessons inform its future development, ensuring its continued success in the ever-expanding realm of e-commerce.