While working on my last project, I was facing a query which was taking 30 second to execute. The number of tables involved were 10 and total records were around 20k. It was a search functionality for a web-app project. There were lot of sub-queries involved in the first draft, all of them using ‘IN’. After doing some research ( within MySQL Community ) , I just replaced all ‘IN’ by ‘EXISTS’ and voila ! The query took 11 seconds to execute.
For example, you could change this query :
SELECT *
FROM invitees i
WHERE i.event_id IN ( SELECT e.id
FROM event e
WHERE e.city LIKE = ‘ABC’ )
TO
SELECT *
FROM invitees i
WHERE EXISTS ( SELECT 1
FROM event e
WHERE e.city LIKE = ‘ABC’
AND e.id = i.event_id ).
Try it and let me know if you find the same difference in performance as I have. There were many more modifications to the original query to bring it further down to 1 sec. But I will talk about that later. For more information on EXISTS and IN Sub Queries see Reference below.
References :
http://dev.mysql.com/doc/refman/5.0/en/optimizing-subqueries.html
http://dev.mysql.com/doc/refman/5.0/en/in-subquery-optimization.html
Jay Pipes said
1) You are using LIKE with no wildcard, and so it really should be = ‘ABC’
2) You would get sub-second performance with a standard join:
SELECT i.*
FROM invitees i
INNER JOIN event e
ON e.id = i.event_id
AND e.city = ‘ABC’;
Roland Bouman said
Hi there,
in my opinion, these
SELECT *
FROM invitees i
WHERE i.event_id IN (
SELECT id
FROM event e
WHERE city LIKE ‘ABC’
)
Should all be rewritten to joins
SELECT i.*
FROM invitees i
INNER JOIN event e
ON i.event_id = e.id
WHERE e.city LIKE ‘ABC’
Also, the LIKE does not contain a wildcard, we can just as well write:
WHERE e.city = ‘ABC’
instead of
WHERE e.city LIKE ‘ABC’
Roland Bouman
http://rpbouman.blogspot.com/
Rishi Agarwal said
Oh , I am sorry. Should have put LIKE ‘%ABC%’ or = ‘ABC’ for the example.
Second, I agree with your example of using a standard join, it will be better. The 2 examples given were to show the difference between IN and EXISTS , so didn’t focus on JOIN. But yeah, there are ways to replace ‘IN’ Sub Queries with JOIN for better performance.
Andrew said
—
SELECT *
FROM invitees i
WHERE i.event_id IN ( SELECT e.id
FROM event e
WHERE e.city LIKE = ‘ABC’ )
—
I believe this is slow because the inner query is evaluated, and then the outer query is evaluated – you do not reference the inner and outer queries directly.
If you rewrote the query to be:
SELECT *
FROM invitees i
WHERE i.event_id IN ( SELECT e.id
FROM event e
WHERE e.city LIKE = ‘ABC’ AND i.event_id = e.id )
You would find the query to be much faster.