June 4, 2008

Performance differences between IN and EXISTS

Written by

Very often I hear people wondering if it’s better the EXISTS or the IN operator.
Let’s start by saying that the two operators behave differently when NULL values are involved (see this post).

If you are sure that the field used to join the tables does not include NULL values then the IN operator performs “generally” better than EXISTS. I say “generally” because this is not always the case. For example consider the following query:

The previous query is processed more or less as follows:

The subquery is evaluated and the duplicates removed from the result (DISTINCT).
Then it is joined to the outer table. Now lets analyze the same query using NOT EXISTS:

The previous query is processed more or less as follows:

This means that using NOT EXISTS always results in a full scan of the outer table whereas the query which uses NOT IN can exploit an index on table_a.cd_field.
In general NOT IN performs better when the result of the subquery is small and there’s an index on the field used by the join. On the other hand, NOT EXISTS is to prefer when the subquery has many rows, the outer query is relatively small and there’s an index on the field used to join the two tables. In this latter case the cost of removing duplicates (DISTINCT) could be greater than the cost of the full scan of the outer table.

Of course this is a very general guideline, you should always conduct some test using indexes and so on to figure out which method to use.

Category : ITProgramming

Tags :

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">

Proudly powered by WordPress and Sweet Tech Theme