Alessandro Lacava’s Blog

Google
 

June 4, 2008

Performance differences between IN and EXISTS

Filed under: Computer, Oracle, Database — alessandrolacava @ 3:33 pm

Very often I hear people wondering if it's better the EXISTS or the IN operator.
Let's start by saying that the two operators behave differently when NULL values are involved (see this post).

If you are sure that the field used to join the tables does not include NULL values then the IN operator performs "generally" better than EXISTS. I say "generally" because this is not always the case. For example consider the following query:

SQL:
  1. SELECT *
  2. FROM table_a a
  3. WHERE a.cd_field NOT IN (SELECT b.cd_field FROM table_b b);

The previous query is processed more or less as follows:

SQL:
  1. SELECT *
  2. FROM table_a a,
  3. (SELECT DISTINCT cd_field FROM table_b) b
  4. WHERE a.cd_field != b.cd_field;

The subquery is evaluated and the duplicates removed from the result (DISTINCT).
Then it is joined to the outer table. Now lets analyze the same query using NOT EXISTS:

SQL:
  1. SELECT *
  2. FROM table_a a
  3. WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE a.cd_field = b.cd_field);

The previous query is processed more or less as follows:

SQL:
  1. FOR x IN ( SELECT * FROM table_a )
  2. LOOP
  3. IF ( NOT EXISTS ( SELECT 1 FROM table_b b WHERE x.cd_field = b.cd_field )
  4. THEN
  5. --output the record
  6. END IF
  7. END LOOP

This means that using NOT EXISTS always results in a full scan of the outer table whereas the query which uses NOT IN can exploit an index on table_a.cd_field.
In general NOT IN performs better when the result of the subquery is small and there's an index on the field used by the join. On the other hand, NOT EXISTS is to prefer when the subquery has many rows, the outer query is relatively small and there's an index on the field used to join the two tables. In this latter case the cost of removing duplicates (DISTINCT) could be greater than the cost of the full scan of the outer table.

Of course this is a very general guideline, you should always conduct some test using indexes and so on to figure out which method to use.


NOT IN vs. NOT EXISTS when NULL values are involved

Filed under: Computer, Oracle, Database — alessandrolacava @ 3:24 pm

NOT IN does not behave as you would expect when NULL values are involved.
Suppose you have the following tables:

TABLE_A
CD_FIELD DS_FIELD
1 ALPHA
2 BETA
TABLE_B
CD_FIELD DS_FIELD
1 GAMMA
NULL DELTA

Now try the following query:

SQL:
  1. SELECT *
  2. FROM table_a a
  3. WHERE a.cd_field NOT IN (SELECT b.cd_field FROM table_b b);

Does the previous query return any row? The answer is NO it doesn't, even if there is a value (2) in table_a.cd_field which is not present in table_b.cd_field.

I think this is because NOT IN internally uses the inequality operator.
You should never use the equality or inequality operators with NULL, but IS NULL and IS NOT NULL, respectively.
For this reason, in SQL, NULL is not equal to NULL.
If you don't believe me then try the following query:

SQL:
  1. SELECT 'test' c
  2. FROM dual
  3. WHERE NULL = NULL;

It returns 0 rows.

So how can you work this out? Enter the EXISTS (and NOT EXISTS) operator.

Try the following query:

SQL:
  1. SELECT *
  2. FROM table_a a
  3. WHERE NOT EXISTS (SELECT 1 FROM table_b b WHERE a.cd_field = b.cd_field);

The previous query returns 1 row, that is what you would expect since there is a row in table_a with cd_field = 2 which is not present in table_b.cd_field.

From now on beware of using NOT IN and IN when NULL values are involved.


Next Page »