For a while I was sure that UNION ALL had better performance when dealing with multiple matches from SELECT * FROM table. But recently I revisited my code and the performance is negligible.

My code is pulling about 50 clientid from the events table, and it had about 100 datapoints for each clientid. This is the SQL:

UNION ALL

SELECT * FROM events
WHERE clientid = 'FC001'
AND datetime BETWEEN '2020-01-01 00:00' AND '2020-01-31 00:00'

UNION ALL

SELECT * FROM events
WHERE clientid = 'FC002'
AND datetime BETWEEN '2020-01-01 00:00' AND '2020-01-31 00:00'

UNION ALL

SELECT * FROM events
WHERE clientid = 'FC003'
AND datetime BETWEEN '2020-01-01 00:00' AND '2020-01-31 00:00'
...
...

And these are the results I got.

[2020-02-16T11:28:08.036Z]  getting entries from 2019-12-31T22:00:00.000Z  ----->  2020-01-31T21:59:59.999Z

[2020-02-16T11:28:08.650Z]  sql data 658ms

WHERE..IN

Now let's compare this to a WHERE..IN command for a list of clientids.

SELECT * FROM events
WHERE clientid IN ('FC001', 'FC002', 'FC003', ....)
AND datetime BETWEEN '2020-01-01 00:00' AND '2020-01-31 00:00'

Here are the results.

[2020-02-16T11:26:33.851Z]  getting entries from 2019-12-31T22:00:00.000Z  ----->  2020-01-31T21:59:59.999Z

[2020-02-16T11:26:34.501Z]  get sql data 650ms

As I see it the results are negligible. The events table is indexed by clientid, and data parsing is optimized.

In fact, further Googling shows that SQL experts even prefer WHERE..IN because the statement makes much more sense than a UNION ALL join.

Further reading

© 2024. All Rights Reserved.