For a while I was sure that UNION ALL
had better performance when dealing with multiple matches from SELECT * FROM table
. But recently I revisited my code and the performance is negligible.
My code is pulling about 50 clientid
from the events
table, and it had about 100 datapoints for each clientid
. This is the SQL:
UNION ALL
SELECT * FROM events
WHERE clientid = 'FC001'
AND datetime BETWEEN '2020-01-01 00:00' AND '2020-01-31 00:00'
UNION ALL
SELECT * FROM events
WHERE clientid = 'FC002'
AND datetime BETWEEN '2020-01-01 00:00' AND '2020-01-31 00:00'
UNION ALL
SELECT * FROM events
WHERE clientid = 'FC003'
AND datetime BETWEEN '2020-01-01 00:00' AND '2020-01-31 00:00'
...
...
And these are the results I got.
[2020-02-16T11:28:08.036Z] getting entries from 2019-12-31T22:00:00.000Z -----> 2020-01-31T21:59:59.999Z
[2020-02-16T11:28:08.650Z] sql data 658ms
WHERE..IN
Now let's compare this to a WHERE..IN
command for a list of clientids
.
SELECT * FROM events
WHERE clientid IN ('FC001', 'FC002', 'FC003', ....)
AND datetime BETWEEN '2020-01-01 00:00' AND '2020-01-31 00:00'
Here are the results.
[2020-02-16T11:26:33.851Z] getting entries from 2019-12-31T22:00:00.000Z -----> 2020-01-31T21:59:59.999Z
[2020-02-16T11:26:34.501Z] get sql data 650ms
As I see it the results are negligible. The events table is indexed by clientid, and data parsing is optimized.
In fact, further Googling shows that SQL experts even prefer WHERE..IN
because the statement makes much more sense than a UNION ALL
join.