I’m sure I’m missing something here, but I noticed some unexpected performance degradation in one of my queries. (Note that I’m replacing ? with 0 in MemSQL Studio just to run the Explain/Profile.) Let me call it Query 0:
select e from datoms_long
where p = 11
and e in (select e from datoms_ref
where p = 11
and a = ?
and v = ?)
and a = ?;
Same performance issue with a join approach (Query 1):
select datoms_long.e
from datoms_long
inner join
datoms_ref
on datoms_ref.p = 11
and datoms_long.p = 11
and datoms_ref.e = datoms_long.e
and datoms_long.a = ?
and datoms_ref.a = ?
and datoms_ref.v = ?;
Query 0 and Query 1 each take ~500ms, but if I break Query 0 into two parts (Queries 2 and 3), each part runs essentially instantaneously:
select e from datoms_ref
where p = 11
and a = ?
and v = ?;
select e from datoms_long
where p = 11
and e in (?, ?) -- output of `select e from datoms_ref ...`
and a = ?;
Each of the tables involved have the same indices:
index eav (e desc, a desc, v desc),
index av ( a desc, v desc),
index vv ( v desc),
constraint pk primary key (p desc, e desc, a desc, v desc),
shard key (p)
The Profile shows Queries 0 and 1 doing a full table scan (~500K rows) every time even when using the primary key. By contrast, Queries 2 and 3 scan only 300 rows. Happy to post the output of explain etc.
Note that p = “partition” (distinct ~15 of these), e = “entity id” (distinct 100Ks of these), a = “attribute id” (distinct ~50 of these), and v = “value” (distinct 100Ks of these).
I’ve recently re-read up on the behavior of indexes in both MemSQL and MySQL, but I’ve got to be missing something fundamental here. Any ideas?
I really appreciate your help!
—————
EDIT: I figured this out. First I went to EverSQL (super useful site by the way!) and when I pasted in the query above, with 0s instead of ?s, it said, among other things, “Avoid Implicit Casts When Searching for Strings … It’s recommended not to compare string columns to numeric values, as it will result in a cast that will prevent index usage.” This appears to have been exactly what was happening:
- I was using
0instead of e.g.convert(0, binary)in MemSQL Studio, which made the datatype of the parameter ambiguous and caused the query optimizer not to use the appropriate indexes, just as EverSQL said. - I was doing
.setObjectin JDBC which also made the datatype of the parameter ambiguous. However, interestingly, I still needed to wrap the param inconvert(..., binary)even after calling.setBytesin JDBC.
As a simple test, wrapping the params / 0s in convert(..., binary) made the query execute virtually instantly instead of doing a full table scan. Hooray for obscure index knowledge!