How does relevance score work in full text search

Hi,

We are using MemSQL for full-text search, and we depend on the relevance score to rank its results. I was under the impression that the relevance score reflects how closely the input matches what is stored in the database. I am searching for a string that is exactly the same as one entry in the database; there are also a few entries that are similar to the input string but do not match it completely. I expected a higher relevance score for the record that matches the input exactly and a lower score for the partial matches. However, I am getting the same relevance score for several records. Please see the example below.

Following are my column values

line_2
Taman Rawang Idaman
Taman Rawang Putra
Taman Rawang Tin
Taman Rawang
Taman Rawang Jaya
Taman Rawang Perdana

Below is my query.

SELECT line_2, MATCH (line_2) AGAINST ('Taman Rawang Jaya') line_2_relevance
FROM GEO_SOURCE
WHERE MATCH (line_2) AGAINST ('Taman Rawang Jaya') >= 0.5
ORDER BY line_2_relevance desc

Output

line_2                 line_2_relevance
Taman Rawang Perdana   1
Taman Rawang Jaya      1
Taman Rawang           1
Taman Rawang Putra     1
Taman Sri Rawang       1
Taman Rawang Tin       1
Taman Rawang Idaman    1

As you can see, even though the input exactly matches one database entry, the partially matching records are scored just as high. Can you please explain how the relevance score is calculated? Does MemSQL take into account the order in which the queried terms occur, compared with what is stored in the database?

MemSQL Version
7.1.8

Thanks,
Pramod

Our implementation is based on CLucene, which is a C++ port of the Java Lucene system. Here’s what I was able to find about scoring:

https://lucene.apache.org/core/3_5_0/scoring.html

You might consider breaking your search names up into separate terms for first, middle, and last names, and also throwing in the whole name as a term, to get better ranking. That is just a thought; I don’t know if it’ll work.
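
For reference, the page linked above describes Lucene's classic TF-IDF scoring. Below is a minimal Python sketch of that formula, run over the six line_2 values from the question. It assumes Lucene's default similarity and a simple lowercase whitespace tokenizer, and it ignores Lucene's lossy encoding of the length norm. The function names are made up for illustration, and none of this is confirmed to be how MemSQL computes the value it returns.

```python
import math

# The six line_2 values listed in the question.
corpus = [
    "Taman Rawang Idaman",
    "Taman Rawang Putra",
    "Taman Rawang Tin",
    "Taman Rawang",
    "Taman Rawang Jaya",
    "Taman Rawang Perdana",
]


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for the real analyzer.
    return text.lower().split()


def classic_score(query, doc, docs):
    """Sketch of score = coord * queryNorm * sum(tf * idf^2 * lengthNorm)."""
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    n_docs = len(docs)

    def idf(term):
        # idf = 1 + ln(numDocs / (docFreq + 1))
        doc_freq = sum(1 for d in docs if term in tokenize(d))
        return 1.0 + math.log(n_docs / (doc_freq + 1))

    # queryNorm is identical for every document; it does not change the ranking.
    query_norm = 1.0 / math.sqrt(sum(idf(t) ** 2 for t in q_terms))
    # Shorter fields get a larger length norm.
    length_norm = 1.0 / math.sqrt(len(d_terms))
    matched = [t for t in q_terms if t in d_terms]
    # coord rewards documents that contain more of the query terms.
    coord = len(matched) / len(q_terms)
    # tf = sqrt(term frequency in the document)
    total = sum(
        math.sqrt(d_terms.count(t)) * idf(t) ** 2 * length_norm for t in matched
    )
    return coord * query_norm * total


if __name__ == "__main__":
    query = "Taman Rawang Jaya"
    ranked = sorted(corpus, key=lambda d: classic_score(query, d, corpus), reverse=True)
    for d in ranked:
        print(f"{d:22s} {classic_score(query, d, corpus):.4f}")
```

In this sketch the exact row comes out on top because it matches all three terms, including the rare term Jaya. The flat score of 1 in the reported output does not follow from this formula alone, so the value MemSQL returns may be processed differently; that is not confirmed here.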

Hi Hanson,

Thanks for your reply. I am querying against the address field line_2, so I cannot split the input into separate terms to query. Is there any way to get a higher relevance score when all of the search terms match than when only a few of them match? Is there any way to configure the tokenizer/analyzer used by the underlying Lucene in MemSQL?
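
One possible workaround while that question is open is to re-rank the returned rows in the application, outside the database. The Python sketch below puts an exact match first, then orders the rest by the share of query terms they contain, then by how few extra terms they carry. The `rerank` helper is hypothetical and the tokenizer is a plain lowercase whitespace split; it does not use or change anything in MemSQL itself.

```python
# Rows as returned by the query in the question, in their original order.
rows = [
    "Taman Rawang Perdana",
    "Taman Rawang Jaya",
    "Taman Rawang",
    "Taman Rawang Putra",
    "Taman Sri Rawang",
    "Taman Rawang Tin",
    "Taman Rawang Idaman",
]


def rerank(query, candidates):
    """Hypothetical client-side re-ranking of full-text search results."""
    q_terms = query.lower().split()

    def sort_key(row):
        r_terms = row.lower().split()
        # Exact (case-insensitive) matches sort first.
        exact = row.strip().lower() == query.strip().lower()
        # Share of query terms present in the row; max() guards an empty query.
        coverage = sum(1 for t in q_terms if t in r_terms) / max(len(q_terms), 1)
        # Rows with fewer terms outside the query sort earlier among ties.
        extra = sum(1 for t in r_terms if t not in q_terms)
        return (not exact, -coverage, extra)

    # sorted() is stable, so remaining ties keep the database's order.
    return sorted(candidates, key=sort_key)


if __name__ == "__main__":
    for row in rerank("Taman Rawang Jaya", rows):
        print(row)
```

This only reorders rows that the query already returned, so the `>= 0.5` filter in the WHERE clause still decides which rows are candidates.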