Stack Overflow Asked by user8588978 on December 11, 2021
I have a table with column content that has a FULLTEXT index.
I want to take advantage of the speed of MATCH() on large text.
I want the search to be as accurate as possible.
When I search for the phrase string "large truck" this way:
SELECT * FROM MyTable WHERE MATCH(content) AGAINST('"large truck"' IN BOOLEAN MODE);
Some instances are missed.
MyTable:
| content |
----------------
|Large n truck| FOUND ✓
----------------
|large truck | FOUND ✓
----------------
|large trucks | *PLURAL MISSED!
----------------
|large truckl | *TYPE-O MISSED!
If I use the standard LIKE / wildcard method:
SELECT * FROM `MyTable` WHERE `content` LIKE '%large truck%'
MyTable:
| content |
----------------
|Large n truck| *MISSED!
----------------
|large truck | FOUND ✓
----------------
|large trucks | FOUND ✓
----------------
|large truckl | FOUND ✓
It seems I can’t use a PHRASE search with wildcard together either:
SELECT * FROM MyTable WHERE MATCH(content) AGAINST('"large truck*"' IN BOOLEAN MODE); **DOES NOT WORK**
OR
SELECT * FROM MyTable WHERE MATCH(content) AGAINST('"large truck"*' IN BOOLEAN MODE); **DOES NOT WORK**
So…
How do I successfully use MATCH() AGAINST() searching for a phrase, and get all instances returned – even case insensitive partial string matches?
The trick that I often use with FT is to do it in two steps:
MATCH
, hoping to get all the desired text, but possibly some extra results.AND
with another condition -- LIKE
(faster) or REGEXP
(more powerful).The MATCH
will be fast because of FT; the other part will be performed second, so it will be fast because there aren't many rows to check.
This matches your criteria:
SELECT * FROM MyTable
WHERE MATCH(content) AGAINST('+large +truck*' IN BOOLEAN MODE)
AND content REGEXP "large[[:space:]]+truck";
Phrased differently, the query will run something like this:
MATCH
will be very fast (because of the way it is designed). It will find all rows with both "large" and "truck*" anywhere in content
. Now, let's say, there are 30 rows that satisfy that.WHERE
is evaluated. But it is done against only those 30 rows. So, even though the REGEXP
is costly, it is not being done often.The net effect is that the whole query runs "fast", which was one of your requirements.
Note: I needed the second part to prevent these
large green truck
the truck is large
Depending on the version, you may need this REGEXP instead: "large\s+truck"
.
Answered by Rick James on December 11, 2021
Here is a quick hack with REGEXP, but it's not solving the issue as it does not use the fulltext index:
SELECT *
FROM MyTable
WHERE content REGEXP("large[[:space:]]+truck*");
Of course you can also use the FT index by searching without exact phrase:
SELECT *
FROM MyTable
WHERE MATCH(content) AGAINST('+large +truck*' IN BOOLEAN MODE);
But this will eventually include records that you do not want as it is not the same as exact phrase search.
Unfortunately phrase search (double quotes - "") and truncation operator (wildcard - *) will not work regardless if you use InnoDB or MyISAM. It does not work with the distance operator with InnoDB too (which is probably coming from the same origin). I guess it is related in the nature how data of the full text index is stored.
Answered by Philip Petrov on December 11, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP