Full-text search - “Does Not Contain” query

With built-in GIN indexes, PostgreSQL supports full-text search as well as searches on multi-value data types such as arrays. Will an index be used in a full-text search for rows that do not contain a certain keyword?

GIN stands for Generalized Inverted Index. A query that does not contain a keyword is in effect a scan that skips the entries of the excluded token in the main tree. Using the index for this is cost-effective only if the skipped token covers a relatively large amount of data.

PostgreSQL uses a cost-based optimizer (CBO) to build the execution plan and to choose the best index automatically.

Example 1.

1. Create a test table

    postgres=# create table notcontain (id int, info tsvector);
    CREATE TABLE

2. Create a function to generate random strings

    CREATE OR REPLACE FUNCTION gen_rand_str(integer)
     RETURNS text
     LANGUAGE sql
     STRICT
    AS $function$
      -- Reconstructed: the array literal and the index expression were lost from the listing.
      -- The alphabet (a-f plus a space as word separator) is inferred from the sample row in step 5.
      select string_agg(a[(random()*6)::int+1],'')
      from generate_series(1,$1), (select array['a','b','c','d','e','f',' ']) t(a);
    $function$;

3. Insert 1,000,000 rows of test data

    postgres=# insert into notcontain select generate_series(1,1000000), to_tsvector(gen_rand_str(256));

4. Create a GIN index

    postgres=# create index idx_notcontain_info on notcontain using gin (info);

5. Query one of the records

    postgres=# select * from notcontain limit 1;
    -[ RECORD 1 ]-
    id   | 1
    info | 'afbbeeccbf':3 'b':16 'bdcdfd':2 'bdcfbcecdeeaed':8 'bfedfecbfab':7 'cd':9 'cdcaefaccdccadeafadededddcbdecdaefbcfbdaefcec':14 'ceafecff':6 'd':17,18 'dbc':12 'dceabcdcbdca':10 'dddfdbffffeaca':13 'deafcccfbcdebdaecda':11 'dfbadcdebdedbfa':19 'eb':15 'ebe':1 'febdcbdaeaeabbdadacabdbbedfafcaeabbdcedaeca':5 'fedeecbcdfcdceabbabbfcdd':4

6. Test a query that does not contain a certain keyword

The database automatically chooses a full table scan and does not use the GIN index.

Why wasn’t the index used? As explained above, the number of records containing the keyword is quite small, so almost every row matches. It is not cost-effective to filter through the index in a query that does not contain a keyword. (However, if the query contains the keyword, using the GIN index is very cost-effective. “Contains” and “Does Not Contain” are inversions of each other, and so are the associated costs.)

    postgres=# explain (analyze,verbose,timing,costs,buffers) select * from notcontain t1 where info @@ to_tsquery ('!eb');
                                QUERY PLAN
    ------------------------------------------------------------------
     Seq Scan on postgres.notcontain t1  (cost=54.51 rows=950820 width=412) (actual time=0.016..1087.463 rows=947911 loops=1)
       Output: id, info
       Filter: (t1.info @@ to_tsquery('!eb'::text))
       Rows Removed by Filter: 52089
       Buffers: shared hit=55549
     Planning time: 0.131 ms
     Execution time: 1134.571 ms
    (7 rows)

7. Disable full table scans so that the database has to use the index

The search that goes through the index is noticeably slower: 1584.670 ms, compared with 1134.571 ms for the full table scan. (All of these estimates can be calibrated, provided the cost factors and the measurements of the environment's performance are accurate enough.)

    postgres=# set enable_seqscan=off;
    SET
    postgres=# explain (analyze,verbose,timing,costs,buffers) select * from notcontain t1 where info @@ to_tsquery ('!eb');
                                QUERY PLAN
    ------------------------------------------------------------------
     Bitmap Heap Scan on postgres.notcontain t1  (cost=8229490120.25 rows=950820 width=412) (actual time=1325.587..1540.145 rows=947911 loops=1)
       Output: id, info
       Recheck Cond: (t1.info @@ to_tsquery('!eb'::text))
       Heap Blocks: exact=55549
       Buffers: shared hit=171948
       ->  Bitmap Index Scan on idx_notcontain_info  (cost=4743.30 rows=950820 width=0) (actual time=1315.663..1315.663 rows=947911 loops=1)
             Index Cond: (t1.info @@ to_tsquery('!eb'::text))
             Buffers: shared hit=116399
     Planning time: 0.135 ms
     Execution time: 1584.670 ms
    (10 rows)

Most of the time we can trust the database to find the most efficient method to complete our queries.

Example 2.

In this example, I am going to create unevenly distributed data. One token is repeated in a large number of rows, and I will filter those rows out with a “Does Not Contain” operator.
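Whether the index pays off for a “Does Not Contain” query depends on how many rows carry the excluded token. The following is a minimal sketch, assuming the `notcontain` table and the `idx_notcontain_info` index from Example 1 already exist, of how that share could be measured with the built-in `ts_stat` function and how the plans for the two inverse queries could be compared. The column alias `pct_rows` is only an illustrative name, and `ts_stat` reads every row, so it can take a while on a table of this size.

```sql
-- Assumption: table notcontain(id int, info tsvector) from Example 1 is populated.

-- ts_stat() runs the inner query and reports, for each token,
-- the number of rows that contain it (ndoc) and its total occurrences (nentry).
SELECT word,
       ndoc,
       round(100.0 * ndoc / (SELECT count(*) FROM notcontain), 2) AS pct_rows
FROM   ts_stat('SELECT info FROM notcontain')
WHERE  word = 'eb';

-- Return to the default planner settings before comparing plans.
RESET enable_seqscan;

-- "Contains": only the rows holding the token are fetched.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM notcontain WHERE info @@ to_tsquery('eb');

-- "Does Not Contain": every row except those holding the token is fetched.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM notcontain WHERE info @@ to_tsquery('!eb');
```

In the plans shown for Example 1, 52,089 of the 1,000,000 rows were removed by the filter, so roughly 5.2% of the rows contain `eb` and the remaining 94.8% must be returned by the `!eb` query. When the excluded token covers that small a share, reading the whole table once is likely to be cheaper than walking the index and then visiting almost every heap page anyway.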