Skip to content

Similarity Filter

Usually, ORDER BY and LIMIT clauses are used to perform vector search. In some cases, you may want to use a WHERE clause to filter out vectors that are far from the query. Naturally, you might write the following SQL.

sql
EXPLAIN (COSTS FALSE) 
SELECT * FROM items WHERE embedding <-> '[0, 0, 0]' < 0.1 ORDER BY embedding <-> '[0, 0, 0]' LIMIT 10;
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Limit
   ->  Index Scan using items_embedding_idx on items
         Order By: (embedding <-> '[0,0,0]'::vector)
         Filter: ((embedding <-> '[0,0,0]'::vector) < '0.1'::double precision)

The returned results meet expectations, while this approach suffers from poor performance. This is because if fewer than vectors fall within the distance threshold, PostgreSQL forces the index to continue to search to get results, exhausting the entire search space.

To avoid this situation, you can use specific syntax to push the filter down to the vector index, allowing it to stop the search as soon as the search region moves beyond the specified distance. We refer to this type of distance-based condition that can be pushed down to the index as a similarity filter.

sql
EXPLAIN (COSTS FALSE) 
SELECT * FROM items WHERE embedding <<->> sphere('[0, 0, 0]'::vector, 0.1) ORDER BY embedding <-> '[0, 0, 0]' LIMIT 10;
                               QUERY PLAN                               
------------------------------------------------------------------------
 Limit
   ->  Index Scan using items_embedding_idx on items
         Index Cond: (embedding <<->> '("[0,0,0]",0.1)'::sphere_vector)
         Order By: (embedding <-> '[0,0,0]'::vector)

embedding <<->> sphere('[0, 0, 0]'::vector, 0.1) evaluates to true if and only if the L2 distance between the two vectors is less than 0.1. Use the <<#>> operator for negative inner product, and the <<=>> operator for cosine distance.

Additionally, if you specify only the WHERE clause without an ORDER BY, the index can still be used effectively.

sql
EXPLAIN (COSTS FALSE) 
SELECT * FROM items WHERE embedding <<->> sphere('[0, 0, 0]'::vector, 0.1);
                            QUERY PLAN                            
------------------------------------------------------------------
 Index Scan using items_embedding_idx on items
   Index Cond: (embedding <<->> '("[0,0,0]",0.1)'::sphere_vector)

Reference

Operator Classes

Refer to