Graph Index since v0.5.0
VectorChord's index type vchordg
is a disk-based graph index. It has low memory consumption.
Let's start by creating a table named items
with an embedding
column of type vector(n)
, and then populate it with sample data.
To create a vchordg
index, you can use the following SQL.
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops);
TIP
This feature is in preview.
Tuning
When building an index, two options usually need tuning: m
and ef_construction
. m
is the maximum number of neighbors per vertex, and ef_construction
is the size of the dynamic list containing the nearest neighbors during insertion.
In search, you need to tune ef_search
. ef_search
is the size of the dynamic list containing the nearest neighbors during search.
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
m = 64
ef_construction = 128
$$);
SET vchordg.ef_search TO '128';
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10;
As a disk-based index, vchordg
usually requires only the quantized vectors to be kept in the buffer pool to maintain performance. By default, vchordg
quantizes a
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
bits = 1
m = 64
ef_construction = 128
$$);
The index building can be sped up by using multiple processes. Refer to PostgreSQL Tuning.
Reference
Operator Classes vchordg
The following table lists all available operator classes supported by vchordg
.
Operator Class | Description | Operator 1 | Operator 2 |
---|---|---|---|
vector_l2_ops | index works for vector type and Euclidean distance | <->(vector,vector) | <<->>(vector,vector) |
vector_ip_ops | index works for vector type and negative inner product | <#>(vector,vector) | <<#>>(vector,vector) |
vector_cosine_ops | index works for vector type and cosine distance | <=>(vector,vector) | <<=>>(vector,vector) |
halfvec_l2_ops | index works for halfvec type and Euclidean distance | <->(halfvec,halfvec) | <<->>(halfvec,halfvec) |
halfvec_ip_ops | index works for halfvec type and negative inner product | <#>(halfvec,halfvec) | <<#>>(halfvec,halfvec) |
halfvec_cosine_ops | index works for halfvec type and cosine distance | <=>(halfvec,halfvec) | <<=>>(halfvec,halfvec) |
<<->>
, <<#>>
, <<=>>
are operators defined by VectorChord.
For more information about <<->>
, <<#>>
, <<=>>
, refer to Similarity Filter.
All operator classes are available since version 0.3.0
.
Indexing Options vchordg
bits
since v0.5.0
- Description: The ratio of bits to dimensions after RaBitQ quantization.
bits = 2
provides better recall and QPS, whilebits = 1
consumes less memory. - Type: integer
- Default:
2
- Example:
bits = 2
means a-dimensional vector is quantized to bits. bits = 1
means a-dimensional vector is quantized to bits .
m
since v0.5.0
- Description: The maximum number of neighbors per vertex. The larger
m
is, the better the performance, but the higher the storage requirement.m
corresponds toin HNSW and in DiskANN. - Type: integer
- Default:
32
- Example:
m = 32
means that there are at mostneighbors per vertex. m = 64
means that there are at mostneighbors per vertex.
ef_construction
since v0.5.0
- Description: The size of the dynamic list containing the nearest neighbors during insertion. The larger
ef_construction
is, the better the performance, but the slower the insertion.ef_construction
corresponds toin HNSW and in DiskANN. - Type: integer
- Default:
64
- Example:
ef_construction = 64
means that the size of the dynamic list containing the nearest neighbors isduring insertion. ef_construction = 128
means that the size of the dynamic list containing the nearest neighbors isduring insertion.
alpha
since v0.5.0
- Description: The
alpha
values selected during pruning.alpha
corresponds toin DiskANN. This option must be an ascending list, where the first element is 1.0
and the last element is less than2.0
. - Type: list of floats
- Default:
[1.0, 1.2]
- Example:
alpha = [1.0, 1.2]
is equivalent to settingalpha = 1.2
in DiskANN.alpha = [1.0]
is equivalent to the default pruning strategy in HNSW.
- Note: this option is ineffective when the distance metric is negative inner product.
beam_construction
since v0.5.0
- Description: Beam width used during insertion. Beam width refers to the number of vertices accessed at once. Since the time to randomly read a small number of sectors from the SSD is almost the same as reading a single sector, a larger beam width effectively reduces the number of round trips to the SSD, resulting in better performance. Since it increases computation, it is disadvantageous for indexes that fit entirely in memory.
- Type: integer
- Default:
1
- Example:
beam_construction = 8
means that the index accesses 8 vertices at once during insertion.beam_construction = 1
means that the index accesses 1 vertex at once during insertion.
Search Parameters vchordg
vchordg.ef_search
since v0.5.0
- Description: The size of the dynamic list containing the nearest neighbors in search. The larger
vchordg.ef_search
is, the better the recall, but the worse the QPS.vchordg.ef_search
corresponds toin HNSW and DiskANN. - Type: integer
- Default:
64
- Domain:
[1, 65535]
- Example:
SET vchordg.ef_search = 64
indicates the size of the dynamic list containing the nearest neighbors isduring search. SET vchordg.ef_search = 128
indicates the size of the dynamic list containing the nearest neighbors isduring search.
vchordg.beam_search
since v0.5.0
- Description: Beam widths used during search. Beam width refers to the number of vertices accessed at once. Since the time to randomly read a small number of sectors from the SSD is almost the same as reading a single sector, a larger beam width effectively reduces the number of round trips to the SSD, resulting in better performance. Since it increases computation, it is disadvantageous for indexes that fit entirely in memory.
- Type: integer
- Default:
1
- Domain:
[1, 65535]
- Example:
SET vchordg.beam_search = 8
indicates that the index accesses 8 vertices at once during search.SET vchordg.beam_search = 1
indicates that the index accesses 1 vertex at once during search.