Skip to content

Graph Index since v0.5.0

VectorChord's index type vchordg is a disk-based graph index. It has low memory consumption.

Let's start by creating a table named items with an embedding column of type vector(n), and then populate it with sample data.

To create a vchordg index, you can use the following SQL.

sql
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops);

TIP

This feature is in preview.

Tuning

When building an index, two options usually need tuning: m and ef_construction. m is the maximum number of neighbors per vertex, and ef_construction is the size of the dynamic list containing the nearest neighbors during insertion.

In search, you need to tune ef_search. ef_search is the size of the dynamic list containing the nearest neighbors during search.

sql
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
m = 64
ef_construction = 128
$$);

SET vchordg.ef_search TO '128';
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10;

As a disk-based index, vchordg usually requires only the quantized vectors to be kept in the buffer pool to maintain performance. By default, vchordg quantizes a -dimensional vector to bits. Let the number of rows be . Then, the total memory required for the index is bits. If you have very limited memory and are using ultra-high dimensional vectors, consider quantizing a vector to bits. Then, the total memory required for the index is bits.

sql
CREATE INDEX ON items USING vchordg (embedding vector_l2_ops) WITH (options = $$
bits = 1
m = 64
ef_construction = 128
$$);

The index building can be sped up by using multiple processes. Refer to PostgreSQL Tuning.

Reference

Operator Classes vchordg

The following table lists all available operator classes supported by vchordg.

Operator ClassDescriptionOperator 1Operator 2
vector_l2_opsindex works for vector type and Euclidean distance<->(vector,vector)<<->>(vector,vector)
vector_ip_opsindex works for vector type and negative inner product<#>(vector,vector)<<#>>(vector,vector)
vector_cosine_opsindex works for vector type and cosine distance<=>(vector,vector)<<=>>(vector,vector)
halfvec_l2_opsindex works for halfvec type and Euclidean distance<->(halfvec,halfvec)<<->>(halfvec,halfvec)
halfvec_ip_opsindex works for halfvec type and negative inner product<#>(halfvec,halfvec)<<#>>(halfvec,halfvec)
halfvec_cosine_opsindex works for halfvec type and cosine distance<=>(halfvec,halfvec)<<=>>(halfvec,halfvec)

<<->>, <<#>>, <<=>> are operators defined by VectorChord.

For more information about <<->>, <<#>>, <<=>>, refer to Similarity Filter.

All operator classes are available since version 0.3.0.

Indexing Options vchordg

bits since v0.5.0

  • Description: The ratio of bits to dimensions after RaBitQ quantization. bits = 2 provides better recall and QPS, while bits = 1 consumes less memory.
  • Type: integer
  • Default: 2
  • Example:
    • bits = 2 means a -dimensional vector is quantized to bits.
    • bits = 1 means a -dimensional vector is quantized to bits .

m since v0.5.0

  • Description: The maximum number of neighbors per vertex. The larger m is, the better the performance, but the higher the storage requirement. m corresponds to in HNSW and in DiskANN.
  • Type: integer
  • Default: 32
  • Example:
    • m = 32 means that there are at most neighbors per vertex.
    • m = 64 means that there are at most neighbors per vertex.

ef_construction since v0.5.0

  • Description: The size of the dynamic list containing the nearest neighbors during insertion. The larger ef_construction is, the better the performance, but the slower the insertion. ef_construction corresponds to in HNSW and in DiskANN.
  • Type: integer
  • Default: 64
  • Example:
    • ef_construction = 64 means that the size of the dynamic list containing the nearest neighbors is during insertion.
    • ef_construction = 128 means that the size of the dynamic list containing the nearest neighbors is during insertion.

alpha since v0.5.0

  • Description: The alpha values selected during pruning. alpha corresponds to in DiskANN. This option must be an ascending list, where the first element is 1.0 and the last element is less than 2.0.
  • Type: list of floats
  • Default: [1.0, 1.2]
  • Example:
    • alpha = [1.0, 1.2] is equivalent to setting alpha = 1.2 in DiskANN.
    • alpha = [1.0] is equivalent to the default pruning strategy in HNSW.
  • Note: this option is ineffective when the distance metric is negative inner product.

beam_construction since v0.5.0

  • Description: Beam width used during insertion. Beam width refers to the number of vertices accessed at once. Since the time to randomly read a small number of sectors from the SSD is almost the same as reading a single sector, a larger beam width effectively reduces the number of round trips to the SSD, resulting in better performance. Since it increases computation, it is disadvantageous for indexes that fit entirely in memory.
  • Type: integer
  • Default: 1
  • Example:
    • beam_construction = 8 means that the index accesses 8 vertices at once during insertion.
    • beam_construction = 1 means that the index accesses 1 vertex at once during insertion.

Search Parameters vchordg

  • Description: The size of the dynamic list containing the nearest neighbors in search. The larger vchordg.ef_search is, the better the recall, but the worse the QPS. vchordg.ef_search corresponds to in HNSW and DiskANN.
  • Type: integer
  • Default: 64
  • Domain: [1, 65535]
  • Example:
    • SET vchordg.ef_search = 64 indicates the size of the dynamic list containing the nearest neighbors is during search.
    • SET vchordg.ef_search = 128 indicates the size of the dynamic list containing the nearest neighbors is during search.
  • Description: Beam widths used during search. Beam width refers to the number of vertices accessed at once. Since the time to randomly read a small number of sectors from the SSD is almost the same as reading a single sector, a larger beam width effectively reduces the number of round trips to the SSD, resulting in better performance. Since it increases computation, it is disadvantageous for indexes that fit entirely in memory.
  • Type: integer
  • Default: 1
  • Domain: [1, 65535]
  • Example:
    • SET vchordg.beam_search = 8 indicates that the index accesses 8 vertices at once during search.
    • SET vchordg.beam_search = 1 indicates that the index accesses 1 vertex at once during search.