Pipelines knowledge bases reference

Suggest edits

This reference documentation for Pipelines knowledge bases includes information on the functions and views available in the aidb extension related to knowledge bases.

Views

`aidb.knowledge_bases`

Also referenceable as aidb.kbs, the aidb.knowledge_bases view provides a list of all knowledge bases in the database. It includes information about the knowledge base name, the model used, and the source data type.

Column	Type	Description
id	integer
name	text	Name of the knowledge base.
vector_schema	text	Schema for vector_table.
vector_table	text	Name of the table where the embeddings are stored. Gets newly created if it doesn’t exist. Managed by aidb.
vector_key_column	text	Column to use to store the key that references the key in source data when computing embeddings. We recommend using the default and letting aidb manage this table.
vector_data_column	text	Column to store embeddings in. We recommend using the default and letting aidb manage this table.
model_name	text	Name of the model to use for embedding computation and retrievals.
topk	integer	Default number of results to return during a retrieve. Similar to LIMIT in SQL.
distance_operator	aidb.DistanceOperator	During retrieval, the vector operation to use to compare the vectors.
options	jsonb	Unused.
auto_processing	aidb.PipelineAutoProcessingMode	Auto-processing mode.
owner_role	text	The Postgres role who created this pipeline. Background auto-processing will run the pipeline as this role.
batch_size	integer	How many records to process concurrently when running this pipeline.
background_sync_interval	interval	Used for background auto-processing. This is the interval between pipeline executions.
source_type	text	Indicates whether this pipeline uses a table or volume as data source.
source_schema	text	Schema for source_table.
source_table	text	Name of the table used as input for the pipeline. Unused if the knowledge base uses a volume as source.
source_data_column	text	Column name in the source table that Pipelines computes embeddings for. This is also the column that's returned in retrieve operations.
source_data_format	aidb.PipelineDataFormat	Format of the data the knowledge base is working with. Uses type `aidb.PipelineDataFormat`.
source_key_column	text	Column to use as key for storing the embedding in the vector table. This provides a reference from the embedding to the source data.
source_volume_name	text	Name of the volume to use as a data source. Only applicable to knowledge bases configured with aidb.create_volume_knowledge_base().

`aidb.knowledge_base_stats`

Also referenceable as aidb.kbstat, the aidb.knowledge_base_stats view provides current statistics about auto processing for knowledge base pipelines.

Column	Type	Description
knowledge base	text	Name of the knowledge base.
auto processing	aidb.PipelineAutoProcessingMode	Auto-processing mode.
table: unprocessed rows	bigint	For table knowledge bases: The number of unprocessed rows that are new or might have changed since the last execution.
volume: scans completed	bigint	For volume knowledge bases: The number of full listings of the source volume that were performed so far.
count(source records)	bigint	The number of records in the source of the knowledge base. For volume knowledge bases, this is number of records seen during the last full scan.
count(embeddings)	bigint	The number of embeddings stored in the destination table.

Types

`aidb.DistanceOperator`

The aidb.DistanceOperator type is an enum that represents the distance operators that can be used during retrieval.

Value	Description
L2	Euclidean distance
Inner	Inner product
Cosine	Cosine similarity
L1	L1 distance
Hamming	Hamming distance
Jaccard	Jaccard distance

SQL definition:

CREATE TYPE DistanceOperator AS ENUM (
  'L2',
  'InnerProduct',
  'Cosine',
  'L1',
  'Hamming',
  'Jaccard'
);

`aidb.PipelineDataFormat`

The aidb.PipelineDataFormat type is an enum that represents the data formats that can be used as source data.

Value	Description
Text	Text data
Image	Image data
Pdf	PDF data

SQL definition:

CREATE TYPE PipelineDataFormat AS ENUM (
  'Text',
  'Image',
  'Pdf'
);

`aidb.PipelineAutoProcessingMode`

The aidb.PipelineAutoProcessingMode type is an enum used to define how auto processing for a Pipeline shoud behave (e.g. a knowledge base or a Preparer).

Value	Description
Live	New data is processed immediately while being added (using Postgres Triggers)
Background	Continuous processing in the background (using Postgres background workers)
Disabled	No automated processing

CREATE TYPE PipelineAutoProcessingMode AS ENUM (
	'Live',
	'Background',
	'Disabled'
);

Functions

`aidb.create_table_knowledge_base`

Creates a knowledge base for a given table.

Parameters

Parameter	Type	Default	Description
name	TEXT	Required	Name of the knowledge base.
model_name	TEXT	Required	Name of the model to use.
source_table	regclass	Required	Name of the table to use as source.
source_data_column	TEXT	Required	Column name in source table to use.
source_data_format	aidb.PipelineDataFormat	Required	Format of data in that column ("Text", "Image", "PDF").
source_key_column	TEXT	'id'	Unique column in the source table to use as key to reference the rows.
vector_table	TEXT	NULL
vector_data_column	TEXT	'embeddings'
vector_key_column	TEXT	'id'
topk	INTEGER	1
distance_operator	aidb.distanceoperator	'L2'
options	JSONB	'{}'::JSONB	Unused
auto_processing	aidb.PipelineAutoProcessingMode	'Disabled'	Configure auto-processing for this pipeline.
index_type	TEXT	'vector'	Type of index to use for the vector table.
batch_size	int	100	How many records to process concurrently.
background_sync_interval	interval	'30 seconds'	Interval between pipeline executions if background auto-processing is configured.

Index_types

If index_type is set to vector, the system will automatically create a hnsw index on the vector table based on the distance operator used in the knowledge base. This is the default index type. The vector index type is only able to support 2000 dimensions or less. If more dimensions are needed, the index type should be set to disabled.
distance_operator index_type
L2 vector_l2_ops
InnerProduct vector_ip_ops
Cosine vector_cosine_ops
L1 vector_l1_ops
If index_type is set to ivfflat, the system will create a IVFFlat index on the vector table.
If index_type is set to disabled, no index will be created.

distance_operator	index_type
L2	`vector_l2_ops`
InnerProduct	`vector_ip_ops`
Cosine	`vector_cosine_ops`
L1	`vector_l1_ops`

Example

SELECT aidb.create_table_knowledge_base(
               name => 'test_knowledge_base',
               model_name => 'bert',
               source_table => 'test_source_table',
               source_data_column => 'content',
               source_data_type => 'Text'
       );

`aidb.create_volume_knowledge_base`

Creates a knowledge base for a given PGFS volume.

Parameters

Parameter	Type	Default	Description
name	TEXT	Required	Name of the knowledge base
model_name	TEXT	Required	Name of the model
source_volume_name	TEXT	Required	Name of the volume
vector_table	TEXT	NULL	Name of the vector table
vector_data_column	TEXT	'embeddings'	Name of the vector column
vector_key_column	TEXT	'id'	Name of the key column
topk	INTEGER	1	Number of results to return
distance_operator	aidb.distanceoperator	'L2'	Distance operator
options	JSONB	'{}'::JSONB	Options
auto_processing	aidb.PipelineAutoProcessingMode	'Disabled'	Configure auto-processing for this pipeline.
index_type	TEXT	'vector'	Type of index to use for the vector table.
batch_size	int	100	How many records to process concurrently.
background_sync_interval	interval	'30 seconds'	Interval between pipeline executions if background auto-processing is configured.

Index_types

If index_type is set to vector, the system will automatically create a hnsw index on the vector table based on the distance operator used in the knowledge base. This is the default index type. The vector index type is only able to support 2000 dimensions or less. If more dimensions are needed, the index type should be set to disabled.
distance_operator index_type
L2 vector_l2_ops
InnerProduct vector_ip_ops
Cosine vector_cosine_ops
L1 vector_l1_ops
If index_type is set to ivfflat, the system will create a IVFFlat index on the vector table.
If index_type is set to disabled, no index will be created.

distance_operator	index_type
L2	`vector_l2_ops`
InnerProduct	`vector_ip_ops`
Cosine	`vector_cosine_ops`
L1	`vector_l1_ops`

Example

SELECT aidb.create_volume_knowledge_base(
               name => 'demo_vol_knowledge_base',
               model_name => 'simple_model',
               source_volume_name => 'demo_bucket_vol'
       );

`aidb.set_auto_knowledge_base`

Sets the auto-processing mode for this knowledge base.

Parameters

Parameter	Type	Description
knowledge_base_name	TEXT	Name of knowledge base for which to enable auto-processing
mode	aidb.PipelineAutoProcessingMode	Desired auto-processing mode
batch_size	INTEGER DEFAULT NULL	How many records to process in one batch in Disabled (aka. Manual) or Background processing
background_sync_interval	INTERVAL DEFAULT NULL	Desired sync interval for background processing

Example

SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Live');
SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Disabled', batch_size => 100);
SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Background');
SELECT aidb.set_auto_knowledge_base('test_knowledge_base', 'Background', batch_size => 100, background_sync_interval => '10 seconds');

`aidb.bulk_embedding`

Compute the embeddings for the source records in this pipeline and store them in vector destination.

The behavior of this function depends on the configured auto_processing mode:

Live and Disabled auto-processing: Directly process all source records.
Background auto-processing: Mark all source records for processing but don't perform the operation. The background worker will perform it.

Parameters

Parameter	Type	Default	Description
knowledge_base_name	TEXT		Name of knowledge base for which to generate embeddings.
silent	BOOL	false	Disable printing status and progress logs.

Example

aidb=# SELECT aidb.bulk_embedding('kb_volume_image_manual');

Output

INFO:  kb_volume_image_manual: (re)setting state table to process all data...
INFO:  kb_volume_image_manual: Starting... Batch size 10, count(known source records): 0, scans completed: 0, count(embeddings): 0
INFO:  kb_volume_image_manual: Batch iteration finished, count(known source records): 10, scans completed: 0, count(embeddings): 10
[...]
INFO:  kb_volume_image_manual: Batch iteration finished, count(known source records): 177, scans completed: 0, count(embeddings): 177
INFO:  kb_volume_image_manual: finished, count(known source records): 177, scans completed: 1, count(embeddings): 177

`aidb.retrieve_key`

Retrieves a key from matching embeddings without looking up the source data.

Parameters

Parameter	Type	Default	Description
knowledge_base_name	TEXT		Name of knowledge base to use for retrieval
query_string	TEXT		Query string to use for retrieval
number_of_results	INTEGER	0	Number of results to return

Example

SELECT * FROM aidb.retrieve_key('test_knowledge_base', 'shoes', 2);

Output

  key  |      distance
-------+--------------------
 43941 | 0.2938963414490189
 19337 | 0.3023805122617119
(2 rows)

`aidb.retrieve_text`

Retrieves the source text data from matching embeddings by joining the embeddings with the source table.

Parameters

Parameter	Type	Default	Description
knowledge_base_name	TEXT		Name of knowledge base to use for retrieval
query_string	TEXT		Query string to use for retrieval
number_of_results	INTEGER	0	Number of results to return

Returns

Column	Type	Description
`key`	text	Key of the retrieved data
`value`	text	Value of the retrieved data
`distance`	double precision	Distance of the retrieved data from the query

Example

SELECT * FROM aidb.retrieve_text('test_knowledge_base', 'jacket', 2);

Output

  key  |                       value                        |      distance
-------+----------------------------------------------------+--------------------
 19337 | United Colors of Benetton Men Stripes Black Jacket | 0.2994317672742334
 55018 | Lakme 3 in 1 Orchid  Aqua Shine Lip Color          | 0.3804609668507203
(2 rows)

`aidb.delete_knowledge_base`

Deletes only the knowledge base's configuration from the database.

Parameters

Parameter	Type	Default	Description
knowledge_base_name	TEXT		Name of knowledge base to delete

Example

select aidb.delete_knowledge_base('test_knowledge_base');

Output

 delete_knowledge_base
------------------

(1 row)

`aidb.create_volume`

Creates a volume from a PGFS storage location for use as a data source in knowledge bases.

Parameters

Parameter	Type	Description
name	TEXT	Name of the volume to create
server_name	TEXT	Name of the storage location to use for the volume
path	TEXT	Path to the volume in the storage location
mime_type	TEXT	Type of data in the volume (Text or Image)

Example

select aidb.create_volume('demo_bucket_vol', 'demo_bucket', 'demo_bucket/demo_folder', 'Text');

Output

 create_volume
---------------
(1 row)

`aidb.list_volumes`

Lists all the volumes that were created in the database.

Example

select * from aidb.list_volumes();

`aidb.delete_volume`

Deletes a volume from the database.

Parameters

Parameter	Type	Default	Description
volume_name	TEXT		Name of the volume to delete

Example

select aidb.delete_volume('demo_bucket_vol');

Output

 delete_volume
-------------
(1 row)

← Prev

Pipelines reference - Preparers

↑ Up

Pipelines reference

Pipelines PGFS reference

Could this page be better? Report a problem or suggest an addition!

Pipelines knowledge bases reference

Views

aidb.knowledge_bases

aidb.knowledge_base_stats

Types

aidb.DistanceOperator

aidb.PipelineDataFormat

aidb.PipelineAutoProcessingMode

Functions

aidb.create_table_knowledge_base

Parameters

Index_types

Example

aidb.create_volume_knowledge_base

Parameters

Index_types

Example

aidb.set_auto_knowledge_base

Parameters

Example

aidb.bulk_embedding

Parameters

Example

aidb.retrieve_key

Parameters

Example

aidb.retrieve_text

Parameters

Returns

Example

aidb.delete_knowledge_base

Parameters

Example

aidb.create_volume

Parameters

Example

aidb.list_volumes

Example

aidb.delete_volume

Parameters

Example

← Prev

↑ Up

Next →

`aidb.knowledge_bases`

`aidb.knowledge_base_stats`

`aidb.DistanceOperator`

`aidb.PipelineDataFormat`

`aidb.PipelineAutoProcessingMode`

`aidb.create_table_knowledge_base`

`aidb.create_volume_knowledge_base`

`aidb.set_auto_knowledge_base`

`aidb.bulk_embedding`

`aidb.retrieve_key`

`aidb.retrieve_text`

`aidb.delete_knowledge_base`

`aidb.create_volume`

`aidb.list_volumes`

`aidb.delete_volume`