SQL Reference

Complete SQL language reference for ThunderDB covering DDL, DML, transactions, vector operations, foreign data wrappers, and built-in functions.

SQL Reference

ThunderDB implements a rich SQL dialect that is largely compatible with PostgreSQL. This reference documents every statement, data type, operator, and function available in ThunderDB.


Data Types

ThunderDB supports the following data types:

Numeric Types

TypeSizeRangeDescription
BOOLEAN1 bytetrue / falseLogical boolean
INT8 (alias TINYINT)1 byte-128 to 1278-bit signed integer
INT16 (alias SMALLINT)2 bytes-32,768 to 32,76716-bit signed integer
INT32 (alias INT, INTEGER)4 bytes-2^31 to 2^31-132-bit signed integer
INT64 (alias BIGINT)8 bytes-2^63 to 2^63-164-bit signed integer
FLOAT32 (alias REAL, FLOAT)4 bytesIEEE 754 single32-bit floating point
FLOAT64 (alias DOUBLE PRECISION, DOUBLE)8 bytesIEEE 754 double64-bit floating point
DECIMAL(p, s) (alias NUMERIC)variableUp to 38 digitsExact decimal with precision p and scale s

String Types

TypeMax SizeDescription
STRING (alias TEXT)2 GBVariable-length unlimited string
CHAR(n)n bytesFixed-length string, blank-padded
VARCHAR(n)n bytesVariable-length string with max length

Binary Types

TypeMax SizeDescription
BINARY (alias BYTEA)2 GBVariable-length binary data

Date & Time Types

TypeSizeDescription
DATE4 bytesCalendar date (year, month, day)
TIME8 bytesTime of day without timezone
TIMESTAMP8 bytesDate and time without timezone
TIMESTAMPTZ (alias TIMESTAMP WITH TIME ZONE)8 bytesDate and time with timezone
INTERVAL16 bytesTime duration

Other Types

TypeSizeDescription
UUID16 bytesUniversally unique identifier
JSONvariableJSON data stored as text
JSONBvariableJSON data stored in decomposed binary format (indexable)
ARRAYvariableOne-dimensional array of any scalar type
VECTOR(dim)dim * 4 bytesFixed-dimension vector of FLOAT32 elements

Type Casting

ThunderDB supports explicit casting with CAST() and the :: operator:

SELECT CAST('2025-01-15' AS DATE);
SELECT '42'::INT64;
SELECT '[1.0, 2.0, 3.0]'::VECTOR(3);

DDL (Data Definition Language)

CREATE TABLE

Create a new table with specified columns, constraints, and storage engine.

Syntax:

CREATE TABLE [IF NOT EXISTS] [schema.]table_name (
    column_name data_type [NOT NULL] [DEFAULT expr] [PRIMARY KEY],
    ...
    [CONSTRAINT name PRIMARY KEY (col1, col2, ...)],
    [CONSTRAINT name UNIQUE (col1, col2, ...)],
    [CONSTRAINT name FOREIGN KEY (col) REFERENCES other_table(col)
        [ON DELETE CASCADE|SET NULL|RESTRICT]
        [ON UPDATE CASCADE|SET NULL|RESTRICT]],
    [CONSTRAINT name CHECK (expression)]
)
[ENGINE = ROW | COLUMNAR]
[PARTITION BY RANGE|HASH|LIST (column)]
[WITH (option = value, ...)];

Storage Engines:

EngineBest ForDescription
ROW (default)OLTP, transactionalRow-oriented storage, optimized for point lookups and writes
COLUMNAROLAP, analyticsColumn-oriented storage, optimized for scans and aggregations

Examples:

-- Basic table
CREATE TABLE users (
    id        BIGINT PRIMARY KEY,
    name      VARCHAR(255) NOT NULL,
    email     VARCHAR(255) NOT NULL UNIQUE,
    active    BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Table with composite primary key
CREATE TABLE order_items (
    order_id   BIGINT NOT NULL,
    product_id BIGINT NOT NULL,
    quantity   INT32 NOT NULL CHECK (quantity > 0),
    unit_price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE
);

-- Columnar table for analytics
CREATE TABLE analytics_events (
    event_id    BIGINT PRIMARY KEY,
    user_id     BIGINT,
    event_type  VARCHAR(50) NOT NULL,
    properties  JSONB,
    occurred_at TIMESTAMPTZ NOT NULL
) ENGINE = COLUMNAR;

-- Partitioned table
CREATE TABLE logs (
    id         BIGINT PRIMARY KEY,
    level      VARCHAR(10),
    message    TEXT,
    created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

-- Table with vector column
CREATE TABLE documents (
    id        BIGINT PRIMARY KEY,
    title     VARCHAR(500),
    body      TEXT,
    embedding VECTOR(1536),
    metadata  JSONB
);

-- Table with array columns
CREATE TABLE tags (
    id     BIGINT PRIMARY KEY,
    name   VARCHAR(100),
    labels ARRAY<VARCHAR(50)>
);

-- Conditional creation
CREATE TABLE IF NOT EXISTS sessions (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id    BIGINT NOT NULL REFERENCES users(id),
    token      VARCHAR(255) NOT NULL,
    expires_at TIMESTAMPTZ NOT NULL
);

DROP TABLE

Remove a table and its data.

Syntax:

DROP TABLE [IF EXISTS] [schema.]table_name [CASCADE | RESTRICT];

Examples:

-- Drop a table (error if it does not exist)
DROP TABLE sessions;

-- Drop only if it exists
DROP TABLE IF EXISTS sessions;

-- Drop and cascade to dependent objects (indexes, foreign keys)
DROP TABLE users CASCADE;

ALTER TABLE

Modify an existing table structure.

Syntax:

ALTER TABLE [schema.]table_name
    ADD COLUMN column_name data_type [constraints]
    | DROP COLUMN column_name [CASCADE]
    | ALTER COLUMN column_name SET DATA TYPE data_type
    | ALTER COLUMN column_name SET DEFAULT expr
    | ALTER COLUMN column_name DROP DEFAULT
    | ALTER COLUMN column_name SET NOT NULL
    | ALTER COLUMN column_name DROP NOT NULL
    | ADD CONSTRAINT constraint_def
    | DROP CONSTRAINT constraint_name
    | RENAME TO new_table_name
    | RENAME COLUMN old_name TO new_name;

Examples:

-- Add a column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Drop a column
ALTER TABLE users DROP COLUMN phone;

-- Change column type
ALTER TABLE users ALTER COLUMN name SET DATA TYPE VARCHAR(500);

-- Add a check constraint
ALTER TABLE orders ADD CONSTRAINT positive_total CHECK (total >= 0);

-- Rename a table
ALTER TABLE users RENAME TO customers;

-- Rename a column
ALTER TABLE orders RENAME COLUMN total TO order_total;

CREATE INDEX

Create an index on one or more columns for faster lookups.

Syntax:

CREATE [UNIQUE] INDEX [IF NOT EXISTS] index_name
    ON [schema.]table_name
    USING method (column_name [ASC|DESC] [NULLS FIRST|LAST], ...)
    [WITH (option = value, ...)]
    [WHERE predicate];

Supported Index Methods:

MethodUse CaseDescription
BTREE (default)Equality, range, orderingBalanced tree, supports <, <=, =, >=, >
HASHEquality onlyHash table, supports only =
HNSWVector ANN searchHierarchical navigable small world graph
IVF_PQLarge-scale vector searchInverted file with product quantization
GINFull-text, JSONB, arraysGeneralized inverted index
BRINLarge ordered datasetsBlock range index, very compact

Examples:

-- B-tree index (default)
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Unique index
CREATE UNIQUE INDEX idx_users_email ON users (email);

-- Composite index
CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at DESC);

-- Partial index (conditional)
CREATE INDEX idx_orders_pending ON orders (created_at)
    WHERE status = 'pending';

-- HNSW vector index
CREATE INDEX idx_docs_embedding ON documents
    USING HNSW (embedding)
    WITH (m = 16, ef_construction = 200, distance_metric = 'cosine');

-- IVF-PQ vector index for large datasets
CREATE INDEX idx_docs_embedding_ivf ON documents
    USING IVF_PQ (embedding)
    WITH (nlist = 1024, m_pq = 64, distance_metric = 'l2');

-- GIN index for JSONB
CREATE INDEX idx_events_properties ON analytics_events USING GIN (properties);

-- GIN index for full-text search
CREATE INDEX idx_docs_body_fts ON documents USING GIN (to_tsvector('english', body));

-- BRIN index for time-series data
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at)
    WITH (pages_per_range = 32);

DROP INDEX

Remove an index.

Syntax:

DROP INDEX [IF EXISTS] index_name [CASCADE | RESTRICT];

Examples:

DROP INDEX idx_orders_customer;
DROP INDEX IF EXISTS idx_docs_embedding;

DML (Data Manipulation Language)

INSERT

Insert one or more rows into a table.

Syntax:

INSERT INTO [schema.]table_name [(column1, column2, ...)]
    VALUES (value1, value2, ...) [, (value1, value2, ...), ...]
    [ON CONFLICT (column) DO NOTHING | DO UPDATE SET col = expr, ...]
    [RETURNING column1, column2, ...];

Examples:

-- Insert a single row
INSERT INTO users (id, name, email)
VALUES (1, 'Alice Johnson', 'alice@example.com');

-- Insert multiple rows
INSERT INTO users (id, name, email) VALUES
    (2, 'Bob Smith', 'bob@example.com'),
    (3, 'Carol White', 'carol@example.com'),
    (4, 'Dave Brown', 'dave@example.com');

-- Insert with default values
INSERT INTO users (id, name, email)
VALUES (5, 'Eve Green', 'eve@example.com');
-- active defaults to true, created_at defaults to now()

-- Upsert (insert or update on conflict)
INSERT INTO users (id, name, email)
VALUES (1, 'Alice J. Updated', 'alice@example.com')
ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name;

-- Insert and return the inserted row
INSERT INTO orders (customer_id, product_id, quantity, total)
VALUES (1, 42, 3, 149.97)
RETURNING id, created_at;

-- Insert from a SELECT
INSERT INTO order_archive (id, customer_id, total, created_at)
SELECT id, customer_id, total, created_at
FROM orders
WHERE created_at < '2025-01-01';

-- Insert a vector
INSERT INTO documents (id, title, embedding)
VALUES (1, 'Quantum Computing', '[0.1, -0.23, 0.98, ...]'::VECTOR(1536));

UPDATE

Modify existing rows in a table.

Syntax:

UPDATE [schema.]table_name
SET column1 = expr1, column2 = expr2, ...
[FROM other_table]
[WHERE condition]
[RETURNING column1, column2, ...];

Examples:

-- Update a single row
UPDATE users SET name = 'Alice Johnson-Smith' WHERE id = 1;

-- Update multiple columns
UPDATE orders
SET status = 'shipped', shipped_at = now()
WHERE id = 5012;

-- Update with expression
UPDATE products
SET price = price * 1.10
WHERE category = 'electronics';

-- Update with subquery
UPDATE orders
SET status = 'vip'
WHERE customer_id IN (
    SELECT id FROM users WHERE membership = 'gold'
);

-- Update with FROM clause (join-based update)
UPDATE order_items oi
SET unit_price = p.price
FROM products p
WHERE oi.product_id = p.id
  AND p.price_updated_at > oi.created_at;

-- Update and return modified rows
UPDATE users
SET active = false
WHERE last_login < now() - INTERVAL '90 days'
RETURNING id, name, email;

DELETE

Remove rows from a table.

Syntax:

DELETE FROM [schema.]table_name
[USING other_table]
[WHERE condition]
[RETURNING column1, column2, ...];

Examples:

-- Delete a single row
DELETE FROM users WHERE id = 42;

-- Delete with condition
DELETE FROM sessions WHERE expires_at < now();

-- Delete with subquery
DELETE FROM orders
WHERE customer_id IN (
    SELECT id FROM users WHERE active = false
);

-- Delete with USING (join-based delete)
DELETE FROM order_items oi
USING orders o
WHERE oi.order_id = o.id AND o.status = 'cancelled';

-- Delete all rows (truncate-like)
DELETE FROM temp_imports;

-- Delete and return removed rows
DELETE FROM users
WHERE active = false
RETURNING id, name, email;

SELECT

Query data from one or more tables.

Syntax:

SELECT [DISTINCT] column_expr [AS alias], ...
FROM [schema.]table_name [AS alias]
    [JOIN type join_table ON condition]
    [WHERE condition]
    [GROUP BY column_or_expr, ...]
    [HAVING condition]
    [ORDER BY column_or_expr [ASC|DESC] [NULLS FIRST|LAST], ...]
    [LIMIT count]
    [OFFSET count];

Basic Queries

-- Select all columns
SELECT * FROM users;

-- Select specific columns
SELECT name, email FROM users;

-- Select with alias
SELECT
    name AS customer_name,
    email AS customer_email
FROM users;

-- Select with expression
SELECT
    name,
    price,
    price * 0.9 AS discounted_price
FROM products;

-- Distinct values
SELECT DISTINCT category FROM products;

-- Count rows
SELECT COUNT(*) FROM orders;

WHERE Clause

-- Comparison operators
SELECT * FROM products WHERE price > 100;
SELECT * FROM products WHERE price BETWEEN 50 AND 200;
SELECT * FROM users WHERE name LIKE 'A%';
SELECT * FROM users WHERE name ILIKE '%smith%';  -- case-insensitive
SELECT * FROM users WHERE email IS NOT NULL;

-- Logical operators
SELECT * FROM products
WHERE category = 'electronics' AND price < 500;

SELECT * FROM products
WHERE category = 'electronics' OR category = 'books';

-- IN operator
SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');

-- NOT operator
SELECT * FROM orders WHERE status NOT IN ('cancelled', 'refunded');

-- EXISTS subquery
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = u.id
);

-- ANY / ALL
SELECT * FROM products
WHERE price > ALL (SELECT price FROM products WHERE category = 'books');

-- JSONB operators
SELECT * FROM analytics_events
WHERE properties->>'source' = 'mobile';

SELECT * FROM analytics_events
WHERE properties @> '{"os": "ios"}'::JSONB;

JOIN Types

-- INNER JOIN
SELECT u.name, o.id AS order_id, o.total
FROM users u
INNER JOIN orders o ON u.id = o.customer_id;

-- LEFT JOIN
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
GROUP BY u.name;

-- RIGHT JOIN
SELECT u.name, o.id AS order_id
FROM orders o
RIGHT JOIN users u ON u.id = o.customer_id;

-- FULL OUTER JOIN
SELECT u.name, o.id AS order_id
FROM users u
FULL OUTER JOIN orders o ON u.id = o.customer_id;

-- CROSS JOIN
SELECT u.name, p.name AS product_name
FROM users u
CROSS JOIN products p;

-- Self-join
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

-- Multiple joins
SELECT
    u.name AS customer,
    p.name AS product,
    oi.quantity,
    oi.unit_price
FROM orders o
JOIN users u ON o.customer_id = u.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.status = 'shipped';

GROUP BY and HAVING

-- Basic aggregation
SELECT category, COUNT(*) AS product_count, AVG(price) AS avg_price
FROM products
GROUP BY category;

-- Group by with HAVING
SELECT customer_id, SUM(total) AS lifetime_value
FROM orders
GROUP BY customer_id
HAVING SUM(total) > 10000
ORDER BY lifetime_value DESC;

-- Group by with expressions
SELECT
    date_trunc('month', created_at) AS month,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
WHERE created_at >= '2025-01-01'
GROUP BY date_trunc('month', created_at)
ORDER BY month;

-- Multiple aggregations
SELECT
    category,
    COUNT(*) AS count,
    MIN(price) AS min_price,
    MAX(price) AS max_price,
    AVG(price) AS avg_price,
    SUM(price) AS total_value
FROM products
GROUP BY category
ORDER BY total_value DESC;

ORDER BY, LIMIT, and OFFSET

-- Order ascending (default)
SELECT * FROM products ORDER BY price;

-- Order descending
SELECT * FROM products ORDER BY price DESC;

-- Multiple order columns
SELECT * FROM products ORDER BY category ASC, price DESC;

-- Null handling
SELECT * FROM products ORDER BY discount NULLS LAST;

-- Pagination
SELECT * FROM products
ORDER BY id
LIMIT 20 OFFSET 40;  -- Page 3, 20 items per page

-- Top-N query
SELECT * FROM orders
ORDER BY total DESC
LIMIT 10;

Subqueries and CTEs

-- Scalar subquery
SELECT name,
       (SELECT COUNT(*) FROM orders o WHERE o.customer_id = u.id) AS order_count
FROM users u;

-- Derived table (subquery in FROM)
SELECT top_customers.name, top_customers.total_spent
FROM (
    SELECT u.name, SUM(o.total) AS total_spent
    FROM users u
    JOIN orders o ON u.id = o.customer_id
    GROUP BY u.name
    ORDER BY total_spent DESC
    LIMIT 100
) AS top_customers;

-- Common Table Expression (CTE)
WITH monthly_revenue AS (
    SELECT
        date_trunc('month', created_at) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY 1
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
    revenue - LAG(revenue) OVER (ORDER BY month) AS revenue_change
FROM monthly_revenue
ORDER BY month;

-- Recursive CTE (hierarchical data)
WITH RECURSIVE org_tree AS (
    -- Base case: top-level managers
    SELECT id, name, manager_id, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive step
    SELECT e.id, e.name, e.manager_id, t.depth + 1
    FROM employees e
    JOIN org_tree t ON e.manager_id = t.id
)
SELECT * FROM org_tree ORDER BY depth, name;

Window Functions

-- Row number
SELECT
    name,
    category,
    price,
    ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rank
FROM products;

-- Running total
SELECT
    id,
    created_at,
    total,
    SUM(total) OVER (ORDER BY created_at ROWS UNBOUNDED PRECEDING) AS running_total
FROM orders;

-- Moving average
SELECT
    date,
    revenue,
    AVG(revenue) OVER (
        ORDER BY date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS seven_day_avg
FROM daily_revenue;

-- Rank and Dense Rank
SELECT
    name,
    category,
    price,
    RANK() OVER (PARTITION BY category ORDER BY price DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC) AS dense_rank
FROM products;

-- Lead and Lag
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS prev_month,
    LEAD(revenue, 1) OVER (ORDER BY month) AS next_month
FROM monthly_revenue;

-- NTILE (buckets)
SELECT
    name,
    price,
    NTILE(4) OVER (ORDER BY price) AS price_quartile
FROM products;

-- First/Last value
SELECT DISTINCT
    category,
    FIRST_VALUE(name) OVER (PARTITION BY category ORDER BY price ASC) AS cheapest,
    FIRST_VALUE(name) OVER (PARTITION BY category ORDER BY price DESC) AS most_expensive
FROM products;

Set Operations

-- UNION (deduplicated)
SELECT name, email FROM users
UNION
SELECT name, email FROM archived_users;

-- UNION ALL (keep duplicates)
SELECT id, total FROM orders_2024
UNION ALL
SELECT id, total FROM orders_2025;

-- INTERSECT
SELECT customer_id FROM orders
INTERSECT
SELECT id FROM users WHERE active = true;

-- EXCEPT
SELECT id FROM users
EXCEPT
SELECT customer_id FROM orders;

Transactions

ThunderDB supports full ACID transactions with multiple isolation levels.

BEGIN

Start a new transaction.

BEGIN;
-- or
BEGIN TRANSACTION;
-- or with isolation level
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

COMMIT

Commit the current transaction, making all changes permanent.

COMMIT;
-- or
COMMIT TRANSACTION;

ROLLBACK

Roll back the current transaction, discarding all changes.

ROLLBACK;
-- or
ROLLBACK TRANSACTION;

SAVEPOINT

Create a savepoint within a transaction for partial rollback.

-- Create a savepoint
SAVEPOINT my_savepoint;

-- Roll back to savepoint (undo changes since savepoint, but keep transaction open)
ROLLBACK TO SAVEPOINT my_savepoint;

-- Release a savepoint (no longer needed)
RELEASE SAVEPOINT my_savepoint;

SET TRANSACTION ISOLATION LEVEL

Set the isolation level for the current transaction.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Isolation Levels:

LevelDirty ReadNon-Repeatable ReadPhantom ReadWrite Skew
READ COMMITTEDNoPossiblePossiblePossible
REPEATABLE READNoNoPossiblePossible
SERIALIZABLENoNoNoNo

Transaction Examples

-- Simple transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
INSERT INTO transfers (from_id, to_id, amount) VALUES (1, 2, 100);
COMMIT;

-- Transaction with savepoint
BEGIN;
INSERT INTO orders (customer_id, total) VALUES (42, 299.99);
SAVEPOINT before_items;
INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 10, 2);
-- Oops, wrong product
ROLLBACK TO SAVEPOINT before_items;
INSERT INTO order_items (order_id, product_id, quantity) VALUES (1, 15, 2);
COMMIT;

-- Read-only transaction for consistent analytics
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION READ ONLY;
SELECT COUNT(*) FROM orders WHERE status = 'pending';
SELECT SUM(total) FROM orders WHERE status = 'pending';
-- Both queries see the same snapshot
COMMIT;

-- Serializable transaction for inventory check
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT quantity FROM inventory WHERE product_id = 42;
-- Application checks if quantity >= requested amount
UPDATE inventory SET quantity = quantity - 5 WHERE product_id = 42;
INSERT INTO orders (customer_id, product_id, quantity) VALUES (1, 42, 5);
COMMIT;
-- If another transaction modified inventory concurrently, this will
-- fail with a serialization error and should be retried

Vector Operations

ThunderDB natively supports vector embeddings for similarity search, enabling AI/ML workloads alongside traditional SQL.

Creating Tables with Vector Columns

-- 1536-dimensional vectors (OpenAI text-embedding-3-small)
CREATE TABLE documents (
    id        BIGINT PRIMARY KEY,
    title     VARCHAR(500),
    body      TEXT,
    embedding VECTOR(1536)
);

-- 768-dimensional vectors (sentence-transformers)
CREATE TABLE products (
    id          BIGINT PRIMARY KEY,
    name        VARCHAR(255),
    description TEXT,
    image_embed VECTOR(512),
    text_embed  VECTOR(768)
);

Inserting Vectors

-- Insert with array literal
INSERT INTO documents (id, title, embedding)
VALUES (1, 'Hello World', '[0.1, -0.2, 0.3, ...]'::VECTOR(1536));

-- Insert from application (using bind parameter)
INSERT INTO documents (id, title, embedding)
VALUES ($1, $2, $3);
-- where $3 is a float array of length 1536

Similarity Operators

OperatorDistance MetricDescription
<->L2 (Euclidean)Euclidean distance between two vectors
<=>CosineCosine distance (1 - cosine similarity)
<#>Inner ProductNegative inner product (for max inner product search)

Vector Search Queries

-- K-nearest neighbors with L2 distance
SELECT id, title, embedding <-> $1 AS distance
FROM documents
ORDER BY embedding <-> $1
LIMIT 10;

-- Cosine similarity search
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;

-- Inner product search (for normalized vectors)
SELECT id, title, embedding <#> $1 AS score
FROM documents
ORDER BY embedding <#> $1
LIMIT 10;

-- Filtered vector search (hybrid search)
SELECT id, title, embedding <-> $1 AS distance
FROM documents
WHERE category = 'science' AND published = true
ORDER BY embedding <-> $1
LIMIT 10;

-- Vector search with metadata join
SELECT
    d.id,
    d.title,
    d.embedding <-> $1 AS distance,
    a.name AS author
FROM documents d
JOIN authors a ON d.author_id = a.id
WHERE d.embedding <-> $1 < 0.5
ORDER BY d.embedding <-> $1
LIMIT 20;

Creating Vector Indexes

-- HNSW index (recommended for most use cases)
CREATE INDEX idx_docs_embed ON documents
    USING HNSW (embedding)
    WITH (
        m = 16,                    -- Max connections per node
        ef_construction = 200,     -- Build-time beam width
        distance_metric = 'cosine' -- 'l2', 'cosine', or 'inner_product'
    );

-- IVF-PQ index (for datasets > 1M vectors)
CREATE INDEX idx_docs_embed_ivf ON documents
    USING IVF_PQ (embedding)
    WITH (
        nlist = 1024,              -- Number of clusters
        m_pq = 64,                 -- Number of PQ sub-quantizers
        distance_metric = 'l2'
    );
-- Set HNSW search beam width for the current session
SET hnsw.ef_search = 128;  -- Higher = more accurate but slower

-- Set IVF-PQ probe count
SET ivf.nprobe = 20;  -- Higher = more accurate but slower

-- Then run your query
SELECT id, title, embedding <-> $1 AS distance
FROM documents
ORDER BY embedding <-> $1
LIMIT 10;

Foreign Data Wrappers (FDW)

ThunderDB can query data stored in external systems and join it with local tables. This enables data federation without ETL.

CREATE SERVER

Define a connection to an external data source.

-- PostgreSQL server
CREATE SERVER pg_production
    TYPE 'postgresql'
    OPTIONS (
        host '10.0.1.50',
        port '5432',
        dbname 'production',
        user 'readonly',
        password 'secret'
    );

-- MySQL server
CREATE SERVER mysql_legacy
    TYPE 'mysql'
    OPTIONS (
        host '10.0.2.50',
        port '3306',
        dbname 'legacy_app',
        user 'reader',
        password 'secret'
    );

-- MongoDB server
CREATE SERVER mongo_logs
    TYPE 'mongodb'
    OPTIONS (
        connection_string 'mongodb://10.0.3.50:27017/logs',
        auth_database 'admin',
        user 'reader',
        password 'secret'
    );

-- Amazon S3 (Parquet files)
CREATE SERVER s3_data_lake
    TYPE 's3'
    OPTIONS (
        region 'us-east-1',
        bucket 'my-data-lake',
        access_key_id 'AKIAIOSFODNN7EXAMPLE',
        secret_access_key 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
    );

CREATE FOREIGN TABLE

Map a remote table or collection to a local table definition.

-- From PostgreSQL
CREATE FOREIGN TABLE remote_users (
    id     BIGINT,
    name   VARCHAR(255),
    email  VARCHAR(255),
    active BOOLEAN
) SERVER pg_production
OPTIONS (schema 'public', table 'users');

-- From MySQL
CREATE FOREIGN TABLE legacy_orders (
    order_id   INT32,
    customer   VARCHAR(100),
    total      DECIMAL(10,2),
    order_date DATE
) SERVER mysql_legacy
OPTIONS (table 'orders');

-- From MongoDB
CREATE FOREIGN TABLE mongo_access_logs (
    _id        VARCHAR(24),
    user_id    BIGINT,
    endpoint   VARCHAR(255),
    status     INT32,
    timestamp  TIMESTAMPTZ
) SERVER mongo_logs
OPTIONS (collection 'access_logs');

-- From S3 Parquet files
CREATE FOREIGN TABLE s3_events (
    event_id   BIGINT,
    event_type VARCHAR(50),
    payload    JSONB,
    created_at TIMESTAMPTZ
) SERVER s3_data_lake
OPTIONS (path 'events/year=2025/', format 'parquet');

Querying Foreign Tables

Foreign tables behave like regular tables in queries. ThunderDB pushes down predicates and projections to minimize data transfer.

-- Query a foreign table directly
SELECT * FROM remote_users WHERE active = true LIMIT 100;

-- Join local and foreign tables
SELECT
    u.name,
    u.email,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total) AS lifetime_value
FROM remote_users u
JOIN legacy_orders o ON u.name = o.customer
GROUP BY u.name, u.email
ORDER BY lifetime_value DESC
LIMIT 20;

-- Cross-database federation: PostgreSQL + MySQL + MongoDB in one query
SELECT
    u.name,
    o.total AS order_total,
    COUNT(l._id) AS access_count
FROM remote_users u
JOIN legacy_orders o ON u.name = o.customer
LEFT JOIN mongo_access_logs l ON u.id = l.user_id
GROUP BY u.name, o.total
ORDER BY access_count DESC;

-- Query S3 Parquet files
SELECT
    event_type,
    COUNT(*) AS event_count,
    MIN(created_at) AS first_seen,
    MAX(created_at) AS last_seen
FROM s3_events
WHERE created_at >= '2025-12-01'
GROUP BY event_type
ORDER BY event_count DESC;

DROP SERVER and DROP FOREIGN TABLE

-- Drop a foreign table
DROP FOREIGN TABLE IF EXISTS remote_users;

-- Drop a server (must drop foreign tables first, or use CASCADE)
DROP SERVER IF EXISTS pg_production CASCADE;

Built-in Functions

Aggregate Functions

FunctionDescription
COUNT(*)Count of rows
COUNT(expr)Count of non-null values
COUNT(DISTINCT expr)Count of distinct non-null values
SUM(expr)Sum of values
AVG(expr)Average of values
MIN(expr)Minimum value
MAX(expr)Maximum value
ARRAY_AGG(expr)Collect values into an array
STRING_AGG(expr, delimiter)Concatenate strings with delimiter
BOOL_AND(expr)True if all values are true
BOOL_OR(expr)True if any value is true
STDDEV(expr)Sample standard deviation
VARIANCE(expr)Sample variance
PERCENTILE_CONT(fraction)Continuous percentile
PERCENTILE_DISC(fraction)Discrete percentile

String Functions

FunctionDescriptionExample
LENGTH(s)String lengthLENGTH('hello') = 5
UPPER(s)UppercaseUPPER('hello') = 'HELLO'
LOWER(s)LowercaseLOWER('HELLO') = 'hello'
TRIM(s)Remove whitespaceTRIM(' hi ') = 'hi'
LTRIM(s)Left trimLTRIM(' hi') = 'hi'
RTRIM(s)Right trimRTRIM('hi ') = 'hi'
SUBSTRING(s, start, len)Extract substringSUBSTRING('hello', 2, 3) = 'ell'
REPLACE(s, from, to)Replace occurrencesREPLACE('hello', 'l', 'r') = 'herro'
CONCAT(s1, s2, ...)Concatenate stringsCONCAT('a', 'b', 'c') = 'abc'
SPLIT_PART(s, delim, n)Split and extractSPLIT_PART('a.b.c', '.', 2) = 'b'
REGEXP_MATCH(s, pattern)Regex matchREGEXP_MATCH('abc123', '\d+') = {'123'}
REGEXP_REPLACE(s, p, r)Regex replaceREGEXP_REPLACE('abc', '[a-z]', 'X', 'g') = 'XXX'
STARTS_WITH(s, prefix)Prefix checkSTARTS_WITH('hello', 'he') = true
MD5(s)MD5 hashMD5('hello') = '5d41402abc4b2a76...'

Date & Time Functions

FunctionDescriptionExample
now()Current timestamp with timezone2025-12-15 10:30:00+00
current_dateCurrent date2025-12-15
current_timeCurrent time10:30:00+00
date_trunc(field, ts)Truncate to precisiondate_trunc('month', ts)
date_part(field, ts)Extract fielddate_part('year', ts) = 2025
EXTRACT(field FROM ts)Extract field (SQL standard)EXTRACT(MONTH FROM ts) = 12
age(ts1, ts2)Interval between timestampsage(now(), created_at)
ts + INTERVAL '...'Add intervalnow() + INTERVAL '7 days'
ts - INTERVAL '...'Subtract intervalnow() - INTERVAL '1 hour'
to_char(ts, format)Format timestampto_char(now(), 'YYYY-MM-DD')
to_timestamp(str, fmt)Parse timestampto_timestamp('2025-01-15', 'YYYY-MM-DD')
generate_series(start, stop, step)Generate time seriesSee example below
-- Generate a time series
SELECT ts::DATE AS day
FROM generate_series(
    '2025-01-01'::TIMESTAMP,
    '2025-01-31'::TIMESTAMP,
    '1 day'::INTERVAL
) AS ts;

JSON / JSONB Functions

Function / OperatorDescriptionExample
->Get JSON element by key (returns JSON)data->'name'
->>Get JSON element by key (returns text)data->>'name'
#>Get JSON element by path (returns JSON)data#>'{address,city}'
#>>Get JSON element by path (returns text)data#>>'{address,city}'
@>Containsdata @> '{"active": true}'
<@Contained by'{"a":1}' <@ data
?Key existsdata ? 'email'
jsonb_build_object(k,v,...)Construct JSONBjsonb_build_object('name', 'Alice')
jsonb_agg(expr)Aggregate into JSON arrayjsonb_agg(name)
jsonb_object_agg(k, v)Aggregate into JSON objectjsonb_object_agg(key, value)
jsonb_array_elements(j)Expand JSON array to rowsjsonb_array_elements('[1,2,3]')
jsonb_each(j)Expand JSON object to key-value rowsjsonb_each('{"a":1}')
jsonb_set(j, path, val)Set value at pathjsonb_set(data, '{name}', '"Bob"')
jsonb_strip_nulls(j)Remove null keysjsonb_strip_nulls(data)

Mathematical Functions

FunctionDescription
ABS(x)Absolute value
CEIL(x) / CEILING(x)Round up
FLOOR(x)Round down
ROUND(x, d)Round to d decimal places
TRUNC(x, d)Truncate to d decimal places
MOD(x, y)Modulo
POWER(x, y)x raised to power y
SQRT(x)Square root
LN(x)Natural logarithm
LOG(base, x)Logarithm with base
EXP(x)Exponential (e^x)
RANDOM()Random value between 0 and 1
GREATEST(a, b, ...)Maximum of values
LEAST(a, b, ...)Minimum of values

Conditional Expressions

-- CASE expression
SELECT name,
    CASE
        WHEN price < 10 THEN 'budget'
        WHEN price < 100 THEN 'mid-range'
        ELSE 'premium'
    END AS tier
FROM products;

-- COALESCE (first non-null)
SELECT COALESCE(nickname, name, 'Anonymous') AS display_name
FROM users;

-- NULLIF (return null if equal)
SELECT NULLIF(discount, 0) AS effective_discount
FROM products;

-- IIF (inline if)
SELECT name, IIF(active, 'Active', 'Inactive') AS status
FROM users;

System Commands

-- Show server version
SELECT version();

-- Show current database
SELECT current_database();

-- Show current user
SELECT current_user;

-- Show all tables
SHOW TABLES;

-- Show table schema
DESCRIBE users;
-- or
\d users

-- Show running queries
SELECT * FROM thunder_catalog.running_queries;

-- Show cluster status
SELECT * FROM thunder_catalog.cluster_nodes;

-- Show replication status
SELECT * FROM thunder_catalog.replication_status;

-- Cancel a running query
SELECT thunder_cancel_query('query_id_here');

-- Analyze table statistics (for query planner)
ANALYZE users;

-- Vacuum (reclaim storage)
VACUUM users;
VACUUM FULL users;  -- Rewrites table, reclaims maximum space