Configuration
ThunderDB is configured through a TOML configuration file, typically located at /etc/thunderdb/thunderdb.toml. Every setting can also be overridden via environment variables, making it easy to customize behavior in containerized deployments.
Configuration File Format
The configuration file is organized into the following sections:
[node]
[network]
[storage]
[cluster]
[security]
[logging]
Complete Reference Configuration
Below is a fully annotated configuration file with all available settings and their defaults:
# =============================================================================
# ThunderDB Configuration File
# =============================================================================
# -----------------------------------------------------------------------------
# Node Settings
# -----------------------------------------------------------------------------
[node]
# Unique identifier for this node in the cluster.
# Must be unique across all nodes. In Kubernetes, this is typically derived
# from the pod ordinal index.
node_id = 1
# -----------------------------------------------------------------------------
# Network Settings
# -----------------------------------------------------------------------------
[network]
# Address to bind all listeners to.
# Use "0.0.0.0" to listen on all interfaces, or a specific IP to restrict.
listen_addr = "0.0.0.0"
# PostgreSQL wire protocol port.
# Compatible with psql, pgcli, and all PostgreSQL client libraries.
pg_port = 5432
# MySQL wire protocol port.
# Compatible with mysql CLI and all MySQL client libraries.
mysql_port = 3306
# RESP (Redis Serialization Protocol) port.
# Compatible with redis-cli and all Redis client libraries.
resp_port = 6379
# HTTP API port.
# Used for REST API, admin endpoints, metrics, and health checks.
http_port = 8088
# gRPC port.
# Used for inter-node communication and the native gRPC client API.
grpc_port = 9090
# -----------------------------------------------------------------------------
# Storage Settings
# -----------------------------------------------------------------------------
[storage]
# Directory for storing data files (pages, indexes, metadata).
data_dir = "/var/lib/thunderdb/data"
# Directory for write-ahead log (WAL) files.
# For best performance, place on a separate disk from data_dir.
wal_dir = "/var/lib/thunderdb/wal"
# Size of the buffer pool (in-memory page cache).
# This is the single most important tuning parameter. Larger values improve
# read performance by keeping more pages in memory.
# Supports suffixes: KB, MB, GB.
buffer_pool_size = "128MB"
# Size of the WAL write buffer.
# Larger values improve write throughput by batching WAL writes.
wal_buffer_size = "16MB"
# Size of each data page.
# Changing this after initialization requires a full data migration.
# Valid values: 4KB, 8KB, 16KB, 32KB.
page_size = "16KB"
# Interval between automatic checkpoints.
# Checkpoints flush dirty pages to disk, reducing recovery time.
# Lower values reduce recovery time but increase I/O.
checkpoint_interval = "60s"
# Number of threads dedicated to background compaction.
# More threads speed up compaction but consume CPU.
compaction_threads = 2
# Enable direct I/O to bypass the OS page cache.
# Recommended for production to avoid double-caching.
direct_io = false
# Enable compression for data pages on disk.
# Reduces storage requirements at a small CPU cost.
compression = true
# Compression algorithm to use when compression is enabled.
# Options: "Lz4" (fast), "Snappy" (balanced), "Zstd" (high ratio).
compression_algorithm = "Lz4"
# Maximum WAL size before forcing a checkpoint.
# When the WAL reaches this size, a checkpoint is triggered regardless
# of the checkpoint_interval.
max_wal_size = "1GB"
# Whether to flush WAL to disk on every commit.
# true: Guarantees durability (no data loss on crash). Recommended for production.
# false: Better write performance but risks losing the last few transactions on crash.
sync_commit = true
# -----------------------------------------------------------------------------
# Cluster Settings
# -----------------------------------------------------------------------------
[cluster]
# Cluster name. All nodes in the same cluster must use the same name.
cluster_name = "default"
# List of peer node addresses (host:grpc_port).
# Exclude the current node's address.
peers = []
# Raft election timeout.
# If a follower doesn't hear from the leader within this duration,
# it starts a new election. Must be greater than raft_heartbeat_interval.
# For WAN deployments, increase to 3-5s.
raft_election_timeout = "1s"
# Raft heartbeat interval.
# The leader sends heartbeats at this interval.
# Must be significantly less than raft_election_timeout (typically 1/10th).
raft_heartbeat_interval = "100ms"
# Number of copies of each data region.
# 3 is recommended for production (tolerates 1 node failure).
# Cannot exceed the number of nodes in the cluster.
replication_factor = 3
# Maximum size of a single data region before it is split.
# Smaller regions enable finer-grained load balancing.
max_region_size = "256MB"
# Minimum size of a single data region before it is merged.
# Prevents excessive fragmentation from many small regions.
min_region_size = "64MB"
# Enable automatic region balancing across nodes.
# When enabled, the leader periodically rebalances regions
# to maintain even distribution.
auto_balance = true
# -----------------------------------------------------------------------------
# Security Settings
# -----------------------------------------------------------------------------
[security]
# Enable client authentication.
# When false, all connections are accepted without credentials.
authentication_enabled = false
# Enable TLS for all client-facing protocols.
tls_enabled = false
# Path to the TLS certificate file (PEM format).
tls_cert_path = ""
# Path to the TLS private key file (PEM format).
tls_key_path = ""
# Superuser account name.
superuser = "admin"
# Superuser password (plaintext, for initial setup only).
# In production, use THUNDERDB_SUPERUSER_PASSWORD_HASH environment variable
# with an Argon2 hash instead.
superuser_password = ""
# -----------------------------------------------------------------------------
# Logging Settings
# -----------------------------------------------------------------------------
[logging]
# Log level. Options: "trace", "debug", "info", "warn", "error".
# Use "info" for production, "debug" for development, "trace" for deep debugging.
level = "info"
# Log output format. Options: "text" (human-readable), "json" (structured).
# Use "json" for production environments with log aggregation.
format = "text"
# Enable slow query logging.
# Queries exceeding the threshold are logged at WARN level.
slow_query_enabled = true
# Threshold for slow query logging.
# Queries taking longer than this are logged.
slow_query_threshold = "1s"
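Several comments in the reference file state constraints: the valid page_size values, the allowed compression algorithms and log levels, and the requirement that raft_election_timeout be greater than raft_heartbeat_interval. The sketch below checks those constraints on an already-parsed configuration held in a plain dictionary. It is an illustration only, not ThunderDB's own validation: the function names are hypothetical, and the duration parser assumes only the ms and s suffixes that appear in this file.

```python
import re

# Allowed values, copied from the comments in the reference configuration.
VALID_PAGE_SIZES = {"4KB", "8KB", "16KB", "32KB"}
VALID_ALGORITHMS = {"Lz4", "Snappy", "Zstd"}
VALID_LOG_LEVELS = {"trace", "debug", "info", "warn", "error"}


def parse_duration_ms(text):
    """Convert strings such as "100ms" or "1s" to milliseconds.

    Assumption: only the "ms" and "s" suffixes are handled.
    """
    match = re.fullmatch(r"(\d+)(ms|s)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * (1000 if unit == "s" else 1)


def check_config(cfg):
    """Return a list of problems found; an empty list means none were found.

    Missing keys fall back to the documented defaults.
    """
    problems = []
    storage = cfg.get("storage", {})
    cluster = cfg.get("cluster", {})
    logging_cfg = cfg.get("logging", {})

    page_size = storage.get("page_size", "16KB")
    if page_size not in VALID_PAGE_SIZES:
        problems.append(f"storage.page_size {page_size!r} is not a valid page size")

    algorithm = storage.get("compression_algorithm", "Lz4")
    if algorithm not in VALID_ALGORITHMS:
        problems.append(f"storage.compression_algorithm {algorithm!r} is not supported")

    level = logging_cfg.get("level", "info")
    if level not in VALID_LOG_LEVELS:
        problems.append(f"logging.level {level!r} is not a known log level")

    election = parse_duration_ms(cluster.get("raft_election_timeout", "1s"))
    heartbeat = parse_duration_ms(cluster.get("raft_heartbeat_interval", "100ms"))
    if election <= heartbeat:
        problems.append(
            "cluster.raft_election_timeout must be greater than "
            "cluster.raft_heartbeat_interval"
        )
    return problems
```

On Python 3.11 or later, the dictionary can be produced with tomllib.load on the configuration file opened in binary mode.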
Section Reference
[node]
| Parameter | Type | Default | Description |
|---|---|---|---|
| node_id | integer | 1 | Unique node identifier within the cluster. |
[network]
| Parameter | Type | Default | Description |
|---|---|---|---|
| listen_addr | string | "0.0.0.0" | Bind address for all listeners. |
| pg_port | integer | 5432 | PostgreSQL wire protocol port. |
| mysql_port | integer | 3306 | MySQL wire protocol port. |
| resp_port | integer | 6379 | RESP (Redis) wire protocol port. |
| http_port | integer | 8088 | HTTP API and admin endpoint port. |
| grpc_port | integer | 9090 | gRPC port for inter-node and client communication. |
[storage]
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_dir | string | "/var/lib/thunderdb/data" | Data file storage directory. |
| wal_dir | string | "/var/lib/thunderdb/wal" | WAL file storage directory. |
| buffer_pool_size | size | "128MB" | In-memory page cache size. |
| wal_buffer_size | size | "16MB" | WAL write buffer size. |
| page_size | size | "16KB" | Data page size (4KB, 8KB, 16KB, or 32KB). Cannot be changed after initialization without a full data migration. |
| checkpoint_interval | duration | "60s" | Automatic checkpoint interval. |
| compaction_threads | integer | 2 | Background compaction thread count. |
| direct_io | boolean | false | Bypass OS page cache with direct I/O. |
| compression | boolean | true | Enable on-disk page compression. |
| compression_algorithm | string | "Lz4" | Compression algorithm: Lz4, Snappy, or Zstd. |
| max_wal_size | size | "1GB" | Maximum WAL size before forced checkpoint. |
| sync_commit | boolean | true | Flush WAL to disk on every commit. |
[cluster]
| Parameter | Type | Default | Description |
|---|---|---|---|
| cluster_name | string | "default" | Cluster name shared by all nodes. |
| peers | array | [] | Peer node addresses in "host:port" format. |
| raft_election_timeout | duration | "1s" | Raft follower election timeout. |
| raft_heartbeat_interval | duration | "100ms" | Raft leader heartbeat interval. |
| replication_factor | integer | 3 | Number of region replicas. |
| max_region_size | size | "256MB" | Region split threshold. |
| min_region_size | size | "64MB" | Region merge threshold. |
| auto_balance | boolean | true | Enable automatic region rebalancing. |
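The reference configuration notes that a replication_factor of 3 tolerates one node failure and that the factor cannot exceed the number of nodes. The sketch below generalizes that arithmetic under one assumption that this page does not state outright: each region's replicas form a Raft group that needs a majority of replicas to make progress. The function names are hypothetical.

```python
def tolerated_failures(replication_factor):
    """Replica failures a majority quorum can survive.

    Assumption: a region stays available while more than half of its
    replicas are reachable, as in a standard Raft group.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (replication_factor - 1) // 2


def check_replication(replication_factor, node_count):
    """Reject factors larger than the cluster, then report fault tolerance."""
    if replication_factor > node_count:
        raise ValueError(
            f"replication_factor {replication_factor} exceeds "
            f"the {node_count} nodes in the cluster"
        )
    return tolerated_failures(replication_factor)
```

Under this assumption an even factor buys nothing over the odd number below it: 4 replicas tolerate one failure, the same as 3.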
[security]
| Parameter | Type | Default | Description |
|---|---|---|---|
| authentication_enabled | boolean | false | Require client authentication. |
| tls_enabled | boolean | false | Enable TLS encryption. |
| tls_cert_path | string | "" | Path to TLS certificate (PEM). |
| tls_key_path | string | "" | Path to TLS private key (PEM). |
| superuser | string | "admin" | Superuser account name. |
| superuser_password | string | "" | Superuser password (plaintext). |
[logging]
| Parameter | Type | Default | Description |
|---|---|---|---|
| level | string | "info" | Log level: trace, debug, info, warn, error. |
| format | string | "text" | Log format: text or json. |
| slow_query_enabled | boolean | true | Enable slow query logging. |
| slow_query_threshold | duration | "1s" | Slow query time threshold. |
Environment Variable Overrides
Every configuration parameter can be overridden by an environment variable. This is especially useful for Docker and Kubernetes deployments where secrets and per-instance values should not be baked into configuration files.
| Environment Variable | Overrides | Example |
|---|---|---|
| THUNDERDB_DATA_DIR | storage.data_dir | /mnt/ssd/thunderdb/data |
| THUNDERDB_WAL_DIR | storage.wal_dir | /mnt/ssd/thunderdb/wal |
| THUNDERDB_LOG_LEVEL | logging.level | debug |
| THUNDERDB_SUPERUSER_PASSWORD_HASH | security.superuser_password | argon2:$argon2id$v=19$... |
| THUNDERDB_NODE_ID | node.node_id | 2 |
| THUNDERDB_LISTEN_ADDR | network.listen_addr | 0.0.0.0 |
| THUNDERDB_PG_PORT | network.pg_port | 15432 |
| THUNDERDB_MYSQL_PORT | network.mysql_port | 13306 |
| THUNDERDB_RESP_PORT | network.resp_port | 16379 |
| THUNDERDB_HTTP_PORT | network.http_port | 18088 |
| THUNDERDB_GRPC_PORT | network.grpc_port | 19090 |
Precedence: Environment variables take precedence over values in the configuration file. Command-line flags (if any) take precedence over both.
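The precedence order can be pictured as three layers merged in sequence, each overwriting the one before it. The sketch below is only a model of that ordering, not ThunderDB's actual loader. The mapping is a subset copied from the table above (it is listed explicitly because the names are not derived mechanically: THUNDERDB_LOG_LEVEL maps to logging.level), and the cli_flags argument is a hypothetical stand-in, since this page does not document any specific command-line flags.

```python
# Subset of the override table above: variable -> (section, key).
ENV_OVERRIDES = {
    "THUNDERDB_DATA_DIR": ("storage", "data_dir"),
    "THUNDERDB_LOG_LEVEL": ("logging", "level"),
    "THUNDERDB_NODE_ID": ("node", "node_id"),
    "THUNDERDB_PG_PORT": ("network", "pg_port"),
}

# Keys whose type is integer in the section reference tables.
INTEGER_KEYS = {("node", "node_id"), ("network", "pg_port")}


def effective_config(file_cfg, env, cli_flags=None):
    """Merge settings so that file < environment < command-line flags.

    file_cfg:  parsed configuration file, {section: {key: value}}
    env:       environment mapping, for example os.environ
    cli_flags: hypothetical {(section, key): value} mapping of flag values
    """
    # Copy each section so the caller's dictionaries are not modified.
    merged = {section: dict(values) for section, values in file_cfg.items()}

    # Layer 2: environment variables override file values.
    for variable, (section, key) in ENV_OVERRIDES.items():
        if variable in env:
            raw = env[variable]
            value = int(raw) if (section, key) in INTEGER_KEYS else raw
            merged.setdefault(section, {})[key] = value

    # Layer 3: command-line flags override everything else.
    for (section, key), value in (cli_flags or {}).items():
        merged.setdefault(section, {})[key] = value

    return merged
```

Environment values always arrive as strings, which is why the sketch converts integer-typed keys before merging.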
Using Environment Variables with Docker
docker run -d \
-e THUNDERDB_NODE_ID=1 \
-e THUNDERDB_LISTEN_ADDR=0.0.0.0 \
-e THUNDERDB_LOG_LEVEL=info \
-e THUNDERDB_SUPERUSER_PASSWORD_HASH='argon2:$argon2id$v=19$m=65536,t=3,p=4$...' \
-e THUNDERDB_DATA_DIR=/var/lib/thunderdb/data \
-e THUNDERDB_WAL_DIR=/var/lib/thunderdb/wal \
thunderdb:latest
Using Environment Variables with systemd
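Create a drop-in override with systemctl edit, add the [Service] block below in the editor that opens, and restart the service so the new environment takes effect: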
sudo systemctl edit thunderdb
[Service]
Environment="THUNDERDB_LOG_LEVEL=debug"
Environment="THUNDERDB_SUPERUSER_PASSWORD_HASH=argon2:$argon2id$v=19$..."
Performance Tuning Guide
ThunderDB’s HTAP architecture means it must be tuned differently depending on whether your workload leans toward OLTP (transactional), OLAP (analytical), or a mix of both.
OLTP-Optimized Configuration
For workloads dominated by short, high-frequency transactions (point lookups, inserts, updates):
[storage]
# Large buffer pool to keep hot rows in memory.
# Aim for 60-70% of available system RAM.
buffer_pool_size = "8GB"
# Moderate WAL buffer -- OLTP writes are typically small.
wal_buffer_size = "32MB"
# Ensure every commit is durable.
sync_commit = true
# Frequent checkpoints reduce recovery time after crashes.
checkpoint_interval = "30s"
# Fewer compaction threads needed; OLTP generates less bulk data.
compaction_threads = 2
# Bypass OS cache to avoid double-buffering.
direct_io = true
# Lz4 for minimal CPU overhead on the write path.
compression = true
compression_algorithm = "Lz4"
# Smaller max WAL keeps recovery time bounded.
max_wal_size = "512MB"
Key principles:
- Maximize buffer pool size to serve reads from memory.
- Use sync_commit = true to guarantee durability.
- Lower checkpoint_interval to reduce crash recovery time.
- Use Lz4 compression for its speed advantage on the write path.
OLAP-Optimized Configuration
For workloads dominated by large scans, aggregations, and batch processing:
[storage]
# Moderate buffer pool -- OLAP scans are sequential and don't benefit
# as much from caching random pages.
buffer_pool_size = "4GB"
# Large WAL buffer to handle bulk writes efficiently.
wal_buffer_size = "128MB"
# Async commit is acceptable if some data loss on crash is tolerable.
sync_commit = false
# Less frequent checkpoints to reduce I/O during long-running queries.
checkpoint_interval = "300s"
# More compaction threads for faster background processing of bulk data.
compaction_threads = 8
# Direct I/O is still beneficial for large sequential reads.
direct_io = true
# Zstd compression for maximum space savings on large datasets.
compression = true
compression_algorithm = "Zstd"
# Larger max WAL to avoid checkpoint storms during bulk loads.
max_wal_size = "4GB"
Key principles:
- Allocate more to WAL buffer for batch write throughput.
- Use more compaction threads to keep up with bulk data ingestion.
- Use Zstd compression to minimize storage costs for large datasets.
- Larger max_wal_size and checkpoint_interval reduce I/O interference with queries.
Mixed HTAP Configuration
For workloads with both transactional and analytical queries (the most common ThunderDB use case):
[storage]
# Balance between caching hot transactional data and leaving room
# for analytical query memory needs.
buffer_pool_size = "6GB"
# Balanced WAL buffer.
wal_buffer_size = "64MB"
# Durability is important for the transactional component.
sync_commit = true
# Moderate checkpoint interval balances recovery time and I/O.
checkpoint_interval = "60s"
# Moderate compaction thread count.
compaction_threads = 4
# Direct I/O recommended.
direct_io = true
# Lz4 is a good default balance of speed and compression.
compression = true
compression_algorithm = "Lz4"
# Moderate max WAL size.
max_wal_size = "1GB"
Memory Sizing Guide
Use the following guidelines to size ThunderDB’s memory parameters based on available system RAM:
| Available RAM | Buffer Pool | WAL Buffer | Recommended For |
|---|---|---|---|
| 8 GB | 4 GB | 32 MB | Development / Small production |
| 16 GB | 10 GB | 64 MB | Medium OLTP workloads |
| 32 GB | 20 GB | 128 MB | Large OLTP / Mixed HTAP |
| 64 GB | 40 GB | 256 MB | Heavy HTAP workloads |
| 128 GB | 80 GB | 512 MB | Large-scale analytics |
General rules:
- Allocate 50-70% of total RAM to buffer_pool_size.
- Reserve at least 2-4 GB for the OS, file system cache, and other processes.
- The WAL buffer should be 0.5-1% of the buffer pool size.
- For Kubernetes, set resource requests to the sum of buffer pool + WAL buffer + 2 GB overhead, and limits to 1.5x the requests.
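The general rules above reduce to a few lines of arithmetic. The following sketch applies them as written: a buffer pool of 50-70% of RAM, at least 2 GB left for the OS, a WAL buffer of 0.5-1% of the pool, and Kubernetes requests and limits derived from the result. The function names and the choice of megabytes as the unit are illustrative assumptions, and the output is a starting point rather than a tuned recommendation.

```python
OS_RESERVE_MB = 2048  # lower bound of the 2-4 GB reserve named above
K8S_OVERHEAD_MB = 2048  # the 2 GB overhead in the Kubernetes rule


def suggest_sizes(total_ram_mb, pool_percent=60):
    """Suggest a buffer pool size and a WAL buffer range, both in MB."""
    if not 50 <= pool_percent <= 70:
        raise ValueError("pool_percent should stay within 50-70")
    if total_ram_mb <= OS_RESERVE_MB:
        raise ValueError("not enough RAM to leave the OS reserve")
    pool_mb = total_ram_mb * pool_percent // 100
    # Never eat into the memory reserved for the OS and other processes.
    pool_mb = min(pool_mb, total_ram_mb - OS_RESERVE_MB)
    # 0.5% to 1% of the buffer pool, rounded down to whole megabytes.
    wal_range_mb = (pool_mb * 5 // 1000, pool_mb // 100)
    return {"buffer_pool_mb": pool_mb, "wal_buffer_mb_range": wal_range_mb}


def kubernetes_memory_mb(buffer_pool_mb, wal_buffer_mb):
    """Return (requests, limits) in MB following the Kubernetes rule above."""
    requests = buffer_pool_mb + wal_buffer_mb + K8S_OVERHEAD_MB
    limits = requests * 3 // 2  # 1.5x the requests
    return requests, limits
```

For an 8 GB host at 50%, suggest_sizes(8192, 50) yields a 4096 MB buffer pool and a 20-40 MB WAL buffer range, which brackets the 32 MB in the first row of the table. For that row, kubernetes_memory_mb(4096, 32) gives requests of 6176 MB and limits of 9264 MB.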
Compression Algorithm Comparison
| Algorithm | Compression Ratio | Compression Speed | Decompression Speed | Best For |
|---|---|---|---|---|
| Lz4 | Low-Medium | Very Fast | Very Fast | OLTP, low-latency reads |
| Snappy | Medium | Fast | Fast | General purpose |
| Zstd | High | Moderate | Fast | OLAP, storage-constrained |
OS-Level Tuning
For optimal performance, apply these OS-level settings on Linux:
# Increase file descriptor limits
echo "thunder soft nofile 65535" >> /etc/security/limits.conf
echo "thunder hard nofile 65535" >> /etc/security/limits.conf
# Reduce swappiness (prefer keeping data in RAM)
echo "vm.swappiness=1" >> /etc/sysctl.conf
# Increase the number of memory map areas
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
# Raise connection backlog limits for workloads with many concurrent connections
echo "net.core.somaxconn=65535" >> /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog=65535" >> /etc/sysctl.conf
# Apply changes
sysctl -p
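After applying these settings, it can be useful to confirm that the running kernel actually reports them. The sketch below reads the current values from /proc/sys, where each sysctl key maps to a file path with dots replaced by slashes, and lists the ones that fall short of the values above. The helper names are hypothetical. The comparison direction is an assumption: swappiness is treated as a ceiling and the other three as floors.

```python
from pathlib import Path

# Values taken from the commands above.
RECOMMENDED = {
    "vm.swappiness": 1,
    "vm.max_map_count": 262144,
    "net.core.somaxconn": 65535,
    "net.ipv4.tcp_max_syn_backlog": 65535,
}

# Assumption: lower is better for these keys, higher is better for the rest.
CEILING_KEYS = {"vm.swappiness"}


def sysctl_path(key):
    """Map a sysctl key to its file under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")


def read_sysctl(key):
    """Read the current integer value of a sysctl key (Linux only)."""
    return int(sysctl_path(key).read_text().split()[0])


def audit(read=read_sysctl):
    """Return {key: (current, recommended)} for settings that fall short."""
    shortfalls = {}
    for key, wanted in RECOMMENDED.items():
        current = read(key)
        ok = current <= wanted if key in CEILING_KEYS else current >= wanted
        if not ok:
            shortfalls[key] = (current, wanted)
    return shortfalls
```

Calling audit() with no arguments reads the live system. Passing a different read function, such as a dictionary lookup, makes the check usable against captured values from another host.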
Disk I/O Tuning
For SSD-backed storage:
# Use the none scheduler for SSDs (noop on older kernels without multi-queue block I/O)
echo "none" > /sys/block/sda/queue/scheduler
# Set readahead to 256 KB for SSDs (reduce for random I/O workloads)
blockdev --setra 512 /dev/sda
For best results, place data_dir and wal_dir on separate physical disks or volumes. WAL writes are sequential and benefit from dedicated I/O bandwidth.
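Two details of the commands above are easy to misread. First, blockdev --setra takes its argument in 512-byte sectors, so the value 512 corresponds to the 256 KB named in the comment. Second, the scheduler file lists every available scheduler and marks the active one with square brackets. The sketch below shows both; the function names are hypothetical, and the sample scheduler string in the usage note is illustrative rather than captured from a real host.

```python
import re

SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors


def readahead_sectors(kilobytes):
    """Convert a readahead size in KB to the sector count for --setra."""
    return kilobytes * 1024 // SECTOR_BYTES


def active_scheduler(sysfs_text):
    """Return the bracketed scheduler name from queue/scheduler content.

    Returns None when no entry is bracketed.
    """
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None
```

For example, readahead_sectors(256) returns 512, and active_scheduler("[none] mq-deadline kyber") returns "none". To halve the readahead for a random I/O workload, readahead_sectors(128) gives 256.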