ducklake 0.2.0
Multi-Backend Catalog Support
DuckLake now supports PostgreSQL, SQLite, and MySQL as catalog backends in addition to DuckDB (#15, @stefanlinner). This aligns with the DuckLake 1.0 specification and enables concurrent multi-client access when using PostgreSQL or SQLite.
New Features
- attach_ducklake() gains backend, catalog_connection_string, read_only, and override_data_path parameters for multi-backend support.
- install_ducklake() gains a backend parameter to pre-install backend extensions (e.g., install_ducklake(backend = "postgres")).
- New get_ducklake_backend() returns the active catalog backend type.
- detach_ducklake() gains a shutdown parameter. By default it now performs a soft detach (SQL DETACH + USE memory;) instead of shutting down the connection, allowing backend switching within a session.
- backup_ducklake() is now backend-aware: file-based backends (DuckDB, SQLite) get catalog + data copied; PostgreSQL/MySQL get data only, with guidance to use pg_dump/mysqldump. Also fixes a pre-existing bug where catalog backups were silently 0 bytes due to DuckDB holding file locks during file.copy().
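The new parameters might be combined roughly as follows when attaching a lake whose catalog lives in PostgreSQL. This is a sketch under assumptions, not documented usage: the helper name attach_postgres_lake(), the lake name "my_lake", and the idea that attach_ducklake() and detach_ducklake() take the lake name as their first argument are all assumptions, so check the function help pages before relying on it. The calls are wrapped in a function because they need the ducklake package and a reachable PostgreSQL server.

```r
# Sketch only: requires the ducklake package and a reachable PostgreSQL server.
# ASSUMPTION: attach_ducklake() and detach_ducklake() take the lake name as
# their first argument; "my_lake" and both function arguments are placeholders.
attach_postgres_lake <- function(lake_path, conn_string) {
  # Pre-install the extension needed for a PostgreSQL catalog.
  ducklake::install_ducklake(backend = "postgres")

  # lake_path is required as of 0.2.0; the catalog itself lives in PostgreSQL.
  ducklake::attach_ducklake(
    "my_lake",
    lake_path = lake_path,
    backend = "postgres",
    catalog_connection_string = conn_string
  )

  # Ask the package which catalog backend is currently active.
  backend <- ducklake::get_ducklake_backend()

  # Soft detach (the new default) leaves the DuckDB connection open;
  # pass shutdown = TRUE to also close it, as 0.1.0 did.
  ducklake::detach_ducklake("my_lake")

  backend
}
```

Nothing runs until the function is called, for example with a data directory and a libpq-style connection string of your own.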
Breaking Changes
- attach_ducklake() now requires lake_path (previously optional).
- set_ducklake_connection() has been removed. The package now exclusively uses duckplyr’s singleton DuckDB connection.
- detach_ducklake() no longer shuts down the DuckDB connection by default. Pass shutdown = TRUE for the previous behaviour.
Internal
- Schema qualifier logic updated throughout (get_metadata_table(), time_travel.R, transactions.R) to handle PostgreSQL/MySQL backends that don’t use the .main. schema prefix.
- New internal helpers: build_attach_sql(), ensure_extensions(), shutdown_and_reset_singleton().
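To illustrate what a helper like build_attach_sql() has to account for, here is a hypothetical base-R sketch of how a DuckLake ATTACH statement varies by catalog backend. It is not the package's implementation: the function name sketch_attach_sql() and its arguments are invented for illustration, and it assumes the DuckLake connection-string prefixes (ducklake:, ducklake:sqlite:, ducklake:postgres:, ducklake:mysql:) and the DATA_PATH, OVERRIDE_DATA_PATH, and READ_ONLY attach options.

```r
# Hypothetical sketch; NOT the package's build_attach_sql().
# `catalog` is a file path (duckdb, sqlite) or a connection string
# (postgres, mysql). `name` is assumed to be a valid SQL identifier.
sketch_attach_sql <- function(name, catalog, lake_path,
                              backend = c("duckdb", "sqlite", "postgres", "mysql"),
                              read_only = FALSE,
                              override_data_path = FALSE) {
  backend <- match.arg(backend)

  # Double any single quotes so values are safe inside SQL string literals.
  quote_sql <- function(x) paste0("'", gsub("'", "''", x, fixed = TRUE), "'")

  # A DuckDB catalog needs no backend prefix; the others name the backend.
  prefix <- if (backend == "duckdb") {
    "ducklake:"
  } else {
    paste0("ducklake:", backend, ":")
  }

  # Collect attach options; DATA_PATH says where the data files live.
  opts <- paste("DATA_PATH", quote_sql(lake_path))
  if (override_data_path) opts <- c(opts, "OVERRIDE_DATA_PATH true")
  if (read_only) opts <- c(opts, "READ_ONLY")

  sprintf(
    "ATTACH %s AS %s (%s);",
    quote_sql(paste0(prefix, catalog)),
    name,
    paste(opts, collapse = ", ")
  )
}

# Example: a PostgreSQL-backed catalog with a local data directory.
pg_sql <- sketch_attach_sql(
  "my_lake", "dbname=ducklake_catalog host=localhost", "data_files/",
  backend = "postgres"
)
```

Only the catalog prefix changes between backends; the data path and options are assembled the same way, which is one reason a single SQL-building helper is a natural fit here.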
ducklake 0.1.0
Initial release of ducklake, an R package for versioned data lake infrastructure built on DuckDB and DuckLake.
Features
Core Table Operations
- create_table() - Create new tables in the data lake
- get_ducklake_table() - Retrieve tables as tibbles
- replace_table() - Replace entire table contents with versioning
Row-Level Operations
- rows_insert() - Insert new rows with automatic versioning
- rows_update() - Update existing rows with audit trail
- rows_delete() - Delete rows while maintaining history
ACID Transactions
- with_transaction() - Execute code blocks within transactions
- begin_transaction(), commit_transaction(), rollback_transaction() - Manual transaction control
- Full ACID compliance for data integrity
Time Travel
- get_ducklake_table_asof() - Query table state at specific timestamps
- get_ducklake_table_version() - Retrieve specific table versions
- list_table_snapshots() - View complete version history
- restore_table_version() - Roll back to previous versions
Metadata and Audit Trail
- get_metadata_table() - Access comprehensive metadata
- set_snapshot_metadata() - Add author, commit messages, and tags
- Complete lineage tracking for all data changes
Connection Management
- install_ducklake() - Install/update DuckLake extension
- attach_ducklake() - Initialize data lake connections
- detach_ducklake() - Clean up connections
- get_ducklake_connection() - Retrieve the active DuckDB connection
Query Execution
- ducklake_exec() - Execute SQL with automatic assignment handling
- show_ducklake_query() - Preview translated SQL queries
- extract_assignments_from_sql() - Parse SQL table assignments
Backup and Maintenance
- backup_ducklake() - Create incremental backups
- Support for local and remote backup locations
Vignettes
- Getting Started - Introduction to ducklake workflows
- Clinical Trial Data Lake - Industry-specific use case
- Modifying Tables - Comprehensive guide to row operations
- Working with Transactions - ACID transaction patterns
- Time Travel Queries - Historical data access
- Storage and Backup Management - Data persistence strategies
