ducklake 0.2.0

Multi-Backend Catalog Support

DuckLake now supports PostgreSQL, SQLite, and MySQL as catalog backends in addition to DuckDB (#15, @stefanlinner). This aligns with the DuckLake 1.0 specification and enables concurrent multi-client access when using PostgreSQL or SQLite.

New Features

  • attach_ducklake() gains backend, catalog_connection_string, read_only, and override_data_path parameters for multi-backend support.
  • install_ducklake() gains a backend parameter to pre-install backend extensions (e.g., install_ducklake(backend = "postgres")).
  • New get_ducklake_backend() returns the active catalog backend type.
  • detach_ducklake() gains a shutdown parameter. By default it now performs a soft detach (SQL DETACH followed by USE memory;) instead of shutting down the connection, which allows switching backends within a session.
  • backup_ducklake() is now backend-aware. For file-based backends (DuckDB, SQLite) it copies both the catalog and the data; for PostgreSQL and MySQL it copies the data only and points users to pg_dump or mysqldump for the catalog. This release also fixes a pre-existing bug in which catalog backups were silently written as 0-byte files because DuckDB held file locks during file.copy().
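
As an illustration of how the new parameters might fit together, the sketch below pre-installs the PostgreSQL extension, attaches a lake with a PostgreSQL catalog, checks the active backend, and then performs a soft detach. The parameter names are taken from the entries above. The lake name my_lake passed as the first argument, the paths, the libpq-style connection string, and calling get_ducklake_backend() without arguments are assumptions made for illustration, so consult ?attach_ducklake for the exact signatures.

```r
library(ducklake)

# Pre-install the extension needed for a PostgreSQL catalog.
install_ducklake(backend = "postgres")

# Attach a lake whose catalog lives in PostgreSQL.
# Assumptions: the lake name is the first argument, and the connection
# string format, paths, and database name are placeholders.
attach_ducklake(
  "my_lake",
  lake_path = "path/to/lake_data",
  backend = "postgres",
  catalog_connection_string = "dbname=ducklake_catalog host=localhost",
  read_only = FALSE
)

# Ask which catalog backend is currently active
# (assumed here to take no arguments).
get_ducklake_backend()

# Soft detach (the new default): the DuckDB connection stays open,
# so a different backend can be attached later in the same session.
detach_ducklake("my_lake")
```

Running this requires a reachable PostgreSQL server and an existing catalog database; for a local trial, the DuckDB or SQLite backends avoid that dependency.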

Breaking Changes

  • attach_ducklake() now requires lake_path (previously optional).
  • set_ducklake_connection() has been removed. The package now exclusively uses duckplyr’s singleton DuckDB connection.
  • detach_ducklake() no longer shuts down the DuckDB connection by default. Pass shutdown = TRUE for the previous behaviour.
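
For code written against 0.1.0, a minimal migration sketch might look like the following. It assumes the lake name is the first argument to both functions and that the default backend is still DuckDB; my_lake and the path are placeholders.

```r
library(ducklake)

# lake_path must now be supplied explicitly (it was optional in 0.1.0).
attach_ducklake("my_lake", lake_path = "path/to/lake_data")

# ... work with the lake ...

# Request the 0.1.0 behaviour: detach and also shut down
# the DuckDB connection.
detach_ducklake("my_lake", shutdown = TRUE)
```

Any code that called set_ducklake_connection() needs those calls removed, since the package now relies on duckplyr's singleton connection.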

Internal


ducklake 0.1.0

Initial release of ducklake, an R package for versioned data lake infrastructure built on DuckDB and DuckLake.

Features

Core Table Operations

Row-Level Operations

ACID Transactions

Time Travel

Metadata and Audit Trail

Connection Management

Query Execution

Backup and Maintenance

  • backup_ducklake() - Create incremental backups
  • Support for local and remote backup locations

Vignettes

  • Getting Started - Introduction to ducklake workflows
  • Clinical Trial Data Lake - Industry-specific use case
  • Modifying Tables - Comprehensive guide to row operations
  • Working with Transactions - ACID transaction patterns
  • Time Travel Queries - Historical data access
  • Storage and Backup Management - Data persistence strategies

Lifecycle

This package is currently experimental. The API may change as we gather feedback from early users, but core functionality is stable and ready for pilot projects.