Hacker News
Show HN: Jinbase – Multi-model transactional embedded database (github.com/pyrustic)
48 points by alexrustic on Nov 30, 2024 | 4 comments
Hi HN! Alex here. I'm excited to show you Jinbase (https://github.com/pyrustic/jinbase), my multi-model transactional embedded database.

Almost a year ago, I introduced Paradict [1], my take on multi-format streaming serialization. Because its text format is so readable, Paradict is a natural candidate for config files. But handling config files inside Paradict itself would clutter its programming interface and confuse users, who already have alternative libraries dedicated to config files (TOML, INI, etc.). So I used Paradict as a dependency for KvF (Key-value file format) [2], a new project of mine that focuses on config files with sections.

With its compact binary format, I thought Paradict would be an efficient dependency for a new project that would rely on I/O functions (such as Open, Read, Write, Seek, Tell and Close) to implement a minimalistic yet reliable persistence solution. But that was before I learned that "files are hard" [3]. SQLite, with its transactions, BLOB data type and incremental I/O for BLOBs, seemed like the right giant to stand on for my new project.

Jinbase started small as a key-value store and ended up as a multi-model embedded database that pushes the boundaries of what we usually do with SQLite. The transition to the second data model (the depot) happened when I realized that the key-value store was not well suited to cases where a unique identifier should be generated automatically for each new record. Automatic generation spares the user from supplying an identifier that could accidentally collide with, and thus overwrite, an existing record. After that, I implemented a search capability that accepts UID ranges for the depot store, timespans (records are automatically timestamped) for both the depot and key-value stores, and GLOB patterns and number ranges for string and integer keys in the key-value store.
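To illustrate the difference between caller-chosen keys and generated UIDs, here is a minimal sketch using only Python's standard sqlite3 module. It is not Jinbase's API or schema: the demo_kv and demo_depot tables and the kv_set and depot_put helpers are hypothetical names made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables for illustration only; not Jinbase's real schema.
conn.execute("CREATE TABLE demo_kv (key TEXT PRIMARY KEY, value TEXT)")
conn.execute(
    "CREATE TABLE demo_depot (uid INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT)"
)


def kv_set(key, value):
    # A caller-chosen key replaces any record already stored under it.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO demo_kv (key, value) VALUES (?, ?)",
            (key, value),
        )


def depot_put(value):
    # The database assigns the UID, so a new record never overwrites an old one.
    with conn:
        cur = conn.execute("INSERT INTO demo_depot (value) VALUES (?)", (value,))
    return cur.lastrowid


kv_set("user.1", "alice")
kv_set("user.1", "bob")  # same key: "alice" is overwritten
kv_set("group.1", "admins")

first_uid = depot_put("alice")
second_uid = depot_put("bob")  # stored next to "alice", nothing is lost

# GLOB pattern search over string keys.
user_keys = [
    row[0]
    for row in conn.execute(
        "SELECT key FROM demo_kv WHERE key GLOB ? ORDER BY key", ("user.*",)
    )
]

# UID range search over the depot.
in_range = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM demo_depot WHERE uid BETWEEN ? AND ? ORDER BY uid",
        (first_uid, second_uid),
    )
]
```

In the key-value table the second write under "user.1" replaces the first, while the depot keeps both records under distinct UIDs.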

The queue and stack data models emerged as solutions for use cases where records must be consumed in a specific order. Typically, a record is retrieved and deleted from the database in a single transaction.
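The "retrieve and delete in one transaction" idea can be sketched with plain sqlite3. This is an assumption about how such a pop could be written, not Jinbase's actual code; the demo_records table and the push and pop helpers are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are exactly the ones written explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE demo_records (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)


def push(payload):
    conn.execute("INSERT INTO demo_records (payload) VALUES (?)", (payload,))


def pop(lifo=False):
    # Queue order takes the oldest record, stack order takes the newest.
    order = "DESC" if lifo else "ASC"
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            f"SELECT id, payload FROM demo_records ORDER BY id {order} LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM demo_records WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # the record stays in place if anything fails
        raise
    return row[1] if row is not None else None


for item in ("a", "b", "c"):
    push(item)

front = pop()  # queue behavior: oldest record
back = pop(lifo=True)  # stack behavior: newest record
```

Because the SELECT and the DELETE share one transaction, a failure between them rolls back and leaves the record available for the next consumer.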

Since SQLite is the storage engine, Jinbase supports the relational model as well. For convenience, all tables related to Jinbase internals are prefixed with "jinbase_", which makes Jinbase a useful tool for opening legacy SQLite files and adding new data models that safely coexist with the existing relational tables.
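As a rough illustration of how a name prefix keeps the two worlds apart, the sketch below lists the tables of a database and splits them by prefix. Only the "jinbase_" prefix comes from the description above; the customers and jinbase_demo tables and the split_tables helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A table that a legacy application might already own.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
# A made-up internal table; real Jinbase table names may differ.
conn.execute("CREATE TABLE jinbase_demo (id INTEGER PRIMARY KEY, data BLOB)")


def split_tables(connection, prefix="jinbase_"):
    # sqlite_master lists every table in the database file.
    names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    internal = [n for n in names if n.startswith(prefix)]
    # Skip SQLite's own bookkeeping tables as well as the prefixed ones.
    user = [
        n for n in names if not n.startswith(prefix) and not n.startswith("sqlite_")
    ]
    return internal, user


internal_tables, user_tables = split_tables(conn)
```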

All four main data models (key-value, depot, queue, stack) support Paradict-compatible data types, such as dictionaries, strings, binary data, integers, datetimes, etc. Under the hood, when the user initiates a write operation, Jinbase serializes (except for binary data), chunks, and stores the data iteratively. A record can be accessed not only in bulk, but also with two levels of partial access granularity: the byte-level and the field-level.
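The serialize, chunk, and store steps can be sketched as follows. This is a simplified illustration, not Jinbase's implementation: JSON stands in for the Paradict binary format, and the demo_chunks table, the tiny CHUNK_SIZE, and the write_record and read_record helpers are all assumptions.

```python
import json
import sqlite3

CHUNK_SIZE = 8  # deliberately tiny so the example produces several chunks

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_chunks ("
    "record_id INTEGER, seq INTEGER, data BLOB, PRIMARY KEY (record_id, seq))"
)


def write_record(record_id, value):
    # Binary data is stored as-is; everything else is serialized first.
    if isinstance(value, bytes):
        raw = value
    else:
        raw = json.dumps(value).encode("utf-8")
    with conn:  # all chunks of a record are committed together
        for seq, start in enumerate(range(0, len(raw), CHUNK_SIZE)):
            conn.execute(
                "INSERT INTO demo_chunks (record_id, seq, data) VALUES (?, ?, ?)",
                (record_id, seq, raw[start:start + CHUNK_SIZE]),
            )
    return len(raw)


def read_record(record_id):
    # Bulk access: concatenate the chunks in order.
    rows = conn.execute(
        "SELECT data FROM demo_chunks WHERE record_id = ? ORDER BY seq",
        (record_id,),
    )
    return b"".join(row[0] for row in rows)


size = write_record(1, {"name": "alice", "age": 30})
write_record(2, b"\x00\x01\x02")
```

Reading record 1 back and passing the bytes to json.loads returns the original dictionary, while record 2 comes back as the untouched bytes.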

While SQLite's incremental I/O for BLOBs is designed to target an individual BLOB column in a row, Jinbase extends this so that for each record, incremental reads cover all chunks as if they were a single unified BLOB. For dictionary records only, Jinbase automatically creates and maintains a lightweight index of pointers to root fields. This makes it possible to extract a single field from an arbitrary record, with its contents automatically deserialized before being returned.
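Here is one way byte-level and field-level access over chunked data could look. It is a sketch under stated assumptions, not Jinbase's actual layout: JSON replaces Paradict, a single record fills the hypothetical demo_chunks table, and the index is a plain dict mapping each root field to an (offset, length) pair.

```python
import json
import sqlite3

CHUNK_SIZE = 8  # fixed chunk size, tiny for demonstration

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_chunks (seq INTEGER PRIMARY KEY, data BLOB)")


def store(record):
    # Serialize each root field separately and remember where it lands.
    raw, index = b"", {}
    for name, value in record.items():
        encoded = json.dumps(value).encode("utf-8")
        index[name] = (len(raw), len(encoded))  # pointer: (offset, length)
        raw += encoded
    with conn:
        for seq, start in enumerate(range(0, len(raw), CHUNK_SIZE)):
            conn.execute(
                "INSERT INTO demo_chunks (seq, data) VALUES (?, ?)",
                (seq, raw[start:start + CHUNK_SIZE]),
            )
    return index


def read_bytes(offset, length):
    # Byte-level access: fetch only the chunks that overlap the requested range.
    if length <= 0:
        return b""
    first = offset // CHUNK_SIZE
    last = (offset + length - 1) // CHUNK_SIZE
    rows = conn.execute(
        "SELECT data FROM demo_chunks WHERE seq BETWEEN ? AND ? ORDER BY seq",
        (first, last),
    )
    joined = b"".join(row[0] for row in rows)
    skip = offset - first * CHUNK_SIZE
    return joined[skip:skip + length]


def read_field(index, name):
    # Field-level access: follow the pointer, then deserialize just that slice.
    offset, length = index[name]
    return json.loads(read_bytes(offset, length))


field_index = store({"name": "alice", "tags": ["a", "b"], "age": 30})
tags = read_field(field_index, "tags")
```

Only the chunks overlapping a field are read, so extracting "age" touches a single chunk even though the record spans three.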

The most obvious use cases for Jinbase are storing user preferences, persisting session data before exit, order-based processing of data streams, exposing data for other processes, upgrading legacy SQLite files with new data models, and bespoke data persistence solutions.

Jinbase is written in Python and available on PyPI, and you can play with the examples in the README.

Let me know what you think about this project.

[1] https://news.ycombinator.com/item?id=38684724

[2] https://github.com/pyrustic/kvf

[3] https://news.ycombinator.com/item?id=10725859



This is interesting! I understand the reference implementation of the client being in a specific language, since it kind of has to be to be runnable. But I’m a little put off by the selection of a Python-only serialization format as well. So to implement this in another language, both the Jinbase model concepts and the serializer/deserializer have to be implemented. Wouldn’t it have made more sense to use MessagePack or Protobuf or one of the other formats that already has ubiquitous language support? I understand you are the author of the Paradict format but it seems like a non-starter for wide adoption of Jinbase across languages.


Thank you for your comment! Indeed, for wide adoption across languages, we will need to port at least Paradict, as that is the format in which BLOBs are serialized.

Protobuf relies heavily on predefined schemas and this rigidity goes against the flexibility of Jinbase's schema-less philosophy.

MessagePack (or CBOR) seems more convincing, but Paradict has some subtleties that I don't find there. For example, Paradict preserves UTC offsets [1], handles integer bases (integers can be represented in decimal, binary, octal, and hexadecimal), and has an extension mechanism that I find more interesting.
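To show what these two subtleties mean in practice, here is a small sketch using only Python's standard library. It does not use Paradict's API and says nothing about how Paradict encodes these values; the sample timestamp and numbers are made up.

```python
from datetime import datetime, timezone

# A timestamp written with its original +01:00 offset.
local = datetime.fromisoformat("2024-11-30T09:15:00+01:00")
# Normalizing to UTC keeps the instant but discards the original offset.
normalized = local.astimezone(timezone.utc)

same_instant = local == normalized
kept_offset = local.isoformat()
lost_offset = normalized.isoformat()

# The same integer written in decimal, binary, octal, and hexadecimal.
# Base 0 tells int() to read the base from the literal's prefix.
values = [int(text, 0) for text in ("255", "0b11111111", "0o377", "0xff")]
```

Both datetimes compare equal, yet only the first one still records that the event was written down at +01:00. Likewise, all four literals parse to the same integer, so a format that stores only the value cannot return the base the author chose.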

Soon I will be adding a command line interface to Jinbase. From the CLI one will be able to read and write any type of data and this will only be possible because Paradict has a twin text format. MessagePack, from what I know, started with 1:1 compatibility with JSON and then over time it added things that are not present in JSON, thus breaking the 1:1 compatibility with JSON.

If I understand Peter Naur's take on programming [2] correctly, Jinbase is a software idea that I'm trying to implement (bring to life) one iteration at a time, and for that I need to have some level of control over the components (like the serialization format) so that I can adjust things accordingly. For example, the Paradict binary format was originally intended to serialize and deserialize only dictionaries (P...dict), but I changed that detail so that Jinbase users can freely store things other than dictionaries.

Once the core idea is fully implemented, we will see how to reproduce it elsewhere, one contribution/compilation after another...

[1] https://codeblog.jonskeet.uk/2019/03/27/storing-utc-is-not-a...

[2] https://news.ycombinator.com/item?id=26027448


How does Jinbase handle concurrency? Since it’s built on SQLite, does it inherit SQLite’s locking mechanisms, or have you implemented additional features to manage concurrent access across the different data models?


Jinbase is designed to be thread-safe, ensuring it can be reliably used in a multithreaded context.

To interact with SQLite, Jinbase uses LiteDBC [1], an SQL interface compliant with the DB-API 2.0 specification described by PEP 249 [2]. LiteDBC wraps Python's sqlite3 [3] module to provide a more intuitive interface and multithreading support by default. For LiteDBC, I wrote a stress test [4] involving concurrency with Asyncpal [5].

[1] https://github.com/pyrustic/litedbc

[2] https://peps.python.org/pep-0249/

[3] https://docs.python.org/3/library/sqlite3.html

[4] https://github.com/pyrustic/litedbc/blob/master/tests/test_s...

[5] https://news.ycombinator.com/item?id=41404020
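To make the multithreading point concrete, here is a minimal sketch using only the standard sqlite3 module. It is not LiteDBC's API or implementation: the shared connection guarded by a lock, the demo_hits table, and the record helper are assumptions made for illustration.

```python
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor

# check_same_thread=False lets several threads use one connection;
# the lock below is then responsible for serializing access to it.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE demo_hits (worker INTEGER, n INTEGER)")
lock = threading.Lock()


def record(worker, count=50):
    for n in range(count):
        # One thread at a time; each write is committed when the block exits.
        with lock, conn:
            conn.execute(
                "INSERT INTO demo_hits (worker, n) VALUES (?, ?)", (worker, n)
            )


with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(record, range(4)))  # list() surfaces any worker exception

total = conn.execute("SELECT COUNT(*) FROM demo_hits").fetchone()[0]
```

With four workers writing 50 rows each, the table ends up with 200 rows and no write is lost.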



