Documentation

2. Why RelPyDB?

Where RelPyDB fits relative to dicts, pandas, SQLite, and ORMs — and when to reach for each.

The gap RelPyDB fills

Python has many ways to store and query structured data. Each solves a different problem:

Tool	Strength	Gap
Plain dicts / lists	Zero setup. Pure Python.	No schema, no typed enforcement, no joins, no FK constraints.
pandas	Excellent for tabular analytics and data transformation.	No schema, no foreign keys, relational joins require manual merge steps.
SQLite / SQLAlchemy	Real SQL, file-backed, full relational model.	Requires separate process model (or ORM boilerplate), SQL strings, class declarations.
DuckDB	Fastest in-process analytics for large datasets.	SQL-centric, not Python-native; less ergonomic for lightweight scripts or tests.
RelPyDB	Full relational model (schema, FK, joins, grouping, indexes, encryption, persistence) directly in Python objects.	Slower than compiled engines on large analytical queries.

RelPyDB is aimed at the cases where you want relational structure and correctness checks, but you want to stay inside Python without spinning up a server or writing SQL strings.

Same task, four tools

Count confirmed players by game system from sessions + registrations. This is a real example from the RelPyDB benchmark.

RelPyDB — 32 lines

db = RelPy()

db.create_table("sessions")
db.add_column("sessions", "id", AutoNumber, is_primary_key=True)
db.add_column("sessions", "game_system", str, nullable=False)

db.create_table("registrations")
db.add_column("registrations", "id", AutoNumber, is_primary_key=True)
db.add_column(
    "registrations", "session_id", int,
    nullable=False, references="sessions.id",
)
db.add_column("registrations", "status", str, nullable=False)

sess1 = db.insert("sessions", {"game_system": "D&D 5e"})
sess2 = db.insert("sessions", {"game_system": "Pathfinder"})

db.insert("registrations", {"session_id": sess1["id"], "status": "confirmed"})
db.insert("registrations", {"session_id": sess1["id"], "status": "confirmed"})
db.insert("registrations", {"session_id": sess2["id"], "status": "waitlist"})

result = (
    db.query("registrations")
      .join("sessions")
      .where(col("registrations.status") == "confirmed")
      .group_by("sessions.game_system")
      .aggregate(total_players=count())
      .to_list()
)

SQLAlchemy ORM — 44 lines

Base = declarative_base()

class GameSession(Base):
    __tablename__ = "sessions"
    id          = Column(Integer, primary_key=True, autoincrement=True)
    game_system = Column(String, nullable=False)

class Registration(Base):
    __tablename__ = "registrations"
    id         = Column(Integer, primary_key=True, autoincrement=True)
    session_id = Column(Integer, ForeignKey("sessions.id"))
    status     = Column(String, nullable=False)

engine     = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
DbSession  = sessionmaker(bind=engine)
orm_session = DbSession()

sess1 = GameSession(game_system="D&D 5e")
sess2 = GameSession(game_system="Pathfinder")
orm_session.add_all([sess1, sess2])
orm_session.commit()

orm_session.add_all([
    Registration(session_id=sess1.id, status="confirmed"),
    Registration(session_id=sess1.id, status="confirmed"),
    Registration(session_id=sess2.id, status="waitlist"),
])
orm_session.commit()

result = (
    orm_session.query(
        GameSession.game_system,
        func.count(Registration.id).label("total_players"),
    )
    .join(Registration, Registration.session_id == GameSession.id)
    .filter(Registration.status == "confirmed")
    .group_by(GameSession.game_system)
    .all()
)

All four tools produce the same output

game_system	total_players
D&D 5e	2

Performance — benchmark results

These numbers are from a live benchmark run on a dataset of 200 customers, 200 orders, and 250 support tickets (10 queries × 30 iterations). RelPyDB is a pure-Python in-memory library; SQLite and DuckDB are compiled engines. The comparison is useful for understanding relative behavior, not absolute production throughput.

Query	SQLAlchemy/SQLite	DuckDB	RelPyDB
Revenue by region & category (4 joins)	0.28 ms	1.02 ms	12.82 ms
Top customers by paid amount	0.12 ms	0.68 ms	1.77 ms
Supplier category performance	0.26 ms	0.89 ms	9.77 ms
Sales rep monthly performance	0.12 ms	0.62 ms	1.57 ms
Payment method stats	0.08 ms	0.39 ms	0.24 ms
Support tickets by category/status	0.12 ms	0.57 ms	1.82 ms
Active customer revenue by region/tier	0.11 ms	0.72 ms	2.27 ms
Monthly category revenue	0.28 ms	0.74 ms	6.98 ms

RelPyDB's advantage is ergonomics and zero setup, not raw speed. For queries involving many cross-product joins on large datasets, SQLite or DuckDB will always be faster. RelPyDB outperforms both engines on startup time (8.95 ms vs 13 ms for SQLite and 206 ms for DuckDB) for the same schema.

When to use RelPyDB

You need a relational model (schema, FK, joins) but don't want to spin up a database server.
You're writing tests and need a fast, resettable in-memory relational fixture.
You're building a small tool, CLI, or script that processes structured JSON from an API.
You want encrypted column storage without an external secrets manager.
You want save/load of the full relational state to a single JSON file.

When to use something else

Large datasets (> 100 000 rows) — DuckDB or SQLite will be significantly faster.
Concurrent writes — RelPyDB is single-process in-memory. Use PostgreSQL or SQLite with WAL.
Analytical workloads on DataFrames you already have — pandas is more ergonomic for pure tabular work.

← 0. Quickstart 3. Concepts →