RpRelPyDBRelational Python

Documentation

2. Why RelPyDB?

Where RelPyDB fits relative to dicts, pandas, SQLite, and ORMs — and when to reach for each.

The gap RelPyDB fills

Python has many ways to store and query structured data. Each solves a different problem:

ToolStrengthGap
Plain dicts / listsZero setup. Pure Python.No schema, no typed enforcement, no joins, no FK constraints.
pandasExcellent for tabular analytics and data transformation.No schema, no foreign keys, relational joins require manual merge steps.
SQLite / SQLAlchemyReal SQL, file-backed, full relational model.Requires separate process model (or ORM boilerplate), SQL strings, class declarations.
DuckDBFastest in-process analytics for large datasets.SQL-centric, not Python-native; less ergonomic for lightweight scripts or tests.
RelPyDBFull relational model (schema, FK, joins, grouping, indexes, encryption, persistence) directly in Python objects.Slower than compiled engines on large analytical queries.

RelPyDB is aimed at the cases where you want relational structure and correctness checks, but you want to stay inside Python without spinning up a server or writing SQL strings.

Same task, four tools

Count confirmed players by game system from sessions + registrations. This is a real example from the RelPyDB benchmark.

RelPyDB — 32 lines
db = RelPy()

db.create_table("sessions")
db.add_column("sessions", "id", AutoNumber, is_primary_key=True)
db.add_column("sessions", "game_system", str, nullable=False)

db.create_table("registrations")
db.add_column("registrations", "id", AutoNumber, is_primary_key=True)
db.add_column(
    "registrations", "session_id", int,
    nullable=False, references="sessions.id",
)
db.add_column("registrations", "status", str, nullable=False)

sess1 = db.insert("sessions", {"game_system": "D&D 5e"})
sess2 = db.insert("sessions", {"game_system": "Pathfinder"})

db.insert("registrations", {"session_id": sess1["id"], "status": "confirmed"})
db.insert("registrations", {"session_id": sess1["id"], "status": "confirmed"})
db.insert("registrations", {"session_id": sess2["id"], "status": "waitlist"})

result = (
    db.query("registrations")
      .join("sessions")
      .where(col("registrations.status") == "confirmed")
      .group_by("sessions.game_system")
      .aggregate(total_players=count())
      .to_list()
)
SQLAlchemy ORM — 44 lines
Base = declarative_base()

class GameSession(Base):
    __tablename__ = "sessions"
    id          = Column(Integer, primary_key=True, autoincrement=True)
    game_system = Column(String, nullable=False)

class Registration(Base):
    __tablename__ = "registrations"
    id         = Column(Integer, primary_key=True, autoincrement=True)
    session_id = Column(Integer, ForeignKey("sessions.id"))
    status     = Column(String, nullable=False)

engine     = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
DbSession  = sessionmaker(bind=engine)
orm_session = DbSession()

sess1 = GameSession(game_system="D&D 5e")
sess2 = GameSession(game_system="Pathfinder")
orm_session.add_all([sess1, sess2])
orm_session.commit()

orm_session.add_all([
    Registration(session_id=sess1.id, status="confirmed"),
    Registration(session_id=sess1.id, status="confirmed"),
    Registration(session_id=sess2.id, status="waitlist"),
])
orm_session.commit()

result = (
    orm_session.query(
        GameSession.game_system,
        func.count(Registration.id).label("total_players"),
    )
    .join(Registration, Registration.session_id == GameSession.id)
    .filter(Registration.status == "confirmed")
    .group_by(GameSession.game_system)
    .all()
)

All four tools produce the same output

game_systemtotal_players
D&D 5e2

Performance — benchmark results

These numbers are from a live benchmark run on a dataset of 200 customers, 200 orders, and 250 support tickets (10 queries × 30 iterations). RelPyDB is a pure-Python in-memory library; SQLite and DuckDB are compiled engines. The comparison is useful for understanding relative behavior, not absolute production throughput.

QuerySQLAlchemy/SQLiteDuckDBRelPyDB
Revenue by region & category (4 joins)0.28 ms1.02 ms12.82 ms
Top customers by paid amount0.12 ms0.68 ms1.77 ms
Supplier category performance0.26 ms0.89 ms9.77 ms
Sales rep monthly performance0.12 ms0.62 ms1.57 ms
Payment method stats0.08 ms0.39 ms0.24 ms
Support tickets by category/status0.12 ms0.57 ms1.82 ms
Active customer revenue by region/tier0.11 ms0.72 ms2.27 ms
Monthly category revenue0.28 ms0.74 ms6.98 ms
RelPyDB's advantage is ergonomics and zero setup, not raw speed. For queries involving many cross-product joins on large datasets, SQLite or DuckDB will always be faster. RelPyDB outperforms both engines on startup time (8.95 ms vs 13 ms for SQLite and 206 ms for DuckDB) for the same schema.

When to use RelPyDB

  • You need a relational model (schema, FK, joins) but don't want to spin up a database server.
  • You're writing tests and need a fast, resettable in-memory relational fixture.
  • You're building a small tool, CLI, or script that processes structured JSON from an API.
  • You want encrypted column storage without an external secrets manager.
  • You want save/load of the full relational state to a single JSON file.

When to use something else

  • Large datasets (> 100 000 rows) — DuckDB or SQLite will be significantly faster.
  • Concurrent writes — RelPyDB is single-process in-memory. Use PostgreSQL or SQLite with WAL.
  • Analytical workloads on DataFrames you already have — pandas is more ergonomic for pure tabular work.