Documentation
2. Why RelPyDB?
Where RelPyDB fits relative to dicts, pandas, SQLite, and ORMs — and when to reach for each.
The gap RelPyDB fills
Python has many ways to store and query structured data. Each solves a different problem:
| Tool | Strength | Gap |
|---|---|---|
| Plain dicts / lists | Zero setup. Pure Python. | No schema, no typed enforcement, no joins, no FK constraints. |
| pandas | Excellent for tabular analytics and data transformation. | No schema, no foreign keys, relational joins require manual merge steps. |
| SQLite / SQLAlchemy | Real SQL, file-backed, full relational model. | Requires separate process model (or ORM boilerplate), SQL strings, class declarations. |
| DuckDB | Fastest in-process analytics for large datasets. | SQL-centric, not Python-native; less ergonomic for lightweight scripts or tests. |
| RelPyDB | Full relational model (schema, FK, joins, grouping, indexes, encryption, persistence) directly in Python objects. | Slower than compiled engines on large analytical queries. |
RelPyDB is aimed at the cases where you want relational structure and correctness checks, but you want to stay inside Python without spinning up a server or writing SQL strings.
Same task, four tools
Count confirmed players by game system from sessions + registrations. This is a real example from the RelPyDB benchmark.
db = RelPy()
db.create_table("sessions")
db.add_column("sessions", "id", AutoNumber, is_primary_key=True)
db.add_column("sessions", "game_system", str, nullable=False)
db.create_table("registrations")
db.add_column("registrations", "id", AutoNumber, is_primary_key=True)
db.add_column(
"registrations", "session_id", int,
nullable=False, references="sessions.id",
)
db.add_column("registrations", "status", str, nullable=False)
sess1 = db.insert("sessions", {"game_system": "D&D 5e"})
sess2 = db.insert("sessions", {"game_system": "Pathfinder"})
db.insert("registrations", {"session_id": sess1["id"], "status": "confirmed"})
db.insert("registrations", {"session_id": sess1["id"], "status": "confirmed"})
db.insert("registrations", {"session_id": sess2["id"], "status": "waitlist"})
result = (
db.query("registrations")
.join("sessions")
.where(col("registrations.status") == "confirmed")
.group_by("sessions.game_system")
.aggregate(total_players=count())
.to_list()
)
Base = declarative_base()
class GameSession(Base):
__tablename__ = "sessions"
id = Column(Integer, primary_key=True, autoincrement=True)
game_system = Column(String, nullable=False)
class Registration(Base):
__tablename__ = "registrations"
id = Column(Integer, primary_key=True, autoincrement=True)
session_id = Column(Integer, ForeignKey("sessions.id"))
status = Column(String, nullable=False)
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
DbSession = sessionmaker(bind=engine)
orm_session = DbSession()
sess1 = GameSession(game_system="D&D 5e")
sess2 = GameSession(game_system="Pathfinder")
orm_session.add_all([sess1, sess2])
orm_session.commit()
orm_session.add_all([
Registration(session_id=sess1.id, status="confirmed"),
Registration(session_id=sess1.id, status="confirmed"),
Registration(session_id=sess2.id, status="waitlist"),
])
orm_session.commit()
result = (
orm_session.query(
GameSession.game_system,
func.count(Registration.id).label("total_players"),
)
.join(Registration, Registration.session_id == GameSession.id)
.filter(Registration.status == "confirmed")
.group_by(GameSession.game_system)
.all()
)
All four tools produce the same output
| game_system | total_players |
|---|---|
| D&D 5e | 2 |
Performance — benchmark results
These numbers are from a live benchmark run on a dataset of 200 customers, 200 orders, and 250 support tickets (10 queries × 30 iterations). RelPyDB is a pure-Python in-memory library; SQLite and DuckDB are compiled engines. The comparison is useful for understanding relative behavior, not absolute production throughput.
| Query | SQLAlchemy/SQLite | DuckDB | RelPyDB |
|---|---|---|---|
| Revenue by region & category (4 joins) | 0.28 ms | 1.02 ms | 12.82 ms |
| Top customers by paid amount | 0.12 ms | 0.68 ms | 1.77 ms |
| Supplier category performance | 0.26 ms | 0.89 ms | 9.77 ms |
| Sales rep monthly performance | 0.12 ms | 0.62 ms | 1.57 ms |
| Payment method stats | 0.08 ms | 0.39 ms | 0.24 ms |
| Support tickets by category/status | 0.12 ms | 0.57 ms | 1.82 ms |
| Active customer revenue by region/tier | 0.11 ms | 0.72 ms | 2.27 ms |
| Monthly category revenue | 0.28 ms | 0.74 ms | 6.98 ms |
When to use RelPyDB
- You need a relational model (schema, FK, joins) but don't want to spin up a database server.
- You're writing tests and need a fast, resettable in-memory relational fixture.
- You're building a small tool, CLI, or script that processes structured JSON from an API.
- You want encrypted column storage without an external secrets manager.
- You want save/load of the full relational state to a single JSON file.
When to use something else
- Large datasets (> 100 000 rows) — DuckDB or SQLite will be significantly faster.
- Concurrent writes — RelPyDB is single-process in-memory. Use PostgreSQL or SQLite with WAL.
- Analytical workloads on DataFrames you already have — pandas is more ergonomic for pure tabular work.