Documentation
8. Grouping & Aggregation
Summarise rows into groups and compute aggregate statistics.
group_by + aggregate
Once you call .group_by(), the query switches into grouped mode. You must follow it with .aggregate() before calling any terminal method.
RelPyDB
from relpy import RelPy, AutoNumber, col, count, sum_, avg
result = (
db.query("orders")
.group_by("status")
.aggregate(
order_count=count(),
total=sum_("amount"),
average=avg("amount"),
)
.to_list()
)
SQL equivalent
SELECT
status,
COUNT(*) AS order_count,
SUM(amount) AS total,
AVG(amount) AS average
FROM orders
GROUP BY status;
Output
| status | order_count | total | average |
|---|---|---|---|
| paid | 2 | 200.0 | 100.0 |
| pending | 1 | 200.0 | 200.0 |
Aggregate functions
| Function | Import | SQL | Description |
|---|---|---|---|
count() | from relpy import count | COUNT(*) | Count all rows in group. |
count("col") | from relpy import count | COUNT(col) | Count non-None values in column. |
sum_("col") | from relpy import sum_ | SUM(col) | Sum of values. (Trailing underscore avoids shadowing Python's sum.) |
avg("col") | from relpy import avg | AVG(col) | Mean of values. |
min_("col") | from relpy import min_ | MIN(col) | Smallest value. |
max_("col") | from relpy import max_ | MAX(col) | Largest value. |
Multi-column group_by
RelPyDB
result = (
db.query("orders")
.join("users")
.group_by("users.city", "orders.status")
.aggregate(
orders=count(),
revenue=sum_("orders.amount"),
)
.order_by("revenue", descending=True)
.to_list()
)
SQL equivalent
SELECT
users.city,
orders.status,
COUNT(*) AS orders,
SUM(orders.amount) AS revenue
FROM orders
JOIN users
ON orders.user_id = users.id
GROUP BY users.city, orders.status
ORDER BY revenue DESC;
Output (example)
| users.city | orders.status | orders | revenue |
|---|---|---|---|
| Haifa | pending | 1 | 200.0 |
| Tel Aviv | paid | 2 | 200.0 |
having — filter groups
having() filters the grouped result, like WHERE but applied after aggregation.
RelPyDB
result = (
db.query("orders")
.group_by("status")
.aggregate(
order_count=count(),
total=sum_("amount"),
)
.having(col("total") > 150)
.to_list()
)
SQL equivalent
SELECT
status,
COUNT(*) AS order_count,
SUM(amount) AS total
FROM orders
GROUP BY status
HAVING SUM(amount) > 150;
Output — only groups with total > 150
| status | order_count | total |
|---|---|---|
| paid | 2 | 200.0 |
| pending | 1 | 200.0 |
Real benchmark result — confirmed registrations by game system
This query was run in the RelPyDB benchmark against 10 000 registration records across 120 sessions. The same query ran in DuckDB, SQLAlchemy, pandas, NumPy, and Python loops — all returned identical results.
result = (
db.query("registrations")
.join("sessions")
.where(
(col("registrations.status") == "confirmed") &
(col("registrations.checked_in") == True)
)
.group_by("sessions.game_system")
.aggregate(
checked_in_players=count(),
ticket_revenue=sum_("registrations.price"),
average_ticket=avg("registrations.price"),
)
.order_by("checked_in_players", descending=True)
.to_list()
)
Output — top 5 game systems
| game_system | checked_in_players | ticket_revenue | average_ticket |
|---|---|---|---|
| Mork Borg | 1 079 | 54 035.0 | 50.08 |
| Call of Cthulhu | 923 | 45 020.0 | 48.78 |
| D&D 5e | 854 | 41 905.0 | 49.07 |
| Blades in the Dark | 693 | 33 895.0 | 48.91 |
| Pathfinder | 597 | 29 590.0 | 49.56 |