RpRelPyDBRelational Python

Documentation

8. Grouping & Aggregation

Summarise rows into groups and compute aggregate statistics.

group_by + aggregate

Once you call .group_by(), the query switches into grouped mode. You must follow it with .aggregate() before calling any terminal method.

RelPyDB
from relpy import RelPy, AutoNumber, col, count, sum_, avg

result = (
    db.query("orders")
      .group_by("status")
      .aggregate(
          order_count=count(),
          total=sum_("amount"),
          average=avg("amount"),
      )
      .to_list()
)
SQL equivalent
SELECT
  status,
  COUNT(*)         AS order_count,
  SUM(amount)      AS total,
  AVG(amount)      AS average
FROM orders
GROUP BY status;
Output
statusorder_counttotalaverage
paid2200.0100.0
pending1200.0200.0

Aggregate functions

FunctionImportSQLDescription
count()from relpy import countCOUNT(*)Count all rows in group.
count("col")from relpy import countCOUNT(col)Count non-None values in column.
sum_("col")from relpy import sum_SUM(col)Sum of values. (Trailing underscore avoids shadowing Python's sum.)
avg("col")from relpy import avgAVG(col)Mean of values.
min_("col")from relpy import min_MIN(col)Smallest value.
max_("col")from relpy import max_MAX(col)Largest value.

Multi-column group_by

RelPyDB
result = (
    db.query("orders")
      .join("users")
      .group_by("users.city", "orders.status")
      .aggregate(
          orders=count(),
          revenue=sum_("orders.amount"),
      )
      .order_by("revenue", descending=True)
      .to_list()
)
SQL equivalent
SELECT
  users.city,
  orders.status,
  COUNT(*)          AS orders,
  SUM(orders.amount) AS revenue
FROM orders
JOIN users
  ON orders.user_id = users.id
GROUP BY users.city, orders.status
ORDER BY revenue DESC;
Output (example)
users.cityorders.statusordersrevenue
Haifapending1200.0
Tel Avivpaid2200.0

having — filter groups

having() filters the grouped result, like WHERE but applied after aggregation.

RelPyDB
result = (
    db.query("orders")
      .group_by("status")
      .aggregate(
          order_count=count(),
          total=sum_("amount"),
      )
      .having(col("total") > 150)
      .to_list()
)
SQL equivalent
SELECT
  status,
  COUNT(*) AS order_count,
  SUM(amount) AS total
FROM orders
GROUP BY status
HAVING SUM(amount) > 150;
Output — only groups with total > 150
statusorder_counttotal
paid2200.0
pending1200.0

Real benchmark result — confirmed registrations by game system

This query was run in the RelPyDB benchmark against 10 000 registration records across 120 sessions. The same query ran in DuckDB, SQLAlchemy, pandas, NumPy, and Python loops — all returned identical results.

result = (
    db.query("registrations")
      .join("sessions")
      .where(
          (col("registrations.status")     == "confirmed") &
          (col("registrations.checked_in") == True)
      )
      .group_by("sessions.game_system")
      .aggregate(
          checked_in_players=count(),
          ticket_revenue=sum_("registrations.price"),
          average_ticket=avg("registrations.price"),
      )
      .order_by("checked_in_players", descending=True)
      .to_list()
)
Output — top 5 game systems
game_systemchecked_in_playersticket_revenueaverage_ticket
Mork Borg1 07954 035.050.08
Call of Cthulhu92345 020.048.78
D&D 5e85441 905.049.07
Blades in the Dark69333 895.048.91
Pathfinder59729 590.049.56