Documentation

8. Grouping & Aggregation

Summarise rows into groups and compute aggregate statistics.

group_by + aggregate

Once you call .group_by(), the query switches into grouped mode. You must follow it with .aggregate() before calling any terminal method.

RelPyDB

from relpy import RelPy, AutoNumber, col, count, sum_, avg

result = (
    db.query("orders")
      .group_by("status")
      .aggregate(
          order_count=count(),
          total=sum_("amount"),
          average=avg("amount"),
      )
      .to_list()
)

SQL equivalent

SELECT
  status,
  COUNT(*)         AS order_count,
  SUM(amount)      AS total,
  AVG(amount)      AS average
FROM orders
GROUP BY status;

Output
statusorder_counttotalaverage
paid2200.0100.0
pending1200.0200.0

status	order_count	total	average
paid	2	200.0	100.0
pending	1	200.0	200.0

Aggregate functions

Function	Import	SQL	Description
`count()`	`from relpy import count`	`COUNT(*)`	Count all rows in group.
`count("col")`	`from relpy import count`	`COUNT(col)`	Count non-None values in column.
`sum_("col")`	`from relpy import sum_`	`SUM(col)`	Sum of values. (Trailing underscore avoids shadowing Python's `sum`.)
`avg("col")`	`from relpy import avg`	`AVG(col)`	Mean of values.
`min_("col")`	`from relpy import min_`	`MIN(col)`	Smallest value.
`max_("col")`	`from relpy import max_`	`MAX(col)`	Largest value.

Multi-column group_by

RelPyDB

result = (
    db.query("orders")
      .join("users")
      .group_by("users.city", "orders.status")
      .aggregate(
          orders=count(),
          revenue=sum_("orders.amount"),
      )
      .order_by("revenue", descending=True)
      .to_list()
)

SQL equivalent

SELECT
  users.city,
  orders.status,
  COUNT(*)          AS orders,
  SUM(orders.amount) AS revenue
FROM orders
JOIN users
  ON orders.user_id = users.id
GROUP BY users.city, orders.status
ORDER BY revenue DESC;

Output (example)
users.cityorders.statusordersrevenue
Haifapending1200.0
Tel Avivpaid2200.0

users.city	orders.status	orders	revenue
Haifa	pending	1	200.0
Tel Aviv	paid	2	200.0

having — filter groups

having() filters the grouped result, like WHERE but applied after aggregation.

RelPyDB

result = (
    db.query("orders")
      .group_by("status")
      .aggregate(
          order_count=count(),
          total=sum_("amount"),
      )
      .having(col("total") > 150)
      .to_list()
)

SQL equivalent

SELECT
  status,
  COUNT(*) AS order_count,
  SUM(amount) AS total
FROM orders
GROUP BY status
HAVING SUM(amount) > 150;

Output — only groups with total > 150
statusorder_counttotal
paid2200.0
pending1200.0

status	order_count	total
paid	2	200.0
pending	1	200.0

Real benchmark result — confirmed registrations by game system

This query was run in the RelPyDB benchmark against 10 000 registration records across 120 sessions. The same query ran in DuckDB, SQLAlchemy, pandas, NumPy, and Python loops — all returned identical results.

result = (
    db.query("registrations")
      .join("sessions")
      .where(
          (col("registrations.status")     == "confirmed") &
          (col("registrations.checked_in") == True)
      )
      .group_by("sessions.game_system")
      .aggregate(
          checked_in_players=count(),
          ticket_revenue=sum_("registrations.price"),
          average_ticket=avg("registrations.price"),
      )
      .order_by("checked_in_players", descending=True)
      .to_list()
)

Output — top 5 game systems
game_systemchecked_in_playersticket_revenueaverage_ticket
Mork Borg1 07954 035.050.08
Call of Cthulhu92345 020.048.78
D&D 5e85441 905.049.07
Blades in the Dark69333 895.048.91
Pathfinder59729 590.049.56

game_system	checked_in_players	ticket_revenue	average_ticket
Mork Borg	1 079	54 035.0	50.08
Call of Cthulhu	923	45 020.0	48.78
D&D 5e	854	41 905.0	49.07
Blades in the Dark	693	33 895.0	48.91
Pathfinder	597	29 590.0	49.56

← 7. Joins 9. Views →