SQL Window Functions Demystified: PARTITION BY, ORDER BY & Frames
Window Functions Explained: A Comprehensive Reference for SQL Power Users
Many data professionals use SQL every day for analysis, yet only scratch the surface of its analytical power. Aggregate functions force data into summarized groups and discard row-level detail, so calculations across related rows have traditionally required complex subqueries and self-joins. Window functions remove that bottleneck: you can calculate a running total, rank items within categories, or compare a row's value to the preceding one, all within a single query that preserves every individual record. This guide shows how to use SQL window functions to tackle complex analytical problems, streamline your queries, and perform calculations that would otherwise be cumbersome or impractical.
This reference starts from the fundamental concepts, explaining how window functions perform calculations across a set of table rows that are related to the current row, without aggregating and collapsing your data. You'll learn the intricacies of the `PARTITION BY` clause, the critical role of `ORDER BY` in defining the sequence of operations, and the subtle but powerful differences between `ROWS` and `RANGE` frame specifications. The guide provides practical examples, performance optimization strategies, and a detailed exploration of over ten key window functions.
The Power of Window Functions: Beyond `GROUP BY`
At its core, a window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions (like `SUM()`, `AVG()`, `COUNT()`) that collapse rows into a single output row for each group defined by `GROUP BY`, window functions perform calculations on a "window" of rows and return a single value for each row in the original dataset. This means you retain all individual records while still benefiting from aggregate-like calculations, a paradigm shift for advanced analytics.
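The contrast with `GROUP BY` can be seen in a minimal sketch. It assumes Python's standard `sqlite3` module backed by SQLite 3.25 or later (the first release with window functions); the `sales` table and its rows are hypothetical sample data, not part of the article's schema.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("East", 200), ("West", 50)],
)

# GROUP BY collapses each region into a single summary row.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# The window version keeps every row and attaches the regional total to it.
windowed = conn.execute(
    """
    SELECT region, amount,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, amount
    """
).fetchall()

print(grouped)   # [('East', 300), ('West', 50)]
print(windowed)  # [('East', 100, 300), ('East', 200, 300), ('West', 50, 50)]
```

Three rows go in and three rows come out of the windowed query, each carrying its group's total alongside the original detail.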
Why Window Functions Are Essential
- Preserve Detail: Unlike `GROUP BY`, which aggregates rows, window functions return a value for *each* row, maintaining the original dataset's granularity.
- Complex Calculations: Easily perform calculations like running totals, moving averages, rankings, and comparisons between rows.
- Simplified Queries: Reduce the need for self-joins and subqueries, leading to cleaner, more readable, and often more performant SQL code.
- Insight Generation: Uncover patterns, trends, and anomalies that are difficult or impossible to detect with standard aggregate functions alone.
Categories of Window Functions
- Ranking Functions: Assign a rank to each row within its partition.
  - `ROW_NUMBER()`: Assigns a unique, sequential integer to each row within its partition.
  - `RANK()`: Assigns a rank with gaps if there are ties.
  - `DENSE_RANK()`: Assigns a rank without gaps for ties.
  - `NTILE(n)`: Divides rows into a specified number of groups (tiles).
  - `PERCENT_RANK()`: Calculates the relative rank of a row within a group.
  - `CUME_DIST()`: Calculates the cumulative distribution of a value within a set.
- Analytic (Value) Functions: Return a value from a row within the window, often for comparison.
  - `LAG(column, offset, default)`: Retrieves the value of a preceding row.
  - `LEAD(column, offset, default)`: Retrieves the value of a subsequent row.
  - `FIRST_VALUE(column)`: Returns the value of the first row in the window frame.
  - `LAST_VALUE(column)`: Returns the value of the last row in the window frame.
  - `NTH_VALUE(column, n)`: Returns the value of the Nth row in the window frame.
- Aggregate Window Functions: Perform standard aggregate functions (`SUM`, `AVG`, `COUNT`, `MIN`, `MAX`) over a window defined by the `OVER()` clause.
The basic syntax for any window function is `WINDOW_FUNCTION() OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE ...)`. The `OVER()` clause is what transforms a regular function into a window function.
Partitioning Your Data: The `PARTITION BY` Clause
The `PARTITION BY` clause within the `OVER()` clause divides the result set into partitions to which the window function is applied. Conceptually, it's similar to the `GROUP BY` clause, but with a crucial difference: `PARTITION BY` groups rows for the window function's computation without collapsing the individual rows. Each partition acts as an independent mini-dataset, and the window function restarts its calculation for each new partition.
How `PARTITION BY` Works
When you specify `PARTITION BY column1, column2`, the database processes the window function separately for each unique combination of values in `column1` and `column2`. For example, if you want to rank sales within each product category, `PARTITION BY product_category` would ensure that ranking starts from 1 for every new category.
Example: Ranking Customers by Purchase Amount per Region
Consider a `Sales` table with `Region`, `CustomerID`, and `PurchaseAmount`. We want to rank purchases within each region, from the largest amount to the smallest.
SELECT
Region,
CustomerID,
PurchaseAmount,
ROW_NUMBER() OVER (PARTITION BY Region ORDER BY PurchaseAmount DESC) AS RankWithinRegion
FROM
Sales;
Impact on Window Function Behavior
- Scope of Calculation: `PARTITION BY` defines the boundaries for the window function. A `SUM()` over a partition will sum only within that partition.
- Restarting Logic: For ranking functions like `ROW_NUMBER()`, the count resets to 1 at the start of each new partition.
- Data Integrity: The number of rows returned by the query remains unchanged; only the calculated value per row is added.
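The restart behavior and the unchanged row count listed above can be checked with a small sketch. It assumes Python's standard `sqlite3` module with SQLite 3.25 or later; the `sales` rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE sales (region TEXT, customer_id INTEGER, purchase_amount INTEGER)"
)
sample = [  # hypothetical rows
    ("East", 1, 300), ("East", 2, 150),
    ("West", 3, 500), ("West", 4, 120), ("West", 5, 80),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sample)

ranked = conn.execute(
    """
    SELECT region, customer_id, purchase_amount,
           ROW_NUMBER() OVER (
               PARTITION BY region
               ORDER BY purchase_amount DESC
           ) AS rank_within_region
    FROM sales
    ORDER BY region, rank_within_region
    """
).fetchall()

for row in ranked:
    print(row)
# ('East', 1, 300, 1)
# ('East', 2, 150, 2)
# ('West', 3, 500, 1)   <- numbering restarts for the new partition
# ('West', 4, 120, 2)
# ('West', 5, 80, 3)
```

The query returns exactly as many rows as the table holds; only the extra column is added.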
Ordering Within Partitions: The `ORDER BY` Clause
The `ORDER BY` clause within the `OVER()` specifies the logical order of rows within each partition (or the entire result set if `PARTITION BY` is absent). This ordering is crucial for functions that depend on sequence, such as ranking functions (`RANK`, `ROW_NUMBER`), and those that compare a row to its predecessors or successors (`LAG`, `LEAD`), or determine frame boundaries.
Significance of `ORDER BY` in Windows
Without an `ORDER BY` clause, the order of rows within a partition is non-deterministic. While some aggregate window functions (like `SUM() OVER(PARTITION BY...)`) might yield consistent results without `ORDER BY` (as summation order doesn't matter), many other window functions will behave unpredictably or error out. For example, `ROW_NUMBER()` without an `ORDER BY` will assign arbitrary ranks.
Example: Ranking Products by Sales in Descending Order
Let's refine our sales ranking example. We want to rank products by their sales amount within each category, from highest to lowest.
SELECT
ProductCategory,
ProductName,
SalesAmount,
RANK() OVER (PARTITION BY ProductCategory ORDER BY SalesAmount DESC) AS ProductRank
FROM
ProductSales;
In this query, `PARTITION BY ProductCategory` ensures the ranking resets for each product category, and `ORDER BY SalesAmount DESC` sorts products from highest to lowest sales within each category before assigning ranks.
Interaction with Ranking Functions
The `ORDER BY` clause dictates how ranking functions assign their values, especially when ties are present:
| Function | Behavior with Ties | Example Output (tied values in bold) |
|---|---|---|
| `ROW_NUMBER()` | Assigns unique, sequential numbers. Ties receive different, arbitrary ranks based on internal order. | 1, 2, **3**, **4**, 5 (for values A, B, C, C, D) |
| `RANK()` | Assigns the same rank to tied rows, then skips the next rank(s). | 1, 2, **3**, **3**, 5 (for values A, B, C, C, D) |
| `DENSE_RANK()` | Assigns the same rank to tied rows, but does not skip ranks. | 1, 2, **3**, **3**, 4 (for values A, B, C, C, D) |
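The tie behavior of the three ranking functions can be reproduced with the same five values (A, B, C, C, D). This is a sketch using Python's standard `sqlite3` module (SQLite 3.25 or later assumed); the table name `t` is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE t (val TEXT)")  # placeholder table
conn.executemany("INSERT INTO t VALUES (?)", [("A",), ("B",), ("C",), ("C",), ("D",)])

ranks = conn.execute(
    """
    SELECT val,
           ROW_NUMBER() OVER (ORDER BY val) AS rn,
           RANK()       OVER (ORDER BY val) AS rnk,
           DENSE_RANK() OVER (ORDER BY val) AS dense_rnk
    FROM t
    ORDER BY rn
    """
).fetchall()

for row in ranks:
    print(row)
# ('A', 1, 1, 1)
# ('B', 2, 2, 2)
# ('C', 3, 3, 3)
# ('C', 4, 3, 3)
# ('D', 5, 5, 4)
```

The two `C` rows get distinct row numbers (which one gets 3 is arbitrary), share rank 3 under both `RANK()` and `DENSE_RANK()`, and only `RANK()` skips to 5 for `D`.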
Defining the Analytical Frame: `ROWS` vs. `RANGE`
Beyond `PARTITION BY` and `ORDER BY`, the frame specification allows you to precisely define the subset of rows within a partition that constitutes the "window" for the current row's calculation. This is typically specified using `ROWS` or `RANGE` clauses, which determine the logical or physical relationship of rows to the current row.
Default Frame Behavior
When you use an `ORDER BY` clause within `OVER()` but do not specify a frame, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. This means the window includes all rows from the beginning of the partition up to the current row, including any peers (rows with identical `ORDER BY` values).
If you omit `ORDER BY`, every row in the partition is a peer of every other row, so the frame effectively covers the whole partition (equivalent to `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`). With no `PARTITION BY` either, that is the entire result set.
Understanding Frame Syntax
A frame specification typically looks like this:
(ROWS | RANGE) BETWEEN (UNBOUNDED PRECEDING | N PRECEDING | CURRENT ROW)
AND (N FOLLOWING | CURRENT ROW | UNBOUNDED FOLLOWING)
- `UNBOUNDED PRECEDING` / `UNBOUNDED FOLLOWING`: Extends the frame to the very first/last row of the partition.
- `N PRECEDING` / `N FOLLOWING`: Specifies a number of rows before/after the current row. For `ROWS`, `N` is a physical count. For `RANGE`, `N` is an offset from the current row's `ORDER BY` value.
- `CURRENT ROW`: The current row being evaluated.
`ROWS` vs. `RANGE` in Detail
The distinction between `ROWS` and `RANGE` is subtle but critical, especially when dealing with ties in the `ORDER BY` clause.
| Feature/Aspect | `ROWS` Frame | `RANGE` Frame |
|---|---|---|
| Definition | Defines the frame based on a physical offset from the current row. | Defines the frame based on a logical offset (value) from the current row's `ORDER BY` value. |
| Tie Handling | Treats each row individually, even if they have identical `ORDER BY` values. If `1 PRECEDING` is used, it will literally look at the one physical row before. | Includes all "peer" rows (rows with the same `ORDER BY` value as the current row) if they fall within the logical range. |
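| `ORDER BY` Requirement | Optional for some aggregates, but highly recommended for deterministic results with functions like `LAG`, `LEAD`, and `ROW_NUMBER`. | Required whenever the frame uses a numeric offset (`N PRECEDING` / `N FOLLOWING`), and in that case it must be a single `ORDER BY` column of a numeric or date/time type. With only `UNBOUNDED` and `CURRENT ROW` bounds, `RANGE` works with any `ORDER BY`. |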
| Use Cases | Moving averages over a fixed number of rows (e.g., last 3 sales), running totals based on row count. | Moving averages over a value range (e.g., sales within 10 units of the current sale price), cumulative distributions where peers are important. |
| Example Frame | `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` (includes the current row and the 2 physical rows before it). | `RANGE BETWEEN 10 PRECEDING AND CURRENT ROW` (with an ascending `ORDER BY`, includes every row whose `ORDER BY` value lies between the current row's value minus 10 and the current row's value, peers included). |
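The tie-handling difference in the table is easiest to see in a running total over data that contains peers. The sketch below assumes Python's standard `sqlite3` module with SQLite 3.25 or later; the `daily` table is hypothetical, and the two tied rows deliberately carry the same amount so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE daily (day INTEGER, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [(1, 10), (2, 20), (2, 20), (3, 40)],  # day 2 appears twice: peer rows
)

totals = conn.execute(
    """
    SELECT day, amount,
           SUM(amount) OVER (
               ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_total,
           SUM(amount) OVER (
               ORDER BY day
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS range_total
    FROM daily
    ORDER BY day, rows_total
    """
).fetchall()

for row in totals:
    print(row)
# (1, 10, 10, 10)
# (2, 20, 30, 50)   <- ROWS stops at this physical row; RANGE includes its peer
# (2, 20, 50, 50)
# (3, 40, 90, 90)
```

With `ROWS`, the two day-2 rows see different totals. With `RANGE`, which is also the default frame when `ORDER BY` is present, both peers see the same total.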
Example with `ROWS`: Calculating a 3-Day Moving Average
SELECT
SaleDate,
DailySales,
AVG(DailySales) OVER (ORDER BY SaleDate ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ThreeDayMovingAvg
FROM
DailySalesData;
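This calculates the average of the current row's sales and the two preceding rows. It is a true 3-day average only when `DailySalesData` has exactly one row per day with no gaps, and the first two rows average over fewer than three values.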
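A short sketch shows how the frame behaves at the start of the data, where fewer than three rows are available. It assumes Python's standard `sqlite3` module with SQLite 3.25 or later; the four sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE daily_sales_data (sale_date TEXT, daily_sales INTEGER)")
conn.executemany(
    "INSERT INTO daily_sales_data VALUES (?, ?)",
    [  # hypothetical data, one row per day
        ("2024-01-01", 10),
        ("2024-01-02", 20),
        ("2024-01-03", 30),
        ("2024-01-04", 40),
    ],
)

moving = conn.execute(
    """
    SELECT sale_date, daily_sales,
           AVG(daily_sales) OVER (
               ORDER BY sale_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS three_row_avg
    FROM daily_sales_data
    ORDER BY sale_date
    """
).fetchall()

for row in moving:
    print(row)
# ('2024-01-01', 10, 10.0)   <- only one row in the frame
# ('2024-01-02', 20, 15.0)   <- two rows in the frame
# ('2024-01-03', 30, 20.0)
# ('2024-01-04', 40, 30.0)
```

If partial windows are not wanted, one option is to null them out in an outer query based on a `COUNT(*) OVER (...)` computed with the same frame.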
Example with `RANGE`: Cumulative Sum within a Price Range
-- Assuming 'Price' is the ORDER BY column
SELECT
ProductID,
Price,
SUM(Quantity) OVER (ORDER BY Price RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CumulativeQuantityByPrice
FROM
ProductInventory;
This sums `Quantity` for all products with a price less than or equal to the current product's price. If multiple products have the same price, they are all included up to that price point.
Advanced Window Functions: Over 10 Practical Examples
Let's explore several practical applications of various window functions beyond basic ranking, demonstrating their versatility and power in real-world scenarios. We'll use a hypothetical `Orders` table with columns like `OrderID`, `CustomerID`, `OrderDate`, `OrderAmount`, and `ProductCategory`.
1. `LAG()` and `LEAD()`: Accessing Previous/Next Rows
These functions are indispensable for time-series analysis, comparing consecutive values, or identifying changes. `LAG()` retrieves a value from a row an offset number of rows before the current row, while `LEAD()` retrieves from a row an offset number of rows after the current row.
-- Calculate the difference in order amount from the previous order for each customer
SELECT
CustomerID,
OrderDate,
OrderAmount,
LAG(OrderAmount, 1, 0) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS PreviousOrderAmount,
OrderAmount - LAG(OrderAmount, 1, 0) OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS OrderAmountDifference
FROM
Orders
ORDER BY
CustomerID, OrderDate;
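The effect of the third argument is worth seeing side by side with the plain call. The sketch below assumes Python's standard `sqlite3` module with SQLite 3.25 or later; the `orders` rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, order_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [  # hypothetical orders
        (1, "2024-01-01", 100),
        (1, "2024-01-05", 150),
        (2, "2024-01-02", 80),
    ],
)

neighbours = conn.execute(
    """
    SELECT customer_id, order_date, order_amount,
           LAG(order_amount)       OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_or_null,
           LAG(order_amount, 1, 0) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_or_zero,
           LEAD(order_amount)      OVER (PARTITION BY customer_id ORDER BY order_date) AS next_or_null
    FROM orders
    ORDER BY customer_id, order_date
    """
).fetchall()

for row in neighbours:
    print(row)
# (1, '2024-01-01', 100, None, 0, 150)
# (1, '2024-01-05', 150, 100, 100, None)
# (2, '2024-01-02', 80, None, 0, None)
```

Note the design trade-off: with a default of 0, the difference for a customer's first order equals the full order amount, whereas leaving the default as `NULL` makes "no previous order" explicit in the result.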
2. `FIRST_VALUE()` and `LAST_VALUE()`: Retrieving Extremes
These functions return the value of the specified expression from the first or last row in the current window frame, respectively.
-- Get the first order amount for each customer
SELECT
CustomerID,
OrderDate,
OrderAmount,
FIRST_VALUE(OrderAmount) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FirstOrderAmount
FROM
Orders
ORDER BY
CustomerID, OrderDate;
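The explicit frame matters most for `LAST_VALUE()`. With only `ORDER BY`, the default frame ends at the current row, so `LAST_VALUE()` simply echoes the current row's value. A sketch of the difference, assuming Python's standard `sqlite3` module with SQLite 3.25 or later and hypothetical `orders` rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_date TEXT, order_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 100), (1, "2024-01-05", 150), (1, "2024-01-09", 120)],
)

last_values = conn.execute(
    """
    SELECT order_date, order_amount,
           -- Default frame: partition start up to the current row
           LAST_VALUE(order_amount) OVER (
               PARTITION BY customer_id ORDER BY order_date
           ) AS last_default_frame,
           -- Explicit frame covering the whole partition
           LAST_VALUE(order_amount) OVER (
               PARTITION BY customer_id ORDER BY order_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_full_frame
    FROM orders
    ORDER BY order_date
    """
).fetchall()

for row in last_values:
    print(row)
# ('2024-01-01', 100, 100, 120)
# ('2024-01-05', 150, 150, 120)
# ('2024-01-09', 120, 120, 120)
```

Only the full-partition frame returns the customer's most recent order amount on every row.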
3. Aggregate Functions with `OVER()`: Running Totals and Moving Averages
Any aggregate function can be used as a window function by simply adding the `OVER()` clause. This allows for powerful calculations like running sums, moving averages, or cumulative counts.
-- Calculate a running total of order amounts for each customer
SELECT
CustomerID,
OrderDate,
OrderAmount,
SUM(OrderAmount) OVER (PARTITION BY CustomerID ORDER BY OrderDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM
Orders
ORDER BY
CustomerID, OrderDate;
-- Calculate a 7-row moving average of daily totals (entire dataset, ignoring customers); this is a true 7-day average only if every date has at least one order
SELECT
OrderDate,
SUM(OrderAmount) AS DailyTotal,
AVG(SUM(OrderAmount)) OVER (ORDER BY OrderDate ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS SevenDayMovingAverage
FROM
Orders
GROUP BY
OrderDate
ORDER BY
OrderDate;
4. `NTH_VALUE()`: Retrieving the Nth Value
Returns the value of the expression from the Nth row in the window frame, specified by an integer argument `N`.
-- Find the second largest order for each customer
SELECT
CustomerID,
OrderDate,
OrderAmount,
NTH_VALUE(OrderAmount, 2) OVER (PARTITION BY CustomerID ORDER BY OrderAmount DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SecondLargestOrder
FROM
Orders
ORDER BY
CustomerID, OrderDate;
5. `CUME_DIST()` and `PERCENT_RANK()`: Relative Ranking
These functions calculate the relative position of a row within its partition. `CUME_DIST()` calculates the cumulative distribution (the fraction of rows in the partition whose value is less than or equal to the current row's value, in the window's sort order), and `PERCENT_RANK()` calculates the relative rank ((rank - 1) / (rows in partition - 1)). Both return values between 0 and 1 rather than percentages.
-- Calculate the cumulative distribution and percentage rank of order amounts within each product category
SELECT
ProductCategory,
OrderID,
OrderAmount,
CUME_DIST() OVER (PARTITION BY ProductCategory ORDER BY OrderAmount) AS CumulativeDistribution,
PERCENT_RANK() OVER (PARTITION BY ProductCategory ORDER BY OrderAmount) AS PercentageRank
FROM
Orders
ORDER BY
ProductCategory, OrderAmount;
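The arithmetic behind both functions can be worked through on four rows. This sketch assumes Python's standard `sqlite3` module with SQLite 3.25 or later; the single-column `orders` table is a simplified stand-in for the article's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE orders (order_amount INTEGER)")  # simplified, hypothetical
conn.executemany("INSERT INTO orders VALUES (?)", [(10,), (20,), (20,), (40,)])

raw = conn.execute(
    """
    SELECT order_amount,
           CUME_DIST()    OVER (ORDER BY order_amount) AS cume_dist,
           PERCENT_RANK() OVER (ORDER BY order_amount) AS pct_rank
    FROM orders
    ORDER BY order_amount
    """
).fetchall()

# Round only for display; the database returns floating-point values.
distribution = [(amount, round(cd, 3), round(pr, 3)) for amount, cd, pr in raw]

for row in distribution:
    print(row)
# (10, 0.25, 0.0)     cume_dist = 1/4, pct_rank = (1 - 1) / (4 - 1)
# (20, 0.75, 0.333)   cume_dist = 3/4, pct_rank = (2 - 1) / (4 - 1)
# (20, 0.75, 0.333)   peers share both values
# (40, 1.0, 1.0)      cume_dist = 4/4, pct_rank = (4 - 1) / (4 - 1)
```

`CUME_DIST()` never returns 0 because the current row always counts toward itself, while `PERCENT_RANK()` always starts at 0 for the first rank.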
6. `NTILE(n)`: Grouping into Quantiles
This function divides the rows in an ordered partition into a specified number of groups (tiles), assigning an integer representing the tile to each row. This is useful for creating percentiles, quartiles, or deciles.
-- Divide customers into 4 quartiles based on their total purchase amount
WITH CustomerTotals AS (
SELECT
CustomerID,
SUM(OrderAmount) AS TotalPurchaseAmount
FROM
Orders
GROUP BY
CustomerID
)
SELECT
CustomerID,
TotalPurchaseAmount,
NTILE(4) OVER (ORDER BY TotalPurchaseAmount DESC) AS PurchaseQuartile
FROM
CustomerTotals
ORDER BY
TotalPurchaseAmount DESC;
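When the row count does not divide evenly, `NTILE(n)` gives the extra rows to the earliest tiles. A sketch with ten hypothetical customers split into four tiles, assuming Python's standard `sqlite3` module with SQLite 3.25 or later:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute(
    "CREATE TABLE customer_totals (customer_id INTEGER, total_purchase_amount INTEGER)"
)
# Ten hypothetical customers with totals 100, 90, ..., 10
conn.executemany(
    "INSERT INTO customer_totals VALUES (?, ?)",
    [(i, 110 - 10 * i) for i in range(1, 11)],
)

quartiles = conn.execute(
    """
    SELECT customer_id,
           NTILE(4) OVER (ORDER BY total_purchase_amount DESC) AS purchase_quartile
    FROM customer_totals
    ORDER BY total_purchase_amount DESC
    """
).fetchall()

tiles = [tile for _, tile in quartiles]
print(tiles)  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```

Ten rows over four tiles yields group sizes of 3, 3, 2, and 2, so "quartile" boundaries are approximate on small datasets.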
These examples illustrate just a fraction of the analytical power unleashed by window functions. Their ability to perform calculations over related rows without aggregation makes them indispensable for deep data analysis.
Optimizing Window Function Performance
While incredibly powerful, poorly optimized window functions can lead to significant performance bottlenecks, especially with large datasets. Understanding how to use them efficiently is crucial for scalable analytics: as with most complex SQL, the cost is usually dominated by how much data has to be read and sorted.
1. Smart Partitioning and Ordering
- Watch Partition Size: Each partition has to be sorted and, for many functions, buffered. One giant partition forces a single large sort that can spill to disk, while millions of tiny partitions add per-partition overhead. Strive for a balanced distribution.
- Index Your `PARTITION BY` and `ORDER BY` Columns: This is arguably the most critical optimization. If your `PARTITION BY` and `ORDER BY` columns are indexed, the database can quickly locate and sort the necessary rows for each window frame, significantly reducing I/O and CPU overhead.
-- Example: create an index to support window functions on the Orders table
CREATE INDEX idx_orders_customer_date
    ON Orders (CustomerID, OrderDate); -- column order and sort direction should match the window's PARTITION BY / ORDER BY
- Consider Composite Indexes: For `PARTITION BY col1 ORDER BY col2`, a composite index on `(col1, col2)` is often ideal.
2. Frame Specification Efficiency
- Use the Smallest Necessary Frame: Don't use `UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` if `CURRENT ROW` or a limited `N PRECEDING` suffices. A smaller frame means less data to process for each row.
- Beware of `RANGE` with Large Value Differences: While powerful, `RANGE` can create very large frames if the `ORDER BY` column has many peers or values close to each other, especially for `UNBOUNDED PRECEDING`.
3. General Query Optimizations
- Filter Early: Apply `WHERE` clauses before window functions are processed whenever possible to reduce the total number of rows the function needs to evaluate.
- Use CTEs for Clarity and Reuse: Common Table Expressions (CTEs) can break down complex queries into manageable, logical steps, improving readability and sometimes allowing the optimizer to find better execution plans.
WITH RankedSales AS (
    SELECT
        ProductCategory,
        ProductName,
        SalesAmount,
        RANK() OVER (PARTITION BY ProductCategory ORDER BY SalesAmount DESC) AS ProductRank
    FROM
        ProductSales
    WHERE
        SaleDate >= '2023-01-01' -- Filter early!
)
SELECT *
FROM RankedSales
WHERE ProductRank <= 3;
- Avoid Unnecessary Duplication: If you perform the same window function multiple times, consider doing it once in a subquery or CTE and referencing the result.
4. Hardware and Database Configuration
While outside the scope of query writing, remember that sufficient memory (RAM) and fast I/O (SSD storage) are critical for window functions, as they often require temporary storage and sorting of large datasets. Modern database systems (e.g., PostgreSQL 16, SQL Server 2022) have also implemented significant internal optimizations for window function processing.
Mastering Window Functions: Best Practices and Advanced Tips
To truly master window functions and ensure your code is robust, readable, and efficient, consider these best practices:
1. Prioritize Readability and Clarity
- Use Aliases: Always alias your window function expressions for clarity.
- Format Consistently: Indent your `OVER()` clause and its components (`PARTITION BY`, `ORDER BY`, `ROWS/RANGE`) for better readability.
- Comments: Use comments to explain complex logic, especially with nested window functions or intricate frame specifications.
2. Understand Default Behaviors
Many common mistakes stem from not fully grasping the default frame specifications when `ROWS` or `RANGE` are omitted. Remember:
- If `ORDER BY` is present, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`.
- If `ORDER BY` is absent, all rows in the partition are peers, so the frame is effectively `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` (the entire partition).
3. Test Edge Cases
Always test your window functions with:
- An empty input (no rows reach the window function).
- Partitions with only one row.
- Partitions where all `ORDER BY` values are identical (ties).
- `NULL` values in `PARTITION BY` or `ORDER BY` columns.
4. Combine with Other SQL Constructs
Window functions often shine when combined with other SQL features:
- CTEs (Common Table Expressions): Essential for breaking down complex logic, especially when you need to apply multiple window functions sequentially or filter based on a window function's result.
- Subqueries: Can be used similar to CTEs, though CTEs often improve readability.
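- Filtering: Use `WHERE` clauses to filter *before* applying window functions to reduce the dataset size. To filter *after* a window function has calculated its value, use an outer query or CTE (or `QUALIFY` in dialects that support it, such as Snowflake and BigQuery). `HAVING` does not work here, because it is evaluated before window functions.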
"Window functions are not just a feature; they represent a fundamental shift in how complex analytical problems can be elegantly solved in SQL. They empower analysts to ask deeper questions of their data without resorting to external tools."
— A leading data architect, SQL Conference 2023
5. Continuous Learning
The SQL standard for window functions is extensive, and different database systems (PostgreSQL, SQL Server, Oracle, MySQL 8+, BigQuery, Snowflake) may have subtle variations or extensions. Regularly consult your database's specific documentation.
Conclusion: Elevate Your SQL Analytics
You've journeyed through the intricate landscape of SQL Window Functions, from their fundamental concepts to advanced applications and critical performance optimizations. By mastering the `PARTITION BY` clause for defining logical groups, the `ORDER BY` clause for sequencing operations, and the nuanced `ROWS` and `RANGE` frame specifications, you are now equipped to tackle complex analytical challenges with unparalleled elegance and efficiency. The ability to perform calculations across related rows without collapsing your dataset is a game-changer, opening doors to deeper insights into trends, rankings, and sequential data patterns that were previously arduous to extract.
The insights gained from this comprehensive guide will not only empower you to write more powerful and efficient SQL queries but also position you as a top-tier data professional capable of extracting maximum value from your data. Start integrating these functions into your daily workflow, experiment with different combinations, and observe how your analytical capabilities expand. The future of data analysis demands a nuanced understanding of relational data, and window functions are your ultimate tool for unlocking that potential. Keep exploring, keep questioning, and let your data tell its full story.
Next Steps:
- Practice with your own datasets, focusing on `LAG/LEAD` for time-series analysis.
- Experiment with different frame specifications for aggregate functions to observe their impact.
- Benchmark your window function queries using `EXPLAIN` to identify and resolve performance bottlenecks.
Frequently Asked Questions
Q: What is the primary difference between `GROUP BY` and `PARTITION BY`?
A: `GROUP BY` aggregates rows into a single output row for each group, effectively collapsing the original data and losing individual row details. `PARTITION BY`, on the other hand, divides the result set into partitions for a window function's calculation, but it retains all original individual rows in the output, adding a calculated value to each. This allows for calculations over groups while preserving granularity.
Q: Can I use multiple `PARTITION BY` and `ORDER BY` clauses in a single `OVER()`?
A: No, you can only specify one `PARTITION BY` clause and one `ORDER BY` clause within a single `OVER()` definition. However, both clauses can contain multiple columns (e.g., `PARTITION BY col1, col2 ORDER BY col3 DESC, col4 ASC`). If you need different partitioning or ordering for different calculations, you'll need separate window function expressions or CTEs.
Q: When should I use `ROW_NUMBER()` versus `RANK()` versus `DENSE_RANK()`?
A: Use `ROW_NUMBER()` when you need a unique, sequential number for each row within a partition, regardless of ties. Use `RANK()` when you want to assign the same rank to tied rows and create gaps in the ranking sequence. Use `DENSE_RANK()` when you want to assign the same rank to tied rows but without creating gaps in the sequence, ensuring consecutive ranks.
Q: Is there a performance difference between `ROWS` and `RANGE` frame specifications?
A: Yes, there can be. `ROWS` typically involves a simpler physical counting mechanism and can be more performant, especially with fixed offsets (`N PRECEDING`). `RANGE` requires evaluating the actual values of the `ORDER BY` column and considering all peers, which can be computationally more intensive if many rows fall within a given range or if the `ORDER BY` column is complex or not indexed efficiently. Optimal performance often depends on the specific query and data distribution.
Q: What happens if I don't specify a frame (e.g., `ROWS` or `RANGE`)?
A: If an `ORDER BY` clause is present within `OVER()`, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. If no `ORDER BY` clause is present within `OVER()`, the default frame is `ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` (meaning the entire partition or entire result set if no `PARTITION BY`). Understanding these defaults is critical to avoid unexpected results.
Q: Can window functions be used in `WHERE` clauses?
A: No, window functions cannot be used directly in the `WHERE` clause, because they are evaluated *after* `WHERE`, `GROUP BY`, and `HAVING` have been applied. If you need to filter based on a window function's result, wrap the query in a subquery or a CTE and apply the filter in the outer query's `WHERE` clause. Some dialects, such as Snowflake and BigQuery, also provide a `QUALIFY` clause for this purpose.
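The wrap-and-filter pattern can be sketched as follows, using Python's standard `sqlite3` module (SQLite 3.25 or later assumed) and a hypothetical `product_sales` table. The first statement shows SQLite rejecting a window function in `WHERE`; other databases report the same restriction with their own error messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumes SQLite 3.25+ for window functions
conn.execute("CREATE TABLE product_sales (category TEXT, product TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO product_sales VALUES (?, ?, ?)",
    [  # hypothetical rows
        ("Books", "A", 50), ("Books", "B", 70),
        ("Toys", "C", 20), ("Toys", "D", 10), ("Toys", "E", 30),
    ],
)

# A window function directly in WHERE is rejected when the statement is prepared.
try:
    conn.execute(
        "SELECT product FROM product_sales "
        "WHERE RANK() OVER (PARTITION BY category ORDER BY sales DESC) <= 2"
    )
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# Compute the rank in a CTE, then filter on it in the outer query.
top_two = conn.execute(
    """
    WITH ranked AS (
        SELECT category, product,
               RANK() OVER (PARTITION BY category ORDER BY sales DESC) AS sales_rank
        FROM product_sales
    )
    SELECT category, product, sales_rank
    FROM ranked
    WHERE sales_rank <= 2
    ORDER BY category, sales_rank
    """
).fetchall()

print(where_rejected)  # True
print(top_two)
# [('Books', 'B', 1), ('Books', 'A', 2), ('Toys', 'E', 1), ('Toys', 'C', 2)]
```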
Q: Are window functions available in all SQL databases?
A: Most modern relational database management systems (RDBMS) support window functions. This includes PostgreSQL, SQL Server, Oracle, MySQL (version 8.0 and later), SQLite (version 3.25.0 and later), DB2, and cloud data warehouses like Snowflake, Google BigQuery, and Amazon Redshift. Support for specific functions or frame options might vary slightly, so always check your database's documentation.
Q: How do `LAG()` and `LEAD()` handle null values or boundary conditions?
A: `LAG()` and `LEAD()` take optional `offset` and `default` arguments. If the `offset` (default is 1) points beyond the partition boundary, the function returns `NULL` unless you supply a `default` (e.g., `LAG(column, 1, 0)` returns 0 when there is no preceding row). If the target row exists but its value is `NULL`, that `NULL` is returned as-is; the `default` does not replace it, so wrap the call in `COALESCE()` if you need a substitute.
Q: What is a "peer row" in the context of window functions?
A: A "peer row" is any row within the current partition that has exactly the same value(s) as the current row for the column(s) specified in the `ORDER BY` clause. The handling of peer rows is a key differentiator between `ROW_NUMBER()` and `RANK()`/`DENSE_RANK()`, and between `ROWS` and `RANGE` frame specifications.