Window Functions in SQL

Window Functions in SQL

In SQL, computations over a collection of rows that are connected to the current row are performed using Windows functions (also known as analytical functions). Running totals, moving averages, and other aggregate values within a set of data can be determined using these functions. Windows functions can be an effective tool for reporting and data analysis.

Here are some commonly used windows functions in SQL:

ROW_NUMBER(): assigns a unique number to each row within a result set.

RANK(): assigns a rank to each row based on the value of a specified column. Rows with the same value get the same rank, and the next rank is skipped.

DENSE_RANK(): assigns a rank to each row based on the value of a specified column. Rows with the same value get the same rank, and the next rank is not skipped.

NTILE(n): divides a result set into n groups of equal size, assigning a group number to each row.

LAG(column, n): returns the value of the column in the previous row, offset by n rows.

LEAD(column, n): returns the value of the column in the next row, offset by n rows.

SUM(), AVG(), MIN(), MAX(), COUNT(): these functions are used to calculate aggregate values across a set of rows, but with the addition of the OVER() clause, they can be used as window functions to calculate these values for a subset of rows.

Examples of using "Window Function" in SQL:

SELECT
salesperson,
sale_date,
sale_amount,
SUM(sale_amount) OVER (PARTITION BY salesperson ORDER BY sale_date) as running_total
FROM
sales_table;

The running total of sales for each salesperson is computed in this example using the SUM() function as a window function, and the results are sorted by sale date. Rows are grouped by salesperson using the PARTITION BY clause, and are sorted inside each partition using the ORDER BY clause.

Example 1: Using ROW_NUMBER()

Each row in a result set is given a different number by the ROW_NUMBER() function. Here is an illustration of how to use ROW_NUMBER() to rank each row according to the value of a particular column:

SELECT 
   product_id, 
   product_name, 
   unit_price, 
   ROW_NUMBER() OVER (ORDER BY unit_price DESC) AS rank 
FROM 
   products;

The product ID, name, unit price, and rank for each product are all included in the result set of this query, which is based on the unit price of each product. The result set will be arranged in descending order by unit price.

Example 2: Using SUM() with OVER()

As a window function, the SUM() function can be used to compute the running total of a given column across a collection of rows. Here is an illustration of how to compute the running total of sales for each salesperson using SUM() and OVER():

SELECT 
   salesperson, 
   sale_date, 
   sale_amount, 
   SUM(sale_amount) OVER (PARTITION BY salesperson ORDER BY sale_date) as running_total 
FROM 
   sales_table;

Each salesperson's running total of sales as well as the salesperson's name, the sale date, and the sale amount will all be included in the result set of this query. The result set will be divided into salesperson and sale date groups, respectively.

Example 3: Using LEAD() and LAG()

The values of a specific column in the subsequent or preceding row can be retrieved using the LEAD() and LAG() procedures, respectively. Here is an illustration of how to compute the change in sales from one month to the next using LEAD() and LAG():

SELECT 
   salesperson, 
   sale_date, 
   sale_amount, 
   LEAD(sale_amount) OVER (PARTITION BY salesperson ORDER BY sale_date) - sale_amount as change 
FROM 
   sales_table;

The salesperson, sale date, sale total, and month-to-month sales change are all included in the result set of this query. The result set will be divided into salesperson and sale date groups, respectively.