SQL Functions and Clauses

Aggregate Functions and Scalar Functions in SQL

In SQL, functions are used to perform operations on data. These functions can be broadly classified into two categories: Aggregate Functions and Scalar Functions. Below are the definitions, examples, and use cases for each.

1. Aggregate Functions

Aggregate functions are used to perform a calculation on a set of values and return a single result. These functions are typically used in conjunction with the GROUP BY clause to aggregate data into groups based on a common attribute.

Common Aggregate Functions:

a. COUNT()

Definition: The COUNT() function returns the total number of rows in a specified column or table. It can also be used to count rows that match certain conditions.
Example:

SELECT COUNT(employee_id) AS total_employees FROM employees;
This query returns the total number of employees in the employees table.

b. SUM()

Definition: The SUM() function calculates the total sum of a numeric column. It is used to add up values in a column.
Example:

SELECT department_id, SUM(salary) AS total_salary FROM employees GROUP BY department_id;
This query returns the total salary per department by summing the salary column for each department_id.

c. AVG()

Definition: The AVG() function calculates the average value of a numeric column.
Example:

SELECT department_id, AVG(salary) AS average_salary FROM employees GROUP BY department_id;
This query calculates the average salary for each department.

d. MIN()

Definition: The MIN() function returns the smallest value in a column.
Example:

SELECT MIN(salary) AS lowest_salary FROM employees;
This query returns the minimum salary from the employees table.

e. MAX()

Definition: The MAX() function returns the largest value in a column.
Example:

SELECT MAX(salary) AS highest_salary FROM employees;
This query returns the highest salary from the employees table.

2. Scalar Functions

Scalar functions operate on individual values and return a single result for each value. These functions are typically used to perform operations on a single value or column, such as string manipulation, mathematical calculations, or data transformations.

Common Scalar Functions:

a. UPPER()

Definition: The UPPER() function converts all the characters in a string to uppercase.
Example:

SELECT UPPER(customer_name) AS customer_name_upper FROM customers;
This query returns the customer_name in uppercase.

b. LOWER()

Definition: The LOWER() function converts all characters in a string to lowercase.
Example:

SELECT LOWER(employee_name) AS employee_name_lower FROM employees;
This query returns the employee_name in lowercase.

c. LEN() / LENGTH()

Definition: The LEN() or LENGTH() function returns the number of characters in a string (the length of the string).
Example:

SELECT LEN(employee_name) AS name_length FROM employees;
This query returns the length of the employee name.

d. ROUND()

Definition: The ROUND() function rounds a numeric value to a specified number of decimal places.
Example:

SELECT ROUND(salary, 2) AS rounded_salary FROM employees;
This query rounds the salary values to two decimal places.

e. CONCAT()

Definition: The CONCAT() function concatenates two or more strings into one string.
Example:

SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
This query combines the first_name and last_name columns into a full_name column.

Key Differences Between Aggregate and Scalar Functions

Aggregate Functions	Scalar Functions
Operate on a set of rows and return a single result.	Operate on individual values and return a single result for each.
Typically used with GROUP BY to aggregate data.	Operate on individual column values or expressions.
Example: COUNT(), SUM(), AVG(), MIN(), MAX()	Example: UPPER(), LEN(), ROUND(), CONCAT(), NOW()
Returns one result for each group of rows.	Returns one result for each individual row or value.

GROUP BY and HAVING Clause in SQL

In SQL, the GROUP BY and HAVING clauses are used together to group rows that have the same values in specified columns and to filter the results based on some conditions. Here's a detailed explanation of each, including definitions, examples, and use cases.

1. GROUP BY Clause

Definition:

The GROUP BY clause is used to arrange identical data into groups. It is typically used with aggregate functions (such as COUNT(), SUM(), AVG(), etc.) to perform operations on each group of rows rather than the entire table.

How It Works:

It groups the rows that have the same values in one or more columns.
Once the rows are grouped, you can apply aggregate functions like COUNT(), SUM(), AVG(), MIN(), and MAX() to the groups.

Syntax:

SELECT column1, column2, aggregate_function(column3) FROM table_name GROUP BY column1, column2;

Example:

Let’s say we have a table named orders with the following columns: order_id, customer_id, order_date, total_amount.
To find the total sales for each customer, we can group the data by customer_id and sum the total_amount for each customer.
SELECT customer_id, SUM(total_amount) AS total_sales FROM orders GROUP BY customer_id;
This query groups the rows by customer_id and then calculates the total sales (SUM(total_amount)) for each customer.

2. HAVING Clause

Definition:

The HAVING clause is used to filter groups of rows created by the GROUP BY clause based on a condition. It works similarly to the WHERE clause, but WHERE is used to filter rows before they are grouped, while HAVING filters after the grouping.

How It Works:

The HAVING clause is applied after the GROUP BY operation has been performed and is used to filter the groups based on the results of aggregate functions or other conditions.
Important: You cannot use aggregate functions in the WHERE clause (e.g., SUM(), AVG()), but you can use them in the HAVING clause.

Syntax:

SELECT column1, aggregate_function(column2) FROM table_name GROUP BY column1 HAVING aggregate_function(column2) condition;

Example:

To find customers who have spent more than $500 in total, we can use the HAVING clause to filter the result of the grouped data:
SELECT customer_id, SUM(total_amount) AS total_sales FROM orders GROUP BY customer_id HAVING SUM(total_amount) > 500;
This query first groups the orders by customer_id, calculates the total sales for each customer, and then filters to include only those customers whose total sales exceed $500.

Key Differences Between WHERE and HAVING

WHERE Clause	Having Clause
Filters rows before grouping.	Filters groups after they are created.
Used to filter individual rows based on conditions.	Used to filter grouped data based on aggregate functions.
Can use regular conditions (e.g., column values).	Can use conditions based on aggregate functions (e.g., SUM(), COUNT()).
Comes before GROUP BY.	Comes after GROUP BY.

GROUP BY Using ROLLUP and CUBE Operators in SQL

In SQL, ROLLUP and CUBE are extensions of the GROUP BY clause used for generating multiple levels of summary data in one query. They are part of the "Subtotals" functionality, which allows for aggregating data across multiple combinations of grouped columns. Both operators are used to produce subtotals (rollups) and cross-tabulations (cubes) in the result set.

1. ROLLUP Operator

Definition:

The ROLLUP operator is used to calculate subtotals for hierarchical data. It generates a result set that includes the aggregation of the grouped data, as well as the subtotals for higher-level groupings. The ROLLUP operator is typically used when you have a hierarchy of data and need to summarize it at different levels.

How It Works:

ROLLUP works by generating aggregates for each combination of columns from the rightmost column to the leftmost column.
It first groups by all the specified columns, then progressively drops the rightmost column to generate subtotals for higher levels of aggregation.

Syntax:

SELECT column1, column2, aggregate_function(column3) FROM table_name GROUP BY column1, column2 WITH ROLLUP;

Example:

Let’s consider a table sales with the following columns: region, product, sales_amount.
To get the total sales by region and product, along with subtotals for each region (using ROLLUP), we can write the following query:
SELECT region, product, SUM(sales_amount) AS total_sales FROM sales GROUP BY region, product WITH ROLLUP;

Explanation:

The GROUP BY region, product groups the data by region and product.
The WITH ROLLUP operator will produce subtotal rows for each region (by combining all products in that region) and an overall total (summed over all regions and products).

The NULL values indicate subtotal or grand total rows.

2. CUBE Operator

Definition:

The CUBE operator is a more advanced version of ROLLUP that calculates subtotals for all combinations of the specified columns. It generates a result set with all possible combinations of the specified columns, including the totals for each individual column and combinations of columns.

How It Works:

CUBE generates aggregates for all combinations of the selected columns. It returns a result set that includes subtotals for all groupings, not just a hierarchical subset.
The operator is particularly useful when you want to generate a cross-tabulation of the data.

Syntax:

SELECT column1, column2, aggregate_function(column3) FROM table_name GROUP BY column1, column2 WITH CUBE;

Example:

Let’s continue with the sales table (with region, product, and sales_amount).
To calculate total sales by region and product, along with all possible subtotals (using CUBE), you can write:
SELECT region, product, SUM(sales_amount) AS total_sales FROM sales GROUP BY region, product WITH CUBE;

Explanation:

The WITH CUBE operator will generate the following:

Total sales by region and product.
Subtotals by region and product.
Subtotals by region.
Subtotals by product.
The overall total (grand total).

Explanation of NULL:
- The NULL values represent combinations that aggregate the data at different levels:
  - The NULL for product in a row means that total sales are calculated for all products in that region.
  - The NULL for region in a row means that total sales are calculated for all regions for a particular product.
  - The row where both columns are NULL represents the grand total.

Key Differences Between ROLLUP and CUBE

ROLLUP	CUBE
Generates subtotals along a hierarchical path.	Generates subtotals for all combinations of the grouped columns.
Produces subtotals for a specified hierarchy.	Produces subtotals for every possible combination of columns.
Ideal when dealing with hierarchical data.	Ideal for cross-tabulation of data (comprehensive subtotals).
Includes fewer rows (fewer subtotals).	Includes more rows (all combinations and totals).

Power BI Explained: Easy-to-Follow Guide for Data Analysis and Reporting