Improve SQL Efficiency: Tips for Indexing and Performance Optimization
In this article I’m going to be talking about indexing and performance optimization when using SQL.
SQL Indexing
You can think of indexing as a table of contents in a book. It helps the database to find data faster. If there are no indexes then the database has to scan each row in a table to find the required data, which is called a full table scan. With an index, the database jumps directly to the needed rows, making queries much faster.
There are several types of indexes,
Primary index - Automatically created on primary key columns.
Composite index - An index on multiple columns
Unique index - Ensures all values in a column are unique.
Full text index - Speeds up searching within large text fields.
For this article, we are going to work with the employees table we created a few articles ago. Now there are a few functions that we can work on using indexes. The first is checking for any existing indexes.
show index from employees;
Now as you know, every table will have at least one index because a primary index will be automatically created on primary key columns.
Lets see how we can create a new index,
create index idx_name on employees(name);
This will create an index for the name column in employees table. Lets create a composite index which is an index created for multiple columns,
create index idx_dpt_sal on employees(department, salary);
This will speed up queries that that filter by both department and salary.
You can also remove indexes with the below query,
drop index idx_name on employees;
This will remove the index that we set on the name column of the employees table.
Performance Optimization
There are a few things that we can do to speed up query results and optimize our database performance. We are going to look at a few of such practices in this section. One such practice is to avoid SELECT * as much as possible. You should always determine which data you want to retrieve and include the column names in the SELECT clause. Fetching only the required columns will improve performance.
/*What to avoid*/
SELECT * from employees;
/*Optimized way*/
SELECT name, age from employees;
Another tip is to use indexing wisely. You should create indexes for frequently used column in WHERE, JOIN and ORDER BY. Avoid indexing all the columns as it will slow down INSERT, UPDATE and DELETE statements.
Another way to optimize performance is to use the EXPLAIN clause. You can use it to find out how a query is executed so you can determine which parts are slow, and try to optimize that part to speed up the query.
EXPLAIN SELECT name FROM employees WHERE salary > 5000;
Another tip is to optimize JOINS. You can use indexes on your foreign keys for faster joins, and also use INNER JOIN instead of LEFT JOIN or RIGHT JOIN as much as possible if you don’t need unmatched rows.
If you are working with a very large dataset, you can use the LIMIT clause to limit the number of rows fetched so you don’t need to overload the database every time you run a query.
SELECT name, age from employees LIMIT 1000;
This will only fetch the first 1000 rows from the table.
A similar approach is to partition larger tables to smaller tables for faster queries. This creates more manageable portions of data and ultimately enables better performance.
Lets do a few exercises on what we discussed in todays article,
- Create a table called customers with columns ID, name, email, city and phone number. Create an index for the column email.
Optimize this query: SELECT * FROM orders WHERE status = 'Shipped';
SELECT ORDERID FROM orders WHERE status = 'Shipped';
Check existing indexes on the products table
show index from products;
Analyze this query with EXPLAIN: SELECT name FROM employees WHERE department = 'IT';
EXPLAIN SELECT name FROM employees WHERE department = 'IT';