MySQL performance optimization

Table design optimization

  •  Data type selection: Use the smallest appropriate data type (e.g., store an IPv4 address as an unsigned INT via INET_ATON()/INET_NTOA() rather than as a VARCHAR).
  •  Avoid ENUM whenever possible; prefer VARCHAR or TINYINT + a dictionary table instead. Adding a new enumeration value requires ALTER TABLE, which locks large tables and impacts online services. Furthermore, ENUM sorts by internal number rather than alphabetically, making it error-prone.
  •  Choosing an appropriate character storage length not only saves database table space and index storage, but more importantly, it improves retrieval speed.
  •  Hot/cold separation: Move rarely accessed (cold) fields into a separate table so the frequently accessed (hot) table stays small.
  •  Large field splitting: Split large fields such as BLOB and TEXT into independent tables to reduce I/O on the main table.
  • Do not store large data such as images and files in the database. Large files and images should be stored in the file system.
  •  Define columns as NOT NULL whenever possible (depending on the specific application). NULL values require additional storage and processing in the index (via null bitmaps), and expressions involving NULL are more complex to evaluate, potentially rendering the index ineffective.
  •  Use TIMESTAMP (4 bytes) or DATETIME (8 bytes) to store times instead of strings, which take more storage space and cannot be used in date-function calculations and comparisons.
  •  Storage engine selection: InnoDB is preferred (it supports transactions, row-level locking, and crash recovery). MyISAM is suitable for read-heavy, write-light scenarios, but it does not support transactions.
  •  Denormalization: Use appropriate redundant fields (such as province, city, and county) to reduce JOIN operations.
  •  Use partitioned tables with caution: A partitioned table is multiple files physically but a single table logically, and cross-partition queries are inefficient. It is usually better to manage large datasets with physically separate tables (sharding).
  •  The number of columns in a single table should ideally be fewer than 50; beyond that, consider vertical table partitioning.
  •  Keep each table below roughly 5 million rows; beyond that, use historical-data archiving or database sharding. (5 million rows is not a hard MySQL limit, but an oversized single table makes schema changes, backup, and recovery very difficult.)
  •  Reserved fields are prohibited in tables. It is difficult to name reserved fields in a way that is self-explanatory, and the data type stored in a reserved field cannot be determined, making it impossible to choose an appropriate type. Modifying the type of a reserved field will lock the table.
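
As an illustrative sketch (hypothetical table and column names, assumed for this example), a DDL that follows several of the guidelines above — smallest suitable types, NOT NULL, an unsigned INT for the IPv4 address, and InnoDB:

CREATE TABLE user (
  id          BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
  name        VARCHAR(64)      NOT NULL DEFAULT '',
  status      TINYINT UNSIGNED NOT NULL DEFAULT 0,  -- dictionary table instead of ENUM
  last_ip     INT UNSIGNED     NOT NULL DEFAULT 0,  -- store via INET_ATON()
  created_at  DATETIME         NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;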

SQL optimization

  •  Enable slow log analysis to locate SQL queries that take a long time to execute.
-- Enable the slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- Log queries taking longer than 1 second
  •  Avoid full table scans : Use EXPLAIN to analyze execution plans and ensure queries use indexes. The values of the `type` column are roughly ordered from best to worst:

system → const → eq_ref → ref → range → index → ALL. SQL optimization should reach at least the `range` level, ideally `ref` or `const`. The main optimization tool is indexes, which are discussed in detail in the “Index Optimization” section below.
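
For example (hypothetical orders table with an index idx_user_id on user_id, both assumed for illustration), run EXPLAIN and check the type, key, and rows columns:

EXPLAIN SELECT order_no, amount
FROM orders
WHERE user_id = 42; -- expect type=ref and key=idx_user_id, not type=ALL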

  •  Avoid SELECT * : Select only the fields you need to reduce unnecessary data transfer.
  •  Pagination optimization : For deep pagination, replace LIMIT offset, size with WHERE id > <maximum ID of the previous page>.
-- Inefficient (poor performance at offset = 1 million)
SELECT * FROM logs LIMIT 1000000, 20;
-- Efficient (uses the ID of the last row on the previous page)
SELECT * FROM logs WHERE id > 1000000 LIMIT 20;
  •  Batch operations : Insert/update data in batches to reduce network round trips.
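A minimal sketch (hypothetical logs table): one multi-row INSERT replaces many single-row statements and their round trips:

-- One round trip instead of three
INSERT INTO logs (user_id, action, created_at) VALUES
  (1, 'login',  NOW()),
  (2, 'click',  NOW()),
  (3, 'logout', NOW());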
  •  Batch reading : When retrieving a large amount of data, read it in batches to reduce the size of each request: fetch fewer than 2,000 rows at a time, and keep each result set under 1 MB.
  •  Do not use `order by rand()` : `order by rand()` adds a pseudo column to the table, computes `rand()` for every row, and sorts by that value. This usually creates an on-disk temporary table, so it is very inefficient and cannot use indexes.
  •  Use JOIN instead of subqueries : the smaller table drives the larger table, ensuring that the joined fields are indexed. Because subqueries (especially correlated subqueries) can lead to repeated execution, difficulty in utilizing indexes, and the optimizer may choose an inefficient execution plan, rewriting them as JOINs can better utilize indexes, support efficient join algorithms, and result in a better execution plan and more stable performance.
  •  When performing multi-table join queries, it is essential to ensure that the joined fields have indexes.
  •  Avoid JOIN operations involving more than three tables : JOIN operations involving multiple tables significantly increase query complexity, lock contention, and the difficulty for the optimizer to select an execution plan, easily leading to a sharp drop in performance and making maintenance difficult. This should be avoided through application-layer splitting or redundant design.
  •  Avoid foreign keys and cascading updates : Resolve all foreign-key concepts at the application layer. Take students and grades as an example: if `student_id` is the primary key of the student table, then `student_id` in the grade table is a foreign key; if updating `student_id` in the student table also triggers an update in the grade table, that is a cascading update. Foreign keys and cascading updates suit low-concurrency single-machine environments, not distributed high-concurrency clusters; cascading updates are highly blocking and risk update storms, and foreign keys also slow down inserts.
  •  Using a covering index for queries : This means the index contains the fields needed for the query, which can avoid table lookups.
  •  Break down complex large SQL queries into multiple smaller SQL queries :

| Aspect | Disadvantages of large SQL | Advantages of splitting into smaller SQL |
| --- | --- | --- |
| Lock contention | Holds locks for a long time, blocking other operations | Smaller lock granularity and shorter lock time reduce blocking |
| Execution plan complexity | The optimizer struggles to choose the optimal path and may pick the wrong index | Each small statement is simple, so plans are more stable |
| Memory and resource consumption | Temporary tables, sorting, and joins can consume a lot of memory | Resource consumption is spread out, reducing peak pressure |
| Readability and maintainability | Hard to understand, debug, and modify | Clear logic, easy to test and reuse |
| Cache friendliness | Results cannot be cached, or the hit rate is low | Small results can be reused in Redis or a local cache |
| Fault tolerance and degradation | One failure breaks the whole query | Partial failures can be degraded gracefully |
| Scalability | Hard to process in parallel or asynchronously | Supports asynchronous, concurrent, and paginated loading |
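A common concrete case (hypothetical logs table): replace one huge DELETE, which holds locks for a long time and produces a massive binlog, with small batches repeated until no rows remain:

-- Instead of: DELETE FROM logs WHERE created_at < '2023-01-01';
-- loop this until it affects 0 rows:
DELETE FROM logs WHERE created_at < '2023-01-01' LIMIT 5000;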
  •  Avoid performing calculations in the database as much as possible; complex calculations should be moved to the business application.
  •  Large SQL statements, large transactions, and large batch operations should be avoided and instead handled at the business level. Large batch operations can cause significant master-slave latency, and binlog logs in row format will generate a large amount of data.
  •  Avoid using stored procedures, triggers, functions, etc., as they can easily cause business logic to be coupled with the database and affect database service performance.
  •  Prefer UNION ALL over UNION unless you explicitly need deduplication. UNION ALL simply concatenates the result sets, keeping all rows including duplicates; UNION must deduplicate, which requires sorting or hashing and thus incurs significant memory and CPU overhead.
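For example (hypothetical customers and suppliers tables):

-- Deduplicates, paying a sort/hash cost
SELECT city FROM customers UNION SELECT city FROM suppliers;
-- Simply concatenates, keeping duplicates; cheaper
SELECT city FROM customers UNION ALL SELECT city FROM suppliers;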

Index usage optimization

In MySQL, indexes are crucial for improving query efficiency. However, not all scenarios are suitable for creating indexes, as they consume additional storage space and may slow down write operations. Proper use of indexes can significantly improve database performance, while improper use can lead to performance degradation.

Scenarios for creating indexes

Scenarios where indexing is typically required:

  •  Create a primary key index on the column that serves as the primary key to enforce the uniqueness of that column and organize the arrangement of data in the table.
  •  Creating a unique index on columns that require uniqueness constraints can ensure data integrity.
  •  Creating indexes on columns that are frequently used as query criteria (such as in the WHERE clause) can speed up searches.
  •  Creating indexes on columns frequently used in JOIN conditions (typically foreign-key columns) can speed up the join process.
  •  Create indexes on columns that are frequently searched based on ranges (<, <=, =, >, >=, BETWEEN, IN, LIKE that does not start with a wildcard) because the indexes are already sorted and the ranges they specify are contiguous.
  •  Create indexes on columns that frequently require sorting (ORDER BY) and grouping (GROUP BY). Since the indexes are already sorted, queries can utilize the index’s sorting properties, thus speeding up sort query times.
  •  Use composite indexes based on your business scenario: When a query requires multiple columns, consider creating a composite index, but be aware of the leftmost matching principle. If an index contains all the columns required by the current query, data can be retrieved directly from the index without accessing the actual table; this type of index is called a covering index. Using covering indexes can also significantly improve query performance.
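
A sketch under assumed names (hypothetical orders table): a composite index that follows the leftmost-matching principle and also covers a frequent query:

-- Serves WHERE user_id = ? AND status = ? and, with created_at included,
-- covers the query below without a table lookup
CREATE INDEX idx_user_status_created ON orders (user_id, status, created_at);

SELECT created_at FROM orders WHERE user_id = 42 AND status = 1;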

Scenarios where index creation requires caution:

  •  For columns containing a large number of duplicate values, indexes are not very effective and may even reduce query performance.
  •  For very small tables (such as those with a few hundred records or less), a full table scan is usually faster than using an index because indexes also require storage space and maintaining them incurs additional overhead.
  •  Columns that are frequently updated should be indexed with caution, as indexes can increase the time spent on update operations, since the index needs to be updated synchronously with each update.
  •  Indexes should not be created for columns that are rarely used or referenced in queries, as maintaining unused indexes incurs additional overhead.
  •  Regular indexes should not be created on columns of TEXT, BLOB, or BIT types, because these columns either hold large amounts of data or have very few distinct values.

Index design

  •  Indexed fields should ideally be numeric (simple data types): Avoid designing fields containing only numeric information as character types, as this will reduce query and join performance and increase storage overhead. This is because the storage engine compares each character in a string one by one when processing queries and joins, while numeric types only need to be compared once.
  •  Avoid NULL as a column default: Columns containing NULL values are difficult to optimize because NULL is handled differently in the index structure than non-NULL values, making index storage, index statistics, and comparison operations more complex and possibly causing additional scans.
  •  Use unique indexes whenever possible to improve query efficiency.
  •  Prioritize columns with high selectivity (selectivity = distinct values / total rows). Higher selectivity means a more efficient index. For example, high-selectivity columns such as user email and mobile phone number are ideal candidates for indexing; avoid standalone indexes on low-selectivity columns such as gender or a 0/1 status.
  • Prefix index:
    •  When indexing a VARCHAR field, specify an index length rather than indexing the whole field; choose the length based on the actual distinguishability of the text. Index length and distinguishability pull in opposite directions; for typical string data, a prefix of length 20 often achieves over 90% distinguishability. Use count(distinct left(column, length)) / count(*) to measure the distinguishability of a candidate prefix length.
    •  When indexing strings, specify a prefix length whenever possible. Prefix indexes are an efficient way to make indexes smaller and faster: choose a prefix long enough for high selectivity, but no longer (to save space). Note that MySQL cannot use prefix indexes for ORDER BY, GROUP BY, or covering scans.
    •  For columns of type BLOB, TEXT, or very long VARCHAR, prefix indexes must be used because MySQL does not allow indexing the full length of these columns.
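A sketch of picking and creating a prefix index, using the formula above (hypothetical users.email column):

-- Compare distinguishability at candidate prefix lengths
SELECT COUNT(DISTINCT LEFT(email, 10)) / COUNT(*) AS sel_10,
       COUNT(DISTINCT LEFT(email, 20)) / COUNT(*) AS sel_20
FROM users;
-- If sel_20 is high enough (e.g. > 0.9), index only the first 20 characters
ALTER TABLE users ADD INDEX idx_email_prefix (email(20));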
  •  For searching large text, use a full-text index instead of LIKE '%…%'. That said, full-text search is not really MySQL’s forte; a dedicated search engine is better for retrieving large text.
  •  Using composite indexes instead of multiple single-column indexes allows for efficient support of conditional and range queries involving multiple columns in a single index lookup. When using multiple single-column indexes, MySQL may need to perform an index merge, combining the results from different indexes, which is less efficient than a single composite index query.
  •  When using a composite index, the column with the highest selectivity should be placed first to maximize the index’s filtering capability.
  •  Regularly check for duplicate/redundant and unused indexes: MySQL allows multiple indexes on the same column, whether intentional or not. In most cases redundant indexes are unnecessary, and every index adds overhead to writes (INSERT/UPDATE/DELETE). Regularly inspect indexes with SHOW INDEX FROM table_name, analyze usage with EXPLAIN, and drop unused or redundant indexes.
  •  As applications evolve, database usage patterns may change, making it necessary to periodically review and adjust indexing strategies.
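A minimal sketch of this periodic review (table name assumed; the sys views require MySQL 5.7+):

SHOW INDEX FROM orders;                      -- list all indexes and their columns
SELECT * FROM sys.schema_unused_indexes;     -- indexes with no recorded usage
SELECT * FROM sys.schema_redundant_indexes;  -- indexes made redundant by others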

The correct way to use indexes

  •  Analyze the SQL statement execution plan: Before shipping critical queries, use EXPLAIN to check whether the expected indexes are used. Pay attention to fields such as type (preferably const/ref; avoid ALL, a full table scan), key (the index actually used), and rows (the number of rows scanned).
  • Use IN and OR sensibly:
    •  IN can usually use indexes (especially for small value lists).
    •  Multiple OR conditions may trigger an index merge, which is not necessarily efficient; consider rewriting with UNION or refactoring the query.
  •  Try to avoid !=, <>, and NOT IN, as these negative conditions usually cause the engine to abandon the index and perform a full table scan.
  •  Queries using composite indexes must adhere to the “leftmost matching principle,” otherwise the composite index will become ineffective. The leftmost matching principle requires that queries must start from the leftmost column of the index and cannot skip any intermediate index columns.
KEY a_b_c(a, b, c) 

-- conforms to the leftmost matching principle:
WHERE a=? and b=? and c=?
WHERE a=? and b=?
WHERE a=?

-- does not conform to the leftmost matching principle
WHERE b=? and c=? -- does not start from index a
WHERE b=? -- does not start from index a
WHERE c=? -- does not start from index a
WHERE a=? and c=? -- skips index b
  •  When sorting with a composite index, ORDER BY must also follow the “leftmost matching principle”: the columns in ORDER BY must appear in the same order as in the index, and all must sort in the same direction (all ascending or all descending).
KEY a_b_c(a, b, c) 

-- ORDER BY can use the leftmost prefix of the index:
ORDER BY a
ORDER BY a,b
ORDER BY a,b,c
ORDER BY a DESC, b DESC, c DESC

-- If WHERE fixes the leftmost prefix of the index to constants, ORDER BY can use the composite index.
WHERE a=const ORDER BY b,c
WHERE a=const AND b=const ORDER BY c
WHERE a=const AND b>const ORDER BY b,c

-- Cannot use the index for sorting
ORDER BY a ASC, b DESC, c DESC -- inconsistent sort directions
ORDER BY b,c -- missing leading column a
WHERE a=const ORDER BY c -- skips composite index column b
WHERE a=const ORDER BY a,d -- d is not part of the composite index
  •  Avoid performing function operations on indexed columns as much as possible, otherwise the index will become ineffective.
SELECT * FROM users WHERE YEAR(create_time) = 2023; -- Index invalid
-- changed to:
SELECT * FROM users WHERE create_time >= '2023-01-01' AND create_time < '2024-01-01';
  •  Avoid performing calculations or expression operations on indexed columns as much as possible, otherwise the index will become ineffective.
SELECT * FROM users WHERE age + 10 = 30; -- Index invalid
--changed to:
SELECT * FROM users WHERE age = 20;
  •  Prevent implicit type conversions caused by mismatched field types from rendering the index ineffective.
-- If id is a string column but the query passes a number, implicit conversion invalidates the index
SELECT * FROM users WHERE id = 123; -- Index invalid
-- changed to:
SELECT * FROM users WHERE id = '123'; -- Index can be used
  •  Avoid LIKE searches that start with a wildcard whenever possible. If you genuinely need heavy leading-wildcard or full fuzzy matching, use a dedicated search engine instead. B-tree index files have the leftmost-prefix matching property; if the leading characters are not fixed, the index cannot be used.
SELECT * FROM users WHERE name LIKE 'admin%'; -- Index can be used
SELECT * FROM users WHERE name LIKE '%admin'; -- Index invalid
  •  OR conditions: With OR, if the column in one branch has an index but the column in the other does not, no index is used at all. Rewrite with UNION ALL to achieve the same result:
-- Assuming num1 has an index and num2 has no index
SELECT * FROM t WHERE num1=10 OR num2=20; -- Num1 index may be invalid
-- changed to:
SELECT * FROM t WHERE num1=10 
UNION ALL 
SELECT * FROM t WHERE num2=20; -- Num1 index takes effect

Lock usage optimization

Optimizing lock performance in MySQL is crucial for improving overall database performance, especially in high-concurrency environments.

  •  Use the appropriate storage engine : Compared to MyISAM’s table-level locking mechanism, InnoDB offers row-level locking and Multi-Version Concurrency Control (MVCC), which makes it perform better when handling a large number of read and write operations. Therefore, InnoDB is recommended in most cases.
  •  Choose the appropriate transaction isolation level : Different isolation levels correspond to different locking strategies, and choosing appropriately balances consistency against concurrency. For example, the lower READ COMMITTED isolation level reduces lock holding time and scope, at the cost of some data consistency guarantees.
  •  Utilize indexes whenever possible : Ensure that the columns used in your query conditions have appropriate indexes. This reduces the amount of data locked because InnoDB only locks rows that meet the conditions, not the entire table. Avoid full table scans, as they can lock a large number of rows, increasing the likelihood of deadlocks.
  •  Shorten transaction length : Minimizing the duration of transactions reduces the probability of lock contention. Ensure that transactions only begin when necessary and commit or roll back as soon as possible.
  •  Batch operation transactions : Combine multiple related small operation transactions into a larger operation transaction for execution, thereby reducing the number of times locks need to be acquired.
  •  Acquire locks of sufficient level at once : When explicitly locking a recordset, it’s best to request locks of sufficient level at once. For example, if you need to modify data, it’s better to directly request an exclusive lock rather than first requesting a shared lock and then requesting an exclusive lock during modification, as this can easily lead to deadlocks. However, do not request locks exceeding the actual required level.
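A sketch of requesting the exclusive lock up front, per the bullet above (hypothetical accounts table):

START TRANSACTION;
-- Take the exclusive lock immediately instead of a shared lock upgraded later
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;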
  •  Unified operation order : When different programs access the same set of tables, they should try to agree on the same order to access each table. For a table, the rows in the table should be accessed in a fixed order as much as possible. This can greatly reduce the probability of deadlock.
  •  Application layer optimization : Reduce lock contention for the same resource by redesigning application logic. For example, adjust business processes to distribute access to frequently accessed data.
  • Configuration parameter adjustment:
    •  Set a reasonable innodb_lock_wait_timeout to prevent sessions from blocking for a long time while waiting for locks.
    •  Enable innodb_print_all_deadlocks to help monitor and analyze all deadlocks in the production environment.
  •  Command-line diagnostics : Periodically use SHOW STATUS to inspect the current lock status, lock contention, and so on.
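For example, deadlock and lock-wait information can be inspected like this (the performance_schema table assumes MySQL 8.0; the sys view requires 5.7+):

SHOW ENGINE INNODB STATUS;                        -- includes the latest detected deadlock
SELECT * FROM sys.innodb_lock_waits;              -- who is blocking whom
SELECT * FROM performance_schema.data_lock_waits; -- raw lock-wait rows (MySQL 8.0)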

MySQL Server Optimization

  •  Connection count and thread optimization : Adjust max_connections to cap the number of simultaneous client connections the server accepts, preventing resource exhaustion. Adjust thread_cache_size to control how many idle threads are cached for reuse, reducing the overhead of repeatedly creating and destroying threads.
  •  Connection timeout configuration optimization : Adjust wait_timeout (non-interactive connections, such as application connection pools and scripts) and interactive_timeout (interactive connections, such as the MySQL command-line client) so that connections idle beyond the set time are proactively released, preventing connection exhaustion. Recommended settings are 600 (10 minutes) or 1800 (30 minutes) seconds.
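A hedged sketch of inspecting and adjusting these settings (values are examples, not prescriptions):

SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE '%timeout%';
SET GLOBAL max_connections     = 1000; -- size to your workload
SET GLOBAL wait_timeout        = 1800; -- 30 minutes
SET GLOBAL interactive_timeout = 1800;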
  •  InnoDB storage engine configuration optimization :
| Category | Configuration item | Recommended value (for reference) | Notes |
| --- | --- | --- | --- |
| Memory management | innodb_buffer_pool_size (size of the data and index buffer) | 70%~80% of physical memory | Caches data pages and index pages; the single most critical performance parameter. |
| Transaction log | innodb_log_file_size (redo log size) | 1~2 GB (multiple files) | Redo log file size affects write performance and crash-recovery time. |
| | innodb_log_buffer_size (redo log write buffer size) | 64 MB~256 MB | |
| Flush strategy | innodb_flush_log_at_trx_commit (redo log flush frequency) | 1 (flush on every commit, the safest) or 2 (write to the OS cache and flush every second, balancing safety and performance) | Controls how the redo log is flushed to disk on commit; directly affects data safety and performance. |
| I/O concurrency | innodb_io_capacity / innodb_io_capacity_max (disk I/O capacity hints) | Standard SSD: 2000 / 4000 | Tells InnoDB the disk’s I/O capability to pace background dirty-page flushing and keep checkpoints from lagging, avoiding write amplification and IOPS bottlenecks. |
| Concurrency control | innodb_thread_concurrency (thread concurrency limit) | 0 (no limit; managed automatically by InnoDB) or 1~2× the number of CPU cores | Limits concurrent threads to prevent thread contention from causing context-switching overhead. |
| I/O threads | innodb_read_io_threads / innodb_write_io_threads | 4~8 (raise for high concurrency) | |
| Deadlocks and locks | innodb_deadlock_detect (deadlock detection switch) | ON (default) | Automatic deadlock detection. |
| | innodb_lock_wait_timeout (lock wait timeout) | 50~120 seconds | |
| Other key items | innodb_flush_method (file system flush method) | O_DIRECT (on Linux, bypasses the OS cache, avoiding double caching and dirty-page conflicts) | Controls how data files and log files are flushed to disk. |
| | sync_binlog (how often the MySQL Server layer flushes the binlog) | 1 (flush on every transaction commit; safest but slower) or 1000 (flush every 1000 commits; faster but may lose log entries) | Controls binlog synchronization at the Server layer; matters for data recovery and master-slave replication. |
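
A minimal sketch of checking and adjusting a few of these online (only some variables are dynamic; innodb_log_file_size, for instance, requires editing my.cnf and restarting):

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- The buffer pool is resizable online in MySQL 5.7.5+
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024; -- e.g. 8 GB
SET GLOBAL innodb_flush_log_at_trx_commit = 2;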

Hardware optimization :

  •  Use SSDs instead of HDDs to improve I/O performance.
  •  Increase memory to expand the buffer pool and reduce disk access.

MySQL Server tuning principles:

  •  Monitor first, then optimize : Use SHOW ENGINE INNODB STATUS, performance_schema, and the slow query log to analyze bottlenecks.
  •  Trade-offs based on scenario : security vs. performance, OLTP vs. OLAP.
  •  Adjust gradually : Change only one parameter at a time and observe the effect.
  •  Backup and Testing : Verify in a test environment before production changes.

MySQL architecture performance optimization

  •  Master-slave replication and read-write separation : The master database handles write operations, while the slave databases share the read load. Automatic routing can be implemented using middleware (such as MyCat or ShardingSphere).
  • Database sharding and table partitioning :
    • Tables :
      •  Vertical partitioning : Splits a wide table with many fields into multiple smaller tables, each containing a subset of the fields, avoiding the overhead of excessively wide rows.
      •  Horizontal partitioning : This solves the problem of insufficient single-table capacity by splitting a large table into multiple smaller tables, each containing only a subset of rows from the original table. Horizontal partitioning is recommended only for tables exceeding 5 million rows or 2GB in size.
    •  Database sharding : This solves server performance issues; multiple databases can enhance concurrent processing capabilities.

Application layer optimization

  •  Caching strategy : Use Redis or similar caches to cache frequently queried results and reduce database pressure.
  •  Connection pool management : Use HikariCP or Druid and control the pool size (a common starting point is CPU cores × 2 + 1).
  •  Connection timeout management : Properly controlling application-layer connection timeouts is crucial for system stability and for preventing database resource exhaustion. Set timeout thresholds appropriate to the business scenario so that long queries or network issues do not cause connections to pile up and run out, leading to denial of service. If no timeout is set at the application layer, stuck queries keep their connections occupied until the database saturates (max_connections reached), triggering cascading failures.
| Timeout type | Description | Recommended value (for reference) |
| --- | --- | --- |
| connectTimeout | Timeout for establishing the TCP connection | 5~10 seconds |
| socketTimeout / readTimeout | Timeout for reading data (query execution) | 1~30 seconds (per service) |
| connectionLifetime | Maximum connection lifetime (prevents long-lived connections from aging) | 30~60 minutes |
| wait_timeout (coordinated on the database side) | How long a connection may stay idle before the server closes it | 600~1800 seconds |
jdbc:mysql://host:3306/dbname?
  connectTimeout=10000        &  // 10 seconds
  socketTimeout=30000         &  // 30 seconds
  autoReconnect=false         &  // Automatic reconnection is not recommended
  maxLifetime=1800000           // 30 minutes (connection pool control)