Redis Performance Optimization Guide

I. Insufficient Memory: Redis’s “Capacity Ceiling” Problem

Redis, as an in-memory database, keeps all data in RAM. Once memory is full, it either triggers data eviction, losing frequently accessed data, or starts rejecting writes with OOM errors, degrading or crashing the service. I have seen incidents during major e-commerce promotions where the backing database was instantly overwhelmed because a lack of advance memory planning caused a sudden drop in cache hit rate.

1.1 Why is there insufficient memory?

  • Data volume surge : With rapid business growth, cached user data, product information, and other data have accumulated uncontrollably, exceeding the memory limit configured in Redis (maxmemory parameter).
  • Severe memory fragmentation : After frequent add and delete operations, a large number of scattered free blocks appear in memory. Although the total free memory is sufficient, it cannot accommodate large objects, causing a “false memory shortage”.
  • Inappropriate data structure selection : Using String to store structured data, or using Set to store a collection of pure integers, can lead to serious memory waste.

1.2 How to solve this? A layered solution from “optimization” to “scaling up”.

First layer: Memory optimization, making the most of existing resources (lowest cost).

This is the preferred solution, which improves memory utilization by optimizing data structures and memory configuration:

  1. Choose the right data structure to avoid waste : When storing structured data such as user information or product attributes, use a Hash instead of separate Strings. For example, storing user “Zhang San, 20 years old” as HMSET user:100 name "zhangsan" age 20 saves more than 50% of the memory compared to separate keys like user:100:name and user:100:age.
  2. When storing sets of pure integers, keep them within Redis’s built-in IntSet encoding (raw integers only, no per-element pointer overhead) rather than letting them degrade into a hashtable-encoded Set.
  3. Keep short lists in the compressed-list encoding (ZipList; listpack since Redis 7.0) by tuning parameters such as list-max-ziplist-entries, so that they are not converted to linked lists prematurely.
  4. Defragmenting memory : Redis 4.0+ supports automatic defragmentation; it is recommended to enable this in production environments.
# Enable automatic defragmentation
config set activedefrag yes
# Start defragmenting only once at least 100MB of memory is lost to fragmentation...
config set active-defrag-ignore-bytes 104857600
# ...and the fragmentation ratio exceeds 10%
config set active-defrag-threshold-lower 10
# Cap defragmentation at 25% CPU usage to avoid affecting business traffic
config set active-defrag-cycle-max 25
  5. Configure a reasonable eviction policy : For caching scenarios, prefer LRU (Least Recently Used) so that the least recently used data is evicted first.
# Eliminate the least recently used key among all keys
config set maxmemory-policy allkeys-lru
# Set maxmemory to 70%-80% of physical RAM (here 15GB, i.e. 75% of a 20GB host), leaving the rest for the OS
config set maxmemory 16106127360
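
For reference, the 16106127360 above is 15GB, i.e. 75% of a 20GB host. A throwaway helper (hypothetical, not part of Redis) makes the sizing rule explicit:

```python
def suggest_maxmemory(total_ram_bytes: int, ratio: float = 0.75) -> int:
    """Suggest a maxmemory value as a fraction of physical RAM (the 70%-80% rule)."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be strictly between 0 and 1")
    return int(total_ram_bytes * ratio)

# 75% of a 20GB host gives exactly the value used above
print(suggest_maxmemory(20 * 2**30))  # 16106127360
```
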
Second layer: Horizontal expansion, breaking through the single-node limitation (radical solution)

If the optimized memory is still insufficient, the capacity needs to be expanded through cluster sharding:

  1. Core principle : 16384 hash slots are allocated to multiple master nodes, with each master node responsible for a portion of the slots. The total memory capacity increases linearly with the number of master nodes.
  2. Quick deployment example :
# Start 3 master nodes (repeat for ports 7000, 7001 and 7002)
redis-server --port 7000 --cluster-enabled yes --maxmemory 10gb
# Create a cluster and automatically allocate slots
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 --cluster-replicas 0
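
The slot assignment described above is easy to reproduce: Redis Cluster maps a key to slot CRC16(key) mod 16384, hashing only the substring inside {} when the key contains a hash tag. A minimal sketch of that calculation (the CRC16 variant is XMODEM, per the cluster specification):

```python
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): poly 0x1021, init 0x0000 -- the variant Redis Cluster uses
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash tags: only the part between the first "{" and the next "}" is hashed
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("user:100"))           # the slot a cluster node would report
print(key_slot("{user:100}.orders"))  # hash tag: lands on the same slot as above
```

Keys that must be accessed together in one command (e.g. MSET, Lua scripts) can be forced onto the same slot with a shared hash tag.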

Bonus points in interviews: When answering questions about insufficient memory, first talk about “memory optimization” (a low-cost solution), then talk about “cluster expansion” (a radical solution), demonstrating a layered problem-solving approach.

II. The Large Key Problem: The Hidden “Performance Bomb”

Large keys are keys that hold a disproportionate amount of data, such as a List with 100,000 records or a Hash with 100,000 fields. Many developers ignore them at first, but as the business grows, small keys quietly become large ones, eventually causing command latency spikes and network congestion. I once dealt with a 1GB Hash key whose DEL blocked the main thread for 8 seconds, triggering a service avalanche.

2.1 What are the dangers of large keys?

  • Blocking the main thread : In Redis’s single-threaded model, operations on large keys (such as DEL, HGETALL) will occupy the main thread for several seconds, during which all requests will be queued.
  • Network congestion : Reading large keys generates a lot of network data, consuming bandwidth and causing increased latency;
  • Master-slave replication delay : Large key synchronization will consume network resources of master and slave nodes, causing data synchronization delays on slave nodes.

2.2 How to solve it? The “Investigate-Dismantle-Control” three-step method.

Step 1: Accurately Locate the Large Keys

First, identify the major keys before you can optimize accordingly. Here are two practical methods:

  1. Redis built-in commands (for quick troubleshooting) :
redis-cli --bigkeys -i 0.1  # -i 0.1 sleeps 0.1s between batches; the output shows the biggest key of each type, e.g. Biggest hash found: "tag:user:200" with 10000 fields
  2. RDB file analysis (precise location) :

Use redis-rdb-tools to analyze the RDB file and obtain the precise memory usage for each key:

# installation tool
pip install redis-rdb-tools
# Generate memory report
rdb -c memory dump.rdb > redis_memory.csv
# Roughly filter keys larger than ~5MB (7-digit sizes starting with 5-9, or any size of 8+ digits)
grep -E ",([5-9][0-9]{6}|[0-9]{8,})," redis_memory.csv
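
The grep filter is only approximate; parsing the CSV is more reliable. The sketch below assumes the column layout redis-rdb-tools emits (database,type,key,size_in_bytes,encoding,num_elements,len_largest_element) — check the header of your own report first:

```python
import csv
import io

def big_keys(report_csv: str, min_bytes: int = 5 * 1024 * 1024):
    # Return (key, size_in_bytes) pairs over the threshold, largest first
    rows = csv.DictReader(io.StringIO(report_csv))
    hits = [(r["key"], int(r["size_in_bytes"])) for r in rows
            if int(r["size_in_bytes"]) >= min_bytes]
    return sorted(hits, key=lambda kv: -kv[1])

sample = """database,type,key,size_in_bytes,encoding,num_elements,len_largest_element
0,hash,tag:user:200,10485760,hashtable,10000,64
0,string,session:1,512,string,1,512
"""
print(big_keys(sample))  # [('tag:user:200', 10485760)]
```
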
Step 2: Scientifically Split the Large Keys

Depending on the data structure type of the large key, different splitting strategies are adopted. The following are solutions that have been verified in practice:

  1. Hash key splitting : The key is split into multiple smaller hashes by taking the modulo of the field’s hash. For example, “tag:user:100” can be split into 32 smaller hashes.
import hashlib
import redis

cli = redis.Redis(host='localhost', port=6379)
SHARD_COUNT = 32  # split into 32 shards; adjust as needed

def get_shard_key(main_key, field):
    # Pick a shard by hashing the field name
    field_hash = hashlib.md5(field.encode()).hexdigest()
    shard_idx = int(field_hash, 16) % SHARD_COUNT
    return f"{main_key}:{shard_idx}"

# Store a field (replaces the original HSET)
main_key = "tag:user:100"
field = "like:music"
shard_key = get_shard_key(main_key, field)
cli.hset(shard_key, field, "rock")

# Read the field back (replaces the original HGET)
print(cli.hget(shard_key, field))
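
Reading the whole logical hash back now means visiting every shard. A companion sketch under the same SHARD_COUNT=32 assumption (the client is passed in, so it works with any redis-py-compatible object; each HGETALL touches at most 1/32 of the data, keeping each call short):

```python
def hgetall_sharded(cli, main_key, shard_count=32):
    # Merge all shards back into one dict (replaces HGETALL on the old big key)
    merged = {}
    for idx in range(shard_count):
        merged.update(cli.hgetall(f"{main_key}:{idx}"))
    return merged
```
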
  2. List large key splitting : Shard by count or by time; for example, split “order:user:100” into shards of 1000 records each:
def add_order(main_key, order_info, max_len=1000):
    # The current shard number is tracked in Redis itself
    shard_idx_key = f"{main_key}:shard_idx"
    shard_idx = int(cli.get(shard_idx_key) or 0)
    shard_key = f"{main_key}:{shard_idx}"

    # Switch to the next shard once the current one is full
    if cli.llen(shard_key) >= max_len:
        shard_idx += 1
        cli.set(shard_idx_key, shard_idx)
        shard_key = f"{main_key}:{shard_idx}"

    cli.rpush(shard_key, order_info)

# Example call
add_order("order:user:100", "iPhone 15 order")
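
Reading recent records then starts from the newest shard and walks backwards. A companion sketch under the same assumptions (client passed in; returns newest first):

```python
def latest_orders(cli, main_key, count=10):
    # Collect the newest `count` entries, walking shards from the tail backwards
    idx = int(cli.get(f"{main_key}:shard_idx") or 0)
    out = []
    while idx >= 0 and len(out) < count:
        need = count - len(out)
        chunk = cli.lrange(f"{main_key}:{idx}", -need, -1)
        out.extend(reversed(chunk))  # newest first within the shard
        idx -= 1
    return out
```
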
  3. String large key splitting : Split large JSON or text blobs into several smaller strings, or move them to object storage (such as MinIO) and keep only the address plus core fields in Redis.

Step 3: Controlling Large Keys at the Source

  • Development guidelines : clearly state that “the memory usage of a single key should be ≤10MB” and “the number of List/Hash elements should be ≤10,000”, which will be the focus of code reviews;
  • Monitoring and alerts : Use Prometheus + Grafana to monitor large keys and set an alert for “Key memory > 5MB”;
  • Avoid full caching : Cache paginated data or core fields, rather than the entire dataset (e.g., cache only the price and inventory of a product, not the details text).

III. Blocking Operations: The Fatal Flaw of the Single-Threaded Model

Redis’s single-threaded model is the core of its high performance, but it also means that “one command blocks, and all requests queue up.” Common blocking scenarios in production environments include executing the KEYS command to traverse all keys, deleting large keys, and synchronizing persistence, all of which can cause service slowdowns.

3.1 Which operations can cause blocking?

  1. Explicit blocking commands :
    • Full traversal: KEYS, SMEMBERS, HGETALL;
    • Heavy mutation: DEL (on a large key), FLUSHALL, SAVE;
    • Blocking waits: BLPOP, BRPOP.
  2. Implicit blocking scenarios :
    • Memory eviction: evicting large keys when memory is full;
    • AOF flushing: synchronous fsync under the always policy;
    • Master-slave synchronization: the master node generating an RDB snapshot.

3.2 How to solve it? The “ban, substitute, optimize” three-pronged approach.

First measure: Prohibit high-risk commands

Prevent accidental operations by configuring renaming or disabling dangerous commands:

# redis.conf
rename-command KEYS ""  # Disable KEYS
rename-command FLUSHALL ""  # Disable FLUSHALL
rename-command DEL "SAFE_DEL"  # Rename DEL to reduce accidental deletion
The second approach: Use non-blocking commands instead.

This is the core method for resolving congestion. Alternatives to high-frequency congestion commands are as follows:

Blocking command      Non-blocking alternative       Advantage
KEYS *                SCAN 0 MATCH * COUNT 100       Iterates in batches without blocking the main thread
DEL bigkey            UNLINK bigkey (Redis 4.0+)     Memory is freed asynchronously in a background thread
FLUSHALL              FLUSHALL ASYNC                 Clears the dataset asynchronously, non-blocking
SAVE                  BGSAVE                         Generates the RDB in the background, business unaffected
HGETALL bighash       HSCAN bighash 0 COUNT 100      Fetches fields in batches, keeping each call short
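
The first two rows of the table combine naturally: deleting by pattern becomes SCAN + UNLINK rather than KEYS + DEL. A sketch assuming a redis-py-style client (scan_iter and unlink are the redis-py method names):

```python
def delete_by_pattern(cli, pattern, batch=100):
    # Iterate with SCAN (non-blocking) and free memory asynchronously with UNLINK
    deleted = 0
    for key in cli.scan_iter(match=pattern, count=batch):
        cli.unlink(key)
        deleted += 1
    return deleted
```
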
The third approach: Optimize implicit blocking scenarios.
  1. AOF flushing strategy : Use everysec (flush once per second) instead of always (synchronous fsync on every write) to balance performance and durability:
config set appendfsync everysec
  2. Master-slave synchronization : For a slave’s initial sync, the master generates the RDB via BGSAVE (non-blocking); enlarge repl-backlog-size so brief disconnections resume incrementally instead of forcing a full resync:
config set repl-backlog-size 104857600  # 100MB

3.3 How to troubleshoot blocking issues?

When Redis becomes slow to respond, follow these steps to troubleshoot:

  1. View the number of blocked clients : redis-cli info clients | grep "blocked_clients"
  2. Locate the culprit command :  redis-cli client list  # inspect the cmd= field of each connection for long-running commands, e.g. cmd=del pointing at a large-key deletion
  3. Slow query log review :
# Configure a slow query threshold of 10ms and retain 1000 logs
config set slowlog-log-slower-than 10000
config set slowlog-max-len 1000
# View slow query logs
redis-cli slowlog get
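
The slow log can also be post-processed in the application. The sketch below assumes the entry shape redis-py’s slowlog_get() returns (dicts with 'id', 'start_time', 'duration' in microseconds, and 'command') — verify against your client version:

```python
def worst_queries(entries, top=5):
    # Rank slow-log entries by duration (microseconds), worst first
    ranked = sorted(entries, key=lambda e: e["duration"], reverse=True)
    return [(e["id"], e["duration"], e["command"]) for e in ranked[:top]]

# Typical use (assumes a live connection):
# import redis
# cli = redis.Redis()
# for entry_id, micros, cmd in worst_queries(cli.slowlog_get(1000)):
#     print(f"#{entry_id} {micros / 1000:.1f}ms {cmd}")
```
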

IV. Extension: Quick Solutions to Other Common Performance Issues

In addition to the three core issues, the following two questions also frequently arise, and quick solutions are provided:

4.1 Network latency

Causes : Redis and the application are deployed in different data centers; the connection pool is misconfigured.  Solutions : 1. Deploy in the same data center to reduce cross-region latency; 2. Use a connection pool (such as JedisPool) to reuse connections instead of creating and closing them per request; 3. Keep TCP_NODELAY enabled (Redis disables Nagle’s algorithm on client sockets by default; for replication links, leave repl-disable-tcp-nodelay set to no).

4.2 Persistence performance issues

Causes : RDB generation takes a long time, and the AOF file grows too large.  Solutions : 1. RDB: replace SAVE with BGSAVE and schedule it during off-peak hours; 2. AOF: run BGREWRITEAOF periodically to compact the log and reduce flushing pressure.