Caching

Sanket Saxena
Jun 18, 2023 · 10 min read

In our digital world, data is currency. As we continue to generate unprecedented volumes of data, the mechanisms to manage, process, and access this data must keep up. This is where distributed caching comes into play.

Part 1: Understanding Caching

Caching plays a pivotal role in improving the performance of distributed systems by storing data closer to where it’s needed, thereby reducing the latency of data retrieval.

1.1 Where Caching Occurs

Caching can be implemented at different stages of data processing or network communication:

  • Browser: Web browsers cache static resources like CSS, JavaScript, and images to reduce network traffic and improve page load times.
  • CDN: Content Delivery Networks cache data at geographically diverse nodes to reduce latency for end users.
  • API Gateway: Caches responses to API requests to improve the responsiveness of web services.
  • Service Level: Applications and services cache data in memory (RAM) or on disk to avoid repeated work and optimize resource usage.
  • CPU/Memory Management Unit: Hardware caches such as L1, L2, and L3, together with the Translation Lookaside Buffer (TLB) in the MMU, speed up memory access.
  • Database: Caches session data, query results, or metadata to improve database performance.

1.2 Cache Deployment

Caching can be deployed in various forms based on the system architecture:

  • In-Process Caching: The cache lives in the same process as the application. It is the fastest but is limited by application and hardware constraints.
  • Inter-Process Caching: The cache is locally available but runs as a separate process, allowing multiple applications to share the cache.
  • Remote Caching: The cache runs on a different machine and is accessed over the network. Redis and Memcached are popular remote caching solutions.
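
As a rough illustration of remote caching, here is a minimal sketch using the redis-py client; the host, port, key names, and TTL are illustrative and assume a reachable Redis instance:

```python
import redis  # pip install redis

# Connect to a remote (or local) Redis instance; host/port are illustrative.
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

cache.set("session:42", "user-data", ex=300)  # cache with a 5-minute TTL
print(cache.get("session:42"))                # read back over the network
```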

1.3 Why Use Cache

Caching is employed due to two fundamental principles:

  • Amdahl’s Law: The performance improvement gained by optimizing a portion of a system is limited by the proportion of time that portion is used (see the formula after this list).
  • Pareto Distribution (80/20 rule): For many events, roughly 80% of the effects come from 20% of the causes. In the context of caching, this means that around 20% of your dataset will account for 80% of data access.
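
Amdahl’s Law can be written as follows, where p is the fraction of total time spent in the part being optimized (for caching, that is data access) and s is the speedup of that part:

```latex
% Amdahl's Law: p = fraction of time in the optimized part, s = its speedup
S_{\text{overall}} = \frac{1}{(1 - p) + \frac{p}{s}}
```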

1.4 Issues in Distributed Caching and Their Resolutions

Distributed caching brings its own challenges:

  • Cost: Distributed caching can add significant infrastructure and maintenance costs. This can be mitigated by cost-efficient cloud solutions, open-source software, and prioritizing what to cache.
  • Consistency: Ensuring all caches have the most recent data is challenging. This can be managed with cache invalidation strategies, like TTL or write-through.
  • Race Condition: Concurrent reads/writes can lead to inconsistent data. This can be mitigated with techniques like “locking” data while it’s being written or atomic operations.
  • Availability: The cache should always be available to serve data. Techniques like replication and consistent hashing can improve cache availability.

1.5 Principle of Locality

This principle posits that programs access a small proportion of their address space at any given time, and there are two types:

  • Temporal Locality: If an item is accessed, it’s likely to be accessed again in the near future.
  • Spatial Locality: If an item is accessed, items whose addresses are close are likely to be accessed soon.

1.6 Cache Replacement Policies

When a cache is full, it must decide which items to discard:

  • LRU (Least Recently Used): The cache discards the least recently used items first.
  • LFU (Least Frequently Used): The cache discards the least frequently used items first.

Some systems, like Twitter, use a combination of both.

How Does the LRU Cache Work?

The LRU cache leverages a combination of a hash table and a doubly linked list. Each node in the doubly linked list holds a key-value pair from the cache. The head of the list represents the most recently used item, while the tail is the least recently used one. The hash table stores references to the nodes in the list, facilitating quick access to them.
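
As a rough illustration, here is a minimal LRU sketch in Python built on collections.OrderedDict, which internally pairs a hash map with a doubly linked list (an explicit node-based version follows the same logic):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU sketch: the front of the OrderedDict is the most recently used item."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                            # cache miss
        self.items.move_to_end(key, last=False)    # move to the head on access
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value                    # insert or update
        self.items.move_to_end(key, last=False)    # newest item goes to the head
        if len(self.items) > self.capacity:
            self.items.popitem(last=True)          # evict the least recently used (tail)
```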

LRU Cache in Action: A Detailed Example

Let’s say we have an LRU cache with a capacity of 3, and we’re performing the following operations: Insert 1, Insert 2, Insert 3, Access 2, Insert 4, and Access 1.

  1. Insert 1: The cache is empty, so 1 is added at the head of the list.
     List: [1]
     Hash: {1: Node(1)}
  2. Insert 2: The cache is not full, so 2 is added at the head of the list.
     List: [2, 1]
     Hash: {1: Node(1), 2: Node(2)}
  3. Insert 3: The cache is still not full, so 3 is added at the head of the list.
     List: [3, 2, 1]
     Hash: {1: Node(1), 2: Node(2), 3: Node(3)}
  4. Access 2: 2 is in the cache (a cache hit), so it is moved to the head of the list.
     List: [2, 3, 1]
     Hash: {1: Node(1), 2: Node(2), 3: Node(3)}
  5. Insert 4: The cache is full. The tail node (1) is removed from both the list and the hash, and 4 is added at the head of the list.
     List: [4, 2, 3]
     Hash: {2: Node(2), 3: Node(3), 4: Node(4)}
  6. Access 1: 1 is not in the cache (a cache miss), which triggers an operation such as fetching it from the main data source.

Through this example, we can see that the LRU cache effectively manages a set of limited resources (cache slots in this case), ensuring that the cache holds the most recently used items to service incoming requests.
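
Running the minimal sketch from earlier through the same sequence of operations reproduces this behavior:

```python
lru = LRUCache(capacity=3)
lru.put(1, "one")
lru.put(2, "two")
lru.put(3, "three")
lru.get(2)              # hit: 2 moves to the head
lru.put(4, "four")      # cache is full: 1 (the tail) is evicted
print(lru.get(1))       # None -> cache miss, fetch from the main data source
print(list(lru.items))  # [4, 2, 3], from most to least recently used
```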

However, LRU has limitations. It performs poorly on workloads with a lot of one-time-use data, which can push genuinely hot items out of the cache. This is where Count-Min Sketch, a probabilistic data structure, comes to the rescue.

How Count-Min Sketch Works

Count-Min Sketch keeps track of the frequency of events in a stream of data. It uses several hash functions to map each data item to a counter in each row of a two-dimensional array; the minimum value across an item’s counters gives an estimate of its frequency. An eviction policy can then use these estimates to keep frequently accessed items and discard one-time-use items, overcoming LRU’s limitation.
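
A minimal Count-Min Sketch in Python (the width, depth, and salted-hash scheme are illustrative choices):

```python
import hashlib

class CountMinSketch:
    """Minimal Count-Min Sketch: a depth x width array of counters, one hash per row."""
    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive a per-row hash by salting the item with the row number.
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The minimum across rows gives an estimate that never undercounts.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))
```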

1.7 Cache Invalidation

Cache invalidation is the process of updating or removing cached data to maintain consistency. There are several methods:

  • TTL (Time To Live): Data is automatically removed from the cache after a certain time period (see the sketch after this list).
  • Invalidation on Read: The cache checks whether data is stale when it is accessed. The challenge is knowing that the data is stale; this can be mitigated by comparing against a “last updated” timestamp on the original data source.
  • Invalidation on Write: Data is invalidated when the original source is updated. This method is used most often due to its simplicity and effectiveness.
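
A minimal, purely illustrative in-memory TTL cache that drops stale entries lazily on read:

```python
import time

class TTLCache:
    """Minimal TTL sketch: entries expire a fixed number of seconds after being written."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self.entries[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self.entries[key]  # lazily drop expired data on read
            return None
        return value
```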

2.1 Distributed Caching

Distributed caching allows caching data across multiple nodes to increase storage space and improve data availability. However, it comes with its challenges and solutions:

  • Modulus-based Distribution: The key’s hash modulo the number of nodes determines which node stores the data. The challenge is that when a node goes down or is added, most keys must be remapped, causing a flood of cache misses and added latency. This issue can be resolved using consistent hashing.
  • Range-based Distribution: Nodes are responsible for storing data of certain key ranges. The challenge is that data may be unevenly distributed, and scaling can be hard. This can be mitigated by monitoring and adjusting ranges periodically.
  • Consistent Hashing: Consistent hashing ensures minimal data remapping when nodes are added or removed, thus improving the system’s overall performance.
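
A minimal consistent-hashing sketch (one ring position per node, no virtual nodes; purely illustrative) shows why only keys near an added or removed node get remapped:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hashing sketch: nodes and keys share one hash ring."""
    def __init__(self, nodes=()):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node):
        bisect.insort(self.ring, (self._hash(node), node))

    def remove_node(self, node):
        self.ring.remove((self._hash(node), node))

    def node_for(self, key):
        # Walk clockwise: the first node whose hash is >= the key's hash (with wraparound).
        index = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[index][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))
```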

2.2 Cache Strategies

Several strategies can be employed for handling cache reads and writes:

  • Read-Through Cache: The cache itself fetches the data from the underlying store on a cache miss, reducing the load on the underlying storage.
  • Cache Aside (Lazy Loading): The application is responsible for fetching data on a cache miss and updating the cache.
  • Write-Through Cache: The cache immediately writes data to the underlying store when the cache is updated, ensuring data consistency.
  • Write-Around Cache: Updates are written directly to the storage, skipping the cache to prevent cache pollution.
  • Write-Back Cache: Writes are acknowledged once they reach the cache and are flushed to the underlying store later (for example, on eviction or on a timer), optimizing write throughput at the cost of a small window of potential data loss.

The combinations of these strategies are tailored according to specific application needs.

2.3 Combinations of Cache Strategies and Their Applications

Different combinations of cache read and write strategies prove beneficial in specific system architectures and use-cases.

Cache Aside & Write Around

This combination shines in a system with heavy write operations, where updated data isn’t immediately required. The Write-Around strategy prevents the cache from being flooded with write operations that aren’t needed soon. It directly writes updates to the back-end storage, bypassing the cache to avoid polluting it with data that might not be accessed soon.

Simultaneously, Cache Aside handles the read operations. Whenever data is requested, the system first checks the cache. If it hits a miss, it then retrieves the data from the storage, updates the cache and finally serves the data to the requester. This approach is beneficial in big data processing systems where data consistency isn’t an immediate requirement, but cache memory needs to be effectively utilized.
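
A rough sketch of this combination follows; the Redis connection and the db.fetch_user/db.save_user calls are illustrative assumptions, not a specific library’s API:

```python
import redis

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

def read_user(user_id, db):
    """Cache-aside read: check the cache first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached                 # cache hit
    value = db.fetch_user(user_id)    # hypothetical DB call
    cache.set(key, value, ex=300)     # populate the cache for later reads
    return value

def update_user(user_id, value, db):
    """Write-around: write to the database and invalidate the cache entry."""
    db.save_user(user_id, value)      # hypothetical DB call
    cache.delete(f"user:{user_id}")   # avoid serving stale data; do not cache the write
```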

Read-Through & Write-Through

When high data consistency is required, this combination proves effective. In a Write-Through strategy, every write operation updates both the cache and the back-end storage, ensuring the cache always reflects the latest data. This method minimizes the risk of serving stale data, hence improving data consistency.

The Read-Through strategy complements it by letting the cache handle the retrieval of data. Upon a cache miss, it fetches the data from the storage directly, thus keeping the process seamless for the application. Transactional systems, such as banking applications or booking systems, often use this approach due to its consistent and reliable nature.
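
A minimal in-memory sketch of this combination, assuming a backing store object that exposes get/put methods:

```python
class ReadThroughWriteThroughCache:
    """Minimal sketch: the cache layer itself talks to the backing store."""
    def __init__(self, store):
        self.store = store   # assumed interface: store.get(key), store.put(key, value)
        self.cache = {}

    def get(self, key):
        if key not in self.cache:            # read-through: cache fetches on a miss
            self.cache[key] = self.store.get(key)
        return self.cache[key]

    def put(self, key, value):
        self.store.put(key, value)           # write-through: persist first...
        self.cache[key] = value              # ...then keep the cache in sync
```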

Read-Through & Write-Around

This combination strikes a balance between the other two. It caters to systems where cache pollution needs to be avoided, and data consistency is necessary but not immediately required. Like the Read-Through & Write-Through combination, the Read-Through strategy lets the cache handle data retrieval upon a cache miss. But instead of immediately writing to the cache upon every update, the Write-Around strategy writes updates directly to the storage, helping to maintain cache memory effectively. This approach is prevalent in Content Delivery Networks (CDNs), where data consistency is crucial, but instant reflection of changes in the cache isn’t required.

Cache Operation Challenges and Solutions

Caching systems often face challenges related to reliability, traffic, and consistency:

  • Reliability Challenges: Cache systems can face issues like cache avalanche (many entries expiring at once), cache stampede (many identical requests after a cache miss), the thundering herd problem (a surge of requests overwhelming the backend), and cache penetration (requests for non-existent keys causing overhead). Solutions include staggered expiration, consistent hashing, load balancing, and Bloom filters.
  • Traffic Challenges: Hot keys (certain keys being accessed far more often than others) and large keys (entries that consume more space and I/O) can be addressed with rate limiting, key partitioning, and compressing or splitting large values.
  • Consistency Challenges: When multiple cache replicas exist, maintaining data consistency can be difficult. One possible solution is using a centralized system to manage updates.
  • Concurrent Cache Updates: Concurrent updates can lead to inconsistencies. Compare-and-swap (CAS) operations, which rely on versioning, can handle such situations.

Overcoming Reliability Challenges in Caching Systems

When deploying caching systems, developers need to be aware of several potential challenges that can impact the reliability and overall performance of their applications. Some of these challenges include cache avalanche, cache stampede, the thundering herd problem, and cache penetration.

Cache Avalanche

A cache avalanche occurs when several cache items expire simultaneously, causing an overwhelming number of requests to fall onto the database at once. This sudden surge in requests can significantly slow down the system or, in the worst case, cause it to crash.

Solution: One solution to prevent cache avalanches is using staggered expiration times. Instead of setting the same TTL (Time to Live) for all cache items, they can be assigned varying expiration times. This staggering prevents a large number of cache items from expiring simultaneously, thus avoiding a cache avalanche.
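
A small sketch of staggered expiration using random jitter (the base TTL and jitter window are illustrative):

```python
import random

BASE_TTL_SECONDS = 600   # illustrative base TTL
JITTER_SECONDS = 120     # spread expirations over an extra 2-minute window

def ttl_with_jitter():
    """Staggered expiration: add random jitter so entries don't all expire together."""
    return BASE_TTL_SECONDS + random.randint(0, JITTER_SECONDS)

# e.g. cache.set(key, value, ex=ttl_with_jitter()) with a Redis-style client
```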

Cache Stampede

A cache stampede occurs when multiple identical requests are made to a cache for a value that is not currently cached, leading to many simultaneous expensive calculations or database queries.

Solution: Implementing a lock or semaphore mechanism can effectively prevent cache stampedes. Once the first request triggers the database query, a lock can be set to signal other identical requests to wait until the queried data is cached.
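
A minimal single-flight style sketch, assuming an in-process cache guarded by per-key locks:

```python
import threading

cache = {}
locks_guard = threading.Lock()
key_locks = {}

def get_or_compute(key, compute):
    """Only one thread recomputes a missing key; identical requests wait for the result."""
    value = cache.get(key)
    if value is not None:
        return value
    with locks_guard:                          # obtain the lock object for this key
        lock = key_locks.setdefault(key, threading.Lock())
    with lock:
        value = cache.get(key)                 # re-check: another thread may have filled it
        if value is None:
            value = compute()                  # the expensive query runs only once
            cache[key] = value
    return value
```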

Thundering Herd Problem

The thundering herd problem refers to the scenario where a multitude of requests overwhelm a system once a particular resource becomes available. In a cache system, this can happen when a highly requested cache item expires, and all the clients request the backend database for this item simultaneously.

Solution: To combat the thundering herd problem, load balancers can be used to distribute requests evenly across multiple servers, reducing the load on any single server. Additionally, using a mechanism similar to the cache stampede solution, locking the cache during the database query can help control the inflow of requests.

Cache Penetration

Cache penetration refers to the situation where requests are made for data that doesn’t exist, either in the cache or the database. These invalid requests can bypass the cache layer and hit the database, causing unnecessary load.

Solution: A common solution to cache penetration is using a Bloom filter, a probabilistic data structure that can efficiently check whether an element belongs to a set. The filter can verify whether a requested key is likely to exist before the request hits the database.
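
A minimal Bloom filter sketch (the size, hash count, and salted-hash scheme are illustrative; production systems would use a tuned library):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hashes set bits in a fixed-size bit array."""
    def __init__(self, size=8192, hash_count=4):
        self.size = size
        self.hash_count = hash_count
        self.bits = [False] * size

    def _positions(self, item):
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```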

By understanding these potential challenges and their solutions, developers can ensure the robustness of their caching systems, leading to improved application performance and reliability.

How CAS Works

CAS is an atomic operation, meaning it completes without being interrupted by other processes. The operation takes three parameters: a memory location (or cache entry), an expected old value, and a new value.

The operation proceeds as follows:

  • It compares the current value at the specified memory location with the expected old value.
  • If the current value matches the expected old value, it means no other operation has changed the data since it was last checked. The operation then swaps the current value with the new value.
  • If the current value does not match the expected old value, it implies some other operation has changed the data. In this case, the swap is not performed.
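
A minimal in-memory sketch of version-based CAS; the lock here merely simulates the atomicity that hardware or a data store would provide:

```python
import threading

class VersionedValue:
    """Minimal CAS sketch using a version number to detect concurrent updates."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # stands in for true atomic hardware/store support

    def read(self):
        with self._lock:
            return self.value, self.version

    def compare_and_swap(self, expected_version, new_value):
        with self._lock:
            if self.version != expected_version:
                return False           # someone else updated the value; caller must retry
            self.value = new_value
            self.version += 1
            return True

# Usage: read, compute, then CAS; on failure, reload and retry.
entry = VersionedValue({"balance": 100})
value, version = entry.read()
updated = dict(value, balance=value["balance"] - 10)
if not entry.compare_and_swap(version, updated):
    pass  # another writer got there first: reload and retry
```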

2.4 Cache Invalidation Strategies

Cache invalidation strategies are vital to maintaining data consistency:

  • TTL-Based Invalidation: Each cache entry has a TTL (Time to Live). After the TTL expires, the cache entry is discarded or refreshed.
  • Write-Through Invalidation: Whenever data is updated, the cache is immediately updated or invalidated.
  • Cache Eviction Policies: LRU, LFU, or other eviction policies are used to remove entries from the cache when it is full.

Understanding caching strategies, their applications, and potential challenges is fundamental in building scalable and efficient systems. When properly implemented, caching can dramatically improve system performance by reducing the need for expensive operations and ensuring that frequently accessed data is readily available.
