1. Background

This article came out of a Tech Salon talk I attended last week on iQIYI's Java caching practice. Let's start with a brief overview of how iQIYI's Java caching evolved.

(figure: the stages of cache evolution)

As the diagram shows, the evolution falls into several stages:

  • Phase 1: data synchronization plus Redis

Data is synchronized to Redis through a message queue, and Java applications read the cache directly from Redis.
The advantage of this stage is that updates propagate quickly, because a distributed cache is used. The disadvantage is just as obvious: everything depends on the stability of Redis. Once Redis goes down, the entire cache layer is unavailable, causing a cache avalanche in which all requests hit the database.

  • Phases 2 and 3: Java HashMap, then Guava cache

These phases use an in-process cache as the first level and Redis as the second level. Advantage: the in-process cache is not affected by external systems; even if Redis goes down, it still works. Disadvantages: an in-process cache cannot be updated in real time the way a distributed cache can, and because JVM memory is limited the cache size must be capped, so some entries get evicted and hit rate becomes a concern.

  • Phase 4: Guava cache with refresh

To mitigate the problems above, Guava cache can be configured to refresh entries some time after they are written. This solves the problem of entries never updating, but it still does not achieve real-time refresh.

  • Phase 5: External cache asynchronous refresh


This phase extends Guava cache, using Redis as a message-queue-style notification mechanism to tell the other Java applications to refresh their local caches.

That is a brief introduction to the five stages of iQIYI's cache development. There were of course other optimizations as well, such as GC tuning and handling of cache penetration and cache coverage. Interested readers can follow my WeChat official account and contact me to discuss.

Primitive Society – Database Lookup

The above is iQIYI's evolutionary path, but in a typical project the first step is not Redis; it is querying the database directly.

When traffic is low, querying the database or reading files directly is the most convenient option and fully meets business needs.

Ancient Society – HashMap

Once traffic grows, or database queries become very frequent, we can bring out Java's built-in HashMap or ConcurrentHashMap. We can write something like this:

import java.util.HashMap;

public class CustomerService {
    private HashMap<String, String> hashMap = new HashMap<>();
    private CustomerMapper customerMapper;

    public String getCustomer(String name) {
        String customer = hashMap.get(name);
        if (customer == null) {
            customer = customerMapper.get(name);
            // cache the result for the next lookup
            hashMap.put(name, customer);
        }
        return customer;
    }
}
But there’s a problem that HashMap can’t do data obsolescence and memory will grow indefinitely, so hashMap will soon be obsolete. Of course, it’s not that he’s totally useless. Just like our ancient society, not everything is out of date. For example, the traditional virtues of our Chinese people are never out of date. Like this hashMap, it can be used as a cache in some scenarios. When there is no need for elimination mechanism, for example, when we use reflection, if we search for Method through reflection every time, field. Performance must be inefficient, and then we use HashMap to cache it, which can improve performance a lot.

Modern Society – LRUHashMap

In the "ancient society" stage we could not evict data, so memory grew without bound, which is clearly unacceptable. Someone might say: just throw some data away. Fine, but which data? Chosen at random? Of course not. Imagine you have just loaded entry A into the cache and it gets evicted right before your next access; you would end up hitting the database anyway, so why cache at all?

So clever people invented eviction algorithms. Here are three common ones: FIFO, LRU, and LFU (there are also ARC, MRU, and others for those who want to dig further):

  • FIFO (first in, first out): whatever entered the cache first is evicted first. This is the simplest scheme, but it yields a very low hit rate. Imagine a piece of hot data that happened to be accessed first, followed by lots of less popular data: the hot entry gets squeezed out purely because it arrived first, despite its high access frequency.
  • LRU (least recently used): this algorithm avoids the problem above. Each time an entry is accessed, it moves to the tail of the queue; when eviction is needed, the head of the queue is removed. But a problem remains: if an entry is accessed 10,000 times in the first 59 minutes of an hour (clearly hot data) but not at all in the last minute, while other entries are accessed, the hot entry is evicted anyway.
  • LFU (least frequently used): this algorithm improves on the above by using extra space to record each entry's access frequency and evicting the entry with the lowest frequency. This avoids LRU's blindness to access patterns over longer time spans.
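To make the strategies concrete before we get to the LRU implementation below, here is a minimal, single-threaded LFU sketch of my own (deliberately unoptimized; the linear scan for the minimum is O(n)):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal LFU sketch: track an access count per key and evict the
// least-frequently-used entry when capacity is exceeded.
class SimpleLfuCache<K, V> {
    private final int capacity;
    private final Map<K, V> values = new HashMap<>();
    private final Map<K, Integer> counts = new HashMap<>();

    SimpleLfuCache(int capacity) { this.capacity = capacity; }

    V get(K key) {
        if (values.containsKey(key)) {
            counts.merge(key, 1, Integer::sum);  // record the access
        }
        return values.get(key);
    }

    void put(K key, V value) {
        if (!values.containsKey(key) && values.size() >= capacity) {
            // evict the key with the lowest access count
            K victim = null;
            int min = Integer.MAX_VALUE;
            for (Map.Entry<K, Integer> e : counts.entrySet()) {
                if (e.getValue() < min) { min = e.getValue(); victim = e.getKey(); }
            }
            values.remove(victim);
            counts.remove(victim);
        }
        values.put(key, value);
        counts.merge(key, 1, Integer::sum);
    }

    int size() { return values.size(); }

    public static void main(String[] args) {
        SimpleLfuCache<String, String> c = new SimpleLfuCache<>(2);
        c.put("a", "1");
        c.put("b", "2");
        c.get("a");            // "a" is now the hot entry
        c.put("c", "3");       // evicts "b", the least frequently used
        System.out.println(c.get("b"));  // null
    }
}
```

Note how the counts map keeps growing only for keys currently cached here; real LFU bookkeeping across all keys ever seen is exactly the space problem discussed later in the W-TinyLFU section.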

Of these three strategies, implementation cost rises in the order listed, and under the same conditions so does the hit rate. In general we choose the middle option: LRU, whose implementation cost is modest and whose hit rate is decent. So how do we implement an LRUMap? We can build a simple one by extending LinkedHashMap and overriding the removeEldestEntry method.

import java.util.LinkedHashMap;
import java.util.Map;

class LRUMap extends LinkedHashMap {

        private final int max;
        private final Object lock;

        public LRUMap(int max, Object lock) {
            // Capacity max * 1.4 plus the eviction below means no resize is ever needed
            super((int) (max * 1.4f), 0.75f, true);
            this.max = max;
            this.lock = lock;
        }

        /**
         * Overrides LinkedHashMap's removeEldestEntry method.
         * It is checked on every put; returning true deletes the eldest entry.
         * @param eldest
         * @return
         */
        @Override
        protected boolean removeEldestEntry(Map.Entry eldest) {
            return size() > max;
        }

        public Object getValue(Object key) {
            synchronized (lock) {
                return get(key);
            }
        }

        public void putValue(Object key, Object value) {
            synchronized (lock) {
                put(key, value);
            }
        }

        public boolean removeValue(Object key) {
            synchronized (lock) {
                return remove(key) != null;
            }
        }

        public boolean removeAll() {
            synchronized (lock) {
                clear();
                return true;
            }
        }
}
LinkedHashMap maintains a linked list of entries (the key-value objects). With access order enabled, every get or put moves the entry involved to the tail of the list.
Note that in the constructor the capacity is deliberately set to max * 1.4, while removeEldestEntry evicts as soon as size > max, so the map never reaches the point of resizing. By overriding LinkedHashMap this way, we implement an LRUMap with a few simple methods.
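A quick self-contained demonstration of the access-order behavior the LRUMap above relies on, using an anonymous LinkedHashMap subclass (the helper name is mine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Demonstrates LinkedHashMap's access-order mode plus removeEldestEntry:
// the least recently accessed entry is evicted once the max size is exceeded.
class LruDemo {
    static <K, V> Map<K, V> lruMap(int max) {
        // accessOrder = true moves every get/put target to the tail of the list
        return new LinkedHashMap<K, V>((int) (max * 1.4f), 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > max;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = lruMap(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");          // "a" becomes most recently used
        cache.put("c", "3");     // evicts "b", the least recently used
        System.out.println(cache.keySet());  // [a, c]
    }
}
```

Without the third constructor argument the map would keep insertion order, and the get("a") call would not save "a" from eviction.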

Contemporary Society – Guava cache

LRUMap gives us cache eviction, but several problems remain:

  • Lock contention is severe. As you can see in my code, the lock is a single global lock at the method level; under heavy load, performance inevitably suffers.
  • No support for expiration time
  • Automatic refresh is not supported

So the folks at Google couldn't resist inventing Guava cache, which you can use as easily as this:

public static void main(String[] args) throws ExecutionException {
        LoadingCache<String, String> cache = CacheBuilder.newBuilder()
                // expire 30 ms after write
                .expireAfterWrite(30L, TimeUnit.MILLISECONDS)
                // expire 30 ms after access
                .expireAfterAccess(30L, TimeUnit.MILLISECONDS)
                // refresh 20 ms after write
                .refreshAfterWrite(20L, TimeUnit.MILLISECONDS)
                // weak keys: an entry can be reclaimed when its key is garbage collected
                .weakKeys()
                .build(createCacheLoader());
        cache.put("hello1", "I am hello1");
        cache.put("hello1", "I am hello2");
}

public static com.google.common.cache.CacheLoader<String, String> createCacheLoader() {
        return new com.google.common.cache.CacheLoader<String, String>() {
            @Override
            public String load(String key) throws Exception {
                return key;
            }
        };
}
Below I explain, from Guava cache's internals, how it solves each of LRUMap's problems.

Lock contention

Guava cache borrows ConcurrentHashMap's idea of segmented locking, with each segment responsible for its own eviction. With too few segments, contention remains severe; with too many, eviction becomes effectively random. For example, with a size of 100 split into 100 segments, each segment holds a single entry and evicts on its own, which amounts to random eviction. Guava cache computes the number of segments with the following code:

    int segmentShift = 0;
    int segmentCount = 1;
    while (segmentCount < concurrencyLevel
        && (!evictsBySize() || segmentCount * 20 <= maxWeight)) {
      ++segmentShift;
      segmentCount <<= 1;
    }

The segmentCount above is the final number of segments; the condition guarantees that each segment holds at least 10 entries. If concurrencyLevel is not set, it defaults to 4, so the segment count is at most 4. With a size of 100, for example, the cache is split into 4 segments, each with a maximum size of 25.
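The doubling loop can be reproduced standalone to check the numbers in the text (a sketch of Guava's logic under the assumption that size-based eviction is on, with maxWeight standing in for the cache size; the class name is mine):

```java
// Standalone reproduction of Guava's segment-count calculation:
// double the segment count while staying under concurrencyLevel and
// while each segment would still hold at least 10 entries (count * 20 <= maxWeight).
class SegmentDemo {
    static int segmentCount(int concurrencyLevel, long maxWeight) {
        int count = 1;
        while (count < concurrencyLevel && count * 20 <= maxWeight) {
            count <<= 1;
        }
        return count;
    }

    public static void main(String[] args) {
        // size 100 with the default concurrencyLevel of 4 gives 4 segments of 25
        System.out.println(segmentCount(4, 100));  // 4
        // a tiny cache stays at a single segment
        System.out.println(segmentCount(4, 10));   // 1
    }
}
```

The second case shows the "at least 10 entries per segment" guarantee in action: a cache of 10 entries never splits, because 1 * 20 <= 10 is already false.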
In Guava cache, writes take the lock directly. For reads, if the entry is loaded and not expired, no lock is needed; otherwise the lock is taken for a second read, and if the value is still absent it is loaded through the configured CacheLoader. In my example above the loader simply returns the key; in business code it would typically query the database.
The flow is shown below:

(figure: Guava cache read/write flow)

Expiration

Compared with LRUMap, Guava cache supports two kinds of expiration: expireAfterWrite (how long after a write) and expireAfterAccess (how long after a read). Interestingly, expired entries are not removed the moment they expire (there is no background thread constantly sweeping); instead they are cleaned up during reads and writes. The benefit is avoiding the global locking a background sweep would require. Look at the following code:

public static void main(String[] args) throws ExecutionException, InterruptedException {
        Cache<String, String> cache = CacheBuilder.newBuilder()
                // expire 5 ms after write
                .expireAfterWrite(5, TimeUnit.MILLISECONDS)
                // a single segment, so the expiration effect is easy to observe
                .concurrencyLevel(1)
                .build();
        cache.put("hello1", "I am hello1");
        cache.put("hello2", "I am hello2");
        cache.put("hello3", "I am hello3");
        cache.put("hello4", "I am hello4");
        // sleep long enough for the entries above to expire
        Thread.sleep(5);
        cache.put("hello5", "I am hello5");
        // only hello5 remains: the expired entries were cleaned up during the put
        System.out.println(cache.size());
}

The result shows that expired entries are only cleaned up at read/write time. Pay particular attention to the concurrencyLevel(1) above: I deliberately limit the cache to a single segment, otherwise the experiment would not show this effect, because, as described in the previous section, expiration is handled per segment. Each segment maintains two queues:

    final Queue<ReferenceEntry<K, V>> writeQueue;

    final Queue<ReferenceEntry<K, V>> accessQueue;

writeQueue holds entries in write order: the head is the earliest-written data, the tail the latest.
accessQueue holds entries in access order, just like an LRU list, and is used for size-based eviction: if the segment exceeds its maximum capacity (the 25 mentioned above), the element at the head of accessQueue is evicted.

void expireEntries(long now) {

      ReferenceEntry<K, V> e;
      while ((e = writeQueue.peek()) != null && map.isExpired(e, now)) {
        if (!removeEntry(e, e.getHash(), RemovalCause.EXPIRED)) {
          throw new AssertionError();
        }
      }
      while ((e = accessQueue.peek()) != null && map.isExpired(e, now)) {
        if (!removeEntry(e, e.getHash(), RemovalCause.EXPIRED)) {
          throw new AssertionError();
        }
      }
}

The above is Guava cache's expired-entry handling: it peeks at both queues and removes whatever has expired. This expiration pass is invoked around put operations, and also on reads: when a read finds an expired entry it triggers expiration for the whole segment, as does the second read in lockedGetOrLoad.

void evictEntries(ReferenceEntry<K, V> newest) {
      // ... irrelevant code omitted

      while (totalWeight > maxSegmentWeight) {
        ReferenceEntry<K, V> e = getNextEvictable();
        if (!removeEntry(e, e.getHash(), RemovalCause.SIZE)) {
          throw new AssertionError();
        }
      }
}

/** Returns the next entry in accessQueue eligible for eviction */
ReferenceEntry<K, V> getNextEvictable() {
      for (ReferenceEntry<K, V> e : accessQueue) {
        int weight = e.getValueReference().getWeight();
        if (weight > 0) {
          return e;
        }
      }
      throw new AssertionError();
}

Above is the eviction code: as you can see, it evicts from the head of accessQueue. Eviction is triggered whenever the elements in a segment change, for example on insert, update, or load.

Auto refresh

Automatic refresh is relatively simple to implement in Guava cache: on each query, it checks whether the entry has reached its refresh deadline and, if so, reloads it.

Other characteristics

There are other features in Guava cache:

Weak references

In Guava cache, both keys and values can be held through weak (or soft) references, and each Segment has two reference queues:

    final @Nullable ReferenceQueue<K> keyReferenceQueue;

    final @Nullable ReferenceQueue<V> valueReferenceQueue;

These two queues record the references that have been garbage collected; each queue entry carries the hash of the reclaimed Entry, so that after collection the stale Entry can be located by that hash and deleted.

Removal listeners

When data is removed from Guava cache, how do you know whether it expired, was evicted, or was reclaimed through a weak reference? You can register a listener with removalListener(RemovalListener listener) to monitor removals, log them, or do other processing, which is useful for analyzing why data leaves the cache.

RemovalCause records all the reasons an entry can be removed: explicitly deleted by the user, replaced by the user, expired, collected (its reference was garbage collected), or evicted due to size.

Summary of guava cache

Reading Guava cache's source carefully, we can sum it up as a performant, API-rich LRU map. iQIYI's cache work is also built on it: through secondary development of Guava cache, they made it possible to update caches across Java application instances.

Towards the Future – caffeine

Guava cache is indeed powerful and meets most needs, but it is essentially a layer of encapsulation over LRU, so it falls short of the many better eviction algorithms that have appeared since. Caffeine implements W-TinyLFU, a variant combining LFU and LRU. Here is a comparison of hit rates across algorithms:
(figure: hit-rate comparison of cache eviction algorithms)

Optimal is the theoretical best hit rate, and LRU lags well behind the others, while W-TinyLFU comes closest to the ideal. And caffeine beats Guava cache not only on hit rate but also on read and write throughput:
(figure: read/write throughput comparison between caffeine and Guava cache)

At this point you may wonder why caffeine is so strong. Don't worry, let me walk you through it slowly.


I’ve already talked about how traditional LFU works. In LFU, as long as the probability distribution of data access patterns remains unchanged over time, its hit rate can become very high. Here I still take Aiqi Yiyi as an example. For example, a new play came out. We cached it with LFU. This new play has visited hundreds of millions of times in these days, and this frequency has recorded hundreds of millions of times in our LFU. But the new play will always be outdated, for example, the first few episodes of the new play after a month have actually been outdated, but his visits are really too high, other TV plays simply can not eliminate the new play, so there are limitations in this mode. So a variety of LFU variants have emerged, attenuating based on time periods, or frequencies over a recent period of time. The same LFU also uses extra space to record the frequency of each data access, even if the data is not in the cache, so the extra space to maintain is very large.

Let’s imagine that we build a hashMap for this maintenance space. Each data item will exist in the hashMap. When the amount of data is very large, the hashMap will be very large.

Back to LRU: it is not useless either. LRU copes well with bursts of traffic, because it does not require frequency to accumulate before an item can stay cached.

So W-TinyLFU combines LRU and LFU, plus a few other algorithmic tricks.

Frequency recording

First, frequency recording. The goal is to record access frequencies that change over time using limited space. W-TinyLFU uses Count-Min Sketch, a relative of the Bloom filter, to record access frequency. As shown below:
(figure: Count-Min Sketch structure)
To record a key, we hash it with several different hash functions and increment the counter at each resulting position. Why several hash functions? Because this is a compressed structure, collisions are bound to happen. Suppose we kept a single long[] array and incremented the counter at each key's hash position. Two keys, say Zhang San and Li Si, may hash to the same slot, long[1]. If Zhang San is accessed 10,000 times and Li Si once, the counter at long[1] reads 10,001, so reading Li Si's frequency would give 10,001 even though he was accessed only once. To solve this, multiple hash functions are used; think of it as a long[][] two-dimensional array. Zhang San and Li Si may collide under the first hash function, but very probably not under the second and third. If one function has, say, a 1% collision probability, four independent functions all collide with probability 1% to the fourth power. With this model, to read Li Si's frequency we take the minimum of his counters across all the hash functions, hence the name Count-Min Sketch.


Here’s a simple example: if a hashMap records this frequency, if I have 100 data, then the HashMap has to store 100 data access frequencies. Even if my cache’s capacity is 1, because of Lfu’s rules, I have to record all the access frequencies of this 100 data. If there’s more data, I’ll have more records.

As for Count-Min Sketch in caffeine (see the FrequencySketch class): if your cache size is 100, it creates a long array whose size is the power of two nearest 100, that is 128, and this array records the access frequencies. Caffeine caps each frequency at 15; 15 in binary is 1111, four bits, while a long has 64 bits, so each long could hold 16 counters. Caffeine instead uses four hash functions and divides each long into four segments, each segment holding the four functions' counters. The benefit of this layout is further reducing hash collisions: the 128-slot table effectively becomes 128 x 4.

A long is structured as follows:

(figure: layout of one long, divided into four segments each holding four 4-bit counters)
Call the four segments A, B, C, and D, and the four counters within each segment s1, s2, s3, and s4. Here is how the frequency of the key 50 would be incremented, again taking size = 100 as the example.

  1. First determine which segment 50's hash falls into. hash & 3 necessarily yields a number below 4; suppose hash & 3 = 0, so segment A.
  2. Rehash 50's hash with the other hash functions to get positions in the long array: suppose s1 gives 1, s2 gives 3, s3 gives 4, and s4 gives 0.
  3. Then increment the s1 counter in segment A of long[1] (written 1As1 for short), plus 3As2, 4As3, and 0As4.
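The steps above can be sketched as a toy Count-Min Sketch. This is my own simplified illustration, not caffeine's FrequencySketch: it uses plain int counters in four rows and four ad-hoc hash mixes, rather than 4-bit counters packed into longs:

```java
// Toy Count-Min Sketch: four hash functions, four rows of counters.
// The estimate for a key is the minimum counter across the rows, which
// bounds the error introduced by hash collisions.
class ToySketch {
    private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};
    private final int[][] table = new int[4][128];

    private int index(int hash, int seed) {
        int h = hash * seed;
        h ^= h >>> 16;
        return h & 127;  // table width is a power of two, like caffeine's
    }

    void increment(Object key) {
        int hash = key.hashCode();
        for (int row = 0; row < 4; row++) {
            table[row][index(hash, SEEDS[row])]++;
        }
    }

    int estimate(Object key) {
        int hash = key.hashCode();
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < 4; row++) {
            min = Math.min(min, table[row][index(hash, SEEDS[row])]);
        }
        return min;
    }

    // Aging: halve every counter, as caffeine does when a sample period ends
    void halve() {
        for (int[] row : table) {
            for (int i = 0; i < row.length; i++) row[i] >>>= 1;
        }
    }
}
```

The halve method corresponds to the decay step discussed next: a hot key's estimate shrinks but stays ahead of cold keys, so stale popularity fades over time.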


At this point some will object that a maximum frequency of 15 is too small. It doesn't matter: in this algorithm, taking size = 100 as the example, once roughly 1,000 increments have accumulated globally, every counter is divided by 2, and counting continues after the decay. The W-TinyLFU paper proves that this aging adapts well to access frequencies that vary over time.

Read and write performance

We said that Guava cache performs expiration work inside its read and write operations, so a put may also perform eviction, and read/write performance suffers somewhat as a result. As the figure above shows, caffeine far outstrips Guava cache in read and write throughput. The main reason is that caffeine handles these bookkeeping events asynchronously: it submits them to a queue whose data structure is a RingBuffer (if that is unfamiliar, look up Disruptor, the well-known high-performance lock-free queue). The queue is then drained by the default ForkJoinPool.commonPool(), or by a thread pool you configure yourself, which performs the subsequent eviction and expiration work.

Reads and writes also get different queues. In caffeine reads greatly outnumber writes, so for write operations all threads share a single RingBuffer:


For read operations, which are more frequent than writes, each thread gets its own RingBuffer to further reduce contention:


Data Eviction Strategy

In caffeine all data lives in a ConcurrentHashMap, unlike Guava cache, which implements its own structure resembling ConcurrentHashMap. Caffeine maintains three LRU queues of entry references:

  • Eden queue: fixed at 1% of the cache capacity; with size = 100, its effective size is 1. This queue holds newly written data and protects burst traffic from being evicted for lack of accumulated access frequency. For example, when a new show launches it has no access history yet; this region keeps it from being immediately pushed out by other cached data. Eden is the most comfortable region: data here is hard for other data to evict.
  • Probation queue: the "probation" region. Data here is relatively cold and is next in line for eviction. Its effective size is size minus Eden minus Protected.
  • Protected queue: data here is safe from eviction for the time being, though if the Probation queue runs out of data or the Protected queue fills up, it too faces eviction. To get here, an entry in Probation must be accessed once more, which promotes it to Protected. The effective size is (size minus Eden) x 80%; with size = 100 that is 79.

The three queues are as follows:


  1. All new data goes into Eden first.
  2. When Eden is full, overflow moves into Probation.
  3. If an entry in Probation is accessed, it is promoted to Protected.
  4. When Protected is full, entries are demoted back to Probation.
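The flow above can be modeled with a rough sketch (purely illustrative and my own simplification; caffeine's real implementation tracks entries inside its hash table rather than in separate deques):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model of caffeine's three LRU regions. New keys enter Eden;
// Eden overflow moves to Probation; a hit in Probation promotes the key to
// Protected; Protected overflow demotes back to Probation.
class ThreeQueueModel {
    final Deque<String> eden = new ArrayDeque<>();
    final Deque<String> probation = new ArrayDeque<>();
    final Deque<String> protectedQ = new ArrayDeque<>();
    final int edenMax, protectedMax;

    ThreeQueueModel(int edenMax, int protectedMax) {
        this.edenMax = edenMax;
        this.protectedMax = protectedMax;
    }

    void write(String key) {
        eden.addLast(key);
        if (eden.size() > edenMax) {
            probation.addLast(eden.pollFirst());  // Eden overflow goes to Probation
        }
    }

    void read(String key) {
        if (probation.remove(key)) {
            protectedQ.addLast(key);              // a hit in Probation promotes
            if (protectedQ.size() > protectedMax) {
                probation.addLast(protectedQ.pollFirst());  // overflow demotes
            }
        }
    }

    public static void main(String[] args) {
        ThreeQueueModel m = new ThreeQueueModel(1, 2);
        m.write("a");
        m.write("b");   // "a" overflows from Eden into Probation
        m.read("a");    // "a" is promoted to Protected
        System.out.println(m.protectedQ);  // [a]
    }
}
```

The interesting part, which entry is actually discarded when Probation overflows, is the admission duel described next.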

When eviction happens, it happens from Probation. The entry at the head of that queue is called the victim; as the first one in, it is the entry a plain LRU would evict. But here it is only a candidate for eviction (it is on "probation", so to speak): it goes up against the entry at the tail of the queue, called the attacker (or candidate). Victim and attacker then face off using the frequencies recorded in our Count-Min Sketch, with the following rules:

  • If the attacker's frequency is greater than the victim's, the victim is evicted.
  • If the attacker's frequency is <= 5, the attacker is evicted. The reasoning, explained in a comment in the source, is that setting a warm-up threshold yields a higher overall hit rate.
  • Otherwise, one of the two is evicted at random.
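The admission duel can be sketched as a small function (the frequency arguments would come from the Count-Min Sketch; the threshold of 5 and the random tie-break mirror the description above, and the class name is mine):

```java
import java.util.Random;

// Sketch of W-TinyLFU admission: the Probation head ("victim") is compared
// against the tail candidate ("attacker") by estimated access frequency.
class AdmissionDemo {
    static final int WARMUP_THRESHOLD = 5;

    // returns true if the attacker is admitted (i.e. the victim is evicted)
    static boolean admit(int attackerFreq, int victimFreq, Random random) {
        if (attackerFreq > victimFreq) {
            return true;                     // attacker clearly hotter: evict victim
        }
        if (attackerFreq <= WARMUP_THRESHOLD) {
            return false;                    // cold attacker is rejected outright
        }
        return random.nextBoolean();         // otherwise decide randomly
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        System.out.println(admit(10, 3, r));  // true: attacker wins
        System.out.println(admit(2, 9, r));   // false: attacker too cold
    }
}
```

The warm-up threshold is what stops a burst of one-off keys from flushing out established hot entries, while the random branch prevents an adversarial access pattern from pinning the cache.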

How to use

If you are used to Guava and worried about switching costs, don't be: caffeine's API was modeled on Guava's, and you will find it is basically the same.

public static void main(String[] args) {
        Cache<String, String> cache = Caffeine.newBuilder()
                .expireAfterWrite(1, TimeUnit.SECONDS)
                .build();
        cache.put("hello", "I am hello");
}

Incidentally, more and more open source frameworks have dropped Guava cache, Spring 5 among them. In my own work I compared Guava cache with caffeine, chose caffeine, and it has performed well in production. So don't worry that caffeine is immature or that nobody uses it.


This article covered iQIYI's caching path and the history of local caching (from "ancient times" to the future), along with the basic principles of each cache. Of course, using caches well takes more than this: there are questions like how local caches synchronize after data changes elsewhere, distributed caches, multi-level caches, and so on. A later article will cover how to use caches well, and I also plan to write dedicated source-code analyses of Guava cache and caffeine. Interested readers can follow my WeChat official account to see new articles first.

Finally, this article is included in JGrowing, a comprehensive, community-built Java learning roadmap. If you want to help maintain an open source project, you are welcome to join in. The GitHub address is: https://github.com/javagrowin.
Please give it a little star.

If you found this article helpful, you can follow my tech WeChat official account. Your follows and shares are my greatest support.
