ARTS check-in: week 3

1. Algorithm: solve a LeetCode algorithm problem

146. LRU Cache
Design and implement a data structure for a Least Recently Used (LRU) cache. It should support the following operations: get and put.

get(key) – Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
put(key, value) – Set the value if the key is already present, or insert it if it is not. When the cache reaches its capacity, it should invalidate the least recently used item before inserting a new item.

The cache is initialized with a positive capacity.

Follow up:
Could you do both operations in O(1) time complexity?

Example:

LRUCache cache = new LRUCache( 2 /* capacity */ );

cache.put(1, 1);
cache.put(2, 2);
cache.get(1);       // returns 1
cache.put(3, 3);    // evicts key 2
cache.get(2);       // returns -1 (not found)
cache.put(4, 4);    // evicts key 1
cache.get(1);       // returns -1 (not found)
cache.get(3);       // returns 3
cache.get(4);       // returns 4

Answer: This problem is not difficult; it mainly tests the choice of data structures. The LRU rule frequently moves an element from the middle of a linear structure to its head, and a linked list can do that in constant time once the node is known. A linked list alone cannot tell in constant time whether a key is in the cache, so a hash table is added for fast lookup. In other words, the data structure needed for this problem is a doubly linked list plus a hash table.
My first version uses the STL's list and unordered_map. The code is as follows:

class LRUCache {
public:
    LRUCache(int capacity) {
        m_cache.clear();
        m_hash.clear();
        m_capacity = capacity;
        m_size = 0;
    }
    
    int get(int key) {
        return _get(key, -1);
    }
    
    void put(int key, int value) {
        int ret = _get(key, value);
        if (ret > 0) {
            // key already exists: _get updated its value and moved it to the front
            return;
        }
        else {
            //push_front
            m_cache.push_front(pair<int, int>(key, value));
            m_hash.insert(pair<int, list<pair<int, int>>::iterator>(key, m_cache.begin()));
            if (m_size >= m_capacity) {
                //pop the old value
                list<pair<int, int>>::iterator last = --m_cache.end();
                m_hash.erase((*last).first);
                m_cache.pop_back();
            }
            else {
                m_size++;
            }
        }
    }
private:
    int m_capacity;
    int m_size;
    // unordered_map gives average O(1) lookup; an ordered map would be O(log n)
    unordered_map<int, list<pair<int, int>>::iterator> m_hash;
    list<pair<int, int>> m_cache;

    int _get(int key, int newVal) {
        auto iter = m_hash.find(key);
        if (iter == m_hash.end()) {
            return -1;
        }
        else {
            list<pair<int, int>>::iterator pos = iter->second;
            int val = (*pos).second;
            if (newVal > 0) {
                val = newVal;
            }
            m_cache.erase(pos);
            m_cache.push_front(pair<int, int>(key, val));
            m_hash.erase(key);
            m_hash.insert(pair<int, list<pair<int, int>>::iterator>(key, m_cache.begin()));
            return val;
        }
    }
};

Using the STL list avoids writing a doubly linked list by hand, but it is awkward to use, because:

  1. When the cache is full and a new element is added, the last element has to be removed, so each list element must store both the key and the value (the value is what get returns, and the key is needed to remove the matching entry from the hash table).
  2. The value type of the hash table is a list iterator, which is tedious to spell out (a typedef can of course simplify this).
  3. The STL list is a general-purpose container, so it carries overhead that a plain hand-written doubly linked list avoids, and it tends to run slower.
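The typedef idea from point 2 can be sketched as follows. This is only an illustration, not the version I submitted: the class name SpliceLRUCache and the aliases Entry and EntryIter are made up for this sketch, and it assumes a positive capacity as the problem guarantees. It also uses list::splice, which relinks an existing node at the front without invalidating its iterator, so the hash table entry does not need to be erased and reinserted on every access.

```cpp
#include <cassert>
#include <list>
#include <unordered_map>
#include <utility>

// Hypothetical name, chosen so it does not clash with the LRUCache classes in this post.
class SpliceLRUCache {
public:
    explicit SpliceLRUCache(int capacity) : m_capacity(capacity) {
        assert(capacity > 0);  // the problem statement guarantees a positive capacity
    }

    int get(int key) {
        auto iter = m_hash.find(key);
        if (iter == m_hash.end()) {
            return -1;
        }
        touch(iter->second);
        return iter->second->second;
    }

    void put(int key, int value) {
        auto iter = m_hash.find(key);
        if (iter != m_hash.end()) {
            // Existing key: overwrite the value and mark it most recently used.
            iter->second->second = value;
            touch(iter->second);
            return;
        }
        if (static_cast<int>(m_cache.size()) >= m_capacity) {
            // Evict the least recently used entry, which sits at the back.
            m_hash.erase(m_cache.back().first);
            m_cache.pop_back();
        }
        m_cache.emplace_front(key, value);
        m_hash[key] = m_cache.begin();
    }

private:
    using Entry = std::pair<int, int>;             // (key, value)
    using EntryIter = std::list<Entry>::iterator;  // what the hash table stores

    // splice moves the node to the front in O(1); pos stays valid afterwards.
    void touch(EntryIter pos) {
        m_cache.splice(m_cache.begin(), m_cache, pos);
    }

    int m_capacity;
    std::list<Entry> m_cache;
    std::unordered_map<int, EntryIter> m_hash;
};
```

Because put looks the key up directly instead of passing the new value through get, this sketch does not rely on values being positive.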

So in the end I implemented the linked list myself. The code is as follows:

struct Node {
    int key;
    int val;
    Node *prev;
    Node *next;
};

class LRUCache {
public:
    LRUCache(int capacity) {
        m_capacity = capacity;
        m_size = 0;
        m_head = new Node{0, 0};
        m_tail = new Node{0, 0};
        // m_head and m_tail are sentinels
        m_head->next = m_tail;
        m_tail->prev = m_head;
    }
    
    int get(int key) {
        return _touch(key, -1);
    }
    
    void put(int key, int value) {
        if (_touch(key, value) < 0) {
            Node *p = new Node{key, value};
            _push_front(p);
            m_map[key] = p;
            if (m_size < m_capacity) {
                m_size++;
            }
            else {
                m_map.erase(_pop_back());
            }           
        }
    }
private:
    unordered_map<int, Node *> m_map;
    Node *m_head, *m_tail;
    int m_capacity;
    int m_size;

    int _touch(int key, int val) {
        unordered_map<int, Node *>::iterator iter = m_map.find(key);
        if (iter == m_map.end()) {
            return -1;
        }
        Node *p = iter->second;
        if (val > 0)
            p->val = val;
        _move_to_head(p);
        return p->val;
    }

    void _move_to_head(Node *p) {
        p->prev->next = p->next;
        p->next->prev = p->prev;
        _push_front(p);
    }

    int _pop_back() {
        Node *last = m_tail->prev;
        int key = last->key;
        last->prev->next = m_tail;
        m_tail->prev = last->prev;
        delete last;  // free the evicted node; setting the local pointer to NULL would leak it
        return key;
    }

    void _push_front(Node *p) {
        p->prev = m_head;
        p->next = m_head->next;
        m_head->next->prev = p;
        m_head->next = p;
    }
};

In the linked list implementation I added two sentinel nodes (pointed to by m_head and m_tail). With them, adding and removing nodes never needs a null check, which saves a lot of code. In the end, this implementation is better than the first one in both running time and memory use.

2. Review: read and comment on at least one English technical article

The article I read this week is Prateek Gogia's "Redis: What and Why?". The article explains what Redis is and what advantages it has over traditional databases.
First of all, Redis is an in-memory database that supports strings, hashes, sorted sets, sets, and other data structures. Because the data is stored in memory, it processes requests much faster than a traditional database.
Secondly, the reasons given for using Redis are as follows:

  1. Redis is written in C, so it is very fast;
  2. Redis is NoSQL;
  3. Many technology giants currently use Redis, such as GitHub, Weibo, Pinterest, Snapchat, Craigslist, Digg, Stack Overflow, and Flickr;
  4. Using Redis as a cache saves the cost of accessing a cloud database directly;
  5. Redis is very friendly to developers and supports many popular languages (JavaScript, Java, Go, C, C++, C#, Python, Objective-C, PHP, and others);
  6. Redis is open source and has been stable so far.

3. Tip: learn at least one technical skill

Using sentinels: when writing algorithm logic, boundary conditions often need special checks, such as a null check on a pointer, the top, bottom, left, and right bounds of a grid, or a check for the last element of an array. In these cases a sentinel can remove the boundary check, trading a little extra space for simpler code. The following examples illustrate the use of sentinels:

  1. Sentinels for a two-dimensional matrix. Many problems use a two-dimensional matrix, for example a maze. Assume the maze is n * m. When walking the maze, every move has to be checked against all four bounds: the check for the x coordinate is x >= 0 && x < n, the y coordinate is similar, so each step needs four comparisons, which is tedious. In this case I usually add sentinels around the matrix, that is, I allocate an (n + 2) * (m + 2) matrix and mark every cell (x, y) with x == 0 || x == n + 1 || y == 0 || y == m + 1 as an impassable obstacle. The logic becomes much simpler: the real cells are exactly 1 <= x <= n and 1 <= y <= m, and a step onto the border is rejected like any other obstacle;
  2. When operating on the head or tail pointer of a doubly linked list, it is usually necessary to check first whether it is null. If two nodes are allocated when the list is initialized (neither node changes during the lifetime of the list), later operations on list nodes no longer need a null check;
  3. To find an element in an unordered array, we usually traverse it from beginning to end and break out of the loop once the element is found. Each iteration needs two comparisons: one to check that the index i is less than n, and one to check whether the current element equals the key. With a sentinel, the key is stored in an extra slot after the last element of the array, so the check on i < n can be dropped from the loop, because the key is guaranteed to be found eventually. If it is found only at the sentinel position, the key is not in the array.
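A minimal sketch of items 1 and 3, under a few assumptions: the function names sentinel_search and count_reachable are made up for illustration, the maze is rectangular with 0 meaning an open cell and 1 meaning a wall, and the start coordinates are 0-based and inside the maze. In the search the sentinel is appended temporarily and removed again; in the maze the border of walls means the neighbour loop contains no bounds checks at all.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Item 3: linear search with a sentinel. Returns the index of the first
// element equal to key, or -1 if the key is not in the array.
int sentinel_search(std::vector<int>& data, int key) {
    const std::size_t n = data.size();
    data.push_back(key);        // sentinel: the loop below must stop here at the latest
    std::size_t i = 0;
    while (data[i] != key) {    // one comparison per iteration, no i < n check
        ++i;
    }
    data.pop_back();            // restore the caller's array
    return i < n ? static_cast<int>(i) : -1;
}

// Item 1: counts the open cells reachable from (sx, sy) in an n * m maze.
int count_reachable(const std::vector<std::vector<int>>& maze, int sx, int sy) {
    const int n = static_cast<int>(maze.size());
    const int m = n > 0 ? static_cast<int>(maze[0].size()) : 0;
    if (n == 0 || m == 0) {
        return 0;
    }
    assert(sx >= 0 && sx < n && sy >= 0 && sy < m);  // assumption: valid start cell

    // (n + 2) * (m + 2) grid filled with walls; real cells live at 1..n, 1..m.
    std::vector<std::vector<int>> grid(n + 2, std::vector<int>(m + 2, 1));
    for (int x = 0; x < n; ++x) {
        for (int y = 0; y < m; ++y) {
            grid[x + 1][y + 1] = maze[x][y];
        }
    }
    if (grid[sx + 1][sy + 1] != 0) {
        return 0;  // the start cell itself is a wall
    }

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    std::vector<std::pair<int, int>> pending;  // cells waiting to be expanded
    pending.push_back(std::make_pair(sx + 1, sy + 1));
    grid[sx + 1][sy + 1] = 1;  // mark as visited by turning the cell into a wall
    int count = 0;
    while (!pending.empty()) {
        const std::pair<int, int> cell = pending.back();
        pending.pop_back();
        ++count;
        for (int d = 0; d < 4; ++d) {
            const int nx = cell.first + dx[d];
            const int ny = cell.second + dy[d];
            if (grid[nx][ny] == 0) {  // border cells are walls, so no bounds check
                grid[nx][ny] = 1;
                pending.push_back(std::make_pair(nx, ny));
            }
        }
    }
    return count;
}
```

Note that push_back may reallocate the vector, so in performance-sensitive code the spare slot for the sentinel would be reserved in advance.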

4. Share: share a technical article with opinions and thoughts

The article I am sharing this week is about using TensorFlow for text sentiment analysis. It describes in detail how to analyze the sentiment of text with TensorFlow. It is a good article, with concrete steps and data.