The integer set (intset) is one of the underlying implementations of the Redis set key. When a set contains only integer elements and the number of elements is small, Redis uses an integer set as the underlying implementation of the set key.

1 Implementation of the integer set

The integer set is an abstract data structure used by Redis to store integer values. It can hold values of type int16_t, int32_t, and int64_t, and it guarantees that the set contains no duplicate elements.

Each intset.h/intset structure represents one integer set:

typedef struct intset {
    uint32_t encoding;
    uint32_t length;
    int8_t contents[];
} intset;
  • encoding: the encoding used for the elements
  • length: the number of elements in the set
  • contents[]: the array that stores the elements

The contents array is the underlying implementation of the integer set: each element of the set is one item in the contents array, the items are kept sorted by value in ascending order, and the array contains no duplicates.

The length property records the number of elements in the integer set, which is also the length of the contents array.

Although the intset structure declares the contents property as an array of type int8_t, the contents array does not actually hold any int8_t values. The real type of the contents array depends on the value of the encoding attribute. For example, if the value of encoding is INTSET_ENC_INT16, then contents is an array of type int16_t, and each item in the array is an int16_t integer in the range [-32768, 32767] (that is, -2^15 to 2^15 - 1).

Similarly, if the value of encoding is INTSET_ENC_INT32, each item in the array is an int32_t integer in the range [-2147483648, 2147483647] (that is, -2^31 to 2^31 - 1).
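As a rough illustration of how the encoding decides both the element width and how the raw bytes are read, here is a minimal sketch. The names DEMO_ENC_*, demo_value_encoding() and demo_get() are hypothetical and exist only for this example; the sketch also assumes host byte order, whereas Redis normalizes byte order with its intrev helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative encoding constants: each value is the element width in bytes. */
#define DEMO_ENC_INT16 (sizeof(int16_t))
#define DEMO_ENC_INT32 (sizeof(int32_t))
#define DEMO_ENC_INT64 (sizeof(int64_t))

/* Return the narrowest encoding that can hold v (hypothetical helper). */
uint8_t demo_value_encoding(int64_t v) {
    if (v < INT32_MIN || v > INT32_MAX) return DEMO_ENC_INT64;
    if (v < INT16_MIN || v > INT16_MAX) return DEMO_ENC_INT32;
    return DEMO_ENC_INT16;
}

/* Read element `pos` from a raw byte buffer, interpreting it according to
 * `enc`. memcpy is used so the read is valid regardless of alignment. */
int64_t demo_get(const int8_t *contents, uint32_t pos, uint8_t enc) {
    assert(enc == DEMO_ENC_INT16 || enc == DEMO_ENC_INT32 ||
           enc == DEMO_ENC_INT64);
    const int8_t *src = contents + (size_t)pos * enc;
    if (enc == DEMO_ENC_INT64) {
        int64_t v;
        memcpy(&v, src, sizeof v);
        return v;
    }
    if (enc == DEMO_ENC_INT32) {
        int32_t v;
        memcpy(&v, src, sizeof v);
        return v;
    }
    int16_t v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

With this sketch, demo_value_encoding(100) selects the 2-byte encoding, while demo_value_encoding(32768) selects the 4-byte one.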

This raises a question: what happens when we insert 32768 into an intset whose encoding is INTSET_ENC_INT16 (the value range of int16_t is [-32768, 32767])?

This triggers the upgrade operation of the intset. Note that there is no corresponding downgrade operation: once an intset has been upgraded, it keeps the wider encoding. Next, let's take a closer look at the upgrade operation.

2 Upgrade operation

Whenever we add a new element to the integer set and the type of the new element is wider than the current encoding of the set, the integer set must first be upgraded before the new element can be added.

The source code of the whole upgrade operation is as follows:

// intset.c/intsetUpgradeAndAdd()
/* Upgrades the intset to a larger encoding and inserts the given integer. */
static intset *intsetUpgradeAndAdd(intset *is, int64_t value) {
    uint8_t curenc = intrev32ifbe(is->encoding);
    uint8_t newenc = _intsetValueEncoding(value);
    int length = intrev32ifbe(is->length);
    int prepend = value < 0 ? 1 : 0;

    /* First set new encoding and resize */
    is->encoding = intrev32ifbe(newenc);
    is = intsetResize(is,intrev32ifbe(is->length)+1);

    /* Upgrade back-to-front so we don't overwrite values.
     * Note that the "prepend" variable is used to make sure we have an empty
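     * space at either the beginning or the end of the intset. */
    while(length--)
        _intsetSet(is,length+prepend,_intsetGetEncoded(is,length,curenc));

    /* Set the value at the beginning or the end. */
    if (prepend)
        _intsetSet(is,0,value);
    else
        _intsetSet(is,intrev32ifbe(is->length),value);
    is->length = intrev32ifbe(intrev32ifbe(is->length)+1);
    return is;
}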

Upgrading the integer set and adding the new element takes three steps:

  1. Expand the underlying array. Based on the type of the new element, grow the underlying array of the integer set and allocate space for the new element.
  2. Convert the existing elements and keep the original order. All existing elements of the underlying array are converted to the same type as the new element, and each converted element is placed in the correct position so that the original order does not change.
  3. Add the new element to the underlying array.

In addition, an upgrade is triggered only when the new element does not fit in the current encoding, so the new element needs a wider type than every existing element. Its value is therefore either greater than all existing elements (a positive value) or smaller than all existing elements (a negative value):

  • If the new element is less than all existing elements, it is placed at the beginning of the underlying array, that is, at index 0;
  • If the new element is greater than all existing elements, it is placed at the end of the underlying array.
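The three steps and the prepend/append rule can be sketched on a plain heap array. This is a simplified illustration, not the Redis implementation: demo_upgrade_and_add() is a hypothetical name, it only handles the int16_t to int32_t case, and it ignores the intset header and byte-order handling.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: widen a sorted, heap-allocated int16_t array to
 * int32_t and add `value`, which must not fit in int16_t.
 * Returns the new buffer holding old_len + 1 elements, or NULL if the
 * allocation fails (in which case `old` is left untouched). */
int32_t *demo_upgrade_and_add(int16_t *old, uint32_t old_len, int32_t value) {
    assert(value < INT16_MIN || value > INT16_MAX);
    size_t prepend = value < 0 ? 1 : 0;

    /* Step 1: grow the buffer to old_len + 1 elements of the wider type. */
    unsigned char *bytes = realloc(old, ((size_t)old_len + 1) * sizeof(int32_t));
    if (bytes == NULL) return NULL;

    /* Step 2: convert back-to-front so that narrow values that have not
     * been read yet are never overwritten by wide ones. */
    for (uint32_t i = old_len; i-- > 0; ) {
        int16_t narrow;
        memcpy(&narrow, bytes + (size_t)i * sizeof narrow, sizeof narrow);
        int32_t wide = narrow;
        memcpy(bytes + ((size_t)i + prepend) * sizeof wide, &wide, sizeof wide);
    }

    /* Step 3: the new value goes to index 0 (negative) or to the end. */
    size_t slot = prepend ? 0 : old_len;
    memcpy(bytes + slot * sizeof value, &value, sizeof value);
    return (int32_t *)bytes;
}
```

Because the new value is known to be an extreme, no search for its position is needed; only the two ends of the array are candidates.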

3 Advantages of upgrading

The upgrade strategy of the integer set has two main advantages:

  1. It improves the flexibility of the integer set;
  2. It saves as much memory as possible.

3.1 Improve flexibility

Because C is a statically typed language, in order to avoid type errors, we usually do not put two different types of values in the same data structure.

However, because the integer set adapts its underlying array to new elements through the upgrade operation, we can freely add int16_t, int32_t, and int64_t integers to the set without worrying about type errors, which makes the integer set much more flexible.

3.2 Save memory

Of course, the simplest way to let one array hold int16_t, int32_t, and int64_t values at the same time is to use an int64_t array as the underlying implementation of the integer set. However, in that case even values that fit in int16_t or int32_t would each occupy int64_t-sized space, which wastes memory.

The upgrade operation lets the integer set hold all three types of values while ensuring that an upgrade happens only when necessary, which saves memory.
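To make the saving concrete, the size of the contents array is simply the element count multiplied by the encoding width. The helper below is a hypothetical illustration of that arithmetic and leaves out the fixed-size header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by the contents array for `length` elements, where `enc`
 * is the element width in bytes (2, 4, or 8). Hypothetical helper. */
size_t demo_contents_bytes(uint32_t length, uint8_t enc) {
    assert(enc == 2 || enc == 4 || enc == 8);
    return (size_t)length * enc;
}
```

For 100 small values, demo_contents_bytes(100, 2) is 200 bytes, whereas always using int64_t would take demo_contents_bytes(100, 8), which is 800 bytes.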

4 Intersection, union, and difference algorithms

Redis sets implement the intersection, union, and difference operations. The relevant code is in t_set.c, where sinterGenericCommand() implements intersection and sunionDiffGenericCommand() implements union and difference.

All of them can operate on multiple sets at once. When a difference is computed over multiple sets, the difference between the first set and the second set is computed first, then the difference between that result and the third set, and so on.

Next, let's look at how these three operations are implemented.

4.1 Intersection

The process of computing the intersection can be divided into three parts:

  1. Check each set and treat the nonexistent set as an empty set. Once there is an empty set, the final intersection is an empty set.
  2. The sets are sorted by number of elements. This lets the later steps start from the smallest set, so fewer elements need to be processed.
  3. The first set after sorting (that is, the smallest set) is traversed. For each element, it is searched in all the following sets in turn. Only elements that can be found in all sets are added to the final result set.

Note that looking up an element takes O(log n) in an intset (a binary search over the sorted array) and O(1) on average in a dict. Because only small sets use intset, the intset lookup can also be treated as roughly constant time.
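The three steps, together with the O(log n) lookup, can be sketched over plain sorted arrays. The demo_set type and the demo_* functions are hypothetical stand-ins for this example and are not taken from t_set.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative view of one sorted, duplicate-free set (not a Redis type). */
typedef struct {
    const int64_t *items;
    size_t len;
} demo_set;

/* Binary search, O(log n): returns 1 if v is in s, 0 otherwise. */
int demo_contains(const demo_set *s, int64_t v) {
    size_t lo = 0, hi = s->len;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (s->items[mid] == v) return 1;
        if (s->items[mid] < v) lo = mid + 1;
        else hi = mid;
    }
    return 0;
}

/* qsort comparator: sets with fewer elements come first. */
int demo_by_len(const void *a, const void *b) {
    size_t la = ((const demo_set *)a)->len;
    size_t lb = ((const demo_set *)b)->len;
    return (la > lb) - (la < lb);
}

/* Writes the intersection of sets[0..n) into `out`, whose capacity must be
 * at least the size of the smallest set. Returns the number of elements
 * written. The `sets` array is reordered by size. */
size_t demo_intersect(demo_set *sets, size_t n, int64_t *out) {
    assert(sets != NULL && out != NULL);
    if (n == 0) return 0;
    for (size_t i = 0; i < n; i++)             /* step 1: empty set check */
        if (sets[i].len == 0) return 0;
    qsort(sets, n, sizeof *sets, demo_by_len); /* step 2: smallest first */
    size_t count = 0;
    for (size_t i = 0; i < sets[0].len; i++) { /* step 3: probe the rest */
        int64_t v = sets[0].items[i];
        size_t j = 1;
        while (j < n && demo_contains(&sets[j], v)) j++;
        if (j == n) out[count++] = v;
    }
    return count;
}
```

Iterating over the smallest set bounds the number of probes by its size, which is why the sort in step 2 pays off.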

4.2 Union

The union operation is the simplest: traverse all the sets and add each element to the final result set. Adding an element to a set removes duplicates automatically, so there is no need to check whether the element already exists before inserting it.
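A minimal sketch of this idea, assuming the result is kept as a sorted array whose insert routine rejects duplicates on its own. Both demo_add() and demo_union() are hypothetical names, and only two input arrays are handled to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Insert v into the sorted, duplicate-free array dst[0..*len), keeping it
 * sorted. Returns 1 if v was added, 0 if it was already present.
 * The caller guarantees that dst has room for one more element. */
int demo_add(int64_t *dst, size_t *len, int64_t v) {
    assert(dst != NULL && len != NULL);
    size_t lo = 0, hi = *len;
    while (lo < hi) {                      /* find the first item >= v */
        size_t mid = lo + (hi - lo) / 2;
        if (dst[mid] < v) lo = mid + 1;
        else hi = mid;
    }
    if (lo < *len && dst[lo] == v) return 0;   /* duplicate: nothing to do */
    memmove(dst + lo + 1, dst + lo, (*len - lo) * sizeof *dst);
    dst[lo] = v;
    (*len)++;
    return 1;
}

/* Union of a[0..na) and b[0..nb) into dst (capacity >= na + nb).
 * The inputs need not be sorted. Returns the number of elements in dst. */
size_t demo_union(const int64_t *a, size_t na,
                  const int64_t *b, size_t nb, int64_t *dst) {
    size_t len = 0;
    for (size_t i = 0; i < na; i++) demo_add(dst, &len, a[i]);
    for (size_t i = 0; i < nb; i++) demo_add(dst, &len, b[i]);
    return len;
}
```

The union loop itself never tests for membership; the deduplication lives entirely in the insert routine.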

4.3 Difference

There are two possible algorithms for computing the difference, and they have different time complexities.

The first algorithm

The first set is traversed. Each of its elements is looked up in all the following sets in turn. Only elements that are not found in any of the following sets are added to the final result set.

The time complexity of this algorithm is O(n * m), where n is the number of elements in the first set and m is the number of sets.
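The first algorithm might look like the sketch below. The demo_set type and function names are hypothetical. Membership is tested with a simple linear scan for brevity, so the O(n * m) bound only holds if that scan is replaced by a constant-time lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative view of one duplicate-free set (not a Redis type). */
typedef struct {
    const int64_t *items;
    size_t len;
} demo_set;

/* Linear membership test; a real set would use a faster lookup. */
int demo_has(const demo_set *s, int64_t v) {
    for (size_t i = 0; i < s->len; i++)
        if (s->items[i] == v) return 1;
    return 0;
}

/* Computes sets[0] minus every set in sets[1..n). `out` needs capacity
 * for sets[0].len elements. Returns the number of elements written. */
size_t demo_diff_one(const demo_set *sets, size_t n, int64_t *out) {
    assert(n >= 1);
    size_t count = 0;
    for (size_t i = 0; i < sets[0].len; i++) {
        int64_t v = sets[0].items[i];
        size_t j = 1;
        /* Stop probing as soon as one of the later sets contains v. */
        while (j < n && !demo_has(&sets[j], v)) j++;
        if (j == n) out[count++] = v;
    }
    return count;
}
```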

The second algorithm

  1. Add all elements of the first collection to an intermediate set.
  2. Traverse all the following sets, and for each element encountered, delete it from the intermediate set.
  3. Finally, the remaining elements of the intermediate set constitute the difference set.

The time complexity of this algorithm is O(n), where n is the total number of elements in all sets.
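A sketch of the second algorithm, using the output buffer as the intermediate set. The names are hypothetical. Removal here is a linear scan, so the O(n) bound assumes a structure with constant-time deletion instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative view of one duplicate-free set (not a Redis type). */
typedef struct {
    const int64_t *items;
    size_t len;
} demo_set;

/* Remove v from buf[0..*len) if present, keeping the remaining order. */
void demo_remove(int64_t *buf, size_t *len, int64_t v) {
    for (size_t i = 0; i < *len; i++) {
        if (buf[i] == v) {
            memmove(buf + i, buf + i + 1, (*len - i - 1) * sizeof *buf);
            (*len)--;
            return;
        }
    }
}

/* Computes sets[0] minus every set in sets[1..n). `out` needs capacity
 * for sets[0].len elements. Returns the number of elements left in out. */
size_t demo_diff_two(const demo_set *sets, size_t n, int64_t *out) {
    assert(n >= 1);
    size_t len = sets[0].len;
    if (len > 0)                                   /* step 1: copy first set */
        memcpy(out, sets[0].items, len * sizeof *out);
    for (size_t j = 1; j < n && len > 0; j++)      /* step 2: delete the rest */
        for (size_t i = 0; i < sets[j].len; i++)
            demo_remove(out, &len, sets[j].items[i]);
    return len;                                    /* step 3: what remains */
}
```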

Before computing the difference, Redis estimates the cost of each algorithm and then runs the cheaper one. Two more things are worth noting:

  • The first algorithm is favored to a certain extent, because it involves fewer operations: it only adds elements, while the second algorithm must add elements and then delete them.
  • If the first algorithm is chosen, all sets from the second one onward are sorted by number of elements in descending order before the algorithm runs. With the larger sets checked first, an element is more likely to be found early, which ends the search sooner.
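The selection step can be sketched as a comparison of two work estimates. This is loosely modeled on the idea described above; the function name is hypothetical, and the exact weighting (here, halving the estimate for the first algorithm) is an assumption made for illustration rather than a statement about the Redis source.

```c
#include <assert.h>
#include <stddef.h>

/* Given the cardinalities of the sets (sizes[0] is the first set),
 * return 1 or 2 for the algorithm with the lower estimated work. */
int demo_pick_diff_algo(const size_t *sizes, size_t n) {
    assert(sizes != NULL && n >= 1);
    unsigned long long one = 0, two = 0;
    for (size_t j = 0; j < n; j++) {
        one += sizes[0];  /* algorithm 1: up to |first set| lookups per set */
        two += sizes[j];  /* algorithm 2: touches every element once */
    }
    one /= 2;             /* assumed bias in favor of algorithm 1 */
    return one <= two ? 1 : 2;
}
```

With a small first set and a large second set the first algorithm wins, since it never has to walk the large set; with a large first set and many tiny sets the second algorithm is cheaper.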

5 Summary

  1. The integer set is one of the underlying implementations of set keys.
  2. The integer set stores its elements in sorted order with no duplicates. When necessary, the type of the underlying array is changed to match the type of a newly added element.
  3. The upgrade operation makes the integer set more flexible and saves as much memory as possible.
  4. Sets support the intersection, union, and difference operations.