1、 Analysis background
With the continuous development of e-commerce industry in the Internet era, the e-commerce market is becoming more and more saturated, and the competition among e-commerce platforms has entered a white hot stage. Therefore, data analysis of user behavior has become the top priority of platform operation. Based on the Taobao user behavior data set, this paper studies the implicit feedback problem, obtains the user demand from the user behavior, analyzes the user emotion, and aims to drive product optimization with data.
2、 Data sources
The data used in this paper comes from Alibaba cloud TianchiTaobao user behavior data set
The data set contains all the behaviors of about one million random users with behaviors (including click, buy, add purchase and like) between November 25, 2017 and December 3, 2017. The organization of the dataset is similar to movielens-20m, that is, each row of the dataset represents a user behavior, which is composed of user ID, commodity ID, commodity category ID, behavior type and timestamp, separated by commas. A detailed description of each column in the dataset is as follows:
|User ID||Integer type, serialized user ID|
|Commodity ID||Integer type, serialized commodity ID|
|Commodity category ID||Integer type, the category ID of the serialized product|
|Behavior type||String, enumeration type, including (‘pv ‘,’buy’,’cart ‘,’fav’)|
|time stamp||The timestamp of the behavior occurrence|
Among them, the four types of behavior respectively represent:
|pv||Product details page PV, equivalent to click|
|cart||Add merchandise to cart|
|fav||Collection of goods|
The dataset contains nearly one million users (987994), involving more than 4.16 million products (4162024), nearly 10000 commodity categories (9439), and more than 100 million user behaviors (100150807).
Note: this part is quoted from the data set introduction information
3、 Analysis objectives
The purpose of this paper is to analyze the behavior data of Taobao users, find the consumer preferences of users, so as to better guide businesses to obtain customers and profits, and achieve fine operation. The analysis ideas are as follows:
4、 Data cleaning
1. Data import
The downloaded. CSV format file is imported into the database through PL / SQL. It can be seen that the source data file data lacks column name. In this paper, the field name is added for the data, and the user table is created in advance_ Behavior matched.
Note here that the timestamp time of the table is changed_ The stamp type is set to varchar, which refers to the total number of seconds from 00:00 on January 1, 1970 to the second of user behavior. After the data is imported, in order to facilitate the next analysis, SQL statements are used for processing, which are converted into time format date and precise time, minute and second, and stored in create_ DATE、CREATE_ In the two fields of time. The data in the query table is shown in the following figure:
2. Duplicate value processing
In the data import phase, duplicate data has been ignored, so there is no duplicate value.
3. Treatment of missing values
The SQL statement is used to query the data with empty field value in the table, and the result after execution is 0, so there is no missing value.
4. Outlier handling
In the middle of December, 2017, the data set should be in the data set from November 25, 2017_ If the date field value is outside the range, the result will be 257110 after execution, and these abnormal data will be deleted.
5. Data integration
After the above steps of data processing, the final data set for analysis is obtained. Using SQL statements for preliminary statistics, we can see that the latest data set has 26061808 pieces of data, including 4 types of behavior (click, add shopping cart, collect, purchase) of 257134 users in 9 days, including 2467928 products and 8713 commodity types.
5、 User analysis
1. Activity analysis
PV and UV indexes were obtained from two directions of date and time for activity analysis. The PV and UV values are filtered out by SQL statements, and the data are presented in chart form by echarts.
Considering the periodicity of the date, the data is divided into one cycle every 7 days for comparative analysis.
In these 9 days, the PV and UV values on December 2 and 3 increased significantly compared with the previous 7 days. Considering that these two days are weekends, users have more leisure time than working days, and the number of users will increase to a certain extent. Further analysis found that the same weekend on November 25 and 26, although compared with the same cycle of working days of PV, UV slightly increased, but still far less than December 2 and 3. Therefore, it is believed that the rise of December 2-3 may be related to the marketing activities at that time, such as the pre-sale activities and pre-sale of stores.
Considering the periodicity of time, the data were screened according to the 24-hour system for comparative analysis. First, the SQL statement is used to format the time to obtain the precise hour value of the user behavior time, and then PV and UV values are calculated according to the hour.
The average value of PV is 1085908.67, and the average value of UV is 116351.88. In order to facilitate further analysis, the month on month growth rate of each time period compared with the previous hour is calculated according to the data, as shown in the figure below.
① After observing the column chart of PV and UV values, it is found that PV and UV after 9:00 are above their respective average values. It can be seen that most users end their sleep time before 9:00 a.m. and start using app after that. Therefore, it is suggested that morning activities such as check-in time should be planned to end after 9 o’clock, so as to obtain more users.
② Observe the column chart of PV and UV values. From 10:00 to 18:00, the UV value is generally in a high stable stage, indicating that most users have app use behavior during this period. However, according to the relatively low level of PV value, during this period, users are affected by work and study, and the per capita commodity browsing volume is not high. By observing the fluctuation range of PV and UV link ratio in this period, it can be concluded that during the working and learning period, the number of APP users remains relatively flat, but the per capita commodity browsing volume reflected by PV value will fluctuate greatly with the beginning and end of lunch break.
③ Observe the column chart of PV and UV values, from 19:00 to 22:00 is the peak of users’ use, PV and UV values are at a high level within 24 hours, it can be seen that the vast majority of users’ leisure and entertainment time is between 19:00 and 22:00. Therefore, it is suggested that the main activities in the app should be planned in this time period as much as possible, so as to obtain more participating users.
④ After observing the column chart of PV and UV values, the PV and UV values at 23:00 decreased significantly, indicating that more users choose to enter the resting state before 23:00. Observing the growth rate of PV and UV in this period, it is found that the negative growth state of PV and UV first appears at 22 o’clock, and decreases more obviously from 23 o’clock, and reaches the peak value at 1:00 a.m., indicating that most users choose to enter the rest state before 1:00 a.m. This negative growth continued until 4 a.m., when the PV, UV values also decreased to the lowest point.
2. User behavior analysis
Use SQL statement to create user behavior view. According to user ID, count the number of four user behaviors (click, collect, add shopping cart, purchase) within 9 days provided by the dataset. The created view structure is shown in the following figure:
The total number of users’ four behaviors was counted and funnel chart was made. It was found that the conversion rate from Click to purchase was only 2.24%, and the loss was serious.
For this purpose, the data flow from Click to purchase is analyzed.
According to the parameters provided by the data set, the following four user purchase paths are obtained:
① pv -> buy
② pv -> fav -> buy
③ pv -> cart -> buy
④ pv -> fav,cart -> buy
Using SQL statements, the corresponding specific data involved in different behavior paths are calculated
Create a user shopping path view, the structure of the created view is shown in the following figure, and search according to the conditions in the created view.
According to the obtained data, draw the corresponding funnel diagram
It is found that the conversion rate of direct purchase after click is only 1.329%, which is significantly lower than that of the other three purchase paths. The conversion rate of purchase after collection is 7.513%, and that of purchase after addition is 9.692%, and that of both purchase and collection is 15.881%. It shows that compared with the behavior of collecting, more users like to use the shopping cart as the final purchase pool. By putting the goods into the shopping cart, it can avoid re searching when they want to buy, so as to improve the purchase intention of users. Therefore, it is suggested that more activities should be carried out, such as full reduction across stores, collection and purchase of gifts, and advance purchase and enjoy discount, so as to guide users’ collection and purchase behavior, so as to improve the purchase conversion rate.
By observing the non jump out conversion rate of click, we found that more people chose to add purchase behavior, which reached the highest of 3.724%, followed by collection, which reached 1.674%. On the whole, the jump out rate is too high, which reflects that although Taobao users have high PV value and considerable per capita commodity browsing volume, the conversion rate is low, and there is a lot of room for improvement. We can improve the purchase intention of users by improving the product details page, doing a good job in after-sales and increasing praise.
3. User value analysis
According to the parameters provided by the data set, the RFM model of users is analyzed, and the value of users is analyzed.
Recency: This paper defines the difference between the last purchase time of the user and December 3.
Consumption frequency within a certain period of time (frequency): This paper defines the number of times a user purchases in 9 days.
Monetary: this parameter is ignored because the data set does not provide the data.
Create a user mechanism analysis view, and create a good view structure as shown in the figure below, and get the calculated R and F values.
According to the constructed table, max (R) is 8, min (R) is 0, max (f) is 159, min (f) is 1. First, draw a broken line chart of the number of people with different shopping frequency (f) and different recent consumption time, as shown in the figure below. (in real business, if the order frequency of users is too high, anomaly detection should be carried out, which is ignored here)
It can be seen from the chart that within 9 days, the number of users with shopping frequency f of 1 reached 59724, and then showed a rapid downward trend, and tended to be flat from F = 10. It can be seen that the F value of most users is within 10, and the scoring standard should be divided in detail within this range. The most recent shopping time of users, affected by the activities speculated above, is more distributed from December 1 to December 3. Therefore, the detailed design scoring rules are as follows:
|fraction||Last consumption interval R||Consumption frequency f|
|5||Within 1 day||More than 20 times|
According to the above scoring criteria, the user is scored and the user score view is created. The created view structure is shown in the following figure. Furthermore, the AVG (R_ score)、avg(F_ The average score of R and F was 3.515 and 1.391 respectively.
By comparing the scores of F and R dimensions with the calculated mean value, four types of value users are divided: 1) important value users whose F and R exceed the average value; 2) important development users whose r value exceeds the average value; 3) important retaining users whose F value exceeds the average value; 4) important retention users whose F and R values do not exceed the average value. The results are as follows:
According to the results of user classification and screening, the pie chart is drawn as follows: there are 37173 important value users, accounting for 21.24%; 63596 important development users, accounting for 36.33%; 10823 important maintaining users, accounting for 6.18%; and 63451 important retaining users, accounting for 36.25%. Users mainly focus on two categories: important development and important retention. Different marketing measures can be taken for different value types of users.
1. User activity analysis:
① The activity of users on weekends is higher than that on weekdays. From Friday, PV and UV values gradually rise and drop sharply after Monday.
② Most users end their sleep time before 9 a.m. and start using the app after that. Therefore, it is suggested that morning activities such as check-in time should be planned to end after 9 o’clock, so as to obtain more users.
③ The peak of user activity is from 19:00 to 22:00. Therefore, it is suggested that the main activities in the app should be planned in this time period as much as possible, so as to obtain more participating users.
④ Most users choose to enter the rest state before 23:00, and the negative growth state of PV and UV first appears at 22:00, and the decline is more obvious from 23:00, and reaches the peak of negative growth rate of month on month ratio at 1:00 in the morning, which indicates that most users choose to enter the rest state before 1:00 in the morning. Therefore, it is suggested that night activities such as zero second should be finished before 1 o’clock as far as possible, so as to maintain the enthusiasm of users.
2. User behavior analysis:
① The conversion rate from Click to purchase is low, only 2.24%, and the loss is serious. It is suggested to optimize the product details page. While the layout is clear and beautiful, the loading speed should be considered to avoid uploading large pictures and videos, so as to improve the browsing experience of users and reduce the bounce rate of detail pages. Improve the quality of products, do a good job in after-sales service, get more users praise or actively invite users to comment feedback, reduce the bounce rate of comments page.
② The conversion rate of users in the three processes of click → add shopping cart → purchase, click → collection → purchase, click → collect and purchase → purchase is significantly higher than that of click to purchase. Therefore, it is suggested that more activities should be carried out, such as full reduction across stores, collection and purchase of gifts, and advance purchase and enjoy discount, so as to guide users’ collection and purchase behavior, so as to improve the purchase conversion rate.
③ Optimize the recommendation algorithm and search for keywords to match products, give priority to recommend users’ collections and additional purchases, and improve users’ purchase desire.
3. User value analysis:
① There are 37173 important value users, accounting for 21.24%; 63596 important development users, accounting for 36.33%; 10823 important maintenance users, accounting for 6.18%; and 63451 important retaining users, accounting for 36.25%. Users mainly focus on two categories: important development and important retention. Different marketing measures should be taken for different value types of users.
② Important value users: high quality users belong to the user group. They can carry out targeted community operation, pay special benefits, provide better service, etc., and continue to improve customer satisfaction. Such as the recent 88vip membership mechanism.
③ Important to keep users: the number of shopping has reached above the average level, but there is no purchase behavior in recent years. Therefore, we can launch user recall mechanism, send app activity reminders and message consolation, send update reminders of concerned merchants, and send price reduction reminders of concerned commodities.
④ Important development users: they have purchase behavior recently, but the overall number of purchases is lower than the average. They can recommend commodities and issue coupons according to their shopping preferences and basic portraits, so as to improve their purchase intention.
⑤ Important retention of users: the last purchase time is relatively long, the purchase rate is low, and potential valuable customers should be given care and maintain good user relationship.