As technology develops, our habits change with it. Many of today’s e-commerce sites use their own proprietary recommendation algorithms to show customers the products they are most likely to like. Familiar examples include Netflix’s movie recommendations, Spotify’s music recommendations, Facebook’s friend suggestions, and Amazon’s product recommendations. Part of what makes these companies so successful is that their business models are built around recommendation systems.
What Is a Recommendation System?
A recommender system, or recommendation system, is a subclass of information filtering system that seeks to predict the “rating” or “preference” a user would give to an item. In practice, these systems are typically built by optimizing for objectives such as total clicks, total revenue, and overall sales.
Broadly speaking, most recommender systems leverage two types of data:
- Interaction data, such as ratings and browsing behavior, and
- Attribute information about each user and item
The modeling approach that relies on the former is generally known as the Collaborative Filtering method, and the approach that uses the latter is referred to as the Content-Based Filtering method. There is also a third category, Knowledge-Based recommender systems, which rely on explicitly specified user requirements. Each of these methods has strengths and weaknesses that depend on the application and on the amount of data available. Hybrid systems combine the advantages of these approaches to produce a system that performs robustly across a wide variety of applications. Most recommender systems today use a hybrid approach, combining collaborative filtering, content-based filtering, and other approaches.
There are three main types of techniques for recommendation engines:
- Collaborative filtering
- Content-Based Filtering
- Hybrid Recommendation Systems
Collaborative Filtering Methods
These types of models use the collaborative power of the ratings provided by multiple users to make recommendations and rely mostly on leveraging either inter-item correlations or inter-user interactions for the prediction process. Intuitively, it relies on an underlying notion that two users who rate items similarly are likely to have comparable preferences for other items.
There are two types of methods that are commonly used in collaborative filtering:
Memory-based methods, also referred to as neighborhood-based collaborative filtering algorithms, predict the rating of a user-item combination from its neighborhood. These neighborhoods can be defined in two ways: user-based and item-based.
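For the user-based case, one widely used way to turn a neighborhood into a prediction is a mean-centered, similarity-weighted average of the neighbors’ ratings. This is a standard textbook formulation shown for illustration, not necessarily the exact one used in the accompanying Kaggle notebook:

$$
\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v \in N_k(u,i)} \operatorname{sim}(u,v)\,\left(r_{vi} - \bar{r}_v\right)}{\sum_{v \in N_k(u,i)} \left|\operatorname{sim}(u,v)\right|}
$$

Here $r_{vi}$ is user $v$’s rating of item $i$, $\bar{r}_u$ is user $u$’s mean rating, $\operatorname{sim}(u,v)$ is a similarity measure such as the Pearson correlation, and $N_k(u,i)$ is the set of the $k$ users most similar to $u$ who have rated item $i$. Subtracting each neighbor’s mean corrects for users who rate systematically high or low.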
Item-Based Collaborative Filtering uses the ratings given by users to compute the similarity between items, then recommends items that are similar to the ones a user has already rated highly.
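A minimal sketch of item-based filtering, assuming a small made-up ratings dictionary and plain cosine similarity between item rating vectors (both illustrative choices, not taken from the notebook). Each unseen item is scored by a similarity-weighted average of the ratings the user gave to the items they have already seen:

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: rating}.
RATINGS = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 1, "D": 5, "E": 1},
    "u3": {"A": 1, "B": 2, "C": 5, "D": 1, "E": 5},
}


def item_vectors(ratings):
    """Transpose user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, rating in row.items():
            items.setdefault(item, {})[user] = rating
    return items


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse rating vectors; missing entries count as 0."""
    dot = sum(vec_a[k] * vec_b[k] for k in vec_a.keys() & vec_b.keys())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_item_based(user, ratings, top_n=2):
    """Rank unseen items by a similarity-weighted average of the user's own ratings."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        sims = [(cosine(vec, items[j]), r) for j, r in seen.items()]
        total = sum(abs(s) for s, _ in sims)
        if total:
            scores[candidate] = sum(s * r for s, r in sims) / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend_item_based("u1", RATINGS))
```

Here u1 liked A and B, and D is rated the way A and B are (high by u2, low by u3), so D should rank above E. Plain cosine on raw ratings keeps the sketch short; adjusted cosine, which subtracts each user’s mean rating first, is a common refinement.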
User-Based Collaborative Filtering predicts which items a user might like from the ratings given to those items by other users whose tastes are similar to the target user’s. Many websites use collaborative filtering to build their recommendation systems.
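A minimal sketch of user-based prediction, assuming a small made-up ratings dictionary, Pearson correlation over co-rated items as the similarity measure, and k = 2 neighbors (all illustrative choices, not taken from the notebook). The prediction is the target user’s mean rating plus a similarity-weighted average of the neighbors’ mean-centered ratings:

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: rating}.
RATINGS = {
    "alice": {"Heat": 5, "Up": 1, "Alien": 4},
    "bob": {"Heat": 4, "Up": 2, "Alien": 3, "Fargo": 5},
    "carol": {"Heat": 1, "Up": 5, "Alien": 2, "Fargo": 1},
    "dave": {"Heat": 5, "Up": 1, "Fargo": 4},
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma, mb = mean(a[i] for i in common), mean(b[i] for i in common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = sqrt(sum((a[i] - ma) ** 2 for i in common)) * sqrt(
        sum((b[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict_user_based(user, item, ratings, k=2):
    """Predict `user`'s rating of `item` from the k most similar users who rated it."""
    target = ratings[user]
    neighbors = sorted(
        ((pearson(target, other), other)
         for name, other in ratings.items() if name != user and item in other),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    base = mean(target.values())
    weight = sum(abs(sim) for sim, _ in neighbors)
    if weight == 0:
        return base  # no informative neighbors: fall back to the user's mean
    offset = sum(sim * (other[item] - mean(other.values())) for sim, other in neighbors)
    return base + offset / weight


print(round(predict_user_based("alice", "Fargo", RATINGS), 2))
```

Alice and Dave agree on the movies they share, and Bob ranks them in the same order Alice does; both liked Fargo, so the prediction lands well above Alice’s average. Carol, whose tastes run opposite to Alice’s, is excluded by k = 2. Predictions are not clipped to the rating scale.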
In model-based methods, machine learning techniques are used to learn model parameters within the context of a given optimization framework.
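Matrix factorization is one widely used model-based technique; it is named here as an example, since the text does not say which model the notebook uses. The sketch below, assuming a made-up list of (user, item, rating) triples and hand-picked hyperparameters, learns a short factor vector per user and per item with stochastic gradient descent so that their dot product approximates the observed ratings:

```python
import random

# Hypothetical toy data as (user, item, rating) triples: u1/u2 favor items A and B,
# u3/u4 favor items C and D.
TOY_RATINGS = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 1),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "D", 1),
    ("u3", "A", 1), ("u3", "C", 5), ("u3", "D", 4),
    ("u4", "B", 1), ("u4", "C", 4), ("u4", "D", 5),
]


def train_mf(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=3000, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) ~ r_ui.

    Plain stochastic gradient descent on squared error with L2 regularization;
    bias terms are left out for brevity.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(n_factors):
                old_pu = pu[f]  # use the pre-update value for the item step
                pu[f] += lr * (err * qi[f] - reg * pu[f])
                qi[f] += lr * (err * old_pu - reg * qi[f])
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(a * b for a, b in zip(P[user], Q[item]))


P, Q = train_mf(TOY_RATINGS)
# u1 never rated D; u1 resembles u2, who rated D low, so expect a low score.
print(round(predict(P, Q, "u1", "A"), 2), round(predict(P, Q, "u1", "D"), 2))
```

Unlike the neighborhood methods, the ratings are only needed during training; afterwards every prediction is a single dot product, which is what makes this family practical on data the size of MovieLens.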
Content-Based Filtering Methods
In these types of systems, the descriptive attributes of items/users are used to make recommendations. The term “content” refers to these descriptions. In content-based methods, the ratings and interaction behavior of users are combined with the content information available in the items.
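A minimal sketch of content-based filtering, assuming hypothetical genre-like tags as the item attributes and a single user’s made-up ratings. The user’s ratings are combined with the item content by building a tag profile (each tag weighted by the user’s mean-centered ratings of items that carry it), and unseen items are ranked by cosine similarity to that profile:

```python
from math import sqrt

# Hypothetical item attributes: movie -> set of tags.
ITEM_TAGS = {
    "m1": {"Action", "Crime"},
    "m2": {"Comedy", "Romance"},
    "m3": {"Action", "Thriller"},
    "m4": {"Comedy", "Animation"},
    "m5": {"Action", "Crime", "Thriller"},
}


def user_profile(user_ratings, item_tags):
    """Weight each tag by the user's mean-centered ratings of items carrying it."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    profile = {}
    for item, rating in user_ratings.items():
        for tag in item_tags[item]:
            profile[tag] = profile.get(tag, 0.0) + (rating - mean)
    return profile


def cosine(profile, tags):
    """Cosine similarity between a weighted tag profile and a binary tag set."""
    dot = sum(profile.get(t, 0.0) for t in tags)
    norm_p = sqrt(sum(w * w for w in profile.values()))
    norm_t = sqrt(len(tags))
    return dot / (norm_p * norm_t) if norm_p and norm_t else 0.0


def recommend_content_based(user_ratings, item_tags, top_n=2):
    """Rank the items the user has not rated by similarity to the user's profile."""
    profile = user_profile(user_ratings, item_tags)
    candidates = [i for i in item_tags if i not in user_ratings]
    scored = [(i, cosine(profile, item_tags[i])) for i in candidates]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]


# A hypothetical user who liked the action/crime movie and disliked the comedy.
print(recommend_content_based({"m1": 5, "m2": 1}, ITEM_TAGS))
```

Because the scores depend only on tags, this approach can rank a new movie that nobody has rated yet, something pure collaborative filtering cannot do.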
Hybrid Methods
In many cases, a wider variety of inputs is available; in such cases, many opportunities exist for hybridization, where the various aspects from different types of systems are combined to achieve the best of all worlds. The approach is comparable to the conventional ensemble analysis approach, where the power of multiple types of machine learning algorithms is combined to create a more robust model.
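A weighted blend is one simple hybridization strategy, in the same spirit as averaging models in an ensemble. The sketch below assumes each component recommender returns an {item: score} dictionary; the min-max rescaling, the blending weight `alpha`, and the example scores are illustrative assumptions rather than the notebook’s method:

```python
def min_max(scores):
    """Rescale an {item: score} dict to [0, 1] so different recommenders are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 1.0 for item in scores}  # no spread: treat all items as equally strong
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def weighted_hybrid(cf_scores, content_scores, alpha=0.7, top_n=3):
    """Blend two recommenders as alpha * CF + (1 - alpha) * content-based.

    An item missing from one recommender contributes 0 from that side.
    """
    cf, cb = min_max(cf_scores), min_max(content_scores)
    blended = {
        item: alpha * cf.get(item, 0.0) + (1 - alpha) * cb.get(item, 0.0)
        for item in cf.keys() | cb.keys()
    }
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical outputs: predicted ratings from a CF model, similarities from a content model.
CF_SCORES = {"m1": 4.5, "m2": 3.0, "m3": 2.0}
CONTENT_SCORES = {"m2": 0.9, "m3": 0.6, "m4": 0.1}

print(weighted_hybrid(CF_SCORES, CONTENT_SCORES, alpha=0.7))
print(weighted_hybrid(CF_SCORES, CONTENT_SCORES, alpha=0.3))
```

Shifting `alpha` moves the ranking between the two signals: with a high weight on collaborative filtering, m1 leads; with a high weight on content, m2 does. In practice the weight would be tuned on held-out data.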
Dataset and Story
The dataset was provided by MovieLens, a movie recommendation service. It contains a list of movies along with the ratings users gave them.
It contains 20,000,263 ratings across 27,278 movies, created by 138,493 users between 09 January 1995 and 31 March 2015. The dataset was generated on October 17, 2016. Users were selected at random, and every selected user rated at least 20 movies.
Variables of the data set:
movie.csv
- movieId: Unique movie ID
- title: Movie name
rating.csv
- userId: Unique user ID
- movieId: Unique movie ID
- rating: The rating the user gave the movie (0.5 to 5.0 stars, in half-star increments)
- timestamp: Date and time of the rating
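The full code is in the Kaggle notebook; as a stand-in, here is a minimal standard-library sketch for reading the two files into dictionaries and computing summary counts that can be compared with the dataset description above. It assumes each file has a header row with the column names userId, movieId, rating (rating.csv) and movieId, title (movie.csv); the inline sample rows are made up. The timestamp column is not parsed:

```python
import csv
import io
from collections import defaultdict


def load_ratings(rating_file):
    """Read rating.csv into {userId: {movieId: rating}}."""
    ratings = defaultdict(dict)
    for row in csv.DictReader(rating_file):
        ratings[int(row["userId"])][int(row["movieId"])] = float(row["rating"])
    return dict(ratings)


def load_titles(movie_file):
    """Read movie.csv into {movieId: title}; csv handles quoted titles with commas."""
    return {int(row["movieId"]): row["title"] for row in csv.DictReader(movie_file)}


def summarize(ratings):
    """Counts to check against the description (number of users, ratings, and so on)."""
    return {
        "users": len(ratings),
        "movies_rated": len({m for row in ratings.values() for m in row}),
        "ratings": sum(len(row) for row in ratings.values()),
        "min_ratings_per_user": min(len(row) for row in ratings.values()),
    }


# Made-up rows in the assumed layout, standing in for the real files.
SAMPLE_RATING_CSV = "userId,movieId,rating\n1,10,4.0\n1,20,3.5\n2,10,5.0\n"
SAMPLE_MOVIE_CSV = 'movieId,title\n10,Movie A\n20,"Movie B, The"\n'

ratings = load_ratings(io.StringIO(SAMPLE_RATING_CSV))
titles = load_titles(io.StringIO(SAMPLE_MOVIE_CSV))
print(summarize(ratings))
```

On the real data, pass `open("rating.csv", newline="")` instead of the `StringIO` sample. Holding about 20 million ratings in nested Python dictionaries takes several gigabytes of memory, so a columnar tool such as pandas is the more practical choice at full scale.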
Visit my Kaggle account to see the code in more detail!