LION COMMUNITY USAGE CASE

Collaborative recommendation: movies and viewers.

 

Collaborative filtering and recommendation.

Word of mouth has always been a powerful and effective way to spread information and opinions from person to person, in a viral manner. It is a distributed, human way to mine the data implicitly contained in many human minds. A similar process can be simulated through data mining and modeling methods. Starting from the raw data (potentially huge quantities, ranging from thousands to billions of items), one extracts the pieces of information that are relevant for the specific final user, based on models of the user's explicit or implicit preferences and on similarities with other people.

An interesting application is in the marketing sector: data collected about users and products, either bought or at least evaluated, can be used to estimate how a customer would evaluate a product they have not seen before. The final purpose of predicting evaluations is to encourage the user to buy, for example by recommending a list of the items with the highest predicted evaluations. Advertising is more effective if the presented products are filtered based on the user's preferences.

Collaborative filtering and recommendation is a method of predicting the interests of a person by collecting taste information from many other collaborating people. The underlying assumption is that those who agreed in the past tend to agree again in the future. For example, a collaborative recommendation system for movies could make personal predictions about which movie a user should like, given some knowledge of the user's tastes and the information gathered from many other users.
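The assumption that "those who agreed in the past tend to agree again" can be illustrated with a minimal sketch. It is not the method used inside LIONoso: it is a plain similarity-weighted average, and the viewer names, movie titles and votes are hypothetical toy data.

```python
from math import sqrt

# Hypothetical toy data: viewer -> {movie: vote on a 1-5 scale}
ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 4, "Brazil": 5, "Casablanca": 2, "Dune": 5},
    "cat": {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Dune": 1},
}

def similarity(a, b):
    """Cosine similarity of two viewers over the movies both have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in common)
    norm_a = sqrt(sum(ratings[a][m] ** 2 for m in common))
    norm_b = sqrt(sum(ratings[b][m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict(viewer, movie):
    """Similarity-weighted average of the votes other viewers gave to `movie`."""
    pairs = [(similarity(viewer, other), votes[movie])
             for other, votes in ratings.items()
             if other != viewer and movie in votes]
    total = sum(weight for weight, _ in pairs)
    if total == 0:
        return None  # nobody comparable has rated this movie
    return sum(weight * vote for weight, vote in pairs) / total

# ann has not rated "Dune"; bob (who agrees with ann) loved it, cat did not
print(round(predict("ann", "Dune"), 2))
```

Because ann's past votes resemble bob's more than cat's, bob's vote weighs more in the prediction for ann.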

Getting data from a database (MySQL)

In this demo we start from data saved in a MySQL database. Of course, any database can be easily linked to LIONoso. The figures below show the organization of the records containing movies, users, and the votes expressed by users on movies.

The data can be loaded into LION by using the Data Sources - MySQL table tool and inserting the data as shown below.

Building the models for movies and viewers.

The flow in the LION workbench to extract the model is shown below. The first Python script (matrify.py) transforms the original table into the appropriate matrix form. Then the external program recommend.exe produces the desired model in the Vectors table. The Values table contains the predictions for the missing cells. Additional tables are for checks (Check contains the difference, on the existing votes, between the original votes and the predicted votes) and for additional parameters of the model (please see the book in the references for details).
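The two steps of this flow can be sketched in plain Python. This is only an illustration under stated assumptions: the actual contents of matrify.py and the algorithm inside recommend.exe are not described here, so the `matrify` function below is a hypothetical stand-in, and the model is assumed to be a low-rank factorization fitted by stochastic gradient descent. The votes are toy data.

```python
import random

# Hypothetical rows, as they might come out of the votes table: (viewer, movie, vote)
votes = [
    ("ann", "Alien", 5), ("ann", "Brazil", 4), ("ann", "Casablanca", 1),
    ("bob", "Alien", 4), ("bob", "Brazil", 5), ("bob", "Dune", 5),
    ("cat", "Casablanca", 5), ("cat", "Dune", 1), ("cat", "Alien", 1),
]

def matrify(rows):
    """Pivot (viewer, movie, vote) rows into a viewers x movies matrix; None marks a missing vote."""
    viewers = sorted({row[0] for row in rows})
    movies = sorted({row[1] for row in rows})
    matrix = [[None] * len(movies) for _ in viewers]
    for viewer, movie, vote in rows:
        matrix[viewers.index(viewer)][movies.index(movie)] = vote
    return viewers, movies, matrix

def factorize(matrix, dim=2, steps=3000, rate=0.01, reg=0.02, seed=0):
    """Fit one dim-dimensional vector per viewer and per movie (assumed model, not recommend.exe)."""
    rng = random.Random(seed)
    p = [[rng.uniform(0.1, 1.0) for _ in range(dim)] for _ in matrix]
    q = [[rng.uniform(0.1, 1.0) for _ in range(dim)] for _ in matrix[0]]
    known = [(i, j, v) for i, row in enumerate(matrix)
             for j, v in enumerate(row) if v is not None]
    for _ in range(steps):
        for i, j, v in known:
            err = v - sum(a * b for a, b in zip(p[i], q[j]))
            for k in range(dim):
                pik, qjk = p[i][k], q[j][k]
                p[i][k] += rate * (err * qjk - reg * pik)
                q[j][k] += rate * (err * pik - reg * qjk)
    return p, q

viewers, movies, matrix = matrify(votes)
viewer_vecs, movie_vecs = factorize(matrix)

# Analogue of the Check table: original minus predicted vote, on the existing votes only
check = {(viewers[i], movies[j]): v - sum(a * b for a, b in zip(viewer_vecs[i], movie_vecs[j]))
         for i, row in enumerate(matrix) for j, v in enumerate(row) if v is not None}
```

Small values in `check` indicate that the fitted vectors reproduce the known votes well; they say nothing yet about the quality of predictions on unseen votes.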

Using the LIONoso 7D plot and similarity map, movies and viewers can be analyzed in the same space. In the plots below, each movie is represented by a red ball and each viewer by a green one. Movies that a user will like tend to be close to the user. Similar movies, and viewers with similar tastes, tend to be mapped to nearby positions.
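The idea that liked movies lie close to the viewer can be made concrete with a small sketch. The 2-D coordinates below are hypothetical, chosen by hand to stand in for positions on a similarity map; they are not output of LIONoso.

```python
from math import dist

# Hypothetical 2-D positions of movies and viewers in the same space
movies = {"Alien": (0.9, 0.1), "Brazil": (0.8, 0.3), "Casablanca": (0.1, 0.9)}
viewers = {"ann": (0.9, 0.15), "cat": (0.2, 0.8)}

def nearest_movies(viewer, k=2):
    """Return the k movies whose points lie closest (Euclidean distance) to the viewer's point."""
    here = viewers[viewer]
    return sorted(movies, key=lambda m: dist(here, movies[m]))[:k]

print(nearest_movies("ann"))     # ['Alien', 'Brazil']
print(nearest_movies("cat", 1))  # ['Casablanca']
```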

This kind of analysis is a powerful way to group entities and discover interesting similarity relationships. Entities can be users, customers, business partners, stocks, social network friends, etc.

Predicting the unknown votes.

After the model for each viewer and each movie is derived, the missing votes (the votes for movies that the viewer has not yet watched) can be predicted by the system. They are saved in the "Values" table. The predictions can be used for customized suggestions or for marketing strategies, for example when trying to sell a new product to an existing customer.
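As a minimal sketch of this step, assume that each viewer and each movie is described by a short vector and that a predicted vote is their dot product. Both the dot-product rule and the numbers below are assumptions made for illustration; the source does not specify the form of the model.

```python
# Hypothetical model vectors (one per viewer, one per movie)
viewer_vecs = {"ann": (2.0, 0.5), "cat": (0.5, 2.0)}
movie_vecs = {
    "Alien": (2.2, 0.4), "Brazil": (1.8, 0.6),
    "Casablanca": (0.2, 2.3), "Dune": (2.0, 0.2),
}
# Movies each viewer has already voted on
seen = {"ann": {"Alien", "Brazil", "Casablanca"}, "cat": {"Alien", "Casablanca", "Dune"}}

def predict(viewer, movie):
    """Predicted vote, assumed here to be the dot product of the two vectors."""
    return sum(a * b for a, b in zip(viewer_vecs[viewer], movie_vecs[movie]))

def recommend(viewer, top_n=1):
    """Unseen movies for this viewer, highest predicted vote first."""
    unseen = [m for m in movie_vecs if m not in seen[viewer]]
    return sorted(unseen, key=lambda m: predict(viewer, m), reverse=True)[:top_n]

# Analogue of the "Values" table: predictions for the missing cells only
values = {(v, m): predict(v, m) for v in viewer_vecs for m in movie_vecs if m not in seen[v]}
print(recommend("ann"))  # ['Dune']
```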

Note that no deep knowledge of the techniques is required to use these models effectively. Starting from a large quantity of data and checking the predictions on a separate test set is sufficient to develop effective recommendations, together with an estimate of their precision.
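Testing on a held-out set can be sketched as follows. The predictor used here is a deliberately simple baseline (the mean vote of each movie, falling back to the global mean), chosen only to keep the example self-contained; the root-mean-square error (RMSE) is one common way to express precision, not necessarily the one reported by LIONoso. All votes are hypothetical.

```python
from math import sqrt

# Hypothetical split of the known votes into a training and a test set
train = [("ann", "Alien", 5), ("bob", "Alien", 4), ("cat", "Alien", 1),
         ("ann", "Brazil", 4), ("bob", "Brazil", 5)]
test = [("cat", "Brazil", 2), ("cat", "Dune", 3)]

global_mean = sum(vote for _, _, vote in train) / len(train)

def movie_mean(viewer, movie):
    """Baseline predictor: mean training vote of the movie, or the global mean if unseen."""
    votes = [vote for _, m, vote in train if m == movie]
    return sum(votes) / len(votes) if votes else global_mean

def rmse(predict, rows):
    """Root-mean-square error of predict(viewer, movie) against the given votes."""
    squared = [(predict(viewer, movie) - vote) ** 2 for viewer, movie, vote in rows]
    return sqrt(sum(squared) / len(squared))

print(round(rmse(movie_mean, test), 2))
```

Any other predictor with the same `predict(viewer, movie)` signature can be passed to `rmse`, so different models can be compared on the same test set.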

Please contact us if you are interested in this usage case and we will be happy to send you more details. The web mining and network analytics tools of LIONoso can be used for any situation with interacting entities, like products, customers, employees, biological systems, etc.

References:
The LION way
Roberto Battiti and Mauro Brunato. LIONlab, University of Trento, Feb 2014.
http://www.movielens.org/, Free and personalized movie recommendations. Department of Computer Science and Engineering at the University of Minnesota.