Thursday, February 07, 2013

That Netflix RMSE is way too low, or is it? (Clustering-Based Matrix Factorization - implementation -)

[Update: this paper has been removed from arXiv. For more info, check This Week's Guardians of Science: Zeno Gantner and Peyman Milanfar]



We've seen this type of occurrence on Nuit Blanche before. This one is either a bombshell or a dud. Early on in a discussion in the Advanced Matrix Factorization group, Nima Mirbakhsh shared his thoughts and an interesting, potentially mind-blowing implementation. Here is what he said:


Help with evaluating my proposed extension of matrix factorization.

Hello everyone,
I have a new extension of matrix factorization named "Clustering-Based Matrix Factorization". I applied it to many datasets, including "Netflix", "Movielens", "Epinions", and "Flixter", and it achieves very good results. For the last three datasets the RMSE results are good and realistic, but on the Netflix dataset it achieves a very interesting result. As we all know, the RMSE of the Netflix Prize winner was 0.8567; my method now achieves an RMSE of 0.8122.
I know that the Netflix Prize winner's method fuses the results of many different algorithms, and it is hard to believe that a single algorithm can reach such a good result. It has been my concern for the last couple of months too. So I checked my source code and my setup several times but could not find any bug. I also submitted the paper to ICML, but apart from one weak accept, the reviewers all said that my method makes sense and yet rejected the work just because of the extraordinary result!
That is why I decided to put the paper and my source code online so that everyone can evaluate them. I would now like to ask you to kindly join me in evaluating the paper and the source code more carefully. If my method works, it will be a new experience for recommendation systems and may show us that there are still opportunities to improve RMSE results.
Here is the paper's link, followed by the source code's link:
source code: http://goo.gl/Az0lS 
Thanks everyone in advance.
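Since the whole debate turns on a single number, here is a minimal sketch of how RMSE is computed (the function name `rmse` is our own). Lower is better, and two RMSE values can only be compared when they are computed on the same set of held-out ratings.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between predicted and actual ratings."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("expected two non-empty sequences of equal length")
    total = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(total / len(predictions))


# Toy usage: three predictions against three held-out ratings.
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```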

We recently saw some improvement in the Netflix RMSE (Linear Bandits in High Dimension and Recommendation Systems), but this time the code is shared for everybody to kick the tires. As a reminder, we featured that paper earlier:


Recommender systems are emerging technologies that can nowadays be found in many applications such as Amazon, Netflix, and so on. These systems help users find relevant information, recommendations, and their preferred items. Matrix Factorization is a popular method in recommendation systems, showing promising results in accuracy and complexity. In this paper we propose an extension of matrix factorization that uses the clustering paradigm to cluster similar users and items in several communities. We then establish their effects on the prediction model. To the best of our knowledge, our proposed model outperforms all other published recommender methods in accuracy and complexity. For instance, our proposed method's accuracy is 0.8122 on the Netflix dataset which is better than the Netflix prize winner's accuracy of 0.8567.
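For readers who want to kick the tires, it helps to have the baseline that the abstract extends in front of them. Below is a minimal sketch of plain biased matrix factorization trained with stochastic gradient descent, in the style popularized during the Netflix Prize. It is not Nima's clustering-based method (that is in the paper and the shared code); the function name and the hyperparameters `k`, `lr`, `reg` and `epochs` are our own illustrative choices.

```python
import random


def train_biased_mf(ratings, n_users, n_items, k=10, lr=0.01, reg=0.05,
                    epochs=20, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i with plain SGD.

    ratings: list of (user, item, rating) triples with 0-based integer ids.
    Returns a predict(user, item) function.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users                               # user biases
    bi = [0.0] * n_items                               # item biases
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            # Regularized gradient steps on biases and latent factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return predict
```

Whatever the clustering step adds, the reported RMSE should be compared against a baseline like this one, evaluated on exactly the same held-out ratings.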

11 comments:

Zeno said...

According to the paper, they do not use the same evaluation protocol as the one used in the Netflix prize competition.

So the results are not comparable.
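To make Zeno's point concrete: the same model can report different RMSEs depending on how the test set is built. The Netflix Prize held out users' more recent ratings rather than a random sample. The sketch below contrasts a random split with a per-user "most recent ratings" split. It assumes (user, item, rating, timestamp) tuples, the function names are hypothetical, and it is not a reproduction of the official Netflix Prize split.

```python
import random
from collections import defaultdict


def random_split(ratings, test_frac=0.2, seed=0):
    """Hold out a random fraction of all (user, item, rating, timestamp) tuples."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


def latest_per_user_split(ratings, n_holdout=1):
    """Hold out each user's n_holdout most recent ratings (by timestamp)."""
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)
    train, test = [], []
    for rows in by_user.values():
        rows.sort(key=lambda row: row[3])  # oldest first
        # Keep at least one rating per user in the training set.
        k = min(n_holdout, len(rows) - 1)
        train.extend(rows[:len(rows) - k])
        test.extend(rows[len(rows) - k:])
    return train, test
```

An RMSE obtained with one of these protocols says little about how the same model would score under the other.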

Igor said...

Zeno,

you may want to give your input directly to Nima in the LinkedIn thread:

http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=211708878&gid=4084620&commentID=118330837&trk=view_disc&ut=2YFxQFkHY3XBA1

winsty said...

They just kept all the items with at least 4 ratings... The results are totally not comparable. To the best of my knowledge, on the Movielens 100K dataset, the best single model could only reach an RMSE of about 0.88. In their paper, the basic MF model could even reach 0.81...
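A filter of the kind winsty describes removes rarely rated items, which are typically among the hardest to predict, so an RMSE measured after filtering is not directly comparable to one measured on the full data. A minimal sketch, assuming (user, item, rating) triples; the function name and the `min_count` parameter are hypothetical.

```python
from collections import Counter


def filter_min_support(ratings, min_count=4):
    """Keep only ratings of items that received at least min_count ratings overall."""
    counts = Counter(item for _, item, _ in ratings)
    return [(u, i, r) for u, i, r in ratings if counts[i] >= min_count]
```

Applied before the train/test split, a filter like this silently changes the test set itself, not just the model's input.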

Igor said...

winsty,

"...They just kept all the items with at least 4 ratings... The results are totally not comparable..." is a good observation. However, "To the best of my knowledge, on the Movielens 100K dataset, the best single model could only reach an RMSE of about 0.88. In their paper, the basic MF model could even reach 0.81..." is not really helpful. The reason the code is shared is so that people can explain **why** we seem to be getting extraordinarily better results. We are not going through a literature review process.

Nima said...

Hello everyone, I am the author. Before answering your comments, I'd ask you not to decide too fast, before reading the paper.

@winsty I am using ratings of 4 or above only for clustering purposes; this doesn't change the training set or the test set, and I am pretty sure that we don't miss any of them in our evaluation. Even users or items whose ratings are all under 4 will go to the same clusters... I have put the Netflix dataset that I used online....
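Nima's point is that the 4-or-above threshold only selects the input to the clustering step, not the ratings used for training and testing. A minimal sketch under that reading: the function name, and the choice to derive the signal from training ratings only, are our assumptions. Deriving it from test ratings would leak test information into the model.

```python
def liked_items(train_ratings, threshold=4):
    """Map each training user to the items they rated >= threshold.

    Only the training triples (user, item, rating) are read; the test set is
    never passed in. Users with no rating >= threshold still get an (empty)
    entry, so they can be assigned to a cluster like everyone else.
    """
    liked = {}
    for u, i, r in train_ratings:
        items = liked.setdefault(u, set())
        if r >= threshold:
            items.add(i)
    return liked
```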

@igor As I say in the paper, if you don't use the threshold to stop the learning process, the MF model will overfit. After 100 epochs I got an RMSE of .90 for basic matrix factorization, and using the threshold it is almost .81...
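The exact stopping criterion is described in the paper and not reproduced here. Below is a minimal sketch of one common variant: stop when the validation RMSE improves by less than a threshold `min_delta`. The `step` and `evaluate` hooks are hypothetical. What matters for the debate is that `evaluate` reads a validation split carved out of the training data. If the stopping decision looks at the test set, the reported test RMSE is optimistically biased.

```python
def train_until_plateau(step, evaluate, max_epochs=100, min_delta=1e-4):
    """Run training epochs until validation RMSE stops improving by min_delta.

    step:     callable that runs one training epoch (hypothetical hook).
    evaluate: callable returning RMSE on a VALIDATION split carved out of the
              training data; it must never look at the final test split.
    Returns the list of validation RMSEs, one per epoch that was run.
    """
    history = []
    previous = float("inf")
    for _ in range(max_epochs):
        step()
        score = evaluate()
        history.append(score)
        # Stop when the improvement falls below the threshold (or RMSE got worse).
        if previous - score < min_delta:
            break
        previous = score
    return history
```

This sketch does not restore the parameters from the best epoch; a fuller version would snapshot them whenever the validation RMSE improves.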

@zeno I think they were using RMSE? Were they not?

Zeno said...

@Nima The measure is not the only part of the protocol.

Nima said...

Just want to let you guys know that the results were not valid. I had a mistake in my code. I will update the paper with new results soon.

irchans said...

Nima,
Did you find the mistake or did one of the Nuit Blanche readers find the mistake? Just curious.

PS: Nice paper.

Igor said...

Irchans,

If you followed the discussion in the LinkedIn group on advanced matrix factorization, you would have noticed that Zeno most probably helped a lot. Right now, I am personally waiting for Nima to confirm whether the bug is substantial or merely changes the results (while still beating the Netflix RMSE).


Cheers,

Igor.

Zeno said...

Hi Igor,

the bug caused the RMSE on MovieLens to be vastly underestimated, so I guess that the results on Netflix do not hold any more.

Igor said...

Zeno,

Yes, this is my understanding. The RMSE does not seem to hold.

Igor
