SUMMARY & USAGE LICENSE
=============================================

MovieLens data sets were collected by the GroupLens Research Project
at the University of Minnesota.

This data set consists of:
        * 100,000 ratings (1-5) from 943 users on 1682 movies.
        * Each user has rated at least 20 movies.
        * Simple demographic info for the users (age, gender, occupation, zip)

The data was collected through the MovieLens web site
(movielens.umn.edu) during the seven-month period from September 19th,
1997 through April 22nd, 1998. This data has been cleaned up - users
who had fewer than 20 ratings or did not have complete demographic
information were removed from this data set. Detailed descriptions of
the data files can be found at the end of this file.

Neither the University of Minnesota nor any of the researchers
involved can guarantee the correctness of the data, its suitability
for any particular purpose, or the validity of results based on the
use of the data set.  The data set may be used for any research
purposes under the following conditions:

     * The user may not state or imply any endorsement from the
       University of Minnesota or the GroupLens Research Group.

     * The user must acknowledge the use of the data set in
       publications resulting from the use of the data set
       (see below for citation information).

     * The user may not redistribute the data without separate
       permission.

     * The user may not use this information for any commercial or
       revenue-bearing purposes without first obtaining permission
       from a faculty member of the GroupLens Research Project at the
       University of Minnesota.

If you have any further questions or comments, please contact GroupLens
<grouplens-info@cs.umn.edu>.

CITATION
==============================================

To acknowledge use of the dataset in publications, please cite the
following paper:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets:
History and Context. ACM Transactions on Interactive Intelligent
Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages.
DOI=http://dx.doi.org/10.1145/2827872

ACKNOWLEDGEMENTS
==============================================

Thanks to Al Borchers for cleaning up this data and writing the
accompanying scripts.

PUBLISHED WORK THAT HAS USED THIS DATASET
==============================================

Herlocker, J., Konstan, J., Borchers, A., Riedl, J. An Algorithmic
Framework for Performing Collaborative Filtering. Proceedings of the
1999 Conference on Research and Development in Information
Retrieval. Aug. 1999.

FURTHER INFORMATION ABOUT THE GROUPLENS RESEARCH PROJECT
==============================================

The GroupLens Research Project is a research group in the Department
of Computer Science and Engineering at the University of Minnesota.
Members of the GroupLens Research Project are involved in many
research projects related to the fields of information filtering,
collaborative filtering, and recommender systems. The project is led
by professors John Riedl and Joseph Konstan. The project began to
explore automated collaborative filtering in 1992, but is best
known for its worldwide trial of an automated collaborative filtering
system for Usenet news in 1996.  The technology developed in the
Usenet trial formed the basis for the formation of Net Perceptions,
Inc., which was founded by members of GroupLens Research. Since then
the project has expanded its scope to research overall information
filtering solutions, integrating content-based methods as well as
improving current collaborative filtering technology.

Further information on the GroupLens Research project, including
research publications, can be found at the following web site:

        http://www.grouplens.org/

GroupLens Research currently operates a movie recommender based on
collaborative filtering:

        http://www.movielens.org/

DETAILED DESCRIPTIONS OF DATA FILES
==============================================

Here are brief descriptions of the data.

ml-data.tar.gz   -- Compressed tar file.  To rebuild the u data files do this:
                gunzip ml-data.tar.gz
                tar xvf ml-data.tar
                mku.sh

u.data     -- The full u data set, 100000 ratings by 943 users on 1682 items.
              Each user has rated at least 20 movies.  Users and items are
              numbered consecutively from 1.  The data is randomly
              ordered. This is a tab separated list of
                 user id | item id | rating | timestamp.
              The time stamps are unix seconds since 1/1/1970 UTC.

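As a rough illustration of the u.data layout described above, the
following Python sketch parses tab separated rating lines and converts
each time stamp to a UTC datetime. The names Rating and parse_ratings
and the sample rows are hypothetical, made up for this example; they
are not part of the data set or its scripts.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: int
    timestamp: datetime  # timezone-aware, UTC


def parse_ratings(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per non-empty, tab separated line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        user, item, rating, ts = line.split("\t")
        yield Rating(
            int(user),
            int(item),
            int(rating),
            # Time stamps are unix seconds since 1/1/1970 UTC.
            datetime.fromtimestamp(int(ts), tz=timezone.utc),
        )


# Hypothetical sample rows in the u.data layout (not real data).
SAMPLE = ["1\t10\t4\t875000000\n", "2\t10\t5\t880000000\n"]
ratings = list(parse_ratings(SAMPLE))
```

To read a real copy of the file, an open file object can be passed in
place of SAMPLE, since the function only needs an iterable of lines.
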
u.info     -- The number of users, items, and ratings in the u data set.

u.item     -- Information about the items (movies); this is a tab separated
              list of
              movie id | movie title | release date | video release date |
              IMDb URL | unknown | Action | Adventure | Animation |
              Children's | Comedy | Crime | Documentary | Drama | Fantasy |
              Film-Noir | Horror | Musical | Mystery | Romance | Sci-Fi |
              Thriller | War | Western |
              The last 19 fields are the genres, a 1 indicates the movie
              is of that genre, a 0 indicates it is not; movies can be in
              several genres at once.
              The movie ids are the ones used in the u.data data set.

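The 19 genre flags can be turned back into genre names with a few
lines of Python. This is only a sketch: Movie, parse_item and the
sample record are hypothetical names and values invented for the
example. The description above calls the file tab separated, so that
is the default; the separator is a parameter because some copies of
u.item in circulation appear to use "|" between fields instead, which
is an observation from outside this file, so check your copy.

```python
from typing import List, NamedTuple

# Genre columns, in the order listed in the u.item description.
GENRES = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


class Movie(NamedTuple):
    movie_id: int
    title: str
    release_date: str        # kept as text; no date format is assumed
    video_release_date: str
    imdb_url: str
    genres: List[str]


def parse_item(line: str, sep: str = "\t") -> Movie:
    """Parse one u.item-style record; `sep` is the field separator."""
    fields = line.rstrip("\n").split(sep)
    expected = 5 + len(GENRES)
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}")
    # A flag of "1" means the movie belongs to that genre.
    genres = [name for name, flag in zip(GENRES, fields[5:]) if flag == "1"]
    return Movie(int(fields[0]), fields[1], fields[2], fields[3],
                 fields[4], genres)


# Hypothetical record (not real data): Animation, Children's and Comedy set.
flags = ["0"] * len(GENRES)
for index in (3, 4, 5):
    flags[index] = "1"
sample = "\t".join(
    ["1", "Example Movie (1995)", "01-Jan-1995", "",
     "http://example.com/movie/1"] + flags
)
movie = parse_item(sample)
```
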
u.genre    -- A list of the genres.

u.user     -- Demographic information about the users; this is a tab
              separated list of
              user id | age | gender | occupation | zip code
              The user ids are the ones used in the u.data data set.

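A user record can be read the same way. The sketch below is a minimal
Python example under stated assumptions: User, parse_user and the
sample rows are hypothetical, and the separator defaults to a tab as
described above but can be overridden if your copy of u.user uses a
different delimiter.

```python
from collections import Counter
from typing import NamedTuple


class User(NamedTuple):
    user_id: int
    age: int
    gender: str
    occupation: str
    zip_code: str  # kept as text so leading zeros are not lost


def parse_user(line: str, sep: str = "\t") -> User:
    """Parse one u.user-style record; `sep` is the field separator."""
    user_id, age, gender, occupation, zip_code = line.rstrip("\n").split(sep)
    return User(int(user_id), int(age), gender, occupation, zip_code)


# Hypothetical sample rows in the u.user layout (not real data).
SAMPLE = [
    "1\t24\tM\ttechnician\t05001\n",
    "2\t53\tF\tother\t94043\n",
    "3\t23\tM\twriter\t32067\n",
]
users = [parse_user(line) for line in SAMPLE]
users_by_gender = Counter(user.gender for user in users)
```

Because the user ids match those in u.data, a dictionary keyed on
user_id is enough to join ratings with demographic information.
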
u.occupation -- A list of the occupations.

u1.base    -- The data sets u1.base and u1.test through u5.base and u5.test
u1.test       are 80%/20% splits of the u data into training and test data.
u2.base       The splits u1, ..., u5 have disjoint test sets; this is for
u2.test       5-fold cross validation (where you repeat your experiment
u3.base       with each training and test set and average the results).
u3.test       These data sets can be generated from u.data by mku.sh.
u4.base
u4.test
u5.base
u5.test

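The 5-fold procedure described above can be sketched in Python as
follows. This is an illustration of the idea only: mku.sh is not
reproduced here, and its exact way of choosing the folds may differ
from the seeded shuffle assumed below. The function names, the toy
ratings and the mean-rating baseline are all hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Row = Tuple[int, int, int]  # (user id, item id, rating)


def five_fold_splits(
    rows: Sequence[Row], seed: int = 0
) -> List[Tuple[List[Row], List[Row]]]:
    """Return five (train, test) pairs whose test sets are disjoint."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]  # five near-equal parts
    splits = []
    for k in range(5):
        test = folds[k]  # about 20% of the data
        train = [r for j, fold in enumerate(folds) if j != k for r in fold]
        splits.append((train, test))
    return splits


def mae_of_mean_predictor(train: Sequence[Row], test: Sequence[Row]) -> float:
    """Baseline: predict the training-set mean rating for every test row."""
    guess = mean(r[2] for r in train)
    return mean(abs(r[2] - guess) for r in test)


# Hypothetical toy ratings (not real data): 10 users x 10 items.
rows = [(u, i, 1 + (u + i) % 5) for u in range(1, 11) for i in range(1, 11)]
splits = five_fold_splits(rows)

# Repeat the experiment on each split and average the results.
average_mae = mean(mae_of_mean_predictor(train, test) for train, test in splits)
```
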
ua.base    -- The data sets ua.base, ua.test, ub.base, and ub.test
ua.test       split the u data into a training set and a test set with
ub.base       exactly 10 ratings per user in the test set.  The sets
ub.test       ua.test and ub.test are disjoint.  These data sets can
              be generated from u.data by mku.sh.

allbut.pl  -- The script that generates training and test sets where
              all but n of a user's ratings are in the training data.

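For readers who want to see the all-but-n idea in code, here is a
minimal Python sketch. It is not a port of allbut.pl: the held-out
ratings are picked with a seeded random sample, which is an assumption
and may differ from how the script selects them. The name all_but_n
and the toy ratings are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, int, int]  # (user id, item id, rating)


def all_but_n(
    rows: Sequence[Row], n: int, seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Hold out exactly n ratings per user for testing; train on the rest."""
    by_user: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    rng = random.Random(seed)
    train: List[Row] = []
    test: List[Row] = []
    for user_id in sorted(by_user):
        user_rows = by_user[user_id]
        if len(user_rows) <= n:
            raise ValueError(
                f"user {user_id} has only {len(user_rows)} ratings"
            )
        held_out = set(rng.sample(range(len(user_rows)), n))
        for index, row in enumerate(user_rows):
            (test if index in held_out else train).append(row)
    return train, test


# Hypothetical toy ratings (not real data): 3 users x 20 items.
rows = [(u, i, 1 + (u * i) % 5) for u in range(1, 4) for i in range(1, 21)]
train, test = all_but_n(rows, n=10)
```

With n=10 this gives exactly 10 test ratings per user, the same shape
as the ua and ub splits described above.
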
mku.sh     -- A shell script to generate all the u data sets from u.data.