MIT Saliency Benchmark Results: CAT2000

The following are results of models evaluated on their ability to predict ground-truth human fixations on our benchmark data set, which contains 2000 images from 20 different categories with eye-tracking data from 24 observers. We post the results here and provide a way for people to submit new models for evaluation.


If you use any of the results or data on this page, please cite the following:

  @misc{mit-saliency-benchmark,
    author    = {Zoya Bylinskii and Tilke Judd and Ali Borji and Laurent Itti and Fr{\'e}do Durand and Aude Oliva and Antonio Torralba},
    title     = {MIT Saliency Benchmark}
  }

  @article{borji2015cat2000,
    title     = {CAT2000: A Large Scale Fixation Dataset for Boosting Saliency Research},
    author    = {Borji, Ali and Itti, Laurent},
    journal   = {CVPR 2015 workshop on "Future of Datasets"},
    year      = {2015},
    note      = {arXiv preprint arXiv:1505.03581}
  }


2000 test images: the fixations from 24 observers per image are withheld, so that no model can be trained or tuned on the test set.
2000 train images: fixations from 18 observers per image are released (fixations from another 6 observers per image are held out).
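Models are scored by comparing their predicted saliency maps against the held-out fixations. As a minimal sketch of how such a comparison works, the following implements Normalized Scanpath Saliency (NSS), one of the standard fixation-prediction metrics: the saliency map is z-scored and averaged at the pixels that observers fixated. The function name and array conventions (a real-valued saliency map and a binary fixation map of the same shape) are illustrative assumptions, not the benchmark's official evaluation code.

```python
import numpy as np

def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency (illustrative sketch).

    saliency_map : 2-D float array, the model's predicted saliency.
    fixation_map : 2-D binary array of the same shape; nonzero entries
                   mark pixels fixated by human observers.
    Returns the mean z-scored saliency value at fixated locations.
    """
    saliency_map = np.asarray(saliency_map, dtype=float)
    # Normalize the map to zero mean and unit standard deviation.
    normalized = (saliency_map - saliency_map.mean()) / saliency_map.std()
    # Average the normalized saliency at fixated pixels only.
    return normalized[np.asarray(fixation_map).astype(bool)].mean()
```

A chance-level predictor scores around 0; higher NSS means the model places more of its normalized saliency mass on the fixated pixels.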

Model Performances
