ORBIT: A real-world few-shot dataset for teachable object recognition collected from people who are blind or low vision

Version 3 2021-10-22, 12:03

Version 2 2021-06-11, 11:52

Version 1 2021-04-07, 09:13

dataset

posted on 2021-10-22, 12:03 authored by Daniela Massiceti, Lida Theodorou, Luisa Zintgraf, Matthew Tobias Harris, Simone Stumpf, Cecily Morrison, Edward Cutrell, Katja Hofmann

Object recognition still relies predominantly on many high-quality training examples per object category. In contrast, learning new objects from only a few examples could enable many impactful applications, from robotics to user personalization. Most few-shot learning research, however, has been driven by benchmark datasets that lack the high variation these applications will face when deployed in the real world. To close this gap, we present the ORBIT dataset, grounded in a real-world application of teachable object recognizers for people who are blind/low-vision. We provide a full, unfiltered dataset of 4,733 videos of 588 objects recorded by 97 people who are blind/low-vision on their mobile phones, and a benchmark dataset of 3,822 videos of 486 objects collected by 77 collectors.

The code for loading the dataset, computing all benchmark metrics, and running the baseline models is available at https://github.com/microsoft/ORBIT-Dataset

This version comprises several zip files:

- train, validation, test: benchmark dataset, organised by collector, with raw videos split into static individual frames in jpg format at 30 FPS

- other: data not in the benchmark set, organised by collector, with raw videos split into static individual frames in jpg format at 30 FPS (please note that the train, validation, test, and other files make up the unfiltered dataset)

- *_224: as for the benchmark, but static individual frames are scaled down to 224 pixels.

- *_unfiltered_videos: full unfiltered dataset, organised by collector, in mp4 format.
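
As a quick sanity check after unzipping one of the frame archives, the sketch below counts jpg frames per collector and converts a frame count to an approximate duration using the 30 FPS rate stated above. It assumes only that each top-level folder in the extracted archive corresponds to one collector; the folder names used in the usage note and the helper names `frames_per_collector` and `seconds_of_video` are hypothetical and are not part of the official loading code in the ORBIT-Dataset repository.

```python
from collections import Counter
from pathlib import Path

FPS = 30  # frame rate stated in the dataset description


def frames_per_collector(root):
    """Count .jpg frames beneath each top-level folder of `root`.

    Assumption: each top-level folder holds one collector's data.
    The nesting below that level is not assumed; files are found recursively.
    """
    root = Path(root)
    counts = Counter()
    for jpg in root.rglob("*.jpg"):
        parts = jpg.relative_to(root).parts
        if len(parts) < 2:
            continue  # skip stray files that sit directly in the root folder
        counts[parts[0]] += 1
    return dict(counts)


def seconds_of_video(n_frames, fps=FPS):
    """Approximate recording length, in seconds, for a given number of frames."""
    return n_frames / fps
```

For example, `frames_per_collector("train")` would return a dictionary mapping each collector folder name to its frame count, assuming the train archive was extracted to a local folder named `train`; passing one of those counts to `seconds_of_video` gives the corresponding recording time in seconds.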