MEDISEG
Dataset Overview
MEDISEG (MEDication Image SEGmentation) is a dataset of real-world medication images for developing and evaluating pill recognition models. It contains two subsets:
- MEDISEG (3-Pills): A controlled dataset featuring three pill types with subtle differences in shape and color.
- MEDISEG (32-Pills): A more diverse dataset containing 32 distinct pill classes, reflecting real-world challenges such as occlusions, varied lighting conditions, and multiple medications in a single frame.
Each subset includes COCO-format annotations with instance segmentation masks, bounding boxes, and class labels.
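Because the annotations follow the COCO format, they can be inspected with nothing more than the Python standard library. The sketch below is illustrative and not part of the dataset's tooling: it assumes only the standard COCO keys (`images`, `annotations`, `categories`, and per-annotation `category_id` and `bbox`), and the helper names `load_coco`, `summarize_coco`, and `bbox_to_corners` are hypothetical.

```python
import json
from collections import Counter


def load_coco(path):
    """Load a COCO-format annotation file into a plain dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def summarize_coco(coco):
    """Count images, instances, and instances per class in a COCO-style dict."""
    # Map category ids to readable names (standard COCO "categories" list).
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    per_class = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return {
        "num_images": len(coco["images"]),
        "num_instances": len(coco["annotations"]),
        "instances_per_class": dict(per_class),
    }


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)
```

A typical call would be `summarize_coco(load_coco("MEDISEG/3pills/annotations.json"))`, where the path is an assumption based on the directory layout and should be adjusted to wherever the dataset was extracted.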
Dataset Structure
The dataset is organized as follows:
MEDISEG/
├── LICENSE
├── metadata.csv
├── 3pills/
│   ├── annotations.json
│   └── images/
│       ├── image1.jpg
│       ├── image2.jpg
└── 32pills/
    ├── annotations.json
    └── images/
        ├── image1.jpg
        ├── image2.jpg
- LICENSE: The CC BY 4.0 license under which the dataset is distributed.
- metadata.csv: Supplementary drug information, including registration numbers, brand names, active ingredients, regulatory classifications, and official URLs.
- annotations.json: COCO-format annotation files providing segmentation masks, bounding boxes, and class labels.
- images/: High-resolution JPG images of medications.
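Given this layout, one way to pair each image with its annotations and to read the supplementary metadata is sketched below. It is a minimal example under stated assumptions: the dataset root path is supplied by the caller, the COCO `file_name` field is assumed to be relative to the subset's `images/` directory, and the function names `index_subset` and `read_metadata` are hypothetical. No column names are assumed for `metadata.csv`; rows are keyed by whatever header the file provides.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def index_subset(root, subset):
    """Map each image path in a subset to its list of COCO annotations."""
    subset_dir = Path(root) / subset
    with open(subset_dir / "annotations.json", encoding="utf-8") as f:
        coco = json.load(f)

    # Group annotations by the image they belong to.
    by_image_id = defaultdict(list)
    for ann in coco["annotations"]:
        by_image_id[ann["image_id"]].append(ann)

    # Assumption: "file_name" is relative to the subset's images/ directory.
    # Images without annotations map to an empty list.
    return {
        subset_dir / "images" / img["file_name"]: by_image_id[img["id"]]
        for img in coco["images"]
    }


def read_metadata(root):
    """Read metadata.csv into a list of dicts keyed by the file's header row."""
    with open(Path(root) / "metadata.csv", newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, `index_subset("MEDISEG", "32pills")` would return a dictionary from image paths to annotation lists, assuming the dataset was extracted into a local `MEDISEG` directory.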
Acknowledgements
If you use this dataset, please cite the corresponding publication:
@article{MEDISEG2025,
title = {MEDISEG: A large-scale dataset of medication images with instance segmentation masks for preventing adverse drug events},
author = {Chu, Wai Ip and Hirani, Shashi and Tarroni, Giacomo and Li, Ling},
journal = {Nature Scientific Data},
year = {2025},
url = {https://example.com}
}