City, University of London

MEDISEG

Dataset (418.99 MB)
posted on 2025-03-14, 16:45, authored by William Chu

Dataset Overview

MEDISEG (MEDication Image SEGmentation) is a high-quality, real-world dataset designed for the development and evaluation of pill recognition models. It contains two subsets:

  • MEDISEG (3-Pills): A controlled dataset featuring three pill types with subtle differences in shape and color.
  • MEDISEG (32-Pills): A more diverse dataset containing 32 distinct pill classes, reflecting real-world challenges such as occlusions, varied lighting conditions, and multiple medications in a single frame.

Each subset includes COCO-format annotations with instance segmentation masks, bounding boxes, and class labels.
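Because the annotations follow the COCO layout, a quick way to get a feel for a subset is to count annotated instances per class. The following is a minimal sketch, not an official loader: it relies only on the standard COCO keys (`categories`, `annotations`, `category_id`, `name`), and the path `MEDISEG/3pills/annotations.json` is an assumption about where the archive was extracted.

```python
import json
from collections import Counter
from pathlib import Path


def count_instances_per_class(coco):
    """Return {class name: number of annotated instances} for a COCO-style dict."""
    # Map numeric category ids to their human-readable names.
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    # Each annotation is one instance (one mask and one bounding box).
    counts = Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


def load_coco(path):
    """Read a COCO annotation file into a plain dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the dataset was unpacked.
    path = Path("MEDISEG/3pills/annotations.json")
    if path.is_file():
        for name, n in sorted(count_instances_per_class(load_coco(path)).items()):
            print(f"{name}: {n}")
```

The same function should work unchanged on the 32-pill subset, since both subsets are described as using the same annotation format.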

Dataset Structure

The dataset is organized as follows:

MEDISEG/
│── LICENSE
│── metadata.csv
│── 3pills/
│ ├── annotations.json
│ ├── images/
│ │ ├── image1.jpg
│ │ ├── image2.jpg
│── 32pills/
│ ├── annotations.json
│ ├── images/
│ │ ├── image1.jpg
│ │ ├── image2.jpg

  • LICENSE: The CC BY 4.0 license under which the dataset is distributed.
  • metadata.csv: Supplementary drug information, including registration numbers, brand names, active ingredients, regulatory classifications, and official URLs.
  • annotations.json: COCO-format annotation files providing segmentation masks, bounding boxes, and class labels.
  • images/: High-resolution JPG images of medications.
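After downloading, it can be useful to confirm that every image referenced by a subset's annotations.json is actually present in its images/ folder. The sketch below is one way to do that under two assumptions that the dataset description does not confirm: that the archive was extracted to a directory named `MEDISEG`, and that each `file_name` entry is a bare file name relative to `images/`. The helper name `find_missing_images` is hypothetical.

```python
import json
from pathlib import Path


def find_missing_images(subset_dir):
    """List file names referenced in annotations.json that are absent from images/."""
    subset_dir = Path(subset_dir)
    with open(subset_dir / "annotations.json", encoding="utf-8") as f:
        coco = json.load(f)
    images_dir = subset_dir / "images"
    # Assumption: each "file_name" is a bare name relative to images/.
    return sorted(
        img["file_name"]
        for img in coco["images"]
        if not (images_dir / img["file_name"]).is_file()
    )


if __name__ == "__main__":
    root = Path("MEDISEG")  # hypothetical extraction directory
    for subset in ("3pills", "32pills"):
        if (root / subset / "annotations.json").is_file():
            print(subset, find_missing_images(root / subset))
```

An empty list for both subsets suggests the download and extraction completed without losing files.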

Acknowledgements

If you use this dataset, please cite the corresponding publication:
@article{MEDISEG2025,
  title = {MEDISEG: A large-scale dataset of medication images with instance segmentation masks for preventing adverse drug events},
  author = {Chu, Wai Ip and Hirani, Shashi and Tarroni, Giacomo and Li, Ling},
  journal = {Nature Scientific Data},
  year = {2025},
  url = {https://example.com}
}
