<h3><b>Dataset Overview</b></h3><p dir="ltr">MEDISEG (MEDication Image SEGmentation) is a high-quality, real-world dataset designed for developing and evaluating pill recognition models. It contains two subsets:</p><ul><li>MEDISEG (3-Pills): A controlled subset featuring three pill types that differ only subtly in shape and color.</li><li>MEDISEG (32-Pills): A more diverse subset containing 32 distinct pill classes, reflecting real-world challenges such as occlusions, varied lighting conditions, and multiple medications in a single frame.</li></ul><p dir="ltr">Each subset includes COCO-format annotations with instance segmentation masks, bounding boxes, and class labels.</p><h3><b>Dataset Structure</b></h3><p dir="ltr">The dataset is organized as follows:</p><pre>MEDISEG/
├── LICENSE
├── metadata.csv
├── 3pills/
│   ├── annotations.json
│   └── images/
│       ├── image1.jpg
│       ├── image2.jpg
│       └── ...
└── 32pills/
    ├── annotations.json
    └── images/
        ├── image1.jpg
        ├── image2.jpg
        └── ...
</pre><ul><li>LICENSE: The CC BY 4.0 license under which the dataset is distributed.</li><li>metadata.csv: Supplementary drug information, including registration numbers, brand names, active ingredients, regulatory classifications, and official URLs.</li><li>annotations.json: The COCO-format annotation file for each subset, providing segmentation masks, bounding boxes, and class labels.</li><li>images/: High-resolution JPG images of the medications in each subset.</li></ul><h3><b>Acknowledgements</b></h3><p dir="ltr">If you use this dataset, please cite the corresponding publication:</p><pre>@article{MEDISEG2025,
  title   = {MEDISEG: A large-scale dataset of medication images with instance segmentation masks for preventing adverse drug events},
  author  = {Chu, Wai Ip and Hirani, Shashi and Tarroni, Giacomo and Li, Ling},
  journal = {Nature Scientific Data},
  year    = {2025},
  url     = {https://example.com}
}
</pre>
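<h3><b>Example: Reading the Annotations</b></h3><p dir="ltr">Because both subsets store their labels in a COCO-format annotations.json file, they can be inspected with nothing more than Python's standard library. The snippet below is a minimal sketch, not an official MEDISEG loader: the helper names load_coco and summarize are hypothetical, it relies only on the standard COCO keys (images, annotations, categories) and the COCO [x, y, width, height] box convention, and the default path MEDISEG/3pills/annotations.json is assumed from the directory layout described above. Class names are read from the file rather than hard-coded.</p><pre><code class="language-python">import json
from collections import Counter, defaultdict
from pathlib import Path


def load_coco(path):
    """Read a COCO-format annotation file (e.g. annotations.json) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(coco):
    """Count instances per class and group boxes by image file name.

    Hypothetical helper (not part of MEDISEG); it uses only the standard
    COCO keys "images", "annotations", and "categories".
    """
    # Map each category id to its class name.
    names = {c["id"]: c["name"] for c in coco["categories"]}
    # Map each image id to its file name.
    files = {img["id"]: img["file_name"] for img in coco["images"]}

    counts = Counter(names[a["category_id"]] for a in coco["annotations"])
    per_image = defaultdict(list)
    for a in coco["annotations"]:
        # COCO boxes are [x, y, width, height]; convert to corner coordinates.
        x, y, w, h = a["bbox"]
        per_image[files[a["image_id"]]].append(
            (names[a["category_id"]], (x, y, x + w, y + h))
        )
    return counts, dict(per_image)


if __name__ == "__main__":
    # Assumed location, following the MEDISEG/ directory layout; adjust as needed.
    path = Path("MEDISEG") / "3pills" / "annotations.json"
    if path.is_file():
        counts, per_image = summarize(load_coco(path))
        for name, n in counts.most_common():
            print(f"{name}: {n} instances")
        print(f"{len(per_image)} images with at least one annotation")
    else:
        print(f"Annotation file not found: {path}")
</code></pre><p dir="ltr">The same functions apply unchanged to the 32-pill subset by pointing the path at 32pills/annotations.json. The instance masks themselves live in each annotation's segmentation field; this sketch leaves them untouched, and decoding them into pixel masks is commonly done with the pycocotools package (for example, COCO.annToMask).</p>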