Open Datasets for Data-Centric AI

The Data for Smarter AI


Datumo would like to support the AI industry by sharing data.

Datumo has been running its own “AI Training Data Grant Program” to contribute to the development of the AI industry. Fellow businesses and research labs have applied to the program and built datasets free of charge. Datumo would like to release “Open Datasets” to provide quality data to anyone who wants to build smarter AI.


DATA is Food for AI

“80% of data science is data cleaning. If 80% of work is preparing high-quality data, then preparing data is the core part of machine learning.”

Andrew Ng, Founder & CEO of Landing AI

80% PREP

In cooking: source and prepare high-quality ingredients. In AI: source and prepare high-quality data.

In cooking: cook a meal. In AI: train a model.

Source: YouTube, “A Chat with Andrew on MLOps: From Model-centric to Data-centric AI”

Quality of data defines the quality of AI

AlphaGo was trained on about 30 million moves from 160,000 games played by high-ranked players on the KGS Go Server.

To build an AI model that recognizes age and gender through facial analysis, a large number of facial images is required.

Data is the core part of AI

Data customized for your AI

For a voice-recognizing AI speaker, voice data covering a wide range of ages, genders, and dialects is required.

Such a collection of data suited to AI training is called a dataset. The most important ingredient in building an AI model is therefore data.

Datasets bring life to AI



Founded in Nov. 2018



As the leading data platform for AI training, Datumo delivers suitable training datasets quickly and accurately using Cashmission, the mobile/web crowdsourcing platform with more than 150,000 workers.


Since its founding in 2018, Datumo has become one of the fastest-growing startups in Korea. We have worked with about 200 companies, including industry leaders such as Samsung, LG, and Kakao. Annual sales in 2020 reached 5.3M USD (6.1B KRW), and we have raised 3.9M USD (4.4B KRW) in cumulative investment. Recently, the founding members were nominated for Forbes 30 Under 30 Asia 2021.

Data-centric AI, With Datumo

Data quality is Datumo’s forte. Using mathematical algorithms developed by researchers from KAIST*, Datumo provides accurate inspection of all data. The User Guidance team is responsible for keeping data quality consistent, which in turn determines the quality of AI performance. Data diversity is also monitored by our deep-learning-based system for filtering similar data.

*: KAIST (Korea Advanced Institute of Science and Technology) is one of the most prestigious research institutes in Korea.