Open Datasets
for Data-Centric AI
The Data for Smarter AI
AI TRAINING DATA SPONSORSHIP PROGRAM
Datumo would like to support the AI industry by sharing data.
Datumo has been running its own “AI Training Data Grant Program” to contribute to the development of the AI industry. Businesses and research labs that applied to the program have had datasets built at no charge. Datumo would like to release “OPEN DATASETS” to provide quality data to anyone who wants to build smarter AI.
WHAT IS OPEN DATASETS
DATA is Food for AI
“80% of data science is data cleaning. If 80% of work is preparing high-quality data, then preparing data is the core part of machine learning.”
Andrew Ng, Founder & CEO of Landing AI
80% PREP
Cooking: Source and prepare high-quality ingredients
AI: Source and prepare high-quality data
20% ACTION
Cooking: Cook a meal
AI: Train a model
Source: YouTube, “A Chat with Andrew on MLOps: From Model-centric to Data-centric AI”
Quality of data defines the quality of AI
AlphaGo was trained on 30 million moves from 160,000 games played by high-ranked players on the KGS Go Server.
To build an AI model that recognizes age and gender through facial analysis, a large number of facial images is required.
Data is the core part of AI
Data customized for your AI
For a voice-recognizing AI speaker, voice data from a wide range of ages, genders, and dialects is required.
A collection of data suited to training an AI in this way is called a dataset. As such, the most important element in building an AI model is “data”.
Datasets bring life to AI
WHO WE ARE
The Data for Smarter AI
Founded in Nov. 2018
As the leading data platform for AI training, DATUMO swiftly and accurately provides proper training datasets using Cashmission, the mobile/web crowd-sourcing platform with more than 150,000 workers.
Since its founding in 2018, DATUMO has become one of the fastest-growing startups in Korea. We have worked with about 200 companies, including industry leaders such as Samsung, LG, and Kakao. Annual sales in 2020 reached 5.3M USD (6.1B KRW), and cumulative investment raised totals 3.9M USD (4.4B KRW). Recently, the founding members were listed in Forbes 30 Under 30 Asia 2021.
Data-centric AI, With Datumo
Data quality is Datumo’s forte. Using mathematical algorithms developed by researchers from KAIST*, Datumo accurately inspects all of its data rather than a sample. The User Guidance team is responsible for keeping data quality consistent, which in turn determines the quality of AI performance. Data diversity is also monitored by our similar-data filtering system based on deep learning.
*: KAIST (Korea Advanced Institute of Science and Technology) is one of the most prestigious research institutes in Korea.