AI Training Data

Sponsorship Program

The Data for Smarter AI

PARTNER

SPONSOR

$70K worth of dataset production grants

$30K worth of benefits

Calling for Startups, Research Institutions, University Labs and Individual Researchers conducting R&D on AI

Datumo has been running its own “AI Training Data Sponsorship Program” to contribute to the development of the AI industry.

Program Overview

Through the AI Training Data Sponsorship Program, Datumo supports AI R&D by granting AI training datasets to selected teams or individuals: small businesses · startups · research institutions · individual researchers. (This is not a monetary grant program.)
A total of 10 teams will be selected, and each will receive a dataset worth $70,000. (Datumo will build and deliver the dataset proposed in the applicant's application form.)
Additional benefits worth $30K will also be provided. (AWS GPU credits and more)
The selected teams must use the dataset for their own business or for research purposes. In addition, all or part of each dataset will be made freely available to the public to support the advancement of the artificial intelligence field.

Schedule

TO BE ANNOUNCED SOON
MEANWHILE, PLEASE CONTACT HERE 👈

Purpose of the Program & How We Collect/Label Datasets

Visit Datumo’s official website (https://datumo.com) to see examples of image/video/audio/text collecting & labeling.

Dataset production (collecting & labeling) for selected teams will be carried out through Datumo’s web/mobile crowdsourcing platform. If necessary, please contact us (email: [email protected]) to discuss the feasibility of collecting and labeling (annotating) the proposed dataset. However, please do so well in advance: close to the deadline, a timely answer is highly unlikely because of the high volume of inquiries.

Applicants may propose new, original datasets or pre-existing datasets that need to be adapted to specific environments, such as advanced ImageNet datasets, English dialect datasets, worldwide traditional food datasets, and so on.

Primary Screening Criteria

Practicality and universality of datasets · Capability of using crowdsourcing for data collection/labeling · Excellence of research/development plans using datasets
*If there is any issue or violation of policy, applicants may be disqualified even after acceptance.

Program Eligibility

Startups, small companies, research institutions, university labs, individual researchers, etc.

How to Apply

Applications are currently closed.

Frequently Asked Questions

What are Datasets?

To develop powerful artificial intelligence, algorithms need to be trained on high-quality data. For example, AlphaGo’s neural networks were trained on about 30 million moves from 160,000 games played by high-level players (6th to 9th dan) on the KGS public server (https://ww.gokgs.com/). Likewise, millions of facial photos are needed to train an AI model that estimates age and gender from a face, and voice data from people of various ages, genders, and regions is needed to develop AI speakers (voice recognition). These collections of data are called datasets, and they are used for AI training (machine learning, deep learning, and more).

Why are you doing this? What's the catch?

Are there any strings attached?

By hosting the AI Training Data Sponsorship Program, DATUMO receives many advantages including the following:

1. Publicity
DATUMO has become one of the top AI data startups, but we would like to gain more international publicity. Selected research institutes are required to submit their paper to academic journals or conferences after conducting research using datasets sponsored by DATUMO.

2. Potential future customers
Our clients vary from small startups to large international corporations like LG and Samsung. Through this program, we expect to reach out to researchers in universities and research institutes that might become our future clients.

3. Establishment of 'Open Datasets'
By contributing to the research field of AI, we aim to establish our presence as an AI industry leader.

All information related to the data sponsorship program is subject to change and any changes will be announced through the website.

Please propose a dataset that can be collected and labeled (annotated) with Datumo's web/mobile crowdsourcing platform. For more information, please visit https://datumo.com . If necessary, teams may contact us (email: [email protected]) to discuss the feasibility of collecting and labeling (annotating) the proposed dataset. However, please do so well in advance. (Just before the deadline, we are highly unlikely to be able to answer emails because of the high volume of inquiries. Datumo does not accept any responsibility for consequences arising from unanswered inquiries.)
The second evaluation will be a presentation interview. Teams that do not attend will be automatically excluded from further review. (No specific presentation format is required and 10 minutes are allocated per team; further details will be announced.)
For the proposed project, participating teams are responsible for legal issues arising from the unauthorized use of all intellectual property rights and other information such as copyrights, patent rights, portrait rights, etc. of a third party. In the event of a problem in the future, the application or grant will be canceled.
The application score will not be disclosed, and if there is any inappropriate issue after the final evaluation, we may exclude the team from the program.
Detailed coordination and negotiation regarding the actual dataset production will proceed after the final selection.
Selected applicants must diligently participate in Datumo's written and video interviews, as well as in conferences or events about the open datasets (built during the program), to present and share their research results publicly.
Selected applicants must properly acknowledge Datumo and the sponsorship partners in their research papers. (All sponsorship provided, regardless of its form, such as datasets, cloud computing, or mentoring via partners, should be mentioned and acknowledged.)
Ownership and copyright of the produced dataset will be held by each individual team. (However, teams must consent to the public release and operation of all or part of the dataset. Any sensitive data, such as personal information, can be used only for the AI development purposes proposed in the application.) Datumo holds the right to open or operate the produced dataset. Moreover, Datumo aims to open the datasets freely to the public for academic use.
Partnered VCs (Kakao Ventures, etc.) and companies (AWS, etc.) are not obligated to make any investment in, or form any business relationship with, the selected teams.
Partnered VCs may individually contact one or more selected teams for investment review. If there is no team that the VCs want to contact, they may choose not to proceed with any investment or contact at all.
The laws of the Republic of Korea, excluding Korea’s conflict of laws rules, will apply to any disputes arising out of or relating to the whole process of AI Training Data Sponsorship Program. All claims arising out of or relating to the Notice, Terms, Applications, or Sponsorship will be litigated exclusively in the Seoul Central District Court. The applicant and Datumo consent to personal jurisdiction in this court.
When submitting a proposal, please be sure that you clearly understand all of the terms and conditions. Participants are responsible for any consequences resulting from misunderstandings.
The portion of the produced dataset that Datumo opens to the public will be distributed under the CC-BY-SA license (Creator: DATUMO).