Course provider: Jožef Stefan Institute (JSI)
Instructors: Panče Panov (JSI)
Learning objectives:
- Understand AI data assets across the lifecycle: how datasets, labels, dataset splits, features, and evaluation artefacts evolve from collection to reuse;
- Apply the FAIR principles to AI work: make data and outputs easier to find, access, combine, and reuse (for teams and future projects);
- Create an actionable data management plan (DMP) for AI projects: a lightweight plan that supports reproducibility, handover, and compliance; and
- Handle constraints responsibly: recognize sensitive data, ethical considerations, access limitations, and industry vs. research expectations.
Course content:
- AI data assets and the data lifecycle (raw/processed data, labels, splits, evaluation artefacts);
- Review of the FAIR principles in the context of AI projects;
- Finding, accessing and reusing data for AI (including access conditions and licensing);
- Data interoperability for AI (formats, metadata, label definitions and basic standards);
- Data repositories and sharing strategies for AI datasets and related artefacts;
- Handling confidential, personal, sensitive, and private data, and the ethical aspects of AI;
- Data management plan (DMP) for AI projects: structure of an AI-oriented DMP, application of the FAIR principles, and examples of best practices and tools;
- Data management in research and industry: open data/open science vs. industrial constraints (governance, IP, security) in AI projects.
Learning outcomes: By the end of the training, participants can:
- Explain the AI data lifecycle and basic good practices (structure, documentation, versioning, provenance);
- Perform a basic FAIR check on an AI dataset/project and list concrete “quick wins” (metadata, access statement, license, formats);
- Draft a short AI-focused DMP (1–2 pages) that a team can actually follow; and
- Identify when extra safeguards are required (personal/confidential data, restricted access, IP) and propose sensible mitigations.
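As an illustration of the "basic FAIR check" outcome above, the four quick wins named there (metadata, access statement, license, formats) could be checked with a few lines of Python. This is only a minimal sketch and not part of the course material: the `DatasetRecord` fields, the `fair_quick_check` function, and the `OPEN_FORMATS` set are hypothetical names chosen for this example, and the list of open formats is illustrative rather than authoritative.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative assumption: a small set of formats treated as "open" here.
OPEN_FORMATS = {"csv", "json", "txt", "png"}


@dataclass
class DatasetRecord:
    """Hypothetical, minimal description of a dataset for a quick check."""
    title: str = ""
    description: str = ""
    access_statement: str = ""
    license: str = ""
    file_formats: List[str] = field(default_factory=list)


def fair_quick_check(record: DatasetRecord) -> List[str]:
    """Return a list of suggested quick wins for the given record."""
    wins = []
    if not record.title or not record.description:
        wins.append("Add a title and description (metadata)")
    if not record.access_statement:
        wins.append("Write an access statement")
    if not record.license:
        wins.append("Choose a license")
    # Flag any format outside the illustrative open-format set.
    other = sorted(f for f in record.file_formats if f.lower() not in OPEN_FORMATS)
    if other:
        wins.append("Consider open formats instead of: " + ", ".join(other))
    return wins


# Usage: a toy record with a license but no access statement.
record = DatasetRecord(
    title="Example images",
    description="Toy image set",
    license="CC-BY-4.0",
    file_formats=["png", "xlsx"],
)
for item in fair_quick_check(record):
    print("-", item)
```

A real check would usually cover more than this (for example persistent identifiers and provenance), so the result is best read as a starting list for discussion rather than a FAIR assessment.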