European Open Source AI Index
Guide
by Nityaa Kalra
04 August 2025

Weighting Openness Criteria Across Different AI Domains

The term "openness" in generative AI is often used as a blanket statement, but its meaning is far from monolithic. While the Open Source AI Index (OSAI) attempts to quantify this concept through a set of universal metrics, the importance of each metric shifts depending on the domain and the stakeholders involved. A researcher's definition of "open" can be different from a lawyer's, a security expert's, or a musician's. Therefore, a one-size-fits-all approach to openness can often be misleading and counterproductive.

At OSAI, we recognize this complexity. Rather than dictating a single, immutable standard, we aim to approach the concept of openness from a scientific perspective by breaking it down into its constituent parts. In this blog post we analyze how different openness criteria are weighted across various domains.
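To make the idea of domain-specific weighting concrete, it can be sketched as a simple weighted score over openness criteria. The criterion names, scores, and weights below are invented for illustration and are not OSAI's actual metrics or methodology:

```python
# Hypothetical sketch: domain-specific weighting of openness criteria.
# All criterion names, weights, and scores here are illustrative only.

CRITERIA = ["training_code", "training_data", "weights", "documentation", "license"]

# Per-domain weights (each row sums to 1.0); values are invented for illustration.
DOMAIN_WEIGHTS = {
    "academia": {"training_code": 0.30, "training_data": 0.30, "weights": 0.20,
                 "documentation": 0.15, "license": 0.05},
    "legal":    {"training_code": 0.10, "training_data": 0.10, "weights": 0.15,
                 "documentation": 0.35, "license": 0.30},
}

def openness_score(component_scores: dict[str, float], domain: str) -> float:
    """Weighted sum of per-component openness scores (each in [0, 1])."""
    weights = DOMAIN_WEIGHTS[domain]
    return sum(weights[c] * component_scores.get(c, 0.0) for c in CRITERIA)

# A model that releases weights and documentation but keeps data and code closed:
model = {"training_code": 0.0, "training_data": 0.2, "weights": 1.0,
         "documentation": 0.8, "license": 1.0}

print(round(openness_score(model, "academia"), 3))  # → 0.43
print(round(openness_score(model, "legal"), 3))     # → 0.75
```

The same release scores very differently under the two weightings, which is the point of the sections that follow: "how open" a model is depends on who is asking.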

Academia and Research

For the academic and research communities, the gold standard of openness is reproducibility. The foundation of scientific inquiry rests on the ability to replicate and validate results, which is why the weighting in this domain leans heavily towards training code, data, model weights, and detailed documentation. These elements are considered essential, not optional.

This emphasis is reflected in community efforts like the NeurIPS reproducibility checklist 1, which encourages open practices through formal requirements. The Model Openness Framework (MOF) by White et al. (2024) sharpens this perspective by distinguishing between completeness (the release of components like code, data, and weights) and openness (defined by the licenses attached to those components) 2. For a researcher, the ideal is both: a complete release under a license that enables unrestricted scientific use.

Legal

In the legal and compliance domain, openness is framed through regulatory obligations and licensing standards rather than through the release of raw training data. The EU AI Act 3 sets out obligations for providers of general-purpose AI (GPAI) models in Article 53: providers must prepare and maintain technical documentation. However, the Act does not require disclosure of raw training data; a detailed summary and documentation of data provenance and processing are considered adequate.

For models released under a free and open-source license with publicly available weights, architecture, and usage information, the Act offers a key exemption: the obligations to maintain technical and downstream documentation (Annexes XI and XII) do not apply. These models must still comply with copyright requirements and provide a training-data summary, and the exemption does not apply to models designated as posing systemic risk.

This exemption leads directly to a broader question: what counts as "open source" in the legal context? The EU AI Act does not define it explicitly, but the Open Source Initiative (OSI), a widely recognized authority on open-source licensing, recently introduced the Open Source AI Definition (OSAID 1.0). According to this definition, three components must be released under OSI-approved terms: (1) data information, meaning detailed documentation of training-data sources, provenance, and processing steps; (2) training and inference code; and (3) model parameters (weights). Like the EU AI Act, the OSI does not mandate sharing of raw datasets, only detailed documentation of data sources.

In sum, both the EU AI Act and the OSI definition hold that openness does not hinge on disclosing raw training data; the focus is instead on transparency through detailed documentation of data sources.
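The OSI definition's three required components can be read as a simple checklist. A minimal sketch, with component names paraphrased from the definition and the release record and helper function invented for illustration:

```python
# Hypothetical sketch of an OS AI Definition-style component checklist.
# The three required components follow the OSI definition summarized above;
# the release dict and helper function are invented for illustration.

REQUIRED_COMPONENTS = ("data_information", "code", "parameters")

def meets_osai_definition(release: dict[str, bool]) -> bool:
    """True only if all three components are released under OSI-approved terms."""
    return all(release.get(c, False) for c in REQUIRED_COMPONENTS)

# Openly released weights, but no data documentation or training code:
weights_only = {"parameters": True, "code": False, "data_information": False}
print(meets_osai_definition(weights_only))  # → False
```

Note that `data_information` stands for documentation of the data, not the raw dataset itself, mirroring the point that neither the EU AI Act nor the OSI requires raw-data disclosure.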

Finance and Healthcare

Both healthcare and finance operate in high-stakes, regulated environments where the cost of AI failures can be life-altering or economically devastating. In these domains, openness serves the purpose of risk mitigation, bias control, and traceable governance. While financial institutions and medical providers may not open-source their models, internal transparency is essential. Regulatory bodies require models to be explainable and compliant with privacy laws like HIPAA and GDPR. Auditability by internal teams and regulators is non-negotiable and is central to decisions like loan approvals or medical diagnoses. This shows that openness can also mean rigorous internal practices and verifiable documentation that satisfy regulatory, ethical, and societal expectations.

Security

For security-critical systems, openness is a delicate balance between transparency and risk. Full public openness might invite exploitation, which is why controlled internal transparency is key. The NIST AI Risk Management Framework 4 emphasizes the importance of secure system logging, traceable component pipelines, and robust internal documentation to manage and mitigate security-related risks. This approach prioritizes accountability over full public exposure. Real-world incidents such as the misuse of publicly available LLM endpoints in phishing and jailbreak attacks have pushed stakeholders towards a model of openness that favors accountability over full exposure 5.

Creative Industries (Music, Art, etc.)

Creative fields value openness primarily in terms of rights management, attribution, and content provenance. For these stakeholders, openness ensures fair use and ethical generation, not necessarily code transparency. This distinction is at the heart of the ongoing global conversation led by organizations like the World Intellectual Property Organization (WIPO) 6. WIPO's discussions on AI and intellectual property highlight the conceptual challenges that AI poses to traditional copyright law, particularly concerning authorship, originality, and the use of copyrighted works in training data. The creative community's definition of openness is therefore less about the free availability of a model's source code and more about a transparent, auditable process that supports ethical use and informed reuse.

Summary and the Pursuit of Scientific Openness

The different domains are not entirely separate. Their needs are often interconnected. For example, the legal community's demand for data provenance to address copyright claims (a legal concern) directly benefits researchers who need to audit for bias (an academic concern). The security requirements of the finance and healthcare sectors provide a foundation of trust that benefits the public as a whole.

At OSAI, we argue that while these different perspectives are all valid, they are best served by a commitment to scientific openness: a state where a model's creation, operation, and limitations can be fully understood, audited, and reproduced by the broader community. It is defined not just by ease of use or public access, but by the availability of the core components of the scientific process: the data, the code, the weights, and the methodology. By demanding this kind of openness, we empower all stakeholders to navigate the complexities of the technology with a foundation of transparency and trust.

This distinction is critical in an era of rising "false openness". Many models today are released under open licenses but with key components such as training data or code kept proprietary. This selective transparency may support limited use, but it blocks meaningful scrutiny: without full access, researchers cannot verify performance, regulators cannot assess risk, and creators cannot trace data provenance. This practice, sometimes called openwashing, undermines the very principles that openness is meant to uphold. For openness to matter and to serve the public interest, it must be complete, not conditional.

References

1 Pineau, J. et al. 2020. Improving reproducibility in machine learning research (JMLR, vol. 22)
2 White, M. et al. 2024. The Model Openness Framework: Promoting completeness and openness in AI (arXiv)
3 EU Artificial Intelligence Act
4 Tabassi, E. 2023. NIST AI Risk Management Framework (AI RMF 1.0)
5 Carlini, N. et al. 2023. Extracting Training Data from Diffusion Models (arXiv)
6 WIPO – World Intellectual Property Organization

Supported by the Centre for Language Studies and the Dutch Research Council. Website design & development © 2024 by BSTN. This version of the index generated 26 December 2025, website content last updated 12 January 2026.