
Introducing the European Open Source AI Index

by Mark Dingemanse & Andreas Liesenfeld
24 February 2025

New generative AI models are popping up everywhere and claims about openness abound. When we launched Opening Up ChatGPT in July 2023, it was the first global openness index for instruction-tuned large language models. It soon featured over 50 models from more than 25 model providers. However, not everyone likes looking at enormous tables with more models and features than anyone can take in. Often, what people need is specific guidance on the best models to use in education, a comparison of specific models like Llama and BloomZ, or just a quick list of all models that provide source code as well as scientific documentation.

We designed the European Open Source AI index to facilitate this form of flexible information sharing. You can check out a growing number of guides, or sift through the model index using fulltext search and a comprehensive set of filters. We welcome your requests for guides and your suggestions for new models to include or data points to update. See here for how to contribute.

Read on to learn how to navigate today's Open Source AI landscape. Or just start exploring the index on your own.

Happy browsing!

Why openness needs a rethink

For the longest time, open-source software has been the key mental model for the notion of open source. Even though there is a large crop of licenses that differ on the finer points of redistribution, attribution, and so on, most people would agree that open source means the source of a piece of software can be openly inspected, adjusted, and redistributed. Developers like open source because it allows them to poke around to see how things work and to contribute improvements and bug fixes. Users like open source because it provides a degree of transparency and security that is unmatched by a lot of software from proprietary vendors. As Linus' Law has it, many eyeballs make all bugs shallow.

Today, the advent of complex machine learning models considerably complicates the picture. What exactly is the source of a large language model like Llama or a text-to-image model like Stable Diffusion? Some argue it is at least the source code — the actual computer code written to train and fine-tune the model. Others argue that the training data is just as crucial: without countless terabytes of text and imagery, usually scraped from the open web, none of these models would do anything. At the same time, the amount of computational power needed to train such models at all is astronomical, and therefore within reach of only a select few large companies. Is there a meaningful way in which such models can be reverse-engineered or redistributed?

Navigating the Open Source AI landscape

Today, openness is a moving target: single or simple definitions of "open source" won't suffice. Instead, we need more informed takes that allow us to distinguish the relevant degrees and dimensions of openness. One goal of the European Open Source AI index is to supply this information. The index is directly rooted in academic work on dimensions of openness in generative AI technology (Liesenfeld & Dingemanse, 2024; Solaiman, 2023). It aims to cut through the tangle of competing notions by recognising that openness is always a gradient and composite notion. What does that mean?

Openness is gradient. Some systems are more open than others. A commercial model provider like Meta (or Facebook AI Research) aggressively markets its Llama models as "open source", but very little about Llama is actually open except for the model weights — the most inscrutable component. Smaller-scale, research-focused labs like AllenAI provide models like OLMo that are much more open, as our index shows. The gradience of openness should make you wary of any simple claim of "open source AI". Inquiring minds want to know: How open is it?

Here are two models at opposite ends of our openness scale. Both bill themselves as "open source". Only one of them is. In views like this you can also click "compare" to see multiple models side by side. Here's a direct link to [compare Llama 3.3 and OLMo](/compare?models=OLMo,llama-3.3 "Comparison of Llama and OLMo"). A minimal code sketch of such a feature-by-feature comparison follows the parameter descriptions below.

Parameter descriptions:

Base Model Data
Are data sources for training the base model comprehensively documented and made available? Where the distinction between base (foundation) and end (user) model does not apply, this mirrors the end model data entry.
End User Model Data
Are data sources for training the model that the end user interacts with comprehensively documented and made available?
Base Model Weights
Are the weights of the base model made freely available? Where the distinction between base (foundation) and end (user) model does not apply, this mirrors the end model weights entry.
End User Model Weights
Are the weights of the model that the end user interacts with made freely available?
Training Code
Is the source code for data source processing, model training, and tuning made comprehensively available?
Code Documentation
Is the source code for data source processing, model training, and tuning comprehensively documented?
Hardware Architecture
Is the hardware architecture used for data source processing and model training comprehensively documented?
Preprint
Are archived preprint(s) available that detail all major parts of the system, including data source processing, model training, and tuning steps?
Paper
Are peer-reviewed scientific publications available that detail all major parts of the system, including data source processing, model training, and tuning steps?
Modelcard
Is a model card available in a standardized format that provides comprehensive insight into model architecture, training, fine-tuning, and evaluation?
Datasheet
Is a datasheet as defined in "Datasheets for Datasets" (Gebru et al. 2021) available?
Package
Is a packaged release of the model available on a software repository (e.g. the Python Package Index or Homebrew)?
API and Meta Prompts
Is an API available that provides unrestricted access to the model (beyond security and CDN restrictions)? Where applicable, this entry also collects information on the use and availability of meta prompts.
Licenses
Is the project fully covered by Open Source Initiative (OSI)-approved licenses, including all data sources and training pipeline code?
[Embedded index widget: openness grid of all indexed models, with Llama 3.3 (Meta) and OLMo (Ai2) at opposite ends of the scale; index last updated 12 Sep 2025]
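To make such a comparison concrete in code, here is a minimal sketch of how per-feature openness grades could be recorded and placed side by side. This is not the index's actual data model: the feature names follow the parameter descriptions above, the three-way open/partial/closed grading mirrors the spirit of the index's colour coding, and the specific grades shown for Llama 3.3 and OLMo are illustrative placeholders, not our published assessments.

```python
# Illustrative sketch only: not the index's actual data model, and the
# grades below are placeholders, not the index's published assessments.

FEATURES = [
    "Base Model Data", "End User Model Data",
    "Base Model Weights", "End User Model Weights",
    "Training Code", "Code Documentation", "Hardware Architecture",
    "Preprint", "Paper", "Modelcard", "Datasheet",
    "Package", "API and Meta Prompts", "Licenses",
]

# Hypothetical per-feature grades on a three-way open/partial/closed scale.
llama_3_3 = dict.fromkeys(FEATURES, "closed")
llama_3_3["End User Model Weights"] = "open"   # weights are downloadable
llama_3_3["Modelcard"] = "partial"

olmo = dict.fromkeys(FEATURES, "open")          # open on (nearly) every axis

def side_by_side(name_a, a, name_b, b):
    """Print a feature-by-feature comparison, like the index's compare view."""
    print(f"{'Feature':<26}{name_a:<12}{name_b:<12}")
    for feature in FEATURES:
        print(f"{feature:<26}{a[feature]:<12}{b[feature]:<12}")

side_by_side("Llama 3.3", llama_3_3, "OLMo", olmo)
```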

Openness is composite. Openness is composed of multiple elements. This means it is more like press freedom than like temperature. The World Press Freedom Index ranks countries by their press freedom, but this itself takes into account measures on multiple dimensions, including political context, sociocultural context, and legal framework. Likewise, the European Open Source AI index gathers information on the openness of generative AI systems in terms of three broad categories (each in turn composed of finer-grained features): availability, documentation, and access. Recognising the composite nature of openness makes it possible to be concrete about what is open about a model and to what extent. It allows us to answer the question: How is it open?
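As a rough illustration of how composite, per-feature grades roll up into a gradient, the sketch below groups feature grades into the three categories and averages them. The feature-to-category grouping and the equal weighting are assumptions made for this example; they are not the index's scoring method.

```python
# Illustrative sketch: roll hypothetical per-feature grades up into
# category scores and a single point on an openness gradient. The
# grouping and equal weighting are assumptions, not the index's method.

GRADE_POINTS = {"open": 1.0, "partial": 0.5, "closed": 0.0}

CATEGORIES = {
    "availability": ["Base Model Data", "End User Model Data",
                     "Base Model Weights", "End User Model Weights",
                     "Training Code", "Licenses"],
    "documentation": ["Code Documentation", "Hardware Architecture",
                      "Preprint", "Paper", "Modelcard", "Datasheet"],
    "access": ["Package", "API and Meta Prompts"],
}

def category_scores(grades):
    """Mean feature score per category (0 = fully closed, 1 = fully open)."""
    return {
        category: sum(GRADE_POINTS[grades[f]] for f in features) / len(features)
        for category, features in CATEGORIES.items()
    }

def overall_openness(grades):
    """Unweighted mean over categories: one point on the openness gradient."""
    scores = category_scores(grades)
    return sum(scores.values()) / len(scores)

# A hypothetical mostly-partial system with a restrictive license:
example = {f: "partial" for fs in CATEGORIES.values() for f in fs}
example["Licenses"] = "closed"
print(category_scores(example))             # per-category scores
print(round(overall_openness(example), 2))  # single gradient value, ~0.47
```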

Hovering over any model in our index will display the evidence we have of its openness across the three dimensions, further broken up into 15 features. Here's a grid view of OLMo's openness features:

[Grid view: OLMo's openness across all features, as defined in the parameter descriptions above; index last updated 12 Sep 2025]

Although text-based LLMs are the most numerous, we also include a range of other model types: image, code, video, and audio models.

Further reading

  • Liesenfeld, A., & Dingemanse, M. (2024). Rethinking open source generative AI: open-washing and the EU AI Act. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). doi: 10.1145/3630106.3659005
  • Solaiman, I. (2023). The Gradient of Generative AI Release: Methods and Considerations. Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 111–122. doi: 10.1145/3593013.3593981

Supported by the Centre for Language Studies and the Dutch Research Council. Website design & development © 2024 by BSTN. This version of the index was generated 12 Sep 2025; website content last updated 04 Sep 2025.