European Open Source AI Index

Geitje

by Bram Vanroy

Dutch instruction-tuned model based on Mistral 7B
Text
Limited
https://huggingface.co/BramVanroy/GEITje-7B-ultra
Mistral-7B-v0.1
GEITje-7B-ultra
CC-BY-ND
Independent model creator.
https://bramvanroy.github.io/
January 2024
Availability
Base Model Data
Mistral provides no documentation of any of its pretraining data. Geitje Ultra 7B is based on Geitje 7B, which does disclose that Dutch pretraining data includes Gigacorpus and MADLAD.
https://huggingface.co/mistralai/Mistral-7B-v0.1/discussions/8
https://huggingface.co/Rijgersberg/GEITje-7B#geitje--trained-further-on-dutch-texts
End User Model Data
Ultrafeedback Dutch (synthetic)
https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch
Base Model Weights
Weights available through HuggingFace.
https://huggingface.co/mistralai/Mistral-7B-v0.1
End User Model Weights
Instruction-tuned model formerly made available through HuggingFace, now taken down
https://huggingface.co/BramVanroy/GEITje-7B-ultra/tree/main
Training Code
Mistral has limited source code available. No training code for Geitje found apart from the recipe used with the alignment handbook.
https://huggingface.co/BramVanroy/GEITje-7B-ultra#training-procedure
Documentation
Code Documentation
A collection of code is available in the GitHub repository, though it is not clearly documented or commented.
https://github.com/mistralai/mistral-inference/tree/main/src/mistral_inference
Hardware Architecture
Some information on architecture is provided in the GitHub repo and the HF model card. Training was done using the alignment handbook.
https://huggingface.co/BramVanroy/GEITje-7B-ultra
Preprint
Preprint published on arXiv.
https://arxiv.org/abs/2312.12852v1
Paper
No peer-reviewed paper found
Modelcard
Modelcard on HF provides information on fine-tuning but none on the Mistral base LLM.
https://huggingface.co/BramVanroy/GEITje-7B-ultra
Datasheet
Datasheet available for the DPO dataset and for the Dutch portions of the pretraining data, but not for the original Mistral pretraining data; hence partial.
https://huggingface.co/datasets/BramVanroy/ultra_feedback_dutch
Access
Licenses
Licensed as CC-BY-ND-4.0 on HuggingFace, though no specific license file or statement found
https://huggingface.co/BramVanroy/GEITje-7B-ultra
Supported by the Centre for Language Studies and the Dutch Research Council. Website design & development © 2024 by BSTN. This version of the index generated 09 April 2026, website content last updated 11 March 2026.