No specific accounting or listing of training data is available. The data are described only in broad terms as 'web-crawled data and structured datasets with a total size of 7.7T, with a cutoff date 04/2023', alongside 'some additional web scraping'.
No data disclosed beyond a generic reference to 'source-available, commercially usable datasets, as well as self-created and procured proprietary datasets'.
Aleph Alpha states that Pharia was trained using its Scaling code base, which it has made available as a repository mirrored from an undisclosed source; no repository specifically documenting the training or instruction-tuning of Pharia was found.