European Open Source AI Index

AlchemistCoder

by Shanghai AI Laboratory

Open code model fine-tuned on a harmonized mixture of data from different sources. Multiple versions exist, built on different base models.
Code
Full
Model: https://huggingface.co/internlm/AlchemistCoder-DS-6.7B
Base model: DeepSeek-Coder-6.7B-Base
End-user model: AlchemistCoder-DS-6.7B
License: Apache-2.0
Organization: National-level Chinese research institute.
https://www.shlab.org.cn/
Date: May 2024
Availability
Base Model Data
GitHub is named as the primary source of code data; the rest of the data mixture is described only in general terms.
https://arxiv.org/pdf/2401.14196
End User Model Data
The model is trained on both open-source and synthetic data. The open-source datasets are described in the paper, but the generated synthetic data is not released.
https://arxiv.org/pdf/2405.19265
Base Model Weights
Weights are available through Hugging Face.
https://huggingface.co/deepseek-ai/deepseek-coder-6.7b-base
End User Model Weights
Weights are available through Hugging Face.
https://huggingface.co/internlm/AlchemistCoder-DS-6.7B
Training Code
A GitHub repository exists that purportedly hosts the source code, but it contains no code.
https://github.com/InternLM/AlchemistCoder/
Documentation
Code Documentation
No code available.
https://github.com/InternLM/AlchemistCoder/
Hardware Architecture
No hardware architecture outlined.
Preprint
Preprint made available on arXiv.
https://arxiv.org/pdf/2405.19265
Paper
Paper published at NeurIPS.
https://dl.acm.org/doi/abs/10.5555/3737916.3737987
Modelcard
The model card provides limited information, mainly a description of the model and usage instructions.
https://huggingface.co/internlm/AlchemistCoder-DS-6.7B
Datasheet
Datasheets are available for some of the data sources; however, the synthetic data is not publicly available.
https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1
https://huggingface.co/datasets/codefuse-ai/CodeExercise-Python-27k
https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1
Access
Licenses
Model licensed under Apache-2.0.
https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md
Supported by the Centre for Language Studies and the Dutch Research Council. Website design & development © 2024 by BSTN. This version of the index generated 09 April 2026, website content last updated 11 March 2026.