
Parameter descriptions:

Base Model Data
Are the data sources for training the base model comprehensively documented and freely made available? If a distinction between a base (foundation) model and an end (user) model is not applicable, this mirrors the end model data entry.
End User Model Data
Are the data sources for training the model that the end user interacts with comprehensively documented and freely made available?
Base Model Weights
Are the weights of the base model made freely available? If a distinction between a base (foundation) model and an end (user) model is not applicable, this mirrors the end model weights entry.
End User Model Weights
Are the weights of the model that the end user interacts with made freely available?
Training Code
Is the source code for data source processing, model training, and tuning comprehensively and freely made available?
Code Documentation
Is the source code for data source processing, model training, and tuning comprehensively documented?
Hardware Architecture
Is the hardware architecture used for data source processing and model training comprehensively documented?
Preprint
Are archived preprint(s) available that detail all major parts of the system, including data source processing, model training, and tuning steps?
Paper
Are peer-reviewed scientific publications available that detail all major parts of the system, including data source processing, model training, and tuning steps?
Modelcard
Is a model card in a standardized format available that provides comprehensive insight into model architecture, training, fine-tuning, and evaluation?
Datasheet
Is a datasheet as defined in "Datasheets for Datasets" (Gebru et al. 2021) available?
Package
Is a packaged release of the model available on a software repository (e.g. the Python Package Index or Homebrew)?
API and Meta Prompts
Is an API available that provides unrestricted access to the model (apart from security and CDN restrictions)? Where applicable, this entry also collects information on the use and availability of meta prompts.
Licenses
Is the project fully covered by Open Source Initiative (OSI)-approved licenses, including all data sources and training pipeline code?

RedPajama

by Together Computer

An open AI model developed as a collaboration among various open-source entities.
Text
Limited
https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat
RedPajama-INCITE-7B-Base
RedPajama-INCITE-7B-Chat
Apache-2.0
Together Computer, a cloud platform for generative AI.
https://together.ai/
March 2023
Availability
Training Code
Code for the datasets is made available in exemplary ways; code for training and tuning is harder to find.
https://github.com/togethercomputer/redpajama.cpp/tree/master/examples/redpajama
Base Model Data
RedPajama-Data-1T made available on HuggingFace
https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T
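For illustration, the corpus can be read with the Hugging Face datasets library. The sketch below is a minimal example, assuming the "arxiv" subset name and a "text" field as shown on the dataset card (these names are assumptions; older datasets versions may also need trust_remote_code=True for the loading script):

    # Minimal sketch: stream a few records from one subset rather than
    # downloading the full ~1T-token corpus. Subset and field names are
    # assumptions; consult the dataset card for the authoritative list.
    from datasets import load_dataset

    ds = load_dataset(
        "togethercomputer/RedPajama-Data-1T",
        "arxiv",          # assumed subset name
        split="train",
        streaming=True,   # avoid a full local download
    )
    for example in ds.take(3):
        print(example["text"][:200])  # "text" field assumed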
End User Model Data
The model was trained on a large collection of diverse data, including the Chain-of-Thought (CoT) collection, the Public Pool of Prompts (P3) dataset, and the Natural Instructions (NI) dataset. Chat tuning used Databricks Dolly and OASST1.
https://huggingface.co/datasets/togethercomputer/RedPajama-Data-Instruct
https://huggingface.co/datasets/databricks/databricks-dolly-15k
https://huggingface.co/datasets/OpenAssistant/oasst1
Base Model Weights
The base model is RedPajama-INCITE-7B-Base.
https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Base
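As a quick sketch, the base weights can be loaded directly from the Hub using the standard transformers pattern (the model card shows a similar recipe; the dtype choice here is illustrative):

    # Minimal sketch of loading the base weights with transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "togethercomputer/RedPajama-INCITE-7B-Base"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.float16,  # half precision keeps the 7B model within ~16 GB
    )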
End User Model Weights
An instruction-tuned version was made available in parallel with the base version.
https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat
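A minimal generation sketch for the chat checkpoint, assuming the "<human>: ... <bot>:" prompt format described on the model card (the sampling parameters are illustrative, not the authors' recommendation):

    # Minimal sketch: prompt the chat-tuned model in its documented format.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "togethercomputer/RedPajama-INCITE-7B-Chat"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float16
    ).to("cuda")  # requires a GPU

    prompt = "<human>: What is open-source AI?\n<bot>:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,  # illustrative sampling settings
        top_p=0.7,
    )
    # Print only the newly generated continuation.
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                           skip_special_tokens=True))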
Documentation
Code Documentation
Code for the base LLM and the instruction-tuning datasets is documented in exemplary fashion; code specifying training and fine-tuning is sparsely documented.
https://github.com/togethercomputer/redpajama.cpp/tree/master/examples/redpajama
Hardware Architecture
The architecture is detailed on the model card; crucial parts appear to be forked from GPT-NeoX.
https://together.ai/blog/redpajama
Preprint
No preprint found.
Paper
No paper found.
Modelcard
The model card and readme provide details on the datasets and training procedure.
https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat
Datasheet
The base datasheet includes links to the data and recipes to recreate it from scratch. The other datasets are well documented.
https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T
https://huggingface.co/datasets/togethercomputer/RedPajama-Data-Instruct
https://huggingface.co/datasets/databricks/databricks-dolly-15k
https://huggingface.co/datasets/OpenAssistant/oasst1
Access
Package
No separate package found.
API and Meta Prompts
Hosted inference API available through HuggingFace.
https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Instruct
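For illustration, the hosted endpoint can be queried over HTTP. The sketch below assumes the standard api-inference URL pattern and a user-supplied access token (the token shown is a placeholder):

    # Minimal sketch: query the Hugging Face hosted inference API.
    import requests

    API_URL = ("https://api-inference.huggingface.co/models/"
               "togethercomputer/RedPajama-INCITE-7B-Instruct")
    headers = {"Authorization": "Bearer hf_..."}  # placeholder token

    response = requests.post(
        API_URL,
        headers=headers,
        json={"inputs": "Q: What is RedPajama?\nA:",
              "parameters": {"max_new_tokens": 64}},
    )
    print(response.json())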
Licenses
The models are licensed under Apache 2.0, but note that the data itself is variably licensed and so imposes some limitations.
https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Instruct/blob/main/README.md
