StableVicuna-13B is a Vicuna-13B v0 model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instructional datasets
From the documentation 'The reward model used during RLHF was also trained on OpenAssistant Conversations Dataset (OASST1) along with two other datasets Anthropic HH-RLHF, a dataset of preferences about AI assistant helpfulness and harmlessness; and Stanford Human Preferences Dataset a dataset of 385K collective human preferences over responses to questions/instructions in 18 different subject areas, from cooking to legal advice.'
Based on LLaMA weights, which are not openly available though a leaked versions is in wide circulation.
End User Model Weights
Model not functional out of the box as weights require a delta computation. From the docs 'StableVicuna-13B cannot be used from the CarperAI/stable-vicuna-13b-delta weights alone. To obtain the correct model, one must add back the difference between LLaMA 13B and CarperAI/stable-vicuna-13b-delta weights.'
Training code not made public, only code for applying the delta.
Documentation
Code Documentation
Code is minimally documented and deployment requires non-trivial configuration, e.g. 'StableVicuna-13B cannot be used from the CarperAI/stable-vicuna-13b-delta weights alone. To obtain the correct model, one must add back the difference between LLaMA 13B and CarperAI/stable-vicuna-13b-delta weights.'
CC-BY-NC-SA-4.0. License for LLaMA is more murky, hence partial. As they say 'License for the base LLaMA model's weights is Meta's non-commercial bespoke license.'