No datasets made available, and no dataset information disclosed beyond generic claims about filtering for high quality. Large amounts of synthetic data used.
No post-training datasets made available, and no dataset information disclosed beyond generic claims about filtering for high quality. Large amounts of synthetic data used.