An implementation of model-parallel GPT2 and GPT3-like models, with the ability to scale up to full GPT3 sizes (and possibly more!), using the mesh-tensorflow library.

