I'd like to explore the possibility of training an LM on a specific programming language so it can be used as a co-pilot in that context. The language is a niche one, Pharo (http://pharo.org), and no existing model knows it today (also, I want to make some extra tweaks once I have the model).
Thing is... I have no idea where to start! :)
Yeah, there is an existing model: GPT-4 is very familiar with Pharo. And you're not going to be able to train anything better than that yourself. OpenAI has said it cost upwards of a hundred million dollars to train the GPT-4 model. I assume you don't have a budget like that?
Other major models are probably pretty good at it too, but in my experience GPT-4 seems to be the best one, especially for code generation. So I recommend starting with that one.
Sign up for ChatGPT Plus so you can use GPT-4 instead of GPT-3.5 (GPT-4 is a lot better), and just say "explain this Pharo code: " and then paste in a block of code.
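If you end up doing this a lot, the prompt is easy to script. Here's a minimal Python sketch that only builds the prompt text. `build_explain_prompt` is a made-up helper name and the Pharo snippet is just an illustration, so treat both as assumptions; you'd still paste the result into ChatGPT or send it through whatever client you use.

```python
def build_explain_prompt(code: str, language: str = "Pharo") -> str:
    """Return a chat prompt asking the model to explain a code snippet.

    Hypothetical helper: it only formats text and does not call any API.
    """
    snippet = code.strip()
    if not snippet:
        raise ValueError("code must not be empty")
    # Same wording as the manual prompt, followed by the pasted code.
    return f"Explain this {language} code:\n\n{snippet}\n"


# Illustrative Pharo snippet (an assumption, not taken from the Pharo website).
pharo_snippet = """
| sum |
sum := 0.
#(1 2 3) do: [:each | sum := sum + each].
Transcript show: sum printString; cr.
"""

prompt = build_explain_prompt(pharo_snippet)
print(prompt)
```

Keeping the prompt in one function also makes it easy to swap the wording later, e.g. to ask for improvements instead of an explanation.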
Here's an example of a simple chat I had about Pharo with GPT-4. I started by asking it to explain some sample code I found on the Pharo website, then I asked if the code could be improved (it suggested some good improvements), and then I asked it how to add a feature: