Artificial Intelligence Asked by parzival on August 24, 2021
I’ve been hearing a lot about GPT-3 by OpenAI, and that it is a simple-to-use API with text in, text out, and that it has a big neural network of 175B parameters.
But how did they achieve this huge number of parameters, and why is it being predicted as one of the greatest innovations?
The main point in GPT-3, and already in GPT-2, was the observation that performance increases steadily with model size (as seen in Figure 1.2 of the linked paper). So it seems that while all the progress made in NLP techniques was definitely useful, it is also important to simply scale up the model.
This may not seem like a surprising point, but it actually kind of is. Normally, performance would saturate, or at least the gains would taper off, but this is not the case. So the main innovation may not be that big, and it is somewhat brute-force, but the point still stands: bigger models are better.
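As a rough illustration (my own sketch, not something from the paper): the scaling observation is often summarized as validation loss falling like a power law in the parameter count N, roughly L(N) ≈ (N_c / N)^α, rather than flattening out. The constants below are placeholders in the range reported in the scaling-laws literature and are only meant to show the shape of the curve.

```python
# Illustrative only: loss keeps improving as a power law in parameter count N,
# L(N) ~ (Nc / N) ** alpha, instead of saturating. Constants are placeholders.
import numpy as np

Nc, alpha = 8.8e13, 0.076                            # hypothetical fit constants
params = np.array([1.5e8, 1.5e9, 1.3e10, 1.75e11])   # GPT-2-small ... GPT-3 scale

loss = (Nc / params) ** alpha
for n, l in zip(params, loss):
    print(f"{n:9.2e} params -> predicted loss {l:.2f}")
```

The point of the curve is simply that each order of magnitude in parameters still buys a noticeable drop in loss, which is what made scaling to 175B worthwhile.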
Another point worth mentioning is how they did the training. A model this large needs some tricks to be trained at all (and reasonably fast). You also want to make use of multiple GPUs for parallel training. This means they also had to develop new training infrastructure.
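A minimal sketch of the idea (my own illustration in PyTorch, not OpenAI's actual code): when the weights do not fit on one GPU, the layers are split across devices and the activations are moved between them, on top of splitting batches across replicas for data parallelism.

```python
# Toy model parallelism: split a model's layers across two GPUs because the
# full set of weights does not fit on one device. Not OpenAI's actual setup.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # first half of the layers on GPU 0, second half on GPU 1
        self.stage1 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
        self.stage2 = nn.Linear(4096, 1024).to("cuda:1")

    def forward(self, x):
        x = self.stage1(x.to("cuda:0"))
        return self.stage2(x.to("cuda:1"))   # move activations between devices

model = TwoStageModel()
out = model(torch.randn(8, 1024))            # output tensor lives on cuda:1
```

In practice this is combined with data parallelism and pipeline scheduling so that GPUs are not sitting idle while waiting on each other.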
As for why exactly it is predicted to be a huge innovation, that may mostly come down to some Twitter demonstrations; there are no real sources on this as far as I know, especially because the model is not openly available.
Answered by N. Kiefer on August 24, 2021