Natural Language Processing, Artificial Intelligence

New AI milestone — a boon or a threat?

“Eradicating humanity seems like a rather useless endeavor to me” ~ GPT-3

Table of Contents:

  1. What is GPT-3?
  2. How does it work?
  3. What can it do?
  4. Possible market disruption by GPT-3

What is GPT-3?

The internet is going crazy about a new interactive tool called GPT-3 (Generative Pre-trained Transformer 3). It is the third generation of human-mimicking language algorithms, capable of writing convincing articles and auto-completing text.

For years, humans have been chasing the dream of developing an algorithm that can imitate human learning, and now we are another step closer to that milestone.

OpenAI (a former non-profit organization) has been able to achieve such a milestone because of its gigantic amount of computational power and data. GPT-3 comprises 175 billion parameters, roughly 10x more than the former record holder, Microsoft's Turing NLG, and 116x more than the previous GPT version.

GPT-2 vs GPT-3, Source

The GPT-3 model is so computationally expensive that Microsoft built a dedicated machine for it, with 285,000 CPU cores, 10,000 GPUs, and 400 gigabits per second of network connectivity for each GPU server. Compared with its contemporaries, it would rank fifth on the list of the world's fastest supercomputers.

Although OpenAI's algorithms have been open to the public in the past, the firm has opted to restrict access to GPT-3, explaining that "the model is too large for the common public to run on their computers." Moreover, in the preceding months OpenAI altered its corporate structure to a "capped-profit" model to lure more investors, and following that decision it has chosen to monetize its GPT-3 research. Microsoft and OpenAI have also entered a $1 billion deal giving OpenAI priority access to Azure, Microsoft's cloud computing platform.

Compared to the human brain's estimated 100–1,000 trillion synapses, GPT-3's 175 billion parameters are still modest, but they are a large step up from Microsoft's Turing NLG, released in 2019. Recently, a report suggested Microsoft has developed a supercomputer capable of handling a trillion parameters; with parameters and computing power increasing exponentially, it is quite possible that models will mimic the human brain, at least in parameter count, if not intelligence, within a few years.

How Does it Work?

Before we dive into the technicalities of GPT-3, a few terms are important for understanding its workflow:

  • NLP Tasks: tasks that imitate human use of language, including language translation, reading comprehension, recognizing persons, locations, and organizations in a sentence (Named Entity Recognition), and extracting sentiment from text.
  • Language Models: models that predict the next most likely word given a sequence of words. In simpler terms, a language model generates language probabilistically and can be trained on any text data; the autocomplete feature in phone keyboards and Google Search is a familiar example.
  • Transfer Learning: a deep-learning technique in which a model trained on one task is fine-tuned so that it can be reused for other tasks.
  • Zero/One/Few-shot Learning: GPT-3's ability to learn a task from just a task description and zero, one, or a few similar examples.
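The "next most likely word" idea behind language models can be sketched with a toy bigram model. This is a deliberately tiny stand-in for GPT-3's transformer: it only counts which word follows which in a made-up corpus, but the probabilistic principle is the same.

```python
from collections import Counter, defaultdict

# Toy corpus; GPT-3's real training data is most of the internet.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each word, which words follow it and how often.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word after `word`, or None if unseen."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # "cat" — it follows "the" most often
```

A real language model replaces the count table with a neural network conditioned on the whole preceding context, but both assign probabilities to possible next words and pick from them.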

There are two main types of learning in Machine Learning: supervised and unsupervised.

Supervised learning relies on enormous amounts of labeled data: we let the model learn from the given data and produce output on unseen inputs. It can yield good results and do a solid job of mimicking humans, but it requires a mammoth amount of labeled data.

And supervised learning isn't really how humans amass knowledge. We know a million things that nobody ever explicitly labeled for us; in simpler words, we do a lot of unsupervised learning.

GPT-3 learns through unsupervised learning: it is fed much of what we have written in human language, namely the Common Crawl dataset (most of the public internet), the Wikipedia dataset, and WebText.

Source: Paper

Talking about its workflow, GPT-3 still uses the transformer model, with many attention layers and a never-ending stream of data. In numbers, it contains 96 attention layers, a batch size of 3.2 million, and an unimaginable 175 billion parameters.
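The attention layers mentioned above all compute scaled dot-product attention. A minimal NumPy sketch (with toy dimensions, nothing like GPT-3's actual sizes) shows the core operation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # similarity of each query to each key
    weights = softmax(scores, axis=-1)  # each row is a probability distribution
    return weights @ V                  # weighted mixture of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 positions, 8-dimensional toy embeddings
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8): one mixed vector per input position
```

A full transformer layer adds learned projections for Q, K, and V, many attention heads in parallel, and a feed-forward sublayer; GPT-3 stacks 96 such layers.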

In the traditional approach to language models, we pre-train on data and then fine-tune the trained model so that it can perform a particular task. But with GPT-3's extraordinarily vast dataset, no gradient updates or fine-tuning are required: the model just needs a task description and some examples at inference time.

Zero-Shot learning:

In zero-shot learning, the model is given only the task description and has to realize that it has seen something like this before in order to perform the task.

One-Shot learning:

In one-shot learning, we provide a single example along with the task description, making the task's intent clearer; again, no gradient updates are required.

Few-Shot learning:

In few-shot learning, we provide a few examples along with the task description, making the model better able to generalize.
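The three settings differ only in how the prompt is assembled before it is handed to the model. A sketch of the prompt formats (the English-to-French pairs are illustrative examples, not an API call):

```python
def build_prompt(task_description, examples, query):
    """Assemble a GPT-3-style prompt: description, k solved examples, then the query."""
    lines = [task_description]
    for source, target in examples:   # k = 0 (zero-shot), 1 (one-shot), or a few
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")       # the model continues from here
    return "\n".join(lines)

desc = "Translate English to French:"
shots = [("cheese", "fromage"), ("sea otter", "loutre de mer")]

zero_shot = build_prompt(desc, [], "cheese")
one_shot = build_prompt(desc, shots[:1], "sea otter")
few_shot = build_prompt(desc, shots, "plush giraffe")
print(few_shot)
```

In every case the model's weights stay frozen; "learning" happens entirely within the prompt, which is why no gradient updates are needed.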

Source: Paper

As expected, few-shot learning yields the highest accuracy of the three settings.

Source: Paper

In a traditional state-of-the-art (SOTA) architecture, the model is given the whole input and, when required, goes back to its training data, matches patterns to come up with examples, and interpolates. In GPT-3, it is thought that the algorithm stores its training data in its connections, i.e., in its weights, which is possible given such a large number of parameters. When we need the model to perform a task, it effectively goes to the training data stored in its weights, pulls out a few relevant examples, and integrates them.

What Can it Do?

You can use GPT-3's uncanny capabilities to hold an imaginary conversation with a bot, summarize a movie, write code, answer medical questions, and simplify things, or complicate them.

Some of the mentioned capabilities include:

  • Language Translation: since much of the internet is written in English, GPT-3 shows surprisingly good accuracy when translating into English compared with other languages.
Source: Paper
  • Winograd-Style Tasks: a traditional NLP benchmark that requires determining which word a pronoun refers to.
Source: Paper
  • Common-sense Reasoning: This includes solving physical or scientific reasoning questions.
Results on different datasets, Source: Paper
  • Arithmetic Calculations: along with common-sense reasoning, GPT-3 is capable of solving addition and subtraction problems with numbers of up to five digits.
Source: Paper
  • SAT Analogies: the SAT is a college-level entrance exam consisting of multiple-choice questions. Surprisingly, GPT-3 achieved 65.7% accuracy with few-shot learning, whereas the average human score is 57%.
Source: Paper
  • News Article/Poem Generation: the most hyped feature of GPT-3 is its automatic generation of news articles, texts, or poems from just a few prompts.
Poem written for Elon Musk by GPT-3

Possible Market Disruption by GPT-3

GPT-3's arrival in the market can surely cut the strings for a few journalists, programmers, and content writers, but there is still a long way to go.

For a long time, humans have been trying to replicate their own intellect with algorithms, and the majority believes that this is hard to do and might require a precise understanding of the human mind, its activities, and its consciousness. But a minority argues that human-level intelligence could naturally arise in machines given a lot more computing power.

GPT-3's success surely bolsters the minority's ideology, as its technical stack isn't splendid; it still builds on the 2018 GPT-2 architecture, just with a lot more data.

Moreover, testers have found that GPT-3 produces a lot of nonsense even for reasonable questions, not because it doesn't know the answer, but because the model considers the wrong answer plausible. In simpler words, GPT-3 is biased.

And yet, despite all its demerits, GPT-3 is nothing less than a new milestone for AI, and it might prove to be a miracle when combined with other technologies. The quest to generate human-like intelligence will go on, at least for another few years.


[1]. Language Models are few-shot learners.

[2]. GPT-3, explained: This new language AI is uncanny, funny — and a big deal

[3]. GPT-3: Language Models are Few-Shot Learners (Paper Explained)


Want to learn more?

Detecting COVID-19 Using Deep Learning

The Inescapable AI Algorithm: TikTok

Tinder+AI: A perfect Matchmaking?

An insider’s guide to Cartoonization using Machine Learning

Why are YOU responsible for George Floyd’s Murder and Delhi Communal Riots?

Reinforcing the Science Behind Reinforcement Learning

Decoding science behind Generative Adversarial Networks

Understanding LSTM’s and GRU’s

Recurrent Neural Network for Dummies

Convolution Neural Network for Dummies

Diving Deep into Deep Learning

Why Choose Random Forest and Not Decision Trees

Clustering: What it is? When to use it?

Start off your ML Journey with k-Nearest Neighbors

Naive Bayes Explained

Activation Functions Explained

Parameter Optimization Explained

Gradient Descent Explained

Logistic Regression Explained

Linear Regression Explained

Determining Perfect Fit for your ML Model