EDITOR’S QUESTION
Historical progress in NLP evolved from structural to symbolic, to statistical, to (neural network) pre-trained language models (PLMs) and lastly to LLMs – lately we have seen techniques for the distillation of LLMs and the generation of small language models, but I don’t want to digress.
Language modeling before the era of deep learning focused on training task-specific models through supervision, whereas PLMs are trained through self-supervision with the aim of learning representations that are common across different NLP tasks. As the size of PLMs increased, so did their performance on tasks. That led to LLMs that significantly increased the number of their model parameters and the size of their training datasets.
GPT-3 was the first model to achieve, purely via text interaction with the model, “strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.”
Today’s LLMs accurately respond to task queries when prompted with task descriptions and examples. However, pre-trained LLMs fail to follow user intent and perform worse in zero-shot settings than in few-shot settings. Fine-tuning is known to enhance generalization to unseen tasks, significantly improving zero-shot performance. Other improvements relate to either task-specific training or better prompting.
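To make the zero-shot versus few-shot distinction concrete, the short Python sketch below builds both kinds of prompt for a simple sentiment task; the task description and worked examples are illustrative assumptions, not drawn from any specific model or paper.

    # Minimal sketch: building zero-shot and few-shot prompts for the same task.
    # The task wording and the example sentences are illustrative assumptions.
    task_description = "Classify the sentiment of the sentence as Positive or Negative."

    def zero_shot_prompt(sentence: str) -> str:
        # Zero-shot: only the task description and the query, no worked examples.
        return f"{task_description}\nSentence: {sentence}\nSentiment:"

    def few_shot_prompt(sentence: str, examples: list[tuple[str, str]]) -> str:
        # Few-shot: the same task description plus a handful of solved examples
        # that show the model the expected input/output format.
        demos = "\n".join(f"Sentence: {s}\nSentiment: {label}" for s, label in examples)
        return f"{task_description}\n{demos}\nSentence: {sentence}\nSentiment:"

    examples = [
        ("The service was quick and friendly.", "Positive"),
        ("The product broke after one day.", "Negative"),
    ]
    print(zero_shot_prompt("I would order from them again."))
    print(few_shot_prompt("I would order from them again.", examples))

In practice, the few-shot prompt tends to elicit better-formatted and more accurate answers from a pre-trained model, which is exactly the gap that instruction fine-tuning aims to close for the zero-shot case.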
The abilities of LLMs to solve diverse tasks with human-level performance come at the cost of slow training and inference, extensive hardware requirements and higher running costs.