From the Transformer architecture to ChatGPT

Date: October 17, 2023, at 4:30 PM (Italian time)
Where: Aula Voci, Physics and Astronomy Department, Via Francesco Marzolo 8, 35131 Padova

Large Language Models (LLMs) have risen to prominence in recent years thanks to their ability to accurately model human natural language, solve reasoning tasks, and effectively assist humans via a chat interface.
In my talk I will introduce the core neural network architecture behind LLMs, the Transformer. I will then present the main lines of research and insights that led to the successful scaling of Transformer language models up to hundreds of billions of parameters. Finally, I will discuss the main ideas that let us turn a large language model into a useful language assistant such as ChatGPT.

Speaker

Nicola Dainese

Nicola Dainese graduated in Physics of Data in 2020. He is now a PhD candidate in Computer Science (Deep Learning and Artificial Intelligence) at Aalto University, Finland. His areas of expertise are reinforcement learning and language models.