Why?

Why are we so excited by a chatbot that babbles in natural language, while a program that calculates the exact time of the next solar eclipse leaves us completely indifferent?

“Generative AI”??

A deeply misleading expression…

  • AI?
    • not new
    • not a clear expression
  • generative?
    • any algorithm is “generative”

Turing and “originality”

A variant of Lady Lovelace’s objection states that a machine can ‘never do anything really new’. […] A better variant of the objection says that a machine can never ‘take us by surprise’. […] Machines take me by surprise with great frequency. This is largely because I do not do sufficient calculation to decide what to expect them to do, or rather because, although I do a calculation, I do it in a hurried, slipshod fashion, taking risks.

Turing, Computing Machinery and Intelligence

Why does guessing a word seem more exciting?

Easy to do, but difficult to model.

Scientific rationality vs human fuzziness

Science is knowledge which we understand so well that we can teach it to a computer; and if we don’t fully understand something, it is an art to deal with it. Since the notion of an algorithm or a computer program provides us with an extremely useful test for the depth of our knowledge about any given subject, the process of going from an art to a science means that we learn how to automate something.

Knuth, Computer Programming as an Art

So what’s the interest of LLMs?

  • Using them? NO SCIENTIFIC USE!
  • Studying them? (mmm perhaps…)

Using LLMs

Trendy corporate logic…

Measuring science on the basis of its applied results.

“It works well…”

What is science?

“There is a notion of success which has developed in computational cognitive science in recent years which I think is novel in the history of science.” Chomsky 2018. See transcript here

Science and understanding

Science is about understanding the world and giving an account of it.

So, using LLMs for science?

  • LLMs are stochastic, while science is deterministic
  • Using a tool in science means understanding exactly what it does
  • Scientific methodology is built ad hoc for each piece of research

Is chatting an exciting ability?

Turing is making fun of you, you dummy!

He is not saying that a machine can be intelligent, he is saying that humans are so stupid!

Studying LLMs

  • Understanding language

Confirm the distributional hypothesis?

“the meaning of a word is determined by, or at least strongly correlated with, the multiple (linguistic) contexts in which that word occurs (its ‘distribution’)” Gastaldi, Pellissier

How is the model made?

Input: Eiffel Tower is in <mask>.
Output: Paris 50%, France 40%, Europe 7%, …

or:

Input: Eiffel <mask> is in Paris.
Output: tower 80%, building 10%, …
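The output above is a probability distribution over the vocabulary, obtained by applying a softmax to the model’s raw scores (logits). A minimal sketch, with invented logits for three candidate fillers (the numbers are illustrative, not produced by any real model):

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a model might assign to candidate fillers
# for "Eiffel Tower is in <mask>." (values invented for illustration).
candidates = ["Paris", "France", "Europe"]
logits = [3.2, 3.0, 1.3]

for word, p in zip(candidates, softmax(logits)):
    print(f"{word}: {p:.0%}")
```

Whatever the logits are, the softmax output is non-negative and sums to 1, which is what lets us read it as the distribution shown above.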

Steps

  1. Tokenization: splitting the text into tokens (normally done with byte pair encoding)
  2. Encoding: transforming each token into a one-hot vector
  3. Embedding: transforming the one-hot vector into a dense vector describing the semantic position of the token
  4. Attention: adding vectors that represent relationships between the tokens of each sentence (the new mechanism introduced with transformers)
  5. Feed-forward: a multilayer perceptron that learns the final weights and produces the final hidden layer
  6. Normalization and softmax: normalising the result and applying a softmax to obtain the probability distribution
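Steps 2 and 3 can be sketched in a few lines. This is a toy model: the vocabulary, the whitespace tokenizer (standing in for byte pair encoding), and the random embedding matrix are all invented for illustration, where a real model learns the embeddings during training:

```python
import numpy as np

# Toy vocabulary and sentence; real models use byte pair encoding,
# this split-on-spaces tokenizer is a simplification.
vocab = ["eiffel", "tower", "is", "in", "paris"]
token_ids = [vocab.index(t) for t in "eiffel tower is in paris".split()]

# Step 2 - one-hot encoding: each token becomes a vector with a single 1.
one_hot = np.eye(len(vocab))[token_ids]   # shape (5, 5)

# Step 3 - embedding: a matrix maps one-hot vectors to dense semantic
# vectors (random here; learned by training in a real model).
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 4))      # 4-dimensional embeddings
embeddings = one_hot @ E                  # shape (5, 4)

# Multiplying a one-hot vector by E just selects the matching row of E:
assert np.allclose(embeddings[0], E[token_ids[0]])
```

The final assertion shows why one-hot encoding plus a matrix multiplication is equivalent to a simple table lookup, which is how embedding layers are actually implemented.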

Attention

\[\mathrm{Attention}(Q,K,V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V\]
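The formula translates directly into code. A minimal NumPy sketch of scaled dot-product attention, with toy random matrices for three tokens (the dimensions and data are invented for illustration):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns the scores into attention weights
    # (subtracting the row max is the standard numerical-stability trick).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Three tokens with 4-dimensional queries, keys and values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

out = attention(Q, K, V)   # shape (3, 4): one updated vector per token
```

Each output row is a weighted average of the value vectors, with weights given by how strongly that token’s query matches every key; this is the “adding some vectors to represent relationships between tokens” of step 4 above.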

Science?

Turning fuzziness into rationality