Week 1
- Linguistics
- Assignment (individual)
Week 2
- Machine learning. Concepts: laplacian → ... → neural networks
- Assignment (individual)
Week 3
- Combine
- Choice of projects as inspiration. Brief
Natural Language Processing:
- People are good at using language; machines... meh
- Language is not unique to humans (elephants, birds).
What is language:
- Areas: Broca's / Wernicke's in the brain.
- The whole brain is used to process language.
- Knowledge, content, planning, imagination.
- Velar closure: drinking vs. breathing. We can open/close very quickly.
- Animals are good with vowels.
- Started many thousands of years ago, but what was said is gone.
Written language:
- Written language is approx 3000.
- Not every language can be written.
- Not transient (no "transiency"): unlike speech, writing persists.
- Written language extends cognition (you are "more intelligent").
- Mapping speech to text.
Grand challenges:
- Machine translation.
- Sequence to sequence: I like your hat / J'aime ton chapeau.
- Humans can teach languages to each other well.
- Ground sound to actions: movements + objects + saying word.
Chinese room:
- Person in a room: receives language A as input. They don't understand it, but they can produce a translation by following production rules. Do they then know language B? Nope.
- Symbol manipulation ≠ consciousness.
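The room can be sketched as a pure lookup, to make the "rules without understanding" point concrete. This is only an illustration: the rule book below and the name `room_operator` are made up for the example, not something given in the lecture.

```python
# Hypothetical rule book: each incoming symbol maps to an outgoing symbol.
# The entries are illustrative assumptions, not from the lecture.
RULE_BOOK = {
    "I": "je",
    "like": "aime",
    "your": "ton",
    "hat": "chapeau",
}


def room_operator(symbols):
    """Replace each symbol by following the rule book; no meaning is involved."""
    # Unknown symbols come out as "?" because the operator has no rule for them.
    return [RULE_BOOK.get(symbol, "?") for symbol in symbols]


print(" ".join(room_operator("I like your hat".split())))
```

The operator produces output in language B without knowing either language, and a word-by-word rule book also misses things a speaker would handle (for example the elision in "J'aime").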
When will we get translation?:
- In two years, in five years...? Never?
- "Every time I fire a linguist, accuracy goes up"
Dialog:
- We prefer to communicate with dialog.
- Screen is black and white. Give instruction to computer: "PICK UP A BIG RED BLOCK". "OK".
Turing test:
- Decide whether the thing you are talking with is a human or a computer.
- Remove distracting factors: everything is mediated through a computer.
- Conversation through typing.
- Good questions for the test are the ones that involve context.
Semantic extraction:
- "Frames" - Minsky. Scripts that we learn (go to the store, pick up a basket, pick up items, pay...).
- Solves the problem of bureaucracy - interpreting legal documents.
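One possible way to picture a frame is as a record with an expected sequence of steps plus slots to fill in from the text. The sketch below assumes that reading; the class name `Frame`, the `missing_steps` helper, and the slot contents are hypothetical, and only the store steps come from the notes.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    steps: list  # expected order of events (the "script")
    slots: dict = field(default_factory=dict)  # details filled in from the text

    def missing_steps(self, observed):
        """Return the steps the script expects but the text never mentioned."""
        return [step for step in self.steps if step not in observed]


store = Frame(
    name="store visit",
    steps=["go to the store", "pick up a basket", "pick up items", "pay"],
)
store.slots["items"] = ["milk", "bread"]  # made-up slot values

# A text that skips the basket: the frame tells us what was left implicit.
print(store.missing_steps(["go to the store", "pick up items", "pay"]))
```

The point of the structure is that a reader (or a program) can infer unstated steps from the script instead of needing every event spelled out.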
Language generation:
- Summarisation: reduce a document.
- Descriptions: map + image + location + camera → describe.
- Journalism: sports and the stock market. Very standardized format.
- Timestamp + summary + prior context + ... + past event.
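Because the format is so standardized, generation for this kind of report can be approximated by filling a template. A minimal sketch, assuming only the four named fields (the "..." in the notes is left out); the template wording, `render_report`, and the record values are placeholders invented for the example.

```python
# Fixed template over the fields named in the notes.
TEMPLATE = "[{timestamp}] {summary} {prior_context} {past_event}"


def render_report(record):
    """Fill the template from a dict that has one entry per field."""
    return TEMPLATE.format(**record)


# Placeholder data, not a real match or a real source.
record = {
    "timestamp": "2024-01-01 18:00",
    "summary": "Team A beat Team B 2-1.",
    "prior_context": "Team A had lost its previous three matches.",
    "past_event": "The two sides last met a year ago.",
}

print(render_report(record))
```

A real system would still have to decide which facts go into each field, which is the filtering problem in the next section.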
Information filtering:
- What to include? There can be way too much.
Legal issues:
- Who owns the content?
- Who is responsible?
Question answering:
- In: question. Out: answer.
- Natural interrogation mode: "Where are you going?", "How tall is the Eiffel tower?"
- When you talk, it is less natural to speak in queries.
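A common first step in question answering is guessing what kind of answer the question wants. The sketch below assumes a simple prefix match on the question; the table `ANSWER_TYPES` and the function `expected_answer_type` are hypothetical and far cruder than a real system.

```python
# Hypothetical mapping from question openings to the kind of answer expected.
# Longer prefixes come first so "how tall" wins over any shorter match.
ANSWER_TYPES = [
    ("how tall", "measurement"),
    ("where", "location"),
    ("who", "person"),
    ("when", "time"),
]


def expected_answer_type(question):
    """Guess the answer type from the way the question starts."""
    normalized = question.strip().lower()
    for prefix, answer_type in ANSWER_TYPES:
        if normalized.startswith(prefix):
            return answer_type
    return "unknown"


print(expected_answer_type("Where are you going?"))
print(expected_answer_type("How tall is the Eiffel tower?"))
```

This only labels the question; finding the actual answer still needs a knowledge source, which is out of scope for the sketch.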