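The "attention" system works by having an "attention" function that takes a "query" vector plus a set of "key" vectors, each paired with a "value" vector, and outputs a single vector. The function scores the query against every key (usually with a dot product), turns those scores into weights with a softmax, and returns the weighted average of the values. In the usual transformer version, the learnable parameters are not in the scoring step itself but in the linear projections that produce the queries, keys, and values. Conceptually, it works like a soft lookup: you "query" against the keys, and the values whose keys are most "similar" to the query dominate the output.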
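
To make the soft-lookup idea concrete, here is a minimal sketch in plain Python, assuming the common scaled dot-product form for a single query. The function name `attention` and the toy numbers are illustrative only, and the learned projections that would normally produce the query, keys, and values are left out.

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query (illustrative sketch)."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(dimension).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax over the scores (subtract the max for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the weighted average of the value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

# Toy example: the query points strongly along the first key,
# so the output lands very close to the first value.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
out = attention([10.0, 0.0], keys, values)
```

With a query that matches no key better than another (for example all zeros), the weights become uniform and the output is just the plain average of the values.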
