@snacks @ceo_of_monoeye_dating @protos >RAG
well, it works. you fingerprint the input with bge3 or similar, query a vector store for the top-n most similar chunks, maybe also pull keyword matches (hybrid search is pretty much the norm now), and then let the LLM reinterpret the retrieved text to answer the question. Hindsight uses this in "reflection" requests and then stores the result in the db as something to be queried/edited later on.
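in code it's roughly this (a sketch, not Hindsight's actual pipeline; the model id, the word-overlap stand-in for real keyword scoring, and the 0.7 blend weight are all placeholder choices):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")   # or whatever you fingerprint with

docs = [
    "mamba streams over input and keeps a fixed-size state",
    "transformers score every slot in the buffer against every other",
    "hindsight stores reflection results for later querying",
]

def embed(texts):
    v = np.asarray(model.encode(texts), dtype=np.float64)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)  # unit-length for cosine

doc_vecs = embed(docs)

def keyword_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)        # crude stand-in for bm25

def retrieve(query, n=2, alpha=0.7):
    dense = doc_vecs @ embed([query])[0]      # cosine similarity
    sparse = np.array([keyword_score(query, d) for d in docs])
    score = alpha * dense + (1 - alpha) * sparse   # the "hybrid" blend
    return [docs[i] for i in np.argsort(score)[::-1][:n]]

def answer(query, llm):
    context = "\n".join(retrieve(query))
    # llm = any completion function; the model reinterprets the hits
    return llm(f"context:\n{context}\n\nquestion: {query}")
```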
>LLMs, MLP, CNNs
llms tend to be built on transformers, which hilariously still rely on discrete tokens (like old cheap markov chains and HMMs) sitting in a huge buffer, and use "attention" to assign an importance to each slot in that buffer. they basically output a token, consume the context window plus that new token, and do it all over again.
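the naive version of that loop looks like this (toy numpy, random weights, no causal mask, no kv cache; a real model has learned projections and dozens of layers):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 100, 16                       # vocab size, model width
E = rng.normal(size=(V, D))          # token embeddings
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
Wout = rng.normal(size=(D, V))       # project back to vocab logits

def attend(x):                       # x: (seq, D)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(D)    # every slot scores every other slot
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # softmax = the "importance" weights
    return w @ v

tokens = [1, 5, 9]                   # some prompt
for _ in range(10):
    x = E[tokens]                    # re-embed the whole context window
    h = attend(x)
    logits = h[-1] @ Wout            # only the last slot predicts
    tokens.append(int(logits.argmax()))  # emit a token, go around again
```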
mamba networks (and state-space models overall) stream over an input and decide what to keep or drop. they're much more efficient but worse at one-shot prompts (which is, amusingly, not that relevant once tool calling is involved.)
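crudely, the recurrence is this (toy numpy again; real mamba makes the A/B/C parameters input-dependent and runs the whole thing as a parallel scan, the sigmoid gate here is just to show the keep-or-drop part):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 16, 32                        # input width, state size
A = rng.normal(size=(N, N)) * 0.01   # state transition (placeholder)
B = rng.normal(size=(N, D))          # input -> state
C = rng.normal(size=(D, N))          # state -> output
Wg = rng.normal(size=(D,))           # gate weights (placeholder)

def run(xs):                         # xs: (seq, D), one step at a time
    h = np.zeros(N)                  # fixed-size state, no growing buffer
    ys = []
    for x in xs:
        g = 1 / (1 + np.exp(-x @ Wg))           # input-dependent gate in [0, 1]
        h = g * (A @ h + B @ x) + (1 - g) * h   # overwrite or keep the state
        ys.append(C @ h)
    return np.array(ys)

ys = run(rng.normal(size=(1000, D)))  # linear in length, constant memory
```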
MLPs are just feed-forward networks. CNNs attempt to classify local patterns and pool those classifications across receptive fields. not exactly comparable things, though many generative networks are starting to shove smaller transformers in, because CNNs can give you codebooks and transformers can turn those into something that is occasionally useful.
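for contrast, both in a few lines (arbitrary shapes, random weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# MLP: every input talks to every hidden unit, matmul -> relu -> matmul
x = rng.normal(size=64)
W1, W2 = rng.normal(size=(32, 64)), rng.normal(size=(10, 32))
mlp_out = W2 @ np.maximum(W1 @ x, 0)

# 1d conv + max pool: slide a small kernel, then summarize per region
signal = rng.normal(size=100)
kernel = rng.normal(size=5)
feat = np.array([signal[i:i+5] @ kernel for i in range(96)])  # local patterns
pooled = feat.reshape(24, 4).max(axis=1)   # max over each receptive field
```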
then there's the shit i work on, which is something else entirely.