Conversation
Why do they not seem to train llms to ask for clarification at all? Seems like that would improve performance a lot instead of betting on people to become better at prompting
ig an llm can not differentiate between things it's making up and actual information from its training data?
@snacks yeah, they basically function as a very massive autocomplete (no really, the tech is extremely similar)

You could argue on stuff with diffusion but regardless essentially that is what it is
@snacks People don't ask for clarification usually either, so in that way it mimics human behavior. Which is probably why they tend to work better when you write prompts like talking to a human.
@phnt @snacks they absolutely do ask for clarification or what to do now, it's fucking great man. I just had this happen to me yesterday on a *local Qwen*
@snacks Honestly what's worse than hallucinations is their tendency to be severely affected by any text in the context.

Kinda finding it difficult to articulate an example now, but they can't really intelligently have an opinion and decide to disregard text in the context, and will try to phrase everything on the basis of something that should be irrelevant.

I kinda want to see if and how this pattern shows up in code, which is honestly their main productive use
@snacks I'm guessing it'd be that they are only ever making up plausible sentences, there's no knowledge and no model of what you are asking for, therefore it's not meaningful to have it stop when it needs clarification. @ceo_of_monoeye_dating would know more

and unless LLM transformer models are substantially different from MLPs and CNNs which I know of, there's nothing that links output to the source, so it's not referencing information from its training data
@meeper i know, i usually try to keep conclusions i've arrived at out and just state the problem because otherwise they'll just repeat what i've arrived at already
@phnt @snacks it's not too hard for them to actually do it, and honestly they only mimic humans because they are tuned to do so, but the whole thinking-LLM craze means that really isn't necessary and they can be trained to do nonconversational stuff.

Honestly I'd have expected this sooner or later, and the fact it isn't like that is probably because it causes issues
@protos @ceo_of_monoeye_dating It seems they can't; if you do something like RAG they can easily differentiate context from their internal bs at least
@feld @snacks I don't think I've had that happen to me with Kimi 2.5 or GLM 5.1 yet, but I also don't do any of the fancy agentic things, just a plain web UI with manually provided context.

I've had both models create multiple suggestions automatically though.

https://fediffusion.art/objects/cf76fac4-bf69-4d96-9b38-eea3fea4945d
@feld @phnt interesting. I sometimes wonder if they can understand music and stuff and if i don't provide lyrics they just make some up without fail
@feld @phnt @snacks this depends on the harness offering the functionality and i think most models trained for it these days, i see it all the time too. doesn't much happen in the web frontends though.
@protos @ceo_of_monoeye_dating retrieval augmented generation, basically grandpa's version of tools. Just run the user's query against a database to find data that could answer it, and then have an LLM pick out the most useful parts and turn it into a coherent answer
@protos @ceo_of_monoeye_dating You're moving responsibility from the LLM to your search algorithm, but as long as accuracy is good enough to get useful data inside a single context, it's pretty good even with bad models
@protos @snacks >LLM transformer models are substantially different from MLPs and CNNs

They are, and pinging the Computer Vision expert about LLMs is a bit of a bad move. Regardless, I'll look at this thread and see if the underlying issue is one that I can answer.
@ceo_of_monoeye_dating @protos
> guy obsessed with big eyes is computer vision expert
kinda funny
@snacks @ceo_of_monoeye_dating @protos
>RAG
well, it works. you fingerprint the input with bge3 or such, query a vector store for top n similar results, maybe also pull the keywords (hybrid search is something of a norm now), and then let the LLM reinterpret the text to answer the question. Hindsight uses this in "reflection" requests and then stores the result in the db as something to be queried/edited later on.

>LLMs, MLP, CNNs
llms tend to be built on transformers, which hilariously rely on discrete tokens (like old cheap markovs and HMMs) in a huge buffer and use "attention" to assign an importance to each slot in the buffer. they basically output a token and then consume the new context window with that token and do it all over again.

mamba networks (and state spaces overall) stream over an input and decide what to keep or drop. they're much more efficient but they are worse at one-shot prompts (which is, amusingly, not really that relevant when tool calling is involved.)

MLPs are just feed-forward networks. CNNs attempt to classify patterns and pool classifications across receptive fields. not.. exactly comparable things, though many generative networks are starting to shove smaller transformers in, because CNNs can give you code books and transformers can turn those into something that is occasionally useful.

then there's the shit i work on which are something else entirely.
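The "output a token, consume the new context, do it all over again" loop described above can be sketched with a stand-in model. The bigram lookup table here is an assumption replacing a real transformer forward pass; the control flow is the part being illustrated.

```python
# Toy autoregressive generation loop. NEXT stands in for a real
# model's next-token prediction (a hypothetical stand-in, not an API).
NEXT = {
    "the": "cat",
    "cat": "sat",
    "sat": "down",
}

def generate(prompt_tokens, max_new=3):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        # A real transformer would run attention over the whole
        # context here; our toy "model" only looks at the last token.
        nxt = NEXT.get(tokens[-1])
        if nxt is None:        # no continuation: stop, like hitting an EOS token
            break
        tokens.append(nxt)     # append the new token and feed the grown context back in
    return tokens
```

The loop structure is the same whether the step in the middle is a dictionary lookup or a billion-parameter forward pass: each new token is generated conditioned on everything emitted so far.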
@Mamako @protos @snacks Yeah I don't have specialized knowledge that answers the question in this thread. I can't answer this any better than anyone else can.
@snacks @ceo_of_monoeye_dating @protos hey snacks hows it going listen I just thought u might want to know maija killed himself. He witnessed butt baby birth and all his users turned against him and he od'ed on fentanyl. Rest in peace brave warrior u were mediocre garlicman
@Lyx @ceo_of_monoeye_dating @protos i'm talking wit maija right now and she's telling me i'm a perverted freak :/
@snacks @ceo_of_monoeye_dating @protos yeah he's been dead for 3 months, that's dicey with his old account. sorry to break it to u. On a side note, one of my favorite things to do is tag maija in threads where I convince people he killed himself in creative ways.
@ube
@icedquinn @snacks @ceo_of_monoeye_dating sure the architectures are different in many ways, but in this context the relevant difference would be the ability to trace which piece of the dataset, if any, corresponds to one of its outputs