@snacks @ceo_of_monoeye_dating @protos >RAG
well, it works. you fingerprint the input with bge3 or similar, query a vector store for the top-n most similar chunks, maybe also pull keyword matches (hybrid search is pretty much the norm now), and then let the LLM reinterpret the retrieved text to answer the question. Hindsight uses this in "reflection" requests and then stores the result in the db as something to be queried/edited later on.
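in code it's roughly this (a sketch, not Hindsight's actual pipeline; the model id, the word-overlap stand-in for real keyword scoring, and the 0.7 blend weight are all placeholder choices):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")   # or whatever you fingerprint with

docs = [
    "mamba streams over input and keeps a fixed-size state",
    "transformers score every slot in the buffer against every other",
    "hindsight stores reflection results for later querying",
]

def embed(texts):
    v = np.asarray(model.encode(texts), dtype=np.float64)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)  # unit-length for cosine

doc_vecs = embed(docs)

def keyword_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)        # crude stand-in for bm25

def retrieve(query, n=2, alpha=0.7):
    dense = doc_vecs @ embed([query])[0]      # cosine similarity
    sparse = np.array([keyword_score(query, d) for d in docs])
    score = alpha * dense + (1 - alpha) * sparse   # the "hybrid" blend
    return [docs[i] for i in np.argsort(score)[::-1][:n]]

def answer(query, llm):
    context = "\n".join(retrieve(query))
    # llm = any completion function; the model reinterprets the hits
    return llm(f"context:\n{context}\n\nquestion: {query}")
```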
>LLMs, MLP, CNNs
llms tend to be built on transformers, which hilariously still rely on discrete tokens (like old cheap markov chains and HMMs) sitting in a huge buffer, and use "attention" to assign an importance to each slot in that buffer. they basically output a token, consume the context window plus that new token, and do it all over again.
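the naive version of that loop looks like this (toy numpy, random weights, no causal mask, no kv cache; a real model has learned projections and dozens of layers):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 100, 16                       # vocab size, model width
E = rng.normal(size=(V, D))          # token embeddings
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))
Wout = rng.normal(size=(D, V))       # project back to vocab logits

def attend(x):                       # x: (seq, D)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(D)    # every slot scores every other slot
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # softmax = the "importance" weights
    return w @ v

tokens = [1, 5, 9]                   # some prompt
for _ in range(10):
    x = E[tokens]                    # re-embed the whole context window
    h = attend(x)
    logits = h[-1] @ Wout            # only the last slot predicts
    tokens.append(int(logits.argmax()))  # emit a token, go around again
```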
mamba networks (and state-space models overall) stream over an input and decide what to keep or drop. they're much more efficient but worse at one-shot prompts (which is, amusingly, not that relevant once tool calling is involved.)
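crudely, the recurrence is this (toy numpy again; real mamba makes the A/B/C parameters input-dependent and runs the whole thing as a parallel scan, the sigmoid gate here is just to show the keep-or-drop part):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 16, 32                        # input width, state size
A = rng.normal(size=(N, N)) * 0.01   # state transition (placeholder)
B = rng.normal(size=(N, D))          # input -> state
C = rng.normal(size=(D, N))          # state -> output
Wg = rng.normal(size=(D,))           # gate weights (placeholder)

def run(xs):                         # xs: (seq, D), one step at a time
    h = np.zeros(N)                  # fixed-size state, no growing buffer
    ys = []
    for x in xs:
        g = 1 / (1 + np.exp(-x @ Wg))           # input-dependent gate in [0, 1]
        h = g * (A @ h + B @ x) + (1 - g) * h   # overwrite or keep the state
        ys.append(C @ h)
    return np.array(ys)

ys = run(rng.normal(size=(1000, D)))  # linear in length, constant memory
```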
MLPs are just feed-forward networks. CNNs attempt to classify local patterns and pool those classifications across receptive fields. not exactly comparable things, though many generative networks are starting to shove smaller transformers in, because CNNs can give you codebooks and transformers can turn those into something that is occasionally useful.
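for contrast, both in a few lines (arbitrary shapes, random weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# MLP: every input talks to every hidden unit, matmul -> relu -> matmul
x = rng.normal(size=64)
W1, W2 = rng.normal(size=(32, 64)), rng.normal(size=(10, 32))
mlp_out = W2 @ np.maximum(W1 @ x, 0)

# 1d conv + max pool: slide a small kernel, then summarize per region
signal = rng.normal(size=100)
kernel = rng.normal(size=5)
feat = np.array([signal[i:i+5] @ kernel for i in range(96)])  # local patterns
pooled = feat.reshape(24, 4).max(axis=1)   # max over each receptive field
```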
then there's the shit i work on, which is something else entirely.