Netzsphaere

Conversation

Yewcy

yew@movsw.0x0.st

2 months ago

“The trace monoid or free partially commutative monoid is a monoid of traces.”

mia

mia@movsw.0x0.st

2 months ago

Reply to @yew@movsw.0x0.st

@yew

Snacks

snacks

2 months ago

Reply to @yew@movsw.0x0.st

Edited 2 months ago

@yew you're falling for the fp meme?

Yewcy

yew@movsw.0x0.st

2 months ago

Reply to @snacks

@snacks https://en.wikipedia.org/wiki/Trace_cache

Snacks

snacks

2 months ago

Reply to @yew@movsw.0x0.st

@yew so basically a stream of instructions where blocks can be reordered? Kinda like some sort of compression scheme for loops with branches ig but sounds hard to implement

Yewcy

yew@movsw.0x0.st

2 months ago

Reply to @snacks

@snacks decoded instructions (micro operations)
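
For anyone following along, here is a rough toy model of the idea as the linked article describes it: a trace cache stores already-decoded micro-ops along one dynamic path, looked up by start address plus the predicted branch outcomes. Everything here (`ToyTraceCache`, `fill`, `lookup`, the addresses and micro-op names) is made up for illustration and does not model any real CPU.

```python
class ToyTraceCache:
    """Toy model: maps (start address, branch outcomes) to a list of micro-ops."""

    def __init__(self):
        self.traces = {}

    def fill(self, start_pc, branch_outcomes, uops):
        # Record the micro-ops seen along one dynamic path through the code.
        self.traces[(start_pc, tuple(branch_outcomes))] = list(uops)

    def lookup(self, start_pc, predicted_outcomes):
        # Hit only if a trace exists for this start address and predicted path.
        return self.traces.get((start_pc, tuple(predicted_outcomes)))


cache = ToyTraceCache()
# Hypothetical micro-op names; same start address, two different paths.
cache.fill(0x400, [True], ["cmp", "jcc", "add", "ret"])
cache.fill(0x400, [False], ["cmp", "jcc", "sub", "ret"])

taken = cache.lookup(0x400, [True])
not_taken = cache.lookup(0x400, [False])
miss = cache.lookup(0x500, [True])
```

The same start address holds two traces, one per branch direction, which is the part that differs from a cache indexed by address alone.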

Snacks

snacks

2 months ago

Reply to @yew@movsw.0x0.st

@yew that's pretty standard for an instruction cache afaik

Yewcy

yew@movsw.0x0.st

2 months ago

Reply to @snacks

@snacks they’re the same thing

Snacks

snacks

2 months ago

Reply to @yew@movsw.0x0.st

@yew the article you linked says in its first sentence that it's a specialized instruction cache

Yewcy

yew@movsw.0x0.st

2 months ago

Reply to @snacks

@snacks i have no idea if there’s still desktop cpus with non-trace instruction caches

Snacks

snacks

2 months ago

Reply to @yew@movsw.0x0.st

Edited 2 months ago

@yew it's a way to help with branch prediction issues so it prob doesn't matter that it's complex to implement yeah, makes sense

Snacks

snacks

2 months ago

Reply to @snacks

@yew for desktop cpus

Snacks

snacks

2 months ago

Reply to @snacks

@yew or, anything high powered not massively parallel

Snacks

snacks

2 months ago

Reply to @snacks

@yew apparently at least zen uses a regular ass uop cache and does idk what instead. Intel has a uop queue that does loop unrolling internally?

Snacks

snacks

2 months ago

Reply to @snacks

@yew amd somehow manages it in their data caches?

Snacks

snacks

2 months ago

Reply to @snacks

@yew they store some extra data on how everything can branch in their caches and then just decode all the branches because the decoder is too stronk?

Snacks

snacks

2 months ago

Reply to @snacks

@yew this seems wrong, surely i'm misunderstanding something

Snacks

snacks

2 months ago

Reply to @snacks

@yew found a nice pdf with some info on all the modern microarchitectures btw: https://www.agner.org/optimize/microarchitecture.pdf

Snacks

snacks

2 months ago

Reply to @snacks

@yew at least zen 5:
“The Zen 5 breaks this long-standing bottleneck for the first time with a fetch rate of 32 bytes per clock and a decoding rate of 6 instructions per clock. This fetch and decode rate applies to each side of a 2-way branch when the two branches are decoded simultaneously.”

bartholin (surviving global warming arc)

bartholin@fops.cloud

2 months ago

Reply to @yew@movsw.0x0.st

@yew a free monoid is just lists (or strings) under concatenation, and traces add an independence relation that lets adjacent independent letters of your string be switched around. This represents concurrent operations or something idk
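
A minimal sketch of that independence idea, assuming words are plain strings and the independence relation is given as a set of unordered letter pairs. The function name `trace_equivalent` is made up here. It relies on the standard projection criterion: two words denote the same trace exactly when they use the same letters the same number of times and every pair of dependent letters appears in the same relative order in both.

```python
from itertools import combinations


def trace_equivalent(u, v, independent):
    """Return True if words u and v denote the same trace.

    `independent` is a set of frozensets {a, b} of letters that may be
    swapped when adjacent. Every other pair is treated as dependent.
    """
    # Swapping letters never changes how often each letter occurs.
    if sorted(u) != sorted(v):
        return False
    for a, b in combinations(sorted(set(u)), 2):
        if frozenset((a, b)) in independent:
            continue  # independent letters may appear in any relative order
        # Dependent letters must keep the same relative order in both words.
        if [x for x in u if x in (a, b)] != [x for x in v if x in (a, b)]:
            return False
    return True


independent = {frozenset("ab")}  # only a and b commute
same = trace_equivalent("abc", "bac", independent)
different = trace_equivalent("abc", "acb", independent)
print(same, different)
```

With only `a` and `b` independent, `abc` and `bac` are the same trace, while `abc` and `acb` are not because `b` and `c` cannot be swapped.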

mia (developer mode)

mia@shrimptest.0x0.st

2 months ago

Reply to @snacks

@snacks @yew zen 5 has clustered decoders (that’s something AMD did with steamroller as well) and two fetch paths to feed them. as long as everything fits into the instruction cache, for 8-byte instructions, excavator actually gets higher throughput than zen 5 and also benefits from SMT, but not as much as zen 5 where throughput doubles with 2 threads.

4-wide instruction decoder too slow? frontend working too hard? literally just add a second cluster lol

having two decode clusters also means zen 5 doesn’t rely quite as heavily on its op cache as zen 4 did, at least for SMT workloads, but switching between instruction cache and op cache mode still hurts performance a lot. usually that’s not a problem because the op cache is massive (6k entries iirc). there are cases where the op cache nearly doubles execution throughput

Snacks

snacks

2 months ago

Reply to @mia@shrimptest.0x0.st

@mia @yew i knew that zen 5 was better with smt than most other architectures but reading up on this stuff is kinda crazy. I thought it was mostly just wider
