Telegram Group & Telegram Channel
Daniel Lemire's blog
AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

The recent AMD processors (Zen 4) provide extensive support for the powerful AVX-512 instructions. AVX-512 (Advanced Vector Extensions 512) is an extension to the x86 instruction set architecture (ISA) introduced by Intel. These instructions enhance the capabilities of processors by allowing for more data to be processed in parallel. You can process registers made of 64 bytes!

One of the neat trick is that given a mask, you can ‘compress’ words: Suppose that you have a vector made of thirty-two 16-bit words, and you want to only keep the second one and third one, then you can use the vpcompressw instruction and the mask 0b110. It will produce a register where the second and third words are placed in first and second position.

An even nicer trick is that you can use this instruction to write just these two words out to memory. You can invoke this functionality with the _mm_mask_compressstoreu_epi16 function intrinsic.

This works well on recent Intel processors, but not so well on AMD Zen 4 processors.

We have a fast function in the simdjson library to minify a file (remove unnecessary spaces).

https://github.com/simdjson/simdjson/pull/2335

source



group-telegram.com/easton_channel/31366
Create:
Last Update:

Daniel Lemire's blog
AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

The recent AMD processors (Zen 4) provide extensive support for the powerful AVX-512 instructions. AVX-512 (Advanced Vector Extensions 512) is an extension to the x86 instruction set architecture (ISA) introduced by Intel. These instructions enhance the capabilities of processors by allowing for more data to be processed in parallel. You can process registers made of 64 bytes!

One of the neat trick is that given a mask, you can ‘compress’ words: Suppose that you have a vector made of thirty-two 16-bit words, and you want to only keep the second one and third one, then you can use the vpcompressw instruction and the mask 0b110. It will produce a register where the second and third words are placed in first and second position.

An even nicer trick is that you can use this instruction to write just these two words out to memory. You can invoke this functionality with the _mm_mask_compressstoreu_epi16 function intrinsic.

This works well on recent Intel processors, but not so well on AMD Zen 4 processors.

We have a fast function in the simdjson library to minify a file (remove unnecessary spaces).

https://github.com/simdjson/simdjson/pull/2335

source

BY Easton Man's Channel


Warning: Undefined variable $i in /var/www/group-telegram/post.php on line 260

Share with your friend now:
group-telegram.com/easton_channel/31366

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

NEWS The Security Service of Ukraine said in a tweet that it was able to effectively target Russian convoys near Kyiv because of messages sent to an official Telegram bot account called "STOP Russian War." He floated the idea of restricting the use of Telegram in Ukraine and Russia, a suggestion that was met with fierce opposition from users. Shortly after, Durov backed off the idea. "He has kind of an old-school cyber-libertarian world view where technology is there to set you free," Maréchal said. Stocks closed in the red Friday as investors weighed upbeat remarks from Russian President Vladimir Putin about diplomatic discussions with Ukraine against a weaker-than-expected print on U.S. consumer sentiment.
from us


Telegram Easton Man's Channel
FROM American