Telegram Group & Telegram Channel
Daniel Lemire's blog
AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

The recent AMD processors (Zen 4) provide extensive support for the powerful AVX-512 instructions. AVX-512 (Advanced Vector Extensions 512) is an extension to the x86 instruction set architecture (ISA) introduced by Intel. These instructions enhance the capabilities of processors by allowing for more data to be processed in parallel. You can process registers made of 64 bytes!

One of the neat trick is that given a mask, you can ‘compress’ words: Suppose that you have a vector made of thirty-two 16-bit words, and you want to only keep the second one and third one, then you can use the vpcompressw instruction and the mask 0b110. It will produce a register where the second and third words are placed in first and second position.

An even nicer trick is that you can use this instruction to write just these two words out to memory. You can invoke this functionality with the _mm_mask_compressstoreu_epi16 function intrinsic.

This works well on recent Intel processors, but not so well on AMD Zen 4 processors.

We have a fast function in the simdjson library to minify a file (remove unnecessary spaces).

https://github.com/simdjson/simdjson/pull/2335

source



group-telegram.com/easton_channel/31366
Create:
Last Update:

Daniel Lemire's blog
AVX-512 gotcha: avoid compressing words to memory with AMD Zen 4 processors

The recent AMD processors (Zen 4) provide extensive support for the powerful AVX-512 instructions. AVX-512 (Advanced Vector Extensions 512) is an extension to the x86 instruction set architecture (ISA) introduced by Intel. These instructions enhance the capabilities of processors by allowing for more data to be processed in parallel. You can process registers made of 64 bytes!

One of the neat trick is that given a mask, you can ‘compress’ words: Suppose that you have a vector made of thirty-two 16-bit words, and you want to only keep the second one and third one, then you can use the vpcompressw instruction and the mask 0b110. It will produce a register where the second and third words are placed in first and second position.

An even nicer trick is that you can use this instruction to write just these two words out to memory. You can invoke this functionality with the _mm_mask_compressstoreu_epi16 function intrinsic.

This works well on recent Intel processors, but not so well on AMD Zen 4 processors.

We have a fast function in the simdjson library to minify a file (remove unnecessary spaces).

https://github.com/simdjson/simdjson/pull/2335

source

BY Easton Man's Channel


Warning: Undefined variable $i in /var/www/group-telegram/post.php on line 260

Share with your friend now:
group-telegram.com/easton_channel/31366

View MORE
Open in Telegram


Telegram | DID YOU KNOW?

Date: |

Such instructions could actually endanger people — citizens receive air strike warnings via smartphone alerts. Telegram, which does little policing of its content, has also became a hub for Russian propaganda and misinformation. Many pro-Kremlin channels have become popular, alongside accounts of journalists and other independent observers. Emerson Brooking, a disinformation expert at the Atlantic Council's Digital Forensic Research Lab, said: "Back in the Wild West period of content moderation, like 2014 or 2015, maybe they could have gotten away with it, but it stands in marked contrast with how other companies run themselves today." He said that since his platform does not have the capacity to check all channels, it may restrict some in Russia and Ukraine "for the duration of the conflict," but then reversed course hours later after many users complained that Telegram was an important source of information.
from jp


Telegram Easton Man's Channel
FROM American