Hardware Intel vs AMD

Goz

Psy-Richard
Staff member
Forum Supporter
Messages
8,360
Reaction score
1,051
Location
Oxford
Re: GPU for audio processing. It isn't "around the corner" nor ever coming, I'm afraid, as the G stands for Graphics. Dispatching data to a GPU happens in batches (high throughput but high and indeterminate latency, the latter being a deal-breaker for audio), and the transfer is significantly faster in one direction than the other (you have to mmap your GPU memory to write back to the CPU, and even with the fastest OpenMPI techniques it's not really a bounded-latency thing you can count on. You can only read/write in complete pages, and your data alignment will not make efficient use of this.)

TL;DR: GPUs handle big chunks of data, typically in a one-way direction, before writing to the memory holding the 2-dimensional screen pixel data every (1/frame-rate) seconds, which is what actually drives the screen with pictures. It's an output device, not typically an I/O device, and its hardware is specialized for its main tasks in a highly competitive market.

If you want an APU, they exist, but typically not on silicon (anymore) due to market forces and software.
A DSP or FPGA is suitable hardware for doing audio processing, and it's a matter of writing an I/O driver (trivial, LOL...), which is what your UAD and Waves SoundGrid processing use, respectively. Where a DSP will generically execute code, you generally have things like lots of Multiply-and-Accumulate (MAC) units for filtering operations, and data widths will support double-precision floating point, etc... (even old-school TMS320 type stuff)

So, your computer probably already has DSP cores that have special instructions to use them instead of the CPU for certain operations.
(on old-school PCs it's SSE, and on ARM it's NEON)
SIMD is "single instruction, multiple data" for doing the same operation on lots of data in parallel, "vector extensions" are also a means of processing vectors or arrays, etc.
So, the issue here is that a DAW or VST instrument designer will be loath to use anything that's not standard, universal, old and boring, versus limiting their customers to specific devices. Performance will vary across implementations, and profiling shows this, etc... (must be an issue for all the macOS devs having to rewrite parts of their plugins, even with reasonable emulation, as it breaks on specific algos etc... now that they've moved from x86 to ARM all the way)
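To make the SIMD idea concrete, here's a tiny sketch using NumPy as a stand-in for what SSE/NEON do natively (NumPy, the buffer size and the gain value are all illustrative choices, not anything from the thread):

```python
import numpy as np

# A buffer of 1024 samples, as a DAW might hand to a plugin.
buf = np.linspace(-1.0, 1.0, 1024, dtype=np.float32)

# Scalar style: one multiply per iteration (what scalar "ss" code does).
out_scalar = np.empty_like(buf)
for i in range(len(buf)):
    out_scalar[i] = buf[i] * 0.5

# Vector style: the whole buffer in one vectorised operation
# (NumPy dispatches to packed SIMD instructions under the hood).
out_simd = buf * np.float32(0.5)

assert np.allclose(out_scalar, out_simd)
```

Same arithmetic, same result; the packed version just applies the operation to many samples per instruction, which is exactly why audio loops benefit from it.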

Digital audio requires a precisely clocked stream of sample values for both accurate recording and playback (hence the market in studio master clock sources and the desire to avoid clock skew between systems, etc. Think of a bad playback clock as a realtime pitch-shifting algorithm :Grin:)
A typical audio driver arranges for the computer's DMA engine to repetitively transfer data from the audio chip's serial audio interface to RAM, and/or vice versa for playback.
It then raises an interrupt informing the computer that a buffer-full of audio is ready or needed.
Your DAW will typically register ITS callback function with the driver, and this callback function gets called to supply or offload a buffer full of sample values.
(buffer as in the setting you set, like 128 samples (2.9ms at 44.1kHz), or 256, or 1024 or whatever; it's directly related to latency.)
Your VST/AU plugin code will have either a "generate a buffer full of samples" function or a "process a buffer full of samples" function and will do its primary work on this "thread" of execution. It's all just a wrapper for that same DMA interrupt saying "hey folks, gimme or take, like NOW, not later, or you will have a glitch" in some sense, if you drill down through it all...
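A minimal sketch of the callback idea above, in plain Python rather than the C++ a real plugin would use (the function name and constants are hypothetical):

```python
SAMPLE_RATE = 44_100   # Hz
BUFFER_SIZE = 128      # samples, the setting you pick in your DAW

# Latency contributed by one buffer: 128 / 44100, roughly 2.9 ms.
buffer_latency_ms = 1000.0 * BUFFER_SIZE / SAMPLE_RATE

def process(in_buf):
    """Hypothetical plugin callback: must return BUFFER_SIZE processed
    samples before the next DMA interrupt fires, or the audio glitches."""
    return [0.5 * s for s in in_buf]   # e.g. a simple gain stage

out = process([1.0] * BUFFER_SIZE)
print(round(buffer_latency_ms, 1))   # ≈ 2.9
```

The hard constraint is the deadline, not the arithmetic: whatever `process` does, it has to finish inside one buffer period, every single time.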

Sorry for the random lecture, my mind wandered.... this vaporizer works, I see...
Re: GPU for audio processing. It isn't "around the corner" nor ever coming, I'm afraid, as the G stands for Graphics. Dispatching data to a GPU happens in batches (high throughput but high and indeterminate latency, the latter being a deal-breaker for audio), and the transfer is significantly faster in one direction than the other (you have to mmap your GPU memory to write back to the CPU, and even with the fastest OpenMPI techniques it's not really a bounded-latency thing you can count on. You can only read/write in complete pages, and your data alignment will not make efficient use of this.)

TL;DR: GPUs handle big chunks of data, typically in a one-way direction, before writing to the memory holding the 2-dimensional screen pixel data every (1/frame-rate) seconds, which is what actually drives the screen with pictures. It's an output device, not typically an I/O device, and its hardware is specialized for its main tasks in a highly competitive market.

That's just not true...


Much more is to come. The sheer amount of power available in a GPU is ridiculous compared to the CPU (as they have the advantage of not having to be general purpose).

If you want an APU, they exist, but typically not on silicon (anymore) due to market forces and software.

APUs are just GPUs physically attached to the main CPU ... it's a bit cheaper to swap data between the CPU and APU than CPU to GPU because they share the same memory ... but it's much slower than VRAM. So horses for courses.

A DSP or FPGA is suitable hardware for doing audio processing, and it's a matter of writing an I/O driver (trivial, LOL...), which is what your UAD and Waves SoundGrid processing use, respectively. Where a DSP will generically execute code, you generally have things like lots of Multiply-and-Accumulate (MAC) units for filtering operations, and data widths will support double-precision floating point, etc... (even old-school TMS320 type stuff)

FPGAs are just not the sort of things you'll find on commodity hardware ... nor will you find much software support.
PCs don't tend to have DSPs, as the SIMD instructions, as you point out below, contain most of the functionality.

So, your computer probably already has DSP cores that have special instructions to use them instead of the CPU for certain operations.
(on old-school PCs it's SSE, and on ARM it's NEON)
SIMD is "single instruction, multiple data" for doing the same operation on lots of data in parallel, "vector extensions" are also a means of processing vectors or arrays, etc.
So, the issue here is that a DAW or VST instrument designer will be loath to use anything that's not standard, universal, old and boring, versus limiting their customers to specific devices. Performance will vary across implementations, and profiling shows this, etc... (must be an issue for all the macOS devs having to rewrite parts of their plugins, even with reasonable emulation, as it breaks on specific algos etc... now that they've moved from x86 to ARM all the way)

Those are strictly just DSP-like instructions ... Qualcomm, for example, include a separate DSP core with their Snapdragon processors, which all support NEON.

Almost all floating-point processing is done by SSE these days ... it's just that sometimes (often, indeed) they only use the scalar instructions and don't take advantage of the packed ones ... (denoted in the assembly as instructions ending in ss or ps).
 

oCeLoT

audio slave
Messages
80
Reaction score
36
Location
lisbon portugal
I would love to be able to use this forum software properly and multi-quote and such.

Your Nvidia example is disingenuous because it's not processing the audio for direct throughput to playback; it's for analysis, and so latency doesn't matter.



I'd prefer to actually point out that the specific design of WHAT a GPU is, and how it transfers which sized data around, just precludes it being a good realtime audio processor. That's not its primary function.
Yes, I'm aware of the movement toward programmable shaders vs hardware shaders over the years, and the tremendously mega-huge data throughput, but you are tilting at a giant windmill here: "architecture".
What part of "you can make customized hardware for any given algorithm" is unclear? at the cost of specialization, of course.
Tell me how you can achieve consistent low-latency audio processing on a GPU. Which library do you use, brother? Can you achieve this performance across GPU makers' devices and product lines?
Is there, actually, a single open-source audio DSP library that runs its actual audio-pipeline code on the GPU?
I've never seen one nor heard of it. Pray tell if you know.

Maybe a GPU-powered mixer that runs entirely on the GPU, and that could work, if you can figure out the hardware interfaces!?
Where do you buy GPUs with I2S or TDM interfaces for the audio output if you process it all on the GPU? LOL...

Tell me again how you get your data back to the computer on time, every time (worst-case latency matters in audio, hiccup), underneath its mmap. The issue is that you will have a hiccup: the GPU will not have the data ready, so you will either wait for the GPU and be late, or you will grab bad/stale data.

Suffice it to say that hardware is specialized, and as you said, "horses for courses". I'm not sure which APUs you are referring to (I'm thinking of ancient BBC Forth-driven audio processors), but they are certainly nothing even resembling GPUs, other than that they have pipelines, but those are sample-clock-driven pipelines.

NVIDIA and AMD both have proprietary and/or specialized toolkits to process audio using their hardware, which they know a lot about (they build the hardware),
and so, yes, you can run special machines with their hardware doing dedicated tasks.
Will you see VST/AU plugin makers relying upon these APIs? NO. Maybe one from NVIDIA, lol...
Your customers would all need to buy that graphics card, and there's no incentive for you, as you are just a tiny plugin maker, even a UAD...

Folks, there is a reason the entire pro audio world has NOT been processing audio on the GPU, and it's not because they are stupid or missing an opportunity like low-hanging fruit.

This NVIDIA software is not processing audio on the GPU in the sense of providing those samples back for playback; they are being analyzed.
You can also use the GPU to pre-calculate filter coefficients or convolve stuff, etc., but it's not a realtime audio pipeline processor capable platform,
sorry.

There is not a single realtime audio processing platform comparable to a UAD using GPUs!
Think about that for a minute. It's a commodity item. It's cheap. If it were a thing, people would be leaping upon it.
You would use a DSP specialized for audio-sized data, and many of them have SAIs (serial audio interfaces), such as TI's McASP, which is a TDM interface.
They are MADE to interface with audio ADC/DAC "codecs".

BOTTOM LINE: an audio DSP engine, if implemented in hardware, is one thing. There's been no money in that, in a world where "worse/cheaper = better" and given the tiny size of the audio market compared to GAMES.

Gaming. This drives GPU development. The other things you can do with shader kernels are pleasant side-effects but it was not designed for audio processing.



If you wish to discuss FPGAs and their availability, I simply refer you to the many audio processing platforms that use FPGAs, such as the aforementioned Waves SoundGrid technology, thank you. You can buy FPGAs. If we really must dig into this topic I'm sure we can cover a whole thread. I have some experience working with FPGAs and I'm up to speed on their industry uptake and usage. RME soundcards have FPGAs inside them, and they run that little mixer app on them, for the record.
Ummm, servers... Intel just bought a big FPGA maker to integrate their tech into their server chips, and Microcrap Azure uses FPGA fabric.

"lack software" - um this is a feature. You make bespoke audio hardware you sell that has a VST/AU interface.
the software is mostly on the computer, as you won't likely run much softcore stuff on the FPGA given that it's an audio processor.
You probably use a standard FPGA-CPU bus or interface and mmap. (which you had to do for your audio interface anyway already)
You really mostly have to write a gui and figure out your VST/AU wrapper and your audio interface driver as above...
 

Nanook

In the kitchen, studio or gym.
Forum Supporter
Messages
23,916
Reaction score
1,090
This made for interesting reading.
I knew nothing about AMD, so instantly deferred to Intel during a current search.

@bez23 ... are you building yourself or considering off-the-shelf solutions?

(My requirements are minimal so think I'm settled on a Lenovo All-In-One solution...for ease..
...although the missus keeps leaning me towards 3XS/Scan. Her production 'puter is ridic spec and seems sturdy so far, 1 year in.
 

oCeLoT

audio slave
Messages
80
Reaction score
36
Location
lisbon portugal
PS, apologies for the snarky/short tone. I'm allegedly working (and frustrated over something atm) and thus shouldn't even be here.

Just saying that a reasonable effort applied to even modest hardware that has the properties necessary for a realtime audio system is worth more than all the Intel and NVIDIA in the world as far as audio is concerned. The data throughput is modest; it's really just the timing requirements... and if you already knew that you had an audio "mixer" involved, with a pipeline per channel, well, that's a clocked systolic array or something like it... (let's not even speak of propagation delay inside chips, clock skew, etc., muah hahahahah)
 

Goz

Psy-Richard
Staff member
Forum Supporter
Messages
8,360
Reaction score
1,051
Location
Oxford
I would love to be able to use this forum software properly and multi-quote and such.

Quote a post then put your cursor where you want and press enter. It will break it up.

Your Nvidia example is disingenuous because it's not processing the audio for direct thoroughput output playback it's for analysis, and so latency doesn't matter.

Did you actually read the article, specifically on the latency? I've used it to do real-time noise removal on a Zoom call ... it's pretty good! But it's only the tip of the iceberg.
 

oCeLoT

audio slave
Messages
80
Reaction score
36
Location
lisbon portugal
Quote a post then put your cursor where you want and press enter. It will break it up.



Did you actually read the article, specifically on the latency? I've used it to do real-time noise removal on a Zoom call ... it's pretty good! But it's only the tip of the iceberg.
1st point, THANK YOU!

2nd point, Yes, I had the precise same argument with a good friend about 2 months ago, and he sent the exact same link, because it's the only search result and they included the word 'realtime' in the name, lol.
It is using the GPU to analyze the data and then derive amplitude or other modulation CONTROL functions, much as an envelope follower does; it's informing a control process that can occur at much slower timescales than audio-sample-rate/buffer-size timing. (human speech enveloping, for example)
It's not returning processed audio data inside the temporal constraints of an audio callback function.
THAT is why I said "disingenuous"
Realtime latency is high if you must not ever have a glitch, as the GPU has its own memory writing cycle and timing that doesn't care a whit about your audio clock.
Your CPU-based mmap functionality will map the GPU memory back into the CPU's memory space and you will periodically read from it; if the data is stale/missing you either wait another buffer or glitch, I suppose. It's good enough for audio playback, for offline processing, for semi-realtime processing, for calculating filter coefficients, for calculating whatever you can prepare ahead of time.
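For the envelope-follower comparison, here's a minimal one-pole follower sketch in Python (the names, time constant and test signal are illustrative): the audio path runs per sample, while the envelope it produces is a slowly varying control signal.

```python
import math

SAMPLE_RATE = 44_100
ATTACK_MS = 10.0  # illustrative smoothing time constant

# One-pole smoothing coefficient for the chosen attack time.
coeff = math.exp(-1.0 / (SAMPLE_RATE * ATTACK_MS / 1000.0))

def envelope_follow(samples, env=0.0):
    """Rectify and smooth the signal to track its amplitude envelope."""
    out = []
    for s in samples:
        rectified = abs(s)
        env = rectified + coeff * (env - rectified)
        out.append(env)
    return out

# A burst of full-scale signal: the envelope ramps up toward 1.0
# instead of jumping -- a slow CONTROL signal derived from fast audio.
env = envelope_follow([1.0] * 4410)  # 100 ms of signal
assert env[0] < env[-1] <= 1.0
```

Because the envelope changes over milliseconds rather than per sample, a downstream control process can read it far less often than the audio callback runs, which is exactly the regime where GPU-assisted analysis is viable.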

What you see instead are many tricks on the CPU: a filter that needs a large buffer to work with simply only starts emitting processed audio after a few buffers, as it builds up enough "tail" to work with, while not messing with the overall audio latency of the system, etc. (if you consider a simple low-pass filter that just averages every N samples, for example, you see what I mean; the characteristics involved will affect its necessary working buffer size, etc.)
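The "average every N samples" low-pass mentioned above, sketched in Python with an explicit tail carried between calls (N and the step input are illustrative):

```python
from collections import deque

N = 4  # average over the last 4 samples: a crude low-pass

# Tail of N-1 past samples, carried across buffer boundaries.
history = deque([0.0] * (N - 1), maxlen=N)

def moving_average(buf):
    """Process one buffer, keeping filter state between calls."""
    out = []
    for s in buf:
        history.append(s)
        out.append(sum(history) / N)
    return out

# A step input: the output ramps over N samples instead of jumping,
# because the first outputs still average in the zero-filled tail.
out = moving_average([1.0, 1.0, 1.0, 1.0, 1.0])
# out == [0.25, 0.5, 0.75, 1.0, 1.0]
```

The deque is the "tail": the filter needs N-1 samples of history before its output is fully settled, which is exactly the warm-up behaviour described above.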

If you try to, say, process a block of audio and return it, you will find that you should increase the buffer size to avoid glitching (you may also buffer it internally, much as many mastering processors such as EQs do; it's suitable for semi-realtime work like mastering or mixdown processing). This probably also involves you, the programmer, having to mmap your data back in, as I don't know of any standard API for this. One could certainly be provided by one or more manufacturers, but these are the folks who deliberately hobble the performance of their own hardware on Linux vs helping out, hence I'll take AMD any day for their proprietary efforts, lol... And still, none of these will make their way into plugins, as the plugin maker is essentially limiting their market to owners of hardware they gain nothing shilling.

Hence, UAD and a host of other hardware-software audio DSP platforms. Audio DSP processors like SHARCs are gradually being supplanted by FPGAs, due to the widening usage and increasing scale of the latter, coupled with the advantage that a firmware update equals new hardware. Plus, FPGAs can process in parallel and come with basic DSP blocks, so you don't even waste LUTs doing a MAC for a filter; you can fit thousands of filters on a small part with 8k LUTs, and any innovative algorithms can be directly implemented in hardware, yadda yadda. But the TI and Analog Devices chips are all generally built for audio DSP and have connectors for your ADC and DAC interfacing, in realtime jitter-free form, as quality rackmount digital studio gear tends to be (aka, your TC Electronic or DriveRack etc. is probably not running on Windows, lol... the microcontroller used to typically be something crappy like a Motorola ColdFire; it's just for handling knob interrupts and driving the little LCD, etc...)

Another thing that is popular is to profile, find the slow spots and analyze them, or take your tight inner loops and run them on dedicated processing cores on an FPGA using a standard bus (even SPI can be used, clocked precisely), and then you can focus your efforts on:
1) developing this HDL "core"
2) writing a wrapper function; heck, you can integrate it into your instruction set on Zynq processors etc. That's how your SSE and such work, it's just dispatch, lol

I wonder how UAD work in their shop, or other similar types...

Anyway, I'm not a pro-audio developer, I only play one on TV :P

I think the KVR forum is the new home of the Audio DSP Mailing List folks; there's good material there. There's a bunch of good stuff in the embedded world too, as audio is like their hobby-for-fun thing....

Speaking of which, for the sake of bringing this ON topic, lol: KVR probably has updated tech reviews on specific mobo/CPU combos too, and plenty of arguments, although if you want a real studio nerd argument, go to Gearslutz.
 

Goz

Psy-Richard
Staff member
Forum Supporter
Messages
8,360
Reaction score
1,051
Location
Oxford
1st point, THANK YOU!

No worries.

2nd point, Yes, I had the precise same argument with a good friend about 2 months ago, and he sent the exact same link, because it's the only search result and they included the word 'realtime' in the name, lol.
It is using the GPU to analyze the data and then derive amplitude or other modulation CONTROL functions, much as an envelope follower does; it's informing a control process that can occur at much slower timescales than audio-sample-rate/buffer-size timing. It's not returning processed audio data inside the temporal constraints of an audio callback function.
THAT is why I said "disingenuous"

So you are talking about wanting latencies down to around 5ms or less? It IS totally doable with GPUs, but I do agree it presents a challenge (I tend to work with systems where it's even less of a problem, cos real-time means less than 10-second latency :Grin: ), but even PCIe lanes don't present a significant enough bottleneck with 5ms to play with. It can even get quicker than that. Not to mention most FPGA cards for PC will require usage of the PCIe bus as well (I have eyed up the Xeon with an embedded FPGA for some specific workloads ... but they are ridiculously pricey). Nvidia (and even AMD) have put a lot of work into making their cards lose the latencies. Back in the old days the latencies were obscene because data was pushed through stupendously long pipelines. They're no longer anything like as long (still long compared to a CPU though!) ... but the clock speeds are way higher, so it takes less time to traverse the pipeline.
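As a back-of-envelope check on that 5 ms budget (assuming 48 kHz and a 128-sample buffer, both illustrative numbers), the whole round trip to the GPU has to fit inside roughly two buffer periods, worst case:

```python
SAMPLE_RATE = 48_000   # Hz
BUDGET_MS = 5.0        # the latency budget under discussion

budget_samples = int(SAMPLE_RATE * BUDGET_MS / 1000.0)  # 240 samples
buffer_periods = budget_samples / 128                   # ~1.9 buffers

print(budget_samples, round(buffer_periods, 1))  # 240 1.9
```

So the argument on both sides comes down to whether a PCIe transfer plus kernel launch plus readback can be *bounded* under about two buffer periods, not whether it is fast on average.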

Admittedly I do agree that the above algorithm is designed for things like voice comms where you don't mind latencies of up to 200ms or so ... but these latencies are decreasing every year. I stand by my assertion. Exciting things are starting to happen and this shit will be coming.

Also ... You want something that can run Doom Eternal well ...
 

NabLa

Spaniard DeLuxe
Messages
12,479
Reaction score
601
Location
Laandaan
Actually GPU for audio processing is a very real possibility that may or may not come to a DAW near you. The compute modules on these GPUs are programmable (OpenCL, CUDA) and can indeed be used for general computing. Some workloads are a better fit than others, of course, as these compute cores have a very focused feature set. AI maths and some crypto loads, to name two, are often run on GPUs and indeed, all the GPU players offer non-graphics, compute only variants of their cards for the data centre.

In fact, the PS5's 3D audio (the Tempest Engine, as Sony calls it) runs off its GPU - an article on it here: https://www.tomsguide.com/uk/features/ps5-and-3d-audio-everything-you-need-to-know - whether fully or partially (probably the latter) I have no idea, but audio can certainly be accelerated using GPU compute.
 

Goz

Psy-Richard
Staff member
Forum Supporter
Messages
8,360
Reaction score
1,051
Location
Oxford
Anyway ... it's all good ... just bought a 24GB GeForce RTX 3090 to help with neural net training for the beast at work! :Grin:
 

NabLa

Spaniard DeLuxe
Messages
12,479
Reaction score
601
Location
Laandaan
How did you manage that? These are apparently harder to come by than the new consoles.
 

Goz

Psy-Richard
Staff member
Forum Supporter
Messages
8,360
Reaction score
1,051
Location
Oxford
How did you manage that? These are apparently harder to come by than the new consoles.

overclockers.co.uk still have a few ... they're selling out like hot cakes though.
 

Invariant

Relativity Records
Messages
47
Reaction score
21
Location
UK
This made for interesting reading.
I knew nothing about AMD, so instantly deferred to Intel during a current search.

@bez23 ... are you building yourself or considering off-the-shelf solutions?

(My requirements are minimal so think I'm settled on a Lenovo All-In-One solution...for ease..
...although the missus keeps leaning me towards 3XS/Scan. Her production 'puter is ridic spec and seems sturdy so far, 1 year in.
It's difficult to beat Scan on cost, that's for sure. Basically their systems are built free of charge. If you try to source the components separately it doesn't work out cheaper. Plus you get a free coffee mug with the purchase...
 

Nanook

In the kitchen, studio or gym.
Forum Supporter
Messages
23,916
Reaction score
1,090
It's difficult to beat Scan on cost that's for sure. Basically their systems are built free of charge. If you try to source the components separately it doesn't work out cheaper. Plus you get a free coffee mug with the purchase...

I'm really impressed with the build quality, regardless of how they do it.
SUPER solid. ...its like a tank.
 

nab

Forum Member
Messages
1,674
Reaction score
171
Location
Cornwall
No idea on that, Bez. Like Psyfi I use a 2013 MacBook Pro (which kicks arse, if I may say).
 

Nanook

In the kitchen, studio or gym.
Forum Supporter
Messages
23,916
Reaction score
1,090
Quick Q... Anyone here use an AMD Ryzen 3?
(Considering getting a PC loaded with one as my new office computer. In the sales, innit. Won't need to do heavy duty.)

Shite?
 