Comments
Just speculating - as I said, I'm new in this particular area, still learning and trying to understand low-level concepts - but in theory, if Moog adopts this new API, you should be able to run twice as many instances on processors with 2 high-performance cores (maybe a little less than exactly twice because of some additional overhead from managing other things, but definitely a lot more).
Of course - IF they adopt the new API.
Diva VST can have multicore processing on a single instance in Reaper.
Besides the overhead, on iPads we may not see much improvement if iOS decides to throttle down. But I'm still curious to see the benefits on my old triple-core Air 2. I think Zenbeats had an option for multicore, but the performance was worse.
Yes - prior to iOS 14 and this new API, multicore usage was possible, but it is very tricky, and if it is not done properly it can make things worse (which is obviously the ZB case). The biggest problem is syncing multiple realtime threads - this should be solved by the new API.
Exactly the same number as now unless it gets a major rewrite to take advantage of that and/or (I'm not clear which) a host is able to take advantage of it.
Even then there will be no one numeric answer to your question because it'll still depend on hardware and other apps running just like it does now.
Don't hold your breath.
I expect it will have the same limitations as desktop DAWs unless they are doing something clever - namely, that plugins which share a group bus or send have to be processed on the same core.
From the video (3:09-3:29) I understood that a host can also distribute audio threads across cores (besides the fact that an AU can now have multiple threads). So depending on the hardware, it may help when adding more instances even if the AU itself is not multi-threaded.
But the AU must join the audio workgroup, so it will require some coding.
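For anyone curious what that coding roughly looks like, here is a minimal sketch in C of just the joining part, assuming the plugin has already obtained the host's os_workgroup_t (for instance from the render context it is handed on the render thread) and runs its own auxiliary realtime thread; render_more is a made-up callback:

    #include <os/workgroup.h>
    #include <stdbool.h>
    #include <stdio.h>

    // Runs on the plugin's own auxiliary realtime thread. `wg` is the host's
    // audio workgroup; `render_more` is a hypothetical callback saying whether
    // there is still work to do.
    static void aux_dsp_thread(os_workgroup_t wg, bool (*render_more)(void))
    {
        os_workgroup_join_token_s token;
        // Join the host's workgroup so the scheduler treats this thread as part
        // of the same realtime workload, with the same deadline.
        if (os_workgroup_join(wg, &token) != 0) {
            fprintf(stderr, "could not join workgroup\n");
            return;
        }
        while (render_more()) {
            // ... process one block of audio here ...
        }
        // Leave before the thread exits (or before the workgroup changes).
        os_workgroup_leave(wg, &token);
    }

The joining itself is just those two calls; the fiddly parts are getting the workgroup from the host and managing the thread's lifetime.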
@Michael @Michael @Michael ?
I'm wondering if this is what could fix the CPU glitch issues on the 12.9" 2017 iPads in AUM when opening and expanding AU GUIs.
So if you can theoretically split things across all 6 cores on the 2017 model, could you potentially achieve 6x the performance?
I'm optimistic, but I know it's probably not that straightforward
Yeah, there is that.
There are some plug-ins on Mac/Windows which would not even be possible to run without multi-core (and/or AVX/AVX2 instructions), like Kaleidoscope, since they are so heavy on the CPU (but very unique, and that is the price if you want up to 512 tuned resonators).
So multi-core is always a great thing, also for single plug-ins.
I just wonder if iOS devices can handle the thermal load of such tasks and/or won't throttle them even more.
But indeed for DAWs it should be the best thing ever on iOS.
This could also lead to a more realistic Logic iOS version
On iOS, the host can't affect the thread management of a running plugin in any way. The host just requests a plugin instance interface (simplified explanation), sends the plugin instance a stream of audio and (possibly) MIDI, and receives from the plugin a processed output stream of audio and MIDI. Then it mixes that together with its own stuff.
It can't affect the plugin's threading in any way, and it cannot decide to "load the plugin on a different thread". It doesn't work that way - that's outside the host's ability.
Also, on iOS all plugins run "out-of-process", which means they have their own audio thread and their own main (UI) thread (shared by all instances of the same plugin). The plugin then communicates with the host through the API.
On desktop, a host can request that a plugin run "in-process" (as part of the host's process), but the plugin must allow that (it must be compiled to accept that way of running). Running in-process is a little better in terms of performance (you don't need an intermediary for host<->plugin communication) but also riskier, because a plugin crash takes down the whole host.
(This is for AUv3 plugins - no idea how it is with AUv2 and VST; maybe they are all "in-process" by default?? Don't know.)
Anyway, this option exists only on desktop, not on iOS. On iOS, all plugins simply run out-of-process.
Bear in mind that only 2 cores are high-performance; the rest are efficiency (lower-performance) cores, so you don't get as much from them.
Anyway, an app (no matter if it's a host or a plugin) can just create multiple threads and run some code on those threads. It cannot decide which CPU core physically processes which thread - that is iOS's responsibility. iOS decides, based on multiple things, which thread is processed by which core at which moment.
And as soon as iOS detects a higher load on the realtime thread(s), it switches them to the high-performance cores (continuously) and turns OFF the low-performance cores.
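To make that concrete, a tiny C sketch: the app spawns the threads, but nothing in the code says (or can say) which core runs them - that is entirely the scheduler's call:

    #include <pthread.h>
    #include <stdio.h>

    // Worker function: iOS decides which core executes this, and may move it
    // between cores while it runs.
    static void *worker(void *arg)
    {
        long id = (long)arg;
        printf("worker %ld running on whatever core the OS picked\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[4];
        for (long i = 0; i < 4; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);
        for (int i = 0; i < 4; i++)
            pthread_join(threads[i], NULL);
        return 0;
    }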
I wonder how cool your iPad will remain with all 6 cores running at full speed (is that even supported by the hardware?)
If they don't run at full speed then I wonder how much gain this will really add.
It's not like you can simply split audio processing over multiple threads and get a linear increase with the number of CPU cores; there's always overhead, and often wait times until other threads are finished.
A web browser or running multiple apps simultaneously is a much better example for taking advantage of multiple cores.
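To put a rough number on the "not linear" point (Amdahl's law - p is the fraction of the work that actually runs in parallel, n is the number of cores; the 80% figure below is just an illustration):

    speedup = 1 / ((1 - p) + p / n)

    e.g. p = 0.8 on n = 4 cores:  1 / (0.2 + 0.8/4) = 1 / 0.4 = 2.5x, not 4x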
hopefully more stability and at least a significant performance boost even if there's a limitation on all the cores at once, but I guess we'll find out when developers get to play with it
I'm sure it will evolve over time. And finally MIDI clock processing can have its own thread 😅
"Scheduling" may not be the right word. And I don't know anything about audio programming. But I assume the idea is that if a host is loading a plugin, then the host can control what process/thread the plugin starts in, and presumably what core that process runs on.
Here's an article that may be an interesting read: https://www.soundonsound.com/sound-advice/multi-core-processors-musicians
One quote from that article:
"The vast majority of stand-alone soft synths also seem to mostly use a single core, but as soon as you load the VSTi or DXi version into a host VSTi or DXi application, this host should distribute the various plug-ins and soft synths across the available cores to make best use of resources. Fortunately, most multitrack audio applications can distribute the combined load from all your tracks between as many cores as they find . . . ."
Kind of - it's an experimental option for forcing more polyphony, and it actually uses more CPU rather than spreading the load. Interesting, but in general, load spreading across cores is handled entirely by the host.
Steinberg has multicore tweaking options in HALion, Groove Agent, etc. Dunno if it's related; it just struck me when reading https://steinberg.help/halion_sonic/v3/en/halion/topics/_shared/options_page_r.html
But if one instance of a synth/patch/FX doesn't have to run on a single core, it makes sense.
I have several synths and FX which use it, and it works. When I enable multi-core I can run patches which cause trouble without it.
It's just that DAWs might handle it differently, and it could cause trouble to use multi-core in a plug-in together with your DAW. At least I have no trouble with Logic when multi-core is set to auto (up to 8 threads in my case), and it also seems to run fine with plug-ins which use their own multi-core.
Diva, for example, just had an update with better multi-core handling. Dune 2/3 and The Legend use it.
And like I said, really heavy FX like Kaleidoscope would not run at all without it (unless you have a really fast CPU).
But it could be a nightmare to support this with all the DAWs handling things a bit differently.
At least U-he, Synapse Audio, and 2CAudio, for example, seem to have gotten it right.
@hes thanks for that article, I think I'm understanding now - I wasn't properly getting one detail...
Aren’t multi-core and multi-thread 2 different things?
Like multiple threads on one core is one thing, but it doesn’t imply multi-core processing.
Maybe they mention multi-core specifically in the video (wasn't fully paying attention, sorry),
but I had to ask because it seems like a lot of posts in this thread use the terms interchangeably.
True. Apps work with threads. The operating system makes the decision about which thread runs on which core.
The video from the first post is about threads and how to sync them... in the end you need to mix the results of all the threads created by all apps together into one final stream, which ends up in the main OS audio thread that goes to the HW interface... "Audio Workgroups" provides a mechanism for apps to manage this whole process.
So is a thread a virtual core? Since, for example, the 4-core i5 has 4 cores, but the i7 has 4 cores and 8 threads (8 virtual cores).
A thread is basically a queue for application code... the application puts code into this queue for later execution. On iOS, all UI-related code goes into the so-called "main thread" - that's where everything related to the UI and the whole application lifetime happens.
DSP code goes to the "realtime audio thread" (or, to be exact, into the "function" which is called periodically by the system's main audio thread)... The application can open as many other threads as it needs and put code for later processing into those threads/queues - it's the application's responsibility to manage those threads and share data between them in a meaningful way.
The number of threads an app runs is totally unrelated to the number of cores.
The operating system then decides (based on a lot of things) which thread is processed by which CPU core and when. It can stop processing one thread at any time and jump to another, and so on...
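On iOS the "queue" idea becomes literal if you use libdispatch: you put code on a queue, and the system runs it later on some thread, on some core of its choosing. A minimal C sketch (the queue label is made up):

    #include <dispatch/dispatch.h>
    #include <stdio.h>

    int main(void)
    {
        // A serial queue: blocks put on it run one after another, later,
        // on a thread (and core) that the system chooses.
        dispatch_queue_t work =
            dispatch_queue_create("com.example.background-work", DISPATCH_QUEUE_SERIAL);

        dispatch_async(work, ^{
            printf("this runs later, on whatever thread/core the OS provides\n");
        });

        // Block until the queued work has finished, just so the demo exits cleanly.
        dispatch_sync(work, ^{ });
        return 0;
    }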
A thread is just a queue of software instructions for a processor to execute. A core is a processor. It's the same as having multiple CPUs, just contained in one chip. A core/processor's job is to execute stacks of requests (threads). I don't think that hosts decide which core threads execute on. I think the operating system decides that, but I could be wrong on that point.
Think of a thread as a line at the grocery store. Think of cores / CPUs as the clerks at the checkout stand. Each clerk only services one customer at a time.
To stretch this analogy even further to apply to iOS audio processing. Say a van load of people arrive at the store to go shopping. They get their stuff and get in line at any available check stands. If they all had to line up at a single check stand it would take much longer than if there were four check stands. They each get processed through at various times and then climb back in the van. But the van can't leave until all shoppers are back on board.
That's the point that Dendy was making: audio input (the van of shoppers arriving) gets processed, more or less efficiently based on the number of check stands (cores / processors), but all processes must be complete before output (the van of shoppers leaving).
What the hell ... I'm so far in the weeds now, I might as well stretch it ridiculously farther. Different clerks at the store work at different speeds. Maybe there's one or two good clerks working, but the lines start to increase, so they open more lines, but they call in the less capable clerks. Those lines move slower than the others, but they still help overall. Pity you if you get in the line with the clerk that moves like molasses (which I always do). That's what Dendy was saying about some cores being slower than others.
Bad app code is analogous to that person in the line who empties their piggy bank and counts out 200 pieces of change to pay for their groceries.
And this is where things start to get REALLY interesting with this new Audio Workgroups API.
If you look at this image:
It's clearly possible to run parallel threads which are not in sync with (or within) the buffer time frame of the main audio thread - they can deliver their result later, and when it is delivered, it is passed into the current buffer round of the main audio thread...
This is pretty cool - I can see how this could be used, for example, for reverb, where you can precalculate in another thread what you will only need later... Huge reverb optimisation opportunity!!
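Just to illustrate the idea - a deliberately simplified C sketch (single producer, single consumer, fixed block size; a real implementation would recycle buffers, start the worker with pthread_create, and, as discussed above, join it to the audio workgroup):

    #include <stdatomic.h>
    #include <string.h>

    #define BLOCK 512

    static float tail_buf[BLOCK];            // precomputed reverb-tail block
    static atomic_bool tail_ready = false;

    // Runs on a worker thread: does the expensive part ahead of time.
    static void *precompute_tail(void *arg)
    {
        (void)arg;
        for (int i = 0; i < BLOCK; i++)
            tail_buf[i] = 0.0f;              /* ...expensive convolution/FDN maths here... */
        atomic_store(&tail_ready, true);     // publish: the block is now available
        return NULL;
    }

    // Called on the realtime audio thread for each buffer: picks up the
    // precomputed block whenever it has arrived - possibly a buffer or two
    // after the work was kicked off.
    static void render(float *out, int frames)
    {
        if (atomic_load(&tail_ready) && frames <= BLOCK)
            memcpy(out, tail_buf, sizeof(float) * (size_t)frames);
        else
            memset(out, 0, sizeof(float) * (size_t)frames);   // not ready yet: silence
    }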
This Wikipedia entry may (or may not) clarify some things: https://en.wikipedia.org/wiki/Thread_(computing)
I think that's true, generally, but not true regarding the sort of audio processing we're talking about. Decisions about putting different processes on different cores can be made at the OS level when the processes are totally independent. For example, if you have a word processor and a photo editing app running, they have nothing to do with each other, so the OS is free to place them on different cores.
However, with a set of audio apps producing a single output, they may run in different processes/threads, but the work needs to be coordinated: e.g., work done in a plugin's process needs to be monitored and eventually brought back into, and synchronized with, a main coordinating process (i.e., the host). In this case, the OS itself does not have enough information to spread the load among different cores; the host app is the only process that has all the information necessary to coordinate the dependent plugin processes that are to be run on different cores. So the host app needs to work in concert with the underlying OS to spread the load across cores - the OS can't make the decisions by itself. (At least it seems to me that this is the way it has to be.)
I’m surprised how efficient MiRack already is across multiple instances. Real multi-threaded audio is going to be a boon for apps like this.
It’s one of the last pieces to the hardware puzzle to bring this platform even closer to parity with desktop.
I don't do programming, but I like to learn about technical concepts.
After reading Wikipedia's article on multi-processor computing, if I understood it correctly, a main issue is scheduling the access to things like Storage, Memory, and I/O connections.
Multiple processors can process their respective threads in parallel faster than a single processor can... But apparently, the results of those parallel-processed threads require other processes devoted to managing I/O.
An example mentioned in the Wiki article was modifying data in a specific area of storage or RAM. The system design needs to prevent different threads from modifying data that other threads may still need to use.
So even though certain types of "processes" can be completed much faster, processes that require use of the same I/O function may end up waiting for other processes to complete, to prevent erroneous I/O.
In other words, for some types of computer function, the speed can only be as fast as the speed at which any specific type of I/O is capable of handling throughput.
If I'm correct in my assumption, then to apply it to wim's grocery store analogy: let's consider multiple vans parked in the store parking lot, where each driver must leave at an exact time... But the rate at which people can move through the store's in-door and out-door has a limit... You could have multitudes of checkout lines that speed up the checkout process... but the one function that matters most is that the vans leave on time...
So the store has to keep track of each shopper's departure time, and make sure they are put in both the checkout lines, and in the lines leading into and out of the store, in an order that allows all the shoppers to make it back into their vans on schedule.
I could have interpreted the Wikipedia article wrong, but that sums up the way I understood what I read.
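The "don't let one thread modify data another thread still needs" part is usually handled with locks (or lock-free structures). A generic C sketch with a mutex - with the caveat that realtime audio threads normally avoid locks and use atomics or lock-free queues instead:

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static double shared_value = 0.0;   // some state shared between threads

    // Any thread may call this; other threads wanting the lock wait until it's released.
    static void set_value(double v)
    {
        pthread_mutex_lock(&lock);
        shared_value = v;
        pthread_mutex_unlock(&lock);
    }

    static double get_value(void)
    {
        pthread_mutex_lock(&lock);
        double v = shared_value;
        pthread_mutex_unlock(&lock);
        return v;
    }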
This is not a precise or complete analogy, but this may be a way to think about single-core vs. multi-core in the audio app context using your van at the grocery example:
Assume we have a single van (the DAW) that needs to get filled up with food by a certain time. The van has four workers (the plugin apps) that can be used to gather food in the store to hopefully leave on time.
In the single processor core scenario, the van sends the four workers into the grocery store to gather produce. However, they have a (strange) limitation: only one of them can move or gather produce at a time (during their "time slice" on the single processor core). So Worker 1 moves a bit and gets an apple or two, then he must stop and Worker 2 is allowed to gather for a while, then Worker 3, and so on. Eventually they all make it back to the van; if one arrives back before the others he will just sit and wait until they all get back. (Hopefully they all get back in time, if not, then you may, for example, hear a "crackle".)
In the multi-processor scenario, the van sends the four workers out again. Only this time, the workers can all go about their business without worrying about being stopped to allow another worker to move (so long as they are the only worker assigned to their processor core). In an ideal world, they would complete their task of getting all the produce back to the van in 1/4 the time.
(Preventing different processes or threads from accessing the same data at the same time is critical, but I don't think it's a necessary part of understanding the benefit of multi-processing. )