Test: CPU Core count and RAM scaling in ACC, AC1 and R3E

RasmusP

Premium
Is there any program for windows that displays CPU usage like top(1) in Unix does, which is 100% = one core pegged?

I want to see how many times I have single core doing a single thread workload being a holdup.
I'm yet to find something like that!
Only thing I've found is process explorer -> rightclick on process -> properties -> threads.

You can calculate the maximum usage a single thread can show by doing 100% / threads (the CPU's hardware threads).
For my 2600k that's 100% / 8 = 12.5%.
So if one thread hits 12.5% it's using one core to its full extent.
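
To put that rule of thumb into a few lines, here is a minimal Python sketch. The function name `max_single_thread_share` is made up for illustration, and it assumes the monitoring tool reports usage as a share of all logical CPUs (physical cores times HT/SMT).

```python
def max_single_thread_share(logical_cpus: int) -> float:
    """Largest overall-CPU percentage one software thread can show.

    Assumption: the tool reports usage relative to all logical CPUs.
    """
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return 100.0 / logical_cpus


# Example configurations (labels are only illustrative)
for label, logical in [("4c/8t", 8), ("4c, HT off", 4), ("6c/12t", 12)]:
    print(f"{label}: {max_single_thread_share(logical):.2f}%")
```

A thread sitting at or very close to this value is most likely keeping one core busy all the time.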

In reality, AC for example already hits the single-core limit at 11.9%.

Also, you don't get any statistics over time at all, just real-time values...

1581462098498.png
 
Is there any program for windows that displays CPU usage like top(1) in Unix does, which is 100% = one core pegged?

I want to see how many times I have single core doing a single thread workload being a holdup.

hwinfo64: you can customize it to show the actual load per core, either in the background, in a log or as an overlay. Set colors/alarms for when a core is pegged, etc.

The developer is also very responsive, so if you come across a use case, post it on his forum and he'll respond directly with either a method or when he can add it to the next release.
 
The one core pegged alarm isn't very useful because Windows automatically shuffles the load between cores for most practical tasks like games unless you deliberately run benchmarks for single core tests. Few people are aware of this, but you can actually have a perfectly even core load showing in hwinfo64/Afterburner/etc, but still have a program that is running on one thread, so as @RasmusP pointed out, process explorer is your friend if you want to find out how much CPU any single thread in the program consumes.
 

RasmusP

Premium
Exactly.
When I run a single thread cinebench, all my cores are evenly loaded. Very low load though...
 

RasmusP

Premium
Have a look at core use with just AC running.....nothing else.
The AMD CPUs do the "shuffling" a little differently due to the extremely different clock speeds of the individual cores.
It makes sense to put the big threads of an application on the highest-boosting cores.
Intel mostly boosts one core after another to the maximum or, like most overclockers do, simply runs all cores at the same frequency.

I for example have my 2600k @ 4.4 GHz.
Cinebench with 2 threads looks like this:
1581590299537.png


And Taskmanager looks like this:
1581590356705.png
 
You can calculate the maximum amount of usage for your cpu by doing 100% / Threads.
For my 2600k that's 100% / 8 = 12.5%.
So if one thread hits 12.5% it's using one core to its full extent.

I haven’t used process explorer, but is that really how it works? If, for example, you had an old single-core game/app that was maxing out a core with nothing else running on the other cores, are you saying that the single core being used would still show only 12.5% at maximum load? I can’t see the logic in that.

As an aside, I use MSI Afterburner which can easily be set up to show individual core usage (amongst many other system parameters) and gives you a reasonable historical graph of core loads - about 10 minutes on screen.
 

RasmusP

Premium
You got that a bit mixed up. If you have a CPU with 8 cores (8 logical cores, as Windows sees them) and an application running on 1 thread, you would see 12.5% overall CPU load.
If you had a single-core CPU running the same application, you would see 100% CPU load.

Now with the 8 core CPU it depends on the Windows version and the exact CPU how this 12.5% overall load is spread and shuffled across the 8 cores.

Ryzen 3xxx CPUs would show one or two cores almost maxed out and boosted to max frequency.
With my older i7: well, you see the screenshots above.

So in the end, when you want to know which piece of hardware is limiting your fps, it's these steps:
1. Graphics card at 100% load or not (some games use too much power or gpu ram so let's say 90% for gpu limit)
2. If gpu not at 90+% => CPU limit
3. If CPU overall load not at 90+% => single thread limit
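
The three steps above can be written down as a small decision function. This is only a sketch of the rule of thumb: the function name `likely_bottleneck` and the default 90% threshold are taken from or made up around the steps, not from any tool.

```python
def likely_bottleneck(gpu_load: float, cpu_overall_load: float,
                      threshold: float = 90.0) -> str:
    """Rough guess at what limits fps, from averaged load percentages."""
    # Step 1: GPU at (roughly) full load means the GPU is the limit
    if gpu_load >= threshold:
        return "GPU limit"
    # Step 2: GPU has headroom, so the CPU is limiting in some way.
    # If all cores are busy, more cores or a faster CPU would help.
    if cpu_overall_load >= threshold:
        return "CPU limit (all cores busy)"
    # Step 3: GPU and overall CPU both have headroom
    return "single thread limit"


print(likely_bottleneck(gpu_load=70, cpu_overall_load=40))
```

The loads you feed in are whatever Afterburner or Taskmanager shows while the game runs; the 70%/40% example lands on the single thread limit.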

So if your overall CPU load isn't at 90% or more, buying a CPU with more cores might give you a slight improvement in fps due to better shuffling and headroom, but the real performance boost would come from a CPU with higher single thread performance.

Which is why an i5 8600k will shred an AMD 2700 to pieces in AC, r3e, ams, rF2.


About Afterburner, Taskmanager etc.: they all show the average load of a core over a certain time frame.
In reality a CPU core can't be loaded to, let's say, 40%.
It's either doing a task at full load or it waits.
Depending on how much it works and how much it waits, you'll see an averaged load.

With a single thread application on an Intel it will look like this:
0ms = core 1 at 100%, core 2-8 wait
10ms = core 2 at 100%, core 1+3-8 wait

1000ms = core 1 at 100%... You get the idea.

In the end one thread can only run on one core at a time. So one thread can only cause a maximum overall CPU load of 100% divided by the number of CPU cores (logical cores, as Windows counts them).

For me that's 12.5% and this is why a cinebench thread in process explorer shows 12.5% and can't use more of the cpu.
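
Here is a toy simulation of that picture. It assumes a strict round-robin hop to the next core every 10 ms, which is a big simplification (the real Windows scheduler is not that regular), and the function name is made up for illustration.

```python
def simulate_hopping_thread(logical_cpus: int = 8, slice_ms: int = 10,
                            total_ms: int = 800):
    """One always-busy thread that moves to the next core every slice.

    Returns (per-core averaged load in %, overall CPU load in %).
    """
    busy_ms = [0] * logical_cpus
    for i in range(total_ms // slice_ms):
        core = i % logical_cpus      # round-robin hop (assumption)
        busy_ms[core] += slice_ms    # that core is 100% busy for the slice
    per_core = [100.0 * b / total_ms for b in busy_ms]
    overall = sum(per_core) / logical_cpus
    return per_core, overall


per_core, overall = simulate_hopping_thread()
print(per_core)  # every core shows the same low averaged load
print(overall)   # overall load equals 100 / logical_cpus
```

Every core looks lightly and evenly loaded in the averaged view, even though a single thread is running flat out the whole time.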


So if your gpu is at 70% and all your cores only show a load of 40%, you can see in process explorer if a thread hits the maximum.
If not, and there aren't more threads than cpu cores (20 cinebench threads on 8 core CPUs won't hit 12.5% anymore!), you are hitting your single thread performance limit.
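
The remark about 20 Cinebench threads can be sketched like this, assuming the scheduler shares CPU time perfectly evenly between busy threads (real scheduling is messier). The function name is hypothetical.

```python
def per_thread_ceiling(busy_threads: int, logical_cpus: int) -> float:
    """Highest overall-CPU percentage each busy thread can show,
    assuming perfectly fair time sharing (a simplification)."""
    if busy_threads < 1 or logical_cpus < 1:
        raise ValueError("both counts must be at least 1")
    if busy_threads <= logical_cpus:
        # Each thread can have a logical CPU to itself
        return 100.0 / logical_cpus
    # More busy threads than logical CPUs: they split the whole CPU
    return 100.0 / busy_threads


print(per_thread_ceiling(2, 8))
print(per_thread_ceiling(20, 8))
```

With 20 busy threads on 8 logical CPUs, no single thread can reach the 12.5% mark anymore, so a lower per-thread number doesn't mean the same thing in that case.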


Now the new AMDs show the same or even higher single thread performance than the current Intels, but this would change quite drastically if you ran 3 threads instead of one.
The AMD cores won't boost as high on 3 cores as they would on 1 core.
 

Andrew_WOT

Premium
fpsVR can show how much of frametime is spent in GPU and CPU, giving a very clear picture with some graphs on what is bottlenecking. Best way for finding out what is dragging performance down.
I am surprised we do not have anything like that for flat screen, maybe I didn't look hard enough.
Frametime in Afterburner, is it overall or GPU only?
 

RasmusP

Premium
It's overall sadly...
FpsVR indeed sounds great!
 
About afterburner, Taskmanager etc : they all show the average load of the core over a certain time frame.

My understanding is that Taskmanager and Afterburner only show average load if that’s how you’ve set it up. You can select an overall, average core load for the whole CPU or you can show the real-time, individual core load.

In reality a CPU core can't be loaded to, let's say, 40%.
It's either doing a task at full load or it waits.

Why then, in Afterburner (and Taskmanager), can you see a continuously changing individual core load? As an example, in Flight Simulator X (essentially a single core app with some multi core capability), you can see the primary core being used up to 100% whilst the other cores are doing much less. Or is there something I’m simply not understanding?
 

RasmusP

Premium
My understanding is that Taskmanager and Afterburner only show average load if that’s how you’ve set it up. You can select an overall, average core load for the whole CPU or you can show the real-time, individual core load.
A true "real time value" would mean the graphs refreshing at 4.4 GHz for my CPU. Each core can be loaded or waiting at a time interval set by the clock speed (it's a lot more complicated, but basically this).
So even if the graph refreshed every millisecond, it would still show an average over roughly 4.4 million CPU cycles (4.4 GHz × 1 ms).
So if core 1 waited for about 1.9 million of those cycles and was busy for about 2.5 million, Taskmanager/Afterburner etc. would show 57% load.
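
As a rough worked example of that averaging, here is a small Python sketch. It treats a core as either busy or waiting on every clock cycle, which is a simplification, and the function names are made up. At 4.4 GHz a 1 ms sample spans about 4.4 million cycles.

```python
def cycles_per_sample(clock_hz: int, sample_interval_us: int) -> int:
    """Clock cycles that fit into one monitoring sample."""
    return clock_hz * sample_interval_us // 1_000_000


def averaged_load(busy_cycles: int, idle_cycles: int) -> float:
    """Load in % as a monitoring tool would average it over one sample."""
    return 100.0 * busy_cycles / (busy_cycles + idle_cycles)


# 4.4 GHz clock, one sample per millisecond
print(cycles_per_sample(4_400_000_000, 1_000))
# Busy for 2.5 million cycles, waiting for 1.9 million
print(round(averaged_load(2_500_000, 1_900_000)))
```

So even a very fast-refreshing graph is always showing a ratio of busy to waiting time, never an instantaneous value.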
Why then, in Afterburner (and Taskmanager), can you see a continuously changing individual core load? As an example, in Flight Simulator X (essentially a single core app with some multi core capability), you can see the primary core being used up to 100% whilst the other cores are doing much less. Or is there something I’m simply not understanding?
I'm no hardware expert and no Windows programmer. I have no real clue about how Windows distributes CPU tasks across multiple cores, but in the end it's not important for the question a consumer needs answered.
The question is: what do I need to upgrade, do I need more cores or more single thread performance, and what is the best product for me personally?

For my CPU, Windows 7 seems to think that for 2 heavy threads + a bunch of background programs, the best distribution is the one in my screenshots above.

For an AMD Ryzen 3xxx, Windows 10 seems to think putting the heavy threads on the max boosted core is the best way, while every now and then changing which core is the max boosted one.

For your CPU and Flight Simulator X, your Windows seems to think throwing a lot of load on the primary core is the most efficient way.

That's the end of my knowledge sadly.. I'd love to find out more about why and how Windows does this, but the effect is pretty easily visible:

If I add up the loads of the threads in process explorer and compare it to the overall load, I see that the overall load is more than this.

So it seems that if you have 1 thread and only 1 core loaded at 100%, it's not as efficient as having that 1 thread running alternating between 2 cores, loaded to 50% (1 cycle loaded, 1 cycle waiting).
This probably has to do with the waiting core loading something from the ram or doing something else which boosts the overall performance.

So more cores = better, but the fewer threads you have running, the more a higher single thread performance matters, as it directly boosts performance instead of slightly increasing efficiency.
 
One thing that further complicates things with Ryzen is that it uses so-called CCX clusters, which are connected together with Infinity Fabric. Most Ryzen CPUs come with two CCXs, so for example the Ryzen 5 3600 has 3+3 cores (one core disabled in each CCX) and the Ryzen 7 3700 has 4+4 cores. The catch here is that any communication that has to take place between the CCXs is slower than communication inside a CCX. Intel uses something very different called Ring Bus, where each core is connected to each other, so there is no difference in delay when switching between core 1 and core 3 or core 1 and core 8.

So if you have a task that requires, let's say, two cores, the Intel way would be for Windows to distribute it evenly across every core available. With Ryzen, it gets more complicated. The core scheduler first has to take into account the different boost frequencies in Ryzen between, say, actively running two versus six cores. Secondly, the scheduler should be aware of the CCXs, so it should preferably try to keep all computation inside the same CCX unless the app truly needs more cores. There was a news article here indicating that Windows core scheduling is not working exactly the same way as Ryzen Master, so there might be ways to get better performance with Ryzen depending on which scheduler you use.

Anyway, bottom line is, the individual CPU core readings are not very useful unless you know how the CPU architecture and underlying scheduler works.
 
For an AMD Ryzen 3xxx Windows 10 seems to think putting the heavy threads on the max boosted core is the best way while every now and then changing what's the max boosted core is.

Are you sure Windows is scheduling with a "current frequency" awareness?

I would think it is more likely that the processor decides to up-clock those cores that a certain load profile already runs on.
 

RasmusP

Premium
No, I'm not sure at all about any of this. I just know the different results from comparing CPUs.
Afaik win 10 got an update for better AMD performance though, so it seems that Windows and the CPUs at least know a little about how to work best together.

At least the "load profile", as you call it, seems to be different between AMD and Intel, as the load stays on a few specific cores with AMD compared to the full shuffling on the Intel CPUs.

Maybe win 10 just knows "On Intel: spread it all as you feel it's most efficient" and "On AMD throw everything background on one CCX and the big threads on the other CCX and keep it kinda constantly on one or two cores"

But as I said, I have no knowledge of how this all works exactly... I just see the results as they are and how they affect performance.
 
I'm yet to find something like that!
Only thing I've found is process explorer -> rightclick on process -> properties -> threads.

You can calculate the maximum amount of usage for your cpu by doing 100% / Threads.
For my 2600k that's 100% / 8 = 12.5%.
So if one thread hits 12.5% it's using one core to its full extent.

In reality for example AC is hitting the single core limit already at 11.9%.

Also you don't see any statistics at all, just real time...

View attachment 348891
Could you explain what we can ascertain from this? My CPU is delivering frames more than fast enough for my target frame rate (45) (e.g. say CPU frame time is 9ms), yet I can see a single thread going over my 8.33% single thread limit (up to 12%).
 

RasmusP

Premium
Hmm, if you have 12 cpu threads (6c + ht) then 8.33% should be the limit. And I've never seen process explorer showing more than what should theoretically be possible!
12% clearly indicates having 8 cpu threads. Either 8 cores without ht/smt or 4c with ht/smt.
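
That back-calculation can be sketched as follows: pick the logical CPU count whose 100% / threads ceiling is closest to the highest per-thread value you see. The candidate list and the function name are assumptions for illustration only.

```python
def infer_logical_cpus(max_thread_percent: float,
                       candidates=(1, 2, 4, 6, 8, 12, 16, 24, 32)) -> int:
    """Guess the logical CPU count from the highest per-thread usage seen.

    Picks the candidate whose ceiling (100 / n) is closest to the value.
    """
    return min(candidates, key=lambda n: abs(100.0 / n - max_thread_percent))


print(infer_logical_cpus(12.0))   # closest ceiling is 12.5% (8 logical CPUs)
print(infer_logical_cpus(8.33))   # closest ceiling is 8.33% (12 logical CPUs)
```

A reading of 12% sits much closer to the 8-thread ceiling than to the 12-thread one, which is why it looks odd on a 6c/12t CPU.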

Which cpu do you have? And where do you see the 12%? In process explorer?

Apart from this, you can't see any relevant information about fps. I've seen for example rocket league using 2 threads at 12% even when using a limiter to 40 fps. It stayed the same when disabling the limiter and having 250 fps.

The only thing you can see in process explorer is how many cpu cores you need before more cores stop giving you more fps.
For example, simracing titles all use only 2-4 threads, so having more than 5 cores barely increases fps.

But having a higher per-core performance with a 5 core cpu would give you a straight fps gain.

Which is why 9600k and 5600x are currently the most recommended CPUs for simracing.
 
Yeah, it does seem fishy. I initially loaded a replay and the % was well below 8.33, so I started a single player quick race to see if having AI calculations made any difference (that should be a separate thread from the render thread though, shouldn't it?) and then the % went up to 10-12%.
8700k, so definitely 12t. I was seeing it in the threads tab you show above, on one of the ACC threads. Will have another mess around tomorrow.
 

Andrew_WOT

Premium
If a single thread switches cores between monitoring refresh cycles (very typical, esp. with HT), would that show aggregated usage from each core used, thus exceeding the max allocation per core?
Try pinning process to specific core using Affinity.
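
For anyone wanting to try the pinning: on Windows, affinity is expressed as a bitmask with one bit per logical CPU. The helper below is a hypothetical sketch that only builds that mask. As far as I know, `start /affinity` in cmd takes the mask in hexadecimal, and Taskmanager's "Set affinity" dialog does the same thing with checkboxes.

```python
def affinity_mask(logical_cpu_indices) -> int:
    """Build an affinity bitmask: bit n set means logical CPU n is allowed."""
    mask = 0
    for idx in logical_cpu_indices:
        if idx < 0:
            raise ValueError("CPU indices must be non-negative")
        mask |= 1 << idx
    return mask


# Pin to logical CPU 0 only
print(format(affinity_mask([0]), "X"))      # prints 1
# Allow logical CPUs 0 and 2
print(format(affinity_mask([0, 2]), "X"))   # prints 5

# Hypothetical usage from cmd (game.exe is a placeholder):
#   start "" /affinity 1 game.exe
```

With HT enabled, neighbouring logical CPUs usually share a physical core, so which indices belong together depends on the CPU.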
 

RasmusP

Premium
Process explorer doesn't show core usage. It shows just the cpu usage in general for the internal application threads.
And since they can't be "split", their maximum overall cpu usage is 100% divided by the cpu thread count.
With my 2600k (4c/8t) it always maxed out at 12%. When I disabled ht it maxed out at 25%.
With my 10600k it now maxes out at either 8.33% with ht or 16.6% without ht.

It's not fps related though. You can only see how many cores you should buy for that application.
 