'middlewared' process spikes CPU usage in one minute invtervals

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Alright, the results are out. :)

TLDR: I think that there is a major improvement. I still see spikes, but they are usually (not always) every 5 mins instead of the previous 1 min interval. This much may be fine (I will try to investigate further).

I tried playing around with the settings etc. What I found were:
- With a fresh install, re-creating the Pools and Datasets etc. instead of importing them, the spikes still happen, but roughly every 5 mins. Although I am not sure, I would guess that this should be when the system dataset gets written into. No writes in-between.
- After adding an app, setting up shares etc., things aren't looking as good (the app was stopped after installation to mitigate any I/O etc.). This is because at this point, the CPU spikes started happening more often. Still not as often as previously with BETA-1, but still plenty of spikes. (The red interval is when I started an I/O intensive app, which was then stopped.)
Képernyőfotó 2023-10-01 - 22.15.25.png

- It was mentioned in the ticket that the logging has somewhat changed. I think I can verify this, as it seems that every value is being logged a lot more frequently: the values of the dashboard are being updated much, much faster than they were on Bluefin (at least for me), and also (as a kind of drawback I might say) the disk I/O looks weird on just about any disk if there is any I/O as the logs are just too frequent (see the image below).
Képernyőfotó 2023-10-01 - 22.03.19.png

- Due to these "too" frequent logged values, it may be that at this point, this much is fine. I will try to measure how much power the system is using with a meter I have at home. If these bumps are in any way important, then I should be able to see them on the readings. If there are no jumps in the readings however, that would mean that this is fine now and this is now only because of the updated, too granular logging frequency.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
Alright, the results are out. :)

TLDR: I think that there is a major improvement. I still see spikes, but they are usually (not always) every 5 mins instead of the previous 1 min interval. This much may be fine (I will try to investigate further).

I tried playing around with the settings etc. What I found were:
- With a fresh install, re-creating the Pools and Datasets etc. instead of importing them, the spikes still happen, but roughly every 5 mins. Although I am not sure, I would guess that this should be when the system dataset gets written into. No writes in-between.
- After adding an app, setting up shares etc., things aren't looking as good (the app was stopped after installation to mitigate any I/O etc.). This is because at this point, the CPU spikes started happening more often. Still not as often as previously with BETA-1, but still plenty of spikes. (The red interval is when I started an I/O intensive app, which was then stopped.)
View attachment 70782
- It was mentioned in the ticket that the logging has somewhat changed. I think I can verify this, as it seems that every value is being logged a lot more frequently: the values of the dashboard are being updated much, much faster than they were on Bluefin (at least for me), and also (as a kind of drawback I might say) the disk I/O looks weird on just about any disk if there is any I/O as the logs are just too frequent (see the image below).
View attachment 70779
- Due to these "too" frequent logged values, it may be that at this point, this much is fine. I will try to measure how much power the system is using with a meter I have at home. If these bumps are in any way important, then I should be able to see them on the readings. If there are no jumps in the readings however, that would mean that this is fine now and this is now only because of the updated, too granular logging frequency.
Every 5 minutes we check pool / data / quota usage (this will cause middlewared process to temporarily spike a single thread as well). Is there an actual issue you're trying to solve? Sometimes utilization is normal / expected.
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Every 5 minutes we check pool / data / quota usage (this will cause middlewared process to temporarily spike a single thread as well). Is there an actual issue you're trying to solve? Sometimes utilization is normal / expected.
I would say yes. The spiking is now a lot better and more bearable than it was on BETA. However, in my opinion at least, it is still too frequent and does not seem justified on a system with freshly created and almost entirely empty pools and datasets. I thought that the spikes were only happening roughly 5 minutes, which, as you noted, could be regarded as a perfectly normal phenomenon. However, after a bit more testing that I did this week, the situation seems to be a bit more nuanced, as the spikes are not limited to every 5 minutes.

Since this was my first NAS, I had a power meter attached to the NAS from day 1 on Bluefin, so I have an idea of how much power it uses when not in use (e.g. when I am not transferring any data etc.). The PC would sit at about 51.5 to 52.0 W mostly, with some slight jumps up to 54W at most. I tested the same setup now on Cobia RC, and what I found was that the power usage also spikes when the CPU temp./usage graphs spike, meaning that the logged spikes are not simply a result of the changed log frequency between Bluefin and COBIA.

The base wattage is still at around 52W, however, when the spikes happen, the power usage jumps all the way up to 70W, regularly. Also, sadly this happens way more often than 5 minutes. From my experience, it happens a bit randomly, sometimes every 1 to 1.5 minutes, and sometimes 2 or 3. Some other times, it goes on for 5+ minutes without any spikes (see the attached image, where the spikes are happening way more often than 5 minutes).

I think I will have a look around, as I have some vague memories that this started intensifying after moving the system dataset to an SSD.

Képernyőfotó 2023-10-03 - 22.07.29 másolat.png
 
Joined
Sep 11, 2023
Messages
6
I would agree that it is an issue. I know that truenas is not aimed at (weak) home server use, but for me at least the upgrade to Bluefin with the CPU spikes meant going from average core temperatures of ~45° to ~50°. This will add up over time, electricity and maybe also durability-wise.
 

threeseven

Cadet
Joined
Nov 2, 2023
Messages
2
I have the same problem as @fornex after updating from Bluefin to Cobia - high CPU usage spikes every 2 minutes or so. This is weird, since on Bluefin CPU usage/temp graph line was flat when system is idle. Is there any way to find out what is happening during this process is running?


Screenshot 2023-11-02 at 16.08.49.png

Screenshot 2023-11-02 at 15.48.35.png
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
I have the same problem as @fornex after updating from Bluefin to Cobia - high CPU usage spikes every 2 minutes or so. This is weird, since on Bluefin CPU usage/temp graph line was flat when system is idle. Is there any way to find out what is happening during this process is running?


View attachment 71994
View attachment 71996
Do you have any external monitoring applications for TrueNAS? I saw one user who had some homeassistant integration that was polling some expensive middleware API endpoints (as well as some non-existent ones on Cobia).
 

Phas2

Cadet
Joined
Nov 11, 2022
Messages
1
I would like to report this same behaviour as undesired. It also started happening after the upgrade from Bluefin to Cobia.
truensa scale process spike.png

after some time looking at htop I managed to capture the process name but also noticed sometimes it would appear up to 3 times.
Perhaps it is so impactful to the system because it runs as a "user" task and gains some kind of priority causing the cpu to raise core speeds.
I'm running on a AMD Ryzen 7 PRO 4750G with Radeon Graphics.
truenas user process.png

On this truenas server I also use a VM for running a game server and every time this check happens there is a noticeable interruption to tick times leading to a bad experience playing the game. Before Cobia it was much more stable. To show this kind of interruption I ran an openspeedtest from the VM connecting outside my network. Above is the virtual adapter for the VM and below is a 10gb network card.
Posnetek zaslona 2024-01-29 135427.png

Does this not happen on fresh installs of Cobia? I've been looking into changing my boot usbs for an ssd and going fresh.
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
I would like to report this same behaviour as undesired. It also started happening after the upgrade from Bluefin to Cobia. View attachment 75229
after some time looking at htop I managed to capture the process name but also noticed sometimes it would appear up to 3 times.
Perhaps it is so impactful to the system because it runs as a "user" task and gains some kind of priority causing the cpu to raise core speeds.
I'm running on a AMD Ryzen 7 PRO 4750G with Radeon Graphics.
View attachment 75230
On this truenas server I also use a VM for running a game server and every time this check happens there is a noticeable interruption to tick times leading to a bad experience playing the game. Before Cobia it was much more stable. To show this kind of interruption I ran an openspeedtest from the VM connecting outside my network. Above is the virtual adapter for the VM and below is a 10gb network card. View attachment 75231
Does this not happen on fresh installs of Cobia? I've been looking into changing my boot usbs for an ssd and going fresh.
Can you try opening a ticket for this? Hopefully this may give more insight into what is going on as the issue has persisted for the past couple of months for me as well.

It also happens on fresh installs. In my last attempt, I made a full backup of all of my datasets, did a clean install, created the pools from scratch without importing anything and even after loading the data, the spikes started happening.
 
Top