'middlewared' process spikes CPU usage in one-minute intervals

Joined
Apr 8, 2021
Messages
7
I noticed the CPU fan was repeatedly spinning up. It looks like the 'middlewared' process (no idea what the '(wo' stands for) is using a lot of CPU, every minute, for a few seconds.
Is this normal behavior or a known problem? I didn't find any reports matching this behavior.

(Attachments: top.PNG and cpu.PNG — screenshots of the top output and the CPU usage graph.)

I'm on TrueNAS-SCALE-22.12.3.3
 

brokenmass

Cadet
Joined
Aug 14, 2023
Messages
1
I have the same issue, but with slightly lower CPU usage (around 8-10%); the spikes happen exactly every minute.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
The '(wo' is just cut off; the full name is 'middlewared (worker)'. Middlewared is the TrueNAS management process. What CPU do you have?
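
If you want the full command lines and per-process CPU instead of top's truncated column, something like this works (a rough sketch; it assumes the psutil Python package is importable on the box, which I believe it is since middlewared itself uses it):

Code:
# Rough sketch: list the middlewared processes with their full command
# line and a short CPU sample (assumes psutil is importable).
import psutil

for p in psutil.process_iter(['pid', 'name', 'cmdline']):
    if 'middlewared' in (p.info['name'] or ''):
        cpu = p.cpu_percent(interval=0.5)  # sample CPU over 0.5 s
        print(p.info['pid'], ' '.join(p.info['cmdline'] or []), f'{cpu:.1f}%')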
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
I don't know what would run that often. Can you click the bell icon at the top right and then click History? Is anything there running every minute?
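
If clicking around is inconvenient, the same job history can be pulled from the middleware API; a rough sketch using the Python client that ships with SCALE (midclt call core.get_jobs should return the same data):

Code:
# Rough sketch: list recent middleware jobs with their start times, to see
# whether anything runs every minute. 'core.get_jobs' backs the Jobs screen.
from middlewared.client import Client

with Client() as c:
    jobs = c.call('core.get_jobs', [], {'order_by': ['-id'], 'limit': 50})
    for job in jobs:
        print(job['time_started'], job['method'], job['state'])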
 
Joined
Apr 8, 2021
Messages
7
Can you click the bell icon at the top right and then click History? Is anything there running every minute?
I think you meant the "Jobs" button? I didn't find a history option under the bell icon for alerts.
And no, nothing there running every minute. Only "ipmi.sel.info" and "ipmi.sel.elist" every five minutes, which sounds like the problem discussed in this thread:
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Yeah, I meant Jobs, not the bell. Correct, the IPMI spam is not the issue. I'm not sure what would make it do that; I was thinking maybe a data protection task running every minute, but I'm not sure that would show up entirely as middlewared.

You could try looking in middlewared.log and syslog to see if anything shows up there, maybe something hardware-related. Your CPU is not an old, obsolete one, so it's not that.
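
If eyeballing the files gets tedious, a quick way to spot anything periodic is to bucket the middlewared.log entries by the seconds field of their timestamps; a rough sketch, assuming the lines start with a timestamp like [2023/09/10 16:59:24]:

Code:
# Rough sketch: count middlewared.log lines per second-of-minute to spot
# anything that fires at the same second every minute.
# Assumes lines start with a timestamp like [2023/09/10 16:59:24].
import collections
import re

counts = collections.Counter()
stamp = re.compile(r'^\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:(\d{2})\]')

with open('/var/log/middlewared.log') as f:
    for line in f:
        m = stamp.match(line)
        if m:
            counts[m.group(1)] += 1

for second, n in counts.most_common(10):
    print(f'second :{second} -> {n} log lines')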

I wouldn't say it's normal, for sure. Mine sits at 1%; there is no spike every minute.
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Hello Everyone,

After upgrading to Cobia (BETA), I also started seeing the exact same phenomenon: CPU activity (as well as temperature) spikes every single minute, reaching roughly 10% (which is quite high, as transferring large files only uses about 4% CPU on my system).

After digging through the middleware log (/var/log/middlewared.log), these were the only "interesting" parts I found:

I use a Ryzen 5750G PRO as my CPU.

1) pyroute2 fails with the following traceback (a small sketch to exercise this call by hand follows after the three items):

Code:
[2023/09/10 16:59:24] (INFO) InterfaceService.sync():1723 - Failed to sync routes
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/network.py", line 1721, in sync
    await self.middleware.call('route.sync')
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1418, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1361, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/network_/route.py", line 130, in sync
    routing_table.add(ipv6_gateway)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/interface/netif_linux/routing.py", line 202, in add
    self._op("add", route)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/interface/netif_linux/routing.py", line 235, in _op
    ip.route(op, **kwargs)
  File "/usr/lib/python3/dist-packages/pyroute2/iproute/linux.py", line 2310, in route
    ret = self.nlm_request(
          ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 870, in nlm_request
    return tuple(self._genlm_request(*argv, **kwarg))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 1214, in nlm_request
    for msg in self.get(
               ^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 873, in get
    return tuple(self._genlm_get(*argv, **kwarg))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 550, in get
    raise msg['header']['error']
pyroute2.netlink.exceptions.NetlinkError: (22, 'Invalid argument')


2) I get a weird warning saying that a mountpoint for /var/db/system was not found:
Code:
[2023/09/10 16:59:37] (WARNING) SystemDatasetService.sysdataset_path():114 - /var/db/system: mountpoint not found


3) I get an error regarding a service called idmap:
Code:
[2023/09/10 16:59:38] (ERROR) ServiceService._restart():282 - Service 'idmap' not running after restart
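
For #1, the failing call boils down to pyroute2 adding the IPv6 default gateway. As a sanity check, roughly the same call can be run by hand to see whether the kernel rejects it the same way (a rough sketch; the gateway address below is just a placeholder for whatever is configured under Network > Global Configuration, and if the call succeeds it actually adds a route, so remove it again afterwards):

Code:
# Rough sketch of the call that fails inside route.sync: add an IPv6 default
# route via pyroute2 and catch the NetlinkError. The gateway is a placeholder;
# use the IPv6 gateway from Network > Global Configuration. If the call
# succeeds it really adds a route, so delete it again afterwards.
import socket

from pyroute2 import IPRoute
from pyroute2.netlink.exceptions import NetlinkError

with IPRoute() as ipr:
    try:
        ipr.route('add', dst='default', gateway='fe80::1', family=socket.AF_INET6)
    except NetlinkError as err:
        print('kernel rejected the route:', err)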
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
I get #2 and #3, but not #1, on non-Cobia. Since I don't have the per-minute spikes you guys do, I don't think #2 and #3 have anything to do with it. If you don't have regularly logged errors, say every minute, I don't know what to say. I'd file a bug report if no one else chimes in on why it spikes every minute.
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Thank you for the feedback. In the meantime, I filed a ticket, which can be seen at NAS-124038.

The only weird part (to me at least) was that this phenomenon only started in my case after upgrading to Cobia, while I can see that OP is experiencing the same issue even on TrueNAS-SCALE-22.12.3.3.
 
Joined
Sep 11, 2023
Messages
6
Hi,
I noticed the same phenomenon (middlewared (worker) taking lots of CPU every minute) after upgrading (from Angelfish) to 22.12.3.3.
I couldn't find anything in the logs, but I found that the spikes were gone when all pools were exported. This led me to the ZFS reporting tab, and there I found that the CPU spikes correlate perfectly with "ARC Requests demand_metadata". I also found another thread about these requests: https://www.truenas.com/community/t...quests-demand_metadata-on-idle-server.111373/
Also, when I exported a pool with a lot of nested datasets, the spikes went down; middlewared (worker) now only takes about half a core instead of a whole core every minute.
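
For reference, this is roughly how the demand_metadata counters can be watched outside the reporting tab (a rough sketch reading the OpenZFS kstats from /proc/spl/kstat/zfs/arcstats; stop it with Ctrl+C, and note the field names may differ slightly between ZFS versions):

Code:
# Rough sketch: print the per-interval change of the ARC demand_metadata
# counters from the OpenZFS kstats, to line them up with the CPU spikes.
import time

def demand_metadata_total():
    total = 0
    with open('/proc/spl/kstat/zfs/arcstats') as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3 and parts[0].startswith('demand_metadata'):
                total += int(parts[2])
    return total

prev = demand_metadata_total()
while True:
    time.sleep(5)
    cur = demand_metadata_total()
    print(time.strftime('%H:%M:%S'), 'demand_metadata requests in last 5 s:', cur - prev)
    prev = cur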
Does this help?
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
@StubbornGarrett You're welcome. Hopefully, this issue will be resolved soon.

@dasdreHmomenT Thank you for your suggestion. That may work; I will probably try it after doing a thorough backup. The weirdest thing in my case, though, was that this spike started happening even before I imported/created any pools, apart from the system pool that was already assigned during installation (I did a clean install of Cobia).
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
@StubbornGarrett You're welcome. Hopefully, this issue will be resolved soon.

@dasdreHmomenT Thank you for your suggestion. That may work; I will probably try it after doing a thorough backup. The weirdest thing in my case, though, was that this spike started happening even before I imported/created any pools, apart from the system pool that was already assigned during installation (I did a clean install of Cobia).
60-second intervals sound like it's possibly an alert that's running too frequently (there's a periodic middleware task that checks whether any alerts need to be run). You could probably play around with those settings to see if you can narrow down whether one is unreasonably expensive or running at unexpected times.
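
For example, something along these lines lists what is currently firing and which check policies exist, which may help you see what the periodic check has to work through (a rough sketch using the middleware client; method names and fields may differ slightly between releases):

Code:
# Rough sketch: show the alerts currently firing and the available alert
# policies via the middleware client. Method names and fields may differ
# slightly between releases.
from middlewared.client import Client

with Client() as c:
    for alert in c.call('alert.list'):
        print('active alert:', alert.get('klass'), alert.get('datetime'))
    print('available policies:', c.call('alert.list_policies'))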
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
60-second intervals sound like it's possibly an alert that's running too frequently (there's a periodic middleware task that checks whether any alerts need to be run). You could probably play around with those settings to see if you can narrow down whether one is unreasonably expensive or running at unexpected times.
Thank you for the advice!

Yes, it does seem too artificial to be coincidence, even with the occasional "extra spikes" (roughly every 5 minutes), when the temperature reaches 50+ degrees Celsius and CPU usage sometimes hits 30%, according to the logged values. I will try to play around with the settings the next time I turn the system on (for now, I have turned it off due to the really high temperatures, increased power consumption, and high CPU usage despite no connections to the server and no services running).
 
Joined
Sep 11, 2023
Messages
6
60-second intervals sound like it's possibly an alert that's running too frequently (there's a periodic middleware task that checks whether any alerts need to be run). You could probably play around with those settings to see if you can narrow down whether one is unreasonably expensive or running at unexpected times.
Thank you for the idea!
Hmm, at least in my case the spikes were there even with the settings reset to defaults, as soon as I imported a pool.
All alerts are set to "Immediately". I changed it for some and will keep experimenting, but is there any way to see in the logs which one might be causing the issue?
 
Joined
Sep 11, 2023
Messages
6
So, changes to the alert settings did nothing about the spikes for me.
I saw that ticket NAS-124038 was closed because the issue could not be reproduced. It is, however, still present for me, and I suppose for the others in this thread (Bluefin 22.12.3.3 and Cobia). Is there anything we can do to help investigate this?
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Hello Everyone,

I managed to upgrade my TrueNAS SCALE to Cobia RC.1 to check whether the issue still persists.

To my surprise, it seems the issue has been resolved. I would like to emphasize, though, that I have not installed any apps; heck, I have not even imported/created any pools. However, last time (as far as I can remember) the issue started appearing immediately after installation, so it looks promising so far. I will give updates after I can get to the NAS again and install apps etc.

@dasdreHmomenT
I will post updates from my end. If the issue does not resurface in my case, even after installing apps etc., I would say it is worthwhile to communicate with the devs (even though I am only one user, so a statistically small sample), as in that case a code change between BETA-1 and RC.1 likely resolved it. If so, they could check whether any of those changes apply in your cases as well.
 
Joined
Sep 11, 2023
Messages
6
Hello Everyone,

I managed to upgrade my TrueNAS SCALE to Cobia RC.1 to check whether the issue still persists.

To my surprise, it seems the issue has been resolved. I would like to emphasize, though, that I have not installed any apps; heck, I have not even imported/created any pools. However, last time (as far as I can remember) the issue started appearing immediately after installation, so it looks promising so far. I will give updates after I can get to the NAS again and install apps etc.

@dasdreHmomenT
I will post updates from my end. If the issue does not resurface in my case, even after installing apps etc., I would say it is worthwhile to communicate with the devs (even though I am only one user, so a statistically small sample), as in that case a code change between BETA-1 and RC.1 likely resolved it. If so, they could check whether any of those changes apply in your cases as well.
Thank you for the update!
Please let us know whether the issue reappears after importing pools! As stated above, I did not have it on a fresh install, but it reappeared immediately after importing a pool, and it subsided significantly after removing a dataset with many children/snapshots.
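
Since the number of datasets/snapshots seemed to matter in my case, a quick way to see which pool carries the most metadata is to count them per pool (a rough sketch that shells out to zfs list):

Code:
# Rough sketch: count datasets, volumes and snapshots per pool, to see which
# pool has the most metadata (which seemed to correlate with the spikes here).
import collections
import re
import subprocess

counts = collections.Counter()
out = subprocess.run(['zfs', 'list', '-H', '-t', 'all', '-o', 'name'],
                     capture_output=True, text=True, check=True)
for name in out.stdout.splitlines():
    pool = re.split(r'[/@#]', name, maxsplit=1)[0]  # pool is the first path component
    counts[pool] += 1

for pool, n in counts.most_common():
    print(pool, n, 'datasets + snapshots')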
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Oh, that is not good news (the fact that it did not happen on a fresh install).

I am hoping it will not happen again: last time I imported the pools, but this time I am going to re-create them from scratch (a few things went wrong with ACLs previously). I will certainly post an update. (I will hopefully be able to give it a go tomorrow afternoon, as a lot of work has sadly piled up for the weekend.)
 