'middlewared' process spikes CPU usage in one-minute intervals

Joined
Apr 8, 2021
Messages
7
I noticed the CPU fan was repeatedly spinning up. It looks like the 'middlewared' process (no idea what the '(wo' stands for) is using a lot of CPU, every minute, for a few seconds.
Is this normal behavior or a known problem? I didn't find any reports matching this behavior.

(Attachments: top.PNG and cpu.PNG — screenshots of the top output and the CPU usage graph.)

I'm on TrueNAS-SCALE-22.12.3.3
 

brokenmass

Cadet
Joined
Aug 14, 2023
Messages
1
I have the same issue, but with slightly lower CPU usage (around 8-10%); the spikes happen exactly every minute.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
The '(wo' is just cut off; the full name is 'middlewared (worker)'. Middlewared is the TrueNAS management process. What CPU do you have?
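
If you want the full command lines and per-process CPU instead of top's truncated column, something like this works (a rough sketch; it assumes the psutil Python package is importable on the box, which I believe it is since middlewared itself uses it):

Code:
# Rough sketch: list the middlewared processes with their full command
# line and a short CPU sample (assumes psutil is importable).
import psutil

for p in psutil.process_iter(['pid', 'name', 'cmdline']):
    if 'middlewared' in (p.info['name'] or ''):
        cpu = p.cpu_percent(interval=0.5)  # sample CPU over 0.5 s
        print(p.info['pid'], ' '.join(p.info['cmdline'] or []), f'{cpu:.1f}%')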
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
I don't know what would run that often. Can you click the bell icon at the top right and then click History? Is anything there running every minute?
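
If clicking around is inconvenient, the same job history can be pulled from the middleware API; a rough sketch using the Python client that ships with SCALE (midclt call core.get_jobs should return the same data):

Code:
# Rough sketch: list recent middleware jobs with their start times, to see
# whether anything runs every minute. 'core.get_jobs' backs the Jobs screen.
from middlewared.client import Client

with Client() as c:
    jobs = c.call('core.get_jobs', [], {'order_by': ['-id'], 'limit': 50})
    for job in jobs:
        print(job['time_started'], job['method'], job['state'])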
 
Joined
Apr 8, 2021
Messages
7
Can you click the bell icon at the top right and then click History? Is anything there running every minute?
I think you meant the "Jobs" button? I didn't find a history option under the bell icon for alerts.
And no, nothing there running every minute. Only "ipmi.sel.info" and "ipmi.sel.elist" every five minutes, which sounds like the problem discussed in this thread:
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Yeah, I meant Jobs, not the bell. Correct, the IPMI spam is not the issue. I'm not sure what would make it do that; I was thinking maybe a data protection task running every minute, but I'm not sure that would show up entirely as middlewared.

You could try looking in middlewared.log and syslog to see if anything shows up there, maybe something hardware-related. Your CPU is not an old, obsolete one, so it's not that.
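
If eyeballing the files gets tedious, a quick way to spot anything periodic is to bucket the middlewared.log entries by the seconds field of their timestamps; a rough sketch, assuming the lines start with a timestamp like [2023/09/10 16:59:24]:

Code:
# Rough sketch: count middlewared.log lines per second-of-minute to spot
# anything that fires at the same second every minute.
# Assumes lines start with a timestamp like [2023/09/10 16:59:24].
import collections
import re

counts = collections.Counter()
stamp = re.compile(r'^\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:(\d{2})\]')

with open('/var/log/middlewared.log') as f:
    for line in f:
        m = stamp.match(line)
        if m:
            counts[m.group(1)] += 1

for second, n in counts.most_common(10):
    print(f'second :{second} -> {n} log lines')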

I wouldn't say it's normal, for sure. Mine sits at 1%; there is no spike every minute.
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Hello Everyone,

After upgrading to Cobia (BETA), I also started seeing the exact same phenomenon: CPU activity (as well as temperature) spikes every single minute, reaching roughly 10% (which is quite high, as transferring large files only uses about 4% CPU on my system).

After digging through the middleware log (/var/log/middlewared.log), these were the only "interesting" parts I found:

I use a Ryzen 5750G PRO as my CPU.

1) pyroute2 fails with the following traceback (a small sketch to exercise this call by hand follows after the three items):

Code:
[2023/09/10 16:59:24] (INFO) InterfaceService.sync():1723 - Failed to sync routes
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/network.py", line 1721, in sync
    await self.middleware.call('route.sync')
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1418, in call
    return await self._call(
           ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1361, in _call
    return await methodobj(*prepared_call.args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/middlewared/plugins/network_/route.py", line 130, in sync
    routing_table.add(ipv6_gateway)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/interface/netif_linux/routing.py", line 202, in add
    self._op("add", route)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/interface/netif_linux/routing.py", line 235, in _op
    ip.route(op, **kwargs)
  File "/usr/lib/python3/dist-packages/pyroute2/iproute/linux.py", line 2310, in route
    ret = self.nlm_request(
          ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 870, in nlm_request
    return tuple(self._genlm_request(*argv, **kwarg))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 1214, in nlm_request
    for msg in self.get(
               ^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 873, in get
    return tuple(self._genlm_get(*argv, **kwarg))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/pyroute2/netlink/nlsocket.py", line 550, in get
    raise msg['header']['error']
pyroute2.netlink.exceptions.NetlinkError: (22, 'Invalid argument')


2) I get a weird warning saying that a mountpoint for /var/db/system was not found:
Code:
[2023/09/10 16:59:37] (WARNING) SystemDatasetService.sysdataset_path():114 - /var/db/system: mountpoint not found


3) I get an error regarding a service called idmap:
Code:
[2023/09/10 16:59:38] (ERROR) ServiceService._restart():282 - Service 'idmap' not running after restart
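
For #1, the failing call boils down to pyroute2 adding the IPv6 default gateway. As a sanity check, roughly the same call can be run by hand to see whether the kernel rejects it the same way (a rough sketch; the gateway address below is just a placeholder for whatever is configured under Network > Global Configuration, and if the call succeeds it actually adds a route, so remove it again afterwards):

Code:
# Rough sketch of the call that fails inside route.sync: add an IPv6 default
# route via pyroute2 and catch the NetlinkError. The gateway is a placeholder;
# use the IPv6 gateway from Network > Global Configuration. If the call
# succeeds it really adds a route, so delete it again afterwards.
import socket

from pyroute2 import IPRoute
from pyroute2.netlink.exceptions import NetlinkError

with IPRoute() as ipr:
    try:
        ipr.route('add', dst='default', gateway='fe80::1', family=socket.AF_INET6)
    except NetlinkError as err:
        print('kernel rejected the route:', err)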
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
I get #2 and #3, but not #1, on non-Cobia. Since I don't have the per-minute spikes you guys do, I don't think #2 and #3 have anything to do with it. If you don't have regularly logged errors, say every minute, I don't know what to say. I'd file a bug report if no one else chimes in on why it spikes every minute.
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Thank you for the feedback. In the meantime, I filed a ticket, which can be seen at NAS-124038.

The only weird part (to me at least) was that this phenomenon only started in my case after upgrading to Cobia, while I can see that OP is experiencing the same issue even on TrueNAS-SCALE-22.12.3.3.
 
Joined
Sep 11, 2023
Messages
6
Hi,
I noticed the same phenomenon (middlewared (worker) taking lots of CPU every minute) after upgrading (from Angelfish) to 22.12.3.3.
I couldn't find anything in the logs, but I found that the spikes were gone when all pools were exported. This led me to the ZFS reporting tab, and there I found that the CPU spikes correlate perfectly with "ARC Requests demand_metadata". I also found another thread about these requests: https://www.truenas.com/community/t...quests-demand_metadata-on-idle-server.111373/
Also, when I exported a pool with a lot of nested datasets, the spikes went down; middlewared (worker) now only takes about half a core instead of a whole core every minute.
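
For reference, this is roughly how the demand_metadata counters can be watched outside the reporting tab (a rough sketch reading the OpenZFS kstats from /proc/spl/kstat/zfs/arcstats; stop it with Ctrl+C, and note the field names may differ slightly between ZFS versions):

Code:
# Rough sketch: print the per-interval change of the ARC demand_metadata
# counters from the OpenZFS kstats, to line them up with the CPU spikes.
import time

def demand_metadata_total():
    total = 0
    with open('/proc/spl/kstat/zfs/arcstats') as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3 and parts[0].startswith('demand_metadata'):
                total += int(parts[2])
    return total

prev = demand_metadata_total()
while True:
    time.sleep(5)
    cur = demand_metadata_total()
    print(time.strftime('%H:%M:%S'), 'demand_metadata requests in last 5 s:', cur - prev)
    prev = cur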
Does this help?
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
@StubbornGarrett You're welcome. Hopefully, this issue will be resolved soon.

@dasdreHmomenT Thank you for your suggestion. That may work; I will probably try it after doing a thorough backup. The weirdest thing in my case, though, was that this spike started happening even before I imported/created any pools, apart from the system pool that was already assigned during installation (I did a clean install of Cobia).
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,553
@StubbornGarrett You're welcome. Hopefully, this issue will be resolved soon.

@dasdreHmomenT Thank you for your suggestion. That may work; I will probably try it after doing a thorough backup. The weirdest thing in my case, though, was that this spike started happening even before I imported/created any pools, apart from the system pool that was already assigned during installation (I did a clean install of Cobia).
60-second intervals sound like it's possibly an alert that's running too frequently (there's a periodic middleware task that checks whether any alerts need to be run). You could probably play around with those settings to see if you can narrow down whether one is unreasonably expensive or running at unexpected times.
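
For example, something along these lines lists what is currently firing and which check policies exist, which may help you see what the periodic check has to work through (a rough sketch using the middleware client; method names and fields may differ slightly between releases):

Code:
# Rough sketch: show the alerts currently firing and the available alert
# policies via the middleware client. Method names and fields may differ
# slightly between releases.
from middlewared.client import Client

with Client() as c:
    for alert in c.call('alert.list'):
        print('active alert:', alert.get('klass'), alert.get('datetime'))
    print('available policies:', c.call('alert.list_policies'))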
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
60-second intervals sound like it's possibly an alert that's running too frequently (there's a periodic middleware task that checks whether any alerts need to be run). You could probably play around with those settings to see if you can narrow down whether one is unreasonably expensive or running at unexpected times.
Thank you for the advice!

Yes, it does seem too artificial to be coincidence, even with the occasional "extra spikes" (roughly every 5 minutes), when the temperature reaches 50+ degrees Celsius and CPU usage sometimes hits 30%, according to the logged values. I will try to play around with the settings the next time I turn the system on (for now, I have turned it off due to the really high temperatures, increased power consumption, and high CPU usage despite no connections to the server and no services running).
 
Joined
Sep 11, 2023
Messages
6
60-second intervals sound like it's possibly an alert that's running too frequently (there's a periodic middleware task that checks whether any alerts need to be run). You could probably play around with those settings to see if you can narrow down whether one is unreasonably expensive or running at unexpected times.
Thank you for the idea!
Hmm, at least in my case the spikes were there even with the settings reset to defaults, as soon as I imported a pool.
All alerts are set to "Immediately". I changed it for some and will keep experimenting, but is there any way to see in the logs which one might be causing the issue?
 
Joined
Sep 11, 2023
Messages
6
So, changes to the alert settings did nothing about the spikes for me.
I saw that ticket NAS-124038 was closed because the issue could not be reproduced. It is, however, still present for me, and I suppose for the others in this thread (Bluefin 22.12.3.3 and Cobia). Is there anything we can do to help investigate this?
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Hello Everyone,

I managed to upgrade my TrueNAS SCALE to Cobia RC.1 to check whether the issue still persists.

To my surprise, it seems the issue has been resolved. I would like to emphasize, though, that I have not installed any apps; heck, I have not even imported/created any pools. However, last time (as far as I can remember) the issue started appearing immediately after installation, so it looks promising so far. I will give updates after I can get to the NAS again and install apps etc.

@dasdreHmomenT
I will post updates from my end. If the issue does not resurface in my case, even after installing apps etc., I would say it is worthwhile to communicate with the devs (even though I am only one user, so a statistically small sample), as in that case a code change between BETA-1 and RC.1 likely resolved it. If so, they could check whether any of those changes apply in your cases as well.
 
Joined
Sep 11, 2023
Messages
6
Hello Everyone,

I managed to upgrade my TrueNAS SCALE to Cobia RC.1 to check whether the issue still persists.

To my surprise, it seems the issue has been resolved. I would like to emphasize, though, that I have not installed any apps; heck, I have not even imported/created any pools. However, last time (as far as I can remember) the issue started appearing immediately after installation, so it looks promising so far. I will give updates after I can get to the NAS again and install apps etc.

@dasdreHmomenT
I will post updates from my end. If the issue does not resurface in my case, even after installing apps etc., I would say it is worthwhile to communicate with the devs (even though I am only one user, so a statistically small sample), as in that case a code change between BETA-1 and RC.1 likely resolved it. If so, they could check whether any of those changes apply in your cases as well.
Thank you for the update!
Please let us know whether the issue reappears after importing pools! As stated above, I did not have it on a fresh install, but it reappeared immediately after importing a pool, and it subsided significantly after removing a dataset with many children/snapshots.
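
Since the number of datasets/snapshots seemed to matter in my case, a quick way to see which pool carries the most metadata is to count them per pool (a rough sketch that shells out to zfs list):

Code:
# Rough sketch: count datasets, volumes and snapshots per pool, to see which
# pool has the most metadata (which seemed to correlate with the spikes here).
import collections
import re
import subprocess

counts = collections.Counter()
out = subprocess.run(['zfs', 'list', '-H', '-t', 'all', '-o', 'name'],
                     capture_output=True, text=True, check=True)
for name in out.stdout.splitlines():
    pool = re.split(r'[/@#]', name, maxsplit=1)[0]  # pool is the first path component
    counts[pool] += 1

for pool, n in counts.most_common():
    print(pool, n, 'datasets + snapshots')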
 

fornex

Dabbler
Joined
Jul 13, 2023
Messages
11
Oh, that is not good news (the fact that it did not happen on a fresh install).

I am hoping it will not happen again: last time I imported the pools, but this time I am going to re-create them from scratch (a few things went wrong with ACLs previously). I will certainly post an update. (I will hopefully be able to give it a go tomorrow afternoon, as a lot of work has sadly piled up for the weekend.)
 