Start Zero-OS with parameter to prevent loading of a module

Hi, we are talking about a Fujitsu Primergy RX300 S8 server. When loading Z-OS, I noticed that there is permanently almost 50% system load on the CPUs, although nothing should be running. So I booted a live Linux, and there is 50% load on the cores there as well. top shows that the module acpi_pad is responsible for this. After I unloaded this module, the system load was immediately gone and everything was as it should be.
Is there any way to start Z-OS without loading this module? Is it possible to do this via startup parameters? Is there another solution for this problem?
Thanks a lot

From the documentation I'm finding, this condition is generally caused by the kernel receiving an inaccurate core count, which leads acpi_pad to spawn a huge number of instances.
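If you want to put a number on this from a live Linux, here is a minimal Python sketch that counts the acpi_pad kernel threads and prints the result next to the logical CPU count the OS sees. It assumes the threads show up under /proc with a command name starting with "acpi_pad"; the function name and the `proc_root` parameter are my own, purely for illustration.

```python
import os
from pathlib import Path


def count_acpi_pad_threads(proc_root="/proc"):
    """Count tasks whose command name starts with 'acpi_pad'.

    Assumption: the acpi_pad kernel threads are visible as numeric
    entries under proc_root, each with a 'comm' file holding the name.
    """
    count = 0
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            comm = (entry / "comm").read_text().strip()
        except OSError:
            continue  # task exited or is not readable
        if comm.startswith("acpi_pad"):
            count += 1
    return count


if __name__ == "__main__":
    threads = count_acpi_pad_threads()
    cpus = os.cpu_count()
    print(f"acpi_pad threads: {threads}, logical CPUs seen by the OS: {cpus}")
```

Comparing the two numbers before and after a BIOS change should at least show whether the change had any effect.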

There appear to be a few noted causes, but most of the documentation describes this issue as infrequent and not reliably reproducible.

I would look at these items and test if changing any of them fixes this.
-firmware updates, all of them. I couldn't find one for your server, but Lenovo, Dell and IBM had firmware updates listing this in their patch notes.
-ensure all cores are on with hyperthreading enabled, and verify the cores are reported properly in the BIOS.
-remove any BIOS power saving setting that would cause cores to sleep.
-inspect, verify and reseat your processor, RAM and any PCIe cards. Remember these are all connected directly.
-if this still hasn't fixed the issue, reset the BIOS entirely, change your CMOS battery, the basics.

This isn't an issue I've seen elsewhere on the grid, and its presence in another OS definitely means it won't be fixable within Zos, but I think you'll find a solution in one of these.

My name on Telegram is drew smith, feel free to PM me.

I was able to dig up these two bits of documentation on your machine that may help

http://support.ts.fujitsu.com/server/information/linux/RX300S8_TX300S8_Linux.htm

And I had to put this on my drive

FUJITSU Server PRIMERGY
BIOS optimizations for Xeon E5-2600 v2 based systems


Hi ParkerS and thanks for all the food for thought. I also googled the problem in the meantime and came across the same cause, the wrong reporting of the core count.
So I tried a few things in this direction, e.g. fewer cores enabled, one CPU disabled, all power management options disabled, or the OS overwrite for the power management enabled. Unfortunately no success.
The server already had the latest BIOS/firmware and also the latest iRMC flashed.
I had already set the Powersafe BIOS options as suggested in the PDF from Fujitsu. But also with the two alternative settings the problem with the many instances of acpi_pad shows up.
I have the possibility to swap the CPUs with identical ones and will try it as soon as I have some time. I will report back here.

Hi @gos,

Sorry for the late reply here. You can add kernel parameters to your bootstrap image by appending them to the URL. I think disabling ACPI might do the trick, like this (using farm id 42 as an example, replace it with your own):

https://bootstrap.grid.tf/uefi/prod/42/acpi=off
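If you end up trying several parameters, a small Python sketch like the one below can generate the URLs for you. It only covers the single-parameter case shown above; the function name is made up for illustration, and how several parameters would be combined in one URL is not covered here.

```python
from urllib.parse import quote

# Base taken from the example URL above (UEFI, production network).
BOOTSTRAP_BASE = "https://bootstrap.grid.tf/uefi/prod"


def bootstrap_url(farm_id, kernel_param):
    """Build a bootstrap URL with one kernel parameter appended.

    Hypothetical helper: it mirrors the URL pattern from the example
    and percent-encodes anything outside letters, digits, '_.-~' and '='.
    """
    return f"{BOOTSTRAP_BASE}/{int(farm_id)}/{quote(kernel_param, safe='=')}"


if __name__ == "__main__":
    print(bootstrap_url(42, "acpi=off"))
```

Swap in your own farm id and whichever parameter you want to test.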


Cool cool cool, Thank you. I’ll try it and report here soon


Hi, I would like to add some info to the thread here since the problem is solved but I don’t know exactly why.
I tried the boot option acpi=off. Unfortunately this also disables HyperThreading as HT requires ACPI. At the time I didn’t investigate further and just let it continue with 50% sustained processor load.
Yesterday I went back to the topic and tested a few boot options without success until I tested “processor.nocst=1”. The system ran so slowly that even the SSD was only recognized as an HDD, etc. HT was on, but the 50% processor load was still there.
After another reboot without any boot option: The server now started normally with ACPI and HT, but without the 50% processor load problem. I have no idea why, but now it runs as it should.
If someone has an explanation for this, I would like to know it.


Thanks for reporting back. It’s possible that Zos got a kernel upgrade that ended up solving the problem. Or… something else, who knows? :)
