Seek Time
Zero OS uses a seek time metric to determine if disks are HDD or SSD. If your SSD is being detected as HDD, it’s possible that there’s an underlying hardware issue causing poor performance. In this post you’ll find info on a few different diagnostics that can help identify SSDs with such issues.
Before you proceed, it’s a good idea to try some basic troubleshooting:
- Reboot the node
- Reseat the cable terminations at the drives and at the mainboard
- If you have storage controller cards, reseat those as well
If you’ve already tried those steps, the rest of this post can help you see what Zos is seeing and identify which disk is at fault. Once you’ve identified the disk, try moving it to a different port on the mainboard.
Live Linux
To run these tests, first boot into a live Linux distro like grml (the small image is fine for our purposes here). You can flash the ISO file onto a USB stick with a tool like USBImager, attach the USB stick to your 3Node, and select it as a temporary boot device.
All commands in this tutorial should be run as root. Grml will present you with a root shell after boot. On other distros you may need to open a terminal and switch to the root user:
sudo su root
Then get a list of all disks on the system:
fdisk -l
Identify the drives attached to the system, including the USB stick you booted from, according to their models and sizes. Make note of the drive paths, normally /dev/sd... or /dev/nvme... for NVMe.
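If the fdisk output is long, a small filter can pull out just the drive paths. This is a convenience sketch of my own (the list_disks helper is not part of Zos or grml), and it assumes fdisk -l's usual "Disk /dev/...:" line format:

```shell
# Extract device paths from `fdisk -l` output.
# Assumes the usual "Disk /dev/sda: 931.5 GiB, ..." line format.
list_disks() {
  grep '^Disk /dev/' | cut -d ' ' -f 2 | tr -d ':'
}

# Real use:  fdisk -l | list_disks
# Demo with canned fdisk -l lines:
printf 'Disk /dev/sda: 931.5 GiB, 1000204886016 bytes\nDisk /dev/nvme0n1: 465.8 GiB, 500107862016 bytes\n' | list_disks
# prints:
# /dev/sda
# /dev/nvme0n1
```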
Check Seek Times
We can use the same tool that Zero OS uses to check seek times. For convenience, I’ve provided a precompiled copy of the program at a shortened URL. Use these commands to download the file and make it executable:
wget tinyurl.com/seektime
chmod u+x seektime
To run seektime on a single disk, specify its path (note that for NVMe, a namespace designation, like n1, is required):
# SATA
./seektime /dev/sda
./seektime /dev/sdb
...
# NVMe
./seektime /dev/nvme0n1
./seektime /dev/nvme1n1
...
To check all disks of a given type, you can use a wildcard in a short loop:
# SATA
for disk in /dev/sd?; do ./seektime "$disk"; done
# NVMe
for disk in /dev/nvme?n1; do ./seektime "$disk"; done
The output will include the measured average seek time and a determination about whether the disk is SSD or HDD. Here’s an example output, with seek values in microseconds:
/dev/sda: SSD (103 us)
/dev/sdb: HDD (3336 us)
Zos assumes that anything with a seek time greater than 500 microseconds (0.5 milliseconds) is an HDD. If your SSD is only performing at HDD levels, that’s a sign the disk is failing and should be replaced.
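For illustration, the cutoff can be expressed as a tiny shell function. The classify helper below is hypothetical (the real decision happens inside Zos), but it shows the assumed 500 microsecond rule applied to the example values above:

```shell
# Mimic the assumed Zos rule: seek times above 500 us count as HDD.
classify() {
  # $1: average seek time in microseconds
  if [ "$1" -gt 500 ]; then echo HDD; else echo SSD; fi
}

classify 103   # prints SSD
classify 3336  # prints HDD
```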
Additional Diagnostics
Smartctl for SATA
For SATA SSDs, we can access built-in diagnostics using smartctl. This is included with grml, but you might need to install it on other distros.
Initiate short test
First, we’ll initiate a short self-test on the disk:
smartctl -t short /dev/sda
Or use a script like above to test all disks:
for disk in /dev/sd?; do smartctl -t short "$disk"; done
Check the results
This will take a minute or two. You can check whether the test is complete with:
smartctl -c /dev/sda
Look for “Self test execution status”. If it says in progress, you’ll need to wait a bit longer. Otherwise, you can now query the results. A simple pass or fail result can be retrieved with:
smartctl -H /dev/sda
For the full results:
smartctl -a /dev/sda | less
This will pipe the results into less so you can scroll through them. Hit q to exit when you’re done.
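Rather than re-running the status check by hand, you can script the wait. The selftest_done helper below is my own sketch, and the “in progress” wording it greps for is assumed from typical smartctl output — verify it against your own drive’s output:

```shell
# Succeed (exit 0) when `smartctl -c` output no longer reports a running test.
# Assumes smartctl's usual "Self-test routine in progress" phrasing.
selftest_done() {
  # Reads smartctl -c output on stdin.
  ! grep -qi 'in progress'
}

# Real use (hypothetical device path):
#   until smartctl -c /dev/sda | selftest_done; do sleep 30; done

# Demo with canned status lines:
echo 'Self-test execution status:      (   0) The previous self-test routine completed' | selftest_done && echo 'done'
echo 'Self-test execution status:      ( 249) Self-test routine in progress...' | selftest_done || echo 'still running'
```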
Interpreting the results
One common indicator of a failing disk is attribute id #5, the reallocated sector count. Drives are built to tolerate a certain number of reallocated sectors, but if you see a number higher than zero in the raw value here, that can be a sign of trouble.
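To pull just that raw value out of the full attribute table, an awk one-liner works. The realloc_count helper is my own, and the column layout it assumes (attribute ID in the first column, raw value in the last) is the typical smartctl -A table format:

```shell
# Print the raw value of attribute 5 (Reallocated_Sector_Ct).
# Assumes smartctl's usual table: ID in column 1, raw value in the last column.
realloc_count() {
  awk '$1 == 5 { print $NF }'
}

# Real use:  smartctl -A /dev/sda | realloc_count
# Demo with a canned attribute row (a healthy drive reporting 0):
echo '  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0' | realloc_count
# prints 0
```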
While the SMART data can be useful for identifying failed or failing disks, not every disk with a problem can be spotted this way. Zero OS simply looks at the performance of the disk and treats it accordingly.
You can consult other resources online (1, 2) for more information about what the output of smartctl means.
Long test
You can also run a full test, which will scan the entire drive and can take hours to complete. Just specify long instead:
smartctl -t long /dev/sda
Smartctl for NVMe
NVMe drives have some support under smartctl. Not all NVMe drives support triggering a self-test the way SATA drives do. However, they still collect health info that can be queried, for example:
smartctl -a /dev/nvme0
Conclusion
If Zero OS is detecting your SSD as HDD, the best course of action is probably to replace the disk. Running the tests in this post can help identify disks with slow seek times and other issues reported by onboard diagnostics. A firmware update for the disk could be worth trying, as well as moving it to a different mainboard port.
I’m not aware of any other troubleshooting for issues like this, but if you are, let me know below and I’ll incorporate it into this post. Please also share your experience in the replies if you’ve had issues with disks, tried any diagnostics, or have additional tips!