Manipulating cloud images for the Grid

Since the introduction of full virtual machines on the Grid, it’s also possible to run cloud images. This opens the possibility to create custom cloud images. If it can run on Linux, you can add it to a cloud image and in turn deploy it on the Grid!

The following is not the only way to do it, just what I found to be practical and easiest. If you have alternatives or other ways to do things, please share!

Requirements:

  • A hypervisor (KVM, VirtualBox, …): we will create a temporary VM to manipulate the image
  • virt-customize and qemu-img ( apt install libguestfs-tools qemu-utils )
  • Some basic Linux knowledge
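
Before starting, it can save some time to confirm that the command line tools from the list above are on your PATH. A minimal sketch; missing_tools is just a helper name made up for this example:

```shell
# Print the name of every given command that is not found on the PATH.
# missing_tools is a hypothetical helper name, not part of any package.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools virt-customize qemu-img)
if [ -n "$missing" ]; then
    echo "Not installed yet:" $missing
    echo "On Debian/Ubuntu: apt install libguestfs-tools qemu-utils"
fi
```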

Prep image

First download your preferred cloud image. In this example we use Ubuntu 20.04.

wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img

A standard Ubuntu cloud image is 2.1GB in size. If you require more space for the software you will add later on, use qemu-img to grow the disk image. Put the amount you want to add (not the total size) at the end of the command. Here we add 1GB to the image.
Remember that it’s important to keep your image as small as possible: for each deployment, the image has to be copied from the Hub to the ZOS node, so the bigger it is, the longer it takes.

qemu-img resize focal-server-cloudimg-amd64.img +1G

We must set a root password, otherwise we can’t log in to the VM console to set up our image.

virt-customize -a focal-server-cloudimg-amd64.img --root-password password:somepassword

Temporary VM to resize filesystem

Next up is to expand the file system on the cloud image. We expanded the disk previously; to use the new space we must also expand the partition and its file system. For this I used a temporary VM to which I attach an Ubuntu desktop ISO and the cloud image. Boot into Ubuntu desktop and start GParted. It will ask if GPT can use the free space, press ‘Fix’.
Select the cloud image (the ext4 partition), right click it and choose ‘Resize/Move’.

Move the used ‘partition space bar’ to the right to expand the file system.

Then press ‘Resize/Move’, click the green check mark to apply the changes, and confirm with ‘Apply’.

This expands the partition and its file system. You can now shut down the VM again.
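
To double check the result without booting anything, virt-filesystems (part of libguestfs-tools, installed earlier) can list the partitions and file systems inside the image along with their sizes. A small sketch, assuming the image file name from the download step; show_image_layout is only an illustrative wrapper:

```shell
# show_image_layout is a hypothetical wrapper around virt-filesystems.
show_image_layout() {
    image=$1
    if [ ! -f "$image" ]; then
        echo "image not found: $image" >&2
        return 1
    fi
    # --all lists partitions and file systems, --long -h adds human readable sizes
    virt-filesystems --long -h --all -a "$image"
}

# Usage, with the VM shut down:
# show_image_layout focal-server-cloudimg-amd64.img
```

The ext4 file system should now report roughly the original size plus the space you added.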

Temporary VM to add your custom software

The next step is to start another VM with the cloud image as its root disk, so that it boots the cloud image itself. Once the image is booted you can choose to install your software via the VM console, or enable SSH inside the VM to install remotely. I always use SSH since you need networking to download your software anyway. For this, make sure to connect the VM to a network (one with DHCP is easiest).

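Open the console of your VM and log in with username root and the password you set previously. Once logged in, enable networking and SSH. In the commands below, replace enp1s0 with the interface name that the first ip a shows on your VM.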

ip a
dhclient enp1s0
ip a
ssh-keygen -A
mkdir -p /run/sshd

Edit /etc/ssh/sshd_config so that the following settings are set. It’s a good idea to make a backup of the original sshd_config file first, since we have to change it back at the end (so that password authentication is not enabled in sshd once the image is deployed on the Grid):

PermitRootLogin yes
PasswordAuthentication yes
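
The same edit can be scripted. This is only a sketch: set_sshd_option and the .orig backup name are choices made for this example, not anything sshd requires. It puts the setting at the top of the file, because sshd uses the first value it finds for each option.

```shell
# Set one option in an sshd_config style file.
# $1 = config file, $2 = option name, $3 = value
# set_sshd_option is a hypothetical helper, not an existing tool.
set_sshd_option() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # New setting first, so it wins over anything later in the file
    printf '%s %s\n' "$key" "$value" > "$tmp"
    # Then the old content, minus existing (or commented out) lines for this option
    grep -v -E "^[#[:space:]]*${key}[[:space:]]" "$file" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage inside the temporary VM, as root:
# cp /etc/ssh/sshd_config /etc/ssh/sshd_config.orig
# set_sshd_option /etc/ssh/sshd_config PermitRootLogin yes
# set_sshd_option /etc/ssh/sshd_config PasswordAuthentication yes
# /sbin/sshd -t -f /etc/ssh/sshd_config   # -t only checks the config for errors
```

Restoring the defaults at the end is then a matter of copying the .orig file back.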

Then start the ssh daemon:

/sbin/sshd -f /etc/ssh/sshd_config

Now log in over SSH, using the IP of the VM with username root and the password you set previously. You can start installing and configuring your custom software on the VM now.

Cleanup

Once everything has been set up to your liking and before shutting down the VM, check the following via the VM console:

  • Restore the default sshd config (from the backup of the original file)
  • Remove ssh key files from /etc/ssh/ -> rm /etc/ssh/ssh_host_*
  • Remove the directory /run/sshd -> rm -r /run/sshd
  • Enable all required systemd services! This will start your service when the image boots -> systemctl enable your-service
  • Check and enable the firewall -> ufw status or nft list ruleset
  • Change root password -> as root run: passwd
  • Clear bash history for root and other users (if you created any): vim /root/.bash_history. For root, first log out of the VM console, log back in and put a space in front of your text editor command (so it’s not stored in the bash history when you log out).
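
A few of the points above are easy to forget, so a quick read-only check before shutting down can help. The sketch below only reports leftovers and changes nothing. check_cleanup is a made-up helper, and its optional argument is a directory prefix that is only there so the function can be tried on a scratch directory first.

```shell
# Report leftovers from the setup session; returns non-zero if any are found.
# check_cleanup is a hypothetical helper, not an existing tool.
check_cleanup() {
    root=${1:-}
    status=0
    for key in "$root"/etc/ssh/ssh_host_*; do
        # If nothing matches, the pattern stays literal and -e is false
        if [ -e "$key" ]; then
            echo "leftover host key: $key"
            status=1
        fi
    done
    if [ -d "$root/run/sshd" ]; then
        echo "leftover directory: $root/run/sshd"
        status=1
    fi
    if [ -s "$root/root/.bash_history" ]; then
        echo "non-empty history: $root/root/.bash_history"
        status=1
    fi
    return "$status"
}

# Usage on the VM console, as root:
# check_cleanup && echo "looks clean"
```

It does not cover the services, firewall or password points, which still need a manual look.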

Upload image to the Hub

Once done, shut down the temporary VM again. OK good, so the last thing is to package the image.

qemu-img convert -p -f qcow2 -O raw [your_image].img image.raw
tar -czf [name_of_release].tar.gz image.raw
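
Before uploading, it does not hurt to confirm what ended up in the archive. A small sketch, assuming (as in the commands above) that the archive should contain a single file named image.raw; check_release_archive is an illustrative name:

```shell
# Succeeds only if the archive contains exactly one entry named image.raw.
# check_release_archive is a hypothetical helper, not a Hub requirement.
check_release_archive() {
    archive=$1
    # tar -tzf lists the archive contents without extracting anything
    contents=$(tar -tzf "$archive") || return 1
    if [ "$contents" = "image.raw" ]; then
        echo "OK: $archive contains only image.raw"
    else
        echo "unexpected contents in $archive:" >&2
        echo "$contents" >&2
        return 1
    fi
}

# Usage, with whatever name you gave the archive:
# check_release_archive my_release.tar.gz
```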

Once you have the tar.gz, upload this file to https://hub.grid.tf/
Once the Hub is done doing its magic, you can find the uploaded image in your personal repo https://hub.grid.tf/your-username.3bot

Don’t hesitate to test these procedures, give comments or new ideas! We are all learning here so pretty sure we can improve. :slight_smile:

Will try to do this with the cockpit project today!

I’m relatively sure I’ve created a working image based on outside testing, but it times out at 5 minutes when I try to load it. I really think this has more to do with not being able to pull the image that fast. Is there any way to change the timeout?

I heard there had been a fix on the devnet for the timeout, so I tested it there and it still happened. Everything described seems to work though, and I can boot the image locally and it performs properly.

This is :fire:
Thanks so much for the detailed guide, @linkmark

https://hub.grid.tf/parkers.3bot/CockpitRC1.flist

This is what I came up with: it boots up with the Cockpit interface running and the firewalld plugin configured to allow port 9090 with the cockpit service.

Included features are Docker and QEMU, plus storage, network and firewall management. The .raw was 3.5 GB, that may be too much? It would be really cool if I can get this to deploy. It would be a one-click route to a GUI that would be more friendly to the average user. I’m not sure how to tell what’s wrong, but it does the whole taking too long to deploy thing, on devnet as well.

I got it to deploy actually, but I can’t SSH in, though I can see it online on my network. I’m thinking I missed a step in the network setup, since I was originally working from the Grid image for 22.04 and this was vanilla. Is there a way to download the Grid’s image as a starting point?

Yes I have the same experience sometimes, retrying works every time for me if it’s a timeout. Even if your deployment fails, the download of the flist to ZOS continues. So next time you try, the image is there.

Newest Ubuntu cloud image builds seem to have an issue booting the networking stack on the Grid.

To work around this issue, use a slightly older build, like this one or older: http://cloud-images-archive.ubuntu.com/releases/focal/release-20220530/ubuntu-20.04-server-cloudimg-amd64.img

Just tested your flist, it has the same issue as here.
If it’s not too much hassle, use an older image to start from.

Dude, you are the man, that’s an easy fix!

This is so much good information for the video that’s coming at the end of this!

https://hub.grid.tf/parkers.3bot/cockpit2004RC1.flist

It works!

I got inspired to try my hand working with cloud images yesterday, thanks in big part to this post, and have some lessons from my own experience to share :slight_smile:

On my Arch Linux system, virt-customize isn’t available in a binary package. I eventually compiled it from source and I think it’s probably the simplest way to accomplish this, but there’s an alternative that gives some cool insight into how these images get configured in the field.

cloud-init is responsible for doing things like setting up user accounts and configuring networking in these Ubuntu images and on the Grid in general. It also supports a “nocloud” mode, where configuration is read from a disk image attached to the VM. How to create such an image, which sets the password for the default ubuntu user, is documented here. There’s also a tool to automate generating the image. Once you have the image, just attach it to the VM as a second disk and cloud-init will read it and apply changes only on the first boot.
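
As a rough illustration of the nocloud approach, the sketch below writes a small user-data file that sets a password for the default ubuntu user. The password is a placeholder and write_user_data is a made-up helper. cloud-localds (from the cloud-image-utils package on Ubuntu) is one tool that can turn the file into a seed image; it may not be the tool linked above.

```shell
# Write a minimal cloud-config file. $1 = output file, $2 = password
# write_user_data is a hypothetical helper name for this example.
write_user_data() {
    out=$1 password=$2
    cat > "$out" <<EOF
#cloud-config
password: $password
chpasswd: { expire: False }
EOF
}

write_user_data user-data somepassword

# Build the seed image, then attach seed.img to the VM as a second disk:
# cloud-localds seed.img user-data
```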

You can also run a history -c before logging out, and no history file will be written. In the case you have a history file, it’s safe to just delete it.

In a fresh cloud image, the root password is unset, with an asterisk in the /etc/shadow file. I couldn’t find a command to restore this state, but you can edit the file directly. It helps to run passwd -d root first, to clear out the long encrypted version of the password you set. Then place * between the first set of colons:

# cat /etc/shadow
root:*:...
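
If hand-editing feels risky, the target state can be previewed first. The sketch below prints a copy of a shadow style file with the password field of root replaced by an asterisk, without touching the original; lock_root_preview is a made-up name for this example.

```shell
# Print the given shadow style file with root's password field set to "*".
# lock_root_preview is a hypothetical helper; it never modifies the file.
lock_root_preview() {
    # Fields are separated by colons; field 2 is the password hash
    awk 'BEGIN { FS = OFS = ":" } $1 == "root" { $2 = "*" } { print }' "$1"
}

# Usage, as root: compare the preview with the current file before editing
# lock_root_preview /etc/shadow | diff /etc/shadow -
```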

And that’s all I’ve got for now. Thanks again for the awesome guide, @linkmark!

A note from my experience doing this in Ubuntu: I first left an image oversized, went to raw and tar, and it deployed fine.

I then attempted to shrink a qcow2 post modification with “qemu-img resize --shrink -800M” (I made sure this space was all empty). Interestingly, the image continued to boot in QEMU and function appropriately, but after converting and uploading that image, it does not deploy, failing with “/ cant be mount”.

Awesome, did not know :slight_smile:
Thanks for the contribution!

I did some shrinking tests too, but they all failed with boot issues afterwards. Now I try to find out how much space I need, and resize accordingly.
There are probably some tricks to shrink an image correctly, which would be good to speed up deployment a bit.

I think part of the issue may be that the image.raw is truly a structure of three images, and I assume that they are not in the order you would expect.

I have a theory on being able to break the image out, shrink the / partition individually, and pack it back up. With the next image I drag out, I’ll see if I can put the idea to work.

My pleasure! It’s been really fun learning about these cloud images and also taking a deeper dive into how Zos handles stuff behind the scenes along the way :sunglasses:

The best bet, based on my brief research on the subject, is just to create a new image at the right size and copy the files over, after finishing with setup.

An easy workflow for this process. I missed a step in here somewhere, so the image itself wasn’t correct, but the process has made all the working images lol.