Non-persistent VM upgrades: best practices?

Hello,

What are the recommended practices for rolling out an upgraded image for a set of non-persistent VMs?

Let’s say I have a service handled by about five instances (VMs) running on top of the same non-persistent image. Now I want to make a modification to the image (either new application code or system package updates), and then roll out a set of new VMs.

  • should I clone the image for modifications, or use snapshots (how?)?
  • should I update the template to reflect the new image ID, or use a name-based reference to the root disk?
  • should I delete the old VMs and instantiate new ones, or just shut them down, plug the new image in, and start them up, retaining the VM ID?
  • would using OneFlow help somehow?
  • is it possible to develop a new version in an existing VM, and make the modifications from that VM persistent afterwards?
  • anything else? :slight_smile:
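
For concreteness, the kind of roll-out I currently have in mind looks roughly like this (all names and IDs below are made up, and I am not sure this is the intended way):

# clone the current root image so it can be modified
oneimage clone web-root-v1 web-root-v2
# ... apply the new application code / package updates to web-root-v2 ...
# point the template's DISK at the new image (opens $EDITOR)
onetemplate update web-template
# terminate the old VMs (placeholder IDs) and start five fresh ones
onevm terminate 100 101 102 103 104
onetemplate instantiate web-template --multiple 5 --name 'web-%i'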

Thanks for any recommendations,

-Yenya

Hello.
My typical practice:
I have an image with just the base operating system, for example Ubuntu 16.04. I created it once.
Next, I use Ansible playbooks to prepare, configure and upgrade the VMs’ systems.
I deliberately don’t keep many parent images, because as soon as an image has at least one child VM, you can’t remove that image.

If you want the VMs to be “unconfigurable”, I think you can just delete the old VMs and create new ones.
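
A minimal sketch of that loop, assuming the standard OpenNebula and Ansible CLIs (the template, inventory and playbook names are only examples):

# start fresh VMs from the plain base template
onetemplate instantiate base-ubuntu-1604 --multiple 5
# ... wait until the new VMs are RUNNING and reachable over SSH ...
# then prepare/configure/upgrade them with the playbooks
ansible-playbook -i inventory/production site.yml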

At the moment, I have one script that wipes out all the “personality” of a VM: it deletes SSH keys, empties any log files, empties package caches, etc.
So if I need to fix something in the base image, or want to upgrade it, I do that and follow up with the node wiping script.

I normally keep the base image persistent and then copy it to non-persistent images for the non-persistent VMs.
The template for the non-persistent VMs is cloned from the base one and is usually identical, except that the base image might have a fixed IP assigned.
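
A rough sketch of that copy step with the CLI (image and template names are placeholders):

# the persistent base image stays untouched; work happens on a copy
oneimage clone base-image web-image-v2
# make sure the copy is non-persistent
oneimage nonpersistent web-image-v2
# the template for the non-persistent VMs is a clone of the base template
onetemplate clone base-template web-template-v2
onetemplate update web-template-v2   # point DISK/IMAGE at web-image-v2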

One thing you should check out is canga.io by Javier, which can do all those things on the raw image file without even booting it.
I want to switch to using it, since it sounds a lot cleaner than manually booting and updating the images.

@darkfader: interesting, thanks for the pointer to canga.io.

I wonder how well it would work with my ONe cluster with Ceph storage. AFAIK, qemu-img (which is used by canga.io) does not support the native Ceph RBD protocol (unlike qemu itself).

Hello Florian,

Would you mind posting your wipe script? I don’t think it’s useful to reinvent the wheel every other day.

Hello Yenya,

qemu-img already has support for RBD images. I used this to migrate instances from Ganeti, vSphere and OpenStack to OpenStack and OpenNebula.
Maybe this is helpful for someone:

  • Converting a RAW file into an (already existing) RBD image:
    qemu-img convert -p -n -O rbd FILE rbd:POOL/IMAGE
  • Converting / copying an RBD image into a new RBD image:
    qemu-img convert -p -O rbd rbd:POOL/IMAGE rbd:POOL/NEW-IMAGE
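
Note that with -n the target RBD image has to exist already. A quick sketch of pre-creating it with a matching size (pool and image names are placeholders again):

# check the virtual size of the source
qemu-img info --output=json FILE
# create the target RBD image before converting into it
# (older Ceph releases take the size in MB, newer ones accept suffixes like 20G)
rbd create --size 20G POOL/IMAGE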

Best regards,

Bernhard J. M. Grün

Hi @dot - here is the script.

My open todo was to redirect history to /dev/null. I sometimes had the script call as the last thing in root’s history, which is a nice trap.
So the suggestion is to run it like this:
export HISTFILE=/dev/null; ./sysprep.sh

#!/bin/sh

# 
# Delete all base image config
#

# rudder cfengine ppkeys
rm /var/rudder/cfengine-community/ppkeys/* 
rm /var/rudder/cfengine-community/policy_server.dat 
# rudder logs
rm /var/rudder/cfengine-community/outputs/*
# rudder uuid
rm /opt/rudder/etc/uuid.hive 
# ssh host keys
rm /etc/ssh/*host*key*
# lvm archive
rm /etc/lvm/archive/*
# logfiles
find /var/log -name "*gz" -exec rm {} +
find /var/log -name "*.1" -exec rm {} +
find /var/log -type f -exec cp /dev/null {} +

# clean apt cache
type apt 2> /dev/null && rm /var/cache/apt/archives/*deb
# clean yum cache
type yum 2> /dev/null && yum clean all
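
A quick way to sanity-check the result afterwards (just an example, not part of the script): list anything under /var/log that still has content, and any leftover SSH host keys.

# non-empty files left in /var/log
find /var/log -type f -size +0c
# SSH host keys that survived the wipe (should print nothing)
ls /etc/ssh/*host*key* 2> /dev/null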

It works well enough as is, so I didn’t really beautify anything; no point in overcomplicating what doesn’t need to be.
