Upgrading The House Server

Overview

lemon is the house server. The main requirement is data integrity followed by shared services. The WAN and the LAN are directly connected to this server.

This upgrade converts it from Gentoo Linux to CRUX Linux and updates some of the hardware.

This document shows the steps to configure mirrored disks, install the Linux distribution, configure the services, copy the data from the existing server, and finally swap the machines. The LAN and WAN will only be down for a short data synchronization and a reboot.

Services

  • DHCP
  • DNS
  • IMAP
  • NTP
  • CIFS
  • fetchmail
  • SMTP
  • TFTP
  • Firewall
  • NAT
  • crons
  • self-monitoring

CRUX Install

Two mirrored drives are used. The physical drives aren't exactly the same size, so only the smaller size is mirrored. That leaves an extra partition on the larger drive for admin use.

The root and /boot partitions are mirrored using mdadm with 0.90 metadata so lilo is happy. lilo -M is run on each drive so they both have an MBR.

A third partition is mirrored with mdadm but doesn't require the older metadata format.

Both drives are partitioned identically. Tried using the +nG form in fdisk (version 2.27.1), but the geometry was slightly different on one drive and the size was rounded to a boundary. Ended up entering the last sector number to prevent the rounding errors. It does not matter if the starting or ending sector differs between the two drives, but the number of sectors must be exactly the same.

Mirroring swap is bad for performance, so each drive has an unmirrored swap partition.
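
The sector arithmetic behind this can be sketched as below. The helper names last_sector and sector_count are hypothetical, and the numbers in the usage note are illustrative rather than values from the real drives.

```shell
# Sketch: work out the last sector to enter in fdisk so that a partition
# holds an exact number of sectors, whatever its starting sector is.
# last_sector and sector_count are hypothetical helper names.
last_sector() {
    # $1 = first sector of the partition, $2 = sectors it must contain
    echo $(( $1 + $2 - 1 ))
}

# Sector count of an existing partition from its first and last sector.
sector_count() {
    # $1 = first sector, $2 = last sector
    echo $(( $2 - $1 + 1 ))
}
```

For example, last_sector 2048 204800 gives 206847 and last_sector 63 204800 gives 204862. Both partitions contain exactly 204800 sectors even though they start in different places, which is all the mirror needs.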

Another partition uses up the rest of the drive as /dev/md2. /dev/md2 is used to create the Volume Group vg0.

The /usr and /var partitions will be logical volumes.

Refer to the CRUX Handbook and mix in the tasks below to set up a CRUX-based server with reliable disks.

Before running setup

Create the RAID1 for root and /boot

mdadm -C /dev/md0 -e 0 -l 1 -n 2 /dev/sda1 /dev/sdb1

mkfs.ext3 /dev/md0

mdadm -C /dev/md1 -e 0 -l 1 -n 2 /dev/sda3 /dev/sdb3

mkfs.ext3 /dev/md1

Set up swap space

mkswap /dev/sda2

mkswap /dev/sdb2

swapon /dev/sda2 /dev/sdb2

This wiki article describes the above but fails to say how to force 0.90 metadata with the -e 0 option. This metadata version is critical for lilo: it can only use boot and root devices that have the older metadata format. Newer versions of mdadm default to newer metadata versions.
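
It may be worth confirming the metadata version before going further. A minimal sketch, assuming the "Version :" line that mdadm -D prints; md_metadata_version is a hypothetical helper name, and the exact formatting of the version string varies between mdadm releases.

```shell
# Sketch: pull the metadata version out of "mdadm -D" output on stdin.
# md_metadata_version is a hypothetical helper name.
md_metadata_version() {
    awk -F' : ' '/^ *Version :/ { print $2; exit }'
}

# Intended use on the real system (needs root):
#   mdadm -D /dev/md0 | md_metadata_version
#   mdadm -D /dev/md1 | md_metadata_version
```

For the arrays lilo has to read, the reported version should be the old 0.90 format rather than a 1.x one.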

Create the Volume Group for all other important directories

A Volume Group called vg0 is created that is backed by a metadevice comprised of two physical devices in a RAID1 (mirrored) configuration.

mdadm -C /dev/md2 -l 1 -n 2 /dev/sda5 /dev/sdb5

Create the LVM2 configuration needed:

pvcreate /dev/md2

vgcreate vg0 /dev/md2

lvcreate -L 44G -n backupslv vg0

lvcreate -L 23G -n sharedlv vg0

lvcreate -L 8G -n usrlv vg0

lvcreate -L 8G -n varlv vg0

mkfs.xfs /dev/vg0/usrlv

mkfs.xfs /dev/vg0/varlv

mkfs.xfs /dev/vg0/backupslv

mkfs.xfs /dev/vg0/sharedlv

backupslv and sharedlv are optional. This house server provides backup and shared space to some computers with other Operating Systems that I support but refuse to name lest I have to wash my keyboard and hands.

The sizes of the logical volumes are discretionary. These are just values near what the current server is using. lvextend and modern filesystems allow easy growth.
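
Growing a volume later could look roughly like this. It is a sketch only: grow_lv is a hypothetical helper, the 4G figure in the usage note is an example, and the function just prints the commands unless DRY_RUN=0 is set.

```shell
# Sketch: grow a logical volume in vg0 and then the XFS filesystem on it.
# grow_lv is a hypothetical helper; by default it only prints the commands.
grow_lv() {
    # $1 = logical volume name, $2 = size to add, $3 = mount point
    run=echo
    [ "${DRY_RUN:-1}" = "0" ] && run=
    $run lvextend -L "+$2" "/dev/vg0/$1" &&
    $run xfs_growfs "$3"
}
```

Running grow_lv varlv 4G /var prints the lvextend and xfs_growfs commands for review; DRY_RUN=0 grow_lv varlv 4G /var would execute them. Note that XFS can be grown while mounted but cannot be shrunk, so it is safer to start small and extend.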

The use of XFS is also discretionary, but do yourself a favor: make sure some kind of journalling filesystem is used for the storage that makes up the Logical Volumes and that it can be resized on-the-fly. XFS is the best for our usage profile.

Pre-setup and setup

All this disk storage has to be mounted prior to running setup since /usr and /boot are separated.

mount /dev/md1 /mnt

mount /dev/md0 /mnt/boot

mount /dev/vg0/usrlv /mnt/usr

mount /dev/vg0/varlv /mnt/var

Note the shared and backup storage is not mounted. The CRUX install doesn't care about those but they must be in the /etc/fstab when the system is finally booted up.
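
The resulting /etc/fstab could be generated along these lines. The devices come from the steps above and the /shared and /backups mount points from the rsync script later in this document; the mount options and fsck pass numbers are assumptions, and print_fstab is a hypothetical helper name.

```shell
# Sketch: print /etc/fstab entries for the layout described above.
# Mount options and fsck pass numbers are assumptions; adjust to taste.
print_fstab() {
    # printf repeats the format for each group of six arguments.
    printf '%-20s %-9s %-5s %-9s %s %s\n' \
        /dev/md1            /         ext3  defaults  0 1 \
        /dev/md0            /boot     ext3  defaults  0 2 \
        /dev/vg0/usrlv      /usr      xfs   defaults  0 0 \
        /dev/vg0/varlv      /var      xfs   defaults  0 0 \
        /dev/vg0/sharedlv   /shared   xfs   defaults  0 0 \
        /dev/vg0/backupslv  /backups  xfs   defaults  0 0 \
        /dev/sda2           swap      swap  defaults  0 0 \
        /dev/sdb2           swap      swap  defaults  0 0
}
```

The output is meant to be reviewed and merged into the new system's /etc/fstab by hand rather than appended blindly.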

Run the setup command to install CRUX to the mirrored storage.

Installing the LVM2 userspace tools

After running setup and before chrooting into the CRUX environment, mount the CRUX install media where the chroot can get to it:

mkdir /mnt/media

mount /dev/sr0 /mnt/media

Then go ahead with the chroot. Once in the CRUX environment use pkgadd to install the LVM2 user space. Some exploration to determine the actual version will be needed ( is my friend) but the command will be like:

pkgadd /media/crux/opt/lvm2#2.02.133-1.pkg.tar.xz

lvm2 is needed so the reboot on the new kernel will activate the LVM Volume Group. Without it, /usr and /var won't be available during system initialization, which will then fail.

Kernel configuration

Ensure that support for MD and LVM is built in and not compiled as modules. Not all options are needed; only those for MD RAID1 and LVM's device mapper.

  • CONFIG_MD_AUTODETECT=y
  • CONFIG_BLK_DEV_DM_BUILTIN=y
  • CONFIG_BLK_DEV_DM=y
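
A quick way to verify the list above against a kernel configuration might be the following sketch. check_kernel_config is a hypothetical helper and the default path /usr/src/linux/.config is an assumption; pass the real path as the first argument.

```shell
# Sketch: report which of the options listed above are not built in (=y)
# in a kernel .config file. check_kernel_config is a hypothetical helper.
check_kernel_config() {
    config=${1:-/usr/src/linux/.config}    # default path is an assumption
    missing=0
    for opt in CONFIG_MD_AUTODETECT CONFIG_BLK_DEV_DM_BUILTIN CONFIG_BLK_DEV_DM; do
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "$opt is not built in"
            missing=1
        fi
    done
    return $missing
}
```

The function prints one line per option that is missing or built as a module (=m) and returns non-zero in that case, so it can be run just before building the kernel. The option list in the loop can be extended, for example with CONFIG_MD_RAID1.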

Post-install

Running on the newly installed CRUX, set up the directories for pkgmk and prt-get:

mkdir /usr/ports/source

chmod 1777 /usr/ports/source

mkdir /usr/ports/pkgs

chmod 1777 /usr/ports/pkgs

mkdir /var/tmp/crux

chmod 1777 /var/tmp/crux

Update the system with the latest versions:

ports -u

prt-get sysup

rejmerge

prt-get depinst prt-utils tcpdump strace screen mdadm

revdep

Install & Configure Services

This only covers exceptional circumstances that must be dealt with. There is plenty of documentation for each of the services involved. This covers the unique configuration used for this server.

This configuration is only for the server testing phase. Further configuration changes will be done later before the cutover.

Rebuild the kernel with iptables/masquerading support.

Install these: dhcpcd (DHCP), unbound (DNS), dovecot (IMAP), openntpd (NTP), samba (CIFS), fetchmail, exim (SMTP), tftp-hpa (TFTP), lemon:/usr/sbin/firewall (Firewall and NAT), lm_sensors (self-monitoring), smartmontools (self-monitoring), mailx (smartmontools).

dovecot

Add user vmail for virtual users configuration

/usr/sbin/groupadd -g 27 vmail

/usr/sbin/useradd -g vmail -u 28 -d /home/vmail -s /bin/false vmail

mkdir /var/spool/imap

chown vmail:vmail /var/spool/imap

ln -s /var/spool/imap /home/vmail

touch /var/log/dovecot.log

chown vmail:vmail /var/log/dovecot.log

touch /var/log/dovecot-info.log

chown vmail:vmail /var/log/dovecot-info.log

Add prefix = INBOX. to 10-mail.conf in namespace inbox {} to get Cyrus-like behaviour for folders and subfolders.

exim

core port doesn't include LMTP transport. Added to DF repo.

samba

Add passwords for users.

useradd -g users -d /var/empty -s /bin/false wyatt

passwd -l wyatt

useradd -g users -d /var/empty -s /bin/false wendy

passwd -l wendy

smbpasswd -a wyatt

smbpasswd -a wendy

fetchmail

Set polling period to 120 seconds. Add startup to /etc/rc.local

tftpboot

Did not install xinetd. Was not actively using tftpboot but would like to keep the framework in place.

self-monitoring

mdadm

Added mdadm --monitor --scan --daemonise --mail wyatt@prairieturtle.ca to /etc/rc.local

SMART

Configured to send email to me and only look at sda and sdb.

lm_sensors

sensors-detect showed that this system has AMD K8 thermal sensors and Winbond W83627EHF/EF/EHG/EG Super IO Sensors.

Ensured that the kernel config Hardware Monitor Driver was enabled for both.

Added modprobe w83627ehf to /etc/rc.local, but not sure it is necessary.

Wrote hw-check script (mostly awk) to email on hardware variance by checking the output from sensors -u. It is run like:

sensors -u | /usr/bin/gawk -v maxtemp=60 -v lowfan=2100 -f /root/bin/hw-check

Here maxtemp is the temperature above which an e-mail is sent: the gawk script writes a message to stdout and cron mails it. Likewise, lowfan causes an e-mail if the fan RPM drops below this value. The script also checks that the sensors were actually found; otherwise it would not really guard the system.

The awk script looks like:

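#!/usr/bin/gawk -f
# hw-check: reads "sensors -u" output; anything printed gets e-mailed by cron.
# Expects -v maxtemp=<limit> -v lowfan=<rpm> on the command line.
/^k8temp-pci-/ { k8found=1 }
# Also accepts suffixed chip names such as w83627ehf-isa-*.
/^w83627[a-z]*-isa-/ { wbfound=1 }
/fan2_input:/   {   rpm = $2
                    if (rpm < lowfan) {
                        printf("Fan rpm %d is lower than limit %d\n", rpm, lowfan)
                    }
                }
/temp3_input:/  {   temp = $2
                    if (temp > maxtemp) {
                        printf("Temp %d is higher than limit %d\n", temp,
                            maxtemp)
                    }
                }
END {
        # Truth tests, not "k8found=0": that form is an assignment and
        # the warnings would never be printed.
        if (!k8found) {
            printf("k8temp output not found!\n")
        }
        if (!wbfound) {
            printf("w83627 chip output not found!\n")
        }
    }
#End of file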

Added to root's crontab to run every 10 minutes:

*/10 * * * * sensors -u | /usr/bin/gawk -v maxtemp=60 -v lowfan=2100 -f /root/bin/hw-check

unbound

Configured for domain pepper, all on subnet 192.168.1.0/24. Disabled IPv6. Did not disable DNSSEC. Created include file for RR and PTR records in /etc/unbound/data/pepper.

Added listen statement for the LAN interface IP and a listen statement for 127.0.0.1. Did not include a listen statement for the WAN side as all the machines are private and provide no public services.

Testing

Data Integrity

Hard-disk drive loss

lvs -a -o +devices is handy for looking at the logical volumes and which MD raid devices they are on. mdadm -D /dev/mdX will show the pair of partitions, which should be on different devices, that form one RAID MD device.
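
For a quick health check during these tests, the status field in /proc/mdstat can also be scanned. A minimal sketch, assuming the usual [UU] / [U_] member status notation; degraded_arrays is a hypothetical helper name.

```shell
# Sketch: list MD arrays that are missing a member, based on the
# [UU] / [U_] status field in /proc/mdstat.
# degraded_arrays is a hypothetical helper name.
degraded_arrays() {
    # Remember the array name from its header line, then print it when
    # the status field on a following line contains an underscore.
    awk '/^md[0-9]+ :/      { name = $1 }
         /\[[U_]*_[U_]*\]/  { print name }' "${1:-/proc/mdstat}"
}
```

With no argument it reads /proc/mdstat; with both drives healthy it prints nothing. A file name can be passed instead to try it on a saved copy.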

Hard-disk recovery after loss

Unplugged sdb power cable. Got e-mail from smartd. Added mdadm --monitor to /etc/rc.local with email to me. /dev/md1 failed right away, but /dev/md0 is not mounted so it never said a word. Rebooted system with failed drive just fine.

Simply plugging the power back in didn't cause the kernel to add the device. udevadm trigger didn't help. Have to reboot, I guess. Not surprised that hotplug on an IDE drive doesn't work.

To re-mirror the drive:

mdadm /dev/md0 -a /dev/sdb1

mdadm /dev/md1 -a /dev/sdb3
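
The re-mirror has to finish before the other drive is pulled, otherwise there is no complete copy left. One way to check is sketched below; rebuild_in_progress is a hypothetical helper that looks for the recovery or resync progress lines in /proc/mdstat. Any other mirror that lost its sdb member needs the same re-add step first.

```shell
# Sketch: succeed while any array in /proc/mdstat is still rebuilding.
# rebuild_in_progress is a hypothetical helper name.
rebuild_in_progress() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Intended use on the real system:
#   while rebuild_in_progress; do sleep 60; done
```

mdadm also has a --wait option that blocks until resync or recovery activity on the given arrays has finished, which may be simpler than polling.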

Performed the same test again but unplugged the power from /dev/sda this time. After all the drives had been detected as lost, rebooted the system. This tested the MBR on the second drive as pointed out in the LILO article referenced earlier. Worked like a charm.

Power Outage(s)

Power Outage During Normal Operation

System came up fine when power was re-applied. Journal replayed on one filesystem; the others happened to be clean.

Power Outage During Recovery From Power Outage

Did not test this.

Services

Cutover Strategy

Initial rsync

jonah is the hostname of the new server until the cutover.

root@jonah:bin/lemon2jonah

#!/bin/bash
rsync -a lemon:/shared/ /shared
rsync -a lemon:/backups/ /backups
rsync -a lemon:/home/ /home
rsync -a lemon:/root/ /root
rsync -a lemon:/tftpboot/ /tftpboot
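
Before the final sync at cutover it can help to preview what would still change. The sketch below only prints rsync commands for the same directories; sync_cmd is a hypothetical helper, and -n (trial run) and -i (itemize changes) are standard rsync options.

```shell
# Sketch: print, not run, the rsync command for one directory.
# Extra rsync options, such as -n for a trial run, go after the name.
# sync_cmd is a hypothetical helper name.
sync_cmd() {
    d=$1
    shift
    echo rsync -a "$@" "lemon:/$d/" "/$d"
}

# Preview commands for the directories copied by lemon2jonah.
for dir in shared backups home root tftpboot; do
    sync_cmd "$dir" -n -i
done
```

The printed lines can be reviewed and then piped to sh. Dropping -n -i from the loop yields the same commands as the script above.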

Mail sync

NOTE the dovecot server must be restarted after changing the user's IMAP password in imapc_password =.

set password in /etc/dovecot/dovecot.conf and then doveadm -o mail_fsync=never backup -R -u user1 imapc:

set password in /etc/dovecot/dovecot.conf and then doveadm -o mail_fsync=never backup -R -u user2 imapc:

Machine Swap

Service Verification

Send test e-mail from outside source. Checked logs for fetchmail and exim activity.

Fallback Strategy

Change hardcoded IP addresses back. Shut down both the new machine and the old machine, then boot the old machine.

Review the mail log on the new machine, find any emails that arrived between the cutover and the fallback, and copy them to the old machine.
