Definitely Not a Cloud
OBSOLETE - this is now taken care of by this script
this guide also has its alpine linux flavor
XEN & DRBD highly-available convergent farm
tested on slack150
quick-and-dirty deployment process using slackware packages we’ve prepared upfront
The goal is to get a few cluster nodes running XEN + DRBD + KeepaliveD.
For the purpose of a PoC, we’re assuming a KVM host with nested virtualization enabled, to run the few storage nodes and XEN hypervisors within. However, the target is obviously a bare-metal cluster farm for designing a self-made and convergent IaaS cloud.
see this script
we want the kernels at a UNIX-style location, however the XEN micro-kernel usually gets installed in /boot/, which is why we make this handy symlink at the file-system’s root
ls -lhF /boot/xen*.gz
ln -s boot/xen.gz /xen.gz
double-check required libraries
which xl
ldd /usr/sbin/xl | grep not
ldd /usr/sbin/xenconsoled | grep not
ldd /usr/lib/xen/bin/qemu-system-i386 | grep not
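The library checks above can be folded into one loop. This is only a minimal sketch for a POSIX shell. `missing_libs` and `check_binaries` are hypothetical helper names, not something shipped with XEN or Slackware.

```shell
#!/bin/sh
# Hypothetical helpers (not part of XEN): report unresolved shared libraries.

# Read `ldd` output on stdin and print every library marked "not found".
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Check every binary given as argument.
# Exit status is 1 if a binary is absent or has unresolved libraries.
check_binaries() (
    rc=0
    for bin in "$@"; do
        if [ ! -x "$bin" ]; then
            echo "$bin: no such executable"
            rc=1
            continue
        fi
        missing=$(ldd "$bin" 2>/dev/null | missing_libs)
        if [ -n "$missing" ]; then
            echo "$bin: missing" $missing
            rc=1
        fi
    done
    exit "$rc"
)
```

Possible usage on a node: `check_binaries /usr/sbin/xl /usr/sbin/xenconsoled /usr/lib/xen/bin/qemu-system-i386 && echo libraries ok`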
we need a bridge for guests
vi /etc/rc.d/rc.inet1.conf

ext=192.168.122.11/24
int=10.1.1.254/24
gw=192.168.122.1

vi /etc/rc.d/rc.inet1

#!/bin/bash

echo rc.inet1 PATH is $PATH
#PATH=$PATH:/usr/local/bin:/usr/local/sbin

if [[ $1 = stop || $1 = down ]]; then
	route delete default
	ifconfig xenbr0 down
	ifconfig eth0 down
	ifconfig lo down
else
	echo -n lo ...
	ifconfig lo up && echo done || echo FAIL

	echo -n eth0 ...
	ifconfig eth0 up && echo done || echo FAIL

	echo -n xenbr0 ...
	brctl addbr xenbr0
	brctl addif xenbr0 eth0
	ifconfig xenbr0 $ext up && echo done || echo FAIL

	echo -n default route ...
	route add default gw $gw && echo done || echo FAIL

	echo -n br0 ...
	brctl addbr br0
	# TODO CLUSTER NETWORK
	#brctl addif br0 eth1
	ifconfig br0 $int up

	sysctl -w net.ipv4.ip_forward=1

	echo -n FIREWALL AND SNAT...
	nft -f /etc/nftables.conf && echo done || echo FAIL

	# rc.inet2 disabled
	/etc/rc.d/rc.sshd start
fi
tune the front-facing IP accordingly
vi /etc/nftables.conf

flush ruleset

table ip nat {
	chain postrouting {
		type nat hook postrouting priority 100;
		ip saddr 10.1.1.0/24 oif xenbr0 snat 192.168.122.11;
	}
}
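The SNAT address has to follow the front-facing IP of each node, so it can be derived from the same value as `ext` instead of being typed twice. A minimal sketch, assuming a POSIX shell. `gen_nft_snat` is a hypothetical helper name, and the generated ruleset simply mirrors the one above.

```shell
#!/bin/sh
# Hypothetical helper: print the SNAT ruleset for a given external address.
# $1 external address, CIDR suffix allowed (e.g. the value of ext)
# $2 internal network to source-NAT
# $3 outgoing bridge
gen_nft_snat() (
    ext_ip=${1%/*}    # strip the prefix length: 192.168.122.11/24 -> 192.168.122.11
    cat <<EOF
flush ruleset

table ip nat {
    chain postrouting {
        type nat hook postrouting priority 100;
        ip saddr $2 oif $3 snat $ext_ip;
    }
}
EOF
)
```

Possible usage: `gen_nft_snat 192.168.122.12/24 10.1.1.0/24 xenbr0 > /etc/nftables.conf`, then `nft -c -f /etc/nftables.conf` to check the syntax without loading the rules.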
since the slackware/150 kernel does not have XEN built-in anymore, we have to use our custom build (+ xen and reiser4)
ver=5.16.20

cd /
mv -f vmlinuz vmlinuz.old
mv -f vmlinuz.config vmlinuz.config.old
wget https://lab.nethence.com/nunux/$ver.tar.gz
wget https://lab.nethence.com/nunux/$ver.vmlinuz -O vmlinuz
wget https://lab.nethence.com/nunux/$ver.vmlinuz.config -O vmlinuz.config
wget https://lab.nethence.com/nunux/linux-$ver.tar.gz
tar xzf $ver.tar.gz -C lib/modules/
rm -f $ver.tar.gz
tar xzf linux-$ver.tar.gz -C usr/src/
rm -f linux-$ver.tar.gz
depmod -a $ver
file vmlinuz*

unalias reboot
reboot
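The four downloads share the same base URL and version, so the list can be generated once and fed to a resumable `wget -c`. A small sketch under that assumption. `kernel_urls` is a hypothetical helper name.

```shell
#!/bin/sh
# Base URL taken from the commands above.
base=https://lab.nethence.com/nunux

# Hypothetical helper: print the four URLs fetched for kernel version $1.
kernel_urls() {
    for f in "$1.tar.gz" "$1.vmlinuz" "$1.vmlinuz.config" "linux-$1.tar.gz"; do
        echo "$base/$f"
    done
}
```

Possible usage: `kernel_urls 5.16.20 | while read -r url; do wget -c "$url" || break; done`. The files keep their remote names in that case, so the kernel and its config still have to be renamed to vmlinuz and vmlinuz.config afterwards.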
once everything is ok
removepkg kernel-generic
removepkg kernel-huge
removepkg kernel-modules
removepkg kernel-source
ls -lF /var/log/packages/kernel-* # firmware and headers
ls -lF /lib/modules/
ls -lF /usr/src/
(syslinux gets installed and used from the host directly)
install the boot-loader
fdisk -l /dev/nvme0n1 | grep ^Disklabel
dd if=/usr/share/syslinux/mbr.bin of=/dev/nvme0n1
ls -lF /usr/share/syslinux/*.bin
mkdir /boot/syslinux/
extlinux --install /boot/syslinux/ --device /dev/nvme0n1p1
cp -i /usr/share/syslinux/libcom32.c32 /boot/syslinux/
cp -i /usr/share/syslinux/mboot.c32 /boot/syslinux/
cp -i /usr/share/syslinux/libutil.c32 /boot/syslinux/
cp -i /usr/share/syslinux/menu.c32 /boot/syslinux/
ls -lF /boot/syslinux/
the boot-loader looks for kernels at the root of the file-system.
setup serial console mode, see syslinux
then also tune the prompt for serial line
cp -pi /etc/inittab /etc/inittab.dist
vi /etc/inittab

s1:12345:respawn:/sbin/agetty --noclear --flow-control --keep-baud --local-line 115200,38400,9600 ttyS0 linux
check that SYSLINUX works as expected
reboot
once everything is fine, you may switch to XEN as the default boot entry. beware that under XEN the linux console of the dom0 becomes hvc0, and it is XEN itself which takes care of the serial output.
uptime
uname -r
mv -i /etc/lilo.conf /etc/lilo.conf.disabled
removepkg lilo
ls -lF /xen.gz
ls -lF /vmlinuz
vi /boot/syslinux/syslinux.cfg

default XEN
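For reference, a two-entry syslinux.cfg could look like what the function below prints. This is a sketch, not the configuration used on the farm: the root device, the serial settings and the XEN command-line options (`com1=`, `console=com1`) are assumptions to adapt, and `gen_syslinux_cfg` is a hypothetical helper name. XEN is loaded through mboot.c32, with the dom0 kernel passed after the `---` separator.

```shell
#!/bin/sh
# Hypothetical helper: print a syslinux.cfg with a plain linux entry and a XEN entry.
# $1 root device of the dom0 (assumption: defaults to /dev/nvme0n1p1)
gen_syslinux_cfg() (
    rootdev=${1:-/dev/nvme0n1p1}
    cat <<EOF
serial 0 115200
default XEN
prompt 1
timeout 50

label linux
    kernel /vmlinuz
    append root=$rootdev ro console=ttyS0,115200

label XEN
    kernel mboot.c32
    append /xen.gz com1=115200,8n1 console=com1 --- /vmlinuz root=$rootdev ro console=hvc0
EOF
)
```

Possible usage: `gen_syslinux_cfg /dev/nvme0n1p1 > /boot/syslinux/syslinux.cfg`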
don’t forget to tune the system init. enable only the getty that matches the active console (ttyS0 without XEN, hvc0 with XEN), otherwise the other one mildly floods our logs.
vi /etc/inittab

s1:12345:respawn:/sbin/agetty --noclear --flow-control --keep-baud --local-line 115200,38400,9600 hvc0 linux
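Switching the getty between ttyS0 and hvc0 can also be done without an editor. A minimal sketch, assuming the entry keeps the `s1:` id shown above. `retarget_getty` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the tty of the s1 getty entry.
# Reads an inittab on stdin, writes the result on stdout.
# $1 target tty (ttyS0 or hvc0)
retarget_getty() {
    sed -e "/^s1:/s/ ttyS0 / $1 /" -e "/^s1:/s/ hvc0 / $1 /"
}
```

Possible usage: `retarget_getty hvc0 < /etc/inittab > /etc/inittab.new && mv /etc/inittab.new /etc/inittab`, followed by `telinit q` so that init re-reads its configuration.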
we will reboot further down at the acceptance stage.
build the DRBD v9 kernel module and distribute it to the other nodes of the farm
ssh slack2 mkdir -p /lib/modules/5.13.19.xenreiser4/updates/
scp /lib/modules/5.13.19.xenreiser4/updates/drbd.ko \
	slack2:/lib/modules/5.13.19.xenreiser4/updates/drbd.ko
scp /lib/modules/5.13.19.xenreiser4/updates/drbd_transport_tcp.ko \
	slack2:/lib/modules/5.13.19.xenreiser4/updates/drbd_transport_tcp.ko

ssh slack3 mkdir -p /lib/modules/5.13.19.xenreiser4/updates/
scp /lib/modules/5.13.19.xenreiser4/updates/drbd.ko \
	slack3:/lib/modules/5.13.19.xenreiser4/updates/drbd.ko
scp /lib/modules/5.13.19.xenreiser4/updates/drbd_transport_tcp.ko \
	slack3:/lib/modules/5.13.19.xenreiser4/updates/drbd_transport_tcp.ko

dsh -e -g xen "depmod -a"
dsh -e -g xen "modprobe drbd"
dsh -e -g xen "lsmod | grep drbd"
dsh -e -g xen "grep version /proc/drbd"
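The per-node copy is the same three commands repeated, so it lends itself to a loop. The sketch below only prints the commands, which makes it safe to review before running. `plan_module_push` is a hypothetical helper name, and the kernel version is the one from the commands above.

```shell
#!/bin/sh
# Kernel version and module directory as used in the commands above.
kver=5.13.19.xenreiser4
moddir=/lib/modules/$kver/updates

# Hypothetical helper: print the ssh/scp commands needed to push both
# DRBD modules to every node given as argument.
plan_module_push() {
    for node in "$@"; do
        echo "ssh $node mkdir -p $moddir/"
        for ko in drbd.ko drbd_transport_tcp.ko; do
            echo "scp $moddir/$ko $node:$moddir/$ko"
        done
    done
}
```

Possible usage: `plan_module_push slack2 slack3` to review the plan, then `plan_module_push slack2 slack3 | sh -ex` to execute it.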
tune LVM and DRBD initial setup. then configure the initial shared-disk OCFS2 LUN
mv /etc/drbd.conf /etc/drbd.conf.dist
ls -lF /etc/drbd.d/ # *.conf doesn't matter
vi /etc/drbd.conf # new file

...

resource ocfs2 {
	device /dev/drbd0;
	meta-disk internal;
	on slack1 {
		disk /dev/mapper/thin-ocfs2;
		address x.x.x.x:7000;
		node-id 1;
	}
	on slack2 {
		disk /dev/mapper/thin-ocfs2;
		address x.x.x.x:7000;
		node-id 2;
	}
	on slack3 {
		disk /dev/mapper/thin-ocfs2;
		address x.x.x.x:7000;
		node-id 3;
	}
	connection-mesh {
		hosts slack1 slack2 slack3;
	}
}

vi /etc/ocfs2/cluster.conf

cluster:
	node_count = 3
	name = OCFS2CLUSTER

node:
	ip_port = 7777
	ip_address = 192.168.122.11
	number = 1
	name = slack1
	cluster = OCFS2CLUSTER

node:
	ip_port = 7777
	ip_address = 192.168.122.12
	number = 2
	name = slack2
	cluster = OCFS2CLUSTER

node:
	ip_port = 7777
	ip_address = 192.168.122.13
	number = 3
	name = slack3
	cluster = OCFS2CLUSTER

mv -i /etc/default/o2cb /etc/default/o2cb.dist
vi /etc/default/o2cb

O2CB_ENABLED=true
O2CB_BOOTCLUSTER=OCFS2CLUSTER

vi /etc/fstab

/dev/drbd0 /data ocfs2 rw,noatime,nodiratime,_netdev,noauto 0 0
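The OCFS2 cluster.conf is one stanza per node with only the name, address and number changing, so it can be generated from a node list. A minimal sketch. `gen_ocfs2_conf` is a hypothetical helper name, the port is fixed to 7777 as above, and stanza parameters are indented with a tab.

```shell
#!/bin/sh
# Hypothetical helper: print an OCFS2 cluster.conf.
# $1 cluster name, remaining arguments are name=address pairs.
# Nodes are numbered from 1 in the order given.
gen_ocfs2_conf() (
    cluster=$1
    shift
    printf 'cluster:\n\tnode_count = %s\n\tname = %s\n\n' "$#" "$cluster"
    n=0
    for pair in "$@"; do
        n=$((n + 1))
        printf 'node:\n\tip_port = 7777\n\tip_address = %s\n\tnumber = %s\n\tname = %s\n\tcluster = %s\n\n' \
            "${pair#*=}" "$n" "${pair%%=*}" "$cluster"
    done
)
```

Possible usage: `gen_ocfs2_conf OCFS2CLUSTER slack1=192.168.122.11 slack2=192.168.122.12 slack3=192.168.122.13 > /etc/ocfs2/cluster.conf`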
cat >> /root/.bash_profile <<-EOF
export RCMD_CMD="ssh -o VisualHostKey=no"
export RCP_CMD="scp -o VisualHostKey=no"
export CLUSTER=$HOME/clusterit.conf
EOF
source /root/.bash_profile
vi $HOME/clusterit.conf

GROUP:xen
slack1
slack2
slack3

ln -s dma /usr/sbin/mailq
ln -s dma /usr/sbin/sendmail
mkdir /var/spool/dma/
chown root:mail /var/spool/dma/
chmod 775 /var/spool/dma/
touch /var/spool/dma/flush
chown root:mail /var/spool/dma/flush
chmod 660 /var/spool/dma/flush
mv -i /etc/dma/dma.conf /etc/dma/dma.conf.dist
cat > /etc/dma/dma.conf <<-EOF
SECURETRANSFER
STARTTLS
EOF
there is no need to run newaliases for DMA
slackpkg install s-nail
vi /etc/aliases

root: YOUR-EMAIL
setup time sync
enable daemons at boot-time
vi /etc/rc.d/rc.local

/etc/rc.d/init.d/drbd start
/etc/rc.d/rc.o2cb start
/etc/rc.d/rc.ocfs2 start
mount /data/
/etc/rc.d/init.d/xencommons start
/etc/rc.d/rc.keepalived start

vi /etc/rc.d/rc.local_shutdown

/etc/rc.d/rc.keepalived stop
/etc/rc.d/init.d/xencommons stop
/etc/rc.d/rc.ocfs2 stop
/etc/rc.d/rc.o2cb stop
/etc/rc.d/init.d/drbd stop
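Note that the shutdown sequence is the start sequence in reverse: storage comes up first and goes down last. A small sketch that derives both from a single list, so the two files cannot drift apart. `start_lines` and `stop_lines` are hypothetical helper names, and the `mount /data/` step is deliberately left out of the list.

```shell
#!/bin/sh
# Init scripts in start order, as listed above.
services="/etc/rc.d/init.d/drbd /etc/rc.d/rc.o2cb /etc/rc.d/rc.ocfs2 /etc/rc.d/init.d/xencommons /etc/rc.d/rc.keepalived"

# Hypothetical helper: print the start commands in order.
start_lines() {
    for s in $services; do
        echo "$s start"
    done
}

# Hypothetical helper: print the stop commands in reverse order.
stop_lines() (
    rev=
    for s in $services; do
        rev="$s $rev"    # prepend, which reverses the list
    done
    for s in $rev; do
        echo "$s stop"
    done
)
```

Possible usage: `stop_lines > /etc/rc.d/rc.local_shutdown`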
we want a fancier shell, now that we’ve got those third-party packages installed. also deploy a few scripts and extend your PATH
vi /etc/bashrc

export PATH=$PATH:/etc/rc.d:/root/bin:/root/xen
...
alias runq='sendmail -q'
alias diff='colordiff'

cd /root/
git clone git@github.com:pbraun9/xen.git
git config --global pull.rebase true
finally, make sure the default kernel still boots with syslinux. let’s reboot and check for serial prompt and console
sync
reboot
check
xl info
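`xl info` prints `key : value` lines, so a single field such as the hypervisor version can be extracted for scripted checks. A minimal sketch. `xl_field` is a hypothetical helper name and it only assumes that output layout.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one field from `xl info` output on stdin.
# $1 field name, e.g. xen_version
xl_field() {
    awk -v key="$1" -F: '{
        k = $1
        gsub(/[ \t]+$/, "", k)              # drop the padding before the colon
        if (k == key) {
            sub(/^[^:]*:[ \t]*/, "")        # keep everything after the first colon
            print
            exit
        }
    }'
}
```

Possible usage: `xl info | xl_field xen_version`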
also, once OCFS2 is up, deploy a kernel for guests
mount | grep ocfs
mkdir /data/kernels/
cd /data/kernels/
wget https://lab.nethence.com/nunux/5.2.21.domureiser4.vmlinuz
wget https://lab.nethence.com/nunux/5.2.21.domureiser4.modules.tar.gz
#wget https://lab.nethence.com/nunux/5.2.21.domureiser4.src.tar.gz
ln -s 5.2.21.domureiser4.vmlinuz vmlinuz
server says hello
tail -F /var/log/maillog &
date | mail -s "NODE $HOSTNAME IS READY" root
also make sure it can communicate with the other members of the cluster farm (SSH, rsync, …)
cat /root/.ssh/authorized_keys
now it’s finally time to reboot to check for boot-loader, console prompt and also make LVM thin happy
reboot
slack1-only
vi /root/sync

#!/bin/bash

echo
echo SYSPREP
scp /etc/rc.d/rc.local slack2:/etc/rc.d/
scp /etc/rc.d/rc.local slack3:/etc/rc.d/
scp /etc/hosts slack2:/etc/
scp /etc/hosts slack3:/etc/
scp /root/log slack2:~/
scp /root/log slack3:~/

echo
echo DRBD
scp /etc/drbd.conf slack2:/etc/
scp /etc/drbd.conf slack3:/etc/
rsync -az --delete /etc/drbd.d/ slack2:/etc/drbd.d/
rsync -az --delete /etc/drbd.d/ slack3:/etc/drbd.d/

echo
echo XEN
rsync -az --delete --delete-excluded --exclude ".git/" /root/xen/ slack2:/root/xen/
rsync -az --delete --delete-excluded --exclude ".git/" /root/xen/ slack3:/root/xen/

echo
echo VRRP
scp /etc/keepalived/keepalived.conf slack2:/etc/keepalived/
scp /etc/keepalived/keepalived.conf slack3:/etc/keepalived/
#scp /var/tmp/notify.bash slack2:/var/tmp/
#scp /var/tmp/notify.bash slack3:/var/tmp/
dsh -e -g xen /etc/rc.d/rc.keepalived reload

echo
echo LINUX-HA
scp /etc/ha.d/authkeys /etc/ha.d/ha.cf /etc/ha.d/haresources slack2:/etc/ha.d/
scp /etc/ha.d/authkeys /etc/ha.d/ha.cf /etc/ha.d/haresources slack3:/etc/ha.d/
dsh -e -g xen /etc/rc.d/init.d/heartbeat reload

echo
check drbd-utils
drbdadm status
cat /proc/drbd
/etc/rc.d/rc.drbd start
ls -lF /etc/rc.d/rc.drbd # enabled already
vi /etc/rc.d/rc.local

# self-verbose
/etc/rc.d/rc.drbd start
we’re now ready to proceed with a distributed thin volume such as
ls -lF /dev/thin/slack
ls -lF /dev/drbd/by-res/slack/0
now create a XEN guest against the thin volume
xl: /usr/lib64/libxlutil.so.4.16: version `VERS_4.15.0' not found (required by xl)
–> your tools version is not in sync with the micro-kernel. maybe you’ve installed the tools twice on the system and are having a PATH issue?
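To check the duplicate-install hypothesis, list every copy of `xl` reachable through the PATH. A minimal sketch. `find_in_path` is a hypothetical helper name, and more than one line of output suggests two tool sets are installed.

```shell
#!/bin/sh
# Hypothetical helper: print every executable named $1 found in a
# colon-separated directory list ($2, defaults to $PATH), in lookup order.
find_in_path() (
    name=$1
    dirs=${2:-$PATH}
    set -f          # no globbing while splitting the list
    IFS=:
    for dir in $dirs; do
        [ -n "$dir" ] || dir=.      # an empty entry means the current directory
        [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ] && echo "$dir/$name"
    done
    exit 0
)
```

Possible usage: `find_in_path xl`, then `ldd` on each hit to see which libxlutil it resolves to.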