DRBD ON SPARSE LVM FOR A CONVERGENT XEN FARM

no Linstor required

tested on ubuntu and slackware current (aug 2021)

INTRODUCTION

dual-primary mode is only required temporarily, during a XEN live migration. Otherwise the replicated vdisk does not come up as primary by itself and remains in secondary state unless promoted manually.
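
for illustration, a hypothetical dual-primary migration flow (the guest name doubles as the resource name here, and the target host is made up):

    # on the target node (e.g. slack2): promote while the source is still primary
    drbdadm primary slack
    # on the source node: migrate the guest, then demote
    xl migrate slack slack2
    drbdadm secondary slack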

DRBD v8 does not seem to support the diskless feature, hence we go for v9.

SYSPREP & REQUIREMENTS

make sure the nodes can reach each other by short hostnames, and validate the SSH host-key fingerprints (not sure they need to talk to themselves by SSH as well, but let's allow it anyway)
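
for instance, a minimal sketch, assuming the 192.168.122.0/24 addresses that show up in drbd.conf below:

    cat >> /etc/hosts <<'EOF'
    192.168.122.91 slack1
    192.168.122.92 slack2
    192.168.122.93 slack3
    EOF
    # accept and verify the host key fingerprints once, incl. to oneself
    for node in slack1 slack2 slack3; do ssh $node true; done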

NTP is probably not mandatory for this type of cluster, but let's set it up anyway, with the peer feature
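
e.g. a sketch for /etc/ntp.conf on slack1, assuming classic ntpd (adapt the peer lines on the other nodes):

    server pool.ntp.org iburst
    peer slack2
    peer slack3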

if needed, build drbd9 from source on ubuntu or slackware, or prepare RPMs for RHEL/centos. a from-source sketch, assuming git and the kernel headers are installed:
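
    git clone --recursive https://github.com/LINBIT/drbd.git
    cd drbd
    make
    make install

and check that the v9 modules are loading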

/sbin/modprobe drbd_transport_tcp
/bin/lsmod | grep drbd

also build the userspace tools (drbd-utils)
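
a sketch, assuming the autotools stack is available:

    git clone https://github.com/LINBIT/drbd-utils.git
    cd drbd-utils
    ./autogen.sh
    ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
    make && make install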

and if needed, build the LVM2 thin provisioning tools
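
a sketch, in case the distribution does not ship thin_check and friends:

    git clone https://github.com/jthornber/thin-provisioning-tools.git
    cd thin-provisioning-tools
    autoreconf -fi
    ./configure
    make && make install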

BLOCK DEVICE ARCHITECTURE

the design we're attempting to PoC here, as a minimal working example, is as follows:

/data/ as NFS share for the XEN guest configurations
/dev/thinX/GUEST-NAME logical volumes for the XEN guest file-systems
/dev/thinX/GUEST-NAME the same on the mirror node

assuming additional block devices for DRBD to live upon

/dev/vdb
/dev/vdc

three nodes, on which we are going to use two additional 100G disks each

slack1      slack2      slack3
vda         vda         vda
vdb         vdb         vdb
vdc         vdc         vdc

and those will be set up as follows

slack1:vdb + slack2:vdc --> drbd1 (slack guest)
slack2:vdb + slack3:vdc --> drbd2 (ubuntu guest)
slack3:vdb + slack1:vdc --> drbd3 (centos guest)

note that all the nodes have access to all the DRBD devices anyhow:

slack1 reaches drbd2 diskless
slack2 reaches drbd3 diskless
slack3 reaches drbd1 diskless

BLOCK DEVICE SETUP (LVM2 THIN PROVISIONING)

we don't need CLVM nor lvmlockd/sanlock/dlm, since every volume group lives on a node-local disk and only that node activates it; DRBD replicates above the LVs, so no VG metadata is ever shared between nodes.

we need to enable discards (and there is no need for udev)

mv /etc/lvm/lvm.conf /etc/lvm/lvm.conf.dist
sed -r '/^[[:space:]]*#.*/d; /^$/d;' /etc/lvm/lvm.conf.dist > /etc/lvm/lvm.conf
vi /etc/lvm/lvm.conf

(both settings live in the devices section)

    obtain_device_list_from_udev = 0
    issue_discards = 1

all nodes

pvcreate /dev/vdb
pvcreate /dev/vdc

slack1

vgcreate thin1 /dev/vdb
vgcreate thin3 /dev/vdc
pvs

slack2

vgcreate thin2 /dev/vdb
vgcreate thin1 /dev/vdc
pvs

slack3

vgcreate thin3 /dev/vdb
vgcreate thin2 /dev/vdc
pvs

create a dedicated LV acting as a thin pool

slack1

    lvcreate --extents 100%FREE --thin thin1/pool
    lvcreate --extents 100%FREE --thin thin3/pool

slack2

    lvcreate -l 100%FREE --thin thin2/pool
    lvcreate -l 100%FREE --thin thin1/pool

slack3

    lvcreate -l 100%FREE --thin thin3/pool
    lvcreate -l 100%FREE --thin thin2/pool

you can now proceed with the usual LVs within the thin pool

e.g. on slack1 + slack2

    lvcreate --virtualsize 25G --thin -n slack thin1/pool
    ls -lF /dev/thin1/slack
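
likewise for the other two guests, following the mapping above (assuming the same 25G virtual size):

slack2 + slack3

    lvcreate --virtualsize 25G --thin -n ubuntu thin2/pool

slack3 + slack1

    lvcreate --virtualsize 25G --thin -n centos thin3/pool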

DRBD v9 SETUP

slack1

mv /etc/drbd.conf /etc/drbd.conf.dist
vi /etc/drbd.conf

global {
    usage-count yes;
    udev-always-use-vnr;
}

common {
    net {
        protocol C;
        # v9
        fencing resource-only;
        allow-two-primaries yes;
    }
    disk {
        read-balancing when-congested-remote;
    }
}

resource slack {
    device /dev/drbd1;
    meta-disk internal;
    on slack1 {
        # v9
        node-id   1;
        address   192.168.122.91:7701;
        disk      /dev/thin1/slack;
    }
    on slack2 {
        node-id   2;
        address   192.168.122.92:7701;
        disk      /dev/thin1/slack;
    }
    on slack3 {
        node-id   3;
        address   192.168.122.93:7701;
        disk none;
    }
    # v9
    connection-mesh {
        hosts slack1 slack2 slack3;
    }
}
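
the other two resources follow the same pattern; e.g. a sketch for the ubuntu guest, assuming /dev/drbd2 and port 7702:

resource ubuntu {
    device /dev/drbd2;
    meta-disk internal;
    on slack1 {
        node-id   1;
        address   192.168.122.91:7702;
        disk      none;
    }
    on slack2 {
        node-id   2;
        address   192.168.122.92:7702;
        disk      /dev/thin2/ubuntu;
    }
    on slack3 {
        node-id   3;
        address   192.168.122.93:7702;
        disk      /dev/thin2/ubuntu;
    }
    connection-mesh {
        hosts slack1 slack2 slack3;
    }
}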

scp /etc/hosts slack2:/etc/
scp /etc/hosts slack3:/etc/
scp /etc/drbd.conf slack2:/etc/
scp /etc/drbd.conf slack3:/etc/
scp /etc/rc.d/rc.local slack2:/etc/rc.d/
scp /etc/rc.d/rc.local slack3:/etc/rc.d/
scp /etc/rc.d/rc.local_shutdown slack2:/etc/rc.d/
scp /etc/rc.d/rc.local_shutdown slack3:/etc/rc.d/

INITIALIZE THE MIRROR

all nodes (including the diskless one)

initialize the DRBD meta-data and bring the replicated volume up

drbdadm create-md slack
drbdadm up slack
ls -lF /dev/drbd1

slack1

the state is currently Inconsistent on both sides of the mirror. This is why we need to force and mark a valid state somewhere to begin with.

drbdadm primary --force slack

and so forth on the other nodes hosting the primary of their respective DRBD device.

now check that the nodes are synchronizing

drbdadm status
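
optionally watch the progress until both peer disks show UpToDate:

    watch -n2 drbdadm status slack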

KEEP THE SPARSE

you will notice on the mirror nodes that you're losing the sparseness (100% data) once the mirror has synchronized, even though it's all zeroes except for the DRBD meta-data at the end.

slack2

lvs -o+discards thin1

you can fix that by wiping out the content while staying on the current primary (works both with proto C and A)

slack1

blkdiscard /dev/drbd/by-disk/thin1/slack

the sparseness is back in place (0.01% data)

slack2

lvs -o+discards thin1

READY TO GO

enable at startup

ls -lF /etc/rc.d/rc.drbd
chmod +x /etc/rc.d/rc.drbd

vi /etc/rc.d/rc.local

# self-verbose
/etc/rc.d/rc.drbd start

ACCEPTANCE

also make sure the resources get stopped cleanly at shutdown, then check that the replicated volumes are brought up at startup

echo '/etc/rc.d/rc.drbd stop' >> /etc/rc.d/rc.local_shutdown
ls -lF /etc/rc.d/rc.local /etc/rc.d/rc.local_shutdown
reboot

check what protocol the resource is living upon

v8

cat /proc/drbd

v9

drbdsetup show all --show-defaults | grep proto

v9 (as long as debug is enabled)

cat /sys/kernel/debug/drbd/resources/<resource>/connections/<connection>/<volume>/proc_drbd

and last but not least, check that you can access the DRBD diskless device from slack3.
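
e.g. a minimal read test (a sketch; the device can only be opened while primary, which dual-primary mode allows here):

    drbdadm primary slack
    dd if=/dev/drbd1 of=/dev/null bs=1M count=10
    drbdadm secondary slack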

OPERATIONS & MONITORING

see operations

TROUBLES

root@slack1:~# drbdadm primary --force slack
slack: State change failed: (-7) Refusing to be Primary while peer is not outdated
Command 'drbdsetup primary slack --force' terminated with exit code 11

==> the slack resource on slack3 was not up (even though it's diskless, it still needs to be up, otherwise the fencing policy refuses the promotion)

TODO

RESOURCES

LINBIT DRBD kernel module https://github.com/LINBIT/drbd

DRBD userspace utilities (for 9.0, 8.4, 8.3) https://github.com/LINBIT/drbd-utils

“read-balancing” with 8.4.1+ https://www.linbit.com/en/read-balancing/

v9

DRBD 9.0 Manual Pages https://docs.linbit.com/man/v9/

drbd.conf - DRBD Configuration Files https://docs.linbit.com/man/v9/drbd-conf-5/

v8.4

DRBD 8.4 https://github.com/LINBIT/drbd-8.4

drbd.conf - Configuration file for DRBD’s devices https://docs.linbit.com/man/v84/drbd-conf-5/

How to Install DRBD on CentOS Linux https://linuxhandbook.com/install-drbd-linux/

How to Setup DRBD 9 on Ubuntu 16 https://www.globo.tech/learning-center/setup-drbd-9-ubuntu-16/

more

LINSTOR SDS server https://github.com/LINBIT/linstor-server

ops

CLI management tool for DRBD. Like top, but for DRBD resources. https://github.com/LINBIT/drbdtop

troubles

[DRBD-user] drbd-dkms fails to build under proxmox 6 https://lists.linbit.com/pipermail/drbd-user/2019-August/025208.html

[DRBD-user] Problems compiling kernel module https://lists.linbit.com/pipermail/drbd-user/2016-June/022391.html

thin

5.4.4. Creating Thinly-Provisioned Logical Volumes https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/logical_volume_manager_administration/thinly_provisioned_volume_creation

How to setup thin Provisioned Logical Volumes in CentOS 7 / RHEL 7 https://www.linuxtechi.com/thin-provisioned-logical-volumes-centos-7-rhel-7/

Thin Provisioning in LVM2 https://www.theurbanpenguin.com/thin-provisioning-lvm2/

Setup Thin Provisioning Volumes in Logical Volume Management (LVM) – Part IV https://www.tecmint.com/setup-thin-provisioning-volumes-in-lvm/

vs. ceph rbd

https://linbit.com/blog/drbd-linstor-vs-ceph/

https://linbit.com/blog/drbd-vs-ceph/

https://linbit.com/blog/how-does-linstor-compare-to-ceph/

