tested 13/mimic on Ubuntu 16/xenial
tested 12/luminous on CentOS 7
In this guide we are using the following architecture:
ceph1 ceph2 ceph3 (ceph-deploy on ceph3)
The management system would preferably be outside the cluster, but we have used a cluster member, ceph3, for that purpose. It makes it easy to compare the settings from ~/ceph.conf versus /etc/ceph/ceph.conf, to validate that those have been populated correctly.
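Once the cluster config has been pushed (later in this guide), the two files can be compared on ceph3 with e.g. (just a suggestion),

diff -u ~/ceph.conf /etc/ceph/ceph.conf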
Check the name of the latest Ceph stable release or search for RCs. As of late June 2018 we have Mimic, and as of July 2017 we had Luminous.
Look very carefully at the available packages and choose your distribution. Ubuntu LTS and RHEL/CentOS are usually supported; Debian is not, although the repository is named as such (it only contains xenial and bionic packages in fact).
If you are using RHEL, you may need access to the updates repository (latest package versions), or possibly the extras repository, I am not sure. Anyway, I switched to CentOS when dealing with Luminous in July 2017, and it went smoothly.
Make sure you have e.g. 3 disks on each node. As this is a test setup, the Xen guests get 10G virtual disks in sparse RAW format,
for guest in ceph1 ceph2 ceph3; do
    cd $guest/
    for vdisk in osd1 osd2 osd3; do
        dd if=/dev/zero of=$guest.$vdisk bs=1G count=0 seek=10
    done; unset vdisk
    cd ../
done; unset guest

for guest in ceph1 ceph2 ceph3; do
    cat <<-EOF
'tap:tapdisk:aio:/data/guests/$guest/$guest.osd1,xvdb,w',
'tap:tapdisk:aio:/data/guests/$guest/$guest.osd2,xvdc,w',
'tap:tapdisk:aio:/data/guests/$guest/$guest.osd3,xvdd,w']
EOF
done; unset guest

vi ceph*/ceph{1,2,3}

for guest in ceph1 ceph2 ceph3; do xl create $guest/$guest; done; unset guest
Define static name resolution on each cluster and mon node,
uname -n
cat /etc/hostname

vi /etc/hosts

x.x.x.x ceph1
x.x.x.x ceph2
x.x.x.x ceph3
x.x.x.254 gw
Make sure the admin node is able to ssh to all the other nodes without a password, and register those as known hosts.
ssh ceph1 hostname
ssh ceph2 hostname
ssh ceph3 hostname
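In case password-less access is not set up yet, here is a minimal sketch, assuming the root account is used on every node,

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for node in ceph1 ceph2 ceph3; do ssh-copy-id root@$node; done; unset node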
Maybe sudo is required by ceph-deploy.
On Ubuntu,
dpkg -l sudo
On CentOS,
rpm -q sudo

systemctl stop firewalld
systemctl disable firewalld

getenforce
setenforce 0

#not /etc/sysconfig/selinux???
cat <<-EOF > /etc/selinux/config
SELINUX=permissive
SELINUXTYPE=targeted
EOF
In case those nodes are vSphere virtualized,
#ps auxw | grep vmtoolsd
Time setup (NTP being your time server's address),

nmap -sU -p 123 NTP

vi /etc/ntp.conf

server NTP

systemctl restart ntpd
systemctl enable ntpd
ntpq -p
date
hwclock --systohc
(optional) In case you are not using the root account,
#grep ^wheel /etc/group
#useradd -G wheel -m ceph
#echo "%wheel ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
#vi /etc/sudoers
#Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
preferably on a management system, but we are using the ceph3 cluster node
Install ceph-deploy-2.0.1 or the latest one, NOT 2.0.0 (see the troubleshooting section if you want to know why),
apt install python-pip
pip install ceph-deploy

#pip search ceph
which ceph-deploy
ceph-deploy --version # 2.0.1
This can be done manually instead of ceph-deploy install <nodes>, which gives better control over this critical step.
The Ceph repository is not ready for Debian 9/stretch yet (June 2018), therefore we have to use Ubuntu instead, either xenial or bionic. We are going for xenial.
dist=xenial

apt -y install lsb-release ca-certificates apt-transport-https
lsb_release -sc

wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
echo deb https://eu.ceph.com/debian-mimic/ $dist main >> /etc/apt/sources.list
#https://download.ceph.com/debian-mimic/

apt update
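optionally double-check that the release key and the repository entry are in place before installing (just a sanity check, not part of the original procedure),

apt-key list | grep -i ceph
grep ceph /etc/apt/sources.list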
see what version you are going to install,
apt search ^ceph-base
apt show ceph-base

apt-cache policy ceph
==> if the candidate is v10/jewel (Ubuntu's stock version), something is wrong; it should be v13/mimic from the Ceph repository!
#apt install libaio1 libsnappy1v5 libcurl3 curl libgoogle-perftools4 libleveldb1v5
apt install ceph

ceph --version # 13.2.0
Manually replicate the repo on all the nodes,
vi /etc/yum.repos.d/ceph.repo

[Ceph]
name=Ceph packages for $basearch
baseurl=http://download.ceph.com/rpm-luminous/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://download.ceph.com/rpm-luminous/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
and install ceph,
yum clean all
yum install ceph
from the mgmt node
Calculating PGs is a lot easier nowadays. But in case you want to have fun with calculating PGs, see the dedicated and obsolete section at the end of this document.
We have three nodes with three disks each, hence 9 OSDs, and a replication size of 2 (each object written twice), hence,
echo $((3 * 3 * 100 / 2))
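For reference, that is the usual rule of thumb, total PGs = (number of OSDs x 100) / replica count; a generic version of the calculation above, with illustrative variable names,

osds=9       # 3 nodes x 3 disks
replicas=2   # osd pool default size
echo $(( osds * 100 / replicas )) # 450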
Define a new cluster,
ceph-deploy new ceph1 ceph2 ceph3
ls -l ~/ceph.conf
Set up your public and cluster networks as well as the OSD pool config, and add those to the newly created configuration,
vi ceph.conf

public network = x.x.x.0/24
cluster network = x.x.x.0/24
osd pool default size = 2 # Write an object 2 times
osd pool default min size = 1 # Allow writing 1 copy in a degraded state
osd crush chooseleaf type = 1
osd pool default pg num = 450
osd pool default pgp num = 450
and populate the cluster config into /etc/ceph/ceph.conf,
ceph-deploy config push ceph1 ceph2 ceph3
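To confirm the push landed identically on every node, a possible check from the mgmt node (assuming the ssh access set up earlier),

md5sum ~/ceph.conf
for node in ceph1 ceph2 ceph3; do ssh $node md5sum /etc/ceph/ceph.conf; done; unset node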
from the mgmt node
Deploy monitors,
grep ^mon_initial_members ~/ceph.conf
ceph-deploy mon create-initial

#ceph-deploy gatherkeys ceph1 ceph2 ceph3
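The keyrings gathered by create-initial should now sit in the working directory (here ~); a quick look, file names assumed from a standard ceph-deploy run,

ls -l ~/ceph.mon.keyring ~/ceph.client.admin.keyring ~/ceph.bootstrap-*.keyring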
Deploy the keyring,
ceph-deploy admin -h
ceph-deploy admin ceph1 ceph2 ceph3

ls -l /etc/ceph/ceph.client.admin.keyring
check with e.g.,
ceph osd tree
from the mgmt node
ceph-deploy mgr create ceph1 ceph2 ceph3
ceph -s | grep mgr:
from the mgmt node
ceph-deploy disk list ceph1 ceph2 ceph3
CAUTION: disk zap wipes everything out,
#ceph-deploy disk zap ceph1 /dev/xvdb /dev/xvdc /dev/xvdd
#ceph-deploy disk zap ceph2 /dev/xvdb /dev/xvdc /dev/xvdd
#ceph-deploy disk zap ceph3 /dev/xvdb /dev/xvdc /dev/xvdd

ceph-deploy osd create -h
ceph-deploy osd create --data /dev/xvdb ceph1
ceph-deploy osd create --data /dev/xvdc ceph1
ceph-deploy osd create --data /dev/xvdd ceph1
ceph-deploy osd create --data /dev/xvdb ceph2
ceph-deploy osd create --data /dev/xvdc ceph2
ceph-deploy osd create --data /dev/xvdd ceph2
ceph-deploy osd create --data /dev/xvdb ceph3
ceph-deploy osd create --data /dev/xvdc ceph3
ceph-deploy osd create --data /dev/xvdd ceph3
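The nine create commands above could equally be scripted; a loop sketch assuming the same three data disks on every node,

for node in ceph1 ceph2 ceph3; do
    for disk in xvdb xvdc xvdd; do
        ceph-deploy osd create --data /dev/$disk $node
    done; unset disk
done; unset node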
older syntax,
#ceph-deploy disk zap ceph1:sdc ceph1:sdd ceph1:sde
#ceph-deploy disk zap ceph2:sdc ceph2:sdd ceph2:sde
#ceph-deploy disk zap ceph3:sdc ceph3:sdd ceph3:sde
#ceph-deploy osd create ceph1:sdc:/dev/sdb1 ceph1:sdd:/dev/sdb2 ceph1:sde:/dev/sdb3
#ceph-deploy osd create ceph2:sdc:/dev/sdb1 ceph2:sdd:/dev/sdb2 ceph2:sde:/dev/sdb3
#ceph-deploy osd create ceph3:sdc:/dev/sdb1 ceph3:sdd:/dev/sdb2 ceph3:sde:/dev/sdb3
check,
ceph osd tree
from any cluster node
Create a new pool and check,
ceph osd pool create test-pool 128

ceph osd lspools
ceph osd pool get-quota test-pool
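to see the pg_num and replication size the pool actually got, e.g. (plain ceph commands, not from the original guide),

ceph osd pool get test-pool pg_num
ceph osd pool get test-pool size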
understand why it allocated more than 128 placement groups,
vi poolpg

(copy/paste from http://cephnotes.ksperis.com/blog/2015/02/23/get-the-number-of-placement-groups-per-osd/)

chmod +x poolpg
./poolpg
See Operating Ceph
This guide was originally based on Luca Dell'Oca's, which has a few issues. For example, it prepares the OSDs and filesystems on partitions in part 4, while later on ceph-deploy disk zap erases everything and the full disk is eventually used as the OSD.
Have fun with the logic. I don't get it. But it seems 30 PGs per OSD is the maximum, while the values for pg num and pgp num should be a power of two. Also, increasing the number of PGs makes your cluster easier to scale out in terms of OSDs. So put simply, I compute 30 x <number of your OSDs> to find the maximum and take the power of two equal to or below it (see the small sketch after the table below).
2^0   1
2^1   2
2^2   4
2^3   8
2^4   16
2^5   32
2^6   64
2^7   128
2^8   256
2^9   512
2^10  1024
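As a small illustration of that rule, a sketch that computes 30 x OSDs and falls back to the power of two at or below it (variable names are illustrative),

osds=9
max=$(( osds * 30 )) # 270
pg=1
while [ $(( pg * 2 )) -le $max ]; do pg=$(( pg * 2 )); done
echo $pg # 256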
Refs (I’m tired).
If you get this error while setting up monitors,
ImportError: libceph-common.so.0: cannot map zero-fill pages
==> increase the memory of your nodes; 256M was obviously not enough here. Switching to 1024M did the trick.
If you get this error while searching for OSDs,
TypeError: 'Logger' object is not callable
==> seems to be a bug, which can be fixed,
grep distro.conn.logger /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py
cp /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py.dist

--- /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py.dist	2018-06-27 15:30:08.003540181 +0000
+++ /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py	2018-06-27 15:31:38.239540181 +0000
@@ -373,7 +373,7 @@
     )
     for line in out:
         if line.startswith('Disk /'):
-            distro.conn.logger(line)
+            LOG.info(line.decode('utf-8'))


 def osd_list(args, cfg):
but the simplest way is to upgrade to ceph-deploy 2.0.1 using pip.
If you get this error while trying to zap a disk,
AttributeError: 'Namespace' object has no attribute 'debug'
==> upgrade from ceph-deploy 2.0.0 to 2.0.1: just get rid of the package and use pip instead.
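In practice that boils down to something like (assuming the package was installed through apt on Ubuntu),

apt purge ceph-deploy
pip install ceph-deploy
ceph-deploy --version # 2.0.1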