tested 13/mimic on Ubuntu 16/xenial
tested 12/luminous on CentOS 7
In this guide we are using the following architecture:
ceph1 ceph2 ceph3 (ceph-deploy on ceph3)
The management system would preferably be outside the cluster, but we have used a cluster member, ceph3, for that purpose. It makes it easy to compare the settings from ~/ceph.conf versus /etc/ceph/ceph.conf, to validate that those have been populated correctly.
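Once the cluster config has been pushed (later in this guide), the two files can be compared on ceph3 with e.g. (just a suggestion),

diff -u ~/ceph.conf /etc/ceph/ceph.conf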
Check the name of the latest Ceph stable release or search for RCs. As of late June 2018 we have Mimic, and as of July 2017 we had Luminous.
Look very carefully at the available packages and choose your distribution. Ubuntu LTS and RHEL/CentOS are usually supported; Debian is not, although the repository is named as such (it only contains xenial and bionic packages in fact).
If you are using RHEL, you may need access to the updates repository (latest package versions), or possibly the extras repository, I am not sure. Anyway, I switched to CentOS when dealing with Luminous in July 2017, and it went smoothly.
Make sure you have e.g. 3 disks on each node. As this is a test setup, the Xen guests get 10G virtual disks in sparse RAW format,
for guest in ceph1 ceph2 ceph3; do
    cd $guest/
    for vdisk in osd1 osd2 osd3; do
        dd if=/dev/zero of=$guest.$vdisk bs=1G count=0 seek=10
    done; unset vdisk
    cd ../
done; unset guest

for guest in ceph1 ceph2 ceph3; do
    cat <<-EOF
'tap:tapdisk:aio:/data/guests/$guest/$guest.osd1,xvdb,w',
'tap:tapdisk:aio:/data/guests/$guest/$guest.osd2,xvdc,w',
'tap:tapdisk:aio:/data/guests/$guest/$guest.osd3,xvdd,w']
EOF
done; unset guest

vi ceph*/ceph{1,2,3}

for guest in ceph1 ceph2 ceph3; do xl create $guest/$guest; done; unset guest
Define static name resolution on each cluster and mon node,
uname -n
cat /etc/hostname

vi /etc/hosts

x.x.x.x ceph1
x.x.x.x ceph2
x.x.x.x ceph3
x.x.x.254 gw
Make sure the admin node is able to ssh to all the other nodes without a password, and register those as known hosts.
ssh ceph1 hostname
ssh ceph2 hostname
ssh ceph3 hostname
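In case password-less access is not set up yet, here is a minimal sketch, assuming the root account is used on every node,

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for node in ceph1 ceph2 ceph3; do ssh-copy-id root@$node; done; unset node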
Maybe sudo is required by ceph-deploy.
On Ubuntu,
dpkg -l sudo
On CentOS,
rpm -q sudo

systemctl stop firewalld
systemctl disable firewalld

getenforce
setenforce 0

#not /etc/sysconfig/selinux???
cat <<-EOF > /etc/selinux/config
SELINUX=permissive
SELINUXTYPE=targeted
EOF
In case those nodes are vSphere virtualized,
#ps auxw | grep vmtoolsd
Time setup (NTP being your time server's address),

nmap -sU -p 123 NTP

vi /etc/ntp.conf

server NTP

systemctl restart ntpd
systemctl enable ntpd
ntpq -p
date
hwclock --systohc
(optional) In case you are not using the root account,
#grep ^wheel /etc/group
#useradd -G wheel -m ceph
#echo "%wheel ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
#vi /etc/sudoers
#Defaults secure_path = /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
preferably on a management system, but we are using the ceph3 cluster node
Install ceph-deploy-2.0.1 or the latest one, NOT 2.0.0 (see the troubleshooting section if you want to know why),
apt install python-pip
pip install ceph-deploy

#pip search ceph
which ceph-deploy
ceph-deploy --version # 2.0.1
This can be done manually instead of ceph-deploy install <nodes>, which gives better control over this critical step.
The Ceph repository is not ready for Debian 9/stretch yet (June 2018), therefore we have to use Ubuntu instead, either xenial or bionic. We are going for xenial.
dist=xenial

apt -y install lsb-release ca-certificates apt-transport-https
lsb_release -sc

wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
echo deb https://eu.ceph.com/debian-mimic/ $dist main >> /etc/apt/sources.list
#https://download.ceph.com/debian-mimic/

apt update
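optionally double-check that the release key and the repository entry are in place before installing (just a sanity check, not part of the original procedure),

apt-key list | grep -i ceph
grep ceph /etc/apt/sources.list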
see what version you are going to install,
apt search ^ceph-base
apt show ceph-base

apt-cache policy ceph
==> if the candidate is v10/jewel (Ubuntu's stock version), something is wrong; it should be v13/mimic from the Ceph repository!
#apt install libaio1 libsnappy1v5 libcurl3 curl libgoogle-perftools4 libleveldb1v5
apt install ceph

ceph --version # 13.2.0
Manually replicate the repo on all the nodes,
vi /etc/yum.repos.d/ceph.repo

[Ceph]
name=Ceph packages for $basearch
baseurl=http://download.ceph.com/rpm-luminous/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-luminous/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://download.ceph.com/rpm-luminous/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc
and install ceph,
yum clean all
yum install ceph
from the mgmt node
Calculating PGs is a lot easier nowadays. But in case you want to have fun with calculating PGs, see the dedicated and obsolete section at the end of this document.
We have three nodes with three disks each, hence 9 OSDs, and a replication size of 2 (each object written twice), hence,
echo $((3 * 3 * 100 / 2))
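For reference, that is the usual rule of thumb, total PGs = (number of OSDs x 100) / replica count; a generic version of the calculation above, with illustrative variable names,

osds=9       # 3 nodes x 3 disks
replicas=2   # osd pool default size
echo $(( osds * 100 / replicas )) # 450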
Define a new cluster,
ceph-deploy new ceph1 ceph2 ceph3
ls -l ~/ceph.conf
Set up your public and cluster networks as well as the OSD pool config, and add those to the newly created configuration,
vi ceph.conf

public network = x.x.x.0/24
cluster network = x.x.x.0/24
osd pool default size = 2 # Write an object 2 times
osd pool default min size = 1 # Allow writing 1 copy in a degraded state
osd crush chooseleaf type = 1
osd pool default pg num = 450
osd pool default pgp num = 450
and populate the cluster config into /etc/ceph/ceph.conf,
ceph-deploy config push ceph1 ceph2 ceph3
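To confirm the push landed identically on every node, a possible check from the mgmt node (assuming the ssh access set up earlier),

md5sum ~/ceph.conf
for node in ceph1 ceph2 ceph3; do ssh $node md5sum /etc/ceph/ceph.conf; done; unset node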
from the mgmt node
Deploy monitors,
grep ^mon_initial_members ~/ceph.conf
ceph-deploy mon create-initial

#ceph-deploy gatherkeys ceph1 ceph2 ceph3
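The keyrings gathered by create-initial should now sit in the working directory (here ~); a quick look, file names assumed from a standard ceph-deploy run,

ls -l ~/ceph.mon.keyring ~/ceph.client.admin.keyring ~/ceph.bootstrap-*.keyring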
Deploy the keyring,
ceph-deploy admin -h
ceph-deploy admin ceph1 ceph2 ceph3

ls -l /etc/ceph/ceph.client.admin.keyring
check with e.g.,
ceph osd tree
from the mgmt node
ceph-deploy mgr create ceph1 ceph2 ceph3
ceph -s | grep mgr:
from the mgmt node
ceph-deploy disk list ceph1 ceph2 ceph3
CAUTION: disk zap wipes everything out,
#ceph-deploy disk zap ceph1 /dev/xvdb /dev/xvdc /dev/xvdd
#ceph-deploy disk zap ceph2 /dev/xvdb /dev/xvdc /dev/xvdd
#ceph-deploy disk zap ceph3 /dev/xvdb /dev/xvdc /dev/xvdd

ceph-deploy osd create -h
ceph-deploy osd create --data /dev/xvdb ceph1
ceph-deploy osd create --data /dev/xvdc ceph1
ceph-deploy osd create --data /dev/xvdd ceph1
ceph-deploy osd create --data /dev/xvdb ceph2
ceph-deploy osd create --data /dev/xvdc ceph2
ceph-deploy osd create --data /dev/xvdd ceph2
ceph-deploy osd create --data /dev/xvdb ceph3
ceph-deploy osd create --data /dev/xvdc ceph3
ceph-deploy osd create --data /dev/xvdd ceph3
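The nine create commands above could equally be scripted; a loop sketch assuming the same three data disks on every node,

for node in ceph1 ceph2 ceph3; do
    for disk in xvdb xvdc xvdd; do
        ceph-deploy osd create --data /dev/$disk $node
    done; unset disk
done; unset node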
older syntax,
#ceph-deploy disk zap ceph1:sdc ceph1:sdd ceph1:sde
#ceph-deploy disk zap ceph2:sdc ceph2:sdd ceph2:sde
#ceph-deploy disk zap ceph3:sdc ceph3:sdd ceph3:sde
#ceph-deploy osd create ceph1:sdc:/dev/sdb1 ceph1:sdd:/dev/sdb2 ceph1:sde:/dev/sdb3
#ceph-deploy osd create ceph2:sdc:/dev/sdb1 ceph2:sdd:/dev/sdb2 ceph2:sde:/dev/sdb3
#ceph-deploy osd create ceph3:sdc:/dev/sdb1 ceph3:sdd:/dev/sdb2 ceph3:sde:/dev/sdb3
check,
ceph osd tree
from any cluster node
Create a new pool and check,
ceph osd pool create test-pool 128

ceph osd lspools
ceph osd pool get-quota test-pool
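to see the pg_num and replication size the pool actually got, e.g. (plain ceph commands, not from the original guide),

ceph osd pool get test-pool pg_num
ceph osd pool get test-pool size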
understand why it allocated more than 128 placement groups,
vi poolpg

(copy/paste from http://cephnotes.ksperis.com/blog/2015/02/23/get-the-number-of-placement-groups-per-osd/)

chmod +x poolpg
./poolpg
See Operating Ceph
This guide was originally based on Luca Dell'Oca's, which has a few issues. For example, it prepares the OSDs and filesystems on partitions in part 4, while later on ceph-deploy disk zap erases everything and the full disk is eventually used as the OSD.
Have fun with the logic. I don't get it. But it seems 30 PGs per OSD is the maximum, while the values for pg num and pgp num should be a power of two. Also, increasing the number of PGs makes your cluster easier to scale out in terms of OSDs. So put simply, I compute 30 x <number of your OSDs> to find the maximum and take the power of two equal to or below it (see the small sketch after the table below).
2^0   1
2^1   2
2^2   4
2^3   8
2^4   16
2^5   32
2^6   64
2^7   128
2^8   256
2^9   512
2^10  1024
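As a small illustration of that rule, a sketch that computes 30 x OSDs and falls back to the power of two at or below it (variable names are illustrative),

osds=9
max=$(( osds * 30 )) # 270
pg=1
while [ $(( pg * 2 )) -le $max ]; do pg=$(( pg * 2 )); done
echo $pg # 256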
Refs (I’m tired).
If you get this error while setting up monitors,
ImportError: libceph-common.so.0: cannot map zero-fill pages
==> increase the memory of your nodes; 256M was obviously not enough here. Switching to 1024M did the trick.
If you get this error while searching for OSDs,
TypeError: 'Logger' object is not callable
==> seems to be a bug, which can be fixed,
grep distro.conn.logger /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py
cp /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py.dist

--- /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py.dist	2018-06-27 15:30:08.003540181 +0000
+++ /usr/lib/python2.7/dist-packages/ceph_deploy/osd.py	2018-06-27 15:31:38.239540181 +0000
@@ -373,7 +373,7 @@
     )
     for line in out:
         if line.startswith('Disk /'):
-            distro.conn.logger(line)
+            LOG.info(line.decode('utf-8'))


 def osd_list(args, cfg):
but the simplest way is to upgrade to ceph-deploy 2.0.1 using pip.
If you get this error while trying to zap a disk,
AttributeError: 'Namespace' object has no attribute 'debug'
==> upgrade from ceph-deploy 2.0.0 to 2.0.1: just get rid of the package and use pip instead.
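In practice that boils down to something like (assuming the package was installed through apt on Ubuntu),

apt purge ceph-deploy
pip install ceph-deploy
ceph-deploy --version # 2.0.1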