‘noout’ flag in Ceph

You may have seen the ‘noout’ flag set in the output of ‘ceph -s’. What does this actually mean?

This is a cluster-wide flag. When it is set, an OSD that goes down is not marked out of the cluster, and the rebalancing that would normally start to maintain the replica count does not kick in. By default, the monitors mark an OSD out of the acting set if it has been unreachable for 300 seconds, i.e., 5 minutes.

To find the value configured in your cluster, use:

# ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep mon_osd_report_timeout

When an OSD is marked out, another OSD takes its place, and data replication to that OSD starts according to the replica count of each pool.

If this flag (noout) is set, the monitors will not mark OSDs out of the acting set. The affected PGs will report a degraded state while an OSD is down, but the OSD will remain in the acting set.

This can be helpful when we want to take an OSD down for maintenance, but don’t want its data objects to be replicated over to another OSD in the meantime.

To set the ‘noout’ flag, use:

# ceph osd set noout

Once the planned maintenance is finished, you can unset the flag using:

# ceph osd unset noout
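Whether the flag is active also shows up in the output of ‘ceph osd dump’ and ‘ceph -s’. Here is a minimal sketch of checking for it from a script, run against a canned flags line; on a live cluster you would capture the real one, e.g. with ceph osd dump | grep '^flags':

```shell
# Canned 'flags' line as it appears in 'ceph osd dump' output
# (sample only; on a live cluster: flags_line=$(ceph osd dump | grep '^flags'))
flags_line='flags noout,sortbitwise'

# Check whether 'noout' appears among the cluster flags:
case "$flags_line" in
  *noout*) echo "noout is set" ;;
  *)       echo "noout is not set" ;;
esac
```

This is handy in maintenance scripts that should refuse to start a reboot unless the flag is confirmed to be in place.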

How to dynamically change a configuration value in a Ceph cluster?

It is possible to change a particular configuration setting in a Ceph cluster dynamically, and I think it is a very neat and useful feature.

Imagine the case where you want to change the replica count of a particular pool from 3 to 4. How would you change this without restarting the Ceph cluster itself? That is where the ‘ceph tell’ command comes in.

As we saw in the previous post, you can get the list of configuration settings using the administrator socket, from either a monitor or an OSD node.

To change a configuration use:

# ceph tell mon.* injectargs '--{tunable value_to_be_set}'

For example, the timeout after which an unreachable OSD is marked down and out can be changed with:

# ceph tell mon.* injectargs '--mon_osd_report_timeout 400'

By default, it is 300 seconds, i.e., 5 minutes.

How to fetch the entire list of tunables along with the values for a Ceph cluster node?

In many cases we would like to get the active configurations from a Ceph node, either a monitor or an OSD node. A neat feature, I must say, is to probe the administrative socket file to get a listing of all the active configurations, be it on the OSD node or the monitor node.

This comes in handy when we have changed a setting and want to confirm that it has indeed taken effect.

The admin socket file exists for both the monitors and the OSD nodes. The monitor node will have a single admin socket file, while the OSD nodes will have an admin socket for each of the OSDs present on the node.

  • Listing of the admin socket on a monitor node

 # ls /var/run/ceph/ -l
 total 4
 srwxr-xr-x. 1 root root 0 May 13 05:13 ceph-mon.hp-m300-2.asok
 -rw-r--r--. 1 root root 7 May 13 05:13 mon.hp-m300-2.pid

  • Listing of the admin sockets on an OSD node

 # ls -l /var/run/ceph/
 total 20
 srwxr-xr-x. 1 root root 0 May  8 02:42 ceph-osd.0.asok
 srwxr-xr-x. 1 root root 0 May 26 11:18 ceph-osd.2.asok
 srwxr-xr-x. 1 root root 0 May 26 11:18 ceph-osd.3.asok
 srwxr-xr-x. 1 root root 0 May  8 02:42 ceph-osd.4.asok
 srwxr-xr-x. 1 root root 0 May 26 11:18 ceph-osd.5.asok
 -rw-r--r--. 1 root root 8 May  8 02:42 osd.0.pid
 -rw-r--r--. 1 root root 8 May 26 11:18 osd.2.pid
 -rw-r--r--. 1 root root 8 May 26 11:18 osd.3.pid
 -rw-r--r--. 1 root root 8 May  8 02:42 osd.4.pid
 -rw-r--r--. 1 root root 8 May 26 11:18 osd.5.pid

For example, consider that we have changed the ‘mon_osd_full_ratio’ value, and need to confirm that the cluster has picked up the change.

We can get a listing of the active configured settings and grep out the setting we are interested in.

# ceph daemon /var/run/ceph/ceph-mon.*.asok config show

The above command prints out a listing of all the active configurations and their current values. We can easily grep out ‘mon_osd_full_ratio’ from this list.

# ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep mon_osd_full_ratio

On my test cluster, this printed out ‘0.75’. With this setting, the cluster treats an OSD as full once it has reached 75% of its capacity.
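As a sketch of what that confirmation could look like in a script, here is the value being extracted from a canned fragment of ‘config show’ output; the exact output format differs between Ceph versions, so treat the parsing as illustrative:

```shell
# Canned fragment of 'config show' output (format varies by version;
# on a live cluster, pipe the real 'ceph daemon ... config show' instead):
config='mon_osd_full_ratio = 0.75
mon_osd_nearfull_ratio = 0.85'

# Extract just the value for the key we care about:
full_ratio=$(echo "$config" | awk '$1 == "mon_osd_full_ratio" {print $3}')
echo "$full_ratio"
```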

This can be checked by probing the OSD admin socket as well.

NOTE: In case you are probing a particular OSD, please make sure to use the OSD admin socket on the node hosting that OSD. To locate the node an OSD is on, use:

# ceph osd tree

Example: probing the OSD admin socket on its node for ‘mon_osd_full_ratio’, as we did on the monitor, should return the same value.

# ceph daemon /var/run/ceph/ceph-osd.5.asok config show | grep mon_osd_full_ratio

NOTE: Another command exists which should print the same configuration settings, but only for OSDs.

# ceph daemon osd.5 config show

One drawback worth mentioning: this must be executed on the node on which the OSD is present. To find the OSD-to-node mapping, use ‘ceph osd tree’.
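The OSD-to-node mapping can also be pulled out programmatically. A small sketch, run here on a canned ‘ceph osd tree’ body (pipe the real command on a live cluster); the awk logic simply remembers the last ‘host’ bucket seen:

```shell
# Canned 'ceph osd tree' body (columns: id, weight, type, name, ...):
tree='-1 0.08997 root default
-2 0.01999 host hp-m300-5
0 0.009995 osd.0 up 1
4 0.009995 osd.4 up 1
-4 0.05998 host hp-m300-4
2 0.04999 osd.2 up 1'

# Remember the last 'host' bucket seen; print it next to each osd entry.
echo "$tree" | awk '$3 == "host" {h=$4} $3 ~ /^osd\./ {print $3, "->", h}'
```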

How to change the filling ratio for a Ceph OSD?

There could be many scenarios where you’d need to change the percentage of space usage on a Ceph OSD. One such use case would be when your OSD space is about to hit the hard limit, and is constantly sending you warnings.

For one reason or another, you may need to extend the threshold limit for some time. In such a case, you don’t need to change/add the configuration in ceph.conf and push it across. Rather, you can do it while the cluster is online, from the command line.

‘ceph tell’ is a very useful command, in the sense that the administrator doesn’t need to stop/start the OSDs, MONs, etc. after a configuration change. In our case, we are looking to set ‘mon_osd_full_ratio’ to 98%. We can do that with:

# ceph tell mon.* injectargs "--mon_osd_full_ratio .98"
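Since a typo here (say, 9.8 instead of .98) could be harmful, a small pre-flight check before injecting doesn’t hurt. This helper is my own addition, not part of Ceph; the actual injectargs line is commented out since it needs a live cluster:

```shell
# Hypothetical pre-flight check: only inject the value if it is a sane
# ratio in (0, 1]. Not a Ceph feature, just defensive scripting.
ratio=.98
if awk -v r="$ratio" 'BEGIN { exit !(r > 0 && r <= 1) }'; then
  echo "ok: $ratio"
  # ceph tell mon.* injectargs "--mon_osd_full_ratio $ratio"
else
  echo "refusing to inject $ratio" >&2
fi
```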

In an earlier post (https://goo.gl/xjXOoI) we saw how to get all the configurable options from a monitor. If I understand correctly, almost all configuration values can be changed online by injecting them with ‘ceph tell’.

How to remove a host from a Ceph cluster?

I’m still studying Ceph, and recently faced a scenario in which one of my Ceph nodes went down due to hardware failure. Even though my data was safe due to the replication factor, I was not able to remove the node from the cluster.

I could remove the OSDs on the node, but I didn’t find a way to stop the node from being listed in ‘ceph osd tree’. I ended up editing the CRUSH map by hand to remove the host, and uploaded it back. This worked as expected. The following are the steps I took.

a) This was the state just after the node went down:

# ceph osd tree

# id    weight     type    name              up/down    reweight
 -1     0.08997    root    default
 -2     0.01999        host hp-m300-5
 0      0.009995           osd.0             up         1
 4      0.009995           osd.4             up         1
 -3     0.009995       host hp-m300-9
 1      0.009995           osd.1             down       0
 -4     0.05998        host hp-m300-4
 2      0.04999            osd.2             up         1
 3      0.009995           osd.3             up         1

# ceph -w

    cluster 62a6a880-fb65-490c-bc98-d689b4d1a3cb
     health HEALTH_WARN 64 pgs degraded; 64 pgs stuck unclean; recovery 261/785 objects degraded (33.248%)
     monmap e1: 1 mons at {hp-m300-4=}, election epoch 1, quorum 0 hp-m300-4
     osdmap e130: 5 osds: 4 up, 4 in
     pgmap v8465: 196 pgs, 4 pools, 1001 MB data, 262 objects
         7672 MB used, 74192 MB / 81865 MB avail
         261/785 objects degraded (33.248%)
         64 active+degraded
         132 active+clean
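When watching recovery like this, the number I keep an eye on is the degraded-object percentage. Here is a quick, illustrative way to pull it out of a captured status line (the sample below is canned; on a live cluster, pipe ‘ceph -s’):

```shell
# Canned health line as printed by 'ceph -w' / 'ceph -s':
health='health HEALTH_WARN 64 pgs degraded; recovery 261/785 objects degraded (33.248%)'

# Grab the parenthesised percentage and strip the parentheses:
pct=$(echo "$health" | grep -o '([0-9.]*%)' | tr -d '()')
echo "$pct"
```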

I started by marking the OSD on the node out and removing it. Note that I don’t need to stop the OSD (osd.1), since the node carrying it is down and not accessible.

b) If the node were accessible, you would first have to stop the OSD using:

 # sudo service ceph stop osd.1

c) Mark the OSD out. This is not strictly needed in this case, since the node is already down and osd.1 has been marked out automatically.

 # ceph osd out osd.1

d) Remove the OSD from the CRUSH map, so that it no longer receives any data. Alternatively, you can fetch the CRUSH map, de-compile it, remove the OSD, re-compile it, and upload it back.

Remove item id 1 with the name ‘osd.1’ from the CRUSH map.

 # ceph osd crush remove osd.1

e) Remove the OSD authentication key

 # ceph auth del osd.1

f) At this stage, I had to remove the OSD host from the listing, but was not able to find a way to do so. ‘ceph-deploy’ didn’t offer any tools for this other than ‘purge’ and ‘uninstall’. Since the node was not accessible, these wouldn’t work anyway. A ‘ceph-deploy purge’ failed with the following errors, which is expected since the node is not reachable.

 # ceph-deploy purge hp-m300-9

 [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
 [ceph_deploy.cli][INFO  ] Invoked (1.5.22-rc1): /usr/bin/ceph-deploy purge hp-m300-9
 [ceph_deploy.install][INFO  ] note that some dependencies *will not* be removed because they can cause issues with qemu-kvm
 [ceph_deploy.install][INFO  ] like: librbd1 and librados2
 [ceph_deploy.install][DEBUG ] Purging from cluster ceph hosts hp-m300-9
 [ceph_deploy.install][DEBUG ] Detecting platform for host hp-m300-9 ...
 ssh: connect to host hp-m300-9 port 22: No route to host
 [ceph_deploy][ERROR ] RuntimeError: connecting to host: hp-m300-9 resulted in errors: HostNotFound hp-m300-9

I ended up fetching the CRUSH map, removing the OSD host from it, and uploading it back.

g) Get the CRUSH map

 # ceph osd getcrushmap -o /tmp/crushmap

h) De-compile the CRUSH map

 # crushtool -d /tmp/crushmap -o crush_map

i) I had to remove the entries pertaining to the host-to-be-removed from the following sections:

  • devices
  • types
  • the ‘root’ default section
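The hand-edit in step i) can also be scripted. Below is a sketch that deletes a host bucket and any item lines referencing it with sed, run here on a canned fragment of a decompiled map; on a real map, always verify the result still compiles with ‘crushtool -c’ before uploading:

```shell
# Canned fragment of a decompiled CRUSH map (illustrative weights/ids):
map='host hp-m300-5 {
        id -2
        item osd.0 weight 0.010
}
host hp-m300-9 {
        id -3
        item osd.1 weight 0.010
}
root default {
        item hp-m300-5 weight 0.020
        item hp-m300-9 weight 0.010
}'

# Delete the hp-m300-9 bucket block, and any item lines referencing it:
echo "$map" | sed '/^host hp-m300-9 {/,/^}/d; /item hp-m300-9/d'
```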

j) Once the entries were removed, I went ahead and compiled the map, and inserted it back.

 # crushtool -c crush_map -o /tmp/crushmap
 # ceph osd setcrushmap -i /tmp/crushmap

k) A ‘ceph osd tree’ looks much cleaner now 🙂

 # ceph osd tree

# id    weight     type    name              up/down    reweight
 -1     0.07999    root    default
 -2     0.01999        host hp-m300-5
 0      0.009995           osd.0             down       0
 4      0.009995           osd.4             down       0
 -4     0.06           host hp-m300-4
 2      0.04999            osd.2             up         1
 3      0.009995           osd.3             up         1

There may be a more direct method to remove the OSD host from the listing, but I’m not aware of one, based on my limited knowledge. Perhaps I’ll come across something as I progress with Ceph. Comments welcome.

How to list all the configuration settings in a Ceph cluster monitor?

It can be really helpful to have a single command to list all the configuration settings in a monitor node, in a Ceph cluster.

This is possible by interacting directly with the monitor’s unix socket file, which can be found under /var/run/ceph/. By default, the admin socket for the monitor will be at /var/run/ceph/ceph-mon.<short-hostname>.asok.

The location can vary in case you defined a different one at installation time. To know the actual socket path, use the following command:

# ceph-conf --name mon.$(hostname -s) --show-config-value admin_socket

This should print the location of the admin socket. In most cases, it should be something like /var/run/ceph/ceph-mon.$(hostname -s).asok
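The stock naming scheme means the path can also be constructed directly from the short hostname; here is a trivial sketch (the ‘ceph-conf’ command above remains the authoritative check, since the path is configurable):

```shell
# Build the default monitor admin-socket path from the short hostname.
# This mirrors the stock naming scheme; ceph-conf remains authoritative.
host=$(hostname -s)
sock="/var/run/ceph/ceph-mon.${host}.asok"
echo "$sock"
```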

Once you have the monitor admin socket, use that location to show the various configuration settings with:

# ceph daemon /var/run/ceph/ceph-mon.*.asok config show

The output is long and won’t fit on a single screen. You can either pipe it to ‘less’ or grep for a specific value in case you know what you are looking for.

For example, if I need to look at the ratio at which the OSD would be considered full, I’ll be using:

#  ceph daemon /var/run/ceph/ceph-mon.*.asok config show | grep mon_osd_full_ratio