Exception during removal of OSD blacklist entries

Bug #1909399 reported by Thorbjørn Weidemann
This bug affects 1 person
Affects: ceph-iscsi (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Ubuntu 20.04, official Ubuntu packages, ceph 15.2.7-0ubuntu0.20.04.1, ceph-iscsi 3.4-0ubuntu2
See stacktrace below.
Looking at the code, I'm guessing this is related to https://stackoverflow.com/questions/33054527/typeerror-a-bytes-like-object-is-required-not-str-when-writing-to-a-file-in
i.e. binary vs. text.
I'm not a Python programmer, but from the Stack Overflow article it would seem that the output of subprocess.check_output is treated as binary (bytes) in Python 3, and a backwards-compatible fix would be to call .encode() on the string constants "un-blacklisting" and "isn't blacklisted".
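
The mismatch described above can be reproduced in isolation. This is a minimal sketch, not the ceph-iscsi code itself: the bytes value stands in for what subprocess.check_output returns in Python 3, and the exact wording of that message is an assumption made for illustration.

```python
# Stand-in for the bytes returned by subprocess.check_output() in Python 3.
# The message text is hypothetical; only its type (bytes) matters here.
result = b"un-blacklisting 192.168.1.61:6819/881\n"

error = None
try:
    # Searching for a str inside bytes is what line 46 of gateway.py does.
    "un-blacklisting" in result
except TypeError as exc:
    error = str(exc)

# Encoding the constant makes both operands bytes, so the test works.
found = "un-blacklisting".encode() in result
```

Here error ends up holding the same message as the last line of the stacktrace, and found is True once the constant is encoded.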

Dec 26 09:19:09 ff-ceph-2 rbd-target-api[8886]: Started the configuration object watcher
Dec 26 09:19:09 ff-ceph-2 rbd-target-api[8886]: Processing osd blacklist entries for this node
Dec 26 09:19:09 ff-ceph-2 rbd-target-api[8886]: Checking for config object changes every 1s
Dec 26 09:19:10 ff-ceph-2 rbd-target-api[8886]: Removing blacklisted entry for this host : 192.168.1.61:6819/881
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: Traceback (most recent call last):
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: File "/usr/bin/rbd-target-api", line 2952, in <module>
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: main()
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: File "/usr/bin/rbd-target-api", line 2862, in main
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: osd_state_ok = ceph_gw.osd_blacklist_cleanup()
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: File "/usr/lib/python3/dist-packages/ceph_iscsi_config/gateway.py", line 110, in osd_blacklist_cleanup
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: rm_ok = self.ceph_rm_blacklist(blacklist_entry.split(' ')[0])
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: File "/usr/lib/python3/dist-packages/ceph_iscsi_config/gateway.py", line 46, in ceph_rm_blacklist
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: if ("un-blacklisting" in result) or ("isn't blacklisted" in result):
Dec 26 09:19:12 ff-ceph-2 rbd-target-api[8886]: TypeError: a bytes-like object is required, not 'str'
Dec 26 09:19:12 ff-ceph-2 systemd[1]: rbd-target-api.service: Main process exited, code=exited, status=1/FAILURE
Dec 26 09:19:12 ff-ceph-2 systemd[1]: rbd-target-api.service: Failed with result 'exit-code'.
Dec 26 09:19:12 ff-ceph-2 systemd[1]: rbd-target-api.service: Scheduled restart job, restart counter is at 1.
Dec 26 09:19:12 ff-ceph-2 systemd[1]: Stopped Ceph iscsi target configuration API.
Dec 26 09:19:12 ff-ceph-2 systemd[1]: Started Ceph iscsi target configuration API.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: ceph-iscsi (not installed)
ProcVersionSignature: Ubuntu 5.4.0-58.64-generic 5.4.73
Uname: Linux 5.4.0-58-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.14
Architecture: amd64
CasperMD5CheckResult: pass
Date: Sun Dec 27 12:17:49 2020
InstallationDate: Installed on 2020-12-24 (2 days ago)
InstallationMedia: Ubuntu-Server 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
SourcePackage: ceph-iscsi
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Thorbjørn Weidemann (thorbjoern-weidemann) wrote :

I initially worked around this by manually un-blacklisting all the OSDs with:
for i in `sudo ceph osd blacklist ls|awk '{print $1}'`; do sudo ceph osd blacklist rm $i; done
However, the next morning the cluster wouldn't start. Stopping rbd-target-api and
rbd-target-gw fixed it, so I uninstalled ceph-iscsi for now. That is why it is listed as not installed above.
I am currently evaluating ceph for potential use in production.
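
For reference, the shell loop in this comment could be written in Python roughly as follows. This is only a sketch: the helper names blacklist_addresses and remove_all_blacklist_entries are hypothetical, and it assumes, as the awk call does, that the address is the first whitespace-separated field on each line of the listing.

```python
import subprocess


def blacklist_addresses(ls_output):
    """Return the first field of each non-empty line, like awk '{print $1}'."""
    return [line.split()[0] for line in ls_output.splitlines() if line.strip()]


def remove_all_blacklist_entries():
    """Run the same two ceph commands as the shell loop (needs sudo and a cluster)."""
    # text=True makes check_output return str instead of bytes (Python 3.7+).
    listing = subprocess.check_output(
        ["sudo", "ceph", "osd", "blacklist", "ls"], text=True
    )
    for addr in blacklist_addresses(listing):
        subprocess.check_call(["sudo", "ceph", "osd", "blacklist", "rm", addr])
```

Keeping the parsing separate from the subprocess calls means it can be checked against sample text without a running cluster.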

Revision history for this message
Thorbjørn Weidemann (thorbjoern-weidemann) wrote :

See source: ceph_iscsi_config/gateway.py line 46.

Revision history for this message
Thorbjørn Weidemann (thorbjoern-weidemann) wrote :

I rebuilt the package with the suggested fix (add .encode() to the string constants "un-blacklisting" and "isn't blacklisted"), and now I get:

Dec 27 13:29:22 ff-ceph-2 rbd-target-api[8350]: Removing blacklisted entry for this host : 192.168.1.61:6819/1587702737
Dec 27 13:29:24 ff-ceph-2 rbd-target-api[8350]: Successfully removed blacklist entry
Dec 27 13:29:24 ff-ceph-2 rbd-target-api[8350]: Removing blacklisted entry for this host : 192.168.1.61:0/817297786
Dec 27 13:29:25 ff-ceph-2 rbd-target-api[8350]: Successfully removed blacklist entry
...
So that seems to work.
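
An alternative to encoding the two constants would be to decode the command output once and keep the comparison in text. The sketch below illustrates that idea under stated assumptions; removal_succeeded is a hypothetical helper, not a function in ceph_iscsi_config, and it is not the change that was rebuilt and tested above.

```python
def removal_succeeded(result):
    """Return True if the command output reports the entry is gone.

    Hypothetical helper: accepts bytes (Python 3 check_output) or str.
    """
    if isinstance(result, bytes):
        # Normalise to text so the str constants below can be searched for.
        result = result.decode("utf-8", errors="replace")
    return ("un-blacklisting" in result) or ("isn't blacklisted" in result)
```

Either approach makes both operands of the "in" test the same type; decoding has the side benefit that later string handling of the output also works on text.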

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thank you for taking the time to file a bug report.

Could you please provide some reproduction steps here? I'm not a ceph expert, and it seems like there needs to be some initial configuration to make ceph work. I did spend some time trying to configure it here, but it'd be easier if you could provide a step-by-step way to reproduce the bug (this will also be helpful for the potential SRU that we will need to perform).

Thanks.

Changed in ceph-iscsi (Ubuntu):
status: New → Incomplete
Revision history for this message
Thorbjørn Weidemann (thorbjoern-weidemann) wrote :

Thanks for responding :-)

I am not a Ceph expert either.

Steps to reproduce:
Short version: On a running Ceph cluster with at least one blacklisted OSD, install ceph-iscsi.

Long version:
Installing Ceph is complicated, so here is a way to do it with Ansible. I know this is a lot, but believe me: this is the EASY way.

Normally you would install a Ceph cluster with at least 3 servers, but the steps below describe how to do it on one machine.

Start with a fresh install of Ubuntu 20.04.1 LTS server. It can be in a VM, but make sure you have 3 extra disks attached and at least 4GB RAM. Below I assume the install is on sda, and that sdb, sdc and sdd are attached blank disks (I used 10GB for each).
Make sure install is up-to-date:
thw@ff-ceph-4:~$ sudo apt update
thw@ff-ceph-4:~$ sudo apt dist-upgrade
thw@ff-ceph-4:~$ sudo reboot

Here the default admin user is thw and the hostname is ff-ceph-4.
Make sure ff-ceph-4 resolves to the external IP address of the local machine, e.g. by adding it to /etc/hosts.

Make sure user thw can sudo any command without a password by running sudo visudo and adding the line:
thw ALL=(ALL) NOPASSWD:ALL

thw@ff-ceph-4:~$ sudo apt install ansible git

Create user ansible with password ansible:
thw@ff-ceph-4:~$ sudo adduser ansible

thw@ff-ceph-4:~$ su - ansible

Get the Ceph Ansible playbooks:
ansible@ff-ceph-4:~$ git clone https://github.com/ceph/ceph-ansible.git
ansible@ff-ceph-4:~$ cd ceph-ansible
ansible@ff-ceph-4:~$ git checkout stable-5.0

Copy the attached all.yml to the group_vars directory in ceph-ansible. I advise you to diff it against all.yml.sample to see what I have changed. Make sure the monitor_interface: line lists the name of your network interface, and that public_network: is the network that interface is on.
Copy the attached inventory to the current directory (/home/ansible/ceph-ansible).
NOTE: I could only attach ONE file, so the inventory will be attached in a new comment below.

Make sure user ansible can log in to the thw account with an SSH key:
ansible@ff-ceph-4:~$ ssh-keygen
ansible@ff-ceph-4:~$ ssh-copy-id thw@ff-ceph-4
ansible@ff-ceph-4:~$ cp site.yml.sample site.yml
ansible@ff-ceph-4:~$ ansible-playbook -i inventory site.yml

This will take a few minutes to run.
If something goes wrong, see if you can fix it, and re-run the ansible-playbook command.

At the end you should hopefully have a running Ceph "cluster".
Go back to the thw user (or give the ansible user sudo rights) and run:
thw@ff-ceph-4:~$ sudo ceph -s

Line 3 should read:
    health: HEALTH_OK

To reproduce the bug, you should have some blacklist entries. I have them on a new install - I don't know why. Check with:
thw@ff-ceph-4:~$ sudo ceph osd blacklist ls

If there are entries listed, fine.
If not, create an entry with:
thw@ff-ceph-4:~$ sudo ceph osd blacklist add <ip-address-of-your-host>

Now:
thw@ff-ceph-4:~$ sudo ceph osd pool create rbd
thw@ff-ceph-4:~$ sudo rbd pool init rbd
thw@ff-ceph-4:~$ sudo apt install ceph-iscsi

You should now see the exceptions in the journalctl output.

Revision history for this message
Thorbjørn Weidemann (thorbjoern-weidemann) wrote :

Inventory needed in above comment.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for ceph-iscsi (Ubuntu) because there has been no activity for 60 days.]

Changed in ceph-iscsi (Ubuntu):
status: Incomplete → Expired