Juju doesn't clean up all resources on remove-application

Bug #1977865 reported by Natalia Nowakowska
This bug affects 1 person
Affects: Canonical Juju
Status: Invalid
Importance: Medium
Assigned to: Jack Shaw (jack-shaw)
Milestone: 2.9.42

Bug Description

`juju remove-application` doesn't fully remove charm resources. For example, after deploying and removing metacontroller-operator, its pod and statefulset remain in place.

Environment:
microk8s 1.21/stable
Enabled addons: dns storage
juju 2.9.31

Steps to reproduce:
1. Add a new model named `test`
2. Deploy metacontroller-operator. Note: the published charm currently raises a NotImplementedError on the remove event on purpose; for this reproduction, that error was removed and the charm was built locally.

3. Check the k8s resources:
```
$ kubectl get all -n test
NAME                                  READY   STATUS    RESTARTS   AGE
pod/modeloperator-6dfc6947bc-9btvg    1/1     Running   0          107s
pod/metacontroller-operator-charm-0   1/1     Running   0          44s
pod/metacontroller-operator-0         1/1     Running   0          67s

NAME                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/modeloperator                       ClusterIP   10.152.183.189   <none>        17071/TCP   107s
service/metacontroller-operator             ClusterIP   10.152.183.105   <none>        65535/TCP   69s
service/metacontroller-operator-endpoints   ClusterIP   None             <none>        <none>      69s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/modeloperator   1/1     1            1           107s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/modeloperator-6dfc6947bc   1         1         1       107s

NAME                                             READY   AGE
statefulset.apps/metacontroller-operator-charm   1/1     44s
statefulset.apps/metacontroller-operator         1/1     67s
```
4. Remove the application and check whether its resources were cleaned up:
```
$ kubectl get all -n test
NAME                                  READY   STATUS    RESTARTS   AGE
pod/modeloperator-6dfc6947bc-9btvg    1/1     Running   0          5m26s
pod/metacontroller-operator-charm-0   1/1     Running   0          4m23s

NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/modeloperator   ClusterIP   10.152.183.189   <none>        17071/TCP   5m26s

NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/modeloperator   1/1     1            1           5m26s

NAME                                       DESIRED   CURRENT   READY   AGE
replicaset.apps/modeloperator-6dfc6947bc   1         1         1       5m26s

NAME                                             READY   AGE
statefulset.apps/metacontroller-operator-charm   1/1     4m23s
```
There are no errors in debug-log.

Revision history for this message
Juan M. Tirado (tiradojm) wrote :

While reproducing the steps I found this in the debug-log

```
unit-metacontroller-operator-0: 11:40:12 ERROR unit.metacontroller-operator/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 320, in <module>
    main(MetacontrollerOperatorCharm)
  File "/var/lib/juju/agents/unit-metacontroller-operator-0/charm/venv/ops/main.py", line 406, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-metacontroller-operator-0/charm/venv/ops/main.py", line 140, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-metacontroller-operator-0/charm/venv/ops/framework.py", line 278, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-metacontroller-operator-0/charm/venv/ops/framework.py", line 722, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-metacontroller-operator-0/charm/venv/ops/framework.py", line 767, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 139, in _remove
    raise NotImplementedError()
```

Can you confirm this is not an issue related to the charm itself?

Changed in juju:
status: New → Incomplete
Revision history for this message
Natalia Nowakowska (natalia-nowakowska) wrote :

Hello, thanks for your answer.
In step 2 I mentioned that, for the purposes of reproducing this bug, the NotImplementedError was removed and the charm was built locally.

Please test metacontroller from latest/edge. We no longer raise this error there, since Juju should handle the cleanup on the remove event.

Changed in juju:
status: Incomplete → New
Revision history for this message
Andrew Scribner (ca-scribner) wrote :

Another example of this same problem is the spark-k8s charm. The deployed workload creates a mutatingwebhook "spark-config" which does not get removed on remove-application, resulting in a fully blocked model/k8s namespace. To reproduce:

```
juju deploy spark-k8s spark
# Wait for deployment
juju remove-application spark
# Wait for removal
juju deploy spark-k8s spark
```

The leftover "spark-config" mutatingwebhook from the first deploy blocks all pod creation, including the second `juju deploy spark-k8s spark` (see https://bugs.launchpad.net/juju/+bug/1980771 for more info).
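Until the charm (or Juju) cleans this up, the stale webhook can be deleted by hand. A minimal sketch, assuming the webhook is named `spark-config` as above; the `KUBECTL` override is only there so the sketch can be dry-run without a cluster:

```shell
# List mutating webhooks to confirm the stale one is present.
# KUBECTL defaults to kubectl; set KUBECTL=echo to dry-run the sketch.
list_webhooks() {
  "${KUBECTL:-kubectl}" get mutatingwebhookconfigurations
}

# Delete the leftover webhook so pod creation is unblocked again.
delete_stale_webhook() {
  "${KUBECTL:-kubectl}" delete mutatingwebhookconfiguration spark-config
}
```

After deleting the webhook, a fresh `juju deploy spark-k8s spark` should no longer be blocked by the leftover admission configuration.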

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 2.9.34
Changed in juju:
milestone: 2.9.34 → 2.9.35
Changed in juju:
milestone: 2.9.35 → 2.9.36
Changed in juju:
milestone: 2.9.36 → 2.9.37
Changed in juju:
milestone: 2.9.37 → 2.9.38
Changed in juju:
milestone: 2.9.38 → 2.9.39
Changed in juju:
milestone: 2.9.39 → 2.9.40
Jack Shaw (jack-shaw)
Changed in juju:
assignee: nobody → Jack Shaw (jack-shaw)
Revision history for this message
Jack Shaw (jack-shaw) wrote :

Hi, I've had a look into this on Juju 3.0 and have been able to replicate it.

I believe I have also managed to diagnose the root cause. When Juju deploys metacontroller-operator, it creates the usual `metacontroller-operator` stateful set for the application.

The charm's install hook then uses lightkube to create another stateful set, `metacontroller-operator-charm`, which Juju cannot track.

The charm's remove hook is then a no-op when it should clean up its resources (hence the NotImplementedError, I'd guess), so the statefulset is left behind.

I believe this is a bug with the charm, not with Juju.
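For reference, the missing cleanup in the charm's remove hook could look roughly like this. This is a sketch, not the charm's actual code: the helper and resource names follow the diagnosis above, and lightkube's `Client.delete` is used as in the install hook. The lightkube import is deferred so the module loads even without the package or a cluster.

```python
def extra_statefulsets(app_name: str) -> list:
    # StatefulSets the install hook creates outside Juju's tracking;
    # per the diagnosis above, that is "<app>-charm" (illustrative helper).
    return [f"{app_name}-charm"]


def cleanup(app_name: str, namespace: str) -> None:
    # Deferred import so this sketch can be loaded without lightkube installed.
    from lightkube import Client
    from lightkube.resources.apps_v1 import StatefulSet

    client = Client()
    for name in extra_statefulsets(app_name):
        # Delete the StatefulSet the install hook created with lightkube.
        client.delete(StatefulSet, name=name, namespace=namespace)
```

Called from the charm's remove handler (instead of raising NotImplementedError), this would clean up the leftover `metacontroller-operator-charm` statefulset.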

Changed in juju:
status: Triaged → Invalid
Changed in juju:
milestone: 2.9.40 → 2.9.42