Action `remove-instance` works but appears to fail

Bug #1954306 reported by Peter Matulis
This bug affects 3 people
Affects                       Status      Importance   Assigned to   Milestone
MySQL InnoDB Cluster Charm    Triaged     Medium       Unassigned
mysql-8.0 (Ubuntu)            Confirmed   Undecided    Unassigned

Bug Description

Prior to removing a mysql-innodb-cluster unit (`juju remove-unit`), I ran the `remove-instance` action:

$ juju run-action --wait mysql-innodb-cluster/leader remove-instance address=10.0.0.183

The output included text that suggested the action did not succeed:

===

"message: Remove instance failed"

"nmysqlsh.DBError: MySQL Error (1238): Cluster.remove_instance:
      Variable 'group_replication_force_members' is a non persistent variable"

"status: failed"

===

However, the model status looks reasonable, as does the cluster status.

See attachment for details:

mysql-innodb-cluster-remove.txt
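For reference, the kind of checks referred to above (a sketch, not taken from the attachment; the cluster-status action name is an assumption based on the charm's published actions):

$ juju status mysql-innodb-cluster
$ juju run-action --wait mysql-innodb-cluster/leader cluster-status

In this case both looked healthy despite the failed action output above.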

Liam Young (gnuoy)
Changed in charm-mysql-innodb-cluster:
status: New → Invalid
Revision history for this message
Liam Young (gnuoy) wrote :

I don't think this is a charm bug. The issue appears when using the mysql shell to remove a node from the cluster. From what I can see, group_replication_force_members cannot be persisted and it is correctly unset, so the error being reported seems wrong.

https://pastebin.ubuntu.com/p/sx6ZB3rs6r/

root@juju-1f04f3-zaza-90b9e082f2aa-2:/var/lib/juju/agents/unit-mysql-innodb-cluster-1/charm# /snap/bin/mysqlsh
Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
MySQL Shell 8.0.23

Copyright (c) 2016, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.
mysql-py> shell.connect('clusteruser:d2Z27kpxZmJ826tSVWL6SVV4LYZhZwwryHtM@172.20.0.111')
Creating a session to 'clusteruser@172.20.0.111'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 1644 (X protocol)
Server version: 8.0.27-0ubuntu0.20.04.1 (Ubuntu)
No default schema selected; type \use <schema> to set one.
<Session:clusteruser@172.20.0.111:33060>
mysql-py []> cluster = dba.get_cluster('jujuCluster')
mysql-py []> cluster.remove_instance('clusteruser@172.20.0.166', {'force': False})
The instance will be removed from the InnoDB cluster. Depending on the instance
being the Seed or not, the Metadata session might become invalid. If so, please
start a new session to the Metadata Storage R/W instance.

Instance '172.20.0.166:3306' is attempting to leave the cluster...
ERROR: Instance '172.20.0.166:3306' failed to leave the cluster: Variable 'group_replication_force_members' is a non persistent variable
Traceback (most recent call last):
  File "<string>", line 1, in <module>
mysqlsh.DBError: MySQL Error (1238): Cluster.remove_instance: Variable 'group_replication_force_members' is a non persistent variable
mysql-py []> \sql show variables like 'group_replication_force_members';
+----------------------------------+-------+
| Variable_name                    | Value |
+----------------------------------+-------+
| group_replication_force_members  |       |
+----------------------------------+-------+
1 row in set (0.0086 sec)
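The server-side behaviour can be illustrated directly (a minimal sketch, not from the pastebin above; the assumption is that remove_instance tries to persist the variable, which the server refuses):

mysql-py []> \sql
mysql-sql []> SET PERSIST group_replication_force_members = '';

The SET PERSIST statement fails with error 1238 ("Variable 'group_replication_force_members' is a non persistent variable"), which is the same error surfaced by Cluster.remove_instance.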

Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

Thanks for the investigation, Liam. This is in our backlog for further investigation.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for reporting the bug.

I'm wondering if we can get a step-by-step reproducer for this issue that focuses on MySQL itself. This seems to be a somewhat complex scenario that demands a non-trivial amount of work to set up. Thanks.
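A minimal sketch of such a reproducer using only MySQL Shell (the hosts, the clusteradmin account and the cluster name are placeholders, not taken from this bug; assumes mysql-server-8.0 and mysql-shell on three hosts):

$ mysqlsh clusteradmin@host1
mysql-py> # each instance previously prepared with dba.configure_instance()
mysql-py> cluster = dba.create_cluster('testCluster')
mysql-py> cluster.add_instance('clusteradmin@host2', {'recoveryMethod': 'clone'})
mysql-py> cluster.add_instance('clusteradmin@host3', {'recoveryMethod': 'clone'})
mysql-py> cluster.remove_instance('clusteradmin@host3', {'force': False})

If the problem is reproducible outside the charm, the last call should fail with the same MySQL Error (1238) shown above.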

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mysql-8.0 (Ubuntu):
status: New → Confirmed
Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

The error is different when the action is retried:

1st try:

 output: "Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nThe
      instance will be removed from the InnoDB cluster. Depending on the instance\nbeing
      the Seed or not, the Metadata session might become invalid. If so, please\nstart
      a new session to the Metadata Storage R/W instance.\n\nInstance '10.5.2.0:3306'
      is attempting to leave the cluster...\n\e[31mERROR: \e[0mInstance '10.5.2.0:3306'
      failed to leave the cluster: Variable 'group_replication_force_members' is a
      non persistent variable\nTraceback (most recent call last):\n File \"<string>\",
      line 3, in <module>\nmysqlsh.DBError: MySQL Error (1238): Cluster.remove_instance:
      Variable 'group_replication_force_members' is a non persistent variable\n"

2nd try:

    output: "Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\n\e[31mERROR:
      \e[0m10.5.2.0:3306 is reachable but has state OFFLINE\nTo safely remove it from
      the cluster, it must be brought back ONLINE. If not possible, use the 'force'
      option to remove it anyway.\nTraceback (most recent call last):\n File \"<string>\",
      line 3, in <module>\nmysqlsh.Error: Shell Error (51004): Cluster.remove_instance:
      Instance is not ONLINE and cannot be safely removed\n"

I am not sure, but it seems the action is not really completing the removal, since removing with force=true (either with the mysql service stopped, or only while the instance is fine (ONLINE); otherwise it hits bug LP#2006759) actually removes it from the cluster-status output.

tags: added: sts
Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

I disagree that this is an "Invalid" bug. It is certainly not a charm bug in the strict sense, since manually attempting to remove any instance results in the same error, but this happens in every single cluster deployed by the charm, without exception, and it leaves the instance in an error state that requires cleanup. I believe the charm configures the cluster in a way that sets up the conditions for the problem to occur. I have a hard time finding the same scenario described anywhere on the Internet, except when deployed by the charm.

Changed in charm-mysql-innodb-cluster:
status: Invalid → New
Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

Additionally, when the force=true parameter is not used, the action fails and the flags are not removed.
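For reference, the forced variant that does complete the removal, per the comments above (a sketch; the address is an example value):

$ juju run-action --wait mysql-innodb-cluster/leader remove-instance address=10.5.2.0 force=true

After this, the instance no longer appears in the cluster-status output.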

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

@Rodrigo, it might not be an actual charm bug; the charm doesn't specifically set the variable mentioned in the error, and it being unset is necessary to start group replication in the first place.

It might be that removing a unit from a cluster fails even if the cluster is set up without charms (i.e. just three machines with the packages installed manually); in other words, there may be some missing step that we've not uncovered yet. However, that also doesn't mean the bug is invalid. Also, as remove-instance is a charm action, it probably ought to work.

I'm going to set this to Triaged / Medium as there is something of a workaround.

Changed in charm-mysql-innodb-cluster:
status: New → Triaged
importance: Undecided → Medium