Bug #863526 “Juju agents do not handle reboots” : Bugs : pyjuju

Clint Byrum (clint-fewbar) on 2011-09-30

tags:

added: production

Clint Byrum (clint-fewbar) on 2011-10-10

Changed in juju:
status:	New → Triaged
importance:	Undecided → High

Kapil Thangavelu (hazmat) on 2011-12-07

Changed in juju:
milestone:	none → florence

Revision history for this message

Kapil Thangavelu (hazmat) wrote on 2012-02-22:

#1

The recent landing of upstartification and restart support should help this but needs verification.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2012-02-22:

#2

Download full text (8.5 KiB)

I did some smoke testing of this, and it seems like there is still work to do.

Doing a 'restart juju-machine-agent' on a machine, I got this:

2012-02-21 23:35:40,411:1524(0xb7108b70):ZOO_INFO@check_events@1632: session establishment complete on server [10.207.38.151:2181], se
ssionId=0x135a23c31a5000a, negotiated timeout=10000
2012-02-21 23:35:40,424: juju.agents.machine@DEBUG: Units changed old:set([]) new:set(['mysql/0'])
2012-02-21 23:35:40,424: juju.agents.machine@DEBUG: Starting service unit: mysql/0 ...
2012-02-21 23:35:40,425: juju.agents.machine@INFO: Machine agent started id:1 deploy:<class 'juju.machine.unit.UnitMachineDeployment'>
provider:'ec2'
2012-02-21 23:35:40,533: juju.agents.machine@DEBUG: Downloading charm local:oneiric/mysql-1329352875 to /var/lib/juju/charms
2012-02-21 23:35:41,061: juju.agents.machine@DEBUG: Starting service unit mysql/0
2012-02-21 23:35:42,247: juju.agents.machine@INFO: Started service unit mysql/0
2012-02-21 23:37:44,036:1524(0xb7108b70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 26ms
2012-02-21 23:37:47,385:1524(0xb7108b70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 15ms
2012-02-21 23:37:50,739:1524(0xb7108b70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
2012-02-21 23:58:38,875:1524(0xb7108b70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
2012-02-22 00:00:43,114:1524(0xb736d6c0):ZOO_INFO@zookeeper_close@2304: Closing zookeeper sessionId=0x135a23c31a5000a to [10.207.38.151:2181]

2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.3
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@662: Client environment:host.name=domU-12-31-39-09-F0-F6
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@670: Client environment:os.arch=3.0.0-14-virtual
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@671: Client environment:os.version=#23-Ubuntu SMP Mon Nov 21 23:40:55 UTC 2011
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@679: Client environment:user.name=(null)
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@687: Client environment:user.home=/root
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@699: Client environment:user.dir=/
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=domU-12-31-39-15-25-69.compute-1.internal:2181 sessionTimeout=10000 watcher=0xb730e5c0 sessionId=0 sessionPasswd=<null> context=0x8d62028 flags=0
2012-02-22 00:00:43,903:3654(0xb7294b70):ZOO_INFO@check_events@1585: initiated connection to server [10.207.38.151:2181]
2012-02-22 00:00:43,907:3654(0xb7294b70):ZOO_INFO@check_events@1632: session establishment complete on server [10.207.38.151:2181], sessionId=0x135a23c31a5000d, negotiated timeout=10000
2012-02-22 00:00:43,918: juju.agents.machine@DEBUG: Units changed old:set([]) new:set(['mysql/0'])
2012-02-22 00:00:43,918: juju.agents.machine@DEBUG: Starting service unit: mysql/0 ...
2012-02-22 00:00:43,918: juju.agents.machine@INFO: Machine agent started id:1 deploy...

I did some smoke testing of this, and it seems like there is still work to do.

Doing a 'restart juju-machine-agent' on a machine, I got this:

2012-02-21 23:35:40,411:1524(0xb7108b70):ZOO_INFO@check_events@1632: session establishment complete on server [10.207.38.151:2181], se
ssionId=0x135a23c31a5000a, negotiated timeout=10000
2012-02-21 23:35:40,424: juju.agents.machine@DEBUG: Units changed old:set([]) new:set(['mysql/0'])
2012-02-21 23:35:40,424: juju.agents.machine@DEBUG: Starting service unit: mysql/0 ...
2012-02-21 23:35:40,425: juju.agents.machine@INFO: Machine agent started id:1 deploy:<class 'juju.machine.unit.UnitMachineDeployment'>
 provider:'ec2'
2012-02-21 23:35:40,533: juju.agents.machine@DEBUG: Downloading charm local:oneiric/mysql-1329352875 to /var/lib/juju/charms
2012-02-21 23:35:41,061: juju.agents.machine@DEBUG: Starting service unit mysql/0
2012-02-21 23:35:42,247: juju.agents.machine@INFO: Started service unit mysql/0
2012-02-21 23:37:44,036:1524(0xb7108b70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 26ms
2012-02-21 23:37:47,385:1524(0xb7108b70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 15ms
2012-02-21 23:37:50,739:1524(0xb7108b70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
2012-02-21 23:58:38,875:1524(0xb7108b70):ZOO_WARN@zookeeper_interest@1461: Exceeded deadline by 19ms
2012-02-22 00:00:43,114:1524(0xb736d6c0):ZOO_INFO@zookeeper_close@2304: Closing zookeeper sessionId=0x135a23c31a5000a to [10.207.38.151:2181]

2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.3
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@662: Client environment:host.name=domU-12-31-39-09-F0-F6
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@670: Client environment:os.arch=3.0.0-14-virtual
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@671: Client environment:os.version=#23-Ubuntu SMP Mon Nov 21 23:40:55 UTC 2011
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@679: Client environment:user.name=(null)
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@687: Client environment:user.home=/root
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@log_env@699: Client environment:user.dir=/
2012-02-22 00:00:43,901:3654(0xb74f76c0):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=domU-12-31-39-15-25-69.compute-1.internal:2181 sessionTimeout=10000 watcher=0xb730e5c0 sessionId=0 sessionPasswd=<null> context=0x8d62028 flags=0
2012-02-22 00:00:43,903:3654(0xb7294b70):ZOO_INFO@check_events@1585: initiated connection to server [10.207.38.151:2181]
2012-02-22 00:00:43,907:3654(0xb7294b70):ZOO_INFO@check_events@1632: session establishment complete on server [10.207.38.151:2181], sessionId=0x135a23c31a5000d, negotiated timeout=10000
2012-02-22 00:00:43,918: juju.agents.machine@DEBUG: Units changed old:set([]) new:set(['mysql/0'])
2012-02-22 00:00:43,918: juju.agents.machine@DEBUG: Starting service unit: mysql/0 ...
2012-02-22 00:00:43,918: juju.agents.machine@INFO: Machine agent started id:1 deploy:<class 'juju.machine.unit.UnitMachineDeployment'> provider:'ec2'
2012-02-22 00:00:43,984: juju.agents.machine@DEBUG: Downloading charm local:oneiric/mysql-1329352875 to /var/lib/juju/charms

Seems odd that it would download the mysql charm again.

Restarting the unit agent was a bit bizarre too, I don't see why config-changed was run:

2012-02-22 06:14:20,149: unit.lifecycle@DEBUG: pre-stop acquire, running:True
2012-02-22 06:14:20,150: unit.lifecycle@DEBUG: stopping relation lifecycles
2012-02-22 06:14:20,150: unit.lifecycle@DEBUG: stopped unit lifecycle
2012-02-22 06:14:20,150: hook.executor@DEBUG: stopped
2012-02-22 06:14:20,196:3576(0xb74506c0):ZOO_INFO@zookeeper_close@2304: Closing zookeeper sessionId=0x135a23c31a5000c to [10.207.38.151:2181]

2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.3
2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@log_env@662: Client environment:host.name=domU-12-31-39-09-F0-F6
2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@log_env@669: Client environment:os.name=Linux
2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@log_env@670: Client environment:os.arch=3.0.0-14-virtual
2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@log_env@671: Client environment:os.version=#23-Ubuntu SMP Mon Nov 21 23:40:55 UTC 2011
2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@log_env@679: Client environment:user.name=(null)
2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@log_env@687: Client environment:user.home=/root
2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@log_env@699: Client environment:user.dir=/
2012-02-22 06:14:20,932:3997(0xb73796c0):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=domU-12-31-39-15-25-69.compute-1.internal:2181 sessionTimeout=10000 watcher=0xb71de5c0 sessionId=0 sessionPasswd=<null> context=0x9c63718 flags=0
2012-02-22 06:14:20,958:3997(0xb7116b70):ZOO_INFO@check_events@1585: initiated connection to server [10.207.38.151:2181]
2012-02-22 06:14:20,961:3997(0xb7116b70):ZOO_INFO@check_events@1632: session establishment complete on server [10.207.38.151:2181], sessionId=0x135a23c31a50011, negotiated timeout=10000
2012-02-22 06:14:21,057: hook.executor@DEBUG: started
2012-02-22 06:14:21,057: unit.lifecycle@DEBUG: pre-start acquire, running:False
2012-02-22 06:14:21,057: unit.lifecycle@DEBUG: start running, unit lifecycle
2012-02-22 06:14:21,058: unit.lifecycle@DEBUG: starting service relation watch
2012-02-22 06:14:21,058: unit.lifecycle@DEBUG: starting unit relation resolved watch
2012-02-22 06:14:21,059: unit.lifecycle@DEBUG: relation resolved changed
2012-02-22 06:14:21,059: unit.lifecycle@INFO: processing relation resolved changed
2012-02-22 06:14:21,062: unit.lifecycle@DEBUG: services changed old:None new:[]
2012-02-22 06:14:21,062: unit.lifecycle@DEBUG: processing relations changed
2012-02-22 06:14:21,063: unit.lifecycle@DEBUG: started unit lifecycle
2012-02-22 06:14:21,067: juju.agents.unit@DEBUG: Configuration Changed
2012-02-22 06:14:21,075: statemachine@DEBUG: unitworkflowstate: transition configure (started -> started) {}
2012-02-22 06:14:21,075: statemachine@DEBUG: unitworkflowstate:  execute action do_configure
2012-02-22 06:14:21,075: hook.executor@DEBUG: Running hook: /var/lib/juju/units/mysql-0/charm/hooks/config-changed
2012-02-22 06:14:22,250: unit.hook.api@INFO: my.cnf not changed, skipping restart/replace
2012-02-22 06:14:22,251: hook.output@INFO: {}

2012-02-22 06:14:22,267: hook.output@DEBUG: hook config-changed exited, exit code 0.
2012-02-22 06:14:22,276: hook.executor@DEBUG: Hook complete: /var/lib/juju/units/mysql-0/charm/hooks/config-changed
2012-02-22 06:14:22,276: unit.lifecycle@DEBUG: configured unit
2012-02-22 06:14:22,295: statemachine@DEBUG: unitworkflowstate: transition complete configure (state started) {}
2012-02-22 06:14:22,298: juju.agents.unit@INFO: No upgrade flag set.

Rebooting worked in much the same manner. Its important to note that the reboot only took about 24 seconds...

2012-02-22 06:15:56,635:3997(0xb73796c0):ZOO_INFO@zookeeper_close@2304: Closing zookeeper sessionId=0x135a23c31a50011 to [10.207.38.151:2181]
2012-02-22 06:16:32,669:602(0xb735f6c0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.3

When I stopped the instance, and ran juju status, I got an ERROR, which I think is erroneous, and should probably just be indicated with instance-state and running:

2012-02-21 22:28:09,940 INFO Connecting to environment...
2012-02-21 22:28:14,727 INFO Connected to environment.
2012-02-21 22:28:18,952 ERROR Machine provider information missing: machine 1
machines:
  0:
    dns-name: ec2-50-19-20-85.compute-1.amazonaws.com
    instance-id: i-bc7277d9
    instance-state: running
    state: running
  1:
    instance-id: i-be1712db
services:
  mysql:
    charm: local:oneiric/mysql-1329352875
    relations: {}
    units:
      mysql/0:
        machine: 1
        public-address: ec2-50-17-61-47.compute-1.amazonaws.com
        relations: {}
        state: down
2012-02-21 22:28:18,958 INFO 'status' command finished successfully

After leaving it off for 3 minutes, I started it back up, and had no problems:

2012-02-22 06:25:41,999:577(0xb73606c0):ZOO_INFO@zookeeper_close@2304: Closing zookeeper sessionId=0x135a23c31a50016 to [10.207.38.151:2181]
2012-02-22 06:28:58,070:574(0xb74f06c0):ZOO_INFO@log_env@658: Client environment:zookeeper.version=zookeeper C client 3.3.3

This included adding and removing relations.

Revision history for this message

William Reade (fwereade) wrote on 2012-02-22:

#3

charm download:

Agreed not necessary. I believe it's not actively harmful, though... have I missed something?

config-changed:

As I recall, this has always been run automatically when a unit starts up; again, it may be redundant, but I don't *think* it'll be harmful.

missing machine:

Hmm, I guess we should filter slightly differently in the ec2 provider -- we should include stopped machines but not terminated ones. (we should also not filter whatever the moving-to-stopped state is ("shutting-down"?)... and if that means we still see machines in the process of termination, so be it).

IMO these are 3 distinct bugs, and I don't personally see the first two as very high priority; I'm not entirely clear on the ramifications of a config-changed change, but the others should be pretty trivial to fix. Opinions?

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2012-02-22: Re: [Bug 863526] Re: Juju agents do not handle reboots

#4

Excerpts from William Reade's message of Wed Feb 22 08:32:20 UTC 2012:
> charm download:
>
> Agreed not necessary. I believe it's not actively harmful, though...
> have I missed something?
>

Understood, makes sense, and no you have not missed anything.

> config-changed:
>
> As I recall, this has always been run automatically when a unit starts
> up; again, it may be redundant, but I don't *think* it'll be harmful.
>

Actually I kind of like this.. re-asserting as often as makes sense is
good, and after a reboot/restart, that actually makes a lot of sense.

> missing machine:
>
> Hmm, I guess we should filter slightly differently in the ec2 provider
> -- we should include stopped machines but not terminated ones. (we
> should also not filter whatever the moving-to-stopped state is
> ("shutting-down"?)... and if that means we still see machines in the
> process of termination, so be it).
>
> IMO these are 3 distinct bugs, and I don't personally see the first two
> as very high priority; I'm not entirely clear on the ramifications of a
> config-changed change, but the others should be pretty trivial to fix.
> Opinions?
>

The first two aren't even bugs IMO. Lets just leave them be. For the other
one, sounds like its just a cosmetic bug. You sound like you understand
it better than I, so please do flesh out the details in a bug report.

Revision history for this message

Clint Byrum (clint-fewbar) wrote on 2012-03-23:

#5

I've tested rebooting individual machines, and it works fine. I think this is *fixed*.

There is a new issue to replace it, but it only affects EC2. The bootstrap node *might* change addresses if it is rebooted on EC2, and when it does, none of the agents will be able to find it because they hard code its IP.

Changed in juju:
assignee:	nobody → William Reade (fwereade)
status:	Triaged → Fix Released

pyjuju

Juju agents do not handle reboots

Bug Description

Other bug subscribers

Related blueprints

Remote bug watches