Metadata lazy loaded (even when disabled)

Bug #1068801 reported by Joshua Harlow
This bug affects 3 people
Affects      Status         Importance   Assigned to   Milestone
cloud-init   Fix Released   Medium       Unassigned

Bug Description

A situation that I think is probably a bug.

Looking at how boto fetches metadata, it lazily loads the dictionary that forms the instance's metadata (and the 'sub' dictionaries it contains as well). When the ec2 datasource is serialized, it only contains whatever was loaded up to that point (depending on which fields were accessed). Now say the ec2 iptables rule is applied: any further lazy loads of sub-entries of the metadata will fail. This would be counterintuitive to modules that expect fields to exist (and aren't prepared for loads that happen inside the lazy loading wrapper).

Perhaps before serialization or in our custom ec2_utils (tbd) the full metadata could be traversed and stored so that this situation does not happen.
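
The failure mode and the proposed fix can be sketched roughly as follows. `LazyDict` is a hypothetical stand-in for a lazily loaded metadata dictionary (it is not boto's class), `fake_fetch` stands in for a request to the metadata service, and JSON is used only as an example serializer. The sketch assumes the serializer reads the raw stored values, so anything not yet fetched is lost unless the structure is traversed first.

```python
import json


class LazyDict(dict):
    """Hypothetical lazy mapping: values are fetched on first access."""

    def __init__(self, keys, fetch):
        # Store a None placeholder for every key until it is requested.
        super().__init__((k, None) for k in keys)
        self._fetch = fetch

    def __getitem__(self, key):
        value = super().__getitem__(key)
        if value is None:
            value = self._fetch(key)  # would hit the metadata service
            super().__setitem__(key, value)
        return value


def unlazy(obj):
    """Recursively force every value and return plain dicts and lists."""
    if isinstance(obj, dict):
        return {k: unlazy(obj[k]) for k in obj}
    if isinstance(obj, (list, tuple)):
        return [unlazy(v) for v in obj]
    return obj


def fake_fetch(key):
    # Illustrative data only; a real fetch would be an HTTP request.
    data = {
        "instance-id": "i-0001",
        "public-keys": {"mykey": ["ssh-rsa EXAMPLE"]},
    }
    return data[key]


md = LazyDict(["instance-id", "public-keys"], fake_fetch)
md["instance-id"]                           # only this field is accessed
partial = json.loads(json.dumps(md))        # serialize what is loaded so far
full = json.loads(json.dumps(unlazy(md)))   # traverse first, then serialize
```

In this sketch `partial` keeps the placeholder for `public-keys`, while `full` contains the key data, because `unlazy` touched every entry before serialization.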


Joshua Harlow (harlowja) wrote :

This only seems to affect boto >= 2.6.0

Jeremy Deininger (jeremydei) wrote :

I've seen a race condition happening when loading ec2-public keys for the authorized-keys file. The result was no ec2 authorized_key was being populated. (Archlinux, systemd init scripts, python2-boto 2.5.2)

I'll try this patch and report back. Thanks!

Joshua Harlow (harlowja) wrote :

Odd, I don't think boto 2.5.2 had the lazy loading stuff; at least from looking at their code @ https://github.com/boto/boto/blob/2.5.2/boto/utils.py#L244 it seems to be doing a depth-first-like traversal there...

Jeremy Deininger (jeremydei) wrote :

Tried the boto-metadata-fixings branch and it didn't have an effect. What I'm seeing is failure to populate the .ssh/authorized_keys file with the EC2 ssh key. There is no error message. I've added debug logging to my branch but for some reason adding this logging was enough to get the key populated. http://bazaar.launchpad.net/~jeremydei/cloud-init/archlinux/revision/701#cloudinit/sources/__init__.py

Joshua Harlow (harlowja) wrote :

That's totally weird. Let me give u a patch to apply that might also help. Will get it to u when I get back to work. It will debug log all loads/writes that are happening. Useful to track down this kind of stuff.

Joshua Harlow (harlowja) wrote :

Try this, it might help diagnose what is being written out...

Joshua Harlow (harlowja) wrote :

Anddd.

Scott Moser (smoser)
Changed in cloud-init:
status: New → Triaged
importance: Undecided → Medium
Jeremy Deininger (jeremydei) wrote :

Joshua,
I launched an instance with those patches applied. Here is the log output. Still appears as though it's not writing the key. I also didn't really see it say much about writing the metadata (only the userdata). What do you think?

http://pastie.org/private/4nlikzlqoawmfmb6m2iibq

Joshua Harlow (harlowja) wrote :

Really odd, it almost seems like no userdata was fetched.

Is it possible to get into the instance and do a wget/curl on the 169 urls?

U should see a big blob that is being written out to a userdata.i file. I didn't see that.

It'd be interesting to get in the instance and then call this code.

https://code.launchpad.net/~harlowja/cloud-init/boto-metadata-fixings/+merge/131301

The get_userdata and metadata function there. See what it gets...
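
For checking the metadata service by hand from inside the instance, a minimal sketch along these lines may help. It is not the code from the branch linked above. It assumes the usual link-local address `http://169.254.169.254/latest/` (the thread only says "the 169 urls") and the convention that a listing entry ending in `/` is a sub-directory. `http_fetch` and `crawl` are hypothetical names, and the real service has special cases (for example the `public-keys` listing) that this sketch does not handle.

```python
from urllib.request import urlopen

BASE = "http://169.254.169.254/latest/"  # link-local metadata address (assumed)


def http_fetch(path, timeout=5):
    """Fetch one path below BASE and return the body as text."""
    with urlopen(BASE + path, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def crawl(path, fetch=http_fetch):
    """Depth-first walk: a listing entry ending in '/' is a sub-directory."""
    tree = {}
    for entry in fetch(path).splitlines():
        if not entry:
            continue
        if entry.endswith("/"):
            tree[entry[:-1]] = crawl(path + entry, fetch)
        else:
            tree[entry] = fetch(path + entry)
    return tree


# Offline stand-in for the service so the walk can be tried anywhere.
fake = {
    "meta-data/": "instance-id\nplacement/",
    "meta-data/instance-id": "i-0001",
    "meta-data/placement/": "availability-zone",
    "meta-data/placement/availability-zone": "us-east-1a",
}
result = crawl("meta-data/", fake.__getitem__)
```

On a real instance, `crawl("meta-data/")` and `http_fetch("user-data")` would use the network instead of the `fake` table.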

Jeremy Deininger (jeremydei) wrote :

Ok, I messed around with running this code quite a bit. As far as I can tell the metadata just doesn't get populated when using boto 2.5.2. Neither re-running cloud-init after launch nor re-running it after wiping the instance data ever seemed to store the metadata.

Then I upgraded to boto 2.6.0. When using that I had this traceback:
http://pastie.org/5344298

This is while running the branch for boto-metadata-fixings + the extra debug patches.

When I was using 2.5.2 it seemed like printing the value of one of these variables is what makes it populate the data. Perhaps casting to string, or something similar, triggers the population. Otherwise it seems to be blank.

-J

Jeremy Deininger (jeremydei) wrote :

On further inspection while using boto 2.6.0, it looks like the _unlazy_dict method is returning 'None'. Here's a paste showing them side by side. The boto one appears to be returning a fully populated dict.

http://pastie.org/5347372

Joshua Harlow (harlowja) wrote :

Well that's odd. Let me check into why that's happening. Seems like it shouldn't.

Joshua Harlow (harlowja) wrote :

Can u try @ https://code.launchpad.net/~harlowja/cloud-init/boto-metadata-fixings

I think I fixed the none case there, bad return, oops.

Jeremy Deininger (jeremydei) wrote :

Sweet that works! It populates the authorized_key now. https://code.launchpad.net/~harlowja/cloud-init/boto-metadata-fixings r700 + the debug patches. I am doing more testing here soon without all the debug stuff..

Scott Moser (smoser) wrote :

fix-committed in revno 715.

Changed in cloud-init:
status: Triaged → Fix Committed
Tomas Karasek (tomas-karasek) wrote :

Hi, I am testing 0.7.0 on Scientific Linux CERN 6 (a clone of RHEL6), with python-boto-2.5.2-1.el6.noarch. I built the rpm for cloud-init from trunk, revno 725. My instances live in OpenStack Essex. Python is 2.6.6.

My .ssh/authorized_keys are still not populated.

I debugged through sources/__init__.py, function normalize_pubkey_data(pubkey_data). The following text refers to that function.
When I print the type of pubkey_data, it's <class 'boto.utils.LazyLoadMetadata'>. When I print the pubkey_data itself, the object gets evaluated and .ssh/authorized_keys is populated properly. In that case pubkey_data is a dict:

{'tkarasek_keypair': ['ssh-rsa AAAAB3NzaD1yc2EAAAABIwAAAIEAutjuZ/3b+eY3VvKWJPsgkfJqcjKaobgjJRbr/jCwskGUNc8nGdB3dhOJTFTUi/gycrT8CeebB/wwwM4oz3ip/GJjCPaHS8X4vkqWsqgBQZ36cNzfRnqwKWRTMzAI19ekTIsOdjENyeQ+QJoyOFIU9RJAOoepEp/uMD3PH39FLfk= sth@lhost', '']}

Even if I don't print the pubkey_data, it takes the last branch ("if isinstance(pubkey_data, (dict)):"). However, in the loop over iteritems of the dict, I get ('tkarasek_keypair', None), i.e. the value of pubkey_data['tkarasek_keypair'] is reported as None. Because of that, an empty list is returned from normalize_pubkey_data.

When I change the loop to
for (_keyname, klist) in pubkey_data.items(): # instead of for (_keyname, klist) in pubkey_data.iteritems()

I get the proper list in klist and my authorized keys are populated. It seems the iterator doesn't work well with the boto lazy metadata.

Would you change the loop from iteritems to items?
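
The difference between the two loops is consistent with a dict subclass that resolves values in some accessors but not in others. The sketch below is a Python 3 analogue (Python 3 has no `iteritems()`); `HalfLazy` and `raw_items` are hypothetical names and this is not boto's code. Whether boto's class overrides `items()` but not `iteritems()` is an assumption based on the behaviour reported here.

```python
class HalfLazy(dict):
    """Hypothetical dict subclass that resolves placeholders only in some accessors."""

    def __init__(self, loaders):
        # loaders maps key -> zero-argument callable producing the real value
        super().__init__((k, None) for k in loaders)
        self._loaders = loaders

    def __getitem__(self, key):
        if super().__getitem__(key) is None:
            super().__setitem__(key, self._loaders[key]())
        return super().__getitem__(key)

    def items(self):
        # Overridden accessor: resolves each value through __getitem__.
        return [(k, self[k]) for k in self]

    def raw_items(self):
        # Stands in for an accessor the subclass did not override
        # (iteritems() in the Python 2 report): it reads storage directly.
        return dict.items(self)


keys = HalfLazy({"tkarasek_keypair": lambda: ["ssh-rsa EXAMPLE", ""]})
stale = list(keys.raw_items())   # the placeholder leaks out
fresh = keys.items()             # the value is loaded on access
```

Once `items()` has run, the raw view also shows the loaded value, which would match the observation that printing the object first makes the keys appear.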

Joshua Harlow (harlowja) wrote :

Hi Tomas, can u post the contents of the boto/utils.py file that u have, which should be in site-packages? I was checking https://github.com/boto/boto/blob/2.5.2/boto/utils.py and it doesn't seem they have the lazy load dictionary there, so I am wondering if they put it in then removed it and that's why u are seeing it.

I think scott and I have some ideas on how to fix this (ie traverse LazyLoadMetadata and force it to be a python dictionary so that none of its weirdness happens). But more info would be great. Thx!

Joshua Harlow (harlowja) wrote :

On my rhel6 box:

[harlowja@followsave ~]$ sudo yum list | grep boto
Password:
python-boto.noarch 2.5.2-1.el6 @epel

[harlowja@followsave ~]$ python
Python 2.6.6 (r266:84292, Aug 28 2012, 10:55:56)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto.utils as boto_utils
>>> dir(boto_utils)
['AuthSMTPHandler', 'Encoders', 'ISO8601', 'ISO8601_MS', 'LRUCache', 'MIMEBase', 'MIMEMultipart', 'MIMEText', 'Password', 'ShellCommand', 'StringIO', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_get_instance_metadata', '_hashfn', 'boto', 'canonical_string', 'datetime', 'fetch_file', 'find_class', 'formatdate', 'get_aws_metadata', 'get_instance_metadata', 'get_instance_userdata', 'get_ts', 'get_utf8_value', 'guess_mime_type', 'gzip', 'hashlib', 'imp', 'logging', 'merge_meta', 'mklist', 'notify', 'parse_ts', 'pythonize_name', 'qsa_of_interest', 'retry_url', 'smtplib', 'subprocess', 'tempfile', 'time', 'update_dme', 'urllib', 'urllib2', 'write_mime_multipart']

I'm not seeing the lazy load stuff there, can u do the same, thx.

Tomas Karasek (tomas-karasek) wrote :

Hi Joshua,

it seems there's something wrong with the boto rpm in epel. It says it's 2.5.2 but boto/utils.py contains LazyLoadMetadata, which was introduced later. You can see the code in the tarball in the source rpm:
http://dl.fedoraproject.org/pub/epel/6/SRPMS/repoview/python-boto.html
(to unpack the rpm you can "rpm2cpio python-boto-2.5.2-1.el6.src.rpm | cpio -idmv" if you are interested)

So there's some mess in the boto rpm, but all that does not really matter from cloud-init's point of view, right? Can't we just eventually process the LazyLoadMetadata and transform it to a dict? And then work with the dict from then on?

As requested, I attach utils.py from my test instance and the boto routine is:

[root@tomk-cava ~]# yum list | grep boto
python-boto.noarch 2.5.2-1.el6 @epel
[root@tomk-cava ~]# python
Python 2.6.6 (r266:84292, Sep 12 2012, 09:26:39)
[GCC 4.4.6 20120305 (Red Hat 4.4.6-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import boto.utils as boto_utils
>>> dir(boto_utils)
['AuthSMTPHandler', 'Encoders', 'ISO8601', 'ISO8601_MS', 'LRUCache', 'LazyLoadMetadata', 'MIMEBase', 'MIMEMultipart', 'MIMEText', 'Password', 'ShellCommand', 'StringIO', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '_end_cap_regex', '_first_cap_regex', '_get_instance_metadata', '_hashfn', '_number_cap_regex', 'base64', 'boto', 'canonical_string', 'compute_md5', 'datetime', 'fetch_file', 'find_class', 'formatdate', 'get_aws_metadata', 'get_instance_metadata', 'get_instance_userdata', 'get_ts', 'get_utf8_value', 'guess_mime_type', 'gzip', 'hashlib', 'imp', 'json', 'logging', 'md5', 'merge_meta', 'mklist', 'notify', 'parse_ts', 'pythonize_name', 'qsa_of_interest', 're', 'retry_url', 'smtplib', 'socket', 'subprocess', 'tempfile', 'time', 'unquote_v', 'update_dme', 'urllib', 'urllib2', 'write_mime_multipart']
>>>
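
Since the two `dir()` listings in this thread differ even though both packages report version 2.5.2, checking for the attribute may be more reliable than checking the version string. A minimal sketch, where `uses_lazy_metadata` is a hypothetical helper and the two `SimpleNamespace` objects are stand-ins for the two `boto/utils.py` variants, not real boto modules:

```python
import types


def uses_lazy_metadata(utils_module):
    """Return True if the module exposes a LazyLoadMetadata attribute."""
    return hasattr(utils_module, "LazyLoadMetadata")


# Stand-ins for the two boto/utils.py variants shown in this thread.
plain_build = types.SimpleNamespace(get_instance_metadata=lambda: {})
lazy_build = types.SimpleNamespace(
    get_instance_metadata=lambda: {}, LazyLoadMetadata=dict
)
```

On a system with boto installed, the same check would be `uses_lazy_metadata(boto.utils)` after `import boto.utils`.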

Scott Moser (smoser) wrote :

Tomas,
  To avoid complexity in determining when to lazy load, we're just always traversing the dict now.
  http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/revision/729

So, current trunk (which I hope to release later today as 0.7.1) should be good for you.

Scott Moser (smoser)
Changed in cloud-init:
status: Fix Committed → Fix Released
Tomas Karasek (tomas-karasek) wrote :

Tested 0.7.1 and it seems to work on my configuration (as listed above).
Thanks a lot Jeremy, Joshua and Scott.

James Falcon (falcojr) wrote :