Deleting a volume uses "dd", which drives the load up too much

Bug #937694 reported by Razique Mahroua
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Pádraig Brady

Bug Description

Dear stackers,
Env: Diablo 2011.3

nova-volume/ network/ api/ scheduler/ <------> nova-compute #1
                                           |---> nova-compute #x

I just noticed, for the second time, that removing a volume via nova volume-delete first runs a "dd" in order to erase the volume's content.
That dd uses as much CPU as it can get, making the instances unreachable while the Python processes (nova-network especially) struggle to get any work done. Once the dd completes and the load is back to normal, all my instances are reachable again.
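For reference, the zeroing in question boils down to a plain, unthrottled dd over the whole logical volume, roughly like the command below (the device path and the 1 GB size are made up for illustration; the exact arguments in nova/volume/driver.py may differ):

# illustrative only, run as root; the real path depends on your LVM / volume group layout
dd if=/dev/zero of=/dev/nova-volumes/volume-00000001 bs=1M count=1024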

Load graph: http://grrrab.it/96d2up

At the same time, instance availability: http://grrrab.it/ii62p7

Is it possible to add a "nice" for the dd process, so we make sure it doesn't hog too much of the CPU?
thanks
Razique

Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

You mention CPU as the bottleneck, but that would surprise me with that dd command.
Is perhaps disk the bottleneck? What type of backing storage do you have? What size are the volumes?
From the graph I'm guessing dd uses 35% of your CPU, while saturating the disk writing zeros?
Perhaps dd is also blowing away the buffer cache while doing this,
thus causing more disk thrashing?
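One way to check that hypothesis, independent of the patch (plain Linux commands, nothing nova-specific): watch the dirty page counters while such a delete is running.

# if these counters climb steeply during the zeroing, the writes are going through the page cache
watch -n1 "grep -E 'Dirty|Writeback' /proc/meminfo"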

Does the attached patch avoid the issue in any way?
You'll need to apply that as root to /usr/lib/python/...../nova/volume/driver.py
(I'd delete any driver.py[co] files there too just in case),
and restart the nova-volume service.
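(The attachment itself is not reproduced here; judging by the follow-up comments, the patched invocation amounts to roughly the following, again with an illustrative device path and size:)

# idle I/O priority plus O_DIRECT, so the zeroing bypasses the page cache
ionice -c3 dd if=/dev/zero of=/dev/nova-volumes/volume-00000001 bs=1M count=1024 oflag=direct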

Thierry Carrez (ttx)
Changed in nova:
status: New → Incomplete
Revision history for this message
Razique Mahroua (razique) wrote :

Hi - thanks Pádraig,
it's not the CPU, it's the overall load on the server, which gets pretty high.
The volume backend is an iSCSI-connected SAN, and it's not the bottleneck (the Gigabit link doesn't saturate at that point).
Thanks!

Revision history for this message
Thierry Carrez (ttx) wrote :

Razique: could you test the proposed patch?

Changed in nova:
importance: Undecided → Medium
status: Incomplete → Confirmed
importance: Medium → Undecided
status: Confirmed → Incomplete
Revision history for this message
Thierry Carrez (ttx) wrote :

We cannot solve the issue you reported without more information. Could you please provide the requested information?

Revision history for this message
Razique Mahroua (razique) wrote :

Sorry for the delay, Thierry, I was convinced I had replied. The patch works. The dd takes longer, but availability is not affected.
thanks

Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

OK I'll send this for folsom.
Also related is bug 1022511

Changed in nova:
status: Incomplete → In Progress
assignee: nobody → Pádraig Brady (p-draigbrady)
milestone: none → folsom-3
Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

Razique: On second thoughts I'm a bit wary of setting priorities.
That should only be done in very specific circumstances,
otherwise you'll end up in a priority-setting arms race.

Could you please remove the 'ionice', '-c3', part of the patch,
but leave the 'direct' flags in place.
If that still fixes the issue for you, it points to a Linux VM issue,
which is OK as we'd prefer no caching in this case.
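In other words, the variant to test keeps only the O_DIRECT part (same illustrative path and size as above):

# normal I/O priority, but the writes still bypass the page cache
dd if=/dev/zero of=/dev/nova-volumes/volume-00000001 bs=1M count=1024 oflag=direct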

thanks.

Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-3 → folsom-rc1
Revision history for this message
Akira Yoshiyama (yosshy) wrote :

How about cstream?
http://www.cons.org/cracauer/cstream.html

It has some nice features:
* Exact throughput limiting
* Built-in data creation and sink (not using /dev/null and /dev/zero)
* Customizable reporting including bytes written
* Blocksize customization like dd

For example:

# cstream -t 10M -T1 -v1 -n 1G -i - -o /tmp/dummy1G
10485760 B 10.0 MB 1.00 s 10484060 B/s 10.00 MB/s
20979712 B 20.0 MB 2.00 s 10485237 B/s 10.00 MB/s
...

Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

Rate limiting is an option,
though hopefully we can get the system to
automatically use appropriate resources for this.

BTW pv is another common tool used for rate limiting:
http://www.ivarch.com/programs/pv.shtml
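For example, a rough sketch of rate-limited zeroing with pv (10 MB/s cap; the device path and size are illustrative, not taken from the patch):

# pv reads the zeros from the pipe, throttles them to 10 MiB/s, and writes them to the volume
dd if=/dev/zero bs=1M count=1024 | pv -L 10m > /dev/nova-volumes/volume-00000001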

thanks!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12481

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12481
Committed: http://github.com/openstack/nova/commit/64f1916d5871344bdd5177993bca709950e0b819
Submitter: Jenkins
Branch: master

commit 64f1916d5871344bdd5177993bca709950e0b819
Author: Pádraig Brady <email address hidden>
Date: Fri Jul 13 18:22:17 2012 +0100

    avoid the buffer cache when copying volumes

    The dd process when clearing volumes, was seen to starve
    the system when writing to an iSCSI SAN connected over GigE.

    So use O_DIRECT within the dd process so that the system
    buffer cache is not impacted, which is generally the
    best thing to do when streaming large amounts of data.

    Also one could drop the I/O priority of the dd process
    by prepending "ionice -c3". That would change the priority
    from "normal" (best effort) to "idle", which means zeroing
    will only proceed when there is no other I/O on the system.
    It was thought best to leave scheduling decisions to the
    system however, rather than specifying them explicitly.

    Fixes bug: 937694
    Change-Id: Ic842d7b83209c41d8ff05075990ed12e6f86283a

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2