Deleting a volume uses "dd", which drives the load up too much

Bug #937694 reported by Razique Mahroua
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Pádraig Brady

Bug Description

Dear stackers,
Env: Diablo 2011.3

nova-volume/ network/ api/ scheduler/ <------> nova-compute #1
                                           |---> nova-compute #x

I just noticed, for the second time, that removing a volume via nova volume-delete first runs a "dd" in order to erase the volume's content.
That dd uses as much CPU as it can get, making the instances unreachable while the Python processes (nova-network especially) struggle to get any work done. Once the dd completes and the load is back to normal, all my instances are reachable again.
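For reference, the zeroing in question boils down to a plain, unthrottled dd over the whole logical volume, roughly like the command below (the device path and the 1 GB size are made up for illustration; the exact arguments in nova/volume/driver.py may differ):

# illustrative only, run as root; the real path depends on your LVM / volume group layout
dd if=/dev/zero of=/dev/nova-volumes/volume-00000001 bs=1M count=1024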

Load graph: http://grrrab.it/96d2up

At the same time, instance availability: http://grrrab.it/ii62p7

Is it possible to add a "nice" for the dd process, so we make sure it doesn't hog too much of the CPU?
thanks
Razique

Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

You mention CPU as the bottleneck, but that would surprise me with that dd command.
Is perhaps disk the bottleneck? What type of backing storage do you have? What size are the volumes?
From the graph I'm guessing dd uses 35% of your CPU, while saturating the disk writing zeros?
Perhaps dd is also blowing away the buffer cache while doing this,
thus causing more disk thrashing?
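One way to check that hypothesis, independent of the patch (plain Linux commands, nothing nova-specific): watch the dirty page counters while such a delete is running.

# if these counters climb steeply during the zeroing, the writes are going through the page cache
watch -n1 "grep -E 'Dirty|Writeback' /proc/meminfo"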

Does the attached patch avoid the issue in any way?
You'll need to apply that as root to /usr/lib/python/...../nova/volume/driver.py
(I'd delete any driver.py[co] files there too just in case),
and restart the nova-volume service.
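(The attachment itself is not reproduced here; judging by the follow-up comments, the patched invocation amounts to roughly the following, again with an illustrative device path and size:)

# idle I/O priority plus O_DIRECT, so the zeroing bypasses the page cache
ionice -c3 dd if=/dev/zero of=/dev/nova-volumes/volume-00000001 bs=1M count=1024 oflag=direct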

Thierry Carrez (ttx)
Changed in nova:
status: New → Incomplete
Revision history for this message
Razique Mahroua (razique) wrote :

Hi - thanks Pádraig,
it's not the CPU, it's the overall load on the server, which gets pretty high.
The volume backend is an iSCSI-connected SAN, and it's not the bottleneck (the Gigabit link doesn't saturate at that point).
Thanks!

Revision history for this message
Thierry Carrez (ttx) wrote :

Razique: could you test the proposed patch?

Changed in nova:
importance: Undecided → Medium
status: Incomplete → Confirmed
importance: Medium → Undecided
status: Confirmed → Incomplete
Revision history for this message
Thierry Carrez (ttx) wrote :

We cannot solve the issue you reported without more information. Could you please provide the requested information?

Revision history for this message
Razique Mahroua (razique) wrote :

Sorry for the delay, Thierry, I was convinced I had replied. The patch works. The dd takes longer, but availability is not affected.
thanks

Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

OK I'll send this for folsom.
Also related is bug 1022511

Changed in nova:
status: Incomplete → In Progress
assignee: nobody → Pádraig Brady (p-draigbrady)
milestone: none → folsom-3
Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

Razique: On second thoughts I'm a bit wary of setting priorities.
That should only be done in very specific circumstances,
otherwise you'll end up in a priority-setting arms race.

Could you please remove the 'ionice', '-c3', part of the patch,
but leave the 'direct' flags in place.
If that still fixes the issue for you, it points to a Linux VM issue,
which is OK as we'd prefer no caching in this case.
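In other words, the variant to test keeps only the O_DIRECT part (same illustrative path and size as above):

# normal I/O priority, but the writes still bypass the page cache
dd if=/dev/zero of=/dev/nova-volumes/volume-00000001 bs=1M count=1024 oflag=direct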

thanks.

Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-3 → folsom-rc1
Revision history for this message
Akira Yoshiyama (yosshy) wrote :

How about cstream?
http://www.cons.org/cracauer/cstream.html

It has some nice features:
* Exact throughput limiting
* Built-in data creation and sink (not using /dev/null and /dev/zero)
* Customizable reporting including bytes written
* Blocksize customization like dd

For example:

# cstream -t 10M -T1 -v1 -n 1G -i - -o /tmp/dummy1G
10485760 B 10.0 MB 1.00 s 10484060 B/s 10.00 MB/s
20979712 B 20.0 MB 2.00 s 10485237 B/s 10.00 MB/s
...

Revision history for this message
Pádraig Brady (p-draigbrady) wrote :

Rate limiting is an option,
though hopefully we can get the system to
automatically use appropriate resources for this.

BTW pv is another common tool used for rate limiting:
http://www.ivarch.com/programs/pv.shtml
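For example, a rough sketch of rate-limited zeroing with pv (10 MB/s cap; the device path and size are illustrative, not taken from the patch):

# pv reads the zeros from the pipe, throttles them to 10 MiB/s, and writes them to the volume
dd if=/dev/zero bs=1M count=1024 | pv -L 10m > /dev/nova-volumes/volume-00000001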

thanks!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/12481

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/12481
Committed: http://github.com/openstack/nova/commit/64f1916d5871344bdd5177993bca709950e0b819
Submitter: Jenkins
Branch: master

commit 64f1916d5871344bdd5177993bca709950e0b819
Author: Pádraig Brady <email address hidden>
Date: Fri Jul 13 18:22:17 2012 +0100

    avoid the buffer cache when copying volumes

    The dd process when clearing volumes, was seen to starve
    the system when writing to an iSCSI SAN connected over GigE.

    So use O_DIRECT within the dd process so that the system
    buffer cache is not impacted, which is generally the
    best thing to do when streaming large amounts of data.

    Also one could drop the I/O priority of the dd process
    by prepending "ionice -c3". That would change the priority
    from "normal" (best effort) to "idle", which means zeroing
    will only proceed when there is no other I/O on the system.
    It was thought best to leave scheduling decisions to the
    system however, rather than specifying them explicitly.

    Fixes bug: 937694
    Change-Id: Ic842d7b83209c41d8ff05075990ed12e6f86283a

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-rc1 → 2012.2