bug search with affects_me=on times out

Bug #866100 reported by Martin Pool
56
This bug affects 13 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Martin Pool

Bug Description

https://bugs.launchpad.net/launchpad/+bugs?search=Search&field.affects_me=on&orderby=-date_last_updated
for instance, times out with eg OOPS-2103AP45

it was working ok this morning (utc+11) but is worse now launchpad's in a higher-load phase.

when I tested this as part of bug 858618 it was also showing variable and somewhat high query times, which led to me not showing the count on the overview page. this url already existed so it's not a new regression due to my work, but it is more visible.

example statement:

LaunchpadTimeoutError: Statement: 'SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released, BugTask.date_incomplete, BugTask.date_inprogress, BugTask.date_left_closed, BugTask.date_left_new, BugTask.date_triaged, BugTask.datecreated, BugTask.distribution, BugTask.distroseries, BugTask.heat, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.owner, BugTask.product, BugTask.productseries, BugTask.sourcepackagename, BugTask.status, BugTask.targetnamecache, Bug.date_last_message, Bug.date_last_updated, Bug.date_made_private, Bug.datecreated, Bug.description, Bug.duplicateof, Bug.heat, Bug.heat_last_updated, Bug.id, Bug.latest_patch_uploaded, Bug.message_count, Bug.name, Bug.number_of_duplicates, Bug.owner, Bug.private, Bug.security_related, Bug.title, Bug.users_affected_count, Bug.users_unaffected_count, Bug.who_made_private FROM BugTask JOIN Bug ON BugTask.bug = Bug.id WHERE Bug.id = BugTask.bug AND BugTask.distribution = 1 AND ((BugTask.status = 10) OR (BugTask.status = 15) OR (BugTask.status = 20) OR (BugTask.status = 21) OR (BugTask.status = 22) OR (BugTask.status = 25)) AND Bug.duplicateof is NULL AND \n BugTask.id IN (\n SELECT BugTask.id FROM BugTask, BugAffectsPerson\n WHERE BugTask.bug = BugAffectsPerson.bug\n AND BugAffectsPerson.person = 67034\n AND BugAffectsPerson.affected = TRUE\n )\n AND \n (Bug.private = FALSE OR EXISTS (\n SELECT BugSubscription.bug\n FROM BugSubscription, TeamParticipation\n WHERE TeamParticipation.person = 67034 AND\n TeamParticipation.team = BugSubscription.person AND\n BugSubscription.bug = Bug.id\n UNION\n SELECT BugTask.bug\n FROM BugTask, TeamParticipation\n WHERE TeamParticipation.person = 67034 AND\n TeamParticipation.team = BugTask.assignee AND\n BugTask.bug = Bug.id\n \n UNION\n SELECT BugTask.bug\n FROM BugTask, TeamParticipation, Product\n WHERE TeamParticipation.person = 67034 AND\n TeamParticipation.team = Product.owner AND\n BugTask.product = Product.id AND\n BugTask.bug = Bug.id AND\n Bug.security_related IS False\n UNION\n SELECT BugTask.bug\n FROM BugTask, TeamParticipation, ProductSeries\n WHERE TeamParticipation.person = 67034 AND\n TeamParticipation.team = ProductSeries.owner AND\n BugTask.productseries = ProductSeries.id AND\n BugTask.bug = Bug.id AND\n Bug.security_related IS False\n UNION\n SELECT BugTask.bug\n FROM BugTask, TeamParticipation, Distribution\n WHERE TeamParticipation.person = 67034 AND\n TeamParticipation.team = Distribution.owner AND\n BugTask.distribution = Distribution.id AND\n BugTask.bug = Bug.id AND\n Bug.security_related IS False\n UNION\n SELECT BugTask.bug\n FROM BugTask, TeamParticipation, DistroSeries, Distribution\n WHERE TeamParticipation.person = 67034 AND\n TeamParticipation.team = Distribution.owner AND\n DistroSeries.distribution = Distribution.id AND\n BugTask.distroseries = DistroSeries.id AND\n BugTask.bug = Bug.id AND\n Bug.security_related IS False\n \n ))\n ORDER BY Bug.date_last_updated DESC LIMIT 76 OFFSET 0'<br /> Parameters:()<br /> Original error: QueryCanceledError('canceling statement due to statement timeout\n',)<br />

Related branches

Martin Pool (mbp)
description: updated
Revision history for this message
Stuart Bishop (stub) wrote :
Download full text (3.4 KiB)

launchpad_prod_3=# explain analyze SELECT BugTask.*, Bug.*
launchpad_prod_3-# FROM BugTask JOIN Bug ON BugTask.bug = Bug.id
launchpad_prod_3-# WHERE Bug.id = BugTask.bug
launchpad_prod_3-# AND BugTask.product = 10294
launchpad_prod_3-# AND (
launchpad_prod_3(# (BugTask.status = 10)
launchpad_prod_3(# OR (BugTask.status = 15) OR (BugTask.status = 20)
launchpad_prod_3(# OR (BugTask.status = 21) OR (BugTask.status = 22)
launchpad_prod_3(# OR (BugTask.status = 25))
launchpad_prod_3-# AND Bug.duplicateof is NULL AND
launchpad_prod_3-# BugTask.id IN (
launchpad_prod_3(# SELECT BugTask.id FROM BugTask, BugAffectsPerson
launchpad_prod_3(# WHERE
launchpad_prod_3(# BugTask.bug = BugAffectsPerson.bug
launchpad_prod_3(# AND BugAffectsPerson.person = 78
launchpad_prod_3(# AND BugAffectsPerson.affected = TRUE)
launchpad_prod_3-# ORDER BY Bug.date_last_updated DESC LIMIT 76 OFFSET 0;
                                                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=1950.09..7546.34 rows=76 width=1400) (actual time=155.107..9665.562 rows=58 loops=1)
   -> Nested Loop Semi Join (cost=1950.09..660246.29 rows=8940 width=1400) (actual time=155.106..9665.506 rows=58 loops=1)
         Join Filter: (public.bugtask.id = public.bugtask.id)
         -> Nested Loop (cost=0.00..556112.00 rows=8940 width=1400) (actual time=65.912..8744.850 rows=5772 loops=1)
               -> Index Scan Backward using bug__date_last_updated__idx on bug (cost=0.00..263671.56 rows=677083 width=1116) (actual time=0.146..5893.546 rows=677338 loops=1)
                     Filter: (duplicateof IS NULL)
               -> Index Scan using bugtask__product__bug__key on bugtask (cost=0.00..0.42 rows=1 width=284) (actual time=0.004..0.004 rows=0 loops=677338)
                     Index Cond: ((public.bugtask.product = 10294) AND (public.bugtask.bug = bug.id))
                     Filter: ((public.bugtask.status = 10) OR (public.bugtask.status = 15) OR (public.bugtask.status = 20) OR (public.bugtask.status = 21) OR (public.bugtask.status = 22) OR (public.bugtask.status = 25))
         -> Materialize (cost=1950.09..1955.17 rows=508 width=4) (actual time=0.003..0.065 rows=480 loops=5772)
               -> Nested Loop (cost=6.70..1949.58 rows=508 width=4) (actual time=14.713..42.650 rows=482 loops=1)
                     -> Bitmap Heap Scan on bugaffectsperson (cost=6.70..571.40 rows=357 width=4) (actual time=14.601..23.301 rows=345 loops=1)
                           Recheck Cond: (person = 78)
                           Filter: affected
                           -> Bitmap Index Scan on bugaffectsperson__person__idx (cost=0.00..6.62 rows=361 width=0) (actual time=14.410..14.410 rows=346 loops=1)
                         ...

Read more...

Revision history for this message
Stuart Bishop (stub) wrote :

Rewriting to the following is much faster (30ms vs 9s):

SELECT BugTask.*, Bug.*
FROM BugTask, Bug
WHERE
    Bug.id = BugTask.bug
    AND BugTask.product = 10294
    AND BugTask.status IN (10, 15, 20, 21, 22, 25)
    AND Bug.duplicateof IS NULL
    AND Bug.id IN (
        SELECT BugAffectsPerson.bug
        WHERE BugAffectsPerson.person = 78
            AND BugAffectsPerson.affected IS TRUE)
ORDER BY Bug.date_last_updated DESC LIMIT 76 OFFSET 0;

Revision history for this message
Stuart Bishop (stub) wrote :

The following is faster still, as we know BugAffectsPerson contains at most one row per (person,bug) so can use a normal join instead of a subquery:

SELECT BugTask.*, Bug.*
FROM BugTask, Bug, BugAffectsPerson
WHERE
    Bug.id = BugTask.bug
    AND BugTask.product = 10294
    AND BugTask.status IN (10, 15, 20, 21, 22, 25)
    AND Bug.duplicateof IS NULL
    AND Bug.id = BugAffectsPerson.bug
    AND BugAffectsPerson.person = 78
    AND BugAffectsPerson.affected IS TRUE
ORDER BY Bug.date_last_updated DESC LIMIT 76 OFFSET 0;

Revision history for this message
Martin Pool (mbp) wrote :

stub, thanks for looking in to this.

We did test something similar to your #3 on production prior to landing, and found that it would still sometimes take several seconds. So I'm not sure if it will be sufficiently faster, but we could try it.

Revision history for this message
Martin Pool (mbp) wrote :

https://bugs.launchpad.net/~/+affectingbugs times out too, and is doing a similar query (but from a different place in the code.)

spm measured stub's suggestion #3 on prod and got 109ms which is not too bad.

Revision history for this message
Martin Pool (mbp) wrote :

but, on other servers, 223ms on prod1 and 1995ms on prod2.

On another attempt on prod3 it was down to 98ms.

I don't know why it sometimes blows out so badly; I guess it's having to do physical io?

Revision history for this message
Robert Collins (lifeless) wrote :

Sorting by something on bugtask (e.g. importance our default) might improve things too.

The index_scan_backwards was the primary cause of slowness : postgresql will sometimes stubbornly use a matching sort index when one exists even though the data being shown may not be current on that index (which leads to loads of misses).

Revision history for this message
Robert Collins (lifeless) wrote :

SELECT BugTask.*, Bug.*
FROM BugTask, Bug, BugAffectsPerson
WHERE
    Bug.id = BugTask.bug
    AND BugTask.product = 10294
    AND BugTask.status IN (10, 15, 20, 21, 22, 25)
    AND Bug.duplicateof IS NULL
    AND Bug.id = BugAffectsPerson.bug
    AND BugAffectsPerson.person = 78
    AND BugAffectsPerson.affected IS TRUE
ORDER BY BugTask.importance DESC, BugTask.bug LIMIT 76 OFFSET 0;

Revision history for this message
Robert Collins (lifeless) wrote :

Actually, we need bugtask.id in there to make it completely stable:
SELECT BugTask.*, Bug.*
FROM BugTask, Bug, BugAffectsPerson
WHERE
    Bug.id = BugTask.bug
    AND BugTask.product = 10294
    AND BugTask.status IN (10, 15, 20, 21, 22, 25)
    AND Bug.duplicateof IS NULL
    AND Bug.id = BugAffectsPerson.bug
    AND BugAffectsPerson.person = 78
    AND BugAffectsPerson.affected IS TRUE
ORDER BY BugTask.importance DESC, BugTask.bug, BugTask.id LIMIT 76 OFFSET 0;

Revision history for this message
Robert Collins (lifeless) wrote :

same plan.
take #2: bugaffectsperson has 1M rows in it.

set enable_bitmapscan = 0;
SELECT BugTask.*, Bug.*
FROM BugTask, Bug, BugAffectsPerson
WHERE
    Bug.id = BugTask.bug
    AND BugTask.product = 10294
    AND BugTask.status IN (10, 15, 20, 21, 22, 25)
    AND Bug.duplicateof IS NULL
    AND Bug.id = BugAffectsPerson.bug
    AND BugAffectsPerson.person = 78
    AND BugAffectsPerson.affected IS TRUE
ORDER BY Bug.date_last_updated DESC LIMIT 76 OFFSET 0;

 Limit (cost=2957.31..2957.31 rows=3 width=1520) (actual time=3.618..3.642 rows=64 loops=1)
   -> Sort (cost=2957.31..2957.31 rows=3 width=1520) (actual time=3.617..3.624 rows=64 loops=1)
         Sort Key: bug.date_last_updated
         Sort Method: quicksort Memory: 122kB
         -> Nested Loop (cost=0.00..2957.28 rows=3 width=1520) (actual time=0.098..3.364 rows=64 loops=1)
               -> Nested Loop (cost=0.00..2931.64 rows=4 width=407) (actual time=0.083..2.663 rows=73 loops=1)
                     -> Index Scan using bugaffectsperson__person__idx on bugaffectsperson (cost=0.00..941.40 rows=336 width=4) (actual time=0.029..0.665 rows=345 loops=1)
                           Index Cond: (person = 78)
                           Filter: (affected IS TRUE)
                     -> Index Scan using bugtask__product__bug__key on bugtask (cost=0.00..5.91 rows=1 width=403) (actual time=0.005..0.005 rows=0 loops=345)
                           Index Cond: ((bugtask.product = 10294) AND (bugtask.bug = bugaffectsperson.bug))
                           Filter: (bugtask.status = ANY ('{10,15,20,21,22,25}'::integer[]))
               -> Index Scan using bug_pkey on bug (cost=0.00..6.40 rows=1 width=1117) (actual time=0.007..0.008 rows=1 loops=73)
                     Index Cond: (bug.id = bugtask.bug)
                     Filter: (bug.duplicateof IS NULL)
 Total runtime: 3.800 ms

Revision history for this message
Robert Collins (lifeless) wrote :

We've seen similar issues with other bitmap scans, perhaps we should review our cost parameters.

Revision history for this message
Martin Pool (mbp) wrote : Re: [Bug 866100] Re: bug search with affects_me=on times out

On 5 October 2011 10:54, Robert Collins <email address hidden> wrote:
> Sorting by something on bugtask (e.g. importance our default) might
> improve things too.

Thanks, that should be easy to change. I'm pretty sure it's a less
useful ui (since neglected Unknown/Undecided bugs sort to the top, and
 the stories around this emphasize recent change) but it's better than
timing out.

However, the per-person affectingbugs list already is sorted by
importance and it also times out, so I'm not sure that will be enough

Revision history for this message
Martin Pool (mbp) wrote :

bug 868021 for the other somewhat different timeout

Revision history for this message
Martin Pool (mbp) wrote :

I'll try to fix this on the weekend or early next week, based on stub's ideas, but if anyone wants to steal it please do.

Changed in launchpad:
assignee: nobody → Martin Pool (mbp)
Martin Pool (mbp)
Changed in launchpad:
status: Triaged → In Progress
Revision history for this message
Martin Pool (mbp) wrote :

That query seems to be built by the fairly epic BugTaskSet.buildQuery, perhaps I can tweak it in that.

Revision history for this message
Martin Pool (mbp) wrote :

with my patch this generates

SELECT BugTask.status, BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released, BugTask.date_incomplete, BugTask.date_inprogress, BugTask.date_left_closed, BugTask.date_left_new, BugTask.date_triaged, BugTask.datecreated, BugTask.distribution, BugTask.distroseries, BugTask.heat, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.owner, BugTask.product, BugTask.productseries, BugTask.sourcepackagename, BugTask.targetnamecache, Bug.date_last_message, Bug.date_last_updated, Bug.date_made_private, Bug.datecreated, Bug.description, Bug.duplicateof, Bug.heat, Bug.heat_last_updated, Bug.id, Bug.latest_patch_uploaded, Bug.message_count, Bug.name, Bug.number_of_duplicates, Bug.owner, Bug.private, Bug.security_related, Bug.title, Bug.users_affected_count, Bug.users_unaffected_count, Bug.who_made_private
FROM BugTask
LEFT JOIN Product ON BugTask.product = Product.id AND Product.active
JOIN BugAffectsPerson ON BugTask.bug = BugAffectsPerson.bug AND BugAffectsPerson.affected AND BugAffectsPerson.person = 16
JOIN Bug ON BugTask.bug = Bug.id WHERE Bug.id = BugTask.bug AND ((BugTask.status = 10) OR BugTask.status IN (15,13,14) OR (BugTask.status = 20) OR (BugTask.status = 21) OR (BugTask.status = 22) OR (BugTask.status = 25)) AND Bug.duplicateof is NULL AND (Bugtask.product IS NULL OR Product.active = TRUE) ORDER BY BugTask.importance DESC, BugTask.id LIMIT 8 OFFSET 0

Revision history for this message
Martin Pool (mbp) wrote :

gnuoy measures, on

930ms (cold)
669ms (cold, different machine)
349ms (hot)

so it seems using the join may help enough.

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Revision history for this message
Robert Collins (lifeless) wrote :

sample url from the report timed out completely cold on qastaging, worked second try (4.93 seconds), third try 2.66 seconds - its not lightning fast, but it is fast enough to be a consistent improvement AFAICT

tags: added: qa-ok
removed: qa-needstesting
Revision history for this message
Martin Pool (mbp) wrote :

qa:

https://bugs.qastaging.launchpad.net/launchpad/+bugs?search=Search&field.affects_me=on

still works, failed cold, succeeded hot in 3.3s total

https://bugs.qastaging.launchpad.net/bzr/+bugs?search=Search&field.affects_me=on
succeeded "cold" (but with bugaffectsuser probably hot) in 3.3s

https://bugs.qastaging.launchpad.net/ubuntu/+bugs?search=Search&field.affects_me=on
failed cold, worked after a couple of tries , in 4.35s

So I think this change is at least not a regression; we'll see how it does on production.

William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.