[TOPBLOCKER] UI randomly freezes

Bug #1377332 reported by Chris Gagnon
This bug affects 10 people
Affects                         Status        Importance  Assigned to  Milestone
Canonical System Image          Fix Released  Critical    Unassigned
cgmanager (Ubuntu)              Fix Released  Critical    Unassigned
cgmanager (Ubuntu RTM)          Fix Released  Critical    Unassigned
ubuntu-app-launch (Ubuntu)      Fix Released  Critical    Ted Gould
ubuntu-app-launch (Ubuntu RTM)  Fix Released  Critical    Ted Gould
unity8 (Ubuntu)                 Invalid       Critical    Unassigned
unity8 (Ubuntu RTM)             Invalid       Critical    Unassigned

Bug Description

Summary
UI randomly freezes

steps:
1. unlock greeter
2. open app
3. pull down random indicator
4. close indicator
5. open app
6. swipe to view all open apps
7. open launcher
8. close some open apps
9. repeat until ui freezes

device info:
[service]
base: system-image.ubuntu.com
http_port: 80
https_port: 443
channel: ubuntu-touch/ubuntu-rtm/14.09-proposed
device: krillin
build_number: 79
version_detail: ubuntu=20141002,device=20141002-d5938d7,custom=1412208099,version=79

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

top -n1 -b
top - 22:19:50 up 51 min, 1 user, load average: 0.00, 0.01, 1.48
Tasks: 243 total, 2 running, 226 sleeping, 15 stopped, 0 zombie
%Cpu(s): 17.9 us, 38.7 sy, 0.3 ni, 41.6 id, 1.5 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 983760 total, 947896 used, 35864 free, 6208 buffers
KiB Swap: 524284 total, 507596 used, 16688 free. 79288 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 2724 phablet 20 0 700876 54316 11120 S 95.0 5.5 39:13.38 unity8-dash
 9502 phablet 20 0 6088 1112 796 R 15.8 0.1 0:00.07 top
 2005 root 20 0 241664 5252 4288 S 5.3 0.5 1:05.29 unity-syst+
    1 root 20 0 3896 2032 1008 S 0.0 0.2 0:07.81 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.10 kthreadd
    3 root 20 0 0 0 0 S 0.0 0.0 0:00.75 ksoftirqd/0
    5 root 20 0 0 0 0 S 0.0 0.0 0:00.05 kworker/u:0
    6 root -99 0 0 0 0 S 0.0 0.0 0:00.59 migration/0
   16 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 khelper
   17 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kdevtmpfs
   18 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 netns
   19 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 fs_sync
   20 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 suspend
   21 root 20 0 0 0 0 S 0.0 0.0 0:00.02 sync_supers
   22 root 20 0 0 0 0 S 0.0 0.0 0:00.00 bdi-default
   23 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kblockd
   24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khubd
   26 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 cfg80211
   27 root 20 0 0 0 0 S 0.0 0.0 0:00.00 pmic_threa+
   28 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 emi_mpu
   29 root 20 0 0 0 0 S 0.0 0.0 0:25.79 kswapd0
   30 root 20 0 0 0 0 S 0.0 0.0 0:00.00 fsnotify_m+
   31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ecryptfs-k+
   32 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 crypto
   53 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 uether
   54 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 binder
   55 root rt 0 0 0 0 S 0.0 0.0 0:00.00 hang_detect
   56 root 20 0 0 0 0 S 0.0 0.0 0:00.09 ion_mm_heap
   58 root 20 0 0 0 0 S 0.0 0.0 0:01.83 bat_thread+
   59 root 20 0 0 0 0 S 0.0 0.0 0:00.15 mtk charge+
   60 root -81 0 0 0 0 S 0.0 0.0 0:00.36 btif_rxd
   61 root 20 0 0 0 0 S 0.0 0.0 0:00.00 conn-md-th+
   62 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 mtk_vibrat+
   63 root 10 -10 0 0 0 S 0.0 0.0 0:00.00 krfcommd
   64 root 20 0 0 0 0 S 0.0 0.0 0:00.00 disp_confi+
   65 root 20 0 ...

tags: added: qa-daily-testing qasoak rtm14
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

cat /var/log/syslog | grep Kill
Oct 3 16:05:04 ubuntu-phablet kernel: [ 737.087236] (1)[17901:ls]Killing 'webbrowser-app' (8441), adj 798,
Oct 3 16:41:46 ubuntu-phablet kernel: [ 648.846871] (0)[14914:sudo]Killing 'webapp-containe' (8875), adj 798,
Oct 3 16:41:47 ubuntu-phablet kernel: [ 649.342591] (0)[14937:pkill]Killing 'qmlscene' (10321), adj 798,
Oct 3 16:42:59 ubuntu-phablet kernel: [ 721.348886] (2)[17159:Chrome_InProcGp]Killing 'gallery-app' (12977), adj 798,
Oct 3 16:43:41 ubuntu-phablet kernel: [ 763.588013] (0)[18669:sudo]Killing 'webapp-containe' (13131), adj 798,
Oct 3 16:49:38 ubuntu-phablet kernel: [ 1120.751795] (0)[593:Chrome_IOThread]Killing 'qmlscene' (7549), adj 798,
Oct 3 16:49:39 ubuntu-phablet kernel: [ 1121.194017] (3)[1728:sudo]Killing 'qmlscene' (11312), adj 798,
Oct 3 16:49:40 ubuntu-phablet kernel: [ 1121.928349] (0)[7574:zeitgeist-fts]Killing 'system-settings' (14492), adj 798,
Oct 3 16:53:32 ubuntu-phablet kernel: [ 1354.795767] (2)[11259:sh]Killing 'webbrowser-app' (392), adj 798,
Oct 3 16:54:10 ubuntu-phablet kernel: [ 1392.779876] (2)[12801:compare]Killing 'qmlscene' (5663), adj 798,
Oct 3 16:54:52 ubuntu-phablet kernel: [ 1434.000606] (0)[14278:sudo]Killing 'system-settings' (2224), adj 798,
Oct 3 16:55:13 ubuntu-phablet kernel: [ 1455.366459] (3)[7416:python3]Killing 'webapp-containe' (16994), adj 798,
Oct 3 16:58:51 ubuntu-phablet kernel: [ 1673.276588] (0)[23592:sudo]Killing 'qmlscene' (16292), adj 798,
Oct 3 16:58:57 ubuntu-phablet kernel: [ 1679.017060] (2)[23834:sh]Killing 'webapp-containe' (17504), adj 798,
Oct 3 16:59:00 ubuntu-phablet kernel: [ 1681.990618] (3)[3043:unity8-dash]Killing 'webbrowser-app' (12139), adj 798,
Oct 3 16:59:01 ubuntu-phablet kernel: [ 1683.760525] (2)[3794:QQuickPixmapRea]Killing 'reminders' (12231), adj 798,
Oct 3 17:00:05 ubuntu-phablet kernel: [ 1747.033895] (1)[2692:unity8]Killing 'webbrowser-app' (24820), adj 798,
Oct 3 17:01:17 ubuntu-phablet kernel: [ 1819.695361] (3)[30019:QSGRenderThread]Killing 'qmlscene' (26395), adj 798,
Oct 3 17:01:32 ubuntu-phablet kernel: [ 1834.768949] (0)[30756:sudo]Killing 'qmlscene' (29732), adj 798,
Oct 3 17:02:01 ubuntu-phablet kernel: [ 1863.464267] (0)[32202:python3]Killing 'qmlscene' (7666), adj 798,
Oct 3 17:02:02 ubuntu-phablet kernel: [ 1864.687784] (3)[31891:qmlscene]Killing 'dialer-app' (11267), adj 798,
Oct 3 17:02:23 ubuntu-phablet kernel: [ 1885.677366] (3)[888:sudo]Killing 'qmlscene' (30554), adj 798,
Oct 3 17:02:29 ubuntu-phablet kernel: [ 1891.273684] (1)[765:Chrome_InProcGp]Killing 'camera-app' (14164), adj 798,
Oct 3 17:02:33 ubuntu-phablet kernel: [ 1895.506041] (2)[765:Chrome_InProcGp]Killing 'oxide-renderer' (12595), adj 300,
Oct 3 17:16:27 ubuntu-phablet kernel: [ 2729.313877] (1)[4467:oxide-renderer]Killing 'gallery-app' (28748), adj 798,
Oct 3 17:20:00 ubuntu-phablet kernel: [ 2942.574253] (2)[16010:system-settings]Killing 'webbrowser-app' (4042), adj 798,
Oct 3 17:20:51 ubuntu-phablet kernel: [ 2992.915412] (3)[14250:systemd-udevd]Killing 'webbrowser-app' (12506), adj 798,
Oct 3 17:37:23 ubuntu-phablet kernel: [ 775.965503] (2)[2487:maliit-server]Killing 'webapp-containe' (17151), a...
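A quick way to see which apps the low-memory killer hits most often is to tally the victims in these lines. This is only a sketch: the regular expression assumes the "Killing 'name' (pid)" layout shown in the log above, and `count_kills` is a hypothetical helper name, not part of any tool mentioned in this bug.

```python
import re
from collections import Counter

# Matches the victim part of kernel lines such as:
#   ... (1)[17901:ls]Killing 'webbrowser-app' (8441), adj 798,
# The layout is assumed from the syslog excerpt in this comment.
KILL_RE = re.compile(r"Killing '([^']+)' \((\d+)\)")


def count_kills(lines):
    """Return a Counter mapping victim process name to number of kills."""
    counts = Counter()
    for line in lines:
        match = KILL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the lines of /var/log/syslog and calling `most_common()` on the result would list the most frequently killed processes first.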

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

gdb -p $(pidof unity8)
(gdb) set logging on
(gdb) thread apply all bt

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in unity8 (Ubuntu):
status: New → Confirmed
Revision history for this message
Selene ToyKeeper (toykeeper) wrote :

I think I'm seeing this issue too, during manual use. The UI locks at random times and stays that way until I either switch to another app and back or turn the screen off/on (with pin entry).

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I can't switch to another app when this happens. I can turn the screen off and on, but the UI is still frozen in the same spot when the screen turns on and I don't get the greeter.

Revision history for this message
kevin gunn (kgunn72) wrote :

@Chris - does this only happen on the Dash ? or on an application(s) ? or does it happen on either Dash & any app ? how many different

Have you ever seen this on krillin ?

Revision history for this message
kevin gunn (kgunn72) wrote :

ah, i see krillin in the log

Revision history for this message
kevin gunn (kgunn72) wrote :

chris - could you answer the other questions tho
does this only happen on the Dash ? or on an application(s) ? or does it happen on either Dash & any app ? how many different apps does this happen with?

Changed in unity8 (Ubuntu):
status: Confirmed → Incomplete
kevin gunn (kgunn72)
Changed in unity8 (Ubuntu):
assignee: nobody → Chris Gagnon (chris.gagnon)
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I've seen this when the launcher is opening with no other apps in focus. I've also seen it with the app selector when a lot of apps are open (swipe to view all open apps and close or select a different app).

I tried to take a screenshot with mirscreencast but it just hangs and never takes a screenshot.

Changed in unity8 (Ubuntu):
status: Incomplete → Confirmed
assignee: Chris Gagnon (chris.gagnon) → kevin gunn (kgunn72)
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I just had this happen again, unity8-dash isn't hogging the cpu this time.

phablet@ubuntu-phablet:~$ top -n 1 -b
top - 19:44:36 up 45 min, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 242 total, 5 running, 236 sleeping, 1 stopped, 0 zombie
%Cpu(s): 20.6 us, 12.7 sy, 0.3 ni, 65.5 id, 0.9 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 983760 total, 955356 used, 28404 free, 7632 buffers
KiB Swap: 524284 total, 240680 used, 283604 free. 130060 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
 6581 phablet 20 0 6088 1116 796 R 12.8 0.1 0:00.06 top
 7528 phablet 20 0 24620 13120 3232 R 12.8 1.3 3:23.28 python3
 3156 phablet 20 0 617876 72788 21744 S 4.3 7.4 1:07.18 unity8-dash
    1 root 20 0 3848 2140 1048 S 0.0 0.2 0:08.21 init
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.24 kthreadd
    3 root 20 0 0 0 0 S 0.0 0.0 0:00.63 ksoftirqd/0
    6 root -99 0 0 0 0 S 0.0 0.0 0:00.37 migration/0
   16 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 khelper
   17 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kdevtmpfs
   18 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 netns
   19 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 fs_sync
   20 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 suspend
   21 root 20 0 0 0 0 S 0.0 0.0 0:00.04 sync_supers
   22 root 20 0 0 0 0 S 0.0 0.0 0:00.00 bdi-default
   23 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kblockd
   24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 khubd
   26 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 cfg80211
   27 root 20 0 0 0 0 S 0.0 0.0 0:00.45 pmic_thread+
   28 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 emi_mpu
   29 root 20 0 0 0 0 S 0.0 0.0 0:08.09 kswapd0
   30 root 20 0 0 0 0 S 0.0 0.0 0:00.02 fsnotify_ma+
   31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ecryptfs-kt+
   32 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 crypto
   46 root 20 0 0 0 0 S 0.0 0.0 0:05.66 kworker/u:1
   53 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 uether
   54 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 binder
   55 root rt 0 0 0 0 S 0.0 0.0 0:00.00 hang_detect
   56 root 20 0 0 0 0 S 0.0 0.0 0:00.10 ion_mm_heap
   58 root 20 0 0 0 0 S 0.0 0.0 0:01.99 bat_thread_+
   59 root 20 0 0 0 0 S 0.0 0.0 0:00.17 mtk charger+
   60 root -81 0 0 0 0 S 0.0 0.0 0:00.46 btif_rxd
   61 root 20 0 0 0 0 S 0.0 0.0 0:00.00 conn-md-thr+
   62 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 mtk_vibrator
   63 root 10 -10 0 0 0 S 0.0 0.0 0:00.00 krfcommd
   64 root 20 0 0 ...

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

phablet@ubuntu-phablet:~$ free -h
             total   used   free  shared  buffers  cached
Mem:          960M   923M    37M     22M      10M    180M
-/+ buffers/cache:   732M   228M
Swap:         511M   323M   188M

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

The only thing interesting in syslog is:

Oct 7 19:07:37 ubuntu-phablet dbus[2528]: apparmor="DENIED" operation="dbus_method_call" bus="session" path="/com/canonical/QtMir/Clipboard" interface="org.freedesktop.DBus.Introspectable" member="Introspect" mask="send" name="com.canonical.QtMir" pid=10952 profile="com.ubuntu.developer.webapps.webapp-twitter_webapp-twitter_1.0.18" peer_pid=2870 peer_profile="unconfined"
Oct 7 19:07:37 ubuntu-phablet dbus[2528]: apparmor="DENIED" operation="dbus_method_call" bus="session" path="/com/canonical/QtMir/Clipboard" interface="com.canonical.QtMir.Clipboard" member="GetContents" mask="send" name="com.canonical.QtMir" pid=10952

There is nothing in the log for the OOM killer for today.

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Gerry Boland (gerboland) wrote :

From reading the backtrace in comment 3, I'm guessing that Thread 3 is the Qt GUI thread, which is blocked on a WaitCondition.
Thread 15 is a Mir thread and is blocked in SessionAuthorizer::requestAuthorizationForSession, which is waiting for the Qt GUI thread to reply to it.
Since the GUI thread is blocked, Mir is blocked at that point too.

Why the Qt GUI thread is blocked is still an open question. A more detailed backtrace would help; I'd start by installing qtdeclarative5-dbg and qtbase5-dbg. The dbus DENIED messages are interesting too.

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I was able to reproduce this without autopilot. I installed qtdeclarative5-dbg and qtbase5-dbg, but the stacktrace was worse than before:

Thread 1 (process 2255):
#0 0xb60fd8e4 in ?? ()
#1 0xb6174a38 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

description: updated
Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> I was able to reproduce this without autopilot. I installed qtdeclarative5-dbg and qtbase5-dbg, but the stacktrace was worse than before:

Many such "corrupt" traces can be sorted out if you also install libc6-dbg, libglib2.0-0-dbg and libdbus-glib-1-2-dbg

Revision history for this message
Daniel d'Andrada (dandrader) wrote :

> 3. pull down random notification
> 4. close notification

Do you mean "notification" or "indicator"? I.e., pull down the indicator panel from the top edge.

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I meant indicator

description: updated
description: updated
kevin gunn (kgunn72)
Changed in unity8 (Ubuntu):
assignee: kevin gunn (kgunn72) → Gerry Boland (gerboland)
importance: Undecided → High
tags: added: touch-2014-10-23
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

This time when the ui froze there were no OOM events. I was able to get a stacktrace with qtdeclarative5-dbg and qtbase5-dbg installed. The random gesture test had been running for over an hour.

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :
Revision history for this message
Gerry Boland (gerboland) wrote :

This is a backtrace of my last hang: http://pastebin.ubuntu.com/8566086/
which indicates that a synchronous dbus call is blocked. Why is still a mystery to me, as dbus appears to be live.

Revision history for this message
Gerry Boland (gerboland) wrote :

qtmir.applications: TaskController::isApplicationPid - appId= "com.ubuntu.calculator_calculator" pid= 18567
is where the above appears to be stuck

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

stacktrace with qtdeclarative5-dbg qtbase5-dbg libc6-dbg libglib2.0-0-dbg and libdbus-glib-1-2-dbg installed

kevin gunn (kgunn72)
tags: added: ota-1
removed: touch-2014-10-23
Changed in unity8 (Ubuntu):
assignee: Gerry Boland (gerboland) → Josh Arenson (josharenson)
Changed in unity8 (Ubuntu RTM):
importance: Undecided → High
assignee: nobody → Josh Arenson (josharenson)
Changed in unity8 (Ubuntu):
status: Confirmed → In Progress
Revision history for this message
Michał Sawicz (saviq) wrote :

I got this today when trying to unlock the phone, got some symbols: http://pastebin.ubuntu.com/8942307/

Revision history for this message
Michał Sawicz (saviq) wrote :

From the trace above it looks like recvmsg blocks in thread 29, causing the rest of the shell to go to sleep.

Revision history for this message
Michał Sawicz (saviq) wrote :

More symbols in thread 1 http://pastebin.ubuntu.com/8942528/ revealed that the connection to cgmanager failed. There were also errors in cgmanager.log http://pastebin.ubuntu.com/8942980/

Revision history for this message
Ted Gould (ted) wrote :

Per discussion on IRC we're going to make the connections time out if they can't be made. This makes the situation more robust, in that the user can keep working. But they might lose a resume or pausing of an app because we can't get PIDs for it. We can then look at this bug more carefully in a less critical manner.
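The idea of giving up on a connection instead of blocking forever can be sketched as follows. This is not the ubuntu-app-launch change itself, only an illustration in Python: `connect_with_timeout` is a hypothetical helper, and the 1 second default is taken from the changelog entry for this bug.

```python
import socket


def connect_with_timeout(path, timeout=1.0):
    """Try to connect to a Unix socket; return the socket, or None on failure.

    The timeout also applies to later send/recv calls on the returned
    socket, so a peer that stops answering cannot block the caller forever.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except OSError:
        # Covers timeouts, a missing socket file and refused connections.
        sock.close()
        return None
    return sock
```

A caller that gets None back can report a recoverable error and carry on, at the cost of skipping whatever needed the PID list.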

Changed in ubuntu-app-launch (Ubuntu):
assignee: nobody → Ted Gould (ted)
Changed in ubuntu-app-launch (Ubuntu RTM):
assignee: nobody → Ted Gould (ted)
Olli Ries (ories)
Changed in unity8 (Ubuntu RTM):
importance: High → Critical
Changed in ubuntu-app-launch (Ubuntu RTM):
importance: Undecided → Critical
summary: - UI randomly freezes
+ [TOPBLOCKER] UI randomly freezes
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hi,

the "no such process" message at sendmsg in cgmanager.log suggests that cgmanager has died. Could you edit /etc/default/cgmanager to have the line

cgmanager_opts="--debug"

then restart cgmanager, reproduce this, and upload the whole log?

Is this happening inside a container that has a cgproxy, or on the host?

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu-app-launch (Ubuntu):
status: New → Confirmed
Olli Ries (ories)
tags: added: touch-2014-11-06
removed: ota-1
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

(Marked as affecting cgmanager since cgmanager seems to be dying, but need more info to debug)

Changed in cgmanager (Ubuntu):
status: New → Confirmed
importance: Undecided → Critical
status: Confirmed → Incomplete
Olli Ries (ories)
Changed in unity8 (Ubuntu RTM):
status: New → In Progress
Ted Gould (ted)
Changed in ubuntu-app-launch (Ubuntu):
status: Confirmed → In Progress
Changed in ubuntu-app-launch (Ubuntu RTM):
status: New → In Progress
Changed in ubuntu-app-launch (Ubuntu):
importance: Undecided → High
Olli Ries (ories)
Changed in unity8 (Ubuntu):
importance: High → Critical
Changed in ubuntu-app-launch (Ubuntu):
importance: High → Critical
Revision history for this message
kevin gunn (kgunn72) wrote :

marking unity8 as opinion for the moment, since there's a targeted fix in app-launch

Changed in unity8 (Ubuntu RTM):
status: In Progress → Opinion
Changed in unity8 (Ubuntu):
status: In Progress → Opinion
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-app-launch - 0.4+15.04.20141118-0ubuntu1

---------------
ubuntu-app-launch (0.4+15.04.20141118-0ubuntu1) vivid; urgency=low

  [ Ted Gould ]
  * Attenting to connect to cgmanager for 1 second and then failing and
    reporting a recoverable error. (LP: #1377332)
 -- Ubuntu daily release <email address hidden> Tue, 18 Nov 2014 21:10:45 +0000

Changed in ubuntu-app-launch (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-app-launch - 0.4+15.04.20141118~rtm-0ubuntu1

---------------
ubuntu-app-launch (0.4+15.04.20141118~rtm-0ubuntu1) 14.09; urgency=low

  [ Ted Gould ]
  * Attenting to connect to cgmanager for 1 second and then failing and
    reporting a recoverable error. (LP: #1377332)
 -- Ubuntu daily release <email address hidden> Tue, 18 Nov 2014 21:09:39 +0000

Changed in ubuntu-app-launch (Ubuntu RTM):
status: In Progress → Fix Released
Revision history for this message
Max Brustkern (nuclearbob) wrote :

Here's a stacktrace from a mako.

Revision history for this message
Max Brustkern (nuclearbob) wrote :

The mako stacktrace is from devel-proposed build 25.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@Max,

I don't see cgmanager specified in the gdb log.

Could you still try comment #38?

tags: added: lt-category-visible
tags: added: lt-blocker
Revision history for this message
Stéphane Graber (stgraber) wrote :

21:46 < jibel> stgraber, the problem is still there with cgmanager 0.32-4ubuntu1
21:47 < stgraber> jibel: now that's something I wasn't hoping to hear :)
21:47 < stgraber> jibel: so I'm guessing the same dbus-send test hangs against cgmanager/sock and not against cgmanager.lower/sock, right?
21:47 < jibel> and the logs are not more interesting to read
21:48 < jibel> stgraber, yes, same result.
21:49 < stgraber> jibel: ok, what happens if you "pkill -9 cgproxy", does that fix it?
21:51 < jibel> stgraber, it does
21:51 < ogra> wow, i didnt know you could tear it out uderneath without losing anything
21:52 < stgraber> ok, so that seems to confirm that it's the proxy getting confused somehow
21:52 < stgraber> ogra: yeah, you can shoot them in the head, they'll respawn and nobody should notice, we had to implement that so you can dist-upgrade the host and not kill all your containers in the process
21:52 < ogra> neat
21:53 < ogra> so worst case we could just introduce a watchdog as workaround
21:53 < stgraber> so one thing I notice in the log and that's not supposed to really be a problem but may be making things worse at the moment is that we're dealing with a long running connection to cgproxy there which isn't very usual
21:53 < stgraber> upstart and systemd-logind both establish a new connection every time they need to talk to it rather than keeping a long running one around
21:54 < ogra> who is that keeping the constant connection atm ? upstart-app-launch ?
21:54 < stgraber> it may be worth patching whatever's talking to cgmanager to do that too and see if that helps. Don't get me wrong, there's still a cgmanager/cgproxy bug somewhere, but short of having hallyn around, this may take a while to track down so if connect/disconnect does the trick, that may be an easy way out.
21:55 < ogra> ted, ^^^ ?
21:55 < stgraber> don't know, I don't think I've seen a log with the app name, only its pid :) but yeah, whatever calls gettasks all the time :)
21:55 < ogra> well either qtmir or UAL
21:55 < ogra> i guess
21:58 < pmcgowan> stgraber, ogra good stuff, can we get all this in the bug report?
21:58 < stgraber> probably, what's the bug # for that again? I was given at least two different ones for this this morning

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

from 1394919

I tried with cgmanager 0.32-4ubuntu1 and the problem remains.

The following command hangs:
dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager/sock --type=method_call /org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.Ping "int32:1"

The following command:
dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager.lower/sock --type=method_call /org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.Ping "int32:1"
returns:
method return sender=(null sender) -> dest=(null destination) reply_serial=1

The system can be recovered by killing cgproxy
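Because the first command simply hangs, a scripted check needs its own deadline. The sketch below wraps the Ping call quoted above in a timeout; `responds_within` and `ping_command` are hypothetical helper names, and the 5 second default is an arbitrary assumption.

```python
import subprocess


def responds_within(cmd, timeout=5.0):
    """Return True if cmd exits with status 0 within timeout seconds."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return False  # the call hung
    except OSError:
        return False  # the command could not be started
    return result.returncode == 0


def ping_command(socket_path):
    """Build the dbus-send Ping command from this comment for one socket."""
    return [
        "dbus-send",
        "--print-reply",
        f"--address=unix:path={socket_path}",
        "--type=method_call",
        "/org/linuxcontainers/cgmanager",
        "org.linuxcontainers.cgmanager0_0.Ping",
        "int32:1",
    ]
```

On an affected device, `responds_within(ping_command("/sys/fs/cgroup/cgmanager/sock"))` would be expected to return False while the same check against `/sys/fs/cgroup/cgmanager.lower/sock` returns True.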

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

From stgraber on 1394919

On the next hang, please report the following:
 - dbus-send to both bus addresses (confirming the hang)
 - ls -lh /proc/$(pidof cgmanager)/fd/
 - ls -lh /proc/$(pidof cgproxy)/fd/
 - gdb -p $(pidof cgmanager) -ex bt
 - gdb -p $(pidof cgproxy) -ex bt
 - dmesg
 - free
 - /var/log/upstart/cgmanager.log
 - /var/log/upstart/cgproxy.log
 - ps aux | grep cgmanager
 - ps aux | grep cgproxy

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

Here are all the logs you requested. The first dbus-send command hangs and the second works.
Perhaps it is a coincidence, but the problem seems to be triggered when the system is running out of memory, has killed webapp-container and tries to restore the gmail webapp (or maybe any other webapp).

In syslog webapp-container is killed at
Nov 25 08:24:23 ubuntu-phablet kernel: [16138.987397]Killing 'webapp-containe' (23587), adj 900,

This almost matches the time of the apport crash (a symptom of this bug):
$ grep ^Date /var/crash/_usr_share_apport_recoverable_problem.32011.crash
Date: Tue Nov 25 08:24:50 2014

Changed in cgmanager (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Stéphane Graber (stgraber) wrote :

Just had a chat with Serge and we've got a theory.

cgmanager sends tasks to cgproxy using ucreds. ucreds contain a pid, uid and gid, all of which must be valid. If however the pid gets killed and dies between the time the ucred is generated and the time it's sent over the socket, it'll be invalid and the kernel will refuse to send the message returning the odd error we're seeing. This then makes cgmanager give up and keeps cgproxy hanging in recvmsg.

The way around this is to have both cgmanager and cgproxy check for error on sendmsg and recvmsg, then check the errno and if that matches the "pid no longer exists" case, then just ignore that entry as it means the process has now died and so shouldn't be reported anyway.
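The error handling described here can be sketched for the sending side only. This is an illustration of the logic, not the cgmanager patch (cgmanager is not written in Python): `send_one` is a hypothetical stand-in for the sendmsg call that attaches the pid as a ucred, and treating ESRCH ("No such process") as the "pid no longer exists" errno is an assumption based on the message seen in cgmanager.log.

```python
import errno


def send_tasks(pids, send_one):
    """Send each pid via send_one(pid), skipping pids that have died.

    send_one is a hypothetical callable standing in for the sendmsg()
    call that carries the pid as a ucred. Returns the pids actually sent.
    """
    sent = []
    for pid in pids:
        try:
            send_one(pid)
        except OSError as exc:
            if exc.errno == errno.ESRCH:
                # The task exited after it was listed, so it should not
                # be reported anyway: skip it instead of giving up.
                continue
            raise  # any other error is still fatal
        sent.append(pid)
    return sent
```

The receiving side would need a matching rule so that it does not sit in recvmsg waiting for an entry that was skipped.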

There probably are some more similar races here and there in cgmanager/cgproxy when sending processes over ucreds, but getting a patch for that case (assuming the theory is right) shouldn't be terribly difficult.

It should also be reasonably easy to construct a testcase which hits that specific problem by spawning say around a thousand processes, then killing them all while doing gettasks in an infinite loop (which is a good approximation of what libual does today).
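A rough sketch of such a test case follows. It is an approximation under stated assumptions: `churn` is a hypothetical helper, `poll` is a caller-supplied callable standing in for the gettasks request, and the polling is interleaved with the kills rather than run in a separate infinite loop.

```python
import os
import signal


def churn(rounds, batch, poll):
    """Fork `batch` children per round, kill them, and poll while they die.

    poll is a caller-supplied callable standing in for a gettasks
    request. Returns the number of children spawned and reaped.
    """
    reaped = 0
    for _ in range(rounds):
        children = []
        for _ in range(batch):
            pid = os.fork()
            if pid == 0:
                # Child: wait for the parent's SIGKILL and never fall
                # through into the parent's code.
                try:
                    while True:
                        signal.pause()
                finally:
                    os._exit(0)
            children.append(pid)
        for pid in children:
            os.kill(pid, signal.SIGKILL)
        poll()  # query the task list while the children are exiting
        for pid in children:
            os.waitpid(pid, 0)
            reaped += 1
    return reaped
```

Calling `churn` with a batch of around 1000 and a `poll` that issues the gettasks request against the cgroup the children live in would approximate the scenario described above.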

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

I tried to reproduce this with a program doing an endless loop of
1000 forks which each immediately exit, while doing an endless loop
of cgm gettasks, all inside a container. I failed to get any cgproxy
crashes or hangs however.

Will keep trying tomorrow.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

A set of packages with the proposed fix is at:

http://people.canonical.com/~serge/cgmanager-gettasks-race/

Could you please install those and let us know whether they fix the issue?

Revision history for this message
Oliver Grawert (ogra) wrote :

serge: not really ... we need armhf binaries on the phone (which would be fine thanks to your debdiff we could build that in a silo, but ...) the RTM distro is on 0.30 (and i think stuck there since we can not easily upgrade to latest lxc if i understood stephane correctly) ... https://launchpad.net/ubuntu-rtm/+source/cgmanager

Revision history for this message
Oliver Grawert (ogra) wrote :

after testing the packages serge offered above from https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu-rtm/landing-003 for a day i must say that the device feels generally a lot more stable. except some webapp crashes that i suspect are unrelated to cgmanager the phone was stable, did not expose any hangs, lock-ups or session restarts yet. i will go on testing this over the weekend but the first impression is really good.

Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

I confirm what Oliver said: the system is much more stable with the package in silo 3, even when it runs out of memory. The QA Team will run the Long Running Test with this package and see if it improves the average time to failure.

Revision history for this message
Stéphane Graber (stgraber) wrote :

13:14 < ogra_> stgraber, soo ... that gcmanager fix from hallyn looks reallly good, the phone seems a lot more stable ... we are wondering what the implications are if we would use the 0.32 cgmmanager in rtm though ... would there be any drawbacks with the old lxc version we use ?
13:15 < sil2100> stgraber: hey! We've been wondering, would everything be ok to use the new cgmanager (from utopic) in rtm?
15:15 < stgraber> ogra_, sil2100: I don't expect any problem with current LXC and new cgmanager
15:15 < stgraber> the only time where lxc talks to cgmanager on the phone is at container startup and you've tested that a few times already. LXC itself does cgmanager API version detection at runtime so there's also no need to rebuild it or anything
15:15 < ogra_> stgraber, oh, thats awesome, could you note that in the bug ?

Olli Ries (ories)
Changed in canonical-devices-system-image:
importance: Undecided → Critical
status: New → Confirmed
milestone: none → r1
Changed in cgmanager (Ubuntu RTM):
status: New → Fix Released
importance: Undecided → Critical
Michał Sawicz (saviq)
Changed in unity8 (Ubuntu):
status: Opinion → Invalid
assignee: Josh Arenson (josharenson) → nobody
Changed in unity8 (Ubuntu RTM):
assignee: Josh Arenson (josharenson) → nobody
status: Opinion → Invalid
Changed in canonical-devices-system-image:
status: Confirmed → Fix Released
Changed in cgmanager (Ubuntu):
status: Confirmed → Fix Released