My laptop is running 2.6.35-rc2 (plus a few patches) and after not quite
four hours of being booted (3:45 to be precise), top shows that kslowd
has accumulated a surprising amount of CPU time.  From the top display:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
 6173 root      15  -5     0    0    0 S    6  0.0  10:42.91 kslowd001
 1219 root      15  -5     0    0    0 S    5  0.0  11:00.68 kslowd000
 1765 root      15  -5     0    0    0 S    5  0.0  10:59.20 kslowd002
 6784 root      15  -5     0    0    0 S    4  0.0  10:34.37 kslowd003
 4946 root      20   0  144m  34m  17m S    3  0.9   2:24.62 Xorg
This came up because I was trying to figure out why mouse response was
being so jerky. There aren't that many users of slow-work, and given
that I'm not using cifs, fscache, or gfs2, my suspicions are falling on
drivers/gpu/drm/drm_crtc_helper.c.  (Would this be something that would
be in use on a Lenovo T400 laptop?)
Is there any way to easily see what is scheduling itself to use the
slow workqueue?
Thanks,
- Ted
Theodore Ts'o <[email protected]> wrote:
> Is there any way to easily see what is scheduling itself to use the
> slow workqueue?
Slow work has debugging. Turn on:
CONFIG_SLOW_WORK_DEBUG=y
then do:
watch -n0 cat /sys/kernel/debug/slow_work/runqueue
David
David Howells <[email protected]> wrote:
> > Is there any way to easily see what is scheduling itself to use the
> > slow workqueue?
>
> Slow work has debugging. Turn on:
>
> CONFIG_SLOW_WORK_DEBUG=y
>
> then do:
>
> watch -n0 cat /sys/kernel/debug/slow_work/runqueue
I documented this, of course. Look in:
Documentation/slow-work.txt
in section "VIEWING EXECUTING AND QUEUED ITEMS".
David
On Thu, Jun 10, 2010 at 4:36 AM, Theodore Ts'o <[email protected]> wrote:
>
> My laptop is running 2.6.35-rc2 (plus a few patches) and after not quite
> four hours of being booted (3:45 to be precise), top shows that kslowd
> has accumulated a surprising amount of CPU time.  From the top display:
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
>  6173 root      15  -5     0    0    0 S    6  0.0  10:42.91 kslowd001
>  1219 root      15  -5     0    0    0 S    5  0.0  11:00.68 kslowd000
>  1765 root      15  -5     0    0    0 S    5  0.0  10:59.20 kslowd002
>  6784 root      15  -5     0    0    0 S    4  0.0  10:34.37 kslowd003
>  4946 root      20   0  144m  34m  17m S    3  0.9   2:24.62 Xorg
>
> This came up because I was trying to figure out why mouse response was
> being so jerky.  There aren't that many users of slow-work, and given
> that I'm not using cifs, fscache, or gfs2, my suspicions are falling on
> drivers/gpu/drm/drm_crtc_helper.c.  (Would this be something that would
> be in use on a Lenovo T400 laptop?)
>
> Is there any way to easily see what is scheduling itself to use the
> slow workqueue?
It's an output polling feature of KMS.  It shouldn't cause disruption
elsewhere, but we are holding a mutex in there that there's no real need
to hold, so I suspect I'll drop that.
What GPU is it?
Dave.
On Sun, Jun 13, 2010 at 06:23:45PM +1000, Dave Airlie wrote:
>
> It's an output polling feature of KMS.  It shouldn't cause disruption
> elsewhere, but we are holding a mutex in there that there's no real need
> to hold, so I suspect I'll drop that.
> What GPU is it?
It's a Lenovo T400, with an Intel GPU:
(--) PCI:*(0:0:2:0) 8086:2a42:17aa:20e4 Intel Corporation Mobile 4 Series
Chipset Integrated Graphics Controller rev 7, Mem @ 0xf4400000/4194304,
0xd0000000/268435456, I/O @ 0x00001800/8
Why does KMS need to poll so frequently? 40 minutes of CPU time
accumulated in 4 hours of uptime translates to 16% of the CPU being
consumed by kslowd daemons, which seems... excessive.
I've since upgraded the userspace to Ubuntu Lucid (from Karmic) and the
kernel to a post-2.6.35-rc2 (without the vt memory corruptor), and
after 4 days, 5 hours of uptime, I'm seeing something better:
root      6390  4.7  0.0      0     0 ?         S<   Jun12  49:12 [kslowd001]
root      6784  2.4  0.0      0     0 ?         S<   Jun09 150:50 [kslowd003]
root     20047  2.7  0.0      0     0 ?         S<   Jun10 122:38 [kslowd002]
root     32324  4.6  0.0      0     0 ?         S<   Jun12  49:54 [kslowd000]
So that's 372 minutes of CPU time accumulated by kslowd in 6060
minutes, or about 6% of the CPU --- but part of that time the laptop
was suspended --- and it still seems high.
I have the following patch which I'm going to install later tonight to
see if I can figure out whether it really is drm_crtc_helper.c that is
responsible for all of the kslowd time being burned, but an examination
of the source doesn't turn up any other slow-work user that I'm likely
to be running.
- Ted
commit bd69ab2e31a329ad9d660e23a1dc806ae0a3afbd
Author: Theodore Ts'o <[email protected]>
Date: Sat Jun 12 21:54:03 2010 -0400
Add slow workqueue debugging for drm_crtc_helper.c
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c
index 9b2a541..e536e80 100644
--- a/drivers/gpu/drm/drm_crtc_helper.c
+++ b/drivers/gpu/drm/drm_crtc_helper.c
@@ -860,6 +860,16 @@ static void output_poll_execute(struct slow_work *work)
 	}
 }
 
+#ifdef CONFIG_SLOW_WORK_DEBUG
+static void drm_op_desc(struct slow_work *work, struct seq_file *m)
+{
+	struct delayed_slow_work *delayed_work = container_of(work, struct delayed_slow_work, work);
+	struct drm_device *dev = container_of(delayed_work, struct drm_device, mode_config.output_poll_slow_work);
+
+	seq_printf(m, "DRM_CRTC_HELPER: %s", dev->devname);
+}
+#endif
+
 void drm_kms_helper_poll_disable(struct drm_device *dev)
 {
 	if (!dev->mode_config.poll_enabled)
@@ -917,4 +927,7 @@ EXPORT_SYMBOL(drm_helper_hpd_irq_event);
 
 static struct slow_work_ops output_poll_ops = {
 	.execute = output_poll_execute,
+#ifdef CONFIG_SLOW_WORK_DEBUG
+	.desc = drm_op_desc,
+#endif
 };
On Mon, Jun 14, 2010 at 5:49 AM, <[email protected]> wrote:
> On Sun, Jun 13, 2010 at 06:23:45PM +1000, Dave Airlie wrote:
>>
>> Its an output polling feature of KMS, however it shouldn't cause
>> disruption elsewhere, but we are holding a mutex in there that there
>> isn't a real need to hold, so I suspect I'll drop that.
>
>> What GPU is it?
>
> It's a Lenovo T400, with an Intel GPU:
>
> (--) PCI:*(0:0:2:0) 8086:2a42:17aa:20e4 Intel Corporation Mobile 4 Series Chipse
> t Integrated Graphics Controller rev 7, Mem @ 0xf4400000/4194304, 0xd0000000/268
> 435456, I/O @ 0x00001800/8
>
> Why does KMS need to poll so frequently?  40 minutes of CPU time
> accumulated in 4 hours of uptime translates to 16% of the CPU being
> consumed by kslowd daemons, which seems... excessive.
>
> I've since upgraded the userspace to Ubuntu Lucid (from Karmic) and the
> kernel to a post-2.6.35-rc2 (without the vt memory corruptor), and
> after 4 days, 5 hours of uptime, I'm seeing something better:
>
> root      6390  4.7  0.0      0     0 ?         S<   Jun12  49:12 [kslowd001]
> root      6784  2.4  0.0      0     0 ?         S<   Jun09 150:50 [kslowd003]
> root     20047  2.7  0.0      0     0 ?         S<   Jun10 122:38 [kslowd002]
> root     32324  4.6  0.0      0     0 ?         S<   Jun12  49:54 [kslowd000]
>
> So that's 372 minutes of CPU time accumulated by kslowd in 6060
> minutes, or about 6% of the CPU --- but part of that time the laptop
> was suspended --- and it still seems high.
>
> I have the following patch which I'm going to install later tonight to
> see if I can figure out whether it really is drm_crtc_helper.c that is
> responsible for all of the kslowd time being burned, but an examination
> of the source doesn't turn up any other slow-work user that I'm likely
> to be running.
It most likely is, but polling shouldn't really be taking huge amounts
of CPU, unless there are some u/mdelays in there, which would be bad.
In theory on Intel with hotplug irqs we shouldn't be polling at all;
I must check why.  The other thing is you could be suffering from the
hotplug irq problem that others have reported, which would cause
slow-work triggers that aren't part of the normal poll cycle.
Dave.
On 06:00 Mon 14 Jun , Dave Airlie wrote:
> On Mon, Jun 14, 2010 at 5:49 AM, <[email protected]> wrote:
> > It's a Lenovo T400, with an Intel GPU:
> >
[...]
> > Why does KMS need to poll so frequently?  40 minutes of CPU time
> > accumulated in 4 hours of uptime translates to 16% of the CPU being
> > consumed by kslowd daemons, which seems... excessive.
> >
[...]
> > I have the following patch which I'm going to install later tonight to
> > see if I can figure out if it really is drm_crtc_helper.c which is
> > really responsible for all of the kslowd time being burned, but an
> > examination of the source doesn't seem to show any other that I'm
> > using that would likely be using the slow workqueue.
>
> It most likely is, but polling shouldn't really be taking huge amounts
> of CPU, unless there are some u/mdelays in there, which would be bad.
>
> In theory on Intel with hotplug irqs we shouldn't be polling at all;
> I must check why.  The other thing is you could be suffering from the
> hotplug irq problem that others have reported, which would cause
> slow-work triggers that aren't part of the normal poll cycle.
This sounds exactly like the issue I've been seeing on a T500 laptop, as
well (GM45 board). The slowdowns render the system essentially
unusable, as it can spend a loooong time just moving the mouse cursor a
few pixels on the screen. During this time, nothing else on the display
is updating (glxgears drops to 0fps). Things generally seem to be
working fine if I am not moving the mouse, or if I'm not running X.
I do not have this issue on a desktop machine with a G45.
Unfortunately, bisection is proving difficult because the exact set of
conditions needed to trigger the problem seems to be eluding me:
sometimes the kernel will work perfectly fine for quite some time, and
then go downhill from there.  However, this is definitely a regression
introduced after 2.6.35-rc1.
--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
On Mon, Jun 14, 2010 at 02:42:44PM -0400, Nick Bowler wrote:
>
> This sounds exactly like the issue I've been seeing on a T500 laptop, as
> well (GM45 board). The slowdowns render the system essentially
> unusable, as it can spend a loooong time just moving the mouse cursor a
> few pixels on the screen. During this time, nothing else on the display
> is updating (glxgears drops to 0fps). Things generally seem to be
> working fine if I am not moving the mouse, or if I'm not running X.
I saw exactly the same behaviour when using the Ubuntu Karmic
userspace.  Basically, using a 2.6.35-rc2 (+ vt memory corruptor fix)
kernel, the system was essentially unusable.  Interestingly, the
problem went away (with the same kernel) once I updated to Ubuntu
Lucid.  I was going to mention that so that hopefully someone with far
more X.org-fu than I could figure out how much of this is an X server
bug and how much of it was a kernel bug, but things worked just
*fine* with the 2.6.34 kernel.
Occasionally it will happen that the mouse stops tracking, and then I
grumble and curse, and a few seconds later it resolves itself. I
haven't had time to track it down; but the problem was breathtakingly
easy to reproduce with an Ubuntu Karmic userspace. :-)
- Ted
On Mon, Jun 14, 2010 at 10:46 PM, <[email protected]> wrote:
> On Mon, Jun 14, 2010 at 02:42:44PM -0400, Nick Bowler wrote:
>>
>> This sounds exactly like the issue I've been seeing on a T500 laptop, as
>> well (GM45 board).  The slowdowns render the system essentially
>> unusable, as it can spend a loooong time just moving the mouse cursor a
>> few pixels on the screen.  During this time, nothing else on the display
>> is updating (glxgears drops to 0fps).  Things generally seem to be
>> working fine if I am not moving the mouse, or if I'm not running X.
>
> I saw exactly the same behaviour when using the Ubuntu Karmic
> userspace.  Basically, using a 2.6.35-rc2 (+ vt memory corruptor fix)
> kernel, the system was essentially unusable.  Interestingly, the
> problem went away (with the same kernel) once I updated to Ubuntu
> Lucid.  I was going to mention that so that hopefully someone with far
> more X.org-fu than I could figure out how much of this is an X server
> bug and how much of it was a kernel bug, but things worked just
> *fine* with the 2.6.34 kernel.
>
> Occasionally it will happen that the mouse stops tracking, and then I
> grumble and curse, and a few seconds later it resolves itself.  I
> haven't had time to track it down; but the problem was breathtakingly
> easy to reproduce with an Ubuntu Karmic userspace.  :-)
I am seeing crunchy cursor behavior on boot on Ubuntu Lucid user space
with the 2.6.35-rc3 kernel as well.  It pretty much always does that
on boot, and often later as well.  Right now I have four of these
chaps each consuming 9% of a CPU, though they definitely come and go.
Very odd:
  938 root      15  -5     0    0    0 S    9  0.0   1:00.90 kslowd001
 1048 root      15  -5     0    0    0 S    9  0.0   1:01.12 kslowd002
  937 root      15  -5     0    0    0 S    8  0.0   1:00.92 kslowd000
 1544 root      15  -5     0    0    0 D    8  0.0   1:00.55 kslowd003
-apw
Andy Whitcroft <[email protected]> wrote:
> I am seeing crunchy cursor behavior on boot on Ubuntu Lucid user space
> with the 2.6.35-rc3 kernel as well.  It pretty much always does that
> on boot, and often later as well.  Right now I have four of these
> chaps each consuming 9% of a CPU, though they definitely come and go.
> Very odd:
>
>   938 root      15  -5     0    0    0 S    9  0.0   1:00.90 kslowd001
>  1048 root      15  -5     0    0    0 S    9  0.0   1:01.12 kslowd002
>   937 root      15  -5     0    0    0 S    8  0.0   1:00.92 kslowd000
>  1544 root      15  -5     0    0    0 D    8  0.0   1:00.55 kslowd003
Can you see what they're doing?
watch -n0 cat /sys/kernel/debug/slow_work/runqueue
David
> Can you see what they're doing?
>
>        watch -n0 cat /sys/kernel/debug/slow_work/runqueue
Turned on the debugging and applied the patch from Ted, and when
things are bad I see constant cycling of all four threads in the
output showing similar to the below, note only one thread shows at a
time:
Every 0.1s: cat /sys/kernel/debug/slow_work/runqueue Wed Jun 16 12:34:52 2010
THR PID   ITEM ADDR        FL MARK  DESC
=== ===== ================ == ===== ==========
  0   897 ffff88012bb07510 12  20ms DRM_CRTC_HELPER: i915@pci:0000:00:02.0
When things are working well I see the same output appearing for one
update, about every 10s.
-apw
On Wed, Jun 16, 2010 at 12:37 PM, Andy Whitcroft <[email protected]> wrote:
>> Can you see what they're doing?
>>
>>        watch -n0 cat /sys/kernel/debug/slow_work/runqueue
>
> Turned on the debugging and applied the patch from Ted, and when
> things are bad I see constant cycling of all four threads in the
> output showing similar to the below, note only one thread shows at a
> time:
>
> Every 0.1s: cat /sys/kernel/debug/slow_work/runqueue    Wed Jun 16 12:34:52 2010
>
> THR PID   ITEM ADDR        FL MARK  DESC
> === ===== ================ == ===== ==========
>   0   897 ffff88012bb07510 12  20ms DRM_CRTC_HELPER: i915@pci:0000:00:02.0
>
> When things are working well I see the same output appearing for one
> update, about every 10s.
OK, I've added some additional debug and these appear to be being
thrown for a loop by the HPD interrupt, which is firing continuously.
Here is a sample from some 4 minutes after boot:
[ 284.862670] APW: drm_helper_hpd_irq_event: cancel/submit
[ 284.882803] APW: drm_helper_hpd_irq_event: cancel/submit
[ 284.902691] APW: drm_helper_hpd_irq_event: cancel/submit
[ 284.922814] APW: drm_helper_hpd_irq_event: cancel/submit
[ 284.942620] APW: drm_helper_hpd_irq_event: cancel/submit
[ 284.962707] APW: drm_helper_hpd_irq_event: cancel/submit
[ 284.982937] APW: drm_helper_hpd_irq_event: cancel/submit
[ 285.004181] APW: drm_helper_hpd_irq_event: cancel/submit
[ 285.022622] APW: drm_helper_hpd_irq_event: cancel/submit
[ 285.042569] APW: drm_helper_hpd_irq_event: cancel/submit
[ 285.062593] APW: drm_helper_hpd_irq_event: cancel/submit
[ 285.082683] APW: drm_helper_hpd_irq_event: cancel/submit
Later, when things calm down, we see them but much, much less often:
[ 379.038239] APW: drm_helper_hpd_irq_event: cancel/submit
[ 379.044904] APW: drm_helper_hpd_irq_event: cancel/submit
[ 382.438106] APW: drm_helper_hpd_irq_event: cancel/submit
[ 382.444754] APW: drm_helper_hpd_irq_event: cancel/submit
[ 385.838072] APW: drm_helper_hpd_irq_event: cancel/submit
[ 385.844735] APW: drm_helper_hpd_irq_event: cancel/submit
[ 389.237876] APW: drm_helper_hpd_irq_event: cancel/submit
[ 389.244523] APW: drm_helper_hpd_irq_event: cancel/submit
[ 392.638048] APW: drm_helper_hpd_irq_event: cancel/submit
[ 392.644740] APW: drm_helper_hpd_irq_event: cancel/submit
-apw
On 12:37 Wed 16 Jun , Andy Whitcroft wrote:
> > Can you see what they're doing?
> >
> >        watch -n0 cat /sys/kernel/debug/slow_work/runqueue
>
> Turned on the debugging and applied the patch from Ted, and when
> things are bad I see constant cycling of all four threads in the
> output showing similar to the below, note only one thread shows at a
> time:
>
> Every 0.1s: cat /sys/kernel/debug/slow_work/runqueue Wed Jun 16 12:34:52 2010
>
> THR PID   ITEM ADDR        FL MARK  DESC
> === ===== ================ == ===== ==========
>   0   897 ffff88012bb07510 12  20ms DRM_CRTC_HELPER: i915@pci:0000:00:02.0
Indeed, this looks very similar to mine, except my DESC field is blank
for some reason. The FL value is sometimes 12, sometimes 2.
When things are bad, all four of the kslowd threads are pegged at 16%
CPU usage. On my T500, they _never_ calm down once they're pegged.
However, things seem to be working normally at boot. It seems that
there is some initial trigger which pegs the threads, then the cursor
lag and other problems are just fallout after that. The system is
essentially unusable after this point.
The threads sometimes get pegged immediately after booting the system;
sometimes the system can run for an hour or more before showing any
problems.  Unfortunately, this is making bisection essentially
impossible.
This seems to have been introduced somewhere between 2.6.35-rc1 and
-rc2.
--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
On Wed, Jun 16, 2010 at 3:36 PM, Nick Bowler <[email protected]> wrote:
> On 12:37 Wed 16 Jun , Andy Whitcroft wrote:
>> > Can you see what they're doing?
>> >
>> >        watch -n0 cat /sys/kernel/debug/slow_work/runqueue
>>
>> Turned on the debugging and applied the patch from Ted, and when
>> things are bad I see constant cycling of all four threads in the
>> output showing similar to the below, note only one thread shows at a
>> time:
>>
>> Every 0.1s: cat /sys/kernel/debug/slow_work/runqueue    Wed Jun 16 12:34:52 2010
>>
>> THR PID   ITEM ADDR        FL MARK  DESC
>> === ===== ================ == ===== ==========
>>   0   897 ffff88012bb07510 12  20ms DRM_CRTC_HELPER: i915@pci:0000:00:02.0
>
> Indeed, this looks very similar to mine, except my DESC field is blank
> for some reason.  The FL value is sometimes 12, sometimes 2.
DESC is only filled in if you have Ted's patch applied (earlier in this thread).
> When things are bad, all four of the kslowd threads are pegged at 16%
> CPU usage.  On my T500, they _never_ calm down once they're pegged.
> However, things seem to be working normally at boot.  It seems that
> there is some initial trigger which pegs the threads, then the cursor
> lag and other problems are just fallout after that.  The system is
> essentially unusable after this point.
>
> The threads sometimes get pegged immediately after booting the system,
> sometimes it can last an hour or more before showing any problems.
> Unfortunately, this is making bisection essentially impossible.
>
> This seems to have been introduced somewhere between 2.6.35-rc1 and
I suspect this is introduced by the commit which pulled polling of DRM
connectors into the core.
-apw
On 15:55 Wed 16 Jun , Andy Whitcroft wrote:
> On Wed, Jun 16, 2010 at 3:36 PM, Nick Bowler <[email protected]> wrote:
> > The threads sometimes get pegged immediately after booting the system,
> > sometimes it can last an hour or more before showing any problems.
> > Unfortunately, this is making bisection essentially impossible.
> >
> > This seems to have been introduced somewhere between 2.6.35-rc1 and
>
> I suspect this is introduced by the commit which pulled polling of DRM
> connectors into the core.
OK. From this hint I looked at the log between -rc1 and -rc2 and found
commit fbf81762e385d ("drm/kms: disable/enable poll around switcheroo
on/off").
Reverting this commit brings everything back to normal.
--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)