LinuxLists.cc - [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

2008-10-17 12:33:28

Subject: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

Hi,

Ian: this is a follow-up to your post "NFS regression? Odd delays and
lockups accessing an NFS export" a few weeks ago
(http://lkml.org/lkml/2008/9/27/42).

I am able to trigger this bug within a few minutes on a customer's
machine (large web hoster, a *lot* of NFS traffic).

Symptom: with 2.6.26 (2.6.27.1, too), load goes to 100+, dmesg says
"INFO: task migration/2:9 blocked for more than 120 seconds." with
varying task names. Except for the high load average, the machine
seems to work.

With git bisect, I was finally able to identify the guilty commit,
it's not "Ensure we zap only the access and acl caches when setting
new acls" like you guessed, Ian. According to my bisect,
6becedbb06072c5741d4057b9facecb4b3143711 is the origin of the problem.
e481fcf8563d300e7f8875cae5fdc41941d29de0 (its parent) works well.

Glauber: that is your patch "x86: minor adjustments for do_boot_cpu"
(http://lkml.org/lkml/2008/3/19/143). I don't understand this patch
well, and I fail to see a connection with the symptom, but maybe
somebody else does...

See patch below (applies to 2.6.27.1). So far, it looks like the
problem is solved on the server, no visible side effects.

Max

Revert "x86: minor adjustments for do_boot_cpu"

According to a bisect, Glauber Costa's patch induced high load and
"task ... blocked for more than 120 seconds" messages in dmesg. This
patch reverts 6becedbb06072c5741d4057b9facecb4b3143711.

Signed-off-by: Max Kellermann <[email protected]>
---

arch/x86/kernel/smpboot.c | 21 ++++++++-------------
1 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7985c5b..789cf84 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -808,7 +808,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
* Returns zero if CPU booted OK, else error code from wakeup_secondary_cpu.
*/
{
- unsigned long boot_error = 0;
+ unsigned long boot_error;
int timeout;
unsigned long start_ip;
unsigned short nmi_high = 0, nmi_low = 0;
@@ -828,7 +828,11 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu)
}
#endif

- alternatives_smp_switch(1);
+ /*
+ * Save current MTRR state in case it was changed since early boot
+ * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
+ */
+ mtrr_save_state();

c_idle.idle = get_idle_for_cpu(cpu);

@@ -873,6 +877,8 @@ do_rest:
/* start_ip had better be page-aligned! */
start_ip = setup_trampoline();

+ alternatives_smp_switch(1);
+
/* So we see what's up */
printk(KERN_INFO "Booting processor %d/%d ip %lx\n",
cpu, apicid, start_ip);
@@ -891,11 +897,6 @@ do_rest:
store_NMI_vector(&nmi_high, &nmi_low);

smpboot_setup_warm_reset_vector(start_ip);
- /*
- * Be paranoid about clearing APIC errors.
- */
- apic_write(APIC_ESR, 0);
- apic_read(APIC_ESR);
}

/*
@@ -986,12 +987,6 @@ int __cpuinit native_cpu_up(unsigned int cpu)
return -ENOSYS;
}

- /*
- * Save current MTRR state in case it was changed since early boot
- * (e.g. by the ACPI SMI) to initialize new CPUs with MTRRs in sync:
- */
- mtrr_save_state();
-
per_cpu(cpu_state, cpu) = CPU_UP_PREPARE;

#ifdef CONFIG_X86_32

2008-10-17 14:32:14

by Glauber Costa

[permalink] [raw]

Subject: Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

On Fri, Oct 17, 2008 at 02:32:07PM +0200, Max Kellermann wrote:
> Hi,
>
> Ian: this is a follow-up to your post "NFS regression? Odd delays and
> lockups accessing an NFS export" a few weeks ago
> (http://lkml.org/lkml/2008/9/27/42).
>
> I am able to trigger this bug within a few minutes on a customer's
> machine (large web hoster, a *lot* of NFS traffic).
>
> Symptom: with 2.6.26 (2.6.27.1, too), load goes to 100+, dmesg says
> "INFO: task migration/2:9 blocked for more than 120 seconds." with
> varying task names. Except for the high load average, the machine
> seems to work.
>
> With git bisect, I was finally able to identify the guilty commit,
> it's not "Ensure we zap only the access and acl caches when setting
> new acls" like you guessed, Ian. According to my bisect,
> 6becedbb06072c5741d4057b9facecb4b3143711 is the origin of the problem.
> e481fcf8563d300e7f8875cae5fdc41941d29de0 (its parent) works well.
>
> Glauber: that is your patch "x86: minor adjustments for do_boot_cpu"
> (http://lkml.org/lkml/2008/3/19/143). I don't understand this patch
> well, and I fail to see a connection with the symptom, but maybe
> somebody else does...
>
> See patch below (applies to 2.6.27.1). So far, it looks like the
> problem is solved on the server, no visible side effects.
>
> Max
That's probably something related to apic congestion.
Does the problem go away if the only thing you change is this:

> @@ -891,11 +897,6 @@ do_rest:
> store_NMI_vector(&nmi_high, &nmi_low);
>
> smpboot_setup_warm_reset_vector(start_ip);
> - /*
> - * Be paranoid about clearing APIC errors.
> - */
> - apic_write(APIC_ESR, 0);
> - apic_read(APIC_ESR);
> }

Please let me know.

2008-10-20 06:27:47

On Mon, 2008-10-20 at 08:51 +0200, Max Kellermann wrote:
>
> On 2008/10/20 08:27, Ian Campbell <[email protected]> wrote:
> > The issue I see still occurs well before those changesets. I have
> > seen it with v2.6.25 but v2.6.24 survived for 7 days without issue
> > (my threshold for a good kernel is 7 days, hence bisecting is a bit
> > slow...).
>
> Hello Ian,
>
> it seems we're hunting down different bugs after all. Too bad, I
> hoped I could have solved your problem, too.

Thanks anyway, I'll just keep on bisecting ;-)

> Our machine has been
> running well over the weekend with the patch I posted; with faulty
> kernels, the problem would occur after a few minutes.
>
--
Ian Campbell

BOFH excuse #400:

We are Microsoft. What you are experiencing is not a problem; it is an undocumented feature.

Attachments:

signature.asc (197.00 B)
This is a digitally signed message part

2008-10-20 13:16:17

by Glauber Costa

[permalink] [raw]

Subject: Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

On Mon, Oct 20, 2008 at 4:51 AM, Max Kellermann <[email protected]> wrote:
> On 2008/10/17 16:33, Glauber Costa <[email protected]> wrote:
>> That's probably something related to apic congestion.
>> Does the problem go away if the only thing you change is this:
>>
>>
>> > @@ -891,11 +897,6 @@ do_rest:
>> > store_NMI_vector(&nmi_high, &nmi_low);
>> >
>> > smpboot_setup_warm_reset_vector(start_ip);
>> > - /*
>> > - * Be paranoid about clearing APIC errors.
>> > - */
>> > - apic_write(APIC_ESR, 0);
>> > - apic_read(APIC_ESR);
>> > }
>>
>>
>> Please let me know.
>
> Hello Glauber,
>
> I have rebooted the server with 2.6.27.1 + this patchlet an hour ago.
> No problems since.
>
> Hardware: Compaq P4 Xeon server, Broadcom CMIC-WS / CIOB-X2 board.
> Tell me if you need more detailed information.
>

There's a patch in flight from cyrill that probably fixes your problem:
http://lkml.org/lkml/2008/9/15/93

The checks are obviously there for a reason, and we can't just wipe
them out unconditionally ;-) So can you check please that you are also
covered by the case provided?

> On 2008/10/20 08:27, Ian Campbell <[email protected]> wrote:
>> The issue I see still occurs well before those changesets. I have
>> seen it with v2.6.25 but v2.6.24 survived for 7 days without issue
>> (my threshold for a good kernel is 7 days, hence bisecting is a bit
>> slow...).
>
> Hello Ian,
>
> it seems we're hunting down different bugs after all. Too bad, I
> hoped I could have solved your problem, too. Our machine has been
> running well over the weekend with the patch I posted; with faulty
> kernels, the problem would occur after a few minutes.
>
> Max
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Glauber Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."

2008-10-20 14:14:19

by Max Kellermann

[permalink] [raw]

Subject: Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

On 2008/10/20 15:15, Glauber Costa <[email protected]> wrote:
> There's a patch in flight from cyrill that probably fixes your
> problem: http://lkml.org/lkml/2008/9/15/93
>
> The checks are obviously there for a reason, and we can't just wipe
> them out unconditionally ;-) So can you check please that you are
> also covered by the case provided?

Looks good: booted the machine 30 minutes ago, no problems so far.

Max

2008-10-20 14:21:18

by Cyrill Gorcunov

[permalink] [raw]

Subject: Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

[Glauber Costa - Mon, Oct 20, 2008 at 11:15:56AM -0200]
| On Mon, Oct 20, 2008 at 4:51 AM, Max Kellermann <[email protected]> wrote:
| > On 2008/10/17 16:33, Glauber Costa <[email protected]> wrote:
| >> That's probably something related to apic congestion.
| >> Does the problem go away if the only thing you change is this:
| >>
| >>
| >> > @@ -891,11 +897,6 @@ do_rest:
| >> > store_NMI_vector(&nmi_high, &nmi_low);
| >> >
| >> > smpboot_setup_warm_reset_vector(start_ip);
| >> > - /*
| >> > - * Be paranoid about clearing APIC errors.
| >> > - */
| >> > - apic_write(APIC_ESR, 0);
| >> > - apic_read(APIC_ESR);
| >> > }
| >>
| >>
| >> Please let me know.
| >
| > Hello Glauber,
| >
| > I have rebooted the server with 2.6.27.1 + this patchlet an hour ago.
| > No problems since.
| >
| > Hardware: Compaq P4 Xeon server, Broadcom CMIC-WS / CIOB-X2 board.
| > Tell me if you need more detailed information.
| >
|
| There's a patch in flight from cyrill that probably fixes your problem:
| http://lkml.org/lkml/2008/9/15/93
|
| The checks are obviously there for a reason, and we can't just wipe
| them out unconditionally ;-) So can you check please that you are also
| covered by the case provided?

Actually I'll wonder if it help. Do Xeon processors really not
have ESR register and not integrated?

...

- Cyrill -

2008-10-20 14:34:24

by Cyrill Gorcunov

On Sun, 02 Nov 2008 14:40:47 +0000, Ian Campbell wrote:

> On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
>>
>>
>> Have you tested with the TCP RST fix yet? It has been merged into
>> mainline, so it should be in the latest 2.6.28-git, but I've attached
>> it so you can apply it to your test kernel...
>
> I wasn't aware of it. I'll give it a go.
>
> Thanks,
> Ian.
>>

I think I having the same problem as you. At least I have a gut feeling it's nfs related.

What good and bad versions do you have so far in your bisecting. ??

I see the problem several times a day so it should be possible
to at least try one or two versions per day.

this is on a 2.6.27.2 client to a 2.6.26.3 server.
--------

sudo grep blocked /var/log/syslog.0
Nov 5 02:06:27 duo kernel: [ 5080.947067] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:08:40 duo kernel: [ 5214.091071] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:10:49 duo kernel: [ 5342.940064] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:33:15 duo kernel: [ 6688.338072] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:35:44 duo kernel: [ 6837.588072] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:38:12 duo kernel: [ 6985.765070] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 02:40:37 duo kernel: [ 7130.720067] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 06:56:00 duo kernel: [22454.090070] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 07:51:38 duo kernel: [25791.279105] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 09:39:33 duo kernel: [32267.016068] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 09:41:55 duo kernel: [32408.750061] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 09:44:17 duo kernel: [32550.484061] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 09:46:37 duo kernel: [32691.144064] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:26:25 duo kernel: [56678.536068] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:28:50 duo kernel: [56823.492067] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:31:17 duo kernel: [56970.594061] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:33:44 duo kernel: [57117.697062] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 16:51:04 duo kernel: [58158.153065] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 18:00:39 duo kernel: [62256.625050] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 5 18:15:16 duo kernel: [63210.108080] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 18:24:08 duo kernel: [63741.610074] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 19:47:09 duo kernel: [68722.698056] INFO: task hald-addon-stor:7102 blocked for more than 120 seconds.
Nov 5 19:47:53 duo kernel: [68722.698307] INFO: task hald-addon-stor:7105 blocked for more than 120 seconds.
Nov 5 19:47:53 duo kernel: [68722.698513] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 5 19:47:53 duo kernel: [68755.984030] INFO: task scsi_eh_12:2687 blocked for more than 120 seconds.
Nov 5 22:20:11 duo kernel: [77904.265068] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 22:23:10 duo kernel: [78083.580065] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 22:46:09 duo kernel: [79462.264081] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 5 23:22:05 duo kernel: [81599.010033] INFO: task scsi_eh_10:2675 blocked for more than 120 seconds.
Nov 5 23:22:05 duo kernel: [81599.010253] INFO: task hald-addon-stor:7097 blocked for more than 120 seconds.
Nov 5 23:22:05 duo kernel: [81599.010468] INFO: task hald-addon-stor:7102 blocked for more than 120 seconds.
Nov 5 23:22:05 duo kernel: [81599.010674] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90209.346061] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90242.632048] INFO: task hald-addon-stor:7102 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90242.632230] INFO: task hald-addon-stor:7105 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90242.632368] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 6 01:46:31 duo kernel: [90275.918024] INFO: task scsi_eh_12:2687 blocked for more than 120 seconds.
Nov 6 02:11:59 duo kernel: [91812.443070] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 6 02:14:12 duo kernel: [91945.587069] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 6 02:16:50 duo kernel: [92103.427069] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.
Nov 6 02:28:26 duo kernel: [92759.483053] INFO: task hald-addon-usb-:7009 blocked for more than 120 seconds.
Nov 6 02:28:26 duo kernel: [92759.483233] INFO: task hald-addon-stor:7105 blocked for more than 120 seconds.
Nov 6 02:28:26 duo kernel: [92759.483447] INFO: task hald-addon-stor:7110 blocked for more than 120 seconds.
Nov 6 02:28:26 duo kernel: [92792.769034] INFO: task scsi_eh_12:2687 blocked for more than 120 seconds.
Nov 6 02:58:49 duo kernel: [94622.425059] INFO: task cpufreq-applet:11956 blocked for more than 120 seconds.

2008-11-25 07:09:55

On Mon, 2008-12-01 at 22:09 +0000, Ian Campbell wrote:
> On Sun, 2008-11-30 at 19:17 -0500, Trond Myklebust wrote:
> > Can you see if the following 3 patches help? They're against 2.6.28-rc6,
> > but afaics the problems are pretty much the same on 2.6.26.
>
> Thanks.
>
> The server was actually running 2.6.25.7 but the matching sources have
> since been removed the backports.org so I've reproduce with 2.6.26 and
> now I'll add the patches.

Just a small progress report. Anecdotally I thought that unpatched
2.6.26.7 was worse than 2.6.25.7, mostly because it hung twice in the ~1
day I was running it where previously it was less frequent than once per
day.

With the patched server the client ran OK for 2.5 days then mysteriously
hung, the logs show none of the normal symptoms and my wife reset it
before I got home so I've no real clue what happened but I'm inclined to
think it was unrelated for now. I'll get back to you in a week or so if
the problem hasn't reoccurred.

Ian.
--
Ian Campbell

It's later than you think.

Attachments:

signature.asc (197.00 B)
This is a digitally signed message part

2008-12-14 18:24:45

On Wed, Jan 07, 2009 at 05:21:15PM -0500, J. Bruce Fields wrote:
> On Tue, Dec 16, 2008 at 06:39:35PM +0000, Ian Campbell wrote:
> > That's right, it was actually 2.6.26.7 FWIW.
> >
> > > I'll try to take a look at these before I leave for the holidays,
> > > assuming the versions Trond posted on Nov. 30 are the latest.
> >
> > Thanks.
>
> Sorry for getting behind.
>
> If you got a chance to retest with the for-2.6.29 branch at
>
> git://linux-nfs.org/~bfields/linux.git for-2.6.29
>
> that'd be great; that's what I intend to send to Linus.

(Merged now, so testing mainline as of today should work too.)

--b.

2009-01-08 21:23:15

On Thu, 2009-01-08 at 16:26 -0500, J. Bruce Fields wrote:
>
> > > (Merged now, so testing mainline as of today should work too.)
> >
> > The server isn't really a machine I want to test random kernels on,
> is
> > there some subset of those changesets which it would be useful for
> me to
> > pull back onto the 2.6.26 kernel I'm using to test? (I can most like
> > manage the backporting myself).
> >
> > These two look like the relevant ones to me but I'm not sure:
> > 22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred
> requests on transport destruction
> > 69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server
> closes sockets in a timely fashion
> >
> > I think 69b6 was in the set of three I tested previously and the
> other
> > two turned into 2294?
>
> Yep, exactly.--b.

The client machine now has an uptime of ten days without error after
these two patches were applied to the server.

Thanks everybody,
Ian.

--
Ian Campbell

I used to think that the brain was the most wonderful organ in
my body. Then I realized who was telling me this.
-- Emo Phillips

Attachments:

signature.asc (197.00 B)
This is a digitally signed message part

2009-01-22 16:44:44

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 0/3] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"

On Thu, Jan 22, 2009 at 08:27:40AM +0000, Ian Campbell wrote:
> On Thu, 2009-01-08 at 16:26 -0500, J. Bruce Fields wrote:
> >
> > > > (Merged now, so testing mainline as of today should work too.)
> > >
> > > The server isn't really a machine I want to test random kernels on,
> > is
> > > there some subset of those changesets which it would be useful for
> > me to
> > > pull back onto the 2.6.26 kernel I'm using to test? (I can most like
> > > manage the backporting myself).
> > >
> > > These two look like the relevant ones to me but I'm not sure:
> > > 22945e4a1c7454c97f5d8aee1ef526c83fef3223 svc: Clean up deferred
> > requests on transport destruction
> > > 69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a SUNRPC: Ensure the server
> > closes sockets in a timely fashion
> > >
> > > I think 69b6 was in the set of three I tested previously and the
> > other
> > > two turned into 2294?
> >
> > Yep, exactly.--b.
>
> The client machine now has an uptime of ten days without error after
> these two patches were applied to the server.
>
> Thanks everybody,

Very good, so upstream should be OK. Thanks for the testing!

--b.