2005-11-09 13:32:14

by Chris Boot

Subject: 2.6.14-mm1 RAID-1 in D< state

Hi all,

I haven't noticed this until today...but my load average has been
skyrocketing past 3.00 since Monday, which is when I upgraded to
2.6.14-mm1. I've got 3 Software RAID-1 arrays across 4 SATA disks, and
all 3 processes are locked in an uninterruptible sleep.

What's interesting, though, is I haven't noticed a degradation of
performance at all, and all the arrays work absolutely fine. They aren't
rebuilding or doing anything strange that I can see.

Any ideas?

Cheers,
Chris

--
Chris Boot
[email protected]
http://www.bootc.net/


2005-11-09 21:13:40

by J.A. Magallon

Subject: Re: 2.6.14-mm1 RAID-1 in D< state

On Wed, 09 Nov 2005 13:32:11 +0000, Chris Boot <[email protected]> wrote:

> Hi all,
>
> I haven't noticed this until today...but my load average has been
> skyrocketing past 3.00 since Monday, which is when I upgraded to
> 2.6.14-mm1. I've got 3 Software RAID-1 arrays across 4 SATA disks, and
> all 3 processes are locked in an uninterruptible sleep.
>
> What's interesting, though, is I haven't noticed a degradation of
> performance at all, and all the arrays work absolutely fine. They aren't
> rebuilding or doing anything strange that I can see.
>
> Any ideas?
>

Try this:

http://marc.theaimsgroup.com/?l=linux-scsi&m=113145981728205&w=2

My raid 5 was oopsing till I applied this.

--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))



2005-11-09 22:23:58

by NeilBrown

Subject: Re: 2.6.14-mm1 RAID-1 in D< state

On Wednesday November 9, [email protected] wrote:
> Hi all,
>
> I haven't noticed this until today...but my load average has been
> skyrocketing past 3.00 since Monday, which is when I upgraded to
> 2.6.14-mm1. I've got 3 Software RAID-1 arrays across 4 SATA disks, and
> all 3 processes are locked in an uninterruptible sleep.
>
> What's interesting, though, is I haven't noticed a degradation of
> performance at all, and all the arrays work absolutely fine. They aren't
> rebuilding or doing anything strange that I can see.
>
> Any ideas?

Can you
echo t > /proc/sysrq-trigger
dmesg > /tmp/log
and post the log created, possibly removing everything before
SysRq : Show State

If you can't find the 'Show State', then maybe your log buffer isn't
big enough. Use 'dmesg -s ...' to make it bigger and try again.

NeilBrown
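
The trimming step requested above ("removing everything before SysRq : Show State") can be sketched in C as follows. This is only an illustration, assuming the whole log has already been read into a NUL-terminated string; the helper name skip_to_show_state is hypothetical and does not come from the thread or from any kernel tool.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Marker line quoted in the message above. */
#define SHOW_STATE_MARKER "SysRq : Show State"

/*
 * Hypothetical helper: return a pointer to the first occurrence of the
 * marker inside `log`, or NULL if the marker is absent (for example
 * because the kernel log buffer was too small and the line was lost).
 */
static const char *skip_to_show_state(const char *log)
{
    assert(log != NULL);
    return strstr(log, SHOW_STATE_MARKER);
}
```

A NULL result corresponds to the case described above where the 'Show State' line cannot be found and the log has to be captured again with a larger buffer.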

2005-11-09 23:16:11

by Chris Boot

Subject: Re: 2.6.14-mm1 RAID-1 in D< state

On 9 Nov 2005, at 22:23, Neil Brown wrote:

> On Wednesday November 9, [email protected] wrote:
>> Hi all,
>>
>> I haven't noticed this until today...but my load average has been
>> skyrocketing past 3.00 since Monday, which is when I upgraded to
>> 2.6.14-mm1. I've got 3 Software RAID-1 arrays across 4 SATA disks, and
>> all 3 processes are locked in an uninterruptible sleep.
>>
>> What's interesting, though, is I haven't noticed a degradation of
>> performance at all, and all the arrays work absolutely fine. They aren't
>> rebuilding or doing anything strange that I can see.
>>
>> Any ideas?
>
> Can you
> echo t > /proc/sysrq-trigger
> dmesg > /tmp/log
> and post the log created, possibly removing everything before
> SysRq : Show State

So that's what the sysrq-trigger is for... :-) Certainly easier that
way when your system still works!

> If you can't find the 'Show State', then maybe your log buffer isn't
> big enough. Use 'dmesg -s ...' to make it bigger and try again.

It was too small, but the serial console got it:

[4329954.200000] md2_raid1 D F7D776E0 0 809 6 810 799 (L-TLB)
[4329954.200000] f7db7f30 f7d2ba8c c02809e0 f7d776e0 c02c14f2 e9924580 c1b48b60 c1b8e200
[4329954.200000] f7c5bd40 7fffffff f7db7f88 00000000 23c37e00 000f6206 f7d6fa50 f7d6fb78
[4329954.200000] 7fffffff 7fffffff f7db7f88 f7db6000 c0338098 c1b8e200 f7db7f94 f7db7f88
[4329954.200000] Call Trace:
[4329954.200000] [<c02809e0>] generic_unplug_device+0x10/0x20
[4329954.200000] [<c02c14f2>] unplug_slaves+0xd2/0xe0
[4329954.200000] [<c0338098>] schedule_timeout+0x98/0xa0
[4329954.200000] [<c01295a9>] finish_wait+0x39/0x50
[4329954.200000] [<c02c9309>] md_thread+0xc9/0x100
[4329954.200000] [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000] [<c01142d7>] __wake_up_common+0x37/0x60
[4329954.200000] [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000] [<c02c9240>] md_thread+0x0/0x100
[4329954.200000] [<c0129174>] kthread+0xa4/0xe0
[4329954.200000] [<c01290d0>] kthread+0x0/0xe0
[4329954.200000] [<c0100f35>] kernel_thread_helper+0x5/0x10
[4329954.200000] md0_raid1 D F7D774A0 0 810 6 812 809 (L-TLB)
[4329954.200000] f7db5f30 f7d2b79c c02809e0 f7d774a0 c02c14f2 c0383bc0 c1b48ae0 c1b8e400
[4329954.200000] f7c5bb60 7fffffff f7db5f88 00000000 9bd42ec0 000f6211 f7d69090 f7d691b8
[4329954.200000] 7fffffff 7fffffff f7db5f88 f7db4000 c0338098 c1b8e400 00000002 f7db4000
[4329954.200000] Call Trace:
[4329954.200000] [<c02809e0>] generic_unplug_device+0x10/0x20
[4329954.200000] [<c02c14f2>] unplug_slaves+0xd2/0xe0
[4329954.200000] [<c0338098>] schedule_timeout+0x98/0xa0
[4329954.200000] [<c0129501>] prepare_to_wait+0x41/0x50
[4329954.200000] [<c02c9309>] md_thread+0xc9/0x100
[4329954.200000] [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000] [<c01142d7>] __wake_up_common+0x37/0x60
[4329954.200000] [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000] [<c02c9240>] md_thread+0x0/0x100
[4329954.200000] [<c0129174>] kthread+0xa4/0xe0
[4329954.200000] [<c01290d0>] kthread+0x0/0xe0
[4329954.200000] [<c0100f35>] kernel_thread_helper+0x5/0x10
[4329954.200000] md1_raid1 D F7D77860 0 812 6 813 810 (L-TLB)
[4329954.200000] f7dbbf30 f7d2bc04 c02809e0 f7d77860 c02c14f2 e9924580 c1b48a60 c1b8e000
[4329954.200000] f7c5f920 7fffffff f7dbbf88 00000000 2358ae40 000f6206 f7d5b5c0 f7d5b6e8
[4329954.200000] 7fffffff 7fffffff f7dbbf88 f7dba000 c0338098 c1b8e000 f7dbbf88 f7dba000
[4329954.200000] Call Trace:
[4329954.200000] [<c02809e0>] generic_unplug_device+0x10/0x20
[4329954.200000] [<c02c14f2>] unplug_slaves+0xd2/0xe0
[4329954.200000] [<c0338098>] schedule_timeout+0x98/0xa0
[4329954.200000] [<c02c29ba>] raid1d+0x32a/0x350
[4329954.200000] [<c02c9309>] md_thread+0xc9/0x100
[4329954.200000] [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000] [<c01142d7>] __wake_up_common+0x37/0x60
[4329954.200000] [<c01295c0>] autoremove_wake_function+0x0/0x50
[4329954.200000] [<c02c9240>] md_thread+0x0/0x100
[4329954.200000] [<c0129174>] kthread+0xa4/0xe0
[4329954.200000] [<c01290d0>] kthread+0x0/0xe0
[4329954.200000] [<c0100f35>] kernel_thread_helper+0x5/0x10

Let me know if you need dumps of any other processes.

> NeilBrown

Cheers,
Chris

--
Chris Boot
[email protected]
http://www.bootc.net/




2005-11-10 05:40:21

by NeilBrown

Subject: Re: 2.6.14-mm1 RAID-1 in D< state


Thanks for the trace. I see what is happening.
I changed
wait_event_timeout_interruptible
in md.c(md_thread) to
wait_event_timeout

as the thread no longer needs to be able to respond to signals.
However that has the side-effect of putting the process in the 'D'
state and adding to the load average reported by 'uptime'.

I guess I'll put that back...

NeilBrown


Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./drivers/md/md.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2005-11-10 16:39:04.000000000 +1100
+++ ./drivers/md/md.c 2005-11-10 16:39:28.000000000 +1100
@@ -3439,10 +3439,11 @@ static int md_thread(void * arg)
allow_signal(SIGKILL);
while (!kthread_should_stop()) {

- wait_event_timeout(thread->wqueue,
- test_bit(THREAD_WAKEUP, &thread->flags)
- || kthread_should_stop(),
- thread->timeout);
+ wait_event_timeout_interruptible
+ (thread->wqueue,
+ test_bit(THREAD_WAKEUP, &thread->flags)
+ || kthread_should_stop(),
+ thread->timeout);
try_to_freeze();

clear_bit(THREAD_WAKEUP, &thread->flags);
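
The effect described in this message, a thread sleeping in the 'D' state being counted in the load average, can be checked from userspace by reading the state field of /proc/<pid>/stat. The following C sketch assumes the usual "pid (comm) state ..." layout of that file; the helper names task_state_from_stat and counts_toward_load are hypothetical and are not part of the thread or of any kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: extract the one-letter task state from a line in
 * /proc/<pid>/stat format, "pid (comm) state ...". The command name may
 * itself contain spaces or parentheses, so search for the LAST ')'.
 * Returns '\0' if the line is malformed.
 */
static char task_state_from_stat(const char *line)
{
    assert(line != NULL);
    const char *close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/*
 * Linux counts runnable ('R') and uninterruptible ('D') tasks toward the
 * load average; interruptible sleepers ('S') are not counted.
 */
static bool counts_toward_load(char state)
{
    return state == 'R' || state == 'D';
}
```

With three md threads each sleeping uninterruptibly, every one of them counts toward the load, which matches the load average of about 3.00 reported at the start of the thread on an otherwise idle machine.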

2005-11-10 09:37:27

by Chris Boot

Subject: Re: 2.6.14-mm1 RAID-1 in D< state

On 10 Nov 2005, at 5:40, Neil Brown wrote:

>
> Thanks for the trace. I see what is happening.
> I changed
> wait_event_timeout_interruptible
> in md.c(md_thread) to
> wait_event_timeout
>
> as the thread no longer needs to be able to respond to signals.
> However that has the side-effect of putting the process in the 'D'
> state and adding to the load average reported by 'uptime'.
>
> I guess I'll put that back...
>
> NeilBrown
>
>
> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./drivers/md/md.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff ./drivers/md/md.c~current~ ./drivers/md/md.c
> --- ./drivers/md/md.c~current~ 2005-11-10 16:39:04.000000000 +1100
> +++ ./drivers/md/md.c 2005-11-10 16:39:28.000000000 +1100
> @@ -3439,10 +3439,11 @@ static int md_thread(void * arg)
> allow_signal(SIGKILL);
> while (!kthread_should_stop()) {
>
> - wait_event_timeout(thread->wqueue,
> - test_bit(THREAD_WAKEUP, &thread->flags)
> - || kthread_should_stop(),
> - thread->timeout);
> + wait_event_timeout_interruptible
> + (thread->wqueue,
> + test_bit(THREAD_WAKEUP, &thread->flags)
> + || kthread_should_stop(),
> + thread->timeout);
> try_to_freeze();
>
> clear_bit(THREAD_WAKEUP, &thread->flags);

Sounds about right but...

drivers/md/md.c: In function `md_thread':
drivers/md/md.c:3441: warning: implicit declaration of function `wait_event_timeout_interruptible'
[...]
LD .tmp_vmlinux1
drivers/built-in.o(.text+0x9904f): In function `md_thread':
: undefined reference to `wait_event_timeout_interruptible'
drivers/built-in.o(.text+0x9908f): In function `md_thread':
: undefined reference to `wait_event_timeout_interruptible'
make: *** [.tmp_vmlinux1] Error 1

HTH,
Chris

--
Chris Boot
[email protected]
http://www.bootc.net/




2005-11-10 09:38:10

by J.A. Magallon

Subject: Re: 2.6.14-mm1 RAID-1 in D< state

On Thu, 10 Nov 2005 16:40:13 +1100, Neil Brown <[email protected]> wrote:

>
> Thanks for the trace. I see what is happening.
> I changed
> wait_event_timeout_interruptible
> in md.c(md_thread) to
> wait_event_timeout
>
> as the thread no longer needs to be able to respond to signals.
> However that has the side-effect of putting the process in the 'D'
> state and adding to the load average reported by 'uptime'.
>
> I guess I'll put that back...
>
> NeilBrown
>
>
> Signed-off-by: Neil Brown <[email protected]>
>
> ### Diffstat output
> ./drivers/md/md.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff ./drivers/md/md.c~current~ ./drivers/md/md.c
> --- ./drivers/md/md.c~current~ 2005-11-10 16:39:04.000000000 +1100
> +++ ./drivers/md/md.c 2005-11-10 16:39:28.000000000 +1100
> @@ -3439,10 +3439,11 @@ static int md_thread(void * arg)
> allow_signal(SIGKILL);
> while (!kthread_should_stop()) {
>
> - wait_event_timeout(thread->wqueue,
> - test_bit(THREAD_WAKEUP, &thread->flags)
> - || kthread_should_stop(),
> - thread->timeout);
> + wait_event_timeout_interruptible
> + (thread->wqueue,
> + test_bit(THREAD_WAKEUP, &thread->flags)
> + || kthread_should_stop(),
> + thread->timeout);
> try_to_freeze();
>
> clear_bit(THREAD_WAKEUP, &thread->flags);

s/wait_event_timeout_interruptible/wait_event_interruptible_timeout/

;)

--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam1 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))



2005-11-10 09:39:36

by NeilBrown

Subject: Re: 2.6.14-mm1 RAID-1 in D< state

On Thursday November 10, [email protected] wrote:
>
> Sounds about right but...
>
> drivers/md/md.c: In function `md_thread':
> drivers/md/md.c:3441: warning: implicit declaration of function `wait_event_timeout_interruptible'

should be
wait_event_interruptible_timeout

Sorry.
NeilBrown

2005-11-10 09:51:53

by Chris Boot

Subject: Re: 2.6.14-mm1 RAID-1 in D< state


On 10 Nov 2005, at 9:39, Neil Brown wrote:

> On Thursday November 10, [email protected] wrote:
>>
>> Sounds about right but...
>>
>> drivers/md/md.c: In function `md_thread':
>> drivers/md/md.c:3441: warning: implicit declaration of function `wait_event_timeout_interruptible'
>
> should be
> wait_event_interruptible_timeout
>
> Sorry.
> NeilBrown

No problem. Builds, boots, and fixes the problem.

Cheers,
Chris

--
Chris Boot
[email protected]
http://www.bootc.net/


