2002-07-07 10:32:48

by Zwane Mwaikambo

Subject: ata_special_intr, ide_do_drive_cmd deadlock

The trace is quite nice on this one.

CPU: 0
EIP: 0010:[<c024d7f3>] Not tainted
EFLAGS: 00000086
eax: c0455d9c ebx: c15d2d34 ecx: 00000000 edx: cdfcdd24
esi: c0455ee4 edi: c0455ed0 ebp: c0455ee4 esp: cdfcdcbc
ds: 0018 es: 0018 ss: 0018
Process dd (pid: 668, threadinfo=cdfcc000 task=cee0ed00)
Stack: 00000046 00000000 00000001 dead4ead cdfcdcec cdfcdcec ff5aff5a ff5aff5a
ff5aff5a 00000000 00000001 dead4ead cdfcdcec cdfcdcec ff5aff5a ff5aff5a
ff5aff5a c0455ed0 c0455ed0 cdfcdd8c 00000088 c024d7ed c0455ed0 cdfcdd24
Call Trace: [<c024d7ed>] [<c011cefe>] [<c024f641>] [<c024d590>] [<c024cef3>]
[<c024d021>] [<c024f8ba>] [<c024fd47>] [<c024fe13>] [<c0230d89>] [<c023113d>]
[<c0146ae1>] [<c0146bf8>] [<c0134a5f>] [<c0134f80>] [<c013507c>] [<c0134f80>]
[<c014999c>] [<c0107efa>] [<c0149bea>] [<c01075ab>]

Code: 80 3b 00 f3 90 7e f9 e9 c1 fc ff ff 80 3b 00 f3 90 7e f9 e9

>>EIP; c024d7f3 <.text.lock.ide_taskfile+0/1d> <=====
Trace; c024d7ed <ide_raw_taskfile+4d/53>
Trace; c011cefe <printk+1ae/210>
Trace; c024f641 <do_recalibrate+51/70>
Trace; c024d590 <ata_special_intr+0/210>
Trace; c024cef3 <ata_busy_poll+23/70>
Trace; c024d021 <ata_status_poll+a1/c0>
Trace; c024f8ba <start_request+ca/220>
Trace; c024fd47 <queue_commands+e7/170>
Trace; c024fe13 <do_request+43/70>
Trace; c0230d89 <generic_unplug_device+119/170>
Trace; c023113d <blk_run_queues+13d/150>
Trace; c0146ae1 <do_page_cache_readahead+161/180>
Trace; c0146bf8 <page_cache_readahead+f8/100>
Trace; c0134a5f <do_generic_file_read+7f/3c0>
Trace; c0134f80 <file_read_actor+0/80>
Trace; c013507c <generic_file_read+7c/130>
Trace; c0134f80 <file_read_actor+0/80>
Trace; c014999c <vfs_read+9c/160>
Trace; c0107efa <common_interrupt+22/28>
Trace; c0149bea <sys_read+2a/40>
Trace; c01075ab <syscall_call+7/b>
Code; c024d7f3 <.text.lock.ide_taskfile+0/1d>
00000000 <_EIP>:
Code; c024d7f3 <.text.lock.ide_taskfile+0/1d> <=====
0: 80 3b 00 cmpb $0x0,(%ebx) <=====
Code; c024d7f6 <.text.lock.ide_taskfile+3/1d>
3: f3 90 repz nop
Code; c024d7f8 <.text.lock.ide_taskfile+5/1d>
5: 7e f9 jle 0 <_EIP>
Code; c024d7fa <.text.lock.ide_taskfile+7/1d>
7: e9 c1 fc ff ff jmp fffffccd <_EIP+0xfffffccd> c024d4c0 <ide_do_drive_cmd+e0/1b0>
Code; c024d7ff <.text.lock.ide_taskfile+c/1d>
c: 80 3b 00 cmpb $0x0,(%ebx)
Code; c024d802 <.text.lock.ide_taskfile+f/1d>
f: f3 90 repz nop
Code; c024d804 <.text.lock.ide_taskfile+11/1d>
11: 7e f9 jle c <_EIP+0xc> c024d7ff <.text.lock.ide_taskfile+c/1d>
Code; c024d806 <.text.lock.ide_taskfile+13/1d>
13: e9 00 00 00 00 jmp 18 <_EIP+0x18> c024d80b <.text.lock.ide_taskfile+18/1d>

--
function.linuxpower.ca


2002-07-07 10:37:26

by Thunder from the hill

Subject: Re: ata_special_intr, ide_do_drive_cmd deadlock

Hi,

On Sun, 7 Jul 2002, Zwane Mwaikambo wrote:
> The trace is quite nice on this one.
>
> [trace followed immediately]

Have you tried IDE 96+97 yet? They changed ata_special_intr and
ide_do_drive_cmd heavily.

Regards,
Thunder
--
(Use http://www.ebb.org/ungeek if you can't decode)
------BEGIN GEEK CODE BLOCK------
Version: 3.12
GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$
N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G
e++++ h* r--- y-
------END GEEK CODE BLOCK------

2002-07-07 12:04:52

by Zwane Mwaikambo

Subject: Re: ata_special_intr, ide_do_drive_cmd deadlock

On Sun, 7 Jul 2002, Thunder from the hill wrote:

> Have you tried IDE 96+97 yet? They changed ata_special_intr and
> ide_do_drive_cmd heavily.

Thank you sir, I'll have a look.

Cheers,
Zwane Mwaikambo

--
function.linuxpower.ca

Subject: Re: ata_special_intr, ide_do_drive_cmd deadlock


If it was IDE 95, or IDE 95 on an ATAPI device, it is known, noted in 95's
changelog and fixed in 96...

--
Bartlomiej



2002-07-07 16:56:35

by Zwane Mwaikambo

Subject: Re: ata_special_intr, ide_do_drive_cmd deadlock

On Sun, 7 Jul 2002, Bartlomiej Zolnierkiewicz wrote:

>
> If it was IDE 95, or IDE 95 on an ATAPI device, it is known, noted in 95's
> changelog and fixed in 96...

On an ATA disk, with stock 2.5.25, and the deadlock is still there (by visual
inspection) in IDE 97.

Cheers,
Zwane Mwaikambo
--
function.linuxpower.ca

2002-07-07 17:04:55

by Zwane Mwaikambo

Subject: Re: ata_special_intr, ide_do_drive_cmd deadlock

On Sun, 7 Jul 2002, Zwane Mwaikambo wrote:

> > If it was IDE 95, or IDE 95 on an ATAPI device, it is known, noted in 95's
> > changelog and fixed in 96...
>
> On an ATA disk, with stock 2.5.25, and the deadlock is still there (by visual
> inspection) in IDE 97.

Sorry, perhaps let me elaborate: I was doing a dd if=/dev/hdX of=file, then
the drive dropped down to PIO; that's when I reckon I hit do_recalibrate.
This was on 2.5.25.

Thanks,
Zwane Mwaikambo

--
function.linuxpower.ca



Subject: Re: ata_special_intr, ide_do_drive_cmd deadlock


On Sun, 7 Jul 2002, Zwane Mwaikambo wrote:

> On Sun, 7 Jul 2002, Zwane Mwaikambo wrote:
>
> > > If it was IDE 95, or IDE 95 on an ATAPI device, it is known, noted in 95's
> > > changelog and fixed in 96...
> >
> > On an ATA disk, with stock 2.5.25, and the deadlock is still there (by visual
> > inspection) in IDE 97.
>
> Sorry, perhaps let me elaborate: I was doing a dd if=/dev/hdX of=file, then
> the drive dropped down to PIO; that's when I reckon I hit do_recalibrate.
> This was on 2.5.25.

do_recalibrate is called with the lock held and it tries to acquire the
same lock, hence the deadlock. You were the first to notice it and you even
added a FIXME to the code... ;-)
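
The pattern described above (a function called with a non-recursive lock held that then tries to take the same lock) can be sketched in userspace C. This is only an illustration under stated assumptions: `channel_lock`, `lock_bounded()`, `recalibrate_sketch()` and `start_request_sketch()` are hypothetical names, not the kernel's code, and the spin is bounded so the demonstration terminates instead of hanging the way the real spinlock does in the trace.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-channel lock; not the kernel's lock. */
static atomic_flag channel_lock = ATOMIC_FLAG_INIT;

/* Try to take a non-recursive spinlock, giving up after max_spins attempts.
 * A real spinlock has no such bound and would spin forever here. */
static bool lock_bounded(atomic_flag *l, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (!atomic_flag_test_and_set_explicit(l, memory_order_acquire))
            return true;            /* flag was clear, we now own the lock */
    }
    return false;                   /* still held: would have been a hang */
}

static void sketch_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Models the callee: it takes the lock itself before issuing a command. */
static int recalibrate_sketch(void)
{
    if (!lock_bounded(&channel_lock, 1000))
        return -1;                  /* caller already holds the lock */
    /* ... the command would be issued here ... */
    sketch_unlock(&channel_lock);
    return 0;
}

/* Models the caller: it holds the lock across the call to the callee. */
static int start_request_sketch(void)
{
    int rc;

    lock_bounded(&channel_lock, 1000);  /* uncontended, succeeds at once */
    rc = recalibrate_sketch();          /* re-acquisition fails */
    sketch_unlock(&channel_lock);
    return rc;
}
```

Calling `recalibrate_sketch()` on its own returns 0, while `start_request_sketch()` returns -1 because the inner acquisition can never succeed while the caller still owns the lock.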

Do you realise that 2.5.25 has IDE 93, and that it should be fixed in IDE 96?

BTW: a known problem with 96 is the broken ide_timer_expiry().
The attached IDE 98 (or not) prepatch should fix it.

--
Bartlomiej

>
> Thanks,
> Zwane Mwaikambo
>
> --
> function.linuxpower.ca
>
>
>


Attachments:
ide-98-pre.diff (6.28 kB)
Subject: Re: ata_special_intr, ide_do_drive_cmd deadlock


While at it, please don't spend too much time on locking.
I reverted it to what 2.4.x (early 2.5?) kernels do and it should
work fine. Remember, the IDE_BUSY bit protects us from re-entering
ide_do_request() (while it is set nothing will pass down this function)
and the request's REQ_STARTED flag protects us from the block layer.
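
As a rough illustration of how a busy bit can keep a request function from being re-entered, here is a minimal userspace sketch. Everything in it is an assumption made for illustration: `SKETCH_BUSY_BIT`, `struct channel_sketch`, `do_request_sketch()` and `request_done_sketch()` are hypothetical names, and the actual IDE_BUSY handling in the driver may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_BUSY_BIT 0x1u        /* hypothetical flag bit */

struct channel_sketch {
    atomic_uint active;             /* bit 0 plays the role of a busy flag */
    unsigned started;               /* requests actually handed to the device */
};

/* Returns true if a request was started, false if the channel was busy. */
static bool do_request_sketch(struct channel_sketch *ch)
{
    /* Atomically set the bit and look at its previous value. */
    unsigned old = atomic_fetch_or(&ch->active, SKETCH_BUSY_BIT);

    if (old & SKETCH_BUSY_BIT)
        return false;               /* someone is already in here: back off */

    ch->started++;                  /* only the bit's owner gets this far */
    return true;
}

/* Called on completion; clearing the bit lets the next request through. */
static void request_done_sketch(struct channel_sketch *ch)
{
    atomic_fetch_and(&ch->active, ~SKETCH_BUSY_BIT);
}
```

With a zero-initialised `struct channel_sketch`, the first `do_request_sketch()` call succeeds, a second call before `request_done_sketch()` is refused, and a call after completion succeeds again.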

Locking will be slightly changed/fixed, not now but after fixing many
much more urgent issues...
I simply don't want to waste time on fixing locking n times.

Regards
--
Bartlomiej


2002-07-07 17:40:08

by Zwane Mwaikambo

Subject: Re: ata_special_intr, ide_do_drive_cmd deadlock

On Sun, 7 Jul 2002, Bartlomiej Zolnierkiewicz wrote:

> do_recalibrate is called with the lock held and it tries to acquire the
> same lock, hence the deadlock. You were the first to notice it and you even
> added a FIXME to the code... ;-)

I thought you had backed out most if not all those locking changes.

> Do you realise that 2.5.25 has IDE 93, and that it should be fixed in IDE 96?

That I wasn't aware of, thanks. I'm currently looking at 97.

> BTW: a known problem with 96 is the broken ide_timer_expiry().
> The attached IDE 98 (or not) prepatch should fix it.

Thanks,
Zwane Mwaikambo

--
function.linuxpower.ca

2002-07-08 02:02:19

by Petr Vandrovec

Subject: IDE94 lockup on lock_page or __wait_on_buffer

On Sun, Jul 07, 2002 at 07:27:18PM +0200, Bartlomiej Zolnierkiewicz wrote:
>
> Do you realise that 2.5.25 has IDE 93, and that it should be fixed in IDE 96?
>
> BTW: a known problem with 96 is the broken ide_timer_expiry().
> The attached IDE 98 (or not) prepatch should fix it.

Hello,
there is something wrong with IDE94 :-( I've been staring at this problem for
6 hours, but I still cannot explain it. After applying IDE94 and
simply booting with:

Linux init=/bin/bash
# bash < /dev/tty2 > /dev/tty2 2>&1 &
<change to vt2>
# dd if=/dev/hdg of=/dev/null bs=4k
<change back to vt1>
# df

the system deadlocks. The call stack is either (when dd locks up)

__lock_page
lock_page
filemap_nopage (first call to lock_page, at line 1550)
do_no_page
handle_mm_fault
do_page_fault
error_code

or (when bash dies while trying to start df)

__wait_on_buffer
__bread_slow
__getblk
ext2_get_inode
ext2_read_inode
ext2_lookup
real_lookup
do_lookup
link_path_walk
path_lookup
__user_walk
vfs_stat
sys_stat64
syscall_call

Probably IDE messes up its request queue and forgets to execute some requests,
or something along those lines...

None of running processes (2x bash, dd, keventd,
ksoftirqd...) is executing IDE code when the deadlock happens. IDE channel
in question is dead after the deadlock occurs (hdparm -d 0 /dev/hde says
channel busy after some timeout).

The kernel is UP, non-preemptible, running on a 1GHz Athlon with
one UDMA100 IDE (hde) and one UDMA33 IDE (hdg) connected to a pdc20265, and 512MB
RAM. I did not notice any problem while using this patch for the last 7 days
on a 450MHz PIII with two UDMA33 IDE connected to a PIIX4 and 640MB RAM.

The problem occurs even with the latest ide-98-pre.
Thanks,
Petr Vandrovec