2006-09-07 13:34:00

by Christian Leber

[permalink] [raw]
Subject: [BUG] 2.6.18-rc6: hda is allready "IN USE" when booting (non-deterministic)

Hello,

i keep hitting this since 2.6.18-rc3, the imo relevant messages are:

ide0: I/O resource 0x3F6-0x3F6
hda: ERROR, PORTS ALREADY IN USE
ide0 ad 0x1f0-0x1f7,0x3f6 on irq 14
hdb: HL-DT-STDVD-ROM GDR8081N, ATAPI CD/DVD-ROM drive
ide1: I/O resource 0x376-0x376 not free
ide0: ports already in use, skipping probe
register_blkdev: cannot get major 3 for ide0

the probablity for this is about 70%, so with some patience i can use
the box, but that also means it's very odd.
Between the tries i allways switch it completly off.

It's a Dell Latitude C810 laptop with a Pentium 3m 1133 Mhz with Intel 82815
chipset that has a ICH2M south-bridge.

Unfortunally it's not a hardware defect, with <= 2.6.17 it allways
works with imo the same config.
for reference i dumped it here:
http://debian.christian-leber.de/config-craptop-2.6.18-rc6


Christian Leber

--
"Omnis enim res, quae dando non deficit, dum habetur et non datur,
nondum habetur, quomodo habenda est." (Aurelius Augustinus)


2006-09-12 15:39:36

by Christian Leber

[permalink] [raw]
Subject: Re: [BUG] 2.6.18-rc6: hda is allready "IN USE" when booting / pi futex

On Thu, Sep 07, 2006 at 03:33:57PM +0200, Christian Leber wrote:

> ide0: I/O resource 0x3F6-0x3F6
> hda: ERROR, PORTS ALREADY IN USE
> ide0 ad 0x1f0-0x1f7,0x3f6 on irq 14
> hdb: HL-DT-STDVD-ROM GDR8081N, ATAPI CD/DVD-ROM drive
> ide1: I/O resource 0x376-0x376 not free
> ide0: ports already in use, skipping probe
> register_blkdev: cannot get major 3 for ide0
>
> the probablity for this is about 70%, so with some patience i can use
> the box, but that also means it's very odd.
> Between the tries i allways switch it completly off.
>
> It's a Dell Latitude C810 laptop with a Pentium 3m 1133 Mhz with Intel 82815
> chipset that has a ICH2M south-bridge.

I wasted a day to track it down, unfortunally it's for me completly
unclear how this commit s related to IDE, it could be some
timing issue, but that is just guessing.

With 77ba89c5cf28d5d98a3cae17f67a3e42b102cc25 (linux-2.6 tree) i don't have a
problem (i booted it 21 times until now and it allways worked) and
b29739f902ee76a05493fb7d2303490fc75364f4 is the first bad commit.

So this seems to be the "offender":
Author: Ingo Molnar <[email protected]>
Date: Tue Jun 27 02:54:51 2006 -0700

[PATCH] pi-futex: scheduler support for pi


For convenience the diff is also here:
http://debian.christian-leber.de/idebad.patch
I have really no remote idea what is happening.

BTW: I have seen this problem too on a box with a intel BX440, but only
one time.

Christian Leber

--
"Omnis enim res, quae dando non deficit, dum habetur et non datur,
nondum habetur, quomodo habenda est." (Aurelius Augustinus)

2006-09-12 17:42:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] 2.6.18-rc6: hda is allready "IN USE" when booting / pi futex


* Christian Leber <[email protected]> wrote:

> I wasted a day to track it down, unfortunally it's for me completly
> unclear how this commit s related to IDE, it could be some timing
> issue, but that is just guessing.

yeah, i too suspect that it's timing related.

> With 77ba89c5cf28d5d98a3cae17f67a3e42b102cc25 (linux-2.6 tree) i don't
> have a problem (i booted it 21 times until now and it allways worked)
> and b29739f902ee76a05493fb7d2303490fc75364f4 is the first bad commit.

but in case it's not timing related, could you do some more testing,
ontop of the 77ba89c5cf28d5d98a3cae17f67a3e42b102cc25 tree?

The b29739f902ee76a05493fb7d2303490fc75364f4 patch is quite large, so i
have created a finegrained, functional splitup of it:

sched-cleanups.patch
sched-add-task-rq-lock-ops.patch
sched-add-rt-mutex-setprio.patch
sched-pi-lock.patch
sched-add-new-macros.patch
sched-add-normal-prio.patch
sched-use-has-rt-policy.patch
sched-set-user-nice-fix.patch
sched-use-normal-prio.patch

and have attached the resulting tarball. (Just extract the tarball into
the known-good 77ba89c5cf28d5d98a3cae17f67a3e42b102cc25 tree, it will
create a patches/ directory which includes a series file and the
patches. If you install quilt then you can do 'quilt push' / 'quilt pop'
to navigate in the patch-queue easily. Use 'quilt applied' to see the
current patch-stack.)

i have ordered the patches within the splitup in a way so that the
patches with more impact are at the end. The last patch
(sched-use-normal-prio) is the one most likely to cause the problem (but
sched-set-user-nice-fix and sched-use-has-rt-policy are possible
candidates too). I have verified that every intermediate step builds and
boots cleanly as well.

Could you try this patch-queue and figure which patch causes the failure
you are seeing? That would reduce the search space quite significantly.

Ingo


Attachments:
(No filename) (1.87 kB)
patches.tar.gz (4.65 kB)
Download all attachments

2006-09-12 21:48:26

by Christian Leber

[permalink] [raw]
Subject: Re: [BUG] 2.6.18-rc6: hda is allready "IN USE" when booting / pi futex

On Tue, Sep 12, 2006 at 07:34:55PM +0200, Ingo Molnar wrote:

> yeah, i too suspect that it's timing related.

I guess that means it's actually a bug in the ide chipset driver.

> The b29739f902ee76a05493fb7d2303490fc75364f4 patch is quite large, so i
> have created a finegrained, functional splitup of it:
>
> sched-cleanups.patch
> sched-add-task-rq-lock-ops.patch
> sched-add-rt-mutex-setprio.patch
> sched-pi-lock.patch
> sched-add-new-macros.patch
> sched-add-normal-prio.patch
> sched-use-has-rt-policy.patch

until here everthing seems to be ok (20 boots)

> sched-set-user-nice-fix.patch

here we go, that one triggers the problem

> sched-use-normal-prio.patch
>
> and have attached the resulting tarball.

Thank you a lot, i didn't know how to split it up.


Christian Leber

--
"Omnis enim res, quae dando non deficit, dum habetur et non datur,
nondum habetur, quomodo habenda est." (Aurelius Augustinus)

2006-09-12 22:03:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: [BUG] 2.6.18-rc6: hda is allready "IN USE" when booting / pi futex


* Christian Leber <[email protected]> wrote:

> > The b29739f902ee76a05493fb7d2303490fc75364f4 patch is quite large, so i
> > have created a finegrained, functional splitup of it:
> >
> > sched-cleanups.patch
> > sched-add-task-rq-lock-ops.patch
> > sched-add-rt-mutex-setprio.patch
> > sched-pi-lock.patch
> > sched-add-new-macros.patch
> > sched-add-normal-prio.patch
> > sched-use-has-rt-policy.patch
>
> until here everthing seems to be ok (20 boots)
>
> > sched-set-user-nice-fix.patch
>
> here we go, that one triggers the problem

ok, great. Could you try the patch below ontop of 2.6.18-rc6, does it
'fix' your bootup too?

the original patch (which this patch reverses) is harmless: it changes
the priority calculation of reniced tasks from:

p->prio = NICE_TO_PRIO(nice);

to:

p->prio = effective_prio(p);

which is a correct change, because NICE_TO_PRIO() gives a 'static' value
that is not modified by interactivity bonuses/penalties, while
effective_prio() is the kind of prio every task should get. I.e.
sys_nice(1) used to (incorrectly) discard the interactive bias of a
task, until the end of the timeslice.

so i'm afraid you managed to trigger a race in the IDE driver that used
to be dormant until now... To debug such races the best method is to
create a 'trace' by printk-ing key points of the port/disk detection
mechanism. [another option would be to use the latency tracer to do a
_really_ finegrained trace of the incident - assuming that the bug does
not go away due to tracing overhead.]

Ingo

--------------->
Subject: [patch] revert set_user_nice() change
From: Ingo Molnar <[email protected]>

revert an otherwise correct fix.

this disturbs dynamic priorities (of user and kernel threads),
so it could have an impact on timing-sensitive races.

NOT-Signed-off-by: Ingo Molnar <[email protected]>

---
kernel/sched.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -3864,8 +3864,8 @@ void rt_mutex_setprio(struct task_struct

void set_user_nice(struct task_struct *p, long nice)
{
+ int old_prio, new_prio, delta;
struct prio_array *array;
- int old_prio, delta;
unsigned long flags;
struct rq *rq;

@@ -3892,11 +3892,12 @@ void set_user_nice(struct task_struct *p
dec_raw_weighted_load(rq, p);
}

+ old_prio = p->prio;
+ new_prio = NICE_TO_PRIO(nice);
+ delta = new_prio - old_prio;
p->static_prio = NICE_TO_PRIO(nice);
set_load_weight(p);
- old_prio = p->prio;
- p->prio = effective_prio(p);
- delta = p->prio - old_prio;
+ p->prio += delta;

if (array) {
enqueue_task(p, array);

2006-09-13 01:24:51

by Christian Leber

[permalink] [raw]
Subject: Re: [BUG] 2.6.18-rc6: hda is allready "IN USE" when booting / pi futex

On Tue, Sep 12, 2006 at 11:55:10PM +0200, Ingo Molnar wrote:

> ok, great. Could you try the patch below ontop of 2.6.18-rc6, does it
> 'fix' your bootup too?

Yes, it does.

> so i'm afraid you managed to trigger a race in the IDE driver that used
> to be dormant until now... To debug such races the best method is to
> create a 'trace' by printk-ing key points of the port/disk detection
> mechanism. [another option would be to use the latency tracer to do a
> _really_ finegrained trace of the incident - assuming that the bug does
> not go away due to tracing overhead.]

I'm looking into it, but i doubt i can find the "race"

The output, when it works it looks like this:

[ 11.275757] ICH2M: IDE controller at PCI slot 0000:00:1f.1
[ 11.275854] ICH2M: chipset revision 3
[ 11.275917] ICH2M: not 100% native mode: will probe irqs later
[ 11.275999] ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:DMA
[ 11.276170] ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:pio, hdd:pio
[ 11.276337] Probing IDE interface ide0...
[ 2.804000] Time: acpi_pm clocksource has been installed.
[ 3.056000] hda: IC25N020ATMR04-0, ATA DISK drive
[ 3.504000] hdb: HL-DT-STDVD-ROM GDR8081N, ATAPI CD/DVD-ROM drive
[ 3.560000] ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
[ 3.560000] Probing IDE interface ide1...
[ 4.136000] hda: max request size: 512KiB
[ 4.156000] hda: 39070080 sectors (20003 MB) w/1740KiB Cache, CHS=16383/255/63, UDMA(100)
[ 4.156000] hda: cache flushes supported
[ 4.156000] hda: hda1 hda2 hda3
[ 4.216000] hdb: ATAPI 24X DVD-ROM drive, 512kB Cache, DMA
[ 4.216000] Uniform CD-ROM driver Revision: 3.20
[ 4.816000] usbcore: registered new driver usbfs
[ 4.820000] usbcore: registered new driver hub
[ 4.820000] USB Universal Host Controller Interface driver v3.0


When it fails the order is different:
(but not allways the same, of course)

hda: IC25N020ATMR04-0, ATA DISK drive <- the disk is found before the controller?!?
ICH2M: IDE controller at PCI slot 0000:00:1f.1
ICH2M: chipset revision 3
ICH2M: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xbfa0-0xbfa7, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xbfa8-0xbfaf, BIOS settings: hdc:pio, hdd:pio
ide0: I/O resource 0x3F6-0x3F6 not free
hda: ERROR, PORTS ALREADY IN USE
ide0 ad 0x1f0-0x1f7,0x3f6 on irq 14
usbcore: registered new driver usbfs
usbcore: registered new driver hub
USB Universal Host Controller Interface driver v3.0
hdb: HL-DT-STDVD-ROM GDR8081N, ATAPI CD/DVD-ROM drive
ide1: I/O resource 0x376-0x376 not free
ide1: ports already in use, skipping probe
register_blkdev: cannot get major 3 for ide0


Christian Leber

--
"Omnis enim res, quae dando non deficit, dum habetur et non datur,
nondum habetur, quomodo habenda est." (Aurelius Augustinus)