2005-09-01 07:23:50

by Ingo Molnar

[permalink] [raw]
Subject: [patch] drivers/ide/pci/alim15x3.c SMP fix


is this the right way to fix the UP assumption below?

Ingo

Signed-off-by: Ingo Molnar <[email protected]>

Index: linux/drivers/ide/pci/alim15x3.c
===================================================================
--- linux.orig/drivers/ide/pci/alim15x3.c
+++ linux/drivers/ide/pci/alim15x3.c
@@ -323,7 +323,7 @@ static void ali15x3_tune_drive (ide_driv
if (r_clc >= 16)
r_clc = 0;
}
- local_irq_save(flags);
+ spin_lock_irqsave(&ide_lock, flags);

/*
* PIO mode => ATA FIFO on, ATAPI FIFO off
@@ -345,7 +345,7 @@ static void ali15x3_tune_drive (ide_driv

pci_write_config_byte(dev, port, s_clc);
pci_write_config_byte(dev, port+drive->select.b.unit+2, (a_clc << 4) | r_clc);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&ide_lock, flags);

/*
* setup active rec
@@ -601,7 +601,7 @@ static unsigned int __devinit init_chips
}
#endif /* defined(DISPLAY_ALI_TIMINGS) && defined(CONFIG_PROC_FS) */

- local_irq_save(flags);
+ spin_lock_irqsave(&ide_lock, flags);

if (m5229_revision < 0xC2) {
/*
@@ -614,7 +614,7 @@ static unsigned int __devinit init_chips
* clear bit 7
*/
pci_write_config_byte(dev, 0x4b, tmpbyte & 0x7F);
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&ide_lock, flags);
return 0;
}

@@ -639,7 +639,7 @@ static unsigned int __devinit init_chips
* 0:0.0 so if we didn't find one we know what is cooking.
*/
if (north && north->vendor != PCI_VENDOR_ID_AL) {
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&ide_lock, flags);
return 0;
}

@@ -662,7 +662,7 @@ static unsigned int __devinit init_chips
pci_write_config_byte(isa_dev, 0x79, tmpbyte | 0x02);
}
}
- local_irq_restore(flags);
+ spin_unlock_irqrestore(&ide_lock, flags);
return 0;
}

@@ -686,7 +686,7 @@ static unsigned int __devinit ata66_ali1
unsigned long flags;
u8 tmpbyte;

- local_irq_save(flags);
+ spin_lock_irqsave(&ide_lock, flags);

if (m5229_revision >= 0xC2) {
/*
@@ -736,7 +736,7 @@ static unsigned int __devinit ata66_ali1

pci_write_config_byte(dev, 0x53, tmpbyte);

- local_irq_restore(flags);
+ spin_unlock_irqrestore(&ide_lock, flags);

return(ata66);
}


2005-09-01 08:12:46

by Rui Nuno Capela

[permalink] [raw]
Subject: Re: [patch] drivers/ide/pci/alim15x3.c SMP fix

Ingo Molnar wrote:
> is this the right way to fix the UP assumption below?
>
> Ingo
>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> Index: linux/drivers/ide/pci/alim15x3.c
> [snip]

OK. The reported boot WARNING seems to be over now. Tested on the
offended laptop ([email protected]/UP, PCI chipset: ALi M1533) with 2.6.13-rt3,
where the suggested patch on drivers/ide/pci/alim15x3.c seems to fix the
burp. All seems to be working fine, still ;)

Thanks.
--
rncbc aka Rui Nuno Capela
[email protected]


2005-09-01 08:17:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] drivers/ide/pci/alim15x3.c SMP fix


* Rui Nuno Capela <[email protected]> wrote:

> Ingo Molnar wrote:
> >is this the right way to fix the UP assumption below?
> >
> > Ingo
> >
> >Signed-off-by: Ingo Molnar <[email protected]>
> >
> >Index: linux/drivers/ide/pci/alim15x3.c
> > [snip]
>
> OK. The reported boot WARNING seems to be over now. Tested on the
> offended laptop ([email protected]/UP, PCI chipset: ALi M1533) with
> 2.6.13-rt3, where the suggested patch on drivers/ide/pci/alim15x3.c
> seems to fix the burp. All seems to be working fine, still ;)

just to make sure the original point gets across: the warning is only in
the -rt tree, and it pinpoints potential SMP bugs. Does your box do
hyperthreading? If yes then this could be a live (but probably mostly
harmless) SMP bug. Maybe the whole IRQ disabling is unnecessary?
__devinit is mostly serialized, so i'm not sure there's any protection
needed against parallel IRQs?

Ingo

2005-09-01 08:50:11

by Rui Nuno Capela

[permalink] [raw]
Subject: Re: [patch] drivers/ide/pci/alim15x3.c SMP fix

Ingo Molnar wrote:
>Rui Nuno Capela wrote:
>>OK. The reported boot WARNING seems to be over now. Tested on the
>>offended laptop ([email protected]/UP, PCI chipset: ALi M1533) with
>>2.6.13-rt3, where the suggested patch on drivers/ide/pci/alim15x3.c
>>seems to fix the burp. All seems to be working fine, still ;)
>
> just to make sure the original point gets across: the warning is only in
> the -rt tree, and it pinpoints potential SMP bugs. Does your box do
> hyperthreading?

Nope. The [email protected] is from pre-HT age, and the kernel is being built
for UP.

Bye now.
--
rncbc aka Rui Nuno Capela
[email protected]


2005-09-01 10:18:13

by Alan

[permalink] [raw]
Subject: Re: [patch] drivers/ide/pci/alim15x3.c SMP fix

On Iau, 2005-09-01 at 09:24 +0200, Ingo Molnar wrote:
> is this the right way to fix the UP assumption below?

Probably not. The ide_lock may already be held (randomly depending on
the code path) at the point we retune a drive on error. Actually you
probably crash before this anyway.

The ALi code in question was written knowing the system would be
uniprocessor and making various related assumptions. You also have to
get this locking right just to make it more fun - loading the timings
for one channel while another is doing I/O corrupts your data silently
in some cases. Fixing the ide_lock to be consistent in usage when the
tuning calls are made (ie fix the reset path and other offenders) might
be possible and would make using ide_lock ok, but it would still be
wrong with pre-emption and/or SMP.

There is currently no sane locking mechanism to enforce or implement
this in the IDE layer. Welcome to hell, please leave your brain at the
door.

Alan

2005-10-03 06:49:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] drivers/ide/pci/alim15x3.c SMP fix


* Alan Cox <[email protected]> wrote:

> On Iau, 2005-09-01 at 09:24 +0200, Ingo Molnar wrote:
> > is this the right way to fix the UP assumption below?
>
> Probably not. The ide_lock may already be held (randomly depending on
> the code path) at the point we retune a drive on error. Actually you
> probably crash before this anyway.
>
> The ALi code in question was written knowing the system would be
> uniprocessor and making various related assumptions. You also have to
> get this locking right just to make it more fun - loading the timings
> for one channel while another is doing I/O corrupts your data silently
> in some cases. Fixing the ide_lock to be consistent in usage when the
> tuning calls are made (ie fix the reset path and other offenders) might
> be possible and would make using ide_lock ok, but it would still be
> wrong with pre-emption and/or SMP.

so perhaps part of the solution would be to do the initialization under
the IDE lock, via the patch below? It boots fine on my box so the basic
codepaths seem to be OK. Then the retuning codepaths need to be checked
to make sure they are holding the IDE lock.

Ingo

Index: linux/drivers/ide/setup-pci.c
===================================================================
--- linux.orig/drivers/ide/setup-pci.c
+++ linux/drivers/ide/setup-pci.c
@@ -665,8 +665,11 @@ static int do_ide_setup_pci_device(struc
{
static ata_index_t ata_index = { .b = { .low = 0xff, .high = 0xff } };
int tried_config = 0;
+ unsigned long flags;
int pciirq, ret;

+ spin_lock_irqsave(&ide_lock, flags);
+
ret = ide_setup_pci_controller(dev, d, noisy, &tried_config);
if (ret < 0)
goto out;
@@ -721,6 +724,8 @@ static int do_ide_setup_pci_device(struc
*index = ata_index;
ide_pci_setup_ports(dev, d, pciirq, index);
out:
+ spin_unlock_irqrestore(&ide_lock, flags);
+
return ret;
}

2005-10-03 15:57:09

by Alan

[permalink] [raw]
Subject: Re: [patch] drivers/ide/pci/alim15x3.c SMP fix

On Llu, 2005-10-03 at 08:50 +0200, Ingo Molnar wrote:
> so perhaps part of the solution would be to do the initialization under
> the IDE lock, via the patch below? It boots fine on my box so the basic
> codepaths seem to be OK. Then the retuning codepaths need to be checked
> to make sure they are holding the IDE lock.

The initialisation can take several seconds. Some controller paths may
also sleep so you would need to do a full audit first.

Alan

2005-10-04 09:28:37

by Rui Nuno Capela

[permalink] [raw]
Subject: tsc_c3_compensate undefined since patch-2.6.13-rt13

Ingo,

I'll take this late opportunity to report something that have been
looking suspicious since 2.6.13-rt13, inclusive, about this symbol of
tsc_c3_compensate being undefined and causing some noise on all kernel
builds since then.

To put things in brief, here follows a small exchange that took place on
the linux-audio-user list, regarding this thingie. Apparently for Mark,
it was a kernel build showstopper.


Mark Knecht wrote:
> Hi there,
> I have a newish AMD64 machine. NForce4 chipset. Asus A8N-E
> motherboard. PCI-Express 16x graphics, etc. No matter what I try I've
> been constantly stopped from building a 2.6.13 kernel with Ingo's rt14
> patch.
>
> Here's the error message:
>
> AS arch/x86_64/lib/copy_user.o
> AS arch/x86_64/lib/csum-copy.o
> CC arch/x86_64/lib/csum-partial.o
> CC arch/x86_64/lib/csum-wrappers.o
> CC arch/x86_64/lib/dec_and_lock.o
> CC arch/x86_64/lib/delay.o
> AS arch/x86_64/lib/getuser.o
> AS arch/x86_64/lib/putuser.o
> AS arch/x86_64/lib/thunk.o
> CC arch/x86_64/lib/usercopy.o
> AR arch/x86_64/lib/lib.a
> GEN .version
> CHK include/linux/compile.h
> UPD include/linux/compile.h
> CC init/version.o
> LD init/built-in.o
> LD .tmp_vmlinux1
> drivers/built-in.o(.text+0x24acc): In function `acpi_processor_idle':
> : undefined reference to `tsc_c3_compensate'
> make: *** [.tmp_vmlinux1] Error 1
> lightning linux #
>
> The 2.6.13 kernel builds fine before I apply the patch but fails
afterward.
>
> I do not find the error message
>
> undefined reference to `tsc_c3_compensate'
>
> in Google so maybe it's just me and my kernel config. I've attached
> the config file zipped. I've tried some obvious stuff like completely
> disabling ACPI, etc., but I haven't found anything that works yet.
>
> Thanks in advance for any ideas. This has held up my new AMD64 machine
> being useful for a few weeks now.
>

I'm spotting a very similar message when building 2.6.13.x-rt14, but as
a module linkage warning, not a fatal build error.

Maybe that's because I try to configure _everything_ I can as a module,
not as built-in. As this has been just a warning on the module install
phase I got along and all seems to boot and run fine -- except for some
acpi stuff e.g. I've lost thermal zone sensor on my laptop, but that
didn't look like a big deal.

Yep, that's it... I found the exact messages in my attic, although from
an erlier build:

WARNING:
/lib/modules/2.6.13.1-rt13.0mdk/kernel/drivers/char/hangcheck-timer.ko
needs unknown symbol do_monotonic_clock
WARNING:
/lib/modules/2.6.13.1-rt13.0mdk/kernel/drivers/acpi/processor.ko needs
unknown symbol tsc_c3_compensate


How about reporting to Ingo and/or the lkml? As you can see its not an
AMD64 issue, because I don't have such a beast here.

Bye.
--
rncbc aka Rui Nuno Capela
[email protected]



2005-10-04 10:13:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: tsc_c3_compensate undefined since patch-2.6.13-rt13


* Rui Nuno Capela <[email protected]> wrote:

> Ingo,
>
> I'll take this late opportunity to report something that have been
> looking suspicious since 2.6.13-rt13, inclusive, about this symbol of
> tsc_c3_compensate being undefined and causing some noise on all kernel
> builds since then.
>
> To put things in brief, here follows a small exchange that took place
> on the linux-audio-user list, regarding this thingie. Apparently for
> Mark, it was a kernel build showstopper.

thanks for the reminder!

> WARNING:
> /lib/modules/2.6.13.1-rt13.0mdk/kernel/drivers/char/hangcheck-timer.ko
> needs unknown symbol do_monotonic_clock
> WARNING:
> /lib/modules/2.6.13.1-rt13.0mdk/kernel/drivers/acpi/processor.ko needs
> unknown symbol tsc_c3_compensate

back then i fixed do_monotonic_clock, but forgot to export
tsc_c3_compensate. I have fixed this in my tree, and have uploaded the
2.6.14-rc3-rt3 patch. Does it build without warnings for you now?

Ingo

2005-10-04 13:06:16

by Rui Nuno Capela

[permalink] [raw]
Subject: Re: tsc_c3_compensate undefined since patch-2.6.13-rt13

Ingo Molnar wrote:
> * Rui Nuno Capela <[email protected]> wrote:
>
>>WARNING:
>>/lib/modules/2.6.13.1-rt13.0mdk/kernel/drivers/char/hangcheck-timer.ko
>>needs unknown symbol do_monotonic_clock
>>WARNING:
>>/lib/modules/2.6.13.1-rt13.0mdk/kernel/drivers/acpi/processor.ko needs
>>unknown symbol tsc_c3_compensate
>
>
> back then i fixed do_monotonic_clock, but forgot to export
> tsc_c3_compensate. I have fixed this in my tree, and have uploaded the
> 2.6.14-rc3-rt3 patch. Does it build without warnings for you now?
>

OK. Just built 2.6.14-rc3-rt4 and it got thru. Thermal-zone sensor is
back in town :)

Tks.
--
rncbc aka Rui Nuno Capela
[email protected]


2005-10-16 22:44:17

by Rui Nuno Capela

[permalink] [raw]
Subject: Re: tsc_c3_compensate undefined since patch-2.6.13-rt13

Ingo Molnar wrote:
> * Rui Nuno Capela <[email protected]> wrote:
>
>
>>Ingo,
>>
>>I'll take this late opportunity to report something that have been
>>looking suspicious since 2.6.13-rt13, inclusive, about this symbol of
>>tsc_c3_compensate being undefined and causing some noise on all kernel
>>builds since then.
>>
>>To put things in brief, here follows a small exchange that took place
>>on the linux-audio-user list, regarding this thingie. Apparently for
>>Mark, it was a kernel build showstopper.
>
>
> thanks for the reminder!
>
>
>>WARNING:
>>/lib/modules/2.6.13.1-rt13.0mdk/kernel/drivers/char/hangcheck-timer.ko
>>needs unknown symbol do_monotonic_clock
>>WARNING:
>>/lib/modules/2.6.13.1-rt13.0mdk/kernel/drivers/acpi/processor.ko needs
>>unknown symbol tsc_c3_compensate
>
>
> back then i fixed do_monotonic_clock, but forgot to export
> tsc_c3_compensate. I have fixed this in my tree, and have uploaded the
> 2.6.14-rc3-rt3 patch. Does it build without warnings for you now?
>

As it seems, tsc_c3_compensate isn't being defined again, as of
patch-2.6.14-rc4-rt6 (-rt4 was ok).

Cheers.
--
rncbc aka Rui Nuno Capela
[email protected]

2005-10-17 12:47:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: tsc_c3_compensate undefined since patch-2.6.13-rt13


* Rui Nuno Capela <[email protected]> wrote:

> As it seems, tsc_c3_compensate isn't being defined again, as of
> patch-2.6.14-rc4-rt6 (-rt4 was ok).

oops - export went MIA during the clockevents merge. Have put it back
again. (will be in -rt7)

Ingo