2006-09-24 18:57:39

by Krzysztof Halasa

[permalink] [raw]
Subject: NV SATA breakage: jgarzik/libata-dev#upstream etc

Hi,

The following commit in libata-dev breaks NV SATA init - I don't
have a dump handy, but the problem is a NULL ptr dereference here:

libata-core.c
int ata_device_add(const struct ata_probe_ent *ent)
{
...
/* register each port bound to this device */
for (i = 0; i < host->n_ports; i++) {
...
/* start port */
rc = ap->ops->port_start(ap);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The problematic commit is fea63e38013ec628ab3f7fddc4c2148064b7910a:
"[PATCH] libata: fix non-uniform ports handling

Non-uniform ports handling got broken while updating libata to handle
those in the same host. Only separate irq for the non-uniform
secondary port was implemented while all other fields (host flags,
transfer mode...) of the secondary port simply shared those of the
first.

For ata_piix combined mode, which ATM is the only user of non-uniform
ports, this causes the secondary port assume the wrong type. This can
cause PATA port to use SATA ops, which results in bogus check on PCS
and detection failure.

This patch adds ata_probe_ent->pinfo2 which points to optional
port_info for the secondary port. For the time being, this seems to
be the simplest solution. This workaround will be removed together
with ata_probe_ent itself after init model is updated to allow more
flexibility."


All details etc. are, as always, available on request.

00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
(prog-if 85 [Master SecO PriO])
Subsystem: Micro-Star International Co., Ltd. Unknown device 7250
Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 225
I/O ports at c800 [size=8]
I/O ports at c480 [size=4]
I/O ports at c400 [size=8]
I/O ports at c080 [size=4]
I/O ports at c000 [size=16]
Memory at feaf9000 (32-bit, non-prefetchable) [size=4K]
--
Krzysztof Halasa


2006-09-25 00:25:33

by Jeff Garzik

[permalink] [raw]
Subject: Re: NV SATA breakage: jgarzik/libata-dev#upstream etc

Krzysztof Halasa wrote:
> Hi,
>
> The following commit in libata-dev breaks NV SATA init - I don't
> have a dump handy, but the problem is a NULL ptr dereference here:
>
> libata-core.c
> int ata_device_add(const struct ata_probe_ent *ent)
> {
> ...
> /* register each port bound to this device */
> for (i = 0; i < host->n_ports; i++) {
> ...
> /* start port */
> rc = ap->ops->port_start(ap);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> The problematic commit is fea63e38013ec628ab3f7fddc4c2148064b7910a:
> "[PATCH] libata: fix non-uniform ports handling
>
> Non-uniform ports handling got broken while updating libata to handle
> those in the same host. Only separate irq for the non-uniform
> secondary port was implemented while all other fields (host flags,
> transfer mode...) of the secondary port simply shared those of the
> first.

What's broken, and how does it affect NV sata?

That's the chipset on my main dev workstation, and there are no problems
here...

Jeff



2006-09-25 12:51:10

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: NV SATA breakage: jgarzik/libata-dev#upstream etc

Jeff Garzik <[email protected]> writes:

>> libata-core.c
>> int ata_device_add(const struct ata_probe_ent *ent)
>> {
>> ...
>> /* register each port bound to this device */
>> for (i = 0; i < host->n_ports; i++) {
>> ...
>> /* start port */
>> rc = ap->ops->port_start(ap);
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> The problematic commit is fea63e38013ec628ab3f7fddc4c2148064b7910a:
>> "[PATCH] libata: fix non-uniform ports handling
>> Non-uniform ports handling got broken while updating libata to handle
>> those in the same host. Only separate irq for the non-uniform
>> secondary port was implemented while all other fields (host flags,
>> transfer mode...) of the secondary port simply shared those of the
>> first.
>
> What's broken, and how does it affect NV sata?

NV SATA initialization fails with NULL pointer dereference in
ata_device_add (= kernel panic).

> That's the chipset on my main dev workstation, and there are no
> problems here...

I'm a bit surprised... I'm using x86_64 with only one SATA "controller"
enabled (ports 0 and 1 only, ports 2-5 are disabled in BIOS). Only
port 0 is in use. Gcc 4.1.1.

With a64f97f2c351410dfb3099c2369eacf7154b5532 (2.6.18-rc7+, "Merge branch
'tmp' into upstream", just before the commit in question) it works fine:

libata version 2.00 loaded.
sata_nv 0000:00:05.0: version 2.0
ACPI: PCI Interrupt Link [LSA0] enabled at IRQ 23
GSI 16 sharing vector 0xE1 and IRQ 16
ACPI: PCI Interrupt 0000:00:05.0[A] -> Link [LSA0] -> GSI 23 (level, low) -> IRQ
225
PCI: Setting latency timer of device 0000:00:05.0 to 64
ata1: SATA max UDMA/133 cmd 0xC800 ctl 0xC482 bmdma 0xC000 irq 225
ata2: SATA max UDMA/133 cmd 0xC400 ctl 0xC082 bmdma 0xC008 irq 225
scsi0 : sata_nv
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ATA-7, max UDMA/133, 488397168 sectors: LBA48 NCQ (depth 0/32)
ata1.00: ata1: dev 0 multi count 16
ata1.00: configured for UDMA/133
scsi1 : sata_nv
ata2: SATA link down (SStatus 0 SControl 300)
ATA: abnormal status 0x7F on port 0xC407
Vendor: ATA Model: ST3250823AS Rev: 3.03
Type: Direct-Access ANSI SCSI revision: 05
--
Krzysztof Halasa

2006-09-25 17:15:30

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: NV SATA breakage: jgarzik/libata-dev#upstream etc

Ok, rebooted it.

After that commit (fea63e38013ec628ab3f7fddc4c2148064b7910a), it does:

ata1: SATA max UDMA/133 cmd 0xC800 ctl 0xC482 bmdma 0xC000 irq 225

GPF: 0 [1]
RIP: ata_device_add+0x19a/0x530
RAX: 48000002d0b38d48
R14: ffff81003f7cc7e0

Call trace:
nv_init_one+0x190/0x1f0
pci_match_device
pci_device_probe
driver_probe_device
__driver_attach
__driver_attach
bus_for_each_dev
driver_register
__pci_register_driver
nv_init
etc.

ata_device_add:
...
193: 49 8b 46 08 mov 0x8(%r14),%rax <<<< ap->ops = invalid
197: 4c 89 f7 mov %r14,%rdi <<<< ap
19a: ff 90 f8 00 00 00 callq *0xf8(%rax) <<<< ap->ops->port_start
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
And it GPFs here (ap->ops->port_start(ap)).

Actually it seems the ap->ops = RAX is invalid but not exactly a NULL ptr.

Now, sata_nv.c: nv_init_one():
struct ata_port_info *ppi;
ppi = &nv_port_info[ent->driver_data];
probe_ent = ata_pci_init_native_mode(pdev, &ppi,
ATA_PORT_PRIMARY | ATA_ PORT_SECONDARY);

while
/**
* ata_pci_init_native_mode - Initialize native-mode driver
* @pdev: pci device to be initialized
* @port: array[2] of pointers to port info structures.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Not sure... should the nv_init_one() just read:
struct ata_port_info *ppi[2];
ppi[0] = ppi[1] = &nv_port_info[ent->driver_data];
probe_ent = ata_pci_init_native_mode(pdev, ppi,
ATA_PORT_PRIMARY | ATA_ PORT_SECONDARY);

or should
ppi[1] = NULL?
--
Krzysztof Halasa

2006-09-25 22:20:43

by Jeff Garzik

[permalink] [raw]
Subject: Re: NV SATA breakage: jgarzik/libata-dev#upstream etc

Krzysztof Halasa wrote:
> ppi[0] = ppi[1] = &nv_port_info[ent->driver_data];

That's probably the best solution.

Jeff


2006-09-25 23:50:36

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: NV SATA breakage: jgarzik/libata-dev#upstream etc

Jeff Garzik <[email protected]> writes:

> Krzysztof Halasa wrote:
>> ppi[0] = ppi[1] = &nv_port_info[ent->driver_data];
>
> That's probably the best solution.

It fixes the SATA NV problem (I'm not attaching the patch as you already
posted similar one).
--
Krzysztof Halasa

2006-09-25 23:52:18

by Jeff Garzik

[permalink] [raw]
Subject: Re: NV SATA breakage: jgarzik/libata-dev#upstream etc

Krzysztof Halasa wrote:
> Jeff Garzik <[email protected]> writes:
>
>> Krzysztof Halasa wrote:
>>> ppi[0] = ppi[1] = &nv_port_info[ent->driver_data];
>> That's probably the best solution.
>
> It fixes the SATA NV problem (I'm not attaching the patch as you already
> posted similar one).

The argument to ata_pci_init_native_mode() was also updated.

Just for sanity's sake, could you test linux-2.5.git + my patch?

Jeff


2006-09-26 00:29:50

by Krzysztof Halasa

[permalink] [raw]
Subject: Re: NV SATA breakage: jgarzik/libata-dev#upstream etc

Jeff Garzik <[email protected]> writes:

> The argument to ata_pci_init_native_mode() was also updated.
>
> Just for sanity's sake, could you test linux-2.5.git + my patch?

I've already tested nearly identical patch against your latest tree
(currently equal to Linus' - a6d967a485c67ec8a1276261f39d81ace6a3e308)
and it works fine, so Acked-By: Krzysztof Halasa <[email protected]>
--
Krzysztof Halasa