2006-12-05 06:02:11

by Stephen Rothwell

[permalink] [raw]
Subject: Oops in pata_pdc2027x

Hi all,

I get an oops during initialisation of the pata_pdc2027x module on a
POWER 285 machine. This is on a very recent Linus kernel tree
(ff51a98799931256b555446b2f5675db08de6229) with Paulus' powerpc tree
(that has now been merged). The oops looks like this:

----------------------------------------------------------------------------
Synthesizing the initial hotplug events...done.
Waiting for /dev to be fully populated...pata_pdc2027x 0001:cc:01.0: PLL input clock 32721 kHz
ata1: PATA max UDMA/133 cmd 0xD0000800801457C0 ctl 0xD000080080145FDA bmdma 0xD000080080145000 irq 324
ata2: PATA max UDMA/133 cmd 0xD0000800801455C0 ctl 0xD000080080145DDA bmdma 0xD000080080145008 irq 324
scsi1 : pata_pdc2027x
ata1.00: ATAPI, max UDMA/66
Unable to handle kernel paging request for data at address 0xc0000001007e8d80
Faulting instruction address: 0xd00000000007b660
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=128 NUMA
Modules linked in: pata_pdc2027x dm_mirror dm_snapshot ipr libata
NIP: D00000000007B660 LR: D00000000007B628 CTR: 0000000000000000
REGS: c0000000f4e3b5a0 TRAP: 0300 Not tainted (2.6.19comb)
MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24000080 XER: 20000009
DAR: C0000001007E8D80, DSISR: 0000000040010000
TASK = c000000007001040[3393] 'scsi_eh_1' THREAD: c0000000f4e38000 CPU: 2
GPR00: 0000000100000000 C0000000F4E3B820 D0000000000A95C0 C00000000FA946E8
GPR04: C0000000F4E3B960 0000000000000000 0000000000000003 C0000000F4E3B890
GPR08: 0000000000000001 C0000000007E8D80 D000080080145110 C000000000033A34
GPR12: 0000000000000000 C00000000056FF80 0000000000000000 C0000000004751C8
GPR16: 4000000001C00000 C000000000473B90 0000000000000000 000000000035D800
GPR20: 000000000213AF30 C00000000053AF30 C0000000F4E3BB10 0000000000000001
GPR24: 0000000000000002 0000000000000000 C0000000F4E3B960 C00000000FA946E8
GPR28: 0000000000000003 4000000000000000 D0000000000A7980 0000000000000000
NIP [D00000000007B660] .ata_exec_internal+0x8c/0xf8 [libata]
LR [D00000000007B628] .ata_exec_internal+0x54/0xf8 [libata]
Call Trace:
[C0000000F4E3B820] [D0000000000A7980] .ipr_store_update_fw+0x100/0x6f0 [ipr] (unreliable)
[C0000000F4E3B8F0] [D00000000007C15C] .ata_set_mode+0x4dc/0x6d0 [libata]
[C0000000F4E3B9D0] [D0000000000860E0] .ata_do_eh+0x1538/0x1930 [libata]
[C0000000F4E3BC20] [D000000000083774] .ata_bmdma_drive_eh+0x1f8/0x2e8 [libata]
[C0000000F4E3BCD0] [D0000000000C17B4] .pdc2027x_error_handler+0x28/0x40 [pata_pdc2027x]
[C0000000F4E3BD50] [D000000000086DC0] .ata_scsi_error+0x32c/0x660 [libata]
[C0000000F4E3BE00] [C000000000312F7C] .scsi_error_handler+0xc8/0x80c
[C0000000F4E3BEE0] [C00000000006A0F4] .kthread+0x124/0x174
[C0000000F4E3BF90] [C000000000024D68] .kernel_thread+0x4c/0x68
Instruction dump:
57ac053e e93e8020 7f63db78 7f44d378 780007c6 7f25cb78 7f86e378 38e10070
7fbd0214 39000001 7ba0f860 78001f24 <7d69002a> 7ba0a302 7bbd4602 39200000
done.
----------------------------------------------------------------------------

(The reference to ipr_store_update_fw is certainly not real)

The config, lspci and full boot dmesg output are available at
http://ozlabs.org/~sfr/{config,lspci,dmesg}

This machine has a only cdrom drive attached to the pdc controller.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (3.23 kB)
(No filename) (189.00 B)
Download all attachments

2006-12-06 06:10:36

by Albert Lee

[permalink] [raw]
Subject: Re: Oops in pata_pdc2027x

Stephen Rothwell wrote:
> Hi all,
>
> I get an oops during initialisation of the pata_pdc2027x module on a
> POWER 285 machine. This is on a very recent Linus kernel tree
> (ff51a98799931256b555446b2f5675db08de6229) with Paulus' powerpc tree
> (that has now been merged). The oops looks like this:
>

Hi Stephen,

Where could I download the patched kernel source? There are two
POWER5 9110-51A boxes here. Maybe I could try to reproduce the problem
on these boxes.

--
albert


2006-12-06 06:21:25

by Stephen Rothwell

[permalink] [raw]
Subject: Re: Oops in pata_pdc2027x

On Wed, 06 Dec 2006 14:10:15 +0800 Albert Lee <[email protected]> wrote:
>
> Where could I download the patched kernel source? There are two
> POWER5 9110-51A boxes here. Maybe I could try to reproduce the problem
> on these boxes.

You can just use Linus' current tree - it exhibits the same problem (and
the powerpc tree has been merged into it).

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (454.00 B)
(No filename) (189.00 B)
Download all attachments

2006-12-07 03:56:43

by Albert Lee

[permalink] [raw]
Subject: Re: Oops in pata_pdc2027x

Stephen Rothwell wrote:
> Hi all,
>
> I get an oops during initialisation of the pata_pdc2027x module on a
> POWER 285 machine. This is on a very recent Linus kernel tree
> (ff51a98799931256b555446b2f5675db08de6229) with Paulus' powerpc tree
> (that has now been merged). The oops looks like this:
>
> ----------------------------------------------------------------------------
> Synthesizing the initial hotplug events...done.
> Waiting for /dev to be fully populated...pata_pdc2027x 0001:cc:01.0: PLL input clock 32721 kHz
> ata1: PATA max UDMA/133 cmd 0xD0000800801457C0 ctl 0xD000080080145FDA bmdma 0xD000080080145000 irq 324
> ata2: PATA max UDMA/133 cmd 0xD0000800801455C0 ctl 0xD000080080145DDA bmdma 0xD000080080145008 irq 324
> scsi1 : pata_pdc2027x
> ata1.00: ATAPI, max UDMA/66
> Unable to handle kernel paging request for data at address 0xc0000001007e8d80
> Faulting instruction address: 0xd00000000007b660
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=128 NUMA
> Modules linked in: pata_pdc2027x dm_mirror dm_snapshot ipr libata
> NIP: D00000000007B660 LR: D00000000007B628 CTR: 0000000000000000
> REGS: c0000000f4e3b5a0 TRAP: 0300 Not tainted (2.6.19comb)
> MSR: 8000000000009032 <EE,ME,IR,DR> CR: 24000080 XER: 20000009
> DAR: C0000001007E8D80, DSISR: 0000000040010000
> TASK = c000000007001040[3393] 'scsi_eh_1' THREAD: c0000000f4e38000 CPU: 2
> GPR00: 0000000100000000 C0000000F4E3B820 D0000000000A95C0 C00000000FA946E8
> GPR04: C0000000F4E3B960 0000000000000000 0000000000000003 C0000000F4E3B890
> GPR08: 0000000000000001 C0000000007E8D80 D000080080145110 C000000000033A34
> GPR12: 0000000000000000 C00000000056FF80 0000000000000000 C0000000004751C8
> GPR16: 4000000001C00000 C000000000473B90 0000000000000000 000000000035D800
> GPR20: 000000000213AF30 C00000000053AF30 C0000000F4E3BB10 0000000000000001
> GPR24: 0000000000000002 0000000000000000 C0000000F4E3B960 C00000000FA946E8
> GPR28: 0000000000000003 4000000000000000 D0000000000A7980 0000000000000000
> NIP [D00000000007B660] .ata_exec_internal+0x8c/0xf8 [libata]
> LR [D00000000007B628] .ata_exec_internal+0x54/0xf8 [libata]
> Call Trace:
> [C0000000F4E3B820] [D0000000000A7980] .ipr_store_update_fw+0x100/0x6f0 [ipr] (unreliable)
> [C0000000F4E3B8F0] [D00000000007C15C] .ata_set_mode+0x4dc/0x6d0 [libata]
> [C0000000F4E3B9D0] [D0000000000860E0] .ata_do_eh+0x1538/0x1930 [libata]
> [C0000000F4E3BC20] [D000000000083774] .ata_bmdma_drive_eh+0x1f8/0x2e8 [libata]
> [C0000000F4E3BCD0] [D0000000000C17B4] .pdc2027x_error_handler+0x28/0x40 [pata_pdc2027x]
> [C0000000F4E3BD50] [D000000000086DC0] .ata_scsi_error+0x32c/0x660 [libata]
> [C0000000F4E3BE00] [C000000000312F7C] .scsi_error_handler+0xc8/0x80c
> [C0000000F4E3BEE0] [C00000000006A0F4] .kthread+0x124/0x174
> [C0000000F4E3BF90] [C000000000024D68] .kernel_thread+0x4c/0x68
> Instruction dump:
> 57ac053e e93e8020 7f63db78 7f44d378 780007c6 7f25cb78 7f86e378 38e10070
> 7fbd0214 39000001 7ba0f860 78001f24 <7d69002a> 7ba0a302 7bbd4602 39200000
> done.
> ----------------------------------------------------------------------------
>
> (The reference to ipr_store_update_fw is certainly not real)
>
> The config, lspci and full boot dmesg output are available at
> http://ozlabs.org/~sfr/{config,lspci,dmesg}
>
> This machine has a only cdrom drive attached to the pdc controller.
>

It seems the opps occurred in the ata_exec_internal(). Could you please
try the attached patch and see if the older ata_exec_internal() works?
This can help to narrow it down.

--
albert

--- pdc2027x_ozlab/drivers/ata/libata-core.c 2006-12-06 15:01:13.000000000 +0800
+++ exec_internal/drivers/ata/libata-core.c 2006-12-07 11:49:08.000000000 +0800
@@ -1334,7 +1334,7 @@ unsigned ata_exec_internal_sg(struct ata
}

/**
- * ata_exec_internal_sg - execute libata internal command
+ * ata_exec_internal - execute libata internal command
* @dev: Device to which the command is sent
* @tf: Taskfile registers for the command and the result
* @cdb: CDB for packet command
@@ -1342,8 +1342,11 @@ unsigned ata_exec_internal_sg(struct ata
* @buf: Data buffer of the command
* @buflen: Length of data buffer
*
- * Wrapper around ata_exec_internal_sg() which takes simple
- * buffer instead of sg list.
+ * Executes libata internal command with timeout. @tf contains
+ * command on entry and result on return. Timeout and error
+ * conditions are reported via return value. No recovery action
+ * is taken after a command times out. It's caller's duty to
+ * clean up after timeout.
*
* LOCKING:
* None. Should be called with kernel context, might sleep.
@@ -1355,11 +1358,141 @@ unsigned ata_exec_internal(struct ata_de
struct ata_taskfile *tf, const u8 *cdb,
int dma_dir, void *buf, unsigned int buflen)
{
- struct scatterlist sg;
+ struct ata_port *ap = dev->ap;
+ u8 command = tf->command;
+ struct ata_queued_cmd *qc;
+ unsigned int tag, preempted_tag;
+ u32 preempted_sactive, preempted_qc_active;
+ DECLARE_COMPLETION_ONSTACK(wait);
+ unsigned long flags;
+ unsigned int err_mask;
+ int rc;

- sg_init_one(&sg, buf, buflen);
+ spin_lock_irqsave(ap->lock, flags);

- return ata_exec_internal_sg(dev, tf, cdb, dma_dir, &sg, 1);
+ /* no internal command while frozen */
+ if (ap->pflags & ATA_PFLAG_FROZEN) {
+ spin_unlock_irqrestore(ap->lock, flags);
+ return AC_ERR_SYSTEM;
+ }
+
+ /* initialize internal qc */
+
+ /* XXX: Tag 0 is used for drivers with legacy EH as some
+ * drivers choke if any other tag is given. This breaks
+ * ata_tag_internal() test for those drivers. Don't use new
+ * EH stuff without converting to it.
+ */
+ if (ap->ops->error_handler)
+ tag = ATA_TAG_INTERNAL;
+ else
+ tag = 0;
+
+ if (test_and_set_bit(tag, &ap->qc_allocated))
+ BUG();
+ qc = __ata_qc_from_tag(ap, tag);
+
+ qc->tag = tag;
+ qc->scsicmd = NULL;
+ qc->ap = ap;
+ qc->dev = dev;
+ ata_qc_reinit(qc);
+
+ preempted_tag = ap->active_tag;
+ preempted_sactive = ap->sactive;
+ preempted_qc_active = ap->qc_active;
+ ap->active_tag = ATA_TAG_POISON;
+ ap->sactive = 0;
+ ap->qc_active = 0;
+
+ /* prepare & issue qc */
+ qc->tf = *tf;
+ if (cdb)
+ memcpy(qc->cdb, cdb, ATAPI_CDB_LEN);
+ qc->flags |= ATA_QCFLAG_RESULT_TF;
+ qc->dma_dir = dma_dir;
+ if (dma_dir != DMA_NONE) {
+ ata_sg_init_one(qc, buf, buflen);
+ qc->nsect = buflen / ATA_SECT_SIZE;
+ }
+
+ qc->private_data = &wait;
+ qc->complete_fn = ata_qc_complete_internal;
+
+ ata_qc_issue(qc);
+
+ spin_unlock_irqrestore(ap->lock, flags);
+
+ rc = wait_for_completion_timeout(&wait, ata_probe_timeout);
+
+ ata_port_flush_task(ap);
+
+ if (!rc) {
+ spin_lock_irqsave(ap->lock, flags);
+
+ /* We're racing with irq here. If we lose, the
+ * following test prevents us from completing the qc
+ * twice. If we win, the port is frozen and will be
+ * cleaned up by ->post_internal_cmd().
+ */
+ if (qc->flags & ATA_QCFLAG_ACTIVE) {
+ qc->err_mask |= AC_ERR_TIMEOUT;
+
+ if (ap->ops->error_handler)
+ ata_port_freeze(ap);
+ else
+ ata_qc_complete(qc);
+
+ if (ata_msg_warn(ap))
+ ata_dev_printk(dev, KERN_WARNING,
+ "qc timeout (cmd 0x%x)\n", command);
+ }
+
+ spin_unlock_irqrestore(ap->lock, flags);
+ }
+
+ /* do post_internal_cmd */
+ if (ap->ops->post_internal_cmd)
+ ap->ops->post_internal_cmd(qc);
+
+ if (qc->flags & ATA_QCFLAG_FAILED && !qc->err_mask) {
+ if (ata_msg_warn(ap))
+ ata_dev_printk(dev, KERN_WARNING,
+ "zero err_mask for failed "
+ "internal command, assuming AC_ERR_OTHER\n");
+ qc->err_mask |= AC_ERR_OTHER;
+ }
+
+ /* finish up */
+ spin_lock_irqsave(ap->lock, flags);
+
+ *tf = qc->result_tf;
+ err_mask = qc->err_mask;
+
+ ata_qc_free(qc);
+ ap->active_tag = preempted_tag;
+ ap->sactive = preempted_sactive;
+ ap->qc_active = preempted_qc_active;
+
+ /* XXX - Some LLDDs (sata_mv) disable port on command failure.
+ * Until those drivers are fixed, we detect the condition
+ * here, fail the command with AC_ERR_SYSTEM and reenable the
+ * port.
+ *
+ * Note that this doesn't change any behavior as internal
+ * command failure results in disabling the device in the
+ * higher layer for LLDDs without new reset/EH callbacks.
+ *
+ * Kill the following code as soon as those drivers are fixed.
+ */
+ if (ap->flags & ATA_FLAG_DISABLED) {
+ err_mask |= AC_ERR_SYSTEM;
+ ata_port_probe(ap);
+ }
+
+ spin_unlock_irqrestore(ap->lock, flags);
+
+ return err_mask;
}

/**




2006-12-07 06:16:25

by Stephen Rothwell

[permalink] [raw]
Subject: Re: Oops in pata_pdc2027x

On Thu, 07 Dec 2006 11:56:32 +0800 Albert Lee <[email protected]> wrote:
>
> It seems the opps occurred in the ata_exec_internal(). Could you please
> try the attached patch and see if the older ata_exec_internal() works?
> This can help to narrow it down.

This worked much better:

Waiting for /dev to be fully populated...pata_pdc2027x 0001:cc:01.0: PLL input clock 32721 kHz
ata1: PATA max UDMA/133 cmd 0xD0000800801457C0 ctl 0xD000080080145FDA bmdma 0xD000080080145000 irq 324
ata2: PATA max UDMA/133 cmd 0xD0000800801455C0 ctl 0xD000080080145DDA bmdma 0xD000080080145008 irq 324
scsi1 : pata_pdc2027x
ata1.00: ATAPI, max UDMA/66
ata1.00: configured for UDMA/66
scsi2 : pata_pdc2027x
ATA: abnormal status 0x8 on port 0xD0000800801455DF
scsi 1:0:0:0: CD-ROM IBM DROM00205L1 H0 P533 PQ: 0 ANSI: 2
sr0: scsi3-mmc drive: 24x/24x cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.20
sr 1:0:0:0: Attached scsi generic sg4 type 5
done.

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/


Attachments:
(No filename) (1.04 kB)
(No filename) (189.00 B)
Download all attachments

2006-12-09 08:59:24

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: Oops in pata_pdc2027x

Could that be related to the fix

[PATCH] libata: fix oops with sparsemem

That Arnd Bergmann just posted ? I'm copying it below in case you
haven't seen it.

----

From:
Arnd Bergmann
<[email protected]>
To:
[email protected]
Cc:
[email protected],
[email protected],
[email protected]
Subject:
[PATCH] libata: fix oops with
sparsemem
Date:
Fri, 8 Dec 2006 19:14:40 +0100
(Sat, 05:14 EST)


libata incorrectly passes NULL arguments to sg_set_buf, which
crashes on powerpc64 when looking for the corresponding mem_section.

This introduces a new ata_exec_nodma() wrapper that takes no buffer
arguments and does not call sg_set_buf either. In order to make it
easier to detect this sort of problem, it also adds a WARN_ON(!buf)
to sg_set_buf() so we get a log message even platforms without
sparsemem.

Signed-off-by: Arnd Bergmann <[email protected]>

Index: linux-2.6/drivers/ata/libata-core.c
===================================================================
--- linux-2.6.orig/drivers/ata/libata-core.c
+++ linux-2.6/drivers/ata/libata-core.c
@@ -1332,7 +1332,7 @@ unsigned ata_exec_internal_sg(struct ata
}

/**
- * ata_exec_internal_sg - execute libata internal command
+ * ata_exec_internal - execute libata internal command
* @dev: Device to which the command is sent
* @tf: Taskfile registers for the command and the result
* @cdb: CDB for packet command
@@ -1361,6 +1361,25 @@ unsigned ata_exec_internal(struct ata_de
}

/**
+ * ata_exec_nodma - execute libata internal command
+ * @dev: Device to which the command is sent
+ * @tf: Taskfile registers for the command and the result
+ *
+ * Wrapper around ata_exec_internal_sg() which takes no
+ * data buffer.
+ *
+ * LOCKING:
+ * None. Should be called with kernel context, might sleep.
+ *
+ * RETURNS:
+ * Zero on success, AC_ERR_* mask on failure
+ */
+static unsigned ata_exec_nodma(struct ata_device *dev, struct
ata_taskfile *tf)
+{
+ return ata_exec_internal_sg(dev, tf, NULL, DMA_NONE, NULL, 0);
+}
+
+/**
* ata_do_simple_cmd - execute simple internal command
* @dev: Device to which the command is sent
* @cmd: Opcode to execute
@@ -1384,7 +1403,7 @@ unsigned int ata_do_simple_cmd(struct at
tf.flags |= ATA_TFLAG_DEVICE;
tf.protocol = ATA_PROT_NODATA;

- return ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0);
+ return ata_exec_nodma(dev, &tf);
}

/**
@@ -3475,7 +3494,7 @@ static unsigned int ata_dev_set_xfermode
tf.protocol = ATA_PROT_NODATA;
tf.nsect = dev->xfer_mode;

- err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0);
+ err_mask = ata_exec_nodma(dev, &tf);

DPRINTK("EXIT, err_mask=%x\n", err_mask);
return err_mask;
@@ -3513,7 +3532,7 @@ static unsigned int ata_dev_init_params(
tf.nsect = sectors;
tf.device |= (heads - 1) & 0x0f; /* max head = num. of heads - 1
*/

- err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0);
+ err_mask = ata_exec_nodma(dev, &tf);

DPRINTK("EXIT, err_mask=%x\n", err_mask);
return err_mask;
Index: linux-2.6/include/linux/scatterlist.h
===================================================================
--- linux-2.6.orig/include/linux/scatterlist.h
+++ linux-2.6/include/linux/scatterlist.h
@@ -8,6 +8,8 @@
static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
unsigned int buflen)
{
+ WARN_ON(!buf); /* virt_to_page(NULL) crashes with sparsemem */
+
sg->page = virt_to_page(buf);
sg->offset = offset_in_page(buf);
sg->length = buflen;
_______________________________________________

2006-12-11 13:56:10

by Tejun Heo

[permalink] [raw]
Subject: Re: Oops in pata_pdc2027x

Benjamin Herrenschmidt wrote:
> Could that be related to the fix
>
> [PATCH] libata: fix oops with sparsemem
>
> That Arnd Bergmann just posted ? I'm copying it below in case you
> haven't seen it.

Seems to be the same problem to me. I'll reply in the original thread.

Thanks.

--
tejun