2005-09-12 23:10:53

by Andrew Morton

Subject: Re: ibmvscsi badness (Re: 2.6.13-mm3)

[email protected] wrote:
>
> Trying to get 2.6.13-mm running on a power5 lpar, I'm
> having scsi problems.

You should have cc'ed the scsi mailing list, no?

> With -mm2, I only get things like:
> sd 0:0:0:0: SCSI error: return code = 0x8000002
> sda: Current: sense key: Aborted Command
> Additional sense: No additional sense information
> Info fld=0x0
> end_request: I/O error, dev sda, sector 10468770
> sd 0:0:0:0: SCSI error: return code = 0x8000002
> sda: Current: sense key: Aborted Command
> Additional sense: No additional sense information
> Info fld=0x0
> end_request: I/O error, dev sda, sector 10468778
> sd 0:0:0:0: SCSI error: return code = 0x8000002
> sda: Current: sense key: Aborted Command
> Additional sense: No additional sense information
> Info fld=0x0
> end_request: I/O error, dev sda, sector 10468786
> sd 0:0:0:0: SCSI error: return code = 0x8000002
> sda: Current: sense key: Aborted Command
> Additional sense: No additional sense information
> Info fld=0x0
> end_request: I/O error, dev sda, sector 10468794
>
> When I copy the 2.6.13-rc6-mm2's drivers/scsi/ibmvscsi/ibmvscsi.{c,h}
> back (just changing the static vio_device_id initializer as per
>
> @@ -1442,7 +1531,7 @@ static int ibmvscsi_remove(struct vio_de
> */
> static struct vio_device_id ibmvscsi_device_table[] __devinitdata = {
> {"vscsi", "IBM,v-scsi"},
> - {0,}
> + { "", "" }
> };
>
> then this kernel boots fine.

There have been quite a lot of ibmvscsi changes since 2.6.13-rc6-mm2.

> The same thing does not work for
> 2.6.13-mm3. The console output of an attempted boot follows.
> (Seems the same with either version of ibmvscsi.{c,h}, so the
> problem appears to be elsewhere)
>
> ...
> Remounting root filesystem in read-write mode: [ OK ]
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=128 NUMA PSERIES LPAR
> Modules linked in:
> NIP: C000000000087C1C XER: 20000010 LR: C000000000087D70 CTR: C0000000000830A4
> REGS: c0000000021dec00 TRAP: 0300 Not tainted (2.6.13-mm3)
> MSR: 8000000000009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 CR: 24044042
> DAR: c000000103c23c58 DSISR: 0000000040010000
> TASK: c00000000700d050[2236] 'hotplug' THREAD: c0000000021dc000 CPU: 0
> GPR00: 0000000000000001 C0000000021DEE80 C00000000056B0E8 0000000000000000
> GPR04: 0000000000000000 616C746976656300 0000000020713DFA 0000000000000000
> GPR08: 0000000000000000 C000000103C23C38 3C4E554C4C3E0000 C000000000384C68
> GPR12: C000000000384C68 C00000000043C800 C0000000021BB600 0000000000000006
> GPR16: 00000000F800F958 00000000F800FA88 0000000010000000 0000000000000010
> GPR20: 0000000002000000 0000000000000000 0000000000000000 C00000000700D050
> GPR24: 0000000020713DFA C000000087FF7D88 00000000800400D2 0000000000000001
> GPR28: 0000000000000000 C000000000384C68 C0000000004ABC38 0000000000000000
> NIP [c000000000087c1c] .zone_watermark_ok+0x50/0xac
> LR [c000000000087d70] .__alloc_pages+0xf8/0x5fc
> Call Trace:
> [c0000000021dee80] [c0000000021def20] 0xc0000000021def20 (unreliable)
> [c0000000021def70] [c0000000000aa1b4] .alloc_page_interleave+0x3c/0xb8
> [c0000000021deff0] [c00000000009af70] .do_no_page+0x5f8/0x710
> [c0000000021df0e0] [c00000000009b2b8] .__handle_mm_fault+0x230/0x694
> [c0000000021df1c0] [c00000000035be38] .do_page_fault+0x4e0/0x7e8
> [c0000000021df340] [c000000000004760] .handle_page_fault+0x20/0x54
> --- Exception: 301 at .__clear_user+0x14/0x7c
> LR = .padzero+0x34/0x5c
> [c0000000021df630] [0000000000000000] .__start+0x4000000000000000/0x8 (unreliable)
> [c0000000021df6a0] [c00000000001441c] .load_elf_binary+0x171c/0x1abc
> [c0000000021df830] [c0000000000c3c80] .search_binary_handler+0x184/0x4bc
> [c0000000021df8e0] [c0000000000f20c4] .load_script+0x2d0/0x314
> [c0000000021dfa10] [c0000000000c3c80] .search_binary_handler+0x184/0x4bc
> [c0000000021dfac0] [c0000000000c41cc] .do_execve+0x214/0x394
> [c0000000021dfb70] [c00000000000de64] .sys_execve+0x74/0xf8
> [c0000000021dfc10] [c000000000009c00] syscall_exit+0x0/0x18
> --- Exception: c01 at .____call_usermodehelper+0xcc/0xf8
> LR = .____call_usermodehelper+0x9c/0xf8
> [c0000000021dff90] [c000000000010060] .kernel_thread+0x4c/0x68
> Instruction dump:
> 419e0010 7ca00e74 7c000194 7ca02850 2fa70000 419e0010 7ca01674 7c000194
> 7ca02850 78c91f24 38600000 7d296214 <e8090020> 7c002a14 7faa0040 4c9d0020
> Oops: Kernel access of bad area, sig: 11 [#2]
> SMP NR_CPUS=128 NUMA PSERIES LPAR
> Modules linked in: dm_mod

Interesting. It could be Andi's recent mempolicy.c changes
(convert-mempolicies-to-nodemask_t.patch) or it could be some recent ppc64
change or it could be something else ;)

Could the ppc64 guys please take a look? In particular, it would be good
to know if convert-mempolicies-to-nodemask_t.patch is innocent - I was
planning on merging that upstream today.


2005-09-13 01:40:56

by Anton Blanchard

Subject: Re: ibmvscsi badness (Re: 2.6.13-mm3)


Hi,

> > With -mm2, I only get things like:
> > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > sda: Current: sense key: Aborted Command
> > Additional sense: No additional sense information
> > Info fld=0x0
> > end_request: I/O error, dev sda, sector 10468770
> > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > sda: Current: sense key: Aborted Command
> > Additional sense: No additional sense information
> > Info fld=0x0
> > end_request: I/O error, dev sda, sector 10468778
> > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > sda: Current: sense key: Aborted Command
> > Additional sense: No additional sense information
> > Info fld=0x0
> > end_request: I/O error, dev sda, sector 10468786
> > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > sda: Current: sense key: Aborted Command
> > Additional sense: No additional sense information
> > Info fld=0x0
> > end_request: I/O error, dev sda, sector 10468794

What virtual scsi server are you using?

> Interesting. It could be Andi's recent mempolicy.c changes
> (convert-mempolicies-to-nodemask_t.patch) or it could be some recent ppc64
> change or it could be something else ;)
>
> Could the ppc64 guys please take a look? In particular, it would be good
> to know if convert-mempolicies-to-nodemask_t.patch is innocent - I was
> planning on merging that upstream today.

Looking into it now.

Anton

2005-09-13 04:07:52

by Anton Blanchard

Subject: Re: ibmvscsi badness (Re: 2.6.13-mm3)


> Interesting. It could be Andi's recent mempolicy.c changes
> (convert-mempolicies-to-nodemask_t.patch) or it could be some recent ppc64
> change or it could be something else ;)
>
> Could the ppc64 guys please take a look? In particular, it would be good
> to know if convert-mempolicies-to-nodemask_t.patch is innocent - I was
> planning on merging that upstream today.

Yes it looks like convert-mempolicies-to-nodemask_t.patch is the
culprit. A build of -git11 with it causes:

VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 340k freed
cpu 0x4: Vector: 700 (Program Check) at [c0000002fe392cf0]
pc: c0000000000ac954: .alloc_page_vma+0x160/0x1a8
lr: c0000000000ac920: .alloc_page_vma+0x12c/0x1a8
sp: c0000002fe392f70
msr: 9000000000029032
current = 0xc000000203a70040
paca = 0xc0000000005aBc00
pid = 938, comm = hotplug
kernel BUG in offset_il_node at mm/mempolicy.c:728!
in offset_il_node at mm/memcpu 0x3: Vector: 700 (Program Check)
at [c0000002fe3b2cf0]
pc: c0000000000ac954: .alloc_page_vma+0x160/0x1a8
lr: c0000000000ac920: .alloc_page_vma+0x12c/0x1a8
sp: c0000002fe3b2f70
msr: 9000000000029032
current = 0xc000000203b417e0
paca = 0xc0000000005a.400
pid = 939, comm = hotplug
kernel BUG in offset_il_node at mm/mempolicy.c:728!
728!
enter ? for help

Anton

2005-09-13 05:10:10

by Dave C Boutcher

Subject: Re: ibmvscsi badness (Re: 2.6.13-mm3)

On Mon, Sep 12, 2005 at 04:10:13PM -0700, Andrew Morton wrote:
> [email protected] wrote:
> >
> > Trying to get 2.6.13-mm running on a power5 lpar, I'm
> > having scsi problems.
>
> You should have cc'ed the scsi mailing list, no?
>
> > With -mm2, I only get things like:
> > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > sda: Current: sense key: Aborted Command
> > Additional sense: No additional sense information
> > Info fld=0x0
> > end_request: I/O error, dev sda, sector 10468770
> > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > sda: Current: sense key: Aborted Command
> > Additional sense: No additional sense information
> > Info fld=0x0
> > end_request: I/O error, dev sda, sector 10468778
> > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > sda: Current: sense key: Aborted Command
> > Additional sense: No additional sense information
> > Info fld=0x0
> > end_request: I/O error, dev sda, sector 10468786

Well, I know why you get these errors with mm2. The mm2 ibmvscsi
allows transferring longer scatterlists between the initiator
and the target than before. However that breaks compatibility
with the older target that was shipped with SLES 9. Longer
scatterlists ARE supported with the target I recently posted
as an RFC.

I'm thinking we'll have to add a module parameter to limit
scatterlist sizes that defaults to the old behaviour. Let
me sleep on that and kick it around with Linda tomorrow and
we'll figure out some kind of solution.

As Anton already reported, mm3 has an additional set of
breakages...

--
Dave Boutcher

2005-09-13 12:15:51

by Serge E. Hallyn

Subject: Re: ibmvscsi badness (Re: 2.6.13-mm3)

Quoting Anton Blanchard ([email protected]):
>
> Hi,
>
> > > With -mm2, I only get things like:
> > > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > > sda: Current: sense key: Aborted Command
> > > Additional sense: No additional sense information
> > > Info fld=0x0
> > > end_request: I/O error, dev sda, sector 10468770
> > > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > > sda: Current: sense key: Aborted Command
> > > Additional sense: No additional sense information
> > > Info fld=0x0
> > > end_request: I/O error, dev sda, sector 10468778
> > > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > > sda: Current: sense key: Aborted Command
> > > Additional sense: No additional sense information
> > > Info fld=0x0
> > > end_request: I/O error, dev sda, sector 10468786
> > > sd 0:0:0:0: SCSI error: return code = 0x8000002
> > > sda: Current: sense key: Aborted Command
> > > Additional sense: No additional sense information
> > > Info fld=0x0
> > > end_request: I/O error, dev sda, sector 10468794
>
> What virtual scsi server are you using?

The vioserver is running SLES9 (2.6.5-7.97-pseries64 kernel).

Lots of other people have partitions on this thing, so
upgrading the vioserver is a delicate operation. A module
parameter would definitely be preferable, at least short
term.

thanks,
-serge

2005-09-13 15:09:07

by Dave C Boutcher

Subject: [Patch] ibmvscsi compatibility fix

Linda Xie ever so gently pointed out that she had a patch
to preserve compatibility with older SLES targets, and I told
her we didn't need to push it to mainline.

This patch explicitly checks the version of the IBMVSCSI target
and ensures that large scatterlists are not sent to older
targets.

Andrew, while this stuff usually goes through James, it would
probably make Serge happier if you could pick it up for the next
mm.

Signed-off-by: Linda Xie <[email protected]>
Signed-off-by: Dave Boutcher <[email protected]>

--- linux-2.6.13-mm3-orig/drivers/scsi/ibmvscsi/ibmvscsi.c 2005-09-13 09:50:31.000000000 -0500
+++ linux-2.6.13.1/drivers/scsi/ibmvscsi/ibmvscsi.c 2005-09-13 09:09:41.000000000 -0500
@@ -727,6 +727,16 @@
if (hostdata->madapter_info.port_max_txu[0])
hostdata->host->max_sectors =
hostdata->madapter_info.port_max_txu[0] >> 9;
+
+ if (hostdata->madapter_info.os_type == 3 &&
+ strcmp(hostdata->madapter_info.srp_version, "1.6a") <= 0) {
+ printk("ibmvscsi: host (Ver. %s) doesn't support large "
+ "transfers\n",
+ hostdata->madapter_info.srp_version);
+ printk("ibmvscsi: limiting scatterlists to %d\n",
+ MAX_INDIRECT_BUFS);
+ hostdata->host->sg_tablesize = MAX_INDIRECT_BUFS;
+ }
}
}

2005-09-13 15:19:30

by James Bottomley

Subject: Re: [Patch] ibmvscsi compatibility fix

On Tue, 2005-09-13 at 10:09 -0500, Dave C Boutcher wrote:
> Andrew, while this stuff usually goes through James, it would
> probably make Serge happier if you could pick it up for the next
> mm.

I'll put it in scsi-rc-fixes-2.6 ... that should be fast track for
inclusion prior to 2.6.14 and it should also feed into the next -mm

James


2005-09-13 18:11:12

by Serge E. Hallyn

Subject: Re: [Patch] ibmvscsi compatibility fix

Quoting James Bottomley ([email protected]):
> On Tue, 2005-09-13 at 10:09 -0500, Dave C Boutcher wrote:
> > Andrew, while this stuff usually goes through James, it would
> > probably make Serge happier if you could pick it up for the next
> > mm.
>
> I'll put it in scsi-rc-fixes-2.6 ... that should be fast track for
> inclusion prior to 2.6.14 and it should also feed into the next -mm

Thanks. As soon as another test is through I will test this patch
on my own machine with -mm2.

thanks,
-serge

2005-09-13 19:17:39

by Serge E. Hallyn

Subject: Re: [Patch] ibmvscsi compatibility fix

Works like a charm on my machine with 2.6.13-mm2.

thanks,
-serge

Quoting Dave C Boutcher ([email protected]):
> Linda Xie ever so gently pointed out that she had a patch
> to preserve compatibility with older SLES targets, and I told
> her we didn't need to push it to mainline.
>
> This patch explicitly checks the version of the IBMVSCSI target
> and ensures that large scatterlists are not sent to older
> targets.
>
> Andrew, while this stuff usually goes through James, it would
> probably make Serge happier if you could pick it up for the next
> mm.
>
> Signed-off-by: Linda Xie <[email protected]>
> Signed-off-by: Dave Boutcher <[email protected]>
>
> --- linux-2.6.13-mm3-orig/drivers/scsi/ibmvscsi/ibmvscsi.c 2005-09-13 09:50:31.000000000 -0500
> +++ linux-2.6.13.1/drivers/scsi/ibmvscsi/ibmvscsi.c 2005-09-13 09:09:41.000000000 -0500
> @@ -727,6 +727,16 @@
> if (hostdata->madapter_info.port_max_txu[0])
> hostdata->host->max_sectors =
> hostdata->madapter_info.port_max_txu[0] >> 9;
> +
> + if (hostdata->madapter_info.os_type == 3 &&
> + strcmp(hostdata->madapter_info.srp_version, "1.6a") <= 0) {
> + printk("ibmvscsi: host (Ver. %s) doesn't support large "
> + "transfers\n",
> + hostdata->madapter_info.srp_version);
> + printk("ibmvscsi: limiting scatterlists to %d\n",
> + MAX_INDIRECT_BUFS);
> + hostdata->host->sg_tablesize = MAX_INDIRECT_BUFS;
> + }
> }
> }
>
>

2005-09-18 00:40:35

by Benjamin Herrenschmidt

Subject: Re: ibmvscsi badness (Re: 2.6.13-mm3)


> I'm thinking we'll have to add a module parameter to limit
> scatterlist sizes that defaults to the old behaviour. Let
> me sleep on that and kick it around with Linda tomorrow and
> we'll figure out some kind of solution.

Isn't it possible to identify the type/version of the target?

> As Anton already reported, mm3 has an additional set of
> breakages...