2007-05-03 20:03:29

by James Smart

Subject: Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)

I doubt it's in the fc transport - it's doing what it always did, which has
nothing to do with coherency of the sdev's.

We're seeing like problems, and it looks like it's related to the scan_mutex
being held when some of the entry points are being called via the recent
async scan code (which also still has a bunch of issues around rmmod).
We should be sending some patches shortly.

-- james s

James Bottomley wrote:
> On Mon, 2007-04-23 at 14:13 -0400, Josef Bacik wrote:
>> Ok I have a new patch that I've built and tested on both my UP and SMP machines
>> and it appears to work fine. I took the async check out of scsi_add_lun, I
>> don't really see the point in waiting to do the sysfs registration stuff (if
>> there's a reason I haven't been able to find it in the original submission of
>> this functionality). Please let me know if this is incorrect. Thank you,
>
> Yes, it's incorrect ... if you do this, the devices will come up in a
> random order for multiple SCSI cards. One of the original design goals
> was not to require udev, so the final ordering should be the same as for
> the sync case.
>
> I think the root cause of the problem is somewhere in the fc transport
> rport addition code.
>
> James


2007-08-11 15:08:37

by Jurij Smakov

Subject: Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)

[Please keep me on CC, as I'm not on LKML.]

On Thu, May 03, 2007 at 04:00:57PM -0400, James Smart wrote:
> I doubt it's in the fc transport - it's doing what it always did, which has
> nothing to do with coherency of the sdev's.
>
> We're seeing like problems, and it looks like it's related to the scan_mutex
> being held when some of the entry points are being called via the recent
> async scan code (which also still has a bunch of issues around rmmod).
> We should be sending some patches shortly.

Hi James,

I've recently got a Sun Blade 1000 box with a QLA2200 controller, and
I'm bumping into the exact same problem with 2.6.22:

scsi 0:0:0:0: Attached scsi generic sg1 type -1
scsi 0:0:0:0: Direct-Access HITACHI DKR1C-J072FC D7V5 PQ: 0 ANSI: 3
kobject_add failed for 0:0:0:0 with -EEXIST, don't try to register things with the same name in the same directory.
Call Trace:
[000000001000ac78] scsi_sysfs_add_sdev+0x2c/0x228 [scsi_mod]
[0000000010008a68] scsi_probe_and_add_lun+0x97c/0xab8 [scsi_mod]
[00000000100090d8] __scsi_scan_target+0x90/0x660 [scsi_mod]
[0000000010009ce8] scsi_scan_target+0x94/0xa4 [scsi_mod]
[00000000100668bc] fc_scsi_scan_rport+0x68/0x8c [scsi_transport_fc]
[000000000046de88] run_workqueue+0xac/0x138
[000000000046e414] worker_thread+0xc4/0xd4
[0000000000471f24] kthread+0x4c/0x78
[00000000004277f8] kernel_thread+0x38/0x48
[0000000000471d84] kthreadd+0xbc/0x160
error 1

After that the device fails to initialize. On rare occasions the
error does not trigger, and then the machine boots fine. The complete boot
log can be found at http://www.wooyd.org/misc/dmesg-blade1000-2.6.22.log
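
To spell out what the kobject_add message above is complaining about, here
is a toy userspace model of the failure mode. It is only a sketch, not kernel
code: ToyDirectory and scan_target are hypothetical names made up for the
illustration, and it assumes nothing more than that the same device name
reaches registration twice. It does not say which two paths collide on
this box.

```python
import errno


class ToyDirectory:
    """Hypothetical stand-in for a sysfs directory; not a kernel API."""

    def __init__(self):
        self._names = set()

    def add(self, name):
        # Follow the kernel convention of returning a negative errno on failure.
        if name in self._names:
            return -errno.EEXIST
        self._names.add(name)
        return 0


def scan_target(directory, name):
    """Hypothetical scan path: tries to register the device name once per call."""
    return directory.add(name)


devices = ToyDirectory()
first = scan_target(devices, "0:0:0:0")   # first registration succeeds
second = scan_target(devices, "0:0:0:0")  # same name again is rejected

print("first:", first)
print("second is -EEXIST:", second == -errno.EEXIST)
```

The second call fails because the name is already present, which is the
condition the message describes. Whatever fix is chosen has to make sure
only one path registers a given sdev.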

I'm willing to test any patches you might have, as well as provide any
additional debugging information.

Best regards,
--
Jurij Smakov [email protected]
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC

2007-08-13 00:26:53

by Matthew Wilcox

Subject: Re: [RFC][PATCH] fix for async scsi scan sysfs problem (resend)

On Sat, Aug 11, 2007 at 04:04:54PM +0100, Jurij Smakov wrote:
> [Please keep me on CC, as I'm not on LKML.]
> I've recently got a Sun Blade 1000 box with a QLA2200 controller, and
> I'm bumping into the exact same problem with 2.6.22:

Please try
http://marc.info/?l=linux-scsi&m=118289275414202
which fixes a number of problems with the async scanning code.

--
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."