2005-01-07 23:43:07

by Joe Krahn

Subject: Bogus REPORT_LUNS responses breaks SCSI LUN detection

There are apparently several devices that return bad data
for the REPORT_LUNS query, but do not return an error.
The newer kernels only do sequential LUN scans if REPORT_LUNS
fails. There may need to be a kernel option to force sequential
scans.

It might be useful to always do sequential scans, and
rely on REPORT_LUNS only to correctly set up non-sequential LUNs,
where it should be working correctly. Or, at least, try sequential
scans if the REPORT_LUNS reply looks 'suspicious'.

Here are some related reports of problems. All of these are RAID
systems, so it may be a specific embedded controller at fault,
but you can't tell this by looking at the Vendor/Model fields.

SuSE 9.1
Vendor: easyRAID Model: X16 Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
scsi: host 0 channel 0 id 5 lun 0x6500737952414944 has a LUN larger than
currently supported.

SuSE 9.1
Vendor: FX-1600U Model: 3-R Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
scsi: host 3 channel 0 id 0 lun 0x00000200080c0400 has a LUN larger than
currently supported.

Kernel 2.6, unknown distro
Vendor: transtec Model: Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
On host 1 channel 0 id 1 only 128 (max_scsi_report_luns) of 536870896
luns reported, try increasing max_scsi_report_luns.
scsi: host 1 channel 0 id 1 lun 0x7400616e73746563 has a LUN larger than
currently supported.

Fedora Core 2 and 3
Vendor: Tornado- Model: F4 Rev: 0001
Type: Direct-Access ANSI SCSI revision: 03
scsi: host 1 channel 0 id 8 lun 0x00000200080c0400 has a LUN larger than
currently supported.


I noticed that these LUN hex values decode to text fragments:
Easy RAID decodes to: 'e.syRAID'
Vendor=Transtec, lun decodes to 't.anstec'.
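
The decoding above can be reproduced with a few lines of Python. This is only a sketch: the helper name lun_to_text is made up for illustration, and non-printable bytes are shown as '.' the way a hex dump would show them.

```python
def lun_to_text(lun_hex):
    """Render an 8-byte LUN value (as printed by the kernel) as ASCII text."""
    if lun_hex.startswith("0x"):
        lun_hex = lun_hex[2:]
    raw = bytes.fromhex(lun_hex)
    # Printable ASCII is kept; anything else becomes '.', as in a hex dump.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


for lun in ("0x6500737952414944", "0x7400616e73746563", "0x00000200080c0400"):
    print(lun, "->", lun_to_text(lun))
```

The first two values contain the vendor strings with the second byte zeroed; the third contains no printable characters at all.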

And here is a raw dump of the REPORT_LUNS response from the Tornado F4:
0000000: 00 00 00 80 8b 00 01 32 .......2
0000008: 54 00 72 6e 61 64 6f 2d T.rnado-
0000010: 46 01 20 20 20 20 20 20 F.
0000018: 20 02 20 20 20 20 20 20 .
0000020: 30 03 30 31 00 00 00 00 0.01....
...
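
For reference, a REPORT_LUNS response starts with a 4-byte big-endian LUN list length (in bytes), followed by 4 reserved bytes, and then the 8-byte LUN entries. A rough Python sketch of splitting the dump above along those lines follows. The function names and the 'suspicious' heuristic (non-zero reserved bytes, or a list length that is not a multiple of 8) are assumptions for illustration, not something the kernel does.

```python
import struct

# First 40 bytes of the Tornado F4 response, copied from the dump above.
DUMP = bytes.fromhex(
    "00 00 00 80 8b 00 01 32 "
    "54 00 72 6e 61 64 6f 2d "
    "46 01 20 20 20 20 20 20 "
    "20 02 20 20 20 20 20 20 "
    "30 03 30 31 00 00 00 00"
)


def parse_report_luns(data):
    """Split a REPORT_LUNS response into (list_length, reserved, lun_entries)."""
    (list_len,) = struct.unpack(">I", data[:4])
    reserved = data[4:8]
    # Only take complete 8-byte entries that are actually present in the buffer.
    end = min(len(data), 8 + list_len)
    luns = [data[i:i + 8] for i in range(8, end - 7, 8)]
    return list_len, reserved, luns


def looks_suspicious(list_len, reserved):
    """Hypothetical sanity check: bad length alignment or non-zero reserved bytes."""
    return list_len % 8 != 0 or reserved != b"\x00\x00\x00\x00"


list_len, reserved, luns = parse_report_luns(DUMP)
print("list length:", list_len, "reserved:", reserved.hex())
print("first entry:", luns[0].hex())
print("suspicious:", looks_suspicious(list_len, reserved))
```

On this dump the reserved field is non-zero and the first "LUN" is the vendor string, so even a simple check like this would flag the reply.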


2005-02-14 04:51:31

by Kurt Garloff

Subject: Re: Bogus REPORT_LUNS responses breaks SCSI LUN detection

On Fri, Jan 07, 2005 at 06:39:02PM -0500, Joe Krahn wrote:
> There are apparently several devices that return bad data
> for the REPORT_LUNS query, but do not return an error.
> The newer kernels only do sequential LUN scans if REPORT_LUNS
> fails. There may need to be a kernel option to force sequential
> scans.

There is.
Try passing scsi_mod.default_dev_flags=0x40000
The SUSE initrd also understands the easier-to-remember version
scsi_noreportlun=1.

Devices known to be broken should be added to the blacklist with
BLIST_NOREPORTLUN.

> Here are some related reports of problems. All of these are RAID
> systems, so it may be a specific embedded controller at fault,
> but you can't tell this by looking at the Vendor/Model fields.
>
> SuSE 9.1
> Vendor: easyRAID Model: X16 Rev: 0001
> Type: Direct-Access ANSI SCSI revision: 03
> scsi: host 0 channel 0 id 5 lun 0x6500737952414944 has a LUN larger than
> currently supported.

Looks like random garbage.

> SuSE 9.1
> Vendor: FX-1600U Model: 3-R Rev: 0001
> Type: Direct-Access ANSI SCSI revision: 03
> scsi: host 3 channel 0 id 0 lun 0x00000200080c0400 has a LUN larger than
> currently supported.

This probably uses one of the less common LUN numbering schemes?
REPORT_LUNS reports 8-byte LUN numbers, which Linux flattens to a
32-bit unsigned int according to the most commonly used scheme.
We might change the LUNs to be opaque, or detect the LUN encoding
before flattening.
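
As a rough illustration of the flattening described above, here is a Python sketch. It is modeled on a reading of how 2.6-era kernels fold the first two 2-byte LUN levels into a 32-bit value and complain when the remaining four bytes are non-zero; treat the details as an assumption rather than a copy of the kernel code. The function names are made up.

```python
def flatten_lun(lun_bytes):
    """Fold the first four bytes of an 8-byte LUN into a 32-bit integer."""
    lun = 0
    for i in range(0, 4, 2):
        level = (lun_bytes[i] << 8) | lun_bytes[i + 1]
        lun |= level << (i * 8)
    return lun


def fits_in_32_bits(lun_bytes):
    """True if the last four bytes, which the flattening drops, are all zero."""
    return lun_bytes[4:8] == b"\x00\x00\x00\x00"


simple = bytes.fromhex("0005000000000000")    # single-level LUN 5
reported = bytes.fromhex("00000200080c0400")  # value from the FX-1600U log

print(flatten_lun(simple), fits_in_32_bits(simple))
print(hex(flatten_lun(reported)), fits_in_32_bits(reported))
```

Under these assumptions the reported value has non-zero bytes in its lower half, which would explain the "LUN larger than currently supported" message.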

> Kernel 2.6, unknown distro
> Vendor: transtec Model: Rev: 0001
> Type: Direct-Access ANSI SCSI revision: 03
> On host 1 channel 0 id 1 only 128 (max_scsi_report_luns) of 536870896
> luns reported, try increasing max_scsi_report_luns.
> scsi: host 1 channel 0 id 1 lun 0x7400616e73746563 has a LUN larger than
> currently supported.

Garbage.

> Fedora Core 2 and 3
> Vendor: Tornado- Model: F4 Rev: 0001
> Type: Direct-Access ANSI SCSI revision: 03
> scsi: host 1 channel 0 id 8 lun 0x00000200080c0400 has a LUN larger than
> currently supported.

LUN flattening issue?

> I noticed that these LUN hex values decode to text fragments:
> Easy RAID decodes to: 'e.syRAID'
> Vendor=Transtec, lun decodes to 't.anstec'.

Ask them to fix it.

Regards,
--
Kurt Garloff, Director SUSE Labs, Novell Inc.



2005-02-15 20:47:54

by Joe Krahn

Subject: Re: Bogus REPORT_LUNS responses breaks SCSI LUN detection

Kurt Garloff wrote:
> On Fri, Jan 07, 2005 at 06:39:02PM -0500, Joe Krahn wrote:
>
>>There are apparently several devices that return bad data
>>for the REPORT_LUNS query, but do not return an error.
>>The newer kernels only do sequential LUN scans if REPORT_LUNS
>>fails. There may need to be a kernel option to force sequential
>>scans.
>
>
> There is.
> Try passing scsi_mod.default_dev_flags=0x40000
> The SUSE initrd will also understand the better memorizable version
> scsi_noreportlun=1.
>
> Devices known to be broken should be added to the blacklist with
> BLIST_NOREPORTLUN.
>
>

Oops; I didn't see that flag. It seems it was added at the same time LUN
scanning became the default. It would be good to document the
availability of default_dev_flags in Documentation/scsi.

It appears that the broken RAID systems are based on MaxTronic arrays,
such as the Arena Premium 8600. They just released a fixed firmware, so
the source of the problem should be fixed. (It was also broken for Mac OS X.)

Thanks,
Joe Krahn

2005-02-18 17:26:49

by Andries Brouwer

Subject: Re: Bogus REPORT_LUNS responses breaks SCSI LUN detection

On Sun, Feb 13, 2005 at 11:51:00PM -0500, Kurt Garloff wrote:

> > SuSE 9.1
> > Vendor: easyRAID Model: X16 Rev: 0001
> > Type: Direct-Access ANSI SCSI revision: 03
> > scsi: host 0 channel 0 id 5 lun 0x6500737952414944 has a LUN larger than
> > currently supported.
>
> Looks like random garbage.

I read "e syRAID"

> > Kernel 2.6, unknown distro
> > Vendor: transtec Model: Rev: 0001
> > Type: Direct-Access ANSI SCSI revision: 03
> > On host 1 channel 0 id 1 only 128 (max_scsi_report_luns) of 536870896
> > luns reported, try increasing max_scsi_report_luns.
> > scsi: host 1 channel 0 id 1 lun 0x7400616e73746563 has a LUN larger than
> > currently supported.

I read "t anstec"

So - you might wish to investigate why the 2nd byte of "easyRAID" and
of "transtec" was zeroed, and whether contents like these were to be
expected (maybe the previous command was IDENTIFY?).

Andries

2005-02-18 18:17:07

by Joe Krahn

Subject: Re: Bogus REPORT_LUNS responses breaks SCSI LUN detection

Andries Brouwer wrote:
> On Sun, Feb 13, 2005 at 11:51:00PM -0500, Kurt Garloff wrote:
>
>
>>>SuSE 9.1
>>>Vendor: easyRAID Model: X16 Rev: 0001
>>>Type: Direct-Access ANSI SCSI revision: 03
>>>scsi: host 0 channel 0 id 5 lun 0x6500737952414944 has a LUN larger than
>>>currently supported.
>>
>>Looks like random garbage.
>
>
> I read "e syRAID"
>
>
>>>Kernel 2.6, unknown distro
>>>Vendor: transtec Model: Rev: 0001
>>>Type: Direct-Access ANSI SCSI revision: 03
>>>On host 1 channel 0 id 1 only 128 (max_scsi_report_luns) of 536870896
>>>luns reported, try increasing max_scsi_report_luns.
>>>scsi: host 1 channel 0 id 1 lun 0x7400616e73746563 has a LUN larger than
>>>currently supported.
>
>
> I read "t anstec"
>
> So - you might wish to investigate why the 2nd byte of "easyRAID" and
> of "transtec" was zeroed, and whether contents like this was to be
> expected (maybe the previous command was IDENTIFY?).
>
> Andries

The problem arises from a bug in the underlying controller made by
MaxTronic. The good news is that they recently released upgraded
firmware to fix it. More importantly, it is possible to work around
the problem by setting scsi_mod.default_dev_flags=0x40000
(==BLIST_NOREPORTLUN).

I suspect that your guess of the previous command being IDENTIFY is correct.

Thanks, Joe Krahn