2017-03-08 14:22:24

by Kashyap Desai

[permalink] [raw]
Subject: out of range LBA using sg_raw

Hi -

Need help to understand if below is something we should consider to be
fixed in megaraid_sas driver or call as unreal exposure.

I have created slice VD of size 10GB (raid 1) using 2 drives. Each
Physical Drive size is 256GB.

Last LBA of the VD and actual Physical disk associated with that VD is
different. Actual Physical disk has larger range of LBA compare VD.

Below is readcap detail of VD0

# sg_readcap /dev/sdu
Read Capacity results:
Last logical block address=20971519 (0x13fffff), Number of
blocks=20971520
Logical block length=512 bytes
Hence:
Device size: 10737418240 bytes, 10240.0 MiB, 10.74 GB

Using below sg_raw command, we should see "LBA out of range" sense. In
CDB 0x28, pass LBA beyond last lba of VD 0x13fffff.

sg_raw -r 4k /dev/sdx 28 00 01 4f ff ff 00 00 08 00

It works if VD created behind MR controller does not support Fast Path
Write.
In case of Fast Path Write, driver convert LBA of VD to underlying
Physical disk and send IO direct to the physical disk. Since Physical disk
has enough LBA range to respond, it will not send "LBA out of range
sense".

Megaraid_Sas driver never validate range of LBA for VD as it assume to be
validated by upper layer in scsi stack. Other sg_tool method like sg_dd,
sg_write, dd etc has checks of LBA range and driver never receive out of
range LBA.

What is a suggestion ? Shall I add check in megaraid_sas driver or it is
not a valid scenario as "sg_raw" tool can send any type of command which
does not require multiple sanity in driver.

Thanks, Kashyap


2017-03-08 15:59:33

by Kashyap Desai

[permalink] [raw]
Subject: RE: out of range LBA using sg_raw

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, March 08, 2017 8:41 PM
> To: Kashyap Desai
> Cc: [email protected]; [email protected]
> Subject: Re: out of range LBA using sg_raw
>
> Hi Kashyap,
>
> for SG_IO passthrough requests we can't validate command validity for
> commands as the block layer treats them as opaque. The SCSI device
> implementation needs to handle incorrect parameter to be robust.
>
> For your fast path bypass the megaraid driver assumes part of the SCSI
device
> implementation, so it will have to check for validity.

Thanks Chris. It is understood to have sanity in driver, but how critical
such checks where SG_IO type interface send pass-through request. ?
Are you suggesting as good to have sanity or very important as there may
be a real-time exposure other than SG_IO interface ? I am confused over
must or good to have check.
Also one more fault I can generate using below sg_raw command -

"sg_raw -r 32k /dev/sdx 28 00 01 4f ff ff 00 00 08 00"

Provide more scsi data length compare to actual SG buffer. Do you suggest
such SG_IO interface vulnerability is good to be captured in driver.

I am just curious to know how badly we have to scrutinize each packet
before sending to Fast Path as we are in IO path and recommend only
important checks to be added.

Thanks, Kashyap

2017-03-08 16:01:53

by Christoph Hellwig

[permalink] [raw]
Subject: Re: out of range LBA using sg_raw

Hi Kashyap,

for SG_IO passthrough requests we can't validate command validity
for commands as the block layer treats them as opaque. The SCSI
device implementation needs to handle incorrect parameter to be
robust.

For your fast path bypass the megaraid driver assumes part of the
SCSI device implementation, so it will have to check for validity.

2017-03-08 16:08:08

by Bart Van Assche

[permalink] [raw]
Subject: Re: out of range LBA using sg_raw

On Wed, 2017-03-08 at 21:29 +0530, Kashyap Desai wrote:
> Also one more fault I can generate using below sg_raw command -
>
> "sg_raw -r 32k /dev/sdx 28 00 01 4f ff ff 00 00 08 00"
>
> Provide more scsi data length compare to actual SG buffer. Do you suggest
> such SG_IO interface vulnerability is good to be captured in driver.

That's not a vulnerability of the SG I/O interface. A SCSI device has to set
the residual count correctly if the SCSI data length does not match the size
of the data buffer.

Bart.

2017-03-08 16:15:56

by Kashyap Desai

[permalink] [raw]
Subject: RE: out of range LBA using sg_raw

> -----Original Message-----
> From: Bart Van Assche [mailto:[email protected]]
> Sent: Wednesday, March 08, 2017 9:35 PM
> To: [email protected]; [email protected]
> Cc: [email protected]; [email protected]
> Subject: Re: out of range LBA using sg_raw
>
> On Wed, 2017-03-08 at 21:29 +0530, Kashyap Desai wrote:
> > Also one more fault I can generate using below sg_raw command -
> >
> > "sg_raw -r 32k /dev/sdx 28 00 01 4f ff ff 00 00 08 00"
> >
> > Provide more scsi data length compare to actual SG buffer. Do you
> > suggest such SG_IO interface vulnerability is good to be captured in
driver.
>
> That's not a vulnerability of the SG I/O interface. A SCSI device has to
set the
> residual count correctly if the SCSI data length does not match the size
of the
> data buffer.

Thanks Bart. I will pass this information to Broadcom firmware dev. May
be a Tx/Rx (DMA) related code in MR (also for Fusion IT HBA) cannot
handle due to some sanity checks are not passed.

>
> Bart.

2017-03-08 16:29:28

by Kashyap Desai

[permalink] [raw]
Subject: RE: out of range LBA using sg_raw

> -----Original Message-----
> From: Christoph Hellwig [mailto:[email protected]]
> Sent: Wednesday, March 08, 2017 9:37 PM
> To: Kashyap Desai
> Cc: Christoph Hellwig; [email protected]; linux-
> [email protected]
> Subject: Re: out of range LBA using sg_raw
>
> On Wed, Mar 08, 2017 at 09:29:28PM +0530, Kashyap Desai wrote:
> > Thanks Chris. It is understood to have sanity in driver, but how
> > critical such checks where SG_IO type interface send pass-through
request.
> ?
> > Are you suggesting as good to have sanity or very important as there
> > may be a real-time exposure other than SG_IO interface ? I am confused
> > over must or good to have check.
> > Also one more fault I can generate using below sg_raw command -
>
> SCSI _devices_ need to sanity check any input and fail commands instead
of
> crashing or causing other problems. Normal SCSI HBA drivers don't need
to
> do that as they don't interpret CDBs. Megaraid (and a few other raid
drivers)
> are special in that they take on part of the device functionality and do
> interpret CDBs sometimes. In that case you'll need to do all that
sanity
> checking and generate proper errors.
>
> It would be nice to have come common helpers for this shared between
> everyone interpreting SCSI CBD (e.g. the SCSI target code, the NVMe SCSI
> emulation and the various RAID drivers).

Thanks Chris. I will continue on this and will come back with changes.
Let me check with Broadcom internally and figure out all possible
scenarios for megaraid_sas.

Thanks, Kashyap

2017-03-08 16:32:43

by Christoph Hellwig

[permalink] [raw]
Subject: Re: out of range LBA using sg_raw

On Wed, Mar 08, 2017 at 09:29:28PM +0530, Kashyap Desai wrote:
> Thanks Chris. It is understood to have sanity in driver, but how critical
> such checks where SG_IO type interface send pass-through request. ?
> Are you suggesting as good to have sanity or very important as there may
> be a real-time exposure other than SG_IO interface ? I am confused over
> must or good to have check.
> Also one more fault I can generate using below sg_raw command -

SCSI _devices_ need to sanity check any input and fail commands instead
of crashing or causing other problems. Normal SCSI HBA drivers don't
need to do that as they don't interpret CDBs. Megaraid (and a few other
raid drivers) are special in that they take on part of the device
functionality and do interpret CDBs sometimes. In that case you'll
need to do all that sanity checking and generate proper errors.

It would be nice to have come common helpers for this shared between
everyone interpreting SCSI CBD (e.g. the SCSI target code, the NVMe
SCSI emulation and the various RAID drivers).

2017-03-08 16:33:14

by Martin K. Petersen

[permalink] [raw]
Subject: Re: out of range LBA using sg_raw

>>>>> "Kashyap" == Kashyap Desai <[email protected]> writes:

Kashyap,

Kashyap> I am just curious to know how badly we have to scrutinize each
Kashyap> packet before sending to Fast Path as we are in IO path and
Kashyap> recommend only important checks to be added.

As Christoph pointed out, when the fast path is in use you assume the
role of the SCSI device. And therefore it is your responsibility to
ensure that the VD's capacity and other relevant constraints are being
honored. Just like the MR firmware and any attached disks would.

It is a feature that there is no sanity checking in the sg interface.
The intent is to be able to pass through commands directly to a device
and have the device act upon them. Including fail them if they don't
make any sense.

PS. I'm really no fan of the fast path. It's super messy to have the VD
layout handled in two different places.

--
Martin K. Petersen Oracle Linux Engineering

2017-03-08 16:50:04

by Kashyap Desai

[permalink] [raw]
Subject: RE: out of range LBA using sg_raw

> -----Original Message-----
> From: Martin K. Petersen [mailto:[email protected]]
> Sent: Wednesday, March 08, 2017 10:03 PM
> To: Kashyap Desai
> Cc: Christoph Hellwig; [email protected]; linux-
> [email protected]
> Subject: Re: out of range LBA using sg_raw
>
> >>>>> "Kashyap" == Kashyap Desai <[email protected]> writes:
>
> Kashyap,
>
> Kashyap> I am just curious to know how badly we have to scrutinize each
> Kashyap> packet before sending to Fast Path as we are in IO path and
> Kashyap> recommend only important checks to be added.
>
> As Christoph pointed out, when the fast path is in use you assume the
role of
> the SCSI device. And therefore it is your responsibility to ensure that
the VD's
> capacity and other relevant constraints are being honored. Just like the
MR
> firmware and any attached disks would.

Martin -

Agree on this point. I am planning to study all possible such sanity in
driver for VD and not trying to fix one specific scenario as described
here.
Do you think fix in this area is good for kernel-stable as well OR just
keep in linux-next as it is not so severe considering real time exposure ?
Trying to understand priority and severity of this issue.

>
> It is a feature that there is no sanity checking in the sg interface.
> The intent is to be able to pass through commands directly to a device
and
> have the device act upon them. Including fail them if they don't make
any
> sense.

Understood as sg_raw is not design to sanity check.

>
> PS. I'm really no fan of the fast path. It's super messy to have the VD
layout
> handled in two different places.
>
> --
> Martin K. Petersen Oracle Linux Engineering

2017-03-09 00:42:37

by Martin K. Petersen

[permalink] [raw]
Subject: Re: out of range LBA using sg_raw

>>>>> "Kashyap" == Kashyap Desai <[email protected]> writes:

Kashyap,

Kashyap> Agree on this point. I am planning to study all possible such
Kashyap> sanity in driver for VD and not trying to fix one specific
Kashyap> scenario as described here. Do you think fix in this area is
Kashyap> good for kernel-stable as well OR just keep in linux-next as it
Kashyap> is not so severe considering real time exposure ? Trying to
Kashyap> understand priority and severity of this issue.

Since you're presumably only handling reads and writes in the fast path,
I assume the sanity checking will be fairly simple and thus a candidate
for stable.

--
Martin K. Petersen Oracle Linux Engineering