2017-07-18 01:08:46

by Jason L Tibbitts III

[permalink] [raw]
Subject: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

After updating my tape backup server to 4.12 I found that mtx had issues
controlling the tape library. Good behavior:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 4...done
Loading media from Storage Element 5 into drive 0...done

Bad behavior:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 46...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1046 Failed

This was seen on a machine running Fedora 25 as well as an Ubuntu
machine. Relevant tickets:
https://bugzilla.redhat.com/show_bug.cgi?id=1471302
http://bugzilla.kernel.org/show_bug.cgi?id=196375
https://bugs.launchpad.net/bugs/1704512

mtx in all cases is 1.3.12; in the Fedora case that's
mtx-1.3.12-14.fc24.x86_64. I see this with an Overland Neo T48s library
but the Ubuntu user had a Dell ML6000 and we both have completely
different HBAs and cabling (LSI3008 SAS and qla2462 FC).

I bisected this down to:

commit 28676d869bbb5257b5f14c0c95ad3af3a7019dd5
Author: Johannes Thumshirn <[email protected]>
Date: Fri Apr 7 09:34:15 2017 +0200

scsi: sg: check for valid direction before starting the request

Check for a valid direction before starting the request, otherwise we
risk running into an assertion in the scsi midlayer checking for valid
requests.

[mkp: fixed typo]

Signed-off-by: Johannes Thumshirn <[email protected]>
Link: http://www.spinics.net/lists/linux-scsi/msg104400.html
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Hannes Reinecke <[email protected]>
Tested-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>

and confirmed that clean unpatched 4.12 shows the problem, while
reverting just that patch fixes the issue. Unfortunately I don't know
enough to actually fix this, but I can easily test patches.

- J<


2017-07-18 07:37:51

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

On Mon, Jul 17, 2017 at 07:44:50PM -0500, Jason L Tibbitts III wrote:
> and confirmed that clean unpatched 4.12 shows the problem, while
> reverting just that patch fixes the issue. Unfortunately I don't know
> enough to actually fix this, but I can easily test patches.

This is fixed with:
commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47
Author: Johannes Thumshirn <[email protected]>
Date: Fri Jul 7 10:56:38 2017 +0200

scsi: sg: fix SG_DXFER_FROM_DEV transfers

SG_DXFER_FROM_DEV transfers do not necessarily have a dxferp as we set
it to NULL for the old sg_io read/write interface, but must have a
length bigger than 0. This fixes a regression introduced by commit
28676d869bbb ("scsi: sg: check for valid direction before starting the
request")

Signed-off-by: Johannes Thumshirn <[email protected]>
Fixes: 28676d869bbb ("scsi: sg: check for valid direction before starting the request")
Reported-by: Chris Clayton <[email protected]>
Tested-by: Chris Clayton <[email protected]>
Cc: Douglas Gilbert <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Tested-by: Chris Clayton <[email protected]>
Acked-by: Douglas Gilbert <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>


--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-07-18 16:53:39

by Jason L Tibbitts III

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

>>>>> "JT" == Johannes Thumshirn <[email protected]> writes:

JT> This is fixed with: commit 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47

Hmm, well, I just pulled and built mainline, which does appear to
contain that patch (though it wasn't there when I first started
investigating this last week) and the problem is still there. I'll try
building clean 4.12 and applying just that patch over the top.

- J<

2017-07-18 17:34:08

by Jason L Tibbitts III

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

I have verified that building a clean v4.12 with
68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry picked on top still
shows the problem:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 45...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1045 Failed

Nothing appears to be logged; is there any kind of debugging information
I can collect which might help to track this down? I'm not particularly
good at this but I am pretty sure that I'm building everything properly
and am actually booting the patched kernel.

- J<

2017-07-19 07:03:40

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

On Tue, Jul 18, 2017 at 12:33:59PM -0500, Jason L Tibbitts III wrote:
> I have verified that building a clean v4.12 with
> 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry picked on top still
> shows the problem:
>
> [root@backup2 ~]# mtx -f /dev/sg7 next 0
> Unloading drive 0 into Storage Element 45...mtx: Request Sense: Long
> Report=yes
> mtx: Request Sense: Valid Residual=no
> mtx: Request Sense: Error Code=0 (Unknown?!)
> mtx: Request Sense: Sense Key=No Sense
> mtx: Request Sense: FileMark=no
> mtx: Request Sense: EOM=no
> mtx: Request Sense: ILI=no
> mtx: Request Sense: Additional Sense Code = 00
> mtx: Request Sense: Additional Sense Qualifier = 00
> mtx: Request Sense: BPV=no
> mtx: Request Sense: Error in CDB=no
> mtx: Request Sense: SKSV=no
> MOVE MEDIUM from Element Address 1 to 1045 Failed
>
> Nothing appears to be logged; is there any kind of debugging information
> I can collect which might help to track this down? I'm not particularly
> good at this but I am pretty sure that I'm building everything properly
> and am actually booting the patched kernel.


Can you please apply this debugging patch, so I can see what's going on.

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 1e82d4128a84..2505fa0b2062 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -753,6 +753,10 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,

static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
{
+
+ pr_info("%s: dxfer_direction: %d, dxfer_len: %d\n",
+ __func__, hp->dxfer_direction, hp->dxfer_len);
+
switch (hp->dxfer_direction) {
case SG_DXFER_NONE:
if (hp->dxferp || hp->dxfer_len > 0)

Thanks,
Johannes
--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-07-19 08:13:44

by Jason L Tibbitts III

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

>>>>> "JT" == Johannes Thumshirn <[email protected]> writes:

JT> Can you please apply this debugging patch, so I can see what's going
JT> on.

Sure, no problem.

I generally run "mtx -f /dev/sg7 status" first just to make sure the
library is there; this has always worked as expected. With the debug
patch applied, this is sent to the console:
[ 33.933422] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56
[ 33.940526] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136
[ 33.982429] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 34.569986] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 34.623898] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240

Then running "mtx -f /dev/sg7 next 0" gives this as stdout/err:

Unloading drive 0 into Storage Element 46...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1046 Failed

And this to the console:
[ 45.552524] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56
[ 45.559626] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136
[ 45.603544] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 46.204614] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 46.258463] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0

Would you also want to see the output from that patch applied to a
functioning kernel?

- J<

2017-07-19 08:37:02

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote:
> [ 46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0

Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in
your case.

This would "fix" it but I'm not generally sure it is _the_ solution:

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 1e82d4128a84..b421ec81d775 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
return true;
case SG_DXFER_TO_DEV:
case SG_DXFER_TO_FROM_DEV:
- if (!hp->dxferp || hp->dxfer_len == 0)
+ if (!hp->dxferp)
return false;
return true;
case SG_DXFER_UNKNOWN:

Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't
be sure len > 0, can we rely on dxferp being present?

Thanks,
Johannes
--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-07-21 16:43:48

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

Doug,

On Wed, Jul 19, 2017 at 10:36:54AM +0200, Johannes Thumshirn wrote:
> On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote:
> > [ 46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0
>
> Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in
> your case.
>
> This would "fix" it but I'm not generally sure it is _the_ solution:
>
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 1e82d4128a84..b421ec81d775 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
> return true;
> case SG_DXFER_TO_DEV:
> case SG_DXFER_TO_FROM_DEV:
> - if (!hp->dxferp || hp->dxfer_len == 0)
> + if (!hp->dxferp)
> return false;
> return true;
> case SG_DXFER_UNKNOWN:
>
> Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't
> be sure len > 0, can we rely on dxferp being present?

Any comments on this?

Jason, can you try the above? If it works and Doug doesn't respond, I'm
inclined yo submit this band aid.

Thanks,
Johannes

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-07-21 19:23:27

by Jason L Tibbitts III

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

>>>>> "JT" == Johannes Thumshirn <[email protected]> writes:

JT> Jason, can you try the above? If it works and Doug doesn't respond,
JT> I'm inclined yo submit this band aid.

Unfortunately it doesn't appear to work for me. Maybe I'm building the
wrong thing, though. I checked out 4.12, cherry picked
68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 and then applied your one liner
on top of that. There appears to be no change in behavior:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 47...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1047 Failed

I can also apply the debugging patch and try again if that would give
you more useful information.

- J<

2017-07-25 07:18:57

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

On Fri, Jul 21, 2017 at 02:23:16PM -0500, Jason L Tibbitts III wrote:
> I can also apply the debugging patch and try again if that would give
> you more useful information.

Yes please (on top of the snippet I've sent you last).

Thanks a lot,
Johannes

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-07-25 18:25:18

by Jason L Tibbitts III

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

>>>>> "JT" == Johannes Thumshirn <[email protected]> writes:

JT> Yes please (on top of the snippet I've sent you last).

OK, I'm at 4.12 with 68c59fcea1f2c6a54c62aa896cc623c1b5bc9b47 cherry
picked, plus the fix patch and the debug patch you've sent previously.
To make sure we're on the same page, I'll include the patch at the end.

Running "mtx -f /dev/sg7 status" gives proper output with this logged to
the console:

[ 36.742905] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56
[ 36.750036] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136
[ 36.791673] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 37.339790] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 37.393597] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240

And running "mtx -f /dev/sg7 next 0" gives the following output:

[root@backup2 ~]# mtx -f /dev/sg7 next 0
Unloading drive 0 into Storage Element 1...mtx: Request Sense: Long
Report=yes
mtx: Request Sense: Valid Residual=no
mtx: Request Sense: Error Code=0 (Unknown?!)
mtx: Request Sense: Sense Key=No Sense
mtx: Request Sense: FileMark=no
mtx: Request Sense: EOM=no
mtx: Request Sense: ILI=no
mtx: Request Sense: Additional Sense Code = 00
mtx: Request Sense: Additional Sense Qualifier = 00
mtx: Request Sense: BPV=no
mtx: Request Sense: Error in CDB=no
mtx: Request Sense: SKSV=no
MOVE MEDIUM from Element Address 1 to 1001 Failed

And the following is logged to the console:

[ 192.732294] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 56
[ 192.739492] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 136
[ 192.781507] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 193.392401] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 193.448970] sg_is_valid_dxfer: dxfer_direction: -3, dxfer_len: 4240
[ 193.495130] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0

That's not any different than what I provided before, and honestly I
wouldn't expect it to be.

Is there something else I can log or some debugging switch I can twiddle
to give you any more information? I can also try to be more available
to try and avoid the timezone-induced day-long cycle time. I'm
available on IRC (tibbs on freenode and oftc) and can try to stay up
late or get up early or something to try and avoid this time zone
mismatch.

Here's what an strace of the last mtx call says:

open("/dev/sg7", O_RDWR) = 3
ioctl(3, SG_GET_VERSION_NUM, [30536]) = 0
ioctl(3, SG_SET_TIMEOUT, [30000]) = 0
brk(NULL) = 0x55d65f68a000
brk(0x55d65f6ab000) = 0x55d65f6ab000
brk(NULL) = 0x55d65f6ab000
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=6, cmdp="\x12\x00\x00\x00\x38\x00", mx_sb_len=20, iovec_count=0, dxfer_len=56, timeout=30000, flags=0, dxferp="\x08\x80\x05\x02\x45\x00\x00\x02\x42\x44\x54\x20\x20\x20\x20\x20\x46\x6c\x65\x78\x53\x74\x6f\x72\x20\x49\x49\x20\x20\x20\x20\x20"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=1, info=0}) = 0
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=6, cmdp="\x1a\x08\x1d\x00\x88\x00", mx_sb_len=20, iovec_count=0, dxfer_len=136, timeout=300000, flags=0, dxferp="\x17\x00\x00\x00\x9d\x12\x00\x00\x00\x01\x03\xe9\x00\x30\x00\x65\x00\x00\x00\x01\x00\x01\x00\x00", status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=112, duration=61, info=0}) = 0
ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x32\x03\xe9\x00\x30\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x03\xe9\x00\x30\x00\x00\x09\xc8\x02\x80\x00\x34\x00\x00\x09\xc0\x03\xe9\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=1728, duration=542, info=0}) = 0
ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x34\x00\x01\x00\x01\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x00\x01\x00\x01\x00\x00\x00\x3c\x04\x80\x00\x34\x00\x00\x00\x34\x00\x01\x09\x00\x00\x00\x00\x00\x00\x81\x03\xe9\x43\x30\x30\x30"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=4172, duration=47, info=0}) = 0
ioctl(3, CDROMAUDIOBUFSIZ or SCSI_IOCTL_GET_IDLUN, 0x7ffdbfaee7a0) = 0
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_FROM_DEV, cmd_len=12, cmdp="\xb8\x31\x00\x00\x00\x01\x00\x00\x10\x90\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=4240, timeout=300000, flags=0, dxferp="\x00\x00\x00\x01\x00\x00\x00\x3c\x01\x80\x00\x34\x00\x00\x00\x34\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"..., status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=4172, duration=37, info=0}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
write(1, "Unloading drive 0 into Storage E"..., 43) = 43
ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_TO_DEV, cmd_len=12, cmdp="\xa5\x00\x00\x00\x00\x01\x03\xe9\x00\x00\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=0, timeout=300000, flags=0, dxferp=NULL, status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=0, info=0}) = -1 EINVAL (Invalid argument)
write(2, "mtx: Request Sense: Long Report="..., 36) = 36

And here's what I'm testing on top of 4.12:

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 82c33a6..6870e88 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -753,15 +753,22 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,

static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
{
+
+ pr_info("%s: dxfer_direction: %d, dxfer_len: %d\n",
+ __func__, hp->dxfer_direction, hp->dxfer_len);
+
switch (hp->dxfer_direction) {
case SG_DXFER_NONE:
if (hp->dxferp || hp->dxfer_len > 0)
return false;
return true;
- case SG_DXFER_TO_DEV:
case SG_DXFER_FROM_DEV:
+ if (hp->dxfer_len < 0)
+ return false;
+ return true;
+ case SG_DXFER_TO_DEV:
case SG_DXFER_TO_FROM_DEV:
- if (!hp->dxferp || hp->dxfer_len == 0)
+ if (!hp->dxferp)
return false;
return true;
case SG_DXFER_UNKNOWN:

- J<

2017-07-25 19:09:08

by Douglas Gilbert

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

On 2017-07-19 04:36 AM, Johannes Thumshirn wrote:
> On Wed, Jul 19, 2017 at 03:13:34AM -0500, Jason L Tibbitts III wrote:
>> [ 46.304530] sg_is_valid_dxfer: dxfer_direction: -2, dxfer_len: 0
>
> Ahh now I see the -2 (SG_DXFER_TO_DEV) is the crucial point here. It is 0 in
> your case.
>
> This would "fix" it but I'm not generally sure it is _the_ solution:
>
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 1e82d4128a84..b421ec81d775 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -764,7 +764,7 @@ static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
> return true;
> case SG_DXFER_TO_DEV:
> case SG_DXFER_TO_FROM_DEV:
> - if (!hp->dxferp || hp->dxfer_len == 0)
> + if (!hp->dxferp)
> return false;
> return true;
> case SG_DXFER_UNKNOWN:
>
> Doug, what are the rules for SG_DXFER_TO_{FROM_}DEV? Apparently we can't
> be sure len > 0, can we rely on dxferp being present?

_TO_ is toward the device (i.e. WRITE) and what T10 call "data-out".
_FROM_ is from the device (e.g. INQUIRY) and what T10 call "data-in".

_TO_FROM_ is basically _FROM_ (and has nothing to do with bidi).
_TO_FROM_ is very old and is meant to try and detect "short reads"
by prefilling the indirect buffer (by reading from dxferp) before
it is overwritten by the _FROM_ (and then writing to dxferp). It
is from the time when not all LLDs or HBAs provided an indication
of a "short read". Today users have the 'sg_io_hdr_t::resid' for
that purpose. Whether the sg driver in lk 4.12 still does that
I haven't checked.

The only limit that should be placed on dxfer_len is something like
<= 2**28 (256M) in my opinion. Big enough that the kernel would
reject it and small enough to catch negative values placed in an
unsigned.


The overall problem is that sg_is_valid_dxfer() introduced in lk 4.12
is doing more sanity checks than were done before. My policy was
to ignore ("don't care") combinations of dxfer_direction, dxferp and
dxfer_len that were harmless. Anyway those three variables are
incomplete since the SCSI command and the device also dictate the
length of the data-in transfer.

Doug Gilbert

2017-07-26 07:22:28

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

On Tue, Jul 25, 2017 at 03:09:00PM -0400, Douglas Gilbert wrote:
> _TO_ is toward the device (i.e. WRITE) and what T10 call "data-out".
> _FROM_ is from the device (e.g. INQUIRY) and what T10 call "data-in".
>
> _TO_FROM_ is basically _FROM_ (and has nothing to do with bidi).
> _TO_FROM_ is very old and is meant to try and detect "short reads"
> by prefilling the indirect buffer (by reading from dxferp) before
> it is overwritten by the _FROM_ (and then writing to dxferp). It
> is from the time when not all LLDs or HBAs provided an indication
> of a "short read". Today users have the 'sg_io_hdr_t::resid' for
> that purpose. Whether the sg driver in lk 4.12 still does that
> I haven't checked.
>
> The only limit that should be placed on dxfer_len is something like
> <= 2**28 (256M) in my opinion. Big enough that the kernel would
> reject it and small enough to catch negative values placed in an
> unsigned.

OK. I'll give it a shot, thanks.

>
>
> The overall problem is that sg_is_valid_dxfer() introduced in lk 4.12
> is doing more sanity checks than were done before. My policy was
> to ignore ("don't care") combinations of dxfer_direction, dxferp and
> dxfer_len that were harmless. Anyway those three variables are
> incomplete since the SCSI command and the device also dictate the
> length of the data-in transfer.

The problem with these don't care combinations was, that user-space could then
easily crash the kernel. This is the reason I introduced sg_is_valid_dxfer().
It's sole purpuse was to avoid more CVEs, but unfortunately it turned into
quite some regressions.

Thanks,
Johannes
--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-07-26 07:39:41

by Johannes Thumshirn

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

On Tue, Jul 25, 2017 at 01:25:07PM -0500, Jason L Tibbitts III wrote:
> >>>>> "JT" == Johannes Thumshirn <[email protected]> writes:
> ioctl(3, SG_IO, {interface_id='S', dxfer_direction=SG_DXFER_TO_DEV, cmd_len=12, cmdp="\xa5\x00\x00\x00\x00\x01\x03\xe9\x00\x00\x00\x00", mx_sb_len=20, iovec_count=0, dxfer_len=0, timeout=300000, flags=0, dxferp=NULL, status=0, masked_status=0, msg_status=0, sb_len_wr=0, sbp="", host_status=0, driver_status=0, resid=0, duration=0, info=0}) = -1 EINVAL (Invalid argument)

[...]
case SG_DXFER_TO_DEV:
case SG_DXFER_TO_FROM_DEV:
if (!hp->dxferp)
return false;
return true;

So here we go, dxfer_direction is SG_DXFER_TO_DEV, dxfer_len is 0 and dxferp
is NULL. We bail out because dxferp is NULL here.

I only looked at sg_write() and not sg_new_write() which is called by the
SG_IO path not the fops->write() path.

It's probably best to just check for dxfer_len <= 2^28 to be valid as Doug
suggested:

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 4fe606b000b4..d7ff71e0c85c 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -751,35 +751,6 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
return count;
}

-static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
-{
- switch (hp->dxfer_direction) {
- case SG_DXFER_NONE:
- if (hp->dxferp || hp->dxfer_len > 0)
- return false;
- return true;
- case SG_DXFER_FROM_DEV:
- /*
- * for SG_DXFER_FROM_DEV we always set dxfer_len to > 0. dxferp
- * can either be NULL or != NULL so there's no point in checking
- * it either. So just return true.
- */
- return true;
- case SG_DXFER_TO_DEV:
- case SG_DXFER_TO_FROM_DEV:
- if (!hp->dxferp || hp->dxfer_len == 0)
- return false;
- return true;
- case SG_DXFER_UNKNOWN:
- if ((!hp->dxferp && hp->dxfer_len) ||
- (hp->dxferp && hp->dxfer_len == 0))
- return false;
- return true;
- default:
- return false;
- }
-}
-
static int
sg_common_write(Sg_fd * sfp, Sg_request * srp,
unsigned char *cmnd, int timeout, int blocking)
@@ -800,7 +771,7 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
"sg_common_write: scsi opcode=0x%02x, cmd_size=%d\n",
(int) cmnd[0], (int) hp->cmd_len));

- if (!sg_is_valid_dxfer(hp))
+ if (hp->dxfer_len >= SZ_256M)
return -EINVAL;

k = sg_start_req(srp, cmnd);

--
Johannes Thumshirn Storage
[email protected] +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 N?rnberg
GF: Felix Imend?rffer, Jane Smithard, Graham Norton
HRB 21284 (AG N?rnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

2017-07-26 20:19:43

by Jason L Tibbitts III

[permalink] [raw]
Subject: Re: [REGRESSION] 28676d869bbb (scsi: sg: check for valid direction before starting the request) breaks mtx tape library control

>>>>> "JT" == Johannes Thumshirn <[email protected]> writes:

JT> It's probably best to just check for dxfer_len <= 2^28 to be valid
JT> as Doug suggested:

I can verify that patch on top of git head (as of a few hours ago) does
function properly.

It didn't apply directly on top of 4.12 but even I can handle fixing
that up. The result (just deleting the function and changing the call
to a check for hp->dxfer_len >= SZ_256M) works fine and is at the end.

So thanks. If this goes in, please CC to stable.

- J<

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 82c33a6..aa6f1de 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -751,29 +751,6 @@ sg_new_write(Sg_fd *sfp, struct file *file, const char __user *buf,
return count;
}

-static bool sg_is_valid_dxfer(sg_io_hdr_t *hp)
-{
- switch (hp->dxfer_direction) {
- case SG_DXFER_NONE:
- if (hp->dxferp || hp->dxfer_len > 0)
- return false;
- return true;
- case SG_DXFER_TO_DEV:
- case SG_DXFER_FROM_DEV:
- case SG_DXFER_TO_FROM_DEV:
- if (!hp->dxferp || hp->dxfer_len == 0)
- return false;
- return true;
- case SG_DXFER_UNKNOWN:
- if ((!hp->dxferp && hp->dxfer_len) ||
- (hp->dxferp && hp->dxfer_len == 0))
- return false;
- return true;
- default:
- return false;
- }
-}
-
static int
sg_common_write(Sg_fd * sfp, Sg_request * srp,
unsigned char *cmnd, int timeout, int blocking)
@@ -794,7 +771,7 @@ sg_common_write(Sg_fd * sfp, Sg_request * srp,
"sg_common_write: scsi opcode=0x%02x, cmd_size=%d\n",
(int) cmnd[0], (int) hp->cmd_len));

- if (!sg_is_valid_dxfer(hp))
+ if (hp->dxfer_len >= SZ_256M)
return -EINVAL;

k = sg_start_req(srp, cmnd);