2006-10-29 19:20:21

by Gregor Jasny

Subject: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Hi,

Today I tried the new cdparanoia from Debian Sid (3.10+debian~pre0-2).
When I started ripping with "cdparanoia -d /dev/scd0 1" my system
froze after a few seconds. There was no oops, and even the console
cursor stopped blinking.

If I start cdparanoia with -g /dev/scd0 it starts ripping, but the
kernel prints many "program cdparanoia not setting count and/or
reply_len properly" warnings. But this seems to be a cdparanoia bug.

My CDROM:
Vendor: PIONEER
Product: DVD-ROM DVD-106
Revision level: 1.22





http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901


Attachments:
(No filename) (617.00 B)
backtrace.txt (3.79 kB)

2006-10-29 21:31:33

by Ken Moffat

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On Sun, Oct 29, 2006 at 08:20:17PM +0100, Gregor Jasny wrote:
> Hi,
>
> Today I tried the new cdparanoia from Debian Sid (3.10+debian~pre0-2).
> When I started ripping with "cdparanoia -d /dev/scd0 1" my system
> freezes after some seconds. There is no oops and even the console
> cursor stops blinking.
>
> If I start cdparanoia with -g /dev/scd0 it starts ripping and but the
> kernel prints many "program cdparanoia not setting count and/or
> reply_len properly" warnings. But this seems to be a cdparanoia bug.
>
> My CDROM:
> Vendor: PIONEER
> Product: DVD-ROM DVD-106
> Revision level: 1.22
>
I'm guessing this is really an IDE drive ? If so, I suspect the
problem is in scsi emulation (which doesn't deny that the bug might
be at least partly in the application, although hanging the box is
nasty).

Specifically, I've just compiled that version with the debian patch
on my (non-debian) amd64 and successfully ripped a CD (without any
log messages) on both 2.6.18 and 2.6.19-rc3 using /dev/hdc.

So, if this isn't a real SCSI drive, as a work-around you could try
disabling ide-scsi and use the IDE device name.

Ken
--
das eine Mal als Tragödie, das andere Mal als Farce

2006-10-29 22:06:06

by Brice Goglin

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Ken Moffat wrote:
> On Sun, Oct 29, 2006 at 08:20:17PM +0100, Gregor Jasny wrote:
>
>> Hi,
>>
>> Today I tried the new cdparanoia from Debian Sid (3.10+debian~pre0-2).
>> When I started ripping with "cdparanoia -d /dev/scd0 1" my system
>> freezes after some seconds. There is no oops and even the console
>> cursor stops blinking.
>>
>> If I start cdparanoia with -g /dev/scd0 it starts ripping and but the
>> kernel prints many "program cdparanoia not setting count and/or
>> reply_len properly" warnings. But this seems to be a cdparanoia bug.
>>
>> My CDROM:
>> Vendor: PIONEER
>> Product: DVD-ROM DVD-106
>> Revision level: 1.22
>>
>>
> I'm guessing this is really an IDE drive ? If so, I suspect the
> problem is in scsi emulation (which doesn't deny that the bug might
> be at least partly in the application, although hanging the box is
> nasty).
>
> Specifically, I've just compiled that version with the debian patch
> on my (non-debian) amd64 and successfully ripped a CD (without any
> log messages) on both 2.6.18 and 2.6.19-rc3 using /dev/hdc.
>
> So, if this isn't a real SCSI drive, as a work-around you could try
> disabling ide-scsi and use the IDE device name.
>

I don't think it is ide-scsi related at all. I would rather suspect
libata and/or the SATA drivers (I am not sure what to call those IDE drives
that appear as SATA devices...). As shown in the Debian bug report that
Gregor cited[1], the problem has been observed on various machines with
the ata_piix SATA driver (with 2.6.16, .17, .18 and .19-rc kernels).

Brice

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901

2006-10-30 11:43:26

by Jens Axboe

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On Sun, Oct 29 2006, Gregor Jasny wrote:
> Hi,
>
> Today I tried the new cdparanoia from Debian Sid (3.10+debian~pre0-2).
> When I started ripping with "cdparanoia -d /dev/scd0 1" my system
> freezes after some seconds. There is no oops and even the console
> cursor stops blinking.
>
> If I start cdparanoia with -g /dev/scd0 it starts ripping and but the
> kernel prints many "program cdparanoia not setting count and/or
> reply_len properly" warnings. But this seems to be a cdparanoia bug.
>
> My CDROM:
> Vendor: PIONEER
> Product: DVD-ROM DVD-106
> Revision level: 1.22

Can you confirm that 2.6.18 works?

--
Jens Axboe

2006-10-30 13:14:12

by Gregor Jasny

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

2006/10/30, Jens Axboe <[email protected]>:
> Can you confirm that 2.6.18 works?

I've ripped a lot of CDs with this drive and 2.6.18. But I accessed
the drive via the old ide drivers. How do I enable libata for the PATA
part of my IDE chipset in 2.6.18?

Gregor

2006-10-30 13:17:10

by Gregor Jasny

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

2006/10/30, Jens Axboe <[email protected]>:
> Can you confirm that 2.6.18 works?

The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
and 2.6.18, too.

Gregor

[1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901

2006-10-30 13:26:11

by Jens Axboe

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On Mon, Oct 30 2006, Gregor Jasny wrote:
> 2006/10/30, Jens Axboe <[email protected]>:
> >Can you confirm that 2.6.18 works?
>
> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
> and 2.6.18, too.
>
> Gregor
>
> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901

Ok, mainly just checking if this was a potential dupe of another bug.

--
Jens Axboe

2006-11-09 09:46:48

by Brice Goglin

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Jens Axboe wrote:
> On Mon, Oct 30 2006, Gregor Jasny wrote:
>
>> 2006/10/30, Jens Axboe <[email protected]>:
>>
>>> Can you confirm that 2.6.18 works?
>>>
>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
>> and 2.6.18, too.
>>
>> Gregor
>>
>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901
>>
>
> Ok, mainly just checking if this was a potential dupe of another bug.
>
>

Jens (or anybody else who has any idea of how to debug this),

Did you have a chance to reproduce the problem? I guess we "only" need a
machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug
some stuff, feel free to tell me what. But, since it freezes the machine
and sysrq doesn't even work, I don't really know what to try...

I just tried on rc5 and rc5-mm1; both have the problem (as do 2.6.16, .17
and .18; I don't know about earlier kernels). I didn't have an audio CD
here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0
(from Debian testing), it reports nothing for about 5 seconds and
then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable),
it reports an error very quickly, and dmesg gets a couple of lines like these:
    sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing data in;
    program cdparanoia not setting count and/or reply_len properly

Thanks,
Brice

2006-11-09 14:00:23

by Tejun Heo

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

[CC'ing Monty and Douglas.]

Hello, the original thread can be read from the following URL.

http://thread.gmane.org/gmane.linux.ide/13708/focus=13708

Brice Goglin wrote:
> ens Axboe wrote:
>> On Mon, Oct 30 2006, Gregor Jasny wrote:
>>
>>> 2006/10/30, Jens Axboe <[email protected]>:
>>>
>>>> Can you confirm that 2.6.18 works?
>>>>
>>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
>>> and 2.6.18, too.
>>>
>>> Gregor
>>>
>>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901
>>>
>> Ok, mainly just checking if this was a potential dupe of another bug.
>>
>>
>
> Jens (or anybody else who has any idea of how to debug this),
>
> Did you have a chance to reproduce the problem? I guess we "only" need a
> machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug
> some stuff, feel free to tell me what. But, since it freezes the machine
> and sysrq doesn't even work, I don't really know what to try...
>
> I just tried on rc5 and rc5-mm1, both have the problem (as 2.6.16, .17
> and .18 do, don't know about earlier kernels). I didn't have a audio CD
> here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0
> (from Debian testing), it reports nothing during about 5 seconds and
> then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable),
> it reports an error very quickly, and dmesg gets a couple line like these:
> sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing
> data in;
> program cdparanoia not setting count and/or reply_len properly

Okay, here's the story.

In interface/scan_devices.c::cdda_identify_scsi(), cdparanoia calls
scsi_inquiry() to identify the device and determine the interface type.
This seems to be the first point at which it actually issues commands to
the device. As the interface type isn't fully determined yet, for sg
devices it first issues the command w/ d->interface set to SGIO_SCSI. If
that fails, it falls back to SGIO_SCSI_BUGGY1.

For to-device request, both SGIO_SCSI and SGIO_SCSI_BUGGY1 set
sg_io_hdr.dxfer_direction to SG_DXFER_TO_DEV. But for from-device
request, SGIO_SCSI uses SG_DXFER_TO_FROM_DEV while SGIO_SCSI_BUGGY1 uses
SG_DXFER_FROM_DEV. So, cdparanoia first issues inquiry w/
SG_DXFER_TO_FROM_DEV and if that fails falls back to SG_DXFER_FROM_DEV.
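
The probing sequence described above can be sketched in C roughly as follows. This is a minimal illustration, not cdparanoia's actual code: build_inquiry() and probe_inquiry() are hypothetical helper names, the 5-second timeout is an assumed value, and only struct sg_io_hdr, SG_IO and the SG_DXFER_* constants come from <scsi/sg.h>. Since a command issued with the wrong direction may hang affected SATA hardware, the probe is shown only to make the sequence concrete.

```c
#include <assert.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>    /* struct sg_io_hdr, SG_IO, SG_DXFER_* */

/* Hypothetical helper: fill an SG_IO header for a 6-byte INQUIRY. */
static void build_inquiry(struct sg_io_hdr *hdr, unsigned char cdb[6],
                          unsigned char *buf, unsigned char buf_len,
                          unsigned char *sense, unsigned char sense_len,
                          int direction)
{
    assert(hdr && cdb && buf && sense);

    memset(cdb, 0, 6);
    cdb[0] = 0x12;       /* INQUIRY opcode */
    cdb[4] = buf_len;    /* allocation length */

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id    = 'S';
    hdr->dxfer_direction = direction;
    hdr->cmd_len         = 6;
    hdr->cmdp            = cdb;
    hdr->dxferp          = buf;
    hdr->dxfer_len       = buf_len;
    hdr->sbp             = sense;
    hdr->mx_sb_len       = sense_len;
    hdr->timeout         = 5000;   /* milliseconds; an assumed value */
}

/*
 * Hypothetical probe: try SG_DXFER_TO_FROM_DEV first, then fall back to
 * SG_DXFER_FROM_DEV. Returns the direction that worked, or 0 if neither
 * did (all SG_DXFER_* values are negative, so 0 is unambiguous).
 */
static int probe_inquiry(int fd, unsigned char *buf, unsigned char buf_len)
{
    static const int order[] = { SG_DXFER_TO_FROM_DEV, SG_DXFER_FROM_DEV };
    unsigned char cdb[6], sense[32];
    struct sg_io_hdr hdr;

    for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
        build_inquiry(&hdr, cdb, buf, buf_len, sense, sizeof(sense), order[i]);
        if (ioctl(fd, SG_IO, &hdr) == 0 && hdr.status == 0)
            return order[i];
    }
    return 0;
}
```

A real caller would presumably also inspect host_status and driver_status in the returned header; that is left out here to keep the sketch short.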

drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
block/scsi_ioctl.c interprets it as write. I guess this is a historical
thing (scsi/sg.c was updated but block/scsi_ioctl.c was forgotten). As
written above, cdparanoia can handle both cases as long as the kernel
promptly fails a command issued with the wrong direction.
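
To make the mismatch concrete, here is a small C sketch of the two interpretations as they are described in this message. It is not the kernel's code: the function names are hypothetical, and the XFER_* constants are local stand-ins that mirror the SG_DXFER_* values from <scsi/sg.h>.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins mirroring the SG_DXFER_* values from <scsi/sg.h>. */
enum {
    XFER_NONE        = -1,
    XFER_TO_DEV      = -2,
    XFER_FROM_DEV    = -3,
    XFER_TO_FROM_DEV = -4,
};

/* Interpretation attributed to drivers/scsi/sg.c: TO_FROM_DEV is a read. */
static bool is_write_sg_style(int dir)
{
    assert(dir <= XFER_NONE && dir >= XFER_TO_FROM_DEV);
    return dir == XFER_TO_DEV;
}

/* Interpretation attributed to block/scsi_ioctl.c: TO_FROM_DEV is a write. */
static bool is_write_block_style(int dir)
{
    assert(dir <= XFER_NONE && dir >= XFER_TO_FROM_DEV);
    return dir == XFER_TO_DEV || dir == XFER_TO_FROM_DEV;
}

/* True when the two paths would send the command in different directions. */
static bool paths_disagree(int dir)
{
    return is_write_sg_style(dir) != is_write_block_style(dir);
}
```

Under these assumptions only XFER_TO_FROM_DEV makes the two paths disagree, which is the case cdparanoia's first probe exercises.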

This works for most PATA ATAPI devices. Most devices detect the reversed
transfer and terminate the command promptly. But this doesn't seem to
be true for SATA devices. Many just hang and time out commands with the
wrong transfer direction. If you consider that most early SATA ATAPI
devices are actually PATA + bridge, this is sorta inevitable. The
PATA-SATA bridge cannot issue a D2H FIS to abort the command by itself.
It's just mirroring the status of the PATA side, and the PATA side doesn't
know that a SATA protocol mismatch has occurred.

So, IDENTIFY w/ write-DMA protocol times out after quite some seconds.
This is where things go from bad to worse. SATA controllers which have
shadow TF registers don't handle timeout conditions very well,
especially when they're waiting for data transfer. They basically hold
the PCI bus and hang till the transfer completes (which never happens).
That's where the hard lock up comes from.

Jens, I think we need to match block sg's behavior to SCSI's. Monty,
the timeout and hard lock up are due to hardware restrictions. Kernel
and libata can't do much about it. So, please find another way to detect
the interface.

Thanks.

--
tejun

2006-11-09 14:14:31

by Jeff Garzik

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Tejun Heo wrote:
> This works for most PATA ATAPI devices. Most devices detect reversed
> transfer and terminate the command promptly. But this doesn't seem to
> be true for SATA device. Many just hang and time out commands with the
> wrong transfer direction. If you consider that most early SATA ATAPI
> devices are actually PATA + bridge, this is sorta inevitable. The
> PATA-SATA bridge cannot issue D2H FIS to abort the command by itself.
> It's just mirroring the status of PATA side and PATA side doesn't know
> SATA protocol mismatch has occurred.
>
> So, IDENTIFY w/ write-DMA protocol times out after quite some seconds.
> This is where things go worse from bad. SATA controllers which have
> shadow TF registers don't handle timeout conditions very well,
> especially when they're waiting for data transfer. They basically hold
> the PCI bus and hang till the transfer completes (which never happens).
> That's where the hard lock up comes from.
>
> Jens, I think we need to match block sg's behavior to SCSI's. Monty,
> the timeout and hard lock up are due to hardware restrictions. Kernel
> and libata can't do much about it. So, please find other way to detect
> interface.


Mapping 'bidirectional' is a bit difficult. It might be reasonable to
interpret that as "userspace doesn't know" at lower layers, and then
fill in a data transfer direction based on ATA command opcode.

Given that there are stupid apps/libs out there in the field with this
behavior, even if the apps are fixed I think we are stuck with the
stupidities. At the very least, we could abort commands that transfer
data in the opposite direction from indicated, based on a command opcode
table.
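
One way to picture the opcode-table idea is the C sketch below. Everything in it is hypothetical: the type and function names are made up for illustration, the table is a small hand-picked subset, and it assumes SCSI/MMC packet opcodes (as carried by ATAPI devices) rather than raw ATA opcodes. Opcodes missing from the table are let through, since the table cannot vouch for them either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum xfer_dir { DIR_UNKNOWN, DIR_NONE, DIR_FROM_DEV, DIR_TO_DEV };

struct opcode_dir {
    unsigned char opcode;
    enum xfer_dir dir;
};

/* Illustrative subset only; a real table would need to be far larger. */
static const struct opcode_dir opcode_table[] = {
    { 0x00, DIR_NONE },      /* TEST UNIT READY */
    { 0x12, DIR_FROM_DEV },  /* INQUIRY */
    { 0x15, DIR_TO_DEV },    /* MODE SELECT(6) */
    { 0x28, DIR_FROM_DEV },  /* READ(10) */
    { 0x2A, DIR_TO_DEV },    /* WRITE(10) */
    { 0x43, DIR_FROM_DEV },  /* READ TOC/PMA/ATIP */
    { 0xBE, DIR_FROM_DEV },  /* READ CD */
};

/* Look up the direction the opcode implies, or DIR_UNKNOWN if not listed. */
static enum xfer_dir expected_dir(unsigned char opcode)
{
    for (size_t i = 0; i < sizeof(opcode_table) / sizeof(opcode_table[0]); i++)
        if (opcode_table[i].opcode == opcode)
            return opcode_table[i].dir;
    return DIR_UNKNOWN;
}

/* False only when the table knows the opcode and the request contradicts it. */
static bool direction_ok(unsigned char opcode, enum xfer_dir requested)
{
    assert(requested != DIR_UNKNOWN);
    enum xfer_dir want = expected_dir(opcode);
    return want == DIR_UNKNOWN || want == requested;
}
```

With such a check, an INQUIRY (0x12) requested as a to-device transfer could be rejected in software before it ever reaches the hardware.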

Jeff


2006-11-09 15:50:11

by Douglas Gilbert

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Tejun Heo wrote:
> [CC'ing Monty and Douglas.]
>
> Hello, the original thread can be read from the following URL.
>
> http://thread.gmane.org/gmane.linux.ide/13708/focus=13708
>
> Brice Goglin wrote:
>> ens Axboe wrote:
>>> On Mon, Oct 30 2006, Gregor Jasny wrote:
>>>
>>>> 2006/10/30, Jens Axboe <[email protected]>:
>>>>
>>>>> Can you confirm that 2.6.18 works?
>>>>>
>>>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
>>>> and 2.6.18, too.
>>>>
>>>> Gregor
>>>>
>>>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901
>>>>
>>> Ok, mainly just checking if this was a potential dupe of another bug.
>>>
>>>
>>
>> Jens (or anybody else who has any idea of how to debug this),
>>
>> Did you have a chance to reproduce the problem? I guess we "only" need a
>> machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug
>> some stuff, feel free to tell me what. But, since it freezes the machine
>> and sysrq doesn't even work, I don't really know what to try...
>>
>> I just tried on rc5 and rc5-mm1, both have the problem (as 2.6.16, .17
>> and .18 do, don't know about earlier kernels). I didn't have a audio CD
>> here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0
>> (from Debian testing), it reports nothing during about 5 seconds and
>> then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable),
>> it reports an error very quickly, and dmesg gets a couple line like
>> these:
>> sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing
>> data in;
>> program cdparanoia not setting count and/or reply_len properly
>
> Okay, here's the story.
>
> In interface/scan_devices.c::cdda_identify_scsi(), cdparanoia calls
> scsi_inquiry() to identify the device and determine interface type. This
> seems to be the first time to actually issue commands to the device. As
> interface type isn't completely determined, for sg devices, it first
> issues the command w/ d->interface set to SGIO_SCSI. If that fails, it
> falls back to SGIO_SCSI_BUGGY1.
>
> For to-device request, both SGIO_SCSI and SGIO_SCSI_BUGGY1 set
> sg_io_hdr.dxfer_direction to SG_DXFER_TO_DEV. But for from-device
> request, SGIO_SCSI uses SG_DXFER_TO_FROM_DEV while SGIO_SCSI_BUGGY1 uses
> SG_DXFER_FROM_DEV. So, cdparanoia first issues inquiry w/
> SG_DXFER_TO_FROM_DEV and if that fails falls back to SG_DXFER_FROM_DEV.
>
> drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
> block/scsi_ioctl.c interprets it as write. I guess this is historic
> thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten). As
> written above, cdparanoia can handle both cases as long as the kernel
> promptly fails command issued with the wrong direction.
>
> This works for most PATA ATAPI devices. Most devices detect reversed
> transfer and terminate the command promptly. But this doesn't seem to
> be true for SATA device. Many just hang and time out commands with the
> wrong transfer direction. If you consider that most early SATA ATAPI
> devices are actually PATA + bridge, this is sorta inevitable. The
> PATA-SATA bridge cannot issue D2H FIS to abort the command by itself.
> It's just mirroring the status of PATA side and PATA side doesn't know
> SATA protocol mismatch has occurred.
>
> So, IDENTIFY w/ write-DMA protocol times out after quite some seconds.
> This is where things go worse from bad. SATA controllers which have
> shadow TF registers don't handle timeout conditions very well,
> especially when they're waiting for data transfer. They basically hold
> the PCI bus and hang till the transfer completes (which never happens).
> That's where the hard lock up comes from.
>
> Jens, I think we need to match block sg's behavior to SCSI's. Monty,
> the timeout and hard lock up are due to hardware restrictions. Kernel
> and libata can't do much about it. So, please find other way to detect
> interface.

Tejun,
Your SG_DXFER_TO_FROM_DEV analysis is correct.

The stupid ~!@# who wrote the code, and the documentation
for it, defined SG_DXFER_TO_FROM_DEV to mean a "transfer
from device" operation where the kernel buffer receiving
the DMA transfer was prefilled with data that the application
provided. That certainly isn't a bidirectional transfer to/from
the device, but it is a bidirectional transfer to kernel
buffers when indirect IO is used.

Why do this? Because the 'resid' field indicating how much
less data was transferred in a "from_device" transfer than
was requested, was not added to SCSI infrastructure till much
later. There are still LLDs out there that don't implement it.
It also reflected a similar technique used with the sg_header
structure (circa 1992) for precisely the same reason. And
application writers wanted that functionality. Joerg was the
first name of one such application writer.
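
The prefill technique can be sketched from the application side roughly like this. It is an assumption-laden illustration rather than code from any real application: the helper names and the fill byte are made up, and the heuristic undercounts whenever the real data happens to end in the fill byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fill byte; any value unlikely to end real data would do. */
#define FILL_BYTE 0xA5

/* Fill the buffer before the transfer so untouched bytes can be recognised. */
static void prefill(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, FILL_BYTE, len);
}

/*
 * Estimate how many bytes the device actually wrote by scanning back from
 * the end for the last byte that no longer holds the fill pattern.
 */
static size_t bytes_received(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    size_t n = len;
    while (n > 0 && buf[n - 1] == FILL_BYTE)
        n--;
    return n;
}
```

For this to work with indirect IO, the prefilled user buffer has to be copied into the kernel buffer before the transfer, which is the extra step SG_DXFER_TO_FROM_DEV is described as providing on top of a plain from-device transfer.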


Coincidentally I am sitting on a patch from Luben Tuikov
to cause the same breakage in the sg driver itself.
Nobody has proposed a patch to the documentation for
the explanation of SG_DXFER_TO_FROM_DEV :-)
http://www.torque.net/sg/p/sg_v3_ho.html


As I am currently proposing a SCSI pass through version 4
interface with twin scatter gather lists for independent
bidirectional transfers for SCSI commands, I'm not sure
what setting DMA_BIDIRECTIONAL in the existing interface
buys us.


When you maintain and document a pass through interface you
sit between two groups of people that have conflicting goals
and don't have a particularly high opinion of each other.

Doug Gilbert


2006-11-09 20:09:25

by Monty Montgomery

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On 11/9/06, Tejun Heo <[email protected]> wrote:

> drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
> block/scsi_ioctl.c interprets it as write. I guess this is historic
> thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten).

Not historic; Jens accidentally implemented it backwards. No one
noticed for a long time. I submitted a patch for this a few months
ago.

> This works for most PATA ATAPI devices. Most devices detect reversed
> transfer and terminate the command promptly.

No. The rejection is *not* in hardware; it is in software.
block/scsi_ioctl.c, at least up to 2.6.16, rejected the TO_FROM_DEVICE
request when verifying the command for sanity after setting the
transfer direction incorrectly. As far as the *device* can see,
TO_FROM_DEVICE and FROM_DEVICE are identical. The difference only
applies inside the kernel mid-level driver where TO_FROM_DEVICE
prefills the transfer buffer as a way of working around having no
other detection path for short DMA transfers.

> But this doesn't seem to
> be true for SATA device.

Then the driver is broken and needs to be fixed. And I'll need to
find a workaround for broken kernels that doesn't cause a boom.

> Jens, I think we need to match block sg's behavior to SCSI's. Monty,
> the timeout and hard lock up are due to hardware restrictions.

No, the kernel is setting the transfer direction incorrectly. I don't
set the transfer direction, the kernel does.

In your case, I pass in "SGIO_TO_FROM_DEVICE" and the kernel says
"that's a write". The kernel is wrong. It is a read. The original
description of what TO_FROM_DEVICE is for is explicit on this point.

> Kernel
> and libata can't do much about it. So, please find other way to detect
> interface.

Just to be clear-- it is the kernel at fault here, and the kernel can
do something about it-- but only if the kernel gets fixed. Also to be
clear, given this brokenness, yes I need to find another way.

Dammit, dammit, dammit, one step forward, two steps back :-(

Monty

2006-11-09 20:14:37

by Monty Montgomery

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On 11/9/06, Jeff Garzik <[email protected]> wrote:

> Mapping 'bidirectional' is a bit difficult.

SGIO_TO_FROM_DEVICE is *not* bidirectional!

From the header that defines it:

#define SG_DXFER_TO_FROM_DEV -4 /* treated like SG_DXFER_FROM_DEV with the
                                   additional property than during indirect
                                   IO the user buffer is copied into the
                                   kernel buffers before the transfer */

That's pretty darned clear. TO_FROM_DEVICE is a straight-up read.
Why the continuing confusion of what this mode is for?

> Given that there are stupid apps/libs out there in the field with this
> behavior, even if the apps are fixed I think we are stuck with the
> stupidities.

*ahem*

Monty

2006-11-09 22:47:23

by Tejun Heo

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Hello,

Monty Montgomery wrote:
[--snip--]
>> Kernel
>> and libata can't do much about it. So, please find other way to detect
>> interface.
>
> Just to be clear-- it is the kernel at fault here, and the kernel can
> do something about it-- but only if the kernel gets fixed. Also to be
> clear, given this brokenness, yes I need to find another way.

Yep, it seems to be the kernel's fault, and we need to fix both.

> Dammit, dammit, dammit, one step forward, two steps back :-(

:-(

--
tejun

2006-11-10 10:36:23

by Luben Tuikov

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

--- Douglas Gilbert <[email protected]> wrote:
> Tejun Heo wrote:
> > [CC'ing Monty and Douglas.]
> >
> > Hello, the original thread can be read from the following URL.
> >
> > http://thread.gmane.org/gmane.linux.ide/13708/focus=13708
> >
> > Brice Goglin wrote:
> >> ens Axboe wrote:
> >>> On Mon, Oct 30 2006, Gregor Jasny wrote:
> >>>
> >>>> 2006/10/30, Jens Axboe <[email protected]>:
> >>>>
> >>>>> Can you confirm that 2.6.18 works?
> >>>>>
> >>>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
> >>>> and 2.6.18, too.
> >>>>
> >>>> Gregor
> >>>>
> >>>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901
> >>>>
> >>> Ok, mainly just checking if this was a potential dupe of another bug.
> >>>
> >>>
> >>
> >> Jens (or anybody else who has any idea of how to debug this),
> >>
> >> Did you have a chance to reproduce the problem? I guess we "only" need a
> >> machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug
> >> some stuff, feel free to tell me what. But, since it freezes the machine
> >> and sysrq doesn't even work, I don't really know what to try...
> >>
> >> I just tried on rc5 and rc5-mm1, both have the problem (as 2.6.16, .17
> >> and .18 do, don't know about earlier kernels). I didn't have a audio CD
> >> here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0
> >> (from Debian testing), it reports nothing during about 5 seconds and
> >> then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable),
> >> it reports an error very quickly, and dmesg gets a couple line like
> >> these:
> >> sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing
> >> data in;
> >> program cdparanoia not setting count and/or reply_len properly
> >
> > Okay, here's the story.
> >
> > In interface/scan_devices.c::cdda_identify_scsi(), cdparanoia calls
> > scsi_inquiry() to identify the device and determine interface type. This
> > seems to be the first time to actually issue commands to the device. As
> > interface type isn't completely determined, for sg devices, it first
> > issues the command w/ d->interface set to SGIO_SCSI. If that fails, it
> > falls back to SGIO_SCSI_BUGGY1.
> >
> > For to-device request, both SGIO_SCSI and SGIO_SCSI_BUGGY1 set
> > sg_io_hdr.dxfer_direction to SG_DXFER_TO_DEV. But for from-device
> > request, SGIO_SCSI uses SG_DXFER_TO_FROM_DEV while SGIO_SCSI_BUGGY1 uses
> > SG_DXFER_FROM_DEV. So, cdparanoia first issues inquiry w/
> > SG_DXFER_TO_FROM_DEV and if that fails falls back to SG_DXFER_FROM_DEV.
> >
> > drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
> > block/scsi_ioctl.c interprets it as write. I guess this is historic
> > thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten). As
> > written above, cdparanoia can handle both cases as long as the kernel
> > promptly fails command issued with the wrong direction.
> >
> > This works for most PATA ATAPI devices. Most devices detect reversed
> > transfer and terminate the command promptly. But this doesn't seem to
> > be true for SATA device. Many just hang and time out commands with the
> > wrong transfer direction. If you consider that most early SATA ATAPI
> > devices are actually PATA + bridge, this is sorta inevitable. The
> > PATA-SATA bridge cannot issue D2H FIS to abort the command by itself.
> > It's just mirroring the status of PATA side and PATA side doesn't know
> > SATA protocol mismatch has occurred.
> >
> > So, IDENTIFY w/ write-DMA protocol times out after quite some seconds.
> > This is where things go worse from bad. SATA controllers which have
> > shadow TF registers don't handle timeout conditions very well,
> > especially when they're waiting for data transfer. They basically hold
> > the PCI bus and hang till the transfer completes (which never happens).
> > That's where the hard lock up comes from.
> >
> > Jens, I think we need to match block sg's behavior to SCSI's. Monty,
> > the timeout and hard lock up are due to hardware restrictions. Kernel
> > and libata can't do much about it. So, please find other way to detect
> > interface.
>
> Tejun,
> Your SG_DXFER_TO_FROM_DEV analysis is correct.
>
> The stupid ~!@# who wrote the code, and the documentation
> for it, defined SG_DXFER_TO_FROM_DEV to mean a "transfer
> from device" operation where the kernel buffer receiving
> the DMA transfer was prefilled with data that the application
> provided. That certainly isn't a bidirectional transfer to/from
> the device, but it is a bidirectional transfer to kernel
> buffers when indirect IO is used.
>
> Why do this? Because the 'resid' field indicating how much
> less data was transferred in a "from_device" transfer than
> was requested, was not added to SCSI infrastructure till much
> later. There are still LLDs out there that don't implement it.
> It also reflected a similar technique used with the sg_header
> structure (circa 1992) for precisely the same reason. And
> application writers wanted that functionality. Joerg was the
> first name of one such application writer.
>
>
> Coincidentally I am sitting on a patch from Luben Tuikov
> to cause the same breakage in the sg driver itself.

Here is a link to the recently posted 8-month-old patch:
http://marc.theaimsgroup.com/?l=linux-scsi&m=116267031029025&w=2

The patch would appear to fix the problem Tejun is describing.

I cannot quite remember exactly what I was doing that day 8 months
ago, but I was testing either disk or tape devices when I arrived
at that patch.

This patch has been in my dev (gateway) tree for the last 8
months without any problems.

Luben


> Nobody has proposed a patch to the documentation for
> the explanation of SG_DXFER_TO_FROM_DEV :-)
> http://www.torque.net/sg/p/sg_v3_ho.html
>
>
> As I am currently proposing a SCSI pass through version 4
> interface with twin scatter gather lists for independent
> bidirectional transfers for SCSI commands, I'm not sure
> what setting DMA_BIDIRECTIONAL in the existing interface
> buys us.
>
>
> When you maintain and document a pass through interface you
> sit between two groups of people that have conflicting goals
> and don't have a particularly high opinion of each other.
>
> Doug Gilbert
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2006-11-10 12:58:49

by Douglas Gilbert

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Luben Tuikov wrote:
> --- Douglas Gilbert <[email protected]> wrote:
>> Tejun Heo wrote:
>>> [CC'ing Monty and Douglas.]
>>>
>>> Hello, the original thread can be read from the following URL.
>>>
>>> http://thread.gmane.org/gmane.linux.ide/13708/focus=13708
>>>
>>> Brice Goglin wrote:
>>>> ens Axboe wrote:
>>>>> On Mon, Oct 30 2006, Gregor Jasny wrote:
>>>>>
>>>>>> 2006/10/30, Jens Axboe <[email protected]>:
>>>>>>
>>>>>>> Can you confirm that 2.6.18 works?
>>>>>>>
>>>>>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
>>>>>> and 2.6.18, too.
>>>>>>
>>>>>> Gregor
>>>>>>
>>>>>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901
>>>>>>
>>>>> Ok, mainly just checking if this was a potential dupe of another bug.
>>>>>
>>>>>
>>>> Jens (or anybody else who has any idea of how to debug this),
>>>>
>>>> Did you have a chance to reproduce the problem? I guess we "only" need a
>>>> machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug
>>>> some stuff, feel free to tell me what. But, since it freezes the machine
>>>> and sysrq doesn't even work, I don't really know what to try...
>>>>
>>>> I just tried on rc5 and rc5-mm1, both have the problem (as 2.6.16, .17
>>>> and .18 do, don't know about earlier kernels). I didn't have a audio CD
>>>> here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0
>>>> (from Debian testing), it reports nothing during about 5 seconds and
>>>> then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable),
>>>> it reports an error very quickly, and dmesg gets a couple line like
>>>> these:
>>>> sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing
>>>> data in;
>>>> program cdparanoia not setting count and/or reply_len properly
>>> Okay, here's the story.
>>>
>>> In interface/scan_devices.c::cdda_identify_scsi(), cdparanoia calls
>>> scsi_inquiry() to identify the device and determine interface type. This
>>> seems to be the first time to actually issue commands to the device. As
>>> interface type isn't completely determined, for sg devices, it first
>>> issues the command w/ d->interface set to SGIO_SCSI. If that fails, it
>>> falls back to SGIO_SCSI_BUGGY1.
>>>
>>> For to-device request, both SGIO_SCSI and SGIO_SCSI_BUGGY1 set
>>> sg_io_hdr.dxfer_direction to SG_DXFER_TO_DEV. But for from-device
>>> request, SGIO_SCSI uses SG_DXFER_TO_FROM_DEV while SGIO_SCSI_BUGGY1 uses
>>> SG_DXFER_FROM_DEV. So, cdparanoia first issues inquiry w/
>>> SG_DXFER_TO_FROM_DEV and if that fails falls back to SG_DXFER_FROM_DEV.
>>>
>>> drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
>>> block/scsi_ioctl.c interprets it as write. I guess this is historic
>>> thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten). As
>>> written above, cdparanoia can handle both cases as long as the kernel
>>> promptly fails command issued with the wrong direction.
>>>
>>> This works for most PATA ATAPI devices. Most devices detect a reversed
>>> transfer and terminate the command promptly. But this doesn't seem to
>>> be true for SATA devices. Many just hang and time out commands with the
>>> wrong transfer direction. If you consider that most early SATA ATAPI
>>> devices are actually PATA + bridge, this is sorta inevitable. The
>>> PATA-SATA bridge cannot issue D2H FIS to abort the command by itself.
>>> It's just mirroring the status of PATA side and PATA side doesn't know
>>> SATA protocol mismatch has occurred.
>>>
>>> So, INQUIRY w/ write-DMA protocol times out after quite some seconds.
>>> This is where things go from bad to worse. SATA controllers which have
>>> shadow TF registers don't handle timeout conditions very well,
>>> especially when they're waiting for data transfer. They basically hold
>>> the PCI bus and hang till the transfer completes (which never happens).
>>> That's where the hard lock up comes from.
>>>
>>> Jens, I think we need to match block sg's behavior to SCSI's. Monty,
>>> the timeout and hard lock up are due to hardware restrictions. Kernel
>>> and libata can't do much about it. So, please find another way to detect
>>> the interface.
>> Tejun,
>> Your SG_DXFER_TO_FROM_DEV analysis is correct.
>>
>> The stupid ~!@# who wrote the code, and the documentation
>> for it, defined SG_DXFER_TO_FROM_DEV to mean a "transfer
>> from device" operation where the kernel buffer receiving
>> the DMA transfer was prefilled with data that the application
>> provided. That certainly isn't a bidirectional transfer to/from
>> the device, but it is a bidirectional transfer to kernel
>> buffers when indirect IO is used.
>>
>> Why do this? Because the 'resid' field indicating how much
>> less data was transferred in a "from_device" transfer than
>> was requested, was not added to SCSI infrastructure till much
>> later. There are still LLDs out there that don't implement it.
>> It also reflected a similar technique used with the sg_header
>> structure (circa 1992) for precisely the same reason. And
>> application writers wanted that functionality. Joerg was the
>> first name of one such application writer.
>>
>>
>> Coincidentally I am sitting on a patch from Luben Tuikov
>> to cause the same breakage in the sg driver itself.
>
> Here is a link to the recently posted 8-month-old patch:
> http://marc.theaimsgroup.com/?l=linux-scsi&m=116267031029025&w=2
>
> The patch would appear to fix the problem Tejun is describing.
>
> I cannot quite remember exactly what I was doing that day 8 months
> ago, but I was testing either disk or tape devices and arrived
> at that patch.
>
> This patch has been in my dev (gateway) tree for the last 8
> months, without any problems.
>
> Luben
>
>
>> Nobody has proposed a patch to the documentation for
>> the explanation of SG_DXFER_TO_FROM_DEV :-)
>> http://www.torque.net/sg/p/sg_v3_ho.html
^^^^^^^^^^^^^

Luben,
The failure being reported is that the block layer
SG_IO ioctl already does what you are proposing to
do for the sg driver.

Hence an application, cdparanoia in this case, since
it is coded against documented behaviour, assumes that
SG_DXFER_TO_FROM_DEV will read from the device.
See the definition of SG_DXFER_TO_FROM_DEV in sg.h and
the document above.

So your proposed patch would compound the problem. The
solution is to leave the sg driver unchanged and instead
put the equivalent of the reverse of your patch in the
block layer SG_IO ioctl.

There is nothing to stop a new direction flag being
added called SG_DXFER_BIDIRECTIONAL that maps to
DMA_BIDIRECTIONAL.
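
A minimal sketch of the two interpretations discussed above, offered as an illustration and not as kernel code. The `enum xfer` type and both function names are hypothetical; the `SG_DXFER_*` values repeat the definitions in scsi/sg.h so the sketch compiles without Linux headers. Only the SG_DXFER_TO_FROM_DEV difference is taken from this thread; the other cases are assumed to agree.

```c
/* Values mirror the SG_DXFER_* definitions in scsi/sg.h. */
#ifndef SG_DXFER_NONE
#define SG_DXFER_NONE        (-1)
#define SG_DXFER_TO_DEV      (-2)
#define SG_DXFER_FROM_DEV    (-3)
#define SG_DXFER_TO_FROM_DEV (-4)
#endif

/* Hypothetical type for this sketch, not a kernel type. */
enum xfer { XFER_INVALID, XFER_NONE, XFER_READ, XFER_WRITE };

/* Documented sg semantics: TO_FROM_DEV is a read from the device. */
enum xfer xfer_documented(int dxfer_direction)
{
    switch (dxfer_direction) {
    case SG_DXFER_NONE:
        return XFER_NONE;
    case SG_DXFER_TO_DEV:
        return XFER_WRITE;
    case SG_DXFER_FROM_DEV:
    case SG_DXFER_TO_FROM_DEV:
        return XFER_READ;
    default:
        return XFER_INVALID;
    }
}

/* Behaviour reported in this thread for the block layer SG_IO path:
 * TO_FROM_DEV is treated as a write. Other cases are assumed identical. */
enum xfer xfer_block_reported(int dxfer_direction)
{
    if (dxfer_direction == SG_DXFER_TO_FROM_DEV)
        return XFER_WRITE;
    return xfer_documented(dxfer_direction);
}

/* Returns 1 when the two paths would disagree about a flag. */
int xfer_paths_disagree(int dxfer_direction)
{
    return xfer_documented(dxfer_direction) !=
           xfer_block_reported(dxfer_direction);
}
```

In this model `xfer_paths_disagree()` is true only for SG_DXFER_TO_FROM_DEV, which is the flag cdparanoia tries first.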

Doug Gilbert


2006-11-10 16:12:49

by Jens Axboe

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On Thu, Nov 09 2006, Monty Montgomery wrote:
> On 11/9/06, Tejun Heo <[email protected]> wrote:
>
> >drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
> >block/scsi_ioctl.c interprets it as write. I guess this is historic
> >thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten).
>
> Not historic; Jens accidentally implemented it backwards. No one
> noticed for a long time. I submitted a patch for this a few months
> ago.

Yeah, I wonder why that did not go in, I remember the full breadth of
our discussion and you are fully correct. I'll make sure it gets into
2.6.19!

--
Jens Axboe

2006-11-10 16:19:21

by Brice Goglin

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Jens Axboe wrote:
> On Thu, Nov 09 2006, Monty Montgomery wrote:
>
>> On 11/9/06, Tejun Heo <[email protected]> wrote:
>>
>>
>>> drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
>>> block/scsi_ioctl.c interprets it as write. I guess this is historic
>>> thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten).
>>>
>> Not historic; Jens accidentally implemented it backwards. No one
>> noticed for a long time. I submitted a patch for this a few months
>> ago.
>>
>
> Yeah, I wonder why that did not go in, I remember the full breadth of
> our discussion and you are fully correct. I'll make sure it gets into
> 2.6.19!
>

Not sure this patch was supposed to fix our freeze, but I just tried it on
top of rc5 and it does not seem to fix it.

Brice

2006-11-10 16:21:10

by Jens Axboe

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On Fri, Nov 10 2006, Brice Goglin wrote:
> Jens Axboe wrote:
> > On Thu, Nov 09 2006, Monty Montgomery wrote:
> >
> >> On 11/9/06, Tejun Heo <[email protected]> wrote:
> >>
> >>
> >>> drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
> >>> block/scsi_ioctl.c interprets it as write. I guess this is historic
> >>> thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten).
> >>>
> >> Not historic; Jens accidentally implemented it backwards. No one
> >> noticed for a long time. I submitted a patch for this a few months
> >> ago.
> >>
> >
> > Yeah, I wonder why that did not go in, I remember the full breadth of
> > our discussion and you are fully correct. I'll make sure it gets into
> > 2.6.19!
> >
>
> Not sure this patch was supposed to fix our freeze, but I just tried it on
> top of rc5 and it does not seem to fix it.

It should fix Alex's issue with the wrong data direction being seen. I
haven't had time to follow this thread today, so I cannot say more.

--
Jens Axboe

2006-11-10 20:08:18

by Luben Tuikov

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

--- Douglas Gilbert <[email protected]> wrote:
> Luben Tuikov wrote:
> > --- Douglas Gilbert <[email protected]> wrote:
> >> Tejun Heo wrote:
> >>> [CC'ing Monty and Douglas.]
> >>>
> >>> Hello, the original thread can be read from the following URL.
> >>>
> >>> http://thread.gmane.org/gmane.linux.ide/13708/focus=13708
> >>>
> >>> Brice Goglin wrote:
> >>>> Jens Axboe wrote:
> >>>>> On Mon, Oct 30 2006, Gregor Jasny wrote:
> >>>>>
> >>>>>> 2006/10/30, Jens Axboe <[email protected]>:
> >>>>>>
> >>>>>>> Can you confirm that 2.6.18 works?
> >>>>>>>
> >>>>>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
> >>>>>> and 2.6.18, too.
> >>>>>>
> >>>>>> Gregor
> >>>>>>
> >>>>>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901
> >>>>>>
> >>>>> Ok, mainly just checking if this was a potential dupe of another bug.
> >>>>>
> >>>>>
> >>>> Jens (or anybody else who has any idea of how to debug this),
> >>>>
> >>>> Did you have a chance to reproduce the problem? I guess we "only" need a
> >>>> machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug
> >>>> some stuff, feel free to tell me what. But, since it freezes the machine
> >>>> and sysrq doesn't even work, I don't really know what to try...
> >>>>
> >>>> I just tried on rc5 and rc5-mm1, both have the problem (as 2.6.16, .17
> >>>> and .18 do, don't know about earlier kernels). I didn't have an audio CD
> >>>> here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0
> >>>> (from Debian testing), it reports nothing for about 5 seconds and
> >>>> then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable),
> >>>> it reports an error very quickly, and dmesg gets a couple of lines like
> >>>> these:
> >>>> sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing
> >>>> data in;
> >>>> program cdparanoia not setting count and/or reply_len properly
> >>> Okay, here's the story.
> >>>
> >>> In interface/scan_devices.c::cdda_identify_scsi(), cdparanoia calls
> >>> scsi_inquiry() to identify the device and determine interface type. This
> >>> seems to be the first time to actually issue commands to the device. As
> >>> interface type isn't completely determined, for sg devices, it first
> >>> issues the command w/ d->interface set to SGIO_SCSI. If that fails, it
> >>> falls back to SGIO_SCSI_BUGGY1.
> >>>
> >>> For to-device request, both SGIO_SCSI and SGIO_SCSI_BUGGY1 set
> >>> sg_io_hdr.dxfer_direction to SG_DXFER_TO_DEV. But for from-device
> >>> request, SGIO_SCSI uses SG_DXFER_TO_FROM_DEV while SGIO_SCSI_BUGGY1 uses
> >>> SG_DXFER_FROM_DEV. So, cdparanoia first issues inquiry w/
> >>> SG_DXFER_TO_FROM_DEV and if that fails falls back to SG_DXFER_FROM_DEV.
> >>>
> >>> drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while
> >>> block/scsi_ioctl.c interprets it as write. I guess this is historic
> >>> thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten). As
> >>> written above, cdparanoia can handle both cases as long as the kernel
> >>> promptly fails command issued with the wrong direction.
> >>>
> >>> This works for most PATA ATAPI devices. Most devices detect a reversed
> >>> transfer and terminate the command promptly. But this doesn't seem to
> >>> be true for SATA devices. Many just hang and time out commands with the
> >>> wrong transfer direction. If you consider that most early SATA ATAPI
> >>> devices are actually PATA + bridge, this is sorta inevitable. The
> >>> PATA-SATA bridge cannot issue D2H FIS to abort the command by itself.
> >>> It's just mirroring the status of PATA side and PATA side doesn't know
> >>> SATA protocol mismatch has occurred.
> >>>
> >>> So, INQUIRY w/ write-DMA protocol times out after quite some seconds.
> >>> This is where things go from bad to worse. SATA controllers which have
> >>> shadow TF registers don't handle timeout conditions very well,
> >>> especially when they're waiting for data transfer. They basically hold
> >>> the PCI bus and hang till the transfer completes (which never happens).
> >>> That's where the hard lock up comes from.
> >>>
> >>> Jens, I think we need to match block sg's behavior to SCSI's. Monty,
> >>> the timeout and hard lock up are due to hardware restrictions. Kernel
> >>> and libata can't do much about it. So, please find another way to detect
> >>> the interface.
> >> Tejun,
> >> Your SG_DXFER_TO_FROM_DEV analysis is correct.
> >>
> >> The stupid ~!@# who wrote the code, and the documentation
> >> for it, defined SG_DXFER_TO_FROM_DEV to mean a "transfer
> >> from device" operation where the kernel buffer receiving
> >> the DMA transfer was prefilled with data that the application
> >> provided. That certainly isn't a bidirectional transfer to/from
> >> the device, but it is a bidirectional transfer to kernel
> >> buffers when indirect IO is used.
> >>
> >> Why do this? Because the 'resid' field indicating how much
> >> less data was transferred in a "from_device" transfer than
> >> was requested, was not added to SCSI infrastructure till much
> >> later. There are still LLDs out there that don't implement it.
> >> It also reflected a similar technique used with the sg_header
> >> structure (circa 1992) for precisely the same reason. And
> >> application writers wanted that functionality. Joerg was the
> >> first name of one such application writer.
> >>
> >>
> >> Coincidentally I am sitting on a patch from Luben Tuikov
> >> to cause the same breakage in the sg driver itself.
> >
> > Here is a link to the recently posted 8-month-old patch:
> > http://marc.theaimsgroup.com/?l=linux-scsi&m=116267031029025&w=2
> >
> > The patch would appear to fix the problem Tejun is describing.
> >
> > I cannot quite remember exactly what I was doing that day 8 months
> > ago, but I was testing either disk or tape devices and arrived
> > at that patch.
> >
> > This patch has been in my dev (gateway) tree for the last 8
> > months, without any problems.
> >
> > Luben
> >
> >
> >> Nobody has proposed a patch to the documentation for
> >> the explanation of SG_DXFER_TO_FROM_DEV :-)
> >> http://www.torque.net/sg/p/sg_v3_ho.html
> ^^^^^^^^^^^^^
>
> Luben,
> The failure being reported is that the block layer
> SG_IO ioctl already does what you are proposing to
> do for the sg driver.
>
> Hence an application, cdparanoia in this case, since
> it is coded against documented behaviour, assumes that
> SG_DXFER_TO_FROM_DEV will read from the device.
> See the definition of SG_DXFER_TO_FROM_DEV in sg.h and
> the document above.
>
> So your proposed patch would compound the problem. The
> solution is to leave the sg driver unchanged and instead
> put the equivalent of the reverse of your patch in the
> block layer SG_IO ioctl.
>
> There is nothing to stop a new direction flag being
> added called SG_DXFER_BIDIRECTIONAL that maps to
> DMA_BIDIRECTIONAL.

Sounds good!

Luben

P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
it provides into "FROM_DEV" -- do apps really rely on it?

2006-11-11 10:46:57

by Christoph Hellwig

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On Fri, Nov 10, 2006 at 12:08:15PM -0800, Luben Tuikov wrote:
> P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
> of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
> it provides into "FROM_DEV" -- do apps really rely on it?

At the beginning of this thread it was mentioned cdparanoia uses it.
But in general we can't just rip out userland interfaces, we pretend
to have a stable userspace abi (and except for the big sysfs mess that
actually comes very close to the truth).

What we should do is to document very well what SG_DXFER_TO_FROM_DEV
is doing and that odd name that's been chosen for it. I'll prepare
a patch for that.

2006-11-11 16:39:54

by Douglas Gilbert

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Christoph Hellwig wrote:
> On Fri, Nov 10, 2006 at 12:08:15PM -0800, Luben Tuikov wrote:
>> P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
>> of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
>> it provides into "FROM_DEV" -- do apps really rely on it?
>
> At the beginning of this thread it was mentioned cdparanoia uses it.
> But in general we can't just rip out userland interfaces, we pretend
> to have a stable userspace abi (and except for the big sysfs mess that
> actually comes very close to the truth).
>
> What we should do is to document very well what SG_DXFER_TO_FROM_DEV
> is doing and that odd name that's been chosen for it. I'll prepare
> a patch for that.

Christoph,
It is documented and has been from day one. See scsi/sg.h
and http://sg.torque.net/sg/p/sg_v3_ho.html

Naming it is a challenge and at the time there
were no bidirectional transfers to/from a device
to worry about.

A more appropriate but impractical name might be:
SG_DXFER_TO_KERNEL_BUFFER_THEN_READ_FROM_DEV_VIA_KERNEL_BUFFER


Doug Gilbert

2006-11-11 19:09:37

by Luben Tuikov

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

--- Christoph Hellwig <[email protected]> wrote:
> On Fri, Nov 10, 2006 at 12:08:15PM -0800, Luben Tuikov wrote:
> > P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
> > of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
> > it provides into "FROM_DEV" -- do apps really rely on it?
>
> At the beginning of this thread it was mentioned cdparanoia uses it.
> But in general we can't just rip out userland interfaces, we pretend
> to have a stable userspace abi (and except for the big sysfs mess that
> actually comes very close to the truth).

All the more reason to think things through thoroughly when introducing
new code and architecture into a kernel.

Luben

> What we should do is to document very well what SG_DXFER_TO_FROM_DEV
> is doing and that odd name that's been chosen for it. I'll prepare
> a patch for that.


2006-11-14 12:24:42

by Brice Goglin

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

I just tried commit 616e8a091a035c0bd9b871695f4af191df123caa on top of
rc5 just in case. This commit fixes
http://lkml.org/lkml/2006/10/13/100, which looks related. And it
actually appears to fix our freeze too. Does this ring a bell for you guys?

Brice

2006-11-14 12:26:38

by Jens Axboe

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On Tue, Nov 14 2006, Brice Goglin wrote:
> I just tried commit 616e8a091a035c0bd9b871695f4af191df123caa on top of
> rc5 just in case. This commit fixes
> http://lkml.org/lkml/2006/10/13/100, which looks related. And it
> actually appears to fix our freeze too. Does this ring a bell for you guys?

I thought you had already tested that? Well that's good news, so it was
a similar bug after all. Another one closed for 2.6.19.

--
Jens Axboe

2006-11-14 12:40:19

by Brice Goglin

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

Jens Axboe wrote:
> On Tue, Nov 14 2006, Brice Goglin wrote:
>
>> I just tried commit 616e8a091a035c0bd9b871695f4af191df123caa on top of
>> rc5 just in case. This commit fixes
>> http://lkml.org/lkml/2006/10/13/100, which looks related. And it
>> actually appears to fix our freeze too. Does this ring a bell for you guys?
>>
>
> I thought you had already tested that?

IIRC, the one I tested was
http://marc.theaimsgroup.com/?l=linux-scsi&m=116267031029025&w=2. It
does something similar in sg.c instead of scsi_ioctl.c.

Thanks,
Brice

2006-11-14 12:47:38

by Jens Axboe

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On Tue, Nov 14 2006, Brice Goglin wrote:
> Jens Axboe wrote:
> > On Tue, Nov 14 2006, Brice Goglin wrote:
> >
> >> I just tried commit 616e8a091a035c0bd9b871695f4af191df123caa on top of
> >> rc5 just in case. This commit fixes
> >> http://lkml.org/lkml/2006/10/13/100, which looks related. And it
> >> actually appears to fix our freeze too. Does this ring a bell for you guys?
> >>
> >
> > I thought you had already tested that?
>
> IIRC, the one I tested was
> http://marc.theaimsgroup.com/?l=linux-scsi&m=116267031029025&w=2. It
> does something similar in sg.c instead of scsi_ioctl.c.

You most likely aren't using sg, but the block layer direct path. So no
wonder it didn't change anything.

--
Jens Axboe

2006-11-14 22:52:10

by Monty Montgomery

Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

On 11/11/06, Luben Tuikov <[email protected]> wrote:
> --- Christoph Hellwig <[email protected]> wrote:
> > On Fri, Nov 10, 2006 at 12:08:15PM -0800, Luben Tuikov wrote:
> > > P.S. I'd love to see SG_DXFER_TO_FROM_DEV completely ripped out
> > > of sg.c, for obvious reasons. Can you not duplicate the resid "fix"
> > > it provides into "FROM_DEV" -- do apps really rely on it?
> >
> > At the beginning of this thread it was mentioned cdparanoia uses it.
> > But in general we can't just rip out userland interfaces, we pretend
> > to have a stable userspace abi (and except for the big sysfs mess that
> > actually comes very close to the truth).
>
> All the more reason to think things through thoroughly when introducing
> new code and architecture into a kernel.

It was introduced for a good reason, and that reason is still relevant
today. Cdparanoia is not using it gratuitously. The only problem is
that the implementation had a bug (well, at least two bugs) and only
sg ever implemented it correctly. Had block and sata implemented it
correctly, we'd not be having this discussion.

Or you can blame a lower level layer for having no way to inform
mid-level drivers that DMA only completed a partial transfer.

"but anyway"...

Was this lockup happening using SATA through the block layer, or does
SATA implement its own version of the ioctl? Back when I was testing
my probing code, the buggy kernel would reject the request, not lock
up. Did a change make it into 2.6.18 or later that causes a lockup
instead?
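
The probing order described earlier in the thread (SGIO_SCSI first, then SGIO_SCSI_BUGGY1) can be sketched as follows. This is not cdparanoia's code: `issue_fn`, `probe_read_direction()` and the two stand-in callbacks are hypothetical, and the `SG_DXFER_*` values repeat the definitions in scsi/sg.h.

```c
#include <assert.h>
#include <stddef.h>

/* Values mirror the SG_DXFER_* definitions in scsi/sg.h. */
#ifndef SG_DXFER_FROM_DEV
#define SG_DXFER_FROM_DEV    (-3)
#define SG_DXFER_TO_FROM_DEV (-4)
#endif

/* Hypothetical callback: issues one from-device command (for example an
 * INQUIRY) with the given direction flag; returns 0 on success. */
typedef int (*issue_fn)(int dxfer_direction, void *ctx);

/* Try TO_FROM_DEV first, then fall back to FROM_DEV. Returns the flag that
 * worked, or 0 if both attempts failed (all SG_DXFER_* values are negative). */
int probe_read_direction(issue_fn issue, void *ctx)
{
    static const int candidates[] = { SG_DXFER_TO_FROM_DEV, SG_DXFER_FROM_DEV };
    size_t i;

    assert(issue != NULL);
    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (issue(candidates[i], ctx) == 0)
            return candidates[i];
    }
    return 0;
}

/* Stand-in for a kernel path that follows the documented semantics. */
int issue_documented(int dxfer_direction, void *ctx)
{
    (void)dxfer_direction;
    (void)ctx;
    return 0;
}

/* Stand-in for a kernel path that promptly rejects TO_FROM_DEV. */
int issue_rejects_to_from(int dxfer_direction, void *ctx)
{
    (void)ctx;
    return dxfer_direction == SG_DXFER_TO_FROM_DEV ? -1 : 0;
}
```

The fallback only helps when the first attempt fails promptly. If the first command hangs the controller, as reported for the SATA case in this thread, the second attempt is never reached.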

(I never tested with SATA cdroms, as I don't have any. I tested with
IDE and SCSI and saw correct or detectable behavior)

Monty