2006-05-04 12:37:22

by K. Ernel

Subject: cdrom: a dirty CD can freeze your system


good day,

kernel-version: 2.6.16.13 preemptible

I've been experimenting with damaged CDs today. I observed that
a dirty or (partly) unreadable CD will (1) block the process which is
trying to read from the CD - it will be in state "D" (uninterruptible
sleep) - and (2) sometimes(?) freeze your system such that even
a manual reboot won't work (e.g., because it's not possible to log in, or
keystrokes are no longer accepted) and a power-cycle is required.

the uninterruptible process will force a reboot - it won't go away.
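
Processes stuck in state "D" can be spotted by reading the state field of
/proc/<pid>/stat. What follows is only a minimal sketch in Python, assuming
the usual Linux /proc layout; the helper names parse_stat and
blocked_processes are made up for this illustration.

```python
import os

def parse_stat(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state

def blocked_processes(proc="/proc"):
    """List (pid, comm) of all processes in uninterruptible sleep ("D")."""
    found = []
    for name in os.listdir(proc):
        if not name.isdigit():
            continue  # not a process directory
        try:
            path = os.path.join(proc, name, "stat")
            with open(path, errors="replace") as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling blocked_processes() after a failed read should list the reader
(and, later in this thread, hdparm) for as long as it stays blocked.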

one can observe the freeze in that replies to ICMP echo requests come
back with a delay of several seconds (depending on how much buffering
is done).

the kernel log shows:
hdb: DMA timeout retry
hdb: timeout waiting for DMA
hdb: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdb: drive not ready for command
hdb: ATAPI reset complete
hdb: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdb: ATAPI reset complete
hdb: DMA timeout retry
hdb: timeout waiting for DMA
hdb: status timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdb: drive not ready for command
hdb: ATAPI reset complete

... and so on (status 0xd0, so the drive is BUSY | READY | SEEK)
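
The status values in these logs can be decoded bit by bit. A small sketch
in Python, assuming the standard ATA status register layout; the names for
bits that do show up in the logs of this thread are copied from them, the
remaining names (DeviceFault, CorrectedError, Index) and the helper
decode_status are assumptions made for illustration.

```python
# (mask, name) pairs for the ATA status register, highest bit first.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

def decode_status(value):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if value & mask]

if __name__ == "__main__":
    for status in (0xD0, 0x49, 0x51):
        print("0x%02x" % status, decode_status(status))
```

For 0xd0 this yields Busy, DriveReady and SeekComplete. The kernel log
prints only "Busy", presumably because the other bits are not meaningful
while the busy bit is set.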

even sending an "hdparm -w" to the drive won't work; on the contrary, it
makes things worse because it will eventually trigger a kernel panic.

just for the sake of completeness, data is read from the device via the
"SG_IO" ioctl and the "READ CD" command according to the MMC specs. the
program works well for undamaged CDs.

please tell me a way to safely
(1) reset the IDE interface, e.g. via IDE-TASKFILE (or, for testing,
a sequence of outb() to the chip)
(2) reset the CD-drive - sending a WIN_DEVICE_RESET (linux/hdreg.h line 196)
doesn't seem to be enough.

kind regards,
herbert rosmanith


2006-05-04 12:45:18

by Michael Tokarev

Subject: Re: cdrom: a dirty CD can freeze your system

Herbert Rosmanith wrote:
> good day,
>
> kernel-version: 2.6.16.13 preemptible
>
> I've been experimenting with damaged CDs today. I observed that
> a dirty or (partly) unreadable CD will (1) block the process which is
> trying to read from the CD - it will be in state "D" (uninterruptible
> sleep) - and (2) sometimes(?) freeze your system such that even
> a manual reboot won't work (e.g., because it's not possible to log in, or
> keystrokes are no longer accepted) and a power-cycle is required.
>
> the uninterruptible process will force a reboot - it won't go away.

It's worse than that. See http://marc.theaimsgroup.com/?t=114003595500002&r=1&w=2
and other similar reports. So far, no one cares it seems (for several years already).

/mjt

2006-05-04 13:01:05

by K. Ernel

Subject: Re: cdrom: a dirty CD can freeze your system


> It's worse than that. See http://marc.theaimsgroup.com/?t=114003595500002&r=1&w=2
> and other similar reports. So far, no one cares it seems (for several years already).

whoops ... fortunately, I don't have that kind of problem. my code just does:

loop {
ioctl( SG_IO - timeout=3 seconds);
write block to disk.
}
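
The loop above could look roughly like the following. This is only a
sketch in Python, not the program discussed in this thread: the structure
mirrors struct sg_io_hdr from <scsi/sg.h>, and the helper names
(read_cd_cdb, read_frames), the 2048-byte frame size and the "user data
only" flag byte 0x10 are assumptions chosen for illustration.

```python
import ctypes
import fcntl

SG_IO = 0x2285            # ioctl number from <scsi/sg.h>
SG_DXFER_FROM_DEV = -3    # data moves from the drive to the host
FRAME = 2048              # user data bytes per Mode 1 sector (assumed)

class SgIoHdr(ctypes.Structure):
    """Mirror of struct sg_io_hdr from <scsi/sg.h>."""
    _fields_ = [
        ("interface_id", ctypes.c_int),
        ("dxfer_direction", ctypes.c_int),
        ("cmd_len", ctypes.c_ubyte),
        ("mx_sb_len", ctypes.c_ubyte),
        ("iovec_count", ctypes.c_ushort),
        ("dxfer_len", ctypes.c_uint),
        ("dxferp", ctypes.c_void_p),
        ("cmdp", ctypes.c_void_p),
        ("sbp", ctypes.c_void_p),
        ("timeout", ctypes.c_uint),      # milliseconds
        ("flags", ctypes.c_uint),
        ("pack_id", ctypes.c_int),
        ("usr_ptr", ctypes.c_void_p),
        ("status", ctypes.c_ubyte),
        ("masked_status", ctypes.c_ubyte),
        ("msg_status", ctypes.c_ubyte),
        ("sb_len_wr", ctypes.c_ubyte),
        ("host_status", ctypes.c_ushort),
        ("driver_status", ctypes.c_ushort),
        ("resid", ctypes.c_int),
        ("duration", ctypes.c_uint),
        ("info", ctypes.c_uint),
    ]

def read_cd_cdb(lba, frames):
    """Build the 12-byte MMC READ CD (opcode 0xBE) command block."""
    cdb = bytearray(12)
    cdb[0] = 0xBE
    cdb[2:6] = lba.to_bytes(4, "big")      # starting logical block address
    cdb[6:9] = frames.to_bytes(3, "big")   # transfer length in frames
    cdb[9] = 0x10                          # return user data only
    return bytes(cdb)

def read_frames(fd, lba, frames, timeout_ms=3000):
    """Issue READ CD through SG_IO; raise OSError if the drive reports failure."""
    cdb = (ctypes.c_ubyte * 12).from_buffer_copy(read_cd_cdb(lba, frames))
    data = ctypes.create_string_buffer(frames * FRAME)
    sense = ctypes.create_string_buffer(32)
    hdr = SgIoHdr(
        interface_id=ord("S"), dxfer_direction=SG_DXFER_FROM_DEV,
        cmd_len=12, mx_sb_len=32, dxfer_len=frames * FRAME,
        dxferp=ctypes.addressof(data), cmdp=ctypes.addressof(cdb),
        sbp=ctypes.addressof(sense), timeout=timeout_ms)
    fcntl.ioctl(fd, SG_IO, hdr)
    if hdr.status or hdr.host_status or hdr.driver_status:
        raise OSError("READ CD failed at LBA %d" % lba)
    return data.raw[:frames * FRAME - hdr.resid]
```

A copy loop would open the device read-only, call read_frames() for
consecutive LBAs and write each returned block to the output file. The
3-second timeout only bounds the SG_IO request itself; as the logs in this
thread show, it does not keep the IDE layer from getting stuck afterwards.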

SG_IO behaves a bit more friendly than, say, "CDROMREAD{MODE1,MODE2,AUDIO}" does.
nevertheless, the IDE interface becomes unusable until you reboot the system.

e.g., just right now, I did:

o insert bad CD
o read it until an error occurs.
o "hdparm -w /dev/hdb" - this will turn DMA off. kernel log shows:
hdb: DMA disabled
hdb: ATAPI reset complete
o "hdparm -d 1 /dev/hdb" to reenable DMA, "hdparm /dev/hdb" to look at the
drive settings. the kernel log then shows:
hdb: irq timeout: status=0xd0 { Busy }
ide: failed opcode was: unknown
hdb: ATAPI reset complete
hdb: status error: status=0x49 { DriveReady DataRequest Error }
hdb: status error: error=0x04 { AbortedCommand }
ide: failed opcode was: unknown
hdb: drive not ready for command
hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
ide: failed opcode was: unknown
hdb: request sense failure: status=0x51 { DriveReady SeekComplete Error }
hdb: request sense failure: error=0x44 { AbortedCommand LastFailedSense=0x04 }

hdparm is now in state "D" -> reboot required.
not so good, da?

kind regards,
herbert rosmanith

2006-05-04 13:37:33

by Alan

Subject: Re: cdrom: a dirty CD can freeze your system

On Thu, 2006-05-04 at 14:32 +0200, Herbert Rosmanith wrote:
> I've been experimenting with damaged CDs today. I observed that
> a dirty or (partly) unreadable CD will (1) block the process which is
> trying to read from the CD - it will be in state "D" (uninterruptible
> sleep) - and (2) sometimes(?) freeze your system such that even
> a manual reboot won't work (e.g., because it's not possible to log in, or
> keystrokes are no longer accepted) and a power-cycle is required.

This is a known problem with the old IDE layer. There are several
problems involved:

1. The old IDE layer reset confuses some drives fatally
2. The DMA recovery tricks it does break the state machine of some
   controllers and hang them for good
3. The error recovery and timer code race and can hang
4. The speed change-down paths used on DMA failure race with everything

> please tell me a way to safely
> (1) reset the IDE interface, e.g. via IDE-TASKFILE (or, for testing,
> a sequence of outb() to the chip)
> (2) reset the CD-drive - sending a WIN_DEVICE_RESET (linux/hdreg.h line 196)
> doesn't seem to be enough.

Please try the libata PATA patches instead of the old IDE layer.

Alan

2006-05-04 14:14:15

by Christian Trefzer

Subject: Re: cdrom: a dirty CD can freeze your system

Hi Alan et al.,

On Thu, May 04, 2006 at 02:48:52PM +0100, Alan Cox wrote:
>
> Please try the libata PATA patches instead of the old IDE layer.
>

I'd love to, but currently I'm running git kernels on both of my
machines, and unfortunately 2.6.16-ide1 won't apply ; )

Since you've been busy I didn't want to bother you, but now that you
mention your PATA efforts again, is there a git tree to pull from, which
contains code similar to that in the latest patches?

I understand that your work is gradually flowing through Jeff, and over
to Linus from there, which contributes to, but is not the only reason
for, the huge number of rejects. I'd rather not waste my time messing with
unclean patching attempts, otherwise my studies _are_ going to kill me.

I have a remote entry for Jeff's pata-drivers branch, but that one won't
discover any of my ide controllers so far. Your patches have been
working very reliably though, so I am annoyed (to say the least) to have
the stuff about missing write barrier support back in my logs. Since I
need John Linville's tree for some WiFi hackery tryouts, I can't seem to
get around running git kernels these days, so I'm back to drivers/ide.
Sigh.

If you've got something for me I'd be happy to keep test-driving the
good stuff some more. It had been working very well for me until the
switch from tar/patch to git.

Keep up the good work : )
Thanks a bunch,
Chris



2006-05-04 14:44:54

by Joseph Cheek

Subject: Re: cdrom: a dirty CD can freeze your system

Michael Tokarev wrote:

> Herbert Rosmanith wrote:
> It's worse than that. See http://marc.theaimsgroup.com/?t=114003595500002&r=1&w=2
> and other similar reports. So far, no one cares it seems (for several years already).
>
> /mjt
>

I would love to see this fixed. I hit it often on DVDs.

Joseph

2006-05-04 15:17:14

by Alan

Subject: Re: cdrom: a dirty CD can freeze your system

On Thu, 2006-05-04 at 16:14 +0200, Christian Trefzer wrote:
> I'd love to, but currently I'm running git kernels on both of my
> machines, and unfortunately 2.6.16-ide1 won't apply ; )

Fair enough 8)

> Since you've been busy I didn't want to bother you, but now that you
> mention your PATA efforts again, is there a git tree to pull from, which
> contains code similar to that in the latest patches?

Not for the current code. The core stuff is mostly in the tree now and
I'll try and push a patch some time today or tomorrow that's against
2.6.17-rc and should match.

Alan

2006-05-04 16:44:19

by Wakko Warner

Subject: Re: cdrom: a dirty CD can freeze your system

Alan Cox wrote:
> On Thu, 2006-05-04 at 14:32 +0200, Herbert Rosmanith wrote:
> > I've been experimenting with damaged CDs today. I observed that
> > a dirty or (partly) unreadable CD will (1) block the process which is
> > trying to read from the CD - it will be in state "D" (uninterruptible
> > sleep) - and (2) sometimes(?) freeze your system such that even
> > a manual reboot won't work (e.g., because it's not possible to log in, or
> > keystrokes are no longer accepted) and a power-cycle is required.
>
> This is a known problem with the old IDE layer. There are several
> problems involved:
>
> 1. The old IDE layer reset confuses some drives fatally
> 2. The DMA recovery tricks it does break the state machine of some
>    controllers and hang them for good
> 3. The error recovery and timer code race and can hang
> 4. The speed change-down paths used on DMA failure race with everything
>
> > please tell me a way to safely
> > (1) reset the IDE interface, e.g. via IDE-TASKFILE (or, for testing,
> > a sequence of outb() to the chip)
> > (2) reset the CD-drive - sending a WIN_DEVICE_RESET (linux/hdreg.h line 196)
> > doesn't seem to be enough.
>
> Please try the libata PATA patches instead of the old IDE layer.

I have noticed a problem which I believe is in sr_mod. It doesn't matter
whether the physical connection is IDE, SCSI, USB, etc.

If I access a drive that is not ready (i.e., no disc, or in the process of
loading a disc), the drive will no longer function properly.

I'm not sure if I can explain it fully, and I'm not sure if it's already
been reported.

I place a CD on the tray and do a mount. Mount will fail (let's just assume
that under 2.4.x this worked). I eject the CD, reinsert it and wait for it
to become ready. Mount will still fail. Last I recall, the getblks ioctl
returns 2 or 4 in this case. The only way to fix it is to rmmod sr_mod and
reinsert sr_mod.

Another example would be that I insert a disc, say with 159,000 sectors, and
I'm able to read from it just fine. I make the above mistake but I insert a
disc with 200,000 sectors. The disc will be reported with 159,000 instead of
the correct 200,000 sectors and some files will not be readable. Again,
rmmod and modprobe sr_mod fixes the problem.
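
One way to check what size the kernel currently believes the disc has is
to query the block device directly. A minimal sketch in Python, under the
assumption that the "getblks" value mentioned above corresponds to the
BLKGETSIZE ioctl (device size in 512-byte units); the device path /dev/sr0
and the helper names are placeholders for illustration.

```python
import array
import fcntl
import os

BLKGETSIZE = 0x1260   # ioctl from <linux/fs.h>: size in 512-byte units

def cd_sectors(units_512):
    """Convert 512-byte units to 2048-byte CD sectors."""
    return units_512 // 4

def device_cd_sectors(path="/dev/sr0"):
    """Ask the kernel for the current size of the device, in CD sectors."""
    buf = array.array("L", [0])   # unsigned long, filled in by the ioctl
    # O_NONBLOCK lets the open succeed even if the drive is not ready.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        fcntl.ioctl(fd, BLKGETSIZE, buf, True)
    finally:
        os.close(fd)
    return cd_sectors(buf[0])
```

Running device_cd_sectors() before and after swapping discs would show
whether the reported size follows the media change or stays at the old
value.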

I've been able to reproduce this on every Linux system running 2.6 that I've
used with a CD-ROM drive.

--
Lab tests show that use of micro$oft causes cancer in lab animals
Got Gas???

2006-05-04 16:59:45

by Alan

Subject: Re: cdrom: a dirty CD can freeze your system

On Thu, 2006-05-04 at 12:50 -0400, Wakko Warner wrote:
> Another example would be that I insert a disc, say with 159,000 sectors, and
> I'm able to read from it just fine. I make the above mistake but I insert a
> disc with 200,000 sectors. The disc will be reported with 159,000 instead of
> the correct 200,000 sectors and some files will not be readable. Again,
> rmmod and modprobe sr_mod fixes the problem.


That one I have seen with some broken media monitoring software that
never closes the file handle. What occurs then is that, for some
reason, we don't always see a media change.

Is this SATA or SCSI proper?

2006-05-04 17:27:38

by Joshua Hudson

Subject: Re: cdrom: a dirty CD can freeze your system

I've seen this a few times. It never actually hung my system, only one
virtual console. I wonder if preemptable kernel had something to do
with that <g>

2006-05-04 17:54:49

by Christian Trefzer

Subject: Re: cdrom: a dirty CD can freeze your system

On Thu, May 04, 2006 at 04:28:42PM +0100, Alan Cox wrote:
>
> > Since you've been busy I didn't want to bother you, but now that you
> > mention your PATA efforts again, is there a git tree to pull from,
> > which contains code similar to that in the latest patches?
>
> Not for the current code. The core stuff is mostly in the tree now and
> I'll try and push a patch some time today or tomorrow that's against
> 2.6.17-rc and should match.
>

Sounds great! I'll build new kernels for all my boxes as soon as I can
get hold of said patch. At least it "felt" cleaner and I/O was a
little less of a handbrake using libata, so I'll go for it once again.

Just one more thing: I had to hack a little on Kconfig files to make the
"newer" Promise driver available - if my memory doesn't fail me I sent a
patch, more like an RFC. Are some drivers intentionally left out of
Kbuild? I could not trigger any problem so far, using ata_piix on this
laptop, and pata_via / pata_pdc2027x on my desktop.

The only strangeness I had was some windoze firmware upgrade tool for my
ATAPI CDRW drive running in wine, poking on every sg device in
existence, thus triggering a freeze as it messed with the disks in some
wicked way. But since this was never intended to work in the first
place, I was happy with it working after simply deleting all sg devs
corresponding with disks. And I guess it is worth mentioning that the
SCSI IOCTLs in question are only accepted by the SCSI stack when the
process is run as root, so it's not exactly something anybody could try
on a machine he cannot already kill. Attempts to run this as an
ordinary user would make the firmware tool get stuck with an all-empty
progress bar, and the wine processes were easily TERM-able.

If there's anything I might want to try out or you'd want to know, like
lspci output and such, please let me know. I'm not home right now, but
here goes for starters.


lspci excerpt:

00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01) (prog-if 80 [Master])


lspci -vvvxxxn excerpt:

00:07.1 0101: 8086:7111 (rev 01) (prog-if 80)
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Region 4: I/O ports at 0860 [size=16]
00: 86 80 11 71 05 00 80 02 01 80 01 01 00 20 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 61 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
40: 07 e3 07 e3 00 00 00 00 01 00 02 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 30 0f 00 00 00 00 00 00


This one has been working perfectly so far, on an ancient Dell Latitude
CPiA.
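
The hex dump above can be cross-checked against the decoded lspci lines.
A short sketch in Python, assuming the standard PCI type 0 configuration
header layout (vendor at 0x00, device at 0x02, class at 0x0a, BAR4 at
0x20); the helpers parse_dump, le and describe are names invented for
this example.

```python
def parse_dump(text):
    """Turn 'NN: xx xx ...' hex dump lines into a bytes object."""
    raw = bytearray()
    for line in text.strip().splitlines():
        _, _, body = line.partition(":")   # drop the offset column
        raw += bytes.fromhex(body)
    return bytes(raw)

def le(cfg, offset, size):
    """Read a little-endian integer from the config space image."""
    return int.from_bytes(cfg[offset:offset + size], "little")

def describe(cfg):
    """Decode a few fields of a PCI type 0 configuration header."""
    bar4 = le(cfg, 0x20, 4)
    return {
        "vendor": le(cfg, 0x00, 2),
        "device": le(cfg, 0x02, 2),
        "class": le(cfg, 0x0A, 2),
        "prog_if": cfg[0x09],
        "latency": cfg[0x0D],
        "bar4_is_io": bool(bar4 & 1),   # bit 0 set marks an I/O BAR
        "bar4_base": bar4 & ~0x3,       # low two bits are flags
    }
```

Fed with the first three lines of the dump, describe() gives vendor
0x8086, device 0x7111, class 0x0101 and an I/O BAR at 0x0860, which
agrees with the "8086:7111" and "Region 4: I/O ports at 0860" lines.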


Kind regards,
Chris



2006-05-04 20:40:34

by Wakko Warner

Subject: Re: cdrom: a dirty CD can freeze your system

Joshua Hudson wrote:
> I've seen this a few times. It never actually hung my system, only one
> virtual console. I wonder if preemptable kernel had something to do
> with that <g>

I don't believe preempt has anything to do with it. I have a specialized
boot system (various types of boot media) w/o preempt turned on because I
want this as small as possible. It also has this problem.

--
Lab tests show that use of micro$oft causes cancer in lab animals
Got Gas???

2006-05-04 20:56:15

by L A Walsh

Subject: kernel keeps empty CDROM(DVD)-drive "busy"; (was Re: cdrom: a dirty CD can freeze your system)


Alan Cox wrote:
> This is a known problem with the old IDE layer. There are several
> problems involved
>
---
Maybe I'm running into this same problem. Reading the archived
thread about the linux kernel burning out drives toggled a slight
"worry" bit in my head. Perhaps this is "nothing to worry about"
(ignore the blinking warning light behind the curtain...) and is
another "artifact" of the "ancient" IDE driver code.

I have a Plextor IDE, internal CD/DVD writer. There is no media in
the drive. I _used_ to keep a blank CDROM (ready to burn) in the
drive if I wasn't "around", to keep dust from settling on the tray and
to have a CD ready-to-burn if I was logged in from the other room. But
I kept getting read errors, on boot, so I tried not loading media, which
is where I'm at now.

After boot, the "active" light on the drive turns on for about 3-5
seconds, then blinks off for <1 second, then 3-5 seconds on
again...and repeat, as though it is trying to read media, failing,
then trying again. It repeats this for as long as the system is up.

I've set drive read-ahead to 0, write-cache to off (not that those
settings "should" make a difference with no media in the drive). I
also tried telling the drive to "sleep" (via hdparm), to no
avail.

It "ignores" (gives another error, actually) an attempt to "eject"
from the command line.

It "ignores" pushing the device's door open button (unless I do it
after a power-cycle reset to the system).

It seems to work (at least last time I tried it) for reading CD's
and DVD's as well as burning CD's (haven't tried to burn any DVD's
with it). However, having the device constantly "selected" and
_appearing_ to retry is a bit bothersome given the experience of
another poster in the archived thread that was mentioned -- i.e. --
their drive seemed to burn itself out. I'm not having the exact
same symptoms, as I'm not trying to directly access the drive, nor
am I experiencing any kernel hangs (sidenote: not running the
preempt kernel, but am running the "voluntary preempt" kernel).

Seeing the "access/select" light on most of the time, though, makes
me wonder if something may be getting worn. Unfortunately, I
can't hear if the drive is actually running due to the whine of
multiple hard disks.

In regards to error messages, after every boot, the kernel
issues some errors (there is no CD in the drive) regarding
drive errors on the cdrom:

hdc: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 8192kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
hdc: packet command error: status=0x51 { DriveReady SeekComplete Error }
hdc: packet command error: error=0x44 { AbortedCommand LastFailedSense=0x04 }
ide: failed opcode was: unknown
ATAPI device hdc:
Error: Hardware error -- (Sense key=0x04)
Tracking servo failure -- (asc=0x09, ascq=0x01)
The failed "Read Cd/Dvd Capacity" packet command was:
"25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
ACPI: PCI Interrupt 0000:03:0a.0[A] -> GSI 18 (level, low) -> IRQ 17

---
The above comes out when the IDE devices are probed during boot
(followed by a SCSI sda disk probe).
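
The error=0x44 value in the log above combines two pieces of information.
A small sketch in Python, assuming the usual ATAPI error register layout
(sense key in the upper four bits, "aborted command" flag in bit 2) and
the standard SCSI sense key names; decode_atapi_error is a helper name
chosen for illustration.

```python
# Standard SCSI sense key names (subset).
SENSE_KEYS = {
    0x0: "No sense",
    0x1: "Recovered error",
    0x2: "Not ready",
    0x3: "Medium error",
    0x4: "Hardware error",
    0x5: "Illegal request",
    0x6: "Unit attention",
    0x7: "Data protect",
    0x8: "Blank check",
    0xB: "Aborted command",
}

def decode_atapi_error(value):
    """Split an ATAPI error register into (sense key name, aborted flag)."""
    key = value >> 4                  # upper nibble: last failed sense key
    aborted = bool(value & 0x04)      # bit 2: command was aborted
    return SENSE_KEYS.get(key, "key 0x%x" % key), aborted

if __name__ == "__main__":
    print(decode_atapi_error(0x44))
```

For 0x44 this gives ("Hardware error", True), matching the
"AbortedCommand LastFailedSense=0x04" and "Hardware error -- (Sense
key=0x04)" lines printed by the kernel.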

At file-system mount time, I see another few errors (even though
the cdrom-related mount lines in fstab are commented out, specifically to
try to silence these error messages):
...
XFS mounting filesystem hdg1
Ending clean XFS mount for filesystem: hdg1
Adding 265064k swap on /dev/sda2. Priority:-1 extents:1 across:265064k
hdc: packet command error: status=0x51 { DriveReady SeekComplete Error }
hdc: packet command error: error=0x44 { AbortedCommand LastFailedSense=0x04 }
ide: failed opcode was: unknown
ATAPI device hdc:
Error: Hardware error -- (Sense key=0x04)
Tracking servo failure -- (asc=0x09, ascq=0x01)
The failed "Read Cd/Dvd Capacity" packet command was:
"25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
hdc: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hdc: drive_cmd: error=0x04 { AbortedCommand }
ide: failed opcode was: 0xec
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
hdc: packet command error: status=0x51 { DriveReady SeekComplete Error }
hdc: packet command error: error=0x44 { AbortedCommand LastFailedSense=0x04 }
ide: failed opcode was: unknown
ATAPI device hdc:
Error: Hardware error -- (Sense key=0x04)
Tracking servo failure -- (asc=0x09, ascq=0x01)
The failed "Read Cd/Dvd Capacity" packet command was:
"25 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
---
Hdparm (FWIW) shows:
hdparm -vi /dev/hdc

/dev/hdc:
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 1 (on)
readahead = 0 (off)
HDIO_GETGEO failed: Inappropriate ioctl for device

Model=PLEXTOR DVDR PX-716A, FwRev=1.04, SerialNo=496556
Config={ Fixed Removeable DTR<=5Mbs DTR>10Mbs nonMagnetic }
RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
BuffType=unknown, BuffSize=0kB, MaxMultSect=0
(maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2
UDMA modes: udma0 udma1 *udma2 udma3 udma4
AdvancedPM=no
Drive conforms to: device does not report version:

* signifies the current active mode

2006-05-05 00:10:47

by Joshua Hudson

Subject: Re: cdrom: a dirty CD can freeze your system

On 5/4/06, Wakko Warner wrote:
> Joshua Hudson wrote:
> > I've seen this a few times. It never actually hung my system, only one
> > virtual console. I wonder if preemptable kernel had something to do
> > with that <g>
>
> I don't believe preempt has anything to do with it. I have a specialized
> boot system (various types of boot media) w/o preempt turned on because I
> want this as small as possible. It also has this problem.

Uuhhh. I thought preempt might be the reason the whole system *wasn't* hanging.

2006-05-05 00:13:35

by Wakko Warner

Subject: Re: cdrom: a dirty CD can freeze your system

Joshua Hudson wrote:
> On 5/4/06, Wakko Warner wrote:
> >Joshua Hudson wrote:
> >> I've seen this a few times. It never actually hung my system, only one
> >> virtual console. I wonder if preemptable kernel had something to do
> >> with that <g>
> >
> >I don't believe preempt has anything to do with it. I have a specialized
> >boot system (various types of boot media) w/o preempt turned on because I
> >want this as small as possible. It also has this problem.
>
> Uuhhh. I thought preempt might be the reason the whole system *wasn't* hanging.

One of those "didn't read the whole message" errors. Oops. All of my
systems (not the specialized one) are preemptible. I have not noticed any
lockups on those.

--
Lab tests show that use of micro$oft causes cancer in lab animals
Got Gas???