Problem:
I use SuSE Linux 7.2, and when I create md5sums from damaged files on a
CD, the WHOLE system freezes or is ugly slow until md5 has passed the
damaged part of the file!
My CD-ROM drive is an HP8100i IDE CD writer attached as hdd on the onboard
IDE controller.
test case:
md5 "/cdrom/damaged file"
cat /proc/version output:
Linux version 2.4.4-4GB ([email protected]) (gcc version 2.95.3
20010315 (SuSE)) #1 Wed May 16 00:37:55 GMT 2001
ver_linux output:
Linux filez 2.4.4-4GB #1 Wed May 16 00:37:55 GMT 2001 i586 unknown
Gnu C 2.95.3
Gnu make 3.79.1
binutils 2.10.91.0.4
util-linux 2.11b
mount 2.11b
modutils 2.4.5
e2fsprogs 1.19
reiserfsprogs 3.x.0j
PPP 2.4.0
isdn4k-utils 3.1pre1a
Linux C Library x 1 root root 1343073 Mai 11 2001
/lib/libc.so.6
Dynamic linker (ldd) 2.2.2
Procps 2.0.7
Net-tools 1.60
Kbd 1.04
Sh-utils 2.0
Modules Loaded ppp_async ppp_generic nls_iso8859-1 snd-pcm-oss
snd-pcm-plugin snd-mixer-oss snd-synth-emu8000 snd-synth-emux
snd-seq-midi-emul snd-seq-virmidi snd-emux-mem snd-seq-midi
snd-seq-midi-event snd-seq snd-card-sbawe isa-pnp snd-sb16-csp
snd-sb16-dsp snd-pcm snd-mixer snd-opl3 snd-hwdep snd-timer
snd-mpu401-uart snd-rawmidi snd-seq-device snd soundcore af_packet nfsd
lp parport usbcore ne2k-pci 8390 hisax isdn loop_fish2 ide-scsi rtl8139
reiserfs
cat /proc/cpuinfo output:
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 7
model name : AMD-K6tm w/ multimedia extensions
stepping : 0
cpu MHz : 267.281
cache size : 64 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr mce cx8 mmx
bogomips : 532.48
cat /proc/modules output:
ppp_async 6480 0 (autoclean) (unused)
ppp_generic 14416 0 (autoclean) [ppp_async]
nls_iso8859-1 2880 1 (autoclean)
snd-pcm-oss 18816 1 (autoclean)
snd-pcm-plugin 15024 0 (autoclean) [snd-pcm-oss]
snd-mixer-oss 5120 0 (autoclean) [snd-pcm-oss]
snd-synth-emu8000 16176 0 (unused)
snd-synth-emux 26592 0 [snd-synth-emu8000]
snd-seq-midi-emul 4480 0 [snd-synth-emux]
snd-seq-virmidi 8496 0 [snd-synth-emux]
snd-emux-mem 1616 0 [snd-synth-emu8000 snd-synth-emux]
snd-seq-midi 3568 0 (unused)
snd-seq-midi-event 2992 0 [snd-seq-virmidi snd-seq-midi]
snd-seq 42656 0 [snd-synth-emux snd-seq-virmidi
snd-seq-midi snd-seq-midi-event]
snd-card-sbawe 5536 1
isa-pnp 28176 0 [snd-card-sbawe]
snd-sb16-csp 15888 0 [snd-card-sbawe]
snd-sb16-dsp 15888 0 [snd-card-sbawe snd-sb16-csp]
snd-pcm 30560 0 [snd-pcm-oss snd-pcm-plugin snd-sb16-dsp]
snd-mixer 24224 0 [snd-mixer-oss snd-synth-emu8000
snd-sb16-csp snd-sb16-dsp]
snd-opl3 4848 0 [snd-card-sbawe]
snd-hwdep 3376 0 [snd-sb16-csp snd-opl3]
snd-timer 8560 0 [snd-seq snd-pcm snd-opl3]
snd-mpu401-uart 2512 0 [snd-card-sbawe]
snd-rawmidi 9664 0 [snd-seq-midi snd-mpu401-uart]
snd-seq-device 4032 0 [snd-synth-emu8000 snd-synth-emux
snd-seq-midi snd-seq snd-card-sbawe snd-rawmidi]
snd 34032 1 [snd-pcm-oss snd-pcm-plugin
snd-mixer-oss snd-synth-emu8000 snd-synth-emux snd-seq-virmidi
snd-emux-mem snd-seq-midi snd-seq-midi-event snd-seq snd-card-sbawe
snd-sb16-csp snd-sb16-dsp snd-pcm snd-mixer snd-opl3
snd-hwdep snd-timer snd-mpu401-uart snd-rawmidi snd-seq-device]
soundcore 3632 7 [snd]
af_packet 11648 2 (autoclean)
nfsd 67280 4 (autoclean)
lp 5392 0 (autoclean)
parport 24352 0 (autoclean) [lp]
usbcore 47120 0 (autoclean)
ne2k-pci 4640 1 (autoclean)
8390 6240 0 (autoclean) [ne2k-pci]
hisax 496192 1
isdn 123056 2 [hisax]
loop_fish2 9280 0 (unused)
ide-scsi 7856 1
rtl8139 11520 1
reiserfs 156432 3
cat /proc/ioports output:
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0213-0213 : isapnp read
0220-022f : Sound Blaster AWE32/64
0330-0331 : Sound Blaster AWE32/64 - MPU-401
0376-0376 : ide1
0388-038b : Sound Blaster AWE32/64 - FM
03c0-03df : vga+
03f6-03f6 : ide0
0620-0623 : Sound Blaster AWE32/64 - WaveTable
0a20-0a23 : Sound Blaster AWE32/64 - WaveTable
0a79-0a79 : isapnp write
0cf8-0cff : PCI conf1
0e20-0e23 : Sound Blaster AWE32/64 - WaveTable
4000-40ff : PCI device 1106:3040
c000-cfff : PCI Bus #01
d000-d00f : PCI device 1106:0571
d000-d007 : ide0
d008-d00f : ide1
d800-d8ff : PCI device 10ec:8139
d800-d8ff : 8139too
dc00-dc1f : PCI device 1244:0a00
dc00-dc1f : avm PCI
e000-e01f : PCI device 10ec:8029
e000-e01f : ne2k-pci
cat /proc/iomem output:
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-0fffffff : System RAM
00100000-002327d1 : Kernel code
002327d2-0031bdcb : Kernel data
e0000000-e3ffffff : PCI device 1106:0598
e4000000-e40000ff : PCI device 10ec:8139
e4000000-e40000ff : 8139too
e4001000-e400101f : PCI device 1244:0a00
ffff0000-ffffffff : reserved
lspci -vvv output:
00:00.0 Host bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3] (rev 04)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR+
Latency: 0
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 1.0
Status: RQ=7 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo
MVP3/Pro133x AGP] (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000c000-0000cfff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: fff00000-000fffff
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C586/A/B PCI-to-ISA
[Apollo VP] (rev 47)
Subsystem: VIA Technologies, Inc. MVP3 ISA Bridge
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop-
ParErr- Stepping+ SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 0
00:07.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06)
(prog-if 8a [Master SecP PriP])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Region 4: I/O ports at d000 [size=16]
00:07.3 Host bridge: VIA Technologies, Inc. VT82C586B ACPI (rev 10)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
00:12.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139
(rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR+
Latency: 64 (8000ns min, 16000ns max)
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at d800 [size=256]
Region 1: Memory at e4000000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
00:13.0 Network controller: AVM Audiovisuelles MKTG & Computer System
GmbH A1 ISDN [Fritz] (rev 02)
Subsystem: AVM Audiovisuelles MKTG & Computer System GmbH
FRITZ!Card ISDN Controller
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 5
Region 0: Memory at e4001000 (32-bit, non-prefetchable) [size=32]
Region 1: I/O ports at dc00 [size=32]
00:14.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8029(AS)
Subsystem: Realtek Semiconductor Co., Ltd. RT8029(AS)
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 7
Region 0: I/O ports at e000 [size=32]
cat /proc/scsi/scsi output:
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: HP Model: CD-Writer+ 8100 Rev: 1.0g
Type: CD-ROM ANSI SCSI revision: 02
/var/log/messages
Apr 18 15:06:18 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x03 00 00 00 40 00
Apr 18 15:06:20 filez kernel: hdd: irq timeout: status=0xd0 { Busy }
Apr 18 15:06:20 filez kernel: hdd: ATAPI reset complete
Apr 18 15:06:20 filez kernel: hdd: irq timeout: status=0x80 { Busy }
Apr 18 15:06:20 filez kernel: hdd: ATAPI reset complete
Apr 18 15:06:21 filez kernel: hdd: irq timeout: status=0x80 { Busy }
Apr 18 15:06:21 filez kernel: hdd: status error: status=0x08 { DataRequest }
Apr 18 15:06:21 filez kernel: hdd: drive not ready for command
Apr 18 15:06:21 filez kernel: SCSI cdrom error : host 0 channel 0 id 0
lun 0 return code = 27000002
Apr 18 15:06:21 filez kernel: I/O error: dev 0b:00, sector 5780
Apr 18 15:06:51 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:51 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:51 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:51 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:51 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:51 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:52 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:52 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:52 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:52 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:52 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:52 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:53 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:53 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:53 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:53 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:53 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:53 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:54 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:54 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:54 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:54 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:54 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:54 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:55 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:55 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:55 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:55 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:55 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:55 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:56 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:56 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:56 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:56 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:56 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:56 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:57 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x28 00 00 00 05 24 00 00 39 00
Apr 18 15:06:57 filez kernel: SCSI host 0 abort (pid 0) timed out -
resetting
Apr 18 15:06:57 filez kernel: SCSI bus is being reset for host 0 channel 0.
Apr 18 15:06:59 filez kernel: scsi : aborting command due to timeout :
pid 0, scsi0, channel 0, id 0, lun 0 0x03 00 00 00 40 00
Apr 18 15:06:59 filez kernel: hdd: irq timeout: status=0xd0 { Busy }
Apr 18 15:06:59 filez kernel: hdd: ATAPI reset complete
Apr 18 15:06:59 filez kernel: hdd: irq timeout: status=0x80 { Busy }
Apr 18 15:06:59 filez kernel: hdd: ATAPI reset complete
Have I forgotten something? If so, just mail me.
Frederik Reiss
On Thu, 18 Apr 2002, Dr. Death wrote:
> Problem:
>
> I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> damaged part of the file!
>
So what do you suggest? You can see from the logs that the device
is having difficulty reading your damaged CD. You can do what
Windows-95 does (ignore the errors and pretend everything is fine),
or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
or you can try like hell to read the files like Linux does. What do you
suggest?
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
On Fri, Apr 19, 2002 at 10:14:41AM -0400, Richard B. Johnson wrote:
> On Thu, 18 Apr 2002, Dr. Death wrote:
>
> > Problem:
> >
> > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > damaged part of the file!
> >
>
> So what do you suggest? You can see from the logs that the device
> is having difficulty reading your damaged CD. You can do what
> Windows-95 does (ignore the errors and pretend everything is fine),
> or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
> or you can try like hell to read the files like Linux does. What do you
> suggest?
You didn't ask me, but I would still suggest that it would be nice if
the whole system didn't come to a near halt.
-kb, the Kent who wonders if the preemption patch might help here.
I agree. I frequently find myself getting files off damaged CDs that
others running Windows cannot access. It takes a while, but I can
usually get everything but the files on the damaged parts, and even then
I can usually get parts of them.
Darrell
On Fri, 2002-04-19 at 10:14, Richard B. Johnson wrote:
> On Thu, 18 Apr 2002, Dr. Death wrote:
>
> > Problem:
> >
> > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > damaged part of the file!
> >
>
> So what do you suggest? You can see from the logs that the device
> is having difficulty reading your damaged CD. You can do what
> Windows-95 does (ignore the errors and pretend everything is fine),
> or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
> or you can try like hell to read the files like Linux does. What do you
> suggest?
>
> Cheers,
> Dick Johnson
>
> Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
>
> Windows-2000/Professional isn't.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Friday 19 April 2002 3:14 pm, Richard B. Johnson wrote:
> So what do you suggest? You can see from the logs that the device
> is having difficulty reading your damaged CD. You can do what
> Windows-95 does (ignore the errors and pretend everything is fine),
> or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
> or you can try like hell to read the files like Linux does. What do you
> suggest?
>
Don't put them in an Xbox in the first place? (see
http://www.newscientist.com/news/news.jsp?id=ns99992000)
cheers
john
On Fri, 19 Apr 2002, Kent Borg wrote:
> On Fri, Apr 19, 2002 at 10:14:41AM -0400, Richard B. Johnson wrote:
> > On Thu, 18 Apr 2002, Dr. Death wrote:
> >
> > > Problem:
> > >
> > > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > > damaged part of the file!
> > >
> >
> > So what do you suggest? You can see from the logs that the device
> > is having difficulty reading your damaged CD. You can do what
> > Windows-95 does (ignore the errors and pretend everything is fine),
> > or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
> > or you can try like hell to read the files like Linux does. What do you
> > suggest?
>
> You didn't ask me, but I would still suggest that it would be nice if
> the whole system didn't come to a near halt.
>
Some time-outs are enforced by hardware, some errors even result in
a bus reset in which all the devices reload their firmware, etc.
Therefore time-outs are made quite long, so one has to wait a
relatively long time before either retrying or giving up on an error. It
is possible to dynamically determine the kind of time-out necessary
for a particular error in a particular device. This could be
determined, for instance, when a device is mounted or otherwise
first accessed. However, then there would be complaints about
the amount of time necessary to mount a device, etc.
So, basically, the compromise seems to be that bad devices or
bad media result in long-time retries.
Note, if you have another task, that is not trying to use the
errored device or its bus, it does not get starved for CPU time.
The kernel still sleeps while waiting for pass/fail interrupts,
therefore giving the CPU to somebody. Of course this doesn't
help the task that's accessing the device, and to that task the
system seems to, as you say, come to a near halt.
FYI "Dr. Death" is phony and email to that node gets bounced.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
At 10:22 AM 4/19/02 -0400, Kent Borg wrote:
> > > Problem:
> > >
> > > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > > damaged part of the file!
> > >
> >
> > So what do you suggest? You can see from the logs that the device
> > is having difficulty reading your damaged CD. You can do what
> > Windows-95 does (ignore the errors and pretend everything is fine),
> > or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
> > or you can try like hell to read the files like Linux does. What do you
> > suggest?
>
>You didn't ask me, but I would still suggest that it would be nice if
>the whole system didn't come to a near halt.
If that is a real concern, then consider moving from a 56x CD-ROM reader to
something considerably slower, like a 4x or 8x. (Or try modifying the
driver to request a slower speed.) That will reduce the flood of I/O
messages and actions performed by the driver to recover from
badly-scratched media.
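As a possible alternative to modifying the driver, Linux exposes a
CDROM_SELECT_SPEED ioctl in <linux/cdrom.h> that asks the drive to cap its
read speed from user space. The sketch below is only an illustration under
that assumption: the function name is made up here, and whether a given
drive (or the ide-scsi emulation layer) honours the request is not something
this thread confirms.

```c
#include <fcntl.h>
#include <linux/cdrom.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the drive at `device` to cap its read speed at `speed` (in multiples
 * of 1x). Returns 0 on success, -1 if the device cannot be opened or the
 * drive rejects the request. The function name is illustrative and does
 * not come from the thread.
 */
int set_cdrom_speed(const char *device, int speed)
{
    /* O_NONBLOCK lets the open succeed even when no readable disc is loaded. */
    int fd = open(device, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    int rc = ioctl(fd, CDROM_SELECT_SPEED, speed);
    close(fd);
    return rc < 0 ? -1 : 0;
}
```

A call such as set_cdrom_speed("/dev/cdrom", 4) would request 4x; the device
path is an assumption and depends on how the drive is attached.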
Another option is to invest in a good scratch-repair kit -- many scratches
can be filled with appropriate material that reduces the optical distortion
that causes the flood of activity in the first place.
Do you have a CD burner? Then extract the data and burn a new CD.
Finally, try investing in a CD-ROM player that has in-drive smarts to
recover from scratched media.
The choice is yours.
On Fri, 19 Apr 2002, Richard B. Johnson wrote:
> On Thu, 18 Apr 2002, Dr. Death wrote:
>
> > Problem:
> >
> > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > damaged part of the file!
> >
>
> So what do you suggest? You can see from the logs that the device
> is having difficulty reading your damaged CD. You can do what
> Windows-95 does (ignore the errors and pretend everything is fine),
> or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
> or you can try like hell to read the files like Linux does. What do you
> suggest?
The problem is not that reading the disk is slow, it's that it brings the
system to its knees. There are many valid scenarios where non-root users
should be able to put CDs in a machine and they shouldn't be able to DoS
it by doing so.
Fact is the SCSI layer's error handling has been on the list of things in
dire need of replacement for years and this is one of the many symptoms.
--
"Love the dolphins," she advised him. "Write by W.A.S.T.E.."
On Thu Apr 18, 2002 at 03:13:35PM +0200, Dr. Death wrote:
> Problem:
>
> I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> damaged part of the file!
This should help somewhat. Currently, ide-cd.c retries ERROR_MAX
(8) times when it sees an error. But ide.c is also retrying
ERROR_MAX times when _it_ sees an error, and does a bus reset
after every 4 failures. So for each bad sector, you get 64
retries (with typical timeouts of 7 seconds each) plus 16 bus
resets.
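To put rough numbers on the description above, here is a small C sketch of
the arithmetic. The constants are the ones quoted in this message; the
function names are illustrative, and the estimate ignores the time spent in
the bus resets themselves and any retries the drive does internally.

```c
/* Values taken from the description above; the names are illustrative. */
enum {
    ERROR_MAX = 8,           /* retries per layer (ide-cd.c and ide.c) */
    FAILURES_PER_RESET = 4,  /* ide.c resets the bus after every 4 failures */
    TIMEOUT_SECONDS = 7      /* typical timeout per attempt */
};

/* Attempts made for a single bad sector when both layers retry. */
int retries_per_bad_sector(void)
{
    return ERROR_MAX * ERROR_MAX;
}

/* Bus resets triggered by those attempts. */
int resets_per_bad_sector(void)
{
    return retries_per_bad_sector() / FAILURES_PER_RESET;
}

/* Time spent waiting on timeouts alone, in seconds. */
int worst_case_seconds(void)
{
    return retries_per_bad_sector() * TIMEOUT_SECONDS;
}
```

With these figures a single bad sector costs 64 attempts, 16 resets and
about 448 seconds (roughly seven and a half minutes) of timeouts.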
The funny thing is, though, that we knew after the first read that we
had an uncorrectable medium error. Try this patch vs 2.4.19-pre7:
--- linux/drivers/ide/ide-cd.c.orig Tue Apr 9 06:59:56 2002
+++ linux/drivers/ide/ide-cd.c Tue Apr 9 07:04:59 2002
@@ -657,6 +657,11 @@
                    request or data protect error.*/
                 ide_dump_status (drive, "command error", stat);
                 cdrom_end_request (0, drive);
+        } else if (sense_key == MEDIUM_ERROR) {
+                /* No point in re-trying a zillion times on a bad
+                 * sector... If we got here the error is not correctable */
+                ide_dump_status (drive, "media error (bad sector)", stat);
+                cdrom_end_request (0, drive);
         } else if ((err & ~ABRT_ERR) != 0) {
                 /* Go to the default handler
                    for other errors. */
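
The decision the patch adds can be modelled outside the kernel as a small
classification function. This is a hypothetical user-space sketch, not the
driver code: the enum and function names are invented for illustration, the
sense key values are the standard SCSI/ATAPI ones, and every case not shown
in the patch context is lumped into "retry" as a simplification. The
grouping of illegal-request and data-protect errors is inferred from the
comment fragment visible in the patch context.

```c
/* Standard SCSI/ATAPI sense key values. */
enum sense_key {
    SK_NO_SENSE        = 0x00,
    SK_RECOVERED_ERROR = 0x01,
    SK_NOT_READY       = 0x02,
    SK_MEDIUM_ERROR    = 0x03,
    SK_HARDWARE_ERROR  = 0x04,
    SK_ILLEGAL_REQUEST = 0x05,
    SK_UNIT_ATTENTION  = 0x06,
    SK_DATA_PROTECT    = 0x07
};

enum action { ACTION_RETRY, ACTION_FAIL_REQUEST };

/*
 * Hypothetical model of the retry decision: errors that cannot improve on
 * a second attempt end the request at once; everything else is retried.
 */
enum action classify_read_error(enum sense_key key)
{
    switch (key) {
    case SK_ILLEGAL_REQUEST:
    case SK_DATA_PROTECT:
    case SK_MEDIUM_ERROR:   /* the case the patch adds */
        return ACTION_FAIL_REQUEST;
    default:
        return ACTION_RETRY;
    }
}
```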
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
On 19 April 2002 18:01, Erik Andersen wrote:
> On Thu Apr 18, 2002 at 03:13:35PM +0200, Dr. Death wrote:
> > Problem:
> >
> > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > damaged part of the file!
>
> This should help somewhat. Currently, ide-cd.c retries ERROR_MAX
> (8) times when it sees an error. But ide.c is also retrying
> ERROR_MAX times when _it_ sees an error, and does a bus reset
> after every 4 failures. So for each bad sector, you get 64
> retries (with typical timeouts of 7 seconds each) plus 16 bus
> resets.
And nobody knows how many retries happen in hardware...
so we get 8x8x?? retries, and *this* is slow.
--
vda
On Fri, 19 Apr 2002 14:01:13 -0600
"Erik Andersen" <[email protected]> wrote:
> This should help somewhat. Currently, ide-cd.c retries ERROR_MAX
> (8) times when it sees an error. But ide.c is also retrying
> ERROR_MAX times when _it_ sees an error, and does a bus reset
> after every 4 failures. So for each bad sector, you get 64
> retries (with typical timeouts of 7 seconds each) plus 16 bus
> resets.
Thanks for the investigation. BTW, does this cover the ide-scsi case, too?
> The funny thing is though, we knew after the first read that we
> had an uncorrectable medium error. Try this patch vs 2.4.19-pre7
>
> --- linux/drivers/ide/ide-cd.c.orig Tue Apr 9 06:59:56 2002
> +++ linux/drivers/ide/ide-cd.c Tue Apr 9 07:04:59 2002
> @@ -657,6 +657,11 @@
> request or data protect error.*/
> ide_dump_status (drive, "command error", stat);
> cdrom_end_request (0, drive);
> + } else if (sense_key == MEDIUM_ERROR) {
> + /* No point in re-trying a zillion times on a bad
> + * sector... If we got here the error is not correctable */
> + ide_dump_status (drive, "media error (bad sector)", stat);
...and the curious will want to know which sector threw the error
[which would save me from patching this myself some day...]
> + cdrom_end_request (0, drive);
> } else if ((err & ~ABRT_ERR) != 0) {
> /* Go to the default handler
> for other errors. */
> -Erik
>
> --
> Erik B. Andersen http://codepoet-consulting.com/
> --This message was written using 73% post-consumer electrons--
Cheers,
Hans-Peter
You have addressed the core problem I have been working on quietly.
The export of the sense data and end request per subdriver is required to
make the personalities proper. This is messy for two of the four current
subdrivers, and the fifth new one will be clean from the start.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Mon, 22 Apr 2002, Hans-Peter Jansen wrote:
> On Fri, 19 Apr 2002 14:01:13 -0600
> "Erik Andersen" <[email protected]> wrote:
>
> > This should help somewhat. Currently, ide-cd.c retries ERROR_MAX
> > (8) times when it sees an error. But ide.c is also retrying
> > ERROR_MAX times when _it_ sees an error, and does a bus reset
> > after every 4 failures. So for each bad sector, you get 64
> > retries (with typical timeouts of 7 seconds each) plus 16 bus
> > resets.
>
> Thanks for investigation. BTW: Does this cover the ide-scsi case, too?
>
> > The funny thing is though, we knew after the first read that we
> > had an uncorrectable medium error. Try this patch vs 2.4.19-pre7
> >
> > --- linux/drivers/ide/ide-cd.c.orig Tue Apr 9 06:59:56 2002
> > +++ linux/drivers/ide/ide-cd.c Tue Apr 9 07:04:59 2002
> > @@ -657,6 +657,11 @@
> > request or data protect error.*/
> > ide_dump_status (drive, "command error", stat);
> > cdrom_end_request (0, drive);
> > + } else if (sense_key == MEDIUM_ERROR) {
> > + /* No point in re-trying a zillion times on a bad
> > + * sector... If we got here the error is not correctable */
> > + ide_dump_status (drive, "media error (bad sector)", stat);
>
> ...and the curious will want to know which sector threw the error
> [which would save me from patching this myself some day...]
>
> > + cdrom_end_request (0, drive);
> > } else if ((err & ~ABRT_ERR) != 0) {
> > /* Go to the default handler
> > for other errors. */
> > -Erik
> >
> > --
> > Erik B. Andersen http://codepoet-consulting.com/
> > --This message was written using 73% post-consumer electrons--
>
> Cheers,
> Hans-Peter
Hi,
But why does it lock up the whole system during these retries?
Does it busy wait (on a timer or HW status)?
In that case the preemptive kernel could help too... (assuming that it
does not busy wait under a lock, which is not unlikely...)
/RogerL
On Friday 19 April 2002 22.01, Erik Andersen wrote:
> On Thu Apr 18, 2002 at 03:13:35PM +0200, Dr. Death wrote:
> > Problem:
> >
> > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > damaged part of the file!
>
> This should help somewhat. Currently, ide-cd.c retries ERROR_MAX
> (8) times when it sees an error. But ide.c is also retrying
> ERROR_MAX times when _it_ sees an error, and does a bus reset
> after every 4 failures. So for each bad sector, you get 64
> retries (with typical timeouts of 7 seconds each) plus 16 bus
> resets.
>
> The funny thing is though, we knew after the first read that we
> had an uncorrectable medium error. Try this patch vs 2.4.19-pre7
>
> --- linux/drivers/ide/ide-cd.c.orig Tue Apr 9 06:59:56 2002
> +++ linux/drivers/ide/ide-cd.c Tue Apr 9 07:04:59 2002
> @@ -657,6 +657,11 @@
> request or data protect error.*/
> ide_dump_status (drive, "command error", stat);
> cdrom_end_request (0, drive);
> + } else if (sense_key == MEDIUM_ERROR) {
> + /* No point in re-trying a zillion times on a bad
> + * sector... If we got here the error is not correctable */
> + ide_dump_status (drive, "media error (bad sector)", stat);
> + cdrom_end_request (0, drive);
> } else if ((err & ~ABRT_ERR) != 0) {
> /* Go to the default handler
> for other errors. */
> -Erik
>
> --
> Erik B. Andersen http://codepoet-consulting.com/
> --This message was written using 73% post-consumer electrons--
--
Roger Larsson
Skellefteå
Sweden
On Fri, 19 Apr 2002, Richard B. Johnson wrote:
> On Thu, 18 Apr 2002, Dr. Death wrote:
>
> > Problem:
> >
> > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > damaged part of the file!
> >
>
> So what do you suggest? You can see from the logs that the device
> is having difficulty reading your damaged CD. You can do what
> Windows-95 does (ignore the errors and pretend everything is fine),
> or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
> or you can try like hell to read the files like Linux does. What do you
> suggest?
Several things come to mind:
1 - don't dedicate the entire machine to retrying the error such that
everything else runs slowly if at all.
2 - if the hardware returns an uncorrectable sector error, that should be
passed back to the user process rather than retried. An unconditional
deep retry on an error the hardware labels as uncorrectable is not
desirable, and no better than the Windows behavior in most cases.
I took a bottle cap to one of the morning's AOL CDs and then tried to read
it. It's really not just annoying, it's pretty much useless. If you were
staging software off a CD on a running server, your clients would NOT be
happy!
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Tue, 23 Apr 2002, Bill Davidsen wrote:
> On Fri, 19 Apr 2002, Richard B. Johnson wrote:
>
> > On Thu, 18 Apr 2002, Dr. Death wrote:
> >
> > > Problem:
> > >
> > > I use SuSE Linux 7.2 and when I create md5sums from damaged files on a
> > > CD, the WHOLE system freezes or is ugly slow until md5 has passed the
> > > damaged part of the file!
> > >
> >
> > So what do you suggest? You can see from the logs that the device
> > is having difficulty reading your damaged CD. You can do what
> > Windows-95 does (ignore the errors and pretend everything is fine),
> > or what Windows-98 and Windows-2000/Prof does (blue-screen, and re-boot),
> > or you can try like hell to read the files like Linux does. What do you
> > suggest?
>
> Several things come to mind:
> 1 - don't dedicate the entire machine to retrying the error such that
> everything else runs slowly if at all.
But it doesn't! As previously stated, if you have a device on a common
'channel' (like IDE), that everybody else is trying to use, then
everybody else ends up waiting. However, if your errored devices don't
take over a common I/O channel, everybody else gets the CPU while the
errors are being retried.
For instance, I have SCSI for my disks, and I use IDE for a R/W CD
because it's cheap. I can "try forever" reading dorked CDs and the
only process affected at all is the one trying to read the CD. I
can do full-speed compiles while the CD is being retried.
It's all about configuration. The kernel drivers sleep while waiting
for interrupts that will determine the success or failure of the
disk operation. The 'sleep' means that the CPU gets given to somebody
who could use it.
> 2 - if the hardware returns an uncorrectable sector error, that should be
> passed back to the user process rather than retried. An unconditional
> deep retry on an error the hardware labels as uncorrectable is not
> desirable, and in most cases no better than what Windows does.
>
The problem is that the hardware usually waits for the worst-case time
(disk spin-up time) to even report an error. It's not like you could
somehow wait for one rev of the disk to determine if a sector could
be read. The disk, itself, retries for a long time, then it reports
a rather general-purpose error (media error on SCSI, bad sector on
IDE, record not found, etc).
> I took a bottle cap to one of the morning's AOL CDs and then tried to read
> it. It's really not just annoying, it's pretty much useless. If you were
> staging software off a CD on a running server, your clients would NOT be
> happy!
>
Put your CDs on a different controller and you can do anything you
want without affecting other tasks.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
On Wed, 24 Apr 2002, Richard B. Johnson wrote:
> On Tue, 23 Apr 2002, Bill Davidsen wrote:
> > Several things come to mind:
> > 1 - don't dedicate the entire machine to retrying the error such that
> > everything else runs slowly if at all.
>
> But it doesn't! As previously stated, if you have a device on a common
> 'channel' (like IDE), that everybody else is trying to use, then
> everybody else ends up waiting. However, if your errored devices don't
> take over a common I/O channel, everybody else gets the CPU while the
> errors are being retried.
>
> For instance, I have SCSI for my disks, and I use IDE for a R/W CD
> because it's cheap. I can "try forever" reading dorked CDs and the
> only process affected at all is the one trying to read the CD. I
> can do full-speed compiles while the CD is being retried.
That's very nice for a system where cost is no object, but ATAPI/IDE is
where the bulk of Linux systems are running. Putting the CD on another
cable is realistic (the system I hung does that) but putting the CD on IDE
and the disk on SCSI is not cost effective compared to fixing the hang in
software.
> It's all about configuration. The kernel drivers sleep while waiting
> for interrupts that will determine the success or failure of the
> disk operation. The 'sleep' means that the CPU gets given to somebody
> who could use it.
It would also be nice if the other IDE channels were given to "somebody
who could use it," but that would appear in some cases not to happen.
> > I took a bottle cap to one of the morning's AOL CDs and then tried to read
> > it. It's really not just annoying, it's pretty much useless. If you were
> > staging software off a CD on a running server, your clients would NOT be
> > happy!
> >
>
> Put your CDs on a different controller and you can do anything you
> want without affecting other tasks.
As above, another type of bus is not cost effective, another IDE cable
doesn't solve the problem, no matter what theory says.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Wed, 24 Apr 2002, Bill Davidsen wrote:
> On Wed, 24 Apr 2002, Richard B. Johnson wrote:
>
> > On Tue, 23 Apr 2002, Bill Davidsen wrote:
>
> > > Several things come to mind:
> > > 1 - don't dedicate the entire machine to retrying the error such that
> > > everything else runs slowly if at all.
> >
> > But it doesn't! As previously stated, if you have a device on a common
> > 'channel' (like IDE), that everybody else is trying to use, then
> > everybody else ends up waiting. However, if your errored devices don't
> > take over a common I/O channel, everybody else gets the CPU while the
> > errors are being retried.
> >
> > For instance, I have SCSI for my disks, and I use IDE for a R/W CD
> > because it's cheap. I can "try forever" reading dorked CDs and the
> > only process affected at all is the one trying to read the CD. I
> > can do full-speed compiles while the CD is being retried.
>
> That's very nice for a system where cost is no object, but ATAPI/IDE is
> where the bulk of Linux systems are running. Putting the CD on another
> cable is realistic (the system I hung does that) but putting the CD on IDE
> and the disk on SCSI is not cost effective compared to fixing the hang in
> software.
>
It is NOT a hang in the software. IDE means Integrated Drive Electronics.
The ONLY thing the CPU has to work with is the electronics IN the drive.
It communicates with the drive from a port on the mother-board. There
is no controller on the motherboard. There is absolutely nothing to
isolate one drive from another. A single drive, doing its retries, will
"own" that drive-cable until it has either succeeded or given up.
The CPU software MUST NOT do anything on that port until the previous
request was answered via interrupt. This means that if the CD is
using the bus, no task can access any hard-disk drive that is on
that same port.
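A toy model of the claim above, not of the actual driver: requests on the same channel are served strictly one at a time, so a short disk read queued behind a long CD retry finishes only after the retry does. The function name, the channel labels and the timings are all invented for illustration.

```python
def completion_times(requests):
    """requests: list of (name, channel, submit_time, service_time).

    Requests on the same channel are served one at a time in submission
    order; different channels are independent. Returns {name: finish_time}.
    """
    channel_free_at = {}
    finished = {}
    for name, channel, submit, service in sorted(requests, key=lambda r: r[2]):
        # A request cannot start until the channel has finished the previous one.
        start = max(submit, channel_free_at.get(channel, 0.0))
        finish = start + service
        channel_free_at[channel] = finish
        finished[name] = finish
    return finished

# CD retry takes 30 s; the disk read is submitted at t=1 s and needs 0.5 s.
same_cable = completion_times([("cd", "ide1", 0.0, 30.0), ("disk", "ide1", 1.0, 0.5)])
own_cable = completion_times([("cd", "ide1", 0.0, 30.0), ("disk", "ide0", 1.0, 0.5)])
```

In this model the disk read finishes at 30.5 s when it shares the cable and at 1.5 s when it has its own. Whether real hardware isolates the two channels this cleanly is exactly what the thread disputes.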
You can see that there is plenty of CPU time available by having
one task do;
while true; do echo "Hello World!"; done
(before you access your damaged CD).
Then, using another VT, access your damaged CD.
When you switch to the 'Hello world' terminal, it's merrily spinning
along, getting all the CPU time your other task would have gotten
if the drive was readable.
> > It's all about configuration. The kernel drivers sleep while waiting
> > for interrupts that will determine the success or failure of the
> > disk operation. The 'sleep' means that the CPU gets given to somebody
> > who could use it.
>
> It would also be nice if the other IDE channels were given to "somebody
> who could use it," but that would appear in some cases not to happen.
>
There are no other IDE channels to give up. The 'master/slave'
configuration should be labeled 'cheaper' and the two/channel
configuration should be labeled 'cheapest'. It's what you pay
for. It has nothing to do with the kernel or its drivers.
Recent work on IDE was an attempt to get DMA operations and
other 'speed-up' operations to work better. They will never
work well because IDE is not designed to work well. It's
designed to simply exist.
[SNIPPED...]
Cheers,
Dick Johnson
Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
I have a system with 2 hard disks (on separate controllers) and a CD-ROM
on the second controller (shared with the disk that, among other things,
handles /usr).
I took an old data CD, scratched it (took a fork to it) and mounted it.
I then started up MMX playing 'Hotel California' and tried to wc(1)
a 700K file on the CD.
Hotel California played fine, but trying to do an 'ls' of /usr
(same controller) took a LONG time..... (had to wait for the
CD to release the controller).
I could wc large files in my /tmp directory, play music etc.
before that wc came back -- I could do anything I wanted,
as long as I didn't need any data off of that second controller
(e.g. loading programs in /usr would die, since that HD shares
controller with the CD).
Given that I rarely use my CD ROM, it's fine having / and /usr
separated... On the other hand, if I was trying to read damaged CDs
with any regularity, I'd be making sure that the CD ROM drive was
sitting on its own controller -- even if it meant putting all the
other IO on the system onto one IDE drive/controller.
> where the bulk of Linux systems are running. Putting the CD on another
> cable is realistic (the system I hung does that) but putting the CD on IDE
> and the disk on SCSI is not cost effective compared to fixing the hang in
> software.
Note that this problem is a HARDWARE one -- not a software one.
It's kinda like trying to cross a Singapore highway... You can
do it faster, if you don't mind dealing with the nasty side of
a (data) bus. (read: SPLAT)
Bus error: car dumped.
(and if you think Linux is bad, try doing the same thing in
Windows!).
--
Stephen Samuel +1(604)876-0426 [email protected]
http://www.bcgreen.com/~samuel/
Powerful committed communication, reaching through fear, uncertainty and
doubt to touch the jewel within each person and bring it to life.
On Wed, 24 Apr 2002, Stephen Samuel wrote:
> I said:
> > where the bulk of Linux systems are running. Putting the CD on another
> > cable is realistic (the system I hung does that) but putting the CD on IDE
> > and the disk on SCSI is not cost effective compared to fixing the hang in
> > software.
>
> Note that this problem is a HARDWARE one -- not a software one.
> It's kinda like trying to cross a Singapore highway... You can
> do it faster, if you don't mind dealing with the nasty side of
> a (data) bus. (read: SPLAT)
I don't know how to say this any other way: reread my second sentence.
I have the disk and CD on separate cables, and it still hangs. I NEVER
mix a CD with anything and expect good response. The IDE devices are
hanging, not just the one sharing the cable, nothing shares the cable but
a ZIP drive I use a few times a year when someone sends me a ZIP,
otherwise it's unused. The disk drive is all by itself, as I said the
first time I mentioned this.
I suspect (without having a good way to check) that all IDE devices
sharing the IRQ with the error device *may* be affected. That's the only
thing which comes to mind, I'll add a Promise controller and disk on a
totally separate board and see if that changes anything. Hopefully it will
not share the IRQ :-(
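One rough way to check the IRQ suspicion is to look at which interfaces are listed on each line of /proc/interrupts. The sketch below assumes the 2.4-style layout (IRQ number, per-CPU counts, one controller column such as XT-PIC, then a comma-separated device list); newer kernels add columns and would need a different parser. The function names are made up for illustration.

```python
def ide_irqs(text):
    """Map IRQ number -> list of ideN names found in /proc/interrupts text."""
    result = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # header line, NMI, ERR, ...
        fields = rest.split()
        # Drop the per-CPU counters; fields[0] is then the controller name.
        while fields and fields[0].isdigit():
            fields.pop(0)
        names = [n.strip() for n in " ".join(fields[1:]).split(",")]
        ide = [n for n in names if n.startswith("ide")]
        if ide:
            result[int(label)] = ide
    return result

def shared_ide_irqs(text):
    """Return only the IRQs that carry more than one IDE interface."""
    return {irq: names for irq, names in ide_irqs(text).items() if len(names) > 1}
```

On a live system the input would be `open("/proc/interrupts").read()`.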
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
On Wed, Apr 24, 2002 at 11:33:23PM -0400, Bill Davidsen wrote:
> I suspect (without having a good way to check) that all IDE devices
> sharing the IRQ with the error device *may* be affected. That's the only
> thing which comes to mind, I'll add a Promise controller and disk on a
> totally separate board and see if that changes anything. Hopefully it will
> not share the IRQ :-(
I don't think it has to do with the IRQs, but it sounds like the entire ide
chipset (think two cables on one chipset...) has stopped responding when
ONE device (out of a possible four (with two cables)) has failed media.
Let's use an example to help shine the light on exactly what I'm saying (I'm
trying to summarize what's been said in the threads, and I haven't tested
this... though I will be working on such a system in the next few weeks):
1)
Two drives each on a separate cable, but on the same chipset:
/dev/hda (hard drive) (chipset1)
/dev/hdc (cd-rom) (chipset1)
Put broken CD into /dev/hdc, and read somehow (dd, cat, whatever), now try
to read from /dev/hda. This (according to this thread) should be damn slow
and you will have a very hard time to use this system while it is trying to
read the CD.
2)
Two drives, each on a separate cable and on different chipsets:
/dev/hda (hard drive) (chipset1)
/dev/hde (cd-rom) (chipset2)
Put broken CD into /dev/hde, read it again, and try to read from /dev/hda.
All should be good, with blue skies, and a responsive system.
Can someone verify that the above is true, and accurately expresses
what they've experienced?
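To put numbers on the two cases above, one could time reads from the hard disk while the broken CD is being read in another terminal. A minimal sketch, assuming a large file on /dev/hda that is not already in the page cache (cached data would hide any stall); `chunk_latencies` is a made-up helper name.

```python
import time

def chunk_latencies(path, chunk=64 * 1024, limit=None):
    """Read `path` sequentially and return the seconds taken by each read()."""
    latencies = []
    # buffering=0 so that each read() is a single request to the kernel.
    with open(path, "rb", buffering=0) as f:
        while limit is None or len(latencies) < limit:
            t0 = time.monotonic()
            data = f.read(chunk)
            elapsed = time.monotonic() - t0
            if not data:
                break  # end of file
            latencies.append(elapsed)
    return latencies
```

Comparing `max(chunk_latencies(path))` with and without the CD read running would show whether reads from the other drive stall.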
Also, can someone say for sure (Andre) that this is a hardware limitation,
not a Linux IDE locking problem, and with no possibility of a software
work-around?
Thanks,
Mike
On Thu Apr 25, 2002 at 09:04:57PM -0700, Mike Fedyk wrote:
> 1)
> Two drives each on a separate cable, but on the same chipset:
> /dev/hda (hard drive) (chipset1)
> /dev/hdc (cd-rom) (chipset1)
>
> Put broken CD into /dev/hdc, and read somehow (dd, cat, whatever), now try
> to read from /dev/hda. This (according to this thread) should be damn slow
> and you will have a very hard time to use this system while it is trying to
> read the CD.
This has not been my experience. Reading from hda continues to
work as expected. But the process reading from hdc stays stuck
in D state for a _long_ time.... A kill -9 takes like 10 minutes
before it gets around to actually killing anything.
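The D state mentioned above is the third field of /proc/<pid>/stat, and a process in that state does not act on signals (including SIGKILL) until it leaves the uninterruptible sleep. A small sketch for checking it; the helper names are made up for illustration.

```python
def process_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line."""
    # The command name is in parentheses and may itself contain spaces
    # or ')' characters, so take everything after the last ')'.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]

def is_uninterruptible(stat_line):
    """True if the process is in uninterruptible sleep (state D)."""
    return process_state(stat_line) == "D"
```

For a given pid the input would be `open("/proc/%d/stat" % pid).read()`.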
> 2)
> Two drives, each on a separate cable and on different chipsets:
> /dev/hda (hard drive) (chipset1)
> /dev/hde (cd-rom) (chipset2)
>
> Put broken CD into /dev/hde, read it again, and try to read from /dev/hda.
> All should be good, with blue skies, and a responsive system.
Sure. Same as above.
> Also, can someone say for sure (Andre) that this is a hardware limitation,
> not a Linux IDE locking problem, and with no possibility of a software
> work-around?
There is a certain amount of delay when a drive hits a bad
sector. But Linux handles things pretty badly IMHO, and could
do a much better job.
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
Basically it is a global design flaw from the beginning, and since I have
only 2.4 to address it is a real nasty! Short version, each subdriver
personally does not do unique error handling. Thus the simple good/bad
approach to a darwin world has come to bite hard now. First, there is a
failure to address error/sense decoding based on the operation requested.
Second, the main loop is ATA/IDE centered for all events, and this
is in the process of being fixed for 2.4 soon. Third, all ATAPI needs to
decode with respect to the primary opcode executed and the sense of the
preferred event tables, not the generic catch-all.
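One reading of the above, as a sketch only: the retry decision is looked up per (opcode, sense key) pair instead of a single good/bad catch-all. The opcode and sense-key numbers are the standard SCSI/ATAPI values; the policy table, the retry counts and the name `decide` are entirely hypothetical and are not taken from any kernel source.

```python
# Standard sense keys and opcodes.
NOT_READY, MEDIUM_ERROR, HARDWARE_ERROR, UNIT_ATTENTION = 0x02, 0x03, 0x04, 0x06
TEST_UNIT_READY, READ_10, WRITE_10 = 0x00, 0x28, 0x2A

# Hypothetical policy: (opcode, sense key) -> (action, max retries).
POLICY = {
    (READ_10, MEDIUM_ERROR): ("fail", 0),      # drive already gave up on the sector
    (READ_10, NOT_READY): ("retry", 3),        # e.g. still spinning up
    (READ_10, UNIT_ATTENTION): ("retry", 1),   # e.g. media changed
    (WRITE_10, MEDIUM_ERROR): ("fail", 0),
    (TEST_UNIT_READY, NOT_READY): ("report", 0),
}
DEFAULT = ("retry", 1)  # the generic catch-all for anything not listed

def decide(opcode, sense_key, attempts_so_far):
    """Return 'retry', 'fail' or 'report' for a failed command."""
    action, max_retries = POLICY.get((opcode, sense_key), DEFAULT)
    if action == "retry" and attempts_so_far >= max_retries:
        return "fail"
    return action
```

With such a table, a medium error on a read is passed back at once, while a not-ready condition still gets a bounded number of retries.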
It is a bloody mess, and difficult to describe over email :-/ (for me).
Cheers,
Andre Hedrick
LAD Storage Consulting Group
PS Mike, "Mr. Hedrick" was my genetic donor, "Andre" is what I answer to.
On Thu, 25 Apr 2002, Mike Fedyk wrote:
> On Wed, Apr 24, 2002 at 11:33:23PM -0400, Bill Davidsen wrote:
> > I suspect (without having a good way to check) that all IDE devices
> > sharing the IRQ with the error device *may* be affected. That's the only
> > thing which comes to mind, I'll add a Promise controller and disk on a
> > totally separate board and see if that changes anything. Hopefully it will
> > not share the IRQ :-(
>
> I don't think it has to do with the IRQs, but it sounds like the entire ide
> chipset (think two cables on one chipset...) has stopped responding when
> ONE device (out of a possible four (with two cables)) has failed media.
>
> Let's use an example to help shine the light on exactly what I'm saying (I'm
> trying to summarize what's been said in the threads, and I haven't tested
> this... though I will be working on such a system in the next few weeks):
>
> 1)
> Two drives each on a separate cable, but on the same chipset:
> /dev/hda (hard drive) (chipset1)
> /dev/hdc (cd-rom) (chipset1)
>
> Put broken CD into /dev/hdc, and read somehow (dd, cat, whatever), now try
> to read from /dev/hda. This (according to this thread) should be damn slow
> and you will have a very hard time to use this system while it is trying to
> read the CD.
>
> 2)
> Two drives, each on a separate cable and on different chipsets:
> /dev/hda (hard drive) (chipset1)
> /dev/hde (cd-rom) (chipset2)
>
> Put broken CD into /dev/hde, read it again, and try to read from /dev/hda.
> All should be good, with blue skies, and a responsive system.
>
> Can someone verify that the above is true, and accurately expresses
> what they've experienced?
>
> Also, can someone say for sure (Andre) that this is a hardware limitation,
> not a Linux IDE locking problem, and with no possibility of a software
> work-around?
>
> Thanks,
>
> Mike
>
On Wed, 24 Apr 2002, Bill Davidsen wrote:
> > Put your CDs on a different controller and you can do anything you
> > want without affecting other tasks.
>
> As above, another type of bus is not cost effective, another IDE cable
> doesn't solve the problem, no matter what theory says.
Hmm, i have my cdrw on a different IDE controller in an all IDE system and
never experience "hangs" even for completely borked cds. The disks are on
the onboard IDE controller. This is also true for when burning CDs, i can
thrash the harddisks with no noticeable slowdown.
Zwane
--
http://function.linuxpower.ca
> I don't think it has to do with the IRQs, but it sounds like the entire ide
> chipset (think two cables on one chipset...) has stopped responding when
> ONE device (out of a possible four (with two cables)) has failed media.
In my case, I was able to read data from hda while the cdrom
on hdd was trying to recover data from a scratched disk.
Reading data from hdc (shared cable with the CDROM), on the
other hand, was VERY slow.
I have an 82371AB PIIX4 IDE interface (dual IDE chipset);
it's part of a 440BX chipset board.
If accessing one IDE interface (C/D) causes the other (A/B) to lock up,
I'd guess you've got especially cheap hardware on your box.
--
Stephen Samuel +1(604)876-0426 [email protected]
http://www.bcgreen.com/~samuel/
Powerful committed communication, reaching through fear, uncertainty and
doubt to touch the jewel within each person and bring it to life.
On Fri, Apr 26, 2002 at 12:35:55AM -0700, Andre Hedrick wrote:
>
> Basically it is a global design flaw from the beginning, and since I have
> only 2.4 to address it is a real nasty! Short version, each subdriver
> personally does not do unique error handling. Thus the simple good/bad
> approach to a darwin world has come to bite hard now. First, there is a
> failure to address error/sense decoding based on the operation requested.
> Second, the main loop is ATA/IDE centered for all events, and this
> is in the process of being fixed for 2.4 soon. Third, all ATAPI needs to
> decode with respect to the primary opcode executed and the sense of the
> preferred event tables, not the generic catch-all.
>
> It is a bloody mess, and difficult to describe over email :-/ (for me).
>
Ok, so there is hope for a fix. Andre, when you have the patches available,
I'm sure many people from this thread would be willing to help test, just
announce.
Is there a place where you keep your latest patches with a little
documentation on the purpose of the changes?
> Cheers,
>
> Andre Hedrick
> LAD Storage Consulting Group
>
> PS Mike, "Mr. Hedrick" was my genetic donor, "Andre" is what I answer to.
>
?? I think you're thinking of someone else. Read below, I addressed you as
"Andre", and IIRC always have. I understand personally the "genetic donor"
problem though.
Mike
> > Also, can someone say for sure (Andre) that this is a hardware limitation,
> > not a Linux IDE locking problem, and with no possibility of a software
> > work-around?
> >
> > Thanks,
> >
> > Mike
Hi!
> Basically it is a global design flaw from the beginning, and since I have
> only 2.4 to address it is a real nasty! Short version, each subdriver
Well, no one prevents you from submitting 2.5 patches to Martin.... Actually,
his cleanups may have made that job easier.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
Well, I really do not have the time to follow the changes, but when I fix
it in 2.4, please copy it; credit would be nice but is not expected. Also,
it resulted in a more complex mess to address.
Regards,
Andre Hedrick
LAD Storage Consulting Group
On Thu, 25 Apr 2002, Pavel Machek wrote:
> Hi!
>
> > Basically it is a global design flaw from the beginning, and since I have
> > only 2.4 to address it is a real nasty! Short version, each subdriver
>
> Well, no one prevents you from submitting 2.5 patches to Martin.... Actually,
> his cleanups may have made that job easier.
>
> Pavel
> --
> Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
> details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
>
On Fri, 26 Apr 2002, Andre Hedrick wrote:
>
> Basically it is a global design flaw from the beginning, and since I have
> only 2.4 to address it is a real nasty! Short version, each subdriver
> personally does not do unique error handling.
I'm snipping because obviously lots of folks are reading the previous
posts. Since you clearly can see the issue, I will let this thread rest in
hopes that we will have a patch to try and that it will make the whole
problem shrink if not vanish.
It seems the casual assumption that anyone who has a problem must be
putting both devices on the same cable has been laid to rest, time to wait
for new data and/or patches. I will try again next weekend to test the
same problem using a totally separate (Promise) controller for the CD.
--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
i have a plextor cd-rw drive which i use via scsi emulation, it is the
only device on my second ide bus.
often when i rip cds with grip that have scratches, grip completely
stops responding to input and signals, even -9 won't go through. today i
tried waiting for over half an hour for it to terminate, got bored and
quit X. the grip process died after doing that but the kernel's scsi
layer kept spewing error messages on my console and my machine "lagged"
on about 10-15 second intervals a few seconds at a time. i waited over
an hour for it to give up, but it didn't so i had to reboot. kernel
version i'm using is 2.4.17.
--
__
Get yourself a reliable domestic email address http://www.jippii.fi/
While you're there, check out the best gaming site on the net, JIPPIIGAMES.