2001-10-21 10:54:53

by Ville Herva

[permalink] [raw]
Subject: 2.4.10ac10, cdrecord 1.9-6, Mitsumi CR-4804TE: lock up burning too large image

When (accidentally) trying to burn ~670MB onto a 74-minute CD-R disk, I
experienced a complete lockup.

It went to 99% (as one would expect), and then the drive began making weird
sounds, as if it were moving the head from start to end over and over. After
a short while, the whole system locked up: no mouse, keyboard, caps lock,
ctrl-alt-del, or alt-sysrq-{s,u,b}.

It used to give a nice error when the disk size was exceeded with 2.2.18pre19
and a tad older cdrecord (1.9-something; 1.10-4 failed on 2.2, BTW, giving
an error on mmapping /dev/null).

I assume this is a kernel thing...

BTW: Also the cd audio ripping speed has dropped from ~8x to ~1x with both
my cdrw and cd drive. The drop took place before upgrading from 2.2 to 2.4.
I am and was using scsi-emulation for both drives. I tried going back to
older cdparanoia, but it didn't help. Before I try to binary search what has
changed, does anybody have any ideas on what to try?

------------------------------------------------------------------------
kernel 2.4.10-ac10 SMP
cdrecord 1.9-6

dmesg:
scsi : 0 hosts left.
SCSI subsystem driver Revision: 1.00
scsi0 : SCSI host adapter emulation for IDE ATAPI devices
Vendor: MITSUMI Model: CR-4804TE Rev: 2.4C
Type: CD-ROM ANSI SCSI revision: 02
APIC error on CPU0: 08(02)
Attached scsi CD-ROM sr1 at scsi0, channel 0, id 1, lun 0
sr0: scsi3-mmc drive: 32x/32x cd/rw xa/form2 cdda tray
Uniform CD-ROM driver Revision: 3.12
sr1: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda tray

hdparm /dev/hdd:

/dev/hdd:
HDIO_GET_MULTCOUNT failed: Input/output error
I/O support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 1 (on)
keepsettings = 0 (off)
HDIO_GET_NOWERR failed: Input/output error
readonly = 0 (off)
BLKRAGET failed: Input/output error
HDIO_GETGEO failed: Invalid argument

cdrecord -scanbus:
0,1,0 1) 'MITSUMI ' 'CR-4804TE ' '2.4C' Removable CD-ROM


kernel messages before the lockup:

Oct 20 20:35:53 kernel: scsi : aborting command due to timeout : pid 155966,
scsi0, channel 0, id 1, lun 0 0x2a 00 00 05 48 9e 00 00 1f 00
Oct 20 20:35:53 kernel: hdd: timeout waiting for DMA
Oct 20 20:35:53 kernel: ide_dmaproc: chipset supported ide_dma_timeout func
only: 14
Oct 20 20:35:54 kernel: hdd: status timeout: status=0xd0 { Busy }
Oct 20 20:35:54 kernel: hdd: drive not ready for command
Oct 20 20:36:28 kernel: hdd: ATAPI reset timed-out, status=0xd0
<halted at this point>


-- v --

[email protected]


2001-10-21 11:46:15

by Ville Herva

[permalink] [raw]
Subject: Re: 2.4.10ac10, cdrecord 1.9-6, Mitsumi CR-4804TE: lock up burning too large image

On Sun, Oct 21, 2001 at 01:37:01PM +0200, you [Joerg Schilling] claimed:
>

Thanks for the timely reply!

> This must be a broken drive....

Hmm. It used to work with the 2.2 kernel. With a too-large image, it just
gave an error.

> >a short while, the whole system locked up, no mouse, keyboard, caps lock,
> >ctrl-alt-del, alt-sysrq-{s,u,b}.
>
> This is a broken kernel!

Yep.

> >It used to give a nice error when the disk size was exceeded with 2.2.18pre19
> >and a tad older cdrecord (1.9-something; 1.10-4 failed on 2.2, BTW, giving
> >an error on mmapping /dev/null).
>
> Don't use outdated cdrecord versions, I cannot support them!

Ok. I updated to 1.10 from Red Hat Rawhide, but as I said, it didn't work at
all with 2.2 ("failed to mmap /dev/null" or something), so I went back to
1.9. I could retry now that I've updated the machine in question to 2.4. (I
can also see if the 2.2 /dev/null error reproduces, if you are interested.)
I'll retry the too-large image with 1.10 and report back to you, but I fear
it is a kernel bug.

> http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/problems.html
>
> 1.9 has been outdated for a long, long time; it obviously cannot contain
> workarounds for Linux kernel bugs introduced after cdrecord-1.9 came out.

;)

> It looks like there is still a timeout bug in the kernel.
> If the kernel handles timeouts correctly, then cdrecord will return.

I see.


-- v --

[email protected]

2001-10-21 11:58:06

by Joerg Schilling

[permalink] [raw]
Subject: Re: 2.4.10ac10, cdrecord 1.9-6, Mitsumi CR-4804TE: lock up burning too large image


>From [email protected] Sun Oct 21 13:46:25 2001

>On Sun, Oct 21, 2001 at 01:37:01PM +0200, you [Joerg Schilling] claimed:
>>

>Thanks for the timely reply!

>> This must be a broken drive....

>Hmm. It used to work with the 2.2 kernel. With a too-large image, it just
>gave an error.

I may only judge from information you provide, not from information you hide.

>> Don't use outdated cdrecord versions, I cannot support them!

>Ok. I updated to 1.10 from Red Hat Rawhide, but as I said, it didn't work at all

1.10 is outdated too, please read

http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/problems.html

>with 2.2 ("failed to mmap /dev/null" or something), so I went back to 1.9. I

I cannot prevent you from using broken Linux installations!

The Linux kernel people still have problems with interfaces and make changes
that break binary compatibility when going to more recent Linux versions.
Why do you believe that a cdrecord that has been compiled on 2.4 will run on 2.2?

Linux needed close to 10 years to finally support mmap() (other OSes like SunOS
have done this since 1987). Cdrecord's autoconf chooses the best interfaces the OS
offers. SysV shared memory is outdated and badly implemented on Linux (too many
restrictions). mmap is the modern method to get shared memory, but Linux didn't
support it before November 2000.



Jörg

EMail:[email protected] (home) Jörg Schilling D-13353 Berlin
[email protected] (uni) If you don't have iso-8859-1
[email protected] (work) chars I am J"org Schilling
URL: http://www.fokus.gmd.de/usr/schilling ftp://ftp.fokus.gmd.de/pub/unix

2001-10-21 12:07:57

by Ville Herva

[permalink] [raw]
Subject: Re: 2.4.10ac10, cdrecord 1.9-6, Mitsumi CR-4804TE: lock up burning too large image

On Sun, Oct 21, 2001 at 01:56:06PM +0200, you [Joerg Schilling] claimed:
>
> >Hmm. It used to work with the 2.2 kernel. With a too-large image, it just
> >gave an error.
>
> I may only judge from information you provide, not from information you hide.

I did say that in the original report:

"It used to give a nice error when disk size was exceeded with 2.2.18pre19
and a tad older cdrecord..."

but I could have been clearer.

> 1.10 is outdated too, please read
>
> http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/problems.html

Ok. I'll compile the newest from source.

But do you think the too-large-image lockup might be cured with a newer
cdrecord, or is the kernel the prime suspect?

I will try anyway and report back to you.

> >with 2.2 ("failed to mmap /dev/null" or something), so I went back to 1.9. I
>
> I cannot prevent you from broken Linux installations!
>
> The Linux kernel people still have problems with interfaces and make
> changes that break binary compatibility when going to more recent Linux
> versions. Why do you believe that a cdrecord that has been compiled on 2.4
> will run on 2.2?

Well, most software seems to work with both 2.2 and 2.4. I didn't think
carefully enough to realize that some interfaces must have changed.

> Linux needed close to 10 years to finally support mmap() (other OSes like
> SunOS have done this since 1987). Cdrecord's autoconf chooses the best
> interfaces the OS offers. SysV shared memory is outdated and badly implemented
> on Linux (too many restrictions). mmap is the modern method to get shared
> memory, but Linux didn't support it before November 2000.

I see. Thanks for the clarification.


-- v --

[email protected]

2001-10-21 12:12:17

by Joerg Schilling

[permalink] [raw]
Subject: Re: 2.4.10ac10, cdrecord 1.9-6, Mitsumi CR-4804TE: lock up burning too large image


>From [email protected] Sun Oct 21 14:08:12 2001

>On Sun, Oct 21, 2001 at 01:56:06PM +0200, you [Joerg Schilling] claimed:
>>
>> >Hmm. It used to work with the 2.2 kernel. With a too-large image, it just
>> >gave an error.
>>
>> I may only judge from information you provide, not from information you hide.

>I did say that in the original report:

>"It used to give a nice error when disk size was exceeded with 2.2.18pre19
>and a tad older cdrecord..."

Sorry, I seem to have missed this.

>> 1.10 is outdated too, please read
>>
>> http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/problems.html

>Ok. I'll compile the newest from source.

>But do you think the too-large-image lockup might be cured with a newer
>cdrecord, or is the kernel the prime suspect?

At least recent libscg versions include a workaround for an incorrect
Linux kernel return code for a timed-out SCSI command via ATAPI. So if the
kernel does return at all, cdrecord will know why.

Jörg

EMail:[email protected] (home) Jörg Schilling D-13353 Berlin
[email protected] (uni) If you don't have iso-8859-1
[email protected] (work) chars I am J"org Schilling
URL: http://www.fokus.gmd.de/usr/schilling ftp://ftp.fokus.gmd.de/pub/unix

2001-10-21 17:25:09

by Ville Herva

[permalink] [raw]
Subject: Re: 2.4.10ac10, cdrecord 1.9-6, Mitsumi CR-4804TE: lock up burning too large image

On Sun, Oct 21, 2001 at 02:10:18PM +0200, you [Joerg Schilling] claimed:
>
> >Ok. I'll compile the newest from source.
>
> >But do you think the too-large-image lockup might be cured with a newer
> >cdrecord, or is the kernel the prime suspect?
>
> At least recent libscg versions include a workaround for an incorrect
> Linux kernel return code for a timed-out SCSI command via ATAPI. So if the
> kernel does return at all, cdrecord will know why.

Bummer. I'm not able to reproduce it with

progress -c 1M < /dev/zero | cdrecord dev=0,1,0 speed=4 -dummy -
(essentially same as 'cdrecord dev=0,1,0 speed=4 -dummy - < /dev/zero')

(The line I originally used was "cdrecord dev=0,1,0 speed=4 -", and the
input came from mkisofs.)

cdrecord-1.9-6

686.00 MB; elapsed 1172 secs; 0.59 MB/s...
cdrecord: Input/output error. write_g1: scsi sendcmd: retryable error
CDB: 2A 00 00 05 53 E1 00 00 1F 00
status: 0x0 (GOOD STATUS)
write track data: error after 715065344 bytes
Sense Bytes: 70 00 00 00 00 00 00 0A 00 00 00 00 00 00 00 00 00 00

and cdrecord 1.11a08

686.00 MB; elapsed 1172 secs; 0.59 MB/s...
./cdrecord: Input/output error. write_g1: scsi sendcmd: no error
CDB: 2A 00 00 05 53 E1 00 00 1F 00
status: 0x2 (CHECK CONDITION)
Sense Bytes: 71 00 03 00 00 00 00 0A 00 00 00 00 0C 00 00 00
Sense Key: 0x3 Medium Error, deferred error, Segment 0
Sense Code: 0x0C Qual 0x00 (write error) Fru 0x0
Sense flags: Blk 0 (not valid)
cmd finished after 5.706s timeout 40s
write track data: error after 715065344 bytes
Sense Bytes: 70 00 00 00 00 00 00 0A 00 00 00 00 00 00 00 00 00 00

(nothing in dmesg.)

Perhaps it really takes a real write to trigger this, or the CD media in
question was somehow flawed. I'll try again when I have more time.


-- v --

[email protected]

2001-10-22 10:40:44

by Paul Kreiner

[permalink] [raw]
Subject: Re: 2.4.10ac10, cdrecord 1.9-6, Mitsumi CR-4804TE: lock up burning too large image

> Ok. I updated to 1.10 from Red Hat Rawhide, but as I said, it didn't work at
> all with 2.2 ("failed to mmap /dev/null" or something), so I went back to
> 1.9. I could retry now that I've updated the machine in question to 2.4. (I
> can also see if the 2.2 /dev/null error reproduces, if you are interested.)
> I'll retry the too-large image with 1.10 and report back to you, but I fear
> it is a kernel bug.

I believe I ran into this same cdrecord "fail to mmap /dev/null" issue
before. The fix (aside from upgrading your kernel to 2.4.x) is to pass
"fs=0" to cdrecord. According to the docs, this disables the in-memory FIFO.
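
For reference, the workaround would look something like this (the dev=
address is taken from the -scanbus output earlier in the thread; the image
filename is just a placeholder):

```shell
# fs=0 disables cdrecord's shared-memory FIFO, avoiding the mmap path
# that fails on the 2.2 kernels discussed above.
cdrecord dev=0,1,0 speed=4 fs=0 -dummy image.iso
```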

Cheers,
Paul Kreiner

2001-10-22 19:03:39

by Bill Davidsen

[permalink] [raw]
Subject: Re: 2.4.10ac10, cdrecord 1.9-6, Mitsumi CR-4804TE: lock up burning too large image

In article <[email protected]> [email protected] wrote:
| When (accidentally) trying to burn ~670MB onto a 74-minute CD-R disk, I
| experienced a complete lockup.
|
| It went to 99% (as one would expect), and then drive began giving weird
| sounds - as if it was moving the head from start to end over and over. After
| a short while, the whole system locked up, no mouse, keyboard, caps lock,
| ctrl-alt-del, alt-sysrq-{s,u,b}.
|
| It used to give a nice error when the disk size was exceeded with 2.2.18pre19
| and a tad older cdrecord (1.9-something; 1.10-4 failed on 2.2, BTW, giving
| an error on mmapping /dev/null).
|
| I assume this is a kernel thing...

You are probably correct, but you might try running as a user other
than root, with proper permissions on the device. cdwrite tries to run
with realtime priority, and I don't know just how tightly a realtime
process could lock the system if it gets its knickers in a twist.

You might also try the software watchdog; I would expect it to reboot
the machine if the kernel is up but the RT process is looping, such as
waiting for a status to change or some such.

--
bill davidsen <[email protected]>
His first management concern is not solving the problem, but covering
his ass. If he lived in the middle ages he'd wear his codpiece backward.