2000-11-24 08:41:47

by Ulrich Windl

[permalink] [raw]
Subject: 2.2.16: How to freeze the kernel

Hello,

this is for your interest, amusement, and for "what not to do":

I managed to freeze the kernel (2.2.16 from SuSE Linux 7.0) in a way
that I could not even switch virtual consoles. Completely silent
eberything...

It all started when Windows/95 ruined another CD-R while trying to
write an image to the media. So I decided to try it with Linux, using
the same CD writer.

I plugged the device to the so far unused SCSI channel and used the
"add-sigle-device" method to avoid reboot, and I succeeded:

kgate kernel: scsi singledevice 0 0 4 0
kgate kernel: Vendor: WAITEC Model: WT624 Rev: 7.0F
kgate kernel: Type: CD-ROM ANSI SCSI
revision: 0
kgate kernel: Detected scsi CD-ROM sr1 at scsi0, channel 0, id 4, lun 0
kgate kernel: (scsi0:0:4:0) Synchronous at 10.0 Mbyte/sec, offset 15.
kgate kernel: sr1: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda
tray

Then I used "cdrecord-1.8.1" to simulate writing at "speed=8". It
worked so far, but there was a warning about possible problems with
"simulated fixation", and actually several minutes nothing happened
while the simulated fixation was expected to take place.

At some point I hit ^C, returning to the prompt. As the device did not
seem to be ready, I thought "remove the device and reconnect", so I did
"remove-single-device" (possibly while a command was still "busy"). The
remove suceeded, but a second later everything had stopped!

Should a device with busy commands be able to be removed? I guess no...

The last message in the syslog was:

kgate kernel: scsi : aborting command due to timeout : pid 8358,
scsi0, channel 0, id 4, lun 0 UNKNOWN(0x5b) 00 02 00 00 00 00 00 00 00

At that point I pressed "RESET", and interestingly the builtin BIOS of
the Adaptec 2740 (EISA) hung while trying to detect the device.

Only after powering down both, the CD writer and the machine (a HP
Netserver LD Pro), the BIOS detected the device again. So I guess
something badly hung...

The driver being used was
Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.31/3.2.4

After that, everything worked fine.

Regards,
Ulrich


2000-11-24 09:22:28

by Steffen Grunewald

[permalink] [raw]
Subject: Re: 2.2.16: How to freeze the kernel

On Fri 2000-11-24 (09:10), Ulrich Windl wrote:
> Hello,
>
> At that point I pressed "RESET", and interestingly the builtin BIOS of
> the Adaptec 2740 (EISA) hung while trying to detect the device.
>
> Only after powering down both, the CD writer and the machine (a HP
> Netserver LD Pro), the BIOS detected the device again. So I guess
> something badly hung...

For bad sector management I use an old EISA 486 also equipped with
a 2740 card, this one show the same behaviour when I run scsifmt
under DOS and get an old drive with incorrect termination or a bad
controller. Have to switch everything off, even the RESET switch
won't help. I'd blame the BIOS only.

BTW, the built in SCSI utilities are buggy - one cannot rescan the bus...

Are there still upgrades (EISA is meant to be upgradeable) floating
around?

Steffen
--
Steffen Grunewald ** GeoForschungsZentrum Potsdam ** Projektbereich 2.2
Telegrafenberg E3 ** D-14473 Potsdam ** Germany / Duitsland / Allemagne
office: steffen(at)gfz-potsdam.de ** home: steffen.grunewald(at)gmx.de
Hi! I'm a .signature virus. Copy me to your sig and help me spread!

2000-11-24 15:43:45

by Douglas Gilbert

[permalink] [raw]
Subject: Re: 2.2.16: How to freeze the kernel

Ulrich Windl wrote:
>
> Hello,
>
> this is for your interest, amusement, and for "what not to do":
>
> I managed to freeze the kernel (2.2.16 from SuSE Linux 7.0) in a way
> that I could not even switch virtual consoles. Completely silent
> eberything...
>
> It all started when Windows/95 ruined another CD-R while trying to
> write an image to the media. So I decided to try it with Linux, using
> the same CD writer.
>
> I plugged the device to the so far unused SCSI channel and used the
> "add-sigle-device" method to avoid reboot, and I succeeded:
>
> kgate kernel: scsi singledevice 0 0 4 0
> kgate kernel: Vendor: WAITEC Model: WT624 Rev: 7.0F
> kgate kernel: Type: CD-ROM ANSI SCSI
> revision: 0
> kgate kernel: Detected scsi CD-ROM sr1 at scsi0, channel 0, id 4, lun 0
> kgate kernel: (scsi0:0:4:0) Synchronous at 10.0 Mbyte/sec, offset 15.
> kgate kernel: sr1: scsi3-mmc drive: 24x/24x writer cd/rw xa/form2 cdda
> tray
>
> Then I used "cdrecord-1.8.1" to simulate writing at "speed=8". It
> worked so far, but there was a warning about possible problems with
> "simulated fixation", and actually several minutes nothing happened
> while the simulated fixation was expected to take place.

Evidently some media/cdwriters don't like "simulated
fixation" hence the comment from cdrecord. In your
case the warning seems well founded.

When cdrecord issues the SCSI command to fixate
(0x5b on my Yamaha) it sets a long timeout (480.5
seconds (8 minutes)). To find out the current
state (in the lk 2.4 series) try:

$ cat /proc/scsi/sg/debug
dev_max(currently)=9 max_active_device=3 (origin 1)
scsi_dma_free_sectors=512 sg_pool_secs_aval=320 def_reserved_size=32768
>>> device=sg2 scsi2 chan=0 id=6 lun=0 em=0 sg_tablesize=255 excl=0
FD(1): timeout=480500ms bufflen=32768 (res)sgat=0 low_dma=0
cmd_q=0 f_packid=0 k_orphan=0 closed=0
act: id=4054 blen=0 t_o/elap=480500/9920ms sgat=0 op=0x5b

$ cat /proc/scsi/sg/debug
dev_max(currently)=9 max_active_device=3 (origin 1)
scsi_dma_free_sectors=512 sg_pool_secs_aval=320 def_reserved_size=32768
>>> device=sg2 scsi2 chan=0 id=6 lun=0 em=0 sg_tablesize=255 excl=0
FD(1): timeout=480500ms bufflen=32768 (res)sgat=0 low_dma=0
cmd_q=0 f_packid=0 k_orphan=0 closed=0
act: id=4054 blen=0 t_o/elap=480500/13840ms sgat=0 op=0x5b

The last line of these 2 commands shows that SCSI command
0x5b is active with a timeout value of 480500 milliseconds.
9.92 seconds has elapsed when the first 'cat'
was executed and 13.8 seconds had elapsed when
the second one was executed. In my test case
the "dummy fixate" concluded successfully. But what if
it locked up the cdwriter, as the warning hints at?

There is no way that I know of to cancel
a command once it is "active". The timeout
will get it (usually by the brute force
technique of resetting the SCSI bus). [I
have toyed with the idea of trying to shorten
the timeout of an active command.]

> At some point I hit ^C, returning to the prompt. As the device did not
> seem to be ready, I thought "remove the device and reconnect", so I did
> "remove-single-device" (possibly while a command was still "busy"). The
> remove suceeded, but a second later everything had stopped!

Things _not_ to do while there is an active
SCSI command still executing:
- remove a module that it is using
(e.g. sg, aic7xxx, scsi_mod).
In most (but not all) cases rmmod will
report the module is busy.
- use remove-single-device

> Should a device with busy commands be able to be removed? I guess no...

Correct.

> The last message in the syslog was:
>
> kgate kernel: scsi : aborting command due to timeout : pid 8358,
> scsi0, channel 0, id 4, lun 0 UNKNOWN(0x5b) 00 02 00 00 00 00 00 00 00
>
> At that point I pressed "RESET", and interestingly the builtin BIOS of
> the Adaptec 2740 (EISA) hung while trying to detect the device.

In my experience cdwriters are not always well behaved
SCSI devices and can lockup and not respond to SCSI
bus resets. This means you need to power cycle them
to get them back into a functional mode. It is also
a good reason _not_ to have SCSI cdwriters (and scanners)
on the same SCSI bus as high speed modern SCSI disks.
Luckily they tend to use different SCSI parallel bus
types.

> Only after powering down both, the CD writer and the machine (a HP
> Netserver LD Pro), the BIOS detected the device again. So I guess
> something badly hung...
>
> The driver being used was
> Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.31/3.2.4
>
> After that, everything worked fine.

Thanks for the report. Hopefully everything worked ok
when you did _not_ use the "dummy" option in cdrecord.

Doug Gilbert