2002-02-19 17:57:11

by Glover George

Subject: st0: Block limits 1 - 16777215 bytes.

I've been experiencing mysterious lockups since upgrading to kernel
2.4.17. Looking in /var/log/messages, I hadn't seen anything
suspicious until now. I guess the machine only occasionally has time to
write this to disk before it locks up. The message is:

Feb 19 11:29:55 butler kernel: st0: Block limits 1 - 16777215 bytes.

I noticed this after rebooting following the crash. So I tried manually
doing a tar to the tape drive and was able to lock the machine up.
Can someone help me understand this message? And if it is simply a
limit problem, why would the machine lock up?

Thank you.

Glover George
Systems/Networks Admin
Gulf Sales & Supply, Inc.
(228) 762-0268
[email protected]
http://www.gulfsales.com



2002-02-20 01:16:05

by Mr. James W. Laferriere

Subject: Re: st0: Block limits 1 - 16777215 bytes.


Hello Glover, I get the same messages & do not get system
lock ups. You might try sending a little more info about
what is going on around any of the lock ups. If you can
reliably lock up the system by accessing the tape drive,
then send some info to the list about the system & the tape
drive and how it is attached to the system. Hth, JimL
ps: All of my disk & tape drives are scsi.

Tyan Thunder HE-SL Dual s370 M.B.
# lspci
00:00.0 Host bridge: Relience Computer CNB20HE (rev 23)
00:00.1 PCI bridge: Relience Computer CNB20HE (rev 01)
00:00.2 Host bridge: Relience Computer: Unknown device 0006 (rev 01)
00:00.3 Host bridge: Relience Computer: Unknown device 0006 (rev 01)
00:01.0 SCSI storage controller: Symbios Logic Inc. (formerly NCR): Unknown device 0021 (rev 01)
00:01.1 SCSI storage controller: Symbios Logic Inc. (formerly NCR): Unknown device 0021 (rev 01)
BTW these are a Symbios 53c1010 chip on board .
00:02.0 Multimedia audio controller: Ensoniq ES1371 [AudioPCI-97] (rev 09)
00:04.0 PCI bridge: Intel Corporation 80960RP [i960 RP Microprocessor/Bridge] (rev 05)
00:04.1 RAID bus controller: Mylex Corporation DAC960PX (rev 05)
Is a DAC960PL
00:07.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
00:0f.0 ISA bridge: Relience Computer: Unknown device 0200 (rev 51)
00:0f.1 IDE interface: Relience Computer: Unknown device 0211
00:0f.2 USB Controller: Relience Computer: Unknown device 0220 (rev 04)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage 128 PF

# cat /proc/cpuinfo Below x 2 .

processor : 0 ( & 1 )
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 10
cpu MHz : 849.158
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 1690.82


On Tue, 19 Feb 2002, Glover George wrote:

> I've been experiencing mysterious lockups since upgrading to kernel
> 2.4.17. Looking in /var/log/messages, I hadn't seen anything
> suspicious until now. I guess the machine only occasionally has time to
> write this to disk before it locks up. The message is:
>
> Feb 19 11:29:55 butler kernel: st0: Block limits 1 - 16777215 bytes.
>
> I noticed this after rebooting following the crash. So I tried manually
> doing a tar to the tape drive and was able to lock the machine up.
> Can someone help me understand this message? And if it is simply a
> limit problem, why would the machine lock up?
>
> Thank you.
>
> Glover George
> Systems/Networks Admin
> Gulf Sales & Supply, Inc.
> (228) 762-0268
> [email protected]
> http://www.gulfsales.com
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| [email protected] | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+

2002-02-20 14:56:44

by Glover George

Subject: RE: st0: Block limits 1 - 16777215 bytes.

Ok, after playing with it a little more I found out that the message I'm
getting about the block sizes isn't related to the lockups. I can lock
the system up by tar'ing up the /proc directory (why are you tar'ing the
/proc directory!!! I know!!! But that's not the point). I had no
problem with the kernel supplied with RH 7.2 (2.4.7-10). However, this
is 2.4.17 (with the linux-abi patch).

I have been able to make successful backups as long as I skip the /proc
directory, but something must be wrong. Doing an "ls -la *" doesn't lock
the machine, though; only tar'ing it does (I suppose because of a read).
It doesn't lock up consistently in the same place when reading from the
proc directory, but it is always somewhere in proc. I made about 15 test
runs and they all died in proc, and with --exclude proc the backup
doesn't lock up somewhere else.
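
One way to narrow down which entry triggers the hang can be sketched as
below. This is only an illustration, not something from the thread:
the function name probe_reads and its two arguments are hypothetical,
and it assumes that a log line written and flushed with sync before
each read survives the hard lock. Run it only on a machine you can
afford to lock up.

```shell
#!/bin/sh
# Read every regular file under a directory, logging each path BEFORE
# the read and flushing the log to disk, so that the last logged path
# should survive a hard lock. probe_reads is a hypothetical helper name.
probe_reads() {
    dir=$1   # directory to walk, e.g. /proc
    log=$2   # log file on a real disk, outside $dir
    # -xdev keeps find on the file-system that $dir lives on.
    find "$dir" -xdev -type f 2>/dev/null | while IFS= read -r f; do
        printf '%s\n' "$f" >> "$log"
        sync
        # Read at most 64 KiB of each file; ignore unreadable files.
        dd if="$f" of=/dev/null bs=1024 count=64 2>/dev/null || true
    done
}
```

After a lock-up and reboot, the last line of the log names the file
that was being read, for example after "probe_reads /proc
/root/probe.log".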

I guess the question (and I've had it for a while) is: how do I get
the system to log more info for me? I'm a little new at debugging the
kernel, but I did compile most of the debugging options into a test
kernel. However, the system hard locks with no oops or anything in the
logs. The system must be shut down by holding the power button in for 4
seconds. I know this must not be a huge problem because no one has
mentioned it, so I'd like to do some more investigating on my part.
Thanks for your help.

> Hello Glover, I get the same messages & do not get system
> lock ups. You might try sending a little more info about
> what is going on around any of the lock ups. If you can
> reliably lock up the system by accessing the tape drive,
> then send some info to the list about the system & the tape
> drive and how it is attached to the system. Hth, JimL
> ps: All of my disk & tape drives are scsi.

2002-02-20 15:19:34

by Richard B. Johnson

Subject: RE: st0: Block limits 1 - 16777215 bytes.

On Wed, 20 Feb 2002, Glover George wrote:

> Ok, after playing with it a little more I found out that the message I'm
> getting about the block sizes isn't related to the lockups. I can lock
> the system up by tar'ing up the /proc directory (why are you tar'ing the
> /proc directory!!! I know!!! But that's not the point). I had no
> problem with the kernel supplied with RH 7.2 (2.4.7-10). However, this
> is 2.4.17 (with the linux-abi patch).
>
> I have been able to make successful backups as long as I skip the /proc
> directory, but something must be wrong. Doing an "ls -la *" doesn't lock
> the machine, though; only tar'ing it does (I suppose because of a read).
> It doesn't lock up consistently in the same place when reading from the
> proc directory, but it is always somewhere in proc. I made about 15 test
> runs and they all died in proc, and with --exclude proc the backup
> doesn't lock up somewhere else.

You do not tar /proc! There is kcore there! `tar` thinks it's a real
file. Reading (accessing) some kernel areas will cause a deadlock.

If you don't want to --exclude proc, then `umount` it before your
backups. FYI, it's SOP to back up each mounted file-system separately so
you don't end up backing up N disks onto a single medium. Therefore
your `tar` sequence would be something like:

    tar -czlf root.tar.gz /
    tar -czlf user.tar.gz /user
           |________ stay on the same file-system (newer GNU tar
                     spells this --one-file-system; there -l means
                     --check-links instead).
    ..etc..

Since /proc is a separate file-system, tar never descends into it, so
you never have problems like the one you describe, and the mount-point
itself still gets backed up as required.
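
The per-file-system approach above can be sketched as a small helper
that reads a mounts table and prints one tar command per real
file-system. This is a sketch under assumptions, not Dick's own
script: the name plan_backups, the list of skipped file-system types,
and the archive naming are all illustrative. It uses the long option
--one-file-system because the meaning of -l differs between GNU tar
versions.

```shell
#!/bin/sh
# Print (do not run) one tar command per real file-system found in a
# mounts table in /proc/mounts format:
#   device mountpoint fstype options dump pass
# plan_backups, the skip list, and the archive names are assumptions
# made for illustration only.
plan_backups() {
    mounts_file=${1:-/proc/mounts}
    while read -r dev mnt fstype rest; do
        # Skip pseudo file-systems such as /proc.
        case "$fstype" in
            proc|sysfs|devpts|tmpfs|devfs|usbdevfs|rootfs) continue ;;
        esac
        # Archive name from the mount point: / -> root, /user -> user.
        name=$(printf '%s' "$mnt" |
            sed -e 's|^/$|root|' -e 's|^/||' -e 's|/|_|g')
        printf 'tar -czf %s.tar.gz --one-file-system %s\n' "$name" "$mnt"
    done < "$mounts_file"
}
```

Called with no argument, plan_backups reads /proc/mounts. It only
prints the commands, so the list can be reviewed before anything is
written to tape or disk.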


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

111,111,111 * 111,111,111 = 12,345,678,987,654,321