2001-04-03 04:17:14

by Pete Toscano

Subject: Stuck: What to do with solid locks?

Hello,

Three times since I upgraded to 2.4.3, and at least once in the 2.4.3-pre
series, my machine has locked up completely. I've got KDB running on
a serial console (as I've been seeing lots of crashes, usually in the
scsi_eh_0 process) and even this fails to pick up anything wrong. It's
just a completely hard crash.

I don't know what to do or how to go about collecting info for debugging
purposes. I run Win2k and FreeBSD on it too and while Win2k does crash
on it sometimes, it doesn't crash anywhere near as much as Linux.
FreeBSD doesn't crash at all. Unfortunately for me, I prefer Linux for
my machine, so ditching Linux for one of these is not a preferred
option.

Anyway, I'm stumped. The magic SysRq stuff doesn't respond either.
Any ideas on how to debug this?

My HW is: dual P3 600, 640M RAM, IEEE1394 PCI card, G200 (and G400
sometimes) AGP card, Promise Ultra66 PCI card, Adaptec 2940UW PCI card,
SB Live (EMU10k1) card, and a Linksys 10/100 PCI ethernet card.
Additionally, I have two USB hubs plugged into my PC. One of these hubs
has a USB mouse and a (USB) SanDisk SDDR-31 plugged into it. The other
hub (sometimes) has a Rio 500 plugged into it. I have a generic ATAPI
CDROM (the Kenwood True-X 72x was very flaky when using IDE-SCSI)
connected to the Promise card (yes, I know it doesn't do UDMA4) and a
Plextor 12/4/32 SCSI CD-RW connected to the Adaptec card. Finally, I
have two hard drives with Linux partitions on both. Almost all of these
partitions are ReiserFS, with /, /var, and /boot being EXT2.

Attached is my .config file. I'm using stock 2.4.3 with the most recent
KDB patch and the newest AIC7xxx driver (6.1.8).

I'm using RH7.0 with XFree 4.0.1. I've upgraded packages as per the
Changes docs. I'm also using the most recent modutils (2.4.5). Every
crash has happened in X, but then, I almost always work in X on this
workstation, so the only place for them to happen would be in X. I'll
see about leaving it on overnight without running X.

This is very frustrating. I really, really want to be able to start
doing something on my workstation without having to worry every time
about it crashing.

Thanks,
pete


Attachments:
(No filename) (2.12 kB)
.config (18.44 kB)

2001-04-03 12:23:40

by Alan

Subject: Re: Stuck: What to do with solid locks?

> This is very frustrating. I really, really want to be able to start
> doing something on my workstation without having to worry everytime
> about it crashing.

Then install 2.2.19. 2.4.x isn't stable yet. If you have the time, then oopses
and debugging data are wonderful; if not, then 2.2 is stable.


Alan

2001-04-03 14:33:47

by Pete Toscano

Subject: Re: Stuck: What to do with solid locks?

Oh, I realize this. I don't mind and even expect the occasional crash
right now in the 2.4.x series, but the frequency of these crashes falls
into the "frequent" category. I know that if I want a much more stable
system, I should go back to 2.2.19, but I'd prefer to stick it out with
2.4.x and help by collecting data. The data collection part is all (I
think) I can do, as I don't know the first place to begin when it comes
to fixing most kernel problems. I know it's not much, but it's about
all I can do to give something back to the Linux community and I'd
really like to help. The message I wrote last night was a bit too whiny
as I had just had three crashes/locks within a fairly short period of
time.

The most frustrating part is these solid locks. I don't even have KDB
to nose about the system with. Even when the system does crash to KDB,
I don't get Oops messages, just "kernel cannot handle NULL paging
request"-sort of stuff. Nothing ever gets logged to (k)syslogd (or, at
least, handled by (k)syslogd). Keith Owens has been great about helping
me use KDB to try to collect data for people who might be able to track
down bugs, but if I can't get into KDB even, then I have no idea where
to begin to help fix this problem (or these problems).

pete

On Tue, 03 Apr 2001, Alan Cox wrote:

> > This is very frustrating. I really, really want to be able to start
> > doing something on my workstation without having to worry everytime
> > about it crashing.
>
> Then install 2.2.19. 2.4.x isnt stable yet. If you have the time then oopses
> and debugging data are wonderful if not then 2.2 is stable.
>
>
> Alan
>

2001-04-05 07:02:58

by Colonel

Subject: Re: Stuck: What to do with solid locks?


In kernel.list, you wrote:
>
>Oh, I realize this. I don't mind and even expect the occational crash
>right now in the 2.4.x series, but the frequency of these crashes fall

Well, you say this, but
...more whiny post deleted...

>to begin to help fix this problem (or these problems).

Twice your tone plays different from your words. A scan of lkml
should have shown you that your problem is not a major problem, it's
far more likely to be unique than general.

--------------------------------------------------------

1) Try a different distribution. RH is bleeding edge at times and the
problems may not be entirely within the kernel. For example,
Slackware-current is a base (without any additions it runs a 2.4
kernel) that I've used on 4 machines now, and the only problem was the
loopback fs hang of a month ago.

2) Remove drivers & hardware goodies to see if stability improves.
Change your typical application load to see what happens. I actually
do this the other way: run a simple kernel and then add to it.

3) There are a lot of 2.4 kernels, over 40 variants. Look through the
ChangeLogs; maybe your hardware is mentioned someplace. Try them to
see if stability improves.


In short, try to reduce the possible areas for a bug, ideally getting
to a point where you can state: 2.4.X-Y with AAA locks while 2.4.X-Y
without AAA does not lock. That will bring more attention.

oh, and

4) boot into your last working kernel when you want to accomplish
something.
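
For what it's worth, the narrowing-down in points 2 and 3 can be
written out as a small bookkeeping sketch. This is only an
illustration, not a tool anyone in this thread used: the function
names, the version list and the driver labels are all made up for the
example, and locks_up() stands in for the manual step of building
that kernel, running it for a while and noting whether it hung. Since
the locks are intermittent, a "did not lock" answer is only as good
as the time you gave it.

```python
from typing import Callable, Optional, Sequence


def first_bad_version(versions: Sequence[str],
                      locks_up: Callable[[str], bool]) -> Optional[str]:
    """Point 3: binary-search an oldest-to-newest list of kernels.

    Assumes every kernel before the culprit is good and every kernel
    from the culprit onward locks. Returns None if none of them lock.
    """
    lo, hi = 0, len(versions)      # first bad index lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if locks_up(versions[mid]):
            hi = mid               # culprit is at mid or earlier
        else:
            lo = mid + 1           # culprit is later than mid
    return versions[lo] if lo < len(versions) else None


def first_bad_addition(base: Sequence[str],
                       extras: Sequence[str],
                       locks_up: Callable[[Sequence[str]], bool]) -> Optional[str]:
    """Point 2, done "the other way": start simple, add one item at a time.

    Returns the first added item after which the system locks, or None.
    """
    enabled = list(base)
    for extra in extras:
        enabled.append(extra)
        if locks_up(enabled):
            return extra
    return None


if __name__ == "__main__":
    # Hypothetical data; in practice each answer comes from a real test run.
    kernels = ["2.4.2", "2.4.3-pre1", "2.4.3-pre2", "2.4.3-pre3", "2.4.3"]
    known_bad = {"2.4.3-pre3", "2.4.3"}
    print(first_bad_version(kernels, lambda v: v in known_bad))

    drivers = ["usb", "ieee1394", "emu10k1"]   # labels only, not config symbols
    print(first_bad_addition(["scsi"], drivers, lambda cfg: "ieee1394" in cfg))
```

The result of either function is the "2.4.X-Y with AAA locks, without
AAA does not" statement asked for above. The binary search needs far
fewer test runs than trying every kernel in order, but it only works
if the lockups start at one version and persist in all later ones.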