2003-07-06 16:20:32

by Stephan von Krawczynski

Subject: 2.4.22-pre3 and reiserfs boot problem

Hello,

I just tried 2.4.22-pre3 and found out I cannot boot my test box any more. It
halts at:

reiserfs: found format "3.6" with standard journal

on a partition located on aic7xxx based hd.

Booting the box with pre2 works perfectly well.
Anything I should try? What information is needed?

Regards,
Stephan


2003-07-06 18:00:06

by Chris Mason

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Sun, 2003-07-06 at 12:34, Stephan von Krawczynski wrote:
> Hello,
>
> I just tried 2.4.22-pre3 and found out I cannot boot my test box any more. It
> halts at:
>
> reiserfs: found format "3.6" with standard journal
>
> on a partition located on aic7xxx based hd.
>
> Booting the box with pre2 works perfectly well.
> Anything I should try? What information is needed?

Same config here (reiserfs 3.6.x, aic7xxx drive), works without
problems. Is reiserfs or aic7xxx compiled as a module in your setup?
If so did you remember to install the new modules and mkinitrd (if
required).

Is reiserfs on your root drive? If not can you please boot into single
user mode, enable sysrq, try the mount again and get the decoded output
from sysrq-p and sysrq-t if it hangs.

-chris


2003-07-06 21:10:13

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On 06 Jul 2003 14:13:44 -0400
Chris Mason <[email protected]> wrote:

> On Sun, 2003-07-06 at 12:34, Stephan von Krawczynski wrote:
> > Hello,
> >
> > I just tried 2.4.22-pre3 and found out I cannot boot my test box any more.
> > It halts at:
> >
> > reiserfs: found format "3.6" with standard journal
> >
> > on a partition located on aic7xxx based hd.
> >
> > Booting the box with pre2 works perfectly well.
> > Anything I should try? What information is needed?
>
> Same config here (reiserfs 3.6.x, aic7xxx drive), works without
> problems. Is reiserfs or aic7xxx compiled as a module in your setup?
> If so did you remember to install the new modules and mkinitrd (if
> required).

Hello Chris,

no, there are no modules involved here. The problem arises not on my root
partition (which happens to be reiserfs, too), that seems to be ok. Problems
seem to come up on sda4, a simple data partition.

> Is reiserfs on your root drive? If not can you please boot into single
> user mode, enable sysrq, try the mount again and get the decoded output
> from sysrq-p and sysrq-t if it hangs.
>
> -chris

I am going to try with single user tomorrow morning.

Regards,
Stephan

2003-07-07 09:50:51

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

Hello Chris,

I have to correct myself regarding the problem. It does _not_ arise on
aic-driven scsi-disk, but on 3ware-driven RAID5 with 320 GB data. I tried to
mount it by hand, but that does not work either. It does not look like there is
a lot of work going on on the hds while mounting (no LEDs).
Can anyone reproduce with equal setup?
I try to come up with further information...
--
Regards,
Stephan

2003-07-09 11:47:07

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On 06 Jul 2003 14:13:44 -0400
Chris Mason <[email protected]> wrote:

> [...]
> Is reiserfs on your root drive? If not can you please boot into single
> user mode, enable sysrq, try the mount again and get the decoded output
> from sysrq-p and sysrq-t if it hangs.
>
> -chris

Hello Chris,

I tried to produce some useful output but failed. Additional findings:

- pre3-ac1 has the same problem
- the box _hangs_ in fact, no sysrq is working.
(you need hw-reset to revive the box)
- I can see no disk activity on the 3ware RAID in question
- It always shows up, completely reproducible
- It shows during boot and during single- or multiuser (mount from console)

Regards,
Stephan

2003-07-09 13:22:32

by Chris Mason

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Wed, 2003-07-09 at 08:01, Stephan von Krawczynski wrote:
> On 06 Jul 2003 14:13:44 -0400
> Chris Mason <[email protected]> wrote:
>
> > [...]
> > Is reiserfs on your root drive? If not can you please boot into single
> > user mode, enable sysrq, try the mount again and get the decoded output
> > from sysrq-p and sysrq-t if it hangs.
> >
> > -chris
>
> Hello Chris,
>
> I tried to produce some useful output but failed. Additional findings:
>
> - pre3-ac1 has the same problem
> - the box _hangs_ in fact, no sysrq is working.
> (you need hw-reset to revive the box)
> - I can see no disk activity on the 3ware RAID in question
> - It always shows up, completely reproducible
> - It shows during boot and during single- or multiuser (mount from console)

Step one is to figure out if the problem is reiserfs or 3ware. Instead
of mounting the filesystem, run debugreiserfs -d /dev/xxxx > /dev/null
and see if you still hang.

This will read the FS metadata and should generate enough io to trigger
the hang if it is a 3ware problem.

(I'm on vacation for a few days, so Oleg is cc'd)

-chris


2003-07-09 13:34:05

by Oleg Drokin

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

Hello!

On Wed, Jul 09, 2003 at 09:36:04AM -0400, Chris Mason wrote:
> > > Is reiserfs on your root drive? If not can you please boot into single
> > > user mode, enable sysrq, try the mount again and get the decoded output
> > > from sysrq-p and sysrq-t if it hangs.
> > I tried to produce some useful output but failed. Additional findings:
> > - pre3-ac1 has the same problem
> > - the box _hangs_ in fact, no sysrq is working.
> > (you need hw-reset to revive the box)

This complete hang is _very_ suspicious. Usually you cannot achieve such
results without touching hardware.

> > - I can see no disk activity on the 3ware RAID in question

After the lockup? This is kind of expected ;)

> > - It always shows up, completely reproducible
> > - It shows during boot and during single- or multiuser (mount from console)
> Step one is to figure out if the problem is reiserfs or 3ware. Instead
> of mounting the filesystem, run debugreiserfs -d /dev/xxxx > /dev/null
> and see if you still hang.
> This will read the FS metadata and should generate enough io to trigger
> the hang if it is a 3ware problem.

Or if this one succeeds, then maybe reiserfsck --check /dev/xxxx to get
journal replayed. This is in case access pattern matters.

Bye,
Oleg

2003-07-09 13:43:57

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Wed, 9 Jul 2003 17:48:37 +0400
Oleg Drokin <[email protected]> wrote:

> Hello!
>
> On Wed, Jul 09, 2003 at 09:36:04AM -0400, Chris Mason wrote:
> > > > Is reiserfs on your root drive? If not can you please boot into single
> > > > user mode, enable sysrq, try the mount again and get the decoded output
> > > > from sysrq-p and sysrq-t if it hangs.
> > > I tried to produce some useful output but failed. Additional findings:
> > > - pre3-ac1 has the same problem
> > > - the box _hangs_ in fact, no sysrq is working.
> > > (you need hw-reset to revive the box)
>
> This complete hang is _very_ suspicious. Usually you cannot achieve such
> results without touching hardware.

Well, I did a few more tries and it looks like this:
- enter mount command
- short blinking on the RAID disks
- about 1/2 second later box hangs

> > Step one is to figure out if the problem is reiserfs or 3ware. Instead
> > of mounting the filesystem, run debugreiserfs -d /dev/xxxx > /dev/null
> > and see if you still hang.
> > This will read the FS metadata and should generate enough io to trigger
> > the hang if it is a 3ware problem.

Ok, I tried this. debugreiserfs runs without any problems. Disks show quite an
activity, the whole thing lasts 1-2 minutes.

mount afterwards shows the same hang.

> Or if this one succeeds, then maybe reiserfsck --check /dev/xxxx to get
> journal replayed. This is in case access pattern matters.

I can try that, too. What do you expect to see?

Regards,
Stephan

2003-07-09 13:45:00

by Vincent Touquet

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Wed, Jul 09, 2003 at 02:01:38PM +0200, Stephan von Krawczynski wrote:
>I tried to produce some useful output but failed. Additional findings:
>
>- pre3-ac1 has the same problem
>- the box _hangs_ in fact, no sysrq is working.
> (you need hw-reset to revive the box)
>- I can see no disk activity on the 3ware RAID in question
>- It always shows up, completely reproducible
>- It shows during boot and during single- or multiuser (mount from console)
>
>Regards,
>Stephan

Which mainboard do you use ?
I'm having endless pain with a 3ware raid and Tyan mainboards, so much I
really really want to replace it with anything else that is stable
(probably a single Intel P4).

v

2003-07-09 13:56:38

by Oleg Drokin

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

Hello!

On Wed, Jul 09, 2003 at 03:58:03PM +0200, Stephan von Krawczynski wrote:
> > > Step one is to figure out if the problem is reiserfs or 3ware. Instead
> > > of mounting the filesystem, run debugreiserfs -d /dev/xxxx > /dev/null
> > > and see if you still hang.
> > > This will read the FS metadata and should generate enough io to trigger
> > > the hang if it is a 3ware problem.
> Ok, I tried this. debugreiserfs runs without any problems. Disks show quite an
> activity, the whole thing lasts 1-2 minutes.
> mount afterwards shows the same hang.

Hm.

> > Or if this one succeeds, then maybe reiserfsck --check /dev/xxxx to get
> > journal replayed. This is in case access pattern matters.
> I can try that, too. What do you expect to see?

Well, it will either hang or not, I think.
If it won't hang, this will complicate matters.
Then next step would be probably to try and mount the partition from usermodelinux if you are able
to conduct such a test.
I am still pretty skeptical about the possibility that recent reiserfs changes broke stuff.

Bye,
Oleg

2003-07-09 14:06:03

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Wed, 9 Jul 2003 16:01:40 +0200
Vincent Touquet <[email protected]> wrote:

> Which mainboard do you use ?
> I'm having endless pain with a 3ware raid and Tyan mainboards, so much I
> really really want to replace it with anything else that is stable
> (probably a single Intel P4).
>
> v

Sorry, the problem only arises with 2.4.22-pre3, and nothing before. So it is
definitely software-related.

Regards,
Stephan

2003-07-09 14:12:39

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Wed, 9 Jul 2003 18:11:11 +0400
Oleg Drokin <[email protected]> wrote:

> Hello!
>
> On Wed, Jul 09, 2003 at 03:58:03PM +0200, Stephan von Krawczynski wrote:
> > > > Step one is to figure out if the problem is reiserfs or 3ware. Instead
> > > > of mounting the filesystem, run debugreiserfs -d /dev/xxxx > /dev/null
> > > > and see if you still hang.
> > > > This will read the FS metadata and should generate enough io to trigger
> > > > the hang if it is a 3ware problem.
> > Ok, I tried this. debugreiserfs runs without any problems. Disks show quite
> > an activity, the whole thing lasts 1-2 minutes.
> > mount afterwards shows the same hang.
>
> Hm.
>
> > > Or if this one succeeds, then maybe reiserfsck --check /dev/xxxx to get
> > > journal replayed. This is in case access pattern matters.
> > I can try that, too. What do you expect to see?
>
> Well, it will either hang or not, I think.
> If it won't hang, this will complicate matters.
> Then next step would be probably to try and mount the partition from
> usermodelinux if you are able to conduct such a test.
> I am still pretty skeptical about the possibility that recent reiserfs
> changes broke stuff.
>
> Bye,
> Oleg

ok, I did the reiserfsck and it works flawlessly. No errors no problems no
hang.
I tried mount afterwards and it still hangs.
Is there some recent change around the mount procedure itself? maybe it is
really unrelated to reiserfs and 3ware...

Regards,
Stephan

PS to Marcelo:
There is a problem with 2.4.22-pre3. I cannot mount a reiserfs data-partition
with 320 GB size located on a 3ware RAID. It just hangs the box, during init or
any runlevel I tried. It is completely reproducible, but debugreiserfs on the
partition and reiserfsck both show no problems at all ...
The thing mounts flawlessly under 2.4.22-pre2 and below.


2003-07-09 16:22:13

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Wed, 9 Jul 2003 18:40:15 +0300
Ville Herva <[email protected]> wrote:

> > ok, I did the reiserfsck and it works flawlessly. No errors no problems no
> > hang.
> > I tried mount afterwards and it still hangs.
> > Is there some recent change around the mount procedure itself? maybe it is
> > really unrelated to reiserfs and 3ware...
>
> Is it just this partition or any reiserfs fs on 3ware?

Hm, unfortunately I can't tell, I have no other partition available on 3ware
...

> > Oleg Drokin <[email protected]> wrote:
> > > Then next step would be probably to try and mount the partition from
> > > usermodelinux if you are able to conduct such a test.
>
> Is it possible to mount raw partitions with UML?

Hm, I never tried UML. I really wonder if there is nobody else with 3ware and
reiserfs available for re-checking 2.4.22-pre3. Only to see if this is a
singular problem or reproducible elsewhere.

Regards,
Stephan

2003-07-09 17:09:48

by Marcelo Tosatti

Subject: Re: 2.4.22-pre3 and reiserfs boot problem



On Wed, 9 Jul 2003, Stephan von Krawczynski wrote:

> On Wed, 9 Jul 2003 18:11:11 +0400
> ok, I did the reiserfsck and it works flawlessly. No errors no problems no
> hang.
> I tried mount afterwards and it still hangs.
> Is there some recent change around the mount procedure itself? maybe it is
> really unrelated to reiserfs and 3ware...
>
> Regards,
> Stephan
>
> PS to Marcelo:
> There is a problem with 2.4.22-pre3. I cannot mount a reiserfs data-partition
> with 320 GB size located on a 3ware RAID. It just hangs the box, during init or
> any runlevel I tried. It is completely reproducible, but debugreiserfs on the
> partition and reiserfsck both show no problems at all ...
> The thing mounts flawlessly under 2.4.22-pre2 and below.

There are no 3ware changes in pre3. So it must be reiserfs or something
else. Let's try reverting the reiserfs patches to see if they are the
cause?

Attached are files rei1, rei2, and rei3 (all gzip compressed).

They are the three reiserfs changesets which have been included in -pre3.

Can you please revert them (with patch -R) and try to reproduce the
problem?

Thanks


Attachments:
rei1.gz (20.61 kB)
rei2.gz (4.26 kB)
rei3.gz (17.01 kB)

2003-07-10 11:07:34

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Wed, 9 Jul 2003 14:18:37 -0300 (BRT)
Marcelo Tosatti <[email protected]> wrote:

> > PS to Marcelo:
> > There is a problem with 2.4.22-pre3. I cannot mount a reiserfs
> > data-partition with 320 GB size located on a 3ware RAID. It just hangs the
> > box, during init or any runlevel I tried. It is completely reproducible,
> > but debugreiserfs on the partition and reiserfsck both show no problems at
> > all ... The thing mounts flawlessly under 2.4.22-pre2 and below.
>
> There are no 3ware changes in pre3. So it must be reiserfs or something
> else. Let's try reverting the reiserfs patches to see if they are the
> cause?
>
> Attached are files rei1, rei2, and rei3 (all gzip compressed).

I reverted all three patches and the problem stays just the same. I guess this
makes it a lot more likely that the problem lies elsewhere.

If you want me to try others, just send them to me like these three.

Thanks for your support
Stephan

2003-07-10 11:46:21

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Wed, 9 Jul 2003 14:18:37 -0300 (BRT)
Marcelo Tosatti <[email protected]> wrote:

> Can you please revert them (with patch -R) and try to reproduce the
> problem?
>
> Thanks

In addition I can tell you that the problem is also visible with -pre4

Regards,
Stephan

2003-07-10 12:03:06

by Marcelo Tosatti

Subject: Re: 2.4.22-pre3 and reiserfs boot problem



On Thu, 10 Jul 2003, Stephan von Krawczynski wrote:

> On Wed, 9 Jul 2003 14:18:37 -0300 (BRT)
> Marcelo Tosatti <[email protected]> wrote:
>
> > > PS to Marcelo:
> > > There is a problem with 2.4.22-pre3. I cannot mount a reiserfs
> > > data-partition with 320 GB size located on a 3ware RAID. It just hangs the
> > > box, during init or any runlevel I tried. It is completely reproducible,
> > > but debugreiserfs on the partition and reiserfsck both show no problems at
> > > all ... The thing mounts flawlessly under 2.4.22-pre2 and below.
> >
> > There are no 3ware changes in pre3. So it must be reiserfs or something
> > else. Let's try reverting the reiserfs patches to see if they are the
> > cause?
> >
> > Attached are files rei1, rei2, and rei3 (all gzip compressed).
>
> I reverted all three patches and the problem stays just the same. I guess this
> makes it a lot more likely that the problem lies elsewhere.
>
> If you want me to try others, just send them to me like these three.

Stephan,

First of all, thanks a lot for your help. Not everyone is willing to
debug/test problems like you do. This is very important for us.

Well, we now know reiserfs patches in 2.4.22-pre are not the problem.

2.4.21 is OK (does not crash when mounting) correct?

2003-07-10 12:06:06

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Thu, 10 Jul 2003 08:54:02 -0300 (BRT)
Marcelo Tosatti <[email protected]> wrote:

> > If you want me to try others, just send them to me like these three.
>
> Stephan,
>
> First of all, thanks a lot for your help. Not everyone is willing to
> debug/test problems like you do. This is very important for us.
>
> Well, we now know reiserfs patches in 2.4.22-pre are not the problem.
>
> 2.4.21 is OK (does not crash when mounting) correct?

Everything up to and including 2.4.22-pre2 is working without problems.
Under 2.4.22-pre3 I can reiserfsck the partition and see i/o going on for
minutes without troubles.
But I cannot mount it, the box completely hangs about 1 second after mount try.
I see the drive leds blinking shortly and that's it.

Regards,
Stephan


2003-07-10 14:05:39

by Carl-Daniel Hailfinger

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

Having read the whole thread, I came up with a few ideas.

The following patches in -pre3 could perhaps have to do something with
your problems:
o (resend) collected semaphore fixes and semtimedop
o Fix potential IO hangs and increase interactiveness during heavy IO
o fix false sharing of mm info
o fix up semops and return, allow timedop
o mremap VM_LOCKED move_vma
o remove io_apic_modify - this doesnt work on some APICs
o small setup-pci cleanups

Since your hangs are not even traceable with SysRq, please try to boot
with nmi_watchdog=1 and if that doesn't work (dmesg will complain)
nmi_watchdog=2. About 15 seconds after the hang your box should print a
backtrace.

As a last resort you could mount the fs from UML.

I suggest you try the nmi_watchdog thing first.


Carl-Daniel
--
http://www.hailfinger.org/
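
One way to check that the NMI watchdog is actually ticking is to read the NMI
line of /proc/interrupts twice and compare the counters. A minimal Python
sketch, assuming the line starts with "NMI:" followed by one counter per CPU;
the sample text is made up for illustration and is not output from the box
discussed in this thread:

```python
def nmi_count(interrupts_text):
    """Return the summed NMI counter from /proc/interrupts-style text, or None."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # One integer column per CPU; trailing description words are skipped.
            return sum(int(f) for f in fields[1:] if f.isdigit())
    return None


# Hypothetical samples taken a few seconds apart (not real output).
before = "  0:      12345    XT-PIC  timer\nNMI:      12340\nERR:          0\n"
after = "  0:      12845    XT-PIC  timer\nNMI:      12840\nERR:          0\n"

print(nmi_count(after) - nmi_count(before))
```

On a live system the text would come from open("/proc/interrupts").read(). A
counter that stays flat between reads is a hint, not proof, that the watchdog
may not be running.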

2003-07-10 14:24:15

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Thu, 10 Jul 2003 16:20:14 +0200
Carl-Daniel Hailfinger <[email protected]> wrote:

> Since your hangs are not even traceable with SysRq, please try to boot
> with nmi_watchdog=1 and if that doesn't work (dmesg will complain)
> nmi_watchdog=2. About 15 seconds after the hang your box should print a
> backtrace.

I have currently nmi_watchdog=1 which works (NMI interrupts show up during
normal operation), but there is no backtrace visible or producible during the
freeze, sorry.

--
Regards,
Stephan

2003-07-10 14:29:54

by Carl-Daniel Hailfinger

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

Stephan von Krawczynski wrote:
> On Thu, 10 Jul 2003 16:20:14 +0200
> Carl-Daniel Hailfinger <[email protected]> wrote:
>
>
>>Since your hangs are not even traceable with SysRq, please try to boot
>>with nmi_watchdog=1 and if that doesn't work (dmesg will complain)
>>nmi_watchdog=2. About 15 seconds after the hang your box should print a
>>backtrace.
>
>
> I have currently nmi_watchdog=1 which works (NMI interrupts show up during
> normal operation), but there is no backtrace visible or producible during the
> freeze, sorry.

How is that possible? If the NMI watchdog works but doesn't fire, the
lockup should respond to SysRq-T. Could you please try SysRq-T *before*
the hang just to verify that it would work?


Regards,
Carl-Daniel
--
http://www.hailfinger.org/

2003-07-10 14:58:17

by Stephan von Krawczynski

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Thu, 10 Jul 2003 16:44:28 +0200
Carl-Daniel Hailfinger <[email protected]> wrote:

> > I have currently nmi_watchdog=1 which works (NMI interrupts show up during
> > normal operation), but there is no backtrace visible or producible during
> > the freeze, sorry.
>
> How is that possible? If the NMI watchdog works but doesn't fire, the
> lockup should respond to SysRq-T. Could you please try SysRq-T *before*
> the hang just to verify that it would work?

Well, the thing I don't really understand about SysRq-X is that the output is
visible by dmesg-command, but I cannot see anything on the console (single user
tested). This means it may well work in the hang-case, but as I cannot execute
dmesg I will never see it...

Regards,
Stephan

2003-07-10 15:16:15

by Anders Karlsson

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

Hi,

Apologies for chipping in, but I saw something similar to what was
described in the thread. I'm running 2.4.22-pre3-ac1 with the FreeS/WAN
2.0.1 patches and noticed last night that when booting this kernel, if
an ext3 filesystem had exceeded its mount count and required checking,
the e2fsck process would hang sometime during the fsck and the system
would become unresponsive, but SysRq would still work. Alt-SysRq-P would
show e2fsck and some register details. I did not note them down, but
booting 2.4.21-rc7-ac1 and letting that kernel check the filesystem
would work. Booting back into 2.4.22-pre3-ac1 would then also work.

This might or might not be related to the original problem. I do use
nmi_watchdog=1, NMI count is 1 presently, so I guess that works. The ram
is memtested, so that is not an issue, heavy filesystem usage works
normally, it was just e2fsck that would not work. I have not tried -pre2
or -pre4 yet, but that is on the cards.

If there is anything I can try, let me know.

--
Anders Karlsson <[email protected]>
Trudheim Technology Limited



2003-07-10 15:25:24

by Carl-Daniel Hailfinger

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

Anders Karlsson wrote:
> Hi,
>
> Apologies for chipping in, but I saw something similar to what was
> described in the thread. I'm running 2.4.22-pre3-ac1 with the FreeS/WAN
> 2.0.1 patches and noticed last night that when booting this kernel, if
> an ext3 filesystem had exceeded its mount count and required checking,
> the e2fsck process would hang sometime during the fsck and the system
> would become unresponsive, but SysRq would still work. Alt-SysRq-P would

Was there any disk activity after it became unresponsive? If not, please
provide a (partially) decoded SysRq-T. I'm only interested in the decoded
stack trace of the hung process (it should have a "D" after the process name).

> show e2fsck and some register details. I did not note them down, but
> booting 2.4.21-rc7-ac1 and letting that kernel check the filesystem
> would work. Booting back into 2.4.22-pre3-ac1 would then also work.
>
> This might or might not be related to the original problem. I do use
> nmi_watchdog=1, NMI count is 1 presently, so I guess that works. The ram
> is memtested, so that is not an issue, heavy filesystem usage works
> normally, it was just e2fsck that would not work. I have not tried -pre2
> or -pre4 yet, but that is on the cards.
>
> If there is anything I can try, let me know.


Carl-Daniel
--
http://www.hailfinger.org/
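
Picking the blocked task out of a long SysRq-T dump can be scripted. A minimal
Python sketch, assuming each task header line has the process name in the
first column and the state letter in the second, as in the fsck.ext3 line that
appears later in this thread; the bash line in the sample is hypothetical:

```python
def blocked_tasks(sysrq_t_text):
    """Return names of tasks whose state column is 'D' (uninterruptible sleep)."""
    names = []
    for line in sysrq_t_text.splitlines():
        fields = line.split()
        # Header lines look like: <name> <state> <pc or 'current'> ...
        if len(fields) >= 2 and fields[1] == "D":
            names.append(fields[0])
    return names


# The bash line is invented for contrast; the fsck.ext3 line is from the thread.
sample = """\
bash          S C011A2B0  1234   149      1 (NOTLB)
fsck.ext3     D current   3808   160    149 (NOTLB)
Call Trace: [<c014178f>] [<c01417a4>] [<c0141887>]
"""

print(blocked_tasks(sample))
```

Splitting on whitespace is a simplification: a process name containing a space
would shift the columns, so treat the result as a first filter only.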

2003-07-10 18:53:23

by Anders Karlsson

Subject: Re: 2.4.22-pre3 and reiserfs boot problem

On Thu, 2003-07-10 at 16:39, Carl-Daniel Hailfinger wrote:

> Was there any disk activity after it became unresponsive? If not, please
> provide a (partially) decoded SysRq-T. I'm only interested in the decoded
> stack trace of the hung process (it should have a "D" after the process name).

Right, I have collected the details that were asked for. Carl-Daniel
suggested a way of tricking the system into doing an fsck without
tampering with the filesystem itself. Please find below the data I
copied out and decoded. If there is anything else I can do, let me know.
If there is any data about my system you require, let me know.

Output from Alt SysRq P:

Pid: 160, comm: fsck.ext3
EIP: 0010:[<c01b0f43>] CPU: 0 EFLAGS: 00000246 Not Tainted
EAX: 00000000 EBX: 00000000 ECX: c0338c00 EDX: 00000007
ESI: c0338c00 EDI: eee68000 EBP: eee69de8 DS: 0018 ES: 0018
CR0: 80050033 CR2: 400dcff0 CR3: 2ee6b000 CR4: 000006d0

Call trace: [<c01b1584>] [<c01b1bfb>] [<c01b1cca>] [<c0141643>]
[<c014178f>] [<c01417a4>] [<c0141887>] [<c01464e1>] [<c0141b61>]
[<c0108f83>]

Output from Alt SysRq T:

fsck.ext3 D current 3808 160 149 (NOTLB)
Call Trace: [<c014178f>] [<c01417a4>] [<c0141887>] [<c01464e1>]
[<c0141b61>] [<c0108f83>]


ksymoops 2.4.8 on i686 2.4.22-pre3-ac1. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.22-pre3-ac1/ (default)
-m /boot/System.map-2.4.22-pre3-ac1 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Warning (compare_maps): mismatch on symbol zeroes , ipsec says
f23fdf00, /lib/modules/2.4.22-pre3-ac1/kernel/net/ipsec/ipsec.o says
f23fdde0. Ignoring
/lib/modules/2.4.22-pre3-ac1/kernel/net/ipsec/ipsec.o entry
Pid: 160, comm: fsck.ext3
EIP: 0010:[<c01b0f43>] CPU: 0 EFLAGS: 00000246 Not Tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EAX: 00000000 EBX: 00000000 ECX: c0338c00 EDX: 00000007
ESI: c0338c00 EDI: eee68000 EBP: eee69de8 DS: 0018 ES: 0018
Warning (Oops_set_regs): garbage 'DS: 0018 ES: 0018' at end of register
line ignored
CR0: 80050033 CR2: 400dcff0 CR3: 2ee6b000 CR4: 000006d0
Call trace: [<c01b1584>] [<c01b1bfb>] [<c01b1cca>] [<c0141643>]
[<c014178f>] [<c01417a4>] [<c0141887>] [<c01464e1>] [<c0141b61>]
[<c0108f83>]
Warning (Oops_read): Code line not seen, dumping what data is available


>>EIP; c01b0f43 <__get_request_wait+ae/f6> <=====

>>ECX; c0338c00 <ide_hwifs+c0/2af8>
>>ESI; c0338c00 <ide_hwifs+c0/2af8>
>>EDI; eee68000 <_end+2eb17bf4/304bcc54>
>>EBP; eee69de8 <_end+2eb199dc/304bcc54>

Trace; c01b1584 <__make_request+167/71a>
Trace; c01b1bfb <generic_make_request+c4/13a>
Trace; c01b1cca <submit_bh+59/a0>
Trace; c0141643 <write_locked_buffers+2a/36>
Trace; c014178f <write_some_buffers+140/142>
Trace; c01417a4 <write_unlocked_buffers+13/1d>
Trace; c0141887 <sync_buffers+1a/75>
Trace; c01464e1 <__block_fsync+2f/6c>
Trace; c0141b61 <sys_fsync+93/de>
Trace; c0108f83 <system_call+33/38>


4 warnings issued. Results may not be reliable.



--
Anders Karlsson <[email protected]>
Trudheim Technology Limited
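
The symbol+offset notation that ksymoops prints can be reproduced by hand: find
the last symbol whose start address is not above the trace address, then
subtract. A minimal Python sketch; the start addresses are not copied from a
System.map but back-computed from the decoded trace above (address minus
offset), so treat the table as illustrative:

```python
import bisect

# (start address, name), sorted by address. Starts are derived from the
# decoded trace (e.g. 0xc01b0f43 - 0xae), not read from a real System.map.
SYMBOLS = [
    (0xC01B0E95, "__get_request_wait"),
    (0xC01B141D, "__make_request"),
    (0xC01B1B37, "generic_make_request"),
]
STARTS = [addr for addr, _ in SYMBOLS]


def resolve(address):
    """Return (symbol name, offset) for address, or None if below all symbols."""
    i = bisect.bisect_right(STARTS, address) - 1
    if i < 0:
        return None
    start, name = SYMBOLS[i]
    return name, address - start


for addr in (0xC01B0F43, 0xC01B1584, 0xC01B1BFB):
    name, offset = resolve(addr)
    print(f"{addr:08x} <{name}+{offset:x}>")
```

With only three entries the table has gaps, so an address that falls between
two listed functions would be attributed to the earlier one. A full System.map
avoids that because every symbol is bounded by its successor.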

