2001-02-19 23:07:59

by James A. Pattie

Subject: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

I'm not subscribed to the kernel mailing list, so please cc any replies
to me.

I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
latest updates, etc. and kernel 2.4.1.
I've built a customized install of RH (~200MB) which I untar onto the
system after building my raid arrays, etc. via a Rescue CD which I
created using Timo's Rescue CD project. The booting kernel is
2.4.1-ac10, no networking, raid compiled in but raid1 as a module,
reiserfs as a module, ext2 and iso-9660 compiled in, using sg support
for cd-roms. I had to strip the kernel down so it would fit on a floppy,
as the system does not support booting from CD-ROM.

After booting and getting my initial file system into memory (a 20+ MB
ramdisk), I create a swap partition, format it, and swapon it so I don't
run out of memory. At this point I usually have 3-5 MB of free memory and
128 MB of swap.
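The swap step above amounts to the usual three commands; a sketch, with /dev/hda1 as an assumed device name (the post doesn't say which partition holds swap):

```shell
# Create and enable swap so the untar doesn't exhaust the 48 MB of RAM.
mkswap /dev/hda1      # write the swap signature to the partition
swapon /dev/hda1      # activate it
free                  # sanity check: should report ~128 MB of swap
```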

I partitioned the 2 drives (on the 1st and 2nd controllers, 1.3 GB each)
into 4 partitions: the 1st is swap, and the next 3 (1 primary, 2
extended) are for the raid 1 arrays. I've given 20 MB to /boot (md0), 650MB
to / (md1), and the rest (400+MB) to /var (md2). I format md0 as ext2
and md1 and md2 as reiserfs. When I go to untar the image on the cd to
/mnt/slash (which has md1 mounted on it), the system extracts about 30MB
of data and then just stops responding. No kernel output, etc. I can
change to the other virtual consoles, but no other keyboard input is
accepted. After resetting the machine, the raid arrays rebuild ok, and
reiserfs gives me no problems other than it usually replays 2 or 3
transactions. If I tell tar to pick up at the last directory I saw
extracted, it gets about another 30MB of data and stops again. I've
waited for the raid syncing to be finished or just started after the
arrays are available and it doesn't matter.
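For reference, with the 2001-era raidtools a layout like the one above would be described in /etc/raidtab. This is a hedged sketch: the partition device names (/dev/hda2, /dev/hdc2) are assumptions, since the original post doesn't name them, and md1/md2 would get analogous stanzas pointing at the other partition pairs.

```
# /etc/raidtab sketch: md0 = the 20 MB /boot mirror across both drives
raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    persistent-superblock   1
    chunk-size              4
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdc2
    raid-disk               1
```

After `mkraid /dev/md0` (and likewise for md1 and md2), the filesystems described above would be made with `mke2fs` on md0 and `mkreiserfs` on md1 and md2.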

I first tried with 2.4.1 stock and then went to 2.4.1-ac10 (the latest
at the time I was playing with this) and it did exactly the same thing.
If I format md1 and md2 with ext2, then everything works fine. I initially
compiled in 386-only support and have also tried 586 support (no
difference). I've tried both the r5 and tea hashes with reiserfs.

One thing I did notice was that the syncing of the raid 1 arrays went in
sequence (md0, md1, md2) instead of in parallel. I assume it is because
the machine just doesn't have the horsepower, or is it because I have
multiple raid arrays on the same drives?

This isn't a life or death issue at the moment, but I would like to be
able to use reiserfs in this scenario in the future.

I have tested the same rescue CD boot image on a K6-2 450MHz, 128 MB
system. No raid, just one reiserfs partition, and it untarred without
any issues. I'm thinking this is something specific to older,
lower-memory machines?

--
James A. Pattie
[email protected]

Linux -- SysAdmin / Programmer
PC & Web Xperience, Inc.
http://www.pcxperience.com/




2001-02-20 17:34:03

by James A. Pattie

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

Colonel wrote:

> In clouddancer.list.kernel.owner, you wrote:
> >
> >I'm not subscribed to the kernel mailing list, so please cc any replies
> >to me.
> >
> >I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
> >latest updates, etc. and kernel 2.4.1.
> >I've built a customized install of RH (~200MB) which I untar onto the
> >system after building my raid arrays, etc. via a Rescue CD which I
> >created using Timo's Rescue CD project. The booting kernel is
> >2.4.1-ac10, no networking, raid compiled in but raid1 as a module
>
> Hmm, raid as a module was always a Bad Idea(tm) in the 2.2 "alpha"
> raid (which was misnamed and is 2.4 raid). I suggest you change that
> and update, as I had no problems with 2.4.2-pre2/3, nor have any been
> posted to the raid list.

I just tried with 2.4.1-ac14, raid and raid1 compiled in, and it did the
same thing. I'm going to try to compile reiserfs in (if I have enough room
to still fit the kernel on the floppy with its initial ramdisk, etc.) and
see what that does.


--
James A. Pattie
[email protected]

Linux -- SysAdmin / Programmer
PC & Web Xperience, Inc.
http://www.pcxperience.com/



2001-02-20 18:19:15

by Colonel

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

Sender: [email protected]
Date: Tue, 20 Feb 2001 11:32:19 -0600
From: "James A. Pattie" <[email protected]>
X-Accept-Language: en
Content-Type: text/plain; charset=us-ascii

Colonel wrote:

> In clouddancer.list.kernel.owner, you wrote:
> >
> >I'm not subscribed to the kernel mailing list, so please cc any replies
> >to me.
> >
> >I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
> >latest updates, etc. and kernel 2.4.1.
> >I've built a customized install of RH (~200MB) which I untar onto the
> >system after building my raid arrays, etc. via a Rescue CD which I
> >created using Timo's Rescue CD project. The booting kernel is
> >2.4.1-ac10, no networking, raid compiled in but raid1 as a module
>
> Hmm, raid as a module was always a Bad Idea(tm) in the 2.2 "alpha"
> raid (which was misnamed and is 2.4 raid). I suggest you change that
> and update, as I had no problems with 2.4.2-pre2/3, nor have any been
> posted to the raid list.

I just tried with 2.4.1-ac14, raid and raid1 compiled in and it did the
same thing. I'm going to try to compile reiserfs in (if I have enough room
to still fit the kernel on the floppy with its initial ramdisk, etc.) and
see what that does.


Hmm. reiserfs is probably OK as a module. ac14 is 5 versions
'behind'. I'd start looking for a distribution problem.

2001-02-20 19:43:58

by Tom Sightler

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

> > >I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
> > >latest updates, etc. and kernel 2.4.1.
> > >I've built a customized install of RH (~200MB) which I untar onto the
> > >system after building my raid arrays, etc. via a Rescue CD which I
> > >created using Timo's Rescue CD project. The booting kernel is
> > >2.4.1-ac10, no networking, raid compiled in but raid1 as a module
> >
> > Hmm, raid as a module was always a Bad Idea(tm) in the 2.2 "alpha"
> > raid (which was misnamed and is 2.4 raid). I suggest you change that
> > and update, as I had no problems with 2.4.2-pre2/3, nor have any been
> > posted to the raid list.
>
> I just tried with 2.4.1-ac14, raid and raid1 compiled in and it did the
> same thing. I'm going to try to compile reiserfs in (if I have enough room
> to still fit the kernel on the floppy with its initial ramdisk, etc.) and
> see what that does.

There seem to be several reports of reiserfs falling over when memory is
low. It seems undetermined whether this problem is actually in reiserfs or
the MM layer, but there are other threads on this list regarding similar
issues. This would explain why the same disk would work on a different
machine with more memory. Any chance you could add memory to the box
temporarily, just to see if it helps? That may help prove whether this is
the problem.

Later,
Tom


2001-02-20 19:58:11

by James A. Pattie

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

Tom Sightler wrote:

> > > >I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
> > > >latest updates, etc. and kernel 2.4.1.
> > > >I've built a customized install of RH (~200MB) which I untar onto the
> > > >system after building my raid arrays, etc. via a Rescue CD which I
> > > >created using Timo's Rescue CD project. The booting kernel is
> > > >2.4.1-ac10, no networking, raid compiled in but raid1 as a module
> > >
> > > Hmm, raid as a module was always a Bad Idea(tm) in the 2.2 "alpha"
> > > raid (which was misnamed and is 2.4 raid). I suggest you change that
> > > and update, as I had no problems with 2.4.2-pre2/3, nor have any been
> > > posted to the raid list.
> >
> > I just tried with 2.4.1-ac14, raid and raid1 compiled in and it did the
> > same thing. I'm going to try to compile reiserfs in (if I have enough room
> > to still fit the kernel on the floppy with its initial ramdisk, etc.) and
> > see what that does.
>
> There seem to be several reports of reiserfs falling over when memory is
> low. It seems to be undetermined if this problem is actually reiserfs or MM
> related, but there are other threads on this list regarding similar issues.
> This would explain why the same disk would work on a different machine with
> more memory. Any chance you could add memory to the box temporarily just to
> see if it helps, this may help prove if this is the problem or not.
>
> Later,
> Tom

Out of all the old 72-pin SIMMs we have, we have it maxed out at 48 MB. I'm
tempted to take the 2 drives out and put them in the K6-2, but that's too
much of a hassle. I'm currently going to try 2.4.1-ac19 and see what happens.

The machine does have 128 MB of swap space active, and whenever I've checked
memory usage (while the system was still responding), it never went over a
couple megs of swap space used.

--
James A. Pattie
[email protected]

Linux -- SysAdmin / Programmer
PC & Web Xperience, Inc.
http://www.pcxperience.com/



2001-02-20 20:09:42

by Tom Sightler

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

> > There seem to be several reports of reiserfs falling over when memory is
> > low. It seems to be undetermined if this problem is actually reiserfs or MM
> > related, but there are other threads on this list regarding similar issues.
> > This would explain why the same disk would work on a different machine with
> > more memory. Any chance you could add memory to the box temporarily just to
> > see if it helps, this may help prove if this is the problem or not.
> >
>
> Out of all the old 72 pin simms we have, we have it maxed out at 48 MB's. I'm
> tempted to take the 2 drives out and put them in the k6-2, but that's too much
> of a hassle. I'm currently going to try 2.4.1-ac19 and see what happens.
>
> The machine does have 128MB of swap space working, and whenever I've checked
> memory usage (while the system was still responding), it never went over a
> couple megs of swap space used.

Ah yes, but from what I've read, the problem seems to occur when
buffer/cache memory is low (< 6 MB); you could have tons of swap and still
reach this level.
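A quick way to watch that figure is to sum the Buffers and Cached lines from /proc/meminfo (those field names are present in 2.4 and later kernels):

```shell
# Print the combined buffer+cache memory in kB; rerun it (or wrap it in
# `watch`) while the untar is running to see how low it gets.
awk '/^(Buffers|Cached):/ { sum += $2 } END { print sum " kB buffer/cache" }' /proc/meminfo
```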

Later,
Tom


2001-02-20 21:08:01

by James A. Pattie

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

Tom Sightler wrote:

> > > There seem to be several reports of reiserfs falling over when memory is
> > > low. It seems to be undetermined if this problem is actually reiserfs or MM
> > > related, but there are other threads on this list regarding similar issues.
> > > This would explain why the same disk would work on a different machine with
> > > more memory. Any chance you could add memory to the box temporarily just to
> > > see if it helps, this may help prove if this is the problem or not.
> > >
> >
> > Out of all the old 72 pin simms we have, we have it maxed out at 48 MB's. I'm
> > tempted to take the 2 drives out and put them in the k6-2, but that's too much
> > of a hassle. I'm currently going to try 2.4.1-ac19 and see what happens.
> >
> > The machine does have 128MB of swap space working, and whenever I've checked
> > memory usage (while the system was still responding), it never went over a
> > couple megs of swap space used.
>
> Ah yes, but, from what I've read, the problem seems to occur when
> buffer/cache memory is low (<6MB), you could have tons of swap and still
> reach this level.
>
> Later,
> Tom

You were right! I managed to find another 32 MB of memory to bump it up to
64 MB total, and it worked perfectly. It appears I had only about 4 MB of
buffer/cache in the 48 MB system and over 15 MB in the 64 MB system. I did
my install, switched back to 48 MB for normal running, and it's working
just fine.

Thanks,


--
James A. Pattie
[email protected]

Linux -- SysAdmin / Programmer
PC & Web Xperience, Inc.
http://www.pcxperience.com/



2001-02-20 21:22:11

by Colonel

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

From: "Tom Sightler" <[email protected]>
Cc: <[email protected]>
Date: Tue, 20 Feb 2001 14:43:07 -0500
Content-Type: text/plain;
charset="iso-8859-1"

> > >I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
> > >latest updates, etc. and kernel 2.4.1.
> > >I've built a customized install of RH (~200MB) which I untar onto the
> > >system after building my raid arrays, etc. via a Rescue CD which I
> > >created using Timo's Rescue CD project. The booting kernel is
> > >2.4.1-ac10, no networking, raid compiled in but raid1 as a module
> >
> > Hmm, raid as a module was always a Bad Idea(tm) in the 2.2 "alpha"
> > raid (which was misnamed and is 2.4 raid). I suggest you change that
> > and update, as I had no problems with 2.4.2-pre2/3, nor have any been
> > posted to the raid list.
>
> I just tried with 2.4.1-ac14, raid and raid1 compiled in and it did the
> same thing. I'm going to try to compile reiserfs in (if I have enough room
> to still fit the kernel on the floppy with its initial ramdisk, etc.) and
> see what that does.

There seem to be several reports of reiserfs falling over when memory is
low. It seems to be undetermined if this problem is actually reiserfs or MM
related, but there are other threads on this list regarding similar issues.
This would explain why the same disk would work on a different machine with
more memory. Any chance you could add memory to the box temporarily just to
see if it helps, this may help prove if this is the problem or not.


Well, I didn't happen to start the thread, but your comments may
explain some "gee I wonder if it died" problems I just had with my
2.4.1-pre2+reiser test box. It only has 16M, so it's always low
memory (never been a real problem in the past however). The test
situation is easily repeatable for me [1]. It's a 486 wall mount, so
it's easier to convert the fs than add memory, and it showed about
200k free at the time of the sluggishness. Previous 2.4.1 testing
with ext2 fs didn't show any sluggishness, but I also didn't happen to
run the test above either. When I come back to the office later, I'll
convert the fs, repeat the test and pass on the results.


[1] Since I decided to try to catch up on kernels, I had just grabbed
-ac18, cd to ~linux and run "rm -r *" via an ssh connection. In a
second connection, I tried a simple "dmesg" and waited over a minute
for results (long enough to log in directly on the box and bring up
top) followed by loading emacs for ftp transfers from kernel.org,
which again 'went to sleep'.

2001-02-21 00:01:58

by Roger Larsson

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

On Tuesday 20 February 2001 22:21, Colonel wrote:
> From: "Tom Sightler" <[email protected]>
> Cc: <[email protected]>
> Date: Tue, 20 Feb 2001 14:43:07 -0500
> Content-Type: text/plain;
> charset="iso-8859-1"
>
> > > >I'm building a firewall on a P133 with 48 MB of memory using RH
> > > > 7.0, latest updates, etc. and kernel 2.4.1.
> > > >I've built a customized install of RH (~200MB) which I untar onto the
> > > >system after building my raid arrays, etc. via a Rescue CD which
> > > > I created using Timo's Rescue CD project. The booting kernel
> > > > is 2.4.1-ac10, no networking, raid compiled in but raid1 as a
> > > > module
> > >
> > > Hmm, raid as a module was always a Bad Idea(tm) in the 2.2
> > > "alpha" raid (which was misnamed and is 2.4 raid). I suggest you
> > > change that and update, as I had no problems with 2.4.2-pre2/3,
> > > nor have any been posted to the raid list.
> >
> > I just tried with 2.4.1-ac14, raid and raid1 compiled in and it did
> > the same thing. I'm going to try to compile reiserfs in (if I have enough
> > room to still fit the kernel on the floppy with its initial ramdisk, etc.)
> > and see what that does.
>
> There seem to be several reports of reiserfs falling over when memory is
> low. It seems to be undetermined if this problem is actually reiserfs
> or MM related, but there are other threads on this list regarding similar
> issues. This would explain why the same disk would work on a different
> machine with more memory. Any chance you could add memory to the box
> temporarily just to see if it helps, this may help prove if this is the
> problem or not.
>
>
> Well, I didn't happen to start the thread, but your comments may
> explain some "gee I wonder if it died" problems I just had with my
> 2.4.1-pre2+reiser test box. It only has 16M, so it's always low
> memory (never been a real problem in the past however). The test
> situation is easily repeatable for me [1]. It's a 486 wall mount, so
> it's easier to convert the fs than add memory, and it showed about
> 200k free at the time of the sluggishness. Previous 2.4.1 testing
> with ext2 fs didn't show any sluggishness, but I also didn't happen to
> run the test above either. When I come back to the office later, I'll
> convert the fs, repeat the test and pass on the results.
>
>
> [1] Since I decided to try to catch up on kernels, I had just grabbed
> -ac18, cd to ~linux and run "rm -r *" via an ssh connection. In a
> second connection, I tried a simple "dmesg" and waited over a minute
> for results (long enough to log in directly on the box and bring up
> top) followed by loading emacs for ftp transfers from kernel.org,
> which again 'went to sleep'.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

If these are freezes, I had them too in 2.4.1; 2.4.2-pre1 fixed it for me.
I really think it was the patch in handle_mm_fault setting TASK_RUNNING.

/RogerL

--
Home page:
none currently

2001-02-21 03:50:11

by Colonel

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up


> There seem to be several reports of reiserfs falling over when memory is
> low. It seems to be undetermined if this problem is actually reiserfs
> or MM related, but there are other threads on this list regarding similar
> issues. This would explain why the same disk would work on a different
> machine with more memory. Any chance you could add memory to the box
> temporarily just to see if it helps, this may help prove if this is the
> problem or not.
>
>
> Well, I didn't happen to start the thread, but your comments may
> explain some "gee I wonder if it died" problems I just had with my
> 2.4.1-pre2+reiser test box. It only has 16M, so it's always low
> memory (never been a real problem in the past however). The test
> situation is easily repeatable for me [1]. It's a 486 wall mount, so
> it's easier to convert the fs than add memory, and it showed about
> 200k free at the time of the sluggishness. Previous 2.4.1 testing
> with ext2 fs didn't show any sluggishness, but I also didn't happen to
> run the test above either. When I come back to the office later, I'll
> convert the fs, repeat the test and pass on the results.
>
>
> [1] Since I decided to try to catch up on kernels, I had just grabbed
> -ac18, cd to ~linux and run "rm -r *" via an ssh connection. In a
> second connection, I tried a simple "dmesg" and waited over a minute
> for results (long enough to log in directly on the box and bring up
> top) followed by loading emacs for ftp transfers from kernel.org,
> which again 'went to sleep'.
> -

If these are freezes, I had them too in 2.4.1; 2.4.2-pre1 fixed it for me.
I really think it was the patch in handle_mm_fault setting TASK_RUNNING.

/RogerL

Ohoh, I see that I fat-fingered the kernel version. The test box
kernel is 2.4.2-pre2 with Axboe's loop4 patch to the loopback fs. It
runs a three partition drive, a small /boot in ext2, / as reiser and
swap. I am verifying that the freeze is repeatable at the moment, and
so far I cannot cause free memory to drop to 200k, and the short ice age
does not occur. Unless I can get that to repeat, the effort will be
useless... the only real difference is swap: it was not initially
active and now it is. Free memory never drops below 540k now, so I
would suspect a MM influence. [email protected] didn't mention
the memory values in his initial post, but it would be interesting to
see if he simply leaves his machine alone if it recovers
(i.e. probable swap thrashing) and then determine if the freeze ever
re-occurs. James seems to have better repeatability than I do.
Rebooting and retrying still doesn't result in a noticeable freeze for
me. Some other factor must have been involved that I didn't notice.
Still seems like MM over reiser tho.


PS for james:
>One thing I did notice was that the syncing of the raid 1 arrays went in
>sequence, md0, md1, md2 instead of in parallel. I assume it is because
>the machine just doesn't have the horsepower, etc. or is it that I have
>multiple raid arrays on the same drives?

Same drives.

2001-02-21 14:46:48

by James A. Pattie

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

Colonel wrote:

> > There seem to be several reports of reiserfs falling over when memory is
> > low. It seems to be undetermined if this problem is actually reiserfs
> > or MM related, but there are other threads on this list regarding similar
> > issues. This would explain why the same disk would work on a different
> > machine with more memory. Any chance you could add memory to the box
> > temporarily just to see if it helps, this may help prove if this is the
> > problem or not.
> >
> >
> > Well, I didn't happen to start the thread, but your comments may
> > explain some "gee I wonder if it died" problems I just had with my
> > 2.4.1-pre2+reiser test box. It only has 16M, so it's always low
> > memory (never been a real problem in the past however). The test
> > situation is easily repeatable for me [1]. It's a 486 wall mount, so
> > it's easier to convert the fs than add memory, and it showed about
> > 200k free at the time of the sluggishness. Previous 2.4.1 testing
> > with ext2 fs didn't show any sluggishness, but I also didn't happen to
> > run the test above either. When I come back to the office later, I'll
> > convert the fs, repeat the test and pass on the results.
> >
> >
> > [1] Since I decided to try to catch up on kernels, I had just grabbed
> > -ac18, cd to ~linux and run "rm -r *" via an ssh connection. In a
> > second connection, I tried a simple "dmesg" and waited over a minute
> > for results (long enough to log in directly on the box and bring up
> > top) followed by loading emacs for ftp transfers from kernel.org,
> > which again 'went to sleep'.
> > -
>
> If these are freezes I had them too in 2.4.1, 2.4.2-pre1 fixed it for me.
> Really I think it was the patch in handle_mm_fault setting TASK_RUNNING.
>
> /RogerL
>
> Ohoh, I see that I fat-fingered the kernel version. The test box
> kernel is 2.4.2-pre2 with Axboe's loop4 patch to the loopback fs. It
> runs a three partition drive, a small /boot in ext2, / as reiser and
> swap. I am verifying that the freeze is repeatable at the moment, and
> so far I cannot cause free memory to drop to 200k and a short ice age
> does not occur. Unless I can get that to repeat, the effort will be
> useless... the only real difference is swap, it was not initially
> active and now it is. Free memory never drops below 540k now, so I
> would suspect a MM influence. [email protected] didn't mention
> the memory values in his initial post, but it would be interesting to
> see if he simply leaves his machine alone if it recovers
> (i.e. probable swap thrashing) and then determine if the freeze ever
> re-occurs. James seems to have better repeatability than I do.
> Rebooting and retrying still doesn't result in a noticable freeze for
> me. Some other factor must have been involved that I didn't notice.
> Still seems like MM over reiser tho.

When the machine stopped responding the first time, I let it go over the
weekend (2+ days) and it still didn't recover. I never saw a thrashing
effect. The initial memory values were 2 MB free memory, < 1 MB cache. I
never really looked at the cache values, as I wasn't sure how they affected
the system. When the system was untarring my tarball, free memory would get
down below 500 KB and swap usage would be around a couple of megs.

>
>
> PS for james:
> >One thing I did notice was that the syncing of the raid 1 arrays went in
> >sequence, md0, md1, md2 instead of in parallel. I assume it is because
> >the machine just doesn't have the horsepower, etc. or is it that I have
> >multiple raid arrays on the same drives?
>
> Same drives.

That's what I thought.

Thanks,


--
James A. Pattie
[email protected]

Linux -- SysAdmin / Programmer
PC & Web Xperience, Inc.
http://www.pcxperience.com/



2001-02-21 16:45:05

by James A. Pattie

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

Colonel wrote:

> Sender: [email protected]
> Date: Wed, 21 Feb 2001 08:45:02 -0600
> From: "James A. Pattie" <[email protected]>
>
> Colonel wrote:
>
> > > There seem to be several reports of reiserfs falling over when memory is
> > > low. It seems to be undetermined if this problem is actually reiserfs
> > > or MM related, but there are other threads on this list regarding similar
> > > issues. This would explain why the same disk would work on a different
> > > machine with more memory. Any chance you could add memory to the box
> > > temporarily just to see if it helps, this may help prove if this is the
> > > problem or not.
> > >
> > >
>
> When the machine stopped responding, the first time, I let it go over the weekend
> (2 days+) and it still didn't recover. I never saw a thrashing effect. The
> initial memory values were 2MB free memory, < 1MB cache. I never really looked at
> the cache values as I wasn't sure how they affected the system. when the system
> was untarring my tarball, the memory usage would get down < 500kb and swap would be
> around a couple of megs usually.
>
> Well, it still looks like you have a good test case to resolve the
> problem. Can you add memory per the above request?
>
> I should drop out of this, it seems I had a one time event. Something
> to keep in mind is /boot should either be ext2 or mounted differently
> under reiser (check their website for details). You should probably
> try the Magic SysREQ stuff to see what's up at the time of freeze.
> You should probably run memtest86 to head off questions about your
> memory stability.

I added memory yesterday and got it to work once I had 64 MB in the system.
The free memory (cache/buffer) was over 30 MB, and I didn't have any
problems then.

After I got everything installed, I bumped the memory back down to 48 MB
and it is running fine. I don't have the 17+ MB ramdisk taking up memory
anymore, so the system has > 15 MB of cache/buffer available at all times,
even running ssh, sendmail, squid, firewalling, etc.


>
>
> Just to check on the raid setup, the drives are on separate
> controllers and there is not a slow device on the same bus? I've been
> running the "2.4" raid for a couple years and that was the usual
> problem. Reiserfs is probably more aggressive working the drive and
> it may tend to unhide other system problems.
>

They are on separate controllers. The second controller has the CD-ROM
drive (32x), which should be faster than the hard drive (since the drives
are older).

>
> --
> "... being a Linux user is sort of like living in a house inhabited by
> a large family of carpenters and architects. Every morning when you
> wake up, the house is a little different. Maybe there is a new turret,
> or some walls have moved. Or perhaps someone has temporarily removed
> the floor under your bed." - Unix for Dummies, 2nd Edition

--
James A. Pattie
[email protected]

Linux -- SysAdmin / Programmer
PC & Web Xperience, Inc.
http://www.pcxperience.com/



2001-02-21 20:26:24

by Colonel

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

From: "Tom Sightler" <[email protected]>
Cc: <[email protected]>
Date: Tue, 20 Feb 2001 14:43:07 -0500
Content-Type: text/plain;
charset="iso-8859-1"

> > >I'm building a firewall on a P133 with 48 MB of memory using RH 7.0,
> > >latest updates, etc. and kernel 2.4.1.
> > >I've built a customized install of RH (~200MB) which I untar onto the
> > >system after building my raid arrays, etc. via a Rescue CD which I
> > >created using Timo's Rescue CD project. The booting kernel is
> > >2.4.1-ac10, no networking, raid compiled in but raid1 as a module
> >
> > Hmm, raid as a module was always a Bad Idea(tm) in the 2.2 "alpha"
> > raid (which was misnamed and is 2.4 raid). I suggest you change that
> > and update, as I had no problems with 2.4.2-pre2/3, nor have any been
> > posted to the raid list.
>
> I just tried with 2.4.1-ac14, raid and raid1 compiled in and it did the
> same thing. I'm going to try to compile reiserfs in (if I have enough room
> to still fit the kernel on the floppy with its initial ramdisk, etc.) and
> see what that does.

There seem to be several reports of reiserfs falling over when memory is
low. It seems to be undetermined if this problem is actually reiserfs or MM
related, but there are other threads on this list regarding similar issues.
This would explain why the same disk would work on a different machine with
more memory. Any chance you could add memory to the box temporarily just to
see if it helps, this may help prove if this is the problem or not.


If you caught the end of the thread, James (the initial poster) added
memory, had no problems, removed the extra memory, and still had no
problems. His spare memory is greater than my memory total. I too
cannot repeat the freeze. It makes me wonder whether some parameter
in the kernel is somehow being updated.

2001-02-23 09:00:17

by Pavel Machek

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

Hi!

> I partitioned the 2 drives (on 1st and 2nd controller, both 1.3 GB each)
> into 4 total partitions. 1st is swap and then the next 3, 1 primary, 2
> extended are for raid 1 arrays. I've given 20 MB to /boot (md0), 650MB
> to / (md1) and the rest (400+MB) to /var (md2). I format md0 as ext2
> and md1 and md2 as reiserfs. When I go to untar the image on the cd to
> /mnt/slash (which has md1 mounted on it), the system extracts about 30MB
> of data and then just stops responding. No kernel output, etc. I can
> change to the other virtual consoles, but no other keyboard input is
> accepted. After resetting the machine, the raid arrays rebuild ok, and
> reiserfs gives me no problems other than it usually replays 2 or 3
> transactions. If I tell tar to pickup on the last directory I saw
> extracted, it gets about another 30MB of data and stops again. I've
> waited for the raid syncing to be finished or just started after the
> arrays are available and it doesn't matter.

Try running sync; sync; sync; ... while untarring.
Pavel
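Pavel's suggestion can be scripted as a background loop so dirty buffers get flushed continually during the extraction; the 5-second interval and the tar command line below are illustrative, not from the thread:

```shell
# Keep flushing dirty buffers to disk while the untar runs, then stop.
( while :; do sync; sleep 5; done ) &
syncer=$!
# tar xf /mnt/cdrom/image.tar -C /mnt/slash    # hypothetical untar step
kill "$syncer"
```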
--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]

2001-02-23 20:02:51

by Jasmeet Sidhu

Subject: Re: Reiserfs, 3 Raid1 arrays, 2.4.1 machine locks up

As other posts have pointed out, if you have bad DMA cables, you will
experience problems. One thing I would suggest is that you add
kernel.* /dev/console to your /etc/syslog.conf so that you see
any errors coming from the kernel code. I would also suggest opening
another virtual terminal, leaving tail -f /var/log/messages running, and
keeping an eye on it when the system could possibly crash. This should
help you out a little bit.
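In syslog.conf syntax the kernel facility is spelled `kern`, so the suggested addition would look like this sketch (then send syslogd a SIGHUP so it rereads the file):

```
# /etc/syslog.conf addition: copy all kernel messages to the console
kern.*                                          /dev/console
```

With that in place, `tail -f /var/log/messages` on a spare virtual console should show any kernel errors leading up to the freeze.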

At 09:36 PM 2/22/2001 +0100, Pavel Machek wrote:
>Hi!
>
> > I partitioned the 2 drives (on 1st and 2nd controller, both 1.3 GB each)
> > into 4 total partitions. 1st is swap and then the next 3, 1 primary, 2
> > extended are for raid 1 arrays. I've given 20 MB to /boot (md0), 650MB
> > to / (md1) and the rest (400+MB) to /var (md2). I format md0 as ext2
> > and md1 and md2 as reiserfs. When I go to untar the image on the cd to
> > /mnt/slash (which has md1 mounted on it), the system extracts about 30MB
> > of data and then just stops responding. No kernel output, etc. I can
> > change to the other virtual consoles, but no other keyboard input is
> > accepted. After resetting the machine, the raid arrays rebuild ok, and
> > reiserfs gives me no problems other than it usually replays 2 or 3
> > transactions. If I tell tar to pickup on the last directory I saw
> > extracted, it gets about another 30MB of data and stops again. I've
> > waited for the raid syncing to be finished or just started after the
> > arrays are available and it doesn't matter.
>
>Try running sync; sync; sync; ... while untarring.
> Pavel
>--
>I'm [email protected]. "In my country we have almost anarchy and I don't care."
>Panos Katsaloulis describing me w.r.t. patents at [email protected]


- - -
Jasmeet Sidhu
Unix Systems Administrator
ArrayComm, Inc.
[email protected]
http://www.arraycomm.com