2004-10-28 13:32:34

by Marcelo Tosatti

Subject: Re: best linux kernel with memory management

On Wed, Oct 27, 2004 at 10:54:48AM +0200, Martin MOKREJŠ wrote:
> Hi,
> I have hit the memory problem again on the same host with 2.4.28-pre3.
> I went to test a raid5 filesystem and wanted to evaluate the speed
> of different filesystems while studying different combinations
> of their mkfs or mount options. I tested reiserfs3,
> ext3, xfs, ext2.
>
> With a subset of the xfs tests I ran the server out of memory,
> reproducibly. Those are tests on a filesystem which was created
> with "mkfs.xfs -f -b log=9 -d sunit=64,swidth=64 /dev/sdb2"
> and other sizes of the swidth parameter. Omitting those parameters
> causes no trouble and the tests finish properly.
>
> I have noticed that maxfiles was reached and also the
> 0 allocation failed messages.
> I have /proc/sys/fs/file-max set to 655360.
> The filesystem is created on a 1 Terabyte raid5. I guess xfs
> allocates too much space for its buffers.
>
> The problem is triggered by starting:
> "bonnie++ -n 1 -s 12G -d /scratch -u apache -q"
> in a few minutes the system runs out of memory/files.
>
> Originally, I did not have swap on this host, but even
> doing mkswap and swapon on a spare 12GB partition doesn't help,
> although I forget whether I saw the 0 allocation pages
> message again. At least the maxfiles problem was repeated.
>
> Doing the same test on a 5GB partition on my home computer
> I couldn't reproduce the problem, but from vmstat I gather
> that the system does not lose any memory to buffers.
>
> I'll kick out SMP and HIGHMEM on that problematic host and retry,
> but I don't understand whether that would even be expected to help.
> Well, maybe HIGHMEM, but why would SMP make a difference?
>
> I can't think of a reason the same machine would perform
> well with many other mkfs.xfs switches except those two above.
> Therefore, I don't believe it's a problem with memory management.
> I suspect a problem in xfs memory usage.
>
> Any ideas what might be wrong? I can attach a serial console and
> gather some messages if someone is interested. In principle,
> I can turn on either memory or xfs debugging.

It really looks XFS specific - have you sent this to the XFS people?

> Are there any known bugs in memory management or xfs on 2.4.28-pre3?
> I'd like to stick to this version to be able to evaluate the
> performance results. Thanks for any help.
> Please Cc: me in replies.
> Martin
>
> Marcelo Tosatti wrote:
> >On Wed, Aug 25, 2004 at 04:09:32PM +0200, Martin MOKREJŠ wrote:
> >
> >>Hi Marcelo,
> >>
> >>Marcelo Tosatti wrote:
> >>
> >>>On Wed, Aug 25, 2004 at 03:32:37PM +0200, Martin MOKREJŠ wrote:
> >>>
> >>>
> >>>>Marcelo Tosatti wrote:
> >>>>
> >>>>
> >>>>>Hi Martin!
> >>>>>
> >>>>>On Wed, Aug 25, 2004 at 02:37:07PM +0200, Martin MOKREJŠ wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Hi Marcelo.
> >>>>>>I have a huge server with 6GB RAM and 1 TB raid5 array on aic7xxx
> >>>>>>using XFS. The xfs seems to eat a lot of memory, so I'm getting
> >>>>>>processes killed. That's bad. Which kernel would you recommend? I
> >>>>>>have problems with 2.6.8-rc1.
> >>>>>
> >>>>>
> >>>>>Can you show me the output from dmesg while you see the tasks getting
> >>>>>killed?
> >>>>
> >>>>Umm, the dmesg output is the one you saw below. I could have attached
> >>>>/var/log/messages or kern.log, yes. Will do. The machine did not
> >>>>reboot properly, grrr. It cost 30 000 Euro and not a single linux
> >>>>kernel has worked fine. Also 2.4.25, which was on the Gentoo
> >>>>install CDROM, had problems. I don't remember their exact revisions,
> >>>>but I shouldn't have used xfs for / either. I thought better xfs
> >>>>everywhere than combined with reiserfs. Of course, /boot is ext3.
> >>>
> >>>
> >>>So there was a problem with the installation of v2.4.25? I'm pretty sure
> >>
> >>Yes, the trick was to never insmod aic7xxx and only install basic stuff
> >>on the internal raid using the gdth controller. After mkfs.xfs on the
> >>internal disks I had to reboot often. I was away for a month so I don't
> >>remember details, but I'm sure I tried even 2.4.27-rcXX or whatever was
> >>out at that time. Tomorrow I can give you a list of all those kernels.
> >>It seemed to me that 2.6.8-rc1 was without problems, but as I found
> >>after vacation ...
> >>
> >>
> >>>you can boot a v2.4 kernel on it now if you wish. It has to work.
> >>
> >>Boot I can, and I can work for a while in multiuser, but after lots of IO
> >>on XFS on the aic7xxx controller I start to have problems.
> >
> >
> >Odd. You don't remember the errors in detail?
> >
> >
> >>>>>The VM prints out debugging information when that happens.
> >>>>>
> >>>>>How deep into swap are you when the VM starts killing tasks?
> >>>>
> >>>>No swap defined at all, if I remember right. Can't say, but the
> >>>>machine was totally unloaded.
> >>>
> >>>
> >>>You should add some swap also, really.
> >>
> >>I think I have prepared swap on that aic7xxx RAID, so as not to stress
> >>the internal controller.
> >>
> >>
> >>>>>Whats the output of /proc/slabinfo just before the VM starts killing
> >>>>>tasks?
> >>>>>
> >>>>>XFS is a very hungry memory user, but you have _a lot_ of memory;
> >>>>>there must be some freeable cache around.
> >>>>
> >>>>I agree.
> >>>>
> >>>>
> >>>>
> >>>>>Well, before gathering this data please try 2.6.8-final.
> >>>>
> >>>>You mean 2.6.8.1? I tried yesterday on another machine, and had to turn
> >>>>off acpi. :(
> >>>
> >>>
> >>>You need ACPI for the box to work properly?
> >>
> >>No, but that resulted in an immediate lockup of the kernel; it did not
> >>crash outright, but acpi sent out some error messages and blocked the
> >>system. I want acpi for better interrupt handling/balancing. But that is
> >>another machine. On this raid host I can stay without it for a while,
> >>but good powersaving/cpu temperature management is a must in the long
> >>term. It should be used for computations (dual Xeon 3 GHz).
> >>
> >>
> >>
> >>>>>I'm willing to help! :)
> >>>>
> >>>>Great, I'll get access to that machine tomorrow. Until then, tell me
> >>>>if I should turn on/off some kernel debug options ...
> >>>
> >>>
> >>>Not really, just enable swap and run 2.6.8.1.
> >>
> >>But why do I need swap at all? What's the difference between a 3 GB RAM
> >>+ 3 GB swap setup and 6 GB RAM only?
> >
> >
> >When the kernel swaps it knows you are out of memory (it has already thrown
> >away pagecache and inode/dentries caches).
> >
> >Swap is important in this case so the kernel can push some anonymous
> >memory to swap, in case you are overloaded with such.
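> >
> >For instance (the device name below is only an example, use whatever
> >spare partition you have):
> >
> >  # write a swap signature, enable the area, then verify it is in use
> >  mkswap /dev/sdb3
> >  swapon /dev/sdb3
> >  cat /proc/swaps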
> >
> >
> >>>If you have problems with such setup, we have a problem which must be
> >>>fixed.
> >>>
> >>>I bet v2.4 run flawlessly, but v2.6 is much faster.
> >>
> >>
> >>During installation of gentoo linux, the user has to unpack 2 .tar.gz
> >>files, one about 15 MB and the other bigger. Believe me, after mkfs.xfs
> >>I had to reboot first, then unpack the file, reboot, continue the
> >>installation. I think with the other (bigger?) file I did not make it a
> >>lot further in the install process. I remember what worked was to unpack
> >>on another server and use rsync(1) to copy the tree. Yes, that bad! Feel
> >>free to test http://www.gentoo.org's install procedure and try yourself.
> >>If I knew how to easily rip the install-cd kernel and replace it with my
> >>own, I'd play more. But basically, mkfs.xfs and then "bzip2 -dc
> >>*.tar.bz2 | tar xvf -" should do the trick on every huge filesystem.
> >
> >
> >Hum, very odd.
> >
> >


2004-10-29 07:16:47

by Nathan Scott

Subject: Re: best linux kernel with memory management

Hi Martin,

Sorry about the slow reply, been away for a bit...

On Thu, Oct 28, 2004 at 08:51:08AM -0200, Marcelo Tosatti wrote:
> On Wed, Oct 27, 2004 at 10:54:48AM +0200, Martin MOKREJŠ wrote:
> > Hi,
> > I have hit the memory problem again on the same host with 2.4.28-pre3.
> > I went to test a raid5 filesystem and wanted to evaluate the speed

Was that software (md) or hardware RAID5?

> > of different filesystems while studying different combinations
> > of their mkfs or mount options. I tested reiserfs3,
> > ext3, xfs, ext2.
> >
> > With a subset of the xfs tests I ran the server out of memory,
> > reproducibly. Those are tests on a filesystem which was created
> > with "mkfs.xfs -f -b log=9 -d sunit=64,swidth=64 /dev/sdb2"

With that blocksize (-b log=9 is a 512 byte blocksize) you will have
many more buffer_heads per page than the default (4k, i.e. 1-per-
page) - eight per 4k page instead of one - which may cause a
different kind of memory pressure to what you'd otherwise see.
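
For instance, you can watch the buffer_head slab grow while the
test runs with something like:

  # columns are: name, active objects, total objects, object size...
  grep buffer_head /proc/slabinfo

With 512 byte blocks that object count should climb roughly eight
times faster than with 4k blocks for the same amount of cached data.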

Are you tweaking all the filesystems to use that blocksize? I'm
not sure they all support one that small, actually; for ext2/3 I
think they stop at 1k as the smallest blocksize.

You should find the ideal MD RAID5 XFS geometry to be a 4k sector
size (-s size=4k) and a 4k blocksize (-b size=4k) to mkfs.xfs.
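
For example (keeping the stripe settings from your earlier command;
adjust sunit/swidth to match your actual array):

  mkfs.xfs -f -s size=4k -b size=4k -d sunit=64,swidth=64 /dev/sdb2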

> > and other sizes of swidth parameter. Omitting those parameters
> > makes no trouble and tests finish properly.

That's interesting; I expect this may be a buffer_head reclaim
issue then, if the larger blocksize runs are completing fine.
In that case, I wonder if the attached patch helps at all?

Can you send me your /proc/meminfo and /proc/slabinfo at the time
you see the failure? Also the console messages (I guess you sent
these to Marcelo earlier too) with the failing allocation messages...
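
A sketch of how you could capture that (the log file path is just
an example) while bonnie++ runs:

  # snapshot memory state every 10 seconds until interrupted
  while true; do
      date >> /tmp/memlog
      cat /proc/meminfo /proc/slabinfo >> /tmp/memlog
      sleep 10
  done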

> > I have noticed that maxfiles was reached and also the
> > 0 allocation failed messages.
> > although I forget whether I saw the 0 allocation pages
> > ...

(oh, and also which kernel versions are associated with which
sets of messages - looks like you've tried a few here).

> > >>>>kernel has worked fine. Also 2.4.25, which was on the Gentoo
> > >>>>install CDROM, had problems. I don't remember their exact revisions,
> > >>>>but I shouldn't have used xfs for / either. I thought better xfs
> > >>>>everywhere than combined with reiserfs. Of course, /boot is ext3.

There should be no problems using XFS for everything, including
/boot - I do that on all my systems (for a few years now).

> > >>on the internal raid using the gdth controller. After mkfs.xfs on the
> > >>internal disks I had to reboot often.

Hmm, that sounds like a device driver bug (if mkfs causes hangs..?).
Could you get sysrq-t or kdb stacktraces for the hung processes?
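
If the box is still responsive you can trigger that from a shell
(assuming sysrq support is compiled in):

  # enable the magic sysrq key, then dump all task stack traces
  echo 1 > /proc/sys/kernel/sysrq
  echo t > /proc/sysrq-trigger

and collect the traces from dmesg or the serial console; on a hung
console, Alt-SysRq-T does the same.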

cheers.

--
Nathan


Attachments:
free_more_memory (431.00 B)

2004-10-29 07:22:26

by Jan Engelhardt

Subject: Re: best linux kernel with memory management

>> > >>>>kernel has worked fine. Also 2.4.25, which was on the Gentoo
>> > >>>>install CDROM, had problems. I don't remember their exact revisions,
>> > >>>>but I shouldn't have used xfs for / either. I thought better xfs
>> > >>>>everywhere than combined with reiserfs. Of course, /boot is ext3.
>
>There should be no problems using XFS for everything, including
>/boot - I do that on all my systems (for a few years now).

/boot (whose root is / for me) can be reiser; the bootloader installer just
needs to know that it has to unpack files, otherwise they might not boot.
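
(For reiserfs that usually means making sure tail packing is off for
the kernel files before the bootloader installer runs, e.g.:

  # remount with tail packing disabled so the loader can map the files
  mount -o remount,notail /

though whether this is needed at all depends on the bootloader.)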



Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-10-29 13:09:13

by Rahul Karnik

Subject: Re: best linux kernel with memory management

On Fri, 29 Oct 2004 17:14:29 +1000, Nathan Scott <[email protected]> wrote:

> There should be no problems using XFS for everything, including
> /boot - I do that on all my systems (for a few years now).

When I last checked (~2 months ago), there was a GRUB bug that
prevented the use of XFS as the /boot filesystem. I use ext3 for my
/boot to get around this, with all my other filesystems being XFS. Any
chance the XFS devs could help the GRUB team fix the bug?

Thanks,
Rahul

2004-10-29 13:18:45

by Christoph Hellwig

Subject: Re: best linux kernel with memory management

On Fri, Oct 29, 2004 at 09:08:49AM -0400, Rahul Karnik wrote:
> On Fri, 29 Oct 2004 17:14:29 +1000, Nathan Scott <[email protected]> wrote:
>
> > There should be no problems using XFS for everything, including
> > /boot - I do that on all my systems (for a few years now).
>
> When I last checked (~2 months ago), there was a GRUB bug that
> prevented the use of XFS as the /boot filesystem. I use ext3 for my
> /boot to get around this, with all my other filesystems being XFS. Any
> chance the XFS devs could help the GRUB team fix the bug?

They just have to remove the broken pass where they try to read from the
raw device.