2006-03-04 10:26:11

by David Miller

Subject: VFS nr_files accounting


I just wanted to report that I am hitting the "VFS: file-max limit xxx
reached" problem quite easily on my 32-cpu Niagara machine with 16GB
of ram with current 2.6.x GIT.

It seems far too easy to get a box into this state due to SLAB
fragmentation and RCU. And once you get a machine into this state it
is totally unusable.

Our test case is usually a "make -j8192" kernel build along with a
parallel bootstrap of gcc. That puts about 256 processes on each
cpu's runqueue, I doubt ksoftirqd can run much at all.

I think part of what helps trigger it might be ccache, which we are
using on this machine. ccache seems to open up a ton of files each
build invocation.

Usually within an hour of that load you'll hit the nr_files limit and
you can't run anything and have to power-cycle.

I think we need to think seriously about this problem.
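
For anyone who wants to watch the counter while reproducing this: /proc/sys/fs/file-nr reports three numbers (allocated file structures, allocated-but-unused ones, and the fs.file-max limit). Below is a minimal sketch in C of parsing that line. The names file_nr, parse_file_nr and file_nr_percent are made up for illustration and are not part of any kernel or libc interface.

```c
#include <assert.h>
#include <stdio.h>

/* The three fields of /proc/sys/fs/file-nr (names are illustrative). */
struct file_nr {
    unsigned long long allocated; /* file structures currently allocated */
    unsigned long long unused;    /* allocated but not in use */
    unsigned long long max;       /* same value as fs.file-max */
};

/* Parse one line in file-nr format. Returns 0 on success, -1 on error. */
int parse_file_nr(const char *line, struct file_nr *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%llu %llu %llu",
               &out->allocated, &out->unused, &out->max) != 3)
        return -1;
    return 0;
}

/* Allocated file structures as a percentage of the limit (0 if max is 0). */
unsigned long long file_nr_percent(const struct file_nr *nr)
{
    assert(nr != NULL);
    if (nr->max == 0)
        return 0;
    return nr->allocated * 100 / nr->max;
}
```

A caller would read one line from /proc/sys/fs/file-nr with fgets(), pass it to parse_file_nr(), and log file_nr_percent() periodically during the stress run to see how quickly the count approaches the limit.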


2006-03-04 14:18:31

by Dipankar Sarma

Subject: Re: VFS nr_files accounting

On Sat, Mar 04, 2006 at 02:25:46AM -0800, David S. Miller wrote:
>
> I just wanted to report that I am hitting the "VFS: file-max limit xxx
> reached" problem quite easily on my 32-cpu Niagara machine with 16GB
> of ram with current 2.6.x GIT.
>
> It seems far too easy to get a box into this state due to SLAB
> fragmentation and RCU. And once you get a machine into this state it
> is totally unusable.
>
> Our test case is usually a "make -j8192" kernel build along with a
> parallel bootstrap of gcc. That puts about 256 processes on each
> cpu's runqueue, I doubt ksoftirqd can run much at all.

Dave, there is a set of patches in -mm that may handle this
better -

http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/rcu-batch-tuning.patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/fix-file-counting.patch
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/fix-file-counting-fixes.patch

Could you please try this in your setup ?

The rcu-batch tuning patch provides automatic switching to
process as many RCUs as possible if too many of them are queued.
The file counting fixes count the file structures correctly.

Thanks
Dipankar

2006-03-04 22:22:16

by David Miller

Subject: Re: VFS nr_files accounting

From: Dipankar Sarma <[email protected]>
Date: Sat, 4 Mar 2006 19:47:17 +0530

> Dave, there is a set of patches in -mm that may handle this
> better -
>
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/rcu-batch-tuning.patch
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/fix-file-counting.patch
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/fix-file-counting-fixes.patch
>
> Could you please try this in your setup ?
>
> The rcu-batch tuning patch provides automatic switching to
> process as many RCUs as possible if too many of them are queued.
> The file counting fixes count the file structures correctly.

Thanks, I'll give these patches a spin.

2006-03-04 22:28:30

by David Miller

Subject: Re: VFS nr_files accounting

From: "David S. Miller" <[email protected]>
Date: Sat, 04 Mar 2006 14:22:02 -0800 (PST)

> From: Dipankar Sarma <[email protected]>
> Date: Sat, 4 Mar 2006 19:47:17 +0530
>
> > Dave, there is a set of patches in -mm that may handle this
> > better -
> >
> > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/rcu-batch-tuning.patch
> > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/fix-file-counting.patch
> > http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc5/2.6.16-rc5-mm2/broken-out/fix-file-counting-fixes.patch
> >
> > Could you please try this in your setup ?
> >
> > The rcu-batch tuning patch provides automatic switching to
> > process as many RCUs as possible if too many of them are queued.
> > The file counting fixes count the file structures correctly.
>
> Thanks, I'll give these patches a spin.

Sigh, this is going to take a while, because there are -mm
dependencies in these patches such as percpu_counter_sum().

I'll have to fish those out of -mm before I can start testing
this.
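
As background on why percpu_counter_sum() matters here, the following is a minimal userspace sketch in C of a per-CPU counter with a cheap approximate read and an exact sum, plus a limit check that only pays for the exact sum when the cheap read says the limit is reached. It is an illustration under assumptions, not the code from fix-file-counting.patch: NCPUS, BATCH and every pcpu_* name are hypothetical, and the real kernel counter uses true per-CPU storage and locking that this sketch omits.

```c
#include <assert.h>

#define NCPUS 4   /* hypothetical CPU count for the sketch */
#define BATCH 32  /* hypothetical threshold for folding a local delta */

struct pcpu_counter {
    long global;        /* approximate shared count */
    long local[NCPUS];  /* per-CPU deltas not yet folded into global */
};

/* Add delta on one CPU; fold into global only once the local delta is large. */
void pcpu_add(struct pcpu_counter *c, int cpu, long delta)
{
    assert(cpu >= 0 && cpu < NCPUS);
    c->local[cpu] += delta;
    if (c->local[cpu] >= BATCH || c->local[cpu] <= -BATCH) {
        c->global += c->local[cpu];
        c->local[cpu] = 0;
    }
}

/* Cheap read: ignores unfolded per-CPU deltas, so it can be stale. */
long pcpu_read_fast(const struct pcpu_counter *c)
{
    return c->global;
}

/* Exact read: walks every CPU's delta, so it costs more. */
long pcpu_sum(const struct pcpu_counter *c)
{
    long sum = c->global;
    for (int i = 0; i < NCPUS; i++)
        sum += c->local[i];
    return sum;
}

/* Returns 1 if a new allocation should be refused, 0 otherwise. */
int over_limit(const struct pcpu_counter *c, long max)
{
    if (pcpu_read_fast(c) < max)
        return 0;               /* common case: clearly under the limit */
    return pcpu_sum(c) >= max;  /* confirm with the exact count before failing */
}
```

The point of the two-step check is that a stale approximate count on a many-CPU box should not by itself produce a "file-max limit reached" failure; the exact sum is only computed on the rare path where the cheap read is at or over the limit.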

2006-03-04 22:32:24

by David Miller

Subject: Re: VFS nr_files accounting

From: "David S. Miller" <[email protected]>
Date: Sat, 04 Mar 2006 14:28:21 -0800 (PST)

> Sigh, this is going to take a while, because there are -mm
> dependencies in these patches such as percpu_counter_sum().
>
> I'll have to fish those out of -mm before I can start testing
> this.

And now that I've sucked in percpu_counter_sum.patch, the
rcu-batch-tuning.patch gets a bunch of rejects.

Sorry, I really can't test this. Can you by chance put together a
patch against vanilla 2.6.16-GIT? We'll need that to put a fix
for this bug into Linus's tree anyways.

Thanks.

2006-03-05 07:06:55

by Dipankar Sarma

Subject: Re: VFS nr_files accounting

On Sat, Mar 04, 2006 at 02:32:22PM -0800, David S. Miller wrote:
> From: "David S. Miller" <[email protected]>
> Date: Sat, 04 Mar 2006 14:28:21 -0800 (PST)
>
> And now that I've sucked in percpu_counter_sum.patch, the
> rcu-batch-tuning.patch gets a bunch of rejects.
>
> Sorry, I really can't test this. Can you by chance put together a
> patch against vanilla 2.6.16-GIT? We'll need that to put a fix
> for this bug into Linus's tree anyways.

Dave,

Can you check if the following patchset applies to the latest git ?
These were against 2.6.16-rc3.

http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/rcu-batch-tuning.patch
http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/percpu-counter-sum.patch
http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/fix-file-counting.patch

Thanks
Dipankar

2006-03-05 07:37:15

by David Miller

Subject: Re: VFS nr_files accounting

From: Dipankar Sarma <[email protected]>
Date: Sun, 5 Mar 2006 12:35:38 +0530

> Can you check if the following patchset applies to the latest git ?
> These were against 2.6.16-rc3.
>
> http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/rcu-batch-tuning.patch
> http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/percpu-counter-sum.patch
> http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/fix-file-counting.patch

Applies with some fuzz to kernel/sysctl.c

Thanks a lot!

2006-03-05 11:40:05

by Dipankar Sarma

Subject: Re: VFS nr_files accounting

On Sat, Mar 04, 2006 at 11:37:25PM -0800, David S. Miller wrote:
> From: Dipankar Sarma <[email protected]>
> Date: Sun, 5 Mar 2006 12:35:38 +0530
>
> > Can you check if the following patchset applies to the latest git ?
> > These were against 2.6.16-rc3.
> >
> > http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/rcu-batch-tuning.patch
> > http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/percpu-counter-sum.patch
> > http://www.hill9.org/linux/kernel/patches/2.6.16-rc3/fix-file-counting.patch
>
> Applies with some fuzz to kernel/sysctl.c

Great. I look forward to hearing from you about the results
with your test case.

Thanks
Dipankar

2006-03-06 20:38:52

by David Miller

Subject: Re: VFS nr_files accounting

From: Dipankar Sarma <[email protected]>
Date: Sun, 5 Mar 2006 17:08:47 +0530

> Great. I look forward to hearing from you about the results
> with your test case.

It works quite fine so far, I haven't seen the filp exhaustion
nor a highly fragmented filp SLAB.

Instead, I'm now hitting other bugs that are of my own doing
on Niagara, which is what I wanted to accomplish with these
stress tests in the first place :-)

I think we should seriously consider these patches for 2.6.16

2006-03-07 06:42:24

by Dipankar Sarma

Subject: Re: VFS nr_files accounting

On Mon, Mar 06, 2006 at 12:39:04PM -0800, David S. Miller wrote:
> From: Dipankar Sarma <[email protected]>
> Date: Sun, 5 Mar 2006 17:08:47 +0530
>
> > Great. I look forward to hearing from you about the results
> > with your test case.
>
> It works quite fine so far, I haven't seen the filp exhaustion
> nor a highly fragmented filp SLAB.

Good to hear that.

> Instead, I'm now hitting other bugs that are of my own doing
> on Niagara, which is what I wanted to accomplish with these
> stress tests in the first place :-)

Not good :)

> I think we should seriously consider these patches for 2.6.16

Isn't it a little too late in the 2.6.16 cycle ? I would have
liked a little more time in -mm. Anyway, it is Linus' call.
I can refresh the patches and submit against latest mainline
if Linus and Andrew want.

Thanks
Dipankar

2006-03-07 06:47:31

by David Miller

Subject: Re: VFS nr_files accounting

From: Dipankar Sarma <[email protected]>
Date: Tue, 7 Mar 2006 12:11:20 +0530

> On Mon, Mar 06, 2006 at 12:39:04PM -0800, David S. Miller wrote:
> > I think we should seriously consider these patches for 2.6.16
>
> Isn't it a little too late in the 2.6.16 cycle ? I would have
> liked a little more time in -mm. Anyway, it is Linus' call.
> I can refresh the patches and submit against latest mainline
> if Linus and Andrew want.

Users can run widely published programs to make one's system run out
of file descriptors and make the machine totally unusable.

If that doesn't qualify for something to fix for 2.6.16 I don't know
what does. :-)

2006-03-07 06:53:21

by Nick Piggin

Subject: Re: VFS nr_files accounting

Dipankar Sarma wrote:
> On Mon, Mar 06, 2006 at 12:39:04PM -0800, David S. Miller wrote:

>>I think we should seriously consider these patches for 2.6.16
>
>
> Isn't it a little too late in the 2.6.16 cycle ? I would have
> liked a little more time in -mm. Anyway, it is Linus' call.
> I can refresh the patches and submit against latest mainline
> if Linus and Andrew want.
>

If it is making the machine unusable then it is a bug rather than
just poor behaviour, and definitely a regression.

I think this is very good grounds to get into 2.6.16, even if it
does mean pushing the release back.

The other thing that is typically done for regressions like these
close to release time is to revert the offending changes. I figure
that in this case, such an option is probably _more_ risky.

--
SUSE Labs, Novell Inc.

2006-03-07 07:00:32

by David Miller

Subject: Re: VFS nr_files accounting

From: Nick Piggin <[email protected]>
Date: Tue, 07 Mar 2006 17:53:14 +1100

> The other thing that is typically done for regressions like these
> close to release time is to revert the offending changes. I figure
> that in this case, such an option is probably _more_ risky.

Especially since we're talking about something that went into
2.6.14

2006-03-07 07:09:22

by Andrew Morton

Subject: Re: VFS nr_files accounting

Dipankar Sarma <[email protected]> wrote:
>
> > I think we should seriously consider these patches for 2.6.16
>
> Isn't it a little too late in the 2.6.16 cycle ? I would have
> liked a little more time in -mm. Anyway, it is Linus' call.
> I can refresh the patches and submit against latest mainline
> if Linus and Andrew want.

I'd view a 2.6.16 merge as relatively low-risk. My main concern would be
possible breakage of those whacky route-cache workloads.

(I've consolidated the patches and rebased them against mainline).

2006-03-07 08:03:16

by Dipankar Sarma

Subject: Re: VFS nr_files accounting

On Mon, Mar 06, 2006 at 10:47:48PM -0800, David S. Miller wrote:
> From: Dipankar Sarma <[email protected]>
> Date: Tue, 7 Mar 2006 12:11:20 +0530
>
> > On Mon, Mar 06, 2006 at 12:39:04PM -0800, David S. Miller wrote:
> > > I think we should seriously consider these patches for 2.6.16
> >
> > Isn't it a little too late in the 2.6.16 cycle ? I would have
> > liked a little more time in -mm. Anyway, it is Linus' call.
> > I can refresh the patches and submit against latest mainline
> > if Linus and Andrew want.
>
> Users can run widely published programs to make one's system run out
> of file descriptors and make the machine totally unusable.
>
> If that doesn't qualify for something to fix for 2.6.16 I don't know
> what does. :-)

Not many people have access to shiny new 8-core 32-thread CPUs
(/me turns green saying this) :-)

To be honest I thought Linus' earlier fix that increased
RCU maximum batch size to 1000 had more or less fixed
the issue for most people. I haven't seen it in my testing,
but I agree that we have to take OOMs seriously. I am just being
paranoid here.

Thanks
Dipankar

2006-03-07 08:10:28

by Dipankar Sarma

Subject: Re: VFS nr_files accounting

On Mon, Mar 06, 2006 at 11:00:30PM -0800, David S. Miller wrote:
> From: Nick Piggin <[email protected]>
> Date: Tue, 07 Mar 2006 17:53:14 +1100
>
> > The other thing that is typically done for regressions like these
> > close to release time is to revert the offending changes. I figure
> > that in this case, such an option is probably _more_ risky.
>
> Especially since we're talking about something that went into
> 2.6.14

Without a doubt, yes. Waaaay more risky. Compared to that the
rcu-batch-tuning and fix-file-counting are a cakewalk. It would
however be nice if Christoph or Viro gives the file counting
patch a look-over just so that I didn't break any sysctl
stuff.

Thanks
Dipankar

2006-03-07 08:56:30

by Fabio M. Di Nitto

Subject: Re: VFS nr_files accounting

Dipankar Sarma wrote:
> On Mon, Mar 06, 2006 at 11:00:30PM -0800, David S. Miller wrote:
>> From: Nick Piggin <[email protected]>
>> Date: Tue, 07 Mar 2006 17:53:14 +1100
>>
>>> The other thing that is typically done for regressions like these
>>> close to release time is to revert the offending changes. I figure
>>> that in this case, such an option is probably _more_ risky.
>> Especially since we're talking about something that went into
>> 2.6.14
>
> Without a doubt, yes. Waaaay more risky. Compared to that the
> rcu-batch-tuning and fix-file-counting are a cakewalk. It would
> however be nice if Christoph or Viro gives the file counting
> patch a look-over just so that I didn't break any sysctl
> stuff.
>
> Thanks
> Dipankar

Considering the impact of the problem, wouldn't it be wise to push
the fix to -stable as well?

Thanks
Fabio

--
I'm going to make him an offer he can't refuse.

2006-03-07 12:10:28

by Dipankar Sarma

Subject: Re: VFS nr_files accounting

On Mon, Mar 06, 2006 at 11:06:39PM -0800, Andrew Morton wrote:
> Dipankar Sarma <[email protected]> wrote:
> >
> > > I think we should seriously consider these patches for 2.6.16
> >
> > Isn't it a little too late in the 2.6.16 cycle ? I would have
> > liked a little more time in -mm. Anyway, it is Linus' call.
> > I can refresh the patches and submit against latest mainline
> > if Linus and Andrew want.
>
> I'd view a 2.6.16 merge as relatively low-risk. My main concern would be
> possible breakage of those whacky route-cache workloads.

Yes, I was hoping that more time in -mm would bring out those
whacky corner case OOM/latency problems.

Anyway, here is the kernel parameter documentation patch.
I am not sure if I got the restrictions in square brackets
right.

Thanks
Dipankar



Update kernel parameters documentation for new RCU tuning
parameters.

Signed-off-by: Dipankar Sarma <[email protected]>
---


Documentation/kernel-parameters.txt | 13 +++++++++++++
1 files changed, 13 insertions(+)

diff -puN Documentation/kernel-parameters.txt~rcu-tuning-parm-doc Documentation/kernel-parameters.txt
--- linux-2.6.16-rc3-rcu/Documentation/kernel-parameters.txt~rcu-tuning-parm-doc 2006-03-07 17:23:52.000000000 +0530
+++ linux-2.6.16-rc3-rcu-dipankar/Documentation/kernel-parameters.txt 2006-03-07 17:33:59.000000000 +0530
@@ -1280,6 +1280,19 @@ running once the system is up.
                         New name for the ramdisk parameter.
                         See Documentation/ramdisk.txt.
 
+        rcu.blimit=     [KNL,BOOT] Set maximum number of finished
+                        RCU callbacks to process in one batch.
+
+        rcu.qhimark=    [KNL,BOOT] Set threshold of queued
+                        RCU callbacks over which batch limiting is disabled.
+
+        rcu.qlowmark=   [KNL,BOOT] Set threshold of queued
+                        RCU callbacks below which batch limiting is re-enabled.
+
+        rcu.rsinterval= [KNL,BOOT,SMP] Set the number of additional
+                        RCU callbacks to be queued before forcing reschedule
+                        on all cpus.
+
         rdinit=         [KNL]
                         Format: <full_path>
                         Run specified binary instead of /init from the ramdisk,

_
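
The three thresholds documented in the patch above can be read as a small hysteresis loop: callbacks are normally processed at most blimit per batch, the limit is lifted once more than qhimark callbacks are queued, and it is restored once the queue drains to qlowmark or below. The following is a minimal userspace sketch in C of that behaviour as described in the parameter text. The struct rcu_sim and the rcu_sim_* functions are hypothetical, the sketch is not taken from rcu-batch-tuning.patch, and rcu.rsinterval is left out.

```c
#include <assert.h>
#include <limits.h>

/* Toy model of one CPU's callback queue (all names are illustrative). */
struct rcu_sim {
    long qlen;      /* callbacks currently queued */
    int  cur_limit; /* batch limit in effect right now */
    int  blimit;    /* normal batch limit, as in rcu.blimit */
    long qhimark;   /* above this, batch limiting is disabled */
    long qlowmark;  /* at or below this, batch limiting is re-enabled */
};

void rcu_sim_init(struct rcu_sim *s, int blimit, long qhimark, long qlowmark)
{
    assert(blimit > 0 && qlowmark < qhimark);
    s->qlen = 0;
    s->cur_limit = blimit;
    s->blimit = blimit;
    s->qhimark = qhimark;
    s->qlowmark = qlowmark;
}

/* Queue one callback; lift the batch limit once the queue passes qhimark. */
void rcu_sim_enqueue(struct rcu_sim *s)
{
    if (++s->qlen > s->qhimark)
        s->cur_limit = INT_MAX;
}

/* Process one batch and return how many callbacks were handled. */
long rcu_sim_do_batch(struct rcu_sim *s)
{
    long done = s->qlen < s->cur_limit ? s->qlen : s->cur_limit;

    s->qlen -= done;
    /* Once the backlog has drained, go back to the normal batch limit. */
    if (s->cur_limit == INT_MAX && s->qlen <= s->qlowmark)
        s->cur_limit = s->blimit;
    return done;
}
```

With blimit=10, qhimark=100 and qlowmark=5, a queue of 50 callbacks is worked off 10 at a time, while a queue that grows past 100 is drained in a single pass before the limit of 10 applies again. Keeping qlowmark well below qhimark avoids flapping between the two modes.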