2009-01-21 11:58:46

by Peter Palfrader

[permalink] [raw]
Subject: 2.6.28, rlimits, performance and debian etch

Hi,

I spent several hours trying to get to the bottom of a serious
performance issue that appeared on one of our servers after upgrading to
2.6.28. In the end it's what could be considered a userspace bug that
was triggered by a change in 2.6.28. Since this might also affect other
people I figured I'd at least document what I found here, and maybe we
can even do something about it:


So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
the team maintaining our ftp archive complained that one of their
scripts that previously ran in a few minutes still hadn't even come
close to being done after an hour or so. Downgrading to 2.6.27 fixed
that.

Turns out that script is forking a lot and something in it or python or
whereever closes all the file descriptors it doesn't want to pass on.
That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
closes them all with a few exceptions.

Turns out that takes a long time when your limit -n is now 2^20 (1048576).

With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
now a thousand times that.

2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
allows, as the title implies, to set the limit for number of files to
infinity.

Closer investigation showed that the broken default ulimit did not apply
to "system" processes (like stuff started from init). In the end I
could establish that all processes that passed through pam_limit at one
point had the bad resource limit.

Apparently the pam library in Debian etch (4.0) initializes the limits
to some default values when it doesn't have any settings in limit.conf
to override them. Turns out that for nofiles this is RLIM_INFINITY.
Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam
package version 0.79-5 fixes that - tho I'm not sure what side effects
that has.

Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
uses a different pam (version).


I'm a bit unsure where to go from here. Maybe the pam library in etch
should be fixed. Maybe the patch should be reverted (but then it may be
more correct now and that's what the changelog entry suggests).
As a stopgap measure I could also just define nofile in limits.conf.

Thanks for listening. Also thanks to Rik and Nocholas who helped track
some of this down.

Cheers,
Peter
1. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/


2009-01-23 21:52:33

by Florian Weimer

[permalink] [raw]
Subject: Re: 2.6.28, rlimits, performance and debian etch

* Peter Palfrader:

> Turns out that script is forking a lot and something in it or python or
> whereever closes all the file descriptors it doesn't want to pass on.
> That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
> closes them all with a few exceptions.
>
> Turns out that takes a long time when your limit -n is now 2^20 (1048576).

Interesting.

Can we make /proc more-or-less mandatory, so that the file descriptor
list can be retrieved explicitly?

2009-01-23 22:02:42

by David Daney

[permalink] [raw]
Subject: Re: 2.6.28, rlimits, performance and debian etch

Florian Weimer wrote:
> * Peter Palfrader:
>
>> Turns out that script is forking a lot and something in it or python or
>> whereever closes all the file descriptors it doesn't want to pass on.
>> That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
>> closes them all with a few exceptions.
>>
>> Turns out that takes a long time when your limit -n is now 2^20 (1048576).
>
> Interesting.
>
> Can we make /proc more-or-less mandatory, so that the file descriptor
> list can be retrieved explicitly?

One problem is that for values of RLIMIT_NOFILE less than something like
4096, it is much faster to call sys_close() on all possible values than
iterate through a handful of open files from /proc/self/fd using
opendir(3)/readdir(3).

Obviously for some large values of RLIMIT_NOFILE, this is no longer true.

People who have written code based on measuring the difference end up
getting screwed when RLIMIT_NOFILE unexpectedly increases.

The real solution is to convert your user space programs to use the new
syscalls that allow for race-free setting of close-on-exec. Then you no
longer need to mess around with iterating over these things.

David Daney

2009-01-23 23:11:44

by Peter Palfrader

[permalink] [raw]
Subject: Re: 2.6.28, rlimits, performance and debian etch

On Fri, 23 Jan 2009, David Daney wrote:

> The real solution is to convert your user space programs to use the new
> syscalls that allow for race-free setting of close-on-exec. Then you no
> longer need to mess around with iterating over these things.

It's python's popen2 implementation that does that for us. At least for
python2.4 and 2.5.

In our particular case moving away from calling external tools and doing
more within the scripts themselves brought a real speedup, but it's
probably not just us :)

--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/

2009-01-25 10:59:35

by Florian Weimer

[permalink] [raw]
Subject: Re: 2.6.28, rlimits, performance and debian etch

* David Daney:

> One problem is that for values of RLIMIT_NOFILE less than something
> like 4096, it is much faster to call sys_close() on all possible
> values than iterate through a handful of open files from /proc/self/fd
> using opendir(3)/readdir(3).

Really? Yuck.

> The real solution is to convert your user space programs to use the
> new syscalls that allow for race-free setting of close-on-exec.
> Then you no longer need to mess around with iterating over these
> things.

These system calls are too recent to use in the next two Debian
releases. In addition, we can't really be sure that all libraries use
the new calls.

I find the design of the CLOEXEC business somewhat revulsive, by the
way. It reminds me of those DoSomethingEx APIs in another platform.
The *_at system calls are a similar wart. Even with this stuff, I
still can't safely open a file with a different effective user ID in a
multithreaded application, or create a AF_UNIX socket with specific
file system permissions. Some thread-specific context with what have
been traditionally considered per-process attributes might have been
better. 8-(

2009-01-27 23:19:21

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.28, rlimits, performance and debian etch

On Wed, 21 Jan 2009 12:52:19 +0100
Peter Palfrader <[email protected]> wrote:

> Hi,
>
> I spent several hours trying to get to the bottom of a serious
> performance issue that appeared on one of our servers after upgrading to
> 2.6.28. In the end it's what could be considered a userspace bug that
> was triggered by a change in 2.6.28. Since this might also affect other
> people I figured I'd at least document what I found here, and maybe we
> can even do something about it:
>
>
> So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
> the team maintaining our ftp archive complained that one of their
> scripts that previously ran in a few minutes still hadn't even come
> close to being done after an hour or so. Downgrading to 2.6.27 fixed
> that.
>
> Turns out that script is forking a lot and something in it or python or
> whereever closes all the file descriptors it doesn't want to pass on.
> That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
> closes them all with a few exceptions.
>
> Turns out that takes a long time when your limit -n is now 2^20 (1048576).
>
> With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
> now a thousand times that.
>
> 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
> RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
> allows, as the title implies, to set the limit for number of files to
> infinity.
>
> Closer investigation showed that the broken default ulimit did not apply
> to "system" processes (like stuff started from init). In the end I
> could establish that all processes that passed through pam_limit at one
> point had the bad resource limit.
>
> Apparently the pam library in Debian etch (4.0) initializes the limits
> to some default values when it doesn't have any settings in limit.conf
> to override them. Turns out that for nofiles this is RLIM_INFINITY.
> Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam
> package version 0.79-5 fixes that - tho I'm not sure what side effects
> that has.
>
> Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
> uses a different pam (version).
>
>
> I'm a bit unsure where to go from here. Maybe the pam library in etch
> should be fixed. Maybe the patch should be reverted (but then it may be
> more correct now and that's what the changelog entry suggests).
> As a stopgap measure I could also just define nofile in limits.conf.
>
> Thanks for listening. Also thanks to Rik and Nocholas who helped track
> some of this down.
>
> Cheers,
> Peter
> 1. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f

Ho hum, thanks.

Well, I think we just revert it for now. We can bring it back later
if someone is thus inclined. Along with some sort of opt-in control,
perhaps in /proc. Which defaults to "off".


2009-01-29 12:19:20

by Adam Tkac

[permalink] [raw]
Subject: Re: 2.6.28, rlimits, performance and debian etch

On Tue, Jan 27, 2009 at 03:17:03PM -0800, Andrew Morton wrote:
> On Wed, 21 Jan 2009 12:52:19 +0100
> Peter Palfrader <[email protected]> wrote:
>
> > Hi,
> >
> > I spent several hours trying to get to the bottom of a serious
> > performance issue that appeared on one of our servers after upgrading to
> > 2.6.28. In the end it's what could be considered a userspace bug that
> > was triggered by a change in 2.6.28. Since this might also affect other
> > people I figured I'd at least document what I found here, and maybe we
> > can even do something about it:
> >
> >
> > So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
> > the team maintaining our ftp archive complained that one of their
> > scripts that previously ran in a few minutes still hadn't even come
> > close to being done after an hour or so. Downgrading to 2.6.27 fixed
> > that.
> >
> > Turns out that script is forking a lot and something in it or python or
> > whereever closes all the file descriptors it doesn't want to pass on.
> > That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
> > closes them all with a few exceptions.
> >
> > Turns out that takes a long time when your limit -n is now 2^20 (1048576).
> >
> > With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
> > now a thousand times that.
> >
> > 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
> > RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
> > allows, as the title implies, to set the limit for number of files to
> > infinity.
> >
> > Closer investigation showed that the broken default ulimit did not apply
> > to "system" processes (like stuff started from init). In the end I
> > could establish that all processes that passed through pam_limit at one
> > point had the bad resource limit.
> >
> > Apparently the pam library in Debian etch (4.0) initializes the limits
> > to some default values when it doesn't have any settings in limit.conf
> > to override them. Turns out that for nofiles this is RLIM_INFINITY.
> > Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam
> > package version 0.79-5 fixes that - tho I'm not sure what side effects
> > that has.
> >
> > Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
> > uses a different pam (version).
> >
> >
> > I'm a bit unsure where to go from here. Maybe the pam library in etch
> > should be fixed. Maybe the patch should be reverted (but then it may be
> > more correct now and that's what the changelog entry suggests).
> > As a stopgap measure I could also just define nofile in limits.conf.
> >
> > Thanks for listening. Also thanks to Rik and Nocholas who helped track
> > some of this down.
> >
> > Cheers,
> > Peter
> > 1. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
>
> Ho hum, thanks.
>
> Well, I think we just revert it for now. We can bring it back later
> if someone is thus inclined. Along with some sort of opt-in control,
> perhaps in /proc. Which defaults to "off".
>

Hi all,

I don't think the "rlim_infinity" patch should be reverted. Let me try
to explain why.

First, code which sets limits to RLIM_INFINITY is very bad idea and
that code is Debian specific. I downloaded original pam 0.79 (stripped):

for(i = 0; i < RLIM_NLIMITS; i++) {
...
int r = getrlimit(i, &pl->limits[i].limit);
...

as you can see original pam sets limits to inherited defaults. After
code written above Debian adds their patch:

if (limits_not_defined_in_limits_conf) {
...
case RLIMIT_NOFILE:
...
pl->limits[i].limit.rlim_cur = RLIM_INFINITY;
pl->limits[i].limit.rlim_max = RLIM_INFINITY;
...
}

so as you can see inherited default limits are overriden to infinity.
In my opinion Debian should revert their patch which is, at least,
pretty incorrect and unsecure.

Next argument is POSIX compatibility (from setrlimit() specification):

"The value RLIM_INFINITY, defined in <sys/resource.h>, shall be
considered to be larger than any other limit value. If a call to
getrlimit() returns RLIM_INFINITY for a resource, it means the
implementation shall not enforce limits on that resource. Specifying
RLIM_INFINITY as any resource limit value on a successful call to
setrlimit() shall inhibit enforcement of that resource limit."

So kernel does what is expected. If you want "unlimited" number of
descriptors, you have it.

Please consider again where exactly problem is, if in Debian patch or
in kernel patch. From my point of view Debian patch should be
reverted, not the kernel one.

Please add me to CC, I'm not member of the list.

Regards, Adam

___
Adam Tkac

2009-01-29 18:06:17

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.6.28, rlimits, performance and debian etch

On Thu, 29 Jan 2009 13:19:00 +0100 Adam Tkac <[email protected]> wrote:

> On Tue, Jan 27, 2009 at 03:17:03PM -0800, Andrew Morton wrote:
> > On Wed, 21 Jan 2009 12:52:19 +0100
> > Peter Palfrader <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I spent several hours trying to get to the bottom of a serious
> > > performance issue that appeared on one of our servers after upgrading to
> > > 2.6.28. In the end it's what could be considered a userspace bug that
> > > was triggered by a change in 2.6.28. Since this might also affect other
> > > people I figured I'd at least document what I found here, and maybe we
> > > can even do something about it:
> > >
> > >
> > > So, I upgraded some of debian.org's machines to 2.6.28.1 and immediately
> > > the team maintaining our ftp archive complained that one of their
> > > scripts that previously ran in a few minutes still hadn't even come
> > > close to being done after an hour or so. Downgrading to 2.6.27 fixed
> > > that.
> > >
> > > Turns out that script is forking a lot and something in it or python or
> > > whereever closes all the file descriptors it doesn't want to pass on.
> > > That is, it starts at zero and goes up to ulimit -n/RLIMIT_NOFILE and
> > > closes them all with a few exceptions.
> > >
> > > Turns out that takes a long time when your limit -n is now 2^20 (1048576).
> > >
> > > With 2.6.27.* the ulimit -n was the standard 1024, but with 2.6.28 it is
> > > now a thousand times that.
> > >
> > > 2.6.28 included a patch titled "rlimit: permit setting RLIMIT_NOFILE to
> > > RLIM_INFINITY" (0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f)[1] that
> > > allows, as the title implies, to set the limit for number of files to
> > > infinity.
> > >
> > > Closer investigation showed that the broken default ulimit did not apply
> > > to "system" processes (like stuff started from init). In the end I
> > > could establish that all processes that passed through pam_limit at one
> > > point had the bad resource limit.
> > >
> > > Apparently the pam library in Debian etch (4.0) initializes the limits
> > > to some default values when it doesn't have any settings in limit.conf
> > > to override them. Turns out that for nofiles this is RLIM_INFINITY.
> > > Commenting out "case RLIMIT_NOFILE" in pam_limit.c:267 of our pam
> > > package version 0.79-5 fixes that - tho I'm not sure what side effects
> > > that has.
> > >
> > > Debian lenny (the upcoming 5.0 version) doesn't have this issue as it
> > > uses a different pam (version).
> > >
> > >
> > > I'm a bit unsure where to go from here. Maybe the pam library in etch
> > > should be fixed. Maybe the patch should be reverted (but then it may be
> > > more correct now and that's what the changelog entry suggests).
> > > As a stopgap measure I could also just define nofile in limits.conf.
> > >
> > > Thanks for listening. Also thanks to Rik and Nocholas who helped track
> > > some of this down.
> > >
> > > Cheers,
> > > Peter
> > > 1. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
> > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=0c2d64fb6cae9aae480f6a46cfe79f8d7d48b59f
> >
> > Ho hum, thanks.
> >
> > Well, I think we just revert it for now. We can bring it back later
> > if someone is thus inclined. Along with some sort of opt-in control,
> > perhaps in /proc. Which defaults to "off".
> >
>
> Hi all,
>
> I don't think the "rlim_infinity" patch should be reverted. Let me try
> to explain why.
>
> First, code which sets limits to RLIM_INFINITY is very bad idea and
> that code is Debian specific. I downloaded original pam 0.79 (stripped):
>
> for(i = 0; i < RLIM_NLIMITS; i++) {
> ...
> int r = getrlimit(i, &pl->limits[i].limit);
> ...
>
> as you can see original pam sets limits to inherited defaults. After
> code written above Debian adds their patch:
>
> if (limits_not_defined_in_limits_conf) {
> ...
> case RLIMIT_NOFILE:
> ...
> pl->limits[i].limit.rlim_cur = RLIM_INFINITY;
> pl->limits[i].limit.rlim_max = RLIM_INFINITY;
> ...
> }
>
> so as you can see inherited default limits are overriden to infinity.
> In my opinion Debian should revert their patch which is, at least,
> pretty incorrect and unsecure.
>
> Next argument is POSIX compatibility (from setrlimit() specification):
>
> "The value RLIM_INFINITY, defined in <sys/resource.h>, shall be
> considered to be larger than any other limit value. If a call to
> getrlimit() returns RLIM_INFINITY for a resource, it means the
> implementation shall not enforce limits on that resource. Specifying
> RLIM_INFINITY as any resource limit value on a successful call to
> setrlimit() shall inhibit enforcement of that resource limit."
>
> So kernel does what is expected. If you want "unlimited" number of
> descriptors, you have it.
>
> Please consider again where exactly problem is, if in Debian patch or
> in kernel patch. From my point of view Debian patch should be
> reverted, not the kernel one.
>

Sure, debian might well be wrong. But the bottom line is that the
kernel changed, and people's machines broke.

If the kernel change was really really important then we might just
grit our teeth and live with the breakage. But this change _wasn't_ a
terribly important one. So I think we should back it out while we find
another way of implementing it which does not break currently deployed
installations.

2009-01-29 18:10:27

by Peter Palfrader

[permalink] [raw]
Subject: Re: 2.6.28, rlimits, performance and debian etch

On Thu, 29 Jan 2009, Andrew Morton wrote:

> Sure, debian might well be wrong. But the bottom line is that the
> kernel changed, and people's machines broke.
>
> If the kernel change was really really important then we might just
> grit our teeth and live with the breakage. But this change _wasn't_ a
> terribly important one. So I think we should back it out while we find
> another way of implementing it which does not break currently deployed
> installations.

FWIWs, if Debian stable (4.0) is really the only place that breaks with
this then re-introducing it in a couple of months when the new version
of Debian (5.0) has been out for a while might well be an option.

Peter
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/