2009-01-29 04:56:20

by Davide Libenzi

[permalink] [raw]
Subject: [patch] drop epoll max_user_instances and rely only on max_user_watches

Linus suggested to put limits where the money is, and max_user_watches
already does that w/out the need of max_user_instances. That has the
advantage to mitigate the potential DoS while allowing pretty generous
default behavior.
Allowing top 4% of low memory (per user) to be allocated in epoll
watches, we have:

LOMEM MAX_WATCHES (per user)
512MB ~178000
1GB ~356000
2GB ~712000

A box with 512MB of lomem, will meet some challenge in hitting 180K
watches, socket buffers math teaches us.
No more max_user_instances limits then.



Signed-off-by: Davide Libenzi <[email protected]>


- Davide


---
fs/eventpoll.c | 22 ++++------------------
include/linux/sched.h | 1 -
2 files changed, 4 insertions(+), 19 deletions(-)

Index: linux-2.6.mod/fs/eventpoll.c
===================================================================
--- linux-2.6.mod.orig/fs/eventpoll.c 2009-01-28 20:06:55.000000000 -0800
+++ linux-2.6.mod/fs/eventpoll.c 2009-01-28 20:08:16.000000000 -0800
@@ -234,8 +234,6 @@
/*
* Configuration options available inside /proc/sys/fs/epoll/
*/
-/* Maximum number of epoll devices, per user */
-static int max_user_instances __read_mostly;
/* Maximum number of epoll watched descriptors, per user */
static int max_user_watches __read_mostly;

@@ -261,14 +259,6 @@

ctl_table epoll_table[] = {
{
- .procname = "max_user_instances",
- .data = &max_user_instances,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = &proc_dointvec_minmax,
- .extra1 = &zero,
- },
- {
.procname = "max_user_watches",
.data = &max_user_watches,
.maxlen = sizeof(int),
@@ -491,7 +481,6 @@

mutex_unlock(&epmutex);
mutex_destroy(&ep->mtx);
- atomic_dec(&ep->user->epoll_devs);
free_uid(ep->user);
kfree(ep);
}
@@ -581,10 +570,6 @@
struct eventpoll *ep;

user = get_current_user();
- error = -EMFILE;
- if (unlikely(atomic_read(&user->epoll_devs) >=
- max_user_instances))
- goto free_uid;
error = -ENOMEM;
ep = kzalloc(sizeof(*ep), GFP_KERNEL);
if (unlikely(!ep))
@@ -1141,7 +1126,6 @@
flags & O_CLOEXEC);
if (fd < 0)
ep_free(ep);
- atomic_inc(&ep->user->epoll_devs);

error_return:
DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n",
@@ -1366,8 +1350,10 @@
struct sysinfo si;

si_meminfo(&si);
- max_user_instances = 128;
- max_user_watches = (((si.totalram - si.totalhigh) / 32) << PAGE_SHIFT) /
+ /*
+ * Allows top 4% of lomem to be allocated for epoll watches (per user).
+ */
+ max_user_watches = (((si.totalram - si.totalhigh) / 25) << PAGE_SHIFT) /
EP_ITEM_COST;

/* Initialize the structure used to perform safe poll wait head wake ups */
Index: linux-2.6.mod/include/linux/sched.h
===================================================================
--- linux-2.6.mod.orig/include/linux/sched.h 2009-01-28 20:07:41.000000000 -0800
+++ linux-2.6.mod/include/linux/sched.h 2009-01-28 20:08:16.000000000 -0800
@@ -630,7 +630,6 @@
atomic_t inotify_devs; /* How many inotify devs does this user have opened? */
#endif
#ifdef CONFIG_EPOLL
- atomic_t epoll_devs; /* The number of epoll descriptors currently open */
atomic_t epoll_watches; /* The number of file descriptors currently watched */
#endif
#ifdef CONFIG_POSIX_MQUEUE


2009-01-29 08:42:23

by Bron Gondwana

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

On Wed, Jan 28, 2009 at 08:56:07PM -0800, Davide Libenzi wrote:
> Linus suggested to put limits where the money is, and max_user_watches
> already does that w/out the need of max_user_instances. That has the
> advantage to mitigate the potential DoS while allowing pretty generous
> default behavior.
> Allowing top 4% of low memory (per user) to be allocated in epoll
> watches, we have:
>
> LOMEM MAX_WATCHES (per user)
> 512MB ~178000
> 1GB ~356000
> 2GB ~712000
>
> A box with 512MB of lomem, will meet some challenge in hitting 180K
> watches, socket buffers math teaches us.
> No more max_user_instances limits then.

Excellent. Glad to see :) Saves me from keeping on working on my "only
account epoll within epoll for max_user_instances" patch, which would
have just been needless complexity (though cheap - is_file_epoll looks
pretty easy to check for both descriptors) - and I hadn't even compiled
and tested it yet.

Thanks Davide - this will keep our mxes humming along happily.

Bron.

2009-01-29 08:58:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches


> Subject: [patch] drop epoll max_user_instances and rely only on max_user_watches

nanonit: please prepare titles in the form "subsystem-id:
what-i-did-to-it", so a suitable name here would be

epoll: drop max_user_instances and rely only on max_user_watches

On Wed, 28 Jan 2009 20:56:07 -0800 (PST) Davide Libenzi <[email protected]> wrote:

> Linus suggested to put limits where the money is, and max_user_watches
> already does that w/out the need of max_user_instances. That has the
> advantage to mitigate the potential DoS while allowing pretty generous
> default behavior.

A reader of this changelog would be wondering what this DoS is.

> Allowing top 4% of low memory (per user) to be allocated in epoll
> watches, we have:
>
> LOMEM MAX_WATCHES (per user)
> 512MB ~178000
> 1GB ~356000
> 2GB ~712000
>
> A box with 512MB of lomem, will meet some challenge in hitting 180K
> watches, socket buffers math teaches us.
> No more max_user_instances limits then.

So the max consumable memory is

number-of-users * max_user_watches * sizeof(whatever)

?

So if enough users gang up (or if one person has access to a lot of
UIDs), there's still a DoS?

I suspect we can live with that.



I assume that because you based all this on all the other patches, you
view it as 2.6.30 material?

> @@ -581,10 +570,6 @@

please use `diff -p'. It helps.

> struct eventpoll *ep;
>
> user = get_current_user();
> - error = -EMFILE;
> - if (unlikely(atomic_read(&user->epoll_devs) >=
> - max_user_instances))
> - goto free_uid;
> error = -ENOMEM;
> ep = kzalloc(sizeof(*ep), GFP_KERNEL);
> if (unlikely(!ep))
> @@ -1141,7 +1126,6 @@
> flags & O_CLOEXEC);
> if (fd < 0)
> ep_free(ep);
> - atomic_inc(&ep->user->epoll_devs);
>
> error_return:
> DNPRINTK(3, (KERN_INFO "[%p] eventpoll: sys_epoll_create(%d) = %d\n",

I hit a reject here (which is actually in epoll_create1()) (and diff -p
might not have told us of this, because of that darned
SYSCALL_DEFINE1() thing we just added) (which broke ctags too).

The code I have is

if (error < 0)
ep_free(ep);
else
atomic_inc(&ep->user->epoll_devs);

so I obviously nuked the `else' as well.


2009-01-29 10:03:53

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

On Thu, Jan 29, 2009 at 9:57 PM, Andrew Morton
<[email protected]> wrote:
>
>> Subject: [patch] drop epoll max_user_instances and rely only on max_user_watches
>
> nanonit: please prepare titles in the form "subsystem-id:
> what-i-did-to-it", so a suitable name here would be
>
> epoll: drop max_user_instances and rely only on max_user_watches
>
> On Wed, 28 Jan 2009 20:56:07 -0800 (PST) Davide Libenzi <[email protected]> wrote:
>
>> Linus suggested to put limits where the money is, and max_user_watches
>> already does that w/out the need of max_user_instances. That has the
>> advantage to mitigate the potential DoS while allowing pretty generous
>> default behavior.

[...]

> I assume that because you based all this on all the other patches, you
> view it as 2.6.30 material?

Since max_user_instances was added in 2.6.28, and this is an ABI
change, I suggest this should go into .29-rc, so that
max_user_instances spends as little time in the wild as possible.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2009-01-29 18:26:24

by Greg KH

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

On Thu, Jan 29, 2009 at 11:03:37PM +1300, Michael Kerrisk wrote:
> On Thu, Jan 29, 2009 at 9:57 PM, Andrew Morton
> <[email protected]> wrote:
> >
> >> Subject: [patch] drop epoll max_user_instances and rely only on max_user_watches
> >
> > nanonit: please prepare titles in the form "subsystem-id:
> > what-i-did-to-it", so a suitable name here would be
> >
> > epoll: drop max_user_instances and rely only on max_user_watches
> >
> > On Wed, 28 Jan 2009 20:56:07 -0800 (PST) Davide Libenzi <[email protected]> wrote:
> >
> >> Linus suggested to put limits where the money is, and max_user_watches
> >> already does that w/out the need of max_user_instances. That has the
> >> advantage to mitigate the potential DoS while allowing pretty generous
> >> default behavior.
>
> [...]
>
> > I assume that because you based all this on all the other patches, you
> > view it as 2.6.30 material?
>
> Since max_user_instances was added in 2.6.28, and this is an ABI
> change, I suggest this should go into .29-rc, so that
> max_user_instances spends as little time in the wild as possible.

I agree. I'll backport the change to the .28 and .27 stable kernels as
well, as we have complaints about the default values breaking people's
machines.

thanks,

greg k-h

2009-01-29 18:46:19

by Davide Libenzi

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

On Thu, 29 Jan 2009, Andrew Morton wrote:

>
> > Subject: [patch] drop epoll max_user_instances and rely only on max_user_watches
>
> nanonit: please prepare titles in the form "subsystem-id:
> what-i-did-to-it", so a suitable name here would be
>
> epoll: drop max_user_instances and rely only on max_user_watches

Sorry, I forgot, again. I sent those by hand, while I have that logic in
my scripts.



> On Wed, 28 Jan 2009 20:56:07 -0800 (PST) Davide Libenzi <[email protected]> wrote:
>
> > Linus suggested to put limits where the money is, and max_user_watches
> > already does that w/out the need of max_user_instances. That has the
> > advantage to mitigate the potential DoS while allowing pretty generous
> > default behavior.
>
> A reader of this changelog would be wondering what this DoS is.

for i 1..1021
fd[i] = epoll_create()
for i 1..1021
for j 1..1021, j != i
epoll_ctl(fd[i], ADD, fd[j])

I'm not sure we want to advertise it too much though.


> > Allowing top 4% of low memory (per user) to be allocated in epoll
> > watches, we have:
> >
> > LOMEM MAX_WATCHES (per user)
> > 512MB ~178000
> > 1GB ~356000
> > 2GB ~712000
> >
> > A box with 512MB of lomem, will meet some challenge in hitting 180K
> > watches, socket buffers math teaches us.
> > No more max_user_instances limits then.
>
> So the max consumable memory is
>
> number-of-users * max_user_watches * sizeof(whatever)
>
> ?
>
> So if enough users gang up (or if one person has access to a lot of
> UIDs), there's still a DoS?
>
> I suspect we can live with that.

I think so, given that we've seen that putting too restrictive limits
creates more problems. A similar thing, even though milder, exist on
sys_poll too. sys_poll eats (on a 32bit box) 28 bytes of kernel memory for
each 12 bytes of user memory, worse figures on a 64bit box.



> I assume that because you based all this on all the other patches, you
> view it as 2.6.30 material?
>
> > @@ -581,10 +570,6 @@
>
> please use `diff -p'. It helps.

Need to add to my quilt diff options :)



> The code I have is
>
> if (error < 0)
> ep_free(ep);
> else
> atomic_inc(&ep->user->epoll_devs);
>
> so I obviously nuked the `else' as well.

That's the right thing.


- Davide

2009-01-29 18:55:10

by Davide Libenzi

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

On Thu, 29 Jan 2009, Michael Kerrisk wrote:

> Since max_user_instances was added in 2.6.28, and this is an ABI
> change, I suggest this should go into .29-rc, so that
> max_user_instances spends as little time in the wild as possible.

Agreed, like I already told Andrew. Sorry Michael for the double change to
the man page :/


- Davide

2009-02-01 01:27:10

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

Hi Davide,

On Fri, Jan 30, 2009 at 7:54 AM, Davide Libenzi <[email protected]> wrote:
> On Thu, 29 Jan 2009, Michael Kerrisk wrote:
>
>> Since max_user_instances was added in 2.6.28, and this is an ABI
>> change, I suggest this should go into .29-rc, so that
>> max_user_instances spends as little time in the wild as possible.
>
> Agreed, like I already told Andrew. Sorry Michael for the double change to
> the man page :/

So is the right change to the epoll(7) page (see
http://www.kernel.org/doc/man-pages/online/pages/man7/epoll.7.html )
to simply remive the text on max_user_instances:

/proc/sys/fs/epoll/max_user_instances (since Linux 2.6.28)
This specifies an upper limit on the number of epoll
instances that can be created per real user ID.

Or are other changes also required?

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html

2009-02-01 01:30:47

by Davide Libenzi

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

On Sun, 1 Feb 2009, Michael Kerrisk wrote:

> Hi Davide,
>
> On Fri, Jan 30, 2009 at 7:54 AM, Davide Libenzi <[email protected]> wrote:
> > On Thu, 29 Jan 2009, Michael Kerrisk wrote:
> >
> >> Since max_user_instances was added in 2.6.28, and this is an ABI
> >> change, I suggest this should go into .29-rc, so that
> >> max_user_instances spends as little time in the wild as possible.
> >
> > Agreed, like I already told Andrew. Sorry Michael for the double change to
> > the man page :/
>
> So is the right change to the epoll(7) page (see
> http://www.kernel.org/doc/man-pages/online/pages/man7/epoll.7.html )
> to simply remive the text on max_user_instances:
>
> /proc/sys/fs/epoll/max_user_instances (since Linux 2.6.28)
> This specifies an upper limit on the number of epoll
> instances that can be created per real user ID.
>
> Or are other changes also required?

Yes. Plus, max_user_watches, instead of being 1/32 of lowmem, it's 4%.
Just a detail.



- Davide

2009-02-01 01:43:59

by Michael Kerrisk

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

Hi Davide,

On Sun, Feb 1, 2009 at 2:30 PM, Davide Libenzi <[email protected]> wrote:
> On Sun, 1 Feb 2009, Michael Kerrisk wrote:
>
>> Hi Davide,
>>
>> On Fri, Jan 30, 2009 at 7:54 AM, Davide Libenzi <[email protected]> wrote:
>> > On Thu, 29 Jan 2009, Michael Kerrisk wrote:
>> >
>> >> Since max_user_instances was added in 2.6.28, and this is an ABI
>> >> change, I suggest this should go into .29-rc, so that
>> >> max_user_instances spends as little time in the wild as possible.
>> >
>> > Agreed, like I already told Andrew. Sorry Michael for the double change to
>> > the man page :/
>>
>> So is the right change to the epoll(7) page (see
>> http://www.kernel.org/doc/man-pages/online/pages/man7/epoll.7.html )
>> to simply remive the text on max_user_instances:
>>
>> /proc/sys/fs/epoll/max_user_instances (since Linux 2.6.28)
>> This specifies an upper limit on the number of epoll
>> instances that can be created per real user ID.
>>
>> Or are other changes also required?
>
> Yes. Plus, max_user_watches, instead of being 1/32 of lowmem, it's 4%.
> Just a detail.

Would you please review the following patch, which comments out the
text for max_user_instances, and updates the numbers for
max_user-watches.

Cheers,

Michael

diff --git a/man7/epoll.7 b/man7/epoll.7
index 7c67e13..6ee686e 100644
--- a/man7/epoll.7
+++ b/man7/epoll.7
@@ -172,10 +172,11 @@ with
.SS /proc interfaces
The following interfaces can be used to limit the amount of
kernel memory consumed by epoll:
-.TP
-.IR /proc/sys/fs/epoll/max_user_instances " (since Linux 2.6.28)"
-This specifies an upper limit on the number of epoll instances
-that can be created per real user ID.
+.\" Following was added in 2.6.28, but them removed in 2.6.29
+.\" .TP
+.\" .IR /proc/sys/fs/epoll/max_user_instances " (since Linux 2.6.28)"
+.\" This specifies an upper limit on the number of epoll instances
+.\" that can be created per real user ID.
.TP
.IR /proc/sys/fs/epoll/max_user_watches " (since Linux 2.6.28)"
This specifies a limit on the total number of
@@ -185,10 +186,10 @@ The limit is per real user ID.
Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel,
and roughly 160 bytes on a 64-bit kernel.
Currently,
-.\" 2.6.28
+.\" 2.6.29 (in 2.6.28, the default was 1/32 of lowmem)
the default value for
.I max_user_watches
-is 1/32 of the available low memory,
+is 1/25 (4%) of the available low memory,
divided by the registration cost in bytes.
.SS Example for Suggested Usage
While the usage of

2009-02-01 01:49:19

by Davide Libenzi

[permalink] [raw]
Subject: Re: [patch] drop epoll max_user_instances and rely only on max_user_watches

On Sun, 1 Feb 2009, Michael Kerrisk wrote:

> Hi Davide,
>
> On Sun, Feb 1, 2009 at 2:30 PM, Davide Libenzi <[email protected]> wrote:
> > On Sun, 1 Feb 2009, Michael Kerrisk wrote:
> >
> >> Hi Davide,
> >>
> >> On Fri, Jan 30, 2009 at 7:54 AM, Davide Libenzi <[email protected]> wrote:
> >> > On Thu, 29 Jan 2009, Michael Kerrisk wrote:
> >> >
> >> >> Since max_user_instances was added in 2.6.28, and this is an ABI
> >> >> change, I suggest this should go into .29-rc, so that
> >> >> max_user_instances spends as little time in the wild as possible.
> >> >
> >> > Agreed, like I already told Andrew. Sorry Michael for the double change to
> >> > the man page :/
> >>
> >> So is the right change to the epoll(7) page (see
> >> http://www.kernel.org/doc/man-pages/online/pages/man7/epoll.7.html )
> >> to simply remive the text on max_user_instances:
> >>
> >> /proc/sys/fs/epoll/max_user_instances (since Linux 2.6.28)
> >> This specifies an upper limit on the number of epoll
> >> instances that can be created per real user ID.
> >>
> >> Or are other changes also required?
> >
> > Yes. Plus, max_user_watches, instead of being 1/32 of lowmem, it's 4%.
> > Just a detail.
>
> Would you please review the following patch, which comments out the
> text for max_user_instances, and updates the numbers for
> max_user-watches.

Thank you Michael, looks fine to me.


- Davide