I think it's necessary to reserve some pids for the super user.
5 should be sufficient.
Signed-off-by: Gustavo Chain <[email protected]>
---
kernel/fork.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 33f12f4..db23cb3 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1367,11 +1367,17 @@ long do_fork(unsigned long clone_flags,
int __user *parent_tidptr,
int __user *child_tidptr)
{
+#define RESERVED_PIDS 5
+
struct task_struct *p;
int trace = 0;
struct pid *pid = alloc_pid();
long nr;
+ if (!capable(CAP_SYS_ADMIN) && nr_threads >= max_threads - RESERVED_PIDS) {
+ return -EAGAIN;
+ }
+
if (!pid)
return -EAGAIN;
nr = pid->nr;
--
Gustavo Chaín Dumit
Gustavo Chain wrote:
> I think it's necessary to reserve some pids for the super user.
> 5 should be sufficient.
Why? (Sorry if I missed something.)
Shouldn't you test for error return before the pid is allocated?
Otherwise, I think, you have to free it. Thus:
> long do_fork(unsigned long clone_flags,
> int __user *parent_tidptr,
> int __user *child_tidptr)
> {
> +#define RESERVED_PIDS 5 /* danged if I know why */
> +
> + if (!capable(CAP_SYS_ADMIN) && nr_threads >= max_threads - RESERVED_PIDS)
> + return -EAGAIN;
> +
>
> struct task_struct *p;
> int trace = 0;
> struct pid *pid = alloc_pid();
> long nr;
>
> if (!pid)
> return -EAGAIN;
> nr = pid->nr;
>
(While I'm being picky, I don't like braces around a simple return, and
neither, I note, does the style guide.)
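(Purely to illustrate the "otherwise you have to free it" point, not a tested
patch: if the check really had to stay after alloc_pid(), the error path would
also need to release the pid, something like the sketch below, using free_pid()
from kernel/pid.c.)

struct pid *pid = alloc_pid();

if (!pid)
        return -EAGAIN;

/* hypothetical late placement: the pid is already allocated here */
if (!capable(CAP_SYS_ADMIN) && nr_threads >= max_threads - RESERVED_PIDS) {
        free_pid(pid);          /* don't leak the freshly allocated pid */
        return -EAGAIN;
}
nr = pid->nr;

Doing the check before alloc_pid(), as in the version quoted above, avoids that
extra cleanup.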
On Wed, 10 Oct 2007 11:19:27 +0930
David Newall <[email protected]> wrote:
> Gustavo Chain wrote:
> > I think it's necessary to reserve some pids for the super user.
> > 5 should be sufficient.
>
> Why? (Sorry if I missed something.)
To prevent a possible DoS?
>
> Shouldn't you test for error return before the pid is allocated?
> Otherwise, I think, you have to free it. Thus:
> > long do_fork(unsigned long clone_flags,
> > int __user *parent_tidptr,
> > int __user *child_tidptr)
> > {
> > +#define RESERVED_PIDS 5 /* danged if I know why */
> > +
> > + if (!capable(CAP_SYS_ADMIN) && nr_threads >= max_threads - RESERVED_PIDS)
> > + return -EAGAIN;
> > +
> >
> > struct task_struct *p;
> > int trace = 0;
> > struct pid *pid = alloc_pid();
> > long nr;
> >
> > if (!pid)
> > return -EAGAIN;
> > nr = pid->nr;
> >
>
> (While I'm being picky, I don't like braces around a simple return,
> and neither, I note, does the style guide.)
--
Gustavo Chaín Dumit
Gustavo Chain wrote:
> On Wed, 10 Oct 2007 11:19:27 +0930
> David Newall <[email protected]> wrote:
>
>> Gustavo Chain wrote:
>>
>>> I think it's necessary to reserve some pids for the super user.
>>> 5 should be sufficient.
>>>
>> Why? (Sorry if I missed something.)
>>
>
> To prevent a possible DoS?
>
That was what I thought you had in mind; it protects from some kind of
fork bomb, right? But it doesn't seem useful unless you guarantee
having a process already running (with CAP_SYS_ADMIN) *before* the bomb
goes off.
On Wed, 10 Oct 2007 15:14:06 +0930
David Newall <[email protected]> wrote:
> Gustavo Chain wrote:
> > On Wed, 10 Oct 2007 11:19:27 +0930
> > David Newall <[email protected]> wrote:
> >
> >> Gustavo Chain wrote:
> >>
> >>> I think it's necessary to reserve some pids for the super user.
> >>> 5 should be sufficient.
> >>>
> >> Why? (Sorry if I missed something.)
> >>
> >
> > To prevent a possible DoS?
> >
>
> That was what I thought you had in mind; it protects from some kind
> of fork bomb, right? But it doesn't seem useful unless you guarantee
> having a process already running (with CAP_SYS_ADMIN) *before* the
> bomb goes off.
Not really, because a fork bomb will never reach the maximum possible pid.
And root will always have a "slot" to kill the desired processes.
--
Gustavo Chaín Dumit
Gustavo Chain wrote:
> On Wed, 10 Oct 2007 15:14:06 +0930
> David Newall <[email protected]> wrote:
>
>> Gustavo Chain wrote:
>>
>>> On Wed, 10 Oct 2007 11:19:27 +0930
>>> David Newall <[email protected]> wrote:
>>>
>>>
>>>> Gustavo Chain wrote:
>>>>
>>>>
>>>>> I think it's necessary to reserve some pids for the super user.
>>>>> 5 should be sufficient.
>>>>>
>>>>>
>>>> Why? (Sorry if I missed something.)
>>>>
>>>>
>>> To prevent a possible DoS?
>>>
>>>
>> That was what I thought you had in mind; it protects from some kind
>> of fork bomb, right? But it doesn't seem useful unless you guarantee
>> having a process already running (with CAP_SYS_ADMIN) *before* the
>> bomb goes off.
>>
>
> Not really, because a fork bomb will never reach the maximum possible pid.
> And root will always have a "slot" to kill the desired processes.
>
This is like pulling teeth: painful.
I don't think you have satisfactorily explained why it's necessary. "To
prevent a possible DoS" isn't sufficient by itself. I think you should
explain the scenarios you have in mind.
On Wed, 10 Oct 2007 09:46:22 EDT, Gustavo Chain said:
> On Wed, 10 Oct 2007 15:14:06 +0930
> David Newall <[email protected]> wrote:
> > That was what I thought you had in mind; it protects from some kind
> > of fork bomb, right? But it doesn't seem useful unless you guarantee
> > having a process already running (with CAP_SYS_ADMIN) *before* the
> > bomb goes off.
>
> Not really, because a fork bomb will never reach the maximum possible pid.
> And root will always have a "slot" to kill the desired processes.
What David meant was that "root will always have a slot" doesn't *actually*
help unless you *also* have a way to actually *spawn* such a process. In order
to do the ps, kill, and so on that you need to recover, you need to already
have either a root shell available, or a way to *get* a root shell that doesn't
rely on a non-root process (so /bin/su doesn't help here).
Many distros will leave a /sbin/mingetty running on tty1 through tty6, and
you *can* use those to get a root shell. David's point is that without
something like that already in place, the patch doesn't help....
> On Wed, 10 Oct 2007 09:46:22 EDT, Gustavo Chain said:
>> On Wed, 10 Oct 2007 15:14:06 +0930
>> David Newall <[email protected]> wrote:
>> > That was what I thought you had in mind; it protects from some kind
>> > of fork bomb, right? But it doesn't seem useful unless you guarantee
>> > having a process already running (with CAP_SYS_ADMIN) *before* the
>> > bomb goes off.
>>
>> Not really, because a fork bomb will never reach the maximum possible pid.
>> And root will always have a "slot" to kill the desired processes.
>
> What David meant was that "root will always have a slot" doesn't *actually*
> help unless you *also* have a way to actually *spawn* such a process. In order
> to do the ps, kill, and so on that you need to recover, you need to already
> have either a root shell available, or a way to *get* a root shell that doesn't
> rely on a non-root process (so /bin/su doesn't help here).
>
> Many distros will leave a /sbin/mingetty running on tty1 through tty6, and
> you *can* use those to get a root shell. David's point is that without
> something like that already in place, the patch doesn't help....
>
>
But once you are logged in, how can you "spawn" processes (ps, kill, and
so on) if the limit is reached?
[email protected] wrote:
> What David meant was that "root will always have a slot" doesn't *actually*
> help unless you *also* have a way to actually *spawn* such a process. In order
> to do the ps, kill, and so on that you need to recover, you need to already
> have either a root shell available, or a way to *get* a root shell that doesn't
> rely on a non-root process (so /bin/su doesn't help here).
That's right, although it's worse than that. You need to have a process
with CAP_SYS_ADMIN. If root processes normally have that capability
then the reserved slots may well disappear before you notice a problem.
If root processes normally don't have it, then you need to guarantee
that one is already running.
David Newall wrote:
> [email protected] wrote:
> > What David meant was that "root will always have a slot" doesn't
> > *actually* help unless you *also* have a way to actually *spawn* such a
> > process. In order to do the ps, kill, and so on that you need to
> > recover, you need to already have either a root shell available, or a
> > way to *get* a root shell that doesn't rely on a non-root process (so
> > /bin/su doesn't help here).
>
> That's right, although it's worse than that. You need to have a process
> with CAP_SYS_ADMIN. If root processes normally have that capability
> then the reserved slots may well disappear before you notice a problem.
> If root processes normally don't have it, then you need to guarantee
> that one is already running.
I once posted a patch to handle this DoS, but, as usual, it wasn't accepted.
Go figure...
Here is an excerpt:
Re: [PATCH 1/1] threads_max: Simple lockout prevention patch
From: Al Boldi <[email protected]>
To: Andrew Morton <[email protected]>
CC: [email protected]
Date: 04/24/06 02:12 pm
Andrew Morton wrote:
> Al Boldi <[email protected]> wrote:
> > This is another resend, which was ignored before w/o comment.
> > Andrew, can you at least comment on it? Thanks!
>
> I don't have a clue what it's for.
Quoting from the 'Resource limits' thread on lkml on 27/09/05:
>>>>> Consider this dilemma:
>>>>> Runaway proc/s hit the limit.
>>>>> Try to kill some and you are denied due to the resource limit.
>>>>> Use some previously running app like top, hope it hasn't been killed
>>>>> by some OOM situation, try killing some procs and another one takes
>>>>> its place because of the runaway situation.
>>>>> Raise the limit, and it gets filled by the runaways.
>>>>> You are pretty much stuck.
>>>>
>>>> Not really, this is the sort of thing ulimit is meant for. To keep
>>>> processes from any one user from running away. It lets you limit the
>>>> damage it can do, until such time as you can control it and fix the
>>>> runaway application.
>>>
>>> threads-max = 1024
>>> ulimit = 100 forks
>>> 11 runaway procs hitting the threads-max limit
>>
>> This is incorrect. If you ulimit a user to 100 forks, and 11 processes
>> running with that uid
>
> Different uid.
>
Then yes, if you set a system-wide limit that is less than the sum of the
limits imposed on each accountable part of the system, you can have a lockout.
But that's your fault for misconfiguring the system. Don't do that.
-- end of quote
Thanks!
--
Al
Please don't trim CC lists
On Oct 11, 2007, at 17:02:37, Al Boldi wrote:
> David Newall wrote:
>> [email protected] wrote:
>>> What David meant was that "root will always have a slot" doesn't
>>> *actually* help unless you *also* have a way to actually *spawn*
>>> such a process. In order to do the ps, kill, and so on that you
>>> need to recover, you need to already have either a root shell
>>> available, or a way to *get* a root shell that doesn't rely on a
>>> non-root process (so /bin/su doesn't help here).
>>
>> That's right, although it's worse than that. You need to have a
>> process with CAP_SYS_ADMIN. If root processes normally have that
>> capability then the reserved slots may well disappear before you
>> notice a problem. If root processes normally don't have it, then
>> you need to guarantee that one is already running.
>
> I once posted a patch to handle this DoS, but, as usual, it wasn't
> accepted. Go figure...
This isn't really necessary any more with the new CFS scheduler. If
you want to prevent excess memory usage then you limit memory usage,
not process count, so just set the system max process count to
something absurdly high and leave the user counts down at the maximum
a user might run. Then as long as the sum of the user processes is
less than the max number of processes (which you just set absurdly
high or unlimited), you may still log in. With the per-user
scheduling enabled CFS allows you to run an optimistically-real-time
game as one user and several thousand busy-loops as another user and
get almost picture perfect 50% CPU distribution between the users.
To me that seems a much better DoS-prevention system than limits
which don't scale based on how many people are requesting resources.
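(Concretely, the per-user cap I mean is just RLIMIT_NPROC, which login/PAM
normally sets; a rough userspace sketch, with made-up numbers, could look like
this.)

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
        /* Example values only: cap this user's processes well below the
         * (absurdly high) system-wide threads-max. */
        struct rlimit rl = { .rlim_cur = 4096, .rlim_max = 4096 };

        if (setrlimit(RLIMIT_NPROC, &rl) != 0) {
                perror("setrlimit");
                return 1;
        }

        /* Further fork()s by this uid beyond 4096 processes now fail with
         * EAGAIN, while other users (and root) are unaffected. */
        return 0;
}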
Cheers,
Kyle Moffett
Kyle Moffett wrote:
> Please don't trim CC lists
>
> On Oct 11, 2007, at 17:02:37, Al Boldi wrote:
> > David Newall wrote:
> >> [email protected] wrote:
> >>> What David meant was that "root will always have a slot" doesn't
> >>> *actually* help unless you *also* have a way to actually *spawn*
> >>> such a process. In order to do the ps, kill, and so on that you
> >>> need to recover, you need to already have either a root shell
> >>> available, or a way to *get* a root shell that doesn't rely on a
> >>> non-root process (so /bin/su doesn't help here).
> >>
> >> That's right, although it's worse than that. You need to have a
> >> process with CAP_SYS_ADMIN. If root processes normally have that
> >> capability then the reserved slots may well disappear before you
> >> notice a problem. If root processes normally don't have it, then
> >> you need to guarantee that one is already running.
> >
> > I once posted a patch to handle this DoS, but, as usual, it wasn't
> > accepted. Go figure...
>
> This isn't really necessary any more with the new CFS scheduler. If
> you want to prevent excess memory usage then you limit memory usage,
> not process count, so just set the system max process count to
> something absurdly high and leave the user counts down at the maximum
> a user might run. Then as long as the sum of the user processes is
> less than the max number of processes (which you just set absurdly
> high or unlimited), you may still log in. With the per-user
> scheduling enabled CFS allows you to run an optimistically-real-time
> game as one user and several thousand busy-loops as another user and
> get almost picture perfect 50% CPU distribution between the users.
> To me that seems a much better DoS-prevention system than limits
> which don't scale based on how many people are requesting resources.
You have a point, and resource-controllers can probably control DoS a lot
better, but they also incur more overhead. Think of this "lockout
prevention" patch as a near zero overhead safety valve.
Thanks!
--
Al
On Oct 12, 2007, at 01:37:23, Al Boldi wrote:
> Kyle Moffett wrote:
>> This isn't really necessary any more with the new CFS scheduler.
>> If you want to prevent excess memory usage then you limit memory
>> usage, not process count, so just set the system max process count
>> to something absurdly high and leave the user counts down at the
>> maximum a user might run. Then as long as the sum of the user
>> processes is less than the max number of processes (which you just
>> set absurdly high or unlimited), you may still log in. With the
>> per-user scheduling enabled CFS allows you to run an
>> optimistically-real-time game as one user and several thousand
>> busy-loops as another user and get almost picture perfect 50% CPU
>> distribution between the users. To me that seems a much better DoS-
>> prevention system than limits which don't scale based on how many
>> people are requesting resources.
>
> You have a point, and resource-controllers can probably control DoS
> a lot better, but they also incur more overhead. Think of this
> "lockout prevention" patch as a near zero overhead safety valve.
But why do you need to add "lockout prevention" if it already
exists? With CFS' extremely efficient per-user-scheduling (hopefully
soon to be the default) there are only two forms of lockout by non-
root processes: (1) Running out of PIDs in the box's PID-space
(think tens or hundreds of thousands of processes), or (2) Swap-
storming the box to death. To put it bluntly, trying to reserve free
PID slots is attacking the wrong end of the problem, and your
so-called "lockout prevention" could very easily ensure that 10 PIDs are
available even if the user has swap-stormed the box with the PIDs he
does have.
Cheers,
Kyle Moffett
Kyle Moffett wrote:
> On Oct 12, 2007, at 01:37:23, Al Boldi wrote:
> > You have a point, and resource-controllers can probably control DoS
> > a lot better, but they also incur more overhead. Think of this
> > "lockout prevention" patch as a near zero overhead safety valve.
>
> But why do you need to add "lockout prevention" if it already
> exists?
I said this before, but I'll say it again: it's about overhead!
> With CFS' extremely efficient per-user-scheduling (hopefully
> soon to be the default) there are only two forms of lockout by non-
> root processes: (1) Running out of PIDs in the box's PID-space
> (think tens or hundreds of thousands of processes), or (2) Swap-
> storming the box to death. To put it bluntly, trying to reserve free
> PID slots is attacking the wrong end of the problem, and your
> so-called "lockout prevention" could very easily ensure that 10 PIDs are
> available even if the user has swap-stormed the box with the PIDs he
> does have.
I think you are reading this wrong. It's not about reserving PIDs, it's
about exceeding the max-threads limit. This limit is global and affects
every user including root, which is good, as this allows the sysadmin to
fence the system into a controllable state. So once the system reaches the
fence, sysadmin-intervention allows root to exceed the fence.
Again, this is much nicer with real resource-controllers, but again it's also
more overhead.
Thanks!
--
Al
On Fri, 12 Oct 2007 09:29:10 +0300
Al Boldi <[email protected]> wrote:
> Kyle Moffett wrote:
> > On Oct 12, 2007, at 01:37:23, Al Boldi wrote:
> > > You have a point, and resource-controllers can probably control
> > > DoS a lot better, but they also incur more overhead. Think
> > > of this "lockout prevention" patch as a near zero overhead safety
> > > valve.
> >
> > But why do you need to add "lockout prevention" if it already
> > exists?
>
> I said this before, but I'll say it again: it's about overhead!
>
> > With CFS' extremely efficient per-user-scheduling (hopefully
> > soon to be the default) there are only two forms of lockout by non-
> > root processes: (1) Running out of PIDs in the box's PID-space
> > (think tens or hundreds of thousands of processes), or (2) Swap-
> > storming the box to death. To put it bluntly, trying to reserve free
> > PID slots is attacking the wrong end of the problem, and your
> > so-called "lockout prevention" could very easily ensure that 10 PIDs
> > are available even if the user has swap-stormed the box with the
> > PIDs he does have.
>
> I think you are reading this wrong. It's not about reserving PIDs,
> it's about exceeding the max-threads limit. This limit is global and
> affects every user including root, which is good, as this allows the
> sysadmin to fence the system into a controllable state. So once the
> system reaches the fence, sysadmin-intervention allows root to exceed
> the fence.
>
> Again, this is much nicer with real resource-controllers, but again
> it's also more overhead.
Just an _if()_?
Maybe enable it as an option in the kernel config?
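Something like this, just to illustrate (CONFIG_RESERVED_PIDS is a made-up
Kconfig symbol, and this is only a sketch of my earlier check, not a tested
patch):

#ifdef CONFIG_RESERVED_PIDS     /* hypothetical config option */
        if (!capable(CAP_SYS_ADMIN) &&
            nr_threads >= max_threads - RESERVED_PIDS)
                return -EAGAIN;
#endif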
>
> Thanks!
>
> --
> Al
>
--
Gustavo Chaín Dumit
Informatics Engineering student
Pontificia Universidad Católica de Valparaiso
http://aleph.homeunix.com/~gchain
Gustavo Chain wrote:
> Al Boldi <[email protected]> wrote:
> > Kyle Moffett wrote:
> > > On Oct 12, 2007, at 01:37:23, Al Boldi wrote:
> > > > You have a point, and resource-controllers can probably control
> > > > DoS a lot better, but they also incur more overhead. Think
> > > > of this "lockout prevention" patch as a near zero overhead safety
> > > > valve.
> > >
> > > But why do you need to add "lockout prevention" if it already
> > > exists?
> >
> > I said this before, but I'll say it again: it's about overhead!
> >
> > > With CFS' extremely efficient per-user-scheduling (hopefully
> > > soon to be the default) there are only two forms of lockout by non-
> > > root processes: (1) Running out of PIDs in the box's PID-space
> > > (think tens or hundreds of thousands of processes), or (2) Swap-
> > > storming the box to death. To put it bluntly, trying to reserve free
> > > PID slots is attacking the wrong end of the problem, and your
> > > so-called "lockout prevention" could very easily ensure that 10 PIDs
> > > are available even if the user has swap-stormed the box with the
> > > PIDs he does have.
> >
> > I think you are reading this wrong. It's not about reserving PIDs,
> > it's about exceeding the max-threads limit. This limit is global and
> > affects every user including root, which is good, as this allows the
> > sysadmin to fence the system into a controllable state. So once the
> > system reaches the fence, sysadmin-intervention allows root to exceed
> > the fence.
> >
> > Again, this is much nicer with real resource-controllers, but again
> > it's also more overhead.
>
> Just an _if()_?
>
> Maybe enable it as an option in the kernel config?
Here is the patch again:
[PATCH 1/1] threads_max: Simple lockout prevention patch
Simple attempt to provide a backdoor in a process lockout situation.
echo $$ > /proc/sys/kernel/su-pid allows that pid to exceed the threads_max limit.
Note that this patch incurs zero runtime overhead.
Signed-off-by: Al Boldi <[email protected]>
---
(patch against 2.6.14)
--- kernel/fork.c.orig 2005-11-14 20:55:33.000000000 +0300
+++ kernel/fork.c 2005-11-14 20:58:25.000000000 +0300
@@ -57,6 +57,7 @@
int nr_threads; /* The idle threads do not count.. */
int max_threads; /* tunable limit on nr_threads */
+int su_pid; /* BackDoor pid to exceed limit on nr_threads */
DEFINE_PER_CPU(unsigned long, process_counts) = 0;
@@ -926,6 +927,7 @@
* to stop root fork bombs.
*/
if (nr_threads >= max_threads)
+ if (p->pid != su_pid)
goto bad_fork_cleanup_count;
if (!try_module_get(p->thread_info->exec_domain->module))
--- kernel/sysctl.c.orig 2005-11-14 20:58:45.000000000 +0300
+++ kernel/sysctl.c 2005-11-14 21:01:20.000000000 +0300
@@ -57,6 +57,7 @@
extern int sysctl_overcommit_memory;
extern int sysctl_overcommit_ratio;
extern int max_threads;
+extern int su_pid;
extern int sysrq_enabled;
extern int core_uses_pid;
extern int suid_dumpable;
@@ -509,6 +510,14 @@
.proc_handler = &proc_dointvec,
},
{
+ .ctl_name = KERN_SU_PID,
+ .procname = "su-pid",
+ .data = &su_pid,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
.ctl_name = KERN_RANDOM,
.procname = "random",
.mode = 0555,
--- include/linux/sysctl.h.orig 2005-11-14 20:54:55.000000000 +0300
+++ include/linux/sysctl.h 2005-11-14 20:55:15.000000000 +0300
@@ -146,6 +146,7 @@
KERN_RANDOMIZE=68, /* int: randomize virtual address space */
KERN_SETUID_DUMPABLE=69, /* int: behaviour of dumps for setuid core */
KERN_SPIN_RETRY=70, /* int: number of spinlock retries */
+ KERN_SU_PID=71, /* int: BackDoor pid to exceed maximum nr of threads in the system */
};
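(For illustration only, assuming the patch above were applied: a recovery
shell or tool would register its own pid first, roughly like this.
/proc/sys/kernel/su-pid only exists with the patch.)

#include <stdio.h>
#include <unistd.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/kernel/su-pid", "w");

        if (!f) {
                perror("fopen");        /* sysctl missing: patch not applied */
                return 1;
        }

        /* Register this process as the one allowed past threads_max. */
        fprintf(f, "%d\n", (int)getpid());
        fclose(f);

        /* From here on, this process can fork() ps, kill, etc. even when
         * nr_threads has hit the max_threads fence. */
        return 0;
}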