2018-10-31 15:10:41

by Daniel Colascione

[permalink] [raw]
Subject: [PATCH] Document /proc/pid PID reuse behavior

State explicitly that holding a /proc/pid file descriptor open does
not reserve the PID. Also note that in the event of PID reuse, these
open file descriptors refer to the old, now-dead process, and not the
new one that happens to be named the same numeric PID.
---
Documentation/filesystems/proc.txt | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 12a5e6e693b6..567f66a8a23c 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -214,6 +214,14 @@ asynchronous manner and the value may not be very precise. To see a precise
snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
It's slow but very precise.

+Note that an open a file descriptor to /proc/<pid> or to any of its
+contained files or subdirectories does not prevent <pid> being reused
+for some other process in the event that <pid> exits. Operations on
+open /proc/<pid> file descriptors corresponding to dead processes
+never act on any new process that the kernel may, through chance, have
+also assigned the process ID <pid>. Instead, operations on these FDs
+usually fail with ESRCH.
+
Table 1-2: Contents of the status files (as of 4.8)
..............................................................................
Field Content
--
2.19.1.568.g152ad8e336-goog



2018-11-01 07:09:52

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH] Document /proc/pid PID reuse behavior

On Wed, Oct 31, 2018 at 03:06:22PM +0000, Daniel Colascione wrote:
> State explicitly that holding a /proc/pid file descriptor open does
> not reserve the PID. Also note that in the event of PID reuse, these
> open file descriptors refer to the old, now-dead process, and not the
> new one that happens to be named the same numeric PID.

Signed-off is missing.

> ---
> Documentation/filesystems/proc.txt | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 12a5e6e693b6..567f66a8a23c 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -214,6 +214,14 @@ asynchronous manner and the value may not be very precise. To see a precise
> snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
> It's slow but very precise.
>
> +Note that an open a file descriptor to /proc/<pid> or to any of its
> +contained files or subdirectories does not prevent <pid> being reused
> +for some other process in the event that <pid> exits. Operations on
> +open /proc/<pid> file descriptors corresponding to dead processes
> +never act on any new process that the kernel may, through chance, have
> +also assigned the process ID <pid>. Instead, operations on these FDs
> +usually fail with ESRCH.
> +

I'd put this text in the beginning of the section, just before table 1-1.
Otherwise looks good.

It maybe also useful to update 'man 5 proc' (1) as well

[1] https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/tree/man5/proc.5

> Table 1-2: Contents of the status files (as of 4.8)
> ..............................................................................
> Field Content
> --
> 2.19.1.568.g152ad8e336-goog
>

--
Sincerely yours,
Mike.


2018-11-05 13:25:16

by Daniel Colascione

[permalink] [raw]
Subject: [PATCH v2] Document /proc/pid PID reuse behavior

State explicitly that holding a /proc/pid file descriptor open does
not reserve the PID. Also note that in the event of PID reuse, these
open file descriptors refer to the old, now-dead process, and not the
new one that happens to be named the same numeric PID.

Signed-off-by: Daniel Colascione <[email protected]>
---
Documentation/filesystems/proc.txt | 7 +++++++
1 file changed, 7 insertions(+)

Moved paragraphed to start of /proc/pid section; added signed-off-by.

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 12a5e6e693b6..0b14460f721d 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
The link self points to the process reading the file system. Each process
subdirectory has the entries listed in Table 1-1.

+Note that an open a file descriptor to /proc/<pid> or to any of its
+contained files or subdirectories does not prevent <pid> being reused
+for some other process in the event that <pid> exits. Operations on
+open /proc/<pid> file descriptors corresponding to dead processes
+never act on any new process that the kernel may, through chance, have
+also assigned the process ID <pid>. Instead, operations on these FDs
+usually fail with ESRCH.

Table 1-1: Process specific entries in /proc
..............................................................................
--
2.19.1.930.g4563a0d9d0-goog


2018-11-06 06:02:16

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Mon, Nov 05, 2018 at 01:22:05PM +0000, Daniel Colascione wrote:
> State explicitly that holding a /proc/pid file descriptor open does
> not reserve the PID. Also note that in the event of PID reuse, these
> open file descriptors refer to the old, now-dead process, and not the
> new one that happens to be named the same numeric PID.
>
> Signed-off-by: Daniel Colascione <[email protected]>

Reviewed-by: Mike Rapoport <[email protected]>

> ---
> Documentation/filesystems/proc.txt | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> Moved paragraphed to start of /proc/pid section; added signed-off-by.
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 12a5e6e693b6..0b14460f721d 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> The link self points to the process reading the file system. Each process
> subdirectory has the entries listed in Table 1-1.
>
> +Note that an open a file descriptor to /proc/<pid> or to any of its
> +contained files or subdirectories does not prevent <pid> being reused
> +for some other process in the event that <pid> exits. Operations on
> +open /proc/<pid> file descriptors corresponding to dead processes
> +never act on any new process that the kernel may, through chance, have
> +also assigned the process ID <pid>. Instead, operations on these FDs
> +usually fail with ESRCH.
>
> Table 1-1: Process specific entries in /proc
> ..............................................................................
> --
> 2.19.1.930.g4563a0d9d0-goog
>

--
Sincerely yours,
Mike.


2018-11-06 13:06:51

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Mon 05-11-18 13:22:05, Daniel Colascione wrote:
> State explicitly that holding a /proc/pid file descriptor open does
> not reserve the PID. Also note that in the event of PID reuse, these
> open file descriptors refer to the old, now-dead process, and not the
> new one that happens to be named the same numeric PID.

This sounds quite obvious otherwise anybody could simply DoS the system
by consuming all available pids. But I do not see any strong reason
against having that stated explicitly.

> Signed-off-by: Daniel Colascione <[email protected]>

Acked-by: Michal Hocko <[email protected]>

> ---
> Documentation/filesystems/proc.txt | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> Moved paragraphed to start of /proc/pid section; added signed-off-by.
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 12a5e6e693b6..0b14460f721d 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> The link self points to the process reading the file system. Each process
> subdirectory has the entries listed in Table 1-1.
>
> +Note that an open a file descriptor to /proc/<pid> or to any of its
> +contained files or subdirectories does not prevent <pid> being reused
> +for some other process in the event that <pid> exits. Operations on
> +open /proc/<pid> file descriptors corresponding to dead processes
> +never act on any new process that the kernel may, through chance, have
> +also assigned the process ID <pid>. Instead, operations on these FDs
> +usually fail with ESRCH.
>
> Table 1-1: Process specific entries in /proc
> ..............................................................................
> --
> 2.19.1.930.g4563a0d9d0-goog

--
Michal Hocko
SUSE Labs

2018-11-07 16:01:33

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Wed 07-11-18 15:48:20, Daniel Colascione wrote:
> On Tue, Nov 6, 2018 at 1:05 PM, Michal Hocko <[email protected]> wrote:
> > On Mon 05-11-18 13:22:05, Daniel Colascione wrote:
> >> State explicitly that holding a /proc/pid file descriptor open does
> >> not reserve the PID. Also note that in the event of PID reuse, these
> >> open file descriptors refer to the old, now-dead process, and not the
> >> new one that happens to be named the same numeric PID.
> >
> > This sounds quite obvious
>
> Many people *on* *LKML* were wrong about this behavior. If it's not
> obvious to experienced kernel developers, it's certainly not obvious
> to the public.

Fair enough

> > otherwise anybody could simply DoS the system
> > by consuming all available pids.
>
> People can do that today using the instrument of terror widely known
> as fork(2). The only thing standing between fork(2) and a full process
> table is RLIMIT_NPROC.

not really. If you really do care about pid space depletion then you
should use pid cgroup controller.
--
Michal Hocko
SUSE Labs

2018-11-07 16:21:13

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Wed 07-11-18 16:10:01, Daniel Colascione wrote:
> On Wed, Nov 7, 2018 at 4:00 PM, Michal Hocko <[email protected]> wrote:
[...]
> > If you really do care about pid space depletion then you
> > should use pid cgroup controller.
>
> Or that, sure. But since cgroups are optional, the problem with the
> core model remains. In general, if there's a problem X with the core
> system API, and you can mitigate X by using a cgroup, X is still a
> problem.

I am not questioning that. All that I am saying is that there is a way to
mitigate the issue. This is not the only resource where the standard
rlimit is not sufficient and there is no other way than cgroups.
Actually cgroups were introduced to address rlimit limits IIRC.

--
Michal Hocko
SUSE Labs

2018-11-07 17:05:49

by Martin Steigerwald

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

Michal Hocko - 07.11.18, 17:00:
> > > otherwise anybody could simply DoS the system
> > > by consuming all available pids.
> >
> > People can do that today using the instrument of terror widely known
> > as fork(2). The only thing standing between fork(2) and a full
> > process table is RLIMIT_NPROC.
>
> not really. If you really do care about pid space depletion then you
> should use pid cgroup controller.

Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
bits. Right? Could it be raised to 32 bits? I bet it would be a major
change throughout different parts of the kernel.

16 bits sound a bit low these days, not only for PIDs, but also for
connections / ports.

--
Martin



2018-11-07 17:19:07

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue, Nov 06, 2018 at 08:01:13AM +0200, Mike Rapoport wrote:
> > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> > index 12a5e6e693b6..0b14460f721d 100644
> > --- a/Documentation/filesystems/proc.txt
> > +++ b/Documentation/filesystems/proc.txt
> > @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> > The link self points to the process reading the file system. Each process
> > subdirectory has the entries listed in Table 1-1.
> >
> > +Note that an open a file descriptor to /proc/<pid> or to any of its

"open file descriptor" (the "a" is unnecessary)

> > +contained files or subdirectories does not prevent <pid> being reused
> > +for some other process in the event that <pid> exits. Operations on
> > +open /proc/<pid> file descriptors corresponding to dead processes
> > +never act on any new process that the kernel may, through chance, have
> > +also assigned the process ID <pid>. Instead, operations on these FDs
> > +usually fail with ESRCH.

The paragraph is a bit wordy. More pithy:

An open file descriptor for /proc/<pid> (or any of the files or
subdirectories in it) does not prevent <pid> from being reused after
the process exits. Operations on a file descriptor referring to a dead
process usually return ESRCH. They do not act on any new process which
has been assigned the same <pid>.

2018-11-07 18:01:50

by Daniel Colascione

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue, Nov 6, 2018 at 1:05 PM, Michal Hocko <[email protected]> wrote:
> On Mon 05-11-18 13:22:05, Daniel Colascione wrote:
>> State explicitly that holding a /proc/pid file descriptor open does
>> not reserve the PID. Also note that in the event of PID reuse, these
>> open file descriptors refer to the old, now-dead process, and not the
>> new one that happens to be named the same numeric PID.
>
> This sounds quite obvious

Many people *on* *LKML* were wrong about this behavior. If it's not
obvious to experienced kernel developers, it's certainly not obvious
to the public.

> otherwise anybody could simply DoS the system
> by consuming all available pids.

People can do that today using the instrument of terror widely known
as fork(2). The only thing standing between fork(2) and a full process
table is RLIMIT_NPROC. In a world where we really did reserve a
numeric PID through the lifetime of any struct pid to which it refers
(i.e., where "cd /proc/$PID" held $PID), we could charge these struct
pid reservations against RLIMIT_NPROC and achieve behavior as safe as
what we have today. The details would be subtle (you'd have to take
pains to avoid double-counting, for example), but it could be made to
work. Other people, on the various lkml threads about my process API
improvement proposals, have proposed fixing the longstanding PID race
problem by making struct pid behave the way people mistakenly believe
it behaves today. It's a serious idea worth actual consideration.

2018-11-07 18:04:42

by Daniel Colascione

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Wed, Nov 7, 2018 at 4:00 PM, Michal Hocko <[email protected]> wrote:
> On Wed 07-11-18 15:48:20, Daniel Colascione wrote:
>> On Tue, Nov 6, 2018 at 1:05 PM, Michal Hocko <[email protected]> wrote:
>> > otherwise anybody could simply DoS the system
>> > by consuming all available pids.
>>
>> People can do that today using the instrument of terror widely known
>> as fork(2). The only thing standing between fork(2) and a full process
>> table is RLIMIT_NPROC.
>
> not really.

What else, besides memory consumption and (as you mention below)
cgroups? In practice, nobody uses RLIMIT_NPROC, so outside of various
container-y namespaced setups, avoidance of
system-DoS-through-PID-exhaustion isn't a pressing problem.

If you really do care about pid space depletion then you
> should use pid cgroup controller.

Or that, sure. But since cgroups are optional, the problem with the
core model remains. In general, if there's a problem X with the core
system API, and you can mitigate X by using a cgroup, X is still a
problem.

2018-11-07 18:22:51

by Daniel Colascione

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Wed, Nov 7, 2018 at 9:16 AM, Matthew Wilcox <[email protected]> wrote:
> On Tue, Nov 06, 2018 at 08:01:13AM +0200, Mike Rapoport wrote:
>> > diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> > index 12a5e6e693b6..0b14460f721d 100644
>> > --- a/Documentation/filesystems/proc.txt
>> > +++ b/Documentation/filesystems/proc.txt
>> > @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>> > The link self points to the process reading the file system. Each process
>> > subdirectory has the entries listed in Table 1-1.
>> >
>> > +Note that an open a file descriptor to /proc/<pid> or to any of its
>
> "open file descriptor" (the "a" is unnecessary)

Thanks for spotting that. I had to re-read that sentence three times
or so before even seeing that extra "a".

>> > +contained files or subdirectories does not prevent <pid> being reused
>> > +for some other process in the event that <pid> exits. Operations on
>> > +open /proc/<pid> file descriptors corresponding to dead processes
>> > +never act on any new process that the kernel may, through chance, have
>> > +also assigned the process ID <pid>. Instead, operations on these FDs
>> > +usually fail with ESRCH.
>
> The paragraph is a bit wordy. More pithy:
>
> An open file descriptor for /proc/<pid> (or any of the files or
> subdirectories in it) does not prevent <pid> from being reused after
> the process exits. Operations on a file descriptor referring to a dead
> process usually return ESRCH. They do not act on any new process which
> has been assigned the same <pid>.

I'll send a new patch version --- unless we can just tweak the patch
when we merge it into the tree?

2018-11-08 12:06:26

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2] Document /proc/pid PID reuse behavior

From: Martin Steigerwald
> Sent: 07 November 2018 17:05
...
> Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> bits. Right? Could it be raised to 32 bits? I bet it would be a major
> change throughout different parts of the kernel.

It is probably 15 bits (since -ve pid numbers are used for process groups).

My guess is that userspace and the system call interface will handle 32bit
(signed) pid numbers.
(I don't remember 'linux emulation' being one of the emulations that
would truncate 32bit pids when one of the BDSs went to 32bit pids.)
The main problem will be that big numbers will mess up the alignment
of printouts from ps and top (etc).
This can be mitigated by only allocating 'big' numbers on systems
that have a lot of pids.
You also really want an O(1) allocator.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2018-11-08 12:30:08

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Thu, Nov 08, 2018 at 12:02:49PM +0000, David Laight wrote:
> From: Martin Steigerwald
> > Sent: 07 November 2018 17:05
> ...
> > Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> > bits. Right? Could it be raised to 32 bits? I bet it would be a major
> > change throughout different parts of the kernel.
>
> It is probably 15 bits (since -ve pid numbers are used for process groups).
>
> My guess is that userspace and the system call interface will handle 32bit
> (signed) pid numbers.
> (I don't remember 'linux emulation' being one of the emulations that
> would truncate 32bit pids when one of the BDSs went to 32bit pids.)
> The main problem will be that big numbers will mess up the alignment
> of printouts from ps and top (etc).
> This can be mitigated by only allocating 'big' numbers on systems
> that have a lot of pids.
> You also really want an O(1) allocator.

The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
n in this case is the highest ID which is still in use. The tree is
log_64(n) levels high. It walks to the bottom of the tree and puts a
pointer into the tree. If the cursor has wrapped to the beginning of
the tree, it may encounter a PID which is still in use; if it does,
it does a bitmap scan of that node, and will then walk up the tree,
doing a bitmap scan forward at each level until it finds a free PID.

So it's not exactly O(log(n)), but it's close enough for all practical
purposes. And more importantly, it doesn't touch a lot of cachelines.
Two or three at each level of the tree that it accesses. If we went
all the way to a 32-bit PID, the tree would grow to 6 levels deep,
and worst-case would touch 6 + 5 + 4 levels of the tree (starting with
trying to allocate PID 0xffffffff, failing, trying to allocate PID 300,
then having to walk all the way forward to find PID 0xe0000000), so
that's only 45 cachelines.

People care a little too much about O(1)/O(n) behaviour. Cacheline
behaviour, and good average-case performance without falling off a cliff
in the worst case is much more important.

2018-11-08 13:26:19

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Wed 07-11-18 18:04:59, Martin Steigerwald wrote:
> Michal Hocko - 07.11.18, 17:00:
> > > > otherwise anybody could simply DoS the system
> > > > by consuming all available pids.
> > >
> > > People can do that today using the instrument of terror widely known
> > > as fork(2). The only thing standing between fork(2) and a full
> > > process table is RLIMIT_NPROC.
> >
> > not really. If you really do care about pid space depletion then you
> > should use pid cgroup controller.
>
> Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> bits. Right? Could it be raised to 32 bits? I bet it would be a major
> change throughout different parts of the kernel.
>
> 16 bits sound a bit low these days, not only for PIDs, but also for
> connections / ports.

Do you have any specific example of the pid space exhaustion? Well
except for a fork bomb attacks that could be mitigated by the pid cgroup
controller.
--
Michal Hocko
SUSE Labs

2018-11-08 13:43:42

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2] Document /proc/pid PID reuse behavior

From: Matthew Wilcox
> Sent: 08 November 2018 12:28
>
> On Thu, Nov 08, 2018 at 12:02:49PM +0000, David Laight wrote:
> > From: Martin Steigerwald
> > > Sent: 07 November 2018 17:05
> > ...
> > > Its not quite on-topic, but I am curious now: AFAIK PID limit is 16
> > > bits. Right? Could it be raised to 32 bits? I bet it would be a major
> > > change throughout different parts of the kernel.
> >
> > It is probably 15 bits (since -ve pid numbers are used for process groups).
> >
> > My guess is that userspace and the system call interface will handle 32bit
> > (signed) pid numbers.
> > (I don't remember 'linux emulation' being one of the emulations that
> > would truncate 32bit pids when one of the BDSs went to 32bit pids.)
> > The main problem will be that big numbers will mess up the alignment
> > of printouts from ps and top (etc).
> > This can be mitigated by only allocating 'big' numbers on systems
> > that have a lot of pids.
> > You also really want an O(1) allocator.
>
> The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
> n in this case is the highest ID which is still in use. The tree is
> log_64(n) levels high. It walks to the bottom of the tree and puts a
> pointer into the tree. If the cursor has wrapped to the beginning of
> the tree, it may encounter a PID which is still in use; if it does,
> it does a bitmap scan of that node, and will then walk up the tree,
> doing a bitmap scan forward at each level until it finds a free PID.

Right, but you can choose the pid so that you get a perfect hash.
You can then put a FIFO free list through the unused entries of
the hash table (just an array).
Then pid allocate just picks the oldest free entry and ups the
high bits (that the hash masks out) to make the old value stale.
Almost no cache lines are involved in the whole operation.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2018-11-08 14:09:19

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Thu, Nov 08, 2018 at 01:42:41PM +0000, David Laight wrote:
> From: Matthew Wilcox
> > On Thu, Nov 08, 2018 at 12:02:49PM +0000, David Laight wrote:
> > > This can be mitigated by only allocating 'big' numbers on systems
> > > that have a lot of pids.
> > > You also really want an O(1) allocator.
> >
> > The allocator is O(log n) -- it's the IDR allocator, used in cyclic mode.
> > n in this case is the highest ID which is still in use. The tree is
> > log_64(n) levels high. It walks to the bottom of the tree and puts a
> > pointer into the tree. If the cursor has wrapped to the beginning of
> > the tree, it may encounter a PID which is still in use; if it does,
> > it does a bitmap scan of that node, and will then walk up the tree,
> > doing a bitmap scan forward at each level until it finds a free PID.
>
> Right, but you can choose the pid so that you get a perfect hash.
> You can then put a FIFO free list through the unused entries of
> the hash table (just an array).
> Then pid allocate just picks the oldest free entry and ups the
> high bits (that the hash masks out) to make the old value stale.
> Almost no cache lines are involved in the whole operation.

You'd be looking for something like dynamic perfect hashing then?
I think that'd make a great project for someone to try out. I don't
have the time to look into it myself, and I'm not convinced there's
a real workload that would benefit. But it'd be a great project for
a university student.


2018-11-08 14:15:26

by David Laight

[permalink] [raw]
Subject: RE: [PATCH v2] Document /proc/pid PID reuse behavior

From: Matthew Wilcox
> Sent: 08 November 2018 14:07
...
>
> You'd be looking for something like dynamic perfect hashing then?
> I think that'd make a great project for someone to try out. I don't
> have the time to look into it myself, and I'm not convinced there's
> a real workload that would benefit. But it'd be a great project for
> a university student.

Been there, done that several times.
Half an afternoon's work.

I'm surprised there isn't a C++ container class that will give
you a 'id' for a pointer (or other large value).
But they all do it the other way around.

It is also trivial to double the size of the array and 'unzip'
the allocated items into their correct halves while adding the
other entry to the free list.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2018-11-19 10:55:26

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
> State explicitly that holding a /proc/pid file descriptor open does
> not reserve the PID. Also note that in the event of PID reuse, these
> open file descriptors refer to the old, now-dead process, and not the
> new one that happens to be named the same numeric PID.
>
> Signed-off-by: Daniel Colascione <[email protected]>
> ---
> Documentation/filesystems/proc.txt | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> Moved paragraphed to start of /proc/pid section; added signed-off-by.
>
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 12a5e6e693b6..0b14460f721d 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> The link self points to the process reading the file system. Each process
> subdirectory has the entries listed in Table 1-1.
>
> +Note that an open a file descriptor to /proc/<pid> or to any of its
> +contained files or subdirectories does not prevent <pid> being reused
> +for some other process in the event that <pid> exits. Operations on

"does not" -> "may not"?

We want to leave this unspecified, so that we can change it in future.
Pavel


--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (1.47 kB)
signature.asc (188.00 B)
Digital signature
Download all attachments

2018-11-19 11:19:44

by Aleksa Sarai

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On 2018-11-07, Daniel Colascione <[email protected]> wrote:
> On Wed, Nov 7, 2018 at 4:00 PM, Michal Hocko <[email protected]> wrote:
> > On Wed 07-11-18 15:48:20, Daniel Colascione wrote:
> >> On Tue, Nov 6, 2018 at 1:05 PM, Michal Hocko <[email protected]> wrote:
> >> > otherwise anybody could simply DoS the system
> >> > by consuming all available pids.
> >>
> >> People can do that today using the instrument of terror widely known
> >> as fork(2). The only thing standing between fork(2) and a full process
> >> table is RLIMIT_NPROC.
> >
> > not really.
>
> What else, besides memory consumption and (as you mention below)
> cgroups? In practice, nobody uses RLIMIT_NPROC, so outside of various
> container-y namespaced setups, avoidance of
> system-DoS-through-PID-exhaustion isn't a pressing problem.

systemd has had a default pid cgroup controller policy (for both user
and system slices) for a quite long time. I believe that the most recent
version of most enterprise and community distributions support it by
default (and probably even some older versions -- commit 49b786ea146f
was merged in 2015 and I think systemd grew support for it in 2016).

I agree with your overall point, but it should be noted that the vast
majority of Linux systems these days have protections against this (by
default) that use the pids cgroup controller.

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>


Attachments:
(No filename) (1.45 kB)
signature.asc (849.00 B)
Download all attachments

2018-11-19 16:25:25

by Daniel Colascione

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Mon, Nov 19, 2018 at 2:54 AM, Pavel Machek <[email protected]> wrote:
> On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
>> State explicitly that holding a /proc/pid file descriptor open does
>> not reserve the PID. Also note that in the event of PID reuse, these
>> open file descriptors refer to the old, now-dead process, and not the
>> new one that happens to be named the same numeric PID.
>>
>> Signed-off-by: Daniel Colascione <[email protected]>
>> ---
>> Documentation/filesystems/proc.txt | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> Moved paragraphed to start of /proc/pid section; added signed-off-by.
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index 12a5e6e693b6..0b14460f721d 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>> The link self points to the process reading the file system. Each process
>> subdirectory has the entries listed in Table 1-1.
>>
>> +Note that an open a file descriptor to /proc/<pid> or to any of its
>> +contained files or subdirectories does not prevent <pid> being reused
>> +for some other process in the event that <pid> exits. Operations on
>
> "does not" -> "may not"?
>
> We want to leave this unspecified, so that we can change it in future.

No. "Does not". I'm sick and tired of procfs behavior being vague and
unspecified to the point where even kernel developers have an
incorrect mental model of how it all works. With Christian Brauner's
good work in place and hopefully my own work to follow, we're moving
firmly in the direction of procfs handles being struct pid references.
Instead of waffling further, let's just buy into this model and do the
best job we can of making it work well.

2018-11-20 09:07:03

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On 11/19/18 11:54 AM, Pavel Machek wrote:
> On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
>> State explicitly that holding a /proc/pid file descriptor open does
>> not reserve the PID. Also note that in the event of PID reuse, these
>> open file descriptors refer to the old, now-dead process, and not the
>> new one that happens to be named the same numeric PID.
>>
>> Signed-off-by: Daniel Colascione <[email protected]>
>> ---
>> Documentation/filesystems/proc.txt | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> Moved paragraphed to start of /proc/pid section; added signed-off-by.
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index 12a5e6e693b6..0b14460f721d 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
>> The link self points to the process reading the file system. Each process
>> subdirectory has the entries listed in Table 1-1.
>>
>> +Note that an open a file descriptor to /proc/<pid> or to any of its
>> +contained files or subdirectories does not prevent <pid> being reused
>> +for some other process in the event that <pid> exits. Operations on
>
> "does not" -> "may not"?
>
> We want to leave this unspecified, so that we can change it in future.

Why can't the documentation describe the current implementation, and
change in the future if the implementation changes? I doubt somebody
would ever rely on the pid being reused while having the descriptor
open. How would that make sense?
Pavel
>
>


2018-11-20 09:20:14

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue 2018-11-20 10:05:21, Vlastimil Babka wrote:
> On 11/19/18 11:54 AM, Pavel Machek wrote:
> > On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
> >> State explicitly that holding a /proc/pid file descriptor open does
> >> not reserve the PID. Also note that in the event of PID reuse, these
> >> open file descriptors refer to the old, now-dead process, and not the
> >> new one that happens to be named the same numeric PID.
> >>
> >> Signed-off-by: Daniel Colascione <[email protected]>
> >> ---
> >> Documentation/filesystems/proc.txt | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> >>
> >> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> >> index 12a5e6e693b6..0b14460f721d 100644
> >> --- a/Documentation/filesystems/proc.txt
> >> +++ b/Documentation/filesystems/proc.txt
> >> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> >> The link self points to the process reading the file system. Each process
> >> subdirectory has the entries listed in Table 1-1.
> >>
> >> +Note that an open a file descriptor to /proc/<pid> or to any of its
> >> +contained files or subdirectories does not prevent <pid> being reused
> >> +for some other process in the event that <pid> exits. Operations on
> >
> > "does not" -> "may not"?
> >
> > We want to leave this unspecified, so that we can change it in future.
>
> Why can't the documentation describe the current implementation, and
> change in the future if the implementation changes? I doubt somebody

Documentation should describe "contract" between kernel and userspace.

> would ever rely on the pid being reused while having the descriptor
> open. How would that make sense?

I agree this is corner space, but users might be surprised that
keeping FDs of /proc/pid/X would lead to PID space exhaustion, for
example.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (2.11 kB)
signature.asc (188.00 B)
Digital signature
Download all attachments

2018-11-20 09:43:43

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Mon 2018-11-19 08:24:02, Daniel Colascione wrote:
> On Mon, Nov 19, 2018 at 2:54 AM, Pavel Machek <[email protected]> wrote:
> > On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
> >> State explicitly that holding a /proc/pid file descriptor open does
> >> not reserve the PID. Also note that in the event of PID reuse, these
> >> open file descriptors refer to the old, now-dead process, and not the
> >> new one that happens to be named the same numeric PID.
> >>
> >> Signed-off-by: Daniel Colascione <[email protected]>
> >> ---
> >> Documentation/filesystems/proc.txt | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> >>
> >> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> >> index 12a5e6e693b6..0b14460f721d 100644
> >> --- a/Documentation/filesystems/proc.txt
> >> +++ b/Documentation/filesystems/proc.txt
> >> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> >> The link self points to the process reading the file system. Each process
> >> subdirectory has the entries listed in Table 1-1.
> >>
> >> +Note that an open a file descriptor to /proc/<pid> or to any of its
> >> +contained files or subdirectories does not prevent <pid> being reused
> >> +for some other process in the event that <pid> exits. Operations on
> >
> > "does not" -> "may not"?
> >
> > We want to leave this unspecified, so that we can change it in future.
>
> No. "Does not". I'm sick and tired of procfs behavior being vague and
> unspecified to the point where even kernel developers have an

You being sick and tired does not mean this is good idea.

> incorrect mental model of how it all works. With Christian Brauner's
> good work in place and hopefully my own work to follow, we're moving
> firmly in the direction of procfs handles being struct pid references.
> Instead of waffling further, let's just buy into this model and do the
> best job we can of making it work well.

So basically decision is being made now, without seeing Christian's
and your good work. That's not same thing as documentation...

Please don't do this.

NAK.
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (2.36 kB)
signature.asc (188.00 B)
Digital signature
Download all attachments

2018-11-20 16:39:14

by Joel Fernandes

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue, Nov 20, 2018 at 1:05 AM Vlastimil Babka <[email protected]> wrote:
>
> On 11/19/18 11:54 AM, Pavel Machek wrote:
> > On Mon 2018-11-05 13:22:05, Daniel Colascione wrote:
> >> State explicitly that holding a /proc/pid file descriptor open does
> >> not reserve the PID. Also note that in the event of PID reuse, these
> >> open file descriptors refer to the old, now-dead process, and not the
> >> new one that happens to be named the same numeric PID.
> >>
> >> Signed-off-by: Daniel Colascione <[email protected]>
> >> ---
> >> Documentation/filesystems/proc.txt | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> Moved paragraphed to start of /proc/pid section; added signed-off-by.
> >>
> >> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> >> index 12a5e6e693b6..0b14460f721d 100644
> >> --- a/Documentation/filesystems/proc.txt
> >> +++ b/Documentation/filesystems/proc.txt
> >> @@ -125,6 +125,13 @@ process running on the system, which is named after the process ID (PID).
> >> The link self points to the process reading the file system. Each process
> >> subdirectory has the entries listed in Table 1-1.
> >>
> >> +Note that an open a file descriptor to /proc/<pid> or to any of its
> >> +contained files or subdirectories does not prevent <pid> being reused
> >> +for some other process in the event that <pid> exits. Operations on
> >
> > "does not" -> "may not"?
> >
> > We want to leave this unspecified, so that we can change it in future.
>
> Why can't the documentation describe the current implementation, and
> change in the future if the implementation changes? I doubt somebody
> would ever rely on the pid being reused while having the descriptor
> open. How would that make sense?
>

Agreed. I am also of the opinion that this should be documented.

- Joel

2018-11-20 16:51:07

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue, 20 Nov 2018 10:05:21 +0100
Vlastimil Babka <[email protected]> wrote:

> Why can't the documentation describe the current implementation, and
> change in the future if the implementation changes? I doubt somebody
> would ever rely on the pid being reused while having the descriptor
> open. How would that make sense?

In the hopes of ending this discussion, I'm going to go ahead and apply
this. Documenting current behavior is good, especially in situations
where that behavior can surprise people; if the implementation changes,
the docs can change with it.

Thanks,

jon

2018-11-20 16:58:47

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue 2018-11-20 09:49:50, Jonathan Corbet wrote:
> On Tue, 20 Nov 2018 10:05:21 +0100
> Vlastimil Babka <[email protected]> wrote:
>
> > Why can't the documentation describe the current implementation, and
> > change in the future if the implementation changes? I doubt somebody
> > would ever rely on the pid being reused while having the descriptor
> > open. How would that make sense?
>
> In the hopes of ending this discussion, I'm going to go ahead and apply
> this. Documenting current behavior is good, especially in situations
> where that behavior can surprise people; if the implementation changes,
> the docs can change with it.

I'd still prefer changing from "does not" to "may not".

It is really simple change, and once we documented a behaviour, we
really should not be changing it.

Thanks,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Attachments:
(No filename) (985.00 B)
signature.asc (188.00 B)
Digital signature
Download all attachments

2018-11-20 19:43:20

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue, Nov 20, 2018 at 10:18:29AM +0100, Pavel Machek wrote:
> > would ever rely on the pid being reused while having the descriptor
> > open. How would that make sense?
>
> I agree this is corner space, but users might be surprised that
> keeping FDs of /proc/pid/X would lead to PID space exhaustion, for
> example.

We have a limit on the number of FDs a process can have open for a reason.
Well, for many reasons.


2018-11-20 20:07:04

by Daniel Colascione

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue, Nov 20, 2018 at 9:39 AM Matthew Wilcox <[email protected]> wrote:
>
> On Tue, Nov 20, 2018 at 10:18:29AM +0100, Pavel Machek wrote:
> > > would ever rely on the pid being reused while having the descriptor
> > > open. How would that make sense?
> >
> > I agree this is corner space, but users might be surprised that
> > keeping FDs of /proc/pid/X would lead to PID space exhaustion, for
> > example.
>
> We have a limit on the number of FDs a process can have open for a reason.
> Well, for many reasons.

And the typical limit is too low. (I've seen people clamp it to 1024
for some reason.) A file descriptor is just a handle to a kernel
resource. All kernel resources held on behalf of applications need
*some* kind of management interface. File descriptors provide a
consistent and uniform instance of such a management interface. Unless
there's a very good reason, nobody should be using non-FD handles for
kernel resource management. A low default FD table size limit is not
an example of one of these good reasons, not when we can raise FD
table size limit. In general, the software projects should not have to
put up with ugly workarounds for limitations they impose on
themselves.

2018-11-20 20:54:27

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v2] Document /proc/pid PID reuse behavior

On Tue, Nov 20, 2018 at 09:48:27AM -0800, Daniel Colascione wrote:
> On Tue, Nov 20, 2018 at 9:39 AM Matthew Wilcox <[email protected]> wrote:
> > We have a limit on the number of FDs a process can have open for a reason.
> > Well, for many reasons.
>
> And the typical limit is too low. (I've seen people clamp it to 1024
> for some reason.)

1024 is the soft limit. 4096 is the default hard limit. You can always
ask root to set your hard limit higher if that's what you need.

> A file descriptor is just a handle to a kernel
> resource. All kernel resources held on behalf of applications need
> *some* kind of management interface. File descriptors provide a
> consistent and uniform instance of such a management interface. Unless
> there's a very good reason, nobody should be using non-FD handles for
> kernel resource management. A low default FD table size limit is not
> an example of one of these good reasons, not when we can raise FD
> table size limit. In general, the software projects should not have to
> put up with ugly workarounds for limitations they impose on
> themselves.

I'm not really sure why you decided to go off on this rant. My point to
Pavel was that there's no way a single process can tie up all of the PIDs.
Unless root decided to let them shoot everybody else in the system in
the foot.