2013-03-05 06:41:24

by Eric W. Biederman

Subject: Re: For review: pid_namespaces(7) man page

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> Eric,
>
> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
> <[email protected]> wrote:
>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>
>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman
> <[email protected]> wrote:
>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>
>>>>> Hi Rob,
>>>>>
>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <[email protected]>
> wrote:
>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
> [...]
>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>> PID namespace for created children, the clone(2) calls neces‐
>>>>>>> sarily put the new thread in a different PID namespace from the
>>>>>>> calling thread.
>>>>>>
>>>>>>
>>>>>> Um, no they don't. They fail. That's the point.
>>>>>
>>>>> (Good catch.)
>>>>>
>>>>>> They _would_ put the new
>>>>>> thread in a different PID namespace, which breaks the definition
> of threads.
>>>>>>
>>>>>> How about:
>>>>>>
>>>>>> The above unshare(2) and setns(2) calls change the PID namespace
> of
>>>>>> children created by subsequent clone(2) calls, which is
> incompatible
>>>>>> with CLONE_VM.
>>>>>
>>>>> I decided on:
>>>>>
>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>> namespace for created children but not for the calling process,
>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>> in the same process.
>>>>
>>>> Can we make that "for all new tasks created" instead of "created
>>>> children"
>>>>
>>>> Othewise someone might expect CLONE_THREAD would work as you
>>>> CLONE_THREAD creates a thread and not a child...
>>>
>>> The term "task" is kernel-space talk that rarely appears in man
> pages,
>>> so I am reluctant to use it.
>>
>> With respect to clone and in this case I am not certain we can
> properly
>> describe what happens without talking about tasks. But it is worth
>> a try.
>>
>>
>>> How about this:
>>>
>>> The point here is that unshare(2) and setns(2) change the PID
>>> namespace for processes subsequently created by the caller, but
>>> not for the calling process, while clone(2) CLONE_VM specifies
>>> the creation of a new thread in the same process.
>>
>> Hmm. How about this.
>>
>> The point here is that unshare(2) and setns(2) change the PID
>> namespace that will be used by in all subsequent calls to clone
>> and fork by the caller, but not for the calling process, and
>> that all threads in a process must share the same PID
>> namespace. Which makes a subsequent clone(2) CLONE_VM
>> specify the creation of a new thread in the a different PID
>> namespace but in the same process which is impossible.
>
> I did a little tidying:
>
> The point here is that unshare(2) and setns(2) change the
> PID namespace that will be used in all subsequent calls
> to clone(2) and fork(2), but do not change the PID names‐
> pace of the calling process. Because a subsequent
> clone(2) CLONE_VM would imply the creation of a new
> thread in a different PID namespace, the operation is not
> permitted.
>
> Okay?

That seems reasonable.

CLONE_THREAD might be better to talk about. The check is CLONE_VM
because it is easier and CLONE_THREAD implies CLONE_VM.
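
As a rough illustration of the rule in the tidied paragraph quoted
above, here is a toy model in Python. It is not kernel code: the Task
class, its attribute names, and the string namespace labels are
hypothetical, and the use of EINVAL as the error is an assumption based
on how clone(2) reports invalid flag combinations. The model only
captures the rule as described in this thread: unshare/setns change the
namespace used for future children, and a later clone with CLONE_VM is
refused when that namespace differs from the caller's own.

```python
import errno

# Flag values as defined in <linux/sched.h>.
CLONE_VM = 0x00000100
CLONE_NEWPID = 0x20000000


class Task:
    """Toy model of a task (hypothetical, for illustration only)."""

    def __init__(self, active_pid_ns):
        # Namespace the task itself lives in; this never changes.
        self.active_pid_ns = active_pid_ns
        # Namespace that will be given to tasks this one creates.
        self.pid_ns_for_children = active_pid_ns

    def unshare_newpid(self, new_ns):
        # Models unshare(CLONE_NEWPID) or setns() on a PID namespace:
        # only the namespace for future children is changed.
        self.pid_ns_for_children = new_ns

    def clone(self, flags=0):
        # Sharing an address space across PID namespaces is refused.
        if (flags & CLONE_VM) and self.pid_ns_for_children != self.active_pid_ns:
            raise OSError(errno.EINVAL, "CLONE_VM across PID namespaces")
        return Task(self.pid_ns_for_children)


if __name__ == "__main__":
    parent = Task("initial-ns")
    parent.unshare_newpid("new-ns")
    child = parent.clone()            # plain fork-like clone is allowed
    print(parent.active_pid_ns, child.active_pid_ns)
    try:
        parent.clone(CLONE_VM)        # refused in this model
    except OSError as exc:
        print("clone(CLONE_VM) failed:", errno.errorcode[exc.errno])
```

In this model the caller stays in its original namespace, a fork-like
clone lands in the new one, and only the CLONE_VM case is rejected.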

> Having asked that, I realize that I'm still not quite comfortable with
> this text. I think the problem is really one of terminology. At the
> start of this passage in the page, there is the sentence:
>
> Every thread in a process must be in the
> same PID namespace.
>
> Can you define "thread" in this context?

Most definitely a thread group created with CLONE_THREAD. It is pretty
ugly in just the old fashioned CLONE_VM case too, but that might be
legal.

In a few cases I think the implementation overshoots and tests for VM
sharing instead of thread group membership, because VM sharing is easier
to test for and we already have tests for that.

Eric


Subject: Re: For review: pid_namespaces(7) man page

On Tue, Mar 5, 2013 at 7:41 AM, Eric W. Biederman <[email protected]> wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> Eric,
>>
>> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
>> <[email protected]> wrote:
>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>
>>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman
>> <[email protected]> wrote:
>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>
>>>>>> Hi Rob,
>>>>>>
>>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <[email protected]>
>> wrote:
>>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>> [...]
>>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>>> PID namespace for created children, the clone(2) calls neces‐
>>>>>>>> sarily put the new thread in a different PID namespace from the
>>>>>>>> calling thread.
>>>>>>>
>>>>>>>
>>>>>>> Um, no they don't. They fail. That's the point.
>>>>>>
>>>>>> (Good catch.)
>>>>>>
>>>>>>> They _would_ put the new
>>>>>>> thread in a different PID namespace, which breaks the definition
>> of threads.
>>>>>>>
>>>>>>> How about:
>>>>>>>
>>>>>>> The above unshare(2) and setns(2) calls change the PID namespace
>> of
>>>>>>> children created by subsequent clone(2) calls, which is
>> incompatible
>>>>>>> with CLONE_VM.
>>>>>>
>>>>>> I decided on:
>>>>>>
>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>> namespace for created children but not for the calling process,
>>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>>> in the same process.
>>>>>
>>>>> Can we make that "for all new tasks created" instead of "created
>>>>> children"
>>>>>
>>>>> Othewise someone might expect CLONE_THREAD would work as you
>>>>> CLONE_THREAD creates a thread and not a child...
>>>>
>>>> The term "task" is kernel-space talk that rarely appears in man
>> pages,
>>>> so I am reluctant to use it.
>>>
>>> With respect to clone and in this case I am not certain we can
>> properly
>>> describe what happens without talking about tasks. But it is worth
>>> a try.
>>>
>>>
>>>> How about this:
>>>>
>>>> The point here is that unshare(2) and setns(2) change the PID
>>>> namespace for processes subsequently created by the caller, but
>>>> not for the calling process, while clone(2) CLONE_VM specifies
>>>> the creation of a new thread in the same process.
>>>
>>> Hmm. How about this.
>>>
>>> The point here is that unshare(2) and setns(2) change the PID
>>> namespace that will be used by in all subsequent calls to clone
>>> and fork by the caller, but not for the calling process, and
>>> that all threads in a process must share the same PID
>>> namespace. Which makes a subsequent clone(2) CLONE_VM
>>> specify the creation of a new thread in the a different PID
>>> namespace but in the same process which is impossible.
>>
>> I did a little tidying:
>>
>> The point here is that unshare(2) and setns(2) change the
>> PID namespace that will be used in all subsequent calls
>> to clone(2) and fork(2), but do not change the PID names‐
>> pace of the calling process. Because a subsequent
>> clone(2) CLONE_VM would imply the creation of a new
>> thread in a different PID namespace, the operation is not
>> permitted.
>>
>> Okay?
>
> That seems reasonable.
>
> CLONE_THREAD might be better to talk about. The check is CLONE_VM
> because it is easier and CLONE_THREAD implies CLONE_VM.
>
>> Having asked that, I realize that I'm still not quite comfortable with
>> this text. I think the problem is really one of terminology. At the
>> start of this passage in the page, there is the sentence:
>>
>> Every thread in a process must be in the
>> same PID namespace.
>>
>> Can you define "thread" in this context?
>
> Most definitely a thread group created with CLONE_THREAD. It is pretty
> ugly in just the old fashioned CLONE_VM case too, but that might be
> legal.
>
> In a few cases I think the implementation overshoots and test for VM
> sharing instead of thread group membership because VM sharing is easier
> to test for, and we already have tests for that.

So, in summary, the point is that CLONE_VM is being used as a proxy
for CLONE_THREAD because the former is easier to test for, and
CLONE_THREAD requires CLONE_VM, right?

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

2013-03-06 00:41:07

by Eric W. Biederman

Subject: Re: For review: pid_namespaces(7) man page

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> On Tue, Mar 5, 2013 at 7:41 AM, Eric W. Biederman <[email protected]> wrote:
>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>
>>> Eric,
>>>
>>> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
>>> <[email protected]> wrote:
>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>
>>>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman
>>> <[email protected]> wrote:
>>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>>
>>>>>>> Hi Rob,
>>>>>>>
>>>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <[email protected]>
>>> wrote:
>>>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>>> [...]
>>>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>>>> PID namespace for created children, the clone(2) calls neces‐
>>>>>>>>> sarily put the new thread in a different PID namespace from the
>>>>>>>>> calling thread.
>>>>>>>>
>>>>>>>>
>>>>>>>> Um, no they don't. They fail. That's the point.
>>>>>>>
>>>>>>> (Good catch.)
>>>>>>>
>>>>>>>> They _would_ put the new
>>>>>>>> thread in a different PID namespace, which breaks the definition
>>> of threads.
>>>>>>>>
>>>>>>>> How about:
>>>>>>>>
>>>>>>>> The above unshare(2) and setns(2) calls change the PID namespace
>>> of
>>>>>>>> children created by subsequent clone(2) calls, which is
>>> incompatible
>>>>>>>> with CLONE_VM.
>>>>>>>
>>>>>>> I decided on:
>>>>>>>
>>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>>> namespace for created children but not for the calling process,
>>>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>>>> in the same process.
>>>>>>
>>>>>> Can we make that "for all new tasks created" instead of "created
>>>>>> children"
>>>>>>
>>>>>> Othewise someone might expect CLONE_THREAD would work as you
>>>>>> CLONE_THREAD creates a thread and not a child...
>>>>>
>>>>> The term "task" is kernel-space talk that rarely appears in man
>>> pages,
>>>>> so I am reluctant to use it.
>>>>
>>>> With respect to clone and in this case I am not certain we can
>>> properly
>>>> describe what happens without talking about tasks. But it is worth
>>>> a try.
>>>>
>>>>
>>>>> How about this:
>>>>>
>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>> namespace for processes subsequently created by the caller, but
>>>>> not for the calling process, while clone(2) CLONE_VM specifies
>>>>> the creation of a new thread in the same process.
>>>>
>>>> Hmm. How about this.
>>>>
>>>> The point here is that unshare(2) and setns(2) change the PID
>>>> namespace that will be used by in all subsequent calls to clone
>>>> and fork by the caller, but not for the calling process, and
>>>> that all threads in a process must share the same PID
>>>> namespace. Which makes a subsequent clone(2) CLONE_VM
>>>> specify the creation of a new thread in the a different PID
>>>> namespace but in the same process which is impossible.
>>>
>>> I did a little tidying:
>>>
>>> The point here is that unshare(2) and setns(2) change the
>>> PID namespace that will be used in all subsequent calls
>>> to clone(2) and fork(2), but do not change the PID names‐
>>> pace of the calling process. Because a subsequent
>>> clone(2) CLONE_VM would imply the creation of a new
>>> thread in a different PID namespace, the operation is not
>>> permitted.
>>>
>>> Okay?
>>
>> That seems reasonable.
>>
>> CLONE_THREAD might be better to talk about. The check is CLONE_VM
>> because it is easier and CLONE_THREAD implies CLONE_VM.
>>
>>> Having asked that, I realize that I'm still not quite comfortable with
>>> this text. I think the problem is really one of terminology. At the
>>> start of this passage in the page, there is the sentence:
>>>
>>> Every thread in a process must be in the
>>> same PID namespace.
>>>
>>> Can you define "thread" in this context?
>>
>> Most definitely a thread group created with CLONE_THREAD. It is pretty
>> ugly in just the old fashioned CLONE_VM case too, but that might be
>> legal.
>>
>> In a few cases I think the implementation overshoots and test for VM
>> sharing instead of thread group membership because VM sharing is easier
>> to test for, and we already have tests for that.
>
> So, in summary, the point is that CLONE_VM is being used as a proxy
> for CLONE_THREAD because the former is easier to test for, and
> CLONE_THREAD requires CLONE_VM, right?

I am totally lost about what problem we are trying to resolve in
the text at this point. So I am taking this opportunity to review
what is actually happening and hopefully give a clear and useful
explanation.

The clone flags have some dependencies.
CLONE_SIGHAND requires CLONE_VM.
CLONE_THREAD requires CLONE_SIGHAND.
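
The two dependencies above can be sketched as a small check. This is a
minimal Python sketch rather than the kernel's code; the helper name
flag_dependency_errors is hypothetical, while the flag values are the
ones from <linux/sched.h>.

```python
# Flag values as defined in <linux/sched.h>.
CLONE_VM = 0x00000100
CLONE_SIGHAND = 0x00000800
CLONE_THREAD = 0x00010000


def flag_dependency_errors(flags):
    """Return the dependencies that `flags` violates (hypothetical helper)."""
    errors = []
    if (flags & CLONE_THREAD) and not (flags & CLONE_SIGHAND):
        errors.append("CLONE_THREAD requires CLONE_SIGHAND")
    if (flags & CLONE_SIGHAND) and not (flags & CLONE_VM):
        errors.append("CLONE_SIGHAND requires CLONE_VM")
    return errors


if __name__ == "__main__":
    samples = (
        CLONE_VM,
        CLONE_SIGHAND,
        CLONE_THREAD | CLONE_SIGHAND | CLONE_VM,
    )
    for flags in samples:
        print(hex(flags), flag_dependency_errors(flags) or "ok")
```

Because the dependencies chain, a valid CLONE_THREAD request always
carries CLONE_SIGHAND and CLONE_VM with it.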

Ultimately there are cases in here that are too strange to think about,
and that no one cares about (except, so far, to document what is going
on). The fundamental goal of these checks is simply to disallow the
cases that are too strange to think about.

From a technical point of view CLONE_THREAD requires being in the same
PID namespace so that you can send signals to other threads in your
process and see all of the threads of your process in /proc.

From a technical point of view CLONE_SIGHAND requires being in the same
PID namespace because we need to know how to encode the PID of the
sending process at the time a signal is enqueued in the destination
queue. A signal queue shared by processes in multiple PID namespaces
would defeat that.

From a technical point of view CLONE_VM requires all of the threads to
be in the same PID namespace, because from the point of view of the
coredump code, if two processes share the same address space they are
threads and will be core dumped together. When a coredump is written
the PID of each thread is written into the coredump. Writing the PIDs
could not meaningfully succeed if some of the PIDs were in a parent PID
namespace.

Therefore there is a technical requirement that tasks created with any
of CLONE_THREAD, CLONE_SIGHAND, or CLONE_VM share a PID namespace with
their creator.

In the kernel code, testing only for CLONE_VM is shorthand for
testing for any of CLONE_THREAD, CLONE_SIGHAND, or CLONE_VM.
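
The shorthand claim can be checked by brute force over the three flags.
The following Python sketch (helper names are hypothetical; flag values
are from <linux/sched.h>) enumerates every combination that satisfies
the dependencies listed earlier and confirms that, for each of them,
"CLONE_VM is set" is equivalent to "any of the three flags is set".

```python
from itertools import product

# Flag values as defined in <linux/sched.h>.
CLONE_VM = 0x00000100
CLONE_SIGHAND = 0x00000800
CLONE_THREAD = 0x00010000
SHARING_FLAGS = (CLONE_VM, CLONE_SIGHAND, CLONE_THREAD)


def is_valid(flags):
    """True if the dependencies between the three flags are satisfied."""
    if (flags & CLONE_THREAD) and not (flags & CLONE_SIGHAND):
        return False
    if (flags & CLONE_SIGHAND) and not (flags & CLONE_VM):
        return False
    return True


def valid_combinations():
    """All combinations of the three flags that pass is_valid()."""
    combos = []
    for bits in product((False, True), repeat=len(SHARING_FLAGS)):
        flags = 0
        for flag, enabled in zip(SHARING_FLAGS, bits):
            if enabled:
                flags |= flag
        if is_valid(flags):
            combos.append(flags)
    return combos


def shorthand_holds():
    """Check that testing CLONE_VM alone matches testing any of the three."""
    any_mask = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD
    return all(
        bool(flags & CLONE_VM) == bool(flags & any_mask)
        for flags in valid_combinations()
    )


if __name__ == "__main__":
    print([hex(f) for f in valid_combinations()])
    print("shorthand holds:", shorthand_holds())
```

The equivalence relies on invalid combinations having been rejected
first; for a combination such as CLONE_SIGHAND without CLONE_VM the two
tests would disagree.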



On the flip side, the fact that unshare(CLONE_NEWPID) implies
unshare(CLONE_THREAD) actually appears to be bogus, because we do not
change the PID namespace of the process calling unshare (only that of
its children), and we already allow that case with setns. I need to
think about that case a little more, but I am going to queue up a patch
for 3.10 to make unshare(CLONE_NEWPID) and setns(CLONE_NEWPID)
consistent, probably by removing the check in unshare(CLONE_NEWPID).

I need to think a bit about what happens from the threaded parent's
perspective when different threads can create children in different PID
namespaces. Does it introduce weird, hard-to-support cases into the
code? Or will it just work without requiring anything special, so that
I can allow it?

Eric

Subject: Re: For review: pid_namespaces(7) man page

On Wed, Mar 6, 2013 at 1:40 AM, Eric W. Biederman <[email protected]> wrote:
> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>
>> On Tue, Mar 5, 2013 at 7:41 AM, Eric W. Biederman <[email protected]> wrote:
>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>
>>>> Eric,
>>>>
>>>> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
>>>> <[email protected]> wrote:
>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>
>>>>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman
>>>> <[email protected]> wrote:
>>>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>>>
>>>>>>>> Hi Rob,
>>>>>>>>
>>>>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <[email protected]>
>>>> wrote:
>>>>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>>>> [...]
>>>>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>>>>> PID namespace for created children, the clone(2) calls neces‐
>>>>>>>>>> sarily put the new thread in a different PID namespace from the
>>>>>>>>>> calling thread.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Um, no they don't. They fail. That's the point.
>>>>>>>>
>>>>>>>> (Good catch.)
>>>>>>>>
>>>>>>>>> They _would_ put the new
>>>>>>>>> thread in a different PID namespace, which breaks the definition
>>>> of threads.
>>>>>>>>>
>>>>>>>>> How about:
>>>>>>>>>
>>>>>>>>> The above unshare(2) and setns(2) calls change the PID namespace
>>>> of
>>>>>>>>> children created by subsequent clone(2) calls, which is
>>>> incompatible
>>>>>>>>> with CLONE_VM.
>>>>>>>>
>>>>>>>> I decided on:
>>>>>>>>
>>>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>>>> namespace for created children but not for the calling process,
>>>>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>>>>> in the same process.
>>>>>>>
>>>>>>> Can we make that "for all new tasks created" instead of "created
>>>>>>> children"
>>>>>>>
>>>>>>> Othewise someone might expect CLONE_THREAD would work as you
>>>>>>> CLONE_THREAD creates a thread and not a child...
>>>>>>
>>>>>> The term "task" is kernel-space talk that rarely appears in man
>>>> pages,
>>>>>> so I am reluctant to use it.
>>>>>
>>>>> With respect to clone and in this case I am not certain we can
>>>> properly
>>>>> describe what happens without talking about tasks. But it is worth
>>>>> a try.
>>>>>
>>>>>
>>>>>> How about this:
>>>>>>
>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>> namespace for processes subsequently created by the caller, but
>>>>>> not for the calling process, while clone(2) CLONE_VM specifies
>>>>>> the creation of a new thread in the same process.
>>>>>
>>>>> Hmm. How about this.
>>>>>
>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>> namespace that will be used by in all subsequent calls to clone
>>>>> and fork by the caller, but not for the calling process, and
>>>>> that all threads in a process must share the same PID
>>>>> namespace. Which makes a subsequent clone(2) CLONE_VM
>>>>> specify the creation of a new thread in the a different PID
>>>>> namespace but in the same process which is impossible.
>>>>
>>>> I did a little tidying:
>>>>
>>>> The point here is that unshare(2) and setns(2) change the
>>>> PID namespace that will be used in all subsequent calls
>>>> to clone(2) and fork(2), but do not change the PID names‐
>>>> pace of the calling process. Because a subsequent
>>>> clone(2) CLONE_VM would imply the creation of a new
>>>> thread in a different PID namespace, the operation is not
>>>> permitted.
>>>>
>>>> Okay?
>>>
>>> That seems reasonable.
>>>
>>> CLONE_THREAD might be better to talk about. The check is CLONE_VM
>>> because it is easier and CLONE_THREAD implies CLONE_VM.
>>>
>>>> Having asked that, I realize that I'm still not quite comfortable with
>>>> this text. I think the problem is really one of terminology. At the
>>>> start of this passage in the page, there is the sentence:
>>>>
>>>> Every thread in a process must be in the
>>>> same PID namespace.
>>>>
>>>> Can you define "thread" in this context?
>>>
>>> Most definitely a thread group created with CLONE_THREAD. It is pretty
>>> ugly in just the old fashioned CLONE_VM case too, but that might be
>>> legal.
>>>
>>> In a few cases I think the implementation overshoots and test for VM
>>> sharing instead of thread group membership because VM sharing is easier
>>> to test for, and we already have tests for that.
>>
>> So, in summary, the point is that CLONE_VM is being used as a proxy
>> for CLONE_THREAD because the former is easier to test for, and
>> CLONE_THREAD requires CLONE_VM, right?
>
> I am totally lost about what we are problem we are trying to resolve in
> the text at this point. So I am taking this opportunity to review
> what is actually happening and hopefully give a clear and useful
> explanation.

The problem is that the existing text talks about multithreaded
processes needing to be in the same PID namespace and then jumps to
talking about restrictions with CLONE_VM (not CLONE_THREAD). The
reader may not realize that CLONE_VM is a near synonym for
"multithreaded process".

However, the text you provide here is wonderful detail:

> The clone flags have some dependencies.
> CLONE_SIGHAND requires CLONE_VM.
> CLONE_THREAD requires CLONE_SIGHAND.
>
> Ultimately there are cases in here that are too strange to think about,
> and that no one cares (except so far to document what is going on). The
> fundamental goal of these checks it to just not allow the cases that
> are too strange to think about.
>
> From a technical point of view CLONE_THREAD requires being in the same
> PID namespace so you can send signals to other threads in your process,
> and you need to see in proc all of the threads of your process.
>
> From a technical point of view CLONE_SIGHAND requries being in the same
> PID namespace because we need to know how to encode the PID of the
> sending process at the time a signal is enqueued in the destination
> queue. A signal queue shared by processes in multiple PID namespaces
> will defeat that.
>
> From a technical point of view CLONE_VM requires all of the threads to
> be in a PID namespace, because from the point of view of coredump code
> if two processes share the same address space they are threads and will
> be core dumped together. When a coredump is written the pid of each
> thread is written into the coredump. Writing the pids could not
> meaningfully succeed if some of the pids were in a parent PID namespace.
>
> Therefore there is a technical requirement for each of CLONE_THREAD,
> CLONE_SIGHAND, CLONE_VM to share a PID namespace.
>
> In the code in the kernel testing only for CLONE_VM is a shorthand for
> testing for any of CLONE_THREAD, CLONE_SIGHAND, or CLONE_VM.

I will incorporate most of the above into the page.

> On the flip side the addition by unshare(CLONE_NEWPID) of
> unshare(CLONE_THREAD) actually appears to be bogus

I agree that it seems strange.

Cheers,

Michael

> because we do not
> change the pid namespace of the process calling unshare (only it's
> children), and we already allow that case with setns. I need to think
> about that case a little more but I am going to queue up a patch for
> 3.10 to make unshare(CLONE_NEWPID) and setns(CLONE_NEWPID) consistent.
> Probably by removing the check in unshare(CLONE_NEWPID).
>
> I need to think about a bit about what happens from the threaded parents
> perspective when different threads can create children in different PID
> namespaces. Does it introduce weird hard to support cases into the code?
> Or will it just work without requiring anything special and I can allow
> it.
>
> Eric



--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

2013-03-07 08:31:40

by Eric W. Biederman

Subject: Re: For review: pid_namespaces(7) man page

"Michael Kerrisk (man-pages)" <[email protected]> writes:

> On Wed, Mar 6, 2013 at 1:40 AM, Eric W. Biederman <[email protected]> wrote:
>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>
>>> On Tue, Mar 5, 2013 at 7:41 AM, Eric W. Biederman <[email protected]> wrote:
>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>
>>>>> Eric,
>>>>>
>>>>> On Mon, Mar 4, 2013 at 6:52 PM, Eric W. Biederman
>>>>> <[email protected]> wrote:
>>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>>
>>>>>>> On Fri, Mar 1, 2013 at 4:35 PM, Eric W. Biederman
>>>>> <[email protected]> wrote:
>>>>>>>> "Michael Kerrisk (man-pages)" <[email protected]> writes:
>>>>>>>>
>>>>>>>>> Hi Rob,
>>>>>>>>>
>>>>>>>>> On Fri, Mar 1, 2013 at 5:01 AM, Rob Landley <[email protected]>
>>>>> wrote:
>>>>>>>>>> On 02/28/2013 05:24:07 AM, Michael Kerrisk (man-pages) wrote:
>>>>> [...]
>>>>>>>>>>> Because the above unshare(2) and setns(2) calls only change the
>>>>>>>>>>> PID namespace for created children, the clone(2) calls neces‐
>>>>>>>>>>> sarily put the new thread in a different PID namespace from the
>>>>>>>>>>> calling thread.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Um, no they don't. They fail. That's the point.
>>>>>>>>>
>>>>>>>>> (Good catch.)
>>>>>>>>>
>>>>>>>>>> They _would_ put the new
>>>>>>>>>> thread in a different PID namespace, which breaks the definition
>>>>> of threads.
>>>>>>>>>>
>>>>>>>>>> How about:
>>>>>>>>>>
>>>>>>>>>> The above unshare(2) and setns(2) calls change the PID namespace
>>>>> of
>>>>>>>>>> children created by subsequent clone(2) calls, which is
>>>>> incompatible
>>>>>>>>>> with CLONE_VM.
>>>>>>>>>
>>>>>>>>> I decided on:
>>>>>>>>>
>>>>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>>>>> namespace for created children but not for the calling process,
>>>>>>>>> while clone(2) CLONE_VM specifies the creation of a new thread
>>>>>>>>> in the same process.
>>>>>>>>
>>>>>>>> Can we make that "for all new tasks created" instead of "created
>>>>>>>> children"
>>>>>>>>
>>>>>>>> Othewise someone might expect CLONE_THREAD would work as you
>>>>>>>> CLONE_THREAD creates a thread and not a child...
>>>>>>>
>>>>>>> The term "task" is kernel-space talk that rarely appears in man
>>>>> pages,
>>>>>>> so I am reluctant to use it.
>>>>>>
>>>>>> With respect to clone and in this case I am not certain we can
>>>>> properly
>>>>>> describe what happens without talking about tasks. But it is worth
>>>>>> a try.
>>>>>>
>>>>>>
>>>>>>> How about this:
>>>>>>>
>>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>>> namespace for processes subsequently created by the caller, but
>>>>>>> not for the calling process, while clone(2) CLONE_VM specifies
>>>>>>> the creation of a new thread in the same process.
>>>>>>
>>>>>> Hmm. How about this.
>>>>>>
>>>>>> The point here is that unshare(2) and setns(2) change the PID
>>>>>> namespace that will be used by in all subsequent calls to clone
>>>>>> and fork by the caller, but not for the calling process, and
>>>>>> that all threads in a process must share the same PID
>>>>>> namespace. Which makes a subsequent clone(2) CLONE_VM
>>>>>> specify the creation of a new thread in the a different PID
>>>>>> namespace but in the same process which is impossible.
>>>>>
>>>>> I did a little tidying:
>>>>>
>>>>> The point here is that unshare(2) and setns(2) change the
>>>>> PID namespace that will be used in all subsequent calls
>>>>> to clone(2) and fork(2), but do not change the PID names‐
>>>>> pace of the calling process. Because a subsequent
>>>>> clone(2) CLONE_VM would imply the creation of a new
>>>>> thread in a different PID namespace, the operation is not
>>>>> permitted.
>>>>>
>>>>> Okay?
>>>>
>>>> That seems reasonable.
>>>>
>>>> CLONE_THREAD might be better to talk about. The check is CLONE_VM
>>>> because it is easier and CLONE_THREAD implies CLONE_VM.
>>>>
>>>>> Having asked that, I realize that I'm still not quite comfortable with
>>>>> this text. I think the problem is really one of terminology. At the
>>>>> start of this passage in the page, there is the sentence:
>>>>>
>>>>> Every thread in a process must be in the
>>>>> same PID namespace.
>>>>>
>>>>> Can you define "thread" in this context?
>>>>
>>>> Most definitely a thread group created with CLONE_THREAD. It is pretty
>>>> ugly in just the old fashioned CLONE_VM case too, but that might be
>>>> legal.
>>>>
>>>> In a few cases I think the implementation overshoots and test for VM
>>>> sharing instead of thread group membership because VM sharing is easier
>>>> to test for, and we already have tests for that.
>>>
>>> So, in summary, the point is that CLONE_VM is being used as a proxy
>>> for CLONE_THREAD because the former is easier to test for, and
>>> CLONE_THREAD requires CLONE_VM, right?
>>
>> I am totally lost about what we are problem we are trying to resolve in
>> the text at this point. So I am taking this opportunity to review
>> what is actually happening and hopefully give a clear and useful
>> explanation.
>
> The problem is that the existing text talks about multithreaded
> processes needing to be in the same PID namespace and then jumps to
> talking about restrictions with CLONE_VM (not CLONE_THREAD). The
> reader may not realize know that CLONE_VM is a near synonym for
> "multithreaded process".
>
> However, the text you provide here is wonderful detail:
>
>> The clone flags have some dependencies.
>> CLONE_SIGHAND requires CLONE_VM.
>> CLONE_THREAD requires CLONE_SIGHAND.
>>
>> Ultimately there are cases in here that are too strange to think about,
>> and that no one cares (except so far to document what is going on). The
>> fundamental goal of these checks it to just not allow the cases that
>> are too strange to think about.
>>
>> From a technical point of view CLONE_THREAD requires being in the same
>> PID namespace so you can send signals to other threads in your process,
>> and you need to see in proc all of the threads of your process.
>>
>> From a technical point of view CLONE_SIGHAND requries being in the same
>> PID namespace because we need to know how to encode the PID of the
>> sending process at the time a signal is enqueued in the destination
>> queue. A signal queue shared by processes in multiple PID namespaces
>> will defeat that.
>>
>> From a technical point of view CLONE_VM requires all of the threads to
>> be in a PID namespace, because from the point of view of coredump code
>> if two processes share the same address space they are threads and will
>> be core dumped together. When a coredump is written the pid of each
>> thread is written into the coredump. Writing the pids could not
>> meaningfully succeed if some of the pids were in a parent PID namespace.
>>
>> Therefore there is a technical requirement for each of CLONE_THREAD,
>> CLONE_SIGHAND, CLONE_VM to share a PID namespace.
>>
>> In the code in the kernel testing only for CLONE_VM is a shorthand for
>> testing for any of CLONE_THREAD, CLONE_SIGHAND, or CLONE_VM.
>
> I will incorporate most of the above into the page.
>
>> On the flip side the addition by unshare(CLONE_NEWPID) of
>> unshare(CLONE_THREAD) actually appears to be bogus
>
> I agree that it seems strange.

Having looked at it a little more I will be removing the unnecessary
CLONE_THREAD check in 3.10.

Eric