2014-07-03 12:18:39

by Chen Hanxiao

[permalink] [raw]
Subject: [RFC]Pid conversion between pid namespace

Hi,

We have had some discussions on how to carry out
pid conversion between pid namespaces, via either a
syscall[1] or procfs[2].

Pavel suggested a syscall that translates
(ID, NS1, NS2) into (ID).

Serge suggested a syscall:
pid_t getnspid(pid_t query_pid, pid_t observer_pid).

Eric and Richard suggested that a procfs solution is
more appropriate.

Oleg suggested that we should expand /proc/pid/status
to report this kind of information.

And Richard suggested adding a directory like
/proc/<pidX>/ns/proc/ which would contain everything
from /proc/<pidX inside the namespace>/.

As procfs provides a more user-friendly interface,
how about exposing all sets of tgid, pid, pgid and sid
by expanding /proc/PID/status in procfs?
We could also expose the ns hierarchy under /proc,
which could serve as another reference.

Ex:
     init_pid_ns     ns1          ns2
t1   2
t2    `- 3           1
t3        `- 4        `- 5        1

From /proc/t3/status we could get:
NSpid: 4 5 1
This tells us that pid 1 in the container is pid 4 in the init ns.
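
As a rough illustration of how a tool might consume such a line, here is a minimal Python sketch. It assumes the NSpid format proposed above (one pid per namespace level, outermost first); the function name parse_nspid and the sample text are hypothetical and are not read from a real /proc.

```python
def parse_nspid(status_text):
    """Return the pids from the 'NSpid:' line, outermost namespace first."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # Everything after the colon is a whitespace-separated pid list.
            return [int(field) for field in line.split(":", 1)[1].split()]
    return None  # no NSpid line present


# Sample text mirroring the t3 example above (an assumption, not real output).
sample = "Name:\tinit\nPid:\t4\nNSpid:\t4 5 1\n"
pids = parse_nspid(sample)
print("pid in init ns:", pids[0], "pid in innermost ns:", pids[-1])
```

The first entry is the pid as seen from the reader's namespace and the last entry is the pid inside the innermost container.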

And we could get the ns hierarchy under /proc/ns_hierarchy like:
init_ns->ns1->ns2 (as the result of readlink)
       ->ns3
This tells us that t3 is in ns2, and shows its hierarchy.

How do these ideas look?
Any comments would be appreciated.

Thanks,
- Chen


[1] syscall
http://lwn.net/Articles/602987/

[2] procfs
http://www.spinics.net/lists/kernel/msg1751688.html



2014-07-04 05:44:20

by Yasunori Goto

[permalink] [raw]
Subject: Re: [RFC]Pid conversion between pid namespace

Chen-san,

I would recommend that you summarize the pros/cons of all the ideas so far.

For example,

---------
A) make a new system call for translation

A-1) systemcall(ID, NS1, NS2) into (ID).
pros:
- foo
- baa

cons:
- hoge
- hogehogehoge

A-2) pid_t getnspid(pid_t query_pid, pid_t observer_pid)
(ditto)


B) make/change proc file/directories
B-1) expand /proc/pid/status
(ditto)

B-2) /proc/<pidX>/ns/proc/ which would contain everything
from /proc/<pidX inside the namespace>/.
(ditto)


------

Please use the above to make clear the good/bad points of each proposal:
- Is it hard to keep compatibility?
- Is it hard for administrators/programmers to understand?
- Is it difficult to show for "nested containers"?
- Is a userland tool necessary?
- Any other problems?

I hope the above leads to a good discussion.

Thanks,

--
Yasunori Goto <[email protected]>

2014-07-09 10:34:05

by Chen Hanxiao

[permalink] [raw]
Subject: RE: [RFC]Pid conversion between pid namespace

Hi,

Let me summarize our discussion of ID conversion by pros/cons:

A) make a new system call for translation
A-1) systemcall(ID, NS1, NS2) into (ID).
pros:
- has a reference ns (NS2)
We could get any lower-level ID directly.

cons:
- lacks hierarchy information.
CRIU needs hierarchy info for checkpoint/restore in nested containers.
- not easy to debug.
And a lot of tools/libs would need to be modified.

A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
pros:
- needs no ns procfs, easy to use.
We could do without a mounted ns procfs.

cons:
- may find multiple results in nested ns.
We want the new API to give us one exact answer.
If getnspid returned more than one result, it would cause trouble for admins,
who would have to make another decision.
Alternatively, we could require translation at the deepest level as a prerequisite.

- based on the current pidns, no reference ns.

B) make/change proc files/directories
B-1) expand /proc/pid/status
pros:
- easy to use and to debug
- the interface already exists in the kernel

cons:
- based on the current ns
for a middle level, we would have to make another decision.
- does not have hierarchy info.

B-2) /proc/<pidX>/ns/proc/ which would contain everything
pros:
- has enough info from /proc in the container

cons:
- Requirements unclear.
We need more discussion to decide which items should not be exposed.
- does not have hierarchy info.


How about doing this in two steps:

C) 1. expose all sets of pid, pgid, sid and tgid
via an expanded /proc/PID/status
We could get translated IDs from the container like:
NStgid: 16465 5 1
NSpid: 16465 5 1
NSpgid: 16465 5 1
NSsid: 16423 1 0
(a set of IDs with 3 levels of ns)

2. add hierarchy info under /proc
We lack a method of getting hierarchy info, which would be useful
for learning the relationships between namespaces.
How about adding a new proc file directly under /proc
to show the hierarchy in the style of readlink output:
pid:[4026531836]-> [4026532390] -> [4026532484]
pid:[4026531836]-> [4026532491]
(a 3-level pid ns and a 2-level pid ns)
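
To make step 2 more concrete, here is a minimal Python sketch of how a userland tool might turn such lines into a parent map. It assumes the line format shown above; the proc file itself does not exist yet, and the names parse_hierarchy and ancestors are hypothetical.

```python
import re


def parse_hierarchy(text):
    """Map each namespace inode number to its parent's inode number."""
    parent = {}
    for line in text.splitlines():
        # Each bracketed number is one namespace, outermost first.
        inums = [int(n) for n in re.findall(r"\[(\d+)\]", line)]
        for outer, inner in zip(inums, inums[1:]):
            parent[inner] = outer
    return parent


def ancestors(parent, inum):
    """List the namespaces enclosing inum, nearest parent first."""
    chain = []
    while inum in parent:
        inum = parent[inum]
        chain.append(inum)
    return chain


# The two sample lines from the proposal above.
sample = (
    "pid:[4026531836]-> [4026532390] -> [4026532484]\n"
    "pid:[4026531836]-> [4026532491]\n"
)
tree = parse_hierarchy(sample)
print(ancestors(tree, 4026532484))
```

With a parent map like this, a checkpoint/restore tool could walk from any namespace up to the initial one without extra syscalls.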

Any comments would be appreciated.

Thanks,
- Chen


2014-07-15 04:17:12

by Serge Hallyn

[permalink] [raw]
Subject: Re: [RFC]Pid conversion between pid namespace

Quoting [email protected] ([email protected]):
> Hi,
>
> Let me summarize our discussions of ID conversion by pros/cons:
>
> A) make new system call for translation
> A-1) systemcall(ID, NS1, NS2) into (ID).
> pros:
> - has a reference ns(NS2)
> We could get any lower level ID directly.
>
> cons:
> - lack of hierarchy information.
> CRIU need hierarchy info for checkpoint/restore in nested containers.
> - not easy for debug.
> And a lot of tools/libs need be modified.
>
> A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> pros:
> - ns procfs free, easy to use.
> We could get rid of mounted ns procfs.
>
> cons:
> - may find multiple results in nested ns.
> We wished the new API could tell us the exact answer.
> But if getnspid return more than one results will bring trouble to admins,

(See below for more, but) the question being posed to getnspid has precisely
one answer.

> they had to make another decision.
> Or we marked the deepest level for translation as prerequisite.
>
> -based on current pidns, no reference ns.

Hm, no. The intent here was that

observer_pid would be in current ns
query_pid would be in observer_pid's ns.

So this would be ideal for "I got a pid in a logfile created by rsyslog in
a nested container, what is the logged pid in my pidns?"

Taking a set of tasks (like a container with nesting) and building a tree
of all pids shouldn't be too difficult either. Start with the init pid,
call getnspid($pid, $init_pid) for every $pid in the container; to figure
out whether any $pid is itself a nested init_pid, we can compare the
/proc/$$/ns/pid, as well as look at getnspid($pid, $pid).


2014-07-21 10:47:54

by Chen Hanxiao

[permalink] [raw]
Subject: RE: [RFC]Pid conversion between pid namespace

Hi,

> -----Original Message-----
> From: Serge Hallyn [mailto:[email protected]]
> Sent: Tuesday, July 15, 2014 12:16 PM
> To: Chen, Hanxiao/陈 晗霄
> Subject: Re: [RFC]Pid conversion between pid namespace
> > A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > pros:
> > - ns procfs free, easy to use.
> > We could get rid of mounted ns procfs.
> >
> > cons:
> > - may find multiple results in nested ns.
> > We wished the new API could tell us the exact answer.
> > But if getnspid return more than one results will bring trouble to admins,
>
> (See below for more, but) the question being posed to getnspid has precisely
> one answer.
>
> > they had to make another decision.
> > Or we marked the deepest level for translation as prerequisite.
> >
> > -based on current pidns, no reference ns.
>
> Hm, no. The intent here was that
>
> observer_pid would be in current ns
> query_pid would be in observer_pid's ns.
>
> So this would be ideal for "I got a pid in a logfile created by rsyslog in
> a nested contaner, what is the logged pid in my pidns."
>
> Taking a set of tasks (like a container with nesting) and bulding a tree
> of all pids shouldn't be too difficult either. Start with the init pid,
> call getnspid($pid, $init_pid) for every $pid in the container; to figure
> out whether any $pid is itself a nested init_pid, we can compare the
> /proc/$$/ns/pid, as well as look at getnspid($pid, $pid).
I'm a little confused by this section:

Ex:
     init_pid_ns     ns1          ns2
t1   2
t2    `- 3           1
t3        `- 4        `- 5        1
t4            `-6         `-8      `-9
t5            `-10        `-9      `-10

For getnspid($pid, $init_pid),
does init_pid mean the container's init pid, such as 3 for t2?

In nested containers, does this syscall work as:
getnspid(9, 4) -> (6, 8, 9)
with 9 in ns2, and 4 being t3 in init_pid_ns (the current ns)?

And:
getnspid($pid, $pid)
If the pid on the host and the pid in the container happen to be the same,
as in getnspid(10, 10) for t5, it may not work.

Thanks,
- Chen

2014-07-25 10:01:51

by Chen Hanxiao

[permalink] [raw]
Subject: RE: [RFC]Pid conversion between pid namespace

Hi,

We discussed two ways of doing pid conversion:
syscall and procfs.

Both of them can do the pid translation job.
But for the ns hierarchy, a syscall like:

pid_t* getnspid(pid_t query_pid, pid_t observer_pid)
or
pid_t getnspid(pid_t query_pid, int query_fd, int ref_fd)

would not work: we would know which ns a pid lives in, but
not how the namespaces relate to each other.
For getting the entire set of pids, either approach works.

So using procfs is the better way.

Ex:
     init_pid_ns     ns1          ns2
t1   2
t2    `- 3           1
t3        `- 4        `- 5        1
t4            `-6         `-8      `-9
t5            `-10        `-9      `-10

1. How procfs would work:
a) add an nspid hierarchy under /proc/ like:
[root@localhost proc]# tree /proc/nspid
/proc/nspid
├── ns0
│   └── ns1
│       ├── ns2
│       │   └── pid -> /proc/9/ns
│       └── pid -> /proc/4/ns
└── pid -> /proc/1/ns

We create dirs and add a link to the 1st process of each ns.

b) expose all sets of pid, pgid, sid and tgid
via an expanded /proc/PID/status
We could get translated IDs from the container like:
NStgid: 6 8 9
NSpid: 6 8 9
NSpgid: 6 8 9
NSsid: 6 1 0
(a set of IDs with 3 levels of ns)

2. Advantages of the procfs solution
a) easy to use:
getnspid(6, 10) -> (10, 9, 10)
or
getnspid(10, ns1_fd, ns0_fd) -> 9
getnspid(10, ns2_fd, ns0_fd) -> 10

And we could also get it by:
cat /proc/10/status | grep NSpid:
NSpid: 10 9 10
...

b) hierarchy info:
We cannot get the ns hierarchy info with just one syscall.
If we had to, it would complicate the interface.

We could check whether two processes are related
via procfs:
readlink /proc/PID1/ns/pid -> aaa
readlink /proc/PID2/ns/pid -> bbb

Then we could check /proc/nspid/nsX/nsY/nsZ
and find out their relationship.
Ex:
We know t4 lives in ns2,
readlink /proc/t4/ns/pid -> AAA
then we look in /proc/nspid/ and find the same inum AAA under
/proc/nspid/ns0/ns1/ns2
Then we know that t4 has pid 9 in ns2 and pid 8 in ns1.
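
The last step of that example can be sketched in a few lines of Python. This is only an illustration under the assumptions of this proposal: the /proc/nspid directory layout and the NSpid line are the formats suggested above, and the helper name pids_by_namespace is hypothetical.

```python
def pids_by_namespace(ns_dir, nspid_line, root="/proc/nspid"):
    """Pair each level of the namespace path with the matching NSpid entry."""
    if not ns_dir.startswith(root + "/"):
        raise ValueError("namespace directory is not under " + root)
    # "/proc/nspid/ns0/ns1/ns2" -> ["ns0", "ns1", "ns2"]
    levels = ns_dir[len(root):].strip("/").split("/")
    pids = [int(field) for field in nspid_line.split(":", 1)[1].split()]
    if len(levels) != len(pids):
        raise ValueError("path depth and NSpid length differ")
    return dict(zip(levels, pids))


# t4 from the example: it lives in ns2 and its status shows "NSpid: 6 8 9".
mapping = pids_by_namespace("/proc/nspid/ns0/ns1/ns2", "NSpid: 6 8 9")
print(mapping)
```

The directory path supplies the names of the nesting levels and the NSpid line supplies the pid at each level, so the two pieces together answer "which pid in which ns".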

Any comments would be warmly welcomed!

Thanks,
- Chen


2014-07-25 17:34:57

by Serge Hallyn

[permalink] [raw]
Subject: Re: [RFC]Pid conversion between pid namespace

Quoting [email protected] ([email protected]):
> Hi,
>
> > -----Original Message-----
> > From: Serge Hallyn [mailto:[email protected]]
> > Sent: Tuesday, July 15, 2014 12:16 PM
> > To: Chen, Hanxiao/陈 晗霄
> > Subject: Re: [RFC]Pid conversion between pid namespace
> > > A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > > pros:
> > > - ns procfs free, easy to use.
> > > We could get rid of mounted ns procfs.
> > >
> > > cons:
> > > - may find multiple results in nested ns.
> > > We wished the new API could tell us the exact answer.
> > > But if getnspid return more than one results will bring trouble to admins,
> >
> > (See below for more, but) the question being posed to getnspid has precisely
> > one answer.
> >
> > > they had to make another decision.
> > > Or we marked the deepest level for translation as prerequisite.
> > >
> > > -based on current pidns, no reference ns.
> >
> > Hm, no. The intent here was that
> >
> > observer_pid would be in current ns
> > query_pid would be in observer_pid's ns.
> >
> > So this would be ideal for "I got a pid in a logfile created by rsyslog in
> > a nested contaner, what is the logged pid in my pidns."
> >
> > Taking a set of tasks (like a container with nesting) and bulding a tree
> > of all pids shouldn't be too difficult either. Start with the init pid,
> > call getnspid($pid, $init_pid) for every $pid in the container; to figure
> > out whether any $pid is itself a nested init_pid, we can compare the
> > /proc/$$/ns/pid, as well as look at getnspid($pid, $pid).
> I'm a little confused in this section:
>
> Ex:
>      init_pid_ns     ns1          ns2
> t1   2
> t2    `- 3           1
> t3        `- 4        `- 5        1
> t4            `-6         `-8      `-9
> t5            `-10        `-9      `-10
>
> For getnspid($pid, $init_pid),
> Does init_pid means container's init_pid such as 3 for t2?

Right, if you're in init_pid_ns and making the query, then
you'd pass 3.

> In nested containers, does this syscall work as:
> getnspid(9, 4) -> (6, 8, 9)

No, assuming the querying task is in init_pid_ns,
getnspid(9, 4) would return 6.

4 is the observer pid given in the querier's own pidns, so
it refers to t3. 9 is the pid being queried, in the observer's
pidns, so it refers to t4. The result is the pid in our own
pidns.

Does that help clarify at all? I'm not sure whether the problem is that
I didn't explain well enough from the start, or whether this just shows
that the API is one only its mother could love :)

-serge

2014-07-28 08:16:55

by Hu Tao

[permalink] [raw]
Subject: Re: [RFC]Pid conversion between pid namespace

Hi,

On Fri, Jul 25, 2014 at 05:34:43PM +0000, Serge Hallyn wrote:
> Quoting [email protected] ([email protected]):
> > Hi,
> >
> > > -----Original Message-----
> > > From: Serge Hallyn [mailto:[email protected]]
> > > Sent: Tuesday, July 15, 2014 12:16 PM
> > > To: Chen, Hanxiao/陈 晗霄
> > > Subject: Re: [RFC]Pid conversion between pid namespace
> > > > A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > > > pros:
> > > > - ns procfs free, easy to use.
> > > > We could get rid of mounted ns procfs.
> > > >
> > > > cons:
> > > > - may find multiple results in nested ns.
> > > > We wished the new API could tell us the exact answer.
> > > > But if getnspid return more than one results will bring trouble to admins,
> > >
> > > (See below for more, but) the question being posed to getnspid has precisely
> > > one answer.
> > >
> > > > they had to make another decision.
> > > > Or we marked the deepest level for translation as prerequisite.
> > > >
> > > > -based on current pidns, no reference ns.
> > >
> > > Hm, no. The intent here was that
> > >
> > > observer_pid would be in current ns
> > > query_pid would be in observer_pid's ns.
> > >
> > > So this would be ideal for "I got a pid in a logfile created by rsyslog in
> > > a nested contaner, what is the logged pid in my pidns."
> > >
> > > Taking a set of tasks (like a container with nesting) and bulding a tree
> > > of all pids shouldn't be too difficult either. Start with the init pid,
> > > call getnspid($pid, $init_pid) for every $pid in the container; to figure
> > > out whether any $pid is itself a nested init_pid, we can compare the
> > > /proc/$$/ns/pid, as well as look at getnspid($pid, $pid).
> > I'm a little confused in this section:
> >
> > Ex:
> >      init_pid_ns     ns1          ns2
> > t1   2
> > t2    `- 3           1
> > t3        `- 4        `- 5        1
> > t4            `-6         `-8      `-9
> > t5            `-10        `-9      `-10
> >
> > For getnspid($pid, $init_pid),
> > Does init_pid means container's init_pid such as 3 for t2?
>
> Right, if you're in init_pid_ns and making the query, then
> you'd pass 3.

Sorry for jumping in, but I don't quite understand the purpose of
$init_pid here. Does it identify the ns that the process being
queried is in? Also see my questions below:

1. Given the example above, what does getnspid(9, 3) return?
Is it 6 (task t4) or 10 (task t5)?

2. Suppose there is a process in ns1 that is a child of process 1 and has pid
10, but is not in ns2, like below:

     init_pid_ns     ns1             ns2
t1   2
t2    `- 3           1
t3        `- 4        +- 5           1
t4            `-6     |   `-8         `-9
t5            `-10    |   `-9         `-10
t6        `-11        `-10

then what does getnspid(10, 3) return?

Regards,
Hu


2014-07-28 13:25:14

by Serge Hallyn

[permalink] [raw]
Subject: Re: [RFC]Pid conversion between pid namespace

Quoting Hu Tao ([email protected]):
> Hi,
>
> On Fri, Jul 25, 2014 at 05:34:43PM +0000, Serge Hallyn wrote:
> > Quoting [email protected] ([email protected]):
> > > Hi,
> > >
> > > > -----Original Message-----
> > > > From: Serge Hallyn [mailto:[email protected]]
> > > > Sent: Tuesday, July 15, 2014 12:16 PM
> > > > To: Chen, Hanxiao/陈 晗霄
> > > > Subject: Re: [RFC]Pid conversion between pid namespace
> > > > > A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> > > > > pros:
> > > > > - ns procfs free, easy to use.
> > > > > We could get rid of mounted ns procfs.
> > > > >
> > > > > cons:
> > > > > - may find multiple results in nested ns.
> > > > > We wished the new API could tell us the exact answer.
> > > > > But if getnspid return more than one results will bring trouble to admins,
> > > >
> > > > (See below for more, but) the question being posed to getnspid has precisely
> > > > one answer.
> > > >
> > > > > they had to make another decision.
> > > > > Or we marked the deepest level for translation as prerequisite.
> > > > >
> > > > > -based on current pidns, no reference ns.
> > > >
> > > > Hm, no. The intent here was that
> > > >
> > > > observer_pid would be in current ns
> > > > query_pid would be in observer_pid's ns.
> > > >
> > > > So this would be ideal for "I got a pid in a logfile created by rsyslog in
> > > > a nested contaner, what is the logged pid in my pidns."
> > > >
> > > > Taking a set of tasks (like a container with nesting) and bulding a tree
> > > > of all pids shouldn't be too difficult either. Start with the init pid,
> > > > call getnspid($pid, $init_pid) for every $pid in the container; to figure
> > > > out whether any $pid is itself a nested init_pid, we can compare the
> > > > /proc/$$/ns/pid, as well as look at getnspid($pid, $pid).
> > > I'm a little confused in this section:
> > >
> > > Ex:
> > >      init_pid_ns     ns1          ns2
> > > t1   2
> > > t2    `- 3           1
> > > t3        `- 4        `- 5        1
> > > t4            `-6         `-8      `-9
> > > t5            `-10        `-9      `-10
> > >
> > > For getnspid($pid, $init_pid),
> > > Does init_pid means container's init_pid such as 3 for t2?
> >
> > Right, if you're in init_pid_ns and making the query, then
> > you'd pass 3.
>
> Sorry for jumping in, but I'm not quite understanding the purpose of
> $init_pid here, does it identify the ns which the process to be
> queried is in? Also see my questions below:

I was passing in the init pid for a particular reason before, but the second
argument is NOT meant to be an "initpid". It is meant to be the pid
(in the caller's ns) of the observer, i.e. the task in whose pid namespace
we are querying.

> 1. Given the example above, what's the return of getnspid(9, 3)?
> Is it 6(task t4) or 10(task t5)?

Assuming the caller is in init_pid_ns, the return value is 10 (task t5).

>
> 2. if there is a process in ns1 which is a child of process 1 has pid
> 10, but not in ns2, like below:
>
>      init_pid_ns     ns1             ns2
> t1   2
> t2    `- 3           1
> t3        `- 4        +- 5           1
> t4            `-6     |   `-8         `-9
> t5            `-10    |   `-9         `-10
> t6        `-11        `-10
>
> then what is the return of getnspid(10, 3)?

Assuming the caller is in init_pid_ns, the answer is 11 (task t6). The question
was "In the pid_ns belonging to t2 (pid 3), what task does the pid
10 refer to?"
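
To make the semantics discussed in this thread easier to check, here is a small userspace model in Python. It is only a sketch: getnspid is a proposed syscall, not an existing one, the TASKS table is copied from the t1..t6 example above, and the helper names (own_ns, find_task) are hypothetical.

```python
# Pid of each task in every namespace where it is visible (example table).
TASKS = {
    "t1": {"init_pid_ns": 2},
    "t2": {"init_pid_ns": 3, "ns1": 1},
    "t3": {"init_pid_ns": 4, "ns1": 5, "ns2": 1},
    "t4": {"init_pid_ns": 6, "ns1": 8, "ns2": 9},
    "t5": {"init_pid_ns": 10, "ns1": 9, "ns2": 10},
    "t6": {"init_pid_ns": 11, "ns1": 10},
}
DEPTH = ["init_pid_ns", "ns1", "ns2"]  # nesting order, outermost first


def own_ns(task):
    """The namespace a task lives in: the deepest one where it has a pid."""
    return max(TASKS[task], key=DEPTH.index)


def find_task(pid, ns):
    """Name of the task that has this pid in this namespace, or None."""
    for name, pids in TASKS.items():
        if pids.get(ns) == pid:
            return name
    return None


def getnspid(query_pid, observer_pid, caller_ns="init_pid_ns"):
    """Translate query_pid (in the observer's ns) into the caller's ns."""
    observer = find_task(observer_pid, caller_ns)
    if observer is None:
        return None
    target = find_task(query_pid, own_ns(observer))
    if target is None:
        return None
    return TASKS[target].get(caller_ns)


print(getnspid(9, 4), getnspid(9, 3), getnspid(10, 3))
```

In this model the observer pid only selects a namespace, so each query resolves to at most one task, which matches the "precisely one answer" point made earlier in the thread.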