On Fri, Sep 30, 2016 at 07:01:13PM -0700, Andy Lutomirski wrote:
> On an unrelated note, can we please lock down all the silly historical
> *userspace* info leaks in /proc? Nasty ones include: net, cmdline (at
> the very least, only argv[0] should be visible if the reader lacks
> ptrace access).
>
> Less nasty ones include: limits, sched, autogroup, comm, wchan,
> schedstat, cpuset, cgroup, oom_*, sessionid, coredump_filter
If that doesn't break stuff, I'm very much in favor of it.
> uid_map, gid_map, etc are just screwed up. They should be per
> *namespace* somewhere, and they should require creds on the namespace.
What do you have in mind? Something like
/proc/namespaces/user:123456/{uid_map,gid_map,setgroups,parent_ns},
with jumped fake symlinks to the directory and its entries in /proc/$pid/?
> timerslack is totally fscked up -- it allows ugo to write and it
> checks the wrong creds. Jann, does your series fix that?
Nope. Never noticed that thing so far, probably because it was only
added a few months ago. :/ Will add it to my series.
On Sat, Oct 1, 2016 at 3:37 AM, Jann Horn <[email protected]> wrote:
> On Fri, Sep 30, 2016 at 07:01:13PM -0700, Andy Lutomirski wrote:
>> On an unrelated note, can we please lock down all the silly historical
>> *userspace* info leaks in /proc? Nasty ones include: net, cmdline (at
>> the very least, only argv[0] should be visible if the reader lacks
>> ptrace access).
>>
>> Less nasty ones include: limits, sched, autogroup, comm, wchan,
>> schedstat, cpuset, cgroup, oom_*, sessionid, coredump_filter
>
> If that doesn't break stuff, I'm very much in favor of it.
>
>
>> uid_map, gid_map, etc are just screwed up. They should be per
>> *namespace* somewhere, and they should require creds on the namespace.
>
> What do you have in mind? Something like
> /proc/namespaces/user:123456/{uid_map,gid_map,setgroups,parent_ns},
> with jumped fake symlinks to the directory and its entries in /proc/$pid/?
>
Something along those lines would be nice.
There's an unfortunate tension between having names for namespaces
(like 123456 in your example) for ease of use and *not* having names
so CRIU can restore them more easily.
>
>> timerslack is totally fscked up -- it allows ugo to write and it
>> checks the wrong creds. Jann, does your series fix that?
>
> Nope. Never noticed that thing so far, probably because it was only
> added a few months ago. :/ Will add it to my series.
--
Andy Lutomirski
AMA Capital Management, LLC
On Fri, Oct 14, 2016 at 11:25:58AM -0700, Andy Lutomirski wrote:
> On Sat, Oct 1, 2016 at 3:37 AM, Jann Horn <[email protected]> wrote:
> > On Fri, Sep 30, 2016 at 07:01:13PM -0700, Andy Lutomirski wrote:
> >> On an unrelated note, can we please lock down all the silly historical
> >> *userspace* info leaks in /proc? Nasty ones include: net, cmdline (at
> >> the very least, only argv[0] should be visible if the reader lacks
> >> ptrace access).
> >>
> >> Less nasty ones include: limits, sched, autogroup, comm, wchan,
> >> schedstat, cpuset, cgroup, oom_*, sessionid, coredump_filter
> >
> > If that doesn't break stuff, I'm very much in favor of it.
> >
> >
> >> uid_map, gid_map, etc are just screwed up. They should be per
> >> *namespace* somewhere, and they should require creds on the namespace.
> >
> > What do you have in mind? Something like
> > /proc/namespaces/user:123456/{uid_map,gid_map,setgroups,parent_ns},
> > with jumped fake symlinks to the directory and its entries in /proc/$pid/?
> >
>
> Something along those lines would be nice.
>
> There's an unfortunate tension between having names for namespaces
> (like 123456 in your example) for ease of use and *not* having names
> so CRIU can restore them more easily.
As long as the namespaces aren't visible to the container itself, it
has been okay. For example, LXC (and IIUC OpenVZ as well) put
containers in a /lxc/<name> cgroup, which with cgroup namespaces isn't
visible to the container. CRIU implements support for rewriting this
prefix, so we can do renaming on the fly. Since most container engines
won't allow two containers with the same name, we avoid clashes at a
higher level before we end up with an actual "namespace ID" (cgroup
path) clash. We're doing a similar thing for AppArmor namespaces.
Of course, if the IDs are visible to the container like they would be
in this case, then it is much harder.
Cheers,
Tycho