2012-02-23 07:45:33

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] mm: Enable MAP_UNINITIALIZED for archs with mmu

On Thu, Jan 19, 2012 at 12:21 AM, Arun Sharma <[email protected]> wrote:
>
> This enables malloc optimizations where we might
> madvise(..,MADV_DONTNEED) a page only to fault it
> back at a different virtual address.
>
> To ensure that we don't leak sensitive data to
> unprivileged processes, we enable this optimization
> only for pages that are reused within a memory
> cgroup.
>

So the assumption is that only apps that have access to each others
VMA's will run in this cgroup?

> The idea is to make this opt-in both at the mmap()
> level and cgroup level so the default behavior is
> unchanged after the patch.
>

Sorry, I am not convinced we need to do this

1. I know that zeroing out memory is expensive, but building a
potential loop hole is not a good idea
2. How do we ensure that tasks in a cgroup should be allowed to reuse
memory uninitialized, how does the cgroup admin know what she is
getting into?

So I am going to NACK this.

Balbir


2012-02-23 18:42:31

by Arun Sharma

[permalink] [raw]
Subject: Re: [PATCH] mm: Enable MAP_UNINITIALIZED for archs with mmu

Hi Balbir,

Thanks for reviewing. Would you change your position if I limit the
scope of the patch to a cgroup with a single address space?

The moment the cgroup sees more than one address space (either due to
tasks getting created or being added), this optimization would be turned
off.

More details below:

On 2/22/12 11:45 PM, Balbir Singh wrote:
>
> So the assumption is that only apps that have access to each others
> VMA's will run in this cgroup?
>

In a distributed computing environment, a user submits a job to the
cluster job scheduler. The job might involve multiple related
executables and might involve multiple address spaces. But they're
performing one logical task, have a single resource limit enforced by a
cgroup.

They don't have access to each other's VMAs, but if "accidentally" one
of them comes across an uninitialized page with data from another task,
it's not a violation of the security model.

> Sorry, I am not convinced we need to do this
>
> 1. I know that zeroing out memory is expensive, but building a
> potential loop hole is not a good idea
> 2. How do we ensure that tasks in a cgroup should be allowed to reuse
> memory uninitialized, how does the cgroup admin know what she is
> getting into?

I was thinking of addressing this via documentation (as in: don't use
this if you don't know what you're doing!). But limiting the scope to a
single address space cgroup seems cleaner to me.

-Arun

2012-02-24 02:49:14

by Kamezawa Hiroyuki

[permalink] [raw]
Subject: Re: [PATCH] mm: Enable MAP_UNINITIALIZED for archs with mmu

On Thu, 23 Feb 2012 10:42:16 -0800
Arun Sharma <[email protected]> wrote:

> Hi Balbir,
>
> Thanks for reviewing. Would you change your position if I limit the
> scope of the patch to a cgroup with a single address space?
>
> The moment the cgroup sees more than one address space (either due to
> tasks getting created or being added), this optimization would be turned
> off.
>
> More details below:
>
> On 2/22/12 11:45 PM, Balbir Singh wrote:
> >
> > So the assumption is that only apps that have access to each others
> > VMA's will run in this cgroup?
> >
>
> In a distributed computing environment, a user submits a job to the
> cluster job scheduler. The job might involve multiple related
> executables and might involve multiple address spaces. But they're
> performing one logical task, have a single resource limit enforced by a
> cgroup.
>
> They don't have access to each other's VMAs, but if "accidentally" one
> of them comes across an uninitialized page with data from another task,
> it's not a violation of the security model.
>
How do you handle shared resouce, file-cache ?

Thanks,
-Kame

2012-02-24 14:51:21

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] mm: Enable MAP_UNINITIALIZED for archs with mmu

On Fri, Feb 24, 2012 at 8:17 AM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
>> They don't have access to each other's VMAs, but if "accidentally" one
>> of them comes across an uninitialized page with data from another task,
>> it's not a violation of the security model.

Can you expand more on the single address space model?

Balbir

2012-02-24 19:11:20

by Arun Sharma

[permalink] [raw]
Subject: Re: [PATCH] mm: Enable MAP_UNINITIALIZED for archs with mmu

On 2/24/12 6:51 AM, Balbir Singh wrote:
> On Fri, Feb 24, 2012 at 8:17 AM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
>>> They don't have access to each other's VMAs, but if "accidentally" one
>>> of them comes across an uninitialized page with data from another task,
>>> it's not a violation of the security model.
>
> Can you expand more on the single address space model?

I haven't thought this through yet. But I know that just adding

&& (cgroup_task_count() == 1)

to page_needs_clearing() is not going to do it. We'll have to design a
new mechanism (cgroup_mm_count_all()?) and make sure that it doesn't
race with fork() and inadvertently expose pages from the new address
space to the existing one.

A uid based approach such as the one implemented by Davide Libenzi

http://thread.gmane.org/gmane.linux.kernel/548928
http://thread.gmane.org/gmane.linux.kernel/548926

would probably apply the optimization to more use cases - but
conceptually a bit more complex. If we go with this more relaxed
approach, we'll have to design a race-free cgroup_uid_count() based
mechanism.

-Arun

2012-02-24 19:27:08

by Arun Sharma

[permalink] [raw]
Subject: Re: [PATCH] mm: Enable MAP_UNINITIALIZED for archs with mmu

On 2/23/12 6:47 PM, KAMEZAWA Hiroyuki wrote:
>>
>> In a distributed computing environment, a user submits a job to the
>> cluster job scheduler. The job might involve multiple related
>> executables and might involve multiple address spaces. But they're
>> performing one logical task, have a single resource limit enforced by a
>> cgroup.
>>
>> They don't have access to each other's VMAs, but if "accidentally" one
>> of them comes across an uninitialized page with data from another task,
>> it's not a violation of the security model.
>>
> How do you handle shared resouce, file-cache ?
>

From a security perspective or a resource limit perspective?

Security: all processes in the cgroup run with the same uid and have the
same access to the filesystem. Multiple address spaces in a cgroup can
be thought of as an implementation detail.

Resource limit: We don't have strict enforcement right now. There is a
desire to include everything (file cache, slab memory) in the job's
memory resource limit.

-Arun

2012-02-25 04:13:16

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] mm: Enable MAP_UNINITIALIZED for archs with mmu

On Sat, Feb 25, 2012 at 12:41 AM, Arun Sharma <[email protected]> wrote:
> On 2/24/12 6:51 AM, Balbir Singh wrote:
>>
>> On Fri, Feb 24, 2012 at 8:17 AM, KAMEZAWA Hiroyuki
>> <[email protected]> ?wrote:
>>>>
>>>> They don't have access to each other's VMAs, but if "accidentally" one
>>>> of them comes across an uninitialized page with data from another task,
>>>> it's not a violation of the security model.
>>
>>
>> Can you expand more on the single address space model?
>
>
> I haven't thought this through yet. But I know that just adding
>
> && (cgroup_task_count() == 1)
>
> to page_needs_clearing() is not going to do it. We'll have to design a new
> mechanism (cgroup_mm_count_all()?) and make sure that it doesn't race with
> fork() and inadvertently expose pages from the new address space to the
> existing one.
>
> A uid based approach such as the one implemented by Davide Libenzi
>
> http://thread.gmane.org/gmane.linux.kernel/548928
> http://thread.gmane.org/gmane.linux.kernel/548926
>
> would probably apply the optimization to more use cases - but conceptually a
> bit more complex. If we go with this more relaxed approach, we'll have to
> design a race-free cgroup_uid_count() based mechanism.

Are you suggesting all processes with the same UID should have access
to each others memory contents?

Balbir

2012-02-27 18:32:44

by Arun Sharma

[permalink] [raw]
Subject: Re: [PATCH] mm: Enable MAP_UNINITIALIZED for archs with mmu

On 2/24/12 8:13 PM, Balbir Singh wrote:
>> A uid based approach such as the one implemented by Davide Libenzi
>>
>> http://thread.gmane.org/gmane.linux.kernel/548928
>> http://thread.gmane.org/gmane.linux.kernel/548926
>>
>> would probably apply the optimization to more use cases - but conceptually a
>> bit more complex. If we go with this more relaxed approach, we'll have to
>> design a race-free cgroup_uid_count() based mechanism.
>
> Are you suggesting all processes with the same UID should have access
> to each others memory contents?

No - that's a stronger statement than the one I made in my last message.
I'll however observe that something like this is already possible via
PTRACE_PEEKDATA.

Like I said: a cgroup with a single mm_struct is conceptually cleanest
and covers some of our heavy use cases. A cgroup with a single uid would
cover more of our use cases. It'd be good to know if you and other
maintainers are willing to accept the former, but not the latter.

I'll note that the malloc implementation which uses these interfaces can
still decide to zero the memory depending on which variant of *alloc is
called. But then, we'd have more fine grained control and more
flexibility in terms of temporal usage hints.

-Arun