2013-06-21 23:42:43

by Colin Cross

Subject: RFC: named anonymous vmas

One of the features of ashmem (drivers/staging/android/ashmem.c) that
hasn't gotten much discussion about moving out of staging is named
anonymous memory.

In Android, ashmem is used for three different features, and most
users of it only care about one feature at a time. One is volatile
ranges, which John Stultz has been implementing. The second is
anonymous shareable memory without having a world-writable tmpfs that
untrusted apps could fill with files. The third and most heavily used
feature within the Android codebase is named anonymous memory, where a
region of anonymous memory can have a name associated with it that
will show up in /proc/pid/maps. The Dalvik VM likes to use this
feature extensively, even for memory that will never be shared and
could easily be allocated using an anonymous mmap, and even malloc has
used it in the past. It provides an easy way to collate memory used
for different purposes across multiple processes, which Android uses
for its "dumpsys meminfo" and "librank" tools to determine how much
memory is used for java heaps, JIT caches, native mallocs, etc.
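As a rough illustration of that collation, here is a minimal sketch of the per-line parsing such a tool might do on /proc/<pid>/maps. The helper name maps_line_size() is hypothetical and is not taken from dumpsys or librank. It assumes the usual "start-end perms offset dev inode name" layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/<pid>/maps line. Returns the mapping size in bytes and
 * copies the trailing name (possibly empty) into name. Returns 0 if the
 * line does not parse. maps_line_size() is a hypothetical helper.
 */
static unsigned long maps_line_size(const char *line, char *name,
                                    size_t name_len)
{
    unsigned long start, end, offset, inode;
    unsigned int major, minor;
    char perms[5];
    int pos = 0;
    size_t n;

    if (name_len == 0)
        return 0;

    /* %n records how far we got so the name can be picked up afterwards. */
    if (sscanf(line, "%lx-%lx %4s %lx %x:%x %lu%n",
               &start, &end, perms, &offset, &major, &minor, &inode,
               &pos) != 7)
        return 0;

    /* Skip the padding between the inode and the name, if any. */
    line += pos;
    while (*line == ' ' || *line == '\t')
        line++;

    /* Copy the name up to the newline, truncating if it does not fit. */
    n = strcspn(line, "\n");
    if (n >= name_len)
        n = name_len - 1;
    memcpy(name, line, n);
    name[n] = '\0';

    return end - start;
}
```

A caller would read each process's maps file line by line and add the returned size to a per-name total, which is what makes a stable name on anonymous regions useful.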

I'd like to add this feature for anonymous mmap memory. I propose
adding an madvise2(unsigned long start, size_t len_in, int behavior,
void *ptr, size_t size) syscall and a new MADV_NAME behavior, which
treats ptr as a string of length size. The string would be copied
somewhere reusable in the kernel, or reused if it already exists, and
the kernel address of the string would get stashed in a new field in
struct vm_area_struct. Adjacent vmas would only get merged if the
name pointer matched, and naming part of a mapping would split the
mapping. show_map_vma would print the name only if none of the other
existing naming rules match.
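To make the "copied somewhere reusable, or reused if it already exists" idea concrete, here is a minimal userspace model. It is only a sketch: struct vma_name, vma_name_get(), vma_name_put() and struct model_vma are hypothetical names, the table is a plain list rather than a hash, and there is no locking. The point it shows is that once names are interned, the merge check reduces to a pointer comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct vma_name {
    struct vma_name *next;
    unsigned int refs;
    size_t len;
    char str[];
};

static struct vma_name *name_table; /* single list; a real table would hash */

/* Return the shared copy of (ptr, size), creating it on first use. */
static struct vma_name *vma_name_get(const char *ptr, size_t size)
{
    struct vma_name *n;

    for (n = name_table; n; n = n->next) {
        if (n->len == size && memcmp(n->str, ptr, size) == 0) {
            n->refs++;
            return n;
        }
    }

    n = malloc(sizeof(*n) + size + 1);
    if (!n)
        return NULL;
    n->refs = 1;
    n->len = size;
    memcpy(n->str, ptr, size);
    n->str[size] = '\0';
    n->next = name_table;
    name_table = n;
    return n;
}

/* Drop one reference and free the entry when the last user is gone. */
static void vma_name_put(struct vma_name *name)
{
    struct vma_name **p;

    if (!name || --name->refs)
        return;
    for (p = &name_table; *p; p = &(*p)->next) {
        if (*p == name) {
            *p = name->next;
            break;
        }
    }
    free(name);
}

/* Model of a vma: just the range and the interned name (NULL = unnamed). */
struct model_vma {
    unsigned long start, end;
    const struct vma_name *name;
};

/* Because names are interned, "same name" is a pointer comparison. */
static int model_can_merge(const struct model_vma *a,
                           const struct model_vma *b)
{
    return a->end == b->start && a->name == b->name;
}
```

Two adjacent regions named with equal strings get the same pointer and can merge, while an unnamed region (NULL) only merges with another unnamed one.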

Any comments as I start implementing it? Is there any reason to allow
naming a file-backed mapping and showing it alongside the file name in
/proc/pid/maps?


2013-06-22 05:12:51

by Kyungmin Park

Subject: Re: RFC: named anonymous vmas

On Sat, Jun 22, 2013 at 8:42 AM, Colin Cross <[email protected]> wrote:
> One of the features of ashmem (drivers/staging/android/ashmem.c) that
> hasn't gotten much discussion about moving out of staging is named
> anonymous memory.
>
> In Android, ashmem is used for three different features, and most
> users of it only care about one feature at a time. One is volatile
> ranges, which John Stultz has been implementing. The second is
> anonymous shareable memory without having a world-writable tmpfs that
> untrusted apps could fill with files. The third and most heavily used
> feature within the Android codebase is named anonymous memory, where a
> region of anonymous memory can have a name associated with it that
> will show up in /proc/pid/maps. The Dalvik VM likes to use this

Good to know. I didn't know ashmem provided these features.
We are also discussing these requirements internally, and studying how to
show who requested this anon memory and which callback is used for it.

> feature extensively, even for memory that will never be shared and
> could easily be allocated using an anonymous mmap, and even malloc has
> used it in the past. It provides an easy way to collate memory used
> for different purposes across multiple processes, which Android uses
> for its "dumpsys meminfo" and "librank" tools to determine how much
> memory is used for java heaps, JIT caches, native mallocs, etc.
App developers have the same requirement. They want to know what each
region of anon memory was allocated for, and where in their code it
was allocated.
>
> I'd like to add this feature for anonymous mmap memory. I propose
> adding an madvise2(unsigned long start, size_t len_in, int behavior,
> void *ptr, size_t size) syscall and a new MADV_NAME behavior, which
> treats ptr as a string of length size. The string would be copied
> somewhere reusable in the kernel, or reused if it already exists, and
> the kernel address of the string would get stashed in a new field in
> struct vm_area_struct. Adjacent vmas would only get merged if the
> name pointer matched, and naming part of a mapping would split the
> mapping. show_map_vma would print the name only if none of the other
> existing naming rules match.
Do you want to create a new syscall? Could it use the current madvise and
allow this feature on Linux only?
As you know, it's just a hint and it doesn't break existing memory behavior.
>
> Any comments as I start implementing it? Is there any reason to allow
> naming a file-backed mapping and showing it alongside the file name in
> /proc/pid/maps?
>

Thank you,
Kyungmin Park

2013-06-22 05:20:05

by Colin Cross

Subject: Re: RFC: named anonymous vmas

On Fri, Jun 21, 2013 at 10:12 PM, Kyungmin Park <[email protected]> wrote:
> On Sat, Jun 22, 2013 at 8:42 AM, Colin Cross <[email protected]> wrote:
>> One of the features of ashmem (drivers/staging/android/ashmem.c) that
>> hasn't gotten much discussion about moving out of staging is named
>> anonymous memory.
>>
>> In Android, ashmem is used for three different features, and most
>> users of it only care about one feature at a time. One is volatile
>> ranges, which John Stultz has been implementing. The second is
>> anonymous shareable memory without having a world-writable tmpfs that
>> untrusted apps could fill with files. The third and most heavily used
>> feature within the Android codebase is named anonymous memory, where a
>> region of anonymous memory can have a name associated with it that
>> will show up in /proc/pid/maps. The Dalvik VM likes to use this
>
> Good to know. I didn't know ashmem provided these features.
> We are also discussing these requirements internally, and studying how to
> show who requested this anon memory and which callback is used for it.
>
>> feature extensively, even for memory that will never be shared and
>> could easily be allocated using an anonymous mmap, and even malloc has
>> used it in the past. It provides an easy way to collate memory used
>> for different purposes across multiple processes, which Android uses
>> for its "dumpsys meminfo" and "librank" tools to determine how much
>> memory is used for java heaps, JIT caches, native mallocs, etc.
> App developers have the same requirement. They want to know what each
> region of anon memory was allocated for, and where in their code it
> was allocated.
>>
>> I'd like to add this feature for anonymous mmap memory. I propose
>> adding an madvise2(unsigned long start, size_t len_in, int behavior,
>> void *ptr, size_t size) syscall and a new MADV_NAME behavior, which
>> treats ptr as a string of length size. The string would be copied
>> somewhere reusable in the kernel, or reused if it already exists, and
>> the kernel address of the string would get stashed in a new field in
>> struct vm_area_struct. Adjacent vmas would only get merged if the
>> name pointer matched, and naming part of a mapping would split the
>> mapping. show_map_vma would print the name only if none of the other
>> existing naming rules match.
> Do you want to create a new syscall? Could it use the current madvise and
> allow this feature on Linux only?
> As you know, it's just a hint and it doesn't break existing memory behavior.

The existing madvise syscall only takes a single int to modify the
vma, which is not enough to pass a pointer to a string.

2013-06-22 10:32:12

by Christoph Hellwig

Subject: Re: RFC: named anonymous vmas

On Fri, Jun 21, 2013 at 04:42:41PM -0700, Colin Cross wrote:
> ranges, which John Stultz has been implementing. The second is
> anonymous shareable memory without having a world-writable tmpfs that
> untrusted apps could fill with files.

I still haven't seen any explanation of what ashmem buys over a shared
mmap of /dev/zero in that respect, btw.

2013-06-22 17:30:32

by Colin Cross

Subject: Re: RFC: named anonymous vmas

On Sat, Jun 22, 2013 at 3:31 AM, Christoph Hellwig <[email protected]> wrote:
> On Fri, Jun 21, 2013 at 04:42:41PM -0700, Colin Cross wrote:
>> ranges, which John Stultz has been implementing. The second is
>> anonymous shareable memory without having a world-writable tmpfs that
>> untrusted apps could fill with files.
>
> I still haven't seen any explanation of what ashmem buys over a shared
> mmap of /dev/zero in that respect, btw.

I believe the difference is that ashmem ties the memory to an fd, so
it can be passed to another process and mmaped to get to the same
memory, but /dev/zero does not. Passing a /dev/zero fd and mmaping it
would result in a brand new region of zeroed memory. Opening a tmpfs
file would allow sharing memory by passing the fd, but we don't want a
world-writable tmpfs.
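The /dev/zero behaviour can be checked with a small test. This is a minimal sketch, and the function name dev_zero_mappings_are_independent() is made up for illustration. It maps the same /dev/zero fd twice with MAP_SHARED in one process, writes through the first mapping, and looks at the second.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if two MAP_SHARED mappings of the same /dev/zero fd are
 * independent, 0 if they alias the same memory, -1 on error.
 * Hypothetical helper, for illustration only.
 */
static int dev_zero_mappings_are_independent(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char *a, *b;
    int fd, independent;

    if (page <= 0)
        return -1;

    fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return -1;

    a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED)
            munmap(a, (size_t)page);
        if (b != MAP_FAILED)
            munmap(b, (size_t)page);
        return -1;
    }

    /* Write through the first mapping, then look at the second one. */
    a[0] = 42;
    independent = (b[0] == 0);

    munmap(a, (size_t)page);
    munmap(b, (size_t)page);
    return independent;
}
```

If the second mapping still reads zero, the fd does not identify a single memory object, so handing that fd to another process cannot give it access to the first region.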

2013-06-24 11:48:35

by Christoph Hellwig

Subject: Re: RFC: named anonymous vmas

On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
> Couldn't this be done by having a root-only tmpfs, and having a userspace
> component that creates per-app directories with restrictive permissions on
> startup/app install? Then each app creates files in its own directory, and
> can pass the fds around.

Honestly having a device that allows passing fds around that can be
mmaped sounds a lot simpler. I have to admit that I expected /dev/zero
to do this, but looking at the code it creates new file structures
at ->mmap time which would defeat this.

2013-06-24 17:26:59

by Colin Cross

Subject: Re: RFC: named anonymous vmas

On Mon, Jun 24, 2013 at 4:48 AM, Christoph Hellwig <[email protected]> wrote:
> On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
>> Couldn't this be done by having a root-only tmpfs, and having a userspace
>> component that creates per-app directories with restrictive permissions on
>> startup/app install? Then each app creates files in its own directory, and
>> can pass the fds around.

If each app gets its own writable directory that's not really
different than a world writable tmpfs. It requires something that
watches for apps to exit for any reason and cleans up their
directories, and it requires each app to come up with an unused name
when it wants to create a file, and the kernel can give you both very
cleanly.

2013-06-24 23:45:14

by John Stultz

Subject: Re: RFC: named anonymous vmas

On Mon, Jun 24, 2013 at 10:26 AM, Colin Cross <[email protected]> wrote:
> On Mon, Jun 24, 2013 at 4:48 AM, Christoph Hellwig <[email protected]> wrote:
>> On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
>>> Couldn't this be done by having a root-only tmpfs, and having a userspace
>>> component that creates per-app directories with restrictive permissions on
>>> startup/app install? Then each app creates files in its own directory, and
>>> can pass the fds around.
>
> If each app gets its own writable directory that's not really
> different than a world writable tmpfs. It requires something that
> watches for apps to exit for any reason and cleans up their
> directories, and it requires each app to come up with an unused name
> when it wants to create a file, and the kernel can give you both very
> cleanly.

That said, I believe a daemon that has exclusive access to tmpfs,
and creates, unlinks and passes the fd to the requesting application,
would provide a userspace-only implementation of the second feature
requirement ("without having a world-writable tmpfs that untrusted
apps could fill with files"). I'm not sure what the
/proc/<pid>/maps naming would look like on the unlinked file, though, so it
might not solve the third issue, naming.
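For reference, the fd-passing half of such a daemon is the standard SCM_RIGHTS mechanism over a Unix domain socket. The following is a minimal sketch with hypothetical helper names send_fd() and recv_fd(). How the daemon creates and unlinks the tmpfs file, and how clients connect to it, is left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send fd over a connected AF_UNIX socket. Returns 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces suitable alignment of buf */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;     /* at least one byte of real data is sent */
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(). Returns the new fd, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file, so mmaping it with MAP_SHARED on both sides reaches the same pages.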

thanks
-john

2013-07-14 00:27:11

by Sam Ben

Subject: Re: RFC: named anonymous vmas

Hi Colin,
On 06/22/2013 07:42 AM, Colin Cross wrote:
> One of the features of ashmem (drivers/staging/android/ashmem.c) that
> hasn't gotten much discussion about moving out of staging is named
> anonymous memory.
>
> In Android, ashmem is used for three different features, and most
> users of it only care about one feature at a time. One is volatile
> ranges, which John Stultz has been implementing. The second is
> anonymous shareable memory without having a world-writable tmpfs that
> untrusted apps could fill with files. The third and most heavily used

How should I understand "anonymous shareable memory without having a
world-writable tmpfs that untrusted apps could fill with files"?

> feature within the Android codebase is named anonymous memory, where a
> region of anonymous memory can have a name associated with it that
> will show up in /proc/pid/maps. The Dalvik VM likes to use this
> feature extensively, even for memory that will never be shared and
> could easily be allocated using an anonymous mmap, and even malloc has
> used it in the past. It provides an easy way to collate memory used
> for different purposes across multiple processes, which Android uses
> for its "dumpsys meminfo" and "librank" tools to determine how much
> memory is used for java heaps, JIT caches, native mallocs, etc.
>
> I'd like to add this feature for anonymous mmap memory. I propose
> adding an madvise2(unsigned long start, size_t len_in, int behavior,
> void *ptr, size_t size) syscall and a new MADV_NAME behavior, which
> treats ptr as a string of length size. The string would be copied
> somewhere reusable in the kernel, or reused if it already exists, and
> the kernel address of the string would get stashed in a new field in
> struct vm_area_struct. Adjacent vmas would only get merged if the
> name pointer matched, and naming part of a mapping would split the
> mapping. show_map_vma would print the name only if none of the other
> existing naming rules match.
>
> Any comments as I start implementing it? Is there any reason to allow
> naming a file-backed mapping and showing it alongside the file name in
> /proc/pid/maps?
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>

2013-07-14 00:57:23

by Sam Ben

Subject: Re: RFC: named anonymous vmas

Hi Christoph,
On 06/24/2013 07:48 PM, Christoph Hellwig wrote:
> On Sat, Jun 22, 2013 at 12:47:29PM -0700, Alex Elsayed wrote:
>> Couldn't this be done by having a root-only tmpfs, and having a userspace
>> component that creates per-app directories with restrictive permissions on
>> startup/app install? Then each app creates files in its own directory, and
>> can pass the fds around.
> Honestly having a device that allows passing fds around that can be
> mmaped sounds a lot simpler. I have to admit that I expected /dev/zero
> to do this, but looking at the code it creates new file structures
> at ->mmap time which would defeat this.

Could you point out where this is done?


2013-08-01 08:29:55

by Christoph Hellwig

Subject: Re: RFC: named anonymous vmas

Btw, FreeBSD has an extension to shm_open to create unnamed but fd
passable segments. From their man page:

As a FreeBSD extension, the constant SHM_ANON may be used for the path
argument to shm_open(). In this case, an anonymous, unnamed shared
memory object is created. Since the object has no name, it cannot be
removed via a subsequent call to shm_unlink(). Instead, the shared
memory object will be garbage collected when the last reference to the
shared memory object is removed. The shared memory object may be shared
with other processes by sharing the file descriptor via fork(2) or
sendmsg(2). Attempting to open an anonymous shared memory object with
O_RDONLY will fail with EINVAL. All other flags are ignored.

To me this sounds like the best way to expose this functionality to the
user. Implementing it is another question as shm_open sits in libc,
we could either take it and shm_unlink to the kernel, or use O_TMPFILE
on tmpfs as the backend.

2013-08-01 08:36:41

by Rich Felker

Subject: Re: RFC: named anonymous vmas

On Thu, Aug 01, 2013 at 01:29:51AM -0700, Christoph Hellwig wrote:
> Btw, FreeBSD has an extension to shm_open to create unnamed but fd
> passable segments. From their man page:
>
> As a FreeBSD extension, the constant SHM_ANON may be used for the path
> argument to shm_open(). In this case, an anonymous, unnamed shared
> memory object is created. Since the object has no name, it cannot be
> removed via a subsequent call to shm_unlink(). Instead, the shared
> memory object will be garbage collected when the last reference to the
> shared memory object is removed. The shared memory object may be shared
> with other processes by sharing the file descriptor via fork(2) or
> sendmsg(2). Attempting to open an anonymous shared memory object with
> O_RDONLY will fail with EINVAL. All other flags are ignored.
>
> To me this sounds like the best way to expose this functionality to the
> user. Implementing it is another question as shm_open sits in libc,
> we could either take it and shm_unlink to the kernel, or use O_TMPFILE
> on tmpfs as the backend.

I'm not sure what the purpose is. shm_open with a long random filename
and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as
good except in the case where you have a malicious user killing the
process in between these two operations.
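The create-then-unlink sequence described here might look like the following minimal sketch. The helper name anon_shm_create() and the "/anon-" name prefix are hypothetical, and rand() is used only to keep the example short. It is not a good source of unpredictable names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns an fd for an already-unlinked POSIX shm object of the given
 * size, or -1 on error. Hypothetical helper, for illustration only.
 */
static int anon_shm_create(size_t size)
{
    char name[64];
    int fd = -1;
    int tries;

    for (tries = 0; tries < 16 && fd < 0; tries++) {
        /* Weak randomness, for illustration only. */
        snprintf(name, sizeof(name), "/anon-%ld-%d", (long)getpid(), rand());

        /* O_EXCL makes the call fail if the name is already taken. */
        fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd < 0 && errno != EEXIST)
            return -1;
    }
    if (fd < 0)
        return -1;

    /*
     * The name exists between shm_open() and this call. If the process
     * is killed in that window, the object is left behind.
     */
    shm_unlink(name);

    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On older glibc versions shm_open() lives in librt, so the example may need to be linked with -lrt.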

Rich

2013-08-02 15:11:46

by Christoph Hellwig

Subject: Re: RFC: named anonymous vmas

On Thu, Aug 01, 2013 at 04:36:08AM -0400, Rich Felker wrote:
> I'm not sure what the purpose is. shm_open with a long random filename
> and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as
> good except in the case where you have a malicious user killing the
> process in between these two operations.

The Android people already have an shm API that doesn't leave traces in the
filesystem, and I at least conceptually agree that having an API that
doesn't introduce other possible access is a good idea. This is the
same reason why the O_TMPFILE API was added in this release.

2013-08-03 23:55:00

by KOSAKI Motohiro

Subject: Re: RFC: named anonymous vmas

On Thu, Aug 1, 2013 at 4:36 AM, Rich Felker <[email protected]> wrote:
> On Thu, Aug 01, 2013 at 01:29:51AM -0700, Christoph Hellwig wrote:
>> Btw, FreeBSD has an extension to shm_open to create unnamed but fd
>> passable segments. From their man page:
>>
>> As a FreeBSD extension, the constant SHM_ANON may be used for the path
>> argument to shm_open(). In this case, an anonymous, unnamed shared
>> memory object is created. Since the object has no name, it cannot be
>> removed via a subsequent call to shm_unlink(). Instead, the shared
>> memory object will be garbage collected when the last reference to the
>> shared memory object is removed. The shared memory object may be shared
>> with other processes by sharing the file descriptor via fork(2) or
>> sendmsg(2). Attempting to open an anonymous shared memory object with
>> O_RDONLY will fail with EINVAL. All other flags are ignored.
>>
>> To me this sounds like the best way to expose this functionality to the
>> user. Implementing it is another question as shm_open sits in libc,
>> we could either take it and shm_unlink to the kernel, or use O_TMPFILE
>> on tmpfs as the backend.
>
> I'm not sure what the purpose is. shm_open with a long random filename
> and O_EXCL|O_CREAT, followed immediately by shm_unlink, is just as
> good except in the case where you have a malicious user killing the
> process in between these two operations.

Practically, filename length is restricted by NAME_MAX (255 bytes). Several
people don't think that is long enough. The point is a race-free API.