2017-09-14 17:00:49

by Rik van Riel

Subject: [patch] madvise.2: Add MADV_WIPEONFORK documentation

Add MADV_WIPEONFORK and MADV_KEEPONFORK documentation to
madvise.2. The new functionality was recently merged by
Linus, and should be in the 4.14 kernel.

While documenting what EINVAL means for MADV_WIPEONFORK,
I realized that MADV_FREE has the same thing going on,
so I documented EINVAL for both in the ERRORS section.

This patch documents the following kernel commit:

commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
Author: Rik van Riel <[email protected]>
Date: Wed Sep 6 16:25:15 2017 -0700

mm,fork: introduce MADV_WIPEONFORK

Signed-off-by: Rik van Riel <[email protected]>

index dfb31b63dba3..4f987ddfae79 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -31,6 +31,8 @@
.\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE.
.\" 2011-09-18, Doug Goldstein <[email protected]>
.\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE
+.\" 2017-09-14, Rik van Riel <[email protected]>
+.\" Document MADV_WIPEONFORK and MADV_KEEPONFORK
.\"
.TH MADVISE 2 2017-07-13 "Linux" "Linux Programmer's Manual"
.SH NAME
@@ -405,6 +407,22 @@ can be applied only to private anonymous pages (see
.BR mmap (2)).
On a swapless system, freeing pages in a given range happens instantly,
regardless of memory pressure.
+.TP
+.BR MADV_WIPEONFORK " (since Linux 4.14)"
+Present the child process with zero-filled memory in this range after a
+.BR fork (2).
+This is useful for per-process data in forking servers that should be
+re-initialized in the child process after a fork, for example PRNG seeds,
+cryptographic data, etc.
+.IP
+The
+.B MADV_WIPEONFORK
+operation can only be applied to private anonymous pages (see
+.BR mmap (2)).
+.TP
+.BR MADV_KEEPONFORK " (since Linux 4.14)"
+Undo the effect of an earlier
+.BR MADV_WIPEONFORK .
.SH RETURN VALUE
On success,
.BR madvise ()
@@ -457,6 +475,18 @@ or
but the kernel was not configured with
.BR CONFIG_KSM .
.TP
+.B EINVAL
+.I advice
+is
+.BR MADV_FREE
+or
+.BR MADV_WIPEONFORK
+but the specified address range includes file, Huge TLB,
+.BR MAP_SHARED ,
+or
+.BR VM_PFNMAP
+ranges.
+.TP
.B EIO
(for
.BR MADV_WILLNEED )
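
To illustrate the EINVAL case documented above, here is a minimal C sketch that requests MADV_WIPEONFORK on a MAP_SHARED anonymous mapping and reports the errno it gets back. The helper name wipeonfork_errno_on_shared_mapping() is made up for this example, and the fallback #define is an assumption for libc headers that predate the flag. Note that a kernel older than 4.14 also returns EINVAL here, because it does not recognize the advice value at all, so this check alone cannot tell the two situations apart.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* expose madvise() and MAP_ANONYMOUS */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18      /* asm-generic value; assumption for older libc headers */
#endif

/*
 * Hypothetical helper: request MADV_WIPEONFORK on a shared anonymous
 * mapping. Returns the errno from madvise(), 0 if the call succeeded,
 * or -1 if the mapping itself could not be created.
 */
int wipeonfork_errno_on_shared_mapping(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int err = 0;
    if (madvise(p, len, MADV_WIPEONFORK) == -1)
        err = errno;            /* save errno before munmap() can change it */

    munmap(p, len);
    return err;
}
```

A caller can compare the return value against EINVAL; switching the mapping to MAP_PRIVATE should make the same call succeed on 4.14 and later.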


2017-09-14 17:09:09

by Colm MacCárthaigh

Subject: Re: [patch] madvise.2: Add MADV_WIPEONFORK documentation

Great change, just some suggestions ...

On Thu, Sep 14, 2017 at 10:00 AM, Rik van Riel <[email protected]> wrote:
> Add MADV_WIPEONFORK and MADV_KEEPONFORK documentation to
> madvise.2. The new functionality was recently merged by
> Linus, and should be in the 4.14 kernel.
>
> While documenting what EINVAL means for MADV_WIPEONFORK,
> I realized that MADV_FREE has the same thing going on,
> so I documented EINVAL for both in the ERRORS section.
>
> This patch documents the following kernel commit:
>
> commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
> Author: Rik van Riel <[email protected]>
> Date: Wed Sep 6 16:25:15 2017 -0700
>
> mm,fork: introduce MADV_WIPEONFORK
>
> Signed-off-by: Rik van Riel <[email protected]>
>
> index dfb31b63dba3..4f987ddfae79 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -31,6 +31,8 @@
> .\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE.
> .\" 2011-09-18, Doug Goldstein <[email protected]>
> .\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE
> +.\" 2017-09-14, Rik van Riel <[email protected]>
> +.\" Document MADV_WIPEONFORK and MADV_KEEPONFORK
> .\"

It seems to be idiomatic to reference the commit adding the options in
the hidden man-page comments. Probably needs:

.\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19

here. (That's the commit adding MADV_WIPEONFORK/MADV_KEEPONFORK to Linus' tree.)


> .TH MADVISE 2 2017-07-13 "Linux" "Linux Programmer's Manual"
> .SH NAME
> @@ -405,6 +407,22 @@ can be applied only to private anonymous pages (see
> .BR mmap (2)).
> On a swapless system, freeing pages in a given range happens instantly,
> regardless of memory pressure.
> +.TP
> +.BR MADV_WIPEONFORK " (since Linux 4.14)"
> +Present the child process with zero-filled memory in this range after a
> +.BR fork (2).
> +This is useful for per-process data in forking servers that should be
> +re-initialized in the child process after a fork, for example PRNG seeds,
> +cryptographic data, etc.

Instead of cryptographic data, I would say more broadly "secrets" - to
help nudge best practice. For example, in an application that buffers
decrypted plaintext, it's smart to mark it as WIPEONFORK so that there
aren't unnecessary copies of the plaintext floating around.
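
A rough C sketch of that plaintext-buffer case follows. It assumes a single page of private anonymous memory standing in for the buffer; the function name child_sees_zeroes_after_fork() and the 0xAA fill pattern are invented for illustration, and the fallback #define is an assumption for libc headers that predate the flag. The function returns 1 if the child found the buffer zero-filled while the parent's copy stayed intact, 0 if the child still saw data, and -1 if a call failed (for example, on a kernel without MADV_WIPEONFORK).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* expose madvise() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18      /* asm-generic value; assumption for older libc headers */
#endif

/* Hypothetical demo; see the lead-in for the meaning of the return value. */
int child_sees_zeroes_after_fork(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xAA, len);     /* stand-in for decrypted plaintext */

    if (madvise(buf, len, MADV_WIPEONFORK) == -1) {
        munmap(buf, len);
        return -1;              /* for example, EINVAL on kernels before 4.14 */
    }

    pid_t pid = fork();
    if (pid == -1) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: exit 0 only if every byte reads back as zero. */
        for (size_t i = 0; i < len; i++)
            if (buf[i] != 0)
                _exit(1);
        _exit(0);
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    /* The parent's copy must be untouched by the fork. */
    int parent_intact = (buf[0] == 0xAA && buf[len - 1] == 0xAA);
    munmap(buf, len);

    if (waited == -1 || !WIFEXITED(status) || !parent_intact)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```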

I'd suggest patching fork.2 also, with something like:

index b5af58ca0..b11e750e3 100644
--- a/man2/fork.2
+++ b/man2/fork.2
@@ -140,6 +140,12 @@ Memory mappings that have been marked with the
flag are not inherited across a
.BR fork ().
.IP *
+Memory in mappings that have been marked with the
+.BR madvise (2)
+.B MADV_WIPEONFORK
+flag is zeroed in the child after a
+.BR fork ().
+.IP *
The termination signal of the child is always
.B SIGCHLD
(see



--
Colm

2017-09-14 19:05:51

by Rik van Riel

Subject: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

v2: implement the improvements suggested by Colm, and add
Colm's text to the fork.2 man page
(Colm, I have added a signed-off-by in your name - is that ok?)

Add MADV_WIPEONFORK and MADV_KEEPONFORK documentation to
madvise.2. The new functionality was recently merged by
Linus, and should be in the 4.14 kernel.

While documenting what EINVAL means for MADV_WIPEONFORK,
I realized that MADV_FREE has the same thing going on,
so I documented EINVAL for both in the ERRORS section.

This patch documents the following kernel commit:

commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
Author: Rik van Riel <[email protected]>
Date: Wed Sep 6 16:25:15 2017 -0700

mm,fork: introduce MADV_WIPEONFORK

Signed-off-by: Rik van Riel <[email protected]>
Signed-off-by: Colm MacCárthaigh <[email protected]>

diff --git a/man2/fork.2 b/man2/fork.2
index b5af58ca08c0..b11e750e3876 100644
--- a/man2/fork.2
+++ b/man2/fork.2
@@ -140,6 +140,12 @@ Memory mappings that have been marked with the
flag are not inherited across a
.BR fork ().
.IP *
+Memory in mappings that have been marked with the
+.BR madvise (2)
+.B MADV_WIPEONFORK
+flag is zeroed in the child after a
+.BR fork ().
+.IP *
The termination signal of the child is always
.B SIGCHLD
(see
diff --git a/man2/madvise.2 b/man2/madvise.2
index dfb31b63dba3..bb0ac469c509 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -31,6 +31,9 @@
.\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE.
.\" 2011-09-18, Doug Goldstein <[email protected]>
.\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE
+.\" 2017-09-14, Rik van Riel <[email protected]>
+.\" Document MADV_WIPEONFORK and MADV_KEEPONFORK
+.\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
.\"
.TH MADVISE 2 2017-07-13 "Linux" "Linux Programmer's Manual"
.SH NAME
@@ -405,6 +408,22 @@ can be applied only to private anonymous pages (see
.BR mmap (2)).
On a swapless system, freeing pages in a given range happens instantly,
regardless of memory pressure.
+.TP
+.BR MADV_WIPEONFORK " (since Linux 4.14)"
+Present the child process with zero-filled memory in this range after a
+.BR fork (2).
+This is useful for per-process data in forking servers that should be
+re-initialized in the child process after a fork, for example PRNG seeds,
+cryptographic secrets, etc.
+.IP
+The
+.B MADV_WIPEONFORK
+operation can only be applied to private anonymous pages (see
+.BR mmap (2)).
+.TP
+.BR MADV_KEEPONFORK " (since Linux 4.14)"
+Undo the effect of an earlier
+.BR MADV_WIPEONFORK .
.SH RETURN VALUE
On success,
.BR madvise ()
@@ -457,6 +476,18 @@ or
but the kernel was not configured with
.BR CONFIG_KSM .
.TP
+.B EINVAL
+.I advice
+is
+.BR MADV_FREE
+or
+.BR MADV_WIPEONFORK
+but the specified address range includes file, Huge TLB,
+.BR MAP_SHARED ,
+or
+.BR VM_PFNMAP
+ranges.
+.TP
.B EIO
(for
.BR MADV_WILLNEED )
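
As a sketch of how MADV_KEEPONFORK undoes an earlier MADV_WIPEONFORK, the C fragment below applies both in sequence to one private anonymous page and then checks that a forked child still sees the parent's data. The function name child_keeps_data_after_keeponfork() and the 0x5A fill pattern are invented for this example, and the fallback #defines are assumptions for libc headers that predate the flags. It returns 1 if the child saw the original bytes, 0 if they were wiped, and -1 if a call failed (for example, on a kernel before 4.14).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* expose madvise() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18      /* asm-generic value; assumption for older libc headers */
#endif
#ifndef MADV_KEEPONFORK
#define MADV_KEEPONFORK 19      /* asm-generic value; assumption for older libc headers */
#endif

/* Hypothetical demo; see the lead-in for the meaning of the return value. */
int child_keeps_data_after_keeponfork(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0x5A, len);

    /* Request wipe-on-fork, then cancel it again before forking. */
    if (madvise(buf, len, MADV_WIPEONFORK) == -1 ||
        madvise(buf, len, MADV_KEEPONFORK) == -1) {
        munmap(buf, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        munmap(buf, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: exit 0 if the parent's bytes are still visible. */
        _exit((buf[0] == 0x5A && buf[len - 1] == 0x5A) ? 0 : 1);
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(buf, len);

    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```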

2017-09-14 19:10:34

by Colm MacCárthaigh

Subject: Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

On Thu, Sep 14, 2017 at 12:05 PM, Rik van Riel <[email protected]> wrote:
> v2: implement the improvements suggested by Colm, and add
> Colm's text to the fork.2 man page
> (Colm, I have added a signed-off-by in your name - is that ok?)

Yep, that's ok! Whole thing LGTM.

--
Colm

by Michael Kerrisk (man-pages)

Subject: Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

Hello Rik, (and Colm)

On 09/14/2017 09:05 PM, Rik van Riel wrote:
> v2: implement the improvements suggested by Colm, and add
> Colm's text to the fork.2 man page
> (Colm, I have added a signed-off-by in your name - is that ok?)
>
> Add MADV_WIPEONFORK and MADV_KEEPONFORK documentation to
> madvise.2. The new functionality was recently merged by
> Linus, and should be in the 4.14 kernel.
>
> While documenting what EINVAL means for MADV_WIPEONFORK,
> I realized that MADV_FREE has the same thing going on,
> so I documented EINVAL for both in the ERRORS section.
>
> This patch documents the following kernel commit:
>
> commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
> Author: Rik van Riel <[email protected]>
> Date: Wed Sep 6 16:25:15 2017 -0700
>
> mm,fork: introduce MADV_WIPEONFORK

Thanks. I applied this, and tweaked the madvise.2 text a little, to
read as follows (please let me know if I messed anything up):

MADV_WIPEONFORK (since Linux 4.14)
Present the child process with zero-filled memory in this
range after a fork(2). This is useful in forking servers
in order to ensure that sensitive per-process data (for
example, PRNG seeds, cryptographic secrets, and so on) is
not handed to child processes.

The MADV_WIPEONFORK operation can be applied only to private
anonymous pages (see mmap(2)).
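
For the PRNG-seed case, one possible pattern (not something the man page text prescribes) is to keep a marker byte in a wipe-on-fork page: if the marker reads back as zero, a fork has happened and the process can reseed. The C sketch below is an illustration under that assumption. All names in it (fork_marker, fork_marker_init(), fork_marker_was_wiped(), demo_child_detects_fork()) are hypothetical, and the fallback #define is an assumption for libc headers that predate the flag.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* expose madvise() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_WIPEONFORK
#define MADV_WIPEONFORK 18      /* asm-generic value; assumption for older libc headers */
#endif

/* Hypothetical marker page: byte 0 is nonzero until a fork wipes it. */
static unsigned char *fork_marker;

/* Map one private anonymous page, mark it wipe-on-fork, and arm the marker. */
int fork_marker_init(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (madvise(p, len, MADV_WIPEONFORK) == -1) {
        munmap(p, len);
        return -1;              /* for example, a kernel before 4.14 */
    }
    fork_marker = p;
    fork_marker[0] = 1;
    return 0;
}

/* Return 1 if the marker was wiped since the last call, then re-arm it. */
int fork_marker_was_wiped(void)
{
    if (fork_marker[0] != 0)
        return 0;
    fork_marker[0] = 1;
    return 1;
}

/*
 * Demo: returns 1 if the child detects the fork and the parent does not,
 * 0 otherwise, and -1 if the setup, fork, or wait failed.
 */
int demo_child_detects_fork(void)
{
    if (fork_marker_init() == -1)
        return -1;
    if (fork_marker_was_wiped())
        return 0;               /* no fork has happened yet */

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(fork_marker_was_wiped() ? 0 : 1);  /* a real child would reseed here */

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 && !fork_marker_was_wiped();
}
```

In a real program, the check would sit at the top of whatever function hands out random bytes, so that a child reseeds before producing any output.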

Thanks,

Michael


> Signed-off-by: Rik van Riel <[email protected]>
> Signed-off-by: Colm MacCárthaigh <[email protected]>
>
> diff --git a/man2/fork.2 b/man2/fork.2
> index b5af58ca08c0..b11e750e3876 100644
> --- a/man2/fork.2
> +++ b/man2/fork.2
> @@ -140,6 +140,12 @@ Memory mappings that have been marked with the
> flag are not inherited across a
> .BR fork ().
> .IP *
> +Memory in mappings that have been marked with the
> +.BR madvise (2)
> +.B MADV_WIPEONFORK
> +flag is zeroed in the child after a
> +.BR fork ().
> +.IP *
> The termination signal of the child is always
> .B SIGCHLD
> (see
> diff --git a/man2/madvise.2 b/man2/madvise.2
> index dfb31b63dba3..bb0ac469c509 100644
> --- a/man2/madvise.2
> +++ b/man2/madvise.2
> @@ -31,6 +31,9 @@
> .\" 2010-06-19, Andi Kleen, Add documentation of MADV_SOFT_OFFLINE.
> .\" 2011-09-18, Doug Goldstein <[email protected]>
> .\" Document MADV_HUGEPAGE and MADV_NOHUGEPAGE
> +.\" 2017-09-14, Rik van Riel <[email protected]>
> +.\" Document MADV_WIPEONFORK and MADV_KEEPONFORK
> +.\" commit d2cd9ede6e193dd7d88b6d27399e96229a551b19
> .\"
> .TH MADVISE 2 2017-07-13 "Linux" "Linux Programmer's Manual"
> .SH NAME
> @@ -405,6 +408,22 @@ can be applied only to private anonymous pages (see
> .BR mmap (2)).
> On a swapless system, freeing pages in a given range happens instantly,
> regardless of memory pressure.
> +.TP
> +.BR MADV_WIPEONFORK " (since Linux 4.14)"
> +Present the child process with zero-filled memory in this range after a
> +.BR fork (2).
> +This is useful for per-process data in forking servers that should be
> +re-initialized in the child process after a fork, for example PRNG seeds,
> +cryptographic secrets, etc.
> +.IP
> +The
> +.B MADV_WIPEONFORK
> +operation can only be applied to private anonymous pages (see
> +.BR mmap (2)).
> +.TP
> +.BR MADV_KEEPONFORK " (since Linux 4.14)"
> +Undo the effect of an earlier
> +.BR MADV_WIPEONFORK .
> .SH RETURN VALUE
> On success,
> .BR madvise ()
> @@ -457,6 +476,18 @@ or
> but the kernel was not configured with
> .BR CONFIG_KSM .
> .TP
> +.B EINVAL
> +.I advice
> +is
> +.BR MADV_FREE
> +or
> +.BR MADV_WIPEONFORK
> +but the specified address range includes file, Huge TLB,
> +.BR MAP_SHARED ,
> +or
> +.BR VM_PFNMAP
> +ranges.
> +.TP
> .B EIO
> (for
> .BR MADV_WILLNEED )
>


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2017-09-19 19:21:51

by Rik van Riel

Subject: Re: [patch v2] madvise.2: Add MADV_WIPEONFORK documentation

On Tue, 2017-09-19 at 21:07 +0200, Michael Kerrisk (man-pages) wrote:

> Thanks. I applied this, and tweaked the madvise.2 text a little, to
> read as follows (please let me know if I messed anything up):
>
>        MADV_WIPEONFORK (since Linux 4.14)
>               Present the child process with zero-filled memory in this
>               range after a fork(2). This is useful in forking servers
>               in order to ensure that sensitive per-process data (for
>               example, PRNG seeds, cryptographic secrets, and so on) is
>               not handed to child processes.
>
>               The MADV_WIPEONFORK operation can be applied only to private
>               anonymous pages (see mmap(2)).

That looks great. Thank you, Michael!

--
All rights reversed

