2024-03-25 16:12:45

by David Hildenbrand

Subject: [PATCH v1 0/3] mm/secretmem: one fix and one refactoring

Patch #1 fixes a GUP-fast issue, whereby we might succeed in pinning
secretmem folios. Patch #2 extends the memfd_secret selftest to cover
that case. Patch #3 removes folio_is_secretmem() and instead lets
folio_fast_pin_allowed() cover that case as well.

With this series, the reproducer (+selftests) works as expected. To
test patch #3, the gup_longterm test does exactly what we need, and
keeps on working as expected.

Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport (IBM) <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: xingwei lee <[email protected]>
Cc: yue sun <[email protected]>

David Hildenbrand (3):
mm/secretmem: fix GUP-fast succeeding on secretmem folios
selftests/memfd_secret: add vmsplice() test
mm: merge folio_is_secretmem() into folio_fast_pin_allowed()

include/linux/secretmem.h | 21 ++---------
mm/gup.c | 33 ++++++++++-------
tools/testing/selftests/mm/memfd_secret.c | 44 +++++++++++++++++++++--
3 files changed, 65 insertions(+), 33 deletions(-)

--
2.43.2



2024-03-25 16:14:31

by David Hildenbrand

Subject: [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test

Let's add a simple reproducer for a scenario where GUP-fast could succeed
on secretmem folios, making vmsplice() succeed instead of failing. It is
based on a reproducer [1] by Miklos Szeredi.

Perform the ftruncate() only once, and check the return value.

For some reason, vmsplice() reliably fails (making the test succeed) when
we move the test_vmsplice() call after test_process_vm_read() /
test_ptrace(). Properly cleaning up in test_remote_access(), which is not
part of this change, won't change that behavior. Therefore, run the
vmsplice() test first for now -- something is a bit off once we involve
fork().

[1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com

Signed-off-by: David Hildenbrand <[email protected]>
---
tools/testing/selftests/mm/memfd_secret.c | 44 +++++++++++++++++++++--
1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
index 9b298f6a04b3..0acbdcf8230e 100644
--- a/tools/testing/selftests/mm/memfd_secret.c
+++ b/tools/testing/selftests/mm/memfd_secret.c
@@ -20,6 +20,7 @@
#include <unistd.h>
#include <errno.h>
#include <stdio.h>
+#include <fcntl.h>

#include "../kselftest.h"

@@ -83,6 +84,43 @@ static void test_mlock_limit(int fd)
pass("mlock limit is respected\n");
}

+static void test_vmsplice(int fd)
+{
+	ssize_t transferred;
+	struct iovec iov;
+	int pipefd[2];
+	char *mem;
+
+	if (pipe(pipefd)) {
+		fail("pipe failed: %s\n", strerror(errno));
+		return;
+	}
+
+	mem = mmap(NULL, page_size, prot, mode, fd, 0);
+	if (mem == MAP_FAILED) {
+		fail("Unable to mmap secret memory\n");
+		goto close_pipe;
+	}
+
+	/*
+	 * vmsplice() may use GUP-fast, which must also fail. Prefault the
+	 * page table, so GUP-fast could find it.
+	 */
+	memset(mem, PATTERN, page_size);
+
+	iov.iov_base = mem;
+	iov.iov_len = page_size;
+	transferred = vmsplice(pipefd[1], &iov, 1, 0);
+
+	ksft_test_result(transferred < 0 && errno == EFAULT,
+			 "vmsplice is blocked as expected\n");
+
+	munmap(mem, page_size);
+close_pipe:
+	close(pipefd[0]);
+	close(pipefd[1]);
+}
+
static void try_process_vm_read(int fd, int pipefd[2])
{
struct iovec liov, riov;
@@ -187,7 +225,6 @@ static void test_remote_access(int fd, const char *name,
return;
}

- ftruncate(fd, page_size);
memset(mem, PATTERN, page_size);

if (write(pipefd[1], &mem, sizeof(mem)) < 0) {
@@ -258,7 +295,7 @@ static void prepare(void)
strerror(errno));
}

-#define NUM_TESTS 4
+#define NUM_TESTS 5

int main(int argc, char *argv[])
{
@@ -277,9 +314,12 @@ int main(int argc, char *argv[])
ksft_exit_fail_msg("memfd_secret failed: %s\n",
strerror(errno));
}
+ if (ftruncate(fd, page_size))
+ ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));

test_mlock_limit(fd);
test_file_apis(fd);
+ test_vmsplice(fd);
test_process_vm_read(fd);
test_ptrace(fd);

--
2.43.2
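
For readers who want to try the scenario outside the kselftest harness, the
following is a rough standalone sketch of what the test above does. The
helper names open_secretmem() and vmsplice_one_page() are made up for this
illustration and are not part of the patch; the sketch assumes that the
installed headers define SYS_memfd_secret and that the mapping flags match
the selftest's (assumed here to be PROT_READ | PROT_WRITE and MAP_SHARED).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a secretmem fd of the given size.
 * Returns the fd, or -1 with errno set (ENOSYS if the headers or the
 * kernel do not know memfd_secret()).
 */
static int open_secretmem(size_t size)
{
#ifdef SYS_memfd_secret
	int fd = (int)syscall(SYS_memfd_secret, 0);
	int saved_errno;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)size)) {
		saved_errno = errno;
		close(fd);
		errno = saved_errno;
		return -1;
	}
	return fd;
#else
	(void)size;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical helper: map one page of fd, write to it so the page table
 * is populated, then try to vmsplice() that page into a fresh pipe.
 * Returns the vmsplice() result (-1 with errno set on failure), or -2 if
 * the pipe or the mapping could not be set up.
 */
static ssize_t vmsplice_one_page(int fd)
{
	long page_size = sysconf(_SC_PAGESIZE);
	struct iovec iov;
	int pipefd[2];
	int saved_errno;
	ssize_t ret;
	char *mem;

	if (pipe(pipefd))
		return -2;

	mem = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -2;
	}

	/* Prefault, so a fast GUP path could find the page. */
	memset(mem, 0x55, page_size);

	iov.iov_base = mem;
	iov.iov_len = page_size;
	ret = vmsplice(pipefd[1], &iov, 1, 0);
	saved_errno = errno;

	munmap(mem, page_size);
	close(pipefd[0]);
	close(pipefd[1]);
	errno = saved_errno;
	return ret;
}
```

On a secretmem fd, the expected outcome is -1 with errno == EFAULT; on an
ordinary shared file mapping, the call simply transfers the page.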


2024-03-26 06:18:49

by Mike Rapoport

Subject: Re: [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test

Hi David,

On Mon, Mar 25, 2024 at 02:41:13PM +0100, David Hildenbrand wrote:
> Let's add a simple reproducer for a scenario where GUP-fast could succeed
> on secretmem folios, making vmsplice() succeed instead of failing. The
> reproducer is based on a reproducer [1] by Miklos Szeredi.
>
> Perform the ftruncate() only once, and check the return value.
>
> For some reason, vmsplice() reliably fails (making the test succeed) when
> we move the test_vmsplice() call after test_process_vm_read() /
> test_ptrace().

That's because the ftruncate() call was in test_remote_access() and you need
it to mmap secretmem.

> Properly cleaning up in test_remote_access(), which is not
> part of this change, won't change that behavior. Therefore, run the
> vmsplice() test for now first -- something is a bit off once we involve
> fork().
>
> [1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> tools/testing/selftests/mm/memfd_secret.c | 44 +++++++++++++++++++++--
> 1 file changed, 42 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
> index 9b298f6a04b3..0acbdcf8230e 100644
> --- a/tools/testing/selftests/mm/memfd_secret.c
> +++ b/tools/testing/selftests/mm/memfd_secret.c
> @@ -20,6 +20,7 @@
> #include <unistd.h>
> #include <errno.h>
> #include <stdio.h>
> +#include <fcntl.h>
>
> #include "../kselftest.h"
>
> @@ -83,6 +84,43 @@ static void test_mlock_limit(int fd)
> pass("mlock limit is respected\n");
> }
>
> +static void test_vmsplice(int fd)
> +{
> + ssize_t transferred;
> + struct iovec iov;
> + int pipefd[2];
> + char *mem;
> +
> + if (pipe(pipefd)) {
> + fail("pipe failed: %s\n", strerror(errno));
> + return;
> + }
> +
> + mem = mmap(NULL, page_size, prot, mode, fd, 0);
> + if (mem == MAP_FAILED) {
> + fail("Unable to mmap secret memory\n");
> + goto close_pipe;
> + }
> +
> + /*
> + * vmsplice() may use GUP-fast, which must also fail. Prefault the
> + * page table, so GUP-fast could find it.
> + */
> + memset(mem, PATTERN, page_size);
> +
> + iov.iov_base = mem;
> + iov.iov_len = page_size;
> + transferred = vmsplice(pipefd[1], &iov, 1, 0);
> +
> + ksft_test_result(transferred < 0 && errno == EFAULT,
> + "vmsplice is blocked as expected\n");

The same message will be printed on success and on failure.

I think

	if (transferred < 0 && errno == EFAULT)
		pass("vmsplice is blocked as expected\n");
	else
		fail("vmsplice: unexpected memory access\n");

is clearer than feeding different strings to ksft_test_result().
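
To illustrate the difference, here is a small sketch that formats the
resulting report lines. format_tap(), report_single() and report_split() are
hypothetical stand-ins written for this example; they are not the kselftest
helpers, and the test number is passed in explicitly.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical formatter: "ok N desc" or "not ok N desc". */
static void format_tap(char *out, size_t len, bool ok, unsigned int nr,
		       const char *desc)
{
	snprintf(out, len, "%s %u %s", ok ? "ok" : "not ok", nr, desc);
}

/* One description for both outcomes, as in the posted patch. */
static void report_single(char *out, size_t len, bool ok, unsigned int nr)
{
	format_tap(out, len, ok, nr, "vmsplice is blocked as expected");
}

/* Separate descriptions for success and failure, as suggested above. */
static void report_split(char *out, size_t len, bool ok, unsigned int nr)
{
	format_tap(out, len, ok, nr,
		   ok ? "vmsplice is blocked as expected"
		      : "vmsplice: unexpected memory access");
}
```

With a single description, a failing run reads "not ok 3 vmsplice is blocked
as expected", which contradicts itself; with split descriptions, the failure
line says what actually went wrong.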

Other than that

Reviewed-by: Mike Rapoport (IBM) <[email protected]>

> +
> + munmap(mem, page_size);
> +close_pipe:
> + close(pipefd[0]);
> + close(pipefd[1]);
> +}

--
Sincerely yours,
Mike.

2024-03-26 12:32:35

by David Hildenbrand

Subject: Re: [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test

On 26.03.24 07:17, Mike Rapoport wrote:
> Hi David,
>
> On Mon, Mar 25, 2024 at 02:41:13PM +0100, David Hildenbrand wrote:
>> Let's add a simple reproducer for a scenario where GUP-fast could succeed
>> on secretmem folios, making vmsplice() succeed instead of failing. The
>> reproducer is based on a reproducer [1] by Miklos Szeredi.
>>
>> Perform the ftruncate() only once, and check the return value.
>>
>> For some reason, vmsplice() reliably fails (making the test succeed) when
>> we move the test_vmsplice() call after test_process_vm_read() /
>> test_ptrace().
>
> That's because ftruncate() call was in test_remote_access() and you need it
> to mmap secretmem.

I don't think that's the reason. I reshuffled the code a couple of times
without luck.

And in fact, even executing the vmsplice() test twice results in the
second iteration succeeding on an old kernel (6.7.4-200.fc39.x86_64).

ok 1 mlock limit is respected
ok 2 file IO is blocked as expected
not ok 3 vmsplice is blocked as expected
ok 4 vmsplice is blocked as expected
ok 5 process_vm_read is blocked as expected
ok 6 ptrace is blocked as expected

Note that the mmap()+memset() succeeded. So the secretmem pages should be in the page table.


Even weirder, if I simply mmap()+memset()+munmap() secretmem *once* beforehand, the test passes:

diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
index 0acbdcf8230e..7a973ec6ac8f 100644
--- a/tools/testing/selftests/mm/memfd_secret.c
+++ b/tools/testing/selftests/mm/memfd_secret.c
@@ -96,6 +96,14 @@ static void test_vmsplice(int fd)
return;
}

+ mem = mmap(NULL, page_size, prot, mode, fd, 0);
+ if (mem == MAP_FAILED) {
+ fail("Unable to mmap secret memory\n");
+ goto close_pipe;
+ }
+ memset(mem, PATTERN, page_size);
+ munmap(mem, page_size);
+
mem = mmap(NULL, page_size, prot, mode, fd, 0);
if (mem == MAP_FAILED) {
fail("Unable to mmap secret memory\n");

ok 1 mlock limit is respected
ok 2 file IO is blocked as expected
ok 3 vmsplice is blocked as expected
ok 4 process_vm_read is blocked as expected
ok 5 ptrace is blocked as expected
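
The warm-up step from the diff above can be factored into a small helper,
which makes it easier to toggle while experimenting. This is only a sketch:
touch_and_unmap() is a name invented here, and the mapping flags are assumed
to match the selftest's prot/mode values (PROT_READ | PROT_WRITE, MAP_SHARED).

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of fd, fill them with pattern so the
 * pages get faulted in, and unmap again.
 * Returns 0 on success, -1 if mmap() or munmap() failed.
 */
static int touch_and_unmap(int fd, size_t len, int pattern)
{
	char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	if (mem == MAP_FAILED)
		return -1;

	memset(mem, pattern, len);
	return munmap(mem, len);
}
```

Calling it once on the secretmem fd before test_vmsplice() corresponds to the
experiment shown above.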


... could it be that munmap()+mmap() will end up turning these pages into LRU pages?

I suspect that is happening -- likely because VM_LOCKED is involved, since
on the patched kernel, I see the following:

ok 1 mlock limit is respected
ok 2 file IO is blocked as expected
ok 3 vmsplice is blocked as expected
not ok 4 vmsplice is blocked as expected
ok 5 process_vm_read is blocked as expected
ok 6 ptrace is blocked as expected


At this point, I think we should remove the LRU test for secretmem.

I'll adjust patch #1 and extend this test to cover that case as well.

>
>> Properly cleaning up in test_remote_access(), which is not
>> part of this change, won't change that behavior. Therefore, run the
>> vmsplice() test for now first -- something is a bit off once we involve
>> fork().
>>
>> [1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> tools/testing/selftests/mm/memfd_secret.c | 44 +++++++++++++++++++++--
>> 1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
>> index 9b298f6a04b3..0acbdcf8230e 100644
>> --- a/tools/testing/selftests/mm/memfd_secret.c
>> +++ b/tools/testing/selftests/mm/memfd_secret.c
>> @@ -20,6 +20,7 @@
>> #include <unistd.h>
>> #include <errno.h>
>> #include <stdio.h>
>> +#include <fcntl.h>
>>
>> #include "../kselftest.h"
>>
>> @@ -83,6 +84,43 @@ static void test_mlock_limit(int fd)
>> pass("mlock limit is respected\n");
>> }
>>
>> +static void test_vmsplice(int fd)
>> +{
>> + ssize_t transferred;
>> + struct iovec iov;
>> + int pipefd[2];
>> + char *mem;
>> +
>> + if (pipe(pipefd)) {
>> + fail("pipe failed: %s\n", strerror(errno));
>> + return;
>> + }
>> +
>> + mem = mmap(NULL, page_size, prot, mode, fd, 0);
>> + if (mem == MAP_FAILED) {
>> + fail("Unable to mmap secret memory\n");
>> + goto close_pipe;
>> + }
>> +
>> + /*
>> + * vmsplice() may use GUP-fast, which must also fail. Prefault the
>> + * page table, so GUP-fast could find it.
>> + */
>> + memset(mem, PATTERN, page_size);
>> +
>> + iov.iov_base = mem;
>> + iov.iov_len = page_size;
>> + transferred = vmsplice(pipefd[1], &iov, 1, 0);
>> +
>> + ksft_test_result(transferred < 0 && errno == EFAULT,
>> + "vmsplice is blocked as expected\n");
>
> The same message will be printed on success and on failure.
>
> I think
>
>	if (transferred < 0 && errno == EFAULT)
>		pass("vmsplice is blocked as expected\n");
>	else
>		fail("vmsplice: unexpected memory access\n");
>
> is clearer than feeding different strings to ksft_test_result().
>

Can do, thanks!

--
Cheers,

David / dhildenb


2024-03-26 13:12:11

by David Hildenbrand

Subject: Re: [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test

On 26.03.24 13:32, David Hildenbrand wrote:
> On 26.03.24 07:17, Mike Rapoport wrote:
>> Hi David,
>>
>> On Mon, Mar 25, 2024 at 02:41:13PM +0100, David Hildenbrand wrote:
>>> Let's add a simple reproducer for a scenario where GUP-fast could succeed
>>> on secretmem folios, making vmsplice() succeed instead of failing. The
>>> reproducer is based on a reproducer [1] by Miklos Szeredi.
>>>
>>> Perform the ftruncate() only once, and check the return value.
>>>
>>> For some reason, vmsplice() reliably fails (making the test succeed) when
>>> we move the test_vmsplice() call after test_process_vm_read() /
>>> test_ptrace().
>>
>> That's because ftruncate() call was in test_remote_access() and you need it
>> to mmap secretmem.
>
> I don't think that's the reason. I reshuffled the code a couple of times
> without luck.
>
> And in fact, even executing the vmsplice() test twice results in the
> second iteration succeeding on an old kernel (6.7.4-200.fc39.x86_64).
>
> ok 1 mlock limit is respected
> ok 2 file IO is blocked as expected
> not ok 3 vmsplice is blocked as expected
> ok 4 vmsplice is blocked as expected
> ok 5 process_vm_read is blocked as expected
> ok 6 ptrace is blocked as expected
>
> Note that the mmap()+memset() succeeded. So the secretmem pages should be in the page table.
>
>
> Even weirder, if I simply mmap()+memset()+munmap() secretmem *once*, the test passes
>
> diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
> index 0acbdcf8230e..7a973ec6ac8f 100644
> --- a/tools/testing/selftests/mm/memfd_secret.c
> +++ b/tools/testing/selftests/mm/memfd_secret.c
> @@ -96,6 +96,14 @@ static void test_vmsplice(int fd)
> return;
> }
>
> + mem = mmap(NULL, page_size, prot, mode, fd, 0);
> + if (mem == MAP_FAILED) {
> + fail("Unable to mmap secret memory\n");
> + goto close_pipe;
> + }
> + memset(mem, PATTERN, page_size);
> + munmap(mem, page_size);
> +
> mem = mmap(NULL, page_size, prot, mode, fd, 0);
> if (mem == MAP_FAILED) {
> fail("Unable to mmap secret memory\n");
>
> ok 1 mlock limit is respected
> ok 2 file IO is blocked as expected
> ok 3 vmsplice is blocked as expected
> ok 4 process_vm_read is blocked as expected
> ok 5 ptrace is blocked as expected
>
>
> ... could it be that munmap()+mmap() will end up turning these pages into LRU pages?

Okay, now I am completely confused.

secretmem_fault() calls filemap_add_folio(), which should turn this into
an LRU page.

So secretmem pages should always be LRU pages ... unless we're batching
in the LRU cache and haven't done the lru_add_drain() yet.

And likely, the munmap() will drain the LRU cache and turn the page into
an LRU page.

Okay, I'll go check whether that's the case. If so, relying on the page
being LRU vs. not LRU in GUP-fast is unreliable and should be dropped.
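
To make the suspected ordering problem concrete, here is a userspace toy
model. It is not kernel code: every name in it (toy_folio, toy_batch,
toy_add_lru() and so on) is invented for this sketch, the batch size is
arbitrary, and it assumes a check that treats a folio not yet flagged as LRU
as "not secretmem".

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH_SIZE 15	/* arbitrary size for this model */

struct toy_folio {
	bool secret;	/* belongs to a secretmem mapping */
	bool lru;	/* LRU flag, only set once the batch is drained */
};

struct toy_batch {
	struct toy_folio *folios[TOY_BATCH_SIZE];
	size_t nr;
};

/* Drain the batch: only now do the queued folios become "LRU". */
static void toy_drain(struct toy_batch *b)
{
	for (size_t i = 0; i < b->nr; i++)
		b->folios[i]->lru = true;
	b->nr = 0;
}

/* Queue a folio for LRU addition; the flag is set lazily on drain. */
static void toy_add_lru(struct toy_batch *b, struct toy_folio *f)
{
	b->folios[b->nr++] = f;
	if (b->nr == TOY_BATCH_SIZE)
		toy_drain(b);
}

/* Check with an LRU shortcut: wrong while the folio sits in the batch. */
static bool toy_is_secret_lru_shortcut(const struct toy_folio *f)
{
	if (!f->lru)
		return false;
	return f->secret;
}

/* Check that ignores the LRU flag: independent of drain timing. */
static bool toy_is_secret(const struct toy_folio *f)
{
	return f->secret;
}
```

In this model, a freshly faulted secret folio is missed by the shortcut
variant until something drains the batch, which would match a first
vmsplice() attempt succeeding and a second one, after munmap(), failing.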

--
Cheers,

David / dhildenb