6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
kernel}_str helpers") introduced a subtle bug where
bpf_probe_read_user_str() would potentially copy a few extra bytes after
the NUL terminator.
This issue is particularly nefarious when strings are used as map keys,
as seemingly identical strings can occupy multiple entries in a map.
This patchset fixes the issue and introduces a selftest to prevent
future regressions.
v4 -> v5:
* don't read potentially uninitialized memory
v3 -> v4:
* directly pass userspace pointer to prog
* test more strings of different length
v2 -> v3:
* set pid filter before attaching prog in selftest
* use long instead of int as bpf_probe_read_user_str() retval
* style changes
v1 -> v2:
* add Fixes: tag
* add selftest
Daniel Xu (2):
lib/strncpy_from_user.c: Don't overcopy bytes after NUL terminator
selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes
after NUL
lib/strncpy_from_user.c | 9 ++-
.../bpf/prog_tests/probe_read_user_str.c | 71 +++++++++++++++++++
.../bpf/progs/test_probe_read_user_str.c | 25 +++++++
3 files changed, 100 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/probe_read_user_str.c
create mode 100644 tools/testing/selftests/bpf/progs/test_probe_read_user_str.c
--
2.29.2
do_strncpy_from_user() may copy some extra bytes after the NUL
terminator into the destination buffer. This usually does not matter for
normal string operations. However, when BPF programs key BPF maps with
strings, this matters a lot.
A BPF program may read strings from user memory by calling the
bpf_probe_read_user_str() helper which eventually calls
do_strncpy_from_user(). The program can then key a map with the
resulting string. BPF map keys are fixed-width and string-agnostic,
meaning that map keys are treated as a set of bytes.
The issue is when do_strncpy_from_user() overcopies bytes after the NUL
terminator, it can result in seemingly identical strings occupying
multiple slots in a BPF map. This behavior is subtle and totally
unexpected by the user.
This commit has strncpy start copying a byte at a time if a NUL is
spotted.
Fixes: 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
Signed-off-by: Daniel Xu <[email protected]>
---
lib/strncpy_from_user.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
index e6d5fcc2cdf3..83180742e729 100644
--- a/lib/strncpy_from_user.c
+++ b/lib/strncpy_from_user.c
@@ -40,12 +40,11 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src,
/* Fall back to byte-at-a-time if we get a page fault */
unsafe_get_user(c, (unsigned long __user *)(src+res), byte_at_a_time);
+ if (has_zero(c, &data, &constants))
+ goto byte_at_a_time;
+
*(unsigned long *)(dst+res) = c;
- if (has_zero(c, &data, &constants)) {
- data = prep_zero_mask(c, data, &constants);
- data = create_zero_mask(data);
- return res + find_zero(data);
- }
+
res += sizeof(unsigned long);
max -= sizeof(unsigned long);
}
--
2.29.2
On Wed, Nov 11, 2020 at 2:46 PM Daniel Xu <[email protected]> wrote:
>
> 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
> kernel}_str helpers") introduced a subtle bug where
> bpf_probe_read_user_str() would potentially copy a few extra bytes after
> the NUL terminator.
>
> This issue is particularly nefarious when strings are used as map keys,
> as seemingly identical strings can occupy multiple entries in a map.
>
> This patchset fixes the issue and introduces a selftest to prevent
> future regressions.
>
> v4 -> v5:
> * don't read potentially uninitialized memory
I think the bigger problem was that it could overwrite unintended
memory. E.g., in a BPF program, if you had something like:
char my_buf[8 + 3];
char my_precious_data[5] = {1, 2, 3, 4, 5};
With the previous version you'd overwrite my_precious_data. BTW, do you
test such a scenario in the selftests you added? If not, we should have
something like this as well and validate that 1, 2, 3, 4, 5 stay intact.
>
> v3 -> v4:
> * directly pass userspace pointer to prog
> * test more strings of different length
>
> v2 -> v3:
> * set pid filter before attaching prog in selftest
> * use long instead of int as bpf_probe_read_user_str() retval
> * style changes
>
> v1 -> v2:
> * add Fixes: tag
> * add selftest
>
> Daniel Xu (2):
> lib/strncpy_from_user.c: Don't overcopy bytes after NUL terminator
> selftest/bpf: Test bpf_probe_read_user_str() strips trailing bytes
> after NUL
>
> lib/strncpy_from_user.c | 9 ++-
> .../bpf/prog_tests/probe_read_user_str.c | 71 +++++++++++++++++++
> .../bpf/progs/test_probe_read_user_str.c | 25 +++++++
> 3 files changed, 100 insertions(+), 5 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/probe_read_user_str.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_probe_read_user_str.c
>
> --
> 2.29.2
>
On Wed, Nov 11, 2020 at 2:46 PM Daniel Xu <[email protected]> wrote:
>
> do_strncpy_from_user() may copy some extra bytes after the NUL
> terminator into the destination buffer. This usually does not matter for
> normal string operations. However, when BPF programs key BPF maps with
> strings, this matters a lot.
>
> A BPF program may read strings from user memory by calling the
> bpf_probe_read_user_str() helper which eventually calls
> do_strncpy_from_user(). The program can then key a map with the
> resulting string. BPF map keys are fixed-width and string-agnostic,
> meaning that map keys are treated as a set of bytes.
>
> The issue is when do_strncpy_from_user() overcopies bytes after the NUL
> terminator, it can result in seemingly identical strings occupying
> multiple slots in a BPF map. This behavior is subtle and totally
> unexpected by the user.
>
> This commit has strncpy start copying a byte at a time if a NUL is
> spotted.
>
> Fixes: 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
> Signed-off-by: Daniel Xu <[email protected]>
> ---
This looks more immediately correct.
Acked-by: Andrii Nakryiko <[email protected]>
> lib/strncpy_from_user.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
> index e6d5fcc2cdf3..83180742e729 100644
> --- a/lib/strncpy_from_user.c
> +++ b/lib/strncpy_from_user.c
> @@ -40,12 +40,11 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src,
> /* Fall back to byte-at-a-time if we get a page fault */
> unsafe_get_user(c, (unsigned long __user *)(src+res), byte_at_a_time);
>
> + if (has_zero(c, &data, &constants))
> + goto byte_at_a_time;
> +
> *(unsigned long *)(dst+res) = c;
> - if (has_zero(c, &data, &constants)) {
> - data = prep_zero_mask(c, data, &constants);
> - data = create_zero_mask(data);
> - return res + find_zero(data);
> - }
> +
> res += sizeof(unsigned long);
> max -= sizeof(unsigned long);
> }
> --
> 2.29.2
>
Previously, bpf_probe_read_user_str() could potentially overcopy the
trailing bytes after the NUL due to how do_strncpy_from_user() does the
copy in long-sized strides. The issue has been fixed in the previous
commit.
This commit adds a selftest that ensures we don't regress
bpf_probe_read_user_str() again.
Acked-by: Song Liu <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Signed-off-by: Daniel Xu <[email protected]>
---
.../bpf/prog_tests/probe_read_user_str.c | 71 +++++++++++++++++++
.../bpf/progs/test_probe_read_user_str.c | 25 +++++++
2 files changed, 96 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/probe_read_user_str.c
create mode 100644 tools/testing/selftests/bpf/progs/test_probe_read_user_str.c
diff --git a/tools/testing/selftests/bpf/prog_tests/probe_read_user_str.c b/tools/testing/selftests/bpf/prog_tests/probe_read_user_str.c
new file mode 100644
index 000000000000..e419298132b5
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/probe_read_user_str.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "test_probe_read_user_str.skel.h"
+
+static const char str1[] = "mestring";
+static const char str2[] = "mestringalittlebigger";
+static const char str3[] = "mestringblubblubblubblubblub";
+
+static int test_one_str(struct test_probe_read_user_str *skel, const char *str,
+ size_t len)
+{
+ int err, duration = 0;
+ char buf[256];
+
+ /* Ensure bytes after string are ones */
+ memset(buf, 1, sizeof(buf));
+ memcpy(buf, str, len);
+
+ /* Give prog our userspace pointer */
+ skel->bss->user_ptr = buf;
+
+ /* Trigger tracepoint */
+ usleep(1);
+
+ /* Did helper fail? */
+ if (CHECK(skel->bss->ret < 0, "prog_ret", "prog returned: %ld\n",
+ skel->bss->ret))
+ return 1;
+
+ /* Check that string was copied correctly */
+ err = memcmp(skel->bss->buf, str, len);
+ if (CHECK(err, "memcmp", "prog copied wrong string"))
+ return 1;
+
+ /* Now check that no extra trailing bytes were copied */
+ memset(buf, 0, sizeof(buf));
+ err = memcmp(skel->bss->buf + len, buf, sizeof(buf) - len);
+ if (CHECK(err, "memcmp", "trailing bytes were not stripped"))
+ return 1;
+
+ return 0;
+}
+
+void test_probe_read_user_str(void)
+{
+ struct test_probe_read_user_str *skel;
+ int err, duration = 0;
+
+ skel = test_probe_read_user_str__open_and_load();
+ if (CHECK(!skel, "test_probe_read_user_str__open_and_load",
+ "skeleton open and load failed\n"))
+ return;
+
+ /* Give pid to bpf prog so it doesn't read from anyone else */
+ skel->bss->pid = getpid();
+
+ err = test_probe_read_user_str__attach(skel);
+ if (CHECK(err, "test_probe_read_user_str__attach",
+ "skeleton attach failed: %d\n", err))
+ goto out;
+
+ if (test_one_str(skel, str1, sizeof(str1)))
+ goto out;
+ if (test_one_str(skel, str2, sizeof(str2)))
+ goto out;
+ if (test_one_str(skel, str3, sizeof(str3)))
+ goto out;
+
+out:
+ test_probe_read_user_str__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_probe_read_user_str.c b/tools/testing/selftests/bpf/progs/test_probe_read_user_str.c
new file mode 100644
index 000000000000..3ae398b75dcd
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_probe_read_user_str.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#include <sys/types.h>
+
+pid_t pid = 0;
+long ret = 0;
+void *user_ptr = 0;
+char buf[256] = {};
+
+SEC("tracepoint/syscalls/sys_enter_nanosleep")
+int on_write(void *ctx)
+{
+ if (pid != (bpf_get_current_pid_tgid() >> 32))
+ return 0;
+
+ ret = bpf_probe_read_user_str(buf, sizeof(buf), user_ptr);
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.29.2
On Wed Nov 11, 2020 at 3:22 PM PST, Andrii Nakryiko wrote:
> On Wed, Nov 11, 2020 at 2:46 PM Daniel Xu <[email protected]> wrote:
> >
> > 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
> > kernel}_str helpers") introduced a subtle bug where
> > bpf_probe_read_user_str() would potentially copy a few extra bytes after
> > the NUL terminator.
> >
> > This issue is particularly nefarious when strings are used as map keys,
> > as seemingly identical strings can occupy multiple entries in a map.
> >
> > This patchset fixes the issue and introduces a selftest to prevent
> > future regressions.
> >
> > v4 -> v5:
> > * don't read potentially uninitialized memory
>
> I think the bigger problem was that it could overwrite unintended
> memory. E.g., in BPF program, if you had something like:
>
> char my_buf[8 + 3];
> char my_precious_data[5] = {1, 2, 3, 4, 5};
How does that happen?
The
while (max >= sizeof(unsigned long)) {
/* copy sizeof(unsigned long) bytes */
max -= sizeof(unsigned long);
}
/* copy byte at a time */
where `max` is the user supplied length should prevent that kind of
corruption, right?
[...]
On Thu, Nov 12, 2020 at 11:13 AM Daniel Xu <[email protected]> wrote:
>
> On Wed Nov 11, 2020 at 3:22 PM PST, Andrii Nakryiko wrote:
> > On Wed, Nov 11, 2020 at 2:46 PM Daniel Xu <[email protected]> wrote:
> > >
> > > 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user,
> > > kernel}_str helpers") introduced a subtle bug where
> > > bpf_probe_read_user_str() would potentially copy a few extra bytes after
> > > the NUL terminator.
> > >
> > > This issue is particularly nefarious when strings are used as map keys,
> > > as seemingly identical strings can occupy multiple entries in a map.
> > >
> > > This patchset fixes the issue and introduces a selftest to prevent
> > > future regressions.
> > >
> > > v4 -> v5:
> > > * don't read potentially uninitialized memory
> >
> > I think the bigger problem was that it could overwrite unintended
> > memory. E.g., in BPF program, if you had something like:
> >
> > char my_buf[8 + 3];
> > char my_precious_data[5] = {1, 2, 3, 4, 5};
>
> How does that happen?
>
> The
>
> while (max >= sizeof(unsigned long)) {
> /* copy sizeof(unsigned long) bytes */
>
> max -= sizeof(unsigned long);
> }
>
> /* copy byte at a time */
>
> where `max` is the user supplied length should prevent that kind of
> corruption, right?
Yes, you are right, I got confused. If the user specified the correct
max, then this would have never happened. Never mind.
>
> [...]
On Wed, Nov 11, 2020 at 02:45:54PM -0800, Daniel Xu wrote:
> do_strncpy_from_user() may copy some extra bytes after the NUL
> terminator into the destination buffer. This usually does not matter for
> normal string operations. However, when BPF programs key BPF maps with
> strings, this matters a lot.
>
> A BPF program may read strings from user memory by calling the
> bpf_probe_read_user_str() helper which eventually calls
> do_strncpy_from_user(). The program can then key a map with the
> resulting string. BPF map keys are fixed-width and string-agnostic,
> meaning that map keys are treated as a set of bytes.
>
> The issue is when do_strncpy_from_user() overcopies bytes after the NUL
> terminator, it can result in seemingly identical strings occupying
> multiple slots in a BPF map. This behavior is subtle and totally
> unexpected by the user.
>
> This commit has strncpy start copying a byte at a time if a NUL is
> spotted.
>
> Fixes: 6ae08ae3dea2 ("bpf: Add probe_read_{user, kernel} and probe_read_{user, kernel}_str helpers")
> Signed-off-by: Daniel Xu <[email protected]>
> ---
> lib/strncpy_from_user.c | 9 ++++-----
> 1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/lib/strncpy_from_user.c b/lib/strncpy_from_user.c
> index e6d5fcc2cdf3..83180742e729 100644
> --- a/lib/strncpy_from_user.c
> +++ b/lib/strncpy_from_user.c
> @@ -40,12 +40,11 @@ static inline long do_strncpy_from_user(char *dst, const char __user *src,
> /* Fall back to byte-at-a-time if we get a page fault */
> unsafe_get_user(c, (unsigned long __user *)(src+res), byte_at_a_time);
>
> + if (has_zero(c, &data, &constants))
> + goto byte_at_a_time;
> +
> *(unsigned long *)(dst+res) = c;
> - if (has_zero(c, &data, &constants)) {
> - data = prep_zero_mask(c, data, &constants);
> - data = create_zero_mask(data);
> - return res + find_zero(data);
> - }
> +
> res += sizeof(unsigned long);
> max -= sizeof(unsigned long);
> }
The fix looks good to me. It's indeed better than v4 approach.
Linus,
I think you might have an opinion about it.
Please see commit log for the reason we need this fix.
On Fri, Nov 13, 2020 at 9:03 AM Alexei Starovoitov
<[email protected]> wrote:
>
> Linus,
> I think you might have an opinion about it.
> Please see commit log for the reason we need this fix.
Why is BPF doing this?
The thing is, if you care about the data after the strncpy, you should
be clearing it yourself.
The kernel "strncpy_from_user()" is *NOT* the same as "strncpy()",
despite the name. It never has been, and it never will be. Just the
return value being different should have given it away.
In particular, "strncpy()" is documented to zero-pad the end of the
area. strncpy_from_user() in contrast, is documented to NOT do that.
You cannot - and must not - depend on it.
If you want the zero-padding, you need to do it yourself. We're not
slowing down strncpy_from_user() because you want it, because NOBODY
ELSE cares, and you're depending on semantics that do not exist, and
have never existed.
So if you want padding, you do something like
long strncpy_from_user_pad(char *dst, const char __user *src, long count)
{
long res = strncpy_from_user(dst, src, count);
if (res >= 0)
memset(dst+res, 0, count-res);
return res;
}
because BPF is doing things wrong as-is, and the problem is very much
that BPF is relying on undefined data *after* the string.
And again - we're not slowing down the good cases just because BPF
depends on bad behavior.
You should feel bad.
Linus
On Fri, Nov 13, 2020 at 10:08:02AM -0800, Linus Torvalds wrote:
> On Fri, Nov 13, 2020 at 9:03 AM Alexei Starovoitov
> <[email protected]> wrote:
> >
> > Linus,
> > I think you might have an opinion about it.
> > Please see commit log for the reason we need this fix.
>
> Why is BPF doing this?
>
> The thing is, if you care about the data after the strncpy, you should
> be clearing it yourself.
>
> The kernel "strncpy_from_user()" is *NOT* the same as "strncpy()",
> despite the name. It never has been, and it never will be. Just the
> return value being different should have given it away.
>
> In particular, "strncpy()" is documented to zero-pad the end of the
> area. strncpy_from_user() in contrast, is documented to NOT do that.
> You cannot - and must not - depend on it.
>
> If you want the zero-padding, you need to do it yourself. We're not
> slowing down strncpy_from_user() because you want it, because NOBODY
> ELSE cares, and you're depending on semantics that do not exist, and
> have never existed.
>
> So if you want padding, you do something like
>
> long strncpy_from_user_pad(char *dst, const char __user *src, long count)
> {
> long res = strncpy_from_user(dst, src, count);
> if (res >= 0)
> memset(dst+res, 0, count-res);
> return res;
> }
>
> because BPF is doing things wrong as-is, and the problem is very much
> that BPF is relying on undefined data *after* the string.
>
> And again - we're not slowing down the good cases just because BPF
> depends on bad behavior.
You misunderstood.
BPF side does not depend on zero padding.
The destination buffer was already initialized with zeros before the call.
What BPF didn't expect is strncpy_from_user() copying extra garbage after the NUL byte.
I bet some kernel routine would be equally surprised by such behavior.
Hence I still think the fix belongs in this function.
I think the performance degradation here is purely theoretical.
The v4 approach preserves performance. It wasn't switching to byte_at_a_time:
https://patchwork.kernel.org/project/netdevbpf/patch/4ff12d0c19de63e7172d25922adfb83ae7c8691f.1604620776.git.dxu@dxuuu.xyz/
but it caused KASAN splats, since kernel usage of strncpy_from_user()
doesn't init dest array unlike bpf does.
On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov
<[email protected]> wrote:
>
> You misunderstood.
> BPF side does not depend on zero padding.
> The destination buffer was already initialized with zeros before the call.
> What BPF didn't expect is strncpy_from_user() copying extra garbage after the NUL byte.
BPF made the wrong expectation.
Those bytes are not defined, and it's faster the way it is written.
Nobody else cares.
BPF needs to fix its usage. It really is that simple.
strncpy_from_user() is one of the hottest functions in the whole
kernel (under certain not very uncommon loads), and it's been
optimized for performance.
You told it that the destination buffer was some amount of bytes, and
strncpy_from_user() will use up to that maximum number of bytes.
That's the only guarantee you have - it won't write _past_ the buffer
you gave it.
The fact that you then use the string not as a string, but as
something else, that's why *you* need to change your code.
Linus
On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov
<[email protected]> wrote:
>
> The destination buffer was already initialized with zeros before the call.
Side note: this is another example of you using the interface incorrectly.
You should *not* initialize the buffer with zeroes. That's just extra
work. One of the points of the strncpy_from_user() interface is that
it is *not* the incredibly broken garbage that "strncpy()" is.
strncpy_from_user() returns the size of the resulting string,
*EXACTLY* so that people who care can then use that information
directly and efficiently.
Most of the time it's to avoid a strlen() on the result (and check for
overflow), of course, but the other use-case is exactly that "maybe I
need to pad out the result", so that you don't need to initialize the
buffer beforehand.
I'm not sure exactly which strncpy_from_user() user you have issues
with, but I did a
git grep strncpy_from_user -- '*bpf*'
and several of them look quite questionable.
All of the ones in kernel/bpf/syscall.c seem to handle overflow
incorrectly, for example, with a silent truncation instead of error.
Maybe that's fine, but it's questionable.
And the bpf_trace_copy_string() thing doesn't check the result at all,
so the end result may not be NUL-terminated. Maybe that's ok. I didn't
check the call chain.
Linus
On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov
<[email protected]> wrote:
>
> The v4 approach preserves performance. It wasn't switching to byte_at_a_time:
That v4 looks better, but still pointless.
But it might be acceptable, with that final
*out = (*out & ~mask) | (c & mask);
just replaced with
*out = c & mask;
which still writes past the end, but now it only writes zeroes.
But the only reason for that to be done is if you have exposed the
destination buffer to another thread before (and you zeroed it before
exposing it), and you want to make sure that any concurrent reader can
never see anything past the end of the string.
Again - the *only* case that could possibly matter is when you
pre-zeroed the buffer, because if you didn't, then a concurrent reader
would see random garbage *anyway*, particularly since there is no SMP
memory ordering imposed with the strncpy. So nothing but "pre-zeroed"
makes any possible sense, which means that the whole "(*out & ~mask)"
in that v4 patch is completely and utterly meaningless. There's
absolutely zero reason to try to preserve any old data.
In other words, you have two cases:
(a) no threaded and unlocked accesses to the resulting string
(b) you _do_ have concurrent threaded accesses to the string and no
locking (really? That's seriously questionable),
If you have case (a), then the only correct thing to do is to
explicitly pad afterwards. It's optimal, and doesn't make any
assumptions about implementation of strncpy_from_user().
If you really have that case (b), and you absolutely require that the
filling be done without exposing any temporary garbage, and thus the
"pad afterwards" doesn't work, then you are doing something seriously
odd.
But in that seriously odd (b) case, the _only_ possibly valid thing
you can do is to pre-zero the buffer, since strncpy_from_user()
doesn't even imply any memory ordering in its accesses, so any
concurrent reader by definition will see a randomly ordered partial
string being copied. That strikes me as completely insane, but at
least a careful reader could see a valid partial string being possibly
in the process of being built up. But again, that use-case is only
possible if the buffer is pre-zeroed, so doing that "(*out & ~mask)"
cannot be relevant or sane.
If you really do have that (b) case, then I'd accept that modified v4
patch, together with an absolutely *huge* comment both in
strncpy_from_user() and very much at the call-site, talking about that
non-locked concurrent access to the destination buffer.
Linus
On Fri, Nov 13, 2020 at 12:10:57PM -0800, Linus Torvalds wrote:
> On Fri, Nov 13, 2020 at 11:17 AM Alexei Starovoitov
> <[email protected]> wrote:
> >
> > The v4 approach preserves performance. It wasn't switching to byte_at_a_time:
>
> That v4 looks better, but still pointless.
>
> But it might be acceptable, with that final
>
> *out = (*out & ~mask) | (c & mask);
>
> just replaced with
>
> *out = c & mask;
>
> which still writes past the end, but now it only writes zeroes.
>
> But the only reason for that to be done is if you have exposed the
> destination buffer to another thread before (and you zeroed it before
> exposing it), and you want to make sure that any concurrent reader can
> never see anything past the end of the string.
>
> Again - the *only* case that could possibly matter is when you
> pre-zeroed the buffer, because if you didn't, then a concurrent reader
> would see random garbage *anyway*, particularly since there is no SMP
> memory ordering imposed with the strncpy. So nothing but "pre-zeroed"
> makes any possible sense, which means that the whole "(*out & ~mask)"
> in that v4 patch is completely and utterly meaningless. There's
> absolutely zero reason to try to preserve any old data.
>
> In other words, you have two cases:
>
> (a) no threaded and unlocked accesses to the resulting string
>
> (b) you _do_ have concurrent threaded accesses to the string and no
> locking (really? That's seriously questionable),
>
> If you have case (a), then the only correct thing to do is to
> explicitly pad afterwards. It's optimal, and doesn't make any
> assumptions about implementation of strncpy_from_user().
(a) is the only case.
There is no concurrent access to dest.
Theoretically it's possible for two bpf progs on different cpus
to write into the same dest, but it's completely racy and buggy
for other reasons.
I've looked through most of the kernel code where strncpy_from_user()
is used and couldn't find a case where dest is not used as a string.
In particular I was worried about code like:
struct foo {
...
char name[64];
...
} *f;
f = kcalloc();
...
ret = strncpy_from_user(f->name, user_addr, sizeof(f->name));
if (ret <= 0)
goto ...;
f->name[ret] = 0;
and then the whole *f would be passed to a hash function
that will go over sizeof(struct foo) bytes assuming
that strncpy_from_user() didn't add any garbage.
The extra zeroing:
f->name[ret] = 0;
didn't save it from the garbage in the "name".
I can convince myself that your new definition of strncpy_from_user:
"
You told it that the destination buffer was some amount of bytes, and
strncpy_from_user() will use up to that maximum number of bytes.
That's the only guarantee you have - it won't write _past_ the buffer
you gave it.
"
makes sense from the performance point of view.
But I think if glibc's strncpy() did something like this it would
probably have caused a lot of pain for user space.
The hash element example above is typical bpf usage.
One real bpf prog was converted as a test:
tools/testing/selftests/bpf/progs/pyperf.h
There it's populating:
typedef struct {
char name[FUNCTION_NAME_LEN];
char file[FILE_NAME_LEN];
} Symbol;
with two calls:
bpf_probe_read_user_str(&symbol->name, sizeof(symbol->name),
frame->co_name + pidData->offsets.String_data);
These fetch the function name inside the python interpreter.
The 'len' result is ignored by bpf prog.
And later the whole 'Symbol' is passed into a hash map.
The kernel code doesn't seem to be doing anything like this, so
it's fine to adopt your definition of strncpy_from_user().
The garbage copying part can be cleared on the BPF side.
It's easy enough to do in bpf_probe_read_user_str().
We can just zero up to sizeof(long) bytes there.
On Fri, Nov 13, 2020 at 12:57 PM Alexei Starovoitov
<[email protected]> wrote:
>
> (a) is the only case.
Ok, good.
The (b) case is certainly valid in theory (and we might even
traditionaly have had something like that for things like ->comm[]
accesses, although I think we got rid of it).
But the (b) case is _so_ hard to think about and so easy to get wrong
- readers have to be very careful to only read each byte of the source
exactly once - that it's much much better to try to avoid it.
> But I think if glibc's strncpy() did something like this it would
> probably caused a lot of pain for user space.
Oh, absolutely. The standard strncpy() function has some very strict
behavior issues, including that zero-padding of the *whole*
destination buffer, which would be absolutely horrid for things like
fetching pathnames from user space (our buffer is generally close to a
page in size).
In fact, the kernel strncpy() (ie the one that doesn't copy from user)
does do the whole "pad all zeroes at the end" exactly because people
might depend on that. So the _actual_ strncpy() function conforms to
the standard use - but you generally shouldn't use it, exactly because
it's such a horrible interface. Only good for very small buffers.
> The hash element example above is typical bpf usage.
The core kernel does have one very common string hash case, but it's
for path components, and never the whole string - so it already has to
deal with the fact that the string is very much delimited in place by
not just NUL at the end, but also '/' characters etc.
So no "copy it as a string from user space, and then use it as a
block" that I'm aware of.
Linus