A couple of RFC fixes for bpf_trace_printk(). The first affects 32-bit
architectures in particular, the second is a theoretical uninitialised
variable fix.
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
James Hogan (2):
bpf: Fix bpf_trace_printk on 32-bit architectures
bpf: Initialise mod[] in bpf_trace_printk
kernel/trace/bpf_trace.c | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
--
2.13.2
In bpf_trace_printk(), the elements in mod[] are left uninitialised, but
they are then incremented to track the width of the formats. Zero
initialise the array just in case the memory contains non-zero values on
entry.
Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Signed-off-by: James Hogan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
---
When I checked (on MIPS32), the elements tended to have the value zero
anyway (does BPF zero the stack or something clever?), so this is a
purely theoretical fix.
---
kernel/trace/bpf_trace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 32dcbe1b48f2..86a52857d941 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -129,7 +129,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
u64, arg2, u64, arg3)
{
bool str_seen = false;
- int mod[3] = {};
+ int mod[3] = { 0, 0, 0 };
int fmt_cnt = 0;
u64 unsafe_addr;
char buf[64];
--
2.13.2
bpf_trace_printk() uses conditional operators to attempt to pass
different types to __trace_printk() depending on the format operators.
This doesn't work as intended on 32-bit architectures where u32 & long
are passed differently to u64, since the result of C conditional
operators follows the "usual arithmetic conversions" rules, such that
the values passed to __trace_printk() will always be u64.
For example the samples/bpf/tracex5 test printed lines like below on
MIPS, where the fd and buf have come from the u64 fd argument, and the
size from the buf argument:
dd-1176 [000] .... 1180.941542: 0x00000001: write(fd=1, buf= (null), size=6258688)
Instead of this:
dd-1217 [000] .... 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)
Work around this with an ugly hack which expands each combination of
argument types for the 3 arguments. On 64-bit kernels it is assumed that
u32, long & u64 are all passed the same way so no casting takes place
(it has apparently worked implicitly until now). On 32-bit kernels it is
assumed that long and u32 pass the same way so there are 8 combinations.
On 32-bit kernels bpf_trace_printk() increases in size but should now
work correctly. On 64-bit kernels it actually reduces in size slightly,
I presume due to removal of some of the casts (which as far as I can
tell are unnecessary for printk anyway due to the controlled nature of
the interpretation):
arch function old new delta
x86_64 bpf_trace_printk 532 412 -120
x86 bpf_trace_printk 676 1120 +444
MIPS64 bpf_trace_printk 760 612 -148
MIPS32 bpf_trace_printk 768 996 +228
Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Signed-off-by: James Hogan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: [email protected]
---
I'm open to nicer ways of fixing this.
This is tested with samples/bpf/tracex5 on MIPS32 and MIPS64. Only build
tested on x86.
---
kernel/trace/bpf_trace.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 37385193a608..32dcbe1b48f2 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -204,10 +204,28 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
fmt_cnt++;
}
- return __trace_printk(1/* fake ip will not be printed */, fmt,
- mod[0] == 2 ? arg1 : mod[0] == 1 ? (long) arg1 : (u32) arg1,
- mod[1] == 2 ? arg2 : mod[1] == 1 ? (long) arg2 : (u32) arg2,
- mod[2] == 2 ? arg3 : mod[2] == 1 ? (long) arg3 : (u32) arg3);
+ /*
+ * This is a horribly ugly hack to allow different combinations of
+ * argument types to be used, particularly on 32-bit architectures where
+ * u32 & long pass the same as one another, but differently to u64.
+ *
+ * On 64-bit architectures it is assumed u32, long & u64 pass in the
+ * same way.
+ */
+
+#define __BPFTP_P(...) __trace_printk(1/* fake ip will not be printed */, \
+ fmt, ##__VA_ARGS__)
+#define __BPFTP_1(...) ((mod[0] == 2 || __BITS_PER_LONG == 64) \
+ ? __BPFTP_P(arg1, ##__VA_ARGS__) \
+ : __BPFTP_P((long)arg1, ##__VA_ARGS__))
+#define __BPFTP_2(...) ((mod[1] == 2 || __BITS_PER_LONG == 64) \
+ ? __BPFTP_1(arg2, ##__VA_ARGS__) \
+ : __BPFTP_1((long)arg2, ##__VA_ARGS__))
+#define __BPFTP_3(...) ((mod[2] == 2 || __BITS_PER_LONG == 64) \
+ ? __BPFTP_2(arg3, ##__VA_ARGS__) \
+ : __BPFTP_2((long)arg3, ##__VA_ARGS__))
+
+ return __BPFTP_3();
}
static const struct bpf_func_proto bpf_trace_printk_proto = {
--
2.13.2
On 08/08/2017 12:25 AM, James Hogan wrote:
> In bpf_trace_printk(), the elements in mod[] are left uninitialised, but
> they are then incremented to track the width of the formats. Zero
> initialise the array just in case the memory contains non-zero values on
> entry.
>
> Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
> Signed-off-by: James Hogan <[email protected]>
> Cc: Alexei Starovoitov <[email protected]>
> Cc: Daniel Borkmann <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: [email protected]
> ---
> When I checked (on MIPS32), the elements tended to have the value zero
> anyway (does BPF zero the stack or something clever?), so this is a
> purely theoretical fix.
> ---
> kernel/trace/bpf_trace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 32dcbe1b48f2..86a52857d941 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -129,7 +129,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
> u64, arg2, u64, arg3)
> {
> bool str_seen = false;
> - int mod[3] = {};
> + int mod[3] = { 0, 0, 0 };
I'm probably missing something, but is the behavior of gcc wrt
above initializers different on mips (it zeroes just fine on x86
at least)? If yes, we'd probably need a cocci script to also check
rest of the kernel given this is used in a number of places. Hm,
could you elaborate?
> int fmt_cnt = 0;
> u64 unsafe_addr;
> char buf[64];
>
From: Daniel Borkmann <[email protected]>
Date: Tue, 08 Aug 2017 10:46:52 +0200
> On 08/08/2017 12:25 AM, James Hogan wrote:
>> In bpf_trace_printk(), the elements in mod[] are left uninitialised,
>> but
>> they are then incremented to track the width of the formats. Zero
>> initialise the array just in case the memory contains non-zero values
>> on
>> entry.
>>
>> Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call
>> bpf_trace_printk()")
>> Signed-off-by: James Hogan <[email protected]>
>> Cc: Alexei Starovoitov <[email protected]>
>> Cc: Daniel Borkmann <[email protected]>
>> Cc: Steven Rostedt <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: [email protected]
>> ---
>> When I checked (on MIPS32), the elements tended to have the value zero
>> anyway (does BPF zero the stack or something clever?), so this is a
>> purely theoretical fix.
>> ---
>> kernel/trace/bpf_trace.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index 32dcbe1b48f2..86a52857d941 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -129,7 +129,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32,
>> fmt_size, u64, arg1,
>> u64, arg2, u64, arg3)
>> {
>> bool str_seen = false;
>> - int mod[3] = {};
>> + int mod[3] = { 0, 0, 0 };
>
> I'm probably missing something, but is the behavior of gcc wrt
> above initializers different on mips (it zeroes just fine on x86
> at least)? If yes, we'd probably need a cocci script to also check
> rest of the kernel given this is used in a number of places. Hm,
> could you elaborate?
This change is not necessary at all.
An empty initializer must clear the whole object to zero.
"theoretical" fix indeed... :-(
On 8 August 2017 17:48:57 BST, David Miller <[email protected]> wrote:
>From: Daniel Borkmann <[email protected]>
>Date: Tue, 08 Aug 2017 10:46:52 +0200
>
>> On 08/08/2017 12:25 AM, James Hogan wrote:
>>> In bpf_trace_printk(), the elements in mod[] are left uninitialised,
>>> but
>>> they are then incremented to track the width of the formats. Zero
>>> initialise the array just in case the memory contains non-zero
>values
>>> on
>>> entry.
>>>
>>> Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call
>>> bpf_trace_printk()")
>>> Signed-off-by: James Hogan <[email protected]>
>>> Cc: Alexei Starovoitov <[email protected]>
>>> Cc: Daniel Borkmann <[email protected]>
>>> Cc: Steven Rostedt <[email protected]>
>>> Cc: Ingo Molnar <[email protected]>
>>> Cc: [email protected]
>>> ---
>>> When I checked (on MIPS32), the elements tended to have the value
>zero
>>> anyway (does BPF zero the stack or something clever?), so this is a
>>> purely theoretical fix.
>>> ---
>>> kernel/trace/bpf_trace.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>>> index 32dcbe1b48f2..86a52857d941 100644
>>> --- a/kernel/trace/bpf_trace.c
>>> +++ b/kernel/trace/bpf_trace.c
>>> @@ -129,7 +129,7 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32,
>>> fmt_size, u64, arg1,
>>> u64, arg2, u64, arg3)
>>> {
>>> bool str_seen = false;
>>> - int mod[3] = {};
>>> + int mod[3] = { 0, 0, 0 };
>>
>> I'm probably missing something, but is the behavior of gcc wrt
>> above initializers different on mips (it zeroes just fine on x86
>> at least)? If yes, we'd probably need a cocci script to also check
>> rest of the kernel given this is used in a number of places. Hm,
>> could you elaborate?
>
>This change is not necessary at all.
>
>An empty initializer must clear the whole object to zero.
>
>"theoretical" fix indeed... :-(
cool, i hadn't realised unmentioned elements in an initialiser are always zeroed, even when non-global/static, so had interpreted the whole array as uninitialised. learn something new every day :-) sorry for the noise.
cheers
James
From: James Hogan <[email protected]>
Date: Tue, 08 Aug 2017 22:20:05 +0100
> cool, i hadn't realised unmentioned elements in an initialiser are
> always zeroed, even when non-global/static, so had interpreted the
> whole array as uninitialised. learn something new every day :-)
> sorry for the noise.
You didn't have to know in the first place, you could have simply
compiled the code into assembler by running:
make kernel/trace/bpf_trace.s
and seen for yourself before putting all of this time and effort into
this patch and discussion.
If you don't know what the compiler does, simply look!
On Tue, Aug 08, 2017 at 02:54:33PM -0700, David Miller wrote:
> From: James Hogan <[email protected]>
> Date: Tue, 08 Aug 2017 22:20:05 +0100
>
> > cool, i hadn't realised unmentioned elements in an initialiser are
> > always zeroed, even when non-global/static, so had interpreted the
> > whole array as uninitialised. learn something new every day :-)
> > sorry for the noise.
>
> You didn't have to know in the first place, you could have simply
> compiled the code into assembler by running:
>
> make kernel/trace/bpf_trace.s
>
> and seen for yourself before putting all of this time and effort into
> this patch and discussion.
>
> If you don't know what the compiler does, simply look!
Well, thats the danger of wrongly thinking I already knew what it did in
this case. Anyway like I said, I'm sorry for the noise and wasting your
time (but please consider looking at the other patch which is certainly
a more real issue).
Thanks
James
On 08/09/2017 09:39 AM, James Hogan wrote:
[...]
> time (but please consider looking at the other patch which is certainly
> a more real issue).
Sorry for the delay, started looking into that and whether we
have some other options, I'll get back to you on this.
Thanks,
Daniel
Hi James,
On 08/09/2017 10:34 PM, Daniel Borkmann wrote:
> On 08/09/2017 09:39 AM, James Hogan wrote:
> [...]
>> time (but please consider looking at the other patch which is certainly
>> a more real issue).
>
> Sorry for the delay, started looking into that and whether we
> have some other options, I'll get back to you on this.
Could we solve this more generically (as in: originally intended) in
the sense that we don't need to trust the gcc va_list handling; I feel
this is relying on an implementation detail? Perhaps something like
below poc patch?
Thanks again,
Daniel
From 71f16544d455abb6bb82f7253c17c14d2a395e91 Mon Sep 17 00:00:00 2001
Message-Id: <71f16544d455abb6bb82f7253c17c14d2a395e91.1502469361.git.daniel@iogearbox.net>
From: Daniel Borkmann <[email protected]>
Date: Fri, 11 Aug 2017 15:56:32 +0200
Subject: [PATCH] bpf: fix bpf_trace_printk on 32 bit
Signed-off-by: Daniel Borkmann <[email protected]>
---
kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3738519..d4cb36f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -204,10 +204,33 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
fmt_cnt++;
}
- return __trace_printk(1/* fake ip will not be printed */, fmt,
- mod[0] == 2 ? arg1 : mod[0] == 1 ? (long) arg1 : (u32) arg1,
- mod[1] == 2 ? arg2 : mod[1] == 1 ? (long) arg2 : (u32) arg2,
- mod[2] == 2 ? arg3 : mod[2] == 1 ? (long) arg3 : (u32) arg3);
+#define __BPF_TP_EMIT() __BPF_ARG3_TP()
+#define __BPF_TP(...) \
+ __trace_printk(1 /* fake ip will not be printed */, \
+ fmt, ##__VA_ARGS__)
+
+#define __BPF_ARG1_TP(...) \
+ ((mod[0] == 2 || (mod[0] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_TP(arg1, ##__VA_ARGS__) \
+ : ((mod[0] == 1 || (mod[0] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_TP((long)arg1, ##__VA_ARGS__) \
+ : __BPF_TP((u32)arg1, ##__VA_ARGS__)))
+
+#define __BPF_ARG2_TP(...) \
+ ((mod[1] == 2 || (mod[1] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_ARG1_TP(arg2, ##__VA_ARGS__) \
+ : ((mod[1] == 1 || (mod[1] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_ARG1_TP((long)arg2, ##__VA_ARGS__) \
+ : __BPF_ARG1_TP((u32)arg2, ##__VA_ARGS__)))
+
+#define __BPF_ARG3_TP(...) \
+ ((mod[2] == 2 || (mod[2] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_ARG2_TP(arg3, ##__VA_ARGS__) \
+ : ((mod[2] == 1 || (mod[2] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_ARG2_TP((long)arg3, ##__VA_ARGS__) \
+ : __BPF_ARG2_TP((u32)arg3, ##__VA_ARGS__)))
+
+ return __BPF_TP_EMIT();
}
static const struct bpf_func_proto bpf_trace_printk_proto = {
--
1.9.3
On Fri, Aug 11, 2017 at 06:47:04PM +0200, Daniel Borkmann wrote:
> Hi James,
>
> On 08/09/2017 10:34 PM, Daniel Borkmann wrote:
> > On 08/09/2017 09:39 AM, James Hogan wrote:
> > [...]
> >> time (but please consider looking at the other patch which is certainly
> >> a more real issue).
> >
> > Sorry for the delay, started looking into that and whether we
> > have some other options, I'll get back to you on this.
>
> Could we solve this more generically (as in: originally intended) in
> the sense that we don't need to trust the gcc va_list handling; I feel
> this is relying on an implementation detail? Perhaps something like
> below poc patch?
Well it works on MIPS32 and MIPS64 with tracex5.
Tested-by: James Hogan <[email protected]>
Cheers
James
>
> Thanks again,
> Daniel
>
> From 71f16544d455abb6bb82f7253c17c14d2a395e91 Mon Sep 17 00:00:00 2001
> Message-Id: <71f16544d455abb6bb82f7253c17c14d2a395e91.1502469361.git.daniel@iogearbox.net>
> From: Daniel Borkmann <[email protected]>
> Date: Fri, 11 Aug 2017 15:56:32 +0200
> Subject: [PATCH] bpf: fix bpf_trace_printk on 32 bit
>
> Signed-off-by: Daniel Borkmann <[email protected]>
> ---
> kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++----
> 1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 3738519..d4cb36f 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -204,10 +204,33 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void)
> fmt_cnt++;
> }
>
> - return __trace_printk(1/* fake ip will not be printed */, fmt,
> - mod[0] == 2 ? arg1 : mod[0] == 1 ? (long) arg1 : (u32) arg1,
> - mod[1] == 2 ? arg2 : mod[1] == 1 ? (long) arg2 : (u32) arg2,
> - mod[2] == 2 ? arg3 : mod[2] == 1 ? (long) arg3 : (u32) arg3);
> +#define __BPF_TP_EMIT() __BPF_ARG3_TP()
> +#define __BPF_TP(...) \
> + __trace_printk(1 /* fake ip will not be printed */, \
> + fmt, ##__VA_ARGS__)
> +
> +#define __BPF_ARG1_TP(...) \
> + ((mod[0] == 2 || (mod[0] == 1 && __BITS_PER_LONG == 64)) \
> + ? __BPF_TP(arg1, ##__VA_ARGS__) \
> + : ((mod[0] == 1 || (mod[0] == 0 && __BITS_PER_LONG == 32)) \
> + ? __BPF_TP((long)arg1, ##__VA_ARGS__) \
> + : __BPF_TP((u32)arg1, ##__VA_ARGS__)))
> +
> +#define __BPF_ARG2_TP(...) \
> + ((mod[1] == 2 || (mod[1] == 1 && __BITS_PER_LONG == 64)) \
> + ? __BPF_ARG1_TP(arg2, ##__VA_ARGS__) \
> + : ((mod[1] == 1 || (mod[1] == 0 && __BITS_PER_LONG == 32)) \
> + ? __BPF_ARG1_TP((long)arg2, ##__VA_ARGS__) \
> + : __BPF_ARG1_TP((u32)arg2, ##__VA_ARGS__)))
> +
> +#define __BPF_ARG3_TP(...) \
> + ((mod[2] == 2 || (mod[2] == 1 && __BITS_PER_LONG == 64)) \
> + ? __BPF_ARG2_TP(arg3, ##__VA_ARGS__) \
> + : ((mod[2] == 1 || (mod[2] == 0 && __BITS_PER_LONG == 32)) \
> + ? __BPF_ARG2_TP((long)arg3, ##__VA_ARGS__) \
> + : __BPF_ARG2_TP((u32)arg3, ##__VA_ARGS__)))
> +
> + return __BPF_TP_EMIT();
> }
>
> static const struct bpf_func_proto bpf_trace_printk_proto = {
> --
> 1.9.3
From: Daniel Borkmann
> Sent: 11 August 2017 17:47
> On 08/09/2017 10:34 PM, Daniel Borkmann wrote:
> > On 08/09/2017 09:39 AM, James Hogan wrote:
> > [...]
> >> time (but please consider looking at the other patch which is certainly
> >> a more real issue).
> >
> > Sorry for the delay, started looking into that and whether we
> > have some other options, I'll get back to you on this.
>
> Could we solve this more generically (as in: originally intended) in
> the sense that we don't need to trust the gcc va_list handling; I feel
> this is relying on an implementation detail? Perhaps something like
> below poc patch?
That patch still has 'cond ? arg : cond1 ? (long)arg : (u32)arg' so
probably has the same warning as the original version.
The va_list handling is defined by the relevant ABI, not gcc.
It is ok on x86-64 because all 32bit values are extended to 64bits
before being passed as arguments (either in registers, or on the stack).
Nothing in the C language requires that, so other 64bit architectures
could pass 32bit values in 4 bytes of stack.
David