2021-07-15 06:49:02

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH] tracing: Do no reference char * as a string in histograms

From: "Steven Rostedt (VMware)" <[email protected]>

The histogram logic was allowing events with char * pointers to be used as
normal strings. But it was easy to crash the kernel with:

# echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger

And open some files, and boom!

BUG: unable to handle page fault for address: 00007f2ced0c3280
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
RIP: 0010:strlen+0x0/0x20
Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b
a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74
10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3

RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246
RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0
RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280
RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98
R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074
R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280
FS: 00007f2ced0f8580(0000) GS:ffff93825a800000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0
Call Trace:
event_hist_trigger+0x463/0x5f0
? find_held_lock+0x32/0x90
? sched_clock_cpu+0xe/0xd0
? lock_release+0x155/0x440
? kernel_init_free_pages+0x6d/0x90
? preempt_count_sub+0x9b/0xd0
? kernel_init_free_pages+0x6d/0x90
? get_page_from_freelist+0x12c4/0x1680
? __rb_reserve_next+0xe5/0x460
? ring_buffer_lock_reserve+0x12a/0x3f0
event_triggers_call+0x52/0xe0
ftrace_syscall_enter+0x264/0x2c0
syscall_trace_enter.constprop.0+0x1ee/0x210
do_syscall_64+0x1c/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae

Where it triggered a fault on strlen(key) where key was the filename.

The reason is that filename is a char * to user space, and the histogram
code just blindly dereferenced it, with obvious bad results.

I originally tried to use strncpy_from_user/kernel_nofault() but found
that there's other places that its dereferenced and not worth the effort.

Just do not allow "char *" to act like strings.

Cc: [email protected]
Fixes: 79e577cbce4c4 ("tracing: Support string type key properly")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
---
kernel/trace/trace_events_hist.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index 0207aeed31e6..16a9dfc9fffc 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1689,7 +1689,9 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
if (WARN_ON_ONCE(!field))
goto out;

- if (is_string_field(field)) {
+ /* Pointers to strings are just pointers and dangerous to dereference */
+ if (is_string_field(field) &&
+ (field->filter_type != FILTER_PTR_STRING)) {
flags |= HIST_FIELD_FL_STRING;

hist_field->size = MAX_FILTER_STR_VAL;
@@ -4495,8 +4497,6 @@ static inline void add_to_key(char *compound_key, void *key,
field = key_field->field;
if (field->filter_type == FILTER_DYN_STRING)
size = *(u32 *)(rec + field->offset) >> 16;
- else if (field->filter_type == FILTER_PTR_STRING)
- size = strlen(key);
else if (field->filter_type == FILTER_STATIC_STRING)
size = field->size;

--
2.31.1


2021-07-15 17:26:34

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH] tracing: Do no reference char * as a string in histograms

Hi Steve,

On Thu, 2021-07-15 at 00:02 -0400, Steven Rostedt wrote:
> From: "Steven Rostedt (VMware)" <[email protected]>
>
> The histogram logic was allowing events with char * pointers to be
> used as
> normal strings. But it was easy to crash the kernel with:
>
> # echo 'hist:keys=filename' >
> events/syscalls/sys_enter_openat/trigger
>
> And open some files, and boom!
>
> BUG: unable to handle page fault for address: 00007f2ced0c3280
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0
> Oops: 0000 [#1] PREEMPT SMP
> CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61
> Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
> v03.03 07/14/2016
> RIP: 0010:strlen+0x0/0x20
> Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a
> 0b
> a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f
> 00 74
> 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3
>
> RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246
> RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0
> RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280
> RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98
> R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074
> R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280
> FS: 00007f2ced0f8580(0000) GS:ffff93825a800000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0
> Call Trace:
> event_hist_trigger+0x463/0x5f0
> ? find_held_lock+0x32/0x90
> ? sched_clock_cpu+0xe/0xd0
> ? lock_release+0x155/0x440
> ? kernel_init_free_pages+0x6d/0x90
> ? preempt_count_sub+0x9b/0xd0
> ? kernel_init_free_pages+0x6d/0x90
> ? get_page_from_freelist+0x12c4/0x1680
> ? __rb_reserve_next+0xe5/0x460
> ? ring_buffer_lock_reserve+0x12a/0x3f0
> event_triggers_call+0x52/0xe0
> ftrace_syscall_enter+0x264/0x2c0
> syscall_trace_enter.constprop.0+0x1ee/0x210
> do_syscall_64+0x1c/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Where it triggered a fault on strlen(key) where key was the filename.
>
> The reason is that filename is a char * to user space, and the
> histogram
> code just blindly dereferenced it, with obvious bad results.
>
> I originally tried to use strncpy_from_user/kernel_nofault() but
> found
> that there's other places that its dereferenced and not worth the
> effort.
>
> Just do not allow "char *" to act like strings.

The original commit introducing FILTER_PTR_STR only intended char *
string dereferences when it was known to be safe:

commit 87a342f5db69d53ea70493bb1ec69c9047677038
Author: Li Zefan <[email protected]>
Date: Fri Aug 7 10:33:43 2009 +0800

tracing/filters: Support filtering for char * strings

...

The filtering will not dereference "char *" unless the developer
explicitly sets FILTER_PTR_STR in __field_ext.


But it looks like a later patch opened it up for trace_define_field()
to get around that for any char * field:

commit 5967bd5c4239be449744a1471daf60c866486c24
Author: Cong Wang <[email protected]>
Date: Sat May 25 09:58:00 2019 -0700

tracing: Let filter_assign_type() detect FILTER_PTR_STRING

filter_assign_type() could detect dynamic string and static
string, but not string pointers. Teach filter_assign_type()
to detect string pointers, and this will be needed by trace
event injection code.

BTW, trace event hist uses FILTER_PTR_STRING too.
Link: http://lkml.kernel.org/r/[email protected]


Not sure what the correct fix is at this point, but it would seem that
if the original intent was honored, the histogram code should still be
allowed to use them.

Tom



> Cc: [email protected]
> Fixes: 79e577cbce4c4 ("tracing: Support string type key properly")
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> ---
> kernel/trace/trace_events_hist.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/trace/trace_events_hist.c
> b/kernel/trace/trace_events_hist.c
> index 0207aeed31e6..16a9dfc9fffc 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1689,7 +1689,9 @@ static struct hist_field
> *create_hist_field(struct hist_trigger_data *hist_data,
> if (WARN_ON_ONCE(!field))
> goto out;
>
> - if (is_string_field(field)) {
> + /* Pointers to strings are just pointers and dangerous to
> dereference */
> + if (is_string_field(field) &&
> + (field->filter_type != FILTER_PTR_STRING)) {
> flags |= HIST_FIELD_FL_STRING;
>
> hist_field->size = MAX_FILTER_STR_VAL;
> @@ -4495,8 +4497,6 @@ static inline void add_to_key(char
> *compound_key, void *key,
> field = key_field->field;
> if (field->filter_type == FILTER_DYN_STRING)
> size = *(u32 *)(rec + field->offset) >> 16;
> - else if (field->filter_type == FILTER_PTR_STRING)
> - size = strlen(key);
> else if (field->filter_type == FILTER_STATIC_STRING)
> size = field->size;
>

2021-07-15 19:59:19

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracing: Do no reference char * as a string in histograms

On Thu, 15 Jul 2021 12:04:13 -0500
Tom Zanussi <[email protected]> wrote:

> Not sure what the correct fix is at this point, but it would seem that
> if the original intent was honored, the histogram code should still be
> allowed to use them.

Note, it can in a patch I plan on sending in the next merge window that
Tzvetomir and I are working on. We succeeded in getting the "eprobe"
working which allows you to do much more with event fields than the
current histogram code does.

Otherwise, I need a way to get the pointer and not a string. As I
discovered, the string may not have any content in it, and you can't
fault within a tracepoint, and you are left with just passing in
"(fault)", or something. Which is not useful at all!

What we have as a working prototype is this:

# cd /sys/kernel/tracing
# echo 'e:myopen syscalls.sys_enter_openat file=$filename:ustring' > kprobe_events
# trace-cmd start -e myopen -F ~/bin/openat
# cat trace
openat-1928 [000] ...2 1540.638692: myopen: (0) file="/etc/ld.so.cache"
openat-1928 [000] ...2 1540.638692: myopen: (0) file="/lib64/libc.so.6"
openat-1928 [000] ...2 1540.638694: myopen: (0) file=(fault)


And if we were to enforce this on a histogram, all we will see is
"fault" because the filename is first referenced inside the system call
and at the beginning of the call, it is not mapped in, and we can not
map it in inside a tracepoint. You are left with just putting in some
stub string (which kprobes uses "(fault)").

But, because now filename is a pointer, that the histograms can pass
around, it is much more useful, as I can use this to pass the filename
pointer to the exit of the system call!

# echo 'hist:keys=common_pid:f=filename' > events/syscalls/sys_enter_openat/trigger
# echo 'openret unsigned long file; long ret' > synthetic_events
# echo 'hist:keys=common_pid:file=$f:onmatch(syscalls.sys_enter_openat).trace(openret,$file,ret)' > events/syscalls/sys_exit_openat/trigger
# echo 'e:myopen synthetic.openret file=$file:ustring ret=$ret:s64' > kprobe_events
# trace-cmd start -e myopen -F ~/bin/openat
# cat trace
openat-1994 [002] ...3 2052.403323: myopen: (0) file="/etc/ld.so.cache" ret=3
openat-1994 [002] ...3 2052.403350: myopen: (0) file="/lib64/libc.so.6" ret=3
openat-1994 [002] ...3 2052.403555: myopen: (0) file="/etc/passwd" ret=3

Now, not only can we see the filename, because we passed the pointer
from the sys_enter to the sys_exit, we also can show the return of the
system call.

This allows for a much more flexibility than the histograms by
themselves have. If you want the file names via pointers, you can then
create these event_probes and use them in the histograms as well. Which
gives you the full flexibility of kprobes but on the fields of events.

Note, without this patch, because the histograms thinks 'char *' is a
string, you can't pass the pointer, but only the string. Hence, you
only get "(fault)", which is pretty darn useless.

This is another rationale for why I want this patch.

FYI, the above is cut and paste from working code, but it still needs a
bit of clean up before we post it.

-- Steve

2021-07-15 20:41:07

by Tom Zanussi

[permalink] [raw]
Subject: Re: [PATCH] tracing: Do no reference char * as a string in histograms

Hi Steve,

On Thu, 2021-07-15 at 15:54 -0400, Steven Rostedt wrote:
> On Thu, 15 Jul 2021 12:04:13 -0500
> Tom Zanussi <[email protected]> wrote:
>
> > Not sure what the correct fix is at this point, but it would seem
> > that
> > if the original intent was honored, the histogram code should still
> > be
> > allowed to use them.
>
> Note, it can in a patch I plan on sending in the next merge window
> that
> Tzvetomir and I are working on. We succeeded in getting the "eprobe"
> working which allows you to do much more with event fields than the
> current histogram code does.
>
> Otherwise, I need a way to get the pointer and not a string. As I
> discovered, the string may not have any content in it, and you can't
> fault within a tracepoint, and you are left with just passing in
> "(fault)", or something. Which is not useful at all!
>
> What we have as a working prototype is this:
>
> # cd /sys/kernel/tracing
> # echo 'e:myopen syscalls.sys_enter_openat file=$filename:ustring' >
> kprobe_events
> # trace-cmd start -e myopen -F ~/bin/openat
> # cat trace
> openat-1928 [000] ...2 1540.638692: myopen: (0)
> file="/etc/ld.so.cache"
> openat-1928 [000] ...2 1540.638692: myopen: (0)
> file="/lib64/libc.so.6"
> openat-1928 [000] ...2 1540.638694: myopen: (0)
> file=(fault)
>
>
> And if we were to enforce this on a histogram, all we will see is
> "fault" because the filename is first referenced inside the system
> call
> and at the beginning of the call, it is not mapped in, and we can not
> map it in inside a tracepoint. You are left with just putting in some
> stub string (which kprobes uses "(fault)").
>
> But, because now filename is a pointer, that the histograms can pass
> around, it is much more useful, as I can use this to pass the
> filename
> pointer to the exit of the system call!
>
> # echo 'hist:keys=common_pid:f=filename' >
> events/syscalls/sys_enter_openat/trigger
> # echo 'openret unsigned long file; long ret' > synthetic_events
> # echo
> 'hist:keys=common_pid:file=$f:onmatch(syscalls.sys_enter_openat).trac
> e(openret,$file,ret)' > events/syscalls/sys_exit_openat/trigger
> # echo 'e:myopen synthetic.openret file=$file:ustring ret=$ret:s64'
> > kprobe_events
> # trace-cmd start -e myopen -F ~/bin/openat
> # cat trace
> openat-1994 [002] ...3 2052.403323: myopen: (0)
> file="/etc/ld.so.cache" ret=3
> openat-1994 [002] ...3 2052.403350: myopen: (0)
> file="/lib64/libc.so.6" ret=3
> openat-1994 [002] ...3 2052.403555: myopen: (0)
> file="/etc/passwd" ret=3
>
> Now, not only can we see the filename, because we passed the pointer
> from the sys_enter to the sys_exit, we also can show the return of
> the
> system call.
>

Very nice! Really looking forward to playing around with this.

> This allows for a much more flexibility than the histograms by
> themselves have. If you want the file names via pointers, you can
> then
> create these event_probes and use them in the histograms as well.
> Which
> gives you the full flexibility of kprobes but on the fields of
> events.
>
> Note, without this patch, because the histograms thinks 'char *' is a
> string, you can't pass the pointer, but only the string. Hence, you
> only get "(fault)", which is pretty darn useless.
>
> This is another rationale for why I want this patch.
>

Yeah, if it's needed to enable the eprobes and since it also fixes a
crashing bug, I'm all for it.

But should it also have an additional 'Fixes:' line?:

Fixes: 5967bd5c4239 ("tracing: Let filter_assign_type() detect
FILTER_PTR_STRING")

Acked-by: Tom Zanussi <[email protected]>


2021-07-15 21:01:44

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH] tracing: Do no reference char * as a string in histograms

Hi Steve,

On Wed, Jul 14, 2021 at 9:02 PM Steven Rostedt <[email protected]> wrote:
>
> From: "Steven Rostedt (VMware)" <[email protected]>
>
> The histogram logic was allowing events with char * pointers to be used as
> normal strings. But it was easy to crash the kernel with:
>
> # echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger
>
> And open some files, and boom!
>
> BUG: unable to handle page fault for address: 00007f2ced0c3280
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0
> Oops: 0000 [#1] PREEMPT SMP
> CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61
> Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
> v03.03 07/14/2016
> RIP: 0010:strlen+0x0/0x20
> Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b
> a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74
> 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3
>
> RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246
> RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0
> RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280
> RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98
> R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074
> R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280
> FS: 00007f2ced0f8580(0000) GS:ffff93825a800000(0000)
> knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0
> Call Trace:
> event_hist_trigger+0x463/0x5f0
> ? find_held_lock+0x32/0x90
> ? sched_clock_cpu+0xe/0xd0
> ? lock_release+0x155/0x440
> ? kernel_init_free_pages+0x6d/0x90
> ? preempt_count_sub+0x9b/0xd0
> ? kernel_init_free_pages+0x6d/0x90
> ? get_page_from_freelist+0x12c4/0x1680
> ? __rb_reserve_next+0xe5/0x460
> ? ring_buffer_lock_reserve+0x12a/0x3f0
> event_triggers_call+0x52/0xe0
> ftrace_syscall_enter+0x264/0x2c0
> syscall_trace_enter.constprop.0+0x1ee/0x210
> do_syscall_64+0x1c/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Where it triggered a fault on strlen(key) where key was the filename.
>
> The reason is that filename is a char * to user space, and the histogram
> code just blindly dereferenced it, with obvious bad results.
>
> I originally tried to use strncpy_from_user/kernel_nofault() but found
> that there's other places that its dereferenced and not worth the effort.
>
> Just do not allow "char *" to act like strings.

It'd lose convenience and expressiveness but it's hard to argue if
it crashes the kernel. So,

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung


>
> Cc: [email protected]
> Fixes: 79e577cbce4c4 ("tracing: Support string type key properly")
> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
> ---
> kernel/trace/trace_events_hist.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> index 0207aeed31e6..16a9dfc9fffc 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -1689,7 +1689,9 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
> if (WARN_ON_ONCE(!field))
> goto out;
>
> - if (is_string_field(field)) {
> + /* Pointers to strings are just pointers and dangerous to dereference */
> + if (is_string_field(field) &&
> + (field->filter_type != FILTER_PTR_STRING)) {
> flags |= HIST_FIELD_FL_STRING;
>
> hist_field->size = MAX_FILTER_STR_VAL;
> @@ -4495,8 +4497,6 @@ static inline void add_to_key(char *compound_key, void *key,
> field = key_field->field;
> if (field->filter_type == FILTER_DYN_STRING)
> size = *(u32 *)(rec + field->offset) >> 16;
> - else if (field->filter_type == FILTER_PTR_STRING)
> - size = strlen(key);
> else if (field->filter_type == FILTER_STATIC_STRING)
> size = field->size;
>
> --
> 2.31.1
>

2021-07-15 21:16:26

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] tracing: Do no reference char * as a string in histograms

On Thu, 15 Jul 2021 15:39:33 -0500
Tom Zanussi <[email protected]> wrote:

> Very nice! Really looking forward to playing around with this.

We should be posting an RFC in a week or two. Just working on some
other things at the moment.

>
> Yeah, if it's needed to enable the eprobes and since it also fixes a
> crashing bug, I'm all for it.
>
> But should it also have an additional 'Fixes:' line?:
>
> Fixes: 5967bd5c4239 ("tracing: Let filter_assign_type() detect
> FILTER_PTR_STRING")

Added it.

>
> Acked-by: Tom Zanussi <[email protected]>

Thanks!

-- Steve