DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 95B712187C
Date: Thu, 28 Dec 2017 11:34:34 +0900
From: Masami Hiramatsu <mhiramat@kernel.org>
To: Alexei Starovoitov <ast@fb.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
        Josef Bacik <jbacik@fb.com>, <rostedt@goodmis.org>, <mingo@redhat.com>,
        <davem@davemloft.net>, <netdev@vger.kernel.org>,
        <linux-kernel@vger.kernel.org>, <ast@kernel.org>, <kernel-team@fb.com>,
        <daniel@iogearbox.net>, <linux-btrfs@vger.kernel.org>,
        <darrick.wong@oracle.com>, Josef Bacik <josef@toxicpanda.com>,
        Akinobu Mita <akinobu.mita@gmail.com>
Subject: Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error
 injectable event is on function entry
Message-Id: <20171228113434.eb182c348fc69853fec934ee@kernel.org>
In-Reply-To: <a4097830-ac90-4db0-b860-6f6a85e91cba@fb.com>
References: <151427438796.32561.4235654585430455286.stgit@devbox>
        <151427441954.32561.8731119329264462024.stgit@devbox>
        <20171227015730.jjggymg4uqllteuy@ast-mbp>
        <20171227145628.53f68f391b2108d6df118ca7@kernel.org>
        <a4097830-ac90-4db0-b860-6f6a85e91cba@fb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 5553
Lines: 131

On Wed, 27 Dec 2017 14:46:24 -0800
Alexei Starovoitov <ast@fb.com> wrote:

> On 12/26/17 9:56 PM, Masami Hiramatsu wrote:
> > On Tue, 26 Dec 2017 17:57:32 -0800
> > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> >
> >> On Tue, Dec 26, 2017 at 04:46:59PM +0900, Masami Hiramatsu wrote:
> >>> Check whether error injectable event is on function entry or not.
> >>> Currently it checks the event is ftrace-based kprobes or not,
> >>> but that is wrong. It should check if the event is on the entry
> >>> of target function. Since error injection will override a function
> >>> to just return with modified return value, that operation must
> >>> be done before the target function starts making stackframe.
> >>>
> >>> As a side effect, bpf error injection is no need to depend on
> >>> function-tracer. It can work with sw-breakpoint based kprobe
> >>> events too.
> >>>
> >>> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> >>> ---
> >>>  kernel/trace/Kconfig        |    2 --
> >>>  kernel/trace/bpf_trace.c    |    6 +++---
> >>>  kernel/trace/trace_kprobe.c |    8 +++++---
> >>>  kernel/trace/trace_probe.h  |   12 ++++++------
> >>>  4 files changed, 14 insertions(+), 14 deletions(-)
> >>>
> >>> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> >>> index ae3a2d519e50..6400e1bf97c5 100644
> >>> --- a/kernel/trace/Kconfig
> >>> +++ b/kernel/trace/Kconfig
> >>> @@ -533,9 +533,7 @@ config FUNCTION_PROFILER
> >>>  config BPF_KPROBE_OVERRIDE
> >>>  	bool "Enable BPF programs to override a kprobed function"
> >>>  	depends on BPF_EVENTS
> >>> -	depends on KPROBES_ON_FTRACE
> >>>  	depends on HAVE_KPROBE_OVERRIDE
> >>> -	depends on DYNAMIC_FTRACE_WITH_REGS
> >>>  	default n
> >>>  	help
> >>>  	 Allows BPF to override the execution of a probed function and
> >>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> >>> index f6d2327ecb59..d663660f8392 100644
> >>> --- a/kernel/trace/bpf_trace.c
> >>> +++ b/kernel/trace/bpf_trace.c
> >>> @@ -800,11 +800,11 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
> >>>  	int ret = -EEXIST;
> >>>
> >>>  	/*
> >>> -	 * Kprobe override only works for ftrace based kprobes, and only if they
> >>> -	 * are on the opt-in list.
> >>> +	 * Kprobe override only works if they are on the function entry,
> >>> +	 * and only if they are on the opt-in list.
> >>>  	 */
> >>>  	if (prog->kprobe_override &&
> >>> -	    (!trace_kprobe_ftrace(event->tp_event) ||
> >>> +	    (!trace_kprobe_on_func_entry(event->tp_event) ||
> >>>  	     !trace_kprobe_error_injectable(event->tp_event)))
> >>>  		return -EINVAL;
> >>>
> >>> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> >>> index 91f4b57dab82..265e3e27e8dc 100644
> >>> --- a/kernel/trace/trace_kprobe.c
> >>> +++ b/kernel/trace/trace_kprobe.c
> >>> @@ -88,13 +88,15 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)
> >>>  	return nhit;
> >>>  }
> >>>
> >>> -int trace_kprobe_ftrace(struct trace_event_call *call)
> >>> +bool trace_kprobe_on_func_entry(struct trace_event_call *call)
> >>>  {
> >>>  	struct trace_kprobe *tk = (struct trace_kprobe *)call->data;
> >>> -	return kprobe_ftrace(&tk->rp.kp);
> >>> +
> >>> +	return kprobe_on_func_entry(tk->rp.kp.addr, tk->rp.kp.symbol_name,
> >>> +				    tk->rp.kp.offset);
> >>
> >> That would be nice, but did you test this?
> >
> > Yes, because the jprobe, which was only official user of modifying execution
> > path using kprobe, did same way to check. (and kretprobe also does it)
> >
> >> My understanding that kprobe will restore all regs and
> >> here we need to override return ip _and_ value.
> >
> > yes, no problem. kprobe restore all regs from pt_regs, including regs->ip.
> >
> >> Could you add a patch with the test the way Josef did
> >> or describe the steps to test this new mode?
> >
> > Would you mean below patch? If so, it should work without any change.
> >
> >  [PATCH v10 4/5] samples/bpf: add a test for bpf_override_return
> 
> yeah. I expect bpf_override_return test to work as-is.
> I'm asking for the test for new functionality added by this patch.
> In particular kprobe on func entry without ftrace.
> How did you test it?

This function is used by both kretprobe and jprobe. Jprobe was the user
of "modifying the instruction pointer to jump to another function" in
kprobes. If this check did not work, jprobe would not work either, which
would mean you could no longer modify the IP from kprobes at all.
In any case, until linux-4.13 it was well tested by the kprobe smoke test.

> and how I can repeat the test?
> I'm still not sure that it works correctly.

It works correctly because it checks, using kallsyms, that the given
address is the entry point (the 1st instruction) of a function.
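
To illustrate the idea only, here is a minimal userspace sketch of such an
entry-point check. It is not the kernel implementation: the toy_symbol
table and the toy_* helpers are hypothetical stand-ins for a kallsyms
lookup, which resolves an address to a symbol start and an offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one kallsyms entry: symbol start and size. */
struct toy_symbol {
	const char *name;
	uintptr_t start;
	size_t size;
};

/*
 * Find the symbol containing addr and report addr's offset from the
 * symbol start. Returns false if addr is not inside any known symbol.
 */
static bool toy_lookup_offset(const struct toy_symbol *syms, size_t nsyms,
			      uintptr_t addr, size_t *offset)
{
	assert(syms != NULL && offset != NULL);

	for (size_t i = 0; i < nsyms; i++) {
		if (addr >= syms[i].start &&
		    addr - syms[i].start < syms[i].size) {
			*offset = addr - syms[i].start;
			return true;
		}
	}
	return false;
}

/* An address is "on function entry" only if its offset in the symbol is 0. */
static bool toy_on_func_entry(const struct toy_symbol *syms, size_t nsyms,
			      uintptr_t addr)
{
	size_t offset;

	return toy_lookup_offset(syms, nsyms, addr, &offset) && offset == 0;
}
```

With a table holding a symbol at 0x1000 of size 0x40, toy_on_func_entry()
accepts 0x1000 and rejects 0x1004, regardless of how the probe itself is
implemented (ftrace-based or sw-breakpoint based).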

The reason I replaced the ftrace check with a different one is that
ftrace dynamic instrumentation has 2 modes, fentry and mcount.
With the new fentry mode, the ftrace call is placed on the first
instruction of the function, so it works as you expect.
With the traditional gcc mcount mode, ftrace is called after the call
frame for _mcount() has been set up. This means that if you modify the
ip there, it will not work, or it will cause trouble, because the
_mcount call frame is still on the stack.

So the current ftrace-based checker is not reliable: whether it works
depends on the case. Of course, in most cases the kernel will be built
with a recent gcc which supports fentry, but there is no guarantee.
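
The difference between the two checks can be modeled roughly as below.
This is only a sketch under simplifying assumptions: toy_probe, the
toy_* functions and prologue_len are hypothetical names for
illustration, and the model reduces "making the call frame" to a
non-zero offset from the function start.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of where the compiler places the ftrace call site. */
enum toy_ftrace_mode {
	TOY_FENTRY,	/* call is the first instruction of the function */
	TOY_MCOUNT	/* call comes after the function set up its frame */
};

struct toy_probe {
	size_t offset;	/* probe offset from the function start */
	bool on_ftrace;	/* probe sits on the ftrace call site */
};

/*
 * Build a probe placed on the ftrace call site for the given mode.
 * prologue_len is an assumed, illustrative prologue size in bytes.
 */
static struct toy_probe toy_probe_on_ftrace_site(enum toy_ftrace_mode mode,
						 size_t prologue_len)
{
	struct toy_probe p = {
		.offset = (mode == TOY_FENTRY) ? 0 : prologue_len,
		.on_ftrace = true,
	};

	return p;
}

/* Old check in this model: accept any ftrace-based probe. */
static bool toy_old_check(const struct toy_probe *p)
{
	return p->on_ftrace;
}

/* New check in this model: accept only a probe on the first instruction. */
static bool toy_new_check(const struct toy_probe *p)
{
	return p->offset == 0;
}
```

In this model both checks accept the fentry case. For mcount, only the
old check accepts the probe, although a frame already exists there. For
a sw-breakpoint probe at offset 0, only the new check accepts it.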

Please follow what jprobe did if you want to change the invoked function
using kprobes. That approach has been well reviewed and discussed for
more than 10 years.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>