2014-02-27 15:51:44

by Steven Rostedt

Subject: [RFA][PATCH 2/5] ftrace/x86: One more missing sync after fixup of function modification failure

[Request for Ack]

From: Petr Mladek <[email protected]>

If a failure occurs while modifying an ftrace function, the code bails out
and removes the tracepoints to get back to what the code originally was.

However, the final sync run across the CPUs is missing after the fix up is
done and before the ftrace int3 handler flag is reset.

Link: http://lkml.kernel.org/r/[email protected]

Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
Cc: [email protected] # 3.5+
Signed-off-by: Petr Mladek <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
arch/x86/kernel/ftrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 6b566c8..69885e2 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -660,8 +660,8 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
                 ret = -EPERM;
                 goto out;
         }
-        run_sync();
  out:
+        run_sync();
         return ret;

  fail_update:
--
1.8.5.3
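
For readers following the diff, here is a condensed sketch of ftrace_modify_code()
with this patch applied. The helper names are mostly the ones visible in the diff
and in the discussion below; the bodies and surrounding details are an approximation
of the kernel code of that era, not a verbatim copy.

static int
ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
                   unsigned const char *new_code)
{
        int ret;

        ret = add_break(ip, old_code);          /* 1) put an int3 on the first byte */
        if (ret)
                goto out;

        run_sync();                             /* every CPU now sees the int3 */

        ret = add_update_code(ip, new_code);    /* 2) write the new tail of the insn */
        if (ret)
                goto fail_update;

        run_sync();

        ret = ftrace_write(ip, new_code, 1);    /* 3) replace int3 with the new first byte */
        if (ret) {
                ret = -EPERM;
                goto out;
        }
 out:
        run_sync();     /* the sync this patch moves here, so the revert path gets it too */
        return ret;

 fail_update:
        probe_kernel_write((void *)ip, &old_code[0], 1);        /* restore the old first byte */
        goto out;
}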


2014-02-27 16:05:42

by Steven Rostedt

Subject: Re: [RFA][PATCH 2/5] ftrace/x86: One more missing sync after fixup of function modification failure

H. Peter, I forgot to Cc you on this one, as you are the one I'm
looking for an Ack from.

-- Steve


On Thu, 27 Feb 2014 10:46:18 -0500
Steven Rostedt <[email protected]> wrote:

> [Request for Ack]
>
> From: Petr Mladek <[email protected]>
>
> If a failure occurs while modifying ftrace function, it bails out and will
> remove the tracepoints to be back to what the code originally was.
>
> There is missing the final sync run across the CPUs after the fix up is done
> and before the ftrace int3 handler flag is reset.
>
> Link: http://lkml.kernel.org/r/[email protected]
>
> Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
> Cc: [email protected] # 3.5+
> Signed-off-by: Petr Mladek <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> arch/x86/kernel/ftrace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 6b566c8..69885e2 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -660,8 +660,8 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
> ret = -EPERM;
> goto out;
> }
> - run_sync();
> out:
> + run_sync();
> return ret;
>
> fail_update:

2014-02-27 16:37:38

by Frederic Weisbecker

Subject: Re: [RFA][PATCH 2/5] ftrace/x86: One more missing sync after fixup of function modification failure

On Thu, Feb 27, 2014 at 10:46:18AM -0500, Steven Rostedt wrote:
> [Request for Ack]
>
> From: Petr Mladek <[email protected]>
>
> If a failure occurs while modifying ftrace function, it bails out and will
> remove the tracepoints to be back to what the code originally was.
>
> There is missing the final sync run across the CPUs after the fix up is done
> and before the ftrace int3 handler flag is reset.

So IIUC the risk is that other CPUs may spuriously ignore non-ftrace traps if we don't sync the
other cores after reverting the int3 before decrementing the modifying_ftrace_code counter?

>
> Link: http://lkml.kernel.org/r/[email protected]
>
> Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
> Cc: [email protected] # 3.5+
> Signed-off-by: Petr Mladek <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> arch/x86/kernel/ftrace.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 6b566c8..69885e2 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -660,8 +660,8 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
> ret = -EPERM;
> goto out;
> }
> - run_sync();
> out:
> + run_sync();
> return ret;
>
> fail_update:

This could be further optimized by instead calling run_sync() at the end of the
fail_update block (after the probe_kernel_write() revert); otherwise even a failure
on setting the break will result in a run_sync(), which doesn't appear to be needed.
But that's really just nitpicking, as it's a rare failure codepath and shouldn't hurt.

In any case, the fix looks correct.

Acked-by: Frederic Weisbecker <[email protected]>

> --
> 1.8.5.3
>
>

2014-02-27 17:00:18

by Steven Rostedt

Subject: Re: [RFA][PATCH 2/5] ftrace/x86: One more missing sync after fixup of function modification failure

On Thu, 27 Feb 2014 17:37:32 +0100
Frederic Weisbecker <[email protected]> wrote:

> On Thu, Feb 27, 2014 at 10:46:18AM -0500, Steven Rostedt wrote:
> > [Request for Ack]
> >
> > From: Petr Mladek <[email protected]>
> >
> > If a failure occurs while modifying ftrace function, it bails out and will
> > remove the tracepoints to be back to what the code originally was.
> >
> > There is missing the final sync run across the CPUs after the fix up is done
> > and before the ftrace int3 handler flag is reset.
>
> So IIUC the risk is that other CPUs may spuriously ignore non-ftrace traps if we don't sync the
> other cores after reverting the int3 before decrementing the modifying_ftrace_code counter?

Actually, the bug is that they will not ignore the ftrace traps after
we decrement the modifying_ftrace_code counter. Here's the race:

CPU0                               CPU1
----                               ----
remove_breakpoint();
modifying_ftrace_code = 0;

                                   [still sees breakpoint]
                                   <takes trap>
                                   [sees modifying_ftrace_code as zero]
                                   [no breakpoint handler]
                                   [goto failed case]
                                   [trap exception - kernel breakpoint,
                                    no handler]
                                   BUG()
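
For reference, the "[sees modifying_ftrace_code as zero]" step above is the check
at the top of the x86 int3 handler, which looks roughly like this (paraphrased; in
the real kernel modifying_ftrace_code is an atomic counter, not a plain flag):

        /* in do_int3(), before the normal kernel-breakpoint handling */
        if (unlikely(modifying_ftrace_code) && ftrace_int3_handler(regs))
                return;         /* it was one of ftrace's temporary breakpoints */
        /*
         * Otherwise the trap falls through to the regular breakpoint path,
         * which has no handler registered for this address and ends in BUG().
         */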


Even if we had a smp_wmb() after removing the breakpoint and clearing
the modifying_ftrace_code, we still need the smp_rmb() on the other
CPUs. The run_sync() does an IPI on all CPUs doing the smp_rmb().
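
A sketch of what that cross-CPU sync amounts to, based on the run_sync() of that
era (an approximation, not verbatim kernel source):

static void do_sync_core(void *data)
{
        sync_core();    /* serializing; covers the smp_rmb() mentioned above */
}

static void run_sync(void)
{
        int enable_irqs = irqs_disabled();

        /*
         * We may be called with interrupts disabled (e.g. on bootup), but
         * on_each_cpu() needs them enabled to IPI the other CPUs.
         */
        if (enable_irqs)
                local_irq_enable();
        on_each_cpu(do_sync_core, NULL, 1);
        if (enable_irqs)
                local_irq_disable();
}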

>
> >
> > Link: http://lkml.kernel.org/r/[email protected]
> >
> > Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
> > Cc: [email protected] # 3.5+
> > Signed-off-by: Petr Mladek <[email protected]>
> > Signed-off-by: Steven Rostedt <[email protected]>
> > ---
> > arch/x86/kernel/ftrace.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > index 6b566c8..69885e2 100644
> > --- a/arch/x86/kernel/ftrace.c
> > +++ b/arch/x86/kernel/ftrace.c
> > @@ -660,8 +660,8 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
> > ret = -EPERM;
> > goto out;
> > }
> > - run_sync();
> > out:
> > + run_sync();
> > return ret;
> >
> > fail_update:
>
> This could be further optimized by rather calling run_sync() in the end of the
> fail_update block (after the probe_kernel_write revert) otherwise even failure on
> setting the break will result in run_sync(), which doesn't appear to be needed. But
> that's really just nitpicking as it's a rare failure codepath and shouldn't hurt.

No, the run_sync() must be done after removing the breakpoint. Again,
we don't want one of these breakpoints to be called on another CPU and
then see modifying_ftrace_code as zero. That is bad. The final
run_sync() is required.

I think I'll update the change log to include my race flow graph from
above.

-- Steve


>
> In any case, the fix looks correct.
>
> Acked-by: Frederic Weisbecker <[email protected]>
>
> > --
> > 1.8.5.3
> >
> >

2014-02-27 17:19:42

by Frederic Weisbecker

Subject: Re: [RFA][PATCH 2/5] ftrace/x86: One more missing sync after fixup of function modification failure

On Thu, Feb 27, 2014 at 12:00:14PM -0500, Steven Rostedt wrote:
> On Thu, 27 Feb 2014 17:37:32 +0100
> Frederic Weisbecker <[email protected]> wrote:
>
> > On Thu, Feb 27, 2014 at 10:46:18AM -0500, Steven Rostedt wrote:
> > > [Request for Ack]
> > >
> > > From: Petr Mladek <[email protected]>
> > >
> > > If a failure occurs while modifying ftrace function, it bails out and will
> > > remove the tracepoints to be back to what the code originally was.
> > >
> > > There is missing the final sync run across the CPUs after the fix up is done
> > > and before the ftrace int3 handler flag is reset.
> >
> > So IIUC the risk is that other CPUs may spuriously ignore non-ftrace traps if we don't sync the
> > other cores after reverting the int3 before decrementing the modifying_ftrace_code counter?
>
> Actually, the bug is that they will not ignore the ftrace traps after
> we decrement modifying_ftrace_code counter. Here's the race:
>
> CPU0                               CPU1
> ----                               ----
> remove_breakpoint();
> modifying_ftrace_code = 0;
>
>                                    [still sees breakpoint]
>                                    <takes trap>
>                                    [sees modifying_ftrace_code as zero]
>                                    [no breakpoint handler]
>                                    [goto failed case]
>                                    [trap exception - kernel breakpoint,
>                                     no handler]
>                                    BUG()
>
>
> Even if we had a smp_wmb() after removing the breakpoint and clearing
> the modifying_ftrace_code, we still need the smp_rmb() on the other
> CPUS. The run_sync() does a IPI on all CPUs doing the smp_rmb().

Ah ok. My understanding was indeed that it doesn't ignore the ftrace trap,
but I thought the consequence was that we return immediately from the trap
handler.
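
For reference, when the int3 handler does claim an ftrace trap, it simply advances
the saved instruction pointer past the patched call site and returns, roughly (a
sketch from the code of that era; details may differ):

int ftrace_int3_handler(struct pt_regs *regs)
{
        unsigned long ip;

        if (WARN_ON_ONCE(!regs))
                return 0;

        ip = regs->ip - 1;                      /* address of the int3 byte */
        if (!ftrace_location(ip) && !is_ftrace_caller(ip))
                return 0;                       /* not an ftrace breakpoint */

        regs->ip += MCOUNT_INSN_SIZE - 1;       /* skip over the patched call */

        return 1;
}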

>
> >
> > >
> > > Link: http://lkml.kernel.org/r/[email protected]
> > >
> > > Fixes: 8a4d0a687a5 "ftrace: Use breakpoint method to update ftrace caller"
> > > Cc: [email protected] # 3.5+
> > > Signed-off-by: Petr Mladek <[email protected]>
> > > Signed-off-by: Steven Rostedt <[email protected]>
> > > ---
> > > arch/x86/kernel/ftrace.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> > > index 6b566c8..69885e2 100644
> > > --- a/arch/x86/kernel/ftrace.c
> > > +++ b/arch/x86/kernel/ftrace.c
> > > @@ -660,8 +660,8 @@ ftrace_modify_code(unsigned long ip, unsigned const char *old_code,
> > > ret = -EPERM;
> > > goto out;
> > > }
> > > - run_sync();
> > > out:
> > > + run_sync();
> > > return ret;
> > >
> > > fail_update:
> >
> > This could be further optimized by rather calling run_sync() in the end of the
> > fail_update block (after the probe_kernel_write revert) otherwise even failure on
> > setting the break will result in run_sync(), which doesn't appear to be needed. But
> > that's really just nitpicking as it's a rare failure codepath and shouldn't hurt.
>
> No, the run_sync() must be done after removing the breakpoint. Again,
> we don't want one of these breakpoints to be called on another CPU and
> then see modifying_ftrace_code as zero. That is bad. The final
> run_sync() is required.

Ok but what I meant is to do this instead:

 fail_update:
        probe_kernel_write((void *)ip, &old_code[0], 1);
+       run_sync();
        goto out;

Because with the current patch we also call run_sync() on add_break() failure.

>
> I think I'll update the change log to include my race flow graph from
> above.
>
> -- Steve
>
>
> >
> > In any case, the fix looks correct.
> >
> > Acked-by: Frederic Weisbecker <[email protected]>
> >
> > > --
> > > 1.8.5.3
> > >
> > >
>

2014-02-27 17:35:57

by Steven Rostedt

Subject: Re: [RFA][PATCH 2/5] ftrace/x86: One more missing sync after fixup of function modification failure

On Thu, 27 Feb 2014 18:19:37 +0100
Frederic Weisbecker <[email protected]> wrote:

> On Thu, Feb 27, 2014 at 12:00:14PM -0500, Steven Rostedt wrote:
> > On Thu, 27 Feb 2014 17:37:32 +0100
> > Frederic Weisbecker <[email protected]> wrote:
> >
> > > On Thu, Feb 27, 2014 at 10:46:18AM -0500, Steven Rostedt wrote:
> > > > [Request for Ack]
> > > >
> > > > From: Petr Mladek <[email protected]>
> > > >
> > > > If a failure occurs while modifying ftrace function, it bails out and will
> > > > remove the tracepoints to be back to what the code originally was.
> > > >
> > > > There is missing the final sync run across the CPUs after the fix up is done
> > > > and before the ftrace int3 handler flag is reset.
> > >
> > > So IIUC the risk is that other CPUs may spuriously ignore non-ftrace traps if we don't sync the
> > > other cores after reverting the int3 before decrementing the modifying_ftrace_code counter?
> >
> > Actually, the bug is that they will not ignore the ftrace traps after
> > we decrement modifying_ftrace_code counter. Here's the race:
> >
> > CPU0                               CPU1
> > ----                               ----
> > remove_breakpoint();
> > modifying_ftrace_code = 0;
> >
> >                                    [still sees breakpoint]
> >                                    <takes trap>
> >                                    [sees modifying_ftrace_code as zero]
> >                                    [no breakpoint handler]
> >                                    [goto failed case]
> >                                    [trap exception - kernel breakpoint,
> >                                     no handler]
> >                                    BUG()
> >
> >
> > Even if we had a smp_wmb() after removing the breakpoint and clearing
> > the modifying_ftrace_code, we still need the smp_rmb() on the other
> > CPUS. The run_sync() does a IPI on all CPUs doing the smp_rmb().
>
> Ah ok. My understanding was indeed that it doesn't ignore the ftrace trap,
> but I thought the consequence was that we return immediately from the trap
> handler.

I'll add my above cpu race diagram (is that what we call it?). That
should make this change more understandable.


> Ok but what I meant is to do this instead:
>
> fail_update:
> probe_kernel_write((void *)ip, &old_code[0], 1);
> + run_sync()
> goto out;
>
> Because with the current patch we also call run_sync() on add_break() failure.

Ah ok (my turn to understand). Yeah, if the add_break() fails, then we
don't need to do the run_sync().

But this is just for now, to prevent the add_update_code() error from
crashing the kernel. I have more patches that clean this up further, but
they are for 3.15.

-- Steve

2014-02-27 17:52:47

by Frederic Weisbecker

Subject: Re: [RFA][PATCH 2/5] ftrace/x86: One more missing sync after fixup of function modification failure

On Thu, Feb 27, 2014 at 12:35:53PM -0500, Steven Rostedt wrote:
> On Thu, 27 Feb 2014 18:19:37 +0100
> Frederic Weisbecker <[email protected]> wrote:
>
> > On Thu, Feb 27, 2014 at 12:00:14PM -0500, Steven Rostedt wrote:
> > > On Thu, 27 Feb 2014 17:37:32 +0100
> > > Frederic Weisbecker <[email protected]> wrote:
> > >
> > > > On Thu, Feb 27, 2014 at 10:46:18AM -0500, Steven Rostedt wrote:
> > > > > [Request for Ack]
> > > > >
> > > > > From: Petr Mladek <[email protected]>
> > > > >
> > > > > If a failure occurs while modifying ftrace function, it bails out and will
> > > > > remove the tracepoints to be back to what the code originally was.
> > > > >
> > > > > There is missing the final sync run across the CPUs after the fix up is done
> > > > > and before the ftrace int3 handler flag is reset.
> > > >
> > > > So IIUC the risk is that other CPUs may spuriously ignore non-ftrace traps if we don't sync the
> > > > other cores after reverting the int3 before decrementing the modifying_ftrace_code counter?
> > >
> > > Actually, the bug is that they will not ignore the ftrace traps after
> > > we decrement modifying_ftrace_code counter. Here's the race:
> > >
> > > CPU0                               CPU1
> > > ----                               ----
> > > remove_breakpoint();
> > > modifying_ftrace_code = 0;
> > >
> > >                                    [still sees breakpoint]
> > >                                    <takes trap>
> > >                                    [sees modifying_ftrace_code as zero]
> > >                                    [no breakpoint handler]
> > >                                    [goto failed case]
> > >                                    [trap exception - kernel breakpoint,
> > >                                     no handler]
> > >                                    BUG()
> > >
> > >
> > > Even if we had a smp_wmb() after removing the breakpoint and clearing
> > > the modifying_ftrace_code, we still need the smp_rmb() on the other
> > > CPUS. The run_sync() does a IPI on all CPUs doing the smp_rmb().
> >
> > Ah ok. My understanding was indeed that it doesn't ignore the ftrace trap,
> > but I thought the consequence was that we return immediately from the trap
> > handler.
>
> I'll add my above cpu race diagram (is that what we call it?). That
> should make this change more understandable.

Yeah sounds like a good idea!

>
>
> > Ok but what I meant is to do this instead:
> >
> > fail_update:
> > probe_kernel_write((void *)ip, &old_code[0], 1);
> > + run_sync()
> > goto out;
> >
> > Because with the current patch we also call run_sync() on add_break() failure.
>
> Ah ok (my turn to understand). Yeah, if the add_break() fails, then we
> don't need to do the run_sync().
>
> But this is just for now, to prevent the add_update_code() error from
> crashing. I have more patches that clean this up further. But they are
> for 3.15.

Yeah sure. That was really just nitpicking. It doesn't hurt in a rare failure path
and the fix is there.

Thanks.