2020-09-28 15:15:13

by Steven Rostedt

Subject: Re: [Bug 209317] ftrace kernel self test failure on RISC-V on 5.8, regression from 5.4.0

On Sat, 26 Sep 2020 22:02:35 +0000
[email protected] wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=209317
>
> --- Comment #4 from Colin Ian King ([email protected]) ---
> Issue still in 5.9-rc6
>


Atish,

As the issue bisects down to your commit, care to take a look at this?
(And take ownership of this bug)

-- Steve


2020-09-28 17:27:40

by Atish Patra

Subject: Re: [Bug 209317] ftrace kernel self test failure on RISC-V on 5.8, regression from 5.4.0

On Mon, 2020-09-28 at 11:13 -0400, Steven Rostedt wrote:
> On Sat, 26 Sep 2020 22:02:35 +0000
> [email protected] wrote:
>
> > https://bugzilla.kernel.org/show_bug.cgi?id=209317
> >
> > --- Comment #4 from Colin Ian King ([email protected]) ---
> > Issue still in 5.9-rc6
> >
>
> Atish,
>
> As the issue bisects down to your commit, care to take a look at
> this?
> (And take ownership of this bug)
>

Yes. I am already looking into this. Colin informed me about the bug
over the weekend.

I couldn't change the ownership as I am not part of the editbugs group.
I have sent an email to [email protected] for access.

> -- Steve

--
Regards,
Atish

2020-10-03 17:36:55

by Atish Patra

Subject: Re: [Bug 209317] ftrace kernel self test failure on RISC-V on 5.8, regression from 5.4.0

Hi Alan and Zong,
I initially suspected that ftrace broke between v5.6 and v5.7, as Colin
pointed out. I couldn't find any way the HSM patch could be related.
Zong's ftrace patching code was also merged in that release.
However, I was able to reproduce the issue on an older kernel (v5.4)
as well, on both QEMU and Unleashed hardware.
Here are the steps:

mount -t debugfs none /sys/kernel/debug/
cd /sys/kernel/debug/tracing
echo function_graph > current_tracer
echo function > current_tracer

The first write, with function_graph, works, but writing any other
tracer afterwards crashes immediately.
Can you take a look and check whether the bug is in the ftrace
infrastructure code?

--
Regards,
Atish

2020-10-05 06:11:26

by Zong Li

Subject: Re: [Bug 209317] ftrace kernel self test failure on RISC-V on 5.8, regression from 5.4.0

Hi Atish,

I can set aside some time to look at it together. If anyone here
fixes it or has ideas, please share the information. Thanks.


2020-10-05 21:10:14

by Atish Patra

Subject: Re: [Bug 209317] ftrace kernel self test failure on RISC-V on 5.8, regression from 5.4.0

On Sun, Oct 4, 2020 at 11:08 PM Zong Li <[email protected]> wrote:
>
> Hi Atish,
>
> I can set aside some time to look at it together. If anyone here
> fixes it or has ideas, please share the information. Thanks.
>

Thanks. Here is what I have observed so far, in case it helps.

Across kernels, the panic trace seems to indicate that one of the
first two functions executed after patching is corrupted:
rcu_momentary_dyntick_idle or stop_machine_yield [1].

[1] https://elixir.bootlin.com/linux/v5.9-rc7/source/kernel/stop_machine.c#L213

I suspect the nop was not replaced with the correct auipc+jalr pair.
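
For reference, here is a minimal sketch of how an auipc+jalr pair
encodes a 32-bit PC-relative call offset, based only on the RISC-V base
ISA instruction formats. The function names are hypothetical and this is
not the kernel's patching code. It only illustrates one place where such
a pair can go wrong: jalr sign-extends its 12-bit immediate, so the
upper 20 bits placed in auipc must be rounded up by 0x800 whenever
bit 11 of the offset is set.

```c
#include <assert.h>
#include <stdint.h>

/* Canonical 4-byte nop: addi x0, x0, 0. */
#define NOP4 0x00000013u

/* Hypothetical helper: encode "auipc ra, hi20" for a PC-relative offset.
 * Adding 0x800 before masking compensates for jalr sign-extending the
 * low 12 bits of the offset. */
uint32_t encode_auipc_ra(int32_t offset)
{
    uint32_t hi20 = ((uint32_t)offset + 0x800u) & 0xfffff000u;
    return hi20 | 0x00000097u;          /* opcode 0x17, rd = x1 (ra) */
}

/* Hypothetical helper: encode "jalr ra, lo12(ra)" for the same offset. */
uint32_t encode_jalr_ra(int32_t offset)
{
    uint32_t lo12 = (uint32_t)offset & 0xfffu;
    return (lo12 << 20) | 0x000080e7u;  /* opcode 0x67, rd = rs1 = x1 (ra) */
}

/* Recover the offset an auipc+jalr pair would jump to, relative to the
 * address of the auipc instruction. */
int32_t decode_call_offset(uint32_t auipc, uint32_t jalr)
{
    /* Sanity check: both words must be the ra-based forms built above. */
    assert((auipc & 0x00000fffu) == 0x00000097u);
    assert((jalr & 0x000fffffu) == 0x000080e7u);

    uint32_t lo12 = (jalr >> 20) & 0xfffu;
    /* Sign-extend the 12-bit immediate without shifting a signed value. */
    int32_t lo = (int32_t)(lo12 ^ 0x800u) - 0x800;
    return (int32_t)((auipc & 0xfffff000u) + (uint32_t)lo);
}
```

Comparing the two words found at a patched call site against what these
helpers produce for (target - call site) is one way to check whether the
site still holds nops, a stale pair, or a pair with a wrong upper half.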

--
Regards,
Atish