2020-02-12 14:26:12

by Christophe Leroy

[permalink] [raw]
Subject: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore

Hi Rasmus,

Kernel 5.6-rc1 silently fails on boot.

I bisected the problem to commit b6231ea2b3c6 ("soc: fsl: qe: drop
broken lazy call of cpm_muram_init()")

I get a bad_page_fault() for an access at address 8 in
cpm_muram_alloc_common(), called from cpm_uart_console_setup() via
cpm_uart_allocbuf()

Reverting the guilty commit on top of 5.6-rc1 is not trivial.

In your commit text you explain that cpm_muram_init() is called via
subsys_initcall. But console init is done before that, so it cannot work.

Do you have a fix for that ?

Thanks
Christophe

NB: Next time, can you please copy powerpc mailing list when changing
such core parts of powerpc CPUs ?


2020-02-12 14:44:21

by Christophe Leroy

[permalink] [raw]
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore



Le 12/02/2020 à 15:24, Christophe Leroy a écrit :
> Hi Rasmus,
>

[...]

>
> NB: Next time, can you please copy powerpc mailing list when changing
> such core parts of powerpc CPUs ?

Apologise for that comment, in fact I was part of the recipients so it
didn't land in my linuxppc mailbox.

Seems like I missed that series.

2020-02-12 14:50:56

by Christophe Leroy

[permalink] [raw]
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore



On 02/12/2020 02:24 PM, Christophe Leroy wrote:
> Hi Rasmus,
>
> Kernel 5.6-rc1 silently fails on boot.
>
> I bisected the problem to commit b6231ea2b3c6 ("soc: fsl: qe: drop
> broken lazy call of cpm_muram_init()")
>
> I get a bad_page_fault() for an access at address 8 in
> cpm_muram_alloc_common(), called from cpm_uart_console_setup() via
> cpm_uart_allocbuf()
>
> Reverting the guilty commit on top of 5.6-rc1 is not trivial.
>
> In your commit text you explain that cpm_muram_init() is called via
> subsys_initcall. But console init is done before that, so it cannot work.
>
> Do you have a fix for that ?
>

The following patch allows powerpc 8xx to boot again. Don't know if
that's the good place and way to do the fix though.

---
diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
index 4cabded8390b..341d682ec6eb 100644
--- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
+++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
@@ -1351,6 +1351,7 @@ static int __init cpm_uart_console_setup(struct
console *co, char *options)
clrbits32(&pinfo->sccp->scc_gsmrl, SCC_GSMRL_ENR | SCC_GSMRL_ENT);
}

+ cpm_muram_init();
ret = cpm_uart_allocbuf(pinfo, 1);

if (ret)

---
Christophe

2020-02-13 03:36:49

by Zhao Qiang

[permalink] [raw]
Subject: RE: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore

On 02/12/2020 22:50 PM, Christophe Leroy wrote:
> -----Original Message-----
> From: Christophe Leroy <[email protected]>
> Sent: 2020年2月12日 22:50
> To: Rasmus Villemoes <[email protected]>; Leo Li
> <[email protected]>; Qiang Zhao <[email protected]>; Greg
> Kroah-Hartman <[email protected]>
> Cc: Scott Wood <[email protected]>; [email protected]; LKML
> <[email protected]>; [email protected]
> Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't
> boot anymore
>
> ---
> diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> index 4cabded8390b..341d682ec6eb 100644
> --- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> +++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> @@ -1351,6 +1351,7 @@ static int __init cpm_uart_console_setup(struct
> console *co, char *options)
> clrbits32(&pinfo->sccp->scc_gsmrl, SCC_GSMRL_ENR |
> SCC_GSMRL_ENT);
> }
>
> + cpm_muram_init();
> ret = cpm_uart_allocbuf(pinfo, 1);
>
> if (ret)
>
How about the patch like below? Just a draft.

diff --git a/drivers/soc/fsl/qe/qe.c b/drivers/soc/fsl/qe/qe.c
index 96c2057..c5c2464 100644
--- a/drivers/soc/fsl/qe/qe.c
+++ b/drivers/soc/fsl/qe/qe.c
@@ -29,6 +29,8 @@
#include <soc/fsl/qe/immap_qe.h>
#include <soc/fsl/qe/qe.h>

+static int qe_inited;
+
static void qe_snums_init(void);
static int qe_sdma_init(void);

@@ -637,15 +639,19 @@ unsigned int qe_get_num_of_snums(void)
}
EXPORT_SYMBOL(qe_get_num_of_snums);

-static int __init qe_init(void)
+int __init qe_init(void)
{
struct device_node *np;

+ if(qe_inited)
+ return 0;
+
np = of_find_compatible_node(NULL, NULL, "fsl,qe");
if (!np)
return -ENODEV;
qe_reset();
of_node_put(np);
+ qe_inited = 1
return 0;
}
subsys_initcall(qe_init);
diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
index 19d5a4c..cbf2c32 100644
--- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
+++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
@@ -1373,6 +1373,7 @@ static struct console cpm_scc_uart_console = {

static int __init cpm_uart_console_init(void)
{
+ qe_init();
register_console(&cpm_scc_uart_console);
return 0;
}
diff --git a/include/soc/fsl/qe/qe.h b/include/soc/fsl/qe/qe.h
index e282ac0..531ba05 100644
--- a/include/soc/fsl/qe/qe.h
+++ b/include/soc/fsl/qe/qe.h
@@ -88,6 +88,7 @@ static inline bool qe_clock_is_brg(enum qe_clock clk)

extern spinlock_t cmxgcr_lock;

+int __init qe_init(void);
/* Export QE common operations */
#ifdef CONFIG_QUICC_ENGINE
extern void qe_reset(void);

Best Regards
Qiang Zhao

2020-02-13 06:17:32

by Christophe Leroy

[permalink] [raw]
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore



Le 13/02/2020 à 04:35, Qiang Zhao a écrit :
> On 02/12/2020 22:50 PM, Christophe Leroy wrote:
>> -----Original Message-----
>> From: Christophe Leroy <[email protected]>
>> Sent: 2020年2月12日 22:50
>> To: Rasmus Villemoes <[email protected]>; Leo Li
>> <[email protected]>; Qiang Zhao <[email protected]>; Greg
>> Kroah-Hartman <[email protected]>
>> Cc: Scott Wood <[email protected]>; [email protected]; LKML
>> <[email protected]>; [email protected]
>> Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't
>> boot anymore
>>
>> ---
>> diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
>> b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
>> index 4cabded8390b..341d682ec6eb 100644
>> --- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
>> +++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
>> @@ -1351,6 +1351,7 @@ static int __init cpm_uart_console_setup(struct
>> console *co, char *options)
>> clrbits32(&pinfo->sccp->scc_gsmrl, SCC_GSMRL_ENR |
>> SCC_GSMRL_ENT);
>> }
>>
>> + cpm_muram_init();
>> ret = cpm_uart_allocbuf(pinfo, 1);
>>
>> if (ret)
>>
> How about the patch like below? Just a draft.

Yes, I see the idea. I think we could go for something like that.
But in the powerpc 8xx case, we are talking about cpm_init(), not qe_init().

And maybe the return code should be checked, because if it's not 0,
cpm_muram_init() won't have been called.

Thanks,
Christophe

2020-02-13 06:38:44

by Zhao Qiang

[permalink] [raw]
Subject: RE: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore

On 02/13/2020 14:17 PM, Christophe Leroy wrote:
> -----Original Message-----
> From: Christophe Leroy <[email protected]>
> Sent: 2020年2月13日 14:17
> To: Qiang Zhao <[email protected]>; Rasmus Villemoes
> <[email protected]>; Leo Li <[email protected]>; Greg
> Kroah-Hartman <[email protected]>
> Cc: Scott Wood <[email protected]>; [email protected]; LKML
> <[email protected]>; [email protected]
> Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't
> boot anymore
>
>
>
> Le 13/02/2020 à 04:35, Qiang Zhao a écrit :
> > On 02/12/2020 22:50 PM, Christophe Leroy wrote:
> >> -----Original Message-----
> >> From: Christophe Leroy <[email protected]>
> >> Sent: 2020年2月12日 22:50
> >> To: Rasmus Villemoes <[email protected]>; Leo Li
> >> <[email protected]>; Qiang Zhao <[email protected]>; Greg
> >> Kroah-Hartman <[email protected]>
> >> Cc: Scott Wood <[email protected]>; [email protected];
> >> LKML <[email protected]>;
> >> [email protected]
> >> Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx
> >> doesn't boot anymore
> >>
> >> ---
> >> diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> >> b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> >> index 4cabded8390b..341d682ec6eb 100644
> >> --- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> >> +++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> >> @@ -1351,6 +1351,7 @@ static int __init cpm_uart_console_setup(struct
> >> console *co, char *options)
> >> clrbits32(&pinfo->sccp->scc_gsmrl, SCC_GSMRL_ENR |
> >> SCC_GSMRL_ENT);
> >> }
> >>
> >> + cpm_muram_init();
> >> ret = cpm_uart_allocbuf(pinfo, 1);
> >>
> >> if (ret)
> >>
> > How about the patch like below? Just a draft.
>
> Yes, I see the idea. I think we could go for something like that.
> But in the powerpc 8xx case, we are talking about cpm_init(), not qe_init().

Yes, cpm_init.

Best Regards
Qiang Zhao

2020-02-13 07:45:50

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore

On 12/02/2020 15.24, Christophe Leroy wrote:
> Hi Rasmus,
>
> Kernel 5.6-rc1 silently fails on boot.
>
> I bisected the problem to commit b6231ea2b3c6 ("soc: fsl: qe: drop
> broken lazy call of cpm_muram_init()")
>
> I get a bad_page_fault() for an access at address 8 in
> cpm_muram_alloc_common(), called from cpm_uart_console_setup() via
> cpm_uart_allocbuf()

Sorry about that. But I'm afraid I don't see what I could have done
differently - the patch series, including b6231ea2b3c6, has been in
-next since 20191210, both you and ppc-dev were cc'ed on the entire
series (last revision sent November 28). And I've been dogfooding the
patches on both arm- and ppc-derived boards ever since (but obviously
only for a few cpus).

> Reverting the guilty commit on top of 5.6-rc1 is not trivial.
>
> In your commit text you explain that cpm_muram_init() is called via
> subsys_initcall. But console init is done before that, so it cannot work.

No, but neither did the code I removed seem to work - how does doing
spin_lock_init on a held spinlock, and then unlocking it, work? Is
everything-spinlock always a no-op in your configuration? And even so,
I'd think a GFP_KERNEL allocation under spin_lock_irqsave() would
trigger some splat somewhere?

Please note I'm not claiming my patch is not at fault, it clearly is, I
just want to try to understand how I could have been wrong about the
"nobody can have been relying on it" part.

> Do you have a fix for that ?

Not right now, but I'll have a look. It's true that the patch probably
doesn't revert cleanly, but it shouldn't be hard to add back those few
lines in the appropriate spot, with a big fat comment that this does
something very fishy (at least as a temporary measure if we don't find a
proper solution soonish).

Rasmus

2020-02-13 09:38:20

by Rasmus Villemoes

[permalink] [raw]
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore

On 12/02/2020 15.50, Christophe Leroy wrote:
>
>
> On 02/12/2020 02:24 PM, Christophe Leroy wrote:

>> In your commit text you explain that cpm_muram_init() is called via
>> subsys_initcall. But console init is done before that, so it cannot work.
>>
>> Do you have a fix for that ?
>>
>
> The following patch allows powerpc 8xx to boot again. Don't know if
> that's the good place and way to do the fix though.
>
> ---
> diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> index 4cabded8390b..341d682ec6eb 100644
> --- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> +++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> @@ -1351,6 +1351,7 @@ static int __init cpm_uart_console_setup(struct
> console *co, char *options)
>          clrbits32(&pinfo->sccp->scc_gsmrl, SCC_GSMRL_ENR | SCC_GSMRL_ENT);
>      }
>
> +    cpm_muram_init();
>      ret = cpm_uart_allocbuf(pinfo, 1);
>
>      if (ret)

Hmm, that seems to be a somewhat random place, making it hard to see
that it is indeed early enough. Would it work to put it inside the
console_initcall that registers the cpm console? I.e.

static int __init cpm_uart_console_init(void)
{
+ cpm_muram_init();
register_console(&cpm_scc_uart_console);
return 0;
}

console_initcall(cpm_uart_console_init);

Rasmus

2020-02-13 10:41:12

by Christophe Leroy

[permalink] [raw]
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore



On 02/13/2020 07:45 AM, Rasmus Villemoes wrote:
> On 12/02/2020 15.24, Christophe Leroy wrote:
>> Hi Rasmus,
>>
>> Kernel 5.6-rc1 silently fails on boot.
>>
>> I bisected the problem to commit b6231ea2b3c6 ("soc: fsl: qe: drop
>> broken lazy call of cpm_muram_init()")
>>
>> I get a bad_page_fault() for an access at address 8 in
>> cpm_muram_alloc_common(), called from cpm_uart_console_setup() via
>> cpm_uart_allocbuf()
>
> Sorry about that. But I'm afraid I don't see what I could have done
> differently - the patch series, including b6231ea2b3c6, has been in
> -next since 20191210, both you and ppc-dev were cc'ed on the entire
> series (last revision sent November 28). And I've been dogfooding the
> patches on both arm- and ppc-derived boards ever since (but obviously
> only for a few cpus).

Yes, this patch series should have ringed a bell in my head, looks like
I'm the one who introduced this 4 years ago through commit 4d486e008379
("soc/fsl/qe: fix Oops on CPM1 (and likely CPM2)")

But I had completely forgotten that patch until I did some git blame
this morning on this lazy call.


>
>> Reverting the guilty commit on top of 5.6-rc1 is not trivial.
>>
>> In your commit text you explain that cpm_muram_init() is called via
>> subsys_initcall. But console init is done before that, so it cannot work.
>
> No, but neither did the code I removed seem to work - how does doing
> spin_lock_init on a held spinlock, and then unlocking it, work? Is
> everything-spinlock always a no-op in your configuration? And even so,
> I'd think a GFP_KERNEL allocation under spin_lock_irqsave() would
> trigger some splat somewhere?
>
> Please note I'm not claiming my patch is not at fault, it clearly is, I
> just want to try to understand how I could have been wrong about the
> "nobody can have been relying on it" part.
>

It seems spin_lock_init() does just nothing.
spin_lock_irqsave() just disable IRQs and increases preempt_count.
spin_lock_irqrestore() restore IRQ state, decreace preempt_count and
call preempt_schedule if preempt_count reaches 0.

Maybe with some debugging options like DEBUG_ATOMIC_SLEEP could detect it ?

>> Do you have a fix for that ?
>
> Not right now, but I'll have a look. It's true that the patch probably
> doesn't revert cleanly, but it shouldn't be hard to add back those few
> lines in the appropriate spot, with a big fat comment that this does
> something very fishy (at least as a temporary measure if we don't find a
> proper solution soonish).
>

Thanks
Christophe

2020-02-13 10:51:51

by Christophe Leroy

[permalink] [raw]
Subject: Re: [Regression 5.6-rc1][Bisected b6231ea2b3c6] Powerpc 8xx doesn't boot anymore



Le 13/02/2020 à 10:37, Rasmus Villemoes a écrit :
> On 12/02/2020 15.50, Christophe Leroy wrote:
>>
>>
>> On 02/12/2020 02:24 PM, Christophe Leroy wrote:
>
>>> In your commit text you explain that cpm_muram_init() is called via
>>> subsys_initcall. But console init is done before that, so it cannot work.
>>>
>>> Do you have a fix for that ?
>>>
>>
>> The following patch allows powerpc 8xx to boot again. Don't know if
>> that's the good place and way to do the fix though.
>>
>> ---
>> diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
>> b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
>> index 4cabded8390b..341d682ec6eb 100644
>> --- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
>> +++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
>> @@ -1351,6 +1351,7 @@ static int __init cpm_uart_console_setup(struct
>> console *co, char *options)
>>          clrbits32(&pinfo->sccp->scc_gsmrl, SCC_GSMRL_ENR | SCC_GSMRL_ENT);
>>      }
>>
>> +    cpm_muram_init();
>>      ret = cpm_uart_allocbuf(pinfo, 1);
>>
>>      if (ret)
>
> Hmm, that seems to be a somewhat random place, making it hard to see
> that it is indeed early enough. Would it work to put it inside the
> console_initcall that registers the cpm console? I.e.
>
> static int __init cpm_uart_console_init(void)
> {
> + cpm_muram_init();
> register_console(&cpm_scc_uart_console);
> return 0;
> }
>
> console_initcall(cpm_uart_console_init);
>

Yes that works too.

Thanks
Christophe