2016-10-20 05:55:13

by Larry Finger

Subject: Regression in 4.9-rc1 for PPC32 - bisected to commit 05fd007e4629

Kernel 4.9-rc1 fails to boot on my PowerBook G4 Aluminum (32-bit PowerPC) with
the following splat:

Kernel Panic - not syncing: Attempted to kill init: exitcode = 0x00000200

Call trace:

dump_stack+0x24/0x34 (unreliable)
panic+0x110/0x2ac
do_exit+0x464/0x834
do_group_exit+0x84/0xac
__wake_up_parent+0x0/0x34
ret_from_syscall+0x0/0x40

As the panic happens very early, I was not able to capture the output, thus the
above was entered by hand.

The problem was bisected to commit 05fd007e4629 ("console: don't prefer first
registered if DT specifies stdout-path"). Examining that patch and testing the
various hunks, I found that the system booted fine when I eliminated the hunk at

--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -2077,6 +2077,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
name = of_get_property(of_aliases, "stdout", NULL);
if (name)
of_stdout = of_find_node_opts_by_path(name,
&of_stdout_options);
+ if (of_stdout)
+ console_set_by_of();
}

if (!of_aliases)

Similarly, it would boot if I eliminated the hunk at

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2647,7 +2658,7 @@ void register_console(struct console *newcon)
* didn't select a console we take the first one
* that registers here.
*/
- if (preferred_console < 0) {
+ if (preferred_console < 0 && !of_specified_console) {
if (newcon->index < 0)
newcon->index = 0;
if (newcon->setup == NULL ||

The problem happens when of_specified_console is true, and the code following
the modified if statement above is not executed. In my .config, CONFIG_OF=y.
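
To make the effect of the changed condition concrete, here is a minimal userspace sketch of that one if statement. It is not kernel code: struct console_model and first_console_fallback() are hypothetical names, and only the index handling from the hunk above is modelled. Passing false for of_specified_console reproduces the pre-commit behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct console. */
struct console_model {
    int index;
};

/*
 * Models the "take the first one that registers" block in
 * register_console(). Returns true when the block body runs.
 */
static bool first_console_fallback(struct console_model *newcon,
                                   int preferred_console,
                                   bool of_specified_console)
{
    assert(newcon != NULL);

    if (preferred_console < 0 && !of_specified_console) {
        if (newcon->index < 0)
            newcon->index = 0;
        return true;
    }
    return false;
}
```

With preferred_console at -1 and of_specified_console true, the function returns false and leaves index untouched, which is the situation described above.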

As always, I will be happy to test any fixes.

Thanks,

Larry

--
If I was stranded on an island and the only way to get off
the island was to make a pretty UI, I’d die there.

Linus Torvalds


2016-10-22 17:36:09

by Larry Finger

Subject: Re: Regression in 4.9-rc1 for PPC32 - bisected to commit 05fd007e4629

On 10/20/2016 12:55 AM, Larry Finger wrote:
> Kernel 4.9-rc1 fails to boot on my PowerBook G4 Aluminum (32-bit PowerPC) with
> the following splat:
>
> Kernel Panic - not syncing: Attempted to kill init: exitcode = 0x00000200
>
> Call trace:
>
> dump_stack+0x24/0x34 (unreliable)
> panic+0x110/0x2ac
> do_exit+0x464/0x834
> do_group_exit+0x84/0xac
> __wake_up_parent+0x0/0x34
> ret_from_syscall+0x0/0x40
>
> As the panic happens very early, I was not able to capture the output, thus the
> above was entered by hand.
>
> The problem was bisected to commit 05fd007e4629 ("console: don't prefer first
> registered if DT specifies stdout-path"). Examining that patch and testing the
> various hunks, I found that the system booted fine when I eliminated the hunk at
>
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -2077,6 +2077,8 @@ void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align))
> name = of_get_property(of_aliases, "stdout", NULL);
> if (name)
> of_stdout = of_find_node_opts_by_path(name,
> &of_stdout_options);
> + if (of_stdout)
> + console_set_by_of();
> }
>
> if (!of_aliases)
>
> Similarly, it would boot if I eliminated the hunk at
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2647,7 +2658,7 @@ void register_console(struct console *newcon)
> * didn't select a console we take the first one
> * that registers here.
> */
> - if (preferred_console < 0) {
> + if (preferred_console < 0 && !of_specified_console) {
> if (newcon->index < 0)
> newcon->index = 0;
> if (newcon->setup == NULL ||
>
> The problem happens when of_specified_console is true, and the code following
> the modified if statement above is not executed. In my .config, CONFIG_OF=y.
>
> As always, I will be happy to test any fixes.

I have done some testing regarding this regression. I hope that this will help
in finding a fix for the problem.

When I remove the test of "of_specified_console" from the if statement above,
nothing changes the first time through register_console(). At this point,
newcon->device is NULL, and "bootconsole [udbg0] selected" is logged sometime
after that call. The second time register_console() is called, newcon->device is
c0323b48, and the log contains

console [tty0] enabled
console [udbg0] disabled

At this point newcon->device has been changed to c0329960. Routine
register_console() is called 5 more times with a newcon->device value of
c032e7ec, but the console is not changed again.

If the code is never allowed to execute the if block in question, the
bootconsole [udbg0] is never replaced, which leads to the kernel panic.
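
The handover described above can be sketched as a small userspace model. This is an illustration under stated assumptions, not the kernel's implementation: struct console_model, struct printk_model and register_console_model() are hypothetical names, the boot console is assumed to arrive already enabled, and a console that is not enabled is assumed to be ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct console. */
struct console_model {
    const char *name;
    bool enabled;     /* models CON_ENABLED; assumed preset for udbg0 */
    bool has_device;  /* models a non-NULL newcon->device */
};

/* Hypothetical container for state that printk.c keeps in globals. */
struct printk_model {
    int preferred_console;  /* -1 until a console has been chosen */
    const char *active;     /* name of the console in use, or NULL */
};

/* Models one register_console() call with no console= on the command line. */
static void register_console_model(struct printk_model *pk,
                                   struct console_model *newcon,
                                   bool of_specified_console)
{
    assert(pk != NULL && newcon != NULL);

    /* The "take the first one that registers" block. */
    if (pk->preferred_console < 0 && !of_specified_console) {
        newcon->enabled = true;
        if (newcon->has_device)
            pk->preferred_console = 0;
    }

    /* Assumption: a console that was never enabled is ignored. */
    if (!newcon->enabled)
        return;

    pk->active = newcon->name;
}
```

Registering a pre-enabled udbg0 and then a tty0 that has a device, with of_specified_console false, ends with tty0 active and later consoles ignored. With of_specified_console true, tty0 is never enabled and udbg0 stays active.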

Larry

2016-10-23 10:48:45

by Thorsten Leemhuis

Subject: Re: Regression in 4.9-rc1 for PPC32 - bisected to commit 05fd007e4629

Hi! On 20.10.2016 07:55, Larry Finger wrote:
> Kernel 4.9-rc1 fails to boot on my PowerBook G4 Aluminum (32-bit PowerPC) with
> the following splat:
> Kernel Panic - not syncing: Attempted to kill init: exitcode = 0x00000200
>

Thx for CCing [email protected] I added this report to the list
of regressions for Linux 4.9. I'll watch this thread for further updates
on this issue to document progress in my weekly reports. Please let me
know in case the discussion moves to a different place (bugzilla or
another mail thread for example). tia!

Ciao, Thorsten

2016-10-24 16:05:02

by Larry Finger

Subject: Re: Regression in 4.9-rc1 for PPC32 - bisected to commit 05fd007e4629

On 10/23/2016 05:48 AM, Thorsten Leemhuis wrote:
> Hi! On 20.10.2016 07:55, Larry Finger wrote:
>> Kernel 4.9-rc1 fails to boot on my PowerBook G4 Aluminum (32-bit PowerPC) with
>> the following splat:
>> Kernel Panic - not syncing: Attempted to kill init: exitcode = 0x00000200
>>
>
> Thx for CCing [email protected] I added this report to the list
> of regressions for Linux 4.9. I'll watch this thread for further updates
> on this issue to document progress in my weekly reports. Please let me
> know in case the discussion moves to a different place (bugzilla or
> another mail thread for example). tia!
>
> Ciao, Thorsten

Thorsten,

I would have CCd regressions with my previous postings if I had known the
address. The only place I had seen reference to it was on LKML where the address
is carefully obscured.

If I have not heard from the developers regarding this issue by Friday, I will
be submitting a request for reversion of the patch in question.

Larry



2016-10-24 16:30:06

by Paul Burton

Subject: Re: Regression in 4.9-rc1 for PPC32 - bisected to commit 05fd007e4629

On Monday, 24 October 2016 11:04:58 BST Larry Finger wrote:
> On 10/23/2016 05:48 AM, Thorsten Leemhuis wrote:
> > Hi! On 20.10.2016 07:55, Larry Finger wrote:
> >> Kernel 4.9-rc1 fails to boot on my PowerBook G4 Aluminum (32-bit PowerPC)
> >> with the following splat:
> >> Kernel Panic - not syncing: Attempted to kill init: exitcode =
> >> 0x00000200
> >
> > Thx for CCing [email protected] I added this report to the list
> > of regressions for Linux 4.9. I'll watch this thread for further updates
> > on this issue to document progress in my weekly reports. Please let me
> > know in case the discussion moves to a different place (bugzilla or
> > another mail thread for example). tia!
> >
> > Ciao, Thorsten
>
> Thorsten,
>
> I would have CCd regressions with my previous postings if I had known the
> address. The only place I had seen reference to it was on LKML where the
> address is carefully obscured.
>
> If I have not heard from the developers regarding this issue by Friday, I
> will be submitting a request for reversion of the patch in question.
>
> Larry

Hi Larry,

This was already reported over here:

https://www.linux-mips.org/archives/linux-mips/2016-10/msg00130.html

I posted an attempted fix there but it didn't work for Andreas. I'll try to
find some more time to look at it but it's difficult since I don't have access
to the affected hardware.

Thanks,
Paul



2016-10-24 19:16:13

by Larry Finger

Subject: Re: Regression in 4.9-rc1 for PPC32 - bisected to commit 05fd007e4629

On 10/24/2016 11:29 AM, Paul Burton wrote:
> On Monday, 24 October 2016 11:04:58 BST Larry Finger wrote:
>> On 10/23/2016 05:48 AM, Thorsten Leemhuis wrote:
>>> Hi! On 20.10.2016 07:55, Larry Finger wrote:
>>>> Kernel 4.9-rc1 fails to boot on my PowerBook G4 Aluminum (32-bit PowerPC)
>>>> with the following splat:
>>>> Kernel Panic - not syncing: Attempted to kill init: exitcode =
>>>> 0x00000200
>>>
>>> Thx for CCing [email protected] I added this report to the list
>>> of regressions for Linux 4.9. I'll watch this thread for further updates
>>> on this issue to document progress in my weekly reports. Please let me
>>> know in case the discussion moves to a different place (bugzilla or
>>> another mail thread for example). tia!
>>>
>>> Ciao, Thorsten
>>
>> Thorsten,
>>
>> I would have CCd regressions with my previous postings if I had known the
>> address. The only place I had seen reference to it was on LKML where the
>> address is carefully obscured.
>>
>> If I have not heard from the developers regarding this issue by Friday, I
>> will be submitting a request for reversion of the patch in question.
>>
>> Larry
>
> Hi Larry,
>
> This was already reported over here:
>
> https://www.linux-mips.org/archives/linux-mips/2016-10/msg00130.html
>
> I posted an attempted fix there but it didn't work for Andreas. I'll try to
> find some more time to look at it but it's difficult since I don't have access
> to the affected hardware.

Paul,

Thanks for alerting me to that other thread. Please CC me on future
communications there, as I do not read that ML.

For completeness, that patch failed here as well.

I'm not sure that my problem is exactly the same as the one Andreas sees. I do
get console output from the bootup, but I do see a kernel panic because the
system is trying to kill init. It seems likely that this happens because we are
on the wrong console (udbg0) rather than tty0.

My DT shows the following lines with "stdout":

==> /proc/device-tree/chosen/linux,stdout-path <==
/pci@f0000000/ATY,JasperParent@10/ATY,Jasper_A@0^@
==> /proc/device-tree/chosen/linux,stdout-package <==
?<9e>+P
==> /proc/device-tree/chosen/stdout <==
????
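
The unprintable characters above suggest that the last two properties hold binary data rather than text. Assuming each is a single 32-bit big-endian cell, as device-tree cells normally are, a value could be decoded with a helper like the one below. The function name dt_cell_to_u32 is hypothetical, and the real byte values are not known from the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one device-tree cell (assumed 32-bit, big-endian) from the raw
 * bytes of a property, for example bytes read from a file under
 * /proc/device-tree/chosen/.
 */
static uint32_t dt_cell_to_u32(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL && len >= 4);

    return ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |
            (uint32_t)bytes[3];
}
```

Because the decoding is done byte by byte, the helper gives the same result on big-endian and little-endian hosts.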

My PowerBook is only used for testing, mainly to check that wireless drivers
work correctly on BE architecture, but I do find a few problems with changes
that behave differently on Apple hardware. I will be happy to provide any
diagnostics that will be useful.

Larry


2016-10-25 01:30:49

by Larry Finger

Subject: Re: Regression in 4.9-rc1 for PPC32 - bisected to commit 05fd007e4629

I have a hack that works. Perhaps it will give a bit more understanding so that
a proper patch can be created. My changes were applied on top of 4.9-rc1 with
the patch from
https://www.linux-mips.org/archives/linux-mips/2016-10/msg00130.html, and were
as follows:

Index: linux/kernel/printk/printk.c
===================================================================
--- linux.orig/kernel/printk/printk.c 2016-10-24 12:29:17.838938604 -0500
+++ linux/kernel/printk/printk.c 2016-10-24 19:31:20.012593000 -0500
@@ -2657,7 +2657,9 @@
* didn't select a console we take the first one
* that registers here.
*/
- if (preferred_console < 0 && !of_specified_console) {
+ pr_info("Before: newcon->device %p, newcon->setup %p\n", newcon->device, newcon->setup);
+ if ((preferred_console < 0 && !of_specified_console) ||
+ !newcon->setup) {
if (newcon->index < 0)
newcon->index = 0;
if (newcon->setup == NULL ||
@@ -2670,6 +2672,7 @@
}
}

+ pr_info("After: newcon->device %p, newcon->setup %p\n", newcon->device, newcon->setup);
/*
* See if this console matches one we selected on
* the command line.

The changes in commit 05fd007e4629 prevent the body of the 'if
((preferred_console < 0 ...' block from ever being executed; however, the failure
indicates that we do need to go through this code at least once. I tested
various conditions, and !newcon->setup was the only one that helped.

The outputs from the pr_info statements above and the changes in console as
shown by 'dmesg | egrep "console|newcon"' are:

[ 0.000000] Before: newcon->device (null), newcon->setup (null)
Part of the body of if executed
[ 0.000000] After: newcon->device (null), newcon->setup (null)
[ 0.000000] bootconsole [udbg0] enabled
[ 0.001581] Before: newcon->device c0326e34, newcon->setup (null)
We now have newcon->device set, Part of body of if executed
[ 0.001934] After: newcon->device c0326e34, newcon->setup (null)
[ 0.002285] console [tty0] enabled
[ 0.002595] bootconsole [udbg0] disabled
[ 0.002913] Before: newcon->device c032cc4c, newcon->setup c032cc80
This time the body of if is not executed and will not be executed anymore.
[ 0.002925] After: newcon->device c032cc4c, newcon->setup c032cc80
[ 0.002943] Before: newcon->device c0331ad8, newcon->setup c0331c34
[ 0.002956] After: newcon->device c0331ad8, newcon->setup c0331c34
[ 0.003060] Before: newcon->device c0331ad8, newcon->setup c068dbc8
[ 0.003074] After: newcon->device c0331ad8, newcon->setup c068dbc8
[ 0.003322] Before: newcon->device c0331ad8, newcon->setup c068ddc4
[ 0.003335] After: newcon->device c0331ad8, newcon->setup c068ddc4
[ 1.993661] Before: newcon->device c0331ad8, newcon->setup c068dbc8
[ 1.993759] After: newcon->device c0331ad8, newcon->setup c068dbc8
[ 1.994395] Before: newcon->device c0331ad8, newcon->setup c068dbc8
[ 1.994492] After: newcon->device c0331ad8, newcon->setup c068dbc8
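
As a cross-check of the log above, the hacked condition can be replayed in userspace against the eight "Before:" lines. This is only a sketch: struct call_record and the function names are hypothetical, the table records nothing more than whether each pointer was NULL, and of_specified_console is taken to be true as on this machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One register_console() call as reported by a "Before:" line. */
struct call_record {
    bool has_device;  /* newcon->device was non-NULL */
    bool has_setup;   /* newcon->setup was non-NULL */
};

static const struct call_record logged_calls[] = {
    { false, false },  /* followed by "bootconsole [udbg0] enabled" */
    { true,  false },  /* followed by "console [tty0] enabled" */
    { true,  true  },
    { true,  true  },
    { true,  true  },
    { true,  true  },
    { true,  true  },
    { true,  true  },
};

/* Condition introduced by commit 05fd007e4629. */
static bool commit_enters_block(int preferred_console,
                                bool of_specified_console)
{
    return preferred_console < 0 && !of_specified_console;
}

/* Condition used by the hack in the diff above. */
static bool hack_enters_block(int preferred_console,
                              bool of_specified_console, bool has_setup)
{
    return commit_enters_block(preferred_console, of_specified_console) ||
           !has_setup;
}

/* Count the logged calls for which the hacked condition is true. */
static size_t count_hack_entries(const struct call_record *calls, size_t n)
{
    size_t entered = 0;

    assert(calls != NULL);
    for (size_t i = 0; i < n; i++) {
        /*
         * With of_specified_console true the first term is always
         * false, so the -1 passed for preferred_console is only a
         * placeholder.
         */
        if (hack_enters_block(-1, true, calls[i].has_setup))
            entered++;
    }
    return entered;
}
```

Only the first two records have a NULL setup pointer, so the hacked condition holds for exactly those two calls, matching the annotations in the log. The unmodified condition from the commit holds for none of them.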

Larry