2017-06-14 19:26:42

by Guenter Roeck

Subject: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

Hi Frank,

Your commit 'of: remove *phandle properties from expanded device tree' in
-next causes several of my ppc qemu tests to crash. Looking into qemu, it
sets "linux,phandle" properties for the mpic and for other devices.

The crashes are along the lines of

------------[ cut here ]------------
kernel BUG at arch/powerpc/platforms/85xx/corenet_generic.c:50!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=32
NUMA
CoreNet Generic
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc5-next-20170614 #1
task: c000000000ad8cc0 task.stack: c000000000bec000
NIP: c000000000a8ca7c LR: c000000000a8ca6c CTR: c000000000a8ca20
REGS: c000000000befb90 TRAP: 0700 Not tainted (4.12.0-rc5-next-20170614)
MSR: 0000000080021000 <CE,ME>
CR: 22000042 XER: 00000000
SOFTE: 0
GPR00: c000000000a8ca6c c000000000befe10 c000000000befa00 0000000000000000
GPR04: 0000000000000000 c000000000ac8458 c000000000ac8438 c000000000830658
GPR08: 0000000000000001 0000000000000001 0000000000000000 0000000000009531
GPR12: 0000000022000022 c00000003fff1000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR28: c000000000000300 c00000003fff2cc0 c000000000ac06e0 c000000000ac06e0
NIP [c000000000a8ca7c] .corenet_gen_pic_init+0x5c/0x90
LR [c000000000a8ca6c] .corenet_gen_pic_init+0x4c/0x90
Call Trace:
[c000000000befe10] [c000000000a8ca6c] .corenet_gen_pic_init+0x4c/0x90 (unreliable)
[c000000000befe80] [c000000000a832f8] .init_IRQ+0x34/0x4c
[c000000000befef0] [c000000000a7fc88] .start_kernel+0x2fc/0x500
[c000000000beff90] [c000000000000554] start_here_common+0x1c/0x48
Instruction dump:
e8aa0068 39088268 39407002 38600000 7fa54800 39205002 7caa4f9e 4bffe9e9
60000000 2c230000 7d200026 55291ffe <0b090000> 4bfff335 60000000 3ca2ffdd
random: 0x600000003d220004 get_random_bytes called with crng_init=0
---[ end trace 0000000000000000 ]---

and are caused by the kernel not finding the mpic node anymore.

Any idea how to solve the problem?

Bisect log is attached.

Thanks,
Guenter

---
# bad: [b14746170b0684005bab3e07893e6b91baf7dbf6] Add linux-next specific files for 20170614
# good: [32c1431eea4881a6b17bd7c639315010aeefa452] Linux 4.12-rc5
git bisect start 'HEAD' 'v4.12-rc5'
# good: [0500b956eedb4686b0420308ae01a74b00f9ab64] Merge remote-tracking branch 'crypto/master'
git bisect good 0500b956eedb4686b0420308ae01a74b00f9ab64
# bad: [4717c17660509cee9d3596eb19b99f3e26d57c36] Merge remote-tracking branch 'tip/auto-latest'
git bisect bad 4717c17660509cee9d3596eb19b99f3e26d57c36
# good: [f32807fd889514af115c32f597f59763d44ffae4] next-20170613/sound-asoc
git bisect good f32807fd889514af115c32f597f59763d44ffae4
# good: [8bf3df94bf566c7294b6f972cb5afa2d9a3a83f5] Merge remote-tracking branch 'iommu/next'
git bisect good 8bf3df94bf566c7294b6f972cb5afa2d9a3a83f5
# good: [e5c91c3569136b20783bd0799f026b89e4a2752a] Merge branch 'sched/core'
git bisect good e5c91c3569136b20783bd0799f026b89e4a2752a
# good: [3ff2be7e0e543ed1fbdd1a9f5ca49417be7b2a66] Merge branch 'x86/boot'
git bisect good 3ff2be7e0e543ed1fbdd1a9f5ca49417be7b2a66
# good: [2b37bbbc6291132aa8b08088ec31652eaf66ce6a] Merge remote-tracking branches 'spi/topic/rockchip', 'spi/topic/sh-msiof', 'spi/topic/spidev' and 'spi/topic/st-ssc4' into spi-next
git bisect good 2b37bbbc6291132aa8b08088ec31652eaf66ce6a
# good: [82a28f6c16030d04f5719889999f4fa9a35bcfc7] Merge branch 'x86/timers'
git bisect good 82a28f6c16030d04f5719889999f4fa9a35bcfc7
# bad: [d19a4961ac001b1284013ecff3deb6456a09abda] of: make __of_attach_node() static
git bisect bad d19a4961ac001b1284013ecff3deb6456a09abda
# good: [e5e9b5fae7e7d1fad87e4abb52f5f3d55c9f4e25] iio: proximity: as3935: add missing required spi-max-frequency
git bisect good e5e9b5fae7e7d1fad87e4abb52f5f3d55c9f4e25
# good: [d20dc1493db438fbbfb7733adc82f472dd8a0789] of: Support const and non-const use for to_of_node()
git bisect good d20dc1493db438fbbfb7733adc82f472dd8a0789
# good: [4811a1a7800bc59074e640a4fe9befdb668ae56f] Merge branch 'dt/property-move' into dt/next
git bisect good 4811a1a7800bc59074e640a4fe9befdb668ae56f
# bad: [f847192ce4061dc7e9087eb9136a38e3bf582efb] of: remove *phandle properties from expanded device tree
git bisect bad f847192ce4061dc7e9087eb9136a38e3bf582efb
# good: [6fedb069def034a4738584920fe94535ab29637a] of: Provide dummy of_device_compatible_match() for compile-testing
git bisect good 6fedb069def034a4738584920fe94535ab29637a
# first bad commit: [f847192ce4061dc7e9087eb9136a38e3bf582efb] of: remove *phandle properties from expanded device tree


2017-06-14 21:32:47

by Frank Rowand

Subject: Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

Hi Guenter,

Thanks for reporting this.


On 06/14/17 12:26, Guenter Roeck wrote:
> Hi Frank,
>
> your commit 'of: remove *phandle properties from expanded device tree' in
> -next causes several of my ppc qemu tests to crash. Looking into qemu, it
> sets "linux,phandle" properties for the mpic and for other devices.
>
> The crashes are along the line of
>
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/platforms/85xx/corenet_generic.c:50!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=32
> NUMA
> CoreNet Generic
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc5-next-20170614 #1
> task: c000000000ad8cc0 task.stack: c000000000bec000
> NIP: c000000000a8ca7c LR: c000000000a8ca6c CTR: c000000000a8ca20
> REGS: c000000000befb90 TRAP: 0700 Not tainted (4.12.0-rc5-next-20170614)
> MSR: 0000000080021000 <CE,ME>
> CR: 22000042 XER: 00000000
> SOFTE: 0
> GPR00: c000000000a8ca6c c000000000befe10 c000000000befa00 0000000000000000
> GPR04: 0000000000000000 c000000000ac8458 c000000000ac8438 c000000000830658
> GPR08: 0000000000000001 0000000000000001 0000000000000000 0000000000009531
> GPR12: 0000000022000022 c00000003fff1000 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR28: c000000000000300 c00000003fff2cc0 c000000000ac06e0 c000000000ac06e0
> NIP [c000000000a8ca7c] .corenet_gen_pic_init+0x5c/0x90
> LR [c000000000a8ca6c] .corenet_gen_pic_init+0x4c/0x90
> Call Trace:
> [c000000000befe10] [c000000000a8ca6c] .corenet_gen_pic_init+0x4c/0x90
> (unreliable)
> [c000000000befe80] [c000000000a832f8] .init_IRQ+0x34/0x4c
> [c000000000befef0] [c000000000a7fc88] .start_kernel+0x2fc/0x500
> [c000000000beff90] [c000000000000554] start_here_common+0x1c/0x48
> Instruction dump:
> e8aa0068 39088268 39407002 38600000 7fa54800 39205002 7caa4f9e 4bffe9e9
> 60000000 2c230000 7d200026 55291ffe <0b090000> 4bfff335 60000000 3ca2ffdd
> random: 0x600000003d220004 get_random_bytes called with crng_init=0
> ---[ end trace 0000000000000000 ]---
>
> and are caused by the kernel not finding the mpic node anymore.
>
> Any idea how to solve the problem ?

The BUG() is triggered if mpic_alloc() returns NULL.

I looked through mpic_alloc(), and the functions that it calls, and nothing
is jumping out as being related to phandles.

Can you add some printks to mpic_alloc() to determine what problem is
causing it to return NULL?

Can you also include the console messages before the "[ cut here ]" line?

-Frank

>
> Bisect log is attached.
>
> Thanks,
> Guenter

< snip >

2017-06-14 22:36:02

by Guenter Roeck

Subject: Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

On Wed, Jun 14, 2017 at 02:31:58PM -0700, Frank Rowand wrote:
> Hi Guenter,
>
> Thanks for reporting this.
>
>
> On 06/14/17 12:26, Guenter Roeck wrote:
> > Hi Frank,
> >
> > your commit 'of: remove *phandle properties from expanded device tree' in
> > -next causes several of my ppc qemu tests to crash. Looking into qemu, it
> > sets "linux,phandle" properties for the mpic and for other devices.
> >
> > The crashes are along the line of
> >
> > ------------[ cut here ]------------
> > kernel BUG at arch/powerpc/platforms/85xx/corenet_generic.c:50!
> > Oops: Exception in kernel mode, sig: 5 [#1]
> > SMP NR_CPUS=32
> > NUMA
> > CoreNet Generic
> > Modules linked in:
> > CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-rc5-next-20170614 #1
> > task: c000000000ad8cc0 task.stack: c000000000bec000
> > NIP: c000000000a8ca7c LR: c000000000a8ca6c CTR: c000000000a8ca20
> > REGS: c000000000befb90 TRAP: 0700 Not tainted (4.12.0-rc5-next-20170614)
> > MSR: 0000000080021000 <CE,ME>
> > CR: 22000042 XER: 00000000
> > SOFTE: 0
> > GPR00: c000000000a8ca6c c000000000befe10 c000000000befa00 0000000000000000
> > GPR04: 0000000000000000 c000000000ac8458 c000000000ac8438 c000000000830658
> > GPR08: 0000000000000001 0000000000000001 0000000000000000 0000000000009531
> > GPR12: 0000000022000022 c00000003fff1000 0000000000000000 0000000000000000
> > GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > GPR28: c000000000000300 c00000003fff2cc0 c000000000ac06e0 c000000000ac06e0
> > NIP [c000000000a8ca7c] .corenet_gen_pic_init+0x5c/0x90
> > LR [c000000000a8ca6c] .corenet_gen_pic_init+0x4c/0x90
> > Call Trace:
> > [c000000000befe10] [c000000000a8ca6c] .corenet_gen_pic_init+0x4c/0x90
> > (unreliable)
> > [c000000000befe80] [c000000000a832f8] .init_IRQ+0x34/0x4c
> > [c000000000befef0] [c000000000a7fc88] .start_kernel+0x2fc/0x500
> > [c000000000beff90] [c000000000000554] start_here_common+0x1c/0x48
> > Instruction dump:
> > e8aa0068 39088268 39407002 38600000 7fa54800 39205002 7caa4f9e 4bffe9e9
> > 60000000 2c230000 7d200026 55291ffe <0b090000> 4bfff335 60000000 3ca2ffdd
> > random: 0x600000003d220004 get_random_bytes called with crng_init=0
> > ---[ end trace 0000000000000000 ]---
> >
> > and are caused by the kernel not finding the mpic node anymore.
> >
> > Any idea how to solve the problem ?
>
> The BUG() is triggered if mpic_alloc() returns NULL.
>
Yes, I got that far as well ...

> I looked through mpic_alloc(), and the functions that it calls, and nothing
> is jumping out as being related to phandles.
>
> Can you add some printks to mpic_alloc() to determine what problem is
> causing it to return NULL?
>
I'll try later tonight.

> Can you also include the console messages before the "[ cut here ]" line?
>
http://kerneltests.org/builders

Check qemu test results in the 'next' column. ppc and ppc64 show related console
messages.

Guenter

2017-06-15 00:46:41

by Frank Rowand

Subject: Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

On 06/14/17 15:35, Guenter Roeck wrote:
> On Wed, Jun 14, 2017 at 02:31:58PM -0700, Frank Rowand wrote:
>> Hi Guenter,

< snip >

>> Can you also include the console messages before the "[ cut here ]" line?
>>
> http://kerneltests.org/builders
>
> Check qemu test results in the 'next' column. ppc and ppc64 show related console
> messages.

Thanks for the pointer. Unfortunately I did not see any additional clues (yet)
in the full log.

I tried to compare the failed boot to a good boot, but did not find a console
log for a good boot. I started at the qemu-ppc64-next builder page:

http://kerneltests.org/builders/qemu-ppc64-next

and looked at recent tests that were successful (like #645). But the log file
link from that test does not show the contents of the console for tests that
pass. Is there some way to see what the console for a successful test looks
like?

-Frank

2017-06-15 02:10:35

by Guenter Roeck

Subject: Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

On Wed, Jun 14, 2017 at 05:45:52PM -0700, Frank Rowand wrote:
> On 06/14/17 15:35, Guenter Roeck wrote:
> > On Wed, Jun 14, 2017 at 02:31:58PM -0700, Frank Rowand wrote:
> >> Hi Guenter,
>
> < snip >
>
> >> Can you also include the console messages before the "[ cut here ]" line?
> >>
> > http://kerneltests.org/builders
> >
> > Check qemu test results in the 'next' column. ppc and ppc64 show related console
> > messages.
>
> Thanks for the pointer. Unfortunately I did not see any additional clues (yet)
> in the full log.
>
> I tried to compare the failed boot to a good boot, but did not find a console
> log for a good boot. I started at the qemu-ppc-next builder page:
>
> http://kerneltests.org/builders/qemu-ppc64-next
>
> and looked at recent tests that were successful (like #645). But the log file
> link from that test does not show the contents of the console for tests that
> pass. Is there some way to see what the console for a successful test looks
> like?
>

See attached. I am on the road; I'll try to do some debugging later from home.

Guenter


Attachments:
(No filename) (1.06 kB)
ppc64.log (6.26 kB)

2017-06-15 04:13:22

by Guenter Roeck

Subject: Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

On 06/14/2017 05:45 PM, Frank Rowand wrote:
> On 06/14/17 15:35, Guenter Roeck wrote:
>> On Wed, Jun 14, 2017 at 02:31:58PM -0700, Frank Rowand wrote:
>>> Hi Guenter,
>
> < snip >
>
>>> Can you also include the console messages before the "[ cut here ]" line?
>>>
>> http://kerneltests.org/builders
>>
>> Check qemu test results in the 'next' column. ppc and ppc64 show related console
>> messages.
>
> Thanks for the pointer. Unfortunately I did not see any additional clues (yet)
> in the full log.
>
> I tried to compare the failed boot to a good boot, but did not find a console
> log for a good boot. I started at the qemu-ppc-next builder page:
>
> http://kerneltests.org/builders/qemu-ppc64-next
>
> and looked at recent tests that were successful (like #645). But the log file
> link from that test does not show the contents of the console for tests that
> pass. Is there some way to see what the console for a successful test looks
> like?
>
> -Frank
>
Good (v4.12-rc4):

...
NR_IRQS:512 nr_irqs:512 16
OF: Checking node /
OF: node '/' compatible '' type 'open-pic' name '' score 0
OF: node '/' compatible 'open-pic' type '' name '' score 0
OF: Checking node /pci@e0008000
OF: node '/pci@e0008000' compatible '' type 'open-pic' name '' score 0
OF: node '/pci@e0008000' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000
OF: node '/soc@e0000000' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000/msi@41600
OF: node '/soc@e0000000/msi@41600' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/msi@41600' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000/global-utilities@e0000
OF: node '/soc@e0000000/global-utilities@e0000' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/global-utilities@e0000' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000/serial@4500
OF: node '/soc@e0000000/serial@4500' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/serial@4500' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000/pic@40000
OF: type match
OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 2
OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0
mpic: Setting up MPIC " OpenPIC " version 1.2 at e0040000, max 1 CPUs
mpic: ISU size: 512, shift: 9, mask: 1ff
mpic: Initializing for 512 sources

bad:

NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
OF: Checking node /
OF: node '/' compatible '' type 'open-pic' name '' score 0
OF: node '/' compatible 'open-pic' type '' name '' score 0
OF: Checking node /pci@e0008000
OF: node '/pci@e0008000' compatible '' type 'open-pic' name '' score 0
OF: node '/pci@e0008000' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000
OF: node '/soc@e0000000' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000/msi@41600
OF: node '/soc@e0000000/msi@41600' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/msi@41600' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000/global-utilities@e0000
OF: node '/soc@e0000000/global-utilities@e0000' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/global-utilities@e0000' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000/serial@4500
OF: node '/soc@e0000000/serial@4500' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/serial@4500' compatible 'open-pic' type '' name '' score 0
OF: Checking node /soc@e0000000/pic@40000
OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0
OF: Checking node /aliases
OF: node '/aliases' compatible '' type 'open-pic' name '' score 0
OF: node '/aliases' compatible 'open-pic' type '' name '' score 0
OF: Checking node /cpus
OF: node '/cpus' compatible '' type 'open-pic' name '' score 0
OF: node '/cpus' compatible 'open-pic' type '' name '' score 0
OF: Checking node /cpus/PowerPC,8544@0
OF: node '/cpus/PowerPC,8544@0' compatible '' type 'open-pic' name '' score 0
OF: node '/cpus/PowerPC,8544@0' compatible 'open-pic' type '' name '' score 0
OF: Checking node /chosen
OF: node '/chosen' compatible '' type 'open-pic' name '' score 0
OF: node '/chosen' compatible 'open-pic' type '' name '' score 0
OF: Checking node /memory
OF: node '/memory' compatible '' type 'open-pic' name '' score 0
OF: node '/memory' compatible 'open-pic' type '' name '' score 0
No matching open-pic node
------------[ cut here ]------------
kernel BUG at arch/powerpc/platforms/85xx/corenet_generic.c:50!

So, the difference is in __of_device_is_compatible(), in the check after

/* Matching type is better than matching name */

Further debugging shows that device->type is NULL in the bad case.

OF: Checking node /soc@e0000000/pic@40000
OF: trying type match open-pic - <NULL>
OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0
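
To make the failing check easier to follow, here is a simplified userspace
model of the scoring described above. It is only a sketch: struct model_node
and model_is_compatible() are hypothetical names, plain strcmp() stands in
for the kernel's comparison helpers, and the compatible scoring is an
assumption modelled on the kernel's approach. What it illustrates is that a
node whose cached type is NULL can never satisfy a type match, so the score
drops from 2 to 0.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct device_node. */
struct model_node {
	const char *name;              /* cached "name" property, may be NULL */
	const char *type;              /* cached "device_type" property, may be NULL */
	const char *const *compatible; /* NULL-terminated list, may be NULL */
};

/*
 * Score a node against (compat, type, name); 0 means "no match".
 * NULL or empty criteria are ignored.
 */
static int model_is_compatible(const struct model_node *node,
			       const char *compat, const char *type,
			       const char *name)
{
	int score = 0;

	/* A compatible match has the highest priority. */
	if (compat && compat[0]) {
		const char *const *cp = node->compatible;
		int index = 0;

		for (; cp && *cp; cp++, index++) {
			if (strcmp(*cp, compat) == 0) {
				score = INT_MAX / 2 - (index << 2);
				break;
			}
		}
		if (!score)
			return 0;
	}

	/* Matching type is better than matching name. */
	if (type && type[0]) {
		/* A NULL cached type can never match. */
		if (!node->type || strcmp(type, node->type) != 0)
			return 0;
		score += 2;
	}

	/* Matching name is a bit better than not. */
	if (name && name[0]) {
		if (!node->name || strcmp(name, node->name) != 0)
			return 0;
		score++;
	}

	return score;
}
```

In this model, a node with type "open-pic" queried with type "open-pic" and
no other criteria scores 2, while the same query against a node whose type
is NULL scores 0, which mirrors the good and bad logs.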

Do you need more information?

Thanks,
Guenter

2017-06-15 06:48:43

by Michael Ellerman

Subject: Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

Guenter Roeck writes:

> Hi Frank,
>
> your commit 'of: remove *phandle properties from expanded device tree' in
> -next causes several of my ppc qemu tests to crash. Looking into qemu, it
> sets "linux,phandle" properties for the mpic and for other devices.

Yeah this broke ~50% of my machines.

Various back traces, or in some cases nothing at all.

cheers

eg:

XICS: Cannot find a Source Controller !
------------[ cut here ]------------
kernel BUG at arch/powerpc/sysdev/xics/xics-common.c:58!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=2048
NUMA
pSeries
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.12.0-rc5-gcc5-next-20170614-gb147461 #1
task: c000000000eb1180 task.stack: c000000001084000
NIP: c00000000008d780 LR: c00000000008d770 CTR: 0000000000000000
REGS: c000000001087a40 TRAP: 0700 Tainted: G W (4.12.0-rc5-gcc5-next-20170614-gb147461)
MSR: 8000000000021032 <SF,ME,IR,DR,RI>
CR: 24000422 XER: 00000001
CFAR: c0000000008dd280 SOFTE: 0
GPR00: c00000000008d770 c000000001087cc0 c000000001086400 0000000000000000
GPR04: 0000000000000000 0000000000000000 c000000000ad14c8 0000000000000002
GPR08: 0000000000000002 0000000000000001 0000000000000002 0000000000000000
GPR12: 0000000022000424 c000000006af0000 00000000054dd288 00000000054b5618
GPR16: 00000000054b5320 00000000054b59e8 000000000554dd20 0000000000000060
GPR20: 000000000462eea0 0000000001b56c80 0000000000000040 0000000000000000
GPR24: 0000000004814000 0000000005aa0028 0000000004814000 0000000005ab158e
GPR28: ffffffffd00dfeed c000000000e115e0 0000000000000000 c000000000eb54f4
NIP [c00000000008d780] .xics_update_irq_servers+0x40/0x140
LR [c00000000008d770] .xics_update_irq_servers+0x30/0x140
Call Trace:
[c000000001087cc0] [c00000000008d770] .xics_update_irq_servers+0x30/0x140 (unreliable)
[c000000001087d50] [c000000000db85f0] .xics_init+0x134/0x188
[c000000001087dd0] [c000000000dbdc64] .pseries_init_irq+0x48/0x230
[c000000001087e80] [c000000000da8dcc] .init_IRQ+0x3c/0x50
[c000000001087ef0] [c000000000da44e4] .start_kernel+0x31c/0x528
[c000000001087f90] [c00000000000b070] start_here_common+0x1c/0x4ac
Instruction dump:
f821ff71 60000000 60000000 3d02ffe3 38800000 3be8f0f4 e87f0002 4884fa85
60000000 7c690074 7c7e1b78 7929d182 <0b090000> e93f0002 3d02000b 3c82ffc2
---[ end trace 523b05d3a02887f6 ]---

2017-06-15 07:59:10

by Frank Rowand

Subject: Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

On 06/14/17 21:12, Guenter Roeck wrote:

< snip >

> Good (v4.12-rc4):
>

< snip >

> OF: Checking node /soc@e0000000/pic@40000
> OF: type match
> OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 2
> OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0

< snip >

>
> bad:

< snip >

> OF: Checking node /soc@e0000000/pic@40000
> OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 0
> OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0

< snip >

> No matching open-pic node
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/platforms/85xx/corenet_generic.c:50!
>
> So, in __of_device_is_compatible(), the difference is in
> __of_device_is_compatible() after
>
> /* Matching type is better than matching name */
>
> Further debugging shows that device->type is NULL in the bad case.
>
> OF: Checking node /soc@e0000000/pic@40000
> OF: trying type match open-pic - <NULL>
> OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 0
> OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0
>
> Do you need more information ?

I think I know what part of my patch is causing the problem.

Can you try the following patch to see if it fixes the failure in
__of_device_is_compatible()?

If this fixes the failure, then I know what is going on. Even if it works,
I will have to rework my original patch in a different way than this
quick hack.

-Frank



---
drivers/of/dynamic.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

Index: b/drivers/of/dynamic.c
===================================================================
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -218,6 +218,20 @@ int of_property_notify(int action, struc

 static void __of_attach_node(struct device_node *np)
 {
+	const __be32 *phandle;
+	int sz;
+
+	/* use "<NULL>" to be consistent with populate_node() */
+	np->name = __of_get_property(np, "name", NULL) ? : "<NULL>";
+	np->type = __of_get_property(np, "device_type", NULL) ? : "<NULL>";
+
+	phandle = __of_get_property(np, "phandle", &sz);
+	if (!phandle)
+		phandle = __of_get_property(np, "linux,phandle", &sz);
+	if (IS_ENABLED(CONFIG_PPC_PSERIES) && !phandle)
+		phandle = __of_get_property(np, "ibm,phandle", &sz);
+	np->phandle = (phandle && (sz >= 4)) ? be32_to_cpup(phandle) : 0;
+
 	np->child = NULL;
 	np->sibling = np->parent->child;
 	np->parent->child = np;
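
For readers who want to experiment with the phandle fallback outside the
kernel, the lookup order in the patch above can be modelled roughly as
follows. This is a sketch under assumptions, not kernel code: struct
model_prop, model_find(), model_be32() and model_phandle() are hypothetical
names, and unlike the patch the model always tries "ibm,phandle" instead of
making it conditional on CONFIG_PPC_PSERIES.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device tree property. */
struct model_prop {
	const char *name;
	const unsigned char *value; /* raw big-endian bytes */
	int length;                 /* length of value in bytes */
};

/* Return the first property called name, or NULL if there is none. */
static const struct model_prop *model_find(const struct model_prop *props,
					   size_t count, const char *name)
{
	for (size_t i = 0; i < count; i++)
		if (strcmp(props[i].name, name) == 0)
			return &props[i];
	return NULL;
}

/* Decode a 32-bit big-endian cell regardless of host byte order. */
static uint32_t model_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Try "phandle", then "linux,phandle", then "ibm,phandle".
 * The first property found decides the result; 0 means no usable phandle.
 */
static uint32_t model_phandle(const struct model_prop *props, size_t count)
{
	static const char *const names[] = {
		"phandle", "linux,phandle", "ibm,phandle",
	};

	for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		const struct model_prop *p = model_find(props, count, names[i]);

		if (p)
			return p->length >= 4 ? model_be32(p->value) : 0;
	}
	return 0;
}
```

A node that carries only a "linux,phandle" property, as the qemu-generated
trees in this report do, still yields its phandle value in this model; a
node with none of the three properties yields 0.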


2017-06-15 09:53:58

by Guenter Roeck

Subject: Re: Qemu crashes in -next due to 'of: remove *phandle properties from expanded device tree'

On 06/15/2017 12:58 AM, Frank Rowand wrote:
> On 06/14/17 21:12, Guenter Roeck wrote:
>
> < snip >
>
>> Good (v4.12-rc4):
>>
>
> < snip >
>
>> OF: Checking node /soc@e0000000/pic@40000
>> OF: type match
>> OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 2
>> OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0
>
> < snip >
>
>>
>> bad:
>
> < snip >
>
>> OF: Checking node /soc@e0000000/pic@40000
>> OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 0
>> OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0
>
> < snip >
>
>> No matching open-pic node
>> ------------[ cut here ]------------
>> kernel BUG at arch/powerpc/platforms/85xx/corenet_generic.c:50!
>>
>> So, in __of_device_is_compatible(), the difference is in
>> __of_device_is_compatible() after
>>
>> /* Matching type is better than matching name */
>>
>> Further debugging shows that device->type is NULL in the bad case.
>>
>> OF: Checking node /soc@e0000000/pic@40000
>> OF: trying type match open-pic - <NULL>
>> OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 0
>> OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0
>>
>> Do you need more information ?
>
> I think I know what part of my patch is causing the problem.
>
> Can you try the following patch to see if if fixes the failure in
> __of_device_is_compatible()?
>
> If this fixes the failure, then I know what is going on. If it works
> then I will have to rework my original patch in a different way than
> this quick hack.
>

Sorry, doesn't make a difference.

OF: Checking node /soc@e0000000/pic@40000
OF: trying type match open-pic - <NULL>
OF: node '/soc@e0000000/pic@40000' compatible '' type 'open-pic' name '' score 0
OF: node '/soc@e0000000/pic@40000' compatible 'open-pic' type '' name '' score 0

I added a log message into __of_attach_node(); it is not called.

Guenter