2018-07-02 21:43:44

by Jeremy Linton

[permalink] [raw]
Subject: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

Hi,

I'm experiencing two problems with commit 5088814a6e931 which is
"ACPICA: AML parser: attempt to continue loading table after error"

The first is this boot failure on a thunderX2:

[ 10.770098] ACPI Error: Ignore error and continue table load
(20180531/psobject-604)
[ 10.777926] Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
[ 10.786809] Mem abort info:
[ 10.789623] ESR = 0x96000004
[ 10.792702] Exception class = DABT (current EL), IL = 32 bits
[ 10.798682] SET = 0, FnV = 0
[ 10.801760] EA = 0, S1PTW = 0
[ 10.804925] Data abort info:
[ 10.807827] ISV = 0, ISS = 0x00000004
[ 10.811698] CM = 0, WnR = 0
[ 10.814689] [0000000000000000] user address but active_mm is swapper
[ 10.821108] Internal error: Oops: 96000004 [#1] SMP
[ 10.826032] Modules linked in:
[ 10.829113] CPU: 30 PID: 1 Comm: swapper/0 Not tainted
4.18.0-rc3PPTT4k+ #53
[ 10.836234] Hardware name: Default string Cavium
ThunderX2/Default string, BIOS L50_5.13_1.0.0 05/16/2018
[ 10.845905] pstate: 00400009 (nzcv daif +PAN -UAO)
[ 10.850746] pc : acpi_ps_peek_opcode+0x1c/0x40
[ 10.855231] lr : acpi_ps_create_op+0x54/0x278
[ 10.859627] sp : ffff000009a8ba30
[ 10.862969] x29: ffff000009a8ba30 x28: 0000000054445353
[ 10.868334] x27: 0000000000004008 x26: 0000000000000000
[ 10.873698] x25: ffff000009767f23 x24: ffff000008d59000
[ 10.879063] x23: ffff802672799030 x22: ffff000009a8bb28
[ 10.884427] x21: 0000000000000000 x20: ffff000008d59000
[ 10.889791] x19: ffff802672799030 x18: ffffffffffffffff
[ 10.895155] x17: 0000000000000013 x16: 0000000000000000
[ 10.900519] x15: ffff000008d59708 x14: 2d7463656a626f73
[ 10.905883] x13: 702f313335303831 x12: 3032282064616f6c
[ 10.911246] x11: 20656c6261742065 x10: 756e69746e6f6320
[ 10.916610] x9 : 0000000000000058 x8 : ffff000008570998
[ 10.921974] x7 : 203a726f72724520 x6 : 0000000000000334
[ 10.927338] x5 : 0000000000000012 x4 : 0000000000000000
[ 10.932701] x3 : 0000000000000000 x2 : ffff000009a8bb28
[ 10.938065] x1 : 0000000000000000 x0 : ffff000008505790
[ 10.943430] Process swapper/0 (pid: 1, stack limit =
0x(____ptrval____))
[ 10.950199] Call trace:
[ 10.952663] acpi_ps_peek_opcode+0x1c/0x40
[ 10.956797] acpi_ps_create_op+0x54/0x278
[ 10.960842] acpi_ps_parse_loop+0x1b4/0x6c8
[ 10.965063] acpi_ps_parse_aml+0xe0/0x2b4
[ 10.969108] acpi_ps_execute_table+0xa0/0x104
[ 10.973505] acpi_ns_execute_table+0x120/0x194
[ 10.977989] acpi_ns_parse_table+0x34/0x68
[ 10.982122] acpi_ns_load_table+0x4c/0xbc
[ 10.986169] acpi_tb_load_namespace+0x1d4/0x240
[ 10.990744] acpi_load_tables+0x50/0xbc
[ 10.994614] acpi_init+0xb8/0x374
[ 10.997959] do_one_initcall+0x54/0x208
[ 11.001829] kernel_init_freeable+0x224/0x300
[ 11.006229] kernel_init+0x18/0x118
[ 11.009747] ret_from_fork+0x10/0x18
[ 11.013354] Code: aa0003f3 aa1e03e0 d503201f f9400661 (39400020)
[ 11.019535] ---[ end trace 2bd8068593cf8acc ]---
[ 11.024195] Kernel panic - not syncing: Fatal exception
[ 11.029488] SMP: stopping secondary CPUs
[ 11.033480] ---[ end Kernel panic - not syncing: Fatal exception
]---

Which does appear to be the result of some bad data in the table, but it
was working with 4.17, and reverting this commit solves the problem.

Also the messages now newly being prefixed with '\n' are slightly
corrupted like:

"3ACPI BIOS Error (bug):"

because the KERN_XXX macro is being encoded after the CR which keeps it
from being processed correctly.


Thanks,


2018-07-02 21:53:59

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

On Mon, Jul 2, 2018 at 11:41 PM, Jeremy Linton <[email protected]> wrote:
> Hi,
>
> I'm experiencing two problems with commit 5088814a6e931 which is "ACPICA:
> AML parser: attempt to continue loading table after error"
>
> The first is this boot failure on a thunderX2:
>
> [ 10.770098] ACPI Error: Ignore error and continue table load
> (20180531/psobject-604)
> [ 10.777926] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000000
> [ 10.786809] Mem abort info:
> [ 10.789623] ESR = 0x96000004
> [ 10.792702] Exception class = DABT (current EL), IL = 32 bits
> [ 10.798682] SET = 0, FnV = 0
> [ 10.801760] EA = 0, S1PTW = 0
> [ 10.804925] Data abort info:
> [ 10.807827] ISV = 0, ISS = 0x00000004
> [ 10.811698] CM = 0, WnR = 0
> [ 10.814689] [0000000000000000] user address but active_mm is swapper
> [ 10.821108] Internal error: Oops: 96000004 [#1] SMP
> [ 10.826032] Modules linked in:
> [ 10.829113] CPU: 30 PID: 1 Comm: swapper/0 Not tainted
> 4.18.0-rc3PPTT4k+ #53
> [ 10.836234] Hardware name: Default string Cavium ThunderX2/Default
> string, BIOS L50_5.13_1.0.0 05/16/2018
> [ 10.845905] pstate: 00400009 (nzcv daif +PAN -UAO)
> [ 10.850746] pc : acpi_ps_peek_opcode+0x1c/0x40
> [ 10.855231] lr : acpi_ps_create_op+0x54/0x278
> [ 10.859627] sp : ffff000009a8ba30
> [ 10.862969] x29: ffff000009a8ba30 x28: 0000000054445353
> [ 10.868334] x27: 0000000000004008 x26: 0000000000000000
> [ 10.873698] x25: ffff000009767f23 x24: ffff000008d59000
> [ 10.879063] x23: ffff802672799030 x22: ffff000009a8bb28
> [ 10.884427] x21: 0000000000000000 x20: ffff000008d59000
> [ 10.889791] x19: ffff802672799030 x18: ffffffffffffffff
> [ 10.895155] x17: 0000000000000013 x16: 0000000000000000
> [ 10.900519] x15: ffff000008d59708 x14: 2d7463656a626f73
> [ 10.905883] x13: 702f313335303831 x12: 3032282064616f6c
> [ 10.911246] x11: 20656c6261742065 x10: 756e69746e6f6320
> [ 10.916610] x9 : 0000000000000058 x8 : ffff000008570998
> [ 10.921974] x7 : 203a726f72724520 x6 : 0000000000000334
> [ 10.927338] x5 : 0000000000000012 x4 : 0000000000000000
> [ 10.932701] x3 : 0000000000000000 x2 : ffff000009a8bb28
> [ 10.938065] x1 : 0000000000000000 x0 : ffff000008505790
> [ 10.943430] Process swapper/0 (pid: 1, stack limit =
> 0x(____ptrval____))
> [ 10.950199] Call trace:
> [ 10.952663] acpi_ps_peek_opcode+0x1c/0x40
> [ 10.956797] acpi_ps_create_op+0x54/0x278
> [ 10.960842] acpi_ps_parse_loop+0x1b4/0x6c8
> [ 10.965063] acpi_ps_parse_aml+0xe0/0x2b4
> [ 10.969108] acpi_ps_execute_table+0xa0/0x104
> [ 10.973505] acpi_ns_execute_table+0x120/0x194
> [ 10.977989] acpi_ns_parse_table+0x34/0x68
> [ 10.982122] acpi_ns_load_table+0x4c/0xbc
> [ 10.986169] acpi_tb_load_namespace+0x1d4/0x240
> [ 10.990744] acpi_load_tables+0x50/0xbc
> [ 10.994614] acpi_init+0xb8/0x374
> [ 10.997959] do_one_initcall+0x54/0x208
> [ 11.001829] kernel_init_freeable+0x224/0x300
> [ 11.006229] kernel_init+0x18/0x118
> [ 11.009747] ret_from_fork+0x10/0x18
> [ 11.013354] Code: aa0003f3 aa1e03e0 d503201f f9400661 (39400020)
> [ 11.019535] ---[ end trace 2bd8068593cf8acc ]---
> [ 11.024195] Kernel panic - not syncing: Fatal exception
> [ 11.029488] SMP: stopping secondary CPUs
> [ 11.033480] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Which does appear to be the result of some bad data in the table, but it was
> working with 4.17, and reverting this commit solves the problem.

But this commit fixes another regression which was more widespread.

Apparently, we can't work around all of the errors in the tables out
there at the same time. :-/

> Also the messages now newly being prefixed with '\n' are slightly corrupted
> like:
>
> "3ACPI BIOS Error (bug):"
>
> because the KERN_XXX macro is being encoded after the CR which keeps it from
> being processed correctly.

Yes, that's a known issue which should be fixed in -rc4.

Thanks,
Rafael

2018-07-02 22:33:13

by Jeremy Linton

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

Hi,

On 07/02/2018 04:52 PM, Rafael J. Wysocki wrote:
> On Mon, Jul 2, 2018 at 11:41 PM, Jeremy Linton <[email protected]> wrote:
>> Hi,
>>
>> I'm experiencing two problems with commit 5088814a6e931 which is "ACPICA:
>> AML parser: attempt to continue loading table after error"
>>
>> The first is this boot failure on a thunderX2:
>>
>> [ 10.770098] ACPI Error: Ignore error and continue table load
>> (20180531/psobject-604)
>> [ 10.777926] Unable to handle kernel NULL pointer dereference at
>> [ 10.950199] Call trace:
>> [ 10.952663] acpi_ps_peek_opcode+0x1c/0x40
>> [ 10.956797] acpi_ps_create_op+0x54/0x278
>> [ 10.960842] acpi_ps_parse_loop+0x1b4/0x6c8
>> [ 10.965063] acpi_ps_parse_aml+0xe0/0x2b4
>> [ 10.969108] acpi_ps_execute_table+0xa0/0x104
>> [ 10.973505] acpi_ns_execute_table+0x120/0x194
>> [ 10.977989] acpi_ns_parse_table+0x34/0x68
>> [ 10.982122] acpi_ns_load_table+0x4c/0xbc
>> [ 10.986169] acpi_tb_load_namespace+0x1d4/0x240
>> [ 10.990744] acpi_load_tables+0x50/0xbc
>> [ 10.994614] acpi_init+0xb8/0x374
>> [ 10.997959] do_one_initcall+0x54/0x208
>> [ 11.001829] kernel_init_freeable+0x224/0x300
>> [ 11.006229] kernel_init+0x18/0x118
>> [ 11.009747] ret_from_fork+0x10/0x18
>> [ 11.013354] Code: aa0003f3 aa1e03e0 d503201f f9400661 (39400020)
>> [ 11.019535] ---[ end trace 2bd8068593cf8acc ]---
>> [ 11.024195] Kernel panic - not syncing: Fatal exception
>> [ 11.029488] SMP: stopping secondary CPUs
>> [ 11.033480] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> Which does appear to be the result of some bad data in the table, but it was
>> working with 4.17, and reverting this commit solves the problem.
>
> But this commit fixes another regression which was more widespread.
>
> Apparently, we can't work around all of the errors in the tables out
> there at the same time. :-/

NP, Let me see if I can come up with a way to harden the
parse_loop/create_op code enough that it doesn't crash the machine.

>
>> Also the messages now newly being prefixed with '\n' are slightly corrupted
>> like:
>>
>> "3ACPI BIOS Error (bug):"
>>
>> because the KERN_XXX macro is being encoded after the CR which keeps it from
>> being processed correctly.
>
> Yes, that's a known issue which should be fixed in -rc4.

Oh.. Yes I see that now, thanks.

2018-07-03 07:53:06

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

On Tue, Jul 3, 2018 at 12:30 AM, Jeremy Linton <[email protected]> wrote:
> Hi,
>
> On 07/02/2018 04:52 PM, Rafael J. Wysocki wrote:
>>
>> On Mon, Jul 2, 2018 at 11:41 PM, Jeremy Linton <[email protected]>
>> wrote:
>>>
>>> Hi,
>>>
>>> I'm experiencing two problems with commit 5088814a6e931 which is "ACPICA:
>>> AML parser: attempt to continue loading table after error"
>>>
>>> The first is this boot failure on a thunderX2:
>>>
>>> [ 10.770098] ACPI Error: Ignore error and continue table load
>>> (20180531/psobject-604)
>>> [ 10.777926] Unable to handle kernel NULL pointer dereference at
>>> [ 10.950199] Call trace:
>>>
>>> [ 10.952663] acpi_ps_peek_opcode+0x1c/0x40
>>> [ 10.956797] acpi_ps_create_op+0x54/0x278
>>> [ 10.960842] acpi_ps_parse_loop+0x1b4/0x6c8
>>> [ 10.965063] acpi_ps_parse_aml+0xe0/0x2b4
>>> [ 10.969108] acpi_ps_execute_table+0xa0/0x104
>>> [ 10.973505] acpi_ns_execute_table+0x120/0x194
>>> [ 10.977989] acpi_ns_parse_table+0x34/0x68
>>> [ 10.982122] acpi_ns_load_table+0x4c/0xbc
>>> [ 10.986169] acpi_tb_load_namespace+0x1d4/0x240
>>> [ 10.990744] acpi_load_tables+0x50/0xbc
>>> [ 10.994614] acpi_init+0xb8/0x374
>>> [ 10.997959] do_one_initcall+0x54/0x208
>>> [ 11.001829] kernel_init_freeable+0x224/0x300
>>> [ 11.006229] kernel_init+0x18/0x118
>>> [ 11.009747] ret_from_fork+0x10/0x18
>>> [ 11.013354] Code: aa0003f3 aa1e03e0 d503201f f9400661 (39400020)
>>> [ 11.019535] ---[ end trace 2bd8068593cf8acc ]---
>>> [ 11.024195] Kernel panic - not syncing: Fatal exception
>>> [ 11.029488] SMP: stopping secondary CPUs
>>> [ 11.033480] ---[ end Kernel panic - not syncing: Fatal exception
>>> ]---
>>>
>>> Which does appear to be the result of some bad data in the table, but it
>>> was
>>> working with 4.17, and reverting this commit solves the problem.
>>
>>
>> But this commit fixes another regression which was more widespread.
>>
>> Apparently, we can't work around all of the errors in the tables out
>> there at the same time. :-/
>
>
> NP, Let me see if I can come up with a way to harden the
> parse_loop/create_op code enough that it doesn't crash the machine.

Sure. I'll look at it too.

2018-07-03 20:32:57

by Schmauss, Erik

[permalink] [raw]
Subject: RE: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"



> -----Original Message-----
> From: [email protected] [mailto:linux-acpi-
> [email protected]] On Behalf Of Rafael J. Wysocki
> Sent: Tuesday, July 3, 2018 12:52 AM
> To: Jeremy Linton <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>; Schmauss, Erik
> <[email protected]>; [email protected]; linux-
> [email protected]; Rafael J . Wysocki <[email protected]>; linux-arm-
> [email protected]; Lorenzo Pieralisi <[email protected]>
> Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to
> continue loading table after error"
>
> On Tue, Jul 3, 2018 at 12:30 AM, Jeremy Linton <[email protected]>
> wrote:
> > Hi,
> >
> > On 07/02/2018 04:52 PM, Rafael J. Wysocki wrote:
> >>
> >> On Mon, Jul 2, 2018 at 11:41 PM, Jeremy Linton
> >> <[email protected]>
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I'm experiencing two problems with commit 5088814a6e931 which is
> "ACPICA:
> >>> AML parser: attempt to continue loading table after error"
> >>>
> >>> The first is this boot failure on a thunderX2:
> >>>
> >>> [ 10.770098] ACPI Error: Ignore error and continue table load
> >>> (20180531/psobject-604)
> >>> [ 10.777926] Unable to handle kernel NULL pointer dereference at
> >>> [ 10.950199] Call trace:
> >>>
> >>> [ 10.952663] acpi_ps_peek_opcode+0x1c/0x40
> >>> [ 10.956797] acpi_ps_create_op+0x54/0x278
> >>> [ 10.960842] acpi_ps_parse_loop+0x1b4/0x6c8
> >>> [ 10.965063] acpi_ps_parse_aml+0xe0/0x2b4
> >>> [ 10.969108] acpi_ps_execute_table+0xa0/0x104
> >>> [ 10.973505] acpi_ns_execute_table+0x120/0x194
> >>> [ 10.977989] acpi_ns_parse_table+0x34/0x68
> >>> [ 10.982122] acpi_ns_load_table+0x4c/0xbc
> >>> [ 10.986169] acpi_tb_load_namespace+0x1d4/0x240
> >>> [ 10.990744] acpi_load_tables+0x50/0xbc
> >>> [ 10.994614] acpi_init+0xb8/0x374
> >>> [ 10.997959] do_one_initcall+0x54/0x208
> >>> [ 11.001829] kernel_init_freeable+0x224/0x300
> >>> [ 11.006229] kernel_init+0x18/0x118
> >>> [ 11.009747] ret_from_fork+0x10/0x18
> >>> [ 11.013354] Code: aa0003f3 aa1e03e0 d503201f f9400661 (39400020)
> >>> [ 11.019535] ---[ end trace 2bd8068593cf8acc ]---
> >>> [ 11.024195] Kernel panic - not syncing: Fatal exception
> >>> [ 11.029488] SMP: stopping secondary CPUs
> >>> [ 11.033480] ---[ end Kernel panic - not syncing: Fatal exception
> >>> ]---
> >>>
> >>> Which does appear to be the result of some bad data in the table,
> >>> but it was working with 4.17, and reverting this commit solves the
> >>> problem.
> >>
> >>
> >> But this commit fixes another regression which was more widespread.
> >>
> >> Apparently, we can't work around all of the errors in the tables out
> >> there at the same time. :-/
> >
> >
> > NP, Let me see if I can come up with a way to harden the
> > parse_loop/create_op code enough that it doesn't crash the machine.
>
> Sure. I'll look at it too.

I it looks like there are error cases that have yet to be implemented...
Jeremy, could send an ACPI dump of this thunderX2 machine?

Thanks,
Erik

> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in the body of
> a message to [email protected] More majordomo info at
> http://vger.kernel.org/majordomo-info.html

2018-07-08 09:17:01

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
> Hi,
>
> I'm experiencing two problems with commit 5088814a6e931 which is
> "ACPICA: AML parser: attempt to continue loading table after error"
>
> The first is this boot failure on a thunderX2:
>
> [ 10.770098] ACPI Error: Ignore error and continue table load
> (20180531/psobject-604)
> [ 10.777926] Unable to handle kernel NULL pointer dereference at
> virtual address 0000000000000000
> [ 10.786809] Mem abort info:
> [ 10.789623] ESR = 0x96000004
> [ 10.792702] Exception class = DABT (current EL), IL = 32 bits
> [ 10.798682] SET = 0, FnV = 0
> [ 10.801760] EA = 0, S1PTW = 0
> [ 10.804925] Data abort info:
> [ 10.807827] ISV = 0, ISS = 0x00000004
> [ 10.811698] CM = 0, WnR = 0
> [ 10.814689] [0000000000000000] user address but active_mm is swapper
> [ 10.821108] Internal error: Oops: 96000004 [#1] SMP
> [ 10.826032] Modules linked in:
> [ 10.829113] CPU: 30 PID: 1 Comm: swapper/0 Not tainted
> 4.18.0-rc3PPTT4k+ #53
> [ 10.836234] Hardware name: Default string Cavium
> ThunderX2/Default string, BIOS L50_5.13_1.0.0 05/16/2018
> [ 10.845905] pstate: 00400009 (nzcv daif +PAN -UAO)
> [ 10.850746] pc : acpi_ps_peek_opcode+0x1c/0x40
> [ 10.855231] lr : acpi_ps_create_op+0x54/0x278
> [ 10.859627] sp : ffff000009a8ba30
> [ 10.862969] x29: ffff000009a8ba30 x28: 0000000054445353
> [ 10.868334] x27: 0000000000004008 x26: 0000000000000000
> [ 10.873698] x25: ffff000009767f23 x24: ffff000008d59000
> [ 10.879063] x23: ffff802672799030 x22: ffff000009a8bb28
> [ 10.884427] x21: 0000000000000000 x20: ffff000008d59000
> [ 10.889791] x19: ffff802672799030 x18: ffffffffffffffff
> [ 10.895155] x17: 0000000000000013 x16: 0000000000000000
> [ 10.900519] x15: ffff000008d59708 x14: 2d7463656a626f73
> [ 10.905883] x13: 702f313335303831 x12: 3032282064616f6c
> [ 10.911246] x11: 20656c6261742065 x10: 756e69746e6f6320
> [ 10.916610] x9 : 0000000000000058 x8 : ffff000008570998
> [ 10.921974] x7 : 203a726f72724520 x6 : 0000000000000334
> [ 10.927338] x5 : 0000000000000012 x4 : 0000000000000000
> [ 10.932701] x3 : 0000000000000000 x2 : ffff000009a8bb28
> [ 10.938065] x1 : 0000000000000000 x0 : ffff000008505790
> [ 10.943430] Process swapper/0 (pid: 1, stack limit =
> 0x(____ptrval____))
> [ 10.950199] Call trace:
> [ 10.952663] acpi_ps_peek_opcode+0x1c/0x40
> [ 10.956797] acpi_ps_create_op+0x54/0x278
> [ 10.960842] acpi_ps_parse_loop+0x1b4/0x6c8
> [ 10.965063] acpi_ps_parse_aml+0xe0/0x2b4
> [ 10.969108] acpi_ps_execute_table+0xa0/0x104
> [ 10.973505] acpi_ns_execute_table+0x120/0x194
> [ 10.977989] acpi_ns_parse_table+0x34/0x68
> [ 10.982122] acpi_ns_load_table+0x4c/0xbc
> [ 10.986169] acpi_tb_load_namespace+0x1d4/0x240
> [ 10.990744] acpi_load_tables+0x50/0xbc
> [ 10.994614] acpi_init+0xb8/0x374
> [ 10.997959] do_one_initcall+0x54/0x208
> [ 11.001829] kernel_init_freeable+0x224/0x300
> [ 11.006229] kernel_init+0x18/0x118
> [ 11.009747] ret_from_fork+0x10/0x18
> [ 11.013354] Code: aa0003f3 aa1e03e0 d503201f f9400661 (39400020)
> [ 11.019535] ---[ end trace 2bd8068593cf8acc ]---
> [ 11.024195] Kernel panic - not syncing: Fatal exception
> [ 11.029488] SMP: stopping secondary CPUs
> [ 11.033480] ---[ end Kernel panic - not syncing: Fatal exception
> ]---
>
> Which does appear to be the result of some bad data in the table, but it
> was working with 4.17, and reverting this commit solves the problem.

Does the patch below make any difference?

---
drivers/acpi/acpica/psobject.c | 3 +++
1 file changed, 3 insertions(+)

Index: linux-pm/drivers/acpi/acpica/psobject.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/psobject.c
+++ linux-pm/drivers/acpi/acpica/psobject.c
@@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);

walk_state->aml = walk_state->parser_state.aml;
+ if (!walk_state->aml)
+ return AE_CTRL_PARSE_CONTINUE;
+
walk_state->opcode = acpi_ps_peek_opcode(&(walk_state->parser_state));

/*


2018-07-09 20:46:12

by Jeremy Linton

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

Hi,

First thanks for the patch..

On 07/08/2018 04:14 AM, Rafael J. Wysocki wrote:
> On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
>> Hi,
>>
>> I'm experiencing two problems with commit 5088814a6e931 which is
>> "ACPICA: AML parser: attempt to continue loading table after error"
>>
>> The first is this boot failure on a thunderX2:
>>
>> [ 10.770098] ACPI Error: Ignore error and continue table load

[trimming]

>> ]---
>>
>> Which does appear to be the result of some bad data in the table, but it
>> was working with 4.17, and reverting this commit solves the problem.
>
> Does the patch below make any difference?
>
> ---
> drivers/acpi/acpica/psobject.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> Index: linux-pm/drivers/acpi/acpica/psobject.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/psobject.c
> +++ linux-pm/drivers/acpi/acpica/psobject.c
> @@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
> ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);
>
> walk_state->aml = walk_state->parser_state.aml;
> + if (!walk_state->aml)
> + return AE_CTRL_PARSE_CONTINUE;
> +

Well this seems to avoid the crash, but now it hangs right after on the
"Ignore error and continue table load" message. I would post the DSDT
but I'm borrowing this machine, and I understand the firmware to be an
early release, so...

Anyway, now that I have access again, i'm turning up the debug
messaging, let me see if that points at whats causing the hang.


Thanks,




2018-07-09 21:30:24

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

On Mon, Jul 9, 2018 at 10:45 PM, Jeremy Linton <[email protected]> wrote:
> Hi,
>
> First thanks for the patch..
>
> On 07/08/2018 04:14 AM, Rafael J. Wysocki wrote:
>>
>> On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
>>>
>>> Hi,
>>>
>>> I'm experiencing two problems with commit 5088814a6e931 which is
>>> "ACPICA: AML parser: attempt to continue loading table after error"
>>>
>>> The first is this boot failure on a thunderX2:
>>>
>>> [ 10.770098] ACPI Error: Ignore error and continue table load
>
>
> [trimming]
>
>>> ]---
>>>
>>> Which does appear to be the result of some bad data in the table, but it
>>> was working with 4.17, and reverting this commit solves the problem.
>>
>>
>> Does the patch below make any difference?
>>
>> ---
>> drivers/acpi/acpica/psobject.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> Index: linux-pm/drivers/acpi/acpica/psobject.c
>> ===================================================================
>> --- linux-pm.orig/drivers/acpi/acpica/psobject.c
>> +++ linux-pm/drivers/acpi/acpica/psobject.c
>> @@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
>> ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);
>> walk_state->aml = walk_state->parser_state.aml;
>> + if (!walk_state->aml)
>> + return AE_CTRL_PARSE_CONTINUE;
>> +
>
>
> Well this seems to avoid the crash, but now it hangs right after on the
> "Ignore error and continue table load" message.

Well, maybe we should just abort in that case.

I'm wondering what happens if you replace the return statement in the
patch above with

return_ACPI_STATUS(AE_AML_BAD_OPCODE)


> I would post the DSDT but
> I'm borrowing this machine, and I understand the firmware to be an early
> release, so...

Please send the output of acpidump from it to Erik in private. That
will help him and Bob to make the code more robust.

> Anyway, now that I have access again, i'm turning up the debug messaging,
> let me see if that points at whats causing the hang.

OK, thanks!

2018-07-10 03:45:10

by Jeremy Linton

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

Hi,

On 07/09/2018 04:28 PM, Rafael J. Wysocki wrote:
> On Mon, Jul 9, 2018 at 10:45 PM, Jeremy Linton <[email protected]> wrote:
>> Hi,
>>
>> First thanks for the patch..
>>
>> On 07/08/2018 04:14 AM, Rafael J. Wysocki wrote:
>>>
>>> On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm experiencing two problems with commit 5088814a6e931 which is
>>>> "ACPICA: AML parser: attempt to continue loading table after error"
>>>>
>>>> The first is this boot failure on a thunderX2:
>>>>
>>>> [ 10.770098] ACPI Error: Ignore error and continue table load
>>
>>
>> [trimming]
>>
>>>> ]---
>>>>
>>>> Which does appear to be the result of some bad data in the table, but it
>>>> was working with 4.17, and reverting this commit solves the problem.
>>>
>>>
>>> Does the patch below make any difference?
>>>
>>> ---
>>> drivers/acpi/acpica/psobject.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> Index: linux-pm/drivers/acpi/acpica/psobject.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/acpi/acpica/psobject.c
>>> +++ linux-pm/drivers/acpi/acpica/psobject.c
>>> @@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
>>> ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);
>>> walk_state->aml = walk_state->parser_state.aml;
>>> + if (!walk_state->aml)
>>> + return AE_CTRL_PARSE_CONTINUE;
>>> +
>>
>>
>> Well this seems to avoid the crash, but now it hangs right after on the
>> "Ignore error and continue table load" message.
>
> Well, maybe we should just abort in that case.
>
> I'm wondering what happens if you replace the return statement in the
> patch above with
>
> return_ACPI_STATUS(AE_AML_BAD_OPCODE)

Yes, that is where I went when I applied the patch but I used
AE_CTRL_TERMINATE, which terminates the loop in acpi_ps_parse_loop() and
that appears to successfully finish/terminate the initial parsing pass.
But, it then crashes in acpi_ns_lookup called via the
acpi_walk_resources sequences that goes through ut_evalute_object() due
to the path/scope_info->scope.node being ACPI_ROOT_OBJECT (-1) and
bypassing the null check. Adding a ACPI_ROOT_OBJECT check as well as the
null checks in acpi_ns_lookup results in a successful boot. Tracking
down how the terminate (or whatever) is leaving the info->prefix_node
(in acpi_ns_evaluate) set to ROOT_OBJECT instead of null, is something I
don't yet understand.

Anyway, I tried Using BAD_OPCODE rather than TERMINATE and it seems to
have the same basic result as PARSE_CONTINUE.



2018-07-10 11:15:54

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

On Tuesday, July 10, 2018 5:44:05 AM CEST Jeremy Linton wrote:
> Hi,
>
> On 07/09/2018 04:28 PM, Rafael J. Wysocki wrote:
> > On Mon, Jul 9, 2018 at 10:45 PM, Jeremy Linton <[email protected]> wrote:
> >> Hi,
> >>
> >> First thanks for the patch..
> >>
> >> On 07/08/2018 04:14 AM, Rafael J. Wysocki wrote:
> >>>
> >>> On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm experiencing two problems with commit 5088814a6e931 which is
> >>>> "ACPICA: AML parser: attempt to continue loading table after error"
> >>>>
> >>>> The first is this boot failure on a thunderX2:
> >>>>
> >>>> [ 10.770098] ACPI Error: Ignore error and continue table load
> >>
> >>
> >> [trimming]
> >>
> >>>> ]---
> >>>>
> >>>> Which does appear to be the result of some bad data in the table, but it
> >>>> was working with 4.17, and reverting this commit solves the problem.
> >>>
> >>>
> >>> Does the patch below make any difference?
> >>>
> >>> ---
> >>> drivers/acpi/acpica/psobject.c | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>> Index: linux-pm/drivers/acpi/acpica/psobject.c
> >>> ===================================================================
> >>> --- linux-pm.orig/drivers/acpi/acpica/psobject.c
> >>> +++ linux-pm/drivers/acpi/acpica/psobject.c
> >>> @@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
> >>> ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);
> >>> walk_state->aml = walk_state->parser_state.aml;
> >>> + if (!walk_state->aml)
> >>> + return AE_CTRL_PARSE_CONTINUE;
> >>> +
> >>
> >>
> >> Well this seems to avoid the crash, but now it hangs right after on the
> >> "Ignore error and continue table load" message.
> >
> > Well, maybe we should just abort in that case.
> >
> > I'm wondering what happens if you replace the return statement in the
> > patch above with
> >
> > return_ACPI_STATUS(AE_AML_BAD_OPCODE)
>
> Yes, that is where I went when I applied the patch but I used
> AE_CTRL_TERMINATE, which terminates the loop in acpi_ps_parse_loop() and
> that appears to successfully finish/terminate the initial parsing pass.
> But, it then crashes in acpi_ns_lookup called via the
> acpi_walk_resources sequences that goes through ut_evalute_object() due
> to the path/scope_info->scope.node being ACPI_ROOT_OBJECT (-1) and
> bypassing the null check. Adding a ACPI_ROOT_OBJECT check as well as the
> null checks in acpi_ns_lookup results in a successful boot. Tracking
> down how the terminate (or whatever) is leaving the info->prefix_node
> (in acpi_ns_evaluate) set to ROOT_OBJECT instead of null, is something I
> don't yet understand.
>
> Anyway, I tried Using BAD_OPCODE rather than TERMINATE and it seems to
> have the same basic result as PARSE_CONTINUE.

OK, thanks!

I evidently didn't look deep enough.

Can you please check the patch below?

I'm not sure if we can pass this broken state to
acpi_ps_complete_final_op(), so it may be necessary to return
an error directly when aml_op_start is NULL.

---
drivers/acpi/acpica/psloop.c | 3 +++
1 file changed, 3 insertions(+)

Index: linux-pm/drivers/acpi/acpica/psloop.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/psloop.c
+++ linux-pm/drivers/acpi/acpica/psloop.c
@@ -493,6 +493,9 @@ acpi_status acpi_ps_parse_loop(struct ac
ASL_CV_CAPTURE_COMMENTS(walk_state);

aml_op_start = parser_state->aml;
+ if (!aml_op_start)
+ break;
+
if (!op) {
status =
acpi_ps_create_op(walk_state, aml_op_start, &op);


2018-07-10 11:29:15

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

On Tuesday, July 10, 2018 1:13:17 PM CEST Rafael J. Wysocki wrote:
> On Tuesday, July 10, 2018 5:44:05 AM CEST Jeremy Linton wrote:
> > Hi,
> >
> > On 07/09/2018 04:28 PM, Rafael J. Wysocki wrote:
> > > On Mon, Jul 9, 2018 at 10:45 PM, Jeremy Linton <[email protected]> wrote:
> > >> Hi,
> > >>
> > >> First thanks for the patch..
> > >>
> > >> On 07/08/2018 04:14 AM, Rafael J. Wysocki wrote:
> > >>>
> > >>> On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> I'm experiencing two problems with commit 5088814a6e931 which is
> > >>>> "ACPICA: AML parser: attempt to continue loading table after error"
> > >>>>
> > >>>> The first is this boot failure on a thunderX2:
> > >>>>
> > >>>> [ 10.770098] ACPI Error: Ignore error and continue table load
> > >>
> > >>
> > >> [trimming]
> > >>
> > >>>> ]---
> > >>>>
> > >>>> Which does appear to be the result of some bad data in the table, but it
> > >>>> was working with 4.17, and reverting this commit solves the problem.
> > >>>
> > >>>
> > >>> Does the patch below make any difference?
> > >>>
> > >>> ---
> > >>> drivers/acpi/acpica/psobject.c | 3 +++
> > >>> 1 file changed, 3 insertions(+)
> > >>>
> > >>> Index: linux-pm/drivers/acpi/acpica/psobject.c
> > >>> ===================================================================
> > >>> --- linux-pm.orig/drivers/acpi/acpica/psobject.c
> > >>> +++ linux-pm/drivers/acpi/acpica/psobject.c
> > >>> @@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
> > >>> ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);
> > >>> walk_state->aml = walk_state->parser_state.aml;
> > >>> + if (!walk_state->aml)
> > >>> + return AE_CTRL_PARSE_CONTINUE;
> > >>> +
> > >>
> > >>
> > >> Well this seems to avoid the crash, but now it hangs right after on the
> > >> "Ignore error and continue table load" message.
> > >
> > > Well, maybe we should just abort in that case.
> > >
> > > I'm wondering what happens if you replace the return statement in the
> > > patch above with
> > >
> > > return_ACPI_STATUS(AE_AML_BAD_OPCODE)
> >
> > Yes, that is where I went when I applied the patch but I used
> > AE_CTRL_TERMINATE, which terminates the loop in acpi_ps_parse_loop() and
> > that appears to successfully finish/terminate the initial parsing pass.
> > But, it then crashes in acpi_ns_lookup called via the
> > acpi_walk_resources sequences that goes through ut_evalute_object() due
> > to the path/scope_info->scope.node being ACPI_ROOT_OBJECT (-1) and
> > bypassing the null check. Adding a ACPI_ROOT_OBJECT check as well as the
> > null checks in acpi_ns_lookup results in a successful boot. Tracking
> > down how the terminate (or whatever) is leaving the info->prefix_node
> > (in acpi_ns_evaluate) set to ROOT_OBJECT instead of null, is something I
> > don't yet understand.
> >
> > Anyway, I tried Using BAD_OPCODE rather than TERMINATE and it seems to
> > have the same basic result as PARSE_CONTINUE.
>
> OK, thanks!
>
> I evidently didn't look deep enough.
>
> Can you please check the patch below?
>
> I'm not sure if we can pass this broken state to
> acpi_ps_complete_final_op(), so it may be necessary to return
> an error directly when aml_op_start is NULL.
>
> ---
> drivers/acpi/acpica/psloop.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> Index: linux-pm/drivers/acpi/acpica/psloop.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/psloop.c
> +++ linux-pm/drivers/acpi/acpica/psloop.c
> @@ -493,6 +493,9 @@ acpi_status acpi_ps_parse_loop(struct ac
> ASL_CV_CAPTURE_COMMENTS(walk_state);
>
> aml_op_start = parser_state->aml;
> + if (!aml_op_start)
> + break;
> +
> if (!op) {
> status =
> acpi_ps_create_op(walk_state, aml_op_start, &op);
>
> --

So maybe something like this:

---
drivers/acpi/acpica/psloop.c | 3 +++
1 file changed, 3 insertions(+)

Index: linux-pm/drivers/acpi/acpica/psloop.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/psloop.c
+++ linux-pm/drivers/acpi/acpica/psloop.c
@@ -494,6 +494,9 @@ acpi_status acpi_ps_parse_loop(struct ac

aml_op_start = parser_state->aml;
if (!op) {
+ if (!aml_op_start)
+ return_ACPI_STATUS(AE_AML_INTERNAL);
+
status =
acpi_ps_create_op(walk_state, aml_op_start, &op);
if (ACPI_FAILURE(status)) {



2018-07-10 15:06:31

by Jeremy Linton

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

Hi,

On 07/10/2018 06:25 AM, Rafael J. Wysocki wrote:
> On Tuesday, July 10, 2018 1:13:17 PM CEST Rafael J. Wysocki wrote:
>> On Tuesday, July 10, 2018 5:44:05 AM CEST Jeremy Linton wrote:
>>> Hi,
>>>
>>> On 07/09/2018 04:28 PM, Rafael J. Wysocki wrote:
>>>> On Mon, Jul 9, 2018 at 10:45 PM, Jeremy Linton <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> First thanks for the patch..
>>>>>
>>>>> On 07/08/2018 04:14 AM, Rafael J. Wysocki wrote:
>>>>>>
>>>>>> On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm experiencing two problems with commit 5088814a6e931 which is
>>>>>>> "ACPICA: AML parser: attempt to continue loading table after error"
>>>>>>>
>>>>>>> The first is this boot failure on a thunderX2:
>>>>>>>
>>>>>>> [ 10.770098] ACPI Error: Ignore error and continue table load
>>>>>
>>>>>
>>>>> [trimming]
>>>>>
>>>>>>> ]---
>>>>>>>
>>>>>>> Which does appear to be the result of some bad data in the table, but it
>>>>>>> was working with 4.17, and reverting this commit solves the problem.
>>>>>>
>>>>>>
>>>>>> Does the patch below make any difference?
>>>>>>
>>>>>> ---
>>>>>> drivers/acpi/acpica/psobject.c | 3 +++
>>>>>> 1 file changed, 3 insertions(+)
>>>>>>
>>>>>> Index: linux-pm/drivers/acpi/acpica/psobject.c
>>>>>> ===================================================================
>>>>>> --- linux-pm.orig/drivers/acpi/acpica/psobject.c
>>>>>> +++ linux-pm/drivers/acpi/acpica/psobject.c
>>>>>> @@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
>>>>>> ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);
>>>>>> walk_state->aml = walk_state->parser_state.aml;
>>>>>> + if (!walk_state->aml)
>>>>>> + return AE_CTRL_PARSE_CONTINUE;
>>>>>> +
>>>>>
>>>>>
>>>>> Well this seems to avoid the crash, but now it hangs right after on the
>>>>> "Ignore error and continue table load" message.
>>>>
>>>> Well, maybe we should just abort in that case.
>>>>
>>>> I'm wondering what happens if you replace the return statement in the
>>>> patch above with
>>>>
>>>> return_ACPI_STATUS(AE_AML_BAD_OPCODE)
>>>
>>> Yes, that is where I went when I applied the patch but I used
>>> AE_CTRL_TERMINATE, which terminates the loop in acpi_ps_parse_loop() and
>>> that appears to successfully finish/terminate the initial parsing pass.
>>> But, it then crashes in acpi_ns_lookup called via the
>>> acpi_walk_resources sequences that goes through ut_evalute_object() due
>>> to the path/scope_info->scope.node being ACPI_ROOT_OBJECT (-1) and
>>> bypassing the null check. Adding a ACPI_ROOT_OBJECT check as well as the
>>> null checks in acpi_ns_lookup results in a successful boot. Tracking
>>> down how the terminate (or whatever) is leaving the info->prefix_node
>>> (in acpi_ns_evaluate) set to ROOT_OBJECT instead of null, is something I
>>> don't yet understand.
>>>
>>> Anyway, I tried Using BAD_OPCODE rather than TERMINATE and it seems to
>>> have the same basic result as PARSE_CONTINUE.
>>
>> OK, thanks!
>>
>> I evidently didn't look deep enough.
>>
>> Can you please check the patch below?
>>
>> I'm not sure if we can pass this broken state to
>> acpi_ps_complete_final_op(), so it may be necessary to return
>> an error directly when aml_op_start is NULL.
>>
>> ---
>> drivers/acpi/acpica/psloop.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> Index: linux-pm/drivers/acpi/acpica/psloop.c
>> ===================================================================
>> --- linux-pm.orig/drivers/acpi/acpica/psloop.c
>> +++ linux-pm/drivers/acpi/acpica/psloop.c
>> @@ -493,6 +493,9 @@ acpi_status acpi_ps_parse_loop(struct ac
>> ASL_CV_CAPTURE_COMMENTS(walk_state);
>>
>> aml_op_start = parser_state->aml;
>> + if (!aml_op_start)
>> + break;
>> +
>> if (!op) {
>> status =
>> acpi_ps_create_op(walk_state, aml_op_start, &op);
>>
>> --
>
> So maybe something like this:
>
> ---
> drivers/acpi/acpica/psloop.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> Index: linux-pm/drivers/acpi/acpica/psloop.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/psloop.c
> +++ linux-pm/drivers/acpi/acpica/psloop.c
> @@ -494,6 +494,9 @@ acpi_status acpi_ps_parse_loop(struct ac
>
> aml_op_start = parser_state->aml;
> if (!op) {
> + if (!aml_op_start)
> + return_ACPI_STATUS(AE_AML_INTERNAL);
> +
> status =
> acpi_ps_create_op(walk_state, aml_op_start, &op);
> if (ACPI_FAILURE(status)) {
>
>

This gets rid of the infinite loop, but its still has the problem with
acpi_ns_lookup crashing due to -1 in the scope.node.

2018-07-10 17:03:25

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

On Tuesday, July 10, 2018 5:04:21 PM CEST Jeremy Linton wrote:
> Hi,
>
> On 07/10/2018 06:25 AM, Rafael J. Wysocki wrote:
> > On Tuesday, July 10, 2018 1:13:17 PM CEST Rafael J. Wysocki wrote:
> >> On Tuesday, July 10, 2018 5:44:05 AM CEST Jeremy Linton wrote:
> >>> Hi,
> >>>
> >>> On 07/09/2018 04:28 PM, Rafael J. Wysocki wrote:
> >>>> On Mon, Jul 9, 2018 at 10:45 PM, Jeremy Linton <[email protected]> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> First thanks for the patch..
> >>>>>
> >>>>> On 07/08/2018 04:14 AM, Rafael J. Wysocki wrote:
> >>>>>>
> >>>>>> On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I'm experiencing two problems with commit 5088814a6e931 which is
> >>>>>>> "ACPICA: AML parser: attempt to continue loading table after error"
> >>>>>>>
> >>>>>>> The first is this boot failure on a thunderX2:
> >>>>>>>
> >>>>>>> [ 10.770098] ACPI Error: Ignore error and continue table load
> >>>>>
> >>>>>
> >>>>> [trimming]
> >>>>>
> >>>>>>> ]---
> >>>>>>>
> >>>>>>> Which does appear to be the result of some bad data in the table, but it
> >>>>>>> was working with 4.17, and reverting this commit solves the problem.
> >>>>>>
> >>>>>>
> >>>>>> Does the patch below make any difference?
> >>>>>>
> >>>>>> ---
> >>>>>> drivers/acpi/acpica/psobject.c | 3 +++
> >>>>>> 1 file changed, 3 insertions(+)
> >>>>>>
> >>>>>> Index: linux-pm/drivers/acpi/acpica/psobject.c
> >>>>>> ===================================================================
> >>>>>> --- linux-pm.orig/drivers/acpi/acpica/psobject.c
> >>>>>> +++ linux-pm/drivers/acpi/acpica/psobject.c
> >>>>>> @@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
> >>>>>> ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);
> >>>>>> walk_state->aml = walk_state->parser_state.aml;
> >>>>>> + if (!walk_state->aml)
> >>>>>> + return AE_CTRL_PARSE_CONTINUE;
> >>>>>> +
> >>>>>
> >>>>>
> >>>>> Well this seems to avoid the crash, but now it hangs right after on the
> >>>>> "Ignore error and continue table load" message.
> >>>>
> >>>> Well, maybe we should just abort in that case.
> >>>>
> >>>> I'm wondering what happens if you replace the return statement in the
> >>>> patch above with
> >>>>
> >>>> return_ACPI_STATUS(AE_AML_BAD_OPCODE)
> >>>
> >>> Yes, that is where I went when I applied the patch but I used
> >>> AE_CTRL_TERMINATE, which terminates the loop in acpi_ps_parse_loop() and
> >>> that appears to successfully finish/terminate the initial parsing pass.
> >>> But, it then crashes in acpi_ns_lookup called via the
> >>> acpi_walk_resources sequences that goes through ut_evalute_object() due
> >>> to the path/scope_info->scope.node being ACPI_ROOT_OBJECT (-1) and
> >>> bypassing the null check. Adding a ACPI_ROOT_OBJECT check as well as the
> >>> null checks in acpi_ns_lookup results in a successful boot. Tracking
> >>> down how the terminate (or whatever) is leaving the info->prefix_node
> >>> (in acpi_ns_evaluate) set to ROOT_OBJECT instead of null, is something I
> >>> don't yet understand.
> >>>
> >>> Anyway, I tried Using BAD_OPCODE rather than TERMINATE and it seems to
> >>> have the same basic result as PARSE_CONTINUE.
> >>
> >> OK, thanks!
> >>
> >> I evidently didn't look deep enough.
> >>
> >> Can you please check the patch below?
> >>
> >> I'm not sure if we can pass this broken state to
> >> acpi_ps_complete_final_op(), so it may be necessary to return
> >> an error directly when aml_op_start is NULL.
> >>
> >> ---
> >> drivers/acpi/acpica/psloop.c | 3 +++
> >> 1 file changed, 3 insertions(+)
> >>
> >> Index: linux-pm/drivers/acpi/acpica/psloop.c
> >> ===================================================================
> >> --- linux-pm.orig/drivers/acpi/acpica/psloop.c
> >> +++ linux-pm/drivers/acpi/acpica/psloop.c
> >> @@ -493,6 +493,9 @@ acpi_status acpi_ps_parse_loop(struct ac
> >> ASL_CV_CAPTURE_COMMENTS(walk_state);
> >>
> >> aml_op_start = parser_state->aml;
> >> + if (!aml_op_start)
> >> + break;
> >> +
> >> if (!op) {
> >> status =
> >> acpi_ps_create_op(walk_state, aml_op_start, &op);
> >>
> >> --
> >
> > So maybe something like this:
> >
> > ---
> > drivers/acpi/acpica/psloop.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > Index: linux-pm/drivers/acpi/acpica/psloop.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/acpica/psloop.c
> > +++ linux-pm/drivers/acpi/acpica/psloop.c
> > @@ -494,6 +494,9 @@ acpi_status acpi_ps_parse_loop(struct ac
> >
> > aml_op_start = parser_state->aml;
> > if (!op) {
> > + if (!aml_op_start)
> > + return_ACPI_STATUS(AE_AML_INTERNAL);
> > +
> > status =
> > acpi_ps_create_op(walk_state, aml_op_start, &op);
> > if (ACPI_FAILURE(status)) {
> >
> >
>
> This gets rid of the infinite loop, but its still has the problem with
> acpi_ns_lookup crashing due to -1 in the scope.node.

OK, so do you mean that something like the patch below is needed for the
system to boot?

---
drivers/acpi/acpica/nsaccess.c | 3 ++-
drivers/acpi/acpica/psloop.c | 3 +++
2 files changed, 5 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/acpi/acpica/psloop.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/psloop.c
+++ linux-pm/drivers/acpi/acpica/psloop.c
@@ -494,6 +494,9 @@ acpi_status acpi_ps_parse_loop(struct ac

aml_op_start = parser_state->aml;
if (!op) {
+ if (!aml_op_start)
+ return_ACPI_STATUS(AE_AML_INTERNAL);
+
status =
acpi_ps_create_op(walk_state, aml_op_start, &op);
if (ACPI_FAILURE(status)) {
Index: linux-pm/drivers/acpi/acpica/nsaccess.c
===================================================================
--- linux-pm.orig/drivers/acpi/acpica/nsaccess.c
+++ linux-pm/drivers/acpi/acpica/nsaccess.c
@@ -286,7 +286,8 @@ acpi_ns_lookup(union acpi_generic_state

/* Get the prefix scope. A null scope means use the root scope */

- if ((!scope_info) || (!scope_info->scope.node)) {
+ if ((!scope_info) || (!scope_info->scope.node) ||
+ (scope_info->scope.node == ACPI_ROOT_OBJECT)) {
ACPI_DEBUG_PRINT((ACPI_DB_NAMES,
"Null scope prefix, using root node (%p)\n",
acpi_gbl_root_node));


2018-07-10 17:07:25

by Jeremy Linton

[permalink] [raw]
Subject: Re: 4.18rc3 TX2 boot failure with "ACPICA: AML parser: attempt to continue loading table after error"

On 07/10/2018 11:56 AM, Rafael J. Wysocki wrote:
> On Tuesday, July 10, 2018 5:04:21 PM CEST Jeremy Linton wrote:
>> Hi,
>>
>> On 07/10/2018 06:25 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, July 10, 2018 1:13:17 PM CEST Rafael J. Wysocki wrote:
>>>> On Tuesday, July 10, 2018 5:44:05 AM CEST Jeremy Linton wrote:
>>>>> Hi,
>>>>>
>>>>> On 07/09/2018 04:28 PM, Rafael J. Wysocki wrote:
>>>>>> On Mon, Jul 9, 2018 at 10:45 PM, Jeremy Linton <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> First thanks for the patch..
>>>>>>>
>>>>>>> On 07/08/2018 04:14 AM, Rafael J. Wysocki wrote:
>>>>>>>>
>>>>>>>> On Monday, July 2, 2018 11:41:42 PM CEST Jeremy Linton wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm experiencing two problems with commit 5088814a6e931 which is
>>>>>>>>> "ACPICA: AML parser: attempt to continue loading table after error"
>>>>>>>>>
>>>>>>>>> The first is this boot failure on a thunderX2:
>>>>>>>>>
>>>>>>>>> [ 10.770098] ACPI Error: Ignore error and continue table load
>>>>>>>
>>>>>>>
>>>>>>> [trimming]
>>>>>>>
>>>>>>>>> ]---
>>>>>>>>>
>>>>>>>>> Which does appear to be the result of some bad data in the table, but it
>>>>>>>>> was working with 4.17, and reverting this commit solves the problem.
>>>>>>>>
>>>>>>>>
>>>>>>>> Does the patch below make any difference?
>>>>>>>>
>>>>>>>> ---
>>>>>>>> drivers/acpi/acpica/psobject.c | 3 +++
>>>>>>>> 1 file changed, 3 insertions(+)
>>>>>>>>
>>>>>>>> Index: linux-pm/drivers/acpi/acpica/psobject.c
>>>>>>>> ===================================================================
>>>>>>>> --- linux-pm.orig/drivers/acpi/acpica/psobject.c
>>>>>>>> +++ linux-pm/drivers/acpi/acpica/psobject.c
>>>>>>>> @@ -39,6 +39,9 @@ static acpi_status acpi_ps_get_aml_opcod
>>>>>>>> ACPI_FUNCTION_TRACE_PTR(ps_get_aml_opcode, walk_state);
>>>>>>>> walk_state->aml = walk_state->parser_state.aml;
>>>>>>>> + if (!walk_state->aml)
>>>>>>>> + return AE_CTRL_PARSE_CONTINUE;
>>>>>>>> +
>>>>>>>
>>>>>>>
>>>>>>> Well this seems to avoid the crash, but now it hangs right after on the
>>>>>>> "Ignore error and continue table load" message.
>>>>>>
>>>>>> Well, maybe we should just abort in that case.
>>>>>>
>>>>>> I'm wondering what happens if you replace the return statement in the
>>>>>> patch above with
>>>>>>
>>>>>> return_ACPI_STATUS(AE_AML_BAD_OPCODE)
>>>>>
>>>>> Yes, that is where I went when I applied the patch but I used
>>>>> AE_CTRL_TERMINATE, which terminates the loop in acpi_ps_parse_loop() and
>>>>> that appears to successfully finish/terminate the initial parsing pass.
>>>>> But, it then crashes in acpi_ns_lookup called via the
>>>>> acpi_walk_resources sequences that goes through ut_evalute_object() due
>>>>> to the path/scope_info->scope.node being ACPI_ROOT_OBJECT (-1) and
>>>>> bypassing the null check. Adding a ACPI_ROOT_OBJECT check as well as the
>>>>> null checks in acpi_ns_lookup results in a successful boot. Tracking
>>>>> down how the terminate (or whatever) is leaving the info->prefix_node
>>>>> (in acpi_ns_evaluate) set to ROOT_OBJECT instead of null, is something I
>>>>> don't yet understand.
>>>>>
>>>>> Anyway, I tried Using BAD_OPCODE rather than TERMINATE and it seems to
>>>>> have the same basic result as PARSE_CONTINUE.
>>>>
>>>> OK, thanks!
>>>>
>>>> I evidently didn't look deep enough.
>>>>
>>>> Can you please check the patch below?
>>>>
>>>> I'm not sure if we can pass this broken state to
>>>> acpi_ps_complete_final_op(), so it may be necessary to return
>>>> an error directly when aml_op_start is NULL.
>>>>
>>>> ---
>>>> drivers/acpi/acpica/psloop.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> Index: linux-pm/drivers/acpi/acpica/psloop.c
>>>> ===================================================================
>>>> --- linux-pm.orig/drivers/acpi/acpica/psloop.c
>>>> +++ linux-pm/drivers/acpi/acpica/psloop.c
>>>> @@ -493,6 +493,9 @@ acpi_status acpi_ps_parse_loop(struct ac
>>>> ASL_CV_CAPTURE_COMMENTS(walk_state);
>>>>
>>>> aml_op_start = parser_state->aml;
>>>> + if (!aml_op_start)
>>>> + break;
>>>> +
>>>> if (!op) {
>>>> status =
>>>> acpi_ps_create_op(walk_state, aml_op_start, &op);
>>>>
>>>> --
>>>
>>> So maybe something like this:
>>>
>>> ---
>>> drivers/acpi/acpica/psloop.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> Index: linux-pm/drivers/acpi/acpica/psloop.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/acpi/acpica/psloop.c
>>> +++ linux-pm/drivers/acpi/acpica/psloop.c
>>> @@ -494,6 +494,9 @@ acpi_status acpi_ps_parse_loop(struct ac
>>>
>>> aml_op_start = parser_state->aml;
>>> if (!op) {
>>> + if (!aml_op_start)
>>> + return_ACPI_STATUS(AE_AML_INTERNAL);
>>> +
>>> status =
>>> acpi_ps_create_op(walk_state, aml_op_start, &op);
>>> if (ACPI_FAILURE(status)) {
>>>
>>>
>>
>> This gets rid of the infinite loop, but its still has the problem with
>> acpi_ns_lookup crashing due to -1 in the scope.node.
>
> OK, so do you mean that something like the patch below is needed for the
> system to boot?

Yes, the machine is back to the pre 4.18 behavior (with the addition of
a lot of additional ACPI error messages) boot with those two changes.

> ---
> drivers/acpi/acpica/nsaccess.c | 3 ++-
> drivers/acpi/acpica/psloop.c | 3 +++
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/acpi/acpica/psloop.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/psloop.c
> +++ linux-pm/drivers/acpi/acpica/psloop.c
> @@ -494,6 +494,9 @@ acpi_status acpi_ps_parse_loop(struct ac
>
> aml_op_start = parser_state->aml;
> if (!op) {
> + if (!aml_op_start)
> + return_ACPI_STATUS(AE_AML_INTERNAL);
> +
> status =
> acpi_ps_create_op(walk_state, aml_op_start, &op);
> if (ACPI_FAILURE(status)) {
> Index: linux-pm/drivers/acpi/acpica/nsaccess.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/acpica/nsaccess.c
> +++ linux-pm/drivers/acpi/acpica/nsaccess.c
> @@ -286,7 +286,8 @@ acpi_ns_lookup(union acpi_generic_state
>
> /* Get the prefix scope. A null scope means use the root scope */
>
> - if ((!scope_info) || (!scope_info->scope.node)) {
> + if ((!scope_info) || (!scope_info->scope.node) ||
> + (scope_info->scope.node == ACPI_ROOT_OBJECT)) {
> ACPI_DEBUG_PRINT((ACPI_DB_NAMES,
> "Null scope prefix, using root node (%p)\n",
> acpi_gbl_root_node));
>