Hi,
After a fresh git pull, the kernel from the Torvalds tree with the default debug
config failed to boot with an error that occurs prior to mounting filesystems, so
there is no log save for the screenshot(s) here:
[1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
# good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
# bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
.
.
.
# bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
# first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
The system is Ubuntu 22.04, with lshw output and config given in the attachment.
Best regards,
Mirsad Todorovac
On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
> Hi,
>
> After new git pull the kernel in Torvalds tree with default debug config
> failed to boot with error that occurs prior to mounting filesystems, so there
> is no log safe for the screenshot(s) here:
>
> [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
>
> Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
>
> # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
> git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
> # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
> git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
> .
> .
> .
> # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
> git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
> # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>
> The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
Can you show the early kernel log (something like dmesg)?
Anyway, I'm adding it to regzbot:
#regzbot ^introduced: 2d47c6956ab3c8
#regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
Thanks.
--
An old man doll... just what I always wanted! - Clara
On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
> On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
> > Hi,
> >
> > After new git pull the kernel in Torvalds tree with default debug config
> > failed to boot with error that occurs prior to mounting filesystems, so there
> > is no log safe for the screenshot(s) here:
> >
> > [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
> >
> > Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
> >
> > # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
> > git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
> > # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
> > git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
> > .
> > .
> > .
> > # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
> > git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
> > # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
> >
> > The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
>
> Can you show early kernel log (something like dmesg)?
>
> Anyway, I'm adding it to regzbot:
>
> #regzbot ^introduced: 2d47c6956ab3c8
> #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
tree... it's only in Linus's ToT.
Also, the config you included does not show CONFIG_UBSAN_BOUNDS_STRICT
as even being available, much less present. Something seems very wrong
with this report...
-Kees
--
Kees Cook
On 7/2/23 20:20, Kees Cook wrote:
> On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
>> On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
>>> Hi,
>>>
>>> After new git pull the kernel in Torvalds tree with default debug config
>>> failed to boot with error that occurs prior to mounting filesystems, so there
>>> is no log safe for the screenshot(s) here:
>>>
>>> [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
>>>
>>> Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
>>>
>>> # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
>>> git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
>>> # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
>>> git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
>>> .
>>> .
>>> .
>>> # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>> git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
>>> # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>
>>> The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
>>
>> Can you show early kernel log (something like dmesg)?
>>
>> Anyway, I'm adding it to regzbot:
>>
>> #regzbot ^introduced: 2d47c6956ab3c8
>> #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
>
> I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
> tree... it's only in Linus's ToT.
>
In ToT:
$ git describe 2d47c6956ab3
v6.4-rc2-1-g2d47c6956ab3
$ git describe --contains 2d47c6956ab3
next-20230616~2^2~51
$ git describe --contains --match 'v*' 2d47c6956ab3
fatal: cannot describe '2d47c6956ab3c8b580a59d7704aab3e2a4882b6c'
"git describe" always names the most recent tag the commit is based on,
which I guess was v6.4-rc2.
Guenter
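The difference between the two forms of "git describe" can be reproduced on a throwaway repository. This is only a sketch under stated assumptions: the repository is a toy one, and the tag name v6.4-rc2 is reused purely so the output resembles the one above.

```shell
#!/bin/sh
# Toy repository: one tagged base commit, one commit on top of it.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"

git commit -q --allow-empty -m "base"
git tag -a -m "toy release candidate" v6.4-rc2
git commit -q --allow-empty -m "topic commit"
topic=$(git rev-parse HEAD)

# Plain describe looks backwards: nearest tag reachable from the commit,
# plus the number of commits on top of it and the abbreviated hash.
described=$(git describe --abbrev=12 "$topic")
echo "$described"

# --contains looks forwards: a tag that already includes the commit.
# No such tag exists here, so the command fails.
if git describe --contains --match 'v*' "$topic" >/dev/null 2>&1; then
    contained=yes
else
    contained=no
fi
echo "contained in a v* tag: $contained"

# The same question asked directly: is the commit an ancestor of the tag?
if git merge-base --is-ancestor "$topic" v6.4-rc2; then
    in_tag=yes
else
    in_tag=no
fi
echo "ancestor of v6.4-rc2: $in_tag"
```

So a name of the form v6.4-rc2-1-g... only says where the commit's history branched from, not which release contains it.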
> Also, the config you included does not show CONFIG_UBSAN_BOUNDS_STRICT
> as even being available, much less present. Something seems very wrong
> with this report...
>
> -Kees
>
On 7/3/23 03:44, Bagas Sanjaya wrote:
> On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
>> Hi,
>>
>> After new git pull the kernel in Torvalds tree with default debug config
>> failed to boot with error that occurs prior to mounting filesystems, so there
>> is no log safe for the screenshot(s) here:
>>
>> [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
>>
>> Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
>>
>> # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
>> git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
>> # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
>> git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
>> .
>> .
>> .
>> # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>> git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
>> # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>
>> The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
>
> Can you show early kernel log (something like dmesg)?
No, the machine freezes after those screenfuls and I could only take a
screenshot.
> Anyway, I'm adding it to regzbot:
>
> #regzbot ^introduced: 2d47c6956ab3c8
> #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
>
> Thanks.
>
On 7/2/23 20:26, Guenter Roeck wrote:
> On 7/2/23 20:20, Kees Cook wrote:
>> On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
>>> On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
>>>> Hi,
>>>>
>>>> After new git pull the kernel in Torvalds tree with default debug config
>>>> failed to boot with error that occurs prior to mounting filesystems, so there
>>>> is no log safe for the screenshot(s) here:
>>>>
>>>> [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
>>>>
>>>> Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
>>>>
>>>> # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
>>>> git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
>>>> # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
>>>> git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
>>>> .
>>>> .
>>>> .
>>>> # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>> git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
>>>> # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>>
>>>> The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
>>>
>>> Can you show early kernel log (something like dmesg)?
>>>
>>> Anyway, I'm adding it to regzbot:
>>>
>>> #regzbot ^introduced: 2d47c6956ab3c8
>>> #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
>>
>> I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
>> tree... it's only in Linus's ToT.
>>
>
> In ToT:
>
> $ git describe 2d47c6956ab3
> v6.4-rc2-1-g2d47c6956ab3
>
> $ git describe --contains 2d47c6956ab3
> next-20230616~2^2~51
> $ git describe --contains --match 'v*' 2d47c6956ab3
> fatal: cannot describe '2d47c6956ab3c8b580a59d7704aab3e2a4882b6c'
>
> "git describe" always shows the parent tree, which I guess was based on
> v6.4-rc2.
>
Ah, sorry, I didn't realize that the subject claims that the problem
would be in 6.4.1. That indeed does not match the bisect results.
Guenter
> Guenter
>
>
>> Also, the config you included does not show CONFIG_UBSAN_BOUNDS_STRICT
>> as even being available, much less present. Something seems very wrong
>> with this report...
>>
>> -Kees
>>
>
On 7/3/23 05:26, Guenter Roeck wrote:
> On 7/2/23 20:20, Kees Cook wrote:
>> On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
>>> On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
>>>> Hi,
>>>>
>>>> After new git pull the kernel in Torvalds tree with default debug config
>>>> failed to boot with error that occurs prior to mounting filesystems, so there
>>>> is no log safe for the screenshot(s) here:
>>>>
>>>> [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
>>>>
>>>> Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
>>>>
>>>> # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
>>>> git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
>>>> # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
>>>> git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
>>>> .
>>>> .
>>>> .
>>>> # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>> git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
>>>> # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>>
>>>> The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
>>>
>>> Can you show early kernel log (something like dmesg)?
>>>
>>> Anyway, I'm adding it to regzbot:
>>>
>>> #regzbot ^introduced: 2d47c6956ab3c8
>>> #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
>>
>> I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
>> tree... it's only in Linus's ToT.
>>
>
> In ToT:
>
> $ git describe 2d47c6956ab3
> v6.4-rc2-1-g2d47c6956ab3
>
> $ git describe --contains 2d47c6956ab3
> next-20230616~2^2~51
> $ git describe --contains --match 'v*' 2d47c6956ab3
> fatal: cannot describe '2d47c6956ab3c8b580a59d7704aab3e2a4882b6c'
>
> "git describe" always shows the parent tree, which I guess was based on
> v6.4-rc2.
>
> Guenter
>
>
>> Also, the config you included does not show CONFIG_UBSAN_BOUNDS_STRICT
>> as even being available, much less present. Something seems very wrong
>> with this report...
>>
>> -Kees
Anyway, I have double checked and linux-image-6.4.0-rc2-crash boots while
linux-image-6.4.0-rc2-crash-00001-g2d47c6956ab3 freezes in early boot.
Of course, on the next boot dmesg is overwritten ... I could provide
only screenshots of the first screens.
The difference is only one commit.
It is a bit strange so I am available for any additional diagnostics.
Regards,
Mirsad
On Mon, Jul 03, 2023 at 05:53:48AM +0200, Mirsad Goran Todorovac wrote:
> On 7/3/23 05:26, Guenter Roeck wrote:
> > On 7/2/23 20:20, Kees Cook wrote:
> > > On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
> > > > On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
> > > > > Hi,
> > > > >
> > > > > After new git pull the kernel in Torvalds tree with default debug config
> > > > > failed to boot with error that occurs prior to mounting filesystems, so there
> > > > > is no log safe for the screenshot(s) here:
> > > > >
> > > > > [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
> > > > >
> > > > > Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
> > > > >
> > > > > # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
> > > > > git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
> > > > > # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
> > > > > git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
> > > > > .
> > > > > .
> > > > > .
> > > > > # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
> > > > > git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
> > > > > # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
> > > > >
> > > > > The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
> > > >
> > > > Can you show early kernel log (something like dmesg)?
> > > >
> > > > Anyway, I'm adding it to regzbot:
> > > >
> > > > #regzbot ^introduced: 2d47c6956ab3c8
> > > > #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
> > >
> > > I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
> > > tree... it's only in Linus's ToT.
> > >
> >
> > In ToT:
> >
> > $ git describe 2d47c6956ab3
> > v6.4-rc2-1-g2d47c6956ab3
> >
> > $ git describe --contains 2d47c6956ab3
> > next-20230616~2^2~51
> > $ git describe --contains --match 'v*' 2d47c6956ab3
> > fatal: cannot describe '2d47c6956ab3c8b580a59d7704aab3e2a4882b6c'
> >
> > "git describe" always shows the parent tree, which I guess was based on
> > v6.4-rc2.
> >
> > Guenter
> >
> >
> > > Also, the config you included does not show CONFIG_UBSAN_BOUNDS_STRICT
> > > as even being available, much less present. Something seems very wrong
> > > with this report...
> > >
> > > -Kees
>
> Anyway, I have double checked and linux-image-6.4.0-rc2-crash boots while
> linux-image-6.4.0-rc2-crash-00001-g2d47c6956ab3 freezes in early boot.
I don't understand what tree you're testing. 2d47c6956ab3 is only in
Linus's latest tree, which is not 6.4-rc2.
If you're testing Linus's tree, and you're bisecting to 2d47c6956ab3,
I don't understand why the .config you sent doesn't include
CONFIG_UBSAN_BOUNDS_STRICT (which was introduced by that commit) --
it should be visible whether or not it is selected.
> Of course, in the next boot dmesg appears overwritten ... I could provide
> only the first screen screenshots.
Without CONFIG_UBSAN_TRAP, I would not expect anything other than a
warning (i.e. boot would continue).
The only other thing I can think of that seems related (the backtrace
appears to show usb), might be this:
https://lore.kernel.org/lkml/[email protected]/
which won't appear until after v6.5-rc1.
> The difference is only one commit.
>
> It is a bit strange so I am available for any additional diagnostics.
Thanks! Can you send "grep UBSAN .config" output for the crashing kernel?
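A small sketch of what that check distinguishes, assuming a hypothetical sample config (the option values below are made up for illustration and are not the reporter's config). An option that is visible but deselected shows up as an "is not set" comment, while an option the tree does not know about is absent altogether.

```shell
#!/bin/sh
# Hypothetical sample; a real .config comes from the kernel build tree.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
CONFIG_UBSAN=y
# CONFIG_UBSAN_TRAP is not set
CONFIG_UBSAN_BOUNDS=y
CONFIG_UBSAN_BOUNDS_STRICT=y
CONFIG_KASAN=y
EOF

# The plain grep also catches the "is not set" comment lines.
grep UBSAN "$cfg"

# Report one option as its value, "unset" (visible but off), or "absent".
# $1 = option name without the CONFIG_ prefix, $2 = config file.
config_state() {
    if line=$(grep -E "^CONFIG_$1=" "$2"); then
        echo "${line#*=}"
    elif grep -q "^# CONFIG_$1 is not set" "$2"; then
        echo "unset"
    else
        echo "absent"
    fi
}

trap_state=$(config_state UBSAN_TRAP "$cfg")
strict_state=$(config_state UBSAN_BOUNDS_STRICT "$cfg")
echo "UBSAN_TRAP: $trap_state"
echo "UBSAN_BOUNDS_STRICT: $strict_state"
```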
Are you booting on an EFI-capable machine? If you could configure pstore
to use the EFI-vars backend, you can capture the crash in EFI and
pstorefs will show it after the next boot. (If you're using systemd,
this all may already be happening -- check /var/lib/systemd/pstore/
or see [1] for more details.)
-Kees
[1] https://www.freedesktop.org/software/systemd/man/systemd-pstore.service.html
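A rough sketch of how to look for saved crash records after the next successful boot. It assumes a kernel built with CONFIG_PSTORE and CONFIG_EFI_VARS_PSTORE and pstore mounted at the usual /sys/fs/pstore; the helper name list_pstore_records is made up for this example.

```shell
#!/bin/sh
# List regular files found in the given directories, skipping ones that
# do not exist. Reading /sys/fs/pstore typically requires root.
list_pstore_records() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -type f 2>/dev/null
    done
}

# Kernel-side mount point first, then the archive kept by systemd-pstore.
list_pstore_records /sys/fs/pstore /var/lib/systemd/pstore
```

If nothing is printed, either no crash was recorded or the backend is not enabled, and the two cases cannot be told apart from this output alone.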
--
Kees Cook
On 7/2/23 21:30, Kees Cook wrote:
> On Mon, Jul 03, 2023 at 05:53:48AM +0200, Mirsad Goran Todorovac wrote:
>> On 7/3/23 05:26, Guenter Roeck wrote:
>>> On 7/2/23 20:20, Kees Cook wrote:
>>>> On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
>>>>> On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
>>>>>> Hi,
>>>>>>
>>>>>> After new git pull the kernel in Torvalds tree with default debug config
>>>>>> failed to boot with error that occurs prior to mounting filesystems, so there
>>>>>> is no log safe for the screenshot(s) here:
>>>>>>
>>>>>> [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
>>>>>>
>>>>>> Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
>>>>>>
>>>>>> # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
>>>>>> git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
>>>>>> # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
>>>>>> git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
>>>>>> .
>>>>>> .
>>>>>> .
>>>>>> # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>>>> git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
>>>>>> # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>>>>
>>>>>> The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
>>>>>
>>>>> Can you show early kernel log (something like dmesg)?
>>>>>
>>>>> Anyway, I'm adding it to regzbot:
>>>>>
>>>>> #regzbot ^introduced: 2d47c6956ab3c8
>>>>> #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
>>>>
>>>> I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
>>>> tree... it's only in Linus's ToT.
>>>>
>>>
>>> In ToT:
>>>
>>> $ git describe 2d47c6956ab3
>>> v6.4-rc2-1-g2d47c6956ab3
>>>
>>> $ git describe --contains 2d47c6956ab3
>>> next-20230616~2^2~51
>>> $ git describe --contains --match 'v*' 2d47c6956ab3
>>> fatal: cannot describe '2d47c6956ab3c8b580a59d7704aab3e2a4882b6c'
>>>
>>> "git describe" always shows the parent tree, which I guess was based on
>>> v6.4-rc2.
>>>
>>> Guenter
>>>
>>>
>>>> Also, the config you included does not show CONFIG_UBSAN_BOUNDS_STRICT
>>>> as even being available, much less present. Something seems very wrong
>>>> with this report...
>>>>
>>>> -Kees
>>
>> Anyway, I have double checked and linux-image-6.4.0-rc2-crash boots while
>> linux-image-6.4.0-rc2-crash-00001-g2d47c6956ab3 freezes in early boot.
>
> I don't understand what tree you're testing. 2d47c6956ab3 is only in
> Linus's latest tree, which is not 6.4-rc2.
>
Maybe this?
$ git checkout -b testing 2d47c6956ab3
Updating files: 100% (15501/15501), done.
Switched to a new branch 'testing'
groeck@server:~/src/linux-staging$ git describe
v6.4-rc2-1-g2d47c6956ab3
Guenter
> If you're testing Linus's tree, and you're bisecting to 2d47c6956ab3,
> I don't understand why the .config you sent doesn't include
> CONFIG_UBSAN_BOUNDS_STRICT (which was introduced by that commit) --
> it should be visible whether or not it is selected.
>
>> Of course, in the next boot dmesg appears overwritten ... I could provide
>> only the first screen screenshots.
>
> Without CONFIG_UBSAN_TRAP, I would not expect anything other than a
> warning (i.e. boot would continue).
>
> The only other thing I can think of that seems related (the backtrace
> appears to show usb), might be this:
> https://lore.kernel.org/lkml/[email protected]/
> which won't appears until after v6.5-rc1.
>
>> The difference is only one commit.
>>
>> It is a bit strange so I am available for any additional diagnostics.
>
> Thanks! Can you send "grep UBSAN .config" output for the crashing kernel?
>
> Are you booting on an EFI-capable machine? If you could configure pstore
> to use the EFI-vars backend, you can capture the crash in EFI and
> pstorefs will show it after the next boot. (If you're using systemd,
> this all may already be happening -- check /var/lib/systemd/pstore/
> or see[1] for more details.)
>
> -Kees
>
> [1] https://www.freedesktop.org/software/systemd/man/systemd-pstore.service.html
>
On 7/3/23 06:30, Kees Cook wrote:
> On Mon, Jul 03, 2023 at 05:53:48AM +0200, Mirsad Goran Todorovac wrote:
>> On 7/3/23 05:26, Guenter Roeck wrote:
>>> On 7/2/23 20:20, Kees Cook wrote:
>>>> On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
>>>>> On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
>>>>>> Hi,
>>>>>>
>>>>>> After new git pull the kernel in Torvalds tree with default debug config
>>>>>> failed to boot with error that occurs prior to mounting filesystems, so there
>>>>>> is no log safe for the screenshot(s) here:
>>>>>>
>>>>>> [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
>>>>>>
>>>>>> Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
>>>>>>
>>>>>> # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
>>>>>> git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
>>>>>> # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
>>>>>> git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
>>>>>> .
>>>>>> .
>>>>>> .
>>>>>> # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>>>> git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
>>>>>> # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>>>>
>>>>>> The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
>>>>>
>>>>> Can you show early kernel log (something like dmesg)?
>>>>>
>>>>> Anyway, I'm adding it to regzbot:
>>>>>
>>>>> #regzbot ^introduced: 2d47c6956ab3c8
>>>>> #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
>>>>
>>>> I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
>>>> tree... it's only in Linus's ToT.
>>>>
>>>
>>> In ToT:
>>>
>>> $ git describe 2d47c6956ab3
>>> v6.4-rc2-1-g2d47c6956ab3
>>>
>>> $ git describe --contains 2d47c6956ab3
>>> next-20230616~2^2~51
>>> $ git describe --contains --match 'v*' 2d47c6956ab3
>>> fatal: cannot describe '2d47c6956ab3c8b580a59d7704aab3e2a4882b6c'
>>>
>>> "git describe" always shows the parent tree, which I guess was based on
>>> v6.4-rc2.
>>>
>>> Guenter
>>>
>>>
>>>> Also, the config you included does not show CONFIG_UBSAN_BOUNDS_STRICT
>>>> as even being available, much less present. Something seems very wrong
>>>> with this report...
>>>>
>>>> -Kees
>>
>> Anyway, I have double checked and linux-image-6.4.0-rc2-crash boots while
>> linux-image-6.4.0-rc2-crash-00001-g2d47c6956ab3 freezes in early boot.
>
> I don't understand what tree you're testing. 2d47c6956ab3 is only in
> Linus's latest tree, which is not 6.4-rc2.
> If you're testing Linus's tree, and you're bisecting to 2d47c6956ab3,
> I don't understand why the .config you sent doesn't include
> CONFIG_UBSAN_BOUNDS_STRICT (which was introduced by that commit) --
> it should be visible whether or not it is selected.
Hi, Mr. Cook,
I have cloned the Torvalds tree again and rebuilt both kernels with the config
attached.
linux-image-6.4.0-rc2-crash2 again boots, and linux-image-6.4.0-rc2-crash2-00001-g2d47c6956ab3
crashes during early boot. There is nothing from the -00001-g2d47c6956ab3 kernel in the
logs.
It is this very config and vanilla Torvalds tree from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Hope this helps.
Best regards,
Mirsad Todorovac
>> Of course, in the next boot dmesg appears overwritten ... I could provide
>> only the first screen screenshots.
>
> Without CONFIG_UBSAN_TRAP, I would not expect anything other than a
> warning (i.e. boot would continue).
>
> The only other thing I can think of that seems related (the backtrace
> appears to show usb), might be this:
> https://lore.kernel.org/lkml/[email protected]/
> which won't appears until after v6.5-rc1.
>
>> The difference is only one commit.
>>
>> It is a bit strange so I am available for any additional diagnostics.
>
> Thanks! Can you send "grep UBSAN .config" output for the crashing kernel?
>
> Are you booting on an EFI-capable machine? If you could configure pstore
> to use the EFI-vars backend, you can capture the crash in EFI and
> pstorefs will show it after the next boot. (If you're using systemd,
> this all may already be happening -- check /var/lib/systemd/pstore/
> or see[1] for more details.)
>
> -Kees
>
> [1] https://www.freedesktop.org/software/systemd/man/systemd-pstore.service.html
>
On 7/3/23 05:58, Guenter Roeck wrote:
> On 7/2/23 20:26, Guenter Roeck wrote:
>> On 7/2/23 20:20, Kees Cook wrote:
>>> On Mon, Jul 03, 2023 at 08:44:37AM +0700, Bagas Sanjaya wrote:
>>>> On Sun, Jul 02, 2023 at 06:36:12PM +0200, Mirsad Goran Todorovac wrote:
>>>>> Hi,
>>>>>
>>>>> After new git pull the kernel in Torvalds tree with default debug config
>>>>> failed to boot with error that occurs prior to mounting filesystems, so there
>>>>> is no log safe for the screenshot(s) here:
>>>>>
>>>>> [1] https://domac.alu.unizg.hr/~mtodorov/linux/crashes/2023-07-02/
>>>>>
>>>>> Bisect shows the first bad commit is 2d47c6956ab3 (v6.4-rc2-1-g2d47c6956ab3):
>>>>>
>>>>> # good: [98be618ad03010b1173fc3c35f6cbb4447ee2b07] Merge tag 'Smack-for-6.5' of https://github.com/cschaufler/smack-next
>>>>> git bisect good 98be618ad03010b1173fc3c35f6cbb4447ee2b07
>>>>> # bad: [f4a0659f823e5a828ea2f45b4849ea8e2dd2984c] drm/i2c: tda998x: Replace all non-returning strlcpy with strscpy
>>>>> git bisect bad f4a0659f823e5a828ea2f45b4849ea8e2dd2984c
>>>>> .
>>>>> .
>>>>> .
>>>>> # bad: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>>> git bisect bad 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c
>>>>> # first bad commit: [2d47c6956ab3c8b580a59d7704aab3e2a4882b6c] ubsan: Tighten UBSAN_BOUNDS on GCC
>>>>>
>>>>> The architecture is Ubuntu 22.04 with lshw and config give in the attachment.
>>>>
>>>> Can you show early kernel log (something like dmesg)?
>>>>
>>>> Anyway, I'm adding it to regzbot:
>>>>
>>>> #regzbot ^introduced: 2d47c6956ab3c8
>>>> #regzbot title: Linux kernel fails to boot due to UBSAN_BOUNDS tightening
>>>
>>> I'm confused. Commit 2d47c6956ab3c8b580a59d7704aab3e2a4882b6c isn't in the v6.4
>>> tree... it's only in Linus's ToT.
>>>
>>
>> In ToT:
>>
>> $ git describe 2d47c6956ab3
>> v6.4-rc2-1-g2d47c6956ab3
>>
>> $ git describe --contains 2d47c6956ab3
>> next-20230616~2^2~51
>> $ git describe --contains --match 'v*' 2d47c6956ab3
>> fatal: cannot describe '2d47c6956ab3c8b580a59d7704aab3e2a4882b6c'
>>
>> "git describe" always shows the parent tree, which I guess was based on
>> v6.4-rc2.
>>
>
> Ah, sorry, I didn't realize that the subject claims that the problem
> would be in 6.4.1. That indeed does not match the bisect results.
I apologise for the confusion. I cloned the tree after 6.4.1 was released,
but it was the Torvalds tree, not 6.4.1 from the stable branch, as the
Subject line might have suggested.
But I think the text explained that the Torvalds tree was cloned
and the method:
] After new git pull the kernel in Torvalds tree with default debug config
] failed to boot with error that occurs prior to mounting filesystems, so there
] is no log safe for the screenshot(s) here:
I will try to be more consistent and precise the next time.
Sorry again for the confusion.
I am right now cloning directly from the Torvalds tree for the third time
and with the Ubuntu generic production kernel and the result is the same:
crash in boot for 2d47c6956ab3.
Best regards,
Mirsad Todorovac
On Sun, Jul 02, 2023 at 09:38:50PM -0700, Guenter Roeck wrote:
> On 7/2/23 21:30, Kees Cook wrote:
> > I don't understand what tree you're testing. 2d47c6956ab3 is only in
> > Linus's latest tree, which is not 6.4-rc2.
> >
>
> Maybe this ?
>
> $ git checkout -b testing 2d47c6956ab3
> Updating files: 100% (15501/15501), done.
> Switched to a new branch 'testing'
> groeck@server:~/src/linux-staging$ git describe
> v6.4-rc2-1-g2d47c6956ab3
Oh, it's the bisection position -- 2d47c6956ab3 was based on v6.4-rc2.
Got it. Thank you!
-Kees
--
Kees Cook
On Mon, Jul 03, 2023 at 07:18:57AM +0200, Mirsad Goran Todorovac wrote:
> I apologise for confusion. In fact, I have cloned the Torvalds tree after
> 6.4.1 was released, but I actually cloned the Torvalds tree, not the 6.4.1
> from the stable branch as the Subject line might have misled.
Thanks, no worries! I got myself confused too. :)
The config you sent looks like I'd expect now too. Questions for you, if
you have time to diagnose further:
- Are you able to catch the very beginning of the crash, where the Oops
starts?
- Does pstore work for you to catch the crash?
- Can you try booting with this patch applied?
https://lore.kernel.org/lkml/[email protected]/
I'll try to see if I can figure out anything more from the images you
posted.
-Kees
--
Kees Cook
On 3.7.2023. 7:41, Kees Cook wrote:
> On Mon, Jul 03, 2023 at 07:18:57AM +0200, Mirsad Goran Todorovac wrote:
>> I apologise for confusion. In fact, I have cloned the Torvalds tree after
>> 6.4.1 was released, but I actually cloned the Torvalds tree, not the 6.4.1
>> from the stable branch as the Subject line might have misled.
>
> Thanks, no worries! I got myself confused too. :)
>
> The config you sent looks like I'd expect now too. Questions for you, if
> you have time to diagnose further:
>
> - Are you able to catch the very beginning of the crash, where the Oops
> starts?
It scrolls up very quickly. Couldn't catch that with the camera.
> - Does pstore work for you to catch the crash?
Haven't tried that yet. I will have to do some homework.
> - Can you try booting with this patch applied?
> https://lore.kernel.org/lkml/[email protected]/
Sure, but after 4 PM UTC+02 I suppose.
> I'll try to see if I can figure out anything more from the images you
> posted.
I really couldn't figure out myself what went wrong with this one?
Best regards,
Mirsad Todorovac
On Mon, Jul 03, 2023 at 09:03:38AM +0200, Mirsad Goran Todorovac wrote:
> On 3.7.2023. 7:41, Kees Cook wrote:
> > On Mon, Jul 03, 2023 at 07:18:57AM +0200, Mirsad Goran Todorovac wrote:
> > > I apologise for confusion. In fact, I have cloned the Torvalds tree after
> > > 6.4.1 was released, but I actually cloned the Torvalds tree, not the 6.4.1
> > > from the stable branch as the Subject line might have misled.
> >
> > Thanks, no worries! I got myself confused too. :)
> >
> > The config you sent looks like I'd expect now too. Questions for you, if
> > you have time to diagnose further:
> >
> > - Are you able to catch the very beginning of the crash, where the Oops
> > starts?
>
> It scrolls up very quickly. Couldn't catch that with the camera.
>
> > - Does pstore work for you to catch the crash?
>
> Haven't tried that yet. I will have to do some homework.
Try adding this to the .config:
# Enable PSTORE support
CONFIG_PSTORE=y
CONFIG_PSTORE_DEFAULT_KMSG_BYTES=10240
CONFIG_PSTORE_COMPRESS=y
CONFIG_PSTORE_DEFLATE_COMPRESS=y
# Enable UEFI pstore backend
CONFIG_EFI_VARS_PSTORE=y
# CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE is not set
# Enable ACPI ERST pstore backend
CONFIG_ACPI=y
CONFIG_ACPI_APEI=y
A good write-up about using it is here:
https://blogs.oracle.com/linux/post/pstore-linux-kernel-persistent-storage-file-system
and covers the systemd-pstore details too. Note that in the config I
suggested, I've enabled the efi backend by default.
> > - Can you try booting with this patch applied?
> > https://lore.kernel.org/lkml/[email protected]/
>
> Sure, but after 4 PM UTC+02 I suppose.
Cool. xhci-hub is in your backtrace, and the above patch was made for
something very similar (though, again, I don't see why you're getting a
_crash_, it should _warn_ and continue normally). And, actually, also
include this patch:
https://lore.kernel.org/lkml/[email protected]/
> > I'll try to see if I can figure out anything more from the images you
> > posted.
Yeah, the xhci-hub bit is the only clue I can see here. It's also in the
IRQ handler, which reminds me of this bug, where we still don't have a
root cause for the _crash_ during the warning:
https://lore.kernel.org/oe-lkp/202306131354.A499DE60@keescook/
but the new patch I linked to above fixes the source of the warning.
> I really couldn't figure out myself what went wrong with this one?
Having the crash scroll off the page is pretty frustrating. I wonder if
the kernel crash handler could be changed to repeat the RIP at the end of
the crash...
-Kees
--
Kees Cook
On Mon, Jul 03, 2023 at 12:03:23PM -0700, Kees Cook wrote:
> Cool. xhci-hub is in your backtrace, and the above patch was made for
> something very similar (though, again, I don't see why you're getting a
> _crash_, it should _warn_ and continue normally). And, actually, also
> include this patch:
> https://lore.kernel.org/lkml/[email protected]/
This is now in Linus's tree:
09b69dd4378b ("usb: ch9: Replace 1-element array with flexible array")
Please also still try with the first patch I mentioned, which is very similar:
https://lore.kernel.org/lkml/[email protected]/
-Kees
--
Kees Cook
On 7/4/23 01:09, Kees Cook wrote:
> On Mon, Jul 03, 2023 at 12:03:23PM -0700, Kees Cook wrote:
>> Cool. xhci-hub is in your backtrace, and the above patch was made for
>> something very similar (though, again, I don't see why you're getting a
>> _crash_, it should _warn_ and continue normally). And, actually, also
>> include this patch:
>> https://lore.kernel.org/lkml/[email protected]/
>
> This is now in Linus's tree:
> 09b69dd4378b ("usb: ch9: Replace 1-element array with flexible array")
>
> Please also still try with the first patch I mentioned, which is very similar:
> https://lore.kernel.org/lkml/[email protected]/
Hi,
I have finally built with both patches (the recommended PSTORE settings were
already the default).
This second patch fixes the booting problem, but alas there is still a problem -
all Wayland and X11.org GUI applications fail to start, with errors like this one:
Jul 4 19:09:07 defiant kernel: [ 40.529719] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Jul 4 19:09:07 defiant kernel: [ 40.529723] CPU: 0 PID: 3492 Comm: thunderbird Not tainted 6.4.0-rc2-crash2-kees2-00001-g2d47c6956ab3-dirty #5
Jul 4 19:09:07 defiant kernel: [ 40.529725] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
Jul 4 19:09:07 defiant kernel: [ 40.529726] RIP: 0010:alloc_pid+0x46c/0x480
Jul 4 19:09:07 defiant kernel: [ 40.529730] Code: 00 92 49 c7 c4 f4 ff ff ff e8 50 bc 15 01 4c 89 ff e8 68 50 13 00 e9 ec fd ff ff be 02 00 00 00 e8 89 5f 71 00 e9 f8 fe ff ff <0f> 0b 49 c7 c4 f4 ff ff ff e9 b9 fb ff ff 66 0f 1f 44 00 00 90 90
Jul 4 19:09:07 defiant kernel: [ 40.529731] RSP: 0018:ffffad8c45313c48 EFLAGS: 00010202
Jul 4 19:09:07 defiant kernel: [ 40.529733] RAX: 0000000080000000 RBX: 0000000000000001 RCX: 0000000000000000
Jul 4 19:09:07 defiant kernel: [ 40.529734] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jul 4 19:09:07 defiant kernel: [ 40.529734] RBP: ffffad8c45313c98 R08: 0000000000000000 R09: 0000000000000000
Jul 4 19:09:07 defiant kernel: [ 40.529735] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9cbdff1c63a8
Jul 4 19:09:07 defiant kernel: [ 40.529735] R13: ffff9cbde9b08750 R14: 0000000000000001 R15: ffff9cbdff1c63a8
Jul 4 19:09:07 defiant kernel: [ 40.529736] FS: 00007f50d863e780(0000) GS:ffff9ccc97a00000(0000) knlGS:0000000000000000
Jul 4 19:09:07 defiant kernel: [ 40.529737] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 4 19:09:07 defiant kernel: [ 40.529737] CR2: 0000000000000000 CR3: 00000001b0ae0000 CR4: 0000000000750ef0
Jul 4 19:09:07 defiant kernel: [ 40.529738] PKRU: 55555554
Jul 4 19:09:07 defiant kernel: [ 40.529739] Call Trace:
Jul 4 19:09:07 defiant kernel: [ 40.529739] <TASK>
Jul 4 19:09:07 defiant kernel: [ 40.529741] copy_process+0x165f/0x2110
Jul 4 19:09:07 defiant kernel: [ 40.529744] kernel_clone+0x9d/0x3a0
Jul 4 19:09:07 defiant kernel: [ 40.529745] ? find_held_lock+0x31/0xa0
Jul 4 19:09:07 defiant kernel: [ 40.529747] ? mntput_no_expire+0x89/0x4f0
Jul 4 19:09:07 defiant kernel: [ 40.529749] ? lock_release+0xc4/0x270
Jul 4 19:09:07 defiant kernel: [ 40.529751] __do_sys_clone+0x66/0xa0
Jul 4 19:09:07 defiant kernel: [ 40.529754] __x64_sys_clone+0x25/0x40
Jul 4 19:09:07 defiant kernel: [ 40.529755] do_syscall_64+0x59/0x90
Jul 4 19:09:07 defiant kernel: [ 40.529758] ? syscall_exit_to_user_mode+0x39/0x60
Jul 4 19:09:07 defiant kernel: [ 40.529760] ? do_syscall_64+0x69/0x90
Jul 4 19:09:07 defiant kernel: [ 40.529761] ? irqentry_exit_to_user_mode+0x27/0x40
Jul 4 19:09:07 defiant kernel: [ 40.529762] ? irqentry_exit+0x77/0xb0
Jul 4 19:09:07 defiant kernel: [ 40.529764] ? exc_page_fault+0xae/0x240
Jul 4 19:09:07 defiant kernel: [ 40.529765] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Jul 4 19:09:07 defiant kernel: [ 40.529767] RIP: 0033:0x7f50d811ea3d
Jul 4 19:09:07 defiant kernel: [ 40.529769] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
Jul 4 19:09:07 defiant kernel: [ 40.529770] RSP: 002b:00007ffcc449ce58 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
Jul 4 19:09:07 defiant kernel: [ 40.529771] RAX: ffffffffffffffda RBX: 0000000000000051 RCX: 00007f50d811ea3d
Jul 4 19:09:07 defiant kernel: [ 40.529771] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000030000011
Jul 4 19:09:07 defiant kernel: [ 40.529772] RBP: 0000000000000001 R08: 0000000000000000 R09: 00007f50d82b97c0
Jul 4 19:09:07 defiant kernel: [ 40.529772] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000011
Jul 4 19:09:07 defiant kernel: [ 40.529773] R13: 00007f50d7e16980 R14: 00007f50d863e6c0 R15: 00007f50d82ba3c0
Jul 4 19:09:07 defiant kernel: [ 40.529775] </TASK>
Jul 4 19:09:07 defiant kernel: [ 40.529776] Modules linked in: binfmt_misc f2fs crc32_generic lz4hc_compress lz4_compress nls_iso8859_1 intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi edac_mce_amd crct10dif_pclmul snd_hda_intel polyval_clmulni snd_intel_dspcfg polyval_generic ghash_clmulni_intel snd_intel_sdw_acpi snd_seq_midi sha512_ssse3 snd_seq_midi_event snd_hda_codec aesni_intel snd_hda_core crypto_simd cryptd snd_hwdep joydev input_leds snd_rawmidi rapl amdgpu snd_pcm ccp wmi_bmof snd_seq k10temp snd_seq_device iommu_v2 snd_timer drm_buddy gpu_sched drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec snd drm_kms_helper i2c_algo_bit syscopyarea sysfillrect sysimgblt soundcore mac_hid sch_fq_codel msr parport_pc ppdev lp parport ramoops pstore_blk reed_solomon pstore_zone fuse efi_pstore drm ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_generic nvme nvme_core ahci xhci_pci i2c_piix4 crc32_pclmul nvme_common libahci xhci_pci_renesas r8169 realtek video wmi
Jul 4 19:09:07 defiant kernel: [ 40.529799] gpio_amdpt
Jul 4 19:09:07 defiant kernel: [ 40.529801] ---[ end trace 0000000000000000 ]---
Jul 4 19:09:07 defiant kernel: [ 40.865489] RIP: 0010:alloc_pid+0x46c/0x480
Jul 4 19:09:07 defiant kernel: [ 40.865491] Code: 00 92 49 c7 c4 f4 ff ff ff e8 50 bc 15 01 4c 89 ff e8 68 50 13 00 e9 ec fd ff ff be 02 00 00 00 e8 89 5f 71 00 e9 f8 fe ff ff <0f> 0b 49 c7 c4 f4 ff ff ff e9 b9 fb ff ff 66 0f 1f 44 00 00 90 90
Jul 4 19:09:07 defiant kernel: [ 40.865492] RSP: 0018:ffffad8c45313c48 EFLAGS: 00010202
Jul 4 19:09:07 defiant kernel: [ 40.865494] RAX: 0000000080000000 RBX: 0000000000000001 RCX: 0000000000000000
Jul 4 19:09:07 defiant kernel: [ 40.865495] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jul 4 19:09:07 defiant kernel: [ 40.865495] RBP: ffffad8c45313c98 R08: 0000000000000000 R09: 0000000000000000
Jul 4 19:09:07 defiant kernel: [ 40.865496] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9cbdff1c63a8
Jul 4 19:09:07 defiant kernel: [ 40.865497] R13: ffff9cbde9b08750 R14: 0000000000000001 R15: ffff9cbdff1c63a8
Jul 4 19:09:07 defiant kernel: [ 40.865497] FS: 00007f50d863e780(0000) GS:ffff9ccc97a00000(0000) knlGS:0000000000000000
Jul 4 19:09:07 defiant kernel: [ 40.865498] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 4 19:09:07 defiant kernel: [ 40.865499] CR2: 0000000000000000 CR3: 00000001b0ae0000 CR4: 0000000000750ef0
Jul 4 19:09:07 defiant kernel: [ 40.865500] PKRU: 55555554
The interpretation of these findings is beyond the scope of my knowledge.
I hope you can make any use of them.
Best regards,
Mirsad Todorovac
On July 4, 2023 10:20:11 AM PDT, Mirsad Todorovac <[email protected]> wrote:
>On 7/4/23 01:09, Kees Cook wrote:> On Mon, Jul 03, 2023 at 12:03:23PM -0700, Kees Cook wrote:
>>> Cool. xhci-hub is in your backtrace, and the above patch was made for
>>> something very similar (though, again, I don't see why you're getting a
>>> _crash_, it should _warn_ and continue normally). And, actually, also
>>> include this patch:
>>> https://lore.kernel.org/lkml/[email protected]/
>>
>> This is now in Linus's tree:
>> 09b69dd4378b ("usb: ch9: Replace 1-element array with flexible array")
>>
>> Please also still try with the first patch I mentioned, which is very similar:
>> https://lore.kernel.org/lkml/[email protected]/
>
>Hi,
>
>I have finally built w both patches (and recommended PSTORE settings were
>default already).
Were you able to find the crashes saved by pstore?
>
>This second patch fixes the booting problem, but alas there is still a problem -
Ah! That's great! There is still an unexpected crash source, but the trigger is fixed.
>all Wayland and X11.org GUI applications fail to start, with errors like this one:
>
>Jul 4 19:09:07 defiant kernel: [ 40.529719] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
Hmm, is CONFIG_UBSAN_TRAP set?
>Jul 4 19:09:07 defiant kernel: [ 40.529726] RIP: 0010:alloc_pid+0x46c/0x480
Hmm, is this patch in your kernel?
https://git.kernel.org/linus/b69f0aeb068980af983d399deafc7477cec8bc04
--
Kees Cook
On 7/4/23 23:36, Kees Cook wrote:
> On July 4, 2023 10:20:11 AM PDT, Mirsad Todorovac <[email protected]> wrote:
>> On 7/4/23 01:09, Kees Cook wrote:> On Mon, Jul 03, 2023 at 12:03:23PM -0700, Kees Cook wrote:
>>>> Cool. xhci-hub is in your backtrace, and the above patch was made for
>>>> something very similar (though, again, I don't see why you're getting a
>>>> _crash_, it should _warn_ and continue normally). And, actually, also
>>>> include this patch:
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> This is now in Linus's tree:
>>> 09b69dd4378b ("usb: ch9: Replace 1-element array with flexible array")
>>>
>>> Please also still try with the first patch I mentioned, which is very similar:
>>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Hi,
>>
>> I have finally built w both patches (and recommended PSTORE settings were
>> default already).
>
> Were you able to find the crashes saved by pstore?
No, only lkdtm and invalid opcode crashes ...
P.S.
Actually, I have recovered some pstore records. Please find them in the attachment:
>> This second patch fixes the booting problem, but alas there is still a problem -
>
> Ah! That's great! There is still an unexpected crash source, but the trigger is fixed.
Glad I could be of help.
>> all Wayland and X11.org GUI applications fail to start, with errors like this one:
>>
>> Jul 4 19:09:07 defiant kernel: [ 40.529719] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>
> Hmm, is CONFIG_UBSAN_TRAP set?
marvin@defiant:~/linux/kernel/linux_torvalds$ grep CONFIG_UBSAN_TRAP .config
CONFIG_UBSAN_TRAP=y
marvin@defiant:~/linux/kernel/linux_torvalds$
>> Jul 4 19:09:07 defiant kernel: [ 40.529726] RIP: 0010:alloc_pid+0x46c/0x480
>
> Hmm, is this patch in your kernel?
> https://git.kernel.org/linus/b69f0aeb068980af983d399deafc7477cec8bc04
No, it wasn't. I had only these:
marvin@defiant:~/linux/kernel/linux_torvalds$ more ../kees-[12].patch
::::::::::::::
../kees-1.patch
::::::::::::::
diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
index b17e3a21b15f..82ec6af71a1d 100644
--- a/include/uapi/linux/usb/ch9.h
+++ b/include/uapi/linux/usb/ch9.h
@@ -376,7 +376,10 @@ struct usb_string_descriptor {
__u8 bLength;
__u8 bDescriptorType;
- __le16 wData[1]; /* UTF-16LE encoded */
+ union {
+ __le16 legacy_padding;
+ __DECLARE_FLEX_ARRAY(__le16, wData); /* UTF-16LE encoded */
+ };
} __attribute__ ((packed));
/* note that "string" zero is special, it holds language codes that
::::::::::::::
../kees-2.patch
::::::::::::::
diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
index b17e3a21b15f..3ff98c7ba7e3 100644
--- a/include/uapi/linux/usb/ch9.h
+++ b/include/uapi/linux/usb/ch9.h
@@ -981,7 +981,11 @@ struct usb_ssp_cap_descriptor {
#define USB_SSP_MIN_RX_LANE_COUNT (0xf << 8)
#define USB_SSP_MIN_TX_LANE_COUNT (0xf << 12)
__le16 wReserved;
- __le32 bmSublinkSpeedAttr[1]; /* list of sublink speed attrib entries */
+ union {
+ __le32 legacy_padding;
+ /* list of sublink speed attrib entries */
+ __DECLARE_FLEX_ARRAY(__le32, bmSublinkSpeedAttr);
+ };
#define USB_SSP_SUBLINK_SPEED_SSID (0xf) /* sublink speed ID */
#define USB_SSP_SUBLINK_SPEED_LSE (0x3 << 4) /* Lanespeed exponent */
#define USB_SSP_SUBLINK_SPEED_LSE_BPS 0
marvin@defiant:~/linux/kernel/linux_torvalds$
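(Note for readers of the thread: below is a minimal userspace sketch, not the
kernel header itself, of what both patches above do to the struct layout. The
macro DECLARE_FLEX_ARRAY, the struct names string_desc_old / string_desc_new
and the helper string_desc_units() are made up for this illustration; the
macro mimics the kernel's __DECLARE_FLEX_ARRAY() and assumes GCC or Clang,
because it relies on the GNU empty-struct extension. As far as I understand
the bisected commit, a trailing 1-element array is now bounds-checked as
exactly one element, so indexing past wData[0] gets flagged; the union with
legacy_padding keeps sizeof() and the member offset unchanged while turning
wData into a real flexible array.)

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's __DECLARE_FLEX_ARRAY().
 * Needs GCC or Clang: an empty struct is a GNU C extension. */
#define DECLARE_FLEX_ARRAY(TYPE, NAME) \
	struct { struct { } __empty_ ## NAME; TYPE NAME[]; }

/* Old layout: a 1-element array standing in for a variable-length tail. */
struct string_desc_old {
	uint8_t  bLength;
	uint8_t  bDescriptorType;
	uint16_t wData[1];
} __attribute__((packed));

/* New layout: a true flexible array; legacy_padding keeps sizeof() as before. */
struct string_desc_new {
	uint8_t  bLength;
	uint8_t  bDescriptorType;
	union {
		uint16_t legacy_padding;
		DECLARE_FLEX_ARRAY(uint16_t, wData);
	};
} __attribute__((packed));

/* Compile-time checks: the change is invisible to code that relies on
 * sizeof() or on the offset of wData. */
static_assert(sizeof(struct string_desc_old) == sizeof(struct string_desc_new),
	      "size must not change");
static_assert(offsetof(struct string_desc_old, wData) ==
	      offsetof(struct string_desc_new, wData),
	      "offset of wData must not change");

/* Number of UTF-16 code units in the descriptor. bLength counts the whole
 * descriptor in bytes, including the two header bytes. */
static inline size_t string_desc_units(const struct string_desc_new *d)
{
	return d->bLength < 2 ? 0 : (size_t)(d->bLength - 2) / sizeof(d->wData[0]);
}
```

If either static_assert fired, the patch would have changed the UAPI layout;
in this sketch both hold, so only the declared array bound differs.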
---------------------------------------------------------
Now it works. Boot succeeds and X apps run with the freshly pulled
torvalds tree and the kees-2.patch.
Praise God!
This is the git log --oneline:
d528014517f2 (HEAD, origin/master, origin/HEAD) Revert ".gitignore: ignore *.cover and *.mbx"
04f2933d375e Merge tag 'core_guards_for_6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue
03275585cabd afs: Fix accidental truncation when storing data
538140ca602b Merge tag 'ovl-update-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
94c76955e86a Merge tag 'gfs2-v6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
ccf46d853183 Merge tag 'pm-6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
b869e9f49964 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
406fb9eb198a Merge tag 'firewire-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
f1962207150c module: fix init_module_from_file() error handling
40c565a429d7 Merge branches 'pm-cpufreq' and 'pm-cpuidle'
f679e89acdd3 clk: tegra: Avoid calling an uninitialized function
So, the included patch is:
marvin@defiant:~/linux/kernel/linux_torvalds$ git diff
diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
index 82ec6af71a1d..62d318377379 100644
--- a/include/uapi/linux/usb/ch9.h
+++ b/include/uapi/linux/usb/ch9.h
@@ -984,7 +984,11 @@ struct usb_ssp_cap_descriptor {
#define USB_SSP_MIN_RX_LANE_COUNT (0xf << 8)
#define USB_SSP_MIN_TX_LANE_COUNT (0xf << 12)
__le16 wReserved;
- __le32 bmSublinkSpeedAttr[1]; /* list of sublink speed attrib entries */
+ union {
+ __le32 legacy_padding;
+ /* list of sublink speed attrib entries */
+ __DECLARE_FLEX_ARRAY(__le32, bmSublinkSpeedAttr);
+ };
#define USB_SSP_SUBLINK_SPEED_SSID (0xf) /* sublink speed ID */
#define USB_SSP_SUBLINK_SPEED_LSE (0x3 << 4) /* Lanespeed exponent */
#define USB_SSP_SUBLINK_SPEED_LSE_BPS 0
marvin@defiant:~/linux/kernel/linux_torvalds$
This means vanilla torvalds tree + https://lore.kernel.org/lkml/[email protected]/
works, but vanilla torvalds tree w/o patch still crashes.
I am still rather new to the utilisation of the PSTORE subsystem.
Best regards,
Mirsad Todorovac
On July 4, 2023 4:15:20 PM PDT, Mirsad Todorovac <[email protected]> wrote:
>On 7/4/23 23:36, Kees Cook wrote:
>> On July 4, 2023 10:20:11 AM PDT, Mirsad Todorovac <[email protected]> wrote:
>>> On 7/4/23 01:09, Kees Cook wrote:> On Mon, Jul 03, 2023 at 12:03:23PM -0700, Kees Cook wrote:
>>>>> Cool. xhci-hub is in your backtrace, and the above patch was made for
>>>>> something very similar (though, again, I don't see why you're getting a
>>>>> _crash_, it should _warn_ and continue normally). And, actually, also
>>>>> include this patch:
>>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> This is now in Linus's tree:
>>>> 09b69dd4378b ("usb: ch9: Replace 1-element array with flexible array")
>>>>
>>>> Please also still try with the first patch I mentioned, which is very similar:
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> Hi,
>>>
>>> I have finally built w both patches (and recommended PSTORE settings were
>>> default already).
>>
>> Were you able to find the crashes saved by pstore?
>
>No, only lkdtm and invalid opcode crashes ...
>
>P.S.
>
>Actually, I have recovered some pstore records. Please find them in the attachment:
>
>>> This second patch fixes the booting problem, but alas there is still a problem -
>>
>> Ah! That's great! There is still an unexpected crash source, but the trigger is fixed.
>
>Glad I could be of help.
>
>>> all Wayland and X11.org GUI applications fail to start, with errors like this one:
>>>
>>> Jul 4 19:09:07 defiant kernel: [ 40.529719] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>
>> Hmm, is CONFIG_UBSAN_TRAP set?
>
>marvin@defiant:~/linux/kernel/linux_torvalds$ grep CONFIG_UBSAN_TRAP .config
>CONFIG_UBSAN_TRAP=y
Ah-ha! Turn that off please. With it off you will get much more useful reports from UBSAN.
>marvin@defiant:~/linux/kernel/linux_torvalds$
>
>>> Jul 4 19:09:07 defiant kernel: [ 40.529726] RIP: 0010:alloc_pid+0x46c/0x480
>>
>> Hmm, is this patch in your kernel?
>> https://git.kernel.org/linus/b69f0aeb068980af983d399deafc7477cec8bc04
>
>No, it wasn't. I had only these:
>
>marvin@defiant:~/linux/kernel/linux_torvalds$ more ../kees-[12].patch
>::::::::::::::
>../kees-1.patch
>::::::::::::::
>diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
>index b17e3a21b15f..82ec6af71a1d 100644
>--- a/include/uapi/linux/usb/ch9.h
>+++ b/include/uapi/linux/usb/ch9.h
>@@ -376,7 +376,10 @@ struct usb_string_descriptor {
> __u8 bLength;
> __u8 bDescriptorType;
> - __le16 wData[1]; /* UTF-16LE encoded */
>+ union {
>+ __le16 legacy_padding;
>+ __DECLARE_FLEX_ARRAY(__le16, wData); /* UTF-16LE encoded */
>+ };
> } __attribute__ ((packed));
> /* note that "string" zero is special, it holds language codes that
>::::::::::::::
>../kees-2.patch
>::::::::::::::
>diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
>index b17e3a21b15f..3ff98c7ba7e3 100644
>--- a/include/uapi/linux/usb/ch9.h
>+++ b/include/uapi/linux/usb/ch9.h
>@@ -981,7 +981,11 @@ struct usb_ssp_cap_descriptor {
> #define USB_SSP_MIN_RX_LANE_COUNT (0xf << 8)
> #define USB_SSP_MIN_TX_LANE_COUNT (0xf << 12)
> __le16 wReserved;
>- __le32 bmSublinkSpeedAttr[1]; /* list of sublink speed attrib entries */
>+ union {
>+ __le32 legacy_padding;
>+ /* list of sublink speed attrib entries */
>+ __DECLARE_FLEX_ARRAY(__le32, bmSublinkSpeedAttr);
>+ };
> #define USB_SSP_SUBLINK_SPEED_SSID (0xf) /* sublink speed ID */
> #define USB_SSP_SUBLINK_SPEED_LSE (0x3 << 4) /* Lanespeed exponent */
> #define USB_SSP_SUBLINK_SPEED_LSE_BPS 0
>marvin@defiant:~/linux/kernel/linux_torvalds$
>
>---------------------------------------------------------
>
>Now it works. Succeeded boot and running of X apps with the new git pull
>torvalds tree and the kees-2.patch.
Perfect! Okay, so it looks like all the issues are known and fixed. I'll work with Greg to get the other ch9 patch landed.
>
>Praise God!
>
>This is the git log --oneline:
>
>d528014517f2 (HEAD, origin/master, origin/HEAD) Revert ".gitignore: ignore *.cover and *.mbx"
>04f2933d375e Merge tag 'core_guards_for_6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue
>03275585cabd afs: Fix accidental truncation when storing data
>538140ca602b Merge tag 'ovl-update-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
>94c76955e86a Merge tag 'gfs2-v6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
>ccf46d853183 Merge tag 'pm-6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
>b869e9f49964 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
>406fb9eb198a Merge tag 'firewire-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
>f1962207150c module: fix init_module_from_file() error handling
>40c565a429d7 Merge branches 'pm-cpufreq' and 'pm-cpuidle'
>f679e89acdd3 clk: tegra: Avoid calling an uninitialized function
>
>So, the included patch is:
>
>marvin@defiant:~/linux/kernel/linux_torvalds$ git diff
>diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
>index 82ec6af71a1d..62d318377379 100644
>--- a/include/uapi/linux/usb/ch9.h
>+++ b/include/uapi/linux/usb/ch9.h
>@@ -984,7 +984,11 @@ struct usb_ssp_cap_descriptor {
> #define USB_SSP_MIN_RX_LANE_COUNT (0xf << 8)
> #define USB_SSP_MIN_TX_LANE_COUNT (0xf << 12)
> __le16 wReserved;
>- __le32 bmSublinkSpeedAttr[1]; /* list of sublink speed attrib entries */
>+ union {
>+ __le32 legacy_padding;
>+ /* list of sublink speed attrib entries */
>+ __DECLARE_FLEX_ARRAY(__le32, bmSublinkSpeedAttr);
>+ };
> #define USB_SSP_SUBLINK_SPEED_SSID (0xf) /* sublink speed ID */
> #define USB_SSP_SUBLINK_SPEED_LSE (0x3 << 4) /* Lanespeed exponent */
> #define USB_SSP_SUBLINK_SPEED_LSE_BPS 0
>marvin@defiant:~/linux/kernel/linux_torvalds$
>
>This means vanilla torvalds tree + https://lore.kernel.org/lkml/[email protected]/
>works, but vanilla torvalds tree w/o patch still crashes.
Great, thanks again for testing it all!
-Kees
>
>I am still rather new to the utilisation of the PSTORE subsystem.
>
>Best regards,
>Mirsad Todorovac
--
Kees Cook
On 7/5/23 04:09, Kees Cook wrote:
> On July 4, 2023 4:15:20 PM PDT, Mirsad Todorovac <[email protected]> wrote:
>> On 7/4/23 23:36, Kees Cook wrote:
>>> On July 4, 2023 10:20:11 AM PDT, Mirsad Todorovac <[email protected]> wrote:
>>>> On 7/4/23 01:09, Kees Cook wrote:> On Mon, Jul 03, 2023 at 12:03:23PM -0700, Kees Cook wrote:
>>>>>> Cool. xhci-hub is in your backtrace, and the above patch was made for
>>>>>> something very similar (though, again, I don't see why you're getting a
>>>>>> _crash_, it should _warn_ and continue normally). And, actually, also
>>>>>> include this patch:
>>>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>>
>>>>> This is now in Linus's tree:
>>>>> 09b69dd4378b ("usb: ch9: Replace 1-element array with flexible array")
>>>>>
>>>>> Please also still try with the first patch I mentioned, which is very similar:
>>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> Hi,
>>>>
>>>> I have finally built w both patches (and recommended PSTORE settings were
>>>> default already).
>>>
>>> Were you able to find the crashes saved by pstore?
>>
>> No, only lkdtm and invalid opcode crashes ...
>>
>> P.S.
>>
>> Actually, I have recovered some pstore records. Please find them in the attachment:
>>
>>>> This second patch fixes the booting problem, but alas there is still a problem -
>>>
>>> Ah! That's great! There is still an unexpected crash source, but the trigger is fixed.
>>
>> Glad I could be of help.
>>
>>>> all Wayland and X11.org GUI applications fail to start, with errors like this one:
>>>>
>>>> Jul 4 19:09:07 defiant kernel: [ 40.529719] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>>>
>>> Hmm, is CONFIG_UBSAN_TRAP set?
>>
>> marvin@defiant:~/linux/kernel/linux_torvalds$ grep CONFIG_UBSAN_TRAP .config
>> CONFIG_UBSAN_TRAP=y
> Ah-ha! Turn that off please. With it off you will get much more useful reports from UBSAN.
Will do that. Thanks for the hint.
>> marvin@defiant:~/linux/kernel/linux_torvalds$
>>
>>>> Jul 4 19:09:07 defiant kernel: [ 40.529726] RIP: 0010:alloc_pid+0x46c/0x480
>>>
>>> Hmm, is this patch in your kernel?
>>> https://git.kernel.org/linus/b69f0aeb068980af983d399deafc7477cec8bc04
>>
>> No, it wasn't. I had only these:
>>
>> marvin@defiant:~/linux/kernel/linux_torvalds$ more ../kees-[12].patch
>> ::::::::::::::
>> ../kees-1.patch
>> ::::::::::::::
>> diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
>> index b17e3a21b15f..82ec6af71a1d 100644
>> --- a/include/uapi/linux/usb/ch9.h
>> +++ b/include/uapi/linux/usb/ch9.h
>> @@ -376,7 +376,10 @@ struct usb_string_descriptor {
>> __u8 bLength;
>> __u8 bDescriptorType;
>> - __le16 wData[1]; /* UTF-16LE encoded */
>> + union {
>> + __le16 legacy_padding;
>> + __DECLARE_FLEX_ARRAY(__le16, wData); /* UTF-16LE encoded */
>> + };
>> } __attribute__ ((packed));
>> /* note that "string" zero is special, it holds language codes that
>> ::::::::::::::
>> ../kees-2.patch
>> ::::::::::::::
>> diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
>> index b17e3a21b15f..3ff98c7ba7e3 100644
>> --- a/include/uapi/linux/usb/ch9.h
>> +++ b/include/uapi/linux/usb/ch9.h
>> @@ -981,7 +981,11 @@ struct usb_ssp_cap_descriptor {
>> #define USB_SSP_MIN_RX_LANE_COUNT (0xf << 8)
>> #define USB_SSP_MIN_TX_LANE_COUNT (0xf << 12)
>> __le16 wReserved;
>> - __le32 bmSublinkSpeedAttr[1]; /* list of sublink speed attrib entries */
>> + union {
>> + __le32 legacy_padding;
>> + /* list of sublink speed attrib entries */
>> + __DECLARE_FLEX_ARRAY(__le32, bmSublinkSpeedAttr);
>> + };
>> #define USB_SSP_SUBLINK_SPEED_SSID (0xf) /* sublink speed ID */
>> #define USB_SSP_SUBLINK_SPEED_LSE (0x3 << 4) /* Lanespeed exponent */
>> #define USB_SSP_SUBLINK_SPEED_LSE_BPS 0
>> marvin@defiant:~/linux/kernel/linux_torvalds$
>>
>> ---------------------------------------------------------
>>
>> Now it works. Succeeded boot and running of X apps with the new git pull
>> torvalds tree and the kees-2.patch.
>
> Perfect! Okay, so it looks like all the issues are known and fixed. I'll work with Greg to get the other ch9 patch landed.
Yes, maybe it should be tested more widely first. It was a non-obvious bug and
I couldn't see what went wrong ...
>> Praise God!
>>
>> This is the git log --oneline:
>>
>> d528014517f2 (HEAD, origin/master, origin/HEAD) Revert ".gitignore: ignore *.cover and *.mbx"
>> 04f2933d375e Merge tag 'core_guards_for_6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue
>> 03275585cabd afs: Fix accidental truncation when storing data
>> 538140ca602b Merge tag 'ovl-update-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
>> 94c76955e86a Merge tag 'gfs2-v6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
>> ccf46d853183 Merge tag 'pm-6.5-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
>> b869e9f49964 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
>> 406fb9eb198a Merge tag 'firewire-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
>> f1962207150c module: fix init_module_from_file() error handling
>> 40c565a429d7 Merge branches 'pm-cpufreq' and 'pm-cpuidle'
>> f679e89acdd3 clk: tegra: Avoid calling an uninitialized function
>>
>> So, the included patch is:
>>
>> marvin@defiant:~/linux/kernel/linux_torvalds$ git diff
>> diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h
>> index 82ec6af71a1d..62d318377379 100644
>> --- a/include/uapi/linux/usb/ch9.h
>> +++ b/include/uapi/linux/usb/ch9.h
>> @@ -984,7 +984,11 @@ struct usb_ssp_cap_descriptor {
>> #define USB_SSP_MIN_RX_LANE_COUNT (0xf << 8)
>> #define USB_SSP_MIN_TX_LANE_COUNT (0xf << 12)
>> __le16 wReserved;
>> - __le32 bmSublinkSpeedAttr[1]; /* list of sublink speed attrib entries */
>> + union {
>> + __le32 legacy_padding;
>> + /* list of sublink speed attrib entries */
>> + __DECLARE_FLEX_ARRAY(__le32, bmSublinkSpeedAttr);
>> + };
>> #define USB_SSP_SUBLINK_SPEED_SSID (0xf) /* sublink speed ID */
>> #define USB_SSP_SUBLINK_SPEED_LSE (0x3 << 4) /* Lanespeed exponent */
>> #define USB_SSP_SUBLINK_SPEED_LSE_BPS 0
>> marvin@defiant:~/linux/kernel/linux_torvalds$
>>
>> This means vanilla torvalds tree + https://lore.kernel.org/lkml/[email protected]/
>> works, but vanilla torvalds tree w/o patch still crashes.
>
> Great, thanks again for testing it all!
Not at all, I'm glad I could be of assistance.
Best regards,
Mirsad Todorovac
> -Kees
>
>>
>> I am still rather new to the utilisation of the PSTORE subsystem.
>>
>> Best regards,
>> Mirsad Todorovac
>
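
A minimal user-space sketch of what the ch9.h change quoted above does,
assuming GNU C (GCC or Clang). DEMO_FLEX_ARRAY and the demo_* structs are
simplified stand-ins made up for illustration; they are modeled on
__DECLARE_FLEX_ARRAY() and usb_ssp_cap_descriptor but are not the kernel
definitions. The idea, as I understand it, is that the union keeps the
struct size and member offset the same as with the old one-element array,
while the array itself becomes a true flexible array, so indexing past
element 0 is no longer treated as out of bounds:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Local stand-in modeled on the kernel's __DECLARE_FLEX_ARRAY(); it relies
 * on GNU C extensions (empty struct, flexible array inside a union).
 */
#define DEMO_FLEX_ARRAY(TYPE, NAME)			\
	struct {					\
		struct { } demo_empty_ ## NAME;		\
		TYPE NAME[];				\
	}

/* Simplified "before" layout: a one-element array used as a variable tail. */
struct demo_old {
	uint16_t reserved;
	uint32_t attr[1];
};

/* Simplified "after" layout: same size, but attr is a true flexible array. */
struct demo_new {
	uint16_t reserved;
	union {
		uint32_t legacy_padding;
		DEMO_FLEX_ARRAY(uint32_t, attr);
	};
};

/* The union keeps sizeof() and the member offset unchanged. */
static_assert(sizeof(struct demo_old) == sizeof(struct demo_new),
	      "size changed");
static_assert(offsetof(struct demo_old, attr) == offsetof(struct demo_new, attr),
	      "offset changed");

/* Sum the first n entries; the caller guarantees storage for n entries. */
uint32_t demo_sum_attrs(const struct demo_new *d, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += d->attr[i];	/* i >= 1 is valid for a flexible array */
	return sum;
}
```

The legacy_padding member is what keeps sizeof() stable for existing
user-space code that was built against the old one-element layout.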
On Wed, Jul 5, 2023 at 4:10 AM Kees Cook <[email protected]> wrote:
> On July 4, 2023 4:15:20 PM PDT, Mirsad Todorovac <[email protected]> wrote:
> >On 7/4/23 23:36, Kees Cook wrote:
> >> On July 4, 2023 10:20:11 AM PDT, Mirsad Todorovac <[email protected]> wrote:
> >>> all Wayland and X11.org GUI applications fail to start, with errors like this one:
> >>>
> >>> Jul 4 19:09:07 defiant kernel: [ 40.529719] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> >>
> >> Hmm, is CONFIG_UBSAN_TRAP set?
> >
> >marvin@defiant:~/linux/kernel/linux_torvalds$ grep CONFIG_UBSAN_TRAP .config
> >CONFIG_UBSAN_TRAP=y
>
> Ah-ha! Turn that off please. With it off you will get much more useful reports from UBSAN.
It might be useful if the x86 code under handle_invalid_op() at least
printed a warning about this when the kernel crashes with #UD on a
system with CONFIG_UBSAN_TRAP=y? It seems pretty unintuitive and
unhelpful that the kernel just crashes itself with a #UD and no
further information in this configuration.
Even just a "WARNING: CONFIG_UBSAN_TRAP active, #UD might be caused by
that" on every #UD that does not come from a known BUG() location or
such might be better than nothing...
And maybe the Kconfig help text could be clearer on this, too.
Currently it does say that this turns warnings into "full exceptions
that abort the running kernel code" but it does not say that the
exception reporting will become pretty unhelpful, so it's probably not
really what you'd want for debugging.
On Wed, Jul 05, 2023 at 05:16:36PM +0200, Jann Horn wrote:
> On Wed, Jul 5, 2023 at 4:10 AM Kees Cook <[email protected]> wrote:
> > On July 4, 2023 4:15:20 PM PDT, Mirsad Todorovac <[email protected]> wrote:
> > >On 7/4/23 23:36, Kees Cook wrote:
> > >> On July 4, 2023 10:20:11 AM PDT, Mirsad Todorovac <[email protected]> wrote:
> > >>> all Wayland and X11.org GUI applications fail to start, with errors like this one:
> > >>>
> > >>> Jul 4 19:09:07 defiant kernel: [ 40.529719] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > >>
> > >> Hmm, is CONFIG_UBSAN_TRAP set?
> > >
> > >marvin@defiant:~/linux/kernel/linux_torvalds$ grep CONFIG_UBSAN_TRAP .config
> > >CONFIG_UBSAN_TRAP=y
> >
> > Ah-ha! Turn that off please. With it off you will get much more useful reports from UBSAN.
>
> It might be useful if the x86 code under handle_invalid_op() at least
> printed a warning about this when the kernel crashes with #UD on a
> system with CONFIG_UBSAN_TRAP=y? It seems pretty unintuitive and
> unhelpful that the kernel just crashes itself with a #UD and no
> further information in this configuration.
>
> Even just a "WARNING: CONFIG_UBSAN_TRAP active, #UD might be caused by
> that" on every #UD that does not come from a known BUG() location or
> such might be better than nothing...
I've considered it, but usually CONFIG_UBSAN_TRAP isn't accidentally
set. Also, the crash info is something we can get help from on the
compiler side, to mark up where the traps are, similar to what we do
with KCFI, but it hasn't happened yet for x86. For example, arm64
already encodes the details in the trap instruction itself:
https://git.kernel.org/linus/25b84002afb9dc9a91a7ea67166879c13ad82422
> And maybe the Kconfig help text could be clearer on this, too.
> Currently it does say that this turns warnings into "full exceptions
> that abort the running kernel code" but it does not say that the
> exception reporting will become pretty unhelpful, so it's probably not
> really what you'd want for debugging.
Yeah, that's a reasonable change to make. Can you send a patch for this?
I can carry it.
Thanks!
--
Kees Cook
On Wed, Jul 05, 2023 at 02:08:09PM -0700, Kees Cook wrote:
> > Even just a "WARNING: CONFIG_UBSAN_TRAP active, #UD might be caused by
> > that" on every #UD that does not come from a known BUG() location or
> > such might be better than nothing...
>
> I've considered it, but usually CONFIG_UBSAN_TRAP isn't accidentally
> set. Also, the crash info is something we can get help from on the
> compiler side, to mark up where the traps are, similar to what we do
> with KCFI, but it hasn't happened yet for x86. For example, arm64
> already encodes the details in the trap instruction itself:
> https://git.kernel.org/linus/25b84002afb9dc9a91a7ea67166879c13ad82422
Right, so you could easily use a different #UD instruction that has an
immediate, something like:
    0f b9 40 ff             ud1    -0x1(%rax),%rax
or even:
    0f b9 80 00 ff ff ff    ud1    -0x100(%rax),%rax
if you need a 32bit value.
It shouldn't be hard to fix up the #UD handler to decode the instruction
and obtain the displacement for a clue.
Typically we use ud2 because it's the smallest #UD instruction (2 bytes)
and that's enough, but if you want to provide additional clues, there
are options...
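
To make the decoding idea above concrete, here is a minimal user-space
sketch, not the kernel's #UD handler. demo_decode_ud1_disp() is a
hypothetical name, and the sketch assumes the instruction carries no
prefixes (no REX, no operand-size override) and only handles the disp8 and
disp32 forms shown above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract the displacement from a
 * "ud1 disp(%reg),%reg" instruction (opcode 0f b9 /r).
 * Returns true and stores the signed displacement on success.
 */
bool demo_decode_ud1_disp(const uint8_t *insn, size_t len, int32_t *disp)
{
	assert(insn && disp);

	if (len < 3 || insn[0] != 0x0f || insn[1] != 0xb9)
		return false;		/* not ud1 */

	uint8_t mod = insn[2] >> 6;	/* ModRM bits 7:6 */
	uint8_t rm = insn[2] & 7;	/* ModRM bits 2:0 */
	size_t pos = 3;

	/* rm == 4 in a memory form means a SIB byte follows ModRM. */
	if (mod != 3 && rm == 4)
		pos++;

	if (mod == 1) {			/* 8-bit displacement, sign-extended */
		if (len < pos + 1)
			return false;
		*disp = (int8_t)insn[pos];
		return true;
	}
	if (mod == 2) {			/* 32-bit little-endian displacement */
		if (len < pos + 4)
			return false;
		uint32_t v = (uint32_t)insn[pos] |
			     (uint32_t)insn[pos + 1] << 8 |
			     (uint32_t)insn[pos + 2] << 16 |
			     (uint32_t)insn[pos + 3] << 24;
		*disp = (int32_t)v;
		return true;
	}
	return false;			/* other forms are not handled here */
}
```

Fed the two byte sequences from this message, it yields -0x1 and -0x100
respectively; a plain ud2 (0f 0b) is rejected since it has no payload.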
On Wed, Jul 05, 2023 at 11:31:13PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 05, 2023 at 02:08:09PM -0700, Kees Cook wrote:
>
> > > Even just a "WARNING: CONFIG_UBSAN_TRAP active, #UD might be caused by
> > > that" on every #UD that does not come from a known BUG() location or
> > > such might be better than nothing...
> >
> > I've considered it, but usually CONFIG_UBSAN_TRAP isn't accidentally
> > set. Also, the crash info is something we can get help from on the
> > compiler side, to mark up where the traps are, similar to what we do
> > with KCFI, but it hasn't happened yet for x86. For example, arm64
> > already encodes the details in the trap instruction itself:
> > https://git.kernel.org/linus/25b84002afb9dc9a91a7ea67166879c13ad82422
>
> Right, so you could easily use a different #UD instruction that has an
> immediate, something like:
>
> 0f b9 40 ff ud1 -0x1(%rax),%rax
Ah yeah, that would be easier, probably. It could match what arm64 does.
--
Kees Cook
On 7/5/23 04:09, Kees Cook wrote:
>>>
>>> Hmm, is CONFIG_UBSAN_TRAP set?
>>
>> marvin@defiant:~/linux/kernel/linux_torvalds$ grep CONFIG_UBSAN_TRAP .config
>> CONFIG_UBSAN_TRAP=y
>
> Ah-ha! Turn that off please. With it off you will get much more useful reports from UBSAN.
Done that. And it appears to work.
Great job.
There should be a way to store the earliest kernel messages while in the initrd phase, but
I can't think of any either ...
Have a nice day!
Best regards,
Mirsad Todorovac