Machine: MIPS32 R2 Big Endian (interAptiv (multi))
While testing MIPS with LLVM, I found a weird and very rare bug with
MIPS relocs that LLVM emits into kernel modules. It happens on both
11.0.0 and latest git snapshot and applies, as I can see, only to
references to static symbols.
When the kernel loads the module, it allocates a space for every
section and then manually apply the relocations relative to the
new address.
Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
It's static and referenced only in phy_register_driver(), where it's
used to fill callback pointer in a structure.
The real function address after module loading is 0xc06c1444, that
is observed in its ELF st_value field.
There are two relocs related to this usage in phy_register_driver():
R_MIPS_HI16 refers to 0x3c010000
R_MIPS_LO16 refers to 0x24339444
The address of .text is 0xc06b8000. So the destination is calculated
as follows:
0x00000000 from hi16;
0xffff9444 from lo16 (sign extend as it's always treated as signed);
0xc06b8000 from base.
= 0xc06b1444. The value is lower than the real phy_probe() address
(0xc06c1444) by 0x10000 and is lower than the base address of
module's .text, so it's 100% incorrect.
This results in:
[ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
address c06b1444, epc == c06b1444, ra == 803f1090
The correct instructions should be:
R_MIPS_HI16 0x3c010001
R_MIPS_LO16 0x24339444
so there'll be 0x00010000 from hi16.
I tried to catch those bugs in arch/mips/kernel/module.c (by checking
if the destination is lower than the base address, which should never
happen), and seems like I have only 3 such places in libphy.ko (and
one in nf_tables.ko).
I don't think it should be handled somehow in mentioned source code
as it would look rather ugly and may break kernels build with GNU
stack, which seems to not produce such bad codes.
If I should report this to any other resources, please let me know.
I chose clang-built-linux and LKML as it may not happen with userland
(didn't tried to catch).
Thanks,
Al
On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin <[email protected]> wrote:
>
> Machine: MIPS32 R2 Big Endian (interAptiv (multi))
>
> While testing MIPS with LLVM, I found a weird and very rare bug with
> MIPS relocs that LLVM emits into kernel modules. It happens on both
> 11.0.0 and latest git snapshot and applies, as I can see, only to
> references to static symbols.
>
> When the kernel loads the module, it allocates a space for every
> section and then manually apply the relocations relative to the
> new address.
>
> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
> It's static and referenced only in phy_register_driver(), where it's
> used to fill callback pointer in a structure.
>
> The real function address after module loading is 0xc06c1444, that
> is observed in its ELF st_value field.
> There are two relocs related to this usage in phy_register_driver():
>
> R_MIPS_HI16 refers to 0x3c010000
> R_MIPS_LO16 refers to 0x24339444
>
> The address of .text is 0xc06b8000. So the destination is calculated
> as follows:
>
> 0x00000000 from hi16;
> 0xffff9444 from lo16 (sign extend as it's always treated as signed);
> 0xc06b8000 from base.
>
> = 0xc06b1444. The value is lower than the real phy_probe() address
> (0xc06c1444) by 0x10000 and is lower than the base address of
> module's .text, so it's 100% incorrect.
>
> This results in:
>
> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
> address c06b1444, epc == c06b1444, ra == 803f1090
>
> The correct instructions should be:
>
> R_MIPS_HI16 0x3c010001
> R_MIPS_LO16 0x24339444
>
> so there'll be 0x00010000 from hi16.
>
> I tried to catch those bugs in arch/mips/kernel/module.c (by checking
> if the destination is lower than the base address, which should never
> happen), and seems like I have only 3 such places in libphy.ko (and
> one in nf_tables.ko).
> I don't think it should be handled somehow in mentioned source code
> as it would look rather ugly and may break kernels build with GNU
> stack, which seems to not produce such bad codes.
>
> If I should report this to any other resources, please let me know.
> I chose clang-built-linux and LKML as it may not happen with userland
> (didn't tried to catch).
Thanks for the report. Sounds like we may indeed be producing an
incorrect relocation. This is only seen for big endian triples?
Getting a way for us to deterministically reproduce would be a good
first step. Which config or configs beyond defconfig, and which
relocations specifically are you observing this with?
--
Thanks,
~Nick Desaulniers
From: Nick Desaulniers <[email protected]>
Date: Sat, 9 Jan 2021 09:50:44 -0800
> On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin <[email protected]> wrote:
>>
>> Machine: MIPS32 R2 Big Endian (interAptiv (multi))
>>
>> While testing MIPS with LLVM, I found a weird and very rare bug with
>> MIPS relocs that LLVM emits into kernel modules. It happens on both
>> 11.0.0 and latest git snapshot and applies, as I can see, only to
>> references to static symbols.
>>
>> When the kernel loads the module, it allocates a space for every
>> section and then manually apply the relocations relative to the
>> new address.
>>
>> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
>> It's static and referenced only in phy_register_driver(), where it's
>> used to fill callback pointer in a structure.
>>
>> The real function address after module loading is 0xc06c1444, that
>> is observed in its ELF st_value field.
>> There are two relocs related to this usage in phy_register_driver():
>>
>> R_MIPS_HI16 refers to 0x3c010000
>> R_MIPS_LO16 refers to 0x24339444
>>
>> The address of .text is 0xc06b8000. So the destination is calculated
>> as follows:
>>
>> 0x00000000 from hi16;
>> 0xffff9444 from lo16 (sign extend as it's always treated as signed);
>> 0xc06b8000 from base.
>>
>> = 0xc06b1444. The value is lower than the real phy_probe() address
>> (0xc06c1444) by 0x10000 and is lower than the base address of
>> module's .text, so it's 100% incorrect.
>>
>> This results in:
>>
>> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
>> address c06b1444, epc == c06b1444, ra == 803f1090
>>
>> The correct instructions should be:
>>
>> R_MIPS_HI16 0x3c010001
>> R_MIPS_LO16 0x24339444
>>
>> so there'll be 0x00010000 from hi16.
>>
>> I tried to catch those bugs in arch/mips/kernel/module.c (by checking
>> if the destination is lower than the base address, which should never
>> happen), and seems like I have only 3 such places in libphy.ko (and
>> one in nf_tables.ko).
>> I don't think it should be handled somehow in mentioned source code
>> as it would look rather ugly and may break kernels build with GNU
>> stack, which seems to not produce such bad codes.
>>
>> If I should report this to any other resources, please let me know.
>> I chose clang-built-linux and LKML as it may not happen with userland
>> (didn't tried to catch).
>
> Thanks for the report. Sounds like we may indeed be producing an
> incorrect relocation. This is only seen for big endian triples?
Unfortunately I don't have a LE board to play with, so can confirm
only Big Endian.
(BTW, if someone can say if it's possible for MIPS (and how if it is)
to launch a LE kernel from BE-booted preloader and U-Boot, that would
be super cool)
> Getting a way for us to deterministically reproduce would be a good
> first step. Which config or configs beyond defconfig, and which
> relocations specifically are you observing this with?
I use `make 32r2_defconfig` which combines several configs from
arch/mips/configs:
- generic_defconfig;
- generic/32r2.config;
- generic/eb.config.
Aside from that, I enable a bunch of my WIP drivers and the
Netfilter. On my setup, this bug is always present in libphy.ko,
so CONFIG_PHYLIB=m (with all deps) should be enough.
The three failed relocs belongs to this part of code: [0]
llvm-readelf on them:
Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries:¬
[...]
00005740 00029305 R_MIPS_HI16 00000000 .text
00005744 00029306 R_MIPS_LO16 00000000 .text
00005720 00029305 R_MIPS_HI16 00000000 .text
00005748 00029306 R_MIPS_LO16 00000000 .text
0000573c 00029305 R_MIPS_HI16 00000000 .text
0000574c 00029306 R_MIPS_LO16 00000000 .text
The first pair is the one from my first mail:
0x3c010000 <-- should be 0x3c010001 to work properly
0x24339444
I'm planning to hunt for more now, will let you know.
[0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989
> Thanks,
> ~Nick Desaulniers
Thanks,
Al
From: Alexander Lobakin <[email protected]>
Date: Sat, 09 Jan 2021 19:15:31 +0000
> From: Nick Desaulniers <[email protected]>
> Date: Sat, 9 Jan 2021 09:50:44 -0800
>
>> On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin <[email protected]> wrote:
>>>
>>> Machine: MIPS32 R2 Big Endian (interAptiv (multi))
>>>
>>> While testing MIPS with LLVM, I found a weird and very rare bug with
>>> MIPS relocs that LLVM emits into kernel modules. It happens on both
>>> 11.0.0 and latest git snapshot and applies, as I can see, only to
>>> references to static symbols.
>>>
>>> When the kernel loads the module, it allocates a space for every
>>> section and then manually apply the relocations relative to the
>>> new address.
>>>
>>> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
>>> It's static and referenced only in phy_register_driver(), where it's
>>> used to fill callback pointer in a structure.
>>>
>>> The real function address after module loading is 0xc06c1444, that
>>> is observed in its ELF st_value field.
>>> There are two relocs related to this usage in phy_register_driver():
>>>
>>> R_MIPS_HI16 refers to 0x3c010000
>>> R_MIPS_LO16 refers to 0x24339444
>>>
>>> The address of .text is 0xc06b8000. So the destination is calculated
>>> as follows:
>>>
>>> 0x00000000 from hi16;
>>> 0xffff9444 from lo16 (sign extend as it's always treated as signed);
>>> 0xc06b8000 from base.
>>>
>>> = 0xc06b1444. The value is lower than the real phy_probe() address
>>> (0xc06c1444) by 0x10000 and is lower than the base address of
>>> module's .text, so it's 100% incorrect.
>>>
>>> This results in:
>>>
>>> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
>>> address c06b1444, epc =3D=3D c06b1444, ra =3D=3D 803f1090
>>>
>>> The correct instructions should be:
>>>
>>> R_MIPS_HI16 0x3c010001
>>> R_MIPS_LO16 0x24339444
>>>
>>> so there'll be 0x00010000 from hi16.
>>>
>>> I tried to catch those bugs in arch/mips/kernel/module.c (by checking
>>> if the destination is lower than the base address, which should never
>>> happen), and seems like I have only 3 such places in libphy.ko (and
>>> one in nf_tables.ko).
>>> I don't think it should be handled somehow in mentioned source code
>>> as it would look rather ugly and may break kernels build with GNU
>>> stack, which seems to not produce such bad codes.
>>>
>>> If I should report this to any other resources, please let me know.
>>> I chose clang-built-linux and LKML as it may not happen with userland
>>> (didn't tried to catch).
>>
>> Thanks for the report. Sounds like we may indeed be producing an
>> incorrect relocation. This is only seen for big endian triples?
>
> Unfortunately I don't have a LE board to play with, so can confirm
> only Big Endian.
>
> (BTW, if someone can say if it's possible for MIPS (and how if it is)
> to launch a LE kernel from BE-booted preloader and U-Boot, that would
> be super cool)
>
>> Getting a way for us to deterministically reproduce would be a good
>> first step. Which config or configs beyond defconfig, and which
>> relocations specifically are you observing this with?
>
> I use `make 32r2_defconfig` which combines several configs from
> arch/mips/configs:
> - generic_defconfig;
> - generic/32r2.config;
> - generic/eb.config.
>
> Aside from that, I enable a bunch of my WIP drivers and the
> Netfilter. On my setup, this bug is always present in libphy.ko,
> so CONFIG_PHYLIB=m (with all deps) should be enough.
>
> The three failed relocs belongs to this part of code: [0]
>
> llvm-readelf on them:
>
> Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries:
> [...]
> 00005740 00029305 R_MIPS_HI16 00000000 .text
> 00005744 00029306 R_MIPS_LO16 00000000 .text
> 00005720 00029305 R_MIPS_HI16 00000000 .text
> 00005748 00029306 R_MIPS_LO16 00000000 .text
> 0000573c 00029305 R_MIPS_HI16 00000000 .text
> 0000574c 00029306 R_MIPS_LO16 00000000 .text
>
> The first pair is the one from my first mail:
> 0x3c010000 <-- should be 0x3c010001 to work properly
> 0x24339444
>
> I'm planning to hunt for more now, will let you know.
Unfortunately, R_MIPS_32 also suffers from that. And unlikely
R_MIPS_{HI,LO}16, they can't be handled runtime as the values
are pure random.
I expanded arch/mips/kernel/module.c a bit, so it tries to find
the actual symbol in .symtab after each applied relocation and
print the detailed info. Here's an example from nf_defrag_ipv6
loading:
[ 429.789793] nf_defrag_ipv6: final section addresses:
[ 429.795409] 0xc07214fc __ksymtab_gpl
[ 429.799574] 0xc0720000 .text
[ 429.802902] 0xc07216b0 .data
[ 429.806249] 0xc0721790 .bss
[ 429.809474] 0xc0721508 __ksymtab_strings
[ 429.813977] 0xc0728000 .init.text
[ 429.817781] 0xc07214c0 .exit.text
[ 429.821606] 0xc0721520 .rodata
[ 429.825120] 0xc0721578 .rodata.str1.1
[ 429.829322] 0xc0721638 .note.Linux
[ 429.833226] 0xc0721800 .gnu.linkonce.this_module
[ 429.838503] 0xc0721650 .MIPS.abiflags
[ 429.842702] 0xc0721668 .reginfo
[ 429.846326] 0xc0721680 .note.gnu.build-id
[ 429.851129] nf_defrag_ipv6: R_MIPS_32 [0x00000008]: 0xc07216b0 -> 0xc07216b8 is broken
[ 429.860017] nf_defrag_ipv6: R_MIPS_32 [0x00000008]: 0xc07216b0 -> 0xc07216b8 is broken
[ 429.868875] nf_defrag_ipv6: R_MIPS_32 [0x00000138]: 0xc0720000 -> 0xc0720138 is defrag6_net_exit
[ 429.878706] nf_defrag_ipv6: R_MIPS_32 [0x000012c8]: 0xc0720000 -> 0xc07212c8 is nf_ct_net_init
[ 429.888335] nf_defrag_ipv6: R_MIPS_32 [0x0000142c]: 0xc0720000 -> 0xc072142c is nf_ct_net_pre_exit
[ 429.898367] nf_defrag_ipv6: R_MIPS_32 [0x00001440]: 0xc0720000 -> 0xc0721440 is nf_ct_net_exit
[ 429.907994] nf_defrag_ipv6: R_MIPS_32 [0x00000057]: 0xc0721578 -> 0xc07215cf is broken
[ 429.916872] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0x80f297f0 -> 0x80f297f0 is proc_dointvec_jiffies
[ 429.927177] nf_defrag_ipv6: R_MIPS_32 [0x00000039]: 0xc0721578 -> 0xc07215b1 is broken
[ 429.936044] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0x80f29374 -> 0x80f29374 is proc_doulongvec_minmax
[ 429.946453] nf_defrag_ipv6: R_MIPS_32 [0x00000072]: 0xc0721578 -> 0xc07215ea is broken
[ 429.955320] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0x80f29374 -> 0x80f29374 is proc_doulongvec_minmax
[ 429.965737] nf_defrag_ipv6: R_MIPS_32 [0x000000a4]: 0xc0720000 -> 0xc07200a4 is ipv6_defrag
[ 429.975094] nf_defrag_ipv6: R_MIPS_32 [0x000000a4]: 0xc0720000 -> 0xc07200a4 is ipv6_defrag
[ 429.984431] nf_defrag_ipv6: R_MIPS_32 [0x0000106c]: 0xc0720000 -> 0xc072106c is ip6frag_key_hashfn
[ 429.994470] nf_defrag_ipv6: R_MIPS_32 [0x00001090]: 0xc0720000 -> 0xc0721090 is ip6frag_obj_hashfn
[ 430.004486] nf_defrag_ipv6: R_MIPS_32 [0x000010b8]: 0xc0720000 -> 0xc07210b8 is ip6frag_obj_cmpfn
[ 430.014425] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0xc0720000 -> 0xc0720000 is nf_defrag_ipv6_enable
[ 430.024742] nf_defrag_ipv6: R_MIPS_32 [0x00000001]: 0xc0721508 -> 0xc0721509 is __kstrtab_nf_defrag_ipv6_enable
[ 430.036074] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0xc0721508 -> 0xc0721508 is __kstrtabns_nf_defrag_ipv6_enable
[ 430.047561] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0xc0728000 -> 0xc0728000 is init_module
[ 430.056930] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0xc07214c0 -> 0xc07214c0 is cleanup_module
At least five symbols are broken and lead to nowhere: two from .data
and three from .rodata. Values in square braces are initial references
that can be observed via `nm -n` -- and for broken ones they really
don't correspond to any symbols, mismatching the neighbours' addresses
by 0x40-0x50.
So for now seems like it's really an LLVM problem and there can't be
any simple workaround for it in the kernel.
> [0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989
>
>> Thanks,
>> ~Nick Desaulniers
>
> Thanks,
> Al
Al
From: Alexander Lobakin <[email protected]>
Date: Sat, 09 Jan 2021 23:29:26 +0000
> From: Alexander Lobakin <[email protected]>
> Date: Sat, 09 Jan 2021 19:15:31 +0000
>
>> From: Nick Desaulniers <[email protected]>
>> Date: Sat, 9 Jan 2021 09:50:44 -0800
>>
>>> On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin <[email protected]> wrote:
>>>>
>>>> Machine: MIPS32 R2 Big Endian (interAptiv (multi))
>>>>
>>>> While testing MIPS with LLVM, I found a weird and very rare bug with
>>>> MIPS relocs that LLVM emits into kernel modules. It happens on both
>>>> 11.0.0 and latest git snapshot and applies, as I can see, only to
>>>> references to static symbols.
>>>>
>>>> When the kernel loads the module, it allocates a space for every
>>>> section and then manually apply the relocations relative to the
>>>> new address.
>>>>
>>>> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
>>>> It's static and referenced only in phy_register_driver(), where it's
>>>> used to fill callback pointer in a structure.
>>>>
>>>> The real function address after module loading is 0xc06c1444, that
>>>> is observed in its ELF st_value field.
>>>> There are two relocs related to this usage in phy_register_driver():
>>>>
>>>> R_MIPS_HI16 refers to 0x3c010000
>>>> R_MIPS_LO16 refers to 0x24339444
>>>>
>>>> The address of .text is 0xc06b8000. So the destination is calculated
>>>> as follows:
>>>>
>>>> 0x00000000 from hi16;
>>>> 0xffff9444 from lo16 (sign extend as it's always treated as signed);
>>>> 0xc06b8000 from base.
>>>>
>>>> = 0xc06b1444. The value is lower than the real phy_probe() address
>>>> (0xc06c1444) by 0x10000 and is lower than the base address of
>>>> module's .text, so it's 100% incorrect.
>>>>
>>>> This results in:
>>>>
>>>> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
>>>> address c06b1444, epc == c06b1444, ra == 803f1090
>>>>
>>>> The correct instructions should be:
>>>>
>>>> R_MIPS_HI16 0x3c010001
>>>> R_MIPS_LO16 0x24339444
>>>>
>>>> so there'll be 0x00010000 from hi16.
>>>>
>>>> I tried to catch those bugs in arch/mips/kernel/module.c (by checking
>>>> if the destination is lower than the base address, which should never
>>>> happen), and seems like I have only 3 such places in libphy.ko (and
>>>> one in nf_tables.ko).
>>>> I don't think it should be handled somehow in mentioned source code
>>>> as it would look rather ugly and may break kernels build with GNU
>>>> stack, which seems to not produce such bad codes.
>>>>
>>>> If I should report this to any other resources, please let me know.
>>>> I chose clang-built-linux and LKML as it may not happen with userland
>>>> (didn't tried to catch).
>>>
>>> Thanks for the report. Sounds like we may indeed be producing an
>>> incorrect relocation. This is only seen for big endian triples?
>>
>> Unfortunately I don't have a LE board to play with, so can confirm
>> only Big Endian.
>>
>> (BTW, if someone can say if it's possible for MIPS (and how if it is)
>> to launch a LE kernel from BE-booted preloader and U-Boot, that would
>> be super cool)
>>
>>> Getting a way for us to deterministically reproduce would be a good
>>> first step. Which config or configs beyond defconfig, and which
>>> relocations specifically are you observing this with?
>>
>> I use `make 32r2_defconfig` which combines several configs from
>> arch/mips/configs:
>> - generic_defconfig;
>> - generic/32r2.config;
>> - generic/eb.config.
>>
>> Aside from that, I enable a bunch of my WIP drivers and the
>> Netfilter. On my setup, this bug is always present in libphy.ko,
>> so CONFIG_PHYLIB=m (with all deps) should be enough.
>>
>> The three failed relocs belongs to this part of code: [0]
>>
>> llvm-readelf on them:
>>
>> Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries:
>> [...]
>> 00005740 00029305 R_MIPS_HI16 00000000 .text
>> 00005744 00029306 R_MIPS_LO16 00000000 .text
>> 00005720 00029305 R_MIPS_HI16 00000000 .text
>> 00005748 00029306 R_MIPS_LO16 00000000 .text
>> 0000573c 00029305 R_MIPS_HI16 00000000 .text
>> 0000574c 00029306 R_MIPS_LO16 00000000 .text
>>
>> The first pair is the one from my first mail:
>> 0x3c010000 <-- should be 0x3c010001 to work properly
>> 0x24339444
>>
>> I'm planning to hunt for more now, will let you know.
>
> Unfortunately, R_MIPS_32 also suffers from that. And unlikely
> R_MIPS_{HI,LO}16, they can't be handled runtime as the values
> are pure random.
> I expanded arch/mips/kernel/module.c a bit, so it tries to find
> the actual symbol in .symtab after each applied relocation and
> print the detailed info. Here's an example from nf_defrag_ipv6
> loading:
>
> [ 429.789793] nf_defrag_ipv6: final section addresses:
> [ 429.795409] =090xc07214fc __ksymtab_gpl
> [ 429.799574] =090xc0720000 .text
> [ 429.802902] =090xc07216b0 .data
> [ 429.806249] =090xc0721790 .bss
> [ 429.809474] =090xc0721508 __ksymtab_strings
> [ 429.813977] =090xc0728000 .init.text
> [ 429.817781] =090xc07214c0 .exit.text
> [ 429.821606] =090xc0721520 .rodata
> [ 429.825120] =090xc0721578 .rodata.str1.1
> [ 429.829322] =090xc0721638 .note.Linux
> [ 429.833226] =090xc0721800 .gnu.linkonce.this_module
> [ 429.838503] =090xc0721650 .MIPS.abiflags
> [ 429.842702] =090xc0721668 .reginfo
> [ 429.846326] =090xc0721680 .note.gnu.build-id
> [ 429.851129] nf_defrag_ipv6: R_MIPS_32 [0x00000008]: 0xc07216b0 -> 0xc07216b8 is broken
> [ 429.860017] nf_defrag_ipv6: R_MIPS_32 [0x00000008]: 0xc07216b0 -> 0xc07216b8 is broken
> [ 429.868875] nf_defrag_ipv6: R_MIPS_32 [0x00000138]: 0xc0720000 -> 0xc0720138 is defrag6_net_exit
> [ 429.878706] nf_defrag_ipv6: R_MIPS_32 [0x000012c8]: 0xc0720000 -> 0xc07212c8 is nf_ct_net_init
> [ 429.888335] nf_defrag_ipv6: R_MIPS_32 [0x0000142c]: 0xc0720000 -> 0xc072142c is nf_ct_net_pre_exit
> [ 429.898367] nf_defrag_ipv6: R_MIPS_32 [0x00001440]: 0xc0720000 -> 0xc0721440 is nf_ct_net_exit
> [ 429.907994] nf_defrag_ipv6: R_MIPS_32 [0x00000057]: 0xc0721578 -> 0xc07215cf is broken
> [ 429.916872] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0x80f297f0 -> 0x80f297f0 is proc_dointvec_jiffies
> [ 429.927177] nf_defrag_ipv6: R_MIPS_32 [0x00000039]: 0xc0721578 -> 0xc07215b1 is broken
> [ 429.936044] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0x80f29374 -> 0x80f29374 is proc_doulongvec_minmax
> [ 429.946453] nf_defrag_ipv6: R_MIPS_32 [0x00000072]: 0xc0721578 -> 0xc07215ea is broken
> [ 429.955320] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0x80f29374 -> 0x80f29374 is proc_doulongvec_minmax
> [ 429.965737] nf_defrag_ipv6: R_MIPS_32 [0x000000a4]: 0xc0720000 -> 0xc07200a4 is ipv6_defrag
> [ 429.975094] nf_defrag_ipv6: R_MIPS_32 [0x000000a4]: 0xc0720000 -> 0xc07200a4 is ipv6_defrag
> [ 429.984431] nf_defrag_ipv6: R_MIPS_32 [0x0000106c]: 0xc0720000 -> 0xc072106c is ip6frag_key_hashfn
> [ 429.994470] nf_defrag_ipv6: R_MIPS_32 [0x00001090]: 0xc0720000 -> 0xc0721090 is ip6frag_obj_hashfn
> [ 430.004486] nf_defrag_ipv6: R_MIPS_32 [0x000010b8]: 0xc0720000 -> 0xc07210b8 is ip6frag_obj_cmpfn
> [ 430.014425] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0xc0720000 -> 0xc0720000 is nf_defrag_ipv6_enable
> [ 430.024742] nf_defrag_ipv6: R_MIPS_32 [0x00000001]: 0xc0721508 -> 0xc0721509 is __kstrtab_nf_defrag_ipv6_enable
> [ 430.036074] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0xc0721508 -> 0xc0721508 is __kstrtabns_nf_defrag_ipv6_enable
> [ 430.047561] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0xc0728000 -> 0xc0728000 is init_module
> [ 430.056930] nf_defrag_ipv6: R_MIPS_32 [0x00000000]: 0xc07214c0 -> 0xc07214c0 is cleanup_module
>
> At least five symbols are broken and lead to nowhere: two from .data
> and three from .rodata. Values in square braces are initial references
> that can be observed via `nm -n` -- and for broken ones they really
> don't correspond to any symbols, mismatching the neighbours' addresses
> by 0x40-0x50.
Oops, my bad. I forgot that they can point to the middle of the struct
or array or what else. Nevermind, only the problem with R_MIPS_HI16 is
actual.
With the "add 0x10000 if can't find the symbol" workaround I'm now
able to run kernel and modules without any visible defects or issues.
Full list of detected and fixed relocs on my setup:
libphy: R_MIPS_HI16 [0x3c030000, 0x24639444]: .text -> 0xc06b1444 is broken
libphy: R_MIPS_HI16 [0x3c030000, 0x24639444]: .text -> 0xc06c1444 is phy_probe
libphy: R_MIPS_HI16 [0x3c020000, 0x2442970c]: .text -> 0xc06b170c is broken
libphy: R_MIPS_HI16 [0x3c020000, 0x2442970c]: .text -> 0xc06c170c is phy_remove
libphy: R_MIPS_HI16 [0x3c010000, 0x242197ac]: .text -> 0xc06b17ac is broken
libphy: R_MIPS_HI16 [0x3c010000, 0x242197ac]: .text -> 0xc06c17ac is phy_shutdown
nf_tables: R_MIPS_HI16 [0x3c010001, 0x24218164]: .text -> 0xc07bc164 is broken
nf_tables: R_MIPS_HI16 [0x3c010001, 0x24218164]: .text -> 0xc07cc164 is nf_tables_dump_obj_done
nf_tables: R_MIPS_HI16 [0x3c010001, 0x243981bc]: .text -> 0xc07bc1bc is broken
nf_tables: R_MIPS_HI16 [0x3c010001, 0x243981bc]: .text -> 0xc07cc1bc is nft_flowtable_parse_hook
nf_tables: R_MIPS_HI16 [0x3c010001, 0x24398390]: .text -> 0xc07bc390 is broken
nf_tables: R_MIPS_HI16 [0x3c010001, 0x24398390]: .text -> 0xc07cc390 is nft_register_flowtable_net_hooks
nf_tables: R_MIPS_HI16 [0x3c010001, 0x243981bc]: .text -> 0xc07bc1bc is broken
nf_tables: R_MIPS_HI16 [0x3c010001, 0x243981bc]: .text -> 0xc07cc1bc is nft_flowtable_parse_hook
nf_tables: R_MIPS_HI16 [0x3c010001, 0x24398390]: .text -> 0xc07bc390 is broken
nf_tables: R_MIPS_HI16 [0x3c010001, 0x24398390]: .text -> 0xc07cc390 is nft_register_flowtable_net_hooks
nf_tables: R_MIPS_HI16 [0x3c010001, 0x2421866c]: .text -> 0xc07bc66c is broken
nf_tables: R_MIPS_HI16 [0x3c010001, 0x2421866c]: .text -> 0xc07cc66c is nf_tables_dump_flowtable
nf_tables: R_MIPS_HI16 [0x3c010001, 0x242185b4]: .text -> 0xc07bc5b4 is broken
nf_tables: R_MIPS_HI16 [0x3c010001, 0x242185b4]: .text -> 0xc07cc5b4 is nf_tables_dump_flowtable_start
nf_tables: R_MIPS_HI16 [0x3c010001, 0x243981bc]: .text -> 0xc07bc1bc is broken
nf_tables: R_MIPS_HI16 [0x3c010001, 0x243981bc]: .text -> 0xc07cc1bc is nft_flowtable_parse_hook
nf_tables: R_MIPS_HI16 [0x3c020002, 0x24428080]: .text -> 0xc07cc080 is broken
nf_tables: R_MIPS_HI16 [0x3c020002, 0x24428080]: .text -> 0xc07dc080 is nft_rbtree_gc
nf_conntrack: R_MIPS_HI16 [0x3c010000, 0x24258538]: .text -> 0xc077c538 is broken
nf_conntrack: R_MIPS_HI16 [0x3c010000, 0x24258538]: .text -> 0xc078c538 is nf_ct_expectation_timed_out
> So for now seems like it's really an LLVM problem and there can't be
> any simple workaround for it in the kernel.
>
>> [0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989
>>
>>> Thanks,
>>> ~Nick Desaulniers
>>
>> Thanks,
>> Al
>
> Al
Al
Hi Alexander,
I'm genuinely trying to reproduce/understand this report, questions below:
On Sat, Jan 9, 2021 at 11:15 AM Alexander Lobakin <[email protected]> wrote:
>
> From: Nick Desaulniers <[email protected]>
> Date: Sat, 9 Jan 2021 09:50:44 -0800
>
> > On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin <[email protected]> wrote:
> >>
> >> Machine: MIPS32 R2 Big Endian (interAptiv (multi))
> >>
> >> While testing MIPS with LLVM, I found a weird and very rare bug with
> >> MIPS relocs that LLVM emits into kernel modules. It happens on both
> >> 11.0.0 and latest git snapshot and applies, as I can see, only to
> >> references to static symbols.
> >>
> >> When the kernel loads the module, it allocates a space for every
> >> section and then manually apply the relocations relative to the
> >> new address.
> >>
> >> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
> >> It's static and referenced only in phy_register_driver(), where it's
> >> used to fill callback pointer in a structure.
> >>
> >> The real function address after module loading is 0xc06c1444, that
> >> is observed in its ELF st_value field.
> >> There are two relocs related to this usage in phy_register_driver():
> >>
> >> R_MIPS_HI16 refers to 0x3c010000
> >> R_MIPS_LO16 refers to 0x24339444
Sorry, how are these calculated? (Explicit shell commands invoked
would be appreciated)
I'm doing:
$ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 32r2_defconfig
$ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 modules
$ llvm-nm --format=sysv drivers/net/phy/phy_device.o | grep phy_probe
$ llvm-objdump -Dr --disassemble-symbols=phy_driver_register
drivers/net/phy/phy_device.o
$ llvm-readelf -r drivers/net/phy/phy_device.o | grep -e R_MIPS_HI16
-e R_MIPS_LO16
for some of the commands trying to verify.
> >>
> >> The address of .text is 0xc06b8000. So the destination is calculated
> >> as follows:
> >>
> >> 0x00000000 from hi16;
> >> 0xffff9444 from lo16 (sign extend as it's always treated as signed);
> >> 0xc06b8000 from base.
> >>
> >> = 0xc06b1444. The value is lower than the real phy_probe() address
> >> (0xc06c1444) by 0x10000 and is lower than the base address of
> >> module's .text, so it's 100% incorrect.
The disassembly for me produces:
399c: 3c 03 00 00 lui $3, 0 <phy_device_free>
0000399c: R_MIPS_HI16 .text
...
39a8: 24 63 3a 5c addiu $3, $3, 14940 <phy_probe>
000039a8: R_MIPS_LO16 .text
I'm not really sure how to manually resolve the relocations; Fangrui
do you have any tips? (I'm coincidentally reading through Linkers &
Loaders currently, but only just started chpt. 4).
> >>
> >> This results in:
> >>
> >> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
> >> address c06b1444, epc == c06b1444, ra == 803f1090
> >>
> >> The correct instructions should be:
> >>
> >> R_MIPS_HI16 0x3c010001
> >> R_MIPS_LO16 0x24339444
> >>
> >> so there'll be 0x00010000 from hi16.
> >>
> >> I tried to catch those bugs in arch/mips/kernel/module.c (by checking
> >> if the destination is lower than the base address, which should never
> >> happen), and seems like I have only 3 such places in libphy.ko (and
> >> one in nf_tables.ko).
> >> I don't think it should be handled somehow in mentioned source code
> >> as it would look rather ugly and may break kernels build with GNU
> >> stack, which seems to not produce such bad codes.
> >>
> >> If I should report this to any other resources, please let me know.
> >> I chose clang-built-linux and LKML as it may not happen with userland
> >> (didn't tried to catch).
> >
> > Thanks for the report. Sounds like we may indeed be producing an
> > incorrect relocation. This is only seen for big endian triples?
>
> Unfortunately I don't have a LE board to play with, so can confirm
> only Big Endian.
>
> (BTW, if someone can say if it's possible for MIPS (and how if it is)
> to launch a LE kernel from BE-booted preloader and U-Boot, that would
> be super cool)
>
> > Getting a way for us to deterministically reproduce would be a good
> > first step. Which config or configs beyond defconfig, and which
> > relocations specifically are you observing this with?
>
> I use `make 32r2_defconfig` which combines several configs from
> arch/mips/configs:
> - generic_defconfig;
> - generic/32r2.config;
> - generic/eb.config.
>
> Aside from that, I enable a bunch of my WIP drivers and the
> Netfilter. On my setup, this bug is always present in libphy.ko,
> so CONFIG_PHYLIB=m (with all deps) should be enough.
>
> The three failed relocs belongs to this part of code: [0]
>
> llvm-readelf on them:
>
> Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries:¬
> [...]
> 00005740 00029305 R_MIPS_HI16 00000000 .text
> 00005744 00029306 R_MIPS_LO16 00000000 .text
> 00005720 00029305 R_MIPS_HI16 00000000 .text
> 00005748 00029306 R_MIPS_LO16 00000000 .text
> 0000573c 00029305 R_MIPS_HI16 00000000 .text
> 0000574c 00029306 R_MIPS_LO16 00000000 .text
>
> The first pair is the one from my first mail:
> 0x3c010000 <-- should be 0x3c010001 to work properly
> 0x24339444
>
> I'm planning to hunt for more now, will let you know.
>
> [0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989
>
> > Thanks,
> > ~Nick Desaulniers
>
> Thanks,
> Al
>
--
Thanks,
~Nick Desaulniers
From: Nick Desaulniers <[email protected]>
Date: Mon, 11 Jan 2021 12:03:29 -0800
> Hi Alexander,
> I'm genuinely trying to reproduce/understand this report, questions below:
Hi Nick!
> On Sat, Jan 9, 2021 at 11:15 AM Alexander Lobakin <[email protected]> wrote:
>>
>> From: Nick Desaulniers <[email protected]>
>> Date: Sat, 9 Jan 2021 09:50:44 -0800
>>
>>> On Sat, Jan 9, 2021 at 9:11 AM Alexander Lobakin <[email protected]> wrote:
>>>>
>>>> Machine: MIPS32 R2 Big Endian (interAptiv (multi))
>>>>
>>>> While testing MIPS with LLVM, I found a weird and very rare bug with
>>>> MIPS relocs that LLVM emits into kernel modules. It happens on both
>>>> 11.0.0 and latest git snapshot and applies, as I can see, only to
>>>> references to static symbols.
>>>>
>>>> When the kernel loads the module, it allocates a space for every
>>>> section and then manually apply the relocations relative to the
>>>> new address.
>>>>
>>>> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
>>>> It's static and referenced only in phy_register_driver(), where it's
>>>> used to fill callback pointer in a structure.
>>>>
>>>> The real function address after module loading is 0xc06c1444, that
>>>> is observed in its ELF st_value field.
>>>> There are two relocs related to this usage in phy_register_driver():
>>>>
>>>> R_MIPS_HI16 refers to 0x3c010000
>>>> R_MIPS_LO16 refers to 0x24339444
>
> Sorry, how are these calculated? (Explicit shell commands invoked
> would be appreciated)
Just look at the disassembly below and you'll see the similar
instructions in your module.
> I'm doing:
> $ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 32r2_defconfig
> $ ARCH=mips CROSS_COMPILE=mips-linux-gnu- make CC=clang -j71 modules
> $ llvm-nm --format=sysv drivers/net/phy/phy_device.o | grep phy_probe
> $ llvm-objdump -Dr --disassemble-symbols=phy_driver_register drivers/net/phy/phy_device.o
> $ llvm-readelf -r drivers/net/phy/phy_device.o | grep -e R_MIPS_HI16 -e R_MIPS_LO16
>
> for some of the commands trying to verify.
>
>>>>
>>>> The address of .text is 0xc06b8000. So the destination is calculated
>>>> as follows:
>>>>
>>>> 0x00000000 from hi16;
>>>> 0xffff9444 from lo16 (sign extend as it's always treated as signed);
>>>> 0xc06b8000 from base.
>>>>
>>>> = 0xc06b1444. The value is lower than the real phy_probe() address
>>>> (0xc06c1444) by 0x10000 and is lower than the base address of
>>>> module's .text, so it's 100% incorrect.
>
> The disassembly for me produces:
> 399c: 3c 03 00 00 lui $3, 0 <phy_device_free>
> 0000399c: R_MIPS_HI16 .text
> ...
> 39a8: 24 63 3a 5c addiu $3, $3, 14940 <phy_probe>
> 000039a8: R_MIPS_LO16 .text
So, in your case the values of the instructions that relocs refer are:
0x3c030000 R_MIPS_HI16
0x24633a5c R_MIPS_LO16
Mine were:
0x3c010000
0x24339444
Your second one doesn't have bit 15 set, so I think this pair won't
break the code.
Try to hunt for R_MIPS_LO16 that have this bit set, i.e. they have
'8', '9', 'a', 'b', 'c', 'd' or 'e' as their [15:12].
> I'm not really sure how to manually resolve the relocations; Fangrui
> do you have any tips? (I'm coincidentally reading through Linkers &
> Loaders currently, but only just started chpt. 4).
>
>>>>
>>>> This results in:
>>>>
>>>> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
>>>> address c06b1444, epc == c06b1444, ra == 803f1090
>>>>
>>>> The correct instructions should be:
>>>>
>>>> R_MIPS_HI16 0x3c010001
>>>> R_MIPS_LO16 0x24339444
>>>>
>>>> so there'll be 0x00010000 from hi16.
>>>>
>>>> I tried to catch those bugs in arch/mips/kernel/module.c (by checking
>>>> if the destination is lower than the base address, which should never
>>>> happen), and seems like I have only 3 such places in libphy.ko (and
>>>> one in nf_tables.ko).
>>>> I don't think it should be handled somehow in mentioned source code
>>>> as it would look rather ugly and may break kernels build with GNU
>>>> stack, which seems to not produce such bad codes.
>>>>
>>>> If I should report this to any other resources, please let me know.
>>>> I chose clang-built-linux and LKML as it may not happen with userland
>>>> (didn't tried to catch).
>>>
>>> Thanks for the report. Sounds like we may indeed be producing an
>>> incorrect relocation. This is only seen for big endian triples?
>>
>> Unfortunately I don't have a LE board to play with, so can confirm
>> only Big Endian.
>>
>> (BTW, if someone can say if it's possible for MIPS (and how if it is)
>> to launch a LE kernel from BE-booted preloader and U-Boot, that would
>> be super cool)
>>
>>> Getting a way for us to deterministically reproduce would be a good
>>> first step. Which config or configs beyond defconfig, and which
>>> relocations specifically are you observing this with?
>>
>> I use `make 32r2_defconfig` which combines several configs from
>> arch/mips/configs:
>> - generic_defconfig;
>> - generic/32r2.config;
>> - generic/eb.config.
>>
>> Aside from that, I enable a bunch of my WIP drivers and the
>> Netfilter. On my setup, this bug is always present in libphy.ko,
>> so CONFIG_PHYLIB=m (with all deps) should be enough.
>>
>> The three failed relocs belongs to this part of code: [0]
>>
>> llvm-readelf on them:
>>
>> Relocation section '.rel.text' at offset 0xbf60 contains 2281 entries:
>> [...]
>> 00005740 00029305 R_MIPS_HI16 00000000 .text
>> 00005744 00029306 R_MIPS_LO16 00000000 .text
>> 00005720 00029305 R_MIPS_HI16 00000000 .text
>> 00005748 00029306 R_MIPS_LO16 00000000 .text
>> 0000573c 00029305 R_MIPS_HI16 00000000 .text
>> 0000574c 00029306 R_MIPS_LO16 00000000 .text
>>
>> The first pair is the one from my first mail:
>> 0x3c010000 <-- should be 0x3c010001 to work properly
>> 0x24339444
>>
>> I'm planning to hunt for more now, will let you know.
>>
>> [0] https://elixir.bootlin.com/linux/v5.11-rc2/source/drivers/net/phy/phy_device.c#L2989
>>
>> > Thanks,
>> > ~Nick Desaulniers
>>
>> Thanks,
>> Al
>>
>
> Thanks,
> ~Nick Desaulniers
Thanks,
Al
On Mon, Jan 11, 2021 at 12:50 PM Alexander Lobakin <[email protected]> wrote:
>
> > The disassembly for me produces:
> > 399c: 3c 03 00 00 lui $3, 0 <phy_device_free>
> > 0000399c: R_MIPS_HI16 .text
> > ...
> > 39a8: 24 63 3a 5c addiu $3, $3, 14940 <phy_probe>
> > 000039a8: R_MIPS_LO16 .text
>
> So, in your case the values of the instructions that relocs refer are:
>
> 0x3c030000 R_MIPS_HI16
> 0x24633a5c R_MIPS_LO16
>
> Mine were:
>
> 0x3c010000
> 0x24339444
>
> Your second one doesn't have bit 15 set, so I think this pair won't
> break the code.
> Try to hunt for R_MIPS_LO16 that have this bit set, i.e. they have
> '8', '9', 'a', 'b', 'c', 'd' or 'e' as their [15:12].
I don't think any of my R_MIPS_LO16 in that file have that bit set.
See attached.
--
Thanks,
~Nick Desaulniers
From: Nick Desaulniers <[email protected]>
Date: Tue, 12 Jan 2021 14:14:58 -0800
> On Mon, Jan 11, 2021 at 12:50 PM Alexander Lobakin <[email protected]> wrote:
>>
>>> The disassembly for me produces:
>>> 399c: 3c 03 00 00 lui $3, 0 <phy_device_free>
>>> 0000399c: R_MIPS_HI16 .text
>>> ...
>>> 39a8: 24 63 3a 5c addiu $3, $3, 14940 <phy_probe>
>>> 000039a8: R_MIPS_LO16 .text
>>
>> So, in your case the values of the instructions that relocs refer are:
>>
>> 0x3c030000 R_MIPS_HI16
>> 0x24633a5c R_MIPS_LO16
>>
>> Mine were:
>>
>> 0x3c010000
>> 0x24339444
>>
>> Your second one doesn't have bit 15 set, so I think this pair won't
>> break the code.
>> Try to hunt for R_MIPS_LO16 that have this bit set, i.e. they have
>> '8', '9', 'a', 'b', 'c', 'd' or 'e' as their [15:12].
>
> I don't think any of my R_MIPS_LO16 in that file have that bit set.
> See attached.
Well, seems like you got lucky here.
You can try to check for other modules if any of them have
R_MIPS_LO16 with bit 15 set.
Also, you can try to enable:
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y
CONFIG_TRIM_UNUSED_KSYMS=y
to alter the build process. These options didn't change anything
in terms of relocs for me though.
> --
> Thanks,
> ~Nick Desaulniers
Thanks,
Al
On Sat, Jan 09, 2021 at 05:11:18PM +0000, Alexander Lobakin wrote:
> Machine: MIPS32 R2 Big Endian (interAptiv (multi))
>
> While testing MIPS with LLVM, I found a weird and very rare bug with
> MIPS relocs that LLVM emits into kernel modules. It happens on both
> 11.0.0 and latest git snapshot and applies, as I can see, only to
> references to static symbols.
>
> When the kernel loads the module, it allocates a space for every
> section and then manually apply the relocations relative to the
> new address.
>
> Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
> It's static and referenced only in phy_register_driver(), where it's
> used to fill callback pointer in a structure.
>
> The real function address after module loading is 0xc06c1444, that
> is observed in its ELF st_value field.
> There are two relocs related to this usage in phy_register_driver():
>
> R_MIPS_HI16 refers to 0x3c010000
> R_MIPS_LO16 refers to 0x24339444
>
> The address of .text is 0xc06b8000. So the destination is calculated
> as follows:
>
> 0x00000000 from hi16;
> 0xffff9444 from lo16 (sign extend as it's always treated as signed);
> 0xc06b8000 from base.
>
> = 0xc06b1444. The value is lower than the real phy_probe() address
> (0xc06c1444) by 0x10000 and is lower than the base address of
> module's .text, so it's 100% incorrect.
>
> This results in:
>
> [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
> address c06b1444, epc == c06b1444, ra == 803f1090
>
> The correct instructions should be:
>
> R_MIPS_HI16 0x3c010001
> R_MIPS_LO16 0x24339444
>
> so there'll be 0x00010000 from hi16.
>
> I tried to catch those bugs in arch/mips/kernel/module.c (by checking
> if the destination is lower than the base address, which should never
> happen), and seems like I have only 3 such places in libphy.ko (and
> one in nf_tables.ko).
> I don't think it should be handled somehow in mentioned source code
> as it would look rather ugly and may break kernels build with GNU
> stack, which seems to not produce such bad codes.
>
> If I should report this to any other resources, please let me know.
> I chose clang-built-linux and LKML as it may not happen with userland
> (didn't tried to catch).
>
> Thanks,
> Al
>
Hi Alexander,
Doubling back around to this as I was browsing through the LLVM 12.0.1
blockers on LLVM's bug tracker and I noticed a commit that could resolve
this? It refers to the same relocations that you reference here.
https://bugs.llvm.org/show_bug.cgi?id=49821
http://github.com/llvm/llvm-project/commit/7e83a7f1fdfcc2edde61f0a535f9d7a56f531db9
I think that Debian's apt.llvm.org repository should have a build
available with that commit in it. Otherwise, building it from source is
not too complicated with my script:
https://github.com/ClangBuiltLinux/tc-build
$ ./build-llvm.py --build-stage1-only --install-stage1-only --projects "clang;lld" --targets "Mips;X86"
would get you a working toolchain relatively quickly.
Cheers,
Nathan
From: Nathan Chancellor <[email protected]>
Date: Fri, 7 May 2021 00:47:14 -0700
> On Sat, Jan 09, 2021 at 05:11:18PM +0000, Alexander Lobakin wrote:
> > Machine: MIPS32 R2 Big Endian (interAptiv (multi))
> >
> > While testing MIPS with LLVM, I found a weird and very rare bug with
> > MIPS relocs that LLVM emits into kernel modules. It happens on both
> > 11.0.0 and latest git snapshot and applies, as I can see, only to
> > references to static symbols.
> >
> > When the kernel loads the module, it allocates a space for every
> > section and then manually apply the relocations relative to the
> > new address.
> >
> > Let's say we have a function phy_probe() in drivers/net/phy/libphy.ko.
> > It's static and referenced only in phy_register_driver(), where it's
> > used to fill callback pointer in a structure.
> >
> > The real function address after module loading is 0xc06c1444, that
> > is observed in its ELF st_value field.
> > There are two relocs related to this usage in phy_register_driver():
> >
> > R_MIPS_HI16 refers to 0x3c010000
> > R_MIPS_LO16 refers to 0x24339444
> >
> > The address of .text is 0xc06b8000. So the destination is calculated
> > as follows:
> >
> > 0x00000000 from hi16;
> > 0xffff9444 from lo16 (sign extend as it's always treated as signed);
> > 0xc06b8000 from base.
> >
> > =3D 0xc06b1444. The value is lower than the real phy_probe() address
> > (0xc06c1444) by 0x10000 and is lower than the base address of
> > module's .text, so it's 100% incorrect.
> >
> > This results in:
> >
> > [ 2.204022] CPU 3 Unable to handle kernel paging request at virtual
> > address c06b1444, epc =3D=3D c06b1444, ra =3D=3D 803f1090
> >
> > The correct instructions should be:
> >
> > R_MIPS_HI16 0x3c010001
> > R_MIPS_LO16 0x24339444
> >
> > so there'll be 0x00010000 from hi16.
> >
> > I tried to catch those bugs in arch/mips/kernel/module.c (by checking
> > if the destination is lower than the base address, which should never
> > happen), and seems like I have only 3 such places in libphy.ko (and
> > one in nf_tables.ko).
> > I don't think it should be handled somehow in mentioned source code
> > as it would look rather ugly and may break kernels build with GNU
> > stack, which seems to not produce such bad codes.
> >
> > If I should report this to any other resources, please let me know.
> > I chose clang-built-linux and LKML as it may not happen with userland
> > (didn't tried to catch).
> >
> > Thanks,
> > Al
> >
>
> Hi Alexander,
Hi!
> Doubling back around to this as I was browsing through the LLVM 12.0.1
> blockers on LLVM's bug tracker and I noticed a commit that could resolve
> this? It refers to the same relocations that you reference here.
>
> https://bugs.llvm.org/show_bug.cgi?id=3D49821
>
> http://github.com/llvm/llvm-project/commit/7e83a7f1fdfcc2edde61f0a535f9d7a=
> 56f531db9
This really seems very related to the bug I encountered.
Currently I don't have a MIPS setup to try since I've moved to
another country, but I should "deploy" it again soon. So I'll
definitely take a look a bit later, thanks for pointing on this
commit!
> I think that Debian's apt.llvm.org repository should have a build
> available with that commit in it. Otherwise, building it from source is
> not too complicated with my script:
>
> https://github.com/ClangBuiltLinux/tc-build
>
> $ ./build-llvm.py --build-stage1-only --install-stage1-only --projects "cl=
> ang;lld" --targets "Mips;X86"
>
> would get you a working toolchain relatively quickly.
I could just build llvm-git from Arch Linux User Repository :) I did
that last time when was checking if the latest snapshot also suffers
from the bug, and I think it didn't take much time to build.
> Cheers,
> Nathan
Thanks,
Al