2023-05-26 16:29:12

by Alexandre Ghiti

Subject: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

Early alternatives are called with the mmu disabled, and then should not
access any global symbols through the GOT since it requires relocations,
relocations that we do before but *virtually*. So only use medany code
model for this early code.

Signed-off-by: Alexandre Ghiti <[email protected]>
---

Note that I'm not very happy with this fix, I think we need to put more
effort into "harmonizing" this very early code (ie before the mmu is
enabled) as it is spread between different locations and compiled
differently. I'll work on that later, but for now, this fix does what is
needed to work (from my testing at least). Any Tested-by on the Unmatched
and T-head boards is welcome!

arch/riscv/errata/Makefile | 4 ++++
arch/riscv/kernel/Makefile | 4 ++++
2 files changed, 8 insertions(+)

diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile
index a1055965fbee..7b2637c8c332 100644
--- a/arch/riscv/errata/Makefile
+++ b/arch/riscv/errata/Makefile
@@ -1,2 +1,6 @@
+ifdef CONFIG_RELOCATABLE
+KBUILD_CFLAGS += -fno-pie
+endif
+
 obj-$(CONFIG_ERRATA_SIFIVE) += sifive/
 obj-$(CONFIG_ERRATA_THEAD) += thead/
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index fbdccc21418a..153864e4f399 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -23,6 +23,10 @@ ifdef CONFIG_FTRACE
 CFLAGS_REMOVE_alternative.o = $(CC_FLAGS_FTRACE)
 CFLAGS_REMOVE_cpufeature.o = $(CC_FLAGS_FTRACE)
 endif
+ifdef CONFIG_RELOCATABLE
+CFLAGS_alternative.o += -fno-pie
+CFLAGS_cpufeature.o += -fno-pie
+endif
 ifdef CONFIG_KASAN
 KASAN_SANITIZE_alternative.o := n
 KASAN_SANITIZE_cpufeature.o := n
--
2.39.2



2023-05-26 16:33:15

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
> Early alternatives are called with the mmu disabled, and then should not
> access any global symbols through the GOT since it requires relocations,
> relocations that we do before but *virtually*. So only use medany code
> model for this early code.
>
> Signed-off-by: Alexandre Ghiti <[email protected]>
> ---
>
> Note that I'm not very happy with this fix, I think we need to put more
> effort into "harmonizing" this very early code (ie before the mmu is
> enabled) as it is spread between different locations and compiled
> differently.

Totally & I'll happily spend the time trying to review that work.

> I'll work on that later, but for now, this fix does what is
> needed to work (from my testing at least). Any Tested-by on the Unmatched
> and T-head boards is welcome!

On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
config, my Nezha fails to boot. There is no output whatsoever from the
kernel. Turning off CONFIG_RELOCATABLE boots again.

I didn't test on my unmatched.

Thanks,
Conor.



2023-05-26 16:58:55

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Fri, May 26, 2023 at 05:24:41PM +0100, Conor Dooley wrote:
> On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
> > Early alternatives are called with the mmu disabled, and then should not
> > access any global symbols through the GOT since it requires relocations,
> > relocations that we do before but *virtually*. So only use medany code
> > model for this early code.
> >
> > Signed-off-by: Alexandre Ghiti <[email protected]>
> > ---
> >
> > Note that I'm not very happy with this fix, I think we need to put more
> > effort into "harmonizing" this very early code (ie before the mmu is
> > enabled) as it is spread between different locations and compiled
> > differently.
>
> Totally & I'll happily spend the time trying to review that work.
>
> > I'll work on that later, but for now, this fix does what is
> > needed to work (from my testing at least). Any Tested-by on the Unmatched
> > and T-head boards is welcome!
>
> On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
> config, my Nezha fails to boot. There is no output whatsoever from the
> kernel. Turning off CONFIG_RELOCATABLE boots again.

I don't know if this is better or worse news, but same thing happens on
an icicle kit. What systems, other than QEMU, has the relocatable
series been tested with, btw?

Cheers,
Conor.

>
> I didn't test on my unmatched.
>
> Thanks,
> Conor.




2023-05-26 17:00:12

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On 26/05/2023 18:24, Conor Dooley wrote:
> On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
>> Early alternatives are called with the mmu disabled, and then should not
>> access any global symbols through the GOT since it requires relocations,
>> relocations that we do before but *virtually*. So only use medany code
>> model for this early code.
>>
>> Signed-off-by: Alexandre Ghiti <[email protected]>
>> ---
>>
>> Note that I'm not very happy with this fix, I think we need to put more
>> effort into "harmonizing" this very early code (ie before the mmu is
>> enabled) as it is spread between different locations and compiled
>> differently.
> Totally & I'll happily spend the time trying to review that work.
>
>> I'll work on that later, but for now, this fix does what is
>> needed to work (from my testing at least). Any Tested-by on the Unmatched
>> and T-head boards is welcome!
> On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
> config, my Nezha fails to boot. There is no output whatsoever from the
> kernel. Turning off CONFIG_RELOCATABLE boots again.


Damn, that's going to ruin my long weekend... Thanks though, I'll try to
figure out what's going on. Too bad I don't have any T-head boards!

Thanks again,

Alex


> I didn't test on my unmatched.
>
> Thanks,
> Conor.

2023-05-27 09:40:39

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On 26/05/2023 18:35, Conor Dooley wrote:
> On Fri, May 26, 2023 at 05:24:41PM +0100, Conor Dooley wrote:
>> On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
>>> Early alternatives are called with the mmu disabled, and then should not
>>> access any global symbols through the GOT since it requires relocations,
>>> relocations that we do before but *virtually*. So only use medany code
>>> model for this early code.
>>>
>>> Signed-off-by: Alexandre Ghiti <[email protected]>
>>> ---
>>>
>>> Note that I'm not very happy with this fix, I think we need to put more
>>> effort into "harmonizing" this very early code (ie before the mmu is
>>> enabled) as it is spread between different locations and compiled
>>> differently.
>> Totally & I'll happily spend the time trying to review that work.
>>
>>> I'll work on that later, but for now, this fix does what is
>>> needed to work (from my testing at least). Any Tested-by on the Unmatched
>>> and T-head boards is welcome!
>> On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
>> config, my Nezha fails to boot. There is no output whatsoever from the
>> kernel. Turning off CONFIG_RELOCATABLE boots again.
> I don't know if this is better or worse news, but same thing happens on
> an icicle kit. What systems, other than QEMU, has the relocatable
> series been tested with, btw?


I tested it on the Unmatched (Andreas did too).

Very weird that it does not work on the Icicle Kit: there are no errata
for this SoC, so what gets executed this early on it? Do you know
where it fails to boot? If you can debug, you should break on the
address of the entry point (usually 0x8020_0000), since this is the stvec
address: when you get a trap, you will branch there. Could you then
dump $sepc, $ra and $stval when you get there?
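
A minimal sketch of that debugging session, assuming QEMU is started with
-s -S (gdb stub on TCP port 1234, CPU held until gdb attaches), that the
entry point is 0x80200000, and that the gdb stub exposes the sepc and stval
CSRs by name. The file name early-trap.gdb is made up for illustration, and
the gdb binary may be called gdb-multiarch or carry a cross prefix on your
system:

```shell
#!/bin/sh
# Sketch only: writes a gdb command file for catching the early trap.
# Assumptions: QEMU runs with -s -S, the entry point is 0x80200000,
# and the gdb stub exposes the sepc/stval CSRs by name.
cat > early-trap.gdb <<'EOF'
target remote :1234
# The MMU is still off here, so break on the physical entry address.
hbreak *0x80200000
# Print these registers at every stop. The first stop is the normal boot
# entry; a later stop at the same address would be a trap landing on stvec.
display/x $sepc
display/x $ra
display/x $stval
continue
EOF
echo "gdb -x early-trap.gdb vmlinux"
```

After the first stop, typing "continue" again lets the kernel run until it
either boots or traps back to the breakpoint.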

Regarding the thead issue, I think the following should fix it:

diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index b85e9e82f082..a9bf3f8c7cb4 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -3,6 +3,7 @@
 CFLAGS_init.o := -mcmodel=medany
 ifdef CONFIG_RELOCATABLE
 CFLAGS_init.o += -fno-pie
+CFLAGS_dma-noncoherent.o += -fno-pie
 endif

 ifdef CONFIG_FTRACE


>
> Cheers,
> Conor.
>
>> I didn't test on my unmatched.
>>
>> Thanks,
>> Conor.

2023-05-27 10:33:32

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Sat, May 27, 2023 at 11:13:18AM +0200, Alexandre Ghiti wrote:
>
> On 26/05/2023 18:35, Conor Dooley wrote:
> > On Fri, May 26, 2023 at 05:24:41PM +0100, Conor Dooley wrote:
> > > On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
> > > > Early alternatives are called with the mmu disabled, and then should not
> > > > access any global symbols through the GOT since it requires relocations,
> > > > relocations that we do before but *virtually*. So only use medany code
> > > > model for this early code.
> > > >
> > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > ---
> > > >
> > > > Note that I'm not very happy with this fix, I think we need to put more
> > > > effort into "harmonizing" this very early code (ie before the mmu is
> > > > enabled) as it is spread between different locations and compiled
> > > > differently.
> > > Totally & I'll happily spend the time trying to review that work.
> > >
> > > > I'll work on that later, but for now, this fix does what is
> > > > needed to work (from my testing at least). Any Tested-by on the Unmatched
> > > > and T-head boards is welcome!
> > > On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
> > > config, my Nezha fails to boot. There is no output whatsoever from the
> > > kernel. Turning off CONFIG_RELOCATABLE boots again.
> > I don't know if this is better or worse news, but same thing happens on
> > an icicle kit. What systems, other than QEMU, has the relocatable
> > series been tested with, btw?
>
>
> I tested it on the Unmatched (Andreas did too).

Cool. I cracked out my unmatched and it has the same issue as the
icicle. Ditto my Visionfive v2. Here's my config.
https://raw.githubusercontent.com/ConchuOD/riscv-env/dev/conf/defconfig

A ~default qemu virt doesn't work either. (-m 2G -smp 5)

> Very weird it does not work on the icicle kit, there is no errata for this
> soc, so what gets executed this early for this soc? Do you know where it
> fails to boot? If you can debug, you should break on the address of the
> entry point (usually 0x8020_0000) since this is the stvec address, so when
> you get a trap, you will branch there, and then could you dump $sepc, $ra
> and $stval when you get there?

> Regarding the thead issue, I think the following should fix it:

It did not :/

Cheers,
Conor.



2023-05-28 13:37:16

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Sun, May 28, 2023 at 03:00:57PM +0200, Alexandre Ghiti wrote:
> On Sat, May 27, 2023 at 12:02 PM Conor Dooley <[email protected]> wrote:
> >
> > On Sat, May 27, 2023 at 11:13:18AM +0200, Alexandre Ghiti wrote:
> > >
> > > On 26/05/2023 18:35, Conor Dooley wrote:
> > > > On Fri, May 26, 2023 at 05:24:41PM +0100, Conor Dooley wrote:
> > > > > On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
> > > > > > Early alternatives are called with the mmu disabled, and then should not
> > > > > > access any global symbols through the GOT since it requires relocations,
> > > > > > relocations that we do before but *virtually*. So only use medany code
> > > > > > model for this early code.
> > > > > >
> > > > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > > > ---
> > > > > >
> > > > > > Note that I'm not very happy with this fix, I think we need to put more
> > > > > > effort into "harmonizing" this very early code (ie before the mmu is
> > > > > > enabled) as it is spread between different locations and compiled
> > > > > > differently.
> > > > > Totally & I'll happily spend the time trying to review that work.
> > > > >
> > > > > > I'll work on that later, but for now, this fix does what is
> > > > > > needed to work (from my testing at least). Any Tested-by on the Unmatched
> > > > > > and T-head boards is welcome!
> > > > > On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
> > > > > config, my Nezha fails to boot. There is no output whatsoever from the
> > > > > kernel. Turning off CONFIG_RELOCATABLE boots again.
> > > > I don't know if this is better or worse news, but same thing happens on
> > > > an icicle kit. What systems, other than QEMU, has the relocatable
> > > > series been tested with, btw?
> > >
> > >
> > > I tested it on the Unmatched (Andreas did too).
> >
> > Cool. I cracked out my unmatched and it has the same issue as the
> > icicle. Ditto my Visionfive v2. Here's my config.
> > https://raw.githubusercontent.com/ConchuOD/riscv-env/dev/conf/defconfig
> >
> > A ~default qemu virt doesn't work either. (-m 2G -smp 5)
>
> I can boot with this config using:
>
> $ sudo ~/qemu/build/qemu-system-riscv64 -machine virt -cpu
> rv64,sv48=off -nographic -m 2G -smp 5 -kernel
> build_conor/arch/riscv/boot/Image -s

Just in case, that is my normal config that I use for testing random
stuff on LKML, I added CONFIG_RELOCATABLE in addition to that.

> I noticed when trying to add this to our internal CI that I had local
> failures that did not happen in the CI because the CI was not using
> the same toolchain: can you give me the full .config? So that I can
> see if the compiler added stack guards or some other things I did not
> think of.

https://gist.githubusercontent.com/ConchuOD/655f9cc19fb3be63f1c9da7e7e3ab717/raw/a1aad3c0d307609b2062fd3a66705166aede9f9f/.config

90% of what I test for upstream stuff uses clang, since clang appears to
be a minority choice - but I could reproduce this with gcc-12 as well,
using the same defconfig as linked above + CONFIG_RELOCATABLE.

Cheers,
Conor.



2023-05-28 14:02:14

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On 28/05/2023 15:12, Conor Dooley wrote:
> On Sun, May 28, 2023 at 03:00:57PM +0200, Alexandre Ghiti wrote:
>> On Sat, May 27, 2023 at 12:02 PM Conor Dooley <[email protected]> wrote:
>>> On Sat, May 27, 2023 at 11:13:18AM +0200, Alexandre Ghiti wrote:
>>>> On 26/05/2023 18:35, Conor Dooley wrote:
>>>>> On Fri, May 26, 2023 at 05:24:41PM +0100, Conor Dooley wrote:
>>>>>> On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
>>>>>>> Early alternatives are called with the mmu disabled, and then should not
>>>>>>> access any global symbols through the GOT since it requires relocations,
>>>>>>> relocations that we do before but *virtually*. So only use medany code
>>>>>>> model for this early code.
>>>>>>>
>>>>>>> Signed-off-by: Alexandre Ghiti <[email protected]>
>>>>>>> ---
>>>>>>>
>>>>>>> Note that I'm not very happy with this fix, I think we need to put more
>>>>>>> effort into "harmonizing" this very early code (ie before the mmu is
>>>>>>> enabled) as it is spread between different locations and compiled
>>>>>>> differently.
>>>>>> Totally & I'll happily spend the time trying to review that work.
>>>>>>
>>>>>>> I'll work on that later, but for now, this fix does what is
>>>>>>> needed to work (from my testing at least). Any Tested-by on the Unmatched
>>>>>>> and T-head boards is welcome!
>>>>>> On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
>>>>>> config, my Nezha fails to boot. There is no output whatsoever from the
>>>>>> kernel. Turning off CONFIG_RELOCATABLE boots again.
>>>>> I don't know if this is better or worse news, but same thing happens on
>>>>> an icicle kit. What systems, other than QEMU, has the relocatable
>>>>> series been tested with, btw?
>>>>
>>>> I tested it on the Unmatched (Andreas did too).
>>> Cool. I cracked out my unmatched and it has the same issue as the
>>> icicle. Ditto my Visionfive v2. Here's my config.
>>> https://raw.githubusercontent.com/ConchuOD/riscv-env/dev/conf/defconfig
>>>
>>> A ~default qemu virt doesn't work either. (-m 2G -smp 5)
>> I can boot with this config using:
>>
>> $ sudo ~/qemu/build/qemu-system-riscv64 -machine virt -cpu
>> rv64,sv48=off -nographic -m 2G -smp 5 -kernel
>> build_conor/arch/riscv/boot/Image -s
> Just in case, that is my normal config that I use for testing random
> stuff on LKML, I added CONFIG_RELOCATABLE in addition to that.
>
>> I noticed when trying to add this to our internal CI that I had local
>> failures that did not happen in the CI because the CI was not using
>> the same toolchain: can you give me the full .config? So that I can
>> see if the compiler added stack guards or some other things I did not
>> think of.
> https://gist.githubusercontent.com/ConchuOD/655f9cc19fb3be63f1c9da7e7e3ab717/raw/a1aad3c0d307609b2062fd3a66705166aede9f9f/.config
>
> 90% of what I test for upstream stuff uses clang, since clang appears to
> be a minority choice - but I could reproduce this with gcc-12 as well,
> using the same defconfig as linked above + CONFIG_RELOCATABLE.


Hmmm, it still works for me with both clang and gcc-9.


You don't have to do that now but is there a way I could get your
compiled image? With the sha1 used to build it? Sorry, I don't see what
happens, I need to get my hands dirty in some debug!


Thanks for being so quick Conor!


> Cheers,
> Conor.

2023-05-28 14:02:14

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Sat, May 27, 2023 at 12:02 PM Conor Dooley <[email protected]> wrote:
>
> On Sat, May 27, 2023 at 11:13:18AM +0200, Alexandre Ghiti wrote:
> >
> > On 26/05/2023 18:35, Conor Dooley wrote:
> > > On Fri, May 26, 2023 at 05:24:41PM +0100, Conor Dooley wrote:
> > > > On Fri, May 26, 2023 at 05:46:30PM +0200, Alexandre Ghiti wrote:
> > > > > Early alternatives are called with the mmu disabled, and then should not
> > > > > access any global symbols through the GOT since it requires relocations,
> > > > > relocations that we do before but *virtually*. So only use medany code
> > > > > model for this early code.
> > > > >
> > > > > Signed-off-by: Alexandre Ghiti <[email protected]>
> > > > > ---
> > > > >
> > > > > Note that I'm not very happy with this fix, I think we need to put more
> > > > > effort into "harmonizing" this very early code (ie before the mmu is
> > > > > enabled) as it is spread between different locations and compiled
> > > > > differently.
> > > > Totally & I'll happily spend the time trying to review that work.
> > > >
> > > > > I'll work on that later, but for now, this fix does what is
> > > > > needed to work (from my testing at least). Any Tested-by on the Unmatched
> > > > > and T-head boards is welcome!
> > > > On 6.4-rc1 & v6.4-rc1 + this patch, with CONFIG_RELOCATABLE added to my
> > > > config, my Nezha fails to boot. There is no output whatsoever from the
> > > > kernel. Turning off CONFIG_RELOCATABLE boots again.
> > > I don't know if this is better or worse news, but same thing happens on
> > > an icicle kit. What systems, other than QEMU, has the relocatable
> > > series been tested with, btw?
> >
> >
> > I tested it on the Unmatched (Andreas did too).
>
> Cool. I cracked out my unmatched and it has the same issue as the
> icicle. Ditto my Visionfive v2. Here's my config.
> https://raw.githubusercontent.com/ConchuOD/riscv-env/dev/conf/defconfig
>
> A ~default qemu virt doesn't work either. (-m 2G -smp 5)

I can boot with this config using:

$ sudo ~/qemu/build/qemu-system-riscv64 -machine virt -cpu
rv64,sv48=off -nographic -m 2G -smp 5 -kernel
build_conor/arch/riscv/boot/Image -s

I noticed when trying to add this to our internal CI that I had local
failures that did not happen in the CI because the CI was not using
the same toolchain: can you give me the full .config? So that I can
see if the compiler added stack guards or some other things I did not
think of.

Thanks!

>
> > Very weird it does not work on the icicle kit, there is no errata for this
> > soc, so what gets executed this early for this soc? Do you know where it
> > fails to boot? If you can debug, you should break on the address of the
> > entry point (usually 0x8020_0000) since this is the stvec address, so when
> > you get a trap, you will branch there, and then could you dump $sepc, $ra
> > and $stval when you get there?
>
> > Regarding the thead issue, I think the following should fix it:
>
> It did not :/
>
> Cheers,
> Conor.
>

2023-05-28 14:10:50

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
> Hmmm, it still works for me with both clang and gcc-9.

gcc-9 is a bit of a relic, do you have more recent compilers lying
around? If not, I can try some older compilers at some point.

> You don't have to do that now but is there a way I could get your compiled
> image? With the sha1 used to build it? Sorry, I don't see what happens, I
> need to get my hands dirty in some debug!

What do you mean by "sha1"? It fails with v6.4-rc1 which is a stable
hash, if that's what you're looking for.

Otherwise,
https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
(ignore the release crap haha, too lazy to find a proper hosting
mechanism)

| git show
| commit 3bd124485ed55d8ee6c1ff3532c8f617b24aa6ef (HEAD)
| Author: Alexandre Ghiti <[email protected]>
| Date: Fri May 26 17:46:30 2023 +0200
|
| riscv: Fix relocatable kernels with early alternatives using -fno-pie
|
| Early alternatives are called with the mmu disabled, and then should not
| access any global symbols through the GOT since it requires relocations,
| relocations that we do before but *virtually*. So only use medany code
| model for this early code.
|
| Signed-off-by: Alexandre Ghiti <[email protected]>
| Signed-off-by: Conor Dooley <[email protected]>
|
| diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile
| index a1055965fbee..7b2637c8c332 100644
| --- a/arch/riscv/errata/Makefile
| +++ b/arch/riscv/errata/Makefile
| @@ -1,2 +1,6 @@
| +ifdef CONFIG_RELOCATABLE
| +KBUILD_CFLAGS += -fno-pie
| +endif
| +
| obj-$(CONFIG_ERRATA_SIFIVE) += sifive/
| obj-$(CONFIG_ERRATA_THEAD) += thead/
| diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
| index fbdccc21418a..153864e4f399 100644
| --- a/arch/riscv/kernel/Makefile
| +++ b/arch/riscv/kernel/Makefile
| @@ -23,6 +23,10 @@ ifdef CONFIG_FTRACE
| CFLAGS_REMOVE_alternative.o = $(CC_FLAGS_FTRACE)
| CFLAGS_REMOVE_cpufeature.o = $(CC_FLAGS_FTRACE)
| endif
| +ifdef CONFIG_RELOCATABLE
| +CFLAGS_alternative.o += -fno-pie
| +CFLAGS_cpufeature.o += -fno-pie
| +endif
| ifdef CONFIG_KASAN
| KASAN_SANITIZE_alternative.o := n
| KASAN_SANITIZE_cpufeature.o := n



2023-05-29 19:15:51

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Mon, May 29, 2023 at 08:51:57PM +0200, Alexandre Ghiti wrote:
>
> On 28/05/2023 15:56, Conor Dooley wrote:
> > On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
> > > Hmmm, it still works for me with both clang and gcc-9.
> > gcc-9 is a bit of a relic, do you have more recent compilers lying
> > around? If not, I can try some older compilers at some point.
> >
> > > You don't have to do that now but is there a way I could get your compiled
> > > image? With the sha1 used to build it? Sorry, I don't see what happens, I
> > > need to get my hands dirty in some debug!
> > What do you mean by "sha1"? It fails with v6.4-rc1 which is a stable
> > hash, if that's what you're looking for.
> >
> > Otherwise,
> > https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
> > (ignore the release crap haha, too lazy to find a proper hosting
> > mechanism)
>
>
> Ok, I don't get much info without the symbols, can you also provide the
> vmlinux please? But at least your image does not boot, not during the early
> boot though because the mmu is enabled.

Do you see anything print when you try it? Cos I do not. Iff I have time
tomorrow, I'll go poking with gdb. I'm sorry I have not really done any
investigating, I have been really busy this last week or so with
dt-binding stuff but I should be freer again from tomorrow.

https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux

> I tried with gcc-12 and it still works fine on my end, so frustrating!

Crap! Also, should you not be enjoying a public holiday rather than
debugging?! Or maybe debugging is enjoyable for you...

Cheers,
Conor.



2023-05-29 19:17:05

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On 28/05/2023 15:56, Conor Dooley wrote:
> On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
>> Hmmm, it still works for me with both clang and gcc-9.
> gcc-9 is a bit of a relic, do you have more recent compilers lying
> around? If not, I can try some older compilers at some point.
>
>> You don't have to do that now but is there a way I could get your compiled
>> image? With the sha1 used to build it? Sorry, I don't see what happens, I
>> need to get my hands dirty in some debug!
> What do you mean by "sha1"? It fails with v6.4-rc1 which is a stable
> hash, if that's what you're looking for.
>
> Otherwise,
> https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
> (ignore the release crap haha, too lazy to find a proper hosting
> mechanism)


Ok, I don't get much info without the symbols, can you also provide the
vmlinux please? But at least your image does not boot, not during the
early boot though because the mmu is enabled.

I tried with gcc-12 and it still works fine on my end, so frustrating!


> | git show
> | commit 3bd124485ed55d8ee6c1ff3532c8f617b24aa6ef (HEAD)
> | Author: Alexandre Ghiti <[email protected]>
> | Date: Fri May 26 17:46:30 2023 +0200
> |
> | riscv: Fix relocatable kernels with early alternatives using -fno-pie
> |
> | Early alternatives are called with the mmu disabled, and then should not
> | access any global symbols through the GOT since it requires relocations,
> | relocations that we do before but *virtually*. So only use medany code
> | model for this early code.
> |
> | Signed-off-by: Alexandre Ghiti <[email protected]>
> | Signed-off-by: Conor Dooley <[email protected]>
> |
> | diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile
> | index a1055965fbee..7b2637c8c332 100644
> | --- a/arch/riscv/errata/Makefile
> | +++ b/arch/riscv/errata/Makefile
> | @@ -1,2 +1,6 @@
> | +ifdef CONFIG_RELOCATABLE
> | +KBUILD_CFLAGS += -fno-pie
> | +endif
> | +
> | obj-$(CONFIG_ERRATA_SIFIVE) += sifive/
> | obj-$(CONFIG_ERRATA_THEAD) += thead/
> | diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> | index fbdccc21418a..153864e4f399 100644
> | --- a/arch/riscv/kernel/Makefile
> | +++ b/arch/riscv/kernel/Makefile
> | @@ -23,6 +23,10 @@ ifdef CONFIG_FTRACE
> | CFLAGS_REMOVE_alternative.o = $(CC_FLAGS_FTRACE)
> | CFLAGS_REMOVE_cpufeature.o = $(CC_FLAGS_FTRACE)
> | endif
> | +ifdef CONFIG_RELOCATABLE
> | +CFLAGS_alternative.o += -fno-pie
> | +CFLAGS_cpufeature.o += -fno-pie
> | +endif
> | ifdef CONFIG_KASAN
> | KASAN_SANITIZE_alternative.o := n
> | KASAN_SANITIZE_cpufeature.o := n

2023-05-29 19:49:56

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On 29/05/2023 21:06, Conor Dooley wrote:
> On Mon, May 29, 2023 at 08:51:57PM +0200, Alexandre Ghiti wrote:
>> On 28/05/2023 15:56, Conor Dooley wrote:
>>> On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
>>>> Hmmm, it still works for me with both clang and gcc-9.
>>> gcc-9 is a bit of a relic, do you have more recent compilers lying
>>> around? If not, I can try some older compilers at some point.
>>>
>>>> You don't have to do that now but is there a way I could get your compiled
>>>> image? With the sha1 used to build it? Sorry, I don't see what happens, I
>>>> need to get my hands dirty in some debug!
>>> What do you mean by "sha1"? It fails with v6.4-rc1 which is a stable
>>> hash, if that's what you're looking for.
>>>
>>> Otherwise,
>>> https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
>>> (ignore the release crap haha, too lazy to find a proper hosting
>>> mechanism)
>>
>> Ok, I don't get much info without the symbols, can you also provide the
>> vmlinux please? But at least your image does not boot, not during the early
>> boot though because the mmu is enabled.
> Do you see anything print when you try it? Cos I do not. Iff I have time
> tomorrow, I'll go poking with gdb. I'm sorry I have not really done any
> investigating, I have been really busy this last week or so with
> dt-binding stuff but I should be freer again from tomorrow.
>
> https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux


Better: the trap happens in kasan_early_init() when it tries to access a
global symbol through the GOT but ends up with a NULL pointer, which is
weird. To me, this is not related to KASAN; kasan_early_init() just
happens to be the first function called after enabling the MMU. I think
you may have an issue with the filling of the relocations. Sorry
to bother you again, but if at some point you can recompile with
DEBUG_INFO enabled, that would be perfect! Please also provide the
vmlinux.relocs file. Sorry for all that, too bad I can't reproduce it.
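
As a rough sketch of one way to look at how the relocations were filled,
the relocation entries in the vmlinux can be counted by type. The readelf
binary name, the vmlinux path and the count_relocs helper are assumptions
for illustration; use whatever prefix your cross toolchain provides:

```shell
#!/bin/sh
# Hypothetical helper: reads `readelf -r` output on stdin and prints how
# many lines mention the given relocation type (e.g. R_RISCV_RELATIVE).
count_relocs() {
    grep -c "$1"
}

# READELF and VMLINUX are assumptions; override them for your setup.
READELF="${READELF:-riscv64-linux-gnu-readelf}"
VMLINUX="${VMLINUX:-vmlinux}"

# Only run the inspection when both the tool and the image are present.
if command -v "$READELF" >/dev/null 2>&1 && [ -f "$VMLINUX" ]; then
    for type in R_RISCV_RELATIVE R_RISCV_64; do
        printf '%s: ' "$type"
        "$READELF" -r "$VMLINUX" | count_relocs "$type"
    done
fi
```

Comparing these counts between a booting and a non-booting build might be
one hint as to whether the link step or the early code is at fault.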


>
>> I tried with gcc-12 and it still works fine on my end, so frustrating!
> Crap! Also, should you not be enjoying a public holiday rather than
> debugging?! Or maybe debugging is enjoyable for you...


Ahah, this is what I enjoy doing when the kids finally sleep :)


Thank you again for your very quick feedback, really appreciated!


>
> Cheers,
> Conor.

2023-05-30 11:37:25

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Mon, May 29, 2023 at 09:37:28PM +0200, Alexandre Ghiti wrote:
> On 29/05/2023 21:06, Conor Dooley wrote:
> > On Mon, May 29, 2023 at 08:51:57PM +0200, Alexandre Ghiti wrote:
> > > On 28/05/2023 15:56, Conor Dooley wrote:
> > > > On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
> > > > > Hmmm, it still works for me with both clang and gcc-9.
> > > > gcc-9 is a bit of a relic, do you have more recent compilers lying
> > > > around? If not, I can try some older compilers at some point.
> > > >
> > > > > You don't have to do that now but is there a way I could get your compiled
> > > > > image? With the sha1 used to build it? Sorry, I don't see what happens, I
> > > > > need to get my hands dirty in some debug!
> > > > What do you mean by "sha1"? It falls with v6.4-rc1 which is a stable
> > > > hash, if that's what you're looking for.
> > > >
> > > > Otherwise,
> > > > https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
> > > > (ignore the release crap haha, too lazy to find a proper hosting
> > > > mechanism)
> > >
> > > Ok, I don't get much info without the symbols, can you also provide the
> > > vmlinux please? But at least your image does not boot, not during the early
> > > boot though because the mmu is enabled.
> > Do you see anything print when you try it? Cos I do not. Iff I have time
> > tomorrow, I'll go poking with gdb. I'm sorry I have not really done any
> > investigating, I have been really busy this last week or so with
> > dt-binding stuff but I should be freer again from tomorrow.
> >
> > https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux
>
>
> Better, the trap happens in kasan_early_init() when it tries to access a
> global symbol using the GOT but ends up with a NULL pointer, which is weird.
> So to me, this is not related to kasan, it happens that kasan_early_init()
> is the first function called after enabling the mmu, I think you may have an
> issue with the filling of the relocations.

Yeah, it reproduces without KASAN.

> Sorry to bother you again, but if
> at some point you can recompile with DEBUG_INFO enabled, that would be
> perfect! And also provide the vmlinux.relocs file. Sorry for all that, too
> bad I can't reproduce it.

New vmlinux & vmlinux.relocs here:
https://microchiptechnology-my.sharepoint.com/:u:/g/personal/conor_dooley_microchip_com/EZpFNxYYrnNAh5Z3c-rf0pUBBpdPGTLafqdtfcXRUUBkXw?e=7KKMHX
They're pretty massive unfortunately & hopefully that is not some
garbage internal-only link.
.config is a wee bit different, cos different build machine, but the
problem still manifests on a icicle. I've added it to the tarball just
in case.

> > > I tried with gcc-12 and it still works fine on my end, so frustrating!
> > Crap! Also, should you not be enjoying a public holiday rather than
> > debugging?! Or maybe debugging is enjoyable for you...
>
>
> Ahah, this is what I enjoy doing when the kids finally sleep :)
>
>
> Thank you again for your very quick feedback, really appreciated!

No worries.



2023-05-30 14:58:29

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On 30/05/2023 13:27, Conor Dooley wrote:
> On Mon, May 29, 2023 at 09:37:28PM +0200, Alexandre Ghiti wrote:
>> On 29/05/2023 21:06, Conor Dooley wrote:
>>> On Mon, May 29, 2023 at 08:51:57PM +0200, Alexandre Ghiti wrote:
>>>> On 28/05/2023 15:56, Conor Dooley wrote:
>>>>> On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
>>>>>> Hmmm, it still works for me with both clang and gcc-9.
>>>>> gcc-9 is a bit of a relic, do you have more recent compilers lying
>>>>> around? If not, I can try some older compilers at some point.
>>>>>
>>>>>> You don't have to do that now but is there a way I could get your compiled
>>>>>> image? With the sha1 used to build it? Sorry, I don't see what happens, I
>>>>>> need to get my hands dirty in some debug!
>>>>> What do you mean by "sha1"? It falls with v6.4-rc1 which is a stable
>>>>> hash, if that's what you're looking for.
>>>>>
>>>>> Otherwise,
>>>>> https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
>>>>> (ignore the release crap haha, too lazy to find a proper hosting
>>>>> mechanism)
>>>> Ok, I don't get much info without the symbols, can you also provide the
>>>> vmlinux please? But at least your image does not boot, not during the early
>>>> boot though because the mmu is enabled.
>>> Do you see anything print when you try it? Cos I do not. Iff I have time
>>> tomorrow, I'll go poking with gdb. I'm sorry I have not really done any
>>> investigating, I have been really busy this last week or so with
>>> dt-binding stuff but I should be freer again from tomorrow.
>>>
>>> https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux
>>
>> Better, the trap happens in kasan_early_init() when it tries to access a
>> global symbol using the GOT but ends up with a NULL pointer, which is weird.
>> So to me, this is not related to kasan, it happens that kasan_early_init()
>> is the first function called after enabling the mmu, I think you may have an
>> issue with the filling of the relocations.
> Yeah, it reproduces without KASAN.
>
>> Sorry to bother you again, but if
>> at some point you can recompile with DEBUG_INFO enabled, that would be
>> perfect! And also provide the vmlinux.relocs file. Sorry for all that, too
>> bad I can't reproduce it.
> New vmlinux & vmlinux.relocs here:
> https://microchiptechnology-my.sharepoint.com/:u:/g/personal/conor_dooley_microchip_com/EZpFNxYYrnNAh5Z3c-rf0pUBBpdPGTLafqdtfcXRUUBkXw?e=7KKMHX
> They're pretty massive unfortunately & hopefully that is not some
> garbage internal-only link.
> .config is a wee bit different, cos different build machine, but the
> problem still manifests on a icicle. I've added it to the tarball just
> in case.


Ok so I had to recreate the Image from the files you gave me and it
boots fine using qemu: is that expected? Because you only mention the
icicle above.


[    0.000000] Linux version 6.4.0-rc1 (conor@wendy) (ClangBuiltLinux clang version 15.0.7 (/home/conor/stuff/dev/llvm/clang 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a), ClangBuiltLinux LLD 15.0.7) #1 SMP PREEMPT Tue May 30 12:13:12 IST 2023
[    0.000000] random: crng init done
[    0.000000] Machine model: riscv-virtio,qemu
[    0.000000] earlycon: ns16550a0 at MMIO 0x0000000010000000 (options '')
[    0.000000] printk: bootconsole [ns16550a0] enabled
[    0.000000] printk: debug: skip boot console de-registration.
[    0.000000] efi: UEFI not found.
[    0.000000] OF: reserved mem: 0x0000000080000000..0x000000008003ffff (256 KiB) map non-reusable mmode_resv0@80000000
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000080000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000017fffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x000000017fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000080000000-0x000000017fffffff]
[    0.000000] SBI specification v1.0 detected
[    0.000000] SBI implementation ID=0x1 Version=0x10002
[    0.000000] SBI TIME extension detected
[    0.000000] SBI IPI extension detected
[    0.000000] SBI RFENCE extension detected
[    0.000000] SBI SRST extension detected
[    0.000000] SBI HSM extension detected
[    0.000000] riscv: base ISA extensions acdfhim
[    0.000000] riscv: ELF capabilities acdfim
[    0.000000] percpu: Embedded 30 pages/cpu s83872 r8192 d30816 u122880
[    0.000000] Kernel command line: earlycon keep_bootcon root=/dev/mmcblk1p2 rootdelay=10 reboot=cold
[    0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 1034240
[    0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off

...


>>>> I tried with gcc-12 and it still works fine on my end, so frustrating!
>>> Crap! Also, should you not be enjoying a public holiday rather than
>>> debugging?! Or maybe debugging is enjoyable for you...
>>
>> Ahah, this is what I enjoy doing when the kids finally sleep :)
>>
>>
>> Thank you again for your very quick feedback, really appreciated!
> No worries.
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv

2023-05-30 17:58:17

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Tue, May 30, 2023 at 04:33:45PM +0200, Alexandre Ghiti wrote:
>
> On 30/05/2023 13:27, Conor Dooley wrote:
> > On Mon, May 29, 2023 at 09:37:28PM +0200, Alexandre Ghiti wrote:
> > > On 29/05/2023 21:06, Conor Dooley wrote:
> > > > On Mon, May 29, 2023 at 08:51:57PM +0200, Alexandre Ghiti wrote:
> > > > > On 28/05/2023 15:56, Conor Dooley wrote:
> > > > > > On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
> > > > > > > Hmmm, it still works for me with both clang and gcc-9.
> > > > > > gcc-9 is a bit of a relic, do you have more recent compilers lying
> > > > > > around? If not, I can try some older compilers at some point.
> > > > > >
> > > > > > > You don't have to do that now but is there a way I could get your compiled
> > > > > > > image? With the sha1 used to build it? Sorry, I don't see what happens, I
> > > > > > > need to get my hands dirty in some debug!
> > > > > > What do you mean by "sha1"? It falls with v6.4-rc1 which is a stable
> > > > > > hash, if that's what you're looking for.
> > > > > >
> > > > > > Otherwise,
> > > > > > https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
> > > > > > (ignore the release crap haha, too lazy to find a proper hosting
> > > > > > mechanism)
> > > > > Ok, I don't get much info without the symbols, can you also provide the
> > > > > vmlinux please? But at least your image does not boot, not during the early
> > > > > boot though because the mmu is enabled.
> > > > Do you see anything print when you try it? Cos I do not. Iff I have time
> > > > tomorrow, I'll go poking with gdb. I'm sorry I have not really done any
> > > > investigating, I have been really busy this last week or so with
> > > > dt-binding stuff but I should be freer again from tomorrow.
> > > >
> > > > https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux
> > >
> > > Better, the trap happens in kasan_early_init() when it tries to access a
> > > global symbol using the GOT but ends up with a NULL pointer, which is weird.
> > > So to me, this is not related to kasan, it happens that kasan_early_init()
> > > is the first function called after enabling the mmu, I think you may have an
> > > issue with the filling of the relocations.
> > Yeah, it reproduces without KASAN.
> >
> > > Sorry to bother you again, but if
> > > at some point you can recompile with DEBUG_INFO enabled, that would be
> > > perfect! And also provide the vmlinux.relocs file. Sorry for all that, too
> > > bad I can't reproduce it.
> > New vmlinux & vmlinux.relocs here:
> > https://microchiptechnology-my.sharepoint.com/:u:/g/personal/conor_dooley_microchip_com/EZpFNxYYrnNAh5Z3c-rf0pUBBpdPGTLafqdtfcXRUUBkXw?e=7KKMHX
> > They're pretty massive unfortunately & hopefully that is not some
> > garbage internal-only link.
> > .config is a wee bit different, cos different build machine, but the
> > problem still manifests on a icicle. I've added it to the tarball just
> > in case.
>
>
> Ok so I had to recreate the Image from the files you gave me and it boots
> fine using qemu: is that expected? Because you only mention the icicle
> above.

Unfortunately you sent this one right as I left work..
I ssh'ed in though and ran the vmlinux.bin & had the same issues.
Silly question perhaps - is it just not possible to boot something that
has been hit with `objcopy -O binary vmlinux vmlinux.bin` with
CONFIG_RELOCATABLE? At this point that's the main thing that sticks out
to me as being different. You couldn't boot the vmlinux.bin that I sent
you either.

Cheers,
Conor.



2023-05-30 18:16:27

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Tue, May 30, 2023 at 7:47 PM Conor Dooley <[email protected]> wrote:
>
> On Tue, May 30, 2023 at 04:33:45PM +0200, Alexandre Ghiti wrote:
> >
> > On 30/05/2023 13:27, Conor Dooley wrote:
> > > On Mon, May 29, 2023 at 09:37:28PM +0200, Alexandre Ghiti wrote:
> > > > On 29/05/2023 21:06, Conor Dooley wrote:
> > > > > On Mon, May 29, 2023 at 08:51:57PM +0200, Alexandre Ghiti wrote:
> > > > > > On 28/05/2023 15:56, Conor Dooley wrote:
> > > > > > > On Sun, May 28, 2023 at 03:42:59PM +0200, Alexandre Ghiti wrote:
> > > > > > > > Hmmm, it still works for me with both clang and gcc-9.
> > > > > > > gcc-9 is a bit of a relic, do you have more recent compilers lying
> > > > > > > around? If not, I can try some older compilers at some point.
> > > > > > >
> > > > > > > > You don't have to do that now but is there a way I could get your compiled
> > > > > > > > image? With the sha1 used to build it? Sorry, I don't see what happens, I
> > > > > > > > need to get my hands dirty in some debug!
> > > > > > > What do you mean by "sha1"? It falls with v6.4-rc1 which is a stable
> > > > > > > hash, if that's what you're looking for.
> > > > > > >
> > > > > > > Otherwise,
> > > > > > > https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux.bin
> > > > > > > (ignore the release crap haha, too lazy to find a proper hosting
> > > > > > > mechanism)
> > > > > > Ok, I don't get much info without the symbols, can you also provide the
> > > > > > vmlinux please? But at least your image does not boot, not during the early
> > > > > > boot though because the mmu is enabled.
> > > > > Do you see anything print when you try it? Cos I do not. Iff I have time
> > > > > tomorrow, I'll go poking with gdb. I'm sorry I have not really done any
> > > > > investigating, I have been really busy this last week or so with
> > > > > dt-binding stuff but I should be freer again from tomorrow.
> > > > >
> > > > > https://github.com/ConchuOD/riscv-env/releases/download/v2022.03/vmlinux
> > > >
> > > > Better, the trap happens in kasan_early_init() when it tries to access a
> > > > global symbol using the GOT but ends up with a NULL pointer, which is weird.
> > > > So to me, this is not related to kasan, it happens that kasan_early_init()
> > > > is the first function called after enabling the mmu, I think you may have an
> > > > issue with the filling of the relocations.
> > > Yeah, it reproduces without KASAN.
> > >
> > > > Sorry to bother you again, but if
> > > > at some point you can recompile with DEBUG_INFO enabled, that would be
> > > > perfect! And also provide the vmlinux.relocs file. Sorry for all that, too
> > > > bad I can't reproduce it.
> > > New vmlinux & vmlinux.relocs here:
> > > https://microchiptechnology-my.sharepoint.com/:u:/g/personal/conor_dooley_microchip_com/EZpFNxYYrnNAh5Z3c-rf0pUBBpdPGTLafqdtfcXRUUBkXw?e=7KKMHX
> > > They're pretty massive unfortunately & hopefully that is not some
> > > garbage internal-only link.
> > > .config is a wee bit different, cos different build machine, but the
> > > problem still manifests on a icicle. I've added it to the tarball just
> > > in case.
> >
> >
> > Ok so I had to recreate the Image from the files you gave me and it boots
> > fine using qemu: is that expected? Because you only mention the icicle
> > above.
>
> Unfortunately you sent this one right as I left work..
> I ssh'ed in though and ran the vmlinux.bin & had the same issues.
> Silly question perhaps - is it just not possible to boot something that
> has been hit with `objcopy -O binary vmlinux vmlinux.bin` with
> CONFIG_RELOCATABLE? At this point that's the main thing that sticks out
> to me as being different. You couldn't boot the vmlinux.bin that I sent
> you either.

Ahah, I think we found the culprit!

With CONFIG_RELOCATABLE, vmlinux is actually stripped of all its
relocations (so that it can be shipped), and vmlinux.relocs is what you
should use instead, since it is just a copy of vmlinux taken before the
relocations are removed!
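
To make the failure mode concrete, here is a toy model in C (a sketch only,
not the kernel's relocation code). It assumes RELA-style relocations where
each entry carries its own addend and the slot itself starts out as zero;
the names toy_rela, toy_relocate and toy_read_slot are made up for this
illustration. With an empty relocation table, which is roughly what a flat
binary made from the stripped vmlinux gives you, the GOT-like slot is never
patched and reads back as NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a relative RELA relocation entry. */
struct toy_rela {
    uint64_t offset;  /* byte offset of the slot to patch, from image start */
    uint64_t addend;  /* target address relative to the image start */
};

/*
 * Patch every listed slot with load_base + addend.
 * Returns the number of slots patched.
 */
static size_t toy_relocate(uint8_t *image, size_t image_size,
                           uint64_t load_base,
                           const struct toy_rela *rela, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t value = load_base + rela[i].addend;

        /* Refuse to write outside the toy image. */
        assert(rela[i].offset + sizeof(value) <= image_size);
        memcpy(image + rela[i].offset, &value, sizeof(value));
    }
    return count;
}

/* Read back a 64-bit slot (think: one GOT entry) from the image. */
static uint64_t toy_read_slot(const uint8_t *image, uint64_t offset)
{
    uint64_t value;

    memcpy(&value, image + offset, sizeof(value));
    return value;
}
```

Calling toy_relocate() with a one-entry table fills the slot with
load_base + addend; calling it with a NULL table and a count of 0 leaves
the slot at zero, so any code that dereferences it will trap.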

>
> Cheers,
> Conor.

2023-05-30 20:32:34

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Tue, May 30, 2023 at 08:04:17PM +0200, Alexandre Ghiti wrote:
>
> Ahah, I think we found the culprit!
>
> With CONFIG_RELOCATABLE, vmlinux is actually stripped from all the
> relocations (so that it can be shipped) and vmlinux.relocs is what you
> should use instead, since it is just a copy of vmlinux before the
> removal of the relocations!

That probably makes us both eejits for not realising sooner...

Tested-by: Conor Dooley <[email protected]> # booted on nezha & unmatched

Thanks for your patience here Alex.



2023-05-31 07:31:13

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On 30/05/2023 22:22, Conor Dooley wrote:
> On Tue, May 30, 2023 at 08:04:17PM +0200, Alexandre Ghiti wrote:
>> Ahah, I think we found the culprit!
>>
>> With CONFIG_RELOCATABLE, vmlinux is actually stripped from all the
>> relocations (so that it can be shipped) and vmlinux.relocs is what you
>> should use instead, since it is just a copy of vmlinux before the
>> removal of the relocations!
> That probably makes us both eejits for not realising sooner...


Ahah, TIL a new word, thanks :)


>
> Tested-by: Conor Dooley <[email protected]> # booted on nezha & unmatched
>
> Thanks for your patience here Alex.


So I checked again if the -fno-pie should be applied to
mm/dma-noncoherent.c as I suggested, but actually no:
errata/thead/errata.c never reaches riscv_noncoherent_supported() in
early boot (you can see how 'fragile' it is though and why something
needs to be done...).


Oh and I realized that I forgot the Reported-by from Andreas and the
Fixes tags, so here they are:

Fixes: 39b33072941f ("riscv: Introduce CONFIG_RELOCATABLE")
Reported-by: Andreas Schwab <[email protected]>


Thank you too for your patience and your quick answers!

Alex



2023-05-31 09:51:31

by Conor Dooley

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

On Wed, May 31, 2023 at 09:26:27AM +0200, Alexandre Ghiti wrote:
> On 30/05/2023 22:22, Conor Dooley wrote:
> > On Tue, May 30, 2023 at 08:04:17PM +0200, Alexandre Ghiti wrote:
> > > Ahah, I think we found the culprit!
> > >
> > > With CONFIG_RELOCATABLE, vmlinux is actually stripped from all the
> > > relocations (so that it can be shipped) and vmlinux.relocs is what you
> > > should use instead, since it is just a copy of vmlinux before the
> > > removal of the relocations!
> > That probably makes us both eejits for not realising sooner...
>
> Ahah, TIL a new word, thanks :)
>
> >
> > Tested-by: Conor Dooley <[email protected]> # booted on nezha & unmatched
> >
> > Thanks for your patience here Alex.
>
> So I checked again if the -fno-pie should be applied to mm/dma-noncoherent.c
> as I suggested, but actually no: errata/thead/errata.c never reaches
> riscv_noncoherent_supported() in early boot (you can see how 'fragile' it is
> though and why something needs to be done...).

I did make sure to check this patch itself, without the additional bit,
to see if it was needed.
But yeah, it is going to be super fragile - do you have any ideas about
how to circumvent that?

> Oh and I realized that I forgot the Reported-by from Andreas and the Fixes
> tags, so here they are:
>
> Fixes: 39b33072941f ("riscv: Introduce CONFIG_RELOCATABLE")
> Reported-by: Andreas Schwab <[email protected]>
>
>
> Thank you too for your patience and your quick answers!
>
> Alex
>
>



2023-05-31 11:20:39

by Alexandre Ghiti

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On 31/05/2023 11:32, Conor Dooley wrote:
> On Wed, May 31, 2023 at 09:26:27AM +0200, Alexandre Ghiti wrote:
>> On 30/05/2023 22:22, Conor Dooley wrote:
>>> On Tue, May 30, 2023 at 08:04:17PM +0200, Alexandre Ghiti wrote:
>>>> Ahah, I think we found the culprit!
>>>>
>>>> With CONFIG_RELOCATABLE, vmlinux is actually stripped from all the
>>>> relocations (so that it can be shipped) and vmlinux.relocs is what you
>>>> should use instead, since it is just a copy of vmlinux before the
>>>> removal of the relocations!
>>> That probably makes us both eejits for not realising sooner...
>> Ahah, TIL a new word, thanks :)
>>
>>> Tested-by: Conor Dooley <[email protected]> # booted on nezha & unmatched
>>>
>>> Thanks for your patience here Alex.
>> So I checked again if the -fno-pie should be applied to mm/dma-noncoherent.c
>> as I suggested, but actually no: errata/thead/errata.c never reaches
>> riscv_noncoherent_supported() in early boot (you can see how 'fragile' it is
>> though and why something needs to be done...).
> I did make sure to check this patch itself, without the additional bit,
> to see if it was needed.
> But yeah, it is going to be super fragile - do you have any ideas about
> how to circumvent that?


Yes, I was thinking about multiple solutions:

- All the early code could go into kernel/pi: all the dependencies of
the early code are built in their own way (the symbols are actually
'duplicated'). I see that a bit like the EFI stub. My first try failed
with !CONFIG_RELOCATABLE, I have to dig further.

- Simply do a physical relocation before any early code, execute the
early code, and then do the virtual relocation. But that does not solve
the issue fixed by kernel/pi, which allows recompiling standard
functions (like the string ones) without any instrumentation while
keeping the instrumented versions for normal execution.

- Compile relocatable kernels without -fPIE (why can't we just use
medany actually?). That won't fix certain types of situations where we
need relocations, but it will limit the number of outliers that need
to be compiled with -fno-pie and make them easier to spot (we'll still
have to be very careful though).

- Be very strict about what can/cannot be done in this pre-mmu stage,
and document that...

The best solution would be the first I guess. Any other ideas welcome :)
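
As a rough illustration of the second option (physical relocation first,
virtual relocation afterwards), here is a toy sketch in C. It is not kernel
code: it assumes RELA-style entries whose addend is relative to the image
start, so a slot can simply be recomputed for a new base, and the names
toy_rela, toy_relocate_pass and toy_read_slot as well as the base addresses
in the note below are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation entry: slot offset plus addend. */
struct toy_rela {
    uint64_t offset;  /* byte offset of the slot to patch, from image start */
    uint64_t addend;  /* target address relative to the image start */
};

/*
 * One relocation pass: rewrite every slot as base + addend.
 * Because the value depends only on the addend and the base, the pass
 * can be run once with the physical base and again with the virtual one.
 */
static void toy_relocate_pass(uint8_t *image, size_t image_size,
                              uint64_t base,
                              const struct toy_rela *rela, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint64_t value = base + rela[i].addend;

        /* Refuse to write outside the toy image. */
        assert(rela[i].offset + sizeof(value) <= image_size);
        memcpy(image + rela[i].offset, &value, sizeof(value));
    }
}

/* Read back a 64-bit slot from the image. */
static uint64_t toy_read_slot(const uint8_t *image, uint64_t offset)
{
    uint64_t value;

    memcpy(&value, image + offset, sizeof(value));
    return value;
}
```

With an entry { 0x8, 0x1000 }, a first pass using a made-up physical base
of 0x80200000 leaves 0x80201000 in the slot, which the pre-mmu code could
safely dereference; a second pass using a made-up virtual base of
0xffffffff80000000 rewrites it to 0xffffffff80001000 for the rest of boot.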


>
>> Oh and I realized that I forgot the Reported-by from Andreas and the Fixes
>> tags, so here they are:
>>
>> Fixes: 39b33072941f ("riscv: Introduce CONFIG_RELOCATABLE")
>> Reported-by: Andreas Schwab <[email protected]>
>>
>>
>> Thank you too for your patience and your quick answers!
>>
>> Alex
>>
>>

2023-05-31 15:20:51

by Palmer Dabbelt

Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie


On Fri, 26 May 2023 17:46:30 +0200, Alexandre Ghiti wrote:
> Early alternatives are called with the mmu disabled, and then should not
> access any global symbols through the GOT since it requires relocations,
> relocations that we do before but *virtually*. So only use medany code
> model for this early code.
>
>

Applied, thanks!

[1/1] riscv: Fix relocatable kernels with early alternatives using -fno-pie
https://git.kernel.org/palmer/c/8dc2a7e8027f

Best regards,
--
Palmer Dabbelt <[email protected]>


Subject: Re: [PATCH -fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie

Hello:

This patch was applied to riscv/linux.git (fixes)
by Palmer Dabbelt <[email protected]>:

On Fri, 26 May 2023 17:46:30 +0200 you wrote:
> Early alternatives are called with the mmu disabled, and then should not
> access any global symbols through the GOT since it requires relocations,
> relocations that we do before but *virtually*. So only use medany code
> model for this early code.
>
> Signed-off-by: Alexandre Ghiti <[email protected]>
>
> [...]

Here is the summary with links:
- [-fixes] riscv: Fix relocatable kernels with early alternatives using -fno-pie
https://git.kernel.org/riscv/c/8dc2a7e8027f

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html