2023-11-22 18:24:38

by Serge Semin

Subject: [PATCH 0/7] MIPS: mm: Fix some memory-related issues

Just recently I rebased my MIPS32-related work from kernel 6.5-rc4 onto
the latest kernel 6.7-rc1 and immediately ran into a bootup-time
mm-related bug (see patches 3-5 of this series). After fixing it I decided
it was time to submit for review the generic MIPS code fixes which I have
been collecting in my local repo for the last year. I was going to submit
them a bit later, after finishing a patchset with my SoC arch-specific
changes, but since that one kept getting bigger and bigger, it turned out
to be reasonable to split the generic part of the series out right away,
especially since it might be useful in the most recent kernel.

So this series starts with the MIPS-specific dmi_early_remap()
implementation fix. It is utilized by the DMI driver in the framework of
the dmi_setup() method, which is called at the very early boot stage - in
setup_arch(). No VM is available at that stage, which is required for
ioremap_cache() to work properly. Thus it was a mistake to have the
dmi_early_remap() macro-function defined as ioremap_cache(). It should
have been ioremap_uc() in the first place.

After that goes a fix for the high-memory zone PFNs calculation procedure
on MIPS. It turned out that after a not-that-recent commit the IO-memory
PFNs ended up in the high-memory zone even though they were directly
reachable and thus should have been left in the normal zone.

Then a series of fixes for the recently discovered mm-bug is presented.
Any attempt to re-map the IO-memory with the cached attribute caused the
bootup procedure to crash with the "Unhandled kernel unaligned access"
message. After some digging I found out that the problem was in the
uninitialized IO-memory pages. Please see the description of the patch
"mips: Fix max_mapnr being uninitialized on early stages" for a detailed
explanation of the problem and the suggested fix. It is followed by
several cleanup patches for the MIPS/mm and generic mm code.

The patchset is closed with a small improvement which passes the MIPS
board/machine name to the dump-stack module in order to print
arch-personalized oopses in the same way as it's done on ARM, ARM64,
RISC-V, etc.

That's it for today. Thanks in advance for the review. Any testing is
very welcome.

Signed-off-by: Serge Semin <[email protected]>
Cc: Alexey Malahov <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Aleksandar Rikalo <[email protected]>
Cc: Aleksandar Rikalo <[email protected]>
Cc: Dragan Mladjenovic <[email protected]>
Cc: Chao-ying Fu <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: Yinglu Yang <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]

Serge Semin (7):
mips: dmi: Fix early remap on MIPS32
mips: Fix incorrect max_low_pfn adjustment
mips: Fix max_mapnr being uninitialized on early stages
mips: Optimize max_mapnr init procedure
mm/mm_init.c: Extend init unavailable range doc info
mm/mm_init.c: Append '\n' to the unavailable ranges log-message
mips: Set dump-stack arch description

arch/mips/include/asm/dmi.h | 2 +-
arch/mips/kernel/prom.c | 2 ++
arch/mips/kernel/setup.c | 4 ++--
arch/mips/mm/init.c | 16 +++++++++-------
mm/mm_init.c | 3 ++-
5 files changed, 16 insertions(+), 11 deletions(-)

--
2.42.1


2023-11-22 18:26:25

by Serge Semin

Subject: [PATCH 2/7] mips: Fix incorrect max_low_pfn adjustment

The max_low_pfn variable is incorrectly adjusted if the kernel is built
with high memory support and the latter is detected in a running system,
so the memory which actually can be directly mapped ends up in the highmem
zone. See the ZONE_NORMAL range on my MIPS32r5 system:

> Zone ranges:
> DMA [mem 0x0000000000000000-0x0000000000ffffff]
> Normal [mem 0x0000000001000000-0x0000000007ffffff]
> HighMem [mem 0x0000000008000000-0x000000020fffffff]

while the zones are supposed to look as follows:

> Zone ranges:
> DMA [mem 0x0000000000000000-0x0000000000ffffff]
> Normal [mem 0x0000000001000000-0x000000001fffffff]
> HighMem [mem 0x0000000020000000-0x000000020fffffff]

Even though the physical memory within the range [0x08000000;0x20000000]
belongs to MMIO on our system, we don't really want it to be considered as
high memory since on MIPS32 that range can still be directly mapped.

Note there might be other problems caused by the max_low_pfn variable
misconfiguration. For instance, the high_memory variable is initialized
with the virtual address corresponding to the max_low_pfn PFN, and by
design it must define the upper bound of the directly mapped memory, i.e.
the end of the normal zone. That in its turn may potentially cause
problems in accessing the memory by means of the /dev/mem and /dev/kmem
devices.
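
For reference, high_memory is derived directly from max_low_pfn in
arch/mips/mm/init.c (the very statement which is also touched by patch 3
of this series):

  high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);

so a too-small max_low_pfn immediately translates into a too-low
high_memory bound.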

Let's fix the discovered misconfiguration then. It turns out the commit
a94e4f24ec83 ("MIPS: init: Drop boot_mem_map") didn't get the max_low_pfn
adjustment quite right. If the kernel is built with high memory support
and the system is equipped with high memory, the max_low_pfn variable
needs to be initialized with the PFN of the uppermost directly reachable
memory address so that the normal zone is set up correctly. On MIPS that
PFN corresponds to PFN_DOWN(HIGHMEM_START). If the kernel is built with no
high memory support and high memory is detected in the running system, we
just need to adjust the max_pfn variable to discard the found high memory
from the system and leave max_low_pfn as is, since the latter will be less
than PFN_DOWN(HIGHMEM_START) anyway by design of the for_each_memblock()
loop performed a bit earlier in the bootmem_init() method.

Fixes: a94e4f24ec83 ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/setup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 2d2ca024bd47..0461ab49e8f1 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -321,11 +321,11 @@ static void __init bootmem_init(void)
panic("Incorrect memory mapping !!!");

if (max_pfn > PFN_DOWN(HIGHMEM_START)) {
+ max_low_pfn = PFN_DOWN(HIGHMEM_START);
#ifdef CONFIG_HIGHMEM
- highstart_pfn = PFN_DOWN(HIGHMEM_START);
+ highstart_pfn = max_low_pfn;
highend_pfn = max_pfn;
#else
- max_low_pfn = PFN_DOWN(HIGHMEM_START);
max_pfn = max_low_pfn;
#endif
}
--
2.42.1

2023-11-22 18:26:26

by Serge Semin

Subject: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

dmi_early_remap() has been defined as ioremap_cache(), which on MIPS32
ends up as a VM-based mapping. DMI early remapping is performed at the
setup_arch() stage with no VM available. So calling dmi_early_remap() on
MIPS32 causes the system to crash at early boot time. Fix that by
converting dmi_early_remap() to the uncached remapping, which is always
available on both 32- and 64-bit MIPS systems.

Fixes: be8fa1cb444c ("MIPS: Add support for Desktop Management Interface (DMI)")
Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/include/asm/dmi.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/dmi.h b/arch/mips/include/asm/dmi.h
index 27415a288adf..525aad1572d1 100644
--- a/arch/mips/include/asm/dmi.h
+++ b/arch/mips/include/asm/dmi.h
@@ -5,7 +5,7 @@
#include <linux/io.h>
#include <linux/memblock.h>

-#define dmi_early_remap(x, l) ioremap_cache(x, l)
+#define dmi_early_remap(x, l) ioremap_uc(x, l)
#define dmi_early_unmap(x, l) iounmap(x)
#define dmi_remap(x, l) ioremap_cache(x, l)
#define dmi_unmap(x) iounmap(x)
--
2.42.1

2023-11-22 18:26:29

by Serge Semin

Subject: [PATCH 4/7] mips: Optimize max_mapnr init procedure

max_mapnr defines the upper boundary of the page space in the system.
Currently, if HIGHMEM is available, it's calculated based on the upper
high memory PFN limit value. Seeing there is a case when that isn't fully
correct, let's optimize the max_mapnr variable initialization procedure to
cover all the cases handled in the paging_init() method:
1. If the CPU has DC-aliases, then high memory is unavailable, so the
upper PFN boundary is determined by max_low_pfn.
2. Otherwise, if high memory is available, use the highend_pfn value
representing the upper high memory PFN limit.
3. Otherwise no high memory is available, so set max_mapnr to the
low-memory upper limit.

Signed-off-by: Serge Semin <[email protected]>

---

Since I haven't seen any problems caused by the denoted misconfiguration
on my setup, the patch isn't marked as a fix, but as an optimization.
---
arch/mips/mm/init.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 6e368a4658b5..b2dce07116e8 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -421,9 +421,13 @@ void __init paging_init(void)
" %ldk highmem ignored\n",
(highend_pfn - max_low_pfn) << (PAGE_SHIFT - 10));
max_zone_pfns[ZONE_HIGHMEM] = max_low_pfn;
- }

- max_mapnr = highend_pfn ? highend_pfn : max_low_pfn;
+ max_mapnr = max_low_pfn;
+ } else if (highend_pfn) {
+ max_mapnr = highend_pfn;
+ } else {
+ max_mapnr = max_low_pfn;
+ }
#else
max_mapnr = max_low_pfn;
#endif
--
2.42.1

2023-11-22 18:26:59

by Serge Semin

Subject: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

Besides the already described reasons, the page-backed memory holes might
be persistent due to having memory mapped IO spaces behind those ranges in
the framework of the flatmem kernel config. Add such a note to the
init_unavailable_range() method kdoc in order to point out one more reason
for having the function executed for such regions.

Signed-off-by: Serge Semin <[email protected]>

---

Please let me know if the IO-space pages must be initialized somehow
differently rather than relying on free_area_init() executing the
init_unavailable_range() method.
---
mm/mm_init.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 077bfe393b5e..3fa33e2d32ba 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -796,6 +796,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
* - physical memory bank size is not necessarily the exact multiple of the
* arbitrary section size
* - early reserved memory may not be listed in memblock.memory
+ * - memory mapped IO space
* - memory layouts defined with memmap= kernel parameter may not align
* nicely with memmap sections
*
--
2.42.1

2023-11-22 18:27:35

by Serge Semin

Subject: [PATCH 3/7] mips: Fix max_mapnr being uninitialized on early stages

The max_mapnr variable is utilized in the pfn_valid() method in order to
determine the upper PFN space boundary. Having it uninitialized
effectively makes any PFN passed to that method invalid. That in its turn
causes occasional malfunctions in the kernel mm-subsystem even after the
max_mapnr variable is actually properly updated. For instance,
pfn_valid() is called in the init_unavailable_range() method in the
framework of the following calls-chain on MIPS:
setup_arch()
+-> paging_init()
    +-> free_area_init()
        +-> memmap_init()
            +-> memmap_init_zone_range()
                +-> init_unavailable_range()

Since pfn_valid() always returns "false" before max_mapnr is initialized
in the mem_init() method, any flatmem page-holes will be left in the
poisoned/uninitialized state, including the IO-memory pages. Thus any
further attempts to map/remap the IO-memory by using the MMU may fail. In
particular it happened in my case on an attempt to map the SRAM region.
The kernel bootup procedure just crashed on the unhandled unaligned access
bug raised in the __update_cache() method:

> Unhandled kernel unaligned access[#1]:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc1-00307-g0dff108838c9-dirty #2056
> ...
> Call Trace:
> [<8011ef9c>] __update_cache+0x88/0x1bc
> [<80385944>] ioremap_page_range+0x110/0x2a4
> [<80126948>] ioremap_prot+0x17c/0x1f4
> [<80711b80>] __devm_ioremap+0x8c/0x120
> [<80711e0c>] __devm_ioremap_resource+0xf4/0x218
> [<808bf244>] sram_probe+0x4f4/0x930
> [<80889d20>] platform_probe+0x68/0xec
> ...

Let's fix the problem by initializing the max_mapnr variable as soon as
the required data is available. In particular it can be done right in the
paging_init() method before free_area_init() is called since all the PFN
zone boundaries have already been calculated by that time.

Cc: [email protected]
Signed-off-by: Serge Semin <[email protected]>

---

Note I don't really know since when this problem has actually existed.
Based on the commits log it might have been there even before the
boot_mem_map allocator was dropped. On the other hand I hadn't seen it
actually come out before moving my working tree from kernel 6.5-rc4 to
6.7-rc1. So after updating the kernel I got the unhandled unaligned access
BUG() due to the access to the compound head pointer in the
__update_cache() method (see the commit log). After enabling the DEBUG_VM
config I managed to find out that the IO-memory pages were just left
uninitialized and poisoned:

> page:81367080 is uninitialized and poisoned (pfn 8192)
> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> Kernel bug detected[#1]:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc1-00298-g88721b1a9ad5-dirty
> $ 0 : 00000000 812d0000 00000034 dced7cdf
> $ 4 : dced7cdf 00594000 10000000 ffff00fe
> $ 8 : 8196bfe0 00000000 00000001 818458c0
> $12 : 00000000 00000000 00000000 00000216
> $16 : 00002800 81227b80 00000000 00000000
> $20 : 00000000 00000000 00000000 00000000
> $24 : 0000022b 818458c0
> $28 : 81968000 8196be68 00000000 803a0920
> Hi : 00000000
> Lo : 00000000
> epc : 8039d2a4 BUG+0x0/0x4
> ra : 803a0920 post_alloc_hook+0x0/0x128
> Status: 10000003 KERNEL EXL IE
> Cause : 00800424 (ExcCode 09)
> PrId : 0001a830 (MIPS P5600)
> Modules linked in:
> Process swapper/0 (pid: 1, threadinfo=81968000, task=819a0000, tls=00000000)
> Stack : 00000000 8101ccb0 00000000 8196bd00 00000000 80359768 818a8300 00000001
> 81139088 8114438c 8042e4f8 81297a2c 81297a2c 81255e90 819a1b50 dced7cdf
> 81297a2c 81297a2c 00000000 81227b80 00000000 81241168 811394b0 00000000
> 81140000 80e2cee0 00000000 00000000 00000000 00000000 00000000 819b0000
> 81140000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> ...
> Call Trace:
> [<8039d2a4>] BUG+0x0/0x4
> [<803a0920>] post_alloc_hook+0x0/0x128
>
> Code: 01001025 03e00008 24020001 <000c000d> 2403003c 27bdffd0 afb2001c 3c12812f 8e4269e4

That in its turn made me dig deeper into the way the MMIO-space pages
are initialized. That's how I found out that pfn_valid() and
init_unavailable_range() were working improperly on my setup.

Anyway, I spotted none of the problems above on kernel 6.5-rc4. So what
actually triggered them to finally pop up isn't that easy to find, seeing
the involved code hasn't changed much.
---
arch/mips/mm/init.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/mips/mm/init.c b/arch/mips/mm/init.c
index 5dcb525a8995..6e368a4658b5 100644
--- a/arch/mips/mm/init.c
+++ b/arch/mips/mm/init.c
@@ -422,7 +422,12 @@ void __init paging_init(void)
(highend_pfn - max_low_pfn) << (PAGE_SHIFT - 10));
max_zone_pfns[ZONE_HIGHMEM] = max_low_pfn;
}
+
+ max_mapnr = highend_pfn ? highend_pfn : max_low_pfn;
+#else
+ max_mapnr = max_low_pfn;
#endif
+ high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);

free_area_init(max_zone_pfns);
}
@@ -458,13 +463,6 @@ void __init mem_init(void)
*/
BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT) && (PFN_PTE_SHIFT > PAGE_SHIFT));

-#ifdef CONFIG_HIGHMEM
- max_mapnr = highend_pfn ? highend_pfn : max_low_pfn;
-#else
- max_mapnr = max_low_pfn;
-#endif
- high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);
-
maar_init();
memblock_free_all();
setup_zero_pages(); /* Setup zeroed pages. */
--
2.42.1

2023-11-22 18:27:45

by Serge Semin

Subject: [PATCH 6/7] mm/mm_init.c: Append '\n' to the unavailable ranges log-message

Based on the init_unavailable_range() method and its callee semantics, no
multi-line info messages are intended to be printed to the console. Thus
append the '\n' symbol to the respective info string.

Signed-off-by: Serge Semin <[email protected]>
---
mm/mm_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 3fa33e2d32ba..db8b91175834 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -827,7 +827,7 @@ static void __init init_unavailable_range(unsigned long spfn,
}

if (pgcnt)
- pr_info("On node %d, zone %s: %lld pages in unavailable ranges",
+ pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
node, zone_names[zone], pgcnt);
}

--
2.42.1

2023-11-22 18:28:07

by Serge Semin

Subject: [PATCH 7/7] mips: Set dump-stack arch description

In the framework of the MIPS architecture the mips_set_machine_name()
method is defined to set the machine name. The name is currently only used
in the /proc/cpuinfo file content generation. Let's have it utilized to
machine-personalize the dump-stack data too, the way it's done on ARM,
ARM64, RISC-V, etc.

Signed-off-by: Serge Semin <[email protected]>
---
arch/mips/kernel/prom.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/prom.c b/arch/mips/kernel/prom.c
index f88ce78e13e3..6062e6fa589a 100644
--- a/arch/mips/kernel/prom.c
+++ b/arch/mips/kernel/prom.c
@@ -28,6 +28,8 @@ __init void mips_set_machine_name(const char *name)

strscpy(mips_machine_name, name, sizeof(mips_machine_name));
pr_info("MIPS: machine is %s\n", mips_get_machine_name());
+
+ dump_stack_set_arch_desc(name);
}

char *mips_get_machine_name(void)
--
2.42.1

2023-11-22 18:32:04

by Andrew Morton

Subject: Re: [PATCH 0/7] MIPS: mm: Fix some memory-related issues

On Wed, 22 Nov 2023 21:23:58 +0300 Serge Semin <[email protected]> wrote:

> Just recently I've rebased my MIPS32-related work from kernel 6.5-rc4 onto
> the latest kernel 6.7-rc1 and immediately got into a bootup-time
> mm-related bug (see patches 3-5 in this series). After fixing it I decided
> it was time to submit for review the generic MIPS code fixes which I have
> been collecting in my local repo for the last year. I was going to submit
> them a bit later after I finished working on a patchset with my SoC
> arch-specific changes, but since it was getting bigger and bigger, it
> turned to be reasonable to spill out the generic part of series right away
> especially seeing it might get to be useful in the most recent kernel.

It would have been better to separate out the two tiny unrelated MM
patches from this series. I'll steal them - if they later turn up via
the MIPS tree then that's OK.

2023-11-22 19:36:26

by Arnd Bergmann

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Wed, Nov 22, 2023, at 19:23, Serge Semin wrote:
> dmi_early_remap() has been defined as ioremap_cache() which on MIPS32 gets
> to be converted to the VM-based mapping. DMI early remapping is performed
> at the setup_arch() stage with no VM available. So calling the
> dmi_early_remap() for MIPS32 causes the system to crash at the early boot
> time. Fix that by converting dmi_early_remap() to the uncached remapping
> which is always available on both 32 and 64-bits MIPS systems.
>
> Fixes: be8fa1cb444c ("MIPS: Add support for Desktop Management Interface (DMI)")
> Signed-off-by: Serge Semin <[email protected]>
> ---
> arch/mips/include/asm/dmi.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/mips/include/asm/dmi.h b/arch/mips/include/asm/dmi.h
> index 27415a288adf..525aad1572d1 100644
> --- a/arch/mips/include/asm/dmi.h
> +++ b/arch/mips/include/asm/dmi.h
> @@ -5,7 +5,7 @@
> #include <linux/io.h>
> #include <linux/memblock.h>
>
> -#define dmi_early_remap(x, l) ioremap_cache(x, l)
> +#define dmi_early_remap(x, l) ioremap_uc(x, l)

Please don't use ioremap_uc() in new code, we are in the (long)
process of removing it from the kernel for everything except
x86-32, and it already returns NULL on most of them.

Would the normal ioremap() work for you here? It seems to
do the same thing as ioremap_uc() on mips and a couple of
other architectures that have not yet killed it off.

Arnd

2023-11-23 09:32:48

by Serge Semin

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

Hi Arnd

On Wed, Nov 22, 2023 at 08:35:01PM +0100, Arnd Bergmann wrote:
> On Wed, Nov 22, 2023, at 19:23, Serge Semin wrote:
> > dmi_early_remap() has been defined as ioremap_cache() which on MIPS32 gets
> > to be converted to the VM-based mapping. DMI early remapping is performed
> > at the setup_arch() stage with no VM available. So calling the
> > dmi_early_remap() for MIPS32 causes the system to crash at the early boot
> > time. Fix that by converting dmi_early_remap() to the uncached remapping
> > which is always available on both 32 and 64-bits MIPS systems.
> >
> > Fixes: be8fa1cb444c ("MIPS: Add support for Desktop Management Interface (DMI)")
> > Signed-off-by: Serge Semin <[email protected]>
> > ---
> > arch/mips/include/asm/dmi.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/mips/include/asm/dmi.h b/arch/mips/include/asm/dmi.h
> > index 27415a288adf..525aad1572d1 100644
> > --- a/arch/mips/include/asm/dmi.h
> > +++ b/arch/mips/include/asm/dmi.h
> > @@ -5,7 +5,7 @@
> > #include <linux/io.h>
> > #include <linux/memblock.h>
> >
> > -#define dmi_early_remap(x, l) ioremap_cache(x, l)
> > +#define dmi_early_remap(x, l) ioremap_uc(x, l)
>

> Please don't use ioremap_uc() in new code, we are in the (long)
> process of removing it from the kernel for everything except
> x86-32, and it already returns NULL on most of them.
>
> Would the normal ioremap() work for you here? It seems to
> do the same thing as ioremap_uc() on mips and a couple of
> other architectures that have not yet killed it off.

Ok. Thanks for the heads up. I'll fix the patch to use ioremap()
in v2. ioremap_uc() is just a macro-alias of ioremap() on MIPS.

-Serge(y)

>
> Arnd

2023-11-23 10:07:47

by Mike Rapoport

Subject: Re: [PATCH 6/7] mm/mm_init.c: Append '\n' to the unavailable ranges log-message

On Wed, Nov 22, 2023 at 09:24:04PM +0300, Serge Semin wrote:
> Based on the init_unavailable_range() method and it's callee semantics no
> multi-line info messages are intended to be printed to the console. Thus
> append the '\n' symbol to the respective info string.
>
> Signed-off-by: Serge Semin <[email protected]>

Reviewed-by: Mike Rapoport (IBM) <[email protected]>

> ---
> mm/mm_init.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 3fa33e2d32ba..db8b91175834 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -827,7 +827,7 @@ static void __init init_unavailable_range(unsigned long spfn,
> }
>
> if (pgcnt)
> - pr_info("On node %d, zone %s: %lld pages in unavailable ranges",
> + pr_info("On node %d, zone %s: %lld pages in unavailable ranges\n",
> node, zone_names[zone], pgcnt);
> }
>
> --
> 2.42.1
>

--
Sincerely yours,
Mike.

2023-11-23 10:13:10

by Serge Semin

Subject: Re: [PATCH 0/7] MIPS: mm: Fix some memory-related issues

On Wed, Nov 22, 2023 at 10:29:00AM -0800, Andrew Morton wrote:
> On Wed, 22 Nov 2023 21:23:58 +0300 Serge Semin <[email protected]> wrote:
>
> > Just recently I've rebased my MIPS32-related work from kernel 6.5-rc4 onto
> > the latest kernel 6.7-rc1 and immediately got into a bootup-time
> > mm-related bug (see patches 3-5 in this series). After fixing it I decided
> > it was time to submit for review the generic MIPS code fixes which I have
> > been collecting in my local repo for the last year. I was going to submit
> > them a bit later after I finished working on a patchset with my SoC
> > arch-specific changes, but since it was getting bigger and bigger, it
> > turned to be reasonable to spill out the generic part of series right away
> > especially seeing it might get to be useful in the most recent kernel.
>

> It would have been better to separate out the two tiny unrelated MM
> patches from this series.

One of them isn't completely unrelated to the series content. The
biggest problem is fixed by the patch
[PATCH 3/7] mips: Fix max_mapnr being uninitialized on early stages
Link: https://lore.kernel.org/linux-mips/[email protected]/
of this series. I was sure that it was a correct fix, at least for the
pfn_valid() method working incorrectly, but I had doubts whether the
memory mapped IO pages were supposed to be left uninitialized by the
arch code, relying on the init_unavailable_range() method doing that,
especially seeing it was printing a warning about having unavailable
ranges. If it turned out to be incorrect I would have
needed to drop the patch
[PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info
Link: https://lore.kernel.org/linux-mips/[email protected]/
and fix that problem too in the framework of the MIPS arch.

My alternative assumption regarding that problem was that the arch code
should have used the memblock_reserve() method for the IO ranges, so that
the calls-chain:
mem_init()
+-> memblock_free_all()
    +-> free_low_memory_core_early()
        +-> memmap_init_reserved_pages()
            +-> for_each_reserved_mem_region(region)
                +-> reserve_bootmem_region(start, end, nid);
would have properly initialized the IO-pages reserved earlier by means
of the memblock_reserve() method. But it turned out that
reserve_bootmem_region() was available only when
CONFIG_DEFERRED_STRUCT_PAGE_INIT was enabled, which didn't seem to be
widely utilized in the arch code. Not finding a better option I
decided to stick to the solution relying on the
init_unavailable_range() method doing the trick and just fix the
method kdoc. Seeing you accepted the patch
[PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info
it was a correct decision.
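
Just for completeness, what I meant by that alternative is roughly the
following bit somewhere in the MIPS arch code (untested sketch, the range
below is simply the 128MB - 512MB MMIO window of my platform):

  /* Get the MMIO window registered as reserved, so that
   * memblock_free_all() would initialize its pages by means of
   * reserve_bootmem_region()
   */
  memblock_reserve(0x08000000, 0x18000000);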

> I'll steal them - if they later turn up via
> the MIPS tree then that's OK.

Ok. Thanks for picking them up. I'll drop those two patches from the
series on v2.

-Serge(y)

2023-11-23 10:19:24

by Mike Rapoport

Subject: Re: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

Hi Serge,

On Wed, Nov 22, 2023 at 09:24:03PM +0300, Serge Semin wrote:
> Besides of the already described reasons the pages backended memory holes
> might be persistent due to having memory mapped IO spaces behind those
> ranges in the framework of flatmem kernel config. Add such note to the
> init_unavailable_range() method kdoc in order to point out to one more
> reason of having the function executed for such regions.
>
> Signed-off-by: Serge Semin <[email protected]>
>
> ---
>
> Please let me know if the IO-space pages must be initialized somehow
> differently rather relying on free_area_init() executing the
> init_unavailable_range() method.

Maybe I'm missing something, but why do you need struct pages in the
IO space?

> ---
> mm/mm_init.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 077bfe393b5e..3fa33e2d32ba 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -796,6 +796,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> * - physical memory bank size is not necessarily the exact multiple of the
> * arbitrary section size
> * - early reserved memory may not be listed in memblock.memory
> + * - memory mapped IO space
> * - memory layouts defined with memmap= kernel parameter may not align
> * nicely with memmap sections
> *
> --
> 2.42.1
>

--
Sincerely yours,
Mike.

2023-11-23 10:47:44

by Serge Semin

Subject: Re: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

Hi Mike

On Thu, Nov 23, 2023 at 12:18:54PM +0200, Mike Rapoport wrote:
> Hi Serge,
>
> On Wed, Nov 22, 2023 at 09:24:03PM +0300, Serge Semin wrote:
> > Besides of the already described reasons the pages backended memory holes
> > might be persistent due to having memory mapped IO spaces behind those
> > ranges in the framework of flatmem kernel config. Add such note to the
> > init_unavailable_range() method kdoc in order to point out to one more
> > reason of having the function executed for such regions.
> >
> > Signed-off-by: Serge Semin <[email protected]>
> >
> > ---
> >
> > Please let me know if the IO-space pages must be initialized somehow
> > differently rather relying on free_area_init() executing the
> > init_unavailable_range() method.
>

> Maybe I'm missing something, but why do you need struct pages in the
> IO space?

In my case, at the very least, that's due to having an SRAM device
available in the middle of the MMIO-space. The region is getting mapped
using the ioremap_wc() method (Uncached Write-Combine CA), which
eventually comes down to calling get_vm_area() and ioremap_page_range()
(see the ioremap_prot() function on MIPS), which in their turn use the
page structs for the mapping. Another similar case is using ioremap_wc()
for mapping the PCIe outbound ATU space of the graphic/video card
framebuffers.

In general, having the pages array defined for the IO-memory is required
for mapping the IO-space other than just uncached (my SRAM case for
example) or, for instance, with special access attributes for the
user-space (if I am not missing something in the way the VM works in that
case).
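
To be more specific, the path I'm hitting boils down to something like
this (the SRAM base address and size below are made up, just to
illustrate the chain):

  void __iomem *sram;

  /* SRAM device living in the middle of the MMIO window */
  sram = ioremap_wc(0x0a000000, SZ_64K);
  /* MIPS32: ioremap_wc()
   *  -> ioremap_prot(phys, size, boot_cpu_data.writecombine)
   *   -> get_vm_area() + ioremap_page_range()
   *    -> set_ptes() -> __update_cache() -> pfn_valid()/pfn_to_page()
   */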

-Serge(y)

>
> > ---
> > mm/mm_init.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > index 077bfe393b5e..3fa33e2d32ba 100644
> > --- a/mm/mm_init.c
> > +++ b/mm/mm_init.c
> > @@ -796,6 +796,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> > * - physical memory bank size is not necessarily the exact multiple of the
> > * arbitrary section size
> > * - early reserved memory may not be listed in memblock.memory
> > + * - memory mapped IO space
> > * - memory layouts defined with memmap= kernel parameter may not align
> > * nicely with memmap sections
> > *
> > --
> > 2.42.1
> >
>
> --
> Sincerely yours,
> Mike.
>

2023-11-23 12:14:55

by Jiaxun Yang

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On Thu, Nov 23, 2023, at 09:32, Serge Semin wrote:
> Hi Arnd
>
> On Wed, Nov 22, 2023 at 08:35:01PM +0100, Arnd Bergmann wrote:
>> On Wed, Nov 22, 2023, at 19:23, Serge Semin wrote:
>> > dmi_early_remap() has been defined as ioremap_cache() which on MIPS32 gets
>> > to be converted to the VM-based mapping. DMI early remapping is performed
>> > at the setup_arch() stage with no VM available. So calling the
>> > dmi_early_remap() for MIPS32 causes the system to crash at the early boot
>> > time. Fix that by converting dmi_early_remap() to the uncached remapping
>> > which is always available on both 32 and 64-bits MIPS systems.
>> >
>> > Fixes: be8fa1cb444c ("MIPS: Add support for Desktop Management Interface (DMI)")
>> > Signed-off-by: Serge Semin <[email protected]>
>> > ---
>> > arch/mips/include/asm/dmi.h | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/arch/mips/include/asm/dmi.h b/arch/mips/include/asm/dmi.h
>> > index 27415a288adf..525aad1572d1 100644
>> > --- a/arch/mips/include/asm/dmi.h
>> > +++ b/arch/mips/include/asm/dmi.h
>> > @@ -5,7 +5,7 @@
>> > #include <linux/io.h>
>> > #include <linux/memblock.h>
>> >
>> > -#define dmi_early_remap(x, l) ioremap_cache(x, l)
>> > +#define dmi_early_remap(x, l) ioremap_uc(x, l)
>>
>
>> Please don't use ioremap_uc() in new code, we are in the (long)
>> process of removing it from the kernel for everything except
>> x86-32, and it already returns NULL on most of them.
>>
>> Would the normal ioremap() work for you here? It seems to
>> do the same thing as ioremap_uc() on mips and a couple of
>> other architectures that have not yet killed it off.
>
> Ok. Thanks for the heads up. I'll fix the patch to be using ioremap()
> in v2. ioremap_uc() is just an macro-alias of ioremap() on MIPS.

Perhaps we need to fix ioremap_cache so it can give a KSEG1 address?
AFAIK for Loongson DMI is located at cached memory so using ioremap_uc
blindly will cause inconsistency.

Thanks
- Jiaxun

>
> -Serge(y)
>
>>
>> Arnd

--
- Jiaxun

2023-11-23 12:30:56

by Thomas Bogendoerfer

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Thu, Nov 23, 2023 at 12:13:11PM +0000, Jiaxun Yang wrote:
> > Ok. Thanks for the heads up. I'll fix the patch to be using ioremap()
> > in v2. ioremap_uc() is just an macro-alias of ioremap() on MIPS.
>
> Perhaps we need to fix ioremap_cache so it can give a KSEG1 address?

KSEG0 ?

> AFAIK for Loongson DMI is located at cached memory so using ioremap_uc
> blindly will cause inconsistency.

why ?

Thomas.

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2023-11-23 15:08:08

by Jiaxun Yang

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On Thu, Nov 23, 2023, at 12:29, Thomas Bogendoerfer wrote:
> On Thu, Nov 23, 2023 at 12:13:11PM +0000, Jiaxun Yang wrote:
>> > Ok. Thanks for the heads up. I'll fix the patch to be using ioremap()
>> > in v2. ioremap_uc() is just an macro-alias of ioremap() on MIPS.
>>
>> Perhaps we need to fix ioremap_cache so it can give a KSEG1 address?
>
> KSEG0 ?

Ah yes it's KSEG0.

>
>> AFAIK for Loongson DMI is located at cached memory so using ioremap_uc
>> blindly will cause inconsistency.
>
> why ?

Firmware sometimes does not flush those tables from cache back to memory.
For Loongson systems (as well as most MTI systems) cache is enabled by
firmware.

Thanks.
>
> Thomas.
>
> --
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea. [ RFC1925, 2.3 ]

--
- Jiaxun

2023-11-23 16:08:33

by Thomas Bogendoerfer

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Thu, Nov 23, 2023 at 03:07:09PM +0000, Jiaxun Yang wrote:
>
>
> On Thu, Nov 23, 2023, at 12:29, Thomas Bogendoerfer wrote:
> > On Thu, Nov 23, 2023 at 12:13:11PM +0000, Jiaxun Yang wrote:
> >> > Ok. Thanks for the heads up. I'll fix the patch to be using ioremap()
> >> > in v2. ioremap_uc() is just an macro-alias of ioremap() on MIPS.
> >>
> >> Perhaps we need to fix ioremap_cache so it can give a KSEG1 address?
> >
> > KSEG0 ?
>
> Ah yes it's KSEG0.

the problem with all 32bit unmapped segments is their limitations in
size. But there is always room to try to use unmapped and fall back
to mapped, if it doesn't work. But I doubt anybody is going to
implement that.

> >> AFAIK for Loongson DMI is located at cached memory so using ioremap_uc
> >> blindly will cause inconsistency.
> >
> > why ?
>
> Firmware sometimes does not flush those tables from cache back to memory.
> For Loongson systems (as well as most MTI systems) cache is enabled by
> firmware.

kernel flushes all caches on startup, so there shouldn't be a problem.

Thomas.

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2023-11-23 17:34:06

by Jiaxun Yang

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On Thu, Nov 23, 2023, at 16:07, Thomas Bogendoerfer wrote:
> On Thu, Nov 23, 2023 at 03:07:09PM +0000, Jiaxun Yang wrote:
>>
[...]
>
> the problem with all 32bit unmapped segments is their limitations in
> size. But there is always room to try to use unmapped and fall back
> to mapped, if it doesn't work. But I doubt anybody is going to
> implement that.

Yep, I guess fallback should be implemented for ioremap_cache as well.

>
>> >> AFAIK for Loongson DMI is located at cached memory so using ioremap_uc
>> >> blindly will cause inconsistency.
>> >
>> > why ?
>>
>> Firmware sometimes does not flush those tables from cache back to memory.
>> For Loongson systems (as well as most MTI systems) cache is enabled by
>> firmware.
>
> kernel flushes all caches on startup, so there shouldn't be a problem.

Actually dmi_setup() is called before cpu_cache_init().

Thanks
>
> Thomas.
>
> --
> Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> good idea. [ RFC1925, 2.3 ]

--
- Jiaxun

2023-11-24 08:19:29

by Mike Rapoport

Subject: Re: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

On Thu, Nov 23, 2023 at 01:42:39PM +0300, Serge Semin wrote:
> On Thu, Nov 23, 2023 at 12:18:54PM +0200, Mike Rapoport wrote:
> > On Wed, Nov 22, 2023 at 09:24:03PM +0300, Serge Semin wrote:
> > > Besides of the already described reasons the pages backended memory holes
> > > might be persistent due to having memory mapped IO spaces behind those
> > > ranges in the framework of flatmem kernel config. Add such note to the
> > > init_unavailable_range() method kdoc in order to point out to one more
> > > reason of having the function executed for such regions.
> > >
> > > Signed-off-by: Serge Semin <[email protected]>
> > >
> > > ---
> > >
> > > Please let me know if the IO-space pages must be initialized somehow
> > > differently rather relying on free_area_init() executing the
> > > init_unavailable_range() method.
> >
>
> > Maybe I'm missing something, but why do you need struct pages in the
> > IO space?
>
> In my case at the very least that's due to having a SRAM device
> available in the middle of the MMIO-space. The region is getting
> mapped using the ioremap_wc() method (Uncached Write-Combine CA),
> which eventually is converted to calling get_vm_area() and
> ioremap_page_range() (see ioremap_prot() function on MIPS), which in
> its turn use the page structs for mapping. Another similar case is
> using ioremap_wc() in the PCIe outbound ATU space mapping of
> the graphic/video cards framebuffers.

ioremap_page_range() does not need struct pages, but rather physical
addresses.

> In general having the pages array defined for the IO-memory is
> required for mapping the IO-space other than just uncached (my sram
> case for example) or, for instance, with special access attribute for
> the user-space (if I am not missing something in a way VM works in
> that case).

No, struct pages are not required to map IO space. If you need to map MMIO
to userspace there's remap_pfn_range() for that.
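
E.g. a driver ->mmap() handler would normally do something like this (the
names below are purely illustrative):

  static int foo_mmap(struct file *file, struct vm_area_struct *vma)
  {
          vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
          return remap_pfn_range(vma, vma->vm_start,
                                 foo_mmio_phys >> PAGE_SHIFT,
                                 vma->vm_end - vma->vm_start,
                                 vma->vm_page_prot);
  }

and no memory map behind the MMIO range is needed for that.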

My guess is that your system has a hole in the physical memory mappings and
with FLATMEM that hole will have essentially unused struct pages, which are
initialized by init_unavailable_range(). But from mm perspective this is
still a hole even though there's some MMIO ranges in that hole.

Now, if that hole is large you are wasting memory for unused memory map and
it maybe worth considering using SPARSEMEM.

> -Serge(y)
>
> >
> > > ---
> > > mm/mm_init.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > index 077bfe393b5e..3fa33e2d32ba 100644
> > > --- a/mm/mm_init.c
> > > +++ b/mm/mm_init.c
> > > @@ -796,6 +796,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> > > * - physical memory bank size is not necessarily the exact multiple of the
> > > * arbitrary section size
> > > * - early reserved memory may not be listed in memblock.memory
> > > + * - memory mapped IO space
> > > * - memory layouts defined with memmap= kernel parameter may not align
> > > * nicely with memmap sections
> > > *
> > > --
> > > 2.42.1
> > >
> >
> > --
> > Sincerely yours,
> > Mike.
> >

--
Sincerely yours,
Mike.

2023-11-24 11:19:43

by Serge Semin

Subject: Re: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

On Fri, Nov 24, 2023 at 10:19:00AM +0200, Mike Rapoport wrote:
> On Thu, Nov 23, 2023 at 01:42:39PM +0300, Serge Semin wrote:
> > On Thu, Nov 23, 2023 at 12:18:54PM +0200, Mike Rapoport wrote:
> > > On Wed, Nov 22, 2023 at 09:24:03PM +0300, Serge Semin wrote:
> > > > Besides of the already described reasons the pages backended memory holes
> > > > might be persistent due to having memory mapped IO spaces behind those
> > > > ranges in the framework of flatmem kernel config. Add such note to the
> > > > init_unavailable_range() method kdoc in order to point out to one more
> > > > reason of having the function executed for such regions.
> > > >
> > > > Signed-off-by: Serge Semin <[email protected]>
> > > >
> > > > ---
> > > >
> > > > Please let me know if the IO-space pages must be initialized somehow
> > > > differently rather relying on free_area_init() executing the
> > > > init_unavailable_range() method.
> > >
> >
> > > Maybe I'm missing something, but why do you need struct pages in the
> > > IO space?
> >
> > In my case at the very least that's due to having a SRAM device
> > available in the middle of the MMIO-space. The region is getting
> > mapped using the ioremap_wc() method (Uncached Write-Combine CA),
> > which eventually is converted to calling get_vm_area() and
> > ioremap_page_range() (see ioremap_prot() function on MIPS), which in
> > its turn use the page structs for mapping. Another similar case is
> > using ioremap_wc() in the PCIe outbound ATU space mapping of
> > the graphic/video cards framebuffers.
>
> ioremap_page_range() does not need struct pages, but rather physical
> addresses.

Unless I'm missing something, or MIPS32 is somehow special/wrong in that
matter, from my recent experience it actually does, at least in the
framework of the __update_cache() implementation, which is called in the
set_ptes() method (former set_pte_at()), which in its turn is eventually
invoked by vmap_range_noflush() and finally by ioremap_page_range(). See
the patch
[PATCH 3/7] mips: Fix max_mapnr being uninitialized on early stages
Link: https://lore.kernel.org/linux-mips/[email protected]/
of this series and the stack-trace of the bug fixed by that patch.

Is it wrong that on MIPS32 ioremap_page_range() eventually relies on the
page structs? It has been like that for, I don't know, a long time. If so
then the sparse memory config might be broken on MIPS32..(

>
> > In general having the pages array defined for the IO-memory is
> > required for mapping the IO-space other than just uncached (my sram
> > case for example) or, for instance, with special access attribute for
> > the user-space (if I am not missing something in a way VM works in
> > that case).
>

> No, struct pages are not required to map IO space. If you need to map MMIO
> to userspace there's remap_pfn_range() for that.

Is this correct for both the flat and sparse memory configs? In any case,
please see my comment above about the problem I recently got.

>
> My guess is that your system has a hole in the physical memory mappings and
> with FLATMEM that hole will have essentially unused struct pages, which are
> initialized by init_unavailable_range(). But from mm perspective this is
> still a hole even though there's some MMIO ranges in that hole.

Absolutely right. Here is the physical memory layout in my system.
0 - 128MB: RAM
128MB - 512MB: Memory mapped IO
512MB - 768MB..8.256GB: RAM

>
> Now, if that hole is large you are wasting memory for unused memory map and
> it maybe worth considering using SPARSEMEM.

Do you think it's worth moving to the sparse memory configuration in
order to save the memory map covering those 384MB (with the 16K page
model)? AFAIU the flat memory config is more performant. Performance is
critical in most of the SoC applications, especially when using 10G
Ethernet or high-speed PCIe devices.

-Serge(y)

>
> > -Serge(y)
> >
> > >
> > > > ---
> > > > mm/mm_init.c | 1 +
> > > > 1 file changed, 1 insertion(+)
> > > >
> > > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > > index 077bfe393b5e..3fa33e2d32ba 100644
> > > > --- a/mm/mm_init.c
> > > > +++ b/mm/mm_init.c
> > > > @@ -796,6 +796,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> > > > * - physical memory bank size is not necessarily the exact multiple of the
> > > > * arbitrary section size
> > > > * - early reserved memory may not be listed in memblock.memory
> > > > + * - memory mapped IO space
> > > > * - memory layouts defined with memmap= kernel parameter may not align
> > > > * nicely with memmap sections
> > > > *
> > > > --
> > > > 2.42.1
> > > >
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
> > >
>
> --
> Sincerely yours,
> Mike.

2023-11-24 18:52:59

by Serge Semin

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Thu, Nov 23, 2023 at 05:33:31PM +0000, Jiaxun Yang wrote:
>
>
> On Thu, Nov 23, 2023, at 16:07, Thomas Bogendoerfer wrote:
> > On Thu, Nov 23, 2023 at 03:07:09PM +0000, Jiaxun Yang wrote:
> >>
> [...]
> >
> > the problem with all 32bit unmapped segments is their limitations in
> > size. But there is always room to try to use unmapped and fall back
> > to mapped, if it doesn't work. But I doubt anybody is going to
> > implement that.
>
> Yep, I guess fallback should be implemented for ioremap_cache as well.
>
> >
> >> >> AFAIK for Loongson DMI is located at cached memory so using ioremap_uc
> >> >> blindly will cause inconsistency.
> >> >
> >> > why ?
> >>
> >> Firmware sometimes does not flush those tables from cache back to memory.
> >> For Loongson systems (as well as most MTI systems) cache is enabled by
> >> firmware.
> >
> > kernel flushes all caches on startup, so there shouldn't be a problem.
>
> Actually dmi_setup() is called before cpu_cache_init().

To preliminarily sum up the discussion: indeed there can be issues on the
platforms which have the DMI region residing in cached memory. Here are
several solutions and the additional difficulties I think may be caused by
implementing them:

1. Use the unmapped cached region in the MIPS32 ioremap_prot() method.
This solution is a bit clumsier than it looks at first glance.
ioremap_prot() can be used for various types of the cacheability
mapping. Currently that's the default-cacheable CA preserved in the
_page_cachable_default variable and the Write-combined CA saved in
boot_cpu_data.writecombine. Based on that, we would need to use the
unmapped cached region only for the IO-remaps called with the
"_page_cachable_default" mapping flags passed. The rest of the IO
range mappings, including the write-combined ones, would still be
handled by VM means. This would make ioremap_prot() a bit less
maintainable, but still won't be that hard to implement (unless I
miss something):
--- a/arch/mips/mm/ioremap.c
+++ b/arch/mips/mm/ioremap.c
/*
- * Map uncached objects in the low 512mb of address space using KSEG1,
- * otherwise map using page tables.
+ * Map uncached/default-cached objects in the low 512mb of address
+ * space using KSEG1/KSEG0, otherwise map using page tables.
*/
- if (IS_LOW512(phys_addr) && IS_LOW512(last_addr) &&
- flags == _CACHE_UNCACHED)
- return (void __iomem *) CKSEG1ADDR(phys_addr);
+ if (IS_LOW512(phys_addr) && IS_LOW512(last_addr)) {
+ if (flags == _CACHE_UNCACHED)
+ return (void __iomem *) CKSEG1ADDR(phys_addr);
+ else if (flags == _page_cachable_default)
+ return (void __iomem *) CKSEG0ADDR(phys_addr);
+ }

Currently I can't figure out what obvious problems it may cause. But it
seems suspicious that the cacheable IO-mapping hasn't been implemented
via the unmapped cacheable region in the first place. In any case this
solution looks more dangerous than solution 2 because it affects all the
MIPS32 platforms at once.

2. Convert dmi_early_remap() to ioremap_uc() (actually just ioremap()
as noted by Arnd).
As Jiaxun correctly noted, this may cause problems on the platforms
which don't flush the caches before jumping into the kernel. Thomas said
that the kernel flushes the caches early on boot, but Jiaxun noted that
it's still done after the early DMI setup. So the issue with solution 2
is that the setup_arch() method calls dmi_setup() before it flushes the
caches by means of the cpu_cache_init() method. I guess it can be fixed
just by moving the dmi_setup() invocation to after cpu_cache_init() is
called (see the sketch below). This solution looks much less invasive
than solution 1.
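
E.g. something as simple as this (an untested sketch from memory, so the
context lines in setup_arch() are approximate):

--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
 	arch_mem_init(cmdline_p);
-	dmi_setup();
 ...
 	cpu_cache_init();
+	dmi_setup();
 	paging_init();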

So what do you think? Which solution do you prefer? Perhaps an
alternative one?

-Serge(y)

>
> Thanks
> >
> > Thomas.
> >
> > --
> > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> > good idea. [ RFC1925, 2.3 ]
>
> --
> - Jiaxun

2023-11-24 22:04:27

by Jiaxun Yang

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On Fri, Nov 24, 2023, at 18:52, Serge Semin wrote:
> On Thu, Nov 23, 2023 at 05:33:31PM +0000, Jiaxun Yang wrote:
>>
[...]
>> Actually dmi_setup() is called before cpu_cache_init().
>
> To preliminary sum the discussion, indeed there can be issues on the
> platforms which have DMI initialized on the cached region. Here are
> several solutions and additional difficulties I think may be caused by
> implementing them:

Thanks for such a detailed conclusion!
I'd prefer to go with solution 1, with comments below.
>
> 1. Use unmapped cached region utilization in the MIPS32 ioremap_prot()
> method.
> This solution a bit clumsy than it looks on the first glance.
> ioremap_prot() can be used for various types of the cachability
> mapping. Currently it's a default-cacheable CA preserved in the
> _page_cachable_default variable and Write-combined CA saved in
> boot_cpu_data.writecombine. Based on that we would have needed to use
> the unmapped cached region utilized for the IO-remaps called with the
> "_page_cachable_default" mapping flags passed only. The rest of the IO
> range mappings, including the write-combined ones, would have been
> handled by VM means. This would have made the ioremap_prot() a bit
> less maintainable, but still won't be that hard to implement (unless I
> miss something):
> --- a/arch/mips/mm/ioremap.c
> +++ b/arch/mips/mm/ioremap.c
> /*
> - * Map uncached objects in the low 512mb of address space using KSEG1,
> - * otherwise map using page tables.
> + * Map uncached/default-cached objects in the low 512mb of address
> + * space using KSEG1/KSEG0, otherwise map using page tables.
> */
> - if (IS_LOW512(phys_addr) && IS_LOW512(last_addr) &&
> - flags == _CACHE_UNCACHED)
> - return (void __iomem *) CKSEG1ADDR(phys_addr);
> + if (IS_LOW512(phys_addr) && IS_LOW512(last_addr)) {
> + if (flags == _CACHE_UNCACHED)
> + return (void __iomem *) CKSEG1ADDR(phys_addr);
> + else if (flags == _page_cachable_default)
> + return (void __iomem *) CKSEG0ADDR(phys_addr);
> + }
>
> Currently I can't figure out what obvious problems it may cause. But
> It seems suspicious that the cacheable IO-mapping hasn't been
> implemented by the unmapped cacheable region in the first place. In
> anyway this solution looks more dangerous than solution 2. because it
> affects all the MIPS32 platforms at once.

I just made a quick grep in the tree, and it seems like we don't have
many users of ioremap_cache (as well as ioremap_uc/wc) here, so I think it
is a safe assumption.

>
> 2. Convert dmi_remap_early() to ioremap_uc() (actually just ioremap()
> as noted by Arnd).
> As Jiaxun correctly noted this may cause problems on the platforms
> which don't flush caches before jumping out to the kernel. Thomas said
> that kernel flushes the caches early on boot, but Jiaxun noted that
> it's still done after early DMI setup. So the issue with solution 2 is
> that the setup_arch() method calls dmi_setup() before it flushes the
> caches by means of the cpu_cache_init() method. I guess it can be
> fixed just by moving the dmi_setup() method invocation to be after the
> cpu_cache_init() is called. This solution looks much less invasive
> than solution 1.

I recall Tiezhu placed dmi_setup() there for reasons. The first reason is
that the DMI is placed in memory space that is not reserved, so it may get
clobbered after mm is up. The second is that we may have some early quirks
depending on the DMI information.

Thanks.
>
> So what do you think? What solution do you prefer? Perhaps
> alternative?
>
> -Serge(y)
>
>>
>> Thanks
>> >
>> > Thomas.
>> >
>> > --
>> > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
>> > good idea. [ RFC1925, 2.3 ]
>>
>> --
>> - Jiaxun

--
- Jiaxun

2023-11-24 22:35:03

by Jiaxun Yang

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On Fri, Nov 24, 2023, at 18:52, Serge Semin wrote:
> On Thu, Nov 23, 2023 at 05:33:31PM +0000, Jiaxun Yang wrote:
>>
>>
>> On Thu, Nov 23, 2023, at 16:07, Thomas Bogendoerfer wrote:
>> > On Thu, Nov 23, 2023 at 03:07:09PM +0000, Jiaxun Yang wrote:
>> >>
>> [...]
>> >
>> > the problem with all 32bit unmapped segments is their limitations in
>> > size. But there is always room to try to use unmapped and fall back
>> > to mapped, if it doesn't work. But I doubt anybody is going to
>> > implement that.
>>
>> Yep, I guess fallback should be implemented for ioremap_cache as well.
>>
>> >
>> >> >> AFAIK for Loongson DMI is located at cached memory so using ioremap_uc
>> >> >> blindly will cause inconsistency.
>> >> >
>> >> > why ?
>> >>
>> >> Firmware sometimes does not flush those tables from cache back to memory.
>> >> For Loongson systems (as well as most MTI systems) cache is enabled by
>> >> firmware.
>> >
>> > kernel flushes all caches on startup, so there shouldn't be a problem.
>>
>> Actually dmi_setup() is called before cpu_cache_init().
>
> To preliminary sum the discussion, indeed there can be issues on the
> platforms which have DMI initialized on the cached region. Here are
> several solutions and additional difficulties I think may be caused by
> implementing them:
>
> 1. Use unmapped cached region utilization in the MIPS32 ioremap_prot()
> method.
> This solution a bit clumsy than it looks on the first glance.
> ioremap_prot() can be used for various types of the cachability
> mapping. Currently it's a default-cacheable CA preserved in the
> _page_cachable_default variable and Write-combined CA saved in
> boot_cpu_data.writecombine. Based on that we would have needed to use
> the unmapped cached region utilized for the IO-remaps called with the
> "_page_cachable_default" mapping flags passed only. The rest of the IO
> range mappings, including the write-combined ones, would have been
> handled by VM means. This would have made the ioremap_prot() a bit
> less maintainable, but still won't be that hard to implement (unless I
> miss something):
> --- a/arch/mips/mm/ioremap.c
> +++ b/arch/mips/mm/ioremap.c
> /*
> - * Map uncached objects in the low 512mb of address space using KSEG1,
> - * otherwise map using page tables.
> + * Map uncached/default-cached objects in the low 512mb of address
> + * space using KSEG1/KSEG0, otherwise map using page tables.
> */
> - if (IS_LOW512(phys_addr) && IS_LOW512(last_addr) &&
> - flags == _CACHE_UNCACHED)
> - return (void __iomem *) CKSEG1ADDR(phys_addr);
> + if (IS_LOW512(phys_addr) && IS_LOW512(last_addr)) {
> + if (flags == _CACHE_UNCACHED)
> + return (void __iomem *) CKSEG1ADDR(phys_addr);
> + else if (flags == _page_cachable_default)
> + return (void __iomem *) CKSEG0ADDR(phys_addr);
> + }
>
A nit: _page_cachable_default is set in cpu_cache_init() as well. We'd
better move it to cpu-probe.c, or give it a reasonable default value.

Thanks
--
- Jiaxun

2023-11-27 16:23:51

by Serge Semin

Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

Hi Jiaxun

On Fri, Nov 24, 2023 at 10:03:49PM +0000, Jiaxun Yang wrote:
> On Fri, Nov 24, 2023, at 18:52, Serge Semin wrote:
> > On Thu, Nov 23, 2023 at 05:33:31PM +0000, Jiaxun Yang wrote:
> >>
> [...]
> >> Actually dmi_setup() is called before cpu_cache_init().
> >
> > To preliminary sum the discussion, indeed there can be issues on the
> > platforms which have DMI initialized on the cached region. Here are
> > several solutions and additional difficulties I think may be caused by
> > implementing them:
>
> Thanks for such detailed conclusion!
> I'd prefer go solution 1, with comments below.
> >
> > 1. Use unmapped cached region utilization in the MIPS32 ioremap_prot()
> > method.
> > This solution a bit clumsy than it looks on the first glance.
> > ioremap_prot() can be used for various types of the cachability
> > mapping. Currently it's a default-cacheable CA preserved in the
> > _page_cachable_default variable and Write-combined CA saved in
> > boot_cpu_data.writecombine. Based on that we would have needed to use
> > the unmapped cached region utilized for the IO-remaps called with the
> > "_page_cachable_default" mapping flags passed only. The rest of the IO
> > range mappings, including the write-combined ones, would have been
> > handled by VM means. This would have made the ioremap_prot() a bit
> > less maintainable, but still won't be that hard to implement (unless I
> > miss something):
> > --- a/arch/mips/mm/ioremap.c
> > +++ b/arch/mips/mm/ioremap.c
> > /*
> > - * Map uncached objects in the low 512mb of address space using KSEG1,
> > - * otherwise map using page tables.
> > + * Map uncached/default-cached objects in the low 512mb of address
> > + * space using KSEG1/KSEG0, otherwise map using page tables.
> > */
> > - if (IS_LOW512(phys_addr) && IS_LOW512(last_addr) &&
> > - flags == _CACHE_UNCACHED)
> > - return (void __iomem *) CKSEG1ADDR(phys_addr);
> > + if (IS_LOW512(phys_addr) && IS_LOW512(last_addr)) {
> > + if (flags == _CACHE_UNCACHED)
> > + return (void __iomem *) CKSEG1ADDR(phys_addr);
> > + else if (flags == _page_cachable_default)
> > + return (void __iomem *) CKSEG0ADDR(phys_addr);
> > + }
> >
> > Currently I can't figure out what obvious problems it may cause. But
> > It seems suspicious that the cacheable IO-mapping hasn't been
> > implemented by the unmapped cacheable region in the first place. In
> > anyway this solution looks more dangerous than solution 2. because it
> > affects all the MIPS32 platforms at once.
>
> I just made a quick grep in tree, and it seems like we don't have much
> user of ioremap_cache (as well as ioremap_uc/wc) here so I think it is
> a safe assumption.

I wouldn't say there aren't many users. ioremap_wc() and its
devm-version are widely utilized in the GPU, network and some other
subsystems. ioremap_cache() isn't widespread indeed. In any case even a
single user must be able to safely call the method if it's provided by
the arch code, otherwise the method could be considered as just a bogus
stub to have the kernel successfully built. I bet you'll agree with
that. But that's not the point in this case.

A bit later you also noted:

On Fri, Nov 24, 2023 at 10:34:49PM +0000, Jiaxun Yang wrote:
> A nip, _page_cachable_default is set in cpu_cache_init() as well. We'd
> better move it to cpu-probe.c, or give it a reasonable default value.

Right. Thanks. To be honest I hadn't noticed that before your
message. _page_cachable_default is indeed initialized in the
cpu_cache_init() method, several steps after it would already have
been used in the framework of dmi_early_remap(). On the other hand
ioremap_cache() is defined as ioremap_prot() with the
_page_cachable_default variable passed (sketched below). So my code
will still work correctly unless _page_cachable_default is
pre-initialized with something other than zero. At the same time we
can't easily change its default value because that would affect and
likely break the r3k (CPU_R3000) and Octeon based platforms, since
it's utilized to initialize the protection-map table. Of course we
could fix the r3k_cache_init() and octeon_cache_init() methods too so
they would set the _page_cachable_default variable back to zero, but
that would also make things around it more complicated.
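
For reference, that ioremap_cache() definition is essentially the
following (paraphrasing arch/mips/include/asm/io.h, so treat the exact
spelling here as an assumption rather than a verbatim copy):

#define ioremap_cache(offset, size)				\
	ioremap_prot((offset), (size), _page_cachable_default)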

Also note, moving the _page_cachable_default initialization to the
earlier stages like cpu_probe() won't work any better because the
variable's value may get changed, for instance, in the framework of
the smp_setup() function (see cps_smp_setup()).

So after all the considerations above this solution now looks even
clumsier than before.( Any idea how to make it better?

>
> >
> > 2. Convert dmi_remap_early() to ioremap_uc() (actually just ioremap()
> > as noted by Arnd).
> > As Jiaxun correctly noted this may cause problems on the platforms
> > which don't flush caches before jumping out to the kernel. Thomas said
> > that kernel flushes the caches early on boot, but Jiaxun noted that
> > it's still done after early DMI setup. So the issue with solution 2 is
> > that the setup_arch() method calls dmi_setup() before it flushes the
> > caches by means of the cpu_cache_init() method. I guess it can be
> > fixed just by moving the dmi_setup() method invocation to be after the
> > cpu_cache_init() is called. This solution looks much less invasive
> > than solution 1.
>
> I recall Tiezhu made dmi_setup() here for reasons. The first reason is that
> DMI is placed at memory space that is not reserved, so it may get clobbered
> after mm is up.

Note the memory might be clobbered even before dmi_setup(), for
instance by means of the early_memtest() method. In any case it would
be better if the system bootloader had already reserved the DMI
memory (in the DTB) or if it were done by the platform-specific
plat_mem_setup() method.

> The second is we may have some early quirks depends on DMI
> information.

Which quirks do you mean that would have to run between the current
dmi_setup() call site and the cpu_cache_init() invocation?

-Serge(y)

>
> Thanks.
> >
> > So what do you think? What solution do you prefer? Perhaps
> > alternative?
> >
> > -Serge(y)
> >
> >>
> >> Thanks
> >> >
> >> > Thomas.
> >> >
> >> > --
> >> > Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
> >> > good idea. [ RFC1925, 2.3 ]
> >>
> >> --
> >> - Jiaxun
>
> --
> - Jiaxun

2023-11-27 21:09:09

by Jiaxun Yang

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On Nov 27, 2023 at 16:23, Serge Semin wrote:
[...]
>> I just made a quick grep in tree, and it seems like we don't have much
>> user of ioremap_cache (as well as ioremap_uc/wc) here so I think it is
>> a safe assumption.
>
> I wouldn't say there aren't much users. ioremap_wc() and it's
> devm-version is widely utilized in the GPU and network and some other
> subsystems. ioremap_cache() isn't widespread indeed. In anyway even a
> single user must be supported in safely calling the method if it's
> provided by the arch-code, otherwise the method could be considered as
> just a bogus stub to have the kernel successfully built. I bet you'll
> agree with that. But that's not the point in this case,
>
> A bit later you also noted:
>
> On Fri, Nov 24, 2023 at 10:34:49PM +0000, Jiaxun Yang wrote:
>> A nip, _page_cachable_default is set in cpu_cache_init() as well. We'd
>> better move it to cpu-probe.c, or give it a reasonable default value.
>
> Right. Thanks. To be honest I haven't noticed that before your
> message. _page_cachable_default is indeed initialized in the
> cpu_cache_init() method, several steps after it would be used in the
> framework of dmi_remap_early(). On the other hand ioremap_cache() is
> defined as ioremap_prot() with the _page_cachable_default variable
> passed. So my code will still correctly work unless
> _page_cachable_default is pre-initialized with something other than
> zero. On the other hand we can't easily change its default value
> because it will affect and likely break the r3k (CPU_R3000) and Octeon
> based platforms, because it's utilized to initialize the
> protection-map table. Of course we can fix the r3k_cache_init() and
> octeon_cache_init() methods too so they would get the
> _page_cachable_default variable back to zero, but it will also make
> things around it more complicated.
>
> Also note, moving the _page_cachable_default initialization to the
> earlier stages like cpu_probe() won't work better because the field
> value may get change for instance in the framework of the smp_setup()
> function (see cps_smp_setup()).
>
> So after all the considerations above this solution now looks even
> clumsier than before.( Any idea how to make it better?

I think the best solution may be to just use CKSEG0 to set up the
mapping here.

Btw I was thinking about 64-bit here. I thought for 64-bit we would
just embed the prot into XKPHYS, however I quickly figured out that
ioremap_cache was never implemented properly on 64-bit systems, and
neither was ioremap_wc.

> u64 base = (flags == _CACHE_UNCACHED ? IO_BASE : UNCAC_BASE);

Which is always an uncached mapping.

>>
[...]
>
> Note the memory might be clobbered even before dmi_setup() for
> instance by means of the early_memtest() method. In anyway it would be
> better if the system booloader would have already reserved the DMI
> memory (in DTB) or it would have been done by the platform-specific
> plat_mem_setup() method.

Unfortunately, too many machines are shipped with such badly designed
firmware. We rely on the dmi_setup code to scan for the DMI table at a
random location in memory and preserve it.

>
>> The second is we may have some early quirks depends on DMI
>> information.
>
> Which quirks do you mean to be dependent in between the current
> dmi_setup() call place and the cpu_cache_init() method invocation?

I think we don't have any for now.

--
- Jiaxun

2023-11-28 07:14:16

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

On Fri, Nov 24, 2023 at 02:18:44PM +0300, Serge Semin wrote:
> On Fri, Nov 24, 2023 at 10:19:00AM +0200, Mike Rapoport wrote:
> > On Thu, Nov 23, 2023 at 01:42:39PM +0300, Serge Semin wrote:
> > > On Thu, Nov 23, 2023 at 12:18:54PM +0200, Mike Rapoport wrote:
> > > > On Wed, Nov 22, 2023 at 09:24:03PM +0300, Serge Semin wrote:
> > > > > Besides of the already described reasons the pages backended memory holes
> > > > > might be persistent due to having memory mapped IO spaces behind those
> > > > > ranges in the framework of flatmem kernel config. Add such note to the
> > > > > init_unavailable_range() method kdoc in order to point out to one more
> > > > > reason of having the function executed for such regions.
> > > > >
> > > > > Signed-off-by: Serge Semin <[email protected]>
> > > > >
> > > > > ---
> > > > >
> > > > > Please let me know if the IO-space pages must be initialized somehow
> > > > > differently rather relying on free_area_init() executing the
> > > > > init_unavailable_range() method.
> > > >
> > >
> > > > Maybe I'm missing something, but why do you need struct pages in the
> > > > IO space?
> > >
> > > In my case at the very least that's due to having a SRAM device
> > > available in the middle of the MMIO-space. The region is getting
> > > mapped using the ioremap_wc() method (Uncached Write-Combine CA),
> > > which eventually is converted to calling get_vm_area() and
> > > ioremap_page_range() (see ioremap_prot() function on MIPS), which in
> > > its turn use the page structs for mapping. Another similar case is
> > > using ioremap_wc() in the PCIe outbound ATU space mapping of
> > > the graphic/video cards framebuffers.
> >
> > ioremap_page_range() does not need struct pages, but rather physical
> > addresses.
>
> Unless I miss something or MIPS32 is somehow special/wrong in that
> matter, but from my just got experience it actually does at least in
> the framework of the __update_cache() implementation which is called
> in the set_ptes() method (former set_pte_at()), which in its turn
> is eventually invoked by vmap_range_noflush() and finally by
> ioremap_page_range(). See the patch
> [PATCH 3/7] mips: Fix max_mapnr being uninitialized on early stages
> Link: https://lore.kernel.org/linux-mips/[email protected]/
> of this series and the stack-trace of the bug fixed by that patch.
>
> Is it wrong that on MIPS32 ioremap_page_range() eventually relies on
> the page structs? It has been like that for, I don't know, long time.
> If so then the sparse memory config might be broken on MIPS32..(

Do you mind posting your physical memory layout?

If I understand correctly, you have a hole in your RAM and there is an MMIO
region somewhere in that hole. With FLATMEM the memory map exists for that
hole and hence pfn_valid() returns 1 for the MMIO range as well. That makes
__update_cache() check the folio state, and that check would fail if the
memory map contained garbage. But since the hole in the memory map is
initialized with init_unavailable_range() you get a valid struct page/struct
folio and everything is fine.

With that, the init_unavailable_range() docs need not mention IO space at
all; they should mention holes within the FLATMEM memory map.

As for SPARSEMEM, if the hole does not belong to any section, pfn_valid()
will be false for it and __update_cache() won't try to access memory map.

> > > In general having the pages array defined for the IO-memory is
> > > required for mapping the IO-space other than just uncached (my sram
> > > case for example) or, for instance, with special access attribute for
> > > the user-space (if I am not missing something in a way VM works in
> > > that case).
> >
>
> > No, struct pages are not required to map IO space. If you need to map MMIO
> > to userspace there's remap_pfn_range() for that.
>
> Is this correct for both flat and sparse memory config? In anyway
> please see my comment above about the problem I recently got.
>
> >
> > My guess is that your system has a hole in the physical memory mappings and
> > with FLATMEM that hole will have essentially unused struct pages, which are
> > initialized by init_unavailable_range(). But from mm perspective this is
> > still a hole even though there's some MMIO ranges in that hole.
>
> Absolutely right. Here is the physical memory layout in my system.
> 0 - 128MB: RAM
> 128MB - 512MB: Memory mapped IO
> 512MB - 768MB..8.256GB: RAM
>
> >
> > Now, if that hole is large you are wasting memory for unused memory map and
> > it maybe worth considering using SPARSEMEM.
>
> Do you think it's worth to move to the sparse memory configuration in
> order to save the 384MB of mapping with the 16K page model? AFAIU flat
> memory config is more performant. Performance is critical on the most
> of the SoC applications especially when using the 10G ethernet or
> the high-speed PCIe devices.
>
> -Serge(y)
>
> >
> > > -Serge(y)
> > >
> > > >
> > > > > ---
> > > > > mm/mm_init.c | 1 +
> > > > > 1 file changed, 1 insertion(+)
> > > > >
> > > > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > > > index 077bfe393b5e..3fa33e2d32ba 100644
> > > > > --- a/mm/mm_init.c
> > > > > +++ b/mm/mm_init.c
> > > > > @@ -796,6 +796,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> > > > > * - physical memory bank size is not necessarily the exact multiple of the
> > > > > * arbitrary section size
> > > > > * - early reserved memory may not be listed in memblock.memory
> > > > > + * - memory mapped IO space
> > > > > * - memory layouts defined with memmap= kernel parameter may not align
> > > > > * nicely with memmap sections
> > > > > *
> > > > > --
> > > > > 2.42.1
> > > > >
> > > >
> > > > --
> > > > Sincerely yours,
> > > > Mike.
> > > >
> >
> > --
> > Sincerely yours,
> > Mike.

--
Sincerely yours,
Mike.

2023-11-28 10:52:00

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

On Tue, Nov 28, 2023 at 09:13:39AM +0200, Mike Rapoport wrote:
> On Fri, Nov 24, 2023 at 02:18:44PM +0300, Serge Semin wrote:
> > On Fri, Nov 24, 2023 at 10:19:00AM +0200, Mike Rapoport wrote:
> > > On Thu, Nov 23, 2023 at 01:42:39PM +0300, Serge Semin wrote:
> > > > On Thu, Nov 23, 2023 at 12:18:54PM +0200, Mike Rapoport wrote:
> > > > > On Wed, Nov 22, 2023 at 09:24:03PM +0300, Serge Semin wrote:
> > > > > > Besides of the already described reasons the pages backended memory holes
> > > > > > might be persistent due to having memory mapped IO spaces behind those
> > > > > > ranges in the framework of flatmem kernel config. Add such note to the
> > > > > > init_unavailable_range() method kdoc in order to point out to one more
> > > > > > reason of having the function executed for such regions.
> > > > > >
> > > > > > Signed-off-by: Serge Semin <[email protected]>
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > Please let me know if the IO-space pages must be initialized somehow
> > > > > > differently rather relying on free_area_init() executing the
> > > > > > init_unavailable_range() method.
> > > > >
> > > >
> > > > > Maybe I'm missing something, but why do you need struct pages in the
> > > > > IO space?
> > > >
> > > > In my case at the very least that's due to having a SRAM device
> > > > available in the middle of the MMIO-space. The region is getting
> > > > mapped using the ioremap_wc() method (Uncached Write-Combine CA),
> > > > which eventually is converted to calling get_vm_area() and
> > > > ioremap_page_range() (see ioremap_prot() function on MIPS), which in
> > > > its turn use the page structs for mapping. Another similar case is
> > > > using ioremap_wc() in the PCIe outbound ATU space mapping of
> > > > the graphic/video cards framebuffers.
> > >
> > > ioremap_page_range() does not need struct pages, but rather physical
> > > addresses.
> >
> > Unless I miss something or MIPS32 is somehow special/wrong in that
> > matter, but from my just got experience it actually does at least in
> > the framework of the __update_cache() implementation which is called
> > in the set_ptes() method (former set_pte_at()), which in its turn
> > is eventually invoked by vmap_range_noflush() and finally by
> > ioremap_page_range(). See the patch
> > [PATCH 3/7] mips: Fix max_mapnr being uninitialized on early stages
> > Link: https://lore.kernel.org/linux-mips/[email protected]/
> > of this series and the stack-trace of the bug fixed by that patch.
> >
> > Is it wrong that on MIPS32 ioremap_page_range() eventually relies on
> > the page structs? It has been like that for, I don't know, long time.
> > If so then the sparse memory config might be broken on MIPS32..(
>

> Do you mind posting your physical memory layout?

I actually already did, in response to the last part of your previous
message. You must have missed it. Here is a copy of that part:

> On Fri, Nov 24, 2023 at 02:18:44PM +0300, Serge Semin wrote:
> > On Fri, Nov 24, 2023 at 10:19:00AM +0200, Mike Rapoport wrote:
> > ...
> > >
> > > My guess is that your system has a hole in the physical memory mappings and
> > > with FLATMEM that hole will have essentially unused struct pages, which are
> > > initialized by init_unavailable_range(). But from mm perspective this is
> > > still a hole even though there's some MMIO ranges in that hole.
> >
> > Absolutely right. Here is the physical memory layout in my system.
> > 0 - 128MB: RAM
> > 128MB - 512MB: Memory mapped IO
> > 512MB - 768MB..8.256GB: RAM
> >
> > >
> > > Now, if that hole is large you are wasting memory for unused memory map and
> > > it maybe worth considering using SPARSEMEM.
> >
> > Do you think it's worth to move to the sparse memory configuration in
> > order to save the 384MB of mapping with the 16K page model? AFAIU flat
> > memory config is more performant. Performance is critical on the most
> > of the SoC applications especially when using the 10G ethernet or
> > the high-speed PCIe devices.
> ...

Could you also answer my question above regarding using sparsemem
instead, given my HW memory layout?

>
> If I understand correctly, you have a hole in your RAM and there is MMIO
> region somewhere in that hole.

Absolutely right. Please see my messages above.

> With FLATMEM the memory map exists for that
> hole and hence pfn_valid() returns 1 for the MMIO range as well. That makes
> __update_cache() to check folio state and that check would fail if the memory
> map contained garbage. But since the hole in the memory map is initialized
> with init_unavailable_range() you get a valid struct page/struct folio and
> everything is fine.

Right. That's what currently happens on MIPS32 and that's what I had
to fix in the framework of this series with the next patch:
Link: https://lore.kernel.org/linux-mips/[email protected]/
The flatmem version of the pfn_valid() method has been broken due to
max_mapnr being uninitialized before mem_init() is called. So
init_unavailable_range() didn't initialize the pages at the early
bootup stage. Thus afterwards, once max_mapnr finally got a valid
value, any attempt to call the __update_cache() method on the MMIO
memory hole caused the unaligned access crash.
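
For reference, the FLATMEM pfn_valid() everything hinges on here boils
down to roughly the following check (a simplified sketch of what
arch/mips/include/asm/page.h does, not a verbatim copy):

static inline int pfn_valid(unsigned long pfn)
{
	/* max_mapnr is still zero at the early boot stages, so
	 * nothing is considered "valid" until mem_init() runs */
	return pfn >= ARCH_PFN_OFFSET && pfn < max_mapnr;
}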

>
> With that, the init_unavailable_range() docs need not mention IO space at
> all, they should mention holes within FLATMEM memory map.

Ok. I'll resend the patch mentioning the flatmem holes instead of
the IO spaces.

>
> As for SPARSEMEM, if the hole does not belong to any section, pfn_valid()
> will be false for it and __update_cache() won't try to access memory map.

Ah, I see. In case of the SPARSEMEM config another version of
pfn_valid() will be called, the one defined in the
include/linux/mmzone.h header file. Right? If so then there's indeed
no problem there.

-Serge(y)

>
> > > > In general having the pages array defined for the IO-memory is
> > > > required for mapping the IO-space other than just uncached (my sram
> > > > case for example) or, for instance, with special access attribute for
> > > > the user-space (if I am not missing something in a way VM works in
> > > > that case).
> > >
> >
> > > No, struct pages are not required to map IO space. If you need to map MMIO
> > > to userspace there's remap_pfn_range() for that.
> >
> > Is this correct for both flat and sparse memory config? In anyway
> > please see my comment above about the problem I recently got.
> >
> > >
> > > My guess is that your system has a hole in the physical memory mappings and
> > > with FLATMEM that hole will have essentially unused struct pages, which are
> > > initialized by init_unavailable_range(). But from mm perspective this is
> > > still a hole even though there's some MMIO ranges in that hole.
> >
> > Absolutely right. Here is the physical memory layout in my system.
> > 0 - 128MB: RAM
> > 128MB - 512MB: Memory mapped IO
> > 512MB - 768MB..8.256GB: RAM
> >
> > >
> > > Now, if that hole is large you are wasting memory for unused memory map and
> > > it maybe worth considering using SPARSEMEM.
> >
> > Do you think it's worth to move to the sparse memory configuration in
> > order to save the 384MB of mapping with the 16K page model? AFAIU flat
> > memory config is more performant. Performance is critical on the most
> > of the SoC applications especially when using the 10G ethernet or
> > the high-speed PCIe devices.
> >
> > -Serge(y)
> >
> > >
> > > > -Serge(y)
> > > >
> > > > >
> > > > > > ---
> > > > > > mm/mm_init.c | 1 +
> > > > > > 1 file changed, 1 insertion(+)
> > > > > >
> > > > > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > > > > index 077bfe393b5e..3fa33e2d32ba 100644
> > > > > > --- a/mm/mm_init.c
> > > > > > +++ b/mm/mm_init.c
> > > > > > @@ -796,6 +796,7 @@ overlap_memmap_init(unsigned long zone, unsigned long *pfn)
> > > > > > * - physical memory bank size is not necessarily the exact multiple of the
> > > > > > * arbitrary section size
> > > > > > * - early reserved memory may not be listed in memblock.memory
> > > > > > + * - memory mapped IO space
> > > > > > * - memory layouts defined with memmap= kernel parameter may not align
> > > > > > * nicely with memmap sections
> > > > > > *
> > > > > > --
> > > > > > 2.42.1
> > > > > >
> > > > >
> > > > > --
> > > > > Sincerely yours,
> > > > > Mike.
> > > > >
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
>
> --
> Sincerely yours,
> Mike.

2023-11-28 11:34:24

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Mon, Nov 27, 2023 at 09:08:11PM +0000, Jiaxun Yang wrote:
>
>
> On Nov 27, 2023 at 16:23, Serge Semin wrote:
> [...]
> >> I just made a quick grep in tree, and it seems like we don't have much
> >> user of ioremap_cache (as well as ioremap_uc/wc) here so I think it is
> >> a safe assumption.
> >
> > I wouldn't say there aren't much users. ioremap_wc() and it's
> > devm-version is widely utilized in the GPU and network and some other
> > subsystems. ioremap_cache() isn't widespread indeed. In anyway even a
> > single user must be supported in safely calling the method if it's
> > provided by the arch-code, otherwise the method could be considered as
> > just a bogus stub to have the kernel successfully built. I bet you'll
> > agree with that. But that's not the point in this case,
> >
> > A bit later you also noted:
> >
> > On Fri, Nov 24, 2023 at 10:34:49PM +0000, Jiaxun Yang wrote:
> >> A nip, _page_cachable_default is set in cpu_cache_init() as well. We'd
> >> better move it to cpu-probe.c, or give it a reasonable default value.
> >
> > Right. Thanks. To be honest I haven't noticed that before your
> > message. _page_cachable_default is indeed initialized in the
> > cpu_cache_init() method, several steps after it would be used in the
> > framework of dmi_remap_early(). On the other hand ioremap_cache() is
> > defined as ioremap_prot() with the _page_cachable_default variable
> > passed. So my code will still correctly work unless
> > _page_cachable_default is pre-initialized with something other than
> > zero. On the other hand we can't easily change its default value
> > because it will affect and likely break the r3k (CPU_R3000) and Octeon
> > based platforms, because it's utilized to initialize the
> > protection-map table. Of course we can fix the r3k_cache_init() and
> > octeon_cache_init() methods too so they would get the
> > _page_cachable_default variable back to zero, but it will also make
> > things around it more complicated.
> >
> > Also note, moving the _page_cachable_default initialization to the
> > earlier stages like cpu_probe() won't work better because the field
> > value may get change for instance in the framework of the smp_setup()
> > function (see cps_smp_setup()).
> >
> > So after all the considerations above this solution now looks even
> > clumsier than before.( Any idea how to make it better?
>

> I think the best solution maybe just use CKSEG0 to setup map here.
>
> Btw I was thinking about 64 bit here, I thought for 64bit we would
> just embedded prot into XKPHYS, however I quickly figure out
> ioremap_cache was never implemented properly on 64-bit system,
> so does ioremap_wc.
>
> > u64 base = (flags == _CACHE_UNCACHED ? IO_BASE : UNCAC_BASE);
>
> Which is always uncached mapping.

Indeed. Thanks for pointing that out. Over the last days I looked at
that line several times and for some reason UNCAC_BASE read as
CAC_BASE to me.) Given that both IO_BASE and UNCAC_BASE are defined
within the uncached region anyway, it should be safe for any currently
supported MIPS64 platform (including the Loongson ones) to use
ioremap() to back dmi_early_remap(). So basically my current patch in
the subject won't change the method semantics. Let's not try to fix a
problem which doesn't exist then, and keep the patch as is, especially
seeing that the alternatives might still cause some trouble. Will you
be ok with that?

-Serge(y)

>
> >>
> [...]
> >
> > Note the memory might be clobbered even before dmi_setup() for
> > instance by means of the early_memtest() method. In anyway it would be
> > better if the system booloader would have already reserved the DMI
> > memory (in DTB) or it would have been done by the platform-specific
> > plat_mem_setup() method.
>
> Unfortunately, too many machines are shipped with those badly designed
> firmware. We rely on dmi_setup code to scan and preserve dmi table from
> random location in memory.
>
> >
> >> The second is we may have some early quirks depends on DMI
> >> information.
> >
> > Which quirks do you mean to be dependent in between the current
> > dmi_setup() call place and the cpu_cache_init() method invocation?
>
> I think we don't have any for now.
>
> --
> - Jiaxun

2023-11-28 12:43:01

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Mon, Nov 27, 2023, at 17:23, Serge Semin wrote:
> On Fri, Nov 24, 2023 at 10:03:49PM +0000, Jiaxun Yang wrote:
>> On Nov 24, 2023 at 18:52, Serge Semin wrote:
>> > On Thu, Nov 23, 2023 at 05:33:31PM +0000, Jiaxun Yang wrote:
>> >>
>> [...]
>> >> Actually dmi_setup() is called before cpu_cache_init().
>> >
>> > To preliminary sum the discussion, indeed there can be issues on the
>> > platforms which have DMI initialized on the cached region. Here are
>> > several solutions and additional difficulties I think may be caused by
>> > implementing them:
>>
>> Thanks for such detailed conclusion!
>> I'd prefer go solution 1, with comments below.
>> >
>> > 1. Use unmapped cached region utilization in the MIPS32 ioremap_prot()
>> > method.
>> > This solution a bit clumsy than it looks on the first glance.
>> > ioremap_prot() can be used for various types of the cachability
>> > mapping. Currently it's a default-cacheable CA preserved in the
>> > _page_cachable_default variable and Write-combined CA saved in
>> > boot_cpu_data.writecombine. Based on that we would have needed to use
>> > the unmapped cached region utilized for the IO-remaps called with the
>> > "_page_cachable_default" mapping flags passed only. The rest of the IO
>> > range mappings, including the write-combined ones, would have been
>> > handled by VM means. This would have made the ioremap_prot() a bit
>> > less maintainable, but still won't be that hard to implement (unless I
>> > miss something):
>> > --- a/arch/mips/mm/ioremap.c
>> > +++ b/arch/mips/mm/ioremap.c
>> > /*
>> > - * Map uncached objects in the low 512mb of address space using KSEG1,
>> > - * otherwise map using page tables.
>> > + * Map uncached/default-cached objects in the low 512mb of address
>> > + * space using KSEG1/KSEG0, otherwise map using page tables.
>> > */
>> > - if (IS_LOW512(phys_addr) && IS_LOW512(last_addr) &&
>> > - flags == _CACHE_UNCACHED)
>> > - return (void __iomem *) CKSEG1ADDR(phys_addr);
>> > + if (IS_LOW512(phys_addr) && IS_LOW512(last_addr)) {
>> > + if (flags == _CACHE_UNCACHED)
>> > + return (void __iomem *) CKSEG1ADDR(phys_addr);
>> > + else if (flags == _page_cachable_default)
>> > + return (void __iomem *) CKSEG0ADDR(phys_addr);
>> > + }
>> >
>> > Currently I can't figure out what obvious problems it may cause. But
>> > It seems suspicious that the cacheable IO-mapping hasn't been
>> > implemented by the unmapped cacheable region in the first place. In
>> > anyway this solution looks more dangerous than solution 2. because it
>> > affects all the MIPS32 platforms at once.
>>
>> I just made a quick grep in tree, and it seems like we don't have much
>> user of ioremap_cache (as well as ioremap_uc/wc) here so I think it is
>> a safe assumption.
>
> I wouldn't say there aren't much users. ioremap_wc() and it's
> devm-version is widely utilized in the GPU and network and some other
> subsystems. ioremap_cache() isn't widespread indeed. In anyway even a
> single user must be supported in safely calling the method if it's
> provided by the arch-code, otherwise the method could be considered as
> just a bogus stub to have the kernel successfully built. I bet you'll
> agree with that. But that's not the point in this case.

ioremap_wc() is useful for mapping PCI attached memory such as frame
buffers, but ioremap_cache() is generally underspecified because the
resulting pointer is neither safe to dereference nor to pass into
readl()/writel()/memcpy_fromio() on all architectures.

There was an effort to convert the remaining ioremap_cache() calls
into memremap() a few years ago, not sure if that's still being worked
on but it would be the right thing to do.

Arnd

2023-11-28 13:54:19

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

Hi Arnd

On Tue, Nov 28, 2023 at 01:41:51PM +0100, Arnd Bergmann wrote:
> On Mon, Nov 27, 2023, at 17:23, Serge Semin wrote:
> > On Fri, Nov 24, 2023 at 10:03:49PM +0000, Jiaxun Yang wrote:
> >> On Nov 24, 2023 at 18:52, Serge Semin wrote:
> >> > On Thu, Nov 23, 2023 at 05:33:31PM +0000, Jiaxun Yang wrote:
> >> >>
> >> [...]
> >> >> Actually dmi_setup() is called before cpu_cache_init().
> >> >
> >> > To preliminary sum the discussion, indeed there can be issues on the
> >> > platforms which have DMI initialized on the cached region. Here are
> >> > several solutions and additional difficulties I think may be caused by
> >> > implementing them:
> >>
> >> Thanks for such detailed conclusion!
> >> I'd prefer go solution 1, with comments below.
> >> >
> >> > 1. Use unmapped cached region utilization in the MIPS32 ioremap_prot()
> >> > method.
> >> > This solution a bit clumsy than it looks on the first glance.
> >> > ioremap_prot() can be used for various types of the cachability
> >> > mapping. Currently it's a default-cacheable CA preserved in the
> >> > _page_cachable_default variable and Write-combined CA saved in
> >> > boot_cpu_data.writecombine. Based on that we would have needed to use
> >> > the unmapped cached region utilized for the IO-remaps called with the
> >> > "_page_cachable_default" mapping flags passed only. The rest of the IO
> >> > range mappings, including the write-combined ones, would have been
> >> > handled by VM means. This would have made the ioremap_prot() a bit
> >> > less maintainable, but still won't be that hard to implement (unless I
> >> > miss something):
> >> > --- a/arch/mips/mm/ioremap.c
> >> > +++ b/arch/mips/mm/ioremap.c
> >> > /*
> >> > - * Map uncached objects in the low 512mb of address space using KSEG1,
> >> > - * otherwise map using page tables.
> >> > + * Map uncached/default-cached objects in the low 512mb of address
> >> > + * space using KSEG1/KSEG0, otherwise map using page tables.
> >> > */
> >> > - if (IS_LOW512(phys_addr) && IS_LOW512(last_addr) &&
> >> > - flags == _CACHE_UNCACHED)
> >> > - return (void __iomem *) CKSEG1ADDR(phys_addr);
> >> > + if (IS_LOW512(phys_addr) && IS_LOW512(last_addr)) {
> >> > + if (flags == _CACHE_UNCACHED)
> >> > + return (void __iomem *) CKSEG1ADDR(phys_addr);
> >> > + else if (flags == _page_cachable_default)
> >> > + return (void __iomem *) CKSEG0ADDR(phys_addr);
> >> > + }
> >> >
> >> > Currently I can't figure out what obvious problems it may cause. But
> >> > It seems suspicious that the cacheable IO-mapping hasn't been
> >> > implemented by the unmapped cacheable region in the first place. In
> >> > anyway this solution looks more dangerous than solution 2. because it
> >> > affects all the MIPS32 platforms at once.
> >>
> >> I just made a quick grep in tree, and it seems like we don't have much
> >> user of ioremap_cache (as well as ioremap_uc/wc) here so I think it is
> >> a safe assumption.
> >
> > I wouldn't say there aren't much users. ioremap_wc() and it's
> > devm-version is widely utilized in the GPU and network and some other
> > subsystems. ioremap_cache() isn't widespread indeed. In anyway even a
> > single user must be supported in safely calling the method if it's
> > provided by the arch-code, otherwise the method could be considered as
> > just a bogus stub to have the kernel successfully built. I bet you'll
> > agree with that. But that's not the point in this case.
>

> ioremap_wc() is useful for mapping PCI attached memory such as frame
> buffers,

Thanks for the clarification. That's actually the reason why I originally
added the ioremap_wc() support to the MIPS32 arch. In one of the
projects we had SM750/SM768-based graphics cards attached to a
MIPS32-based SoC. Using ioremap_wc() for the framebuffer indeed
significantly improved the graphics subsystem performance. It was mostly
required for the SM750 chips though, which provided a narrow and slow
PCIe Gen.1 x1 interface.

> but ioremap_cache() is generally underspecified because the
> resulting pointer is neither safe to dereference nor to pass into
> readl()/writel()/memcpy_fromio() on all architectures.

I don't know about ARM64 (which for instance has it utilized to access
the DMI region), but at least in case of MIPS32 (a fortiori MIPS64
seeing the ioremap_cache() method actually returns a pointer to the
uncached region) I don't see a reason why it wouldn't be safe in both
cases described by you. All IO and memory regions are accessed by the
generic load and store instructions. The only difference is that the
MMIO-space accessors normally implies additional barriers, which just
slow down the execution, but shouldn't cause any other problem. Could
you clarify why do you think otherwise?

>
> There was an effort to convert the remaining ioremap_cache() calls
> into memremap() a few years ago, not sure if that's still being worked
> on but it would be the right thing to do.

I see. Thanks for pointing that out. I guess it could be done
for MIPS too (at least on our MIPS32 platform DMI is just a memory
region pre-initialized by the bootloader), but the conversion would
require much effort. Alas, currently I can't afford to get it
implemented in the framework of this patchset. (I saved your note in
my MIPS TODO list though. Let's hope I'll eventually be able to get
back to this topic.)

-Serge(y)

>
> Arnd

2023-11-28 15:47:23

by Jiaxun Yang

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On Nov 28, 2023 at 11:34, Serge Semin wrote:
> On Mon, Nov 27, 2023 at 09:08:11PM +0000, Jiaxun Yang wrote:
[...]
>
> Indeed. Thanks for pointing that out. In the last days several times I
> was looking at that line and for some reason UNCAC_BASE seemed as
> CAC_BASE to me.) Based on what both IO_BASE and UNCAC_BASE are defined
> as of the uncached region anyway, then it should be safe for any
> currently supported MIPS64 (including the Loongson's) to use ioremap()
> in place of dmi_early_remap(). So basically my current patch in the
> subject won't change the method semantics. Let's not to try to fix a
> problem which doesn't exist then, and keep the patch as is especially
> seeing that the alternatives might still cause some troubles. Will you
> be ok with that?

I'd say the safest option is to use CKSEG0 or TO_CAC here, but I'm fine
with ioremap as long as the semantics remain uncached on Loongson.

Thanks.
>
> -Serge(y)
>
>>
[...]
--
- Jiaxun

2023-11-28 21:59:58

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Tue, Nov 28, 2023, at 14:52, Serge Semin wrote:
> On Tue, Nov 28, 2023 at 01:41:51PM +0100, Arnd Bergmann wrote:
>> On Mon, Nov 27, 2023, at 17:23, Serge Semin wrote:
>> > On Fri, Nov 24, 2023 at 10:03:49PM +0000, Jiaxun Yang wrote:
>> but ioremap_cache() is generally underspecified because the
>> resulting pointer is neither safe to dereference nor to pass into
>> readl()/writel()/memcpy_fromio() on all architectures.
>
> I don't know about ARM64 (which for instance has it utilized to access
> the DMI region), but at least in case of MIPS32 (a fortiori MIPS64
> seeing the ioremap_cache() method actually returns a pointer to the
> uncached region) I don't see a reason why it wouldn't be safe in both
> cases described by you. All IO and memory regions are accessed by the
> generic load and store instructions. The only difference is that the
> MMIO-space accessors normally implies additional barriers, which just
> slow down the execution, but shouldn't cause any other problem. Could
> you clarify why do you think otherwise?

On arch/powerpc, CONFIG_PPC_INDIRECT_MMIO makes all ioremap()
type functions return a token that can be passed into the readl/writel
family but that is not a pointer you can dereference.

On s390, the mechanism is different, but similarly __iomem
tokens are not pointers at all.

>> There was an effort to convert the remaining ioremap_cache() calls
>> into memremap() a few years ago, not sure if that's still being worked
>> on but it would be the right thing to do.
>
> I see. Thanks for the pointing out to that. I guess it could be done
> for MIPS too (at least on our MIPS32 platform DMI is just a memory
> region pre-initialized by the bootloader), but the conversion would
> require much efforts. Alas currently I can't afford to get it
> implemented in the framework of this patchset. (I saved your note in
> my MIPS TODO list though. Let's hope eventually I'll be able to get
> back to this topic.)

I just noticed that the only architectures that actually provide
ioremap_cache() are x86, arm, arm64, mips, loongarch, powerpc, sh
and xtensa. The ones that have ACPI support still definitely
need it, most of the other ones can probably be fixed without
too much trouble.

Arnd

2023-11-29 06:14:49

by Mike Rapoport

[permalink] [raw]
Subject: Re: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

On Tue, Nov 28, 2023 at 01:51:32PM +0300, Serge Semin wrote:
> On Tue, Nov 28, 2023 at 09:13:39AM +0200, Mike Rapoport wrote:
> > On Fri, Nov 24, 2023 at 02:18:44PM +0300, Serge Semin wrote:
>
> > Do you mind posting your physical memory layout?
>
> I actually already did in response to the last part of your previous
> message. You must have missed it. Here is the copy of the message:

Sorry, for some reason I didn't scroll down your previous mail :)

> > On Fri, Nov 24, 2023 at 02:18:44PM +0300, Serge Semin wrote:
> > > On Fri, Nov 24, 2023 at 10:19:00AM +0200, Mike Rapoport wrote:
> > > ...
> > > >
> > > > My guess is that your system has a hole in the physical memory mappings and
> > > > with FLATMEM that hole will have essentially unused struct pages, which are
> > > > initialized by init_unavailable_range(). But from mm perspective this is
> > > > still a hole even though there's some MMIO ranges in that hole.
> > >
> > > Absolutely right. Here is the physical memory layout in my system.
> > > 0 - 128MB: RAM
> > > 128MB - 512MB: Memory mapped IO
> > > 512MB - 768MB..8.256GB: RAM
> > >
> > > >
> > > > Now, if that hole is large you are wasting memory for unused memory map and
> > > > it maybe worth considering using SPARSEMEM.
> > >
> > > Do you think it's worth to move to the sparse memory configuration in
> > > order to save the 384MB of mapping with the 16K page model? AFAIU flat
> > > memory config is more performant. Performance is critical on the most
> > > of the SoC applications especially when using the 10G ethernet or
> > > the high-speed PCIe devices.
>
> Could you also answer to my question above regarding using the
> sparsemem instead on my hw memory layout?

Currently MIPS defines the section size as 256MB, so with your memory
layout with SPARSEMEM there will be two sections of 256MB, at 0 and at
512MB, so you'll save the memory map for 256M, which is roughly 1M with
16k pages.
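
(That is, 256MB / 16KB = 16384 struct pages; at a few tens of bytes per
struct page that amounts to something on the order of a megabyte of
memory map.)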

It's also possible to reduce the section size if you need to save more.

With SPARSEMEM the pfn_to_page() and page_to_pfn() are a bit longer in
terms of assembly instructions, but I really doubt you'll notice any
performance difference in real world applications.
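
For reference, the difference is roughly the following (paraphrasing
include/asm-generic/memory_model.h, so the exact form may differ):

/* FLATMEM: a single mem_map array lookup */
#define __pfn_to_page(pfn)	(mem_map + ((pfn) - ARCH_PFN_OFFSET))

/* classic SPARSEMEM: an extra per-section lookup first */
#define __pfn_to_page(pfn)					\
({	unsigned long __pfn = (pfn);				\
	struct mem_section *__sec = __pfn_to_section(__pfn);	\
	__section_mem_map_addr(__sec) + __pfn;			\
})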

> > With FLATMEM the memory map exists for that
> > hole and hence pfn_valid() returns 1 for the MMIO range as well. That makes
> > __update_cache() to check folio state and that check would fail if the memory
> > map contained garbage. But since the hole in the memory map is initialized
> > with init_unavailable_range() you get a valid struct page/struct folio and
> > everything is fine.
>
> Right. That's what currently happens on MIPS32 and that's what I had
> to fix in the framework of this series by the next patch:
> Link: https://lore.kernel.org/linux-mips/[email protected]/
> flatmem version of the pfn_valid() method has been broken due to
> max_mapnr being uninitialized before mem_init() is called. So
> init_unavailable_range() didn't initialize the pages on the early
> bootup stage. Thus afterwards, when max_mapnr has finally got a valid
> value any attempts to call the __update_cache() method on the MMIO
> memory hole caused the unaligned access crash.

The fix for max_mapnr makes pfn_valid()==1 for the entire memory map and
this fixes up the struct pages in the hole.

> >
> > With that, the init_unavailable_range() docs need not mention IO space at
> > all, they should mention holes within FLATMEM memory map.
>
> Ok. I'll resend the patch with mentioning flatmem holes instead of
> mentioning the IO-spaces.
>
> >
> > As for SPARSEMEM, if the hole does not belong to any section, pfn_valid()
> > will be false for it and __update_cache() won't try to access memory map.
>
> Ah, I see. In case of the SPARSEMEM config an another version of
> pfn_valid() will be called. It's defined in the include/linux/mmzone.h
> header file. Right? If so then no problem there indeed.

Yes, SPARSEMEM uses the pfn_valid() defined in include/linux/mmzone.h.
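
Roughly speaking, that version reduces to a section-presence check along
these lines (heavily simplified; the real helper does a few more sanity
checks):

static inline int pfn_valid(unsigned long pfn)
{
	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	/* a hole that belongs to no section has no valid mem_section */
	return valid_section(__pfn_to_section(pfn));
}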

> -Serge(y)

--
Sincerely yours,
Mike.

2023-11-30 13:31:49

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 5/7] mm/mm_init.c: Extend init unavailable range doc info

On Wed, Nov 29, 2023 at 08:14:00AM +0200, Mike Rapoport wrote:
> On Tue, Nov 28, 2023 at 01:51:32PM +0300, Serge Semin wrote:
> > On Tue, Nov 28, 2023 at 09:13:39AM +0200, Mike Rapoport wrote:
> > > On Fri, Nov 24, 2023 at 02:18:44PM +0300, Serge Semin wrote:
> >
> > > Do you mind posting your physical memory layout?
> >
> > I actually already did in response to the last part of your previous
> > message. You must have missed it. Here is the copy of the message:
>
> Sorry, for some reason I didn't scroll down your previous mail :)
>
> > > On Fri, Nov 24, 2023 at 02:18:44PM +0300, Serge Semin wrote:
> > > > On Fri, Nov 24, 2023 at 10:19:00AM +0200, Mike Rapoport wrote:
> > > > ...
> > > > >
> > > > > My guess is that your system has a hole in the physical memory mappings and
> > > > > with FLATMEM that hole will have essentially unused struct pages, which are
> > > > > initialized by init_unavailable_range(). But from mm perspective this is
> > > > > still a hole even though there's some MMIO ranges in that hole.
> > > >
> > > > Absolutely right. Here is the physical memory layout in my system.
> > > > 0 - 128MB: RAM
> > > > 128MB - 512MB: Memory mapped IO
> > > > 512MB - 768MB..8.256GB: RAM
> > > >
> > > > >
> > > > > Now, if that hole is large you are wasting memory for unused memory map and
> > > > > it maybe worth considering using SPARSEMEM.
> > > >
> > > > Do you think it's worth to move to the sparse memory configuration in
> > > > order to save the 384MB of mapping with the 16K page model? AFAIU flat
> > > > memory config is more performant. Performance is critical on the most
> > > > of the SoC applications especially when using the 10G ethernet or
> > > > the high-speed PCIe devices.
> >
> > Could you also answer to my question above regarding using the
> > sparsemem instead on my hw memory layout?
>

> Currently MIPS defines section size to 256MB, so with your memory layout
> with SPARSMEM there will be two sections of 256MB, at 0 and at 512MB, so
> you'll save memory map for 256M which is roughly 1M with 16k pages.
>
> It's possible
>
> With SPARSEMEM the pfn_to_page() and page_to_pfn() are a bit longer in
> terms of assembly instructions, but I really doubt you'll notice any
> performance difference in real world applications.

Ok. Thank you very much for the comprehensive response. I'll give some
good thought to moving our platform to the sparse memory config. Most
likely it will be done together with reducing SECTION_SIZE_BITS to
128MB in order to save a bit more low-memory space. This will be
mostly useful if XPA is enabled and 8GB of memory is available. Such a
case requires a lot of low memory for the mapping, of which there is
just 128MB on our device.

-Serge(y)

>
> > > With FLATMEM the memory map exists for that
> > > hole and hence pfn_valid() returns 1 for the MMIO range as well. That makes
> > > __update_cache() to check folio state and that check would fail if the memory
> > > map contained garbage. But since the hole in the memory map is initialized
> > > with init_unavailable_range() you get a valid struct page/struct folio and
> > > everything is fine.
> >
> > Right. That's what currently happens on MIPS32 and that's what I had
> > to fix in the framework of this series by the next patch:
> > Link: https://lore.kernel.org/linux-mips/[email protected]/
> > flatmem version of the pfn_valid() method has been broken due to
> > max_mapnr being uninitialized before mem_init() is called. So
> > init_unavailable_range() didn't initialize the pages on the early
> > bootup stage. Thus afterwards, when max_mapnr has finally got a valid
> > value any attempts to call the __update_cache() method on the MMIO
> > memory hole caused the unaligned access crash.
>
> The fix for max_mapnr makes pfn_valid()==1 for the entire memory map and
> this fixes up the struct pages in the hole.
>
> > >
> > > With that, the init_unavailable_range() docs need not mention IO space at
> > > all, they should mention holes within FLATMEM memory map.
> >
> > Ok. I'll resend the patch with mentioning flatmem holes instead of
> > mentioning the IO-spaces.
> >
> > >
> > > As for SPARSEMEM, if the hole does not belong to any section, pfn_valid()
> > > will be false for it and __update_cache() won't try to access memory map.
> >
> > Ah, I see. In case of the SPARSEMEM config an another version of
> > pfn_valid() will be called. It's defined in the include/linux/mmzone.h
> > header file. Right? If so then no problem there indeed.
>
> Yes, SPARSMEM uses pfn_valid() defined in include/linux/mmzone.h
>
> > -Serge(y)
>
> --
> Sincerely yours,
> Mike.

2023-11-30 19:17:05

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Tue, Nov 28, 2023 at 03:46:37PM +0000, Jiaxun Yang wrote:
>
>
> On Nov 28, 2023 at 11:34, Serge Semin wrote:
> > On Mon, Nov 27, 2023 at 09:08:11PM +0000, Jiaxun Yang wrote:
> [...]
> >
> > Indeed. Thanks for pointing that out. In the last days several times I
> > was looking at that line and for some reason UNCAC_BASE seemed as
> > CAC_BASE to me.) Based on what both IO_BASE and UNCAC_BASE are defined
> > as of the uncached region anyway, then it should be safe for any
> > currently supported MIPS64 (including the Loongson's) to use ioremap()
> > in place of dmi_early_remap(). So basically my current patch in the
> > subject won't change the method semantics. Let's not to try to fix a
> > problem which doesn't exist then, and keep the patch as is especially
> > seeing that the alternatives might still cause some troubles. Will you
> > be ok with that?
>

> I'd say the safest option is to use CKSEG0 or TO_CAC here,

I would have agreed with you if MIPS didn't have that special
_page_cachable_default variable which is undefined for some platforms
and which might be re-defined during the boot-up process, and if
MIPS64 didn't have ioremap_prot() always mapping to the uncached
region. But IMO updating ioremap_prot() currently seems riskier
than just converting dmi_early_remap() to the uncached version,
especially seeing that it won't change anything. MIPS64 always has IO
remapped to the uncached region. MIPS32 won't be able to have a cached
mapping until VM is available and paging and slabs are initialized.
So at the early MIPS32 bootup stages ioremap_cache() wouldn't have
worked anyway.

> but I'm fine
> with ioremap as long as the semantic remains uncached on Loongson.

Ok. Thanks.

-Serge(y)

>
> Thanks.
> >
> > -Serge(y)
> >
> >>
> [...]
> --
> - Jiaxun

2023-11-30 19:26:39

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Tue, Nov 28, 2023 at 10:59:10PM +0100, Arnd Bergmann wrote:
> On Tue, Nov 28, 2023, at 14:52, Serge Semin wrote:
> > On Tue, Nov 28, 2023 at 01:41:51PM +0100, Arnd Bergmann wrote:
> >> On Mon, Nov 27, 2023, at 17:23, Serge Semin wrote:
> >> > On Fri, Nov 24, 2023 at 10:03:49PM +0000, Jiaxun Yang wrote:
> >> but ioremap_cache() is generally underspecified because the
> >> resulting pointer is neither safe to dereference nor to pass into
> >> readl()/writel()/memcpy_fromio() on all architectures.
> >
> > I don't know about ARM64 (which for instance has it utilized to access
> > the DMI region), but at least in case of MIPS32 (a fortiori MIPS64
> > seeing the ioremap_cache() method actually returns a pointer to the
> > uncached region) I don't see a reason why it wouldn't be safe in both
> > cases described by you. All IO and memory regions are accessed by the
> > generic load and store instructions. The only difference is that the
> > MMIO-space accessors normally implies additional barriers, which just
> > slow down the execution, but shouldn't cause any other problem. Could
> > you clarify why do you think otherwise?
>
> On arch/powerpc, CONFIG_PPC_INDIRECT_MMIO makes all ioremap()
> type functions return a token that can be passed into the readl/writel
> family but that is not a pointer you can dereference.
>
> On s390, the mechanism is different, but similarly __iomem
> tokens are not pointers at all.

Ah, you meant that it was not generically safe. Then you were correct
for sure. I was talking about the MIPS arch, which doesn't
differentiate normal and IO memory pointers: all of them are accessed
by the same instructions. So ioremap_prot() returns just a normal
pointer there, which can be safely de-referenced.

>
> >> There was an effort to convert the remaining ioremap_cache() calls
> >> into memremap() a few years ago, not sure if that's still being worked
> >> on but it would be the right thing to do.
> >
> > I see. Thanks for the pointing out to that. I guess it could be done
> > for MIPS too (at least on our MIPS32 platform DMI is just a memory
> > region pre-initialized by the bootloader), but the conversion would
> > require much efforts. Alas currently I can't afford to get it
> > implemented in the framework of this patchset. (I saved your note in
> > my MIPS TODO list though. Let's hope eventually I'll be able to get
> > back to this topic.)
>

> I just noticed that the only architectures that actually provide
> ioremap_cache() are x86, arm, arm64, mips, loongarch, powerpc, sh
> and xtensa. The ones that have ACPI support still definitely
> need it, most of the other ones can probably be fixed without
> too much trouble.

Ok. Thanks. I'll have a look at that in my free time.

-Serge(y)

>
> Arnd

2023-12-01 00:13:59

by Jiaxun Yang

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On Nov 30, 2023 at 19:16, Serge Semin wrote:
> On Tue, Nov 28, 2023 at 03:46:37PM +0000, Jiaxun Yang wrote:
[...]
>
>> I'd say the safest option is to use CKSEG0 or TO_CAC here,
>
> I would have agreed with you if MIPS didn't have that special
> _page_cachable_default variable which is undefined for some platforms
> and which might be re-defined during the boot-up process, and if
> MIPS64 didn't have ioremap_prot() always mapping to the uncached
> region. But IMO updating ioremap_prot() currently seems more risky
> than just converting dmi_early_remap() to the uncached version
> especially seeing it won't change anything. MIPS64 always have IO
> remapped to the uncached region. MIPS32 won't be able to have cached
> mapping until VM is available, and paging and slabs are initialized.
> So on the early MIPS32 bootup stages ioremap_cache() wouldn't have
> worked anyway.

I really didn't get that: using CKSEG0 on a 32-bit system and TO_CAC
on a 64-bit system won't hurt.

Something like:
#ifdef CONFIG_64BIT
#define dmi_remap(x, l) (void *)TO_CAC(x)
#else
#define dmi_remap(x, l) (void *)CKSEG0ADDR(x)
#endif

This can help us avoid all the hassle, since it always ensures we are
using the same CCA to access the DMI tables. We can always trust the
Config.K0 left by the firmware in this case.

You may add some sanity check on 32-bit to avoid generating an invalid
pointer. (And perhaps implement it as ioremap_early...)

Thanks
--
- Jiaxun

2023-12-01 14:54:34

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Fri, Dec 01, 2023 at 12:13:22AM +0000, Jiaxun Yang wrote:
>
>
> On Nov 30, 2023 at 19:16, Serge Semin wrote:
> > On Tue, Nov 28, 2023 at 03:46:37PM +0000, Jiaxun Yang wrote:
> [...]
> >
> >> I'd say the safest option is to use CKSEG0 or TO_CAC here,
> >
> > I would have agreed with you if MIPS didn't have that special
> > _page_cachable_default variable which is undefined for some platforms
> > and which might be re-defined during the boot-up process, and if
> > MIPS64 didn't have ioremap_prot() always mapping to the uncached
> > region. But IMO updating ioremap_prot() currently seems more risky
> > than just converting dmi_early_remap() to the uncached version
> > especially seeing it won't change anything. MIPS64 always have IO
> > remapped to the uncached region. MIPS32 won't be able to have cached
> > mapping until VM is available, and paging and slabs are initialized.
> > So on the early MIPS32 bootup stages ioremap_cache() wouldn't have
> > worked anyway.
>

> I really didn't get that, using CKSEG0 on 32bit system and TO_CAC
> on 64bit system won't hurt.
>
> Something like:
> #ifdef CONFIG_64BIT
> #define dmi_remap(x, l) (void *)TO_CAC(x)
> #else
> #define dmi_remap(x, l) (void *)CKSEG0ADDR(x)
> #endif
>
> Can help us avoid all the hassle. Since it always ensures we are
> using same CCA to access DMI tables. We can always trust Config.K0
> left by firmware in this case.

Please note my only concern is about dmi_early_remap(), not
dmi_remap(). The later one can be safely left backended by the
ioremap_cache() method because at the stage it's utilized MIPS32
version of ioremap_prot() will be able to create any mapping it's
requested to. The dmi_early_remap() function is called very early with
no paging or VM or even cache stuff initialized. So currently AFAICS
it just doesn't work on _all_ _MIPS32_ platform, because
ioremap_prot() relies on VM and slab being available to have any
cacheable mapping, which aren't at the moment of the dmi_setup()
function invocation. Seeing the ioremap_cache() is just a stub on
MIPS64 which always performs the uncached mapping, it will be
completely safe to just convert dmi_early_remap() to ioremap() with
no risk to break anything. dmi_early_remap() semantics won't be
actually changed, it will work as before on MIPS64 and will be fixed
on MIPS32. This (AFAICS) is a completely safe fix of the problem with
just a few affected platforms around.
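
In other words, the fix being proposed here boils down to something
like the following in arch/mips/include/asm/dmi.h (a rough sketch of
the intended change, not the literal patch text):

/* arch/mips/include/asm/dmi.h - sketch of the intended change */
#include <asm/io.h>

/* No VM or slab this early, so only a fixed uncached mapping can work
 * on MIPS32; ioremap() provides exactly that, and on MIPS64 the DMI
 * remapping has always ended up uncached anyway.
 */
#define dmi_early_remap(x, l)   ioremap(x, l)
#define dmi_early_unmap(x, l)   iounmap(x)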

Getting back to what you suggest. You want to change the
ioremap_prot() semantics so that it returns a pointer to the cached
unmapped region for the ioremap_cache() method. First of all,
ioremap_cache() doesn't define what type of cached mapping it needs,
but merely relies on the _page_cachable_default variable value. That
variable is uninitialized at the early boot stages and is then only
initialized for the r4k platforms (which also makes me think that
ioremap_cache() doesn't properly work for the r3k and Octeon
platforms). Thus we would need to have it initialized with some value
until cpu_cache_init() is called, and have the r3k and Octeon cache
init functions fixed to get it back to the uninitialized zero value.
Moreover, all the _CACHE_* field values are already occupied. What
default value should be used then for _page_cachable_default? You say
to read Config.K0 earlier, but Config.K0 may be changed later in the
framework of the cps_smp_setup() method, and actually in
cpu_cache_init() for r4k if the 'cca' kernel parameter is specified.
So would _page_cachable_default need to be re-initialized then? There
might be some other hidden pitfalls in the fix you suggest. But all of
that already makes your solution much riskier than the one described
before.
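
Just to illustrate the scale of what you suggest: seeding the default
CCA from Config.K0 would mean adding something like the purely
hypothetical snippet below early on the setup path (the
early_seed_default_cca() helper doesn't exist; today
_page_cachable_default is only set up by the r4k cache init code):

#include <linux/init.h>
#include <asm/mipsregs.h>       /* read_c0_config(), CONF_CM_CMASK */
#include <asm/pgtable.h>        /* _page_cachable_default, _CACHE_SHIFT */

/* Hypothetical: seed the default cacheability attribute from the CCA
 * the firmware left in Config.K0, before cpu_cache_init() gets a
 * chance to override it again (e.g. when the 'cca' parameter is set).
 */
static void __init early_seed_default_cca(void)
{
        unsigned int cca = read_c0_config() & CONF_CM_CMASK;

        _page_cachable_default = cca << _CACHE_SHIFT;
}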

Howbeit, if you still think that none of the concerns listed above is
worth worrying about that much, then please note that your solution is
mainly targeted at fixing ioremap_cache(), whereas this patch is about
the DMI region mapping. So if ioremap_cache() needs to be fixed in the
way you suggest, it's better done in the framework of another patch.
But considering the possible problems it may cause, I wouldn't risk
having it backported to the stable kernels.

-Serge(y)

>
> You may add a sanity check on 32-bit to avoid generating an invalid
> pointer (and perhaps implement it as ioremap_early...).
>
> Thanks
> --
> - Jiaxun

2023-12-01 15:12:07

by Jiaxun Yang

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32



On 1 December 2023, at 2:54 PM, Serge Semin wrote:
> On Fri, Dec 01, 2023 at 12:13:22AM +0000, Jiaxun Yang wrote:
>>
>>
>> On 30 November 2023, at 7:16 PM, Serge Semin wrote:
>> > On Tue, Nov 28, 2023 at 03:46:37PM +0000, Jiaxun Yang wrote:
>> [...]
>> >
>> >> I'd say the safest option is to use CKSEG0 or TO_CAC here,
>> >
>> > I would have agreed with you if MIPS didn't have that special
>> > _page_cachable_default variable which is undefined for some platforms
>> > and which might be re-defined during the boot-up process, and if
>> > MIPS64 didn't have ioremap_prot() always mapping to the uncached
>> > region. But IMO updating ioremap_prot() currently seems more risky
>> > than just converting dmi_early_remap() to the uncached version
>> > especially seeing it won't change anything. MIPS64 always has IO
>> > remapped to the uncached region. MIPS32 won't be able to have cached
>> > mapping until VM is available, and paging and slabs are initialized.
>> > So on the early MIPS32 bootup stages ioremap_cache() wouldn't have
>> > worked anyway.
>>
>
>> I really didn't get that, using CKSEG0 on 32bit system and TO_CAC
>> on 64bit system won't hurt.
>>
>> Something like:
>> #ifdef CONFIG_64BIT
>> #define dmi_remap(x, l) (void *)TO_CAC(x)
>> #else
>> #define dmi_remap(x, l) (void *)CKSEG0ADDR(x)
>> #endif
>>
>> Can help us avoid all the hassle. Since it always ensures we are
>> using same CCA to access DMI tables. We can always trust Config.K0
>> left by firmware in this case.
>
> Please note my only concern is about dmi_early_remap(), not
> dmi_remap(). The latter one can be safely left backed by the
> ioremap_cache() method because at the stage it's utilized MIPS32
> version of ioremap_prot() will be able to create any mapping it's
> requested to. The dmi_early_remap() function is called very early with
> no paging or VM or even cache stuff initialized. So currently AFAICS
> it just doesn't work on _all_ _MIPS32_ platforms, because
> ioremap_prot() relies on VM and slab being available to have any
> cacheable mapping, which aren't at the moment of the dmi_setup()
> function invocation. Seeing the ioremap_cache() is just a stub on
> MIPS64 which always performs the uncached mapping, it will be
> completely safe to just convert dmi_early_remap() to ioremap() with
> no risk to break anything. dmi_early_remap() semantics won't be
> actually changed, it will work as before on MIPS64 and will be fixed
> on MIPS32. This (AFAICS) is a completely safe fix of the problem with
> just a few affected platforms around.
>

The only platform that enables DMI in the upstream kernel is Loongson64,
and I'm perfectly sure that the mapping for the DMI tables *should* be Cached.
It is an accident that ioremap_cache is misused here, so I'm proposing
to replace it with CKSEG0/TO_CAC. Also, as per the MIPS UHI spec, all
the data passed from the bootloader to the kernel should lie in KSEG0;
please let me know if your platform is an exception here.

Using ioremap_cache at dmi_early_remap does not sound safe to me either.
What if the DMI code tried to remap something beyond the KSEG range here?

The safest option here is just bypassing the ioremap framework, which
does not give you any advantage, only burden.

I'll propose a patch later.
--
- Jiaxun

2023-12-01 18:26:52

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 1/7] mips: dmi: Fix early remap on MIPS32

On Fri, Dec 01, 2023 at 03:10:13PM +0000, Jiaxun Yang wrote:
>
>
> On 1 December 2023, at 2:54 PM, Serge Semin wrote:
> > On Fri, Dec 01, 2023 at 12:13:22AM +0000, Jiaxun Yang wrote:
> >>
> >>
> >> On 30 November 2023, at 7:16 PM, Serge Semin wrote:
> >> > On Tue, Nov 28, 2023 at 03:46:37PM +0000, Jiaxun Yang wrote:
> >> [...]
> >> >
> >> >> I'd say the safest option is to use CKSEG0 or TO_CAC here,
> >> >
> >> > I would have agreed with you if MIPS didn't have that special
> >> > _page_cachable_default variable which is undefined for some platforms
> >> > and which might be re-defined during the boot-up process, and if
> >> > MIPS64 didn't have ioremap_prot() always mapping to the uncached
> >> > region. But IMO updating ioremap_prot() currently seems more risky
> >> > than just converting dmi_early_remap() to the uncached version
> >> > especially seeing it won't change anything. MIPS64 always has IO
> >> > remapped to the uncached region. MIPS32 won't be able to have cached
> >> > mapping until VM is available, and paging and slabs are initialized.
> >> > So on the early MIPS32 bootup stages ioremap_cache() wouldn't have
> >> > worked anyway.
> >>
> >
> >> I really didn't get that, using CKSEG0 on 32bit system and TO_CAC
> >> on 64bit system won't hurt.
> >>
> >> Something like:
> >> #ifdef CONFIG_64BIT
> >> #define dmi_remap(x, l) (void *)TO_CAC(x)
> >> #else
> >> #define dmi_remap(x, l) (void *)CKSEG0ADDR(x)
> >> #endif
> >>
> >> Can help us avoid all the hassle. Since it always ensures we are
> >> using same CCA to access DMI tables. We can always trust Config.K0
> >> left by firmware in this case.
> >
> > Please note my only concern is about dmi_early_remap(), not
> > dmi_remap(). The latter one can be safely left backed by the
> > ioremap_cache() method because at the stage it's utilized MIPS32
> > version of ioremap_prot() will be able to create any mapping it's
> > requested to. The dmi_early_remap() function is called very early with
> > no paging or VM or even cache stuff initialized. So currently AFAICS
> > it just doesn't work on _all_ _MIPS32_ platforms, because
> > ioremap_prot() relies on VM and slab being available to have any
> > cacheable mapping, which aren't at the moment of the dmi_setup()
> > function invocation. Seeing the ioremap_cache() is just a stub on
> > MIPS64 which always performs the uncached mapping, it will be
> > completely safe to just convert dmi_early_remap() to ioremap() with
> > no risk to break anything. dmi_early_remap() semantics won't be
> > actually changed, it will work as before on MIPS64 and will be fixed
> > on MIPS32. This (AFAICS) is a completely safe fix of the problem with
> > just a few affected platforms around.
> >
>

> The only platform that enables DMI in the upstream kernel is Loongson64,
> and I'm perfectly sure that the mapping for the DMI tables *should* be Cached.

Then it looks like it must have been broken in the first place.

> It is an accident that ioremap_cache is misused here, so I'm proposing
> to replace it with CKSEG0/TO_CAC. Also, as per the MIPS UHI spec, all
> the data passed from the bootloader to the kernel should lie in KSEG0;

I failed to find the MIPS UHI spec. The only link Google freely
provides is
http://prplfoundation.org/wiki/MIPS_documentation
but it's broken. Could you please share the doc somehow?
Anyway, AFAICS from the MIPS arch code it only concerns the dtb pointer
passed to the kernel. Does it really mandate that all the data be in KSEG0?

> please
> let me know if your platform is an exception here.

No, it's not. U-Boot is executed in KSEG0 anyway.

>
> Using ioremap_cache at dmi_early_remap does not sound safe to me either.
> What if the DMI code tried to remap something beyond the KSEG range here?

Right. I've found out that, for safety's sake, the generic version of
ioremap_prot() has recently been updated not to do any mapping if the
slab allocator hasn't been initialized:
Link: https://lore.kernel.org/lkml/[email protected]/
I'll add a similar fix to the MIPS version of the ioremap_prot()
method to make sure no early cached remapping is attempted.
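
For reference, the idea amounts to something like the simplified model
below; ioremap_early_safe() is an illustrative name, not the real
arch/mips/mm/ioremap.c code or the actual follow-up patch:

#include <linux/io.h>
#include <linux/slab.h>         /* slab_is_available() */
#include <asm/addrspace.h>      /* CKSEG1ADDR() */

/* Simplified model of the intended guard: uncached requests in the
 * low 512 MiB can always be served from the fixed KSEG1 window, while
 * anything needing a page-table-backed mapping has to wait until the
 * slab allocator is up instead of crashing on the early boot stages.
 */
static void __iomem *ioremap_early_safe(phys_addr_t phys_addr,
                                        unsigned long size, bool cached)
{
        if (!cached && phys_addr + size <= 0x20000000ULL)
                return (void __iomem *)CKSEG1ADDR(phys_addr);

        if (!slab_is_available())
                return NULL;    /* too early for get_vm_area()/page tables */

        /* the usual VM-based remapping path would follow here */
        return NULL;
}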

>
> The safest option here is just bypassing the ioremap framework, which
> does not give you any advantage, only burden.
>
> I'll propose a patch later.

Ok, I see. I'll resubmit my series today as is then. Should you need
the problem fixed differently, please either rebase your patch on top
of it, or add an explicit comment that you'll have a better fix, so
that Thomas can consider postponing the merging of this patch until
your fix is ready.

-Serge(y)

> --
> - Jiaxun