2020-11-27 17:17:00

by Arvind Sankar

Subject: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

Commit
26bfa5f89486 ("x86, amd: Cleanup init_amd")
moved the code that remaps the TSEG region using 4k pages from
init_amd() to bsp_init_amd().

However, bsp_init_amd() is executed well before the direct mapping is
actually created:

  setup_arch()
    -> early_cpu_init()
      -> early_identify_cpu()
        -> this_cpu->c_bsp_init()
          -> bsp_init_amd()
    ...
    -> init_mem_mapping()

So the change effectively disabled the 4k remapping, because
pfn_range_is_mapped() is always false at this point.
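
To illustrate the ordering problem, here is a minimal userspace sketch, not
kernel code. It assumes, as a simplification, that the mapped check is a
lookup in a table of PFN ranges which is only filled in once the direct
mapping is created. All names (model_add_mapped_range(),
model_pfn_range_is_mapped()) and the table size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_RANGES 8	/* hypothetical table size */

/* One directly mapped PFN range, half-open: [start, end). */
struct model_pfn_range {
	unsigned long start;
	unsigned long end;
};

static struct model_pfn_range model_mapped[MODEL_MAX_RANGES];
static size_t model_nr_mapped;	/* stays 0 until a range is recorded */

/* Stand-in for the bookkeeping done while the direct mapping is built. */
bool model_add_mapped_range(unsigned long start_pfn, unsigned long end_pfn)
{
	assert(start_pfn < end_pfn);	/* empty ranges make no sense here */

	if (model_nr_mapped == MODEL_MAX_RANGES)
		return false;

	model_mapped[model_nr_mapped].start = start_pfn;
	model_mapped[model_nr_mapped].end = end_pfn;
	model_nr_mapped++;
	return true;
}

/* True only if [start_pfn, end_pfn) lies inside one recorded range. */
bool model_pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn)
{
	size_t i;

	for (i = 0; i < model_nr_mapped; i++) {
		if (start_pfn >= model_mapped[i].start &&
		    end_pfn <= model_mapped[i].end)
			return true;
	}
	return false;
}
```

With an empty table the lookup returns false for every PFN, which is the
situation bsp_init_amd() is in: it asks before anything has been recorded.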

It has been over six years since the commit, and no-one seems to have
noticed this, so just remove the code. The original code was also
incomplete, since it doesn't check how large the TSEG address range
actually is, so it might remap only part of it in any case.

Hygon has copied the incorrect version, so the code has never run on it
since the cpu support was added two years ago. Remove it from there as
well.

Signed-off-by: Arvind Sankar <[email protected]>
---
 arch/x86/kernel/cpu/amd.c   | 21 ---------------------
 arch/x86/kernel/cpu/hygon.c | 20 --------------------
 2 files changed, 41 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 1f71c7616917..f8ca66f3d861 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -23,7 +23,6 @@

 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
-# include <asm/set_memory.h>
 #endif

 #include "cpu.h"
@@ -509,26 +508,6 @@ static void early_init_amd_mc(struct cpuinfo_x86 *c)

 static void bsp_init_amd(struct cpuinfo_x86 *c)
 {
-
-#ifdef CONFIG_X86_64
-	if (c->x86 >= 0xf) {
-		unsigned long long tseg;
-
-		/*
-		 * Split up direct mapping around the TSEG SMM area.
-		 * Don't do it for gbpages because there seems very little
-		 * benefit in doing so.
-		 */
-		if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
-			unsigned long pfn = tseg >> PAGE_SHIFT;
-
-			pr_debug("tseg: %010llx\n", tseg);
-			if (pfn_range_is_mapped(pfn, pfn + 1))
-				set_memory_4k((unsigned long)__va(tseg), 1);
-		}
-	}
-#endif
-
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {

 		if (c->x86 > 0x10 ||
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index dc0840aae26c..ae59115d18f9 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -14,9 +14,6 @@
 #include <asm/cacheinfo.h>
 #include <asm/spec-ctrl.h>
 #include <asm/delay.h>
-#ifdef CONFIG_X86_64
-# include <asm/set_memory.h>
-#endif

 #include "cpu.h"

@@ -203,23 +200,6 @@ static void early_init_hygon_mc(struct cpuinfo_x86 *c)

 static void bsp_init_hygon(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
-	unsigned long long tseg;
-
-	/*
-	 * Split up direct mapping around the TSEG SMM area.
-	 * Don't do it for gbpages because there seems very little
-	 * benefit in doing so.
-	 */
-	if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
-		unsigned long pfn = tseg >> PAGE_SHIFT;
-
-		pr_debug("tseg: %010llx\n", tseg);
-		if (pfn_range_is_mapped(pfn, pfn + 1))
-			set_memory_4k((unsigned long)__va(tseg), 1);
-	}
-#endif
-
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
 		u64 val;

--
2.26.2


2020-11-27 17:31:21

by Borislav Petkov

Subject: Re: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
> Commit
> 26bfa5f89486 ("x86, amd: Cleanup init_amd")
> moved the code that remaps the TSEG region using 4k pages from
> init_amd() to bsp_init_amd().
>
> However, bsp_init_amd() is executed well before the direct mapping is
> actually created:
>
>   setup_arch()
>     -> early_cpu_init()
>       -> early_identify_cpu()
>         -> this_cpu->c_bsp_init()
>           -> bsp_init_amd()
>     ...
>     -> init_mem_mapping()
>
> So the change effectively disabled the 4k remapping, because
> pfn_range_is_mapped() is always false at this point.
>
> It has been over six years since the commit, and no-one seems to have
> noticed this, so just remove the code. The original code was also
> incomplete, since it doesn't check how large the TSEG address range
> actually is, so it might remap only part of it in any case.

Yah, and the patch which added this:

6c62aa4a3c12 ("x86: make amd.c have 64bit support code")

does not say what for (I'm not surprised, frankly).

So if AMD folks on Cc don't have any need for actually fixing this
properly, yap, we can zap it.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-12-02 18:01:00

by Tom Lendacky

Subject: Re: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

On 11/27/20 11:27 AM, Borislav Petkov wrote:
> On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
>> Commit
>> 26bfa5f89486 ("x86, amd: Cleanup init_amd")
>> moved the code that remaps the TSEG region using 4k pages from
>> init_amd() to bsp_init_amd().
>>
>> However, bsp_init_amd() is executed well before the direct mapping is
>> actually created:
>>
>>   setup_arch()
>>     -> early_cpu_init()
>>       -> early_identify_cpu()
>>         -> this_cpu->c_bsp_init()
>>           -> bsp_init_amd()
>>     ...
>>     -> init_mem_mapping()
>>
>> So the change effectively disabled the 4k remapping, because
>> pfn_range_is_mapped() is always false at this point.
>>
>> It has been over six years since the commit, and no-one seems to have
>> noticed this, so just remove the code. The original code was also
>> incomplete, since it doesn't check how large the TSEG address range
>> actually is, so it might remap only part of it in any case.
>
> Yah, and the patch which added this:
>
> 6c62aa4a3c12 ("x86: make amd.c have 64bit support code")
>
> does not say what for (I'm not surprised, frankly).
>
> So if AMD folks on Cc don't have any need for actually fixing this
> properly, yap, we can zap it.

I believe this is geared towards performance. If the TSEG base address is
not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS
references the memory within the 2MB page that is before the TSEG base
address. This can occur whenever the 2MB TLB entry is re-installed because
of TLB flushes, etc.

I would hope that newer BIOSes are 2MB aligning the TSEG base address, but
if not, then this can help.

So moving it back wouldn't be a bad thing. It should probably only do the
set_memory_4k() if the TSEG base address is not 2MB aligned, which I think
is covered by the pfn_range_is_mapped() call?
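
For reference, the alignment condition itself is a plain mask test. The
following is a minimal standalone sketch rather than kernel code; the helper
name addr_is_aligned() and the MODEL_SZ_2M constant are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SZ_2M (UINT64_C(1) << 21)	/* 2MB large-page size in bytes */

/* True if addr is a multiple of align; align must be a power of two. */
bool addr_is_aligned(uint64_t addr, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	return (addr & (align - 1)) == 0;
}
```

A base for which addr_is_aligned(base, MODEL_SZ_2M) is true starts exactly
on a large-page boundary, so no large page straddles it.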

Thanks,
Tom

>
> Thx.
>

2020-12-02 18:15:23

by Borislav Petkov

Subject: Re: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

On Wed, Dec 02, 2020 at 11:58:15AM -0600, Tom Lendacky wrote:
> I believe this is geared towards performance. If the TSEG base address is
> not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS
> references the memory within the 2MB page that is before the TSEG base
> address. This can occur whenever the 2MB TLB entry is re-installed because
> of TLB flushes, etc.

And if this gets reinstated properly, then that explanation belongs over
it because nothing else explains what that thing did. So thanks for
digging it out.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-12-02 22:36:34

by Arvind Sankar

Subject: Re: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

On Wed, Dec 02, 2020 at 11:58:15AM -0600, Tom Lendacky wrote:
> On 11/27/20 11:27 AM, Borislav Petkov wrote:
> > On Fri, Nov 27, 2020 at 12:13:24PM -0500, Arvind Sankar wrote:
> >> Commit
> >> 26bfa5f89486 ("x86, amd: Cleanup init_amd")
> >> moved the code that remaps the TSEG region using 4k pages from
> >> init_amd() to bsp_init_amd().
> >>
> >> However, bsp_init_amd() is executed well before the direct mapping is
> >> actually created:
> >>
> >>   setup_arch()
> >>     -> early_cpu_init()
> >>       -> early_identify_cpu()
> >>         -> this_cpu->c_bsp_init()
> >>           -> bsp_init_amd()
> >>     ...
> >>     -> init_mem_mapping()
> >>
> >> So the change effectively disabled the 4k remapping, because
> >> pfn_range_is_mapped() is always false at this point.
> >>
> >> It has been over six years since the commit, and no-one seems to have
> >> noticed this, so just remove the code. The original code was also
> >> incomplete, since it doesn't check how large the TSEG address range
> >> actually is, so it might remap only part of it in any case.
> >
> > Yah, and the patch which added this:
> >
> > 6c62aa4a3c12 ("x86: make amd.c have 64bit support code")
> >
> > does not say what for (I'm not surprised, frankly).
> >
> > So if AMD folks on Cc don't have any need for actually fixing this
> > properly, yap, we can zap it.
>
> I believe this is geared towards performance. If the TSEG base address is
> not 2MB aligned, then hardware has to break down a 2MB TLB entry if the OS
> references the memory within the 2MB page that is before the TSEG base
> address. This can occur whenever the 2MB TLB entry is re-installed because
> of TLB flushes, etc.
>
> I would hope that newer BIOSes are 2MB aligning the TSEG base address, but
> if not, then this can help.
>
> So moving it back wouldn't be a bad thing. It should probably only do the
> set_memory_4k() if the TSEG base address is not 2MB aligned, which I think
> is covered by the pfn_range_is_mapped() call?
>

The pfn_range_is_mapped() call just checks whether it is mapped at all
in the direct mapping. Is the TSEG range supposed to be marked as
non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
created for non-RAM is for the 0-1Mb real-mode range, and that will
always use 4k pages. Above that anything not marked as RAM will create
an unmapped hole in the direct map, so in this case the memory just
below the TSEG base would already use smaller pages if needed.

If it's possible that the E820 mapping says this range is RAM, then
should we also break up the direct map just after the end of the TSEG
range for the same reason?

Thanks.

2020-12-03 08:53:55

by Borislav Petkov

Subject: Re: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

On Wed, Dec 02, 2020 at 05:32:32PM -0500, Arvind Sankar wrote:
> The pfn_range_is_mapped() call just checks whether it is mapped at all
> in the direct mapping. Is the TSEG range supposed to be marked as
> non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
> created for non-RAM is for the 0-1Mb real-mode range, and that will
> always use 4k pages. Above that anything not marked as RAM will create
> an unmapped hole in the direct map, so in this case the memory just
> below the TSEG base would already use smaller pages if needed.
>
> If it's possible that the E820 mapping says this range is RAM, then
> should we also break up the direct map just after the end of the TSEG
> range for the same reason?

So I have a machine where TSEG is not 2M aligned and somewhere in the 1G
range:

[ 1.135094] tseg: 003bf00000

It is not in the E820 map either:

[ 0.019784] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.020014] init_memory_mapping: [mem 0x3bc00000-0x3bdfffff]
[ 0.020166] init_memory_mapping: [mem 0x20000000-0x3bbfffff]
[ 0.020327] init_memory_mapping: [mem 0x00100000-0x1fffffff]
[ 0.020677] init_memory_mapping: [mem 0x3be00000-0x3be8ffff]

That doesn't mean that it can't happen, though: there might be some
configuration where it ends up being mapped.

So looking at what the code does, it kinda makes sense: you want the 2M
range between 0x3be00000 and 0x3c000000 to be split into 4K mappings,
*if* it is mapped.
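
The range quoted above follows from rounding the TSEG base down to a 2M
boundary. A minimal standalone sketch of that arithmetic, with a
hypothetical helper name and constant (not kernel code):

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SZ_2M (UINT64_C(1) << 21)	/* 2MB large-page size in bytes */

/*
 * Compute the 2M-aligned range [*start, *end) that contains addr.
 * This is the large page which would have to be split into 4K mappings.
 */
void enclosing_2m_range(uint64_t addr, uint64_t *start, uint64_t *end)
{
	assert(start && end);

	*start = addr & ~(MODEL_SZ_2M - 1);	/* round down to 2M */
	*end = *start + MODEL_SZ_2M;
}
```

For the base logged above, 0x3bf00000, this gives start 0x3be00000 and end
0x3c000000.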

I need to find a box where it is mapped *and* not 2M aligned, though,
for testing. Which appears kinda hard to do as all the new ones are
aligned.

The above is from a K8 box which should already be dead, as a matter of
fact.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-12-03 16:17:13

by Arvind Sankar

Subject: Re: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

On Thu, Dec 03, 2020 at 09:48:57AM +0100, Borislav Petkov wrote:
> On Wed, Dec 02, 2020 at 05:32:32PM -0500, Arvind Sankar wrote:
> > The pfn_range_is_mapped() call just checks whether it is mapped at all
> > in the direct mapping. Is the TSEG range supposed to be marked as
> > non-RAM in the E820 map? AFAICS, the only case when a direct mapping is
> > created for non-RAM is for the 0-1Mb real-mode range, and that will
> > always use 4k pages. Above that anything not marked as RAM will create
> > an unmapped hole in the direct map, so in this case the memory just
> > below the TSEG base would already use smaller pages if needed.
> >
> > If it's possible that the E820 mapping says this range is RAM, then
> > should we also break up the direct map just after the end of the TSEG
> > range for the same reason?
>
> So I have a machine where TSEG is not 2M aligned and somewhere in the 1G
> range:
>
> [ 1.135094] tseg: 003bf00000
>
> It is not in the E820 map either:
>
> [ 0.019784] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [ 0.020014] init_memory_mapping: [mem 0x3bc00000-0x3bdfffff]
> [ 0.020166] init_memory_mapping: [mem 0x20000000-0x3bbfffff]
> [ 0.020327] init_memory_mapping: [mem 0x00100000-0x1fffffff]
> [ 0.020677] init_memory_mapping: [mem 0x3be00000-0x3be8ffff]
>
> That doesn't mean that it can't happen, though: there might be some
> configuration where it ends up being mapped.
>
> So looking at what the code does, it kinda makes sense: you want the 2M
> range between 0x3be00000 and 0x3c000000 to be split into 4K mappings,
> *if* it is mapped.
>
> I need to find a box where it is mapped *and* not 2M aligned, though,
> for testing. Which appears kinda hard to do as all the new ones are
> aligned.

Do any of them have it mapped at all, regardless of the alignment? There
seems to be nothing else in the kernel that ever looks at the TSEG MSR,
so I would guess that it has to be non-RAM in the E820 map, otherwise
nothing would prevent the kernel from allocating and using that space.

I found the actual original commit, which does have a description of the
reasoning. It's
8346ea17aa20 ("x86: split large page mapping for AMD TSEG")

It looks like at the time, the direct mapping didn't really look at the
E820 map in any detail, and was always set up with at least 2Mb pages,
or Gb pages if they were available, from 0 to max_pfn_mapped. So the
direct mapping would have covered even holes that weren't in the E820
map.

Commit
66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")
changed the direct map setup to avoid mapping holes, because it
apparently became more serious than performance issues: this commit
mentions MCE's getting triggered because of the overmapping.

>
> The above is from a K8 box which should already be dead, as a matter of
> fact.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2020-12-03 16:47:47

by Borislav Petkov

Subject: Re: [PATCH] x86/cpu/amd: Remove dead code for TSEG region remapping

On Thu, Dec 03, 2020 at 11:14:06AM -0500, Arvind Sankar wrote:
> Do any of them have it mapped at all, regardless of the alignment? There
> seems to be nothing else in the kernel that ever looks at the TSEG MSR,
> so I would guess that it has to be non-RAM in the E820 map, otherwise
> nothing would prevent the kernel from allocating and using that space.

Ha, that's a very good question. If all those BIOSes from K8 onwards
would put the TSEG in a non-RAM area and after

66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")

(great investigative work, btw, thanks for that!) then we can simply say
that that splitting is not needed anymore.

Maybe Tom can ask BIOS people whether they always did that - that being
to put the TSEG into a non-RAM area. I can boot my debug patch on my
boxes here but that doesn't mean a whole lot...

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [tip: x86/cpu] x86/cpu/amd: Remove dead code for TSEG region remapping

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: 262bd5724afdefd4c48a260d6100e78cc43ee06b
Gitweb: https://git.kernel.org/tip/262bd5724afdefd4c48a260d6100e78cc43ee06b
Author: Arvind Sankar <[email protected]>
AuthorDate: Fri, 27 Nov 2020 12:13:24 -05:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Tue, 08 Dec 2020 18:45:21 +01:00

x86/cpu/amd: Remove dead code for TSEG region remapping

Commit

26bfa5f89486 ("x86, amd: Cleanup init_amd")

moved the code that remaps the TSEG region using 4k pages from
init_amd() to bsp_init_amd().

However, bsp_init_amd() is executed well before the direct mapping is
actually created:

  setup_arch()
    -> early_cpu_init()
      -> early_identify_cpu()
        -> this_cpu->c_bsp_init()
          -> bsp_init_amd()
    ...
    -> init_mem_mapping()

So the change effectively disabled the 4k remapping, because
pfn_range_is_mapped() is always false at this point.

It has been over six years since the commit, and no-one seems to have
noticed this, so just remove the code. The original code was also
incomplete, since it doesn't check how large the TSEG address range
actually is, so it might remap only part of it in any case.

Hygon has copied the incorrect version, so the code has never run on it
since the cpu support was added two years ago. Remove it from there as
well.

Committer notes:

This workaround is incomplete anyway:

1. The code must check MSRC001_0113.TValid (SMM TSeg Mask MSR) first, to
check whether the TSeg address range is enabled.

2. The code must check whether the range is not 2M aligned - if it is,
there's nothing to work around.

3. In all the BIOSes tested, the TSeg range is in an e820 reserved area
and those are not mapped anymore, after

66520ebc2df3 ("x86, mm: Only direct map addresses that are marked as E820_RAM")

which means, there's nothing to be worked around either.

So let's rip it out.
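
Taken together, the three points above amount to a predicate like the
following. This is only a sketch under stated assumptions, not code from the
kernel: struct tseg_state, its fields and tseg_split_would_matter() are
hypothetical, and the three inputs are assumed to have been obtained
elsewhere (TValid from the SMM TSeg Mask MSR, the base from the TSEG address
MSR, and the mapped state from a check made after the direct mapping
exists).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SZ_2M (UINT64_C(1) << 21)	/* 2MB large-page size in bytes */

/* Hypothetical snapshot of the inputs named in the committer notes. */
struct tseg_state {
	bool tvalid;		/* 1. TSeg address range is enabled */
	uint64_t base;		/* TSeg base address in bytes */
	bool directly_mapped;	/* 3. base is covered by the direct mapping */
};

/* True only when all three conditions for the workaround hold. */
bool tseg_split_would_matter(const struct tseg_state *s)
{
	assert(s);

	if (!s->tvalid)
		return false;		/* range disabled */
	if ((s->base & (MODEL_SZ_2M - 1)) == 0)
		return false;		/* 2. already 2M aligned */
	return s->directly_mapped;	/* unmapped: nothing to split */
}
```

With the range sitting in an e820 reserved area, directly_mapped is false
and the predicate never fires, which matches point 3.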

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
 arch/x86/kernel/cpu/amd.c   | 21 ---------------------
 arch/x86/kernel/cpu/hygon.c | 20 --------------------
 2 files changed, 41 deletions(-)

diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 1f71c76..f8ca66f 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -23,7 +23,6 @@

 #ifdef CONFIG_X86_64
 # include <asm/mmconfig.h>
-# include <asm/set_memory.h>
 #endif

 #include "cpu.h"
@@ -509,26 +508,6 @@ static void early_init_amd_mc(struct cpuinfo_x86 *c)

 static void bsp_init_amd(struct cpuinfo_x86 *c)
 {
-
-#ifdef CONFIG_X86_64
-	if (c->x86 >= 0xf) {
-		unsigned long long tseg;
-
-		/*
-		 * Split up direct mapping around the TSEG SMM area.
-		 * Don't do it for gbpages because there seems very little
-		 * benefit in doing so.
-		 */
-		if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
-			unsigned long pfn = tseg >> PAGE_SHIFT;
-
-			pr_debug("tseg: %010llx\n", tseg);
-			if (pfn_range_is_mapped(pfn, pfn + 1))
-				set_memory_4k((unsigned long)__va(tseg), 1);
-		}
-	}
-#endif
-
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {

 		if (c->x86 > 0x10 ||
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index dc0840a..ae59115 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -14,9 +14,6 @@
 #include <asm/cacheinfo.h>
 #include <asm/spec-ctrl.h>
 #include <asm/delay.h>
-#ifdef CONFIG_X86_64
-# include <asm/set_memory.h>
-#endif

 #include "cpu.h"

@@ -203,23 +200,6 @@ static void early_init_hygon_mc(struct cpuinfo_x86 *c)

 static void bsp_init_hygon(struct cpuinfo_x86 *c)
 {
-#ifdef CONFIG_X86_64
-	unsigned long long tseg;
-
-	/*
-	 * Split up direct mapping around the TSEG SMM area.
-	 * Don't do it for gbpages because there seems very little
-	 * benefit in doing so.
-	 */
-	if (!rdmsrl_safe(MSR_K8_TSEG_ADDR, &tseg)) {
-		unsigned long pfn = tseg >> PAGE_SHIFT;
-
-		pr_debug("tseg: %010llx\n", tseg);
-		if (pfn_range_is_mapped(pfn, pfn + 1))
-			set_memory_4k((unsigned long)__va(tseg), 1);
-	}
-#endif
-
 	if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
 		u64 val;