This patchset did two things:
a). add a new I/O resource descriptor 'IORES_DESC_RESERVED'
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices may use it
in kdump kernel, such as PCI devices.
But, the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources via the 'IORES_DESC_NONE', because
there are several types of e820 that are described as the 'IORES_DESC_NONE'
type. Please refer to the e820_type_to_iores_desc().
Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.
In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
created for the reserved areas, the code originally related to the
descriptor 'IORES_DESC_NONE' also need to be updated.
b). add the e820 reserved ranges to kdump kernel e820 table
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:
The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the actually fails for everyone, but devices in segment 0
are then found by some legacy lookup method.) The workaround for this
is to pass the I/O reserved regions to the kdump kernel.
MMCONFIG(aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be albe to access devices on
non-0 segments, (c) you won't be able to access extended config space(
address 0x100-0xffff), which means none of the Extended Capabilities will
be available(AER, ACS, ATS, etc). [Bjorn's comment]
The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in kdump kernel, actually, those
reserved regions are still decrypted, but because those reserved ranges are
not present at all in kdump kernel e820 table, those reserved regions are
considered as encrypted, it goes wrong.
The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to the kdump kernel.
Changes since v1:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
Changes since v2:
1. Modified the value of flags to "0", when walking through the whole
tree for e820 reserved ranges.
2. Modified the invalid SOB chain issue.
Changes since v3:
1. Dropped [PATCH 1/3 v3] resource: fix an error which walks through iomem
resources. Please refer to this commit <010a93bf97c7> "resource: Fix
find_next_iomem_res() iteration issue"
Changes since v4:
1. Improve the patch log, and add kernel log.
Changes since v5:
1. Rewrite these patches log.
Changes since v6:
1. Modify the [PATCH 1/2], and add the new I/O resource descriptor
'IORES_DESC_RESERVED' for the iomem resources search interfaces,
and also updates these codes relates to 'IORES_DESC_NONE'.
2. Modify the [PATCH 2/2], and walk through io resource based on the
new descriptor 'IORES_DESC_RESERVED'.
3. Update patch log.
Changes since v7:
1. Improve patch log.
2. Improve this function __ioremap_check_desc_other().
3. Modify code comment in the __ioremap_check_desc_other()
Changes since v8:
1. Get rid of all changes about ia64.(Borislav's suggestion)
2. Change the examination condition to the 'IORES_DESC_ACPI_*'.
3. Modify the signature. This patch(add the new I/O resource
descriptor 'IORES_DESC_RESERVED') was suggested by Boris.
Changes since v9:
1. Improve patch log.
2. No need to modify the kernel/resource.c, so correct them.
3. Change the name of the __ioremap_check_desc_other() to
__ioremap_check_desc_none_and_reserved(), and modify the
check condition, add comment above it.
Lianbo Jiang (2):
x86/mm, resource: add a new I/O resource descriptor
'IORES_DESC_RESERVED'
x86/kexec_file: add reserved e820 ranges to kdump kernel e820 table
arch/x86/kernel/crash.c | 6 ++++++
arch/x86/kernel/e820.c | 2 +-
arch/x86/mm/ioremap.c | 18 +++++++++++++++---
include/linux/ioport.h | 1 +
4 files changed, 23 insertions(+), 4 deletions(-)
--
2.17.1
When doing kexec_file_load(), the first kernel needs to pass the e820
reserved ranges to the second kernel, because some devices may use it
in kdump kernel, such as PCI devices.
But, the kernel can not exactly match the e820 reserved ranges when
walking through the iomem resources via the 'IORES_DESC_NONE', because
there are several types of e820 that are described as the 'IORES_DESC_NONE'
type. Please refer to the e820_type_to_iores_desc().
Therefore, add a new I/O resource descriptor 'IORES_DESC_RESERVED' for
the iomem resources search interfaces. It is helpful to exactly match
the reserved resource ranges when walking through iomem resources.
In addition, since the new descriptor 'IORES_DESC_RESERVED' has been
created for the reserved areas, the code originally related to the
descriptor 'IORES_DESC_NONE' also need to be updated.
Suggested-by: Borislav Petkov <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
---
arch/x86/kernel/e820.c | 2 +-
arch/x86/mm/ioremap.c | 18 +++++++++++++++---
include/linux/ioport.h | 1 +
3 files changed, 17 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 2879e234e193..16fcde196243 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1050,10 +1050,10 @@ static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM: return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM: return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
+ case E820_TYPE_RESERVED: return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN: /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE: /* Fall-through: */
- case E820_TYPE_RESERVED: /* Fall-through: */
default: return IORES_DESC_NONE;
}
}
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0029604af8a4..447eede7ba9e 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -81,9 +81,21 @@ static bool __ioremap_check_ram(struct resource *res)
return false;
}
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * Originally, these areas described as IORES_DESC_NONE are not mapped
+ * as encrypted when using ioremap(), for example, E820_TYPE_{RESERVED,
+ * RESERVED_KERN,RAM,UNUSABLE}, etc. It checks for a resource that is
+ * not described as IORES_DESC_NONE, which can make sure the reserved
+ * areas are not mapped as encrypted when using ioremap().
+ *
+ * Now IORES_DESC_RESERVED has been created for the reserved areas so
+ * the check needs to be expanded so that these areas are not mapped
+ * encrypted when using ioremap().
+ */
+static int __ioremap_check_desc_none_and_reserved(struct resource *res)
{
- return (res->desc != IORES_DESC_NONE);
+ return ((res->desc != IORES_DESC_NONE) &&
+ (res->desc != IORES_DESC_RESERVED));
}
static int __ioremap_res_check(struct resource *res, void *arg)
@@ -94,7 +106,7 @@ static int __ioremap_res_check(struct resource *res, void *arg)
flags->system_ram = __ioremap_check_ram(res);
if (!flags->desc_other)
- flags->desc_other = __ioremap_check_desc_other(res);
+ flags->desc_other = __ioremap_check_desc_none_and_reserved(res);
return flags->system_ram && flags->desc_other;
}
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..6ed59de48bd5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -133,6 +133,7 @@ enum {
IORES_DESC_PERSISTENT_MEMORY_LEGACY = 5,
IORES_DESC_DEVICE_PRIVATE_MEMORY = 6,
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
+ IORES_DESC_RESERVED = 8,
};
/* helpers to define resources */
--
2.17.1
At present, when using the kexec_file_load() syscall to load the kernel
image and initramfs(for example: kexec -s -p xxx), the kernel does not
pass the e820 reserved ranges to the second kernel, which might cause
two problems:
The first one is the MMCONFIG issue. The basic problem is that this
device is in PCI segment 1 and the kernel PCI probing can not find it
without all the e820 I/O reservations being present in the e820 table.
And the kdump kernel does not have those reservations because the kexec
command does not pass the I/O reservation via the "memmap=xxx" command
line option. (This problem does not show up for other vendors, as SGI
is apparently the actually fails for everyone, but devices in segment 0
are then found by some legacy lookup method.) The workaround for this
is to pass the I/O reserved regions to the kdump kernel.
MMCONFIG(aka ECAM) space is described in the ACPI MCFG table. If you don't
have ECAM: (a) PCI devices won't work at all on non-x86 systems that use
only ECAM for config access, (b) you won't be albe to access devices on
non-0 segments, (c) you won't be able to access extended config space(
address 0x100-0xffff), which means none of the Extended Capabilities will
be available(AER, ACS, ATS, etc). [Bjorn's comment]
The second issue is that the SME kdump kernel doesn't work without the
e820 reserved ranges. When SME is active in kdump kernel, actually, those
reserved regions are still decrypted, but because those reserved ranges are
not present at all in kdump kernel e820 table, those reserved regions are
considered as encrypted, it goes wrong.
The e820 reserved range is useful in kdump kernel, so it is necessary to
pass the e820 reserved ranges to the kdump kernel.
Suggested-by: Dave Young <[email protected]>
Signed-off-by: Lianbo Jiang <[email protected]>
---
arch/x86/kernel/crash.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 17ffc869cab8..1db2754df9e9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -381,6 +381,12 @@ int crash_setup_memmap_entries(struct kimage *image, struct boot_params *params)
walk_iomem_res_desc(IORES_DESC_ACPI_NV_STORAGE, flags, 0, -1, &cmd,
memmap_entry_callback);
+ /* Add e820 reserved ranges */
+ cmd.type = E820_TYPE_RESERVED;
+ flags = IORESOURCE_MEM;
+ walk_iomem_res_desc(IORES_DESC_RESERVED, flags, 0, -1, &cmd,
+ memmap_entry_callback);
+
/* Add crashk_low_res region */
if (crashk_low_res.end) {
ei.addr = crashk_low_res.start;
--
2.17.1
On Fri, Mar 29, 2019 at 08:39:13PM +0800, Lianbo Jiang wrote:
> -static int __ioremap_check_desc_other(struct resource *res)
> +/*
> + * Originally, these areas described as IORES_DESC_NONE are not mapped
> + * as encrypted when using ioremap(), for example, E820_TYPE_{RESERVED,
> + * RESERVED_KERN,RAM,UNUSABLE}, etc. It checks for a resource that is
> + * not described as IORES_DESC_NONE, which can make sure the reserved
> + * areas are not mapped as encrypted when using ioremap().
> + *
> + * Now IORES_DESC_RESERVED has been created for the reserved areas so
> + * the check needs to be expanded so that these areas are not mapped
> + * encrypted when using ioremap().
> + */
> +static int __ioremap_check_desc_none_and_reserved(struct resource *res)
> {
> - return (res->desc != IORES_DESC_NONE);
> + return ((res->desc != IORES_DESC_NONE) &&
Why is this still checking IORES_DESC_NONE when the idea is to have this
specific IORES_DESC_RESERVED for all marked as *reserved* regions in
e820 which should not be mapped encrypted?
IOW, which regions are still marked as IORES_DESC_NONE and should not be
mapped encrypted?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Tue, Apr 02, 2019 at 08:02:04PM +0800, lijiang wrote:
> These regions(E820_TYPE_{RESERVED_KERN,RAM,UNUSABLE}) are still marked as
> IORES_DESC_NONE and should not be mapped encrypted when using ioremap().
Seems to me like we're going in circles. You said here:
https://lkml.kernel.org/r/[email protected]
that the kernel doesn't pass the e820 reserved ranges to the second
kernel.
I suggested to use a special IORES descriptor for them -
IORES_DES_RESERVED.
Now you say that that is not enough and some of those you want passed,
are still marked as IORES_DESC_NONE.
Sounds to me like you need try again.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
在 2019年04月02日 17:06, Borislav Petkov 写道:
> On Fri, Mar 29, 2019 at 08:39:13PM +0800, Lianbo Jiang wrote:
>> -static int __ioremap_check_desc_other(struct resource *res)
>> +/*
>> + * Originally, these areas described as IORES_DESC_NONE are not mapped
>> + * as encrypted when using ioremap(), for example, E820_TYPE_{RESERVED,
>> + * RESERVED_KERN,RAM,UNUSABLE}, etc. It checks for a resource that is
>> + * not described as IORES_DESC_NONE, which can make sure the reserved
>> + * areas are not mapped as encrypted when using ioremap().
>> + *
>> + * Now IORES_DESC_RESERVED has been created for the reserved areas so
>> + * the check needs to be expanded so that these areas are not mapped
>> + * encrypted when using ioremap().
>> + */
>> +static int __ioremap_check_desc_none_and_reserved(struct resource *res)
>> {
>> - return (res->desc != IORES_DESC_NONE);
>> + return ((res->desc != IORES_DESC_NONE) &&
>
> Why is this still checking IORES_DESC_NONE when the idea is to have this
> specific IORES_DESC_RESERVED for all marked as *reserved* regions in
> e820 which should not be mapped encrypted?
>
> IOW, which regions are still marked as IORES_DESC_NONE and should not be
> mapped encrypted?
>
Thanks for your comment.
These regions(E820_TYPE_{RESERVED_KERN,RAM,UNUSABLE}) are still marked as
IORES_DESC_NONE and should not be mapped encrypted when using ioremap().
Please refer to the following function.
static unsigned long __init e820_type_to_iores_desc(struct e820_entry *entry)
{
switch (entry->type) {
case E820_TYPE_ACPI: return IORES_DESC_ACPI_TABLES;
case E820_TYPE_NVS: return IORES_DESC_ACPI_NV_STORAGE;
case E820_TYPE_PMEM: return IORES_DESC_PERSISTENT_MEMORY;
case E820_TYPE_PRAM: return IORES_DESC_PERSISTENT_MEMORY_LEGACY;
case E820_TYPE_RESERVED: return IORES_DESC_RESERVED;
case E820_TYPE_RESERVED_KERN: /* Fall-through: */
case E820_TYPE_RAM: /* Fall-through: */
case E820_TYPE_UNUSABLE: /* Fall-through: */
default: return IORES_DESC_NONE;
}
}
Thanks.
Lianbo
在 2019年04月02日 20:43, Borislav Petkov 写道:
> On Tue, Apr 02, 2019 at 08:02:04PM +0800, lijiang wrote:
>> These regions(E820_TYPE_{RESERVED_KERN,RAM,UNUSABLE}) are still marked as
>> IORES_DESC_NONE and should not be mapped encrypted when using ioremap().
>
> Seems to me like we're going in circles. You said here:
>
> https://lkml.kernel.org/r/[email protected]
>
> that the kernel doesn't pass the e820 reserved ranges to the second
> kernel.
>
> I suggested to use a special IORES descriptor for them -
> IORES_DES_RESERVED.
>
> Now you say that that is not enough and some of those you want passed,
> are still marked as IORES_DESC_NONE.
>
Sorry for the delay.
They are different problems.
The first problem is that passes the e820 reserved ranges to the second kernel,
for this case, it is good enough to use the IORES_DESC_RESERVED, which can
ensure that exactly matches the reserved resource ranges when walking through
iomem resources.
The second problem is about the SEV case. Now, the IORES_DESC_RESERVED has been
created for the reserved areas, therefore the check needs to be expanded so that
these areas are not mapped encrypted when using ioremap().
+static int __ioremap_check_desc_none_and_reserved(struct resource *res)
{
- return (res->desc != IORES_DESC_NONE);
+ return ((res->desc != IORES_DESC_NONE) &&
+ (res->desc != IORES_DESC_RESERVED));
}
Maybe i should split it into two patches. The change of __ioremap_check_desc_none_and_reserved()
should be a separate patch. Any idea?
Thanks.
Lianbo
> Sounds to me like you need try again.
>
On Mon, Apr 15, 2019 at 08:22:22PM +0800, lijiang wrote:
> They are different problems.
Aha, so we're getting closer. You should've lead with that!
> The first problem is that passes the e820 reserved ranges to the second kernel,
Passes or *doesn't* pass?
Because from all the staring, it wants to pass the reserved ranges.
> for this case, it is good enough to use the IORES_DESC_RESERVED, which
> can ensure that exactly matches the reserved resource ranges when
> walking through iomem resources.
Ok.
> The second problem is about the SEV case. Now, the IORES_DESC_RESERVED has been
> created for the reserved areas, therefore the check needs to be expanded so that
> these areas are not mapped encrypted when using ioremap().
>
> +static int __ioremap_check_desc_none_and_reserved(struct resource *res)
That name is crap. If you need to add another desc type, it becomes
wrong again. And that whole code around flags->desc_other is just silly:
Make that machinery around it something like this:
struct ioremap_desc {
u64 flags;
};
instead of "struct ioremap_mem_flags" and that struct ioremap_desc is an
ioremap descriptor which will carry all kinds of settings. system_ram
can then be a simple flag too.
__ioremap_caller() will hand it down to __ioremap_check_mem() etc
and there it will set flags like IOREMAP_DESC_MAP_ENCRYPTED or
IOREMAP_DESC_MAP_DECRYPTED and this way you'll have it explicit and
clear in __ioremap_caller():
if ((sev_active() &&
(io_desc.flags & IOREMAP_DESC_MAP_ENCRYPTED)) ||
encrypted)
prot = pgprot_encrypted(prot);
But that would need a pre-patch which does that conversion.
> Maybe i should split it into two patches. The change of
> __ioremap_check_desc_none_and_reserved() should be a separate patch.
> Any idea?
See above and yes, definitely separate patches.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
在 2019年04月15日 23:41, Borislav Petkov 写道:
> On Mon, Apr 15, 2019 at 08:22:22PM +0800, lijiang wrote:
>> They are different problems.
>
> Aha, so we're getting closer. You should've lead with that!
>
>> The first problem is that passes the e820 reserved ranges to the second kernel,
>
> Passes or *doesn't* pass?
>
> Because from all the staring, it wants to pass the reserved ranges.
>
>> for this case, it is good enough to use the IORES_DESC_RESERVED, which
>> can ensure that exactly matches the reserved resource ranges when
>> walking through iomem resources.
>
> Ok.
>
>> The second problem is about the SEV case. Now, the IORES_DESC_RESERVED has been
>> created for the reserved areas, therefore the check needs to be expanded so that
>> these areas are not mapped encrypted when using ioremap().
>>
>> +static int __ioremap_check_desc_none_and_reserved(struct resource *res)
>
> That name is crap. If you need to add another desc type, it becomes
> wrong again. And that whole code around flags->desc_other is just silly:
>
> Make that machinery around it something like this:
>
> struct ioremap_desc {
> u64 flags;
> };
>
> instead of "struct ioremap_mem_flags" and that struct ioremap_desc is an
> ioremap descriptor which will carry all kinds of settings. system_ram
> can then be a simple flag too.
>
> __ioremap_caller() will hand it down to __ioremap_check_mem() etc
> and there it will set flags like IOREMAP_DESC_MAP_ENCRYPTED or
> IOREMAP_DESC_MAP_DECRYPTED and this way you'll have it explicit and
> clear in __ioremap_caller():
>
> if ((sev_active() &&
> (io_desc.flags & IOREMAP_DESC_MAP_ENCRYPTED)) ||
> encrypted)
> prot = pgprot_encrypted(prot);
>
> But that would need a pre-patch which does that conversion.
>
Thanks for your comment.
Based on the above description, i made a draft patch, please refer to it. But it
seems that the code has been changed a lot.
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 0029604af8a4..04217b61635e 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -27,9 +27,8 @@
#include "physaddr.h"
-struct ioremap_mem_flags {
- bool system_ram;
- bool desc_other;
+struct ioremap_desc {
+ u64 flags;
};
/*
@@ -61,13 +60,13 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
return err;
}
-static bool __ioremap_check_ram(struct resource *res)
+static unsigned long __ioremap_check_ram(struct resource *res)
{
unsigned long start_pfn, stop_pfn;
unsigned long i;
if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
- return false;
+ return IOREMAP_DESC_MAP_NONE;
start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
stop_pfn = (res->end + 1) >> PAGE_SHIFT;
@@ -75,28 +74,44 @@ static bool __ioremap_check_ram(struct resource *res)
for (i = 0; i < (stop_pfn - start_pfn); ++i)
if (pfn_valid(start_pfn + i) &&
!PageReserved(pfn_to_page(start_pfn + i)))
- return true;
+ return IOREMAP_DESC_MAP_SYSTEM_RAM_USING;
}
- return false;
+ return IOREMAP_DESC_MAP_NONE;
}
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * Originally, these areas described as IORES_DESC_NONE are not mapped
+ * as encrypted when using ioremap(), for example, E820_TYPE_{RESERVED,
+ * RESERVED_KERN,RAM,UNUSABLE}, etc. It checks for a resource that is
+ * not described as IORES_DESC_NONE, which can make sure the reserved
+ * areas are not mapped as encrypted when using ioremap().
+ *
+ * Now IORES_DESC_RESERVED has been created for the reserved areas so
+ * the check needs to be expanded so that these areas are not mapped
+ * encrypted when using ioremap().
+ */
+static unsigned long __ioremap_check_desc(struct resource *res)
{
- return (res->desc != IORES_DESC_NONE);
+ if ((res->desc != IORES_DESC_NONE) &&
+ (res->desc != IORES_DESC_RESERVED))
+ return IOREMAP_DESC_MAP_ENCRYPTED;
+
+ return IOREMAP_DESC_MAP_NONE;
}
static int __ioremap_res_check(struct resource *res, void *arg)
{
- struct ioremap_mem_flags *flags = arg;
+ struct ioremap_desc *desc = arg;
- if (!flags->system_ram)
- flags->system_ram = __ioremap_check_ram(res);
+ if (!(desc->flags & IOREMAP_DESC_MAP_SYSTEM_RAM_USING))
+ desc->flags |= __ioremap_check_ram(res);
- if (!flags->desc_other)
- flags->desc_other = __ioremap_check_desc_other(res);
+ if (!(desc->flags & IOREMAP_DESC_MAP_ENCRYPTED))
+ desc->flags |= __ioremap_check_desc(res);
- return flags->system_ram && flags->desc_other;
+ return ((desc->flags & IOREMAP_DESC_MAP_SYSTEM_RAM_USING) &&
+ (desc->flags & IOREMAP_DESC_MAP_ENCRYPTED))
}
/*
@@ -105,13 +120,13 @@ static int __ioremap_res_check(struct resource *res, void *arg)
* resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
*/
static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
- struct ioremap_mem_flags *flags)
+ struct ioremap_desc *desc)
{
u64 start, end;
start = (u64)addr;
end = start + size - 1;
- memset(flags, 0, sizeof(*flags));
+ memset(desc, 0, sizeof(*desc));
walk_mem_res(start, end, flags, __ioremap_res_check);
}
@@ -138,7 +153,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
resource_size_t last_addr;
const resource_size_t unaligned_phys_addr = phys_addr;
const unsigned long unaligned_size = size;
- struct ioremap_mem_flags mem_flags;
+ struct ioremap_desc io_desc;
struct vm_struct *area;
enum page_cache_mode new_pcm;
pgprot_t prot;
@@ -157,12 +172,12 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
return NULL;
}
- __ioremap_check_mem(phys_addr, size, &mem_flags);
+ __ioremap_check_mem(phys_addr, size, &io_desc);
/*
* Don't allow anybody to remap normal RAM that we're using..
*/
- if (mem_flags.system_ram) {
+ if (io_desc.flags & IOREMAP_DESC_MAP_SYSTEM_RAM_USING) {
WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
&phys_addr, &last_addr);
return NULL;
@@ -200,7 +215,9 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
* resulting mapping.
*/
prot = PAGE_KERNEL_IO;
- if ((sev_active() && mem_flags.desc_other) || encrypted)
+ if ((sev_active() &&
+ (io_desc.flags & IOREMAP_DESC_MAP_ENCRYPTED)) ||
+ encrypted)
prot = pgprot_encrypted(prot);
switch (pcm) {
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6ed59de48bd5..48b2d21ad9e5 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -136,6 +136,15 @@ enum {
IORES_DESC_RESERVED = 8,
};
+/*
+ * IORemap Descriptors Helper
+ */
+enum {
+ IOREMAP_DESC_MAP_NONE = 0,
+ IOREMAP_DESC_MAP_SYSTEM_RAM_USING = 1,
+ IOREMAP_DESC_MAP_ENCRYPTED = 2,
+};
+
/* helpers to define resources */
#define DEFINE_RES_NAMED(_start, _size, _name, _flags) \
{ \
Are you sure about this changes? Or Do other reviewers have any suggestions?
Thanks.
>> Maybe i should split it into two patches. The change of
>> __ioremap_check_desc_none_and_reserved() should be a separate patch.
>> Any idea?
>
> See above and yes, definitely separate patches.
>
OK. Thank you very much.
Lianbo
On Wed, Apr 17, 2019 at 02:40:09PM +0800, lijiang wrote:
> Based on the above description, i made a draft patch, please refer to it. But it
> seems that the code has been changed a lot.
It goes in the right direction. Here is a version where I corrected some
things:
---
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index dd73d5d74393..0c1392e76b7d 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -27,9 +27,8 @@
#include "physaddr.h"
-struct ioremap_mem_flags {
- bool system_ram;
- bool desc_other;
+struct ioremap_desc {
+ u64 flags;
};
/*
@@ -61,13 +60,13 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
return err;
}
-static bool __ioremap_check_ram(struct resource *res)
+static u64 __ioremap_check_ram(struct resource *res)
{
unsigned long start_pfn, stop_pfn;
unsigned long i;
if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
- return false;
+ return 0;
start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
stop_pfn = (res->end + 1) >> PAGE_SHIFT;
@@ -75,28 +74,39 @@ static bool __ioremap_check_ram(struct resource *res)
for (i = 0; i < (stop_pfn - start_pfn); ++i)
if (pfn_valid(start_pfn + i) &&
!PageReserved(pfn_to_page(start_pfn + i)))
- return true;
+ return IORES_MAP_SYSTEM_RAM;
}
- return false;
+ return 0;
}
-static int __ioremap_check_desc_other(struct resource *res)
+/*
+ * NONE and RESERVED should not be mapped encrypted in SEV because there
+ * the whole memory is already encrypted.
+ */
+static unsigned long __ioremap_check_desc(struct resource *res)
{
- return (res->desc != IORES_DESC_NONE);
+ if (!sev_active())
+ return 0;
+
+ if (res->desc & (IORES_DESC_NONE | IORES_DESC_RESERVED))
+ return 0;
+
+ return IORES_MAP_ENCRYPTED;
}
static int __ioremap_res_check(struct resource *res, void *arg)
{
- struct ioremap_mem_flags *flags = arg;
+ struct ioremap_desc *desc = arg;
- if (!flags->system_ram)
- flags->system_ram = __ioremap_check_ram(res);
+ if (!(desc->flags & IORES_MAP_SYSTEM_RAM))
+ desc->flags |= __ioremap_check_ram(res);
- if (!flags->desc_other)
- flags->desc_other = __ioremap_check_desc_other(res);
+ if (!(desc->flags & IORES_MAP_ENCRYPTED))
+ desc->flags |= __ioremap_check_desc(res);
- return flags->system_ram && flags->desc_other;
+ return ((desc->flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) ==
+ (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED));
}
/*
@@ -105,13 +115,13 @@ static int __ioremap_res_check(struct resource *res, void *arg)
* resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
*/
static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
- struct ioremap_mem_flags *flags)
+ struct ioremap_desc *desc)
{
u64 start, end;
start = (u64)addr;
end = start + size - 1;
- memset(flags, 0, sizeof(*flags));
+ memset(desc, 0, sizeof(struct ioremap_desc));
walk_mem_res(start, end, flags, __ioremap_res_check);
}
@@ -138,7 +148,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
resource_size_t last_addr;
const resource_size_t unaligned_phys_addr = phys_addr;
const unsigned long unaligned_size = size;
- struct ioremap_mem_flags mem_flags;
+ struct ioremap_desc io_desc;
struct vm_struct *area;
enum page_cache_mode new_pcm;
pgprot_t prot;
@@ -157,12 +167,12 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
return NULL;
}
- __ioremap_check_mem(phys_addr, size, &mem_flags);
+ __ioremap_check_mem(phys_addr, size, &io_desc);
/*
* Don't allow anybody to remap normal RAM that we're using..
*/
- if (mem_flags.system_ram) {
+ if (io_desc.flags & IORES_MAP_SYSTEM_RAM) {
WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
&phys_addr, &last_addr);
return NULL;
@@ -200,7 +210,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
* resulting mapping.
*/
prot = PAGE_KERNEL_IO;
- if ((sev_active() && mem_flags.desc_other) || encrypted)
+ if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
prot = pgprot_encrypted(prot);
switch (pcm) {
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index da0ebaec25f0..2a5e0f1ded93 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -12,6 +12,7 @@
#ifndef __ASSEMBLY__
#include <linux/compiler.h>
#include <linux/types.h>
+#include <linux/bits.h>
/*
* Resources are tree-like, allowing
* nesting etc..
@@ -135,6 +136,14 @@ enum {
IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
};
+/*
+ * Flags controlling ioremap() behavior.
+ */
+enum {
+ IORES_MAP_SYSTEM_RAM = BIT(0),
+ IORES_MAP_ENCRYPTED = BIT(1),
+};
+
/* helpers to define resources */
#define DEFINE_RES_NAMED(_start, _size, _name, _flags) \
{ \
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On 4/18/19 5:01 AM, Borislav Petkov wrote:
> On Wed, Apr 17, 2019 at 02:40:09PM +0800, lijiang wrote:
>> Based on the above description, i made a draft patch, please refer to it. But it
>> seems that the code has been changed a lot.
>
> It goes in the right direction. Here is a version where I corrected some
> things:
>
> ---
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index dd73d5d74393..0c1392e76b7d 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -27,9 +27,8 @@
>
> #include "physaddr.h"
>
> -struct ioremap_mem_flags {
> - bool system_ram;
> - bool desc_other;
> +struct ioremap_desc {
> + u64 flags;
You could probably make this unsigned int for now since you're creating
the new enum used to assign to this variable and it only uses two bits
currently.
> };
>
> /*
> @@ -61,13 +60,13 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size,
> return err;
> }
>
> -static bool __ioremap_check_ram(struct resource *res)
> +static u64 __ioremap_check_ram(struct resource *res)
> {
> unsigned long start_pfn, stop_pfn;
> unsigned long i;
>
> if ((res->flags & IORESOURCE_SYSTEM_RAM) != IORESOURCE_SYSTEM_RAM)
> - return false;
> + return 0;
>
> start_pfn = (res->start + PAGE_SIZE - 1) >> PAGE_SHIFT;
> stop_pfn = (res->end + 1) >> PAGE_SHIFT;
> @@ -75,28 +74,39 @@ static bool __ioremap_check_ram(struct resource *res)
> for (i = 0; i < (stop_pfn - start_pfn); ++i)
> if (pfn_valid(start_pfn + i) &&
> !PageReserved(pfn_to_page(start_pfn + i)))
> - return true;
> + return IORES_MAP_SYSTEM_RAM;
> }
>
> - return false;
> + return 0;
> }
>
> -static int __ioremap_check_desc_other(struct resource *res)
> +/*
> + * NONE and RESERVED should not be mapped encrypted in SEV because there
> + * the whole memory is already encrypted.
> + */
> +static unsigned long __ioremap_check_desc(struct resource *res)
If you don't make the change to unsigned int, then this should be a u64
return to match the the flags definition and the other function.
> {
> - return (res->desc != IORES_DESC_NONE);
> + if (!sev_active())
> + return 0;
> +
> + if (res->desc & (IORES_DESC_NONE | IORES_DESC_RESERVED))
> + return 0;
The IORES_DESC_xxxx enums are not bits, just sequential numbers, so you
won't be able to test them this way. You can probably get away with a
switch statement here.
Thanks,
Tom
> +
> + return IORES_MAP_ENCRYPTED;
> }
>
> static int __ioremap_res_check(struct resource *res, void *arg)
> {
> - struct ioremap_mem_flags *flags = arg;
> + struct ioremap_desc *desc = arg;
>
> - if (!flags->system_ram)
> - flags->system_ram = __ioremap_check_ram(res);
> + if (!(desc->flags & IORES_MAP_SYSTEM_RAM))
> + desc->flags |= __ioremap_check_ram(res);
>
> - if (!flags->desc_other)
> - flags->desc_other = __ioremap_check_desc_other(res);
> + if (!(desc->flags & IORES_MAP_ENCRYPTED))
> + desc->flags |= __ioremap_check_desc(res);
>
> - return flags->system_ram && flags->desc_other;
> + return ((desc->flags & (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED)) ==
> + (IORES_MAP_SYSTEM_RAM | IORES_MAP_ENCRYPTED));
> }
>
> /*
> @@ -105,13 +115,13 @@ static int __ioremap_res_check(struct resource *res, void *arg)
> * resource described not as IORES_DESC_NONE (e.g. IORES_DESC_ACPI_TABLES).
> */
> static void __ioremap_check_mem(resource_size_t addr, unsigned long size,
> - struct ioremap_mem_flags *flags)
> + struct ioremap_desc *desc)
> {
> u64 start, end;
>
> start = (u64)addr;
> end = start + size - 1;
> - memset(flags, 0, sizeof(*flags));
> + memset(desc, 0, sizeof(struct ioremap_desc));
>
> walk_mem_res(start, end, flags, __ioremap_res_check);
> }
> @@ -138,7 +148,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
> resource_size_t last_addr;
> const resource_size_t unaligned_phys_addr = phys_addr;
> const unsigned long unaligned_size = size;
> - struct ioremap_mem_flags mem_flags;
> + struct ioremap_desc io_desc;
> struct vm_struct *area;
> enum page_cache_mode new_pcm;
> pgprot_t prot;
> @@ -157,12 +167,12 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
> return NULL;
> }
>
> - __ioremap_check_mem(phys_addr, size, &mem_flags);
> + __ioremap_check_mem(phys_addr, size, &io_desc);
>
> /*
> * Don't allow anybody to remap normal RAM that we're using..
> */
> - if (mem_flags.system_ram) {
> + if (io_desc.flags & IORES_MAP_SYSTEM_RAM) {
> WARN_ONCE(1, "ioremap on RAM at %pa - %pa\n",
> &phys_addr, &last_addr);
> return NULL;
> @@ -200,7 +210,7 @@ static void __iomem *__ioremap_caller(resource_size_t phys_addr,
> * resulting mapping.
> */
> prot = PAGE_KERNEL_IO;
> - if ((sev_active() && mem_flags.desc_other) || encrypted)
> + if ((io_desc.flags & IORES_MAP_ENCRYPTED) || encrypted)
> prot = pgprot_encrypted(prot);
>
> switch (pcm) {
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index da0ebaec25f0..2a5e0f1ded93 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -12,6 +12,7 @@
> #ifndef __ASSEMBLY__
> #include <linux/compiler.h>
> #include <linux/types.h>
> +#include <linux/bits.h>
> /*
> * Resources are tree-like, allowing
> * nesting etc..
> @@ -135,6 +136,14 @@ enum {
> IORES_DESC_DEVICE_PUBLIC_MEMORY = 7,
> };
>
> +/*
> + * Flags controlling ioremap() behavior.
> + */
> +enum {
> + IORES_MAP_SYSTEM_RAM = BIT(0),
> + IORES_MAP_ENCRYPTED = BIT(1),
> +};
> +
> /* helpers to define resources */
> #define DEFINE_RES_NAMED(_start, _size, _name, _flags) \
> { \
>
On Thu, Apr 18, 2019 at 01:17:03PM +0000, Lendacky, Thomas wrote:
> You could probably make this unsigned int for now since you're creating
> the new enum used to assign to this variable and it only uses two bits
> currently.
Right.
> > + if (res->desc & (IORES_DESC_NONE | IORES_DESC_RESERVED))
> > + return 0;
>
> The IORES_DESC_xxxx enums are not bits, just sequential numbers, so you
> won't be able to test them this way. You can probably get away with a
> switch statement here.
Ah, good catch. I thought they're BITs too. Oh well.
Thx.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.