2019-12-09 23:15:43

by Kalra, Ashish

[permalink] [raw]
Subject: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

From: Ashish Kalra <[email protected]>

For SEV, all DMA to and from guest has to use shared
(un-encrypted) pages. SEV uses SWIOTLB to make this happen
without requiring changes to device drivers. However,
depending on workload being run, the default 64MB of SWIOTLB
might not be enough and SWIOTLB may run out of buffers to
use for DMA, resulting in I/O errors.

Increase the default size of SWIOTLB for SEV guests using
a minimum value of 128MB and a maximum value of 512MB,
determining on amount of provisioned guest memory.

The SWIOTLB default size adjustment is added as an
architecture specific interface/callback to allow
architectures such as those supporting memory encryption
to adjust/expand SWIOTLB size for their use.

Signed-off-by: Ashish Kalra <[email protected]>
---
Changes in v2:
- Fix compile errors as
Reported-by: kbuild test robot <[email protected]>

arch/x86/Kconfig | 1 +
arch/x86/mm/mem_encrypt.c | 36 ++++++++++++++++++++++++++++++++++++
include/linux/dma-direct.h | 10 ++++++++++
kernel/dma/Kconfig | 3 +++
kernel/dma/swiotlb.c | 14 ++++++++++++--
5 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5e8949953660..e75622e58d34 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1522,6 +1522,7 @@ config AMD_MEM_ENCRYPT
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
+ select ARCH_HAS_ADJUST_SWIOTLB_DEFAULT
---help---
Say yes to enable support for the encryption of system memory.
This requires an AMD processor that supports Secure Memory
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index a03614bd3e1a..f4bd4b431ba1 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -376,6 +376,42 @@ bool force_dma_unencrypted(struct device *dev)
return false;
}

+#define TOTAL_MEM_1G 0x40000000U
+#define TOTAL_MEM_4G 0x100000000U
+
+/*
+ * Override for SWIOTLB default size adjustment -
+ * ARCH_HAS_ADJUST_SWIOTLB_DEFAULT
+ */
+unsigned long adjust_swiotlb_default_size(unsigned long default_size)
+{
+ /*
+ * For SEV, all DMA has to occur via shared/unencrypted pages.
+ * SEV uses SWOTLB to make this happen without changing device
+ * drivers. However, depending on the workload being run, the
+ * default 64MB of SWIOTLB may not be enough & SWIOTLB may
+ * run out of buffers for using DMA, resulting in I/O errors.
+ * Increase the default size of SWIOTLB for SEV guests using
+ * a minimum value of 128MB and a maximum value of 512GB,
+ * depending on amount of provisioned guest memory.
+ */
+ if (sev_active()) {
+ unsigned long total_mem = get_num_physpages() << PAGE_SHIFT;
+
+ if (total_mem <= TOTAL_MEM_1G)
+ default_size = default_size * 2;
+ else if (total_mem <= TOTAL_MEM_4G)
+ default_size = default_size * 4;
+ else
+ default_size = default_size * 8;
+
+ pr_info_once("SEV is active, SWIOTLB default size set to %luMB\n",
+ default_size >> 20);
+ }
+
+ return default_size;
+}
+
/* Architecture __weak replacement functions */
void __init mem_encrypt_free_decrypted_mem(void)
{
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..85507d21493f 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -35,6 +35,16 @@ static inline bool force_dma_unencrypted(struct device *dev)
}
#endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */

+#ifdef CONFIG_ARCH_HAS_ADJUST_SWIOTLB_DEFAULT
+unsigned long adjust_swiotlb_default_size(unsigned long default_size);
+#else
+static inline unsigned long adjust_swiotlb_default_size
+ (unsigned long default_size)
+{
+ return default_size;
+}
+#endif /* CONFIG_ARCH_HAS_ADJUST_SWIOTLB_DEFAULT */
+
/*
* If memory encryption is supported, phys_to_dma will set the memory encryption
* bit in the DMA address, and dma_to_phys will clear it. The raw __phys_to_dma
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 4c103a24e380..851c4500ff88 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -54,6 +54,9 @@ config ARCH_HAS_DMA_PREP_COHERENT
config ARCH_HAS_FORCE_DMA_UNENCRYPTED
bool

+config ARCH_HAS_ADJUST_SWIOTLB_DEFAULT
+ bool
+
config DMA_NONCOHERENT_CACHE_SYNC
bool

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 9280d6f8271e..7dd72bd88f1c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -155,11 +155,21 @@ void swiotlb_set_max_segment(unsigned int val)
#define IO_TLB_DEFAULT_SIZE (64UL<<20)
unsigned long swiotlb_size_or_default(void)
{
+ unsigned long default_size = IO_TLB_DEFAULT_SIZE;
unsigned long size;

+ /*
+ * If swiotlb size/amount of slabs are not defined on kernel command
+ * line, then give a chance to architectures to adjust swiotlb
+ * size, this may be required by some architectures such as those
+ * supporting memory encryption.
+ */
+ if (!io_tlb_nslabs)
+ default_size = adjust_swiotlb_default_size(default_size);
+
size = io_tlb_nslabs << IO_TLB_SHIFT;

- return size ? size : (IO_TLB_DEFAULT_SIZE);
+ return size ? size : default_size;
}

void swiotlb_print_info(void)
@@ -245,7 +255,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
void __init
swiotlb_init(int verbose)
{
- size_t default_size = IO_TLB_DEFAULT_SIZE;
+ unsigned long default_size = swiotlb_size_or_default();
unsigned char *vstart;
unsigned long bytes;

--
2.17.1


2019-12-20 01:55:03

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Mon, Dec 09, 2019 at 11:13:46PM +0000, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> For SEV, all DMA to and from guest has to use shared
> (un-encrypted) pages. SEV uses SWIOTLB to make this happen
> without requiring changes to device drivers. However,
> depending on workload being run, the default 64MB of SWIOTLB
> might not be enough and SWIOTLB may run out of buffers to
> use for DMA, resulting in I/O errors.
>
> Increase the default size of SWIOTLB for SEV guests using
> a minimum value of 128MB and a maximum value of 512MB,
> determining on amount of provisioned guest memory.
>
> The SWIOTLB default size adjustment is added as an
> architecture specific interface/callback to allow
> architectures such as those supporting memory encryption
> to adjust/expand SWIOTLB size for their use.

What if this was made dynamic? That is if there is a memory
pressure you end up expanding the SWIOTLB dynamically?

Also is it worth doing this calculation based on memory or
more on the # of PCI devices + their MMIO ranges size?

>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> Changes in v2:
> - Fix compile errors as
> Reported-by: kbuild test robot <[email protected]>
>
> arch/x86/Kconfig | 1 +
> arch/x86/mm/mem_encrypt.c | 36 ++++++++++++++++++++++++++++++++++++
> include/linux/dma-direct.h | 10 ++++++++++
> kernel/dma/Kconfig | 3 +++
> kernel/dma/swiotlb.c | 14 ++++++++++++--
> 5 files changed, 62 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5e8949953660..e75622e58d34 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1522,6 +1522,7 @@ config AMD_MEM_ENCRYPT
> select DYNAMIC_PHYSICAL_MASK
> select ARCH_USE_MEMREMAP_PROT
> select ARCH_HAS_FORCE_DMA_UNENCRYPTED
> + select ARCH_HAS_ADJUST_SWIOTLB_DEFAULT
> ---help---
> Say yes to enable support for the encryption of system memory.
> This requires an AMD processor that supports Secure Memory
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index a03614bd3e1a..f4bd4b431ba1 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -376,6 +376,42 @@ bool force_dma_unencrypted(struct device *dev)
> return false;
> }
>
> +#define TOTAL_MEM_1G 0x40000000U
> +#define TOTAL_MEM_4G 0x100000000U
> +
> +/*
> + * Override for SWIOTLB default size adjustment -
> + * ARCH_HAS_ADJUST_SWIOTLB_DEFAULT
> + */
> +unsigned long adjust_swiotlb_default_size(unsigned long default_size)
> +{
> + /*
> + * For SEV, all DMA has to occur via shared/unencrypted pages.
> + * SEV uses SWOTLB to make this happen without changing device
> + * drivers. However, depending on the workload being run, the
> + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> + * run out of buffers for using DMA, resulting in I/O errors.
> + * Increase the default size of SWIOTLB for SEV guests using
> + * a minimum value of 128MB and a maximum value of 512GB,
> + * depending on amount of provisioned guest memory.
> + */
> + if (sev_active()) {
> + unsigned long total_mem = get_num_physpages() << PAGE_SHIFT;
> +
> + if (total_mem <= TOTAL_MEM_1G)
> + default_size = default_size * 2;
> + else if (total_mem <= TOTAL_MEM_4G)
> + default_size = default_size * 4;
> + else
> + default_size = default_size * 8;
> +
> + pr_info_once("SEV is active, SWIOTLB default size set to %luMB\n",
> + default_size >> 20);
> + }
> +
> + return default_size;
> +}
> +
> /* Architecture __weak replacement functions */
> void __init mem_encrypt_free_decrypted_mem(void)
> {
> diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
> index 24b8684aa21d..85507d21493f 100644
> --- a/include/linux/dma-direct.h
> +++ b/include/linux/dma-direct.h
> @@ -35,6 +35,16 @@ static inline bool force_dma_unencrypted(struct device *dev)
> }
> #endif /* CONFIG_ARCH_HAS_FORCE_DMA_UNENCRYPTED */
>
> +#ifdef CONFIG_ARCH_HAS_ADJUST_SWIOTLB_DEFAULT
> +unsigned long adjust_swiotlb_default_size(unsigned long default_size);
> +#else
> +static inline unsigned long adjust_swiotlb_default_size
> + (unsigned long default_size)
> +{
> + return default_size;
> +}
> +#endif /* CONFIG_ARCH_HAS_ADJUST_SWIOTLB_DEFAULT */
> +
> /*
> * If memory encryption is supported, phys_to_dma will set the memory encryption
> * bit in the DMA address, and dma_to_phys will clear it. The raw __phys_to_dma
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index 4c103a24e380..851c4500ff88 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -54,6 +54,9 @@ config ARCH_HAS_DMA_PREP_COHERENT
> config ARCH_HAS_FORCE_DMA_UNENCRYPTED
> bool
>
> +config ARCH_HAS_ADJUST_SWIOTLB_DEFAULT
> + bool
> +
> config DMA_NONCOHERENT_CACHE_SYNC
> bool
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 9280d6f8271e..7dd72bd88f1c 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -155,11 +155,21 @@ void swiotlb_set_max_segment(unsigned int val)
> #define IO_TLB_DEFAULT_SIZE (64UL<<20)
> unsigned long swiotlb_size_or_default(void)
> {
> + unsigned long default_size = IO_TLB_DEFAULT_SIZE;
> unsigned long size;
>
> + /*
> + * If swiotlb size/amount of slabs are not defined on kernel command
> + * line, then give a chance to architectures to adjust swiotlb
> + * size, this may be required by some architectures such as those
> + * supporting memory encryption.
> + */
> + if (!io_tlb_nslabs)
> + default_size = adjust_swiotlb_default_size(default_size);
> +
> size = io_tlb_nslabs << IO_TLB_SHIFT;
>
> - return size ? size : (IO_TLB_DEFAULT_SIZE);
> + return size ? size : default_size;
> }
>
> void swiotlb_print_info(void)
> @@ -245,7 +255,7 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
> void __init
> swiotlb_init(int verbose)
> {
> - size_t default_size = IO_TLB_DEFAULT_SIZE;
> + unsigned long default_size = swiotlb_size_or_default();
> unsigned char *vstart;
> unsigned long bytes;
>
> --
> 2.17.1
>

2020-01-21 20:11:13

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Thu, Dec 19, 2019 at 08:52:45PM -0500, Konrad Rzeszutek Wilk wrote:
> On Mon, Dec 09, 2019 at 11:13:46PM +0000, Ashish Kalra wrote:
> > From: Ashish Kalra <[email protected]>
> >
> > For SEV, all DMA to and from guest has to use shared
> > (un-encrypted) pages. SEV uses SWIOTLB to make this happen
> > without requiring changes to device drivers. However,
> > depending on workload being run, the default 64MB of SWIOTLB
> > might not be enough and SWIOTLB may run out of buffers to
> > use for DMA, resulting in I/O errors.
> >
> > Increase the default size of SWIOTLB for SEV guests using
> > a minimum value of 128MB and a maximum value of 512MB,
> > determining on amount of provisioned guest memory.
> >
> > The SWIOTLB default size adjustment is added as an
> > architecture specific interface/callback to allow
> > architectures such as those supporting memory encryption
> > to adjust/expand SWIOTLB size for their use.
>
> What if this was made dynamic? That is if there is a memory
> pressure you end up expanding the SWIOTLB dynamically?

As of now we want to keep it as simple as possible and more
like a stop-gap arrangement till something more elegant is
available.

>
>> Also is it worth doing this calculation based on memory or
>> more on the # of PCI devices + their MMIO ranges size?

Additional memory calculations based on # of PCI devices and
their memory ranges will make it more complicated with so
many other permutations and combinations to explore, it is
essential to keep this patch as simple as possible by
adjusting the bounce buffer size simply by determining it
from the amount of provisioned guest memory.

Thanks,
Ashish

2020-01-21 20:56:29

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Tue, Jan 21, 2020 at 08:09:47PM +0000, Ashish Kalra wrote:
> On Thu, Dec 19, 2019 at 08:52:45PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Mon, Dec 09, 2019 at 11:13:46PM +0000, Ashish Kalra wrote:
> > > From: Ashish Kalra <[email protected]>
> > >
> > > For SEV, all DMA to and from guest has to use shared
> > > (un-encrypted) pages. SEV uses SWIOTLB to make this happen
> > > without requiring changes to device drivers. However,
> > > depending on workload being run, the default 64MB of SWIOTLB
> > > might not be enough and SWIOTLB may run out of buffers to
> > > use for DMA, resulting in I/O errors.
> > >
> > > Increase the default size of SWIOTLB for SEV guests using
> > > a minimum value of 128MB and a maximum value of 512MB,
> > > determining on amount of provisioned guest memory.
> > >
> > > The SWIOTLB default size adjustment is added as an
> > > architecture specific interface/callback to allow
> > > architectures such as those supporting memory encryption
> > > to adjust/expand SWIOTLB size for their use.
> >
> > What if this was made dynamic? That is if there is a memory
> > pressure you end up expanding the SWIOTLB dynamically?
>
> As of now we want to keep it as simple as possible and more
> like a stop-gap arrangement till something more elegant is
> available.

That is nice. But past experience has shown that stop-gap arrangments
end up being the defacto solution.

>
> >
> >> Also is it worth doing this calculation based on memory or
> >> more on the # of PCI devices + their MMIO ranges size?
>
> Additional memory calculations based on # of PCI devices and
> their memory ranges will make it more complicated with so
> many other permutations and combinations to explore, it is
> essential to keep this patch as simple as possible by
> adjusting the bounce buffer size simply by determining it
> from the amount of provisioned guest memory.

Please rework the patch to:

- Use a log solution instead of the multiplication.
Feel free to cap it at a sensible value.

- Also the code depends on SWIOTLB calling in to the
adjust_swiotlb_default_size which looks wrong.

You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
That function's purpose is to report a value.

- Make io_tlb_nslabs be visible outside of the SWIOTLB code.

- Can you utilize the IOMMU_INIT APIs and have your own detect which would
modify the io_tlb_nslabs (and set swiotbl=1?).

Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
perhaps add in this code ? Albeit it really should be in it's own
file, not in arch/x86/kernel/pci-swiotlb.c

- Tweak the code in the swiotlb code to make sure it can deal
with io_tlb_nslabs being modified outside of the code at
the start. It should have no trouble, but only testing will
tell for sure.

>
> Thanks,
> Ashish

2020-01-24 23:01:30

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> >
> > Additional memory calculations based on # of PCI devices and
> > their memory ranges will make it more complicated with so
> > many other permutations and combinations to explore, it is
> > essential to keep this patch as simple as possible by
> > adjusting the bounce buffer size simply by determining it
> > from the amount of provisioned guest memory.
>>
>> Please rework the patch to:
>>
>> - Use a log solution instead of the multiplication.
>> Feel free to cap it at a sensible value.

Ok.

>>
>> - Also the code depends on SWIOTLB calling in to the
>> adjust_swiotlb_default_size which looks wrong.
>>
>> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.

>> That function's purpose is to report a value.
>>
>> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
>>
>> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
>> modify the io_tlb_nslabs (and set swiotbl=1?).

This seems to be a nice option, but then IOMMU_INIT APIs are
x86-specific and this swiotlb buffer size adjustment is also needed
for other memory encryption architectures like Power, S390, etc.

>>
>> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
>> perhaps add in this code ? Albeit it really should be in it's own
>> file, not in arch/x86/kernel/pci-swiotlb.c

Actually, we piggyback on pci_swiotlb_detect_override which sets
swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
forces swiotlb on, but again this is all x86 architecture specific.

Thanks,
Ashish

2020-02-04 19:36:43

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hello Konrad,

Looking fwd. to your feedback regarding support of other memory
encryption architectures such as Power, S390, etc.

Thanks,
Ashish

On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > >
> > > Additional memory calculations based on # of PCI devices and
> > > their memory ranges will make it more complicated with so
> > > many other permutations and combinations to explore, it is
> > > essential to keep this patch as simple as possible by
> > > adjusting the bounce buffer size simply by determining it
> > > from the amount of provisioned guest memory.
> >>
> >> Please rework the patch to:
> >>
> >> - Use a log solution instead of the multiplication.
> >> Feel free to cap it at a sensible value.
>
> Ok.
>
> >>
> >> - Also the code depends on SWIOTLB calling in to the
> >> adjust_swiotlb_default_size which looks wrong.
> >>
> >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
>
> >> That function's purpose is to report a value.
> >>
> >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> >>
> >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> >> modify the io_tlb_nslabs (and set swiotbl=1?).
>
> This seems to be a nice option, but then IOMMU_INIT APIs are
> x86-specific and this swiotlb buffer size adjustment is also needed
> for other memory encryption architectures like Power, S390, etc.
>
> >>
> >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> >> perhaps add in this code ? Albeit it really should be in it's own
> >> file, not in arch/x86/kernel/pci-swiotlb.c
>
> Actually, we piggyback on pci_swiotlb_detect_override which sets
> swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> forces swiotlb on, but again this is all x86 architecture specific.
>
> Thanks,
> Ashish

2020-03-03 17:16:54

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Tue, Feb 04, 2020 at 07:35:00PM +0000, Ashish Kalra wrote:
> Hello Konrad,
>
> Looking fwd. to your feedback regarding support of other memory
> encryption architectures such as Power, S390, etc.
>
> Thanks,
> Ashish
>
> On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > >
> > > > Additional memory calculations based on # of PCI devices and
> > > > their memory ranges will make it more complicated with so
> > > > many other permutations and combinations to explore, it is
> > > > essential to keep this patch as simple as possible by
> > > > adjusting the bounce buffer size simply by determining it
> > > > from the amount of provisioned guest memory.
> > >>
> > >> Please rework the patch to:
> > >>
> > >> - Use a log solution instead of the multiplication.
> > >> Feel free to cap it at a sensible value.
> >
> > Ok.
> >
> > >>
> > >> - Also the code depends on SWIOTLB calling in to the
> > >> adjust_swiotlb_default_size which looks wrong.
> > >>
> > >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> >
> > >> That function's purpose is to report a value.
> > >>
> > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > >>
> > >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> > >> modify the io_tlb_nslabs (and set swiotbl=1?).
> >
> > This seems to be a nice option, but then IOMMU_INIT APIs are
> > x86-specific and this swiotlb buffer size adjustment is also needed
> > for other memory encryption architectures like Power, S390, etc.

Oh dear. That I hadn't considered.
> >
> > >>
> > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > >> perhaps add in this code ? Albeit it really should be in it's own
> > >> file, not in arch/x86/kernel/pci-swiotlb.c
> >
> > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > forces swiotlb on, but again this is all x86 architecture specific.

Then it looks like the best bet is to do it from within swiotlb_init?
We really can't do it from swiotlb_size_or_default - that function
should just return a value and nothing else.

> >
> > Thanks,
> > Ashish

2020-03-12 00:44:34

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> > >
> > > >>
> > > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > > >> perhaps add in this code ? Albeit it really should be in it's own
> > > >> file, not in arch/x86/kernel/pci-swiotlb.c
> > >
> > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > forces swiotlb on, but again this is all x86 architecture specific.
>
> Then it looks like the best bet is to do it from within swiotlb_init?
> We really can't do it from swiotlb_size_or_default - that function
> should just return a value and nothing else.

Ok, i will fix this patch accordingly and resubmit it.

Thanks,
Ashish

2020-03-30 22:28:18

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hello Konrad,

On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> On Tue, Feb 04, 2020 at 07:35:00PM +0000, Ashish Kalra wrote:
> > Hello Konrad,
> >
> > Looking fwd. to your feedback regarding support of other memory
> > encryption architectures such as Power, S390, etc.
> >
> > Thanks,
> > Ashish
> >
> > On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > >
> > > > > Additional memory calculations based on # of PCI devices and
> > > > > their memory ranges will make it more complicated with so
> > > > > many other permutations and combinations to explore, it is
> > > > > essential to keep this patch as simple as possible by
> > > > > adjusting the bounce buffer size simply by determining it
> > > > > from the amount of provisioned guest memory.
> > > >>
> > > >> Please rework the patch to:
> > > >>
> > > >> - Use a log solution instead of the multiplication.
> > > >> Feel free to cap it at a sensible value.
> > >
> > > Ok.
> > >
> > > >>
> > > >> - Also the code depends on SWIOTLB calling in to the
> > > >> adjust_swiotlb_default_size which looks wrong.
> > > >>
> > > >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> > >
> > > >> That function's purpose is to report a value.
> > > >>
> > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > > >>
> > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> > > >> modify the io_tlb_nslabs (and set swiotbl=1?).
> > >
> > > This seems to be a nice option, but then IOMMU_INIT APIs are
> > > x86-specific and this swiotlb buffer size adjustment is also needed
> > > for other memory encryption architectures like Power, S390, etc.
>
> Oh dear. That I hadn't considered.
> > >
> > > >>
> > > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > > >> perhaps add in this code ? Albeit it really should be in it's own
> > > >> file, not in arch/x86/kernel/pci-swiotlb.c
> > >
> > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > forces swiotlb on, but again this is all x86 architecture specific.
>
> Then it looks like the best bet is to do it from within swiotlb_init?
> We really can't do it from swiotlb_size_or_default - that function
> should just return a value and nothing else.
>

Actually, we need to do it in swiotlb_size_or_default() as this gets called by
reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to
reserve low crashkernel memory. If we adjust swiotlb size later in
swiotlb_init() which gets called later than reserve_crashkernel_low(),
then any swiotlb size changes/expansion will conflict/overlap with the
low memory reserved for crashkernel.

Thanks,
Ashish

2020-04-27 18:56:38

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hello Konrad,

On Mon, Mar 30, 2020 at 10:25:51PM +0000, Ashish Kalra wrote:
> Hello Konrad,
>
> On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Tue, Feb 04, 2020 at 07:35:00PM +0000, Ashish Kalra wrote:
> > > Hello Konrad,
> > >
> > > Looking fwd. to your feedback regarding support of other memory
> > > encryption architectures such as Power, S390, etc.
> > >
> > > Thanks,
> > > Ashish
> > >
> > > On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > >
> > > > > > Additional memory calculations based on # of PCI devices and
> > > > > > their memory ranges will make it more complicated with so
> > > > > > many other permutations and combinations to explore, it is
> > > > > > essential to keep this patch as simple as possible by
> > > > > > adjusting the bounce buffer size simply by determining it
> > > > > > from the amount of provisioned guest memory.
> > > > >>
> > > > >> Please rework the patch to:
> > > > >>
> > > > >> - Use a log solution instead of the multiplication.
> > > > >> Feel free to cap it at a sensible value.
> > > >
> > > > Ok.
> > > >
> > > > >>
> > > > >> - Also the code depends on SWIOTLB calling in to the
> > > > >> adjust_swiotlb_default_size which looks wrong.
> > > > >>
> > > > >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> > > >
> > > > >> That function's purpose is to report a value.
> > > > >>
> > > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > > > >>
> > > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> > > > >> modify the io_tlb_nslabs (and set swiotbl=1?).
> > > >
> > > > This seems to be a nice option, but then IOMMU_INIT APIs are
> > > > x86-specific and this swiotlb buffer size adjustment is also needed
> > > > for other memory encryption architectures like Power, S390, etc.
> >
> > Oh dear. That I hadn't considered.
> > > >
> > > > >>
> > > > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > > > >> perhaps add in this code ? Albeit it really should be in it's own
> > > > >> file, not in arch/x86/kernel/pci-swiotlb.c
> > > >
> > > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > > forces swiotlb on, but again this is all x86 architecture specific.
> >
> > Then it looks like the best bet is to do it from within swiotlb_init?
> > We really can't do it from swiotlb_size_or_default - that function
> > should just return a value and nothing else.
> >
>
> Actually, we need to do it in swiotlb_size_or_default() as this gets called by
> reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to
> reserve low crashkernel memory. If we adjust swiotlb size later in
> swiotlb_init() which gets called later than reserve_crashkernel_low(),
> then any swiotlb size changes/expansion will conflict/overlap with the
> low memory reserved for crashkernel.
>
and will also potentially cause SWIOTLB buffer allocation failures.

Do you have any feedback, comments on the above ?

As such i feel, this patch is complete otherwise and can be included as
it is.

Thanks,
Ashish

2020-06-23 13:41:04

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Mon, Apr 27, 2020 at 06:53:18PM +0000, Ashish Kalra wrote:
> Hello Konrad,
>
> On Mon, Mar 30, 2020 at 10:25:51PM +0000, Ashish Kalra wrote:
> > Hello Konrad,
> >
> > On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Feb 04, 2020 at 07:35:00PM +0000, Ashish Kalra wrote:
> > > > Hello Konrad,
> > > >
> > > > Looking fwd. to your feedback regarding support of other memory
> > > > encryption architectures such as Power, S390, etc.
> > > >
> > > > Thanks,
> > > > Ashish
> > > >
> > > > On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> > > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > >
> > > > > > > Additional memory calculations based on # of PCI devices and
> > > > > > > their memory ranges will make it more complicated with so
> > > > > > > many other permutations and combinations to explore, it is
> > > > > > > essential to keep this patch as simple as possible by
> > > > > > > adjusting the bounce buffer size simply by determining it
> > > > > > > from the amount of provisioned guest memory.
> > > > > >>
> > > > > >> Please rework the patch to:
> > > > > >>
> > > > > >> - Use a log solution instead of the multiplication.
> > > > > >> Feel free to cap it at a sensible value.
> > > > >
> > > > > Ok.
> > > > >
> > > > > >>
> > > > > >> - Also the code depends on SWIOTLB calling in to the
> > > > > >> adjust_swiotlb_default_size which looks wrong.
> > > > > >>
> > > > > >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> > > > >
> > > > > >> That function's purpose is to report a value.
> > > > > >>
> > > > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > > > > >>
> > > > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> > > > > >> modify the io_tlb_nslabs (and set swiotbl=1?).
> > > > >
> > > > > This seems to be a nice option, but then IOMMU_INIT APIs are
> > > > > x86-specific and this swiotlb buffer size adjustment is also needed
> > > > > for other memory encryption architectures like Power, S390, etc.
> > >
> > > Oh dear. That I hadn't considered.
> > > > >
> > > > > >>
> > > > > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > > > > >> perhaps add in this code ? Albeit it really should be in it's own
> > > > > >> file, not in arch/x86/kernel/pci-swiotlb.c
> > > > >
> > > > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > > > forces swiotlb on, but again this is all x86 architecture specific.
> > >
> > > Then it looks like the best bet is to do it from within swiotlb_init?
> > > We really can't do it from swiotlb_size_or_default - that function
> > > should just return a value and nothing else.
> > >
> >
> > Actually, we need to do it in swiotlb_size_or_default() as this gets called by
> > reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to
> > reserve low crashkernel memory. If we adjust swiotlb size later in
> > swiotlb_init() which gets called later than reserve_crashkernel_low(),
> > then any swiotlb size changes/expansion will conflict/overlap with the
> > low memory reserved for crashkernel.
> >
> and will also potentially cause SWIOTLB buffer allocation failures.
>
> Do you have any feedback, comments on the above ?


The init boot chain looks like this:

initmem_init
pci_iommu_alloc
-> pci_swiotlb_detect_4gb
-> swiotlb_init

reserve_crashkernel
reserve_crashkernel_low
-> swiotlb_size_or_default
..


(rootfs code):
pci_iommu_init
-> a bunch of the other IOMMU late_init code gets called..
-> pci_swiotlb_late_init

I have to say I am lost to how your patch fixes "If we adjust swiolb
size later .. then any swiotlb size .. will overlap with the low memory
reserved for crashkernel"?

Or are you saying that 'reserve_crashkernel_low' is the _culprit_ and it
is the one changing the size? And hence it modifying the swiotlb size
will fix this problem? Aka _before_ all the other IOMMU get their hand
on it?

If so why not create an
IOMMU_INIT(crashkernel_adjust_swiotlb,pci_swiotlb_detect_override,
NULL, NULL);

And crashkernel_adjust_swiotlb would change the size of swiotlb buffer
if conditions are found to require it.

You also may want to put a #define DEBUG in arch/x86/kernel/pci-iommu_table.c
to check out whether the tree structure of IOMMU entries is correct.



But still I am lost - if say the AMD one does decide for unknown reason
to expand the SWIOTLB you are still stuck with the 'overlap with
the low memory reserved' or so.

Perhaps add a late_init that gets called as the last one to validate
this ? And maybe if the swiotlb gets turned off you also take proper
steps?

> As such i feel, this patch is complete otherwise and can be included as
> it is.
>
> Thanks,
> Ashish

2020-06-24 00:25:32

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hello Konrad,

On Tue, Jun 23, 2020 at 09:38:43AM -0400, Konrad Rzeszutek Wilk wrote:
> On Mon, Apr 27, 2020 at 06:53:18PM +0000, Ashish Kalra wrote:
> > Hello Konrad,
> >
> > On Mon, Mar 30, 2020 at 10:25:51PM +0000, Ashish Kalra wrote:
> > > Hello Konrad,
> > >
> > > On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > On Tue, Feb 04, 2020 at 07:35:00PM +0000, Ashish Kalra wrote:
> > > > > Hello Konrad,
> > > > >
> > > > > Looking fwd. to your feedback regarding support of other memory
> > > > > encryption architectures such as Power, S390, etc.
> > > > >
> > > > > Thanks,
> > > > > Ashish
> > > > >
> > > > > On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> > > > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > >
> > > > > > > > Additional memory calculations based on # of PCI devices and
> > > > > > > > their memory ranges will make it more complicated with so
> > > > > > > > many other permutations and combinations to explore, it is
> > > > > > > > essential to keep this patch as simple as possible by
> > > > > > > > adjusting the bounce buffer size simply by determining it
> > > > > > > > from the amount of provisioned guest memory.
> > > > > > >>
> > > > > > >> Please rework the patch to:
> > > > > > >>
> > > > > > >> - Use a log solution instead of the multiplication.
> > > > > > >> Feel free to cap it at a sensible value.
> > > > > >
> > > > > > Ok.
> > > > > >
> > > > > > >>
> > > > > > >> - Also the code depends on SWIOTLB calling in to the
> > > > > > >> adjust_swiotlb_default_size which looks wrong.
> > > > > > >>
> > > > > > >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> > > > > >
> > > > > > >> That function's purpose is to report a value.
> > > > > > >>
> > > > > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > > > > > >>
> > > > > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> > > > > > >> modify the io_tlb_nslabs (and set swiotbl=1?).
> > > > > >
> > > > > > This seems to be a nice option, but then IOMMU_INIT APIs are
> > > > > > x86-specific and this swiotlb buffer size adjustment is also needed
> > > > > > for other memory encryption architectures like Power, S390, etc.
> > > >
> > > > Oh dear. That I hadn't considered.
> > > > > >
> > > > > > >>
> > > > > > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > > > > > >> perhaps add in this code ? Albeit it really should be in it's own
> > > > > > >> file, not in arch/x86/kernel/pci-swiotlb.c
> > > > > >
> > > > > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > > > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > > > > forces swiotlb on, but again this is all x86 architecture specific.
> > > >
> > > > Then it looks like the best bet is to do it from within swiotlb_init?
> > > > We really can't do it from swiotlb_size_or_default - that function
> > > > should just return a value and nothing else.
> > > >
> > >
> > > Actually, we need to do it in swiotlb_size_or_default() as this gets called by
> > > reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to
> > > reserve low crashkernel memory. If we adjust swiotlb size later in
> > > swiotlb_init() which gets called later than reserve_crashkernel_low(),
> > > then any swiotlb size changes/expansion will conflict/overlap with the
> > > low memory reserved for crashkernel.
> > >
> > and will also potentially cause SWIOTLB buffer allocation failures.
> >
> > Do you have any feedback, comments on the above ?
>
>
> The init boot chain looks like this:
>
> initmem_init
> pci_iommu_alloc
> -> pci_swiotlb_detect_4gb
> -> swiotlb_init
>
> reserve_crashkernel
> reserve_crashkernel_low
> -> swiotlb_size_or_default
> ..
>
>
> (rootfs code):
> pci_iommu_init
> -> a bunch of the other IOMMU late_init code gets called..
> -> pci_swiotlb_late_init
>
> I have to say I am lost to how your patch fixes "If we adjust swiolb
> size later .. then any swiotlb size .. will overlap with the low memory
> reserved for crashkernel"?
>

Actually as per the boot flow :

setup_arch() calls reserve_crashkernel() and pci_iommu_alloc() is
invoked through mm_init()/mem_init() and not via initmem_init().

start_kernel:
...
setup_arch()
reserve_crashkernel
reserve_crashkernel_low
-> swiotlb_size_or_default

...
...
mm_init()
mem_init()
pci_iommu_alloc
-> pci_swiotlb_detect_4gb
-> swiotlb_init

So as per the above boot flow, reserve_crashkernel() can get called
before swiotlb_detect/init, and hence, if we don't fixup or adjust
the SWIOTLB buffer size in swiotlb_size_or_default() then crash kernel
will reserve memory which will conflict/overlap with any SWIOTLB bounce
buffer allocated memory (adjusted or fixed up later).

Therefore, we need to adjust/fixup SWIOTLB bounce buffer memory in
swiotlb_size_or_default() function itself, before swiotlb detect/init
funtions get invoked.

Thanks,
Ashish

> Or are you saying that 'reserve_crashkernel_low' is the _culprit_ and it
> is the one changing the size? And hence it modifying the swiotlb size
> will fix this problem? Aka _before_ all the other IOMMU get their hand
> on it?
>
> If so why not create an
> IOMMU_INIT(crashkernel_adjust_swiotlb,pci_swiotlb_detect_override,
> NULL, NULL);
>
> And crashkernel_adjust_swiotlb would change the size of swiotlb buffer
> if conditions are found to require it.
>
> You also may want to put a #define DEBUG in arch/x86/kernel/pci-iommu_table.c
> to check out whether the tree structure of IOMMU entries is correct.
>
>
>
> But still I am lost - if say the AMD one does decide for unknown reason
> to expand the SWIOTLB you are still stuck with the 'overlap with
> the low memory reserved' or so.
>
> Perhaps add a late_init that gets called as the last one to validate
> this ? And maybe if the swiotlb gets turned off you also take proper
> steps?
>
> > As such i feel, this patch is complete otherwise and can be included as
> > it is.
> >
> > Thanks,
> > Ashish

2020-06-24 07:08:27

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Wed, Jun 24, 2020 at 12:23:57AM +0000, Ashish Kalra wrote:
> Hello Konrad,
>
> On Tue, Jun 23, 2020 at 09:38:43AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Mon, Apr 27, 2020 at 06:53:18PM +0000, Ashish Kalra wrote:
> > > Hello Konrad,
> > >
> > > On Mon, Mar 30, 2020 at 10:25:51PM +0000, Ashish Kalra wrote:
> > > > Hello Konrad,
> > > >
> > > > On Tue, Mar 03, 2020 at 12:03:53PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > On Tue, Feb 04, 2020 at 07:35:00PM +0000, Ashish Kalra wrote:
> > > > > > Hello Konrad,
> > > > > >
> > > > > > Looking fwd. to your feedback regarding support of other memory
> > > > > > encryption architectures such as Power, S390, etc.
> > > > > >
> > > > > > Thanks,
> > > > > > Ashish
> > > > > >
> > > > > > On Fri, Jan 24, 2020 at 11:00:08PM +0000, Ashish Kalra wrote:
> > > > > > > On Tue, Jan 21, 2020 at 03:54:03PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > > > > >
> > > > > > > > > Additional memory calculations based on # of PCI devices and
> > > > > > > > > their memory ranges will make it more complicated with so
> > > > > > > > > many other permutations and combinations to explore, it is
> > > > > > > > > essential to keep this patch as simple as possible by
> > > > > > > > > adjusting the bounce buffer size simply by determining it
> > > > > > > > > from the amount of provisioned guest memory.
> > > > > > > >>
> > > > > > > >> Please rework the patch to:
> > > > > > > >>
> > > > > > > >> - Use a log solution instead of the multiplication.
> > > > > > > >> Feel free to cap it at a sensible value.
> > > > > > >
> > > > > > > Ok.
> > > > > > >
> > > > > > > >>
> > > > > > > >> - Also the code depends on SWIOTLB calling in to the
> > > > > > > >> adjust_swiotlb_default_size which looks wrong.
> > > > > > > >>
> > > > > > > >> You should not adjust io_tlb_nslabs from swiotlb_size_or_default.
> > > > > > >
> > > > > > > >> That function's purpose is to report a value.
> > > > > > > >>
> > > > > > > >> - Make io_tlb_nslabs be visible outside of the SWIOTLB code.
> > > > > > > >>
> > > > > > > >> - Can you utilize the IOMMU_INIT APIs and have your own detect which would
> > > > > > > >> modify the io_tlb_nslabs (and set swiotbl=1?).
> > > > > > >
> > > > > > > This seems to be a nice option, but then IOMMU_INIT APIs are
> > > > > > > x86-specific and this swiotlb buffer size adjustment is also needed
> > > > > > > for other memory encryption architectures like Power, S390, etc.
> > > > >
> > > > > Oh dear. That I hadn't considered.
> > > > > > >
> > > > > > > >>
> > > > > > > >> Actually you seem to be piggybacking on pci_swiotlb_detect_4gb - so
> > > > > > > >> perhaps add in this code ? Albeit it really should be in it's own
> > > > > > > >> file, not in arch/x86/kernel/pci-swiotlb.c
> > > > > > >
> > > > > > > Actually, we piggyback on pci_swiotlb_detect_override which sets
> > > > > > > swiotlb=1 as x86_64_start_kernel() and invocation of sme_early_init()
> > > > > > > forces swiotlb on, but again this is all x86 architecture specific.
> > > > >
> > > > > Then it looks like the best bet is to do it from within swiotlb_init?
> > > > > We really can't do it from swiotlb_size_or_default - that function
> > > > > should just return a value and nothing else.
> > > > >
> > > >
> > > > Actually, we need to do it in swiotlb_size_or_default() as this gets called by
> > > > reserve_crashkernel_low() in arch/x86/kernel/setup.c and used to
> > > > reserve low crashkernel memory. If we adjust swiotlb size later in
> > > > swiotlb_init() which gets called later than reserve_crashkernel_low(),
> > > > then any swiotlb size changes/expansion will conflict/overlap with the
> > > > low memory reserved for crashkernel.
> > > >
> > > and will also potentially cause SWIOTLB buffer allocation failures.
> > >
> > > Do you have any feedback, comments on the above ?
> >
> >
> > The init boot chain looks like this:
> >
> > initmem_init
> > pci_iommu_alloc
> > -> pci_swiotlb_detect_4gb
> > -> swiotlb_init
> >
> > reserve_crashkernel
> > reserve_crashkernel_low
> > -> swiotlb_size_or_default
> > ..
> >
> >
> > (rootfs code):
> > pci_iommu_init
> > -> a bunch of the other IOMMU late_init code gets called..
> > -> pci_swiotlb_late_init
> >
> > I have to say I am lost to how your patch fixes "If we adjust swiolb
> > size later .. then any swiotlb size .. will overlap with the low memory
> > reserved for crashkernel"?
> >
>
> Actually as per the boot flow :
>
> setup_arch() calls reserve_crashkernel() and pci_iommu_alloc() is
> invoked through mm_init()/mem_init() and not via initmem_init().
>
> start_kernel:
> ...
> setup_arch()
> reserve_crashkernel
> reserve_crashkernel_low
> -> swiotlb_size_or_default
>
> ...
> ...
> mm_init()
> mem_init()
> pci_iommu_alloc
> -> pci_swiotlb_detect_4gb
> -> swiotlb_init
>
> So as per the above boot flow, reserve_crashkernel() can get called
> before swiotlb_detect/init, and hence, if we don't fixup or adjust
> the SWIOTLB buffer size in swiotlb_size_or_default() then crash kernel
> will reserve memory which will conflict/overlap with any SWIOTLB bounce
> buffer allocated memory (adjusted or fixed up later).
>
> Therefore, we need to adjust/fixup SWIOTLB bounce buffer memory in
> swiotlb_size_or_default() function itself, before swiotlb detect/init
> funtions get invoked.
>

Also to add here, it looks like swiotlb_size_or_default() is an
interface function to get the SWIOTLB bounce buffer size for components
which are initialized before swiotlb_detect/init, so that these
components can reserve or allocate their memory requirements with the
knowledge of how much SWIOTLB bounce buffers are going to use, so
therefore, any fixups or adjustments to SWIOTLB buffer size will need
to be made as part of swiotlb_size_or_default().

Thanks,
Ashish

> > Or are you saying that 'reserve_crashkernel_low' is the _culprit_ and it
> > is the one changing the size? And hence it modifying the swiotlb size
> > will fix this problem? Aka _before_ all the other IOMMU get their hand
> > on it?
> >
> > If so why not create an
> > IOMMU_INIT(crashkernel_adjust_swiotlb,pci_swiotlb_detect_override,
> > NULL, NULL);
> >
> > And crashkernel_adjust_swiotlb would change the size of swiotlb buffer
> > if conditions are found to require it.
> >
> > You also may want to put a #define DEBUG in arch/x86/kernel/pci-iommu_table.c
> > to check out whether the tree structure of IOMMU entries is correct.
> >
> >
> >
> > But still I am lost - if say the AMD one does decide for unknown reason
> > to expand the SWIOTLB you are still stuck with the 'overlap with
> > the low memory reserved' or so.
> >
> > Perhaps add a late_init that gets called as the last one to validate
> > this ? And maybe if the swiotlb gets turned off you also take proper
> > steps?
> >
> > > As such i feel, this patch is complete otherwise and can be included as
> > > it is.
> > >
> > > Thanks,
> > > Ashish

2020-06-24 22:04:26

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v2] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

.snip..
> > Actually as per the boot flow :
> >
> > setup_arch() calls reserve_crashkernel() and pci_iommu_alloc() is
> > invoked through mm_init()/mem_init() and not via initmem_init().
> >
> > start_kernel:
> > ...
> > setup_arch()
> > reserve_crashkernel
> > reserve_crashkernel_low
> > -> swiotlb_size_or_default
> >
> > ...
> > ...
> > mm_init()
> > mem_init()
> > pci_iommu_alloc
> > -> pci_swiotlb_detect_4gb
> > -> swiotlb_init
> >
> > So as per the above boot flow, reserve_crashkernel() can get called
> > before swiotlb_detect/init, and hence, if we don't fixup or adjust
> > the SWIOTLB buffer size in swiotlb_size_or_default() then crash kernel
> > will reserve memory which will conflict/overlap with any SWIOTLB bounce
> > buffer allocated memory (adjusted or fixed up later).
> >
> > Therefore, we need to adjust/fixup SWIOTLB bounce buffer memory in
> > swiotlb_size_or_default() function itself, before swiotlb detect/init
> > funtions get invoked.
> >
>
> Also to add here, it looks like swiotlb_size_or_default() is an
> interface function to get the SWIOTLB bounce buffer size for components
> which are initialized before swiotlb_detect/init, so that these
> components can reserve or allocate their memory requirements with the
> knowledge of how much SWIOTLB bounce buffers are going to use, so
> therefore, any fixups or adjustments to SWIOTLB buffer size will need
> to be made as part of swiotlb_size_or_default().

That was never its purpose. It was meant as way to figure out the segment
size for DMA requests and to be used for runtime components. In fact to
be idempotent.

This is why I am disliking this usage and leaning towards something else.

But you pointed out something interesting - you are in fact needing to
adjust the size of the swiotlb based on your needs at bootup. Not any different
from say 'swiotlb' paramter.

Why not have an 'swiotlb_adjust' that is an __init that modifies the
internal swiotlb buffer sizes? Obviously we have to account for 'swiotlb'
parsing as well. The swiotlb_adjust would pick the max from those.