2020-11-04 22:24:36

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Wed, Nov 04, 2020 at 10:08:04PM +0000, Ashish Kalra wrote:
> From: Ashish Kalra <[email protected]>
>
> For SEV, all DMA to and from guest has to use shared
> (un-encrypted) pages. SEV uses SWIOTLB to make this
> happen without requiring changes to device drivers.
> However, depending on workload being run, the default
> 64MB of SWIOTLB might not be enough and SWIOTLB
> may run out of buffers to use for DMA, resulting
> in I/O errors and/or performance degradation for
> high I/O workloads.
>
> Increase the default size of SWIOTLB for SEV guests
> using a minimum value of 128MB and a maximum value

<blinks>

64MB for a 1GB VM is not enough?

> of 512MB, determining on amount of provisioned guest

I like the implementation on how this is done.. but
the choices of memory and how much seems very much
random. Could there be some math behind this?

> memory.
>
> Using late_initcall() interface to invoke
> swiotlb_adjust() does not work as the size
> adjustment needs to be done before mem_encrypt_init()
> and reserve_crashkernel() which use the allocated
> SWIOTLB buffer size, hence calling it explicitly
> from setup_arch().
>
> The SWIOTLB default size adjustment is added as an
> architecture specific interface/callback to allow
> architectures such as those supporting memory
> encryption to adjust/expand SWIOTLB size for their
> use.
>
> Signed-off-by: Ashish Kalra <[email protected]>
> ---
> arch/x86/kernel/setup.c | 2 ++
> arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
> include/linux/swiotlb.h | 1 +
> kernel/dma/swiotlb.c | 27 +++++++++++++++++++++++++
> 4 files changed, 72 insertions(+)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3511736fbc74..b073d58dd4a3 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> if (boot_cpu_has(X86_FEATURE_GBPAGES))
> hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
>
> + swiotlb_adjust();
> +
> /*
> * Reserve memory for crash kernel after SRAT is parsed so that it
> * won't consume hotpluggable memory.
> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> index 3f248f0d0e07..e0deb157cddd 100644
> --- a/arch/x86/mm/mem_encrypt.c
> +++ b/arch/x86/mm/mem_encrypt.c
> @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> pr_cont("\n");
> }
>
> +#define TOTAL_MEM_1G 0x40000000UL
> +#define TOTAL_MEM_4G 0x100000000UL
> +
> +#define SIZE_128M (128UL<<20)
> +#define SIZE_256M (256UL<<20)
> +#define SIZE_512M (512UL<<20)
> +
> /* Architecture __weak replacement functions */
> +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> +{
> + unsigned long size = 0;
> +
> + /*
> + * For SEV, all DMA has to occur via shared/unencrypted pages.
> + * SEV uses SWOTLB to make this happen without changing device
> + * drivers. However, depending on the workload being run, the
> + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> + * run out of buffers for DMA, resulting in I/O errors and/or
> + * performance degradation especially with high I/O workloads.
> + * Increase the default size of SWIOTLB for SEV guests using
> + * a minimum value of 128MB and a maximum value of 512MB,
> + * depending on amount of provisioned guest memory.
> + */
> + if (sev_active()) {
> + phys_addr_t total_mem = memblock_phys_mem_size();
> +
> + if (total_mem <= TOTAL_MEM_1G)
> + size = clamp(iotlb_default_size * 2, SIZE_128M,
> + SIZE_128M);
> + else if (total_mem <= TOTAL_MEM_4G)
> + size = clamp(iotlb_default_size * 4, SIZE_256M,
> + SIZE_256M);
> + else
> + size = clamp(iotlb_default_size * 8, SIZE_512M,
> + SIZE_512M);
> +
> + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> + size >> 20);
> + }
> +
> + return size;
> +}
> +
> void __init mem_encrypt_init(void)
> {
> if (!sme_me_mask)
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 046bb94bd4d6..01ae6d891327 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
> int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
> extern unsigned long swiotlb_nr_tbl(void);
> unsigned long swiotlb_size_or_default(void);
> +extern void __init swiotlb_adjust(void);
> extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> extern void __init swiotlb_update_mem_attributes(void);
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index c19379fabd20..66a9e627bb51 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
> return size ? size : (IO_TLB_DEFAULT_SIZE);
> }
>
> +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> +{
> + return 0;
> +}
> +
> +void __init swiotlb_adjust(void)
> +{
> + unsigned long size;
> +
> + /*
> + * If swiotlb parameter has not been specified, give a chance to
> + * architectures such as those supporting memory encryption to
> + * adjust/expand SWIOTLB size for their use.
> + */
> + if (!io_tlb_nslabs) {
> + size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
> + if (size) {
> + size = ALIGN(size, 1 << IO_TLB_SHIFT);
> + io_tlb_nslabs = size >> IO_TLB_SHIFT;
> + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> +
> + pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
> + io_tlb_nslabs);
> + }
> + }
> +}
> +
> void swiotlb_print_info(void)
> {
> unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> --
> 2.17.1
>


2020-11-05 02:31:30

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hello Konrad,

On Wed, Nov 04, 2020 at 05:14:52PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 04, 2020 at 10:08:04PM +0000, Ashish Kalra wrote:
> > From: Ashish Kalra <[email protected]>
> >
> > For SEV, all DMA to and from guest has to use shared
> > (un-encrypted) pages. SEV uses SWIOTLB to make this
> > happen without requiring changes to device drivers.
> > However, depending on workload being run, the default
> > 64MB of SWIOTLB might not be enough and SWIOTLB
> > may run out of buffers to use for DMA, resulting
> > in I/O errors and/or performance degradation for
> > high I/O workloads.
> >
> > Increase the default size of SWIOTLB for SEV guests
> > using a minimum value of 128MB and a maximum value
>
> <blinks>
>
> 64MB for a 1GB VM is not enough?
>
> > of 512MB, determining on amount of provisioned guest
>
> I like the implementation on how this is done.. but
> the choices of memory and how much seems very much
> random. Could there be some math behind this?
>

Earlier the patch was based on using a % of guest memory, as below:

+#define SEV_ADJUST_SWIOTLB_SIZE_PERCENT 5
+#define SEV_ADJUST_SWIOTLB_SIZE_MAX (1UL << 30)
...
...
+ if (sev_active() && !io_tlb_nslabs) {
+ unsigned long total_mem = get_num_physpages() << PAGE_SHIFT;
+
+ default_size = total_mem *
+ SEV_ADJUST_SWIOTLB_SIZE_PERCENT / 100;
+
+ default_size = ALIGN(default_size, 1 << IO_TLB_SHIFT);
+
+ default_size = clamp_val(default_size, IO_TLB_DEFAULT_SIZE,
+ SEV_ADJUST_SWIOTLB_SIZE_MAX);
+ }

But, then it is difficult to predict what % of guest memory to use ?

Then there are other factors to consider, such as vcpu_count or if there
is going to be high I/O workload, etc.

But that all makes it very complicated, what we basically want is a
range from 128M to 512M and that's why the current patch which picks up
this range from the amount of allocated guest memory keeps it simple.

Thanks,
Ashish

> > memory.
> >
> > Using late_initcall() interface to invoke
> > swiotlb_adjust() does not work as the size
> > adjustment needs to be done before mem_encrypt_init()
> > and reserve_crashkernel() which use the allocated
> > SWIOTLB buffer size, hence calling it explicitly
> > from setup_arch().
> >
> > The SWIOTLB default size adjustment is added as an
> > architecture specific interface/callback to allow
> > architectures such as those supporting memory
> > encryption to adjust/expand SWIOTLB size for their
> > use.
> >
> > Signed-off-by: Ashish Kalra <[email protected]>
> > ---
> > arch/x86/kernel/setup.c | 2 ++
> > arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
> > include/linux/swiotlb.h | 1 +
> > kernel/dma/swiotlb.c | 27 +++++++++++++++++++++++++
> > 4 files changed, 72 insertions(+)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 3511736fbc74..b073d58dd4a3 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> >
> > + swiotlb_adjust();
> > +
> > /*
> > * Reserve memory for crash kernel after SRAT is parsed so that it
> > * won't consume hotpluggable memory.
> > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > index 3f248f0d0e07..e0deb157cddd 100644
> > --- a/arch/x86/mm/mem_encrypt.c
> > +++ b/arch/x86/mm/mem_encrypt.c
> > @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> > pr_cont("\n");
> > }
> >
> > +#define TOTAL_MEM_1G 0x40000000UL
> > +#define TOTAL_MEM_4G 0x100000000UL
> > +
> > +#define SIZE_128M (128UL<<20)
> > +#define SIZE_256M (256UL<<20)
> > +#define SIZE_512M (512UL<<20)
> > +
> > /* Architecture __weak replacement functions */
> > +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> > +{
> > + unsigned long size = 0;
> > +
> > + /*
> > + * For SEV, all DMA has to occur via shared/unencrypted pages.
> > + * SEV uses SWOTLB to make this happen without changing device
> > + * drivers. However, depending on the workload being run, the
> > + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> > + * run out of buffers for DMA, resulting in I/O errors and/or
> > + * performance degradation especially with high I/O workloads.
> > + * Increase the default size of SWIOTLB for SEV guests using
> > + * a minimum value of 128MB and a maximum value of 512MB,
> > + * depending on amount of provisioned guest memory.
> > + */
> > + if (sev_active()) {
> > + phys_addr_t total_mem = memblock_phys_mem_size();
> > +
> > + if (total_mem <= TOTAL_MEM_1G)
> > + size = clamp(iotlb_default_size * 2, SIZE_128M,
> > + SIZE_128M);
> > + else if (total_mem <= TOTAL_MEM_4G)
> > + size = clamp(iotlb_default_size * 4, SIZE_256M,
> > + SIZE_256M);
> > + else
> > + size = clamp(iotlb_default_size * 8, SIZE_512M,
> > + SIZE_512M);
> > +
> > + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> > + size >> 20);
> > + }
> > +
> > + return size;
> > +}
> > +
> > void __init mem_encrypt_init(void)
> > {
> > if (!sme_me_mask)
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index 046bb94bd4d6..01ae6d891327 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
> > int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
> > extern unsigned long swiotlb_nr_tbl(void);
> > unsigned long swiotlb_size_or_default(void);
> > +extern void __init swiotlb_adjust(void);
> > extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> > extern void __init swiotlb_update_mem_attributes(void);
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index c19379fabd20..66a9e627bb51 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
> > return size ? size : (IO_TLB_DEFAULT_SIZE);
> > }
> >
> > +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> > +{
> > + return 0;
> > +}
> > +
> > +void __init swiotlb_adjust(void)
> > +{
> > + unsigned long size;
> > +
> > + /*
> > + * If swiotlb parameter has not been specified, give a chance to
> > + * architectures such as those supporting memory encryption to
> > + * adjust/expand SWIOTLB size for their use.
> > + */
> > + if (!io_tlb_nslabs) {
> > + size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
> > + if (size) {
> > + size = ALIGN(size, 1 << IO_TLB_SHIFT);
> > + io_tlb_nslabs = size >> IO_TLB_SHIFT;
> > + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> > +
> > + pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
> > + io_tlb_nslabs);
> > + }
> > + }
> > +}
> > +
> > void swiotlb_print_info(void)
> > {
> > unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> > --
> > 2.17.1
> >

2020-11-05 17:44:58

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Wed, Nov 04, 2020 at 10:39:13PM +0000, Ashish Kalra wrote:
> Hello Konrad,
>
> On Wed, Nov 04, 2020 at 05:14:52PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Nov 04, 2020 at 10:08:04PM +0000, Ashish Kalra wrote:
> > > From: Ashish Kalra <[email protected]>
> > >
> > > For SEV, all DMA to and from guest has to use shared
> > > (un-encrypted) pages. SEV uses SWIOTLB to make this
> > > happen without requiring changes to device drivers.
> > > However, depending on workload being run, the default
> > > 64MB of SWIOTLB might not be enough and SWIOTLB
> > > may run out of buffers to use for DMA, resulting
> > > in I/O errors and/or performance degradation for
> > > high I/O workloads.
> > >
> > > Increase the default size of SWIOTLB for SEV guests
> > > using a minimum value of 128MB and a maximum value
> >
> > <blinks>
> >
> > 64MB for a 1GB VM is not enough?
> >
> > > of 512MB, determining on amount of provisioned guest
> >
> > I like the implementation on how this is done.. but
> > the choices of memory and how much seems very much
> > random. Could there be some math behind this?
> >
>
> Earlier the patch was based on using a % of guest memory, as below:
>
> +#define SEV_ADJUST_SWIOTLB_SIZE_PERCENT 5
> +#define SEV_ADJUST_SWIOTLB_SIZE_MAX (1UL << 30)
> ...
> ...
> + if (sev_active() && !io_tlb_nslabs) {
> + unsigned long total_mem = get_num_physpages() << PAGE_SHIFT;
> +
> + default_size = total_mem *
> + SEV_ADJUST_SWIOTLB_SIZE_PERCENT / 100;
> +
> + default_size = ALIGN(default_size, 1 << IO_TLB_SHIFT);
> +
> + default_size = clamp_val(default_size, IO_TLB_DEFAULT_SIZE,
> + SEV_ADJUST_SWIOTLB_SIZE_MAX);
> + }
>
> But, then it is difficult to predict what % of guest memory to use ?
>
> Then there are other factors to consider, such as vcpu_count or if there
> is going to be high I/O workload, etc.
>
> But that all makes it very complicated, what we basically want is a
> range from 128M to 512M and that's why the current patch which picks up
> this range from the amount of allocated guest memory keeps it simple.

Right, so I am wondering if we can do this better.

That is you are never going to get any 32-bit devices with SEV right? That
is there is nothing that bounds you to always use the memory below 4GB?

What I wonder is if we can combine the boot-allocation of the SWIOTLB
with the post-boot-allocation of SWIOLTB to stitch together
continous physical ranges.

That way you have the flexibility at the start of using 64MB but if there
is pressure, we grow to a bigger size?

>
> Thanks,
> Ashish
>
> > > memory.
> > >
> > > Using late_initcall() interface to invoke
> > > swiotlb_adjust() does not work as the size
> > > adjustment needs to be done before mem_encrypt_init()
> > > and reserve_crashkernel() which use the allocated
> > > SWIOTLB buffer size, hence calling it explicitly
> > > from setup_arch().
> > >
> > > The SWIOTLB default size adjustment is added as an
> > > architecture specific interface/callback to allow
> > > architectures such as those supporting memory
> > > encryption to adjust/expand SWIOTLB size for their
> > > use.
> > >
> > > Signed-off-by: Ashish Kalra <[email protected]>
> > > ---
> > > arch/x86/kernel/setup.c | 2 ++
> > > arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
> > > include/linux/swiotlb.h | 1 +
> > > kernel/dma/swiotlb.c | 27 +++++++++++++++++++++++++
> > > 4 files changed, 72 insertions(+)
> > >
> > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > index 3511736fbc74..b073d58dd4a3 100644
> > > --- a/arch/x86/kernel/setup.c
> > > +++ b/arch/x86/kernel/setup.c
> > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > > if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > > hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > >
> > > + swiotlb_adjust();
> > > +
> > > /*
> > > * Reserve memory for crash kernel after SRAT is parsed so that it
> > > * won't consume hotpluggable memory.
> > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > index 3f248f0d0e07..e0deb157cddd 100644
> > > --- a/arch/x86/mm/mem_encrypt.c
> > > +++ b/arch/x86/mm/mem_encrypt.c
> > > @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> > > pr_cont("\n");
> > > }
> > >
> > > +#define TOTAL_MEM_1G 0x40000000UL
> > > +#define TOTAL_MEM_4G 0x100000000UL
> > > +
> > > +#define SIZE_128M (128UL<<20)
> > > +#define SIZE_256M (256UL<<20)
> > > +#define SIZE_512M (512UL<<20)
> > > +
> > > /* Architecture __weak replacement functions */
> > > +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> > > +{
> > > + unsigned long size = 0;
> > > +
> > > + /*
> > > + * For SEV, all DMA has to occur via shared/unencrypted pages.
> > > + * SEV uses SWOTLB to make this happen without changing device
> > > + * drivers. However, depending on the workload being run, the
> > > + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> > > + * run out of buffers for DMA, resulting in I/O errors and/or
> > > + * performance degradation especially with high I/O workloads.
> > > + * Increase the default size of SWIOTLB for SEV guests using
> > > + * a minimum value of 128MB and a maximum value of 512MB,
> > > + * depending on amount of provisioned guest memory.
> > > + */
> > > + if (sev_active()) {
> > > + phys_addr_t total_mem = memblock_phys_mem_size();
> > > +
> > > + if (total_mem <= TOTAL_MEM_1G)
> > > + size = clamp(iotlb_default_size * 2, SIZE_128M,
> > > + SIZE_128M);
> > > + else if (total_mem <= TOTAL_MEM_4G)
> > > + size = clamp(iotlb_default_size * 4, SIZE_256M,
> > > + SIZE_256M);
> > > + else
> > > + size = clamp(iotlb_default_size * 8, SIZE_512M,
> > > + SIZE_512M);
> > > +
> > > + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> > > + size >> 20);
> > > + }
> > > +
> > > + return size;
> > > +}
> > > +
> > > void __init mem_encrypt_init(void)
> > > {
> > > if (!sme_me_mask)
> > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > > index 046bb94bd4d6..01ae6d891327 100644
> > > --- a/include/linux/swiotlb.h
> > > +++ b/include/linux/swiotlb.h
> > > @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
> > > int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
> > > extern unsigned long swiotlb_nr_tbl(void);
> > > unsigned long swiotlb_size_or_default(void);
> > > +extern void __init swiotlb_adjust(void);
> > > extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> > > extern void __init swiotlb_update_mem_attributes(void);
> > >
> > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > index c19379fabd20..66a9e627bb51 100644
> > > --- a/kernel/dma/swiotlb.c
> > > +++ b/kernel/dma/swiotlb.c
> > > @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
> > > return size ? size : (IO_TLB_DEFAULT_SIZE);
> > > }
> > >
> > > +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> > > +{
> > > + return 0;
> > > +}
> > > +
> > > +void __init swiotlb_adjust(void)
> > > +{
> > > + unsigned long size;
> > > +
> > > + /*
> > > + * If swiotlb parameter has not been specified, give a chance to
> > > + * architectures such as those supporting memory encryption to
> > > + * adjust/expand SWIOTLB size for their use.
> > > + */
> > > + if (!io_tlb_nslabs) {
> > > + size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
> > > + if (size) {
> > > + size = ALIGN(size, 1 << IO_TLB_SHIFT);
> > > + io_tlb_nslabs = size >> IO_TLB_SHIFT;
> > > + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> > > +
> > > + pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
> > > + io_tlb_nslabs);
> > > + }
> > > + }
> > > +}
> > > +
> > > void swiotlb_print_info(void)
> > > {
> > > unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> > > --
> > > 2.17.1
> > >

2020-11-05 18:44:03

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hello Konrad,

On Thu, Nov 05, 2020 at 12:43:17PM -0500, Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 04, 2020 at 10:39:13PM +0000, Ashish Kalra wrote:
> > Hello Konrad,
> >
> > On Wed, Nov 04, 2020 at 05:14:52PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Nov 04, 2020 at 10:08:04PM +0000, Ashish Kalra wrote:
> > > > From: Ashish Kalra <[email protected]>
> > > >
> > > > For SEV, all DMA to and from guest has to use shared
> > > > (un-encrypted) pages. SEV uses SWIOTLB to make this
> > > > happen without requiring changes to device drivers.
> > > > However, depending on workload being run, the default
> > > > 64MB of SWIOTLB might not be enough and SWIOTLB
> > > > may run out of buffers to use for DMA, resulting
> > > > in I/O errors and/or performance degradation for
> > > > high I/O workloads.
> > > >
> > > > Increase the default size of SWIOTLB for SEV guests
> > > > using a minimum value of 128MB and a maximum value
> > >
> > > <blinks>
> > >
> > > 64MB for a 1GB VM is not enough?
> > >
> > > > of 512MB, determining on amount of provisioned guest
> > >
> > > I like the implementation on how this is done.. but
> > > the choices of memory and how much seems very much
> > > random. Could there be some math behind this?
> > >
> >
> > Earlier the patch was based on using a % of guest memory, as below:
> >
> > +#define SEV_ADJUST_SWIOTLB_SIZE_PERCENT 5
> > +#define SEV_ADJUST_SWIOTLB_SIZE_MAX (1UL << 30)
> > ...
> > ...
> > + if (sev_active() && !io_tlb_nslabs) {
> > + unsigned long total_mem = get_num_physpages() << PAGE_SHIFT;
> > +
> > + default_size = total_mem *
> > + SEV_ADJUST_SWIOTLB_SIZE_PERCENT / 100;
> > +
> > + default_size = ALIGN(default_size, 1 << IO_TLB_SHIFT);
> > +
> > + default_size = clamp_val(default_size, IO_TLB_DEFAULT_SIZE,
> > + SEV_ADJUST_SWIOTLB_SIZE_MAX);
> > + }
> >
> > But, then it is difficult to predict what % of guest memory to use ?
> >
> > Then there are other factors to consider, such as vcpu_count or if there
> > is going to be high I/O workload, etc.
> >
> > But that all makes it very complicated, what we basically want is a
> > range from 128M to 512M and that's why the current patch which picks up
> > this range from the amount of allocated guest memory keeps it simple.
>
> Right, so I am wondering if we can do this better.
>
> That is you are never going to get any 32-bit devices with SEV right? That
> is there is nothing that bounds you to always use the memory below 4GB?
>

We do support 32-bit PCIe passthrough devices with SEV.

Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
when there is I/O pressure, because we do need to support device
passthrough of 32-bit devices.

Considering this, we believe that this patch needs to adjust/extend
boot-allocation of SWIOTLB and we want to keep it simple to do this
within a range detemined by amount of allocated guest memory.

Thanks,
Ashish

> What I wonder is if we can combine the boot-allocation of the SWIOTLB
> with the post-boot-allocation of SWIOLTB to stitch together
> continous physical ranges.
>
> That way you have the flexibility at the start of using 64MB but if there
> is pressure, we grow to a bigger size?
>
> >
> > Thanks,
> > Ashish
> >
> > > > memory.
> > > >
> > > > Using late_initcall() interface to invoke
> > > > swiotlb_adjust() does not work as the size
> > > > adjustment needs to be done before mem_encrypt_init()
> > > > and reserve_crashkernel() which use the allocated
> > > > SWIOTLB buffer size, hence calling it explicitly
> > > > from setup_arch().
> > > >
> > > > The SWIOTLB default size adjustment is added as an
> > > > architecture specific interface/callback to allow
> > > > architectures such as those supporting memory
> > > > encryption to adjust/expand SWIOTLB size for their
> > > > use.
> > > >
> > > > Signed-off-by: Ashish Kalra <[email protected]>
> > > > ---
> > > > arch/x86/kernel/setup.c | 2 ++
> > > > arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
> > > > include/linux/swiotlb.h | 1 +
> > > > kernel/dma/swiotlb.c | 27 +++++++++++++++++++++++++
> > > > 4 files changed, 72 insertions(+)
> > > >
> > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > index 3511736fbc74..b073d58dd4a3 100644
> > > > --- a/arch/x86/kernel/setup.c
> > > > +++ b/arch/x86/kernel/setup.c
> > > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > > > if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > > > hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > > >
> > > > + swiotlb_adjust();
> > > > +
> > > > /*
> > > > * Reserve memory for crash kernel after SRAT is parsed so that it
> > > > * won't consume hotpluggable memory.
> > > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > > index 3f248f0d0e07..e0deb157cddd 100644
> > > > --- a/arch/x86/mm/mem_encrypt.c
> > > > +++ b/arch/x86/mm/mem_encrypt.c
> > > > @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> > > > pr_cont("\n");
> > > > }
> > > >
> > > > +#define TOTAL_MEM_1G 0x40000000UL
> > > > +#define TOTAL_MEM_4G 0x100000000UL
> > > > +
> > > > +#define SIZE_128M (128UL<<20)
> > > > +#define SIZE_256M (256UL<<20)
> > > > +#define SIZE_512M (512UL<<20)
> > > > +
> > > > /* Architecture __weak replacement functions */
> > > > +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> > > > +{
> > > > + unsigned long size = 0;
> > > > +
> > > > + /*
> > > > + * For SEV, all DMA has to occur via shared/unencrypted pages.
> > > > + * SEV uses SWOTLB to make this happen without changing device
> > > > + * drivers. However, depending on the workload being run, the
> > > > + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> > > > + * run out of buffers for DMA, resulting in I/O errors and/or
> > > > + * performance degradation especially with high I/O workloads.
> > > > + * Increase the default size of SWIOTLB for SEV guests using
> > > > + * a minimum value of 128MB and a maximum value of 512MB,
> > > > + * depending on amount of provisioned guest memory.
> > > > + */
> > > > + if (sev_active()) {
> > > > + phys_addr_t total_mem = memblock_phys_mem_size();
> > > > +
> > > > + if (total_mem <= TOTAL_MEM_1G)
> > > > + size = clamp(iotlb_default_size * 2, SIZE_128M,
> > > > + SIZE_128M);
> > > > + else if (total_mem <= TOTAL_MEM_4G)
> > > > + size = clamp(iotlb_default_size * 4, SIZE_256M,
> > > > + SIZE_256M);
> > > > + else
> > > > + size = clamp(iotlb_default_size * 8, SIZE_512M,
> > > > + SIZE_512M);
> > > > +
> > > > + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> > > > + size >> 20);
> > > > + }
> > > > +
> > > > + return size;
> > > > +}
> > > > +
> > > > void __init mem_encrypt_init(void)
> > > > {
> > > > if (!sme_me_mask)
> > > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > > > index 046bb94bd4d6..01ae6d891327 100644
> > > > --- a/include/linux/swiotlb.h
> > > > +++ b/include/linux/swiotlb.h
> > > > @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
> > > > int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
> > > > extern unsigned long swiotlb_nr_tbl(void);
> > > > unsigned long swiotlb_size_or_default(void);
> > > > +extern void __init swiotlb_adjust(void);
> > > > extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> > > > extern void __init swiotlb_update_mem_attributes(void);
> > > >
> > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > > index c19379fabd20..66a9e627bb51 100644
> > > > --- a/kernel/dma/swiotlb.c
> > > > +++ b/kernel/dma/swiotlb.c
> > > > @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
> > > > return size ? size : (IO_TLB_DEFAULT_SIZE);
> > > > }
> > > >
> > > > +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> > > > +{
> > > > + return 0;
> > > > +}
> > > > +
> > > > +void __init swiotlb_adjust(void)
> > > > +{
> > > > + unsigned long size;
> > > > +
> > > > + /*
> > > > + * If swiotlb parameter has not been specified, give a chance to
> > > > + * architectures such as those supporting memory encryption to
> > > > + * adjust/expand SWIOTLB size for their use.
> > > > + */
> > > > + if (!io_tlb_nslabs) {
> > > > + size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
> > > > + if (size) {
> > > > + size = ALIGN(size, 1 << IO_TLB_SHIFT);
> > > > + io_tlb_nslabs = size >> IO_TLB_SHIFT;
> > > > + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> > > > +
> > > > + pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
> > > > + io_tlb_nslabs);
> > > > + }
> > > > + }
> > > > +}
> > > > +
> > > > void swiotlb_print_info(void)
> > > > {
> > > > unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> > > > --
> > > > 2.17.1
> > > >

2020-11-05 19:07:30

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

.
> > Right, so I am wondering if we can do this better.
> >
> > That is you are never going to get any 32-bit devices with SEV right? That
> > is there is nothing that bounds you to always use the memory below 4GB?
> >
>
> We do support 32-bit PCIe passthrough devices with SEV.

Ewww.. Which devices would this be?
>
> Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> when there is I/O pressure, because we do need to support device
> passthrough of 32-bit devices.

Presumarily there is just a handful of them?

>
> Considering this, we believe that this patch needs to adjust/extend
> boot-allocation of SWIOTLB and we want to keep it simple to do this
> within a range detemined by amount of allocated guest memory.

I would prefer to not have to revert this in a year as customers
complain about "I paid $$$ and I am wasting half a gig on something
I am not using" and giving customers knobs to tweak this instead of
doing the right thing from the start.

That is the right thing being something less static.

Can you work with me on what that could be please?

>
> Thanks,
> Ashish
>
> > What I wonder is if we can combine the boot-allocation of the SWIOTLB
> > with the post-boot-allocation of SWIOLTB to stitch together
> > continous physical ranges.
> >
> > That way you have the flexibility at the start of using 64MB but if there
> > is pressure, we grow to a bigger size?
> >
> > >
> > > Thanks,
> > > Ashish
> > >
> > > > > memory.
> > > > >
> > > > > Using late_initcall() interface to invoke
> > > > > swiotlb_adjust() does not work as the size
> > > > > adjustment needs to be done before mem_encrypt_init()
> > > > > and reserve_crashkernel() which use the allocated
> > > > > SWIOTLB buffer size, hence calling it explicitly
> > > > > from setup_arch().
> > > > >
> > > > > The SWIOTLB default size adjustment is added as an
> > > > > architecture specific interface/callback to allow
> > > > > architectures such as those supporting memory
> > > > > encryption to adjust/expand SWIOTLB size for their
> > > > > use.
> > > > >
> > > > > Signed-off-by: Ashish Kalra <[email protected]>
> > > > > ---
> > > > > arch/x86/kernel/setup.c | 2 ++
> > > > > arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
> > > > > include/linux/swiotlb.h | 1 +
> > > > > kernel/dma/swiotlb.c | 27 +++++++++++++++++++++++++
> > > > > 4 files changed, 72 insertions(+)
> > > > >
> > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > index 3511736fbc74..b073d58dd4a3 100644
> > > > > --- a/arch/x86/kernel/setup.c
> > > > > +++ b/arch/x86/kernel/setup.c
> > > > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > > > > if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > > > > hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > > > >
> > > > > + swiotlb_adjust();
> > > > > +
> > > > > /*
> > > > > * Reserve memory for crash kernel after SRAT is parsed so that it
> > > > > * won't consume hotpluggable memory.
> > > > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > > > index 3f248f0d0e07..e0deb157cddd 100644
> > > > > --- a/arch/x86/mm/mem_encrypt.c
> > > > > +++ b/arch/x86/mm/mem_encrypt.c
> > > > > @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> > > > > pr_cont("\n");
> > > > > }
> > > > >
> > > > > +#define TOTAL_MEM_1G 0x40000000UL
> > > > > +#define TOTAL_MEM_4G 0x100000000UL
> > > > > +
> > > > > +#define SIZE_128M (128UL<<20)
> > > > > +#define SIZE_256M (256UL<<20)
> > > > > +#define SIZE_512M (512UL<<20)
> > > > > +
> > > > > /* Architecture __weak replacement functions */
> > > > > +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> > > > > +{
> > > > > + unsigned long size = 0;
> > > > > +
> > > > > + /*
> > > > > + * For SEV, all DMA has to occur via shared/unencrypted pages.
> > > > > + * SEV uses SWOTLB to make this happen without changing device
> > > > > + * drivers. However, depending on the workload being run, the
> > > > > + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> > > > > + * run out of buffers for DMA, resulting in I/O errors and/or
> > > > > + * performance degradation especially with high I/O workloads.
> > > > > + * Increase the default size of SWIOTLB for SEV guests using
> > > > > + * a minimum value of 128MB and a maximum value of 512MB,
> > > > > + * depending on amount of provisioned guest memory.
> > > > > + */
> > > > > + if (sev_active()) {
> > > > > + phys_addr_t total_mem = memblock_phys_mem_size();
> > > > > +
> > > > > + if (total_mem <= TOTAL_MEM_1G)
> > > > > + size = clamp(iotlb_default_size * 2, SIZE_128M,
> > > > > + SIZE_128M);
> > > > > + else if (total_mem <= TOTAL_MEM_4G)
> > > > > + size = clamp(iotlb_default_size * 4, SIZE_256M,
> > > > > + SIZE_256M);
> > > > > + else
> > > > > + size = clamp(iotlb_default_size * 8, SIZE_512M,
> > > > > + SIZE_512M);
> > > > > +
> > > > > + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> > > > > + size >> 20);
> > > > > + }
> > > > > +
> > > > > + return size;
> > > > > +}
> > > > > +
> > > > > void __init mem_encrypt_init(void)
> > > > > {
> > > > > if (!sme_me_mask)
> > > > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > > > > index 046bb94bd4d6..01ae6d891327 100644
> > > > > --- a/include/linux/swiotlb.h
> > > > > +++ b/include/linux/swiotlb.h
> > > > > @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
> > > > > int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
> > > > > extern unsigned long swiotlb_nr_tbl(void);
> > > > > unsigned long swiotlb_size_or_default(void);
> > > > > +extern void __init swiotlb_adjust(void);
> > > > > extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> > > > > extern void __init swiotlb_update_mem_attributes(void);
> > > > >
> > > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > > > index c19379fabd20..66a9e627bb51 100644
> > > > > --- a/kernel/dma/swiotlb.c
> > > > > +++ b/kernel/dma/swiotlb.c
> > > > > @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
> > > > > return size ? size : (IO_TLB_DEFAULT_SIZE);
> > > > > }
> > > > >
> > > > > +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> > > > > +{
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > > +void __init swiotlb_adjust(void)
> > > > > +{
> > > > > + unsigned long size;
> > > > > +
> > > > > + /*
> > > > > + * If swiotlb parameter has not been specified, give a chance to
> > > > > + * architectures such as those supporting memory encryption to
> > > > > + * adjust/expand SWIOTLB size for their use.
> > > > > + */
> > > > > + if (!io_tlb_nslabs) {
> > > > > + size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
> > > > > + if (size) {
> > > > > + size = ALIGN(size, 1 << IO_TLB_SHIFT);
> > > > > + io_tlb_nslabs = size >> IO_TLB_SHIFT;
> > > > > + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> > > > > +
> > > > > + pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
> > > > > + io_tlb_nslabs);
> > > > > + }
> > > > > + }
> > > > > +}
> > > > > +
> > > > > void swiotlb_print_info(void)
> > > > > {
> > > > > unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> > > > > --
> > > > > 2.17.1
> > > > >

2020-11-05 19:40:23

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> .
> > > Right, so I am wondering if we can do this better.
> > >
> > > That is you are never going to get any 32-bit devices with SEV right? That
> > > is there is nothing that bounds you to always use the memory below 4GB?
> > >
> >
> > We do support 32-bit PCIe passthrough devices with SEV.
>
> Ewww.. Which devices would this be?

That will be difficult to predict as customers could be doing
passthrough of all kinds of devices.

> >
> > Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> > when there is I/O pressure, because we do need to support device
> > passthrough of 32-bit devices.
>
> Presumarily there is just a handful of them?
>
Again, it will be incorrect to assume this.

> >
> > Considering this, we believe that this patch needs to adjust/extend
> > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > within a range detemined by amount of allocated guest memory.
>
> I would prefer to not have to revert this in a year as customers
> complain about "I paid $$$ and I am wasting half a gig on something
> I am not using" and giving customers knobs to tweak this instead of
> doing the right thing from the start.

Currently, we face a lot of situations where we have to tell our
internal teams/external customers to explicitly increase SWIOTLB buffer
via the swiotlb parameter on the kernel command line, especially to
get better I/O performance numbers with SEV.

So by having this SWIOTLB size adjustment done implicitly (even using a
static logic) is a great win-win situation. In other words, having even
a simple and static default increase of SWIOTLB buffer size for SEV is
really useful for us.

We can always think of adding all kinds of heuristics to this, but that
just adds too much complexity without any predictable performance gain.

And to add, the patch extends the SWIOTLB size as an architecture
specific callback, currently it is a simple and static logic for SEV/x86
specific, but there is always an option to tweak/extend it with
additional logic in the future.

Thanks,
Ashish'

>
> That is the right thing being something less static.
>
> Can you work with me on what that could be please?
>
> >
> > Thanks,
> > Ashish
> >
> > > What I wonder is if we can combine the boot-allocation of the SWIOTLB
> > > with the post-boot-allocation of SWIOLTB to stitch together
> > > continous physical ranges.
> > >
> > > That way you have the flexibility at the start of using 64MB but if there
> > > is pressure, we grow to a bigger size?
> > >
> > > >
> > > > Thanks,
> > > > Ashish
> > > >
> > > > > > memory.
> > > > > >
> > > > > > Using late_initcall() interface to invoke
> > > > > > swiotlb_adjust() does not work as the size
> > > > > > adjustment needs to be done before mem_encrypt_init()
> > > > > > and reserve_crashkernel() which use the allocated
> > > > > > SWIOTLB buffer size, hence calling it explicitly
> > > > > > from setup_arch().
> > > > > >
> > > > > > The SWIOTLB default size adjustment is added as an
> > > > > > architecture specific interface/callback to allow
> > > > > > architectures such as those supporting memory
> > > > > > encryption to adjust/expand SWIOTLB size for their
> > > > > > use.
> > > > > >
> > > > > > Signed-off-by: Ashish Kalra <[email protected]>
> > > > > > ---
> > > > > > arch/x86/kernel/setup.c | 2 ++
> > > > > > arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
> > > > > > include/linux/swiotlb.h | 1 +
> > > > > > kernel/dma/swiotlb.c | 27 +++++++++++++++++++++++++
> > > > > > 4 files changed, 72 insertions(+)
> > > > > >
> > > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > > index 3511736fbc74..b073d58dd4a3 100644
> > > > > > --- a/arch/x86/kernel/setup.c
> > > > > > +++ b/arch/x86/kernel/setup.c
> > > > > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > > > > > if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > > > > > hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > > > > >
> > > > > > + swiotlb_adjust();
> > > > > > +
> > > > > > /*
> > > > > > * Reserve memory for crash kernel after SRAT is parsed so that it
> > > > > > * won't consume hotpluggable memory.
> > > > > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > > > > index 3f248f0d0e07..e0deb157cddd 100644
> > > > > > --- a/arch/x86/mm/mem_encrypt.c
> > > > > > +++ b/arch/x86/mm/mem_encrypt.c
> > > > > > @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> > > > > > pr_cont("\n");
> > > > > > }
> > > > > >
> > > > > > +#define TOTAL_MEM_1G 0x40000000UL
> > > > > > +#define TOTAL_MEM_4G 0x100000000UL
> > > > > > +
> > > > > > +#define SIZE_128M (128UL<<20)
> > > > > > +#define SIZE_256M (256UL<<20)
> > > > > > +#define SIZE_512M (512UL<<20)
> > > > > > +
> > > > > > /* Architecture __weak replacement functions */
> > > > > > +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> > > > > > +{
> > > > > > + unsigned long size = 0;
> > > > > > +
> > > > > > + /*
> > > > > > + * For SEV, all DMA has to occur via shared/unencrypted pages.
> > > > > > + * SEV uses SWOTLB to make this happen without changing device
> > > > > > + * drivers. However, depending on the workload being run, the
> > > > > > + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> > > > > > + * run out of buffers for DMA, resulting in I/O errors and/or
> > > > > > + * performance degradation especially with high I/O workloads.
> > > > > > + * Increase the default size of SWIOTLB for SEV guests using
> > > > > > + * a minimum value of 128MB and a maximum value of 512MB,
> > > > > > + * depending on amount of provisioned guest memory.
> > > > > > + */
> > > > > > + if (sev_active()) {
> > > > > > + phys_addr_t total_mem = memblock_phys_mem_size();
> > > > > > +
> > > > > > + if (total_mem <= TOTAL_MEM_1G)
> > > > > > + size = clamp(iotlb_default_size * 2, SIZE_128M,
> > > > > > + SIZE_128M);
> > > > > > + else if (total_mem <= TOTAL_MEM_4G)
> > > > > > + size = clamp(iotlb_default_size * 4, SIZE_256M,
> > > > > > + SIZE_256M);
> > > > > > + else
> > > > > > + size = clamp(iotlb_default_size * 8, SIZE_512M,
> > > > > > + SIZE_512M);
> > > > > > +
> > > > > > + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> > > > > > + size >> 20);
> > > > > > + }
> > > > > > +
> > > > > > + return size;
> > > > > > +}
> > > > > > +
> > > > > > void __init mem_encrypt_init(void)
> > > > > > {
> > > > > > if (!sme_me_mask)
> > > > > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > > > > > index 046bb94bd4d6..01ae6d891327 100644
> > > > > > --- a/include/linux/swiotlb.h
> > > > > > +++ b/include/linux/swiotlb.h
> > > > > > @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
> > > > > > int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
> > > > > > extern unsigned long swiotlb_nr_tbl(void);
> > > > > > unsigned long swiotlb_size_or_default(void);
> > > > > > +extern void __init swiotlb_adjust(void);
> > > > > > extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> > > > > > extern void __init swiotlb_update_mem_attributes(void);
> > > > > >
> > > > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > > > > index c19379fabd20..66a9e627bb51 100644
> > > > > > --- a/kernel/dma/swiotlb.c
> > > > > > +++ b/kernel/dma/swiotlb.c
> > > > > > @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
> > > > > > return size ? size : (IO_TLB_DEFAULT_SIZE);
> > > > > > }
> > > > > >
> > > > > > +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> > > > > > +{
> > > > > > + return 0;
> > > > > > +}
> > > > > > +
> > > > > > +void __init swiotlb_adjust(void)
> > > > > > +{
> > > > > > + unsigned long size;
> > > > > > +
> > > > > > + /*
> > > > > > + * If swiotlb parameter has not been specified, give a chance to
> > > > > > + * architectures such as those supporting memory encryption to
> > > > > > + * adjust/expand SWIOTLB size for their use.
> > > > > > + */
> > > > > > + if (!io_tlb_nslabs) {
> > > > > > + size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
> > > > > > + if (size) {
> > > > > > + size = ALIGN(size, 1 << IO_TLB_SHIFT);
> > > > > > + io_tlb_nslabs = size >> IO_TLB_SHIFT;
> > > > > > + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> > > > > > +
> > > > > > + pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
> > > > > > + io_tlb_nslabs);
> > > > > > + }
> > > > > > + }
> > > > > > +}
> > > > > > +
> > > > > > void swiotlb_print_info(void)
> > > > > > {
> > > > > > unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> > > > > > --
> > > > > > 2.17.1
> > > > > >

2020-11-05 20:22:58

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Thu, Nov 05, 2020 at 07:38:28PM +0000, Ashish Kalra wrote:
> On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > .
> > > > Right, so I am wondering if we can do this better.
> > > >
> > > > That is you are never going to get any 32-bit devices with SEV right? That
> > > > is there is nothing that bounds you to always use the memory below 4GB?
> > > >
> > >
> > > We do support 32-bit PCIe passthrough devices with SEV.
> >
> > Ewww.. Which devices would this be?
>
> That will be difficult to predict as customers could be doing
> passthrough of all kinds of devices.

But SEV is not on some 1990 hardware. It has PCIe, there is no PCI slots in there.

Is it really possible to have a PCIe device that can't do more than 32-bit DMA?

>
> > >
> > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> > > when there is I/O pressure, because we do need to support device
> > > passthrough of 32-bit devices.
> >
> > Presumarily there is just a handful of them?
> >
> Again, it will be incorrect to assume this.
>
> > >
> > > Considering this, we believe that this patch needs to adjust/extend
> > > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > > within a range detemined by amount of allocated guest memory.
> >
> > I would prefer to not have to revert this in a year as customers
> > complain about "I paid $$$ and I am wasting half a gig on something
> > I am not using" and giving customers knobs to tweak this instead of
> > doing the right thing from the start.
>
> Currently, we face a lot of situations where we have to tell our
> internal teams/external customers to explicitly increase SWIOTLB buffer
> via the swiotlb parameter on the kernel command line, especially to
> get better I/O performance numbers with SEV.

Presumarily these are 64-bit?

And what devices do you speak off that are actually affected by
this performance? Increasing the SWIOTLB just means we have more
memory, which in mind means you can have _more_ devices in the guest
that won't handle the fact that DMA mapping returns an error.

Not neccessarily that one device suddenly can go faster.

>
> So by having this SWIOTLB size adjustment done implicitly (even using a
> static logic) is a great win-win situation. In other words, having even
> a simple and static default increase of SWIOTLB buffer size for SEV is
> really useful for us.
>
> We can always think of adding all kinds of heuristics to this, but that
> just adds too much complexity without any predictable performance gain.
>
> And to add, the patch extends the SWIOTLB size as an architecture
> specific callback, currently it is a simple and static logic for SEV/x86
> specific, but there is always an option to tweak/extend it with
> additional logic in the future.

Right, and that is what I would like to talk about as I think you
are going to disappear (aka, busy with other stuff) after this patch goes in.

I need to understand this more than "performance" and "internal teams"
requirements to come up with a better way going forward as surely other
platforms will hit the same issue anyhow.

Lets break this down:

How does the performance improve for one single device if you increase the SWIOTLB?
Is there a specific device/driver that you can talk about that improve with this patch?


>
> Thanks,
> Ashish'
>
> >
> > That is the right thing being something less static.
> >
> > Can you work with me on what that could be please?
> >
> > >
> > > Thanks,
> > > Ashish
> > >
> > > > What I wonder is if we can combine the boot-allocation of the SWIOTLB
> > > > with the post-boot-allocation of SWIOLTB to stitch together
> > > > continous physical ranges.
> > > >
> > > > That way you have the flexibility at the start of using 64MB but if there
> > > > is pressure, we grow to a bigger size?
> > > >
> > > > >
> > > > > Thanks,
> > > > > Ashish
> > > > >
> > > > > > > memory.
> > > > > > >
> > > > > > > Using late_initcall() interface to invoke
> > > > > > > swiotlb_adjust() does not work as the size
> > > > > > > adjustment needs to be done before mem_encrypt_init()
> > > > > > > and reserve_crashkernel() which use the allocated
> > > > > > > SWIOTLB buffer size, hence calling it explicitly
> > > > > > > from setup_arch().
> > > > > > >
> > > > > > > The SWIOTLB default size adjustment is added as an
> > > > > > > architecture specific interface/callback to allow
> > > > > > > architectures such as those supporting memory
> > > > > > > encryption to adjust/expand SWIOTLB size for their
> > > > > > > use.
> > > > > > >
> > > > > > > Signed-off-by: Ashish Kalra <[email protected]>
> > > > > > > ---
> > > > > > > arch/x86/kernel/setup.c | 2 ++
> > > > > > > arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
> > > > > > > include/linux/swiotlb.h | 1 +
> > > > > > > kernel/dma/swiotlb.c | 27 +++++++++++++++++++++++++
> > > > > > > 4 files changed, 72 insertions(+)
> > > > > > >
> > > > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > > > index 3511736fbc74..b073d58dd4a3 100644
> > > > > > > --- a/arch/x86/kernel/setup.c
> > > > > > > +++ b/arch/x86/kernel/setup.c
> > > > > > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > > > > > > if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > > > > > > hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > > > > > >
> > > > > > > + swiotlb_adjust();
> > > > > > > +
> > > > > > > /*
> > > > > > > * Reserve memory for crash kernel after SRAT is parsed so that it
> > > > > > > * won't consume hotpluggable memory.
> > > > > > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > > > > > index 3f248f0d0e07..e0deb157cddd 100644
> > > > > > > --- a/arch/x86/mm/mem_encrypt.c
> > > > > > > +++ b/arch/x86/mm/mem_encrypt.c
> > > > > > > @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> > > > > > > pr_cont("\n");
> > > > > > > }
> > > > > > >
> > > > > > > +#define TOTAL_MEM_1G 0x40000000UL
> > > > > > > +#define TOTAL_MEM_4G 0x100000000UL
> > > > > > > +
> > > > > > > +#define SIZE_128M (128UL<<20)
> > > > > > > +#define SIZE_256M (256UL<<20)
> > > > > > > +#define SIZE_512M (512UL<<20)
> > > > > > > +
> > > > > > > /* Architecture __weak replacement functions */
> > > > > > > +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> > > > > > > +{
> > > > > > > + unsigned long size = 0;
> > > > > > > +
> > > > > > > + /*
> > > > > > > + * For SEV, all DMA has to occur via shared/unencrypted pages.
> > > > > > > + * SEV uses SWOTLB to make this happen without changing device
> > > > > > > + * drivers. However, depending on the workload being run, the
> > > > > > > + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> > > > > > > + * run out of buffers for DMA, resulting in I/O errors and/or
> > > > > > > + * performance degradation especially with high I/O workloads.
> > > > > > > + * Increase the default size of SWIOTLB for SEV guests using
> > > > > > > + * a minimum value of 128MB and a maximum value of 512MB,
> > > > > > > + * depending on amount of provisioned guest memory.
> > > > > > > + */
> > > > > > > + if (sev_active()) {
> > > > > > > + phys_addr_t total_mem = memblock_phys_mem_size();
> > > > > > > +
> > > > > > > + if (total_mem <= TOTAL_MEM_1G)
> > > > > > > + size = clamp(iotlb_default_size * 2, SIZE_128M,
> > > > > > > + SIZE_128M);
> > > > > > > + else if (total_mem <= TOTAL_MEM_4G)
> > > > > > > + size = clamp(iotlb_default_size * 4, SIZE_256M,
> > > > > > > + SIZE_256M);
> > > > > > > + else
> > > > > > > + size = clamp(iotlb_default_size * 8, SIZE_512M,
> > > > > > > + SIZE_512M);
> > > > > > > +
> > > > > > > + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> > > > > > > + size >> 20);
> > > > > > > + }
> > > > > > > +
> > > > > > > + return size;
> > > > > > > +}
> > > > > > > +
> > > > > > > void __init mem_encrypt_init(void)
> > > > > > > {
> > > > > > > if (!sme_me_mask)
> > > > > > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > > > > > > index 046bb94bd4d6..01ae6d891327 100644
> > > > > > > --- a/include/linux/swiotlb.h
> > > > > > > +++ b/include/linux/swiotlb.h
> > > > > > > @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
> > > > > > > int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
> > > > > > > extern unsigned long swiotlb_nr_tbl(void);
> > > > > > > unsigned long swiotlb_size_or_default(void);
> > > > > > > +extern void __init swiotlb_adjust(void);
> > > > > > > extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> > > > > > > extern void __init swiotlb_update_mem_attributes(void);
> > > > > > >
> > > > > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > > > > > index c19379fabd20..66a9e627bb51 100644
> > > > > > > --- a/kernel/dma/swiotlb.c
> > > > > > > +++ b/kernel/dma/swiotlb.c
> > > > > > > @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
> > > > > > > return size ? size : (IO_TLB_DEFAULT_SIZE);
> > > > > > > }
> > > > > > >
> > > > > > > +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> > > > > > > +{
> > > > > > > + return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +void __init swiotlb_adjust(void)
> > > > > > > +{
> > > > > > > + unsigned long size;
> > > > > > > +
> > > > > > > + /*
> > > > > > > + * If swiotlb parameter has not been specified, give a chance to
> > > > > > > + * architectures such as those supporting memory encryption to
> > > > > > > + * adjust/expand SWIOTLB size for their use.
> > > > > > > + */
> > > > > > > + if (!io_tlb_nslabs) {
> > > > > > > + size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
> > > > > > > + if (size) {
> > > > > > > + size = ALIGN(size, 1 << IO_TLB_SHIFT);
> > > > > > > + io_tlb_nslabs = size >> IO_TLB_SHIFT;
> > > > > > > + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> > > > > > > +
> > > > > > > + pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
> > > > > > > + io_tlb_nslabs);
> > > > > > > + }
> > > > > > > + }
> > > > > > > +}
> > > > > > > +
> > > > > > > void swiotlb_print_info(void)
> > > > > > > {
> > > > > > > unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> > > > > > > --
> > > > > > > 2.17.1
> > > > > > >

2020-11-05 21:22:55

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Thu, Nov 05, 2020 at 03:20:07PM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 05, 2020 at 07:38:28PM +0000, Ashish Kalra wrote:
> > On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > > .
> > > > > Right, so I am wondering if we can do this better.
> > > > >
> > > > > That is you are never going to get any 32-bit devices with SEV right? That
> > > > > is there is nothing that bounds you to always use the memory below 4GB?
> > > > >
> > > >
> > > > We do support 32-bit PCIe passthrough devices with SEV.
> > >
> > > Ewww.. Which devices would this be?
> >
> > That will be difficult to predict as customers could be doing
> > passthrough of all kinds of devices.
>
> But SEV is not on some 1990 hardware. It has PCIe, there is no PCI slots in there.
>
> Is it really possible to have a PCIe device that can't do more than 32-bit DMA?
>
> >
> > > >
> > > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> > > > when there is I/O pressure, because we do need to support device
> > > > passthrough of 32-bit devices.
> > >
> > > Presumarily there is just a handful of them?
> > >
> > Again, it will be incorrect to assume this.
> >
> > > >
> > > > Considering this, we believe that this patch needs to adjust/extend
> > > > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > > > within a range detemined by amount of allocated guest memory.
> > >
> > > I would prefer to not have to revert this in a year as customers
> > > complain about "I paid $$$ and I am wasting half a gig on something
> > > I am not using" and giving customers knobs to tweak this instead of
> > > doing the right thing from the start.
> >
> > Currently, we face a lot of situations where we have to tell our
> > internal teams/external customers to explicitly increase SWIOTLB buffer
> > via the swiotlb parameter on the kernel command line, especially to
> > get better I/O performance numbers with SEV.
>
> Presumarily these are 64-bit?
>
> And what devices do you speak off that are actually affected by
> this performance? Increasing the SWIOTLB just means we have more
> memory, which in mind means you can have _more_ devices in the guest
> that won't handle the fact that DMA mapping returns an error.
>
> Not neccessarily that one device suddenly can go faster.
>
> >
> > So by having this SWIOTLB size adjustment done implicitly (even using a
> > static logic) is a great win-win situation. In other words, having even
> > a simple and static default increase of SWIOTLB buffer size for SEV is
> > really useful for us.
> >
> > We can always think of adding all kinds of heuristics to this, but that
> > just adds too much complexity without any predictable performance gain.
> >
> > And to add, the patch extends the SWIOTLB size as an architecture
> > specific callback, currently it is a simple and static logic for SEV/x86
> > specific, but there is always an option to tweak/extend it with
> > additional logic in the future.
>
> Right, and that is what I would like to talk about as I think you
> are going to disappear (aka, busy with other stuff) after this patch goes in.
>
> I need to understand this more than "performance" and "internal teams"
> requirements to come up with a better way going forward as surely other
> platforms will hit the same issue anyhow.
>
> Lets break this down:
>
> How does the performance improve for one single device if you increase the SWIOTLB?
> Is there a specific device/driver that you can talk about that improve with this patch?
>
>

Yes, these are mainly for multi-queue devices such as NICs or even
multi-queue virtio.

This basically improves performance with concurrent DMA, hence,
basically multi-queue devices.

Thanks,
Ashish

> >
> > >
> > > That is the right thing being something less static.
> > >
> > > Can you work with me on what that could be please?
> > >
> > > >
> > > > Thanks,
> > > > Ashish
> > > >
> > > > > What I wonder is if we can combine the boot-allocation of the SWIOTLB
> > > > > with the post-boot-allocation of SWIOLTB to stitch together
> > > > > continous physical ranges.
> > > > >
> > > > > That way you have the flexibility at the start of using 64MB but if there
> > > > > is pressure, we grow to a bigger size?
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Ashish
> > > > > >
> > > > > > > > memory.
> > > > > > > >
> > > > > > > > Using late_initcall() interface to invoke
> > > > > > > > swiotlb_adjust() does not work as the size
> > > > > > > > adjustment needs to be done before mem_encrypt_init()
> > > > > > > > and reserve_crashkernel() which use the allocated
> > > > > > > > SWIOTLB buffer size, hence calling it explicitly
> > > > > > > > from setup_arch().
> > > > > > > >
> > > > > > > > The SWIOTLB default size adjustment is added as an
> > > > > > > > architecture specific interface/callback to allow
> > > > > > > > architectures such as those supporting memory
> > > > > > > > encryption to adjust/expand SWIOTLB size for their
> > > > > > > > use.
> > > > > > > >
> > > > > > > > Signed-off-by: Ashish Kalra <[email protected]>
> > > > > > > > ---
> > > > > > > > arch/x86/kernel/setup.c | 2 ++
> > > > > > > > arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
> > > > > > > > include/linux/swiotlb.h | 1 +
> > > > > > > > kernel/dma/swiotlb.c | 27 +++++++++++++++++++++++++
> > > > > > > > 4 files changed, 72 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > > > > > > > index 3511736fbc74..b073d58dd4a3 100644
> > > > > > > > --- a/arch/x86/kernel/setup.c
> > > > > > > > +++ b/arch/x86/kernel/setup.c
> > > > > > > > @@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
> > > > > > > > if (boot_cpu_has(X86_FEATURE_GBPAGES))
> > > > > > > > hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> > > > > > > >
> > > > > > > > + swiotlb_adjust();
> > > > > > > > +
> > > > > > > > /*
> > > > > > > > * Reserve memory for crash kernel after SRAT is parsed so that it
> > > > > > > > * won't consume hotpluggable memory.
> > > > > > > > diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
> > > > > > > > index 3f248f0d0e07..e0deb157cddd 100644
> > > > > > > > --- a/arch/x86/mm/mem_encrypt.c
> > > > > > > > +++ b/arch/x86/mm/mem_encrypt.c
> > > > > > > > @@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
> > > > > > > > pr_cont("\n");
> > > > > > > > }
> > > > > > > >
> > > > > > > > +#define TOTAL_MEM_1G 0x40000000UL
> > > > > > > > +#define TOTAL_MEM_4G 0x100000000UL
> > > > > > > > +
> > > > > > > > +#define SIZE_128M (128UL<<20)
> > > > > > > > +#define SIZE_256M (256UL<<20)
> > > > > > > > +#define SIZE_512M (512UL<<20)
> > > > > > > > +
> > > > > > > > /* Architecture __weak replacement functions */
> > > > > > > > +unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
> > > > > > > > +{
> > > > > > > > + unsigned long size = 0;
> > > > > > > > +
> > > > > > > > + /*
> > > > > > > > + * For SEV, all DMA has to occur via shared/unencrypted pages.
> > > > > > > > + * SEV uses SWOTLB to make this happen without changing device
> > > > > > > > + * drivers. However, depending on the workload being run, the
> > > > > > > > + * default 64MB of SWIOTLB may not be enough & SWIOTLB may
> > > > > > > > + * run out of buffers for DMA, resulting in I/O errors and/or
> > > > > > > > + * performance degradation especially with high I/O workloads.
> > > > > > > > + * Increase the default size of SWIOTLB for SEV guests using
> > > > > > > > + * a minimum value of 128MB and a maximum value of 512MB,
> > > > > > > > + * depending on amount of provisioned guest memory.
> > > > > > > > + */
> > > > > > > > + if (sev_active()) {
> > > > > > > > + phys_addr_t total_mem = memblock_phys_mem_size();
> > > > > > > > +
> > > > > > > > + if (total_mem <= TOTAL_MEM_1G)
> > > > > > > > + size = clamp(iotlb_default_size * 2, SIZE_128M,
> > > > > > > > + SIZE_128M);
> > > > > > > > + else if (total_mem <= TOTAL_MEM_4G)
> > > > > > > > + size = clamp(iotlb_default_size * 4, SIZE_256M,
> > > > > > > > + SIZE_256M);
> > > > > > > > + else
> > > > > > > > + size = clamp(iotlb_default_size * 8, SIZE_512M,
> > > > > > > > + SIZE_512M);
> > > > > > > > +
> > > > > > > > + pr_info("SEV adjusted max SWIOTLB size = %luMB",
> > > > > > > > + size >> 20);
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + return size;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > void __init mem_encrypt_init(void)
> > > > > > > > {
> > > > > > > > if (!sme_me_mask)
> > > > > > > > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > > > > > > > index 046bb94bd4d6..01ae6d891327 100644
> > > > > > > > --- a/include/linux/swiotlb.h
> > > > > > > > +++ b/include/linux/swiotlb.h
> > > > > > > > @@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
> > > > > > > > int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
> > > > > > > > extern unsigned long swiotlb_nr_tbl(void);
> > > > > > > > unsigned long swiotlb_size_or_default(void);
> > > > > > > > +extern void __init swiotlb_adjust(void);
> > > > > > > > extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> > > > > > > > extern void __init swiotlb_update_mem_attributes(void);
> > > > > > > >
> > > > > > > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > > > > > > index c19379fabd20..66a9e627bb51 100644
> > > > > > > > --- a/kernel/dma/swiotlb.c
> > > > > > > > +++ b/kernel/dma/swiotlb.c
> > > > > > > > @@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
> > > > > > > > return size ? size : (IO_TLB_DEFAULT_SIZE);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
> > > > > > > > +{
> > > > > > > > + return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +void __init swiotlb_adjust(void)
> > > > > > > > +{
> > > > > > > > + unsigned long size;
> > > > > > > > +
> > > > > > > > + /*
> > > > > > > > + * If swiotlb parameter has not been specified, give a chance to
> > > > > > > > + * architectures such as those supporting memory encryption to
> > > > > > > > + * adjust/expand SWIOTLB size for their use.
> > > > > > > > + */
> > > > > > > > + if (!io_tlb_nslabs) {
> > > > > > > > + size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
> > > > > > > > + if (size) {
> > > > > > > > + size = ALIGN(size, 1 << IO_TLB_SHIFT);
> > > > > > > > + io_tlb_nslabs = size >> IO_TLB_SHIFT;
> > > > > > > > + io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
> > > > > > > > +
> > > > > > > > + pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
> > > > > > > > + io_tlb_nslabs);
> > > > > > > > + }
> > > > > > > > + }
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > void swiotlb_print_info(void)
> > > > > > > > {
> > > > > > > > unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
> > > > > > > > --
> > > > > > > > 2.17.1
> > > > > > > >

2020-11-06 18:28:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> .
> > > Right, so I am wondering if we can do this better.
> > >
> > > That is you are never going to get any 32-bit devices with SEV right? That
> > > is there is nothing that bounds you to always use the memory below 4GB?
> > >
> >
> > We do support 32-bit PCIe passthrough devices with SEV.
>
> Ewww.. Which devices would this be?

There is still some new broken shit like that that keeps appearing.
GPU is pretty famouts for supporting less than 64-bit addressing,
even if it isn't all the way down to 32-bit.

2020-11-13 21:28:21

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Thu, Nov 05, 2020 at 09:20:45PM +0000, Ashish Kalra wrote:
> On Thu, Nov 05, 2020 at 03:20:07PM -0500, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 05, 2020 at 07:38:28PM +0000, Ashish Kalra wrote:
> > > On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > .
> > > > > > Right, so I am wondering if we can do this better.
> > > > > >
> > > > > > That is you are never going to get any 32-bit devices with SEV right? That
> > > > > > is there is nothing that bounds you to always use the memory below 4GB?
> > > > > >
> > > > >
> > > > > We do support 32-bit PCIe passthrough devices with SEV.
> > > >
> > > > Ewww.. Which devices would this be?
> > >
> > > That will be difficult to predict as customers could be doing
> > > passthrough of all kinds of devices.
> >
> > But SEV is not on some 1990 hardware. It has PCIe, there is no PCI slots in there.
> >
> > Is it really possible to have a PCIe device that can't do more than 32-bit DMA?
> >
> > >
> > > > >
> > > > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> > > > > when there is I/O pressure, because we do need to support device
> > > > > passthrough of 32-bit devices.
> > > >
> > > > Presumarily there is just a handful of them?
> > > >
> > > Again, it will be incorrect to assume this.
> > >
> > > > >
> > > > > Considering this, we believe that this patch needs to adjust/extend
> > > > > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > > > > within a range detemined by amount of allocated guest memory.
> > > >
> > > > I would prefer to not have to revert this in a year as customers
> > > > complain about "I paid $$$ and I am wasting half a gig on something
> > > > I am not using" and giving customers knobs to tweak this instead of
> > > > doing the right thing from the start.
> > >
> > > Currently, we face a lot of situations where we have to tell our
> > > internal teams/external customers to explicitly increase SWIOTLB buffer
> > > via the swiotlb parameter on the kernel command line, especially to
> > > get better I/O performance numbers with SEV.
> >
> > Presumarily these are 64-bit?
> >
> > And what devices do you speak off that are actually affected by
> > this performance? Increasing the SWIOTLB just means we have more
> > memory, which in mind means you can have _more_ devices in the guest
> > that won't handle the fact that DMA mapping returns an error.
> >
> > Not neccessarily that one device suddenly can go faster.
> >
> > >
> > > So by having this SWIOTLB size adjustment done implicitly (even using a
> > > static logic) is a great win-win situation. In other words, having even
> > > a simple and static default increase of SWIOTLB buffer size for SEV is
> > > really useful for us.
> > >
> > > We can always think of adding all kinds of heuristics to this, but that
> > > just adds too much complexity without any predictable performance gain.
> > >
> > > And to add, the patch extends the SWIOTLB size as an architecture
> > > specific callback, currently it is a simple and static logic for SEV/x86
> > > specific, but there is always an option to tweak/extend it with
> > > additional logic in the future.
> >
> > Right, and that is what I would like to talk about as I think you
> > are going to disappear (aka, busy with other stuff) after this patch goes in.
> >
> > I need to understand this more than "performance" and "internal teams"
> > requirements to come up with a better way going forward as surely other
> > platforms will hit the same issue anyhow.
> >
> > Lets break this down:
> >
> > How does the performance improve for one single device if you increase the SWIOTLB?
> > Is there a specific device/driver that you can talk about that improve with this patch?
> >
> >
>
> Yes, these are mainly for multi-queue devices such as NICs or even
> multi-queue virtio.
>
> This basically improves performance with concurrent DMA, hence,
> basically multi-queue devices.

OK, and for _1GB_ guest - what are the "internal teams/external customers" amount
of CPUs they use? Please lets use real use-cases.

2020-11-13 22:13:57

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hello Konrad,

On Fri, Nov 13, 2020 at 04:19:25PM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 05, 2020 at 09:20:45PM +0000, Ashish Kalra wrote:
> > On Thu, Nov 05, 2020 at 03:20:07PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Nov 05, 2020 at 07:38:28PM +0000, Ashish Kalra wrote:
> > > > On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > .
> > > > > > > Right, so I am wondering if we can do this better.
> > > > > > >
> > > > > > > That is you are never going to get any 32-bit devices with SEV right? That
> > > > > > > is there is nothing that bounds you to always use the memory below 4GB?
> > > > > > >
> > > > > >
> > > > > > We do support 32-bit PCIe passthrough devices with SEV.
> > > > >
> > > > > Ewww.. Which devices would this be?
> > > >
> > > > That will be difficult to predict as customers could be doing
> > > > passthrough of all kinds of devices.
> > >
> > > But SEV is not on some 1990 hardware. It has PCIe, there is no PCI slots in there.
> > >
> > > Is it really possible to have a PCIe device that can't do more than 32-bit DMA?
> > >
> > > >
> > > > > >
> > > > > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> > > > > > when there is I/O pressure, because we do need to support device
> > > > > > passthrough of 32-bit devices.
> > > > >
> > > > > Presumarily there is just a handful of them?
> > > > >
> > > > Again, it will be incorrect to assume this.
> > > >
> > > > > >
> > > > > > Considering this, we believe that this patch needs to adjust/extend
> > > > > > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > > > > > within a range detemined by amount of allocated guest memory.
> > > > >
> > > > > I would prefer to not have to revert this in a year as customers
> > > > > complain about "I paid $$$ and I am wasting half a gig on something
> > > > > I am not using" and giving customers knobs to tweak this instead of
> > > > > doing the right thing from the start.
> > > >
> > > > Currently, we face a lot of situations where we have to tell our
> > > > internal teams/external customers to explicitly increase SWIOTLB buffer
> > > > via the swiotlb parameter on the kernel command line, especially to
> > > > get better I/O performance numbers with SEV.
> > >
> > > Presumarily these are 64-bit?
> > >
> > > And what devices do you speak off that are actually affected by
> > > this performance? Increasing the SWIOTLB just means we have more
> > > memory, which in mind means you can have _more_ devices in the guest
> > > that won't handle the fact that DMA mapping returns an error.
> > >
> > > Not neccessarily that one device suddenly can go faster.
> > >
> > > >
> > > > So by having this SWIOTLB size adjustment done implicitly (even using a
> > > > static logic) is a great win-win situation. In other words, having even
> > > > a simple and static default increase of SWIOTLB buffer size for SEV is
> > > > really useful for us.
> > > >
> > > > We can always think of adding all kinds of heuristics to this, but that
> > > > just adds too much complexity without any predictable performance gain.
> > > >
> > > > And to add, the patch extends the SWIOTLB size as an architecture
> > > > specific callback, currently it is a simple and static logic for SEV/x86
> > > > specific, but there is always an option to tweak/extend it with
> > > > additional logic in the future.
> > >
> > > Right, and that is what I would like to talk about as I think you
> > > are going to disappear (aka, busy with other stuff) after this patch goes in.
> > >
> > > I need to understand this more than "performance" and "internal teams"
> > > requirements to come up with a better way going forward as surely other
> > > platforms will hit the same issue anyhow.
> > >
> > > Lets break this down:
> > >
> > > How does the performance improve for one single device if you increase the SWIOTLB?
> > > Is there a specific device/driver that you can talk about that improve with this patch?
> > >
> > >
> >
> > Yes, these are mainly for multi-queue devices such as NICs or even
> > multi-queue virtio.
> >
> > This basically improves performance with concurrent DMA, hence,
> > basically multi-queue devices.
>
> OK, and for _1GB_ guest - what are the "internal teams/external customers" amount
> of CPUs they use? Please lets use real use-cases.

I will get back to you with some real use-case data for the above guest
configurations next week.

Thanks,
Ashish

2020-11-17 15:36:42

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Fri, Nov 13, 2020 at 04:19:25PM -0500, Konrad Rzeszutek Wilk wrote:
> On Thu, Nov 05, 2020 at 09:20:45PM +0000, Ashish Kalra wrote:
> > On Thu, Nov 05, 2020 at 03:20:07PM -0500, Konrad Rzeszutek Wilk wrote:
> > > On Thu, Nov 05, 2020 at 07:38:28PM +0000, Ashish Kalra wrote:
> > > > On Thu, Nov 05, 2020 at 02:06:49PM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > .
> > > > > > > Right, so I am wondering if we can do this better.
> > > > > > >
> > > > > > > That is you are never going to get any 32-bit devices with SEV right? That
> > > > > > > is there is nothing that bounds you to always use the memory below 4GB?
> > > > > > >
> > > > > >
> > > > > > We do support 32-bit PCIe passthrough devices with SEV.
> > > > >
> > > > > Ewww.. Which devices would this be?
> > > >
> > > > That will be difficult to predict as customers could be doing
> > > > passthrough of all kinds of devices.
> > >
> > > But SEV is not on some 1990 hardware. It has PCIe, there is no PCI slots in there.
> > >
> > > Is it really possible to have a PCIe device that can't do more than 32-bit DMA?
> > >
> > > >
> > > > > >
> > > > > > Therefore, we can't just depend on >4G memory for SWIOTLB bounce buffering
> > > > > > when there is I/O pressure, because we do need to support device
> > > > > > passthrough of 32-bit devices.
> > > > >
> > > > > Presumarily there is just a handful of them?
> > > > >
> > > > Again, it will be incorrect to assume this.
> > > >
> > > > > >
> > > > > > Considering this, we believe that this patch needs to adjust/extend
> > > > > > boot-allocation of SWIOTLB and we want to keep it simple to do this
> > > > > > within a range detemined by amount of allocated guest memory.
> > > > >
> > > > > I would prefer to not have to revert this in a year as customers
> > > > > complain about "I paid $$$ and I am wasting half a gig on something
> > > > > I am not using" and giving customers knobs to tweak this instead of
> > > > > doing the right thing from the start.
> > > >
> > > > Currently, we face a lot of situations where we have to tell our
> > > > internal teams/external customers to explicitly increase SWIOTLB buffer
> > > > via the swiotlb parameter on the kernel command line, especially to
> > > > get better I/O performance numbers with SEV.
> > >
> > > Presumarily these are 64-bit?
> > >
> > > And what devices do you speak off that are actually affected by
> > > this performance? Increasing the SWIOTLB just means we have more
> > > memory, which in mind means you can have _more_ devices in the guest
> > > that won't handle the fact that DMA mapping returns an error.
> > >
> > > Not neccessarily that one device suddenly can go faster.
> > >
> > > >
> > > > So by having this SWIOTLB size adjustment done implicitly (even using a
> > > > static logic) is a great win-win situation. In other words, having even
> > > > a simple and static default increase of SWIOTLB buffer size for SEV is
> > > > really useful for us.
> > > >
> > > > We can always think of adding all kinds of heuristics to this, but that
> > > > just adds too much complexity without any predictable performance gain.
> > > >
> > > > And to add, the patch extends the SWIOTLB size as an architecture
> > > > specific callback, currently it is a simple and static logic for SEV/x86
> > > > specific, but there is always an option to tweak/extend it with
> > > > additional logic in the future.
> > >
> > > Right, and that is what I would like to talk about as I think you
> > > are going to disappear (aka, busy with other stuff) after this patch goes in.
> > >
> > > I need to understand this more than "performance" and "internal teams"
> > > requirements to come up with a better way going forward as surely other
> > > platforms will hit the same issue anyhow.
> > >
> > > Lets break this down:
> > >
> > > How does the performance improve for one single device if you increase the SWIOTLB?
> > > Is there a specific device/driver that you can talk about that improve with this patch?
> > >
> > >
> >
> > Yes, these are mainly for multi-queue devices such as NICs or even
> > multi-queue virtio.
> >
> > This basically improves performance with concurrent DMA, hence,
> > basically multi-queue devices.
>
> OK, and for _1GB_ guest - what are the "internal teams/external customers" amount
> of CPUs they use? Please lets use real use-cases.

>> I am sure you will understand we cannot share any external customer
>> data as all that customer information is proprietary.
>>
>> In similar situation if you have to share Oracle data, you will
>> surely have the same concerns and i don't think you will be able
>> to share any such information externally, i.e., outside Oracle.
>>
>I am asking for a simple query - what amount of CPUs does a 1GB
>guest have? The reason for this should be fairly obvious - if
>it is a 1vCPU, then there is no multi-queue and the existing
>SWIOTLB pool size as it is OK.
>
>If however there are say 2 and multiqueue is enabled, that
>gives me an idea of how many you use and I can find out what
>the maximum pool size usage of virtio there is with that configuration.

Again we cannot share any customer data.

Also i don't think there can be a definitive answer to how many vCPUs a
1GB guest will have, it will depend on what kind of configuration we are
testing.

For example, i usually setup 4-16 vCPUs for as low as 512M configured
gueest memory.

I have been also testing with 16 vCPUs configuration for 512M-1G guest
memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
device environment.

So we might be having less configured guest memory, but we still might
be using that configuration with I/O intensive workloads.

Thanks,
Ashish

2020-11-17 17:02:48

by Konrad Rzeszutek Wilk

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

.snip..
> > > > Lets break this down:
> > > >
> > > > How does the performance improve for one single device if you increase the SWIOTLB?
> > > > Is there a specific device/driver that you can talk about that improve with this patch?
> > > >
> > > >
> > >
> > > Yes, these are mainly for multi-queue devices such as NICs or even
> > > multi-queue virtio.
> > >
> > > This basically improves performance with concurrent DMA, hence,
> > > basically multi-queue devices.
> >
> > OK, and for _1GB_ guest - what are the "internal teams/external customers" amount
> > of CPUs they use? Please lets use real use-cases.
>
> >> I am sure you will understand we cannot share any external customer
> >> data as all that customer information is proprietary.
> >>
> >> In similar situation if you have to share Oracle data, you will
> >> surely have the same concerns and i don't think you will be able
> >> to share any such information externally, i.e., outside Oracle.
> >>
> >I am asking for a simple query - what amount of CPUs does a 1GB
> >guest have? The reason for this should be fairly obvious - if
> >it is a 1vCPU, then there is no multi-queue and the existing
> >SWIOTLB pool size as it is OK.
> >
> >If however there are say 2 and multiqueue is enabled, that
> >gives me an idea of how many you use and I can find out what
> >the maximum pool size usage of virtio there is with that configuration.
>
> Again we cannot share any customer data.
>
> Also i don't think there can be a definitive answer to how many vCPUs a
> 1GB guest will have, it will depend on what kind of configuration we are
> testing.
>
> For example, i usually setup 4-16 vCPUs for as low as 512M configured
> gueest memory.

Sure, but you are not the normal user.

That is I don't like that for 1GB guests your patch ends up doubling the
SWIOTLB memory pool. That seems to me we are trying to solve a problem
that normal users will not hit. That is why I want 'here is the customer
bug'.

Here is what I am going to do - I will take out the 1GB and 4GB case out of
your patch and call it a day. If there are customers who start reporting issues
we can revist that. Nothing wrong with 'Reported-by' XZY (we often ask the
customer if he or she would like to be recognized on upstream bugs).

And in the meantime I am going to look about adding ..
>
> I have been also testing with 16 vCPUs configuration for 512M-1G guest
> memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
> device environment.

.. late SWIOTLB expansion to stich the DMA pools together as both
Mellanox and VirtIO are not 32-bit DMA bound.

>
> So we might be having less configured guest memory, but we still might
> be using that configuration with I/O intensive workloads.
>
> Thanks,
> Ashish

2020-11-17 17:40:36

by Kalra, Ashish

[permalink] [raw]
Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hello Konrad,

On Tue, Nov 17, 2020 at 12:00:03PM -0500, Konrad Rzeszutek Wilk wrote:
> .snip..
> > > > > Lets break this down:
> > > > >
> > > > > How does the performance improve for one single device if you increase the SWIOTLB?
> > > > > Is there a specific device/driver that you can talk about that improve with this patch?
> > > > >
> > > > >
> > > >
> > > > Yes, these are mainly for multi-queue devices such as NICs or even
> > > > multi-queue virtio.
> > > >
> > > > This basically improves performance with concurrent DMA, hence,
> > > > basically multi-queue devices.
> > >
> > > OK, and for _1GB_ guest - what are the "internal teams/external customers" amount
> > > of CPUs they use? Please lets use real use-cases.
> >
> > >> I am sure you will understand we cannot share any external customer
> > >> data as all that customer information is proprietary.
> > >>
> > >> In similar situation if you have to share Oracle data, you will
> > >> surely have the same concerns and i don't think you will be able
> > >> to share any such information externally, i.e., outside Oracle.
> > >>
> > >I am asking for a simple query - what amount of CPUs does a 1GB
> > >guest have? The reason for this should be fairly obvious - if
> > >it is a 1vCPU, then there is no multi-queue and the existing
> > >SWIOTLB pool size as it is OK.
> > >
> > >If however there are say 2 and multiqueue is enabled, that
> > >gives me an idea of how many you use and I can find out what
> > >the maximum pool size usage of virtio there is with that configuration.
> >
> > Again we cannot share any customer data.
> >
> > Also i don't think there can be a definitive answer to how many vCPUs a
> > 1GB guest will have, it will depend on what kind of configuration we are
> > testing.
> >
> > For example, i usually setup 4-16 vCPUs for as low as 512M configured
> > gueest memory.
>
> Sure, but you are not the normal user.
>
> That is I don't like that for 1GB guests your patch ends up doubling the
> SWIOTLB memory pool. That seems to me we are trying to solve a problem
> that normal users will not hit. That is why I want 'here is the customer
> bug'.
>
> Here is what I am going to do - I will take out the 1GB and 4GB case out of
> your patch and call it a day. If there are customers who start reporting issues
> we can revist that. Nothing wrong with 'Reported-by' XZY (we often ask the
> customer if he or she would like to be recognized on upstream bugs).
>

Ok.

> And in the meantime I am going to look about adding ..
> >
> > I have been also testing with 16 vCPUs configuration for 512M-1G guest
> > memory with Mellanox SRIOV NICs, and this will be a multi-queue NIC
> > device environment.
>
> .. late SWIOTLB expansion to stich the DMA pools together as both
> Mellanox and VirtIO are not 32-bit DMA bound.
>
> >
> > So we might be having less configured guest memory, but we still might
> > be using that configuration with I/O intensive workloads.
> >

I am going to submit v4 of my current patch-set which uses max() instead
of clamp() and also replaces constants defined in this patch with the
pre-defined ones in sizes.h

Thanks,
Ashish