2020-11-05 02:20:26

by Kalra, Ashish

Subject: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

From: Ashish Kalra <[email protected]>

For SEV, all DMA to and from the guest has to use shared
(unencrypted) pages. SEV uses SWIOTLB to make this
happen without requiring changes to device drivers.
However, depending on the workload being run, the default
64MB of SWIOTLB might not be enough and SWIOTLB
may run out of buffers to use for DMA, resulting
in I/O errors and/or performance degradation for
high-I/O workloads.

Increase the default size of SWIOTLB for SEV guests
using a minimum value of 128MB and a maximum value
of 512MB, depending on the amount of provisioned
guest memory.

Using the late_initcall() interface to invoke
swiotlb_adjust() does not work, as the size
adjustment needs to be done before mem_encrypt_init()
and reserve_crashkernel(), which use the allocated
SWIOTLB buffer size; hence it is called explicitly
from setup_arch().

The SWIOTLB default size adjustment is added as an
architecture-specific interface/callback to allow
architectures such as those supporting memory
encryption to adjust/expand the SWIOTLB size for
their use.

Signed-off-by: Ashish Kalra <[email protected]>
---
 arch/x86/kernel/setup.c   |  2 ++
 arch/x86/mm/mem_encrypt.c | 42 +++++++++++++++++++++++++++++++++++++++
 include/linux/swiotlb.h   |  1 +
 kernel/dma/swiotlb.c      | 27 +++++++++++++++++++++++++
 4 files changed, 72 insertions(+)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3511736fbc74..b073d58dd4a3 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -1166,6 +1166,8 @@ void __init setup_arch(char **cmdline_p)
 	if (boot_cpu_has(X86_FEATURE_GBPAGES))
 		hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
 
+	swiotlb_adjust();
+
 	/*
 	 * Reserve memory for crash kernel after SRAT is parsed so that it
 	 * won't consume hotpluggable memory.
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 3f248f0d0e07..e0deb157cddd 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -489,7 +489,49 @@ static void print_mem_encrypt_feature_info(void)
 	pr_cont("\n");
 }
 
+#define TOTAL_MEM_1G 0x40000000UL
+#define TOTAL_MEM_4G 0x100000000UL
+
+#define SIZE_128M (128UL<<20)
+#define SIZE_256M (256UL<<20)
+#define SIZE_512M (512UL<<20)
+
 /* Architecture __weak replacement functions */
+unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
+{
+	unsigned long size = 0;
+
+	/*
+	 * For SEV, all DMA has to occur via shared/unencrypted pages.
+	 * SEV uses SWIOTLB to make this happen without changing device
+	 * drivers. However, depending on the workload being run, the
+	 * default 64MB of SWIOTLB may not be enough and SWIOTLB may
+	 * run out of buffers for DMA, resulting in I/O errors and/or
+	 * performance degradation, especially with high I/O workloads.
+	 * Increase the default size of SWIOTLB for SEV guests using
+	 * a minimum value of 128MB and a maximum value of 512MB,
+	 * depending on the amount of provisioned guest memory.
+	 */
+	if (sev_active()) {
+		phys_addr_t total_mem = memblock_phys_mem_size();
+
+		if (total_mem <= TOTAL_MEM_1G)
+			size = clamp(iotlb_default_size * 2, SIZE_128M,
+				     SIZE_128M);
+		else if (total_mem <= TOTAL_MEM_4G)
+			size = clamp(iotlb_default_size * 4, SIZE_256M,
+				     SIZE_256M);
+		else
+			size = clamp(iotlb_default_size * 8, SIZE_512M,
+				     SIZE_512M);
+
+		pr_info("SEV adjusted max SWIOTLB size = %luMB\n",
+			size >> 20);
+	}
+
+	return size;
+}
+
 void __init mem_encrypt_init(void)
 {
 	if (!sme_me_mask)
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 046bb94bd4d6..01ae6d891327 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -33,6 +33,7 @@ extern void swiotlb_init(int verbose);
 int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
 extern unsigned long swiotlb_nr_tbl(void);
 unsigned long swiotlb_size_or_default(void);
+extern void __init swiotlb_adjust(void);
 extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
 extern void __init swiotlb_update_mem_attributes(void);
 
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c19379fabd20..66a9e627bb51 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -163,6 +163,33 @@ unsigned long swiotlb_size_or_default(void)
 	return size ? size : (IO_TLB_DEFAULT_SIZE);
 }
 
+unsigned long __init __weak arch_swiotlb_adjust(unsigned long size)
+{
+	return 0;
+}
+
+void __init swiotlb_adjust(void)
+{
+	unsigned long size;
+
+	/*
+	 * If the swiotlb parameter has not been specified, give a chance to
+	 * architectures such as those supporting memory encryption to
+	 * adjust/expand the SWIOTLB size for their use.
+	 */
+	if (!io_tlb_nslabs) {
+		size = arch_swiotlb_adjust(IO_TLB_DEFAULT_SIZE);
+		if (size) {
+			size = ALIGN(size, 1 << IO_TLB_SHIFT);
+			io_tlb_nslabs = size >> IO_TLB_SHIFT;
+			io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
+
+			pr_info("architecture adjusted SWIOTLB slabs = %lu\n",
+				io_tlb_nslabs);
+		}
+	}
+}
+
 void swiotlb_print_info(void)
 {
 	unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
--
2.17.1
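
For reference, a quick worked example of the slab arithmetic done by
swiotlb_adjust() above, assuming the in-tree values IO_TLB_SHIFT = 11
(i.e. 2KB slabs) and IO_TLB_SEGSIZE = 128 from <linux/swiotlb.h>; this is
only an illustration, not part of the patch:

	/* arch_swiotlb_adjust() asked for 256MB on a <=4G SEV guest */
	size = ALIGN(SIZE_256M, 1 << IO_TLB_SHIFT);           /* 268435456, already 2KB aligned */
	io_tlb_nslabs = size >> IO_TLB_SHIFT;                 /* 268435456 / 2048 = 131072 slabs */
	io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE); /* 131072 is already a multiple of 128 */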


2020-11-17 20:32:23

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

On Tue, Nov 17, 2020 at 07:04:59PM +0000, Kalra, Ashish wrote:
> Hello Konrad,
>
> Actually I didn't get that; do you mean you are taking the 1G and <=4G cases out of the patch and only going to apply the >4G case as part of the patch?

That was the thought, but now I am wondering how TDX is going to work
with this. That is, the __weak override won't work on a distro kernel
that has to run on both AMD and Intel. Hmm.

Let me brush off the late-SWIOTLB patches that we developed internally some time ago.
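
A minimal sketch of how a single x86-level override of the __weak hook
could handle both vendors with runtime checks, so that one distro kernel
image works for AMD SEV and (hypothetically) Intel TDX guests. Note that
tdx_active() is a made-up placeholder here (no such helper existed at the
time of this thread), while sev_active(), memblock_phys_mem_size() and
the SZ_* constants from <linux/sizes.h> are real interfaces; the sizing
policy itself is illustrative only:

unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
{
	/* Only memory-encrypted guests need a larger bounce buffer. */
	if (!sev_active() && !tdx_active())
		return 0;			/* keep the 64MB default */

	/* Scale the pool with provisioned guest memory. */
	if (memblock_phys_mem_size() <= SZ_1G)
		return SZ_128M;
	else if (memblock_phys_mem_size() <= SZ_4G)
		return SZ_256M;

	return SZ_512M;
}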

>
> Thanks,
> Ashish
>
> > On Nov 17, 2020, at 11:38 AM, Kalra, Ashish <[email protected]> wrote:
> >
> > Hello Konrad,
> >
> >> On Tue, Nov 17, 2020 at 12:00:03PM -0500, Konrad Rzeszutek Wilk wrote:
> >> .snip..
> >>>>>> Let's break this down:
> >>>>>>
> >>>>>> How does the performance improve for a single device if you increase the SWIOTLB?
> >>>>>> Is there a specific device/driver that you can talk about that improves with this patch?
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> Yes, these are mainly for multi-queue devices such as NICs or even
> >>>>> multi-queue virtio.
> >>>>>
> >>>>> This basically improves performance with concurrent DMA, hence for
> >>>>> multi-queue devices.
> >>>>
> >>>> OK, and for a _1GB_ guest - how many CPUs do the "internal teams/external
> >>>> customers" use? Please let's use real use-cases.
> >>>
> >>>>> I am sure you will understand we cannot share any external customer
> >>>>> data as all that customer information is proprietary.
> >>>>>
> >>>>> In a similar situation, if you had to share Oracle data, you would
> >>>>> surely have the same concerns, and I don't think you would be able
> >>>>> to share any such information externally, i.e., outside Oracle.
> >>>>>
> >>>> I am asking a simple query - how many CPUs does a 1GB
> >>>> guest have? The reason for this should be fairly obvious - if
> >>>> it is 1 vCPU, then there is no multi-queue and the existing
> >>>> SWIOTLB pool size is OK as it is.
> >>>>
> >>>> If, however, there are say 2 and multi-queue is enabled, that
> >>>> gives me an idea of how many you use and I can find out what
> >>>> the maximum pool size usage of virtio is with that configuration.
> >>>
> >>> Again we cannot share any customer data.
> >>>
> >>> Also I don't think there can be a definitive answer to how many vCPUs a
> >>> 1GB guest will have; it will depend on what kind of configuration we are
> >>> testing.
> >>>
> >>> For example, I usually set up 4-16 vCPUs for as low as 512M of configured
> >>> guest memory.
> >>
> >> Sure, but you are not the normal user.
> >>
> >> That is, I don't like that for 1GB guests your patch ends up doubling the
> >> SWIOTLB memory pool. That seems to me like we are trying to solve a problem
> >> that normal users will not hit. That is why I want 'here is the customer
> >> bug'.
> >>
> >> Here is what I am going to do - I will take the 1GB and 4GB cases out of
> >> your patch and call it a day. If there are customers who start reporting issues
> >> we can revisit that. Nothing wrong with a 'Reported-by' XYZ (we often ask the
> >> customer if he or she would like to be recognized on upstream bugs).
> >>
> >
> > Ok.
> >
> >> And in the meantime I am going to look into adding ..
> >>>
> >>> I have also been testing with a 16-vCPU configuration for 512M-1G guest
> >>> memory with Mellanox SR-IOV NICs, and this will be a multi-queue NIC
> >>> device environment.
> >>
> >> .. late SWIOTLB expansion to stitch the DMA pools together, as both
> >> Mellanox and VirtIO are not 32-bit DMA bound.
> >>
> >>>
> >>> So we might have less configured guest memory, but we still might
> >>> be using that configuration with I/O-intensive workloads.
> >>>
> >
> > I am going to submit v4 of my current patch-set, which uses max() instead
> > of clamp() and also replaces the constants defined in this patch with the
> > predefined ones in sizes.h.
> >
> > Thanks,
> > Ashish
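
A rough sketch of the v4 direction described above, i.e. max() instead of
clamp() and the predefined SZ_* constants from <linux/sizes.h> in place of
the local SIZE_* defines; this is illustrative only, and the actual v4
patch may differ:

unsigned long __init arch_swiotlb_adjust(unsigned long iotlb_default_size)
{
	unsigned long size = 0;

	if (sev_active()) {
		phys_addr_t total_mem = memblock_phys_mem_size();

		if (total_mem <= SZ_1G)
			size = max_t(unsigned long, iotlb_default_size * 2, SZ_128M);
		else if (total_mem <= SZ_4G)
			size = max_t(unsigned long, iotlb_default_size * 4, SZ_256M);
		else
			size = max_t(unsigned long, iotlb_default_size * 8, SZ_512M);

		pr_info("SEV adjusted max SWIOTLB size = %luMB\n", size >> 20);
	}

	return size;
}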

2020-11-23 15:34:49

by Guilherme Piccoli

Subject: Re: [PATCH v3] swiotlb: Adjust SWIOTBL bounce buffer size for SEV guests.

Hi Ashish, non-technical comment: in the subject, you might want to
s/SWIOTBL/SWIOTLB .
cheers,


Guilherme