2023-04-01 04:58:08

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

swiotlb currently reports the total number of slabs and the instantaneous
in-use slabs in debugfs. But with increased usage of swiotlb for all I/O
in Confidential Computing (coco) VMs, it has become difficult to know
how much memory to allocate for swiotlb bounce buffers, either via the
automatic algorithm in the kernel or by specifying a value on the
kernel boot line. The current automatic algorithm generously allocates
swiotlb bounce buffer memory, and may be wasting significant memory in
many use cases.

To support better understanding of swiotlb usage, add tracking of the
the high water mark usage of the default swiotlb bounce buffer memory
pool. Report the high water mark in debugfs along with the other swiotlb
metrics. Allow the high water mark to be reset to zero at runtime by
writing to it.

Signed-off-by: Michael Kelley <[email protected]>
---
Changes in v3:
* Do high water mark accounting only when CONFIG_DEBUG_FS=y. As
as a result, add back the mem_used() function for the "swiotlb
buffer is full" error message. [Christoph -- I didn't hear back
whether this approach addresses your concern about one additional
atomic operation when slots are allocated and again when freed. I've
gone ahead with this new version, and we can obviously have further
discussion.]

* Remove unnecessary u64 casts. [Christoph Hellwig]

* Track slot usage and the high water mark only for io_tlb_default_mem.
Previous versions incorrectly included per-device pools. [Petr Tesarik]

Changes in v2:
* Only reset the high water mark to zero when the specified new value
is zero, to prevent confusion about the ability to reset to some
other value [Dexuan Cui]

kernel/dma/swiotlb.c | 41 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d3d6be0..6587a3d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -76,6 +76,9 @@ struct io_tlb_slot {
static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
static unsigned long default_nareas;

+static atomic_long_t total_used = ATOMIC_LONG_INIT(0);
+static atomic_long_t used_hiwater = ATOMIC_LONG_INIT(0);
+
/**
* struct io_tlb_area - IO TLB memory area descriptor
*
@@ -594,6 +597,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
unsigned long flags;
unsigned int slot_base;
unsigned int slot_index;
+ unsigned long old_hiwater, new_used;

BUG_ON(!nslots);
BUG_ON(area_index >= mem->nareas);
@@ -663,6 +667,17 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
area->index = 0;
area->used += nslots;
spin_unlock_irqrestore(&area->lock, flags);
+
+ if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) {
+ new_used = atomic_long_add_return(nslots, &total_used);
+ old_hiwater = atomic_long_read(&used_hiwater);
+ do {
+ if (new_used <= old_hiwater)
+ break;
+ } while (!atomic_long_try_cmpxchg(&used_hiwater,
+ &old_hiwater, new_used));
+ }
+
return slot_index;
}

@@ -795,6 +810,9 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
mem->slots[i].list = ++count;
area->used -= nslots;
spin_unlock_irqrestore(&area->lock, flags);
+
+ if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem)
+ atomic_long_sub(nslots, &total_used);
}

/*
@@ -891,10 +909,29 @@ bool is_swiotlb_active(struct device *dev)

static int io_tlb_used_get(void *data, u64 *val)
{
- *val = mem_used(&io_tlb_default_mem);
+ *val = atomic_long_read(&total_used);
+ return 0;
+}
+
+static int io_tlb_hiwater_get(void *data, u64 *val)
+{
+ *val = atomic_long_read(&used_hiwater);
+ return 0;
+}
+
+static int io_tlb_hiwater_set(void *data, u64 val)
+{
+ /* Only allow setting to zero */
+ if (val != 0)
+ return -EINVAL;
+
+ atomic_long_set(&used_hiwater, val);
return 0;
}
+
DEFINE_DEBUGFS_ATTRIBUTE(fops_io_tlb_used, io_tlb_used_get, NULL, "%llu\n");
+DEFINE_DEBUGFS_ATTRIBUTE(fops_io_tlb_hiwater, io_tlb_hiwater_get,
+ io_tlb_hiwater_set, "%llu\n");

static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
const char *dirname)
@@ -906,6 +943,8 @@ static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
debugfs_create_file("io_tlb_used", 0400, mem->debugfs, NULL,
&fops_io_tlb_used);
+ debugfs_create_file("io_tlb_used_hiwater", 0600, mem->debugfs, NULL,
+ &fops_io_tlb_hiwater);
}

static int __init __maybe_unused swiotlb_create_default_debugfs(void)
--
1.8.3.1


2023-04-07 05:53:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

On Fri, Mar 31, 2023 at 09:45:00PM -0700, Michael Kelley wrote:
> Changes in v3:
> * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As
> as a result, add back the mem_used() function for the "swiotlb
> buffer is full" error message. [Christoph -- I didn't hear back
> whether this approach addresses your concern about one additional
> atomic operation when slots are allocated and again when freed. I've
> gone ahead with this new version, and we can obviously have further
> discussion.]

Still not too happy, but at least debugfs is an interfact we could
remove at any time.

But can you please factor the used_hiwater accounting into two
separate helpers that are udner CONFIG_DEBUG_FS and otherwise
stubbed out, instead of adding the logic directly into
swiotlb_do_find_slots and swiotlb_release_slots?

2023-04-07 10:57:33

by Petr Tesařík

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

On Fri, 31 Mar 2023 21:45:00 -0700
Michael Kelley <[email protected]> wrote:

> swiotlb currently reports the total number of slabs and the instantaneous
> in-use slabs in debugfs. But with increased usage of swiotlb for all I/O
> in Confidential Computing (coco) VMs, it has become difficult to know
> how much memory to allocate for swiotlb bounce buffers, either via the
> automatic algorithm in the kernel or by specifying a value on the
> kernel boot line. The current automatic algorithm generously allocates
> swiotlb bounce buffer memory, and may be wasting significant memory in
> many use cases.
>
> To support better understanding of swiotlb usage, add tracking of the
> the high water mark usage of the default swiotlb bounce buffer memory
> pool. Report the high water mark in debugfs along with the other swiotlb
> metrics. Allow the high water mark to be reset to zero at runtime by
> writing to it.
>
> Signed-off-by: Michael Kelley <[email protected]>
> ---
> Changes in v3:
> * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As
> as a result, add back the mem_used() function for the "swiotlb
> buffer is full" error message. [Christoph -- I didn't hear back
> whether this approach addresses your concern about one additional
> atomic operation when slots are allocated and again when freed. I've
> gone ahead with this new version, and we can obviously have further
> discussion.]
>
> * Remove unnecessary u64 casts. [Christoph Hellwig]
>
> * Track slot usage and the high water mark only for io_tlb_default_mem.
> Previous versions incorrectly included per-device pools. [Petr Tesarik]
>
> Changes in v2:
> * Only reset the high water mark to zero when the specified new value
> is zero, to prevent confusion about the ability to reset to some
> other value [Dexuan Cui]
>
> kernel/dma/swiotlb.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index d3d6be0..6587a3d 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -76,6 +76,9 @@ struct io_tlb_slot {
> static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
> static unsigned long default_nareas;
>
> +static atomic_long_t total_used = ATOMIC_LONG_INIT(0);
> +static atomic_long_t used_hiwater = ATOMIC_LONG_INIT(0);
> +
> /**
> * struct io_tlb_area - IO TLB memory area descriptor
> *
> @@ -594,6 +597,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
> unsigned long flags;
> unsigned int slot_base;
> unsigned int slot_index;
> + unsigned long old_hiwater, new_used;
>
> BUG_ON(!nslots);
> BUG_ON(area_index >= mem->nareas);
> @@ -663,6 +667,17 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
> area->index = 0;
> area->used += nslots;
> spin_unlock_irqrestore(&area->lock, flags);
> +
> + if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) {

Yes, this works fine now, but why are total_used and used_hiwater
global variables? If you make them fields in struct io_tlb_mem
(possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the
check. Of course, in instances other than io_tlb_default_mem these
fields would not be exported to userspace through debugfs, but if really
needed, I can at least find them in a crash dump (or read them through
/proc/kcore).

Petr T

2023-04-07 22:14:03

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

From: Petr Tesa??k <[email protected]> Sent: Friday, April 7, 2023 3:56 AM
>
> On Fri, 31 Mar 2023 21:45:00 -0700
> Michael Kelley <[email protected]> wrote:
>
> > swiotlb currently reports the total number of slabs and the instantaneous
> > in-use slabs in debugfs. But with increased usage of swiotlb for all I/O
> > in Confidential Computing (coco) VMs, it has become difficult to know
> > how much memory to allocate for swiotlb bounce buffers, either via the
> > automatic algorithm in the kernel or by specifying a value on the
> > kernel boot line. The current automatic algorithm generously allocates
> > swiotlb bounce buffer memory, and may be wasting significant memory in
> > many use cases.
> >
> > To support better understanding of swiotlb usage, add tracking of the
> > the high water mark usage of the default swiotlb bounce buffer memory
> > pool. Report the high water mark in debugfs along with the other swiotlb
> > metrics. Allow the high water mark to be reset to zero at runtime by
> > writing to it.
> >
> > Signed-off-by: Michael Kelley <[email protected]>
> > ---
> > Changes in v3:
> > * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As
> > as a result, add back the mem_used() function for the "swiotlb
> > buffer is full" error message. [Christoph -- I didn't hear back
> > whether this approach addresses your concern about one additional
> > atomic operation when slots are allocated and again when freed. I've
> > gone ahead with this new version, and we can obviously have further
> > discussion.]
> >
> > * Remove unnecessary u64 casts. [Christoph Hellwig]
> >
> > * Track slot usage and the high water mark only for io_tlb_default_mem.
> > Previous versions incorrectly included per-device pools. [Petr Tesarik]
> >
> > Changes in v2:
> > * Only reset the high water mark to zero when the specified new value
> > is zero, to prevent confusion about the ability to reset to some
> > other value [Dexuan Cui]
> >
> > kernel/dma/swiotlb.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 40 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > index d3d6be0..6587a3d 100644
> > --- a/kernel/dma/swiotlb.c
> > +++ b/kernel/dma/swiotlb.c
> > @@ -76,6 +76,9 @@ struct io_tlb_slot {
> > static unsigned long default_nslabs = IO_TLB_DEFAULT_SIZE >> IO_TLB_SHIFT;
> > static unsigned long default_nareas;
> >
> > +static atomic_long_t total_used = ATOMIC_LONG_INIT(0);
> > +static atomic_long_t used_hiwater = ATOMIC_LONG_INIT(0);
> > +
> > /**
> > * struct io_tlb_area - IO TLB memory area descriptor
> > *
> > @@ -594,6 +597,7 @@ static int swiotlb_do_find_slots(struct device *dev, int
> area_index,
> > unsigned long flags;
> > unsigned int slot_base;
> > unsigned int slot_index;
> > + unsigned long old_hiwater, new_used;
> >
> > BUG_ON(!nslots);
> > BUG_ON(area_index >= mem->nareas);
> > @@ -663,6 +667,17 @@ static int swiotlb_do_find_slots(struct device *dev, int
> area_index,
> > area->index = 0;
> > area->used += nslots;
> > spin_unlock_irqrestore(&area->lock, flags);
> > +
> > + if (IS_ENABLED(CONFIG_DEBUG_FS) && mem == &io_tlb_default_mem) {
>
> Yes, this works fine now, but why are total_used and used_hiwater
> global variables? If you make them fields in struct io_tlb_mem
> (possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the
> check. Of course, in instances other than io_tlb_default_mem these
> fields would not be exported to userspace through debugfs, but if really
> needed, I can at least find them in a crash dump (or read them through
> /proc/kcore).
>

Got it.

Your previously comments mentioned making them fields in struct io_tlb_mem,
and I missed your point. :-( I got focused on fixing the accounting for
DEBUG_FS so it didn't include the non-default pools, and didn't pick up on the
idea of doing the accounting for the non-default pools even though the values
aren't exposed in /sys. I'll fix this in the next version.

Michael

2023-04-07 22:21:05

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

From: Christoph Hellwig <[email protected]> Sent: Thursday, April 6, 2023 10:50 PM
>
> On Fri, Mar 31, 2023 at 09:45:00PM -0700, Michael Kelley wrote:
> > Changes in v3:
> > * Do high water mark accounting only when CONFIG_DEBUG_FS=y. As
> > as a result, add back the mem_used() function for the "swiotlb
> > buffer is full" error message. [Christoph -- I didn't hear back
> > whether this approach addresses your concern about one additional
> > atomic operation when slots are allocated and again when freed. I've
> > gone ahead with this new version, and we can obviously have further
> > discussion.]
>
> Still not too happy, but at least debugfs is an interfact we could
> remove at any time.
>
> But can you please factor the used_hiwater accounting into two
> separate helpers that are udner CONFIG_DEBUG_FS and otherwise
> stubbed out, instead of adding the logic directly into
> swiotlb_do_find_slots and swiotlb_release_slots?

I coded the way I did to follow the kernel coding style guidance
that prefers converting a Kconfig symbol into a C boolean
expression, and using it in a normal C conditional instead of
using #ifdef. If CONFIG_DEBUG_FS=n, the compiler will constant
fold the conditional away so there's no runtime overhead. I
like the way that approached worked out in this case, but if you prefer
separate functions with #ifdef and stubs, I don't feel strongly either way.

Michael

2023-04-11 03:45:03

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

On Fri, Apr 07, 2023 at 10:05:26PM +0000, Michael Kelley (LINUX) wrote:
> > Yes, this works fine now, but why are total_used and used_hiwater
> > global variables? If you make them fields in struct io_tlb_mem
> > (possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the
> > check. Of course, in instances other than io_tlb_default_mem these
> > fields would not be exported to userspace through debugfs, but if really
> > needed, I can at least find them in a crash dump (or read them through
> > /proc/kcore).
> >
>
> Got it.
>
> Your previously comments mentioned making them fields in struct io_tlb_mem,
> and I missed your point. :-( I got focused on fixing the accounting for
> DEBUG_FS so it didn't include the non-default pools, and didn't pick up on the
> idea of doing the accounting for the non-default pools even though the values
> aren't exposed in /sys. I'll fix this in the next version.

FYI, I agree that per-instance accounting is probably the better way,
too.

2023-04-11 03:47:29

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

On Fri, Apr 07, 2023 at 10:01:13PM +0000, Michael Kelley (LINUX) wrote:
> I coded the way I did to follow the kernel coding style guidance
> that prefers converting a Kconfig symbol into a C boolean
> expression, and using it in a normal C conditional instead of
> using #ifdef. If CONFIG_DEBUG_FS=n, the compiler will constant
> fold the conditional away so there's no runtime overhead. I
> like the way that approached worked out in this case, but if you prefer
> separate functions with #ifdef and stubs, I don't feel strongly either way.

I don't think there is a a hard and clear rule. Actual ifdefs have the
benefit of allowing to actually remove struct fields as well. But
the important bit is that I do want the accounting in helpers instead
of in the main swiotlb logic. And once you do that, having #ifdefed
stubs for the functions make sense to me.

2023-04-12 17:41:31

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH v3 1/1] swiotlb: Track and report io_tlb_used high water mark in debugfs

From: [email protected] <[email protected]> Sent: Monday, April 10, 2023 8:42 PM
>
> On Fri, Apr 07, 2023 at 10:05:26PM +0000, Michael Kelley (LINUX) wrote:
> > > Yes, this works fine now, but why are total_used and used_hiwater
> > > global variables? If you make them fields in struct io_tlb_mem
> > > (possibly guarded with #ifdef CONFIG_DEBUG_FS), you can get rid of the
> > > check. Of course, in instances other than io_tlb_default_mem these
> > > fields would not be exported to userspace through debugfs, but if really
> > > needed, I can at least find them in a crash dump (or read them through
> > > /proc/kcore).
> > >
> >
> > Got it.
> >
> > Your previously comments mentioned making them fields in struct io_tlb_mem,
> > and I missed your point. :-( I got focused on fixing the accounting for
> > DEBUG_FS so it didn't include the non-default pools, and didn't pick up on the
> > idea of doing the accounting for the non-default pools even though the values
> > aren't exposed in /sys. I'll fix this in the next version.
>
> FYI, I agree that per-instance accounting is probably the better way,
> too.

It turns out that restricted pool swiotlb information *is* available in /sys
when CONFIG_DEBUG_FS=y. There's a call to swiotlb_create_debugfs_files()
in rmem_swiotlb_device_init(). But the /sys info is broken because
io_tlb_used_get() is hardwired to the default pool.

But this is easily fixable. I'll do a two-patch series that fixes this problem,
and then makes the other changes for the high water mark that we've
been discussing.

Michael