2019-07-12 06:38:06

by Sultan Alsawaf

Subject: [PATCH] scatterlist: Allocate a contiguous array instead of chaining

From: Sultan Alsawaf <[email protected]>

Typically, drivers allocate sg lists up to a few MiB in size.
The current algorithm deals with large sg lists by splitting them into
several smaller arrays and chaining them together. But if the sg list
allocation is large, and we know the size ahead of time, sg chaining is
both inefficient and unnecessary.
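
For context, the chained path being replaced works roughly like this (a
simplified sketch of the __sg_alloc_table() flow; the chain entry that
consumes one slot per chunk and the error handling are glossed over):

	left = nents;
	prv = NULL;
	do {
		/* at most one page worth of entries per kmalloc */
		alloc = min(left, SG_MAX_SINGLE_ALLOC);
		sg = sg_kmalloc(alloc, gfp_mask);
		sg_init_table(sg, alloc);
		if (prv)
			/* last entry of the previous chunk points here */
			sg_chain(prv, SG_MAX_SINGLE_ALLOC, sg);
		left -= alloc;
		prv = sg;
	} while (left);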

Rather than calling kmalloc hundreds of times in a loop to chain together
tiny arrays, we can simply do it all at once with kvmalloc, which makes
the appropriate tradeoff between kmalloc and vmalloc based on the
allocation size.

Abusing repeated kmallocs to produce a large allocation puts strain on
the slab allocator, when kvmalloc can be used instead. The single
kvmalloc allocation for all sg lists reduces the burden on the slab and
page allocators, since for large sg list allocations, this commit
replaces numerous kmalloc calls with one kvmalloc call.

The sg chaining is effectively disabled by changing SG_MAX_SINGLE_ALLOC
to UINT_MAX, which causes sg list allocations to never be split into
chains, since no allocation is larger than UINT_MAX. We then plumb
kvmalloc into the allocation functions so that it is used.

Signed-off-by: Sultan Alsawaf <[email protected]>
---
include/linux/scatterlist.h | 2 +-
lib/scatterlist.c | 23 ++---------------------
2 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 6eec50fb36c8..e2e26c53c441 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -310,7 +310,7 @@ size_t sg_zero_buffer(struct scatterlist *sgl, unsigned int nents,
  * Maximum number of entries that will be allocated in one piece, if
  * a list larger than this is required then chaining will be utilized.
  */
-#define SG_MAX_SINGLE_ALLOC (PAGE_SIZE / sizeof(struct scatterlist))
+#define SG_MAX_SINGLE_ALLOC (UINT_MAX)
 
 /*
  * The maximum number of SG segments that we will put inside a
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c2cf2c311b7d..bf76854a34aa 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -148,31 +148,12 @@ EXPORT_SYMBOL(sg_init_one);
  */
 static struct scatterlist *sg_kmalloc(unsigned int nents, gfp_t gfp_mask)
 {
-        if (nents == SG_MAX_SINGLE_ALLOC) {
-                /*
-                 * Kmemleak doesn't track page allocations as they are not
-                 * commonly used (in a raw form) for kernel data structures.
-                 * As we chain together a list of pages and then a normal
-                 * kmalloc (tracked by kmemleak), in order to for that last
-                 * allocation not to become decoupled (and thus a
-                 * false-positive) we need to inform kmemleak of all the
-                 * intermediate allocations.
-                 */
-                void *ptr = (void *) __get_free_page(gfp_mask);
-                kmemleak_alloc(ptr, PAGE_SIZE, 1, gfp_mask);
-                return ptr;
-        } else
-                return kmalloc_array(nents, sizeof(struct scatterlist),
-                                     gfp_mask);
+        return kvmalloc_array(nents, sizeof(struct scatterlist), gfp_mask);
 }
 
 static void sg_kfree(struct scatterlist *sg, unsigned int nents)
 {
-        if (nents == SG_MAX_SINGLE_ALLOC) {
-                kmemleak_free(sg);
-                free_page((unsigned long) sg);
-        } else
-                kfree(sg);
+        kvfree(sg);
 }
 
 /**
--
2.22.0


2019-07-12 06:57:10

by Ming Lei

Subject: Re: [PATCH] scatterlist: Allocate a contiguous array instead of chaining

On Thu, Jul 11, 2019 at 11:36:56PM -0700, Sultan Alsawaf wrote:
> From: Sultan Alsawaf <[email protected]>
>
> Typically, drivers allocate sg lists up to a few MiB in size.
> The current algorithm deals with large sg lists by splitting them into
> several smaller arrays and chaining them together. But if the sg list
> allocation is large, and we know the size ahead of time, sg chaining is
> both inefficient and unnecessary.
>
> Rather than calling kmalloc hundreds of times in a loop to chain together
> tiny arrays, we can simply do it all at once with kvmalloc, which makes
> the appropriate tradeoff between kmalloc and vmalloc based on the
> allocation size.

vmalloc() may sleep, so it cannot be called from atomic context.
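
A minimal illustration of the constraint (hypothetical caller, not taken
from any real driver):

	spin_lock(&lock);	/* non-sleepable (atomic) context */
	p = vmalloc(size);	/* may sleep -> not allowed here  */
	spin_unlock(&lock);

This is also why kvmalloc() only takes the vmalloc path when the gfp flags
are GFP_KERNEL-compatible.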

Thanks,
Ming

2019-07-12 07:07:57

by Thomas Gleixner

Subject: Re: [PATCH] scatterlist: Allocate a contiguous array instead of chaining

On Fri, 12 Jul 2019, Ming Lei wrote:
> On Thu, Jul 11, 2019 at 11:36:56PM -0700, Sultan Alsawaf wrote:
> > From: Sultan Alsawaf <[email protected]>
> >
> > Typically, drivers allocate sg lists up to a few MiB in size.
> > The current algorithm deals with large sg lists by splitting them into
> > several smaller arrays and chaining them together. But if the sg list
> > allocation is large, and we know the size ahead of time, sg chaining is
> > both inefficient and unnecessary.
> >
> > Rather than calling kmalloc hundreds of times in a loop to chain together
> > tiny arrays, we can simply do it all at once with kvmalloc, which makes
> > the appropriate tradeoff between kmalloc and vmalloc based on the
> > allocation size.
>
> vmalloc() may sleep, so it cannot be called from atomic context.

Allocations from atomic context should be avoided wherever possible and you
really have to have a very convincing argument why an atomic allocation is
absolutely necessary. I cleaned up quite some GFP_ATOMIC users over the
last couple of years and all of them were doing it for the very wrong
reasons and mostly just to silence the warning which is triggered with
GFP_KERNEL when called from a non-sleepable context.
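
The anti-pattern usually looks something like this (illustrative, not taken
from any particular driver):

	/*
	 * Caller may be atomic; GFP_KERNEL triggers "BUG: sleeping function
	 * called from invalid context", so the flag is switched instead of
	 * moving the allocation out of the atomic section:
	 */
	buf = kmalloc(size, GFP_ATOMIC);	/* now dips into atomic reserves */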

So I suggest to audit all call sites first and figure out whether they
really must use GFP_ATOMIC and if possible clean them up, remove the GFP
argument and then do the vmalloc thing on top.

Thanks,

tglx

2019-07-12 07:20:05

by Sultan Alsawaf

Subject: Re: [PATCH] scatterlist: Allocate a contiguous array instead of chaining

On Fri, Jul 12, 2019 at 09:06:40AM +0200, Thomas Gleixner wrote:
> On Fri, 12 Jul 2019, Ming Lei wrote:
> > vmalloc() may sleep, so it cannot be called from atomic context.
>
> Allocations from atomic context should be avoided wherever possible and you
> really have to have a very convincing argument why an atomic allocation is
> absolutely necessary. I cleaned up quite some GFP_ATOMIC users over the
> last couple of years and all of them were doing it for the very wrong
> reasons and mostly just to silence the warning which is triggered with
> GFP_KERNEL when called from a non-sleepable context.
>
> So I suggest to audit all call sites first and figure out whether they
> really must use GFP_ATOMIC and if possible clean them up, remove the GFP
> argument and then do the vmalloc thing on top.

Hello Thomas and Ming,

It looks like the following call sites are atomic:
drivers/crypto/qce/ablkcipher.c:92: ret = sg_alloc_table(&rctx->dst_tbl, rctx->dst_nents, gfp);
drivers/crypto/ccp/ccp-crypto-aes-cmac.c:110: ret = sg_alloc_table(&rctx->data_sg, sg_count, gfp);
drivers/crypto/ccp/ccp-crypto-sha.c:103: ret = sg_alloc_table(&rctx->data_sg, sg_count, gfp);
drivers/spi/spi-pl022.c:1035: ret = sg_alloc_table(&pl022->sgt_rx, pages, GFP_ATOMIC);
drivers/spi/spi-pl022.c:1039: ret = sg_alloc_table(&pl022->sgt_tx, pages, GFP_ATOMIC);

The crypto ones are conditionally made atomic depending on the presence of
CRYPTO_TFM_REQ_MAY_SLEEP.
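
For reference, the gfp selection in those drivers is roughly of this form
(paraphrased):

	gfp_t gfp = (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
			GFP_KERNEL : GFP_ATOMIC;

	ret = sg_alloc_table(&rctx->dst_tbl, rctx->dst_nents, gfp);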

Additionally, the following allocation could be problematic with kvmalloc:
net/ceph/crypto.c:180: ret = sg_alloc_table(sgt, chunk_cnt, GFP_NOFS);

This is a snippet from kvmalloc:
	/*
	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
	 * so the given set of flags has to be compatible.
	 */
	if ((flags & GFP_KERNEL) != GFP_KERNEL)
		return kmalloc_node(size, flags, node);

Use of GFP_NOFS in net/ceph/crypto.c would make kvmalloc fall back to
kmalloc_node, which could be a problem if the allocation size is too large for
kmalloc_node to reasonably accommodate.
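
Concretely, with this patch the ceph path would boil down to something like
the following (sizes illustrative only):

	/* e.g. chunk_cnt = 64k entries * ~32 bytes each = ~2 MiB request */
	sgl = kvmalloc_array(chunk_cnt, sizeof(struct scatterlist), GFP_NOFS);
	/*
	 * (GFP_NOFS & GFP_KERNEL) != GFP_KERNEL, so kvmalloc falls back to
	 * kmalloc_node(), and a multi-MiB kmalloc is prone to failure and
	 * puts heavy pressure on the slab/page allocators.
	 */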

Also, it looks like the vmalloc family doesn't have kvmalloc's GFP_KERNEL check.
Is this intentional, or does vmalloc really not require GFP_KERNEL context?

Thanks,
Sultan

2019-07-12 17:00:54

by Andi Kleen

Subject: Re: [PATCH] scatterlist: Allocate a contiguous array instead of chaining

Sultan Alsawaf <[email protected]> writes:
>
> Abusing repeated kmallocs to produce a large allocation puts strain on
> the slab allocator, when kvmalloc can be used instead. The single
> kvmalloc allocation for all sg lists reduces the burden on the slab and
> page allocators, since for large sg list allocations, this commit
> replaces numerous kmalloc calls with one kvmalloc call.

Note that vmalloc will eventually cause global TLB flushes, which
are very expensive on larger systems. There are also global locks
in vmalloc, which can cause scaling problems. slab is a lot
more optimized.

I don't see any performance numbers in your proposal.

Did you test these corner cases?

I suspect you're better off with larger kmallocs up to a reasonable
size. How big would an sg list need to be to require more than a couple
of pages?
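
(For rough scale, assuming a 32-byte struct scatterlist and 4 KiB pages: one
page holds 128 entries, and with page-sized segments that maps 512 KiB of
data, so even a 4 MiB buffer needs only ~1024 entries, i.e. about eight
pages of scatterlist.)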

-Andi