2021-02-22 15:26:17

by Chuck Lever

Subject: [PATCH v2 0/4] Reduce page allocator traffic caused by NFSD

Hi Mel-

I've been testing these four, which include a working version of
your bulk allocator patch. In "Refresh rq_pages" I've replaced
the cond_resched() call with schedule_timeout(), as you requested.

As always, review comments and test results are welcome.

---

Chuck Lever (3):
SUNRPC: Set rq_page_end differently
SUNRPC: Refresh rq_pages using a bulk page allocator
SUNRPC: Cache pages that were replaced during a read splice

Mel Gorman (1):
mm: alloc_pages_bulk()


fs/nfsd/vfs.c | 4 +-
include/linux/gfp.h | 24 +++++++
include/linux/sunrpc/svc.h | 1 +
include/linux/sunrpc/svc_xprt.h | 28 ++++++++
mm/page_alloc.c | 110 +++++++++++++++++++++++++++++++-
net/sunrpc/svc.c | 7 ++
net/sunrpc/svc_xprt.c | 55 ++++++++++++----
7 files changed, 214 insertions(+), 15 deletions(-)

--
Chuck Lever


2021-02-22 15:26:59

by Chuck Lever

Subject: [PATCH v2 2/4] mm: alloc_pages_bulk()

From: Mel Gorman <[email protected]>

On Wed, Feb 10, 2021 at 12:41:03PM +0100, Jesper Dangaard Brouer wrote:
> On Wed, 10 Feb 2021 08:41:55 +0000
> Mel Gorman <[email protected]> wrote:
>
> > On Tue, Feb 09, 2021 at 11:31:08AM +0100, Jesper Dangaard Brouer wrote:
> > > > > Neil Brown pointed me to this old thread:
> > > > >
> > > > > https://lore.kernel.org/lkml/[email protected]/
> > > > >
> > > > > We see that many of the prerequisites are in v5.11-rc, but
> > > > > alloc_page_bulk() is not. I tried forward-porting 4/4 in that
> > > > > series, but enough internal APIs have changed since 2017 that
> > > > > the patch does not come close to applying and compiling.
> > >
> > > I forgot that this was never merged. It is sad as Mel showed huge
> > > improvement with his work.
> > >
> > > > > I'm wondering:
> > > > >
> > > > > a) is there a newer version of that work?
> > > > >
> > >
> > > Mel, why was this work never merged upstream?
> > >
> >
> > Lack of realistic consumers to drive it forward, finalise the API and
> > confirm it was working as expected. It eventually died as a result. If it
> > was reintroduced, it would need to be forward ported and then implement
> > at least one user on top.
>
> I guess I misunderstood you back in 2017. I thought I had presented
> a clear use-case/consumer in page_pool[1].

You did but it was never integrated and/or tested AFAIK. I see page_pool
accepts orders so even by the original prototype, it would only have seen
a benefit for order-0 pages. It would also have needed some supporting
data that it actually helped with drivers using the page_pool interface
which I was not in the position to properly test at the time.

> But you wanted the code as
> part of the patchset I guess. I thought I could add it later via the
> net-next tree.
>

Yes, a consumer of the code should go in at the same time with supporting
data showing it actually helps because otherwise it's dead code.

> It seems that Chuck now has an NFS use-case, and Hellwig also has a
> use-case for DMA-iommu in __iommu_dma_alloc_pages.
>
> The performance improvement (in the above link) was really impressive!
>
> Quote:
> "It's roughly a 50-70% reduction of allocation costs and roughly a halving of the
> overall cost of allocating/freeing batches of pages."
>
> Who has time to revive this patchset?
>

Not in the short term due to bug load and other obligations.

The original series had "mm, page_allocator: Only use per-cpu allocator
for irq-safe requests" but that was ultimately rejected because softirqs
were affected so it would have to be done without that patch.

The last patch can be rebased easily enough but it only batch allocates
order-0 pages. It's also only build tested and could be completely
miserable in practice; as I didn't even boot test it, let alone
actually test it, it could be a giant pile of crap. To make high orders
work, it would need significant reworking, but if the API showed even
partial benefit, it might motivate someone to reimplement the bulk
interfaces to perform better.

Rebased diff, build tested only, might not even work

Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
include/linux/gfp.h | 24 +++++++++++
mm/page_alloc.c | 110 +++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 80544d5c08e7..4363627a0fe2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -511,6 +511,29 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order, int preferred_nid)
return __alloc_pages_nodemask(gfp_mask, order, preferred_nid, NULL);
}

+unsigned long
+__alloc_pages_bulk_nodemask(gfp_t gfp_mask, unsigned int order,
+ struct zonelist *zonelist, nodemask_t *nodemask,
+ unsigned long nr_pages, struct list_head *alloc_list);
+
+static inline unsigned long
+__alloc_pages_bulk(gfp_t gfp_mask, unsigned int order,
+ struct zonelist *zonelist, unsigned long nr_pages,
+ struct list_head *list)
+{
+ return __alloc_pages_bulk_nodemask(gfp_mask, order, zonelist, NULL,
+ nr_pages, list);
+}
+
+static inline unsigned long
+alloc_pages_bulk(gfp_t gfp_mask, unsigned int order,
+ unsigned long nr_pages, struct list_head *list)
+{
+ int nid = numa_mem_id();
+ return __alloc_pages_bulk(gfp_mask, order,
+ node_zonelist(nid, gfp_mask), nr_pages, list);
+}
+
/*
* Allocate pages, preferring the node given as nid. The node must be valid and
* online. For more general interface, see alloc_pages_node().
@@ -580,6 +603,7 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);

extern void __free_pages(struct page *page, unsigned int order);
extern void free_pages(unsigned long addr, unsigned int order);
+extern void free_pages_bulk(struct list_head *list);

struct page_frag_cache;
extern void __page_frag_cache_drain(struct page *page, unsigned int count);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef5070fed76b..da6984094913 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3254,7 +3254,7 @@ void free_unref_page(struct page *page)
}

/*
- * Free a list of 0-order pages
+ * Free a list of 0-order pages whose reference count is already zero.
*/
void free_unref_page_list(struct list_head *list)
{
@@ -4435,6 +4435,21 @@ static void wake_all_kswapds(unsigned int order, gfp_t gfp_mask,
}
}

+/* Drop reference counts and free pages from a list */
+void free_pages_bulk(struct list_head *list)
+{
+ struct page *page, *next;
+
+ list_for_each_entry_safe(page, next, list, lru) {
+ trace_mm_page_free_batched(page);
+ if (put_page_testzero(page)) {
+ list_del(&page->lru);
+ __free_pages_ok(page, 0, FPI_NONE);
+ }
+ }
+}
+EXPORT_SYMBOL_GPL(free_pages_bulk);
+
static inline unsigned int
gfp_to_alloc_flags(gfp_t gfp_mask)
{
@@ -5820,6 +5835,99 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask)
}


+/*
+ * This is a batched version of the page allocator that attempts to
+ * allocate nr_pages quickly from the preferred zone and add them to list.
+ * Note that there is no guarantee that nr_pages will be allocated although
+ * every effort will be made to allocate at least one. Unlike the core
+ * allocator, no special effort is made to recover from transient
+ * failures caused by changes in cpusets. It should only be used from !IRQ
+ * context. An attempt to allocate a batch of pages from an interrupt
+ * will allocate a single page.
+ */
+unsigned long
+__alloc_pages_bulk_nodemask(gfp_t gfp_mask, unsigned int order,
+ struct zonelist *zonelist, nodemask_t *nodemask,
+ unsigned long nr_pages, struct list_head *alloc_list)
+{
+ struct page *page;
+ unsigned long alloced = 0;
+ unsigned int alloc_flags = ALLOC_WMARK_LOW;
+ unsigned long flags;
+ struct zone *zone;
+ struct per_cpu_pages *pcp;
+ struct list_head *pcp_list;
+ int migratetype;
+ gfp_t alloc_mask = gfp_mask; /* The gfp_t that was actually used for allocation */
+ struct alloc_context ac = { };
+
+ /* If there are already pages on the list, don't bother */
+ if (!list_empty(alloc_list))
+ return 0;
+
+ /* Order-0 cannot go through per-cpu lists */
+ if (order)
+ goto failed;
+
+ gfp_mask &= gfp_allowed_mask;
+
+ if (!prepare_alloc_pages(gfp_mask, order, numa_mem_id(), nodemask, &ac, &alloc_mask, &alloc_flags))
+ return 0;
+
+ if (!ac.preferred_zoneref)
+ return 0;
+
+ /*
+ * Only attempt a batch allocation if watermarks on the preferred zone
+ * are safe.
+ */
+ zone = ac.preferred_zoneref->zone;
+ if (!zone_watermark_fast(zone, order, high_wmark_pages(zone) + nr_pages,
+ zonelist_zone_idx(ac.preferred_zoneref), alloc_flags, gfp_mask))
+ goto failed;
+
+ /* Attempt the batch allocation */
+ migratetype = ac.migratetype;
+
+ local_irq_save(flags);
+ pcp = &this_cpu_ptr(zone->pageset)->pcp;
+ pcp_list = &pcp->lists[migratetype];
+
+ while (nr_pages) {
+ page = __rmqueue_pcplist(zone, migratetype, alloc_flags,
+ pcp, pcp_list);
+ if (!page)
+ break;
+
+ prep_new_page(page, order, gfp_mask, 0);
+ nr_pages--;
+ alloced++;
+ list_add(&page->lru, alloc_list);
+ }
+
+ if (!alloced) {
+ local_irq_restore(flags);
+ goto failed;
+ }
+
+ __count_zid_vm_events(PGALLOC, zone_idx(zone), alloced);
+ zone_statistics(zone, zone);
+
+ local_irq_restore(flags);
+
+ return alloced;
+
+failed:
+ page = __alloc_pages_nodemask(gfp_mask, order, numa_node_id(), nodemask);
+ if (page) {
+ alloced++;
+ list_add(&page->lru, alloc_list);
+ }
+
+ return alloced;
+}
+EXPORT_SYMBOL(__alloc_pages_bulk_nodemask);
+
/*
* Build zonelists ordered by node and zones within node.
* This results in maximum locality--normal zone overflows into local


2021-02-22 15:27:25

by Chuck Lever

Subject: [PATCH v2 1/4] SUNRPC: Set rq_page_end differently

Refactor:

I'm about to use the loop variable @i for something else.

As far as the "i++" is concerned, that is a post-increment. The
value of @i is not used subsequently, so the increment operator
is unnecessary and can be removed.

Also note that nfsd_read_actor() was renamed nfsd_splice_actor()
by commit cf8208d0eabd ("sendfile: convert nfsd to
splice_direct_to_actor()").

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/svc_xprt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index da8165bca0d5..819e46ab0a4a 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -686,8 +686,8 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
}
rqstp->rq_pages[i] = p;
}
- rqstp->rq_page_end = &rqstp->rq_pages[i];
- rqstp->rq_pages[i++] = NULL; /* this might be seen in nfs_read_actor */
+ rqstp->rq_page_end = &rqstp->rq_pages[pages];
+ rqstp->rq_pages[pages] = NULL; /* this might be seen in nfsd_splice_actor() */

/* Make arg->head point to first page and arg->pages point to rest */
arg = &rqstp->rq_arg;


2021-02-22 15:27:55

by Chuck Lever

Subject: [PATCH v2 3/4] SUNRPC: Refresh rq_pages using a bulk page allocator

Reduce the rate at which nfsd threads hammer on the page allocator.
This improves throughput scalability by enabling the threads to run
more independently of each other.

Signed-off-by: Chuck Lever <[email protected]>
---
net/sunrpc/svc_xprt.c | 43 +++++++++++++++++++++++++++++++------------
1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 819e46ab0a4a..15aacfa5ca21 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -661,11 +661,12 @@ static void svc_check_conn_limits(struct svc_serv *serv)
static int svc_alloc_arg(struct svc_rqst *rqstp)
{
struct svc_serv *serv = rqstp->rq_server;
+ unsigned long needed;
struct xdr_buf *arg;
+ struct page *page;
int pages;
int i;

- /* now allocate needed pages. If we get a failure, sleep briefly */
pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
if (pages > RPCSVC_MAXPAGES) {
pr_warn_once("svc: warning: pages=%u > RPCSVC_MAXPAGES=%lu\n",
@@ -673,19 +674,28 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
/* use as many pages as possible */
pages = RPCSVC_MAXPAGES;
}
- for (i = 0; i < pages ; i++)
- while (rqstp->rq_pages[i] == NULL) {
- struct page *p = alloc_page(GFP_KERNEL);
- if (!p) {
- set_current_state(TASK_INTERRUPTIBLE);
- if (signalled() || kthread_should_stop()) {
- set_current_state(TASK_RUNNING);
- return -EINTR;
- }
- schedule_timeout(msecs_to_jiffies(500));
+
+ for (needed = 0, i = 0; i < pages ; i++)
+ if (!rqstp->rq_pages[i])
+ needed++;
+ if (needed) {
+ LIST_HEAD(list);
+
+retry:
+ alloc_pages_bulk(GFP_KERNEL, 0, needed, &list);
+ for (i = 0; i < pages; i++) {
+ if (!rqstp->rq_pages[i]) {
+ page = list_first_entry_or_null(&list,
+ struct page,
+ lru);
+ if (unlikely(!page))
+ goto empty_list;
+ list_del(&page->lru);
+ rqstp->rq_pages[i] = page;
+ needed--;
}
- rqstp->rq_pages[i] = p;
}
+ }
rqstp->rq_page_end = &rqstp->rq_pages[pages];
rqstp->rq_pages[pages] = NULL; /* this might be seen in nfsd_splice_actor() */

@@ -700,6 +710,15 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
arg->len = (pages-1)*PAGE_SIZE;
arg->tail[0].iov_len = 0;
return 0;
+
+empty_list:
+ set_current_state(TASK_INTERRUPTIBLE);
+ if (signalled() || kthread_should_stop()) {
+ set_current_state(TASK_RUNNING);
+ return -EINTR;
+ }
+ schedule_timeout(msecs_to_jiffies(500));
+ goto retry;
}

static bool


2021-02-22 15:28:17

by Chuck Lever

Subject: [PATCH v2 4/4] SUNRPC: Cache pages that were replaced during a read splice

To avoid extra trips to the page allocator, don't free unused pages
in nfsd_splice_actor(), but instead place them in a local cache.
That cache is then used first when refilling rq_pages.

On workloads that perform large NFS READs on splice-capable file
systems, this saves a considerable amount of work.

Suggested-by: NeilBrown <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/vfs.c | 4 ++--
include/linux/sunrpc/svc.h | 1 +
include/linux/sunrpc/svc_xprt.h | 28 ++++++++++++++++++++++++++++
net/sunrpc/svc.c | 7 +++++++
net/sunrpc/svc_xprt.c | 12 ++++++++++++
5 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index d316e11923c5..25cf41eaf3c4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -852,14 +852,14 @@ nfsd_splice_actor(struct pipe_inode_info *pipe, struct pipe_buffer *buf,

if (rqstp->rq_res.page_len == 0) {
get_page(page);
- put_page(*rqstp->rq_next_page);
+ svc_rqst_put_unused_page(rqstp, *rqstp->rq_next_page);
*(rqstp->rq_next_page++) = page;
rqstp->rq_res.page_base = buf->offset;
rqstp->rq_res.page_len = size;
} else if (page != pp[-1]) {
get_page(page);
if (*rqstp->rq_next_page)
- put_page(*rqstp->rq_next_page);
+ svc_rqst_put_unused_page(rqstp, *rqstp->rq_next_page);
*(rqstp->rq_next_page++) = page;
rqstp->rq_res.page_len += size;
} else
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 31ee3b6047c3..340f4f3989c0 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -250,6 +250,7 @@ struct svc_rqst {
struct xdr_stream rq_arg_stream;
struct page *rq_scratch_page;
struct xdr_buf rq_res;
+ struct list_head rq_unused_pages;
struct page *rq_pages[RPCSVC_MAXPAGES + 1];
struct page * *rq_respages; /* points into rq_pages */
struct page * *rq_next_page; /* next reply page to use */
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 571f605bc91e..49ef86499876 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -150,6 +150,34 @@ static inline void svc_xprt_get(struct svc_xprt *xprt)
{
kref_get(&xprt->xpt_ref);
}
+
+/**
+ * svc_rqst_get_unused_page - Tap a page from the local cache
+ * @rqstp: svc_rqst with cached unused pages
+ *
+ * To save an allocator round trip, pages can be added to a
+ * local cache and re-used later by svc_alloc_arg().
+ *
+ * Returns an unused page, or NULL if the cache is empty.
+ */
+static inline struct page *svc_rqst_get_unused_page(struct svc_rqst *rqstp)
+{
+ return list_first_entry_or_null(&rqstp->rq_unused_pages,
+ struct page, lru);
+}
+
+/**
+ * svc_rqst_put_unused_page - Stash a page in the local cache
+ * @rqstp: svc_rqst with cached unused pages
+ * @page: page to cache
+ *
+ */
+static inline void svc_rqst_put_unused_page(struct svc_rqst *rqstp,
+ struct page *page)
+{
+ list_add(&page->lru, &rqstp->rq_unused_pages);
+}
+
static inline void svc_xprt_set_local(struct svc_xprt *xprt,
const struct sockaddr *sa,
const size_t salen)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 61fb8a18552c..3920fa8f1146 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -570,6 +570,8 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
if (svc_is_backchannel(rqstp))
return 1;

+ INIT_LIST_HEAD(&rqstp->rq_unused_pages);
+
pages = size / PAGE_SIZE + 1; /* extra page as we hold both request and reply.
* We assume one is at most one page
*/
@@ -593,8 +595,13 @@ svc_init_buffer(struct svc_rqst *rqstp, unsigned int size, int node)
static void
svc_release_buffer(struct svc_rqst *rqstp)
{
+ struct page *page;
unsigned int i;

+ while ((page = svc_rqst_get_unused_page(rqstp))) {
+ list_del(&page->lru);
+ put_page(page);
+ }
for (i = 0; i < ARRAY_SIZE(rqstp->rq_pages); i++)
if (rqstp->rq_pages[i])
put_page(rqstp->rq_pages[i]);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 15aacfa5ca21..84210e546a66 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -678,6 +678,18 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
for (needed = 0, i = 0; i < pages ; i++)
if (!rqstp->rq_pages[i])
needed++;
+ if (needed) {
+ for (i = 0; i < pages; i++) {
+ if (!rqstp->rq_pages[i]) {
+ page = svc_rqst_get_unused_page(rqstp);
+ if (!page)
+ break;
+ list_del(&page->lru);
+ rqstp->rq_pages[i] = page;
+ needed--;
+ }
+ }
+ }
if (needed) {
LIST_HEAD(list);



2021-02-22 17:40:43

by kernel test robot

Subject: Re: [PATCH v2 3/4] SUNRPC: Refresh rq_pages using a bulk page allocator

Hi Chuck,

I love your patch! Yet something to improve:

[auto build test ERROR on nfs/linux-next]
[also build test ERROR on v5.11 next-20210222]
[cannot apply to nfsd/nfsd-next hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Chuck-Lever/Reduce-page-allocator-traffic-caused-by-NFSD/20210222-232552
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: nds32-defconfig (attached as .config)
compiler: nds32le-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/b96e9fc08f8c7bd4c85a2f60171fdd344643580f
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Chuck-Lever/Reduce-page-allocator-traffic-caused-by-NFSD/20210222-232552
git checkout b96e9fc08f8c7bd4c85a2f60171fdd344643580f
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=nds32

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

nds32le-linux-ld: net/sunrpc/svc_xprt.o: in function `svc_recv':
svc_xprt.c:(.text+0x1b6e): undefined reference to `__alloc_pages_bulk_nodemask'
>> nds32le-linux-ld: svc_xprt.c:(.text+0x1b72): undefined reference to `__alloc_pages_bulk_nodemask'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-02-22 17:51:35

by kernel test robot

Subject: Re: [PATCH v2 3/4] SUNRPC: Refresh rq_pages using a bulk page allocator

Hi Chuck,

I love your patch! Yet something to improve:

[auto build test ERROR on nfs/linux-next]
[also build test ERROR on v5.11 next-20210222]
[cannot apply to nfsd/nfsd-next hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Chuck-Lever/Reduce-page-allocator-traffic-caused-by-NFSD/20210222-232552
base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: arm-defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/b96e9fc08f8c7bd4c85a2f60171fdd344643580f
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Chuck-Lever/Reduce-page-allocator-traffic-caused-by-NFSD/20210222-232552
git checkout b96e9fc08f8c7bd4c85a2f60171fdd344643580f
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

arm-linux-gnueabi-ld: net/sunrpc/svc_xprt.o: in function `svc_recv':
>> svc_xprt.c:(.text+0x1c9c): undefined reference to `__alloc_pages_bulk_nodemask'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-02-26 07:22:22

by NeilBrown

Subject: Re: [PATCH v2 4/4] SUNRPC: Cache pages that were replaced during a read splice

On Mon, Feb 22 2021, Chuck Lever wrote:

> To avoid extra trips to the page allocator, don't free unused pages
> in nfsd_splice_actor(), but instead place them in a local cache.
> That cache is then used first when refilling rq_pages.
>
> On workloads that perform large NFS READs on splice-capable file
> systems, this saves a considerable amount of work.
>
> Suggested-by: NeilBrown <[email protected]>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
> [...]
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 15aacfa5ca21..84210e546a66 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -678,6 +678,18 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
> for (needed = 0, i = 0; i < pages ; i++)
> if (!rqstp->rq_pages[i])
> needed++;
> + if (needed) {
> + for (i = 0; i < pages; i++) {
> + if (!rqstp->rq_pages[i]) {
> + page = svc_rqst_get_unused_page(rqstp);
> + if (!page)
> + break;
> + list_del(&page->lru);
> + rqstp->rq_pages[i] = page;
> + needed--;
> + }
> + }
> + }
> if (needed) {
> LIST_HEAD(list);
>
This looks good! Probably simpler than the way I imagined it :-)
I would do that last bit of code differently though...

for (needed = 0, i = 0; i < pages ; i++)
if (!rqstp->rq_pages[i]) {
page = svc_rqst_get_unused_page(rqstp);
if (page) {
list_del(&page->lru);
rqstp->rq_pages[i] = page;
} else
needed++;
}

but it is really a minor style difference - I don't object to your
version.

Reviewed-by: NeilBrown <[email protected]>

Thanks,
NeilBrown



2021-02-26 14:16:00

by Chuck Lever

Subject: Re: [PATCH v2 4/4] SUNRPC: Cache pages that were replaced during a read splice



> On Feb 26, 2021, at 2:19 AM, NeilBrown <[email protected]> wrote:
>
> On Mon, Feb 22 2021, Chuck Lever wrote:
>
>> To avoid extra trips to the page allocator, don't free unused pages
>> in nfsd_splice_actor(), but instead place them in a local cache.
>> That cache is then used first when refilling rq_pages.
>>
>> On workloads that perform large NFS READs on splice-capable file
>> systems, this saves a considerable amount of work.
>>
>> Suggested-by: NeilBrown <[email protected]>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>> [...]
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index 15aacfa5ca21..84210e546a66 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -678,6 +678,18 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
>> for (needed = 0, i = 0; i < pages ; i++)
>> if (!rqstp->rq_pages[i])
>> needed++;
>> + if (needed) {
>> + for (i = 0; i < pages; i++) {
>> + if (!rqstp->rq_pages[i]) {
>> + page = svc_rqst_get_unused_page(rqstp);
>> + if (!page)
>> + break;
>> + list_del(&page->lru);
>> + rqstp->rq_pages[i] = page;
>> + needed--;
>> + }
>> + }
>> + }
>> if (needed) {
>> LIST_HEAD(list);
>>
> This looks good! Probably simpler than the way I imagined it :-)
> I would do that last bit of code differently though...
>
> for (needed = 0, i = 0; i < pages ; i++)
> if (!rqstp->rq_pages[i]) {
> page = svc_rqst_get_unused_page(rqstp);
> if (page) {
> list_del(&page->lru);
> rqstp->rq_pages[i] = page;
> } else
> needed++;
> }
>
> but it is really a minor style difference - I don't object to your
> version.

One trip through the array is better than two. I'll use yours.


> Reviewed-by: NeilBrown <[email protected]>
>
> Thanks,
> NeilBrown

--
Chuck Lever