2018-07-12 02:56:32

by Matthew Wilcox

Subject: [PATCH v2 0/6] 9p: Use IDRs more effectively

The 9p code doesn't take advantage of the IDR's ability to store
a pointer. We can actually get rid of the p9_idpool abstraction
and the multi-dimensional array of requests.
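
As a rough userspace illustration of the idea above (not kernel code): an
ID allocator that also stores a pointer makes a separate lookup structure
unnecessary. The `toy_idr` type and its helpers below are hypothetical
stand-ins for the real IDR, which is a dynamically sized tree rather than a
fixed table.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an IDR: a small table mapping integer IDs to pointers. */
#define TOY_MAX_IDS 8

struct toy_idr {
	void *slots[TOY_MAX_IDS];	/* NULL means the ID is free */
};

/* Store ptr under the lowest free ID in [start, end); return the ID or -1. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr, int start, int end)
{
	int id;

	assert(ptr != NULL);
	if (start < 0)
		start = 0;
	if (end > TOY_MAX_IDS)
		end = TOY_MAX_IDS;
	for (id = start; id < end; id++) {
		if (!idr->slots[id]) {
			idr->slots[id] = ptr;
			return id;
		}
	}
	return -1;
}

/* Look up the pointer stored under id, or NULL if the ID is unused. */
static void *toy_idr_find(const struct toy_idr *idr, int id)
{
	if (id < 0 || id >= TOY_MAX_IDS)
		return NULL;
	return idr->slots[id];
}

/* Release id and return the pointer that was stored there, if any. */
static void *toy_idr_remove(struct toy_idr *idr, int id)
{
	void *old = toy_idr_find(idr, id);

	if (old)
		idr->slots[id] = NULL;
	return old;
}
```

Because the allocator hands back the object on lookup, the caller needs
neither a side array indexed by ID nor a linked list of live objects.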

v2: Address feedback from Dominique.

Matthew Wilcox (6):
9p: Fix comment on smp_wmb
9p: Change p9_fid_create calling convention
9p: Replace the fidlist with an IDR
9p: Embed wait_queue_head into p9_req_t
9p: Use a slab for allocating requests
9p: Remove p9_idpool

include/net/9p/9p.h | 8 -
include/net/9p/client.h | 62 ++------
net/9p/Makefile | 1 -
net/9p/client.c | 319 ++++++++++++++--------------------------
net/9p/mod.c | 7 +-
net/9p/trans_virtio.c | 2 +-
net/9p/util.c | 141 ------------------
7 files changed, 133 insertions(+), 407 deletions(-)
delete mode 100644 net/9p/util.c

--
2.18.0



2018-07-12 02:56:03

by Matthew Wilcox

Subject: [PATCH v2 5/6] 9p: Use a slab for allocating requests

Replace the custom batch allocation with a slab. Use an IDR to store
pointers to the active requests instead of an array. We don't try to
handle P9_NOTAG specially; the IDR will happily shrink all the way back
once the TVERSION call has completed.
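
A small userspace sketch of the tag ranges this implies, assuming the
values P9_NOTAG == 0xFFFF and P9_TVERSION == 100 from include/net/9p/9p.h.
The `tag_range` struct and `tag_range_for()` helper are made up for
illustration; the patch simply passes the bounds to idr_alloc(), whose end
bound is exclusive.

```c
#include <assert.h>
#include <stdint.h>

#define P9_NOTAG	0xFFFF	/* assumed to match (u16)(~0) in 9p.h */
#define P9_TVERSION	100	/* assumed to match the 9P message type */

/* Half-open range [start, end), following the idr_alloc() convention. */
struct tag_range {
	int start;
	int end;
};

/* Pick the range of tags a request of the given type may be assigned. */
static struct tag_range tag_range_for(int8_t type)
{
	struct tag_range r;

	if (type == P9_TVERSION) {
		/* TVERSION must carry P9_NOTAG: a range of exactly one tag. */
		r.start = P9_NOTAG;
		r.end = P9_NOTAG + 1;
	} else {
		/* Everything else gets the lowest free tag below P9_NOTAG. */
		r.start = 0;
		r.end = P9_NOTAG;
	}
	return r;
}
```

With this split, no ordinary request can ever be handed P9_NOTAG, so the
version request needs no special casing beyond the choice of range.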

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/net/9p/client.h | 51 ++-------
net/9p/client.c | 239 +++++++++++++++-------------------------
net/9p/mod.c | 7 +-
3 files changed, 101 insertions(+), 196 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 0fa0fbab33b0..a4dc42c53d18 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -64,22 +64,15 @@ enum p9_trans_status {

/**
* enum p9_req_status_t - status of a request
- * @REQ_STATUS_IDLE: request slot unused
* @REQ_STATUS_ALLOC: request has been allocated but not sent
* @REQ_STATUS_UNSENT: request waiting to be sent
* @REQ_STATUS_SENT: request sent to server
* @REQ_STATUS_RCVD: response received from server
* @REQ_STATUS_FLSHD: request has been flushed
* @REQ_STATUS_ERROR: request encountered an error on the client side
- *
- * The @REQ_STATUS_IDLE state is used to mark a request slot as unused
- * but use is actually tracked by the idpool structure which handles tag
- * id allocation.
- *
*/

enum p9_req_status_t {
- REQ_STATUS_IDLE,
REQ_STATUS_ALLOC,
REQ_STATUS_UNSENT,
REQ_STATUS_SENT,
@@ -92,24 +85,12 @@ enum p9_req_status_t {
* struct p9_req_t - request slots
* @status: status of this request slot
* @t_err: transport error
- * @flush_tag: tag of request being flushed (for flush requests)
* @wq: wait_queue for the client to block on for this request
* @tc: the request fcall structure
* @rc: the response fcall structure
* @aux: transport specific data (provided for trans_fd migration)
* @req_list: link for higher level objects to chain requests
- *
- * Transport use an array to track outstanding requests
- * instead of a list. While this may incurr overhead during initial
- * allocation or expansion, it makes request lookup much easier as the
- * tag id is a index into an array. (We use tag+1 so that we can accommodate
- * the -1 tag for the T_VERSION request).
- * This also has the nice effect of only having to allocate wait_queues
- * once, instead of constantly allocating and freeing them. Its possible
- * other resources could benefit from this scheme as well.
- *
*/
-
struct p9_req_t {
int status;
int t_err;
@@ -117,40 +98,26 @@ struct p9_req_t {
struct p9_fcall *tc;
struct p9_fcall *rc;
void *aux;
-
struct list_head req_list;
};

/**
* struct p9_client - per client instance state
- * @lock: protect @fidlist
+ * @lock: protect @fids and @reqs
* @msize: maximum data size negotiated by protocol
- * @dotu: extension flags negotiated by protocol
* @proto_version: 9P protocol version to use
* @trans_mod: module API instantiated with this client
+ * @status: connection state
* @trans: tranport instance state and API
* @fids: All active FID handles
- * @tagpool - transaction id accounting for session
- * @reqs - 2D array of requests
- * @max_tag - current maximum tag id allocated
- * @name - node name used as client id
+ * @reqs: All active requests.
+ * @name: node name used as client id
*
* The client structure is used to keep track of various per-client
* state that has been instantiated.
- * In order to minimize per-transaction overhead we use a
- * simple array to lookup requests instead of a hash table
- * or linked list. In order to support larger number of
- * transactions, we make this a 2D array, allocating new rows
- * when we need to grow the total number of the transactions.
- *
- * Each row is 256 requests and we'll support up to 256 rows for
- * a total of 64k concurrent requests per session.
- *
- * Bugs: duplicated data and potentially unnecessary elements.
*/
-
struct p9_client {
- spinlock_t lock; /* protect client structure */
+ spinlock_t lock;
unsigned int msize;
unsigned char proto_version;
struct p9_trans_module *trans_mod;
@@ -170,10 +137,7 @@ struct p9_client {
} trans_opts;

struct idr fids;
-
- struct p9_idpool *tagpool;
- struct p9_req_t *reqs[P9_ROW_MAXTAG];
- int max_tag;
+ struct idr reqs;

char name[__NEW_UTS_LEN + 1];
};
@@ -279,4 +243,7 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *, const char *, u64 *);
int p9_client_xattrcreate(struct p9_fid *, const char *, u64, int);
int p9_client_readlink(struct p9_fid *fid, char **target);

+int p9_client_init(void);
+void p9_client_exit(void);
+
#endif /* NET_9P_CLIENT_H */
diff --git a/net/9p/client.c b/net/9p/client.c
index bc8aba6b5ce0..ecf5dd269f4c 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -241,132 +241,103 @@ static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
return fc;
}

+static struct kmem_cache *p9_req_cache;
+
/**
- * p9_tag_alloc - lookup/allocate a request by tag
- * @c: client session to lookup tag within
- * @tag: numeric id for transaction
- *
- * this is a simple array lookup, but will grow the
- * request_slots as necessary to accommodate transaction
- * ids which did not previously have a slot.
- *
- * this code relies on the client spinlock to manage locks, its
- * possible we should switch to something else, but I'd rather
- * stick with something low-overhead for the common case.
+ * p9_tag_alloc - Allocate a new request.
+ * @c: Client session.
+ * @type: Transaction type.
+ * @max_size: Maximum packet size for this request.
*
+ * Context: Process context.
+ * Return: Pointer to new request.
*/
-
static struct p9_req_t *
-p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
+p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
{
- unsigned long flags;
- int row, col;
- struct p9_req_t *req;
+ struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
int alloc_msize = min(c->msize, max_size);
+ int tag;

- /* This looks up the original request by tag so we know which
- * buffer to read the data into */
- tag++;
-
- if (tag >= c->max_tag) {
- spin_lock_irqsave(&c->lock, flags);
- /* check again since original check was outside of lock */
- while (tag >= c->max_tag) {
- row = (tag / P9_ROW_MAXTAG);
- c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
- sizeof(struct p9_req_t), GFP_ATOMIC);
-
- if (!c->reqs[row]) {
- pr_err("Couldn't grow tag array\n");
- spin_unlock_irqrestore(&c->lock, flags);
- return ERR_PTR(-ENOMEM);
- }
- for (col = 0; col < P9_ROW_MAXTAG; col++) {
- req = &c->reqs[row][col];
- req->status = REQ_STATUS_IDLE;
- init_waitqueue_head(&req->wq);
- }
- c->max_tag += P9_ROW_MAXTAG;
- }
- spin_unlock_irqrestore(&c->lock, flags);
- }
- row = tag / P9_ROW_MAXTAG;
- col = tag % P9_ROW_MAXTAG;
+ if (!req)
+ return ERR_PTR(-ENOMEM);

- req = &c->reqs[row][col];
- if (!req->tc)
- req->tc = p9_fcall_alloc(alloc_msize);
- if (!req->rc)
- req->rc = p9_fcall_alloc(alloc_msize);
+ req->tc = p9_fcall_alloc(alloc_msize);
+ req->rc = p9_fcall_alloc(alloc_msize);
if (!req->tc || !req->rc)
- goto grow_failed;
+ goto free;

p9pdu_reset(req->tc);
p9pdu_reset(req->rc);
-
- req->tc->tag = tag-1;
req->status = REQ_STATUS_ALLOC;
+ init_waitqueue_head(&req->wq);
+ INIT_LIST_HEAD(&req->req_list);
+
+ idr_preload(GFP_NOFS);
+ spin_lock_irq(&c->lock);
+ if (type == P9_TVERSION)
+ tag = idr_alloc(&c->reqs, req, P9_NOTAG, P9_NOTAG + 1,
+ GFP_NOWAIT);
+ else
+ tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
+ req->tc->tag = tag;
+ spin_unlock_irq(&c->lock);
+ idr_preload_end();
+ if (tag < 0)
+ goto free;

return req;

-grow_failed:
- pr_err("Couldn't grow tag array\n");
+free:
kfree(req->tc);
kfree(req->rc);
- req->tc = req->rc = NULL;
+ kmem_cache_free(p9_req_cache, req);
return ERR_PTR(-ENOMEM);
}

/**
- * p9_tag_lookup - lookup a request by tag
- * @c: client session to lookup tag within
- * @tag: numeric id for transaction
+ * p9_tag_lookup - Look up a request by tag.
+ * @c: Client session.
+ * @tag: Transaction ID.
*
+ * Context: Any context.
+ * Return: A request, or %NULL if there is no request with that tag.
*/
-
struct p9_req_t *p9_tag_lookup(struct p9_client *c, u16 tag)
{
- int row, col;
-
- /* This looks up the original request by tag so we know which
- * buffer to read the data into */
- tag++;
-
- if(tag >= c->max_tag)
- return NULL;
+ struct p9_req_t *req;

- row = tag / P9_ROW_MAXTAG;
- col = tag % P9_ROW_MAXTAG;
+ rcu_read_lock();
+ req = idr_find(&c->reqs, tag);
+ /*
+ * There's no refcount on the req; a malicious server could cause
+ * us to dereference a NULL pointer
+ */
+ rcu_read_unlock();

- return &c->reqs[row][col];
+ return req;
}
EXPORT_SYMBOL(p9_tag_lookup);

/**
- * p9_tag_init - setup tags structure and contents
- * @c: v9fs client struct
- *
- * This initializes the tags structure for each client instance.
+ * p9_free_req - Free a request.
+ * @c: Client session.
+ * @r: Request to free.
*
+ * Context: Any context.
*/
-
-static int p9_tag_init(struct p9_client *c)
+static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
{
- int err = 0;
+ unsigned long flags;
+ u16 tag = r->tc->tag;

- c->tagpool = p9_idpool_create();
- if (IS_ERR(c->tagpool)) {
- err = PTR_ERR(c->tagpool);
- goto error;
- }
- err = p9_idpool_get(c->tagpool); /* reserve tag 0 */
- if (err < 0) {
- p9_idpool_destroy(c->tagpool);
- goto error;
- }
- c->max_tag = 0;
-error:
- return err;
+ p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
+ spin_lock_irqsave(&c->lock, flags);
+ idr_remove(&c->reqs, tag);
+ spin_unlock_irqrestore(&c->lock, flags);
+ kfree(r->tc);
+ kfree(r->rc);
+ kmem_cache_free(p9_req_cache, r);
}

/**
@@ -378,52 +349,15 @@ static int p9_tag_init(struct p9_client *c)
*/
static void p9_tag_cleanup(struct p9_client *c)
{
- int row, col;
-
- /* check to insure all requests are idle */
- for (row = 0; row < (c->max_tag/P9_ROW_MAXTAG); row++) {
- for (col = 0; col < P9_ROW_MAXTAG; col++) {
- if (c->reqs[row][col].status != REQ_STATUS_IDLE) {
- p9_debug(P9_DEBUG_MUX,
- "Attempting to cleanup non-free tag %d,%d\n",
- row, col);
- /* TODO: delay execution of cleanup */
- return;
- }
- }
- }
-
- if (c->tagpool) {
- p9_idpool_put(0, c->tagpool); /* free reserved tag 0 */
- p9_idpool_destroy(c->tagpool);
- }
+ struct p9_req_t *req;
+ int id;

- /* free requests associated with tags */
- for (row = 0; row < (c->max_tag/P9_ROW_MAXTAG); row++) {
- for (col = 0; col < P9_ROW_MAXTAG; col++) {
- kfree(c->reqs[row][col].tc);
- kfree(c->reqs[row][col].rc);
- }
- kfree(c->reqs[row]);
+ rcu_read_lock();
+ idr_for_each_entry(&c->reqs, req, id) {
+ pr_info("Tag %d still in use\n", id);
+ p9_free_req(c, req);
}
- c->max_tag = 0;
-}
-
-/**
- * p9_free_req - free a request and clean-up as necessary
- * c: client state
- * r: request to release
- *
- */
-
-static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
-{
- int tag = r->tc->tag;
- p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
-
- r->status = REQ_STATUS_IDLE;
- if (tag != P9_NOTAG && p9_idpool_check(tag, c->tagpool))
- p9_idpool_put(tag, c->tagpool);
+ rcu_read_unlock();
}

/**
@@ -690,7 +624,7 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
int8_t type, int req_size,
const char *fmt, va_list ap)
{
- int tag, err;
+ int err;
struct p9_req_t *req;

p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type);
@@ -703,24 +637,17 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
if ((c->status == BeginDisconnect) && (type != P9_TCLUNK))
return ERR_PTR(-EIO);

- tag = P9_NOTAG;
- if (type != P9_TVERSION) {
- tag = p9_idpool_get(c->tagpool);
- if (tag < 0)
- return ERR_PTR(-ENOMEM);
- }
-
- req = p9_tag_alloc(c, tag, req_size);
+ req = p9_tag_alloc(c, type, req_size);
if (IS_ERR(req))
return req;

/* marshall the data */
- p9pdu_prepare(req->tc, tag, type);
+ p9pdu_prepare(req->tc, req->tc->tag, type);
err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
if (err)
goto reterr;
p9pdu_finalize(c, req->tc);
- trace_9p_client_req(c, type, tag);
+ trace_9p_client_req(c, type, req->tc->tag);
return req;
reterr:
p9_free_req(c, req);
@@ -1018,14 +945,11 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)

spin_lock_init(&clnt->lock);
idr_init(&clnt->fids);
-
- err = p9_tag_init(clnt);
- if (err < 0)
- goto free_client;
+ idr_init(&clnt->reqs);

err = parse_opts(options, clnt);
if (err < 0)
- goto destroy_tagpool;
+ goto free_client;

if (!clnt->trans_mod)
clnt->trans_mod = v9fs_get_default_trans();
@@ -1034,7 +958,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
err = -EPROTONOSUPPORT;
p9_debug(P9_DEBUG_ERROR,
"No transport defined or default transport\n");
- goto destroy_tagpool;
+ goto free_client;
}

p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
@@ -1057,8 +981,6 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
clnt->trans_mod->close(clnt);
put_trans:
v9fs_put_trans(clnt->trans_mod);
-destroy_tagpool:
- p9_idpool_destroy(clnt->tagpool);
free_client:
kfree(clnt);
return ERR_PTR(err);
@@ -2274,3 +2196,14 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
return err;
}
EXPORT_SYMBOL(p9_client_readlink);
+
+int __init p9_client_init(void)
+{
+ p9_req_cache = KMEM_CACHE(p9_req_t, 0);
+ return p9_req_cache ? 0 : -ENOMEM;
+}
+
+void __exit p9_client_exit(void)
+{
+ kmem_cache_destroy(p9_req_cache);
+}
diff --git a/net/9p/mod.c b/net/9p/mod.c
index eb9777f05755..0da56d6af73b 100644
--- a/net/9p/mod.c
+++ b/net/9p/mod.c
@@ -171,7 +171,11 @@ void v9fs_put_trans(struct p9_trans_module *m)
*/
static int __init init_p9(void)
{
- int ret = 0;
+ int ret;
+
+ ret = p9_client_init();
+ if (ret)
+ return ret;

p9_error_init();
pr_info("Installing 9P2000 support\n");
@@ -190,6 +194,7 @@ static void __exit exit_p9(void)
pr_info("Unloading 9P2000 support\n");

p9_trans_fd_exit();
+ p9_client_exit();
}

module_init(init_p9)
--
2.18.0


2018-07-12 02:56:07

by Matthew Wilcox

Subject: [PATCH v2 1/6] 9p: Fix comment on smp_wmb

The previous comment misled me into thinking the barrier wasn't needed
at all.

Signed-off-by: Matthew Wilcox <[email protected]>
---
net/9p/client.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index 18c5271910dc..999eceb8af98 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -447,7 +447,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)

/*
* This barrier is needed to make sure any change made to req before
- * the other thread wakes up will indeed be seen by the waiting side.
+ * the status change is visible to another thread
*/
smp_wmb();
req->status = status;
--
2.18.0


2018-07-12 02:56:20

by Matthew Wilcox

Subject: [PATCH v2 6/6] 9p: Remove p9_idpool

There are no more users left of the p9_idpool; delete it.

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/net/9p/9p.h | 8 ---
net/9p/Makefile | 1 -
net/9p/util.c | 141 --------------------------------------------
3 files changed, 150 deletions(-)
delete mode 100644 net/9p/util.c

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index b8eb51a661e5..e23896116d9a 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -561,16 +561,8 @@ struct p9_fcall {
u8 *sdata;
};

-struct p9_idpool;
-
int p9_errstr2errno(char *errstr, int len);

-struct p9_idpool *p9_idpool_create(void);
-void p9_idpool_destroy(struct p9_idpool *);
-int p9_idpool_get(struct p9_idpool *p);
-void p9_idpool_put(int id, struct p9_idpool *p);
-int p9_idpool_check(int id, struct p9_idpool *p);
-
int p9_error_init(void);
int p9_trans_fd_init(void);
void p9_trans_fd_exit(void);
diff --git a/net/9p/Makefile b/net/9p/Makefile
index c0486cfc85d9..aa0a5641e5d0 100644
--- a/net/9p/Makefile
+++ b/net/9p/Makefile
@@ -8,7 +8,6 @@ obj-$(CONFIG_NET_9P_RDMA) += 9pnet_rdma.o
mod.o \
client.o \
error.o \
- util.o \
protocol.o \
trans_fd.o \
trans_common.o \
diff --git a/net/9p/util.c b/net/9p/util.c
deleted file mode 100644
index 59f278e64f58..000000000000
--- a/net/9p/util.c
+++ /dev/null
@@ -1,141 +0,0 @@
-/*
- * net/9p/util.c
- *
- * This file contains some helper functions
- *
- * Copyright (C) 2007 by Latchesar Ionkov <[email protected]>
- * Copyright (C) 2004 by Eric Van Hensbergen <[email protected]>
- * Copyright (C) 2002 by Ron Minnich <[email protected]>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to:
- * Free Software Foundation
- * 51 Franklin Street, Fifth Floor
- * Boston, MA 02111-1301 USA
- *
- */
-
-#include <linux/module.h>
-#include <linux/errno.h>
-#include <linux/fs.h>
-#include <linux/sched.h>
-#include <linux/parser.h>
-#include <linux/idr.h>
-#include <linux/slab.h>
-#include <net/9p/9p.h>
-
-/**
- * struct p9_idpool - per-connection accounting for tag idpool
- * @lock: protects the pool
- * @pool: idr to allocate tag id from
- *
- */
-
-struct p9_idpool {
- spinlock_t lock;
- struct idr pool;
-};
-
-/**
- * p9_idpool_create - create a new per-connection id pool
- *
- */
-
-struct p9_idpool *p9_idpool_create(void)
-{
- struct p9_idpool *p;
-
- p = kmalloc(sizeof(struct p9_idpool), GFP_KERNEL);
- if (!p)
- return ERR_PTR(-ENOMEM);
-
- spin_lock_init(&p->lock);
- idr_init(&p->pool);
-
- return p;
-}
-EXPORT_SYMBOL(p9_idpool_create);
-
-/**
- * p9_idpool_destroy - create a new per-connection id pool
- * @p: idpool to destroy
- */
-
-void p9_idpool_destroy(struct p9_idpool *p)
-{
- idr_destroy(&p->pool);
- kfree(p);
-}
-EXPORT_SYMBOL(p9_idpool_destroy);
-
-/**
- * p9_idpool_get - allocate numeric id from pool
- * @p: pool to allocate from
- *
- * Bugs: This seems to be an awful generic function, should it be in idr.c with
- * the lock included in struct idr?
- */
-
-int p9_idpool_get(struct p9_idpool *p)
-{
- int i;
- unsigned long flags;
-
- idr_preload(GFP_NOFS);
- spin_lock_irqsave(&p->lock, flags);
-
- /* no need to store exactly p, we just need something non-null */
- i = idr_alloc(&p->pool, p, 0, 0, GFP_NOWAIT);
-
- spin_unlock_irqrestore(&p->lock, flags);
- idr_preload_end();
- if (i < 0)
- return -1;
-
- p9_debug(P9_DEBUG_MUX, " id %d pool %p\n", i, p);
- return i;
-}
-EXPORT_SYMBOL(p9_idpool_get);
-
-/**
- * p9_idpool_put - release numeric id from pool
- * @id: numeric id which is being released
- * @p: pool to release id into
- *
- * Bugs: This seems to be an awful generic function, should it be in idr.c with
- * the lock included in struct idr?
- */
-
-void p9_idpool_put(int id, struct p9_idpool *p)
-{
- unsigned long flags;
-
- p9_debug(P9_DEBUG_MUX, " id %d pool %p\n", id, p);
-
- spin_lock_irqsave(&p->lock, flags);
- idr_remove(&p->pool, id);
- spin_unlock_irqrestore(&p->lock, flags);
-}
-EXPORT_SYMBOL(p9_idpool_put);
-
-/**
- * p9_idpool_check - check if the specified id is available
- * @id: id to check
- * @p: pool to check
- */
-
-int p9_idpool_check(int id, struct p9_idpool *p)
-{
- return idr_find(&p->pool, id) != NULL;
-}
-EXPORT_SYMBOL(p9_idpool_check);
-
--
2.18.0


2018-07-12 02:56:26

by Matthew Wilcox

Subject: [PATCH v2 2/6] 9p: Change p9_fid_create calling convention

Return NULL instead of ERR_PTR when we can't allocate a FID. The ENOSPC
return value was getting all the way back to userspace, and that's
confusing for a userspace program which isn't expecting read() to tell it
there's no space left on the filesystem. The best error we can return to
indicate a temporary failure caused by lack of client resources is ENOMEM.
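
A userspace sketch of the resulting convention, with hypothetical
`toy_fid_create()` and `toy_attach()` helpers standing in for
p9_fid_create() and its callers. The creator reports every failure as NULL,
and the caller maps that to -ENOMEM, so no other errno can leak out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_fid {
	unsigned int fid;
};

/*
 * Return a new fid, or NULL on any failure. ids_left models whether the
 * ID space still has room; the real code asks the ID allocator instead.
 */
static struct toy_fid *toy_fid_create(int ids_left)
{
	struct toy_fid *fid = malloc(sizeof(*fid));

	if (!fid)
		return NULL;
	if (ids_left <= 0) {
		/* Out of IDs: same NULL result as an allocation failure. */
		free(fid);
		return NULL;
	}
	fid->fid = 0;
	return fid;
}

/* Caller side: a NULL fid always becomes -ENOMEM, never -ENOSPC. */
static int toy_attach(int ids_left, struct toy_fid **out)
{
	struct toy_fid *fid = toy_fid_create(ids_left);

	if (!fid)
		return -ENOMEM;
	*out = fid;
	return 0;
}
```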

Maybe it would be better to sleep until a FID is available, but that's
not a change I'm comfortable making.

Signed-off-by: Matthew Wilcox <[email protected]>
---
net/9p/client.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index 999eceb8af98..389a2904b7b3 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -913,13 +913,11 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
if (!fid)
- return ERR_PTR(-ENOMEM);
+ return NULL;

ret = p9_idpool_get(clnt->fidpool);
- if (ret < 0) {
- ret = -ENOSPC;
+ if (ret < 0)
goto error;
- }
fid->fid = ret;

memset(&fid->qid, 0, sizeof(struct p9_qid));
@@ -935,7 +933,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)

error:
kfree(fid);
- return ERR_PTR(ret);
+ return NULL;
}

static void p9_fid_destroy(struct p9_fid *fid)
@@ -1137,9 +1135,8 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
p9_debug(P9_DEBUG_9P, ">>> TATTACH afid %d uname %s aname %s\n",
afid ? afid->fid : -1, uname, aname);
fid = p9_fid_create(clnt);
- if (IS_ERR(fid)) {
- err = PTR_ERR(fid);
- fid = NULL;
+ if (!fid) {
+ err = -ENOMEM;
goto error;
}
fid->uid = n_uname;
@@ -1188,9 +1185,8 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
clnt = oldfid->clnt;
if (clone) {
fid = p9_fid_create(clnt);
- if (IS_ERR(fid)) {
- err = PTR_ERR(fid);
- fid = NULL;
+ if (!fid) {
+ err = -ENOMEM;
goto error;
}

@@ -2018,9 +2014,8 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
err = 0;
clnt = file_fid->clnt;
attr_fid = p9_fid_create(clnt);
- if (IS_ERR(attr_fid)) {
- err = PTR_ERR(attr_fid);
- attr_fid = NULL;
+ if (!attr_fid) {
+ err = -ENOMEM;
goto error;
}
p9_debug(P9_DEBUG_9P,
--
2.18.0


2018-07-12 02:56:35

by Matthew Wilcox

Subject: [PATCH v2 4/6] 9p: Embed wait_queue_head into p9_req_t

On a 64-bit system, the wait_queue_head_t is 24 bytes while the pointer
to it is 8 bytes. Growing the p9_req_t by 16 bytes is better than
performing a 24-byte memory allocation.
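
The arithmetic can be checked in userspace with mock types. The structs
below are assumptions made for illustration: a 4-byte lock plus a
two-pointer list head, which mirrors wait_queue_head_t only when lock
debugging is disabled, and a cut-down request with just a few fields.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of struct list_head: two pointers. */
struct mock_list_head {
	struct mock_list_head *next, *prev;
};

/* Mock of wait_queue_head_t, assuming a 4-byte spinlock. */
struct mock_wait_queue_head {
	unsigned int lock;
	struct mock_list_head head;
};

/* Request layout before the patch: wait queue allocated separately. */
struct req_with_pointer {
	int status;
	int t_err;
	struct mock_wait_queue_head *wq;
	void *tc;
};

/* Request layout after the patch: wait queue embedded in the request. */
struct req_embedded {
	int status;
	int t_err;
	struct mock_wait_queue_head wq;
	void *tc;
};

/* Number of bytes the request grows by when the wait queue is embedded. */
static size_t embed_growth(void)
{
	return sizeof(struct req_embedded) - sizeof(struct req_with_pointer);
}
```

On an LP64 target the mock wait queue head is 24 bytes and the growth is
16 bytes, which is cheaper than a separate 24-byte allocation plus the
8-byte pointer to it.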

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/net/9p/client.h | 2 +-
net/9p/client.c | 19 +++++--------------
net/9p/trans_virtio.c | 2 +-
3 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index e405729cd1c7..0fa0fbab33b0 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -113,7 +113,7 @@ enum p9_req_status_t {
struct p9_req_t {
int status;
int t_err;
- wait_queue_head_t *wq;
+ wait_queue_head_t wq;
struct p9_fcall *tc;
struct p9_fcall *rc;
void *aux;
diff --git a/net/9p/client.c b/net/9p/client.c
index b89c7298267c..bc8aba6b5ce0 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -282,8 +282,9 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
return ERR_PTR(-ENOMEM);
}
for (col = 0; col < P9_ROW_MAXTAG; col++) {
- c->reqs[row][col].status = REQ_STATUS_IDLE;
- c->reqs[row][col].tc = NULL;
+ req = &c->reqs[row][col];
+ req->status = REQ_STATUS_IDLE;
+ init_waitqueue_head(&req->wq);
}
c->max_tag += P9_ROW_MAXTAG;
}
@@ -293,13 +294,6 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
col = tag % P9_ROW_MAXTAG;

req = &c->reqs[row][col];
- if (!req->wq) {
- req->wq = kmalloc(sizeof(wait_queue_head_t), GFP_NOFS);
- if (!req->wq)
- goto grow_failed;
- init_waitqueue_head(req->wq);
- }
-
if (!req->tc)
req->tc = p9_fcall_alloc(alloc_msize);
if (!req->rc)
@@ -319,9 +313,7 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
pr_err("Couldn't grow tag array\n");
kfree(req->tc);
kfree(req->rc);
- kfree(req->wq);
req->tc = req->rc = NULL;
- req->wq = NULL;
return ERR_PTR(-ENOMEM);
}

@@ -409,7 +401,6 @@ static void p9_tag_cleanup(struct p9_client *c)
/* free requests associated with tags */
for (row = 0; row < (c->max_tag/P9_ROW_MAXTAG); row++) {
for (col = 0; col < P9_ROW_MAXTAG; col++) {
- kfree(c->reqs[row][col].wq);
kfree(c->reqs[row][col].tc);
kfree(c->reqs[row][col].rc);
}
@@ -452,7 +443,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
smp_wmb();
req->status = status;

- wake_up(req->wq);
+ wake_up(&req->wq);
p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
}
EXPORT_SYMBOL(p9_client_cb);
@@ -773,7 +764,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
}
again:
/* Wait for the response */
- err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
+ err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);

/*
* Make sure our req is coherent with regard to updates in other
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 05006cbb3361..3e096c98313c 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -490,7 +490,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
virtqueue_kick(chan->vq);
spin_unlock_irqrestore(&chan->lock, flags);
p9_debug(P9_DEBUG_TRANS, "virtio request kicked\n");
- err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
+ err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
/*
* Non kernel buffers are pinned, unpin them
*/
--
2.18.0


2018-07-12 02:56:40

by Matthew Wilcox

Subject: [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

The p9_idpool being used to allocate the IDs uses an IDR to allocate
the IDs ... which we then keep in a doubly-linked list, rather than in
the IDR which allocated them. We can use an IDR directly which saves
two pointers per p9_fid, and a tiny memory allocation per p9_client.

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/net/9p/client.h | 9 +++------
net/9p/client.c | 44 +++++++++++++++--------------------------
2 files changed, 19 insertions(+), 34 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 7af9d769b97d..e405729cd1c7 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -27,6 +27,7 @@
#define NET_9P_CLIENT_H

#include <linux/utsname.h>
+#include <linux/idr.h>

/* Number of requests per row */
#define P9_ROW_MAXTAG 255
@@ -128,8 +129,7 @@ struct p9_req_t {
* @proto_version: 9P protocol version to use
* @trans_mod: module API instantiated with this client
* @trans: tranport instance state and API
- * @fidpool: fid handle accounting for session
- * @fidlist: List of active fid handles
+ * @fids: All active FID handles
* @tagpool - transaction id accounting for session
* @reqs - 2D array of requests
* @max_tag - current maximum tag id allocated
@@ -169,8 +169,7 @@ struct p9_client {
} tcp;
} trans_opts;

- struct p9_idpool *fidpool;
- struct list_head fidlist;
+ struct idr fids;

struct p9_idpool *tagpool;
struct p9_req_t *reqs[P9_ROW_MAXTAG];
@@ -188,7 +187,6 @@ struct p9_client {
* @iounit: the server reported maximum transaction size for this file
* @uid: the numeric uid of the local user who owns this handle
* @rdir: readdir accounting structure (allocated on demand)
- * @flist: per-client-instance fid tracking
* @dlist: per-dentry fid tracking
*
* TODO: This needs lots of explanation.
@@ -204,7 +202,6 @@ struct p9_fid {

void *rdir;

- struct list_head flist;
struct hlist_node dlist; /* list of all fids attached to a dentry */
};

diff --git a/net/9p/client.c b/net/9p/client.c
index 389a2904b7b3..b89c7298267c 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
{
int ret;
struct p9_fid *fid;
- unsigned long flags;

p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
if (!fid)
return NULL;

- ret = p9_idpool_get(clnt->fidpool);
- if (ret < 0)
- goto error;
- fid->fid = ret;
-
memset(&fid->qid, 0, sizeof(struct p9_qid));
fid->mode = -1;
fid->uid = current_fsuid();
fid->clnt = clnt;
fid->rdir = NULL;
- spin_lock_irqsave(&clnt->lock, flags);
- list_add(&fid->flist, &clnt->fidlist);
- spin_unlock_irqrestore(&clnt->lock, flags);
+ fid->fid = 0;

- return fid;
+ idr_preload(GFP_KERNEL);
+ spin_lock_irq(&clnt->lock);
+ ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
+ GFP_NOWAIT);
+ spin_unlock_irq(&clnt->lock);
+ idr_preload_end();
+
+ if (!ret)
+ return fid;

-error:
kfree(fid);
return NULL;
}
@@ -943,9 +942,8 @@ static void p9_fid_destroy(struct p9_fid *fid)

p9_debug(P9_DEBUG_FID, "fid %d\n", fid->fid);
clnt = fid->clnt;
- p9_idpool_put(fid->fid, clnt->fidpool);
spin_lock_irqsave(&clnt->lock, flags);
- list_del(&fid->flist);
+ idr_remove(&clnt->fids, fid->fid);
spin_unlock_irqrestore(&clnt->lock, flags);
kfree(fid->rdir);
kfree(fid);
@@ -1028,7 +1026,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
memcpy(clnt->name, client_id, strlen(client_id) + 1);

spin_lock_init(&clnt->lock);
- INIT_LIST_HEAD(&clnt->fidlist);
+ idr_init(&clnt->fids);

err = p9_tag_init(clnt);
if (err < 0)
@@ -1048,18 +1046,12 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
goto destroy_tagpool;
}

- clnt->fidpool = p9_idpool_create();
- if (IS_ERR(clnt->fidpool)) {
- err = PTR_ERR(clnt->fidpool);
- goto put_trans;
- }
-
p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
clnt, clnt->trans_mod, clnt->msize, clnt->proto_version);

err = clnt->trans_mod->create(clnt, dev_name, options);
if (err)
- goto destroy_fidpool;
+ goto put_trans;

if (clnt->msize > clnt->trans_mod->maxsize)
clnt->msize = clnt->trans_mod->maxsize;
@@ -1072,8 +1064,6 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)

close_trans:
clnt->trans_mod->close(clnt);
-destroy_fidpool:
- p9_idpool_destroy(clnt->fidpool);
put_trans:
v9fs_put_trans(clnt->trans_mod);
destroy_tagpool:
@@ -1086,7 +1076,8 @@ EXPORT_SYMBOL(p9_client_create);

void p9_client_destroy(struct p9_client *clnt)
{
- struct p9_fid *fid, *fidptr;
+ struct p9_fid *fid;
+ int id;

p9_debug(P9_DEBUG_MUX, "clnt %p\n", clnt);

@@ -1095,14 +1086,11 @@ void p9_client_destroy(struct p9_client *clnt)

v9fs_put_trans(clnt->trans_mod);

- list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
+ idr_for_each_entry(&clnt->fids, fid, id) {
pr_info("Found fid %d not clunked\n", fid->fid);
p9_fid_destroy(fid);
}

- if (clnt->fidpool)
- p9_idpool_destroy(clnt->fidpool);
-
p9_tag_cleanup(clnt);

kfree(clnt);
--
2.18.0


2018-07-12 03:06:03

by Dominique Martinet

Subject: Re: [PATCH v2 0/6] 9p: Use IDRs more effectively

Matthew Wilcox wrote on Wed, Jul 11, 2018:
> The 9p code doesn't take advantage of the IDR's ability to store
> a pointer. We can actually get rid of the p9_idpool abstraction
> and the multi-dimensional array of requests.
>
> v2: Address feedback from Dominique.

Thanks, I've picked them up for 4.19
( git://github.com/martinetd/linux 9p-next as per mail on
v9fs-developer list, "Current 9P patches - test branch")

This shouldn't stop anyone else from doing more reviews/tests; I'm doing
this 9p patch gathering on no authority and I've only checked very basic
things for now.

--
Dominique Martinet

2018-07-12 03:12:45

by piaojun

Subject: Re: [V9fs-developer] [PATCH v2 2/6] 9p: Change p9_fid_create calling convention

LGTM

On 2018/7/12 5:02, Matthew Wilcox wrote:
> Return NULL instead of ERR_PTR when we can't allocate a FID. The ENOSPC
> return value was getting all the way back to userspace, and that's
> confusing for a userspace program which isn't expecting read() to tell it
> there's no space left on the filesystem. The best error we can return to
> indicate a temporary failure caused by lack of client resources is ENOMEM.
>
> Maybe it would be better to sleep until a FID is available, but that's
> not a change I'm comfortable making.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
Reviewed-by: Jun Piao <[email protected]>
> ---
> net/9p/client.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 999eceb8af98..389a2904b7b3 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -913,13 +913,11 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
> p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> if (!fid)
> - return ERR_PTR(-ENOMEM);
> + return NULL;
>
> ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0) {
> - ret = -ENOSPC;
> + if (ret < 0)
> goto error;
> - }
> fid->fid = ret;
>
> memset(&fid->qid, 0, sizeof(struct p9_qid));
> @@ -935,7 +933,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
>
> error:
> kfree(fid);
> - return ERR_PTR(ret);
> + return NULL;
> }
>
> static void p9_fid_destroy(struct p9_fid *fid)
> @@ -1137,9 +1135,8 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
> p9_debug(P9_DEBUG_9P, ">>> TATTACH afid %d uname %s aname %s\n",
> afid ? afid->fid : -1, uname, aname);
> fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
> goto error;
> }
> fid->uid = n_uname;
> @@ -1188,9 +1185,8 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
> clnt = oldfid->clnt;
> if (clone) {
> fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
> goto error;
> }
>
> @@ -2018,9 +2014,8 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
> err = 0;
> clnt = file_fid->clnt;
> attr_fid = p9_fid_create(clnt);
> - if (IS_ERR(attr_fid)) {
> - err = PTR_ERR(attr_fid);
> - attr_fid = NULL;
> + if (!attr_fid) {
> + err = -ENOMEM;
> goto error;
> }
> p9_debug(P9_DEBUG_9P,
>
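
The calling-convention change reviewed above can be sketched in userspace C. This is only an illustration under stated assumptions: the demo_* helpers are hypothetical stand-ins that mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() encoding, and demo_attach_old()/demo_attach_new() loosely mirror the caller pattern in the diff, not the real p9_client_attach(). It shows why the old convention could leak -ENOSPC to the caller while the new one reports -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (illustration only). */
#define DEMO_MAX_ERRNO 4095

static void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static int demo_is_err(const void *ptr)
{
	/* Error values occupy the top DEMO_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static long demo_ptr_err(const void *ptr)
{
	assert(demo_is_err(ptr)); /* only meaningful for error pointers */
	return (long)(intptr_t)ptr;
}

struct demo_fid {
	int fid;
};

/* Old convention: an exhausted ID pool is encoded in the pointer as -ENOSPC. */
static struct demo_fid *demo_fid_create_old(int pool_exhausted)
{
	struct demo_fid *fid = malloc(sizeof(*fid));

	if (!fid)
		return demo_err_ptr(-ENOMEM);
	if (pool_exhausted) {
		free(fid);
		return demo_err_ptr(-ENOSPC);
	}
	fid->fid = 0;
	return fid;
}

/* New convention: every failure is NULL and the caller picks the errno. */
static struct demo_fid *demo_fid_create_new(int pool_exhausted)
{
	struct demo_fid *fid = malloc(sizeof(*fid));

	if (!fid)
		return NULL;
	if (pool_exhausted) {
		free(fid);
		return NULL;
	}
	fid->fid = 0;
	return fid;
}

/* Caller under the old convention: forwards whatever errno was encoded. */
static int demo_attach_old(int pool_exhausted)
{
	struct demo_fid *fid = demo_fid_create_old(pool_exhausted);

	if (demo_is_err(fid))
		return (int)demo_ptr_err(fid);
	free(fid);
	return 0;
}

/* Caller under the new convention: a NULL fid always becomes -ENOMEM. */
static int demo_attach_new(int pool_exhausted)
{
	struct demo_fid *fid = demo_fid_create_new(pool_exhausted);

	if (!fid)
		return -ENOMEM;
	free(fid);
	return 0;
}
```

With an exhausted pool, demo_attach_old(1) yields -ENOSPC and demo_attach_new(1) yields -ENOMEM, which is the userspace-visible difference the commit message describes.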

2018-07-12 11:18:34

by Dominique Martinet

Subject: Re: [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

Matthew Wilcox wrote on Wed, Jul 11, 2018:
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 389a2904b7b3..b89c7298267c 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
> {
> int ret;
> struct p9_fid *fid;
> - unsigned long flags;
>
> p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> if (!fid)
> return NULL;
>
> - ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0)
> - goto error;
> - fid->fid = ret;
> -
> memset(&fid->qid, 0, sizeof(struct p9_qid));

Ah, I had missed that you didn't update this memset as you said in reply
to comment on v1.

Could you resend just this patch and either initialize fid->fid or use
kzalloc for the fid allocation?

> fid->mode = -1;
> fid->uid = current_fsuid();
> fid->clnt = clnt;
> fid->rdir = NULL;
> - spin_lock_irqsave(&clnt->lock, flags);
> - list_add(&fid->flist, &clnt->fidlist);
> - spin_unlock_irqrestore(&clnt->lock, flags);
> + fid->fid = 0;
>
> - return fid;
> + idr_preload(GFP_KERNEL);
> + spin_lock_irq(&clnt->lock);
> + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
> + GFP_NOWAIT);

If you do resend, alignment here was wrong.


Thanks,
--
Dominique Martinet

2018-07-12 11:24:40

by Matthew Wilcox

Subject: Re: [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

On Thu, Jul 12, 2018 at 01:17:26PM +0200, Dominique Martinet wrote:
> Matthew Wilcox wrote on Wed, Jul 11, 2018:
> > diff --git a/net/9p/client.c b/net/9p/client.c
> > index 389a2904b7b3..b89c7298267c 100644
> > --- a/net/9p/client.c
> > +++ b/net/9p/client.c
> > memset(&fid->qid, 0, sizeof(struct p9_qid));
>
> Ah, I had missed that you didn't update this memset as you said in reply
> to comment on v1.

Rather than update the memset, I ...

> Could you resend just this patch and either initialize fid->fid or use
> kzalloc for the fid allocation?
>
> > + fid->fid = 0;

Did that instead ;-)

If I were going to initialise the entire structure to 0, I'd replace
the kmalloc with kzalloc and drop the memset altogether.

> If you do resend, alignment here was wrong.

I think this warning from checkpatch is complete bullshit. It's
really none of checkpatch's business how I choose to align function
arguments.

That said, if you want it to be aligned some particular way, feel free
to adjust the whitespace. I don't care.

2018-07-12 11:32:14

by Dominique Martinet

Subject: Re: [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

Matthew Wilcox wrote on Thu, Jul 12, 2018:
> > Ah, I had missed that you didn't update this memset as you said in reply
> > to comment on v1.
>
> Rather than update the memset, I ...
>
> > Could you resend just this patch and either initialize fid->fid or use
> > kzalloc for the fid allocation?
> >
> > > + fid->fid = 0;
>
> Did that instead ;-)
>
> If I were going to initialise the entire structure to 0, I'd replace
> the kmalloc with kzalloc and drop the memset altogether.

Oh, I'm blind, sorry! :)

> > If you do resend, alignment here was wrong.
>
> I think this warning from checkpatch is complete bullshit. It's
> really none of checkpatch's business how I choose to align function
> arguments.
>
> That said, if you want it to be aligned some particular way, feel free
> to adjust the whitespace. I don't care.

I would tend to agree that sometimes checkpatch is overzealous, but
having worked on projects where code style is all over the place it
really feels much more comfortable to have a consistent style
everywhere.

Thanks for the permission, I'll adjust it (assuming this does end up
getting pulled from my branch, but nobody yelled so far)

--
Dominique Martinet

2018-07-12 12:32:26

by Greg Kurz

Subject: Re: [V9fs-developer] [PATCH v2 1/6] 9p: Fix comment on smp_wmb

On Wed, 11 Jul 2018 14:02:20 -0700
Matthew Wilcox <[email protected]> wrote:

> The previous comment misled me into thinking the barrier wasn't needed
> at all.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> ---

Reviewed-by: Greg Kurz <[email protected]>

> net/9p/client.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 18c5271910dc..999eceb8af98 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -447,7 +447,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
>
> /*
> * This barrier is needed to make sure any change made to req before
> - * the other thread wakes up will indeed be seen by the waiting side.
> + * the status change is visible to another thread
> */
> smp_wmb();
> req->status = status;
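
The ordering that the corrected comment describes can be approximated in userspace with C11 atomics. This is a rough analogy under stated assumptions, not the kernel code: the demo_* names are hypothetical, and a release store paired with an acquire load stands in for smp_wmb() on the writer and the matching barrier on the waiting side. The point is the same: every change made to the request before the status store must be visible to a thread that observes the new status.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { DEMO_STATUS_SENT = 0, DEMO_STATUS_RCVD = 1 };

struct demo_req {
	int t_err;         /* payload, written before the status flips */
	atomic_int status; /* published last */
};

/* Completion side: fill in the payload, then publish the status. */
static void demo_complete(struct demo_req *req, int t_err)
{
	assert(req != NULL);
	req->t_err = t_err;
	/* Release: orders the t_err write before the status store. */
	atomic_store_explicit(&req->status, DEMO_STATUS_RCVD,
			      memory_order_release);
}

/* Waiting side: returns 1 and copies the payload once the status is visible. */
static int demo_poll(struct demo_req *req, int *t_err)
{
	assert(req != NULL && t_err != NULL);
	/* Acquire: if we see RCVD, we also see the earlier t_err write. */
	if (atomic_load_explicit(&req->status, memory_order_acquire)
	    < DEMO_STATUS_RCVD)
		return 0;
	*t_err = req->t_err;
	return 1;
}
```

If the status store were allowed to become visible before the t_err write, demo_poll() could return 1 with a stale payload, which is the failure the barrier guards against.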


2018-07-12 12:35:03

by Greg Kurz

Subject: Re: [V9fs-developer] [PATCH v2 2/6] 9p: Change p9_fid_create calling convention

On Wed, 11 Jul 2018 14:02:21 -0700
Matthew Wilcox <[email protected]> wrote:

> Return NULL instead of ERR_PTR when we can't allocate a FID. The ENOSPC
> return value was getting all the way back to userspace, and that's
> confusing for a userspace program which isn't expecting read() to tell it
> there's no space left on the filesystem. The best error we can return to
> indicate a temporary failure caused by lack of client resources is ENOMEM.
>
> Maybe it would be better to sleep until a FID is available, but that's
> not a change I'm comfortable making.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> ---

Reviewed-by: Greg Kurz <[email protected]>

> net/9p/client.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 999eceb8af98..389a2904b7b3 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -913,13 +913,11 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
> p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> if (!fid)
> - return ERR_PTR(-ENOMEM);
> + return NULL;
>
> ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0) {
> - ret = -ENOSPC;
> + if (ret < 0)
> goto error;
> - }
> fid->fid = ret;
>
> memset(&fid->qid, 0, sizeof(struct p9_qid));
> @@ -935,7 +933,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
>
> error:
> kfree(fid);
> - return ERR_PTR(ret);
> + return NULL;
> }
>
> static void p9_fid_destroy(struct p9_fid *fid)
> @@ -1137,9 +1135,8 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
> p9_debug(P9_DEBUG_9P, ">>> TATTACH afid %d uname %s aname %s\n",
> afid ? afid->fid : -1, uname, aname);
> fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
> goto error;
> }
> fid->uid = n_uname;
> @@ -1188,9 +1185,8 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
> clnt = oldfid->clnt;
> if (clone) {
> fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
> goto error;
> }
>
> @@ -2018,9 +2014,8 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
> err = 0;
> clnt = file_fid->clnt;
> attr_fid = p9_fid_create(clnt);
> - if (IS_ERR(attr_fid)) {
> - err = PTR_ERR(attr_fid);
> - attr_fid = NULL;
> + if (!attr_fid) {
> + err = -ENOMEM;
> goto error;
> }
> p9_debug(P9_DEBUG_9P,


2018-07-12 14:41:29

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH v2 4/6] 9p: Embed wait_queue_head into p9_req_t

Greg Kurz wrote on Thu, Jul 12, 2018:
> This is true when all tags have been used at least once. But the current code
> lazily allocates the wait_queue_head_t, ie, only when a tag is used for the
> first time. Your patch causes a full row of wait_queue_head_t to be pre-allocated.
>
> ie, P9_ROW_MAXTAG * 24 = 255 * 24 = 6120
>
> instead of (P9_ROW_MAXTAG * 8) + 24 = 255 * 8 + 24 = 2064
>
> This is nearly a page of allocated memory that might never be used.
>
> Not sure if this is a problem though...

I thought the exact same, but the next patch in the series removes that
array and allocates even more lazily with a slab, so this was a
no-brainer :)

--
Dominique Martinet

2018-07-12 14:54:29

by Greg Kurz

Subject: Re: [V9fs-developer] [PATCH v2 4/6] 9p: Embed wait_queue_head into p9_req_t

On Wed, 11 Jul 2018 14:02:23 -0700
Matthew Wilcox <[email protected]> wrote:

> On a 64-bit system, the wait_queue_head_t is 24 bytes while the pointer
> to it is 8 bytes. Growing the p9_req_t by 16 bytes is better than
> performing a 24-byte memory allocation.
>

This is true when all tags have been used at least once. But the current code
lazily allocates the wait_queue_head_t, ie, only when a tag is used for the
first time. Your patch causes a full row of wait_queue_head_t to be pre-allocated.

ie, P9_ROW_MAXTAG * 24 = 255 * 24 = 6120

instead of (P9_ROW_MAXTAG * 8) + 24 = 255 * 8 + 24 = 2064

This is nearly a page of allocated memory that might never be used.

Not sure if this is a problem though...
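
The arithmetic above can be checked with a small userspace sketch. The function names are hypothetical, and the sizes (24-byte wait_queue_head_t, 8-byte pointer, 255 slots per row) are taken from the figures quoted in this thread for a 64-bit system; allocator rounding is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Wait-queue bytes per row when every slot embeds its own head. */
static size_t row_bytes_embedded(size_t slots, size_t wq_size)
{
	return slots * wq_size;
}

/*
 * Wait-queue bytes per row when each slot holds a pointer and only
 * `used` slots have had a head allocated on first use.
 */
static size_t row_bytes_lazy(size_t slots, size_t ptr_size,
			     size_t wq_size, size_t used)
{
	assert(used <= slots);
	return slots * ptr_size + used * wq_size;
}
```

With one tag used, the lazy layout costs 2064 bytes against 6120 embedded, a difference of 4056 bytes. Once all 255 tags have been used, the lazy layout costs 255 * (8 + 24) = 8160 bytes, which is where the embedded layout comes out ahead.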

> Signed-off-by: Matthew Wilcox <[email protected]>
> ---
> include/net/9p/client.h | 2 +-
> net/9p/client.c | 19 +++++--------------
> net/9p/trans_virtio.c | 2 +-
> 3 files changed, 7 insertions(+), 16 deletions(-)
>

... and the diffstat is nice :) so

Reviewed-by: Greg Kurz <[email protected]>

> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index e405729cd1c7..0fa0fbab33b0 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -113,7 +113,7 @@ enum p9_req_status_t {
> struct p9_req_t {
> int status;
> int t_err;
> - wait_queue_head_t *wq;
> + wait_queue_head_t wq;
> struct p9_fcall *tc;
> struct p9_fcall *rc;
> void *aux;
> diff --git a/net/9p/client.c b/net/9p/client.c
> index b89c7298267c..bc8aba6b5ce0 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -282,8 +282,9 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
> return ERR_PTR(-ENOMEM);
> }
> for (col = 0; col < P9_ROW_MAXTAG; col++) {
> - c->reqs[row][col].status = REQ_STATUS_IDLE;
> - c->reqs[row][col].tc = NULL;
> + req = &c->reqs[row][col];
> + req->status = REQ_STATUS_IDLE;
> + init_waitqueue_head(&req->wq);
> }
> c->max_tag += P9_ROW_MAXTAG;
> }
> @@ -293,13 +294,6 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
> col = tag % P9_ROW_MAXTAG;
>
> req = &c->reqs[row][col];
> - if (!req->wq) {
> - req->wq = kmalloc(sizeof(wait_queue_head_t), GFP_NOFS);
> - if (!req->wq)
> - goto grow_failed;
> - init_waitqueue_head(req->wq);
> - }
> -
> if (!req->tc)
> req->tc = p9_fcall_alloc(alloc_msize);
> if (!req->rc)
> @@ -319,9 +313,7 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
> pr_err("Couldn't grow tag array\n");
> kfree(req->tc);
> kfree(req->rc);
> - kfree(req->wq);
> req->tc = req->rc = NULL;
> - req->wq = NULL;
> return ERR_PTR(-ENOMEM);
> }
>
> @@ -409,7 +401,6 @@ static void p9_tag_cleanup(struct p9_client *c)
> /* free requests associated with tags */
> for (row = 0; row < (c->max_tag/P9_ROW_MAXTAG); row++) {
> for (col = 0; col < P9_ROW_MAXTAG; col++) {
> - kfree(c->reqs[row][col].wq);
> kfree(c->reqs[row][col].tc);
> kfree(c->reqs[row][col].rc);
> }
> @@ -452,7 +443,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> smp_wmb();
> req->status = status;
>
> - wake_up(req->wq);
> + wake_up(&req->wq);
> p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
> }
> EXPORT_SYMBOL(p9_client_cb);
> @@ -773,7 +764,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
> }
> again:
> /* Wait for the response */
> - err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
> + err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
>
> /*
> * Make sure our req is coherent with regard to updates in other
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 05006cbb3361..3e096c98313c 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -490,7 +490,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> virtqueue_kick(chan->vq);
> spin_unlock_irqrestore(&chan->lock, flags);
> p9_debug(P9_DEBUG_TRANS, "virtio request kicked\n");
> - err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
> + err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
> /*
> * Non kernel buffers are pinned, unpin them
> */


2018-07-12 15:17:22

by Greg Kurz

Subject: Re: [V9fs-developer] [PATCH v2 4/6] 9p: Embed wait_queue_head into p9_req_t

On Thu, 12 Jul 2018 16:40:12 +0200
Dominique Martinet <[email protected]> wrote:

> Greg Kurz wrote on Thu, Jul 12, 2018:
> > This is true when all tags have been used at least once. But the current code
> > lazily allocates the wait_queue_head_t, ie, only when a tag is used for the
> > first time. Your patch causes a full row of wait_queue_head_t to be pre-allocated.
> >
> > ie, P9_ROW_MAXTAG * 24 = 255 * 24 = 6120
> >
> > instead of (P9_ROW_MAXTAG * 8) + 24 = 255 * 8 + 24 = 2064
> >
> > This is nearly a page of allocated memory that might never be used.
> >
> > Not sure if this is a problem though...
>
> I thought the exact same, but the next patch in the series removes that
> array and allocates even more lazily with a slab, so this was a
> no-brainer :)
>

Ah ? I haven't started looking at that *big* other patch yet... :)

2018-07-13 01:19:35

by jiangyiwen

Subject: Re: [V9fs-developer] [PATCH v2 2/6] 9p: Change p9_fid_create calling convention

On 2018/7/12 5:02, Matthew Wilcox wrote:
> Return NULL instead of ERR_PTR when we can't allocate a FID. The ENOSPC
> return value was getting all the way back to userspace, and that's
> confusing for a userspace program which isn't expecting read() to tell it
> there's no space left on the filesystem. The best error we can return to
> indicate a temporary failure caused by lack of client resources is ENOMEM.
>
> Maybe it would be better to sleep until a FID is available, but that's
> not a change I'm comfortable making.
>
> Signed-off-by: Matthew Wilcox <[email protected]>

Reviewed-by: Yiwen Jiang <[email protected]>

> ---
> net/9p/client.c | 23 +++++++++--------------
> 1 file changed, 9 insertions(+), 14 deletions(-)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 999eceb8af98..389a2904b7b3 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -913,13 +913,11 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
> p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> if (!fid)
> - return ERR_PTR(-ENOMEM);
> + return NULL;
>
> ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0) {
> - ret = -ENOSPC;
> + if (ret < 0)
> goto error;
> - }
> fid->fid = ret;
>
> memset(&fid->qid, 0, sizeof(struct p9_qid));
> @@ -935,7 +933,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
>
> error:
> kfree(fid);
> - return ERR_PTR(ret);
> + return NULL;
> }
>
> static void p9_fid_destroy(struct p9_fid *fid)
> @@ -1137,9 +1135,8 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
> p9_debug(P9_DEBUG_9P, ">>> TATTACH afid %d uname %s aname %s\n",
> afid ? afid->fid : -1, uname, aname);
> fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
> goto error;
> }
> fid->uid = n_uname;
> @@ -1188,9 +1185,8 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
> clnt = oldfid->clnt;
> if (clone) {
> fid = p9_fid_create(clnt);
> - if (IS_ERR(fid)) {
> - err = PTR_ERR(fid);
> - fid = NULL;
> + if (!fid) {
> + err = -ENOMEM;
> goto error;
> }
>
> @@ -2018,9 +2014,8 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
> err = 0;
> clnt = file_fid->clnt;
> attr_fid = p9_fid_create(clnt);
> - if (IS_ERR(attr_fid)) {
> - err = PTR_ERR(attr_fid);
> - attr_fid = NULL;
> + if (!attr_fid) {
> + err = -ENOMEM;
> goto error;
> }
> p9_debug(P9_DEBUG_9P,
>


2018-07-13 02:07:15

by jiangyiwen

Subject: Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

On 2018/7/12 5:02, Matthew Wilcox wrote:
> The p9_idpool being used to allocate the IDs uses an IDR to allocate
> the IDs ... which we then keep in a doubly-linked list, rather than in
> the IDR which allocated them. We can use an IDR directly which saves
> two pointers per p9_fid, and a tiny memory allocation per p9_client.
>
> Signed-off-by: Matthew Wilcox <[email protected]>
> ---
> include/net/9p/client.h | 9 +++------
> net/9p/client.c | 44 +++++++++++++++--------------------------
> 2 files changed, 19 insertions(+), 34 deletions(-)
>
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 7af9d769b97d..e405729cd1c7 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -27,6 +27,7 @@
> #define NET_9P_CLIENT_H
>
> #include <linux/utsname.h>
> +#include <linux/idr.h>
>
> /* Number of requests per row */
> #define P9_ROW_MAXTAG 255
> @@ -128,8 +129,7 @@ struct p9_req_t {
> * @proto_version: 9P protocol version to use
> * @trans_mod: module API instantiated with this client
> * @trans: tranport instance state and API
> - * @fidpool: fid handle accounting for session
> - * @fidlist: List of active fid handles
> + * @fids: All active FID handles
> * @tagpool - transaction id accounting for session
> * @reqs - 2D array of requests
> * @max_tag - current maximum tag id allocated
> @@ -169,8 +169,7 @@ struct p9_client {
> } tcp;
> } trans_opts;
>
> - struct p9_idpool *fidpool;
> - struct list_head fidlist;
> + struct idr fids;
>
> struct p9_idpool *tagpool;
> struct p9_req_t *reqs[P9_ROW_MAXTAG];
> @@ -188,7 +187,6 @@ struct p9_client {
> * @iounit: the server reported maximum transaction size for this file
> * @uid: the numeric uid of the local user who owns this handle
> * @rdir: readdir accounting structure (allocated on demand)
> - * @flist: per-client-instance fid tracking
> * @dlist: per-dentry fid tracking
> *
> * TODO: This needs lots of explanation.
> @@ -204,7 +202,6 @@ struct p9_fid {
>
> void *rdir;
>
> - struct list_head flist;
> struct hlist_node dlist; /* list of all fids attached to a dentry */
> };
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 389a2904b7b3..b89c7298267c 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
> {
> int ret;
> struct p9_fid *fid;
> - unsigned long flags;
>
> p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> if (!fid)
> return NULL;
>
> - ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0)
> - goto error;
> - fid->fid = ret;
> -
> memset(&fid->qid, 0, sizeof(struct p9_qid));
> fid->mode = -1;
> fid->uid = current_fsuid();
> fid->clnt = clnt;
> fid->rdir = NULL;
> - spin_lock_irqsave(&clnt->lock, flags);
> - list_add(&fid->flist, &clnt->fidlist);
> - spin_unlock_irqrestore(&clnt->lock, flags);
> + fid->fid = 0;
>
> - return fid;
> + idr_preload(GFP_KERNEL);

It would be better to use GFP_NOFS here; otherwise, when memory is low,
the allocation may try to reclaim memory from v9fs itself, which can
cause unpredictable problems.

> + spin_lock_irq(&clnt->lock);
> + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
> + GFP_NOWAIT);
> + spin_unlock_irq(&clnt->lock);

Use spin_lock() instead; clnt->lock is not used in irq context.

> + idr_preload_end();
> +
> + if (!ret)
> + return fid;
>
> -error:
> kfree(fid);
> return NULL;
> }
> @@ -943,9 +942,8 @@ static void p9_fid_destroy(struct p9_fid *fid)
>
> p9_debug(P9_DEBUG_FID, "fid %d\n", fid->fid);
> clnt = fid->clnt;
> - p9_idpool_put(fid->fid, clnt->fidpool);
> spin_lock_irqsave(&clnt->lock, flags);
> - list_del(&fid->flist);
> + idr_remove(&clnt->fids, fid->fid);
> spin_unlock_irqrestore(&clnt->lock, flags);
> kfree(fid->rdir);
> kfree(fid);
> @@ -1028,7 +1026,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> memcpy(clnt->name, client_id, strlen(client_id) + 1);
>
> spin_lock_init(&clnt->lock);
> - INIT_LIST_HEAD(&clnt->fidlist);
> + idr_init(&clnt->fids);
>
> err = p9_tag_init(clnt);
> if (err < 0)
> @@ -1048,18 +1046,12 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> goto destroy_tagpool;
> }
>
> - clnt->fidpool = p9_idpool_create();
> - if (IS_ERR(clnt->fidpool)) {
> - err = PTR_ERR(clnt->fidpool);
> - goto put_trans;
> - }
> -
> p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
> clnt, clnt->trans_mod, clnt->msize, clnt->proto_version);
>
> err = clnt->trans_mod->create(clnt, dev_name, options);
> if (err)
> - goto destroy_fidpool;
> + goto put_trans;
>
> if (clnt->msize > clnt->trans_mod->maxsize)
> clnt->msize = clnt->trans_mod->maxsize;
> @@ -1072,8 +1064,6 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
>
> close_trans:
> clnt->trans_mod->close(clnt);
> -destroy_fidpool:
> - p9_idpool_destroy(clnt->fidpool);
> put_trans:
> v9fs_put_trans(clnt->trans_mod);
> destroy_tagpool:
> @@ -1086,7 +1076,8 @@ EXPORT_SYMBOL(p9_client_create);
>
> void p9_client_destroy(struct p9_client *clnt)
> {
> - struct p9_fid *fid, *fidptr;
> + struct p9_fid *fid;
> + int id;
>
> p9_debug(P9_DEBUG_MUX, "clnt %p\n", clnt);
>
> @@ -1095,14 +1086,11 @@ void p9_client_destroy(struct p9_client *clnt)
>
> v9fs_put_trans(clnt->trans_mod);
>
> - list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
> + idr_for_each_entry(&clnt->fids, fid, id) {
> pr_info("Found fid %d not clunked\n", fid->fid);
> p9_fid_destroy(fid);
> }
>
> - if (clnt->fidpool)
> - p9_idpool_destroy(clnt->fidpool);
> -

I suggest adding idr_destroy() at the end.

> p9_tag_cleanup(clnt);
>
> kfree(clnt);
>



2018-07-13 02:48:57

by Matthew Wilcox

Subject: Re: [V9fs-developer] [PATCH v2 3/6] 9p: Replace the fidlist with an IDR

On Fri, Jul 13, 2018 at 10:05:50AM +0800, jiangyiwen wrote:
> > @@ -908,30 +908,29 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
> > {
> > int ret;
> > struct p9_fid *fid;
> > - unsigned long flags;
> >
> > p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> > if (!fid)
> > return NULL;
> >
> > - ret = p9_idpool_get(clnt->fidpool);
> > - if (ret < 0)
> > - goto error;
> > - fid->fid = ret;
> > -
> > memset(&fid->qid, 0, sizeof(struct p9_qid));
> > fid->mode = -1;
> > fid->uid = current_fsuid();
> > fid->clnt = clnt;
> > fid->rdir = NULL;
> > - spin_lock_irqsave(&clnt->lock, flags);
> > - list_add(&fid->flist, &clnt->fidlist);
> > - spin_unlock_irqrestore(&clnt->lock, flags);
> > + fid->fid = 0;
> >
> > - return fid;
> > + idr_preload(GFP_KERNEL);
>
> It would be better to use GFP_NOFS here; otherwise, when memory is low,
> the allocation may try to reclaim memory from v9fs itself, which can
> cause unpredictable problems.

Earlier in this function, fid was allocated with GFP_KERNEL:

> > fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);


> > + spin_lock_irq(&clnt->lock);
> > + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
> > + GFP_NOWAIT);
> > + spin_unlock_irq(&clnt->lock);
>
> Use spin_lock() instead; clnt->lock is not used in irq context.

I don't think that's right. What about p9_fid_destroy? It was already
using spin_lock_irqsave(), so I just assumed that whoever wrote that
code at least considered that it might be called from interrupt context.

Also consider p9_free_req() which shares the same lock. We could get
rid of clnt->lock altogether as there's a lock embedded in each IDR,
but that'll introduce an unwanted dependence on the RDMA tree in this
merge window.

> > @@ -1095,14 +1086,11 @@ void p9_client_destroy(struct p9_client *clnt)
> >
> > v9fs_put_trans(clnt->trans_mod);
> >
> > - list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
> > + idr_for_each_entry(&clnt->fids, fid, id) {
> > pr_info("Found fid %d not clunked\n", fid->fid);
> > p9_fid_destroy(fid);
> > }
> >
> > - if (clnt->fidpool)
> > - p9_idpool_destroy(clnt->fidpool);
> > -
>
> I suggest adding idr_destroy() at the end.

Why? p9_fid_destroy calls idr_remove() for each fid, so it'll already
be empty.
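
The reasoning here (removing every entry leaves the map empty, so no separate teardown is needed) can be illustrated with a toy ID-to-pointer table in userspace C. This is deliberately not the kernel IDR: the toy_* names, the fixed-size array, and the lowest-free-ID policy are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_IDS 8

/* Toy map from small integer IDs to pointers; NULL marks a free slot. */
struct toy_idmap {
	void *slot[TOY_MAX_IDS];
};

/* Store ptr under the lowest free ID; returns the ID, or -1 if full. */
static int toy_alloc(struct toy_idmap *m, void *ptr)
{
	assert(ptr != NULL); /* NULL is reserved for "free" */
	for (int id = 0; id < TOY_MAX_IDS; id++) {
		if (!m->slot[id]) {
			m->slot[id] = ptr;
			return id;
		}
	}
	return -1;
}

/* Release an ID and return the pointer that was stored there, if any. */
static void *toy_remove(struct toy_idmap *m, int id)
{
	void *old;

	if (id < 0 || id >= TOY_MAX_IDS)
		return NULL;
	old = m->slot[id];
	m->slot[id] = NULL;
	return old;
}

/* Number of IDs currently in use. */
static int toy_count(const struct toy_idmap *m)
{
	int n = 0;

	for (int id = 0; id < TOY_MAX_IDS; id++)
		if (m->slot[id])
			n++;
	return n;
}

/* Walk every live entry and remove it; returns how many were removed. */
static int toy_drain(struct toy_idmap *m)
{
	int removed = 0;

	for (int id = 0; id < TOY_MAX_IDS; id++)
		if (toy_remove(m, id))
			removed++;
	return removed;
}
```

Because the map itself stores the pointers, the same structure serves for both ID allocation and iteration over live entries, which is the role the separate fid list used to play.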

Thanks for all the review, to everyone who's submitted review. This is
a really healthy community.

2018-07-18 10:07:30

by Dominique Martinet

Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests

+Cc Greg, I could use your opinion on this if you have a moment.

Matthew Wilcox wrote on Wed, Jul 11, 2018:
> Replace the custom batch allocation with a slab. Use an IDR to store
> pointers to the active requests instead of an array. We don't try to
> handle P9_NOTAG specially; the IDR will happily shrink all the way back
> once the TVERSION call has completed.

Sorry for coming back to this patch now, I just noticed something that's
actually probably a fairly big hit on performance...

While the slab is just as good as the array for the request itself, this
makes every single request allocate "fcalls" every time instead of
reusing a cached allocation.
The default msize is 8k and these allocs are probably fairly efficient,
but some transports like RDMA allow increasing this up to 1MB... And
doing this kind of allocation twice for every packet is going to be very
slow.
(not that hogging megabytes of memory was a great practice either!)


One thing is that the buffers are all going to be the same size for a
given client (.... except virtio zc buffers, I wonder what I'm missing
or why that didn't blow up before?)
Err, that aside I was going to ask if we couldn't find a way to keep a
pool of these somehow.
Ideally putting them in another slab so they could be reclaimed if
necessary, but the size could vary from one client to another, can we
create a kmem_cache object per client? the KMEM_CACHE macro is not very
flexible so I don't think that is encouraged... :)


It's a shame because I really like that patch, I'll try to find time to
run some light benchmark with varying msizes eventually but I'm not sure
when I'll find time for that... Hopefully before the 4.19 merge window!


> /**
> - * p9_tag_alloc - lookup/allocate a request by tag
> - * @c: client session to lookup tag within
> - * @tag: numeric id for transaction
> - *
> - * this is a simple array lookup, but will grow the
> - * request_slots as necessary to accommodate transaction
> - * ids which did not previously have a slot.
> - *
> - * this code relies on the client spinlock to manage locks, its
> - * possible we should switch to something else, but I'd rather
> - * stick with something low-overhead for the common case.
> + * p9_req_alloc - Allocate a new request.
> + * @c: Client session.
> + * @type: Transaction type.
> + * @max_size: Maximum packet size for this request.
> *
> + * Context: Process context.
> + * Return: Pointer to new request.
> */
> -
> static struct p9_req_t *
> -p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
> +p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> {
> - unsigned long flags;
> - int row, col;
> - struct p9_req_t *req;
> + struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
> int alloc_msize = min(c->msize, max_size);
> + int tag;
>
> - /* This looks up the original request by tag so we know which
> - * buffer to read the data into */
> - tag++;
> -
> - if (tag >= c->max_tag) {
> - spin_lock_irqsave(&c->lock, flags);
> - /* check again since original check was outside of lock */
> - while (tag >= c->max_tag) {
> - row = (tag / P9_ROW_MAXTAG);
> - c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
> - sizeof(struct p9_req_t), GFP_ATOMIC);
> -
> - if (!c->reqs[row]) {
> - pr_err("Couldn't grow tag array\n");
> - spin_unlock_irqrestore(&c->lock, flags);
> - return ERR_PTR(-ENOMEM);
> - }
> - for (col = 0; col < P9_ROW_MAXTAG; col++) {
> - req = &c->reqs[row][col];
> - req->status = REQ_STATUS_IDLE;
> - init_waitqueue_head(&req->wq);
> - }
> - c->max_tag += P9_ROW_MAXTAG;
> - }
> - spin_unlock_irqrestore(&c->lock, flags);
> - }
> - row = tag / P9_ROW_MAXTAG;
> - col = tag % P9_ROW_MAXTAG;
> + if (!req)
> + return NULL;
>
> - req = &c->reqs[row][col];
> - if (!req->tc)
> - req->tc = p9_fcall_alloc(alloc_msize);
> - if (!req->rc)
> - req->rc = p9_fcall_alloc(alloc_msize);
> + req->tc = p9_fcall_alloc(alloc_msize);
> + req->rc = p9_fcall_alloc(alloc_msize);
> if (!req->tc || !req->rc)
> - goto grow_failed;
> + goto free;
>
> p9pdu_reset(req->tc);
> p9pdu_reset(req->rc);
> -
> - req->tc->tag = tag-1;
> req->status = REQ_STATUS_ALLOC;
> + init_waitqueue_head(&req->wq);
> + INIT_LIST_HEAD(&req->req_list);
> +
> + idr_preload(GFP_NOFS);
> + spin_lock_irq(&c->lock);
> + if (type == P9_TVERSION)
> + tag = idr_alloc(&c->reqs, req, P9_NOTAG, P9_NOTAG + 1,
> + GFP_NOWAIT);
> + else
> + tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
> + req->tc->tag = tag;
> + spin_unlock_irq(&c->lock);
> + idr_preload_end();
> + if (tag < 0)
> + goto free;

--
Dominique

2018-07-18 11:50:54

by Matthew Wilcox

Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests

On Wed, Jul 18, 2018 at 12:05:54PM +0200, Dominique Martinet wrote:
> +Cc Greg, I could use your opinion on this if you have a moment.
>
> Matthew Wilcox wrote on Wed, Jul 11, 2018:
> > Replace the custom batch allocation with a slab. Use an IDR to store
> > pointers to the active requests instead of an array. We don't try to
> > handle P9_NOTAG specially; the IDR will happily shrink all the way back
> > once the TVERSION call has completed.
>
> Sorry for coming back to this patch now, I just noticed something that's
> actually probably a fairly big hit on performance...
>
> While the slab is just as good as the array for the request itself, this
> makes every single request allocate "fcalls" every time instead of
> reusing a cached allocation.
> The default msize is 8k and these allocs are probably fairly efficient,
> but some transports like RDMA allow increasing this up to 1MB... And
> doing this kind of allocation twice for every packet is going to be very
> slow.
> (not that hogging megabytes of memory was a great practice either!)
>
>
> One thing is that the buffers are all going to be the same size for a
> given client (.... except virtio zc buffers, I wonder what I'm missing
> or why that didn't blow up before?)
> Err, that aside I was going to ask if we couldn't find a way to keep a
> pool of these somehow.
> Ideally putting them in another slab so they could be reclaimed if
> necessary, but the size could vary from one client to another, can we
> create a kmem_cache object per client? the KMEM_CACHE macro is not very
> flexible so I don't think that is encouraged... :)
>
>
> It's a shame because I really like that patch, I'll try to find time to
> run some light benchmark with varying msizes eventually but I'm not sure
> when I'll find time for that... Hopefully before the 4.19 merge window!

I must admit to having not looked at the fcall aspect of this. kmalloc
is implemented in terms of slab, so it's not going to be much slower than
using a dedicated slab (a few instructions to figure out which slab cache
to use).

What would help is splitting the p9_fcall struct from the alloc_size.
kmalloc is pretty effective at allocating power-of-two sized structures,
and 8k + 32 bytes is approximately the worst case scenario. I'd do this
by embedding the p9_fcall into the p9_req_t and only allocating the
msize, like this:

(I only converted client.c to the new embedded fcall way; all the
transports also need massaging)

Regardless of whether any of the other patches go in, this patch is
worth having; it's really wasting slab allocator memory.
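
To make the size-class argument above concrete, here is a minimal userspace
sketch, assuming a purely power-of-two allocator with an 8-byte minimum class.
The names size_class(), rounding_waste() and FCALL_HDR_SIZE are hypothetical
and exist only for this illustration; the 32-byte header is the figure quoted
above. The real kernel allocator also has a few non-power-of-two caches and
treats large requests differently, so read this as an approximation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the header that used to precede the buffer. */
#define FCALL_HDR_SIZE 32

/*
 * Round a request up to the next power-of-two size class.
 * Assumption: classes are pure powers of two starting at 8 bytes.
 */
static size_t size_class(size_t n)
{
	size_t c = 8;

	/* The sketch only handles requests up to 1 GiB. */
	assert(n <= ((size_t)1 << 30));
	while (c < n)
		c <<= 1;
	return c;
}

/* Bytes lost to rounding when a request of n bytes lands in its class. */
static size_t rounding_waste(size_t n)
{
	return size_class(n) - n;
}
```

Under these assumptions, size_class(8192) stays at 8192, while
size_class(8192 + FCALL_HDR_SIZE) jumps to 16384, so rounding_waste() reports
8160 bytes of padding for the combined header-plus-buffer allocation.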

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index a4dc42c53d18..4b4ac1362ad5 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -95,8 +95,8 @@ struct p9_req_t {
int status;
int t_err;
wait_queue_head_t wq;
- struct p9_fcall *tc;
- struct p9_fcall *rc;
+ struct p9_fcall tc;
+ struct p9_fcall rc;
void *aux;
struct list_head req_list;
};
diff --git a/net/9p/client.c b/net/9p/client.c
index ecf5dd269f4c..058dfbebdaa2 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -230,15 +230,13 @@ static int parse_opts(char *opts, struct p9_client *clnt)
return ret;
}

-static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
+static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
{
- struct p9_fcall *fc;
- fc = kmalloc(sizeof(struct p9_fcall) + alloc_msize, GFP_NOFS);
- if (!fc)
- return NULL;
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (!fc->sdata)
+ return -ENOMEM;
fc->capacity = alloc_msize;
- fc->sdata = (char *) fc + sizeof(struct p9_fcall);
- return fc;
+ return 0;
}

static struct kmem_cache *p9_req_cache;
@@ -262,13 +260,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
if (!req)
return NULL;

- req->tc = p9_fcall_alloc(alloc_msize);
- req->rc = p9_fcall_alloc(alloc_msize);
- if (!req->tc || !req->rc)
+ if (p9_fcall_alloc(&req->tc, alloc_msize))
+ goto free;
+ if (p9_fcall_alloc(&req->rc, alloc_msize))
goto free;

- p9pdu_reset(req->tc);
- p9pdu_reset(req->rc);
+ p9pdu_reset(&req->tc);
+ p9pdu_reset(&req->rc);
req->status = REQ_STATUS_ALLOC;
init_waitqueue_head(&req->wq);
INIT_LIST_HEAD(&req->req_list);
@@ -280,7 +278,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
GFP_NOWAIT);
else
tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
- req->tc->tag = tag;
+ req->tc.tag = tag;
spin_unlock_irq(&c->lock);
idr_preload_end();
if (tag < 0)
@@ -289,8 +287,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
return req;

free:
- kfree(req->tc);
- kfree(req->rc);
+ kfree(req->tc.sdata);
+ kfree(req->rc.sdata);
kmem_cache_free(p9_req_cache, req);
return ERR_PTR(-ENOMEM);
}
@@ -329,14 +327,14 @@ EXPORT_SYMBOL(p9_tag_lookup);
static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
{
unsigned long flags;
- u16 tag = r->tc->tag;
+ u16 tag = r->tc.tag;

p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
spin_lock_irqsave(&c->lock, flags);
idr_remove(&c->reqs, tag);
spin_unlock_irqrestore(&c->lock, flags);
- kfree(r->tc);
- kfree(r->rc);
+ kfree(r->tc.sdata);
+ kfree(r->rc.sdata);
kmem_cache_free(p9_req_cache, r);
}

@@ -368,7 +366,7 @@ static void p9_tag_cleanup(struct p9_client *c)
*/
void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
{
- p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
+ p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc.tag);

/*
* This barrier is needed to make sure any change made to req before
@@ -378,7 +376,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
req->status = status;

wake_up(&req->wq);
- p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
+ p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
}
EXPORT_SYMBOL(p9_client_cb);

@@ -448,12 +446,12 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
int err;
int ecode;

- err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
+ err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
/*
* dump the response from server
* This should be after check errors which poplulate pdu_fcall.
*/
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
if (err) {
p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
return err;
@@ -463,7 +461,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)

if (!p9_is_proto_dotl(c)) {
char *ename;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
&ename, &ecode);
if (err)
goto out_err;
@@ -479,7 +477,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
}
kfree(ename);
} else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
err = -ecode;

p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
@@ -513,12 +511,12 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
int8_t type;
char *ename = NULL;

- err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
+ err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
/*
* dump the response from server
* This should be after parse_header which poplulate pdu_fcall.
*/
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
if (err) {
p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
return err;
@@ -533,13 +531,13 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
/* 7 = header size for RERROR; */
int inline_len = in_hdrlen - 7;

- len = req->rc->size - req->rc->offset;
+ len = req->rc.size - req->rc.offset;
if (len > (P9_ZC_HDR_SZ - 7)) {
err = -EFAULT;
goto out_err;
}

- ename = &req->rc->sdata[req->rc->offset];
+ ename = &req->rc.sdata[req->rc.offset];
if (len > inline_len) {
/* We have error in external buffer */
if (!copy_from_iter_full(ename + inline_len,
@@ -549,7 +547,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
}
}
ename = NULL;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
&ename, &ecode);
if (err)
goto out_err;
@@ -565,7 +563,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
}
kfree(ename);
} else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
err = -ecode;

p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
@@ -598,7 +596,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
int16_t oldtag;
int err;

- err = p9_parse_header(oldreq->tc, NULL, NULL, &oldtag, 1);
+ err = p9_parse_header(&oldreq->tc, NULL, NULL, &oldtag, 1);
if (err)
return err;

@@ -642,12 +640,12 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
return req;

/* marshall the data */
- p9pdu_prepare(req->tc, req->tc->tag, type);
- err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
+ p9pdu_prepare(&req->tc, req->tc.tag, type);
+ err = p9pdu_vwritef(&req->tc, c->proto_version, fmt, ap);
if (err)
goto reterr;
- p9pdu_finalize(c, req->tc);
- trace_9p_client_req(c, type, req->tc->tag);
+ p9pdu_finalize(c, &req->tc);
+ trace_9p_client_req(c, type, req->tc.tag);
return req;
reterr:
p9_free_req(c, req);
@@ -732,7 +730,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
goto reterr;

err = p9_check_errors(c, req);
- trace_9p_client_res(c, type, req->rc->tag, err);
+ trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
@@ -814,7 +812,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
goto reterr;

err = p9_check_zc_errors(c, req, uidata, in_hdrlen);
- trace_9p_client_res(c, type, req->rc->tag, err);
+ trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
@@ -897,10 +895,10 @@ static int p9_client_version(struct p9_client *c)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, c->proto_version, "ds", &msize, &version);
+ err = p9pdu_readf(&req->rc, c->proto_version, "ds", &msize, &version);
if (err) {
p9_debug(P9_DEBUG_9P, "version error %d\n", err);
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
goto error;
}

@@ -1049,9 +1047,9 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", &qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", &qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1106,9 +1104,9 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "R", &nwqids, &wqids);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "R", &nwqids, &wqids);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto clunk_fid;
}
@@ -1173,9 +1171,9 @@ int p9_client_open(struct p9_fid *fid, int mode)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1217,9 +1215,9 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1262,9 +1260,9 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1301,9 +1299,9 @@ int p9_client_symlink(struct p9_fid *dfid, const char *name,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1499,10 +1497,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
break;
}

- *err = p9pdu_readf(req->rc, clnt->proto_version,
+ *err = p9pdu_readf(&req->rc, clnt->proto_version,
"D", &count, &dataptr);
if (*err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
break;
}
@@ -1572,9 +1570,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
break;
}

- *err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
+ *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
if (*err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
break;
}
@@ -1616,9 +1614,9 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "wS", &ignored, ret);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "wS", &ignored, ret);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1669,9 +1667,9 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "A", ret);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "A", ret);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1821,11 +1819,11 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
&sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
&sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1929,9 +1927,9 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
err = PTR_ERR(req);
goto error;
}
- err = p9pdu_readf(req->rc, clnt->proto_version, "q", attr_size);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "q", attr_size);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto clunk_fid;
}
@@ -2017,9 +2015,9 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &count, &dataptr);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}
if (rsize < count) {
@@ -2058,9 +2056,9 @@ int p9_client_mknod_dotl(struct p9_fid *fid, const char *name, int mode,
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RMKNOD qid %x.%llx.%x\n", qid->type,
@@ -2089,9 +2087,9 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RMKDIR qid %x.%llx.%x\n", qid->type,
@@ -2124,9 +2122,9 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "b", status);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "b", status);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RLOCK status %i\n", *status);
@@ -2155,11 +2153,11 @@ int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *glock)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "bqqds", &glock->type,
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "bqqds", &glock->type,
&glock->start, &glock->length, &glock->proc_id,
&glock->client_id);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RGETLOCK type %i start %lld length %lld "
@@ -2185,9 +2183,9 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "s", target);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "s", target);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RREADLINK target %s\n", *target);

2018-07-18 12:48:47

by Dominique Martinet

Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests

Matthew Wilcox wrote on Wed, Jul 18, 2018:
> I must admit to having not looked at the fcall aspect of this. kmalloc
> is implemented in terms of slab, so it's not going to be much slower than
> using a dedicated slab (a few instructions to figure out which slab cache
> to use).

Yeah, kmalloc-8, kmalloc-16 etc. This is fast and efficient if the
requested amount is a power of two as you said below.


> What would help is splitting the p9_fcall struct from the alloc_size.
> kmalloc is pretty effective at allocating power-of-two sized structures,
> and 8k + 32 bytes is approximately the worst case scenario. I'd do this
> by embedding the p9_fcall into the p9_req_t and only allocating the
> msize, like this:

Good idea, I'll include something like this in my benchmarks.
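
For readers who want to try the layout outside the kernel, the following is a
rough userspace analogue of the embedded-fcall idea, assuming malloc()/free()
as stand-ins for kmalloc()/kfree(). The names struct fcall, struct req,
fcall_init(), fcall_fini(), req_alloc() and req_free() are hypothetical and
are not the 9p API. The request is zero-initialised with calloc() so that the
cleanup path can free both buffers unconditionally; that is a choice made for
this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header is embedded in the request; only the data buffer is allocated. */
struct fcall {
	size_t capacity;
	char *sdata;
};

struct req {
	struct fcall tc;	/* transmit side */
	struct fcall rc;	/* receive side */
};

/* Allocate exactly msize bytes for the payload; returns 0 or -1. */
static int fcall_init(struct fcall *fc, size_t msize)
{
	assert(msize > 0);	/* malloc(0) may legitimately return NULL */
	fc->sdata = malloc(msize);
	if (!fc->sdata)
		return -1;
	fc->capacity = msize;
	return 0;
}

static void fcall_fini(struct fcall *fc)
{
	free(fc->sdata);	/* free(NULL) is a no-op */
	fc->sdata = NULL;
	fc->capacity = 0;
}

static struct req *req_alloc(size_t msize)
{
	/* calloc zeroes both sdata pointers, keeping the error path safe. */
	struct req *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	if (fcall_init(&r->tc, msize) || fcall_init(&r->rc, msize)) {
		fcall_fini(&r->tc);
		fcall_fini(&r->rc);
		free(r);
		return NULL;
	}
	return r;
}

static void req_free(struct req *r)
{
	if (!r)
		return;
	fcall_fini(&r->tc);
	fcall_fini(&r->rc);
	free(r);
}
```

A benchmark harness could call req_alloc(msize) and req_free() in a loop for
different msize values and compare against a variant that allocates header and
buffer together.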

I'm still worried about big allocs for bigger RDMA msizes. We've had our
share of trouble with Mellanox allocating some big 256k queue pairs and
failing pretty often on very memory-fragmented servers; now that that's
fixed, I'd rather avoid reproducing the same kind of pattern if
possible. The root of the problem, though, is that it's a scam to call
this RDMA when the protocol only allows send/recv operations... I guess
this could finally be motivation to work on a v2 allowing for zero-copy
and some scatter-gather lists to allow using fragmented memory.
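
As a rough illustration of why scatter-gather helps on fragmented memory, the
sketch below stores one logical message buffer as a list of small fixed-size
segments instead of one large contiguous allocation. Everything here is an
assumption for illustration: SEG_SIZE, struct seg_buf and the seg_buf_*()
helpers are hypothetical userspace code, not part of 9p or any RDMA API, and
nothing in the thread says a v2 protocol would look like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SEG_SIZE 4096	/* assumed segment size, one typical page */

struct seg_buf {
	size_t nsegs;	/* number of segments */
	size_t len;	/* logical capacity in bytes */
	char **segs;	/* independently allocated segments */
};

/* Number of SEG_SIZE segments needed to hold len bytes. */
static size_t segs_needed(size_t len)
{
	return (len + SEG_SIZE - 1) / SEG_SIZE;
}

static void seg_buf_free(struct seg_buf *b)
{
	size_t i;

	if (!b)
		return;
	if (b->segs)
		for (i = 0; i < b->nsegs; i++)
			free(b->segs[i]);	/* free(NULL) is a no-op */
	free(b->segs);
	free(b);
}

/* Every allocation is at most SEG_SIZE bytes (plus a small pointer array). */
static struct seg_buf *seg_buf_alloc(size_t len)
{
	struct seg_buf *b = calloc(1, sizeof(*b));
	size_t i;

	if (!b)
		return NULL;
	b->nsegs = segs_needed(len);
	b->len = len;
	b->segs = calloc(b->nsegs ? b->nsegs : 1, sizeof(*b->segs));
	if (!b->segs) {
		free(b);
		return NULL;
	}
	for (i = 0; i < b->nsegs; i++) {
		b->segs[i] = malloc(SEG_SIZE);
		if (!b->segs[i]) {
			seg_buf_free(b);
			return NULL;
		}
	}
	return b;
}

/* Copy n bytes to logical offset off, crossing segment boundaries. */
static int seg_buf_write(struct seg_buf *b, size_t off, const void *src,
			 size_t n)
{
	const char *p = src;

	assert(b != NULL);
	if (off > b->len || n > b->len - off)
		return -1;
	while (n) {
		size_t in_seg = off % SEG_SIZE;
		size_t chunk = SEG_SIZE - in_seg;

		if (chunk > n)
			chunk = n;
		memcpy(b->segs[off / SEG_SIZE] + in_seg, p, chunk);
		p += chunk;
		off += chunk;
		n -= chunk;
	}
	return 0;
}
```

With a 4096-byte segment, a 1MB msize needs 256 small allocations rather than
one contiguous 1MB region, which is the property that matters on a fragmented
machine; the cost is that every reader and writer has to walk the segment list.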

Thanks for the fast reply as usual.

> (I only converted client.c to the new embedded fcall way; all the
> transports also need massaging)
>
> Regardless of whether any of the other patches go in, this patch is
> worth having; it's really wasting slab allocator memory.

There's nothing wrong with the other patches, but I agree I'd want to keep
this one as well unless the numbers are really horrible.
Still needs checking, though!

> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index a4dc42c53d18..4b4ac1362ad5 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -95,8 +95,8 @@ struct p9_req_t {
> int status;
> int t_err;
> wait_queue_head_t wq;
> - struct p9_fcall *tc;
> - struct p9_fcall *rc;
> + struct p9_fcall tc;
> + struct p9_fcall rc;
> void *aux;
> struct list_head req_list;
> };
> diff --git a/net/9p/client.c b/net/9p/client.c
> index ecf5dd269f4c..058dfbebdaa2 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -230,15 +230,13 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> return ret;
> }
>
> -static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
> +static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> {
> - struct p9_fcall *fc;
> - fc = kmalloc(sizeof(struct p9_fcall) + alloc_msize, GFP_NOFS);
> - if (!fc)
> - return NULL;
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (!fc->sdata)
> + return -ENOMEM;
> fc->capacity = alloc_msize;
> - fc->sdata = (char *) fc + sizeof(struct p9_fcall);
> - return fc;
> + return 0;
> }
>
> static struct kmem_cache *p9_req_cache;
> @@ -262,13 +260,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> if (!req)
> return NULL;
>
> - req->tc = p9_fcall_alloc(alloc_msize);
> - req->rc = p9_fcall_alloc(alloc_msize);
> - if (!req->tc || !req->rc)
> + if (p9_fcall_alloc(&req->tc, alloc_msize))
> + goto free;
> + if (p9_fcall_alloc(&req->rc, alloc_msize))
> goto free;
>
> - p9pdu_reset(req->tc);
> - p9pdu_reset(req->rc);
> + p9pdu_reset(&req->tc);
> + p9pdu_reset(&req->rc);
> req->status = REQ_STATUS_ALLOC;
> init_waitqueue_head(&req->wq);
> INIT_LIST_HEAD(&req->req_list);
> @@ -280,7 +278,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> GFP_NOWAIT);
> else
> tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
> - req->tc->tag = tag;
> + req->tc.tag = tag;
> spin_unlock_irq(&c->lock);
> idr_preload_end();
> if (tag < 0)
> @@ -289,8 +287,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> return req;
>
> free:
> - kfree(req->tc);
> - kfree(req->rc);
> + kfree(req->tc.sdata);
> + kfree(req->rc.sdata);
> kmem_cache_free(p9_req_cache, req);
> return ERR_PTR(-ENOMEM);
> }
> @@ -329,14 +327,14 @@ EXPORT_SYMBOL(p9_tag_lookup);
> static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
> {
> unsigned long flags;
> - u16 tag = r->tc->tag;
> + u16 tag = r->tc.tag;
>
> p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
> spin_lock_irqsave(&c->lock, flags);
> idr_remove(&c->reqs, tag);
> spin_unlock_irqrestore(&c->lock, flags);
> - kfree(r->tc);
> - kfree(r->rc);
> + kfree(r->tc.sdata);
> + kfree(r->rc.sdata);
> kmem_cache_free(p9_req_cache, r);
> }
>
> @@ -368,7 +366,7 @@ static void p9_tag_cleanup(struct p9_client *c)
> */
> void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> {
> - p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
> + p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc.tag);
>
> /*
> * This barrier is needed to make sure any change made to req before
> @@ -378,7 +376,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> req->status = status;
>
> wake_up(&req->wq);
> - p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
> + p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
> }
> EXPORT_SYMBOL(p9_client_cb);
>
> @@ -448,12 +446,12 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
> int err;
> int ecode;
>
> - err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
> + err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
> /*
> * dump the response from server
> * This should be after check errors which poplulate pdu_fcall.
> */
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> if (err) {
> p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
> return err;
> @@ -463,7 +461,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
>
> if (!p9_is_proto_dotl(c)) {
> char *ename;
> - err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> + err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
> &ename, &ecode);
> if (err)
> goto out_err;
> @@ -479,7 +477,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
> }
> kfree(ename);
> } else {
> - err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
> + err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
> err = -ecode;
>
> p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
> @@ -513,12 +511,12 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> int8_t type;
> char *ename = NULL;
>
> - err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
> + err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
> /*
> * dump the response from server
> * This should be after parse_header which poplulate pdu_fcall.
> */
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> if (err) {
> p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
> return err;
> @@ -533,13 +531,13 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> /* 7 = header size for RERROR; */
> int inline_len = in_hdrlen - 7;
>
> - len = req->rc->size - req->rc->offset;
> + len = req->rc.size - req->rc.offset;
> if (len > (P9_ZC_HDR_SZ - 7)) {
> err = -EFAULT;
> goto out_err;
> }
>
> - ename = &req->rc->sdata[req->rc->offset];
> + ename = &req->rc.sdata[req->rc.offset];
> if (len > inline_len) {
> /* We have error in external buffer */
> if (!copy_from_iter_full(ename + inline_len,
> @@ -549,7 +547,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> }
> }
> ename = NULL;
> - err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> + err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
> &ename, &ecode);
> if (err)
> goto out_err;
> @@ -565,7 +563,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> }
> kfree(ename);
> } else {
> - err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
> + err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
> err = -ecode;
>
> p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
> @@ -598,7 +596,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
> int16_t oldtag;
> int err;
>
> - err = p9_parse_header(oldreq->tc, NULL, NULL, &oldtag, 1);
> + err = p9_parse_header(&oldreq->tc, NULL, NULL, &oldtag, 1);
> if (err)
> return err;
>
> @@ -642,12 +640,12 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
> return req;
>
> /* marshall the data */
> - p9pdu_prepare(req->tc, req->tc->tag, type);
> - err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
> + p9pdu_prepare(&req->tc, req->tc.tag, type);
> + err = p9pdu_vwritef(&req->tc, c->proto_version, fmt, ap);
> if (err)
> goto reterr;
> - p9pdu_finalize(c, req->tc);
> - trace_9p_client_req(c, type, req->tc->tag);
> + p9pdu_finalize(c, &req->tc);
> + trace_9p_client_req(c, type, req->tc.tag);
> return req;
> reterr:
> p9_free_req(c, req);
> @@ -732,7 +730,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
> goto reterr;
>
> err = p9_check_errors(c, req);
> - trace_9p_client_res(c, type, req->rc->tag, err);
> + trace_9p_client_res(c, type, req->rc.tag, err);
> if (!err)
> return req;
> reterr:
> @@ -814,7 +812,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
> goto reterr;
>
> err = p9_check_zc_errors(c, req, uidata, in_hdrlen);
> - trace_9p_client_res(c, type, req->rc->tag, err);
> + trace_9p_client_res(c, type, req->rc.tag, err);
> if (!err)
> return req;
> reterr:
> @@ -897,10 +895,10 @@ static int p9_client_version(struct p9_client *c)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, c->proto_version, "ds", &msize, &version);
> + err = p9pdu_readf(&req->rc, c->proto_version, "ds", &msize, &version);
> if (err) {
> p9_debug(P9_DEBUG_9P, "version error %d\n", err);
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> goto error;
> }
>
> @@ -1049,9 +1047,9 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", &qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", &qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1106,9 +1104,9 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "R", &nwqids, &wqids);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "R", &nwqids, &wqids);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto clunk_fid;
> }
> @@ -1173,9 +1171,9 @@ int p9_client_open(struct p9_fid *fid, int mode)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1217,9 +1215,9 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1262,9 +1260,9 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1301,9 +1299,9 @@ int p9_client_symlink(struct p9_fid *dfid, const char *name,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1499,10 +1497,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
> break;
> }
>
> - *err = p9pdu_readf(req->rc, clnt->proto_version,
> + *err = p9pdu_readf(&req->rc, clnt->proto_version,
> "D", &count, &dataptr);
> if (*err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> break;
> }
> @@ -1572,9 +1570,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
> break;
> }
>
> - *err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
> + *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
> if (*err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> break;
> }
> @@ -1616,9 +1614,9 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "wS", &ignored, ret);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "wS", &ignored, ret);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1669,9 +1667,9 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "A", ret);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "A", ret);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1821,11 +1819,11 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
> &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
> &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1929,9 +1927,9 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
> err = PTR_ERR(req);
> goto error;
> }
> - err = p9pdu_readf(req->rc, clnt->proto_version, "q", attr_size);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "q", attr_size);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto clunk_fid;
> }
> @@ -2017,9 +2015,9 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &count, &dataptr);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
> if (rsize < count) {
> @@ -2058,9 +2056,9 @@ int p9_client_mknod_dotl(struct p9_fid *fid, const char *name, int mode,
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RMKNOD qid %x.%llx.%x\n", qid->type,
> @@ -2089,9 +2087,9 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RMKDIR qid %x.%llx.%x\n", qid->type,
> @@ -2124,9 +2122,9 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "b", status);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "b", status);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RLOCK status %i\n", *status);
> @@ -2155,11 +2153,11 @@ int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *glock)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "bqqds", &glock->type,
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "bqqds", &glock->type,
> &glock->start, &glock->length, &glock->proc_id,
> &glock->client_id);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RGETLOCK type %i start %lld length %lld "
> @@ -2185,9 +2183,9 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "s", target);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "s", target);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RREADLINK target %s\n", *target);

--
Dominique

2018-07-23 12:34:22

by Greg Kurz

Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests

On Wed, 18 Jul 2018 12:05:54 +0200
Dominique Martinet <[email protected]> wrote:

> +Cc Greg, I could use your opinion on this if you have a moment.
>

Hi Dominique,

The patch is quite big and I'm not sure I can find time to review it
carefully, but I'll try to help anyway.

> Matthew Wilcox wrote on Wed, Jul 11, 2018:
> > Replace the custom batch allocation with a slab. Use an IDR to store
> > pointers to the active requests instead of an array. We don't try to
> > handle P9_NOTAG specially; the IDR will happily shrink all the way back
> > once the TVERSION call has completed.
>
> Sorry for coming back to this patch now, I just noticed something that's
> actually probably a fairly big hit on performance...
>
> While the slab is just as good as the array for the request itself, this
> makes every single request allocate "fcalls" everytime instead of
> reusing a cached allocation.
> The default msize is 8k and these allocs probably are fairly efficient,
> but some transports like RDMA allow to increase this to up to 1MB... And

It can be even bigger with virtio:

#define VIRTQUEUE_NUM 128

.maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),

On a typical ppc64 server class setup with 64KB pages, this is nearly 8MB.
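As a quick check of that figure, here is a small user-space sketch (not kernel code; the helper name virtio_9p_maxsize is made up for illustration) that evaluates the same expression for a given page size:

```c
#include <stddef.h>

#define VIRTQUEUE_NUM 128

/*
 * Largest msize implied by the .maxsize expression quoted above,
 * for a given page size in bytes. Hypothetical helper, user space only.
 */
static size_t virtio_9p_maxsize(size_t page_size)
{
	return page_size * (VIRTQUEUE_NUM - 3);
}
```

With 64KB pages this gives 65536 * 125 = 8192000 bytes (about 7.8 MiB); with 4KB pages it gives 4096 * 125 = 512000 bytes.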

> doing this kind of allocation twice for every packet is going to be very
> slow.
> (not that hogging megabytes of memory was a great practice either!)
>
>
> One thing is that the buffers are all going to be the same size for a
> given client (.... except virtio zc buffers, I wonder what I'm missing
> or why that didn't blow up before?)

ZC allocates a 4KB buffer, which is more than enough to hold the 7-byte 9P
header and the "dqd" part of all messages that may use ZC, i.e., 16 bytes.
So I'm not sure I see what could blow up.

> Err, that aside I was going to ask if we couldn't find a way to keep a
> pool of these somehow.
> Ideally putting them in another slab so they could be reclaimed if
> necessary, but the size could vary from one client to another, can we
> create a kmem_cache object per client? the KMEM_CACHE macro is not very
> flexible so I don't think that is encouraged... :)
>
>
> It's a shame because I really like that patch, I'll try to find time to
> run some light benchmark with varying msizes eventually but I'm not sure
> when I'll find time for that... Hopefully before the 4.19 merge window!
>

Yeah, the open-coded cache we have now really obfuscates things.

Maybe have a per-client kmem_cache object for non-ZC requests with
size msize [*], and a global kmem_cache object for ZC requests with
fixed size P9_ZC_HDR_SZ.

[*] the server can require a smaller msize during version negotiation,
so maybe we should change the kmem_cache object in this case.

Cheers,

--
Greg

>
> > /**
> > - * p9_tag_alloc - lookup/allocate a request by tag
> > - * @c: client session to lookup tag within
> > - * @tag: numeric id for transaction
> > - *
> > - * this is a simple array lookup, but will grow the
> > - * request_slots as necessary to accommodate transaction
> > - * ids which did not previously have a slot.
> > - *
> > - * this code relies on the client spinlock to manage locks, its
> > - * possible we should switch to something else, but I'd rather
> > - * stick with something low-overhead for the common case.
> > + * p9_req_alloc - Allocate a new request.
> > + * @c: Client session.
> > + * @type: Transaction type.
> > + * @max_size: Maximum packet size for this request.
> > *
> > + * Context: Process context.
> > + * Return: Pointer to new request.
> > */
> > -
> > static struct p9_req_t *
> > -p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
> > +p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> > {
> > - unsigned long flags;
> > - int row, col;
> > - struct p9_req_t *req;
> > + struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
> > int alloc_msize = min(c->msize, max_size);
> > + int tag;
> >
> > - /* This looks up the original request by tag so we know which
> > - * buffer to read the data into */
> > - tag++;
> > -
> > - if (tag >= c->max_tag) {
> > - spin_lock_irqsave(&c->lock, flags);
> > - /* check again since original check was outside of lock */
> > - while (tag >= c->max_tag) {
> > - row = (tag / P9_ROW_MAXTAG);
> > - c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
> > - sizeof(struct p9_req_t), GFP_ATOMIC);
> > -
> > - if (!c->reqs[row]) {
> > - pr_err("Couldn't grow tag array\n");
> > - spin_unlock_irqrestore(&c->lock, flags);
> > - return ERR_PTR(-ENOMEM);
> > - }
> > - for (col = 0; col < P9_ROW_MAXTAG; col++) {
> > - req = &c->reqs[row][col];
> > - req->status = REQ_STATUS_IDLE;
> > - init_waitqueue_head(&req->wq);
> > - }
> > - c->max_tag += P9_ROW_MAXTAG;
> > - }
> > - spin_unlock_irqrestore(&c->lock, flags);
> > - }
> > - row = tag / P9_ROW_MAXTAG;
> > - col = tag % P9_ROW_MAXTAG;
> > + if (!req)
> > + return NULL;
> >
> > - req = &c->reqs[row][col];
> > - if (!req->tc)
> > - req->tc = p9_fcall_alloc(alloc_msize);
> > - if (!req->rc)
> > - req->rc = p9_fcall_alloc(alloc_msize);
> > + req->tc = p9_fcall_alloc(alloc_msize);
> > + req->rc = p9_fcall_alloc(alloc_msize);
> > if (!req->tc || !req->rc)
> > - goto grow_failed;
> > + goto free;
> >
> > p9pdu_reset(req->tc);
> > p9pdu_reset(req->rc);
> > -
> > - req->tc->tag = tag-1;
> > req->status = REQ_STATUS_ALLOC;
> > + init_waitqueue_head(&req->wq);
> > + INIT_LIST_HEAD(&req->req_list);
> > +
> > + idr_preload(GFP_NOFS);
> > + spin_lock_irq(&c->lock);
> > + if (type == P9_TVERSION)
> > + tag = idr_alloc(&c->reqs, req, P9_NOTAG, P9_NOTAG + 1,
> > + GFP_NOWAIT);
> > + else
> > + tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
> > + req->tc->tag = tag;
> > + spin_unlock_irq(&c->lock);
> > + idr_preload_end();
> > + if (tag < 0)
> > + goto free;
>


2018-07-23 12:44:29

by Dominique Martinet

Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests

Greg Kurz wrote on Mon, Jul 23, 2018:
> The patch is quite big and I'm not sure I can find time to review it
> carefully, but I'll try to help anyway.

No worry, thanks for this already.

> > Sorry for coming back to this patch now, I just noticed something that's
> > actually probably a fairly big hit on performance...
> >
> > While the slab is just as good as the array for the request itself, this
> > makes every single request allocate "fcalls" everytime instead of
> > reusing a cached allocation.
> > The default msize is 8k and these allocs probably are fairly efficient,
> > but some transports like RDMA allow to increase this to up to 1MB... And
>
> It can be even bigger with virtio:
>
> #define VIRTQUEUE_NUM 128
>
> .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
>
> On a typical ppc64 server class setup with 64KB pages, this is nearly 8MB.

I don't think I'll be able to test 64KB pages, and it's "just" 500k with
4K pages so I'll go with IB.
I just finished reinstalling my IB-enabled VMs, now to get some iops
test running (dbench maybe) and I'll get some figures to be able to play
with different models and evaluate the impact of these.

> > One thing is that the buffers are all going to be the same size for a
> > given client (.... except virtio zc buffers, I wonder what I'm missing
> > or why that didn't blow up before?)
>
> ZC allocates a 4KB buffer, which is more than enough to hold the 7-byte 9P
> header and the "dqd" part of all messages that may use ZC, ie, 16 bytes.
> So I'm not sure to catch what could blow up.

ZC requests won't blow up, but from what I can see with the current
(old) request cache array, if a ZC request has a not-yet used tag it'll
allocate a new 4k buffer, then if a normal request uses that tag it'll
get the 4k buffer instead of an msize sized one.

On the client side the request would be posted with req->rc->capacity,
which would correctly be 4k, but I'm not sure what would happen if qemu
tries to write more than the given size to that request?

> > It's a shame because I really like that patch, I'll try to find time to
> > run some light benchmark with varying msizes eventually but I'm not sure
> > when I'll find time for that... Hopefully before the 4.19 merge window!
> >
>
> Yeah, the open-coded cache we have now really obfuscates things.
>
> Maybe have a per-client kmem_cache object for non-ZC requests with
> size msize [*], and a global kmem_cache object for ZC requests with
> fixed size P9_ZC_HDR_SZ.
>
> [*] the server can require a smaller msize during version negotiation,
> so maybe we should change the kmem_cache object in this case.

Yeah, if we're going to want to accommodate non-power-of-two buffers, I
think we'll need a separate kmem_cache for them.
The ZC requests could be made into exactly 4k and these could come with
regular kmalloc just fine, it looks like trying to create a cache of
that size would just return the same cache used by kmalloc anyway so
it's probably easier to fall back to kmalloc if requested alloc size
doesn't match what we were hoping for.
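The fallback idea can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: struct buf_pool is a made-up free list standing in for a kmem_cache, malloc/free stand in for kmalloc/kfree, and none of the names below exist in the 9p code.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for a kmem_cache: a free list of equally sized
 * buffers. obj_size must be at least sizeof(void *), since free buffers
 * are linked through their first bytes.
 */
struct buf_pool {
	size_t obj_size;
	void *free_list;
};

struct buf {
	void *data;
	size_t capacity;
};

static void *pool_get(struct buf_pool *p)
{
	void *obj = p->free_list;

	if (obj) {
		/* Reuse a cached buffer instead of allocating again. */
		p->free_list = *(void **)obj;
		return obj;
	}
	return malloc(p->obj_size);
}

static void pool_put(struct buf_pool *p, void *obj)
{
	*(void **)obj = p->free_list;
	p->free_list = obj;
}

/* Use the pool only when the size matches; otherwise fall back to malloc. */
static int buf_alloc(struct buf_pool *p, struct buf *b, size_t size)
{
	if (p && size == p->obj_size)
		b->data = pool_get(p);
	else
		b->data = malloc(size);
	if (!b->data)
		return -1;
	b->capacity = size;
	return 0;
}

/* The recorded capacity tells us which allocator the buffer came from. */
static void buf_free(struct buf_pool *p, struct buf *b)
{
	if (!b->data)
		return;
	if (p && b->capacity == p->obj_size)
		pool_put(p, b->data);
	else
		free(b->data);
	b->data = NULL;
}
```

The point of keying both paths on the size is that the free side never needs a separate flag: a buffer whose capacity equals the pool's object size goes back to the pool, anything else (for example a 4k ZC buffer) goes back to the general allocator.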

I'll try to get figures for various approaches before the merge window
for 4.19 starts, it's getting closer though...

--
Dominique

2018-07-23 15:59:50

by Greg Kurz

Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests

On Mon, 23 Jul 2018 14:25:31 +0200
Dominique Martinet <[email protected]> wrote:

> Greg Kurz wrote on Mon, Jul 23, 2018:
> > The patch is quite big and I'm not sure I can find time to review it
> > carefully, but I'll try to help anyway.
>
> No worry, thanks for this already.
>
> > > Sorry for coming back to this patch now, I just noticed something that's
> > > actually probably a fairly big hit on performance...
> > >
> > > While the slab is just as good as the array for the request itself, this
> > > makes every single request allocate "fcalls" everytime instead of
> > > reusing a cached allocation.
> > > The default msize is 8k and these allocs probably are fairly efficient,
> > > but some transports like RDMA allow to increase this to up to 1MB... And
> >
> > It can be even bigger with virtio:
> >
> > #define VIRTQUEUE_NUM 128
> >
> > .maxsize = PAGE_SIZE * (VIRTQUEUE_NUM - 3),
> >
> > On a typical ppc64 server class setup with 64KB pages, this is nearly 8MB.
>
> I don't think I'll be able to test 64KB pages, and it's "just" 500k with
> 4K pages so I'll go with IB.
> I just finished reinstalling my IB-enabled VMs, now to get some iops
> test running (dbench maybe) and I'll get some figures to be able to play
> with different models and evaluate the impact of these.
>

Sounds like a good plan.

> > > One thing is that the buffers are all going to be the same size for a
> > > given client (.... except virtio zc buffers, I wonder what I'm missing
> > > or why that didn't blow up before?)
> >
> > ZC allocates a 4KB buffer, which is more than enough to hold the 7-byte 9P
> > header and the "dqd" part of all messages that may use ZC, ie, 16 bytes.
> > So I'm not sure to catch what could blow up.
>
> ZC requests won't blow up, but from what I can see with the current
> (old) request cache array, if a ZC request has a not-yet used tag it'll
> allocate a new 4k buffer, then if a normal request uses that tag it'll
> get the 4k buffer instead of an msize sized one.
>

Indeed.

> On the client size the request would be posted with req->rc->capacity
> which would correctly be 4k, but I'm not sure what would happen if qemu
> tries to write more than the given size to that request?
>

QEMU would detect that the sg list doesn't have enough capacity.
Old QEMUs used to return a RERROR or RLERROR message with ENOBUFS
in this case. It didn't make sense to hijack the 9P protocol, which
is transport agnostic, to report misconfigured buffers, especially
since, in the worst case, we might not even have enough space for the
error response... So, since QEMU 2.10, we put the virtio 9p device into
broken state instead, i.e., inoperative until it gets reset.


I guess this situation was never hit because server responses mostly
need less than 4KB...

> > > It's a shame because I really like that patch, I'll try to find time to
> > > run some light benchmark with varying msizes eventually but I'm not sure
> > > when I'll find time for that... Hopefully before the 4.19 merge window!
> > >
> >
> > Yeah, the open-coded cache we have now really obfuscates things.
> >
> > Maybe have a per-client kmem_cache object for non-ZC requests with
> > size msize [*], and a global kmem_cache object for ZC requests with
> > fixed size P9_ZC_HDR_SZ.
> >
> > [*] the server can require a smaller msize during version negotiation,
> > so maybe we should change the kmem_cache object in this case.
>
> Yeah, if we're going to want to accomodate non-power of two buffers, I
> think we'll need a separate kmem_cache for them.
> The ZC requests could be made into exactly 4k and these could come with
> regular kmalloc just fine, it looks like trying to create a cache of
> that size would just return the same cache used by kmalloc anyway so
> it's probably easier to fall back to kmalloc if requested alloc size
> doesn't match what we were hoping for.
>

You're right, ZC requests could rely on kmalloc() directly.

> I'll try to get figures for various approaches before the merge window
> for 4.19 starts, it's getting closer though...
>

Many thanks for your effort, but we've been living with this code
since the beginning. If this misses the 4.19 merge window, we'll have
more time to validate the approach and polish the fix for 4.20 :)

Cheers,

--
Greg

2018-07-30 09:32:24

by Dominique Martinet

Subject: Re: [PATCH v2 5/6] 9p: Use a slab for allocating requests

Dominique Martinet wrote on Mon, Jul 23, 2018:
> I'll try to get figures for various approaches before the merge window
> for 4.19 starts, it's getting closer though...

Here are some numbers, with v4.18-rc7 + current test tree (my 9p-next) as
a base.


For context, I'm running on VMs that bind their cores to CPUs on the
host (32 cores), and have a Mellanox Connect-IB card through SR-IOV.

The server is nfs-ganesha, serving a tmpfs filesystem on a second VM
(different host)

Mounting with msize=$((1024*1024))

My main problem with this test is that the client has way too much
memory and it's mostly pristine with a boot not long before, so any kind
of memory pressure won't be seen here.
If someone knows how to fragment memory quickly I'll take that and rerun
the tests :)


I've changed my mind from mdtest to a simple ior: as I'm testing on
trans=rdma there's no difference, and I'm more familiar with ior options.

I ran two workloads:
- 32 processes, file per process, 512k at a time writing a total of
32GB (1GB per file), repeated 10 times
- 32 processes, file per process, 32 bytes at a time writing a total of
16MB (512k per file), repeated 10 times.

The first test gives a proper impression of the throughput the systems
can sustain and the results are pretty much around what I was expecting
for the setup; the second test is purely a latency test (how long does
it take to send 512k RPCs)


I ran almost all of these tests with KASAN enabled in the VMs a first
time, so leaving the results with KASAN at the end for reference...


Overall I'm rather happy with the result. Without KASAN the overhead of
the patch isn't negligible (~6%), but I'd say it's acceptable for
correctness, and with an extra two patches with the suggested changes
(rounding down the alloc size to not include the struct overhead and
separate kmem cache) it's getting down to 0.5%, which is quite good, I
think.
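For reference, the req/s and overhead figures can be reproduced from the "Small" I/O write Mean(s) values in the raw results below. The helpers here are a user-space sketch with made-up names, not part of any tool used for the runs:

```c
/*
 * Requests per second for a small-I/O run: agg_bytes / xfer_bytes requests
 * (16777216 / 32 = 524288, i.e. 512k) completed in mean_s seconds.
 */
static double reqs_per_sec(double agg_bytes, double xfer_bytes, double mean_s)
{
	return agg_bytes / xfer_bytes / mean_s;
}

/* Relative loss in request rate of `patched` versus `base`, in percent. */
static double overhead_pct(double base, double patched)
{
	return (base - patched) / base * 100.0;
}
```

Feeding in 8.01074s (base), 8.50453s (patch as submitted) and 8.05362s (round alloc + kmem cache) gives roughly 65.4k, 61.6k and 65.1k req/s, i.e. about 5.8% and 0.5% below the base.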

I'll send the two patches to the list shortly. The first one is rather
huge even if it's a trivial change logically, so part of me wants to get
it merged quickly to not have to deal with rebases... ;)


With KASAN, well, it certainly does more things but I hope
performance-critical systems don't have it enabled in the first place.



Raw results:

* Base = 4.18-rc7 + queued patches, without request cache rework
- "Big" I/Os:
Summary of all tests:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5842.40 5751.58 5793.53 23.93 5.65606 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6098.92 6018.63 6064.30 20.00 5.40348 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 2.10 1.91 2.00 0.05 8.01074 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 1.27 1.07 1.15 0.06 13.93901 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
-> 512k / 8.01074 = 65.4k req/s


* Base + patch as submitted
- "Big" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5844.84 5665.32 5787.15 48.94 5.66261 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6082.24 6039.62 6057.14 12.50 5.40983 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 1.95 1.82 1.88 0.04 8.50453 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 1.18 1.07 1.14 0.03 14.04634 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
-> 512k / 8.50453 = 61.6k req/s


* Base + patch as submitted + moving the header into req so the
allocation is "round" as suggested by Matthew
- "Big" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5861.79 5680.99 5795.71 48.84 5.65424 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6098.54 6037.55 6067.80 19.39 5.40036 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 1.98 1.81 1.90 0.06 8.43521 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 1.19 1.08 1.13 0.03 14.11709 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
-> 62.2k req/s

* Base + patches submitted + round alloc + kmem cache in the client
struct
- "Big" I/Os
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5859.51 5747.64 5808.22 34.81 5.64186 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6087.90 6037.03 6063.98 15.14 5.40374 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 2.07 1.95 1.99 0.03 8.05362 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 1.22 1.11 1.16 0.04 13.75312 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
-> 65.1k req/s

* Base + patches submitted + kmem cache in the client struct (kind of
similar to testing an 'odd' msize like 1.001MB)
- "Big" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5883.03 5725.30 5811.58 45.22 5.63874 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6090.29 6015.23 6062.49 25.93 5.40514 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 2.07 1.89 1.98 0.05 8.10028 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 1.23 1.05 1.12 0.05 14.25607 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
-> 64.7k req/s




Raw results with KASAN:
* Base = 4.18-rc7 + queued patches, without request cache rework
- "Big" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5790.03 5705.32 5749.69 27.63 5.69922 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6095.11 6007.29 6066.50 26.26 5.40157 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 1.63 1.53 1.58 0.03 10.10286 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 1.43 1.19 1.31 0.07 12.27704 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0


* Base + patch as submitted
- "Big" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5773.60 5673.92 5729.01 29.63 5.71982 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6097.96 6006.50 6059.40 26.74 5.40790 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 1.15 1.08 1.12 0.02 14.32230 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 1.18 1.06 1.10 0.04 14.51172 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0


* Base + patch as submitted + moving the header into req so the
allocation is "round" as suggested by Matthew
- "Big" I/Os:
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5878.75 5709.74 5798.96 57.12 5.65122 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6089.83 6039.75 6072.64 14.78 5.39604 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0

- "Small" I/Os
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 1.33 1.26 1.29 0.02 12.38185 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 1.18 1.08 1.15 0.03 13.90525 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0


* Base + patches submitted + round alloc + kmem cache in the client
struct
- "Big" I/Os
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 5816.89 5729.58 5775.02 26.71 5.67422 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0
read 6087.33 6032.62 6058.69 16.73 5.40847 0 32 32 10 1 0 1 0 0 1 1073741824 524288 34359738368 POSIX 0


- "Small" I/Os
Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum
write 0.87 0.85 0.86 0.01 18.59584 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
read 0.89 0.86 0.88 0.01 18.26275 0 32 32 10 1 0 1 0 0 1 524288 32 16777216 POSIX 0
-> I'm not sure why it's so different, actually; the cache doesn't turn
up in /proc/slabinfo so I'm figuring it got merged with kmalloc-1024 so
there should be no difference? And this turned out fine without KASAN...


--
Dominique


2018-07-30 09:35:45

by Dominique Martinet

Subject: [PATCH 2/2] net/9p: add a per-client fcall kmem_cache

From: Dominique Martinet <[email protected]>

Having a specific cache for the fcall allocations helps speed up
allocations a bit, especially in case of non-"round" msizes.

The caches will automatically be merged if there are multiple caches
of items with the same size so we do not need to try to share a cache
between different clients of the same size.

Since the msize is negotiated with the server, only allocate the cache
after that negotiation has happened - previous allocations or
allocations of different sizes (e.g. zero-copy fcall) are made with
kmalloc directly.

Signed-off-by: Dominique Martinet <[email protected]>
---
include/net/9p/client.h | 2 ++
net/9p/client.c | 40 ++++++++++++++++++++++++++++++++--------
net/9p/trans_rdma.c | 2 +-
3 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 4b4ac1362ad5..8d9bc7402a42 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -123,6 +123,7 @@ struct p9_client {
struct p9_trans_module *trans_mod;
enum p9_trans_status status;
void *trans;
+ struct kmem_cache *fcall_cache;

union {
struct {
@@ -230,6 +231,7 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
kgid_t gid, struct p9_qid *);
int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status);
int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *fl);
+void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc);
struct p9_req_t *p9_tag_lookup(struct p9_client *, u16);
void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);

diff --git a/net/9p/client.c b/net/9p/client.c
index ba99a94a12c9..215e3b1ed7b4 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -231,15 +231,34 @@ static int parse_opts(char *opts, struct p9_client *clnt)
return ret;
}

-static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
+static int p9_fcall_alloc(struct p9_client *c, struct p9_fcall *fc,
+ int alloc_msize)
{
- fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (c->fcall_cache && alloc_msize == c->msize)
+ fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
+ else
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
if (!fc->sdata)
return -ENOMEM;
fc->capacity = alloc_msize;
return 0;
}

+void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc)
+{
+ /* sdata can be NULL for interrupted requests in trans_rdma,
+ * and kmem_cache_free does not do NULL-check for us
+ */
+ if (unlikely(!fc->sdata))
+ return;
+
+ if (c->fcall_cache && fc->capacity == c->msize)
+ kmem_cache_free(c->fcall_cache, fc->sdata);
+ else
+ kfree(fc->sdata);
+}
+EXPORT_SYMBOL(p9_fcall_free);
+
static struct kmem_cache *p9_req_cache;

/**
@@ -261,9 +280,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
if (!req)
return NULL;

- if (p9_fcall_alloc(&req->tc, alloc_msize))
+ if (p9_fcall_alloc(c, &req->tc, alloc_msize))
goto free;
- if (p9_fcall_alloc(&req->rc, alloc_msize))
+ if (p9_fcall_alloc(c, &req->rc, alloc_msize))
goto free;

p9pdu_reset(&req->tc);
@@ -288,8 +307,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
return req;

free:
- kfree(req->tc.sdata);
- kfree(req->rc.sdata);
+ p9_fcall_free(c, &req->tc);
+ p9_fcall_free(c, &req->rc);
kmem_cache_free(p9_req_cache, req);
return ERR_PTR(-ENOMEM);
}
@@ -333,8 +352,8 @@ static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
spin_lock_irqsave(&c->lock, flags);
idr_remove(&c->reqs, tag);
spin_unlock_irqrestore(&c->lock, flags);
- kfree(r->tc.sdata);
- kfree(r->rc.sdata);
+ p9_fcall_free(c, &r->tc);
+ p9_fcall_free(c, &r->rc);
kmem_cache_free(p9_req_cache, r);
}

@@ -944,6 +963,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)

clnt->trans_mod = NULL;
clnt->trans = NULL;
+ clnt->fcall_cache = NULL;

client_id = utsname()->nodename;
memcpy(clnt->name, client_id, strlen(client_id) + 1);
@@ -980,6 +1000,9 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
if (err)
goto close_trans;

+ clnt->fcall_cache = kmem_cache_create("9p-fcall-cache", clnt->msize,
+ 0, 0, NULL);
+
return clnt;

close_trans:
@@ -1011,6 +1034,7 @@ void p9_client_destroy(struct p9_client *clnt)

p9_tag_cleanup(clnt);

+ kmem_cache_destroy(clnt->fcall_cache);
kfree(clnt);
}
EXPORT_SYMBOL(p9_client_destroy);
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index c5cac97df7f7..5e43f0a00b3a 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -445,7 +445,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
/* Got one! */
- kfree(req->rc.sdata);
+ p9_fcall_free(client, &req->rc);
req->rc.sdata = NULL;
goto dont_need_post_recv;
} else {
--
2.17.1


2018-07-30 09:36:28

by Dominique Martinet

Subject: [PATCH 1/2] net/9p: embed fcall in req to round down buffer allocs

From: Dominique Martinet <[email protected]>

'msize' is often a power of two, or at least page-aligned, so avoiding
an overhead of two dozen bytes for each allocation will help the
allocator do its work and reduce memory fragmentation.

Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
---
include/net/9p/client.h | 4 +-
net/9p/client.c | 160 ++++++++++++++++++++--------------------
net/9p/trans_fd.c | 12 +--
net/9p/trans_rdma.c | 29 ++++----
net/9p/trans_virtio.c | 18 ++---
net/9p/trans_xen.c | 12 +--
6 files changed, 117 insertions(+), 118 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index a4dc42c53d18..4b4ac1362ad5 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -95,8 +95,8 @@ struct p9_req_t {
int status;
int t_err;
wait_queue_head_t wq;
- struct p9_fcall *tc;
- struct p9_fcall *rc;
+ struct p9_fcall tc;
+ struct p9_fcall rc;
void *aux;
struct list_head req_list;
};
diff --git a/net/9p/client.c b/net/9p/client.c
index 88db45966740..ba99a94a12c9 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -231,15 +231,13 @@ static int parse_opts(char *opts, struct p9_client *clnt)
return ret;
}

-static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
+static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
{
- struct p9_fcall *fc;
- fc = kmalloc(sizeof(struct p9_fcall) + alloc_msize, GFP_NOFS);
- if (!fc)
- return NULL;
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (!fc->sdata)
+ return -ENOMEM;
fc->capacity = alloc_msize;
- fc->sdata = (char *) fc + sizeof(struct p9_fcall);
- return fc;
+ return 0;
}

static struct kmem_cache *p9_req_cache;
@@ -263,13 +261,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
if (!req)
return NULL;

- req->tc = p9_fcall_alloc(alloc_msize);
- req->rc = p9_fcall_alloc(alloc_msize);
- if (!req->tc || !req->rc)
+ if (p9_fcall_alloc(&req->tc, alloc_msize))
+ goto free;
+ if (p9_fcall_alloc(&req->rc, alloc_msize))
goto free;

- p9pdu_reset(req->tc);
- p9pdu_reset(req->rc);
+ p9pdu_reset(&req->tc);
+ p9pdu_reset(&req->rc);
req->status = REQ_STATUS_ALLOC;
init_waitqueue_head(&req->wq);
INIT_LIST_HEAD(&req->req_list);
@@ -281,7 +279,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
GFP_NOWAIT);
else
tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
- req->tc->tag = tag;
+ req->tc.tag = tag;
spin_unlock_irq(&c->lock);
idr_preload_end();
if (tag < 0)
@@ -290,8 +288,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
return req;

free:
- kfree(req->tc);
- kfree(req->rc);
+ kfree(req->tc.sdata);
+ kfree(req->rc.sdata);
kmem_cache_free(p9_req_cache, req);
return ERR_PTR(-ENOMEM);
}
@@ -329,14 +327,14 @@ EXPORT_SYMBOL(p9_tag_lookup);
static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
{
unsigned long flags;
- u16 tag = r->tc->tag;
+ u16 tag = r->tc.tag;

p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
spin_lock_irqsave(&c->lock, flags);
idr_remove(&c->reqs, tag);
spin_unlock_irqrestore(&c->lock, flags);
- kfree(r->tc);
- kfree(r->rc);
+ kfree(r->tc.sdata);
+ kfree(r->rc.sdata);
kmem_cache_free(p9_req_cache, r);
}

@@ -368,7 +366,7 @@ static void p9_tag_cleanup(struct p9_client *c)
*/
void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
{
- p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
+ p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc.tag);

/*
* This barrier is needed to make sure any change made to req before
@@ -378,7 +376,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
req->status = status;

wake_up(&req->wq);
- p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
+ p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
}
EXPORT_SYMBOL(p9_client_cb);

@@ -449,18 +447,18 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
int err;
int ecode;

- err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
- if (req->rc->size >= c->msize) {
+ err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
+ if (req->rc.size >= c->msize) {
p9_debug(P9_DEBUG_ERROR,
"requested packet size too big: %d\n",
- req->rc->size);
+ req->rc.size);
return -EIO;
}
/*
* dump the response from server
* This should be after check errors which poplulate pdu_fcall.
*/
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
if (err) {
p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
return err;
@@ -470,7 +468,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)

if (!p9_is_proto_dotl(c)) {
char *ename;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
&ename, &ecode);
if (err)
goto out_err;
@@ -486,7 +484,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
}
kfree(ename);
} else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
err = -ecode;

p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
@@ -520,12 +518,12 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
int8_t type;
char *ename = NULL;

- err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
+ err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
/*
* dump the response from server
* This should be after parse_header which poplulate pdu_fcall.
*/
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
if (err) {
p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
return err;
@@ -540,13 +538,13 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
/* 7 = header size for RERROR; */
int inline_len = in_hdrlen - 7;

- len = req->rc->size - req->rc->offset;
+ len = req->rc.size - req->rc.offset;
if (len > (P9_ZC_HDR_SZ - 7)) {
err = -EFAULT;
goto out_err;
}

- ename = &req->rc->sdata[req->rc->offset];
+ ename = &req->rc.sdata[req->rc.offset];
if (len > inline_len) {
/* We have error in external buffer */
if (!copy_from_iter_full(ename + inline_len,
@@ -556,7 +554,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
}
}
ename = NULL;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
&ename, &ecode);
if (err)
goto out_err;
@@ -572,7 +570,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
}
kfree(ename);
} else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
err = -ecode;

p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
@@ -605,7 +603,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
int16_t oldtag;
int err;

- err = p9_parse_header(oldreq->tc, NULL, NULL, &oldtag, 1);
+ err = p9_parse_header(&oldreq->tc, NULL, NULL, &oldtag, 1);
if (err)
return err;

@@ -649,12 +647,12 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
return req;

/* marshall the data */
- p9pdu_prepare(req->tc, req->tc->tag, type);
- err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
+ p9pdu_prepare(&req->tc, req->tc.tag, type);
+ err = p9pdu_vwritef(&req->tc, c->proto_version, fmt, ap);
if (err)
goto reterr;
- p9pdu_finalize(c, req->tc);
- trace_9p_client_req(c, type, req->tc->tag);
+ p9pdu_finalize(c, &req->tc);
+ trace_9p_client_req(c, type, req->tc.tag);
return req;
reterr:
p9_free_req(c, req);
@@ -739,7 +737,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
goto reterr;

err = p9_check_errors(c, req);
- trace_9p_client_res(c, type, req->rc->tag, err);
+ trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
@@ -821,7 +819,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
goto reterr;

err = p9_check_zc_errors(c, req, uidata, in_hdrlen);
- trace_9p_client_res(c, type, req->rc->tag, err);
+ trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
@@ -904,10 +902,10 @@ static int p9_client_version(struct p9_client *c)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, c->proto_version, "ds", &msize, &version);
+ err = p9pdu_readf(&req->rc, c->proto_version, "ds", &msize, &version);
if (err) {
p9_debug(P9_DEBUG_9P, "version error %d\n", err);
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
goto error;
}

@@ -1056,9 +1054,9 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", &qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", &qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1113,9 +1111,9 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "R", &nwqids, &wqids);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "R", &nwqids, &wqids);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto clunk_fid;
}
@@ -1180,9 +1178,9 @@ int p9_client_open(struct p9_fid *fid, int mode)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1224,9 +1222,9 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1269,9 +1267,9 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1308,9 +1306,9 @@ int p9_client_symlink(struct p9_fid *dfid, const char *name,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1506,10 +1504,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
break;
}

- *err = p9pdu_readf(req->rc, clnt->proto_version,
+ *err = p9pdu_readf(&req->rc, clnt->proto_version,
"D", &count, &dataptr);
if (*err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
break;
}
@@ -1579,9 +1577,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
break;
}

- *err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
+ *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
if (*err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
break;
}
@@ -1623,9 +1621,9 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "wS", &ignored, ret);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "wS", &ignored, ret);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1676,9 +1674,9 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "A", ret);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "A", ret);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1828,11 +1826,11 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
- &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
- &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
+ &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
+ &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1936,9 +1934,9 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
err = PTR_ERR(req);
goto error;
}
- err = p9pdu_readf(req->rc, clnt->proto_version, "q", attr_size);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "q", attr_size);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto clunk_fid;
}
@@ -2024,9 +2022,9 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &count, &dataptr);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}
if (rsize < count) {
@@ -2065,9 +2063,9 @@ int p9_client_mknod_dotl(struct p9_fid *fid, const char *name, int mode,
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RMKNOD qid %x.%llx.%x\n", qid->type,
@@ -2096,9 +2094,9 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RMKDIR qid %x.%llx.%x\n", qid->type,
@@ -2131,9 +2129,9 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "b", status);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "b", status);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RLOCK status %i\n", *status);
@@ -2162,11 +2160,11 @@ int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *glock)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "bqqds", &glock->type,
- &glock->start, &glock->length, &glock->proc_id,
- &glock->client_id);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "bqqds", &glock->type,
+ &glock->start, &glock->length, &glock->proc_id,
+ &glock->client_id);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RGETLOCK type %i start %lld length %lld "
@@ -2192,9 +2190,9 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "s", target);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "s", target);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RREADLINK target %s\n", *target);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e2ef3c782c53..20f46f13fe83 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -354,7 +354,7 @@ static void p9_read_work(struct work_struct *work)
goto error;
}

- if (m->req->rc == NULL) {
+ if (m->req->rc.sdata == NULL) {
p9_debug(P9_DEBUG_ERROR,
"No recv fcall for tag %d (req %p), disconnecting!\n",
m->rc.tag, m->req);
@@ -362,7 +362,7 @@ static void p9_read_work(struct work_struct *work)
err = -EIO;
goto error;
}
- m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
+ m->rc.sdata = m->req->rc.sdata;
memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
m->rc.capacity = m->rc.size;
}
@@ -372,7 +372,7 @@ static void p9_read_work(struct work_struct *work)
*/
if ((m->req) && (m->rc.offset == m->rc.capacity)) {
p9_debug(P9_DEBUG_TRANS, "got new packet\n");
- m->req->rc->size = m->rc.offset;
+ m->req->rc.size = m->rc.offset;
spin_lock(&m->client->lock);
if (m->req->status != REQ_STATUS_ERROR)
status = REQ_STATUS_RCVD;
@@ -469,8 +469,8 @@ static void p9_write_work(struct work_struct *work)
p9_debug(P9_DEBUG_TRANS, "move req %p\n", req);
list_move_tail(&req->req_list, &m->req_list);

- m->wbuf = req->tc->sdata;
- m->wsize = req->tc->size;
+ m->wbuf = req->tc.sdata;
+ m->wsize = req->tc.size;
m->wpos = 0;
spin_unlock(&m->client->lock);
}
@@ -663,7 +663,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
struct p9_conn *m = &ts->conn;

p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
- m, current, req->tc, req->tc->id);
+ m, current, &req->tc, req->tc.id);
if (m->err < 0)
return m->err;

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 2ab4574183c9..c5cac97df7f7 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -122,7 +122,7 @@ struct p9_rdma_context {
dma_addr_t busa;
union {
struct p9_req_t *req;
- struct p9_fcall *rc;
+ struct p9_fcall rc;
};
};

@@ -320,8 +320,8 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
if (wc->status != IB_WC_SUCCESS)
goto err_out;

- c->rc->size = wc->byte_len;
- err = p9_parse_header(c->rc, NULL, NULL, &tag, 1);
+ c->rc.size = wc->byte_len;
+ err = p9_parse_header(&c->rc, NULL, NULL, &tag, 1);
if (err)
goto err_out;

@@ -331,12 +331,13 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)

/* Check that we have not yet received a reply for this request.
*/
- if (unlikely(req->rc)) {
+ if (unlikely(req->rc.sdata)) {
pr_err("Duplicate reply for request %d", tag);
goto err_out;
}

- req->rc = c->rc;
+ req->rc.size = c->rc.size;
+ req->rc.sdata = c->rc.sdata;
p9_client_cb(client, req, REQ_STATUS_RCVD);

out:
@@ -361,7 +362,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
container_of(wc->wr_cqe, struct p9_rdma_context, cqe);

ib_dma_unmap_single(rdma->cm_id->device,
- c->busa, c->req->tc->size,
+ c->busa, c->req->tc.size,
DMA_TO_DEVICE);
up(&rdma->sq_sem);
kfree(c);
@@ -401,7 +402,7 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
struct ib_sge sge;

c->busa = ib_dma_map_single(rdma->cm_id->device,
- c->rc->sdata, client->msize,
+ c->rc.sdata, client->msize,
DMA_FROM_DEVICE);
if (ib_dma_mapping_error(rdma->cm_id->device, c->busa))
goto error;
@@ -443,9 +444,9 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
**/
if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
- /* Got one ! */
- kfree(req->rc);
- req->rc = NULL;
+ /* Got one! */
+ kfree(req->rc.sdata);
+ req->rc.sdata = NULL;
goto dont_need_post_recv;
} else {
/* We raced and lost. */
@@ -459,7 +460,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
err = -ENOMEM;
goto recv_error;
}
- rpl_context->rc = req->rc;
+ rpl_context->rc.sdata = req->rc.sdata;

/*
* Post a receive buffer for this request. We need to ensure
@@ -479,7 +480,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
goto recv_error;
}
/* remove posted receive buffer from request structure */
- req->rc = NULL;
+ req->rc.sdata = NULL;

dont_need_post_recv:
/* Post the request */
@@ -491,7 +492,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
c->req = req;

c->busa = ib_dma_map_single(rdma->cm_id->device,
- c->req->tc->sdata, c->req->tc->size,
+ c->req->tc.sdata, c->req->tc.size,
DMA_TO_DEVICE);
if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
err = -EIO;
@@ -501,7 +502,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
c->cqe.done = send_done;

sge.addr = c->busa;
- sge.length = c->req->tc->size;
+ sge.length = c->req->tc.size;
sge.lkey = rdma->pd->local_dma_lkey;

wr.next = NULL;
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 6265d1d62749..80dd34d8a6af 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -157,7 +157,7 @@ static void req_done(struct virtqueue *vq)
}

if (len) {
- req->rc->size = len;
+ req->rc.size = len;
p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
}
}
@@ -274,12 +274,12 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
out_sgs = in_sgs = 0;
/* Handle out VirtIO ring buffers */
out = pack_sg_list(chan->sg, 0,
- VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
+ VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
if (out)
sgs[out_sgs++] = chan->sg;

in = pack_sg_list(chan->sg, out,
- VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
+ VIRTQUEUE_NUM, req->rc.sdata, req->rc.capacity);
if (in)
sgs[out_sgs + in_sgs++] = chan->sg + out;

@@ -417,15 +417,15 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
out_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
if (n != outlen) {
__le32 v = cpu_to_le32(n);
- memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
+ memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
outlen = n;
}
/* The size field of the message must include the length of the
* header and the length of the data. We didn't actually know
* the length of the data until this point so add it in now.
*/
- sz = cpu_to_le32(req->tc->size + outlen);
- memcpy(&req->tc->sdata[0], &sz, sizeof(sz));
+ sz = cpu_to_le32(req->tc.size + outlen);
+ memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
} else if (uidata) {
int n = p9_get_mapped_pages(chan, &in_pages, uidata,
inlen, &offs, &need_drop);
@@ -434,7 +434,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
in_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
if (n != inlen) {
__le32 v = cpu_to_le32(n);
- memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
+ memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
inlen = n;
}
}
@@ -446,7 +446,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,

/* out data */
out = pack_sg_list(chan->sg, 0,
- VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
+ VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);

if (out)
sgs[out_sgs++] = chan->sg;
@@ -465,7 +465,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
* alloced memory and payload onto the user buffer.
*/
in = pack_sg_list(chan->sg, out,
- VIRTQUEUE_NUM, req->rc->sdata, in_hdr_len);
+ VIRTQUEUE_NUM, req->rc.sdata, in_hdr_len);
if (in)
sgs[out_sgs + in_sgs++] = chan->sg + out;

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index c2d54ac76bfd..1a5b38892eb4 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -141,7 +141,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
struct xen_9pfs_front_priv *priv = NULL;
RING_IDX cons, prod, masked_cons, masked_prod;
unsigned long flags;
- u32 size = p9_req->tc->size;
+ u32 size = p9_req->tc.size;
struct xen_9pfs_dataring *ring;
int num;

@@ -154,7 +154,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
if (!priv || priv->client != client)
return -EINVAL;

- num = p9_req->tc->tag % priv->num_rings;
+ num = p9_req->tc.tag % priv->num_rings;
ring = &priv->rings[num];

again:
@@ -176,7 +176,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);

- xen_9pfs_write_packet(ring->data.out, p9_req->tc->sdata, size,
+ xen_9pfs_write_packet(ring->data.out, p9_req->tc.sdata, size,
&masked_prod, masked_cons, XEN_9PFS_RING_SIZE);

p9_req->status = REQ_STATUS_SENT;
@@ -229,12 +229,12 @@ static void p9_xen_response(struct work_struct *work)
continue;
}

- memcpy(req->rc, &h, sizeof(h));
- req->rc->offset = 0;
+ memcpy(&req->rc, &h, sizeof(h));
+ req->rc.offset = 0;

masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
/* Then, read the whole packet (including the header) */
- xen_9pfs_read_packet(req->rc->sdata, ring->data.in, h.size,
+ xen_9pfs_read_packet(req->rc.sdata, ring->data.in, h.size,
masked_prod, &masked_cons,
XEN_9PFS_RING_SIZE);

--
2.17.1


2018-07-31 00:57:05

by piaojun

Subject: Re: [V9fs-developer] [PATCH 1/2] net/9p: embed fcall in req to round down buffer allocs

Hi Dominique,

This is really a *big* patch, but the modification looks harmless. I
suggest running test cases to cover it. Please see my comments below.
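
For reference, the allocation-size argument in the quoted commit message can be illustrated with a simplified model. This is only a sketch: it assumes an allocator whose size classes are plain powers of two, which is an approximation of how kmalloc behaves for large sizes, and the helper names bucket_size() and wasted() are made up for this example. The header size passed in is a parameter; 24 matches the "two dozen bytes" mentioned in the commit message.

```c
#include <stddef.h>

/*
 * Round n up to the next power of two (n >= 1). This is a simplified
 * stand-in for an allocator's size classes, not the kernel's real logic.
 */
static size_t bucket_size(size_t n)
{
	size_t b = 1;

	while (b < n)
		b <<= 1;
	return b;
}

/*
 * Bytes lost to rounding when `hdr` bytes of header share one
 * allocation with an msize-byte data buffer.
 */
static size_t wasted(size_t msize, size_t hdr)
{
	return bucket_size(msize + hdr) - (msize + hdr);
}
```

Under this model, a buffer of 8192 bytes with no header in front of it fits its size class exactly, while the same buffer with a 24-byte header spills into the 16384-byte class and leaves 8168 bytes unused.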

On 2018/7/30 17:34, Dominique Martinet wrote:
> From: Dominique Martinet <[email protected]>
>
> 'msize' is often a power of two, or at least page-aligned, so avoiding
> an overhead of two dozen bytes for each allocation will help the
> allocator do its work and reduce memory fragmentation.
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Dominique Martinet <[email protected]>
> ---
> include/net/9p/client.h | 4 +-
> net/9p/client.c | 160 ++++++++++++++++++++--------------------
> net/9p/trans_fd.c | 12 +--
> net/9p/trans_rdma.c | 29 ++++----
> net/9p/trans_virtio.c | 18 ++---
> net/9p/trans_xen.c | 12 +--
> 6 files changed, 117 insertions(+), 118 deletions(-)
>
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index a4dc42c53d18..4b4ac1362ad5 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -95,8 +95,8 @@ struct p9_req_t {
> int status;
> int t_err;
> wait_queue_head_t wq;
> - struct p9_fcall *tc;
> - struct p9_fcall *rc;
> + struct p9_fcall tc;
> + struct p9_fcall rc;
> void *aux;
> struct list_head req_list;
> };
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 88db45966740..ba99a94a12c9 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -231,15 +231,13 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> return ret;
> }
>
> -static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
> +static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> {
> - struct p9_fcall *fc;
> - fc = kmalloc(sizeof(struct p9_fcall) + alloc_msize, GFP_NOFS);
> - if (!fc)
> - return NULL;
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (!fc->sdata)
> + return -ENOMEM;
> fc->capacity = alloc_msize;
> - fc->sdata = (char *) fc + sizeof(struct p9_fcall);
> - return fc;
> + return 0;
> }
>
> static struct kmem_cache *p9_req_cache;
> @@ -263,13 +261,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> if (!req)
> return NULL;
>
> - req->tc = p9_fcall_alloc(alloc_msize);
> - req->rc = p9_fcall_alloc(alloc_msize);
> - if (!req->tc || !req->rc)
> + if (p9_fcall_alloc(&req->tc, alloc_msize))
> + goto free;
> + if (p9_fcall_alloc(&req->rc, alloc_msize))
> goto free;
>
> - p9pdu_reset(req->tc);
> - p9pdu_reset(req->rc);
> + p9pdu_reset(&req->tc);
> + p9pdu_reset(&req->rc);
> req->status = REQ_STATUS_ALLOC;
> init_waitqueue_head(&req->wq);
> INIT_LIST_HEAD(&req->req_list);
> @@ -281,7 +279,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> GFP_NOWAIT);
> else
> tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
> - req->tc->tag = tag;
> + req->tc.tag = tag;
> spin_unlock_irq(&c->lock);
> idr_preload_end();
> if (tag < 0)
> @@ -290,8 +288,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> return req;
>
> free:
> - kfree(req->tc);
> - kfree(req->rc);
> + kfree(req->tc.sdata);
> + kfree(req->rc.sdata);

I wonder whether we could free a wild pointer here, since 'sdata' has not
been initialized to NULL before the first allocation can fail.
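
A minimal userspace sketch of the pattern in question, assuming the request object may contain stale bytes when it is handed out (the quoted hunk does not show whether the slab returns it zeroed). The types and helpers below (struct fcall, struct req, buf_alloc(), req_init(), fail_after) are hypothetical stand-ins for the kernel structures, with malloc()/free() in place of kmalloc()/kfree(). Clearing both 'sdata' pointers before the first allocation keeps the shared error path safe, because freeing NULL is a no-op.

```c
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-in for struct p9_fcall; only the fields used here. */
struct fcall {
	size_t capacity;
	char *sdata;
};

/* Stand-in for struct p9_req_t with both fcalls embedded. */
struct req {
	struct fcall tc;
	struct fcall rc;
};

/* Hypothetical failure injection: -1 never fails, N fails after N successes. */
static int fail_after = -1;

static void *buf_alloc(size_t size)
{
	if (fail_after == 0)
		return NULL;
	if (fail_after > 0)
		fail_after--;
	return malloc(size);
}

static int fcall_alloc(struct fcall *fc, size_t size)
{
	fc->sdata = buf_alloc(size);
	if (!fc->sdata)
		return -ENOMEM;
	fc->capacity = size;
	return 0;
}

/* Returns 0 on success or -ENOMEM; never passes a stale pointer to free(). */
static int req_init(struct req *r, size_t size)
{
	/* r may be a reused object, so clear both pointers up front. */
	r->tc.sdata = NULL;
	r->rc.sdata = NULL;

	if (fcall_alloc(&r->tc, size))
		goto free;
	if (fcall_alloc(&r->rc, size))
		goto free;
	return 0;

free:
	free(r->tc.sdata);	/* free(NULL) is a no-op, like kfree(NULL) */
	free(r->rc.sdata);
	r->tc.sdata = NULL;
	r->rc.sdata = NULL;
	return -ENOMEM;
}
```

Without the two NULL assignments at the top of req_init(), a failure of the first allocation would reach the error label with rc.sdata still holding whatever the object contained before.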

> kmem_cache_free(p9_req_cache, req);
> return ERR_PTR(-ENOMEM);
> }
> @@ -329,14 +327,14 @@ EXPORT_SYMBOL(p9_tag_lookup);
> static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
> {
> unsigned long flags;
> - u16 tag = r->tc->tag;
> + u16 tag = r->tc.tag;
>
> p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
> spin_lock_irqsave(&c->lock, flags);
> idr_remove(&c->reqs, tag);
> spin_unlock_irqrestore(&c->lock, flags);
> - kfree(r->tc);
> - kfree(r->rc);
> + kfree(r->tc.sdata);
> + kfree(r->rc.sdata);
> kmem_cache_free(p9_req_cache, r);
> }
>
> @@ -368,7 +366,7 @@ static void p9_tag_cleanup(struct p9_client *c)
> */
> void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> {
> - p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
> + p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc.tag);
>
> /*
> * This barrier is needed to make sure any change made to req before
> @@ -378,7 +376,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> req->status = status;
>
> wake_up(&req->wq);
> - p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
> + p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
> }
> EXPORT_SYMBOL(p9_client_cb);
>
> @@ -449,18 +447,18 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
> int err;
> int ecode;
>
> - err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
> - if (req->rc->size >= c->msize) {
> + err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
> + if (req->rc.size >= c->msize) {
> p9_debug(P9_DEBUG_ERROR,
> "requested packet size too big: %d\n",
> - req->rc->size);
> + req->rc.size);
> return -EIO;
> }
> /*
> * dump the response from server
> * This should be after check errors which poplulate pdu_fcall.
> */
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> if (err) {
> p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
> return err;
> @@ -470,7 +468,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
>
> if (!p9_is_proto_dotl(c)) {
> char *ename;
> - err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> + err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
> &ename, &ecode);
> if (err)
> goto out_err;
> @@ -486,7 +484,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
> }
> kfree(ename);
> } else {
> - err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
> + err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
> err = -ecode;
>
> p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
> @@ -520,12 +518,12 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> int8_t type;
> char *ename = NULL;
>
> - err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
> + err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
> /*
> * dump the response from server
> * This should be after parse_header which poplulate pdu_fcall.
> */
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> if (err) {
> p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
> return err;
> @@ -540,13 +538,13 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> /* 7 = header size for RERROR; */
> int inline_len = in_hdrlen - 7;
>
> - len = req->rc->size - req->rc->offset;
> + len = req->rc.size - req->rc.offset;
> if (len > (P9_ZC_HDR_SZ - 7)) {
> err = -EFAULT;
> goto out_err;
> }
>
> - ename = &req->rc->sdata[req->rc->offset];
> + ename = &req->rc.sdata[req->rc.offset];
> if (len > inline_len) {
> /* We have error in external buffer */
> if (!copy_from_iter_full(ename + inline_len,
> @@ -556,7 +554,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> }
> }
> ename = NULL;
> - err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> + err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
> &ename, &ecode);
> if (err)
> goto out_err;
> @@ -572,7 +570,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> }
> kfree(ename);
> } else {
> - err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
> + err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
> err = -ecode;
>
> p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
> @@ -605,7 +603,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
> int16_t oldtag;
> int err;
>
> - err = p9_parse_header(oldreq->tc, NULL, NULL, &oldtag, 1);
> + err = p9_parse_header(&oldreq->tc, NULL, NULL, &oldtag, 1);
> if (err)
> return err;
>
> @@ -649,12 +647,12 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
> return req;
>
> /* marshall the data */
> - p9pdu_prepare(req->tc, req->tc->tag, type);
> - err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
> + p9pdu_prepare(&req->tc, req->tc.tag, type);
> + err = p9pdu_vwritef(&req->tc, c->proto_version, fmt, ap);
> if (err)
> goto reterr;
> - p9pdu_finalize(c, req->tc);
> - trace_9p_client_req(c, type, req->tc->tag);
> + p9pdu_finalize(c, &req->tc);
> + trace_9p_client_req(c, type, req->tc.tag);
> return req;
> reterr:
> p9_free_req(c, req);
> @@ -739,7 +737,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
> goto reterr;
>
> err = p9_check_errors(c, req);
> - trace_9p_client_res(c, type, req->rc->tag, err);
> + trace_9p_client_res(c, type, req->rc.tag, err);
> if (!err)
> return req;
> reterr:
> @@ -821,7 +819,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
> goto reterr;
>
> err = p9_check_zc_errors(c, req, uidata, in_hdrlen);
> - trace_9p_client_res(c, type, req->rc->tag, err);
> + trace_9p_client_res(c, type, req->rc.tag, err);
> if (!err)
> return req;
> reterr:
> @@ -904,10 +902,10 @@ static int p9_client_version(struct p9_client *c)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, c->proto_version, "ds", &msize, &version);
> + err = p9pdu_readf(&req->rc, c->proto_version, "ds", &msize, &version);
> if (err) {
> p9_debug(P9_DEBUG_9P, "version error %d\n", err);
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> goto error;
> }
>
> @@ -1056,9 +1054,9 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", &qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", &qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1113,9 +1111,9 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "R", &nwqids, &wqids);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "R", &nwqids, &wqids);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto clunk_fid;
> }
> @@ -1180,9 +1178,9 @@ int p9_client_open(struct p9_fid *fid, int mode)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1224,9 +1222,9 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1269,9 +1267,9 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1308,9 +1306,9 @@ int p9_client_symlink(struct p9_fid *dfid, const char *name,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1506,10 +1504,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
> break;
> }
>
> - *err = p9pdu_readf(req->rc, clnt->proto_version,
> + *err = p9pdu_readf(&req->rc, clnt->proto_version,
> "D", &count, &dataptr);
> if (*err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> break;
> }
> @@ -1579,9 +1577,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
> break;
> }
>
> - *err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
> + *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
> if (*err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> break;
> }
> @@ -1623,9 +1621,9 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "wS", &ignored, ret);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "wS", &ignored, ret);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1676,9 +1674,9 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "A", ret);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "A", ret);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1828,11 +1826,11 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
> - &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
> - &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
> + &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
> + &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1936,9 +1934,9 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
> err = PTR_ERR(req);
> goto error;
> }
> - err = p9pdu_readf(req->rc, clnt->proto_version, "q", attr_size);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "q", attr_size);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto clunk_fid;
> }
> @@ -2024,9 +2022,9 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &count, &dataptr);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
> if (rsize < count) {
> @@ -2065,9 +2063,9 @@ int p9_client_mknod_dotl(struct p9_fid *fid, const char *name, int mode,
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RMKNOD qid %x.%llx.%x\n", qid->type,
> @@ -2096,9 +2094,9 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RMKDIR qid %x.%llx.%x\n", qid->type,
> @@ -2131,9 +2129,9 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "b", status);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "b", status);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RLOCK status %i\n", *status);
> @@ -2162,11 +2160,11 @@ int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *glock)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "bqqds", &glock->type,
> - &glock->start, &glock->length, &glock->proc_id,
> - &glock->client_id);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "bqqds", &glock->type,
> + &glock->start, &glock->length, &glock->proc_id,
> + &glock->client_id);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RGETLOCK type %i start %lld length %lld "
> @@ -2192,9 +2190,9 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "s", target);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "s", target);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RREADLINK target %s\n", *target);
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index e2ef3c782c53..20f46f13fe83 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -354,7 +354,7 @@ static void p9_read_work(struct work_struct *work)
> goto error;
> }
>
> - if (m->req->rc == NULL) {
> + if (m->req->rc.sdata == NULL) {
> p9_debug(P9_DEBUG_ERROR,
> "No recv fcall for tag %d (req %p), disconnecting!\n",
> m->rc.tag, m->req);
> @@ -362,7 +362,7 @@ static void p9_read_work(struct work_struct *work)
> err = -EIO;
> goto error;
> }
> - m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
> + m->rc.sdata = m->req->rc.sdata;
> memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
> m->rc.capacity = m->rc.size;
> }
> @@ -372,7 +372,7 @@ static void p9_read_work(struct work_struct *work)
> */
> if ((m->req) && (m->rc.offset == m->rc.capacity)) {
> p9_debug(P9_DEBUG_TRANS, "got new packet\n");
> - m->req->rc->size = m->rc.offset;
> + m->req->rc.size = m->rc.offset;
> spin_lock(&m->client->lock);
> if (m->req->status != REQ_STATUS_ERROR)
> status = REQ_STATUS_RCVD;
> @@ -469,8 +469,8 @@ static void p9_write_work(struct work_struct *work)
> p9_debug(P9_DEBUG_TRANS, "move req %p\n", req);
> list_move_tail(&req->req_list, &m->req_list);
>
> - m->wbuf = req->tc->sdata;
> - m->wsize = req->tc->size;
> + m->wbuf = req->tc.sdata;
> + m->wsize = req->tc.size;
> m->wpos = 0;
> spin_unlock(&m->client->lock);
> }
> @@ -663,7 +663,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
> struct p9_conn *m = &ts->conn;
>
> p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
> - m, current, req->tc, req->tc->id);
> + m, current, &req->tc, req->tc.id);
> if (m->err < 0)
> return m->err;
>
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 2ab4574183c9..c5cac97df7f7 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -122,7 +122,7 @@ struct p9_rdma_context {
> dma_addr_t busa;
> union {
> struct p9_req_t *req;
> - struct p9_fcall *rc;
> + struct p9_fcall rc;
> };
> };
>
> @@ -320,8 +320,8 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
> if (wc->status != IB_WC_SUCCESS)
> goto err_out;
>
> - c->rc->size = wc->byte_len;
> - err = p9_parse_header(c->rc, NULL, NULL, &tag, 1);
> + c->rc.size = wc->byte_len;
> + err = p9_parse_header(&c->rc, NULL, NULL, &tag, 1);
> if (err)
> goto err_out;
>
> @@ -331,12 +331,13 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
>
> /* Check that we have not yet received a reply for this request.
> */
> - if (unlikely(req->rc)) {
> + if (unlikely(req->rc.sdata)) {
> pr_err("Duplicate reply for request %d", tag);
> goto err_out;
> }
>
> - req->rc = c->rc;
> + req->rc.size = c->rc.size;
> + req->rc.sdata = c->rc.sdata;
> p9_client_cb(client, req, REQ_STATUS_RCVD);
>
> out:
> @@ -361,7 +362,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
> container_of(wc->wr_cqe, struct p9_rdma_context, cqe);
>
> ib_dma_unmap_single(rdma->cm_id->device,
> - c->busa, c->req->tc->size,
> + c->busa, c->req->tc.size,
> DMA_TO_DEVICE);
> up(&rdma->sq_sem);
> kfree(c);
> @@ -401,7 +402,7 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
> struct ib_sge sge;
>
> c->busa = ib_dma_map_single(rdma->cm_id->device,
> - c->rc->sdata, client->msize,
> + c->rc.sdata, client->msize,
> DMA_FROM_DEVICE);
> if (ib_dma_mapping_error(rdma->cm_id->device, c->busa))
> goto error;
> @@ -443,9 +444,9 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> **/
> if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
> if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
> - /* Got one ! */
> - kfree(req->rc);
> - req->rc = NULL;
> + /* Got one! */
> + kfree(req->rc.sdata);
> + req->rc.sdata = NULL;
> goto dont_need_post_recv;
> } else {
> /* We raced and lost. */
> @@ -459,7 +460,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> err = -ENOMEM;
> goto recv_error;
> }
> - rpl_context->rc = req->rc;
> + rpl_context->rc.sdata = req->rc.sdata;
>
> /*
> * Post a receive buffer for this request. We need to ensure
> @@ -479,7 +480,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> goto recv_error;
> }
> /* remove posted receive buffer from request structure */
> - req->rc = NULL;
> + req->rc.sdata = NULL;
>
> dont_need_post_recv:
> /* Post the request */
> @@ -491,7 +492,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> c->req = req;
>
> c->busa = ib_dma_map_single(rdma->cm_id->device,
> - c->req->tc->sdata, c->req->tc->size,
> + c->req->tc.sdata, c->req->tc.size,
> DMA_TO_DEVICE);
> if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
> err = -EIO;
> @@ -501,7 +502,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> c->cqe.done = send_done;
>
> sge.addr = c->busa;
> - sge.length = c->req->tc->size;
> + sge.length = c->req->tc.size;
> sge.lkey = rdma->pd->local_dma_lkey;
>
> wr.next = NULL;
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 6265d1d62749..80dd34d8a6af 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -157,7 +157,7 @@ static void req_done(struct virtqueue *vq)
> }
>
> if (len) {
> - req->rc->size = len;
> + req->rc.size = len;
> p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
> }
> }
> @@ -274,12 +274,12 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
> out_sgs = in_sgs = 0;
> /* Handle out VirtIO ring buffers */
> out = pack_sg_list(chan->sg, 0,
> - VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
> + VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
> if (out)
> sgs[out_sgs++] = chan->sg;
>
> in = pack_sg_list(chan->sg, out,
> - VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
> + VIRTQUEUE_NUM, req->rc.sdata, req->rc.capacity);
> if (in)
> sgs[out_sgs + in_sgs++] = chan->sg + out;
>
> @@ -417,15 +417,15 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> out_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
> if (n != outlen) {
> __le32 v = cpu_to_le32(n);
> - memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
> + memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
> outlen = n;
> }
> /* The size field of the message must include the length of the
> * header and the length of the data. We didn't actually know
> * the length of the data until this point so add it in now.
> */
> - sz = cpu_to_le32(req->tc->size + outlen);
> - memcpy(&req->tc->sdata[0], &sz, sizeof(sz));
> + sz = cpu_to_le32(req->tc.size + outlen);
> + memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
> } else if (uidata) {
> int n = p9_get_mapped_pages(chan, &in_pages, uidata,
> inlen, &offs, &need_drop);
> @@ -434,7 +434,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> in_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
> if (n != inlen) {
> __le32 v = cpu_to_le32(n);
> - memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
> + memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
> inlen = n;
> }
> }
> @@ -446,7 +446,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
>
> /* out data */
> out = pack_sg_list(chan->sg, 0,
> - VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
> + VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
>
> if (out)
> sgs[out_sgs++] = chan->sg;
> @@ -465,7 +465,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> * alloced memory and payload onto the user buffer.
> */
> in = pack_sg_list(chan->sg, out,
> - VIRTQUEUE_NUM, req->rc->sdata, in_hdr_len);
> + VIRTQUEUE_NUM, req->rc.sdata, in_hdr_len);
> if (in)
> sgs[out_sgs + in_sgs++] = chan->sg + out;
>
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index c2d54ac76bfd..1a5b38892eb4 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -141,7 +141,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> struct xen_9pfs_front_priv *priv = NULL;
> RING_IDX cons, prod, masked_cons, masked_prod;
> unsigned long flags;
> - u32 size = p9_req->tc->size;
> + u32 size = p9_req->tc.size;
> struct xen_9pfs_dataring *ring;
> int num;
>
> @@ -154,7 +154,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> if (!priv || priv->client != client)
> return -EINVAL;
>
> - num = p9_req->tc->tag % priv->num_rings;
> + num = p9_req->tc.tag % priv->num_rings;
> ring = &priv->rings[num];
>
> again:
> @@ -176,7 +176,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
>
> - xen_9pfs_write_packet(ring->data.out, p9_req->tc->sdata, size,
> + xen_9pfs_write_packet(ring->data.out, p9_req->tc.sdata, size,
> &masked_prod, masked_cons, XEN_9PFS_RING_SIZE);
>
> p9_req->status = REQ_STATUS_SENT;
> @@ -229,12 +229,12 @@ static void p9_xen_response(struct work_struct *work)
> continue;
> }
>
> - memcpy(req->rc, &h, sizeof(h));
> - req->rc->offset = 0;
> + memcpy(&req->rc, &h, sizeof(h));
> + req->rc.offset = 0;
>
> masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> /* Then, read the whole packet (including the header) */
> - xen_9pfs_read_packet(req->rc->sdata, ring->data.in, h.size,
> + xen_9pfs_read_packet(req->rc.sdata, ring->data.in, h.size,
> masked_prod, &masked_cons,
> XEN_9PFS_RING_SIZE);
>
>

2018-07-31 01:14:15

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 1/2] net/9p: embed fcall in req to round down buffer allocs

piaojun wrote on Tue, Jul 31, 2018:
> This is really a *big* patch, but the modification seems no harm. And I
> suggest running testcases to cover this. Please see my comments below.

I'm always running tests, but more never hurt - please help ;)

For reference I'm running a subset of cthon04[1], ltp[2] and some custom
tests like these[3][4]

[1] https://fedorapeople.org/cgit/steved/public_git/cthon04.git/
[2] https://github.com/linux-test-project/ltp
[3] https://github.com/phdeniel/sigmund/blob/master/modules/allfs.inc#L208
[4] https://github.com/phdeniel/sigmund/blob/master/modules/allfs.inc#L251

> > [...]
> > @@ -263,13 +261,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> > if (!req)
> > return NULL;
> >
> > - req->tc = p9_fcall_alloc(alloc_msize);
> > - req->rc = p9_fcall_alloc(alloc_msize);
> > - if (!req->tc || !req->rc)
> > + if (p9_fcall_alloc(&req->tc, alloc_msize))
> > + goto free;
> > + if (p9_fcall_alloc(&req->rc, alloc_msize))
> > goto free;
> >
> > - p9pdu_reset(req->tc);
> > - p9pdu_reset(req->rc);
> > + p9pdu_reset(&req->tc);
> > + p9pdu_reset(&req->rc);
> > req->status = REQ_STATUS_ALLOC;
> > init_waitqueue_head(&req->wq);
> > INIT_LIST_HEAD(&req->req_list);
> > @@ -281,7 +279,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> > GFP_NOWAIT);
> > else
> > tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
> > - req->tc->tag = tag;
> > + req->tc.tag = tag;
> > spin_unlock_irq(&c->lock);
> > idr_preload_end();
> > if (tag < 0)
> > @@ -290,8 +288,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> > return req;
> >
> > free:
> > - kfree(req->tc);
> > - kfree(req->rc);
> > + kfree(req->tc.sdata);
> > + kfree(req->rc.sdata);
>
> I wonder if we will free a wild pointer as 'sdata' has not been initialized NULL.

Good point: since this patch splits the two allocations, we can now jump
here when the first p9_fcall_alloc fails, before rc.sdata has been set.

Please consider this added to the previous patch (I'll send a v2 after
this has had more time for review, you can find the amended commit in my
9p-test tree meanwhile):
-----8<-----------------------------
diff --git a/net/9p/client.c b/net/9p/client.c
index ba99a94a12c9..fe030ef1c076 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -262,7 +262,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
return NULL;

if (p9_fcall_alloc(&req->tc, alloc_msize))
- goto free;
+ goto free_req;
if (p9_fcall_alloc(&req->rc, alloc_msize))
goto free;

@@ -290,6 +290,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
free:
kfree(req->tc.sdata);
kfree(req->rc.sdata);
+free_req:
kmem_cache_free(p9_req_cache, req);
return ERR_PTR(-ENOMEM);
}
-----8<-----------------------------

The second goto doesn't need changing because rc.sdata will be set to
NULL if the allocation failed, and kfree(NULL) is a no-op.

--
Dominique
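
For readers following the thread, the two-label cleanup discussed above can be sketched in userspace C. This is only an illustration of the pattern, not the kernel code: the names `buf`, `req`, `buf_alloc`, `req_alloc`, `req_free`, `fail_at` and `alloc_calls` are hypothetical, `malloc`/`free` stand in for `kmem_cache_alloc`/`kmalloc`/`kfree`, and the failure injection exists only so both error paths can be exercised.

```c
#include <stdlib.h>

struct buf { char *sdata; size_t capacity; };
struct req { struct buf tc; struct buf rc; };

/* Hypothetical failure injection: the buffer allocation whose index
 * equals fail_at returns NULL; -1 means never fail. */
int fail_at = -1;
int alloc_calls;

static void *test_alloc(size_t n)
{
	if (alloc_calls++ == fail_at)
		return NULL;
	return malloc(n);
}

/* Mirrors the shape of p9_fcall_alloc: sdata is always assigned,
 * so it is NULL after a failed call. */
int buf_alloc(struct buf *b, size_t n)
{
	b->sdata = test_alloc(n);
	if (!b->sdata)
		return -1;
	b->capacity = n;
	return 0;
}

struct req *req_alloc(size_t n)
{
	/* Not zeroed, like an object fresh from a slab cache. */
	struct req *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	if (buf_alloc(&r->tc, n))
		goto free_req;	/* rc.sdata is still uninitialised here */
	if (buf_alloc(&r->rc, n))
		goto free;	/* rc.sdata is NULL, free(NULL) is a no-op */
	return r;

free:
	free(r->tc.sdata);
	free(r->rc.sdata);
free_req:
	free(r);
	return NULL;
}

void req_free(struct req *r)
{
	free(r->tc.sdata);
	free(r->rc.sdata);
	free(r);
}
```

The point of the second label is that the first failure path never reads `rc.sdata`, which has not been written yet at that stage.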

2018-07-31 01:20:14

by piaojun

Subject: Re: [V9fs-developer] [PATCH 2/2] net/9p: add a per-client fcall kmem_cache

Hi Dominique,

Could you paste some test results from before and after the patch is applied?
I also have a small suggestion in the comments below.

On 2018/7/30 17:34, Dominique Martinet wrote:
> From: Dominique Martinet <[email protected]>
>
> Having a specific cache for the fcall allocations helps speed up
> allocations a bit, especially in case of non-"round" msizes.
>
> The caches will automatically be merged if there are multiple caches
> of items with the same size so we do not need to try to share a cache
> between different clients of the same size.
>
> Since the msize is negotiated with the server, only allocate the cache
> after that negotiation has happened - previous allocations or
> allocations of different sizes (e.g. zero-copy fcall) are made with
> kmalloc directly.
>
> Signed-off-by: Dominique Martinet <[email protected]>
> ---
> include/net/9p/client.h | 2 ++
> net/9p/client.c | 40 ++++++++++++++++++++++++++++++++--------
> net/9p/trans_rdma.c | 2 +-
> 3 files changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 4b4ac1362ad5..8d9bc7402a42 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -123,6 +123,7 @@ struct p9_client {
> struct p9_trans_module *trans_mod;
> enum p9_trans_status status;
> void *trans;
> + struct kmem_cache *fcall_cache;
>
> union {
> struct {
> @@ -230,6 +231,7 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
> kgid_t gid, struct p9_qid *);
> int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status);
> int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *fl);
> +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc);
> struct p9_req_t *p9_tag_lookup(struct p9_client *, u16);
> void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index ba99a94a12c9..215e3b1ed7b4 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -231,15 +231,34 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> return ret;
> }
>
> -static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> +static int p9_fcall_alloc(struct p9_client *c, struct p9_fcall *fc,
> + int alloc_msize)
> {
> - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (c->fcall_cache && alloc_msize == c->msize)
> + fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> + else
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> if (!fc->sdata)
> return -ENOMEM;
> fc->capacity = alloc_msize;
> return 0;
> }
>
> +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc)
> +{
> + /* sdata can be NULL for interrupted requests in trans_rdma,
> + * and kmem_cache_free does not do NULL-check for us
> + */
> + if (unlikely(!fc->sdata))
> + return;
> +
> + if (c->fcall_cache && fc->capacity == c->msize)
> + kmem_cache_free(c->fcall_cache, fc->sdata);
> + else
> + kfree(fc->sdata);
> +}
> +EXPORT_SYMBOL(p9_fcall_free);
> +
> static struct kmem_cache *p9_req_cache;
>
> /**
> @@ -261,9 +280,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> if (!req)
> return NULL;
>
> - if (p9_fcall_alloc(&req->tc, alloc_msize))
> + if (p9_fcall_alloc(c, &req->tc, alloc_msize))
> goto free;
> - if (p9_fcall_alloc(&req->rc, alloc_msize))
> + if (p9_fcall_alloc(c, &req->rc, alloc_msize))
> goto free;
>
> p9pdu_reset(&req->tc);
> @@ -288,8 +307,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> return req;
>
> free:
> - kfree(req->tc.sdata);
> - kfree(req->rc.sdata);
> + p9_fcall_free(c, &req->tc);
> + p9_fcall_free(c, &req->rc);
> kmem_cache_free(p9_req_cache, req);
> return ERR_PTR(-ENOMEM);
> }
> @@ -333,8 +352,8 @@ static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
> spin_lock_irqsave(&c->lock, flags);
> idr_remove(&c->reqs, tag);
> spin_unlock_irqrestore(&c->lock, flags);
> - kfree(r->tc.sdata);
> - kfree(r->rc.sdata);
> + p9_fcall_free(c, &r->tc);
> + p9_fcall_free(c, &r->rc);
> kmem_cache_free(p9_req_cache, r);
> }
>
> @@ -944,6 +963,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
>
> clnt->trans_mod = NULL;
> clnt->trans = NULL;
> + clnt->fcall_cache = NULL;
>
> client_id = utsname()->nodename;
> memcpy(clnt->name, client_id, strlen(client_id) + 1);
> @@ -980,6 +1000,9 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> if (err)
> goto close_trans;
>
> + clnt->fcall_cache = kmem_cache_create("9p-fcall-cache", clnt->msize,
> + 0, 0, NULL);
> +
> return clnt;
>
> close_trans:
> @@ -1011,6 +1034,7 @@ void p9_client_destroy(struct p9_client *clnt)
>
> p9_tag_cleanup(clnt);
>
> + kmem_cache_destroy(clnt->fcall_cache);

We could set fcall_cache to NULL here to guard against use-after-free.

> kfree(clnt);
> }
> EXPORT_SYMBOL(p9_client_destroy);
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index c5cac97df7f7..5e43f0a00b3a 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -445,7 +445,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
> if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
> /* Got one! */
> - kfree(req->rc.sdata);
> + p9_fcall_free(client, &req->rc);
> req->rc.sdata = NULL;
> goto dont_need_post_recv;
> } else {
>

2018-07-31 01:29:44

by piaojun

Subject: Re: [V9fs-developer] [PATCH 1/2] net/9p: embed fcall in req to round down buffer allocs



On 2018/7/31 9:12, Dominique Martinet wrote:
> piaojun wrote on Tue, Jul 31, 2018:
>> This is really a *big* patch, but the modification seems no harm. And I
>> suggest running testcases to cover this. Please see my comments below.
>
> I'm always running tests, but more never hurt - please help ;)

I'm glad to help with testing; I'm actually going to run some test cases
from xfs-test against 9p.

>
> For reference I'm running a subset of cthon04[1], ltp[2] and some custom
> tests like these[3][4]
>
> [1] https://fedorapeople.org/cgit/steved/public_git/cthon04.git/
> [2] https://github.com/linux-test-project/ltp
> [3] https://github.com/phdeniel/sigmund/blob/master/modules/allfs.inc#L208
> [4] https://github.com/phdeniel/sigmund/blob/master/modules/allfs.inc#L251
>
>>> [...]
>>> @@ -263,13 +261,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
>>> if (!req)
>>> return NULL;
>>>
>>> - req->tc = p9_fcall_alloc(alloc_msize);
>>> - req->rc = p9_fcall_alloc(alloc_msize);
>>> - if (!req->tc || !req->rc)
>>> + if (p9_fcall_alloc(&req->tc, alloc_msize))
>>> + goto free;
>>> + if (p9_fcall_alloc(&req->rc, alloc_msize))
>>> goto free;
>>>
>>> - p9pdu_reset(req->tc);
>>> - p9pdu_reset(req->rc);
>>> + p9pdu_reset(&req->tc);
>>> + p9pdu_reset(&req->rc);
>>> req->status = REQ_STATUS_ALLOC;
>>> init_waitqueue_head(&req->wq);
>>> INIT_LIST_HEAD(&req->req_list);
>>> @@ -281,7 +279,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
>>> GFP_NOWAIT);
>>> else
>>> tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
>>> - req->tc->tag = tag;
>>> + req->tc.tag = tag;
>>> spin_unlock_irq(&c->lock);
>>> idr_preload_end();
>>> if (tag < 0)
>>> @@ -290,8 +288,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
>>> return req;
>>>
>>> free:
>>> - kfree(req->tc);
>>> - kfree(req->rc);
>>> + kfree(req->tc.sdata);
>>> + kfree(req->rc.sdata);
>>
>> I wonder if we will free a wild pointer as 'sdata' has not been initialized NULL.
>
> Good point, it's possible to jump here if the first fcall_alloc failed
> since this declustered the two allocations.
>
> Please consider this added to the previous patch (I'll send a v2 after
> this has had more time for review, you can find the amended commit in my
> 9p-test tree meanwhile):
> -----8<-----------------------------
> diff --git a/net/9p/client.c b/net/9p/client.c
> index ba99a94a12c9..fe030ef1c076 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -262,7 +262,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> return NULL;
>
> if (p9_fcall_alloc(&req->tc, alloc_msize))
> - goto free;
> + goto free_req;
> if (p9_fcall_alloc(&req->rc, alloc_msize))
> goto free;
>
> @@ -290,6 +290,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> free:
> kfree(req->tc.sdata);
> kfree(req->rc.sdata);
> +free_req:
> kmem_cache_free(p9_req_cache, req);
> return ERR_PTR(-ENOMEM);
> }
> -----8<-----------------------------
>
> The second goto doesn't need changing because rc.sdata will be set to
> NULL if the allocation failed
>

2018-07-31 01:37:14

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 2/2] net/9p: add a per-client fcall kmem_cache

piaojun wrote on Tue, Jul 31, 2018:
> Could you help paste some test result before-and-after the patch applied?

The only performance tests I did were sent to the list a couple of mails
earlier; you can find them here:
http://lkml.kernel.org/r/20180730093101.GA7894@nautica

In particular, the results for benchmark on small writes just before and
after this patch, without KASAN (these are the same numbers as in the
link, hardware/setup is described there):
- no alloc (4.18-rc7 request cache): 65.4k req/s
- non-power of two alloc, no patch: 61.6k req/s
- power of two alloc, no patch: 62.2k req/s
- non-power of two alloc, with patch: 64.7k req/s
- power of two alloc, with patch: 65.1k req/s

I'm rather happy with the result, I didn't expect using a dedicated
cache would bring this much back but it's certainly worth it.
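
To put the figures above in relative terms, here is a small sketch that computes the percentage change between two throughput values. The helper names `rel_change_pct` and `print_summary` are made up for this note; the inputs are the req/s numbers quoted in the list. With the patch the two alloc cases gain roughly 5% over their unpatched counterparts and end up within about 1% of the no-alloc baseline.

```c
#include <stdio.h>

/* Relative change from `from` to `to`, in percent. */
double rel_change_pct(double from, double to)
{
	return (to - from) / from * 100.0;
}

/* Prints the comparisons for the numbers quoted in the mail (k req/s). */
void print_summary(void)
{
	const double baseline = 65.4;	/* no alloc, 4.18-rc7 request cache */

	printf("non-pow2, patch vs no patch: %+.2f%%\n",
	       rel_change_pct(61.6, 64.7));
	printf("pow2, patch vs no patch:     %+.2f%%\n",
	       rel_change_pct(62.2, 65.1));
	printf("non-pow2 patched vs baseline: %+.2f%%\n",
	       rel_change_pct(baseline, 64.7));
	printf("pow2 patched vs baseline:     %+.2f%%\n",
	       rel_change_pct(baseline, 65.1));
}
```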

> > @@ -1011,6 +1034,7 @@ void p9_client_destroy(struct p9_client *clnt)
> >
> > p9_tag_cleanup(clnt);
> >
> > + kmem_cache_destroy(clnt->fcall_cache);
>
> We could set NULL for fcall_cache in case of use-after-free.
>
> > kfree(clnt);

Hmm, I understand where this comes from, but I'm not sure I agree.
If someone tries to access the client while/after it is freed things are
going to break anyway, I'd rather let things break as obviously as
possible than try to cover it up.

--
Dominique

2018-07-31 01:47:25

by piaojun

Subject: Re: [V9fs-developer] [PATCH 2/2] net/9p: add a per-client fcall kmem_cache



On 2018/7/31 9:35, Dominique Martinet wrote:
> piaojun wrote on Tue, Jul 31, 2018:
>> Could you help paste some test result before-and-after the patch applied?
>
> The only performance tests I did were sent to the list a couple of mails
> earlier, you can find it here:
> http://lkml.kernel.org/r/20180730093101.GA7894@nautica
>
> In particular, the results for benchmark on small writes just before and
> after this patch, without KASAN (these are the same numbers as in the
> link, hardware/setup is described there):
> - no alloc (4.18-rc7 request cache): 65.4k req/s
> - non-power of two alloc, no patch: 61.6k req/s
> - power of two alloc, no patch: 62.2k req/s
> - non-power of two alloc, with patch: 64.7k req/s
> - power of two alloc, with patch: 65.1k req/s
>
> I'm rather happy with the result, I didn't expect using a dedicated
> cache would bring this much back but it's certainly worth it.
>
It looks like an obvious improvement.

>>> @@ -1011,6 +1034,7 @@ void p9_client_destroy(struct p9_client *clnt)
>>>
>>> p9_tag_cleanup(clnt);
>>>
>>> + kmem_cache_destroy(clnt->fcall_cache);
>>
>> We could set NULL for fcall_cache in case of use-after-free.
>>
>>> kfree(clnt);
>
> Hmm, I understand where this comes from, but I'm not sure I agree.
> If someone tries to access the client while/after it is freed things are
> going to break anyway, I'd rather let things break as obviously as
> possible than try to cover it up.
>
Setting it to NULL is not a big deal, so I'll wait to hear others' opinions.

2018-07-31 02:48:34

by Matthew Wilcox

Subject: Re: [PATCH 2/2] net/9p: add a per-client fcall kmem_cache

On Mon, Jul 30, 2018 at 11:34:23AM +0200, Dominique Martinet wrote:
> -static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> +static int p9_fcall_alloc(struct p9_client *c, struct p9_fcall *fc,
> + int alloc_msize)
> {
> - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (c->fcall_cache && alloc_msize == c->msize)
> + fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> + else
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);

Could you simplify this by initialising c->msize to 0 and then this
can simply be:

> + if (alloc_msize == c->msize)
...

> +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc)
> +{
> + /* sdata can be NULL for interrupted requests in trans_rdma,
> + * and kmem_cache_free does not do NULL-check for us
> + */
> + if (unlikely(!fc->sdata))
> + return;
> +
> + if (c->fcall_cache && fc->capacity == c->msize)
> + kmem_cache_free(c->fcall_cache, fc->sdata);
> + else
> + kfree(fc->sdata);
> +}

Is it possible for fcall_cache to be allocated before fcall_free is
called? I'm concerned we might do this:

allocate message A
allocate message B
receive response A
allocate fcall_cache
receive response B

and then we'd call kmem_cache_free() for something allocated by kmalloc(),
which works with slab and slub, but doesn't work with slob (alas).

> @@ -980,6 +1000,9 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> if (err)
> goto close_trans;
>
> + clnt->fcall_cache = kmem_cache_create("9p-fcall-cache", clnt->msize,
> + 0, 0, NULL);
> +

If we have slab merging turned off, or we have two mounts from servers
with different msizes, we'll end up with two slabs called 9p-fcall-cache.
I'm OK with that, but are you?


2018-07-31 04:19:12

by Dominique Martinet

Subject: Re: [PATCH 2/2] net/9p: add a per-client fcall kmem_cache

Matthew Wilcox wrote on Mon, Jul 30, 2018:
> On Mon, Jul 30, 2018 at 11:34:23AM +0200, Dominique Martinet wrote:
> > -static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> > +static int p9_fcall_alloc(struct p9_client *c, struct p9_fcall *fc,
> > + int alloc_msize)
> > {
> > - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> > + if (c->fcall_cache && alloc_msize == c->msize)
> > + fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> > + else
> > + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
>
> Could you simplify this by initialising c->msize to 0 and then this
> can simply be:
>
> > + if (alloc_msize == c->msize)
> ...

Hmm, this is rather tricky with the current flow of things;
p9_client_version() has multiple uses for that msize field.

Basically what happens is:
- init client struct, set msize, clipped to the mount
option/transport-specific max
- p9_client_version() uses current c->msize to send a suggested value
to the server
- p9_client_rpc() uses current c->msize to allocate that first rpc,
this is pretty much hard-coded and will be quite intrusive to make an
exception for
- p9_client_version() looks at the msize the server suggested and clips
c->msize if the reply's is smaller than c->msize
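
The four steps above can be sketched in userspace C as follows. This is only an illustration under assumptions: `client_sketch`, `client_init_msize` and `client_version` are hypothetical stand-ins, not the kernel functions, and the server's reply is passed in as a plain argument.

```c
#include <assert.h>

/* Hypothetical stand-in for the msize field of struct p9_client. */
struct client_sketch {
	unsigned int msize;
};

/* Step 1: start from the mount option, clipped to the transport maximum. */
static void client_init_msize(struct client_sketch *c,
			      unsigned int opt_msize, unsigned int trans_max)
{
	c->msize = opt_msize < trans_max ? opt_msize : trans_max;
}

/*
 * Steps 2-4: the current msize is both the value suggested to the server
 * and the buffer size of that first RPC, and is only clipped once the
 * server's reply is known. Returns the size used for the first RPC.
 */
static unsigned int client_version(struct client_sketch *c,
				   unsigned int server_msize)
{
	unsigned int first_rpc_size = c->msize;

	/* A zero msize here would leave the first RPC without a buffer. */
	assert(first_rpc_size > 0);

	if (server_msize < c->msize)
		c->msize = server_msize;
	return first_rpc_size;
}
```

The sketch shows why initialising msize to 0 does not fit the current flow: the same field has to hold a usable size before the version exchange completes.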


I kind of agree it'd be nice to remove that check being done all the
time for just startup, but I don't see how to do this easily with the
current code.

Making p9_client_version take an extra argument would be easy but we'd
need to actually hardcode in p9_client_rpc that "if the message type is
TVERSION then use [page size or whatever] for allocation" and that kind
of kills the point... The alternative would be having p9_client_rpc take
the actual size as an argument itself, but this once again is pretty
intrusive even if it could be done mechanically...

I'll think about this some more.

> > +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc)
> > +{
> > + /* sdata can be NULL for interrupted requests in trans_rdma,
> > + * and kmem_cache_free does not do NULL-check for us
> > + */
> > + if (unlikely(!fc->sdata))
> > + return;
> > +
> > + if (c->fcall_cache && fc->capacity == c->msize)
> > + kmem_cache_free(c->fcall_cache, fc->sdata);
> > + else
> > + kfree(fc->sdata);
> > +}
>
> Is it possible for fcall_cache to be allocated before fcall_free is
> called? I'm concerned we might do this:
>
> allocate message A
> allocate message B
> receive response A
> allocate fcall_cache
> receive response B
>
> and then we'd call kmem_cache_free() for something allocated by kmalloc(),
> which works with slab and slub, but doesn't work with slob (alas).

Bleh, I checked this would work for slab and didn't really check the
others...

This cannot happen right now because we only return the client struct
from p9_client_create after the first message is done (and, right now,
freed), but when we start adding refcounting to requests it'd be possible
to free the very first response after fcall_cache is allocated, with a
"bad" server that, like syzkaller does, sends the version reply before
the request came in.

I can't see any workaround for this other than storing how the fcall
was allocated in the struct itself though...
I guess I might as well do that now, unless you have a better idea.
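
A minimal userspace sketch of "storing how the fcall was allocated in the struct itself" could look like the following. Everything here is an assumption made for illustration: the `from_cache` flag, the `*_sketch` types and the two counting allocators stand in for kmem_cache_alloc()/kmalloc() and are not the kernel API or the eventual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Two hypothetical allocators whose buffers must never be mixed up. */
static int cache_live, heap_live;

static void *cache_alloc(size_t n)
{
	void *p = malloc(n);
	if (p)
		cache_live++;
	return p;
}

static void cache_free(void *p)
{
	assert(cache_live > 0);	/* catches a buffer freed to the wrong allocator */
	cache_live--;
	free(p);
}

static void *heap_alloc(size_t n)
{
	void *p = malloc(n);
	if (p)
		heap_live++;
	return p;
}

static void heap_free(void *p)
{
	assert(heap_live > 0);
	heap_live--;
	free(p);
}

struct fcall_sketch {
	size_t capacity;
	bool from_cache;	/* recorded once, at allocation time */
	char *sdata;
};

struct client_sketch {
	size_t msize;
	bool have_cache;	/* may become true while requests are in flight */
};

static int fcall_alloc(struct client_sketch *c, struct fcall_sketch *fc,
		       size_t size)
{
	fc->from_cache = c->have_cache && size == c->msize;
	fc->sdata = fc->from_cache ? cache_alloc(size) : heap_alloc(size);
	if (!fc->sdata)
		return -1;
	fc->capacity = size;
	return 0;
}

/* The free path trusts the recorded flag, not the client's current state. */
static void fcall_free(struct fcall_sketch *fc)
{
	if (!fc->sdata)
		return;
	if (fc->from_cache)
		cache_free(fc->sdata);
	else
		heap_free(fc->sdata);
	fc->sdata = NULL;
}
```

Because the decision is taken once at allocation and stored, a buffer allocated before the cache existed is still returned to the allocator it came from, whatever the client looks like at free time.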


> > @@ -980,6 +1000,9 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> > if (err)
> > goto close_trans;
> >
> > + clnt->fcall_cache = kmem_cache_create("9p-fcall-cache", clnt->msize,
> > + 0, 0, NULL);
> > +
>
> If we have slab merging turned off, or we have two mounts from servers
> with different msizes, we'll end up with two slabs called 9p-fcall-cache.
> I'm OK with that, but are you?

Yeah, the reason I didn't make it global like p9_req_cache is precisely
to get two separate caches if the msizes are different.

I actually considered adding msize to the string with snprintf or
something, but someone looking at it through slabinfo or similar will
have the sizes anyway, so I don't think this would bring anything. Do you
know (or think) that tools will choke on multiple caches with the same
name?
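
For reference, the snprintf idea mentioned above might look like this in plain C. The name format, buffer length and helper name are all hypothetical, and how the name buffer's lifetime would be handled in the kernel is not addressed here.

```c
#include <assert.h>
#include <stdio.h>

#define FCALL_CACHE_NAME_LEN 32	/* hypothetical buffer size */

/*
 * Build a per-msize cache name such as "9p-fcall-cache-8192".
 * Returns 0 on success, -1 if the name did not fit in the buffer.
 */
static int fcall_cache_name(char *buf, size_t len, unsigned int msize)
{
	int n;

	assert(buf && len > 0);
	n = snprintf(buf, len, "9p-fcall-cache-%u", msize);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Two mounts with different msizes would then show up under distinct names, at the cost of one small string per client.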


I'm not sure about slab merging being disabled though, from the little I
understand I do not see why anyone would do that except for debugging,
and I'm fine with that.
Please let me know if I'm missing something though!


Thanks for the review,
--
Dominique Martinet

2018-08-01 14:33:45

by Greg Kurz

Subject: Re: [V9fs-developer] [PATCH 1/2] net/9p: embed fcall in req to round down buffer allocs

On Mon, 30 Jul 2018 11:34:22 +0200
Dominique Martinet <[email protected]> wrote:

> From: Dominique Martinet <[email protected]>
>
> 'msize' is often a power of two, or at least page-aligned, so avoiding
> an overhead of two dozen bytes for each allocation will help the
> allocator do its work and reduce memory fragmentation.
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Dominique Martinet <[email protected]>
> ---
> include/net/9p/client.h | 4 +-
> net/9p/client.c | 160 ++++++++++++++++++++--------------------
> net/9p/trans_fd.c | 12 +--
> net/9p/trans_rdma.c | 29 ++++----
> net/9p/trans_virtio.c | 18 ++---
> net/9p/trans_xen.c | 12 +--
> 6 files changed, 117 insertions(+), 118 deletions(-)
>
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index a4dc42c53d18..4b4ac1362ad5 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -95,8 +95,8 @@ struct p9_req_t {
> int status;
> int t_err;
> wait_queue_head_t wq;
> - struct p9_fcall *tc;
> - struct p9_fcall *rc;
> + struct p9_fcall tc;
> + struct p9_fcall rc;
> void *aux;
> struct list_head req_list;
> };
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 88db45966740..ba99a94a12c9 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -231,15 +231,13 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> return ret;
> }
>
> -static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
> +static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> {
> - struct p9_fcall *fc;
> - fc = kmalloc(sizeof(struct p9_fcall) + alloc_msize, GFP_NOFS);
> - if (!fc)
> - return NULL;
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (!fc->sdata)
> + return -ENOMEM;
> fc->capacity = alloc_msize;
> - fc->sdata = (char *) fc + sizeof(struct p9_fcall);
> - return fc;
> + return 0;
> }
>
> static struct kmem_cache *p9_req_cache;
> @@ -263,13 +261,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> if (!req)
> return NULL;
>
> - req->tc = p9_fcall_alloc(alloc_msize);
> - req->rc = p9_fcall_alloc(alloc_msize);
> - if (!req->tc || !req->rc)
> + if (p9_fcall_alloc(&req->tc, alloc_msize))
> + goto free;
> + if (p9_fcall_alloc(&req->rc, alloc_msize))
> goto free;

Hmm... if the first allocation fails, we will kfree() req->rc.sdata.

Are we sure we won't have a stale pointer or uninitialized data in
there?

And even if we don't with the current code base, this is fragile and
could be easily broken.

I think you should drop this hunk and rather rename p9_fcall_alloc() to
p9_fcall_alloc_sdata() instead, since this is what the function is
actually doing with this patch applied.
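
To make the concern concrete, here is a userspace sketch of one possible way to keep a shared error label safe: clear both sdata pointers before the first allocation. This is not what the patch does; the `*_sketch` types, `buf_alloc` and the `fail_countdown` test hook are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct fcall_sketch {
	size_t capacity;
	char *sdata;
};

struct req_sketch {
	struct fcall_sketch tc, rc;
};

/* Test hook: -1 never fails, 0 makes the next buffer allocation fail. */
static int fail_countdown = -1;

static void *buf_alloc(size_t size)
{
	if (fail_countdown == 0)
		return NULL;
	if (fail_countdown > 0)
		fail_countdown--;
	return malloc(size);
}

static int fcall_alloc_sdata(struct fcall_sketch *fc, size_t size)
{
	fc->sdata = buf_alloc(size);
	if (!fc->sdata)
		return -1;
	fc->capacity = size;
	return 0;
}

static struct req_sketch *req_alloc(size_t size)
{
	/* malloc leaves the struct uninitialized, like a slab object would be */
	struct req_sketch *req = malloc(sizeof(*req));

	assert(size > 0);
	if (!req)
		return NULL;

	/* Clear both pointers so the shared error path never frees garbage. */
	req->tc.sdata = NULL;
	req->rc.sdata = NULL;

	if (fcall_alloc_sdata(&req->tc, size))
		goto free;
	if (fcall_alloc_sdata(&req->rc, size))
		goto free;
	return req;

free:
	free(req->tc.sdata);	/* free(NULL) is a no-op */
	free(req->rc.sdata);
	free(req);
	return NULL;
}

static void req_free(struct req_sketch *req)
{
	free(req->tc.sdata);
	free(req->rc.sdata);
	free(req);
}
```

Without the two NULL assignments, a failure of the first allocation would reach the `free` label with `rc.sdata` still holding whatever the allocator left in the object.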

>
> - p9pdu_reset(req->tc);
> - p9pdu_reset(req->rc);
> + p9pdu_reset(&req->tc);
> + p9pdu_reset(&req->rc);
> req->status = REQ_STATUS_ALLOC;
> init_waitqueue_head(&req->wq);
> INIT_LIST_HEAD(&req->req_list);
> @@ -281,7 +279,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> GFP_NOWAIT);
> else
> tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
> - req->tc->tag = tag;
> + req->tc.tag = tag;
> spin_unlock_irq(&c->lock);
> idr_preload_end();
> if (tag < 0)
> @@ -290,8 +288,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> return req;
>
> free:
> - kfree(req->tc);
> - kfree(req->rc);
> + kfree(req->tc.sdata);
> + kfree(req->rc.sdata);
> kmem_cache_free(p9_req_cache, req);
> return ERR_PTR(-ENOMEM);
> }
> @@ -329,14 +327,14 @@ EXPORT_SYMBOL(p9_tag_lookup);
> static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
> {
> unsigned long flags;
> - u16 tag = r->tc->tag;
> + u16 tag = r->tc.tag;
>
> p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
> spin_lock_irqsave(&c->lock, flags);
> idr_remove(&c->reqs, tag);
> spin_unlock_irqrestore(&c->lock, flags);
> - kfree(r->tc);
> - kfree(r->rc);
> + kfree(r->tc.sdata);
> + kfree(r->rc.sdata);
> kmem_cache_free(p9_req_cache, r);
> }
>
> @@ -368,7 +366,7 @@ static void p9_tag_cleanup(struct p9_client *c)
> */
> void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> {
> - p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
> + p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc.tag);
>
> /*
> * This barrier is needed to make sure any change made to req before
> @@ -378,7 +376,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> req->status = status;
>
> wake_up(&req->wq);
> - p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
> + p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
> }
> EXPORT_SYMBOL(p9_client_cb);
>
> @@ -449,18 +447,18 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
> int err;
> int ecode;
>
> - err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
> - if (req->rc->size >= c->msize) {
> + err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
> + if (req->rc.size >= c->msize) {
> p9_debug(P9_DEBUG_ERROR,
> "requested packet size too big: %d\n",
> - req->rc->size);
> + req->rc.size);
> return -EIO;
> }
> /*
> * dump the response from server
> * This should be after check errors which poplulate pdu_fcall.
> */
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> if (err) {
> p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
> return err;
> @@ -470,7 +468,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
>
> if (!p9_is_proto_dotl(c)) {
> char *ename;
> - err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> + err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
> &ename, &ecode);
> if (err)
> goto out_err;
> @@ -486,7 +484,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
> }
> kfree(ename);
> } else {
> - err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
> + err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
> err = -ecode;
>
> p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
> @@ -520,12 +518,12 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> int8_t type;
> char *ename = NULL;
>
> - err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
> + err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
> /*
> * dump the response from server
> * This should be after parse_header which poplulate pdu_fcall.
> */
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> if (err) {
> p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
> return err;
> @@ -540,13 +538,13 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> /* 7 = header size for RERROR; */
> int inline_len = in_hdrlen - 7;
>
> - len = req->rc->size - req->rc->offset;
> + len = req->rc.size - req->rc.offset;
> if (len > (P9_ZC_HDR_SZ - 7)) {
> err = -EFAULT;
> goto out_err;
> }
>
> - ename = &req->rc->sdata[req->rc->offset];
> + ename = &req->rc.sdata[req->rc.offset];
> if (len > inline_len) {
> /* We have error in external buffer */
> if (!copy_from_iter_full(ename + inline_len,
> @@ -556,7 +554,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> }
> }
> ename = NULL;
> - err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> + err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
> &ename, &ecode);
> if (err)
> goto out_err;
> @@ -572,7 +570,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> }
> kfree(ename);
> } else {
> - err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
> + err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
> err = -ecode;
>
> p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
> @@ -605,7 +603,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
> int16_t oldtag;
> int err;
>
> - err = p9_parse_header(oldreq->tc, NULL, NULL, &oldtag, 1);
> + err = p9_parse_header(&oldreq->tc, NULL, NULL, &oldtag, 1);
> if (err)
> return err;
>
> @@ -649,12 +647,12 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
> return req;
>
> /* marshall the data */
> - p9pdu_prepare(req->tc, req->tc->tag, type);
> - err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
> + p9pdu_prepare(&req->tc, req->tc.tag, type);
> + err = p9pdu_vwritef(&req->tc, c->proto_version, fmt, ap);
> if (err)
> goto reterr;
> - p9pdu_finalize(c, req->tc);
> - trace_9p_client_req(c, type, req->tc->tag);
> + p9pdu_finalize(c, &req->tc);
> + trace_9p_client_req(c, type, req->tc.tag);
> return req;
> reterr:
> p9_free_req(c, req);
> @@ -739,7 +737,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
> goto reterr;
>
> err = p9_check_errors(c, req);
> - trace_9p_client_res(c, type, req->rc->tag, err);
> + trace_9p_client_res(c, type, req->rc.tag, err);
> if (!err)
> return req;
> reterr:
> @@ -821,7 +819,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
> goto reterr;
>
> err = p9_check_zc_errors(c, req, uidata, in_hdrlen);
> - trace_9p_client_res(c, type, req->rc->tag, err);
> + trace_9p_client_res(c, type, req->rc.tag, err);
> if (!err)
> return req;
> reterr:
> @@ -904,10 +902,10 @@ static int p9_client_version(struct p9_client *c)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, c->proto_version, "ds", &msize, &version);
> + err = p9pdu_readf(&req->rc, c->proto_version, "ds", &msize, &version);
> if (err) {
> p9_debug(P9_DEBUG_9P, "version error %d\n", err);
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> goto error;
> }
>
> @@ -1056,9 +1054,9 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", &qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", &qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1113,9 +1111,9 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "R", &nwqids, &wqids);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "R", &nwqids, &wqids);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto clunk_fid;
> }
> @@ -1180,9 +1178,9 @@ int p9_client_open(struct p9_fid *fid, int mode)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1224,9 +1222,9 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1269,9 +1267,9 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1308,9 +1306,9 @@ int p9_client_symlink(struct p9_fid *dfid, const char *name,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1506,10 +1504,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
> break;
> }
>
> - *err = p9pdu_readf(req->rc, clnt->proto_version,
> + *err = p9pdu_readf(&req->rc, clnt->proto_version,
> "D", &count, &dataptr);
> if (*err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> break;
> }
> @@ -1579,9 +1577,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
> break;
> }
>
> - *err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
> + *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
> if (*err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> break;
> }
> @@ -1623,9 +1621,9 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "wS", &ignored, ret);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "wS", &ignored, ret);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1676,9 +1674,9 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "A", ret);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "A", ret);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1828,11 +1826,11 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
> - &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
> - &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
> + &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
> + &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1936,9 +1934,9 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
> err = PTR_ERR(req);
> goto error;
> }
> - err = p9pdu_readf(req->rc, clnt->proto_version, "q", attr_size);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "q", attr_size);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto clunk_fid;
> }
> @@ -2024,9 +2022,9 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &count, &dataptr);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
> if (rsize < count) {
> @@ -2065,9 +2063,9 @@ int p9_client_mknod_dotl(struct p9_fid *fid, const char *name, int mode,
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RMKNOD qid %x.%llx.%x\n", qid->type,
> @@ -2096,9 +2094,9 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RMKDIR qid %x.%llx.%x\n", qid->type,
> @@ -2131,9 +2129,9 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "b", status);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "b", status);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RLOCK status %i\n", *status);
> @@ -2162,11 +2160,11 @@ int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *glock)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "bqqds", &glock->type,
> - &glock->start, &glock->length, &glock->proc_id,
> - &glock->client_id);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "bqqds", &glock->type,
> + &glock->start, &glock->length, &glock->proc_id,
> + &glock->client_id);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RGETLOCK type %i start %lld length %lld "
> @@ -2192,9 +2190,9 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "s", target);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "s", target);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RREADLINK target %s\n", *target);
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index e2ef3c782c53..20f46f13fe83 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -354,7 +354,7 @@ static void p9_read_work(struct work_struct *work)
> goto error;
> }
>
> - if (m->req->rc == NULL) {
> + if (m->req->rc.sdata == NULL) {
> p9_debug(P9_DEBUG_ERROR,
> "No recv fcall for tag %d (req %p), disconnecting!\n",
> m->rc.tag, m->req);
> @@ -362,7 +362,7 @@ static void p9_read_work(struct work_struct *work)
> err = -EIO;
> goto error;
> }
> - m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
> + m->rc.sdata = m->req->rc.sdata;
> memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
> m->rc.capacity = m->rc.size;
> }
> @@ -372,7 +372,7 @@ static void p9_read_work(struct work_struct *work)
> */
> if ((m->req) && (m->rc.offset == m->rc.capacity)) {
> p9_debug(P9_DEBUG_TRANS, "got new packet\n");
> - m->req->rc->size = m->rc.offset;
> + m->req->rc.size = m->rc.offset;
> spin_lock(&m->client->lock);
> if (m->req->status != REQ_STATUS_ERROR)
> status = REQ_STATUS_RCVD;
> @@ -469,8 +469,8 @@ static void p9_write_work(struct work_struct *work)
> p9_debug(P9_DEBUG_TRANS, "move req %p\n", req);
> list_move_tail(&req->req_list, &m->req_list);
>
> - m->wbuf = req->tc->sdata;
> - m->wsize = req->tc->size;
> + m->wbuf = req->tc.sdata;
> + m->wsize = req->tc.size;
> m->wpos = 0;
> spin_unlock(&m->client->lock);
> }
> @@ -663,7 +663,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
> struct p9_conn *m = &ts->conn;
>
> p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
> - m, current, req->tc, req->tc->id);
> + m, current, &req->tc, req->tc.id);
> if (m->err < 0)
> return m->err;
>
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 2ab4574183c9..c5cac97df7f7 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -122,7 +122,7 @@ struct p9_rdma_context {
> dma_addr_t busa;
> union {
> struct p9_req_t *req;
> - struct p9_fcall *rc;
> + struct p9_fcall rc;
> };
> };
>
> @@ -320,8 +320,8 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
> if (wc->status != IB_WC_SUCCESS)
> goto err_out;
>
> - c->rc->size = wc->byte_len;
> - err = p9_parse_header(c->rc, NULL, NULL, &tag, 1);
> + c->rc.size = wc->byte_len;
> + err = p9_parse_header(&c->rc, NULL, NULL, &tag, 1);
> if (err)
> goto err_out;
>
> @@ -331,12 +331,13 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
>
> /* Check that we have not yet received a reply for this request.
> */
> - if (unlikely(req->rc)) {
> + if (unlikely(req->rc.sdata)) {
> pr_err("Duplicate reply for request %d", tag);
> goto err_out;
> }
>
> - req->rc = c->rc;
> + req->rc.size = c->rc.size;
> + req->rc.sdata = c->rc.sdata;
> p9_client_cb(client, req, REQ_STATUS_RCVD);
>
> out:
> @@ -361,7 +362,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
> container_of(wc->wr_cqe, struct p9_rdma_context, cqe);
>
> ib_dma_unmap_single(rdma->cm_id->device,
> - c->busa, c->req->tc->size,
> + c->busa, c->req->tc.size,
> DMA_TO_DEVICE);
> up(&rdma->sq_sem);
> kfree(c);
> @@ -401,7 +402,7 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
> struct ib_sge sge;
>
> c->busa = ib_dma_map_single(rdma->cm_id->device,
> - c->rc->sdata, client->msize,
> + c->rc.sdata, client->msize,
> DMA_FROM_DEVICE);
> if (ib_dma_mapping_error(rdma->cm_id->device, c->busa))
> goto error;
> @@ -443,9 +444,9 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> **/
> if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
> if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
> - /* Got one ! */
> - kfree(req->rc);
> - req->rc = NULL;
> + /* Got one! */
> + kfree(req->rc.sdata);
> + req->rc.sdata = NULL;
> goto dont_need_post_recv;
> } else {
> /* We raced and lost. */
> @@ -459,7 +460,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> err = -ENOMEM;
> goto recv_error;
> }
> - rpl_context->rc = req->rc;
> + rpl_context->rc.sdata = req->rc.sdata;
>
> /*
> * Post a receive buffer for this request. We need to ensure
> @@ -479,7 +480,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> goto recv_error;
> }
> /* remove posted receive buffer from request structure */
> - req->rc = NULL;
> + req->rc.sdata = NULL;
>
> dont_need_post_recv:
> /* Post the request */
> @@ -491,7 +492,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> c->req = req;
>
> c->busa = ib_dma_map_single(rdma->cm_id->device,
> - c->req->tc->sdata, c->req->tc->size,
> + c->req->tc.sdata, c->req->tc.size,
> DMA_TO_DEVICE);
> if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
> err = -EIO;
> @@ -501,7 +502,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> c->cqe.done = send_done;
>
> sge.addr = c->busa;
> - sge.length = c->req->tc->size;
> + sge.length = c->req->tc.size;
> sge.lkey = rdma->pd->local_dma_lkey;
>
> wr.next = NULL;
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 6265d1d62749..80dd34d8a6af 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -157,7 +157,7 @@ static void req_done(struct virtqueue *vq)
> }
>
> if (len) {
> - req->rc->size = len;
> + req->rc.size = len;
> p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
> }
> }
> @@ -274,12 +274,12 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
> out_sgs = in_sgs = 0;
> /* Handle out VirtIO ring buffers */
> out = pack_sg_list(chan->sg, 0,
> - VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
> + VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
> if (out)
> sgs[out_sgs++] = chan->sg;
>
> in = pack_sg_list(chan->sg, out,
> - VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
> + VIRTQUEUE_NUM, req->rc.sdata, req->rc.capacity);
> if (in)
> sgs[out_sgs + in_sgs++] = chan->sg + out;
>
> @@ -417,15 +417,15 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> out_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
> if (n != outlen) {
> __le32 v = cpu_to_le32(n);
> - memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
> + memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
> outlen = n;
> }
> /* The size field of the message must include the length of the
> * header and the length of the data. We didn't actually know
> * the length of the data until this point so add it in now.
> */
> - sz = cpu_to_le32(req->tc->size + outlen);
> - memcpy(&req->tc->sdata[0], &sz, sizeof(sz));
> + sz = cpu_to_le32(req->tc.size + outlen);
> + memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
> } else if (uidata) {
> int n = p9_get_mapped_pages(chan, &in_pages, uidata,
> inlen, &offs, &need_drop);
> @@ -434,7 +434,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> in_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
> if (n != inlen) {
> __le32 v = cpu_to_le32(n);
> - memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
> + memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
> inlen = n;
> }
> }
> @@ -446,7 +446,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
>
> /* out data */
> out = pack_sg_list(chan->sg, 0,
> - VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
> + VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
>
> if (out)
> sgs[out_sgs++] = chan->sg;
> @@ -465,7 +465,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> * alloced memory and payload onto the user buffer.
> */
> in = pack_sg_list(chan->sg, out,
> - VIRTQUEUE_NUM, req->rc->sdata, in_hdr_len);
> + VIRTQUEUE_NUM, req->rc.sdata, in_hdr_len);
> if (in)
> sgs[out_sgs + in_sgs++] = chan->sg + out;
>
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index c2d54ac76bfd..1a5b38892eb4 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -141,7 +141,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> struct xen_9pfs_front_priv *priv = NULL;
> RING_IDX cons, prod, masked_cons, masked_prod;
> unsigned long flags;
> - u32 size = p9_req->tc->size;
> + u32 size = p9_req->tc.size;
> struct xen_9pfs_dataring *ring;
> int num;
>
> @@ -154,7 +154,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> if (!priv || priv->client != client)
> return -EINVAL;
>
> - num = p9_req->tc->tag % priv->num_rings;
> + num = p9_req->tc.tag % priv->num_rings;
> ring = &priv->rings[num];
>
> again:
> @@ -176,7 +176,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
>
> - xen_9pfs_write_packet(ring->data.out, p9_req->tc->sdata, size,
> + xen_9pfs_write_packet(ring->data.out, p9_req->tc.sdata, size,
> &masked_prod, masked_cons, XEN_9PFS_RING_SIZE);
>
> p9_req->status = REQ_STATUS_SENT;
> @@ -229,12 +229,12 @@ static void p9_xen_response(struct work_struct *work)
> continue;
> }
>
> - memcpy(req->rc, &h, sizeof(h));
> - req->rc->offset = 0;
> + memcpy(&req->rc, &h, sizeof(h));
> + req->rc.offset = 0;
>
> masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> /* Then, read the whole packet (including the header) */
> - xen_9pfs_read_packet(req->rc->sdata, ring->data.in, h.size,
> + xen_9pfs_read_packet(req->rc.sdata, ring->data.in, h.size,
> masked_prod, &masked_cons,
> XEN_9PFS_RING_SIZE);
>


2018-08-01 14:40:54

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 1/2] net/9p: embed fcall in req to round down buffer allocs

Greg Kurz wrote on Wed, Aug 01, 2018:
> > @@ -263,13 +261,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> > if (!req)
> > return NULL;
> >
> > - req->tc = p9_fcall_alloc(alloc_msize);
> > - req->rc = p9_fcall_alloc(alloc_msize);
> > - if (!req->tc || !req->rc)
> > + if (p9_fcall_alloc(&req->tc, alloc_msize))
> > + goto free;
> > + if (p9_fcall_alloc(&req->rc, alloc_msize))
> > goto free;
>
> Hmm... if the first allocation fails, we will kfree() req->rc.sdata.
>
> Are we sure we won't have a stale pointer or uninitialized data in
> there ?

Yeah, Jun pointed that out and I have a v2 that only frees as needed
with an extra goto (I sent an incremental diff in my reply to his
comment here[1])

[1] https://lkml.kernel.org/r/20180731011256.GA30388@nautica

> And even if we don't with the current code base, this is fragile and
> could be easily broken.
>
> I think you should drop this hunk and rather rename p9_fcall_alloc() to
> p9_fcall_alloc_sdata() instead, since this is what the function is
> actually doing with this patch applied.

Hmm. I agree the naming isn't accurate, but even if we rename it we'll
need to pass a pointer to fcall as argument as it inits its capacity.
p9_fcall_init(fc, msize) might be simpler?

(I'm not sure I follow what you mean by 'drop this hunk', to be honest,
did you want a single function call to init both maybe?)
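
To illustrate the unwinding being discussed, here is a minimal userspace
sketch, not the kernel code. struct buf, struct req, test_alloc() and the
label names are hypothetical stand-ins, with malloc()/free() in place of
kmalloc()/kfree(). Each failure point jumps to a label that releases only
what has been set up so far, so an uninitialized sdata pointer is never
handed to free().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for struct p9_fcall and struct p9_req_t. */
struct buf {
	size_t capacity;
	unsigned char *sdata;
};

struct req {
	struct buf tc;
	struct buf rc;
};

/* Test hook (an assumption of this sketch): fail the Nth allocation. */
static int fail_at = -1;
static int alloc_count;

static void *test_alloc(size_t size)
{
	if (alloc_count++ == fail_at)
		return NULL;
	return malloc(size);
}

static int buf_init(struct buf *b, size_t size)
{
	b->sdata = test_alloc(size);
	if (!b->sdata)
		return -ENOMEM;
	b->capacity = size;
	return 0;
}

static void buf_fini(struct buf *b)
{
	free(b->sdata);
}

static struct req *req_alloc(size_t size)
{
	/* Like kmem_cache_alloc(), malloc() leaves the object uninitialized. */
	struct req *r = test_alloc(sizeof(*r));

	if (!r)
		return NULL;
	if (buf_init(&r->tc, size))
		goto free_req;	/* neither buffer has been set up yet */
	if (buf_init(&r->rc, size))
		goto free_tc;	/* only tc holds a buffer */
	return r;

free_tc:
	buf_fini(&r->tc);
free_req:
	free(r);
	return NULL;
}

static void req_free(struct req *r)
{
	assert(r != NULL);
	buf_fini(&r->tc);
	buf_fini(&r->rc);
	free(r);
}
```

The labels are listed in reverse order of setup, so falling through from
free_tc to free_req releases everything acquired before the failure.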

--
Dominique

2018-08-01 15:07:39

by Greg Kurz

Subject: Re: [V9fs-developer] [PATCH 2/2] net/9p: add a per-client fcall kmem_cache

On Mon, 30 Jul 2018 11:34:23 +0200
Dominique Martinet <[email protected]> wrote:

> From: Dominique Martinet <[email protected]>
>
> Having a specific cache for the fcall allocations helps speed up
> allocations a bit, especially in case of non-"round" msizes.
>
> The caches will automatically be merged if there are multiple caches
> of items with the same size so we do not need to try to share a cache
> between different clients of the same size.
>
> Since the msize is negotiated with the server, only allocate the cache
> after that negotiation has happened - previous allocations or
> allocations of different sizes (e.g. zero-copy fcall) are made with
> kmalloc directly.
>
> Signed-off-by: Dominique Martinet <[email protected]>
> ---

The patch looks good to me. It would need to be rebased when you have fixed
the potential kfree() of stale data in patch 1. Either with an extra goto
label in p9_tag_alloc or by turning p9_fcall_alloc into p9_fcall_alloc_sdata,
both solutions are equivalent.

Just one suggestion, see below.

> include/net/9p/client.h | 2 ++
> net/9p/client.c | 40 ++++++++++++++++++++++++++++++++--------
> net/9p/trans_rdma.c | 2 +-
> 3 files changed, 35 insertions(+), 9 deletions(-)
>
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 4b4ac1362ad5..8d9bc7402a42 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -123,6 +123,7 @@ struct p9_client {
> struct p9_trans_module *trans_mod;
> enum p9_trans_status status;
> void *trans;
> + struct kmem_cache *fcall_cache;
>
> union {
> struct {
> @@ -230,6 +231,7 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
> kgid_t gid, struct p9_qid *);
> int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status);
> int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *fl);
> +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc);
> struct p9_req_t *p9_tag_lookup(struct p9_client *, u16);
> void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index ba99a94a12c9..215e3b1ed7b4 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -231,15 +231,34 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> return ret;
> }
>
> -static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> +static int p9_fcall_alloc(struct p9_client *c, struct p9_fcall *fc,
> + int alloc_msize)
> {
> - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (c->fcall_cache && alloc_msize == c->msize)

This is a presumably hot path for any request but the initial TVERSION,
you probably want likely() here...

> + fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> + else
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> if (!fc->sdata)
> return -ENOMEM;
> fc->capacity = alloc_msize;
> return 0;
> }
>
> +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc)
> +{
> + /* sdata can be NULL for interrupted requests in trans_rdma,
> + * and kmem_cache_free does not do NULL-check for us
> + */
> + if (unlikely(!fc->sdata))
> + return;
> +
> + if (c->fcall_cache && fc->capacity == c->msize)

... and here as well.

> + kmem_cache_free(c->fcall_cache, fc->sdata);
> + else
> + kfree(fc->sdata);
> +}
> +EXPORT_SYMBOL(p9_fcall_free);
> +
> static struct kmem_cache *p9_req_cache;
>
> /**
> @@ -261,9 +280,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> if (!req)
> return NULL;
>
> - if (p9_fcall_alloc(&req->tc, alloc_msize))
> + if (p9_fcall_alloc(c, &req->tc, alloc_msize))
> goto free;
> - if (p9_fcall_alloc(&req->rc, alloc_msize))
> + if (p9_fcall_alloc(c, &req->rc, alloc_msize))
> goto free;
>
> p9pdu_reset(&req->tc);
> @@ -288,8 +307,8 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> return req;
>
> free:
> - kfree(req->tc.sdata);
> - kfree(req->rc.sdata);
> + p9_fcall_free(c, &req->tc);
> + p9_fcall_free(c, &req->rc);
> kmem_cache_free(p9_req_cache, req);
> return ERR_PTR(-ENOMEM);
> }
> @@ -333,8 +352,8 @@ static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
> spin_lock_irqsave(&c->lock, flags);
> idr_remove(&c->reqs, tag);
> spin_unlock_irqrestore(&c->lock, flags);
> - kfree(r->tc.sdata);
> - kfree(r->rc.sdata);
> + p9_fcall_free(c, &r->tc);
> + p9_fcall_free(c, &r->rc);
> kmem_cache_free(p9_req_cache, r);
> }
>
> @@ -944,6 +963,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
>
> clnt->trans_mod = NULL;
> clnt->trans = NULL;
> + clnt->fcall_cache = NULL;
>
> client_id = utsname()->nodename;
> memcpy(clnt->name, client_id, strlen(client_id) + 1);
> @@ -980,6 +1000,9 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> if (err)
> goto close_trans;
>
> + clnt->fcall_cache = kmem_cache_create("9p-fcall-cache", clnt->msize,
> + 0, 0, NULL);
> +
> return clnt;
>
> close_trans:
> @@ -1011,6 +1034,7 @@ void p9_client_destroy(struct p9_client *clnt)
>
> p9_tag_cleanup(clnt);
>
> + kmem_cache_destroy(clnt->fcall_cache);
> kfree(clnt);
> }
> EXPORT_SYMBOL(p9_client_destroy);
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index c5cac97df7f7..5e43f0a00b3a 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -445,7 +445,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
> if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
> /* Got one! */
> - kfree(req->rc.sdata);
> + p9_fcall_free(client, &req->rc);
> req->rc.sdata = NULL;
> goto dont_need_post_recv;
> } else {


2018-08-01 15:26:09

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 2/2] net/9p: add a per-client fcall kmem_cache

Greg Kurz wrote on Wed, Aug 01, 2018:
> > diff --git a/net/9p/client.c b/net/9p/client.c
> > index ba99a94a12c9..215e3b1ed7b4 100644
> > --- a/net/9p/client.c
> > +++ b/net/9p/client.c
> > @@ -231,15 +231,34 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> > return ret;
> > }
> >
> > -static int p9_fcall_alloc(struct p9_fcall *fc, int alloc_msize)
> > +static int p9_fcall_alloc(struct p9_client *c, struct p9_fcall *fc,
> > + int alloc_msize)
> > {
> > - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> > + if (c->fcall_cache && alloc_msize == c->msize)
>
> This is a presumably hot path for any request but the initial TVERSION,
> you probably want likely() here...

c->fcall_cache is indeed very likely, but alloc_msize == c->msize not so
much, as zc requests will be quite common for virtio and will be 4k in
size.
Although with that cache I'm starting to wonder if we should always use
it... Speed-wise if there is no memory pressure the cache is likely
going to be faster.
If there is pressure and the items are reclaimed though that'll bring
some heavier slow-down as it'll need to find bigger memory regions.

I'm not sure which path we should favor, tbh. I'll keep these separate
for now.


For the first part of the check, Matthew suggested trying to trick msize
into a different value to fail this check for the initial TVERSION call,
but even after thinking about it a bit I don't really see how to do that
cleanly. I can at least make -that- likely()...
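
For readers unfamiliar with the hint, the split between the two halves of
the check can be sketched as below. This is only an illustration: the
likely()/unlikely() definitions mirror the usual __builtin_expect wrappers
for GCC and Clang, and pick_source() and enum source are hypothetical names
that do not exist in the 9p code. Only the cache-existence test is
annotated; the size comparison is left unhinted because zero-copy requests
of a different size are expected to be common.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

enum source { FROM_CACHE, FROM_HEAP };

/* Decide where a buffer of 'want' bytes should come from. */
static enum source pick_source(bool have_cache, size_t want, size_t msize)
{
	assert(msize > 0);
	/* The cache exists for every request after the initial handshake. */
	if (likely(have_cache) && want == msize)
		return FROM_CACHE;
	return FROM_HEAP;
}
```

The hint only influences code layout and branch prediction; the result of
the condition is unchanged.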

>
> > + fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> > + else
> > + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> > if (!fc->sdata)
> > return -ENOMEM;
> > fc->capacity = alloc_msize;
> > return 0;
> > }
> >
> > +void p9_fcall_free(struct p9_client *c, struct p9_fcall *fc)
> > +{
> > + /* sdata can be NULL for interrupted requests in trans_rdma,
> > + * and kmem_cache_free does not do NULL-check for us
> > + */
> > + if (unlikely(!fc->sdata))
> > + return;
> > +
> > + if (c->fcall_cache && fc->capacity == c->msize)
>
> ... and here as well.

For this one I'll unfortunately need to store in the fc how it has been
allocated, as slob doesn't allow kmem_cache_free() on a buffer allocated
by kmalloc, and in anticipation of reqs being refcounted in a hostile world
the initial TVERSION req could be freed after fcall_cache is created :/

That's a bit of a burden but at least will reduce the checks to one here
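
The idea of recording in the fcall how its buffer was allocated can be
sketched in userspace roughly as follows. Everything here is an assumption
made for illustration: struct pool is a toy free list standing in for a
kmem_cache, and the origin field plays the role of whatever marker ends up
in struct p9_fcall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool standing in for a kmem_cache. */
struct pool {
	size_t obj_size;
	void *free_list;	/* singly linked list of returned objects */
	int outstanding;	/* objects currently handed out */
};

static void *pool_alloc(struct pool *p)
{
	/* Objects must be able to hold the free-list link. */
	size_t size = p->obj_size < sizeof(void *) ? sizeof(void *) : p->obj_size;
	void *obj = p->free_list;

	if (obj)
		p->free_list = *(void **)obj;
	else
		obj = malloc(size);
	if (obj)
		p->outstanding++;
	return obj;
}

static void pool_free(struct pool *p, void *obj)
{
	assert(p->outstanding > 0);
	*(void **)obj = p->free_list;
	p->free_list = obj;
	p->outstanding--;
}

struct buf {
	size_t capacity;
	struct pool *origin;	/* NULL means the buffer came from malloc() */
	unsigned char *sdata;
};

static int buf_init(struct buf *b, struct pool *p, size_t size)
{
	if (p && size == p->obj_size) {
		b->sdata = pool_alloc(p);
		b->origin = p;
	} else {
		b->sdata = malloc(size);
		b->origin = NULL;
	}
	if (!b->sdata)
		return -1;
	b->capacity = size;
	return 0;
}

static void buf_fini(struct buf *b)
{
	if (!b->sdata)
		return;
	/* Free through the allocator recorded at init time, not by size. */
	if (b->origin)
		pool_free(b->origin, b->sdata);
	else
		free(b->sdata);
	b->sdata = NULL;
}
```

A buffer initialized before the pool exists keeps origin == NULL, so it is
still released with free() even if a pool of the same object size has been
created in the meantime. The toy pool never returns memory to the system,
which is acceptable for a sketch only.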

> > + kmem_cache_free(c->fcall_cache, fc->sdata);
> > + else
> > + kfree(fc->sdata);
> > +}
> > +EXPORT_SYMBOL(p9_fcall_free);
> > +
> > static struct kmem_cache *p9_req_cache;
> >
> > /**


Anyway I've had as many comments as I could hope for, thanks everyone
for the quick review.
I'll send a v2 of both patches tomorrow.

--
Dominique

2018-08-01 16:23:12

by Greg Kurz

Subject: Re: [V9fs-developer] [PATCH 1/2] net/9p: embed fcall in req to round down buffer allocs

On Wed, 1 Aug 2018 16:38:40 +0200
Dominique Martinet <[email protected]> wrote:

> Greg Kurz wrote on Wed, Aug 01, 2018:
> > > @@ -263,13 +261,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> > > if (!req)
> > > return NULL;
> > >
> > > + if (p9_fcall_alloc(&req->tc, alloc_msize))
> > > + goto free;
> > > + if (p9_fcall_alloc(&req->rc, alloc_msize))
> > > goto free;
> >
> > Hmm... if the first allocation fails, we will kfree() req->rc.sdata.
> >
> > Are we sure we won't have a stale pointer or uninitialized data in
> > there ?
>
> Yeah, Jun pointed that out and I have a v2 that only frees as needed
> with an extra goto (I sent an incremental diff in my reply to his
> comment here[1])
>
> [1] https://lkml.kernel.org/r/20180731011256.GA30388@nautica
>
> > And even if we don't with the current code base, this is fragile and
> > could be easily broken.
> >
> > I think you should drop this hunk and rather rename p9_fcall_alloc() to
> > p9_fcall_alloc_sdata() instead, since this is what the function is
> > actually doing with this patch applied.
>
> Hmm. I agree the naming isn't accurate, but even if we rename it we'll
> need to pass a pointer to fcall as argument as it inits its capacity.
> p9_fcall_init(fc, msize) might be simpler?
>

Ah yes you're right... alloc is a bit misleading then. I agree that
p9_fcall_init() is more appropriate in this case.

And maybe you should introduce p9_fcall_fini() or _release() for
completeness. It would only do kfree() for a start, but it would
then evolve to be like the p9_fcall_free() function from patch 2.

> (I'm not sure I follow what you mean by 'drop this hunk', to be honest,
> did you want a single function call to init both maybe?)
>

I meant "keep the same logic in p9_tag_alloc()", something like:

req->tc.sdata = p9_fcall_alloc_sdata(&req->tc, alloc_msize);
req->rc.sdata = p9_fcall_alloc_sdata(&req->rc, alloc_msize);
if (!req->tc.sdata || !req->rc.sdata)

But I agree the way you did is cleaner.

2018-08-02 02:39:47

by Dominique Martinet

Subject: [PATCH v2 2/2] net/9p: add a per-client fcall kmem_cache

From: Dominique Martinet <[email protected]>

Having a specific cache for the fcall allocations helps speed up
allocations a bit, especially in case of non-"round" msizes.

The caches will automatically be merged if there are multiple caches
of items with the same size so we do not need to try to share a cache
between different clients of the same size.

Since the msize is negotiated with the server, only allocate the cache
after that negotiation has happened - previous allocations or
allocations of different sizes (e.g. zero-copy fcall) are made with
kmalloc directly.

Signed-off-by: Dominique Martinet <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Greg Kurz <[email protected]>
Cc: Jun Piao <[email protected]>
---
v2:
- Add a pointer to the cache in p9_fcall to make sure a buffer
allocated with kmalloc gets freed with kfree and vice-versa.
This could have been smaller with a bool, but this spares having
to look at the client so it looked a bit cleaner. I'm expecting this
patch will need a v3 one way or another so I went for the bolder
approach - please say if you think a smaller item is better; I *think*
nothing relies on this being ordered the same way as the data on the
wire (struct isn't packed anyway) so we can move id after tag and add
another u8 to not have any overhead.

- Added likely() to the cache existence check in allocation, but nothing
for the msize check or free because zc requests are of a different size.

include/net/9p/9p.h | 1 +
include/net/9p/client.h | 2 ++
net/9p/client.c | 34 ++++++++++++++++++++++++++++------
net/9p/trans_rdma.c | 2 +-
4 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index e23896116d9a..f1d2ed3cee61 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -558,6 +558,7 @@ struct p9_fcall {
size_t offset;
size_t capacity;

+ struct kmem_cache *cache;
u8 *sdata;
};

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 4b4ac1362ad5..735f3979d559 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -123,6 +123,7 @@ struct p9_client {
struct p9_trans_module *trans_mod;
enum p9_trans_status status;
void *trans;
+ struct kmem_cache *fcall_cache;

union {
struct {
@@ -230,6 +231,7 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
kgid_t gid, struct p9_qid *);
int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status);
int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *fl);
+void p9_fcall_fini(struct p9_fcall *fc);
struct p9_req_t *p9_tag_lookup(struct p9_client *, u16);
void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);

diff --git a/net/9p/client.c b/net/9p/client.c
index bc40bb11b832..0e0f8bb3fd3c 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -231,19 +231,36 @@ static int parse_opts(char *opts, struct p9_client *clnt)
return ret;
}

-static int p9_fcall_init(struct p9_fcall *fc, int alloc_msize)
+static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
+ int alloc_msize)
{
- fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (likely(c->fcall_cache) && alloc_msize == c->msize) {
+ fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
+ fc->cache = c->fcall_cache;
+ } else {
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ fc->cache = NULL;
+ }
if (!fc->sdata)
return -ENOMEM;
fc->capacity = alloc_msize;
return 0;
}

-static void p9_fcall_fini(struct p9_fcall *fc)
+void p9_fcall_fini(struct p9_fcall *fc)
{
- kfree(fc->sdata);
+ /* sdata can be NULL for interrupted requests in trans_rdma,
+ * and kmem_cache_free does not do NULL-check for us
+ */
+ if (unlikely(!fc->sdata))
+ return;
+
+ if (fc->cache)
+ kmem_cache_free(fc->cache, fc->sdata);
+ else
+ kfree(fc->sdata);
}
+EXPORT_SYMBOL(p9_fcall_fini);

static struct kmem_cache *p9_req_cache;

@@ -266,9 +283,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
if (!req)
return NULL;

- if (p9_fcall_init(&req->tc, alloc_msize))
+ if (p9_fcall_init(c, &req->tc, alloc_msize))
goto free_req;
- if (p9_fcall_init(&req->rc, alloc_msize))
+ if (p9_fcall_init(c, &req->rc, alloc_msize))
goto free;

p9pdu_reset(&req->tc);
@@ -950,6 +967,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)

clnt->trans_mod = NULL;
clnt->trans = NULL;
+ clnt->fcall_cache = NULL;

client_id = utsname()->nodename;
memcpy(clnt->name, client_id, strlen(client_id) + 1);
@@ -986,6 +1004,9 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
if (err)
goto close_trans;

+ clnt->fcall_cache = kmem_cache_create("9p-fcall-cache", clnt->msize,
+ 0, 0, NULL);
+
return clnt;

close_trans:
@@ -1017,6 +1038,7 @@ void p9_client_destroy(struct p9_client *clnt)

p9_tag_cleanup(clnt);

+ kmem_cache_destroy(clnt->fcall_cache);
kfree(clnt);
}
EXPORT_SYMBOL(p9_client_destroy);
diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index c5cac97df7f7..c60655c90c9e 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -445,7 +445,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
/* Got one! */
- kfree(req->rc.sdata);
+ p9_fcall_fini(&req->rc);
req->rc.sdata = NULL;
goto dont_need_post_recv;
} else {
--
2.17.1


2018-08-02 03:04:19

by Dominique Martinet

Subject: [PATCH v2 1/2] net/9p: embed fcall in req to round down buffer allocs

From: Dominique Martinet <[email protected]>

'msize' is often a power of two, or at least page-aligned, so avoiding
an overhead of two dozen bytes for each allocation will help the
allocator do its work and reduce memory fragmentation.

Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Greg Kurz <[email protected]>
Cc: Jun Piao <[email protected]>
---

v2:
- Add extra label to not free uninitialized memory on alloc failure
- Rename p9_fcall_alloc to p9_fcall_init
- Add a p9_fcall_fini function to echo to init
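
As a rough illustration of the rounding argument in the commit message,
the sketch below assumes an allocator that serves requests from
power-of-two size classes (an assumption of this sketch; real kmalloc size
classes differ in detail) and a hypothetical 24-byte header placed in front
of the data. round_up_pow2() and wasted_bytes() are illustrative names, not
kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two that is >= n (returns 1 for n == 0). */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	assert(n <= (SIZE_MAX >> 1) + 1);	/* avoid overflow in the loop */
	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes lost to rounding when hdr + msize bytes are requested at once. */
static size_t wasted_bytes(size_t hdr, size_t msize)
{
	size_t want = hdr + msize;

	return round_up_pow2(want) - want;
}
```

Under these assumptions, an msize of 8192 with the 24-byte header in the
same allocation falls into the 16384-byte class, so wasted_bytes(24, 8192)
is 8168, whereas allocating the data alone gives wasted_bytes(0, 8192) == 0.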

include/net/9p/client.h | 4 +-
net/9p/client.c | 166 ++++++++++++++++++++--------------------
net/9p/trans_fd.c | 12 +--
net/9p/trans_rdma.c | 29 +++----
net/9p/trans_virtio.c | 18 ++---
net/9p/trans_xen.c | 12 +--
6 files changed, 123 insertions(+), 118 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index a4dc42c53d18..4b4ac1362ad5 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -95,8 +95,8 @@ struct p9_req_t {
int status;
int t_err;
wait_queue_head_t wq;
- struct p9_fcall *tc;
- struct p9_fcall *rc;
+ struct p9_fcall tc;
+ struct p9_fcall rc;
void *aux;
struct list_head req_list;
};
diff --git a/net/9p/client.c b/net/9p/client.c
index 88db45966740..bc40bb11b832 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -231,15 +231,18 @@ static int parse_opts(char *opts, struct p9_client *clnt)
return ret;
}

-static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
+static int p9_fcall_init(struct p9_fcall *fc, int alloc_msize)
{
- struct p9_fcall *fc;
- fc = kmalloc(sizeof(struct p9_fcall) + alloc_msize, GFP_NOFS);
- if (!fc)
- return NULL;
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (!fc->sdata)
+ return -ENOMEM;
fc->capacity = alloc_msize;
- fc->sdata = (char *) fc + sizeof(struct p9_fcall);
- return fc;
+ return 0;
+}
+
+static void p9_fcall_fini(struct p9_fcall *fc)
+{
+ kfree(fc->sdata);
}

static struct kmem_cache *p9_req_cache;
@@ -263,13 +266,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
if (!req)
return NULL;

- req->tc = p9_fcall_alloc(alloc_msize);
- req->rc = p9_fcall_alloc(alloc_msize);
- if (!req->tc || !req->rc)
+ if (p9_fcall_init(&req->tc, alloc_msize))
+ goto free_req;
+ if (p9_fcall_init(&req->rc, alloc_msize))
goto free;

- p9pdu_reset(req->tc);
- p9pdu_reset(req->rc);
+ p9pdu_reset(&req->tc);
+ p9pdu_reset(&req->rc);
req->status = REQ_STATUS_ALLOC;
init_waitqueue_head(&req->wq);
INIT_LIST_HEAD(&req->req_list);
@@ -281,7 +284,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
GFP_NOWAIT);
else
tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
- req->tc->tag = tag;
+ req->tc.tag = tag;
spin_unlock_irq(&c->lock);
idr_preload_end();
if (tag < 0)
@@ -290,8 +293,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
return req;

free:
- kfree(req->tc);
- kfree(req->rc);
+ p9_fcall_fini(&req->tc);
+ p9_fcall_fini(&req->rc);
+free_req:
kmem_cache_free(p9_req_cache, req);
return ERR_PTR(-ENOMEM);
}
@@ -329,14 +333,14 @@ EXPORT_SYMBOL(p9_tag_lookup);
static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
{
unsigned long flags;
- u16 tag = r->tc->tag;
+ u16 tag = r->tc.tag;

p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
spin_lock_irqsave(&c->lock, flags);
idr_remove(&c->reqs, tag);
spin_unlock_irqrestore(&c->lock, flags);
- kfree(r->tc);
- kfree(r->rc);
+ p9_fcall_fini(&r->tc);
+ p9_fcall_fini(&r->rc);
kmem_cache_free(p9_req_cache, r);
}

@@ -368,7 +372,7 @@ static void p9_tag_cleanup(struct p9_client *c)
*/
void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
{
- p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
+ p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc.tag);

/*
* This barrier is needed to make sure any change made to req before
@@ -378,7 +382,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
req->status = status;

wake_up(&req->wq);
- p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
+ p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
}
EXPORT_SYMBOL(p9_client_cb);

@@ -449,18 +453,18 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
int err;
int ecode;

- err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
- if (req->rc->size >= c->msize) {
+ err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
+ if (req->rc.size >= c->msize) {
p9_debug(P9_DEBUG_ERROR,
"requested packet size too big: %d\n",
- req->rc->size);
+ req->rc.size);
return -EIO;
}
/*
* dump the response from server
* This should be after check errors which poplulate pdu_fcall.
*/
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
if (err) {
p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
return err;
@@ -470,7 +474,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)

if (!p9_is_proto_dotl(c)) {
char *ename;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
&ename, &ecode);
if (err)
goto out_err;
@@ -486,7 +490,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
}
kfree(ename);
} else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
err = -ecode;

p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
@@ -520,12 +524,12 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
int8_t type;
char *ename = NULL;

- err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
+ err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
/*
* dump the response from server
* This should be after parse_header which poplulate pdu_fcall.
*/
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
if (err) {
p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
return err;
@@ -540,13 +544,13 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
/* 7 = header size for RERROR; */
int inline_len = in_hdrlen - 7;

- len = req->rc->size - req->rc->offset;
+ len = req->rc.size - req->rc.offset;
if (len > (P9_ZC_HDR_SZ - 7)) {
err = -EFAULT;
goto out_err;
}

- ename = &req->rc->sdata[req->rc->offset];
+ ename = &req->rc.sdata[req->rc.offset];
if (len > inline_len) {
/* We have error in external buffer */
if (!copy_from_iter_full(ename + inline_len,
@@ -556,7 +560,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
}
}
ename = NULL;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
&ename, &ecode);
if (err)
goto out_err;
@@ -572,7 +576,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
}
kfree(ename);
} else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
err = -ecode;

p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
@@ -605,7 +609,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
int16_t oldtag;
int err;

- err = p9_parse_header(oldreq->tc, NULL, NULL, &oldtag, 1);
+ err = p9_parse_header(&oldreq->tc, NULL, NULL, &oldtag, 1);
if (err)
return err;

@@ -649,12 +653,12 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
return req;

/* marshall the data */
- p9pdu_prepare(req->tc, req->tc->tag, type);
- err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
+ p9pdu_prepare(&req->tc, req->tc.tag, type);
+ err = p9pdu_vwritef(&req->tc, c->proto_version, fmt, ap);
if (err)
goto reterr;
- p9pdu_finalize(c, req->tc);
- trace_9p_client_req(c, type, req->tc->tag);
+ p9pdu_finalize(c, &req->tc);
+ trace_9p_client_req(c, type, req->tc.tag);
return req;
reterr:
p9_free_req(c, req);
@@ -739,7 +743,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
goto reterr;

err = p9_check_errors(c, req);
- trace_9p_client_res(c, type, req->rc->tag, err);
+ trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
@@ -821,7 +825,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
goto reterr;

err = p9_check_zc_errors(c, req, uidata, in_hdrlen);
- trace_9p_client_res(c, type, req->rc->tag, err);
+ trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
@@ -904,10 +908,10 @@ static int p9_client_version(struct p9_client *c)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, c->proto_version, "ds", &msize, &version);
+ err = p9pdu_readf(&req->rc, c->proto_version, "ds", &msize, &version);
if (err) {
p9_debug(P9_DEBUG_9P, "version error %d\n", err);
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
goto error;
}

@@ -1056,9 +1060,9 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", &qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", &qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1113,9 +1117,9 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "R", &nwqids, &wqids);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "R", &nwqids, &wqids);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto clunk_fid;
}
@@ -1180,9 +1184,9 @@ int p9_client_open(struct p9_fid *fid, int mode)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1224,9 +1228,9 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1269,9 +1273,9 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1308,9 +1312,9 @@ int p9_client_symlink(struct p9_fid *dfid, const char *name,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1506,10 +1510,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
break;
}

- *err = p9pdu_readf(req->rc, clnt->proto_version,
+ *err = p9pdu_readf(&req->rc, clnt->proto_version,
"D", &count, &dataptr);
if (*err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
break;
}
@@ -1579,9 +1583,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
break;
}

- *err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
+ *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
if (*err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
break;
}
@@ -1623,9 +1627,9 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "wS", &ignored, ret);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "wS", &ignored, ret);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1676,9 +1680,9 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "A", ret);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "A", ret);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1828,11 +1832,11 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
- &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
- &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
+ &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
+ &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1936,9 +1940,9 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
err = PTR_ERR(req);
goto error;
}
- err = p9pdu_readf(req->rc, clnt->proto_version, "q", attr_size);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "q", attr_size);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto clunk_fid;
}
@@ -2024,9 +2028,9 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &count, &dataptr);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}
if (rsize < count) {
@@ -2065,9 +2069,9 @@ int p9_client_mknod_dotl(struct p9_fid *fid, const char *name, int mode,
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RMKNOD qid %x.%llx.%x\n", qid->type,
@@ -2096,9 +2100,9 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RMKDIR qid %x.%llx.%x\n", qid->type,
@@ -2131,9 +2135,9 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "b", status);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "b", status);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RLOCK status %i\n", *status);
@@ -2162,11 +2166,11 @@ int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *glock)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "bqqds", &glock->type,
- &glock->start, &glock->length, &glock->proc_id,
- &glock->client_id);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "bqqds", &glock->type,
+ &glock->start, &glock->length, &glock->proc_id,
+ &glock->client_id);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RGETLOCK type %i start %lld length %lld "
@@ -2192,9 +2196,9 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "s", target);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "s", target);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RREADLINK target %s\n", *target);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e2ef3c782c53..20f46f13fe83 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -354,7 +354,7 @@ static void p9_read_work(struct work_struct *work)
goto error;
}

- if (m->req->rc == NULL) {
+ if (m->req->rc.sdata == NULL) {
p9_debug(P9_DEBUG_ERROR,
"No recv fcall for tag %d (req %p), disconnecting!\n",
m->rc.tag, m->req);
@@ -362,7 +362,7 @@ static void p9_read_work(struct work_struct *work)
err = -EIO;
goto error;
}
- m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
+ m->rc.sdata = m->req->rc.sdata;
memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
m->rc.capacity = m->rc.size;
}
@@ -372,7 +372,7 @@ static void p9_read_work(struct work_struct *work)
*/
if ((m->req) && (m->rc.offset == m->rc.capacity)) {
p9_debug(P9_DEBUG_TRANS, "got new packet\n");
- m->req->rc->size = m->rc.offset;
+ m->req->rc.size = m->rc.offset;
spin_lock(&m->client->lock);
if (m->req->status != REQ_STATUS_ERROR)
status = REQ_STATUS_RCVD;
@@ -469,8 +469,8 @@ static void p9_write_work(struct work_struct *work)
p9_debug(P9_DEBUG_TRANS, "move req %p\n", req);
list_move_tail(&req->req_list, &m->req_list);

- m->wbuf = req->tc->sdata;
- m->wsize = req->tc->size;
+ m->wbuf = req->tc.sdata;
+ m->wsize = req->tc.size;
m->wpos = 0;
spin_unlock(&m->client->lock);
}
@@ -663,7 +663,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
struct p9_conn *m = &ts->conn;

p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
- m, current, req->tc, req->tc->id);
+ m, current, &req->tc, req->tc.id);
if (m->err < 0)
return m->err;

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 2ab4574183c9..c5cac97df7f7 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -122,7 +122,7 @@ struct p9_rdma_context {
dma_addr_t busa;
union {
struct p9_req_t *req;
- struct p9_fcall *rc;
+ struct p9_fcall rc;
};
};

@@ -320,8 +320,8 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
if (wc->status != IB_WC_SUCCESS)
goto err_out;

- c->rc->size = wc->byte_len;
- err = p9_parse_header(c->rc, NULL, NULL, &tag, 1);
+ c->rc.size = wc->byte_len;
+ err = p9_parse_header(&c->rc, NULL, NULL, &tag, 1);
if (err)
goto err_out;

@@ -331,12 +331,13 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)

/* Check that we have not yet received a reply for this request.
*/
- if (unlikely(req->rc)) {
+ if (unlikely(req->rc.sdata)) {
pr_err("Duplicate reply for request %d", tag);
goto err_out;
}

- req->rc = c->rc;
+ req->rc.size = c->rc.size;
+ req->rc.sdata = c->rc.sdata;
p9_client_cb(client, req, REQ_STATUS_RCVD);

out:
@@ -361,7 +362,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
container_of(wc->wr_cqe, struct p9_rdma_context, cqe);

ib_dma_unmap_single(rdma->cm_id->device,
- c->busa, c->req->tc->size,
+ c->busa, c->req->tc.size,
DMA_TO_DEVICE);
up(&rdma->sq_sem);
kfree(c);
@@ -401,7 +402,7 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
struct ib_sge sge;

c->busa = ib_dma_map_single(rdma->cm_id->device,
- c->rc->sdata, client->msize,
+ c->rc.sdata, client->msize,
DMA_FROM_DEVICE);
if (ib_dma_mapping_error(rdma->cm_id->device, c->busa))
goto error;
@@ -443,9 +444,9 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
**/
if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
- /* Got one ! */
- kfree(req->rc);
- req->rc = NULL;
+ /* Got one! */
+ kfree(req->rc.sdata);
+ req->rc.sdata = NULL;
goto dont_need_post_recv;
} else {
/* We raced and lost. */
@@ -459,7 +460,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
err = -ENOMEM;
goto recv_error;
}
- rpl_context->rc = req->rc;
+ rpl_context->rc.sdata = req->rc.sdata;

/*
* Post a receive buffer for this request. We need to ensure
@@ -479,7 +480,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
goto recv_error;
}
/* remove posted receive buffer from request structure */
- req->rc = NULL;
+ req->rc.sdata = NULL;

dont_need_post_recv:
/* Post the request */
@@ -491,7 +492,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
c->req = req;

c->busa = ib_dma_map_single(rdma->cm_id->device,
- c->req->tc->sdata, c->req->tc->size,
+ c->req->tc.sdata, c->req->tc.size,
DMA_TO_DEVICE);
if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
err = -EIO;
@@ -501,7 +502,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
c->cqe.done = send_done;

sge.addr = c->busa;
- sge.length = c->req->tc->size;
+ sge.length = c->req->tc.size;
sge.lkey = rdma->pd->local_dma_lkey;

wr.next = NULL;
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 6265d1d62749..80dd34d8a6af 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -157,7 +157,7 @@ static void req_done(struct virtqueue *vq)
}

if (len) {
- req->rc->size = len;
+ req->rc.size = len;
p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
}
}
@@ -274,12 +274,12 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
out_sgs = in_sgs = 0;
/* Handle out VirtIO ring buffers */
out = pack_sg_list(chan->sg, 0,
- VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
+ VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
if (out)
sgs[out_sgs++] = chan->sg;

in = pack_sg_list(chan->sg, out,
- VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
+ VIRTQUEUE_NUM, req->rc.sdata, req->rc.capacity);
if (in)
sgs[out_sgs + in_sgs++] = chan->sg + out;

@@ -417,15 +417,15 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
out_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
if (n != outlen) {
__le32 v = cpu_to_le32(n);
- memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
+ memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
outlen = n;
}
/* The size field of the message must include the length of the
* header and the length of the data. We didn't actually know
* the length of the data until this point so add it in now.
*/
- sz = cpu_to_le32(req->tc->size + outlen);
- memcpy(&req->tc->sdata[0], &sz, sizeof(sz));
+ sz = cpu_to_le32(req->tc.size + outlen);
+ memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
} else if (uidata) {
int n = p9_get_mapped_pages(chan, &in_pages, uidata,
inlen, &offs, &need_drop);
@@ -434,7 +434,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
in_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
if (n != inlen) {
__le32 v = cpu_to_le32(n);
- memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
+ memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
inlen = n;
}
}
@@ -446,7 +446,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,

/* out data */
out = pack_sg_list(chan->sg, 0,
- VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
+ VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);

if (out)
sgs[out_sgs++] = chan->sg;
@@ -465,7 +465,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
* alloced memory and payload onto the user buffer.
*/
in = pack_sg_list(chan->sg, out,
- VIRTQUEUE_NUM, req->rc->sdata, in_hdr_len);
+ VIRTQUEUE_NUM, req->rc.sdata, in_hdr_len);
if (in)
sgs[out_sgs + in_sgs++] = chan->sg + out;

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index c2d54ac76bfd..1a5b38892eb4 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -141,7 +141,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
struct xen_9pfs_front_priv *priv = NULL;
RING_IDX cons, prod, masked_cons, masked_prod;
unsigned long flags;
- u32 size = p9_req->tc->size;
+ u32 size = p9_req->tc.size;
struct xen_9pfs_dataring *ring;
int num;

@@ -154,7 +154,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
if (!priv || priv->client != client)
return -EINVAL;

- num = p9_req->tc->tag % priv->num_rings;
+ num = p9_req->tc.tag % priv->num_rings;
ring = &priv->rings[num];

again:
@@ -176,7 +176,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);

- xen_9pfs_write_packet(ring->data.out, p9_req->tc->sdata, size,
+ xen_9pfs_write_packet(ring->data.out, p9_req->tc.sdata, size,
&masked_prod, masked_cons, XEN_9PFS_RING_SIZE);

p9_req->status = REQ_STATUS_SENT;
@@ -229,12 +229,12 @@ static void p9_xen_response(struct work_struct *work)
continue;
}

- memcpy(req->rc, &h, sizeof(h));
- req->rc->offset = 0;
+ memcpy(&req->rc, &h, sizeof(h));
+ req->rc.offset = 0;

masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
/* Then, read the whole packet (including the header) */
- xen_9pfs_read_packet(req->rc->sdata, ring->data.in, h.size,
+ xen_9pfs_read_packet(req->rc.sdata, ring->data.in, h.size,
masked_prod, &masked_cons,
XEN_9PFS_RING_SIZE);

--
2.17.1


2018-08-02 05:00:44

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH v2 2/2] net/9p: add a per-client fcall kmem_cache

Dominique Martinet wrote on Thu, Aug 02, 2018:
> [...]
> + clnt->fcall_cache = kmem_cache_create("9p-fcall-cache", clnt->msize,
> + 0, 0, NULL);


Well, my gut feeling that I'd need a v3 was right: after a bit more
testing (on a slightly different setup), I got this warning:
Bad or missing usercopy whitelist? Kernel memory overwrite attempt
detected to SLUB object '9p-fcall-cache' (offset 23, size 55078)!

So this kmem_cache_create needs to change to kmem_cache_create_usercopy,
but I'm not sure how to best specify the range -- this warning was about
writing data to the buffer so a TWRITE:
size[4] Twrite tag[2] fid[4] offset[8] count[4] data[count]
(the data[count] part comes from user - this matches 4+1+2+4+8+4 = 23)

Similarly, RREAD has data that can be copied to userspace, which has a
similar check:
size[4] Rread tag[2] count[4] data[count]

So I could hardcode offset = 4+1+2+4=11, usercopy size = msize - 11...


We have some P9_*HDR*SZ macros but none are really appropriate:
---
/* ample room for Twrite/Rread header */
#define P9_IOHDRSZ 24

/* Room for readdir header */
#define P9_READDIRHDRSZ 24

/* size of header for zero copy read/write */
#define P9_ZC_HDR_SZ 4096
---

It's actually been bugging me for a while that I see hardcoded '7's for
9p main header size (len + msg type + tag) everywhere, I could add a
first P9_HDRSZ of 7 and use that + 4?
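
The arithmetic above can be checked with a small userspace sketch. Only P9_HDRSZ is a name proposed in this thread; P9_TWRITE_HDRSZ, P9_RREAD_HDRSZ, struct usercopy_window and p9_usercopy_window() are hypothetical names used for illustration, not existing kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Field widths taken from the wire formats quoted above. */
enum {
	P9_SIZE_LEN = 4,	/* size[4] */
	P9_TYPE_LEN = 1,	/* message type byte (Twrite, Rread, ...) */
	P9_TAG_LEN  = 2,	/* tag[2] */
	P9_HDRSZ    = P9_SIZE_LEN + P9_TYPE_LEN + P9_TAG_LEN,	/* 7 */

	/* Hypothetical names: per-message header sizes before data[count] */
	P9_TWRITE_HDRSZ = P9_HDRSZ + 4 + 8 + 4,	/* fid[4] offset[8] count[4] */
	P9_RREAD_HDRSZ  = P9_HDRSZ + 4,		/* count[4] */
};

/* Region of an msize-sized buffer that may be copied to or from userspace. */
struct usercopy_window {
	size_t offset;
	size_t size;
};

/*
 * Start the window at the smallest data offset (Rread) so that it also
 * covers the Twrite payload, which starts later in the buffer.
 */
static struct usercopy_window p9_usercopy_window(size_t msize)
{
	struct usercopy_window w;

	assert(msize > (size_t)P9_RREAD_HDRSZ);
	w.offset = P9_RREAD_HDRSZ;
	w.size = msize - P9_RREAD_HDRSZ;
	return w;
}
```

With msize = 8192 this gives a window starting at offset 11 with 8181 bytes, which contains the Twrite data region that starts at offset 23.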

--
Dominique

2018-08-02 09:43:14

by Greg Kurz

Subject: Re: [PATCH v2 1/2] net/9p: embed fcall in req to round down buffer allocs

On Thu, 2 Aug 2018 04:37:31 +0200
Dominique Martinet <[email protected]> wrote:

> From: Dominique Martinet <[email protected]>
>
> 'msize' is often a power of two, or at least page-aligned, so avoiding
> an overhead of two dozen bytes for each allocation will help the
> allocator do its work and reduce memory fragmentation.
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Dominique Martinet <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Greg Kurz <[email protected]>
> Cc: Jun Piao <[email protected]>
> ---
>
> v2:
> - Add extra label to not free uninitialized memory on alloc failure
> - Rename p9_fcall_alloc to 9p_fcall_init
> - Add a p9_fcall_fini function to echo to init
>
[...]
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 2ab4574183c9..c5cac97df7f7 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -122,7 +122,7 @@ struct p9_rdma_context {
> dma_addr_t busa;
> union {
> struct p9_req_t *req;
> - struct p9_fcall *rc;
> + struct p9_fcall rc;
> };
> };
>
> @@ -320,8 +320,8 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
> if (wc->status != IB_WC_SUCCESS)
> goto err_out;
>
> - c->rc->size = wc->byte_len;
> - err = p9_parse_header(c->rc, NULL, NULL, &tag, 1);
> + c->rc.size = wc->byte_len;
> + err = p9_parse_header(&c->rc, NULL, NULL, &tag, 1);
> if (err)
> goto err_out;
>
> @@ -331,12 +331,13 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
>
> /* Check that we have not yet received a reply for this request.
> */
> - if (unlikely(req->rc)) {
> + if (unlikely(req->rc.sdata)) {
> pr_err("Duplicate reply for request %d", tag);
> goto err_out;
> }
>
> - req->rc = c->rc;
> + req->rc.size = c->rc.size;
> + req->rc.sdata = c->rc.sdata;
> p9_client_cb(client, req, REQ_STATUS_RCVD);
>
> out:
> @@ -361,7 +362,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
> container_of(wc->wr_cqe, struct p9_rdma_context, cqe);
>
> ib_dma_unmap_single(rdma->cm_id->device,
> - c->busa, c->req->tc->size,
> + c->busa, c->req->tc.size,
> DMA_TO_DEVICE);
> up(&rdma->sq_sem);
> kfree(c);
> @@ -401,7 +402,7 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
> struct ib_sge sge;
>
> c->busa = ib_dma_map_single(rdma->cm_id->device,
> - c->rc->sdata, client->msize,
> + c->rc.sdata, client->msize,
> DMA_FROM_DEVICE);
> if (ib_dma_mapping_error(rdma->cm_id->device, c->busa))
> goto error;
> @@ -443,9 +444,9 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> **/
> if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
> if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
> - /* Got one ! */
> - kfree(req->rc);
> - req->rc = NULL;
> + /* Got one! */
> + kfree(req->rc.sdata);

Shouldn't this be p9_fcall_fini(&req->rc) ?

The rest looks good, so, with that fixed, you can add:

Reviewed-by: Greg Kurz <[email protected]>

> + req->rc.sdata = NULL;
> goto dont_need_post_recv;
> } else {
> /* We raced and lost. */

2018-08-02 22:05:05

by Dominique Martinet

Subject: Re: [PATCH v2 1/2] net/9p: embed fcall in req to round down buffer allocs

Greg Kurz wrote on Thu, Aug 02, 2018:
> > @@ -443,9 +444,9 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> > **/
> > if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
> > if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
> > - /* Got one ! */
> > - kfree(req->rc);
> > - req->rc = NULL;
> > + /* Got one! */
> > + kfree(req->rc.sdata);
>
> Shouldn't this be p9_fcall_fini(&req->rc) ?

Right, I failed at bookkeeping: I changed that in the next patch, but it
should have been done in this one.
I will add p9_fcall_fini to the headers and export it in this patch
instead of the next.

> The rest looks good, so, with that fixed, you can add:
>
> Reviewed-by: Greg Kurz <[email protected]>

Thanks!

--
Dominique

2018-08-09 14:35:20

by Dominique Martinet

Subject: [PATCH v3 2/2] net/9p: add a per-client fcall kmem_cache

From: Dominique Martinet <[email protected]>

Having a specific cache for the fcall allocations helps speed up
allocations a bit, especially in case of non-"round" msizes.

The caches will automatically be merged if there are multiple caches
of items with the same size so we do not need to try to share a cache
between different clients of the same size.

Since the msize is negotiated with the server, only allocate the cache
after that negotiation has happened - previous allocations or
allocations of different sizes (e.g. zero-copy fcall) are made with
kmalloc directly.

Signed-off-by: Dominique Martinet <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Greg Kurz <[email protected]>
Cc: Jun Piao <[email protected]>
---
v2:
- Add a pointer to the cache in p9_fcall to make sure a buffer
allocated with kmalloc gets freed with kfree and vice versa.
This could have been smaller with a bool, but the pointer spares
having to look at the client, so it looked a bit cleaner. I'm expecting
this patch will need a v3 one way or another, so I went for the bolder
approach; please say if you think a smaller item is better. I *think*
nothing relies on this being ordered the same way as the data on the
wire (the struct isn't packed anyway), so we can move id after tag and
add another u8 to not have any overhead.

- added likely() to cache existence check in allocation, but nothing
for msize check or free because of zc request being of different size

v3:
- defined the userdata region in the cache, as read or write calls can
access the buffer directly (this led to warnings with
HARDENED_USERCOPY=y)



I didn't get any comment on v2 for this patch, but I'm still not fully
happy with this in all honesty.

Part of me believes we might be better off always allocating from the
cache, even for zero-copy headers. It's a waste of space, but the buffers
are there and being reused constantly for non-zc calls, and mixing
kmallocs in could lead to these buffers being really freed and
reallocated all the time instead of reusing the slab buffers
continuously.

That would let me move the likely() onto the fast path, the only
exception being the initial TVERSION call on mount, which I still don't
have a nice way to handle: it would mean using a different size in
p9_client_rpc or prepare_req if the type is TVERSION, and I'm really not
happy with that...
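
For readers following the allocation logic, here is a minimal userspace sketch of the pairing rule this patch relies on: whichever allocator produced sdata is recorded in the fcall, and the free path dispatches on that record. Every toy_* name is hypothetical, and the "cache" is only a counter wrapped around malloc(), not a model of the real slab allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a kmem_cache: one fixed object size plus a counter. */
struct toy_cache {
	size_t obj_size;
	int live;	/* objects currently allocated from this cache */
};

static void *toy_cache_alloc(struct toy_cache *c)
{
	void *p = malloc(c->obj_size);

	if (p)
		c->live++;
	return p;
}

static void toy_cache_free(struct toy_cache *c, void *p)
{
	c->live--;
	free(p);
}

struct toy_fcall {
	size_t capacity;
	struct toy_cache *cache;	/* NULL means sdata came from malloc() */
	unsigned char *sdata;
};

/* Use the cache only when it exists and the size matches its objects. */
static int toy_fcall_init(struct toy_cache *c, struct toy_fcall *fc,
			  size_t alloc_size)
{
	if (c && alloc_size == c->obj_size) {
		fc->sdata = toy_cache_alloc(c);
		fc->cache = c;
	} else {
		fc->sdata = malloc(alloc_size);
		fc->cache = NULL;
	}
	if (!fc->sdata)
		return -1;
	fc->capacity = alloc_size;
	return 0;
}

/* Free through the allocator recorded at init time; NULL sdata is a no-op. */
static void toy_fcall_fini(struct toy_fcall *fc)
{
	if (!fc->sdata)
		return;
	if (fc->cache)
		toy_cache_free(fc->cache, fc->sdata);
	else
		free(fc->sdata);
	fc->sdata = NULL;
}
```

A buffer of the cache's object size goes through the cache, while any other size (for example a zero-copy header, or a request made before the cache exists) falls back to malloc() and is released with free().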


include/net/9p/9p.h | 4 ++++
include/net/9p/client.h | 1 +
net/9p/client.c | 40 +++++++++++++++++++++++++++++++++++-----
3 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index e23896116d9a..645266b74652 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -336,6 +336,9 @@ enum p9_qid_t {
#define P9_NOFID (u32)(~0)
#define P9_MAXWELEM 16

+/* Minimal header size: len + id + tag */
+#define P9_HDRSZ 7
+
/* ample room for Twrite/Rread header */
#define P9_IOHDRSZ 24

@@ -558,6 +561,7 @@ struct p9_fcall {
size_t offset;
size_t capacity;

+ struct kmem_cache *cache;
u8 *sdata;
};

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index c2671d40bb6b..735f3979d559 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -123,6 +123,7 @@ struct p9_client {
struct p9_trans_module *trans_mod;
enum p9_trans_status status;
void *trans;
+ struct kmem_cache *fcall_cache;

union {
struct {
diff --git a/net/9p/client.c b/net/9p/client.c
index ed78751aee7c..6c57ab1294d7 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -192,6 +192,9 @@ static int parse_opts(char *opts, struct p9_client *clnt)
goto free_and_return;
}

+ if (clnt->trans_mod) {
+ pr_warn("Had trans %s\n", clnt->trans_mod->name);
+ }
v9fs_put_trans(clnt->trans_mod);
clnt->trans_mod = v9fs_get_trans_by_name(s);
if (clnt->trans_mod == NULL) {
@@ -231,9 +234,16 @@ static int parse_opts(char *opts, struct p9_client *clnt)
return ret;
}

-static int p9_fcall_init(struct p9_fcall *fc, int alloc_msize)
+static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
+ int alloc_msize)
{
- fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (likely(c->fcall_cache) && alloc_msize == c->msize) {
+ fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
+ fc->cache = c->fcall_cache;
+ } else {
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ fc->cache = NULL;
+ }
if (!fc->sdata)
return -ENOMEM;
fc->capacity = alloc_msize;
@@ -242,7 +252,16 @@ static int p9_fcall_init(struct p9_fcall *fc, int alloc_msize)

void p9_fcall_fini(struct p9_fcall *fc)
{
- kfree(fc->sdata);
+ /* sdata can be NULL for interrupted requests in trans_rdma,
+ * and kmem_cache_free does not do NULL-check for us
+ */
+ if (unlikely(!fc->sdata))
+ return;
+
+ if (fc->cache)
+ kmem_cache_free(fc->cache, fc->sdata);
+ else
+ kfree(fc->sdata);
}
EXPORT_SYMBOL(p9_fcall_fini);

@@ -267,9 +286,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
if (!req)
return NULL;

- if (p9_fcall_init(&req->tc, alloc_msize))
+ if (p9_fcall_init(c, &req->tc, alloc_msize))
goto free_req;
- if (p9_fcall_init(&req->rc, alloc_msize))
+ if (p9_fcall_init(c, &req->rc, alloc_msize))
goto free;

p9pdu_reset(&req->tc);
@@ -951,6 +970,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)

clnt->trans_mod = NULL;
clnt->trans = NULL;
+ clnt->fcall_cache = NULL;

client_id = utsname()->nodename;
memcpy(clnt->name, client_id, strlen(client_id) + 1);
@@ -987,6 +1007,15 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
if (err)
goto close_trans;

+ /* P9_HDRSZ + 4 is the smallest packet header we can have that is
+ * followed by data accessed from userspace by read
+ */
+ clnt->fcall_cache =
+ kmem_cache_create_usercopy("9p-fcall-cache", clnt->msize,
+ 0, 0, P9_HDRSZ + 4,
+ clnt->msize - (P9_HDRSZ + 4),
+ NULL);
+
return clnt;

close_trans:
@@ -1018,6 +1047,7 @@ void p9_client_destroy(struct p9_client *clnt)

p9_tag_cleanup(clnt);

+ kmem_cache_destroy(clnt->fcall_cache);
kfree(clnt);
}
EXPORT_SYMBOL(p9_client_destroy);
--
2.17.1


2018-08-09 14:35:42

by Dominique Martinet

Subject: [PATCH v3 1/2] net/9p: embed fcall in req to round down buffer allocs

From: Dominique Martinet <[email protected]>

'msize' is often a power of two, or at least page-aligned, so avoiding
an overhead of two dozen bytes for each allocation will help the
allocator do its work and reduce memory fragmentation.

Suggested-by: Matthew Wilcox <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
Reviewed-by: Greg Kurz <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Jun Piao <[email protected]>
---
v2:
- Add extra label to not free uninitialized memory on alloc failure
- Rename p9_fcall_alloc to p9_fcall_init
- Add a p9_fcall_fini function to echo to init

v3:
- use p9_fcall_fini() in rdma_request() instead of kfree, the code
was in the following patch in previous version
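
To illustrate the motivation in the commit message, the sketch below estimates the slack left when a request is rounded up to a power-of-two size class. It assumes power-of-two size classes for large allocations and uses 24 bytes as a stand-in for the "two dozen bytes" of struct overhead; both are assumptions for illustration, and round_up_pow2() and wasted_bytes() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two that is >= n. */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	/* Guard against overflow of the shift below. */
	assert(n > 0 && n <= SIZE_MAX / 2 + 1);
	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes lost when a request is served from a power-of-two size class. */
static size_t wasted_bytes(size_t request)
{
	return round_up_pow2(request) - request;
}
```

Under these assumptions, a request of exactly 8192 bytes wastes nothing, while 8192 + 24 bytes lands in the 16384-byte class and leaves 8168 bytes unused, which is the overhead that embedding the fcall in the request avoids.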

include/net/9p/client.h | 5 +-
net/9p/client.c | 167 +++++++++++++++++++++-------------------
net/9p/trans_fd.c | 12 +--
net/9p/trans_rdma.c | 29 +++----
net/9p/trans_virtio.c | 18 ++---
net/9p/trans_xen.c | 12 +--
6 files changed, 125 insertions(+), 118 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index a4dc42c53d18..c2671d40bb6b 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -95,8 +95,8 @@ struct p9_req_t {
int status;
int t_err;
wait_queue_head_t wq;
- struct p9_fcall *tc;
- struct p9_fcall *rc;
+ struct p9_fcall tc;
+ struct p9_fcall rc;
void *aux;
struct list_head req_list;
};
@@ -230,6 +230,7 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
kgid_t gid, struct p9_qid *);
int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status);
int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *fl);
+void p9_fcall_fini(struct p9_fcall *fc);
struct p9_req_t *p9_tag_lookup(struct p9_client *, u16);
void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);

diff --git a/net/9p/client.c b/net/9p/client.c
index 88db45966740..ed78751aee7c 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -231,16 +231,20 @@ static int parse_opts(char *opts, struct p9_client *clnt)
return ret;
}

-static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
+static int p9_fcall_init(struct p9_fcall *fc, int alloc_msize)
{
- struct p9_fcall *fc;
- fc = kmalloc(sizeof(struct p9_fcall) + alloc_msize, GFP_NOFS);
- if (!fc)
- return NULL;
+ fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
+ if (!fc->sdata)
+ return -ENOMEM;
fc->capacity = alloc_msize;
- fc->sdata = (char *) fc + sizeof(struct p9_fcall);
- return fc;
+ return 0;
+}
+
+void p9_fcall_fini(struct p9_fcall *fc)
+{
+ kfree(fc->sdata);
}
+EXPORT_SYMBOL(p9_fcall_fini);

static struct kmem_cache *p9_req_cache;

@@ -263,13 +267,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
if (!req)
return NULL;

- req->tc = p9_fcall_alloc(alloc_msize);
- req->rc = p9_fcall_alloc(alloc_msize);
- if (!req->tc || !req->rc)
+ if (p9_fcall_init(&req->tc, alloc_msize))
+ goto free_req;
+ if (p9_fcall_init(&req->rc, alloc_msize))
goto free;

- p9pdu_reset(req->tc);
- p9pdu_reset(req->rc);
+ p9pdu_reset(&req->tc);
+ p9pdu_reset(&req->rc);
req->status = REQ_STATUS_ALLOC;
init_waitqueue_head(&req->wq);
INIT_LIST_HEAD(&req->req_list);
@@ -281,7 +285,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
GFP_NOWAIT);
else
tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
- req->tc->tag = tag;
+ req->tc.tag = tag;
spin_unlock_irq(&c->lock);
idr_preload_end();
if (tag < 0)
@@ -290,8 +294,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
return req;

free:
- kfree(req->tc);
- kfree(req->rc);
+ p9_fcall_fini(&req->tc);
+ p9_fcall_fini(&req->rc);
+free_req:
kmem_cache_free(p9_req_cache, req);
return ERR_PTR(-ENOMEM);
}
@@ -329,14 +334,14 @@ EXPORT_SYMBOL(p9_tag_lookup);
static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
{
unsigned long flags;
- u16 tag = r->tc->tag;
+ u16 tag = r->tc.tag;

p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
spin_lock_irqsave(&c->lock, flags);
idr_remove(&c->reqs, tag);
spin_unlock_irqrestore(&c->lock, flags);
- kfree(r->tc);
- kfree(r->rc);
+ p9_fcall_fini(&r->tc);
+ p9_fcall_fini(&r->rc);
kmem_cache_free(p9_req_cache, r);
}

@@ -368,7 +373,7 @@ static void p9_tag_cleanup(struct p9_client *c)
*/
void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
{
- p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
+ p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc.tag);

/*
* This barrier is needed to make sure any change made to req before
@@ -378,7 +383,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
req->status = status;

wake_up(&req->wq);
- p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
+ p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
}
EXPORT_SYMBOL(p9_client_cb);

@@ -449,18 +454,18 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
int err;
int ecode;

- err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
- if (req->rc->size >= c->msize) {
+ err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
+ if (req->rc.size >= c->msize) {
p9_debug(P9_DEBUG_ERROR,
"requested packet size too big: %d\n",
- req->rc->size);
+ req->rc.size);
return -EIO;
}
/*
* dump the response from server
* This should be after check errors which poplulate pdu_fcall.
*/
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
if (err) {
p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
return err;
@@ -470,7 +475,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)

if (!p9_is_proto_dotl(c)) {
char *ename;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
&ename, &ecode);
if (err)
goto out_err;
@@ -486,7 +491,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
}
kfree(ename);
} else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
err = -ecode;

p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
@@ -520,12 +525,12 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
int8_t type;
char *ename = NULL;

- err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
+ err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
/*
* dump the response from server
* This should be after parse_header which poplulate pdu_fcall.
*/
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
if (err) {
p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
return err;
@@ -540,13 +545,13 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
/* 7 = header size for RERROR; */
int inline_len = in_hdrlen - 7;

- len = req->rc->size - req->rc->offset;
+ len = req->rc.size - req->rc.offset;
if (len > (P9_ZC_HDR_SZ - 7)) {
err = -EFAULT;
goto out_err;
}

- ename = &req->rc->sdata[req->rc->offset];
+ ename = &req->rc.sdata[req->rc.offset];
if (len > inline_len) {
/* We have error in external buffer */
if (!copy_from_iter_full(ename + inline_len,
@@ -556,7 +561,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
}
}
ename = NULL;
- err = p9pdu_readf(req->rc, c->proto_version, "s?d",
+ err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
&ename, &ecode);
if (err)
goto out_err;
@@ -572,7 +577,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
}
kfree(ename);
} else {
- err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
+ err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
err = -ecode;

p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
@@ -605,7 +610,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
int16_t oldtag;
int err;

- err = p9_parse_header(oldreq->tc, NULL, NULL, &oldtag, 1);
+ err = p9_parse_header(&oldreq->tc, NULL, NULL, &oldtag, 1);
if (err)
return err;

@@ -649,12 +654,12 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
return req;

/* marshall the data */
- p9pdu_prepare(req->tc, req->tc->tag, type);
- err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
+ p9pdu_prepare(&req->tc, req->tc.tag, type);
+ err = p9pdu_vwritef(&req->tc, c->proto_version, fmt, ap);
if (err)
goto reterr;
- p9pdu_finalize(c, req->tc);
- trace_9p_client_req(c, type, req->tc->tag);
+ p9pdu_finalize(c, &req->tc);
+ trace_9p_client_req(c, type, req->tc.tag);
return req;
reterr:
p9_free_req(c, req);
@@ -739,7 +744,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
goto reterr;

err = p9_check_errors(c, req);
- trace_9p_client_res(c, type, req->rc->tag, err);
+ trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
@@ -821,7 +826,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
goto reterr;

err = p9_check_zc_errors(c, req, uidata, in_hdrlen);
- trace_9p_client_res(c, type, req->rc->tag, err);
+ trace_9p_client_res(c, type, req->rc.tag, err);
if (!err)
return req;
reterr:
@@ -904,10 +909,10 @@ static int p9_client_version(struct p9_client *c)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, c->proto_version, "ds", &msize, &version);
+ err = p9pdu_readf(&req->rc, c->proto_version, "ds", &msize, &version);
if (err) {
p9_debug(P9_DEBUG_9P, "version error %d\n", err);
- trace_9p_protocol_dump(c, req->rc);
+ trace_9p_protocol_dump(c, &req->rc);
goto error;
}

@@ -1056,9 +1061,9 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", &qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", &qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1113,9 +1118,9 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "R", &nwqids, &wqids);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "R", &nwqids, &wqids);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto clunk_fid;
}
@@ -1180,9 +1185,9 @@ int p9_client_open(struct p9_fid *fid, int mode)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1224,9 +1229,9 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1269,9 +1274,9 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1308,9 +1313,9 @@ int p9_client_symlink(struct p9_fid *dfid, const char *name,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}

@@ -1506,10 +1511,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
break;
}

- *err = p9pdu_readf(req->rc, clnt->proto_version,
+ *err = p9pdu_readf(&req->rc, clnt->proto_version,
"D", &count, &dataptr);
if (*err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
break;
}
@@ -1579,9 +1584,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
break;
}

- *err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
+ *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
if (*err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
break;
}
@@ -1623,9 +1628,9 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "wS", &ignored, ret);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "wS", &ignored, ret);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1676,9 +1681,9 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid,
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "A", ret);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "A", ret);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1828,11 +1833,11 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
- &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
- &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
+ &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
+ &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto error;
}
@@ -1936,9 +1941,9 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
err = PTR_ERR(req);
goto error;
}
- err = p9pdu_readf(req->rc, clnt->proto_version, "q", attr_size);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "q", attr_size);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
p9_free_req(clnt, req);
goto clunk_fid;
}
@@ -2024,9 +2029,9 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
goto error;
}

- err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &count, &dataptr);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto free_and_error;
}
if (rsize < count) {
@@ -2065,9 +2070,9 @@ int p9_client_mknod_dotl(struct p9_fid *fid, const char *name, int mode,
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RMKNOD qid %x.%llx.%x\n", qid->type,
@@ -2096,9 +2101,9 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RMKDIR qid %x.%llx.%x\n", qid->type,
@@ -2131,9 +2136,9 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "b", status);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "b", status);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RLOCK status %i\n", *status);
@@ -2162,11 +2167,11 @@ int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *glock)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "bqqds", &glock->type,
- &glock->start, &glock->length, &glock->proc_id,
- &glock->client_id);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "bqqds", &glock->type,
+ &glock->start, &glock->length, &glock->proc_id,
+ &glock->client_id);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RGETLOCK type %i start %lld length %lld "
@@ -2192,9 +2197,9 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
if (IS_ERR(req))
return PTR_ERR(req);

- err = p9pdu_readf(req->rc, clnt->proto_version, "s", target);
+ err = p9pdu_readf(&req->rc, clnt->proto_version, "s", target);
if (err) {
- trace_9p_protocol_dump(clnt, req->rc);
+ trace_9p_protocol_dump(clnt, &req->rc);
goto error;
}
p9_debug(P9_DEBUG_9P, "<<< RREADLINK target %s\n", *target);
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index e2ef3c782c53..20f46f13fe83 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -354,7 +354,7 @@ static void p9_read_work(struct work_struct *work)
goto error;
}

- if (m->req->rc == NULL) {
+ if (m->req->rc.sdata == NULL) {
p9_debug(P9_DEBUG_ERROR,
"No recv fcall for tag %d (req %p), disconnecting!\n",
m->rc.tag, m->req);
@@ -362,7 +362,7 @@ static void p9_read_work(struct work_struct *work)
err = -EIO;
goto error;
}
- m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
+ m->rc.sdata = m->req->rc.sdata;
memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
m->rc.capacity = m->rc.size;
}
@@ -372,7 +372,7 @@ static void p9_read_work(struct work_struct *work)
*/
if ((m->req) && (m->rc.offset == m->rc.capacity)) {
p9_debug(P9_DEBUG_TRANS, "got new packet\n");
- m->req->rc->size = m->rc.offset;
+ m->req->rc.size = m->rc.offset;
spin_lock(&m->client->lock);
if (m->req->status != REQ_STATUS_ERROR)
status = REQ_STATUS_RCVD;
@@ -469,8 +469,8 @@ static void p9_write_work(struct work_struct *work)
p9_debug(P9_DEBUG_TRANS, "move req %p\n", req);
list_move_tail(&req->req_list, &m->req_list);

- m->wbuf = req->tc->sdata;
- m->wsize = req->tc->size;
+ m->wbuf = req->tc.sdata;
+ m->wsize = req->tc.size;
m->wpos = 0;
spin_unlock(&m->client->lock);
}
@@ -663,7 +663,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
struct p9_conn *m = &ts->conn;

p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
- m, current, req->tc, req->tc->id);
+ m, current, &req->tc, req->tc.id);
if (m->err < 0)
return m->err;

diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
index 2ab4574183c9..c60655c90c9e 100644
--- a/net/9p/trans_rdma.c
+++ b/net/9p/trans_rdma.c
@@ -122,7 +122,7 @@ struct p9_rdma_context {
dma_addr_t busa;
union {
struct p9_req_t *req;
- struct p9_fcall *rc;
+ struct p9_fcall rc;
};
};

@@ -320,8 +320,8 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
if (wc->status != IB_WC_SUCCESS)
goto err_out;

- c->rc->size = wc->byte_len;
- err = p9_parse_header(c->rc, NULL, NULL, &tag, 1);
+ c->rc.size = wc->byte_len;
+ err = p9_parse_header(&c->rc, NULL, NULL, &tag, 1);
if (err)
goto err_out;

@@ -331,12 +331,13 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)

/* Check that we have not yet received a reply for this request.
*/
- if (unlikely(req->rc)) {
+ if (unlikely(req->rc.sdata)) {
pr_err("Duplicate reply for request %d", tag);
goto err_out;
}

- req->rc = c->rc;
+ req->rc.size = c->rc.size;
+ req->rc.sdata = c->rc.sdata;
p9_client_cb(client, req, REQ_STATUS_RCVD);

out:
@@ -361,7 +362,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
container_of(wc->wr_cqe, struct p9_rdma_context, cqe);

ib_dma_unmap_single(rdma->cm_id->device,
- c->busa, c->req->tc->size,
+ c->busa, c->req->tc.size,
DMA_TO_DEVICE);
up(&rdma->sq_sem);
kfree(c);
@@ -401,7 +402,7 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
struct ib_sge sge;

c->busa = ib_dma_map_single(rdma->cm_id->device,
- c->rc->sdata, client->msize,
+ c->rc.sdata, client->msize,
DMA_FROM_DEVICE);
if (ib_dma_mapping_error(rdma->cm_id->device, c->busa))
goto error;
@@ -443,9 +444,9 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
**/
if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
- /* Got one ! */
- kfree(req->rc);
- req->rc = NULL;
+ /* Got one! */
+ p9_fcall_fini(&req->rc);
+ req->rc.sdata = NULL;
goto dont_need_post_recv;
} else {
/* We raced and lost. */
@@ -459,7 +460,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
err = -ENOMEM;
goto recv_error;
}
- rpl_context->rc = req->rc;
+ rpl_context->rc.sdata = req->rc.sdata;

/*
* Post a receive buffer for this request. We need to ensure
@@ -479,7 +480,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
goto recv_error;
}
/* remove posted receive buffer from request structure */
- req->rc = NULL;
+ req->rc.sdata = NULL;

dont_need_post_recv:
/* Post the request */
@@ -491,7 +492,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
c->req = req;

c->busa = ib_dma_map_single(rdma->cm_id->device,
- c->req->tc->sdata, c->req->tc->size,
+ c->req->tc.sdata, c->req->tc.size,
DMA_TO_DEVICE);
if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
err = -EIO;
@@ -501,7 +502,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
c->cqe.done = send_done;

sge.addr = c->busa;
- sge.length = c->req->tc->size;
+ sge.length = c->req->tc.size;
sge.lkey = rdma->pd->local_dma_lkey;

wr.next = NULL;
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 7728b0acde09..3dd6ce1c0f2d 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -155,7 +155,7 @@ static void req_done(struct virtqueue *vq)
}

if (len) {
- req->rc->size = len;
+ req->rc.size = len;
p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
}
}
@@ -273,12 +273,12 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
out_sgs = in_sgs = 0;
/* Handle out VirtIO ring buffers */
out = pack_sg_list(chan->sg, 0,
- VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
+ VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
if (out)
sgs[out_sgs++] = chan->sg;

in = pack_sg_list(chan->sg, out,
- VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
+ VIRTQUEUE_NUM, req->rc.sdata, req->rc.capacity);
if (in)
sgs[out_sgs + in_sgs++] = chan->sg + out;

@@ -416,15 +416,15 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
out_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
if (n != outlen) {
__le32 v = cpu_to_le32(n);
- memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
+ memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
outlen = n;
}
/* The size field of the message must include the length of the
* header and the length of the data. We didn't actually know
* the length of the data until this point so add it in now.
*/
- sz = cpu_to_le32(req->tc->size + outlen);
- memcpy(&req->tc->sdata[0], &sz, sizeof(sz));
+ sz = cpu_to_le32(req->tc.size + outlen);
+ memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
} else if (uidata) {
int n = p9_get_mapped_pages(chan, &in_pages, uidata,
inlen, &offs, &need_drop);
@@ -433,7 +433,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
in_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
if (n != inlen) {
__le32 v = cpu_to_le32(n);
- memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
+ memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
inlen = n;
}
}
@@ -445,7 +445,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,

/* out data */
out = pack_sg_list(chan->sg, 0,
- VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
+ VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);

if (out)
sgs[out_sgs++] = chan->sg;
@@ -464,7 +464,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
* alloced memory and payload onto the user buffer.
*/
in = pack_sg_list(chan->sg, out,
- VIRTQUEUE_NUM, req->rc->sdata, in_hdr_len);
+ VIRTQUEUE_NUM, req->rc.sdata, in_hdr_len);
if (in)
sgs[out_sgs + in_sgs++] = chan->sg + out;

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index c2d54ac76bfd..1a5b38892eb4 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -141,7 +141,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
struct xen_9pfs_front_priv *priv = NULL;
RING_IDX cons, prod, masked_cons, masked_prod;
unsigned long flags;
- u32 size = p9_req->tc->size;
+ u32 size = p9_req->tc.size;
struct xen_9pfs_dataring *ring;
int num;

@@ -154,7 +154,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
if (!priv || priv->client != client)
return -EINVAL;

- num = p9_req->tc->tag % priv->num_rings;
+ num = p9_req->tc.tag % priv->num_rings;
ring = &priv->rings[num];

again:
@@ -176,7 +176,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);

- xen_9pfs_write_packet(ring->data.out, p9_req->tc->sdata, size,
+ xen_9pfs_write_packet(ring->data.out, p9_req->tc.sdata, size,
&masked_prod, masked_cons, XEN_9PFS_RING_SIZE);

p9_req->status = REQ_STATUS_SENT;
@@ -229,12 +229,12 @@ static void p9_xen_response(struct work_struct *work)
continue;
}

- memcpy(req->rc, &h, sizeof(h));
- req->rc->offset = 0;
+ memcpy(&req->rc, &h, sizeof(h));
+ req->rc.offset = 0;

masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
/* Then, read the whole packet (including the header) */
- xen_9pfs_read_packet(req->rc->sdata, ring->data.in, h.size,
+ xen_9pfs_read_packet(req->rc.sdata, ring->data.in, h.size,
masked_prod, &masked_cons,
XEN_9PFS_RING_SIZE);

--
2.17.1
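
The patch above moves the p9_fcall header into p9_req_t and allocates only the msize payload, so a power-of-two msize no longer spills into the next allocation size. A minimal userspace sketch of that idea follows. The names struct fcall, struct req, size_class(), fcall_init(), fcall_fini(), req_alloc() and req_free() are hypothetical stand-ins, not the kernel API, and the power-of-two rounding in size_class() is an assumed, simplified allocator model rather than the exact behaviour of kmalloc():

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct p9_fcall. */
struct fcall {
	size_t size;
	size_t offset;
	size_t capacity;
	char *sdata;
};

/* Hypothetical stand-in for struct p9_req_t with both fcalls embedded. */
struct req {
	int status;
	struct fcall tc;
	struct fcall rc;
};

/* Simplified allocator model (an assumption): round up to a power of two. */
static size_t size_class(size_t n)
{
	size_t c = 1;

	while (c < n)
		c <<= 1;
	return c;
}

/* Allocate only the payload; the header lives inside struct req. */
static int fcall_init(struct fcall *fc, size_t msize)
{
	fc->sdata = malloc(msize);
	if (!fc->sdata)
		return -1;
	fc->size = 0;
	fc->offset = 0;
	fc->capacity = msize;
	return 0;
}

/* Counterpart to fcall_init(): release the payload buffer. */
static void fcall_fini(struct fcall *fc)
{
	free(fc->sdata);
	fc->sdata = NULL;
}

/* Two labels so a failed first init never touches an uninitialized buffer. */
static struct req *req_alloc(size_t msize)
{
	struct req *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	if (fcall_init(&r->tc, msize))
		goto free_req;
	if (fcall_init(&r->rc, msize))
		goto free_tc;
	return r;

free_tc:
	fcall_fini(&r->tc);
free_req:
	free(r);
	return NULL;
}

static void req_free(struct req *r)
{
	fcall_fini(&r->tc);
	fcall_fini(&r->rc);
	free(r);
}
```

Under this model, with msize = 8192, size_class(sizeof(struct fcall) + 8192) is 16384 while size_class(8192) stays at 8192, which is the rounding overhead the commit message refers to.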


2018-08-10 00:53:08

by piaojun

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] net/9p: embed fcall in req to round down buffer allocs

LGTM

On 2018/8/9 22:33, Dominique Martinet wrote:
> From: Dominique Martinet <[email protected]>
>
> 'msize' is often a power of two, or at least page-aligned, so avoiding
> an overhead of two dozen bytes for each allocation will help the
> allocator do its work and reduce memory fragmentation.
>
> Suggested-by: Matthew Wilcox <[email protected]>
> Signed-off-by: Dominique Martinet <[email protected]>
> Reviewed-by: Greg Kurz <[email protected]>
Acked-by: Jun Piao <[email protected]>

> Cc: Matthew Wilcox <[email protected]>
> Cc: Jun Piao <[email protected]>
> ---
> v2:
> - Add extra label to not free uninitialized memory on alloc failure
> - Rename p9_fcall_alloc to p9_fcall_init
> - Add a p9_fcall_fini function to echo to init
>
> v3:
> - use p9_fcall_fini() in rdma_request() instead of kfree(); this change
> was in the following patch in the previous version
>
> include/net/9p/client.h | 5 +-
> net/9p/client.c | 167 +++++++++++++++++++++-------------------
> net/9p/trans_fd.c | 12 +--
> net/9p/trans_rdma.c | 29 +++----
> net/9p/trans_virtio.c | 18 ++---
> net/9p/trans_xen.c | 12 +--
> 6 files changed, 125 insertions(+), 118 deletions(-)
>
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index a4dc42c53d18..c2671d40bb6b 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -95,8 +95,8 @@ struct p9_req_t {
> int status;
> int t_err;
> wait_queue_head_t wq;
> - struct p9_fcall *tc;
> - struct p9_fcall *rc;
> + struct p9_fcall tc;
> + struct p9_fcall rc;
> void *aux;
> struct list_head req_list;
> };
> @@ -230,6 +230,7 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
> kgid_t gid, struct p9_qid *);
> int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status);
> int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *fl);
> +void p9_fcall_fini(struct p9_fcall *fc);
> struct p9_req_t *p9_tag_lookup(struct p9_client *, u16);
> void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status);
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 88db45966740..ed78751aee7c 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -231,16 +231,20 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> return ret;
> }
>
> -static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
> +static int p9_fcall_init(struct p9_fcall *fc, int alloc_msize)
> {
> - struct p9_fcall *fc;
> - fc = kmalloc(sizeof(struct p9_fcall) + alloc_msize, GFP_NOFS);
> - if (!fc)
> - return NULL;
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (!fc->sdata)
> + return -ENOMEM;
> fc->capacity = alloc_msize;
> - fc->sdata = (char *) fc + sizeof(struct p9_fcall);
> - return fc;
> + return 0;
> +}
> +
> +void p9_fcall_fini(struct p9_fcall *fc)
> +{
> + kfree(fc->sdata);
> }
> +EXPORT_SYMBOL(p9_fcall_fini);
>
> static struct kmem_cache *p9_req_cache;
>
> @@ -263,13 +267,13 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> if (!req)
> return NULL;
>
> - req->tc = p9_fcall_alloc(alloc_msize);
> - req->rc = p9_fcall_alloc(alloc_msize);
> - if (!req->tc || !req->rc)
> + if (p9_fcall_init(&req->tc, alloc_msize))
> + goto free_req;
> + if (p9_fcall_init(&req->rc, alloc_msize))
> goto free;
>
> - p9pdu_reset(req->tc);
> - p9pdu_reset(req->rc);
> + p9pdu_reset(&req->tc);
> + p9pdu_reset(&req->rc);
> req->status = REQ_STATUS_ALLOC;
> init_waitqueue_head(&req->wq);
> INIT_LIST_HEAD(&req->req_list);
> @@ -281,7 +285,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> GFP_NOWAIT);
> else
> tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
> - req->tc->tag = tag;
> + req->tc.tag = tag;
> spin_unlock_irq(&c->lock);
> idr_preload_end();
> if (tag < 0)
> @@ -290,8 +294,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> return req;
>
> free:
> - kfree(req->tc);
> - kfree(req->rc);
> + p9_fcall_fini(&req->tc);
> + p9_fcall_fini(&req->rc);
> +free_req:
> kmem_cache_free(p9_req_cache, req);
> return ERR_PTR(-ENOMEM);
> }
> @@ -329,14 +334,14 @@ EXPORT_SYMBOL(p9_tag_lookup);
> static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
> {
> unsigned long flags;
> - u16 tag = r->tc->tag;
> + u16 tag = r->tc.tag;
>
> p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
> spin_lock_irqsave(&c->lock, flags);
> idr_remove(&c->reqs, tag);
> spin_unlock_irqrestore(&c->lock, flags);
> - kfree(r->tc);
> - kfree(r->rc);
> + p9_fcall_fini(&r->tc);
> + p9_fcall_fini(&r->rc);
> kmem_cache_free(p9_req_cache, r);
> }
>
> @@ -368,7 +373,7 @@ static void p9_tag_cleanup(struct p9_client *c)
> */
> void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> {
> - p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
> + p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc.tag);
>
> /*
> * This barrier is needed to make sure any change made to req before
> @@ -378,7 +383,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> req->status = status;
>
> wake_up(&req->wq);
> - p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
> + p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
> }
> EXPORT_SYMBOL(p9_client_cb);
>
> @@ -449,18 +454,18 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
> int err;
> int ecode;
>
> - err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
> - if (req->rc->size >= c->msize) {
> + err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
> + if (req->rc.size >= c->msize) {
> p9_debug(P9_DEBUG_ERROR,
> "requested packet size too big: %d\n",
> - req->rc->size);
> + req->rc.size);
> return -EIO;
> }
> /*
> * dump the response from server
> * This should be after check errors which poplulate pdu_fcall.
> */
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> if (err) {
> p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
> return err;
> @@ -470,7 +475,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
>
> if (!p9_is_proto_dotl(c)) {
> char *ename;
> - err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> + err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
> &ename, &ecode);
> if (err)
> goto out_err;
> @@ -486,7 +491,7 @@ static int p9_check_errors(struct p9_client *c, struct p9_req_t *req)
> }
> kfree(ename);
> } else {
> - err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
> + err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
> err = -ecode;
>
> p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
> @@ -520,12 +525,12 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> int8_t type;
> char *ename = NULL;
>
> - err = p9_parse_header(req->rc, NULL, &type, NULL, 0);
> + err = p9_parse_header(&req->rc, NULL, &type, NULL, 0);
> /*
> * dump the response from server
> * This should be after parse_header which poplulate pdu_fcall.
> */
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> if (err) {
> p9_debug(P9_DEBUG_ERROR, "couldn't parse header %d\n", err);
> return err;
> @@ -540,13 +545,13 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> /* 7 = header size for RERROR; */
> int inline_len = in_hdrlen - 7;
>
> - len = req->rc->size - req->rc->offset;
> + len = req->rc.size - req->rc.offset;
> if (len > (P9_ZC_HDR_SZ - 7)) {
> err = -EFAULT;
> goto out_err;
> }
>
> - ename = &req->rc->sdata[req->rc->offset];
> + ename = &req->rc.sdata[req->rc.offset];
> if (len > inline_len) {
> /* We have error in external buffer */
> if (!copy_from_iter_full(ename + inline_len,
> @@ -556,7 +561,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> }
> }
> ename = NULL;
> - err = p9pdu_readf(req->rc, c->proto_version, "s?d",
> + err = p9pdu_readf(&req->rc, c->proto_version, "s?d",
> &ename, &ecode);
> if (err)
> goto out_err;
> @@ -572,7 +577,7 @@ static int p9_check_zc_errors(struct p9_client *c, struct p9_req_t *req,
> }
> kfree(ename);
> } else {
> - err = p9pdu_readf(req->rc, c->proto_version, "d", &ecode);
> + err = p9pdu_readf(&req->rc, c->proto_version, "d", &ecode);
> err = -ecode;
>
> p9_debug(P9_DEBUG_9P, "<<< RLERROR (%d)\n", -ecode);
> @@ -605,7 +610,7 @@ static int p9_client_flush(struct p9_client *c, struct p9_req_t *oldreq)
> int16_t oldtag;
> int err;
>
> - err = p9_parse_header(oldreq->tc, NULL, NULL, &oldtag, 1);
> + err = p9_parse_header(&oldreq->tc, NULL, NULL, &oldtag, 1);
> if (err)
> return err;
>
> @@ -649,12 +654,12 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
> return req;
>
> /* marshall the data */
> - p9pdu_prepare(req->tc, req->tc->tag, type);
> - err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
> + p9pdu_prepare(&req->tc, req->tc.tag, type);
> + err = p9pdu_vwritef(&req->tc, c->proto_version, fmt, ap);
> if (err)
> goto reterr;
> - p9pdu_finalize(c, req->tc);
> - trace_9p_client_req(c, type, req->tc->tag);
> + p9pdu_finalize(c, &req->tc);
> + trace_9p_client_req(c, type, req->tc.tag);
> return req;
> reterr:
> p9_free_req(c, req);
> @@ -739,7 +744,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
> goto reterr;
>
> err = p9_check_errors(c, req);
> - trace_9p_client_res(c, type, req->rc->tag, err);
> + trace_9p_client_res(c, type, req->rc.tag, err);
> if (!err)
> return req;
> reterr:
> @@ -821,7 +826,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
> goto reterr;
>
> err = p9_check_zc_errors(c, req, uidata, in_hdrlen);
> - trace_9p_client_res(c, type, req->rc->tag, err);
> + trace_9p_client_res(c, type, req->rc.tag, err);
> if (!err)
> return req;
> reterr:
> @@ -904,10 +909,10 @@ static int p9_client_version(struct p9_client *c)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, c->proto_version, "ds", &msize, &version);
> + err = p9pdu_readf(&req->rc, c->proto_version, "ds", &msize, &version);
> if (err) {
> p9_debug(P9_DEBUG_9P, "version error %d\n", err);
> - trace_9p_protocol_dump(c, req->rc);
> + trace_9p_protocol_dump(c, &req->rc);
> goto error;
> }
>
> @@ -1056,9 +1061,9 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", &qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", &qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1113,9 +1118,9 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "R", &nwqids, &wqids);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "R", &nwqids, &wqids);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto clunk_fid;
> }
> @@ -1180,9 +1185,9 @@ int p9_client_open(struct p9_fid *fid, int mode)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1224,9 +1229,9 @@ int p9_client_create_dotl(struct p9_fid *ofid, const char *name, u32 flags, u32
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1269,9 +1274,9 @@ int p9_client_fcreate(struct p9_fid *fid, const char *name, u32 perm, int mode,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Qd", &qid, &iounit);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1308,9 +1313,9 @@ int p9_client_symlink(struct p9_fid *dfid, const char *name,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
>
> @@ -1506,10 +1511,10 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err)
> break;
> }
>
> - *err = p9pdu_readf(req->rc, clnt->proto_version,
> + *err = p9pdu_readf(&req->rc, clnt->proto_version,
> "D", &count, &dataptr);
> if (*err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> break;
> }
> @@ -1579,9 +1584,9 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
> break;
> }
>
> - *err = p9pdu_readf(req->rc, clnt->proto_version, "d", &count);
> + *err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &count);
> if (*err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> break;
> }
> @@ -1623,9 +1628,9 @@ struct p9_wstat *p9_client_stat(struct p9_fid *fid)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "wS", &ignored, ret);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "wS", &ignored, ret);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1676,9 +1681,9 @@ struct p9_stat_dotl *p9_client_getattr_dotl(struct p9_fid *fid,
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "A", ret);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "A", ret);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1828,11 +1833,11 @@ int p9_client_statfs(struct p9_fid *fid, struct p9_rstatfs *sb)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
> - &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
> - &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "ddqqqqqqd", &sb->type,
> + &sb->bsize, &sb->blocks, &sb->bfree, &sb->bavail,
> + &sb->files, &sb->ffree, &sb->fsid, &sb->namelen);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto error;
> }
> @@ -1936,9 +1941,9 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
> err = PTR_ERR(req);
> goto error;
> }
> - err = p9pdu_readf(req->rc, clnt->proto_version, "q", attr_size);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "q", attr_size);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> p9_free_req(clnt, req);
> goto clunk_fid;
> }
> @@ -2024,9 +2029,9 @@ int p9_client_readdir(struct p9_fid *fid, char *data, u32 count, u64 offset)
> goto error;
> }
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "D", &count, &dataptr);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &count, &dataptr);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto free_and_error;
> }
> if (rsize < count) {
> @@ -2065,9 +2070,9 @@ int p9_client_mknod_dotl(struct p9_fid *fid, const char *name, int mode,
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RMKNOD qid %x.%llx.%x\n", qid->type,
> @@ -2096,9 +2101,9 @@ int p9_client_mkdir_dotl(struct p9_fid *fid, const char *name, int mode,
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "Q", qid);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "Q", qid);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RMKDIR qid %x.%llx.%x\n", qid->type,
> @@ -2131,9 +2136,9 @@ int p9_client_lock_dotl(struct p9_fid *fid, struct p9_flock *flock, u8 *status)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "b", status);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "b", status);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RLOCK status %i\n", *status);
> @@ -2162,11 +2167,11 @@ int p9_client_getlock_dotl(struct p9_fid *fid, struct p9_getlock *glock)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "bqqds", &glock->type,
> - &glock->start, &glock->length, &glock->proc_id,
> - &glock->client_id);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "bqqds", &glock->type,
> + &glock->start, &glock->length, &glock->proc_id,
> + &glock->client_id);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RGETLOCK type %i start %lld length %lld "
> @@ -2192,9 +2197,9 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
> if (IS_ERR(req))
> return PTR_ERR(req);
>
> - err = p9pdu_readf(req->rc, clnt->proto_version, "s", target);
> + err = p9pdu_readf(&req->rc, clnt->proto_version, "s", target);
> if (err) {
> - trace_9p_protocol_dump(clnt, req->rc);
> + trace_9p_protocol_dump(clnt, &req->rc);
> goto error;
> }
> p9_debug(P9_DEBUG_9P, "<<< RREADLINK target %s\n", *target);
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index e2ef3c782c53..20f46f13fe83 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -354,7 +354,7 @@ static void p9_read_work(struct work_struct *work)
> goto error;
> }
>
> - if (m->req->rc == NULL) {
> + if (m->req->rc.sdata == NULL) {
> p9_debug(P9_DEBUG_ERROR,
> "No recv fcall for tag %d (req %p), disconnecting!\n",
> m->rc.tag, m->req);
> @@ -362,7 +362,7 @@ static void p9_read_work(struct work_struct *work)
> err = -EIO;
> goto error;
> }
> - m->rc.sdata = (char *)m->req->rc + sizeof(struct p9_fcall);
> + m->rc.sdata = m->req->rc.sdata;
> memcpy(m->rc.sdata, m->tmp_buf, m->rc.capacity);
> m->rc.capacity = m->rc.size;
> }
> @@ -372,7 +372,7 @@ static void p9_read_work(struct work_struct *work)
> */
> if ((m->req) && (m->rc.offset == m->rc.capacity)) {
> p9_debug(P9_DEBUG_TRANS, "got new packet\n");
> - m->req->rc->size = m->rc.offset;
> + m->req->rc.size = m->rc.offset;
> spin_lock(&m->client->lock);
> if (m->req->status != REQ_STATUS_ERROR)
> status = REQ_STATUS_RCVD;
> @@ -469,8 +469,8 @@ static void p9_write_work(struct work_struct *work)
> p9_debug(P9_DEBUG_TRANS, "move req %p\n", req);
> list_move_tail(&req->req_list, &m->req_list);
>
> - m->wbuf = req->tc->sdata;
> - m->wsize = req->tc->size;
> + m->wbuf = req->tc.sdata;
> + m->wsize = req->tc.size;
> m->wpos = 0;
> spin_unlock(&m->client->lock);
> }
> @@ -663,7 +663,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
> struct p9_conn *m = &ts->conn;
>
> p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
> - m, current, req->tc, req->tc->id);
> + m, current, &req->tc, req->tc.id);
> if (m->err < 0)
> return m->err;
>
> diff --git a/net/9p/trans_rdma.c b/net/9p/trans_rdma.c
> index 2ab4574183c9..c60655c90c9e 100644
> --- a/net/9p/trans_rdma.c
> +++ b/net/9p/trans_rdma.c
> @@ -122,7 +122,7 @@ struct p9_rdma_context {
> dma_addr_t busa;
> union {
> struct p9_req_t *req;
> - struct p9_fcall *rc;
> + struct p9_fcall rc;
> };
> };
>
> @@ -320,8 +320,8 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
> if (wc->status != IB_WC_SUCCESS)
> goto err_out;
>
> - c->rc->size = wc->byte_len;
> - err = p9_parse_header(c->rc, NULL, NULL, &tag, 1);
> + c->rc.size = wc->byte_len;
> + err = p9_parse_header(&c->rc, NULL, NULL, &tag, 1);
> if (err)
> goto err_out;
>
> @@ -331,12 +331,13 @@ recv_done(struct ib_cq *cq, struct ib_wc *wc)
>
> /* Check that we have not yet received a reply for this request.
> */
> - if (unlikely(req->rc)) {
> + if (unlikely(req->rc.sdata)) {
> pr_err("Duplicate reply for request %d", tag);
> goto err_out;
> }
>
> - req->rc = c->rc;
> + req->rc.size = c->rc.size;
> + req->rc.sdata = c->rc.sdata;
> p9_client_cb(client, req, REQ_STATUS_RCVD);
>
> out:
> @@ -361,7 +362,7 @@ send_done(struct ib_cq *cq, struct ib_wc *wc)
> container_of(wc->wr_cqe, struct p9_rdma_context, cqe);
>
> ib_dma_unmap_single(rdma->cm_id->device,
> - c->busa, c->req->tc->size,
> + c->busa, c->req->tc.size,
> DMA_TO_DEVICE);
> up(&rdma->sq_sem);
> kfree(c);
> @@ -401,7 +402,7 @@ post_recv(struct p9_client *client, struct p9_rdma_context *c)
> struct ib_sge sge;
>
> c->busa = ib_dma_map_single(rdma->cm_id->device,
> - c->rc->sdata, client->msize,
> + c->rc.sdata, client->msize,
> DMA_FROM_DEVICE);
> if (ib_dma_mapping_error(rdma->cm_id->device, c->busa))
> goto error;
> @@ -443,9 +444,9 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> **/
> if (unlikely(atomic_read(&rdma->excess_rc) > 0)) {
> if ((atomic_sub_return(1, &rdma->excess_rc) >= 0)) {
> - /* Got one ! */
> - kfree(req->rc);
> - req->rc = NULL;
> + /* Got one! */
> + p9_fcall_fini(&req->rc);
> + req->rc.sdata = NULL;
> goto dont_need_post_recv;
> } else {
> /* We raced and lost. */
> @@ -459,7 +460,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> err = -ENOMEM;
> goto recv_error;
> }
> - rpl_context->rc = req->rc;
> + rpl_context->rc.sdata = req->rc.sdata;
>
> /*
> * Post a receive buffer for this request. We need to ensure
> @@ -479,7 +480,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> goto recv_error;
> }
> /* remove posted receive buffer from request structure */
> - req->rc = NULL;
> + req->rc.sdata = NULL;
>
> dont_need_post_recv:
> /* Post the request */
> @@ -491,7 +492,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> c->req = req;
>
> c->busa = ib_dma_map_single(rdma->cm_id->device,
> - c->req->tc->sdata, c->req->tc->size,
> + c->req->tc.sdata, c->req->tc.size,
> DMA_TO_DEVICE);
> if (ib_dma_mapping_error(rdma->cm_id->device, c->busa)) {
> err = -EIO;
> @@ -501,7 +502,7 @@ static int rdma_request(struct p9_client *client, struct p9_req_t *req)
> c->cqe.done = send_done;
>
> sge.addr = c->busa;
> - sge.length = c->req->tc->size;
> + sge.length = c->req->tc.size;
> sge.lkey = rdma->pd->local_dma_lkey;
>
> wr.next = NULL;
> diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
> index 7728b0acde09..3dd6ce1c0f2d 100644
> --- a/net/9p/trans_virtio.c
> +++ b/net/9p/trans_virtio.c
> @@ -155,7 +155,7 @@ static void req_done(struct virtqueue *vq)
> }
>
> if (len) {
> - req->rc->size = len;
> + req->rc.size = len;
> p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
> }
> }
> @@ -273,12 +273,12 @@ p9_virtio_request(struct p9_client *client, struct p9_req_t *req)
> out_sgs = in_sgs = 0;
> /* Handle out VirtIO ring buffers */
> out = pack_sg_list(chan->sg, 0,
> - VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
> + VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
> if (out)
> sgs[out_sgs++] = chan->sg;
>
> in = pack_sg_list(chan->sg, out,
> - VIRTQUEUE_NUM, req->rc->sdata, req->rc->capacity);
> + VIRTQUEUE_NUM, req->rc.sdata, req->rc.capacity);
> if (in)
> sgs[out_sgs + in_sgs++] = chan->sg + out;
>
> @@ -416,15 +416,15 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> out_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
> if (n != outlen) {
> __le32 v = cpu_to_le32(n);
> - memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
> + memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
> outlen = n;
> }
> /* The size field of the message must include the length of the
> * header and the length of the data. We didn't actually know
> * the length of the data until this point so add it in now.
> */
> - sz = cpu_to_le32(req->tc->size + outlen);
> - memcpy(&req->tc->sdata[0], &sz, sizeof(sz));
> + sz = cpu_to_le32(req->tc.size + outlen);
> + memcpy(&req->tc.sdata[0], &sz, sizeof(sz));
> } else if (uidata) {
> int n = p9_get_mapped_pages(chan, &in_pages, uidata,
> inlen, &offs, &need_drop);
> @@ -433,7 +433,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> in_nr_pages = DIV_ROUND_UP(n + offs, PAGE_SIZE);
> if (n != inlen) {
> __le32 v = cpu_to_le32(n);
> - memcpy(&req->tc->sdata[req->tc->size - 4], &v, 4);
> + memcpy(&req->tc.sdata[req->tc.size - 4], &v, 4);
> inlen = n;
> }
> }
> @@ -445,7 +445,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
>
> /* out data */
> out = pack_sg_list(chan->sg, 0,
> - VIRTQUEUE_NUM, req->tc->sdata, req->tc->size);
> + VIRTQUEUE_NUM, req->tc.sdata, req->tc.size);
>
> if (out)
> sgs[out_sgs++] = chan->sg;
> @@ -464,7 +464,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
> * alloced memory and payload onto the user buffer.
> */
> in = pack_sg_list(chan->sg, out,
> - VIRTQUEUE_NUM, req->rc->sdata, in_hdr_len);
> + VIRTQUEUE_NUM, req->rc.sdata, in_hdr_len);
> if (in)
> sgs[out_sgs + in_sgs++] = chan->sg + out;
>
> diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
> index c2d54ac76bfd..1a5b38892eb4 100644
> --- a/net/9p/trans_xen.c
> +++ b/net/9p/trans_xen.c
> @@ -141,7 +141,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> struct xen_9pfs_front_priv *priv = NULL;
> RING_IDX cons, prod, masked_cons, masked_prod;
> unsigned long flags;
> - u32 size = p9_req->tc->size;
> + u32 size = p9_req->tc.size;
> struct xen_9pfs_dataring *ring;
> int num;
>
> @@ -154,7 +154,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> if (!priv || priv->client != client)
> return -EINVAL;
>
> - num = p9_req->tc->tag % priv->num_rings;
> + num = p9_req->tc.tag % priv->num_rings;
> ring = &priv->rings[num];
>
> again:
> @@ -176,7 +176,7 @@ static int p9_xen_request(struct p9_client *client, struct p9_req_t *p9_req)
> masked_prod = xen_9pfs_mask(prod, XEN_9PFS_RING_SIZE);
> masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
>
> - xen_9pfs_write_packet(ring->data.out, p9_req->tc->sdata, size,
> + xen_9pfs_write_packet(ring->data.out, p9_req->tc.sdata, size,
> &masked_prod, masked_cons, XEN_9PFS_RING_SIZE);
>
> p9_req->status = REQ_STATUS_SENT;
> @@ -229,12 +229,12 @@ static void p9_xen_response(struct work_struct *work)
> continue;
> }
>
> - memcpy(req->rc, &h, sizeof(h));
> - req->rc->offset = 0;
> + memcpy(&req->rc, &h, sizeof(h));
> + req->rc.offset = 0;
>
> masked_cons = xen_9pfs_mask(cons, XEN_9PFS_RING_SIZE);
> /* Then, read the whole packet (including the header) */
> - xen_9pfs_read_packet(req->rc->sdata, ring->data.in, h.size,
> + xen_9pfs_read_packet(req->rc.sdata, ring->data.in, h.size,
> masked_prod, &masked_cons,
> XEN_9PFS_RING_SIZE);
>
>

2018-08-10 01:31:50

by piaojun

Subject: Re: [PATCH v3 2/2] net/9p: add a per-client fcall kmem_cache

Hi Dominique,

Could you paste the test results from before and after applying this patch
in the comment? Please also see my comments below.

On 2018/8/9 22:33, Dominique Martinet wrote:
> From: Dominique Martinet <[email protected]>
>
> Having a specific cache for the fcall allocations helps speed up
> allocations a bit, especially in case of non-"round" msizes.
>
> The caches will automatically be merged if there are multiple caches
> of items with the same size so we do not need to try to share a cache
> between different clients of the same size.
>
> Since the msize is negotiated with the server, only allocate the cache
> after that negotiation has happened - previous allocations or
> allocations of different sizes (e.g. zero-copy fcall) are made with
> kmalloc directly.
>
> Signed-off-by: Dominique Martinet <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Greg Kurz <[email protected]>
> Cc: Jun Piao <[email protected]>
> ---
> v2:
> - Add a pointer to the cache in p9_fcall to make sure a buffer
> allocated with kmalloc gets freed with kfree and vice-versa
> This could have been smaller with a bool but this spares having
> to look at the client so looked a bit cleaner, I'm expecting this
> patch will need a v3 one way or another so I went for the bolder
> approach - please say if you think a smaller item is better ; I *think*
> nothing relies on this being ordered the same way as the data on the
> wire (struct isn't packed anyway) so we can move id after tag and add
> another u8 to not have any overhead
>
> - added likely() to cache existence check in allocation, but nothing
> for msize check or free because of zc request being of different size
>
> v3:
> - defined the userdata region in the cache, as read or write calls can
> access the buffer directly (lead to warnings with HARDENED_USERCOPY=y)
>
>
>
> I didn't get any comment on v2 for this patch, but I'm still not fully
> happy with this in all honesty.
>
> Part of me believes we might be better off always allocating from the
> cache even for zero-copy headers, it's a waste of space but the buffers
> are there and being reused constantly for non-zc calls, and mixing
> kmallocs in could lead to these buffers being really freed and
> reallocated all the time instead of reusing the slab buffers
> continuously.
>
> That would let me move the likely() for the fast path, with the only
> exception being the TVERSION initial call on mount, for which I still
> don't have any nice idea on how to handle, using a different size in
> p9_client_rpc or prepare_req if the type is TVERSION and I'm really not
> happy with that...
>
>
> include/net/9p/9p.h | 4 ++++
> include/net/9p/client.h | 1 +
> net/9p/client.c | 40 +++++++++++++++++++++++++++++++++++-----
> 3 files changed, 40 insertions(+), 5 deletions(-)
>
> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
> index e23896116d9a..645266b74652 100644
> --- a/include/net/9p/9p.h
> +++ b/include/net/9p/9p.h
> @@ -336,6 +336,9 @@ enum p9_qid_t {
> #define P9_NOFID (u32)(~0)
> #define P9_MAXWELEM 16
>
> +/* Minimal header size: len + id + tag */

Here should be 'size + id + tag'.

> +#define P9_HDRSZ 7
> +
> /* ample room for Twrite/Rread header */
> #define P9_IOHDRSZ 24
>
> @@ -558,6 +561,7 @@ struct p9_fcall {
> size_t offset;
> size_t capacity;
>
> + struct kmem_cache *cache;
> u8 *sdata;
> };
>
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index c2671d40bb6b..735f3979d559 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -123,6 +123,7 @@ struct p9_client {
> struct p9_trans_module *trans_mod;
> enum p9_trans_status status;
> void *trans;
> + struct kmem_cache *fcall_cache;
>
> union {
> struct {
> diff --git a/net/9p/client.c b/net/9p/client.c
> index ed78751aee7c..6c57ab1294d7 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -192,6 +192,9 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> goto free_and_return;
> }
>
> + if (clnt->trans_mod) {
> + pr_warn("Had trans %s\n", clnt->trans_mod->name);
> + }
> v9fs_put_trans(clnt->trans_mod);
> clnt->trans_mod = v9fs_get_trans_by_name(s);
> if (clnt->trans_mod == NULL) {
> @@ -231,9 +234,16 @@ static int parse_opts(char *opts, struct p9_client *clnt)
> return ret;
> }
>
> -static int p9_fcall_init(struct p9_fcall *fc, int alloc_msize)
> +static int p9_fcall_init(struct p9_client *c, struct p9_fcall *fc,
> + int alloc_msize)
> {
> - fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + if (likely(c->fcall_cache) && alloc_msize == c->msize) {
> + fc->sdata = kmem_cache_alloc(c->fcall_cache, GFP_NOFS);
> + fc->cache = c->fcall_cache;
> + } else {
> + fc->sdata = kmalloc(alloc_msize, GFP_NOFS);
> + fc->cache = NULL;
> + }
> if (!fc->sdata)
> return -ENOMEM;
> fc->capacity = alloc_msize;
> @@ -242,7 +252,16 @@ static int p9_fcall_init(struct p9_fcall *fc, int alloc_msize)
>
> void p9_fcall_fini(struct p9_fcall *fc)
> {
> - kfree(fc->sdata);
> + /* sdata can be NULL for interrupted requests in trans_rdma,
> + * and kmem_cache_free does not do NULL-check for us
> + */
> + if (unlikely(!fc->sdata))
> + return;
> +
> + if (fc->cache)
> + kmem_cache_free(fc->cache, fc->sdata);
> + else
> + kfree(fc->sdata);
> }
> EXPORT_SYMBOL(p9_fcall_fini);
>
> @@ -267,9 +286,9 @@ p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> if (!req)
> return NULL;
>
> - if (p9_fcall_init(&req->tc, alloc_msize))
> + if (p9_fcall_init(c, &req->tc, alloc_msize))
> goto free_req;
> - if (p9_fcall_init(&req->rc, alloc_msize))
> + if (p9_fcall_init(c, &req->rc, alloc_msize))
> goto free;
>
> p9pdu_reset(&req->tc);
> @@ -951,6 +970,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
>
> clnt->trans_mod = NULL;
> clnt->trans = NULL;
> + clnt->fcall_cache = NULL;
>
> client_id = utsname()->nodename;
> memcpy(clnt->name, client_id, strlen(client_id) + 1);
> @@ -987,6 +1007,15 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> if (err)
> goto close_trans;
>
> + /* P9_HDRSZ + 4 is the smallest packet header we can have that is
> + * followed by data accessed from userspace by read
> + */
> + clnt->fcall_cache =
> + kmem_cache_create_usercopy("9p-fcall-cache", clnt->msize,
> + 0, 0, P9_HDRSZ + 4,
> + clnt->msize - (P9_HDRSZ + 4),
> + NULL);
> +
> return clnt;
>
> close_trans:
> @@ -1018,6 +1047,7 @@ void p9_client_destroy(struct p9_client *clnt)
>
> p9_tag_cleanup(clnt);
>
> + kmem_cache_destroy(clnt->fcall_cache);

I'm afraid we should check clnt->fcall_cache for NULL here.

Thanks,
Jun

> kfree(clnt);
> }
> EXPORT_SYMBOL(p9_client_destroy);
>

2018-08-10 01:47:36

by Dominique Martinet

Subject: Re: [PATCH v3 2/2] net/9p: add a per-client fcall kmem_cache

piaojun wrote on Fri, Aug 10, 2018:
> Could you paste the test results from before and after applying this patch
> in the comment? Please also see my comments below.

Thanks for the review, do you mean the commit message?

I'll add the summary I wrote in reply to your question a few mails
before.


> > diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
> > index e23896116d9a..645266b74652 100644
> > --- a/include/net/9p/9p.h
> > +++ b/include/net/9p/9p.h
> > @@ -336,6 +336,9 @@ enum p9_qid_t {
> > #define P9_NOFID (u32)(~0)
> > #define P9_MAXWELEM 16
> >
> > +/* Minimal header size: len + id + tag */
>
> Here should be 'size + id + tag'.

hm I didn't want to repeat size, but I guess people do refer to that
field as size.
I'll actually rewrite it as:
Minimal header size: size[4] type[1] tag[2]
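
As an illustration of that layout, here is a minimal userspace sketch
that decodes the fixed 7-byte header (4 + 1 + 2, little-endian on the
wire). The struct p9_hdr type and the p9_hdr_decode() helper are
hypothetical names used only for this example; they are not part of
net/9p.

```c
#include <stddef.h>
#include <stdint.h>

#define P9_HDRSZ 7	/* size[4] + type[1] + tag[2] */

/* Hypothetical decoded form of the fixed header (illustration only). */
struct p9_hdr {
	uint32_t size;	/* total message length, header included */
	uint8_t type;	/* message type, e.g. 100 for TVERSION */
	uint16_t tag;	/* request tag, 0xffff is P9_NOTAG */
};

/*
 * Decode the fixed header from a little-endian wire buffer.
 * Returns 0 on success, -1 if the buffer is shorter than the header
 * or the advertised size is smaller than the header itself.
 */
static int p9_hdr_decode(const uint8_t *buf, size_t len, struct p9_hdr *h)
{
	if (len < P9_HDRSZ)
		return -1;

	h->size = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
		  (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
	h->type = buf[4];
	h->tag = (uint16_t)(buf[5] | buf[6] << 8);

	if (h->size < P9_HDRSZ)
		return -1;
	return 0;
}
```

The patch's usercopy region starts at P9_HDRSZ + 4, i.e. right after
this header plus a 4-byte count field.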

> > + kmem_cache_destroy(clnt->fcall_cache);
>
> I'm afraid that we should check NULL for clnt->fcall_cache.

kmem_cache_destroy() in mm/slab_common.c does the null check for us:
------
void kmem_cache_destroy(struct kmem_cache *s)
{
int err;

if (unlikely(!s))
return;
------
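
By contrast, kmem_cache_free() does not check for NULL, which is why
p9_fcall_fini() in the patch tests sdata itself and remembers which
allocator each buffer came from. A rough userspace sketch of that
pairing follows; struct pool, pool_alloc(), pool_free(), fcall_init()
and fcall_fini() are hypothetical stand-ins built on malloc(), not
kernel APIs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kmem_cache: one object size, one counter. */
struct pool {
	size_t objsize;
	int live;	/* objects currently allocated from this pool */
};

static void *pool_alloc(struct pool *p)
{
	void *obj = malloc(p->objsize);

	if (obj)
		p->live++;
	return obj;
}

/* Like kmem_cache_free(), this must never be called with obj == NULL. */
static void pool_free(struct pool *p, void *obj)
{
	p->live--;
	free(obj);
}

struct fcall {
	size_t capacity;
	struct pool *cache;	/* NULL: buffer came from plain malloc() */
	unsigned char *sdata;
};

/* Use the pool only when it exists and the size matches its objects. */
static int fcall_init(struct pool *p, struct fcall *fc, size_t size)
{
	if (p && size == p->objsize) {
		fc->sdata = pool_alloc(p);
		fc->cache = p;
	} else {
		fc->sdata = malloc(size);
		fc->cache = NULL;
	}
	if (!fc->sdata)
		return -1;
	fc->capacity = size;
	return 0;
}

/* Free through whichever allocator produced the buffer; NULL is a no-op. */
static void fcall_fini(struct fcall *fc)
{
	if (!fc->sdata)
		return;

	if (fc->cache)
		pool_free(fc->cache, fc->sdata);
	else
		free(fc->sdata);
	fc->sdata = NULL;
}
```

Storing the pool pointer in the fcall means the free path does not
need to look at the client at all.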

--
Dominique

2018-08-10 02:20:52

by piaojun

Subject: Re: [PATCH v3 2/2] net/9p: add a per-client fcall kmem_cache

Thanks for clearing up my doubt; you can add:

Acked-by: Jun Piao <[email protected]>

On 2018/8/10 9:41, Dominique Martinet wrote:
> piaojun wrote on Fri, Aug 10, 2018:
>> Could you paste the test results from before and after applying this patch
>> in the comment? Please also see my comments below.
>
> Thanks for the review, do you mean the commit message?
>
> I'll add the summary I wrote in reply to your question a few mails
> before.
>
Yes, I mean the commit message.

>
>>> diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
>>> index e23896116d9a..645266b74652 100644
>>> --- a/include/net/9p/9p.h
>>> +++ b/include/net/9p/9p.h
>>> @@ -336,6 +336,9 @@ enum p9_qid_t {
>>> #define P9_NOFID (u32)(~0)
>>> #define P9_MAXWELEM 16
>>>
>>> +/* Minimal header size: len + id + tag */
>>
>> Here should be 'size + id + tag'.
>
> hm I didn't want to repeat size, but I guess people do refer to that
> field as size.
> I'll actually rewrite it as:
> Minimal header size: size[4] type[1] tag[2]
>
It looks better.

>>> + kmem_cache_destroy(clnt->fcall_cache);
>>
>> I'm afraid that we should check NULL for clnt->fcall_cache.
>
> kmem_cache_destroy() in mm/slab_common.c does the null check for us:
> ------
> void kmem_cache_destroy(struct kmem_cache *s)
> {
> int err;
>
> if (unlikely(!s))
> return;
> ------
>
OK, it makes sense.