2018-06-28 18:02:58

by Matthew Wilcox

Subject: [PATCH 0/6] 9p: Use IDRs more effectively

The 9p code doesn't take advantage of the IDR's ability to store
a pointer. We can actually get rid of the p9_idpool abstraction
and the multi-dimensional array of requests.

I haven't tested these patches, so caveat maintainer.
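
For reference, a minimal sketch of the pattern these patches move to --
storing the object pointer in the IDR itself instead of keeping a
separate list or array (illustrative only; the struct and function
names here are made up, not the 9p code, and locking is omitted --
the real code takes clnt->lock around the allocation):

#include <linux/idr.h>
#include <linux/slab.h>

struct foo {
	int id;
};

static DEFINE_IDR(foo_idr);

static struct foo *foo_create(void)
{
	struct foo *foo = kzalloc(sizeof(*foo), GFP_KERNEL);

	if (!foo)
		return NULL;
	/* The IDR stores the pointer; no separate list or 2D array needed */
	foo->id = idr_alloc(&foo_idr, foo, 0, 0, GFP_KERNEL);
	if (foo->id < 0) {
		kfree(foo);
		return NULL;
	}
	return foo;
}

static struct foo *foo_lookup(int id)
{
	/* ID to pointer in one step */
	return idr_find(&foo_idr, id);
}

static void foo_destroy(struct foo *foo)
{
	idr_remove(&foo_idr, foo->id);
	kfree(foo);
}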

Matthew Wilcox (6):
9p: Change p9_fid_create calling convention
9p: Replace the fidlist with an IDR
9p: Embed wait_queue_head into p9_req_t
9p: Remove an unnecessary memory barrier
9p: Use a slab for allocating requests
9p: Remove p9_idpool

include/net/9p/9p.h | 8 -
include/net/9p/client.h | 62 ++------
net/9p/Makefile | 1 -
net/9p/client.c | 322 ++++++++++++++--------------------------
net/9p/mod.c | 7 +-
net/9p/trans_virtio.c | 2 +-
net/9p/util.c | 141 ------------------
7 files changed, 132 insertions(+), 411 deletions(-)
delete mode 100644 net/9p/util.c

--
2.18.0



2018-06-28 13:28:17

by Matthew Wilcox

Subject: [PATCH 5/6] 9p: Use a slab for allocating requests

Replace the custom batch allocation with a slab. Use an IDR to store
pointers to the active requests instead of an array. We don't try to
handle P9_NOTAG specially; the IDR will happily shrink all the way back
once the TVERSION call has completed.

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/net/9p/client.h | 51 ++-------
net/9p/client.c | 240 +++++++++++++++-------------------------
net/9p/mod.c | 7 +-
3 files changed, 102 insertions(+), 196 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 0fa0fbab33b0..fd326811ebd4 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -64,22 +64,15 @@ enum p9_trans_status {

/**
* enum p9_req_status_t - status of a request
- * @REQ_STATUS_IDLE: request slot unused
* @REQ_STATUS_ALLOC: request has been allocated but not sent
* @REQ_STATUS_UNSENT: request waiting to be sent
* @REQ_STATUS_SENT: request sent to server
* @REQ_STATUS_RCVD: response received from server
* @REQ_STATUS_FLSHD: request has been flushed
* @REQ_STATUS_ERROR: request encountered an error on the client side
- *
- * The @REQ_STATUS_IDLE state is used to mark a request slot as unused
- * but use is actually tracked by the idpool structure which handles tag
- * id allocation.
- *
*/

enum p9_req_status_t {
- REQ_STATUS_IDLE,
REQ_STATUS_ALLOC,
REQ_STATUS_UNSENT,
REQ_STATUS_SENT,
@@ -92,24 +85,12 @@ enum p9_req_status_t {
* struct p9_req_t - request slots
* @status: status of this request slot
* @t_err: transport error
- * @flush_tag: tag of request being flushed (for flush requests)
* @wq: wait_queue for the client to block on for this request
* @tc: the request fcall structure
* @rc: the response fcall structure
* @aux: transport specific data (provided for trans_fd migration)
* @req_list: link for higher level objects to chain requests
- *
- * Transport use an array to track outstanding requests
- * instead of a list. While this may incurr overhead during initial
- * allocation or expansion, it makes request lookup much easier as the
- * tag id is a index into an array. (We use tag+1 so that we can accommodate
- * the -1 tag for the T_VERSION request).
- * This also has the nice effect of only having to allocate wait_queues
- * once, instead of constantly allocating and freeing them. Its possible
- * other resources could benefit from this scheme as well.
- *
*/
-
struct p9_req_t {
int status;
int t_err;
@@ -117,40 +98,26 @@ struct p9_req_t {
struct p9_fcall *tc;
struct p9_fcall *rc;
void *aux;
-
struct list_head req_list;
};

/**
* struct p9_client - per client instance state
- * @lock: protect @fidlist
+ * @lock: protect @fids and @reqs
* @msize: maximum data size negotiated by protocol
- * @dotu: extension flags negotiated by protocol
* @proto_version: 9P protocol version to use
* @trans_mod: module API instantiated with this client
+ * @status: XXX
* @trans: tranport instance state and API
* @fids: All active FID handles
- * @tagpool - transaction id accounting for session
- * @reqs - 2D array of requests
- * @max_tag - current maximum tag id allocated
- * @name - node name used as client id
+ * @reqs: All active requests.
+ * @name: node name used as client id
*
* The client structure is used to keep track of various per-client
* state that has been instantiated.
- * In order to minimize per-transaction overhead we use a
- * simple array to lookup requests instead of a hash table
- * or linked list. In order to support larger number of
- * transactions, we make this a 2D array, allocating new rows
- * when we need to grow the total number of the transactions.
- *
- * Each row is 256 requests and we'll support up to 256 rows for
- * a total of 64k concurrent requests per session.
- *
- * Bugs: duplicated data and potentially unnecessary elements.
*/
-
struct p9_client {
- spinlock_t lock; /* protect client structure */
+ spinlock_t lock;
unsigned int msize;
unsigned char proto_version;
struct p9_trans_module *trans_mod;
@@ -170,10 +137,7 @@ struct p9_client {
} trans_opts;

struct idr fids;
-
- struct p9_idpool *tagpool;
- struct p9_req_t *reqs[P9_ROW_MAXTAG];
- int max_tag;
+ struct idr reqs;

char name[__NEW_UTS_LEN + 1];
};
@@ -279,4 +243,7 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *, const char *, u64 *);
int p9_client_xattrcreate(struct p9_fid *, const char *, u64, int);
int p9_client_readlink(struct p9_fid *fid, char **target);

+int p9_client_init(void);
+void p9_client_exit(void);
+
#endif /* NET_9P_CLIENT_H */
diff --git a/net/9p/client.c b/net/9p/client.c
index 2dce8d8307cc..fd5446377445 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -241,132 +241,104 @@ static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
return fc;
}

+static struct kmem_cache *p9_req_cache;
+
/**
- * p9_tag_alloc - lookup/allocate a request by tag
- * @c: client session to lookup tag within
- * @tag: numeric id for transaction
- *
- * this is a simple array lookup, but will grow the
- * request_slots as necessary to accommodate transaction
- * ids which did not previously have a slot.
- *
- * this code relies on the client spinlock to manage locks, its
- * possible we should switch to something else, but I'd rather
- * stick with something low-overhead for the common case.
+ * p9_req_alloc - Allocate a new request.
+ * @c: Client session.
+ * @type: Transaction type.
+ * @max_size: Maximum packet size for this request.
*
+ * Context: Process context. What mutex might we be holding that
+ * requires GFP_NOFS?
+ * Return: Pointer to new request.
*/
-
static struct p9_req_t *
-p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
+p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
{
- unsigned long flags;
- int row, col;
- struct p9_req_t *req;
+ struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
int alloc_msize = min(c->msize, max_size);
+ int tag;

- /* This looks up the original request by tag so we know which
- * buffer to read the data into */
- tag++;
-
- if (tag >= c->max_tag) {
- spin_lock_irqsave(&c->lock, flags);
- /* check again since original check was outside of lock */
- while (tag >= c->max_tag) {
- row = (tag / P9_ROW_MAXTAG);
- c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
- sizeof(struct p9_req_t), GFP_ATOMIC);
-
- if (!c->reqs[row]) {
- pr_err("Couldn't grow tag array\n");
- spin_unlock_irqrestore(&c->lock, flags);
- return ERR_PTR(-ENOMEM);
- }
- for (col = 0; col < P9_ROW_MAXTAG; col++) {
- req = &c->reqs[row][col];
- req->status = REQ_STATUS_IDLE;
- init_waitqueue_head(&req->wq);
- }
- c->max_tag += P9_ROW_MAXTAG;
- }
- spin_unlock_irqrestore(&c->lock, flags);
- }
- row = tag / P9_ROW_MAXTAG;
- col = tag % P9_ROW_MAXTAG;
+ if (!req)
+ return NULL;

- req = &c->reqs[row][col];
- if (!req->tc)
- req->tc = p9_fcall_alloc(alloc_msize);
- if (!req->rc)
- req->rc = p9_fcall_alloc(alloc_msize);
+ req->tc = p9_fcall_alloc(alloc_msize);
+ req->rc = p9_fcall_alloc(alloc_msize);
if (!req->tc || !req->rc)
- goto grow_failed;
+ goto free;

p9pdu_reset(req->tc);
p9pdu_reset(req->rc);
-
- req->tc->tag = tag-1;
req->status = REQ_STATUS_ALLOC;
+ init_waitqueue_head(&req->wq);
+ INIT_LIST_HEAD(&req->req_list);
+
+ idr_preload(GFP_NOFS);
+ spin_lock_irq(&c->lock);
+ if (type == P9_TVERSION)
+ tag = idr_alloc(&c->reqs, req, P9_NOTAG, P9_NOTAG + 1,
+ GFP_NOWAIT);
+ else
+ tag = idr_alloc(&c->reqs, req, 0, P9_NOTAG, GFP_NOWAIT);
+ req->tc->tag = tag;
+ spin_unlock_irq(&c->lock);
+ idr_preload_end();
+ if (tag < 0)
+ goto free;

return req;

-grow_failed:
- pr_err("Couldn't grow tag array\n");
+free:
kfree(req->tc);
kfree(req->rc);
- req->tc = req->rc = NULL;
+ kmem_cache_free(p9_req_cache, req);
return ERR_PTR(-ENOMEM);
}

/**
- * p9_tag_lookup - lookup a request by tag
- * @c: client session to lookup tag within
- * @tag: numeric id for transaction
+ * p9_tag_lookup - Look up a request by tag.
+ * @c: Client session.
+ * @tag: Transaction ID.
*
+ * Context: Any context.
+ * Return: A request, or %NULL if there is no request with that tag.
*/
-
struct p9_req_t *p9_tag_lookup(struct p9_client *c, u16 tag)
{
- int row, col;
-
- /* This looks up the original request by tag so we know which
- * buffer to read the data into */
- tag++;
-
- if(tag >= c->max_tag)
- return NULL;
+ struct p9_req_t *req;

- row = tag / P9_ROW_MAXTAG;
- col = tag % P9_ROW_MAXTAG;
+ rcu_read_lock();
+ req = idr_find(&c->reqs, tag);
+ /*
+ * There's no refcount on the req; a malicious server could cause
+ * us to dereference a NULL pointer
+ */
+ rcu_read_unlock();

- return &c->reqs[row][col];
+ return req;
}
EXPORT_SYMBOL(p9_tag_lookup);

/**
- * p9_tag_init - setup tags structure and contents
- * @c: v9fs client struct
- *
- * This initializes the tags structure for each client instance.
+ * p9_free_req - Free a request.
+ * @c: Client session.
+ * @r: Request to free.
*
+ * Context: Any context.
*/
-
-static int p9_tag_init(struct p9_client *c)
+static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
{
- int err = 0;
+ unsigned long flags;
+ u16 tag = r->tc->tag;

- c->tagpool = p9_idpool_create();
- if (IS_ERR(c->tagpool)) {
- err = PTR_ERR(c->tagpool);
- goto error;
- }
- err = p9_idpool_get(c->tagpool); /* reserve tag 0 */
- if (err < 0) {
- p9_idpool_destroy(c->tagpool);
- goto error;
- }
- c->max_tag = 0;
-error:
- return err;
+ p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
+ spin_lock_irqsave(&c->lock, flags);
+ idr_remove(&c->reqs, tag);
+ spin_unlock_irqrestore(&c->lock, flags);
+ kfree(r->tc);
+ kfree(r->rc);
+ kmem_cache_free(p9_req_cache, r);
}

/**
@@ -378,52 +350,15 @@ static int p9_tag_init(struct p9_client *c)
*/
static void p9_tag_cleanup(struct p9_client *c)
{
- int row, col;
-
- /* check to insure all requests are idle */
- for (row = 0; row < (c->max_tag/P9_ROW_MAXTAG); row++) {
- for (col = 0; col < P9_ROW_MAXTAG; col++) {
- if (c->reqs[row][col].status != REQ_STATUS_IDLE) {
- p9_debug(P9_DEBUG_MUX,
- "Attempting to cleanup non-free tag %d,%d\n",
- row, col);
- /* TODO: delay execution of cleanup */
- return;
- }
- }
- }
-
- if (c->tagpool) {
- p9_idpool_put(0, c->tagpool); /* free reserved tag 0 */
- p9_idpool_destroy(c->tagpool);
- }
+ struct p9_req_t *req;
+ int id;

- /* free requests associated with tags */
- for (row = 0; row < (c->max_tag/P9_ROW_MAXTAG); row++) {
- for (col = 0; col < P9_ROW_MAXTAG; col++) {
- kfree(c->reqs[row][col].tc);
- kfree(c->reqs[row][col].rc);
- }
- kfree(c->reqs[row]);
+ rcu_read_lock();
+ idr_for_each_entry(&c->reqs, req, id) {
+ pr_info("Tag %d still in use\n", id);
+ p9_free_req(c, req);
}
- c->max_tag = 0;
-}
-
-/**
- * p9_free_req - free a request and clean-up as necessary
- * c: client state
- * r: request to release
- *
- */
-
-static void p9_free_req(struct p9_client *c, struct p9_req_t *r)
-{
- int tag = r->tc->tag;
- p9_debug(P9_DEBUG_MUX, "clnt %p req %p tag: %d\n", c, r, tag);
-
- r->status = REQ_STATUS_IDLE;
- if (tag != P9_NOTAG && p9_idpool_check(tag, c->tagpool))
- p9_idpool_put(tag, c->tagpool);
+ rcu_read_unlock();
}

/**
@@ -686,7 +621,7 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
int8_t type, int req_size,
const char *fmt, va_list ap)
{
- int tag, err;
+ int err;
struct p9_req_t *req;

p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type);
@@ -699,24 +634,17 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
if ((c->status == BeginDisconnect) && (type != P9_TCLUNK))
return ERR_PTR(-EIO);

- tag = P9_NOTAG;
- if (type != P9_TVERSION) {
- tag = p9_idpool_get(c->tagpool);
- if (tag < 0)
- return ERR_PTR(-ENOMEM);
- }
-
- req = p9_tag_alloc(c, tag, req_size);
+ req = p9_tag_alloc(c, type, req_size);
if (IS_ERR(req))
return req;

/* marshall the data */
- p9pdu_prepare(req->tc, tag, type);
+ p9pdu_prepare(req->tc, req->tc->tag, type);
err = p9pdu_vwritef(req->tc, c->proto_version, fmt, ap);
if (err)
goto reterr;
p9pdu_finalize(c, req->tc);
- trace_9p_client_req(c, type, tag);
+ trace_9p_client_req(c, type, req->tc->tag);
return req;
reterr:
p9_free_req(c, req);
@@ -1012,14 +940,11 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)

spin_lock_init(&clnt->lock);
idr_init(&clnt->fids);
-
- err = p9_tag_init(clnt);
- if (err < 0)
- goto free_client;
+ idr_init(&clnt->reqs);

err = parse_opts(options, clnt);
if (err < 0)
- goto destroy_tagpool;
+ goto free_client;

if (!clnt->trans_mod)
clnt->trans_mod = v9fs_get_default_trans();
@@ -1028,7 +953,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
err = -EPROTONOSUPPORT;
p9_debug(P9_DEBUG_ERROR,
"No transport defined or default transport\n");
- goto destroy_tagpool;
+ goto free_client;
}

p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
@@ -1051,8 +976,6 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
clnt->trans_mod->close(clnt);
put_trans:
v9fs_put_trans(clnt->trans_mod);
-destroy_tagpool:
- p9_idpool_destroy(clnt->tagpool);
free_client:
kfree(clnt);
return ERR_PTR(err);
@@ -2268,3 +2191,14 @@ int p9_client_readlink(struct p9_fid *fid, char **target)
return err;
}
EXPORT_SYMBOL(p9_client_readlink);
+
+int __init p9_client_init(void)
+{
+ p9_req_cache = KMEM_CACHE(p9_req_t, 0);
+ return p9_req_cache ? 0 : -ENOMEM;
+}
+
+void __exit p9_client_exit(void)
+{
+ kmem_cache_destroy(p9_req_cache);
+}
diff --git a/net/9p/mod.c b/net/9p/mod.c
index eb9777f05755..0da56d6af73b 100644
--- a/net/9p/mod.c
+++ b/net/9p/mod.c
@@ -171,7 +171,11 @@ void v9fs_put_trans(struct p9_trans_module *m)
*/
static int __init init_p9(void)
{
- int ret = 0;
+ int ret;
+
+ ret = p9_client_init();
+ if (ret)
+ return ret;

p9_error_init();
pr_info("Installing 9P2000 support\n");
@@ -190,6 +194,7 @@ static void __exit exit_p9(void)
pr_info("Unloading 9P2000 support\n");

p9_trans_fd_exit();
+ p9_client_exit();
}

module_init(init_p9)
--
2.18.0


2018-06-28 13:28:36

by Matthew Wilcox

Subject: [PATCH 4/6] 9p: Remove an unnecessary memory barrier

And add a comment about why we don't need it.

Signed-off-by: Matthew Wilcox <[email protected]>
---
net/9p/client.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index 602f76de388a..2dce8d8307cc 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -436,13 +436,9 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
{
p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);

- /*
- * This barrier is needed to make sure any change made to req before
- * the other thread wakes up will indeed be seen by the waiting side.
- */
- smp_wmb();
req->status = status;

+ /* wake_up is an implicit write memory barrier */
wake_up(&req->wq);
p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
}
--
2.18.0


2018-06-28 13:28:59

by Matthew Wilcox

Subject: [PATCH 6/6] 9p: Remove p9_idpool

There are no more users left of the p9_idpool; delete it.

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/net/9p/9p.h | 8 ---
net/9p/Makefile | 1 -
net/9p/util.c | 141 --------------------------------------------
3 files changed, 150 deletions(-)
delete mode 100644 net/9p/util.c

diff --git a/include/net/9p/9p.h b/include/net/9p/9p.h
index b8eb51a661e5..e23896116d9a 100644
--- a/include/net/9p/9p.h
+++ b/include/net/9p/9p.h
@@ -561,16 +561,8 @@ struct p9_fcall {
u8 *sdata;
};

-struct p9_idpool;
-
int p9_errstr2errno(char *errstr, int len);

-struct p9_idpool *p9_idpool_create(void);
-void p9_idpool_destroy(struct p9_idpool *);
-int p9_idpool_get(struct p9_idpool *p);
-void p9_idpool_put(int id, struct p9_idpool *p);
-int p9_idpool_check(int id, struct p9_idpool *p);
-
int p9_error_init(void);
int p9_trans_fd_init(void);
void p9_trans_fd_exit(void);
diff --git a/net/9p/Makefile b/net/9p/Makefile
index c0486cfc85d9..aa0a5641e5d0 100644
--- a/net/9p/Makefile
+++ b/net/9p/Makefile
@@ -8,7 +8,6 @@ obj-$(CONFIG_NET_9P_RDMA) += 9pnet_rdma.o
mod.o \
client.o \
error.o \
- util.o \
protocol.o \
trans_fd.o \
trans_common.o \
diff --git a/net/9p/util.c b/net/9p/util.c
deleted file mode 100644
index 59f278e64f58..000000000000
--- a/net/9p/util.c
+++ /dev/null
@@ -1,141 +0,0 @@
-/*
- * net/9p/util.c
- *
- * This file contains some helper functions
- *
- * Copyright (C) 2007 by Latchesar Ionkov <[email protected]>
- * Copyright (C) 2004 by Eric Van Hensbergen <[email protected]>
- * Copyright (C) 2002 by Ron Minnich <[email protected]>
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to:
- * Free Software Foundation
- * 51 Franklin Street, Fifth Floor
- * Boston, MA 02111-1301 USA
- *
- */
-
-#include <linux/module.h>
-#include <linux/errno.h>
-#include <linux/fs.h>
-#include <linux/sched.h>
-#include <linux/parser.h>
-#include <linux/idr.h>
-#include <linux/slab.h>
-#include <net/9p/9p.h>
-
-/**
- * struct p9_idpool - per-connection accounting for tag idpool
- * @lock: protects the pool
- * @pool: idr to allocate tag id from
- *
- */
-
-struct p9_idpool {
- spinlock_t lock;
- struct idr pool;
-};
-
-/**
- * p9_idpool_create - create a new per-connection id pool
- *
- */
-
-struct p9_idpool *p9_idpool_create(void)
-{
- struct p9_idpool *p;
-
- p = kmalloc(sizeof(struct p9_idpool), GFP_KERNEL);
- if (!p)
- return ERR_PTR(-ENOMEM);
-
- spin_lock_init(&p->lock);
- idr_init(&p->pool);
-
- return p;
-}
-EXPORT_SYMBOL(p9_idpool_create);
-
-/**
- * p9_idpool_destroy - create a new per-connection id pool
- * @p: idpool to destroy
- */
-
-void p9_idpool_destroy(struct p9_idpool *p)
-{
- idr_destroy(&p->pool);
- kfree(p);
-}
-EXPORT_SYMBOL(p9_idpool_destroy);
-
-/**
- * p9_idpool_get - allocate numeric id from pool
- * @p: pool to allocate from
- *
- * Bugs: This seems to be an awful generic function, should it be in idr.c with
- * the lock included in struct idr?
- */
-
-int p9_idpool_get(struct p9_idpool *p)
-{
- int i;
- unsigned long flags;
-
- idr_preload(GFP_NOFS);
- spin_lock_irqsave(&p->lock, flags);
-
- /* no need to store exactly p, we just need something non-null */
- i = idr_alloc(&p->pool, p, 0, 0, GFP_NOWAIT);
-
- spin_unlock_irqrestore(&p->lock, flags);
- idr_preload_end();
- if (i < 0)
- return -1;
-
- p9_debug(P9_DEBUG_MUX, " id %d pool %p\n", i, p);
- return i;
-}
-EXPORT_SYMBOL(p9_idpool_get);
-
-/**
- * p9_idpool_put - release numeric id from pool
- * @id: numeric id which is being released
- * @p: pool to release id into
- *
- * Bugs: This seems to be an awful generic function, should it be in idr.c with
- * the lock included in struct idr?
- */
-
-void p9_idpool_put(int id, struct p9_idpool *p)
-{
- unsigned long flags;
-
- p9_debug(P9_DEBUG_MUX, " id %d pool %p\n", id, p);
-
- spin_lock_irqsave(&p->lock, flags);
- idr_remove(&p->pool, id);
- spin_unlock_irqrestore(&p->lock, flags);
-}
-EXPORT_SYMBOL(p9_idpool_put);
-
-/**
- * p9_idpool_check - check if the specified id is available
- * @id: id to check
- * @p: pool to check
- */
-
-int p9_idpool_check(int id, struct p9_idpool *p)
-{
- return idr_find(&p->pool, id) != NULL;
-}
-EXPORT_SYMBOL(p9_idpool_check);
-
--
2.18.0


2018-06-28 13:29:23

by Matthew Wilcox

Subject: [PATCH 1/6] 9p: Change p9_fid_create calling convention

Return NULL instead of ERR_PTR when we can't allocate a FID. The ENOSPC
return value was getting all the way back to userspace, and that's
confusing for a userspace program which isn't expecting read() to tell it
there's no space left on the filesystem. The best error we can return to
indicate a temporary failure caused by lack of client resources is ENOMEM.

Maybe it would be better to sleep until a FID is available, but that's
not a change I'm comfortable making.

Signed-off-by: Matthew Wilcox <[email protected]>
---
net/9p/client.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/net/9p/client.c b/net/9p/client.c
index 18c5271910dc..f8d58b0852fe 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -913,13 +913,11 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
if (!fid)
- return ERR_PTR(-ENOMEM);
+ return NULL;

ret = p9_idpool_get(clnt->fidpool);
- if (ret < 0) {
- ret = -ENOSPC;
+ if (ret < 0)
goto error;
- }
fid->fid = ret;

memset(&fid->qid, 0, sizeof(struct p9_qid));
@@ -935,7 +933,7 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)

error:
kfree(fid);
- return ERR_PTR(ret);
+ return NULL;
}

static void p9_fid_destroy(struct p9_fid *fid)
@@ -1137,9 +1135,8 @@ struct p9_fid *p9_client_attach(struct p9_client *clnt, struct p9_fid *afid,
p9_debug(P9_DEBUG_9P, ">>> TATTACH afid %d uname %s aname %s\n",
afid ? afid->fid : -1, uname, aname);
fid = p9_fid_create(clnt);
- if (IS_ERR(fid)) {
- err = PTR_ERR(fid);
- fid = NULL;
+ if (!fid) {
+ err = -ENOMEM;
goto error;
}
fid->uid = n_uname;
@@ -1188,9 +1185,8 @@ struct p9_fid *p9_client_walk(struct p9_fid *oldfid, uint16_t nwname,
clnt = oldfid->clnt;
if (clone) {
fid = p9_fid_create(clnt);
- if (IS_ERR(fid)) {
- err = PTR_ERR(fid);
- fid = NULL;
+ if (!fid) {
+ err = -ENOMEM;
goto error;
}

@@ -2018,9 +2014,8 @@ struct p9_fid *p9_client_xattrwalk(struct p9_fid *file_fid,
err = 0;
clnt = file_fid->clnt;
attr_fid = p9_fid_create(clnt);
- if (IS_ERR(attr_fid)) {
- err = PTR_ERR(attr_fid);
- attr_fid = NULL;
+ if (!attr_fid) {
+ err = -ENOMEM;
goto error;
}
p9_debug(P9_DEBUG_9P,
--
2.18.0


2018-06-28 13:30:30

by Matthew Wilcox

Subject: [PATCH 2/6] 9p: Replace the fidlist with an IDR

The p9_idpool used to allocate FIDs is itself backed by an IDR ... and
we then keep the FIDs in a doubly-linked list, rather than in the IDR
which allocated them. We can use an IDR directly, which saves two
pointers per p9_fid and a tiny memory allocation per p9_client.

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/net/9p/client.h | 9 +++------
net/9p/client.c | 42 ++++++++++++++---------------------------
2 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index 7af9d769b97d..e405729cd1c7 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -27,6 +27,7 @@
#define NET_9P_CLIENT_H

#include <linux/utsname.h>
+#include <linux/idr.h>

/* Number of requests per row */
#define P9_ROW_MAXTAG 255
@@ -128,8 +129,7 @@ struct p9_req_t {
* @proto_version: 9P protocol version to use
* @trans_mod: module API instantiated with this client
* @trans: tranport instance state and API
- * @fidpool: fid handle accounting for session
- * @fidlist: List of active fid handles
+ * @fids: All active FID handles
* @tagpool - transaction id accounting for session
* @reqs - 2D array of requests
* @max_tag - current maximum tag id allocated
@@ -169,8 +169,7 @@ struct p9_client {
} tcp;
} trans_opts;

- struct p9_idpool *fidpool;
- struct list_head fidlist;
+ struct idr fids;

struct p9_idpool *tagpool;
struct p9_req_t *reqs[P9_ROW_MAXTAG];
@@ -188,7 +187,6 @@ struct p9_client {
* @iounit: the server reported maximum transaction size for this file
* @uid: the numeric uid of the local user who owns this handle
* @rdir: readdir accounting structure (allocated on demand)
- * @flist: per-client-instance fid tracking
* @dlist: per-dentry fid tracking
*
* TODO: This needs lots of explanation.
@@ -204,7 +202,6 @@ struct p9_fid {

void *rdir;

- struct list_head flist;
struct hlist_node dlist; /* list of all fids attached to a dentry */
};

diff --git a/net/9p/client.c b/net/9p/client.c
index f8d58b0852fe..bbab82f22c20 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -908,30 +908,27 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
{
int ret;
struct p9_fid *fid;
- unsigned long flags;

p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
if (!fid)
return NULL;

- ret = p9_idpool_get(clnt->fidpool);
- if (ret < 0)
- goto error;
- fid->fid = ret;
-
memset(&fid->qid, 0, sizeof(struct p9_qid));
fid->mode = -1;
fid->uid = current_fsuid();
fid->clnt = clnt;
fid->rdir = NULL;
- spin_lock_irqsave(&clnt->lock, flags);
- list_add(&fid->flist, &clnt->fidlist);
- spin_unlock_irqrestore(&clnt->lock, flags);

- return fid;
+ idr_preload(GFP_KERNEL);
+ spin_lock_irq(&clnt->lock);
+ ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, UINT_MAX, GFP_NOWAIT);
+ spin_unlock_irq(&clnt->lock);
+ idr_preload_end();
+
+ if (!ret)
+ return fid;

-error:
kfree(fid);
return NULL;
}
@@ -943,9 +940,8 @@ static void p9_fid_destroy(struct p9_fid *fid)

p9_debug(P9_DEBUG_FID, "fid %d\n", fid->fid);
clnt = fid->clnt;
- p9_idpool_put(fid->fid, clnt->fidpool);
spin_lock_irqsave(&clnt->lock, flags);
- list_del(&fid->flist);
+ idr_remove(&clnt->fids, fid->fid);
spin_unlock_irqrestore(&clnt->lock, flags);
kfree(fid->rdir);
kfree(fid);
@@ -1028,7 +1024,7 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
memcpy(clnt->name, client_id, strlen(client_id) + 1);

spin_lock_init(&clnt->lock);
- INIT_LIST_HEAD(&clnt->fidlist);
+ idr_init(&clnt->fids);

err = p9_tag_init(clnt);
if (err < 0)
@@ -1048,18 +1044,12 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
goto destroy_tagpool;
}

- clnt->fidpool = p9_idpool_create();
- if (IS_ERR(clnt->fidpool)) {
- err = PTR_ERR(clnt->fidpool);
- goto put_trans;
- }
-
p9_debug(P9_DEBUG_MUX, "clnt %p trans %p msize %d protocol %d\n",
clnt, clnt->trans_mod, clnt->msize, clnt->proto_version);

err = clnt->trans_mod->create(clnt, dev_name, options);
if (err)
- goto destroy_fidpool;
+ goto put_trans;

if (clnt->msize > clnt->trans_mod->maxsize)
clnt->msize = clnt->trans_mod->maxsize;
@@ -1072,8 +1062,6 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)

close_trans:
clnt->trans_mod->close(clnt);
-destroy_fidpool:
- p9_idpool_destroy(clnt->fidpool);
put_trans:
v9fs_put_trans(clnt->trans_mod);
destroy_tagpool:
@@ -1086,7 +1074,8 @@ EXPORT_SYMBOL(p9_client_create);

void p9_client_destroy(struct p9_client *clnt)
{
- struct p9_fid *fid, *fidptr;
+ struct p9_fid *fid;
+ int id;

p9_debug(P9_DEBUG_MUX, "clnt %p\n", clnt);

@@ -1095,14 +1084,11 @@ void p9_client_destroy(struct p9_client *clnt)

v9fs_put_trans(clnt->trans_mod);

- list_for_each_entry_safe(fid, fidptr, &clnt->fidlist, flist) {
+ idr_for_each_entry(&clnt->fids, fid, id) {
pr_info("Found fid %d not clunked\n", fid->fid);
p9_fid_destroy(fid);
}

- if (clnt->fidpool)
- p9_idpool_destroy(clnt->fidpool);
-
p9_tag_cleanup(clnt);

kfree(clnt);
--
2.18.0


2018-06-28 13:31:22

by Matthew Wilcox

Subject: [PATCH 3/6] 9p: Embed wait_queue_head into p9_req_t

On a 64-bit system, the wait_queue_head_t is 24 bytes while the pointer
to it is 8 bytes. Growing the p9_req_t by 16 bytes is better than
performing a 24-byte memory allocation.

Signed-off-by: Matthew Wilcox <[email protected]>
---
include/net/9p/client.h | 2 +-
net/9p/client.c | 19 +++++--------------
net/9p/trans_virtio.c | 2 +-
3 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/include/net/9p/client.h b/include/net/9p/client.h
index e405729cd1c7..0fa0fbab33b0 100644
--- a/include/net/9p/client.h
+++ b/include/net/9p/client.h
@@ -113,7 +113,7 @@ enum p9_req_status_t {
struct p9_req_t {
int status;
int t_err;
- wait_queue_head_t *wq;
+ wait_queue_head_t wq;
struct p9_fcall *tc;
struct p9_fcall *rc;
void *aux;
diff --git a/net/9p/client.c b/net/9p/client.c
index bbab82f22c20..602f76de388a 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -282,8 +282,9 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
return ERR_PTR(-ENOMEM);
}
for (col = 0; col < P9_ROW_MAXTAG; col++) {
- c->reqs[row][col].status = REQ_STATUS_IDLE;
- c->reqs[row][col].tc = NULL;
+ req = &c->reqs[row][col];
+ req->status = REQ_STATUS_IDLE;
+ init_waitqueue_head(&req->wq);
}
c->max_tag += P9_ROW_MAXTAG;
}
@@ -293,13 +294,6 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
col = tag % P9_ROW_MAXTAG;

req = &c->reqs[row][col];
- if (!req->wq) {
- req->wq = kmalloc(sizeof(wait_queue_head_t), GFP_NOFS);
- if (!req->wq)
- goto grow_failed;
- init_waitqueue_head(req->wq);
- }
-
if (!req->tc)
req->tc = p9_fcall_alloc(alloc_msize);
if (!req->rc)
@@ -319,9 +313,7 @@ p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
pr_err("Couldn't grow tag array\n");
kfree(req->tc);
kfree(req->rc);
- kfree(req->wq);
req->tc = req->rc = NULL;
- req->wq = NULL;
return ERR_PTR(-ENOMEM);
}

@@ -409,7 +401,6 @@ static void p9_tag_cleanup(struct p9_client *c)
/* free requests associated with tags */
for (row = 0; row < (c->max_tag/P9_ROW_MAXTAG); row++) {
for (col = 0; col < P9_ROW_MAXTAG; col++) {
- kfree(c->reqs[row][col].wq);
kfree(c->reqs[row][col].tc);
kfree(c->reqs[row][col].rc);
}
@@ -452,7 +443,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
smp_wmb();
req->status = status;

- wake_up(req->wq);
+ wake_up(&req->wq);
p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc->tag);
}
EXPORT_SYMBOL(p9_client_cb);
@@ -773,7 +764,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
}
again:
/* Wait for the response */
- err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
+ err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);

/*
* Make sure our req is coherent with regard to updates in other
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index 05006cbb3361..3e096c98313c 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -490,7 +490,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
virtqueue_kick(chan->vq);
spin_unlock_irqrestore(&chan->lock, flags);
p9_debug(P9_DEBUG_TRANS, "virtio request kicked\n");
- err = wait_event_killable(*req->wq, req->status >= REQ_STATUS_RCVD);
+ err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD);
/*
* Non kernel buffers are pinned, unpin them
*/
--
2.18.0


2018-06-28 13:50:29

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 4/6] 9p: Remove an unnecessary memory barrier

Matthew Wilcox wrote on Thu, Jun 28, 2018:
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -436,13 +436,9 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> {
> p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
>
> - /*
> - * This barrier is needed to make sure any change made to req before
> - * the other thread wakes up will indeed be seen by the waiting side.
> - */
> - smp_wmb();
> req->status = status;
>
> + /* wake_up is an implicit write memory barrier */

Nope.
Please note the wmb is _before_ setting status; basically it protects
against CPU reordering where status could be set before the other fields,
so another core opportunistically checking and finding status good would
continue before the other fields are visible.

I could only reproduce this bug with infiniband network, but it is very
definitely needed. Here is the commit message of when I added that barrier:
-----
9P: Add memory barriers to protect request fields over cb/rpc threads handoff

We need barriers to guarantee this pattern works as intended:
[w] req->rc, 1          [r] req->status, 1
wmb                     rmb
[w] req->status, 1      [r] req->rc

Where the wmb ensures that rc gets written before status,
and the rmb ensures that if you observe status == 1, rc is the new value.
-----
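
Spelled out as code, the pairing looks roughly like this (a sketch
using the post-series req layout; the helper names are made up, not
the actual client.c functions):

/* Completion side (what p9_client_cb does): the transport has already
 * written req->rc and friends; publish the status afterwards. */
static void complete_request(struct p9_req_t *req, int status)
{
	smp_wmb();		/* order the rc writes before status */
	req->status = status;
	wake_up(&req->wq);
}

/* Waiting side (what p9_client_rpc does): observe status, then it is
 * safe to read the reply. */
static struct p9_fcall *wait_for_reply(struct p9_req_t *req)
{
	wait_event(req->wq, req->status >= REQ_STATUS_RCVD);
	smp_rmb();		/* pairs with the smp_wmb() above */
	return req->rc;
}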

It might need an update to the comment though, if you thought about
removing it...
--
Dominique Martinet | Asmadeus

2018-06-28 18:23:42

by Matthew Wilcox

Subject: Re: [V9fs-developer] [PATCH 4/6] 9p: Remove an unnecessary memory barrier

On Thu, Jun 28, 2018 at 03:40:29PM +0200, Dominique Martinet wrote:
> Matthew Wilcox wrote on Thu, Jun 28, 2018:
> > --- a/net/9p/client.c
> > +++ b/net/9p/client.c
> > @@ -436,13 +436,9 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
> > {
> > p9_debug(P9_DEBUG_MUX, " tag %d\n", req->tc->tag);
> >
> > - /*
> > - * This barrier is needed to make sure any change made to req before
> > - * the other thread wakes up will indeed be seen by the waiting side.
> > - */
> > - smp_wmb();
> > req->status = status;
> >
> > + /* wake_up is an implicit write memory barrier */
>
> Nope.
> Please note the wmb is _before_ setting status; basically it protects
> against CPU reordering where status could be set before the other fields,
> so another core opportunistically checking and finding status good would
> continue before the other fields are visible.
>
> I could only reproduce this bug with infiniband network, but it is very
> definitely needed. Here is the commit message of when I added that barrier:
> -----
> 9P: Add memory barriers to protect request fields over cb/rpc threads handoff
>
> We need barriers to guarantee this pattern works as intended:
> [w] req->rc, 1          [r] req->status, 1
> wmb                     rmb
> [w] req->status, 1      [r] req->rc
>
> Where the wmb ensures that rc gets written before status,
> and the rmb ensures that if you observe status == 1, rc is the new value.
> -----
>
> It might need an update to the comment though, if you thought about
> removing it...

Ah! Yes, that situation is different from what the comment documents.

How about this?

/*
* This barrier is needed to make sure any change made to req before
- * the other thread wakes up will indeed be seen by the waiting side.
+ * the status change is visible to another thread
*/

2018-06-28 21:52:17

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 4/6] 9p: Remove an unnecessary memory barrier

Matthew Wilcox wrote on Thu, Jun 28, 2018:
> How about this?
>
> /*
> * This barrier is needed to make sure any change made to req before
> - * the other thread wakes up will indeed be seen by the waiting side.
> + * the status change is visible to another thread
> */

Yes, that sounds better.

This code is fairly old and I was wondering if the new WRITE_ONCE and
READ_ONCE macros would help, but those look like compile-time barriers
only, so I do not think they would be enough...
Documentation/memory-barriers.txt still suggests something similar in
the "SLEEP AND WAKE-UP FUNCTIONS" section so I guess this is fine.


Thanks,
--
Dominique Martinet | Asmadeus

2018-07-11 13:00:41

by Matthew Wilcox

Subject: Re: [V9fs-developer] [PATCH 2/6] 9p: Replace the fidlist with an IDR

On Wed, Jul 11, 2018 at 02:40:38PM +0200, Dominique Martinet wrote:
> Matthew Wilcox wrote on Thu, Jun 28, 2018:
> > diff --git a/net/9p/client.c b/net/9p/client.c
> > index f8d58b0852fe..bbab82f22c20 100644
> > --- a/net/9p/client.c
> > +++ b/net/9p/client.c
> > @@ -908,30 +908,27 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
> > - ret = p9_idpool_get(clnt->fidpool);
> > - if (ret < 0)
> > - goto error;
> > - fid->fid = ret;
> > -
...
> > + idr_preload(GFP_KERNEL);
> > + spin_lock_irq(&clnt->lock);
> > + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, UINT_MAX, GFP_NOWAIT);
>
> There's also a P9_NOFID that we shouldn't use, so the max here should be
> P9_NOFID - 1

Happy to fix that. It shouldn't actually happen, of course. I can't
imagine us having 4 billion FIDs in use at once.

> That aside, this introduces a change of behaviour: fids used to be
> alloc'd linearly from 0, which no longer holds true. That breaks one
> server (nfs-ganesha just returns ERANGE), but others seem to handle this
> just fine, so they'll just need to fix that server.
> Max aside, this looks good.

I don't understand your assertion that this is a change of behaviour.
The implementation of p9_idpool_get() uses idr_alloc(), not
idr_alloc_cyclic(), so I don't believe I've changed which FID would
be allocated.

2018-07-11 13:18:07

by Matthew Wilcox

Subject: Re: [V9fs-developer] [PATCH 2/6] 9p: Replace the fidlist with an IDR

On Wed, Jul 11, 2018 at 02:58:12PM +0200, Dominique Martinet wrote:
> > I don't understand your assertion that this is a change of behaviour.
> > The implementation of p9_idpool_get() uses idr_alloc(), not
> > idr_alloc_cyclic(), so I don't believe I've changed which FID would
> > be allocated.
>
> Hmm, I just tried mounting something with ganesha and that broke because
> it received fid=1730858632 in a TATTACH request, so something in the
> patch series made some big fids happen.
>
> If you say this isn't intended, though, this needs debugging; I'm glad I
> brought this up :P
>
> (Note that it really should be fine if it is random within the pool; I
> have notified the current developers of the problem)

Heh, unintended protocol fuzzing ;-)

I see the problem; need to initialise fid->fid to 0 before calling
idr_alloc_u32. I'd misread:

memset(&fid->qid, 0, sizeof(struct p9_qid));
as
memset(fid, 0, sizeof(struct p9_fid));

so I thought fid->fid was already 0. I'll fix that up, thanks!
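
For reference, the fixed allocation would look something like this (a
sketch against the hunk above, also folding in the P9_NOFID - 1 max
pointed out earlier in the thread):

	fid->fid = 0;	/* start searching from 0; idr_alloc_u32 writes
			 * the allocated ID back through &fid->fid */
	idr_preload(GFP_KERNEL);
	spin_lock_irq(&clnt->lock);
	ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, P9_NOFID - 1,
			    GFP_NOWAIT);
	spin_unlock_irq(&clnt->lock);
	idr_preload_end();
	/* on success (ret == 0), fid->fid holds an ID in [0, P9_NOFID - 1] */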

2018-07-11 15:28:58

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 2/6] 9p: Replace the fidlist with an IDR

Matthew Wilcox wrote on Wed, Jul 11, 2018:
> On Wed, Jul 11, 2018 at 02:40:38PM +0200, Dominique Martinet wrote:
> > > + idr_preload(GFP_KERNEL);
> > > + spin_lock_irq(&clnt->lock);
> > > + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, UINT_MAX, GFP_NOWAIT);
> >
> > There's also a P9_NOFID that we shouldn't use, so the max here should be
> > P9_NOFID - 1
>
> Happy to fix that. It shouldn't actually happen, of course. I can't
> imagine us having 4 billion FIDs in use at once.

Oh, I said that assuming it was random... But I guess we might as
well fix it.

> > That aside, this introduces a change of behaviour: fids used to be
> > alloc'd linearly from 0, which no longer holds true. That breaks one
> > server (nfs-ganesha just returns ERANGE), but others seem to handle this
> > just fine, so they'll just need to fix that server.
> > Max aside, this looks good.
>
> I don't understand your assertion that this is a change of behaviour.
> The implementation of p9_idpool_get() uses idr_alloc(), not
> idr_alloc_cyclic(), so I don't believe I've changed which FID would
> be allocated.

Hmm, I just tried mounting something with ganesha and that broke because
it received fid=1730858632 in a TATTACH request, so something in the
patch series made some big fids happen.

If you say this isn't intended, though, this needs debugging; I'm glad I
brought this up :P

(Note that it really should be fine if it is random within the pool; I
have notified the current developers of the problem)
--
Dominique

2018-07-11 15:41:54

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 2/6] 9p: Replace the fidlist with an IDR

Matthew Wilcox wrote on Thu, Jun 28, 2018:
> diff --git a/net/9p/client.c b/net/9p/client.c
> index f8d58b0852fe..bbab82f22c20 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -908,30 +908,27 @@ static struct p9_fid *p9_fid_create(struct p9_client *clnt)
> {
> int ret;
> struct p9_fid *fid;
> - unsigned long flags;
>
> p9_debug(P9_DEBUG_FID, "clnt %p\n", clnt);
> fid = kmalloc(sizeof(struct p9_fid), GFP_KERNEL);
> if (!fid)
> return NULL;
>
> - ret = p9_idpool_get(clnt->fidpool);
> - if (ret < 0)
> - goto error;
> - fid->fid = ret;
> -
> memset(&fid->qid, 0, sizeof(struct p9_qid));
> fid->mode = -1;
> fid->uid = current_fsuid();
> fid->clnt = clnt;
> fid->rdir = NULL;
> - spin_lock_irqsave(&clnt->lock, flags);
> - list_add(&fid->flist, &clnt->fidlist);
> - spin_unlock_irqrestore(&clnt->lock, flags);
>
> - return fid;
> + idr_preload(GFP_KERNEL);
> + spin_lock_irq(&clnt->lock);
> + ret = idr_alloc_u32(&clnt->fids, fid, &fid->fid, UINT_MAX, GFP_NOWAIT);

There's also a P9_NOFID that we shouldn't use, so the max here should be
P9_NOFID - 1

That aside, this introduces a change of behaviour: fids used to be
alloc'd linearly from 0, which no longer holds true. That breaks one
server (nfs-ganesha just returns ERANGE), but others seem to handle this
just fine, so they'll just need to fix that server.
Max aside, this looks good.

--
Dominique Martinet

2018-07-11 16:20:44

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 0/6] 9p: Use IDRs more effectively

Matthew Wilcox wrote on Thu, Jun 28, 2018:
> The 9p code doesn't take advantage of the IDR's ability to store
> a pointer. We can actually get rid of the p9_idpool abstraction
> and the multi-dimensional array of requests.
>
> I haven't tested these patches, so caveat maintainer.

Overall pretty good for something that hadn't even been tested!

I'm done with the comments here; the three other patches I didn't
comment on look good to me.

For [PATCH 4/6] 9p: Remove an unnecessary memory barrier, your suggested
wording in the comment is good; I've offered to do it if you aren't
going to, but since you're submitting a second version of the other
patches I suggest you do this one as well.


Thanks for the much-needed cleanup,
--
Dominique Martinet

2018-07-11 16:33:02

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 5/6] 9p: Use a slab for allocating requests

Matthew Wilcox wrote on Thu, Jun 28, 2018:
> diff --git a/include/net/9p/client.h b/include/net/9p/client.h
> index 0fa0fbab33b0..fd326811ebd4 100644
> --- a/include/net/9p/client.h
> +++ b/include/net/9p/client.h
> @@ -92,24 +85,12 @@ enum p9_req_status_t {
> * struct p9_req_t - request slots
> * @status: status of this request slot
> * @t_err: transport error
> - * @flush_tag: tag of request being flushed (for flush requests)
> * @wq: wait_queue for the client to block on for this request
> * @tc: the request fcall structure
> * @rc: the response fcall structure
> * @aux: transport specific data (provided for trans_fd migration)
> * @req_list: link for higher level objects to chain requests
> - *
> - * Transport use an array to track outstanding requests
> - * instead of a list. While this may incurr overhead during initial
> - * allocation or expansion, it makes request lookup much easier as the
> - * tag id is a index into an array. (We use tag+1 so that we can accommodate
> - * the -1 tag for the T_VERSION request).
> - * This also has the nice effect of only having to allocate wait_queues
> - * once, instead of constantly allocating and freeing them. Its possible
> - * other resources could benefit from this scheme as well.
> - *
> */
> -
> struct p9_req_t {
> int status;
> int t_err;
> @@ -117,40 +98,26 @@ struct p9_req_t {
> struct p9_fcall *tc;
> struct p9_fcall *rc;
> void *aux;
> -
> struct list_head req_list;
> };
>
> /**
> * struct p9_client - per client instance state
> - * @lock: protect @fidlist
> + * @lock: protect @fids and @reqs
> * @msize: maximum data size negotiated by protocol
> - * @dotu: extension flags negotiated by protocol
> * @proto_version: 9P protocol version to use
> * @trans_mod: module API instantiated with this client
> + * @status: XXX

Let's give this a proper comment; something like 'connection state
machine' ? (this contains Connected/BeginDisconnect/Disconnected/Hung)


> diff --git a/net/9p/client.c b/net/9p/client.c
> index 2dce8d8307cc..fd5446377445 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -241,132 +241,104 @@ static struct p9_fcall *p9_fcall_alloc(int alloc_msize)
> return fc;
> }
>
> +static struct kmem_cache *p9_req_cache;
> +
> /**
> - * p9_tag_alloc - lookup/allocate a request by tag
> - * @c: client session to lookup tag within
> - * @tag: numeric id for transaction
> - *
> - * this is a simple array lookup, but will grow the
> - * request_slots as necessary to accommodate transaction
> - * ids which did not previously have a slot.
> - *
> - * this code relies on the client spinlock to manage locks, its
> - * possible we should switch to something else, but I'd rather
> - * stick with something low-overhead for the common case.
> + * p9_req_alloc - Allocate a new request.
> + * @c: Client session.
> + * @type: Transaction type.
> + * @max_size: Maximum packet size for this request.
> *
> + * Context: Process context. What mutex might we be holding that
> + * requires GFP_NOFS?

Good question, but p9_tag_alloc happens on every single client request,
so the fs/9p functions might be in the middle of something when the
allocation here comes in, and reclaim could then try to destroy the
inode that is currently used in the request -- I'm not sure how likely
this is, but I'd rather not tempt fate :p

> + * Return: Pointer to new request.
> */
> -
> static struct p9_req_t *
> -p9_tag_alloc(struct p9_client *c, u16 tag, unsigned int max_size)
> +p9_tag_alloc(struct p9_client *c, int8_t type, unsigned int max_size)
> {
> - unsigned long flags;
> - int row, col;
> - struct p9_req_t *req;
> + struct p9_req_t *req = kmem_cache_alloc(p9_req_cache, GFP_NOFS);
> int alloc_msize = min(c->msize, max_size);
> + int tag;
>
> - /* This looks up the original request by tag so we know which
> - * buffer to read the data into */
> - tag++;
> -
> - if (tag >= c->max_tag) {
> - spin_lock_irqsave(&c->lock, flags);
> - /* check again since original check was outside of lock */
> - while (tag >= c->max_tag) {
> - row = (tag / P9_ROW_MAXTAG);
> - c->reqs[row] = kcalloc(P9_ROW_MAXTAG,
> - sizeof(struct p9_req_t), GFP_ATOMIC);
> -
> - if (!c->reqs[row]) {
> - pr_err("Couldn't grow tag array\n");
> - spin_unlock_irqrestore(&c->lock, flags);
> - return ERR_PTR(-ENOMEM);
> - }
> - for (col = 0; col < P9_ROW_MAXTAG; col++) {
> - req = &c->reqs[row][col];
> - req->status = REQ_STATUS_IDLE;
> - init_waitqueue_head(&req->wq);
> - }
> - c->max_tag += P9_ROW_MAXTAG;
> - }
> - spin_unlock_irqrestore(&c->lock, flags);
> - }
> - row = tag / P9_ROW_MAXTAG;
> - col = tag % P9_ROW_MAXTAG;
> + if (!req)
> + return NULL;
>
> - req = &c->reqs[row][col];
> - if (!req->tc)
> - req->tc = p9_fcall_alloc(alloc_msize);
> - if (!req->rc)
> - req->rc = p9_fcall_alloc(alloc_msize);
> + req->tc = p9_fcall_alloc(alloc_msize);
> + req->rc = p9_fcall_alloc(alloc_msize);
> if (!req->tc || !req->rc)
> - goto grow_failed;
> + goto free;
>
> p9pdu_reset(req->tc);
> p9pdu_reset(req->rc);
> -
> - req->tc->tag = tag-1;
> req->status = REQ_STATUS_ALLOC;
> + init_waitqueue_head(&req->wq);
> + INIT_LIST_HEAD(&req->req_list);
> +
> + idr_preload(GFP_NOFS);
> + spin_lock_irq(&c->lock);
> + if (type == P9_TVERSION)
> + tag = idr_alloc(&c->reqs, req, P9_NOTAG, P9_NOTAG + 1,
> + GFP_NOWAIT);

Well this appears to work but P9_NOTAG being '(u16)(~0)' I'm not too
confident with P9_NOTAG + 1. . . it doesn't look like it's overflowing
before the cast on my laptop but is that guaranteed?


> [..]
> @@ -1012,14 +940,11 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
>
> spin_lock_init(&clnt->lock);
> idr_init(&clnt->fids);
> -
> - err = p9_tag_init(clnt);
> - if (err < 0)
> - goto free_client;
> + idr_init(&clnt->reqs);

I do not see any call to idr_destroy, is that OK?


Looks like another good patch overall, thanks!
--
Dominique

2018-07-11 20:44:39

by Matthew Wilcox

Subject: Re: [V9fs-developer] [PATCH 5/6] 9p: Use a slab for allocating requests

On Wed, Jul 11, 2018 at 03:33:13PM +0200, Dominique Martinet wrote:
> Matthew Wilcox wrote on Thu, Jun 28, 2018:
> > /**
> > * struct p9_client - per client instance state
> > - * @lock: protect @fidlist
> > + * @lock: protect @fids and @reqs
> > * @msize: maximum data size negotiated by protocol
> > - * @dotu: extension flags negotiated by protocol
> > * @proto_version: 9P protocol version to use
> > * @trans_mod: module API instantiated with this client
> > + * @status: XXX
>
> Let's give this a proper comment; something like 'connection state
> machine' ? (this contains Connected/BeginDisconnect/Disconnected/Hung)

Sure! Will add.

> > /**
> > - * p9_tag_alloc - lookup/allocate a request by tag
> > - * @c: client session to lookup tag within
> > - * @tag: numeric id for transaction
> > - *
> > - * this is a simple array lookup, but will grow the
> > - * request_slots as necessary to accommodate transaction
> > - * ids which did not previously have a slot.
> > - *
> > - * this code relies on the client spinlock to manage locks, its
> > - * possible we should switch to something else, but I'd rather
> > - * stick with something low-overhead for the common case.
> > + * p9_req_alloc - Allocate a new request.
> > + * @c: Client session.
> > + * @type: Transaction type.
> > + * @max_size: Maximum packet size for this request.
> > *
> > + * Context: Process context. What mutex might we be holding that
> > + * requires GFP_NOFS?
>
> Good question, but p9_tag_alloc happens on every single client request,
> so the fs/9p functions might be in the middle of something when the
> allocation here comes in, and reclaim could then try to destroy the
> inode that is currently used in the request -- I'm not sure how likely
> this is, but I'd rather not tempt fate :p

Fair. I'll remove the question.

> > + INIT_LIST_HEAD(&req->req_list);
> > +
> > + idr_preload(GFP_NOFS);
> > + spin_lock_irq(&c->lock);
> > + if (type == P9_TVERSION)
> > + tag = idr_alloc(&c->reqs, req, P9_NOTAG, P9_NOTAG + 1,
> > + GFP_NOWAIT);
>
> Well this appears to work but P9_NOTAG being '(u16)(~0)' I'm not too
> confident with P9_NOTAG + 1. . . it doesn't look like it's overflowing
> before the cast on my laptop but is that guaranteed?

By my understanding of n1256.pdf ... this falls under 6.3.1.8 ("Usual
arithmetic conversions"). We have a u16 and an int. Therefore this
rule applies:

Otherwise, if the type of the operand with signed integer type can
represent all of the values of the type of the operand with unsigned
integer type, then the operand with unsigned integer type is converted
to the type of the operand with signed integer type.
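
A quick illustration of that rule (plain userspace C, nothing
9p-specific; the macro here mimics the kernel's (u16)(~0) definition):

#include <stdio.h>

#define P9_NOTAG ((unsigned short)(~0))	/* same idea as (u16)(~0) */

int main(void)
{
	/* The u16 operand is promoted to int before the addition,
	 * so there is no 16-bit wraparound: 65535 + 1 == 65536. */
	printf("%d %d\n", P9_NOTAG, P9_NOTAG + 1);	/* prints: 65535 65536 */
	return 0;
}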

> > [..]
> > @@ -1012,14 +940,11 @@ struct p9_client *p9_client_create(const char *dev_name, char *options)
> >
> > spin_lock_init(&clnt->lock);
> > idr_init(&clnt->fids);
> > -
> > - err = p9_tag_init(clnt);
> > - if (err < 0)
> > - goto free_client;
> > + idr_init(&clnt->reqs);
>
> I do not see any call to idr_destroy, is that OK?

Yes, that's fine. It used to be (back in 2013) that one had to call
idr_destroy() in order to free the preallocated idr data structures.
Now it's a no-op if called on an empty IDR, and I would expect that both
IDRs are empty at the time that it comes to unloading the module (and if
they aren't, we probably have bigger problems than a small memory leak).
Some users like to assert that the IDR is empty; most do not go to that
extent of defensive programming.
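
If we ever wanted the defensive version, it would look something like
this (a hypothetical helper called from p9_client_destroy() just before
kfree(clnt); not something I'm proposing to add):

static void p9_idr_teardown(struct p9_client *clnt)
{
	/* Both IDRs should be empty once the fid/req cleanup loops ran */
	WARN_ON(!idr_is_empty(&clnt->fids));
	WARN_ON(!idr_is_empty(&clnt->reqs));
	idr_destroy(&clnt->fids);	/* a no-op when already empty */
	idr_destroy(&clnt->reqs);
}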

2018-07-11 20:45:29

by Dominique Martinet

Subject: Re: [V9fs-developer] [PATCH 5/6] 9p: Use a slab for allocating requests

Matthew Wilcox wrote on Wed, Jul 11, 2018:
> On Wed, Jul 11, 2018 at 03:33:13PM +0200, Dominique Martinet wrote:
> > Well this appears to work but P9_NOTAG being '(u16)(~0)' I'm not too
> > confident with P9_NOTAG + 1. . . it doesn't look like it's overflowing
> > before the cast on my laptop but is that guaranteed?
>
> By my understanding of n1256.pdf ... this falls under 6.3.1.8 ("Usual
> arithmetic conversions"). We have a u16 and an int. Therefore this
> rule applies:
>
> Otherwise, if the type of the operand with signed integer type can
> represent all of the values of the type of the operand with unsigned
> integer type, then the operand with unsigned integer type is converted
> to the type of the operand with signed integer type.

Thanks for checking, that'll work then.

> > I do not see any call to idr_destroy, is that OK?
>
> Yes, that's fine. It used to be (back in 2013) that one had to call
> idr_destroy() in order to free the preallocated idr data structures.
> Now it's a no-op if called on an empty IDR, and I would expect that both
> IDRs are empty at the time that it comes to unloading the module (and if
> they aren't, we probably have bigger problems than a small memory leak).
> Some users like to assert that the IDR is empty; most do not go to that
> extent of defensive programming.

Ok, I agree we're not there yet.

Just comment nitpicks, then :)

--
Dominique Martinet