2007-10-01 19:27:33

by Tom Tucker

Subject: [RFC,PATCH 00/35] SVC Transport Switch

This is rev 2 of the new pluggable transport switch for
RPC servers. This version includes two new patches: one to add a field
for keeping track of a transport-specific header that precedes the
RPC header for deferral processing, and one that cleans up some
leftover references to svc_sock in transport-independent code.

Also in this patchset are bug fixes, whitespace cleanups, and style changes
based on reviews from Neil Brown, Bruce Fields, and Chuck Lever.

Edited text from rev 1 follows:

- The overall design of the switch has been modified to be more similar
to the client side, e.g.
- There is a transport class structure svc_xprt_class, and
- A transport independent structure is manipulated by xprt
independent code (svc_xprt)
- Further consolidation of transport independent logic out of
transport providers and into transport independent code.
- Transport independent code has been broken out into a separate file
- Transport independent functions previously adorned with _sock_ have
been renamed, e.g. svc_sock_enqueue is now svc_xprt_enqueue
- atomic refcounts have been changed to krefs
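
The switch design summarized above can be sketched in userspace C. This is
a schematic illustration of the pluggable-transport pattern only, not the
patch's actual API: a per-class ops table (cf. svc_xprt_class/svc_xprt_ops)
and a registry that transport-independent code searches by name. All names
here (xprt_ops, xprt_class, xprt_register, xprt_create) are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Per-transport operations table (cf. svc_xprt_ops). */
struct xprt_ops {
	int (*create)(unsigned short port);
};

/* Transport class: a name plus its provider's ops (cf. svc_xprt_class). */
struct xprt_class {
	const char *name;          /* e.g. "tcp", "udp" */
	const struct xprt_ops *ops;
	struct xprt_class *next;
};

static struct xprt_class *class_list;

/* Providers register their class on a global list. */
static void xprt_register(struct xprt_class *xcl)
{
	xcl->next = class_list;
	class_list = xcl;
}

/* Transport-independent creation: look up the class by name and
 * delegate to the provider (cf. svc_create_xprt). */
static int xprt_create(const char *name, unsigned short port)
{
	for (struct xprt_class *c = class_list; c; c = c->next)
		if (strcmp(c->name, name) == 0)
			return c->ops->create(port);
	return -1;  /* no such transport class registered */
}

/* A toy "tcp" provider whose create just echoes the port. */
static int tcp_create(unsigned short port) { return port; }
static const struct xprt_ops tcp_ops = { .create = tcp_create };
static struct xprt_class tcp_class = { .name = "tcp", .ops = &tcp_ops };
```

The point of the pattern is that svc_recv, enqueueing, deferral, and the
rest of the generic code only ever touch the class and ops structures,
never a provider's private state.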

I've attempted to organize the patchset so that logical changes are
clearly reviewable without too much clutter from functionally empty name
changes. This was somewhat awkward, since intermediate patches may look
ugly/broken/incomplete to some reviewers, but it avoids losing the
context of a change while keeping each patch a reasonable size. For example,
making svc_recv transport independent and moving it to the svc_xprt file
cannot be done in the same patch without losing the diffs to the svc_recv
function.

This patchset has had limited testing with TCP/UDP: the tests included
Connectathon and building the kernel on an NFS mount running over the
transport switch.

This patchset is against the 2.6.23-rc8 kernel tree.

--
Signed-off-by: Tom Tucker <[email protected]>

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-10-04 15:19:13

by Chuck Lever

Subject: Re: [RFC, PATCH 12/35] svc: Add a generic transport svc_create_xprt function

On Oct 3, 2007, at 10:30 PM, Greg Banks wrote:
> On Tue, Oct 02, 2007 at 11:39:18AM -0400, Chuck Lever wrote:
>> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>>>
>>> struct svc_xprt_ops {
>>> + struct svc_xprt *(*xpo_create)(struct svc_serv *,
>>> + struct sockaddr *,
>>> + int);
>>
>> Should xpo_create also have a length argument, as in (struct sockaddr
>> *, socklen_t) ?
>
> Consistency would be nice. Let's see, how often do we
> maintain address length fields now?
>
> svc_deferred_req.addr: yes
> svc_rqst.rq_addr: yes
> svc_sock.sk_local: no
> svc_sock.sk_remote: yes
> rpc_xprt.addr: yes
> rpc_create_args.address: \
> rpc_create_args.saddress: / one length field for both
> rpc_xprtsock_create.srcaddr: \
> rpc_xprtsock_create.dstaddr: / one length field for both
> rpc_peeraddr(): yes
> rpcb_create(): no
> xs_send_kvec(): yes
> xs_sendpages(): yes
> __svc_print_addr(): no
> svc_port_is_privileged(): no
> svc_create_socket(): yes
>
>> (or whatever the type of sockaddr lengths are: size_t perhaps?)
>
> socklen_t should be right, but the kernel doesn't seem to have one.

FYI, I've been using size_t as I convert "struct sockaddr_in *" to
"struct sockaddr *, length".
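
A userspace sketch (illustrative only, not code from the patch) of why an
explicit length argument alongside the sockaddr is worth the consistency
work Greg tallies above: without it, every callee re-derives the length
from sa_family, which breaks down for families it doesn't know about. The
names sockaddr_len and xprt_create_args are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* With a bare "struct sockaddr *", callees end up guessing the length
 * from the address family, case by case. */
static socklen_t sockaddr_len(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;  /* unknown family: caller must supply length */
	}
}

/* A (sockaddr, length) pair, as suggested for xpo_create, carries the
 * answer with the address and works for any family. */
struct xprt_create_args {
	const struct sockaddr *addr;
	socklen_t addrlen;  /* explicit length travels with the address */
};
```

In userspace the type would be socklen_t; as noted in the thread, the
kernel has no socklen_t, hence the size_t (or int) workaround.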

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2007-10-09 17:09:29

by J. Bruce Fields

Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Mon, Oct 01, 2007 at 02:27:59PM -0500, Tom Tucker wrote:
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index a796be5..e27ca14 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -123,8 +123,8 @@ int nfs_callback_up(void)
> if (!serv)
> goto out_err;
>
> - ret = svc_makesock(serv, IPPROTO_TCP, nfs_callback_set_tcpport,
> - SVC_SOCK_ANONYMOUS);
> + ret = svc_create_xprt(serv, "tcp", nfs_callback_set_tcpport,
> + SVC_SOCK_ANONYMOUS);
> if (ret <= 0)
> goto out_destroy;
> nfs_callback_tcpport = ret;

Looks like svc_makesock returned a port number, where svc_create_xprt
returns 0 or -ERRNO. This is breaking NFSv4 callbacks.

--b.


2007-10-09 18:33:35

by Tom Tucker

Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Tue, 2007-10-09 at 13:09 -0400, J. Bruce Fields wrote:
> On Mon, Oct 01, 2007 at 02:27:59PM -0500, Tom Tucker wrote:
> > diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> > index a796be5..e27ca14 100644
> > --- a/fs/nfs/callback.c
> > +++ b/fs/nfs/callback.c
> > @@ -123,8 +123,8 @@ int nfs_callback_up(void)
> > if (!serv)
> > goto out_err;
> >
> > - ret = svc_makesock(serv, IPPROTO_TCP, nfs_callback_set_tcpport,
> > - SVC_SOCK_ANONYMOUS);
> > + ret = svc_create_xprt(serv, "tcp", nfs_callback_set_tcpport,
> > + SVC_SOCK_ANONYMOUS);
> > if (ret <= 0)
> > goto out_destroy;
> > nfs_callback_tcpport = ret;
>
> Looks like svc_makesock returned a port number, where svc_create_xprt
> returns a 0 or -ERRNO. This is breaking nfsv4 callbacks.
>

Bruce:

Yikes! I missed that. My inclination is to have this port number pulled
from xpt_local in the svc_xprt structure. What do you think?

Tom

> --b.



2007-10-09 19:49:18

by J. Bruce Fields

Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Tue, Oct 09, 2007 at 01:32:18PM -0500, Tom Tucker wrote:
> On Tue, 2007-10-09 at 13:09 -0400, J. Bruce Fields wrote:
> > On Mon, Oct 01, 2007 at 02:27:59PM -0500, Tom Tucker wrote:
> > > diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> > > index a796be5..e27ca14 100644
> > > --- a/fs/nfs/callback.c
> > > +++ b/fs/nfs/callback.c
> > > @@ -123,8 +123,8 @@ int nfs_callback_up(void)
> > > if (!serv)
> > > goto out_err;
> > >
> > > - ret = svc_makesock(serv, IPPROTO_TCP, nfs_callback_set_tcpport,
> > > - SVC_SOCK_ANONYMOUS);
> > > + ret = svc_create_xprt(serv, "tcp", nfs_callback_set_tcpport,
> > > + SVC_SOCK_ANONYMOUS);
> > > if (ret <= 0)
> > > goto out_destroy;
> > > nfs_callback_tcpport = ret;
> >
> > Looks like svc_makesock returned a port number, where svc_create_xprt
> > returns a 0 or -ERRNO. This is breaking nfsv4 callbacks.
> >
>
> Bruce:
>
> Yikes! I missed that. My inclination is to have this port number pulled
> from xpt_local in the svc_xprt structure. What do you think?

Were you thinking of doing that in the caller, or inside svc_create_xprt
itself? I'd be inclined to do the latter, and keep returning the port
number on success, though it's true that this seems to be the only
caller that uses the port number.
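
A userspace sketch of the fix being discussed, hedged accordingly: after
the transport is bound, recover the port from the stored local address
(the kernel analogue would read svc_xprt.xpt_local inside svc_create_xprt)
so the create path can keep svc_makesock's contract of returning the port
on success. The helper name local_port is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Pull the bound port out of a stored local address, the way the
 * create path could before returning success. Returns -1 for an
 * address family it doesn't understand. */
static int local_port(const struct sockaddr_storage *ss)
{
	switch (ss->ss_family) {
	case AF_INET:
		return ntohs(((const struct sockaddr_in *)ss)->sin_port);
	case AF_INET6:
		return ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
	default:
		return -1;
	}
}
```

Doing this inside the create function, rather than in each caller, keeps
the old return-value contract and fixes nfs_callback_up without touching it.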

By the way, I guess it's actually the previous patch that introduced the
problem.

--b.


2007-10-09 20:19:34

by J. Bruce Fields

Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Tue, Oct 09, 2007 at 01:32:18PM -0500, Tom Tucker wrote:
> On Tue, 2007-10-09 at 13:09 -0400, J. Bruce Fields wrote:
> > On Mon, Oct 01, 2007 at 02:27:59PM -0500, Tom Tucker wrote:
> > > diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> > > index a796be5..e27ca14 100644
> > > --- a/fs/nfs/callback.c
> > > +++ b/fs/nfs/callback.c
> > > @@ -123,8 +123,8 @@ int nfs_callback_up(void)
> > > if (!serv)
> > > goto out_err;
> > >
> > > - ret = svc_makesock(serv, IPPROTO_TCP, nfs_callback_set_tcpport,
> > > - SVC_SOCK_ANONYMOUS);
> > > + ret = svc_create_xprt(serv, "tcp", nfs_callback_set_tcpport,
> > > + SVC_SOCK_ANONYMOUS);
> > > if (ret <= 0)
> > > goto out_destroy;
> > > nfs_callback_tcpport = ret;
> >
> > Looks like svc_makesock returned a port number, where svc_create_xprt
> > returns a 0 or -ERRNO. This is breaking nfsv4 callbacks.
> >
>
> Bruce:
>
> Yikes! I missed that. My inclination is to have this port number pulled
> from xpt_local in the svc_xprt structure. What do you think?

Minor nit: the printk() at the end of fs/lockd/svc.c:make_socks() still
refers to "makesock". Maybe something like "failed to create sockets"
would be more informative anyway.

--b.


2007-10-01 19:28:48

by Tom Tucker

Subject: [RFC, PATCH 32/35] svc: Move the xprt independent code to the svc_xprt.c file


This functionally trivial patch moves all of the transport independent
functions from the svcsock.c file to the transport independent svc_xprt.c
file.

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 4
net/sunrpc/svc_xprt.c | 746 +++++++++++++++++++++++++++++++++++++++
net/sunrpc/svcsock.c | 750 ---------------------------------------
3 files changed, 751 insertions(+), 749 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 47ad941..94c40f2 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -73,10 +73,14 @@ void svc_xprt_init(struct svc_xprt_class
struct svc_serv *);
int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);
void svc_xprt_received(struct svc_xprt *);
+void svc_xprt_enqueue(struct svc_xprt *xprt);
+int svc_port_is_privileged(struct sockaddr *sin);
void svc_xprt_put(struct svc_xprt *xprt);
static inline void svc_xprt_get(struct svc_xprt *xprt)
{
kref_get(&xprt->xpt_ref);
}
+void svc_delete_xprt(struct svc_xprt *xprt);
+void svc_close_xprt(struct svc_xprt *xprt);

#endif /* SUNRPC_SVC_XPRT_H */
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 56cda03..f408626 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -35,6 +35,17 @@ #include <linux/sunrpc/svc_xprt.h>

#define RPCDBG_FACILITY RPCDBG_SVCXPRT

+static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt);
+static int svc_deferred_recv(struct svc_rqst *rqstp);
+static struct cache_deferred_req *svc_defer(struct cache_req *req);
+static void svc_age_temp_xprts(unsigned long closure);
+/* apparently the "standard" is that clients close
+ * idle connections after 5 minutes, servers after
+ * 6 minutes
+ * http://www.connectathon.org/talks96/nfstcp.pdf
+ */
+static int svc_conn_age_period = 6*60;
+
/* List of registered transport classes */
static spinlock_t svc_xprt_class_lock = SPIN_LOCK_UNLOCKED;
static LIST_HEAD(svc_xprt_class_list);
@@ -162,3 +173,738 @@ int svc_create_xprt(struct svc_serv *ser
return ret;
}
EXPORT_SYMBOL_GPL(svc_create_xprt);
+
+/*
+ * Queue up an idle server thread. Must have pool->sp_lock held.
+ * Note: this is really a stack rather than a queue, so that we only
+ * use as many different threads as we need, and the rest don't pollute
+ * the cache.
+ */
+static inline void
+svc_thread_enqueue(struct svc_pool *pool, struct svc_rqst *rqstp)
+{
+ list_add(&rqstp->rq_list, &pool->sp_threads);
+}
+
+/*
+ * Dequeue an nfsd thread. Must have pool->sp_lock held.
+ */
+static inline void
+svc_thread_dequeue(struct svc_pool *pool, struct svc_rqst *rqstp)
+{
+ list_del(&rqstp->rq_list);
+}
+
+/*
+ * Queue up a transport with data pending. If there are idle nfsd
+ * processes, wake 'em up.
+ *
+ */
+void
+svc_xprt_enqueue(struct svc_xprt *xprt)
+{
+ struct svc_serv *serv = xprt->xpt_server;
+ struct svc_pool *pool;
+ struct svc_rqst *rqstp;
+ int cpu;
+
+ if (!(xprt->xpt_flags &
+ ((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
+ return;
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags))
+ return;
+
+ cpu = get_cpu();
+ pool = svc_pool_for_cpu(xprt->xpt_server, cpu);
+ put_cpu();
+
+ spin_lock_bh(&pool->sp_lock);
+
+ if (!list_empty(&pool->sp_threads) &&
+ !list_empty(&pool->sp_sockets))
+ printk(KERN_ERR
+ "svc_xprt_enqueue: threads and xprt both waiting??\n");
+
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags)) {
+ /* Don't enqueue dead transports */
+ dprintk("svc: transport %p is dead, not enqueued\n", xprt);
+ goto out_unlock;
+ }
+
+ /* Mark transport as busy. It will remain in this state until the
+ * server has processed all pending data and put the transport back
+ * on the idle list. We update XPT_BUSY atomically because
+ * it also guards against trying to enqueue the svc_sock twice.
+ */
+ if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags)) {
+ /* Don't enqueue transport while already enqueued */
+ dprintk("svc: transport %p busy, not enqueued\n", xprt);
+ goto out_unlock;
+ }
+ BUG_ON(xprt->xpt_pool != NULL);
+ xprt->xpt_pool = pool;
+
+ /* Handle pending connection */
+ if (test_bit(XPT_CONN, &xprt->xpt_flags))
+ goto process;
+
+ /* Handle close in-progress */
+ if (test_bit(XPT_CLOSE, &xprt->xpt_flags))
+ goto process;
+
+ /* Check if we have space to reply to a request */
+ if (!xprt->xpt_ops.xpo_has_wspace(xprt)) {
+ /* Don't enqueue while not enough space for reply */
+ dprintk("svc: no write space, transport %p not enqueued\n", xprt);
+ xprt->xpt_pool = NULL;
+ clear_bit(XPT_BUSY, &xprt->xpt_flags);
+ goto out_unlock;
+ }
+
+ process:
+ if (!list_empty(&pool->sp_threads)) {
+ rqstp = list_entry(pool->sp_threads.next,
+ struct svc_rqst,
+ rq_list);
+ dprintk("svc: transport %p served by daemon %p\n",
+ xprt, rqstp);
+ svc_thread_dequeue(pool, rqstp);
+ if (rqstp->rq_xprt)
+ printk(KERN_ERR
+ "svc_xprt_enqueue: server %p, rq_xprt=%p!\n",
+ rqstp, rqstp->rq_xprt);
+ rqstp->rq_xprt = xprt;
+ svc_xprt_get(xprt);
+ rqstp->rq_reserved = serv->sv_max_mesg;
+ atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
+ BUG_ON(xprt->xpt_pool != pool);
+ wake_up(&rqstp->rq_wait);
+ } else {
+ dprintk("svc: transport %p put into queue\n", xprt);
+ list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
+ BUG_ON(xprt->xpt_pool != pool);
+ }
+
+out_unlock:
+ spin_unlock_bh(&pool->sp_lock);
+}
+EXPORT_SYMBOL_GPL(svc_xprt_enqueue);
+
+/*
+ * Dequeue the first transport. Must be called with the pool->sp_lock held.
+ */
+static inline struct svc_xprt *
+svc_xprt_dequeue(struct svc_pool *pool)
+{
+ struct svc_xprt *xprt;
+
+ if (list_empty(&pool->sp_sockets))
+ return NULL;
+
+ xprt = list_entry(pool->sp_sockets.next,
+ struct svc_xprt, xpt_ready);
+ list_del_init(&xprt->xpt_ready);
+
+ dprintk("svc: transport %p dequeued, inuse=%d\n",
+ xprt, atomic_read(&xprt->xpt_ref.refcount));
+
+ return xprt;
+}
+
+/*
+ * Having read something from a transport, check whether it
+ * needs to be re-enqueued.
+ * Note: XPT_DATA only gets cleared when a read-attempt finds
+ * no (or insufficient) data.
+ */
+void
+svc_xprt_received(struct svc_xprt *xprt)
+{
+ xprt->xpt_pool = NULL;
+ clear_bit(XPT_BUSY, &xprt->xpt_flags);
+ svc_xprt_enqueue(xprt);
+}
+EXPORT_SYMBOL_GPL(svc_xprt_received);
+
+/**
+ * svc_reserve - change the space reserved for the reply to a request.
+ * @rqstp: The request in question
+ * @space: new max space to reserve
+ *
+ * Each request reserves some space on the output queue of the transport
+ * to make sure the reply fits. This function reduces that reserved
+ * space to be the amount of space used already, plus @space.
+ *
+ */
+void svc_reserve(struct svc_rqst *rqstp, int space)
+{
+ space += rqstp->rq_res.head[0].iov_len;
+
+ if (space < rqstp->rq_reserved) {
+ struct svc_xprt *xprt = rqstp->rq_xprt;
+ atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
+ rqstp->rq_reserved = space;
+
+ svc_xprt_enqueue(xprt);
+ }
+}
+
+static void
+svc_xprt_release(struct svc_rqst *rqstp)
+{
+ struct svc_xprt *xprt = rqstp->rq_xprt;
+
+ rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);
+
+ svc_free_res_pages(rqstp);
+ rqstp->rq_res.page_len = 0;
+ rqstp->rq_res.page_base = 0;
+
+ /* Reset response buffer and release
+ * the reservation.
+ * But first, check that enough space was reserved
+ * for the reply, otherwise we have a bug!
+ */
+ if ((rqstp->rq_res.len) > rqstp->rq_reserved)
+ printk(KERN_ERR "RPC request reserved %d but used %d\n",
+ rqstp->rq_reserved,
+ rqstp->rq_res.len);
+
+ rqstp->rq_res.head[0].iov_len = 0;
+ svc_reserve(rqstp, 0);
+ rqstp->rq_xprt = NULL;
+
+ svc_xprt_put(xprt);
+}
+
+/*
+ * External function to wake up a server waiting for data
+ * This really only makes sense for services like lockd
+ * which have exactly one thread anyway.
+ */
+void
+svc_wake_up(struct svc_serv *serv)
+{
+ struct svc_rqst *rqstp;
+ unsigned int i;
+ struct svc_pool *pool;
+
+ for (i = 0; i < serv->sv_nrpools; i++) {
+ pool = &serv->sv_pools[i];
+
+ spin_lock_bh(&pool->sp_lock);
+ if (!list_empty(&pool->sp_threads)) {
+ rqstp = list_entry(pool->sp_threads.next,
+ struct svc_rqst,
+ rq_list);
+ dprintk("svc: daemon %p woken up.\n", rqstp);
+ /*
+ svc_thread_dequeue(pool, rqstp);
+ rqstp->rq_xprt = NULL;
+ */
+ wake_up(&rqstp->rq_wait);
+ }
+ spin_unlock_bh(&pool->sp_lock);
+ }
+}
+
+static void
+svc_check_conn_limits(struct svc_serv *serv)
+{
+ char buf[RPC_MAX_ADDRBUFLEN];
+
+ /* make sure that we don't have too many active connections.
+ * If we have, something must be dropped.
+ *
+ * There's no point in trying to do random drop here for
+ * DoS prevention. The NFS clients does 1 reconnect in 15
+ * seconds. An attacker can easily beat that.
+ *
+ * The only somewhat efficient mechanism would be if drop
+ * old connections from the same IP first.
+ */
+ if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
+ struct svc_xprt *xprt = NULL;
+ spin_lock_bh(&serv->sv_lock);
+ if (!list_empty(&serv->sv_tempsocks)) {
+ if (net_ratelimit()) {
+ /* Try to help the admin */
+ printk(KERN_NOTICE "%s: too many open "
+ "connections, consider increasing the "
+ "number of nfsd threads\n",
+ serv->sv_name);
+ printk(KERN_NOTICE
+ "%s: last connection from %s\n",
+ serv->sv_name, buf);
+ }
+ /*
+ * Always select the oldest connection. It's not fair,
+ * but so is life
+ */
+ xprt = list_entry(serv->sv_tempsocks.prev,
+ struct svc_xprt,
+ xpt_list);
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ svc_xprt_get(xprt);
+ }
+ spin_unlock_bh(&serv->sv_lock);
+
+ if (xprt) {
+ svc_xprt_enqueue(xprt);
+ svc_xprt_put(xprt);
+ }
+ }
+}
+
+static inline void svc_copy_addr(struct svc_rqst *rqstp, struct svc_xprt *xprt)
+{
+ struct sockaddr *sin;
+
+ /* sock_recvmsg doesn't fill in the name/namelen, so we must..
+ */
+ memcpy(&rqstp->rq_addr, &xprt->xpt_remote, xprt->xpt_remotelen);
+ rqstp->rq_addrlen = xprt->xpt_remotelen;
+
+ /* Destination address in request is needed for binding the
+ * source address in RPC callbacks later.
+ */
+ sin = (struct sockaddr *)&xprt->xpt_local;
+ switch (sin->sa_family) {
+ case AF_INET:
+ rqstp->rq_daddr.addr = ((struct sockaddr_in *)sin)->sin_addr;
+ break;
+ case AF_INET6:
+ rqstp->rq_daddr.addr6 = ((struct sockaddr_in6 *)sin)->sin6_addr;
+ break;
+ }
+}
+
+/*
+ * Receive the next request on any transport. This code is carefully
+ * organised not to touch any cachelines in the shared svc_serv
+ * structure, only cachelines in the local svc_pool.
+ */
+int
+svc_recv(struct svc_rqst *rqstp, long timeout)
+{
+ struct svc_xprt *xprt = NULL;
+ struct svc_serv *serv = rqstp->rq_server;
+ struct svc_pool *pool = rqstp->rq_pool;
+ int len, i;
+ int pages;
+ struct xdr_buf *arg;
+ DECLARE_WAITQUEUE(wait, current);
+
+ dprintk("svc: server %p waiting for data (to = %ld)\n",
+ rqstp, timeout);
+
+ if (rqstp->rq_xprt)
+ printk(KERN_ERR
+ "svc_recv: service %p, transport not NULL!\n",
+ rqstp);
+ if (waitqueue_active(&rqstp->rq_wait))
+ printk(KERN_ERR
+ "svc_recv: service %p, wait queue active!\n",
+ rqstp);
+
+
+ /* now allocate needed pages. If we get a failure, sleep briefly */
+ pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE;
+ for (i = 0; i < pages ; i++)
+ while (rqstp->rq_pages[i] == NULL) {
+ struct page *p = alloc_page(GFP_KERNEL);
+ if (!p)
+ schedule_timeout_uninterruptible(msecs_to_jiffies(500));
+ rqstp->rq_pages[i] = p;
+ }
+ rqstp->rq_pages[i++] = NULL; /* this might be seen in nfs_read_actor */
+ BUG_ON(pages >= RPCSVC_MAXPAGES);
+
+ /* Make arg->head point to first page and arg->pages point to rest */
+ arg = &rqstp->rq_arg;
+ arg->head[0].iov_base = page_address(rqstp->rq_pages[0]);
+ arg->head[0].iov_len = PAGE_SIZE;
+ arg->pages = rqstp->rq_pages + 1;
+ arg->page_base = 0;
+ /* save at least one page for response */
+ arg->page_len = (pages-2)*PAGE_SIZE;
+ arg->len = (pages-1)*PAGE_SIZE;
+ arg->tail[0].iov_len = 0;
+
+ try_to_freeze();
+ cond_resched();
+ if (signalled())
+ return -EINTR;
+
+ spin_lock_bh(&pool->sp_lock);
+ if ((xprt = svc_xprt_dequeue(pool)) != NULL) {
+ rqstp->rq_xprt = xprt;
+ svc_xprt_get(xprt);
+ rqstp->rq_reserved = serv->sv_max_mesg;
+ atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
+ } else {
+ /* No data pending. Go to sleep */
+ svc_thread_enqueue(pool, rqstp);
+
+ /*
+ * We have to be able to interrupt this wait
+ * to bring down the daemons ...
+ */
+ set_current_state(TASK_INTERRUPTIBLE);
+ add_wait_queue(&rqstp->rq_wait, &wait);
+ spin_unlock_bh(&pool->sp_lock);
+
+ schedule_timeout(timeout);
+
+ try_to_freeze();
+
+ spin_lock_bh(&pool->sp_lock);
+ remove_wait_queue(&rqstp->rq_wait, &wait);
+
+ if (!(xprt = rqstp->rq_xprt)) {
+ svc_thread_dequeue(pool, rqstp);
+ spin_unlock_bh(&pool->sp_lock);
+ dprintk("svc: server %p, no data yet\n", rqstp);
+ return signalled()? -EINTR : -EAGAIN;
+ }
+ }
+ spin_unlock_bh(&pool->sp_lock);
+
+ len = 0;
+ if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
+ dprintk("svc_recv: found XPT_CLOSE\n");
+ svc_delete_xprt(xprt);
+ } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
+ struct svc_xprt *newxpt;
+ newxpt = xprt->xpt_ops.xpo_accept(xprt);
+ if (newxpt) {
+ svc_xprt_received(newxpt);
+ /*
+ * We know this module_get will succeed because the
+ * listener holds a reference too
+ */
+ __module_get(newxpt->xpt_class->xcl_owner);
+ svc_check_conn_limits(xprt->xpt_server);
+ spin_lock_bh(&serv->sv_lock);
+ set_bit(XPT_TEMP, &newxpt->xpt_flags);
+ list_add(&newxpt->xpt_list, &serv->sv_tempsocks);
+ serv->sv_tmpcnt++;
+ if (serv->sv_temptimer.function == NULL) {
+ /* setup timer to age temp transports */
+ setup_timer(&serv->sv_temptimer, svc_age_temp_xprts,
+ (unsigned long)serv);
+ mod_timer(&serv->sv_temptimer,
+ jiffies + svc_conn_age_period * HZ);
+ }
+ spin_unlock_bh(&serv->sv_lock);
+ }
+ svc_xprt_received(xprt);
+ } else {
+ dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
+ rqstp, pool->sp_id, xprt,
+ atomic_read(&xprt->xpt_ref.refcount));
+
+ if ((rqstp->rq_deferred = svc_deferred_dequeue(xprt))) {
+ svc_xprt_received(xprt);
+ len = svc_deferred_recv(rqstp);
+ } else
+ len = xprt->xpt_ops.xpo_recvfrom(rqstp);
+ svc_copy_addr(rqstp, xprt);
+ dprintk("svc: got len=%d\n", len);
+ }
+
+ /* No data, incomplete (TCP) read, or accept() */
+ if (len == 0 || len == -EAGAIN) {
+ rqstp->rq_res.len = 0;
+ svc_xprt_release(rqstp);
+ return -EAGAIN;
+ }
+ clear_bit(XPT_OLD, &xprt->xpt_flags);
+
+ rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
+ rqstp->rq_chandle.defer = svc_defer;
+
+ if (serv->sv_stats)
+ serv->sv_stats->netcnt++;
+ return len;
+}
+
+/*
+ * Drop request
+ */
+void
+svc_drop(struct svc_rqst *rqstp)
+{
+ dprintk("svc: xprt %p dropped request\n", rqstp->rq_xprt);
+ svc_xprt_release(rqstp);
+}
+
+/*
+ * Return reply to client.
+ */
+int
+svc_send(struct svc_rqst *rqstp)
+{
+ struct svc_xprt *xprt;
+ int len;
+ struct xdr_buf *xb;
+
+ if ((xprt = rqstp->rq_xprt) == NULL) {
+ printk(KERN_WARNING "NULL transport pointer in %s:%d\n",
+ __FILE__, __LINE__);
+ return -EFAULT;
+ }
+
+ /* release the receive skb before sending the reply */
+ rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);
+
+ /* calculate over-all length */
+ xb = & rqstp->rq_res;
+ xb->len = xb->head[0].iov_len +
+ xb->page_len +
+ xb->tail[0].iov_len;
+
+ /* Grab mutex to serialize outgoing data. */
+ mutex_lock(&xprt->xpt_mutex);
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags))
+ len = -ENOTCONN;
+ else
+ len = xprt->xpt_ops.xpo_sendto(rqstp);
+ mutex_unlock(&xprt->xpt_mutex);
+ svc_xprt_release(rqstp);
+
+ if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
+ return 0;
+ return len;
+}
+
+/*
+ * Timer function to close old temporary transports, using
+ * a mark-and-sweep algorithm.
+ */
+static void
+svc_age_temp_xprts(unsigned long closure)
+{
+ struct svc_serv *serv = (struct svc_serv *)closure;
+ struct svc_xprt *xprt;
+ struct list_head *le, *next;
+ LIST_HEAD(to_be_aged);
+
+ dprintk("svc_age_temp_xprts\n");
+
+ if (!spin_trylock_bh(&serv->sv_lock)) {
+ /* busy, try again 1 sec later */
+ dprintk("svc_age_temp_xprts: busy\n");
+ mod_timer(&serv->sv_temptimer, jiffies + HZ);
+ return;
+ }
+
+ list_for_each_safe(le, next, &serv->sv_tempsocks) {
+ xprt = list_entry(le, struct svc_xprt, xpt_list);
+
+ /* First time through, just mark it OLD. Second time
+ * through, close it. */
+ if (!test_and_set_bit(XPT_OLD, &xprt->xpt_flags))
+ continue;
+ if (atomic_read(&xprt->xpt_ref.refcount) > 1
+ || test_bit(XPT_BUSY, &xprt->xpt_flags))
+ continue;
+ svc_xprt_get(xprt);
+ list_move(le, &to_be_aged);
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ set_bit(XPT_DETACHED, &xprt->xpt_flags);
+ }
+ spin_unlock_bh(&serv->sv_lock);
+
+ while (!list_empty(&to_be_aged)) {
+ le = to_be_aged.next;
+ /* fiddling the xpt_list node is safe 'cos we're XPT_DETACHED */
+ list_del_init(le);
+ xprt = list_entry(le, struct svc_xprt, xpt_list);
+
+ dprintk("queuing xprt %p for closing\n", xprt);
+
+ /* a thread will dequeue and close it soon */
+ svc_xprt_enqueue(xprt);
+ svc_xprt_put(xprt);
+ }
+
+ mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
+}
+
+/*
+ * Remove a dead transport
+ */
+void
+svc_delete_xprt(struct svc_xprt *xprt)
+{
+ struct svc_serv *serv;
+
+ dprintk("svc: svc_delete_xprt(%p)\n", xprt);
+
+ serv = xprt->xpt_server;
+
+ xprt->xpt_ops.xpo_detach(xprt);
+
+ spin_lock_bh(&serv->sv_lock);
+
+ if (!test_and_set_bit(XPT_DETACHED, &xprt->xpt_flags))
+ list_del_init(&xprt->xpt_list);
+ /*
+ * We used to delete the transport from whichever list
+ * it's sk_xprt.xpt_ready node was on, but we don't actually
+ * need to. This is because the only time we're called
+ * while still attached to a queue, the queue itself
+ * is about to be destroyed (in svc_destroy).
+ */
+ if (!test_and_set_bit(XPT_DEAD, &xprt->xpt_flags)) {
+ BUG_ON(atomic_read(&xprt->xpt_ref.refcount) < 2);
+ svc_xprt_put(xprt);
+ if (test_bit(XPT_TEMP, &xprt->xpt_flags))
+ serv->sv_tmpcnt--;
+ }
+
+ spin_unlock_bh(&serv->sv_lock);
+}
+
+void svc_close_xprt(struct svc_xprt *xprt)
+{
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
+ /* someone else will have to effect the close */
+ return;
+
+ svc_xprt_get(xprt);
+ svc_delete_xprt(xprt);
+ clear_bit(XPT_BUSY, &xprt->xpt_flags);
+ svc_xprt_put(xprt);
+}
+
+void svc_close_all(struct list_head *xprt_list)
+{
+ struct svc_xprt *xprt;
+ struct svc_xprt *tmp;
+
+ list_for_each_entry_safe(xprt, tmp, xprt_list, xpt_list) {
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ if (test_bit(XPT_BUSY, &xprt->xpt_flags)) {
+ /* Waiting to be processed, but no threads left,
+ * So just remove it from the waiting list
+ */
+ list_del_init(&xprt->xpt_ready);
+ clear_bit(XPT_BUSY, &xprt->xpt_flags);
+ }
+ svc_close_xprt(xprt);
+ }
+}
+
+int svc_port_is_privileged(struct sockaddr *sin)
+{
+ switch (sin->sa_family) {
+ case AF_INET:
+ return ntohs(((struct sockaddr_in *)sin)->sin_port)
+ < PROT_SOCK;
+ case AF_INET6:
+ return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
+ < PROT_SOCK;
+ default:
+ return 0;
+ }
+}
+
+/*
+ * Handle defer and revisit of requests
+ */
+
+static void svc_revisit(struct cache_deferred_req *dreq, int too_many)
+{
+ struct svc_deferred_req *dr = container_of(dreq, struct svc_deferred_req, handle);
+ struct svc_xprt *xprt = dr->xprt;
+
+ if (too_many) {
+ svc_xprt_put(xprt);
+ kfree(dr);
+ return;
+ }
+ dprintk("revisit queued\n");
+ dr->xprt = NULL;
+ spin_lock(&xprt->xpt_lock);
+ list_add(&dr->handle.recent, &xprt->xpt_deferred);
+ spin_unlock(&xprt->xpt_lock);
+ set_bit(XPT_DEFERRED, &xprt->xpt_flags);
+ svc_xprt_enqueue(xprt);
+ svc_xprt_put(xprt);
+}
+
+static struct cache_deferred_req *
+svc_defer(struct cache_req *req)
+{
+ struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle);
+ int size = sizeof(struct svc_deferred_req) + (rqstp->rq_arg.len);
+ struct svc_deferred_req *dr;
+
+ if (rqstp->rq_arg.page_len)
+ return NULL; /* if more than a page, give up FIXME */
+ if (rqstp->rq_deferred) {
+ dr = rqstp->rq_deferred;
+ rqstp->rq_deferred = NULL;
+ } else {
+ int skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
+ /* FIXME maybe discard if size too large */
+ dr = kmalloc(size, GFP_KERNEL);
+ if (dr == NULL)
+ return NULL;
+
+ dr->handle.owner = rqstp->rq_server;
+ dr->prot = rqstp->rq_prot;
+ memcpy(&dr->addr, &rqstp->rq_addr, rqstp->rq_addrlen);
+ dr->addrlen = rqstp->rq_addrlen;
+ dr->daddr = rqstp->rq_daddr;
+ dr->argslen = rqstp->rq_arg.len >> 2;
+ memcpy(dr->args, rqstp->rq_arg.head[0].iov_base-skip, dr->argslen<<2);
+ }
+ svc_xprt_get(rqstp->rq_xprt);
+ dr->xprt = rqstp->rq_xprt;
+
+ dr->handle.revisit = svc_revisit;
+ return &dr->handle;
+}
+
+/*
+ * recv data from a deferred request into an active one
+ */
+static int svc_deferred_recv(struct svc_rqst *rqstp)
+{
+ struct svc_deferred_req *dr = rqstp->rq_deferred;
+
+ rqstp->rq_arg.head[0].iov_base = dr->args;
+ rqstp->rq_arg.head[0].iov_len = dr->argslen<<2;
+ rqstp->rq_arg.page_len = 0;
+ rqstp->rq_arg.len = dr->argslen<<2;
+ rqstp->rq_prot = dr->prot;
+ memcpy(&rqstp->rq_addr, &dr->addr, dr->addrlen);
+ rqstp->rq_addrlen = dr->addrlen;
+ rqstp->rq_daddr = dr->daddr;
+ rqstp->rq_respages = rqstp->rq_pages;
+ return dr->argslen<<2;
+}
+
+
+static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt)
+{
+ struct svc_deferred_req *dr = NULL;
+
+ if (!test_bit(XPT_DEFERRED, &xprt->xpt_flags))
+ return NULL;
+ spin_lock(&xprt->xpt_lock);
+ clear_bit(XPT_DEFERRED, &xprt->xpt_flags);
+ if (!list_empty(&xprt->xpt_deferred)) {
+ dr = list_entry(xprt->xpt_deferred.next,
+ struct svc_deferred_req,
+ handle.recent);
+ list_del_init(&dr->handle.recent);
+ set_bit(XPT_DEFERRED, &xprt->xpt_flags);
+ }
+ spin_unlock(&xprt->xpt_lock);
+ return dr;
+}
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 353aae2..40badd7 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -50,7 +50,7 @@ #include <linux/sunrpc/stats.h>
/* SMP locking strategy:
*
* svc_pool->sp_lock protects most of the fields of that pool.
- * svc_serv->sv_lock protects sv_tempsocks, sv_permsocks, sv_tmpcnt.
+ * svc_serv->sv_lock protects sv_tempsocks, sv_permsocks, sv_tmpcnt.
* when both need to be taken (rare), svc_serv->sv_lock is first.
* BKL protects svc_serv->sv_nrthread.
* svc_sock->sk_lock protects the svc_sock->sk_deferred list
@@ -80,27 +80,14 @@ #define RPCDBG_FACILITY RPCDBG_SVCXPRT

static struct svc_sock *svc_setup_socket(struct svc_serv *, struct socket *,
int *errp, int flags);
-static void svc_delete_xprt(struct svc_xprt *xprt);
static void svc_udp_data_ready(struct sock *, int);
static int svc_udp_recvfrom(struct svc_rqst *);
static int svc_udp_sendto(struct svc_rqst *);
-static void svc_close_xprt(struct svc_xprt *xprt);
static void svc_sock_detach(struct svc_xprt *);
static void svc_sock_free(struct svc_xprt *);

-static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt);
-static int svc_deferred_recv(struct svc_rqst *rqstp);
-static struct cache_deferred_req *svc_defer(struct cache_req *req);
static struct svc_xprt *
svc_create_socket(struct svc_serv *, int, struct sockaddr *, int, int);
-static void svc_age_temp_xprts(unsigned long closure);
-
-/* apparently the "standard" is that clients close
- * idle connections after 5 minutes, servers after
- * 6 minutes
- * http://www.connectathon.org/talks96/nfstcp.pdf
- */
-static int svc_conn_age_period = 6*60;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
static struct lock_class_key svc_key[2];
@@ -167,27 +154,6 @@ char *svc_print_addr(struct svc_rqst *rq
EXPORT_SYMBOL_GPL(svc_print_addr);

/*
- * Queue up an idle server thread. Must have pool->sp_lock held.
- * Note: this is really a stack rather than a queue, so that we only
- * use as many different threads as we need, and the rest don't pollute
- * the cache.
- */
-static inline void
-svc_thread_enqueue(struct svc_pool *pool, struct svc_rqst *rqstp)
-{
- list_add(&rqstp->rq_list, &pool->sp_threads);
-}
-
-/*
- * Dequeue an nfsd thread. Must have pool->sp_lock held.
- */
-static inline void
-svc_thread_dequeue(struct svc_pool *pool, struct svc_rqst *rqstp)
-{
- list_del(&rqstp->rq_list);
-}
-
-/*
* Release an skbuff after use
*/
static void
@@ -226,219 +192,6 @@ svc_sock_wspace(struct svc_sock *svsk)
return wspace;
}

-/*
- * Queue up a socket with data pending. If there are idle nfsd
- * processes, wake 'em up.
- *
- */
-void
-svc_xprt_enqueue(struct svc_xprt *xprt)
-{
- struct svc_serv *serv = xprt->xpt_server;
- struct svc_pool *pool;
- struct svc_rqst *rqstp;
- int cpu;
-
- if (!(xprt->xpt_flags &
- ((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
- return;
- if (test_bit(XPT_DEAD, &xprt->xpt_flags))
- return;
-
- cpu = get_cpu();
- pool = svc_pool_for_cpu(xprt->xpt_server, cpu);
- put_cpu();
-
- spin_lock_bh(&pool->sp_lock);
-
- if (!list_empty(&pool->sp_threads) &&
- !list_empty(&pool->sp_sockets))
- printk(KERN_ERR
- "svc_xprt_enqueue: threads and sockets both waiting??\n");
-
- if (test_bit(XPT_DEAD, &xprt->xpt_flags)) {
- /* Don't enqueue dead sockets */
- dprintk("svc: transport %p is dead, not enqueued\n", xprt);
- goto out_unlock;
- }
-
- /* Mark socket as busy. It will remain in this state until the
- * server has processed all pending data and put the socket back
- * on the idle list. We update XPT_BUSY atomically because
- * it also guards against trying to enqueue the svc_sock twice.
- */
- if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags)) {
- /* Don't enqueue socket while already enqueued */
- dprintk("svc: transport %p busy, not enqueued\n", xprt);
- goto out_unlock;
- }
- BUG_ON(xprt->xpt_pool != NULL);
- xprt->xpt_pool = pool;
-
- /* Handle pending connection */
- if (test_bit(XPT_CONN, &xprt->xpt_flags))
- goto process;
-
- /* Handle close in-progress */
- if (test_bit(XPT_CLOSE, &xprt->xpt_flags))
- goto process;
-
- /* Check if we have space to reply to a request */
- if (!xprt->xpt_ops.xpo_has_wspace(xprt)) {
- /* Don't enqueue while not enough space for reply */
- dprintk("svc: no write space, transport %p not enqueued\n", xprt);
- xprt->xpt_pool = NULL;
- clear_bit(XPT_BUSY, &xprt->xpt_flags);
- goto out_unlock;
- }
-
- process:
- if (!list_empty(&pool->sp_threads)) {
- rqstp = list_entry(pool->sp_threads.next,
- struct svc_rqst,
- rq_list);
- dprintk("svc: transport %p served by daemon %p\n",
- xprt, rqstp);
- svc_thread_dequeue(pool, rqstp);
- if (rqstp->rq_xprt)
- printk(KERN_ERR
- "svc_xprt_enqueue: server %p, rq_xprt=%p!\n",
- rqstp, rqstp->rq_xprt);
- rqstp->rq_xprt = xprt;
- svc_xprt_get(xprt);
- rqstp->rq_reserved = serv->sv_max_mesg;
- atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
- BUG_ON(xprt->xpt_pool != pool);
- wake_up(&rqstp->rq_wait);
- } else {
- dprintk("svc: transport %p put into queue\n", xprt);
- list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
- BUG_ON(xprt->xpt_pool != pool);
- }
-
-out_unlock:
- spin_unlock_bh(&pool->sp_lock);
-}
-EXPORT_SYMBOL_GPL(svc_xprt_enqueue);
-
-/*
- * Dequeue the first socket. Must be called with the pool->sp_lock held.
- */
-static inline struct svc_xprt *
-svc_xprt_dequeue(struct svc_pool *pool)
-{
- struct svc_xprt *xprt;
-
- if (list_empty(&pool->sp_sockets))
- return NULL;
-
- xprt = list_entry(pool->sp_sockets.next,
- struct svc_xprt, xpt_ready);
- list_del_init(&xprt->xpt_ready);
-
- dprintk("svc: transport %p dequeued, inuse=%d\n",
- xprt, atomic_read(&xprt->xpt_ref.refcount));
-
- return xprt;
-}
-
-/*
- * Having read something from a socket, check whether it
- * needs to be re-enqueued.
- * Note: XPT_DATA only gets cleared when a read-attempt finds
- * no (or insufficient) data.
- */
-void
-svc_xprt_received(struct svc_xprt *xprt)
-{
- xprt->xpt_pool = NULL;
- clear_bit(XPT_BUSY, &xprt->xpt_flags);
- svc_xprt_enqueue(xprt);
-}
-EXPORT_SYMBOL_GPL(svc_xprt_received);
-
-/**
- * svc_reserve - change the space reserved for the reply to a request.
- * @rqstp: The request in question
- * @space: new max space to reserve
- *
- * Each request reserves some space on the output queue of the socket
- * to make sure the reply fits. This function reduces that reserved
- * space to be the amount of space used already, plus @space.
- *
- */
-void svc_reserve(struct svc_rqst *rqstp, int space)
-{
- space += rqstp->rq_res.head[0].iov_len;
-
- if (space < rqstp->rq_reserved) {
- struct svc_xprt *xprt = rqstp->rq_xprt;
- atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
- rqstp->rq_reserved = space;
-
- svc_xprt_enqueue(xprt);
- }
-}
-
-static void
-svc_xprt_release(struct svc_rqst *rqstp)
-{
- struct svc_xprt *xprt = rqstp->rq_xprt;
-
- rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);
-
- svc_free_res_pages(rqstp);
- rqstp->rq_res.page_len = 0;
- rqstp->rq_res.page_base = 0;
-
- /* Reset response buffer and release
- * the reservation.
- * But first, check that enough space was reserved
- * for the reply, otherwise we have a bug!
- */
- if ((rqstp->rq_res.len) > rqstp->rq_reserved)
- printk(KERN_ERR "RPC request reserved %d but used %d\n",
- rqstp->rq_reserved,
- rqstp->rq_res.len);
-
- rqstp->rq_res.head[0].iov_len = 0;
- svc_reserve(rqstp, 0);
- rqstp->rq_xprt = NULL;
-
- svc_xprt_put(xprt);
-}
-
-/*
- * External function to wake up a server waiting for data
- * This really only makes sense for services like lockd
- * which have exactly one thread anyway.
- */
-void
-svc_wake_up(struct svc_serv *serv)
-{
- struct svc_rqst *rqstp;
- unsigned int i;
- struct svc_pool *pool;
-
- for (i = 0; i < serv->sv_nrpools; i++) {
- pool = &serv->sv_pools[i];
-
- spin_lock_bh(&pool->sp_lock);
- if (!list_empty(&pool->sp_threads)) {
- rqstp = list_entry(pool->sp_threads.next,
- struct svc_rqst,
- rq_list);
- dprintk("svc: daemon %p woken up.\n", rqstp);
- /*
- svc_thread_dequeue(pool, rqstp);
- rqstp->rq_xprt = NULL;
- */
- wake_up(&rqstp->rq_wait);
- }
- spin_unlock_bh(&pool->sp_lock);
- }
-}
-
union svc_pktinfo_u {
struct in_pktinfo pkti;
struct in6_pktinfo pkti6;
@@ -1024,20 +777,6 @@ svc_tcp_data_ready(struct sock *sk, int
wake_up_interruptible(sk->sk_sleep);
}

-static inline int svc_port_is_privileged(struct sockaddr *sin)
-{
- switch (sin->sa_family) {
- case AF_INET:
- return ntohs(((struct sockaddr_in *)sin)->sin_port)
- < PROT_SOCK;
- case AF_INET6:
- return ntohs(((struct sockaddr_in6 *)sin)->sin6_port)
- < PROT_SOCK;
- default:
- return 0;
- }
-}
-
/*
* Accept a TCP connection
*/
@@ -1437,330 +1176,6 @@ svc_sock_update_bufs(struct svc_serv *se
spin_unlock_bh(&serv->sv_lock);
}

-static void
-svc_check_conn_limits(struct svc_serv *serv)
-{
- char buf[RPC_MAX_ADDRBUFLEN];
-
- /* make sure that we don't have too many active connections.
- * If we have, something must be dropped.
- *
- * There's no point in trying to do random drop here for
- * DoS prevention. The NFS clients does 1 reconnect in 15
- * seconds. An attacker can easily beat that.
- *
- * The only somewhat efficient mechanism would be if drop
- * old connections from the same IP first.
- */
- if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
- struct svc_xprt *xprt = NULL;
- spin_lock_bh(&serv->sv_lock);
- if (!list_empty(&serv->sv_tempsocks)) {
- if (net_ratelimit()) {
- /* Try to help the admin */
- printk(KERN_NOTICE "%s: too many open "
- "connections, consider increasing the "
- "number of nfsd threads\n",
- serv->sv_name);
- printk(KERN_NOTICE
- "%s: last connection from %s\n",
- serv->sv_name, buf);
- }
- /*
- * Always select the oldest connection. It's not fair,
- * but so is life
- */
- xprt = list_entry(serv->sv_tempsocks.prev,
- struct svc_xprt,
- xpt_list);
- set_bit(XPT_CLOSE, &xprt->xpt_flags);
- svc_xprt_get(xprt);
- }
- spin_unlock_bh(&serv->sv_lock);
-
- if (xprt) {
- svc_xprt_enqueue(xprt);
- svc_xprt_put(xprt);
- }
- }
-}
-
-static inline void svc_copy_addr(struct svc_rqst *rqstp, struct svc_xprt *xprt)
-{
- struct sockaddr *sin;
-
- /* sock_recvmsg doesn't fill in the name/namelen, so we must..
- */
- memcpy(&rqstp->rq_addr, &xprt->xpt_remote, xprt->xpt_remotelen);
- rqstp->rq_addrlen = xprt->xpt_remotelen;
-
- /* Destination address in request is needed for binding the
- * source address in RPC callbacks later.
- */
- sin = (struct sockaddr *)&xprt->xpt_local;
- switch (sin->sa_family) {
- case AF_INET:
- rqstp->rq_daddr.addr = ((struct sockaddr_in *)sin)->sin_addr;
- break;
- case AF_INET6:
- rqstp->rq_daddr.addr6 = ((struct sockaddr_in6 *)sin)->sin6_addr;
- break;
- }
-}
-
-/*
- * Receive the next request on any socket. This code is carefully
- * organised not to touch any cachelines in the shared svc_serv
- * structure, only cachelines in the local svc_pool.
- */
-int
-svc_recv(struct svc_rqst *rqstp, long timeout)
-{
- struct svc_xprt *xprt = NULL;
- struct svc_serv *serv = rqstp->rq_server;
- struct svc_pool *pool = rqstp->rq_pool;
- int len, i;
- int pages;
- struct xdr_buf *arg;
- DECLARE_WAITQUEUE(wait, current);
-
- dprintk("svc: server %p waiting for data (to = %ld)\n",
- rqstp, timeout);
-
- if (rqstp->rq_xprt)
- printk(KERN_ERR
- "svc_recv: service %p, transport not NULL!\n",
- rqstp);
- if (waitqueue_active(&rqstp->rq_wait))
- printk(KERN_ERR
- "svc_recv: service %p, wait queue active!\n",
- rqstp);
-
-
- /* now allocate needed pages. If we get a failure, sleep briefly */
- pages = (serv->sv_max_mesg + PAGE_SIZE) / PAGE_SIZE;
- for (i=0; i < pages ; i++)
- while (rqstp->rq_pages[i] == NULL) {
- struct page *p = alloc_page(GFP_KERNEL);
- if (!p)
- schedule_timeout_uninterruptible(msecs_to_jiffies(500));
- rqstp->rq_pages[i] = p;
- }
- rqstp->rq_pages[i++] = NULL; /* this might be seen in nfs_read_actor */
- BUG_ON(pages >= RPCSVC_MAXPAGES);
-
- /* Make arg->head point to first page and arg->pages point to rest */
- arg = &rqstp->rq_arg;
- arg->head[0].iov_base = page_address(rqstp->rq_pages[0]);
- arg->head[0].iov_len = PAGE_SIZE;
- arg->pages = rqstp->rq_pages + 1;
- arg->page_base = 0;
- /* save at least one page for response */
- arg->page_len = (pages-2)*PAGE_SIZE;
- arg->len = (pages-1)*PAGE_SIZE;
- arg->tail[0].iov_len = 0;
-
- try_to_freeze();
- cond_resched();
- if (signalled())
- return -EINTR;
-
- spin_lock_bh(&pool->sp_lock);
- if ((xprt = svc_xprt_dequeue(pool)) != NULL) {
- rqstp->rq_xprt = xprt;
- svc_xprt_get(xprt);
- rqstp->rq_reserved = serv->sv_max_mesg;
- atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
- } else {
- /* No data pending. Go to sleep */
- svc_thread_enqueue(pool, rqstp);
-
- /*
- * We have to be able to interrupt this wait
- * to bring down the daemons ...
- */
- set_current_state(TASK_INTERRUPTIBLE);
- add_wait_queue(&rqstp->rq_wait, &wait);
- spin_unlock_bh(&pool->sp_lock);
-
- schedule_timeout(timeout);
-
- try_to_freeze();
-
- spin_lock_bh(&pool->sp_lock);
- remove_wait_queue(&rqstp->rq_wait, &wait);
-
- if (!(xprt = rqstp->rq_xprt)) {
- svc_thread_dequeue(pool, rqstp);
- spin_unlock_bh(&pool->sp_lock);
- dprintk("svc: server %p, no data yet\n", rqstp);
- return signalled()? -EINTR : -EAGAIN;
- }
- }
- spin_unlock_bh(&pool->sp_lock);
-
- len = 0;
- if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
- dprintk("svc_recv: found XPT_CLOSE\n");
- svc_delete_xprt(xprt);
- } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
- struct svc_xprt *newxpt;
- newxpt = xprt->xpt_ops.xpo_accept(xprt);
- if (newxpt) {
- svc_xprt_received(newxpt);
- /*
- * We know this module_get will succeed because the
- * listener holds a reference too
- */
- __module_get(newxpt->xpt_class->xcl_owner);
- svc_check_conn_limits(xprt->xpt_server);
- spin_lock_bh(&serv->sv_lock);
- set_bit(XPT_TEMP, &newxpt->xpt_flags);
- list_add(&newxpt->xpt_list, &serv->sv_tempsocks);
- serv->sv_tmpcnt++;
- if (serv->sv_temptimer.function == NULL) {
- /* setup timer to age temp sockets */
- setup_timer(&serv->sv_temptimer, svc_age_temp_xprts,
- (unsigned long)serv);
- mod_timer(&serv->sv_temptimer,
- jiffies + svc_conn_age_period * HZ);
- }
- spin_unlock_bh(&serv->sv_lock);
- }
- svc_xprt_received(xprt);
- } else {
- dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
- rqstp, pool->sp_id, xprt,
- atomic_read(&xprt->xpt_ref.refcount));
-
- if ((rqstp->rq_deferred = svc_deferred_dequeue(xprt))) {
- svc_xprt_received(xprt);
- len = svc_deferred_recv(rqstp);
- } else
- len = xprt->xpt_ops.xpo_recvfrom(rqstp);
- svc_copy_addr(rqstp, xprt);
- dprintk("svc: got len=%d\n", len);
- }
-
- /* No data, incomplete (TCP) read, or accept() */
- if (len == 0 || len == -EAGAIN) {
- rqstp->rq_res.len = 0;
- svc_xprt_release(rqstp);
- return -EAGAIN;
- }
- clear_bit(XPT_OLD, &xprt->xpt_flags);
-
- rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
- rqstp->rq_chandle.defer = svc_defer;
-
- if (serv->sv_stats)
- serv->sv_stats->netcnt++;
- return len;
-}
-
-/*
- * Drop request
- */
-void
-svc_drop(struct svc_rqst *rqstp)
-{
- dprintk("svc: xprt %p dropped request\n", rqstp->rq_xprt);
- svc_xprt_release(rqstp);
-}
-
-/*
- * Return reply to client.
- */
-int
-svc_send(struct svc_rqst *rqstp)
-{
- struct svc_xprt *xprt;
- int len;
- struct xdr_buf *xb;
-
- if ((xprt = rqstp->rq_xprt) == NULL) {
- printk(KERN_WARNING "NULL transport pointer in %s:%d\n",
- __FILE__, __LINE__);
- return -EFAULT;
- }
-
- /* release the receive skb before sending the reply */
- rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);
-
- /* calculate over-all length */
- xb = & rqstp->rq_res;
- xb->len = xb->head[0].iov_len +
- xb->page_len +
- xb->tail[0].iov_len;
-
- /* Grab mutex to serialize outgoing data. */
- mutex_lock(&xprt->xpt_mutex);
- if (test_bit(XPT_DEAD, &xprt->xpt_flags))
- len = -ENOTCONN;
- else
- len = xprt->xpt_ops.xpo_sendto(rqstp);
- mutex_unlock(&xprt->xpt_mutex);
- svc_xprt_release(rqstp);
-
- if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
- return 0;
- return len;
-}
-
-/*
- * Timer function to close old temporary sockets, using
- * a mark-and-sweep algorithm.
- */
-static void
-svc_age_temp_xprts(unsigned long closure)
-{
- struct svc_serv *serv = (struct svc_serv *)closure;
- struct svc_xprt *xprt;
- struct list_head *le, *next;
- LIST_HEAD(to_be_aged);
-
- dprintk("svc_age_temp_xprts\n");
-
- if (!spin_trylock_bh(&serv->sv_lock)) {
- /* busy, try again 1 sec later */
- dprintk("svc_age_temp_xprts: busy\n");
- mod_timer(&serv->sv_temptimer, jiffies + HZ);
- return;
- }
-
- list_for_each_safe(le, next, &serv->sv_tempsocks) {
- xprt = list_entry(le, struct svc_xprt, xpt_list);
-
- /* First time through, just mark it OLD. Second time
- * through, close it. */
- if (!test_and_set_bit(XPT_OLD, &xprt->xpt_flags))
- continue;
- if (atomic_read(&xprt->xpt_ref.refcount) > 1
- || test_bit(XPT_BUSY, &xprt->xpt_flags))
- continue;
- svc_xprt_get(xprt);
- list_move(le, &to_be_aged);
- set_bit(XPT_CLOSE, &xprt->xpt_flags);
- set_bit(XPT_DETACHED, &xprt->xpt_flags);
- }
- spin_unlock_bh(&serv->sv_lock);
-
- while (!list_empty(&to_be_aged)) {
- le = to_be_aged.next;
- /* fiddling the xpt_list node is safe 'cos we're XPT_DETACHED */
- list_del_init(le);
- xprt = list_entry(le, struct svc_xprt, xpt_list);
-
- dprintk("queuing xprt %p for closing\n", xprt);
-
- /* a thread will dequeue and close it soon */
- svc_xprt_enqueue(xprt);
- svc_xprt_put(xprt);
- }
-
- mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
-}
-
/*
* Initialize socket for RPC use and create svc_sock struct
* XXX: May want to setsockopt SO_SNDBUF and SO_RCVBUF.
@@ -1938,166 +1353,3 @@ svc_sock_free(struct svc_xprt *xprt)
sock_release(svsk->sk_sock);
kfree(svsk);
}
-
-/*
- * Remove a dead transport
- */
-static void
-svc_delete_xprt(struct svc_xprt *xprt)
-{
- struct svc_serv *serv;
-
- dprintk("svc: svc_delete_xprt(%p)\n", xprt);
-
- serv = xprt->xpt_server;
-
- xprt->xpt_ops.xpo_detach(xprt);
-
- spin_lock_bh(&serv->sv_lock);
-
- if (!test_and_set_bit(XPT_DETACHED, &xprt->xpt_flags))
- list_del_init(&xprt->xpt_list);
- /*
- * We used to delete the transport from whichever list
- * it's sk_xprt.xpt_ready node was on, but we don't actually
- * need to. This is because the only time we're called
- * while still attached to a queue, the queue itself
- * is about to be destroyed (in svc_destroy).
- */
- if (!test_and_set_bit(XPT_DEAD, &xprt->xpt_flags)) {
- BUG_ON(atomic_read(&xprt->xpt_ref.refcount) < 2);
- svc_xprt_put(xprt);
- if (test_bit(XPT_TEMP, &xprt->xpt_flags))
- serv->sv_tmpcnt--;
- }
-
- spin_unlock_bh(&serv->sv_lock);
-}
-
-static void svc_close_xprt(struct svc_xprt *xprt)
-{
- set_bit(XPT_CLOSE, &xprt->xpt_flags);
- if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
- /* someone else will have to effect the close */
- return;
-
- svc_xprt_get(xprt);
- svc_delete_xprt(xprt);
- clear_bit(XPT_BUSY, &xprt->xpt_flags);
- svc_xprt_put(xprt);
-}
-
-void svc_close_all(struct list_head *xprt_list)
-{
- struct svc_xprt *xprt;
- struct svc_xprt *tmp;
-
- list_for_each_entry_safe(xprt, tmp, xprt_list, xpt_list) {
- set_bit(XPT_CLOSE, &xprt->xpt_flags);
- if (test_bit(XPT_BUSY, &xprt->xpt_flags)) {
- /* Waiting to be processed, but no threads left,
- * So just remove it from the waiting list
- */
- list_del_init(&xprt->xpt_ready);
- clear_bit(XPT_BUSY, &xprt->xpt_flags);
- }
- svc_close_xprt(xprt);
- }
-}
-
-/*
- * Handle defer and revisit of requests
- */
-
-static void svc_revisit(struct cache_deferred_req *dreq, int too_many)
-{
- struct svc_deferred_req *dr = container_of(dreq, struct svc_deferred_req, handle);
- struct svc_xprt *xprt = dr->xprt;
-
- if (too_many) {
- svc_xprt_put(xprt);
- kfree(dr);
- return;
- }
- dprintk("revisit queued\n");
- dr->xprt = NULL;
- spin_lock(&xprt->xpt_lock);
- list_add(&dr->handle.recent, &xprt->xpt_deferred);
- spin_unlock(&xprt->xpt_lock);
- set_bit(XPT_DEFERRED, &xprt->xpt_flags);
- svc_xprt_enqueue(xprt);
- svc_xprt_put(xprt);
-}
-
-static struct cache_deferred_req *
-svc_defer(struct cache_req *req)
-{
- struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle);
- int size = sizeof(struct svc_deferred_req) + (rqstp->rq_arg.len);
- struct svc_deferred_req *dr;
-
- if (rqstp->rq_arg.page_len)
- return NULL; /* if more than a page, give up FIXME */
- if (rqstp->rq_deferred) {
- dr = rqstp->rq_deferred;
- rqstp->rq_deferred = NULL;
- } else {
- int skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
- /* FIXME maybe discard if size too large */
- dr = kmalloc(size, GFP_KERNEL);
- if (dr == NULL)
- return NULL;
-
- dr->handle.owner = rqstp->rq_server;
- dr->prot = rqstp->rq_prot;
- memcpy(&dr->addr, &rqstp->rq_addr, rqstp->rq_addrlen);
- dr->addrlen = rqstp->rq_addrlen;
- dr->daddr = rqstp->rq_daddr;
- dr->argslen = rqstp->rq_arg.len >> 2;
- memcpy(dr->args, rqstp->rq_arg.head[0].iov_base-skip, dr->argslen<<2);
- }
- svc_xprt_get(rqstp->rq_xprt);
- dr->xprt = rqstp->rq_xprt;
-
- dr->handle.revisit = svc_revisit;
- return &dr->handle;
-}
-
-/*
- * recv data from a deferred request into an active one
- */
-static int svc_deferred_recv(struct svc_rqst *rqstp)
-{
- struct svc_deferred_req *dr = rqstp->rq_deferred;
-
- rqstp->rq_arg.head[0].iov_base = dr->args;
- rqstp->rq_arg.head[0].iov_len = dr->argslen<<2;
- rqstp->rq_arg.page_len = 0;
- rqstp->rq_arg.len = dr->argslen<<2;
- rqstp->rq_prot = dr->prot;
- memcpy(&rqstp->rq_addr, &dr->addr, dr->addrlen);
- rqstp->rq_addrlen = dr->addrlen;
- rqstp->rq_daddr = dr->daddr;
- rqstp->rq_respages = rqstp->rq_pages;
- return dr->argslen<<2;
-}
-
-
-static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt)
-{
- struct svc_deferred_req *dr = NULL;
-
- if (!test_bit(XPT_DEFERRED, &xprt->xpt_flags))
- return NULL;
- spin_lock(&xprt->xpt_lock);
- clear_bit(XPT_DEFERRED, &xprt->xpt_flags);
- if (!list_empty(&xprt->xpt_deferred)) {
- dr = list_entry(xprt->xpt_deferred.next,
- struct svc_deferred_req,
- handle.recent);
- list_del_init(&dr->handle.recent);
- set_bit(XPT_DEFERRED, &xprt->xpt_flags);
- }
- spin_unlock(&xprt->xpt_lock);
- return dr;
-}

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-10-01 19:28:48

by Tom Tucker

Subject: [RFC,PATCH 34/35] svc: Add /proc/sys/sunrpc/transport files


Add a file that, when read, lists the set of registered svc
transports.
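
For illustration, the formatting done by svc_print_xprts() boils down to the following userspace sketch; the struct, function names, and payload values here are made up for the example and are not the kernel code, which walks svc_xprt_class_list under a spinlock:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a registered transport class */
struct xprt_class { const char *name; int max_payload; };

/* Mimic svc_print_xprts(): emit one "<name> <max_payload>\n" line per
 * class, stopping before the output would overflow the buffer. */
static int print_xprts(const struct xprt_class *xcl, int n,
                       char *buf, int maxlen)
{
    char tmp[80];
    int len = 0;

    buf[0] = '\0';
    for (int i = 0; i < n; i++) {
        int slen = snprintf(tmp, sizeof(tmp), "%s %d\n",
                            xcl[i].name, xcl[i].max_payload);
        if (len + slen > maxlen)
            break;
        strcat(buf, tmp);
        len += slen;
    }
    return len;
}
```

Reading the new /proc/sys/sunrpc/transports file would then show one line per registered transport, e.g. "tcp" followed by its maximum payload.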

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/debug.h | 1 +
include/linux/sunrpc/svc_xprt.h | 2 +-
net/sunrpc/svc_xprt.c | 28 ++++++++++++++++++++++++++++
net/sunrpc/sysctl.c | 37 +++++++++++++++++++++++++++++++++++++
4 files changed, 67 insertions(+), 1 deletions(-)

diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index 10709cb..89458df 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -88,6 +88,7 @@ enum {
CTL_SLOTTABLE_TCP,
CTL_MIN_RESVPORT,
CTL_MAX_RESVPORT,
+ CTL_TRANSPORTS,
};

#endif /* _LINUX_SUNRPC_DEBUG_H_ */
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 94c40f2..53c8891 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -82,5 +82,5 @@ static inline void svc_xprt_get(struct s
}
void svc_delete_xprt(struct svc_xprt *xprt);
void svc_close_xprt(struct svc_xprt *xprt);
-
+int svc_print_xprts(char *buf, int maxlen);
#endif /* SUNRPC_SVC_XPRT_H */
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 6b99786..e55904f 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -93,6 +93,34 @@ int svc_unreg_xprt_class(struct svc_xprt
}
EXPORT_SYMBOL_GPL(svc_unreg_xprt_class);

+/*
+ * Format the transport list for printing
+ */
+int svc_print_xprts(char *buf, int maxlen)
+{
+ struct list_head *le;
+ char tmpstr[80];
+ int len = 0;
+ buf[0] = '\0';
+
+ spin_lock(&svc_xprt_class_lock);
+ list_for_each(le, &svc_xprt_class_list) {
+ int slen;
+ struct svc_xprt_class *xcl =
+ list_entry(le, struct svc_xprt_class, xcl_list);
+
+ sprintf(tmpstr, "%s %d\n", xcl->xcl_name, xcl->xcl_max_payload);
+ slen = strlen(tmpstr);
+ if (len + slen > maxlen)
+ break;
+ len += slen;
+ strcat(buf, tmpstr);
+ }
+ spin_unlock(&svc_xprt_class_lock);
+
+ return len;
+}
+
static inline void svc_xprt_free(struct kref *kref)
{
struct svc_xprt *xprt =
diff --git a/net/sunrpc/sysctl.c b/net/sunrpc/sysctl.c
index 738db32..8642f6f 100644
--- a/net/sunrpc/sysctl.c
+++ b/net/sunrpc/sysctl.c
@@ -18,6 +18,7 @@ #include <asm/uaccess.h>
#include <linux/sunrpc/types.h>
#include <linux/sunrpc/sched.h>
#include <linux/sunrpc/stats.h>
+#include <linux/sunrpc/svc_xprt.h>

/*
* Declare the debug flags here
@@ -27,6 +28,8 @@ unsigned int nfs_debug;
unsigned int nfsd_debug;
unsigned int nlm_debug;

+char xprt_buf[128];
+
#ifdef RPC_DEBUG

static struct ctl_table_header *sunrpc_table_header;
@@ -48,6 +51,32 @@ rpc_unregister_sysctl(void)
}
}

+static int proc_do_xprt(ctl_table *table, int write, struct file *file,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ char tmpbuf[sizeof(xprt_buf)];
+ int len;
+ if ((*ppos && !write) || !*lenp) {
+ *lenp = 0;
+ return 0;
+ }
+ if (write)
+ return -EINVAL;
+ else {
+
+ len = svc_print_xprts(tmpbuf, sizeof(tmpbuf));
+ if (!access_ok(VERIFY_WRITE, buffer, len))
+ return -EFAULT;
+
+ if (__copy_to_user(buffer, tmpbuf, len))
+ return -EFAULT;
+ }
+
+ *lenp -= len;
+ *ppos += len;
+ return 0;
+}
+
static int
proc_dodebug(ctl_table *table, int write, struct file *file,
void __user *buffer, size_t *lenp, loff_t *ppos)
@@ -145,6 +174,14 @@ static ctl_table debug_table[] = {
.mode = 0644,
.proc_handler = &proc_dodebug
},
+ {
+ .ctl_name = CTL_TRANSPORTS,
+ .procname = "transports",
+ .data = xprt_buf,
+ .maxlen = sizeof(xprt_buf),
+ .mode = 0444,
+ .proc_handler = &proc_do_xprt,
+ },
{ .ctl_name = 0 }
};



2007-10-01 19:28:48

by Tom Tucker

Subject: [RFC, PATCH 35/35] knfsd: Support adding transports by writing portlist file


Update the write handler for the portlist file to allow creating new
listening endpoints on a transport. The general form of the string is:

<transport_name><space><port number>

For example:

tcp 2049

This is intended to support the creation of a listening endpoint for
RDMA transports without adding #ifdef code to the nfssvc.c file.
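
The parsing added to write_ports() can be sketched in userspace as follows; the function name and return convention are illustrative only, but the format string mirrors the one in the patch:

```c
#include <ctype.h>
#include <stdio.h>

/* Sketch of the new portlist write handling: a line starting with an
 * alphanumeric character is treated as "<transport_name> <port>".
 * Returns 0 on success and fills in transport/port, -1 otherwise. */
static int parse_port_line(const char *buf, char *transport, int *port)
{
    if (!isalnum((unsigned char)buf[0]))
        return -1;
    /* %15s bounds the name to the 16-byte buffer; %4d caps the port
     * at four digits, as in the patch */
    if (sscanf(buf, "%15s %4d", transport, port) != 2)
        return -1;
    return 0;
}
```

So writing the string "tcp 2049" to the portlist file would create a TCP listener on port 2049 via svc_create_xprt().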

Signed-off-by: Tom Tucker <[email protected]>
---

fs/nfsd/nfsctl.c | 16 ++++++++++++++++
1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index baac89d..923b817 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -554,6 +554,22 @@ static ssize_t write_ports(struct file *
kfree(toclose);
return len;
}
+ /*
+ * Add a transport listener by writing its transport name and port
+ */
+ if (isalnum(buf[0])) {
+ int err;
+ char transport[16];
+ int port;
+ if (sscanf(buf, "%15s %4d", transport, &port) == 2) {
+ err = nfsd_create_serv();
+ if (!err)
+ err = svc_create_xprt(nfsd_serv,
+ transport, port,
+ SVC_SOCK_ANONYMOUS);
+ return err < 0 ? err : 0;
+ }
+ }
return -EINVAL;
}



2007-10-01 19:28:45

by Tom Tucker

Subject: [RFC, PATCH 33/35] svc: Add transport hdr size for defer/revisit


Some transports have a header in front of the RPC header. The current
defer/revisit processing considers only the iov_len and arg_len to
determine how much to back up when saving the original request
to revisit. Add a field to the rqstp structure to save the size
of the transport header so svc_defer can correctly compute
the start of a request.
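
The offset arithmetic this patch introduces in svc_defer() can be shown in isolation; the struct and function below are a userspace sketch whose names only mirror the rqstp fields, with all values in bytes:

```c
/* The receive buffer is laid out <xprt-header><rpc-header...>;
 * head[0].iov_base points past what has already been consumed, so we
 * must back up far enough to recapture the transport header too. */
struct defer_calc { int skip; int argslen_words; };

static struct defer_calc defer_offsets(int arg_len, int head_iov_len,
                                       int xprt_hlen)
{
    struct defer_calc c;
    /* total bytes to save, in 4-byte XDR words */
    c.argslen_words = (arg_len + xprt_hlen) >> 2;
    /* distance back from head[0].iov_base to the start of the
     * region to copy */
    c.skip = (arg_len + xprt_hlen) - head_iov_len;
    return c;
}
```

With a zero transport header (the TCP/UDP case below), this degenerates to the old computation.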

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc.h | 1 +
net/sunrpc/svc_xprt.c | 24 ++++++++++++++++++++----
net/sunrpc/svcsock.c | 3 +++
3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 04eb20e..ea07e3d 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -217,6 +217,7 @@ struct svc_rqst {
void * rq_xprt_ctxt; /* transport specific context ptr */
struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */

+ size_t rq_xprt_hlen; /* xprt header len */
struct xdr_buf rq_arg;
struct xdr_buf rq_res;
struct page * rq_pages[RPCSVC_MAXPAGES];
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index f408626..6b99786 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -836,11 +836,19 @@ static void svc_revisit(struct cache_def
svc_xprt_put(xprt);
}

+/*
+ * Save the request off for later processing. The request buffer looks
+ * like this:
+ *
+ * <xprt-header><rpc-header><rpc-pagelist><rpc-tail>
+ *
+ * This code can only handle requests that consist of an xprt-header
+ * and rpc-header.
+ */
static struct cache_deferred_req *
svc_defer(struct cache_req *req)
{
struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle);
- int size = sizeof(struct svc_deferred_req) + (rqstp->rq_arg.len);
struct svc_deferred_req *dr;

if (rqstp->rq_arg.page_len)
@@ -849,8 +857,11 @@ svc_defer(struct cache_req *req)
dr = rqstp->rq_deferred;
rqstp->rq_deferred = NULL;
} else {
- int skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
+ int skip;
+ int size;
/* FIXME maybe discard if size too large */
+ size = sizeof(struct svc_deferred_req) + rqstp->rq_arg.len +
+ rqstp->rq_xprt_hlen;
dr = kmalloc(size, GFP_KERNEL);
if (dr == NULL)
return NULL;
@@ -860,8 +871,13 @@ svc_defer(struct cache_req *req)
memcpy(&dr->addr, &rqstp->rq_addr, rqstp->rq_addrlen);
dr->addrlen = rqstp->rq_addrlen;
dr->daddr = rqstp->rq_daddr;
- dr->argslen = rqstp->rq_arg.len >> 2;
- memcpy(dr->args, rqstp->rq_arg.head[0].iov_base-skip, dr->argslen<<2);
+ dr->argslen = (rqstp->rq_arg.len + rqstp->rq_xprt_hlen) >> 2;
+
+ /* back up head to the start of the buffer and copy */
+ skip = (rqstp->rq_arg.len + rqstp->rq_xprt_hlen) -
+ rqstp->rq_arg.head[0].iov_len;
+ memcpy(dr->args, rqstp->rq_arg.head[0].iov_base - skip,
+ dr->argslen << 2);
}
svc_xprt_get(rqstp->rq_xprt);
dr->xprt = rqstp->rq_xprt;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 40badd7..e170866 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -393,6 +393,9 @@ svc_recvfrom(struct svc_rqst *rqstp, str
};
int len;

+ /* TCP/UDP have no transport header */
+ rqstp->rq_xprt_hlen = 0;
+
len = kernel_recvmsg(svsk->sk_sock, &msg, iov, nr, buflen,
msg.msg_flags);



2007-10-02 14:56:11

by Chuck Lever

Subject: Re: [RFC, PATCH 04/35] svc: Add a max payload value to the transport

On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>
> The svc_max_payload function currently looks at the socket type
> to determine the max payload. Add a max payload value to
> svc_xprt_class
> so it can be returned directly.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> include/linux/sunrpc/svc_xprt.h | 2 ++
> net/sunrpc/svc.c | 4 +---
> net/sunrpc/svc_xprt.c | 1 +
> net/sunrpc/svcsock.c | 2 ++
> 4 files changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/
> svc_xprt.h
> index a9a3afe..827f0fe 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -17,11 +17,13 @@ struct svc_xprt_class {
> struct module *xcl_owner;
> struct svc_xprt_ops *xcl_ops;
> struct list_head xcl_list;
> + u32 xcl_max_payload;
> };
>
> struct svc_xprt {
> struct svc_xprt_class *xpt_class;
> struct svc_xprt_ops xpt_ops;
> + u32 xpt_max_payload;
> };

Why do you need this field in both the class and the instance
structures? Since svc_xprt refers back to svc_xprt_class, you can
just take the max payload value from the class.


> int svc_reg_xprt_class(struct svc_xprt_class *);
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 55ea6df..2a4b3c6 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -1034,10 +1034,8 @@ err_bad:
> */
> u32 svc_max_payload(const struct svc_rqst *rqstp)
> {
> - int max = RPCSVC_MAXPAYLOAD_TCP;
> + int max = rqstp->rq_xprt->xpt_max_payload;
>
> - if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
> - max = RPCSVC_MAXPAYLOAD_UDP;
> if (rqstp->rq_server->sv_max_payload < max)
> max = rqstp->rq_server->sv_max_payload;
> return max;
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index f838b57..8ea65c3 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -90,5 +90,6 @@ void svc_xprt_init(struct svc_xprt_class
> {
> xpt->xpt_class = xcl;
> xpt->xpt_ops = *xcl->xcl_ops;
> + xpt->xpt_max_payload = xcl->xcl_max_payload;
> }
> EXPORT_SYMBOL_GPL(svc_xprt_init);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index d52a6e2..d84b5c8 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -905,6 +905,7 @@ static struct svc_xprt_ops svc_udp_ops =
> static struct svc_xprt_class svc_udp_class = {
> .xcl_name = "udp",
> .xcl_ops = &svc_udp_ops,
> + .xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
> };
>
> static void
> @@ -1358,6 +1359,7 @@ static struct svc_xprt_ops svc_tcp_ops =
> static struct svc_xprt_class svc_tcp_class = {
> .xcl_name = "tcp",
> .xcl_ops = &svc_tcp_ops,
> + .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
> };
>
> void svc_init_xprt_sock(void)

Chuck Lever
[email protected]





2007-10-02 15:06:25

by Chuck Lever

Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>
> The sk_sendto and sk_recvfrom are function pointers that allow
> svc_sock
> to be used for both UDP and TCP. Move these function pointers to the
> svc_xprt_ops structure.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> include/linux/sunrpc/svc_xprt.h | 2 ++
> include/linux/sunrpc/svcsock.h | 3 ---
> net/sunrpc/svcsock.c | 12 ++++++------
> 3 files changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/
> svc_xprt.h
> index 827f0fe..f0ba052 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -10,6 +10,8 @@ #define SUNRPC_SVC_XPRT_H
> #include <linux/sunrpc/svc.h>
>
> struct svc_xprt_ops {
> + int (*xpo_recvfrom)(struct svc_rqst *);
> + int (*xpo_sendto)(struct svc_rqst *);
> };
>
> struct svc_xprt_class {
> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/
> svcsock.h
> index 1878cbe..08e78d0 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -45,9 +45,6 @@ #define SK_DETACHED 10 /* detached fro
> * be revisted */
> struct mutex sk_mutex; /* to serialize sending data */
>
> - int (*sk_recvfrom)(struct svc_rqst *rqstp);
> - int (*sk_sendto)(struct svc_rqst *rqstp);
> -
> /* We keep the old state_change and data_ready CB's here */
> void (*sk_ostate)(struct sock *);
> void (*sk_odata)(struct sock *, int bytes);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index d84b5c8..150531f 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -900,6 +900,8 @@ svc_udp_sendto(struct svc_rqst *rqstp)
> }
>
> static struct svc_xprt_ops svc_udp_ops = {
> + .xpo_recvfrom = svc_udp_recvfrom,
> + .xpo_sendto = svc_udp_sendto,
> };
>
> static struct svc_xprt_class svc_udp_class = {
> @@ -917,8 +919,6 @@ svc_udp_init(struct svc_sock *svsk)
> svc_xprt_init(&svc_udp_class, &svsk->sk_xprt);
> svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
> svsk->sk_sk->sk_write_space = svc_write_space;
> - svsk->sk_recvfrom = svc_udp_recvfrom;
> - svsk->sk_sendto = svc_udp_sendto;
>
> /* initialise setting must have enough space to
> * receive and respond to one request.
> @@ -1354,6 +1354,8 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
> }
>
> static struct svc_xprt_ops svc_tcp_ops = {
> + .xpo_recvfrom = svc_tcp_recvfrom,
> + .xpo_sendto = svc_tcp_sendto,
> };
>
> static struct svc_xprt_class svc_tcp_class = {
> @@ -1381,8 +1383,6 @@ svc_tcp_init(struct svc_sock *svsk)
> struct tcp_sock *tp = tcp_sk(sk);
>
> svc_xprt_init(&svc_tcp_class, &svsk->sk_xprt);
> - svsk->sk_recvfrom = svc_tcp_recvfrom;
> - svsk->sk_sendto = svc_tcp_sendto;
>
> if (sk->sk_state == TCP_LISTEN) {
> dprintk("setting up TCP socket for listening\n");
> @@ -1530,7 +1530,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
>
> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
> - len = svsk->sk_recvfrom(rqstp);
> + len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
> dprintk("svc: got len=%d\n", len);
>
> /* No data, incomplete (TCP) read, or accept() */
> @@ -1590,7 +1590,7 @@ svc_send(struct svc_rqst *rqstp)
> if (test_bit(SK_DEAD, &svsk->sk_flags))
> len = -ENOTCONN;
> else
> - len = svsk->sk_sendto(rqstp);
> + len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
> mutex_unlock(&svsk->sk_mutex);
> svc_sock_release(rqstp);

Again, here you have copied a pointer from the class structure to the
instance structure -- the address of the transport ops structure
never changes during the lifetime of the xprt instance, does it? You
could just as easily use the class's ops pointer instead.

It looks like on the client side, I didn't put the ops vector or the
payload maximum in the class structure at all... 6 of one, half dozen
of the other. Using the class's value of the ops and payload maximum
would save some space in the svc_xprt, though, come to think of it.

Also, to address Neil's concern about the appearance of the
expression which dereferences these methods, why not use a macro,
similar to VOP_GETATTR() in the old BSD kernels, that replaces this
long chain of indirections with a simple to recognize macro call?

Chuck Lever
[email protected]





2007-10-02 15:21:40

by Chuck Lever

Subject: Re: [RFC, PATCH 06/35] svc: Add transport specific xpo_release function

On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>
> The svc_sock_release function releases pages allocated to a thread.
> For
> UDP, this also returns the receive skb to the stack. For RDMA it will
> post a receive WR and bump the client credit count.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> include/linux/sunrpc/svc.h | 2 +-
> include/linux/sunrpc/svc_xprt.h | 1 +
> net/sunrpc/svcsock.c | 16 +++++++++-------
> 3 files changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 37f7448..cfb2652 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -217,7 +217,7 @@ struct svc_rqst {
> struct auth_ops * rq_authop; /* authentication flavour */
> u32 rq_flavor; /* pseudoflavor */
> struct svc_cred rq_cred; /* auth info */
> - struct sk_buff * rq_skbuff; /* fast recv inet buffer */
> + void * rq_xprt_ctxt; /* transport specific context ptr */
> struct svc_deferred_req*rq_deferred; /* deferred request we are
> replaying */
>
> struct xdr_buf rq_arg;
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/
> svc_xprt.h
> index f0ba052..5871faa 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -12,6 +12,7 @@ #include <linux/sunrpc/svc.h>
> struct svc_xprt_ops {
> int (*xpo_recvfrom)(struct svc_rqst *);
> int (*xpo_sendto)(struct svc_rqst *);
> + void (*xpo_release)(struct svc_rqst *);
> };
>
> struct svc_xprt_class {

You intend to add xpo_detach and xpo_free in a later patch. The
method names suggest all of these operate on the svc_xprt.
xpo_release, however, appears to operate on a request, not on a svc_xprt.

Perhaps you might name this method xpo_release_rqst or some other
name that indicates that this operates on a request. The name
xpo_release could easily refer to closing the underlying socket. As
an example, the client-side transport uses ->release_request.

The client side also appears to treat the transport as handling
requests, instead of socket reads and writes. The use of the method
names recvfrom and sendto suggests we are talking about bytes on a
socket here, not transmitting and receiving whole RPC requests. I
think that's a useful abstraction.

While I'm whining aloud... I don't care for the method names detach or
free, either. On the client side we used close and destroy, which
(to me, anyway) makes more sense.

> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 150531f..2d5731c 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -184,14 +184,14 @@ svc_thread_dequeue(struct svc_pool *pool
> /*
> * Release an skbuff after use
> */
> -static inline void
> +static void
> svc_release_skb(struct svc_rqst *rqstp)
> {
> - struct sk_buff *skb = rqstp->rq_skbuff;
> + struct sk_buff *skb = rqstp->rq_xprt_ctxt;
> struct svc_deferred_req *dr = rqstp->rq_deferred;
>
> if (skb) {
> - rqstp->rq_skbuff = NULL;
> + rqstp->rq_xprt_ctxt = NULL;
>
> dprintk("svc: service %p, releasing skb %p\n", rqstp, skb);
> skb_free_datagram(rqstp->rq_sock->sk_sk, skb);
> @@ -394,7 +394,7 @@ svc_sock_release(struct svc_rqst *rqstp)
> {
> struct svc_sock *svsk = rqstp->rq_sock;
>
> - svc_release_skb(rqstp);
> + rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);
>
> svc_free_res_pages(rqstp);
> rqstp->rq_res.page_len = 0;
> @@ -866,7 +866,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
> skb_free_datagram(svsk->sk_sk, skb);
> return 0;
> }
> - rqstp->rq_skbuff = skb;
> + rqstp->rq_xprt_ctxt = skb;
> }
>
> rqstp->rq_arg.page_base = 0;
> @@ -902,6 +902,7 @@ svc_udp_sendto(struct svc_rqst *rqstp)
> static struct svc_xprt_ops svc_udp_ops = {
> .xpo_recvfrom = svc_udp_recvfrom,
> .xpo_sendto = svc_udp_sendto,
> + .xpo_release = svc_release_skb,
> };
>
> static struct svc_xprt_class svc_udp_class = {
> @@ -1290,7 +1291,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> rqstp->rq_arg.page_len = len - rqstp->rq_arg.head[0].iov_len;
> }
>
> - rqstp->rq_skbuff = NULL;
> + rqstp->rq_xprt_ctxt = NULL;
> rqstp->rq_prot = IPPROTO_TCP;
>
> /* Reset TCP read info */
> @@ -1356,6 +1357,7 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
> static struct svc_xprt_ops svc_tcp_ops = {
> .xpo_recvfrom = svc_tcp_recvfrom,
> .xpo_sendto = svc_tcp_sendto,
> + .xpo_release = svc_release_skb,
> };
>
> static struct svc_xprt_class svc_tcp_class = {
> @@ -1577,7 +1579,7 @@ svc_send(struct svc_rqst *rqstp)
> }
>
> /* release the receive skb before sending the reply */
> - svc_release_skb(rqstp);
> + rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);
>
> /* calculate over-all length */
> xb = & rqstp->rq_res;

Chuck Lever
[email protected]





2007-10-02 15:25:59

by J. Bruce Fields

Subject: Re: [RFC,PATCH 00/35] SVC Transport Switch

On Mon, Oct 01, 2007 at 02:14:26PM -0500, Tom Tucker wrote:
> This is rev 2 of the new pluggable transport switch for
> RPC servers. This version includes two new patches: one to add a field
> for keeping track of a transport specific header that precedes the
> RPC header for deferral processing, and one that cleans up some
> left over references to svc_sock in transport independent code.

Thanks! I've replaced this in for-mm at

git://linux-nfs.org/~bfields/linux.git for-mm

But I still haven't taken the time to read through it carefully.

The review comments seem to mostly be on small stuff. Are there any
doubts about the basic approach?

By the way, are the only three transport classes we can foresee for now
udp, tcp, and rdma?

--b.


2007-10-02 15:34:42

by Chuck Lever

Subject: Re: [RFC,PATCH 11/35] svc: Add xpo_accept transport function

On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>
> Previously, the accept logic looked into the socket state to determine
> whether to call accept or recv when data-ready was indicated on an
> endpoint.
> Since some transports don't use sockets, this logic was changed to
> use a flag
> bit (SK_LISTENER) to identify listening endpoints. A transport
> function
> (xpo_accept) was added to allow each transport to define its own
> accept
> processing. A transport's initialization logic is reponsible for
> setting the

"reSponsible"

>
> SK_LISTENER bit. I didn't see any way to do this in transport
> independent
> logic since the passive side of a UDP connection doesn't listen and
> always recv's.
>
> In the svc_recv function, if the SK_LISTENER bit is set, the transport
> xpo_accept function is called to handle accept processing.
>
> Note that all functions are defined even if they don't make sense
> for a given transport. For example, accept doesn't mean anything for
> UDP. The function is defined anyway and bug-checks if called. The
> UDP transport should never set the SK_LISTENER bit.
>
> The code that poaches connections when the connection
> limit is hit was moved to a subroutine to make the accept logic path
> easier to follow. Since this is in the new connection path, it should
> not be a performance issue.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> include/linux/sunrpc/svc_xprt.h | 1
> include/linux/sunrpc/svcsock.h | 1
> net/sunrpc/svcsock.c | 130 +++++++++++++++++++++
> +-----------------
> 3 files changed, 75 insertions(+), 57 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/
> svc_xprt.h
> index 47bedfa..4c1a650 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -10,6 +10,7 @@ #define SUNRPC_SVC_XPRT_H
> #include <linux/sunrpc/svc.h>
>
> struct svc_xprt_ops {
> + struct svc_xprt *(*xpo_accept)(struct svc_xprt *);
> int (*xpo_has_wspace)(struct svc_xprt *);
> int (*xpo_recvfrom)(struct svc_rqst *);
> void (*xpo_prep_reply_hdr)(struct svc_rqst *);
> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/
> svcsock.h
> index 08e78d0..9882ce0 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -36,6 +36,7 @@ #define SK_CHNGBUF 7 /* need to change
> #define SK_DEFERRED 8 /* request on sk_deferred */
> #define SK_OLD 9 /* used for temp socket aging mark+sweep */
> #define SK_DETACHED 10 /* detached from tempsocks list */
> +#define SK_LISTENER 11 /* listening endpoint */
>
> atomic_t sk_reserved; /* space on outq that is reserved */
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 1028914..ffc54a1 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -914,6 +914,13 @@ svc_udp_has_wspace(struct svc_xprt *xprt
> return 1;
> }
>
> +static struct svc_xprt *
> +svc_udp_accept(struct svc_xprt *xprt)
> +{
> + BUG();
> + return NULL;
> +}
> +
> static struct svc_xprt_ops svc_udp_ops = {
> .xpo_recvfrom = svc_udp_recvfrom,
> .xpo_sendto = svc_udp_sendto,
> @@ -922,6 +929,7 @@ static struct svc_xprt_ops svc_udp_ops =
> .xpo_free = svc_sock_free,
> .xpo_prep_reply_hdr = svc_udp_prep_reply_hdr,
> .xpo_has_wspace = svc_udp_has_wspace,
> + .xpo_accept = svc_udp_accept,
> };
>
> static struct svc_xprt_class svc_udp_class = {
> @@ -1046,9 +1054,10 @@ static inline int svc_port_is_privileged
> /*
> * Accept a TCP connection
> */
> -static void
> -svc_tcp_accept(struct svc_sock *svsk)
> +static struct svc_xprt *
> +svc_tcp_accept(struct svc_xprt *xprt)
> {
> + struct svc_sock *svsk = container_of(xprt, struct svc_sock,
> sk_xprt);
> struct sockaddr_storage addr;
> struct sockaddr *sin = (struct sockaddr *) &addr;
> struct svc_serv *serv = svsk->sk_server;
> @@ -1060,7 +1069,7 @@ svc_tcp_accept(struct svc_sock *svsk)
>
> dprintk("svc: tcp_accept %p sock %p\n", svsk, sock);
> if (!sock)
> - return;
> + return NULL;
>
> clear_bit(SK_CONN, &svsk->sk_flags);
> err = kernel_accept(sock, &newsock, O_NONBLOCK);
> @@ -1071,7 +1080,7 @@ svc_tcp_accept(struct svc_sock *svsk)
> else if (err != -EAGAIN && net_ratelimit())
> printk(KERN_WARNING "%s: accept failed (err %d)!\n",
> serv->sv_name, -err);
> - return;
> + return NULL;
> }
>
> set_bit(SK_CONN, &svsk->sk_flags);
> @@ -1117,59 +1126,14 @@ svc_tcp_accept(struct svc_sock *svsk)
>
> svc_sock_received(newsvsk);
>
> - /* make sure that we don't have too many active connections.
> - * If we have, something must be dropped.
> - *
> - * There's no point in trying to do random drop here for
> - * DoS prevention. The NFS clients does 1 reconnect in 15
> - * seconds. An attacker can easily beat that.
> - *
> - * The only somewhat efficient mechanism would be if drop
> - * old connections from the same IP first. But right now
> - * we don't even record the client IP in svc_sock.
> - */
> - if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
> - struct svc_sock *svsk = NULL;
> - spin_lock_bh(&serv->sv_lock);
> - if (!list_empty(&serv->sv_tempsocks)) {
> - if (net_ratelimit()) {
> - /* Try to help the admin */
> - printk(KERN_NOTICE "%s: too many open TCP "
> - "sockets, consider increasing the "
> - "number of nfsd threads\n",
> - serv->sv_name);
> - printk(KERN_NOTICE
> - "%s: last TCP connect from %s\n",
> - serv->sv_name, __svc_print_addr(sin,
> - buf, sizeof(buf)));
> - }
> - /*
> - * Always select the oldest socket. It's not fair,
> - * but so is life
> - */
> - svsk = list_entry(serv->sv_tempsocks.prev,
> - struct svc_sock,
> - sk_list);
> - set_bit(SK_CLOSE, &svsk->sk_flags);
> - atomic_inc(&svsk->sk_inuse);
> - }
> - spin_unlock_bh(&serv->sv_lock);
> -
> - if (svsk) {
> - svc_sock_enqueue(svsk);
> - svc_sock_put(svsk);
> - }
> -
> - }
> -
> if (serv->sv_stats)
> serv->sv_stats->nettcpconn++;
>
> - return;
> + return &newsvsk->sk_xprt;
>
> failed:
> sock_release(newsock);
> - return;
> + return NULL;
> }
>
> /*
> @@ -1194,12 +1158,6 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> return svc_deferred_recv(rqstp);
> }
>
> - if (svsk->sk_sk->sk_state == TCP_LISTEN) {
> - svc_tcp_accept(svsk);
> - svc_sock_received(svsk);
> - return 0;
> - }
> -
> if (test_and_clear_bit(SK_CHNGBUF, &svsk->sk_flags))
> /* sndbuf needs to have room for one request
> * per thread, otherwise we can stall even when the
> @@ -1407,6 +1365,7 @@ static struct svc_xprt_ops svc_tcp_ops =
> .xpo_free = svc_sock_free,
> .xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
> .xpo_has_wspace = svc_tcp_has_wspace,
> + .xpo_accept = svc_tcp_accept,
> };
>
> static struct svc_xprt_class svc_tcp_class = {
> @@ -1488,6 +1447,55 @@ svc_sock_update_bufs(struct svc_serv *se
> spin_unlock_bh(&serv->sv_lock);
> }
>
> +static void
> +svc_check_conn_limits(struct svc_serv *serv)
> +{
> + char buf[RPC_MAX_ADDRBUFLEN];
> +
> + /* make sure that we don't have too many active connections.
> + * If we have, something must be dropped.
> + *
> + * There's no point in trying to do random drop here for
> + * DoS prevention. The NFS clients does 1 reconnect in 15
> + * seconds. An attacker can easily beat that.
> + *
> + * The only somewhat efficient mechanism would be if drop
> + * old connections from the same IP first. But right now
> + * we don't even record the client IP in svc_sock.
> + */
> + if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
> + struct svc_sock *svsk = NULL;
> + spin_lock_bh(&serv->sv_lock);
> + if (!list_empty(&serv->sv_tempsocks)) {
> + if (net_ratelimit()) {
> + /* Try to help the admin */
> + printk(KERN_NOTICE "%s: too many open TCP "
> + "sockets, consider increasing the "
> + "number of nfsd threads\n",
> + serv->sv_name);
> + printk(KERN_NOTICE
> + "%s: last TCP connect from %s\n",
> + serv->sv_name, buf);
> + }
> + /*
> + * Always select the oldest socket. It's not fair,
> + * but so is life
> + */
> + svsk = list_entry(serv->sv_tempsocks.prev,
> + struct svc_sock,
> + sk_list);
> + set_bit(SK_CLOSE, &svsk->sk_flags);
> + atomic_inc(&svsk->sk_inuse);
> + }
> + spin_unlock_bh(&serv->sv_lock);
> +
> + if (svsk) {
> + svc_sock_enqueue(svsk);
> + svc_sock_put(svsk);
> + }
> + }
> +}
> +
> /*
> * Receive the next request on any socket. This code is carefully
> * organised not to touch any cachelines in the shared svc_serv
> @@ -1583,6 +1591,12 @@ svc_recv(struct svc_rqst *rqstp, long ti
> if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
> dprintk("svc_recv: found SK_CLOSE\n");
> svc_delete_socket(svsk);
> + } else if (test_bit(SK_LISTENER, &svsk->sk_flags)) {
> + struct svc_xprt *newxpt;
> + newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
> + if (newxpt)
> + svc_check_conn_limits(svsk->sk_server);
> + svc_sock_received(svsk);
> } else {
> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));

Instead of adding a test_bit() and conditional branch here, why not
always call xpo_accept? For UDP, the method simply returns.

> @@ -1859,6 +1873,8 @@ static int svc_create_socket(struct svc_
> }
>
> if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
> + if (protocol == IPPROTO_TCP)
> + set_bit(SK_LISTENER, &svsk->sk_flags);
> svc_sock_received(svsk);
> return ntohs(inet_sk(svsk->sk_sk)->sport);
> }

If you really need to set SK_LISTENER for TCP, shouldn't that be done
in svc_tcp_init() ?

Chuck Lever
[email protected]





2007-10-02 15:42:25

by Chuck Lever

Subject: Re: [RFC, PATCH 12/35] svc: Add a generic transport svc_create_xprt function

On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>
> The svc_create_xprt function is a transport independent version
> of the svc_makesock function.
>
> Since transport instance creation contains transport dependent and
> independent components, add an xpo_create transport function. The
> transport implementation of this function allocates the memory for the
> endpoint, implements the transport dependent initialization logic, and
> calls svc_xprt_init to initialize the transport independent field
> (svc_xprt)
> in its data structure.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> include/linux/sunrpc/svc_xprt.h | 4 +++
> net/sunrpc/svc_xprt.c | 35 ++++++++++++++++++++++++
> net/sunrpc/svcsock.c | 58 ++++++++++++++++++++++++++++
> +----------
> 3 files changed, 82 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/
> svc_xprt.h
> index 4c1a650..6a34bb4 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -10,6 +10,9 @@ #define SUNRPC_SVC_XPRT_H
> #include <linux/sunrpc/svc.h>
>
> struct svc_xprt_ops {
> + struct svc_xprt *(*xpo_create)(struct svc_serv *,
> + struct sockaddr *,
> + int);

Should xpo_create also have a length argument, as in (struct sockaddr
*, socklen_t)?

(or whatever the type of sockaddr lengths are: size_t perhaps?)

> struct svc_xprt *(*xpo_accept)(struct svc_xprt *);
> int (*xpo_has_wspace)(struct svc_xprt *);
> int (*xpo_recvfrom)(struct svc_rqst *);
> @@ -37,5 +40,6 @@ struct svc_xprt {
> int svc_reg_xprt_class(struct svc_xprt_class *);
> int svc_unreg_xprt_class(struct svc_xprt_class *);
> void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *);
> +int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);
>
> #endif /* SUNRPC_SVC_XPRT_H */
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 8ea65c3..d57064f 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -93,3 +93,38 @@ void svc_xprt_init(struct svc_xprt_class
> xpt->xpt_max_payload = xcl->xcl_max_payload;
> }
> EXPORT_SYMBOL_GPL(svc_xprt_init);
> +
> +int svc_create_xprt(struct svc_serv *serv, char *xprt_name,
> unsigned short port,
> + int flags)
> +{
> + struct svc_xprt_class *xcl;
> + int ret = -ENOENT;
> + struct sockaddr_in sin = {
> + .sin_family = AF_INET,
> + .sin_addr.s_addr = INADDR_ANY,
> + .sin_port = htons(port),
> + };
> + dprintk("svc: creating transport %s[%d]\n", xprt_name, port);
> + spin_lock(&svc_xprt_class_lock);
> + list_for_each_entry(xcl, &svc_xprt_class_list, xcl_list) {
> + if (strcmp(xprt_name, xcl->xcl_name) == 0) {
> + spin_unlock(&svc_xprt_class_lock);
> + if (try_module_get(xcl->xcl_owner)) {
> + struct svc_xprt *newxprt;
> + ret = 0;
> + newxprt = xcl->xcl_ops->xpo_create
> + (serv, (struct sockaddr *)&sin, flags);
> + if (IS_ERR(newxprt)) {
> + module_put(xcl->xcl_owner);
> + ret = PTR_ERR(newxprt);
> + }
> + }
> + goto out;
> + }
> + }
> + spin_unlock(&svc_xprt_class_lock);
> + dprintk("svc: transport %s not found\n", xprt_name);
> + out:
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(svc_create_xprt);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index ffc54a1..e3c74e0 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -90,6 +90,8 @@ static void svc_sock_free(struct svc_xp
> static struct svc_deferred_req *svc_deferred_dequeue(struct
> svc_sock *svsk);
> static int svc_deferred_recv(struct svc_rqst *rqstp);
> static struct cache_deferred_req *svc_defer(struct cache_req *req);
> +static struct svc_xprt *
> +svc_create_socket(struct svc_serv *, int, struct sockaddr *, int,
> int);
>
> /* apparently the "standard" is that clients close
> * idle connections after 5 minutes, servers after
> @@ -381,6 +383,7 @@ svc_sock_put(struct svc_sock *svsk)
> {
> if (atomic_dec_and_test(&svsk->sk_inuse)) {
> BUG_ON(!test_bit(SK_DEAD, &svsk->sk_flags));
> + module_put(svsk->sk_xprt.xpt_class->xcl_owner);
> svsk->sk_xprt.xpt_ops.xpo_free(&svsk->sk_xprt);
> }
> }
> @@ -921,7 +924,15 @@ svc_udp_accept(struct svc_xprt *xprt)
> return NULL;
> }
>
> +static struct svc_xprt *
> +svc_udp_create(struct svc_serv *serv, struct sockaddr *sa, int flags)
> +{
> + return svc_create_socket(serv, IPPROTO_UDP, sa,
> + sizeof(struct sockaddr_in), flags);
> +}
> +
> static struct svc_xprt_ops svc_udp_ops = {
> + .xpo_create = svc_udp_create,
> .xpo_recvfrom = svc_udp_recvfrom,
> .xpo_sendto = svc_udp_sendto,
> .xpo_release = svc_release_skb,
> @@ -934,6 +945,7 @@ static struct svc_xprt_ops svc_udp_ops =
>
> static struct svc_xprt_class svc_udp_class = {
> .xcl_name = "udp",
> + .xcl_owner = THIS_MODULE,
> .xcl_ops = &svc_udp_ops,
> .xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
> };
> @@ -1357,7 +1369,15 @@ svc_tcp_has_wspace(struct svc_xprt *xprt
> return 1;
> }
>
> +static struct svc_xprt *
> +svc_tcp_create(struct svc_serv *serv, struct sockaddr *sa, int flags)
> +{
> + return svc_create_socket(serv, IPPROTO_TCP, sa,
> + sizeof(struct sockaddr_in), flags);
> +}
> +
> static struct svc_xprt_ops svc_tcp_ops = {
> + .xpo_create = svc_tcp_create,
> .xpo_recvfrom = svc_tcp_recvfrom,
> .xpo_sendto = svc_tcp_sendto,
> .xpo_release = svc_release_skb,
> @@ -1370,6 +1390,7 @@ static struct svc_xprt_ops svc_tcp_ops =
>
> static struct svc_xprt_class svc_tcp_class = {
> .xcl_name = "tcp",
> + .xcl_owner = THIS_MODULE,
> .xcl_ops = &svc_tcp_ops,
> .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
> };
> @@ -1594,8 +1615,14 @@ svc_recv(struct svc_rqst *rqstp, long ti
> } else if (test_bit(SK_LISTENER, &svsk->sk_flags)) {
> struct svc_xprt *newxpt;
> newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
> - if (newxpt)
> + if (newxpt) {
> + /*
> + * We know this module_get will succeed because the
> + * listener holds a reference too
> + */
> + __module_get(newxpt->xpt_class->xcl_owner);
> svc_check_conn_limits(svsk->sk_server);
> + }
> svc_sock_received(svsk);
> } else {
> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> @@ -1835,8 +1862,9 @@ EXPORT_SYMBOL_GPL(svc_addsock);
> /*
> * Create socket for RPC service.
> */
> -static int svc_create_socket(struct svc_serv *serv, int protocol,
> - struct sockaddr *sin, int len, int flags)
> +static struct svc_xprt *
> +svc_create_socket(struct svc_serv *serv, int protocol,
> + struct sockaddr *sin, int len, int flags)
> {
> struct svc_sock *svsk;
> struct socket *sock;
> @@ -1851,13 +1879,13 @@ static int svc_create_socket(struct svc_
> if (protocol != IPPROTO_UDP && protocol != IPPROTO_TCP) {
> printk(KERN_WARNING "svc: only UDP and TCP "
> "sockets supported\n");
> - return -EINVAL;
> + return ERR_PTR(-EINVAL);
> }
> type = (protocol == IPPROTO_UDP)? SOCK_DGRAM : SOCK_STREAM;
>
> error = sock_create_kern(sin->sa_family, type, protocol, &sock);
> if (error < 0)
> - return error;
> + return ERR_PTR(error);
>
> svc_reclassify_socket(sock);
>
> @@ -1876,13 +1904,13 @@ static int svc_create_socket(struct svc_
> if (protocol == IPPROTO_TCP)
> set_bit(SK_LISTENER, &svsk->sk_flags);
> svc_sock_received(svsk);
> - return ntohs(inet_sk(svsk->sk_sk)->sport);
> + return (struct svc_xprt *)svsk;
> }
>
> bummer:
> dprintk("svc: svc_create_socket error = %d\n", -error);
> sock_release(sock);
> - return error;
> + return ERR_PTR(error);
> }
>
> /*
> @@ -1995,15 +2023,15 @@ void svc_force_close_socket(struct svc_s
> int svc_makesock(struct svc_serv *serv, int protocol, unsigned
> short port,
> int flags)
> {
> - struct sockaddr_in sin = {
> - .sin_family = AF_INET,
> - .sin_addr.s_addr = INADDR_ANY,
> - .sin_port = htons(port),
> - };
> -
> dprintk("svc: creating socket proto = %d\n", protocol);
> - return svc_create_socket(serv, protocol, (struct sockaddr *) &sin,
> - sizeof(sin), flags);
> + switch (protocol) {
> + case IPPROTO_TCP:
> + return svc_create_xprt(serv, "tcp", port, flags);
> + case IPPROTO_UDP:
> + return svc_create_xprt(serv, "udp", port, flags);
> + default:
> + return -EINVAL;
> + }
> }
>
> /*

Chuck Lever
[email protected]




-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-10-02 15:45:05

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>
> Modify the various kernel RPC svcs to use the svc_create_xprt service.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> fs/lockd/svc.c | 17 ++++++++---------
> fs/nfs/callback.c | 4 ++--
> fs/nfsd/nfssvc.c | 4 ++--
> include/linux/sunrpc/svcsock.h | 1 -
> net/sunrpc/sunrpc_syms.c | 1 -
> net/sunrpc/svcsock.c | 22 ----------------------
> 6 files changed, 12 insertions(+), 37 deletions(-)
>
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index 82e2192..8686915 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -219,13 +219,12 @@ lockd(struct svc_rqst *rqstp)
> module_put_and_exit(0);
> }
>
> -
> -static int find_socket(struct svc_serv *serv, int proto)
> +static int find_xprt(struct svc_serv *serv, char *proto)
> {
> struct svc_sock *svsk;
> int found = 0;
> list_for_each_entry(svsk, &serv->sv_permsocks, sk_list)
> - if (svsk->sk_sk->sk_protocol == proto) {
> + if (strcmp(svsk->sk_xprt.xpt_class->xcl_name, proto) == 0) {
> found = 1;
> break;
> }

This is scary. :-)

First, I think we would be better off making the server transport API
stronger by not allowing ULPs to dig around in svc_xprt or the
svc_xprt_class structures directly. Perhaps you could provide a
method for obtaining the transport's NETID.

Second, is there any guarantee that the string name of the underlying
protocol is the same as the name of the transport class? Is there
any relationship between the transport name and the NETIDs it supports?

> @@ -243,13 +242,13 @@ static int make_socks(struct svc_serv *s
> int err = 0;
>
> if (proto == IPPROTO_UDP || nlm_udpport)
> - if (!find_socket(serv, IPPROTO_UDP))
> - err = svc_makesock(serv, IPPROTO_UDP, nlm_udpport,
> - SVC_SOCK_DEFAULTS);
> + if (!find_xprt(serv, "udp"))
> + err = svc_create_xprt(serv, "udp", nlm_udpport,
> + SVC_SOCK_DEFAULTS);
> if (err >= 0 && (proto == IPPROTO_TCP || nlm_tcpport))
> - if (!find_socket(serv, IPPROTO_TCP))
> - err = svc_makesock(serv, IPPROTO_TCP, nlm_tcpport,
> - SVC_SOCK_DEFAULTS);
> + if (!find_xprt(serv, "tcp"))
> + err = svc_create_xprt(serv, "tcp", nlm_tcpport,
> + SVC_SOCK_DEFAULTS);
>
> if (err >= 0) {
> warned = 0;
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index a796be5..e27ca14 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -123,8 +123,8 @@ int nfs_callback_up(void)
> if (!serv)
> goto out_err;
>
> - ret = svc_makesock(serv, IPPROTO_TCP, nfs_callback_set_tcpport,
> - SVC_SOCK_ANONYMOUS);
> + ret = svc_create_xprt(serv, "tcp", nfs_callback_set_tcpport,
> + SVC_SOCK_ANONYMOUS);
> if (ret <= 0)
> goto out_destroy;
> nfs_callback_tcpport = ret;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index a8c89ae..bf70b06 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -236,7 +236,7 @@ static int nfsd_init_socks(int port)
>
> error = lockd_up(IPPROTO_UDP);
> if (error >= 0) {
> - error = svc_makesock(nfsd_serv, IPPROTO_UDP, port,
> + error = svc_create_xprt(nfsd_serv, "udp", port,
> SVC_SOCK_DEFAULTS);
> if (error < 0)
> lockd_down();
> @@ -247,7 +247,7 @@ static int nfsd_init_socks(int port)
> #ifdef CONFIG_NFSD_TCP
> error = lockd_up(IPPROTO_TCP);
> if (error >= 0) {
> - error = svc_makesock(nfsd_serv, IPPROTO_TCP, port,
> + error = svc_create_xprt(nfsd_serv, "tcp", port,
> SVC_SOCK_DEFAULTS);
> if (error < 0)
> lockd_down();
> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> index 9882ce0..3181d9d 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -67,7 +67,6 @@ #define SK_LISTENER 11 /* listening en
> /*
> * Function prototypes.
> */
> -int svc_makesock(struct svc_serv *, int, unsigned short, int flags);
> void svc_force_close_socket(struct svc_sock *);
> int svc_recv(struct svc_rqst *, long);
> int svc_send(struct svc_rqst *);
> diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
> index a62ce47..e4cad0f 100644
> --- a/net/sunrpc/sunrpc_syms.c
> +++ b/net/sunrpc/sunrpc_syms.c
> @@ -72,7 +72,6 @@ EXPORT_SYMBOL(svc_drop);
> EXPORT_SYMBOL(svc_process);
> EXPORT_SYMBOL(svc_recv);
> EXPORT_SYMBOL(svc_wake_up);
> -EXPORT_SYMBOL(svc_makesock);
> EXPORT_SYMBOL(svc_reserve);
> EXPORT_SYMBOL(svc_auth_register);
> EXPORT_SYMBOL(auth_domain_lookup);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index e3c74e0..373f020 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -2012,28 +2012,6 @@ void svc_force_close_socket(struct svc_s
> svc_close_socket(svsk);
> }
>
> -/**
> - * svc_makesock - Make a socket for nfsd and lockd
> - * @serv: RPC server structure
> - * @protocol: transport protocol to use
> - * @port: port to use
> - * @flags: requested socket characteristics
> - *
> - */
> -int svc_makesock(struct svc_serv *serv, int protocol, unsigned short port,
> - int flags)
> -{
> - dprintk("svc: creating socket proto = %d\n", protocol);
> - switch (protocol) {
> - case IPPROTO_TCP:
> - return svc_create_xprt(serv, "tcp", port, flags);
> - case IPPROTO_UDP:
> - return svc_create_xprt(serv, "udp", port, flags);
> - default:
> - return -EINVAL;
> - }
> -}
> -
> /*
> * Handle defer and revisit of requests
> */

Chuck Lever
[email protected]





2007-10-02 16:16:24

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC,PATCH 20/35] svc: Make svc_send transport neutral

On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
>
> Move the sk_mutex field to the transport independent svc_xprt
> structure.
> Now all the fields that svc_send touches are transport neutral.
> Change the
> svc_send function to use the transport independent svc_xprt
> directly instead
> of the transport dependent svc_sock structure.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> include/linux/sunrpc/svc_xprt.h | 1 +
> include/linux/sunrpc/svcsock.h | 1 -
> net/sunrpc/svc_xprt.c | 1 +
> net/sunrpc/svcsock.c | 17 ++++++++---------
> 4 files changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index e8be38f..c16a2c6 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -55,6 +55,7 @@ #define XPT_LISTENER 11 /* listening en
> struct svc_pool *xpt_pool; /* current pool iff queued */
> struct svc_serv *xpt_server; /* service for transport */
> atomic_t xpt_reserved; /* space on outq that is rsvd */
> + struct mutex xpt_mutex; /* to serialize sending data */
> };
>
> int svc_reg_xprt_class(struct svc_xprt_class *);
> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> index ba41f11..41c2dfa 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -24,7 +24,6 @@ struct svc_sock {
> * sk_info_authunix */
> struct list_head sk_deferred; /* deferred requests that need to
> * be revisted */
> - struct mutex sk_mutex; /* to serialize sending data */
>
> /* We keep the old state_change and data_ready CB's here */
> void (*sk_ostate)(struct sock *);
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 5195131..2a7e214 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -112,6 +112,7 @@ void svc_xprt_init(struct svc_xprt_class
> xpt->xpt_server = serv;
> INIT_LIST_HEAD(&xpt->xpt_list);
> INIT_LIST_HEAD(&xpt->xpt_ready);
> + mutex_init(&xpt->xpt_mutex);
> }
> EXPORT_SYMBOL_GPL(svc_xprt_init);
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 7cf15c6..eee64ce 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1655,12 +1655,12 @@ svc_drop(struct svc_rqst *rqstp)
> int
> svc_send(struct svc_rqst *rqstp)
> {
> - struct svc_sock *svsk;
> + struct svc_xprt *xprt;
> int len;
> struct xdr_buf *xb;
>
> - if ((svsk = rqstp->rq_sock) == NULL) {
> - printk(KERN_WARNING "NULL socket pointer in %s:%d\n",
> + if ((xprt = rqstp->rq_xprt) == NULL) {
> + printk(KERN_WARNING "NULL transport pointer in %s:%d\n",
> __FILE__, __LINE__);
> return -EFAULT;
> }

Do we still want this printk here? Maybe it can be removed.

> @@ -1674,13 +1674,13 @@ svc_send(struct svc_rqst *rqstp)
> xb->page_len +
> xb->tail[0].iov_len;
>
> - /* Grab svsk->sk_mutex to serialize outgoing data. */
> - mutex_lock(&svsk->sk_mutex);
> - if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags))
> + /* Grab mutex to serialize outgoing data. */
> + mutex_lock(&xprt->xpt_mutex);
> + if (test_bit(XPT_DEAD, &xprt->xpt_flags))
> len = -ENOTCONN;
> else
> - len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
> - mutex_unlock(&svsk->sk_mutex);
> + len = xprt->xpt_ops.xpo_sendto(rqstp);
> + mutex_unlock(&xprt->xpt_mutex);
> svc_sock_release(rqstp);
>
> if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
> @@ -1782,7 +1782,6 @@ static struct svc_sock *svc_setup_socket
> svsk->sk_lastrecv = get_seconds();
> spin_lock_init(&svsk->sk_lock);
> INIT_LIST_HEAD(&svsk->sk_deferred);
> - mutex_init(&svsk->sk_mutex);
>
> /* Initialize the socket */
> if (sock->type == SOCK_DGRAM)

Chuck Lever
[email protected]





2007-10-02 16:19:43

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC,PATCH 00/35] SVC Transport Switch

On Tue, 2007-10-02 at 11:25 -0400, J. Bruce Fields wrote:
> On Mon, Oct 01, 2007 at 02:14:26PM -0500, Tom Tucker wrote:
> > This is rev 2 of the new pluggable transport switch for
> > RPC servers. This version includes two new patches: one to add a field
> > for keeping track of a transport specific header that precedes the
> > RPC header for deferral processing, and one that cleans up some
> > left over references to svc_sock in transport independent code.
>
> Thanks! I've replaced this in for-mm at

Awesome, thanks.

>
> git://linux-nfs.org/~bfields/linux.git for-mm
>
> But still haven't taken the time to read through it carefully.
>
> The review comments seem to mostly be on small stuff. Are there any
> doubts about the basic approach?
>

Not architecturally; however, I was initially concerned about the size
and scope of the changes. I think we're past that at this point, so
I'm comfortable with the approach.

> By the way, are the only three transport classes we can foresee for now
> udp, tcp, and rdma?
>

Those are the ones that I am directly working on. I don't know of any
others that are imminent.

> --b.



2007-10-02 16:20:02

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 21/35] svc: Change svc_sock_received to svc_xprt_received and export it

On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
>
> All fields touched by svc_sock_received are now transport independent.
> Change it to use svc_xprt directly. This function is called from
> transport dependent code, so export it.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> include/linux/sunrpc/svc_xprt.h | 2 +-
> net/sunrpc/svcsock.c | 37 ++++++-------------------
> 2 files changed, 19 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index c16a2c6..103aa36 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -63,8 +63,8 @@ int svc_unreg_xprt_class(struct svc_xprt
> void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *,
> struct svc_serv *);
> int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);
> +void svc_xprt_received(struct svc_xprt *);
> void svc_xprt_put(struct svc_xprt *xprt);
> -
> static inline void svc_xprt_get(struct svc_xprt *xprt)
> {
> kref_get(&xprt->xpt_ref);
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index eee64ce..b8d0d55 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -345,14 +345,14 @@ svc_sock_dequeue(struct svc_pool *pool)
> * Note: XPT_DATA only gets cleared when a read-attempt finds
> * no (or insufficient) data.
> */
> -static inline void
> -svc_sock_received(struct svc_sock *svsk)
> +void
> +svc_xprt_received(struct svc_xprt *xprt)

Minor note here:

When altering a function's synopsis, the usual practice is to convert
the function definition to the new kernel style, which puts the
return type on the same line as the function name. The same comment
applies throughout your patch series.

> {
> - svsk->sk_xprt.xpt_pool = NULL;
> - clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
> - svc_xprt_enqueue(&svsk->sk_xprt);
> + xprt->xpt_pool = NULL;
> + clear_bit(XPT_BUSY, &xprt->xpt_flags);
> + svc_xprt_enqueue(xprt);
> }
> -
> +EXPORT_SYMBOL_GPL(svc_xprt_received);
>
> /**
> * svc_reserve - change the space reserved for the reply to a request.
> @@ -781,7 +781,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
> (serv->sv_nrthreads+3) * serv->sv_max_mesg);
>
> if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> return svc_deferred_recv(rqstp);
> }
>
> @@ -798,7 +798,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
> dprintk("svc: recvfrom returned error %d\n", -err);
> set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
> }
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> return -EAGAIN;
> }
> rqstp->rq_addrlen = sizeof(rqstp->rq_addr);
> @@ -813,7 +813,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
> /*
> * Maybe more packets - kick another thread ASAP.
> */
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
>
> len = skb->len - sizeof(struct udphdr);
> rqstp->rq_arg.len = len;
> @@ -1126,8 +1126,6 @@ svc_tcp_accept(struct svc_xprt *xprt)
> }
> memcpy(&newsvsk->sk_local, sin, slen);
>
> - svc_sock_received(newsvsk);
> -
> if (serv->sv_stats)
> serv->sv_stats->nettcpconn++;
>
> @@ -1156,7 +1154,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));
>
> if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> return svc_deferred_recv(rqstp);
> }
>
> @@ -1196,7 +1194,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> if (len < want) {
> dprintk("svc: short recvfrom while reading record length (%d of
> %lu)\n",
> len, want);
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> return -EAGAIN; /* record header not complete */
> }
>
> @@ -1232,7 +1230,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> if (len < svsk->sk_reclen) {
> dprintk("svc: incomplete TCP record (%d of %d)\n",
> len, svsk->sk_reclen);
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> return -EAGAIN; /* record not complete */
> }
> len = svsk->sk_reclen;
> @@ -1272,7 +1270,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> svsk->sk_reclen = 0;
> svsk->sk_tcplen = 0;
>
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> if (serv->sv_stats)
> serv->sv_stats->nettcpcnt++;
>
> @@ -1285,7 +1283,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> error:
> if (len == -EAGAIN) {
> dprintk("RPC: TCP recvfrom got EAGAIN\n");
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> } else {
> printk(KERN_NOTICE "%s: recvfrom returned errno %d\n",
> svsk->sk_xprt.xpt_server->sv_name, -len);
> @@ -1606,6 +1604,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
> struct svc_xprt *newxpt;
> newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
> if (newxpt) {
> + svc_xprt_received(newxpt);
> /*
> * We know this module_get will succeed because the
> * listener holds a reference too
> @@ -1613,7 +1612,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
> __module_get(newxpt->xpt_class->xcl_owner);
> svc_check_conn_limits(svsk->sk_xprt.xpt_server);
> }
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> } else {
> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> rqstp, pool->sp_id, svsk,
> @@ -1834,7 +1833,7 @@ int svc_addsock(struct svc_serv *serv,
> else {
> svsk = svc_setup_socket(serv, so, &err, SVC_SOCK_DEFAULTS);
> if (svsk) {
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> err = 0;
> }
> }
> @@ -1891,7 +1890,7 @@ svc_create_socket(struct svc_serv *serv,
> if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
> if (protocol == IPPROTO_TCP)
> set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
> - svc_sock_received(svsk);
> + svc_xprt_received(&svsk->sk_xprt);
> return (struct svc_xprt *)svsk;
> }
>

Chuck Lever
[email protected]





2007-10-02 16:29:22

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 04/35] svc: Add a max payload value to the transport

On Tue, 2007-10-02 at 10:54 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >

[...snip...]

> >
> > struct svc_xprt {
> > struct svc_xprt_class *xpt_class;
> > struct svc_xprt_ops xpt_ops;
> > + u32 xpt_max_payload;
> > };
>
> Why do you need this field in both the class and the instance
> structures? Since svc_xprt refers back to svc_xprt_class, you can
> just take the max payload value from the class.
>

The premise was that I didn't want a given thread peeking into some
other processor's memory, so anything needed from the class is copied
into the svc_xprt structure when the transport instance is created.

Greg, help me here ;-)

>
> > int svc_reg_xprt_class(struct svc_xprt_class *);
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 55ea6df..2a4b3c6 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -1034,10 +1034,8 @@ err_bad:
> > */
> > u32 svc_max_payload(const struct svc_rqst *rqstp)
> > {
> > - int max = RPCSVC_MAXPAYLOAD_TCP;
> > + int max = rqstp->rq_xprt->xpt_max_payload;
> >
> > - if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
> > - max = RPCSVC_MAXPAYLOAD_UDP;
> > if (rqstp->rq_server->sv_max_payload < max)
> > max = rqstp->rq_server->sv_max_payload;
> > return max;
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index f838b57..8ea65c3 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -90,5 +90,6 @@ void svc_xprt_init(struct svc_xprt_class
> > {
> > xpt->xpt_class = xcl;
> > xpt->xpt_ops = *xcl->xcl_ops;
> > + xpt->xpt_max_payload = xcl->xcl_max_payload;
> > }
> > EXPORT_SYMBOL_GPL(svc_xprt_init);
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index d52a6e2..d84b5c8 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -905,6 +905,7 @@ static struct svc_xprt_ops svc_udp_ops =
> > static struct svc_xprt_class svc_udp_class = {
> > .xcl_name = "udp",
> > .xcl_ops = &svc_udp_ops,
> > + .xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
> > };
> >
> > static void
> > @@ -1358,6 +1359,7 @@ static struct svc_xprt_ops svc_tcp_ops =
> > static struct svc_xprt_class svc_tcp_class = {
> > .xcl_name = "tcp",
> > .xcl_ops = &svc_tcp_ops,
> > + .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
> > };
> >
> > void svc_init_xprt_sock(void)
>
> Chuck Lever
> [email protected]
>
>



2007-10-02 16:31:10

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, 2007-10-02 at 11:04 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >
> > The sk_sendto and sk_recvfrom are function pointers that allow
> > svc_sock
> > to be used for both UDP and TCP. Move these function pointers to the
> > svc_xprt_ops structure.
> >
> > Signed-off-by: Tom Tucker <[email protected]>
> > ---
> >
> > include/linux/sunrpc/svc_xprt.h | 2 ++
> > include/linux/sunrpc/svcsock.h | 3 ---
> > net/sunrpc/svcsock.c | 12 ++++++------
> > 3 files changed, 8 insertions(+), 9 deletions(-)
> >
> > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> > index 827f0fe..f0ba052 100644
> > --- a/include/linux/sunrpc/svc_xprt.h
> > +++ b/include/linux/sunrpc/svc_xprt.h
> > @@ -10,6 +10,8 @@ #define SUNRPC_SVC_XPRT_H
> > #include <linux/sunrpc/svc.h>
> >
> > struct svc_xprt_ops {
> > + int (*xpo_recvfrom)(struct svc_rqst *);
> > + int (*xpo_sendto)(struct svc_rqst *);
> > };
> >
> > struct svc_xprt_class {
> > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> > index 1878cbe..08e78d0 100644
> > --- a/include/linux/sunrpc/svcsock.h
> > +++ b/include/linux/sunrpc/svcsock.h
> > @@ -45,9 +45,6 @@ #define SK_DETACHED 10 /* detached fro
> > * be revisted */
> > struct mutex sk_mutex; /* to serialize sending data */
> >
> > - int (*sk_recvfrom)(struct svc_rqst *rqstp);
> > - int (*sk_sendto)(struct svc_rqst *rqstp);
> > -
> > /* We keep the old state_change and data_ready CB's here */
> > void (*sk_ostate)(struct sock *);
> > void (*sk_odata)(struct sock *, int bytes);
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index d84b5c8..150531f 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -900,6 +900,8 @@ svc_udp_sendto(struct svc_rqst *rqstp)
> > }
> >
> > static struct svc_xprt_ops svc_udp_ops = {
> > + .xpo_recvfrom = svc_udp_recvfrom,
> > + .xpo_sendto = svc_udp_sendto,
> > };
> >
> > static struct svc_xprt_class svc_udp_class = {
> > @@ -917,8 +919,6 @@ svc_udp_init(struct svc_sock *svsk)
> > svc_xprt_init(&svc_udp_class, &svsk->sk_xprt);
> > svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
> > svsk->sk_sk->sk_write_space = svc_write_space;
> > - svsk->sk_recvfrom = svc_udp_recvfrom;
> > - svsk->sk_sendto = svc_udp_sendto;
> >
> > /* initialise setting must have enough space to
> > * receive and respond to one request.
> > @@ -1354,6 +1354,8 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
> > }
> >
> > static struct svc_xprt_ops svc_tcp_ops = {
> > + .xpo_recvfrom = svc_tcp_recvfrom,
> > + .xpo_sendto = svc_tcp_sendto,
> > };
> >
> > static struct svc_xprt_class svc_tcp_class = {
> > @@ -1381,8 +1383,6 @@ svc_tcp_init(struct svc_sock *svsk)
> > struct tcp_sock *tp = tcp_sk(sk);
> >
> > svc_xprt_init(&svc_tcp_class, &svsk->sk_xprt);
> > - svsk->sk_recvfrom = svc_tcp_recvfrom;
> > - svsk->sk_sendto = svc_tcp_sendto;
> >
> > if (sk->sk_state == TCP_LISTEN) {
> > dprintk("setting up TCP socket for listening\n");
> > @@ -1530,7 +1530,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
> >
> > dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> > rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
> > - len = svsk->sk_recvfrom(rqstp);
> > + len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
> > dprintk("svc: got len=%d\n", len);
> >
> > /* No data, incomplete (TCP) read, or accept() */
> > @@ -1590,7 +1590,7 @@ svc_send(struct svc_rqst *rqstp)
> > if (test_bit(SK_DEAD, &svsk->sk_flags))
> > len = -ENOTCONN;
> > else
> > - len = svsk->sk_sendto(rqstp);
> > + len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
> > mutex_unlock(&svsk->sk_mutex);
> > svc_sock_release(rqstp);
>
> Again, here you have copied a pointer from the class structure to the
> instance structure -- the address of the transport ops structure
> never changes during the lifetime of the xprt instance, does it? You
> could just as easily use the class's ops pointer instead.
>
> It looks like on the client side, I didn't put the ops vector or the
> payload maximum in the class structure at all... 6 of one, half dozen
> of the other. Using the class's value of the ops and payload maximum
> would save some space in the svc_xprt, though, come to think of it.
>

The cache thing again; let's see how Greg weighs in.


> Also, to address Neil's concern about the appearance of the
> expression which dereferences these methods, why not use a macro,
> similar to VOP_GETATTR() in the old BSD kernels, that replaces this
> long chain of indirections with a simple to recognize macro call?
>

The chain is disgusting until everything gets normalized to an svc_xprt
in later patches. Take a look at the final result and see if you still
think a macro helps vs. just obscures.

> Chuck Lever
> [email protected]
>
>



2007-10-02 16:34:32

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 25/35] svc: Move the sockaddr information to svc_xprt

On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
>
> Move the IP address fields to the svc_xprt structure. Note that this
> assumes that _all_ RPC transports must have IP based 4-tuples. This
> seems reasonable given the tight coupling with the portmapper etc...
> Thoughts?

My quibble is with "IP based 4-tuples" in your description -- that
doesn't describe IPv6 addresses. "For now, we assume that an IP
address and port is used to locate RPC transport endpoints." might be
better.

The svc_copy_addr function below might benefit from a comment about
why the address's port isn't copied as well. Otherwise we would use
a straight memcpy.

Also, the preference is to let the compiler decide for itself whether
inlining is appropriate.

>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> include/linux/sunrpc/svc_xprt.h | 3 ++
> include/linux/sunrpc/svcsock.h | 4 ---
> net/sunrpc/svcsock.c | 50 +++++++++++++++++++++------------------
> 3 files changed, 30 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> index ba92909..47ad941 100644
> --- a/include/linux/sunrpc/svc_xprt.h
> +++ b/include/linux/sunrpc/svc_xprt.h
> @@ -62,6 +62,9 @@ #define XPT_CACHE_AUTH 12 /* cache auth
> void *xpt_auth_cache;/* auth cache */
> struct list_head xpt_deferred; /* deferred requests that need
> * to be revisted */
> + struct sockaddr_storage xpt_local; /* local address */
> + struct sockaddr_storage xpt_remote; /* remote peer's address */
> + int xpt_remotelen; /* length of address */
> };
>
> int svc_reg_xprt_class(struct svc_xprt_class *);
> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> index 96a229e..206f092 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -28,10 +28,6 @@ struct svc_sock {
> /* private TCP part */
> int sk_reclen; /* length of record */
> int sk_tcplen; /* current read length */
> -
> - struct sockaddr_storage sk_local; /* local address */
> - struct sockaddr_storage sk_remote; /* remote peer's address */
> - int sk_remotelen; /* length of address */
> };
>
> /*
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 0732dc2..ab34bb2 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -632,33 +632,13 @@ svc_recvfrom(struct svc_rqst *rqstp, str
> struct msghdr msg = {
> .msg_flags = MSG_DONTWAIT,
> };
> - struct sockaddr *sin;
> int len;
>
> len = kernel_recvmsg(svsk->sk_sock, &msg, iov, nr, buflen,
> msg.msg_flags);
>
> - /* sock_recvmsg doesn't fill in the name/namelen, so we must..
> - */
> - memcpy(&rqstp->rq_addr, &svsk->sk_remote, svsk->sk_remotelen);
> - rqstp->rq_addrlen = svsk->sk_remotelen;
> -
> - /* Destination address in request is needed for binding the
> - * source address in RPC callbacks later.
> - */
> - sin = (struct sockaddr *)&svsk->sk_local;
> - switch (sin->sa_family) {
> - case AF_INET:
> - rqstp->rq_daddr.addr = ((struct sockaddr_in *)sin)->sin_addr;
> - break;
> - case AF_INET6:
> - rqstp->rq_daddr.addr6 = ((struct sockaddr_in6 *)sin)->sin6_addr;
> - break;
> - }
> -
> dprintk("svc: socket %p recvfrom(%p, %Zu) = %d\n",
> svsk, iov[0].iov_base, iov[0].iov_len, len);
> -
> return len;
> }
>
> @@ -1113,14 +1093,14 @@ svc_tcp_accept(struct svc_xprt *xprt)
> if (!(newsvsk = svc_setup_socket(serv, newsock, &err,
> (SVC_SOCK_ANONYMOUS | SVC_SOCK_TEMPORARY))))
> goto failed;
> - memcpy(&newsvsk->sk_remote, sin, slen);
> - newsvsk->sk_remotelen = slen;
> + memcpy(&newsvsk->sk_xprt.xpt_remote, sin, slen);
> + newsvsk->sk_xprt.xpt_remotelen = slen;
> err = kernel_getsockname(newsock, sin, &slen);
> if (unlikely(err < 0)) {
> dprintk("svc_tcp_accept: kernel_getsockname error %d\n", -err);
> slen = offsetof(struct sockaddr, sa_data);
> }
> - memcpy(&newsvsk->sk_local, sin, slen);
> + memcpy(&newsvsk->sk_xprt.xpt_local, sin, slen);
>
> if (serv->sv_stats)
> serv->sv_stats->nettcpconn++;
> @@ -1496,6 +1476,29 @@ svc_check_conn_limits(struct svc_serv *s
> }
> }
>
> +static inline void svc_copy_addr(struct svc_rqst *rqstp, struct svc_xprt *xprt)
> +{
> + struct sockaddr *sin;
> +
> + /* sock_recvmsg doesn't fill in the name/namelen, so we must..
> + */
> + memcpy(&rqstp->rq_addr, &xprt->xpt_remote, xprt->xpt_remotelen);
> + rqstp->rq_addrlen = xprt->xpt_remotelen;
> +
> + /* Destination address in request is needed for binding the
> + * source address in RPC callbacks later.
> + */
> + sin = (struct sockaddr *)&xprt->xpt_local;
> + switch (sin->sa_family) {
> + case AF_INET:
> + rqstp->rq_daddr.addr = ((struct sockaddr_in *)sin)->sin_addr;
> + break;
> + case AF_INET6:
> + rqstp->rq_daddr.addr6 = ((struct sockaddr_in6 *)sin)->sin6_addr;
> + break;
> + }
> +}
> +
> /*
> * Receive the next request on any socket. This code is carefully
> * organised not to touch any cachelines in the shared svc_serv
> @@ -1614,6 +1617,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
> len = svc_deferred_recv(rqstp);
> } else
> len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
> + svc_copy_addr(rqstp, &svsk->sk_xprt);
> dprintk("svc: got len=%d\n", len);
> }
>

Chuck Lever
[email protected]





2007-10-02 16:36:50

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC,PATCH 27/35] svc: Make svc_recv transport neutral

On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
>
> All of the transport field and functions used by svc_recv are now
> transport independent. Change the svc_recv function to use the
> svc_xprt
> structure directly instead of the transport specific svc_sock
> structure.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> net/sunrpc/svcsock.c | 64 ++++++++++++++++++++++++
> +-------------------------
> 1 files changed, 32 insertions(+), 32 deletions(-)
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 68ae7a9..573792f 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -321,22 +321,22 @@ EXPORT_SYMBOL_GPL(svc_xprt_enqueue);
> /*
> * Dequeue the first socket. Must be called with the pool-
> >sp_lock held.
> */
> -static inline struct svc_sock *
> -svc_sock_dequeue(struct svc_pool *pool)
> +static inline struct svc_xprt *
> +svc_xprt_dequeue(struct svc_pool *pool)
> {
> - struct svc_sock *svsk;
> + struct svc_xprt *xprt;
>
> if (list_empty(&pool->sp_sockets))
> return NULL;
>
> - svsk = list_entry(pool->sp_sockets.next,
> - struct svc_sock, sk_xprt.xpt_ready);
> - list_del_init(&svsk->sk_xprt.xpt_ready);
> + xprt = list_entry(pool->sp_sockets.next,
> + struct svc_xprt, xpt_ready);
> + list_del_init(&xprt->xpt_ready);
>
> - dprintk("svc: socket %p dequeued, inuse=%d\n",
> - svsk->sk_sk, atomic_read(&svsk->sk_xprt.xpt_ref.refcount));
> + dprintk("svc: transport %p dequeued, inuse=%d\n",
> + xprt, atomic_read(&xprt->xpt_ref.refcount));
>
> - return svsk;
> + return xprt;
> }
>
> /*
> @@ -1506,20 +1506,20 @@ static inline void svc_copy_addr(struct
> int
> svc_recv(struct svc_rqst *rqstp, long timeout)
> {
> - struct svc_sock *svsk = NULL;
> + struct svc_xprt *xprt = NULL;
> struct svc_serv *serv = rqstp->rq_server;
> struct svc_pool *pool = rqstp->rq_pool;
> int len, i;
> - int pages;
> + int pages;
> struct xdr_buf *arg;
> DECLARE_WAITQUEUE(wait, current);
>
> dprintk("svc: server %p waiting for data (to = %ld)\n",
> rqstp, timeout);
>
> - if (rqstp->rq_sock)
> + if (rqstp->rq_xprt)
> printk(KERN_ERR
> - "svc_recv: service %p, socket not NULL!\n",
> + "svc_recv: service %p, transport not NULL!\n",
> rqstp);

Again, anyone know why this printk is here? Is it still needed?

> if (waitqueue_active(&rqstp->rq_wait))
> printk(KERN_ERR
> @@ -1556,11 +1556,11 @@ svc_recv(struct svc_rqst *rqstp, long ti
> return -EINTR;
>
> spin_lock_bh(&pool->sp_lock);
> - if ((svsk = svc_sock_dequeue(pool)) != NULL) {
> - rqstp->rq_sock = svsk;
> - svc_xprt_get(&svsk->sk_xprt);
> + if ((xprt = svc_xprt_dequeue(pool)) != NULL) {
> + rqstp->rq_xprt = xprt;
> + svc_xprt_get(xprt);
> rqstp->rq_reserved = serv->sv_max_mesg;
> - atomic_add(rqstp->rq_reserved, &svsk->sk_xprt.xpt_reserved);
> + atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
> } else {
> /* No data pending. Go to sleep */
> svc_thread_enqueue(pool, rqstp);
> @@ -1580,7 +1580,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
> spin_lock_bh(&pool->sp_lock);
> remove_wait_queue(&rqstp->rq_wait, &wait);
>
> - if (!(svsk = rqstp->rq_sock)) {
> + if (!(xprt = rqstp->rq_xprt)) {
> svc_thread_dequeue(pool, rqstp);
> spin_unlock_bh(&pool->sp_lock);
> dprintk("svc: server %p, no data yet\n", rqstp);
> @@ -1590,12 +1590,12 @@ svc_recv(struct svc_rqst *rqstp, long ti
> spin_unlock_bh(&pool->sp_lock);
>
> len = 0;
> - if (test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags)) {
> + if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
> dprintk("svc_recv: found XPT_CLOSE\n");
> - svc_delete_xprt(&svsk->sk_xprt);
> - } else if (test_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags)) {
> + svc_delete_xprt(xprt);
> + } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
> struct svc_xprt *newxpt;
> - newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
> + newxpt = xprt->xpt_ops.xpo_accept(xprt);
> if (newxpt) {
> svc_xprt_received(newxpt);
> /*
> @@ -1603,20 +1603,20 @@ svc_recv(struct svc_rqst *rqstp, long ti
> * listener holds a reference too
> */
> __module_get(newxpt->xpt_class->xcl_owner);
> - svc_check_conn_limits(svsk->sk_xprt.xpt_server);
> + svc_check_conn_limits(xprt->xpt_server);
> }
> - svc_xprt_received(&svsk->sk_xprt);
> + svc_xprt_received(xprt);
> } else {
> - dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> - rqstp, pool->sp_id, svsk,
> - atomic_read(&svsk->sk_xprt.xpt_ref.refcount));
> + dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
> + rqstp, pool->sp_id, xprt,
> + atomic_read(&xprt->xpt_ref.refcount));
>
> - if ((rqstp->rq_deferred = svc_deferred_dequeue(&svsk->sk_xprt))) {
> - svc_xprt_received(&svsk->sk_xprt);
> + if ((rqstp->rq_deferred = svc_deferred_dequeue(xprt))) {
> + svc_xprt_received(xprt);
> len = svc_deferred_recv(rqstp);
> } else
> - len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
> - svc_copy_addr(rqstp, &svsk->sk_xprt);
> + len = xprt->xpt_ops.xpo_recvfrom(rqstp);
> + svc_copy_addr(rqstp, xprt);
> dprintk("svc: got len=%d\n", len);
> }
>
> @@ -1626,7 +1626,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
> svc_xprt_release(rqstp);
> return -EAGAIN;
> }
> - clear_bit(XPT_OLD, &svsk->sk_xprt.xpt_flags);
> + clear_bit(XPT_OLD, &xprt->xpt_flags);
>
> rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
> rqstp->rq_chandle.defer = svc_defer;

Chuck Lever
[email protected]





2007-10-02 16:36:51

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 06/35] svc: Add transport specific xpo_release function

On Tue, 2007-10-02 at 11:18 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >

[...snip...]

> >
> > struct svc_xprt_class {
>
> You intend to add xpo_detach and xpo_free in a later patch. The
> method names suggest all of these operate on the svc_xprt.
> xpo_release, however appears to operate on a request, not on a svc_xprt.

It's a little of both. It allows a transport to handle
transport-specific context information that is cached in the rqstp
structure.
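
[The pattern under discussion can be sketched in userspace C. All names here (rqst, xprt_ops, xpo_release, sock_release) are illustrative stand-ins, not the actual kernel structures: the request caches an opaque transport context, and the transport's release op is the only code that knows how to free it; for the socket transports that context is the received skb.]

```c
#include <assert.h>
#include <stddef.h>

struct rqst;

struct xprt_ops {
	void (*xpo_release)(struct rqst *);
};

struct rqst {
	void *xprt_ctxt;             /* opaque, transport-owned context */
	const struct xprt_ops *ops;  /* set when request is bound to a transport */
};

static int skb_freed;                /* stands in for skb_free_datagram() */

/* Socket transport's release op: frees the cached skb, if any */
static void sock_release(struct rqst *rq)
{
	if (rq->xprt_ctxt) {
		rq->xprt_ctxt = NULL;
		skb_freed++;
	}
}

static const struct xprt_ops sock_ops = { .xpo_release = sock_release };
```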

>
> Perhaps you might name this method xpo_release_rqst or some other
> name that indicates that this operates on a request. The name
> xpo_release could easily refer to closing the underlying socket. As
> an example, the client-side transport uses ->release_request.

Yes, I like your name better.

>
> The client side also appears to treat the transport as handling
> requests, instead of socket reads and writes. The use of the method
> names recvfrom and sendto suggest we are talking about bytes on a
> socket here, not transmitting and receiving whole RPC requests. I
> think that's a useful abstraction.
>
> While I'm whining aloud... I don't prefer the method names detach or
> free, either. On the client side we used close and destroy, which
> (to me, anyway) makes more sense.
>

Perhaps, but these names match what is actually done: detach disconnects
the transport from asynchronous I/O events, and free -- well frees.

> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 150531f..2d5731c 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -184,14 +184,14 @@ svc_thread_dequeue(struct svc_pool *pool
> > /*
> > * Release an skbuff after use
> > */
> > -static inline void
> > +static void
> > svc_release_skb(struct svc_rqst *rqstp)
> > {
> > - struct sk_buff *skb = rqstp->rq_skbuff;
> > + struct sk_buff *skb = rqstp->rq_xprt_ctxt;
> > struct svc_deferred_req *dr = rqstp->rq_deferred;
> >
> > if (skb) {
> > - rqstp->rq_skbuff = NULL;
> > + rqstp->rq_xprt_ctxt = NULL;
> >
> > dprintk("svc: service %p, releasing skb %p\n", rqstp, skb);
> > skb_free_datagram(rqstp->rq_sock->sk_sk, skb);
> > @@ -394,7 +394,7 @@ svc_sock_release(struct svc_rqst *rqstp)
> > {
> > struct svc_sock *svsk = rqstp->rq_sock;
> >
> > - svc_release_skb(rqstp);
> > + rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);
> >
> > svc_free_res_pages(rqstp);
> > rqstp->rq_res.page_len = 0;
> > @@ -866,7 +866,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
> > skb_free_datagram(svsk->sk_sk, skb);
> > return 0;
> > }
> > - rqstp->rq_skbuff = skb;
> > + rqstp->rq_xprt_ctxt = skb;
> > }
> >
> > rqstp->rq_arg.page_base = 0;
> > @@ -902,6 +902,7 @@ svc_udp_sendto(struct svc_rqst *rqstp)
> > static struct svc_xprt_ops svc_udp_ops = {
> > .xpo_recvfrom = svc_udp_recvfrom,
> > .xpo_sendto = svc_udp_sendto,
> > + .xpo_release = svc_release_skb,
> > };
> >
> > static struct svc_xprt_class svc_udp_class = {
> > @@ -1290,7 +1291,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
> > rqstp->rq_arg.page_len = len - rqstp->rq_arg.head[0].iov_len;
> > }
> >
> > - rqstp->rq_skbuff = NULL;
> > + rqstp->rq_xprt_ctxt = NULL;
> > rqstp->rq_prot = IPPROTO_TCP;
> >
> > /* Reset TCP read info */
> > @@ -1356,6 +1357,7 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
> > static struct svc_xprt_ops svc_tcp_ops = {
> > .xpo_recvfrom = svc_tcp_recvfrom,
> > .xpo_sendto = svc_tcp_sendto,
> > + .xpo_release = svc_release_skb,
> > };
> >
> > static struct svc_xprt_class svc_tcp_class = {
> > @@ -1577,7 +1579,7 @@ svc_send(struct svc_rqst *rqstp)
> > }
> >
> > /* release the receive skb before sending the reply */
> > - svc_release_skb(rqstp);
> > + rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);
> >
> > /* calculate over-all length */
> > xb = & rqstp->rq_res;
>
> Chuck Lever
> [email protected]
>
>



2007-10-02 16:52:58

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 25/35] svc: Move the sockaddr information to svc_xprt

On Tue, 2007-10-02 at 12:34 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
> >
> > Move the IP address fields to the svc_xprt structure. Note that this
> > assumes that _all_ RPC transports must have IP based 4-tuples. This
> > seems reasonable given the tight coupling with the portmapper etc...
> > Thoughts?
>
> My quibble is with "IP based 4-tuples" in your description -- that
> doesn't describe IPv6 addresses. "For now, we assume that an IP
> address and port is used to locate RPC transport endpoints." might be
> better.

I meant to imply both.

>
> The svc_copy_addr function below might benefit from a comment about
> why the address's port isn't copied as well. Otherwise we would use
> a straight memcpy.
>

ok,

> Also, the preference is to let the compiler decide for itself whether
> inlining is appropriate.
>

ok,

> >
> > Signed-off-by: Tom Tucker <[email protected]>
> > ---
> >
> > include/linux/sunrpc/svc_xprt.h | 3 ++
> > include/linux/sunrpc/svcsock.h | 4 ---
> > net/sunrpc/svcsock.c | 50 ++++++++++++++++++++
> > +------------------
> > 3 files changed, 30 insertions(+), 27 deletions(-)
> >
> > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/
> > svc_xprt.h
> > index ba92909..47ad941 100644
> > --- a/include/linux/sunrpc/svc_xprt.h
> > +++ b/include/linux/sunrpc/svc_xprt.h
> > @@ -62,6 +62,9 @@ #define XPT_CACHE_AUTH 12 /* cache auth
> > void *xpt_auth_cache;/* auth cache */
> > struct list_head xpt_deferred; /* deferred requests that need
> > * to be revisted */
> > + struct sockaddr_storage xpt_local; /* local address */
> > + struct sockaddr_storage xpt_remote; /* remote peer's address */
> > + int xpt_remotelen; /* length of address */
> > };
> >
> > int svc_reg_xprt_class(struct svc_xprt_class *);
> > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/
> > svcsock.h
> > index 96a229e..206f092 100644
> > --- a/include/linux/sunrpc/svcsock.h
> > +++ b/include/linux/sunrpc/svcsock.h
> > @@ -28,10 +28,6 @@ struct svc_sock {
> > /* private TCP part */
> > int sk_reclen; /* length of record */
> > int sk_tcplen; /* current read length */
> > -
> > - struct sockaddr_storage sk_local; /* local address */
> > - struct sockaddr_storage sk_remote; /* remote peer's address */
> > - int sk_remotelen; /* length of address */
> > };
> >
> > /*
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 0732dc2..ab34bb2 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -632,33 +632,13 @@ svc_recvfrom(struct svc_rqst *rqstp, str
> > struct msghdr msg = {
> > .msg_flags = MSG_DONTWAIT,
> > };
> > - struct sockaddr *sin;
> > int len;
> >
> > len = kernel_recvmsg(svsk->sk_sock, &msg, iov, nr, buflen,
> > msg.msg_flags);
> >
> > - /* sock_recvmsg doesn't fill in the name/namelen, so we must..
> > - */
> > - memcpy(&rqstp->rq_addr, &svsk->sk_remote, svsk->sk_remotelen);
> > - rqstp->rq_addrlen = svsk->sk_remotelen;
> > -
> > - /* Destination address in request is needed for binding the
> > - * source address in RPC callbacks later.
> > - */
> > - sin = (struct sockaddr *)&svsk->sk_local;
> > - switch (sin->sa_family) {
> > - case AF_INET:
> > - rqstp->rq_daddr.addr = ((struct sockaddr_in *)sin)->sin_addr;
> > - break;
> > - case AF_INET6:
> > - rqstp->rq_daddr.addr6 = ((struct sockaddr_in6 *)sin)->sin6_addr;
> > - break;
> > - }
> > -
> > dprintk("svc: socket %p recvfrom(%p, %Zu) = %d\n",
> > svsk, iov[0].iov_base, iov[0].iov_len, len);
> > -
> > return len;
> > }
> >
> > @@ -1113,14 +1093,14 @@ svc_tcp_accept(struct svc_xprt *xprt)
> > if (!(newsvsk = svc_setup_socket(serv, newsock, &err,
> > (SVC_SOCK_ANONYMOUS | SVC_SOCK_TEMPORARY))))
> > goto failed;
> > - memcpy(&newsvsk->sk_remote, sin, slen);
> > - newsvsk->sk_remotelen = slen;
> > + memcpy(&newsvsk->sk_xprt.xpt_remote, sin, slen);
> > + newsvsk->sk_xprt.xpt_remotelen = slen;
> > err = kernel_getsockname(newsock, sin, &slen);
> > if (unlikely(err < 0)) {
> > dprintk("svc_tcp_accept: kernel_getsockname error %d\n", -err);
> > slen = offsetof(struct sockaddr, sa_data);
> > }
> > - memcpy(&newsvsk->sk_local, sin, slen);
> > + memcpy(&newsvsk->sk_xprt.xpt_local, sin, slen);
> >
> > if (serv->sv_stats)
> > serv->sv_stats->nettcpconn++;
> > @@ -1496,6 +1476,29 @@ svc_check_conn_limits(struct svc_serv *s
> > }
> > }
> >
> > +static inline void svc_copy_addr(struct svc_rqst *rqstp, struct
> > svc_xprt *xprt)
> > +{
> > + struct sockaddr *sin;
> > +
> > + /* sock_recvmsg doesn't fill in the name/namelen, so we must..
> > + */
> > + memcpy(&rqstp->rq_addr, &xprt->xpt_remote, xprt->xpt_remotelen);
> > + rqstp->rq_addrlen = xprt->xpt_remotelen;
> > +
> > + /* Destination address in request is needed for binding the
> > + * source address in RPC callbacks later.
> > + */
> > + sin = (struct sockaddr *)&xprt->xpt_local;
> > + switch (sin->sa_family) {
> > + case AF_INET:
> > + rqstp->rq_daddr.addr = ((struct sockaddr_in *)sin)->sin_addr;
> > + break;
> > + case AF_INET6:
> > + rqstp->rq_daddr.addr6 = ((struct sockaddr_in6 *)sin)->sin6_addr;
> > + break;
> > + }
> > +}
> > +
> > /*
> > * Receive the next request on any socket. This code is carefully
> > * organised not to touch any cachelines in the shared svc_serv
> > @@ -1614,6 +1617,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
> > len = svc_deferred_recv(rqstp);
> > } else
> > len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
> > + svc_copy_addr(rqstp, &svsk->sk_xprt);
> > dprintk("svc: got len=%d\n", len);
> > }
> >
>
> Chuck Lever
> [email protected]
>
>



2007-10-02 16:52:58

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 29/35] svc: Move common create logic to common code

On Tue, 2007-10-02 at 12:42 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
> >
[...snip...]

> > @@ -1832,6 +1826,12 @@ int svc_addsock(struct svc_serv *serv,
> > svc_xprt_received(&svsk->sk_xprt);
> > err = 0;
> > }
> > + if (so->sk->sk_protocol == IPPROTO_TCP)
> > + set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
>
> One might wish that, even though this is in socket-transport-specific
> code, you didn't have to check the underlying socket's protocol
> setting here, but that this could be done in a TCP-specific function
> such as svc_tcp_init.

I'll move it to svc_tcp_init and look at sk_state. Thanks,
Tom
>
> > + clear_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
> > + spin_lock_bh(&serv->sv_lock);
> > + list_add(&svsk->sk_xprt.xpt_list, &serv->sv_permsocks);
> > + spin_unlock_bh(&serv->sv_lock);
> > }
> > if (err) {
> > sockfd_put(so);
>
> Chuck Lever
> [email protected]
>
>



2007-10-02 16:53:03

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 29/35] svc: Move common create logic to common code

On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
>
> Move the code that adds a transport instance to the sv_tempsocks and
> sv_permsocks lists out of the transport specific functions and into
> core
> logic.
>
> The svc_addsock routine still manipulates sv_permsocks directly. This
> code may be removed when rpc.nfsd is modified to create transports
> by writing to the portlist file.
>
> Signed-off-by: Tom Tucker <[email protected]>
> ---
>
> net/sunrpc/svc_xprt.c | 7 +++++++
> net/sunrpc/svcsock.c | 38 +++++++++++++++++++-------------------
> 2 files changed, 26 insertions(+), 19 deletions(-)
>
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 2a27d5e..56cda03 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -144,6 +144,13 @@ int svc_create_xprt(struct svc_serv *ser
> if (IS_ERR(newxprt)) {
> module_put(xcl->xcl_owner);
> ret = PTR_ERR(newxprt);
> + } else {
> + clear_bit(XPT_TEMP,
> + &newxprt->xpt_flags);
> + spin_lock_bh(&serv->sv_lock);
> + list_add(&newxprt->xpt_list,
> + &serv->sv_permsocks);
> + spin_unlock_bh(&serv->sv_lock);
> }
> }
> goto out;
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index d6f3c02..f1ea6f7 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -93,6 +93,7 @@ static int svc_deferred_recv(struct svc_
> static struct cache_deferred_req *svc_defer(struct cache_req *req);
> static struct svc_xprt *
> svc_create_socket(struct svc_serv *, int, struct sockaddr *, int,
> int);
> +static void svc_age_temp_xprts(unsigned long closure);
>
> /* apparently the "standard" is that clients close
> * idle connections after 5 minutes, servers after
> @@ -1604,6 +1605,18 @@ svc_recv(struct svc_rqst *rqstp, long ti
> */
> __module_get(newxpt->xpt_class->xcl_owner);
> svc_check_conn_limits(xprt->xpt_server);
> + spin_lock_bh(&serv->sv_lock);
> + set_bit(XPT_TEMP, &newxpt->xpt_flags);
> + list_add(&newxpt->xpt_list, &serv->sv_tempsocks);
> + serv->sv_tmpcnt++;
> + if (serv->sv_temptimer.function == NULL) {
> + /* setup timer to age temp sockets */
> + setup_timer(&serv->sv_temptimer, svc_age_temp_xprts,
> + (unsigned long)serv);
> + mod_timer(&serv->sv_temptimer,
> + jiffies + svc_conn_age_period * HZ);
> + }
> + spin_unlock_bh(&serv->sv_lock);
> }
> svc_xprt_received(xprt);
> } else {
> @@ -1750,7 +1763,6 @@ static struct svc_sock *svc_setup_socket
> struct svc_sock *svsk;
> struct sock *inet;
> int pmap_register = !(flags & SVC_SOCK_ANONYMOUS);
> - int is_temporary = flags & SVC_SOCK_TEMPORARY;
>
> dprintk("svc: svc_setup_socket %p\n", sock);
> if (!(svsk = kzalloc(sizeof(*svsk), GFP_KERNEL))) {
> @@ -1784,24 +1796,6 @@ static struct svc_sock *svc_setup_socket
> else
> svc_tcp_init(svsk, serv);
>
> - spin_lock_bh(&serv->sv_lock);
> - if (is_temporary) {
> - set_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
> - list_add(&svsk->sk_xprt.xpt_list, &serv->sv_tempsocks);
> - serv->sv_tmpcnt++;
> - if (serv->sv_temptimer.function == NULL) {
> - /* setup timer to age temp sockets */
> - setup_timer(&serv->sv_temptimer, svc_age_temp_xprts,
> - (unsigned long)serv);
> - mod_timer(&serv->sv_temptimer,
> - jiffies + svc_conn_age_period * HZ);
> - }
> - } else {
> - clear_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
> - list_add(&svsk->sk_xprt.xpt_list, &serv->sv_permsocks);
> - }
> - spin_unlock_bh(&serv->sv_lock);
> -
> dprintk("svc: svc_setup_socket created %p (inet %p)\n",
> svsk, svsk->sk_sk);
>
> @@ -1832,6 +1826,12 @@ int svc_addsock(struct svc_serv *serv,
> svc_xprt_received(&svsk->sk_xprt);
> err = 0;
> }
> + if (so->sk->sk_protocol == IPPROTO_TCP)
> + set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);

One might wish that, even though this is in socket-transport-specific
code, you didn't have to check the underlying socket's protocol
setting here, but that this could be done in a TCP-specific function
such as svc_tcp_init.

> + clear_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
> + spin_lock_bh(&serv->sv_lock);
> + list_add(&svsk->sk_xprt.xpt_list, &serv->sv_permsocks);
> + spin_unlock_bh(&serv->sv_lock);
> }
> if (err) {
> sockfd_put(so);

Chuck Lever
[email protected]





2007-10-02 16:53:05

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC,PATCH 11/35] svc: Add xpo_accept transport function

On Tue, 2007-10-02 at 11:33 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:

[...snip...]

> > + if (newxpt)
> > + svc_check_conn_limits(svsk->sk_server);
> > + svc_sock_received(svsk);
> > } else {
> > dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> > rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
>
> Instead of adding a test_bit() and conditional branch here, why not
> always call xpo_accept? For UDP, the method simply returns.
>

That's what I thought at first too, but UDP needs to call receive here.
Doing nothing stalls the service and lockd never gets set up.

> > @@ -1859,6 +1873,8 @@ static int svc_create_socket(struct svc_
> > }
> >
> > if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
> > + if (protocol == IPPROTO_TCP)
> > + set_bit(SK_LISTENER, &svsk->sk_flags);
> > svc_sock_received(svsk);
> > return ntohs(inet_sk(svsk->sk_sk)->sport);
> > }
>
> If you really need to set SK_LISTENER for TCP, shouldn't that be done
> in svc_tcp_init() ?

Yes, it could since svc_tcp_init looks at the socket state.

>
> Chuck Lever
> [email protected]
>
>



2007-10-02 16:55:06

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC,PATCH 20/35] svc: Make svc_send transport neutral


On Oct 2, 2007, at 12:46 PM, Tom Tucker wrote:

> On Tue, 2007-10-02 at 12:15 -0400, Chuck Lever wrote:
>> On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
>>>
>
> [...snip...]
>
>>> - if ((svsk = rqstp->rq_sock) == NULL) {
>>> - printk(KERN_WARNING "NULL socket pointer in %s:%d\n",
>>> + if ((xprt = rqstp->rq_xprt) == NULL) {
>>> + printk(KERN_WARNING "NULL transport pointer in %s:%d\n",
>>> __FILE__, __LINE__);
>>> return -EFAULT;
>>> }
>>
>> Do we still want this printk here? Maybe it can be removed.
>
> I don't know why it's here. Maybe replace it with a BUG_ON?

/me makes an X with his fingers and hisses like a cat....

BUG_ON is heavyweight and often makes the system unusable after a
short while by leaving a lot of bogus state (like held locks).
Unless a NULL here is a real sign of a software fault that requires a
hard stop, simply returning EFAULT makes sense.
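
[The defensive-return pattern described above, as a userspace sketch with hypothetical names: a NULL transport pointer produces a warning and -EFAULT to the caller, and the service keeps running, rather than a BUG_ON that halts the machine with locks still held.]

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct xprt { int unused; };
struct rqst { struct xprt *rq_xprt; };

static int svc_send_sketch(struct rqst *rqstp)
{
	struct xprt *xprt = rqstp->rq_xprt;

	if (xprt == NULL) {
		/* the kernel code would printk(KERN_WARNING ...) here;
		 * the caller sees an error, the service stays up */
		return -EFAULT;
	}
	return 0;  /* proceed with the normal send path */
}
```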

>>
>>> @@ -1674,13 +1674,13 @@ svc_send(struct svc_rqst *rqstp)
>>> xb->page_len +
>>> xb->tail[0].iov_len;
>>>
>>> - /* Grab svsk->sk_mutex to serialize outgoing data. */
>>> - mutex_lock(&svsk->sk_mutex);
>>> - if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags))
>>> + /* Grab mutex to serialize outgoing data. */
>>> + mutex_lock(&xprt->xpt_mutex);
>>> + if (test_bit(XPT_DEAD, &xprt->xpt_flags))
>>> len = -ENOTCONN;
>>> else
>>> - len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
>>> - mutex_unlock(&svsk->sk_mutex);
>>> + len = xprt->xpt_ops.xpo_sendto(rqstp);
>>> + mutex_unlock(&xprt->xpt_mutex);
>>> svc_sock_release(rqstp);
>>>
>>> if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
>>> @@ -1782,7 +1782,6 @@ static struct svc_sock *svc_setup_socket
>>> svsk->sk_lastrecv = get_seconds();
>>> spin_lock_init(&svsk->sk_lock);
>>> INIT_LIST_HEAD(&svsk->sk_deferred);
>>> - mutex_init(&svsk->sk_mutex);
>>>
>>> /* Initialize the socket */
>>> if (sock->type == SOCK_DGRAM)
>>
>> Chuck Lever
>> [email protected]
>>
>>
>

Chuck Lever
[email protected]





2007-10-02 16:58:15

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class


On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:

> On Tue, 2007-10-02 at 11:04 -0400, Chuck Lever wrote:
>> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>>>
>>> The sk_sendto and sk_recvfrom are function pointers that allow
>>> svc_sock
>>> to be used for both UDP and TCP. Move these function pointers to the
>>> svc_xprt_ops structure.
>>>
>>> Signed-off-by: Tom Tucker <[email protected]>
>>> ---
>>>
>>> include/linux/sunrpc/svc_xprt.h | 2 ++
>>> include/linux/sunrpc/svcsock.h | 3 ---
>>> net/sunrpc/svcsock.c | 12 ++++++------
>>> 3 files changed, 8 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/
>>> svc_xprt.h
>>> index 827f0fe..f0ba052 100644
>>> --- a/include/linux/sunrpc/svc_xprt.h
>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>> @@ -10,6 +10,8 @@ #define SUNRPC_SVC_XPRT_H
>>> #include <linux/sunrpc/svc.h>
>>>
>>> struct svc_xprt_ops {
>>> + int (*xpo_recvfrom)(struct svc_rqst *);
>>> + int (*xpo_sendto)(struct svc_rqst *);
>>> };
>>>
>>> struct svc_xprt_class {
>>> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/
>>> svcsock.h
>>> index 1878cbe..08e78d0 100644
>>> --- a/include/linux/sunrpc/svcsock.h
>>> +++ b/include/linux/sunrpc/svcsock.h
>>> @@ -45,9 +45,6 @@ #define SK_DETACHED 10 /* detached fro
>>> * be revisted */
>>> struct mutex sk_mutex; /* to serialize sending data */
>>>
>>> - int (*sk_recvfrom)(struct svc_rqst *rqstp);
>>> - int (*sk_sendto)(struct svc_rqst *rqstp);
>>> -
>>> /* We keep the old state_change and data_ready CB's here */
>>> void (*sk_ostate)(struct sock *);
>>> void (*sk_odata)(struct sock *, int bytes);
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index d84b5c8..150531f 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -900,6 +900,8 @@ svc_udp_sendto(struct svc_rqst *rqstp)
>>> }
>>>
>>> static struct svc_xprt_ops svc_udp_ops = {
>>> + .xpo_recvfrom = svc_udp_recvfrom,
>>> + .xpo_sendto = svc_udp_sendto,
>>> };
>>>
>>> static struct svc_xprt_class svc_udp_class = {
>>> @@ -917,8 +919,6 @@ svc_udp_init(struct svc_sock *svsk)
>>> svc_xprt_init(&svc_udp_class, &svsk->sk_xprt);
>>> svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
>>> svsk->sk_sk->sk_write_space = svc_write_space;
>>> - svsk->sk_recvfrom = svc_udp_recvfrom;
>>> - svsk->sk_sendto = svc_udp_sendto;
>>>
>>> /* initialise setting must have enough space to
>>> * receive and respond to one request.
>>> @@ -1354,6 +1354,8 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
>>> }
>>>
>>> static struct svc_xprt_ops svc_tcp_ops = {
>>> + .xpo_recvfrom = svc_tcp_recvfrom,
>>> + .xpo_sendto = svc_tcp_sendto,
>>> };
>>>
>>> static struct svc_xprt_class svc_tcp_class = {
>>> @@ -1381,8 +1383,6 @@ svc_tcp_init(struct svc_sock *svsk)
>>> struct tcp_sock *tp = tcp_sk(sk);
>>>
>>> svc_xprt_init(&svc_tcp_class, &svsk->sk_xprt);
>>> - svsk->sk_recvfrom = svc_tcp_recvfrom;
>>> - svsk->sk_sendto = svc_tcp_sendto;
>>>
>>> if (sk->sk_state == TCP_LISTEN) {
>>> dprintk("setting up TCP socket for listening\n");
>>> @@ -1530,7 +1530,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
>>>
>>> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
>>> rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
>>> - len = svsk->sk_recvfrom(rqstp);
>>> + len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
>>> dprintk("svc: got len=%d\n", len);
>>>
>>> /* No data, incomplete (TCP) read, or accept() */
>>> @@ -1590,7 +1590,7 @@ svc_send(struct svc_rqst *rqstp)
>>> if (test_bit(SK_DEAD, &svsk->sk_flags))
>>> len = -ENOTCONN;
>>> else
>>> - len = svsk->sk_sendto(rqstp);
>>> + len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
>>> mutex_unlock(&svsk->sk_mutex);
>>> svc_sock_release(rqstp);
>>
>> Again, here you have copied a pointer from the class structure to the
>> instance structure -- the address of the transport ops structure
>> never changes during the lifetime of the xprt instance, does it? You
>> could just as easily use the class's ops pointer instead.
>>
>> It looks like on the client side, I didn't put the ops vector or the
>> payload maximum in the class structure at all... 6 of one, half dozen
>> of the other. Using the class's value of the ops and payload maximum
>> would save some space in the svc_xprt, though, come to think of it.
>>
>
> cache thing again. let's see how Greg weighs in.

The ops vector itself will be in some other CPU's memory most of the
time on big systems. I don't see how you can avoid a peek... but
since it's a constant, caching should protect you most of the time, yes?
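
[The space tradeoff being discussed can be sketched like this (userspace C, illustrative names only): instead of copying the ops pointer into every svc_xprt instance, each instance carries one pointer to its class, and common code dispatches through the class's shared ops vector.]

```c
#include <assert.h>
#include <stddef.h>

struct svc_rqst_sketch;

struct xprt_ops {
	int (*xpo_sendto)(struct svc_rqst_sketch *);
};

struct xprt_class {
	const char *name;
	const struct xprt_ops *ops;      /* one copy, shared by all instances */
};

struct svc_xprt_sketch {
	const struct xprt_class *class;  /* only a class pointer per instance */
};

struct svc_rqst_sketch { int sent; };

static int tcp_sendto(struct svc_rqst_sketch *rq) { rq->sent = 1; return 0; }

static const struct xprt_ops tcp_ops = { .xpo_sendto = tcp_sendto };
static const struct xprt_class tcp_class = { "tcp", &tcp_ops };

/* Transport-independent code reads the ops through the class */
static int xprt_sendto(struct svc_xprt_sketch *x, struct svc_rqst_sketch *rq)
{
	return x->class->ops->xpo_sendto(rq);
}
```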

Chuck Lever
[email protected]





2007-10-02 17:08:33

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC,PATCH 11/35] svc: Add xpo_accept transport function

On Oct 2, 2007, at 12:41 PM, Tom Tucker wrote:
> On Tue, 2007-10-02 at 11:33 -0400, Chuck Lever wrote:
>> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>
> [...snip...]
>
>>> + if (newxpt)
>>> + svc_check_conn_limits(svsk->sk_server);
>>> + svc_sock_received(svsk);
>>> } else {
>>> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
>>> rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
>>
>> Instead of adding a test_bit() and conditional branch here, why not
>> always call xpo_accept? For UDP, the method simply returns.
>>
>
> That's what I thought at first too, but UDP needs to call receive
> here.
> Doing nothing stalls the service and lockd never gets set up.

The purpose of a transport switch is to force all the transport
specific processing down into the transport implementation so you
don't need these SK_ switches to decide whether or not to call a
function based on which transport is in use.

Could you instead create, say, an ->xpo_accept_and_receive hook that
did the right thing for all three transports?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2007-10-02 16:52:56

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Tue, 2007-10-02 at 11:44 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >
> > Modify the various kernel RPC svcs to use the svc_create_xprt service.
> >
> > Signed-off-by: Tom Tucker <[email protected]>
> > ---
> >
> > fs/lockd/svc.c | 17 ++++++++---------
> > fs/nfs/callback.c | 4 ++--
> > fs/nfsd/nfssvc.c | 4 ++--
> > include/linux/sunrpc/svcsock.h | 1 -
> > net/sunrpc/sunrpc_syms.c | 1 -
> > net/sunrpc/svcsock.c | 22 ----------------------
> > 6 files changed, 12 insertions(+), 37 deletions(-)
> >
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index 82e2192..8686915 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -219,13 +219,12 @@ lockd(struct svc_rqst *rqstp)
> > module_put_and_exit(0);
> > }
> >
> > -
> > -static int find_socket(struct svc_serv *serv, int proto)
> > +static int find_xprt(struct svc_serv *serv, char *proto)
> > {
> > struct svc_sock *svsk;
> > int found = 0;
> > list_for_each_entry(svsk, &serv->sv_permsocks, sk_list)
> > - if (svsk->sk_sk->sk_protocol == proto) {
> > + if (strcmp(svsk->sk_xprt.xpt_class->xcl_name, proto) == 0) {
> > found = 1;
> > break;
> > }
>
> This is scary. :-)
>

Yes, I agree. In fact, this is the "last place" where svcs rummage
around in svc_xprt internals. This was along the lines of "how much
bigger do I make this patchset?" BTW, this function was already here; I
just modified it.

> First, I think we would be better off making the server transport API
> stronger by not allowing ULPs to dig around in svc_xprt or the
> svc_xprt_class structures directly. Perhaps you could provide a
> method for obtaining the transport's NETID.

I'll propose a service. NETID or simply protocol/port? What's the
consensus?

>
> Second, is there any guarantee that the string name of the underlying
> protocol is the same as the name of the transport class? Is there
> any relationship between the transport name and the NETIDs it supports?
>

None that are enforced, but that would be cool.

> > @@ -243,13 +242,13 @@ static int make_socks(struct svc_serv *s
> > int err = 0;
> >
> > if (proto == IPPROTO_UDP || nlm_udpport)
> > - if (!find_socket(serv, IPPROTO_UDP))
> > - err = svc_makesock(serv, IPPROTO_UDP, nlm_udpport,
> > - SVC_SOCK_DEFAULTS);
> > + if (!find_xprt(serv, "udp"))
> > + err = svc_create_xprt(serv, "udp", nlm_udpport,
> > + SVC_SOCK_DEFAULTS);
> > if (err >= 0 && (proto == IPPROTO_TCP || nlm_tcpport))
> > - if (!find_socket(serv, IPPROTO_TCP))
> > - err = svc_makesock(serv, IPPROTO_TCP, nlm_tcpport,
> > - SVC_SOCK_DEFAULTS);
> > + if (!find_xprt(serv, "tcp"))
> > + err = svc_create_xprt(serv, "tcp", nlm_tcpport,
> > + SVC_SOCK_DEFAULTS);
> >
> > if (err >= 0) {
> > warned = 0;
> > diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> > index a796be5..e27ca14 100644
> > --- a/fs/nfs/callback.c
> > +++ b/fs/nfs/callback.c
> > @@ -123,8 +123,8 @@ int nfs_callback_up(void)
> > if (!serv)
> > goto out_err;
> >
> > - ret = svc_makesock(serv, IPPROTO_TCP, nfs_callback_set_tcpport,
> > - SVC_SOCK_ANONYMOUS);
> > + ret = svc_create_xprt(serv, "tcp", nfs_callback_set_tcpport,
> > + SVC_SOCK_ANONYMOUS);
> > if (ret <= 0)
> > goto out_destroy;
> > nfs_callback_tcpport = ret;
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index a8c89ae..bf70b06 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -236,7 +236,7 @@ static int nfsd_init_socks(int port)
> >
> > error = lockd_up(IPPROTO_UDP);
> > if (error >= 0) {
> > - error = svc_makesock(nfsd_serv, IPPROTO_UDP, port,
> > + error = svc_create_xprt(nfsd_serv, "udp", port,
> > SVC_SOCK_DEFAULTS);
> > if (error < 0)
> > lockd_down();
> > @@ -247,7 +247,7 @@ static int nfsd_init_socks(int port)
> > #ifdef CONFIG_NFSD_TCP
> > error = lockd_up(IPPROTO_TCP);
> > if (error >= 0) {
> > - error = svc_makesock(nfsd_serv, IPPROTO_TCP, port,
> > + error = svc_create_xprt(nfsd_serv, "tcp", port,
> > SVC_SOCK_DEFAULTS);
> > if (error < 0)
> > lockd_down();
> > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/
> > svcsock.h
> > index 9882ce0..3181d9d 100644
> > --- a/include/linux/sunrpc/svcsock.h
> > +++ b/include/linux/sunrpc/svcsock.h
> > @@ -67,7 +67,6 @@ #define SK_LISTENER 11 /* listening en
> > /*
> > * Function prototypes.
> > */
> > -int svc_makesock(struct svc_serv *, int, unsigned short, int flags);
> > void svc_force_close_socket(struct svc_sock *);
> > int svc_recv(struct svc_rqst *, long);
> > int svc_send(struct svc_rqst *);
> > diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
> > index a62ce47..e4cad0f 100644
> > --- a/net/sunrpc/sunrpc_syms.c
> > +++ b/net/sunrpc/sunrpc_syms.c
> > @@ -72,7 +72,6 @@ EXPORT_SYMBOL(svc_drop);
> > EXPORT_SYMBOL(svc_process);
> > EXPORT_SYMBOL(svc_recv);
> > EXPORT_SYMBOL(svc_wake_up);
> > -EXPORT_SYMBOL(svc_makesock);
> > EXPORT_SYMBOL(svc_reserve);
> > EXPORT_SYMBOL(svc_auth_register);
> > EXPORT_SYMBOL(auth_domain_lookup);
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index e3c74e0..373f020 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -2012,28 +2012,6 @@ void svc_force_close_socket(struct svc_s
> > svc_close_socket(svsk);
> > }
> >
> > -/**
> > - * svc_makesock - Make a socket for nfsd and lockd
> > - * @serv: RPC server structure
> > - * @protocol: transport protocol to use
> > - * @port: port to use
> > - * @flags: requested socket characteristics
> > - *
> > - */
> > -int svc_makesock(struct svc_serv *serv, int protocol, unsigned
> > short port,
> > - int flags)
> > -{
> > - dprintk("svc: creating socket proto = %d\n", protocol);
> > - switch (protocol) {
> > - case IPPROTO_TCP:
> > - return svc_create_xprt(serv, "tcp", port, flags);
> > - case IPPROTO_UDP:
> > - return svc_create_xprt(serv, "udp", port, flags);
> > - default:
> > - return -EINVAL;
> > - }
> > -}
> > -
> > /*
> > * Handle defer and revisit of requests
> > */
>
> Chuck Lever
> [email protected]
>
>



2007-10-02 16:52:51

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC,PATCH 20/35] svc: Make svc_send transport neutral

On Tue, 2007-10-02 at 12:15 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
> >

[...snip...]

> > - if ((svsk = rqstp->rq_sock) == NULL) {
> > - printk(KERN_WARNING "NULL socket pointer in %s:%d\n",
> > + if ((xprt = rqstp->rq_xprt) == NULL) {
> > + printk(KERN_WARNING "NULL transport pointer in %s:%d\n",
> > __FILE__, __LINE__);
> > return -EFAULT;
> > }
>
> Do we still want this printk here? Maybe it can be removed.

I don't know why it's here. Maybe replace it with a BUG_ON?

>
> > @@ -1674,13 +1674,13 @@ svc_send(struct svc_rqst *rqstp)
> > xb->page_len +
> > xb->tail[0].iov_len;
> >
> > - /* Grab svsk->sk_mutex to serialize outgoing data. */
> > - mutex_lock(&svsk->sk_mutex);
> > - if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags))
> > + /* Grab mutex to serialize outgoing data. */
> > + mutex_lock(&xprt->xpt_mutex);
> > + if (test_bit(XPT_DEAD, &xprt->xpt_flags))
> > len = -ENOTCONN;
> > else
> > - len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
> > - mutex_unlock(&svsk->sk_mutex);
> > + len = xprt->xpt_ops.xpo_sendto(rqstp);
> > + mutex_unlock(&xprt->xpt_mutex);
> > svc_sock_release(rqstp);
> >
> > if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
> > @@ -1782,7 +1782,6 @@ static struct svc_sock *svc_setup_socket
> > svsk->sk_lastrecv = get_seconds();
> > spin_lock_init(&svsk->sk_lock);
> > INIT_LIST_HEAD(&svsk->sk_deferred);
> > - mutex_init(&svsk->sk_mutex);
> >
> > /* Initialize the socket */
> > if (sock->type == SOCK_DGRAM)
>
> Chuck Lever
> [email protected]
>
>



2007-10-02 18:25:38

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, 2007-10-02 at 12:57 -0400, Chuck Lever wrote:
> On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:

[...snip...]

> >>
> >> It looks like on the client side, I didn't put the ops vector or the
> >> payload maximum in the class structure at all... 6 of one, half dozen
> >> of the other. Using the class's value of the ops and payload maximum
> >> would save some space in the svc_xprt, though, come to think of it.
> >>
> >
> > cache thing again. let's see how Greg weighs in.
>
> The ops vector itself will be in some other CPU's memory most of the
> time on big systems.

Well, this is a good point. Unless we implement thread pools for
svc_xprt memory allocation, it likely won't buy you much.

> I don't see how you can avoid a peek... but
> since it's a constant, caching should protect you most of the time, yes?

>
> Chuck Lever
> [email protected]
>
>



2007-10-02 18:30:08

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC,PATCH 11/35] svc: Add xpo_accept transport function

On Tue, 2007-10-02 at 13:07 -0400, Chuck Lever wrote:
> On Oct 2, 2007, at 12:41 PM, Tom Tucker wrote:
> > On Tue, 2007-10-02 at 11:33 -0400, Chuck Lever wrote:
> >> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >
> > [...snip...]
> >
> >>> + if (newxpt)
> >>> + svc_check_conn_limits(svsk->sk_server);
> >>> + svc_sock_received(svsk);
> >>> } else {
> >>> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> >>> rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
> >>
> >> Instead of adding a test_bit() and conditional branch here, why not
> >> always call xpo_accept? For UDP, the method simply returns.
> >>
> >
> > That's what I thought at first too, but UDP needs to call receive
> > here.
> > Doing nothing stalls the service and lockd never gets set up.
>
> The purpose of a transport switch is to force all the transport
> specific processing down into the transport implementation so you
> don't need these SK_ switches to decide whether or not to call a
> function based on which transport is in use.

I don't think it's doing that. I think it's checking the "role" of the
instance; passive vs. active endpoint. The role is transport independent
and is checked in the generic svc_recv function.

>
> Could you instead create, say, an ->xpo_accept_and_receive hook that
> did the right thing for all three transports?

You could, but IMO doing so just neuters the meaning of the
XPT_LISTENING bit for peer-to-peer transports.

>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>



2007-10-02 18:31:58

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, 2007-10-02 at 13:24 -0500, Tom Tucker wrote:
> On Tue, 2007-10-02 at 12:57 -0400, Chuck Lever wrote:
> > On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:
>
> [...snip...]
>
> > >>
> > >> It looks like on the client side, I didn't put the ops vector or the
> > >> payload maximum in the class structure at all... 6 of one, half dozen
> > >> of the other. Using the class's value of the ops and payload maximum
> > >> would save some space in the svc_xprt, though, come to think of it.
> > >>
> > >
> > > cache thing again. let's see how Greg weighs in.
> >
> > The ops vector itself will be in some other CPU's memory most of the
> > time on big systems.
>
> Well this is a good point. Unless we implement thread pools for svc_xprt
> memory allocation, it won't likely buy you much.
>

Actually, I'm having second thoughts. Since the svc_xprt structure is
allocated on the rqstp thread in which the transport is going to be
used, won't the memory be local to the allocating processor on a NUMA
system?


> > I don't see how you can avoid a peek... but
> > since it's a constant, caching should protect you most of the time, yes?
>
> >
> > Chuck Lever
> > [email protected]
> >
> >
>
>



2007-10-02 18:48:17

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class


On Oct 2, 2007, at 2:30 PM, Tom Tucker wrote:

> On Tue, 2007-10-02 at 13:24 -0500, Tom Tucker wrote:
>> On Tue, 2007-10-02 at 12:57 -0400, Chuck Lever wrote:
>>> On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:
>>
>> [...snip...]
>>
>>>>>
>>>>> It looks like on the client side, I didn't put the ops vector
>>>>> or the
>>>>> payload maximum in the class structure at all... 6 of one, half
>>>>> dozen
>>>>> of the other. Using the class's value of the ops and payload
>>>>> maximum
>>>>> would save some space in the svc_xprt, though, come to think of
>>>>> it.
>>>>>
>>>>
>>>> cache thing again. let's see how Greg weighs in.
>>>
>>> The ops vector itself will be in some other CPU's memory most of the
>>> time on big systems.
>>
>> Well this is a good point. Unless we implement thread pools for
>> svc_xprt
>> memory allocation, it won't likely buy you much.
>>
>
> Actually, I'm having second thoughts. Since the svc_xprt structure is
> allocated on the rqstp thread in which the transport is going to be
> used, won't the memory be local to the allocating processor on a NUMA
> system?

The ops vector isn't in the svc_xprt. It's a constant, so it's in
memory allocated by the kernel loader at boot time.

In other words, on each request, you will need to get the address of
the vector from dynamically allocated memory, but the vector itself
(and therefore the addresses of the transport methods) will be fixed
and in the boot CPU's memory pool.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2007-10-02 18:50:07

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC,PATCH 11/35] svc: Add xpo_accept transport function


On Oct 2, 2007, at 2:28 PM, Tom Tucker wrote:

> On Tue, 2007-10-02 at 13:07 -0400, Chuck Lever wrote:
>> On Oct 2, 2007, at 12:41 PM, Tom Tucker wrote:
>>> On Tue, 2007-10-02 at 11:33 -0400, Chuck Lever wrote:
>>>> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>>>
>>> [...snip...]
>>>
>>>>> + if (newxpt)
>>>>> + svc_check_conn_limits(svsk->sk_server);
>>>>> + svc_sock_received(svsk);
>>>>> } else {
>>>>> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
>>>>> rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
>>>>
>>>> Instead of adding a test_bit() and conditional branch here, why not
>>>> always call xpo_accept? For UDP, the method simply returns.
>>>>
>>>
>>> That's what I thought at first too, but UDP needs to call receive
>>> here.
>>> Doing nothing stalls the service and lockd never gets set up.
>>
>> The purpose of a transport switch is to force all the transport
>> specific processing down into the transport implementation so you
>> don't need these SK_ switches to decide whether or not to call a
>> function based on which transport is in use.
>
> I don't think it's doing that. I think it's checking the "role" of the
> instance; passive vs. active endpoint. The role is transport
> independent
> and is checked in the generic svc_recv function.
>
>>
>> Could you instead create, say, an ->xpo_accept_and_receive hook that
>> did the right thing for all three transports?
>
> You could, but IMO doing so just neuters the meaning of the
> XPT_LISTENING bit for peer-to-peer transports.

Yes, that's the point. You wouldn't need XPT_LISTENING at all. The
transports would decide whether the endpoint is for listening or
receiving internally.

So I guess what I'm asking is: "Is there a good reason to expose the
difference between listening and receiving transport endpoints in the
generic code?"

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2007-10-02 19:56:22

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, 2007-10-02 at 14:47 -0400, Chuck Lever wrote:
> On Oct 2, 2007, at 2:30 PM, Tom Tucker wrote:
>
> > On Tue, 2007-10-02 at 13:24 -0500, Tom Tucker wrote:
> >> On Tue, 2007-10-02 at 12:57 -0400, Chuck Lever wrote:
> >>> On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:
> >>
> >> [...snip...]

[...snip...]

> >
> > Actually, I'm having second thoughts. Since the svc_xprt structure is
> > allocated on the rqstp thread in which the transport is going to be
> > used, won't the memory be local to the allocating processor on a NUMA
> > system?
>
> The ops vector isn't in the svc_xprt. It's a constant, so it's in
> memory allocated by the kernel loader at boot time.
>

I think one of us is missing something. Here's how I think it works...

The svc_xprt_ops structure is a constant in kernel memory. The
svc_xprt_class is also a constant and points to the svc_xprt_ops
structure. The svc_xprt structure, however, is allocated via kmalloc and
contains a _copy_ of the constant svc_xprt_ops structure and a copy of
the xcl_max_payload value. See the svc_xprt_init function.

My original thinking (flawed, I think) was that since the svc_xprt was
allocated in the context of the current rqstp thread, it would be
allocated from processor-local memory. While I think this is true,
subsequent assignment of a rqstp thread to service a transport has no
affinity to a particular transport. So, as you say, it will likely be in
some other processor's memory, so what does it matter?

So the question in my mind is: "is it worth it to create
rqstp-to-transport affinity when assigning a thread to service a
transport?" If not, then I should remove all this copy nonsense and
access the constant in svc_xprt_class directly.

> In other words, on each request, you will need to get the address of
> the vector from dynamically allocated memory, but the vector itself
> (and therefore the addresses of the transport methods) will be fixed
> and in the boot CPU's memory pool.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>



2007-10-02 20:30:05

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Oct 2, 2007, at 3:55 PM, Tom Tucker wrote:
> On Tue, 2007-10-02 at 14:47 -0400, Chuck Lever wrote:
>> On Oct 2, 2007, at 2:30 PM, Tom Tucker wrote:
>>
>>> On Tue, 2007-10-02 at 13:24 -0500, Tom Tucker wrote:
>>>> On Tue, 2007-10-02 at 12:57 -0400, Chuck Lever wrote:
>>>>> On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:
>>>>
>>>> [...snip...]
>
> [...snip...]
>
>>>
>>> Actually, I'm having second thoughts. Since the svc_xprt
>>> structure is
>>> allocated on the rqstp thread in which the transport is going to be
>>> used, won't the memory be local to the allocating processor on a
>>> NUMA
>>> system?
>>
>> The ops vector isn't in the svc_xprt. It's a constant, so it's in
>> memory allocated by the kernel loader at boot time.
>>
>
> I think one of us missing something. Here's how I think it works...
>
> The svc_xprt_ops structure is a constant in kernel memory. The
> svc_xprt_class is also a constant and points to the svc_xprt_ops
> structure. The svc_xprt structure, however, is allocated via
> kmalloc and
> contains a _copy_ of the constant svc_xprt_ops structure and a copy of
> the xcl_max_payload value. See the svc_xprt_init function.

Hence this:

+ xpt->xpt_ops = *xcl->xcl_ops;

I've never seen this kind of thing anywhere else in the kernel. At
the very least it deserves a special comment to explain why each
svc_xprt keeps its own copy of the transport class methods.

> My original thinking (flawed I think) was that since the svc_xprt was
> allocated in the context of the current rqstp thread, that it would be
> allocated from processor local memory. While I think this is true,
> subsequent assignment of a rqstp thread to service a transport has no
> affinity to a particular transport. So as you say, it will likely
> be in
> some other processors memory, so what does it matter?

I agree, I don't think currently there's any affinity between the
svc_xprt and any requests that may be created. If it really matters,
you can create a rqst creation method.

> So the question in my mind, is "is it worth it to create rqstp to
> transport affinity when assigning a thread to service a transport. If
> not, then I should remove all this copy nonsense and access the
> constant
> in svc_xprt_class directly.

If we're not already assigning socket or thread affinity to requests,
then I don't see that it needs to be enforced via the transport
switch interface.

>> In other words, on each request, you will need to get the address of
>> the vector from dynamically allocated memory, but the vector itself
>> (and therefore the addresses of the transport methods) will be fixed
>> and in the boot CPU's memory pool.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2007-10-02 20:36:36

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, 2007-10-02 at 16:29 -0400, Chuck Lever wrote:
> On Oct 2, 2007, at 3:55 PM, Tom Tucker wrote:
> > On Tue, 2007-10-02 at 14:47 -0400, Chuck Lever wrote:
> >> On Oct 2, 2007, at 2:30 PM, Tom Tucker wrote:
> >>
> >>> On Tue, 2007-10-02 at 13:24 -0500, Tom Tucker wrote:
> >>>> On Tue, 2007-10-02 at 12:57 -0400, Chuck Lever wrote:
> >>>>> On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:
> >>>>
> >>>> [...snip...]
> >
> > [...snip...]
> >
> >>>
> >>> Actually, I'm having second thoughts. Since the svc_xprt
> >>> structure is
> >>> allocated on the rqstp thread in which the transport is going to be
> >>> used, won't the memory be local to the allocating processor on a
> >>> NUMA
> >>> system?
> >>
> >> The ops vector isn't in the svc_xprt. It's a constant, so it's in
> >> memory allocated by the kernel loader at boot time.
> >>
> >
> > I think one of us missing something. Here's how I think it works...
> >
> > The svc_xprt_ops structure is a constant in kernel memory. The
> > svc_xprt_class is also a constant and points to the svc_xprt_ops
> > structure. The svc_xprt structure, however, is allocated via
> > kmalloc and
> > contains a _copy_ of the constant svc_xprt_ops structure and a copy of
> > the xcl_max_payload value. See the svc_xprt_init function.
>
> Hence this:
>
> + xpt->xpt_ops = *xcl->xcl_ops;
>
> I've never seen this kind of thing anywhere else in the kernel. At
> the very least it deserves a special comment to explain why each
> svc_xprt keeps its own copy of the transport class methods.
>
> > My original thinking (flawed I think) was that since the svc_xprt was
> > allocated in the context of the current rqstp thread, that it would be
> > allocated from processor local memory. While I think this is true,
> > subsequent assignment of a rqstp thread to service a transport has no
> > affinity to a particular transport. So as you say, it will likely
> > be in
> > some other processors memory, so what does it matter?
>
> I agree, I don't think currently there's any affinity between the
> svc_xprt and any requests that may be created. If it really matters,
> you can create a rqst creation method.
>
> > So the question in my mind, is "is it worth it to create rqstp to
> > transport affinity when assigning a thread to service a transport. If
> > not, then I should remove all this copy nonsense and access the
> > constant
> > in svc_xprt_class directly.
>
> If we're not already assigning socket or thread affinity to requests,
> then I don't see that it needs to be enforced via the transport
> switch interface.

I agree too. So unless anyone objects, I'm going to change the
svc_xprt structure to have a pointer to the svc_xprt_ops structure, and
the svc_max_payload will be accessed from the svc_xprt_class structure.

Keeping the pointer to the svc_xprt_ops structure in svc_xprt avoids one
pointer indirection when accessing ops (which is done quite a bit).

>
> >> In other words, on each request, you will need to get the address of
> >> the vector from dynamically allocated memory, but the vector itself
> >> (and therefore the addresses of the transport methods) will be fixed
> >> and in the boot CPU's memory pool.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>



2007-10-02 20:40:12

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, 2007-10-02 at 15:35 -0500, Tom Tucker wrote:
> On Tue, 2007-10-02 at 16:29 -0400, Chuck Lever wrote:

[...snip...]

> I agree too. So unless anyone objects, I'm going to change the the
> svc_xprt structure to have a pointer to the svc_xprt_ops structure, and
> the svc_max_payload will be accessed from the svc_xprt_class structure.
>
> The ptr to the svc_xprt_ops structure is to avoid one ptr indirection
> when accessing ops (done quite a bit).

BTW, this reminds me: the current implementation avoids one pointer
indirection when calling ops, as follows:

xprt->xpt_ops.op

If we change it to a pointer, it becomes:

xprt->xpt_ops->op


> >
> > >> In other words, on each request, you will need to get the address of
> > >> the vector from dynamically allocated memory, but the vector itself
> > >> (and therefore the addresses of the transport methods) will be fixed
> > >> and in the boot CPU's memory pool.
> >
> > --
> > Chuck Lever
> > chuck[dot]lever[at]oracle[dot]com
> >
> >
>
>



2007-10-03 10:59:09

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC,PATCH 00/35] SVC Transport Switch

On Mon, Oct 01, 2007 at 02:14:26PM -0500, Tom Tucker wrote:
> This is rev 2 of the new pluggable transport switch for
> RPC servers. This version includes two new patches: one to add a field
> for keeping track of a transport specific header that precedes the
> RPC header for deferral processing, and one that cleans up some
> left over references to svc_sock in transport independent code.

> Subject: [RFC,PATCH 22/35] svc: Remove sk_lastrecv

ok


> Subject: [RFC,PATCH 31/35] svc: Make svc_check_conn_limits xprt independent

ok


> Subject: [RFC,PATCH 33/35] svc: Add transport hdr size for defer/revisit

Hmm. When I did it this way I needed to do two things this patch
doesn't do.

* Save my equivalent of rqstp->rq_hdr_len in the deferred call record

* Have a hunk in svc_deferred_recv() to restore
rqstp->rq_arg.head[0].iov_base properly so that on revisit it points
to the byte *after* the transport-specific header (where it should
be on return from xpo_recvfrom).

Like this:


--- linux-2.6.16.orig/include/linux/sunrpc/svc.h 2007-03-27 23:10:51.439088106 +1000
+++ linux-2.6.16/include/linux/sunrpc/svc.h 2007-05-23 23:11:52.324128040 +1000
@@ -350,6 +350,8 @@ struct svc_deferred_req {
u32 daddr; /* where reply must come from */
struct cache_deferred_req handle;
int argslen;
+ int trans_header; /* size of preserved transport-
+ * specific header */
u32 args[0];
};

--- linux-2.6.16.orig/net/sunrpc/svcsock.c 2007-05-23 23:05:56.293445240 +1000
+++ linux-2.6.16/net/sunrpc/svcsock.c 2007-05-23 23:11:52.396118869 +1000
@@ -1727,6 +1729,8 @@ static struct cache_deferred_req *svc_de
dr->addr = rqstp->rq_addr;
dr->daddr = rqstp->rq_daddr;
dr->argslen = rqstp->rq_arg.len >> 2;
+ dr->trans_header = rqstp->rq_arg.trans_header;
+ skip += dr->trans_header;
memcpy(dr->args, rqstp->rq_arg.head[0].iov_base-skip, dr->argslen<<2);
dprintk("svc: deferred %d bytes of request, dr=%p\n",
dr->argslen<<2, dr);
@@ -1745,10 +1749,11 @@ static int svc_deferred_recv(struct svc_
{
struct svc_deferred_req *dr = rqstp->rq_deferred;

- rqstp->rq_arg.head[0].iov_base = dr->args;
+ rqstp->rq_arg.head[0].iov_base = dr->args + dr->trans_header;
rqstp->rq_arg.head[0].iov_len = dr->argslen<<2;
rqstp->rq_arg.page_len = 0;
rqstp->rq_arg.len = dr->argslen<<2;
+ rqstp->rq_arg.trans_header = dr->trans_header;
rqstp->rq_prot = dr->prot;
rqstp->rq_addr = dr->addr;
rqstp->rq_daddr = dr->daddr;

Is this somehow no longer necessary?

@@ -393,6 +393,9 @@ svc_recvfrom(struct svc_rqst *rqstp, str
};
int len;

+ /* TCP/UDP have no transport header */
+ rqstp->rq_xprt_hlen = 0;
+
len = kernel_recvmsg(svsk->sk_sock, &msg, iov, nr, buflen,
msg.msg_flags);

This comment is not quite correct. TCP has a transport header, but
we don't need it after svc_tcp_recvfrom() returns, so it doesn't
need to be handled by defer/revisit.



Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-03 11:04:48

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC, PATCH 04/35] svc: Add a max payload value to the transport

On Tue, Oct 02, 2007 at 11:28:08AM -0500, Tom Tucker wrote:
> On Tue, 2007-10-02 at 10:54 -0400, Chuck Lever wrote:
> > On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> > >
>
> [...snip...]
>
> > >
> > > struct svc_xprt {
> > > struct svc_xprt_class *xpt_class;
> > > struct svc_xprt_ops xpt_ops;
> > > + u32 xpt_max_payload;
> > > };
> >
> > Why do you need this field in both the class and the instance
> > structures? Since svc_xprt refers back to svc_xprt_class, you can
> > just take the max payload value from the class.
> >
>
> The premise was that I didn't want a given thread peeking into some
> other processor's memory, so anything needed from the class is copied
> into the svc_xprt structure when the transport instance is created.

If I understand the code correctly, the only field in svc_xprt_class
that ever changes is the list_head used to add it to the global list,
which changes once when the transport module is loaded. After that
all the svc_xprt_class cachelines will remain in shared state in all
CPUs' caches; it's effectively read-only. Given that, there's little
to be concerned about.

>
> Greg, help me here ;-)

Sorry, I agree with Chuck :-/

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-03 11:08:41

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, Oct 02, 2007 at 11:29:56AM -0500, Tom Tucker wrote:
> On Tue, 2007-10-02 at 11:04 -0400, Chuck Lever wrote:
> > On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> > Again, here you have copied a pointer from the class structure to the
> > instance structure -- the address of the transport ops structure
> > never changes during the lifetime of the xprt instance, does it? You
> > could just as easily use the class's ops pointer instead.
> >
> > It looks like on the client side, I didn't put the ops vector or the
> > payload maximum in the class structure at all... 6 of one, half dozen
> > of the other. Using the class's value of the ops and payload maximum
> > would save some space in the svc_xprt, though, come to think of it.
> >
>
> cache thing again. let's see how Greg weighs in.

What Chuck said.

> > Also, to address Neil's concern about the appearance of the
> > expression which dereferences these methods, why not use a macro,
> > similar to VOP_GETATTR() in the old BSD kernels, that replaces this
> > long chain of indirections with a simple to recognize macro call?
> >
>
> The chain is disgusting until everything gets normalized to an svc_xprt
> in later patches. Take a look at the final result and see if you still
> think a macro helps vs. just obscures.

I agree with Tom here, the final result is less sickening
than some of the intermediate stages. Like sausage.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-03 11:12:19

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Mon, Oct 01, 2007 at 02:28:01PM -0500, Tom Tucker wrote:
>
> Change the atomic_t reference count to a kref and move it to the
> transport independent svc_xprt structure. Change the reference count
> wrapper names to be generic.

Why? krefs are a complete pain in the ass, and it's hard to avoid
various races with them.



2007-10-03 14:03:52

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC,PATCH 00/35] SVC Transport Switch

On Wed, 2007-10-03 at 21:03 +1000, Greg Banks wrote:
> On Mon, Oct 01, 2007 at 02:14:26PM -0500, Tom Tucker wrote:
> > This is rev 2 of the new pluggable transport switch for
> > RPC servers. This version includes two new patches: one to add a field
> > for keeping track of a transport specific header that precedes the
> > RPC header for deferral processing, and one that cleans up some
> > left over references to svc_sock in transport independent code.
>
> > Subject: [RFC,PATCH 22/35] svc: Remove sk_lastrecv
>
> ok
>
>
> > Subject: [RFC,PATCH 31/35] svc: Make svc_check_conn_limits xprt independent
>
> ok
>
>
> > Subject: [RFC,PATCH 33/35] svc: Add transport hdr size for defer/revisit
>
> Hmm. When I did it this way I needed to do two things this patch
> doesn't do.
>
> * Save my equivalent of rqstp->rq_hdr_len in the deferred call record
>
> * Have a hunk in svc_deferred_recv() to restore
> rqstp->rq_arg.head[0].iov_base properly so that on revisit it points
> to the byte *after* the transport-specific header (where it should
> be on return from xpo_recvfrom).
>
> Like this:
>
>
> --- linux-2.6.16.orig/include/linux/sunrpc/svc.h 2007-03-27 23:10:51.439088106 +1000
> +++ linux-2.6.16/include/linux/sunrpc/svc.h 2007-05-23 23:11:52.324128040 +1000
> @@ -350,6 +350,8 @@ struct svc_deferred_req {
> u32 daddr; /* where reply must come from */
> struct cache_deferred_req handle;
> int argslen;
> + int trans_header; /* size of preserved transport-
> + * specific header */
> u32 args[0];
> };
>
> --- linux-2.6.16.orig/net/sunrpc/svcsock.c 2007-05-23 23:05:56.293445240 +1000
> +++ linux-2.6.16/net/sunrpc/svcsock.c 2007-05-23 23:11:52.396118869 +1000
> @@ -1727,6 +1729,8 @@ static struct cache_deferred_req *svc_de
> dr->addr = rqstp->rq_addr;
> dr->daddr = rqstp->rq_daddr;
> dr->argslen = rqstp->rq_arg.len >> 2;
> + dr->trans_header = rqstp->rq_arg.trans_header;
> + skip += dr->trans_header;
> memcpy(dr->args, rqstp->rq_arg.head[0].iov_base-skip, dr->argslen<<2);
> dprintk("svc: deferred %d bytes of request, dr=%p\n",
> dr->argslen<<2, dr);
> @@ -1745,10 +1749,11 @@ static int svc_deferred_recv(struct svc_
> {
> struct svc_deferred_req *dr = rqstp->rq_deferred;
>
> - rqstp->rq_arg.head[0].iov_base = dr->args;
> + rqstp->rq_arg.head[0].iov_base = dr->args + dr->trans_header;
> rqstp->rq_arg.head[0].iov_len = dr->argslen<<2;
> rqstp->rq_arg.page_len = 0;
> rqstp->rq_arg.len = dr->argslen<<2;
> + rqstp->rq_arg.trans_header = dr->trans_header;
> rqstp->rq_prot = dr->prot;
> rqstp->rq_addr = dr->addr;
> rqstp->rq_daddr = dr->daddr;
>
> Is this somehow no longer necessary?

No, it's still necessary, but your approach is a nice optimization since
you don't need to re-parse the header to get its size when handling the
deferred recv.

Thanks,

>
> @@ -393,6 +393,9 @@ svc_recvfrom(struct svc_rqst *rqstp, str
> };
> int len;
>
> + /* TCP/UDP have no transport header */
> + rqstp->rq_xprt_hlen = 0;
> +
> len = kernel_recvmsg(svsk->sk_sock, &msg, iov, nr, buflen,
> msg.msg_flags);
>
> This comment is not quite correct. TCP has a transport header, but
> we don't need it after svc_tcp_recvfrom() returns, so it doesn't
> need to be handled by defer/revisit.
>

Ok,

>
> Greg.



2007-10-03 14:28:07

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 04/35] svc: Add a max payload value to the transport

On Wed, 2007-10-03 at 21:09 +1000, Greg Banks wrote:
> On Tue, Oct 02, 2007 at 11:28:08AM -0500, Tom Tucker wrote:

[...snip...]

> >
> > The premise was that I didn't want a given thread peeking into some
> > other processor's memory, so anything needed from the class is copied
> > into the svc_xprt structure when the transport instance is created.
>
> If I understand the code correctly, the only field in svc_xprt_class
> that ever changes is the list_head used to add it to the global list,
> which changes once when the transport module is loaded. After that
> all the svc_xprt_class cachelines will remain in shared state in all
> CPUs' caches; it's effectively read-only. Given that, there's little
> to be concerned about.
>
> >
> > Greg, help me here ;-)
>
> Sorry, I agree with Chuck :-/

np, let's get it right. So the max_payload should remain in the
svc_xprt_class structure.

I see only one possible benefit to copying the svc_xprt_ops structure:
it saves one pointer dereference when calling ops. What's the
consensus? Copy it, or keep a ptr?

Tom

>
> Greg.



2007-10-03 14:41:15

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Wed, 2007-10-03 at 12:12 +0100, Christoph Hellwig wrote:
> On Mon, Oct 01, 2007 at 02:28:01PM -0500, Tom Tucker wrote:
> >
> > Change the atomic_t reference count to a kref and move it to the
> > transport independent svc_xprt structure. Change the reference count
> > wrapper names to be generic.
>
> Why? krefs are a complete pain in the ass, and it's hard to avoid
> various races with them.

I converted to krefs at the request of Chuck Lever who implemented the
client side.

From my perspective, the value of a kref is that it is a focal point for
adding debug logic to catch dangling reference holders -- of course,
that debug logic is not in the kernel right now -- so ...

Other than that it's a wash.






2007-10-03 14:45:19

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Wed, Oct 03, 2007 at 12:12:10PM +0100, Christoph Hellwig wrote:
> On Mon, Oct 01, 2007 at 02:28:01PM -0500, Tom Tucker wrote:
> >
> > Change the atomic_t reference count to a kref and move it to the
> > transport independent svc_xprt structure. Change the reference count
> > wrapper names to be generic.
>
> Why? krefs are a complete pain in the ass, and it's hard to avoid
> various races with them.

So krefs are deprecated now? What's the deal? Is there a pointer to
discussion somewhere?

--b.


2007-10-03 14:52:41

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Wed, Oct 03, 2007 at 10:45:17AM -0400, J. Bruce Fields wrote:
> On Wed, Oct 03, 2007 at 12:12:10PM +0100, Christoph Hellwig wrote:
> > On Mon, Oct 01, 2007 at 02:28:01PM -0500, Tom Tucker wrote:
> > >
> > > Change the atomic_t reference count to a kref and move it to the
> > > transport independent svc_xprt structure. Change the reference count
> > > wrapper names to be generic.
> >
> > Why? krefs are a complete pain in the ass, and it's hard to avoid
> > various races with them.
>
> So kref's are deprecated now? What's the deal? Is there a pointer to
> discussion somewhere?

The people who introduced them are still playing fanboys for them,
but that doesn't make them any better an idea. I don't think they've
ever been encouraged by lots of people.


2007-10-03 15:11:39

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Wed, Oct 03, 2007 at 03:52:38PM +0100, Christoph Hellwig wrote:
> On Wed, Oct 03, 2007 at 10:45:17AM -0400, J. Bruce Fields wrote:
> > On Wed, Oct 03, 2007 at 12:12:10PM +0100, Christoph Hellwig wrote:
> > > On Mon, Oct 01, 2007 at 02:28:01PM -0500, Tom Tucker wrote:
> > > >
> > > > Change the atomic_t reference count to a kref and move it to the
> > > > transport independent svc_xprt structure. Change the reference count
> > > > wrapper names to be generic.
> > >
> > > Why? krefs are a complete pain in the ass, and it's hard to avoid
> > > various races with them.
> >
> > So kref's are deprecated now? What's the deal? Is there a pointer to
> > discussion somewhere?
>
> The people who introduced them are still playing fanboys for them,
> but that doesn't make them any better an idea. I don't think they've
> ever been encouraged by lots of people.

OK, but why are they a complete pain in the ass, and which hard-to-avoid
races are you thinking of? Is that documented somewhere?

--b.


2007-10-03 15:15:24

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Wed, Oct 03, 2007 at 11:11:39AM -0400, J. Bruce Fields wrote:
> OK, but why are they a complete pain in the ass, and which hard-to-avoid
> races are you thinking of? Is that documented somewhere?

They lack a primitive equivalent to atomic_dec_and_lock(), which is
required when you need to hold a lock to prevent the count going up from
zero again. There have been various lkml threads on this.


2007-10-03 15:21:16

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 04/35] svc: Add a max payload value to the transport

On Oct 3, 2007, at 10:26 AM, Tom Tucker wrote:
> On Wed, 2007-10-03 at 21:09 +1000, Greg Banks wrote:
>> On Tue, Oct 02, 2007 at 11:28:08AM -0500, Tom Tucker wrote:
>
> [...snip...]
>
>>>
>>> The premise was that I didn't want a given thread peeking into some
>>> other processor's memory, so anything needed from the class is copied
>>> into the svc_xprt structure when the transport instance is created.
>>
>> If I understand the code correctly, the only field in svc_xprt_class
>> that ever changes is the list_head used to add it to the global list,
>> which changes once when the transport module is loaded. After that
>> all the svc_xprt_class cachelines will remain in shared state in all
>> CPUs' caches; it's effectively read-only. Given that, there's little
>> to be concerned about.
>>
>>>
>>> Greg, help me here ;-)
>>
>> Sorry, I agree with Chuck :-/
>
> np, let's get it right. So the max_payload should remain in the
> svc_xprt_class structure.
>
> I see only one possible benefit of copying the svc_xprt_ops structure
> and that is that it saves one ptr de-reference when calling ops.
> What's
> the consensus? Copy it, or keep a ptr?

My preference is not to copy it.

I'm not aware of any place in the kernel that copies the ops vector
into each instance of an object, and I don't see that we have a
strong motivation to implement something unique in this case.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2007-10-03 15:21:23

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Oct 3, 2007, at 10:52 AM, Christoph Hellwig wrote:
> On Wed, Oct 03, 2007 at 10:45:17AM -0400, J. Bruce Fields wrote:
>> On Wed, Oct 03, 2007 at 12:12:10PM +0100, Christoph Hellwig wrote:
>>> On Mon, Oct 01, 2007 at 02:28:01PM -0500, Tom Tucker wrote:
>>>>
>>>> Change the atomic_t reference count to a kref and move it to the
>>>> transport independent svc_xprt structure. Change the reference count
>>>> wrapper names to be generic.
>>>
>>> Why? krefs are a complete pain in the ass, and it's hard to avoid
>>> various races with them.
>>
>> So kref's are deprecated now? What's the deal? Is there a
>> pointer to
>> discussion somewhere?
>
> The people who introduced them are still playing fanboys for them,
> but that doesn't make them any better an idea. I don't think they've
> ever been encouraged by lots of people.

Christoph, I think you are stating a personal preference here...

We're using krefs without issue in several other areas of the RPC
client and server. Both rpc_clnt and rpc_xprt have a kref in them,
for example. I don't see the harm in using them here as well, as
long as they are used carefully.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2007-10-03 15:34:08

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Wed, Oct 03, 2007 at 11:13:58AM -0400, Chuck Lever wrote:
> We're using krefs without issue in several other areas of the RPC client
> and server. Both rpc_clnt and rpc_xprt have a kref in them, for example.
> I don't see the harm in using them here as well, as long as they are used
> carefully.

I believe the change in this case is more or less a no-op, so I can't
bring myself to care either way.

I seem to recall Andrew (I think?) saying that the use of krefs
helped make obvious the cases where use of a reference count followed
the simplest

atomic_set( ,0) on init
atomic_inc() on get
atomic_dec_and_test() on put

pattern, and that that simplified reviewing.

I dunno.

--b.


2007-10-03 16:09:12

by Chuck Lever

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Oct 2, 2007, at 12:45 PM, Tom Tucker wrote:
> On Tue, 2007-10-02 at 11:44 -0400, Chuck Lever wrote:
>> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
>>>
>>> Modify the various kernel RPC svcs to use the svc_create_xprt
>>> service.
>>>
>>> Signed-off-by: Tom Tucker <[email protected]>
>>> ---
>>>
>>> fs/lockd/svc.c | 17 ++++++++---------
>>> fs/nfs/callback.c | 4 ++--
>>> fs/nfsd/nfssvc.c | 4 ++--
>>> include/linux/sunrpc/svcsock.h | 1 -
>>> net/sunrpc/sunrpc_syms.c | 1 -
>>> net/sunrpc/svcsock.c | 22 ----------------------
>>> 6 files changed, 12 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
>>> index 82e2192..8686915 100644
>>> --- a/fs/lockd/svc.c
>>> +++ b/fs/lockd/svc.c
>>> @@ -219,13 +219,12 @@ lockd(struct svc_rqst *rqstp)
>>> module_put_and_exit(0);
>>> }
>>>
>>> -
>>> -static int find_socket(struct svc_serv *serv, int proto)
>>> +static int find_xprt(struct svc_serv *serv, char *proto)
>>> {
>>> struct svc_sock *svsk;
>>> int found = 0;
>>> list_for_each_entry(svsk, &serv->sv_permsocks, sk_list)
>>> - if (svsk->sk_sk->sk_protocol == proto) {
>>> + if (strcmp(svsk->sk_xprt.xpt_class->xcl_name, proto) == 0) {
>>> found = 1;
>>> break;
>>> }
>>
>> This is scary. :-)
>>
>
> Yes, I agree. In fact, this is the "last place" where svcs ramble
> around
> svc_xprt internals. This was along the lines of "how much bigger do I
> make this patchset". BTW, this function was already here, I just
> modified it.
>
>> First, I think we would be better off making the server transport API
>> stronger by not allowing ULPs to dig around in svc_xprt or the
>> svc_xprt_class structures directly. Perhaps you could provide a
>> method for obtaining the transport's NETID.
>
> I'll propose a service. NETID or simply protocol/port? What's the
> consensus?

I think NETID is all that is needed here.

You might consider moving the whole function into the RPC server to
make this even cleaner. The fact that this facility is planted
squarely in lockd is a bit of a layering violation to begin with.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2007-10-03 16:25:04

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Wed, 2007-10-03 at 11:25 -0400, Chuck Lever wrote:
> On Oct 2, 2007, at 12:45 PM, Tom Tucker wrote:
[...snip...]
> >> First, I think we would be better off making the server transport API
> >> stronger by not allowing ULPs to dig around in svc_xprt or the
> >> svc_xprt_class structures directly. Perhaps you could provide a
> >> method for obtaining the transport's NETID.
> >
> > I'll propose a service. NETID or simply protocol/port? What's the
> > consensus?
>
> I think NETID is all that is needed here.
>
> You might consider moving the whole function into the RPC server to
> make this even cleaner. The fact that this facility is planted
> squarely in lockd is a bit of a layering violation to begin with.

Agreed. Thanks,
Tom

>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>



2007-10-03 20:02:26

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 12/35] svc: Add a generic transport svc_create_xprt function

On Tue, 2007-10-02 at 11:39 -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >
[...snip...]
> >
> > struct svc_xprt_ops {
> > + struct svc_xprt *(*xpo_create)(struct svc_serv *,
> > + struct sockaddr *,
> > + int);
>
> Should xpo_create also have a length argument, as in (struct sockaddr
> *, socklen_t) ?

I think socklen_t is only defined in userland.

>
> (or whatever the type of sockaddr lengths are: size_t perhaps?)
>

I've seen it both ways. I just copied kernel_bind, which takes an int for
the length. Does anyone know what the preferred type is for sockaddr len?

> > struct svc_xprt *(*xpo_accept)(struct svc_xprt *);
> > int (*xpo_has_wspace)(struct svc_xprt *);
> > int (*xpo_recvfrom)(struct svc_rqst *);
> > @@ -37,5 +40,6 @@ struct svc_xprt {
> > int svc_reg_xprt_class(struct svc_xprt_class *);
> > int svc_unreg_xprt_class(struct svc_xprt_class *);
> > void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *);
> > +int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);
> >
> > #endif /* SUNRPC_SVC_XPRT_H */
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index 8ea65c3..d57064f 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -93,3 +93,38 @@ void svc_xprt_init(struct svc_xprt_class
> > xpt->xpt_max_payload = xcl->xcl_max_payload;
> > }
> > EXPORT_SYMBOL_GPL(svc_xprt_init);
> > +
> > +int svc_create_xprt(struct svc_serv *serv, char *xprt_name,
> > unsigned short port,
> > + int flags)
> > +{
> > + struct svc_xprt_class *xcl;
> > + int ret = -ENOENT;
> > + struct sockaddr_in sin = {
> > + .sin_family = AF_INET,
> > + .sin_addr.s_addr = INADDR_ANY,
> > + .sin_port = htons(port),
> > + };
> > + dprintk("svc: creating transport %s[%d]\n", xprt_name, port);
> > + spin_lock(&svc_xprt_class_lock);
> > + list_for_each_entry(xcl, &svc_xprt_class_list, xcl_list) {
> > + if (strcmp(xprt_name, xcl->xcl_name) == 0) {
> > + spin_unlock(&svc_xprt_class_lock);
> > + if (try_module_get(xcl->xcl_owner)) {
> > + struct svc_xprt *newxprt;
> > + ret = 0;
> > + newxprt = xcl->xcl_ops->xpo_create
> > + (serv, (struct sockaddr *)&sin, flags);
> > + if (IS_ERR(newxprt)) {
> > + module_put(xcl->xcl_owner);
> > + ret = PTR_ERR(newxprt);
> > + }
> > + }
> > + goto out;
> > + }
> > + }
> > + spin_unlock(&svc_xprt_class_lock);
> > + dprintk("svc: transport %s not found\n", xprt_name);
> > + out:
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(svc_create_xprt);
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index ffc54a1..e3c74e0 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -90,6 +90,8 @@ static void svc_sock_free(struct svc_xp
> > static struct svc_deferred_req *svc_deferred_dequeue(struct
> > svc_sock *svsk);
> > static int svc_deferred_recv(struct svc_rqst *rqstp);
> > static struct cache_deferred_req *svc_defer(struct cache_req *req);
> > +static struct svc_xprt *
> > +svc_create_socket(struct svc_serv *, int, struct sockaddr *, int,
> > int);
> >
> > /* apparently the "standard" is that clients close
> > * idle connections after 5 minutes, servers after
> > @@ -381,6 +383,7 @@ svc_sock_put(struct svc_sock *svsk)
> > {
> > if (atomic_dec_and_test(&svsk->sk_inuse)) {
> > BUG_ON(!test_bit(SK_DEAD, &svsk->sk_flags));
> > + module_put(svsk->sk_xprt.xpt_class->xcl_owner);
> > svsk->sk_xprt.xpt_ops.xpo_free(&svsk->sk_xprt);
> > }
> > }
> > @@ -921,7 +924,15 @@ svc_udp_accept(struct svc_xprt *xprt)
> > return NULL;
> > }
> >
> > +static struct svc_xprt *
> > +svc_udp_create(struct svc_serv *serv, struct sockaddr *sa, int flags)
> > +{
> > + return svc_create_socket(serv, IPPROTO_UDP, sa,
> > + sizeof(struct sockaddr_in), flags);
> > +}
> > +
> > static struct svc_xprt_ops svc_udp_ops = {
> > + .xpo_create = svc_udp_create,
> > .xpo_recvfrom = svc_udp_recvfrom,
> > .xpo_sendto = svc_udp_sendto,
> > .xpo_release = svc_release_skb,
> > @@ -934,6 +945,7 @@ static struct svc_xprt_ops svc_udp_ops =
> >
> > static struct svc_xprt_class svc_udp_class = {
> > .xcl_name = "udp",
> > + .xcl_owner = THIS_MODULE,
> > .xcl_ops = &svc_udp_ops,
> > .xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
> > };
> > @@ -1357,7 +1369,15 @@ svc_tcp_has_wspace(struct svc_xprt *xprt
> > return 1;
> > }
> >
> > +static struct svc_xprt *
> > +svc_tcp_create(struct svc_serv *serv, struct sockaddr *sa, int flags)
> > +{
> > + return svc_create_socket(serv, IPPROTO_TCP, sa,
> > + sizeof(struct sockaddr_in), flags);
> > +}
> > +
> > static struct svc_xprt_ops svc_tcp_ops = {
> > + .xpo_create = svc_tcp_create,
> > .xpo_recvfrom = svc_tcp_recvfrom,
> > .xpo_sendto = svc_tcp_sendto,
> > .xpo_release = svc_release_skb,
> > @@ -1370,6 +1390,7 @@ static struct svc_xprt_ops svc_tcp_ops =
> >
> > static struct svc_xprt_class svc_tcp_class = {
> > .xcl_name = "tcp",
> > + .xcl_owner = THIS_MODULE,
> > .xcl_ops = &svc_tcp_ops,
> > .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
> > };
> > @@ -1594,8 +1615,14 @@ svc_recv(struct svc_rqst *rqstp, long ti
> > } else if (test_bit(SK_LISTENER, &svsk->sk_flags)) {
> > struct svc_xprt *newxpt;
> > newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
> > - if (newxpt)
> > + if (newxpt) {
> > + /*
> > + * We know this module_get will succeed because the
> > + * listener holds a reference too
> > + */
> > + __module_get(newxpt->xpt_class->xcl_owner);
> > svc_check_conn_limits(svsk->sk_server);
> > + }
> > svc_sock_received(svsk);
> > } else {
> > dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> > @@ -1835,8 +1862,9 @@ EXPORT_SYMBOL_GPL(svc_addsock);
> > /*
> > * Create socket for RPC service.
> > */
> > -static int svc_create_socket(struct svc_serv *serv, int protocol,
> > - struct sockaddr *sin, int len, int flags)
> > +static struct svc_xprt *
> > +svc_create_socket(struct svc_serv *serv, int protocol,
> > + struct sockaddr *sin, int len, int flags)
> > {
> > struct svc_sock *svsk;
> > struct socket *sock;
> > @@ -1851,13 +1879,13 @@ static int svc_create_socket(struct svc_
> > if (protocol != IPPROTO_UDP && protocol != IPPROTO_TCP) {
> > printk(KERN_WARNING "svc: only UDP and TCP "
> > "sockets supported\n");
> > - return -EINVAL;
> > + return ERR_PTR(-EINVAL);
> > }
> > type = (protocol == IPPROTO_UDP)? SOCK_DGRAM : SOCK_STREAM;
> >
> > error = sock_create_kern(sin->sa_family, type, protocol, &sock);
> > if (error < 0)
> > - return error;
> > + return ERR_PTR(error);
> >
> > svc_reclassify_socket(sock);
> >
> > @@ -1876,13 +1904,13 @@ static int svc_create_socket(struct svc_
> > if (protocol == IPPROTO_TCP)
> > set_bit(SK_LISTENER, &svsk->sk_flags);
> > svc_sock_received(svsk);
> > - return ntohs(inet_sk(svsk->sk_sk)->sport);
> > + return (struct svc_xprt *)svsk;
> > }
> >
> > bummer:
> > dprintk("svc: svc_create_socket error = %d\n", -error);
> > sock_release(sock);
> > - return error;
> > + return ERR_PTR(error);
> > }
> >
> > /*
> > @@ -1995,15 +2023,15 @@ void svc_force_close_socket(struct svc_s
> > int svc_makesock(struct svc_serv *serv, int protocol, unsigned
> > short port,
> > int flags)
> > {
> > - struct sockaddr_in sin = {
> > - .sin_family = AF_INET,
> > - .sin_addr.s_addr = INADDR_ANY,
> > - .sin_port = htons(port),
> > - };
> > -
> > dprintk("svc: creating socket proto = %d\n", protocol);
> > - return svc_create_socket(serv, protocol, (struct sockaddr *) &sin,
> > - sizeof(sin), flags);
> > + switch (protocol) {
> > + case IPPROTO_TCP:
> > + return svc_create_xprt(serv, "tcp", port, flags);
> > + case IPPROTO_UDP:
> > + return svc_create_xprt(serv, "udp", port, flags);
> > + default:
> > + return -EINVAL;
> > + }
> > }
> >
> > /*
>
> Chuck Lever
> [email protected]
>
>


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-10-03 20:05:34

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 12/35] svc: Add a generic transport svc_create_xprt function

On Wed, 2007-10-03 at 15:01 -0500, Tom Tucker wrote:
> On Tue, 2007-10-02 at 11:39 -0400, Chuck Lever wrote:
> > On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> > >
> [...snip...]
> > >
> > > struct svc_xprt_ops {
> > > + struct svc_xprt *(*xpo_create)(struct svc_serv *,
> > > + struct sockaddr *,
> > > + int);
> >
> > Should xpo_create also have a length argument, as in (struct sockaddr
> > *, socklen_t) ?
>
> I think socklen_t is only defined in userland.
>
> >
> > (or whatever the type of sockaddr lengths are: size_t perhaps?)
> >
>
> I've seen it both ways. I just copied kernel_bind which takes an int for
> the length. Does anyone know what the preferred type is for sockaddr
> len?

Oops, I just realized I confused the flags for len. Yes, it should have
a length. I'll code an int initially to match kernel_bind unless someone
feels strongly otherwise.
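A userspace sketch of what the revised signature could look like, with the length carried as an int to match kernel_bind. The `_stub`-suffixed helper and its body are assumptions for illustration, not the actual patch:

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

struct svc_serv;                        /* opaque here */
struct svc_xprt { int unused; };

/* Proposed op vector entry, now carrying (serv, addr, len, flags): */
struct svc_xprt_ops {
	struct svc_xprt *(*xpo_create)(struct svc_serv *,
				       struct sockaddr *, int, int);
};

/* Stand-in for svc_create_socket(): only validates the length the way
 * the real function would before binding. */
static struct svc_xprt *svc_create_socket_stub(struct svc_serv *serv,
					       int protocol,
					       struct sockaddr *sin,
					       int len, int flags)
{
	static struct svc_xprt xprt;

	(void)serv; (void)protocol; (void)sin; (void)flags;
	if (len < (int)sizeof(struct sockaddr_in))
		return NULL;            /* real code: ERR_PTR(-EINVAL) */
	return &xprt;
}

/* The provider forwards the caller's length instead of hard-coding
 * sizeof(struct sockaddr_in) internally: */
static struct svc_xprt *svc_udp_create(struct svc_serv *serv,
				       struct sockaddr *sa, int len,
				       int flags)
{
	return svc_create_socket_stub(serv, IPPROTO_UDP, sa, len, flags);
}
```

The point of the change is that a transport provider no longer has to guess the address size from the address family.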

>
> > > struct svc_xprt *(*xpo_accept)(struct svc_xprt *);
> > > int (*xpo_has_wspace)(struct svc_xprt *);
> > > int (*xpo_recvfrom)(struct svc_rqst *);
> > > @@ -37,5 +40,6 @@ struct svc_xprt {
> > > int svc_reg_xprt_class(struct svc_xprt_class *);
> > > int svc_unreg_xprt_class(struct svc_xprt_class *);
> > > void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *);
> > > +int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);
> > >
> > > #endif /* SUNRPC_SVC_XPRT_H */
> > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > > index 8ea65c3..d57064f 100644
> > > --- a/net/sunrpc/svc_xprt.c
> > > +++ b/net/sunrpc/svc_xprt.c
> > > @@ -93,3 +93,38 @@ void svc_xprt_init(struct svc_xprt_class
> > > xpt->xpt_max_payload = xcl->xcl_max_payload;
> > > }
> > > EXPORT_SYMBOL_GPL(svc_xprt_init);
> > > +
> > > +int svc_create_xprt(struct svc_serv *serv, char *xprt_name,
> > > unsigned short port,
> > > + int flags)
> > > +{
> > > + struct svc_xprt_class *xcl;
> > > + int ret = -ENOENT;
> > > + struct sockaddr_in sin = {
> > > + .sin_family = AF_INET,
> > > + .sin_addr.s_addr = INADDR_ANY,
> > > + .sin_port = htons(port),
> > > + };
> > > + dprintk("svc: creating transport %s[%d]\n", xprt_name, port);
> > > + spin_lock(&svc_xprt_class_lock);
> > > + list_for_each_entry(xcl, &svc_xprt_class_list, xcl_list) {
> > > + if (strcmp(xprt_name, xcl->xcl_name) == 0) {
> > > + spin_unlock(&svc_xprt_class_lock);
> > > + if (try_module_get(xcl->xcl_owner)) {
> > > + struct svc_xprt *newxprt;
> > > + ret = 0;
> > > + newxprt = xcl->xcl_ops->xpo_create
> > > + (serv, (struct sockaddr *)&sin, flags);
> > > + if (IS_ERR(newxprt)) {
> > > + module_put(xcl->xcl_owner);
> > > + ret = PTR_ERR(newxprt);
> > > + }
> > > + }
> > > + goto out;
> > > + }
> > > + }
> > > + spin_unlock(&svc_xprt_class_lock);
> > > + dprintk("svc: transport %s not found\n", xprt_name);
> > > + out:
> > > + return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(svc_create_xprt);




2007-10-04 01:05:51

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC, PATCH 04/35] svc: Add a max payload value to the transport

On Wed, Oct 03, 2007 at 09:26:53AM -0500, Tom Tucker wrote:
> On Wed, 2007-10-03 at 21:09 +1000, Greg Banks wrote:
> > On Tue, Oct 02, 2007 at 11:28:08AM -0500, Tom Tucker wrote:
> I see only one possible benefit of copying the svc_xprt_ops structure
> and that is that it saves one ptr de-reference when calling ops. What's
> the consensus? Copy it, or keep a ptr?

Pointer.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-04 01:15:56

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, Oct 02, 2007 at 01:30:42PM -0500, Tom Tucker wrote:
> On Tue, 2007-10-02 at 13:24 -0500, Tom Tucker wrote:
> > On Tue, 2007-10-02 at 12:57 -0400, Chuck Lever wrote:
> > > On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:
> >
> > [...snip...]
> >
> > > >>
> > > >> It looks like on the client side, I didn't put the ops vector or the
> > > >> payload maximum in the class structure at all... 6 of one, half dozen
> > > >> of the other. Using the class's value of the ops and payload maximum
> > > >> would save some space in the svc_xprt, though, come to think of it.
> > > >>
> > > >
> > > > cache thing again. let's see how Greg weighs in.
> > >
> > > The ops vector itself will be in some other CPU's memory most of the
> > > time on big systems.
> >
> > Well this is a good point. Unless we implement thread pools for svc_xprt
> > memory allocation, it won't likely buy you much.
> >
>
> Actually, I'm having second thoughts. Since the svc_xprt structure is
> allocated on the rqstp thread in which the transport is going to be
> used, won't the memory be local to the allocating processor on a NUMA
> system?

On NUMA systems this more or less just works out OK as long as you
don't do anything silly like round-robin bonding or enable cpuset
memory_spread_slab. The incoming TCP SYN segment is processed on
the CPU which handles irqs from the NIC on which the segment arrives.
This means the svc_xprt is allocated from that CPU, and lives in that
node's memory, where it's nice and local for all the NFS processing
in response to subsequent TCP segments.

Achieving this effect was one of the unspoken goals of the knfsd
NUMAisation work. It's the reason that the permanent sockets are
not per-pool, so that threads on any CPU can respond to connection
attempts.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-04 01:29:48

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class

On Tue, Oct 02, 2007 at 02:55:06PM -0500, Tom Tucker wrote:
> On Tue, 2007-10-02 at 14:47 -0400, Chuck Lever wrote:
> > On Oct 2, 2007, at 2:30 PM, Tom Tucker wrote:
> >
> > > On Tue, 2007-10-02 at 13:24 -0500, Tom Tucker wrote:
> > >> On Tue, 2007-10-02 at 12:57 -0400, Chuck Lever wrote:
> > >>> On Oct 2, 2007, at 12:29 PM, Tom Tucker wrote:
> > >>
> > >> [...snip...]
>
> [...snip...]
>
> > >
> > > Actually, I'm having second thoughts. Since the svc_xprt structure is
> > > allocated on the rqstp thread in which the transport is going to be
> > > used, won't the memory be local to the allocating processor on a NUMA
> > > system?
> >
> > The ops vector isn't in the svc_xprt. It's a constant, so it's in
> > memory allocated by the kernel loader at boot time.
> >
>
> I think one of us is missing something. Here's how I think it works...
>
> The svc_xprt_ops structure is a constant in kernel memory. The
> svc_xprt_class is also a constant and points to the svc_xprt_ops
> structure. The svc_xprt structure, however, is allocated via kmalloc and
> contains a _copy_ of the constant svc_xprt_ops structure and a copy of
> the xcl_max_payload value. See the svc_xprt_init function.
>
> My original thinking (flawed I think) was that since the svc_xprt was
> allocated in the context of the current rqstp thread, that it would be
> allocated from processor local memory. While I think this is true,
> subsequent assignment of a rqstp thread to service a transport has no
> affinity to a particular transport.

Actually it does, on systems where these effects matter, thanks
to irq binding. Altix hardware irq behaviour is that interrupts
from one device go to one CPU only (on other platforms you may
need to explicitly bind interrupts to achieve the same effect).
Given non-variable IP routing (i.e. assuming you tune ARP to behave
sensibly, and don't use mode=rr bonding) this means that network irqs
destined for a particular TCP socket have a very strong affinity to
a particular CPU. In the steady state, NFS traffic from a single client
hits only a single CPU and inter-node NUMA traffic is very small.

The only part of this picture that doesn't work right in ToT (top of
tree) is
that svc_rqst structures are allocated at system boot and end up
on node0, and for really large systems doing a lot of IO this can
have a noticeable effect. I have a patch which I need to get around
to posting.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-04 01:43:42

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC, PATCH 06/35] svc: Add transport specific xpo_release function

On Tue, Oct 02, 2007 at 11:18:53AM -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >
> You intend to add xpo_detach and xpo_free in a later patch. The
> method names suggest all of these operate on the svc_xprt.
> xpo_release, however appears to operate on a request, not on a svc_xprt.

Good point.

> Perhaps you might name this method xpo_release_rqst or some other
> name that indicates that this operates on a request. The name
> xpo_release could easily refer to closing the underlying socket. As
> an example, the client-side transport uses ->release_request.

I'd really like that, except that server-side the naming waters are
muddied by svc_rqst being the per-thread structure rather than a per-request
structure. I'd use the word "call" to be slightly less ambiguous.

> The client side also appears to treat the transport as handling
> requests, instead of socket reads and writes. The use of the method
> names recvfrom and sendto suggest we are talking about bytes on a
> socket here, not transmitting and receiving whole RPC requests. I
> think that's a useful abstraction.

Agreed.

> While I'm whining aloud... I don't prefer the method names detach or
> free, either. On the client side we used close and destroy, which
> (to me, anyway) makes more sense.

"xpo_destroy" is as good as "xpo_free" by me. However "xpo_detach"
is actually a precise description of what the method does: it prevents
further data ready callbacks so that no new incoming data is noticed,
and nothing else. Actually closing the socket needs to be done later
when any sender has finished.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-04 01:49:43

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC,PATCH 11/35] svc: Add xpo_accept transport function

On Tue, Oct 02, 2007 at 01:28:54PM -0500, Tom Tucker wrote:
> On Tue, 2007-10-02 at 13:07 -0400, Chuck Lever wrote:
> > On Oct 2, 2007, at 12:41 PM, Tom Tucker wrote:
> > > On Tue, 2007-10-02 at 11:33 -0400, Chuck Lever wrote:
> > >> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> > >
> > > [...snip...]
> > >
> > >>> + if (newxpt)
> > >>> + svc_check_conn_limits(svsk->sk_server);
> > >>> + svc_sock_received(svsk);
> > >>> } else {
> > >>> dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
> > >>> rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
> > >>
> > >> Instead of adding a test_bit() and conditional branch here, why not
> > >> always call xpo_accept? For UDP, the method simply returns.
> > >>
> > >
> > > That's what I thought at first too, but UDP needs to call receive
> > > here.
> > > Doing nothing stalls the service and lockd never gets set up.
> >
> > The purpose of a transport switch is to force all the transport
> > specific processing down into the transport implementation so you
> > don't need these SK_ switches to decide whether or not to call a
> > function based on which transport is in use.
>
> I don't think it's doing that. I think it's checking the "role" of the
> instance; passive vs. active endpoint. The role is transport independent
> and is checked in the generic svc_recv function.

Yes, this is generic functionality that any connection-oriented
transport needs. Two of the three transports we have use xpo_accept.

Fundamentally, the transport-independent core needs to be aware of the
difference between connectionless and connection-oriented transports.
It needs to do several special things for the latter. Examples
include enforcing a connection limit, and caching client-related data
on the connection.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.
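The role check Tom and Greg describe can be modelled in userspace C; the names below (`XPT_LISTENER`, `recv_dispatch`, the stub ops) are assumptions standing in for the kernel's SK_LISTENER bit and svc_recv, not the real code:

```c
#include <assert.h>
#include <stddef.h>

#define XPT_LISTENER 0x1          /* passive endpoint, cf. SK_LISTENER */

struct xprt;
struct xprt_ops {
	struct xprt *(*xpo_accept)(struct xprt *);
	int (*xpo_recvfrom)(struct xprt *);
};
struct xprt {
	unsigned long flags;
	const struct xprt_ops *ops;
};

/* Generic receive path: the listener bit, not the transport type,
 * decides whether to accept or to read.  A UDP endpoint never sets
 * the bit, so it always takes the recvfrom branch. */
static int recv_dispatch(struct xprt *x)
{
	if (x->flags & XPT_LISTENER) {
		x->ops->xpo_accept(x);
		return 1;             /* accepted a new connection */
	}
	return x->ops->xpo_recvfrom(x) < 0 ? -1 : 0;
}

/* trivial stub ops for demonstration */
static struct xprt *stub_accept(struct xprt *x) { return x; }
static int stub_recv(struct xprt *x) { (void)x; return 0; }
static const struct xprt_ops stub_ops = { stub_accept, stub_recv };
```

This is why the branch lives in transport-independent code: the role flag is generic, even though only connection-oriented transports ever set it.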


2007-10-04 02:25:20

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC, PATCH 12/35] svc: Add a generic transport svc_create_xprt function

On Tue, Oct 02, 2007 at 11:39:18AM -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >
> > struct svc_xprt_ops {
> >+ struct svc_xprt *(*xpo_create)(struct svc_serv *,
> >+ struct sockaddr *,
> >+ int);
>
> Should xpo_create also have a length argument, as in (struct sockaddr
> *, socklen_t) ?

Consistency would be nice. Let's see, how often do we
maintain address length fields now?

svc_deferred_req.addr: yes
svc_rqst.rq_addr: yes
svc_sock.sk_local: no
svc_sock.sk_remote: yes
rpc_xprt.addr: yes
rpc_create_args.address: \
rpc_create_args.saddress: / one length field for both
rpc_xprtsock_create.srcaddr: \
rpc_xprtsock_create.dstaddr: / one length field for both
rpc_peeraddr(): yes
rpcb_create(): no
xs_send_kvec(): yes
xs_sendpages(): yes
__svc_print_addr(): no
svc_port_is_privileged(): no
svc_create_socket(): yes

> (or whatever the type of sockaddr lengths are: size_t perhaps?)

socklen_t should be right, but the kernel doesn't seem to have one.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-04 02:30:24

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Tue, Oct 02, 2007 at 11:44:50AM -0400, Chuck Lever wrote:
> On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> >
> >-static int find_socket(struct svc_serv *serv, int proto)
> >+static int find_xprt(struct svc_serv *serv, char *proto)
> > {
> > struct svc_sock *svsk;
> > int found = 0;
> > list_for_each_entry(svsk, &serv->sv_permsocks, sk_list)
> >- if (svsk->sk_sk->sk_protocol == proto) {
> >+ if (strcmp(svsk->sk_xprt.xpt_class->xcl_name, proto) == 0) {
> > found = 1;
> > break;
> > }
>
> This is scary. :-)
>
> First, I think we would be better off making the server transport API
> stronger by not allowing ULPs to dig around in svc_xprt or the
> svc_xprt_class structures directly. Perhaps you could provide a
> method for obtaining the transport's NETID.

Or a function to do the match:

struct svc_xprt_ops {
...
int (*xpo_netid_supported)(const char *netid);
...
};

Or, you could push the find_xprt() function down into svc_xprt.c
where such futzing belongs.
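A userspace sketch of that second option: the match logic moves behind a helper so services never touch xpt_class directly. The helper name and its array-based parameter list are hypothetical; the kernel version would walk serv->sv_permsocks under the proper lock:

```c
#include <assert.h>
#include <string.h>

struct svc_xprt_class { const char *xcl_name; };
struct svc_xprt { const struct svc_xprt_class *xpt_class; };

/* Hypothetical svc_find_xprt(): matches on the transport class name,
 * as the new find_xprt() does, but owned by svc_xprt.c.  Returns 1 if
 * a transport of the named class is present, 0 otherwise. */
static int svc_find_xprt(struct svc_xprt **xprts, int n,
			 const char *name)
{
	int i;

	for (i = 0; i < n; i++)
		if (strcmp(xprts[i]->xpt_class->xcl_name, name) == 0)
			return 1;
	return 0;
}
```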

> Second, is there any guarantee that the string name of the underlying
> protocol is the same as the name of the transport class? Is there
> any relationship between the transport name and the NETIDs it supports?

This is already confused: our "tcp" transport supports both TCP/IPv4
and TCP/IPv6, which are two separate netids "tcp4" and "tcp6" in
TLI jargon.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.


2007-10-04 02:46:24

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref

On Wed, Oct 03, 2007 at 11:34:01AM -0400, J. Bruce Fields wrote:
> On Wed, Oct 03, 2007 at 11:13:58AM -0400, Chuck Lever wrote:
> > We're using krefs without issue in several other areas of the RPC client
> > and server. Both rpc_clnt and rpc_xprt have a kref in them, for example.
> > I don't see the harm in using them here as well, as long as they are used
> > carefully.
>
> I believe the change in this case is more or less a no-op, so I can't
> bring myself to care either way.
>
> I seem to recall Andrew (? I thought?) saying that the use of kref's
> helped made obvious cases where use of a reference count followed the
> simplest
>
> atomic_set( ,0) on init

Or rather, atomic_set(,1)

> atomic_inc() on get
> atomic_dec_and_test() on put
>
> pattern, and that that simplified reviewing.
>
> I dunno.
>

krefs also seem to be missing a wrapper for atomic_read(), which is
used several times in the svc code for BUG_ON()s and dprintks().

Personally I don't see the attraction of krefs. It looks like someone
saw the typical C++ pattern of a paired abstract base class/smart
pointer class to do automatic reference counting and tried to copy it.
In C it doesn't really work because you can't do a smart pointer, and
a virtual destructor is hard to do cleanly.

But, whatever. In this case they're adequate.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.
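The pattern Bruce lists, which is exactly what kref wraps, can be modelled in userspace with C11 atomics. This is a sketch of the refcount discipline (init to 1, inc on get, dec-and-test on put), not the kernel kref API itself:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <assert.h>

struct ref { atomic_int count; };

/* atomic_set(,1) on init */
static void ref_init(struct ref *r) { atomic_init(&r->count, 1); }

/* atomic_inc() on get */
static void ref_get(struct ref *r)  { atomic_fetch_add(&r->count, 1); }

/* atomic_dec_and_test() on put: returns true when the last reference
 * was dropped and the object should be freed -- the point at which
 * svc_sock_put() calls xpo_free() */
static bool ref_put(struct ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```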


2007-10-04 02:54:06

by Greg Banks

[permalink] [raw]
Subject: Re: [RFC,PATCH 20/35] svc: Make svc_send transport neutral

On Tue, Oct 02, 2007 at 12:54:32PM -0400, Chuck Lever wrote:
>
> On Oct 2, 2007, at 12:46 PM, Tom Tucker wrote:
>
> >On Tue, 2007-10-02 at 12:15 -0400, Chuck Lever wrote:
> >>On Oct 1, 2007, at 3:28 PM, Tom Tucker wrote:
> >>>
> >
> >[...snip...]
> >
> >>>- if ((svsk = rqstp->rq_sock) == NULL) {
> >>>- printk(KERN_WARNING "NULL socket pointer in %s:%d\n",
> >>>+ if ((xprt = rqstp->rq_xprt) == NULL) {
> >>>+ printk(KERN_WARNING "NULL transport pointer in %s:%d\n",
> >>> __FILE__, __LINE__);
> >>> return -EFAULT;
> >>> }
> >>
> >>Do we still want this printk here? Maybe it can be removed.
> >
> >I don't know why it's here. Maybe replace it with a BUG_ON?
>
> /me makes an X with his fingers and hisses like a cat....
>
> BUG_ON is heavyweight and often makes the system unusable after a
> short while by leaving a lot of bogus state (like held locks).
> Unless a NULL here is a real sign of a software fault that requires a
> hard stop, simply returning EFAULT makes sense.

#define EFAULT 14 /* Bad address */

"Bad address" ? Even EBADF would be a better approximation.
Fmeh, if only we had EINTERNALERROR.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
Apparently, I'm Bedevere. Which MPHG character are you?
I don't speak for SGI.
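The soft-fail guard Chuck argues for, sketched in userspace C. The function name and struct bodies are stand-ins; -EFAULT is kept only because that is what the existing code returns, which, as noted above, is a poor fit:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

struct svc_xprt { int unused; };
struct svc_rqst { struct svc_xprt *rq_xprt; };

/* Warn and return an error rather than BUG_ON(), keeping the system
 * usable after the fault. */
static int svc_send_sketch(struct svc_rqst *rqstp)
{
	struct svc_xprt *xprt = rqstp->rq_xprt;

	if (xprt == NULL) {
		fprintf(stderr, "svc: NULL transport pointer\n");
		return -EFAULT;       /* soft failure, not a hard stop */
	}
	/* ... hand the reply to the transport's sendto op ... */
	return 0;
}
```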


2007-10-04 14:28:38

by Tom Tucker

[permalink] [raw]
Subject: Re: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service

On Thu, 2007-10-04 at 12:35 +1000, Greg Banks wrote:
> On Tue, Oct 02, 2007 at 11:44:50AM -0400, Chuck Lever wrote:
> > On Oct 1, 2007, at 3:27 PM, Tom Tucker wrote:
> > >
> > >-static int find_socket(struct svc_serv *serv, int proto)
> > >+static int find_xprt(struct svc_serv *serv, char *proto)
> > > {
> > > struct svc_sock *svsk;
> > > int found = 0;
> > > list_for_each_entry(svsk, &serv->sv_permsocks, sk_list)
> > >- if (svsk->sk_sk->sk_protocol == proto) {
> > >+ if (strcmp(svsk->sk_xprt.xpt_class->xcl_name, proto) == 0) {
> > > found = 1;
> > > break;
> > > }
> >
> > This is scary. :-)
> >
> > First, I think we would be better off making the server transport API
> > stronger by not allowing ULPs to dig around in svc_xprt or the
> > svc_xprt_class structures directly. Perhaps you could provide a
> > method for obtaining the transport's NETID.
>
> Or a function to do the match:
>
> struct svc_xprt_ops {
> ...
> int (*xpo_netid_supported)(const char *netid);
> ...
> };
>
> Or, you could push the find_xprt() function down into svc_xprt.c
> where such futzing belongs.

I think this is the correct approach, although the parameters of the
svc_find_xprt function may need some tweaking.
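A userspace sketch of what such an svc_find_xprt could look like once pushed into svc_xprt.c — the function name and the xpt_next linkage are illustrative (the kernel uses list_head and list_for_each_entry), but the match-by-class-name logic is the one under discussion:

```c
#include <string.h>

struct svc_xprt_class { const char *xcl_name; };
struct svc_xprt {
	struct svc_xprt_class *xpt_class;
	struct svc_xprt *xpt_next;	/* stand-in for the kernel list_head */
};
struct svc_serv { struct svc_xprt *sv_permsocks; };

/* Hypothetical svc_find_xprt(): match a permanent endpoint by
 * transport class name inside svc_xprt.c, so services never poke at
 * svc_xprt_class themselves. */
static int svc_find_xprt_sketch(struct svc_serv *serv, const char *name)
{
	struct svc_xprt *xprt;

	for (xprt = serv->sv_permsocks; xprt; xprt = xprt->xpt_next)
		if (strcmp(xprt->xpt_class->xcl_name, name) == 0)
			return 1;
	return 0;
}
```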

>
> > Second, is there any guarantee that the string name of the underlying
> > protocol

The transport string doesn't name a protocol, it names a transport
class. I'll check the code because I bet I've got a confused name or two
in there.

The transport class identifies a provider that serves up the API we've
defined. To be painfully pedantic, the transport provider determines the
underlying protocol. Within a protocol family, the address family in the
sockaddr is used to further discriminate. So in the case of the "tcp"
transport class, the socket type (SOCK_STREAM), combined with the
address family (AF_INET/AF_INET6) in the provided sockaddr determines
the protocol (TCP/IPV4, or TCP/IPV6).

Along these lines, a reasonable argument could be made that the names of
the transport classes should be "sock_stream" and "sock_dgram", not
"tcp" and "udp".

> > is the same as the name of the transport class? Is there
> > any relationship between the transport name and the NETIDs it supports?
>

I think trying to map the netid to a transport instance is a mistake.
The "transport" name here is above the network layer and maps more to an
API (socket, ofa rdma) than to a network protocol. "rdma" in this
context can mean iWARP, or IB, and iWARP can mean RDDP/TCP, RDDP/SCTP,
etc...

Given the current support, "rdma" can be over IB or RDDP/TCP. So, IMO:

rdma:AF_INET:0.0.0.0:2081
rdma:AF_INET6:ff.ff.ff.ff.10.10.0.5:2081
tcp:AF_INET:0.0.0.0:2049

etc., are strings that name a transport instance logically and uniquely.

If you always want to listen on zero, then you can get rid of the
address itself.

This approach also allows the transport to extend naturally to other
address families.
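As a userspace illustration of how such instance strings decompose, here is a small parser for the "class:family:address:port" form suggested above. The format and all field names are assumptions for the sketch, not an established kernel interface:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative parser for instance strings like
 * "tcp:AF_INET:0.0.0.0:2049". */
struct xprt_id {
	char	 xclass[16];
	char	 family[16];
	char	 addr[64];
	unsigned port;
};

static int parse_xprt_id(const char *s, struct xprt_id *id)
{
	const char *p, *last;
	size_t n;

	/* the address itself may contain '.', so anchor the port on
	 * the final ':' */
	last = strrchr(s, ':');
	if (!last || sscanf(last + 1, "%u", &id->port) != 1)
		return -1;
	if (sscanf(s, "%15[^:]:%15[^:]:", id->xclass, id->family) != 2)
		return -1;
	p = strchr(s, ':');		/* ':' after the class */
	p = strchr(p + 1, ':');		/* ':' after the family */
	if (!p || p + 1 > last)
		return -1;
	n = (size_t)(last - (p + 1));
	if (n >= sizeof(id->addr))
		return -1;
	memcpy(id->addr, p + 1, n);
	id->addr[n] = '\0';
	return 0;
}
```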

> This is already confused: our "tcp" transport supports both TCP/IPv4
> and TCP/IPv6, which are two separate netids "tcp4" and "tcp6" in
> TLI jargon.


>
> Greg.



2007-10-01 19:27:39

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 04/35] svc: Add a max payload value to the transport


The svc_max_payload function currently looks at the socket type
to determine the max payload. Add a max payload value to svc_xprt_class
so it can be returned directly.
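The resulting logic, modeled in userspace (struct layouts trimmed to the fields involved): the per-transport ceiling comes from the instance, and the service-wide sv_max_payload still clamps it.

```c
typedef unsigned int u32;

struct svc_xprt { u32 xpt_max_payload; };
struct svc_serv { u32 sv_max_payload; };
struct svc_rqst {
	struct svc_xprt *rq_xprt;
	struct svc_serv *rq_server;
};

/* After the patch: no more sniffing the socket type -- the limit is
 * whichever is smaller of the transport's and the service's caps. */
static u32 svc_max_payload_sketch(const struct svc_rqst *rqstp)
{
	u32 max = rqstp->rq_xprt->xpt_max_payload;

	if (rqstp->rq_server->sv_max_payload < max)
		max = rqstp->rq_server->sv_max_payload;
	return max;
}
```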

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 2 ++
net/sunrpc/svc.c | 4 +---
net/sunrpc/svc_xprt.c | 1 +
net/sunrpc/svcsock.c | 2 ++
4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index a9a3afe..827f0fe 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -17,11 +17,13 @@ struct svc_xprt_class {
struct module *xcl_owner;
struct svc_xprt_ops *xcl_ops;
struct list_head xcl_list;
+ u32 xcl_max_payload;
};

struct svc_xprt {
struct svc_xprt_class *xpt_class;
struct svc_xprt_ops xpt_ops;
+ u32 xpt_max_payload;
};

int svc_reg_xprt_class(struct svc_xprt_class *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 55ea6df..2a4b3c6 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1034,10 +1034,8 @@ err_bad:
*/
u32 svc_max_payload(const struct svc_rqst *rqstp)
{
- int max = RPCSVC_MAXPAYLOAD_TCP;
+ int max = rqstp->rq_xprt->xpt_max_payload;

- if (rqstp->rq_sock->sk_sock->type == SOCK_DGRAM)
- max = RPCSVC_MAXPAYLOAD_UDP;
if (rqstp->rq_server->sv_max_payload < max)
max = rqstp->rq_server->sv_max_payload;
return max;
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index f838b57..8ea65c3 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -90,5 +90,6 @@ void svc_xprt_init(struct svc_xprt_class
{
xpt->xpt_class = xcl;
xpt->xpt_ops = *xcl->xcl_ops;
+ xpt->xpt_max_payload = xcl->xcl_max_payload;
}
EXPORT_SYMBOL_GPL(svc_xprt_init);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index d52a6e2..d84b5c8 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -905,6 +905,7 @@ static struct svc_xprt_ops svc_udp_ops =
static struct svc_xprt_class svc_udp_class = {
.xcl_name = "udp",
.xcl_ops = &svc_udp_ops,
+ .xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
};

static void
@@ -1358,6 +1359,7 @@ static struct svc_xprt_ops svc_tcp_ops =
static struct svc_xprt_class svc_tcp_class = {
.xcl_name = "tcp",
.xcl_ops = &svc_tcp_ops,
+ .xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
};

void svc_init_xprt_sock(void)


2007-10-01 19:27:34

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 01/35] svc: Add an svc transport class


The transport class (svc_xprt_class) represents a type of transport, e.g.
udp, tcp, rdma. A transport class has a unique name and a set of transport
operations kept in the svc_xprt_ops structure.

A transport class can be dynamically registered and unregistered. The
svc_xprt_class identifies the module that implements the transport
type and keeps a reference count on the module to avoid unloading it
while there are active users.

The endpoint (svc_xprt) is a generic, transport-independent endpoint that can
be used to send and receive data for an RPC service. It inherits its
operations from the transport class.

A transport driver module registers and unregisters itself with the sunrpc
server core by calling svc_reg_xprt_class and svc_unreg_xprt_class, respectively.
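The registration semantics in the patch below can be modeled in userspace with a plain singly linked list (the kernel version uses list_head plus a spinlock, both omitted here): registering a class twice fails with -EEXIST, unregistering an unknown class fails with -ENOENT.

```c
#include <stddef.h>

#define EEXIST 17
#define ENOENT 2

struct svc_xprt_class {
	const char *xcl_name;
	struct svc_xprt_class *xcl_next;	/* stand-in for list_head */
};

static struct svc_xprt_class *class_list;	/* registered classes */

static int reg_xprt_class(struct svc_xprt_class *xcl)
{
	struct svc_xprt_class *cl;

	for (cl = class_list; cl; cl = cl->xcl_next)
		if (cl == xcl)
			return -EEXIST;		/* already registered */
	xcl->xcl_next = class_list;
	class_list = xcl;
	return 0;
}

static int unreg_xprt_class(struct svc_xprt_class *xcl)
{
	struct svc_xprt_class **pp;

	for (pp = &class_list; *pp; pp = &(*pp)->xcl_next)
		if (*pp == xcl) {
			*pp = xcl->xcl_next;	/* unlink it */
			return 0;
		}
	return -ENOENT;				/* never registered */
}
```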

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/debug.h | 1
include/linux/sunrpc/svc_xprt.h | 31 +++++++++++++
net/sunrpc/Makefile | 3 +
net/sunrpc/svc_xprt.c | 94 +++++++++++++++++++++++++++++++++++++++
4 files changed, 128 insertions(+), 1 deletions(-)

diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index 3912cf1..092fcfa 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -21,6 +21,7 @@ #define RPCDBG_BIND 0x0020
#define RPCDBG_SCHED 0x0040
#define RPCDBG_TRANS 0x0080
#define RPCDBG_SVCSOCK 0x0100
+#define RPCDBG_SVCXPRT 0x0100
#define RPCDBG_SVCDSP 0x0200
#define RPCDBG_MISC 0x0400
#define RPCDBG_CACHE 0x0800
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
new file mode 100644
index 0000000..a9a3afe
--- /dev/null
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -0,0 +1,31 @@
+/*
+ * linux/include/linux/sunrpc/svc_xprt.h
+ *
+ * RPC server transport I/O
+ */
+
+#ifndef SUNRPC_SVC_XPRT_H
+#define SUNRPC_SVC_XPRT_H
+
+#include <linux/sunrpc/svc.h>
+
+struct svc_xprt_ops {
+};
+
+struct svc_xprt_class {
+ const char *xcl_name;
+ struct module *xcl_owner;
+ struct svc_xprt_ops *xcl_ops;
+ struct list_head xcl_list;
+};
+
+struct svc_xprt {
+ struct svc_xprt_class *xpt_class;
+ struct svc_xprt_ops xpt_ops;
+};
+
+int svc_reg_xprt_class(struct svc_xprt_class *);
+int svc_unreg_xprt_class(struct svc_xprt_class *);
+void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *);
+
+#endif /* SUNRPC_SVC_XPRT_H */
diff --git a/net/sunrpc/Makefile b/net/sunrpc/Makefile
index 8ebfc4d..e37aa99 100644
--- a/net/sunrpc/Makefile
+++ b/net/sunrpc/Makefile
@@ -10,6 +10,7 @@ sunrpc-y := clnt.o xprt.o socklib.o xprt
auth.o auth_null.o auth_unix.o \
svc.o svcsock.o svcauth.o svcauth_unix.o \
rpcb_clnt.o timer.o xdr.o \
- sunrpc_syms.o cache.o rpc_pipe.o
+ sunrpc_syms.o cache.o rpc_pipe.o \
+ svc_xprt.o
sunrpc-$(CONFIG_PROC_FS) += stats.o
sunrpc-$(CONFIG_SYSCTL) += sysctl.o
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
new file mode 100644
index 0000000..f838b57
--- /dev/null
+++ b/net/sunrpc/svc_xprt.c
@@ -0,0 +1,94 @@
+/*
+ * linux/net/sunrpc/svc_xprt.c
+ *
+ * Author: Tom Tucker <[email protected]>
+ */
+
+#include <linux/sched.h>
+#include <linux/errno.h>
+#include <linux/fcntl.h>
+#include <linux/net.h>
+#include <linux/in.h>
+#include <linux/inet.h>
+#include <linux/udp.h>
+#include <linux/tcp.h>
+#include <linux/unistd.h>
+#include <linux/slab.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/file.h>
+#include <linux/freezer.h>
+#include <net/sock.h>
+#include <net/checksum.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/tcp_states.h>
+#include <linux/uaccess.h>
+#include <asm/ioctls.h>
+
+#include <linux/sunrpc/types.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/sunrpc/xdr.h>
+#include <linux/sunrpc/svcsock.h>
+#include <linux/sunrpc/stats.h>
+#include <linux/sunrpc/svc_xprt.h>
+
+#define RPCDBG_FACILITY RPCDBG_SVCXPRT
+
+/* List of registered transport classes */
+static spinlock_t svc_xprt_class_lock = SPIN_LOCK_UNLOCKED;
+static LIST_HEAD(svc_xprt_class_list);
+
+int svc_reg_xprt_class(struct svc_xprt_class *xcl)
+{
+ struct svc_xprt_class *cl;
+ int res = -EEXIST;
+
+ dprintk("svc: Adding svc transport class '%s'\n",
+ xcl->xcl_name);
+
+ INIT_LIST_HEAD(&xcl->xcl_list);
+ spin_lock(&svc_xprt_class_lock);
+ list_for_each_entry(cl, &svc_xprt_class_list, xcl_list) {
+ if (xcl == cl)
+ goto out;
+ }
+ list_add_tail(&xcl->xcl_list, &svc_xprt_class_list);
+ res = 0;
+out:
+ spin_unlock(&svc_xprt_class_lock);
+ return res;
+}
+EXPORT_SYMBOL_GPL(svc_reg_xprt_class);
+
+int svc_unreg_xprt_class(struct svc_xprt_class *xcl)
+{
+ struct svc_xprt_class *cl;
+ int res = 0;
+
+ dprintk("svc: Removing svc transport class '%s'\n", xcl->xcl_name);
+
+ spin_lock(&svc_xprt_class_lock);
+ list_for_each_entry(cl, &svc_xprt_class_list, xcl_list) {
+ if (xcl == cl) {
+ list_del_init(&cl->xcl_list);
+ goto out;
+ }
+ }
+ res = -ENOENT;
+ out:
+ spin_unlock(&svc_xprt_class_lock);
+ return res;
+}
+EXPORT_SYMBOL_GPL(svc_unreg_xprt_class);
+
+/*
+ * Called by transport drivers to initialize the transport independent
+ * portion of the transport instance.
+ */
+void svc_xprt_init(struct svc_xprt_class *xcl, struct svc_xprt *xpt)
+{
+ xpt->xpt_class = xcl;
+ xpt->xpt_ops = *xcl->xcl_ops;
+}
+EXPORT_SYMBOL_GPL(svc_xprt_init);


2007-10-01 19:27:37

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 03/35] svc: Change the svc_sock in the rqstp structure to a transport


The rqstp structure contains a pointer to the transport for the
RPC request. This functionally trivial patch adds an unnamed union
with pointers to both svc_sock and svc_xprt. Ultimately the
union will be removed and only the rq_xprt field will remain. This
allows transport-independent interfaces to be extracted incrementally
without one gigundo patch.
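The aliasing the union buys can be shown in a few lines of userspace C (struct bodies trimmed to nothing but the union): both names address the same storage, so converted code can read rq_xprt while unconverted code still writes rq_sock.

```c
#include <stddef.h>

struct svc_xprt { int xpt_dummy; };
struct svc_sock { struct svc_xprt sk_xprt; };	/* svc_xprt is member 0 */

/* Trimmed-down model of the unnamed union in the patch. */
struct svc_rqst_sketch {
	union {
		struct svc_xprt *rq_xprt;	/* transport ptr */
		struct svc_sock *rq_sock;	/* socket ptr */
	};
};
```

Because svc_xprt is the first member of svc_sock, the two pointer views refer to the same object as well as the same storage, which is what makes the incremental conversion safe.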

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc.h | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 8531a70..37f7448 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -204,7 +204,10 @@ union svc_addr_u {
struct svc_rqst {
struct list_head rq_list; /* idle list */
struct list_head rq_all; /* all threads list */
- struct svc_sock * rq_sock; /* socket */
+ union {
+ struct svc_xprt * rq_xprt; /* transport ptr */
+ struct svc_sock * rq_sock; /* socket ptr */
+ };
struct sockaddr_storage rq_addr; /* peer address */
size_t rq_addrlen;



2007-10-01 19:27:34

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 02/35] svc: Make svc_sock the tcp/udp transport


Make TCP and UDP svc_sock transports, and register them
with the svc transport core.

A transport type (svc_sock) has an svc_xprt as its first member,
and calls svc_xprt_init to initialize this field.

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/debug.h | 1 -
include/linux/sunrpc/svcsock.h | 4 ++++
net/sunrpc/sunrpc_syms.c | 4 +++-
net/sunrpc/svcsock.c | 33 ++++++++++++++++++++++++++++++++-
4 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/debug.h b/include/linux/sunrpc/debug.h
index 092fcfa..10709cb 100644
--- a/include/linux/sunrpc/debug.h
+++ b/include/linux/sunrpc/debug.h
@@ -20,7 +20,6 @@ #define RPCDBG_AUTH 0x0010
#define RPCDBG_BIND 0x0020
#define RPCDBG_SCHED 0x0040
#define RPCDBG_TRANS 0x0080
-#define RPCDBG_SVCSOCK 0x0100
#define RPCDBG_SVCXPRT 0x0100
#define RPCDBG_SVCDSP 0x0200
#define RPCDBG_MISC 0x0400
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index a53e0fa..1878cbe 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -10,11 +10,13 @@ #ifndef SUNRPC_SVCSOCK_H
#define SUNRPC_SVCSOCK_H

#include <linux/sunrpc/svc.h>
+#include <linux/sunrpc/svc_xprt.h>

/*
* RPC server socket.
*/
struct svc_sock {
+ struct svc_xprt sk_xprt;
struct list_head sk_ready; /* list of ready sockets */
struct list_head sk_list; /* list of all sockets */
struct socket * sk_sock; /* berkeley socket layer */
@@ -78,6 +80,8 @@ int svc_addsock(struct svc_serv *serv,
int fd,
char *name_return,
int *proto);
+void svc_init_xprt_sock(void);
+void svc_cleanup_xprt_sock(void);

/*
* svc_makesock socket characteristics
diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index 384c4ad..a62ce47 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -151,7 +151,8 @@ #ifdef CONFIG_PROC_FS
#endif
cache_register(&ip_map_cache);
cache_register(&unix_gid_cache);
- init_socket_xprt();
+ svc_init_xprt_sock(); /* svc sock transport */
+ init_socket_xprt(); /* clnt sock transport */
rpcauth_init_module();
out:
return err;
@@ -162,6 +163,7 @@ cleanup_sunrpc(void)
{
rpcauth_remove_module();
cleanup_socket_xprt();
+ svc_cleanup_xprt_sock();
unregister_rpc_pipefs();
rpc_destroy_mempool();
if (cache_unregister(&ip_map_cache))
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 036ab52..d52a6e2 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -74,7 +74,7 @@ #include <linux/sunrpc/stats.h>
*
*/

-#define RPCDBG_FACILITY RPCDBG_SVCSOCK
+#define RPCDBG_FACILITY RPCDBG_SVCXPRT


static struct svc_sock *svc_setup_socket(struct svc_serv *, struct socket *,
@@ -899,12 +899,21 @@ svc_udp_sendto(struct svc_rqst *rqstp)
return error;
}

+static struct svc_xprt_ops svc_udp_ops = {
+};
+
+static struct svc_xprt_class svc_udp_class = {
+ .xcl_name = "udp",
+ .xcl_ops = &svc_udp_ops,
+};
+
static void
svc_udp_init(struct svc_sock *svsk)
{
int one = 1;
mm_segment_t oldfs;

+ svc_xprt_init(&svc_udp_class, &svsk->sk_xprt);
svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
svsk->sk_sk->sk_write_space = svc_write_space;
svsk->sk_recvfrom = svc_udp_recvfrom;
@@ -1343,12 +1352,33 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
return sent;
}

+static struct svc_xprt_ops svc_tcp_ops = {
+};
+
+static struct svc_xprt_class svc_tcp_class = {
+ .xcl_name = "tcp",
+ .xcl_ops = &svc_tcp_ops,
+};
+
+void svc_init_xprt_sock(void)
+{
+ svc_reg_xprt_class(&svc_tcp_class);
+ svc_reg_xprt_class(&svc_udp_class);
+}
+
+void svc_cleanup_xprt_sock(void)
+{
+ svc_unreg_xprt_class(&svc_tcp_class);
+ svc_unreg_xprt_class(&svc_udp_class);
+}
+
static void
svc_tcp_init(struct svc_sock *svsk)
{
struct sock *sk = svsk->sk_sk;
struct tcp_sock *tp = tcp_sk(sk);

+ svc_xprt_init(&svc_tcp_class, &svsk->sk_xprt);
svsk->sk_recvfrom = svc_tcp_recvfrom;
svsk->sk_sendto = svc_tcp_sendto;

@@ -1964,3 +1994,4 @@ static struct svc_deferred_req *svc_defe
spin_unlock(&svsk->sk_lock);
return dr;
}
+


2007-10-01 19:27:42

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 05/35] svc: Move sk_sendto and sk_recvfrom to svc_xprt_class


The sk_sendto and sk_recvfrom are function pointers that allow svc_sock
to be used for both UDP and TCP. Move these function pointers to the
svc_xprt_ops structure.

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 2 ++
include/linux/sunrpc/svcsock.h | 3 ---
net/sunrpc/svcsock.c | 12 ++++++------
3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 827f0fe..f0ba052 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -10,6 +10,8 @@ #define SUNRPC_SVC_XPRT_H
#include <linux/sunrpc/svc.h>

struct svc_xprt_ops {
+ int (*xpo_recvfrom)(struct svc_rqst *);
+ int (*xpo_sendto)(struct svc_rqst *);
};

struct svc_xprt_class {
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 1878cbe..08e78d0 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -45,9 +45,6 @@ #define SK_DETACHED 10 /* detached fro
* be revisted */
struct mutex sk_mutex; /* to serialize sending data */

- int (*sk_recvfrom)(struct svc_rqst *rqstp);
- int (*sk_sendto)(struct svc_rqst *rqstp);
-
/* We keep the old state_change and data_ready CB's here */
void (*sk_ostate)(struct sock *);
void (*sk_odata)(struct sock *, int bytes);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index d84b5c8..150531f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -900,6 +900,8 @@ svc_udp_sendto(struct svc_rqst *rqstp)
}

static struct svc_xprt_ops svc_udp_ops = {
+ .xpo_recvfrom = svc_udp_recvfrom,
+ .xpo_sendto = svc_udp_sendto,
};

static struct svc_xprt_class svc_udp_class = {
@@ -917,8 +919,6 @@ svc_udp_init(struct svc_sock *svsk)
svc_xprt_init(&svc_udp_class, &svsk->sk_xprt);
svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
svsk->sk_sk->sk_write_space = svc_write_space;
- svsk->sk_recvfrom = svc_udp_recvfrom;
- svsk->sk_sendto = svc_udp_sendto;

/* initialise setting must have enough space to
* receive and respond to one request.
@@ -1354,6 +1354,8 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
}

static struct svc_xprt_ops svc_tcp_ops = {
+ .xpo_recvfrom = svc_tcp_recvfrom,
+ .xpo_sendto = svc_tcp_sendto,
};

static struct svc_xprt_class svc_tcp_class = {
@@ -1381,8 +1383,6 @@ svc_tcp_init(struct svc_sock *svsk)
struct tcp_sock *tp = tcp_sk(sk);

svc_xprt_init(&svc_tcp_class, &svsk->sk_xprt);
- svsk->sk_recvfrom = svc_tcp_recvfrom;
- svsk->sk_sendto = svc_tcp_sendto;

if (sk->sk_state == TCP_LISTEN) {
dprintk("setting up TCP socket for listening\n");
@@ -1530,7 +1530,7 @@ svc_recv(struct svc_rqst *rqstp, long ti

dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
- len = svsk->sk_recvfrom(rqstp);
+ len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
dprintk("svc: got len=%d\n", len);

/* No data, incomplete (TCP) read, or accept() */
@@ -1590,7 +1590,7 @@ svc_send(struct svc_rqst *rqstp)
if (test_bit(SK_DEAD, &svsk->sk_flags))
len = -ENOTCONN;
else
- len = svsk->sk_sendto(rqstp);
+ len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
mutex_unlock(&svsk->sk_mutex);
svc_sock_release(rqstp);



2007-10-01 19:27:46

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 07/35] svc: Add per-transport delete functions


Add transport-specific xpo_detach and xpo_free functions. The xpo_detach
function causes the transport to stop delivering data-ready events
and enqueuing the transport for I/O.

The xpo_free function frees all resources associated with the particular
transport instance.
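The two-phase teardown this gives can be sketched in userspace (a bare int stands in for the sk_inuse atomic, and the sample ops just count calls): detach happens as soon as the endpoint is deleted, so no further events can enqueue it, but free waits for the last reference.

```c
struct svc_xprt;
struct svc_xprt_ops {
	void (*xpo_detach)(struct svc_xprt *);
	void (*xpo_free)(struct svc_xprt *);
};
struct svc_xprt {
	struct svc_xprt_ops ops;
	int refcount;		/* stand-in for sk_inuse */
};

static int detached, freed;
static void sample_detach(struct svc_xprt *x) { (void)x; detached++; }
static void sample_free(struct svc_xprt *x)   { (void)x; freed++; }

/* Delete: stop event delivery immediately... */
static void xprt_delete(struct svc_xprt *xprt)
{
	xprt->ops.xpo_detach(xprt);
}

/* ...but only free once the last user has dropped its reference. */
static void xprt_put(struct svc_xprt *xprt)
{
	if (--xprt->refcount == 0)
		xprt->ops.xpo_free(xprt);
}
```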

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 2 +
net/sunrpc/svcsock.c | 58 ++++++++++++++++++++++++++++++---------
2 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 5871faa..85d84b2 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -13,6 +13,8 @@ struct svc_xprt_ops {
int (*xpo_recvfrom)(struct svc_rqst *);
int (*xpo_sendto)(struct svc_rqst *);
void (*xpo_release)(struct svc_rqst *);
+ void (*xpo_detach)(struct svc_xprt *);
+ void (*xpo_free)(struct svc_xprt *);
};

struct svc_xprt_class {
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2d5731c..cb13977 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -84,6 +84,8 @@ static void svc_udp_data_ready(struct s
static int svc_udp_recvfrom(struct svc_rqst *);
static int svc_udp_sendto(struct svc_rqst *);
static void svc_close_socket(struct svc_sock *svsk);
+static void svc_sock_detach(struct svc_xprt *);
+static void svc_sock_free(struct svc_xprt *);

static struct svc_deferred_req *svc_deferred_dequeue(struct svc_sock *svsk);
static int svc_deferred_recv(struct svc_rqst *rqstp);
@@ -376,16 +378,8 @@ static inline void
svc_sock_put(struct svc_sock *svsk)
{
if (atomic_dec_and_test(&svsk->sk_inuse)) {
- BUG_ON(! test_bit(SK_DEAD, &svsk->sk_flags));
-
- dprintk("svc: releasing dead socket\n");
- if (svsk->sk_sock->file)
- sockfd_put(svsk->sk_sock);
- else
- sock_release(svsk->sk_sock);
- if (svsk->sk_info_authunix != NULL)
- svcauth_unix_info_release(svsk->sk_info_authunix);
- kfree(svsk);
+ BUG_ON(!test_bit(SK_DEAD, &svsk->sk_flags));
+ svsk->sk_xprt.xpt_ops.xpo_free(&svsk->sk_xprt);
}
}

@@ -903,6 +897,8 @@ static struct svc_xprt_ops svc_udp_ops =
.xpo_recvfrom = svc_udp_recvfrom,
.xpo_sendto = svc_udp_sendto,
.xpo_release = svc_release_skb,
+ .xpo_detach = svc_sock_detach,
+ .xpo_free = svc_sock_free,
};

static struct svc_xprt_class svc_udp_class = {
@@ -1358,6 +1354,8 @@ static struct svc_xprt_ops svc_tcp_ops =
.xpo_recvfrom = svc_tcp_recvfrom,
.xpo_sendto = svc_tcp_sendto,
.xpo_release = svc_release_skb,
+ .xpo_detach = svc_sock_detach,
+ .xpo_free = svc_sock_free,
};

static struct svc_xprt_class svc_tcp_class = {
@@ -1815,6 +1813,42 @@ bummer:
}

/*
+ * Detach the svc_sock from the socket so that no
+ * more callbacks occur.
+ */
+static void
+svc_sock_detach(struct svc_xprt *xprt)
+{
+ struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
+ struct sock *sk = svsk->sk_sk;
+
+ dprintk("svc: svc_sock_detach(%p)\n", svsk);
+
+ /* put back the old socket callbacks */
+ sk->sk_state_change = svsk->sk_ostate;
+ sk->sk_data_ready = svsk->sk_odata;
+ sk->sk_write_space = svsk->sk_owspace;
+}
+
+/*
+ * Free the svc_sock's socket resources and the svc_sock itself.
+ */
+static void
+svc_sock_free(struct svc_xprt *xprt)
+{
+ struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
+ dprintk("svc: svc_sock_free(%p)\n", svsk);
+
+ if (svsk->sk_info_authunix != NULL)
+ svcauth_unix_info_release(svsk->sk_info_authunix);
+ if (svsk->sk_sock->file)
+ sockfd_put(svsk->sk_sock);
+ else
+ sock_release(svsk->sk_sock);
+ kfree(svsk);
+}
+
+/*
* Remove a dead socket
*/
static void
@@ -1828,9 +1862,7 @@ svc_delete_socket(struct svc_sock *svsk)
serv = svsk->sk_server;
sk = svsk->sk_sk;

- sk->sk_state_change = svsk->sk_ostate;
- sk->sk_data_ready = svsk->sk_odata;
- sk->sk_write_space = svsk->sk_owspace;
+ svsk->sk_xprt.xpt_ops.xpo_detach(&svsk->sk_xprt);

spin_lock_bh(&serv->sv_lock);



2007-10-01 19:27:43

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 06/35] svc: Add transport specific xpo_release function


The svc_sock_release function releases pages allocated to a thread. For
UDP, this also returns the receive skb to the stack. For RDMA it will
post a receive WR and bump the client credit count.

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc.h | 2 +-
include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svcsock.c | 16 +++++++++-------
3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 37f7448..cfb2652 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -217,7 +217,7 @@ struct svc_rqst {
struct auth_ops * rq_authop; /* authentication flavour */
u32 rq_flavor; /* pseudoflavor */
struct svc_cred rq_cred; /* auth info */
- struct sk_buff * rq_skbuff; /* fast recv inet buffer */
+ void * rq_xprt_ctxt; /* transport specific context ptr */
struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */

struct xdr_buf rq_arg;
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index f0ba052..5871faa 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -12,6 +12,7 @@ #include <linux/sunrpc/svc.h>
struct svc_xprt_ops {
int (*xpo_recvfrom)(struct svc_rqst *);
int (*xpo_sendto)(struct svc_rqst *);
+ void (*xpo_release)(struct svc_rqst *);
};

struct svc_xprt_class {
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 150531f..2d5731c 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -184,14 +184,14 @@ svc_thread_dequeue(struct svc_pool *pool
/*
* Release an skbuff after use
*/
-static inline void
+static void
svc_release_skb(struct svc_rqst *rqstp)
{
- struct sk_buff *skb = rqstp->rq_skbuff;
+ struct sk_buff *skb = rqstp->rq_xprt_ctxt;
struct svc_deferred_req *dr = rqstp->rq_deferred;

if (skb) {
- rqstp->rq_skbuff = NULL;
+ rqstp->rq_xprt_ctxt = NULL;

dprintk("svc: service %p, releasing skb %p\n", rqstp, skb);
skb_free_datagram(rqstp->rq_sock->sk_sk, skb);
@@ -394,7 +394,7 @@ svc_sock_release(struct svc_rqst *rqstp)
{
struct svc_sock *svsk = rqstp->rq_sock;

- svc_release_skb(rqstp);
+ rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);

svc_free_res_pages(rqstp);
rqstp->rq_res.page_len = 0;
@@ -866,7 +866,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
skb_free_datagram(svsk->sk_sk, skb);
return 0;
}
- rqstp->rq_skbuff = skb;
+ rqstp->rq_xprt_ctxt = skb;
}

rqstp->rq_arg.page_base = 0;
@@ -902,6 +902,7 @@ svc_udp_sendto(struct svc_rqst *rqstp)
static struct svc_xprt_ops svc_udp_ops = {
.xpo_recvfrom = svc_udp_recvfrom,
.xpo_sendto = svc_udp_sendto,
+ .xpo_release = svc_release_skb,
};

static struct svc_xprt_class svc_udp_class = {
@@ -1290,7 +1291,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
rqstp->rq_arg.page_len = len - rqstp->rq_arg.head[0].iov_len;
}

- rqstp->rq_skbuff = NULL;
+ rqstp->rq_xprt_ctxt = NULL;
rqstp->rq_prot = IPPROTO_TCP;

/* Reset TCP read info */
@@ -1356,6 +1357,7 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
static struct svc_xprt_ops svc_tcp_ops = {
.xpo_recvfrom = svc_tcp_recvfrom,
.xpo_sendto = svc_tcp_sendto,
+ .xpo_release = svc_release_skb,
};

static struct svc_xprt_class svc_tcp_class = {
@@ -1577,7 +1579,7 @@ svc_send(struct svc_rqst *rqstp)
}

/* release the receive skb before sending the reply */
- svc_release_skb(rqstp);
+ rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);

/* calculate over-all length */
xb = & rqstp->rq_res;


2007-10-01 19:27:55

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 11/35] svc: Add xpo_accept transport function


Previously, the accept logic looked into the socket state to determine
whether to call accept or recv when data-ready was indicated on an endpoint.
Since some transports don't use sockets, this logic was changed to use a flag
bit (SK_LISTENER) to identify listening endpoints. A transport function
(xpo_accept) was added to allow each transport to define its own accept
processing. A transport's initialization logic is responsible for setting the
SK_LISTENER bit. I didn't see any way to do this in transport-independent
logic since the passive side of a UDP connection doesn't listen and
always recv's.

In the svc_recv function, if the SK_LISTENER bit is set, the transport
xpo_accept function is called to handle accept processing.

Note that all functions are defined even if they don't make sense
for a given transport. For example, accept doesn't mean anything for
UDP. The function is defined anyway and bug-checks if called. The
UDP transport should never set the SK_LISTENER bit.

The code that poaches connections when the connection
limit is hit was moved to a subroutine to make the accept logic path
easier to follow. Since this is in the new connection path, it should
not be a performance issue.
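The dispatch described above, sketched in userspace with a plain flags word instead of the kernel's atomic bitops (and with the ops taking an svc_xprt directly, where the real svc_recv operates on an svc_rqst): a set SK_LISTENER bit routes the wakeup to xpo_accept, everything else goes to xpo_recvfrom.

```c
#define SK_LISTENER 11

struct svc_xprt;
struct svc_xprt_ops {
	struct svc_xprt *(*xpo_accept)(struct svc_xprt *);
	int (*xpo_recvfrom)(struct svc_xprt *);
};
struct svc_xprt {
	unsigned long flags;
	struct svc_xprt_ops ops;
};

static int accepts, recvs;
static struct svc_xprt *sample_accept(struct svc_xprt *x)
{ (void)x; accepts++; return 0; }
static int sample_recv(struct svc_xprt *x)
{ (void)x; recvs++; return 42; }

/* Listening endpoints accept; everything else receives. */
static int svc_recv_dispatch(struct svc_xprt *xprt)
{
	if (xprt->flags & (1UL << SK_LISTENER)) {
		xprt->ops.xpo_accept(xprt);
		return 0;	/* an accept yields no request data */
	}
	return xprt->ops.xpo_recvfrom(xprt);
}
```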

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 1
include/linux/sunrpc/svcsock.h | 1
net/sunrpc/svcsock.c | 130 ++++++++++++++++++++++-----------------
3 files changed, 75 insertions(+), 57 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 47bedfa..4c1a650 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -10,6 +10,7 @@ #define SUNRPC_SVC_XPRT_H
#include <linux/sunrpc/svc.h>

struct svc_xprt_ops {
+ struct svc_xprt *(*xpo_accept)(struct svc_xprt *);
int (*xpo_has_wspace)(struct svc_xprt *);
int (*xpo_recvfrom)(struct svc_rqst *);
void (*xpo_prep_reply_hdr)(struct svc_rqst *);
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 08e78d0..9882ce0 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -36,6 +36,7 @@ #define SK_CHNGBUF 7 /* need to change
#define SK_DEFERRED 8 /* request on sk_deferred */
#define SK_OLD 9 /* used for temp socket aging mark+sweep */
#define SK_DETACHED 10 /* detached from tempsocks list */
+#define SK_LISTENER 11 /* listening endpoint */

atomic_t sk_reserved; /* space on outq that is reserved */

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 1028914..ffc54a1 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -914,6 +914,13 @@ svc_udp_has_wspace(struct svc_xprt *xprt
return 1;
}

+static struct svc_xprt *
+svc_udp_accept(struct svc_xprt *xprt)
+{
+ BUG();
+ return NULL;
+}
+
static struct svc_xprt_ops svc_udp_ops = {
.xpo_recvfrom = svc_udp_recvfrom,
.xpo_sendto = svc_udp_sendto,
@@ -922,6 +929,7 @@ static struct svc_xprt_ops svc_udp_ops =
.xpo_free = svc_sock_free,
.xpo_prep_reply_hdr = svc_udp_prep_reply_hdr,
.xpo_has_wspace = svc_udp_has_wspace,
+ .xpo_accept = svc_udp_accept,
};

static struct svc_xprt_class svc_udp_class = {
@@ -1046,9 +1054,10 @@ static inline int svc_port_is_privileged
/*
* Accept a TCP connection
*/
-static void
-svc_tcp_accept(struct svc_sock *svsk)
+static struct svc_xprt *
+svc_tcp_accept(struct svc_xprt *xprt)
{
+ struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
struct sockaddr_storage addr;
struct sockaddr *sin = (struct sockaddr *) &addr;
struct svc_serv *serv = svsk->sk_server;
@@ -1060,7 +1069,7 @@ svc_tcp_accept(struct svc_sock *svsk)

dprintk("svc: tcp_accept %p sock %p\n", svsk, sock);
if (!sock)
- return;
+ return NULL;

clear_bit(SK_CONN, &svsk->sk_flags);
err = kernel_accept(sock, &newsock, O_NONBLOCK);
@@ -1071,7 +1080,7 @@ svc_tcp_accept(struct svc_sock *svsk)
else if (err != -EAGAIN && net_ratelimit())
printk(KERN_WARNING "%s: accept failed (err %d)!\n",
serv->sv_name, -err);
- return;
+ return NULL;
}

set_bit(SK_CONN, &svsk->sk_flags);
@@ -1117,59 +1126,14 @@ svc_tcp_accept(struct svc_sock *svsk)

svc_sock_received(newsvsk);

- /* make sure that we don't have too many active connections.
- * If we have, something must be dropped.
- *
- * There's no point in trying to do random drop here for
- * DoS prevention. The NFS clients does 1 reconnect in 15
- * seconds. An attacker can easily beat that.
- *
- * The only somewhat efficient mechanism would be if drop
- * old connections from the same IP first. But right now
- * we don't even record the client IP in svc_sock.
- */
- if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
- struct svc_sock *svsk = NULL;
- spin_lock_bh(&serv->sv_lock);
- if (!list_empty(&serv->sv_tempsocks)) {
- if (net_ratelimit()) {
- /* Try to help the admin */
- printk(KERN_NOTICE "%s: too many open TCP "
- "sockets, consider increasing the "
- "number of nfsd threads\n",
- serv->sv_name);
- printk(KERN_NOTICE
- "%s: last TCP connect from %s\n",
- serv->sv_name, __svc_print_addr(sin,
- buf, sizeof(buf)));
- }
- /*
- * Always select the oldest socket. It's not fair,
- * but so is life
- */
- svsk = list_entry(serv->sv_tempsocks.prev,
- struct svc_sock,
- sk_list);
- set_bit(SK_CLOSE, &svsk->sk_flags);
- atomic_inc(&svsk->sk_inuse);
- }
- spin_unlock_bh(&serv->sv_lock);
-
- if (svsk) {
- svc_sock_enqueue(svsk);
- svc_sock_put(svsk);
- }
-
- }
-
if (serv->sv_stats)
serv->sv_stats->nettcpconn++;

- return;
+ return &newsvsk->sk_xprt;

failed:
sock_release(newsock);
- return;
+ return NULL;
}

/*
@@ -1194,12 +1158,6 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
return svc_deferred_recv(rqstp);
}

- if (svsk->sk_sk->sk_state == TCP_LISTEN) {
- svc_tcp_accept(svsk);
- svc_sock_received(svsk);
- return 0;
- }
-
if (test_and_clear_bit(SK_CHNGBUF, &svsk->sk_flags))
/* sndbuf needs to have room for one request
* per thread, otherwise we can stall even when the
@@ -1407,6 +1365,7 @@ static struct svc_xprt_ops svc_tcp_ops =
.xpo_free = svc_sock_free,
.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
.xpo_has_wspace = svc_tcp_has_wspace,
+ .xpo_accept = svc_tcp_accept,
};

static struct svc_xprt_class svc_tcp_class = {
@@ -1488,6 +1447,55 @@ svc_sock_update_bufs(struct svc_serv *se
spin_unlock_bh(&serv->sv_lock);
}

+static void
+svc_check_conn_limits(struct svc_serv *serv)
+{
+ char buf[RPC_MAX_ADDRBUFLEN];
+
+ /* make sure that we don't have too many active connections.
+ * If we have, something must be dropped.
+ *
+ * There's no point in trying to do random drop here for
+ * DoS prevention. The NFS clients does 1 reconnect in 15
+ * seconds. An attacker can easily beat that.
+ *
+ * The only somewhat efficient mechanism would be if drop
+ * old connections from the same IP first. But right now
+ * we don't even record the client IP in svc_sock.
+ */
+ if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
+ struct svc_sock *svsk = NULL;
+ spin_lock_bh(&serv->sv_lock);
+ if (!list_empty(&serv->sv_tempsocks)) {
+ if (net_ratelimit()) {
+ /* Try to help the admin */
+ printk(KERN_NOTICE "%s: too many open TCP "
+ "sockets, consider increasing the "
+ "number of nfsd threads\n",
+ serv->sv_name);
+ printk(KERN_NOTICE
+ "%s: last TCP connect from %s\n",
+ serv->sv_name, buf);
+ }
+ /*
+ * Always select the oldest socket. It's not fair,
+ * but so is life
+ */
+ svsk = list_entry(serv->sv_tempsocks.prev,
+ struct svc_sock,
+ sk_list);
+ set_bit(SK_CLOSE, &svsk->sk_flags);
+ atomic_inc(&svsk->sk_inuse);
+ }
+ spin_unlock_bh(&serv->sv_lock);
+
+ if (svsk) {
+ svc_sock_enqueue(svsk);
+ svc_sock_put(svsk);
+ }
+ }
+}
+
/*
* Receive the next request on any socket. This code is carefully
* organised not to touch any cachelines in the shared svc_serv
@@ -1583,6 +1591,12 @@ svc_recv(struct svc_rqst *rqstp, long ti
if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
dprintk("svc_recv: found SK_CLOSE\n");
svc_delete_socket(svsk);
+ } else if (test_bit(SK_LISTENER, &svsk->sk_flags)) {
+ struct svc_xprt *newxpt;
+ newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
+ if (newxpt)
+ svc_check_conn_limits(svsk->sk_server);
+ svc_sock_received(svsk);
} else {
dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
@@ -1859,6 +1873,8 @@ static int svc_create_socket(struct svc_
}

if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
+ if (protocol == IPPROTO_TCP)
+ set_bit(SK_LISTENER, &svsk->sk_flags);
svc_sock_received(svsk);
return ntohs(inet_sk(svsk->sk_sk)->sport);
}


2007-10-01 19:27:48

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 10/35] svc: Move close processing to a single place


Close handling was duplicated in the UDP and TCP recvfrom
methods. This code has been moved to the transport independent
svc_recv function.
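The shape of this consolidation can be sketched as follows; the struct and function names are illustrative, not the kernel's:

```c
#include <assert.h>

#define SK_CLOSE (1UL << 2)        /* stand-in for the close-pending bit */

struct sock_sim {
    unsigned long flags;
    int deleted;
    int received;
};

/* Transport-independent receive: close handling lives here now, once. */
static int common_recv(struct sock_sim *s)
{
    if (s->flags & SK_CLOSE) {
        s->deleted = 1;            /* svc_delete_socket() in the real code */
        return 0;
    }
    s->received++;                 /* xpo_recvfrom() in the real code */
    return 1;
}

static int run_close_demo(void)
{
    struct sock_sim open = {0}, closing = { .flags = SK_CLOSE };
    int a = common_recv(&open);
    int b = common_recv(&closing);
    return a == 1 && open.received == 1 &&
           b == 0 && closing.deleted == 1;
}
```

Each per-transport recvfrom no longer needs its own copy of the SK_CLOSE test, which is the duplication the patch removes.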

Signed-off-by: Tom Tucker <[email protected]>
---

net/sunrpc/svcsock.c | 24 ++++++++++--------------
1 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 59b6115..1028914 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -792,11 +792,6 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
return svc_deferred_recv(rqstp);
}

- if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
- svc_delete_socket(svsk);
- return 0;
- }
-
clear_bit(SK_DATA, &svsk->sk_flags);
skb = NULL;
err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
@@ -1199,11 +1194,6 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
return svc_deferred_recv(rqstp);
}

- if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
- svc_delete_socket(svsk);
- return 0;
- }
-
if (svsk->sk_sk->sk_state == TCP_LISTEN) {
svc_tcp_accept(svsk);
svc_sock_received(svsk);
@@ -1589,10 +1579,16 @@ svc_recv(struct svc_rqst *rqstp, long ti
}
spin_unlock_bh(&pool->sp_lock);

- dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
- rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
- len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
- dprintk("svc: got len=%d\n", len);
+ len = 0;
+ if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
+ dprintk("svc_recv: found SK_CLOSE\n");
+ svc_delete_socket(svsk);
+ } else {
+ dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
+ rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
+ len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
+ dprintk("svc: got len=%d\n", len);
+ }

/* No data, incomplete (TCP) read, or accept() */
if (len == 0 || len == -EAGAIN) {


2007-10-01 19:27:48

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 08/35] svc: Add xpo_prep_reply_hdr


Some transports add fields to the RPC header for replies, e.g. the TCP
record length. This function is called when preparing the reply header
to allow each transport to add whatever fields it requires.
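A rough userspace sketch of why TCP needs this hook and UDP does not: RPC over TCP prepends a 4-byte record-marking header (high bit set on the last fragment, per the RPC record-marking convention), which is reserved up front and patched once the reply length is known. The helper names below are invented for the demo:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* TCP: reserve 4 bytes for the record marker; UDP: nothing to do. */
static size_t tcp_prep_reply_hdr(uint8_t *buf)
{
    memset(buf, 0, 4);             /* placeholder, patched later */
    return 4;
}

static size_t udp_prep_reply_hdr(uint8_t *buf)
{
    (void)buf;
    return 0;
}

/* Patch the marker once the payload length is known. */
static void tcp_finish_record(uint8_t *buf, uint32_t payload_len)
{
    uint32_t marker = 0x80000000u | payload_len;  /* high bit: last fragment */
    buf[0] = marker >> 24;
    buf[1] = marker >> 16;
    buf[2] = marker >> 8;
    buf[3] = marker;
}

static int run_hdr_demo(void)
{
    uint8_t buf[64];
    size_t off = tcp_prep_reply_hdr(buf);   /* 12-byte payload would follow */
    tcp_finish_record(buf, 12);
    return off == 4 && buf[0] == 0x80 && buf[3] == 12 &&
           udp_prep_reply_hdr(buf) == 0;
}
```

Making this a per-transport method lets svc_process stop testing `rq_prot == IPPROTO_TCP` directly, exactly as the diff shows.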

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svc.c | 6 +++---
net/sunrpc/svcsock.c | 19 +++++++++++++++++++
3 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 85d84b2..1cd86fe 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -11,6 +11,7 @@ #include <linux/sunrpc/svc.h>

struct svc_xprt_ops {
int (*xpo_recvfrom)(struct svc_rqst *);
+ void (*xpo_prep_reply_hdr)(struct svc_rqst *);
int (*xpo_sendto)(struct svc_rqst *);
void (*xpo_release)(struct svc_rqst *);
void (*xpo_detach)(struct svc_xprt *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 2a4b3c6..ee68117 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -815,9 +815,9 @@ svc_process(struct svc_rqst *rqstp)
rqstp->rq_res.tail[0].iov_len = 0;
/* Will be turned off only in gss privacy case: */
rqstp->rq_splice_ok = 1;
- /* tcp needs a space for the record length... */
- if (rqstp->rq_prot == IPPROTO_TCP)
- svc_putnl(resv, 0);
+
+ /* Setup reply header */
+ rqstp->rq_xprt->xpt_ops.xpo_prep_reply_hdr(rqstp);

rqstp->rq_xid = svc_getu32(argv);
svc_putu32(resv, rqstp->rq_xid);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index cb13977..770d569 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -893,12 +893,18 @@ svc_udp_sendto(struct svc_rqst *rqstp)
return error;
}

+static void
+svc_udp_prep_reply_hdr(struct svc_rqst *rqstp)
+{
+}
+
static struct svc_xprt_ops svc_udp_ops = {
.xpo_recvfrom = svc_udp_recvfrom,
.xpo_sendto = svc_udp_sendto,
.xpo_release = svc_release_skb,
.xpo_detach = svc_sock_detach,
.xpo_free = svc_sock_free,
+ .xpo_prep_reply_hdr = svc_udp_prep_reply_hdr,
};

static struct svc_xprt_class svc_udp_class = {
@@ -1350,12 +1356,25 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
return sent;
}

+/*
+ * Setup response header. TCP has a 4B record length field.
+ */
+static void
+svc_tcp_prep_reply_hdr(struct svc_rqst *rqstp)
+{
+ struct kvec *resv = &rqstp->rq_res.head[0];
+
+ /* tcp needs a space for the record length... */
+ svc_putnl(resv, 0);
+}
+
static struct svc_xprt_ops svc_tcp_ops = {
.xpo_recvfrom = svc_tcp_recvfrom,
.xpo_sendto = svc_tcp_sendto,
.xpo_release = svc_release_skb,
.xpo_detach = svc_sock_detach,
.xpo_free = svc_sock_free,
+ .xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
};

static struct svc_xprt_class svc_tcp_class = {


2007-10-01 19:28:06

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 18/35] svc: Move sk_reserved to svc_xprt


This functionally trivial patch moves the sk_reserved field to the
transport independent svc_xprt structure.
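The accounting this field implements can be sketched with C11 atomics; names here are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdatomic.h>

struct xprt_sim {
    atomic_int reserved;   /* bytes of reply space promised to threads */
};

/* A thread picks up a request: reserve a full max-message of space. */
static void reserve_max(struct xprt_sim *x, int max_mesg)
{
    atomic_fetch_add(&x->reserved, max_mesg);
}

/* The thread learns its reply is smaller: return the excess. */
static void svc_reserve_sim(struct xprt_sim *x, int *rq_reserved, int space)
{
    if (space < *rq_reserved) {
        atomic_fetch_sub(&x->reserved, *rq_reserved - space);
        *rq_reserved = space;
    }
}

static int run_reserve_demo(void)
{
    struct xprt_sim x = { .reserved = 0 };
    int rq_reserved = 4096;
    reserve_max(&x, 4096);
    svc_reserve_sim(&x, &rq_reserved, 512);  /* reply only needs 512 */
    return atomic_load(&x.reserved) == 512 && rq_reserved == 512;
}
```

The running total is what the has_wspace methods compare against available socket space, so it naturally belongs to the transport-independent structure.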

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 1 +
include/linux/sunrpc/svcsock.h | 2 --
net/sunrpc/svcsock.c | 10 +++++-----
3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 6064bc3..e8be38f 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -54,6 +54,7 @@ #define XPT_LISTENER 11 /* listening en

struct svc_pool *xpt_pool; /* current pool iff queued */
struct svc_serv *xpt_server; /* service for transport */
+ atomic_t xpt_reserved; /* space on outq that is rsvd */
};

int svc_reg_xprt_class(struct svc_xprt_class *);
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 060508b..ba41f11 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -20,8 +20,6 @@ struct svc_sock {
struct socket * sk_sock; /* berkeley socket layer */
struct sock * sk_sk; /* INET layer */

- atomic_t sk_reserved; /* space on outq that is reserved */
-
spinlock_t sk_lock; /* protects sk_deferred and
* sk_info_authunix */
struct list_head sk_deferred; /* deferred requests that need to
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2ed64fd..8178e65 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -304,7 +304,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
rqstp->rq_sock = svsk;
svc_xprt_get(&svsk->sk_xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
- atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
+ atomic_add(rqstp->rq_reserved, &svsk->sk_xprt.xpt_reserved);
BUG_ON(svsk->sk_xprt.xpt_pool != pool);
wake_up(&rqstp->rq_wait);
} else {
@@ -369,7 +369,7 @@ void svc_reserve(struct svc_rqst *rqstp,

if (space < rqstp->rq_reserved) {
struct svc_sock *svsk = rqstp->rq_sock;
- atomic_sub((rqstp->rq_reserved - space), &svsk->sk_reserved);
+ atomic_sub((rqstp->rq_reserved - space), &svsk->sk_xprt.xpt_reserved);
rqstp->rq_reserved = space;

svc_sock_enqueue(svsk);
@@ -899,7 +899,7 @@ svc_udp_has_wspace(struct svc_xprt *xprt
* sock space.
*/
set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
- required = atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg;
+ required = atomic_read(&svsk->sk_xprt.xpt_reserved) + serv->sv_max_mesg;
if (required*2 > sock_wspace(svsk->sk_sk))
return 0;
clear_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
@@ -1351,7 +1351,7 @@ svc_tcp_has_wspace(struct svc_xprt *xprt
* sock space.
*/
set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
- required = atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg;
+ required = atomic_read(&svsk->sk_xprt.xpt_reserved) + serv->sv_max_mesg;
if (required*2 > sk_stream_wspace(svsk->sk_sk))
return 0;
clear_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
@@ -1568,7 +1568,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
rqstp->rq_sock = svsk;
svc_xprt_get(&svsk->sk_xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
- atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
+ atomic_add(rqstp->rq_reserved, &svsk->sk_xprt.xpt_reserved);
} else {
/* No data pending. Go to sleep */
svc_thread_enqueue(pool, rqstp);


2007-10-01 19:28:02

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 14/35] svc: Change sk_inuse to a kref


Change the atomic_t reference count to a kref and move it to the
transport independent svc_xprt structure. Change the reference count
wrapper names to be generic.
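For readers unfamiliar with kref, its semantics can be mimicked in userspace C: a counter plus a release callback invoked exactly once, when the last reference is dropped. Everything below is a look-alike for illustration, not the kernel implementation:

```c
#include <assert.h>
#include <stdatomic.h>

struct kref_sim {
    atomic_int refcount;
};

static void kref_init_sim(struct kref_sim *k)
{
    atomic_store(&k->refcount, 1);           /* creation holds one ref */
}

static void kref_get_sim(struct kref_sim *k)
{
    atomic_fetch_add(&k->refcount, 1);
}

/* Returns 1 and calls release() only when the last reference drops. */
static int kref_put_sim(struct kref_sim *k, void (*release)(struct kref_sim *))
{
    if (atomic_fetch_sub(&k->refcount, 1) == 1) {
        release(k);
        return 1;
    }
    return 0;
}

static int released;
static void demo_release(struct kref_sim *k) { (void)k; released = 1; }

static int run_kref_demo(void)
{
    struct kref_sim k;
    kref_init_sim(&k);
    kref_get_sim(&k);                        /* two holders now */
    int a = kref_put_sim(&k, demo_release);  /* one holder left */
    int b = kref_put_sim(&k, demo_release);  /* last put frees */
    return a == 0 && b == 1 && released == 1;
}
```

This is why the patch can delete svc_sock_put: the free-on-last-put logic moves into the release callback (xpo_free plus module_put in svc_xprt_free).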

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 8 ++++++
include/linux/sunrpc/svcsock.h | 1 -
net/sunrpc/svc_xprt.c | 17 ++++++++++++
net/sunrpc/svcsock.c | 54 +++++++++++++++------------------------
4 files changed, 46 insertions(+), 34 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 6a34bb4..c77e873 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -8,6 +8,7 @@ #ifndef SUNRPC_SVC_XPRT_H
#define SUNRPC_SVC_XPRT_H

#include <linux/sunrpc/svc.h>
+#include <linux/module.h>

struct svc_xprt_ops {
struct svc_xprt *(*xpo_create)(struct svc_serv *,
@@ -35,11 +36,18 @@ struct svc_xprt {
struct svc_xprt_class *xpt_class;
struct svc_xprt_ops xpt_ops;
u32 xpt_max_payload;
+ struct kref xpt_ref;
};

int svc_reg_xprt_class(struct svc_xprt_class *);
int svc_unreg_xprt_class(struct svc_xprt_class *);
void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *);
int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);
+void svc_xprt_put(struct svc_xprt *xprt);
+
+static inline void svc_xprt_get(struct svc_xprt *xprt)
+{
+ kref_get(&xprt->xpt_ref);
+}

#endif /* SUNRPC_SVC_XPRT_H */
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 3181d9d..ba07d50 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -24,7 +24,6 @@ struct svc_sock {

struct svc_pool * sk_pool; /* current pool iff queued */
struct svc_serv * sk_server; /* service for this socket */
- atomic_t sk_inuse; /* use count */
unsigned long sk_flags;
#define SK_BUSY 0 /* enqueued/receiving */
#define SK_CONN 1 /* conn pending */
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d57064f..05ccfa6 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -82,6 +82,22 @@ int svc_unreg_xprt_class(struct svc_xprt
}
EXPORT_SYMBOL_GPL(svc_unreg_xprt_class);

+static inline void svc_xprt_free(struct kref *kref)
+{
+ struct svc_xprt *xprt =
+ container_of(kref, struct svc_xprt, xpt_ref);
+ struct module *owner = xprt->xpt_class->xcl_owner;
+ BUG_ON(atomic_read(&kref->refcount));
+ xprt->xpt_ops.xpo_free(xprt);
+ module_put(owner);
+}
+
+void svc_xprt_put(struct svc_xprt *xprt)
+{
+ kref_put(&xprt->xpt_ref, svc_xprt_free);
+}
+EXPORT_SYMBOL_GPL(svc_xprt_put);
+
/*
* Called by transport drivers to initialize the transport independent
* portion of the transport instance.
@@ -91,6 +107,7 @@ void svc_xprt_init(struct svc_xprt_class
xpt->xpt_class = xcl;
xpt->xpt_ops = *xcl->xcl_ops;
xpt->xpt_max_payload = xcl->xcl_max_payload;
+ kref_init(&xpt->xpt_ref);
}
EXPORT_SYMBOL_GPL(svc_xprt_init);

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 373f020..d5e78b9 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -65,8 +65,8 @@ #include <linux/sunrpc/stats.h>
* after a clear, the socket must be read/accepted
* if this succeeds, it must be set again.
* SK_CLOSE can set at any time. It is never cleared.
- * sk_inuse contains a bias of '1' until SK_DEAD is set.
- * so when sk_inuse hits zero, we know the socket is dead
+ * xpt_ref contains a bias of '1' until SK_DEAD is set.
+ * so when xprt_ref hits zero, we know the transport is dead
* and no-one is using it.
* SK_DEAD can only be set while SK_BUSY is held which ensures
* no other thread will be using the socket or will try to
@@ -301,7 +301,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
"svc_sock_enqueue: server %p, rq_sock=%p!\n",
rqstp, rqstp->rq_sock);
rqstp->rq_sock = svsk;
- atomic_inc(&svsk->sk_inuse);
+ svc_xprt_get(&svsk->sk_xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
BUG_ON(svsk->sk_pool != pool);
@@ -332,7 +332,7 @@ svc_sock_dequeue(struct svc_pool *pool)
list_del_init(&svsk->sk_ready);

dprintk("svc: socket %p dequeued, inuse=%d\n",
- svsk->sk_sk, atomic_read(&svsk->sk_inuse));
+ svsk->sk_sk, atomic_read(&svsk->sk_xprt.xpt_ref.refcount));

return svsk;
}
@@ -375,19 +375,6 @@ void svc_reserve(struct svc_rqst *rqstp,
}
}

-/*
- * Release a socket after use.
- */
-static inline void
-svc_sock_put(struct svc_sock *svsk)
-{
- if (atomic_dec_and_test(&svsk->sk_inuse)) {
- BUG_ON(!test_bit(SK_DEAD, &svsk->sk_flags));
- module_put(svsk->sk_xprt.xpt_class->xcl_owner);
- svsk->sk_xprt.xpt_ops.xpo_free(&svsk->sk_xprt);
- }
-}
-
static void
svc_sock_release(struct svc_rqst *rqstp)
{
@@ -414,7 +401,7 @@ svc_sock_release(struct svc_rqst *rqstp)
svc_reserve(rqstp, 0);
rqstp->rq_sock = NULL;

- svc_sock_put(svsk);
+ svc_xprt_put(&svsk->sk_xprt);
}

/*
@@ -1506,13 +1493,13 @@ svc_check_conn_limits(struct svc_serv *s
struct svc_sock,
sk_list);
set_bit(SK_CLOSE, &svsk->sk_flags);
- atomic_inc(&svsk->sk_inuse);
+ svc_xprt_get(&svsk->sk_xprt);
}
spin_unlock_bh(&serv->sv_lock);

if (svsk) {
svc_sock_enqueue(svsk);
- svc_sock_put(svsk);
+ svc_xprt_put(&svsk->sk_xprt);
}
}
}
@@ -1577,7 +1564,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
spin_lock_bh(&pool->sp_lock);
if ((svsk = svc_sock_dequeue(pool)) != NULL) {
rqstp->rq_sock = svsk;
- atomic_inc(&svsk->sk_inuse);
+ svc_xprt_get(&svsk->sk_xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
} else {
@@ -1626,7 +1613,8 @@ svc_recv(struct svc_rqst *rqstp, long ti
svc_sock_received(svsk);
} else {
dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
- rqstp, pool->sp_id, svsk, atomic_read(&svsk->sk_inuse));
+ rqstp, pool->sp_id, svsk,
+ atomic_read(&svsk->sk_xprt.xpt_ref.refcount));
len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
dprintk("svc: got len=%d\n", len);
}
@@ -1723,9 +1711,10 @@ svc_age_temp_sockets(unsigned long closu

if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
continue;
- if (atomic_read(&svsk->sk_inuse) > 1 || test_bit(SK_BUSY, &svsk->sk_flags))
+ if (atomic_read(&svsk->sk_xprt.xpt_ref.refcount) > 1
+ || test_bit(SK_BUSY, &svsk->sk_flags))
continue;
- atomic_inc(&svsk->sk_inuse);
+ svc_xprt_get(&svsk->sk_xprt);
list_move(le, &to_be_aged);
set_bit(SK_CLOSE, &svsk->sk_flags);
set_bit(SK_DETACHED, &svsk->sk_flags);
@@ -1743,7 +1732,7 @@ svc_age_temp_sockets(unsigned long closu

/* a thread will dequeue and close it soon */
svc_sock_enqueue(svsk);
- svc_sock_put(svsk);
+ svc_xprt_put(&svsk->sk_xprt);
}

mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
@@ -1788,7 +1777,6 @@ static struct svc_sock *svc_setup_socket
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
svsk->sk_server = serv;
- atomic_set(&svsk->sk_inuse, 1);
svsk->sk_lastrecv = get_seconds();
spin_lock_init(&svsk->sk_lock);
INIT_LIST_HEAD(&svsk->sk_deferred);
@@ -1977,8 +1965,8 @@ svc_delete_socket(struct svc_sock *svsk)
* is about to be destroyed (in svc_destroy).
*/
if (!test_and_set_bit(SK_DEAD, &svsk->sk_flags)) {
- BUG_ON(atomic_read(&svsk->sk_inuse)<2);
- atomic_dec(&svsk->sk_inuse);
+ BUG_ON(atomic_read(&svsk->sk_xprt.xpt_ref.refcount) < 2);
+ svc_xprt_put(&svsk->sk_xprt);
if (test_bit(SK_TEMP, &svsk->sk_flags))
serv->sv_tmpcnt--;
}
@@ -1993,10 +1981,10 @@ static void svc_close_socket(struct svc_
/* someone else will have to effect the close */
return;

- atomic_inc(&svsk->sk_inuse);
+ svc_xprt_get(&svsk->sk_xprt);
svc_delete_socket(svsk);
clear_bit(SK_BUSY, &svsk->sk_flags);
- svc_sock_put(svsk);
+ svc_xprt_put(&svsk->sk_xprt);
}

void svc_force_close_socket(struct svc_sock *svsk)
@@ -2022,7 +2010,7 @@ static void svc_revisit(struct cache_def
struct svc_sock *svsk;

if (too_many) {
- svc_sock_put(dr->svsk);
+ svc_xprt_put(&dr->svsk->sk_xprt);
kfree(dr);
return;
}
@@ -2034,7 +2022,7 @@ static void svc_revisit(struct cache_def
spin_unlock(&svsk->sk_lock);
set_bit(SK_DEFERRED, &svsk->sk_flags);
svc_sock_enqueue(svsk);
- svc_sock_put(svsk);
+ svc_xprt_put(&svsk->sk_xprt);
}

static struct cache_deferred_req *
@@ -2064,7 +2052,7 @@ svc_defer(struct cache_req *req)
dr->argslen = rqstp->rq_arg.len >> 2;
memcpy(dr->args, rqstp->rq_arg.head[0].iov_base-skip, dr->argslen<<2);
}
- atomic_inc(&rqstp->rq_sock->sk_inuse);
+ svc_xprt_get(rqstp->rq_xprt);
dr->svsk = rqstp->rq_sock;

dr->handle.revisit = svc_revisit;


2007-10-01 19:27:45

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 09/35] svc: Add a transport function that checks for write space


In order to avoid blocking a service thread, the receive side checks
to see if there is sufficient write space to reply to the request.
Each transport has a different mechanism for determining if there is
enough write space to reply.

The code that checked for write space was coupled with code that
checked for CLOSE and CONN. These checks have been broken out into
separate statements to make the code easier to read.
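The heuristic the per-transport methods share can be isolated in a few lines. This is a sketch of the arithmetic only, with invented names; the real methods additionally toggle SOCK_NOSPACE around the check:

```c
#include <assert.h>

/*
 * Don't hand a request to a thread unless the socket could absorb
 * (already-reserved space + one max-size reply) * 2.
 */
static int has_wspace(int reserved, int max_mesg, long wspace)
{
    long required = (long)reserved + max_mesg;
    return required * 2 <= wspace;
}

static int run_wspace_demo(void)
{
    /* 4K already promised plus a 4K max reply needs 16K of space. */
    return has_wspace(4096, 4096, 16384) == 1 &&
           has_wspace(4096, 4096, 16383) == 0;
}
```

What differs per transport is only where `wspace` comes from: sock_wspace() for UDP versus sk_stream_wspace() for TCP, which is why the check becomes an xpo_ method.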

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 1 +
net/sunrpc/svcsock.c | 62 +++++++++++++++++++++++++++++++++------
2 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 1cd86fe..47bedfa 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -10,6 +10,7 @@ #define SUNRPC_SVC_XPRT_H
#include <linux/sunrpc/svc.h>

struct svc_xprt_ops {
+ int (*xpo_has_wspace)(struct svc_xprt *);
int (*xpo_recvfrom)(struct svc_rqst *);
void (*xpo_prep_reply_hdr)(struct svc_rqst *);
int (*xpo_sendto)(struct svc_rqst *);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 770d569..59b6115 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -269,22 +269,24 @@ svc_sock_enqueue(struct svc_sock *svsk)
BUG_ON(svsk->sk_pool != NULL);
svsk->sk_pool = pool;

- set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
- if (((atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg)*2
- > svc_sock_wspace(svsk))
- && !test_bit(SK_CLOSE, &svsk->sk_flags)
- && !test_bit(SK_CONN, &svsk->sk_flags)) {
+ /* Handle pending connection */
+ if (test_bit(SK_CONN, &svsk->sk_flags))
+ goto process;
+
+ /* Handle close in-progress */
+ if (test_bit(SK_CLOSE, &svsk->sk_flags))
+ goto process;
+
+ /* Check if we have space to reply to a request */
+ if (!svsk->sk_xprt.xpt_ops.xpo_has_wspace(&svsk->sk_xprt)) {
/* Don't enqueue while not enough space for reply */
- dprintk("svc: socket %p no space, %d*2 > %ld, not enqueued\n",
- svsk->sk_sk, atomic_read(&svsk->sk_reserved)+serv->sv_max_mesg,
- svc_sock_wspace(svsk));
+ dprintk("svc: no write space, socket %p not enqueued\n", svsk);
svsk->sk_pool = NULL;
clear_bit(SK_BUSY, &svsk->sk_flags);
goto out_unlock;
}
- clear_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
-

+ process:
if (!list_empty(&pool->sp_threads)) {
rqstp = list_entry(pool->sp_threads.next,
struct svc_rqst,
@@ -898,6 +900,25 @@ svc_udp_prep_reply_hdr(struct svc_rqst *
{
}

+static int
+svc_udp_has_wspace(struct svc_xprt *xprt)
+{
+ struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
+ struct svc_serv *serv = svsk->sk_server;
+ int required;
+
+ /*
+ * Set the SOCK_NOSPACE flag before checking the available
+ * sock space.
+ */
+ set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
+ required = atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg;
+ if (required*2 > sock_wspace(svsk->sk_sk))
+ return 0;
+ clear_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
+ return 1;
+}
+
static struct svc_xprt_ops svc_udp_ops = {
.xpo_recvfrom = svc_udp_recvfrom,
.xpo_sendto = svc_udp_sendto,
@@ -905,6 +926,7 @@ static struct svc_xprt_ops svc_udp_ops =
.xpo_detach = svc_sock_detach,
.xpo_free = svc_sock_free,
.xpo_prep_reply_hdr = svc_udp_prep_reply_hdr,
+ .xpo_has_wspace = svc_udp_has_wspace,
};

static struct svc_xprt_class svc_udp_class = {
@@ -1368,6 +1390,25 @@ svc_tcp_prep_reply_hdr(struct svc_rqst *
svc_putnl(resv, 0);
}

+static int
+svc_tcp_has_wspace(struct svc_xprt *xprt)
+{
+ struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
+ struct svc_serv *serv = svsk->sk_server;
+ int required;
+
+ /*
+ * Set the SOCK_NOSPACE flag before checking the available
+ * sock space.
+ */
+ set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
+ required = atomic_read(&svsk->sk_reserved) + serv->sv_max_mesg;
+ if (required*2 > sk_stream_wspace(svsk->sk_sk))
+ return 0;
+ clear_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
+ return 1;
+}
+
static struct svc_xprt_ops svc_tcp_ops = {
.xpo_recvfrom = svc_tcp_recvfrom,
.xpo_sendto = svc_tcp_sendto,
@@ -1375,6 +1416,7 @@ static struct svc_xprt_ops svc_tcp_ops =
.xpo_detach = svc_sock_detach,
.xpo_free = svc_sock_free,
.xpo_prep_reply_hdr = svc_tcp_prep_reply_hdr,
+ .xpo_has_wspace = svc_tcp_has_wspace,
};

static struct svc_xprt_class svc_tcp_class = {


2007-10-01 19:28:04

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 17/35] svc: Make close transport independent


Move sk_list and sk_ready to svc_xprt. This change is entangled with close
processing because services walk these lists when closing all of their
transports, so moving the lists to svc_xprt is combined with making close
transport independent.

The svc_force_sock_close function has been changed to svc_close_all and now
takes a list as an argument. This removes some knowledge of svc internals
from the services.

This code races with module removal and transport addition.
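The new svc_close_all shape — walk a list of transports, closing each — can be sketched with a toy intrusive list standing in for the kernel's list_head; the names are invented:

```c
#include <assert.h>
#include <stddef.h>

/* A toy singly linked intrusive list, standing in for list_head. */
struct node {
    struct node *next;
    int closed;
};

/* svc_close_all() sketch: close every transport on the given list. */
static int close_all(struct node *head)
{
    int n = 0;
    for (struct node *p = head; p; p = p->next) {
        p->closed = 1;             /* force-close the transport */
        n++;
    }
    return n;
}

static int run_close_all_demo(void)
{
    struct node c = { NULL, 0 }, b = { &c, 0 }, a = { &b, 0 };
    return close_all(&a) == 3 && a.closed && b.closed && c.closed;
}
```

Passing the list itself (sv_permsocks or sv_tempsocks) means callers like svc_destroy no longer iterate svc_sock entries directly.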

Signed-off-by: Tom Tucker <[email protected]>
---

fs/lockd/svc.c | 6 +-
fs/nfsd/nfssvc.c | 4 +-
include/linux/sunrpc/svc_xprt.h | 2 +
include/linux/sunrpc/svcsock.h | 4 --
net/sunrpc/svc.c | 9 +---
net/sunrpc/svc_xprt.c | 2 +
net/sunrpc/svcsock.c | 100 ++++++++++++++++++++-------------------
7 files changed, 63 insertions(+), 64 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 8686915..a8e79a9 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -221,10 +221,10 @@ lockd(struct svc_rqst *rqstp)

static int find_xprt(struct svc_serv *serv, char *proto)
{
- struct svc_sock *svsk;
+ struct svc_xprt *xprt;
int found = 0;
- list_for_each_entry(svsk, &serv->sv_permsocks, sk_list)
- if (strcmp(svsk->sk_xprt.xpt_class->xcl_name, proto) == 0) {
+ list_for_each_entry(xprt, &serv->sv_permsocks, xpt_list)
+ if (strcmp(xprt->xpt_class->xcl_name, proto) == 0) {
found = 1;
break;
}
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index bf70b06..4f6d6fd 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -155,8 +155,8 @@ static int killsig; /* signal that was u
static void nfsd_last_thread(struct svc_serv *serv)
{
/* When last nfsd thread exits we need to do some clean-up */
- struct svc_sock *svsk;
- list_for_each_entry(svsk, &serv->sv_permsocks, sk_list)
+ struct svc_xprt *xprt;
+ list_for_each_entry(xprt, &serv->sv_permsocks, xpt_list)
lockd_down();
nfsd_serv = NULL;
nfsd_racache_shutdown();
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index da7f827..6064bc3 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -37,6 +37,8 @@ struct svc_xprt {
struct svc_xprt_ops xpt_ops;
u32 xpt_max_payload;
struct kref xpt_ref;
+ struct list_head xpt_list;
+ struct list_head xpt_ready;
unsigned long xpt_flags;
#define XPT_BUSY 0 /* enqueued/receiving */
#define XPT_CONN 1 /* conn pending */
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 92d4cc9..060508b 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -17,8 +17,6 @@ #include <linux/sunrpc/svc_xprt.h>
*/
struct svc_sock {
struct svc_xprt sk_xprt;
- struct list_head sk_ready; /* list of ready sockets */
- struct list_head sk_list; /* list of all sockets */
struct socket * sk_sock; /* berkeley socket layer */
struct sock * sk_sk; /* INET layer */

@@ -51,7 +49,7 @@ struct svc_sock {
/*
* Function prototypes.
*/
-void svc_force_close_socket(struct svc_sock *);
+void svc_close_all(struct list_head *);
int svc_recv(struct svc_rqst *, long);
int svc_send(struct svc_rqst *);
void svc_drop(struct svc_rqst *);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index ee68117..440ea59 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -458,9 +458,6 @@ svc_create_pooled(struct svc_program *pr
void
svc_destroy(struct svc_serv *serv)
{
- struct svc_sock *svsk;
- struct svc_sock *tmp;
-
dprintk("svc: svc_destroy(%s, %d)\n",
serv->sv_program->pg_name,
serv->sv_nrthreads);
@@ -475,14 +472,12 @@ svc_destroy(struct svc_serv *serv)

del_timer_sync(&serv->sv_temptimer);

- list_for_each_entry_safe(svsk, tmp, &serv->sv_tempsocks, sk_list)
- svc_force_close_socket(svsk);
+ svc_close_all(&serv->sv_tempsocks);

if (serv->sv_shutdown)
serv->sv_shutdown(serv);

- list_for_each_entry_safe(svsk, tmp, &serv->sv_permsocks, sk_list)
- svc_force_close_socket(svsk);
+ svc_close_all(&serv->sv_permsocks);

BUG_ON(!list_empty(&serv->sv_permsocks));
BUG_ON(!list_empty(&serv->sv_tempsocks));
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index a6db507..5195131 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -110,6 +110,8 @@ void svc_xprt_init(struct svc_xprt_class
xpt->xpt_max_payload = xcl->xcl_max_payload;
kref_init(&xpt->xpt_ref);
xpt->xpt_server = serv;
+ INIT_LIST_HEAD(&xpt->xpt_list);
+ INIT_LIST_HEAD(&xpt->xpt_ready);
}
EXPORT_SYMBOL_GPL(svc_xprt_init);

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 80054e4..2ed64fd 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -80,11 +80,11 @@ #define RPCDBG_FACILITY RPCDBG_SVCXPRT

static struct svc_sock *svc_setup_socket(struct svc_serv *, struct socket *,
int *errp, int flags);
-static void svc_delete_socket(struct svc_sock *svsk);
+static void svc_delete_xprt(struct svc_xprt *xprt);
static void svc_udp_data_ready(struct sock *, int);
static int svc_udp_recvfrom(struct svc_rqst *);
static int svc_udp_sendto(struct svc_rqst *);
-static void svc_close_socket(struct svc_sock *svsk);
+static void svc_close_xprt(struct svc_xprt *xprt);
static void svc_sock_detach(struct svc_xprt *);
static void svc_sock_free(struct svc_xprt *);

@@ -309,7 +309,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
wake_up(&rqstp->rq_wait);
} else {
dprintk("svc: socket %p put into queue\n", svsk->sk_sk);
- list_add_tail(&svsk->sk_ready, &pool->sp_sockets);
+ list_add_tail(&svsk->sk_xprt.xpt_ready, &pool->sp_sockets);
BUG_ON(svsk->sk_xprt.xpt_pool != pool);
}

@@ -329,8 +329,8 @@ svc_sock_dequeue(struct svc_pool *pool)
return NULL;

svsk = list_entry(pool->sp_sockets.next,
- struct svc_sock, sk_ready);
- list_del_init(&svsk->sk_ready);
+ struct svc_sock, sk_xprt.xpt_ready);
+ list_del_init(&svsk->sk_xprt.xpt_ready);

dprintk("svc: socket %p dequeued, inuse=%d\n",
svsk->sk_sk, atomic_read(&svsk->sk_xprt.xpt_ref.refcount));
@@ -588,7 +588,7 @@ svc_sock_names(char *buf, struct svc_ser
if (!serv)
return 0;
spin_lock_bh(&serv->sv_lock);
- list_for_each_entry(svsk, &serv->sv_permsocks, sk_list) {
+ list_for_each_entry(svsk, &serv->sv_permsocks, sk_xprt.xpt_list) {
int onelen = one_sock_name(buf+len, svsk);
if (toclose && strcmp(toclose, buf+len) == 0)
closesk = svsk;
@@ -600,7 +600,7 @@ svc_sock_names(char *buf, struct svc_ser
/* Should unregister with portmap, but you cannot
* unregister just one protocol...
*/
- svc_close_socket(closesk);
+ svc_close_xprt(&closesk->sk_xprt);
else if (toclose)
return -ENOENT;
return len;
@@ -1278,7 +1278,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
return len;

err_delete:
- svc_delete_socket(svsk);
+ svc_delete_xprt(&svsk->sk_xprt);
return -EAGAIN;

error:
@@ -1446,12 +1446,12 @@ svc_sock_update_bufs(struct svc_serv *se
spin_lock_bh(&serv->sv_lock);
list_for_each(le, &serv->sv_permsocks) {
struct svc_sock *svsk =
- list_entry(le, struct svc_sock, sk_list);
+ list_entry(le, struct svc_sock, sk_xprt.xpt_list);
set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
}
list_for_each(le, &serv->sv_tempsocks) {
struct svc_sock *svsk =
- list_entry(le, struct svc_sock, sk_list);
+ list_entry(le, struct svc_sock, sk_xprt.xpt_list);
set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
}
spin_unlock_bh(&serv->sv_lock);
@@ -1493,7 +1493,7 @@ svc_check_conn_limits(struct svc_serv *s
*/
svsk = list_entry(serv->sv_tempsocks.prev,
struct svc_sock,
- sk_list);
+ sk_xprt.xpt_list);
set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
svc_xprt_get(&svsk->sk_xprt);
}
@@ -1600,7 +1600,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
len = 0;
if (test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags)) {
dprintk("svc_recv: found XPT_CLOSE\n");
- svc_delete_socket(svsk);
+ svc_delete_xprt(&svsk->sk_xprt);
} else if (test_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags)) {
struct svc_xprt *newxpt;
newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
@@ -1709,7 +1709,7 @@ svc_age_temp_sockets(unsigned long closu
}

list_for_each_safe(le, next, &serv->sv_tempsocks) {
- svsk = list_entry(le, struct svc_sock, sk_list);
+ svsk = list_entry(le, struct svc_sock, sk_xprt.xpt_list);

if (!test_and_set_bit(XPT_OLD, &svsk->sk_xprt.xpt_flags))
continue;
@@ -1725,9 +1725,9 @@ svc_age_temp_sockets(unsigned long closu

while (!list_empty(&to_be_aged)) {
le = to_be_aged.next;
- /* fiddling the sk_list node is safe 'cos we're XPT_DETACHED */
+ /* fiddling the sk_xprt.xpt_list node is safe 'cos we're XPT_DETACHED */
list_del_init(le);
- svsk = list_entry(le, struct svc_sock, sk_list);
+ svsk = list_entry(le, struct svc_sock, sk_xprt.xpt_list);

dprintk("queuing svsk %p for closing, %lu seconds old\n",
svsk, get_seconds() - svsk->sk_lastrecv);
@@ -1781,7 +1781,6 @@ static struct svc_sock *svc_setup_socket
svsk->sk_lastrecv = get_seconds();
spin_lock_init(&svsk->sk_lock);
INIT_LIST_HEAD(&svsk->sk_deferred);
- INIT_LIST_HEAD(&svsk->sk_ready);
mutex_init(&svsk->sk_mutex);

/* Initialize the socket */
@@ -1793,7 +1792,7 @@ static struct svc_sock *svc_setup_socket
spin_lock_bh(&serv->sv_lock);
if (is_temporary) {
set_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
- list_add(&svsk->sk_list, &serv->sv_tempsocks);
+ list_add(&svsk->sk_xprt.xpt_list, &serv->sv_tempsocks);
serv->sv_tmpcnt++;
if (serv->sv_temptimer.function == NULL) {
/* setup timer to age temp sockets */
@@ -1804,7 +1803,7 @@ static struct svc_sock *svc_setup_socket
}
} else {
clear_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
- list_add(&svsk->sk_list, &serv->sv_permsocks);
+ list_add(&svsk->sk_xprt.xpt_list, &serv->sv_permsocks);
}
spin_unlock_bh(&serv->sv_lock);

@@ -1939,66 +1938,69 @@ svc_sock_free(struct svc_xprt *xprt)
}

/*
- * Remove a dead socket
+ * Remove a dead transport
*/
static void
-svc_delete_socket(struct svc_sock *svsk)
+svc_delete_xprt(struct svc_xprt *xprt)
{
struct svc_serv *serv;
- struct sock *sk;

- dprintk("svc: svc_delete_socket(%p)\n", svsk);
+ dprintk("svc: svc_delete_xprt(%p)\n", xprt);

- serv = svsk->sk_xprt.xpt_server;
- sk = svsk->sk_sk;
+ serv = xprt->xpt_server;

- svsk->sk_xprt.xpt_ops.xpo_detach(&svsk->sk_xprt);
+ xprt->xpt_ops.xpo_detach(xprt);

spin_lock_bh(&serv->sv_lock);

- if (!test_and_set_bit(XPT_DETACHED, &svsk->sk_xprt.xpt_flags))
- list_del_init(&svsk->sk_list);
+ if (!test_and_set_bit(XPT_DETACHED, &xprt->xpt_flags))
+ list_del_init(&xprt->xpt_list);
/*
- * We used to delete the svc_sock from whichever list
- * it's sk_ready node was on, but we don't actually
+ * We used to delete the transport from whichever list
+ * it's sk_xprt.xpt_ready node was on, but we don't actually
* need to. This is because the only time we're called
* while still attached to a queue, the queue itself
* is about to be destroyed (in svc_destroy).
*/
- if (!test_and_set_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags)) {
- BUG_ON(atomic_read(&svsk->sk_xprt.xpt_ref.refcount) < 2);
- svc_xprt_put(&svsk->sk_xprt);
- if (test_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags))
+ if (!test_and_set_bit(XPT_DEAD, &xprt->xpt_flags)) {
+ BUG_ON(atomic_read(&xprt->xpt_ref.refcount) < 2);
+ svc_xprt_put(xprt);
+ if (test_bit(XPT_TEMP, &xprt->xpt_flags))
serv->sv_tmpcnt--;
}

spin_unlock_bh(&serv->sv_lock);
}

-static void svc_close_socket(struct svc_sock *svsk)
+static void svc_close_xprt(struct svc_xprt *xprt)
{
- set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
- if (test_and_set_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags))
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags))
/* someone else will have to effect the close */
return;

- svc_xprt_get(&svsk->sk_xprt);
- svc_delete_socket(svsk);
- clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
- svc_xprt_put(&svsk->sk_xprt);
+ svc_xprt_get(xprt);
+ svc_delete_xprt(xprt);
+ clear_bit(XPT_BUSY, &xprt->xpt_flags);
+ svc_xprt_put(xprt);
}

-void svc_force_close_socket(struct svc_sock *svsk)
+void svc_close_all(struct list_head *xprt_list)
{
- set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
- if (test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags)) {
- /* Waiting to be processed, but no threads left,
- * So just remove it from the waiting list
- */
- list_del_init(&svsk->sk_ready);
- clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
+ struct svc_xprt *xprt;
+ struct svc_xprt *tmp;
+
+ list_for_each_entry_safe(xprt, tmp, xprt_list, xpt_list) {
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ if (test_bit(XPT_BUSY, &xprt->xpt_flags)) {
+ /* Waiting to be processed, but no threads left,
+ * So just remove it from the waiting list
+ */
+ list_del_init(&xprt->xpt_ready);
+ clear_bit(XPT_BUSY, &xprt->xpt_flags);
+ }
+ svc_close_xprt(xprt);
}
- svc_close_socket(svsk);
}

/*

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-10-01 19:27:58

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 12/35] svc: Add a generic transport svc_create_xprt function


The svc_create_xprt function is a transport independent version
of the svc_makesock function.

Since transport instance creation contains both transport-dependent and
transport-independent components, add an xpo_create transport function. The
transport's implementation of this function allocates the memory for the
endpoint, performs the transport-dependent initialization, and calls
svc_xprt_init to initialize the transport-independent field (svc_xprt)
in its data structure.

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 4 +++
net/sunrpc/svc_xprt.c | 35 ++++++++++++++++++++++++
net/sunrpc/svcsock.c | 58 +++++++++++++++++++++++++++++----------
3 files changed, 82 insertions(+), 15 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 4c1a650..6a34bb4 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -10,6 +10,9 @@ #define SUNRPC_SVC_XPRT_H
#include <linux/sunrpc/svc.h>

struct svc_xprt_ops {
+ struct svc_xprt *(*xpo_create)(struct svc_serv *,
+ struct sockaddr *,
+ int);
struct svc_xprt *(*xpo_accept)(struct svc_xprt *);
int (*xpo_has_wspace)(struct svc_xprt *);
int (*xpo_recvfrom)(struct svc_rqst *);
@@ -37,5 +40,6 @@ struct svc_xprt {
int svc_reg_xprt_class(struct svc_xprt_class *);
int svc_unreg_xprt_class(struct svc_xprt_class *);
void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *);
+int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);

#endif /* SUNRPC_SVC_XPRT_H */
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 8ea65c3..d57064f 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -93,3 +93,38 @@ void svc_xprt_init(struct svc_xprt_class
xpt->xpt_max_payload = xcl->xcl_max_payload;
}
EXPORT_SYMBOL_GPL(svc_xprt_init);
+
+int svc_create_xprt(struct svc_serv *serv, char *xprt_name, unsigned short port,
+ int flags)
+{
+ struct svc_xprt_class *xcl;
+ int ret = -ENOENT;
+ struct sockaddr_in sin = {
+ .sin_family = AF_INET,
+ .sin_addr.s_addr = INADDR_ANY,
+ .sin_port = htons(port),
+ };
+ dprintk("svc: creating transport %s[%d]\n", xprt_name, port);
+ spin_lock(&svc_xprt_class_lock);
+ list_for_each_entry(xcl, &svc_xprt_class_list, xcl_list) {
+ if (strcmp(xprt_name, xcl->xcl_name) == 0) {
+ spin_unlock(&svc_xprt_class_lock);
+ if (try_module_get(xcl->xcl_owner)) {
+ struct svc_xprt *newxprt;
+ ret = 0;
+ newxprt = xcl->xcl_ops->xpo_create
+ (serv, (struct sockaddr *)&sin, flags);
+ if (IS_ERR(newxprt)) {
+ module_put(xcl->xcl_owner);
+ ret = PTR_ERR(newxprt);
+ }
+ }
+ goto out;
+ }
+ }
+ spin_unlock(&svc_xprt_class_lock);
+ dprintk("svc: transport %s not found\n", xprt_name);
+ out:
+ return ret;
+}
+EXPORT_SYMBOL_GPL(svc_create_xprt);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index ffc54a1..e3c74e0 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -90,6 +90,8 @@ static void svc_sock_free(struct svc_xp
static struct svc_deferred_req *svc_deferred_dequeue(struct svc_sock *svsk);
static int svc_deferred_recv(struct svc_rqst *rqstp);
static struct cache_deferred_req *svc_defer(struct cache_req *req);
+static struct svc_xprt *
+svc_create_socket(struct svc_serv *, int, struct sockaddr *, int, int);

/* apparently the "standard" is that clients close
* idle connections after 5 minutes, servers after
@@ -381,6 +383,7 @@ svc_sock_put(struct svc_sock *svsk)
{
if (atomic_dec_and_test(&svsk->sk_inuse)) {
BUG_ON(!test_bit(SK_DEAD, &svsk->sk_flags));
+ module_put(svsk->sk_xprt.xpt_class->xcl_owner);
svsk->sk_xprt.xpt_ops.xpo_free(&svsk->sk_xprt);
}
}
@@ -921,7 +924,15 @@ svc_udp_accept(struct svc_xprt *xprt)
return NULL;
}

+static struct svc_xprt *
+svc_udp_create(struct svc_serv *serv, struct sockaddr *sa, int flags)
+{
+ return svc_create_socket(serv, IPPROTO_UDP, sa,
+ sizeof(struct sockaddr_in), flags);
+}
+
static struct svc_xprt_ops svc_udp_ops = {
+ .xpo_create = svc_udp_create,
.xpo_recvfrom = svc_udp_recvfrom,
.xpo_sendto = svc_udp_sendto,
.xpo_release = svc_release_skb,
@@ -934,6 +945,7 @@ static struct svc_xprt_ops svc_udp_ops =

static struct svc_xprt_class svc_udp_class = {
.xcl_name = "udp",
+ .xcl_owner = THIS_MODULE,
.xcl_ops = &svc_udp_ops,
.xcl_max_payload = RPCSVC_MAXPAYLOAD_UDP,
};
@@ -1357,7 +1369,15 @@ svc_tcp_has_wspace(struct svc_xprt *xprt
return 1;
}

+static struct svc_xprt *
+svc_tcp_create(struct svc_serv *serv, struct sockaddr *sa, int flags)
+{
+ return svc_create_socket(serv, IPPROTO_TCP, sa,
+ sizeof(struct sockaddr_in), flags);
+}
+
static struct svc_xprt_ops svc_tcp_ops = {
+ .xpo_create = svc_tcp_create,
.xpo_recvfrom = svc_tcp_recvfrom,
.xpo_sendto = svc_tcp_sendto,
.xpo_release = svc_release_skb,
@@ -1370,6 +1390,7 @@ static struct svc_xprt_ops svc_tcp_ops =

static struct svc_xprt_class svc_tcp_class = {
.xcl_name = "tcp",
+ .xcl_owner = THIS_MODULE,
.xcl_ops = &svc_tcp_ops,
.xcl_max_payload = RPCSVC_MAXPAYLOAD_TCP,
};
@@ -1594,8 +1615,14 @@ svc_recv(struct svc_rqst *rqstp, long ti
} else if (test_bit(SK_LISTENER, &svsk->sk_flags)) {
struct svc_xprt *newxpt;
newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
- if (newxpt)
+ if (newxpt) {
+ /*
+ * We know this module_get will succeed because the
+ * listener holds a reference too
+ */
+ __module_get(newxpt->xpt_class->xcl_owner);
svc_check_conn_limits(svsk->sk_server);
+ }
svc_sock_received(svsk);
} else {
dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
@@ -1835,8 +1862,9 @@ EXPORT_SYMBOL_GPL(svc_addsock);
/*
* Create socket for RPC service.
*/
-static int svc_create_socket(struct svc_serv *serv, int protocol,
- struct sockaddr *sin, int len, int flags)
+static struct svc_xprt *
+svc_create_socket(struct svc_serv *serv, int protocol,
+ struct sockaddr *sin, int len, int flags)
{
struct svc_sock *svsk;
struct socket *sock;
@@ -1851,13 +1879,13 @@ static int svc_create_socket(struct svc_
if (protocol != IPPROTO_UDP && protocol != IPPROTO_TCP) {
printk(KERN_WARNING "svc: only UDP and TCP "
"sockets supported\n");
- return -EINVAL;
+ return ERR_PTR(-EINVAL);
}
type = (protocol == IPPROTO_UDP)? SOCK_DGRAM : SOCK_STREAM;

error = sock_create_kern(sin->sa_family, type, protocol, &sock);
if (error < 0)
- return error;
+ return ERR_PTR(error);

svc_reclassify_socket(sock);

@@ -1876,13 +1904,13 @@ static int svc_create_socket(struct svc_
if (protocol == IPPROTO_TCP)
set_bit(SK_LISTENER, &svsk->sk_flags);
svc_sock_received(svsk);
- return ntohs(inet_sk(svsk->sk_sk)->sport);
+ return (struct svc_xprt *)svsk;
}

bummer:
dprintk("svc: svc_create_socket error = %d\n", -error);
sock_release(sock);
- return error;
+ return ERR_PTR(error);
}

/*
@@ -1995,15 +2023,15 @@ void svc_force_close_socket(struct svc_s
int svc_makesock(struct svc_serv *serv, int protocol, unsigned short port,
int flags)
{
- struct sockaddr_in sin = {
- .sin_family = AF_INET,
- .sin_addr.s_addr = INADDR_ANY,
- .sin_port = htons(port),
- };
-
dprintk("svc: creating socket proto = %d\n", protocol);
- return svc_create_socket(serv, protocol, (struct sockaddr *) &sin,
- sizeof(sin), flags);
+ switch (protocol) {
+ case IPPROTO_TCP:
+ return svc_create_xprt(serv, "tcp", port, flags);
+ case IPPROTO_UDP:
+ return svc_create_xprt(serv, "udp", port, flags);
+ default:
+ return -EINVAL;
+ }
}

/*


2007-10-01 19:28:02

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 13/35] svc: Change services to use new svc_create_xprt service


Modify the various kernel RPC services to use the new svc_create_xprt function.

Signed-off-by: Tom Tucker <[email protected]>
---

fs/lockd/svc.c | 17 ++++++++---------
fs/nfs/callback.c | 4 ++--
fs/nfsd/nfssvc.c | 4 ++--
include/linux/sunrpc/svcsock.h | 1 -
net/sunrpc/sunrpc_syms.c | 1 -
net/sunrpc/svcsock.c | 22 ----------------------
6 files changed, 12 insertions(+), 37 deletions(-)

diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 82e2192..8686915 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -219,13 +219,12 @@ lockd(struct svc_rqst *rqstp)
module_put_and_exit(0);
}

-
-static int find_socket(struct svc_serv *serv, int proto)
+static int find_xprt(struct svc_serv *serv, char *proto)
{
struct svc_sock *svsk;
int found = 0;
list_for_each_entry(svsk, &serv->sv_permsocks, sk_list)
- if (svsk->sk_sk->sk_protocol == proto) {
+ if (strcmp(svsk->sk_xprt.xpt_class->xcl_name, proto) == 0) {
found = 1;
break;
}
@@ -243,13 +242,13 @@ static int make_socks(struct svc_serv *s
int err = 0;

if (proto == IPPROTO_UDP || nlm_udpport)
- if (!find_socket(serv, IPPROTO_UDP))
- err = svc_makesock(serv, IPPROTO_UDP, nlm_udpport,
- SVC_SOCK_DEFAULTS);
+ if (!find_xprt(serv, "udp"))
+ err = svc_create_xprt(serv, "udp", nlm_udpport,
+ SVC_SOCK_DEFAULTS);
if (err >= 0 && (proto == IPPROTO_TCP || nlm_tcpport))
- if (!find_socket(serv, IPPROTO_TCP))
- err = svc_makesock(serv, IPPROTO_TCP, nlm_tcpport,
- SVC_SOCK_DEFAULTS);
+ if (!find_xprt(serv, "tcp"))
+ err = svc_create_xprt(serv, "tcp", nlm_tcpport,
+ SVC_SOCK_DEFAULTS);

if (err >= 0) {
warned = 0;
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index a796be5..e27ca14 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -123,8 +123,8 @@ int nfs_callback_up(void)
if (!serv)
goto out_err;

- ret = svc_makesock(serv, IPPROTO_TCP, nfs_callback_set_tcpport,
- SVC_SOCK_ANONYMOUS);
+ ret = svc_create_xprt(serv, "tcp", nfs_callback_set_tcpport,
+ SVC_SOCK_ANONYMOUS);
if (ret <= 0)
goto out_destroy;
nfs_callback_tcpport = ret;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index a8c89ae..bf70b06 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -236,7 +236,7 @@ static int nfsd_init_socks(int port)

error = lockd_up(IPPROTO_UDP);
if (error >= 0) {
- error = svc_makesock(nfsd_serv, IPPROTO_UDP, port,
+ error = svc_create_xprt(nfsd_serv, "udp", port,
SVC_SOCK_DEFAULTS);
if (error < 0)
lockd_down();
@@ -247,7 +247,7 @@ static int nfsd_init_socks(int port)
#ifdef CONFIG_NFSD_TCP
error = lockd_up(IPPROTO_TCP);
if (error >= 0) {
- error = svc_makesock(nfsd_serv, IPPROTO_TCP, port,
+ error = svc_create_xprt(nfsd_serv, "tcp", port,
SVC_SOCK_DEFAULTS);
if (error < 0)
lockd_down();
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 9882ce0..3181d9d 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -67,7 +67,6 @@ #define SK_LISTENER 11 /* listening en
/*
* Function prototypes.
*/
-int svc_makesock(struct svc_serv *, int, unsigned short, int flags);
void svc_force_close_socket(struct svc_sock *);
int svc_recv(struct svc_rqst *, long);
int svc_send(struct svc_rqst *);
diff --git a/net/sunrpc/sunrpc_syms.c b/net/sunrpc/sunrpc_syms.c
index a62ce47..e4cad0f 100644
--- a/net/sunrpc/sunrpc_syms.c
+++ b/net/sunrpc/sunrpc_syms.c
@@ -72,7 +72,6 @@ EXPORT_SYMBOL(svc_drop);
EXPORT_SYMBOL(svc_process);
EXPORT_SYMBOL(svc_recv);
EXPORT_SYMBOL(svc_wake_up);
-EXPORT_SYMBOL(svc_makesock);
EXPORT_SYMBOL(svc_reserve);
EXPORT_SYMBOL(svc_auth_register);
EXPORT_SYMBOL(auth_domain_lookup);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index e3c74e0..373f020 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -2012,28 +2012,6 @@ void svc_force_close_socket(struct svc_s
svc_close_socket(svsk);
}

-/**
- * svc_makesock - Make a socket for nfsd and lockd
- * @serv: RPC server structure
- * @protocol: transport protocol to use
- * @port: port to use
- * @flags: requested socket characteristics
- *
- */
-int svc_makesock(struct svc_serv *serv, int protocol, unsigned short port,
- int flags)
-{
- dprintk("svc: creating socket proto = %d\n", protocol);
- switch (protocol) {
- case IPPROTO_TCP:
- return svc_create_xprt(serv, "tcp", port, flags);
- case IPPROTO_UDP:
- return svc_create_xprt(serv, "udp", port, flags);
- default:
- return -EINVAL;
- }
-}
-
/*
* Handle defer and revisit of requests
*/


2007-10-01 19:28:08

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 19/35] svc: Make the enqueue service transport neutral and export it.


The svc_sock_enqueue function is now transport independent, since all of
the fields it touches have been moved to the transport-independent svc_xprt
structure. Change the function to use the svc_xprt structure directly
instead of the transport-specific svc_sock structure.

Transport specific data-ready handlers need to call this function, so
export it.

Signed-off-by: Tom Tucker <[email protected]>
---

net/sunrpc/svcsock.c | 95 +++++++++++++++++++++++++-------------------------
1 files changed, 48 insertions(+), 47 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 8178e65..7cf15c6 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -5,7 +5,7 @@
*
* The server scheduling algorithm does not always distribute the load
* evenly when servicing a single client. May need to modify the
- * svc_sock_enqueue procedure...
+ * svc_xprt_enqueue procedure...
*
* TCP support is largely untested and may be a little slow. The problem
* is that we currently do two separate recvfrom's, one for the 4-byte
@@ -62,7 +62,7 @@ #include <linux/sunrpc/stats.h>
* providing that certain rules are followed:
*
* XPT_CONN, XPT_DATA, can be set or cleared at any time.
- * after a set, svc_sock_enqueue must be called.
+ * after a set, svc_xprt_enqueue must be called.
* after a clear, the socket must be read/accepted
* if this succeeds, it must be set again.
* XPT_CLOSE can set at any time. It is never cleared.
@@ -228,22 +228,22 @@ svc_sock_wspace(struct svc_sock *svsk)
* processes, wake 'em up.
*
*/
-static void
-svc_sock_enqueue(struct svc_sock *svsk)
+void
+svc_xprt_enqueue(struct svc_xprt *xprt)
{
- struct svc_serv *serv = svsk->sk_xprt.xpt_server;
+ struct svc_serv *serv = xprt->xpt_server;
struct svc_pool *pool;
struct svc_rqst *rqstp;
int cpu;

- if (!(svsk->sk_xprt.xpt_flags &
+ if (!(xprt->xpt_flags &
((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
return;
- if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags))
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags))
return;

cpu = get_cpu();
- pool = svc_pool_for_cpu(svsk->sk_xprt.xpt_server, cpu);
+ pool = svc_pool_for_cpu(xprt->xpt_server, cpu);
put_cpu();

spin_lock_bh(&pool->sp_lock);
@@ -251,11 +251,11 @@ svc_sock_enqueue(struct svc_sock *svsk)
if (!list_empty(&pool->sp_threads) &&
!list_empty(&pool->sp_sockets))
printk(KERN_ERR
- "svc_sock_enqueue: threads and sockets both waiting??\n");
+ "svc_xprt_enqueue: threads and sockets both waiting??\n");

- if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags)) {
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags)) {
/* Don't enqueue dead sockets */
- dprintk("svc: socket %p is dead, not enqueued\n", svsk->sk_sk);
+ dprintk("svc: transport %p is dead, not enqueued\n", xprt);
goto out_unlock;
}

@@ -264,28 +264,28 @@ svc_sock_enqueue(struct svc_sock *svsk)
* on the idle list. We update XPT_BUSY atomically because
* it also guards against trying to enqueue the svc_sock twice.
*/
- if (test_and_set_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags)) {
+ if (test_and_set_bit(XPT_BUSY, &xprt->xpt_flags)) {
/* Don't enqueue socket while already enqueued */
- dprintk("svc: socket %p busy, not enqueued\n", svsk->sk_sk);
+ dprintk("svc: transport %p busy, not enqueued\n", xprt);
goto out_unlock;
}
- BUG_ON(svsk->sk_xprt.xpt_pool != NULL);
- svsk->sk_xprt.xpt_pool = pool;
+ BUG_ON(xprt->xpt_pool != NULL);
+ xprt->xpt_pool = pool;

/* Handle pending connection */
- if (test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags))
+ if (test_bit(XPT_CONN, &xprt->xpt_flags))
goto process;

/* Handle close in-progress */
- if (test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags))
+ if (test_bit(XPT_CLOSE, &xprt->xpt_flags))
goto process;

/* Check if we have space to reply to a request */
- if (!svsk->sk_xprt.xpt_ops.xpo_has_wspace(&svsk->sk_xprt)) {
+ if (!xprt->xpt_ops.xpo_has_wspace(xprt)) {
/* Don't enqueue while not enough space for reply */
- dprintk("svc: no write space, socket %p not enqueued\n", svsk);
- svsk->sk_xprt.xpt_pool = NULL;
- clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
+ dprintk("svc: no write space, transport %p not enqueued\n", xprt);
+ xprt->xpt_pool = NULL;
+ clear_bit(XPT_BUSY, &xprt->xpt_flags);
goto out_unlock;
}

@@ -294,28 +294,29 @@ svc_sock_enqueue(struct svc_sock *svsk)
rqstp = list_entry(pool->sp_threads.next,
struct svc_rqst,
rq_list);
- dprintk("svc: socket %p served by daemon %p\n",
- svsk->sk_sk, rqstp);
+ dprintk("svc: transport %p served by daemon %p\n",
+ xprt, rqstp);
svc_thread_dequeue(pool, rqstp);
- if (rqstp->rq_sock)
+ if (rqstp->rq_xprt)
printk(KERN_ERR
- "svc_sock_enqueue: server %p, rq_sock=%p!\n",
- rqstp, rqstp->rq_sock);
- rqstp->rq_sock = svsk;
- svc_xprt_get(&svsk->sk_xprt);
+ "svc_xprt_enqueue: server %p, rq_xprt=%p!\n",
+ rqstp, rqstp->rq_xprt);
+ rqstp->rq_xprt = xprt;
+ svc_xprt_get(xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
- atomic_add(rqstp->rq_reserved, &svsk->sk_xprt.xpt_reserved);
- BUG_ON(svsk->sk_xprt.xpt_pool != pool);
+ atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
+ BUG_ON(xprt->xpt_pool != pool);
wake_up(&rqstp->rq_wait);
} else {
- dprintk("svc: socket %p put into queue\n", svsk->sk_sk);
- list_add_tail(&svsk->sk_xprt.xpt_ready, &pool->sp_sockets);
- BUG_ON(svsk->sk_xprt.xpt_pool != pool);
+ dprintk("svc: transport %p put into queue\n", xprt);
+ list_add_tail(&xprt->xpt_ready, &pool->sp_sockets);
+ BUG_ON(xprt->xpt_pool != pool);
}

out_unlock:
spin_unlock_bh(&pool->sp_lock);
}
+EXPORT_SYMBOL_GPL(svc_xprt_enqueue);

/*
* Dequeue the first socket. Must be called with the pool->sp_lock held.
@@ -349,7 +350,7 @@ svc_sock_received(struct svc_sock *svsk)
{
svsk->sk_xprt.xpt_pool = NULL;
clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
}


@@ -368,11 +369,11 @@ void svc_reserve(struct svc_rqst *rqstp,
space += rqstp->rq_res.head[0].iov_len;

if (space < rqstp->rq_reserved) {
- struct svc_sock *svsk = rqstp->rq_sock;
- atomic_sub((rqstp->rq_reserved - space), &svsk->sk_xprt.xpt_reserved);
+ struct svc_xprt *xprt = rqstp->rq_xprt;
+ atomic_sub((rqstp->rq_reserved - space), &xprt->xpt_reserved);
rqstp->rq_reserved = space;

- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(xprt);
}
}

@@ -700,7 +701,7 @@ svc_udp_data_ready(struct sock *sk, int
svsk, sk, count,
test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
}
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
wake_up_interruptible(sk->sk_sleep);
@@ -717,7 +718,7 @@ svc_write_space(struct sock *sk)
if (svsk) {
dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
}

if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) {
@@ -993,7 +994,7 @@ svc_tcp_listen_data_ready(struct sock *s
if (sk->sk_state == TCP_LISTEN) {
if (svsk) {
set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
} else
printk("svc: socket %p: no user data\n", sk);
}
@@ -1017,7 +1018,7 @@ svc_tcp_state_change(struct sock *sk)
printk("svc: socket %p: no user data\n", sk);
else {
set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
}
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
wake_up_interruptible_all(sk->sk_sleep);
@@ -1032,7 +1033,7 @@ svc_tcp_data_ready(struct sock *sk, int
sk, sk->sk_user_data);
if (svsk) {
set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
}
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
wake_up_interruptible(sk->sk_sleep);
@@ -1085,7 +1086,7 @@ svc_tcp_accept(struct svc_xprt *xprt)
}

set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);

err = kernel_getpeername(newsock, sin, &slen);
if (err < 0) {
@@ -1321,7 +1322,7 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
(sent<0)?"got error":"sent only",
sent, xbufp->len);
set_bit(XPT_CLOSE, &rqstp->rq_sock->sk_xprt.xpt_flags);
- svc_sock_enqueue(rqstp->rq_sock);
+ svc_xprt_enqueue(rqstp->rq_xprt);
sent = -EAGAIN;
}
return sent;
@@ -1500,7 +1501,7 @@ svc_check_conn_limits(struct svc_serv *s
spin_unlock_bh(&serv->sv_lock);

if (svsk) {
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
svc_xprt_put(&svsk->sk_xprt);
}
}
@@ -1733,7 +1734,7 @@ svc_age_temp_sockets(unsigned long closu
svsk, get_seconds() - svsk->sk_lastrecv);

/* a thread will dequeue and close it soon */
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
svc_xprt_put(&svsk->sk_xprt);
}

@@ -2024,7 +2025,7 @@ static void svc_revisit(struct cache_def
list_add(&dr->handle.recent, &svsk->sk_deferred);
spin_unlock(&svsk->sk_lock);
set_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
- svc_sock_enqueue(svsk);
+ svc_xprt_enqueue(&svsk->sk_xprt);
svc_xprt_put(&svsk->sk_xprt);
}


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-10-01 19:28:24

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 22/35] svc: Remove sk_lastrecv


With the implementation of the new mark-and-sweep algorithm for shutting
down old connections, the sk_lastrecv field is no longer needed.
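The mark-and-sweep aging that replaces sk_lastrecv can be sketched as a
small userspace analogue (all names here — age_sweep, on_request, the flag
values — are hypothetical stand-ins for the kernel's XPT_OLD/XPT_CLOSE
handling, with locking and atomics omitted): each sweep closes transports
still marked from the previous pass and re-marks the rest; any received
request clears the mark, so no timestamp is required.

```c
#include <assert.h>

/* Illustrative flags, mirroring the roles of XPT_OLD and XPT_CLOSE. */
enum { XPT_OLD = 1 << 0, XPT_CLOSE = 1 << 1 };

struct xprt { unsigned flags; };

/* One aging pass: anything untouched since the last pass gets closed,
 * and everything is re-marked for the next interval. */
static void age_sweep(struct xprt *x, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (x[i].flags & XPT_OLD)	/* idle a full interval */
			x[i].flags |= XPT_CLOSE; /* schedule for close */
		x[i].flags |= XPT_OLD;		/* mark for next sweep */
	}
}

/* Request arrival rescues the transport from the next sweep. */
static void on_request(struct xprt *x)
{
	x->flags &= ~XPT_OLD;
}
```

This is why svc_recv in the patch only needs clear_bit(XPT_OLD, ...) where
it previously recorded get_seconds() into sk_lastrecv.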

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svcsock.h | 1 -
net/sunrpc/svcsock.c | 5 +----
2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 41c2dfa..406d003 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -33,7 +33,6 @@ struct svc_sock {
/* private TCP part */
int sk_reclen; /* length of record */
int sk_tcplen; /* current read length */
- time_t sk_lastrecv; /* time of last received request */

/* cache of various info for TCP sockets */
void *sk_info_authunix;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index b8d0d55..b1c843e 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1627,7 +1627,6 @@ svc_recv(struct svc_rqst *rqstp, long ti
svc_sock_release(rqstp);
return -EAGAIN;
}
- svsk->sk_lastrecv = get_seconds();
clear_bit(XPT_OLD, &svsk->sk_xprt.xpt_flags);

rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
@@ -1729,8 +1728,7 @@ svc_age_temp_sockets(unsigned long closu
list_del_init(le);
svsk = list_entry(le, struct svc_sock, sk_xprt.xpt_list);

- dprintk("queuing svsk %p for closing, %lu seconds old\n",
- svsk, get_seconds() - svsk->sk_lastrecv);
+ dprintk("queuing svsk %p for closing\n", svsk);

/* a thread will dequeue and close it soon */
svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1778,7 +1776,6 @@ static struct svc_sock *svc_setup_socket
svsk->sk_ostate = inet->sk_state_change;
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
- svsk->sk_lastrecv = get_seconds();
spin_lock_init(&svsk->sk_lock);
INIT_LIST_HEAD(&svsk->sk_deferred);



2007-10-01 19:28:24

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 24/35] svc: Make deferral processing xprt independent


This functionally trivial patch moves the transport-independent sk_deferred
list to the svc_xprt structure and updates the svc_deferred_req structure
to keep pointers directly to svc_xprt structures.
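The dequeue pattern this patch relocates onto svc_xprt can be sketched as a
simplified userspace analogue (hypothetical names; the xpt_lock spinlock
the real code holds is elided): a DEFERRED flag gives a lock-free fast
path, and the flag is re-set after a successful dequeue so later callers
re-check the list.

```c
#include <assert.h>
#include <stddef.h>

enum { XPT_DEFERRED = 1 << 0 };

struct deferred_req { struct deferred_req *next; };

struct xprt {
	unsigned flags;
	struct deferred_req *deferred;	/* stand-in for xpt_deferred */
};

static struct deferred_req *deferred_dequeue(struct xprt *x)
{
	struct deferred_req *dr;

	if (!(x->flags & XPT_DEFERRED))	/* fast path: nothing queued */
		return NULL;
	/* the real code takes xpt_lock around the section below */
	x->flags &= ~XPT_DEFERRED;
	dr = x->deferred;
	if (dr) {
		x->deferred = dr->next;
		x->flags |= XPT_DEFERRED; /* re-set; next call re-checks */
	}
	return dr;
}
```

The same clear/re-set dance appears in svc_deferred_dequeue in the diff
below; only the container it lives in changes from svc_sock to svc_xprt.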

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc.h | 2 +
include/linux/sunrpc/svc_xprt.h | 2 +
include/linux/sunrpc/svcsock.h | 3 --
net/sunrpc/svc_xprt.c | 1 +
net/sunrpc/svcsock.c | 58 +++++++++++++++++----------------------
5 files changed, 29 insertions(+), 37 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index cfb2652..40adc9d 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -320,7 +320,7 @@ static inline void svc_free_res_pages(st

struct svc_deferred_req {
u32 prot; /* protocol (UDP or TCP) */
- struct svc_sock *svsk;
+ struct svc_xprt *xprt;
struct sockaddr_storage addr; /* where reply must go */
size_t addrlen;
union svc_addr_u daddr; /* where reply must come from */
diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 9a31d6a..ba92909 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -60,6 +60,8 @@ #define XPT_CACHE_AUTH 12 /* cache auth
spinlock_t xpt_lock; /* protects sk_deferred
* and xpt_auth_cache */
void *xpt_auth_cache;/* auth cache */
+ struct list_head xpt_deferred; /* deferred requests that need
+ * to be revisted */
};

int svc_reg_xprt_class(struct svc_xprt_class *);
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index f2ed6a2..96a229e 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -20,9 +20,6 @@ struct svc_sock {
struct socket * sk_sock; /* berkeley socket layer */
struct sock * sk_sk; /* INET layer */

- struct list_head sk_deferred; /* deferred requests that need to
- * be revisted */
-
/* We keep the old state_change and data_ready CB's here */
void (*sk_ostate)(struct sock *);
void (*sk_odata)(struct sock *, int bytes);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 22577a4..2a27d5e 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -115,6 +115,7 @@ void svc_xprt_init(struct svc_xprt_class
xpt->xpt_server = serv;
INIT_LIST_HEAD(&xpt->xpt_list);
INIT_LIST_HEAD(&xpt->xpt_ready);
+ INIT_LIST_HEAD(&xpt->xpt_deferred);
mutex_init(&xpt->xpt_mutex);
spin_lock_init(&xpt->xpt_lock);
}
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 6d7f2f1..0732dc2 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -88,7 +88,7 @@ static void svc_close_xprt(struct svc_x
static void svc_sock_detach(struct svc_xprt *);
static void svc_sock_free(struct svc_xprt *);

-static struct svc_deferred_req *svc_deferred_dequeue(struct svc_sock *svsk);
+static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt);
static int svc_deferred_recv(struct svc_rqst *rqstp);
static struct cache_deferred_req *svc_defer(struct cache_req *req);
static struct svc_xprt *
@@ -780,11 +780,6 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
(serv->sv_nrthreads+3) * serv->sv_max_mesg,
(serv->sv_nrthreads+3) * serv->sv_max_mesg);

- if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
- svc_xprt_received(&svsk->sk_xprt);
- return svc_deferred_recv(rqstp);
- }
-
clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
skb = NULL;
err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
@@ -1154,11 +1149,6 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));

- if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
- svc_xprt_received(&svsk->sk_xprt);
- return svc_deferred_recv(rqstp);
- }
-
if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
/* sndbuf needs to have room for one request
* per thread, otherwise we can stall even when the
@@ -1618,7 +1608,12 @@ svc_recv(struct svc_rqst *rqstp, long ti
dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
rqstp, pool->sp_id, svsk,
atomic_read(&svsk->sk_xprt.xpt_ref.refcount));
- len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
+
+ if ((rqstp->rq_deferred = svc_deferred_dequeue(&svsk->sk_xprt))) {
+ svc_xprt_received(&svsk->sk_xprt);
+ len = svc_deferred_recv(rqstp);
+ } else
+ len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
dprintk("svc: got len=%d\n", len);
}

@@ -1777,7 +1772,6 @@ static struct svc_sock *svc_setup_socket
svsk->sk_ostate = inet->sk_state_change;
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
- INIT_LIST_HEAD(&svsk->sk_deferred);

/* Initialize the socket */
if (sock->type == SOCK_DGRAM)
@@ -2004,22 +1998,21 @@ void svc_close_all(struct list_head *xpr
static void svc_revisit(struct cache_deferred_req *dreq, int too_many)
{
struct svc_deferred_req *dr = container_of(dreq, struct svc_deferred_req, handle);
- struct svc_sock *svsk;
+ struct svc_xprt *xprt = dr->xprt;

if (too_many) {
- svc_xprt_put(&dr->svsk->sk_xprt);
+ svc_xprt_put(xprt);
kfree(dr);
return;
}
dprintk("revisit queued\n");
- svsk = dr->svsk;
- dr->svsk = NULL;
- spin_lock(&svsk->sk_xprt.xpt_lock);
- list_add(&dr->handle.recent, &svsk->sk_deferred);
- spin_unlock(&svsk->sk_xprt.xpt_lock);
- set_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
- svc_xprt_enqueue(&svsk->sk_xprt);
- svc_xprt_put(&svsk->sk_xprt);
+ dr->xprt = NULL;
+ spin_lock(&xprt->xpt_lock);
+ list_add(&dr->handle.recent, &xprt->xpt_deferred);
+ spin_unlock(&xprt->xpt_lock);
+ set_bit(XPT_DEFERRED, &xprt->xpt_flags);
+ svc_xprt_enqueue(xprt);
+ svc_xprt_put(xprt);
}

static struct cache_deferred_req *
@@ -2050,7 +2043,7 @@ svc_defer(struct cache_req *req)
memcpy(dr->args, rqstp->rq_arg.head[0].iov_base-skip, dr->argslen<<2);
}
svc_xprt_get(rqstp->rq_xprt);
- dr->svsk = rqstp->rq_sock;
+ dr->xprt = rqstp->rq_xprt;

dr->handle.revisit = svc_revisit;
return &dr->handle;
@@ -2076,22 +2069,21 @@ static int svc_deferred_recv(struct svc_
}


-static struct svc_deferred_req *svc_deferred_dequeue(struct svc_sock *svsk)
+static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt)
{
struct svc_deferred_req *dr = NULL;

- if (!test_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags))
+ if (!test_bit(XPT_DEFERRED, &xprt->xpt_flags))
return NULL;
- spin_lock(&svsk->sk_xprt.xpt_lock);
- clear_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
- if (!list_empty(&svsk->sk_deferred)) {
- dr = list_entry(svsk->sk_deferred.next,
+ spin_lock(&xprt->xpt_lock);
+ clear_bit(XPT_DEFERRED, &xprt->xpt_flags);
+ if (!list_empty(&xprt->xpt_deferred)) {
+ dr = list_entry(xprt->xpt_deferred.next,
struct svc_deferred_req,
handle.recent);
list_del_init(&dr->handle.recent);
- set_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
+ set_bit(XPT_DEFERRED, &xprt->xpt_flags);
}
- spin_unlock(&svsk->sk_xprt.xpt_lock);
+ spin_unlock(&xprt->xpt_lock);
return dr;
}
-


2007-10-01 19:28:19

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 23/35] svc: Move the authinfo cache to svc_xprt.


Move the authinfo cache to svc_xprt. This allows both the TCP and RDMA
transports to share this logic. A flag bit is used to determine if
auth information is to be cached or not. Previously, this code looked
at the transport protocol.

I've also changed the spin_lock/unlock logic so that a lock is not taken for
transports that are not caching auth info.
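The flag-gated locking described above can be sketched in userspace C
(hypothetical names; the xpt_lock spinlock the real code holds is reduced
to a comment): transports that never set the cache-auth flag skip the lock
entirely, and caching transports keep only the first entry offered.

```c
#include <assert.h>
#include <stddef.h>

enum { XPT_CACHE_AUTH = 1 << 0 };

struct xprt {
	unsigned flags;
	void *auth_cache;	/* stand-in for xpt_auth_cache */
};

/* Returns 1 if the cache kept the reference to ipm, 0 otherwise. */
static int auth_cache_put(struct xprt *x, void *ipm)
{
	int kept = 0;

	if (!(x->flags & XPT_CACHE_AUTH))
		return 0;	/* non-caching transport: no lock taken */
	/* the real code holds xpt_lock across this check-and-store */
	if (x->auth_cache == NULL) {
		x->auth_cache = ipm;	/* newly cached, keep reference */
		kept = 1;
	}
	return kept;
}
```

In the patch itself, svc_tcp_init sets XPT_CACHE_AUTH and svc_udp_init
clears it, replacing the old check of sk_sock->type == SOCK_STREAM.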

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 4 +++
include/linux/sunrpc/svcsock.h | 5 ----
net/sunrpc/svc_xprt.c | 4 +++
net/sunrpc/svcauth_unix.c | 54 +++++++++++++++++++++------------------
net/sunrpc/svcsock.c | 20 +++++++-------
5 files changed, 46 insertions(+), 41 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 103aa36..9a31d6a 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -51,11 +51,15 @@ #define XPT_DEFERRED 8 /* deferred requ
#define XPT_OLD 9 /* used for xprt aging mark+sweep */
#define XPT_DETACHED 10 /* detached from tempsocks list */
#define XPT_LISTENER 11 /* listening endpoint */
+#define XPT_CACHE_AUTH 12 /* cache auth info */

struct svc_pool *xpt_pool; /* current pool iff queued */
struct svc_serv *xpt_server; /* service for transport */
atomic_t xpt_reserved; /* space on outq that is rsvd */
struct mutex xpt_mutex; /* to serialize sending data */
+ spinlock_t xpt_lock; /* protects sk_deferred
+ * and xpt_auth_cache */
+ void *xpt_auth_cache;/* auth cache */
};

int svc_reg_xprt_class(struct svc_xprt_class *);
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 406d003..f2ed6a2 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -20,8 +20,6 @@ struct svc_sock {
struct socket * sk_sock; /* berkeley socket layer */
struct sock * sk_sk; /* INET layer */

- spinlock_t sk_lock; /* protects sk_deferred and
- * sk_info_authunix */
struct list_head sk_deferred; /* deferred requests that need to
* be revisted */

@@ -34,9 +32,6 @@ struct svc_sock {
int sk_reclen; /* length of record */
int sk_tcplen; /* current read length */

- /* cache of various info for TCP sockets */
- void *sk_info_authunix;
-
struct sockaddr_storage sk_local; /* local address */
struct sockaddr_storage sk_remote; /* remote peer's address */
int sk_remotelen; /* length of address */
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 2a7e214..22577a4 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -89,6 +89,9 @@ static inline void svc_xprt_free(struct
struct module *owner = xprt->xpt_class->xcl_owner;
BUG_ON(atomic_read(&kref->refcount));
xprt->xpt_ops.xpo_free(xprt);
+ if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags)
+ && xprt->xpt_auth_cache != NULL)
+ svcauth_unix_info_release(xprt->xpt_auth_cache);
module_put(owner);
}

@@ -113,6 +116,7 @@ void svc_xprt_init(struct svc_xprt_class
INIT_LIST_HEAD(&xpt->xpt_list);
INIT_LIST_HEAD(&xpt->xpt_ready);
mutex_init(&xpt->xpt_mutex);
+ spin_lock_init(&xpt->xpt_lock);
}
EXPORT_SYMBOL_GPL(svc_xprt_init);

diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 4114794..6815157 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -384,41 +384,45 @@ void svcauth_unix_purge(void)
static inline struct ip_map *
ip_map_cached_get(struct svc_rqst *rqstp)
{
- struct ip_map *ipm;
- struct svc_sock *svsk = rqstp->rq_sock;
- spin_lock(&svsk->sk_lock);
- ipm = svsk->sk_info_authunix;
- if (ipm != NULL) {
- if (!cache_valid(&ipm->h)) {
- /*
- * The entry has been invalidated since it was
- * remembered, e.g. by a second mount from the
- * same IP address.
- */
- svsk->sk_info_authunix = NULL;
- spin_unlock(&svsk->sk_lock);
- cache_put(&ipm->h, &ip_map_cache);
- return NULL;
+ struct ip_map *ipm = NULL;
+ struct svc_xprt *xprt = rqstp->rq_xprt;
+
+ if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags)) {
+ spin_lock(&xprt->xpt_lock);
+ ipm = xprt->xpt_auth_cache;
+ if (ipm != NULL) {
+ if (!cache_valid(&ipm->h)) {
+ /*
+ * The entry has been invalidated since it was
+ * remembered, e.g. by a second mount from the
+ * same IP address.
+ */
+ xprt->xpt_auth_cache = NULL;
+ spin_unlock(&xprt->xpt_lock);
+ cache_put(&ipm->h, &ip_map_cache);
+ return NULL;
+ }
+ cache_get(&ipm->h);
}
- cache_get(&ipm->h);
+ spin_unlock(&xprt->xpt_lock);
}
- spin_unlock(&svsk->sk_lock);
return ipm;
}

static inline void
ip_map_cached_put(struct svc_rqst *rqstp, struct ip_map *ipm)
{
- struct svc_sock *svsk = rqstp->rq_sock;
+ struct svc_xprt *xprt = rqstp->rq_xprt;

- spin_lock(&svsk->sk_lock);
- if (svsk->sk_sock->type == SOCK_STREAM &&
- svsk->sk_info_authunix == NULL) {
- /* newly cached, keep the reference */
- svsk->sk_info_authunix = ipm;
- ipm = NULL;
+ if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags)) {
+ spin_lock(&xprt->xpt_lock);
+ if (xprt->xpt_auth_cache == NULL) {
+ /* newly cached, keep the reference */
+ xprt->xpt_auth_cache = ipm;
+ ipm = NULL;
+ }
+ spin_unlock(&xprt->xpt_lock);
}
- spin_unlock(&svsk->sk_lock);
if (ipm)
cache_put(&ipm->h, &ip_map_cache);
}
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index b1c843e..6d7f2f1 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -108,16 +108,16 @@ static struct lock_class_key svc_slock_k
static inline void svc_reclassify_socket(struct socket *sock)
{
struct sock *sk = sock->sk;
- BUG_ON(sk->sk_lock.owner != NULL);
+ BUG_ON(sk->sk_xprt.xpt_lock.owner != NULL);
switch (sk->sk_family) {
case AF_INET:
sock_lock_init_class_and_name(sk, "slock-AF_INET-NFSD",
- &svc_slock_key[0], "sk_lock-AF_INET-NFSD", &svc_key[0]);
+ &svc_slock_key[0], "sk_xprt.xpt_lock-AF_INET-NFSD", &svc_key[0]);
break;

case AF_INET6:
sock_lock_init_class_and_name(sk, "slock-AF_INET6-NFSD",
- &svc_slock_key[1], "sk_lock-AF_INET6-NFSD", &svc_key[1]);
+ &svc_slock_key[1], "sk_xprt.xpt_lock-AF_INET6-NFSD", &svc_key[1]);
break;

default:
@@ -947,6 +947,7 @@ svc_udp_init(struct svc_sock *svsk, stru
mm_segment_t oldfs;

svc_xprt_init(&svc_udp_class, &svsk->sk_xprt, serv);
+ clear_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
svsk->sk_sk->sk_write_space = svc_write_space;

@@ -1402,7 +1403,7 @@ svc_tcp_init(struct svc_sock *svsk, stru
struct tcp_sock *tp = tcp_sk(sk);

svc_xprt_init(&svc_tcp_class, &svsk->sk_xprt, serv);
-
+ set_bit(XPT_CACHE_AUTH, &svsk->sk_xprt.xpt_flags);
if (sk->sk_state == TCP_LISTEN) {
dprintk("setting up TCP socket for listening\n");
sk->sk_data_ready = svc_tcp_listen_data_ready;
@@ -1776,7 +1777,6 @@ static struct svc_sock *svc_setup_socket
svsk->sk_ostate = inet->sk_state_change;
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
- spin_lock_init(&svsk->sk_lock);
INIT_LIST_HEAD(&svsk->sk_deferred);

/* Initialize the socket */
@@ -1924,8 +1924,6 @@ svc_sock_free(struct svc_xprt *xprt)
struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
dprintk("svc: svc_sock_free(%p)\n", svsk);

- if (svsk->sk_info_authunix != NULL)
- svcauth_unix_info_release(svsk->sk_info_authunix);
if (svsk->sk_sock->file)
sockfd_put(svsk->sk_sock);
else
@@ -2016,9 +2014,9 @@ static void svc_revisit(struct cache_def
dprintk("revisit queued\n");
svsk = dr->svsk;
dr->svsk = NULL;
- spin_lock(&svsk->sk_lock);
+ spin_lock(&svsk->sk_xprt.xpt_lock);
list_add(&dr->handle.recent, &svsk->sk_deferred);
- spin_unlock(&svsk->sk_lock);
+ spin_unlock(&svsk->sk_xprt.xpt_lock);
set_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
svc_xprt_enqueue(&svsk->sk_xprt);
svc_xprt_put(&svsk->sk_xprt);
@@ -2084,7 +2082,7 @@ static struct svc_deferred_req *svc_defe

if (!test_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags))
return NULL;
- spin_lock(&svsk->sk_lock);
+ spin_lock(&svsk->sk_xprt.xpt_lock);
clear_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
if (!list_empty(&svsk->sk_deferred)) {
dr = list_entry(svsk->sk_deferred.next,
@@ -2093,7 +2091,7 @@ static struct svc_deferred_req *svc_defe
list_del_init(&dr->handle.recent);
set_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
}
- spin_unlock(&svsk->sk_lock);
+ spin_unlock(&svsk->sk_xprt.xpt_lock);
return dr;
}



2007-10-01 19:28:15

by Tom Tucker

[permalink] [raw]
Subject: [RFC,PATCH 20/35] svc: Make svc_send transport neutral


Move the sk_mutex field to the transport independent svc_xprt structure.
Now all the fields that svc_send touches are transport neutral. Change the
svc_send function to use the transport independent svc_xprt directly instead
of the transport dependent svc_sock structure.
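The resulting send path can be sketched as a userspace analogue
(hypothetical names; a pthread mutex stands in for xpt_mutex, and a
function pointer stands in for the xpo_sendto method): the mutex
serializes senders, and a transport already marked dead is refused before
any data is written.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

enum { XPT_DEAD = 1 << 0 };

struct xprt {
	unsigned flags;
	pthread_mutex_t mutex;		/* stand-in for xpt_mutex */
	int (*sendto)(struct xprt *);	/* stand-in for xpo_sendto */
};

static int xprt_send(struct xprt *x)
{
	int len;

	pthread_mutex_lock(&x->mutex);	/* one sender at a time */
	if (x->flags & XPT_DEAD)
		len = -ENOTCONN;	/* closed under us: don't send */
	else
		len = x->sendto(x);
	pthread_mutex_unlock(&x->mutex);
	return len;
}

/* Demo transport method: pretend 42 bytes were sent. */
static int demo_sendto(struct xprt *x)
{
	(void)x;
	return 42;
}
```

This mirrors the mutex_lock/XPT_DEAD/xpo_sendto sequence that svc_send
takes in the diff below, now with no svc_sock fields involved.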

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 1 +
include/linux/sunrpc/svcsock.h | 1 -
net/sunrpc/svc_xprt.c | 1 +
net/sunrpc/svcsock.c | 17 ++++++++---------
4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index e8be38f..c16a2c6 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -55,6 +55,7 @@ #define XPT_LISTENER 11 /* listening en
struct svc_pool *xpt_pool; /* current pool iff queued */
struct svc_serv *xpt_server; /* service for transport */
atomic_t xpt_reserved; /* space on outq that is rsvd */
+ struct mutex xpt_mutex; /* to serialize sending data */
};

int svc_reg_xprt_class(struct svc_xprt_class *);
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index ba41f11..41c2dfa 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -24,7 +24,6 @@ struct svc_sock {
* sk_info_authunix */
struct list_head sk_deferred; /* deferred requests that need to
* be revisted */
- struct mutex sk_mutex; /* to serialize sending data */

/* We keep the old state_change and data_ready CB's here */
void (*sk_ostate)(struct sock *);
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 5195131..2a7e214 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -112,6 +112,7 @@ void svc_xprt_init(struct svc_xprt_class
xpt->xpt_server = serv;
INIT_LIST_HEAD(&xpt->xpt_list);
INIT_LIST_HEAD(&xpt->xpt_ready);
+ mutex_init(&xpt->xpt_mutex);
}
EXPORT_SYMBOL_GPL(svc_xprt_init);

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 7cf15c6..eee64ce 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1655,12 +1655,12 @@ svc_drop(struct svc_rqst *rqstp)
int
svc_send(struct svc_rqst *rqstp)
{
- struct svc_sock *svsk;
+ struct svc_xprt *xprt;
int len;
struct xdr_buf *xb;

- if ((svsk = rqstp->rq_sock) == NULL) {
- printk(KERN_WARNING "NULL socket pointer in %s:%d\n",
+ if ((xprt = rqstp->rq_xprt) == NULL) {
+ printk(KERN_WARNING "NULL transport pointer in %s:%d\n",
__FILE__, __LINE__);
return -EFAULT;
}
@@ -1674,13 +1674,13 @@ svc_send(struct svc_rqst *rqstp)
xb->page_len +
xb->tail[0].iov_len;

- /* Grab svsk->sk_mutex to serialize outgoing data. */
- mutex_lock(&svsk->sk_mutex);
- if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags))
+ /* Grab mutex to serialize outgoing data. */
+ mutex_lock(&xprt->xpt_mutex);
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags))
len = -ENOTCONN;
else
- len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
- mutex_unlock(&svsk->sk_mutex);
+ len = xprt->xpt_ops.xpo_sendto(rqstp);
+ mutex_unlock(&xprt->xpt_mutex);
svc_sock_release(rqstp);

if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
@@ -1782,7 +1782,6 @@ static struct svc_sock *svc_setup_socket
svsk->sk_lastrecv = get_seconds();
spin_lock_init(&svsk->sk_lock);
INIT_LIST_HEAD(&svsk->sk_deferred);
- mutex_init(&svsk->sk_mutex);

/* Initialize the socket */
if (sock->type == SOCK_DGRAM)


2007-10-01 19:28:07

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 16/35] svc: Move sk_server and sk_pool to svc_xprt


This is another incremental change that moves transport-independent
fields from svc_sock to the svc_xprt structure. The changes
should be functionally neutral.

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 6 ++++
include/linux/sunrpc/svcsock.h | 2 -
net/sunrpc/svc_xprt.c | 4 ++-
net/sunrpc/svcsock.c | 55 +++++++++++++++++++--------------------
4 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index 852a58a..da7f827 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -49,11 +49,15 @@ #define XPT_DEFERRED 8 /* deferred requ
#define XPT_OLD 9 /* used for xprt aging mark+sweep */
#define XPT_DETACHED 10 /* detached from tempsocks list */
#define XPT_LISTENER 11 /* listening endpoint */
+
+ struct svc_pool *xpt_pool; /* current pool iff queued */
+ struct svc_serv *xpt_server; /* service for transport */
};

int svc_reg_xprt_class(struct svc_xprt_class *);
int svc_unreg_xprt_class(struct svc_xprt_class *);
-void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *);
+void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *,
+ struct svc_serv *);
int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);
void svc_xprt_put(struct svc_xprt *xprt);

diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index b8a8496..92d4cc9 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -22,8 +22,6 @@ struct svc_sock {
struct socket * sk_sock; /* berkeley socket layer */
struct sock * sk_sk; /* INET layer */

- struct svc_pool * sk_pool; /* current pool iff queued */
- struct svc_serv * sk_server; /* service for this socket */
atomic_t sk_reserved; /* space on outq that is reserved */

spinlock_t sk_lock; /* protects sk_deferred and
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 05ccfa6..a6db507 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -102,12 +102,14 @@ EXPORT_SYMBOL_GPL(svc_xprt_put);
* Called by transport drivers to initialize the transport independent
* portion of the transport instance.
*/
-void svc_xprt_init(struct svc_xprt_class *xcl, struct svc_xprt *xpt)
+void svc_xprt_init(struct svc_xprt_class *xcl, struct svc_xprt *xpt,
+ struct svc_serv *serv)
{
xpt->xpt_class = xcl;
xpt->xpt_ops = *xcl->xcl_ops;
xpt->xpt_max_payload = xcl->xcl_max_payload;
kref_init(&xpt->xpt_ref);
+ xpt->xpt_server = serv;
}
EXPORT_SYMBOL_GPL(svc_xprt_init);

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b82780..80054e4 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -231,7 +231,7 @@ svc_sock_wspace(struct svc_sock *svsk)
static void
svc_sock_enqueue(struct svc_sock *svsk)
{
- struct svc_serv *serv = svsk->sk_server;
+ struct svc_serv *serv = svsk->sk_xprt.xpt_server;
struct svc_pool *pool;
struct svc_rqst *rqstp;
int cpu;
@@ -243,7 +243,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
return;

cpu = get_cpu();
- pool = svc_pool_for_cpu(svsk->sk_server, cpu);
+ pool = svc_pool_for_cpu(svsk->sk_xprt.xpt_server, cpu);
put_cpu();

spin_lock_bh(&pool->sp_lock);
@@ -269,8 +269,8 @@ svc_sock_enqueue(struct svc_sock *svsk)
dprintk("svc: socket %p busy, not enqueued\n", svsk->sk_sk);
goto out_unlock;
}
- BUG_ON(svsk->sk_pool != NULL);
- svsk->sk_pool = pool;
+ BUG_ON(svsk->sk_xprt.xpt_pool != NULL);
+ svsk->sk_xprt.xpt_pool = pool;

/* Handle pending connection */
if (test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags))
@@ -284,7 +284,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
if (!svsk->sk_xprt.xpt_ops.xpo_has_wspace(&svsk->sk_xprt)) {
/* Don't enqueue while not enough space for reply */
dprintk("svc: no write space, socket %p not enqueued\n", svsk);
- svsk->sk_pool = NULL;
+ svsk->sk_xprt.xpt_pool = NULL;
clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
goto out_unlock;
}
@@ -305,12 +305,12 @@ svc_sock_enqueue(struct svc_sock *svsk)
svc_xprt_get(&svsk->sk_xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
atomic_add(rqstp->rq_reserved, &svsk->sk_reserved);
- BUG_ON(svsk->sk_pool != pool);
+ BUG_ON(svsk->sk_xprt.xpt_pool != pool);
wake_up(&rqstp->rq_wait);
} else {
dprintk("svc: socket %p put into queue\n", svsk->sk_sk);
list_add_tail(&svsk->sk_ready, &pool->sp_sockets);
- BUG_ON(svsk->sk_pool != pool);
+ BUG_ON(svsk->sk_xprt.xpt_pool != pool);
}

out_unlock:
@@ -347,7 +347,7 @@ svc_sock_dequeue(struct svc_pool *pool)
static inline void
svc_sock_received(struct svc_sock *svsk)
{
- svsk->sk_pool = NULL;
+ svsk->sk_xprt.xpt_pool = NULL;
clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
svc_sock_enqueue(svsk);
}
@@ -751,7 +751,7 @@ static int
svc_udp_recvfrom(struct svc_rqst *rqstp)
{
struct svc_sock *svsk = rqstp->rq_sock;
- struct svc_serv *serv = svsk->sk_server;
+ struct svc_serv *serv = svsk->sk_xprt.xpt_server;
struct sk_buff *skb;
union {
struct cmsghdr hdr;
@@ -891,7 +891,7 @@ static int
svc_udp_has_wspace(struct svc_xprt *xprt)
{
struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
- struct svc_serv *serv = svsk->sk_server;
+ struct svc_serv *serv = svsk->sk_xprt.xpt_server;
int required;

/*
@@ -940,12 +940,12 @@ static struct svc_xprt_class svc_udp_cla
};

static void
-svc_udp_init(struct svc_sock *svsk)
+svc_udp_init(struct svc_sock *svsk, struct svc_serv *serv)
{
int one = 1;
mm_segment_t oldfs;

- svc_xprt_init(&svc_udp_class, &svsk->sk_xprt);
+ svc_xprt_init(&svc_udp_class, &svsk->sk_xprt, serv);
svsk->sk_sk->sk_data_ready = svc_udp_data_ready;
svsk->sk_sk->sk_write_space = svc_write_space;

@@ -954,8 +954,8 @@ svc_udp_init(struct svc_sock *svsk)
* svc_udp_recvfrom will re-adjust if necessary
*/
svc_sock_setbufsize(svsk->sk_sock,
- 3 * svsk->sk_server->sv_max_mesg,
- 3 * svsk->sk_server->sv_max_mesg);
+ 3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
+ 3 * svsk->sk_xprt.xpt_server->sv_max_mesg);

set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); /* might have come in before data_ready set up */
set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
@@ -1061,7 +1061,7 @@ svc_tcp_accept(struct svc_xprt *xprt)
struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
struct sockaddr_storage addr;
struct sockaddr *sin = (struct sockaddr *) &addr;
- struct svc_serv *serv = svsk->sk_server;
+ struct svc_serv *serv = svsk->sk_xprt.xpt_server;
struct socket *sock = svsk->sk_sock;
struct socket *newsock;
struct svc_sock *newsvsk;
@@ -1144,7 +1144,7 @@ static int
svc_tcp_recvfrom(struct svc_rqst *rqstp)
{
struct svc_sock *svsk = rqstp->rq_sock;
- struct svc_serv *serv = svsk->sk_server;
+ struct svc_serv *serv = svsk->sk_xprt.xpt_server;
int len;
struct kvec *vec;
int pnum, vlen;
@@ -1287,7 +1287,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
svc_sock_received(svsk);
} else {
printk(KERN_NOTICE "%s: recvfrom returned errno %d\n",
- svsk->sk_server->sv_name, -len);
+ svsk->sk_xprt.xpt_server->sv_name, -len);
goto err_delete;
}

@@ -1317,7 +1317,7 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
sent = svc_sendto(rqstp, &rqstp->rq_res);
if (sent != xbufp->len) {
printk(KERN_NOTICE "rpc-srv/tcp: %s: %s %d when sending %d bytes - shutting down socket\n",
- rqstp->rq_sock->sk_server->sv_name,
+ rqstp->rq_sock->sk_xprt.xpt_server->sv_name,
(sent<0)?"got error":"sent only",
sent, xbufp->len);
set_bit(XPT_CLOSE, &rqstp->rq_sock->sk_xprt.xpt_flags);
@@ -1343,7 +1343,7 @@ static int
svc_tcp_has_wspace(struct svc_xprt *xprt)
{
struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
- struct svc_serv *serv = svsk->sk_server;
+ struct svc_serv *serv = svsk->sk_xprt.xpt_server;
int required;

/*
@@ -1397,12 +1397,12 @@ void svc_cleanup_xprt_sock(void)
}

static void
-svc_tcp_init(struct svc_sock *svsk)
+svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
{
struct sock *sk = svsk->sk_sk;
struct tcp_sock *tp = tcp_sk(sk);

- svc_xprt_init(&svc_tcp_class, &svsk->sk_xprt);
+ svc_xprt_init(&svc_tcp_class, &svsk->sk_xprt, serv);

if (sk->sk_state == TCP_LISTEN) {
dprintk("setting up TCP socket for listening\n");
@@ -1424,8 +1424,8 @@ svc_tcp_init(struct svc_sock *svsk)
* svc_tcp_recvfrom will re-adjust if necessary
*/
svc_sock_setbufsize(svsk->sk_sock,
- 3 * svsk->sk_server->sv_max_mesg,
- 3 * svsk->sk_server->sv_max_mesg);
+ 3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
+ 3 * svsk->sk_xprt.xpt_server->sv_max_mesg);

set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
@@ -1610,7 +1610,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
* listener holds a reference too
*/
__module_get(newxpt->xpt_class->xcl_owner);
- svc_check_conn_limits(svsk->sk_server);
+ svc_check_conn_limits(svsk->sk_xprt.xpt_server);
}
svc_sock_received(svsk);
} else {
@@ -1778,7 +1778,6 @@ static struct svc_sock *svc_setup_socket
svsk->sk_ostate = inet->sk_state_change;
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
- svsk->sk_server = serv;
svsk->sk_lastrecv = get_seconds();
spin_lock_init(&svsk->sk_lock);
INIT_LIST_HEAD(&svsk->sk_deferred);
@@ -1787,9 +1786,9 @@ static struct svc_sock *svc_setup_socket

/* Initialize the socket */
if (sock->type == SOCK_DGRAM)
- svc_udp_init(svsk);
+ svc_udp_init(svsk, serv);
else
- svc_tcp_init(svsk);
+ svc_tcp_init(svsk, serv);

spin_lock_bh(&serv->sv_lock);
if (is_temporary) {
@@ -1950,7 +1949,7 @@ svc_delete_socket(struct svc_sock *svsk)

dprintk("svc: svc_delete_socket(%p)\n", svsk);

- serv = svsk->sk_server;
+ serv = svsk->sk_xprt.xpt_server;
sk = svsk->sk_sk;

svsk->sk_xprt.xpt_ops.xpo_detach(&svsk->sk_xprt);

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2007-10-01 19:28:19

by Tom Tucker

Subject: [RFC, PATCH 21/35] svc: Change svc_sock_received to svc_xprt_received and export it


All fields touched by svc_sock_received are now transport independent.
Change it to use svc_xprt directly. This function is called from
transport-dependent code, so export it.
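As a sketch of what the export buys a transport provider: once svc_xprt_received is visible outside svcsock.c, any transport's receive path can complete the busy/enqueue handshake itself. The following user-space model (stdatomic stands in for the kernel bitops; the model_* names and enqueue_count field are invented for illustration, not taken from the patch) shows the pattern:

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space model of the svc_xprt_received() handshake.  XPT_BUSY and
 * XPT_DATA match the patch; everything else is illustrative. */
#define XPT_BUSY 0 /* enqueued/receiving */
#define XPT_DATA 3 /* data pending */

struct model_xprt {
	atomic_ulong xpt_flags;
	int enqueue_count; /* times the transport was queued for a thread */
};

/* Model of svc_xprt_enqueue(): queue only if work is pending and the
 * transport is not already marked busy. */
static void model_enqueue(struct model_xprt *xprt)
{
	if (!(atomic_load(&xprt->xpt_flags) & (1UL << XPT_DATA)))
		return; /* nothing to do */
	if (atomic_fetch_or(&xprt->xpt_flags, 1UL << XPT_BUSY)
	    & (1UL << XPT_BUSY))
		return; /* already enqueued/receiving */
	xprt->enqueue_count++;
}

/* Model of svc_xprt_received(): a receive attempt finished, so drop
 * XPT_BUSY and re-enqueue in case more data arrived meanwhile. */
static void model_received(struct model_xprt *xprt)
{
	atomic_fetch_and(&xprt->xpt_flags, ~(1UL << XPT_BUSY));
	model_enqueue(xprt);
}
```

The property preserved from svc_sock_received is that clearing XPT_BUSY and re-enqueueing happen together, so data that arrived while a thread held the transport is not stranded.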

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 2 +-
net/sunrpc/svcsock.c | 37 ++++++++++++++++++-------------------
2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index c16a2c6..103aa36 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -63,8 +63,8 @@ int svc_unreg_xprt_class(struct svc_xprt
void svc_xprt_init(struct svc_xprt_class *, struct svc_xprt *,
struct svc_serv *);
int svc_create_xprt(struct svc_serv *, char *, unsigned short, int);
+void svc_xprt_received(struct svc_xprt *);
void svc_xprt_put(struct svc_xprt *xprt);
-
static inline void svc_xprt_get(struct svc_xprt *xprt)
{
kref_get(&xprt->xpt_ref);
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index eee64ce..b8d0d55 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -345,14 +345,14 @@ svc_sock_dequeue(struct svc_pool *pool)
* Note: XPT_DATA only gets cleared when a read-attempt finds
* no (or insufficient) data.
*/
-static inline void
-svc_sock_received(struct svc_sock *svsk)
+void
+svc_xprt_received(struct svc_xprt *xprt)
{
- svsk->sk_xprt.xpt_pool = NULL;
- clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
- svc_xprt_enqueue(&svsk->sk_xprt);
+ xprt->xpt_pool = NULL;
+ clear_bit(XPT_BUSY, &xprt->xpt_flags);
+ svc_xprt_enqueue(xprt);
}
-
+EXPORT_SYMBOL_GPL(svc_xprt_received);

/**
* svc_reserve - change the space reserved for the reply to a request.
@@ -781,7 +781,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
(serv->sv_nrthreads+3) * serv->sv_max_mesg);

if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
return svc_deferred_recv(rqstp);
}

@@ -798,7 +798,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
dprintk("svc: recvfrom returned error %d\n", -err);
set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
}
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
return -EAGAIN;
}
rqstp->rq_addrlen = sizeof(rqstp->rq_addr);
@@ -813,7 +813,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
/*
* Maybe more packets - kick another thread ASAP.
*/
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);

len = skb->len - sizeof(struct udphdr);
rqstp->rq_arg.len = len;
@@ -1126,8 +1126,6 @@ svc_tcp_accept(struct svc_xprt *xprt)
}
memcpy(&newsvsk->sk_local, sin, slen);

- svc_sock_received(newsvsk);
-
if (serv->sv_stats)
serv->sv_stats->nettcpconn++;

@@ -1156,7 +1154,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));

if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
return svc_deferred_recv(rqstp);
}

@@ -1196,7 +1194,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
if (len < want) {
dprintk("svc: short recvfrom while reading record length (%d of %lu)\n",
len, want);
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
return -EAGAIN; /* record header not complete */
}

@@ -1232,7 +1230,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
if (len < svsk->sk_reclen) {
dprintk("svc: incomplete TCP record (%d of %d)\n",
len, svsk->sk_reclen);
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
return -EAGAIN; /* record not complete */
}
len = svsk->sk_reclen;
@@ -1272,7 +1270,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
svsk->sk_reclen = 0;
svsk->sk_tcplen = 0;

- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
if (serv->sv_stats)
serv->sv_stats->nettcpcnt++;

@@ -1285,7 +1283,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
error:
if (len == -EAGAIN) {
dprintk("RPC: TCP recvfrom got EAGAIN\n");
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
} else {
printk(KERN_NOTICE "%s: recvfrom returned errno %d\n",
svsk->sk_xprt.xpt_server->sv_name, -len);
@@ -1606,6 +1604,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
struct svc_xprt *newxpt;
newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
if (newxpt) {
+ svc_xprt_received(newxpt);
/*
* We know this module_get will succeed because the
* listener holds a reference too
@@ -1613,7 +1612,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
__module_get(newxpt->xpt_class->xcl_owner);
svc_check_conn_limits(svsk->sk_xprt.xpt_server);
}
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
} else {
dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
rqstp, pool->sp_id, svsk,
@@ -1834,7 +1833,7 @@ int svc_addsock(struct svc_serv *serv,
else {
svsk = svc_setup_socket(serv, so, &err, SVC_SOCK_DEFAULTS);
if (svsk) {
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
err = 0;
}
}
@@ -1891,7 +1890,7 @@ svc_create_socket(struct svc_serv *serv,
if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
if (protocol == IPPROTO_TCP)
set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
- svc_sock_received(svsk);
+ svc_xprt_received(&svsk->sk_xprt);
return (struct svc_xprt *)svsk;
}


-------------------------------------------------------------------------

2007-10-01 19:28:09

by Tom Tucker

Subject: [RFC, PATCH 15/35] svc: Move sk_flags to the svc_xprt structure


This functionally trivial change moves the transport-independent sk_flags
field into the svc_xprt structure.
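The XPT_* values are bit numbers within the new xpt_flags word, manipulated with the kernel's atomic bitops. A minimal user-space model of the two idioms this patch leans on (test_and_set_bit as the XPT_BUSY enqueue guard, test_and_clear_bit as the XPT_CHNGBUF one-shot) might look like this; the *_model helpers are sketches, not the kernel implementations:

```c
#include <assert.h>
#include <stdatomic.h>

/* User-space stand-ins for the kernel's test_and_set_bit() and
 * test_and_clear_bit().  The XPT_* bit numbers match the patch. */
#define XPT_BUSY    0 /* enqueued/receiving */
#define XPT_CHNGBUF 7 /* need to change snd/rcv buf sizes */

/* Atomically set bit nr; return its previous value. */
static int test_and_set_bit_model(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Atomically clear bit nr; return its previous value. */
static int test_and_clear_bit_model(int nr, atomic_ulong *addr)
{
	unsigned long mask = 1UL << nr;

	return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}
```

svc_sock_enqueue uses the first pattern so that only one thread can own a transport at a time; the recvfrom paths use the second so a requested buffer resize is consumed exactly once.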

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 12 +++
include/linux/sunrpc/svcsock.h | 13 ---
net/sunrpc/svcsock.c | 148 ++++++++++++++++++++-------------------
3 files changed, 87 insertions(+), 86 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index c77e873..852a58a 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -37,6 +37,18 @@ struct svc_xprt {
struct svc_xprt_ops xpt_ops;
u32 xpt_max_payload;
struct kref xpt_ref;
+ unsigned long xpt_flags;
+#define XPT_BUSY 0 /* enqueued/receiving */
+#define XPT_CONN 1 /* conn pending */
+#define XPT_CLOSE 2 /* dead or dying */
+#define XPT_DATA 3 /* data pending */
+#define XPT_TEMP 4 /* connected transport */
+#define XPT_DEAD 6 /* transport closed */
+#define XPT_CHNGBUF 7 /* need to change snd/rcv buf sizes */
+#define XPT_DEFERRED 8 /* deferred request pending */
+#define XPT_OLD 9 /* used for xprt aging mark+sweep */
+#define XPT_DETACHED 10 /* detached from tempsocks list */
+#define XPT_LISTENER 11 /* listening endpoint */
};

int svc_reg_xprt_class(struct svc_xprt_class *);
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index ba07d50..b8a8496 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -24,19 +24,6 @@ struct svc_sock {

struct svc_pool * sk_pool; /* current pool iff queued */
struct svc_serv * sk_server; /* service for this socket */
- unsigned long sk_flags;
-#define SK_BUSY 0 /* enqueued/receiving */
-#define SK_CONN 1 /* conn pending */
-#define SK_CLOSE 2 /* dead or dying */
-#define SK_DATA 3 /* data pending */
-#define SK_TEMP 4 /* temp (TCP) socket */
-#define SK_DEAD 6 /* socket closed */
-#define SK_CHNGBUF 7 /* need to change snd/rcv buffer sizes */
-#define SK_DEFERRED 8 /* request on sk_deferred */
-#define SK_OLD 9 /* used for temp socket aging mark+sweep */
-#define SK_DETACHED 10 /* detached from tempsocks list */
-#define SK_LISTENER 11 /* listening endpoint */
-
atomic_t sk_reserved; /* space on outq that is reserved */

spinlock_t sk_lock; /* protects sk_deferred and
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index d5e78b9..2b82780 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -55,22 +55,23 @@ #include <linux/sunrpc/stats.h>
* BKL protects svc_serv->sv_nrthread.
* svc_sock->sk_lock protects the svc_sock->sk_deferred list
* and the ->sk_info_authunix cache.
- * svc_sock->sk_flags.SK_BUSY prevents a svc_sock being enqueued multiply.
+ * svc_sock->sk_xprt.xpt_flags.XPT_BUSY prevents a svc_sock being
+ * enqueued multiply.
*
* Some flags can be set to certain values at any time
* providing that certain rules are followed:
*
- * SK_CONN, SK_DATA, can be set or cleared at any time.
+ * XPT_CONN, XPT_DATA, can be set or cleared at any time.
* after a set, svc_sock_enqueue must be called.
* after a clear, the socket must be read/accepted
* if this succeeds, it must be set again.
- * SK_CLOSE can set at any time. It is never cleared.
- * xpt_ref contains a bias of '1' until SK_DEAD is set.
+ * XPT_CLOSE can set at any time. It is never cleared.
+ * xpt_ref contains a bias of '1' until XPT_DEAD is set.
* so when xprt_ref hits zero, we know the transport is dead
* and no-one is using it.
- * SK_DEAD can only be set while SK_BUSY is held which ensures
+ * XPT_DEAD can only be set while XPT_BUSY is held which ensures
* no other thread will be using the socket or will try to
- * set SK_DEAD.
+ * set XPT_DEAD.
*
*/

@@ -235,10 +236,10 @@ svc_sock_enqueue(struct svc_sock *svsk)
struct svc_rqst *rqstp;
int cpu;

- if (!(svsk->sk_flags &
- ( (1<<SK_CONN)|(1<<SK_DATA)|(1<<SK_CLOSE)|(1<<SK_DEFERRED)) ))
+ if (!(svsk->sk_xprt.xpt_flags &
+ ((1<<XPT_CONN)|(1<<XPT_DATA)|(1<<XPT_CLOSE)|(1<<XPT_DEFERRED))))
return;
- if (test_bit(SK_DEAD, &svsk->sk_flags))
+ if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags))
return;

cpu = get_cpu();
@@ -252,7 +253,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
printk(KERN_ERR
"svc_sock_enqueue: threads and sockets both waiting??\n");

- if (test_bit(SK_DEAD, &svsk->sk_flags)) {
+ if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags)) {
/* Don't enqueue dead sockets */
dprintk("svc: socket %p is dead, not enqueued\n", svsk->sk_sk);
goto out_unlock;
@@ -260,10 +261,10 @@ svc_sock_enqueue(struct svc_sock *svsk)

/* Mark socket as busy. It will remain in this state until the
* server has processed all pending data and put the socket back
- * on the idle list. We update SK_BUSY atomically because
+ * on the idle list. We update XPT_BUSY atomically because
* it also guards against trying to enqueue the svc_sock twice.
*/
- if (test_and_set_bit(SK_BUSY, &svsk->sk_flags)) {
+ if (test_and_set_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags)) {
/* Don't enqueue socket while already enqueued */
dprintk("svc: socket %p busy, not enqueued\n", svsk->sk_sk);
goto out_unlock;
@@ -272,11 +273,11 @@ svc_sock_enqueue(struct svc_sock *svsk)
svsk->sk_pool = pool;

/* Handle pending connection */
- if (test_bit(SK_CONN, &svsk->sk_flags))
+ if (test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags))
goto process;

/* Handle close in-progress */
- if (test_bit(SK_CLOSE, &svsk->sk_flags))
+ if (test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags))
goto process;

/* Check if we have space to reply to a request */
@@ -284,7 +285,7 @@ svc_sock_enqueue(struct svc_sock *svsk)
/* Don't enqueue while not enough space for reply */
dprintk("svc: no write space, socket %p not enqueued\n", svsk);
svsk->sk_pool = NULL;
- clear_bit(SK_BUSY, &svsk->sk_flags);
+ clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
goto out_unlock;
}

@@ -340,14 +341,14 @@ svc_sock_dequeue(struct svc_pool *pool)
/*
* Having read something from a socket, check whether it
* needs to be re-enqueued.
- * Note: SK_DATA only gets cleared when a read-attempt finds
+ * Note: XPT_DATA only gets cleared when a read-attempt finds
* no (or insufficient) data.
*/
static inline void
svc_sock_received(struct svc_sock *svsk)
{
svsk->sk_pool = NULL;
- clear_bit(SK_BUSY, &svsk->sk_flags);
+ clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
svc_sock_enqueue(svsk);
}

@@ -696,8 +697,9 @@ svc_udp_data_ready(struct sock *sk, int

if (svsk) {
dprintk("svc: socket %p(inet %p), count=%d, busy=%d\n",
- svsk, sk, count, test_bit(SK_BUSY, &svsk->sk_flags));
- set_bit(SK_DATA, &svsk->sk_flags);
+ svsk, sk, count,
+ test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+ set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
svc_sock_enqueue(svsk);
}
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
@@ -714,7 +716,7 @@ svc_write_space(struct sock *sk)

if (svsk) {
dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
- svsk, sk, test_bit(SK_BUSY, &svsk->sk_flags));
+ svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
svc_sock_enqueue(svsk);
}

@@ -764,7 +766,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
.msg_flags = MSG_DONTWAIT,
};

- if (test_and_clear_bit(SK_CHNGBUF, &svsk->sk_flags))
+ if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
/* udp sockets need large rcvbuf as all pending
* requests are still in that buffer. sndbuf must
* also be large enough that there is enough space
@@ -782,7 +784,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
return svc_deferred_recv(rqstp);
}

- clear_bit(SK_DATA, &svsk->sk_flags);
+ clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
skb = NULL;
err = kernel_recvmsg(svsk->sk_sock, &msg, NULL,
0, 0, MSG_PEEK | MSG_DONTWAIT);
@@ -793,7 +795,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
if (err != -EAGAIN) {
/* possibly an icmp error */
dprintk("svc: recvfrom returned error %d\n", -err);
- set_bit(SK_DATA, &svsk->sk_flags);
+ set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
}
svc_sock_received(svsk);
return -EAGAIN;
@@ -805,7 +807,7 @@ svc_udp_recvfrom(struct svc_rqst *rqstp)
need that much accuracy */
}
svsk->sk_sk->sk_stamp = skb->tstamp;
- set_bit(SK_DATA, &svsk->sk_flags); /* there may be more data... */
+ set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); /* there may be more data... */

/*
* Maybe more packets - kick another thread ASAP.
@@ -955,8 +957,8 @@ svc_udp_init(struct svc_sock *svsk)
3 * svsk->sk_server->sv_max_mesg,
3 * svsk->sk_server->sv_max_mesg);

- set_bit(SK_DATA, &svsk->sk_flags); /* might have come in before data_ready set up */
- set_bit(SK_CHNGBUF, &svsk->sk_flags);
+ set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags); /* might have come in before data_ready set up */
+ set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);

oldfs = get_fs();
set_fs(KERNEL_DS);
@@ -990,7 +992,7 @@ svc_tcp_listen_data_ready(struct sock *s
*/
if (sk->sk_state == TCP_LISTEN) {
if (svsk) {
- set_bit(SK_CONN, &svsk->sk_flags);
+ set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
svc_sock_enqueue(svsk);
} else
printk("svc: socket %p: no user data\n", sk);
@@ -1014,7 +1016,7 @@ svc_tcp_state_change(struct sock *sk)
if (!svsk)
printk("svc: socket %p: no user data\n", sk);
else {
- set_bit(SK_CLOSE, &svsk->sk_flags);
+ set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
svc_sock_enqueue(svsk);
}
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
@@ -1029,7 +1031,7 @@ svc_tcp_data_ready(struct sock *sk, int
dprintk("svc: socket %p TCP data ready (svsk %p)\n",
sk, sk->sk_user_data);
if (svsk) {
- set_bit(SK_DATA, &svsk->sk_flags);
+ set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
svc_sock_enqueue(svsk);
}
if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
@@ -1070,7 +1072,7 @@ svc_tcp_accept(struct svc_xprt *xprt)
if (!sock)
return NULL;

- clear_bit(SK_CONN, &svsk->sk_flags);
+ clear_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
err = kernel_accept(sock, &newsock, O_NONBLOCK);
if (err < 0) {
if (err == -ENOMEM)
@@ -1082,7 +1084,7 @@ svc_tcp_accept(struct svc_xprt *xprt)
return NULL;
}

- set_bit(SK_CONN, &svsk->sk_flags);
+ set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
svc_sock_enqueue(svsk);

err = kernel_getpeername(newsock, sin, &slen);
@@ -1148,16 +1150,16 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
int pnum, vlen;

dprintk("svc: tcp_recv %p data %d conn %d close %d\n",
- svsk, test_bit(SK_DATA, &svsk->sk_flags),
- test_bit(SK_CONN, &svsk->sk_flags),
- test_bit(SK_CLOSE, &svsk->sk_flags));
+ svsk, test_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags),
+ test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
+ test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));

if ((rqstp->rq_deferred = svc_deferred_dequeue(svsk))) {
svc_sock_received(svsk);
return svc_deferred_recv(rqstp);
}

- if (test_and_clear_bit(SK_CHNGBUF, &svsk->sk_flags))
+ if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
/* sndbuf needs to have room for one request
* per thread, otherwise we can stall even when the
* network isn't a bottleneck.
@@ -1174,7 +1176,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
(serv->sv_nrthreads+3) * serv->sv_max_mesg,
3 * serv->sv_max_mesg);

- clear_bit(SK_DATA, &svsk->sk_flags);
+ clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);

/* Receive data. If we haven't got the record length yet, get
* the next four bytes. Otherwise try to gobble up as much as
@@ -1233,7 +1235,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
return -EAGAIN; /* record not complete */
}
len = svsk->sk_reclen;
- set_bit(SK_DATA, &svsk->sk_flags);
+ set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);

vec = rqstp->rq_vec;
vec[0] = rqstp->rq_arg.head[0];
@@ -1309,7 +1311,7 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
reclen = htonl(0x80000000|((xbufp->len ) - 4));
memcpy(xbufp->head[0].iov_base, &reclen, 4);

- if (test_bit(SK_DEAD, &rqstp->rq_sock->sk_flags))
+ if (test_bit(XPT_DEAD, &rqstp->rq_sock->sk_xprt.xpt_flags))
return -ENOTCONN;

sent = svc_sendto(rqstp, &rqstp->rq_res);
@@ -1318,7 +1320,7 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
rqstp->rq_sock->sk_server->sv_name,
(sent<0)?"got error":"sent only",
sent, xbufp->len);
- set_bit(SK_CLOSE, &rqstp->rq_sock->sk_flags);
+ set_bit(XPT_CLOSE, &rqstp->rq_sock->sk_xprt.xpt_flags);
svc_sock_enqueue(rqstp->rq_sock);
sent = -EAGAIN;
}
@@ -1405,7 +1407,7 @@ svc_tcp_init(struct svc_sock *svsk)
if (sk->sk_state == TCP_LISTEN) {
dprintk("setting up TCP socket for listening\n");
sk->sk_data_ready = svc_tcp_listen_data_ready;
- set_bit(SK_CONN, &svsk->sk_flags);
+ set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
} else {
dprintk("setting up TCP socket for reading\n");
sk->sk_state_change = svc_tcp_state_change;
@@ -1425,10 +1427,10 @@ svc_tcp_init(struct svc_sock *svsk)
3 * svsk->sk_server->sv_max_mesg,
3 * svsk->sk_server->sv_max_mesg);

- set_bit(SK_CHNGBUF, &svsk->sk_flags);
- set_bit(SK_DATA, &svsk->sk_flags);
+ set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
+ set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
if (sk->sk_state != TCP_ESTABLISHED)
- set_bit(SK_CLOSE, &svsk->sk_flags);
+ set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
}
}

@@ -1445,12 +1447,12 @@ svc_sock_update_bufs(struct svc_serv *se
list_for_each(le, &serv->sv_permsocks) {
struct svc_sock *svsk =
list_entry(le, struct svc_sock, sk_list);
- set_bit(SK_CHNGBUF, &svsk->sk_flags);
+ set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
}
list_for_each(le, &serv->sv_tempsocks) {
struct svc_sock *svsk =
list_entry(le, struct svc_sock, sk_list);
- set_bit(SK_CHNGBUF, &svsk->sk_flags);
+ set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
}
spin_unlock_bh(&serv->sv_lock);
}
@@ -1492,7 +1494,7 @@ svc_check_conn_limits(struct svc_serv *s
svsk = list_entry(serv->sv_tempsocks.prev,
struct svc_sock,
sk_list);
- set_bit(SK_CLOSE, &svsk->sk_flags);
+ set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
svc_xprt_get(&svsk->sk_xprt);
}
spin_unlock_bh(&serv->sv_lock);
@@ -1596,10 +1598,10 @@ svc_recv(struct svc_rqst *rqstp, long ti
spin_unlock_bh(&pool->sp_lock);

len = 0;
- if (test_bit(SK_CLOSE, &svsk->sk_flags)) {
- dprintk("svc_recv: found SK_CLOSE\n");
+ if (test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags)) {
+ dprintk("svc_recv: found XPT_CLOSE\n");
svc_delete_socket(svsk);
- } else if (test_bit(SK_LISTENER, &svsk->sk_flags)) {
+ } else if (test_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags)) {
struct svc_xprt *newxpt;
newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
if (newxpt) {
@@ -1626,7 +1628,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
return -EAGAIN;
}
svsk->sk_lastrecv = get_seconds();
- clear_bit(SK_OLD, &svsk->sk_flags);
+ clear_bit(XPT_OLD, &svsk->sk_xprt.xpt_flags);

rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
rqstp->rq_chandle.defer = svc_defer;
@@ -1673,7 +1675,7 @@ svc_send(struct svc_rqst *rqstp)

/* Grab svsk->sk_mutex to serialize outgoing data. */
mutex_lock(&svsk->sk_mutex);
- if (test_bit(SK_DEAD, &svsk->sk_flags))
+ if (test_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags))
len = -ENOTCONN;
else
len = svsk->sk_xprt.xpt_ops.xpo_sendto(rqstp);
@@ -1709,21 +1711,21 @@ svc_age_temp_sockets(unsigned long closu
list_for_each_safe(le, next, &serv->sv_tempsocks) {
svsk = list_entry(le, struct svc_sock, sk_list);

- if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
+ if (!test_and_set_bit(XPT_OLD, &svsk->sk_xprt.xpt_flags))
continue;
if (atomic_read(&svsk->sk_xprt.xpt_ref.refcount) > 1
- || test_bit(SK_BUSY, &svsk->sk_flags))
+ || test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags))
continue;
svc_xprt_get(&svsk->sk_xprt);
list_move(le, &to_be_aged);
- set_bit(SK_CLOSE, &svsk->sk_flags);
- set_bit(SK_DETACHED, &svsk->sk_flags);
+ set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
+ set_bit(XPT_DETACHED, &svsk->sk_xprt.xpt_flags);
}
spin_unlock_bh(&serv->sv_lock);

while (!list_empty(&to_be_aged)) {
le = to_be_aged.next;
- /* fiddling the sk_list node is safe 'cos we're SK_DETACHED */
+ /* fiddling the sk_list node is safe 'cos we're XPT_DETACHED */
list_del_init(le);
svsk = list_entry(le, struct svc_sock, sk_list);

@@ -1769,7 +1771,7 @@ static struct svc_sock *svc_setup_socket
return NULL;
}

- set_bit(SK_BUSY, &svsk->sk_flags);
+ set_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
inet->sk_user_data = svsk;
svsk->sk_sock = sock;
svsk->sk_sk = inet;
@@ -1791,7 +1793,7 @@ static struct svc_sock *svc_setup_socket

spin_lock_bh(&serv->sv_lock);
if (is_temporary) {
- set_bit(SK_TEMP, &svsk->sk_flags);
+ set_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
list_add(&svsk->sk_list, &serv->sv_tempsocks);
serv->sv_tmpcnt++;
if (serv->sv_temptimer.function == NULL) {
@@ -1802,7 +1804,7 @@ static struct svc_sock *svc_setup_socket
jiffies + svc_conn_age_period * HZ);
}
} else {
- clear_bit(SK_TEMP, &svsk->sk_flags);
+ clear_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
list_add(&svsk->sk_list, &serv->sv_permsocks);
}
spin_unlock_bh(&serv->sv_lock);
@@ -1890,7 +1892,7 @@ svc_create_socket(struct svc_serv *serv,

if ((svsk = svc_setup_socket(serv, sock, &error, flags)) != NULL) {
if (protocol == IPPROTO_TCP)
- set_bit(SK_LISTENER, &svsk->sk_flags);
+ set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
svc_sock_received(svsk);
return (struct svc_xprt *)svsk;
}
@@ -1955,7 +1957,7 @@ svc_delete_socket(struct svc_sock *svsk)

spin_lock_bh(&serv->sv_lock);

- if (!test_and_set_bit(SK_DETACHED, &svsk->sk_flags))
+ if (!test_and_set_bit(XPT_DETACHED, &svsk->sk_xprt.xpt_flags))
list_del_init(&svsk->sk_list);
/*
* We used to delete the svc_sock from whichever list
@@ -1964,10 +1966,10 @@ svc_delete_socket(struct svc_sock *svsk)
* while still attached to a queue, the queue itself
* is about to be destroyed (in svc_destroy).
*/
- if (!test_and_set_bit(SK_DEAD, &svsk->sk_flags)) {
+ if (!test_and_set_bit(XPT_DEAD, &svsk->sk_xprt.xpt_flags)) {
BUG_ON(atomic_read(&svsk->sk_xprt.xpt_ref.refcount) < 2);
svc_xprt_put(&svsk->sk_xprt);
- if (test_bit(SK_TEMP, &svsk->sk_flags))
+ if (test_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags))
serv->sv_tmpcnt--;
}

@@ -1976,26 +1978,26 @@ svc_delete_socket(struct svc_sock *svsk)

static void svc_close_socket(struct svc_sock *svsk)
{
- set_bit(SK_CLOSE, &svsk->sk_flags);
- if (test_and_set_bit(SK_BUSY, &svsk->sk_flags))
+ set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
+ if (test_and_set_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags))
/* someone else will have to effect the close */
return;

svc_xprt_get(&svsk->sk_xprt);
svc_delete_socket(svsk);
- clear_bit(SK_BUSY, &svsk->sk_flags);
+ clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
svc_xprt_put(&svsk->sk_xprt);
}

void svc_force_close_socket(struct svc_sock *svsk)
{
- set_bit(SK_CLOSE, &svsk->sk_flags);
- if (test_bit(SK_BUSY, &svsk->sk_flags)) {
+ set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
+ if (test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags)) {
/* Waiting to be processed, but no threads left,
* So just remove it from the waiting list
*/
list_del_init(&svsk->sk_ready);
- clear_bit(SK_BUSY, &svsk->sk_flags);
+ clear_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags);
}
svc_close_socket(svsk);
}
@@ -2020,7 +2022,7 @@ static void svc_revisit(struct cache_def
spin_lock(&svsk->sk_lock);
list_add(&dr->handle.recent, &svsk->sk_deferred);
spin_unlock(&svsk->sk_lock);
- set_bit(SK_DEFERRED, &svsk->sk_flags);
+ set_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
svc_sock_enqueue(svsk);
svc_xprt_put(&svsk->sk_xprt);
}
@@ -2083,16 +2085,16 @@ static struct svc_deferred_req *svc_defe
{
struct svc_deferred_req *dr = NULL;

- if (!test_bit(SK_DEFERRED, &svsk->sk_flags))
+ if (!test_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags))
return NULL;
spin_lock(&svsk->sk_lock);
- clear_bit(SK_DEFERRED, &svsk->sk_flags);
+ clear_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
if (!list_empty(&svsk->sk_deferred)) {
dr = list_entry(svsk->sk_deferred.next,
struct svc_deferred_req,
handle.recent);
list_del_init(&dr->handle.recent);
- set_bit(SK_DEFERRED, &svsk->sk_flags);
+ set_bit(XPT_DEFERRED, &svsk->sk_xprt.xpt_flags);
}
spin_unlock(&svsk->sk_lock);
return dr;

-------------------------------------------------------------------------

2007-10-01 19:28:26

by Tom Tucker

Subject: [RFC,PATCH 27/35] svc: Make svc_recv transport neutral


All of the transport fields and functions used by svc_recv are now
transport independent. Change svc_recv to use the svc_xprt structure
directly instead of the transport-specific svc_sock structure.
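Conceptually, the transport-neutral core of svc_recv reduces to a small dispatch on xpt_flags plus indirect calls through xpt_ops. A hedged user-space sketch of that shape (the model_ and stub_ names are invented for illustration; real return-code conventions in svc_recv differ):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the dispatch in svc_recv() after this patch: the core
 * consults only xpt_flags and calls through xpt_ops, never svc_sock
 * fields. */
#define XPT_CLOSE    2  /* dead or dying */
#define XPT_LISTENER 11 /* listening endpoint */

struct model_xprt;

struct model_xprt_ops {
	struct model_xprt *(*xpo_accept)(struct model_xprt *xprt);
	int (*xpo_recvfrom)(struct model_xprt *xprt);
};

struct model_xprt {
	unsigned long xpt_flags;
	struct model_xprt_ops xpt_ops;
};

/* <0: transport closed; 0: listener handled; >0: bytes received. */
static int model_recv(struct model_xprt *xprt)
{
	if (xprt->xpt_flags & (1UL << XPT_CLOSE))
		return -1; /* real code: svc_delete_xprt(xprt) */
	if (xprt->xpt_flags & (1UL << XPT_LISTENER)) {
		struct model_xprt *newxpt = xprt->xpt_ops.xpo_accept(xprt);

		(void)newxpt; /* real code: svc_xprt_received(newxpt) */
		return 0;
	}
	return xprt->xpt_ops.xpo_recvfrom(xprt);
}

/* Stub transport methods for the illustration. */
static struct model_xprt *stub_accept(struct model_xprt *xprt)
{
	return xprt; /* pretend the accept produced a child transport */
}

static int stub_recvfrom(struct model_xprt *xprt)
{
	(void)xprt;
	return 42; /* pretend 42 bytes arrived */
}
```

The point of the indirection is that a new transport (e.g. RDMA) only supplies an ops table; nothing in this path needs to change.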

Signed-off-by: Tom Tucker <[email protected]>
---

net/sunrpc/svcsock.c | 64 +++++++++++++++++++++++++-------------------------
1 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 68ae7a9..573792f 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -321,22 +321,22 @@ EXPORT_SYMBOL_GPL(svc_xprt_enqueue);
/*
* Dequeue the first socket. Must be called with the pool->sp_lock held.
*/
-static inline struct svc_sock *
-svc_sock_dequeue(struct svc_pool *pool)
+static inline struct svc_xprt *
+svc_xprt_dequeue(struct svc_pool *pool)
{
- struct svc_sock *svsk;
+ struct svc_xprt *xprt;

if (list_empty(&pool->sp_sockets))
return NULL;

- svsk = list_entry(pool->sp_sockets.next,
- struct svc_sock, sk_xprt.xpt_ready);
- list_del_init(&svsk->sk_xprt.xpt_ready);
+ xprt = list_entry(pool->sp_sockets.next,
+ struct svc_xprt, xpt_ready);
+ list_del_init(&xprt->xpt_ready);

- dprintk("svc: socket %p dequeued, inuse=%d\n",
- svsk->sk_sk, atomic_read(&svsk->sk_xprt.xpt_ref.refcount));
+ dprintk("svc: transport %p dequeued, inuse=%d\n",
+ xprt, atomic_read(&xprt->xpt_ref.refcount));

- return svsk;
+ return xprt;
}

/*
@@ -1506,20 +1506,20 @@ static inline void svc_copy_addr(struct
int
svc_recv(struct svc_rqst *rqstp, long timeout)
{
- struct svc_sock *svsk = NULL;
+ struct svc_xprt *xprt = NULL;
struct svc_serv *serv = rqstp->rq_server;
struct svc_pool *pool = rqstp->rq_pool;
int len, i;
- int pages;
+ int pages;
struct xdr_buf *arg;
DECLARE_WAITQUEUE(wait, current);

dprintk("svc: server %p waiting for data (to = %ld)\n",
rqstp, timeout);

- if (rqstp->rq_sock)
+ if (rqstp->rq_xprt)
printk(KERN_ERR
- "svc_recv: service %p, socket not NULL!\n",
+ "svc_recv: service %p, transport not NULL!\n",
rqstp);
if (waitqueue_active(&rqstp->rq_wait))
printk(KERN_ERR
@@ -1556,11 +1556,11 @@ svc_recv(struct svc_rqst *rqstp, long ti
return -EINTR;

spin_lock_bh(&pool->sp_lock);
- if ((svsk = svc_sock_dequeue(pool)) != NULL) {
- rqstp->rq_sock = svsk;
- svc_xprt_get(&svsk->sk_xprt);
+ if ((xprt = svc_xprt_dequeue(pool)) != NULL) {
+ rqstp->rq_xprt = xprt;
+ svc_xprt_get(xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
- atomic_add(rqstp->rq_reserved, &svsk->sk_xprt.xpt_reserved);
+ atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
} else {
/* No data pending. Go to sleep */
svc_thread_enqueue(pool, rqstp);
@@ -1580,7 +1580,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
spin_lock_bh(&pool->sp_lock);
remove_wait_queue(&rqstp->rq_wait, &wait);

- if (!(svsk = rqstp->rq_sock)) {
+ if (!(xprt = rqstp->rq_xprt)) {
svc_thread_dequeue(pool, rqstp);
spin_unlock_bh(&pool->sp_lock);
dprintk("svc: server %p, no data yet\n", rqstp);
@@ -1590,12 +1590,12 @@ svc_recv(struct svc_rqst *rqstp, long ti
spin_unlock_bh(&pool->sp_lock);

len = 0;
- if (test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags)) {
+ if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
dprintk("svc_recv: found XPT_CLOSE\n");
- svc_delete_xprt(&svsk->sk_xprt);
- } else if (test_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags)) {
+ svc_delete_xprt(xprt);
+ } else if (test_bit(XPT_LISTENER, &xprt->xpt_flags)) {
struct svc_xprt *newxpt;
- newxpt = svsk->sk_xprt.xpt_ops.xpo_accept(&svsk->sk_xprt);
+ newxpt = xprt->xpt_ops.xpo_accept(xprt);
if (newxpt) {
svc_xprt_received(newxpt);
/*
@@ -1603,20 +1603,20 @@ svc_recv(struct svc_rqst *rqstp, long ti
* listener holds a reference too
*/
__module_get(newxpt->xpt_class->xcl_owner);
- svc_check_conn_limits(svsk->sk_xprt.xpt_server);
+ svc_check_conn_limits(xprt->xpt_server);
}
- svc_xprt_received(&svsk->sk_xprt);
+ svc_xprt_received(xprt);
} else {
- dprintk("svc: server %p, pool %u, socket %p, inuse=%d\n",
- rqstp, pool->sp_id, svsk,
- atomic_read(&svsk->sk_xprt.xpt_ref.refcount));
+ dprintk("svc: server %p, pool %u, transport %p, inuse=%d\n",
+ rqstp, pool->sp_id, xprt,
+ atomic_read(&xprt->xpt_ref.refcount));

- if ((rqstp->rq_deferred = svc_deferred_dequeue(&svsk->sk_xprt))) {
- svc_xprt_received(&svsk->sk_xprt);
+ if ((rqstp->rq_deferred = svc_deferred_dequeue(xprt))) {
+ svc_xprt_received(xprt);
len = svc_deferred_recv(rqstp);
} else
- len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
- svc_copy_addr(rqstp, &svsk->sk_xprt);
+ len = xprt->xpt_ops.xpo_recvfrom(rqstp);
+ svc_copy_addr(rqstp, xprt);
dprintk("svc: got len=%d\n", len);
}

@@ -1626,7 +1626,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
svc_xprt_release(rqstp);
return -EAGAIN;
}
- clear_bit(XPT_OLD, &svsk->sk_xprt.xpt_flags);
+ clear_bit(XPT_OLD, &xprt->xpt_flags);

rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
rqstp->rq_chandle.defer = svc_defer;

-------------------------------------------------------------------------

2007-10-01 19:28:25

by Tom Tucker

Subject: [RFC, PATCH 25/35] svc: Move the sockaddr information to svc_xprt


Move the IP address fields to the svc_xprt structure. Note that this
assumes that _all_ RPC transports must have IP-based 4-tuples. This
seems reasonable given the tight coupling with the portmapper, etc.
Thoughts?
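For what it's worth, sockaddr_storage is what makes this move safe: it is defined to be large enough and suitably aligned for any supported address family, so generic code can cache and memcpy peer addresses without knowing whether the transport handed it a sockaddr_in or sockaddr_in6. A user-space illustration of the copy the common path now performs (the model_ structures are invented; the field names echo the patch's xpt_remote/xpt_remotelen):

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* User-space illustration of caching a peer address in
 * sockaddr_storage, as xpt_remote/xpt_remotelen do in the patch. */
struct model_xprt {
	struct sockaddr_storage xpt_remote; /* remote peer's address */
	int xpt_remotelen;                  /* length of address */
};

struct model_rqst {
	struct sockaddr_storage rq_addr;
	int rq_addrlen;
};

/* What the common receive path does now that the address lives in the
 * transport: copy the cached peer address into the request. */
static void model_copy_addr(struct model_rqst *rqstp,
			    const struct model_xprt *xprt)
{
	memcpy(&rqstp->rq_addr, &xprt->xpt_remote, xprt->xpt_remotelen);
	rqstp->rq_addrlen = xprt->xpt_remotelen;
}
```

Because both sides are sockaddr_storage, the copy is family-agnostic; only the transport that filled in xpt_remote ever had to know the concrete sockaddr type.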

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc_xprt.h | 3 ++
include/linux/sunrpc/svcsock.h | 4 ---
net/sunrpc/svcsock.c | 50 +++++++++++++++++++++------------------
3 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
index ba92909..47ad941 100644
--- a/include/linux/sunrpc/svc_xprt.h
+++ b/include/linux/sunrpc/svc_xprt.h
@@ -62,6 +62,9 @@ #define XPT_CACHE_AUTH 12 /* cache auth
void *xpt_auth_cache;/* auth cache */
struct list_head xpt_deferred; /* deferred requests that need
* to be revisted */
+ struct sockaddr_storage xpt_local; /* local address */
+ struct sockaddr_storage xpt_remote; /* remote peer's address */
+ int xpt_remotelen; /* length of address */
};

int svc_reg_xprt_class(struct svc_xprt_class *);
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index 96a229e..206f092 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -28,10 +28,6 @@ struct svc_sock {
/* private TCP part */
int sk_reclen; /* length of record */
int sk_tcplen; /* current read length */
-
- struct sockaddr_storage sk_local; /* local address */
- struct sockaddr_storage sk_remote; /* remote peer's address */
- int sk_remotelen; /* length of address */
};

/*
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 0732dc2..ab34bb2 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -632,33 +632,13 @@ svc_recvfrom(struct svc_rqst *rqstp, str
struct msghdr msg = {
.msg_flags = MSG_DONTWAIT,
};
- struct sockaddr *sin;
int len;

len = kernel_recvmsg(svsk->sk_sock, &msg, iov, nr, buflen,
msg.msg_flags);

- /* sock_recvmsg doesn't fill in the name/namelen, so we must..
- */
- memcpy(&rqstp->rq_addr, &svsk->sk_remote, svsk->sk_remotelen);
- rqstp->rq_addrlen = svsk->sk_remotelen;
-
- /* Destination address in request is needed for binding the
- * source address in RPC callbacks later.
- */
- sin = (struct sockaddr *)&svsk->sk_local;
- switch (sin->sa_family) {
- case AF_INET:
- rqstp->rq_daddr.addr = ((struct sockaddr_in *)sin)->sin_addr;
- break;
- case AF_INET6:
- rqstp->rq_daddr.addr6 = ((struct sockaddr_in6 *)sin)->sin6_addr;
- break;
- }
-
dprintk("svc: socket %p recvfrom(%p, %Zu) = %d\n",
svsk, iov[0].iov_base, iov[0].iov_len, len);
-
return len;
}

@@ -1113,14 +1093,14 @@ svc_tcp_accept(struct svc_xprt *xprt)
if (!(newsvsk = svc_setup_socket(serv, newsock, &err,
(SVC_SOCK_ANONYMOUS | SVC_SOCK_TEMPORARY))))
goto failed;
- memcpy(&newsvsk->sk_remote, sin, slen);
- newsvsk->sk_remotelen = slen;
+ memcpy(&newsvsk->sk_xprt.xpt_remote, sin, slen);
+ newsvsk->sk_xprt.xpt_remotelen = slen;
err = kernel_getsockname(newsock, sin, &slen);
if (unlikely(err < 0)) {
dprintk("svc_tcp_accept: kernel_getsockname error %d\n", -err);
slen = offsetof(struct sockaddr, sa_data);
}
- memcpy(&newsvsk->sk_local, sin, slen);
+ memcpy(&newsvsk->sk_xprt.xpt_local, sin, slen);

if (serv->sv_stats)
serv->sv_stats->nettcpconn++;
@@ -1496,6 +1476,29 @@ svc_check_conn_limits(struct svc_serv *s
}
}

+static inline void svc_copy_addr(struct svc_rqst *rqstp, struct svc_xprt *xprt)
+{
+ struct sockaddr *sin;
+
+ /* sock_recvmsg doesn't fill in the name/namelen, so we must..
+ */
+ memcpy(&rqstp->rq_addr, &xprt->xpt_remote, xprt->xpt_remotelen);
+ rqstp->rq_addrlen = xprt->xpt_remotelen;
+
+ /* Destination address in request is needed for binding the
+ * source address in RPC callbacks later.
+ */
+ sin = (struct sockaddr *)&xprt->xpt_local;
+ switch (sin->sa_family) {
+ case AF_INET:
+ rqstp->rq_daddr.addr = ((struct sockaddr_in *)sin)->sin_addr;
+ break;
+ case AF_INET6:
+ rqstp->rq_daddr.addr6 = ((struct sockaddr_in6 *)sin)->sin6_addr;
+ break;
+ }
+}
+
/*
* Receive the next request on any socket. This code is carefully
* organised not to touch any cachelines in the shared svc_serv
@@ -1614,6 +1617,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
len = svc_deferred_recv(rqstp);
} else
len = svsk->sk_xprt.xpt_ops.xpo_recvfrom(rqstp);
+ svc_copy_addr(rqstp, &svsk->sk_xprt);
dprintk("svc: got len=%d\n", len);
}



2007-10-01 19:28:39

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 31/35] svc: Make svc_check_conn_limits xprt independent


The svc_check_conn_limits function only manipulates xprt fields. Change its
references from svc_sock->sk_xprt to svc_xprt directly.

Signed-off-by: Tom Tucker <[email protected]>
---

net/sunrpc/svcsock.c | 31 +++++++++++++++----------------
1 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 0f57426..353aae2 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1450,38 +1450,37 @@ svc_check_conn_limits(struct svc_serv *s
* seconds. An attacker can easily beat that.
*
* The only somewhat efficient mechanism would be if drop
- * old connections from the same IP first. But right now
- * we don't even record the client IP in svc_sock.
+ * old connections from the same IP first.
*/
if (serv->sv_tmpcnt > (serv->sv_nrthreads+3)*20) {
- struct svc_sock *svsk = NULL;
+ struct svc_xprt *xprt = NULL;
spin_lock_bh(&serv->sv_lock);
if (!list_empty(&serv->sv_tempsocks)) {
if (net_ratelimit()) {
/* Try to help the admin */
- printk(KERN_NOTICE "%s: too many open TCP "
- "sockets, consider increasing the "
+ printk(KERN_NOTICE "%s: too many open "
+ "connections, consider increasing the "
"number of nfsd threads\n",
- serv->sv_name);
+ serv->sv_name);
printk(KERN_NOTICE
- "%s: last TCP connect from %s\n",
+ "%s: last connection from %s\n",
serv->sv_name, buf);
}
/*
- * Always select the oldest socket. It's not fair,
+ * Always select the oldest connection. It's not fair,
* but so is life
*/
- svsk = list_entry(serv->sv_tempsocks.prev,
- struct svc_sock,
- sk_xprt.xpt_list);
- set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
- svc_xprt_get(&svsk->sk_xprt);
+ xprt = list_entry(serv->sv_tempsocks.prev,
+ struct svc_xprt,
+ xpt_list);
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ svc_xprt_get(xprt);
}
spin_unlock_bh(&serv->sv_lock);

- if (svsk) {
- svc_xprt_enqueue(&svsk->sk_xprt);
- svc_xprt_put(&svsk->sk_xprt);
+ if (xprt) {
+ svc_xprt_enqueue(xprt);
+ svc_xprt_put(xprt);
}
}
}


2007-10-01 19:28:36

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 29/35] svc: Move common create logic to common code


Move the code that adds a transport instance to the sv_tempsocks and
sv_permsocks lists out of the transport specific functions and into core
logic.

The svc_addsock routine still manipulates sv_permsocks directly. This
code may be removed when rpc.nfsd is modified to create transports
by writing to the portlist file.

Signed-off-by: Tom Tucker <[email protected]>
---

net/sunrpc/svc_xprt.c | 7 +++++++
net/sunrpc/svcsock.c | 38 +++++++++++++++++++-------------------
2 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 2a27d5e..56cda03 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -144,6 +144,13 @@ int svc_create_xprt(struct svc_serv *ser
if (IS_ERR(newxprt)) {
module_put(xcl->xcl_owner);
ret = PTR_ERR(newxprt);
+ } else {
+ clear_bit(XPT_TEMP,
+ &newxprt->xpt_flags);
+ spin_lock_bh(&serv->sv_lock);
+ list_add(&newxprt->xpt_list,
+ &serv->sv_permsocks);
+ spin_unlock_bh(&serv->sv_lock);
}
}
goto out;
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index d6f3c02..f1ea6f7 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -93,6 +93,7 @@ static int svc_deferred_recv(struct svc_
static struct cache_deferred_req *svc_defer(struct cache_req *req);
static struct svc_xprt *
svc_create_socket(struct svc_serv *, int, struct sockaddr *, int, int);
+static void svc_age_temp_xprts(unsigned long closure);

/* apparently the "standard" is that clients close
* idle connections after 5 minutes, servers after
@@ -1604,6 +1605,18 @@ svc_recv(struct svc_rqst *rqstp, long ti
*/
__module_get(newxpt->xpt_class->xcl_owner);
svc_check_conn_limits(xprt->xpt_server);
+ spin_lock_bh(&serv->sv_lock);
+ set_bit(XPT_TEMP, &newxpt->xpt_flags);
+ list_add(&newxpt->xpt_list, &serv->sv_tempsocks);
+ serv->sv_tmpcnt++;
+ if (serv->sv_temptimer.function == NULL) {
+ /* setup timer to age temp sockets */
+ setup_timer(&serv->sv_temptimer, svc_age_temp_xprts,
+ (unsigned long)serv);
+ mod_timer(&serv->sv_temptimer,
+ jiffies + svc_conn_age_period * HZ);
+ }
+ spin_unlock_bh(&serv->sv_lock);
}
svc_xprt_received(xprt);
} else {
@@ -1750,7 +1763,6 @@ static struct svc_sock *svc_setup_socket
struct svc_sock *svsk;
struct sock *inet;
int pmap_register = !(flags & SVC_SOCK_ANONYMOUS);
- int is_temporary = flags & SVC_SOCK_TEMPORARY;

dprintk("svc: svc_setup_socket %p\n", sock);
if (!(svsk = kzalloc(sizeof(*svsk), GFP_KERNEL))) {
@@ -1784,24 +1796,6 @@ static struct svc_sock *svc_setup_socket
else
svc_tcp_init(svsk, serv);

- spin_lock_bh(&serv->sv_lock);
- if (is_temporary) {
- set_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
- list_add(&svsk->sk_xprt.xpt_list, &serv->sv_tempsocks);
- serv->sv_tmpcnt++;
- if (serv->sv_temptimer.function == NULL) {
- /* setup timer to age temp sockets */
- setup_timer(&serv->sv_temptimer, svc_age_temp_xprts,
- (unsigned long)serv);
- mod_timer(&serv->sv_temptimer,
- jiffies + svc_conn_age_period * HZ);
- }
- } else {
- clear_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
- list_add(&svsk->sk_xprt.xpt_list, &serv->sv_permsocks);
- }
- spin_unlock_bh(&serv->sv_lock);
-
dprintk("svc: svc_setup_socket created %p (inet %p)\n",
svsk, svsk->sk_sk);

@@ -1832,6 +1826,12 @@ int svc_addsock(struct svc_serv *serv,
svc_xprt_received(&svsk->sk_xprt);
err = 0;
}
+ if (so->sk->sk_protocol == IPPROTO_TCP)
+ set_bit(XPT_LISTENER, &svsk->sk_xprt.xpt_flags);
+ clear_bit(XPT_TEMP, &svsk->sk_xprt.xpt_flags);
+ spin_lock_bh(&serv->sv_lock);
+ list_add(&svsk->sk_xprt.xpt_list, &serv->sv_permsocks);
+ spin_unlock_bh(&serv->sv_lock);
}
if (err) {
sockfd_put(so);


2007-10-01 19:28:33

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 26/35] svc: Make svc_sock_release svc_xprt_release


The svc_sock_release function only touches transport-independent fields.
Change the function to manipulate svc_xprt directly instead of the
transport-dependent svc_sock structure.

Signed-off-by: Tom Tucker <[email protected]>
---

net/sunrpc/svcsock.c | 15 +++++++--------
1 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index ab34bb2..68ae7a9 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -378,9 +378,9 @@ void svc_reserve(struct svc_rqst *rqstp,
}

static void
-svc_sock_release(struct svc_rqst *rqstp)
+svc_xprt_release(struct svc_rqst *rqstp)
{
- struct svc_sock *svsk = rqstp->rq_sock;
+ struct svc_xprt *xprt = rqstp->rq_xprt;

rqstp->rq_xprt->xpt_ops.xpo_release(rqstp);

@@ -388,7 +388,6 @@ svc_sock_release(struct svc_rqst *rqstp)
rqstp->rq_res.page_len = 0;
rqstp->rq_res.page_base = 0;

-
/* Reset response buffer and release
* the reservation.
* But first, check that enough space was reserved
@@ -401,9 +400,9 @@ svc_sock_release(struct svc_rqst *rqstp)

rqstp->rq_res.head[0].iov_len = 0;
svc_reserve(rqstp, 0);
- rqstp->rq_sock = NULL;
+ rqstp->rq_xprt = NULL;

- svc_xprt_put(&svsk->sk_xprt);
+ svc_xprt_put(xprt);
}

/*
@@ -1624,7 +1623,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
/* No data, incomplete (TCP) read, or accept() */
if (len == 0 || len == -EAGAIN) {
rqstp->rq_res.len = 0;
- svc_sock_release(rqstp);
+ svc_xprt_release(rqstp);
return -EAGAIN;
}
clear_bit(XPT_OLD, &svsk->sk_xprt.xpt_flags);
@@ -1644,7 +1643,7 @@ void
svc_drop(struct svc_rqst *rqstp)
{
dprintk("svc: socket %p dropped request\n", rqstp->rq_sock);
- svc_sock_release(rqstp);
+ svc_xprt_release(rqstp);
}

/*
@@ -1679,7 +1678,7 @@ svc_send(struct svc_rqst *rqstp)
else
len = xprt->xpt_ops.xpo_sendto(rqstp);
mutex_unlock(&xprt->xpt_mutex);
- svc_sock_release(rqstp);
+ svc_xprt_release(rqstp);

if (len == -ECONNREFUSED || len == -ENOTCONN || len == -EAGAIN)
return 0;


2007-10-01 19:28:38

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 30/35] svc: Removing remaining references to rq_sock in rqstp


This functionally empty patch removes rq_sock and the unnamed union
from the rqstp structure.

Signed-off-by: Tom Tucker <[email protected]>
---

include/linux/sunrpc/svc.h | 5 +----
net/sunrpc/svcsock.c | 38 ++++++++++++++++++++++++--------------
2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 40adc9d..04eb20e 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -204,10 +204,7 @@ union svc_addr_u {
struct svc_rqst {
struct list_head rq_list; /* idle list */
struct list_head rq_all; /* all threads list */
- union {
- struct svc_xprt * rq_xprt; /* transport ptr */
- struct svc_sock * rq_sock; /* socket ptr */
- };
+ struct svc_xprt * rq_xprt; /* transport ptr */
struct sockaddr_storage rq_addr; /* peer address */
size_t rq_addrlen;

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index f1ea6f7..0f57426 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -197,10 +197,12 @@ svc_release_skb(struct svc_rqst *rqstp)
struct svc_deferred_req *dr = rqstp->rq_deferred;

if (skb) {
+ struct svc_sock *svsk =
+ container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
rqstp->rq_xprt_ctxt = NULL;

dprintk("svc: service %p, releasing skb %p\n", rqstp, skb);
- skb_free_datagram(rqstp->rq_sock->sk_sk, skb);
+ skb_free_datagram(svsk->sk_sk, skb);
}
if (dr) {
rqstp->rq_deferred = NULL;
@@ -429,7 +431,7 @@ svc_wake_up(struct svc_serv *serv)
dprintk("svc: daemon %p woken up.\n", rqstp);
/*
svc_thread_dequeue(pool, rqstp);
- rqstp->rq_sock = NULL;
+ rqstp->rq_xprt = NULL;
*/
wake_up(&rqstp->rq_wait);
}
@@ -446,7 +448,9 @@ #define SVC_PKTINFO_SPACE \

static void svc_set_cmsg_data(struct svc_rqst *rqstp, struct cmsghdr *cmh)
{
- switch (rqstp->rq_sock->sk_sk->sk_family) {
+ struct svc_sock *svsk =
+ container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
+ switch (svsk->sk_sk->sk_family) {
case AF_INET: {
struct in_pktinfo *pki = CMSG_DATA(cmh);

@@ -479,7 +483,8 @@ static void svc_set_cmsg_data(struct svc
static int
svc_sendto(struct svc_rqst *rqstp, struct xdr_buf *xdr)
{
- struct svc_sock *svsk = rqstp->rq_sock;
+ struct svc_sock *svsk =
+ container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
struct socket *sock = svsk->sk_sock;
int slen;
union {
@@ -552,7 +557,7 @@ svc_sendto(struct svc_rqst *rqstp, struc
}
out:
dprintk("svc: socket %p sendto([%p %Zu... ], %d) = %d (addr %s)\n",
- rqstp->rq_sock, xdr->head[0].iov_base, xdr->head[0].iov_len,
+ svsk, xdr->head[0].iov_base, xdr->head[0].iov_len,
xdr->len, len, svc_print_addr(rqstp, buf, sizeof(buf)));

return len;
@@ -628,7 +633,8 @@ svc_recv_available(struct svc_sock *svsk
static int
svc_recvfrom(struct svc_rqst *rqstp, struct kvec *iov, int nr, int buflen)
{
- struct svc_sock *svsk = rqstp->rq_sock;
+ struct svc_sock *svsk =
+ container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
struct msghdr msg = {
.msg_flags = MSG_DONTWAIT,
};
@@ -711,7 +717,9 @@ svc_write_space(struct sock *sk)
static inline void svc_udp_get_dest_address(struct svc_rqst *rqstp,
struct cmsghdr *cmh)
{
- switch (rqstp->rq_sock->sk_sk->sk_family) {
+ struct svc_sock *svsk =
+ container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
+ switch (svsk->sk_sk->sk_family) {
case AF_INET: {
struct in_pktinfo *pki = CMSG_DATA(cmh);
rqstp->rq_daddr.addr.s_addr = pki->ipi_spec_dst.s_addr;
@@ -731,7 +739,8 @@ static inline void svc_udp_get_dest_addr
static int
svc_udp_recvfrom(struct svc_rqst *rqstp)
{
- struct svc_sock *svsk = rqstp->rq_sock;
+ struct svc_sock *svsk =
+ container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
struct svc_serv *serv = svsk->sk_xprt.xpt_server;
struct sk_buff *skb;
union {
@@ -1118,7 +1127,8 @@ failed:
static int
svc_tcp_recvfrom(struct svc_rqst *rqstp)
{
- struct svc_sock *svsk = rqstp->rq_sock;
+ struct svc_sock *svsk =
+ container_of(rqstp->rq_xprt, struct svc_sock, sk_xprt);
struct svc_serv *serv = svsk->sk_xprt.xpt_server;
int len;
struct kvec *vec;
@@ -1281,16 +1291,16 @@ svc_tcp_sendto(struct svc_rqst *rqstp)
reclen = htonl(0x80000000|((xbufp->len ) - 4));
memcpy(xbufp->head[0].iov_base, &reclen, 4);

- if (test_bit(XPT_DEAD, &rqstp->rq_sock->sk_xprt.xpt_flags))
+ if (test_bit(XPT_DEAD, &rqstp->rq_xprt->xpt_flags))
return -ENOTCONN;

sent = svc_sendto(rqstp, &rqstp->rq_res);
if (sent != xbufp->len) {
printk(KERN_NOTICE "rpc-srv/tcp: %s: %s %d when sending %d bytes - shutting down socket\n",
- rqstp->rq_sock->sk_xprt.xpt_server->sv_name,
+ rqstp->rq_xprt->xpt_server->sv_name,
(sent<0)?"got error":"sent only",
sent, xbufp->len);
- set_bit(XPT_CLOSE, &rqstp->rq_sock->sk_xprt.xpt_flags);
+ set_bit(XPT_CLOSE, &rqstp->rq_xprt->xpt_flags);
svc_xprt_enqueue(rqstp->rq_xprt);
sent = -EAGAIN;
}
@@ -1312,7 +1322,7 @@ svc_tcp_prep_reply_hdr(struct svc_rqst *
static int
svc_tcp_has_wspace(struct svc_xprt *xprt)
{
- struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
+ struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt);
struct svc_serv *serv = svsk->sk_xprt.xpt_server;
int required;

@@ -1655,7 +1665,7 @@ svc_recv(struct svc_rqst *rqstp, long ti
void
svc_drop(struct svc_rqst *rqstp)
{
- dprintk("svc: socket %p dropped request\n", rqstp->rq_sock);
+ dprintk("svc: xprt %p dropped request\n", rqstp->rq_xprt);
svc_xprt_release(rqstp);
}



2007-10-01 19:28:28

by Tom Tucker

[permalink] [raw]
Subject: [RFC, PATCH 28/35] svc: Make svc_age_temp_sockets svc_age_temp_transports


This function is transport independent. Change it to use svc_xprt directly
and change its name to reflect this.

Signed-off-by: Tom Tucker <[email protected]>
---

net/sunrpc/svcsock.c | 36 +++++++++++++++++++-----------------
1 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 573792f..d6f3c02 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1690,48 +1690,50 @@ svc_send(struct svc_rqst *rqstp)
* a mark-and-sweep algorithm.
*/
static void
-svc_age_temp_sockets(unsigned long closure)
+svc_age_temp_xprts(unsigned long closure)
{
struct svc_serv *serv = (struct svc_serv *)closure;
- struct svc_sock *svsk;
+ struct svc_xprt *xprt;
struct list_head *le, *next;
LIST_HEAD(to_be_aged);

- dprintk("svc_age_temp_sockets\n");
+ dprintk("svc_age_temp_xprts\n");

if (!spin_trylock_bh(&serv->sv_lock)) {
/* busy, try again 1 sec later */
- dprintk("svc_age_temp_sockets: busy\n");
+ dprintk("svc_age_temp_xprts: busy\n");
mod_timer(&serv->sv_temptimer, jiffies + HZ);
return;
}

list_for_each_safe(le, next, &serv->sv_tempsocks) {
- svsk = list_entry(le, struct svc_sock, sk_xprt.xpt_list);
+ xprt = list_entry(le, struct svc_xprt, xpt_list);

- if (!test_and_set_bit(XPT_OLD, &svsk->sk_xprt.xpt_flags))
+ /* First time through, just mark it OLD. Second time
+ * through, close it. */
+ if (!test_and_set_bit(XPT_OLD, &xprt->xpt_flags))
continue;
- if (atomic_read(&svsk->sk_xprt.xpt_ref.refcount) > 1
- || test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags))
+ if (atomic_read(&xprt->xpt_ref.refcount) > 1
+ || test_bit(XPT_BUSY, &xprt->xpt_flags))
continue;
- svc_xprt_get(&svsk->sk_xprt);
+ svc_xprt_get(xprt);
list_move(le, &to_be_aged);
- set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
- set_bit(XPT_DETACHED, &svsk->sk_xprt.xpt_flags);
+ set_bit(XPT_CLOSE, &xprt->xpt_flags);
+ set_bit(XPT_DETACHED, &xprt->xpt_flags);
}
spin_unlock_bh(&serv->sv_lock);

while (!list_empty(&to_be_aged)) {
le = to_be_aged.next;
- /* fiddling the sk_xprt.xpt_list node is safe 'cos we're XPT_DETACHED */
+ /* fiddling the xpt_list node is safe 'cos we're XPT_DETACHED */
list_del_init(le);
- svsk = list_entry(le, struct svc_sock, sk_xprt.xpt_list);
+ xprt = list_entry(le, struct svc_xprt, xpt_list);

- dprintk("queuing svsk %p for closing\n", svsk);
+ dprintk("queuing xprt %p for closing\n", xprt);

/* a thread will dequeue and close it soon */
- svc_xprt_enqueue(&svsk->sk_xprt);
- svc_xprt_put(&svsk->sk_xprt);
+ svc_xprt_enqueue(xprt);
+ svc_xprt_put(xprt);
}

mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
@@ -1789,7 +1791,7 @@ static struct svc_sock *svc_setup_socket
serv->sv_tmpcnt++;
if (serv->sv_temptimer.function == NULL) {
/* setup timer to age temp sockets */
- setup_timer(&serv->sv_temptimer, svc_age_temp_sockets,
+ setup_timer(&serv->sv_temptimer, svc_age_temp_xprts,
(unsigned long)serv);
mod_timer(&serv->sv_temptimer,
jiffies + svc_conn_age_period * HZ);
