Date: Wed, 26 Jul 2017 16:59:29 -0700 (PDT)
From: Stefano Stabellini <sstabellini@kernel.org>
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini, xen-devel@lists.xen.org, linux-kernel@vger.kernel.org, jgross@suse.com
Subject: Re: [PATCH v2 05/13] xen/pvcalls: implement bind command
In-Reply-To: <5978AD87.20504@oracle.com>
References: <1501017730-12797-1-git-send-email-sstabellini@kernel.org> <1501017730-12797-5-git-send-email-sstabellini@kernel.org> <5978AD87.20504@oracle.com>

On Wed, 26 Jul 2017, Boris Ostrovsky wrote:
> On 7/25/2017 5:22 PM, Stefano Stabellini wrote:
> > Send PVCALLS_BIND to the backend. Introduce a new structure, part of
> > struct sock_mapping, to store information specific to passive sockets.
> >
> > Introduce a status field to keep track of the status of the passive
> > socket.
> >
> > Introduce a waitqueue for the "accept" command (see the accept command
> > implementation): it is used to allow only one outstanding accept
> > command at any given time and to implement polling on the passive
> > socket. Introduce a flags field to keep track of in-flight accept and
> > poll commands.
> >
> > sock->sk->sk_send_head is not used for ip sockets: reuse the field to
> > store a pointer to the struct sock_mapping corresponding to the socket.
> >
> > Convert the struct socket pointer into an uint64_t and use it as id for
> > the socket to pass to the backend.
> >
> > Signed-off-by: Stefano Stabellini
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-front.c | 73 +++++++++++++++++++++++++++++++++++++++++++++
> >  drivers/xen/pvcalls-front.h |  3 ++
> >  2 files changed, 76 insertions(+)
> >
> > diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
> > index d0f5f42..af2ce20 100644
> > --- a/drivers/xen/pvcalls-front.c
> > +++ b/drivers/xen/pvcalls-front.c
> > @@ -59,6 +59,23 @@ struct sock_mapping {
> >  		wait_queue_head_t inflight_conn_req;
> >  	} active;
> > +	struct {
> > +	/* Socket status */
> > +#define PVCALLS_STATUS_UNINITALIZED  0
> > +#define PVCALLS_STATUS_BIND          1
> > +#define PVCALLS_STATUS_LISTEN        2
> > +		uint8_t status;
> > +	/*
> > +	 * Internal state-machine flags.
> > +	 * Only one accept operation can be inflight for a socket.
> > +	 * Only one poll operation can be inflight for a given socket.
> > +	 */
> > +#define PVCALLS_FLAG_ACCEPT_INFLIGHT 0
> > +#define PVCALLS_FLAG_POLL_INFLIGHT   1
> > +#define PVCALLS_FLAG_POLL_RET        2
> > +		uint8_t flags;
> > +		wait_queue_head_t inflight_accept_req;
> > +	} passive;
> >  	};
> >  };
> > @@ -292,6 +309,62 @@ int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
> >  	return ret;
> >  }
> > +int pvcalls_front_bind(struct socket *sock, struct sockaddr *addr, int addr_len)
> > +{
> > +	struct pvcalls_bedata *bedata;
> > +	struct sock_mapping *map = NULL;
> > +	struct xen_pvcalls_request *req;
> > +	int notify, req_id, ret;
> > +
> > +	if (!pvcalls_front_dev)
> > +		return -ENOTCONN;
> > +	if (addr->sa_family != AF_INET || sock->type != SOCK_STREAM)
> > +		return -ENOTSUPP;
> > +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
> > +
> > +	map = kzalloc(sizeof(*map), GFP_KERNEL);
> > +	if (map == NULL)
> > +		return -ENOMEM;
> > +
> > +	spin_lock(&bedata->pvcallss_lock);
> > +	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
> > +	if (RING_FULL(&bedata->ring) ||
> > +	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
> > +		kfree(map);
> > +		spin_unlock(&bedata->pvcallss_lock);
> > +		return -EAGAIN;
> > +	}
> > +	req = RING_GET_REQUEST(&bedata->ring, req_id);
> > +	req->req_id = req_id;
> > +	map->sock = sock;
> > +	req->cmd = PVCALLS_BIND;
> > +	req->u.bind.id = (uint64_t) sock;
> > +	memcpy(req->u.bind.addr, addr, sizeof(*addr));
> > +	req->u.bind.len = addr_len;
> > +
> > +	init_waitqueue_head(&map->passive.inflight_accept_req);
> > +
> > +	list_add_tail(&map->list, &bedata->socketpass_mappings);
> > +	WRITE_ONCE(sock->sk->sk_send_head, (void *)map);
> > +	map->active_socket = false;
> > +
> > +	bedata->ring.req_prod_pvt++;
> > +	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
> > +	spin_unlock(&bedata->pvcallss_lock);
> > +	if (notify)
> > +		notify_remote_via_irq(bedata->irq);
> > +
> > +	wait_event(bedata->inflight_req,
> > +		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
>
> This all looks very similar to previous patches. Can it be factored out?

You are right that the pattern is the same for all commands:

- get a request
- fill the request
- possibly do something else
- wait

However, each request is different: the structs and fields differ, and
there are spin_lock/spin_unlock calls intermingled with the rest. I am
not sure I can factor out much of this. Maybe I could introduce a static
inline or a macro as syntactic sugar to replace the wait call (a rough
sketch is below), but that's pretty much it, I think.

> Also, you've used wait_event_interruptible in socket() implementation. Why not
> here (and connect())?

My intention was to use wait_event to wait for replies everywhere, but I
missed some of the call sites in the conversion (early versions of the
code used wait_event_interruptible).

The reason to use wait_event is that it makes it easier to handle the
rsp slot in bedata (bedata->rsp[req_id]): in case of EINTR, nobody would
clear the response in bedata->rsp. With wait_event there is no such
problem, and if the backend returned EINTR we would handle it just fine,
like any other response.

I'll make sure to use wait_event when waiting for a response (like
here), and wait_event_interruptible elsewhere (like in recvmsg, where we
don't risk leaking a rsp slot).
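To be concrete, the syntactic sugar I have in mind would look roughly
like the sketch below. It is not part of this series and the helper
names are made up; it only wraps the ring-slot check and the "wait for
the reply, read ret, recycle the rsp slot" tail that every synchronous
command repeats:

/*
 * Rough sketch, not part of this series: reserve a ring slot for a
 * new request. Caller must hold pvcallss_lock. Name is made up.
 */
static inline int pvcalls_front_get_request(struct pvcalls_bedata *bedata,
					    int *req_id)
{
	*req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
	if (RING_FULL(&bedata->ring) ||
	    READ_ONCE(bedata->rsp[*req_id].req_id) != PVCALLS_INVALID_ID)
		return -EAGAIN;
	return 0;
}

/*
 * Rough sketch: wait (uninterruptibly, so that on a signal nobody is
 * left holding a rsp slot that never gets cleared) for the backend
 * reply, read the return value, then mark the rsp slot as reusable.
 */
static inline int pvcalls_front_wait_for_response(struct pvcalls_bedata *bedata,
						  int req_id)
{
	int ret;

	wait_event(bedata->inflight_req,
		   READ_ONCE(bedata->rsp[req_id].req_id) == req_id);

	ret = bedata->rsp[req_id].ret;
	/* read ret, then set this rsp slot to be reused */
	smp_mb();
	WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);

	return ret;
}

Filling the command-specific request fields would still have to stay
open-coded in each caller, which is why I don't think the factoring buys
us much more than this.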
> > +
> > +	map->passive.status = PVCALLS_STATUS_BIND;
> > +	ret = bedata->rsp[req_id].ret;
> > +	/* read ret, then set this rsp slot to be reused */
> > +	smp_mb();
> > +	WRITE_ONCE(bedata->rsp[req_id].req_id, PVCALLS_INVALID_ID);
> > +	return 0;
> > +}
> > +
> >  static const struct xenbus_device_id pvcalls_front_ids[] = {
> >  	{ "pvcalls" },
> >  	{ "" }
> > diff --git a/drivers/xen/pvcalls-front.h b/drivers/xen/pvcalls-front.h
> > index 63b0417..8b0a274 100644
> > --- a/drivers/xen/pvcalls-front.h
> > +++ b/drivers/xen/pvcalls-front.h
> > @@ -6,5 +6,8 @@
> >  int pvcalls_front_socket(struct socket *sock);
> >  int pvcalls_front_connect(struct socket *sock, struct sockaddr *addr,
> >  			  int addr_len, int flags);
> > +int pvcalls_front_bind(struct socket *sock,
> > +		       struct sockaddr *addr,
> > +		       int addr_len);
> >  #endif
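One more note on the sk_send_head trick, since the later patches rely on
it: once bind has stashed the mapping pointer, subsequent commands can
recover it straight from the struct socket. Something along these lines
(a sketch with a made-up helper name, not code taken from the series):

/*
 * Sketch: recover the sock_mapping stashed in sk_send_head by bind.
 * sk_send_head is unused for these sockets, so we reuse it as a
 * back-pointer (see the WRITE_ONCE in pvcalls_front_bind above).
 */
static inline struct sock_mapping *pvcalls_sock_to_map(struct socket *sock)
{
	return (struct sock_mapping *)READ_ONCE(sock->sk->sk_send_head);
}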