Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752164AbdFNT1R (ORCPT ); Wed, 14 Jun 2017 15:27:17 -0400 Received: from mail.kernel.org ([198.145.29.99]:37710 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751759AbdFNT1Q (ORCPT ); Wed, 14 Jun 2017 15:27:16 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4A018219A7 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=sstabellini@kernel.org Date: Wed, 14 Jun 2017 12:27:14 -0700 (PDT) From: Stefano Stabellini X-X-Sender: sstabellini@sstabellini-ThinkPad-X260 To: Juergen Gross cc: Stefano Stabellini , xen-devel@lists.xen.org, linux-kernel@vger.kernel.org, boris.ostrovsky@oracle.com, Stefano Stabellini Subject: Re: [PATCH v3 11/18] xen/pvcalls: implement accept command In-Reply-To: <6749c353-5e63-77fb-2541-44703072e9ac@suse.com> Message-ID: References: <1496431915-20774-1-git-send-email-sstabellini@kernel.org> <1496431915-20774-11-git-send-email-sstabellini@kernel.org> <6749c353-5e63-77fb-2541-44703072e9ac@suse.com> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6882 Lines: 197 On Wed, 14 Jun 2017, Juergen Gross wrote: > On 14/06/17 02:47, Stefano Stabellini wrote: > > On Tue, 13 Jun 2017, Juergen Gross wrote: > >> On 02/06/17 21:31, Stefano Stabellini wrote: > >>> Implement the accept command by calling inet_accept. To avoid blocking > >>> in the kernel, call inet_accept(O_NONBLOCK) from a workqueue, which get > >>> scheduled on sk_data_ready (for a passive socket, it means that there > >>> are connections to accept). > >>> > >>> Use the reqcopy field to store the request. Accept the new socket from > >>> the delayed work function, create a new sock_mapping for it, map > >>> the indexes page and data ring, and reply to the other end. Allocate an > >>> ioworker for the socket. > >>> > >>> Only support one outstanding blocking accept request for every socket at > >>> any time. > >>> > >>> Add a field to sock_mapping to remember the passive socket from which an > >>> active socket was created. > >>> > >>> Signed-off-by: Stefano Stabellini > >>> CC: boris.ostrovsky@oracle.com > >>> CC: jgross@suse.com > >>> --- > >>> drivers/xen/pvcalls-back.c | 109 ++++++++++++++++++++++++++++++++++++++++++++- > >>> 1 file changed, 108 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c > >>> index a75586e..f1173f4 100644 > >>> --- a/drivers/xen/pvcalls-back.c > >>> +++ b/drivers/xen/pvcalls-back.c > >>> @@ -65,6 +65,7 @@ struct pvcalls_ioworker { > >>> struct sock_mapping { > >>> struct list_head list; > >>> struct pvcalls_fedata *priv; > >>> + struct sockpass_mapping *sockpass; > >>> struct socket *sock; > >>> uint64_t id; > >>> grant_ref_t ref; > >>> @@ -275,10 +276,79 @@ static int pvcalls_back_release(struct xenbus_device *dev, > >>> > >>> static void __pvcalls_back_accept(struct work_struct *work) > >>> { > >>> + struct sockpass_mapping *mappass = container_of( > >>> + work, struct sockpass_mapping, register_work); > >>> + struct sock_mapping *map; > >>> + struct pvcalls_ioworker *iow; > >>> + struct pvcalls_fedata *priv; > >>> + struct socket *sock; > >>> + struct xen_pvcalls_response *rsp; > >>> + struct xen_pvcalls_request *req; > >>> + int notify; > >>> + int ret = -EINVAL; > >>> + unsigned long flags; > >>> + > >>> + priv = mappass->priv; > >>> + /* We only need to check the value of "cmd" atomically on read. */ > >>> + spin_lock_irqsave(&mappass->copy_lock, flags); > >>> + req = &mappass->reqcopy; > >>> + if (req->cmd != PVCALLS_ACCEPT) { > >>> + spin_unlock_irqrestore(&mappass->copy_lock, flags); > >>> + return; > >>> + } > >>> + spin_unlock_irqrestore(&mappass->copy_lock, flags); > >> > >> What about: > >> req = &mappass->reqcopy; > >> if (ACCESS_ONCE(req->cmd) != PVCALLS_ACCEPT) > >> return; > >> > >> I can't see the need for taking a lock here. > > > > Sure, good idea > > > > > >>> + > >>> + sock = sock_alloc(); > >>> + if (sock == NULL) > >>> + goto out_error; > >>> + sock->type = mappass->sock->type; > >>> + sock->ops = mappass->sock->ops; > >>> + > >>> + ret = inet_accept(mappass->sock, sock, O_NONBLOCK, true); > >>> + if (ret == -EAGAIN) { > >>> + sock_release(sock); > >>> + goto out_error; > >>> + } > >>> + > >>> + map = pvcalls_new_active_socket(priv, > >>> + req->u.accept.id_new, > >>> + req->u.accept.ref, > >>> + req->u.accept.evtchn, > >>> + sock); > >>> + if (!map) { > >>> + sock_release(sock); > >>> + goto out_error; > >>> + } > >>> + > >>> + map->sockpass = mappass; > >>> + iow = &map->ioworker; > >>> + atomic_inc(&map->read); > >>> + atomic_inc(&map->io); > >>> + queue_work_on(iow->cpu, iow->wq, &iow->register_work); > >>> + > >>> +out_error: > >>> + rsp = RING_GET_RESPONSE(&priv->ring, priv->ring.rsp_prod_pvt++); > >>> + rsp->req_id = req->req_id; > >>> + rsp->cmd = req->cmd; > >>> + rsp->u.accept.id = req->u.accept.id; > >>> + rsp->ret = ret; > >>> + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&priv->ring, notify); > >>> + if (notify) > >>> + notify_remote_via_irq(priv->irq); > >>> + > >>> + spin_lock_irqsave(&mappass->copy_lock, flags); > >>> + mappass->reqcopy.cmd = 0; > >>> + spin_unlock_irqrestore(&mappass->copy_lock, flags); > >> > >> ACCESS_ONCE(mappass->reqcopy.cmd) = 0; > > > > OK > > > > > >>> } > >>> > >>> static void pvcalls_pass_sk_data_ready(struct sock *sock) > >>> { > >>> + struct sockpass_mapping *mappass = sock->sk_user_data; > >>> + > >>> + if (mappass == NULL) > >>> + return; > >>> + > >>> + queue_work(mappass->wq, &mappass->register_work); > >>> } > >>> > >>> static int pvcalls_back_bind(struct xenbus_device *dev, > >>> @@ -380,7 +450,44 @@ static int pvcalls_back_listen(struct xenbus_device *dev, > >>> static int pvcalls_back_accept(struct xenbus_device *dev, > >>> struct xen_pvcalls_request *req) > >>> { > >>> - return 0; > >>> + struct pvcalls_fedata *priv; > >>> + struct sockpass_mapping *mappass; > >>> + int ret = -EINVAL; > >>> + struct xen_pvcalls_response *rsp; > >>> + unsigned long flags; > >>> + > >>> + priv = dev_get_drvdata(&dev->dev); > >>> + > >>> + mappass = radix_tree_lookup(&priv->socketpass_mappings, > >>> + req->u.accept.id); > >>> + if (mappass == NULL) > >>> + goto out_error; > >>> + > >>> + /* > >>> + * Limitation of the current implementation: only support one > >>> + * concurrent accept or poll call on one socket. > >>> + */ > >>> + spin_lock_irqsave(&mappass->copy_lock, flags); > >>> + if (mappass->reqcopy.cmd != 0) { > >>> + spin_unlock_irqrestore(&mappass->copy_lock, flags); > >>> + ret = -EINTR; > >>> + goto out_error; > >>> + } > >>> + > >>> + mappass->reqcopy = *req; > >> > >> This time you need the lock, however you should use: > >> > >> ACCESS_ONCE(mappass->reqcopy) = *req; > > > > I don't think that guarantees atomic accesses to the cmd field of the > > struct. Shouldn't this be: > > > > ACCESS_ONCE(mappass->reqcopy.cmd) = req->cmd; > > mappass->reqcopy = *req; > > Hmm, what if the frontend changes cmd between those two accesses? This cannot happen because req is a copy of the guest request here. However, it is possible that __pvcalls_back_accept is racing against pvcalls_back_accept. In that case, I would need to make sure not only that cmd is written atomically, but now that I am thinking about this, that cmd is written *after* the rest of reqcopy: otherwise __pvcalls_back_accept could see a partially updated reqcopy. It would be possible to do this with atomic accesses and barriers, but I am thinking that it is not worth the effort. I am tempted to roll back to the previous version with spinlocks. > You either need another local buffer or you have to copy cmd via > ACCESS_ONCE() and the rest of *req separately (seems not to be > that hard: its just cmd, req_id and u). > > BTW: Maybe you should use READ_ONCE() and WRITE_ONCE() instead of > ACCESS_ONCE(), as those seem to be preferred nowadays.