Date: Wed, 21 Jun 2017 16:32:53 -0700 (PDT)
From: Stefano Stabellini
To: Boris Ostrovsky
cc: Stefano Stabellini, xen-devel@lists.xen.org, linux-kernel@vger.kernel.org, jgross@suse.com, Stefano Stabellini
Subject: Re: [PATCH v4 16/18] xen/pvcalls: implement read

On Wed, 21 Jun 2017, Boris Ostrovsky wrote:
> On 06/15/2017 03:09 PM, Stefano Stabellini wrote:
> > When an active socket has data available, increment the io and read
> > counters, and schedule the ioworker.
> >
> > Implement the read function by reading from the socket, writing the data
> > to the data ring.
> >
> > Set in_error on error.
> >
> > Signed-off-by: Stefano Stabellini
> > CC: boris.ostrovsky@oracle.com
> > CC: jgross@suse.com
> > ---
> >  drivers/xen/pvcalls-back.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 85 insertions(+)
> >
> > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > index b9a10b9..65d9eba 100644
> > --- a/drivers/xen/pvcalls-back.c
> > +++ b/drivers/xen/pvcalls-back.c
> > @@ -100,6 +100,81 @@ static int pvcalls_back_release_active(struct xenbus_device *dev,
> >
> >  static void pvcalls_conn_back_read(void *opaque)
> >  {
> > +	struct sock_mapping *map = (struct sock_mapping *)opaque;
> > +	struct msghdr msg;
> > +	struct kvec vec[2];
> > +	RING_IDX cons, prod, size, wanted, array_size, masked_prod, masked_cons;
> > +	int32_t error;
> > +	struct pvcalls_data_intf *intf = map->ring;
> > +	struct pvcalls_data *data = &map->data;
> > +	unsigned long flags;
> > +	int ret;
> > +
> > +	array_size = XEN_FLEX_RING_SIZE(map->ring_order);
>
> I noticed that in the next patch you call this 'ring_size'. Can you make
> those things consistent? (There may be more than just this variable and,
> in fact, perhaps some things can be factored out? There are code
> fragments that look similar)

Yes, I'll make them more consistent. I don't think we can actually share
code between the two functions, as they do different things.

> > +	cons = intf->in_cons;
> > +	prod = intf->in_prod;
> > +	error = intf->in_error;
> > +	/* read the indexes first, then deal with the data */
> > +	virt_mb();
> > +
> > +	if (error)
> > +		return;
> > +
> > +	size = pvcalls_queued(prod, cons, array_size);
> > +	if (size >= array_size)
> > +		return;
> > +	spin_lock_irqsave(&map->sock->sk->sk_receive_queue.lock, flags);
> > +	if (skb_queue_empty(&map->sock->sk->sk_receive_queue)) {
> > +		atomic_set(&map->read, 0);
> > +		spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock,
> > +				flags);
> > +		return;
> > +	}
> > +	spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock, flags);
> > +	wanted = array_size - size;
> > +	masked_prod = pvcalls_mask(prod, array_size);
> > +	masked_cons = pvcalls_mask(cons, array_size);
> > +
> > +	memset(&msg, 0, sizeof(msg));
> > +	msg.msg_iter.type = ITER_KVEC|WRITE;
> > +	msg.msg_iter.count = wanted;
> > +	if (masked_prod < masked_cons) {
> > +		vec[0].iov_base = data->in + masked_prod;
> > +		vec[0].iov_len = wanted;
> > +		msg.msg_iter.kvec = vec;
> > +		msg.msg_iter.nr_segs = 1;
> > +	} else {
> > +		vec[0].iov_base = data->in + masked_prod;
> > +		vec[0].iov_len = array_size - masked_prod;
> > +		vec[1].iov_base = data->in;
> > +		vec[1].iov_len = wanted - vec[0].iov_len;
> > +		msg.msg_iter.kvec = vec;
> > +		msg.msg_iter.nr_segs = 2;
> > +	}
>
>
> This is probably obvious to everyone but me but can you explain what is
> going on here? ;-)

We are setting up iovecs based on the "in" array (the write function
does the same for the "out" array). Then we pass the iovecs to
inet_recvmsg to do the IO. Depending on where the indexes fall in the
array, we need one iovec entry, or two if we have to wrap around the
circular buffer.

> > +
> > +	atomic_set(&map->read, 0);
>
> Is this not atomic_dec() by any chance?

It is meant to be atomic_set: the idea is that we are going to drain all
the data. If there is any remaining data after inet_recvmsg, we'll
increase map->read again.

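To make the one-or-two-iovec case above more concrete, here is a rough
user-space sketch of the same index arithmetic. It is only an
illustration, not the driver code: ring_mask() and ring_free_iov() are
hypothetical names, struct iovec stands in for struct kvec, the ring
size is assumed to be a power of two, and the queued count is computed
as a plain prod - cons on free-running indexes rather than through
pvcalls_queued().

```c
#include <assert.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical stand-in for pvcalls_mask(): ring_size is a power of two. */
static uint32_t ring_mask(uint32_t idx, uint32_t ring_size)
{
	return idx & (ring_size - 1);
}

/*
 * Describe the free space of a circular buffer, starting at the producer
 * index, as one or two iovec entries.  Returns the number of entries
 * filled in, or 0 if the ring is full.
 */
static int ring_free_iov(uint8_t *buf, uint32_t prod, uint32_t cons,
			 uint32_t ring_size, struct iovec vec[2])
{
	uint32_t queued = prod - cons;	/* assumption: free-running indexes */
	uint32_t wanted, masked_prod, masked_cons;

	assert(ring_size && (ring_size & (ring_size - 1)) == 0);

	if (queued >= ring_size)
		return 0;

	wanted = ring_size - queued;
	masked_prod = ring_mask(prod, ring_size);
	masked_cons = ring_mask(cons, ring_size);

	if (masked_prod < masked_cons) {
		/* Free space is contiguous: [masked_prod, masked_cons). */
		vec[0].iov_base = buf + masked_prod;
		vec[0].iov_len = wanted;
		return 1;
	}

	/* Free space wraps: tail of the buffer first, then its start. */
	vec[0].iov_base = buf + masked_prod;
	vec[0].iov_len = ring_size - masked_prod;
	vec[1].iov_base = buf;
	vec[1].iov_len = wanted - vec[0].iov_len;
	return 2;
}
```

For a 16-byte ring with prod = 18 and cons = 10, the free space is the
contiguous range of offsets 2..9, so one entry is enough. With prod = 12
and cons = 4 the free space is offsets 12..15 followed by 0..3, which
needs two entries.
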
> > +	ret = inet_recvmsg(map->sock, &msg, wanted, MSG_DONTWAIT);
> > +	WARN_ON(ret > wanted);
> > +	if (ret == -EAGAIN) /* shouldn't happen */
> > +		return;
> > +	if (!ret)
> > +		ret = -ENOTCONN;
> > +	spin_lock_irqsave(&map->sock->sk->sk_receive_queue.lock, flags);
> > +	if (ret > 0 && !skb_queue_empty(&map->sock->sk->sk_receive_queue))
> > +		atomic_inc(&map->read);
> > +	spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock, flags);
> > +
> > +	/* write the data, then modify the indexes */
> > +	virt_wmb();
> > +	if (ret < 0)
> > +		intf->in_error = ret;
> > +	else
> > +		intf->in_prod = prod + ret;
> > +	/* update the indexes, then notify the other end */
> > +	virt_wmb();
> > +	notify_remote_via_irq(map->irq);
> > +
> > +	return;
> >  }
> >
> >  static int pvcalls_conn_back_write(struct sock_mapping *map)
> > @@ -172,6 +247,16 @@ static void pvcalls_sk_state_change(struct sock *sock)
> >
> >  static void pvcalls_sk_data_ready(struct sock *sock)
> >  {
> > +	struct sock_mapping *map = sock->sk_user_data;
> > +	struct pvcalls_ioworker *iow;
> > +
> > +	if (map == NULL)
> > +		return;
> > +
> > +	iow = &map->ioworker;
> > +	atomic_inc(&map->read);
> > +	atomic_inc(&map->io);
> > +	queue_work(iow->wq, &iow->register_work);
> >  }
> >
> >  static struct sock_mapping *pvcalls_new_active_socket(
>
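
The handling of map->read in the patch (atomic_inc in the data-ready
callback, atomic_set to 0 before draining, atomic_inc again if data is
left over) can be modelled with a small user-space sketch. This is a toy
under loud assumptions, not the driver logic: struct conn,
on_data_ready() and read_pass() are hypothetical names, a plain byte
count stands in for sk_receive_queue, and there is no locking because
the model is single-threaded.

```c
#include <stdatomic.h>

/* Hypothetical model of one connection. */
struct conn {
	atomic_int read;	/* "there may be data to read" hint */
	int queued;		/* stand-in for the socket receive queue */
};

/* Models the data-ready callback: each arrival bumps the counter. */
static void on_data_ready(struct conn *c, int bytes)
{
	c->queued += bytes;
	atomic_fetch_add(&c->read, 1);
}

/*
 * Models one pass of the read worker.  The counter is reset rather than
 * decremented, because a single pass tries to drain everything that is
 * queued, however many data-ready events produced it.  It is raised
 * again only if the ring was too small to take all the data.
 * Returns the number of bytes moved.
 */
static int read_pass(struct conn *c, int ring_space)
{
	int n;

	if (c->queued == 0) {
		atomic_store(&c->read, 0);
		return 0;
	}

	atomic_store(&c->read, 0);
	n = c->queued < ring_space ? c->queued : ring_space;
	c->queued -= n;

	if (n > 0 && c->queued > 0)
		atomic_fetch_add(&c->read, 1);

	return n;
}
```

With two data-ready events of 10 bytes each, the counter reaches 2, and
a single pass with enough ring space moves all 20 bytes and leaves the
counter at 0. An atomic_dec would leave it at 1 and trigger a pointless
extra pass.
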