Subject: Re: [PATCH v2 11/13] xen/pvcalls: implement release command
To: Stefano Stabellini <sstabellini@kernel.org>
References: <alpine.DEB.2.10.1707251415190.22381@sstabellini-ThinkPad-X260>
 <1501017730-12797-1-git-send-email-sstabellini@kernel.org>
 <1501017730-12797-11-git-send-email-sstabellini@kernel.org>
 <81df7507-287b-ee06-89e4-463e82628d10@oracle.com>
 <alpine.DEB.2.10.1707311528470.22381@sstabellini-ThinkPad-X260>
Cc: xen-devel@lists.xen.org, linux-kernel@vger.kernel.org, jgross@suse.com,
        Stefano Stabellini <stefano@aporeto.com>
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Message-ID: <c081688e-3d88-c6c2-f53f-e2b10641e8f1@oracle.com>
Date: Tue, 1 Aug 2017 11:23:52 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <alpine.DEB.2.10.1707311528470.22381@sstabellini-ThinkPad-X260>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3224
Lines: 88

On 07/31/2017 06:34 PM, Stefano Stabellini wrote:
> On Thu, 27 Jul 2017, Boris Ostrovsky wrote:
>>> +int pvcalls_front_release(struct socket *sock)
>>> +{
>>> +	struct pvcalls_bedata *bedata;
>>> +	struct sock_mapping *map;
>>> +	int req_id, notify;
>>> +	struct xen_pvcalls_request *req;
>>> +
>>> +	if (!pvcalls_front_dev)
>>> +		return -EIO;
>>> +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
>>> +	if (!bedata)
>>> +		return -EIO;
>> Some (all?) other ops don't check bedata validity. Should they all do?
> No, I don't think they should: dev_set_drvdata is called in the probe
> function (pvcalls_front_probe). I'll remove it.
>
>
>>> +
>>> +	if (sock->sk == NULL)
>>> +		return 0;
>>> +
>>> +	map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
>>> +	if (map == NULL)
>>> +		return 0;
>>> +
>>> +	spin_lock(&bedata->pvcallss_lock);
>>> +	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
>>> +	if (RING_FULL(&bedata->ring) ||
>>> +	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
>>> +		spin_unlock(&bedata->pvcallss_lock);
>>> +		return -EAGAIN;
>>> +	}
>>> +	WRITE_ONCE(sock->sk->sk_send_head, NULL);
>>> +
>>> +	req = RING_GET_REQUEST(&bedata->ring, req_id);
>>> +	req->req_id = req_id;
>>> +	req->cmd = PVCALLS_RELEASE;
>>> +	req->u.release.id = (uint64_t)sock;
>>> +
>>> +	bedata->ring.req_prod_pvt++;
>>> +	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
>>> +	spin_unlock(&bedata->pvcallss_lock);
>>> +	if (notify)
>>> +		notify_remote_via_irq(bedata->irq);
>>> +
>>> +	wait_event(bedata->inflight_req,
>>> +		READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
>>> +
>>> +	if (map->active_socket) {
>>> +		/* 
>>> +		 * Set in_error and wake up inflight_conn_req to force
>>> +		 * recvmsg waiters to exit.
>>> +		 */
>>> +		map->active.ring->in_error = -EBADF;
>>> +		wake_up_interruptible(&map->active.inflight_conn_req);
>>> +
>>> +		mutex_lock(&map->active.in_mutex);
>>> +		mutex_lock(&map->active.out_mutex);
>>> +		pvcalls_front_free_map(bedata, map);
>>> +		mutex_unlock(&map->active.out_mutex);
>>> +		mutex_unlock(&map->active.in_mutex);
>>> +		kfree(map);
>> Since you are locking here I assume you expect that someone else might
>> also be trying to lock the map. But you are freeing it immediately after
>> unlocking. Wouldn't that mean that whoever is trying to grab the lock
>> might then dereference freed memory?
> The lock is to make sure there are no recvmsg or sendmsg in progress. We
> are sure that no newer sendmsg or recvmsg are waiting for
> pvcalls_front_release to release the lock because before send a message
> to the backend we set sk_send_head to NULL.

Is there a chance that whoever is potentially calling send/rcvmsg has
checked that sk_send_head is non-NULL but hasn't grabbed the lock yet?

Freeing a structure containing a lock right after releasing the lock
looks weird (to me). Is there any other way to synchronize with
sender/receiver? Any other lock?

BTW, I also noticed that in rcvmsg you are calling
wait_event_interruptible() while holding the lock. Have you tested with
CONFIG_DEBUG_ATOMIC_SLEEP? (or maybe it's some other config  option that
would complain about those sorts of thing)

-boris