Subject: Re: [PATCH v2 11/13] xen/pvcalls: implement release command
To: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
        Stefano Stabellini <sstabellini@kernel.org>
Cc: xen-devel@lists.xen.org, linux-kernel@vger.kernel.org,
        Stefano Stabellini <stefano@aporeto.com>
References: <alpine.DEB.2.10.1707251415190.22381@sstabellini-ThinkPad-X260>
 <1501017730-12797-1-git-send-email-sstabellini@kernel.org>
 <1501017730-12797-11-git-send-email-sstabellini@kernel.org>
 <81df7507-287b-ee06-89e4-463e82628d10@oracle.com>
 <alpine.DEB.2.10.1707311528470.22381@sstabellini-ThinkPad-X260>
 <c081688e-3d88-c6c2-f53f-e2b10641e8f1@oracle.com>
From: Juergen Gross <jgross@suse.com>
Message-ID: <7ace9427-5215-6be7-907a-46dd15ea2a8f@suse.com>
Date: Tue, 1 Aug 2017 17:34:44 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <c081688e-3d88-c6c2-f53f-e2b10641e8f1@oracle.com>
Content-Type: text/plain; charset=windows-1252
Content-Language: de-DE
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3668
Lines: 97

On 01/08/17 17:23, Boris Ostrovsky wrote:
> On 07/31/2017 06:34 PM, Stefano Stabellini wrote:
>> On Thu, 27 Jul 2017, Boris Ostrovsky wrote:
>>>> +int pvcalls_front_release(struct socket *sock)
>>>> +{
>>>> +	struct pvcalls_bedata *bedata;
>>>> +	struct sock_mapping *map;
>>>> +	int req_id, notify;
>>>> +	struct xen_pvcalls_request *req;
>>>> +
>>>> +	if (!pvcalls_front_dev)
>>>> +		return -EIO;
>>>> +	bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
>>>> +	if (!bedata)
>>>> +		return -EIO;
>>> Some (all?) other ops don't check bedata validity. Should they all do?
>> No, I don't think they should: dev_set_drvdata is called in the probe
>> function (pvcalls_front_probe). I'll remove it.
>>
>>
>>>> +
>>>> +	if (sock->sk == NULL)
>>>> +		return 0;
>>>> +
>>>> +	map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
>>>> +	if (map == NULL)
>>>> +		return 0;
>>>> +
>>>> +	spin_lock(&bedata->pvcallss_lock);
>>>> +	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
>>>> +	if (RING_FULL(&bedata->ring) ||
>>>> +	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
>>>> +		spin_unlock(&bedata->pvcallss_lock);
>>>> +		return -EAGAIN;
>>>> +	}
>>>> +	WRITE_ONCE(sock->sk->sk_send_head, NULL);
>>>> +
>>>> +	req = RING_GET_REQUEST(&bedata->ring, req_id);
>>>> +	req->req_id = req_id;
>>>> +	req->cmd = PVCALLS_RELEASE;
>>>> +	req->u.release.id = (uint64_t)sock;
>>>> +
>>>> +	bedata->ring.req_prod_pvt++;
>>>> +	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
>>>> +	spin_unlock(&bedata->pvcallss_lock);
>>>> +	if (notify)
>>>> +		notify_remote_via_irq(bedata->irq);
>>>> +
>>>> +	wait_event(bedata->inflight_req,
>>>> +		READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
>>>> +
>>>> +	if (map->active_socket) {
>>>> +		/* 
>>>> +		 * Set in_error and wake up inflight_conn_req to force
>>>> +		 * recvmsg waiters to exit.
>>>> +		 */
>>>> +		map->active.ring->in_error = -EBADF;
>>>> +		wake_up_interruptible(&map->active.inflight_conn_req);
>>>> +
>>>> +		mutex_lock(&map->active.in_mutex);
>>>> +		mutex_lock(&map->active.out_mutex);
>>>> +		pvcalls_front_free_map(bedata, map);
>>>> +		mutex_unlock(&map->active.out_mutex);
>>>> +		mutex_unlock(&map->active.in_mutex);
>>>> +		kfree(map);
>>> Since you are locking here I assume you expect that someone else might
>>> also be trying to lock the map. But you are freeing it immediately after
>>> unlocking. Wouldn't that mean that whoever is trying to grab the lock
>>> might then dereference freed memory?
>> The lock is to make sure there are no recvmsg or sendmsg in progress. We
>> are sure that no newer sendmsg or recvmsg are waiting for
>> pvcalls_front_release to release the lock because, before sending a
>> message to the backend, we set sk_send_head to NULL.
> 
> Is there a chance that whoever is potentially calling sendmsg/recvmsg
> has checked that sk_send_head is non-NULL but hasn't grabbed the lock yet?
> 
> Freeing a structure containing a lock right after releasing the lock
> looks weird (to me). Is there any other way to synchronize with
> sender/receiver? Any other lock?

Right. This looks fishy. Either you don't need the locks, or you can't
just free the structure containing them right after releasing them.
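
One possible way out would be a reference count on the map, so that
whoever drops the last reference does the freeing. Below is only an
untested-in-kernel userspace sketch to illustrate the idea: demo_map,
demo_sock, lookup_lock and all demo_* functions are made-up names, the
pthread mutexes stand in for the kernel mutexes, and none of this is
taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sock_mapping; not the driver's layout. */
struct demo_map {
	atomic_int refcount;       /* one reference per current user */
	pthread_mutex_t in_mutex;  /* plays the role of active.in_mutex */
	int in_error;
};

/* Hypothetical stand-in for the socket; map plays the role of sk_send_head. */
struct demo_sock {
	pthread_mutex_t lookup_lock; /* protects the map pointer */
	struct demo_map *map;
};

static int demo_frees; /* counts frees, only so the sketch can be checked */

static void demo_map_put(struct demo_map *map)
{
	int old = atomic_fetch_sub(&map->refcount, 1);

	assert(old > 0);
	/* Whoever drops the last reference frees the map. */
	if (old == 1) {
		pthread_mutex_destroy(&map->in_mutex);
		free(map);
		demo_frees++;
	}
}

static int demo_sock_init(struct demo_sock *sock)
{
	struct demo_map *map = calloc(1, sizeof(*map));

	if (!map)
		return -ENOMEM;
	atomic_init(&map->refcount, 1); /* reference owned by the socket */
	pthread_mutex_init(&map->in_mutex, NULL);
	pthread_mutex_init(&sock->lookup_lock, NULL);
	sock->map = map;
	return 0;
}

/* recvmsg/sendmsg side: pin the map before touching its mutexes. */
static struct demo_map *demo_map_get(struct demo_sock *sock)
{
	struct demo_map *map;

	pthread_mutex_lock(&sock->lookup_lock);
	map = sock->map;
	if (map)
		atomic_fetch_add(&map->refcount, 1);
	pthread_mutex_unlock(&sock->lookup_lock);
	return map;
}

/* release side: unpublish, flag the error, drop the socket's reference. */
static void demo_release(struct demo_sock *sock)
{
	struct demo_map *map;

	pthread_mutex_lock(&sock->lookup_lock);
	map = sock->map;
	sock->map = NULL;
	pthread_mutex_unlock(&sock->lookup_lock);
	if (!map)
		return;
	pthread_mutex_lock(&map->in_mutex);
	map->in_error = -EBADF;
	pthread_mutex_unlock(&map->in_mutex);
	demo_map_put(map);
}
```

A reader that did demo_map_get() before demo_release() ran keeps the
map alive until its own demo_map_put(), so nobody ever locks or unlocks
a mutex living in freed memory. In the kernel the same pattern could
probably be built on kref or refcount_t, but that is an assumption on
my side, not something I have tried with this driver.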

> BTW, I also noticed that in recvmsg you are calling
> wait_event_interruptible() while holding the lock. Have you tested with
> CONFIG_DEBUG_ATOMIC_SLEEP? (or maybe it's some other config option that
> would complain about those sorts of things)

I believe sleeping while holding a mutex is allowed. It is sleeping in
spinlocked (atomic) paths that is bad, and that is what
CONFIG_DEBUG_ATOMIC_SLEEP complains about.

BTW: For checking mutex usage you are looking for CONFIG_DEBUG_MUTEXES
(see Documentation/locking/mutex-design.txt).
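
For a test run, a debug config fragment along these lines should cover
both cases. Treat it as a suggestion only: the selection of options is
mine, and CONFIG_PROVE_LOCKING is an extra I would add for lock
ordering checks, not something discussed above.

```
# Mutex semantics checks (see mutex-design.txt)
CONFIG_DEBUG_MUTEXES=y
# Reports sleeping in atomic context, e.g. with a spinlock held
CONFIG_DEBUG_ATOMIC_SLEEP=y
# Lockdep: checks lock ordering and related locking rules
CONFIG_PROVE_LOCKING=y
```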


Juergen