DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B44D421A95
Date: Mon, 11 Sep 2017 16:56:07 -0700 (PDT)
From: Stefano Stabellini <sstabellini@kernel.org>
To: Juergen Gross <jgross@suse.com>
cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>,
        Stefano Stabellini <sstabellini@kernel.org>, xen-devel@lists.xen.org,
        linux-kernel@vger.kernel.org, Stefano Stabellini <stefano@aporeto.com>
Subject: Re: [PATCH v2 11/13] xen/pvcalls: implement release command
In-Reply-To: <7ace9427-5215-6be7-907a-46dd15ea2a8f@suse.com>
Message-ID: <alpine.DEB.2.10.1709111648290.19719@sstabellini-ThinkPad-X260>
References: <alpine.DEB.2.10.1707251415190.22381@sstabellini-ThinkPad-X260> <1501017730-12797-1-git-send-email-sstabellini@kernel.org> <1501017730-12797-11-git-send-email-sstabellini@kernel.org> <81df7507-287b-ee06-89e4-463e82628d10@oracle.com>
 <alpine.DEB.2.10.1707311528470.22381@sstabellini-ThinkPad-X260> <c081688e-3d88-c6c2-f53f-e2b10641e8f1@oracle.com> <7ace9427-5215-6be7-907a-46dd15ea2a8f@suse.com>
User-Agent: Alpine 2.10 (DEB 1266 2009-07-14)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3161
Lines: 76

On Tue, 1 Aug 2017, Juergen Gross wrote:
> >>>> +	if (sock->sk == NULL)
> >>>> +		return 0;
> >>>> +
> >>>> +	map = (struct sock_mapping *) READ_ONCE(sock->sk->sk_send_head);
> >>>> +	if (map == NULL)
> >>>> +		return 0;
> >>>> +
> >>>> +	spin_lock(&bedata->pvcallss_lock);
> >>>> +	req_id = bedata->ring.req_prod_pvt & (RING_SIZE(&bedata->ring) - 1);
> >>>> +	if (RING_FULL(&bedata->ring) ||
> >>>> +	    READ_ONCE(bedata->rsp[req_id].req_id) != PVCALLS_INVALID_ID) {
> >>>> +		spin_unlock(&bedata->pvcallss_lock);
> >>>> +		return -EAGAIN;
> >>>> +	}
> >>>> +	WRITE_ONCE(sock->sk->sk_send_head, NULL);
> >>>> +
> >>>> +	req = RING_GET_REQUEST(&bedata->ring, req_id);
> >>>> +	req->req_id = req_id;
> >>>> +	req->cmd = PVCALLS_RELEASE;
> >>>> +	req->u.release.id = (uint64_t)sock;
> >>>> +
> >>>> +	bedata->ring.req_prod_pvt++;
> >>>> +	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&bedata->ring, notify);
> >>>> +	spin_unlock(&bedata->pvcallss_lock);
> >>>> +	if (notify)
> >>>> +		notify_remote_via_irq(bedata->irq);
> >>>> +
> >>>> +	wait_event(bedata->inflight_req,
> >>>> +		READ_ONCE(bedata->rsp[req_id].req_id) == req_id);
> >>>> +
> >>>> +	if (map->active_socket) {
> >>>> +		/* 
> >>>> +		 * Set in_error and wake up inflight_conn_req to force
> >>>> +		 * recvmsg waiters to exit.
> >>>> +		 */
> >>>> +		map->active.ring->in_error = -EBADF;
> >>>> +		wake_up_interruptible(&map->active.inflight_conn_req);
> >>>> +
> >>>> +		mutex_lock(&map->active.in_mutex);
> >>>> +		mutex_lock(&map->active.out_mutex);
> >>>> +		pvcalls_front_free_map(bedata, map);
> >>>> +		mutex_unlock(&map->active.out_mutex);
> >>>> +		mutex_unlock(&map->active.in_mutex);
> >>>> +		kfree(map);
> >>> Since you are locking here I assume you expect that someone else might
> >>> also be trying to lock the map. But you are freeing it immediately after
> >>> unlocking. Wouldn't that mean that whoever is trying to grab the lock
> >>> might then dereference freed memory?
> >> The lock is to make sure there are no recvmsg or sendmsg in progress. We
> >> are sure that no newer sendmsg or recvmsg are waiting for
> >> pvcalls_front_release to release the lock because before send a message
> >> to the backend we set sk_send_head to NULL.
> > 
> > Is there a chance that whoever is potentially calling send/rcvmsg has
> > checked that sk_send_head is non-NULL but hasn't grabbed the lock yet?
> > 
> > Freeing a structure containing a lock right after releasing the lock
> > looks weird (to me). Is there any other way to synchronize with
> > sender/receiver? Any other lock?
> 
> Right. This looks fishy. Either you don't need the locks or you can't
> just free the area right after releasing the lock.

I changed this code; you'll see it in the new patch series I am going
to send soon. There were two very similar mutex_unlock/kfree problems:

1) pvcalls_front_release
2) pvcalls_front_remove

For 2), I introduced a refcount. I only free the data structs when the
refcount reaches 0.
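
To illustrate the refcount idea in isolation, here is a minimal
userspace sketch using C11 atomics. It is not the code from the patch
series: struct bedata_sketch, bedata_sketch_get, bedata_sketch_put and
freed_count are hypothetical names, and the sketch assumes a caller
only takes a reference while it already holds a valid pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-frontend data; not the pvcalls struct. */
struct bedata_sketch {
	atomic_int refcount;	/* one reference per user, plus the owner's */
	int payload;
};

/* Number of objects freed so far, kept only so the sketch can be checked. */
static int freed_count;

static struct bedata_sketch *bedata_sketch_alloc(void)
{
	struct bedata_sketch *b = calloc(1, sizeof(*b));

	if (b)
		atomic_init(&b->refcount, 1);	/* the owner's reference */
	return b;
}

/* Take a reference before touching the struct. */
static void bedata_sketch_get(struct bedata_sketch *b)
{
	int old = atomic_fetch_add(&b->refcount, 1);

	assert(old > 0);	/* never revive an object that already hit 0 */
}

/* Drop a reference; returns true if this call freed the struct. */
static bool bedata_sketch_put(struct bedata_sketch *b)
{
	/* fetch_sub returns the previous value: 1 means we were the last. */
	if (atomic_fetch_sub(&b->refcount, 1) != 1)
		return false;
	free(b);
	freed_count++;
	return true;
}
```

With this shape, the remove path drops the owner's reference instead of
calling free directly, so the struct stays valid until the last user
that took a reference has dropped it.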

For 1), I could introduce a similar refcount, but instead I used
mutex_trylock, effectively relying on the internal count in in_mutex
and out_mutex to serve the same purpose.
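
As a rough illustration of using trylock as a "busy" indicator, here is
a minimal userspace sketch. It is an assumption about the general shape,
not the code from the patch series: struct sketch_mutex is a
hypothetical trylock-only lock built on C11 atomics standing in for the
kernel's struct mutex, and struct map_sketch and map_sketch_try_release
are made-up names. The sketch assumes new callers can no longer find the
map (as with sk_send_head set to NULL), and it does not by itself cover
a caller that has already read the pointer but not yet taken a lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical trylock-only lock standing in for the kernel's struct mutex. */
struct sketch_mutex {
	atomic_bool locked;
};

static bool sketch_mutex_trylock(struct sketch_mutex *m)
{
	/* exchange returns the previous value: false means we took the lock. */
	return !atomic_exchange_explicit(&m->locked, true, memory_order_acquire);
}

static void sketch_mutex_unlock(struct sketch_mutex *m)
{
	bool was_locked = atomic_exchange_explicit(&m->locked, false,
						   memory_order_release);

	assert(was_locked);	/* unlocking a lock nobody holds is a bug */
}

/* Hypothetical stand-in for struct sock_mapping. */
struct map_sketch {
	struct sketch_mutex in_mutex;	/* held while a receive is in progress */
	struct sketch_mutex out_mutex;	/* held while a send is in progress */
};

static struct map_sketch *map_sketch_alloc(void)
{
	struct map_sketch *map = malloc(sizeof(*map));

	if (map) {
		atomic_init(&map->in_mutex.locked, false);
		atomic_init(&map->out_mutex.locked, false);
	}
	return map;
}

/*
 * Free the map only if neither direction is busy. Returns 0 when the map
 * was freed and -EBUSY when a sender or receiver still holds a mutex, in
 * which case the caller is expected to retry later.
 */
static int map_sketch_try_release(struct map_sketch *map)
{
	if (!sketch_mutex_trylock(&map->in_mutex))
		return -EBUSY;
	if (!sketch_mutex_trylock(&map->out_mutex)) {
		sketch_mutex_unlock(&map->in_mutex);
		return -EBUSY;
	}
	/* Both held: nothing is in flight, and nobody else may use map now. */
	free(map);
	return 0;
}
```

A release path built this way would retry while map_sketch_try_release
returns -EBUSY, so the locked state of each mutex plays the role that an
explicit in-use count would otherwise play.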