Subject: Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
To: Ross Lagerwall <ross.lagerwall@citrix.com>,
        xen-devel@lists.xenproject.org
Cc: Juergen Gross <jgross@suse.com>, netdev@vger.kernel.org,
        linux-kernel@vger.kernel.org
References: <20180111093638.28937-1-ross.lagerwall@citrix.com>
 <20180111093638.28937-3-ross.lagerwall@citrix.com>
From: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Message-ID: <de3cc880-e94e-304a-0d90-e5f90d21a734@oracle.com>
Date: Tue, 16 Jan 2018 13:56:37 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <20180111093638.28937-3-ross.lagerwall@citrix.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Content-Language: en-US
Sender: linux-kernel-owner@vger.kernel.org

On 01/11/2018 04:36 AM, Ross Lagerwall wrote:
> When a netfront device is set up it registers a netdev fairly early on,
> before it has set up the queues and is actually usable. A userspace tool
> like NetworkManager will immediately try to open it and access its state
> as soon as it appears. The bug can be reproduced by hotplugging VIFs
> until the VM runs out of grant refs. It registers the netdev but fails
> to set up any queues (since there are no more grant refs). In the
> meantime, NetworkManager opens the device and the kernel crashes trying
> to access the queues (of which there are none).
>
> Fix this in two ways:
> * For initial setup, register the netdev much later, after the queues
> are set up. This avoids the race entirely.
> * During a suspend/resume cycle, the frontend reconnects to the backend
> and the queues are recreated. It is possible (though highly unlikely) to
> race with something opening the device and accessing the queues after
> they have been destroyed but before they have been recreated. Extend the
> region covered by the rtnl semaphore to protect against this race. There
> is a possibility that we fail to recreate the queues so check for this
> in the open function.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
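
For anyone following the thread, the two fixes described in the commit
message can be illustrated with a minimal userspace sketch. This is not
the patch itself and not kernel code: every name below (model_dev,
model_setup, model_reconnect, model_open) is hypothetical, a pthread
mutex stands in for the rtnl lock, and the -ENODEV return value from the
open path is an assumption made for this sketch. It only shows the
ordering: create queues before registering, hold the lock across
destroy and recreate, and have open check that queues exist.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these names come from the patch. */
struct model_queue { int id; };

struct model_dev {
    pthread_mutex_t lock;        /* plays the role of the rtnl lock */
    struct model_queue *queues;  /* NULL while no queues exist */
    unsigned int num_queues;
    bool registered;             /* device is visible to openers only when true */
};

static void model_init(struct model_dev *dev)
{
    assert(dev != NULL);
    pthread_mutex_init(&dev->lock, NULL);
    dev->queues = NULL;
    dev->num_queues = 0;
    dev->registered = false;
}

/* Caller must hold dev->lock, or the device must not be registered yet. */
static void model_destroy_queues(struct model_dev *dev)
{
    free(dev->queues);
    dev->queues = NULL;
    dev->num_queues = 0;
}

/* 'fail' simulates running out of resources (e.g. grant refs). */
static int model_create_queues(struct model_dev *dev, unsigned int n, bool fail)
{
    struct model_queue *q;

    if (fail)
        return -ENOMEM;
    q = calloc(n, sizeof(*q));
    if (!q)
        return -ENOMEM;
    dev->queues = q;
    dev->num_queues = n;
    return 0;
}

/* Initial setup: create the queues first, register only on success. */
static int model_setup(struct model_dev *dev, unsigned int n, bool fail)
{
    int err = model_create_queues(dev, n, fail);

    if (err)
        return err;  /* never registered, so nothing can open it */
    pthread_mutex_lock(&dev->lock);
    dev->registered = true;
    pthread_mutex_unlock(&dev->lock);
    return 0;
}

/* Reconnect: hold the lock across both destroy and recreate. */
static int model_reconnect(struct model_dev *dev, unsigned int n, bool fail)
{
    int err;

    pthread_mutex_lock(&dev->lock);
    model_destroy_queues(dev);
    err = model_create_queues(dev, n, fail);
    pthread_mutex_unlock(&dev->lock);
    return err;
}

/* Open: refuse if the device is not registered or has no queues. */
static int model_open(struct model_dev *dev)
{
    int err = 0;

    pthread_mutex_lock(&dev->lock);
    if (!dev->registered || !dev->queues)
        err = -ENODEV;  /* error code chosen for this sketch */
    pthread_mutex_unlock(&dev->lock);
    return err;
}
```

In this model an opener can never observe the window between destroy and
recreate, because both happen under the same lock that model_open takes.
If recreation fails, the device stays registered but model_open reports
an error instead of touching missing queues.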