Subject: Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
To: David Miller <davem@davemloft.net>
CC: <xen-devel@lists.xenproject.org>, <boris.ostrovsky@oracle.com>,
        <jgross@suse.com>, <netdev@vger.kernel.org>,
        <linux-kernel@vger.kernel.org>
References: <20180111093638.28937-1-ross.lagerwall@citrix.com>
 <20180111093638.28937-3-ross.lagerwall@citrix.com>
 <20180111.102622.769744562294438306.davem@davemloft.net>
From: Ross Lagerwall <ross.lagerwall@citrix.com>
Message-ID: <f297b1a4-d287-ba66-ad44-bf6073950bdf@citrix.com>
Date: Thu, 11 Jan 2018 15:49:07 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.4.0
MIME-Version: 1.0
In-Reply-To: <20180111.102622.769744562294438306.davem@davemloft.net>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org

On 01/11/2018 03:26 PM, David Miller wrote:
> From: Ross Lagerwall <ross.lagerwall@citrix.com>
> Date: Thu, 11 Jan 2018 09:36:38 +0000
> 
>> When a netfront device is set up it registers a netdev fairly early on,
>> before it has set up the queues and is actually usable. A userspace tool
>> like NetworkManager will immediately try to open it and access its state
>> as soon as it appears. The bug can be reproduced by hotplugging VIFs
>> until the VM runs out of grant refs. It registers the netdev but fails
>> to set up any queues (since there are no more grant refs). In the
>> meantime, NetworkManager opens the device and the kernel crashes trying
>> to access the queues (of which there are none).
>>
>> Fix this in two ways:
>> * For initial setup, register the netdev much later, after the queues
>> are set up. This avoids the race entirely.
>> * During a suspend/resume cycle, the frontend reconnects to the backend
>> and the queues are recreated. It is possible (though highly unlikely) to
>> race with something opening the device and accessing the queues after
>> they have been destroyed but before they have been recreated. Extend the
>> region covered by the rtnl semaphore to protect against this race. There
>> is a possibility that we fail to recreate the queues so check for this
>> in the open function.
>>
>> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> 
> Where is patch 1/2 and the 0/2 header posting which explains what this
> patch series is doing, how it is doing it, and why it is doing it that
> way?
> 

I've now CC'd netdev on the other two.

Cheers,
-- 
Ross Lagerwall