Subject: Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
To: David Miller
References: <20180111093638.28937-1-ross.lagerwall@citrix.com> <20180111093638.28937-3-ross.lagerwall@citrix.com> <20180111.102622.769744562294438306.davem@davemloft.net>
From: Ross Lagerwall
Date: Thu, 11 Jan 2018 15:49:07 +0000
In-Reply-To: <20180111.102622.769744562294438306.davem@davemloft.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 01/11/2018 03:26 PM, David Miller wrote:
> From: Ross Lagerwall
> Date: Thu, 11 Jan 2018 09:36:38 +0000
>
>> When a netfront device is set up it registers a netdev fairly early on,
>> before it has set up the queues and is actually usable. A userspace tool
>> like NetworkManager will immediately try to open it and access its state
>> as soon as it appears. The bug can be reproduced by hotplugging VIFs
>> until the VM runs out of grant refs. It registers the netdev but fails
>> to set up any queues (since there are no more grant refs). In the
>> meantime, NetworkManager opens the device and the kernel crashes trying
>> to access the queues (of which there are none).
>>
>> Fix this in two ways:
>> * For initial setup, register the netdev much later, after the queues
>> are setup. This avoids the race entirely.
>> * During a suspend/resume cycle, the frontend reconnects to the backend
>> and the queues are recreated. It is possible (though highly unlikely) to
>> race with something opening the device and accessing the queues after
>> they have been destroyed but before they have been recreated. Extend the
>> region covered by the rtnl semaphore to protect against this race. There
>> is a possibility that we fail to recreate the queues so check for this
>> in the open function.
>>
>> Signed-off-by: Ross Lagerwall
>
> Where is patch 1/2 and the 0/2 header posting which explains what this
> patch series is doing, how it is doing it, and why it is doing it that
> way?
>

I've now CC'd netdev on the other two.

Cheers,
--
Ross Lagerwall