Date: Thu, 11 Jan 2018 10:26:22 -0500 (EST)
Message-Id: <20180111.102622.769744562294438306.davem@davemloft.net>
To: ross.lagerwall@citrix.com
Cc: xen-devel@lists.xenproject.org, boris.ostrovsky@oracle.com,
    jgross@suse.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
From: David Miller
In-Reply-To: <20180111093638.28937-3-ross.lagerwall@citrix.com>
References: <20180111093638.28937-1-ross.lagerwall@citrix.com>
    <20180111093638.28937-3-ross.lagerwall@citrix.com>

From: Ross Lagerwall
Date: Thu, 11 Jan 2018 09:36:38 +0000

> When a netfront device is set up, it registers a netdev fairly early
> on, before it has set up the queues and is actually usable. A
> userspace tool like NetworkManager will immediately try to open it
> and access its state as soon as it appears. The bug can be reproduced
> by hotplugging VIFs until the VM runs out of grant refs: the driver
> registers the netdev but fails to set up any queues (since there are
> no more grant refs). In the meantime, NetworkManager opens the device
> and the kernel crashes trying to access the queues (of which there
> are none).
>
> Fix this in two ways:
> * For initial setup, register the netdev much later, after the queues
>   are set up. This avoids the race entirely.
> * During a suspend/resume cycle, the frontend reconnects to the
>   backend and the queues are recreated. It is possible (though highly
>   unlikely) to race with something opening the device and accessing
>   the queues after they have been destroyed but before they have been
>   recreated. Extend the region covered by the rtnl semaphore to
>   protect against this race. There is a possibility that we fail to
>   recreate the queues, so check for this in the open function.
>
> Signed-off-by: Ross Lagerwall

Where are patch 1/2 and the 0/2 header posting, which explain what this
patch series is doing, how it is doing it, and why it is doing it that
way?

Thanks.
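For readers following the fix being described, here is a minimal sketch
of the open-path guard mentioned in the last point of the quoted commit
message. It is an illustration under assumptions, not the patch itself:
the netfront_info layout and the np->queues field are simplified stand-ins
modeled on the driver. The key property it relies on is that ndo_open is
always invoked with the rtnl semaphore held, so a resume path that
destroys and recreates the queues entirely under rtnl cannot interleave
with open.

#include <linux/netdevice.h>
#include <linux/errno.h>

/* Illustrative private state; the real driver's netfront_info holds
 * much more, but only the queue pointer matters for this sketch. */
struct netfront_info {
	struct netfront_queue *queues;	/* NULL until queues are set up */
	unsigned int num_queues;
};

static int xennet_open(struct net_device *dev)
{
	struct netfront_info *np = netdev_priv(dev);

	/* ndo_open is called with the rtnl semaphore held.  Resume
	 * destroys and recreates the queues while also holding rtnl,
	 * so if we get here and there are no queues, initial setup or
	 * reconnection failed and the device is not usable. */
	if (!np->queues)
		return -ENODEV;

	/* ... normal open path: enable NAPI, start the tx queues ... */
	return 0;
}

Together with registering the netdev only after queue setup succeeds,
a guard of this shape means the open path can never dereference a queue
array that does not exist.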