Subject: Re: [PATCH 2/2] xen-netfront: Fix race between device setup and open
From: Boris Ostrovsky
To: Ross Lagerwall, xen-devel@lists.xenproject.org
Cc: Juergen Gross, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Tue, 16 Jan 2018 13:56:37 -0500
In-Reply-To: <20180111093638.28937-3-ross.lagerwall@citrix.com>

On 01/11/2018 04:36 AM, Ross Lagerwall wrote:
> When a netfront device is set up it registers a netdev fairly early on,
> before it has set up the queues and is actually usable. A userspace tool
> like NetworkManager will immediately try to open it and access its state
> as soon as it appears. The bug can be reproduced by hotplugging VIFs
> until the VM runs out of grant refs. It registers the netdev but fails
> to set up any queues (since there are no more grant refs).
> In the meantime, NetworkManager opens the device and the kernel crashes trying
> to access the queues (of which there are none).
>
> Fix this in two ways:
> * For initial setup, register the netdev much later, after the queues
> are set up. This avoids the race entirely.
> * During a suspend/resume cycle, the frontend reconnects to the backend
> and the queues are recreated. It is possible (though highly unlikely) to
> race with something opening the device and accessing the queues after
> they have been destroyed but before they have been recreated. Extend the
> region covered by the rtnl semaphore to protect against this race. There
> is a possibility that we fail to recreate the queues, so check for this
> in the open function.
>
> Signed-off-by: Ross Lagerwall

Reviewed-by: Boris Ostrovsky
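The two-part fix in the quoted commit message can be modelled in userspace as a rough sketch. This is not the patch itself: a pthread mutex stands in for the kernel's RTNL lock (rtnl_lock()/rtnl_unlock()), and `struct fake_netfront`, `fake_setup_queues()`, `fake_probe()`, `fake_reconnect()`, `dev_open()` and the `can_alloc` flag (simulating grant-ref exhaustion) are hypothetical names made up for illustration. It shows the two ideas: the device only becomes visible after its queues exist, and teardown plus rebuild on resume happen inside the same locked region the open path takes, with open refusing to proceed if the rebuild failed.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's RTNL lock (assumption for this model). */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

struct fake_queue {
	int id;
};

/* Hypothetical, heavily simplified stand-in for the driver's private state. */
struct fake_netfront {
	bool registered;            /* visible to userspace (e.g. NetworkManager) */
	struct fake_queue *queues;  /* NULL until set up, or after a failed rebuild */
	unsigned int num_queues;
};

/* Allocate queues; can_alloc == 0 simulates running out of grant refs. */
static int fake_setup_queues(struct fake_netfront *np, unsigned int n,
			     int can_alloc)
{
	np->queues = can_alloc ? calloc(n, sizeof(*np->queues)) : NULL;
	if (!np->queues) {
		np->num_queues = 0;
		return -ENOMEM;
	}
	np->num_queues = n;
	return 0;
}

/* Initial setup: queues first, "register" (become visible) only on success. */
static int fake_probe(struct fake_netfront *np, unsigned int n, int can_alloc)
{
	int err = fake_setup_queues(np, n, can_alloc);

	if (err)
		return err;         /* never registered, so nothing can open it */
	np->registered = true;
	return 0;
}

/* Open path; like a netdev's open callback, it runs with the (fake) RTNL held. */
static int dev_open(struct fake_netfront *np)
{
	int err = 0;

	pthread_mutex_lock(&fake_rtnl);
	if (!np->registered || !np->queues)
		err = -ENODEV;      /* not visible yet, or queues failed to come back */
	pthread_mutex_unlock(&fake_rtnl);
	return err;
}

/* Resume path: teardown and rebuild share one locked region, so dev_open()
 * sees either the old queues or the final outcome of the rebuild. */
static int fake_reconnect(struct fake_netfront *np, unsigned int n,
			  int can_alloc)
{
	int err;

	pthread_mutex_lock(&fake_rtnl);
	free(np->queues);
	np->queues = NULL;
	np->num_queues = 0;
	err = fake_setup_queues(np, n, can_alloc);
	pthread_mutex_unlock(&fake_rtnl);
	return err;
}
```

The single lock held across teardown and rebuild is what closes the window; the NULL check in the open path covers the remaining case where the rebuild itself fails, turning a crash into an ordinary open error.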