Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759333AbcDEPuX (ORCPT ); Tue, 5 Apr 2016 11:50:23 -0400 Received: from mail1.bemta8.messagelabs.com ([216.82.243.203]:20384 "EHLO mail1.bemta8.messagelabs.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752143AbcDEPuV convert rfc822-to-8bit (ORCPT ); Tue, 5 Apr 2016 11:50:21 -0400 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFtrHKsWRWlGSWpSXmKPExsVywNY2Q3fVPeZ wg3MPBC32nPnFbtG8eD2bxYcpd1ktLu+aw2bR+m8OuwOrx719h1k89s9dw+6x7tJ7Ro/3+66y eXzeJBfAGsWamZeUX5HAmjFjwSnGgoseFS++TmVpYNxn0cXIxSEksItRounmemYI5wCjxMIJr 1ggnPWMElPOv2PtYuTkYBMwlLi2bBsbiC0ioCJx4d9edpAiZoFdTBJL1z9jAUkIC9hKfLvcAV VkJ7H00kdmCNtIYsfZz4xdjBwcLEDN22dogYR5BXwkTqybCFYiJLCFSeLzdX0Qm1PAUWLjmzd gYxgFxCS+n1rDBGIzC4hL3HoyH8yWEBCQWLLnPDOELSrx8vE/VgjbQGLr0n0sELaSxM/vW1gg enUkFuz+xAZha0ssW/iaGeIGQYmTM5+wQNygInFy9z+2CYzis5Csm4WkfRaS9llI2hcwsqxiV C9OLSpLLdK10EsqykzPKMlNzMzRNTSw0MtNLS5OTE/NSUwq1kvOz93ECIxXBiDYwXi42fkQoy QHk5Io75JDzOFCfEn5KZUZicUZ8UWlOanFhxhlODiUJHhb7wLlBItS01Mr0jJzgIkDJi3BwaM kwhsHkuYtLkjMLc5Mh0idYlSUEucNAUkIgCQySvPg2mDJ6hKjrJQwLyPQIUI8BalFuZklqPKv GMU5GJWEebtApvBk5pXATX8FtJgJaHG9MBPI4pJEhJRUA+ManWVsOxo37PUTl9y/UvF6jobvW knpGgZLi7Z/gZ+TV+rNmnVgpnGNp9qE6Ra7crb8CvJqT3LadfCwtPLJTcULLdWVb4XyKd75p9 SnWsJ6+4t93oXqk+/1+168/PFmu0hcQNGqzU3/mJbuvpOn918rVpw19ecFQYVOpapA4Tm8X3a esGc5816JpTgj0VCLuag4EQAnmvTdUQMAAA== X-Env-Sender: Timothy.Sell@unisys.com X-Msg-Ref: server-13.tower-75.messagelabs.com!1459871383!16211748!11 X-Originating-IP: [192.61.61.104] X-StarScan-Received: X-StarScan-Version: 8.28; banners=-,-,- X-VirusChecked: Checked From: "Sell, Timothy C" To: Neil Horman CC: Iban Rodriguez , "Kershner, David A" , Greg Kroah-Hartman , *S-Par-Maintainer , "devel@driverdev.osuosl.org" , "linux-kernel@vger.kernel.org" Subject: RE: Staging: unisys/verisonic: Correct double unlock Thread-Topic: Staging: unisys/verisonic: Correct double unlock Thread-Index: AQHRj0uOXuAqawUJu0uZ4gzHlo5g1597gWLw Date: Tue, 5 Apr 2016 15:49:57 +0000 Message-ID: References: <1459619226-9786-1-git-send-email-iban.rodriguez@ono.com> <20160404143521.GB8807@hmsreliant.think-freely.org> <20160405145758.GD11269@hmsreliant.think-freely.org> In-Reply-To: <20160405145758.GD11269@hmsreliant.think-freely.org> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [172.17.125.147] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9257 Lines: 238 > -----Original Message----- > From: Neil Horman [mailto:nhorman@redhat.com] > Sent: Tuesday, April 05, 2016 10:58 AM > To: Sell, Timothy C > Cc: Iban Rodriguez; Kershner, David A; Greg Kroah-Hartman; Benjamin > Romer; *S-Par-Maintainer; devel@driverdev.osuosl.org; linux- > kernel@vger.kernel.org > Subject: Re: Staging: unisys/verisonic: Correct double unlock > > On Mon, Apr 04, 2016 at 09:40:13PM +0000, Sell, Timothy C wrote: > > > -----Original Message----- > > > From: Neil Horman [mailto:nhorman@redhat.com] > > > Sent: Monday, April 04, 2016 10:35 AM > > > To: Sell, Timothy C > > > Cc: Iban Rodriguez; Kershner, David A; Greg Kroah-Hartman; Benjamin > > > Romer; *S-Par-Maintainer; devel@driverdev.osuosl.org; linux- > > > kernel@vger.kernel.org > > > Subject: Re: Staging: unisys/verisonic: Correct double unlock > > > > > > On Sat, Apr 02, 2016 at 11:20:14PM +0000, Sell, Timothy C wrote: > > > > > -----Original Message----- > > > > > From: Iban Rodriguez [mailto:iban.rodriguez@ono.com] > > > > > Sent: Saturday, April 02, 2016 1:47 PM > > > > > To: Kershner, David A; Greg Kroah-Hartman; Benjamin Romer; Sell, > > > Timothy > > > > > C; Neil Horman > > > > > Cc: *S-Par-Maintainer; devel@driverdev.osuosl.org; linux- > > > > > kernel@vger.kernel.org; Iban Rodriguez > > > > > Subject: Staging: unisys/verisonic: Correct double unlock > > > > > > > > > > 'priv_lock' is unlocked twice. The first one is removed and > > > > > the function 'visornic_serverdown_complete' is now called with > > > > > 'priv_lock' locked because 'devdata' is modified inside. > > > > > > > > > > Signed-off-by: Iban Rodriguez > > > > > --- > > > > > drivers/staging/unisys/visornic/visornic_main.c | 1 - > > > > > 1 file changed, 1 deletion(-) > > > > > > > > > > diff --git a/drivers/staging/unisys/visornic/visornic_main.c > > > > > b/drivers/staging/unisys/visornic/visornic_main.c > > > > > index be0d057346c3..af03f2938fe9 100644 > > > > > --- a/drivers/staging/unisys/visornic/visornic_main.c > > > > > +++ b/drivers/staging/unisys/visornic/visornic_main.c > > > > > @@ -368,7 +368,6 @@ visornic_serverdown(struct visornic_devdata > > > > > *devdata, > > > > > } > > > > > devdata->server_change_state = true; > > > > > devdata->server_down_complete_func = complete_func; > > > > > - spin_unlock_irqrestore(&devdata->priv_lock, flags); > > > > > visornic_serverdown_complete(devdata); > > > > > } else if (devdata->server_change_state) { > > > > > dev_dbg(&devdata->dev->device, "%s changing state\n", > > > > > > > > I agree there is a bug here involving priv_lock being unlocked > > > > twice, but this patch isn't the appropriate fix. Reason is, we can NOT > > > > call visornic_serverdown_complete() while holding a spinlock > > > > (which is what this patch would cause to occur) because > > > > visornic_serverdown_complete() might block when it calls > > > > rtnl_lock() in this code sequence (rtnl_lock() grabs a mutex): > > > > > > > > rtnl_lock(); > > > > dev_close(netdev); > > > > rtnl_unlock(); > > > > > > > > Blocking with a spinlock held is always a bad idea. :-( > > > > > > > > > > You should just get rid of the priv_lock entirely, its not needed. > > > > > > priv_lock is used the following functions: > > > > > > visornic_serverdown - only called at the end of a tx_timeout reset > > > operation, so > > > you are sure that the rx and tx paths are quiesced (i.e. no data access > > > happening) > > > > > > visornic_disable_with_timeout - move the netif_stop_queue operation > to > > > the top > > > of this function and you will be guaranteed no concurrent access in the tx > > > path > > > > > > visornic_enable_with_timeout - same as above, make sure that > > > netif_start_queue > > > and napi_enable are at the end of the function and you are guarantted > no > > > concurrent access. > > > > > > visornic_xmit - The queue lock in the netdev_start_xmit routine > guarantees > > > you > > > single access here from multiple transmits. > > > > > > visornic_xmit_timeout - only called on a tx timeout, when you are > > > guaranteed not > > > to have concurrent transmit occuing, by definition. > > > > > > visornic_rx - the only tests made here are to devdata members that are > > > altered > > > in service_resp_queue, and the visornic_rx is only called from > > > service_resp_queue, so you are guaranteed a stable data structure, as > there > > > is > > > only ever one context in service_resp_queue as its called from the napi > poll > > > routine > > > > > > service_resp_queue - Same as above, for any given queue, > > > service_resp_queue only > > > has one context exectuing at once. > > > > > > host_side_disappeared - only called from visornic_remove, when > implies > > > that all > > > associated devices are closed already, guaranteeing single access. > > > > > > visornic_remove > > > visornic_resume - Both of these function only get called when all > network > > > interfaces are quiesced. > > > > > > just remove the lock and make the minor changes needed to guarantee > > > isolated > > > access. It makes the code cleaner and faster > > > > > > Neil > > > > Neil, > > > > Although I would also love to get rid of this lock, I think we still > > need it, and will attempt to explain. > > > > There's a thread of execution present in visornic that doesn't exist > > in traditional network drivers, which involves the visornic_pause() and > > visornic_resume() functions registered during: > > > > visorbus_register_visor_driver(&visornic_driver) > > > > visornic_pause() and visornic_resume() are called on a thread managed > > by visorbus, in response to messages received from our hypervisor > > back-end. > > > Ok, but you still can get away without the lock. The other lock points are all > in the tx/rx paths, so insted of holding the lock, stop the transmit queues > with > netif_tx_stop_all_queues, and pause the napi instance with napi_disable. > That > allows you to guarantee that the tx and rx paths have no execute going on, > and > you can complete the serverdown path safely. > > > Note that visornic_pause() calls visornic_serverdown(), which is one of > > the users of priv_lock. (I.e., visornic_serverdown() is called from > > other places besides the end of a tx_timeout reset operation, which is > > what you called out in your explanation above). We need priv_lock to > > do a false --> true transition of devdata->server_change_state in the > > pause/resume path, so we can prevent this transition from occurring > > during critical sections in the normal networking path. > > > > The comment present on priv_lock's declaration: > > > > spinlock_t priv_lock; /* spinlock to access devdata structures */ > > > > is indeed inadequate to the point of being misleading. > > > > visornic_serverdown() in its present form is hard-to-follow, in > > addition to having the double-unlock bug. I would prefer if it were > > corrected and rewritten to look like this (where the main-path falls > > thru down the left side of the screen): > > > Right, but the point of the lock is still to protect the devdata structure, and > there are ways to do so (in my previous email and point above), without > needing > a lock. You just have to ensure mutual exclusion > > Neil You convinced me; I like the idea. Would it be acceptable to fix the immediate double-unlock bug now, and address the lock-removal along with the other features we have queued up for after the driver gets out of staging (like interrupt support)? The lock removal will touch many of our code paths dealing with back-end hypervisor recovery scenarios, which is historically very time-consuming to debug and test. Plus it seems like this could be classified as a 'feature'. Thanks Neil. Tim > > > static int > > visornic_serverdown(struct visornic_devdata *devdata, > > visorbus_state_complete_func complete_func) > > { > > unsigned long flags; > > int err; > > > > spin_lock_irqsave(&devdata->priv_lock, flags); > > if (devdata->server_change_state) { > > dev_dbg(&devdata->dev->device, "%s changing state\n", > > __func__); > > err = -EINVAL; > > goto err_unlock; > > } > > if (devdata->server_down) { > > dev_dbg(&devdata->dev->device, "%s already down\n", > > __func__); > > err = -EINVAL; > > goto err_unlock; > > } > > if (devdata->going_away) { > > dev_dbg(&devdata->dev->device, > > "%s aborting because device removal pending\n", > > __func__); > > err = -ENODEV; > > goto err_unlock; > > } > > devdata->server_change_state = true; > > devdata->server_down_complete_func = complete_func; > > spin_unlock_irqrestore(&devdata->priv_lock, flags); > > > > visornic_serverdown_complete(devdata); > > return 0; > > > > err_unlock: > > spin_unlock_irqrestore(&devdata->priv_lock, flags); > > return err; > > } > > > > Tim > > > > > > > > > > -- > > > > > 1.9.1 > > > >