Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932722AbbBDISq (ORCPT ); Wed, 4 Feb 2015 03:18:46 -0500 Received: from mail-bl2on0141.outbound.protection.outlook.com ([65.55.169.141]:35130 "EHLO na01-bl2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S932317AbbBDISo convert rfc822-to-8bit (ORCPT ); Wed, 4 Feb 2015 03:18:44 -0500 From: Dexuan Cui To: Vitaly Kuznetsov , KY Srinivasan , "devel@linuxdriverproject.org" CC: Haiyang Zhang , "linux-kernel@vger.kernel.org" , Jason Wang Subject: RE: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for vmbus channels Thread-Topic: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for vmbus channels Thread-Index: AQHQP9MLNTS2vfI7/0ubZqpZt0Ene5zgI+2w Date: Wed, 4 Feb 2015 08:18:17 +0000 Message-ID: References: <1422982839-3948-1-git-send-email-vkuznets@redhat.com> <1422982839-3948-2-git-send-email-vkuznets@redhat.com> In-Reply-To: <1422982839-3948-2-git-send-email-vkuznets@redhat.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [141.251.55.69] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-EOPAttributedMessage: 0 Authentication-Results: spf=pass (sender IP is 206.191.230.4) smtp.mailfrom=decui@microsoft.com; linuxdriverproject.org; dkim=none (message not signed) header.d=none; X-Forefront-Antispam-Report: CIP:206.191.230.4;CTRY:US;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(10019020)(6009001)(438002)(51704005)(377454003)(13464003)(164054003)(51914003)(2501002)(6806004)(19580405001)(86146001)(19580395003)(1511001)(55846006)(50986999)(54356999)(76176999)(66066001)(22746005)(22756005)(46406003)(47776003)(33656002)(92566002)(2421001)(104016003)(50466002)(87936001)(2656002)(46102003)(2950100001)(2900100001)(2920100001)(106116001)(102836002)(106466001)(86362001)(86612001)(77156002)(62966003)(97756001)(23726002)(79686002);DIR:OUT;SFP:1102;SCL:1;SRVR:CY1PR0301MB0844;H:064-smtp-out.microsoft.com;FPR:;SPF:Pass;MLV:sfv;LANG:en; X-Microsoft-Antispam: UriScan:; X-Microsoft-Antispam: BCL:0;PCL:0;RULEID:;SRVR:CY1PR0301MB0844; X-Exchange-Antispam-Report-Test: UriScan:; X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(601004);SRVR:CY1PR0301MB0844; X-Forefront-PRVS: 04772EA191 X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:;SRVR:CY1PR0301MB0844; X-OriginatorOrg: microsoft.onmicrosoft.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 04 Feb 2015 08:18:40.1259 (UTC) X-MS-Exchange-CrossTenant-Id: 72f988bf-86f1-41af-91ab-2d7cd011db47 X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=72f988bf-86f1-41af-91ab-2d7cd011db47;Ip=[206.191.230.4] X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY1PR0301MB0844 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9875 Lines: 295 > -----Original Message----- > From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com] > Sent: Wednesday, February 4, 2015 1:01 AM > To: KY Srinivasan; devel@linuxdriverproject.org > Cc: Haiyang Zhang; linux-kernel@vger.kernel.org; Dexuan Cui; Jason Wang > Subject: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for > vmbus channels > > free_channel() function frees the channel unconditionally so we need to make > sure nobody has any link to it. This is not trivial and there are several > examples of races we have: > > 1) In vmbus_onoffer_rescind() we check for channel existence with > relid2channel() and then use it. This can go wrong if we're in the middle > of channel removal (free_channel() was already called). > > 2) In process_chn_event() we check for channel existence with > pcpu_relid2channel() and then use it. This can also go wrong. > > 3) vmbus_free_channels() just frees all channels, in case we're in the middle > of vmbus_process_rescind_offer() crash is possible. > > The issue can be solved by holding vmbus_connection.channel_lock everywhere, > however, it looks like a way to deadlocks and performance degradation. Get/put > workflow fits here the best. > > Implement vmbus_get_channel()/vmbus_put_channel() pair instead of > free_channel(). > > Signed-off-by: Vitaly Kuznetsov > --- > drivers/hv/channel_mgmt.c | 45 > ++++++++++++++++++++++++++++++++++++++------- > drivers/hv/connection.c | 7 +++++-- > drivers/hv/hyperv_vmbus.h | 4 ++++ > include/linux/hyperv.h | 13 +++++++++++++ > 4 files changed, 60 insertions(+), 9 deletions(-) > > diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c > index 36bacc7..eb9ce94 100644 > --- a/drivers/hv/channel_mgmt.c > +++ b/drivers/hv/channel_mgmt.c > @@ -147,6 +147,8 @@ static struct vmbus_channel *alloc_channel(void) > return NULL; > > channel->id = atomic_inc_return(&chan_num); > + atomic_set(&channel->count, 1); > + > spin_lock_init(&channel->inbound_lock); > spin_lock_init(&channel->lock); > > @@ -178,19 +180,47 @@ static void release_channel(struct work_struct *work) > } > > /* > - * free_channel - Release the resources used by the vmbus channel object > + * vmbus_put_channel - Decrease the channel usage counter and release the > + * resources when this counter reaches zero. > */ > -static void free_channel(struct vmbus_channel *channel) > +void vmbus_put_channel(struct vmbus_channel *channel) > { > + unsigned long flags; > > /* > * We have to release the channel's workqueue/thread in the vmbus's > * workqueue/thread context > * ie we can't destroy ourselves. > */ > - INIT_WORK(&channel->work, release_channel); > - queue_work(vmbus_connection.work_queue, &channel->work); > + spin_lock_irqsave(&channel->lock, flags); > + if (atomic_dec_and_test(&channel->count)) { > + channel->dying = true; > + INIT_WORK(&channel->work, release_channel); > + spin_unlock_irqrestore(&channel->lock, flags); > + queue_work(vmbus_connection.work_queue, &channel->work); > + } else > + spin_unlock_irqrestore(&channel->lock, flags); > +} > +EXPORT_SYMBOL_GPL(vmbus_put_channel); > + > +/* vmbus_get_channel - Get additional reference to the channel */ > +struct vmbus_channel *vmbus_get_channel(struct vmbus_channel *channel) > +{ > + unsigned long flags; > + struct vmbus_channel *ret = NULL; > + > + if (!channel) > + return NULL; > + > + spin_lock_irqsave(&channel->lock, flags); > + if (!channel->dying) { > + atomic_inc(&channel->count); > + ret = channel; > + } > + spin_unlock_irqrestore(&channel->lock, flags); > + return ret; > } > +EXPORT_SYMBOL_GPL(vmbus_get_channel); > > static void percpu_channel_enq(void *arg) > { > @@ -253,7 +283,7 @@ static void vmbus_process_rescind_offer(struct > work_struct *work) > list_del(&channel->sc_list); > spin_unlock_irqrestore(&primary_channel->lock, flags); > } > - free_channel(channel); > + vmbus_put_channel(channel); > } > > void vmbus_free_channels(void) > @@ -262,7 +292,7 @@ void vmbus_free_channels(void) > > list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) { > vmbus_device_unregister(channel->device_obj); > - free_channel(channel); > + vmbus_put_channel(channel); > } > } > > @@ -391,7 +421,7 @@ done_init_rescind: > spin_unlock_irqrestore(&newchannel->lock, flags); > return; > err_free_chan: > - free_channel(newchannel); > + vmbus_put_channel(newchannel); > } > > enum { > @@ -549,6 +579,7 @@ static void vmbus_onoffer_rescind(struct > vmbus_channel_message_header *hdr) > queue_work(channel->controlwq, &channel->work); > > spin_unlock_irqrestore(&channel->lock, flags); > + vmbus_put_channel(channel); > } > > /* > diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c > index c4acd1c..d1ce134 100644 > --- a/drivers/hv/connection.c > +++ b/drivers/hv/connection.c > @@ -247,7 +247,8 @@ void vmbus_disconnect(void) > * Map the given relid to the corresponding channel based on the > * per-cpu list of channels that have been affinitized to this CPU. > * This will be used in the channel callback path as we can do this > - * mapping in a lock-free fashion. > + * mapping in a lock-free fashion. Takes additional reference to the > + * channel, all users are supposed to do vmbus_put_channel(). > */ > static struct vmbus_channel *pcpu_relid2channel(u32 relid) > { > @@ -263,7 +264,7 @@ static struct vmbus_channel *pcpu_relid2channel(u32 > relid) > } > } > > - return found_channel; > + return vmbus_get_channel(found_channel); > } > > /* > @@ -297,6 +298,7 @@ struct vmbus_channel *relid2channel(u32 relid) > } > } > } > + found_channel = vmbus_get_channel(found_channel); > spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags); > > return found_channel; > @@ -360,6 +362,7 @@ static void process_chn_event(u32 relid) > pr_err("no channel callback for relid - %u\n", relid); > } > > + vmbus_put_channel(channel); > } > > /* > diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h > index b055e53..40d70f0 100644 > --- a/drivers/hv/hyperv_vmbus.h > +++ b/drivers/hv/hyperv_vmbus.h > @@ -687,6 +687,10 @@ void vmbus_device_unregister(struct hv_device > *device_obj); > /* VmbusChildDeviceDestroy( */ > /* struct hv_device *); */ > > +/* > + * Get the channel by its relid. Takes additional reference to the channel so > + * all users are supposed to do vmbus_put_channel() when they're done. > + */ > struct vmbus_channel *relid2channel(u32 relid); > > void vmbus_free_channels(void); > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h > index e73cfeb..c576d2d 100644 > --- a/include/linux/hyperv.h > +++ b/include/linux/hyperv.h > @@ -649,6 +649,9 @@ struct vmbus_channel { > /* Unique channel id */ > int id; > > + /* Active reference count */ > + atomic_t count; > + > struct list_head listentry; > > struct hv_device *device_obj; > @@ -666,6 +669,7 @@ struct vmbus_channel { > u8 monitor_bit; > > bool rescind; /* got rescind msg */ > + bool dying; /* channel is dying */ > > u32 ringbuffer_gpadlhandle; > > @@ -907,6 +911,15 @@ extern int vmbus_recvpacket_raw(struct > vmbus_channel *channel, > > extern void vmbus_ontimer(unsigned long data); > > +/* > + * Decrease reference count for the channel. Frees the channel when its usage > + * count reaches zero. > + */ > +extern void vmbus_put_channel(struct vmbus_channel *channel); > + > +/* Get additional reference to the channel */ > +extern struct vmbus_channel *vmbus_get_channel(struct vmbus_channel > *channel); > + > /* Base driver object */ > struct hv_driver { > const char *name; > -- Hi Vitaly, Thanks for the patchset! I once tried to do the same work, but just didn't find enough time. :-) Please see my below questions: Since we always get channel->lock when accessing channel->count, I don't think the counter has to be of atomic_t? I suggest we add vmbus_get/put_channel() in vmbus_open/close(). In this way, IMO patch 3/4 and 4/4 would be unnecessary? In vmbus_exit(), I suspect we should swap the order of the 2 lines: vmbus_free_channels(); bus_unregister(&hv_bus); I suppose in bus_unregister(), the .remove callbacks of all the drivers are invoked, and vmbus_close() is invoked for every channel. BTW, I just noticed: alloc_channel() -> alloc_workqueue(,,max_active==1,): due to max_active==1 here, a channel's process_offer() and process_rescind_offer() is already serialized. I'm not sure if this fact can be used to simplify the logic? Is the previous commit "Drivers: hv: vmbus: serialize Offer and Rescind offer" necessary? BTW2, I think there is an unsafe race in the 2 paths (this is an existing issue, not related to this patch of yours): vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel -> list_for_each_entry(channel, pcpu_head, percpu_list) and vmbus_process_rescind_offer() -> percpu_channel_deq() -> list_del(&channel->percpu_list) because the former is running in tasklet and hence can preempt the latter (running in process context). percpu_channel_enq() should have the same issue(?) We should add local_bh_disable()/enable() accordingly? Can you please help to fix this? vmbus_exit() -> vmbus_free_channels() -> vmbus_put_channel() may have a race with vmbus_process_rescind_offer()? After vmbus_process_rescind_offer() -> vmbus_device_unregister(), the count can already be the initial 1, later vmbus_free_channels() -> vmbus_put_channel() can kfree the channel before vmbus_process_rescind_offer() completely finishes? Thanks, -- Dexuan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/