From: KY Srinivasan
To: Vitaly Kuznetsov, "devel@linuxdriverproject.org"
CC: "linux-kernel@vger.kernel.org", "Haiyang Zhang", "Alex Ng (LIS)", "Radim Krcmar", Cathy Avery
Subject: RE: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
Date: Fri, 18 Mar 2016 18:02:53 +0000
References: <1458304404-8347-1-git-send-email-vkuznets@redhat.com>
In-Reply-To: <1458304404-8347-1-git-send-email-vkuznets@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, March 18, 2016 5:33 AM
> To: devel@linuxdriverproject.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan; Haiyang Zhang;
> Alex Ng (LIS); Radim Krcmar; Cathy Avery
> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> delivered to CPU0 regardless of what CPU we're sending CHANNELMSG_UNLOAD
> from.
> vmbus_wait_for_unload() doesn't account for the fact that in case
> we're crashing on some other CPU and CPU0 is still alive and operational
> CHANNELMSG_UNLOAD_RESPONSE will be delivered there completing
> vmbus_connection.unload_event, our wait on the current CPU will never
> end.

What was the host you were testing on?

K. Y

>
> Do the following:
> 1) Check for completion_done() in the loop. In case interrupt handler is
>    still alive we'll get the confirmation we need.
>
> 2) Always read CPU0's message page as CHANNELMSG_UNLOAD_RESPONSE will be
>    delivered there. We can race with still-alive interrupt handler doing
>    the same but we don't care as we're checking completion_done() now.
>
> 3) Cleanup message pages on all CPUs. This is required (at least for the
>    current CPU as we're clearing CPU0 messages now but we may want to bring
>    up additional CPUs on crash) as new messages won't be delivered till we
>    consume what's pending. On boot we'll place message pages somewhere else
>    and we won't be able to read stale messages.
>
> Signed-off-by: Vitaly Kuznetsov
> ---
>  drivers/hv/channel_mgmt.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index b10e8f74..5f37057 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -512,14 +512,26 @@ static void init_vp_index(struct vmbus_channel *channel, const uuid_le *type_gui
>
>  static void vmbus_wait_for_unload(void)
>  {
> -	int cpu = smp_processor_id();
> -	void *page_addr = hv_context.synic_message_page[cpu];
> +	int cpu;
> +	void *page_addr = hv_context.synic_message_page[0];
>  	struct hv_message *msg = (struct hv_message *)page_addr +
>  				  VMBUS_MESSAGE_SINT;
>  	struct vmbus_channel_message_header *hdr;
>  	bool unloaded = false;
>
> -	while (1) {
> +	/*
> +	 * CHANNELMSG_UNLOAD_RESPONSE is always delivered to CPU0. When we're
> +	 * crashing on a different CPU let's hope that IRQ handler on CPU0 is
> +	 * still functional and vmbus_unload_response() will complete
> +	 * vmbus_connection.unload_event. If not, the last thing we can do is
> +	 * read message page for CPU0 regardless of what CPU we're on.
> +	 */
> +	while (!unloaded) {
> +		if (completion_done(&vmbus_connection.unload_event)) {
> +			unloaded = true;
> +			break;
> +		}
> +
>  		if (READ_ONCE(msg->header.message_type) == HVMSG_NONE) {
>  			mdelay(10);
>  			continue;
> @@ -530,9 +542,17 @@ static void vmbus_wait_for_unload(void)
>  			unloaded = true;
>
>  		vmbus_signal_eom(msg);
> +	}
>
> -		if (unloaded)
> -			break;
> +	/*
> +	 * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
> +	 * maybe-pending messages on all CPUs to be able to receive new
> +	 * messages after we reconnect.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		page_addr = hv_context.synic_message_page[cpu];
> +		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
> +		msg->header.message_type = HVMSG_NONE;
>  	}
>  }
>
> --
> 2.5.0
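
The three steps in the commit message can be sketched outside the kernel as a small user-space C model. This is only an illustration under stated assumptions, not the driver code: struct sim_state, sim_wait_for_unload(), the SIM_MSG_* values and the max_polls bound are all hypothetical names introduced here. A plain flag stands in for completion_done(), clearing the slot stands in for vmbus_signal_eom(), and the mdelay(10) back-off is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real Hyper-V message values. */
enum { SIM_MSG_NONE = 0, SIM_MSG_OTHER = 1, SIM_MSG_UNLOAD_RESPONSE = 2 };

#define SIM_MAX_CPUS 8

struct sim_state {
	int msg_type[SIM_MAX_CPUS];	/* one pending-message slot per CPU */
	bool unload_done;		/* models completion_done(&unload_event) */
};

/*
 * Model of the patched wait loop. Returns how many times CPU0's slot was
 * polled, or -1 if max_polls was exhausted (the kernel loop has no such
 * bound; it is added here so the model always terminates).
 */
int sim_wait_for_unload(struct sim_state *s, int ncpus, int max_polls)
{
	bool unloaded = false;
	int polls = 0;
	int cpu;

	assert(ncpus > 0 && ncpus <= SIM_MAX_CPUS);

	while (!unloaded) {
		/* Step 1: a still-alive handler on CPU0 may have finished the job. */
		if (s->unload_done) {
			unloaded = true;
			break;
		}

		if (polls++ >= max_polls)
			return -1;

		/* Step 2: always look at CPU0's slot, whatever CPU we run on. */
		if (s->msg_type[0] == SIM_MSG_NONE)
			continue;	/* the kernel would mdelay(10) here */

		if (s->msg_type[0] == SIM_MSG_UNLOAD_RESPONSE)
			unloaded = true;

		/* Consume the message so the next one can be delivered. */
		s->msg_type[0] = SIM_MSG_NONE;
	}

	/* Step 3: drop whatever may still be pending on every CPU. */
	for (cpu = 0; cpu < ncpus; cpu++)
		s->msg_type[cpu] = SIM_MSG_NONE;

	return polls;
}
```

In this model, a state with unload_done already set returns after zero polls and still clears every slot, while a state with the response pending in CPU0's slot returns after one poll.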