From: KY Srinivasan
To: Vitaly Kuznetsov
CC: devel@linuxdriverproject.org, linux-kernel@vger.kernel.org,
    Haiyang Zhang, "Alex Ng (LIS)", Radim Krcmar, Cathy Avery
Subject: RE: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
Date: Tue, 22 Mar 2016 14:18:05 +0000
References: <1458304404-8347-1-git-send-email-vkuznets@redhat.com>
    <874mc02rqd.fsf@vitty.brq.redhat.com>
    <87mvpq7gtg.fsf@vitty.brq.redhat.com>
In-Reply-To: <87mvpq7gtg.fsf@vitty.brq.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Tuesday, March 22, 2016 7:01 AM
> To: KY Srinivasan
> Cc: devel@linuxdriverproject.org; linux-kernel@vger.kernel.org;
> Haiyang Zhang; Alex Ng (LIS); Radim Krcmar; Cathy Avery
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> KY Srinivasan writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: Monday, March 21, 2016 12:52 AM
> >> To: KY Srinivasan
> >> Cc: devel@linuxdriverproject.org;
> >> linux-kernel@vger.kernel.org; Haiyang Zhang; Alex Ng (LIS);
> >> Radim Krcmar; Cathy Avery
> >> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> KY Srinivasan writes:
> >>
> >> >> -----Original Message-----
> >> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> >> Sent: Friday, March 18, 2016 5:33 AM
> >> >> To: devel@linuxdriverproject.org
> >> >> Cc: linux-kernel@vger.kernel.org; KY Srinivasan; Haiyang Zhang;
> >> >> Alex Ng (LIS); Radim Krcmar; Cathy Avery
> >> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >> >>
> >> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> >> >> delivered to CPU0 regardless of what CPU we're sending
> >> >> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account
> >> >> for the fact that in case we're crashing on some other CPU and
> >> >> CPU0 is still alive and operational, CHANNELMSG_UNLOAD_RESPONSE
> >> >> will be delivered there, completing vmbus_connection.unload_event,
> >> >> and our wait on the current CPU will never end.
> >> >
> >> > What was the host you were testing on?
> >> >
> >>
> >> I was testing on both 2012R2 and 2016TP4. The bug is easily
> >> reproducible by forcing a crash on a secondary CPU, e.g.:
> >
> > Prior to 2012 R2, all messages would be delivered on CPU0, and this
> > includes CHANNELMSG_UNLOAD_RESPONSE. For this reason we don't support
> > kexec on pre-2012 R2 hosts. From 2012 R2 on, all vmbus messages
> > (responses) will be delivered on the CPU that we initially set up;
> > look at the code in vmbus_negotiate_version(). So on post-2012 R2
> > hosts, CHANNELMSG_UNLOAD_RESPONSE will be delivered on the CPU where
> > we initiate the contact with the host, i.e. the CPU that sent the
> > CHANNELMSG_INITIATE_CONTACT message.
>
> Unfortunately there is a discrepancy between WS2012R2 and WS2016TP4.
> On WS2012R2 what you're saying is true and all messages, including
> CHANNELMSG_UNLOAD_RESPONSE, are delivered to the CPU we used for
> initial contact. On WS2016TP4 CHANNELMSG_UNLOAD_RESPONSE seems to be a
> special case and it is always delivered to CPU0, no matter which CPU we
> used for initial contact. This can be a host bug. You can use the
> attached patch to see the issue.

This looks like a host bug and I will try to get it addressed before
WS2016 ships.

>
> For now I can suggest we check message pages for all CPUs from
> vmbus_wait_for_unload(). We can race with other CPUs again but we don't
> care as we're checking for completion_done() in the loop as well. I'll
> try this approach.

Thank you.

K. Y

>
> --
>   Vitaly
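
The approach suggested at the end of the thread (check the message page of
every CPU from vmbus_wait_for_unload() while also testing completion_done()
in the loop) can be sketched in user-space C roughly as follows. This is an
illustration under stated assumptions, not the actual patch: every sketch_*
name, the CPU count, and the message-type value are hypothetical stand-ins,
and a plain bool models the state of vmbus_connection.unload_event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4              /* hypothetical CPU count */
#define SKETCH_MSG_NONE 0u            /* hypothetical "slot is empty" marker */
#define SKETCH_MSG_UNLOAD_RESPONSE 1u /* hypothetical value, not the real ABI */

enum { SKETCH_ALREADY_DONE = -1, SKETCH_TIMED_OUT = -2 };

/* Hypothetical stand-in for one CPU's message page slot. */
struct sketch_msg_slot {
	unsigned int message_type;
};

struct sketch_state {
	struct sketch_msg_slot slot[SKETCH_NR_CPUS];
	bool unload_done; /* models completion_done() on the unload event */
};

/*
 * Poll every CPU's slot instead of only the current CPU's, so the wait
 * ends no matter where the host delivered the unload response.
 * Returns the CPU whose slot held the response, SKETCH_ALREADY_DONE if
 * someone else completed the event first, or SKETCH_TIMED_OUT.
 */
static int sketch_wait_for_unload(struct sketch_state *st,
				  unsigned int max_spins)
{
	assert(st != NULL);

	for (unsigned int spin = 0; spin < max_spins; spin++) {
		/* Another CPU's handler may have won the race: stop waiting. */
		if (st->unload_done)
			return SKETCH_ALREADY_DONE;

		for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
			struct sketch_msg_slot *slot = &st->slot[cpu];

			if (slot->message_type != SKETCH_MSG_UNLOAD_RESPONSE)
				continue;

			/* Consume the message and mark the unload complete. */
			slot->message_type = SKETCH_MSG_NONE;
			st->unload_done = true;
			return cpu;
		}
	}
	return SKETCH_TIMED_OUT;
}
```

Checking the completion flag before scanning is what makes the race with
other CPUs harmless in this model: whichever side sees the response first
finishes the wait, and the other side simply observes that it is done.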