From: KY Srinivasan
To: Vitaly Kuznetsov
CC: "devel@linuxdriverproject.org", "linux-kernel@vger.kernel.org", Haiyang Zhang, "Alex Ng (LIS)", "Radim Krcmar", Cathy Avery
Subject: RE: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
Date: Mon, 21 Mar 2016 22:44:41 +0000
References: <1458304404-8347-1-git-send-email-vkuznets@redhat.com> <874mc02rqd.fsf@vitty.brq.redhat.com>
In-Reply-To: <874mc02rqd.fsf@vitty.brq.redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Monday, March 21, 2016 12:52 AM
> To: KY Srinivasan
> Cc: devel@linuxdriverproject.org; linux-kernel@vger.kernel.org;
> Haiyang Zhang; Alex Ng (LIS); Radim Krcmar; Cathy Avery
> Subject: Re: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
>
> KY Srinivasan writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: Friday, March 18, 2016 5:33 AM
> >> To: devel@linuxdriverproject.org
> >> Cc: linux-kernel@vger.kernel.org; KY Srinivasan; Haiyang Zhang;
> >> Alex Ng (LIS); Radim Krcmar; Cathy Avery
> >> Subject: [PATCH] Drivers: hv: vmbus: handle various crash scenarios
> >>
> >> Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
> >> delivered to CPU0 regardless of what CPU we're sending
> >> CHANNELMSG_UNLOAD from. vmbus_wait_for_unload() doesn't account for
> >> the fact that in case we're crashing on some other CPU and CPU0 is
> >> still alive and operational CHANNELMSG_UNLOAD_RESPONSE will be
> >> delivered there completing vmbus_connection.unload_event, our wait
> >> on the current CPU will never end.
> >
> > What was the host you were testing on?
> >
>
> I was testing on both 2012R2 and 2016TP4. The bug is easily reproducible
> by forcing crash on a secondary CPU, e.g.:

Prior to 2012 R2, all messages were delivered on CPU0, and this includes
CHANNELMSG_UNLOAD_RESPONSE. For this reason we don't support kexec on
pre-2012 R2 hosts. From 2012 R2 on, all vmbus messages (responses) are
delivered on the CPU that we initially set up - look at the code in
vmbus_negotiate_version(). So on 2012 R2 and later hosts,
CHANNELMSG_UNLOAD_RESPONSE will be delivered on the CPU where we
initiated contact with the host (the CHANNELMSG_INITIATE_CONTACT
message).

So, maybe we can stash away the CPU on which we made the initial contact
and poll the state on that CPU to make forward progress in the case of a
crash.

Regards,

K. Y
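
To make the "stash the CPU and poll" idea concrete, here is a rough
userspace sketch, not kernel code. It assumes the per-CPU message page
can be modelled as a plain array with one slot per CPU; every name
prefixed with sketch_ is hypothetical and does not exist in
drivers/hv, and the message type values are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical message types; the values are illustrative, not the real enum. */
enum sketch_msgtype {
	SKETCH_MSG_NONE = 0,
	SKETCH_MSG_UNLOAD_RESPONSE = 1,
};

/* One pending-message slot per CPU, standing in for the per-CPU message page. */
struct sketch_connection {
	int connect_cpu;	/* CPU that sent the initial contact message */
	enum sketch_msgtype slot[SKETCH_NR_CPUS];
};

/* Stash the CPU used for the initial contact so the crash path can find it. */
static void sketch_initiate_contact(struct sketch_connection *c, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	c->connect_cpu = cpu;
	for (size_t i = 0; i < SKETCH_NR_CPUS; i++)
		c->slot[i] = SKETCH_MSG_NONE;
}

/* Simulated host: responses land on the CPU that made the initial contact. */
static void sketch_host_deliver(struct sketch_connection *c,
				enum sketch_msgtype msg)
{
	c->slot[c->connect_cpu] = msg;
}

/*
 * Crash path: instead of sleeping on a completion that another CPU would
 * have to signal, poll the stashed CPU's slot from whichever CPU is
 * crashing. Returns true if the unload response was seen within max_polls.
 */
static bool sketch_wait_for_unload(struct sketch_connection *c, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (c->slot[c->connect_cpu] == SKETCH_MSG_UNLOAD_RESPONSE) {
			c->slot[c->connect_cpu] = SKETCH_MSG_NONE; /* consume */
			return true;
		}
		/* A real implementation would delay between polls here. */
	}
	return false;
}
```

The point of the sketch is only that the waiter never depends on the
interrupt handler of another CPU running; it reads the state for the
stashed CPU directly and bounds the wait.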