From: Vitaly Kuznetsov
To: Olaf Hering
Cc: kys@microsoft.com, gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org, devel@linuxdriverproject.org
Subject: Re: move hyperv CHANNELMSG_UNLOAD from crashed kernel to kdump kernel
Date: Thu, 15 Dec 2016 15:32:29 +0100
In-Reply-To: <20161215135111.GD6336@aepfle.de> (Olaf Hering's message of "Thu, 15 Dec 2016 14:51:12 +0100")
Message-ID: <87vaul2rgi.fsf@vitty.brq.redhat.com>

Olaf Hering writes:

> On Thu, Dec 15, Vitaly Kuznetsov wrote:
>
>> vmbus_wait_for_unload() may be receiving a message (not necessarily the
>> CHANNELMSG_UNLOAD_RESPONSE, we may see some other message) on the same
>> CPU it runs and in this case wrmsrl() makes sense. In other cases it
>> does nothing (neither good nor bad).
>
> If that other cpu has interrupts disabled it may not process a pending
> msg (the response may be stuck in the host queue?), and the loop can not
> kick the other cpus queue if a wrmsrl is just valid for the current cpu.
> If thats true, the response will not arrive in the loop.
>

In case interrupts get permanently disabled on the CPU which is supposed
to receive the CHANNELMSG_UNLOAD_RESPONSE message *and* there is some
other message pending in the slot for that CPU, we'll hang. We may try to
overcome this by sending NMIs but this is getting more and more
complicated...

I'd like to see a simple fix from the Hyper-V host team: always deliver
the CHANNELMSG_UNLOAD_RESPONSE reply to the CPU which sent the
CHANNELMSG_UNLOAD request. This would allow us to remove all the
craziness.

--
  Vitaly