From: Thomas Gleixner
To: Marc Dionne, Linux Kernel Mailing List
Cc: x86@kernel.org
Subject: Re: FreeNAS VM disk access errors, bisected to commit 6f1a4891a592
Date: Fri, 17 Apr 2020 22:19:58 +0200
Message-ID: <87d085zwy9.fsf@nanos.tec.linutronix.de>

Marc,

Marc Dionne writes:
> Commit 6f1a4891a592 ("x86/apic/msi: Plug non-maskable MSI affinity
> race") causes Linux VMs hosted on FreeNAS (bhyve hypervisor) to lose
> access to their disk devices shortly after boot. The disks are zfs
> zvols on the host, presented to each VM.
>
> Background: I recently updated some fedora 31 VMs running under the
> bhyve hypervisor (hosted on a FreeNAS mini), and they moved to a
> distro 5.5 kernel (5.5.15). Shortly after reboot, the disks became
> inaccessible with any operation getting EIO errors. Booting back into
> a 5.4 kernel, everything was fine. I built a 5.7-rc1 kernel, which
> showed the same symptoms, and was then able to bisect it down to
> commit 6f1a4891a592. Note that the symptoms do not occur on every
> boot, but often enough (roughly 80%) to make bisection possible.
>
> Applying a manual revert of 6f1a4891a592 on top of mainline from
> yesterday gives me a kernel that works fine.

We verified on real hardware and on various hypervisors that the fix
actually works correctly.

That makes me assume that the staged approach of changing affinity for
this non-maskable MSI mess is what makes your particular hypervisor
unhappy.

Are there any messages like this:

  "do_IRQ: 0.83 No irq handler for vector"

in dmesg on the Linux side? If so, they should show up before the disk
timeout happens.

I have absolutely zero knowledge about bhyve, so may I suggest talking
to the bhyve experts about this.

Thanks,

        tglx