From: Marc Dionne
Date: Fri, 17 Apr 2020 21:49:30 -0300
Subject: Re: FreeNAS VM disk access errors, bisected to commit 6f1a4891a592
To: Thomas Gleixner
Cc: Linux Kernel Mailing List, x86@kernel.org
In-Reply-To: <87d085zwy9.fsf@nanos.tec.linutronix.de>

On Fri, Apr 17, 2020 at 5:19 PM Thomas Gleixner wrote:
>
> Marc,
>
> Marc Dionne writes:
>
> > Commit 6f1a4891a592 ("x86/apic/msi: Plug non-maskable MSI affinity
> > race") causes Linux VMs hosted on FreeNAS (bhyve hypervisor) to lose
> > access to their disk devices shortly after boot. The disks are zfs
> > zvols on the host, presented to each VM.
> >
> > Background: I recently updated some fedora 31 VMs running under the
> > bhyve hypervisor (hosted on a FreeNAS mini), and they moved to a
> > distro 5.5 kernel (5.5.15). Shortly after reboot, the disks became
> > inaccessible with any operation getting EIO errors. Booting back into
> > a 5.4 kernel, everything was fine. I built a 5.7-rc1 kernel, which
> > showed the same symptoms, and was then able to bisect it down to
> > commit 6f1a4891a592. Note that the symptoms do not occur on every
> > boot, but often enough (roughly 80%) to make bisection possible.
> >
> > Applying a manual revert of 6f1a4891a592 on top of mainline from
> > yesterday gives me a kernel that works fine.
>
> we tested on real hardware and various hypervisors that the fix actually
> works correctly.
>
> That makes me assume that the staged approach of changing affinity for
> this non-maskable MSI mess makes your particular hypervisor unhappy.
>
> Are there any messages like this:
>
> "do_IRQ: 0.83 No irq handler for vector"

I haven't seen those, although I only have a VNC console that scrolls
by rather fast. I did see a report from someone running Ubuntu 18.04
which had this after the initial errors:

  do_IRQ: 2.35 No irq handler for vector
  ata1.00: revalidation failed (error=-5)

> in dmesg on the Linux side? If they happen then before the disk timeout
> happens.
>
> I have absolutely zero knowledge about bhyve, so may I suggest to talk
> to the bhyve experts about this.

I opened a ticket with iXsystems. I noticed several people reporting
the same problem in their community forums.

Marc
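
Since the console scrolls by too fast to read, one way to check for the
message Thomas asked about is to save the kernel log to a file and filter
it afterwards. The following is only a rough sketch: the function name
find_unhandled_vectors is made up for illustration, and reading the two
numbers as CPU and vector is my understanding of the message format, not
something confirmed in this thread.

```python
import re

# Matches the message quoted in this thread, for example:
#   do_IRQ: 2.35 No irq handler for vector
# Assumption: the two numbers are "<cpu>.<vector>".
IRQ_MSG = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def find_unhandled_vectors(log_text):
    """Return a list of (line_number, cpu, vector) for each matching line.

    log_text is the full kernel log as one string, e.g. saved dmesg output.
    Line numbers are 1-based positions within log_text.
    """
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        match = IRQ_MSG.search(line)
        if match:
            hits.append((lineno, int(match.group(1)), int(match.group(2))))
    return hits
```

Feeding it the saved log text shows whether any such message appears and
on which line, which also makes it possible to see if it comes before the
first disk error, as Thomas asked.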