Date: Thu, 14 Sep 2023 10:02:05 -0700
From: Elliott Mitchell
To: Borislav Petkov
Cc: "Luck, Tony", Yazen Ghannam, smita.koralahallichannabasappa@amd.com, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, xen-devel@lists.xenproject.org, rric@kernel.org, james.morse@arm.com
Subject: Re: [PATCH] Revert "EDAC/mce_amd: Do not load edac_mce_amd module on guests"
References: <20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com> <20230908035911.GAZPqcD/EjfKZ0ISrZ@fat_crate.local>
In-Reply-To: <20230908035911.GAZPqcD/EjfKZ0ISrZ@fat_crate.local>

On Fri, Sep 08, 2023 at 05:59:11AM +0200, Borislav Petkov wrote:
> On Thu, Sep 07, 2023 at 08:08:00PM -0700, Elliott Mitchell wrote:
> > This reverts commit 767f4b620edadac579c9b8b6660761d4285fa6f9.
> >
> > There are at least 3 valid reasons why a VM may see MCE events/registers.
>
> Hmm, so they all read like a bunch of handwaving to me, with those
> probable hypothetical "may" formulations.

Indeed. At what point has the lack of information and response gone on
long enough to justify simply committing the revert?

Even with the commit message having been rewritten and the link to:
https://lkml.kernel.org/r/20210628172740.245689-1-Smita.KoralahalliChannabasappa@amd.com
added, this still reads as roughly:

"A hypothetical bug on a hypothetivisor"

I rather suspect a genuine issue was observed, but with absolutely no
detail this is useless. I can make some guesses, but those guesses'
relation to reality is dubious.


On Wed, Sep 13, 2023 at 03:50:12PM +0000, Luck, Tony wrote:
> > Also, please note that the EDAC modules don't handle MCE events
> > directly.
> > They act on information passed from the MCE subsystem.
> >
> > Furthermore, there are other EDAC modules that have the same !hypervisor
> > check, so why change only this one?
>
> The older Intel EDAC drivers translated system physical addresses to DIMM
> addresses by digging around in the CONFIG and MMIO space of the memory
> controller devices. It would seem unwise for a VMM to give access to those
> addresses to a guest (in general ... perhaps OK for a Xen style "DOM0" guest that is
> handling many tasks for the VMM?).

Which seems oddly similar to:
"the Linux kernel may be handling administrative duties/hardware for a
hypervisor. In this case, the events need to be processed and
potentially passed back through the hypervisor."


On Wed, Sep 13, 2023 at 12:21:50PM -0400, Yazen Ghannam wrote:
> The MCE decoder may access some newer MCA registers, or request info
> from the MCE subsystem. But this is for informational error decoding. It
> won't support any actions that a guest could take.
>
> The AMD64 EDAC module reads system-specific memory controller registers
> through non-architectural interfaces. So also unwise or not useful for a
> guest to access.

This could be emulated. With it not being officially specified the
emulation may not be too accurate, but it is possible. Admittedly
VMware may have abandoned this level of perfect emulation accuracy, but
one could do it. Which would be "full virtualization of MCE events."


On Wed, Sep 13, 2023 at 10:36:50AM -0400, Yazen Ghannam wrote:
> Furthermore, there are other EDAC modules that have the same !hypervisor
> check, so why change only this one?

Indeed. Those will also need similar treatment, but that wouldn't be a
revert of 767f4b620eda. I found 767f4b620eda in the process of looking
for the correct hook point.


There are at least two, and possibly more, points of view with regard
to MCE and virtualization.
I keep noticing most implementers are strictly
thinking of perfect, full virtualization of hardware, and missing what
is actually desired.

Where full virtualization means you are renting an actual physical
slice of actual hardware, proper virtualization of CEs and UEs is
desirable.

In reality most clients merely want to rent the processing power the
hardware provides and not deal with actually owning the hardware. To
them, CEs are an annoyance since they clutter logs and are not something
they are in a position to deal with. Instead the owner of the hardware
wants the CEs so they can monitor hardware health.

What you want depends on your SLAs, but the most prominent authors keep
missing that many clients (VM owners) don't actually want to deal with
CEs.

An SLA could also state that a single UE means discarding current VM
state and rolling back to the last known good checkpoint.


-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
 \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) /
  \_CS\ | _____ -O #include O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445