Received: by 2002:a05:6358:1087:b0:cb:c9d3:cd90 with SMTP id j7csp1315673rwi; Wed, 26 Oct 2022 13:42:34 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4waDrd1VEFClhWKfVVsWM8sW8qO4KgOlM5AqWAspvsA0K/Pg1PyCAsI7KD5JC7rD2e6xOq X-Received: by 2002:a17:902:ebca:b0:183:d123:e2a7 with SMTP id p10-20020a170902ebca00b00183d123e2a7mr46346082plg.105.1666816943792; Wed, 26 Oct 2022 13:42:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666816943; cv=none; d=google.com; s=arc-20160816; b=GtEEiuVrjnDS22G3OXdkQr4S+hs/MFcJthldK0tB9tIZqZxgI6pSng/b6X3rJitr/i Ke83NwzbD4iQCRmtl4Bpcbn3OWPev9tMQ6l8dwJxkfp+WyUWXznkE7cZeUm9vgL+K2TB o97G8iZbkNgb4MpPl51HUAFqvQpq4PvmjfbQLGcWR3feD0r6C52QYmAKE2g9D2oGO7Wa 1hVhxBiUr8lfVROvsj7666WEyoRz6f06GA8nMdYuT+M+BjUQEOdTrYEjtfvPML8OheTJ qQ8hZHztVyyY/ZZUdjsF50dS9gD73OhoL/UB8S87xlBCJIFjmBhXmpY8s1ToFUXNW9td ix/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=JRPU/VosrpVRR+iXu5ifunnhEheX6vmVtTupYdvcy3w=; b=hQh/nBWUpS81lAaRMRwp0DHSWqH4sIZ/s93sE7Dk/b0hM1v0Uy6jnmJY2J5IC2MTq3 RJ70/LZxIYwtgypXMiyqzmIzxNJgUm77SM/+4mPhxPOlPcp2ma+vEOBf5vV3WMSZyH9/ y13euskD9Gzmm9J+tJtIrVd0wFqkamVh/jqbVsmJG73KQD02iqboqxhp/H4AIK5xBDJ/ VriQxBJAk8IKGoTpHVmRJsZq82oM7HW0Ks67fPI57KeFAZS6OlwtatkopliiRCpLt7mC FfPpxKNwVDf2fN8NTyZ4NNKmekN08CVhO8yz6f3kXlfFDVuHNUMVU06FBGFUJ0FsH8lq 3GeQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@alien8.de header.s=dkim header.b=Z9BTwCQl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alien8.de Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d19-20020a170902c19300b00186abd0c784si7370611pld.217.2022.10.26.13.42.12; Wed, 26 Oct 2022 13:42:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@alien8.de header.s=dkim header.b=Z9BTwCQl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=alien8.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235121AbiJZUM1 (ORCPT + 99 others); Wed, 26 Oct 2022 16:12:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50440 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235115AbiJZUMW (ORCPT ); Wed, 26 Oct 2022 16:12:22 -0400 Received: from mail.skyhub.de (mail.skyhub.de [5.9.137.197]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B419A143A79; Wed, 26 Oct 2022 13:12:21 -0700 (PDT) Received: from zn.tnic (p200300ea9733e7b8329c23fffea6a903.dip0.t-ipconnect.de [IPv6:2003:ea:9733:e7b8:329c:23ff:fea6:a903]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.skyhub.de (SuperMail on ZX Spectrum 128k) with ESMTPSA id 1537A1EC06BD; Wed, 26 Oct 2022 22:12:20 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=alien8.de; s=dkim; t=1666815140; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:in-reply-to:in-reply-to: references:references; bh=JRPU/VosrpVRR+iXu5ifunnhEheX6vmVtTupYdvcy3w=; b=Z9BTwCQlOjSc0gmsajldgC8unz0qOWgec9dWC8Sr2ro4ysB7DQOehw7MfQRfTGFSW/I+BX esaqgLhAxa0lAy4UTawwwUKyUB5FLyTXFbYueSfGBBb11icL8QVV4cLH2rNvEQXQWZzM4K DB8C6oDKfMGXfaIaKfVZJJR6ZbpHFYM= Date: Wed, 26 Oct 2022 22:12:15 +0200 From: Borislav Petkov To: Yazen Ghannam Cc: Greg KH , linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, tony.luck@intel.com, x86@kernel.org, Smita.KoralahalliChannabasappa@amd.com, mpatocka@redhat.com Subject: Re: [PATCH] x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks Message-ID: References: <20220614174346.3648305-1-yazen.ghannam@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Oct 26, 2022 at 07:44:17PM +0000, Yazen Ghannam wrote: > 1) Apply the patch I submitted as a simple fix/workaround for the presented > symptom. I tried to keep it small and well described to be a stable backport. > Obviously I wrote it without knowing the shared kobject behavior isn't ideal. We'll see. > 2) Address the shared kobject thing. > Here are some options: > a. Only set up the thresholding kobject on a single CPU per "AMD Node". > Technically MCA Bank 4 is "shared" on legacy systems. But AFAICT from > looking at old BKDG docs, in practice only the "Node Base Core" can access > the registers. This behavior is controlled by a bit in NB which BIOS is > supposed to set. Maybe some BIOSes don't do this, but I think that's a > "broken BIOS on legacy system" issue if so. I guess we can do that. And I even think we have some code which finds out which the NBC is... /me greps a bit: ah, there it is: get_nbc_for_node() in arch/x86/kernel/cpu/mce/inject.c. > b. Disable the MCA Thresholding interface for Families before 0x17. Can't. It is user-visible and you don't know for sure whether someone is using it or not. Believe me, I have been wanting to disable this thing forever. I've never heard of anyone using it and all the energy we put in it was for nothing. :-\ We could try to deprecate it, though, make it default=n in Kconfig and see who complains. And after a couple of releases, kill it. > This is an undocumented interface, Of course it is documented - it is in the old BKDGs. > and I don't know if anyone is using it on older systems. Yap. > The issue we're discussing here started because of a splat during > suspend/resume/CPU hotplug. In disable_err_thresholding(), we disable > MCA Thresholding for bank 4 on Family 15h, so there's some precedent. > c. Do nothing at the moment. I *really* want to clean up the MCA > Thresholding interface, and the shared kobject thing may get resolved > in that. Clean it up how exactly? Put it behind a Kconfig item, disable it and remove it after a while? :-) If so, I wouldn't mind. No one's using this. At least I haven't heard of a single bug report or of a use case. Only when CPU hotplug explodes and that thing is involved, only then. Might as well remove it. And then remove it in the hardware too. RAS folks would love to get rid of some of that crap which takes up verif resources for no good reason. :-) -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette