From: Thomas Gleixner
To: Mario Limonciello, Peter Zijlstra, Ingo Molnar, Borislav Petkov,
    "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
Cc: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin, Jiri Olsa,
    Namhyung Kim, Ian Rogers, Adrian Hunter, Dave Hansen, "H . Peter Anvin",
    "Rafael J . Wysocki", Len Brown, Pavel Machek, David Woodhouse,
    Sandipan Das, "open list:PERFORMANCE EVENTS SUBSYSTEM",
    "open list:PERFORMANCE EVENTS SUBSYSTEM", "open list:SUSPEND TO RAM",
    "open list:ACPI", Mario Limonciello
Subject: Re: [PATCH v2 2/2] perf/x86/amd: Stop calling amd_pmu_cpu_reset() from amd_pmu_cpu_dead()
In-Reply-To: <20231026170330.4657-3-mario.limonciello@amd.com>
References: <20231026170330.4657-1-mario.limonciello@amd.com> <20231026170330.4657-3-mario.limonciello@amd.com>
Date: Fri, 27 Oct 2023 23:47:15 +0200
Message-ID: <87ttqb20ws.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 26 2023 at 12:03, Mario Limonciello wrote:
> During suspend testing on a workstation CPU a preemption BUG was
> reported.

How is this related to a workstation CPU? Laptop CPUs and server CPUs
are magically not affected, right?

Also how is this related to suspend? This clearly affects any CPU down
operation whether in the context of suspend or initiated via sysfs, no?

Just because you observed it during suspend testing does not magically
make it a suspend-related problem....

> BUG: using smp_processor_id() in preemptible [00000000] code: rtcwake/2960
> caller is amd_pmu_lbr_reset+0x19/0xc0
> CPU: 104 PID: 2960 Comm: rtcwake Not tainted 6.6.0-rc6-00002-g3e2c7f3ac51f
> Call Trace:
>
>  dump_stack_lvl+0x44/0x60
>  check_preemption_disabled+0xce/0xf0
>  ? __pfx_x86_pmu_dead_cpu+0x10/0x10
>  amd_pmu_lbr_reset+0x19/0xc0
>  ? __pfx_x86_pmu_dead_cpu+0x10/0x10
>  amd_pmu_cpu_reset.constprop.0+0x51/0x60
>  amd_pmu_cpu_dead+0x3e/0x90
>  x86_pmu_dead_cpu+0x13/0x20
>  cpuhp_invoke_callback+0x169/0x4b0
>  ? __pfx_virtnet_cpu_dead+0x10/0x10
>  __cpuhp_invoke_callback_range+0x76/0xe0
>  _cpu_down+0x112/0x270
>  freeze_secondary_cpus+0x8e/0x280
>  suspend_devices_and_enter+0x342/0x900
>  pm_suspend+0x2fd/0x690
>  state_store+0x71/0xd0
>  kernfs_fop_write_iter+0x128/0x1c0
>  vfs_write+0x2db/0x400
>  ksys_write+0x5f/0xe0
>  do_syscall_64+0x59/0x90
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? count_memcg_events.constprop.0+0x1a/0x30
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? handle_mm_fault+0x1e9/0x340
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? preempt_count_add+0x4d/0xa0
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? up_read+0x38/0x70
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? do_user_addr_fault+0x343/0x6b0
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? exc_page_fault+0x74/0x170
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7f32f8d14a77
> Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa
> 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff
> 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> RSP: 002b:00007ffdc648de18 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f32f8d14a77
> RDX: 0000000000000004 RSI: 000055b2fc2a5670 RDI: 0000000000000004
> RBP: 000055b2fc2a5670 R08: 0000000000000000 R09: 000055b2fc2a5670
> R10: 00007f32f8e1a2f0 R11: 0000000000000246 R12: 0000000000000004
> R13: 000055b2fc2a2480 R14: 00007f32f8e16600 R15: 00007f32f8e15a00
>

How much of that backtrace is actually substantial information? At max 5
lines out of ~50. See:

  https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces

> This bug shows that there is a mistake with the flow used for offlining

This bug shows nothing more than a calltrace. Please explain the context
and the failure in coherent sentences. The bug backtrace is just for
illustration.

> a CPU. Calling amd_pmu_cpu_reset() from the dead callback is
> problematic

It's not problematic. It's simply wrong.

> because this doesn't run on the actual CPU being offlined. The intent of
> the function is to reset MSRs local to that CPU.
>
> Move the call into the dying callback which is actually run on the local
> CPU.

...

> +static void amd_pmu_cpu_dying(int cpu)
> +{
> +	amd_pmu_cpu_reset(cpu);
> +}

You clearly can spare that wrapper which wraps a function with the
signature void fn(int) into a function with the signature void fn(int)
by just assigning amd_pmu_cpu_reset() to the cpu_dying callback, no?

Thanks,

        tglx
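
The two technical points in this review (the reset has to run on the
outgoing CPU, and a function with a matching signature needs no wrapper)
can be illustrated with a small userspace sketch. Everything here is a
hypothetical stand-in: struct demo_hotplug_ops, demo_cpu_reset(),
demo_cpu_dead() and demo_cpu_down() are invented for illustration and are
not the kernel's struct x86_pmu or its hotplug code. The sketch only
assumes, as the quoted changelog states, that the dying callback runs on
the CPU going down and the dead callback does not.

```c
#include <assert.h>

/* Hypothetical hook table for illustration; NOT the kernel's struct x86_pmu. */
struct demo_hotplug_ops {
	void (*cpu_dying)(int cpu); /* invoked while executing on the outgoing CPU */
	void (*cpu_dead)(int cpu);  /* invoked later from a different (control) CPU */
};

static int current_cpu;   /* simulated "CPU this code is executing on" */
static int local_resets;  /* resets that ran on the CPU they target */
static int remote_resets; /* resets that ran on some other CPU (the bug) */

/* Stand-in for amd_pmu_cpu_reset(): only meaningful on the target CPU. */
void demo_cpu_reset(int cpu)
{
	if (cpu == current_cpu)
		local_resets++;
	else
		remote_resets++;
}

/* Stand-in for the dead callback: touches no CPU-local state. */
void demo_cpu_dead(int cpu)
{
	(void)cpu;
}

/*
 * demo_cpu_reset() already has the signature void fn(int), so it is
 * assigned to the cpu_dying hook directly, without a wrapper.
 */
const struct demo_hotplug_ops demo_ops = {
	.cpu_dying = demo_cpu_reset,
	.cpu_dead  = demo_cpu_dead,
};

/* Simulated offline sequence: dying runs on 'cpu', dead on 'control_cpu'. */
void demo_cpu_down(const struct demo_hotplug_ops *ops, int cpu, int control_cpu)
{
	assert(ops && ops->cpu_dying && ops->cpu_dead);

	current_cpu = cpu;
	ops->cpu_dying(cpu);

	current_cpu = control_cpu;
	ops->cpu_dead(cpu);
}
```

Calling demo_cpu_down(&demo_ops, 3, 0) from a main() records one local
reset and no remote one. Hooking demo_cpu_reset() into .cpu_dead instead
would record a remote reset, which is the userspace analogue of the
situation the backtrace complains about.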