Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp397482rdb; Tue, 5 Dec 2023 08:18:46 -0800 (PST) X-Google-Smtp-Source: AGHT+IFztHZxb/ppvMzrcxjqO0wzzv9qN5jLE83jVg9a9J1X1iINCQP2kOxo/MSfHaQc456Xlm8P X-Received: by 2002:a92:b11:0:b0:35d:5995:9034 with SMTP id b17-20020a920b11000000b0035d59959034mr3457282ilf.39.1701793125882; Tue, 05 Dec 2023 08:18:45 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701793125; cv=none; d=google.com; s=arc-20160816; b=fTWaFaVeF0ftfj4Dfs6GB7lucY9luuSf295d9i/IRom43wN2+z4ATZP/+9izT+3riK 21Pz1viCZ2ToWSRXOks/z3ryVKgLaySW5h3Qrr4nXlv9Da2qeI+A0S1MrUg8rqoilVlV ZLlk211kmchiGQXeIv4nmfGFhMEcAlIO8gNej17f2WM8PeOZ/URDAJ/lspMxDO8+WCBM SIZVnT6TgkpnpbBXgR4iUJuzWeIwNUPRXFVATuFr+ssUXuQ8gjUg97644Ex2a3flUZZH mdV0DJ4Z3vXVxPnG20uT7rRP6MazoepfvXG8WEl46w610/bZWbhnxtlcFvrZ05WrpHLi WA/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=U0kcazU4Zavm4NwD+MBtZqbd7J7HMLAqEhj2csGxlpU=; fh=j8Hw7lilfJM9L0BwDk5YbNksoIL60jpocmP7dOg/RPw=; b=ivFdMcy/pYSMRdxKSpeYxfqXM26zkgBjr0XKdvMjZW8iaJtDykTXai+buV+VUBEFU0 VsNuHwQ1yuVOh1O7S/yYkL79cL/NncaLG2VJMKXms0B9YhjZax07+cKyzY27EcdJjAYV a0rmCa6CbTsknMqlgY5e6EeeyOSCsVYWLG7sbWMlO/Qg3rX18AFSJKPvxM5tHcmm6Kf0 yyUVkiOKZwiX9ThHAyJf+doHMDwEc/SBdNrT+Yrqh/0Eu2uexf+kfzP10+KGD6WWoA1l Qw4+gpuoHpdEfhCgzIV+Iw9FQdaZarvlw7esCiXFID/M6bNZAdE0O/hOPBQM3Nmlj25M jB+g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id h10-20020a65404a000000b005b909e678dfsi9588809pgp.450.2023.12.05.08.18.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 08:18:45 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id F405A8031AE5; Tue, 5 Dec 2023 08:18:07 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232147AbjLEQR4 (ORCPT + 99 others); Tue, 5 Dec 2023 11:17:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36962 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231897AbjLEQRu (ORCPT ); Tue, 5 Dec 2023 11:17:50 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [IPv6:2a07:de40:b251:101:10:150:64:2]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE71B19B; Tue, 5 Dec 2023 08:17:55 -0800 (PST) Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 33DC21FB9F; Tue, 5 Dec 2023 16:17:54 +0000 (UTC) Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id E667F13AFD; Tue, 5 Dec 2023 16:17:53 +0000 (UTC) Received: from dovecot-director2.suse.de ([10.150.64.162]) by imap1.dmz-prg2.suse.org with ESMTPSA id WOj2NjFNb2WtdwAAD6G6ig (envelope-from ); Tue, 05 Dec 2023 16:17:53 +0000 From: Petr Pavlu To: rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com Cc: zhengyejian1@huawei.com, linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Petr Pavlu Subject: [PATCH v2 1/3] tracing: Fix incomplete locking when disabling buffered events Date: Tue, 5 Dec 2023 17:17:34 +0100 Message-Id: <20231205161736.19663-2-petr.pavlu@suse.com> X-Mailer: git-send-email 2.35.3 In-Reply-To: <20231205161736.19663-1-petr.pavlu@suse.com> References: <20231205161736.19663-1-petr.pavlu@suse.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spamd-Bar: +++++++++++++++ Authentication-Results: smtp-out2.suse.de; dkim=none; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=suse.com (policy=quarantine); spf=fail (smtp-out2.suse.de: domain of petr.pavlu@suse.com does not designate 2a07:de40:b281:104:10:150:64:97 as permitted sender) smtp.mailfrom=petr.pavlu@suse.com X-Rspamd-Server: rspamd2 X-Spamd-Result: default: False [15.10 / 50.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_SPF_FAIL(1.00)[-all]; ARC_NA(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_MISSING_CHARSET(2.50)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; NEURAL_SPAM_SHORT(0.10)[0.034]; BROKEN_CONTENT_TYPE(1.50)[]; DMARC_POLICY_QUARANTINE(1.50)[suse.com : No valid SPF, No valid DKIM,quarantine]; RCVD_COUNT_THREE(0.00)[3]; MX_GOOD(-0.01)[]; RCPT_COUNT_SEVEN(0.00)[7]; MID_CONTAINS_FROM(1.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:email]; FUZZY_BLOCKED(0.00)[rspamd.com]; FROM_EQ_ENVFROM(0.00)[]; R_DKIM_NA(2.20)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_ALL(0.00)[]; BAYES_HAM(-2.11)[95.72%] X-Spam-Score: 15.10 X-Rspamd-Queue-Id: 33DC21FB9F X-Spam: Yes X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Tue, 05 Dec 2023 08:18:08 -0800 (PST) The following warning appears when using buffered events: [ 203.556451] WARNING: CPU: 53 PID: 10220 at kernel/trace/ring_buffer.c:3912 ring_buffer_discard_commit+0x2eb/0x420 [...] [ 203.670690] CPU: 53 PID: 10220 Comm: stress-ng-sysin Tainted: G E 6.7.0-rc2-default #4 56e6d0fcf5581e6e51eaaecbdaec2a2338c80f3a [ 203.670704] Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 [ 203.670709] RIP: 0010:ring_buffer_discard_commit+0x2eb/0x420 [ 203.735721] Code: 4c 8b 4a 50 48 8b 42 48 49 39 c1 0f 84 b3 00 00 00 49 83 e8 01 75 b1 48 8b 42 10 f0 ff 40 08 0f 0b e9 fc fe ff ff f0 ff 47 08 <0f> 0b e9 77 fd ff ff 48 8b 42 10 f0 ff 40 08 0f 0b e9 f5 fe ff ff [ 203.735734] RSP: 0018:ffffb4ae4f7b7d80 EFLAGS: 00010202 [ 203.735745] RAX: 0000000000000000 RBX: ffffb4ae4f7b7de0 RCX: ffff8ac10662c000 [ 203.735754] RDX: ffff8ac0c750be00 RSI: ffff8ac10662c000 RDI: ffff8ac0c004d400 [ 203.781832] RBP: ffff8ac0c039cea0 R08: 0000000000000000 R09: 0000000000000000 [ 203.781839] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 203.781842] R13: ffff8ac10662c000 R14: ffff8ac0c004d400 R15: ffff8ac10662c008 [ 203.781846] FS: 00007f4cd8a67740(0000) GS:ffff8ad798880000(0000) knlGS:0000000000000000 [ 203.781851] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 203.781855] CR2: 0000559766a74028 CR3: 00000001804c4000 CR4: 00000000001506f0 [ 203.781862] Call Trace: [ 203.781870] [ 203.851949] trace_event_buffer_commit+0x1ea/0x250 [ 203.851967] trace_event_raw_event_sys_enter+0x83/0xe0 [ 203.851983] syscall_trace_enter.isra.0+0x182/0x1a0 [ 203.851990] do_syscall_64+0x3a/0xe0 [ 203.852075] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 203.852090] RIP: 0033:0x7f4cd870fa77 [ 203.982920] Code: 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 b8 89 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 43 0e 00 f7 d8 64 89 01 48 [ 203.982932] RSP: 002b:00007fff99717dd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000089 [ 203.982942] RAX: ffffffffffffffda RBX: 0000558ea1d7b6f0 RCX: 00007f4cd870fa77 [ 203.982948] RDX: 0000000000000000 RSI: 00007fff99717de0 RDI: 0000558ea1d7b6f0 [ 203.982957] RBP: 00007fff99717de0 R08: 00007fff997180e0 R09: 00007fff997180e0 [ 203.982962] R10: 00007fff997180e0 R11: 0000000000000246 R12: 00007fff99717f40 [ 204.049239] R13: 00007fff99718590 R14: 0000558e9f2127a8 R15: 00007fff997180b0 [ 204.049256] For instance, it can be triggered by running these two commands in parallel: $ while true; do echo hist:key=id.syscall:val=hitcount > \ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger; done $ stress-ng --sysinfo $(nproc) The warning indicates that the current ring_buffer_per_cpu is not in the committing state. It happens because the active ring_buffer_event doesn't actually come from the ring_buffer_per_cpu but is allocated from trace_buffered_event. The bug is in function trace_buffered_event_disable() where the following normally happens: * The code invokes disable_trace_buffered_event() via smp_call_function_many() and follows it by synchronize_rcu(). This increments the per-CPU variable trace_buffered_event_cnt on each target CPU and grants trace_buffered_event_disable() the exclusive access to the per-CPU variable trace_buffered_event. * Maintenance is performed on trace_buffered_event, all per-CPU event buffers get freed. * The code invokes enable_trace_buffered_event() via smp_call_function_many(). This decrements trace_buffered_event_cnt and releases the access to trace_buffered_event. A problem is that smp_call_function_many() runs a given function on all target CPUs except on the current one. The following can then occur: * Task X executing trace_buffered_event_disable() runs on CPU 0. * The control reaches synchronize_rcu() and the task gets rescheduled on another CPU 1. * The RCU synchronization finishes. At this point, trace_buffered_event_disable() has the exclusive access to all trace_buffered_event variables except trace_buffered_event[CPU0] because trace_buffered_event_cnt[CPU0] is never incremented and if the buffer is currently unused, remains set to 0. * A different task Y is scheduled on CPU 0 and hits a trace event. The code in trace_event_buffer_lock_reserve() sees that trace_buffered_event_cnt[CPU0] is set to 0 and decides the use the buffer provided by trace_buffered_event[CPU0]. * Task X continues its execution in trace_buffered_event_disable(). The code incorrectly frees the event buffer pointed by trace_buffered_event[CPU0] and resets the variable to NULL. * Task Y writes event data to the now freed buffer and later detects the created inconsistency. The issue is observable since commit dea499781a11 ("tracing: Fix warning in trace_buffered_event_disable()") which moved the call of trace_buffered_event_disable() in __ftrace_event_enable_disable() earlier, prior to invoking call->class->reg(.. TRACE_REG_UNREGISTER ..). The underlying problem in trace_buffered_event_disable() is however present since the original implementation in commit 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events"). Fix the problem by replacing the two smp_call_function_many() calls with on_each_cpu_mask() which invokes a given callback on all CPUs. Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events") Fixes: dea499781a11 ("tracing: Fix warning in trace_buffered_event_disable()") Signed-off-by: Petr Pavlu --- kernel/trace/trace.c | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 9aebf904ff97..6f22a2a31589 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2823,11 +2823,9 @@ void trace_buffered_event_disable(void) if (--trace_buffered_event_ref) return; - preempt_disable(); /* For each CPU, set the buffer as used. */ - smp_call_function_many(tracing_buffer_mask, - disable_trace_buffered_event, NULL, 1); - preempt_enable(); + on_each_cpu_mask(tracing_buffer_mask, disable_trace_buffered_event, + NULL, true); /* Wait for all current users to finish */ synchronize_rcu(); @@ -2842,11 +2840,9 @@ void trace_buffered_event_disable(void) */ smp_wmb(); - preempt_disable(); /* Do the work on each cpu */ - smp_call_function_many(tracing_buffer_mask, - enable_trace_buffered_event, NULL, 1); - preempt_enable(); + on_each_cpu_mask(tracing_buffer_mask, enable_trace_buffered_event, NULL, + true); } static struct trace_buffer *temp_buffer; -- 2.35.3