Message-ID: <20240412133214.307818654@goodmis.org>
User-Agent: quilt/0.67
Date: Fri, 12 Apr 2024 09:31:56 -0400
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton,
 stable@vger.kernel.org
Subject: [for-linus][PATCH 4/4] ring-buffer: Only update pages_touched when a new page is touched
References: <20240412133152.723632549@goodmis.org>

From: "Steven Rostedt (Google)"

The "buffer_percent" logic used by the ring buffer splice code wakes up
waiting tasks only once the buffer has filled to the percentage set in
the "buffer_percent" file. It depends on three variables that determine
the amount of data that is in the ring buffer:

 1) pages_read - incremented whenever a new sub-buffer is consumed
 2) pages_lost - incremented every time a writer overwrites a sub-buffer
 3) pages_touched - incremented when a write goes to a new sub-buffer

The percentage is the calculation of:

  (pages_touched - (pages_lost + pages_read)) / nr_pages

Basically, the amount of data is the total number of sub-bufs that have
been touched, minus the number of sub-bufs lost and sub-bufs consumed.
This is divided by the total count to give the buffer percentage. When
the percentage is greater than the value in the "buffer_percent" file,
it wakes up splice readers waiting for that amount.
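For example (with made-up numbers, purely to illustrate the formula): if
nr_pages = 100, pages_touched = 80, pages_lost = 10 and pages_read = 10,
then (80 - (10 + 10)) / 100 = 60%, so splice readers waiting on a
"buffer_percent" of 60 or less are woken.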
It was observed that over time, the amount read from the splice was
constantly decreasing the longer the trace was running. That is, if one
asked for 60%, it would read over 60% when tracing first started, but
later it would be woken up at under 60%, and the amount of data read
after each wake up would slowly decrease until it was well below the
requested buffer percent.

This was caused by the accounting of the pages_touched increment. The
value is supposed to be incremented whenever a writer moves to a new
sub-buffer, but the place where it was incremented was wrong. If a
writer overflows the current sub-buffer, it moves on to the next one.
If it gets preempted by an interrupt at that point, and the interrupt
performs a trace, the interrupt's writer also ends up moving to the
same next sub-buffer. Only one of the two should increment the counter,
but both did.

Change the cmpxchg() that does the real switch of the tail-page into a
try_cmpxchg(), and on success, perform the increment of pages_touched.
This increments the counter exactly once when the writer moves to a new
sub-buffer, and not a second time when a writer and the interrupt that
preempted it race to move to the same new sub-buffer.
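As a rough userspace illustration only (this is not the ring-buffer
code: C11's atomic_compare_exchange_strong() stands in for the kernel's
try_cmpxchg(), and all names below are made up for the example), only
the updater that actually wins the pointer swap bumps the counter:

#include <stdatomic.h>
#include <stdio.h>

struct page { int id; };

static _Atomic(struct page *) tail_page;
static atomic_ulong pages_touched;

static void move_tail(struct page *old_tail, struct page *next)
{
	struct page *expected = old_tail;

	/* Analogue of gating local_inc() on try_cmpxchg() succeeding */
	if (atomic_compare_exchange_strong(&tail_page, &expected, next))
		atomic_fetch_add(&pages_touched, 1);
}

int main(void)
{
	struct page t = { .id = 0 }, n = { .id = 1 };

	atomic_store(&tail_page, &t);

	/*
	 * The writer and the interrupt that preempted it both try to
	 * move the tail from t to n. The pointer only moves once, so
	 * the counter is only incremented once.
	 */
	move_tail(&t, &n);	/* wins the swap: pages_touched -> 1 */
	move_tail(&t, &n);	/* loses: tail_page is already &n   */

	printf("pages_touched = %lu\n",
	       (unsigned long)atomic_load(&pages_touched));
	return 0;
}

The point of try_cmpxchg() here is that it reports whether the swap
happened, which makes the "only the winner increments" rule easy to
express; the plain cmpxchg() return value had simply been thrown away.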
Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google)
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/ring_buffer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 25476ead681b..6511dc3a00da 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1393,7 +1393,6 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
 	old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write);
 	old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
-	local_inc(&cpu_buffer->pages_touched);
 	/*
 	 * Just make sure we have seen our old_write and synchronize
 	 * with any interrupts that come in.
 	 */
@@ -1430,8 +1429,9 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
 		 */
 		local_set(&next_page->page->commit, 0);

-		/* Again, either we update tail_page or an interrupt does */
-		(void)cmpxchg(&cpu_buffer->tail_page, tail_page, next_page);
+		/* Either we update tail_page or an interrupt does */
+		if (try_cmpxchg(&cpu_buffer->tail_page, &tail_page, next_page))
+			local_inc(&cpu_buffer->pages_touched);
 	}
 }
-- 
2.43.0