Received: by 2002:a05:7412:ba23:b0:fa:4c10:6cad with SMTP id jp35csp1080052rdb; Fri, 19 Jan 2024 07:36:45 -0800 (PST) X-Google-Smtp-Source: AGHT+IGRES4WwmfcB+STkKL+9FKlBUGTIgcZxQsBLxwRjuPKg+ihDpI9LPKGNkfx5gimNUZUgQJh X-Received: by 2002:ac8:7f56:0:b0:429:fb54:7e17 with SMTP id g22-20020ac87f56000000b00429fb547e17mr2919376qtk.54.1705678605714; Fri, 19 Jan 2024 07:36:45 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1705678605; cv=pass; d=google.com; s=arc-20160816; b=p2g2nD5/aXa3HqcyNbyqHxdKNJgzvBDPcWzxDL9NzHuZZyW/q/TeZAuw6dQGJKtnHW BsNeHFNWxmNr5jtJByov5ACIzRqoNvugDahFNfEgiaCBbwWUIMoCnLTne3aI8h7UhCgf tV+ycjiAd4dPZ7srgJjOgofB1LUUpaiFu3xGm0sLmDJpJF6G3qABo63Nzrc4R4ubpdy1 NUt2lDhROQYk5C3y5PN29cnVQeFJbj2xwTdpqOw75O/CEfNqUgHrvkmoLs6t2vnOt8MR ziFurp1t0iJ6KIeQ2lVsn+YvQE1Bii6Q01iJ5OAiXXciixdyVHTj3tNRQq9CRz70qwnK 4N0w== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :subject:cc:to:from:date; bh=mdLArEAx2++hnsF4nRXH6+e4vKM3ueZvivxFppef9Do=; fh=DXqSFHFMGWs8NWhSKawAW+CcN1HWregSaTKgaWTRpeY=; b=vlLZyq8IBEBn6pXfT92BuQynYkIhGDQxKv3GF/M/222iQJjvU/t8PhTNbWZYgbzMHA PSLXHKbFtLAXNxj8rUcX4kCxewSbgu5AR+RFBOEWfK6q868CdQypepRIj2wR3runevm6 8ak7pG2j1+s3BXqx21CXo1QfGG/JdVkfcMAuTa0KzMwlasggKDh6E8iERnPH0YYUyrlz QNk5XMNZnr0WciqO4yOhT3rXlbGVXEv3cpAJQr4UsiJ19pg/sYNPsYnI/E4GKYUfsHEv iCGaH8JJSD5Zi/Db5kF1QnXKSJwFjntyY864ozMF1CUfi6odvtNurfewFAotHcWTUgaC hUjQ== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel+bounces-31317-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-31317-linux.lists.archive=gmail.com@vger.kernel.org" Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id c11-20020a05622a024b00b0042a08bef4f4si7106003qtx.499.2024.01.19.07.36.45 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 19 Jan 2024 07:36:45 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-31317-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of linux-kernel+bounces-31317-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-31317-linux.lists.archive=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 6A36F1C23820 for ; Fri, 19 Jan 2024 15:36:45 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id EF24554BCF; Fri, 19 Jan 2024 15:36:34 +0000 (UTC) Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 637B154BC7; Fri, 19 Jan 2024 15:36:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705678594; cv=none; b=uNEJBaCfrIE0Ux69ZPVG/IqqPKXPDY3c4fgF2LYoAech2nshJGMBH7+v3wp3vQOWifOFARQXopMan0VySfjw95zcgY8lZPbXb5ZKIqEOAslgg9Y1MGbwOH3A6DDyV93m3utrU2xw4Eldy/+Balia/4Ec4uYw0GK5+CSTSIY9y2Y= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1705678594; c=relaxed/simple; bh=mdLArEAx2++hnsF4nRXH6+e4vKM3ueZvivxFppef9Do=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=sIhk8hCEJAuV46jBKipzXCUOA83nkpSAgV7kT2FjTBTUI0taQrhDxqj9dpi8SHETjaGx1B1YPmDWvnjLarU+6wexXHJYR7Psxlo8k7oDroHNN7PjtlX4yID3uvulik0zCVNgP0eCm/KDozDpn/mRVeNBxCGKPM9E7vkVyIHicJc= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id 29257C433C7; Fri, 19 Jan 2024 15:36:33 +0000 (UTC) Date: Fri, 19 Jan 2024 10:37:54 -0500 From: Steven Rostedt To: Mathieu Desnoyers Cc: LKML , Linux Trace Kernel , Masami Hiramatsu , Mark Rutland Subject: Re: [PATCH] ring-buffer: Simplify reservation with try_cmpxchg() loop Message-ID: <20240119103754.154dc009@gandalf.local.home> In-Reply-To: <504085e9-bf91-4948-a158-abae5dcb276a@efficios.com> References: <20240118181206.4977da2f@gandalf.local.home> <504085e9-bf91-4948-a158-abae5dcb276a@efficios.com> X-Mailer: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-pc-linux-gnu) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On Fri, 19 Jan 2024 09:40:27 -0500 Mathieu Desnoyers wrote: > On 2024-01-18 18:12, Steven Rostedt wrote: > > From: "Steven Rostedt (Google)" > >=20 > > Instead of using local_add_return() to reserve the ring buffer data, > > Mathieu Desnoyers suggested using local_cmpxchg(). This would simplify = the > > reservation with the time keeping code. =20 >=20 > I admire the effort of trying to simplify the Ftrace ring buffer by bring= ing > over ideas that worked well for LTTng. :-) As reviewer of the tracing sub= system, > I certainly welcome the simplifications. >=20 The idea itself wasn't new. It was you convincing me that local_add_return() is no faster than local_cmpxchg(). As I would have done it this way from the start if I wasn't dead set against adding any new cmpxchg() in the fast path. Testing showed that local_add_return() is pretty much just as bad, so the added complexity to avoid just slapping in a cmpxchg() was useless. > > Although, it does not get rid of the double time stamps (before_stamp a= nd > > write_stamp), using cmpxchg() does get rid of the more complex case when > > an interrupting event occurs between getting the timestamps and reservi= ng > > the data, as when that happens, it just tries again instead of dealing > > with it. =20 >=20 > I understand that the reason why you need the before/after stamps and the= ir > associated complexity is because the Ftrace ring buffer ABI encodes event > timestamps as delta from the previous event within the buffer as a mean of > compressing the timestamp fields. If the delta cannot be represented in a > given number of bits, then it inserts a 64-bit timestamp (not sure if that > one is absolute or a delta from previous event). There's both. An extended timestamp, which is added when the delta is too big, and that too is just a delta from the previous event. And there is the absolute timestamp as well. I could always just use the absolute one. That event came much later. >=20 > This timestamp encoding as delta between events introduce a strong > inter-dependency between consecutive (nested) events, and is the reason > why you are stuck with all this timestamp before/after complexity. >=20 > The Common Trace Format specifies (and LTTng implements) a different way > to achieve the same ring buffer space-savings achieved with timestamp del= tas > while keeping the timestamps semantically absolute from a given reference, > hence without all the before/after timestamp complexity. You can see the > clock value decoding procedure in the CTF2 SPEC RC9 [1] document. The bas= ic That points to this: ---------------------8<------------------------- 6.3. Clock value update procedure To update DEF_CLK_VAL from an unsigned integer field F having the unsigned = integer value V and the class C: Let L be an unsigned integer initialized to, depending on the type property= of C: "fixed-length-unsigned-integer" The value of the length property of C. "variable-length-unsigned-integer" S =C3=977, where S is the number of bytes which F occupies with the data st= ream. Let MASK be an unsigned integer initialized to 2L =E2=88=92 1. Let H be an unsigned integer initialized to DEF_CLK_VAL & ~MASK, where =E2= =80=9C&=E2=80=9D is the bitwise AND operator and =E2=80=9C~=E2=80=9D is the= bitwise NOT operator. Let CUR be an unsigned integer initialized to DEF_CLK_VAL & MASK, where =E2= =80=9C&=E2=80=9D is the bitwise AND operator. Set DEF_CLK_VAL to: If V =E2=89=A5 CUR H + V Else H + MASK + 1 + V --------------------->8------------------------- There's a lot of missing context there, so I don't see how it relates. > idea on the producer side is to record the low-order bits of the current > timestamp in the event header (truncating the higher order bits), and > fall back on a full 64-bit value if the number of low-order bits overflows > from the previous timestamp is more than 1, or if it is impossible to fig= ure > out precisely the timestamp of the previous event due to a race. This ach= ieves > the same space savings as delta timestamp encoding without introducing the > strong event inter-dependency. So when an overflow happens, you just insert a timestamp, and then events after that is based on that? >=20 > The fact that Ftrace exposes this ring buffer binary layout as a user-spa= ce > ABI makes it tricky to move to the Common Trace Format timestamp encoding. > There are clearly huge simplifications that could be made by moving to th= is > scheme though. Is there any way to introduce a different timestamp encodi= ng > scheme as an extension to the Ftrace ring buffer ABI ? This would allow u= s to > introduce this simpler scheme and gradually phase out the more complex de= lta > encoding when no users are left. I'm not sure if there's a path forward. The infrastructure can easily swap in and out a new implementation. That is, there's not much dependency on the way the ring buffer works outside the ring buffer itself. If we were to change the layout, it would likely require a new interface file to read. The trace_pipe_raw is the only file that exposes the current ring buffer. We could create a trace_out_raw or some other named file that has a completely different API and it wouldn't break any existing API. Although, if we want to change the "default" way, it may need some other knobs or something, which wouldn't be hard. Now I have to ask, what's the motivation for this. The code isn't that complex anymore. Yes it still has the before/after timestamps, but the most complexity in that code was due to what happens in the race of updating the reserved data. But that's no longer the case with the cmpxchg(). If you look at this patch, that entire else statement was deleted. And that deleted code was what made me sick to my stomach ;-) Good riddance! -- Steve