Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp4217116rdb; Mon, 11 Dec 2023 12:13:42 -0800 (PST) X-Google-Smtp-Source: AGHT+IHDTNBKC9UQ5iYCGujTLRO12lVB2N/P/wHUnfTl4V21vf1HC866DdGo3BnxgWrApKdAKq8g X-Received: by 2002:a05:6a00:6706:b0:6ce:50bf:45ef with SMTP id hm6-20020a056a00670600b006ce50bf45efmr1960519pfb.43.1702325622422; Mon, 11 Dec 2023 12:13:42 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702325622; cv=none; d=google.com; s=arc-20160816; b=vUYwA1fJHGKU52yYLz0bf2Kx5F4O9aEPRH601bfGeLbBV3SqRuTbMJfrmT1dvTBub8 9ajQd6Y2rdewpEj1XF9ii7djLuc/zYJMUz0CehezQj4r5d2KK7pY/IqAFPo1aP361G6u MF9FgQARNAdYmexr+As3U0v1rPoY+EWqd51TXY8AxYQJD3zq3dSOFDY7RzgFhsoUWz+8 FsbZP+QGMc45eABnOUaabmJM/3sfI7BQpmtS99g/mzueq76wXFyAMtwTd/TgciBqSy7Q dCz4fFoXGRikJHy2OaWgWuo+NzkeKh1m6kLf6drVO8G3OAfRzWuYXPPufbkNAp+8h9r9 /QCw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=Ro19/tjKio4Pg4vnx//zzJFz55mqtHOf4RQSaapIvRk=; fh=swEFr3N7o5ccg6p3NtRiiO/WVpY3DB85Xlxg6OZMFig=; b=rzhpsLntM8+KzfyEI4JckAzrpQp3HtPW8/DtInJoZJboTJfCA8K2qIJXYTvEcSwHKv pbd8rWMemUpEvZlIU/yKZGOKyzooVrBRzMJK4ahKxjEKYRh4v0pmkGV9ks62vM9+UatE lV/r2/Z/qqakM983xPjCmVg+AdXXemv1DQgvGmiaKWbpwgqX6ohh2rdnTR0V8Oruwexi wdHcq/PDFxz6p3Bs4lWCA22bgCg+yEdua5llvi+a6HTjyZ0r/RTqBXDecg/Z7Z3+KUAJ M+tGORkIns0E56ILyl2R53Wi9JcJLlwkJNFEcZqmWJ4A9Z/va57Tq0exh0qo9q/711Vy lhxQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=vmV0GKJ1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Return-Path: Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id a8-20020a62d408000000b006ce65842c57si6473905pfh.130.2023.12.11.12.13.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Dec 2023 12:13:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=vmV0GKJ1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 87CBC809C190; Mon, 11 Dec 2023 12:13:39 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344978AbjLKUNX (ORCPT + 99 others); Mon, 11 Dec 2023 15:13:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1344971AbjLKUNW (ORCPT ); Mon, 11 Dec 2023 15:13:22 -0500 Received: from smtpout.efficios.com (smtpout.efficios.com [167.114.26.122]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CEE76DC; Mon, 11 Dec 2023 12:13:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1702325605; bh=MvVyKw/x+yVjAm3vE1F1gu66BqT0E/W/bARjFgMbliw=; h=From:To:Cc:Subject:Date:From; b=vmV0GKJ1rOjQY5SnrMb0zU+FNuvpCJXSliUZCWUZMDsH1vzEpSlT+jfeFu16ldkTq xSQ4mMjPS50qdRcyDMHp9fCo4evqL2dtUJlJVykukrR21tOEPem8iTT1PDJSz1JsP0 YDy85b3xc9oDozuVFXW0Z4xxL4oCtm4MgqZUKwcJck0BcXOL5PXB7L9laA06L/Y6g6 ByXYrMJ7u/ExSg5zCTdQJvIZUI0yrXNRebOrgxp4h4T+wqAo8rEAu1jHeNyTb1s9aa GVUktYykV6lOQ0vrccU27+wifpCO/4Gk6C9lz2NM8G7rQxmW5qSZ+XO/a1cK5poDIV M+hdNtsGw5upQ== Received: from thinkos.internal.efficios.com (192-222-143-198.qc.cable.ebox.net [192.222.143.198]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4SptGs4nGDzGRs; Mon, 11 Dec 2023 15:13:25 -0500 (EST) From: Mathieu Desnoyers To: Steven Rostedt Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Masami Hiramatsu , linux-trace-kernel@vger.kernel.org Subject: [RFC PATCH] ring-buffer: Fix and comment ring buffer rb_time functions on 32-bit Date: Mon, 11 Dec 2023 15:13:24 -0500 Message-Id: <20231211201324.652870-1-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Mon, 11 Dec 2023 12:13:39 -0800 (PST) Going through a review of the ring buffer rb_time functions for 32-bit architectures, I updated the comments to match the code, and identified the following issues: - rb_time_cmpxchg() needs to update the msb last, so it matches the validation of top and msb by __rb_time_read(). This is fixed by this patch. - A cmpxchg interrupted by 4 writes or cmpxchg overflows the counter and produces corrupted time stamps. This is _not_ fixed by this patch. - After a cmpxchg fails between updates to top and msb, a write is needed before read and cmpxchg can succeed again. I am not entirely sure the rest of the ring buffer handles this correctly. Signed-off-by: Mathieu Desnoyers Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: linux-trace-kernel@vger.kernel.org --- kernel/trace/ring_buffer.c | 64 +++++++++++++++++++++++++++----------- 1 file changed, 46 insertions(+), 18 deletions(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 8d2a4f00eca9..f6ed699947cd 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -576,34 +576,50 @@ struct ring_buffer_iter { #ifdef RB_TIME_32 /* - * On 32 bit machines, local64_t is very expensive. As the ring - * buffer doesn't need all the features of a true 64 bit atomic, - * on 32 bit, it uses these functions (64 still uses local64_t). + * On 32-bit machines, local64_t is very expensive. As the ring + * buffer doesn't need all the features of a true 64-bit atomic, + * on 32-bit, it uses these functions (64-bit still uses local64_t). * - * For the ring buffer, 64 bit required operations for the time is - * the following: + * For the ring buffer, the operations required to manipulate 64-bit + * time stamps are the following: * - * - Reads may fail if it interrupted a modification of the time stamp. + * - Read may fail if it interrupted a modification of the time stamp. * It will succeed if it did not interrupt another write even if * the read itself is interrupted by a write. + * A read will fail if it follows a cmpxchg which failed between + * updates to its top and msb bits, until a write is performed. + * (note: this limitation may be unexpected in parts of the + * ring buffer algorithm) * It returns whether it was successful or not. * - * - Writes always succeed and will overwrite other writes and writes + * - Write always succeeds and will overwrite other writes and writes * that were done by events interrupting the current write. * * - A write followed by a read of the same time stamp will always succeed, * but may not contain the same value. * * - A cmpxchg will fail if it interrupted another write or cmpxchg. + * A cmpxchg will fail if it follows a cmpxchg which failed between + * updates to its top and msb bits, until a write is performed. + * (note: this limitation may be unexpected in parts of the + * ring buffer algorithm) * Other than that, it acts like a normal cmpxchg. * - * The 60 bit time stamp is broken up by 30 bits in a top and bottom half - * (bottom being the least significant 30 bits of the 60 bit time stamp). + * The 64-bit time stamp is broken up, from most to least significant, + * in: msb, top and bottom fields, of respectively 4, 30, and 30 bits. * - * The two most significant bits of each half holds a 2 bit counter (0-3). + * The two most significant bits of each field hold a 2-bit counter (0-3). * Each update will increment this counter by one. - * When reading the top and bottom, if the two counter bits match then the - * top and bottom together make a valid 60 bit number. + * When reading the top, bottom, and msb fields, if the two counter bits + * match, then the combined values make a valid 64-bit number. + * + * Counter limits. The following situations can generate overflows that + * produce corrupted time stamps: + * + * - A read or a write interrupted by 2^32 writes or cmpxchg. + * + * - A cmpxchg interrupted by 4 writes or cmpxchg. + * (note: this is not sufficient and should be fixed) */ #define RB_TIME_SHIFT 30 #define RB_TIME_VAL_MASK ((1 << RB_TIME_SHIFT) - 1) @@ -632,7 +648,7 @@ static inline bool __rb_time_read(rb_time_t *t, u64 *ret, unsigned long *cnt) /* * If the read is interrupted by a write, then the cnt will - * be different. Loop until both top and bottom have been read + * be different. Loop until top, bottom and msb have been read * without interruption. */ do { @@ -644,7 +660,12 @@ static inline bool __rb_time_read(rb_time_t *t, u64 *ret, unsigned long *cnt) *cnt = rb_time_cnt(top); - /* If top and msb counts don't match, this interrupted a write */ + /* + * If top and msb counts don't match, this either interrupted a + * write or follows a failed cmpxchg. + * This requires the update to bottom to be enclosed between + * updates to top and msb. + */ if (*cnt != rb_time_cnt(msb)) return false; @@ -685,9 +706,10 @@ static void rb_time_set(rb_time_t *t, u64 val) rb_time_split(val, &top, &bottom, &msb); - /* Writes always succeed with a valid number even if it gets interrupted. */ + /* Write always succeeds with a valid number even if it gets interrupted. */ do { cnt = local_inc_return(&t->cnt); + /* The top and msb updates surround bottom update. */ rb_time_val_set(&t->top, top, cnt); rb_time_val_set(&t->bottom, bottom, cnt); rb_time_val_set(&t->msb, val >> RB_TIME_MSB_SHIFT, cnt); @@ -706,7 +728,12 @@ static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) unsigned long cnt2, top2, bottom2, msb2; u64 val; - /* The cmpxchg always fails if it interrupted an update */ + /* + * The cmpxchg always fails if it interrupted an update or if it + * follows a cmpxchg that fails between updates to top and msb. + * A rb_time_set() is needed after a failed cmpxchg to reset to + * a state where cmpxchg can succeed again. + */ if (!__rb_time_read(t, &val, &cnt2)) return false; @@ -729,12 +756,13 @@ static bool rb_time_cmpxchg(rb_time_t *t, u64 expect, u64 set) if (!rb_time_read_cmpxchg(&t->cnt, cnt, cnt2)) return false; - if (!rb_time_read_cmpxchg(&t->msb, msb, msb2)) - return false; + /* The top and msb updates surround bottom update. */ if (!rb_time_read_cmpxchg(&t->top, top, top2)) return false; if (!rb_time_read_cmpxchg(&t->bottom, bottom, bottom2)) return false; + if (!rb_time_read_cmpxchg(&t->msb, msb, msb2)) + return false; return true; } -- 2.39.2