Received: by 2002:a05:7412:8521:b0:e2:908c:2ebd with SMTP id t33csp2402043rdf; Mon, 6 Nov 2023 13:08:09 -0800 (PST) X-Google-Smtp-Source: AGHT+IFhqjIoh3hz9zIcCX+q/C9DRq1OKT+8sACzO9LSMXVNqUiBdvBzm57RNhUSnp43Mex0HIuz X-Received: by 2002:a05:6a00:21d3:b0:6bc:635a:aad3 with SMTP id t19-20020a056a0021d300b006bc635aaad3mr26431753pfj.9.1699304888945; Mon, 06 Nov 2023 13:08:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699304888; cv=none; d=google.com; s=arc-20160816; b=szC7FoWFQxKY2Q4HgGs7iLfHf4nC1BISSHo7LAIl5y2t6cldVyTYpbSUwAoyroFs5l WPyEdjxHKFi0JGAgN9ECDhh0Nm/QULWtzakJ8CphckORuZGbMHl35o2c1sOpHOIZE4e4 TvDQcKl06Ozz8abT5+h4UpswgCt00HxkXgWbUeo4ChwBCdkf8wWBUMMNut0TfgwzyFs3 Rbd7q9nRS1Rwp9agXkFOYFRLwrZ7KXRDXptpFG/9vWLjZ/hRPSS/STU8Rg1UNdUp2n9Z Xer77mYZmYhewGRYBKte4zzA6yZDosNXmDDS6j6uMVxd9uSBPshprj0mpP+Obyk9Rhd/ ifHg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:dkim-signature :dkim-signature:from; bh=z3f4HQf7ma/ip4YULZcgH7RuMtHihS3FcdUbZrma7Uk=; fh=SH94mmwiRuycNuXnyBv4dc1kk0FqoQ65LgesulXvwvo=; b=mmnjGoPd1bmOcrzxmx5Pg5vGxnRXapMZoaUOWxnxqDfY53h7WggsWV5Er6YIb+gYKA m+eWbnS6Wq0jhUA+LFx1Xo6kAAYocxSzrDcGW0jFkZruvk0wiHcZFnFzorBS2fD6FoV3 1t/hlrb7ffcWWeIhQ4To+VcrDzd4SsiuikgBNN22pAb5cx0IQG8e5iUdfljgxdyYjwPv MuPv28TKens42zQWEKbWUfm3qUJpePtMxne5TdLiLos08VSTyY3/iuWJq4QGxXQPle9j vmv1pZFEXG8yNTpEHYywo/HAqMsaXQsPCn5FesvQDkrChBfctYJFRD+0R07K0s2SO++Z RidQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=Ka1zOgQR; dkim=neutral (no key) header.i=@linutronix.de header.b=1c9aXZum; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Return-Path: Received: from groat.vger.email (groat.vger.email. [2620:137:e000::3:5]) by mx.google.com with ESMTPS id t27-20020a056a00139b00b006c0fe935fa1si9415625pfg.181.2023.11.06.13.08.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Nov 2023 13:08:08 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) client-ip=2620:137:e000::3:5; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=Ka1zOgQR; dkim=neutral (no key) header.i=@linutronix.de header.b=1c9aXZum; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id E9CC48048F07; Mon, 6 Nov 2023 13:07:57 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233108AbjKFVHk (ORCPT + 99 others); Mon, 6 Nov 2023 16:07:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57706 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233014AbjKFVHi (ORCPT ); Mon, 6 Nov 2023 16:07:38 -0500 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EE242D51 for ; Mon, 6 Nov 2023 13:07:34 -0800 (PST) From: John Ogness DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1699304853; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=z3f4HQf7ma/ip4YULZcgH7RuMtHihS3FcdUbZrma7Uk=; b=Ka1zOgQRVwOC98UHvxmZ5RPErohmKs1H9jqjbO7+JzTjJeJqk7Md97RUUPigivGsEB481W oPlaHboW50PHEI3jAd8tARikwPbeXPSg12R2JjR0lJdqwr6PioNEHgRO37S3g8d4ElHquM fq+4TvhRAA2TDg7hrK8mYHFYMZkr1we6FfOsh+CnyL0VMD5FBSX3+ed9qjPQj8uBE04fnl e4VMyeBG4M0Afa5PwMJRi3tUVEavK73KAoWvrZQ7x1xfRAqG5yjoWt+MiOVjqZvihlsoFl mK4jH2PofoWJeOTPZ8EafGy75YOfZQblvEaaWbnTlvCQbwx6JP2NJvekIq0a3A== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1699304853; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=z3f4HQf7ma/ip4YULZcgH7RuMtHihS3FcdUbZrma7Uk=; b=1c9aXZumpnPAe5mtBK859fA03kvAfrkbOsMQkwd4RKKpZDEGwC4iFIy27xrg6l01nOCBp6 s6F2tGPdLd3fHXCA== To: Petr Mladek Cc: Sergey Senozhatsky , Steven Rostedt , Thomas Gleixner , linux-kernel@vger.kernel.org, Mukesh Ojha Subject: [PATCH printk v2 1/9] printk: ringbuffer: Do not skip non-finalized records with prb_next_seq() Date: Mon, 6 Nov 2023 22:13:22 +0106 Message-Id: <20231106210730.115192-2-john.ogness@linutronix.de> In-Reply-To: <20231106210730.115192-1-john.ogness@linutronix.de> References: <20231106210730.115192-1-john.ogness@linutronix.de> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INVALID_DATE_TZ_ABSURD, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Mon, 06 Nov 2023 13:07:58 -0800 (PST) Commit f244b4dc53e5 ("printk: ringbuffer: Improve prb_next_seq() performance") introduced an optimization for prb_next_seq() by using best-effort to track recently finalized records. However, the order of finalization does not necessarily match the order of the records. The optimization changed prb_next_seq() to return inconsistent results, possibly yielding sequence numbers that are not available to readers because they are preceded by non-finalized records or they are not yet visible to the reader CPU. Rather than simply best-effort tracking recently finalized records, force the committing writer to read records and increment the last "contiguous block" of finalized records. In order to do this, the sequence number instead of ID must be stored because ID's cannot be directly compared. A new memory barrier pair is introduced to guarantee that a reader can always read the records up until the sequence number returned by prb_next_seq() (unless the records have since been overwritten in the ringbuffer). This restores the original functionality of prb_next_seq() while also keeping the optimization. For 32bit systems, only the lower 32 bits of the sequence number are stored. When reading the value, it is expanded to the full 64bit sequence number by folding in the value returned by prb_first_seq(). Fixes: f244b4dc53e5 ("printk: ringbuffer: Improve prb_next_seq() performance") Signed-off-by: John Ogness --- kernel/printk/printk_ringbuffer.c | 189 ++++++++++++++++++++++++------ kernel/printk/printk_ringbuffer.h | 4 +- 2 files changed, 153 insertions(+), 40 deletions(-) diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c index fde338606ce8..94eede5356ac 100644 --- a/kernel/printk/printk_ringbuffer.c +++ b/kernel/printk/printk_ringbuffer.c @@ -6,6 +6,7 @@ #include #include #include "printk_ringbuffer.h" +#include "internal.h" /** * DOC: printk_ringbuffer overview @@ -303,6 +304,9 @@ * * desc_push_tail:B / desc_reserve:D * set descriptor reusable (state), then push descriptor tail (id) + * + * desc_update_last_finalized:A / desc_last_finalized_seq:A + * store finalized record, then set new highest finalized sequence number */ #define DATA_SIZE(data_ring) _DATA_SIZE((data_ring)->size_bits) @@ -1441,20 +1445,144 @@ bool prb_reserve_in_last(struct prb_reserved_entry *e, struct printk_ringbuffer return false; } +#ifdef CONFIG_64BIT + +#define __u64seq_to_ulseq(u64seq) (u64seq) +#define __ulseq_to_u64seq(ulseq) (ulseq) + +#else /* CONFIG_64BIT */ + +static u64 prb_first_seq(struct printk_ringbuffer *rb); + +#define __u64seq_to_ulseq(u64seq) ((u32)u64seq) +static inline u64 __ulseq_to_u64seq(u32 ulseq) +{ + u64 rb_first_seq = prb_first_seq(prb); + u64 seq; + + /* + * The provided sequence is only the lower 32 bits of the ringbuffer + * sequence. It needs to be expanded to 64bit. Get the first sequence + * number from the ringbuffer and fold it. + */ + seq = rb_first_seq - ((u32)rb_first_seq - ulseq); + + return seq; +} + +#endif /* CONFIG_64BIT */ + +/* + * @last_finalized_seq value guarantees that all records up to and including + * this sequence number are finalized and can be read. The only exception are + * too old records which have already been overwritten. + * + * It is also guaranteed that @last_finalized_seq only increases. + * + * Be aware that finalized records following non-finalized records are not + * reported because they are not yet available to the reader. For example, + * a new record stored via printk() will not be available to a printer if + * it follows a record that has not been finalized yet. However, once that + * non-finalized record becomes finalized, @last_finalized_seq will be + * appropriately updated and the full set of finalized records will be + * available to the printer. And since each printk() caller will either + * directly print or trigger deferred printing of all available unprinted + * records, all printk() messages will get printed. + */ +static u64 desc_last_finalized_seq(struct prb_desc_ring *desc_ring) +{ + unsigned long ulseq; + + /* + * Guarantee the sequence number is loaded before loading the + * associated record in order to guarantee that the record can be + * seen by this CPU. This pairs with desc_update_last_finalized:A. + */ + ulseq = atomic_long_read_acquire(&desc_ring->last_finalized_seq + ); /* LMM(desc_last_finalized_seq:A) */ + + return __ulseq_to_u64seq(ulseq); +} + +static bool _prb_read_valid(struct printk_ringbuffer *rb, u64 *seq, + struct printk_record *r, unsigned int *line_count); + +/* + * Check if there are records directly following @last_finalized_seq that are + * finalized. If so, update @last_finalized_seq to the latest of these + * records. It is not allowed to skip over records that are not yet finalized. + */ +static void desc_update_last_finalized(struct printk_ringbuffer *rb) +{ + struct prb_desc_ring *desc_ring = &rb->desc_ring; + u64 old_seq = desc_last_finalized_seq(desc_ring); + unsigned long oldval; + unsigned long newval; + u64 finalized_seq; + u64 try_seq; + +try_again: + finalized_seq = old_seq; + try_seq = finalized_seq + 1; + + /* Try to find later finalized records. */ + while (_prb_read_valid(rb, &try_seq, NULL, NULL)) { + finalized_seq = try_seq; + try_seq++; + } + + /* No update needed if no later finalized record was found. */ + if (finalized_seq == old_seq) + return; + + oldval = __u64seq_to_ulseq(old_seq); + newval = __u64seq_to_ulseq(finalized_seq); + + /* + * Set the sequence number of a later finalized record that has been + * seen. + * + * Guarantee the record data is visible to other CPUs before storing + * its sequence number. This pairs with desc_last_finalized_seq:A. + * + * Memory barrier involvement: + * + * If desc_last_finalized_seq:A reads from + * desc_update_last_finalized:A, then desc_read:A reads from + * _prb_commit:B. + * + * Relies on: + * + * RELEASE from _prb_commit:B to desc_update_last_finalized:A + * matching + * ACQUIRE from desc_last_finalized_seq:A to desc_read:A + * + * Note: _prb_commit:B and desc_update_last_finalized:A can be + * different CPUs. However, the desc_update_last_finalized:A + * CPU (which performs the release) must have previously seen + * _prb_commit:B. + */ + if (!atomic_long_try_cmpxchg_release(&desc_ring->last_finalized_seq, + &oldval, newval)) { /* LMM(desc_update_last_finalized:A) */ + old_seq = __ulseq_to_u64seq(oldval); + goto try_again; + } +} + /* * Attempt to finalize a specified descriptor. If this fails, the descriptor * is either already final or it will finalize itself when the writer commits. */ -static void desc_make_final(struct prb_desc_ring *desc_ring, unsigned long id) +static void desc_make_final(struct printk_ringbuffer *rb, unsigned long id) { + struct prb_desc_ring *desc_ring = &rb->desc_ring; unsigned long prev_state_val = DESC_SV(id, desc_committed); struct prb_desc *d = to_desc(desc_ring, id); - atomic_long_cmpxchg_relaxed(&d->state_var, prev_state_val, - DESC_SV(id, desc_finalized)); /* LMM(desc_make_final:A) */ - - /* Best effort to remember the last finalized @id. */ - atomic_long_set(&desc_ring->last_finalized_id, id); + if (atomic_long_try_cmpxchg_relaxed(&d->state_var, &prev_state_val, + DESC_SV(id, desc_finalized))) { /* LMM(desc_make_final:A) */ + desc_update_last_finalized(rb); + } } /** @@ -1550,7 +1678,7 @@ bool prb_reserve(struct prb_reserved_entry *e, struct printk_ringbuffer *rb, * readers. (For seq==0 there is no previous descriptor.) */ if (info->seq > 0) - desc_make_final(desc_ring, DESC_ID(id - 1)); + desc_make_final(rb, DESC_ID(id - 1)); r->text_buf = data_alloc(rb, r->text_buf_size, &d->text_blk_lpos, id); /* If text data allocation fails, a data-less record is committed. */ @@ -1643,7 +1771,7 @@ void prb_commit(struct prb_reserved_entry *e) */ head_id = atomic_long_read(&desc_ring->head_id); /* LMM(prb_commit:A) */ if (head_id != e->id) - desc_make_final(desc_ring, e->id); + desc_make_final(e->rb, e->id); } /** @@ -1663,12 +1791,9 @@ void prb_commit(struct prb_reserved_entry *e) */ void prb_final_commit(struct prb_reserved_entry *e) { - struct prb_desc_ring *desc_ring = &e->rb->desc_ring; - _prb_commit(e, desc_finalized); - /* Best effort to remember the last finalized @id. */ - atomic_long_set(&desc_ring->last_finalized_id, e->id); + desc_update_last_finalized(e->rb); } /* @@ -2008,7 +2133,9 @@ u64 prb_first_valid_seq(struct printk_ringbuffer *rb) * newest sequence number available to readers will be. * * This provides readers a sequence number to jump to if all currently - * available records should be skipped. + * available records should be skipped. It is guaranteed that all records + * previous to the returned value have been finalized and are (or were) + * available to the reader. * * Context: Any context. * Return: The sequence number of the next newest (not yet available) record @@ -2017,33 +2144,19 @@ u64 prb_first_valid_seq(struct printk_ringbuffer *rb) u64 prb_next_seq(struct printk_ringbuffer *rb) { struct prb_desc_ring *desc_ring = &rb->desc_ring; - enum desc_state d_state; - unsigned long id; u64 seq; - /* Check if the cached @id still points to a valid @seq. */ - id = atomic_long_read(&desc_ring->last_finalized_id); - d_state = desc_read(desc_ring, id, NULL, &seq, NULL); + seq = desc_last_finalized_seq(desc_ring); - if (d_state == desc_finalized || d_state == desc_reusable) { - /* - * Begin searching after the last finalized record. - * - * On 0, the search must begin at 0 because of hack#2 - * of the bootstrapping phase it is not known if a - * record at index 0 exists. - */ - if (seq != 0) - seq++; - } else { - /* - * The information about the last finalized sequence number - * has gone. It should happen only when there is a flood of - * new messages and the ringbuffer is rapidly recycled. - * Give up and start from the beginning. - */ - seq = 0; - } + /* + * Begin searching after the last finalized record. + * + * On 0, the search must begin at 0 because of hack#2 + * of the bootstrapping phase it is not known if a + * record at index 0 exists. + */ + if (seq != 0) + seq++; /* * The information about the last finalized @seq might be inaccurate. @@ -2085,7 +2198,7 @@ void prb_init(struct printk_ringbuffer *rb, rb->desc_ring.infos = infos; atomic_long_set(&rb->desc_ring.head_id, DESC0_ID(descbits)); atomic_long_set(&rb->desc_ring.tail_id, DESC0_ID(descbits)); - atomic_long_set(&rb->desc_ring.last_finalized_id, DESC0_ID(descbits)); + atomic_long_set(&rb->desc_ring.last_finalized_seq, 0); rb->text_data_ring.size_bits = textbits; rb->text_data_ring.data = text_buf; diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h index 18cd25e489b8..3374a5a3303e 100644 --- a/kernel/printk/printk_ringbuffer.h +++ b/kernel/printk/printk_ringbuffer.h @@ -75,7 +75,7 @@ struct prb_desc_ring { struct printk_info *infos; atomic_long_t head_id; atomic_long_t tail_id; - atomic_long_t last_finalized_id; + atomic_long_t last_finalized_seq; }; /* @@ -259,7 +259,7 @@ static struct printk_ringbuffer name = { \ .infos = &_##name##_infos[0], \ .head_id = ATOMIC_INIT(DESC0_ID(descbits)), \ .tail_id = ATOMIC_INIT(DESC0_ID(descbits)), \ - .last_finalized_id = ATOMIC_INIT(DESC0_ID(descbits)), \ + .last_finalized_seq = ATOMIC_INIT(0), \ }, \ .text_data_ring = { \ .size_bits = (avgtextbits) + (descbits), \ -- 2.39.2