Received: by 2002:a25:683:0:0:0:0:0 with SMTP id 125csp373023ybg; Tue, 9 Jun 2020 02:52:36 -0700 (PDT) X-Google-Smtp-Source: ABdhPJygS6E+bbd8WRpkJpAlnzjp/+72pVt2Vvuh0jKTcZNbsrDhTp7rGOVxDCcMSovrNdyicPEU X-Received: by 2002:a17:906:6897:: with SMTP id n23mr25314595ejr.437.1591696356075; Tue, 09 Jun 2020 02:52:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1591696356; cv=none; d=google.com; s=arc-20160816; b=sSXZIAGkYbvmCfJhyaW7zezKi48jrS24NpR2SCdQE1lOpBkE2899698uKd2j56HPL8 aGgCHXvah3+KYht+BTAdW5qzlZHRz8dsgU/H/XCVYsc9cVmX2PRXKLlxIMQqTNnPsgpI vP62Z8Qg2MPTvHjFUyIxMCWWYNCX80ahgxEiGn1PyNIvX0teu1rYLDYtDfVRMmb8ISMk 04TdCkQGV6+3WMjH62tnOm5BEyyAxF9h2OvpAU1f+InDIdCZLobhAa7dMHFVP1DHEJZv NfqfRNNxk4u8VGpD7WvoU/acBaNPL9nohdE2AITAuV8XpD5gDFmG1+HQq1QVblMpXBc7 BS/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=Sj44goQVXDJnixWJDDDOyU9qPs87q+YBiY2mZ8eE3og=; b=CDE2nkjlN+cZiHNJk/6Cxsn/zwVEnuWwF5XgP7UayLyQtB415D9Ep+ITKEi+XN9/T9 U0kBTeK9uxnkTkHAvDSsrNAmJ3ijJHd+xNqt65qIbZF4NK90ZSyV+B+cWxTu7oSfXXr9 bEW8GX2j0pczrG2wmtX19yHFmFwsdYdYZwvVM+f0gMqvZb3VZgyorePPIE4bGp+wTcei 4fSNi6y0y5MOmxirI/239sRuVsDk/CneSAKnWeR70Jjs5xZvd2BTDHG4LhuqbfRS98GB TtOUnNeFsG8uuCn1E3XR6zLkTR2qImb56lGXH0KJCYo8kUNSvehHKDPh1QLLu63nfzZF +0IA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id mj14si10280358ejb.464.2020.06.09.02.52.12; Tue, 09 Jun 2020 02:52:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728068AbgFIJsZ (ORCPT + 99 others); Tue, 9 Jun 2020 05:48:25 -0400 Received: from mx2.suse.de ([195.135.220.15]:56536 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725950AbgFIJsZ (ORCPT ); Tue, 9 Jun 2020 05:48:25 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id EE31CAE67; Tue, 9 Jun 2020 09:48:25 +0000 (UTC) Date: Tue, 9 Jun 2020 11:48:20 +0200 From: Petr Mladek To: John Ogness Cc: Peter Zijlstra , Sergey Senozhatsky , Sergey Senozhatsky , Steven Rostedt , Linus Torvalds , Greg Kroah-Hartman , Andrea Parri , Thomas Gleixner , kexec@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Full barrier in data_push_tail(): was [PATCH v2 2/3] printk: add lockless buffer Message-ID: <20200609094820.GC23752@linux-b0ei> References: <20200501094010.17694-1-john.ogness@linutronix.de> <20200501094010.17694-3-john.ogness@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200501094010.17694-3-john.ogness@linutronix.de> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 2020-05-01 11:46:09, John Ogness wrote: > Introduce a multi-reader multi-writer lockless ringbuffer for storing > the kernel log messages. Readers and writers may use their API from > any context (including scheduler and NMI). This ringbuffer will make > it possible to decouple printk() callers from any context, locking, > or console constraints. It also makes it possible for readers to have > full access to the ringbuffer contents at any time and context (for > example from any panic situation). > > diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c > new file mode 100644 > index 000000000000..e0a66468d4f3 > --- /dev/null > +++ b/kernel/printk/printk_ringbuffer.c > +/* > + * Advance the data ring tail to at least @lpos. This function puts > + * descriptors into the reusable state if the tail is pushed beyond > + * their associated data block. > + */ > +static bool data_push_tail(struct printk_ringbuffer *rb, > + struct prb_data_ring *data_ring, > + unsigned long lpos) > +{ > + unsigned long tail_lpos; > + unsigned long next_lpos; > + > + /* If @lpos is not valid, there is nothing to do. */ > + if (lpos == INVALID_LPOS) > + return true; > + > + tail_lpos = atomic_long_read(&data_ring->tail_lpos); > + > + do { > + /* Done, if the tail lpos is already at or beyond @lpos. */ > + if ((lpos - tail_lpos) - 1 >= DATA_SIZE(data_ring)) > + break; > + > + /* > + * Make all descriptors reusable that are associated with > + * data blocks before @lpos. > + */ > + if (!data_make_reusable(rb, data_ring, tail_lpos, lpos, > + &next_lpos)) { > + /* > + * Guarantee the descriptor state loaded in > + * data_make_reusable() is performed before reloading > + * the tail lpos. The failed data_make_reusable() may > + * be due to a newly recycled descriptor causing > + * the tail lpos to have been previously pushed. This > + * pairs with desc_reserve:D. > + * > + * Memory barrier involvement: > + * > + * If data_make_reusable:D reads from desc_reserve:G, > + * then data_push_tail:B reads from data_push_tail:D. > + * > + * Relies on: > + * > + * MB from data_push_tail:D to desc_reserve:G > + * matching > + * RMB from data_make_reusable:D to data_push_tail:B > + * > + * Note: data_push_tail:D and desc_reserve:G can be > + * different CPUs. However, the desc_reserve:G > + * CPU (which performs the full memory barrier) > + * must have previously seen data_push_tail:D. > + */ > + smp_rmb(); /* LMM(data_push_tail:A) */ > + > + next_lpos = atomic_long_read(&data_ring->tail_lpos > + ); /* LMM(data_push_tail:B) */ > + if (next_lpos == tail_lpos) > + return false; > + > + /* Another task pushed the tail. Try again. */ > + tail_lpos = next_lpos; > + continue; > + } > + > + /* > + * Guarantee any descriptor states that have transitioned to > + * reusable are stored before pushing the tail lpos. This > + * allows readers to identify if data has expired while > + * reading the descriptor. This pairs with desc_read:D. > + */ > + smp_mb(); /* LMM(data_push_tail:C) */ The comment does not explain why we need a full barrier here. It talks about writing descriptor states. It suggests that write barrier should be enough. I guess that this is related to the discussion that we had last time, and the litmus test mentioned in see https://lore.kernel.org/r/87h7zcjkxy.fsf@linutronix.de I would add something like: * Full barrier is necessary because the descriptors * might have been made reusable also by other CPUs. For people like me, it would be great to add also link to a more detailed explanation, for example, the litmus tests, or something even more human readable ;-) I think that it is a "rather" common problem. I wonder whether it is already documented somewhere. > + } while (!atomic_long_try_cmpxchg_relaxed(&data_ring->tail_lpos, > + &tail_lpos, next_lpos)); /* LMM(data_push_tail:D) */ > + > + return true; > +} > + Best Regards, Petr