From: John Ogness
To: Petr Mladek
Cc: Peter Zijlstra, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt, Linus Torvalds, Greg Kroah-Hartman, Andrea Parri, Thomas Gleixner, kexec@lists.infradead.org, linux-kernel@vger.kernel.org, Paul McKenney
Subject: Re: Barrier before pushing desc_ring tail: was [PATCH v2 2/3] printk: add lockless buffer
References: <20200501094010.17694-1-john.ogness@linutronix.de> <20200501094010.17694-3-john.ogness@linutronix.de> <20200609113751.GD23752@linux-b0ei>
Date: Tue, 09 Jun 2020 17:56:03 +0200
In-Reply-To: <20200609113751.GD23752@linux-b0ei> (Petr Mladek's message of "Tue, 9 Jun 2020 13:37:51 +0200")
Message-ID: <87d068utbg.fsf@vostro.fn.ogness.net>
On 2020-06-09, Petr Mladek wrote:
>> --- /dev/null
>> +++ b/kernel/printk/printk_ringbuffer.c
>> +/*
>> + * Advance the desc ring tail. This function advances the tail by one
>> + * descriptor, thus invalidating the oldest descriptor. Before advancing
>> + * the tail, the tail descriptor is made reusable and all data blocks up to
>> + * and including the descriptor's data block are invalidated (i.e. the data
>> + * ring tail is pushed past the data block of the descriptor being made
>> + * reusable).
>> + */
>> +static bool desc_push_tail(struct printk_ringbuffer *rb,
>> +                           unsigned long tail_id)
>> +{
>> +        struct prb_desc_ring *desc_ring = &rb->desc_ring;
>> +        enum desc_state d_state;
>> +        struct prb_desc desc;
>> +
>> +        d_state = desc_read(desc_ring, tail_id, &desc);
>> +
>> +        switch (d_state) {
>> +        case desc_miss:
>> +                /*
>> +                 * If the ID is exactly 1 wrap behind the expected, it is
>> +                 * in the process of being reserved by another writer and
>> +                 * must be considered reserved.
>> +                 */
>> +                if (DESC_ID(atomic_long_read(&desc.state_var)) ==
>> +                    DESC_ID_PREV_WRAP(desc_ring, tail_id)) {
>> +                        return false;
>> +                }
>> +
>> +                /*
>> +                 * The ID has changed. Another writer must have pushed the
>> +                 * tail and recycled the descriptor already. Success is
>> +                 * returned because the caller is only interested in the
>> +                 * specified tail being pushed, which it was.
>> +                 */
>> +                return true;
>> +        case desc_reserved:
>> +                return false;
>> +        case desc_committed:
>> +                desc_make_reusable(desc_ring, tail_id);
>> +                break;
>> +        case desc_reusable:
>> +                break;
>> +        }
>> +
>> +        /*
>> +         * Data blocks must be invalidated before their associated
>> +         * descriptor can be made available for recycling. Invalidating
>> +         * them later is not possible because there is no way to trust
>> +         * data blocks once their associated descriptor is gone.
>> +         */
>> +
>> +        if (!data_push_tail(rb, &rb->text_data_ring, desc.text_blk_lpos.next))
>> +                return false;
>> +        if (!data_push_tail(rb, &rb->dict_data_ring, desc.dict_blk_lpos.next))
>> +                return false;
>> +
>> +        /*
>> +         * Check the next descriptor after @tail_id before pushing the tail
>> +         * to it because the tail must always be in a committed or reusable
>> +         * state. The implementation of prb_first_seq() relies on this.
>> +         *
>> +         * A successful read implies that the next descriptor is less than or
>> +         * equal to @head_id so there is no risk of pushing the tail past the
>> +         * head.
>> +         */
>> +        d_state = desc_read(desc_ring, DESC_ID(tail_id + 1),
>> +                            &desc); /* LMM(desc_push_tail:A) */
>> +        if (d_state == desc_committed || d_state == desc_reusable) {
>> +                /*
>> +                 * Any CPU that loads the new tail ID, must see that the
>> +                 * descriptor at @tail_id is in the reusable state. See the
>> +                 * read memory barrier part of desc_reserve:D for details.
>> +                 */
>> +                atomic_long_cmpxchg_relaxed(&desc_ring->tail_id, tail_id,
>> +                                DESC_ID(tail_id + 1)); /* LMM(desc_push_tail:B) */
>
> I was quite confused by the above comment. Does it mean that we need
> a barrier here? Or does it explain why the cmpxchg has its own
> LMM marker?

This LMM marker is referenced quite often, but since it is a relaxed
cmpxchg(), its significance is not immediately clear. I was hoping to
add some hints as to why it is significant.
The comment that it is referring to is:

/*
 * Guarantee the tail ID is read before validating the
 * recycled descriptor state. A read memory barrier is
 * sufficient for this. This pairs with data_push_tail:C.
 *
 * Memory barrier involvement:
 *
 * If desc_reserve:C reads from desc_push_tail:B, then
 * desc_reserve:F reads from desc_make_reusable:A.
 *
 * Relies on:
 *
 * MB from desc_make_reusable:A to desc_push_tail:B
 *    matching
 * RMB from desc_reserve:C to desc_reserve:F
 *
 * Note: desc_make_reusable:A, desc_push_tail:B, and
 *       data_push_tail:C can all be different CPUs. However,
 *       the desc_push_tail:B CPU must have previously seen
 *       data_push_tail:D and the data_push_tail:D CPU (which
 *       performs the full memory barrier) must have
 *       previously seen desc_make_reusable:A.
 */

English translation:

In order to push the data tail, a CPU must first see that the
associated descriptor is in the reusable state. Since a full memory
barrier is performed after that sighting and before doing the data
tail push, _any_ CPU that sees the pushed data tail will be able to
see that the associated descriptor is in the reusable state.

In order to push the descriptor tail, a CPU must first see that the
associated data tail has been pushed. Therefore, that CPU would also
see that the associated descriptor is in the reusable state.

> I think that we actually need a full barrier here to make sure that
> all CPUs see the changes made by data_push_tail() before we
> allow to rewrite the descriptor. The changes in data_push_tail() might
> be done on different CPUs.

There needs to be a reason why the ordering of data tail pushing and
descriptor tail pushing is important.

> It is similar like the full barrier in data_push_tail() before changing
> data_ring->tail_lpos.

How so? That memory barrier exists to make sure the reusable
descriptor state is stored before pushing the data tail. This is
important for readers (which start from the data tail) so they can
notice if the descriptor has since been invalidated (reusable state).

But where is it important that the data tail change is seen before the
descriptor tail change? How are the data tail and descriptor tail
significantly related to each other?

John Ogness
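
Purely as an illustration of the pairing described in the "English
translation" above, the pattern can be written as a rough LKMM-style
litmus test. This is a sketch, not the actual ringbuffer code: P0
stands for the CPU that marks the descriptor reusable and pushes the
data tail, P1 for the CPU that pushes the descriptor tail after seeing
the data tail pushed, and P2 for the CPU in desc_reserve(). The
variable names are simplified stand-ins, and the relaxed cmpxchg at
desc_push_tail:B is reduced to a plain marked store behind the
data-tail check.

C printk-rb-desc-push-tail-sketch

{}

P0(int *state, int *data_tail)
{
        WRITE_ONCE(*state, 1);      /* desc_make_reusable:A             */
        smp_mb();                   /* full barrier in data_push_tail() */
        WRITE_ONCE(*data_tail, 1);  /* data_push_tail:D                 */
}

P1(int *data_tail, int *desc_tail)
{
        int r0;

        r0 = READ_ONCE(*data_tail);        /* data_push_tail() succeeds  */
        if (r0)
                WRITE_ONCE(*desc_tail, 1); /* desc_push_tail:B (relaxed) */
}

P2(int *desc_tail, int *state)
{
        int r1;
        int r2;

        r1 = READ_ONCE(*desc_tail); /* desc_reserve:C                   */
        smp_rmb();                  /* read barrier in desc_reserve()   */
        r2 = READ_ONCE(*state);     /* desc_reserve:F                   */
}

exists (1:r0=1 /\ 2:r1=1 /\ 2:r2=0)

The exists clause asks whether P2 can observe the new descriptor tail
while still seeing the stale (non-reusable) descriptor state. Running
this through herd7 with the kernel memory model would show whether the
ordering that P1 gets without an explicit barrier is sufficient, which
is essentially the question being discussed here.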