From: John Ogness
To: Petr Mladek
Cc: linux-kernel@vger.kernel.org, Andrea Parri, Sergey Senozhatsky,
    Steven Rostedt, Brendan Higgins, Peter Zijlstra, Thomas Gleixner,
    Linus Torvalds, Greg Kroah-Hartman
Subject: Re: numlist_push() barriers Re: [RFC PATCH v4 1/9] printk-rb: add a new printk ringbuffer implementation
Date: Tue, 27 Aug 2019 00:36:18 +0200
Message-ID: <878srfo8pp.fsf@linutronix.de>
In-Reply-To: <20190823092109.doduc36avylm5cds@pathway.suse.cz>
    (Petr Mladek's message of "Fri, 23 Aug 2019 11:21:09 +0200")
References: <20190807222634.1723-1-john.ogness@linutronix.de>
    <20190807222634.1723-2-john.ogness@linutronix.de>
    <20190823092109.doduc36avylm5cds@pathway.suse.cz>

Hi Petr,

AndreaP responded with some explanation (and great links!) on the topic
of READ_ONCE. But I feel like your comments about the WRITE_ONCE were
not addressed. I address that (and your other comments) below...

On 2019-08-23, Petr Mladek wrote:
>> --- /dev/null
>> +++ b/kernel/printk/numlist.c
>> +/**
>> + * numlist_push() - Add a node to the list and assign it a sequence number.
>> + *
>> + * @nl: The numbered list to push to.
>> + *
>> + * @n:  A node to push to the numbered list.
>> + *      The node must not already be part of a list.
>> + *
>> + * @id: The ID of the node.
>> + *
>> + * A node is added in two steps: The first step is to make this node the
>> + * head, which causes a following push to add to this node. The second step is
>> + * to update @next_id of the former head node to point to this one, which
>> + * makes this node visible to any task that sees the former head node.
>> + */
>> +void numlist_push(struct numlist *nl, struct nl_node *n, unsigned long id)
>> +{
>> +        unsigned long head_id;
>> +        unsigned long seq;
>> +        unsigned long r;
>> +
>> +        /*
>> +         * bA:
>> +         *
>> +         * Setup the node to be a list terminator: next_id == id.
>> +         */
>> +        WRITE_ONCE(n->next_id, id);
>
> Do we need WRITE_ONCE() here?
> Both "n" and "id" are given as parameters and do not change.
> The assignment must be done before "id" is set as nl->head_id.
> The ordering is enforced by cmpxchg_release().

The cmpxchg_release() ensures that if the node is visible to writers,
then the finalized assignment is also visible. And the store_release()
ensures that if the previous node is visible to any readers, then the
finalized assignment is also visible.

In the reader case, if any readers happen to be sitting on the node,
numlist_read() will fail because the ID was updated when the node was
popped.

So in all of these cases, any compiler optimizations affecting that
assignment (tearing, speculation, etc.) should be irrelevant.
Therefore, IMO the WRITE_ONCE() is not needed.

Since all of this is lockless, I used WRITE_ONCE() whenever touching
shared variables. I must admit the decision may have been motivated
primarily by fear of compiler optimizations, although "documenting
lockless shared variable access" did play a role as well.

I will replace the WRITE_ONCE with an assignment.

>> +
>> +        /* bB: #1 */
>> +        head_id = atomic_long_read(&nl->head_id);
>> +
>> +        for (;;) {
>> +                /* bC: */
>> +                while (!numlist_read(nl, head_id, &seq, NULL)) {
>> +                        /*
>> +                         * @head_id is invalid. Try again with an
>> +                         * updated value.
>> +                         */
>> +
>> +                        cpu_relax();
>
> I have got very confused by this. cpu_relax() suggests that this
> cycle is busy waiting until a particular node becomes valid.
> My first thought was that it must cause deadlock in NMI when
> the interrupted code is supposed to make the node valid.
>
> But it is the other way. The head is always valid when it is
> added to the list. It might become invalid when another CPU
> moves the head and the old one gets reused.
>
> Anyway, I do not see any reason for cpu_relax() here.

You are correct. The cpu_relax() should not be there. However, there
is still an issue: this loop could spin hard if the head was recycled
and this CPU does not yet see the new head value.

To handle that, and in preparation for my next version, I am now using
a read_acquire() to load the ID in the node() callback (matching the
set_release() in assign_desc()). This ensures that if numlist_read()
fails, the new head will be visible.
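
To make that pairing concrete, here is a rough sketch of what I mean.
It is only illustrative: the struct and function names are made up,
and the real node() callback takes (id, arg) and looks the descriptor
up from those; that lookup is omitted here.

/* Illustrative sketch only; not the actual descriptor code. */
#include <linux/atomic.h>

struct my_desc {
        atomic_long_t   id;     /* changes when the descriptor is recycled */
        struct nl_node  list;   /* node as defined in numlist.h */
};

/*
 * Recycling side (what assign_desc() would do): the new ID is
 * published with release semantics only after the head has already
 * moved past this descriptor, so a task that observes the new ID
 * also observes the new head.
 */
static void my_desc_recycle(struct my_desc *d, unsigned long new_id)
{
        atomic_long_set_release(&d->id, new_id);
}

/*
 * node() callback side: the ID is loaded with acquire semantics. If
 * it no longer matches, numlist_read() fails, and the acquire
 * guarantees that re-reading nl->head_id afterwards returns the
 * updated head rather than spinning on a stale value.
 */
static struct nl_node *my_desc_node(struct my_desc *d, unsigned long id)
{
        if (atomic_long_read_acquire(&d->id) != id)
                return NULL;
        return &d->list;
}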

> Also the entire cycle would deserve a comment to avoid this mistake.
> For example:
>
>        /*
>         * bC: Read seq from current head. Repeat with new
>         * head when it has changed and the old one got reused.
>         */

Agreed.

>> +
>> +                        /* bB: #2 */
>> +                        head_id = atomic_long_read(&nl->head_id);
>> +                }
>> +
>> +                /*
>> +                 * bD:
>> +                 *
>> +                 * Set @seq to +1 of @seq from the previous head.
>> +                 *
>> +                 * Memory barrier involvement:
>> +                 *
>> +                 * If bB reads from bE, then bC->aA reads from bD.
>> +                 *
>> +                 * Relies on:
>> +                 *
>> +                 * RELEASE from bD to bE
>> +                 * matching
>> +                 * ADDRESS DEP. from bB to bC->aA
>> +                 */
>> +                WRITE_ONCE(n->seq, seq + 1);
>
> Do we really need WRITE_ONCE() here?
> It is the same problem as with setting n->next_id above.

For the same reasons as the other WRITE_ONCE, I will replace this
WRITE_ONCE with an assignment as well.

>> +
>> +                /*
>> +                 * bE:
>> +                 *
>> +                 * This store_release() guarantees that @seq and @next are
>> +                 * stored before the node with @id is visible to any popping
>> +                 * writers. It pairs with the address dependency between @id
>> +                 * and @seq/@next provided by numlist_read(). See bD and bF
>> +                 * for details.
>> +                 */
>> +                r = atomic_long_cmpxchg_release(&nl->head_id, head_id, id);
>> +                if (r == head_id)
>> +                        break;
>> +
>> +                /* bB: #3 */
>> +                head_id = r;
>> +        }
>> +
>> +        n = nl->node(head_id, nl->node_arg);
>> +
>> +        /*
>> +         * The old head (which is still the list terminator), cannot be
>> +         * removed because the list will always have at least one node.
>> +         * Therefore @n must be non-NULL.
>> +         */
>
> Please, move this comment above the nl->node() call. Both locations
> make sense. I just see it as an important note for the call and thus
> it should be above. Also it will be better separated from the below
> comments for the _release() barrier.

OK.

>> +        /*
>> +         * bF: the STORE part for @next_id
>> +         *
>> +         * Set @next_id of the previous head to @id.
>> +         *
>> +         * Memory barrier involvement:
>> +         *
>> +         * If bB reads from bE, then bF overwrites bA.
>> +         *
>> +         * Relies on:
>> +         *
>> +         * RELEASE from bA to bE
>> +         * matching
>> +         * ADDRESS DEP. from bB to bF
>> +         */
>> +        /*
>> +         * bG: the RELEASE part for @next_id
>> +         *
>> +         * This _release() guarantees that a reader will see the updates to
>> +         * this node's @seq/@next_id if the reader saw the @next_id of the
>> +         * previous node in the list. It pairs with the address dependency
>> +         * between @id and @seq/@next provided by numlist_read().
>> +         *
>> +         * Memory barrier involvement:
>> +         *
>> +         * If aB reads from bG, then aA' reads from bD, where aA' is in
>> +         * numlist_read() to read the node ID from bG.
>> +         * If aB reads from bG, then aB' reads from bA, where aB' is in
>> +         * numlist_read() to read the node ID from bG.
>> +         *
>> +         * Relies on:
>> +         *
>> +         * RELEASE from bG to bD
>> +         * matching
>> +         * ADDRESS DEP. from aB to aA'
>> +         *
>> +         * RELEASE from bG to bA
>> +         * matching
>> +         * ADDRESS DEP. from aB to aB'
>> +         */
>> +        smp_store_release(&n->next_id, id);
>
> Sigh, I see this line one screen below the previous command thanks
> to the extensive comments. Well, the bF comment looks redundant.

Yes, it is a lot, but bF and bG are commenting on different things.

bF is the explanation that the _writer_ will overwrite the previous
node's terminating value with the ID of the new node.

bG is the explanation that if the _reader_ reads a non-terminating
@next_id value, then the initialization of @seq and @next_id for that
next node will be visible.
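
To show the reader side of the bG pairing, here is a stripped-down
sketch of one traversal step. This is not the actual numlist_read()
code; the function name is made up and it only illustrates the shape
of the address dependency:

/*
 * Reader-side sketch only; not the real numlist_read(). It assumes
 * struct numlist and struct nl_node from numlist.h.
 */
static bool peek_next_seq(struct numlist *nl, unsigned long id,
                          unsigned long *next_seq)
{
        struct nl_node *n, *next;
        unsigned long next_id;

        n = nl->node(id, nl->node_arg);
        if (!n)
                return false;

        /* aB: read @next_id of the node @id. */
        next_id = READ_ONCE(n->next_id);
        if (next_id == id)
                return false;   /* @id is still the terminator */

        /*
         * aA'/aB': the next node is reached through the value of
         * @next_id, so these loads are ordered after aB by the
         * address dependency. Because the pusher published @next_id
         * with smp_store_release() (bG) only after writing the new
         * node's @seq (bD) and @next_id (bA), those values are
         * guaranteed to be visible here.
         */
        next = nl->node(next_id, nl->node_arg);
        if (!next)
                return false;

        *next_seq = READ_ONCE(next->seq);
        return true;
}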

Both of these points are key to the numlist implementation because
they guarantee the complete and correct linking of the list for all
nodes pushed.

John Ogness