Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp2515253ybl; Sat, 24 Aug 2019 19:45:35 -0700 (PDT) X-Google-Smtp-Source: APXvYqwuqIcRecMCKyUd/Xg4BtkiZpspUyOBVuWqjFx4dxVe9MLc232j0CM+SgVxfZECFsLtucPE X-Received: by 2002:a62:33c3:: with SMTP id z186mr13745631pfz.212.1566701134950; Sat, 24 Aug 2019 19:45:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1566701134; cv=none; d=google.com; s=arc-20160816; b=r9u6JCW50mA7RcW42dA0mUhbiAYcX4GJlR9am1+H7dzV8abk/Tn5JvARm7vJ8zsSkJ MMxY3ZJSh0uVZRjsNGPdZo/qQ3E7CaaV0KNEKsLtJYG0KVkizkkk9yKQ2yviJp+8v/U3 7Pyw5YbUcz9y6/qwsz+cdprJph3Fqi/4Qq9ylRCf2n2Mb0JuwcML9EeAN8wXShRznba6 Kv/Sg5+CyxnId5NIjH7FS5E4038ZfnA3CRgRjzGjxpKm3C20L7NcVc3EVXZk+VeWgwZs Y+c70uZw960wZH+ZcWO1/uvVeWe/AM/ZHQR1NIGen0pkIWhD4O0b+Olhzt9ut/hI7v0O B3Vw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:message-id :in-reply-to:date:references:subject:cc:to:from; bh=rfQcLmIXlDVl9cJpjNz7hCRa4nMuSFDiUXecoIZtQkg=; b=PNroGthJsMvx7qyyrP0LPMrE6LDAdb2+WMc2DeLpqcAB1nGKOeMP/mVMAlkhvVbkcu pVQ5RVgqI1zaIDfPWnbrVM81U7DAYRZA2Et0xz7KHbNTGkZqw0D5U6CaaJYV2DC46PWA 98NWRBHYryvjpMF/GrqG7DLJdgogL/J5EPsJ3gB8LE4lpvLz8NsCvz6HkPW0WMJAHPG8 tgtlfuDlzOd2bVh1JvJIT2JsUJpaj0ucr6s5lhNVeXfh5svvO0bIqqFy6gA27SLwvBka gzH7GMFs0BDszwBbMsL0Yq6g9vLto33aA1VF4dV0MBYOi1ieqAY6t40FB0BQS8ymw4P+ T2EQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d1si5556082pgq.540.2019.08.24.19.45.19; Sat, 24 Aug 2019 19:45:34 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728332AbfHYCnF (ORCPT + 99 others); Sat, 24 Aug 2019 22:43:05 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:37850 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728246AbfHYCnF (ORCPT ); Sat, 24 Aug 2019 22:43:05 -0400 Received: from localhost ([127.0.0.1] helo=vostro.local) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1i1iUI-0002JB-W8; Sun, 25 Aug 2019 04:42:39 +0200 From: John Ogness To: Petr Mladek Cc: linux-kernel@vger.kernel.org, Andrea Parri , Sergey Senozhatsky , Sergey Senozhatsky , Steven Rostedt , Brendan Higgins , Peter Zijlstra , Thomas Gleixner , Linus Torvalds , Greg Kroah-Hartman Subject: Re: dataring_push() barriers Re: [RFC PATCH v4 1/9] printk-rb: add a new printk ringbuffer implementation References: <20190807222634.1723-1-john.ogness@linutronix.de> <20190807222634.1723-2-john.ogness@linutronix.de> <20190820135004.7vatbrzphfsgsnw2@pathway.suse.cz> Date: Sun, 25 Aug 2019 04:42:37 +0200 In-Reply-To: <20190820135004.7vatbrzphfsgsnw2@pathway.suse.cz> (Petr Mladek's message of "Tue, 20 Aug 2019 15:50:04 +0200") Message-ID: <87r25aklsy.fsf@linutronix.de> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.4 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2019-08-20, Petr Mladek wrote: >> +/** >> + * dataring_push() - Reserve a data block in the data array. >> + * >> + * @dr: The data ringbuffer to reserve data in. >> + * >> + * @size: The size to reserve. >> + * >> + * @desc: A pointer to a descriptor to store the data block information. >> + * >> + * @id: The ID of the descriptor to be associated. >> + * The data block will not be set with @id, but rather initialized with >> + * a value that is explicitly different than @id. This is to handle the >> + * case when newly available garbage by chance matches the descriptor >> + * ID. >> + * >> + * This function expects to move the head pointer forward. If this would >> + * result in overtaking the data array index of the tail, the tail data block >> + * will be invalidated. >> + * >> + * Return: A pointer to the reserved writer data, otherwise NULL. >> + * >> + * This will only fail if it was not possible to invalidate the tail data >> + * block. >> + */ >> +char *dataring_push(struct dataring *dr, unsigned int size, >> + struct dr_desc *desc, unsigned long id) >> +{ >> + unsigned long begin_lpos; >> + unsigned long next_lpos; >> + struct dr_datablock *db; >> + bool ret; >> + >> + to_db_size(&size); >> + >> + do { >> + /* fA: */ >> + ret = get_new_lpos(dr, size, &begin_lpos, &next_lpos); >> + >> + /* >> + * fB: >> + * >> + * The data ringbuffer tail may have been pushed (by this or >> + * any other task). The updated @tail_lpos must be visible to >> + * all observers before changes to @begin_lpos, @next_lpos, or >> + * @head_lpos by this task are visible in order to allow other >> + * tasks to recognize the invalidation of the data >> + * blocks. > > This sounds strange. The write barrier should be done only on CPU > that really modified tail_lpos. I.e. it should be in _dataring_pop() > after successful dr->tail_lpos modification. The problem is that there are no data dependencies between the different variables. When a new datablock is being reserved, it is critical that all other observers see that the tail_lpos moved forward _before_ any other changes. _dataring_pop() uses an smp_rmb() to synchronize for tail_lpos movement. This CPU is about to make some changes and may have seen an updated tail_lpos. An smp_wmb() is useless if this is not the CPU that performed that update. The full memory barrier ensures that all other observers will see what this CPU sees before any of its future changes are seen. You suggest an alternative implementation below. I will address that there. >> + * This pairs with the smp_rmb() in _dataring_pop() as well as >> + * any reader task using smp_rmb() to post-validate data that >> + * has been read from a data block. >> + >> + * Memory barrier involvement: >> + * >> + * If dE reads from fE, then dI reads from fA->eA. >> + * If dC reads from fG, then dI reads from fA->eA. >> + * If dD reads from fH, then dI reads from fA->eA. >> + * If mC reads from fH, then mF reads from fA->eA. >> + * >> + * Relies on: >> + * >> + * FULL MB between fA->eA and fE >> + * matching >> + * RMB between dE and dI >> + * >> + * FULL MB between fA->eA and fG >> + * matching >> + * RMB between dC and dI >> + * >> + * FULL MB between fA->eA and fH >> + * matching >> + * RMB between dD and dI >> + * >> + * FULL MB between fA->eA and fH >> + * matching >> + * RMB between mC and mF >> + */ >> + smp_mb(); > > All these comments talk about sychronization against read barriers. > It means that we would need a write barrier here. But it does > not make much sense to do write barrier before actually > writing dr->head_lpos. I think my comments above address this. > After all I think that we do not need any barrier here. > The write barrier for dr->tail_lpos should be in > _dataring_pop(). The read barrier is not needed because > we are not reading anything here. > > Instead we should put a barrier after modyfying dr->head_lpos, > see below. Comments below. >> + if (!ret) { >> + /* >> + * Force @desc permanently invalid to minimize risk >> + * of the descriptor later unexpectedly being >> + * determined as valid due to overflowing/wrapping of >> + * @head_lpos. An unaligned @begin_lpos can never >> + * point to a data block and having the same value >> + * for @begin_lpos and @next_lpos is also invalid. >> + */ >> + >> + /* fC: */ >> + WRITE_ONCE(desc->begin_lpos, 1); >> + >> + /* fD: */ >> + WRITE_ONCE(desc->next_lpos, 1); >> + >> + return NULL; >> + } >> + /* fE: */ >> + } while (atomic_long_cmpxchg_relaxed(&dr->head_lpos, begin_lpos, >> + next_lpos) != begin_lpos); >> + > > We need a write barrier here to make sure that dr->head_lpos > is updated before we start updating other values, e.g. > db->id below. My RFCv2 implemented it that way. The function was called data_reserve() and it moved the head using cmpxchg_release(). For RFCv3 I changed to a full memory barrier instead because using acquire/release here is a bit messy. There are 2 different places where the acquire needed to be: - In _dataring_pop() a load_acquire() of head_lpos would need to be _before_ loading of begin_lpos and next_lpos. - In prb_iter_next_valid_entry() a load_acquire() of head_lpos would need to be at the beginning within the dataring_datablock_isvalid() check (mC). If smp_mb() is too heavy to call for every printk(), then we can move to acquire/release. The comments of fB list exactly what is synchronized (and where). John Ogness