From: John Ogness
To: Petr Mladek
Cc: Peter Zijlstra, Sergey Senozhatsky, Sergey Senozhatsky, Steven Rostedt,
    Linus Torvalds, Greg Kroah-Hartman, Andrea Parri, Thomas Gleixner,
    kexec@lists.infradead.org, linux-kernel@vger.kernel.org, Paul McKenney
Subject: Re: redundant check in make_data_reusable(): was [PATCH v2 2/3] printk: add lockless buffer
Date: Thu, 11 Jun 2020 21:51:18 +0200
Message-ID: <87k10dtm89.fsf@vostro.fn.ogness.net>
In-Reply-To: <87d067otoq.fsf@vostro.fn.ogness.net> (John Ogness's message of "Wed, 10 Jun 2020 16:56:53 +0200")
On 2020-06-10, John Ogness wrote:
>>> +static bool data_make_reusable(struct printk_ringbuffer *rb,
>>> +                               struct prb_data_ring *data_ring,
>>> +                               unsigned long lpos_begin,
>>> +                               unsigned long lpos_end,
>>> +                               unsigned long *lpos_out)
>>> +{
>>> +        struct prb_desc_ring *desc_ring = &rb->desc_ring;
>>> +        struct prb_data_blk_lpos *blk_lpos;
>>> +        struct prb_data_block *blk;
>>> +        unsigned long tail_lpos;
>>> +        enum desc_state d_state;
>>> +        struct prb_desc desc;
>>> +        unsigned long id;
>>> +
>>> +        /*
>>> +         * Using the provided @data_ring, point @blk_lpos to the correct
>>> +         * blk_lpos within the local copy of the descriptor.
>>> +         */
>>> +        if (data_ring == &rb->text_data_ring)
>>> +                blk_lpos = &desc.text_blk_lpos;
>>> +        else
>>> +                blk_lpos = &desc.dict_blk_lpos;
>>> +
>>> +        /* Loop until @lpos_begin has advanced to or beyond @lpos_end. */
>>> +        while ((lpos_end - lpos_begin) - 1 < DATA_SIZE(data_ring)) {
>>> +                blk = to_block(data_ring, lpos_begin);
>>> +                id = READ_ONCE(blk->id); /* LMM(data_make_reusable:A) */
>>> +
>>> +                /*
>>> +                 * Guarantee the block ID is loaded before checking the tail
>>> +                 * lpos. The loaded block ID can only be considered valid if
>>> +                 * the tail lpos has not overtaken @lpos_begin. This pairs
>>> +                 * with data_alloc:A.
>>> +                 *
>>> +                 * Memory barrier involvement:
>>> +                 *
>>> +                 * If data_make_reusable:A reads from data_alloc:B, then
>>> +                 * data_make_reusable:C reads from data_push_tail:D.
>>> +                 *
>>> +                 * Relies on:
>>> +                 *
>>> +                 * MB from data_push_tail:D to data_alloc:B
>>> +                 *    matching
>>> +                 * RMB from data_make_reusable:A to data_make_reusable:C
>>> +                 *
>>> +                 * Note: data_push_tail:D and data_alloc:B can be different
>>> +                 *       CPUs. However, the data_alloc:B CPU (which performs
>>> +                 *       the full memory barrier) must have previously seen
>>> +                 *       data_push_tail:D.
>>> +                 */
>>> +                smp_rmb(); /* LMM(data_make_reusable:B) */
>>> +
>>> +                tail_lpos = atomic_long_read(&data_ring->tail_lpos
>>> +                                        ); /* LMM(data_make_reusable:C) */
>>> +
>>> +                /*
>>> +                 * If @lpos_begin has fallen behind the tail lpos, the read
>>> +                 * block ID cannot be trusted. Fast forward @lpos_begin to the
>>> +                 * tail lpos and try again.
>>> +                 */
>>> +                if (lpos_begin - tail_lpos >= DATA_SIZE(data_ring)) {
>>> +                        lpos_begin = tail_lpos;
>>> +                        continue;
>>> +                }
>>> +
>>> +                d_state = desc_read(desc_ring, id,
>>> +                                    &desc); /* LMM(data_make_reusable:D) */
>>> +
>>> +                switch (d_state) {
>>> +                case desc_miss:
>>> +                        return false;
>>> +                case desc_reserved:
>>> +                        return false;
>>> +                case desc_committed:
>>> +                        /*
>>> +                         * This data block is invalid if the descriptor
>>> +                         * does not point back to it.
>>> +                         */
>>
>> Here again the comments describe what the check does but not why.
>> I would write something like:
>>
>>        /*
>>         * The block might have already been
>>         * reused. Make sure that the descriptor really
>>         * points back to the checked lpos. It covers
>>         * both situations. Random data might point to
>>         * a valid descriptor just by chance. Or the block
>>         * has been already reused by another descriptor.
>>         */
>
> Originally this check was needed because the descriptor would be read
> even if there was a data race reading the ID from the data
> block. Validating the lpos value was a kind of protection against
> reading random data that by chance yielded an ID of a committed/reusable
> descriptor.
>
> However, after you pointed out that this check was not enough, the code
> now re-checks the data tail to make sure that no data race happened. So
> actually it is not possible that a descriptor in the committed/reusable
> state will point anywhere else. We know the ID is not random garbage or
> recycled, so the state can be trusted.
>
> I recommend to either remove this sanity check (for committed and
> reusable) or at least change it to:
>
>        WARN_ON_ONCE(blk_lpos->begin != lpos_begin);

Sorry, I wasn't thinking things out correctly with this suggestion. We
need the checks. The explanation is as follows...

With regard to the block ID read (data_make_reusable:A):

Failing the following tail check is a guarantee that we read garbage.
Data behind the data tail must never be trusted. It is correct to try
again with the new tail value.

Passing the following tail check could still be a garbage read. The
descriptor owning that data block may be in the reserved state and has
not yet written the new block ID. In that case we read whatever random
data was there even though the tail check passes. And that random data
could yield a valid descriptor ID, resulting in some state value that
does not apply.

The "descriptor pointing back to data block" checks are playing the
critical role of protecting against data_make_reusable:A reading
garbage. And indeed, I can reproduce this situation fairly easily on
ppc64le+kvm tests.

So please disregard this suggestion. And yes, the comments for these
checks need to be expanded to make clear just how critical they are and
explain how the situation can occur.

John Ogness

>>> +                        if (blk_lpos->begin != lpos_begin)
>>> +                                return false;
>>> +                        desc_make_reusable(desc_ring, id);
>>> +                        break;
>>> +                case desc_reusable:
>>> +                        /*
>>> +                         * This data block is invalid if the descriptor
>>> +                         * does not point back to it.
>>> +                         */
>>> +                        if (blk_lpos->begin != lpos_begin)
>>> +                                return false;
>>> +                        break;
>>> +                }
>>> +
>>> +                /* Advance @lpos_begin to the next data block. */
>>> +                lpos_begin = blk_lpos->next;
>>> +        }
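To make the failure mode described above concrete, here is a minimal
single-threaded userspace sketch of the back-pointer validation idea. It
is not the kernel code: the names (toy_desc, toy_block_valid, NUM_DESCS)
and the data layout are invented for illustration, and only the
"descriptor must point back to the lpos it was reached from" check
mirrors what data_make_reusable() relies on.

/*
 * Sketch: a garbage or stale block ID can still index a valid,
 * committed descriptor, so the descriptor state alone cannot be
 * trusted. The descriptor must also point back to the data block's
 * logical position (lpos) before its state is used.
 */
#include <stdbool.h>
#include <stdio.h>

#define NUM_DESCS 4

struct toy_desc {
        unsigned long begin;    /* lpos of the data block this desc owns */
        int committed;          /* non-zero once the writer has finished */
};

static struct toy_desc descs[NUM_DESCS];

/*
 * Decide whether the descriptor identified by @id really owns the data
 * block at @lpos. The final comparison is the critical check: a random
 * @id read from a not-yet-initialized block can land on a committed
 * descriptor, but that descriptor will not point back to @lpos.
 */
static bool toy_block_valid(unsigned long id, unsigned long lpos)
{
        struct toy_desc *d = &descs[id % NUM_DESCS];

        if (!d->committed)
                return false;

        /* Does the descriptor point back to the block we came from? */
        return d->begin == lpos;
}

int main(void)
{
        /* Descriptor 1 legitimately owns the data block at lpos 128. */
        descs[1].begin = 128;
        descs[1].committed = 1;

        /*
         * Case 1: the block at lpos 128 holds ID 1 -> accepted.
         * Case 2: a block at lpos 640 holds a stale/garbage ID that also
         * maps to descriptor 1; the state looks committed, but the
         * back-pointer check rejects it.
         */
        printf("lpos 128, id 1: %s\n",
               toy_block_valid(1, 128) ? "valid" : "rejected");
        printf("lpos 640, id 1 (stale): %s\n",
               toy_block_valid(1, 640) ? "valid" : "rejected");
        return 0;
}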