References: <20210126003411.2AC51464@viggo.jf.intel.com> <20210126003421.45897BF4@viggo.jf.intel.com> <317d4c23-76a7-b653-87a4-bab642fa1717@intel.com>
From: Yang Shi
Date: Wed, 3 Feb 2021 16:26:20 -0800
Subject: Re: [RFC][PATCH 05/13] mm/numa: automatically generate node migration order
To: Dave Hansen
Cc: Dave Hansen, Linux Kernel Mailing List, Linux MM, Yang Shi, David Rientjes, Huang Ying, Dan Williams, David Hildenbrand, Oscar Salvador
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 2, 2021 at 4:43 PM Dave Hansen wrote:
>
> On 2/2/21 9:46 AM, Yang Shi wrote:
> > On Mon, Feb 1, 2021 at 11:13 AM Dave Hansen wrote:
> >> On 1/29/21 12:46 PM, Yang Shi wrote:
> >> ...
> >>>> int next_demotion_node(int node)
> >>>> {
> >>>> -       return node_demotion[node];
> >>>> +       /*
> >>>> +        * node_demotion[] is updated without excluding
> >>>> +        * this function from running. READ_ONCE() avoids
> >>>> +        * reading multiple, inconsistent 'node' values
> >>>> +        * during an update.
> >>>> +        */
> >>> Don't we need a smp_rmb() here? The single write barrier in the
> >>> migration target setup might not be enough. Typically a write
> >>> barrier should be used in pairs with a read barrier.
> >> I don't think we need one, practically.
> >>
> >> Since there is no locking against node_demotion[] updates, although a
> >> smp_rmb() would ensure that this read is up-to-date, it could change
> >> freely after the smp_rmb().
> > Yes, but this should be able to guarantee we see the "disable + after"
> > state. Isn't that preferable?
>
> I'm debating how much of this is theoretical versus actually applicable
> to what we have in the kernel. But, I'm generally worried about code
> like this that *looks* innocuous:
>
>         int terminal_node = start_node;
>         int next_node = next_demotion_node(start_node);
>         while (next_node != NUMA_NO_NODE) {
>                 terminal_node = next_node;
>                 next_node = next_demotion_node(terminal_node);
>         }
>
> That could loop forever if it doesn't go out to memory during each loop.
>
> However, if node_demotion[] *is* read on every trip through the loop, it
> will eventually terminate. READ_ONCE() can guarantee that, as could
> compiler barriers like smp_rmb().
>
> But, after staring at it for a while, I think RCU may be the most
> clearly correct way to solve the problem. Or, maybe just throw in the
> towel and do a spinlock like a normal human being. :)
>
> Anyway, here's what I was thinking I'd do with RCU:
>
>  1. node_demotion[] starts off in a "before" state
>  2. Writers to node_demotion[] first set the whole array such that
>     it will not induce cycles, like setting every member to
>     NUMA_NO_NODE. (the "disable" state)
>  3. Writer calls synchronize_rcu(). After it returns, no readers can
>     observe the "before" values.
>  4. Writer sets the actual values it wants. (the "after" state)
>  5. Readers use rcu_read_lock() over any critical section where they
>     read the array. They are guaranteed to only see one of the two
>     adjacent states (before+disabled, or disabled+after), but never
>     before+after within one RCU read-side critical section.
>  6. Readers use READ_ONCE() or some other compiler directive to ensure
>     the compiler does not reorder or combine reads from multiple,
>     adjacent RCU read-side critical sections.

Makes sense to me.

>
> Although, after writing this, plain old locks are sounding awfully tempting.
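For reference, here is a minimal userspace sketch of the terminal-node walk discussed above. It assumes a simplified READ_ONCE() (just the volatile access at the core of the kernel macro, not its full definition), a hypothetical four-node node_demotion[] table and a hypothetical terminal_demotion_node() helper; none of this is the patch's actual code. Each iteration first advances to the next node and then reloads the table through next_demotion_node(), and the volatile access keeps the compiler from caching those loads across iterations.

```c
#include <assert.h>

#define MAX_NUMNODES 4
#define NUMA_NO_NODE (-1)

/*
 * Simplified userspace stand-in for the kernel's READ_ONCE(): the
 * volatile access forces a fresh load from memory on every use.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical table: 0 -> 1 -> 2; nodes 2 and 3 have no demotion target. */
static int node_demotion[MAX_NUMNODES] = { 1, 2, NUMA_NO_NODE, NUMA_NO_NODE };

int next_demotion_node(int node)
{
	/* Indexing with NUMA_NO_NODE (-1) would read out of bounds. */
	assert(node >= 0 && node < MAX_NUMNODES);
	return READ_ONCE(node_demotion[node]);
}

/* Hypothetical helper: return the last node on start_node's demotion path. */
int terminal_demotion_node(int start_node)
{
	int terminal_node = start_node;
	int next_node = next_demotion_node(start_node);

	while (next_node != NUMA_NO_NODE) {
		terminal_node = next_node;			/* advance first... */
		next_node = next_demotion_node(terminal_node);	/* ...then reload */
	}
	return terminal_node;
}
```

With this table, a walk starting from node 0 or node 1 ends at node 2, and node 3 is its own terminal node.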
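A sketch of the six-step RCU scheme outlined above, under explicit assumptions. It is single-threaded userspace C, so demo_rcu_read_lock(), demo_rcu_read_unlock() and demo_synchronize_rcu() are hypothetical stand-ins for the kernel's rcu_read_lock(), rcu_read_unlock() and synchronize_rcu(): the read-lock stubs do nothing, and the grace-period stub only records whether the writer had reached the "disable" state. set_demotion_targets() and the four-node tables are illustrative only. The example "after" state reverses the "before" path 0 -> 1 -> 2 into 2 -> 1 -> 0. A reader that mixed the two could see node_demotion[0] == 1 (before) and node_demotion[1] == 0 (after) and cycle between nodes 0 and 1 forever; the disable step plus the grace period is what rules that mix out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES 4
#define NUMA_NO_NODE (-1)

/* Simplified userspace stand-ins for the kernel's READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Step 1, the "before" state (hypothetical): 0 -> 1 -> 2. */
static int node_demotion[MAX_NUMNODES] = { 1, 2, NUMA_NO_NODE, NUMA_NO_NODE };

/* Set by the grace-period stub if every entry was NUMA_NO_NODE. */
static bool grace_period_saw_disabled;

/* Hypothetical stand-ins: this demo has no concurrent readers to wait for. */
static void demo_rcu_read_lock(void) { }
static void demo_rcu_read_unlock(void) { }

static void demo_synchronize_rcu(void)
{
	bool all_disabled = true;

	for (int n = 0; n < MAX_NUMNODES; n++)
		if (READ_ONCE(node_demotion[n]) != NUMA_NO_NODE)
			all_disabled = false;
	grace_period_saw_disabled = all_disabled;
}

/* Writer, steps 2-4: disable every entry, wait, then publish "after". */
void set_demotion_targets(const int after[MAX_NUMNODES])
{
	for (int n = 0; n < MAX_NUMNODES; n++)
		WRITE_ONCE(node_demotion[n], NUMA_NO_NODE);	/* step 2: no cycles possible */

	/* Step 3: in the kernel, waits until no reader can still see "before". */
	demo_synchronize_rcu();

	for (int n = 0; n < MAX_NUMNODES; n++)
		WRITE_ONCE(node_demotion[n], after[n]);		/* step 4: "after" state */
}

/* Reader, steps 5-6: the whole walk runs in one read-side critical section. */
int terminal_demotion_node(int start_node)
{
	int terminal_node = start_node;
	int next_node;

	demo_rcu_read_lock();
	next_node = READ_ONCE(node_demotion[start_node]);
	while (next_node != NUMA_NO_NODE) {
		terminal_node = next_node;
		next_node = READ_ONCE(node_demotion[terminal_node]);
	}
	demo_rcu_read_unlock();
	return terminal_node;
}
```

The stubs cannot demonstrate the waiting itself; they only show the ordering the writer must follow. The plain-lock alternative mentioned above would instead take one lock around the writer's update and around each reader's walk.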