From: Wei Xu
Date: Tue, 10 May 2022 22:30:03 -0700
Subject: Re: RFC: Memory Tiering Kernel Interfaces
To: "Aneesh Kumar K.V"
Cc: Alistair Popple, Yang Shi, Andrew Morton, Dave Hansen, Huang Ying,
    Dan Williams, Linux MM, Greg Thelen, Jagdish Gediya,
    Linux Kernel Mailing List, Davidlohr Bueso, Michal Hocko, Baolin Wang,
    Brice Goglin, Feng Tang, Jonathan Cameron, Tim Chen

On Tue, May 10, 2022 at 4:38 AM Aneesh Kumar K.V wrote:
>
> Alistair Popple writes:
>
> > Wei Xu writes:
> >
> >> On Thu, May 5, 2022 at 5:19 PM Alistair Popple wrote:
> >>>
> >>> Wei Xu writes:
> >>>
> >>> [...]
> >>>
> >>> >> >
> >>> >> >
> >>> >> > Tiering Hierarchy Initialization
> >>> >> > `=============================='
> >>> >> >
> >>> >> > By default, all memory nodes are in the top tier (N_TOPTIER_MEMORY).
> >>> >> >
> >>> >> > A device driver can remove its memory nodes from the top tier, e.g.
> >>> >> > a dax driver can remove PMEM nodes from the top tier.
> >>> >>
> >>> >> With the topology built by firmware we should not need this.
> >>>
> >>> I agree that in an ideal world the hierarchy should be built by firmware based
> >>> on something like the HMAT. But I also think being able to override this will be
> >>> useful in getting there. Therefore a way of overriding the generated hierarchy
> >>> would be good, either via sysfs or kernel boot parameter if we don't want to
> >>> commit to a particular user interface now.
> >>>
> >>> However I'm less sure letting device-drivers override this is a good idea. How
> >>> for example would a GPU driver make sure it's node is in the top tier? By moving
> >>> every node that the driver does not know about out of N_TOPTIER_MEMORY? That
> >>> could get messy if say there were two drivers both of which wanted their node to
> >>> be in the top tier.
> >>
> >> The suggestion is to allow a device driver to opt out its memory
> >> devices from the top-tier, not the other way around.
> >
> > So how would demotion work in the case of accelerators then? In that
> > case we would want GPU memory to demote to DRAM, but that won't happen
> > if both DRAM and GPU memory are in N_TOPTIER_MEMORY and it seems the
> > only override available with this proposal would move GPU memory into a
> > lower tier, which is the opposite of what's needed there.
>
> How about we do 3 tiers now. dax kmem devices can be registered to
> tier 3. By default all numa nodes can be registered at tier 2 and HBM or
> GPU can be enabled to register at tier 1. ?

This makes sense. I will send an updated RFC based on the discussions so far.

> -aneesh
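
For illustration only, the three-tier suggestion quoted above can be sketched as a small model. This is not kernel code and not part of the RFC: the names Tier, tier_for() and demotion_target(), and the node-kind strings, are hypothetical. The rule that a tier demotes to the next slower tier is an assumption, drawn only from the remark in the thread that GPU memory should demote to DRAM.

```python
from enum import IntEnum
from typing import Optional


class Tier(IntEnum):
    """Tier numbers as in the quoted suggestion; a lower number is a faster tier."""
    TIER_1 = 1  # HBM or GPU memory, registered here only when enabled
    TIER_2 = 2  # default tier for all NUMA nodes
    TIER_3 = 3  # dax kmem devices (e.g. PMEM)


def tier_for(node_kind: str) -> Tier:
    """Map a hypothetical node-kind string to its tier."""
    if node_kind in ("hbm", "gpu"):
        return Tier.TIER_1
    if node_kind == "dax_kmem":
        return Tier.TIER_3
    # Anything else, including plain DRAM, stays in the default tier.
    return Tier.TIER_2


def demotion_target(tier: Tier) -> Optional[Tier]:
    """Assumed rule: demote to the next slower tier, or None from the slowest."""
    return Tier(tier + 1) if tier < Tier.TIER_3 else None
```

Under these assumptions a GPU node lands in tier 1 and demotes to tier 2 (DRAM), which is the behaviour asked for in the accelerator case, while dax kmem nodes sit in tier 3 and have no demotion target.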