Date: Thu, 9 Jun 2022 09:55:45 -0400
From: Johannes Weiner
To: Aneesh Kumar K V
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Wei Xu, Huang Ying,
 Greg Thelen, Yang Shi, Davidlohr Bueso, Tim C Chen, Brice Goglin,
 Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen,
 Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang,
 Jagdish Gediya, Baolin Wang, David Rientjes
Subject: Re: [PATCH v5 1/9] mm/demotion: Add support for explicit memory tiers
References: <20220603134237.131362-1-aneesh.kumar@linux.ibm.com>
 <20220603134237.131362-2-aneesh.kumar@linux.ibm.com>
 <02ee2c97-3bca-8eb6-97d8-1f8743619453@linux.ibm.com>
In-Reply-To: <02ee2c97-3bca-8eb6-97d8-1f8743619453@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 09, 2022 at 08:03:26AM +0530, Aneesh Kumar K V wrote:
> On 6/8/22 11:46 PM, Johannes Weiner wrote:
> > On Wed, Jun 08, 2022 at 09:43:52PM +0530, Aneesh Kumar K V wrote:
> > > On 6/8/22 9:25 PM, Johannes Weiner wrote:
> > > > Hello,
> > > >
> > > > On Wed, Jun 08, 2022 at 10:11:31AM -0400, Johannes Weiner wrote:
> > > > > On Fri, Jun 03, 2022 at 07:12:29PM +0530, Aneesh Kumar K.V wrote:
> > > > > > @@ -0,0 +1,20 @@
> > > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > > +#ifndef _LINUX_MEMORY_TIERS_H
> > > > > > +#define _LINUX_MEMORY_TIERS_H
> > > > > > +
> > > > > > +#ifdef CONFIG_TIERED_MEMORY
> > > > > > +
> > > > > > +#define MEMORY_TIER_HBM_GPU 0
> > > > > > +#define MEMORY_TIER_DRAM 1
> > > > > > +#define MEMORY_TIER_PMEM 2
> > > > > > +
> > > > > > +#define MEMORY_RANK_HBM_GPU 300
> > > > > > +#define MEMORY_RANK_DRAM 200
> > > > > > +#define MEMORY_RANK_PMEM 100
> > > > > > +
> > > > > > +#define DEFAULT_MEMORY_TIER MEMORY_TIER_DRAM
> > > > > > +#define MAX_MEMORY_TIERS 3
> > > > >
> > > > > I understand the names are somewhat arbitrary, and the tier ID space
> > > > > can be expanded down the line by bumping MAX_MEMORY_TIERS.
> > > > >
> > > > > But starting out with a packed ID space can get quite awkward for
> > > > > users when new tiers - especially intermediate tiers - show up in
> > > > > existing configurations. I mentioned in the other email that DRAM !=
> > > > > DRAM, so new tiers seem inevitable already.
> > > > >
> > > > > It could make sense to start with a bigger address space and spread
> > > > > out the list of kernel default tiers a bit within it:
> > > > >
> > > > > MEMORY_TIER_GPU 0
> > > > > MEMORY_TIER_DRAM 10
> > > > > MEMORY_TIER_PMEM 20
> > > >
> > > > Forgive me if I'm asking a question that has been answered. I went
> > > > back to earlier threads and couldn't work it out - maybe there were
> > > > some off-list discussions? Anyway...
> > > >
> > > > Why is there a distinction between tier ID and rank? I understand that
> > > > rank was added because tier IDs were too few. But if rank determines
> > > > ordering, what is the use of a separate tier ID? IOW, why not make the
> > > > tier ID space wider and have the kernel pick a few spread out defaults
> > > > based on known hardware, with plenty of headroom to be future proof.
> > > >
> > > > $ ls tiers
> > > > 100 # DEFAULT_TIER
> > > > $ cat tiers/100/nodelist
> > > > 0-1 # conventional numa nodes
> > > >
> > > >
> > > >
> > > > $ grep . tiers/*/nodelist
> > > > tiers/100/nodelist:0-1 # conventional numa
> > > > tiers/200/nodelist:2 # pmem
> > > >
> > > > $ grep . nodes/*/tier
> > > > nodes/0/tier:100
> > > > nodes/1/tier:100
> > > > nodes/2/tier:200
> > > >
> > > >
> > > >
> > > > $ grep . tiers/*/nodelist
> > > > tiers/100/nodelist:0-1,3
> > > > tiers/200/nodelist:2
> > > >
> > > > $ echo 300 >nodes/3/tier
> > > > $ grep . tiers/*/nodelist
> > > > tiers/100/nodelist:0-1
> > > > tiers/200/nodelist:2
> > > > tiers/300/nodelist:3
> > > >
> > > > $ echo 200 >nodes/3/tier
> > > > $ grep . tiers/*/nodelist
> > > > tiers/100/nodelist:0-1
> > > > tiers/200/nodelist:2-3
> > > >
> > > > etc.
> > > >
> > > tier ID is also used as device id memtier.dev.id. It was discussed that we
> > > would need the ability to change the rank value of a memory tier. If we make
> > > rank value same as tier ID or tier device id, we will not be able to support
> > > that.
> >
> > Is the idea that you could change the rank of a collection of nodes in
> > one go? Rather than moving the nodes one by one into a new tier?
> >
> > [ Sorry, I wasn't able to find this discussion. AFAICS the first
> > patches in RFC4 already had the struct device { .id = tier }
> > logic. Could you point me to it? In general it would be really
> > helpful to maintain summarized rationales for such decisions in the
> > cover letter to make sure things don't get lost over many, many
> > threads, conferences, and video calls. ]
>
> Most of the discussion happened not in the patch review email threads.
>
> RFC: Memory Tiering Kernel Interfaces (v2)
> https://lore.kernel.org/linux-mm/CAAPL-u_diGYEb7+WsgqNBLRix-nRCk2SsDj6p9r8j5JZwOABZQ@mail.gmail.com
>
> RFC: Memory Tiering Kernel Interfaces (v4)
> https://lore.kernel.org/linux-mm/CAAPL-u9Wv+nH1VOZTj=9p9S70Y3Qz3+63EkqncRDdHfubsrjfw@mail.gmail.com

I read the RFCs, the discussions and your code. It's still not clear
why the tier/device ID and the rank need to be two separate,
user-visible things. There is only one tier of a given rank, so why
can't the rank be the unique device ID? dev->id = 100. One number. Or
use a unique device ID allocator if large numbers are causing problems
internally.

But I don't see an explanation why they need to be two different
things, let alone two different things in the user ABI.
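
To make the "one number" idea concrete, here is a rough userspace
sketch of the node/tier listing quoted above. It is not kernel code and
not from the patch series: MAX_NODES, node_tier[], set_node_tier(),
tier_nodemask() and next_populated_tier() are all hypothetical names
made up for illustration. It assumes that a tier exists exactly when
some node is assigned to it, and that ordering between tiers comes from
the tier number itself, with no separate rank.

```c
#include <stdint.h>

#define MAX_NODES    8    /* hypothetical limit, for this sketch only */
#define DEFAULT_TIER 100  /* default tier number from the quoted listing */

/*
 * One number per node. The tier ID doubles as the ordering key, so no
 * separate rank is stored. 0 means "node not present" (an assumption
 * of this sketch). Initial state: nodes 0, 1 and 3 in tier 100, node 2
 * in tier 200, matching the quoted example after node 3 comes online.
 */
static int node_tier[MAX_NODES] = {
	DEFAULT_TIER, DEFAULT_TIER, 200, DEFAULT_TIER, 0, 0, 0, 0
};

/* Rough analogue of "echo <tier> >nodes/<node>/tier"; 0 on success. */
static int set_node_tier(int node, int tier)
{
	if (node < 0 || node >= MAX_NODES || tier <= 0)
		return -1;
	node_tier[node] = tier;
	return 0;
}

/* Rough analogue of "cat tiers/<tier>/nodelist", as a node bitmask. */
static uint32_t tier_nodemask(int tier)
{
	uint32_t mask = 0;

	if (tier <= 0)
		return 0;
	for (int node = 0; node < MAX_NODES; node++)
		if (node_tier[node] == tier)
			mask |= UINT32_C(1) << node;
	return mask;
}

/* Smallest populated tier number above 'tier', or -1 if none exists. */
static int next_populated_tier(int tier)
{
	int best = -1;

	for (int node = 0; node < MAX_NODES; node++) {
		int t = node_tier[node];

		if (t > 0 && t > tier && (best < 0 || t < best))
			best = t;
	}
	return best;
}
```

In this model, set_node_tier(3, 300) creates tier 300 simply by
populating it, and moving node 3 to 200 afterwards makes tier 300
disappear again, which mirrors the two "echo" steps in the listing.
Whether re-ranking a whole tier in one go is needed on top of this is
the open question in the thread; the sketch does not attempt it.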