Date: Wed, 2 Nov 2022 09:39:25 +0100
From: Michal Hocko
To: "Huang, Ying"
Cc: Bharata B Rao, Aneesh Kumar K V, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Andrew Morton, Alistair Popple,
 Dan Williams, Dave Hansen, Davidlohr Bueso, Hesham Almatary,
 Jagdish Gediya, Johannes Weiner, Jonathan Cameron, Tim Chen, Wei Xu,
 Yang Shi
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
References: <0d938c9f-c810-b10a-e489-c2b312475c52@amd.com>
 <87tu3oibyr.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com>
 <87leowepz6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <878rkuchpm.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87bkppbx75.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <877d0dbw13.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <877d0dbw13.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed 02-11-22 16:28:08, Huang, Ying wrote:
> Michal Hocko writes:
>
> > On Wed 02-11-22 16:02:54, Huang, Ying wrote:
> >> Michal Hocko writes:
> >>
> >> > On Wed 02-11-22 08:39:49, Huang, Ying wrote:
> >> >> Michal Hocko writes:
> >> >>
> >> >> > On Mon 31-10-22 09:33:49, Huang, Ying wrote:
> >> >> > [...]
> >> >> >> In the upstream implementation, 4 tiers are possible below DRAM. That's
> >> >> >> enough for now. But in the long run, it may be better to define more.
> >> >> >> 100 possible tiers below DRAM may be too extreme.
> >> >> >
> >> >> > I am just curious. Are any configurations with more than a couple of tiers
> >> >> > even manageable? I mean applications have been struggling even with
> >> >> > regular NUMA systems for years and the vast majority of them are largely
> >> >> > NUMA unaware. How are they going to configure for a more complex system
> >> >> > when a) there is no resource access control so whatever you aim for
> >> >> > might not be available and b) in which situations there is going to be a
> >> >> > demand only for a subset of tiers (GPU memory?)?
> >> >>
> >> >> Sorry for the confusion. I think that there are only several (less than 10)
> >> >> tiers in a system in practice. Yes, here, I suggested defining 100 (10
> >> >> in the later text) POSSIBLE tiers below DRAM. My intention isn't to
> >> >> manage a system with tens of memory tiers. Instead, my intention is to
> >> >> avoid putting 2 memory types into one memory tier by accident, by making
> >> >> the abstract distance range of each memory tier as small as possible.
> >> >> The more possible memory tiers, the smaller the abstract distance range
> >> >> of each memory tier.
> >> >
> >> > TBH I do not really understand how tweaking ranges helps anything.
> >> > IIUC drivers are free to assign any abstract distance so they will clash
> >> > without any higher level coordination.
> >>
> >> Yes. That's possible. Each memory tier corresponds to one abstract
> >> distance range. The larger the range is, the higher the possibility of
> >> clashing is. So I suggest making the abstract distance range smaller
> >> to reduce the possibility of clashing.
> >
> > I am sorry but I really do not understand how the size of the range
> > actually addresses a fundamental issue that each driver simply picks
> > what it wants. Is there any enumeration defining basic characteristics of
> > each tier? How does a driver developer know which tier to assign its
> > driver to?
>
> The smaller range size will not guarantee anything. It just tries to
> help the default behavior.
>
> The drivers are expected to assign the abstract distance based on the
> memory latency/bandwidth, etc.

Would it be possible/feasible to have a canonical way to calculate the
abstract distance from these characteristics by the core kernel so that
drivers do not even have to fall into that trap?
-- 
Michal Hocko
SUSE Labs
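
Editorial illustration of the two ideas discussed in the thread above:
each tier covering one abstract distance range, and a single shared
formula deriving the distance from latency and bandwidth. This is a
minimal sketch, not the kernel's implementation. The names
`DRAM_ADISTANCE`, `tier_start` and `canonical_adistance`, the value 576,
and the formula itself are all assumptions made for the example.

```python
# Illustrative sketch only. The constant and the formula are assumptions
# chosen for the example, not taken from the kernel sources.

DRAM_ADISTANCE = 576  # assumed abstract distance assigned to DRAM


def tier_start(adistance: int, chunk_size: int) -> int:
    """Round an abstract distance down to the start of its tier's range.

    Two memory types land in the same tier exactly when this value matches.
    """
    return (adistance // chunk_size) * chunk_size


def canonical_adistance(latency: int, bandwidth: int,
                        dram_latency: int, dram_bandwidth: int) -> int:
    """Hypothetical shared formula: scale DRAM's distance up with latency
    and down with bandwidth, both measured relative to DRAM."""
    return (DRAM_ADISTANCE * latency * dram_bandwidth) // (dram_latency * bandwidth)


# Two hypothetical memory types, 10% and 30% slower than DRAM in latency,
# with the same bandwidth as DRAM (units cancel, so any consistent unit works).
dev_a = canonical_adistance(110, 100, 100, 100)
dev_b = canonical_adistance(130, 100, 100, 100)

# A small range keeps the two types in separate tiers ...
separate = tier_start(dev_a, 128) != tier_start(dev_b, 128)
# ... while a larger range puts them in the same tier.
clash = tier_start(dev_a, 512) == tier_start(dev_b, 512)
```

With these assumed numbers the smaller range separates the two devices
and the larger one merges them. Because both distances come from the
same function, the result does not depend on each driver picking its own
value, which is the concern raised in the reply.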