Date: Fri, 27 Aug 2021 19:44:29 +0100
From: Matthew Wilcox
To: Johannes Weiner
Cc: "Darrick J. Wong", Linus Torvalds, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Andrew Morton
Subject: Re: [GIT PULL] Memory folios for v5.15
References: <20210826004555.GF12597@magnolia>

On Fri, Aug 27, 2021 at 10:07:16AM -0400, Johannes Weiner wrote:
> We have the same thoughts in MM and growing memory sizes. The DAX
> stuff said from the start it won't be built on linear struct page
> mappings anymore because we expect the memory modules to be too big
> to manage them with such fine-grained granularity.

Well, I did. Then I left Intel, and Dan took over. Now we have a
struct page for each 4kB of PMEM. I'm not particularly happy about
this change of direction.

> But in practice, this is more and more becoming true for DRAM as
> well. We don't want to allocate gigabytes of struct page when on
> our servers only a very small share of overall memory needs to be
> managed at this granularity.

This is a much less compelling argument than you think. I had some
ideas along these lines and I took them to a performance analysis
group. They told me that for their workloads, doubling the amount of
DRAM in a system increased performance by ~10%. So increasing the
amount of DRAM by 1/63 (the memmap costs roughly 1/64 of total DRAM,
so eliminating it grows usable memory by about 1/63) is going to
increase performance by 1/630, or 0.15%. There are more important
performance wins to go after.

Even in the cloud space, where increasing memory by 1/63 might
increase the number of VMs you can host by 1/63, how many PMs host
as many as 63 VMs? i.e. does it really buy you anything? It sounds
like a nice big number ("My 1TB machine has 16GB occupied by
memmap!"), but the real benefit doesn't really seem to be there.

And of course, that assumes you have enough other resources to scale
to 64/63 of your current workload; you might hit CPU, IO or some
other limit first.

> Folio perpetuates the problem of the base page being the floor for
> cache granularity, and so from an MM POV it doesn't allow us to
> scale up to current memory sizes without horribly regressing
> certain filesystem workloads that still need us to be able to
> scale down.

The mistake you're making is coupling "minimum mapping granularity"
with "minimum allocation granularity". We can happily build a system
which only allocates memory on 2MB boundaries and yet lets you map
that memory to userspace in 4kB granules.
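Something like this (a hypothetical driver-style sketch, not part of
the folio series; map_2mb_chunk is made up for illustration, but
alloc_pages(), split_page() and vm_insert_page() are the real APIs):

    #include <linux/mm.h>
    #include <linux/gfp.h>

    /* Sketch: allocate in 2MB units, map to userspace in 4kB units.
     * Error handling and freeing are elided. */
    static int map_2mb_chunk(struct vm_area_struct *vma)
    {
    	unsigned long addr = vma->vm_start;
    	struct page *chunk;
    	int i, err;

    	chunk = alloc_pages(GFP_KERNEL, 9);	/* one 2MB (order-9) allocation */
    	if (!chunk)
    		return -ENOMEM;
    	split_page(chunk, 9);	/* 512 individually refcounted 4kB pages */

    	for (i = 0; i < 512; i++) {
    		/* each call installs a single 4kB PTE; the 2MB
    		 * allocation granularity is invisible to the page
    		 * tables here */
    		err = vm_insert_page(vma, addr + i * PAGE_SIZE, chunk + i);
    		if (err)
    			return err;
    	}
    	return 0;
    }

The allocator only ever sees 2MB requests; the MMU still sees 4kB
mappings.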
> I really don't think it makes sense to discuss folios as the means
> for enabling huge pages in the page cache, without also taking a
> long hard look at the allocation model that is supposed to back
> them. Because you can't make it happen without that. And this part
> isn't looking so hot to me, tbh.

Please, don't creep the scope of this project to "first, redesign
the memory allocator". This project is: _if we can_, use larg(er)
pages to cache files. What Darrick is talking about is an entirely
different project that I haven't signed up for and won't.

> Willy says he has future ideas to make compound pages scale. But we
> have years of history saying this is incredibly hard to achieve -
> and it certainly wasn't for a lack of constant trying.

I genuinely don't understand. We have five primary users of memory
in Linux (once we're in a steady state after boot):

 - Anonymous memory
 - File-backed memory
 - Slab
 - Network buffers
 - Page tables

The relative importance of each one very much depends on your
workload. Slab already uses medium-order pages and can be made to
use larger ones. Folios should give us large allocations of
file-backed memory and eventually anonymous memory. Network buffers
seem to be headed towards larger allocations too. Page tables will
need some more thought, but once we're no longer interleaving file
cache pages, anon pages and page tables, they become less of a
problem to deal with.

Once everybody's allocating order-4 pages, order-4 pages become easy
to allocate. When everybody's allocating order-0 pages, order-4
pages require the right 16 pages to come available, and that's
really freaking hard.
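To make "the right 16 pages" concrete, here's a toy userspace model
of the buddy constraint (illustrative only, not kernel code): an
order-4 block is 16 contiguous 4kB pages starting on a 16-page
boundary, and a single busy page anywhere in that range blocks the
whole block.

    #include <stdbool.h>
    #include <stdio.h>

    #define ORDER		4
    #define BLOCK_PAGES	(1UL << ORDER)	/* 16 pages */

    /* An order-4 block must start on a 16-page-aligned pfn and every
     * page in it must be free. */
    static bool order4_block_free(const bool *page_free, unsigned long pfn)
    {
    	unsigned long start = pfn & ~(BLOCK_PAGES - 1);
    	unsigned long i;

    	for (i = 0; i < BLOCK_PAGES; i++)
    		if (!page_free[start + i])
    			return false;	/* one busy page spoils the block */
    	return true;
    }

    int main(void)
    {
    	bool page_free[32];
    	int i;

    	for (i = 0; i < 32; i++)
    		page_free[i] = true;
    	page_free[5] = false;	/* a single order-0 allocation ... */

    	/* ... makes pfns 0-15 unusable as an order-4 block: */
    	printf("block at 0 free?  %d\n", order4_block_free(page_free, 0));  /* 0 */
    	printf("block at 16 free? %d\n", order4_block_free(page_free, 16)); /* 1 */
    	return 0;
    }

If everyone allocates order-4, frees come back as whole blocks; if
everyone allocates order-0, you're waiting for 16 independent frees
to line up.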