Date: Thu, 23 Sep 2021 15:45:02 +0300
From: "Kirill A. Shutemov"
To: Matthew Wilcox
Cc: Kent Overstreet, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org, Johannes Weiner,
    Linus Torvalds, Andrew Morton, "Darrick J. Wong", Christoph Hellwig,
    David Howells, Mike Kravetz
Subject: Re: Mapcount of subpages
Message-ID: <20210923124502.nxfdaoiov4sysed4@box.shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 23, 2021 at 12:40:14PM +0100, Matthew Wilcox wrote:
> On Thu, Sep 23, 2021 at 01:15:16AM -0400, Kent Overstreet wrote:
> > On Thu, Sep 23, 2021 at 04:23:12AM +0100, Matthew Wilcox wrote:
> > > (compiling that list reminds me that we'll need to sort out mapcount
> > > on subpages when it comes time to do this. ask me if you don't know
> > > what i'm talking about here.)
> >
> > I am curious why we would ever need a mapcount for just part of a page, tell me
> > more.
>
> I would say Kirill is the expert here. My understanding:
>
> We have three different approaches to allocating 2MB pages today;
> anon THP, shmem THP and hugetlbfs. Hugetlbfs can only be mapped on a
> 2MB boundary, so it has no special handling of mapcount [1]. Anon THP
> always starts out as being mapped exclusively on a 2MB boundary, but
> then it can be split by, eg, munmap(). If it is, then the mapcount in
> the head page is distributed to the subpages.

One more complication for anon THP is that it can be shared across fork(),
and one process may split it while others still have it mapped with a PMD.

> Shmem THP is the tricky one. You might have a 2MB page in the page cache,
> but then have processes which only ever map part of it. Or you might
> have some processes mapping it with a 2MB entry and others mapping part
> or all of it with 4kB entries. And then someone truncates the file to
> midway through this page; we split it, and now we need to figure out what
> the mapcount should be on each of the subpages. We handle this by using
> ->mapcount on each subpage to record how many non-2MB mappings there are
> of that specific page and using ->compound_mapcount to record how many 2MB
> mappings there are of the entire 2MB page. Then, when we split, we just
> need to distribute the compound_mapcount to each page to make it correct.
> We also have the PageDoubleMap flag to tell us whether anybody has this
> 2MB page mapped with 4kB entries, so we can skip all the summing of 4kB
> mapcounts if nobody has done that.

A possible future complication comes from the 1G THP effort. With 1G THP we
would have a whole hierarchy of mapcounts: 1 PUD mapcount, 512 PMD mapcounts
and 262144 PTE mapcounts. (That's one of the reasons I don't think 1G THP is
viable.)
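
To make the counting above concrete, here is a rough userspace sketch of the
2MB scheme as Matthew describes it. It is only a toy model, not kernel code:
struct toy_thp and all the toy_* helpers are made-up names, the counters
start at 0 (the kernel's real mapcounts are biased), and double_map is a
simplified stand-in for PageDoubleMap.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_SUBPAGES 512	/* 4kB subpages in one 2MB page */

/* With 1G THP the hierarchy would be 1 PUD, 512 PMD and 512*512 PTE counters. */
_Static_assert(512 * 512 == 262144, "PTE mapcounts under one 1G page");

/* Hypothetical stand-in for a 2MB compound page; not the kernel's struct page. */
struct toy_thp {
	int compound_mapcount;		/* number of 2MB (PMD) mappings */
	int mapcount[TOY_SUBPAGES];	/* number of 4kB (PTE) mappings per subpage */
	bool double_map;		/* set once anybody maps it with 4kB entries */
};

/* Somebody maps the whole page with a single 2MB entry. */
static inline void toy_map_pmd(struct toy_thp *p)
{
	p->compound_mapcount++;
}

/* Somebody maps one subpage with a 4kB entry. */
static inline void toy_map_pte(struct toy_thp *p, int idx)
{
	assert(idx >= 0 && idx < TOY_SUBPAGES);
	p->mapcount[idx]++;
	p->double_map = true;
}

/* Mappings covering one subpage: its own 4kB mappings plus every 2MB mapping. */
static inline int toy_subpage_mapcount(const struct toy_thp *p, int idx)
{
	assert(idx >= 0 && idx < TOY_SUBPAGES);
	return p->mapcount[idx] + p->compound_mapcount;
}

/* Number of mappings of the page as a whole (each PMD or PTE entry counts once). */
static inline int toy_total_mapcount(const struct toy_thp *p)
{
	int i, sum = p->compound_mapcount;

	if (!p->double_map)	/* nobody used 4kB entries: skip the summing */
		return sum;
	for (i = 0; i < TOY_SUBPAGES; i++)
		sum += p->mapcount[i];
	return sum;
}

/* On split, distribute compound_mapcount to every subpage. */
static inline void toy_split(struct toy_thp *p)
{
	int i;

	for (i = 0; i < TOY_SUBPAGES; i++)
		p->mapcount[i] += p->compound_mapcount;
	p->compound_mapcount = 0;
	p->double_map = false;
}
```

In this model, a page with two 2MB mappings and one extra 4kB mapping of
subpage 7 ends up, after toy_split(), with a mapcount of 3 on subpage 7 and 2
on every other subpage. A 1G version would need the same bookkeeping repeated
one level up.
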

Note that there are places where exact mapcount accounting is critical:
try_to_unmap() may finish prematurely if we underestimate mapcount and
overestimating mapcount may lead to superfluous CoW that breaks GUP.

>
> [1] Mike is looking to change this, but I'm not sure where he is with it.
>

-- 
 Kirill A. Shutemov