References: <20230929114421.3761121-1-ryan.roberts@arm.com>
 <6d89fdc9-ef55-d44e-bf12-fafff318aef8@redhat.com>
 <7a3a2d49-528d-4297-ae19-56aa9e6c59c6@arm.com>
 <148676a4-8267-42de-a3ad-a3734e3f4bd9@arm.com>
In-Reply-To: <148676a4-8267-42de-a3ad-a3734e3f4bd9@arm.com>
From: Yang Shi
Date: Wed, 1 Nov 2023 11:11:24 -0700
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
To: Ryan Roberts
Cc: David Hildenbrand, Andrew Morton, Matthew Wilcox, Yin Fengwei, Yu Zhao,
 Catalin Marinas, Anshuman Khandual, "Huang, Ying", Zi Yan, Luis Chamberlain,
 Itaru Kitayama, "Kirill A. Shutemov", John Hubbard, David Rientjes,
 Vlastimil Babka, Hugh Dickins, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 1, 2023 at 7:02 AM Ryan Roberts wrote:
>
> On 31/10/2023 18:29, Yang Shi wrote:
> > On Tue, Oct 31, 2023 at 4:55 AM Ryan Roberts wrote:
> >>
> >> On 31/10/2023 11:50, Ryan Roberts wrote:
> >>> On 06/10/2023 21:06, David Hildenbrand wrote:
> >>> [...]
> >>>>
> >>>> Change 2: sysfs interface.
> >>>>
> >>>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
> >>>> agree.
> >>>>
> >>>> What we expose there and how, is TBD. Again, not a friend of "orders" and
> >>>> bitmaps at all. We can do better if we want to go down that path.
> >>>>
> >>>> Maybe we should take a look at hugetlb, and how they added support for multiple
> >>>> sizes. What *might* make sense could be (depending on which values we actually
> >>>> support!)
> >>>>
> >>>>
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
> >>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
> >>>>
> >>>> Each one would contain an "enabled" and "defrag" file. We want something minimal
> >>>> first? Start with the "enabled" option.
> >>>>
> >>>>
> >>>> enabled: always [global] madvise never
> >>>>
> >>>> Initially, we would set it for PMD-sized THP to "global" and for everything else
> >>>> to "never".
> >>>
> >>> Hi David,
> >>>
> >>> I've just started coding this, and it occurs to me that I might need a small
> >>> clarification here; the existing global "enabled" control is used to drive
> >>> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
> >>> proposed new per-size "enabled" is implicitly only controlling anon memory (for
> >>> now).
> >>>
> >>> 1) Is this potentially confusing for the user? Should we rename the per-size
> >>> controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
> >>> we can reuse the same control for file-backed memory in future?
> >>>
> >>> 2) The global control will continue to drive the file-backed memory decision
> >>> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
> >>>
> >>> Thanks,
> >>> Ryan
> >>>
> >>
> >> Also, an implementation question:
> >>
> >> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
> >> (although it does honour MADV_NOHUGEPAGE and the prctl); it will return true
> >> regardless. Is that by design? I couldn't fathom any reasoning from the commit log:
> >
> > The enabled="never" is for anonymous VMAs, DAX VMAs are typically file VMAs.
>
> That's not quite true; enabled="never" is honoured for non-DAX/non-shmem file
> VMAs (for collapse via CONFIG_READ_ONLY_THP_FOR_FS and more recently for

When READ_ONLY_THP_FOR_FS was implemented, file THPs could only be
collapsed by khugepaged, and khugepaged is started iff enabled !=
"never". So READ_ONLY_THP_FOR_FS has to honor it. Unfortunately there
are some confusing exceptions... But anyway, DAX is not in the same class.

> anything that implements huge_fault() - see
> 7a81751fcdeb833acc858e59082688e3020bfe12).

IIUC this commit just gives the VMAs which implement huge_fault() a
chance to handle the fault. Currently only DAX VMAs implement
huge_fault() in the vanilla kernel, AFAICT.

>
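
For reference, the per-size "enabled" semantics proposed in the quoted text could be modelled roughly as below. This is only a sketch of the proposal as discussed, not kernel code: the names `Policy`, `effective_policy`, `size_enabled`, `default_policies` and `sysfs_enabled_path` are hypothetical, and treating 2048kB as the PMD size is an assumption taken from the hugepages-2048kB example (it depends on architecture and base page size).

```python
from enum import Enum
from typing import Dict


class Policy(Enum):
    ALWAYS = "always"
    GLOBAL = "global"    # per-size only: defer to the top-level "enabled"
    MADVISE = "madvise"
    NEVER = "never"


# Sizes listed in the proposal. PMD_SIZE_KB = 2048 is an assumption.
SIZES_KB = [64, 128, 256, 512, 1024, 2048]
PMD_SIZE_KB = 2048
SYSFS_ROOT = "/sys/kernel/mm/transparent_hugepage"


def sysfs_enabled_path(size_kb: int) -> str:
    """Path of the proposed per-size "enabled" file (hypothetical helper)."""
    return f"{SYSFS_ROOT}/hugepages-{size_kb}kB/enabled"


def default_policies() -> Dict[int, Policy]:
    """Initial state from the proposal: PMD size -> "global", others -> "never"."""
    return {
        kb: Policy.GLOBAL if kb == PMD_SIZE_KB else Policy.NEVER
        for kb in SIZES_KB
    }


def effective_policy(per_size: Policy, global_policy: Policy) -> Policy:
    """Resolve "global" by falling back to the top-level control."""
    if global_policy is Policy.GLOBAL:
        # The top-level control only offers always/madvise/never.
        raise ValueError('"global" is not valid for the top-level control')
    return global_policy if per_size is Policy.GLOBAL else per_size


def size_enabled(per_size: Policy, global_policy: Policy, madvised: bool) -> bool:
    """Whether this size may be used for an anon VMA (madvised = MADV_HUGEPAGE)."""
    policy = effective_policy(per_size, global_policy)
    if policy is Policy.ALWAYS:
        return True
    if policy is Policy.MADVISE:
        return madvised
    return False
```

With the proposed defaults, only the PMD-sized entry follows the top-level control, so existing behaviour is preserved until a smaller size is explicitly switched away from "never". The sketch covers anonymous memory only; how file-backed memory maps onto the per-size controls is the open question in the thread.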