From: James Houghton
Date: Fri, 9 Dec 2022 10:41:24 -0500
Subject: Re: [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions
To: Mina Almasry
Cc: Mike Kravetz, Muchun Song, Peter Xu, David Hildenbrand,
 David Rientjes, Axel Rasmussen, "Zach O'Keefe", Manish Mishra,
 Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)",
 Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20221021163703.3218176-1-jthoughton@google.com>
 <20221021163703.3218176-9-jthoughton@google.com>

On Wed, Dec 7, 2022 at 7:26 PM Mina Almasry wrote:
>
> On Fri, Oct 21, 2022 at 9:37 AM James Houghton wrote:
> >
> > Currently it is possible for all shared VMAs to use HGM, but it must be
> > enabled first. This is because with HGM, we lose PMD sharing, and page
> > table walks require additional synchronization (we need to take the VMA
> > lock).
> >
> > Signed-off-by: James Houghton
> > ---
> >  include/linux/hugetlb.h | 22 +++++++++++++
> >  mm/hugetlb.c            | 69 +++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 91 insertions(+)
> >
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index 534958499ac4..6e0c36b08a0c 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -123,6 +123,9 @@ struct hugetlb_vma_lock {
> >
> >  struct hugetlb_shared_vma_data {
> >         struct hugetlb_vma_lock vma_lock;
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > +       bool hgm_enabled;
> > +#endif
> >  };
> >
> >  extern struct resv_map *resv_map_alloc(void);
> > @@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node)
> >  }
> >  #endif /* CONFIG_HUGETLB_PAGE */
> >
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
> > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
> > +int enable_hugetlb_hgm(struct vm_area_struct *vma);
> > +#else
> > +static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> > +{
> > +       return false;
> > +}
> > +static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
> > +{
> > +       return false;
> > +}
> > +static inline int enable_hugetlb_hgm(struct vm_area_struct *vma)
> > +{
> > +       return -EINVAL;
> > +}
> > +#endif
> > +
> >  static inline spinlock_t *huge_pte_lock(struct hstate *h,
> >                                         struct mm_struct *mm, pte_t *pte)
> >  {
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 5ae8bc8c928e..a18143add956 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma)
> >  #ifdef CONFIG_USERFAULTFD
> >         if (uffd_disable_huge_pmd_share(vma))
> >                 return false;
> > +#endif
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > +       if (hugetlb_hgm_enabled(vma))
> > +               return false;
> >  #endif
> >         /*
> >          * Only shared VMAs can share PMDs.
> > @@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
> >         kref_init(&data->vma_lock.refs);
> >         init_rwsem(&data->vma_lock.rw_sema);
> >         data->vma_lock.vma = vma;
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > +       data->hgm_enabled = false;
> > +#endif
> >         vma->vm_private_data = data;
> >         return 0;
> >  }
> > @@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
> >
> >  #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
> >
> > +#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
> > +bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
> > +{
> > +       /*
> > +        * All shared VMAs may have HGM.
> > +        *
> > +        * HGM requires using the VMA lock, which only exists for shared VMAs.
> > +        * To make HGM work for private VMAs, we would need to use another
> > +        * scheme to prevent collapsing/splitting from invalidating other
> > +        * threads' page table walks.
> > +        */
> > +       return vma && (vma->vm_flags & VM_MAYSHARE);
> > +}
> > +bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
> > +{
> > +       struct hugetlb_shared_vma_data *data = vma->vm_private_data;
> > +
> > +       if (!vma || !(vma->vm_flags & VM_MAYSHARE))
> > +               return false;
> > +
> > +       return data && data->hgm_enabled;
>
> Don't you need to lock data->vma_lock before you access data? Or did I
> misunderstand the locking? Or are you assuming this is safe because
> hgm_enabled can't be disabled?

This should be protected by the mmap_lock (we must be holding it for at
least reading here). `data` and `data->hgm_enabled` are only changed
when holding the mmap_lock for writing.

> > +}
> > +
> > +/*
> > + * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM
> > + * cannot be turned off.
> > + *
> > + * PMDs cannot be shared in HGM VMAs.
> > + */
> > +int enable_hugetlb_hgm(struct vm_area_struct *vma)
> > +{
> > +       int ret;
> > +       struct hugetlb_shared_vma_data *data;
> > +
> > +       if (!hugetlb_hgm_eligible(vma))
> > +               return -EINVAL;
> > +
> > +       if (hugetlb_hgm_enabled(vma))
> > +               return 0;
> > +
> > +       /*
> > +        * We must hold the mmap lock for writing so that callers can rely on
> > +        * hugetlb_hgm_enabled returning a consistent result while holding
> > +        * the mmap lock for reading.
> > +        */
> > +       mmap_assert_write_locked(vma->vm_mm);
> > +
> > +       /* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
> > +       ret = hugetlb_vma_data_alloc(vma);
>
> I'm confused why we need hugetlb_vma_data_alloc() here. Shouldn't this
> be done by hugetlb_vm_op_open()?

hugetlb_vma_data_alloc() can fail. In hugetlb_vm_op_open() and other
places, it is allowed to fail, so we call it again here and check that
it succeeded so that we can rely on the VMA lock.

I think I need to be a little bit more careful with how I handle VMA
splitting, though. It's possible for `data` not to be allocated after
we split, but for some things to be mapped at high-granularity.
The easiest solution here would be to disallow splitting when HGM is
enabled; I'm not sure what the best solution is, though.

Thanks for the review, Mina!

> > +       if (ret)
> > +               return ret;
> > +
> > +       data = vma->vm_private_data;
> > +       BUG_ON(!data);
> > +       data->hgm_enabled = true;
> > +
> > +       /* We don't support PMD sharing with HGM. */
> > +       hugetlb_unshare_all_pmds(vma);
> > +       return 0;
> > +}
> > +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
> > +
> >  /*
> >   * These functions are overwritable if your architecture needs its own
> >   * behavior.
> > --
> > 2.38.0.135.g90850a2211-goog
> >