Message-ID: <941f0f8f-a2c2-0021-0773-6cfaa81aabd7@redhat.com>
Date: Wed, 18 Jan 2023 19:21:43 +0100
From: David Hildenbrand
Organization: Red Hat
To: James Houghton, Peter Xu
Cc: Mike Kravetz, Muchun Song, David Rientjes, Axel Rasmussen,
    Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi,
    "Dr . David Alan Gilbert", "Matthew Wilcox (Oracle)",
    Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <06423461-c543-56fe-cc63-cabda6871104@redhat.com>
    <6548b3b3-30c9-8f64-7d28-8a434e0a0b80@redhat.com>
Subject: Re: [PATCH 21/46] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
X-Mailing-List: linux-kernel@vger.kernel.org

>>> Once the last piece is unmapped (or simpler: once the complete subtree of
>>> page tables is gone), we decrement refcount+mapcount. Might require some
>>> brain power to do this tracking, but I wouldn't call it impossible right
>>> from the start.
>>>
>>> Would such a design violate other design aspects that are important?
>
> This is actually how mapcount was treated in HGM RFC v1 (though not
> refcount); it is doable for both [2].
>
> One caveat here: if a page is unmapped in small pieces, it is
> difficult to know if the page is legitimately completely unmapped (we
> would have to check all the PTEs in the page table). In RFC v1, I
> sidestepped this caveat by saying that "page_mapcount() is incremented
> if the hstate-level PTE is present". A single unmap on the whole
> hugepage will clear the hstate-level PTE, thus decrementing the
> mapcount.
>
> On a related note, there still exists an (albeit minor) API difference
> vs. THPs: a piece of a page that is legitimately unmapped can still
> have a positive page_mapcount().
>
> Given that this approach allows us to retain the hugetlb vmemmap
> optimization (and it wouldn't require a horrible amount of
> complexity), I prefer this approach over the THP-like approach.

If we can store (directly/indirectly) metadata in the highest pgtable
that HGM-maps a hugetlb page, I guess the following would be reasonable:

* hugetlb page pointer
* mapped size

Whenever mapping/unmapping sub-parts, we'd have to update that
information.

Once "mapped size" drops to 0, we know that the hugetlb page was
completely unmapped and we can drop the refcount+mapcount, clear
metadata (including hugetlb page pointer) [+ remove the page tables?].

Similarly, once "mapped size" corresponds to the hugetlb size, we can
immediately spot that everything is mapped.

Again, just a high-level idea.

-- 
Thanks,

David / dhildenb
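
To make the "mapped size" bookkeeping above a bit more concrete, here is a minimal userspace sketch in C. It is only an illustration of the idea, not kernel code: struct fake_hugepage, struct hgm_meta and the hgm_*() helpers are hypothetical names invented for this example, and the refcount/mapcount fields are plain integers standing in for the real counters. It assumes pieces are never mapped twice and that locking is handled elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb page; NOT the kernel's struct page. */
struct fake_hugepage {
    size_t size;    /* hugetlb page size in bytes */
    int refcount;
    int mapcount;
};

/*
 * Hypothetical metadata associated with the highest page table that
 * HGM-maps the hugetlb page: the page pointer and the mapped size.
 */
struct hgm_meta {
    struct fake_hugepage *page; /* NULL while nothing is mapped */
    size_t mapped_size;         /* bytes of the page currently mapped */
};

/* Map one piece; the first piece takes refcount+mapcount exactly once. */
static void hgm_map_piece(struct hgm_meta *m, struct fake_hugepage *p,
                          size_t len)
{
    if (m->mapped_size == 0) {
        m->page = p;
        p->refcount++;
        p->mapcount++;
    }
    assert(m->page == p && m->mapped_size + len <= p->size);
    m->mapped_size += len;
}

/* Unmap one piece; dropping to 0 releases refcount+mapcount and metadata. */
static void hgm_unmap_piece(struct hgm_meta *m, size_t len)
{
    assert(m->page != NULL && len <= m->mapped_size);
    m->mapped_size -= len;
    if (m->mapped_size == 0) {
        m->page->refcount--;
        m->page->mapcount--;
        m->page = NULL; /* this is where page tables could be removed */
    }
}

/* "Everything is mapped" is a single comparison, no PTE scan needed. */
static int hgm_fully_mapped(const struct hgm_meta *m)
{
    return m->page != NULL && m->mapped_size == m->page->size;
}
```

With this shape, unmapping a page in small pieces never requires checking all PTEs: the counters change only on the 0 <-> non-zero transitions of mapped_size.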