Subject: Re: [PATCH 6/8] hugetlb: add vma based lock for pmd sharing
To: Mike Kravetz
CC: Muchun Song, David Hildenbrand, Michal Hocko, Peter Xu,
 Naoya Horiguchi, "Aneesh Kumar K . V", Andrea Arcangeli, "Kirill A .
 Shutemov", Davidlohr Bueso, Prakash Sangappa, James Houghton,
 Mina Almasry, Pasha Tatashin, Axel Rasmussen, Ray Fucillo, Andrew Morton
References: <20220824175757.20590-1-mike.kravetz@oracle.com>
 <20220824175757.20590-7-mike.kravetz@oracle.com>
 <47cc90bf-d616-5004-555d-b3d7e9b09bd1@huawei.com>
From: Miaohe Lin
Message-ID: <33ed8bff-97f4-16c0-e4cb-fec18ff843c0@huawei.com>
Date: Tue, 30 Aug 2022 10:34:56 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/8/30 6:24, Mike Kravetz wrote:
> On 08/27/22 17:30, Miaohe Lin wrote:
>> On 2022/8/25 1:57, Mike Kravetz wrote:
>>> Allocate a rw semaphore and hang off vm_private_data for
>>> synchronization use by vmas that could be involved in pmd sharing. Only
>>> add infrastructure for the new lock here. Actual use will be added in
>>> subsequent patch.
>>>
>>> Signed-off-by: Mike Kravetz
>>
>>> +static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
>>> +{
>>> +	/*
>>> +	 * Only present in sharable vmas. See comment in
>>> +	 * __unmap_hugepage_range_final about the neeed to check both
>>
>> s/neeed/need/
>>
>>> +	 * VM_SHARED and VM_MAYSHARE in free path
>>
>> I think there might be some wrong checks around this patch. As above comment said, we
>> need to check both flags, so we should do something like below instead?
>>
>> if (!(vma->vm_flags & (VM_MAYSHARE | VM_SHARED) == (VM_MAYSHARE | VM_SHARED)))
>>
>>> +	 */
>
> Thanks. I will update.
>
>>> +	if (!vma || !(vma->vm_flags & (VM_MAYSHARE | VM_SHARED)))
>>> +		return;
>>> +
>>> +	if (vma->vm_private_data) {
>>> +		kfree(vma->vm_private_data);
>>> +		vma->vm_private_data = NULL;
>>> +	}
>>> +}
>>> +
>>> +static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
>>> +{
>>> +	struct rw_semaphore *vma_sema;
>>> +
>>> +	/* Only establish in (flags) sharable vmas */
>>> +	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
>>> +		return;
>>> +
>>> +	/* Should never get here with non-NULL vm_private_data */
>>
>> We can get here with non-NULL vm_private_data when called from hugetlb_vm_op_open during fork?
>
> Right!
>
> In fork, We allocate a new semaphore in hugetlb_dup_vma_private, and then
> shortly after call hugetlb_vm_op_open.
>
> It works as is, and I can update the comment. However, I wonder if we should
> just clear vm_private_data in hugetlb_dup_vma_private and let hugetlb_vm_op_open
> do the allocation.

I think it's a good idea. It would also avoid allocating memory for vma_lock
(via clear_vma_resv_huge_pages()) and then freeing the corresponding vma right
away (via do_munmap()) in move_vma(). But maybe I'm missing something.

Thanks,
Miaohe Lin

>
>>
>> Also there's one missing change on comment:
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index d0617d64d718..4bc844a1d312 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -863,7 +863,7 @@ __weak unsigned long vma_mmu_pagesize(struct vm_area_struct *vma)
>>   * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
>>   * is guaranteed to have their future faults succeed.
>>   *
>> - * With the exception of reset_vma_resv_huge_pages() which is called at fork(),
>> + * With the exception of hugetlb_dup_vma_private() which is called at fork(),
>>   * the reserve counters are updated with the hugetlb_lock held. It is safe
>>   * to reset the VMA at fork() time as it is not in use yet and there is no
>>   * chance of the global counters getting corrupted as a result of the values.
>>
>>
>> Otherwise this patch looks good to me. Thanks.
>
> Will update, Thank you!
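
For reference, on the flag check discussed earlier in this thread: the sketch below is a standalone userspace illustration, not kernel code. The DEMO_* flag values and the helper names has_any(), has_all() and flag_check_selftest() are assumptions made up for this example. It shows the difference between "either flag set" and "both flags set". Note also that in C, == binds tighter than &, so a "both flags" test needs parentheses around the masked value before the comparison.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for vm_flags bits; the values are assumptions. */
#define DEMO_VM_SHARED   0x08UL
#define DEMO_VM_MAYSHARE 0x80UL

/* Hypothetical helper: true if at least one bit of mask is set in flags. */
static inline bool has_any(unsigned long flags, unsigned long mask)
{
	return (flags & mask) != 0;
}

/*
 * Hypothetical helper: true only if every bit of mask is set in flags.
 * The parentheses around (flags & mask) are required: without them,
 * "flags & mask == mask" parses as "flags & (mask == mask)".
 */
static inline bool has_all(unsigned long flags, unsigned long mask)
{
	return (flags & mask) == mask;
}

/* Small self-check covering the cases that distinguish the two tests. */
void flag_check_selftest(void)
{
	const unsigned long both = DEMO_VM_SHARED | DEMO_VM_MAYSHARE;

	/* Only MAYSHARE set: passes the "any" test, fails the "all" test. */
	assert(has_any(DEMO_VM_MAYSHARE, both));
	assert(!has_all(DEMO_VM_MAYSHARE, both));

	/* Both set: passes both tests. */
	assert(has_any(both, both));
	assert(has_all(both, both));

	/* Neither set: fails both tests. */
	assert(!has_any(0UL, both));
	assert(!has_all(0UL, both));
}
```

With these helpers, a "skip unless both flags are set" guard would read `if (!has_all(flags, both)) return;`, while the guard in the quoted patch corresponds to `if (!has_any(flags, both)) return;`.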