X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240521040244.48760-1-ioworker0@gmail.com>
 <20240521040244.48760-3-ioworker0@gmail.com>
 <8580a462-eadc-4fa5-b01a-c0b8c3ae644d@redhat.com>
 <7f2ab112-5916-422c-b29f-343cc0d6d754@redhat.com>
From: Lance Yang
Date: Thu, 6 Jun 2024 11:57:01 +0800
Subject: Re: [PATCH v6 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
To: David Hildenbrand
Cc: Yin Fengwei, akpm@linux-foundation.org, willy@infradead.org, sj@kernel.org,
 baolin.wang@linux.alibaba.com, maskray@google.com, ziy@nvidia.com,
 ryan.roberts@arm.com, 21cnbao@gmail.com, mhocko@suse.com, zokeefe@google.com,
 shy828301@gmail.com, xiehuan09@gmail.com, libang.li@antgroup.com,
 wangkefeng.wang@huawei.com, songmuchun@bytedance.com, peterx@redhat.com,
 minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu, Jun 6, 2024 at 12:16 AM David Hildenbrand wrote:
>
> On 05.06.24 17:43, Lance Yang wrote:
> > On Wed, Jun 5, 2024 at 11:03 PM David Hildenbrand wrote:
> >>
> >> On 05.06.24 16:57, Lance Yang wrote:
> >>> On Wed, Jun 5, 2024 at 10:39 PM David Hildenbrand wrote:
> >>>>
> >>>> On 05.06.24 16:28, David Hildenbrand wrote:
> >>>>> On 05.06.24 16:20, Lance Yang wrote:
> >>>>>> Hi David,
> >>>>>>
> >>>>>> On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand wrote:
> >>>>>>>
> >>>>>>> On 21.05.24 06:02, Lance Yang wrote:
> >>>>>>>> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> >>>>>>>> folios, start the pagewalk first, then call split_huge_pmd_address() to
> >>>>>>>> split the folio.
> >>>>>>>>
> >>>>>>>> Since TTU_SPLIT_HUGE_PMD will no longer perform immediately, we might
> >>>>>>>> encounter a PMD-mapped THP missing the mlock in the VM_LOCKED range during
> >>>>>>>> the page walk. It’s probably necessary to mlock this THP to prevent it from
> >>>>>>>> being picked up during page reclaim.
> >>>>>>>>
> >>>>>>>> Suggested-by: David Hildenbrand
> >>>>>>>> Suggested-by: Baolin Wang
> >>>>>>>> Signed-off-by: Lance Yang
> >>>>>>>> ---
> >>>>>>>
> >>>>>>> [...] again, sorry for the late review.
> >>>>>>
> >>>>>> No worries at all, thanks for taking time to review!
> >>>>>>
> >>>>>>>
> >>>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>>>>>> index ddffa30c79fb..08a93347f283 100644
> >>>>>>>> --- a/mm/rmap.c
> >>>>>>>> +++ b/mm/rmap.c
> >>>>>>>> @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>>>>>>        if (flags & TTU_SYNC)
> >>>>>>>>                pvmw.flags = PVMW_SYNC;
> >>>>>>>>
> >>>>>>>> -      if (flags & TTU_SPLIT_HUGE_PMD)
> >>>>>>>> -              split_huge_pmd_address(vma, address, false, folio);
> >>>>>>>> -
> >>>>>>>>        /*
> >>>>>>>>         * For THP, we have to assume the worse case ie pmd for invalidation.
> >>>>>>>>         * For hugetlb, it could be much worse if we need to do pud
> >>>>>>>> @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>>>>>>        mmu_notifier_invalidate_range_start(&range);
> >>>>>>>>
> >>>>>>>>        while (page_vma_mapped_walk(&pvmw)) {
> >>>>>>>> -              /* Unexpected PMD-mapped THP? */
> >>>>>>>> -              VM_BUG_ON_FOLIO(!pvmw.pte, folio);
> >>>>>>>> -
> >>>>>>>>                /*
> >>>>>>>>                 * If the folio is in an mlock()d vma, we must not swap it out.
> >>>>>>>>                 */
> >>>>>>>>                if (!(flags & TTU_IGNORE_MLOCK) &&
> >>>>>>>>                    (vma->vm_flags & VM_LOCKED)) {
> >>>>>>>>                        /* Restore the mlock which got missed */
> >>>>>>>> -                      if (!folio_test_large(folio))
> >>>>>>>> +                      if (!folio_test_large(folio) ||
> >>>>>>>> +                          (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
> >>>>>>>>                                mlock_vma_folio(folio, vma);
> >>>>>>>
> >>>>>>> Can you elaborate why you think this would be required? If we would have
> >>>>>>> performed the split_huge_pmd_address() beforehand, we would still be
> >>>>>>> left with a large folio, no?
> >>>>>>
> >>>>>> Yep, there would still be a large folio, but it wouldn't be PMD-mapped.
> >>>>>>
> >>>>>> After Weifeng's series[1], the kernel supports mlock for PTE-mapped large
> >>>>>> folio, but there are a few scenarios where we don't mlock a large folio, such
> >>>>>> as when it crosses a VM_LOCKed VMA boundary.
> >>>>>>
> >>>>>> -                      if (!folio_test_large(folio))
> >>>>>> +                      if (!folio_test_large(folio) ||
> >>>>>> +                          (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD))
> >>>>>>
> >>>>>> And this check is just future-proofing and likely unnecessary. If encountering a
> >>>>>> PMD-mapped THP missing the mlock for some reason, we can mlock this
> >>>>>> THP to prevent it from being picked up during page reclaim, since it is fully
> >>>>>> mapped and doesn't cross the VMA boundary, IIUC.
> >>>>>>
> >>>>>> What do you think?
> >>>>>> I would appreciate any suggestions regarding this check ;)
> >>>>>
> >>>>> Reading this patch only, I wonder if this change makes sense in the
> >>>>> context here.
> >>>>>
> >>>>> Before this patch, we would have PTE-mapped the PMD-mapped THP before
> >>>>> reaching this call and skipped it due to "!folio_test_large(folio)".
> >>>>>
> >>>>> After this patch, we either
> >>>>>
> >>>>> a) PTE-remap the THP after this check, but retry and end-up here again,
> >>>>> whereby we would skip it due to "!folio_test_large(folio)".
> >>>>>
> >>>>> b) Discard the PMD-mapped THP due to lazyfree directly. Can that
> >>>>> co-exist with mlock and what would be the problem here with mlock?
> >>>>>
> >>>>>
> >>>
> >>> Thanks a lot for clarifying!
> >>>
> >>>>> So if the check is required in this patch, we really have to understand
> >>>>> why. If not, we should better drop it from this patch.
> >>>>>
> >>>>> At least my opinion, still struggling to understand why it would be
> >>>>> required (I have 0 knowledge about mlock interaction with large folios :) ).
> >>>>>
> >>>>
> >>>> Looking at that series, in folio_references_one(), we do
> >>>>
> >>>>                  if (!folio_test_large(folio) || !pvmw.pte) {
> >>>>                          /* Restore the mlock which got missed */
> >>>>                          mlock_vma_folio(folio, vma);
> >>>>                          page_vma_mapped_walk_done(&pvmw);
> >>>>                          pra->vm_flags |= VM_LOCKED;
> >>>>                          return false; /* To break the loop */
> >>>>                  }
> >>>>
> >>>> I wonder if we want that here as well now: in case of lazyfree we
> >>>> would not back off, right?
> >>>>
> >>>> But I'm not sure if lazyfree in mlocked areas are even possible.
> >>>>
> >>>> Adding the "!pvmw.pte" would be much clearer to me than the flag check.
> >>>
> >>> Hmm... How about we drop it from this patch for now, and add it back if needed
> >>> in the future?
> >>
> >> If we can rule out that MADV_FREE + mlock() keeps working as expected in
> >> the PMD-mapped case, we're good.
> >>
> >> Can we rule that out? (especially for MADV_FREE followed by mlock())
> >
> > Perhaps we don't worry about that.
> >
> > IIUC, without that check, MADV_FREE + mlock() still works as expected in
> > the PMD-mapped case, since if encountering a large folio in a VM_LOCKED
> > VMA range, we will stop the page walk immediately.
> >
> Can you point me at the code (especially considering patch #3?)

Yep, please see my other mail ;)

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>
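The "stop the page walk immediately" behaviour Lance refers to is the VM_LOCKED branch inside the rmap walk. The sketch below paraphrases the shape of that branch as it looked in mm/rmap.c around the time of this series; it is an illustration rather than a verbatim quote of any tree: the wrapper name unmap_one_sketch() is invented, and the real unmap/discard logic is elided.

static bool unmap_one_sketch(struct folio *folio, struct vm_area_struct *vma,
                             unsigned long address, enum ttu_flags flags)
{
        DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
        bool ret = true;

        while (page_vma_mapped_walk(&pvmw)) {
                /*
                 * If the folio sits in a VM_LOCKED VMA and the caller did
                 * not pass TTU_IGNORE_MLOCK, it must stay resident: restore
                 * any missed mlock and abort the walk right away.
                 */
                if (!(flags & TTU_IGNORE_MLOCK) &&
                    (vma->vm_flags & VM_LOCKED)) {
                        if (!folio_test_large(folio))
                                mlock_vma_folio(folio, vma);
                        page_vma_mapped_walk_done(&pvmw);
                        ret = false;
                        break;
                }
                /* ... unmap / lazyfree-discard logic elided ... */
        }

        return ret;
}

Because the walk backs off with ret = false as soon as a VM_LOCKED VMA is seen (for callers that do not pass TTU_IGNORE_MLOCK), an mlocked folio never reaches the discard path, which is the behaviour Lance's MADV_FREE + mlock() argument relies on.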