From: Yu Zhao
Date: Fri, 7 Jul 2023 22:35:43 -0600
Subject: Re: [RFC PATCH 0/3] support large folio for mlock
To: Matthew Wilcox
Cc: "Yin, Fengwei", David Hildenbrand, linux-mm@kvack.org, linux-kernel@vger.kernel.org, ryan.roberts@arm.com, shy828301@gmail.com, akpm@linux-foundation.org

On Fri, Jul 7, 2023 at 10:02 PM Matthew Wilcox wrote:
>
> On Sat, Jul 08, 2023 at 11:52:23AM +0800, Yin, Fengwei wrote:
> > > Oh, I agree, there are always going to be circumstances where we realise
> > > we've made a bad decision and can't (easily) undo it. Unless we have a
> > > per-page pincount, and I Would Rather Not Do That. But we should _try_
> > > to do that because it's the right model -- that's what I meant by "Tell
> > > me why I'm wrong"; what scenarios do we have where a user temporarily
> > > mlocks (or mprotects or ...) a range of memory, but wants that memory
> > > to be aged in the LRU exactly the same way as the adjacent memory that
> > > wasn't mprotected?
> > From the mlock() manpage:
> >        mlock(), mlock2(), and mlockall() lock part or all of the calling process's virtual address space into RAM, preventing that memory
> >        from being paged to the swap area.
> >
> > So my understanding is that it's OK to let the mlocked memory be aged
> > with the adjacent memory that is not mlocked. Just make sure they are
> > not paged out to swap.
>
> Right, it doesn't break anything; it's just a similar problem to
> internal fragmentation. The pages of the folio which aren't mlocked
> will also be locked in RAM and never paged out.

I don't think this is the case: since partially locking a
non-pmd-mappable large folio is a nop, the folio remains on one of the
evictable LRUs. The rmap walk by folio_referenced() should already be
able to find the VMA and the PTEs mapping the unlocked portion, so page
reclaim should be able to age the unlocked portion correctly even
though the folio also contains a locked portion.

And when reclaim tries to reclaim the entire folio, it first tries to
split it into a list of base folios in shrink_folio_list(), and if that
succeeds, it walks the rmap of each base folio on that list to unmap
(not age).
Unmapping doesn't have TTU_IGNORE_MLOCK, so it should correctly call
mlock_vma_folio() on the locked base folios and bail out. And finally
those locked base folios are put back on the unevictable list.

> > One question about an implementation detail:
> > If a large folio crossing a VMA boundary cannot be split, how do we
> > deal with this case? Retry in the syscall until it's split successfully?
> > Or return an error (and which ERRORS should we choose) to user space?
>
> I would be tempted to allocate memory & copy to the new mlocked VMA.
> The old folio will go on the deferred_list and be split later, or its
> valid parts will be written to swap and then it can be freed.
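To make the reclaim argument above concrete, here is a toy model in C. It is a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, it assumes 4 KiB base pages and a naturally aligned, PTE-mapped (not PMD-mappable) large folio, and it models only the path where the split in shrink_folio_list() succeeds. Aging via folio_referenced() and the split-failure case are not modeled at all. The page rounding in toy_mlock_range() follows the documented mlock() behavior of locking every page that contains part of the requested range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical toy model, NOT kernel code. Assumes 4 KiB base pages and a
 * large folio that is PTE-mapped and naturally aligned to its size.
 */
#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_PAGES 16u

enum toy_fate {
	TOY_RECLAIMED,   /* base folio unmapped and freed after the split */
	TOY_UNEVICTABLE, /* unmap bailed on a VM_LOCKED VMA; unevictable list */
};

/*
 * mlock() works on whole pages: the locked range is
 * [addr rounded down, addr + len rounded up) to page boundaries.
 */
static void toy_mlock_range(uintptr_t addr, size_t len,
			    uintptr_t *start, uintptr_t *end)
{
	*start = addr & ~(uintptr_t)(TOY_PAGE_SIZE - 1);
	*end = (addr + len + TOY_PAGE_SIZE - 1) & ~(uintptr_t)(TOY_PAGE_SIZE - 1);
}

/*
 * Models reclaim of a partially mlocked large folio whose split succeeded.
 * Partial mlock was a nop, so the folio was still on an evictable LRU.
 * After the split, each base folio is unmapped on its own; because the
 * unmap is done without an "ignore mlock" flag, a base folio that lies in
 * the locked range bails out and ends up unevictable, while the rest can
 * be reclaimed. Returns the number of base pages reclaimed.
 */
static size_t toy_reclaim_split_folio(uintptr_t folio_start, size_t nr_pages,
				      uintptr_t lock_start, uintptr_t lock_end,
				      enum toy_fate *fate)
{
	size_t reclaimed = 0;

	assert(nr_pages > 0 && nr_pages <= TOY_MAX_PAGES);

	for (size_t i = 0; i < nr_pages; i++) {
		uintptr_t page = folio_start + i * TOY_PAGE_SIZE;
		/* Page lies in the mlock()ed range, i.e. in the VM_LOCKED VMA. */
		bool locked = page >= lock_start && page < lock_end;

		if (locked) {
			fate[i] = TOY_UNEVICTABLE;
		} else {
			fate[i] = TOY_RECLAIMED;
			reclaimed++;
		}
	}
	return reclaimed;
}
```

For example, an mlock() of 0x2000 bytes at 0x12100 covers the pages from 0x12000 up to 0x15000, so in a 16-page folio starting at 0x10000, pages 2 to 4 end up unevictable and the other 13 are reclaimed, rather than the whole folio being pinned in RAM.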