From: Yosry Ahmed
Date: Tue, 18 Jul 2023 18:32:12 -0700
Subject: Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio
To: Yu Zhao
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, ryan.roberts@arm.com, shy828301@gmail.com, Yin Fengwei, Hugh Dickins

On Tue, Jul 18, 2023 at 4:47 PM Yin Fengwei wrote:
>
>
> On 7/19/23 06:48, Yosry Ahmed wrote:
> > On Sun, Jul 16, 2023 at 6:58 PM Yin Fengwei wrote:
> >>
> >> On 7/17/23 08:35, Yu Zhao wrote:
> >>> On Sun, Jul 16, 2023 at 6:00 PM Yin, Fengwei wrote:
> >>>>
> >>>> On 7/15/2023 2:06 PM, Yu Zhao wrote:
> >>>>> There is a problem here that I didn't have the time to elaborate: we
> >>>>> can't mlock() a folio that is within the range but not fully mapped,
> >>>>> because this folio can be on the deferred split queue. When the split
> >>>>> happens, those unmapped folios (not mapped by this vma but mapped
> >>>>> into other vmas) will be stranded on the unevictable lru.
> >>>>
> >>>> This should be fine unless I missed something. During a large folio
> >>>> split, unmap_folio() will migrate (anon) / unmap (file) the folio, and
> >>>> the folio will be munlocked in unmap_folio(). So the head/tail pages
> >>>> will always be evictable.
> >>>
> >>> It's close but not entirely accurate: munlock can fail on isolated folios.
> >> Yes. The munlock just clears the PG_mlocked bit but leaves PG_unevictable
> >> set.
> >>
> >> Could this also happen to a normal 4K page? I mean, when a user tries
> >> to munlock a normal 4K page while that page is isolated, does it become
> >> an unevictable page?
> >
> > Looks like it can be possible. If cpu 1 is in __munlock_folio() and
> > cpu 2 is isolating the folio for any purpose:
> >
> > cpu1                            cpu2
> >                                 isolate folio
> > folio_test_clear_lru() // 0
> >                                 putback folio // add to unevictable list
> > folio_test_clear_mlocked()
> Yes. Yu showed this sequence to me in another email. I thought
> putback_lru() could correct the non-mlocked but unevictable folio, but it
> doesn't because of this race.

(+Hugh Dickins for visibility)

Yu, I am not familiar with the split_folio() case, so I am not sure it
is the same exact race I stated above. Can you confirm whether or not
doing folio_test_clear_mlocked() before folio_test_clear_lru() would
fix the race you are referring to? IIUC, in this case, we make sure we
clear PG_mlocked before we try to clear PG_lru. If we fail to clear
PG_lru, then someone else has the folio isolated after we cleared
PG_mlocked, so we can be sure that when they put the folio back it
will correctly be made evictable.

Is my understanding correct? If yes, I can add this fix to my next
version of the RFC series to rework mlock_count. It would be a lot
more complicated with the current implementation (as I stated in a
previous email).
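A minimal sketch of that ordering, assuming PG_mlocked can be decided
without reading mlock_count first (i.e. under the reworked mlock_count
scheme in [1] below). The helpers are existing kernel APIs, but
munlock_folio_sketch() itself is hypothetical, not the actual
mm/mlock.c code:

#include <linux/mm.h>
#include <linux/mm_inline.h>
#include <linux/memcontrol.h>
#include <linux/vmstat.h>

static void munlock_folio_sketch(struct folio *folio)
{
	struct lruvec *lruvec;

	/*
	 * Clear PG_mlocked before trying to isolate. A racing isolator
	 * that already took the folio off the LRU will observe
	 * !PG_mlocked on putback and place the folio on an evictable
	 * list, so losing the isolation race below is harmless.
	 */
	if (!folio_test_clear_mlocked(folio))
		return;

	zone_stat_mod_folio(folio, NR_MLOCK, -folio_nr_pages(folio));

	/* Now try to isolate; if someone else already did, we are done. */
	if (!folio_test_clear_lru(folio))
		return;

	lruvec = folio_lruvec_lock_irq(folio);
	if (folio_test_unevictable(folio)) {
		lruvec_del_folio(lruvec, folio);
		folio_clear_unevictable(folio);
		lruvec_add_folio(lruvec, folio);	/* back to an evictable list */
	}
	unlock_page_lruvec_irq(lruvec);
	folio_set_lru(folio);
}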
> >
> > The page would be stranded on the unevictable list in this case, no?
> > Maybe we should only try to isolate the page (clear PG_lru) after we
> > possibly clear PG_mlocked? In this case, if we fail to isolate, we know
> > for sure that whoever has the page isolated will observe that
> > PG_mlocked is clear and correctly make the page evictable.
> >
> > This probably would be complicated with the current implementation, as
> > we first need to decrement mlock_count to determine if we want to
> > clear PG_mlocked, and to do so we need to isolate the page, as
> > mlock_count overlays page->lru. With the proposal in [1] to rework
> > mlock_count, it might be much simpler as far as I can tell. I intend
> > to refresh this proposal soon-ish.
> >
> > [1] https://lore.kernel.org/lkml/20230618065719.1363271-1-yosryahmed@google.com/
> >
> >>
> >> Regards
> >> Yin, Fengwei
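
For context on why the current implementation makes that reordering
awkward: mlock_count shares storage with the folio's lru list head,
roughly as in this simplified excerpt (layout follows
include/linux/mm_types.h, with unrelated fields elided), which is why
the folio must be isolated off the LRU before mlock_count can be read
or updated:

#include <linux/types.h>

/* Simplified, illustrative excerpt of the folio layout. */
struct folio_excerpt {
	union {
		struct list_head lru;	/* valid while the folio is on an LRU list */
		struct {
			void *__filler;			/* preserves list_head layout */
			unsigned int mlock_count;	/* valid while unevictable */
		};
	};
};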