Received: by 2002:a05:7412:d1aa:b0:fc:a2b0:25d7 with SMTP id ba42csp498365rdb; Mon, 29 Jan 2024 08:35:19 -0800 (PST) X-Google-Smtp-Source: AGHT+IGu6mxOLfU9R1XK+sibLXciDAZ2ExzMPKHTr/L9+JGtulCOaVxH52XjAM8X8h0ZOcdXCSsc X-Received: by 2002:a17:906:f208:b0:a35:2ee1:b953 with SMTP id gt8-20020a170906f20800b00a352ee1b953mr4309152ejb.26.1706546119488; Mon, 29 Jan 2024 08:35:19 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706546119; cv=pass; d=google.com; s=arc-20160816; b=ZnOZ5lZ9jQAXjum8pZR/jpWjwigs1YdD3CkWSMdKajvZt/nIkujTLciRpARxkeLTJv sDCkQ4L1s3lAyttccKm3cbp5jx/Oe225TJaCacRxcr9VuU97OWB9TIFulJmPuSeAh6zS md950GQlGM6PlD2O5XlqzQPctQxdyzt07rIQ24CxNJGBsowkAHk4FlHNiZJ4XCK2D2K6 HWmpzGTLuumqd/NusMyLr3TnS+zFo/hILYe+HxFsmhS/wpI+tS6XzTv7AQDKOq8xTJBD JqG9MhhfW4UQdSRJRqCoZ83HELWspeCEBp/BqJxB4RhfgbGTLLp63WbHOnT5sWXQAva6 uS7Q== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=bqVLvtcgm5RSCGUSAkB0qjT/cVo9K8UvLEITfRMFIqc=; fh=RxQZRPv5bxmd1EEl3IHOOavptjyZniBRgL61g6qkWJM=; b=tPIL6rD/j37zkW13gj88B4O8YYBc8/csQ4GY2pZqOhqs36reRnQXYXw67981B9cBgc zn9VmbgvxIPZPuJWK+GzrfDQvN/tqPLv1kmfVOFB0aNG5kHhu808YAM55p2hPDKkrcvW XFZGnHyXG+go1veKyYuy4k5CXoJJc+RSnqFuo2+a9o+K4TTGqj3MUDaG1yRK/+evc9pf 417RLDHFcZtmTYRCwGRRMewqN1CLPbZgwiZvX7Yj6qmsjTGw3NtG0fTwZNsA77NsnWSK AuMeBHN3/hBmQIW1BWRUyT/YE2MRVDdFtKt16PejFPtE25RoEB/Mh98MKBQngC1SY5Jd XKzA== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=p+YZYA9S; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-43107-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43107-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id q9-20020a1709060e4900b00a359a7294f3si1389466eji.871.2024.01.29.08.35.19 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jan 2024 08:35:19 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-43107-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=p+YZYA9S; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-43107-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43107-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 3FD861F23D96 for ; Mon, 29 Jan 2024 16:35:19 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 31E4F15AAA7; Mon, 29 Jan 2024 16:32:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="p+YZYA9S" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4456F15A4AB for ; Mon, 29 Jan 2024 16:32:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706545924; cv=none; b=uBPifmk2EnIFOg6k1RYI9+QjQhihQJ8RKf9rcUEeLSmXtKIJY6UEqxfjkrc1OS/Tu0n3Dnn5olJWZhj3bTYm6frBGUf4LOvcRSKZkg31Ih1eK14TjtFVgjwG/hZYrnQokHxzyKwryZYrU+bR7X7s5o+0GP8+/4+j+FAiNdK39kg= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706545924; c=relaxed/simple; bh=/u1rFiHmhXAQAtlUXq7FgxQxSIB+dSj/HU6AwwQbRoo=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=EWL8xi3H11+V0Hoz0Nf4J1r/7JWMhrqF2BgsF35aYVPaqDjJUkbaobnMag+Ra7Z0ciQIRaUcDwTrDVUSqfrZDS9hbdIf2L8wkbfxKGnEDsnPv0riKJysbwP8MoAnJFKbi47LJormEbpNVkDa/uVM5vXxZBPyVt99iLhhheMu1dU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=p+YZYA9S; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id C9FD4C4166B for ; Mon, 29 Jan 2024 16:32:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1706545923; bh=/u1rFiHmhXAQAtlUXq7FgxQxSIB+dSj/HU6AwwQbRoo=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=p+YZYA9S5HvYtpH/fWISTw5LLg6YC26CUaUE0OOkycY2NWeb4mIy88mZqtJ3zu2TA CUpwtuAcyoVN25144BfjKst3mkyuVvnDA8ELm7vKAFrzBramuJCqNmRIupLIsrivd2 geGtoUhQzku9bIcOOwhNJN+srTO1fQyUOh+BcVTNpNbcBz6MB0QqnJnnC4jGeetD2g Wie4q3thvfistpghwe9jY38naYjt0lob/XXbn5qr2OwB40a3CAeu0S/HRxRcCo4wDX QnwcEq6FcIwOLHI4tx3LKCqZM/SIQ9VpKYbJbUomnYHU4Sdne44ItfBKJzV5dmXMCS 4v8lEFRTEDvyw== Received: by mail-il1-f175.google.com with SMTP id e9e14a558f8ab-3638b5f8240so313445ab.0 for ; Mon, 29 Jan 2024 08:32:03 -0800 (PST) X-Gm-Message-State: AOJu0Yz8kJydBcmd4YBuNVChyFvCuxpPA87ek50Mz9TcHYvk3LGWJu54 xUQo6JZxhbhdroG1Hm24aLQenjoXE4DuceaInblNin2CXKBGFw60jKoEgP4y+ygMzgmPRU49DT0 DQIdtQCUGawJEHjuDfTLuKyoT8y4jp1sJYuB6 X-Received: by 2002:a92:a306:0:b0:360:73de:be5e with SMTP id a6-20020a92a306000000b0036073debe5emr4185014ili.14.1706545922962; Mon, 29 Jan 2024 08:32:02 -0800 (PST) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20231025144546.577640-1-ryan.roberts@arm.com> <20240118111036.72641-1-21cnbao@gmail.com> <20240118111036.72641-6-21cnbao@gmail.com> <58efd8d4-28aa-45d6-b384-7463568b24bb@redhat.com> In-Reply-To: From: Chris Li Date: Mon, 29 Jan 2024 08:31:51 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH RFC 5/6] mm: rmap: weaken the WARN_ON in __folio_add_anon_rmap() To: David Hildenbrand Cc: Barry Song <21cnbao@gmail.com>, ryan.roberts@arm.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, mhocko@suse.com, shy828301@gmail.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yuzhao@google.com, surenb@google.com, steven.price@arm.com, Barry Song , Chuanhua Han Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Jan 29, 2024 at 2:07=E2=80=AFAM David Hildenbrand wrote: > > On 29.01.24 04:25, Chris Li wrote: > > Hi David and Barry, > > > > On Mon, Jan 22, 2024 at 10:49=E2=80=AFPM Barry Song <21cnbao@gmail.com>= wrote: > >> > >>> > >>> > >>> I have on my todo list to move all that !anon handling out of > >>> folio_add_anon_rmap_ptes(), and instead make swapin code call add > >>> folio_add_new_anon_rmap(), where we'll have to pass an exclusive flag > >>> then (-> whole new folio exclusive). > >>> > >>> That's the cleaner approach. > >>> > >> > >> one tricky thing is that sometimes it is hard to know who is the first > >> one to add rmap and thus should > >> call folio_add_new_anon_rmap. > >> especially when we want to support swapin_readahead(), the one who > >> allocated large filio might not > >> be that one who firstly does rmap. > > > > I think Barry has a point. Two tasks might race to swap in the folio > > then race to perform the rmap. > > folio_add_new_anon_rmap() should only call a folio that is absolutely > > "new", not shared. The sharing in swap cache disqualifies that > > condition. > > We have to hold the folio lock. So only one task at a time might do the > folio_add_anon_rmap_ptes() right now, and the > folio_add_new_shared_anon_rmap() in the future [below]. > Ah, I see. The folio_lock() is the answer I am looking for. > Also observe how folio_add_anon_rmap_ptes() states that one must hold > the page lock, because otherwise this would all be completely racy. > > From the pte swp exclusive flags, we know for sure whether we are > dealing with exclusive vs. shared. I think patch #6 does not properly > check that all entries are actually the same in that regard (all > exclusive vs all shared). That likely needs fixing. > > [I have converting per-page PageAnonExclusive flags to a single > per-folio flag on my todo list. I suspect that we'll keep the > per-swp-pte exlusive bits, but the question is rather what we can > actually make work, because swap and migration just make it much more > complicated. Anyhow, future work] > > > > >> is it an acceptable way to do the below in do_swap_page? > >> if (!folio_test_anon(folio)) > >> folio_add_new_anon_rmap() > >> else > >> folio_add_anon_rmap_ptes() > > > > I am curious to know the answer as well. > > > Yes, the end code should likely be something like: > > /* ksm created a completely new copy */ > if (unlikely(folio !=3D swapcache && swapcache)) { > folio_add_new_anon_rmap(folio, vma, vmf->address); > folio_add_lru_vma(folio, vma); > } else if (folio_test_anon(folio)) { > folio_add_anon_rmap_ptes(rmap_flags) > } else { > folio_add_new_anon_rmap(rmap_flags) > } > > Maybe we want to avoid teaching all existing folio_add_new_anon_rmap() > callers about a new flag, and just have a new > folio_add_new_shared_anon_rmap() instead. TBD. There is more than one caller needed to perform that dance around folio_test_anon() then decide which function to call. It would be nice to have a wrapper function folio_add_new_shared_anon_rmap() to abstract this behavior. > > > > > BTW, that test might have a race as well. By the time the task got > > !anon result, this result might get changed by another task. We need > > to make sure in the caller context this race can't happen. Otherwise > > we can't do the above safely. > Again, folio lock. Observe the folio_lock_or_retry() call that covers > our existing folio_add_new_anon_rmap/folio_add_anon_rmap_pte calls. Ack. Thanks for the explanation. Chris