Subject: Re: [PATCH v2] mm/swap: fix race when skipping swapcache
From: Chris Li
Date: Thu, 8 Feb 2024 11:42:16 -0800
To: Kairui Song
Cc: "Huang, Ying", Minchan Kim, Barry Song <21cnbao@gmail.com>,
	linux-mm@kvack.org, Andrew Morton, Yu Zhao, Barry Song,
	SeongJae Park, Hugh Dickins, Johannes Weiner, Matthew Wilcox,
	Michal Hocko, Yosry Ahmed, David Hildenbrand,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20240206182559.32264-1-ryncsn@gmail.com> <87eddnxy47.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Feb 8, 2024 at 11:01 AM Kairui Song wrote:
>
> On Thu, Feb 8, 2024 at 2:36 PM Huang, Ying wrote:
> >
> > Kairui Song writes:
> >
> > > On Thu, Feb 8, 2024 at 2:31 AM Minchan Kim wrote:
> > >>
> > >> On Wed, Feb 07, 2024 at 12:06:15PM +0800, Kairui Song wrote:
> >
> > [snip]
> >
> > >> >
> > >> > So I think the thing is, it's getting complex because this patch
> > >> > wanted to make it simple and just reuse the swap cache flags.
> > >>
> > >> I agree that a simple fix would be important at this point.
> > >>
> > >> Considering your description, here's my understanding of the other idea:
> > >> other methods, such as increasing the swap count, haven't proven effective
> > >> in your tests.
> > >> The approach risks forcing racers to rely on the swap cache
> > >> again, with a potential performance loss in the race scenario.
> > >>
> > >> While I understand that simplicity is important, and the performance loss
> > >> in this case may be infrequent, I believe the swap_count approach could be
> > >> a suitable solution. What do you think?
> > >
> > > Hi Minchan
> > >
> > > Yes, my main concern was about simplicity and performance.
> > >
> > > Increasing swap_count here will also race with another process
> > > releasing swap_count to 0 (swapcache was able to sync callers in other
> > > call paths, but we skipped swapcache here).
> >
> > What is the consequence of the race condition?
>
> Hi Ying,
>
> It will increase the swap count of an already freed entry. That races
> with the swap free/alloc logic that checks whether count ==
> SWAP_HAS_CACHE or sets count to zero, or it causes repeated frees of an
> entry; all of these result in random corruption of the swap map. This
> happens a lot during stress testing.

In theory, the original code before your patch can get into a situation
similar to the one you are trying to avoid. CPU1 enters do_swap_page()
with entry swap count == 1 and continues handling the swap fault without
the swap cache. Then some operation bumps up the swap entry count, and
CPU2 enters do_swap_page(), racing with CPU1 with swap count == 2. CPU2
will need to go through the swap cache case. We still need to handle
that situation correctly, so the complexity is already there.

If we can make sure the above case works correctly, then one way to
avoid the problem is just to send CPU2 through the swap cache path
(without the swap cache bypassing).

> > > So the right steps are: 1. Lock the cluster/swap lock; 2. Check if we
> > > still have swap_count == 1, and bail out if not; 3. Set it to 2.
> > > __swap_duplicate() can be modified to support this; it's similar to
> > > the existing logic for SWAP_HAS_CACHE.
> > >
> > > And the swap freeing path will need to do more: swapcache cleanup needs
> > > to be handled even in the bypassing path, since the racer may add the
> > > entry to the swapcache.
> > >
> > > Reusing SWAP_HAS_CACHE seems to make it much simpler and avoids a lot of
> > > overhead, so I used that approach in this patch; the only issue now is
> > > potentially repeated page faults.
> > >
> > > I'm currently trying to add a SWAP_MAP_LOCK (or SWAP_MAP_SYNC, I'm bad
> > > at naming it) special value, so any racer can just spin on it to avoid
> > > all these problems. What do you think about this?
> >
> > Let's try some simpler method first.
>
> Another simpler idea is to add a schedule() or
> schedule_timeout_uninterruptible(1) in the swapcache_prepare() failure
> path before the goto out (just like __read_swap_cache_async()). I think
> this should ensure that in almost all cases the PTE is ready after it
> returns, and it also yields more CPU time.

I mentioned that in my earlier email, and Ying pointed it out as well:
busy-waiting inside do_swap_page() is bad because it is holding other
locks. Sorry I started this idea, but it seems to be no good.

Having CPU2 make forward progress without retrying the page fault would
be best, if possible.

Chris