From: Yosry Ahmed
Date: Wed, 24 Jan 2024 23:59:14 -0800
Subject: Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff()
To: Chris Li
Cc: Johannes Weiner, Andrew Morton, Nhat Pham, Chengming Zhou, Huang Ying, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240120024007.2850671-1-yosryahmed@google.com> <20240120024007.2850671-3-yosryahmed@google.com> <20240122201906.GA1567330@cmpxchg.org> <20240123153851.GA1745986@cmpxchg.org> <20240123201234.GC1745986@cmpxchg.org>

On Wed, Jan 24, 2024 at 9:29 PM Chris Li wrote:
>
> Hi Yosry,
>
> On Tue, Jan 23, 2024 at 10:58 PM Yosry Ahmed wrote:
> > >
> > > Thanks for the great analysis, I missed the swapoff/swapon race myself :)
> > >
> > > The first solution that came to mind for me was refcounting the zswap
> > > tree with RCU with percpu-refcount, similar to how cgroup refs are
> > > handled (init in zswap_swapon() and kill in zswap_swapoff()). I think
> > > the percpu-refcount may be overkill in terms of memory usage
> > > though. I think we can still do our own refcounting with RCU, but it
> > > may be more complicated.
> >
> > FWIW, I was able to reproduce the problem in a VM with the following
> > kernel diff:
>
> Thanks for the great find.
>
> I was worried about the use-after-free situation in this email:
>
> https://lore.kernel.org/lkml/CAF8kJuOvOmn7wmKxoqpqSEk4gk63NtQG1Wc+Q0e9FZ9OFiUG6g@mail.gmail.com/
>
> Glad you were able to find a reproducible case. That is one of the
> reasons I changed the free to invalidate entries in my xarray patch.
>
> I think the swap_off code should remove the entry from the tree, just
> wait for each zswap entry to drop to zero. Then free it.

This doesn't really help. The swapoff code already removes all the
entries from the trees (through zswap_invalidate()) before
zswap_swapoff() is called.

The race I described occurs because the writeback code accesses the
entries through the LRU, not the tree. The writeback code could have
isolated a zswap entry from the LRU before swapoff, then tried to access
it after swapoff. Although the zswap entry itself is referenced and safe
to use, accessing the tree to grab the tree lock and check whether the
entry is still in the tree is the problem.

>
> That way you shouldn't need to refcount the tree. The tree refcount is
> effectively the combined refcount of all the zswap entries.

The problem is that, given a zswap entry, the current code has no way to
stabilize the zswap tree before trying to dereference it.
Chengming's suggestion of moving the swap cache pin before accessing
the tree seems like the right way to go.

> Having refcount on the tree would be very high contention.

A percpu refcount cannot be contended by definition, since each CPU only
updates its own counter :)
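To make the ordering argument concrete, here is a userspace toy model in C. It is a sketch under loud assumptions: `toy_tree`, `toy_swap_device`, `toy_pin()`, `toy_unpin()`, `toy_swapoff()` and `toy_writeback()` are hypothetical names, not kernel APIs. A plain pin counter plus a `dying` flag stand in for the swap cache pin and for swapoff refusing new users, and the zswap entry itself is not modeled. The only thing it shows is the ordering: writeback pins before touching the tree, and swapoff frees the tree only once no pins remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-swapfile zswap tree. */
struct toy_tree {
    int nr_entries;
};

/* Hypothetical stand-in for a swap device whose tree swapoff frees. */
struct toy_swap_device {
    struct toy_tree *tree;
    bool dying; /* set when swapoff starts; new pins are refused */
    int pins;   /* models swap cache pins held by writeback */
};

/* Assumed pin primitive: fails once swapoff has started. */
static bool toy_pin(struct toy_swap_device *dev)
{
    if (dev->dying)
        return false;
    dev->pins++;
    return true;
}

static void toy_unpin(struct toy_swap_device *dev)
{
    assert(dev->pins > 0); /* an unbalanced unpin would be a caller bug */
    dev->pins--;
}

/* Swapoff: refuse new pins, then free the tree only when no pins remain.
 * Returns true once the tree is freed; a real swapoff would wait here
 * instead of returning false. */
static bool toy_swapoff(struct toy_swap_device *dev)
{
    dev->dying = true;
    if (dev->pins > 0)
        return false;
    free(dev->tree);
    dev->tree = NULL;
    return true;
}

/* Writeback of an entry reached through the LRU. The pin is taken BEFORE
 * the tree is touched, so the tree cannot be freed underneath it.
 * Returns the tree's entry count, or -1 if swapoff has already started. */
static int toy_writeback(struct toy_swap_device *dev)
{
    if (!toy_pin(dev))
        return -1; /* device going away: never dereference dev->tree */
    int seen = dev->tree->nr_entries; /* stands in for taking the tree lock */
    toy_unpin(dev);
    return seen;
}
```

For example, if writeback pins first and swapoff then runs, toy_swapoff() returns false and the tree stays valid until toy_unpin(); a retried toy_swapoff() then frees it, and any later toy_writeback() returns -1 without touching the freed tree.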
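The point about percpu refcounts can be illustrated with a hypothetical per-CPU counter in C. This is not the kernel's `percpu_ref` API: `toy_pcpu_ref`, `toy_ref_get()`, `toy_ref_put()` and `toy_ref_kill_and_sum()` are made-up names, the CPU index is passed explicitly, and the 64-byte cache line size is an assumption. The fast path only writes the caller's own slot, so CPUs never contend on a shared cache line; only the rare kill path reads every slot.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4 /* hypothetical CPU count for the sketch */

/* One counter slot per CPU, padded to an assumed 64-byte cache line so that
 * updates from different CPUs never touch the same line. */
struct toy_slot {
    alignas(64) long v;
};

/* Hypothetical per-CPU reference counter (not the kernel's percpu_ref). */
struct toy_pcpu_ref {
    struct toy_slot count[TOY_NR_CPUS];
    bool killed;
};

/* Fast path: only the calling CPU's slot is written. */
static void toy_ref_get(struct toy_pcpu_ref *ref, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    ref->count[cpu].v++;
}

/* A put may run on a different CPU than the matching get, so a single
 * slot can go negative; only the sum across slots is meaningful. */
static void toy_ref_put(struct toy_pcpu_ref *ref, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    ref->count[cpu].v--;
}

/* Slow path, run once: mark the ref dead and sum every slot. The kernel
 * waits for an RCU grace period before summing; this single-threaded toy
 * has no concurrent updaters, so it sums immediately. */
static long toy_ref_kill_and_sum(struct toy_pcpu_ref *ref)
{
    assert(!ref->killed); /* killing twice would be a caller bug */
    ref->killed = true;

    long sum = 0;
    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        sum += ref->count[cpu].v;
    return sum;
}
```

The trade-off is the one raised earlier in the thread: each counter costs storage per CPU, which is where the memory-usage concern about percpu-refcount comes from.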