From: Yosry Ahmed
Date: Fri, 7 Jun 2024 17:28:30 -0700
Subject: Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted
To: Barry Song <21cnbao@gmail.com>
Cc: David Hildenbrand, Andrew Morton, Johannes Weiner, Nhat Pham,
 Chengming Zhou, Baolin Wang, Chris Li, Ryan Roberts, Matthew Wilcox,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240606184818.1566920-1-yosryahmed@google.com>
 <84d78362-e75c-40c8-b6c2-56d5d5292aa7@redhat.com>
 <7507d075-9f4d-4a9b-836c-1fbb2fbd2257@redhat.com>
 <9374758d-9f81-4e4f-8405-1f972234173e@redhat.com>
 <424c6430-e40d-4a60-8297-438fc33056c9@redhat.com>

On Fri, Jun 7, 2024 at 3:09 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Jun 8, 2024 at 6:58 AM Yosry Ahmed wrote:
> >
> > On Fri, Jun 7, 2024 at 11:52 AM David Hildenbrand wrote:
> > >
> > > >> I have no strong opinion on this one, but likely a VM_WARN_ON would also
> > > >> be sufficient to find such issues early during testing. No need to crash
> > > >> the machine.
> > > >
> > > > I thought VM_BUG_ON() was less frowned-upon than BUG_ON(), but after
> > > > some digging I found your patches to checkpatch and Linus clearly
> > > > stating that it isn't.
> > >
> > > :) yes.
> > >
> > > VM_BUG_ON is not particularly helpful IMHO. If you want something to be
> > > found early during testing, VM_WARN_ON is good enough.
> > >
> > > Ever since Fedora stopped enabling CONFIG_DEBUG_VM, VM_* friends are
> > > primarily reported during early/development testing only. But maybe some
> > > distro out there still sets it.
> > >
> > > >
> > > > How about something like the following (untested), it is the minimal
> > > > recovery we can do but should work for a lot of cases, and does
> > > > nothing beyond a warning if we can swapin the large folio from disk:
> > > >
> > > > diff --git a/mm/page_io.c b/mm/page_io.c
> > > > index f1a9cfab6e748..8f441dd8e109f 100644
> > > > --- a/mm/page_io.c
> > > > +++ b/mm/page_io.c
> > > > @@ -517,7 +517,6 @@ void swap_read_folio(struct folio *folio, struct swap_iocb **plug)
> > > >  	delayacct_swapin_start();
> > > >
> > > >  	if (zswap_load(folio)) {
> > > > -		folio_mark_uptodate(folio);
> > > >  		folio_unlock(folio);
> > > >  	} else if (data_race(sis->flags & SWP_FS_OPS)) {
> > > >  		swap_read_folio_fs(folio, plug);
> > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > index 6007252429bb2..cc04db6bb217e 100644
> > > > --- a/mm/zswap.c
> > > > +++ b/mm/zswap.c
> > > > @@ -1557,6 +1557,22 @@ bool zswap_load(struct folio *folio)
> > > >
> > > >  	VM_WARN_ON_ONCE(!folio_test_locked(folio));
> > > >
> > > > +	/*
> > > > +	 * Large folios should not be swapped in while zswap is being used, as
> > > > +	 * they are not properly handled.
> > > > +	 *
> > > > +	 * If any of the subpages are in zswap, reading from disk would result
> > > > +	 * in data corruption, so return true without marking the folio uptodate
> > > > +	 * so that an IO error is emitted (e.g. do_swap_page() will sigfault).
> > > > +	 *
> > > > +	 * Otherwise, return false and read the folio from disk.
> > > > +	 */
> > > > +	if (WARN_ON_ONCE(folio_test_large(folio))) {
> > > > +		if (xa_find(tree, &offset, offset + folio_nr_pages(folio) - 1, 0))
> > > > +			return true;
> > > > +		return false;
> > > > +	}
> > > > +
> > > >  	/*
> > > >  	 * When reading into the swapcache, invalidate our entry. The
> > > >  	 * swapcache can be the authoritative owner of the page and
> > > > @@ -1593,7 +1609,7 @@ bool zswap_load(struct folio *folio)
> > > >  		zswap_entry_free(entry);
> > > >  		folio_mark_dirty(folio);
> > > >  	}
> > > > -
> > > > +	folio_mark_uptodate(folio);
> > > >  	return true;
> > > >  }
> > > >
> > > > One problem is that even if zswap was never enabled, the warning will
> > > > be emitted just if CONFIG_ZSWAP is on. Perhaps we need a variable or
> > > > static key if zswap was "ever" enabled.
> > >
> > > We should use WARN_ON_ONCE() only for things that cannot happen. So if
> > > there are cases where this could be triggered today, it would be
> > > problematic -- especially if it can be triggered from unprivileged user
> > > space. But if we're concerned of other code messing up our invariant in
> > > the future (e.g., enabling large folios without taking proper care about
> > > zswap etc), we're good to add it.
> >
> > Right now I can't see any paths allocating large folios for swapin, so
> > I think it cannot happen. Once someone tries adding it, the warning
> > will fire if CONFIG_ZSWAP is used, even if zswap is disabled.
> >
> > At this point we will have several options:
> > - Make large folios swapin depend on !CONFIG_ZSWAP for now.
>
> It appears quite problematic. We lack control over whether the kernel build
> will enable CONFIG_ZSWAP, particularly when aiming for a common
> defconfig across all platforms to streamline configurations. For instance,
> in the case of ARM, this was once a significant goal.
>
> Simply trigger a single WARN or BUG if an attempt is made to load
> large folios in zswap_load, while ensuring that zswap_is_enabled()
> remains unaffected. In the mainline code, large folio swap-in support
> is absent, so this warning is intended for debugging purposes and
> targets a very small audience -- perhaps fewer than five individuals
> worldwide. Real users won't encounter this warning, as it remains
> hidden from their view.

I can make the warning only fire if any part of the folio is in zswap
to avoid getting warnings from zswap_load() if we never actually use
zswap, that's reasonable. I wanted to warn if we reach zswap_load()
with any large folio at all for higher coverage only.

I will send something out in the next week or so.

>
> > - Keep track if zswap was ever enabled and make the warning
> > conditional on it. We should also always fallback to order-0 if zswap
> > was ever enabled.
> > - Properly handle large folio swapin with zswap.
> >
> > Does this sound reasonable to you?
>
> Thanks
> Barry
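
For reference, the variant described in the reply above (warn only when some
part of a large folio is actually in zswap) could be modelled in userspace
roughly as below. This is only a sketch of the decision logic, not kernel
code: the boolean array standing in for the zswap tree and the names
model_zswap_load(), any_in_zswap(), warn_once() and enum load_result are all
made up for illustration, and the real check would presumably go through the
xarray as in the quoted diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical outcomes, named after what the caller would end up doing. */
enum load_result {
	LOAD_FROM_DISK,		/* zswap_load() would return false */
	LOAD_FROM_ZSWAP,	/* return true, folio marked uptodate */
	LOAD_IO_ERROR,		/* return true, folio left !uptodate */
};

/* Number of warnings emitted so far (at most one, see warn_once()). */
static int warn_count;

/* Userspace stand-in for WARN_ON_ONCE(): report the first hit only. */
static void warn_once(const char *msg)
{
	static bool warned;

	if (!warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
}

/*
 * Assumption: in_zswap[i] is true if swap offset i has a zswap entry.
 * This replaces the tree lookup purely for the sake of the model.
 */
static bool any_in_zswap(const bool *in_zswap, size_t offset, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		if (in_zswap[offset + i])
			return true;
	return false;
}

static enum load_result model_zswap_load(const bool *in_zswap, size_t offset,
					 size_t nr_pages)
{
	bool hit = any_in_zswap(in_zswap, offset, nr_pages);

	if (nr_pages > 1) {
		/* Large folio with nothing in zswap: stay silent, use disk. */
		if (!hit)
			return LOAD_FROM_DISK;
		/* Reading from disk would return stale data: warn and fail. */
		warn_once("large folio swapin with data in zswap");
		return LOAD_IO_ERROR;
	}

	/* Order-0 path: unchanged behaviour. */
	return hit ? LOAD_FROM_ZSWAP : LOAD_FROM_DISK;
}
```

With this shape, a system that builds with CONFIG_ZSWAP but never stores
anything in zswap would never see the warning, at the cost of the reduced
coverage mentioned above.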