From: Barry Song <21cnbao@gmail.com>
Date: Fri, 7 Jun 2024 08:54:56 +1200
Subject: Re: [PATCH] mm: zswap: add VM_BUG_ON() if large folio swapin is attempted
To: Yosry Ahmed
Cc: David Hildenbrand, Andrew Morton, Johannes Weiner, Nhat Pham,
    Chengming Zhou, Baolin Wang, Chris Li, Ryan Roberts, Matthew Wilcox,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Fri, Jun 7, 2024 at 8:32 AM Yosry Ahmed wrote:
>
> On Thu, Jun 6, 2024 at 1:22 PM David Hildenbrand wrote:
> >
> > On 06.06.24 20:48, Yosry Ahmed wrote:
> > > With ongoing work to support large folio swapin, it is important to make
> > > sure we do not pass large folios to zswap_load() without implementing
> > > proper support.
> > >
> > > For example, if a swapin fault observes that contiguous PTEs are
> > > pointing to contiguous swap entries and tries to swap them in as a large
> > > folio, swap_read_folio() will pass in a large folio to zswap_load(), but
> > > zswap_load() will only effectively load the first page in the folio. If
> > > the first page is not in zswap, the folio will be read from disk, even
> > > though other pages may be in zswap.
> > >
> > > In both cases, this will lead to silent data corruption.
> > >
> > > Proper large folio swapin support needs to go into zswap before zswap
> > > can be enabled in a system that supports large folio swapin.
> > >
> > > Looking at callers of swap_read_folio(), it seems like they are either
> > > allocated from __read_swap_cache_async() or do_swap_page() in the
> > > SWP_SYNCHRONOUS_IO path. Both of which allocate order-0 folios, so we
> > > are fine for now.
> > >
> > > Add a VM_BUG_ON() in zswap_load() to make sure that we detect changes in
> > > the order of those allocations without proper handling of zswap.
> > >
> > > Alternatively, swap_read_folio() (or its callers) can be updated to have
> > > a fallback mechanism that splits large folios or reads subpages
> > > separately. Similar logic may be needed anyway in case part of a large
> > > folio is already in the swapcache and the rest of it is swapped out.
> > >
> > > Signed-off-by: Yosry Ahmed

Acked-by: Barry Song

This has been observed by me[1]; that's why you can find the below code
in my swapin patch:

+static struct folio *alloc_swap_folio(struct vm_fault *vmf)
+{
+	...
+	/*
+	 * a large folio being swapped-in could be partially in
+	 * zswap and partially in swap devices, zswap doesn't
+	 * support large folios yet, we might get corrupted
+	 * zero-filled data by reading all subpages from swap
+	 * devices while some of them are actually in zswap
+	 */
+	if (is_zswap_enabled())
+		goto fallback;

[1] https://lore.kernel.org/linux-mm/20240304081348.197341-6-21cnbao@gmail.com/

> > > ---
> > >
> > > Sorry for the long CC list, I just found myself repeatedly looking at
> > > new series that add swap support for mTHPs / large folios, making sure
> > > they do not break with zswap or make incorrect assumptions. This debug
> > > check should give us some peace of mind. Hopefully this patch will also
> > > raise awareness among people who are working on this.
> > >
> > > ---
> > >  mm/zswap.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index b9b35ef86d9be..6007252429bb2 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1577,6 +1577,9 @@ bool zswap_load(struct folio *folio)
> > >  	if (!entry)
> > >  		return false;
> > >
> > > +	/* Zswap loads do not handle large folio swapins correctly yet */
> > > +	VM_BUG_ON(folio_test_large(folio));
> > > +
> >
> > There is no way we could have a WARN_ON_ONCE() and recover, right?
>
> Not without making more fundamental changes to the surrounding swap
> code. Currently zswap_load() returns either true (folio was loaded
> from zswap) or false (folio is not in zswap).
>
> To handle this correctly zswap_load() would need to tell
> swap_read_folio() which subpages are in zswap and have been loaded,
> and then swap_read_folio() would need to read the remaining subpages
> from disk. This of course assumes that the caller of swap_read_folio()
> made sure that the entire folio is swapped out and protected against
> races with other swapins.
>
> Also, because swap_read_folio() cannot split the folio itself, other
> swap_read_folio_*() functions that are called from it should be
> updated to handle swapping in tail subpages, which may be questionable
> in its own right.
>
> An alternative would be that zswap_load() (or a separate interface)
> could tell swap_read_folio() that the folio is partially in zswap,
> then we can just bail and tell the caller that it cannot read the
> large folio and that it should be split.
>
> There may be other options as well, but the bottom line is that it is
> possible, but probably not something that we want to do right now.
>
> A stronger protection method would be to introduce a config option or
> boot parameter for large folio swapin, and then make CONFIG_ZSWAP
> depend on it being disabled, or have zswap check it at boot and refuse
> to be enabled if it is on.

Thanks
Barry
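
A minimal sketch of the boot-time guard described in the quoted reply above,
assuming a hypothetical CONFIG_LARGE_FOLIO_SWAPIN symbol and a made-up helper
name (neither exists in the kernel), might look like this; it is illustrative
only and not part of the posted patch:

/*
 * Hypothetical sketch only, not from the patch above: if large folio
 * swapin were gated by a Kconfig option, zswap could refuse to enable
 * itself until zswap_load() learns to handle large folios.
 * CONFIG_LARGE_FOLIO_SWAPIN is a placeholder symbol for whatever knob
 * ends up gating large folio swapin.
 */
static bool zswap_safe_to_enable(void)
{
	if (!IS_ENABLED(CONFIG_LARGE_FOLIO_SWAPIN))
		return true;	/* only order-0 swapin is possible */

	pr_warn("zswap: large folio swapin is enabled, keeping zswap off\n");
	return false;
}

The same check could equally be folded into zswap's setup path or driven by a
boot parameter, per the suggestion in the quoted reply.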