Received: by 2002:ab2:6857:0:b0:1ef:ffd0:ce49 with SMTP id l23csp2694858lqp; Mon, 25 Mar 2024 06:56:07 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCWd2KN5BdctiAM6rW5IERk4/BpuqQGUaxhUyhx3Q12owKtNRo0pDiB5/d2FopHlx8lp1aYnkDssB6a9I3TymeYmCVoJOLfpqlxION91Jw== X-Google-Smtp-Source: AGHT+IHNd+AQtwDv7uRfRSumBbEN6zxjKDjuz5ELsabxz6ZhUzbw9RKwiJu9u1y010RZNKj+3l+l X-Received: by 2002:a05:6a20:6d85:b0:1a3:4d0a:35c0 with SMTP id gl5-20020a056a206d8500b001a34d0a35c0mr5677089pzb.17.1711374967370; Mon, 25 Mar 2024 06:56:07 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1711374967; cv=pass; d=google.com; s=arc-20160816; b=LTeOy/w4Dra5uC0XzfS6FVjpGs278o4ij8YV94zrtCzmEOGXSVTpDVyzqCd2NoZvga DOjsyfvslS6KpsKm3vfgq7HPgvz7nZJKrkAJXLf8RpvtQmAz53Dj8Zj1VpZNpb2KV0eV +5tpS+RyqmlRphgjecNfrd6G5fLHJQ+0rowwrco8Vv66tl14/UZKosWiwitZ3HDQNCot hW5nK3z2MkdA5TbT9BPbJYqfDxSm5mStSvZ6rHMO0nb6NEf0scL0L9dkKTudT8DJBSwK VASW1jC4hGkKE6U6bGYVs3syYvuc3rjE5EnSl2BdgUdUrdzql8C87g6UE2aHp2Wyf6+2 vAhg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=Y0kZP691y+X37CeRc5a1b81NRGe9/CqFEZLGS/IX9lU=; fh=9aOh37VGWSO31dDaFBQ2MdzRXa1RUAQvTZa94BCkwDY=; b=KE7i6lG8NZDw2OnW7XWLQrEVhIT2qqjdRSCwB2pC49FasmZCjZKbNN+OrbXdyyuCFy EvWf/aalA9RUEIgP+i1vFVR61W3c/rvHGTaNOoVi/ygoKdfh0e6iiSI5LAJ5SrxkdheO 8F+xrsnii7EcCJc9BBXntS+gaZ/2M7MlsSEjSvlff50kh4uG4aIg97tae+FGbmAXIiY3 mVGzj2ayVdlzH6aPKYR5TtRNDpQ64y05zViB4M6tBd3uHlslem4Xok7DzTilBPG0hmfD faGO+ttN3UGV3UVDUTSlV4jK58q7baYd26n4Xve2MwmyXtSfZvGSUsL+0ZRUDC0o/ytI XbJw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=AejJBxrw; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-116703-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-116703-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id 11-20020a170902c14b00b001e039a37289si5091663plj.613.2024.03.25.06.56.07 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Mar 2024 06:56:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-116703-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=AejJBxrw; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-116703-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-116703-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id AE6BB2C8EAA for ; Mon, 25 Mar 2024 13:44:45 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D2141131BA5; Mon, 25 Mar 2024 10:28:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="AejJBxrw" Received: from mail-ej1-f45.google.com (mail-ej1-f45.google.com [209.85.218.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0F01E15573D for ; Mon, 25 Mar 2024 08:39:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.218.45 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711355967; cv=none; b=IdApYwognTKZ78GBe0LuOUTKEeijBqX7Vl1rJ4iM8zroktoOGPwBdVa/Qf8iTx8D9+4o/QqCmSd4V8wU1XLEQe+OpqLf7ZQ2AfvsmH+qWerTqz3t5RcpMleXS7nGnMT2ITeEfsa4vn97eK0qw84K453/u4TpJMEinim3cXoYsC0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1711355967; c=relaxed/simple; bh=BAC+WVMO1tGwo33xnug+/2dC0AeTsQNT40RAmXUK4xw=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=hsYETpqs/wf3wrMnfjh6MEBCwVXn67kzIzoGc3uWgfDmBsA+izSbaFNlOlynbX+cNpKFts/OZzoRst6VyRv3CiknvHQr+lHHCwpXDnD9rktYmo+kKTIODNqQcwcwyi2sien75TSrwKKZYGbuknlWZJ3O7ZBkSgE5ULKezKiui6E= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=AejJBxrw; arc=none smtp.client-ip=209.85.218.45 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Received: by mail-ej1-f45.google.com with SMTP id a640c23a62f3a-a4a3a5e47baso49078366b.2 for ; Mon, 25 Mar 2024 01:39:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1711355964; x=1711960764; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Y0kZP691y+X37CeRc5a1b81NRGe9/CqFEZLGS/IX9lU=; b=AejJBxrwlY9xMNHTAWsRBde6Bh81IF2usY9qx0V3mmk8vL/V34nE6lnKoSJdXIp1T/ WgtWI0hemmrxb/yD/niOZYOKa9+dP+dyX+qr3830Pzm0S6PEs2MS3pHL8ivcyVEcvuEU Ekt4A3LEdbHD/KDOTzwMJ6P9XxsSgx67YHb0N0SELG/hddkSaSfZPFlDevctAWRzCm5A F96wL6X0Ean8jJkPGPBZrADj1KtXnt5qBV5RzbxFdlcFsWdRz1vykxw637MQFH18w1gv kq/Ghm/QUfA5U7Nc1ZsSoFOQUgunIcAxhhUTjeGNq4OO6/byjloGPn9LPWkRs9VTCn6w /yTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711355964; x=1711960764; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y0kZP691y+X37CeRc5a1b81NRGe9/CqFEZLGS/IX9lU=; b=VvdnppPV0iHGUfWZkeplDoWaf8RIciZmJviFjAzXN5Y2JUefTfyTxJFCdsqsNGC/Qk lHraQ+yFkEtlIb60UXlWtzR1ZDaTUOIhyADp2D02ssOH3+vzjYF5ypVhoPFXKn0WKnMs v8On67hoTPwg8QinUEn9W1vsBjYQMKItczysPC7RX+NfRhjyvTR1J5rW7twnHYcPxDH6 oXayiT47nvhIp34xx7FerRw0Kxf3gEg6nF3vHPwjRlUm2Be/hf/ym0iAKqbooq861Xey FJrJUC3Mndos2Qh0faq4ED2r7WaZDXc0nlbQsrAu5iFVfq+KXRTLB/uS7xibc+Evj30i V7Qg== X-Forwarded-Encrypted: i=1; AJvYcCVvCbe8UxpBv13Ox3gPI9IhWLvcHi2pns16H3cjDtBSlmah0QJA0+rx4eyjN10iwItJPPXflIPQQXNxQOfyG8fr3xAw06bug8Ar5SD5 X-Gm-Message-State: AOJu0YxWjVAQ63PfJc6hGWywEYBVeL0iEyrid+mlEWwO2Vk2azdDwSvU aLmG6vzaBndEQGD9/G95vgaEO9e9Eko+Xkoe1WvRp0Rwp7raFNp4HuJpG8L2jakqlObadyI7MBD yLpnMrINmQVFdFuZwylpf3QWUsBMlfaB9nZmX X-Received: by 2002:a17:906:c109:b0:a47:6bd4:cbd9 with SMTP id do9-20020a170906c10900b00a476bd4cbd9mr2433483ejc.52.1711355964219; Mon, 25 Mar 2024 01:39:24 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20240324210447.956973-1-hannes@cmpxchg.org> <1e7ce417-b9dd-4d62-9f54-0adf1ccdae35@linux.dev> In-Reply-To: <1e7ce417-b9dd-4d62-9f54-0adf1ccdae35@linux.dev> From: Yosry Ahmed Date: Mon, 25 Mar 2024 01:38:47 -0700 Message-ID: Subject: Re: [PATCH] mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices To: Chengming Zhou Cc: Barry Song <21cnbao@gmail.com>, Johannes Weiner , Andrew Morton , Zhongkun He , Chengming Zhou , Chris Li , Nhat Pham , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kairui Song Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Mar 25, 2024 at 12:33=E2=80=AFAM Chengming Zhou wrote: > > On 2024/3/25 15:06, Yosry Ahmed wrote: > > On Sun, Mar 24, 2024 at 9:54=E2=80=AFPM Barry Song <21cnbao@gmail.com> = wrote: > >> > >> On Mon, Mar 25, 2024 at 10:23=E2=80=AFAM Yosry Ahmed wrote: > >>> > >>> On Sun, Mar 24, 2024 at 2:04=E2=80=AFPM Johannes Weiner wrote: > >>>> > >>>> Zhongkun He reports data corruption when combining zswap with zram. > >>>> > >>>> The issue is the exclusive loads we're doing in zswap. They assume > >>>> that all reads are going into the swapcache, which can assume > >>>> authoritative ownership of the data and so the zswap copy can go. > >>>> > >>>> However, zram files are marked SWP_SYNCHRONOUS_IO, and faults will t= ry > >>>> to bypass the swapcache. This results in an optimistic read of the > >>>> swap data into a page that will be dismissed if the fault fails due = to > >>>> races. In this case, zswap mustn't drop its authoritative copy. > >>>> > >>>> Link: https://lore.kernel.org/all/CACSyD1N+dUvsu8=3DzV9P691B9bVq33er= wOXNTmEaUbi9DrDeJzw@mail.gmail.com/ > >>>> Reported-by: Zhongkun He > >>>> Fixes: b9c91c43412f ("mm: zswap: support exclusive loads") > >>>> Cc: stable@vger.kernel.org [6.5+] > >>>> Signed-off-by: Johannes Weiner > >>>> Tested-by: Zhongkun He > >> > >> Acked-by: Barry Song > >> > >>> > >>> Do we also want to mention somewhere (commit log or comment) that > >>> keeping the entry in the tree is fine because we are still protected > >>> from concurrent loads/invalidations/writeback by swapcache_prepare() > >>> setting SWAP_HAS_CACHE or so? > >> > >> It seems that Kairui's patch comprehensively addresses the issue at ha= nd. > >> Johannes's solution, on the other hand, appears to align zswap behavio= r > >> more closely with that of a traditional swap device, only releasing an= entry > >> when the corresponding swap slot is freed, particularly in the sync-io= case. > > > > It actually worked out quite well that Kairui's fix landed shortly > > before this bug was reported, as this fix wouldn't have been possible > > without it as far as I can tell. > > > >> > >> Johannes' patch has inspired me to consider whether zRAM could achieve > >> a comparable outcome by immediately releasing objects in swap cache > >> scenarios. When I have the opportunity, I plan to experiment with zRA= M. > > > > That would be interesting. I am curious if it would be as > > straightforward in zram to just mark the folio as dirty in this case > > like zswap does, given its implementation as a block device. > > > > This makes me wonder who is responsible for marking folio dirty in this s= wapcache > bypass case? Should we call folio_mark_dirty() after the swap_read_folio(= )? In shrink_folio_list(), we try to add anonymous folios to the swapcache if they are not there before checking if they are dirty. add_to_swap() calls folio_mark_dirty(), so this should take care of it. There is an interesting comment there though. It says that PTE should be dirty, so unmapping the folio should have already marked it as dirty by the time we are adding it to the swapcache, except for the MADV_FREE case. However, I think we actually unmap the folio after we add it to the swapcache in shrink_folio_list(). Also, I don't immediately see why the PTE would be dirty. In do_swap_page(), making the PTE dirty seems to be conditional on the fault being a write fault, but I didn't look thoroughly, maybe I missed it. It is also possible that the comment is just outdated.