Date: Mon, 25 Mar 2024 17:22:25 +0800
From: Chengming Zhou
To: Yosry Ahmed
Cc: Barry Song <21cnbao@gmail.com>, Johannes Weiner, Andrew Morton,
 Zhongkun He, Chengming Zhou, Chris Li, Nhat Pham, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Kairui Song
Subject: Re: [PATCH] mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices
References: <20240324210447.956973-1-hannes@cmpxchg.org>
 <1e7ce417-b9dd-4d62-9f54-0adf1ccdae35@linux.dev>

On 2024/3/25 16:38, Yosry Ahmed wrote:
> On Mon, Mar 25, 2024 at 12:33 AM Chengming Zhou wrote:
>>
>> On 2024/3/25 15:06, Yosry Ahmed wrote:
>>> On Sun, Mar 24, 2024 at 9:54 PM Barry Song <21cnbao@gmail.com> wrote:
>>>>
>>>> On Mon, Mar 25, 2024 at 10:23 AM Yosry Ahmed wrote:
>>>>>
>>>>> On Sun, Mar 24, 2024 at 2:04 PM Johannes Weiner wrote:
>>>>>>
>>>>>> Zhongkun He reports data corruption when combining zswap with zram.
>>>>>>
>>>>>> The issue is the exclusive loads we're doing in zswap. They assume
>>>>>> that all reads are going into the swapcache, which can assume
>>>>>> authoritative ownership of the data and so the zswap copy can go.
>>>>>>
>>>>>> However, zram files are marked SWP_SYNCHRONOUS_IO, and faults will try
>>>>>> to bypass the swapcache. This results in an optimistic read of the
>>>>>> swap data into a page that will be dismissed if the fault fails due to
>>>>>> races. In this case, zswap mustn't drop its authoritative copy.
>>>>>>
>>>>>> Link: https://lore.kernel.org/all/CACSyD1N+dUvsu8=zV9P691B9bVq33erwOXNTmEaUbi9DrDeJzw@mail.gmail.com/
>>>>>> Reported-by: Zhongkun He
>>>>>> Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
>>>>>> Cc: stable@vger.kernel.org [6.5+]
>>>>>> Signed-off-by: Johannes Weiner
>>>>>> Tested-by: Zhongkun He
>>>>
>>>> Acked-by: Barry Song
>>>>
>>>>>
>>>>> Do we also want to mention somewhere (commit log or comment) that
>>>>> keeping the entry in the tree is fine because we are still protected
>>>>> from concurrent loads/invalidations/writeback by swapcache_prepare()
>>>>> setting SWAP_HAS_CACHE or so?
>>>>
>>>> It seems that Kairui's patch comprehensively addresses the issue at hand.
>>>> Johannes's solution, on the other hand, appears to align zswap behavior
>>>> more closely with that of a traditional swap device, only releasing an
>>>> entry when the corresponding swap slot is freed, particularly in the
>>>> sync-io case.
>>>
>>> It actually worked out quite well that Kairui's fix landed shortly
>>> before this bug was reported, as this fix wouldn't have been possible
>>> without it as far as I can tell.
>>>
>>>> Johannes' patch has inspired me to consider whether zRAM could achieve
>>>> a comparable outcome by immediately releasing objects in swap cache
>>>> scenarios. When I have the opportunity, I plan to experiment with zRAM.
>>>
>>> That would be interesting. I am curious if it would be as
>>> straightforward in zram to just mark the folio as dirty in this case
>>> like zswap does, given its implementation as a block device.
>>
>> This makes me wonder who is responsible for marking the folio dirty in
>> this swapcache bypass case? Should we call folio_mark_dirty() after
>> swap_read_folio()?
>
> In shrink_folio_list(), we try to add anonymous folios to the
> swapcache if they are not there before checking if they are dirty.
> add_to_swap() calls folio_mark_dirty(), so this should take care of
> it.

Right, thanks for your clarification, so there should be no problem here.
Although that was a fix just for the MADV_FREE case.
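(Aside, to make sure I understand the load-side decision: my rough mental
model is the toy sketch below. This is only an illustration with made-up
names, not the literal zswap_load() code or the actual patch. The idea is
that the compressed copy is dropped and the folio dirtied only when the
read goes through the swapcache; on the SWP_SYNCHRONOUS_IO bypass path
the entry is kept, so zswap stays the authoritative owner even if the
optimistic read is discarded.)

/* Toy model only -- NOT kernel code, names are invented. */
#include <stdbool.h>
#include <stdio.h>

struct toy_zswap_entry { bool present; };  /* stands in for the compressed copy */
struct toy_folio { bool dirty; };          /* stands in for struct folio */

static void toy_zswap_load(struct toy_zswap_entry *entry,
                           struct toy_folio *folio, bool swapcache)
{
        /* (the data is decompressed into the folio in both cases) */

        if (swapcache) {
                /*
                 * Normal path: the swapcache takes authoritative ownership,
                 * so the compressed copy can go ("exclusive load") and the
                 * folio is marked dirty so reclaim writes it back out.
                 */
                entry->present = false;
                folio->dirty = true;
        }
        /*
         * SWP_SYNCHRONOUS_IO bypass: the read went into a private page that
         * may be dismissed if the fault loses a race, so the compressed
         * copy must be kept.
         */
}

int main(void)
{
        struct toy_zswap_entry e = { .present = true };
        struct toy_folio f = { .dirty = false };

        toy_zswap_load(&e, &f, false);  /* swapcache bypass */
        printf("bypass:    entry kept=%d, folio dirty=%d\n", e.present, f.dirty);

        e.present = true;
        f.dirty = false;
        toy_zswap_load(&e, &f, true);   /* read through the swapcache */
        printf("swapcache: entry kept=%d, folio dirty=%d\n", e.present, f.dirty);
        return 0;
}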
> There is an interesting comment there though. It says that the PTE
> should be dirty, so unmapping the folio should have already marked it
> as dirty by the time we are adding it to the swapcache, except for the
> MADV_FREE case.

It seems to say the folio will be dirtied later, when it is unmapped,
assuming the PTE is dirty.

> However, I think we actually unmap the folio after we add it to the
> swapcache in shrink_folio_list(). Also, I don't immediately see why
> the PTE would be dirty.

If all anon pages on the LRU list were faulted in by writes, that would
be true. We would just use the zero page if they were faulted in by
reads, right?

> In do_swap_page(), making the PTE dirty seems to be conditional on the
> fault being a write fault, but I didn't look thoroughly, maybe I missed
> it. It is also possible that the comment is just outdated.

Yeah, dirty is only marked on a write fault.

Thanks.
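P.S. For anyone skimming the thread later, the "dirty only on a write
fault" point boils down to something like the toy sketch below (again
not kernel code, just an invented illustration): on the swapin fault
path the PTE is only made dirty when the fault itself is a write; a read
fault installs a clean PTE, and the folio is dirtied later instead, e.g.
via add_to_swap() -> folio_mark_dirty() when reclaim puts it back into
the swapcache.

/* Toy illustration only -- not the real do_swap_page() code. */
#include <stdbool.h>
#include <stdio.h>

struct toy_pte { bool dirty; bool writable; };

/* Mirrors "making the PTE dirty is conditional on a write fault". */
static struct toy_pte toy_swapin_install_pte(bool write_fault)
{
        struct toy_pte pte = { .dirty = false, .writable = false };

        if (write_fault) {
                pte.dirty = true;      /* conceptually pte_mkdirty() */
                pte.writable = true;   /* conceptually maybe_mkwrite() */
        }
        return pte;
}

int main(void)
{
        struct toy_pte r = toy_swapin_install_pte(false);
        struct toy_pte w = toy_swapin_install_pte(true);

        printf("read fault:  pte dirty=%d\n", r.dirty);   /* prints 0 */
        printf("write fault: pte dirty=%d\n", w.dirty);   /* prints 1 */
        return 0;
}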