Date: Sun, 2 Jun 2024 07:04:47 -0400
From: Brian Foster
To: Zhang Yi
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, djwong@kernel.org, hch@infradead.org,
	brauner@kernel.org, david@fromorbit.com, chandanbabu@kernel.org,
	jack@suse.cz, willy@infradead.org, yi.zhang@huawei.com,
	chengzhihao1@huawei.com, yukuai3@huawei.com
Subject: Re: [RFC PATCH v4 1/8] iomap: zeroing needs to be pagecache aware
References: <20240529095206.2568162-1-yi.zhang@huaweicloud.com>
	<20240529095206.2568162-2-yi.zhang@huaweicloud.com>
In-Reply-To: <20240529095206.2568162-2-yi.zhang@huaweicloud.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 29, 2024 at 05:51:59PM +0800, Zhang Yi wrote:
> From: Dave Chinner
>
> Unwritten extents can have page cache data over the range being
> zeroed so we can't just skip them entirely.
> Fix this by checking for
> an existing dirty folio over the unwritten range we are zeroing
> and only performing zeroing if the folio is already dirty.
>
> XXX: how do we detect a iomap containing a cow mapping over a hole
> in iomap_zero_iter()? The XFS code implies this case also needs to
> zero the page cache if there is data present, so trigger for page
> cache lookup only in iomap_zero_iter() needs to handle this case as
> well.
>
> Before:
>
> $ time sudo ./pwrite-trunc /mnt/scratch/foo 50000
> path /mnt/scratch/foo, 50000 iters
>
> real    0m14.103s
> user    0m0.015s
> sys     0m0.020s
>
> $ sudo strace -c ./pwrite-trunc /mnt/scratch/foo 50000
> path /mnt/scratch/foo, 50000 iters
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  85.90    0.847616          16     50000           ftruncate
>  14.01    0.138229           2     50000           pwrite64
> ....
>
> After:
>
> $ time sudo ./pwrite-trunc /mnt/scratch/foo 50000
> path /mnt/scratch/foo, 50000 iters
>
> real    0m0.144s
> user    0m0.021s
> sys     0m0.012s
>
> $ sudo strace -c ./pwrite-trunc /mnt/scratch/foo 50000
> path /mnt/scratch/foo, 50000 iters
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  53.86    0.505964          10     50000           ftruncate
>  46.12    0.433251           8     50000           pwrite64
> ....
>
> Yup, we get back all the performance.
>
> As for the "mmap write beyond EOF" data exposure aspect
> documented here:
>
> https://lore.kernel.org/linux-xfs/20221104182358.2007475-1-bfoster@redhat.com/
>
> With this command:
>
> $ sudo xfs_io -tfc "falloc 0 1k" -c "pwrite 0 1k" \
>   -c "mmap 0 4k" -c "mwrite 3k 1k" -c "pwrite 32k 4k" \
>   -c fsync -c "pread -v 3k 32" /mnt/scratch/foo
>
> Before:
>
> wrote 1024/1024 bytes at offset 0
> 1 KiB, 1 ops; 0.0000 sec (34.877 MiB/sec and 35714.2857 ops/sec)
> wrote 4096/4096 bytes at offset 32768
> 4 KiB, 1 ops; 0.0000 sec (229.779 MiB/sec and 58823.5294 ops/sec)
> 00000c00: 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
> XXXXXXXXXXXXXXXX
> 00000c10: 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
> XXXXXXXXXXXXXXXX
> read 32/32 bytes at offset 3072
> 32.000000 bytes, 1 ops; 0.0000 sec (568.182 KiB/sec and 18181.8182
> ops/sec
>
> After:
>
> wrote 1024/1024 bytes at offset 0
> 1 KiB, 1 ops; 0.0000 sec (40.690 MiB/sec and 41666.6667 ops/sec)
> wrote 4096/4096 bytes at offset 32768
> 4 KiB, 1 ops; 0.0000 sec (150.240 MiB/sec and 38461.5385 ops/sec)
> 00000c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ................
> 00000c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ................
> read 32/32 bytes at offset 3072
> 32.000000 bytes, 1 ops; 0.0000 sec (558.036 KiB/sec and 17857.1429
> ops/sec)
>
> We see that this post-eof unwritten extent dirty page zeroing is
> working correctly.
>

I've pointed this out in the past, but IIRC this implementation is racy
vs. reclaim. Specifically, relying on a folio lookup after the mapping
lookup doesn't take reclaim into account. If we look up an unwritten
mapping and a folio is then written back and reclaimed before the scan
reaches that offset, the zeroing code incorrectly treats that subrange
as already zero when it isn't (the extent is stale by that point, but
the stale extent check is skipped).
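
The ordering problem can also be modelled in a few lines of userspace
Python. This is only a toy sketch under stated assumptions, not the
iomap code: ToyFile, zero_block(), run() and the race/revalidate flags
are hypothetical names, one "block" is four bytes, and revalidate
merely stands in for whatever stale-mapping check the real zeroing path
would need.

```python
from dataclasses import dataclass, field

UNWRITTEN, WRITTEN = "unwritten", "written"
ZERO = b"\x00" * 4    # toy "block" of four bytes
DATA = b"\xcd" * 4    # xfs_io's default pwrite pattern is 0xcd


@dataclass
class ToyFile:
    """Hypothetical model: per-block extent state, on-disk data, page cache."""
    extents: dict = field(default_factory=dict)
    disk: dict = field(default_factory=dict)
    pagecache: dict = field(default_factory=dict)

    def buffered_write(self, block, data):
        # Dirty data sits in the page cache; the extent stays unwritten.
        self.extents.setdefault(block, UNWRITTEN)
        self.pagecache[block] = data

    def writeback_and_reclaim(self, block):
        # Writeback converts the extent to written, reclaim drops the folio.
        if block in self.pagecache:
            self.disk[block] = self.pagecache.pop(block)
            self.extents[block] = WRITTEN

    def read(self, block):
        if block in self.pagecache:
            return self.pagecache[block]
        if self.extents.get(block) == WRITTEN:
            return self.disk[block]
        return ZERO  # holes and unwritten extents read back as zeroes


def zero_block(f, block, race=False, revalidate=False):
    mapping = f.extents[block]            # 1) mapping lookup (a snapshot)
    if race:
        f.writeback_and_reclaim(block)    # another task runs in the window
    if revalidate:
        mapping = f.extents[block]        # stand-in for a stale-mapping check
    if mapping == UNWRITTEN:
        if block in f.pagecache:          # 2) folio lookup
            f.pagecache[block] = ZERO     # dirty folio found: zero it
        return                            # no folio: assume already zero
    f.pagecache[block] = ZERO             # written extent: zero via page cache


def run(race, revalidate):
    f = ToyFile()
    f.buffered_write(0, DATA)
    zero_block(f, 0, race=race, revalidate=revalidate)
    return f.read(0)
```

In this model run(race=True, revalidate=False) reads back the 0xcd
bytes, i.e. the stale data survives the zeroing, while run(False, False)
and run(True, True) both read back zeroes.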

A simple example to demonstrate this is something like the following:

# looping truncate zeroing
while [ true ]; do
	xfs_io -fc "truncate 0" -c "falloc 0 32K" -c "pwrite 0 4k" -c "truncate 2k" <file>
	xfs_io -c "mmap 0 4k" -c "mread -v 2k 16" <file> | grep cd && break
done

vs.

# looping writeback and reclaim
while [ true ]; do
	xfs_io -c "sync_range -a 0 0" -c "fadvise -d 0 0" <file>
done

If I run that against this patch, the first loop will eventually detect
stale data exposed past EOF.

Brian