Received: by 2002:a05:6a10:8c0a:0:0:0:0 with SMTP id go10csp4389044pxb; Tue, 26 Jan 2021 22:00:06 -0800 (PST) X-Google-Smtp-Source: ABdhPJx0cuN+/UAxNALYnlr1I7+dl/BUbL1xPHu/oeWmCWbiRUU2omuk3QfNgEwwySEN9YCAIe9z X-Received: by 2002:a05:6402:1398:: with SMTP id b24mr7129498edv.108.1611727206364; Tue, 26 Jan 2021 22:00:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1611727206; cv=none; d=google.com; s=arc-20160816; b=nOlwij8FIXRu0fFS+5yfgZyFqhY0dz46rLR3tNXwlhZN8ViawsD4ICJHlTVUyzDdwg HS7TRxNwL1uLY3buv62dEBwAdovKE8P2Nx/hyS2leumAA4bsFwGJPq119wWj6YJbdZD5 NarsGTz1rBhuoEIMLRREr5IGTv2xUcFVs2yOyQgBeRs63IgaFOkCzA171scMCOx5Xr+V HpT7qcJrSz2mDXNEFyXrlv7leYtKPBOJfP/W8rc5ZIZG0Fz+irSgSK1OO5Xuz+mNHnXR dIHLPqgyrih3yclFKAzEwUH5FhjEo1sG7Sqnk7UOrHvvOkIFcKumS9KEuJbDc8jBaHYl jfyA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=fSX+x3Y+nZIHURGJ9aVYhrN6aw0a7xZztzTfrZTD8LY=; b=QCYC6Q+y9EmbgMkPTaA/lx/ZX4iOUq/jIJXcxR4YXEl7mT+wN9WQ7nEB1ao6YGo7Ei tScgsCMhUzaPAyvEWc9bbWcXelYV7Lcug+mrCRgyLxwsOyoFjpmlNVvwBnHTn9PSlN+6 AhQpD8rUF87P07/1NnIetKwWM51lRMOWN9IkeuNDIn/mh9qHHrgWuJeKdEgCAA28Tldq +6ctbhnQTLs5jvmY+u+LJrF4dZrCYeTGziw3SKPm2r5o6SA+eh99IGaTy406LBeMGerH /XpuHq9G2QKDkNkvN/42dRRx0yerUjaxGVhr4uvanPRRK5TbeLaelPF3oxFJcLEXAIiR MrDw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f16si400536ejx.196.2021.01.26.21.59.42; Tue, 26 Jan 2021 22:00:06 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392619AbhAZRjA (ORCPT + 99 others); Tue, 26 Jan 2021 12:39:00 -0500 Received: from mail104.syd.optusnet.com.au ([211.29.132.246]:42048 "EHLO mail104.syd.optusnet.com.au" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730808AbhAZHuw (ORCPT ); Tue, 26 Jan 2021 02:50:52 -0500 Received: from dread.disaster.area (pa49-180-243-77.pa.nsw.optusnet.com.au [49.180.243.77]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id 05DF38278DE; Tue, 26 Jan 2021 18:49:57 +1100 (AEDT) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1l4J6q-002Wuh-24; Tue, 26 Jan 2021 18:49:56 +1100 Date: Tue, 26 Jan 2021 18:49:56 +1100 From: Dave Chinner To: Nicolas Boichat Cc: "Darrick J. Wong" , linux-fsdevel@vger.kernel.org, lkml , Amir Goldstein , Dave Chinner , Luis Lozano , iant@google.com Subject: Re: [BUG] copy_file_range with sysfs file as input Message-ID: <20210126074956.GF4626@dread.disaster.area> References: <20210126013414.GE4626@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=YKPhNiOx c=1 sm=1 tr=0 cx=a_idp_d a=juxvdbeFDU67v5YkIhU0sw==:117 a=juxvdbeFDU67v5YkIhU0sw==:17 a=kj9zAlcOel0A:10 a=EmqxpYm9HcoA:10 a=7-415B0cAAAA:8 a=GcyzOjIWAAAA:8 a=J2zK9Hhgt6MEY7-RQMQA:9 a=CjuIK1q_8ugA:10 a=2fKfcJrRRacA:10 a=biEYGPWJfzWAr4FL6Ov7:22 a=hQL3dl6oAZ8NdCsdz28n:22 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jan 26, 2021 at 11:50:50AM +0800, Nicolas Boichat wrote: > On Tue, Jan 26, 2021 at 9:34 AM Dave Chinner wrote: > > > > On Mon, Jan 25, 2021 at 03:54:31PM +0800, Nicolas Boichat wrote: > > > Hi copy_file_range experts, > > > > > > We hit this interesting issue when upgrading Go compiler from 1.13 to > > > 1.15 [1]. Basically we use Go's `io.Copy` to copy the content of > > > `/sys/kernel/debug/tracing/trace` to a temporary file. > > > > > > Under the hood, Go now uses `copy_file_range` syscall to optimize the > > > copy operation. However, that fails to copy any content when the input > > > file is from sysfs/tracefs, with an apparent size of 0 (but there is > > > still content when you `cat` it, of course). > > > > > > A repro case is available in comment7 (adapted from the man page), > > > also copied below [2]. > > > > > > Output looks like this (on kernels 5.4.89 (chromeos), 5.7.17 and > > > 5.10.3 (chromeos)) > > > $ ./copyfrom /sys/kernel/debug/tracing/trace x > > > 0 bytes copied > > > > That's basically telling you that copy_file_range() was unable to > > copy anything. The man page says: > > > > RETURN VALUE > > Upon successful completion, copy_file_range() will return > > the number of bytes copied between files. This could be less > > than the length originally requested. If the file offset > > of fd_in is at or past the end of file, no bytes are copied, > > and copy_file_range() returns zero. > > > > THe man page explains it perfectly. > > I'm not that confident the explanation is perfect ,-) > > How does one define "EOF"? The read manpage > (https://man7.org/linux/man-pages/man2/read.2.html) defines it as a > zero return value. And so does copy_file_range(). That's the -API definition-, it does not define the kernel implementation of how to decide when the file is at EOF. > I don't think using the inode file size is > standard. It is the standard VFS filesystem definition of EOF. Indeed: copy_file_range() vfs_copy_file_range() generic_copy_file_checks() ..... /* Shorten the copy to EOF */ size_in = i_size_read(inode_in); if (pos_in >= size_in) count = 0; else count = min(count, size_in - (uint64_t)pos_in); That inode size check for EOF is -exactly- what is triggering here, and a copy of zero length returns 0 bytes having done nothing. The page cache read path does similar things in generic_file_buffered_read() to avoid truncate races exposing stale/bad data to userspace: /* * i_size must be checked after we know the pages are Uptodate. * * Checking i_size after the check allows us to calculate * the correct value for "nr", which means the zero-filled * part of the page is not copied back to userspace (unless * another truncate extends the file - this is desired though). */ isize = i_size_read(inode); if (unlikely(iocb->ki_pos >= isize)) goto put_pages; > > 'cat' "works" in this situation because it doesn't check the file > > size and just attempts to read unconditionally from the file. Hence > > it happily returns non-existent stale data from busted filesystem > > implementations that allow data to be read from beyond EOF... > > `cp` also works, so does `dd` and basically any other file operation. They do not use a syscall interface that can offload work to filesystems, low level block layer software, hardware and/or remote systems. copy_file_range() is restricted to regular files and does complex stuff that read() and friends will never do, so we have strictly enforced rules to prevent people from playing fast and loose and silently corrupting user data with it.... Cheers, Dave. -- Dave Chinner david@fromorbit.com