From: Miklos Szeredi
Date: Wed, 19 Jul 2023 21:56:44 +0200
Subject: Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
To: Matthew Wilcox
Cc: Matt Whitlock, David Howells, netdev@vger.kernel.org, Dave Chinner, Linus Torvalds, Jens Axboe, linux-fsdevel@kvack.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Hellwig, linux-fsdevel@vger.kernel.org
References: <20230629155433.4170837-1-dhowells@redhat.com> <20230629155433.4170837-2-dhowells@redhat.com>

On Wed, 19 Jul 2023 at 21:44, Matthew Wilcox wrote:
>
> On Wed, Jul 19, 2023 at 09:35:33PM +0200, Miklos Szeredi wrote:
> > On Wed, 19 Jul 2023 at 19:59, Matt Whitlock wrote:
> > >
> > > On Wednesday, 19 July 2023 06:17:51 EDT, Miklos Szeredi wrote:
> > > > On Thu, 29 Jun 2023 at 17:56, David Howells wrote:
> > > >>
> > > >> Splicing data from, say, a file into a pipe currently leaves the source
> > > >> pages in the pipe after splice() returns - but this means that those pages
> > > >> can be subsequently modified by shared-writable mmap(), write(),
> > > >> fallocate(), etc. before they're consumed.
> > > >
> > > > What is this trying to fix?  The above behavior is well known, so
> > > > it's not likely to be a problem.
> > >
> > > Respectfully, it's not well-known, as it's not documented. If the splice(2)
> > > man page had mentioned that pages can be mutated after they're already
> > > ostensibly at rest in the output pipe buffer, then my nightly backups
> > > wouldn't have been incurring corruption silently for many months.
> >
> > splice(2):
> >
> >    Though we talk of copying, actual copies are generally avoided.
> >    The kernel does this by implementing a pipe buffer as a set of
> >    reference-counted pointers to pages of kernel memory.  The
> >    kernel creates "copies" of pages in a buffer by creating new
> >    pointers (for the output buffer) referring to the pages, and
> >    increasing the reference counts for the pages: only pointers are
> >    copied, not the pages of the buffer.
> >
> > While not explicitly stating that the contents of the pages can change
> > after being spliced, this can easily be inferred from the above
> > semantics.
>
> So what's the API that provides the semantics of _copying_?

What's your definition of copying?

Thanks,
Miklos
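
For readers who want to observe the behavior discussed above, the following is a minimal sketch, not code from this thread. It assumes Linux and Python 3.10 or later (for os.splice). The function name splice_then_overwrite and the 4096-byte payloads are illustrative choices. It splices one page of a file into a pipe, overwrites that file range after splice() has returned, and then drains the pipe. Whether the drained bytes are the original or the overwritten data depends on the kernel version and filesystem, so the sketch only reports what it saw.

```python
import os
import tempfile


def splice_then_overwrite(original: bytes = b"A" * 4096,
                          replacement: bytes = b"B" * 4096) -> bytes:
    """Splice a file into a pipe, overwrite the file, then drain the pipe.

    Returns the bytes read back from the pipe. If the pipe only holds
    references to the file's page-cache pages, the result may be the
    replacement data rather than the original.
    """
    fd, path = tempfile.mkstemp()
    rfd, wfd = os.pipe()
    try:
        os.write(fd, original)
        os.fsync(fd)

        # Move the file's bytes into the pipe without a userspace copy.
        moved = os.splice(fd, wfd, len(original), offset_src=0)

        # Overwrite the same file range after splice() has already returned.
        os.pwrite(fd, replacement[:moved], 0)

        # Drain exactly the bytes that were spliced into the pipe.
        out = b""
        while len(out) < moved:
            chunk = os.read(rfd, moved - len(out))
            if not chunk:
                break
            out += chunk
        return out
    finally:
        os.close(fd)
        os.close(rfd)
        os.close(wfd)
        os.unlink(path)


if __name__ == "__main__":
    data = splice_then_overwrite()
    if data and set(data) == {ord("B")}:
        print("pipe contents changed after splice() returned")
    else:
        print("pipe contents kept the original data")
```

Running the script on the filesystem that holds the data of interest shows which of the two outcomes applies there; a temporary file on a different filesystem may behave differently.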