Received: by 2002:ac0:da4c:0:0:0:0:0 with SMTP id a12csp932203imi; Thu, 21 Jul 2022 13:52:14 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vAnapr59YNl3YQIEZc4KvW7liWBUZyJl4zK4T7DxWc4ET0v8prjBHDYORIdGII4b85b401 X-Received: by 2002:a63:87c6:0:b0:41a:3cc2:1f29 with SMTP id i189-20020a6387c6000000b0041a3cc21f29mr232743pge.96.1658436733804; Thu, 21 Jul 2022 13:52:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658436733; cv=none; d=google.com; s=arc-20160816; b=kqbhzL3EWVvSpv3Gk/oNCl9wuetI3ifZU/JZalr1iSrcIB2LVSIN5tG1BSMcI53F7c hMibvydTqizv+hWenmrZU8rk0R1yzKJZZWPNA4tsd+RYciI1yP4ZdBssSlPQ0Q/2jhnL yIzMLla0cQOCaL8ZQOzguwmJWxvwgwrsSlj/65q86IhurMwcmlZmvSmYgffKYVGyHjMP KTX8Apkr3ph4SW8d9XiV8sHE/KmX/+7L19kfdf6MKHZl+NNRqx/a3EBXQBPv/zloH8hG f9k/Mlmr6C/qq65s67BEOdoK4h7kzyw7q6/BrqrRi9YAkUZu7sTkNfu6CiKsvt8g+F6i qMmA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=eJzfJJUbBt7HFtRFrUthszMXR55SFv/KXcRLdAJs0uA=; b=XbvRI1FYnhO98REVaBJjQAI0GAqqvhzL9VKd3V6+IRZn3qtCZmnS5O5T6P6GqNQwK5 HXexsJJyoL0JKWKr0K7ShwIFLPX2H8rbBaFRQc/g2yemZ5TqwQTyXq7czIrxg2jrXThI f7gFItVVMd5EzZ+truzVNS4OBMPTfkJ1mkOn85v3MfEmsHYhQtfRRYpolmj0hmtlZgh3 Bt4CCATamZytl/nMKrJoxJqbqsDoqE6+xFMaJ4z+iIdyt64DKEBwqkuYN6L9Ns+tz7Yt TVFNuUBetvDQqigcqcVZFaWAmSHax7Eo47PD2Wo3685r7db5P6HT6O0CydEMqkICOS8Q SD3A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@toxicpanda-com.20210112.gappssmtp.com header.s=20210112 header.b=NT0OaIfx; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id h70-20020a638349000000b00408e3bbb7dbsi2677118pge.100.2022.07.21.13.51.57; Thu, 21 Jul 2022 13:52:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@toxicpanda-com.20210112.gappssmtp.com header.s=20210112 header.b=NT0OaIfx; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232948AbiGUUr7 (ORCPT + 99 others); Thu, 21 Jul 2022 16:47:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40720 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229761AbiGUUr7 (ORCPT ); Thu, 21 Jul 2022 16:47:59 -0400 Received: from mail-qk1-x736.google.com (mail-qk1-x736.google.com [IPv6:2607:f8b0:4864:20::736]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E044C8F523 for ; Thu, 21 Jul 2022 13:47:57 -0700 (PDT) Received: by mail-qk1-x736.google.com with SMTP id v28so2244640qkg.13 for ; Thu, 21 Jul 2022 13:47:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=toxicpanda-com.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=eJzfJJUbBt7HFtRFrUthszMXR55SFv/KXcRLdAJs0uA=; b=NT0OaIfxExmedjYrNABjjMoO2PrKfLsUcFdkprFHB2xpLZxXK2qgNb7UfPJV5nNptA UBt4JX/f/gzjXbQ4cWXQX+bsjPhZdHiwhoPeJg6/UISHUl3cuulmAsXjetthAKwh9CO1 S5Q3jcwJnii0nDwL8JbC97rYAlcg5ydVBAWekLJLvo9I3OxZgI3+Okp4bgHNR2uookUa GcUG0LUSizEEFSHIJ65FtSj0Z3su4E6sjW18mm6teNO0XbLkpKP7xfhiwDztlTKcjwXh d3h/F/LfIbHWY9QkOYsUlpvneNETr1RkBTgRjU2Zeaop4yakGfMIweihTtfjttYyak7c +hyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=eJzfJJUbBt7HFtRFrUthszMXR55SFv/KXcRLdAJs0uA=; b=MjLmBgdJQmjKGBQQ2ZWTi4XOYufdgpb5Lsvc90ubMjsHwp+0Mu7+NPL05iEujRwhL2 fy1LD022w2gpK1HxNxg0yNCf4sdspZWcDILy08lz9G8OaNcllCKnAHSFV2mC7U7qkuYg Y7dRhP//Of1kxrMj/u0CcXPd7ElfLOAgqJUqQm0A66l76UfQkdtPC/Ri8oNoszGY6kSN hQNlx5Gy+bcJWF09gzYwd8Y+PfTSVKJXlT/23BKCxWrffTcNL/ILjvxrGny7u5+8eMcD Zv/MDCvukabVVdM6vyv7jQ73PADG8w6uw5KuEpGUCKC9xE4cg4KPz2IjuUY2KUyYGBS1 Ufig== X-Gm-Message-State: AJIora/wF2+Ajgcm6s0qVdFXbsLO6K25Yptqg3nwYX21FMPXjS7lj40d 1x74YcYIH0yMGxkzF2TUBUPkdw== X-Received: by 2002:ae9:c314:0:b0:6b2:4306:ca78 with SMTP id n20-20020ae9c314000000b006b24306ca78mr260044qkg.230.1658436476844; Thu, 21 Jul 2022 13:47:56 -0700 (PDT) Received: from localhost (cpe-174-109-172-136.nc.res.rr.com. [174.109.172.136]) by smtp.gmail.com with ESMTPSA id cm20-20020a05622a251400b00304fc3d144esm1897595qtb.1.2022.07.21.13.47.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Jul 2022 13:47:56 -0700 (PDT) Date: Thu, 21 Jul 2022 16:47:55 -0400 From: Josef Bacik To: Anna Schumaker Cc: Dave Chinner , Chuck Lever III , Linux NFS Mailing List , linux-fsdevel Subject: Re: [PATCH v3 6/6] NFSD: Repeal and replace the READ_PLUS implementation Message-ID: References: <20220715184433.838521-1-anna@kernel.org> <20220715184433.838521-7-anna@kernel.org> <20220718011552.GK3600936@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Tue, Jul 19, 2022 at 04:46:50PM -0400, Anna Schumaker wrote: > On Sun, Jul 17, 2022 at 9:16 PM Dave Chinner wrote: > > > > On Fri, Jul 15, 2022 at 07:08:13PM +0000, Chuck Lever III wrote: > > > > On Jul 15, 2022, at 2:44 PM, Anna Schumaker wrote: > > > > > > > > From: Anna Schumaker > > > > > > > > Rather than relying on the underlying filesystem to tell us where hole > > > > and data segments are through vfs_llseek(), let's instead do the hole > > > > compression ourselves. This has a few advantages over the old > > > > implementation: > > > > > > > > 1) A single call to the underlying filesystem through nfsd_readv() means > > > > the file can't change from underneath us in the middle of encoding. > > > > Hi Anna, > > > > I'm assuming you mean the vfs_llseek(SEEK_HOLE) call at the start > > of nfsd4_encode_read_plus_data() that is used to trim the data that > > has already been read out of the file? > > There is also the vfs_llseek(SEEK_DATA) call at the start of > nfsd4_encode_read_plus_hole(). They are used to determine the length > of the current hole or data segment. > > > > > What's the problem with racing with a hole punch here? All it does > > is shorten the read data returned to match the new hole, so all it's > > doing is making the returned data "more correct". > > The problem is we call vfs_llseek() potentially many times when > encoding a single reply to READ_PLUS. nfsd4_encode_read_plus() has a > loop where we alternate between hole and data segments until we've > encoded the requested number of bytes. My attempts at locking the file > have resulted in a deadlock since vfs_llseek() also locks the file, so > the file could change from underneath us during each iteration of the > loop. > > > > > OTOH, if something allocates over a hole that the read filled with > > zeros, what's the problem with occasionally returning zeros as data? > > Regardless, if this has raced with a write to the file that filled > > that hole, we're already returning stale data/hole information to > > the client regardless of whether we trim it or not.... > > > > i.e. I can't see a correctness or data integrity problem here that > > doesn't already exist, and I have my doubts that hole > > punching/filling racing with reads happens often enough to create a > > performance or bandwidth problem OTW. Hence I've really got no idea > > what the problem that needs to be solved here is. > > > > Can you explain what the symptoms of the problem a user would see > > that this change solves? > > This fixes xfstests generic/091 and generic/263, along with this > reported bug: https://bugzilla.kernel.org/show_bug.cgi?id=215673 > > > > > > 2) A single call to the underlying filestem also means that the > > > > underlying filesystem only needs to synchronize cached and on-disk > > > > data one time instead of potentially many speeding up the reply. > > > > SEEK_HOLE/DATA doesn't require cached data to be sync'd to disk to > > be coherent - that's only a problem FIEMAP has (and syncing cached > > data doesn't fix the TOCTOU coherency issue!). i.e. SEEK_HOLE/DATA > > will check the page cache for data if appropriate (e.g. unwritten > > disk extents may have data in memory over the top of them) instead > > of syncing data to disk. > > For some reason, btrfs on virtual hardware has terrible performance > numbers when using vfs_llseek() with files that are already in the > server's cache. I think it had something to do with how they lock > extents, and some extra work that needs to be done if the file already > exists in the server's memory but it's been a few years since I've > gone into their code to figure out where the slowdown is coming from. > See this section of my performance results wiki page: > https://wiki.linux-nfs.org/wiki/index.php/Read_Plus_May_2022#BTRFS_3 > I just did this locally, once in my test vm's and once on my continuous performance testing hardware and I'm not able to reproduce the numbers, so I think I'm doing something wrong. My test is stupid, I just dd a 5gib file, come behind it and punch holes every other 4k. Then I umount and remount, SEEK_DATA+SEEK_HOLE through the whole file, and then do it again so I have uncached and cached. The numbers I'm seeing are equivalent to ext4/xfs. Granted on my VM I had to redo the test because I had lockdep and other debugging on which makes us look stupid because of the extent locking stuff, but with it off everything appears normal. Does this more or less mirror your testing? Looking at our code we're not doing anything inherently stupid, so I'm not entirely sure what could be the problem. Thanks, Josef