Message-ID: <4d2bc7f1-b5c2-469c-9351-772626c707d7@grimberg.me>
Date: Tue, 21 May 2024 19:09:01 +0300
From: Sagi Grimberg
To: Trond Myklebust, linux-nfs@vger.kernel.org, jlayton@kernel.org
Cc: hch@lst.de, dan.aloni@vastdata.com, chuck.lever@oracle.com
Subject: Re: [PATCH rfc] nfs: propagate readlink errors in nfs_symlink_filler
References: <20240521125840.186618-1-sagi@grimberg.me>

On 21/05/2024 18:13, Trond Myklebust wrote:
> On Tue, 2024-05-21 at 18:05 +0300, Sagi Grimberg wrote:
>>
>> On 21/05/2024 16:22, Jeff Layton wrote:
>>> On Tue, 2024-05-21 at 15:58 +0300, Sagi Grimberg wrote:
>>>> There is an inherent race where a symlink file may have been
>>>> overridden (by a different client) between lookup and readlink,
>>>> resulting in a spurious EIO error returned to userspace. Fix this
>>>> by propagating back ESTALE errors such that the vfs will retry the
>>>> lookup/get_link (similar to nfs4_file_open) at least once.
>>>>
>>>> Cc: Dan Aloni
>>>> Signed-off-by: Sagi Grimberg
>>>> ---
>>>> Note that with this change the vfs should retry once for
>>>> ESTALE errors.
>>>> However with an artificial reproducer of high
>>>> frequency symlink overrides, nothing prevents the retry from
>>>> also encountering ESTALE, propagating the error back to userspace.
>>>> The man pages for openat/readlinkat do not list an ESTALE errno.
>>>>
>>>> An alternative attempt (implemented by Dan) was a local retry loop
>>>> in nfs_get_link(). If this is an applicable approach, Dan can
>>>> share his patch instead.
>>>>
>>>>   fs/nfs/symlink.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
>>>> index 0e27a2e4e68b..13818129d268 100644
>>>> --- a/fs/nfs/symlink.c
>>>> +++ b/fs/nfs/symlink.c
>>>> @@ -41,7 +41,7 @@ static int nfs_symlink_filler(struct file *file, struct folio *folio)
>>>>   error:
>>>>    folio_set_error(folio);
>>>>    folio_unlock(folio);
>>>> - return -EIO;
>>>> + return error;
>>>>   }
>>>>
>>>>   static const char *nfs_get_link(struct dentry *dentry,
>>> git blame seems to indicate that we've returned -EIO here since the
>>> beginning of the git era (and likely long before that). I see no
>>> reason for us to cloak the real error there though, especially with
>>> something like an ESTALE error.
>>>
>>>      Reviewed-by: Jeff Layton
>>>
>>> FWIW, I think we shouldn't try to do any retry looping on ESTALE
>>> beyond what we already do.
>>>
>>> Yes, we can sometimes trigger ESTALE errors to bubble up to
>>> userland if we really thrash the underlying filesystem when
>>> testing, but I think that's actually desirable:
>> Returning ESTALE would be an improvement over returning EIO IMO,
>> but it may be surprising for userspace to see an undocumented errno.
>> Maybe the man pages can be amended?
>>
>>> If you have real workloads across multiple machines that are racing
>>> with each other that tightly, then you should probably be using some
>>> sort of locking or other synchronization.
>>> If it's clever enough that it doesn't need that, then it should be
>>> able to deal with the occasional ESTALE error by retrying on its own.
>> I tend to agree. FWIW Solaris has a config knob for the number of
>> stale retries it does, maybe there is an appetite to have something
>> like that as well?
>>
> Any reason why we couldn't just return ENOENT in the case where the
> filehandle is stale? There will have been an unlink() on the symlink at
> some point in the recent past.
>

No reason that I can see. However, given that this was observed in the
wild, and is essentially a common pattern with symlinks (overwriting a
config file, for example), I think it's reasonable to have the vfs at
least do a single retry, by simply returning ESTALE. However, NFS cannot
distinguish between first and second retries afaict... Perhaps the vfs
can help with an ESTALE->ENOENT conversion?
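
For illustration only (this is not part of the patch, and not Dan's
nfs_get_link() retry loop): a minimal userspace sketch of what "retrying
on its own" could look like if ESTALE does reach the application. The
helper name readlink_retry_estale() and its max_retries parameter are
hypothetical; the only real call used is readlink(2).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read a symlink target, retrying a bounded number
 * of times when readlink(2) fails with ESTALE.
 *
 * Returns the target length on success (buf is NUL-terminated), or -1
 * with errno set by the last failing call. Like readlink(2) itself, a
 * target longer than bufsz - 1 bytes is silently truncated.
 */
static ssize_t readlink_retry_estale(const char *path, char *buf,
                                     size_t bufsz, int max_retries)
{
	if (bufsz == 0) {
		errno = EINVAL;
		return -1;
	}

	for (int attempt = 0; ; attempt++) {
		/* Leave one byte for the terminating NUL. */
		ssize_t len = readlink(path, buf, bufsz - 1);

		if (len >= 0) {
			buf[len] = '\0';
			return len;
		}

		/* Only ESTALE is retried; any other error is final. */
		if (errno != ESTALE || attempt >= max_retries)
			return -1;
	}
}
```

A caller would pass a small bound (for example max_retries = 1) and still
has to handle -1 with errno == ESTALE or ENOENT, since under a tight
enough race every retry can hit a freshly replaced symlink.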