From: Yihao Wu <wuyihao@linux.alibaba.com>
Subject: Re: [PATCH] nfsd: fix race between cache_clean and cache_purge
To: NeilBrown, "J. Bruce Fields", Chuck Lever
Bruce Fields" , Chuck Lever Cc: linux-nfs@vger.kernel.org References: <5eed50660eb13326b0fbf537fb58481ea53c1acb.1585043174.git.wuyihao@linux.alibaba.com> <87369x8i8t.fsf@notabene.neil.brown.name> Message-ID: <9c023160-8226-ac5d-9e7a-0e26ebf703f0@linux.alibaba.com> Date: Sat, 4 Apr 2020 21:53:20 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0) Gecko/20100101 Thunderbird/68.6.0 MIME-Version: 1.0 In-Reply-To: <87369x8i8t.fsf@notabene.neil.brown.name> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On 2020/3/25 7:07 AM, NeilBrown wrote: >> >> @@ -541,6 +543,7 @@ void cache_purge(struct cache_detail *detail) >> } >> } >> spin_unlock(&detail->hash_lock); >> + spin_unlock(&cache_list_lock); >> } >> EXPORT_SYMBOL_GPL(cache_purge); >> >> -- >> 2.20.1.2432.ga663e714 > I wonder if this is the best solution. > This code: > > hlist_for_each_entry_safe(ch, tmp, head, cache_list) { > sunrpc_begin_cache_remove_entry(ch, detail); > spin_unlock(&detail->hash_lock); > sunrpc_end_cache_remove_entry(ch, detail); > spin_lock(&detail->hash_lock); > } > > Looks wrong. > Dropping a lock while walking a list if only safe if you hold a > reference to the place-holder - 'tmp' in this case. but we don't. > As this is trying to remove everything in the list it would be safer to > do something like > > while (!hlist_empty(head)) { > ch = hlist_entry(head->first, struct cache_head, h); > sunrpc_begin_cache_remove_entry(ch, detail); > spin_unlock(&detail->hash_lock); > sunrpc_end_cache_remove_entry(ch, detail); > spin_lock(&detail->hash_lock); > } > > I'm guessing that would fix the problem in a more focused. > But I'm not 100% sure because there is no analysis given of the cause. > What line is > cache_purge+0xce/0x160 > ./scripts/faddr2line can tell you. > I suspect it is the hlist_for_each_entry_safe() line. > > Can you test the change I suggest and see if it helps? > > Thanks, > NeilBrown Sorry for the late. It took me some time to reproduce the bug stably so I can verify the correctness of the fix. You definitely pointed out the root cause. And the solution is more elegant. After applying your solution. The bug doesn't reproduce now. There's no race condition. hash_lock is designed to protect cache_detail in fine grain. And it already did its job. And yes, hlist_for_each_entry_safe is where the bug at. It may walk to a deleted entry(tmp). My v1 solution is a regression considering this. So I will modify the patch title in v2 too. BTW, I checked faddr2line output, it says cache_purge+0xce/0x160 is cache_put. It make sense too, and doesn't go against your theory. Here's my reproduce script: systemctl enable rpcbind systemctl enable nfs systemctl start rpcbind systemctl start nfs mkdir /tmp/x /tmp/y # Create some collision in hash table for i in `seq 256`; do mkdir /tmp/x/$i mkdir /tmp/y/$i exportfs localhost:/tmp/x/$i done for i in `seq 256`; do mount localhost:/tmp/x/$i /tmp/y/$i done END=$(cat /proc/self/net/rpc/nfsd.export/flush) NOW=$(date +%s) sleep $((END - NOW)) # Trigger cache_purge systemctl stop nfs & usleep 20000 # Trigger cache_clean echo > /proc/self/net/rpc/nfsd.export/flush To speedup the reproducing process, I also added mdelay(500) between acquiring and releasing hash_lock in cache_purge. Thank you so much! Yihao Wu