Return-Path: linux-nfs-owner@vger.kernel.org
Received: from fieldses.org ([174.143.236.118]:35134 "EHLO fieldses.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754908Ab3BDP4L (ORCPT ); Mon, 4 Feb 2013 10:56:11 -0500
Date: Mon, 4 Feb 2013 10:56:04 -0500
From: "J. Bruce Fields"
To: Jeff Layton
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 0/8] nfsd: duplicate reply cache overhaul
Message-ID: <20130204155604.GC815@fieldses.org>
References: <1359983887-28535-1-git-send-email-jlayton@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <1359983887-28535-1-git-send-email-jlayton@redhat.com>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Mon, Feb 04, 2013 at 08:17:59AM -0500, Jeff Layton wrote:
> This is the second posting of the remaining unmerged patches in this
> set. There are a number of differences from the first set:
>
> - The bug in the checksum patch has been fixed.
>
> - A hard cap on the number of DRC entries is retained, but it's larger
>   than the original cap, and scales with the amount of low memory in
>   the machine.
>
> - A shrinker is still registered, but it will now only free entries
>   that are expired or are over the max number of entries.
>
> Our QA group has been reporting on and off for the last several years
> about occasional failures in testing, especially on UDP. When we go to
> look at traces, we see a missing reply from a server on a non-idempotent
> request. The client then retransmits the request and the server tries to
> redo it instead of just sending the DRC entry.
>
> With an instrumented kernel on the server and a synthetic reproducer, we
> found that it's quite easy to hammer the server so fast that DRC entries
> get flushed out long before a retransmit can come in.
>
> This patchset is a first pass at fixing this. Instead of simply keeping
> a cache of the last 1024 entries, it allows nfsd to grow and shrink the
> DRC dynamically.
>
> While most of us will probably say "so what" when it comes to UDP
> failures, it's a potential problem on connected transports as well. I'm
> also inclined to try and fix things that screw up the people that are
> helping us test our code.
>
> I'd like to see this merged for 3.9 if possible...

These look fine, thanks. Applying pending some testing.

--b.

>
> Jeff Layton (8):
>   nfsd: always move DRC entries to the end of LRU list when updating
>     timestamp
>   nfsd: track the number of DRC entries in the cache
>   nfsd: dynamically allocate DRC entries
>   nfsd: remove the cache_disabled flag
>   nfsd: when updating an entry with RC_NOCACHE, just free it
>   nfsd: add recurring workqueue job to clean the cache
>   nfsd: register a shrinker for DRC cache entries
>   nfsd: keep a checksum of the first 256 bytes of request
>
>  fs/nfsd/cache.h    |   5 +
>  fs/nfsd/nfscache.c | 271 ++++++++++++++++++++++++++++++++++++++++-------------
>  2 files changed, 209 insertions(+), 67 deletions(-)
>
> --
> 1.7.11.7
>
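
For readers following along, the behaviour described in the cover letter
(dynamically allocated entries, an LRU list where an updated entry moves
to the tail, a hard cap, and pruning limited to entries that are expired
or over the cap) can be sketched in userspace C roughly as below. This
is only an illustration under stated assumptions, not the code in
fs/nfsd/nfscache.c: every name here (struct drc, drc_lookup, drc_prune
and so on) is hypothetical, the checksum is a plain byte sum standing in
for whatever the real patch uses, and locking, hashing and the
shrinker/workqueue hooks are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DRC_CSUM_LEN 256 /* only the first 256 bytes of a request are summed */

/* Hypothetical entry: one cached reply, keyed by XID plus checksum. */
struct drc_entry {
	uint32_t xid;
	uint32_t csum;
	long stamp;                    /* time of last update, in caller-supplied ticks */
	int reply;                     /* stand-in for the cached reply */
	struct drc_entry *prev, *next;
};

/* Hypothetical cache: LRU list with the oldest entry at the head. */
struct drc {
	struct drc_entry *head, *tail;
	unsigned int count;
	unsigned int max_entries;      /* hard cap on the number of entries */
	long expiry;                   /* entries older than this may be freed */
};

static void drc_init(struct drc *c, unsigned int max_entries, long expiry)
{
	assert(max_entries > 0 && expiry >= 0);
	c->head = c->tail = NULL;
	c->count = 0;
	c->max_entries = max_entries;
	c->expiry = expiry;
}

/* Toy checksum (assumption): byte sum over at most DRC_CSUM_LEN bytes. */
static uint32_t drc_csum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	size_t n = len < DRC_CSUM_LEN ? len : DRC_CSUM_LEN;

	for (size_t i = 0; i < n; i++)
		sum += buf[i];
	return sum;
}

static void drc_unlink(struct drc *c, struct drc_entry *e)
{
	if (e->prev) e->prev->next = e->next; else c->head = e->next;
	if (e->next) e->next->prev = e->prev; else c->tail = e->prev;
	e->prev = e->next = NULL;
}

static void drc_append(struct drc *c, struct drc_entry *e)
{
	e->prev = c->tail;
	e->next = NULL;
	if (c->tail) c->tail->next = e; else c->head = e;
	c->tail = e;
}

/* Free from the head only while over the cap or while the head has expired. */
static void drc_prune(struct drc *c, long now)
{
	while (c->head && (c->count > c->max_entries ||
			   now - c->head->stamp > c->expiry)) {
		struct drc_entry *e = c->head;

		drc_unlink(c, e);
		free(e);
		c->count--;
	}
}

/* A hit refreshes the timestamp and moves the entry to the LRU tail. */
static struct drc_entry *drc_lookup(struct drc *c, uint32_t xid,
				    uint32_t csum, long now)
{
	for (struct drc_entry *e = c->head; e; e = e->next) {
		if (e->xid == xid && e->csum == csum) {
			e->stamp = now;
			drc_unlink(c, e);
			drc_append(c, e);
			return e;
		}
	}
	return NULL;
}

/* Allocate a new entry at the tail, then prune back under the cap. */
static struct drc_entry *drc_insert(struct drc *c, uint32_t xid,
				    uint32_t csum, int reply, long now)
{
	struct drc_entry *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->xid = xid;
	e->csum = csum;
	e->reply = reply;
	e->stamp = now;
	drc_append(c, e);
	c->count++;
	drc_prune(c, now);
	return e;
}
```

In this sketch a retransmit is recognised only when both the XID and
the checksum match, so a reused XID carrying a different request misses
the cache. Because every timestamp update also moves the entry to the
tail, the head is always the oldest entry and pruning can stop at the
first one that is neither expired nor over the cap.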