Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mail-ee0-f46.google.com ([74.125.83.46]:50994 "EHLO
	mail-ee0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751735Ab1LYMDy (ORCPT );
	Sun, 25 Dec 2011 07:03:54 -0500
Received: by eekc4 with SMTP id c4so10531106eek.19
	for ; Sun, 25 Dec 2011 04:03:53 -0800 (PST)
Message-ID: <4EF71125.1060901@tonian.com>
Date: Sun, 25 Dec 2011 14:03:49 +0200
From: Benny Halevy
MIME-Version: 1.0
To: Trond Myklebust
CC: tigran.mkrtchyan@desy.de, linux-nfs
Subject: Re: Session timeout on RHEL6.2
References: <1324475851.7709.12.camel@lade.trondhjem.org> <4EF6A898.2010207@tonian.com> <1324806463.2740.6.camel@lade.trondhjem.org>
In-Reply-To: <1324806463.2740.6.camel@lade.trondhjem.org>
Content-Type: text/plain; charset=ISO-8859-1
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On 2011-12-25 11:47, Trond Myklebust wrote:
> On Sun, 2011-12-25 at 06:37 +0200, Benny Halevy wrote:
>> On 2011-12-21 22:11, Tigran Mkrtchyan wrote:
>>> On Wed, Dec 21, 2011 at 2:57 PM, Trond Myklebust
>>> wrote:
>>>> On Wed, 2011-12-21 at 10:24 +0100, Tigran Mkrtchyan wrote:
>>>>> Dear friends,
>>>>>
>>>>> We are observing strange behavior with RHEL 6.2:
>>>>>
>>>>> Our server lease time is 90 seconds. I can see that the client
>>>>> sends SEQUENCE every 60 sec. And this is for some hours ( ~8 ).
>>>>> At some point the client sends SEQUENCE after 127 seconds and
>>>>> gets, as expected, EXPIRED.
>>>>
>>>> Why shouldn't the client be allowed to let the lease expire if nothing
>>>> is using that filesystem?
>>>>
>>>>> At this point I have to blame myself.
>>>>> Client comes with EXCHANGE_ID using the same clientid.
>>>>> We did not garbage collect the clientid internally as this happens after
>>>>> 2*LEASE_TIME
>>>>> and return EXPIRE. This ping-pong never ends.
>>>>>
>>>>> This is probably mostly a bug on my side. Nevertheless we never observed late
>>>>> SEQUENCE with kernel > 2.6.39. A short packet dump attached.
>>>>>
>>>>> I can open a bug at RHEL if required.
>>>>
>>>> I wouldn't consider that a bug.
>>>
>>> As I said, there is a bug in exchange_id processing ( case 3 ) on my
>>> side. But to me it sounds strange that client after more than 8
>>> hours of sending only sequence decided to send one of them later than
>>> lease time. Especially, that we did not have it with other kernels.
>>
>> I'm inclined to agree. The client can let the lease expire for sure
>> and that's not a bug but the fact that the client sent the SEQUENCE operation
>> after the lease had expired indicates it might not be aware of that fact
>> and that seems to be a client bug.
>>
>> That said, I don't think that letting the lease expire when the client is idle
>> is the most polite thing to do. Why let the server clean up after the client
>> and revert to possibly un-optimized recovery paths rather than orderly
>> destruction of the state by the client?
>
> There are plenty of cases where the client can be idle for hours or even
> _days_. What's the point of pinging the server all the time after
> working hours?
>
> If someone wants to code up a DESTROY_SESSION and DESTROY_CLIENTID in
> order to make it formal, then fine, however note that we don't even do
> that on a full unmount today.
>

The heavy lifting is releasing locks and returning layouts and delegations.
Sending DESTROY_{SESSION,CLIENTID} would be nice to have, but I don't think
it's the most important issue.

Benny
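
The timing discussed in this thread can be sketched as a small classifier.
This is only an illustration under stated assumptions: the function name
classify_gap and the returned labels are hypothetical, the boundaries are
treated as inclusive for simplicity, and it is not the logic of any actual
NFS server or client. It uses the 90-second lease and the 2*LEASE_TIME
garbage-collection interval mentioned in the quoted report.

```python
# Values taken from the report in this thread.
LEASE_TIME = 90              # server lease time, in seconds
GC_AFTER = 2 * LEASE_TIME    # server drops the clientid after 2*LEASE_TIME


def classify_gap(gap_seconds, lease=LEASE_TIME):
    """Classify the gap since the last lease-renewing SEQUENCE.

    Hypothetical helper: the labels and the inclusive boundaries are
    assumptions made for illustration only.
    """
    if gap_seconds <= lease:
        # SEQUENCE arrived in time, so the lease is renewed.
        return "renewed"
    if gap_seconds <= 2 * lease:
        # Lease has expired, but the server still remembers the clientid.
        return "expired, clientid still cached"
    # Lease has expired and the server has garbage-collected the clientid.
    return "expired, clientid collected"


if __name__ == "__main__":
    for gap in (60, 127, 200):
        print(gap, classify_gap(gap))
```

With these numbers, the 127-second gap falls between lease expiry (90 s) and
clientid collection (180 s), which appears to be the window in which the
reported EXPIRED ping-pong occurs.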