From: Chuck Lever Subject: Re: Make sm-notify faster if there are no servers to notify Date: Fri, 5 Dec 2008 13:41:32 -0500 Message-ID: <6763B58B-6C3B-40FA-84F4-737E93D429A6@oracle.com> References: <1225302305994@dmwebmail.dmwebmail.chezphil.org> <20081029184153.GE31936@fieldses.org> <5AB39614-D03F-43DF-BCD2-2B2501A79D65@oracle.com> <20081029211145.GE1406@fieldses.org> <49183A12.7010707@RedHat.com> <20081204211057.GC9593@fieldses.org> <18744.41310.635618.148281@notabene.brown> <20081205035803.GC15115@fieldses.org> <49392C14.7000709@RedHat.com> <20081205163824.GA29227@fieldses.org> <20081205172913.GB29227@fieldses.org> Mime-Version: 1.0 (Apple Message framework v929.2) Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes Cc: Steve Dickson , Neil Brown , Phil Endecott , linux-nfs@vger.kernel.org To: "J. Bruce Fields" Return-path: Received: from acsinet11.oracle.com ([141.146.126.233]:53707 "EHLO acsinet11.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750991AbYLESl6 (ORCPT ); Fri, 5 Dec 2008 13:41:58 -0500 In-Reply-To: <20081205172913.GB29227@fieldses.org> Sender: linux-nfs-owner@vger.kernel.org List-ID: On Dec 5, 2008, at 12:29 PM, J. Bruce Fields wrote: > On Fri, Dec 05, 2008 at 11:38:24AM -0500, bfields wrote: >> On Fri, Dec 05, 2008 at 08:26:44AM -0500, Steve Dickson wrote: >>> J. Bruce Fields wrote: >>>> >>>>> I think it would still be valuable to replace the 'sync' with two >>>>> 'fsync's, one of the file, one on the directory. >>>> >>>> Sure, may as well.--b. >>>> >>> Something similar to this: >>> >>> diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/ >>> statd/sm-notify.c >>> --- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17 >>> 15:06:13.000000000 -0500 >>> +++ nfs-utils/utils/statd/sm-notify.c 2008-12-05 >>> 08:21:52.000000000 -0500 >>> @@ -211,12 +211,6 @@ usage: fprintf(stderr, >>> backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH); >>> get_hosts(_SM_BAK_PATH); >>> >>> - /* If there are not hosts to notify, just exit */ >>> - if (!hosts) { >>> - nsm_log(LOG_DEBUG, "No hosts to notify; exiting"); >>> - return 0; >>> - } >> >> This was still a huge boot-time win in the common case, so now that >> we've committed to it I'd rather not regress. Let's just skip the >> sync()s/fsncy()s in the !hosts case--that looks to me like the >> simplest >> correct solution for now. > > My argument for correctness: if we don't sync in that case, then on > reboot the rename that updates the state will either have happened or > (if a crash comes too soon) not. > > It is OK for that update to not happen as long as we're assured it > happens before the first lock request is made or replied to, or the > first monitor request completes, as, in the absence of any notifies, > those are the only points at which the new state will be exposed to > the > outside world. > > The first lock request will also require an upcall to statd. So we're > OK as long as any monitor requests (from either the local kernel or > remote peers) do a sync. > > And statd should be doing a sync before responding to any monitor > request. As long as the SM_DIR is on the same filesystem as the state > file, that would do the job.... But now that I look, I see statd is > using an open with O_SYNC to ensure the new statd record hits stable > storage. Which we can't count on being enough. > > How about adding an explicit fsync() of the state file (and parent > directory) to statd's first succesful creation of a statd record, > together with a comment explaining this? So around about line 194 in > utils/statd/monitor.c:sm_mon_1_svc()? > > In fact, we could delete the sync entirely and do the same before the > first notification, and then we wouldn't have to wait for the sync in > the case host records are present either.... (statd would, but > perhaps > we could still get other work done in the mean time). > > (Am I missing something?) This all might work, but I think we're adding a lot of complexity as a workaround. Someone should fix the real problem, which is the implementation of sync(). That would probably have other benefits too. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com