2008-10-29 00:13:23

by Phil Endecott

[permalink] [raw]
Subject: Make sm-notify faster if there are no servers to notify

Dear Experts,

sm-notify was taking a long time while my laptop booted. This was odd because I
use NFS only rarely - via autofs - on that machine, and sm-notify actually has
no-one to notify most of the time. So I have patched it as follows. Is this a
legitimate thing to do?


diff -ur nfs-utils-1.1.3.orig/utils/statd/sm-notify.c nfs-utils-1.1.3/utils/statd/sm-notify.c
--- nfs-utils-1.1.3.orig/utils/statd/sm-notify.c 2008-07-27 22:01:45.000000000 +0100
+++ nfs-utils-1.1.3/utils/statd/sm-notify.c 2008-10-13 19:02:54.000000000 +0100
@@ -169,6 +169,10 @@
backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
get_hosts(_SM_BAK_PATH);

+ if (!hosts) {
+ return 0;
+ }
+
/* Get and update the NSM state. This will call sync() */
nsm_state = nsm_get_state(opt_update_state);
set_kernel_nsm_state(nsm_state);



Regards, Phil.
(Please Cc: me in any replies.)



2008-10-29 17:18:43

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Wed, Oct 29, 2008 at 12:13:20AM +0000, Phil Endecott wrote:
> Dear Experts,
>
> sm-notify was taking a long time while my laptop booted. This was odd because I
> use NFS only rarely - via autofs - on that machine, and sm-notify actually has
> no-one to notify most of the time. So I have patched it as follows. Is this a
> legitimate thing to do?

It looks like your patch was committed to nfs-utils a couple weeks ago:
see c8d18e26d2a53d9036a32c2dafebccaf4ce1634d from

git://linux-nfs.org/nfs-utils

--b.

>
>
> diff -ur nfs-utils-1.1.3.orig/utils/statd/sm-notify.c nfs-utils-1.1.3/utils/statd/sm-notify.c
> --- nfs-utils-1.1.3.orig/utils/statd/sm-notify.c 2008-07-27 22:01:45.000000000 +0100
> +++ nfs-utils-1.1.3/utils/statd/sm-notify.c 2008-10-13 19:02:54.000000000 +0100
> @@ -169,6 +169,10 @@
> backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
> get_hosts(_SM_BAK_PATH);
>
> + if (!hosts) {
> + return 0;
> + }
> +
> /* Get and update the NSM state. This will call sync() */
> nsm_state = nsm_get_state(opt_update_state);
> set_kernel_nsm_state(nsm_state);
>
>
>
> Regards, Phil.
> (Please Cc: me in any replies.)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-10-29 17:30:07

by Phil Endecott

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

J. Bruce Fields wrote:
> On Wed, Oct 29, 2008 at 12:13:20AM +0000, Phil Endecott wrote:
>> Dear Experts,
>>
>> sm-notify was taking a long time while my laptop booted. This was odd because I
>> use NFS only rarely - via autofs - on that machine, and sm-notify actually has
>> no-one to notify most of the time. So I have patched it as follows. Is this a
>> legitimate thing to do?
>
> It looks like your patch was committed to nfs-utils a couple weeks ago:
> see c8d18e26d2a53d9036a32c2dafebccaf4ce1634d from
>
> git://linux-nfs.org/nfs-utils
>
> --b.

How curious. I guess someone saw my Debian bug report. No mention of
it on this list as far as I can see though.

I presume from this that it is considered a safe thing to do.


Phil.




2008-10-29 17:37:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Wed, Oct 29, 2008 at 05:30:03PM +0000, Phil Endecott wrote:
> J. Bruce Fields wrote:
>> On Wed, Oct 29, 2008 at 12:13:20AM +0000, Phil Endecott wrote:
>>> Dear Experts,
>>>
>>> sm-notify was taking a long time while my laptop booted. This was
>>> odd because I use NFS only rarely - via autofs - on that machine, and
>>> sm-notify actually has no-one to notify most of the time. So I have
>>> patched it as follows. Is this a legitimate thing to do?
>>
>> It looks like your patch was committed to nfs-utils a couple weeks ago:
>> see c8d18e26d2a53d9036a32c2dafebccaf4ce1634d from
>>
>> git://linux-nfs.org/nfs-utils
>>
>> --b.
>
> How curious. I guess someone saw my Debian bug report. No mention of
> it on this list as far as I can see though.
>
> I presume from this that it is considered a safe thing to do.

It looks right to me. Hopefully somebody actually has tested this on a
client that holds locks when it reboots?

I remember this was one of the things Arjan mentioned having to disable
in his "5-second boot" talk at the Linux Plumbers Conference, so you're
not the only one to have noticed the problem....

--b.

2008-10-29 17:45:09

by Phil Endecott

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

J. Bruce Fields wrote:
> On Wed, Oct 29, 2008 at 05:30:03PM +0000, Phil Endecott wrote:
>> J. Bruce Fields wrote:
>>> On Wed, Oct 29, 2008 at 12:13:20AM +0000, Phil Endecott wrote:
>>>> Dear Experts,
>>>>
>>>> sm-notify was taking a long time while my laptop booted. This was
>>>> odd because I use NFS only rarely - via autofs - on that machine, and
>>>> sm-notify actually has no-one to notify most of the time. So I have
>>>> patched it as follows. Is this a legitimate thing to do?
>>>
>>> It looks like your patch was committed to nfs-utils a couple weeks ago:
>>> see c8d18e26d2a53d9036a32c2dafebccaf4ce1634d from
>>>
>>> git://linux-nfs.org/nfs-utils
>>>
>>> --b.
>>
>> How curious. I guess someone saw my Debian bug report. No mention of
>> it on this list as far as I can see though.
>>
>> I presume from this that it is considered a safe thing to do.
>
> It looks right to me. Hopefully somebody actually has tested this on a
> client that holds locks when it reboots?

Not me.

> I remember this was one of the things Arjan mentioned having to disable
> in his "5-second boot" talk at the Linux Plumbers Conference, so you're
> not the only one to have noticed the problem....

Yes; I noticed the huge pause due to the sync() and applied this fix.
It was only later that I looked at Arjan's slides (I had to wait until
someone had converted then from PowerPoint to PDF...) and saw that he
had the same thing in his bootchart.


Phil.




2008-10-29 18:41:54

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Wed, Oct 29, 2008 at 05:45:05PM +0000, Phil Endecott wrote:
> J. Bruce Fields wrote:
>> On Wed, Oct 29, 2008 at 05:30:03PM +0000, Phil Endecott wrote:
>>> J. Bruce Fields wrote:
>>>> On Wed, Oct 29, 2008 at 12:13:20AM +0000, Phil Endecott wrote:
>>>>> Dear Experts,
>>>>>
>>>>> sm-notify was taking a long time while my laptop booted. This
>>>>> was odd because I use NFS only rarely - via autofs - on that
>>>>> machine, and sm-notify actually has no-one to notify most of the
>>>>> time. So I have patched it as follows. Is this a legitimate
>>>>> thing to do?
>>>>
>>>> It looks like your patch was committed to nfs-utils a couple weeks ago:
>>>> see c8d18e26d2a53d9036a32c2dafebccaf4ce1634d from
>>>>
>>>> git://linux-nfs.org/nfs-utils
>>>>
>>>> --b.
>>>
>>> How curious. I guess someone saw my Debian bug report. No mention
>>> of it on this list as far as I can see though.
>>>
>>> I presume from this that it is considered a safe thing to do.
>>
>> It looks right to me. Hopefully somebody actually has tested this on a
>> client that holds locks when it reboots?
>
> Not me.

Ugh.

We really need someone doing regular lock recovery tests with both the
latest kernel and latest nfs-utils.

Ideally we'd also have tests that could be easily run by anyone. Though
there may be too many site-specific details involved in writing scripts
that interact with (and reboot) multiple machines.

>> I remember this was one of the things Arjan mentioned having to disable
>> in his "5-second boot" talk at the Linux Plumbers Conference, so you're
>> not the only one to have noticed the problem....
>
> Yes; I noticed the huge pause due to the sync() and applied this fix.
> It was only later that I looked at Arjan's slides (I had to wait until
> someone had converted then from PowerPoint to PDF...) and saw that he
> had the same thing in his bootchart.

Oh, so the time was all spent in the sync() in nsm_get_state()?

Anyway, I think the nsm state updating shouldn't matter if you don't
even have any peers to notify.

--b.

2008-10-29 20:30:49

by Chuck Lever

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Oct 29, 2008, at 2:41 PM, J. Bruce Fields wrote:
> On Wed, Oct 29, 2008 at 05:45:05PM +0000, Phil Endecott wrote:
>> J. Bruce Fields wrote:
>>> On Wed, Oct 29, 2008 at 05:30:03PM +0000, Phil Endecott wrote:
>>>> J. Bruce Fields wrote:
>>>>> On Wed, Oct 29, 2008 at 12:13:20AM +0000, Phil Endecott wrote:
>>>>>> Dear Experts,
>>>>>>
>>>>>> sm-notify was taking a long time while my laptop booted. This
>>>>>> was odd because I use NFS only rarely - via autofs - on that
>>>>>> machine, and sm-notify actually has no-one to notify most of the
>>>>>> time. So I have patched it as follows. Is this a legitimate
>>>>>> thing to do?
>>>>>
>>>>> It looks like your patch was committed to nfs-utils a couple
>>>>> weeks ago:
>>>>> see c8d18e26d2a53d9036a32c2dafebccaf4ce1634d from
>>>>>
>>>>> git://linux-nfs.org/nfs-utils
>>>>>
>>>>> --b.
>>>>
>>>> How curious. I guess someone saw my Debian bug report. No mention
>>>> of it on this list as far as I can see though.
>>>>
>>>> I presume from this that it is considered a safe thing to do.
>>>
>>> It looks right to me. Hopefully somebody actually has tested this
>>> on a
>>> client that holds locks when it reboots?
>>
>> Not me.
>
> Ugh.
>
> We really need someone doing regular lock recovery tests with both the
> latest kernel and latest nfs-utils.
>
> Ideally we'd also have tests that could be easily run by anyone.
> Though
> there may be too many site-specific details involved in writing
> scripts
> that interact with (and reboot) multiple machines.
>
>>> I remember this was one of the things Arjan mentioned having to
>>> disable
>>> in his "5-second boot" talk at the Linux Plumbers Conference, so
>>> you're
>>> not the only one to have noticed the problem....
>>
>> Yes; I noticed the huge pause due to the sync() and applied this fix.
>> It was only later that I looked at Arjan's slides (I had to wait
>> until
>> someone had converted then from PowerPoint to PDF...) and saw that he
>> had the same thing in his bootchart.
>
> Oh, so the time was all spent in the sync() in nsm_get_state()?

I assume sync() is required because this logic performs a rename as
well as a simple write?

> Anyway, I think the nsm state updating shouldn't matter if you don't
> even have any peers to notify.

It probably does matter.

When a system is initially installed, it likely does not have a state
file in /var/lib/nfs. This may be harmless if it's not present;
rpc.statd probably does the right thing in this case.

However, the rest of the logic in nsm_get_state() is needed to bump
the system's state value properly after every reboot. It may be
inconsequential if there were no mounts or no NFS clients during the
last reboot, but this is subtle. I wouldn't bet on it.

>
>
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2008-10-29 21:11:57

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Wed, Oct 29, 2008 at 04:30:32PM -0400, Chuck Lever wrote:
> I assume sync() is required because this logic performs a rename as well
> as a simple write?

I think an fsync() on the containing directory (together with an fsync()
of the file itself) would do the job if you wanted to avoid the globaly
sync(). I don't think ext3 is capable of doing anything finer-grained
than a whole-filesystem sync, though, so this doesn't help many people
in practice right now.

In any case, the rename adds an extra level of safety by ensuring the
nsm state is updated atomically, so we shouldn't get rid of it.

>> Anyway, I think the nsm state updating shouldn't matter if you don't
>> even have any peers to notify.
>
> It probably does matter.
>
> When a system is initially installed, it likely does not have a state
> file in /var/lib/nfs. This may be harmless if it's not present;
> rpc.statd probably does the right thing in this case.

The "right thing" in that case would be, I guess, to create a state file
with "0" in it. It doesn't do that. So this patch *does* break stuff.
Oops!

So should we revert it and do something else, or patch statd to create
a new state file if necessary?

> However, the rest of the logic in nsm_get_state() is needed to bump the
> system's state value properly after every reboot. It may be
> inconsequential if there were no mounts or no NFS clients during the
> last reboot, but this is subtle. I wouldn't bet on it.

If the state is only every communicated to hosts by notifications, then
if we're not notifying, the update of the state can't matter.

(So is that premise correct?)

--b.

2008-10-31 17:54:12

by Steve Dickson

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify



J. Bruce Fields wrote:
> On Wed, Oct 29, 2008 at 05:30:03PM +0000, Phil Endecott wrote:
>> J. Bruce Fields wrote:
>>> On Wed, Oct 29, 2008 at 12:13:20AM +0000, Phil Endecott wrote:
>>>> Dear Experts,
>>>>
>>>> sm-notify was taking a long time while my laptop booted. This was
>>>> odd because I use NFS only rarely - via autofs - on that machine, and
>>>> sm-notify actually has no-one to notify most of the time. So I have
>>>> patched it as follows. Is this a legitimate thing to do?
>>> It looks like your patch was committed to nfs-utils a couple weeks ago:
>>> see c8d18e26d2a53d9036a32c2dafebccaf4ce1634d from
>>>
>>> git://linux-nfs.org/nfs-utils
>>>
>>> --b.
>> How curious. I guess someone saw my Debian bug report. No mention of
>> it on this list as far as I can see though.
>>
>> I presume from this that it is considered a safe thing to do.
>
> It looks right to me. Hopefully somebody actually has tested this on a
> client that holds locks when it reboots?
I did... things worked as expected....

steved.

2008-11-09 19:25:33

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Wed, Oct 29, 2008 at 05:11:45PM -0400, bfields wrote:
> On Wed, Oct 29, 2008 at 04:30:32PM -0400, Chuck Lever wrote:
> > I assume sync() is required because this logic performs a rename as well
> > as a simple write?
>
> I think an fsync() on the containing directory (together with an fsync()
> of the file itself) would do the job if you wanted to avoid the globaly
> sync(). I don't think ext3 is capable of doing anything finer-grained
> than a whole-filesystem sync, though, so this doesn't help many people
> in practice right now.
>
> In any case, the rename adds an extra level of safety by ensuring the
> nsm state is updated atomically, so we shouldn't get rid of it.
>
> >> Anyway, I think the nsm state updating shouldn't matter if you don't
> >> even have any peers to notify.
> >
> > It probably does matter.
> >
> > When a system is initially installed, it likely does not have a state
> > file in /var/lib/nfs. This may be harmless if it's not present;
> > rpc.statd probably does the right thing in this case.
>
> The "right thing" in that case would be, I guess, to create a state file
> with "0" in it. It doesn't do that. So this patch *does* break stuff.
> Oops!
>
> So should we revert it and do something else, or patch statd to create
> a new state file if necessary?

It looks like this still needs to be fixed? I think it would be good
enough just to teach rpc.nfsd to create the file if it doesn't exist.

--b.

>
> > However, the rest of the logic in nsm_get_state() is needed to bump the
> > system's state value properly after every reboot. It may be
> > inconsequential if there were no mounts or no NFS clients during the
> > last reboot, but this is subtle. I wouldn't bet on it.
>
> If the state is only every communicated to hosts by notifications, then
> if we're not notifying, the update of the state can't matter.
>
> (So is that premise correct?)
>
> --b.

2008-11-10 00:53:35

by Chuck Lever

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Nov 9, 2008, at Nov 9, 2008, 2:25 PM, J. Bruce Fields wrote:
> On Wed, Oct 29, 2008 at 05:11:45PM -0400, bfields wrote:
>> On Wed, Oct 29, 2008 at 04:30:32PM -0400, Chuck Lever wrote:
>>> I assume sync() is required because this logic performs a rename
>>> as well
>>> as a simple write?
>>
>> I think an fsync() on the containing directory (together with an
>> fsync()
>> of the file itself) would do the job if you wanted to avoid the
>> globaly
>> sync(). I don't think ext3 is capable of doing anything finer-
>> grained
>> than a whole-filesystem sync, though, so this doesn't help many
>> people
>> in practice right now.
>>
>> In any case, the rename adds an extra level of safety by ensuring the
>> nsm state is updated atomically, so we shouldn't get rid of it.
>>
>>>> Anyway, I think the nsm state updating shouldn't matter if you
>>>> don't
>>>> even have any peers to notify.
>>>
>>> It probably does matter.
>>>
>>> When a system is initially installed, it likely does not have a
>>> state
>>> file in /var/lib/nfs. This may be harmless if it's not present;
>>> rpc.statd probably does the right thing in this case.
>>
>> The "right thing" in that case would be, I guess, to create a state
>> file
>> with "0" in it. It doesn't do that. So this patch *does* break
>> stuff.
>> Oops!
>>
>> So should we revert it and do something else, or patch statd to
>> create
>> a new state file if necessary?
>
> It looks like this still needs to be fixed? I think it would be good
> enough just to teach rpc.nfsd to create the file if it doesn't exist.

Meh. I'd rather manage the state file in one place, rather than have
multiple user space entities fiddle with it.

I think we should find out exactly what breaks when sm-notify quits
early. Steve hasn't found a problem with the patch already in nfs-
utils, but the corner cases here are really narrow.

Without a lot of testing (which we currently don't have the resources
for), I don't feel 100% positive about sm-notify quitting early. My
preferred solution would involve working around the sync(2) call
instead (ie fixing sm-notify so we don't need it, or somehow doing it
in the background so it doesn't hold up the boot-up process). I think
we will end up waiting until this actually bites someone, but chances
are it will be a long wait.

> --b.
>
>>
>>> However, the rest of the logic in nsm_get_state() is needed to
>>> bump the
>>> system's state value properly after every reboot. It may be
>>> inconsequential if there were no mounts or no NFS clients during the
>>> last reboot, but this is subtle. I wouldn't bet on it.
>>
>> If the state is only every communicated to hosts by notifications,
>> then
>> if we're not notifying, the update of the state can't matter.
>>
>> (So is that premise correct?)
>>
>> --b.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2008-11-10 00:55:24

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Sun, Nov 09, 2008 at 07:52:59PM -0500, Chuck Lever wrote:
> On Nov 9, 2008, at Nov 9, 2008, 2:25 PM, J. Bruce Fields wrote:
>> On Wed, Oct 29, 2008 at 05:11:45PM -0400, bfields wrote:
>>> On Wed, Oct 29, 2008 at 04:30:32PM -0400, Chuck Lever wrote:
>>>> I assume sync() is required because this logic performs a rename
>>>> as well
>>>> as a simple write?
>>>
>>> I think an fsync() on the containing directory (together with an
>>> fsync()
>>> of the file itself) would do the job if you wanted to avoid the
>>> globaly
>>> sync(). I don't think ext3 is capable of doing anything finer-
>>> grained
>>> than a whole-filesystem sync, though, so this doesn't help many
>>> people
>>> in practice right now.
>>>
>>> In any case, the rename adds an extra level of safety by ensuring the
>>> nsm state is updated atomically, so we shouldn't get rid of it.
>>>
>>>>> Anyway, I think the nsm state updating shouldn't matter if you
>>>>> don't
>>>>> even have any peers to notify.
>>>>
>>>> It probably does matter.
>>>>
>>>> When a system is initially installed, it likely does not have a
>>>> state
>>>> file in /var/lib/nfs. This may be harmless if it's not present;
>>>> rpc.statd probably does the right thing in this case.
>>>
>>> The "right thing" in that case would be, I guess, to create a state
>>> file
>>> with "0" in it. It doesn't do that. So this patch *does* break
>>> stuff.
>>> Oops!
>>>
>>> So should we revert it and do something else, or patch statd to
>>> create
>>> a new state file if necessary?
>>
>> It looks like this still needs to be fixed? I think it would be good
>> enough just to teach rpc.nfsd to create the file if it doesn't exist.
^^^^^^^^

(Err, sorry, note I meant rpc.statd, not rpc.nfsd.)

--b.

> Meh. I'd rather manage the state file in one place, rather than have
> multiple user space entities fiddle with it.
>
> I think we should find out exactly what breaks when sm-notify quits
> early. Steve hasn't found a problem with the patch already in nfs-
> utils, but the corner cases here are really narrow.
>
> Without a lot of testing (which we currently don't have the resources
> for), I don't feel 100% positive about sm-notify quitting early. My
> preferred solution would involve working around the sync(2) call instead
> (ie fixing sm-notify so we don't need it, or somehow doing it in the
> background so it doesn't hold up the boot-up process). I think we will
> end up waiting until this actually bites someone, but chances are it will
> be a long wait.

2008-11-10 01:02:13

by Chuck Lever

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Nov 9, 2008, at Nov 9, 2008, 7:55 PM, J. Bruce Fields wrote:
> On Sun, Nov 09, 2008 at 07:52:59PM -0500, Chuck Lever wrote:
>> On Nov 9, 2008, at Nov 9, 2008, 2:25 PM, J. Bruce Fields wrote:
>>> On Wed, Oct 29, 2008 at 05:11:45PM -0400, bfields wrote:
>>>> On Wed, Oct 29, 2008 at 04:30:32PM -0400, Chuck Lever wrote:
>>>>> I assume sync() is required because this logic performs a rename
>>>>> as well
>>>>> as a simple write?
>>>>
>>>> I think an fsync() on the containing directory (together with an
>>>> fsync()
>>>> of the file itself) would do the job if you wanted to avoid the
>>>> globaly
>>>> sync(). I don't think ext3 is capable of doing anything finer-
>>>> grained
>>>> than a whole-filesystem sync, though, so this doesn't help many
>>>> people
>>>> in practice right now.
>>>>
>>>> In any case, the rename adds an extra level of safety by ensuring
>>>> the
>>>> nsm state is updated atomically, so we shouldn't get rid of it.
>>>>
>>>>>> Anyway, I think the nsm state updating shouldn't matter if you
>>>>>> don't
>>>>>> even have any peers to notify.
>>>>>
>>>>> It probably does matter.
>>>>>
>>>>> When a system is initially installed, it likely does not have a
>>>>> state
>>>>> file in /var/lib/nfs. This may be harmless if it's not present;
>>>>> rpc.statd probably does the right thing in this case.
>>>>
>>>> The "right thing" in that case would be, I guess, to create a state
>>>> file
>>>> with "0" in it. It doesn't do that. So this patch *does* break
>>>> stuff.
>>>> Oops!
>>>>
>>>> So should we revert it and do something else, or patch statd to
>>>> create
>>>> a new state file if necessary?
>>>
>>> It looks like this still needs to be fixed? I think it would be
>>> good
>>> enough just to teach rpc.nfsd to create the file if it doesn't
>>> exist.
> ^^^^^^^^
>
> (Err, sorry, note I meant rpc.statd, not rpc.nfsd.)
>
> --b.
>
>> Meh. I'd rather manage the state file in one place, rather than have
>> multiple user space entities fiddle with it.

Acknowledged. My comments below still stand, I think. More testing!
Yay!

>> I think we should find out exactly what breaks when sm-notify quits
>> early. Steve hasn't found a problem with the patch already in nfs-
>> utils, but the corner cases here are really narrow.
>>
>> Without a lot of testing (which we currently don't have the resources
>> for), I don't feel 100% positive about sm-notify quitting early. My
>> preferred solution would involve working around the sync(2) call
>> instead
>> (ie fixing sm-notify so we don't need it, or somehow doing it in the
>> background so it doesn't hold up the boot-up process). I think we
>> will
>> end up waiting until this actually bites someone, but chances are
>> it will
>> be a long wait.


In any event, I will keep this in mind (along with the idea of
allowing only client or server peers to be notified of a reboot) as I
audit this code for IPv6 support.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-11-10 09:40:48

by Phil Endecott

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

Chuck Lever wrote:
> My preferred solution would involve working around the sync(2) call
> instead (ie fixing sm-notify so we don't need it, or somehow doing it
> in the background so it doesn't hold up the boot-up process).

Doing it in the background doesn't really help, as it's the huge blob
of disk writes that it generates that's the problem as much as the
elapsed time. Boot time is more likely to be I/O bound than CPU bound.

BTW, I have written an article about my efforts to speed up booting
which will should appear at debian-administration.org later today.


Cheers, Phil.




2008-11-10 13:43:38

by Steve Dickson

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

J. Bruce Fields wrote:
> On Wed, Oct 29, 2008 at 04:30:32PM -0400, Chuck Lever wrote:
>> I assume sync() is required because this logic performs a rename as well
>> as a simple write?
>
> I think an fsync() on the containing directory (together with an fsync()
> of the file itself) would do the job if you wanted to avoid the globaly
> sync(). I don't think ext3 is capable of doing anything finer-grained
> than a whole-filesystem sync, though, so this doesn't help many people
> in practice right now.
>
> In any case, the rename adds an extra level of safety by ensuring the
> nsm state is updated atomically, so we shouldn't get rid of it.
>
>>> Anyway, I think the nsm state updating shouldn't matter if you don't
>>> even have any peers to notify.
>> It probably does matter.
>>
>> When a system is initially installed, it likely does not have a state
>> file in /var/lib/nfs. This may be harmless if it's not present;
>> rpc.statd probably does the right thing in this case.
>
> The "right thing" in that case would be, I guess, to create a state file
> with "0" in it. It doesn't do that. So this patch *does* break stuff.
> Oops!
>
> So should we revert it and do something else, or patch statd to create
> a new state file if necessary?
I have to agree with Chuck, that managing the state file from one
place is desirable... Although does it make sense to move that management
into a init script? Maybe insuring the state is different before any
daemons are started? Would we still need the sync()?

>
>> However, the rest of the logic in nsm_get_state() is needed to bump the
>> system's state value properly after every reboot. It may be
>> inconsequential if there were no mounts or no NFS clients during the
>> last reboot, but this is subtle. I wouldn't bet on it.
>
> If the state is only every communicated to hosts by notifications, then
> if we're not notifying, the update of the state can't matter.
I had the same notion... If there are no clients to notify, why would
any clients care if our state changed?? With that said.... I do have a
sinking feeling that this patch will come back to bite us...
It was just too easy of a fix... :-)

steved.

2008-12-11 22:45:12

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Mon, Dec 08, 2008 at 02:50:39PM -0500, Chuck Lever wrote:
> Hi Bruce-
>
> On Dec 5, 2008, at Dec 5, 2008, 9:42 PM, J. Bruce Fields wrote:
>> On Fri, Dec 05, 2008 at 01:41:32PM -0500, Chuck Lever wrote:
>>> On Dec 5, 2008, at 12:29 PM, J. Bruce Fields wrote:
>>>> How about adding an explicit fsync() of the state file (and parent
>>>> directory) to statd's first succesful creation of a statd record,
>>>> together with a comment explaining this? So around about line 194
>>>> in
>>>> utils/statd/monitor.c:sm_mon_1_svc()?
>>>>
>>>> In fact, we could delete the sync entirely and do the same before
>>>> the
>>>> first notification, and then we wouldn't have to wait for the sync
>>>> in
>>>> the case host records are present either.... (statd would, but
>>>> perhaps we could still get other work done in the mean time).
>>>>
>>>> (Am I missing something?)
>>>
>>> This all might work, but I think we're adding a lot of complexity as
>>> a
>>> workaround.
>>
>> I think you mean the justification is too subtle--the code itself
>> (just
>> a couple syncs or fsyncs) is pretty simple.
>
> Let me state it another way. Your suggested redesign makes assumptions
> about the order in which these operations are performed. Can the code
> provide any guarantee that this ordering is always true? This may have
> been feasible when rpc.statd did notification too, but these are now two
> separate programs. Can you think of a way of implementing this where the
> ordering dependencies are coded instead of commented?

The existing code requires that we update the on-disk nsm state number
before doing notification or accepting statd calls. That's the only
ordering I'd require.

Maybe I should try to write a patch.

--b.

>
> It would be significantly less fragile (ie immune to ordering problems
> and long-term changes to system start-up and nfs-utils code) and simpler
> to understand if sm-notify does whatever syncing it needs, and rpc.statd
> does whatever syncing it needs.
>
> All I'm saying is this scheme may be easy to break by accident, and any
> future problem here is probably not going to show up until someone
> really needs this to work right.
>
>>> Someone should fix the real problem, which is the
>>> implementation of sync().
>>
>> I think you mean of fsync(). My understanding of past discussions on
>> the issue is that it's not really fixable on ext3, at least. So
>> default setups will have this problem for a while.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com

2008-12-04 21:11:21

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Mon, Nov 10, 2008 at 08:41:38AM -0500, Steve Dickson wrote:
> J. Bruce Fields wrote:
> > On Wed, Oct 29, 2008 at 04:30:32PM -0400, Chuck Lever wrote:
> >> I assume sync() is required because this logic performs a rename as well
> >> as a simple write?
> >
> > I think an fsync() on the containing directory (together with an fsync()
> > of the file itself) would do the job if you wanted to avoid the globaly
> > sync(). I don't think ext3 is capable of doing anything finer-grained
> > than a whole-filesystem sync, though, so this doesn't help many people
> > in practice right now.
> >
> > In any case, the rename adds an extra level of safety by ensuring the
> > nsm state is updated atomically, so we shouldn't get rid of it.
> >
> >>> Anyway, I think the nsm state updating shouldn't matter if you don't
> >>> even have any peers to notify.
> >> It probably does matter.
> >>
> >> When a system is initially installed, it likely does not have a state
> >> file in /var/lib/nfs. This may be harmless if it's not present;
> >> rpc.statd probably does the right thing in this case.
> >
> > The "right thing" in that case would be, I guess, to create a state file
> > with "0" in it. It doesn't do that. So this patch *does* break stuff.
> > Oops!
> >
> > So should we revert it and do something else, or patch statd to create
> > a new state file if necessary?
> I have to agree with Chuck, that managing the state file from one
> place is desirable... Although does it make sense to move that management
> into a init script? Maybe insuring the state is different before any
> daemons are started? Would we still need the sync()?
>
> >
> >> However, the rest of the logic in nsm_get_state() is needed to bump the
> >> system's state value properly after every reboot. It may be
> >> inconsequential if there were no mounts or no NFS clients during the
> >> last reboot, but this is subtle. I wouldn't bet on it.
> >
> > If the state is only every communicated to hosts by notifications, then
> > if we're not notifying, the update of the state can't matter.
> I had the same notion... If there are no clients to notify, why would
> any clients care if our state changed?? With that said.... I do have a
> sinking feeling that this patch will come back to bite us...
> It was just too easy of a fix... :-)

Any progress on this? I don't think we can release in the current
state, since as far as I can tell that means on a new system, unless the
install scripts create /var/lib/nfs/state, neither sm-notify nor statd
ever writes to /proc/sys/fs/nfs/nsm_local_state, and (without testing)
it looks to me like that means lockd defaults to a state of 0, which is
nonsense?

One possible compromise that I think would be correct might be to skip
the sync in the case where there are no peers to notify, instead of
exiting early in that case. As long as a sync happens at some point
before we grant any locks (which statd can guarantee), I think that will
work....

--b.

2008-12-05 03:34:53

by NeilBrown

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Thursday December 4, [email protected] wrote:
>
> Any progress on this? I don't think we can release in the current
> state, since as far as I can tell that means on a new system, unless the
> install scripts create /var/lib/nfs/state, neither sm-notify nor statd
> ever writes to /proc/sys/fs/nfs/nsm_local_state, and (without testing)
> it looks to me like that means lockd defaults to a state of 0, which is
> nonsense?

Why is '0' nonsense?
The only real requirement on 'state' is that it changes when the host
reboots while some peer is monitoring it. Even if it got reset to
zero every time the sm and sm.bak became empty it would still work
just fine. If we reboot and find that both sm and sm.bak are empty
there is really no point in changing 'state'.


I think it would still be valuable to replace the 'sync' with two
'fsync's, one of the file, one on the directory.
This may not be a win on ext3 today (I'm not 100% certain about that)
but there are other filesystems and more seem to be coming.

NeilBrown

2008-12-05 03:58:16

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Fri, Dec 05, 2008 at 02:34:54PM +1100, Neil Brown wrote:
> On Thursday December 4, [email protected] wrote:
> >
> > Any progress on this? I don't think we can release in the current
> > state, since as far as I can tell that means on a new system, unless the
> > install scripts create /var/lib/nfs/state, neither sm-notify nor statd
> > ever writes to /proc/sys/fs/nfs/nsm_local_state, and (without testing)
> > it looks to me like that means lockd defaults to a state of 0, which is
> > nonsense?
>
> Why is '0' nonsense?

Even numbers are supposed to mean that a host is down, odd that it's up.
In practice noone ever uses that fact--they only ever advertise odd
numbers. So we might get away with using an even number. But then
again a peer would be in its rights to check the parity and complain,
and some may well do that.

> The only real requirement on 'state' is that it changes when the host
> reboots while some peer is monitoring it. Even if it got reset to
> zero every time the sm and sm.bak became empty it would still work
> just fine.

Hm, I don't know; it might be that an inconveniently timed network
partition combined with nsm states that repeat themselves could prevent
a client from knowing about some reboot that it should have known about.

It's safer to keep a counter and ensure that the state never repeats
(well, anyway, not until it overflows after a few billion reboots).

> If we reboot and find that both sm and sm.bak are empty
> there is really no point in changing 'state'.

I think I've convinced myself of that, yes. For the purposes of
locking, a reboot which didn't require notifying any client may as well
have not been a reboot, since no state was lost.

But we should at least make sure that the state is properly initialized
and nondecreasing.

> I think it would still be valuable to replace the 'sync' with two
> 'fsync's, one of the file, one on the directory.

Sure, may as well.--b.

> This may not be a win on ext3 today (I'm not 100% certain about that)
> but there are other filesystems and more seem to be coming.

2008-12-05 13:29:08

by Steve Dickson

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

J. Bruce Fields wrote:
>
>> I think it would still be valuable to replace the 'sync' with two
>> 'fsync's, one of the file, one on the directory.
>
> Sure, may as well.--b.
>
Something similar to this:

diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/statd/sm-notify.c
--- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17 15:06:13.000000000 -0500
+++ nfs-utils/utils/statd/sm-notify.c 2008-12-05 08:21:52.000000000 -0500
@@ -211,12 +211,6 @@ usage: fprintf(stderr,
backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
get_hosts(_SM_BAK_PATH);

- /* If there are not hosts to notify, just exit */
- if (!hosts) {
- nsm_log(LOG_DEBUG, "No hosts to notify; exiting");
- return 0;
- }
-
/* Get and update the NSM state. This will call sync() */
nsm_state = nsm_get_state(opt_update_state);
set_kernel_nsm_state(nsm_state);
@@ -694,6 +688,7 @@ nsm_get_state(int update)
{
char newfile[PATH_MAX];
int fd, state;
+ DIR *fp;

if ((fd = open(_SM_STATE_PATH, O_RDONLY)) < 0) {
if (!opt_quiet) {
@@ -730,13 +725,16 @@ nsm_get_state(int update)
"Failed to write state to %s", newfile);
exit(1);
}
+ fsync(fd);
close(fd);
+ fp = opendir(_SM_STATE_PATH);
if (rename(newfile, _SM_STATE_PATH) < 0) {
nsm_log(LOG_ERR,
"Cannot create %s: %m", _SM_STATE_PATH);
exit(1);
}
- sync();

If so, I'll make it happen...

steved.

2008-12-05 16:38:36

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Fri, Dec 05, 2008 at 08:26:44AM -0500, Steve Dickson wrote:
> J. Bruce Fields wrote:
> >
> >> I think it would still be valuable to replace the 'sync' with two
> >> 'fsync's, one of the file, one on the directory.
> >
> > Sure, may as well.--b.
> >
> Something similar to this:
>
> diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/statd/sm-notify.c
> --- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17 15:06:13.000000000 -0500
> +++ nfs-utils/utils/statd/sm-notify.c 2008-12-05 08:21:52.000000000 -0500
> @@ -211,12 +211,6 @@ usage: fprintf(stderr,
> backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
> get_hosts(_SM_BAK_PATH);
>
> - /* If there are not hosts to notify, just exit */
> - if (!hosts) {
> - nsm_log(LOG_DEBUG, "No hosts to notify; exiting");
> - return 0;
> - }

This was still a huge boot-time win in the common case, so now that
we've committed to it I'd rather not regress. Let's just skip the
sync()s/fsncy()s in the !hosts case--that looks to me like the simplest
correct solution for now.

> -
> /* Get and update the NSM state. This will call sync() */
> nsm_state = nsm_get_state(opt_update_state);
> set_kernel_nsm_state(nsm_state);
> @@ -694,6 +688,7 @@ nsm_get_state(int update)
> {
> char newfile[PATH_MAX];
> int fd, state;
> + DIR *fp;
>
> if ((fd = open(_SM_STATE_PATH, O_RDONLY)) < 0) {
> if (!opt_quiet) {
> @@ -730,13 +725,16 @@ nsm_get_state(int update)
> "Failed to write state to %s", newfile);
> exit(1);
> }
> + fsync(fd);
> close(fd);
> + fp = opendir(_SM_STATE_PATH);

Also, I think you meant to:

fsync(fp);
close(fp);

?

--b.

> if (rename(newfile, _SM_STATE_PATH) < 0) {
> nsm_log(LOG_ERR,
> "Cannot create %s: %m", _SM_STATE_PATH);
> exit(1);
> }
> - sync();
>
> If so, I'll make it happen...
>
> steved.

2008-12-05 17:29:30

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Fri, Dec 05, 2008 at 11:38:24AM -0500, bfields wrote:
> On Fri, Dec 05, 2008 at 08:26:44AM -0500, Steve Dickson wrote:
> > J. Bruce Fields wrote:
> > >
> > >> I think it would still be valuable to replace the 'sync' with two
> > >> 'fsync's, one of the file, one on the directory.
> > >
> > > Sure, may as well.--b.
> > >
> > Something similar to this:
> >
> > diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/statd/sm-notify.c
> > --- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17 15:06:13.000000000 -0500
> > +++ nfs-utils/utils/statd/sm-notify.c 2008-12-05 08:21:52.000000000 -0500
> > @@ -211,12 +211,6 @@ usage: fprintf(stderr,
> > backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
> > get_hosts(_SM_BAK_PATH);
> >
> > - /* If there are not hosts to notify, just exit */
> > - if (!hosts) {
> > - nsm_log(LOG_DEBUG, "No hosts to notify; exiting");
> > - return 0;
> > - }
>
> This was still a huge boot-time win in the common case, so now that
> we've committed to it I'd rather not regress. Let's just skip the
> sync()s/fsncy()s in the !hosts case--that looks to me like the simplest
> correct solution for now.

My argument for correctness: if we don't sync in that case, then on
reboot the rename that updates the state will either have happened or
(if a crash comes too soon) not.

It is OK for that update to not happen as long as we're assured it
happens before the first lock request is made or replied to, or the
first monitor request completes, as, in the absence of any notifies,
those are the only points at which the new state will be exposed to the
outside world.

The first lock request will also require an upcall to statd. So we're
OK as long as any monitor requests (from either the local kernel or
remote peers) do a sync.

And statd should be doing a sync before responding to any monitor
request. As long as the SM_DIR is on the same filesystem as the state
file, that would do the job.... But now that I look, I see statd is
using an open with O_SYNC to ensure the new statd record hits stable
storage. Which we can't count on being enough.

How about adding an explicit fsync() of the state file (and parent
directory) to statd's first succesful creation of a statd record,
together with a comment explaining this? So around about line 194 in
utils/statd/monitor.c:sm_mon_1_svc()?

In fact, we could delete the sync entirely and do the same before the
first notification, and then we wouldn't have to wait for the sync in
the case host records are present either.... (statd would, but perhaps
we could still get other work done in the mean time).

(Am I missing something?

--b.

2008-12-05 18:41:58

by Chuck Lever

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Dec 5, 2008, at 12:29 PM, J. Bruce Fields wrote:
> On Fri, Dec 05, 2008 at 11:38:24AM -0500, bfields wrote:
>> On Fri, Dec 05, 2008 at 08:26:44AM -0500, Steve Dickson wrote:
>>> J. Bruce Fields wrote:
>>>>
>>>>> I think it would still be valuable to replace the 'sync' with two
>>>>> 'fsync's, one of the file, one on the directory.
>>>>
>>>> Sure, may as well.--b.
>>>>
>>> Something similar to this:
>>>
>>> diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/
>>> statd/sm-notify.c
>>> --- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17
>>> 15:06:13.000000000 -0500
>>> +++ nfs-utils/utils/statd/sm-notify.c 2008-12-05
>>> 08:21:52.000000000 -0500
>>> @@ -211,12 +211,6 @@ usage: fprintf(stderr,
>>> backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
>>> get_hosts(_SM_BAK_PATH);
>>>
>>> - /* If there are not hosts to notify, just exit */
>>> - if (!hosts) {
>>> - nsm_log(LOG_DEBUG, "No hosts to notify; exiting");
>>> - return 0;
>>> - }
>>
>> This was still a huge boot-time win in the common case, so now that
>> we've committed to it I'd rather not regress. Let's just skip the
>> sync()s/fsncy()s in the !hosts case--that looks to me like the
>> simplest
>> correct solution for now.
>
> My argument for correctness: if we don't sync in that case, then on
> reboot the rename that updates the state will either have happened or
> (if a crash comes too soon) not.
>
> It is OK for that update to not happen as long as we're assured it
> happens before the first lock request is made or replied to, or the
> first monitor request completes, as, in the absence of any notifies,
> those are the only points at which the new state will be exposed to
> the
> outside world.
>
> The first lock request will also require an upcall to statd. So we're
> OK as long as any monitor requests (from either the local kernel or
> remote peers) do a sync.
>
> And statd should be doing a sync before responding to any monitor
> request. As long as the SM_DIR is on the same filesystem as the state
> file, that would do the job.... But now that I look, I see statd is
> using an open with O_SYNC to ensure the new statd record hits stable
> storage. Which we can't count on being enough.
>
> How about adding an explicit fsync() of the state file (and parent
> directory) to statd's first succesful creation of a statd record,
> together with a comment explaining this? So around about line 194 in
> utils/statd/monitor.c:sm_mon_1_svc()?
>
> In fact, we could delete the sync entirely and do the same before the
> first notification, and then we wouldn't have to wait for the sync in
> the case host records are present either.... (statd would, but
> perhaps
> we could still get other work done in the mean time).
>
> (Am I missing something?)

This all might work, but I think we're adding a lot of complexity as a
workaround. Someone should fix the real problem, which is the
implementation of sync().

That would probably have other benefits too.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-12-05 18:44:41

by Steve Dickson

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

J. Bruce Fields wrote:
> On Fri, Dec 05, 2008 at 08:26:44AM -0500, Steve Dickson wrote:
>> J. Bruce Fields wrote:
>>>> I think it would still be valuable to replace the 'sync' with two
>>>> 'fsync's, one of the file, one on the directory.
>>> Sure, may as well.--b.
>>>
>> Something similar to this:
>>
>> diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/statd/sm-notify.c
>> --- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17 15:06:13.000000000 -0500
>> +++ nfs-utils/utils/statd/sm-notify.c 2008-12-05 08:21:52.000000000 -0500
>> @@ -211,12 +211,6 @@ usage: fprintf(stderr,
>> backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
>> get_hosts(_SM_BAK_PATH);
>>
>> - /* If there are not hosts to notify, just exit */
>> - if (!hosts) {
>> - nsm_log(LOG_DEBUG, "No hosts to notify; exiting");
>> - return 0;
>> - }
>
> This was still a huge boot-time win in the common case, so now that
> we've committed to it I'd rather not regress.
I thought the idea was the state had to be updated even when there
are no hosts...

>> @@ -730,13 +725,16 @@ nsm_get_state(int update)
>> "Failed to write state to %s", newfile);
>> exit(1);
>> }
>> + fsync(fd);
>> close(fd);
>> + fp = opendir(_SM_STATE_PATH);
>
> Also, I think you meant to:
>
> fsync(fp);
> close(fp);
No... I don't think so... the fd is an open() of the new file and the
fp is a DIR stream... But there was a cut/paste error... The fsync()
of the DIR stream was missing... here is is again...

@@ -694,6 +688,7 @@ nsm_get_state(int update)
{
char newfile[PATH_MAX];
int fd, state;
+ DIR *fp;

if ((fd = open(_SM_STATE_PATH, O_RDONLY)) < 0) {
if (!opt_quiet) {
@@ -730,13 +725,18 @@ nsm_get_state(int update)
"Failed to write state to %s", newfile);
exit(1);
}
+ fsync(fd);
close(fd);
+ fp = opendir(_SM_STATE_PATH);
if (rename(newfile, _SM_STATE_PATH) < 0) {
nsm_log(LOG_ERR,
"Cannot create %s: %m", _SM_STATE_PATH);
exit(1);
}
- sync();
+ if (fp != NULL) {
+ fsync(dirfd(fp));
+ closedir(fp);
+ }
}

return state;


2008-12-05 19:14:48

by Steve Dickson

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

J. Bruce Fields wrote:
> On Fri, Dec 05, 2008 at 11:38:24AM -0500, bfields wrote:
>> On Fri, Dec 05, 2008 at 08:26:44AM -0500, Steve Dickson wrote:
>>> J. Bruce Fields wrote:
>>>>> I think it would still be valuable to replace the 'sync' with two
>>>>> 'fsync's, one of the file, one on the directory.
>>>> Sure, may as well.--b.
>>>>
>>> Something similar to this:
>>>
>>> diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/statd/sm-notify.c
>>> --- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17 15:06:13.000000000 -0500
>>> +++ nfs-utils/utils/statd/sm-notify.c 2008-12-05 08:21:52.000000000 -0500
>>> @@ -211,12 +211,6 @@ usage: fprintf(stderr,
>>> backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
>>> get_hosts(_SM_BAK_PATH);
>>>
>>> - /* If there are not hosts to notify, just exit */
>>> - if (!hosts) {
>>> - nsm_log(LOG_DEBUG, "No hosts to notify; exiting");
>>> - return 0;
>>> - }
>> This was still a huge boot-time win in the common case, so now that
>> we've committed to it I'd rather not regress. Let's just skip the
>> sync()s/fsncy()s in the !hosts case--that looks to me like the simplest
>> correct solution for now.
>
> My argument for correctness: if we don't sync in that case, then on
> reboot the rename that updates the state will either have happened or
> (if a crash comes too soon) not.
>
> It is OK for that update to not happen as long as we're assured it
> happens before the first lock request is made or replied to, or the
> first monitor request completes, as, in the absence of any notifies,
> those are the only points at which the new state will be exposed to the
> outside world.
Doesn't the sync() have to happen before the file is first
read by stated. Meaning before statd:main() calls load_state_number()?

>
> The first lock request will also require an upcall to statd. So we're
> OK as long as any monitor requests (from either the local kernel or
> remote peers) do a sync.
>
> And statd should be doing a sync before responding to any monitor
> request. As long as the SM_DIR is on the same filesystem as the state
> file, that would do the job.... But now that I look, I see statd is
> using an open with O_SYNC to ensure the new statd record hits stable
> storage. Which we can't count on being enough.
>
> How about adding an explicit fsync() of the state file (and parent
> directory) to statd's first succesful creation of a statd record,
> together with a comment explaining this? So around about line 194 in
> utils/statd/monitor.c:sm_mon_1_svc()?
If we do the sync()/fsync() here we will also have to update MY_STATE
since that's what is the number used in the RPCs. But also I think
doing the sync this late be a bit waste since there is real good
chance the rename has already been sync-ed out by previous sync()
during boot up... or am I missing something...

steved.


2008-12-06 02:42:54

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Fri, Dec 05, 2008 at 01:41:32PM -0500, Chuck Lever wrote:
> On Dec 5, 2008, at 12:29 PM, J. Bruce Fields wrote:
>> How about adding an explicit fsync() of the state file (and parent
>> directory) to statd's first succesful creation of a statd record,
>> together with a comment explaining this? So around about line 194 in
>> utils/statd/monitor.c:sm_mon_1_svc()?
>>
>> In fact, we could delete the sync entirely and do the same before the
>> first notification, and then we wouldn't have to wait for the sync in
>> the case host records are present either.... (statd would, but
>> perhaps
>> we could still get other work done in the mean time).
>>
>> (Am I missing something?)
>
> This all might work, but I think we're adding a lot of complexity as a
> workaround.

I think you mean the justification is too subtle--the code itself (just
a couple syncs or fsyncs) is pretty simple.

> Someone should fix the real problem, which is the
> implementation of sync().

I think you mean of fsync(). My understanding of past discussions on
the issue is that it's not really fixable on ext3, at least. So
default setups will have this problem for a while.

--b.

> That would probably have other benefits too.

2008-12-06 02:49:23

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Fri, Dec 05, 2008 at 02:12:27PM -0500, Steve Dickson wrote:
> J. Bruce Fields wrote:
> > On Fri, Dec 05, 2008 at 11:38:24AM -0500, bfields wrote:
> >> On Fri, Dec 05, 2008 at 08:26:44AM -0500, Steve Dickson wrote:
> >>> J. Bruce Fields wrote:
> >>>>> I think it would still be valuable to replace the 'sync' with two
> >>>>> 'fsync's, one of the file, one on the directory.
> >>>> Sure, may as well.--b.
> >>>>
> >>> Something similar to this:
> >>>
> >>> diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/statd/sm-notify.c
> >>> --- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17 15:06:13.000000000 -0500
> >>> +++ nfs-utils/utils/statd/sm-notify.c 2008-12-05 08:21:52.000000000 -0500
> >>> @@ -211,12 +211,6 @@ usage: fprintf(stderr,
> >>> backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
> >>> get_hosts(_SM_BAK_PATH);
> >>>
> >>> - /* If there are not hosts to notify, just exit */
> >>> - if (!hosts) {
> >>> - nsm_log(LOG_DEBUG, "No hosts to notify; exiting");
> >>> - return 0;
> >>> - }
> >> This was still a huge boot-time win in the common case, so now that
> >> we've committed to it I'd rather not regress. Let's just skip the
> >> sync()s/fsncy()s in the !hosts case--that looks to me like the simplest
> >> correct solution for now.
> >
> > My argument for correctness: if we don't sync in that case, then on
> > reboot the rename that updates the state will either have happened or
> > (if a crash comes too soon) not.
> >
> > It is OK for that update to not happen as long as we're assured it
> > happens before the first lock request is made or replied to, or the
> > first monitor request completes, as, in the absence of any notifies,
> > those are the only points at which the new state will be exposed to the
> > outside world.
> Doesn't the sync() have to happen before the file is first
> read by stated. Meaning before statd:main() calls load_state_number()?

I can't see why. sync() of course can't have any effect on a subsequent
read of the file.

> > The first lock request will also require an upcall to statd. So we're
> > OK as long as any monitor requests (from either the local kernel or
> > remote peers) do a sync.
> >
> > And statd should be doing a sync before responding to any monitor
> > request. As long as the SM_DIR is on the same filesystem as the state
> > file, that would do the job.... But now that I look, I see statd is
> > using an open with O_SYNC to ensure the new statd record hits stable
> > storage. Which we can't count on being enough.
> >
> > How about adding an explicit fsync() of the state file (and parent
> > directory) to statd's first succesful creation of a statd record,
> > together with a comment explaining this? So around about line 194 in
> > utils/statd/monitor.c:sm_mon_1_svc()?
> If we do the sync()/fsync() here we will also have to update MY_STATE
> since that's what is the number used in the RPCs.

I don't believe that's true.

> But also I think
> doing the sync this late be a bit waste since there is real good
> chance the rename has already been sync-ed out by previous sync()
> during boot up... or am I missing something...

The whole boot gets held up waiting for this one sync to complete; those
later sync's mainly only delay bringing up nfs.

--b.

2008-12-06 03:08:43

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

On Fri, Dec 05, 2008 at 01:42:18PM -0500, Steve Dickson wrote:
> J. Bruce Fields wrote:
> > On Fri, Dec 05, 2008 at 08:26:44AM -0500, Steve Dickson wrote:
> >> J. Bruce Fields wrote:
> >>>> I think it would still be valuable to replace the 'sync' with two
> >>>> 'fsync's, one of the file, one on the directory.
> >>> Sure, may as well.--b.
> >>>
> >> Something similar to this:
> >>
> >> diff -up nfs-utils/utils/statd/sm-notify.c.orig nfs-utils/utils/statd/sm-notify.c
> >> --- nfs-utils/utils/statd/sm-notify.c.orig 2008-11-17 15:06:13.000000000 -0500
> >> +++ nfs-utils/utils/statd/sm-notify.c 2008-12-05 08:21:52.000000000 -0500
> >> @@ -211,12 +211,6 @@ usage: fprintf(stderr,
> >> backup_hosts(_SM_DIR_PATH, _SM_BAK_PATH);
> >> get_hosts(_SM_BAK_PATH);
> >>
> >> - /* If there are not hosts to notify, just exit */
> >> - if (!hosts) {
> >> - nsm_log(LOG_DEBUG, "No hosts to notify; exiting");
> >> - return 0;
> >> - }
> >
> > This was still a huge boot-time win in the common case, so now that
> > we've committed to it I'd rather not regress.
> I thought the idea was the state had to be updated even when there
> are no hosts...

I would prefer to increment the state file even in the absence of hosts
(it might be useful e.g. to a client that missed a previous grace period
due to a network problem to know that the current boot instance is later
than the one it needed to reclaim in). But I don't believe it's
necessary.

The more immediate problem is that the state file no longer gets created
at all on a new system. (Unless the package install scripts do it. I
haven't checked.)

So, we could either make sure the state file gets created some other
way, or delay the syncs as described before.

> >> @@ -730,13 +725,16 @@ nsm_get_state(int update)
> >> "Failed to write state to %s", newfile);
> >> exit(1);
> >> }
> >> + fsync(fd);
> >> close(fd);
> >> + fp = opendir(_SM_STATE_PATH);
> >
> > Also, I think you meant to:
> >
> > fsync(fp);
> > close(fp);
> No... I don't think so... the fd is an open() of the new file and the
> fp is a DIR stream... But there was a cut/paste error... The fsync()
> of the DIR stream was missing... here is is again...

OK.

> @@ -694,6 +688,7 @@ nsm_get_state(int update)
> {
> char newfile[PATH_MAX];
> int fd, state;
> + DIR *fp;
>
> if ((fd = open(_SM_STATE_PATH, O_RDONLY)) < 0) {
> if (!opt_quiet) {
> @@ -730,13 +725,18 @@ nsm_get_state(int update)
> "Failed to write state to %s", newfile);
> exit(1);
> }
> + fsync(fd);
> close(fd);
> + fp = opendir(_SM_STATE_PATH);
> if (rename(newfile, _SM_STATE_PATH) < 0) {
> nsm_log(LOG_ERR,
> "Cannot create %s: %m", _SM_STATE_PATH);
> exit(1);
> }
> - sync();

May as well move that opendir down here where it's used, though.

> + if (fp != NULL) {
> + fsync(dirfd(fp));
> + closedir(fp);
> + }
> }
>
> return state;

--b.

2008-12-08 19:51:18

by Chuck Lever

[permalink] [raw]
Subject: Re: Make sm-notify faster if there are no servers to notify

Hi Bruce-

On Dec 5, 2008, at Dec 5, 2008, 9:42 PM, J. Bruce Fields wrote:
> On Fri, Dec 05, 2008 at 01:41:32PM -0500, Chuck Lever wrote:
>> On Dec 5, 2008, at 12:29 PM, J. Bruce Fields wrote:
>>> How about adding an explicit fsync() of the state file (and parent
>>> directory) to statd's first succesful creation of a statd record,
>>> together with a comment explaining this? So around about line 194
>>> in
>>> utils/statd/monitor.c:sm_mon_1_svc()?
>>>
>>> In fact, we could delete the sync entirely and do the same before
>>> the
>>> first notification, and then we wouldn't have to wait for the sync
>>> in
>>> the case host records are present either.... (statd would, but
>>> perhaps we could still get other work done in the mean time).
>>>
>>> (Am I missing something?)
>>
>> This all might work, but I think we're adding a lot of complexity
>> as a
>> workaround.
>
> I think you mean the justification is too subtle--the code itself
> (just
> a couple syncs or fsyncs) is pretty simple.

Let me state it another way. Your suggested redesign makes
assumptions about the order in which these operations are performed.
Can the code provide any guarantee that this ordering is always true?
This may have been feasible when rpc.statd did notification too, but
these are now two separate programs. Can you think of a way of
implementing this where the ordering dependencies are coded instead of
commented?

It would be significantly less fragile (ie immune to ordering problems
and long-term changes to system start-up and nfs-utils code) and
simpler to understand if sm-notify does whatever syncing it needs, and
rpc.statd does whatever syncing it needs.

All I'm saying is this scheme may be easy to break by accident, and
any future problem here is probably not going to show up until someone
really needs this to work right.

>> Someone should fix the real problem, which is the
>> implementation of sync().
>
> I think you mean of fsync(). My understanding of past discussions on
> the issue is that it's not really fixable on ext3, at least. So
> default setups will have this problem for a while.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com