2008-07-15 06:21:49

by NeilBrown

[permalink] [raw]
Subject: [PATCH] sm-notify: perform DNS lookup in the background.



Hi Steve,
if you agree with the following patch, it can be pulled from

git://neil.brown.name/nfs-utils for-steved


Thanks,
NeilBrown


From: Neil Brown <[email protected]>
Date: Tue, 15 Jul 2008 06:11:55 +1000
Subject: [PATCH] sm-notify: perform DNS lookup in the background.

If an NFS server has no network connectivity when it reboots,
it will block in sm-notify waiting for DNS lookup for a potentially
large number of hosts. This is not helpful and just annoys the
sysadmin.

So do the DNS lookup in the backgrounded phase of sm-notify,
before sending off the NOTIFY requests.

Signed-off-by: NeilBrown <[email protected]>
---
utils/statd/sm-notify.c | 23 ++++++++++++++---------
1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/utils/statd/sm-notify.c b/utils/statd/sm-notify.c
index bb67c37..8e00aac 100644
--- a/utils/statd/sm-notify.c
+++ b/utils/statd/sm-notify.c
@@ -286,6 +286,20 @@ notify(void)
hp = hosts;
hosts = hp->next;

+ if (hp->ai == NULL)
+ hp->ai = host_lookup(AF_UNSPEC, hp->name);
+ if (hp->ai == NULL) {
+ nsm_log(LOG_WARNING,
+ "%s doesn't seem to be a valid address,"
+ " skipped",
+ hp->name);
+ unlink(hp->path);
+ free(hp->name);
+ free(hp->path);
+ free(hp);
+ continue;
+ }
+
notify_host(sock, hp);

/* Set the timeout for this call, using an
@@ -532,15 +546,6 @@ get_hosts(const char *dirname)
if (stat(path, &stb) < 0)
continue;

- host->ai = host_lookup(AF_UNSPEC, de->d_name);
- if (! host->ai) {
- nsm_log(LOG_WARNING,
- "%s doesn't seem to be a valid address, skipped",
- de->d_name);
- unlink(path);
- continue;
- }
-
host->last_used = stb.st_mtime;
host->timeout = NSM_TIMEOUT;
host->path = strdup(path);
--
1.5.6.2



2008-07-15 15:32:06

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] sm-notify: perform DNS lookup in the background.

On Tue, Jul 15, 2008 at 2:21 AM, Neil Brown <[email protected]> wrote:
>
>
> Hi Steve,
> if you agree with the following patch, it can be pulled from
>
> git://neil.brown.name/nfs-utils for-steved
>
>
> Thanks,
> NeilBrown
>
>
> From: Neil Brown <[email protected]>
> Date: Tue, 15 Jul 2008 06:11:55 +1000
> Subject: [PATCH] sm-notify: perform DNS lookup in the background.
>
> If an NFS server has no network connectivity when it reboots,
> it will block in sm-notify waiting for DNS lookup for a potentially
> large number of hosts. This is not helpful and just annoys the
> sysadmin.
>
> So do the DNS lookup in the backgrounded phase of sm-notify,
> before sending off the NOTIFY requests.

Good idea. A couple of minor comments below.

>
> Signed-off-by: NeilBrown <[email protected]>
> ---
> utils/statd/sm-notify.c | 23 ++++++++++++++---------
> 1 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/utils/statd/sm-notify.c b/utils/statd/sm-notify.c
> index bb67c37..8e00aac 100644
> --- a/utils/statd/sm-notify.c
> +++ b/utils/statd/sm-notify.c
> @@ -286,6 +286,20 @@ notify(void)
> hp = hosts;
> hosts = hp->next;
>
> + if (hp->ai == NULL)
> + hp->ai = host_lookup(AF_UNSPEC, hp->name);
> + if (hp->ai == NULL) {
> + nsm_log(LOG_WARNING,
> + "%s doesn't seem to be a valid address,"
> + " skipped",
> + hp->name);
> + unlink(hp->path);
> + free(hp->name);
> + free(hp->path);
> + free(hp);
> + continue;
> + }
> +

Arguably it would be cleaner architecturally if the DNS lookup were in
notify_host(). Since doing it in notify() means you will be looking
up these addresses on retries, maybe the host_lookup() call should be
integrated into the "If we retransmitted 4 times" logic. That would
be an opportunity to fix the addrinfo leak in there.

[Note that freeaddrinfo(3) simply walks the list of addrinfo
structures and frees them. By setting ai_next to NULL in some of the
addrinfo structures, sm-notify is orphaning them -- freeaddrinfo(3)
will never find them].

> notify_host(sock, hp);
>
> /* Set the timeout for this call, using an
> @@ -532,15 +546,6 @@ get_hosts(const char *dirname)

You should probably fix the documenting comment for get_hosts() --
"convert them to host entries" might be more precise.

> if (stat(path, &stb) < 0)
> continue;
>
> - host->ai = host_lookup(AF_UNSPEC, de->d_name);
> - if (! host->ai) {
> - nsm_log(LOG_WARNING,
> - "%s doesn't seem to be a valid address, skipped",
> - de->d_name);
> - unlink(path);
> - continue;
> - }
> -
> host->last_used = stb.st_mtime;
> host->timeout = NSM_TIMEOUT;
> host->path = strdup(path);
> --
> 1.5.6.2

--
Chuck Lever

2008-07-16 18:44:29

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH] sm-notify: perform DNS lookup in the background.



Chuck Lever wrote:
> On Tue, Jul 15, 2008 at 2:21 AM, Neil Brown <[email protected]> wrote:

>> If an NFS server has no network connectivity when it reboots,
>> it will block in sm-notify waiting for DNS lookup for a potentially
>> large number of hosts. This is not helpful and just annoys the
>> sysadmin.
>>
>> So do the DNS lookup in the backgrounded phase of sm-notify,
>> before sending off the NOTIFY requests.
>
> Good idea. A couple of minor comments below.
>

>
> Arguably it would be cleaner architecturally if the DNS lookup were in
> notify_host(). Since doing it in notify() means you will be looking
> up these addresses on retries, maybe the host_lookup() call should be
> integrated into the "If we retransmitted 4 times" logic. That would
> be an opportunity to fix the addrinfo leak in there.
>
> [Note that freeaddrinfo(3) simply walks the list of addrinfo
> structures and frees them. By setting ai_next to NULL in some of the
> addrinfo structures, sm-notify is orphaning them -- freeaddrinfo(3)
> will never find them].
>

>> /* Set the timeout for this call, using an
>> @@ -532,15 +546,6 @@ get_hosts(const char *dirname)
>
> You should probably fix the documenting comment for get_hosts() --
> "convert them to host entries" might be more precise.
>


I believe the following patch address those minor comments... true?

Signed-off-by: Steve Dickson <[email protected]>

diff --git a/utils/statd/sm-notify.c b/utils/statd/sm-notify.c
index bb67c37..2fc8ec6 100644
--- a/utils/statd/sm-notify.c
+++ b/utils/statd/sm-notify.c
@@ -76,7 +76,7 @@ static int log_syslog = 0;

static unsigned int nsm_get_state(int);
static void notify(void);
-static void notify_host(int, struct nsm_host *);
+static int notify_host(int, struct nsm_host *);
static void recv_reply(int);
static void backup_hosts(const char *, const char *);
static void get_hosts(const char *);
@@ -286,7 +286,13 @@ notify(void)
hp = hosts;
hosts = hp->next;

- notify_host(sock, hp);
+ if (notify_host(sock, hp)){
+ unlink(hp->path);
+ free(hp->name);
+ free(hp->path);
+ free(hp);
+ continue;
+ }

/* Set the timeout for this call, using an
exponential timeout strategy */
@@ -318,7 +324,7 @@ notify(void)
/*
* Send notification to a single host
*/
-void
+int
notify_host(int sock, struct nsm_host *host)
{
static unsigned int xid = 0;
@@ -331,6 +337,15 @@ notify_host(int sock, struct nsm_host *host)
if (!host->xid)
host->xid = xid++;

+ if (hp->ai == NULL) {
+ hp->ai = host_lookup(AF_UNSPEC, hp->name);
+ if (hp->ai == NULL)
+ nsm_log(LOG_WARNING,
+ "%s doesn't seem to be a valid address,"
+ " skipped", hp->name);
+ return 1;
+ }
+
memset(msgbuf, 0, sizeof(msgbuf));
p = msgbuf;
*p++ = htonl(host->xid);
@@ -349,7 +364,6 @@ notify_host(int sock, struct nsm_host *host)
while ( *next )
next = & (*next)->ai_next;
*next = hold;
- hold->ai_next = NULL;
memcpy(&host->addr, hold->ai_addr, hold->ai_addrlen);
addr_set_port(&host->addr, 0);
host->retries = 0;
@@ -394,7 +408,11 @@ notify_host(int sock, struct nsm_host *host)
}
len = (p - msgbuf) << 2;

- sendto(sock, msgbuf, len, 0, (struct sockaddr *) &dest, sizeof(dest));
+ if (sendto(sock, msgbuf, len, 0, (struct sockaddr *) &dest, sizeof(dest)) < 0)
+ nsm_log(LOG_WARNING, "Sending Reboot Notification to "
+ "'%s' failed: errno %d (%s)", host->name, errno, strerror(errno));
+
+ return 0;
}

/*
@@ -504,7 +522,7 @@ backup_hosts(const char *dirname, const char *bakname)
}

/*
- * Get all entries from sm.bak and convert them to host names
+ * Get all entries from sm.bak and convert them to host entries
*/
static void
get_hosts(const char *dirname)
@@ -532,15 +550,6 @@ get_hosts(const char *dirname)
if (stat(path, &stb) < 0)
continue;

- host->ai = host_lookup(AF_UNSPEC, de->d_name);
- if (! host->ai) {
- nsm_log(LOG_WARNING,
- "%s doesn't seem to be a valid address, skipped",
- de->d_name);
- unlink(path);
- continue;
- }
-
host->last_used = stb.st_mtime;
host->timeout = NSM_TIMEOUT;
host->path = strdup(path);

2008-07-17 02:41:43

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] sm-notify: perform DNS lookup in the background.

On Wednesday July 16, [email protected] wrote:
>
> >
> > Arguably it would be cleaner architecturally if the DNS lookup were in
> > notify_host(). Since doing it in notify() means you will be looking
> > up these addresses on retries, maybe the host_lookup() call should be
> > integrated into the "If we retransmitted 4 times" logic. That would
> > be an opportunity to fix the addrinfo leak in there.

What addrinfo leak is that?

> >
> > [Note that freeaddrinfo(3) simply walks the list of addrinfo
> > structures and frees them. By setting ai_next to NULL in some of the
> > addrinfo structures, sm-notify is orphaning them -- freeaddrinfo(3)
> > will never find them].
> >

I assume we are talking about the code in notify_host in the
if (host->retries >= 4) {
branch.

What the code is doing is taking the first addrinfo from the host->ai
list, and moving it to the end - effectively rotating the list.
This shouldn't change what freeaddrinfo will do.

One of us is missing something.
>
>
> I believe the following patch address those minor comments... true?

Except for:

> @@ -349,7 +364,6 @@ notify_host(int sock, struct nsm_host *host)
> while ( *next )
> next = & (*next)->ai_next;
> *next = hold;
> - hold->ai_next = NULL;
> memcpy(&host->addr, hold->ai_addr, hold->ai_addrlen);
> addr_set_port(&host->addr, 0);
> host->retries = 0;

which I think is wrong, and wondering why:

> @@ -394,7 +408,11 @@ notify_host(int sock, struct nsm_host *host)
> }
> len = (p - msgbuf) << 2;
>
> - sendto(sock, msgbuf, len, 0, (struct sockaddr *) &dest, sizeof(dest));
> + if (sendto(sock, msgbuf, len, 0, (struct sockaddr *) &dest, sizeof(dest)) < 0)
> + nsm_log(LOG_WARNING, "Sending Reboot Notification to "
> + "'%s' failed: errno %d (%s)", host->name, errno, strerror(errno));
> +
> + return 0;
> }
>
> /*

Yes, I believe it addresses those comments.

Acked-by: NeilBrown <[email protected]>

Thanks!
NeilBrown

2008-07-17 10:30:23

by Steve Dickson

[permalink] [raw]
Subject: Re: [PATCH] sm-notify: perform DNS lookup in the background.



Neil Brown wrote:
> On Wednesday July 16, [email protected] wrote:
>>> Arguably it would be cleaner architecturally if the DNS lookup were in
>>> notify_host(). Since doing it in notify() means you will be looking
>>> up these addresses on retries, maybe the host_lookup() call should be
>>> integrated into the "If we retransmitted 4 times" logic. That would
>>> be an opportunity to fix the addrinfo leak in there.
>
> What addrinfo leak is that?
See questions below...

>
>>> [Note that freeaddrinfo(3) simply walks the list of addrinfo
>>> structures and frees them. By setting ai_next to NULL in some of the
>>> addrinfo structures, sm-notify is orphaning them -- freeaddrinfo(3)
>>> will never find them].
>>>
>
> I assume we are talking about the code in notify_host in the
> if (host->retries >= 4) {
> branch.
>
> What the code is doing is taking the first addrinfo from the host->ai
> list, and moving it to the end - effectively rotating the list.
> This shouldn't change what freeaddrinfo will do.
>
> One of us is missing something.
>>
>> I believe the following patch address those minor comments... true?
>
> Except for:
>
>> @@ -349,7 +364,6 @@ notify_host(int sock, struct nsm_host *host)
>> while ( *next )
>> next = & (*next)->ai_next;
>> *next = hold;
>> - hold->ai_next = NULL;
>> memcpy(&host->addr, hold->ai_addr, hold->ai_addrlen);
>> addr_set_port(&host->addr, 0);
>> host->retries = 0;
>
> which I think is wrong, and wondering why:
After the while loop doesn't hold point to the head of the list?
If so, setting hold->ai_next = NULL; orphans the rest of the list, right?
Or am I missing something...

steved.


2008-07-17 15:09:33

by Chuck Lever

[permalink] [raw]
Subject: Re: [PATCH] sm-notify: perform DNS lookup in the background.

On Wed, Jul 16, 2008 at 10:41 PM, Neil Brown <[email protected]> wrote:
> On Wednesday July 16, [email protected] wrote:
>>
>> >
>> > Arguably it would be cleaner architecturally if the DNS lookup were in
>> > notify_host(). Since doing it in notify() means you will be looking
>> > up these addresses on retries, maybe the host_lookup() call should be
>> > integrated into the "If we retransmitted 4 times" logic. That would
>> > be an opportunity to fix the addrinfo leak in there.
>
> What addrinfo leak is that?
>
>> >
>> > [Note that freeaddrinfo(3) simply walks the list of addrinfo
>> > structures and frees them. By setting ai_next to NULL in some of the
>> > addrinfo structures, sm-notify is orphaning them -- freeaddrinfo(3)
>> > will never find them].
>> >
>
> I assume we are talking about the code in notify_host in the
> if (host->retries >= 4) {
> branch.
>
> What the code is doing is taking the first addrinfo from the host->ai
> list, and moving it to the end - effectively rotating the list.
> This shouldn't change what freeaddrinfo will do.
>
> One of us is missing something.

Even after staring at this for half an hour last week, it looked to me
like you were chopping off the last addrinfo in the list each time
through the "retries" loop. I could hope for something a little more
straightforward to do the address rotation.

It would be nice to consolidate the acquisition and freeing of the
addrinfo structs for each host so the lifetime of the results is made
clear. Moving the host_lookup() into notify_host() does help with
that.

There is another addrinfo leak in notify(), I think. In the "Bind
source IP if provided on command line" paragraph, host_lookup() is
called and the address is copied, but that list of addrinfo structs is
never freed, is it?

>> I believe the following patch address those minor comments... true?
>
> Except for:
>
>> @@ -349,7 +364,6 @@ notify_host(int sock, struct nsm_host *host)
>> while ( *next )
>> next = & (*next)->ai_next;
>> *next = hold;
>> - hold->ai_next = NULL;
>> memcpy(&host->addr, hold->ai_addr, hold->ai_addrlen);
>> addr_set_port(&host->addr, 0);
>> host->retries = 0;
>
> which I think is wrong, and wondering why:
>
>> @@ -394,7 +408,11 @@ notify_host(int sock, struct nsm_host *host)
>> }
>> len = (p - msgbuf) << 2;
>>
>> - sendto(sock, msgbuf, len, 0, (struct sockaddr *) &dest, sizeof(dest));
>> + if (sendto(sock, msgbuf, len, 0, (struct sockaddr *) &dest, sizeof(dest)) < 0)
>> + nsm_log(LOG_WARNING, "Sending Reboot Notification to "
>> + "'%s' failed: errno %d (%s)", host->name, errno, strerror(errno));
>> +
>> + return 0;
>> }

Yes, it's good to check for a return code here; but I'm wondering why
change it in this patch?

>>
>> /*
>
> Yes, I believe it addresses those comments.
>
> Acked-by: NeilBrown <[email protected]>
>
> Thanks!
> NeilBrown

--
Chuck Lever