2008-09-26 23:09:39

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Wed, Sep 17, 2008 at 11:18:11AM -0500, Chuck Lever wrote:
> The XDR decoders for the NSM NOTIFY procedure should construct a full
> socket address and store it in the nlm_reboot structure. In addition
> to being able to store larger addresses, this means upper layer
> routines get an address family tag so they can distinguish between
> AF_INET and AF_INET6 addresses.
>
> This also keeps potentially large socket addresses off the stack and
> instead in dynamically allocated storage.

So one way to think of this would be that you're extending the
kernel<->statd interface by using the address family of statd's notify
call to communicate the address family of the host that rebooted.

Do I have that right?

--b.

>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> fs/lockd/svc4proc.c | 10 +---------
> fs/lockd/svcproc.c | 10 +---------
> fs/lockd/xdr.c | 28 ++++++++++++++++++++++++++--
> fs/lockd/xdr4.c | 27 +++++++++++++++++++++++++--
> include/linux/lockd/xdr.h | 3 ++-
> 5 files changed, 55 insertions(+), 23 deletions(-)
>
> diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
> index 6a5ef9f..e61c05f 100644
> --- a/fs/lockd/svc4proc.c
> +++ b/fs/lockd/svc4proc.c
> @@ -430,8 +430,6 @@ static __be32
> nlm4svc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot *argp,
> void *resp)
> {
> - struct sockaddr_in saddr;
> -
> dprintk("lockd: SM_NOTIFY called\n");
>
> if (!nlm_privileged_requester(rqstp)) {
> @@ -441,13 +439,7 @@ nlm4svc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot *argp,
> return rpc_system_err;
> }
>
> - /* Obtain the host pointer for this NFS server and try to
> - * reclaim all locks we hold on this server.
> - */
> - memset(&saddr, 0, sizeof(saddr));
> - saddr.sin_family = AF_INET;
> - saddr.sin_addr.s_addr = argp->addr;
> - nlm_host_rebooted((struct sockaddr *)&saddr, sizeof(saddr),
> + nlm_host_rebooted((struct sockaddr *)&argp->addr, argp->addrlen,
> argp->mon, argp->len, argp->state);
>
> return rpc_success;
> diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
> index 62fcfdb..86a487a 100644
> --- a/fs/lockd/svcproc.c
> +++ b/fs/lockd/svcproc.c
> @@ -462,8 +462,6 @@ static __be32
> nlmsvc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot *argp,
> void *resp)
> {
> - struct sockaddr_in saddr;
> -
> dprintk("lockd: SM_NOTIFY called\n");
>
> if (!nlm_privileged_requester(rqstp)) {
> @@ -473,13 +471,7 @@ nlmsvc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot *argp,
> return rpc_system_err;
> }
>
> - /* Obtain the host pointer for this NFS server and try to
> - * reclaim all locks we hold on this server.
> - */
> - memset(&saddr, 0, sizeof(saddr));
> - saddr.sin_family = AF_INET;
> - saddr.sin_addr.s_addr = argp->addr;
> - nlm_host_rebooted((struct sockaddr *)&saddr, sizeof(saddr),
> + nlm_host_rebooted((struct sockaddr *)&argp->addr, argp->addrlen,
> argp->mon, argp->len, argp->state);
>
> return rpc_success;
> diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
> index 1f22629..92c5695 100644
> --- a/fs/lockd/xdr.c
> +++ b/fs/lockd/xdr.c
> @@ -9,6 +9,7 @@
> #include <linux/types.h>
> #include <linux/sched.h>
> #include <linux/utsname.h>
> +#include <linux/in.h>
> #include <linux/nfs.h>
>
> #include <linux/sunrpc/xdr.h>
> @@ -346,11 +347,34 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp, __be32 *p, struct nlm_args *argp)
> int
> nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p, struct nlm_reboot *argp)
> {
> + struct sockaddr_in *sin = (struct sockaddr_in *)&argp->addr;
> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&argp->addr;
> +
> if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len, SM_MAXSTRLEN)))
> return 0;
> argp->state = ntohl(*p++);
> - /* Preserve the address in network byte order */
> - argp->addr = *p++;
> +
> + /* Decode the 16-byte private field */
> + memset(&argp->addr, 0, sizeof(argp->addr));
> + switch (svc_addr(rqstp)->sa_family) {
> + case AF_INET:
> + /* data in recv buffer is already in network byte order */
> + sin->sin_family = AF_INET;
> + sin->sin_addr.s_addr = *p++;
> + argp->addrlen = sizeof(*sin);
> + break;
> + case AF_INET6:
> + sin6->sin6_family = AF_INET6;
> + sin6->sin6_addr.s6_addr32[0] = *p++;
> + sin6->sin6_addr.s6_addr32[1] = *p++;
> + sin6->sin6_addr.s6_addr32[2] = *p++;
> + sin6->sin6_addr.s6_addr32[3] = *p++;
> + argp->addrlen = sizeof(*sin6);
> + break;
> + default:
> + return -EIO;
> + }
> +
> return xdr_argsize_check(rqstp, p);
> }
>
> diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
> index 50c493a..2009020 100644
> --- a/fs/lockd/xdr4.c
> +++ b/fs/lockd/xdr4.c
> @@ -353,11 +353,34 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp, __be32 *p, struct nlm_args *argp)
> int
> nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p, struct nlm_reboot *argp)
> {
> + struct sockaddr_in *sin = (struct sockaddr_in *)&argp->addr;
> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&argp->addr;
> +
> if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len, SM_MAXSTRLEN)))
> return 0;
> argp->state = ntohl(*p++);
> - /* Preserve the address in network byte order */
> - argp->addr = *p++;
> +
> + /* Decode the 16-byte private field */
> + memset(&argp->addr, 0, sizeof(argp->addr));
> + switch (svc_addr(rqstp)->sa_family) {
> + case AF_INET:
> + /* address in recv buffer is already in network byte order */
> + sin->sin_family = AF_INET;
> + sin->sin_addr.s_addr = *p++;
> + argp->addrlen = sizeof(*sin);
> + break;
> + case AF_INET6:
> + sin6->sin6_family = AF_INET6;
> + sin6->sin6_addr.s6_addr32[0] = *p++;
> + sin6->sin6_addr.s6_addr32[1] = *p++;
> + sin6->sin6_addr.s6_addr32[2] = *p++;
> + sin6->sin6_addr.s6_addr32[3] = *p++;
> + argp->addrlen = sizeof(*sin6);
> + break;
> + default:
> + return -EIO;
> + }
> +
> return xdr_argsize_check(rqstp, p);
> }
>
> diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h
> index d6b3a80..6057b7e 100644
> --- a/include/linux/lockd/xdr.h
> +++ b/include/linux/lockd/xdr.h
> @@ -80,7 +80,8 @@ struct nlm_reboot {
> char * mon;
> unsigned int len;
> u32 state;
> - __be32 addr;
> + struct sockaddr_storage addr;
> + size_t addrlen;
> };
>
> /*
>


2008-10-01 16:17:20

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Sep 26, 2008, at Sep 26, 2008, 7:09 PM, J. Bruce Fields wrote:
> On Wed, Sep 17, 2008 at 11:18:11AM -0500, Chuck Lever wrote:
>> The XDR decoders for the NSM NOTIFY procedure should construct a full
>> socket address and store it in the nlm_reboot structure. In addition
>> to being able to store larger addresses, this means upper layer
>> routines get an address family tag so they can distinguish between
>> AF_INET and AF_INET6 addresses.
>>
>> This also keeps potentially large socket addresses off the stack and
>> instead in dynamically allocated storage.
>
> So one way to think of this would be that you're extending the
> kernel<->statd interface by using the address family of statd's notify
> call to communicate the address family of the host that rebooted.
>
> Do I have that right?

For statd, we're using the same technique that we used when
constructing the source address in nlmsvc_lookup_host(). There's no
family tag associated with the address because the 16-byte opaque in
the on-the-wire format has room only for the sin6_addr part of the
address. The notification downcall for IPv6 peers has to come via
IPv6 loopback for this to work.

>
>
> --b.
>
>>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>>
>> fs/lockd/svc4proc.c | 10 +---------
>> fs/lockd/svcproc.c | 10 +---------
>> fs/lockd/xdr.c | 28 ++++++++++++++++++++++++++--
>> fs/lockd/xdr4.c | 27 +++++++++++++++++++++++++--
>> include/linux/lockd/xdr.h | 3 ++-
>> 5 files changed, 55 insertions(+), 23 deletions(-)
>>
>> diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
>> index 6a5ef9f..e61c05f 100644
>> --- a/fs/lockd/svc4proc.c
>> +++ b/fs/lockd/svc4proc.c
>> @@ -430,8 +430,6 @@ static __be32
>> nlm4svc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot
>> *argp,
>> void *resp)
>> {
>> - struct sockaddr_in saddr;
>> -
>> dprintk("lockd: SM_NOTIFY called\n");
>>
>> if (!nlm_privileged_requester(rqstp)) {
>> @@ -441,13 +439,7 @@ nlm4svc_proc_sm_notify(struct svc_rqst *rqstp,
>> struct nlm_reboot *argp,
>> return rpc_system_err;
>> }
>>
>> - /* Obtain the host pointer for this NFS server and try to
>> - * reclaim all locks we hold on this server.
>> - */
>> - memset(&saddr, 0, sizeof(saddr));
>> - saddr.sin_family = AF_INET;
>> - saddr.sin_addr.s_addr = argp->addr;
>> - nlm_host_rebooted((struct sockaddr *)&saddr, sizeof(saddr),
>> + nlm_host_rebooted((struct sockaddr *)&argp->addr, argp->addrlen,
>> argp->mon, argp->len, argp->state);
>>
>> return rpc_success;
>> diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
>> index 62fcfdb..86a487a 100644
>> --- a/fs/lockd/svcproc.c
>> +++ b/fs/lockd/svcproc.c
>> @@ -462,8 +462,6 @@ static __be32
>> nlmsvc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot
>> *argp,
>> void *resp)
>> {
>> - struct sockaddr_in saddr;
>> -
>> dprintk("lockd: SM_NOTIFY called\n");
>>
>> if (!nlm_privileged_requester(rqstp)) {
>> @@ -473,13 +471,7 @@ nlmsvc_proc_sm_notify(struct svc_rqst *rqstp,
>> struct nlm_reboot *argp,
>> return rpc_system_err;
>> }
>>
>> - /* Obtain the host pointer for this NFS server and try to
>> - * reclaim all locks we hold on this server.
>> - */
>> - memset(&saddr, 0, sizeof(saddr));
>> - saddr.sin_family = AF_INET;
>> - saddr.sin_addr.s_addr = argp->addr;
>> - nlm_host_rebooted((struct sockaddr *)&saddr, sizeof(saddr),
>> + nlm_host_rebooted((struct sockaddr *)&argp->addr, argp->addrlen,
>> argp->mon, argp->len, argp->state);
>>
>> return rpc_success;
>> diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
>> index 1f22629..92c5695 100644
>> --- a/fs/lockd/xdr.c
>> +++ b/fs/lockd/xdr.c
>> @@ -9,6 +9,7 @@
>> #include <linux/types.h>
>> #include <linux/sched.h>
>> #include <linux/utsname.h>
>> +#include <linux/in.h>
>> #include <linux/nfs.h>
>>
>> #include <linux/sunrpc/xdr.h>
>> @@ -346,11 +347,34 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp,
>> __be32 *p, struct nlm_args *argp)
>> int
>> nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p, struct
>> nlm_reboot *argp)
>> {
>> + struct sockaddr_in *sin = (struct sockaddr_in *)&argp->addr;
>> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&argp->addr;
>> +
>> if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len,
>> SM_MAXSTRLEN)))
>> return 0;
>> argp->state = ntohl(*p++);
>> - /* Preserve the address in network byte order */
>> - argp->addr = *p++;
>> +
>> + /* Decode the 16-byte private field */
>> + memset(&argp->addr, 0, sizeof(argp->addr));
>> + switch (svc_addr(rqstp)->sa_family) {
>> + case AF_INET:
>> + /* data in recv buffer is already in network byte order */
>> + sin->sin_family = AF_INET;
>> + sin->sin_addr.s_addr = *p++;
>> + argp->addrlen = sizeof(*sin);
>> + break;
>> + case AF_INET6:
>> + sin6->sin6_family = AF_INET6;
>> + sin6->sin6_addr.s6_addr32[0] = *p++;
>> + sin6->sin6_addr.s6_addr32[1] = *p++;
>> + sin6->sin6_addr.s6_addr32[2] = *p++;
>> + sin6->sin6_addr.s6_addr32[3] = *p++;
>> + argp->addrlen = sizeof(*sin6);
>> + break;
>> + default:
>> + return -EIO;
>> + }
>> +
>> return xdr_argsize_check(rqstp, p);
>> }
>>
>> diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
>> index 50c493a..2009020 100644
>> --- a/fs/lockd/xdr4.c
>> +++ b/fs/lockd/xdr4.c
>> @@ -353,11 +353,34 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp,
>> __be32 *p, struct nlm_args *argp)
>> int
>> nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p, struct
>> nlm_reboot *argp)
>> {
>> + struct sockaddr_in *sin = (struct sockaddr_in *)&argp->addr;
>> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&argp->addr;
>> +
>> if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len,
>> SM_MAXSTRLEN)))
>> return 0;
>> argp->state = ntohl(*p++);
>> - /* Preserve the address in network byte order */
>> - argp->addr = *p++;
>> +
>> + /* Decode the 16-byte private field */
>> + memset(&argp->addr, 0, sizeof(argp->addr));
>> + switch (svc_addr(rqstp)->sa_family) {
>> + case AF_INET:
>> + /* address in recv buffer is already in network byte order */
>> + sin->sin_family = AF_INET;
>> + sin->sin_addr.s_addr = *p++;
>> + argp->addrlen = sizeof(*sin);
>> + break;
>> + case AF_INET6:
>> + sin6->sin6_family = AF_INET6;
>> + sin6->sin6_addr.s6_addr32[0] = *p++;
>> + sin6->sin6_addr.s6_addr32[1] = *p++;
>> + sin6->sin6_addr.s6_addr32[2] = *p++;
>> + sin6->sin6_addr.s6_addr32[3] = *p++;
>> + argp->addrlen = sizeof(*sin6);
>> + break;
>> + default:
>> + return -EIO;
>> + }
>> +
>> return xdr_argsize_check(rqstp, p);
>> }
>>
>> diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h
>> index d6b3a80..6057b7e 100644
>> --- a/include/linux/lockd/xdr.h
>> +++ b/include/linux/lockd/xdr.h
>> @@ -80,7 +80,8 @@ struct nlm_reboot {
>> char * mon;
>> unsigned int len;
>> u32 state;
>> - __be32 addr;
>> + struct sockaddr_storage addr;
>> + size_t addrlen;
>> };
>>
>> /*
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2008-10-01 18:18:02

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Wed, Oct 01, 2008 at 12:17:02PM -0400, Chuck Lever wrote:
> On Sep 26, 2008, at Sep 26, 2008, 7:09 PM, J. Bruce Fields wrote:
>> On Wed, Sep 17, 2008 at 11:18:11AM -0500, Chuck Lever wrote:
>>> The XDR decoders for the NSM NOTIFY procedure should construct a full
>>> socket address and store it in the nlm_reboot structure. In addition
>>> to being able to store larger addresses, this means upper layer
>>> routines get an address family tag so they can distinguish between
>>> AF_INET and AF_INET6 addresses.
>>>
>>> This also keeps potentially large socket addresses off the stack and
>>> instead in dynamically allocated storage.
>>
>> So one way to think of this would be that you're extending the
>> kernel<->statd interface by using the address family of statd's notify
>> call to communicate the address family of the host that rebooted.
>>
>> Do I have that right?
>
> For statd, we're using the same technique that we used when constructing
> the source address in nlmsvc_lookup_host(). There's no family tag
> associated with the address because the 16-byte opaque in the on-the-wire
> format has room only for the sin6_addr part of the address.

OK. It seems a bit tricky, but I can't see why it doesn't work.

--b.

> The notification downcall for IPv6 peers has to come via IPv6 loopback
> for this to work.
>
>>
>>
>> --b.
>>
>>>
>>> Signed-off-by: Chuck Lever <[email protected]>
>>> ---
>>>
>>> fs/lockd/svc4proc.c | 10 +---------
>>> fs/lockd/svcproc.c | 10 +---------
>>> fs/lockd/xdr.c | 28 ++++++++++++++++++++++++++--
>>> fs/lockd/xdr4.c | 27 +++++++++++++++++++++++++--
>>> include/linux/lockd/xdr.h | 3 ++-
>>> 5 files changed, 55 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
>>> index 6a5ef9f..e61c05f 100644
>>> --- a/fs/lockd/svc4proc.c
>>> +++ b/fs/lockd/svc4proc.c
>>> @@ -430,8 +430,6 @@ static __be32
>>> nlm4svc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot
>>> *argp,
>>> void *resp)
>>> {
>>> - struct sockaddr_in saddr;
>>> -
>>> dprintk("lockd: SM_NOTIFY called\n");
>>>
>>> if (!nlm_privileged_requester(rqstp)) {
>>> @@ -441,13 +439,7 @@ nlm4svc_proc_sm_notify(struct svc_rqst *rqstp,
>>> struct nlm_reboot *argp,
>>> return rpc_system_err;
>>> }
>>>
>>> - /* Obtain the host pointer for this NFS server and try to
>>> - * reclaim all locks we hold on this server.
>>> - */
>>> - memset(&saddr, 0, sizeof(saddr));
>>> - saddr.sin_family = AF_INET;
>>> - saddr.sin_addr.s_addr = argp->addr;
>>> - nlm_host_rebooted((struct sockaddr *)&saddr, sizeof(saddr),
>>> + nlm_host_rebooted((struct sockaddr *)&argp->addr, argp->addrlen,
>>> argp->mon, argp->len, argp->state);
>>>
>>> return rpc_success;
>>> diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
>>> index 62fcfdb..86a487a 100644
>>> --- a/fs/lockd/svcproc.c
>>> +++ b/fs/lockd/svcproc.c
>>> @@ -462,8 +462,6 @@ static __be32
>>> nlmsvc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot
>>> *argp,
>>> void *resp)
>>> {
>>> - struct sockaddr_in saddr;
>>> -
>>> dprintk("lockd: SM_NOTIFY called\n");
>>>
>>> if (!nlm_privileged_requester(rqstp)) {
>>> @@ -473,13 +471,7 @@ nlmsvc_proc_sm_notify(struct svc_rqst *rqstp,
>>> struct nlm_reboot *argp,
>>> return rpc_system_err;
>>> }
>>>
>>> - /* Obtain the host pointer for this NFS server and try to
>>> - * reclaim all locks we hold on this server.
>>> - */
>>> - memset(&saddr, 0, sizeof(saddr));
>>> - saddr.sin_family = AF_INET;
>>> - saddr.sin_addr.s_addr = argp->addr;
>>> - nlm_host_rebooted((struct sockaddr *)&saddr, sizeof(saddr),
>>> + nlm_host_rebooted((struct sockaddr *)&argp->addr, argp->addrlen,
>>> argp->mon, argp->len, argp->state);
>>>
>>> return rpc_success;
>>> diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
>>> index 1f22629..92c5695 100644
>>> --- a/fs/lockd/xdr.c
>>> +++ b/fs/lockd/xdr.c
>>> @@ -9,6 +9,7 @@
>>> #include <linux/types.h>
>>> #include <linux/sched.h>
>>> #include <linux/utsname.h>
>>> +#include <linux/in.h>
>>> #include <linux/nfs.h>
>>>
>>> #include <linux/sunrpc/xdr.h>
>>> @@ -346,11 +347,34 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp,
>>> __be32 *p, struct nlm_args *argp)
>>> int
>>> nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p, struct
>>> nlm_reboot *argp)
>>> {
>>> + struct sockaddr_in *sin = (struct sockaddr_in *)&argp->addr;
>>> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&argp->addr;
>>> +
>>> if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len,
>>> SM_MAXSTRLEN)))
>>> return 0;
>>> argp->state = ntohl(*p++);
>>> - /* Preserve the address in network byte order */
>>> - argp->addr = *p++;
>>> +
>>> + /* Decode the 16-byte private field */
>>> + memset(&argp->addr, 0, sizeof(argp->addr));
>>> + switch (svc_addr(rqstp)->sa_family) {
>>> + case AF_INET:
>>> + /* data in recv buffer is already in network byte order */
>>> + sin->sin_family = AF_INET;
>>> + sin->sin_addr.s_addr = *p++;
>>> + argp->addrlen = sizeof(*sin);
>>> + break;
>>> + case AF_INET6:
>>> + sin6->sin6_family = AF_INET6;
>>> + sin6->sin6_addr.s6_addr32[0] = *p++;
>>> + sin6->sin6_addr.s6_addr32[1] = *p++;
>>> + sin6->sin6_addr.s6_addr32[2] = *p++;
>>> + sin6->sin6_addr.s6_addr32[3] = *p++;
>>> + argp->addrlen = sizeof(*sin6);
>>> + break;
>>> + default:
>>> + return -EIO;
>>> + }
>>> +
>>> return xdr_argsize_check(rqstp, p);
>>> }
>>>
>>> diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
>>> index 50c493a..2009020 100644
>>> --- a/fs/lockd/xdr4.c
>>> +++ b/fs/lockd/xdr4.c
>>> @@ -353,11 +353,34 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp,
>>> __be32 *p, struct nlm_args *argp)
>>> int
>>> nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p, struct
>>> nlm_reboot *argp)
>>> {
>>> + struct sockaddr_in *sin = (struct sockaddr_in *)&argp->addr;
>>> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&argp->addr;
>>> +
>>> if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len,
>>> SM_MAXSTRLEN)))
>>> return 0;
>>> argp->state = ntohl(*p++);
>>> - /* Preserve the address in network byte order */
>>> - argp->addr = *p++;
>>> +
>>> + /* Decode the 16-byte private field */
>>> + memset(&argp->addr, 0, sizeof(argp->addr));
>>> + switch (svc_addr(rqstp)->sa_family) {
>>> + case AF_INET:
>>> + /* address in recv buffer is already in network byte order */
>>> + sin->sin_family = AF_INET;
>>> + sin->sin_addr.s_addr = *p++;
>>> + argp->addrlen = sizeof(*sin);
>>> + break;
>>> + case AF_INET6:
>>> + sin6->sin6_family = AF_INET6;
>>> + sin6->sin6_addr.s6_addr32[0] = *p++;
>>> + sin6->sin6_addr.s6_addr32[1] = *p++;
>>> + sin6->sin6_addr.s6_addr32[2] = *p++;
>>> + sin6->sin6_addr.s6_addr32[3] = *p++;
>>> + argp->addrlen = sizeof(*sin6);
>>> + break;
>>> + default:
>>> + return -EIO;
>>> + }
>>> +
>>> return xdr_argsize_check(rqstp, p);
>>> }
>>>
>>> diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h
>>> index d6b3a80..6057b7e 100644
>>> --- a/include/linux/lockd/xdr.h
>>> +++ b/include/linux/lockd/xdr.h
>>> @@ -80,7 +80,8 @@ struct nlm_reboot {
>>> char * mon;
>>> unsigned int len;
>>> u32 state;
>>> - __be32 addr;
>>> + struct sockaddr_storage addr;
>>> + size_t addrlen;
>>> };
>>>
>>> /*
>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
>> in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>

2008-10-01 19:40:21

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Oct 1, 2008, at Oct 1, 2008, 2:18 PM, J. Bruce Fields wrote:
> On Wed, Oct 01, 2008 at 12:17:02PM -0400, Chuck Lever wrote:
>> On Sep 26, 2008, at Sep 26, 2008, 7:09 PM, J. Bruce Fields wrote:
>>> On Wed, Sep 17, 2008 at 11:18:11AM -0500, Chuck Lever wrote:
>>>> The XDR decoders for the NSM NOTIFY procedure should construct a
>>>> full
>>>> socket address and store it in the nlm_reboot structure. In
>>>> addition
>>>> to being able to store larger addresses, this means upper layer
>>>> routines get an address family tag so they can distinguish between
>>>> AF_INET and AF_INET6 addresses.
>>>>
>>>> This also keeps potentially large socket addresses off the stack
>>>> and
>>>> instead in dynamically allocated storage.
>>>
>>> So one way to think of this would be that you're extending the
>>> kernel<->statd interface by using the address family of statd's
>>> notify
>>> call to communicate the address family of the host that rebooted.
>>>
>>> Do I have that right?
>>
>> For statd, we're using the same technique that we used when
>> constructing
>> the source address in nlmsvc_lookup_host(). There's no family tag
>> associated with the address because the 16-byte opaque in the on-
>> the-wire
>> format has room only for the sin6_addr part of the address.
>
> OK. It seems a bit tricky, but I can't see why it doesn't work.

Right. I couldn't think of something simpler.

rpc.statd has to continue to work with legacy kernels where the first
4 bytes of the opaque are a 32-bit sin_addr field and the following 12
bytes are zero. Apparently you can have a valid IPv6 address that
looks like that, but I can't find anything that states this
conclusively.

The kernel needs to know the address family in order to interpret the
contents of the opaque and turn them back into a sockaddr so it can be
matched against an nlm_host.

>> The notification downcall for IPv6 peers has to come via IPv6
>> loopback
>> for this to work.
>>
>>>
>>>
>>> --b.
>>>
>>>>
>>>> Signed-off-by: Chuck Lever <[email protected]>
>>>> ---
>>>>
>>>> fs/lockd/svc4proc.c | 10 +---------
>>>> fs/lockd/svcproc.c | 10 +---------
>>>> fs/lockd/xdr.c | 28 ++++++++++++++++++++++++++--
>>>> fs/lockd/xdr4.c | 27 +++++++++++++++++++++++++--
>>>> include/linux/lockd/xdr.h | 3 ++-
>>>> 5 files changed, 55 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
>>>> index 6a5ef9f..e61c05f 100644
>>>> --- a/fs/lockd/svc4proc.c
>>>> +++ b/fs/lockd/svc4proc.c
>>>> @@ -430,8 +430,6 @@ static __be32
>>>> nlm4svc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot
>>>> *argp,
>>>> void *resp)
>>>> {
>>>> - struct sockaddr_in saddr;
>>>> -
>>>> dprintk("lockd: SM_NOTIFY called\n");
>>>>
>>>> if (!nlm_privileged_requester(rqstp)) {
>>>> @@ -441,13 +439,7 @@ nlm4svc_proc_sm_notify(struct svc_rqst *rqstp,
>>>> struct nlm_reboot *argp,
>>>> return rpc_system_err;
>>>> }
>>>>
>>>> - /* Obtain the host pointer for this NFS server and try to
>>>> - * reclaim all locks we hold on this server.
>>>> - */
>>>> - memset(&saddr, 0, sizeof(saddr));
>>>> - saddr.sin_family = AF_INET;
>>>> - saddr.sin_addr.s_addr = argp->addr;
>>>> - nlm_host_rebooted((struct sockaddr *)&saddr, sizeof(saddr),
>>>> + nlm_host_rebooted((struct sockaddr *)&argp->addr, argp->addrlen,
>>>> argp->mon, argp->len, argp->state);
>>>>
>>>> return rpc_success;
>>>> diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
>>>> index 62fcfdb..86a487a 100644
>>>> --- a/fs/lockd/svcproc.c
>>>> +++ b/fs/lockd/svcproc.c
>>>> @@ -462,8 +462,6 @@ static __be32
>>>> nlmsvc_proc_sm_notify(struct svc_rqst *rqstp, struct nlm_reboot
>>>> *argp,
>>>> void *resp)
>>>> {
>>>> - struct sockaddr_in saddr;
>>>> -
>>>> dprintk("lockd: SM_NOTIFY called\n");
>>>>
>>>> if (!nlm_privileged_requester(rqstp)) {
>>>> @@ -473,13 +471,7 @@ nlmsvc_proc_sm_notify(struct svc_rqst *rqstp,
>>>> struct nlm_reboot *argp,
>>>> return rpc_system_err;
>>>> }
>>>>
>>>> - /* Obtain the host pointer for this NFS server and try to
>>>> - * reclaim all locks we hold on this server.
>>>> - */
>>>> - memset(&saddr, 0, sizeof(saddr));
>>>> - saddr.sin_family = AF_INET;
>>>> - saddr.sin_addr.s_addr = argp->addr;
>>>> - nlm_host_rebooted((struct sockaddr *)&saddr, sizeof(saddr),
>>>> + nlm_host_rebooted((struct sockaddr *)&argp->addr, argp->addrlen,
>>>> argp->mon, argp->len, argp->state);
>>>>
>>>> return rpc_success;
>>>> diff --git a/fs/lockd/xdr.c b/fs/lockd/xdr.c
>>>> index 1f22629..92c5695 100644
>>>> --- a/fs/lockd/xdr.c
>>>> +++ b/fs/lockd/xdr.c
>>>> @@ -9,6 +9,7 @@
>>>> #include <linux/types.h>
>>>> #include <linux/sched.h>
>>>> #include <linux/utsname.h>
>>>> +#include <linux/in.h>
>>>> #include <linux/nfs.h>
>>>>
>>>> #include <linux/sunrpc/xdr.h>
>>>> @@ -346,11 +347,34 @@ nlmsvc_decode_notify(struct svc_rqst *rqstp,
>>>> __be32 *p, struct nlm_args *argp)
>>>> int
>>>> nlmsvc_decode_reboot(struct svc_rqst *rqstp, __be32 *p, struct
>>>> nlm_reboot *argp)
>>>> {
>>>> + struct sockaddr_in *sin = (struct sockaddr_in *)&argp->addr;
>>>> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&argp->addr;
>>>> +
>>>> if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len,
>>>> SM_MAXSTRLEN)))
>>>> return 0;
>>>> argp->state = ntohl(*p++);
>>>> - /* Preserve the address in network byte order */
>>>> - argp->addr = *p++;
>>>> +
>>>> + /* Decode the 16-byte private field */
>>>> + memset(&argp->addr, 0, sizeof(argp->addr));
>>>> + switch (svc_addr(rqstp)->sa_family) {
>>>> + case AF_INET:
>>>> + /* data in recv buffer is already in network byte order */
>>>> + sin->sin_family = AF_INET;
>>>> + sin->sin_addr.s_addr = *p++;
>>>> + argp->addrlen = sizeof(*sin);
>>>> + break;
>>>> + case AF_INET6:
>>>> + sin6->sin6_family = AF_INET6;
>>>> + sin6->sin6_addr.s6_addr32[0] = *p++;
>>>> + sin6->sin6_addr.s6_addr32[1] = *p++;
>>>> + sin6->sin6_addr.s6_addr32[2] = *p++;
>>>> + sin6->sin6_addr.s6_addr32[3] = *p++;
>>>> + argp->addrlen = sizeof(*sin6);
>>>> + break;
>>>> + default:
>>>> + return -EIO;
>>>> + }
>>>> +
>>>> return xdr_argsize_check(rqstp, p);
>>>> }
>>>>
>>>> diff --git a/fs/lockd/xdr4.c b/fs/lockd/xdr4.c
>>>> index 50c493a..2009020 100644
>>>> --- a/fs/lockd/xdr4.c
>>>> +++ b/fs/lockd/xdr4.c
>>>> @@ -353,11 +353,34 @@ nlm4svc_decode_notify(struct svc_rqst *rqstp,
>>>> __be32 *p, struct nlm_args *argp)
>>>> int
>>>> nlm4svc_decode_reboot(struct svc_rqst *rqstp, __be32 *p, struct
>>>> nlm_reboot *argp)
>>>> {
>>>> + struct sockaddr_in *sin = (struct sockaddr_in *)&argp->addr;
>>>> + struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)&argp->addr;
>>>> +
>>>> if (!(p = xdr_decode_string_inplace(p, &argp->mon, &argp->len,
>>>> SM_MAXSTRLEN)))
>>>> return 0;
>>>> argp->state = ntohl(*p++);
>>>> - /* Preserve the address in network byte order */
>>>> - argp->addr = *p++;
>>>> +
>>>> + /* Decode the 16-byte private field */
>>>> + memset(&argp->addr, 0, sizeof(argp->addr));
>>>> + switch (svc_addr(rqstp)->sa_family) {
>>>> + case AF_INET:
>>>> + /* address in recv buffer is already in network byte order */
>>>> + sin->sin_family = AF_INET;
>>>> + sin->sin_addr.s_addr = *p++;
>>>> + argp->addrlen = sizeof(*sin);
>>>> + break;
>>>> + case AF_INET6:
>>>> + sin6->sin6_family = AF_INET6;
>>>> + sin6->sin6_addr.s6_addr32[0] = *p++;
>>>> + sin6->sin6_addr.s6_addr32[1] = *p++;
>>>> + sin6->sin6_addr.s6_addr32[2] = *p++;
>>>> + sin6->sin6_addr.s6_addr32[3] = *p++;
>>>> + argp->addrlen = sizeof(*sin6);
>>>> + break;
>>>> + default:
>>>> + return -EIO;
>>>> + }
>>>> +
>>>> return xdr_argsize_check(rqstp, p);
>>>> }
>>>>
>>>> diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h
>>>> index d6b3a80..6057b7e 100644
>>>> --- a/include/linux/lockd/xdr.h
>>>> +++ b/include/linux/lockd/xdr.h
>>>> @@ -80,7 +80,8 @@ struct nlm_reboot {
>>>> char * mon;
>>>> unsigned int len;
>>>> u32 state;
>>>> - __be32 addr;
>>>> + struct sockaddr_storage addr;
>>>> + size_t addrlen;
>>>> };
>>>>
>>>> /*
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
>>> in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2008-10-01 20:08:42

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Wed, Oct 01, 2008 at 03:40:05PM -0400, Chuck Lever wrote:
> On Oct 1, 2008, at Oct 1, 2008, 2:18 PM, J. Bruce Fields wrote:
>> On Wed, Oct 01, 2008 at 12:17:02PM -0400, Chuck Lever wrote:
>>> On Sep 26, 2008, at Sep 26, 2008, 7:09 PM, J. Bruce Fields wrote:
>>>> On Wed, Sep 17, 2008 at 11:18:11AM -0500, Chuck Lever wrote:
>>>>> The XDR decoders for the NSM NOTIFY procedure should construct a
>>>>> full
>>>>> socket address and store it in the nlm_reboot structure. In
>>>>> addition
>>>>> to being able to store larger addresses, this means upper layer
>>>>> routines get an address family tag so they can distinguish between
>>>>> AF_INET and AF_INET6 addresses.
>>>>>
>>>>> This also keeps potentially large socket addresses off the stack
>>>>> and
>>>>> instead in dynamically allocated storage.
>>>>
>>>> So one way to think of this would be that you're extending the
>>>> kernel<->statd interface by using the address family of statd's
>>>> notify
>>>> call to communicate the address family of the host that rebooted.
>>>>
>>>> Do I have that right?
>>>
>>> For statd, we're using the same technique that we used when
>>> constructing
>>> the source address in nlmsvc_lookup_host(). There's no family tag
>>> associated with the address because the 16-byte opaque in the on-
>>> the-wire
>>> format has room only for the sin6_addr part of the address.
>>
>> OK. It seems a bit tricky, but I can't see why it doesn't work.
>
> Right. I couldn't think of something simpler.
>
> rpc.statd has to continue to work with legacy kernels where the first 4
> bytes of the opaque are a 32-bit sin_addr field and the following 12
> bytes are zero.

Wait a minute, there's something not right there: rpc.statd shouldn't
interpret the contents of the opaque field at all, should it?

--b.

> Apparently you can have a valid IPv6 address that looks
> like that, but I can't find anything that states this conclusively.

>
> The kernel needs to know the address family in order to interpret the
> contents of the opaque and turn them back into a sockaddr so it can be
> matched against an nlm_host.

2008-10-01 20:33:07

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Wed, Oct 01, 2008 at 04:08:39PM -0400, bfields wrote:
> On Wed, Oct 01, 2008 at 03:40:05PM -0400, Chuck Lever wrote:
> > On Oct 1, 2008, at Oct 1, 2008, 2:18 PM, J. Bruce Fields wrote:
> >> On Wed, Oct 01, 2008 at 12:17:02PM -0400, Chuck Lever wrote:
> >>> On Sep 26, 2008, at Sep 26, 2008, 7:09 PM, J. Bruce Fields wrote:
> >>>> On Wed, Sep 17, 2008 at 11:18:11AM -0500, Chuck Lever wrote:
> >>>>> The XDR decoders for the NSM NOTIFY procedure should construct a
> >>>>> full
> >>>>> socket address and store it in the nlm_reboot structure. In
> >>>>> addition
> >>>>> to being able to store larger addresses, this means upper layer
> >>>>> routines get an address family tag so they can distinguish between
> >>>>> AF_INET and AF_INET6 addresses.
> >>>>>
> >>>>> This also keeps potentially large socket addresses off the stack
> >>>>> and
> >>>>> instead in dynamically allocated storage.
> >>>>
> >>>> So one way to think of this would be that you're extending the
> >>>> kernel<->statd interface by using the address family of statd's
> >>>> notify
> >>>> call to communicate the address family of the host that rebooted.
> >>>>
> >>>> Do I have that right?
> >>>
> >>> For statd, we're using the same technique that we used when
> >>> constructing
> >>> the source address in nlmsvc_lookup_host(). There's no family tag
> >>> associated with the address because the 16-byte opaque in the on-
> >>> the-wire
> >>> format has room only for the sin6_addr part of the address.
> >>
> >> OK. It seems a bit tricky, but I can't see why it doesn't work.
> >
> > Right. I couldn't think of something simpler.
> >
> > rpc.statd has to continue to work with legacy kernels where the first 4
> > bytes of the opaque are a 32-bit sin_addr field and the following 12
> > bytes are zero.
>
> Wait a minute, there's something not right there: rpc.statd shouldn't
> interpret the contents of the opaque field at all, should it?

And from a quick look at nfs-utils/statd/ it certainly looks to me like
it's correctly treating the contents as a totally opaque object.

So I think we can put whatever we want in the priv field--an ipv6
address, an index into some kernel table, whatever. (The priv field
shouldn't even have to make sense to a future boot instance--we don't
need to be notified of previously monitored hosts' reboots any more once
we've rebooted ourselves.)

--b.

2008-10-01 20:43:07

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Oct 1, 2008, at Oct 1, 2008, 4:08 PM, J. Bruce Fields wrote:
> On Wed, Oct 01, 2008 at 03:40:05PM -0400, Chuck Lever wrote:
>> On Oct 1, 2008, at Oct 1, 2008, 2:18 PM, J. Bruce Fields wrote:
>>> On Wed, Oct 01, 2008 at 12:17:02PM -0400, Chuck Lever wrote:
>>>> On Sep 26, 2008, at Sep 26, 2008, 7:09 PM, J. Bruce Fields wrote:
>>>>> On Wed, Sep 17, 2008 at 11:18:11AM -0500, Chuck Lever wrote:
>>>>>> The XDR decoders for the NSM NOTIFY procedure should construct a
>>>>>> full
>>>>>> socket address and store it in the nlm_reboot structure. In
>>>>>> addition
>>>>>> to being able to store larger addresses, this means upper layer
>>>>>> routines get an address family tag so they can distinguish
>>>>>> between
>>>>>> AF_INET and AF_INET6 addresses.
>>>>>>
>>>>>> This also keeps potentially large socket addresses off the stack
>>>>>> and
>>>>>> instead in dynamically allocated storage.
>>>>>
>>>>> So one way to think of this would be that you're extending the
>>>>> kernel<->statd interface by using the address family of statd's
>>>>> notify
>>>>> call to communicate the address family of the host that rebooted.
>>>>>
>>>>> Do I have that right?
>>>>
>>>> For statd, we're using the same technique that we used when
>>>> constructing
>>>> the source address in nlmsvc_lookup_host(). There's no family tag
>>>> associated with the address because the 16-byte opaque in the on-
>>>> the-wire
>>>> format has room only for the sin6_addr part of the address.
>>>
>>> OK. It seems a bit tricky, but I can't see why it doesn't work.
>>
>> Right. I couldn't think of something simpler.
>>
>> rpc.statd has to continue to work with legacy kernels where the
>> first 4
>> bytes of the opaque are a 32-bit sin_addr field and the following 12
>> bytes are zero.
>
> Wait a minute, there's something not right there: rpc.statd shouldn't
> interpret the contents of the opaque field at all, should it?

Yes, our rpc.statd and lockd should agree on the contents and
interpretation of the opaque field. The field is opaque because it's
internal content is not defined by a standard. It's a cookie, just
like a file handle.

I would certainly *prefer* the use of an entirely arbitrary cookie,
but that's not the way Linux happens to work today.

>> Apparently you can have a valid IPv6 address that looks
>> like that, but I can't find anything that states this conclusively.
>>
>> The kernel needs to know the address family in order to interpret the
>> contents of the opaque and turn them back into a sockaddr so it can
>> be
>> matched against an nlm_host.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-10-01 20:48:54

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Oct 1, 2008, at Oct 1, 2008, 4:33 PM, J. Bruce Fields wrote:
> On Wed, Oct 01, 2008 at 04:08:39PM -0400, bfields wrote:
>> On Wed, Oct 01, 2008 at 03:40:05PM -0400, Chuck Lever wrote:
>>> On Oct 1, 2008, at Oct 1, 2008, 2:18 PM, J. Bruce Fields wrote:
>>>> On Wed, Oct 01, 2008 at 12:17:02PM -0400, Chuck Lever wrote:
>>>>> On Sep 26, 2008, at Sep 26, 2008, 7:09 PM, J. Bruce Fields wrote:
>>>>>> On Wed, Sep 17, 2008 at 11:18:11AM -0500, Chuck Lever wrote:
>>>>>>> The XDR decoders for the NSM NOTIFY procedure should construct a
>>>>>>> full
>>>>>>> socket address and store it in the nlm_reboot structure. In
>>>>>>> addition
>>>>>>> to being able to store larger addresses, this means upper layer
>>>>>>> routines get an address family tag so they can distinguish
>>>>>>> between
>>>>>>> AF_INET and AF_INET6 addresses.
>>>>>>>
>>>>>>> This also keeps potentially large socket addresses off the stack
>>>>>>> and
>>>>>>> instead in dynamically allocated storage.
>>>>>>
>>>>>> So one way to think of this would be that you're extending the
>>>>>> kernel<->statd interface by using the address family of statd's
>>>>>> notify
>>>>>> call to communicate the address family of the host that rebooted.
>>>>>>
>>>>>> Do I have that right?
>>>>>
>>>>> For statd, we're using the same technique that we used when
>>>>> constructing
>>>>> the source address in nlmsvc_lookup_host(). There's no family tag
>>>>> associated with the address because the 16-byte opaque in the on-
>>>>> the-wire
>>>>> format has room only for the sin6_addr part of the address.
>>>>
>>>> OK. It seems a bit tricky, but I can't see why it doesn't work.
>>>
>>> Right. I couldn't think of something simpler.
>>>
>>> rpc.statd has to continue to work with legacy kernels where the
>>> first 4
>>> bytes of the opaque are a 32-bit sin_addr field and the following 12
>>> bytes are zero.
>>
>> Wait a minute, there's something not right there: rpc.statd shouldn't
>> interpret the contents of the opaque field at all, should it?
>
> And from a quick look at nfs-utils/statd/ it certainly looks to me
> like
> it's correctly treating the contents as a totally opaque object.

That's exactly what happens. The cookie field is generated by the
SM_MON upcall, and passed back by the SM_NOTIFY downcall.

> So I think we can put whatever we want in the priv field--an ipv6
> address, an index into some kernel table, whatever. (The priv field
> shouldn't even have to make sense to a future boot instance--we don't
> need to be notified of previously monitored hosts' reboots any more
> once
> we've rebooted ourselves.)

Apparently, then, rpc.statd doesn't have enough knowledge to make the
downcall via IPv6 when needed. Everything we need to look up the host
must be encoded into the 16-byte opaque field.

I'll look into it.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2008-10-01 20:51:58

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Wed, Oct 01, 2008 at 04:42:51PM -0400, Chuck Lever wrote:
> On Oct 1, 2008, at Oct 1, 2008, 4:08 PM, J. Bruce Fields wrote:
>> On Wed, Oct 01, 2008 at 03:40:05PM -0400, Chuck Lever wrote:
>>> Right. I couldn't think of something simpler.
>>>
>>> rpc.statd has to continue to work with legacy kernels where the
>>> first 4
>>> bytes of the opaque are a 32-bit sin_addr field and the following 12
>>> bytes are zero.
>>
>> Wait a minute, there's something not right there: rpc.statd shouldn't
>> interpret the contents of the opaque field at all, should it?
>
> Yes, our rpc.statd and lockd should agree on the contents and
> interpretation of the opaque field. The field is opaque because it's
> internal content is not defined by a standard. It's a cookie, just like
> a file handle.
>
> I would certainly *prefer* the use of an entirely arbitrary cookie, but
> that's not the way Linux happens to work today.

I'm pretty certain it does--at least on the rpc.statd size. (Any
evidence to the contrary.) Sure, it means something to the kernel, but
that's just for the kernel's convenience. We can do whatever we want in
the kernel.

--b.

2008-10-01 20:52:37

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Wed, Oct 01, 2008 at 04:51:56PM -0400, bfields wrote:
> On Wed, Oct 01, 2008 at 04:42:51PM -0400, Chuck Lever wrote:
> > On Oct 1, 2008, at Oct 1, 2008, 4:08 PM, J. Bruce Fields wrote:
> >> On Wed, Oct 01, 2008 at 03:40:05PM -0400, Chuck Lever wrote:
> >>> Right. I couldn't think of something simpler.
> >>>
> >>> rpc.statd has to continue to work with legacy kernels where the
> >>> first 4
> >>> bytes of the opaque are a 32-bit sin_addr field and the following 12
> >>> bytes are zero.
> >>
> >> Wait a minute, there's something not right there: rpc.statd shouldn't
> >> interpret the contents of the opaque field at all, should it?
> >
> > Yes, our rpc.statd and lockd should agree on the contents and
> > interpretation of the opaque field. The field is opaque because it's
> > internal content is not defined by a standard. It's a cookie, just like
> > a file handle.
> >
> > I would certainly *prefer* the use of an entirely arbitrary cookie, but
> > that's not the way Linux happens to work today.
>
> I'm pretty certain it does--at least on the rpc.statd size. (Any
> evidence to the contrary.) Sure, it means something to the kernel, but

(Err, I think that parenthetical remark was meant to be a question.
Apologies.)

> that's just for the kernel's convenience. We can do whatever we want in
> the kernel.
>
> --b.

2008-10-01 20:55:21

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Wed, Oct 01, 2008 at 04:48:41PM -0400, Chuck Lever wrote:
> On Oct 1, 2008, at Oct 1, 2008, 4:33 PM, J. Bruce Fields wrote:
>> And from a quick look at nfs-utils/statd/ it certainly looks to me
>> like
>> it's correctly treating the contents as a totally opaque object.
>
> That's exactly what happens. The cookie field is generated by the
> SM_MON upcall, and passed back by the SM_NOTIFY downcall.

OK, that's great. So we can just

- Choose whatever contents for the "priv"/cookie field we want
to keep the lookup on receipt of SM_NOTIFY easy, and
- remove the support for ipv6 on the loopback communication
with statd; we don't need it.

>> So I think we can put whatever we want in the priv field--an ipv6
>> address, an index into some kernel table, whatever. (The priv field
>> shouldn't even have to make sense to a future boot instance--we don't
>> need to be notified of previously monitored hosts' reboots any more
>> once
>> we've rebooted ourselves.)
>
> Apparently, then, rpc.statd doesn't have enough knowledge to make the
> downcall via IPv6 when needed. Everything we need to look up the host
> must be encoded into the 16-byte opaque field.
>
> I'll look into it.

Thanks!

--b.

2008-10-01 21:16:57

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address


On Oct 1, 2008, at Oct 1, 2008, 4:55 PM, J. Bruce Fields wrote:

> On Wed, Oct 01, 2008 at 04:48:41PM -0400, Chuck Lever wrote:
>> On Oct 1, 2008, at Oct 1, 2008, 4:33 PM, J. Bruce Fields wrote:
>>> And from a quick look at nfs-utils/statd/ it certainly looks to me
>>> like
>>> it's correctly treating the contents as a totally opaque object.
>>
>> That's exactly what happens. The cookie field is generated by the
>> SM_MON upcall, and passed back by the SM_NOTIFY downcall.
>
> OK, that's great. So we can just
>
> - Choose whatever contents for the "priv"/cookie field we want
> to keep the lookup on receipt of SM_NOTIFY easy, and
> - remove the support for ipv6 on the loopback communication
> with statd; we don't need it.

I agree that since lockd generates the cookies, we can be smarter
about how lockd deals with cookies coming back from statd, and I
strongly prefer an approach that treats the opaques as arbitrary
rather than interpreted values.

I think we may need to keep some IPv6 support in lockd for loopback
down calls. On an AF_INET6 listener socket, loopback calls will come
from the IPv6 loopback address, AFAICT, even if they are sent from
user space via the IPv4 loopback address. The kernel's network layer
will map the IPv4 loopback address to the IPv6 loopback address
automatically.

I am working on revising last week's patch set based on your
comments. I plan to repost an updated version of these soon, but I
can leave out the last two or three patches that deal solely with the
mechanics of these NSM-related cookies.

>>> So I think we can put whatever we want in the priv field--an ipv6
>>> address, an index into some kernel table, whatever. (The priv field
>>> shouldn't even have to make sense to a future boot instance--we
>>> don't
>>> need to be notified of previously monitored hosts' reboots any more
>>> once
>>> we've rebooted ourselves.)
>>
>> Apparently, then, rpc.statd doesn't have enough knowledge to make the
>> downcall via IPv6 when needed. Everything we need to look up the
>> host
>> must be encoded into the 16-byte opaque field.
>>
>> I'll look into it.
>
> Thanks!
>
> --b.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2008-10-01 21:30:04

by Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH 08/10] lockd: struct nlm_reboot should contain a full socket address

On Wed, Oct 01, 2008 at 05:16:45PM -0400, Chuck Lever wrote:
>
> On Oct 1, 2008, at Oct 1, 2008, 4:55 PM, J. Bruce Fields wrote:
>
>> On Wed, Oct 01, 2008 at 04:48:41PM -0400, Chuck Lever wrote:
>>> On Oct 1, 2008, at Oct 1, 2008, 4:33 PM, J. Bruce Fields wrote:
>>>> And from a quick look at nfs-utils/statd/ it certainly looks to me
>>>> like
>>>> it's correctly treating the contents as a totally opaque object.
>>>
>>> That's exactly what happens. The cookie field is generated by the
>>> SM_MON upcall, and passed back by the SM_NOTIFY downcall.
>>
>> OK, that's great. So we can just
>>
>> - Choose whatever contents for the "priv"/cookie field we want
>> to keep the lookup on receipt of SM_NOTIFY easy, and
>> - remove the support for ipv6 on the loopback communication
>> with statd; we don't need it.
>
> I agree that since lockd generates the cookies, we can be smarter about
> how lockd deals with cookies coming back from statd, and I strongly
> prefer an approach that treats the opaques as arbitrary rather than
> interpreted values.
>
> I think we may need to keep some IPv6 support in lockd for loopback down
> calls. On an AF_INET6 listener socket, loopback calls will come from the
> IPv6 loopback address, AFAICT, even if they are sent from user space via
> the IPv4 loopback address. The kernel's network layer will map the IPv4
> loopback address to the IPv6 loopback address automatically.

OK. (Well, I guess we could do whatever we'd like here, including using
a separate socket and even program for the notification downcall, but
doing what you propose is probably simplest.)

> I am working on revising last week's patch set based on your comments. I
> plan to repost an updated version of these soon, but I can leave out the
> last two or three patches that deal solely with the mechanics of these
> NSM-related cookies.

Sounds fine, thanks!

--b.