2011-11-18 14:35:04

by John Hughes

Subject: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

Description: Add "-e" (ticket expiry is error) option to rpc.gssd
In kernels starting around 2.6.34 the nfs4 server will block all I/O
when a user ticket expires. In earlier kernels the I/O would fail
with an EACCES error. This patch adds a "-e" option to rpc.gssd
which allows the earlier behaviour (EKEYEXPIRED is converted to
EACCES). This behaviour is particularly useful when user home
directories are nfs4 mounted with krb5 security - if the user is
absent from their workstation for long enough for the ticket to
expire a new ticket will be obtained (via pam_krb5) by the screen
unlock process.
Author: John Hughes <[email protected]>
Signed-off-by: John Hughes <[email protected]>
Bug-Debian: http://bugs.debian.org/648155
Bug-Ubuntu: https://launchpad.net/bugs/648155

--- nfs-utils-1.2.5.orig/utils/gssd/gssd_proc.c
+++ nfs-utils-1.2.5/utils/gssd/gssd_proc.c
@@ -1007,7 +1007,7 @@ process_krb5_upcall(struct clnt_info *cl
/* Tell krb5 gss which credentials cache to use */
for (dirname = ccachesearch; *dirname != NULL; dirname++) {
err = gssd_setup_krb5_user_gss_ccache(uid, clp->servername, *dirname);
- if (err == -EKEYEXPIRED)
+ if (err == -EKEYEXPIRED && !ticket_expiry_is_error)
downcall_err = -EKEYEXPIRED;
else if (!err)
create_resp = create_auth_rpc_client(clp, &rpc_clnt, &auth, uid,
--- nfs-utils-1.2.5.orig/utils/gssd/gssd.c
+++ nfs-utils-1.2.5/utils/gssd/gssd.c
@@ -63,6 +63,7 @@ int use_memcache = 0;
int root_uses_machine_creds = 1;
unsigned int context_timeout = 0;
char *preferred_realm = NULL;
+int ticket_expiry_is_error = 0;

void
sig_die(int signal)
@@ -85,7 +86,7 @@ sig_hup(int signal)
static void
usage(char *progname)
{
- fprintf(stderr, "usage: %s [-f] [-M] [-n] [-v] [-r] [-p pipefsdir] [-k keytab] [-d ccachedir] [-t timeout] [-R preferred realm]\n",
+ fprintf(stderr, "usage: %s [-e] [-f] [-M] [-n] [-v] [-r] [-p pipefsdir] [-k keytab] [-d ccachedir] [-t timeout] [-R preferred realm]\n",
progname);
exit(1);
}
@@ -102,8 +103,11 @@ main(int argc, char *argv[])
char *progname;

memset(ccachesearch, 0, sizeof(ccachesearch));
- while ((opt = getopt(argc, argv, "fvrmnMp:k:d:t:R:")) != -1) {
+ while ((opt = getopt(argc, argv, "efvrmnMp:k:d:t:R:")) != -1) {
switch (opt) {
+ case 'e':
+ ticket_expiry_is_error = 1;
+ break;
case 'f':
fg = 1;
break;
--- nfs-utils-1.2.5.orig/utils/gssd/gssd.h
+++ nfs-utils-1.2.5/utils/gssd/gssd.h
@@ -66,6 +66,7 @@ extern int use_memcache;
extern int root_uses_machine_creds;
extern unsigned int context_timeout;
extern char *preferred_realm;
+extern int ticket_expiry_is_error;

TAILQ_HEAD(clnt_list_head, clnt_info) clnt_list;

diff --git a/utils/gssd/gssd.man b/utils/gssd/gssd.man
index 073379d..e2b7b7a 100644
--- a/utils/gssd/gssd.man
+++ b/utils/gssd/gssd.man
@@ -6,7 +6,7 @@
.SH NAME
rpc.gssd \- rpcsec_gss daemon
.SH SYNOPSIS
-.B "rpc.gssd [-f] [-n] [-k keytab] [-p pipefsdir] [-v] [-r] [-d ccachedir]"
+.B "rpc.gssd [-e] [-f] [-n] [-k keytab] [-p pipefsdir] [-v] [-r] [-d ccachedir]"
.SH DESCRIPTION
The rpcsec_gss protocol gives a means of using the gss-api generic security
api to provide security for protocols using rpc (in particular, nfs). Before
@@ -20,6 +20,24 @@ daemon uses files in the rpc_pipefs filesystem to communicate with the kernel.

.SH OPTIONS
.TP
+.B -e
+Versions of
+.B rpc.gssd
+before 1.2.2 reported ticket expiry to the kernel as
+.B EACCES
+(permission denied). More recent versions return
+.B EKEYEXPIRED
+which causes recent kernels to block all I/O to an NFS mount until a new
+key is obtained. The
+.B -e
+option restores the old behaviour.
+
+This is useful in the common case that the user home directories are
+nfs mounted. Without the
+.B -e
+option the user may have difficulty getting a new ticket as she will
+only find out about the expiry of the old one when her processes hang.
+.TP
.B -f
Runs
.B rpc.gssd



2011-11-18 20:34:13

by Myklebust, Trond

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On Fri, 2011-11-18 at 20:19 +0100, John Hughes wrote:
> On 11/18/2011 07:35 PM, Trond Myklebust wrote:
> > On Fri, 2011-11-18 at 15:34 +0100, John Hughes wrote:
> >
> >> Description: Add "-e" (ticket expiry is error) option to rpc.gssd
> >> In kernels starting around 2.6.34 the nfs4 server will block all I/O
> >> when a user ticket expires. In earlier kernels the I/O would fail
> >> with an EACCESS error. This patch adds a "-e" option to rpc.gssd
> >> which allow the earlier behaviour (EKEYEXPIRED is converted to
> >> EACCESS). This behaviour is particularly useful when user home
> >> directories are nfs4 mounted with krb5 security - if the user is
> >> absent from their workstation for long enough for the ticket to
> >> expire a new ticket will be obtained (via pam_krb5) by the screen
> >> unlock process.
> >>
> > You need a big fat warning somewhere that enabling this option WILL
> > cause data corruption...
> >
> Why?
>
> Because some process may get the EACCES error half way through it's
> operation.

No. Because the process can receive a reply to the write() syscall that
indicates that the data is safe, but the EKEYEXPIRED error will cause
the data to be lost when the client tries to actually commit the data to
disk.

> Ok, that needs documenting.
>
> So far we seem to have established that the old way of doing things was
> bad because it produced non-posix behaviour and could lead to data
> corruption if a ticket expires while a process needs it.
>
> And the new way is bad because it leaves people puzzling over hung
> workstations in the morning.
>
> The traditional Kerberos/AFS way was to behave the old way, and use
> krenew to keep the ticket from expiring if a process needed to be run
> overnight.

Which is just wrong: the general intention of kerberos security is to
ensure that the _user_ has ACKed an operation. Renewing tickets without
user input would circumvent that intention. If you need to have the job
run overnight, then ask for a longer lifetime for your ticket.

> What other way is there of fixing the problem if we are going to keep
> the "hang 'till a ticket turns up" behaviour? (rewrite gnome and kde
> seems kind of a big job).

Notify the kernel that a ticket is about to expire so that the kernel
can decide to block the process on the next NFS-related syscall.

Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-18 21:04:01

by Myklebust, Trond

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On Fri, 2011-11-18 at 15:57 -0500, Jim Rees wrote:
> Trond Myklebust wrote:
>
> On Fri, 2011-11-18 at 20:19 +0100, John Hughes wrote:
> > On 11/18/2011 07:35 PM, Trond Myklebust wrote:
> > > On Fri, 2011-11-18 at 15:34 +0100, John Hughes wrote:
> > >
> > >> Description: Add "-e" (ticket expiry is error) option to rpc.gssd
> > >> In kernels starting around 2.6.34 the nfs4 server will block all I/O
> > >> when a user ticket expires. In earlier kernels the I/O would fail
> > >> with an EACCESS error. This patch adds a "-e" option to rpc.gssd
> > >> which allow the earlier behaviour (EKEYEXPIRED is converted to
> > >> EACCESS). This behaviour is particularly useful when user home
> > >> directories are nfs4 mounted with krb5 security - if the user is
> > >> absent from their workstation for long enough for the ticket to
> > >> expire a new ticket will be obtained (via pam_krb5) by the screen
> > >> unlock process.
> > >>
> > > You need a big fat warning somewhere that enabling this option WILL
> > > cause data corruption...
> > >
> > Why?
> >
> > Because some process may get the EACCES error half way through it's
> > operation.
>
> No. Because the process can receive a reply to the write() syscall that
> indicates that the data is safe, but the EKEYEXPIRED error will cause
> the data to be lost when the client tries to actually commit the data to
> disk.
>
> The write() syscall doesn't indicate whether the data is safe or not. That
> would be the close() syscall.

fsync(). Which may succeed if the user renews their ticket first.
However you may still have data loss if dirty data has been lost because
of EKEYEXPIRED returns on the WRITE RPC call...

Also, for the fsync() to return EKEYEXPIRED _after_ the user has renewed
their ticket would seem counter-intuitive to most people.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-18 20:54:58

by Myklebust, Trond

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On Fri, 2011-11-18 at 15:47 -0500, Nick Bowler wrote:
> On 2011-11-18 22:33 +0200, Trond Myklebust wrote:
> > On Fri, 2011-11-18 at 20:19 +0100, John Hughes wrote:
> > > On 11/18/2011 07:35 PM, Trond Myklebust wrote:
> > > > On Fri, 2011-11-18 at 15:34 +0100, John Hughes wrote:
> > > >
> > > >> Description: Add "-e" (ticket expiry is error) option to rpc.gssd
> > > >> In kernels starting around 2.6.34 the nfs4 server will block all I/O
> > > >> when a user ticket expires. In earlier kernels the I/O would fail
> > > >> with an EACCESS error. This patch adds a "-e" option to rpc.gssd
> > > >> which allow the earlier behaviour (EKEYEXPIRED is converted to
> > > >> EACCESS). This behaviour is particularly useful when user home
> > > >> directories are nfs4 mounted with krb5 security - if the user is
> > > >> absent from their workstation for long enough for the ticket to
> > > >> expire a new ticket will be obtained (via pam_krb5) by the screen
> > > >> unlock process.
> > > >>
> > > > You need a big fat warning somewhere that enabling this option WILL
> > > > cause data corruption...
> > > >
> > > Why?
> > >
> > > Because some process may get the EACCES error half way through it's
> > > operation.
> >
> > No. Because the process can receive a reply to the write() syscall that
> > indicates that the data is safe, but the EKEYEXPIRED error will cause
> > the data to be lost when the client tries to actually commit the data to
> > disk.
>
> But on a local disk, a successful return from the write syscall doesn't
> mean "the data is safe". It seems odd to me that NFS should provide
> this guarantee while a local disk does not.

The guarantee that POSIX offers is that if the close() or fsync()
succeeds, then the data is guaranteed to be on disk. That is the same
guarantee that NFS is supposed to offer.

Allowing data to disappear into a black hole after a successful write()
and before the client has a chance to fsync() is not sanctioned by POSIX
or anything else.

> Is this guarantee documented anywhere?

Yes. It is the same guarantee as POSIX offers.
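
The guarantee described above only helps an application that checks the results of fsync() and close(). A minimal sketch of that pattern in C follows. The helper name save_durably is hypothetical and is not taken from this thread.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and return 0 only if
 * fsync() and close() both succeed. On failure it returns -1 and leaves
 * errno set by the call that failed.
 */
int save_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the write */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* A successful write() only means the data reached the cache.
     * Errors from writeback are reported here at the earliest. */
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    /* close() can also report a deferred write error, so check it. */
    if (close(fd) < 0)
        return -1;

    return 0;
}
```

A caller that ignores the return value has no way to learn that cached data never reached the server.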

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-18 22:07:53

by John Hughes

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On 11/18/2011 09:33 PM, Trond Myklebust wrote:
> On Fri, 2011-11-18 at 20:19 +0100, John Hughes wrote:
>
>> On 11/18/2011 07:35 PM, Trond Myklebust wrote:
>>
>>>
>>> You need a big fat warning somewhere that enabling this option WILL
>>> cause data corruption...
>>>
>>>
>> Why?
>>
>> Because some process may get the EACCES error half way through it's
>> operation.
>>
> No. Because the process can receive a reply to the write() syscall that
> indicates that the data is safe,

There is no reply from "write(2)" that says the data is safe.

> but the EKEYEXPIRED error will cause
> the data to be lost when the client tries to actually commit the data to
> disk.
>
>
>> The traditional Kerberos/AFS way was to behave the old way, and use
>> krenew to keep the ticket from expiring if a process needed to be run
>> overnight.
>>
> Which is just wrong: the general intention of kerberos security is to
> ensure that the _user_ has ACKed an operation. Renewing tickets without
> user input would circumvent that intention. If you need to have the job
> run overnight, then ask for a longer lifetime for your ticket.
>

Ok, so no need for the hang on ticket expired then.

(Although I don't think renewable tickets and krenew are a figment of my
imagination).

>
>> What other way is there of fixing the problem if we are going to keep
>> the "hang 'till a ticket turns up" behaviour? (rewrite gnome and kde
>> seems kind of a big job).
>>
> Notify the kernel that a ticket is about to expire so that the kernel
> can decide to block the process on the next NFS-related syscall.
>
>
I don't understand. How is it a win to block processes *before* the
ticket has expired?



2011-11-18 22:37:36

by Myklebust, Trond

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On Fri, 2011-11-18 at 23:33 +0100, John Hughes wrote:
> On 11/18/2011 10:03 PM, Trond Myklebust wrote:
> > On Fri, 2011-11-18 at 15:57 -0500, Jim Rees wrote:
> >
> >>
> >> The write() syscall doesn't indicate whether the data is safe or not. That
> >> would be the close() syscall.
> >>
> > fsync(). Which may succeed if the user renews their ticket first.
> > However you may still have data loss if dirty data has been lost because
> > of EKEYEXPIRED returns on the WRITE RPC call...
> >
> Only if the write(2) returned EKEYEXPIRED, surely,

What part of "write is asynchronous" is so hard to understand?

> > Also, for the fsync() to return EKEYEXPIRED _after_ the user has renewed
> > their ticket would seem counter-intuitive to most people.
> >
>
> I would want to know if data was lost.
>
> Intuition means nothing if I get an error.
>
> If it were possible I'd like:
>
> 1. write works
> 1a. WRITE RPC fails, data stays in cache
> 2. ticket renewed
> 3. fsync works, data written

Which is _exactly_ how it works today, so what is the problem?
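
The numbered sequence quoted above can be illustrated with a toy model in C. Everything in it is hypothetical: none of these names exist in the kernel or in nfs-utils, and it assumes that reporting expiry as a hard error makes the client discard cached dirty data, which is the risk raised earlier in the thread. Having toy_fsync() report the loss is a modelling choice, not a statement about what a real client does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an NFS client cache. Not real kernel or nfs-utils code. */
struct toy_client {
    bool ticket_valid;     /* user holds an unexpired ticket */
    bool expiry_is_error;  /* models rpc.gssd started with -e */
    size_t dirty;          /* bytes accepted by write(), not yet on server */
    size_t on_server;      /* bytes the server has stored */
    size_t lost;           /* bytes dropped after a failed WRITE RPC */
};

/* write(2): data only enters the cache, so it succeeds without a ticket. */
void toy_write(struct toy_client *c, size_t n)
{
    c->dirty += n;
}

/* Background writeback: the WRITE RPC needs a valid ticket. */
void toy_writeback(struct toy_client *c)
{
    if (c->ticket_valid) {
        c->on_server += c->dirty;
        c->dirty = 0;
    } else if (c->expiry_is_error) {
        /* Hard error: the model drops the dirty data. */
        c->lost += c->dirty;
        c->dirty = 0;
    }
    /* Otherwise the RPC waits for a ticket and the data stays cached. */
}

/*
 * fsync(2): try to flush. Returns 0 only if nothing is left dirty and
 * nothing was lost. A real client would block while the ticket is
 * expired; the model returns -1 instead of hanging.
 */
int toy_fsync(struct toy_client *c)
{
    toy_writeback(c);
    return (c->dirty == 0 && c->lost == 0) ? 0 : -1;
}
```

In the model, the default behaviour keeps the data cached until the ticket is renewed and the later toy_fsync() succeeds, while the -e behaviour loses the data as soon as writeback runs with an expired ticket.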

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-18 19:18:15

by John Hughes

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On 11/18/2011 07:35 PM, Trond Myklebust wrote:
> On Fri, 2011-11-18 at 15:34 +0100, John Hughes wrote:
>
>> Description: Add "-e" (ticket expiry is error) option to rpc.gssd
>> In kernels starting around 2.6.34 the nfs4 server will block all I/O
>> when a user ticket expires. In earlier kernels the I/O would fail
>> with an EACCESS error. This patch adds a "-e" option to rpc.gssd
>> which allow the earlier behaviour (EKEYEXPIRED is converted to
>> EACCESS). This behaviour is particularly useful when user home
>> directories are nfs4 mounted with krb5 security - if the user is
>> absent from their workstation for long enough for the ticket to
>> expire a new ticket will be obtained (via pam_krb5) by the screen
>> unlock process.
>>
> You need a big fat warning somewhere that enabling this option WILL
> cause data corruption...
>
Why?

Because some process may get the EACCES error half way through its
operation.

Ok, that needs documenting.

So far we seem to have established that the old way of doing things was
bad because it produced non-posix behaviour and could lead to data
corruption if a ticket expires while a process needs it.

And the new way is bad because it leaves people puzzling over hung
workstations in the morning.

The traditional Kerberos/AFS way was to behave the old way, and use
krenew to keep the ticket from expiring if a process needed to be run
overnight.

What other way is there of fixing the problem if we are going to keep
the "hang 'till a ticket turns up" behaviour? (rewrite gnome and kde
seems kind of a big job).


2011-11-18 18:35:34

by Myklebust, Trond

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On Fri, 2011-11-18 at 15:34 +0100, John Hughes wrote:
> Description: Add "-e" (ticket expiry is error) option to rpc.gssd
> In kernels starting around 2.6.34 the nfs4 server will block all I/O
> when a user ticket expires. In earlier kernels the I/O would fail
> with an EACCESS error. This patch adds a "-e" option to rpc.gssd
> which allow the earlier behaviour (EKEYEXPIRED is converted to
> EACCESS). This behaviour is particularly useful when user home
> directories are nfs4 mounted with krb5 security - if the user is
> absent from their workstation for long enough for the ticket to
> expire a new ticket will be obtained (via pam_krb5) by the screen
> unlock process.

You need a big fat warning somewhere that enabling this option WILL
cause data corruption...

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-18 22:38:09

by Myklebust, Trond

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On Fri, 2011-11-18 at 23:08 +0100, John Hughes wrote:
> On 11/18/2011 09:33 PM, Trond Myklebust wrote:
> > On Fri, 2011-11-18 at 20:19 +0100, John Hughes wrote:
> >
> >> On 11/18/2011 07:35 PM, Trond Myklebust wrote:
> >>
> >>>
> >>> You need a big fat warning somewhere that enabling this option WILL
> >>> cause data corruption...
> >>>
> >>>
> >> Why?
> >>
> >> Because some process may get the EACCES error half way through it's
> >> operation.
> >>
> > No. Because the process can receive a reply to the write() syscall that
> > indicates that the data is safe,
>
> There is no reply from "write(2)" that says the data is safe.

Read the rest of the thread... Jim and Nick already made this point, and
I've replied.

The fact of the matter is that most application writers remain blithely
oblivious of the need to fsync() as the ext4 people know all too well:
see the attempts to impose the fully posix-compatible 'data=writeback'
mode as the default and the catastrophe that occurred when
'data=ordered' semantics changed for the rename() syscall. Adding new
failure modes needs to be done with care, or GNOME will crash and/or
your word processor _will_ lose your last hour or so of work.

> > but the EKEYEXPIRED error will cause
> > the data to be lost when the client tries to actually commit the data to
> > disk.
> >
> >
> >> The traditional Kerberos/AFS way was to behave the old way, and use
> >> krenew to keep the ticket from expiring if a process needed to be run
> >> overnight.
> >>
> > Which is just wrong: the general intention of kerberos security is to
> > ensure that the _user_ has ACKed an operation. Renewing tickets without
> > user input would circumvent that intention. If you need to have the job
> > run overnight, then ask for a longer lifetime for your ticket.
> >
>
> Ok, so no need for the hang on ticket expired then.
>
> (Although I don't think renewable tickets and krenew are a figment of my
> imagination).

They are a workaround to the problem of users failing to plan ahead
and/or jobs not running as quickly as originally scheduled. You can run
them if you feel safe doing so, but they should not be a mandatory
feature to ensure data isn't lost during normal operation.

BTW: instead of trying to change existing kernel and gssd semantics, why
not concentrate on adding the equivalent of krenewd/kstart?

> >> What other way is there of fixing the problem if we are going to keep
> >> the "hang 'till a ticket turns up" behaviour? (rewrite gnome and kde
> >> seems kind of a big job).
> >>
> > Notify the kernel that a ticket is about to expire so that the kernel
> > can decide to block the process on the next NFS-related syscall.
> >
> >
> I don't understand. How is it a win to block processes *before* the
> ticket has expired?

So that the kernel can flush out any dirty data while the ticket is
valid, and block further asynchronous read/write operations (either by
making them synchronous, or by returning an appropriate error). We may
also want to consider closing open file state and possibly freeing up
locks so that other processes with valid credentials may still access
the data.

IOW: that allows for an _orderly_ failure mode instead of the current
catastrophic mode. The application gets a chance to deal with any errors
_before_ the data is lost.
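
The idea of reacting before expiry rather than after it can be sketched as a small policy function in C. This is only an illustration of the proposal, not an existing kernel or rpc.gssd interface: the enum, the function name and the margin parameter are all hypothetical.

```c
#include <time.h>

/* Hypothetical I/O policy states for a mount, keyed on ticket lifetime. */
enum toy_io_mode {
    TOY_IO_ASYNC,   /* ticket comfortably valid: normal cached writes */
    TOY_IO_SYNC,    /* close to expiry: flush dirty data, write synchronously */
    TOY_IO_BLOCKED  /* expired: block new I/O or return an error */
};

/*
 * Pick a mode from the current time, the ticket expiry time and a
 * warning margin in seconds (an assumed tunable).
 */
enum toy_io_mode toy_io_mode_for(time_t now, time_t expiry, time_t margin)
{
    if (now >= expiry)
        return TOY_IO_BLOCKED;
    if (expiry - now <= margin)
        return TOY_IO_SYNC;
    return TOY_IO_ASYNC;
}
```

The point of the middle state is that dirty data is written out while the credentials still work, so an application sees any error on the call that caused it.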

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-11-18 22:33:06

by John Hughes

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On 11/18/2011 10:03 PM, Trond Myklebust wrote:
> On Fri, 2011-11-18 at 15:57 -0500, Jim Rees wrote:
>
>>
>> The write() syscall doesn't indicate whether the data is safe or not. That
>> would be the close() syscall.
>>
> fsync(). Which may succeed if the user renews their ticket first.
> However you may still have data loss if dirty data has been lost because
> of EKEYEXPIRED returns on the WRITE RPC call...
>
Only if the write(2) returned EKEYEXPIRED, surely?

> Also, for the fsync() to return EKEYEXPIRED _after_ the user has renewed
> their ticket would seem counter-intuitive to most people.
>

I would want to know if data was lost.

Intuition means nothing if I get an error.

If it were possible I'd like:

1. write works
1a. WRITE RPC fails, data stays in cache
2. ticket renewed
3. fsync works, data written

But:

1. write...
1a. WRITE RPC fails
1b. ... fails

seems ok.

Even

1. write works
1a. WRITE RPC fails
2. ticket renewed
3. fsync fails

would be ok for me. (light cone problems?)


2011-11-18 22:46:15

by John Hughes

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On 11/18/2011 11:37 PM, Trond Myklebust wrote:
> On Fri, 2011-11-18 at 23:33 +0100, John Hughes wrote:
>
>> On 11/18/2011 10:03 PM, Trond Myklebust wrote:
>>
>>> On Fri, 2011-11-18 at 15:57 -0500, Jim Rees wrote:
>>>
>>>
>>>> The write() syscall doesn't indicate whether the data is safe or not. That
>>>> would be the close() syscall.
>>>>
>>>>
>>> fsync(). Which may succeed if the user renews their ticket first.
>>> However you may still have data loss if dirty data has been lost because
>>> of EKEYEXPIRED returns on the WRITE RPC call...
>>>
>>>
>> Only if the write(2) returned EKEYEXPIRED, surely,
>>
> What part of "write is asynchronous" is so hard to understand?
>

If write succeeds,
and the write rpc fails
and data is lost
and fsync succeeds
then the nfs client is broken.

d'accord?

>
>> I would want to know if data was lost.
>> Intuition means nothing if I get an error.
>>
>> If it were possible I'd like:
>>
>> 1. write works
>> 1a. WRITE RPC fails, data stays in cache
>> 2. ticket renewed
>> 3. fsync works, data written
>>
> Which is _exactly_ how it works today, so what is the problem?
>
>
Well, the hang after step 1a.

If there is to be a hang I'd like it when the fsync is done.

(And no hang if no fsync).



2011-11-18 22:56:45

by John Hughes

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On 11/18/2011 11:38 PM, Trond Myklebust wrote:
> The fact of the matter is that most application writers remain blithely
> oblivious of the need to fsync() as the ext4 people know all to well:
> see the attempts to impose the fully posix-compatible 'data=writeback'
> mode as the default and the catastrophe that occurred when
> 'data=ordered' semantics changed for the rename() syscall. Adding new
> failure modes needs to be done with care, or GNOME will crash and/or
> your word processor _will_ lose your last hour or so of work.
>
1. It's not a new failure mode. It's how things worked before 2.6.34.
2. My ticket usually expires in the middle of the night. I get back in
the morning and have to reboot the workstation because it's hung.
Anything left in the wordprocessor is lost.

Before this change I got back in the morning, the unlock screen popped
up, I entered my password and got a new ticket, everyone was happy.

2011-11-18 20:57:56

by Jim Rees

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

Trond Myklebust wrote:

On Fri, 2011-11-18 at 20:19 +0100, John Hughes wrote:
> On 11/18/2011 07:35 PM, Trond Myklebust wrote:
> > On Fri, 2011-11-18 at 15:34 +0100, John Hughes wrote:
> >
> >> Description: Add "-e" (ticket expiry is error) option to rpc.gssd
> >> In kernels starting around 2.6.34 the nfs4 server will block all I/O
> >> when a user ticket expires. In earlier kernels the I/O would fail
> >> with an EACCESS error. This patch adds a "-e" option to rpc.gssd
> >> which allow the earlier behaviour (EKEYEXPIRED is converted to
> >> EACCESS). This behaviour is particularly useful when user home
> >> directories are nfs4 mounted with krb5 security - if the user is
> >> absent from their workstation for long enough for the ticket to
> >> expire a new ticket will be obtained (via pam_krb5) by the screen
> >> unlock process.
> >>
> > You need a big fat warning somewhere that enabling this option WILL
> > cause data corruption...
> >
> Why?
>
> Because some process may get the EACCES error half way through it's
> operation.

No. Because the process can receive a reply to the write() syscall that
indicates that the data is safe, but the EKEYEXPIRED error will cause
the data to be lost when the client tries to actually commit the data to
disk.

The write() syscall doesn't indicate whether the data is safe or not. That
would be the close() syscall.

2011-11-18 21:46:14

by Nick Bowler

Subject: Re: [PATCH] Add "-e" option to rpc.gssd to allow error on ticket expiry. Try 2 with added man pages.

On 2011-11-18 22:33 +0200, Trond Myklebust wrote:
> On Fri, 2011-11-18 at 20:19 +0100, John Hughes wrote:
> > On 11/18/2011 07:35 PM, Trond Myklebust wrote:
> > > On Fri, 2011-11-18 at 15:34 +0100, John Hughes wrote:
> > >
> > >> Description: Add "-e" (ticket expiry is error) option to rpc.gssd
> > >> In kernels starting around 2.6.34 the nfs4 server will block all I/O
> > >> when a user ticket expires. In earlier kernels the I/O would fail
> > >> with an EACCESS error. This patch adds a "-e" option to rpc.gssd
> > >> which allow the earlier behaviour (EKEYEXPIRED is converted to
> > >> EACCESS). This behaviour is particularly useful when user home
> > >> directories are nfs4 mounted with krb5 security - if the user is
> > >> absent from their workstation for long enough for the ticket to
> > >> expire a new ticket will be obtained (via pam_krb5) by the screen
> > >> unlock process.
> > >>
> > > You need a big fat warning somewhere that enabling this option WILL
> > > cause data corruption...
> > >
> > Why?
> >
> > Because some process may get the EACCES error half way through it's
> > operation.
>
> No. Because the process can receive a reply to the write() syscall that
> indicates that the data is safe, but the EKEYEXPIRED error will cause
> the data to be lost when the client tries to actually commit the data to
> disk.

But on a local disk, a successful return from the write syscall doesn't
mean "the data is safe". It seems odd to me that NFS should provide
this guarantee while a local disk does not.

Is this guarantee documented anywhere?

Cheers,
--
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)