2002-07-29 11:14:24

by Jan Hudec

Subject: Race in open(O_CREAT|O_EXCL) and network filesystem

Hi all,

maybe I'm blind, but I think there is a race featuring
open(O_CREAT|O_EXCL) and nfs or any other network fs.

What may happen is:

client A: open_namei looks up the inode
          driver queries server and gets ENOENT
client B: open_namei looks up the inode
          driver queries server and gets ENOENT
client A: open_namei calls create method
          driver requests file to be created and is successful
client B: open_namei calls create method
          driver requests file to be created and, since it does not know
          about O_EXCL, can't specify exclusion, thus is successful
client A: open_namei does no more checks and thus open succeeds
client B: open succeeds too here - and it shouldn't

Many applications rely on this working correctly (especially mailbox
locking, which uses exclusive creates, and mounting mailboxes over NFS
is quite common).
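
For context, the locking idiom referred to above can be sketched roughly as follows. This is a minimal illustration, not code from any particular mail program; the helper names lockfile_acquire and lockfile_release are made up for the example. The whole scheme depends on the check for existence and the creation being a single atomic step.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to take a lock by exclusively creating `path`.
 * Returns an open file descriptor on success. If the file already exists,
 * open() fails with -1 and errno set to EEXIST, meaning the lock is held. */
int lockfile_acquire(const char *path)
{
    assert(path != NULL);
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}

/* Hypothetical helper: drop the lock by closing and removing the file. */
int lockfile_release(const char *path, int fd)
{
    close(fd);
    return unlink(path);
}
```

If two clients can both get a descriptor back from lockfile_acquire() for the same path, both believe they own the mailbox.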

So, can someone please answer:

1) Is there some reason this can't happen that I overlooked?
2) If it is a problem (a comment in the NFS code suggests so), I can see
two ways of handling this. Either pass the flags to the create method,
or restart the open when create returns EEXIST. Which one would be
preferred?
3) How can NFS be fixed to add the exclusive flag to the NFSv3 request?
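
The interleaving described above, and the effect of passing the exclusive flag down to the create step, can be replayed in a small userspace model. This is only a sketch under stated assumptions: toy_server, toy_lookup, toy_create and toy_race are invented names, the "server" holds a single file name, and none of this is kernel or NFS code.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy "server" holding a single name; stands in for the remote filesystem. */
struct toy_server {
    bool exists;
};

/* Lookup: 0 if the name exists, -ENOENT otherwise. */
int toy_lookup(const struct toy_server *s)
{
    return s->exists ? 0 : -ENOENT;
}

/* Create: with `exclusive` set, the server itself rejects an existing name.
 * Without it, creating an existing name silently succeeds. */
int toy_create(struct toy_server *s, bool exclusive)
{
    if (s->exists && exclusive)
        return -EEXIST;
    s->exists = true;
    return 0;
}

/* Replays the interleaving from the report: both clients look up first,
 * then both create. Returns how many clients saw their open succeed. */
int toy_race(bool pass_excl_to_create)
{
    struct toy_server srv = { .exists = false };
    int winners = 0;

    int a_lookup = toy_lookup(&srv); /* client A: gets -ENOENT */
    int b_lookup = toy_lookup(&srv); /* client B: gets -ENOENT */

    if (a_lookup == -ENOENT && toy_create(&srv, pass_excl_to_create) == 0)
        winners++;                   /* client A: create succeeds */
    if (b_lookup == -ENOENT && toy_create(&srv, pass_excl_to_create) == 0)
        winners++;                   /* client B: succeeds only if not exclusive */

    return winners;
}
```

In this model toy_race(false) lets both clients win, while toy_race(true) lets only one through, because the decision is made on the server rather than from a stale lookup result.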

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <[email protected]>


2002-07-29 11:42:37

by Trond Myklebust

Subject: Re: Race in open(O_CREAT|O_EXCL) and network filesystem

>>>>> " " == Jan Hudec <[email protected]> writes:

> Hi all, maybe I'm blind, but I think there is a race featuring
> open(O_CREAT|O_EXCL) and nfs or any other network fs.

> 1) Is there some reason this can't happen that I overlooked?

No. It can indeed happen, and it is one of my pet peeves in the
current open_namei() layout. The VFS seems all too often to assume
that a semaphore suffices to ensure atomicity. This is obviously not
the case for networked filesystems.

> 2) If it is a problem (a comment in the NFS code suggests so), I
> can see two ways of handling this. Either pass the flags to the
> create method, or restart the open when create returns EEXIST.
> Which one would be preferred?

I'd rather like to see some method by which we could merge the
lookup() and create() calls.

Given its support for exclusive create, there is no reason why we
should be doing the lookup in the first place on NFSv3. It's just a
waste of an RPC call...
IIRC, the NFSv4 client actually has to work around the whole
open_namei() thingy with a new 'open()' method in order to conform to
the RFCs.

The minimum change I'd need, though, is for vfs_create() to actually
pass me the O_EXCL flag.

Cheers,
Trond

2002-07-29 11:47:15

by NeilBrown

Subject: Re: Race in open(O_CREAT|O_EXCL) and network filesystem

On Sunday July 28, [email protected] wrote:
> Hi all,
>
> maybe I'm blind, but I think there is a race featuring
> open(O_CREAT|O_EXCL) and nfs or any other network fs.
>
> What may happen is:
>
> client A: open_namei looks up the inode
>           driver queries server and gets ENOENT
> client B: open_namei looks up the inode
>           driver queries server and gets ENOENT
> client A: open_namei calls create method
>           driver requests file to be created and is successful
> client B: open_namei calls create method
>           driver requests file to be created and, since it does not know
>           about O_EXCL, can't specify exclusion, thus is successful
> client A: open_namei does no more checks and thus open succeeds
> client B: open succeeds too here - and it shouldn't
>
> Many applications rely on this working correctly (especially mailbox
> locking, which uses exclusive creates, and mounting mailboxes over NFS
> is quite common).
>
> So, can someone please answer:
>
> 1) Is there some reason this can't happen that I overlooked?

No. You are correct.

> 2) If it is a problem (a comment in the NFS code suggests so), I can see
> two ways of handling this. Either pass the flags to the create method,
> or restart the open when create returns EEXIST. Which one would be
> preferred?

Well.. at OLS in the Lustre talk Peter Braam talked about something
that could be used. Unfortunately it doesn't seem to be included in
the paper in the proceedings but the idea was to include some "intent"
in the lookup request. e.g. "lookup with intent to create" or
"lookup with intent to delete" or maybe "lookup with intent to open
for exclusive write access". The filesystem could then, at its
option, carry out the intended operation (possibly only partially) as
part of the lookup. A simple filesystem wouldn't bother and the VFS
would continue with the normal process. A networked filesystem could
do that whole operation including intent atomically.
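
To make the idea concrete, here is a rough userspace sketch of what "lookup with intent" might look like. It is purely illustrative: the names toy_intent, toy_fs, toy_lookup_it and toy_open_excl are invented for this example and are not taken from the Lustre code or from the VFS.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical intent codes; not the names used by any real patch. */
enum toy_intent_op { TOY_IT_NONE, TOY_IT_CREATE_EXCL };

struct toy_intent {
    enum toy_intent_op op; /* what the caller plans to do after lookup */
    bool done;             /* set if the filesystem already did it */
    int status;            /* result of the operation, valid if done */
};

struct toy_fs {
    bool honors_intent; /* true for the "networked" toy filesystem */
    bool exists;        /* whether the single name currently exists */
};

/* Lookup that may also carry out the intent. Returns 0 if the name
 * existed before the call, -ENOENT if it did not. */
int toy_lookup_it(struct toy_fs *fs, struct toy_intent *it)
{
    bool existed = fs->exists;

    if (fs->honors_intent && it != NULL && it->op == TOY_IT_CREATE_EXCL) {
        /* Check and create in one step, as a server could do atomically. */
        it->status = existed ? -EEXIST : 0;
        fs->exists = true;
        it->done = true;
    }
    return existed ? 0 : -ENOENT;
}

/* Generic layer: exclusive create built on lookup-with-intent. */
int toy_open_excl(struct toy_fs *fs)
{
    struct toy_intent it = { .op = TOY_IT_CREATE_EXCL, .done = false, .status = 0 };
    int found = toy_lookup_it(fs, &it);

    if (it.done)
        return it.status; /* filesystem handled the whole operation */
    if (found == 0)
        return -EEXIST;   /* simple filesystem: name already present */
    fs->exists = true;    /* simple filesystem: normal create path */
    return 0;
}
```

A filesystem that ignores the intent falls through to the usual lookup-then-create path, while one that honors it never exposes a window between the two steps.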

Apparently Al Viro is not completely against the idea, but I haven't
actually seen any code or detailed specs (not that I have looked hard)
so it might all be vapourware.

> 3) How can NFS be fixed to add the exclusive flag to the NFSv3 request?

It's not easy... else it would already have been done.

NeilBrown

2002-07-29 15:00:43

by Andreas Dilger

Subject: Re: Race in open(O_CREAT|O_EXCL) and network filesystem

On Jul 29, 2002 21:50 +1000, Neil Brown wrote:
> On Sunday July 28, [email protected] wrote:
> > maybe I'm blind, but I think there is a race featuring
> > open(O_CREAT|O_EXCL) and nfs or any other network fs.
> >
> > What may happen is:
> >
> > client A: open_namei looks up the inode
> >           driver queries server and gets ENOENT
> > client B: open_namei looks up the inode
> >           driver queries server and gets ENOENT
> > client A: open_namei calls create method
> >           driver requests file to be created and is successful
> > client B: open_namei calls create method
> >           driver requests file to be created and, since it does not know
> >           about O_EXCL, can't specify exclusion, thus is successful
> > client A: open_namei does no more checks and thus open succeeds
> > client B: open succeeds too here - and it shouldn't
> >
> > Many applications rely on this working correctly (especially mailbox
> > locking, which uses exclusive creates, and mounting mailboxes over NFS
> > is quite common).
> >
> > So, can someone please answer:
> >
> > 1) Is there some reason this can't happen that I overlooked?
>
> No. You are correct.
>
> > 2) If it is a problem (a comment in the NFS code suggests so), I can see
> > two ways of handling this. Either pass the flags to the create method,
> > or restart the open when create returns EEXIST. Which one would be
> > preferred?
>
> Well.. at OLS in the Lustre talk Peter Braam talked about something
> that could be used. Unfortunately it doesn't seem to be included in
> the paper in the proceedings but the idea was to include some "intent"
> in the lookup request. e.g. "lookup with intent to create" or
> "lookup with intent to delete" or maybe "lookup with intent to open
> for exclusive write access". The filesystem could then, at its
> option, carry out the intended operation (possibly only partially) as
> part of the lookup. A simple filesystem wouldn't bother and the VFS
> would continue with the normal process. A networked filesystem could
> do that whole operation including intent atomically.

The intent-based lookup code is available as part of the Lustre CVS.
See lustre/patches/patch-2.4.18 at the SF lustre project. There are
a couple of other changes in the patch that are unrelated to intents,
but those are fairly obvious (i.e. ext3/jbd changes, some exports, etc).

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

2002-07-29 23:00:43

by NeilBrown

Subject: Re: Race in open(O_CREAT|O_EXCL) and network filesystem

On Monday July 29, [email protected] wrote:
>
> The intent-based lookup code is available as part of the Lustre CVS.
> See lustre/patches/patch-2.4.18 at the SF lustre project. There are
> a couple of other changes in the patch that are unrelated to intents,
> but those are fairly obvious (i.e. ext3/jbd changes, some exports, etc).
>

Thanks. I've found it. I might have a read through some time.
Is there any plan (or likelihood) for this getting into 2.5?

NeilBrown

2002-07-29 23:42:45

by Andreas Dilger

Subject: Re: Race in open(O_CREAT|O_EXCL) and network filesystem

On Jul 30, 2002 09:04 +1000, Neil Brown wrote:
> On Monday July 29, [email protected] wrote:
> > The intent-based lookup code is available as part of the Lustre CVS.
> > See lustre/patches/patch-2.4.18 at the SF lustre project. There are
> > a couple of other changes in the patch that are unrelated to intents,
> > but those are fairly obvious (i.e. ext3/jbd changes, some exports, etc).
>
> Thanks. I've found it. I might have a read through some time.
> Is there any plan (or likelihood) for this getting into 2.5?

Well, we plan to submit it for 2.5, but no work has been done in that
direction yet. I believe Peter has an agreement-in-principle with Al
on this, but I don't think Al has seen the code yet. We want to make
sure that we don't need any major changes before it is submitted. So
far the current patch is working well for us.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/