2019-02-08 11:20:39

by Amir Goldstein

Subject: Better interop for NFS/SMB file share mode/reservation

Hi Bruce,

I have been following your discussion with Volker Lendecke
on the samba-technical mailing list [1] and have discussed
this issue with Volker myself as well.

I decided to start this new thread to bring some kernel developers
into the loop and to propose an idea that takes a somewhat
different approach from the "interop" approaches I have seen
so far. "Interop" in this context often means consistency of file
lock state between the samba and NFS servers, but I am referring
to the stronger sense of interop with the local filesystem on the server.

You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
solution for interop of NFS Share Reservations and SMB Share Modes
with local filesystems.
Some of the complaints about this approach (rightfully) concerned
DoS and the prospect of plaguing Linux with Windows server
"files left open" issues.

My idea comes from the observation that Windows server
administrators can release locked files that were left open by clients.
I suppose that an NFS server admin can do the same?
That realization makes "share access" locks (a.k.a. LOCK_MAND)
not so very different from oplocks (leases/delegations).
As long as samba and nfsd cooperate nicely with LOCK_MAND
semantics, we don't really have to force local filesystems
to obey LOCK_MAND semantics. If the file servers take leases
on local filesystems, they will not get exclusive write access to
files already open for write on the local filesystem, and likewise for read.

When a local access on the server violates the share mode,
the file server acts as a grumpy, washed-out administrator who
automatically grants any lock-revoke ticket after a timeout.

This model may not fit use cases where "real" interop with
local filesystem is needed, but compared to the existing
solution (no interop at all) it is quite an improvement.
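A minimal sketch of how a Linux file server could play that "grumpy administrator" with the existing lease API, assuming it holds an F_SETLEASE lease on every file it has handed out under a share mode. The kernel signals the lease holder when a local open conflicts, and forcibly removes the lease if the holder has not released it within /proc/sys/fs/lease-break-time. Releasing it promptly is the "granted ticket". The names watch_lease_breaks(), handle_pending_break() and the commented-out revoke_client_state() hook are hypothetical.

```c
#define _GNU_SOURCE          /* for F_SETSIG and F_GETSIG */
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* fd whose lease the kernel wants back; -1 when nothing is pending.
 * Only the most recent break is remembered; a real server would queue. */
static volatile sig_atomic_t broken_fd = -1;

/* With F_SETSIG set, the lease-break signal reports the fd in si_fd. */
static void on_lease_break(int sig, siginfo_t *si, void *ctx)
{
	(void)sig;
	(void)ctx;
	broken_fd = si->si_fd;
}

/* Deliver lease-break notifications for fd as signal sig, with siginfo. */
static int watch_lease_breaks(int fd, int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_lease_break;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, NULL) == -1)
		return -errno;
	if (fcntl(fd, F_SETSIG, sig) == -1)
		return -errno;
	return 0;
}

/* Called from the server's main loop.  Returns 0 if no break is
 * pending, 1 once the lease has been handed back, or -errno. */
static int handle_pending_break(void)
{
	int fd = broken_fd;

	if (fd < 0)
		return 0;
	broken_fd = -1;
	/* revoke_client_state(fd);  hypothetical: break client oplocks and
	 * invalidate the handle before letting the local open proceed */
	if (fcntl(fd, F_SETLEASE, F_UNLCK) == -1)
		return -errno;
	return 1;
}
```

Releasing the lease voluntarily, rather than waiting for the kernel's timeout, lets the server clean up its client-side state first; the timeout remains the backstop if the server is slow.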

Furthermore, short of SMB DENY_DELETE, we may not even
need to change any kernel APIs.
The addition of O_DENY* open flags can make programming
easier, but taking a lease on an open file is still safe enough
to implement share reservation (no?).

Satisfying DENY_DELETE could be trickier, but perhaps
the existing SILLYRENAME interface between knfsd and the VFS
could somehow be utilized for this purpose?

I thought of bringing this up as a TOPIC for LSF/MM, but wanted
to consult with you first. I am sure that you or Jeff can do a better
job than me of enumerating the "interop" file lock issues that
could be discussed in the filesystems track.

Thoughts? Explanation why this idea is idiotic?

Thanks,
Amir.

[1] https://lists.samba.org/archive/samba-technical/2019-February/132366.html
[2] https://lore.kernel.org/lkml/[email protected]/


2019-02-08 13:10:59

by Jeff Layton

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> [...]
>
> My idea comes from the observation that Windows server
> administrators can release locked files that were left open by clients.
> I suppose that an NFS server admin can do the same?

The Linux kernel has no mechanism for this (aside from sending a SIGKILL
to lockd, which makes it drop all locks). Solaris did have a tool for
this at one point (and probably still does).

It's a little less of a problem now than it used to be with NFS, given
the move to NFSv4 (which has lease-based locking). If you have
misbehaving clients, you just kick them out and their locks eventually
go away. v3 locks can stick around in perpetuity however, so people have
long wanted such a tool on Linux as well.

> That realization makes "share access" locks (a.k.a. LOCK_MAND)
> not so very different from oplocks (leases/delegations).
> As long as samba and nfsd cooperate nicely with LOCK_MAND
> semantics, we don't really have to force local filesystems
> to obey LOCK_MAND semantics. If the file servers take leases
> on local filesystems, they will not get exclusive write access for
> files already open for write on local filesystem and same for read.
>

I think this last statement isn't correct (if I'm parsing it correctly).
If a file is already open for write, then you just don't get a lease
when you try to request one. Ditto for write leases if it's already open
for read.
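A small demonstration of the behavior described above (an illustration, not code from the thread; the helper names are hypothetical): with the file already open for write through one descriptor, a read-lease request through a second, read-only descriptor is refused. On Linux the refusal is reported as EAGAIN.

```c
#define _GNU_SOURCE          /* for F_SETLEASE */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Try to take a lease of type F_RDLCK or F_WRLCK on fd.  On success the
 * lease is dropped again and 0 is returned; otherwise -errno. */
static int probe_lease(int fd, int type)
{
	if (fcntl(fd, F_SETLEASE, type) == -1)
		return -errno;
	fcntl(fd, F_SETLEASE, F_UNLCK);
	return 0;
}

/* Create a scratch file, keep it open for write (the "local writer"),
 * then ask for a read lease through a second, read-only descriptor. */
static int read_lease_while_open_for_write(void)
{
	char path[] = "/tmp/lease-demo-XXXXXX";
	int wfd = mkstemp(path);            /* mkstemp() opens O_RDWR */
	int rfd, ret;

	if (wfd == -1)
		return -errno;
	rfd = open(path, O_RDONLY);         /* read leases need O_RDONLY */
	ret = rfd == -1 ? -errno : probe_lease(rfd, F_RDLCK);
	if (rfd != -1)
		close(rfd);
	close(wfd);
	unlink(path);
	return ret;
}
```

On a filesystem that supports leases I would expect read_lease_while_open_for_write() to return -EAGAIN; closing the writer first should let the same read lease succeed.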

> On local file access on the server that violates the share mode,
> the file server acts as a grumpy washed out administrator that
> automatically grants any lock revoke ticket after timeout.
>

Devil's advocate:

Is this situation any better than just teaching the NFS/SMB servers to
track these locks out of band? Both samba and most NFS servers respect
share/deny mode locks, but only internally -- they aren't aware of the
others'. We could (in principle) come up with a mechanism to track these
that doesn't involve plumbing them into the kernel.

That said, coherent locking is best done in the kernel, IMO...

> This model may not fit use cases where "real" interop with
> local filesystem is needed, but compared to the existing
> solution (no interop at all) it is quite an improvement.
>
> Furthermore, short of SMB DENY_DELETE, we may not even
> need to change any kernel APIs.
> The addition of O_DENY* open flags can make programming
> easier, but taking a lease on an open file is still safe enough
> to implement share reservation (no?).
>
> Satisfying DENY_DELETE could be more tricky, but perhaps
> the existing SILLYRENAME interface between knfsd and vfs
> could be somehow utilized for this purpose?
>
> I thought of bringing this up as a TOPIC for LSF/MM, but wanted
> to consult with you first. I am sure that you or Jeff can do a better
> job than me in enumerating the "interop" file lock issues that
> could be discussed in filesystems track forum.
>
> Thoughts? Explanation why this idea is idiotic?

I think it's not a single idea. There are really two different aspects
to this given that we're really talking about two different types of
locks in SMB. I think you have to consider solving these problems
separately:

1) the ability to set a (typically whole-file) share/deny lock
atomically when you open a file. This is necessary for coherent
share/deny lock semantics. Note that these are only enforced at open()
time.

2) mandatory locking (forbidding reads and writes on a byte range when
there is a conflicting lock set).

The first could (probably) be solved with something like what Pavel
proposed a few years ago...or maybe we just wire up O_EXLOCK and
O_SHLOCK:

https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html

This seems like a fine idea (in principle) but it needs someone to drive
the work forward. You'll also likely be consuming a couple of O_* flags,
which could be a tough sell (unless you come up with another way to do
it).
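To make the atomicity point concrete, here is a sketch (the function name is hypothetical) that uses O_EXLOCK/O_SHLOCK where the C library defines them, as on the BSDs, and otherwise falls back to open() followed by flock(). The fallback is exactly the non-atomic version: another process can open the file between the two calls, which is the gap an open-time flag would close.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Open path and take a whole-file shared or exclusive lock on it,
 * failing (rather than blocking) if a conflicting lock is held.
 * Returns the fd or -errno. */
static int open_locked(const char *path, int flags, int exclusive)
{
#ifdef O_EXLOCK
	/* BSD: the lock is requested as part of open() itself. */
	int fd = open(path, flags | (exclusive ? O_EXLOCK : O_SHLOCK) |
			    O_NONBLOCK);
	return fd == -1 ? -errno : fd;
#else
	/* Linux: no open-time lock flags, so open first, then flock().
	 * The window between the two calls is the race discussed above. */
	int fd = open(path, flags);

	if (fd == -1)
		return -errno;
	if (flock(fd, (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
#endif
}
```

Note that shared/exclusive is coarser than SMB/NFS share/deny: it cannot express "deny write but allow read" for a writer.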

The second problem is much more difficult to fix correctly, and involves
interjecting locking checks into (hot) file read/write codepaths. This
is non-trivial and could have performance impacts even when no lock is
set on a file.
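A user-space model (not kernel code; the struct and function names are invented for illustration) of the check that enforced byte-range locking would add to every read() and write(). Even the fast path, where no locks exist, is an extra test on each I/O, and a real kernel would also need synchronization around the lock list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* One byte-range lock: bytes [start, end) held by owner. */
struct range_lock {
	off_t start, end;
	bool write;          /* write lock blocks reads and writes */
	int owner;           /* owners never block their own I/O */
};

/* Return true if an I/O of len bytes at off by owner must be refused. */
static bool io_blocked(const struct range_lock *locks, size_t nlocks,
		       int owner, off_t off, size_t len, bool is_write)
{
	if (nlocks == 0)          /* fast path: the common, unlocked case */
		return false;
	for (size_t i = 0; i < nlocks; i++) {
		const struct range_lock *l = &locks[i];

		if (l->owner == owner)
			continue;
		if (off >= l->end || off + (off_t)len <= l->start)
			continue;     /* ranges do not overlap */
		if (l->write || is_write)
			return true;  /* a read lock only blocks writes */
	}
	return false;
}
```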
--
Jeff Layton <[email protected]>


2019-02-08 14:46:01

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, Feb 8, 2019 at 3:10 PM Jeff Layton <[email protected]> wrote:
>
> On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> > [...]
> >
> > My idea comes from the observation that Windows server
> > administrators can release locked files that were left open by clients.
> > I suppose that an NFS server admin can do the same?
>
> The Linux kernel has no mechanism for this (aside from sending a SIGKILL
> to lockd, which makes it drop all locks). Solaris did have a tool for
> this at one point (and probably still does).
>
> It's a little less of a problem now than it used to be with NFS, given
> the move to NFSv4 (which has lease-based locking). If you have
> misbehaving clients, you just kick them out and their locks eventually
> go away. v3 locks can stick around in perpetuity however, so people have
> long wanted such a tool on Linux as well.
>

In a nutshell, my proposal is that samba would do something
similar and request leases from the kernel instead of trying to
enforce real mandatory locks.

> > That realization makes "share access" locks (a.k.a. LOCK_MAND)
> > not so very different from oplocks (leases/delegations).
> > As long as samba and nfsd cooperate nicely with LOCK_MAND
> > semantics, we don't really have to force local filesystems
> > to obey LOCK_MAND semantics. If the file servers take leases
> > on local filesystems, they will not get exclusive write access for
> > files already open for write on local filesystem and same for read.
> >
>
> I think this last statement isn't correct (if I'm parsing it correctly).
> If a file is already open for write, then you just don't get a lease
> when you try to request one. Ditto for write leases if it's already open
> for read.
>

I think you misread what I miswrote ;-)
As the title of this thread states, I am talking about the first case:
acquiring exclusive or shared read access to a file at open time.
Perhaps the fact that samba currently calls flock(LOCK_MAND)
is the source of the confusion.

Open failure is the expected behavior if the file is already open for
write (or read) on the local filesystem, so my suggestion is:
- Server opens the file and requests a lease based on the desired share mode
- If file server got the lease, client gets the file handle
- Otherwise, client gets an open failure
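The three steps above could look roughly like this on Linux, assuming the server maps each share mode onto an F_SETLEASE lease type. enum share_mode, lease_for_share_mode(), open_flags_for_share_mode() and share_mode_open() are hypothetical; only open() and fcntl(F_SETLEASE) are real APIs.

```c
#define _GNU_SOURCE          /* for F_SETLEASE */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical share modes a server might derive from an SMB share
 * mode or an NFSv4 share reservation. */
enum share_mode {
	SHARE_DENY_WRITE,   /* others may still read, no one else may write */
	SHARE_EXCLUSIVE,    /* no one else may open the file at all */
};

/* A read lease is broken by a local open for write (or a truncate);
 * a write lease is broken by any local open. */
static int lease_for_share_mode(enum share_mode mode)
{
	return mode == SHARE_EXCLUSIVE ? F_WRLCK : F_RDLCK;
}

/* The kernel only grants a read lease on a read-only descriptor. */
static int open_flags_for_share_mode(enum share_mode mode)
{
	return mode == SHARE_EXCLUSIVE ? O_RDWR : O_RDONLY;
}

/* Open and request the lease.  On success the fd backs the client's
 * handle; on failure the client gets an error, e.g. -EAGAIN when a
 * conflicting local open already exists. */
static int share_mode_open(const char *path, enum share_mode mode)
{
	int fd = open(path, open_flags_for_share_mode(mode));

	if (fd == -1)
		return -errno;
	if (fcntl(fd, F_SETLEASE, lease_for_share_mode(mode)) == -1) {
		int err = errno;

		close(fd);
		return -err;
	}
	return fd;
}
```

One limitation the sketch makes visible: as far as I can tell, a server that wants write access for itself while denying only other writers has no lease type that expresses exactly that, since a write lease is also broken by local readers.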

> > On local file access on the server that violates the share mode,
> > the file server acts as a grumpy washed out administrator that
> > automatically grants any lock revoke ticket after timeout.
> >
>
> Devil's advocate:
>
> Is this situation any better than just teaching the NFS/SMB servers to
> track these locks out of band? Both samba and most NFS servers respect
> share/deny mode locks, but only internally -- they aren't aware of the
> others'. We could (in principle) come up with a mechanism to track these
> that doesn't involve plumbing them into the kernel.
>

That would be a prerequisite to my suggested solution, as I wrote:
"As long as samba and nfsd cooperate nicely with LOCK_MAND..."
That means the two file servers cooperate on the share mode locks
and try to figure out if there are outstanding leases before opening
a file that will break those leases.

> That said, coherent locking is best done in the kernel, IMO...
>

Indeed...

> > [...]
> >
> > Thoughts? Explanation why this idea is idiotic?
>
> I think it's not a single idea. There are really two different aspects
> to this given that we're really talking about two different types of
> locks in SMB. I think you have to consider solving these problems
> separately:
>
> 1) the ability to set a (typically whole-file) share/deny lock
> atomically when you open a file. This is necessary for coherent
> share/deny lock semantics. Note that these are only enforced at open()
> time.
>
> 2) mandatory locking (forbidding reads and writes on a byte range when
> there is a conflicting lock set).
>

I was only trying to address the first problem (small steps...).

> The first could (probably) be solved with something like what Pavel
> proposed a few years ago...or maybe we just wire up O_EXLOCK and
> O_SHLOCK:
>
> https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
>

Nice. I wasn't aware of those BSD flags.

> This seems like a fine idea (in principle) but it needs someone to drive
> the work forward. You'll also likely be consuming a couple of O_* flags,
> which could be a tough sell (unless you come up with another way to do
> it).
>

Once I know the obstacles to watch out for, I can drive this work.
The thing is, I am not convinced that any new O_ flags are needed.

How about this (for samba, knfsd is simpler):
- pfd = open(filename, O_PATH)
- flock(pfd, LOCK_MAND) (for file servers interop)
- vfs checks no conflicting LOCK_MAND locks (like patch you once posted)
- open(filename, O_RDWR) (and verify st_ino like samba does)
- Request lease (for local fs interop)
- check_conflicting_open() is changed to use inode_is_open_for_read()
instead of checking d_count and i_count.
- we already have i_readcount, just need to remove ifdef CONFIG_IMA
- On lease break (from local fs), break client oplocks and invalidate
file handle on server
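The "verify st_ino" step could be sketched as follows: compare the device and inode of the O_PATH descriptor with those of the descriptor returned by the real open, so that a rename or replace between the two calls is detected. The function name and the choice of -ESTALE for a mismatch are my own; fstat() on an O_PATH descriptor needs Linux 3.6 or later.

```c
#define _GNU_SOURCE          /* for O_PATH */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path with flags and check that it is still the inode that the
 * earlier O_PATH descriptor pfd refers to.  Returns the new fd,
 * -ESTALE if the name now points at a different file, or -errno. */
static int reopen_same_inode(int pfd, const char *path, int flags)
{
	struct stat before, after;
	int fd;

	if (fstat(pfd, &before) == -1)
		return -errno;
	fd = open(path, flags);
	if (fd == -1)
		return -errno;
	if (fstat(fd, &after) == -1) {
		int err = errno;

		close(fd);
		return -err;
	}
	if (before.st_dev != after.st_dev || before.st_ino != after.st_ino) {
		close(fd);       /* renamed or replaced between the opens */
		return -ESTALE;
	}
	return fd;
}
```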

Thanks,
Amir.

2019-02-08 15:50:54

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> - check_conflicting_open() is changed to use inode_is_open_for_read()
> instead of checking d_count and i_count.

Independently of the rest, I'd love to do away with those
d_count/i_count checks. What's inode_is_open_for_read()?

--b.
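For illustration, a user-space model of the difference between the two kinds of check. The names are hypothetical, and the model assumes that inode_is_open_for_read() would test i_readcount (mentioned above) by analogy with i_writecount: a reference-count test treats any extra reference to the inode as a conflicting open, while an open-count test only looks at other opens for read or write.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the inode counters involved. */
struct model_inode {
	int refcount;   /* every reference, including ones that are not opens */
	int readers;    /* open for read  (cf. i_readcount)  */
	int writers;    /* open for write (cf. i_writecount) */
};

/* Refcount-style write-lease check: any reference beyond the caller's
 * own is treated as a conflicting open, even if it is not an open. */
static bool wrlease_conflict_refcount(const struct model_inode *in)
{
	return in->refcount > 1;
}

/* Open-count-style check: only other opens for read or write conflict;
 * the caller's own open is subtracted out. */
static bool wrlease_conflict_opens(const struct model_inode *in,
				   bool self_reads, bool self_writes)
{
	return in->readers > (self_reads ? 1 : 0) ||
	       in->writers > (self_writes ? 1 : 0);
}
```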

2019-02-08 16:03:37

by Jeff Layton

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, 2019-02-08 at 16:45 +0200, Amir Goldstein wrote:
> On Fri, Feb 8, 2019 at 3:10 PM Jeff Layton <[email protected]> wrote:
> [...]
>
> > The first could (probably) be solved with something like what Pavel
> > proposed a few years ago...or maybe we just wire up O_EXLOCK and
> > O_SHLOCK:
> >
> > https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
> >
>
> Nice. I wasn't aware of those BSD flags.
>

Share/deny open semantics are pretty similar across NFS and SMB (by
design, really). If you intend to solve that use-case, what you really
want is whole-file, shared/exclusive locks that are set atomically with
the open call. O_EXLOCK and O_SHLOCK seem like a reasonable fit there.

Then you could have SMB and NFS servers set these flags when opening
files, and deal with the occasional denial at open time. Other
applications won't be aware of them of course, but that's probably fine
for most use-cases where you want this sort of protocol interop.
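The share/deny rule both protocols use can be stated compactly: a new open is refused if it asks for an access that an existing open denies, or denies an access that an existing open holds. A sketch of that check follows; the bit values and names are chosen for illustration only, and SMB expresses the same information as the accesses it shares, so its deny set is the complement.

```c
#include <stdbool.h>
#include <stddef.h>

/* Bit values chosen for this sketch only. */
enum { ACC_READ = 1, ACC_WRITE = 2 };
enum { DENY_READ = 1, DENY_WRITE = 2 };

struct share_open {
	unsigned access;   /* what this opener intends to do    */
	unsigned deny;     /* what this opener forbids to others */
};

/* Two opens conflict if either wants an access the other denies. */
static bool share_conflict(struct share_open a, struct share_open b)
{
	return (a.access & b.deny) || (b.access & a.deny);
}

/* Admit want only if it conflicts with none of the n existing opens.
 * In a real server, this check and recording the new open would have
 * to be one atomic step with open(), or two servers could both pass. */
static bool share_admit(const struct share_open *existing, size_t n,
			struct share_open want)
{
	for (size_t i = 0; i < n; i++)
		if (share_conflict(existing[i], want))
			return false;
	return true;
}
```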

DENY_DELETE is a bit harder to deal with, but that's probably
something that could be addressed separately.

> > This seems like a fine idea (in principle) but it needs someone to drive
> > the work forward. You'll also likely be consuming a couple of O_* flags,
> > which could be a tough sell (unless you come up with another way to do
> > it).
> >
>
> Once I know the obstacles to watch out for, I can drive this work.
> Thing is, I am not convinced myself that any new O_ flags are needed.
>
> How about this (for samba, knfsd is simpler):
> - pfd = open(filename, O_PATH)
> - flock(pfd, LOCK_MAND) (for file servers interop)
> - vfs checks no conflicting LOCK_MAND locks (like patch you once posted)
> - open(filename, O_RDWR) (and verify st_ino like samba does)
> - Request lease (for local fs interop)
> - check_conflicting_open() is changed to use inode_is_open_for_read()
> - we already have i_readcount, just need to remove ifdef CONFIG_IMA
> - On lease break (from local fs), break client oplocks and invalidate
> file handle on server
>

Now that I look at the handling of flock LOCK_MAND, I'm not sure how
it's supposed to work. In particular, flock_locks_conflict basically
says that a LOCK_MAND lock can never conflict with anything. I'm not
sure what good that does.
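A user-space model of the rule described above, to make the consequence explicit. The flag values and names are local to the sketch, and it reflects my reading of the behavior rather than a copy of fs/locks.c: once either lock carries the MAND bit, the conflict test short-circuits to "no conflict", so two exclusive LOCK_MAND locks coexist. The model also treats locks on the same open file as non-conflicting, as flock(2) does when converting a lock.

```c
#include <stdbool.h>

/* Flag values local to this sketch, not the kernel's. */
enum { M_SHARED = 1, M_EXCL = 2, M_MAND = 4 };

struct model_flock {
	int owner;   /* stands in for the open file description */
	int type;    /* M_SHARED or M_EXCL, optionally | M_MAND */
};

/* Same owner: no conflict.  MAND on either side: no conflict.
 * Otherwise the usual rule: any exclusive lock conflicts. */
static bool model_flock_conflict(struct model_flock caller,
				 struct model_flock held)
{
	if (caller.owner == held.owner)
		return false;
	if ((caller.type & M_MAND) || (held.type & M_MAND))
		return false;
	return (caller.type & M_EXCL) || (held.type & M_EXCL);
}
```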

The flock manpage does not document LOCK_MAND. It's in
/usr/include/asm-generic/fcntl.h on my machine, but it looks like it
just got taken right out of the kernel headers long ago.

I think we need to have a hard look at what this flag is doing today
(seems like not much). What are samba's expectations with that flag?

--
Jeff Layton <[email protected]>


2019-02-08 16:44:20

by Jeffrey Layton

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, 2019-02-08 at 11:03 -0500, Jeff Layton wrote:
> On Fri, 2019-02-08 at 16:45 +0200, Amir Goldstein wrote:
> > On Fri, Feb 8, 2019 at 3:10 PM Jeff Layton <[email protected]> wrote:
> > > On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> > > > Hi Bruce,
> > > >
> > > > I have been following you discussion with Volker Lendecke
> > > > on the samba technical mailing list [1] and have had discussed
> > > > this issue with Volker myself as well.
> > > >
> > > > I decided to start this new thread to bring some kernel developers
> > > > in the loop and to propose an idea that takes a somewhat
> > > > different approach to the "interop" approaches I have seen
> > > > so far. "interop" in this context often means consistency of file
> > > > lock states between samba and nfs server, but I am referring
> > > > to the stronger sense of interop with local filesystem on the server.
> > > >
> > > > You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
> > > > solution to interop of NFS Share Reservation and SMB Share Mode
> > > > with local filesystems.
> > > > Some of the complaints on this approach were (rightfully) concerned
> > > > about DoS and the prospect of plaguing Linux with Windows server
> > > > "files left open" issues.
> > > >
> > > > My idea comes from the observation that Windows server
> > > > administrators can release locked files that were left open by clients.
> > > > I suppose that an NFS server admin can do the same?
> > >
> > > The Linux kernel has no mechanism for this (aside from sending a SIGKILL
> > > to lockd, which makes it drop all locks). Solaris did have a tool for
> > > this at one point (and probably still does).
> > >
> > > It's a little less of a problem now than it used to be with NFS, given
> > > the move to NFSv4 (which has lease-based locking). If you have
> > > misbehaving clients, you just kick them out and their locks eventually
> > > go away. v3 locks can stick around in perpetuity however, so people have
> > > long wanted such a tool on Linux as well.
> > >
> >
> > In a nut shell, I think my proposal is that samba will do something
> > similar and request leases from the kernel instead of trying to
> > enforce real mandatory locks.
> >
> > > > That realization makes "share access" locks (a.k.a. MAND_LOCK)
> > > > not so very different from oplocks (leases/delegations).
> > > > As long as samba and nfsd cooperate nicely with MAND_LOCK
> > > > semantics, we don't really have to force local filesystems
> > > > to obay MAND_LOCK semantics. If the file servers take leases
> > > > on local filesystems, they will not get exclusive write access for
> > > > files already open for write on local filesytem and same for read.
> > > >
> > >
> > > I think this last statement isn't correct (if I'm parsing it correctly).
> > > If a file is already open for write, then you just don't get a lease
> > > when you try to request one. Ditto for write leases if it's already open
> > > for read.
> > >
> >
> > I think you miss read what I miss wrote ;-)
> > As the title of this thread states, I am talking about the first case
> > of acquiring an exclusive or read shared access to file at open time.
> > It may be the fact that samba currently calls flock(LOCK_MAND)
> > that is the source for confusion.
> >
> > Open failure is the expected behavior if file is already open for
> > write (or read) on local filesystem, so my suggestion is:
> > - Server opens the file and request a lease based of desired share mode
> > - If file server got the lease, client gets the file handle
> > - Otherwise, client gets an open failure
> > > > On local file access on the server that violates the share mode,
> > > > the file server acts as a grumpy washed out administrator that
> > > > automatically grants any lock revoke ticket after timeout.
> > > >
> > >
> > > Devil's advocate:
> > >
> > > Is this situation any better than just teaching the NFS/SMB servers to
> > > track these locks out of band? Both samba and most NFS servers respect
> > > share/deny mode locks, but only internally -- they aren't aware of the
> > > others'. We could (in principle) come up with a mechanism to track these
> > > that doesn't involve plumbing them into the kernel.
> > >
> >
> > That would be a prerequisite to my suggested solution, as I wrote:
> > "As long as samba and nfsd cooperate nicely with LOCK_MAND..."
> > That means the two file servers cooperate on the share mode locks
> > and try to figure out if there are outstanding leases before opening
> > a file that will break those leases.
> >
> > > That said, coherent locking is best done in the kernel, IMO...
> > >
> >
> > Indeed...
> >
> > > > This model may not fit use cases where "real" interop with
> > > > local filesystem is needed, but compared to the existing
> > > > solution (no interop at all) it is quite an improvement.
> > > >
> > > > Furthermore, short of SMB DENY_DELETE, we may not even
> > > > need to change any kernel APIs.
> > > > The addition of O_DENY* open flags can make programming
> > > > easier, but taking a lease on an open file is still safe enough
> > > > to implement share reservation (no?).
> > > >
> > > > Satisfying DENY_DELETE could be more tricky, but perhaps
> > > > the existing SILLYRENAME interface of==between knfsd and vfs
> > > > could be somehow utilized for this purpose?
> > > >
> > > > I though of bringing this up as a TOPIC for LSF/MM, but wanted
> > > > to consult with you first. I am sure that you or Jeff can do a better
> > > > job than me in enumerating the "interop" file lock issues that
> > > > could be discussed in filesystems track forum.
> > > >
> > > > Thoughts? Explanation why this idea is idiotic?
> > >
> > > I think it's not a single idea. There are really two different aspects
> > > to this given that we're really talking about two different types of
> > > locks in SMB. I think you have to consider solving these problems
> > > separately:
> > >
> > > 1) the ability to set a (typically whole-file) share/deny lock
> > > atomically when you open a file. This is necessary for coherent
> > > share/deny lock semantics. Note that these are only enforced open()
> > > time.
> > >
> > > 2) mandatory locking (forbidding reads and writes on a byte range when
> > > there is a conflicting lock set).
> > >
> >
> > I was only trying to address the first problem (small steps...).
> >
> > > The first could (probably) be solved with something like what Pavel
> > > proposed a few years ago...or maybe we just wire up O_EXLOCK and
> > > O_SHLOCK:
> > >
> > > https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
> > >
> >
> > Nice. I wasn't aware of those BSD flags.
> >
>
> Share/deny open semantics are pretty similar across NFS and SMB (by
> design, really). If you intend to solve that use-case, what you really
> want is whole-file, shared/exclusive locks that are set atomically with
> the open call. O_EXLOCK and O_SHLOCK seem like a reasonable fit there.
>
> Then you could have SMB and NFS servers set these flags when opening
> files, and deal with the occasional denial at open time. Other
> applications won't be aware of them of course, but that's probably fine
> for most use-cases where you want this sort of protocol interop.
>
> DENY_DELETE is a bit harder to deal with however, but that's probably
> something that could be addressed separately.
>
> > > This seems like a fine idea (in principle) but it needs someone to drive
> > > the work forward. You'll also likely be consuming a couple of O_* flags,
> > > which could be tough sell (unless you come up with another way to do
> > > it).
> > >
> >
> > Once I know the obstacles to watch out for, I can drive this work.
> > Thing is, I am not convinced myself that any new O_ flags are needed.
> >
> > How about this (for samba, knfsd is simpler):
> > - pfd = open(filename, O_PATH)
> > - flock(pfd, LOCK_MAND) (for file servers interop)
> > - vfs checks no conflicting LOCK_MAND locks (like patch you once posted)
> > - open(filename, O_RDWR) (and verify st_ino like samba does)
> > - Request lease (for local fs interop)
> > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > - we already have i_readcount, just need to remove ifdef CONFIG_IMA
> > - On lease break (from local fs), break client oplocks and invalidate
> > file handle on server
> >
>
> Now that I look at the handling of flock LOCK_MAND, I'm not sure how
> it's supposed to work. In particular, flock_locks_conflict basically
> says that a LOCK_MAND lock can never conflict with anything. I'm not
> sure what good that does.
>
> The flock manpage does not document LOCK_MAND. It's in /usr/include/asm-
> generic/fcntl.h on my machine, but it looks like it just got taken right
> out of the kernel headers long ago.
>
> I think we need to have a hard look at what this flag is doing today
> (seems like not much). What are samba's expectations with that flag?
>

Yeah, in fact, I rolled this program and ran it in two different shells
on the same machine against the same file, and they both acquired a
lock:

---------------------------[snip]------------------------------
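#define _GNU_SOURCE		/* for LOCK_MAND */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/file.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
	int fd, ret;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Ask for an exclusive flock with LOCK_MAND set. */
	ret = flock(fd, LOCK_EX | LOCK_MAND);
	if (ret) {
		perror("flock");
		return 1;
	}
	printf("Lock acquired\n");
	getchar();	/* hold the lock until Enter is pressed */
	return 0;
}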
---------------------------[snip]------------------------------

I move that LOCK_MAND be nuked from orbit...or someone step forward to
propose reasonable semantics for it. :)
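For contrast with the LOCK_MAND result above, here is a minimal sketch of the behaviour an ordinary flock() lock gives: locks taken through two separate open() calls do conflict, even inside one process, because flock locks belong to the open file description. The helper approximates the BSD O_SHLOCK/O_EXLOCK open flags by calling flock() with LOCK_NB right after open(); unlike those flags it is not atomic with the open. The name open_with_share_lock() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper: open @path and immediately take a whole-file
 * shared (LOCK_SH) or exclusive (LOCK_EX) flock, failing instead of
 * blocking on conflict. NOT atomic: another process can open the file
 * between open() and flock(). Returns an fd, or -errno on failure.
 */
static int open_with_share_lock(const char *path, int oflags, int exclusive)
{
	int fd = open(path, oflags);

	if (fd < 0)
		return -errno;

	/* flock locks are per open file description, so two open() calls
	 * in the same process still conflict with each other. */
	if (flock(fd, (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB) < 0) {
		int err = errno;	/* EWOULDBLOCK on conflict */

		close(fd);
		return -err;
	}
	return fd;	/* closing fd releases the lock */
}
```

Shared opens coexist, while an exclusive open is refused as long as any other share lock is held, and vice versa.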

--
Jeffrey Layton <[email protected]>


2019-02-08 20:02:57

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
>
> On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > instead of checking d_count and i_count.
>
> Independently of the rest, I'd love to do away with those
> d_count/i_count checks. What's inode_is_open_for_read()?
>

It would look maybe something like this:

static inline bool file_is_open_for_read(const struct file *file)
{
	struct inode *inode = file_inode(file);
	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
			 FMODE_READ) ? 1 : 0;

	return atomic_read(&inode->i_readcount) > countself;
}

And it would allow for acquiring F_WRLCK lease if other
instances of inode are open O_PATH.
A slight change of semantics that seems harmless(?)
and will allow some flexibility.
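To make the counting in the sketch above concrete, here is a user-space model of it. The structs and the MODE_READ/MODE_WRITE constants are hypothetical stand-ins for struct file, struct inode, FMODE_READ and FMODE_WRITE; only the arithmetic mirrors the proposal. It assumes, as described in the thread, that i_readcount counts read-only opens and that O_PATH opens are not counted at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FMODE_READ / FMODE_WRITE. */
#define MODE_READ  0x1
#define MODE_WRITE 0x2

struct model_inode {
	int i_readcount;	/* number of read-only opens */
};

struct model_file {
	unsigned int f_mode;	/* 0 models an O_PATH open */
	struct model_inode *inode;
};

/* Model of open(): only read-only opens bump i_readcount. */
static void model_open(struct model_file *f, struct model_inode *inode,
		       unsigned int mode)
{
	f->f_mode = mode;
	f->inode = inode;
	if ((mode & (MODE_READ | MODE_WRITE)) == MODE_READ)
		inode->i_readcount++;
}

/* Mirrors the proposal: does anyone besides @f hold a read-only open? */
static bool model_file_is_open_for_read(const struct model_file *f)
{
	int countself = ((f->f_mode & (MODE_READ | MODE_WRITE)) ==
			 MODE_READ) ? 1 : 0;

	return f->inode->i_readcount > countself;
}
```

In this model an O_PATH open of the same inode leaves the check false, which is the relaxation the proposal relies on, while a second read-only open makes it true.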

But if samba can't figure out a way to keep a single open file
descriptor for oplocks per client-file, then this model doesn't
help us make any progress.

Thanks,
Amir.

2019-02-08 20:16:51

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
> >
> > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > instead of checking d_count and i_count.
> >
> > Independently of the rest, I'd love to do away with those
> > d_count/i_count checks. What's inode_is_open_for_read()?
> >
>
> It would look maybe something like this:
>
> static inline bool file_is_open_for_read(const struct file *file)
> {
> 	struct inode *inode = file_inode(file);
> 	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> 			 FMODE_READ) ? 1 : 0;
>
> 	return atomic_read(&inode->i_readcount) > countself;
> }
>
> And it would allow for acquiring F_WRLCK lease if other
> instances of inode are open O_PATH.
> A slight change of semantics that seems harmless(?)
> and will allow some flexibility.

How did I not know about i_readcount? (Looking) I guess it would mean
adding some dependence on CONFIG_IMA, hm.

--b.

2019-02-08 20:31:21

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, Feb 8, 2019 at 10:17 PM J. Bruce Fields <[email protected]> wrote:
>
> On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
> > >
> > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > instead of checking d_count and i_count.
> > >
> > > Independently of the rest, I'd love to do away with those
> > > d_count/i_count checks. What's inode_is_open_for_read()?
> > >
> >
> > It would look maybe something like this:
> >
> > static inline bool file_is_open_for_read(const struct file *file)
> > {
> > 	struct inode *inode = file_inode(file);
> > 	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > 			 FMODE_READ) ? 1 : 0;
> >
> > 	return atomic_read(&inode->i_readcount) > countself;
> > }
> >
> > And it would allow for acquiring F_WRLCK lease if other
> > instances of inode are open O_PATH.
> > A slight change of semantics that seems harmless(?)
> > and will allow some flexibility.
>
> How did I not know about i_readcount? (Looking) I guess it would mean
> adding some dependence on CONFIG_IMA, hm.
>

Yes, or we remove ifdef CONFIG_IMA from i_readcount.
I am not sure if the concern was size of struct inode
(shouldn't increase on 64bit arch) or the accounting on
open/close. The impact doesn't look significant (?)..

Thanks,
Amir.

2019-02-08 22:44:18

by Jeremy Allison

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein via samba-technical wrote:
> On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
> >
> > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > instead of checking d_count and i_count.
> >
> > Independently of the rest, I'd love to do away with those
> > d_count/i_count checks. What's inode_is_open_for_read()?
> >
>
> It would look maybe something like this:
>
> static inline bool file_is_open_for_read(const struct file *file)
> {
> 	struct inode *inode = file_inode(file);
> 	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> 			 FMODE_READ) ? 1 : 0;
>
> 	return atomic_read(&inode->i_readcount) > countself;
> }
>
> And it would allow for acquiring F_WRLCK lease if other
> instances of inode are open O_PATH.
> A slight change of semantics that seems harmless(?)
> and will allow some flexibility.
>
> But if samba can't figure out a way to keep a single open file
> descriptor for oplocks per client-file, then this model doesn't
> help us make any progress.

Samba uses a single file descriptor per SMB2 open file
handle. Is this what you meant ? We need this to keep
the per-handle OFD locks around.
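The reason a single descriptor per SMB2 handle matters is that OFD locks are owned by the open file description. A minimal sketch of that property, assuming Linux 3.15 or later (F_OFD_SETLK with l_pid set to 0): two descriptors from separate open() calls in the same process conflict with each other, whereas classic F_SETLK locks would not, because those belong to the process. The helper name try_ofd_wrlock() is hypothetical.

```c
#define _GNU_SOURCE		/* for F_OFD_SETLK */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to take a whole-file OFD write lock on @fd
 * without blocking. Returns 0 on success or -errno on failure
 * (EAGAIN, or possibly EACCES, when another open file description
 * holds a conflicting lock).
 */
static int try_ofd_wrlock(int fd)
{
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "through end of file" */
	fl.l_pid = 0;		/* must be 0 for the OFD commands */

	if (fcntl(fd, F_OFD_SETLK, &fl) < 0)
		return -errno;
	return 0;
}
```

Closing the descriptor that owns the lock (when it is the last reference to that open file description) releases it, so a server that juggles several descriptors per client handle would lose or mix up that per-handle state.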

2019-02-09 04:04:35

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sat, Feb 9, 2019 at 12:12 AM Jeremy Allison <[email protected]> wrote:
>
> On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein via samba-technical wrote:
> > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
> > >
> > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > instead of checking d_count and i_count.
> > >
> > > Independently of the rest, I'd love to do away with those
> > > d_count/i_count checks. What's inode_is_open_for_read()?
> > >
> >
> > It would look maybe something like this:
> >
> > static inline bool file_is_open_for_read(const struct file *file)
> > {
> > 	struct inode *inode = file_inode(file);
> > 	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > 			 FMODE_READ) ? 1 : 0;
> >
> > 	return atomic_read(&inode->i_readcount) > countself;
> > }
> >
> > And it would allow for acquiring F_WRLCK lease if other
> > instances of inode are open O_PATH.
> > A slight change of semantics that seems harmless(?)
> > and will allow some flexibility.
> >
> > But if samba can't figure out a way to keep a single open file
> > descriptor for oplocks per client-file, then this model doesn't
> > help us make any progress.
>
> Samba uses a single file descriptor per SMB2 open file
> handle. Is this what you meant ? We need this to keep
> the per-handle OFD locks around.

I understand now there are several cases when smbd has
several open file descriptors for the same client.
Is that related to this comment in samba wiki about kernel oplocks?

"Linux kernel oplocks don't provide the needed features.
(They don't even work correctly for oplocks...)"

Can you elaborate on that? Is that because a samba oplock
is per client and therefore other file opens from same client
should not break its own lease?
If that is the case, then Bruce's work on the "delegations"
flavor of kernel oplocks could make them a good fit for samba.

As Bruce wrote, we could export the "delegations" flavor
to user space for samba, just after we figure out it matches
samba requirements.

Thanks,
Amir.

[1] https://wiki.samba.org/index.php/Samba3/SMB2#locking.2Fopen_files_.28fs_layer.29
[2] https://lore.kernel.org/lkml/[email protected]/

2019-02-11 05:31:48

by ronnie sahlberg

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sat, Feb 9, 2019 at 12:47 AM Amir Goldstein via samba-technical
<[email protected]> wrote:
>
> On Fri, Feb 8, 2019 at 3:10 PM Jeff Layton <[email protected]> wrote:
> >
> > On Fri, 2019-02-08 at 13:20 +0200, Amir Goldstein wrote:
> > > Hi Bruce,
> > >
> > > I have been following your discussion with Volker Lendecke
> > > on the samba technical mailing list [1] and have discussed
> > > this issue with Volker myself as well.
> > >
> > > I decided to start this new thread to bring some kernel developers
> > > in the loop and to propose an idea that takes a somewhat
> > > different approach to the "interop" approaches I have seen
> > > so far. "interop" in this context often means consistency of file
> > > lock states between samba and nfs server, but I am referring
> > > to the stronger sense of interop with local filesystem on the server.
> > >
> > > You pointed to Pavel Shilovsky's O_DENY* patches [2] as a possible
> > > solution to interop of NFS Share Reservation and SMB Share Mode
> > > with local filesystems.
> > > Some of the complaints on this approach were (rightfully) concerned
> > > about DoS and the prospect of plaguing Linux with Windows server
> > > "files left open" issues.
> > >
> > > My idea comes from the observation that Windows server
> > > administrators can release locked files that were left open by clients.
> > > I suppose that an NFS server admin can do the same?
> >
> > The Linux kernel has no mechanism for this (aside from sending a SIGKILL
> > to lockd, which makes it drop all locks). Solaris did have a tool for
> > this at one point (and probably still does).
> >
> > It's a little less of a problem now than it used to be with NFS, given
> > the move to NFSv4 (which has lease-based locking). If you have
> > misbehaving clients, you just kick them out and their locks eventually
> > go away. v3 locks can stick around in perpetuity however, so people have
> > long wanted such a tool on Linux as well.
> >

Now, maybe NFSv3 is rapidly becoming obsolete, so it might not be worth
spending time on.
We had such tools at EMC for our servers, and NetApp had them too.
Customers loved these tools, especially since locking with NFSv3 was
always very fragile :-)

Many years ago I looked at Linux NFS to see if I could write such
tools. (The tools themselves are really simple: they just use the NLM
protocol, which the kernel already speaks, so there is no need to add
any new APIs.)
Talking from dim memory, but I think what tripped me up is that in the
NLM_TEST reply, Linux does not fill in the OH field, i.e. the owner
string, which you would need in order to send a spoofed NLM_UNLOCK to
release the lock.
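The dependency on the owner handle can be made concrete with a sketch, assuming the NLM version 4 XDR definitions from the Open Group XNFS specification: a denied NLM_TEST reply carries an nlm4_holder (exclusive, svid, oh, l_offset, l_len), while NLM_UNLOCK takes an nlm4_lock that must repeat the owner handle (oh) and svid. The C structs below are hand-written simplifications of those XDR types, not rpcgen output; build_unlock_from_holder() is hypothetical, and caller_name and fh are assumed to be known to the admin tool by other means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the XDR "netobj" (variable-length opaque). */
struct netobj {
	size_t n_len;
	const unsigned char *n_bytes;
};

/* Simplified nlm4_holder, as returned in a denied NLM_TEST reply. */
struct nlm4_holder {
	bool exclusive;
	int32_t svid;
	struct netobj oh;	/* lock owner handle */
	uint64_t l_offset;
	uint64_t l_len;
};

/* Simplified nlm4_lock, the lock description NLM_UNLOCK expects. */
struct nlm4_lock {
	const char *caller_name;
	struct netobj fh;
	struct netobj oh;
	int32_t svid;
	uint64_t l_offset;
	uint64_t l_len;
};

/*
 * Hypothetical helper: fill in an unlock request from a TEST reply.
 * Returns false when the server left the owner handle empty, the case
 * described above, because the unlock could then not name the owner.
 */
static bool build_unlock_from_holder(const struct nlm4_holder *h,
				     const char *caller_name,
				     struct netobj fh,
				     struct nlm4_lock *out)
{
	if (h->oh.n_len == 0)
		return false;

	out->caller_name = caller_name;
	out->fh = fh;
	out->oh = h->oh;
	out->svid = h->svid;
	out->l_offset = h->l_offset;
	out->l_len = h->l_len;
	return true;
}
```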


>
> In a nut shell, I think my proposal is that samba will do something
> similar and request leases from the kernel instead of trying to
> enforce real mandatory locks.
>
> > > That realization makes "share access" locks (a.k.a. MAND_LOCK)
> > > not so very different from oplocks (leases/delegations).
> > > As long as samba and nfsd cooperate nicely with MAND_LOCK
> > > semantics, we don't really have to force local filesystems
> > > to obey MAND_LOCK semantics. If the file servers take leases
> > > on local filesystems, they will not get exclusive write access for
> > > files already open for write on local filesystem and same for read.
> > >
> >
> > I think this last statement isn't correct (if I'm parsing it correctly).
> > If a file is already open for write, then you just don't get a lease
> > when you try to request one. Ditto for write leases if it's already open
> > for read.
> >
>
> I think you misread what I miswrote ;-)
> As the title of this thread states, I am talking about the first case
> of acquiring an exclusive or read shared access to file at open time.
> It may be the fact that samba currently calls flock(LOCK_MAND)
> that is the source for confusion.
>
> Open failure is the expected behavior if file is already open for
> write (or read) on local filesystem, so my suggestion is:
> - Server opens the file and requests a lease based on the desired share mode
> - If file server got the lease, client gets the file handle
> - Otherwise, client gets an open failure
>
> > > On local file access on the server that violates the share mode,
> > > the file server acts as a grumpy washed out administrator that
> > > automatically grants any lock revoke ticket after timeout.
> > >
> >
> > Devil's advocate:
> >
> > Is this situation any better than just teaching the NFS/SMB servers to
> > track these locks out of band? Both samba and most NFS servers respect
> > share/deny mode locks, but only internally -- they aren't aware of the
> > others'. We could (in principle) come up with a mechanism to track these
> > that doesn't involve plumbing them into the kernel.
> >
>
> That would be a prerequisite to my suggested solution, as I wrote:
> "As long as samba and nfsd cooperate nicely with LOCK_MAND..."
> That means the two file servers cooperate on the share mode locks
> and try to figure out if there are outstanding leases before opening
> a file that will break those leases.
>
> > That said, coherent locking is best done in the kernel, IMO...
> >
>
> Indeed...
>
> > > This model may not fit use cases where "real" interop with
> > > local filesystem is needed, but compared to the existing
> > > solution (no interop at all) it is quite an improvement.
> > >
> > > Furthermore, short of SMB DENY_DELETE, we may not even
> > > need to change any kernel APIs.
> > > The addition of O_DENY* open flags can make programming
> > > easier, but taking a lease on an open file is still safe enough
> > > to implement share reservation (no?).
> > >
> > > Satisfying DENY_DELETE could be more tricky, but perhaps
> > > the existing SILLYRENAME interface between knfsd and vfs
> > > could be somehow utilized for this purpose?
> > >
> > > I thought of bringing this up as a TOPIC for LSF/MM, but wanted
> > > to consult with you first. I am sure that you or Jeff can do a better
> > > job than me in enumerating the "interop" file lock issues that
> > > could be discussed in filesystems track forum.
> > >
> > > Thoughts? Explanation why this idea is idiotic?
> >
> > I think it's not a single idea. There are really two different aspects
> > to this given that we're really talking about two different types of
> > locks in SMB. I think you have to consider solving these problems
> > separately:
> >
> > 1) the ability to set a (typically whole-file) share/deny lock
> > atomically when you open a file. This is necessary for coherent
> > share/deny lock semantics. Note that these are only enforced at
> > open() time.
> >
> > 2) mandatory locking (forbidding reads and writes on a byte range when
> > there is a conflicting lock set).
> >
>
> I was only trying to address the first problem (small steps...).
>
> > The first could (probably) be solved with something like what Pavel
> > proposed a few years ago...or maybe we just wire up O_EXLOCK and
> > O_SHLOCK:
> >
> > https://www.gnu.org/software/libc/manual/html_node/Open_002dtime-Flags.html
> >
>
> Nice. I wasn't aware of those BSD flags.
>
> > This seems like a fine idea (in principle) but it needs someone to drive
> > the work forward. You'll also likely be consuming a couple of O_* flags,
> > which could be a tough sell (unless you come up with another way to do
> > it).
> >
>
> Once I know the obstacles to watch out for, I can drive this work.
> Thing is, I am not convinced myself that any new O_ flags are needed.
>
> How about this (for samba, knfsd is simpler):
> - pfd = open(filename, O_PATH)
> - flock(pfd, LOCK_MAND) (for file servers interop)
> - vfs checks no conflicting LOCK_MAND locks (like patch you once posted)
> - open(filename, O_RDWR) (and verify st_ino like samba does)
> - Request lease (for local fs interop)
> - check_conflicting_open() is changed to use inode_is_open_for_read()
> instead of checking d_count and i_count.
> - we already have i_readcount, just need to remove ifdef CONFIG_IMA
> - On lease break (from local fs), break client oplocks and invalidate
> file handle on server
>
> Thanks,
> Amir.
>
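The local-filesystem half of the proposal quoted above (open, verify st_ino, request a lease, and fail the client's open if no lease is granted) can be sketched in user space with the existing F_SETLEASE API. This is a minimal sketch under stated assumptions: the LOCK_MAND step and the Samba/knfsd cooperation are left out, lease_open() is a hypothetical name, and lease-break handling is omitted. By default the kernel notifies a lease holder with SIGIO, whose default action terminates the process, so a real server would install a handler or use F_SETSIG. F_SETLEASE also requires that the caller own the file or have CAP_LEASE, and a write lease is refused while any other descriptor has the file open.

```c
#define _GNU_SOURCE		/* for O_PATH and F_SETLEASE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper sketching the proposed server-side open:
 *  1. open an O_PATH handle and record which inode the name refers to;
 *  2. open the file for I/O and check it is still the same inode;
 *  3. request a read lease (shared) or a write lease (exclusive);
 *  4. if the lease is refused, fail the open rather than grant it.
 * Returns an fd holding the lease, or -errno.
 */
static int lease_open(const char *path, int exclusive)
{
	struct stat pst, st;
	int pfd, fd, err;

	pfd = open(path, O_PATH);
	if (pfd < 0)
		return -errno;
	if (fstat(pfd, &pst) < 0) {
		err = errno;
		close(pfd);
		return -err;
	}

	/* A read lease needs a read-only fd; a write lease needs write access. */
	fd = open(path, exclusive ? O_RDWR : O_RDONLY);
	err = errno;	/* only meaningful if fd < 0 */
	/*
	 * Drop the O_PATH handle before asking for the lease: kernels that
	 * count dentry/inode references would otherwise treat it as a
	 * conflicting open and refuse a write lease.
	 */
	close(pfd);
	if (fd < 0)
		return -err;

	/* fstat failed, or the name now points at a different file. */
	if (fstat(fd, &st) < 0 || st.st_dev != pst.st_dev ||
	    st.st_ino != pst.st_ino) {
		close(fd);
		return -ESTALE;
	}

	if (fcntl(fd, F_SETLEASE, exclusive ? F_WRLCK : F_RDLCK) < 0) {
		err = errno;	/* EAGAIN: a conflicting local open exists */
		close(fd);
		return -err;
	}
	return fd;	/* close(fd) or F_SETLEASE with F_UNLCK drops the lease */
}
```

With a local reader holding the file open, an exclusive lease_open() fails, which maps to the "client gets an open failure" step.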

2019-02-14 20:51:02

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, Feb 08, 2019 at 10:31:07PM +0200, Amir Goldstein wrote:
> On Fri, Feb 8, 2019 at 10:17 PM J. Bruce Fields <[email protected]> wrote:
> > On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> > > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
> > > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > > instead of checking d_count and i_count.
> > > >
> > > > Independently of the rest, I'd love to do away with those
> > > > d_count/i_count checks. What's inode_is_open_for_read()?
> > > >
> > >
> > > It would look maybe something like this:
> > >
> > > static inline bool file_is_open_for_read(const struct file *file)
> > > {
> > > 	struct inode *inode = file_inode(file);
> > > 	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > > 			 FMODE_READ) ? 1 : 0;
> > >
> > > 	return atomic_read(&inode->i_readcount) > countself;
> > > }
> > >
> > > And it would allow for acquiring F_WRLCK lease if other
> > > instances of inode are open O_PATH.
> > > A slight change of semantics that seems harmless(?)
> > > and will allow some flexibility.
> >
> > How did I not know about i_readcount? (Looking) I guess it would mean
> > adding some dependence on CONFIG_IMA, hm.
> >
>
> Yes, or we remove ifdef CONFIG_IMA from i_readcount.
> I am not sure if the concern was size of struct inode
> (shouldn't increase on 64bit arch) or the accounting on
> open/close. The impact doesn't look significant (?)..

Looks like the original patch was d984ea604943bb "fs: move i_readcount".
I did some googling around and looked at the discussion summarized by
https://lwn.net/Articles/410895/ but can't find useful discussion of
i_readcount impact.

Looks like CONFIG_IMA is on in Fedora and RHEL, for what it's worth.

Maybe something like this?

--b.

commit 02cfda99ed8c
Author: J. Bruce Fields <[email protected]>
Date: Thu Feb 14 15:02:02 2019 -0500

locks: use i_readcount to detect lease conflicts

The lease code currently uses the inode and dentry refcounts to detect
whether someone has a file open for read. This seems fragile. Use
i_readcount instead.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/locks.c b/fs/locks.c
index ff6af2c32601..299abad65545 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1769,8 +1769,7 @@ check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
if ((arg == F_RDLCK) && inode_is_open_for_write(inode))
return -EAGAIN;

- if ((arg == F_WRLCK) && ((d_count(dentry) > 1) ||
- (atomic_read(&inode->i_count) > 1)))
+ if ((arg == F_WRLCK) && (atomic_read(&inode->i_readcount) > 1))
ret = -EAGAIN;

return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 29d8e2cfed0e..e862de682da9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -676,7 +676,7 @@ struct inode {
atomic_t i_count;
atomic_t i_dio_count;
atomic_t i_writecount;
-#ifdef CONFIG_IMA
+#if defined(CONFIG_IMA) || defined(CONFIG_FILE_LOCKING)
atomic_t i_readcount; /* struct files open RO */
#endif
const struct file_operations *i_fop; /* former ->i_op->default_file_ops */
@@ -2869,7 +2869,7 @@ static inline bool inode_is_open_for_write(const struct inode *inode)
return atomic_read(&inode->i_writecount) > 0;
}

-#ifdef CONFIG_IMA
+#if defined(CONFIG_IMA) || defined(CONFIG_FILE_LOCKING)
static inline void i_readcount_dec(struct inode *inode)
{
BUG_ON(!atomic_read(&inode->i_readcount));

2019-02-14 21:06:53

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sat, Feb 09, 2019 at 06:04:22AM +0200, Amir Goldstein wrote:
> On Sat, Feb 9, 2019 at 12:12 AM Jeremy Allison <[email protected]> wrote:
> >
> > On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein via samba-technical wrote:
> > > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
> > > >
> > > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > > instead of checking d_count and i_count.
> > > >
> > > > Independently of the rest, I'd love to do away with those
> > > > d_count/i_count checks. What's inode_is_open_for_read()?
> > > >
> > >
> > > It would look maybe something like this:
> > >
> > > static inline bool file_is_open_for_read(const struct file *file)
> > > {
> > > 	struct inode *inode = file_inode(file);
> > > 	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > > 			 FMODE_READ) ? 1 : 0;
> > >
> > > 	return atomic_read(&inode->i_readcount) > countself;
> > > }
> > >
> > > And it would allow for acquiring F_WRLCK lease if other
> > > instances of inode are open O_PATH.
> > > A slight change of semantics that seems harmless(?)
> > > and will allow some flexibility.
> > >
> > > But if samba can't figure out a way to keep a single open file
> > > descriptor for oplocks per client-file, then this model doesn't
> > > help us make any progress.
> >
> > Samba uses a single file descriptor per SMB2 open file
> > handle. Is this what you meant ? We need this to keep
> > the per-handle OFD locks around.
>
> I understand now there are several cases when smbd has
> several open file descriptors for the same client.
> Is that related to this comment in samba wiki about kernel oplocks?
>
> "Linux kernel oplocks don't provide the needed features.
> (They don't even work correctly for oplocks...)"
>
> Can you elaborate on that? Is that because a samba oplock
> is per client and therefore other file opens from same client
> should not break its own lease?
> If that is the case, then Bruce's work on the "delegations"
> flavor of kernel oplocks could make them a good fit for samba.

After this:

https://marc.info/?l=linux-nfs&m=154966239918297&w=2

delegations would no longer conflict with opens from the same tgid. So
if your threads all run in the same process and you're willing to manage
conflicts among your own clients, that should still allow you to do
multiple opens of the same file without giving up your lease/delegation.

I'd be curious to know whether that works with Samba's design.
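As a toy model of the behaviour described above (delegations no longer conflicting with opens from the same thread group), the check below skips openers that share the holder's tgid. Everything here, including struct opener and deleg_conflicts(), is a hypothetical user-space stand-in rather than the kernel's code; it only illustrates why one smbd process per client could keep a read delegation while it opens the same file again on that client's behalf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical record of one open of the file. */
struct opener {
	pid_t tgid;		/* thread group (process) that opened the file */
	bool for_write;
};

/*
 * Toy model: does a delegation held by @holder_tgid conflict with any of
 * the @n opens? Opens from the holder's own thread group are ignored; the
 * process is trusted to sort out conflicts among its own clients. A read
 * delegation only conflicts with writers; a write delegation conflicts
 * with any other open.
 */
static bool deleg_conflicts(pid_t holder_tgid, bool write_deleg,
			    const struct opener *opens, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (opens[i].tgid == holder_tgid)
			continue;
		if (write_deleg || opens[i].for_write)
			return true;
	}
	return false;
}
```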

--b.

2019-02-15 07:32:18

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Thu, Feb 14, 2019 at 10:51 PM J. Bruce Fields <[email protected]> wrote:
>
> On Fri, Feb 08, 2019 at 10:31:07PM +0200, Amir Goldstein wrote:
> > On Fri, Feb 8, 2019 at 10:17 PM J. Bruce Fields <[email protected]> wrote:
> > > On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> > > > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
> > > > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > > > instead of checking d_count and i_count.
> > > > >
> > > > > Independently of the rest, I'd love to do away with those
> > > > > d_count/i_count checks. What's inode_is_open_for_read()?
> > > > >
> > > >
> > > > It would look maybe something like this:
> > > >
> > > > static inline bool file_is_open_for_read(const struct file *file)
> > > > {
> > > > 	struct inode *inode = file_inode(file);
> > > > 	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > > > 			 FMODE_READ) ? 1 : 0;
> > > >
> > > > 	return atomic_read(&inode->i_readcount) > countself;
> > > > }
> > > >
> > > > And it would allow for acquiring F_WRLCK lease if other
> > > > instances of inode are open O_PATH.
> > > > A slight change of semantics that seems harmless(?)
> > > > and will allow some flexibility.
> > >
> > > How did I not know about i_readcount? (Looking) I guess it would mean
> > > adding some dependence on CONFIG_IMA, hm.
> > >
> >
> > Yes, or we remove ifdef CONFIG_IMA from i_readcount.
> > I am not sure if the concern was size of struct inode
> > (shouldn't increase on 64bit arch) or the accounting on
> > open/close. The impact doesn't look significant (?)..
>
> Looks like the original patch was d984ea604943bb "fs: move i_readcount".
> I did some googling around and looked at the discussion summarized by
> https://lwn.net/Articles/410895/ but can't find useful discussion of
> i_readcount impact.
>
> Looks like CONFIG_IMA is on in Fedora and RHEL, for what it's worth.
>
> Maybe something like this?
>
> --b.
>
> commit 02cfda99ed8c
> Author: J. Bruce Fields <[email protected]>
> Date: Thu Feb 14 15:02:02 2019 -0500
>
> locks: use i_readcount to detect lease conflicts
>
> The lease code currently uses the inode and dentry refcounts to detect
> whether someone has a file open for read. This seems fragile. Use
> i_readcount instead.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/locks.c b/fs/locks.c
> index ff6af2c32601..299abad65545 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1769,8 +1769,7 @@ check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
> if ((arg == F_RDLCK) && inode_is_open_for_write(inode))
> return -EAGAIN;
>
> - if ((arg == F_WRLCK) && ((d_count(dentry) > 1) ||
> - (atomic_read(&inode->i_count) > 1)))
> + if ((arg == F_WRLCK) && (atomic_read(&inode->i_readcount) > 1))
> ret = -EAGAIN;

Alas, i_readcount is not the count of file opens for read, it is the count
of file opens O_RDONLY, so this is incorrect wrt conflict with other writers.

I guess that since there is a full smp_mb() before this check, you
can check (i_readcount + i_writecount) > 1 || (i_writecount < 0)

You can also check if caller itself is O_RDONLY to know if self
count is expected to be in i_readcount or i_writecount, but not sure
it is worth the trouble.
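The difference between the draft check and the suggested one can be seen in a small user-space model. The model_inode struct is a hypothetical stand-in, and the model assumes the counting rules described above: i_readcount counts O_RDONLY opens only, i_writecount counts opens with write access, and a negative i_writecount means writes are denied (for example, the file is being executed). The lease requester's own open is included in the counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two inode counters. */
struct model_inode {
	int i_readcount;	/* O_RDONLY opens */
	int i_writecount;	/* write opens; < 0 means writes are denied */
};

/* The draft patch: only additional read-only opens are noticed. */
static bool wrlease_conflicts_draft(const struct model_inode *inode)
{
	return inode->i_readcount > 1;
}

/*
 * The suggested check: any open besides the requester's own, whether
 * read-only or writable, or a deny-write state, blocks a write lease.
 */
static bool wrlease_conflicts_suggested(const struct model_inode *inode)
{
	return (inode->i_readcount + inode->i_writecount) > 1 ||
	       inode->i_writecount < 0;
}
```

For example, a requester with an O_RDWR open plus one other writer gives { 0, 2 }: the draft check sees no conflict, the suggested one does.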

Thanks,
Amir.

2019-02-15 20:09:41

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, Feb 15, 2019 at 09:31:48AM +0200, Amir Goldstein wrote:
> On Thu, Feb 14, 2019 at 10:51 PM J. Bruce Fields <[email protected]> wrote:
> >
> > On Fri, Feb 08, 2019 at 10:31:07PM +0200, Amir Goldstein wrote:
> > > On Fri, Feb 8, 2019 at 10:17 PM J. Bruce Fields <[email protected]> wrote:
> > > > On Fri, Feb 08, 2019 at 10:02:43PM +0200, Amir Goldstein wrote:
> > > > > On Fri, Feb 8, 2019 at 5:51 PM J. Bruce Fields <[email protected]> wrote:
> > > > > > On Fri, Feb 08, 2019 at 04:45:46PM +0200, Amir Goldstein wrote:
> > > > > > > - check_conflicting_open() is changed to use inode_is_open_for_read()
> > > > > > > instead of checking d_count and i_count.
> > > > > >
> > > > > > Independently of the rest, I'd love to do away with those
> > > > > > d_count/i_count checks. What's inode_is_open_for_read()?
> > > > > >
> > > > >
> > > > > It would look maybe something like this:
> > > > >
> > > > > static inline bool file_is_open_for_read(const struct file *file)
> > > > > {
> > > > > 	struct inode *inode = file_inode(file);
> > > > > 	int countself = ((file->f_mode & (FMODE_READ | FMODE_WRITE)) ==
> > > > > 			 FMODE_READ) ? 1 : 0;
> > > > >
> > > > > 	return atomic_read(&inode->i_readcount) > countself;
> > > > > }
> > > > >
> > > > > And it would allow for acquiring F_WRLCK lease if other
> > > > > instances of inode are open O_PATH.
> > > > > A slight change of semantics that seems harmless(?)
> > > > > and will allow some flexibility.
> > > >
> > > > How did I not know about i_readcount? (Looking) I guess it would mean
> > > > adding some dependence on CONFIG_IMA, hm.
> > > >
> > >
> > > Yes, or we remove ifdef CONFIG_IMA from i_readcount.
> > > I am not sure if the concern was size of struct inode
> > > (shouldn't increase on 64bit arch) or the accounting on
> > > open/close. The impact doesn't look significant (?)..
> >
> > Looks like the original patch was d984ea604943bb "fs: move i_readcount".
> > I did some googling around and looked at the discussion summarized by
> > https://lwn.net/Articles/410895/ but can't find useful discussion of
> > i_readcount impact.
> >
> > Looks like CONFIG_IMA is on in Fedora and RHEL, for what it's worth.
> >
> > Maybe something like this?
> >
> > --b.
> >
> > commit 02cfda99ed8c
> > Author: J. Bruce Fields <[email protected]>
> > Date: Thu Feb 14 15:02:02 2019 -0500
> >
> > locks: use i_readcount to detect lease conflicts
> >
> > The lease code currently uses the inode and dentry refcounts to detect
> > whether someone has a file open for read. This seems fragile. Use
> > i_readcount instead.
> >
> > Signed-off-by: J. Bruce Fields <[email protected]>
> >
> > diff --git a/fs/locks.c b/fs/locks.c
> > index ff6af2c32601..299abad65545 100644
> > --- a/fs/locks.c
> > +++ b/fs/locks.c
> > @@ -1769,8 +1769,7 @@ check_conflicting_open(const struct dentry *dentry, const long arg, int flags)
> > if ((arg == F_RDLCK) && inode_is_open_for_write(inode))
> > return -EAGAIN;
> >
> > - if ((arg == F_WRLCK) && ((d_count(dentry) > 1) ||
> > - (atomic_read(&inode->i_count) > 1)))
> > + if ((arg == F_WRLCK) && (atomic_read(&inode->i_readcount) > 1))
> > ret = -EAGAIN;
>
> Alas, i_readcount is not the count of file opens for read, it is the count
> of file opens O_RDONLY, so this is incorrect wrt conflict with other writers.

Whoops, thanks!

> I guess that since there is a full smp_mb() before this check, you
> can check (i_readcount + i_writecount) > 1 || (i_writecount < 0)
>
> You can also check if caller itself is O_RDONLY to know if self
> count is expected to be in i_readcount or i_writecount, but not sure
> it is worth the trouble.

I don't know, it still looks reasonable. I'll fool around with it.

--b.

2019-03-05 21:47:49

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> After this:
>
> https://marc.info/?l=linux-nfs&m=154966239918297&w=2
>
> delegations would no longer conflict with opens from the same tgid. So
> if your threads all run in the same process and you're willing to manage
> conflicts among your own clients, that should still allow you to do
> multiple opens of the same file without giving up your lease/delegation.
>
> I'd be curious to know whether that works with Samba's design.

Any idea whether that would work?

(Easy? Impossible? Possible, but realistically the changes required to
Samba would be painful enough that it'd be unlikely to get done?)

--b.
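A minimal model of the rule described above: opens from the lease holder's own thread group no longer break its lease or delegation. The struct and function names are hypothetical and process IDs are plain pid_t values; the linked patch is the authoritative version and may differ in detail.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical lease record; field names are illustrative only. */
struct lease_model {
	pid_t owner_tgid; /* thread group (process) holding the lease */
	bool is_write;    /* write lease/delegation, else read */
};

/*
 * Sketch of the proposed rule: opens from the holder's own thread
 * group never break the lease (the server resolves conflicts among
 * its own clients).  Otherwise any open breaks a write lease, and an
 * open for write breaks a read lease.
 */
bool open_breaks_lease(const struct lease_model *l, pid_t opener_tgid,
		       bool open_for_write)
{
	if (opener_tgid == l->owner_tgid)
		return false;
	return l->is_write || open_for_write;
}
```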

2019-03-06 07:09:34

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Tue, Mar 5, 2019 at 11:48 PM J. Bruce Fields <[email protected]> wrote:
>
> On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > After this:
> >
> > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> >
> > delegations would no longer conflict with opens from the same tgid. So
> > if your threads all run in the same process and you're willing to manage
> > conflicts among your own clients, that should still allow you to do
> > multiple opens of the same file without giving up your lease/delegation.
> >
> > I'd be curious to know whether that works with Samba's design.
>
> Any idea whether that would work?
>
> (Easy? Impossible? Possible, but realistically the changes required to
> Samba would be painful enough that it'd be unlikely to get done?)
>

[CC Ralph Boehme]

I am not a samba team member, but seems to me that your proposal
fits samba design like a glove. With one smbd process per client connection,
with your proposal, opens (for read) from same smbd process will not break the
shared read lease from same client, so oplocks level II could be implemented
using kernel oplocks (new flavor).

IOW, can someone from samba team please elaborate on this quote
from samba wiki [1]: "Linux kernel oplocks don't provide the needed features.
(They don't even work correctly for oplocks...) ==> SMB-only feature."

[1] https://wiki.samba.org/index.php/Samba3/SMB2#new_concepts

I would like to use this opportunity to ask samba team members to raise
any (*) other pain points about missing or lacking Linux kernel interfaces.
I promise to use my time in LSF/MM 2019 to try and promote samba
needs among Linux filesystem developers.

Preferably, update the samba wiki page with wish list from Linux kernel,
but try to be more descriptive than "... don't provide the needed features".

(*) OK, not RichACLs. I know my own limitations.

Thanks,
Amir.

2019-03-06 15:11:52

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Tue, Mar 05, 2019 at 04:47:48PM -0500, J. Bruce Fields wrote:
> On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > After this:
> >
> > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> >
> > delegations would no longer conflict with opens from the same tgid. So
> > if your threads all run in the same process and you're willing to manage
> > conflicts among your own clients, that should still allow you to do
> > multiple opens of the same file without giving up your lease/delegation.
> >
> > I'd be curious to know whether that works with Samba's design.
>
> Any idea whether that would work?
>
> (Easy? Impossible? Possible, but realistically the changes required to
> Samba would be painful enough that it'd be unlikely to get done?)

Volker reminds me off-list that he'd like to see Ganesha and Samba work
out an API in userspace first before committing to a user<->kernel API.

Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
API? Is that close to what's needed?

In any case, my immediate goal is just to get knfsd fixed, which doesn't
really commit us to anything--knfsd only needs kernel internal
interfaces. But it'd be nice to have at least some idea if we're on the
right track, to save having to redo that work later.

--b.

2019-03-06 15:17:37

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Wed, Mar 06, 2019 at 09:09:21AM +0200, Amir Goldstein wrote:
> On Tue, Mar 5, 2019 at 11:48 PM J. Bruce Fields <[email protected]> wrote:
> >
> > On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > > After this:
> > >
> > > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> > >
> > > delegations would no longer conflict with opens from the same tgid. So
> > > if your threads all run in the same process and you're willing to manage
> > > conflicts among your own clients, that should still allow you to do
> > > multiple opens of the same file without giving up your lease/delegation.
> > >
> > > I'd be curious to know whether that works with Samba's design.
> >
> > Any idea whether that would work?
> >
> > (Easy? Impossible? Possible, but realistically the changes required to
> > Samba would be painful enough that it'd be unlikely to get done?)
> >
>
> [CC Ralph Boehme]
>
> I am not a samba team member, but seems to me that your proposal
> fits samba design like a glove. With one smbd process per client connection,
> with your proposal, opens (for read) from same smbd process will not break the
> shared read lease from same client, so oplocks level II could be implemented
> using kernel oplocks (new flavor).

OK. So I wonder about Ganesha. I'm not sure, but I *think* it's like
knfsd in that it has a bunch of worker threads that can each take RPCs
from any client. I don't remember if they're actually threads or
processes.

> IOW, can someone from samba team please elaborate on this quote
> from samba wiki [1]: "Linux kernel oplocks don't provide the needed features.
> (They don't even work correctly for oplocks...) ==> SMB-only feature."
>
> [1] https://wiki.samba.org/index.php/Samba3/SMB2#new_concepts

Yes, it'd be useful to get those details written down in one place.

> I would like to use this opportunity to ask samba team members to raise
> any (*) other pain points about missing or lacking Linux kernel interfaces.
> I promise to use my time in LSF/MM 2019 to try and promote samba
> needs among Linux filesystem developers.

I feel like this particular problem is about details of
oplock/lease/delegation semantics that will interest a small number of
people, so should mainly be handled as a hallway-track thing. But,
maybe it's good to bring it up in a session if only to make sure anyone
interested is aware.

> (*) OK, not RichACLs. I know my own limitations.

Hah.

--b.

2019-03-06 20:31:12

by Jeff Layton

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Wed, 2019-03-06 at 10:11 -0500, J. Bruce Fields wrote:
> On Tue, Mar 05, 2019 at 04:47:48PM -0500, J. Bruce Fields wrote:
> > On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > > After this:
> > >
> > > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> > >
> > > delegations would no longer conflict with opens from the same tgid. So
> > > if your threads all run in the same process and you're willing to manage
> > > conflicts among your own clients, that should still allow you to do
> > > multiple opens of the same file without giving up your lease/delegation.
> > >
> > > I'd be curious to know whether that works with Samba's design.
> >
> > Any idea whether that would work?
> >
> > (Easy? Impossible? Possible, but realistically the changes required to
> > Samba would be painful enough that it'd be unlikely to get done?)
>
> Volker reminds me off-list that he'd like to see Ganesha and Samba work
> out an API in userspace first before committing to a user<->kernel API.
>
> Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
> API? Is that close to what's needed?
>

Here's the C headers for that stuff:

https://github.com/ceph/ceph/blob/7ba6bece4187eda5d05a9b84211fe6ba8dd287bd/src/include/cephfs/libcephfs.h#L1734

It's simple enough and works for us in ganesha, and I think we can
probably adapt it to samba without too much difficulty. The callback
doesn't seem like it'll do for a kernel API though -- you'd almost
certainly need to do something different there (signals? inotify?).

> In any case, my immediate goal is just to get knfsd fixed, which doesn't
> really commit us to anything--knfsd only needs kernel internal
> interfaces. But it'd be nice to have at least some idea if we're on the
> right track, to save having to redo that work later.
>


--
Jeff Layton <[email protected]>


2019-03-06 21:07:52

by Jeremy Allison

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Wed, Mar 06, 2019 at 03:31:08PM -0500, Jeff Layton wrote:
> On Wed, 2019-03-06 at 10:11 -0500, J. Bruce Fields wrote:
> > On Tue, Mar 05, 2019 at 04:47:48PM -0500, J. Bruce Fields wrote:
> > > On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > > > After this:
> > > >
> > > > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> > > >
> > > > delegations would no longer conflict with opens from the same tgid. So
> > > > if your threads all run in the same process and you're willing to manage
> > > > conflicts among your own clients, that should still allow you to do
> > > > multiple opens of the same file without giving up your lease/delegation.
> > > >
> > > > I'd be curious to know whether that works with Samba's design.
> > >
> > > Any idea whether that would work?
> > >
> > > (Easy? Impossible? Possible, but realistically the changes required to
> > > Samba would be painful enough that it'd be unlikely to get done?)
> >
> > Volker reminds me off-list that he'd like to see Ganesha and Samba work
> > out an API in userspace first before committing to a user<->kernel API.
> >
> > Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
> > API? Is that close to what's needed?
> >
>
> Here's the C headers for that stuff:
>
> https://github.com/ceph/ceph/blob/7ba6bece4187eda5d05a9b84211fe6ba8dd287bd/src/include/cephfs/libcephfs.h#L1734
>
> It's simple enough and works for us in ganesha, and I think we can
> probably adapt it to samba without too much difficulty. The callback
> doesn't seem like it'll do for a kernel API though -- you'd almost
> certainly need to do something different there (signals? inotify?).

SMB3 leases have R/RW and Handle-based leases.

Handle leases allow multiple opens of the same pathname
that get different handles to share the lease, allowing
a client redirector to delay opens or closes locally
so long as it has a handle lease.

Here are the semantics:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/d8df943d-6ad7-4b30-9f58-96ae90fc6204

I'm not sure a simple file-descriptor based API is
enough for us. Can we have a uuid or token based
API instead where the server can choose which fds
to cover with a token?

2019-03-06 21:25:24

by Ralph Boehme

Subject: Re: Better interop for NFS/SMB file share mode/reservation


Jeremy Allison wrote:
> On Wed, Mar 06, 2019 at 03:31:08PM -0500, Jeff Layton wrote:
>> On Wed, 2019-03-06 at 10:11 -0500, J. Bruce Fields wrote:
>>>
>>> Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
>>> API? Is that close to what's needed?
>>>
>>
>> Here's the C headers for that stuff:
>>
>> https://github.com/ceph/ceph/blob/7ba6bece4187eda5d05a9b84211fe6ba8dd287bd/src/include/cephfs/libcephfs.h#L1734
>>
>> It's simple enough and works for us in ganesha, and I think we can
>> probably adapt it to samba without too much difficulty. The callback
>> doesn't seem like it'll do for a kernel API though -- you'd almost
>> certainly need to do something different there (signals? inotify?).
>
> SMB3 leases have R/RW and Handle-based leases.

Just to be precise: SMB2.1+ has R, RH, RW and RWH leases.

> Handle leases allow multiple opens of the same pathname
> that get different handles to share the lease, allowing
> a client redirector to delay opens or closes locally
> so long as it has a handle lease.

That's a property of leases in general, not just H-leases. The client provides a lease key, which is a GUID, with each lease request.

>
> Here are the semantics:
>
> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/d8df943d-6ad7-4b30-9f58-96ae90fc6204
>
> I'm not sure a simple file-descriptor based API is
> enough for us. Can we have a uuid or token based
> API instead where the server can choose which fds
> to cover with a token?

Yes, that would be ideal.

-slow
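Ralph's two points above can be illustrated with a short C sketch: which lease states SMB2.1+ can grant, and how a client's opens share one lease through its lease key. The SMB2_LEASE_* bit values follow MS-SMB2; the structures, function names and the simplified join rule are hypothetical, not Samba's implementation, and a real server applies many more checks (share modes, breaks in progress and so on).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* SMB2 lease state bits, with the values defined in MS-SMB2. */
#define SMB2_LEASE_NONE           0x00u
#define SMB2_LEASE_READ_CACHING   0x01u
#define SMB2_LEASE_HANDLE_CACHING 0x02u
#define SMB2_LEASE_WRITE_CACHING  0x04u

/*
 * Only R, RH, RW and RWH (or no lease at all) are meaningful states:
 * handle and write caching are never granted without read caching.
 */
bool lease_state_valid(uint32_t state)
{
	const uint32_t known = SMB2_LEASE_READ_CACHING |
			       SMB2_LEASE_HANDLE_CACHING |
			       SMB2_LEASE_WRITE_CACHING;

	if (state & ~known)
		return false;           /* unknown bits set */
	if (state == SMB2_LEASE_NONE)
		return true;            /* no caching granted */
	return (state & SMB2_LEASE_READ_CACHING) != 0;
}

/* 16-byte lease key chosen by the client (a GUID on the wire). */
struct lease_key {
	uint8_t bytes[16];
};

/* Hypothetical per-file lease record a server might keep. */
struct file_lease {
	struct lease_key key;
	uint32_t state;      /* combination of SMB2_LEASE_* bits */
	unsigned int nopens; /* handles currently sharing this lease */
};

/*
 * A new open presenting the same lease key joins the existing lease,
 * so every handle the client opens with that key shares one lease and
 * none of them breaks it.  A different key means a different lease
 * owner: return false so the caller can run its conflict checks.
 */
bool lease_try_join(struct file_lease *l, const struct lease_key *key)
{
	if (memcmp(l->key.bytes, key->bytes, sizeof(key->bytes)) != 0)
		return false;
	l->nopens++;
	return true;
}
```

A token-based kernel interface of the kind Jeremy asks about would, in effect, let the server attach such a key to several file descriptors.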

2019-03-06 21:55:56

by Jeff Layton

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Wed, 2019-03-06 at 13:07 -0800, Jeremy Allison wrote:
> On Wed, Mar 06, 2019 at 03:31:08PM -0500, Jeff Layton wrote:
> > On Wed, 2019-03-06 at 10:11 -0500, J. Bruce Fields wrote:
> > > On Tue, Mar 05, 2019 at 04:47:48PM -0500, J. Bruce Fields wrote:
> > > > On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > > > > After this:
> > > > >
> > > > > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> > > > >
> > > > > delegations would no longer conflict with opens from the same tgid. So
> > > > > if your threads all run in the same process and you're willing to manage
> > > > > conflicts among your own clients, that should still allow you to do
> > > > > multiple opens of the same file without giving up your lease/delegation.
> > > > >
> > > > > I'd be curious to know whether that works with Samba's design.
> > > >
> > > > Any idea whether that would work?
> > > >
> > > > (Easy? Impossible? Possible, but realistically the changes required to
> > > > Samba would be painful enough that it'd be unlikely to get done?)
> > >
> > > Volker reminds me off-list that he'd like to see Ganesha and Samba work
> > > out an API in userspace first before committing to a user<->kernel API.
> > >
> > > Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
> > > API? Is that close to what's needed?
> > >
> >
> > Here's the C headers for that stuff:
> >
> > https://github.com/ceph/ceph/blob/7ba6bece4187eda5d05a9b84211fe6ba8dd287bd/src/include/cephfs/libcephfs.h#L1734
> >
> > It's simple enough and works for us in ganesha, and I think we can
> > probably adapt it to samba without too much difficulty. The callback
> > doesn't seem like it'll do for a kernel API though -- you'd almost
> > certainly need to do something different there (signals? inotify?).
>
> SMB3 leases have R/RW and Handle-based leases.
>
> Handle leases allow multiple opens of the same pathname
> that get different handles to share the lease, allowing
> a client redirector to delay opens or closes locally
> so long as it has a handle lease.
>
> Here are the semantics:
>
> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/d8df943d-6ad7-4b30-9f58-96ae90fc6204
>
> I'm not sure a simple file-descriptor based API is
> enough for us. Can we have a uuid or token based
> API instead where the server can choose which fds
> to cover with a token?

The libcephfs API takes an opaque void * that could hold such a token.
It gets passed back to the callback function when it's called to handle
a delegation break.

I could envision a delegation storing something similar (maybe an
unsigned long or uint64_t) and passing it back in a delegation break. With
that you could set multiple delegations on a fd, and use that to store a
key for each one.

Would something like that work for samba, or am I misunderstanding what
it needs?

--
Jeff Layton <[email protected]>
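A sketch of Jeff's suggestion: several delegations on one fd, each tagged by the server with its own uint64_t token that is handed back when the delegation is broken. Everything here (the table, the fixed limit, the callback type and the record_break example callback) is hypothetical and only illustrates the shape of such an interface; it is not the libcephfs API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: called once per delegation being recalled. */
typedef void (*deleg_break_cb)(int fd, uint64_t token);

#define MAX_DELEGS 8

/*
 * Hypothetical per-fd table: several delegations on one open file,
 * each tagged with an opaque token chosen by the server (for Samba,
 * for example, something derived from the SMB lease key).
 */
struct deleg_table {
	int fd;
	size_t count;
	uint64_t tokens[MAX_DELEGS];
};

/* Register one more delegation on the fd; returns 0, or -1 when full. */
int deleg_add(struct deleg_table *t, uint64_t token)
{
	if (t->count == MAX_DELEGS)
		return -1;
	t->tokens[t->count++] = token;
	return 0;
}

/*
 * On a conflicting access, recall every delegation on the fd and hand
 * each token back so the server can look up the client state it
 * belongs to.  Returns how many delegations were recalled.
 */
size_t deleg_break_all(struct deleg_table *t, deleg_break_cb cb)
{
	size_t n = t->count;

	for (size_t i = 0; i < n; i++)
		cb(t->fd, t->tokens[i]);
	t->count = 0;
	return n;
}

/* Example callback for illustration: remembers the last recalled token. */
static uint64_t last_broken_token;

void record_break(int fd, uint64_t token)
{
	(void)fd;
	last_broken_token = token;
}
```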


2019-03-07 07:48:37

by Frank Filz

Subject: RE: [NFS-Ganesha-Devel] Re: Better interop for NFS/SMB file share mode/reservation

> On Wed, Mar 06, 2019 at 09:09:21AM +0200, Amir Goldstein wrote:
> > On Tue, Mar 5, 2019 at 11:48 PM J. Bruce Fields <[email protected]> wrote:
> > >
> > > On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > > > After this:
> > > >
> > > > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> > > >
> > > > delegations would no longer conflict with opens from the same
> > > > tgid. So if your threads all run in the same process and you're
> > > > willing to manage conflicts among your own clients, that should
> > > > still allow you to do multiple opens of the same file without giving up your
> > > > lease/delegation.
> > > >
> > > > I'd be curious to know whether that works with Samba's design.
> > >
> > > Any idea whether that would work?
> > >
> > > (Easy? Impossible? Possible, but realistically the changes
> > > required to Samba would be painful enough that it'd be unlikely to
> > > get done?)
> > >
> >
> > [CC Ralph Boehme]
> >
> > I am not a samba team member, but seems to me that your proposal fits
> > samba design like a glove. With one smbd process per client
> > connection, with your proposal, opens (for read) from same smbd
> > process will not break the shared read lease from same client, so
> > oplocks level II could be implemented using kernel oplocks (new flavor).
>
> OK. So I wonder about Ganesha. I'm not sure, but I *think* it's like knfsd in that
> it has a bunch of worker threads that can each take RPCs from any client. I don't
> remember if they're actually threads or processes.

Ganesha does use worker threads; however, one thing that may be an advantage here, or at least can be leveraged, is that Ganesha attaches a single file descriptor to each stateid. As long as the I/O requests come using the stateid, that file descriptor will be used.

We have some work completed and more in progress on delegations, and if a new kernel oplock becomes available, we could definitely use it. On the other hand, FSAL_VFS, which is the FSAL used with kernel file systems, does not support delegations...

The (distributed) file systems we support delegations on have user space libraries (which Samba should also be using?) that implement the delegation primitives.

> > IOW, can someone from samba team please elaborate on this quote from
> > samba wiki [1]: "Linux kernel oplocks don't provide the needed features.
> > (They don't even work correctly for oplocks...) ==> SMB-only feature."
> >
> > [1] https://wiki.samba.org/index.php/Samba3/SMB2#new_concepts
>
> Yes, it'd be useful to get those details written down in one place.
>
> > I would like to use this opportunity to ask samba team members to
> > raise any (*) other pain points about missing or lacking Linux kernel interfaces.
> > I promise to use my time in LSF/MM 2019 to try and promote samba needs
> > among Linux filesystem developers.
>
> I feel like this particular problem is about details of oplock/lease/delegation
> semantics that will interest a small number of people, so should mainly be
> handled as a hallway-track thing. But, maybe it's good to bring it up in a session
> if only to make sure anyone interested is aware.
>
> > (*) OK, not RichACLs. I know my own limitations.
>
> Hah.

:-)

Frank


2019-03-07 11:04:10

by Stefan Metzmacher

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On 06.03.19 at 22:25, Ralph Böhme via samba-technical wrote:
>
> Jeremy Allison wrote:
>> On Wed, Mar 06, 2019 at 03:31:08PM -0500, Jeff Layton wrote:
>>> On Wed, 2019-03-06 at 10:11 -0500, J. Bruce Fields wrote:
>>>>
>>>> Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
>>>> API? Is that close to what's needed?
>>>>
>>>
>>> Here's the C headers for that stuff:
>>>
>>> https://github.com/ceph/ceph/blob/7ba6bece4187eda5d05a9b84211fe6ba8dd287bd/src/include/cephfs/libcephfs.h#L1734
>>>
>>> It's simple enough and works for us in ganesha, and I think we can
>>> probably adapt it to samba without too much difficulty. The callback
>>> doesn't seem like it'll do for a kernel API though -- you'd almost
>>> certainly need to do something different there (signals? inotify?).
>>
>> SMB3 leases have R/RW and Handle-based leases.
>
> Just to be precise: SMB2.1+ has R, RH, RW and RWH leases.
>
>> Handle leases allow multiple opens of the same pathname
>> that get different handles to share the lease, allowing
>> a client redirector to delay opens or closes locally
>> so long as it has a handle lease.
>
> That's a property of leases in general, not just H-leases. The client provides a lease key, which is a GUID, with each lease request.
>
>>
>> Here are the semantics:
>>
>> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/d8df943d-6ad7-4b30-9f58-96ae90fc6204
>>
>> I'm not sure a simple file-descriptor based API is
>> enough for us. Can we have a uuid or token based
>> API instead where the server can choose which fds
>> to cover with a token?
>
> Yes, that would be ideal.

If we want to design a useful API, we also need to think about
all features:
- file oplock/leases
- directory leases
- share modes
- disconnected handles (for durable and persistent handles),
which exist within the kernel for a while and can be reattached
to a process, using some kind of cookie and the same euid
- the API needs ways to use epoll in order to do async opens
and lease breaks. For opens the model of async socket connects
could be used. Leases could have a signalfd-style api.

We may not need everything at once, but we should have the full picture
in mind. And we need working code in kernel and userspace that passes
all tests (we may need to add additional tests). Otherwise the kernel
creates new syscalls, which wouldn't be used by Samba in the end.

metze
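For reference, today's fcntl() lease interface can already be funneled into an epoll loop, roughly in the signalfd style mentioned above: F_SETSIG selects the signal used for lease-break notification, and with a nonzero F_SETSIG value the signalfd record's ssi_fd field identifies the leased descriptor. A minimal sketch, assuming a Linux/glibc build; the helper names are hypothetical, and the error handling and the choice of SIGRTMIN + 1 are illustrative only.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <sys/epoll.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block `sig` and route it to a non-blocking signalfd registered with
 * a new epoll instance.  Returns the epoll fd and stores the signalfd
 * in *sfd_out, or returns -1 on error.
 */
int lease_break_epoll(int sig, int *sfd_out)
{
	sigset_t mask;
	int sfd, epfd;

	sigemptyset(&mask);
	sigaddset(&mask, sig);
	if (sigprocmask(SIG_BLOCK, &mask, NULL) == -1)
		return -1;

	sfd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
	if (sfd == -1)
		return -1;

	epfd = epoll_create1(EPOLL_CLOEXEC);
	if (epfd == -1) {
		close(sfd);
		return -1;
	}

	struct epoll_event ev = { .events = EPOLLIN, .data.fd = sfd };
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, sfd, &ev) == -1) {
		close(epfd);
		close(sfd);
		return -1;
	}
	*sfd_out = sfd;
	return epfd;
}

/*
 * Ask for lease-break notifications on `fd` to arrive as `sig`, then
 * request a lease of `type` (F_RDLCK or F_WRLCK).  Returns 0 or -1.
 */
int take_lease(int fd, int type, int sig)
{
	if (fcntl(fd, F_SETSIG, sig) == -1)
		return -1;
	return fcntl(fd, F_SETLEASE, type);
}

/*
 * Read one queued notification.  Because F_SETSIG was given a nonzero
 * signal, ssi_fd holds the descriptor whose lease is being broken.
 * Returns that fd, or -1 if nothing is pending.
 */
int read_lease_break(int sfd)
{
	struct signalfd_siginfo si;

	if (read(sfd, &si, sizeof(si)) != (ssize_t)sizeof(si))
		return -1;
	return (int)si.ssi_fd;
}
```

Once notified, the holder has /proc/sys/fs/lease-break-time seconds to release or downgrade the lease with another F_SETLEASE call. sigprocmask() is only specified for single-threaded processes; a threaded server would block the signal with pthread_sigmask() before starting its worker threads, which is part of why a dedicated per-lease file descriptor would be a cleaner API.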


2019-03-07 16:47:44

by Simo

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Thu, 2019-03-07 at 12:03 +0100, Stefan Metzmacher via samba-technical wrote:
> On 06.03.19 at 22:25, Ralph Böhme via samba-technical wrote:
> > Jeremy Allison wrote:
> > > On Wed, Mar 06, 2019 at 03:31:08PM -0500, Jeff Layton wrote:
> > > > On Wed, 2019-03-06 at 10:11 -0500, J. Bruce Fields wrote:
> > > > > Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
> > > > > API? Is that close to what's needed?
> > > > >
> > > >
> > > > Here's the C headers for that stuff:
> > > >
> > > > https://github.com/ceph/ceph/blob/7ba6bece4187eda5d05a9b84211fe6ba8dd287bd/src/include/cephfs/libcephfs.h#L1734
> > > >
> > > > It's simple enough and works for us in ganesha, and I think we can
> > > > probably adapt it to samba without too much difficulty. The callback
> > > > doesn't seem like it'll do for a kernel API though -- you'd almost
> > > > certainly need to do something different there (signals? inotify?).
> > >
> > > SMB3 leases have R/RW and Handle-based leases.
> >
> > Just to be precise: SMB2.1+ has R, RH, RW and RWH leases.
> >
> > > Handle leases allow multiple opens of the same pathname
> > > that get different handles to share the lease, allowing
> > > a client redirector to delay opens or closes locally
> > > so long as it has a handle lease.
> >
> > That's a property of leases in general, not just H-leases. The client provides a lease key, which is a GUID, with each lease request.
> >
> > > Here are the semantics:
> > >
> > > https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/d8df943d-6ad7-4b30-9f58-96ae90fc6204
> > >
> > > I'm not sure a simple file-descriptor based API is
> > > enough for us. Can we have a uuid or token based
> > > API instead where the server can choose which fds
> > > to cover with a token?
> >
> > Yes, that would be ideal.
>
> If we want to design a useful API, we also need to think about
> all features:
> - file oplock/leases
> - directory leases
> - share modes
> - disconnected handles (for durable and persistent handles),
> which exist within the kernel for a while and can be reattached
> to a process, using some kind of cookie and the same euid
> - the API needs ways to use epoll in order to do async opens
> and lease breaks. For opens the model of async socket connects
> could be used. Leases could have a signalfd-style api.
>
> We may not need everything at once, but we should have the full picture
> in mind. And we need working code in kernel and userspace that passes
> all tests (we may need to add additional tests). Otherwise the kernel
> creates new syscalls, which wouldn't be used by Samba in the end.

Just a thought, but you should probably classify these facilities in
two lists, one for items that can only reasonably be done via a kernel
API and one for items that can be satisfactorily handled via a
coordinating userspace component (daemon/database/convention/other).

Getting all that stuff in kernel may prove overly hard and contentious
so being able to negotiate on the critical items only may be important.

Simo.



2019-03-08 21:38:20

by J. Bruce Fields

Subject: Re: [NFS-Ganesha-Devel] Re: Better interop for NFS/SMB file share mode/reservation

On Wed, Mar 06, 2019 at 07:37:00AM -0800, Frank Filz wrote:
> > On Wed, Mar 06, 2019 at 09:09:21AM +0200, Amir Goldstein wrote:
> > > On Tue, Mar 5, 2019 at 11:48 PM J. Bruce Fields
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > > > > After this:
> > > > >
> > > > > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> > > > >
> > > > > delegations would no longer conflict with opens from the same
> > > > > tgid. So if your threads all run in the same process and
> > > > > you're willing to manage conflicts among your own clients,
> > > > > that should still allow you to do multiple opens of the same
> > > > > file without giving up your lease/delegation.
> > > > >
> > > > > I'd be curious to know whether that works with Samba's design.
> > > >
> > > > Any idea whether that would work?
> > > >
> > > > (Easy? Impossible? Possible, but realistically the changes
> > > > required to Samba would be painful enough that it'd be unlikely
> > > > to get done?)
> > > >
> > >
> > > [CC Ralph Boehme]
> > >
> > > I am not a samba team member, but seems to me that your proposal
> > > fits samba design like a glove. With one smbd process per client
> > > connection, with your proposal, opens (for read) from same smbd
> > > process will not break the shared read lease from same client, so
> > > oplocks level II could be implemented using kernel oplocks (new
> > > flavor).
> >
> > OK. So I wonder about Ganesha. I'm not sure, but I *think* it's
> > like knfsd in that it has a bunch of worker threads that can each
> > take RPCs from any client. I don't remember if they're actually
> > threads or processes.
>
> Ganesha does use worker threads

And they're all part of one process?

> however, one thing that may be an
> advantage here, or at least can be leveraged, is that Ganesha attaches
> a single file descriptor to each stateid. As long as the I/O requests
> come using the stateid, that file descriptor will be used.
>
> We have some work completed and more in progress on delegations, and
> if a new kernel oplock becomes available, we could definitely
> use it. On the other hand, FSAL_VFS, which is the FSAL used with kernel
> file systems, does not support delegations...
>
> The (distributed) file systems we support delegations on have user
> space libraries (which Samba should also be using?) that implement the
> delegation primitives.

Is there anyone working on delegation support for FSAL_VFS? If it's not
getting much attention then maybe Samba is the only real user for the
foreseeable future.

--b.

2019-03-08 21:53:25

by Frank Filz

Subject: RE: [NFS-Ganesha-Devel] Re: Better interop for NFS/SMB file share mode/reservation

> From: 'J. Bruce Fields' [mailto:[email protected]]
> Sent: Friday, March 8, 2019 1:38 PM
> To: Frank Filz <[email protected]>
> Cc: 'linux-fsdevel' <[email protected]>;
> [email protected]; [email protected]; 'Jeremy Allison'
> <[email protected]>; 'Linux NFS Mailing List' <[email protected]>; 'Jeff
> Layton' <[email protected]>; 'Amir Goldstein' <[email protected]>;
> [email protected]; 'Ralph Boehme' <[email protected]>
> Subject: [NFS-Ganesha-Devel] Re: Better interop for NFS/SMB file share
> mode/reservation
>
> On Wed, Mar 06, 2019 at 07:37:00AM -0800, Frank Filz wrote:
> > > On Wed, Mar 06, 2019 at 09:09:21AM +0200, Amir Goldstein wrote:
> > > > On Tue, Mar 5, 2019 at 11:48 PM J. Bruce Fields
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Thu, Feb 14, 2019 at 04:06:52PM -0500, J. Bruce Fields wrote:
> > > > > > After this:
> > > > > >
> > > > > > https://marc.info/?l=linux-nfs&m=154966239918297&w=2
> > > > > >
> > > > > > delegations would no longer conflict with opens from the same
> > > > > > tgid. So if your threads all run in the same process and
> > > > > > you're willing to manage conflicts among your own clients,
> > > > > > that should still allow you to do multiple opens of the same
> > > > > > file without giving up your lease/delegation.
> > > > > >
> > > > > > I'd be curious to know whether that works with Samba's design.
> > > > >
> > > > > Any idea whether that would work?
> > > > >
> > > > > (Easy? Impossible? Possible, but realistically the changes
> > > > > required to Samba would be painful enough that it'd be unlikely
> > > > > to get done?)
> > > > >
> > > >
> > > > [CC Ralph Boehme]
> > > >
> > > > I am not a samba team member, but seems to me that your proposal
> > > > fits samba design like a glove. With one smbd process per client
> > > > connection, with your proposal, opens (for read) from same smbd
> > > > process will not break the shared read lease from same client, so
> > > > oplocks level II could be implemented using kernel oplocks (new
> > > > flavor).
> > >
> > > OK. So I wonder about Ganesha. I'm not sure, but I *think* it's
> > > like knfsd in that it has a bunch of worker threads that can each
> > > take RPCs from any client. I don't remember if they're actually
> > > threads or processes.
> >
> > Ganesha does use worker threads
>
> And they're all part of one process?

Sorry, should have specified, yes, Ganesha runs a single multi-threaded process.

> > however, one thing that may be an
> > advantage here, or at least can be leveraged, is that Ganesha attaches
> > a single file descriptor to each stateid. As long as the I/O requests
> > come using the stateid, that file descriptor will be used.
> >
> > We have some work completed and more in progress on delegations, and
> > if a new kernel oplock becomes available, we could definitely
> > use it. On the other hand, FSAL_VFS, which is the FSAL used with kernel
> > file systems, does not support delegations...
> >
> > The (distributed) file systems we support delegations on have user
> > space libraries (which Samba should also be using?) that implement the
> > delegation primitives.
>
> Is there anyone working on delegation support for FSAL_VFS? If it's not getting
> much attention then maybe Samba is the only real user for the foreseeable
> future.

As far as I know, no one is working on delegations for FSAL_VFS. A good interface might change that, if it made delegations easy to implement there. FSAL_VFS is convenient for verifying some of the protocol and meta-data caching features of Ganesha.

Frank


2019-04-25 18:52:07

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Thu, Mar 7, 2019 at 1:04 PM Stefan Metzmacher <[email protected]> wrote:
>
> On 06.03.19 at 22:25, Ralph Böhme via samba-technical wrote:
> >
> > Jeremy Allison wrote:
> >> On Wed, Mar 06, 2019 at 03:31:08PM -0500, Jeff Layton wrote:
> >>> On Wed, 2019-03-06 at 10:11 -0500, J. Bruce Fields wrote:
> >>>>
> >>>> Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
> >>>> API? Is that close to what's needed?
> >>>>
> >>>
> >>> Here's the C headers for that stuff:
> >>>
> >>> https://github.com/ceph/ceph/blob/7ba6bece4187eda5d05a9b84211fe6ba8dd287bd/src/include/cephfs/libcephfs.h#L1734
> >>>
> >>> It's simple enough and works for us in ganesha, and I think we can
> >>> probably adapt it to samba without too much difficulty. The callback
> >>> doesn't seem like it'll do for a kernel API though -- you'd almost
> >>> certainly need to do something different there (signals? inotify?).
> >>
> >> SMB3 leases have R/RW and Handle-based leases.
> >
> > Just to be precise: SMB2.1+ has R, RH, RW and RWH leases.
> >
> >> Handle leases allow multiple opens of the same pathname
> >> that get different handles to share the lease, allowing
> >> a client redirector to delay opens or closes locally
> >> so long as it has a handle lease.
> >
> > That's a property of leases in general, not just H-leases. The client provides a lease key, which is a GUID, with each lease request.
> >
> >>
> >> Here are the semantics:
> >>
> >> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/d8df943d-6ad7-4b30-9f58-96ae90fc6204
> >>
> >> I'm not sure a simple file-descriptor based API is
> >> enough for us. Can we have a uuid or token based
> >> API instead, where the server can choose which fds
> >> to cover with a token?
> >
> > Yes, that would be ideal.

Getting back to this.
Thanks all for the valuable inputs.

Next week is LSF/MM and I was assigned a 30 minute slot on filesystems track
to discuss "NFS/SMB file share".

So let me try to echo what I read on this thread and how I understand what APIs
samba needs from the kernel.

>
> If we want to design a useful API, we also need to think about
> all features:
> - file oplock/leases

Kernel can have a flavor of leases which are not broken
by opens from threads of the process holding the lease.
Bruce has some patches along those lines for knfsd and SMB R/RW
leases could use this flavor if it was exported to userspace?

For SMB RH/RWH leases and Ganesha delegations, server
could keep track of its own handles/clients and break leases within the
same process without involving the kernel.
Am I wrong?
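To make the proposed split concrete, here is a minimal userspace model of the server-side bookkeeping described above: opens that present the holder's lease key share the lease (as Ralph notes for SMB2.1+), while opens from anyone else trigger a break. The names `struct lease`, `lease_after_open` and the `LEASE_*` bits are hypothetical; this is neither Samba's nor the kernel's API, and the break rule is deliberately simplified.

```c
#include <string.h>

/* Hypothetical lease state bits, modelled on SMB2.1+ R, H and W caching. */
enum { LEASE_R = 1, LEASE_H = 2, LEASE_W = 4 };

/* Client-chosen lease key; a 16-byte GUID in SMB2.1+. */
struct lease_key { unsigned char b[16]; };

struct lease {
    struct lease_key key;
    int state;                 /* combination of LEASE_* bits */
};

/* Returns the state the holder may keep after another open of the same file.
 * Opens presenting the holder's key share the lease, so nothing is broken.
 * Any other opener strips W; an opener that wants to write also strips R.
 * Simplified: the real MS-SMB2 break rules are more involved (share modes
 * and handle caching both play a role). */
static int lease_after_open(const struct lease *l,
                            const struct lease_key *opener, int opener_writes)
{
    if (memcmp(l->key.b, opener->b, sizeof l->key.b) == 0)
        return l->state;

    int keep = l->state & ~LEASE_W;
    if (opener_writes)
        keep &= ~LEASE_R;
    return keep;
}
```

The point of the model is that the lease-key comparison happens entirely in the server process; the kernel would only need to be involved for opens that do not come through the server at all.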

> - directory leases

I have WIP on fsnotify directory pre modification hooks.
There is opposition from fsnotify maintainer to add new userspace
APIs that can create kernel->user->kernel deadlocks, like the
deadlocks currently reported with fanotify permission events.

Need to see if we can find a middle ground between
"post modification notifications" and "pre modification permission"
API, somewhere along the lines of regular file lease breaking API.

> - share modes

Volker told me he thinks samba can enforce share modes by
a single daemon policing all opens in the system with fanotify.
I think he is right. If anyone thinks differently please speak up.

> - disconnected handles (for durable and persistent handles),
> which exists within the kernel for a while and can be reattached
> to process, using some kind of cookie and the same euid

So this interface exists in the kernel.
Nothing more required from the kernel API. Right?

> - the API needs ways to use epoll in order to do async opens
> and lease breaks. For opens the model of async socket connects
> could be used. Leases could have a signalfd-style api.

I should hope that the new AIO API (http://kernel.dk/io_uring.pdf)
would solve those problems as well as other issues that
samba has w.r.t dispatching AIO.

>
> We may not need everything at once, but we should have the full picture
> in mind. And we need working code in kernel and userspace that passes
> all tests (we may need to add additional test). Otherwise the kernel
> creates new syscalls, which wouldn't be used by Samba in the end.
>

Tested interfaces - good idea ;-)

If anyone has any comments about my view of required new interfaces,
or important things that I missed, please say so before Tuesday!

Thanks,
Amir.

2019-04-27 20:17:32

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

[adding back samba/nfs and fsdevel]

On Fri, Apr 26, 2019 at 6:22 PM Jeff Layton <[email protected]> wrote:
>
> On Fri, 2019-04-26 at 10:50 -0400, J. Bruce Fields wrote:
> > On Fri, Apr 26, 2019 at 04:11:00PM +0200, Amir Goldstein wrote:
> > > On Fri, Apr 26, 2019, 4:00 PM J. Bruce Fields <[email protected]> wrote:
> > >
> > > > On Fri, Apr 26, 2019 at 03:50:46PM +0200, Amir Goldstein wrote:
> > > > > On Fri, Feb 8, 2019, 5:03 PM Jeff Layton <[email protected]> wrote:
> > > > > > Share/deny open semantics are pretty similar across NFS and SMB (by
> > > > > > design, really). If you intend to solve that use-case, what you really
> > > > > > want is whole-file, shared/exclusive locks that are set atomically with
> > > > > > the open call. O_EXLOCK and O_SHLOCK seem like a reasonable fit there.
> > > > > >
> > > > > > Then you could have SMB and NFS servers set these flags when opening
> > > > > > files, and deal with the occasional denial at open time. Other
> > > > > > applications won't be aware of them of course, but that's probably fine
> > > > > > for most use-cases where you want this sort of protocol interop.
> > > > >
> > > > > Sorry for posting off list. Airport emails...
> > > > > I looked at implementing O_EXLOCK and O_SHLOCK and it looks doable.
> > > > >
> > > > > I was wondering if there is an inherent reason not to allow an exclusive
> > > > > lock on a file that is open read-only.
> > > > >
> > > > > Samba seems to need it and currently flock and ofd locks won't allow it.
> > > > > Do you think it will be ok to allow it with O_EXLOCK?
> > > >
> > > > Somebody could deny everyone access to a shared resource that everyone
> > > > needs to make progress, like /etc/passwd or a shared library.
> > > >
> > > > Have you looked at Pavel Shilovsky's O_DENY patches? He had the feature
> > > > off by default, with a mount option provided to turn it on.
> > > >
> > >
> > > O_EXLOCK is advisory. It only acquires a flock or OFD lock atomically with
> > > open.
> >
> > Whoops, got it.
> >
> > Is that really adequate for open share locks, though?
> >
> > I assumed that Windows apps depend on the assumption that they're
> > mandatory. So e.g. if you can get a DENY_READ open on a shared library
> > then you know you can update it without the risk of making someone else
> > crash.
> >
>
> I think this is (slightly) better than doing it internally like we do
> today and would give you coherent locking between NFS and SMB. Other
> applications wouldn't see them, but for a NAS-style deployment, that's
> probably ok.
>

We can do a little bit better.
We can make sure that O_DENY_WRITE (named for convenience) fails
if file is currently open for write by anyone and similarly for O_DENY_READ.
But if we cannot deny future non-cooperative opens what's the point?....

> Any open by samba or nfsd would need to start setting O_SHLOCK, and deny
> mode opens would have to set O_EXLOCK. We would actually need 2 per
> inode though (one for read and one for write).
>

...the point is that O_DENY_NONE does not need to be implemented with
a new type of lock object (O_WR_SHLOCK); it's enough that it checks there
are no relevant exclusive locks, and then inode->i_writecount and
inode->i_readcount already provide enough context to cooperate with
O_DENY_WRITE and O_DENY_READ.

I need to see if incrementing inode->i_readcount on O_RDWR opens is
possible (right now it only counts O_RDONLY opens).
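A sketch of the counting approach as a userspace model. `struct share_state` stands in for inode->i_readcount/i_writecount (with `readers` counting O_RDWR opens too, as the paragraph above says would be needed) plus two hypothetical deny counters. An O_DENY_NONE open only bumps the access counters and needs no lock object. Returning -EBUSY follows the error proposed later in this thread. None of these names are kernel code.

```c
#include <errno.h>

enum { ACC_READ = 1, ACC_WRITE = 2 };                   /* requested access */
enum { DENY_NONE = 0, DENY_READ = 1, DENY_WRITE = 2 };  /* requested deny mode */

/* Hypothetical per-inode share state. */
struct share_state {
    int readers, writers;             /* opens holding read / write access */
    int deny_readers, deny_writers;   /* opens denying read / write */
};

/* Returns 0 and records the open, or -EBUSY on a share conflict. */
static int share_open(struct share_state *s, int access, int deny)
{
    /* The new access must not be denied by an existing opener... */
    if ((access & ACC_READ) && s->deny_readers)  return -EBUSY;
    if ((access & ACC_WRITE) && s->deny_writers) return -EBUSY;
    /* ...and the new deny mode must not exclude an existing opener. */
    if ((deny & DENY_READ) && s->readers)  return -EBUSY;
    if ((deny & DENY_WRITE) && s->writers) return -EBUSY;

    if (access & ACC_READ)  s->readers++;
    if (access & ACC_WRITE) s->writers++;
    if (deny & DENY_READ)   s->deny_readers++;
    if (deny & DENY_WRITE)  s->deny_writers++;
    return 0;
}

/* Undo the bookkeeping of a successful share_open() with the same arguments. */
static void share_close(struct share_state *s, int access, int deny)
{
    if (access & ACC_READ)  s->readers--;
    if (access & ACC_WRITE) s->writers--;
    if (deny & DENY_READ)   s->deny_readers--;
    if (deny & DENY_WRITE)  s->deny_writers--;
}
```

For example, while an O_RDWR/O_DENY_NONE open is outstanding, a read open that asks to deny write gets -EBUSY; once that open is closed, the same request succeeds and later write opens are refused.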

> I think these should probably be in their own "namespace" too. They
> could use the same semantics as flock, but should sit on their own list
> in file_lock_context.
>

I would much rather that they didn't. The reason is that new open flags
are a backward compat problem. The way I want to solve it is this API:

// On new kernel this will acquire OFD F_WRLCK atomically...
fd = open(..., O_RDWR | O_EXLOCK);
// ...check if it did acquire OFD lock
fcntl(fd, F_OFD_GETLK, ...);

We'd need at least one new l_type F_EX_RDLCK and maybe also a new
semantic F_EX_RDWRLCK, although similar in conflicts to F_WRLCK it can be
acquired without FMODE_WRITE. Though I personally think we can do without
it if the only way to acquire F_WRLCK on readonly file is via new open flag.
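The closest thing available today is a non-atomic open() followed by F_OFD_SETLK. It also shows the FMODE_WRITE restriction mentioned above: an F_WRLCK request on an O_RDONLY descriptor fails with EBADF. This is a minimal sketch, assuming Linux 3.15 or later for OFD locks. `open_exlock` and `ofd_lock_whole_file` are hypothetical helpers, not the proposed kernel interface.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#ifndef F_OFD_SETLK
#define F_OFD_SETLK 37      /* Linux UAPI value, if _GNU_SOURCE came too late */
#endif

/* Non-blocking whole-file OFD lock of the given type (F_RDLCK or F_WRLCK).
 * Returns 0, or -errno: EAGAIN/EACCES on conflict, EBADF if the fd was not
 * opened with the access the lock type needs (F_WRLCK needs write access). */
static int ofd_lock_whole_file(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);  /* OFD locks require l_pid == 0 */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 = up to end of file, however large */
    return fcntl(fd, F_OFD_SETLK, &fl) == -1 ? -errno : 0;
}

/* Hypothetical helper: open, then lock. Unlike the proposed O_EXLOCK this is
 * not atomic, so another opener can slip in between the two calls.
 * Returns the fd, or -errno. */
static int open_exlock(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd < 0)
        return -errno;
    int err = ofd_lock_whole_file(fd, F_WRLCK);
    if (err) {
        close(fd);
        return err;
    }
    return fd;
}
```

Opening a file O_RDONLY through this helper returns -EBADF, and a second O_RDWR opener is refused while the first descriptor stays open. The gap between open() and fcntl() is exactly what an atomic O_EXLOCK would close.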

> That said, we could also look at a vfs-level mount option that would
> make the kernel enforce these for any opener. That could also be useful,
> and shouldn't be too hard to implement. Maybe even make it a vfsmount-
> level option (like -o ro is).
>

Yeh, I am humbly going to leave this struggle to someone else.
Not important enough IMO and completely independent effort to the
advisory atomic open&lock API.

> If you're denied, what error should you get back when you try to open
> it? It should be something distinct. We may even want to add new error
> codes for this.

IMO EBUSY does the job. It's distinct because open is not expected
to return EBUSY for regular files/dirs, and when open is expected to
return EBUSY for blockdev, it's for the exact same use case (i.e.
exclusive write open is acquired by userspace tools).

Thanks,
Amir.

2019-04-28 12:10:22

by Jeff Layton

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sat, 2019-04-27 at 16:16 -0400, Amir Goldstein wrote:
> [adding back samba/nfs and fsdevel]
>

cc'ing Pavel too -- he did a bunch of work in this area a few years ago.

> On Fri, Apr 26, 2019 at 6:22 PM Jeff Layton <[email protected]> wrote:
> > On Fri, 2019-04-26 at 10:50 -0400, J. Bruce Fields wrote:
> > > On Fri, Apr 26, 2019 at 04:11:00PM +0200, Amir Goldstein wrote:
> > > > On Fri, Apr 26, 2019, 4:00 PM J. Bruce Fields <[email protected]> wrote:
> > > >
> > > > > On Fri, Apr 26, 2019 at 03:50:46PM +0200, Amir Goldstein wrote:
> > > > > > On Fri, Feb 8, 2019, 5:03 PM Jeff Layton <[email protected]> wrote:
> > > > > > > Share/deny open semantics are pretty similar across NFS and SMB (by
> > > > > > > design, really). If you intend to solve that use-case, what you really
> > > > > > > want is whole-file, shared/exclusive locks that are set atomically with
> > > > > > > the open call. O_EXLOCK and O_SHLOCK seem like a reasonable fit there.
> > > > > > >
> > > > > > > Then you could have SMB and NFS servers set these flags when opening
> > > > > > > files, and deal with the occasional denial at open time. Other
> > > > > > > applications won't be aware of them of course, but that's probably fine
> > > > > > > for most use-cases where you want this sort of protocol interop.
> > > > > >
> > > > > > Sorry for posting off list. Airport emails...
> > > > > > I looked at implementing O_EXLOCK and O_SHLOCK and it looks doable.
> > > > > >
> > > > > > I was wondering if there is an inherent reason not to allow an exclusive
> > > > > > lock on a file that is open read-only.
> > > > > >
> > > > > > Samba seems to need it and currently flock and ofd locks won't allow it.
> > > > > > Do you think it will be ok to allow it with O_EXLOCK?
> > > > >
> > > > > Somebody could deny everyone access to a shared resource that everyone
> > > > > needs to make progress, like /etc/passwd or a shared library.
> > > > >
> > > > > Have you looked at Pavel Shilovsky's O_DENY patches? He had the feature
> > > > > off by default, with a mount option provided to turn it on.
> > > > >
> > > >
> > > > O_EXLOCK is advisory. It only acquires a flock or OFD lock atomically with
> > > > open.
> > >
> > > Whoops, got it.
> > >
> > > Is that really adequate for open share locks, though?
> > >
> > > I assumed that Windows apps depend on the assumption that they're
> > > mandatory. So e.g. if you can get a DENY_READ open on a shared library
> > > then you know you can update it without the risk of making someone else
> > > crash.
> > >
> >
> > I think this is (slightly) better than doing it internally like we do
> > today and would give you coherent locking between NFS and SMB. Other
> > applications wouldn't see them, but for a NAS-style deployment, that's
> > probably ok.
> >
>
> We can do a little bit better.
> We can make sure that O_DENY_WRITE (named for convenience) fails
> if file is currently open for write by anyone and similarly for O_DENY_READ.
> But if we cannot deny future non-cooperative opens what's the point?....
>

As you said in another mail, the main interest here is in getting
NFS+SMB semantics right. If the exported filesystem is _only_ available
via NFS+SMB, then do we need to deny non-cooperative opens?

> > Any open by samba or nfsd would need to start setting O_SHLOCK, and deny
> > mode opens would have to set O_EXLOCK. We would actually need 2 per
> > inode though (one for read and one for write).
> >
>
> ...the point is that O_DENY_NONE does not need to be implemented with
> a new type of lock object (O_WR_SHLOCK); it's enough that it checks there
> are no relevant exclusive locks, and then inode->i_writecount and
> inode->i_readcount already provide enough context to cooperate with
> O_DENY_WRITE and O_DENY_READ.
>

That would work, if the goal is to have deny modes affect all opens. We
could also do this on the opt-in basis that I was suggesting with a new
set of counters in struct file_lock_context.

> I need to see if incrementing inode->i_readcount on O_RDWR opens is
> possible (right now it only counts O_RDONLY opens).
>
> > I think these should probably be in their own "namespace" too. They
> > could use the same semantics as flock, but should sit on their own list
> > in file_lock_context.
> >
>
> I would much rather that they didn't. The reason is that new open flags
> are a backward compat problem. The way I want to solve it is this API:
>
> // On new kernel this will acquire OFD F_WRLCK atomically...
> fd = open(..., O_RDWR | O_EXLOCK);
> // ...check if it did acquire OFD lock
> fcntl(fd, F_OFD_GETLK, ...);
>
> We'd need at least one new l_type F_EX_RDLCK and maybe also a new
> semantic F_EX_RDWRLCK, although similar in conflicts to F_WRLCK it can be
> acquired without FMODE_WRITE. Though I personally think we can do without
> it if the only way to acquire F_WRLCK on readonly file is via new open flag.
>

I don't think that will work at all. Share/deny modes are entirely
orthogonal to byte-range locks in both NFS and SMB. Consider:

Two clients open a file with O_RDWR | O_SHARE_WRITE | O_SHARE_READ.
One of them now wants to set byte-range write lock on the file. That
should be allowed, but now it'll be denied, because the other client
will effectively hold a whole-file readlock on it.
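This objection can be reproduced with today's OFD locks. The sketch below, assuming Linux 3.15 or later, records each share-mode open as a whole-file F_RDLCK (the "effective whole-file readlock" described above) and then asks for a byte-range write lock on one of the two descriptors. `ofd_setlk` and `byte_range_after_share_locks` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#ifndef F_OFD_SETLK
#define F_OFD_SETLK 37      /* Linux UAPI value, if _GNU_SOURCE came too late */
#endif

/* Non-blocking OFD lock on [start, start+len); len 0 means "to EOF".
 * Returns 0 or -errno. */
static int ofd_setlk(int fd, short type, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);  /* OFD locks require l_pid == 0 */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_OFD_SETLK, &fl) == -1 ? -errno : 0;
}

/* Two "clients" open the file O_RDWR with share-read|share-write, each
 * recorded as a whole-file OFD read lock. Client A then asks for a
 * byte-range write lock on bytes 0..99. Returns 0 if granted, -errno if not. */
static int byte_range_after_share_locks(const char *path)
{
    int a = open(path, O_RDWR);
    if (a < 0)
        return -errno;
    int b = open(path, O_RDWR);
    if (b < 0) {
        int err = -errno;
        close(a);
        return err;
    }
    int ret = ofd_setlk(a, F_RDLCK, 0, 0);
    if (ret == 0)
        ret = ofd_setlk(b, F_RDLCK, 0, 0);
    if (ret == 0)
        ret = ofd_setlk(a, F_WRLCK, 0, 100);  /* conflicts with b's read lock */
    close(a);
    close(b);
    return ret;
}
```

On Linux the final request fails with EAGAIN, although under NFS and SMB semantics that byte-range lock should be granted.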

There is also the problem that read and write deny modes are orthogonal
to one another, so you have to have a way to deal with them independently.

I'd suggest an API like this:

// open read/write and deny read/write
fd = open(..., O_RDWR | O_DENY_READ | O_DENY_WRITE);
// test for flags with F_GETFL
flags = fcntl(fd, F_GETFL);

That would also allow you to use F_SETFL to change those flags on an
existing fd.

> > That said, we could also look at a vfs-level mount option that would
> > make the kernel enforce these for any opener. That could also be useful,
> > and shouldn't be too hard to implement. Maybe even make it a vfsmount-
> > level option (like -o ro is).
> >
>
> Yeh, I am humbly going to leave this struggle to someone else.
> Not important enough IMO and completely independent effort to the
> advisory atomic open&lock API.

Having the kernel allow setting deny modes on any open call is a non-
starter, for the reasons Bruce outlined earlier. This _must_ be
restricted in some fashion or we'll be opening up a ginormous DoS
mechanism.

My proposal was to make this only be enforced by applications that
explicitly opt-in by setting O_SH*/O_EX* flags. It wouldn't be too
difficult to also allow them to be enforced on a per-fs basis via mount
option or something. Maybe we could expand the meaning of '-o mand' ?

How would you propose that we restrict this?

> > If you're denied, what error should you get back when you try to open
> > it? It should be something distinct. We may even want to add new error
> > codes for this.
>
> IMO EBUSY does the job. It's distinct because open is not expected
> to return EBUSY for regular files/dirs, and when open is expected to
> return EBUSY for blockdev, it's for the exact same use case (i.e.
> exclusive write open is acquired by userspace tools).

That works for me.

We should probably have a close look at the work that Pavel did several
years ago too. It has almost certainly bitrotted by now, but it may
serve as a starting point (and he may have valuable input here).
--
Jeff Layton <[email protected]>


2019-04-28 13:50:09

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sun, Apr 28, 2019 at 8:09 AM Jeff Layton <[email protected]> wrote:
>
> On Sat, 2019-04-27 at 16:16 -0400, Amir Goldstein wrote:
> > [adding back samba/nfs and fsdevel]
> >
>
> cc'ing Pavel too -- he did a bunch of work in this area a few years ago.
>
> > On Fri, Apr 26, 2019 at 6:22 PM Jeff Layton <[email protected]> wrote:
> > > On Fri, 2019-04-26 at 10:50 -0400, J. Bruce Fields wrote:
> > > > On Fri, Apr 26, 2019 at 04:11:00PM +0200, Amir Goldstein wrote:
> > > > > On Fri, Apr 26, 2019, 4:00 PM J. Bruce Fields <[email protected]> wrote:
> > > > >
> > > > > > On Fri, Apr 26, 2019 at 03:50:46PM +0200, Amir Goldstein wrote:
> > > > > > > On Fri, Feb 8, 2019, 5:03 PM Jeff Layton <[email protected]> wrote:
> > > > > > > > Share/deny open semantics are pretty similar across NFS and SMB (by
> > > > > > > > design, really). If you intend to solve that use-case, what you really
> > > > > > > > want is whole-file, shared/exclusive locks that are set atomically with
> > > > > > > > the open call. O_EXLOCK and O_SHLOCK seem like a reasonable fit there.
> > > > > > > >
> > > > > > > > Then you could have SMB and NFS servers set these flags when opening
> > > > > > > > files, and deal with the occasional denial at open time. Other
> > > > > > > > applications won't be aware of them of course, but that's probably fine
> > > > > > > > for most use-cases where you want this sort of protocol interop.
> > > > > > >
> > > > > > > Sorry for posting off list. Airport emails...
> > > > > > > I looked at implementing O_EXLOCK and O_SHLOCK and it looks doable.
> > > > > > >
> > > > > > > I was wondering if there is an inherent reason not to allow an exclusive
> > > > > > > lock on a file that is open read-only.
> > > > > > >
> > > > > > > Samba seems to need it and currently flock and ofd locks won't allow it.
> > > > > > > Do you think it will be ok to allow it with O_EXLOCK?
> > > > > >
> > > > > > Somebody could deny everyone access to a shared resource that everyone
> > > > > > needs to make progress, like /etc/passwd or a shared library.
> > > > > >
> > > > > > Have you looked at Pavel Shilovsky's O_DENY patches? He had the feature
> > > > > > off by default, with a mount option provided to turn it on.
> > > > > >
> > > > >
> > > > > O_EXLOCK is advisory. It only acquires a flock or OFD lock atomically with
> > > > > open.
> > > >
> > > > Whoops, got it.
> > > >
> > > > Is that really adequate for open share locks, though?
> > > >
> > > > I assumed that Windows apps depend on the assumption that they're
> > > > mandatory. So e.g. if you can get a DENY_READ open on a shared library
> > > > then you know you can update it without the risk of making someone else
> > > > crash.
> > > >
> > >
> > > I think this is (slightly) better than doing it internally like we do
> > > today and would give you coherent locking between NFS and SMB. Other
> > > applications wouldn't see them, but for a NAS-style deployment, that's
> > > probably ok.
> > >
> >
> > We can do a little bit better.
> > We can make sure that O_DENY_WRITE (named for convenience) fails
> > if file is currently open for write by anyone and similarly for O_DENY_READ.
> > But if we cannot deny future non-cooperative opens what's the point?....
> >
>
> As you said in another mail, the main interest here is in getting
> NFS+SMB semantics right. If the exported filesystem is _only_ available
> via NFS+SMB, then do we need to deny non-cooperative opens?
>

We do not.

> > > Any open by samba or nfsd would need to start setting O_SHLOCK, and deny
> > > mode opens would have to set O_EXLOCK. We would actually need 2 per
> > > inode though (one for read and one for write).
> > >
> >
> > ...the point is that O_DENY_NONE does not need to be implemented with
> > a new type of lock object (O_WR_SHLOCK); it's enough that it checks there
> > are no relevant exclusive locks, and then inode->i_writecount and
> > inode->i_readcount already provide enough context to cooperate with
> > O_DENY_WRITE and O_DENY_READ.
> >
>
> That would work, if the goal is to have deny modes affect all opens. We
> could also do this on the opt-in basis that I was suggesting with a new
> set of counters in struct file_lock_context.
>

Ok.

> > I need to see if incrementing inode->i_readcount on O_RDWR opens is
> > possible (right now it only counts O_RDONLY opens).
> >
> > > I think these should probably be in their own "namespace" too. They
> > > could use the same semantics as flock, but should sit on their own list
> > > in file_lock_context.
> > >
> >
> > I would much rather that they didn't. The reason is that new open flags
> > are a backward compat problem. The way I want to solve it is this API:
> >
> > // On new kernel this will acquire OFD F_WRLCK atomically...
> > fd = open(..., O_RDWR | O_EXLOCK);
> > // ...check if it did acquire OFD lock
> > fcntl(fd, F_OFD_GETLK, ...);
> >
> > We'd need at least one new l_type F_EX_RDLCK and maybe also a new
> > semantic F_EX_RDWRLCK, although similar in conflicts to F_WRLCK it can be
> > acquired without FMODE_WRITE. Though I personally think we can do without
> > it if the only way to acquire F_WRLCK on readonly file is via new open flag.
> >
>
> I don't think that will work at all. Share/deny modes are entirely
> orthogonal to byte-range locks in both NFS and SMB. Consider:
>
> Two clients open a file with O_RDWR | O_SHARE_WRITE | O_SHARE_READ.
> One of them now wants to set byte-range write lock on the file. That
> should be allowed, but now it'll be denied, because the other client
> will effectively hold a whole-file readlock on it.
>

Got it. flock semantics (as Pavel chose) are a better fit.
The only combination it does not support natively is
O_SHARE_WRITE | O_DENY_READ, but that is easy to add.
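Why flock semantics sidestep the byte-range problem can be checked on a local Linux filesystem, where flock(2) locks and POSIX/OFD byte-range locks are kept separately and do not conflict. (flock(2) notes that this does not hold on NFS client mounts, where flock is emulated with byte-range locks.) In this sketch, under that assumption, "deny none" opens are recorded as LOCK_SH, one of them can still take a byte-range write lock, and a third opener asking for an exclusive "deny all" flock is refused. `share_flock` and `range_wrlock` are hypothetical helpers, not Pavel's implementation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

#ifndef F_OFD_SETLK
#define F_OFD_SETLK 37      /* Linux UAPI value, if _GNU_SOURCE came too late */
#endif

/* Record a share mode as a whole-file flock: shared for "deny none",
 * exclusive for "deny read+write". Non-blocking; returns 0 or -errno. */
static int share_flock(int fd, int exclusive)
{
    return flock(fd, (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB) == -1 ? -errno : 0;
}

/* Non-blocking byte-range OFD write lock on [start, start+len). */
static int range_wrlock(int fd, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);  /* OFD locks require l_pid == 0 */
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_OFD_SETLK, &fl) == -1 ? -errno : 0;
}
```

With descriptors a and b each holding LOCK_SH, `range_wrlock(a, 0, 100)` succeeds, while `share_flock(c, 1)` on a third descriptor fails with EWOULDBLOCK.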

> There is also the problem that read and write deny modes are orthogonal
> to one another, so you have to have a way to deal with them independently.
>
> I'd suggest an API like this:
>
> // open read/write and deny read/write
> fd = open(..., O_RDWR | O_DENY_READ | O_DENY_WRITE);
> // test for flags with F_GETFL
> flags = fcntl(fd, F_GETFL);
>
> That would also allow you to use F_SETFL to change those flags on an
> existing fd.
>

Nice. If only old kernels didn't hand back in F_GETFL any garbage flags
you piled on at open.
That's why I wanted a different way to check if lock is taken and thought
of F_OFD_GETLK as a natural candidate.

We can play this game:

// New kernel doesn't copy O_TEST to f_flags
#define O_DENY_READ (O_TEST | __O_DENY_READ)
fd = open(..., O_RDWR | O_DENY_READ);
flags = fcntl(fd, F_GETFL);
if ((flags & O_DENY_READ) && !(flags & O_TEST)) {
        /* new kernel: deny-read mode is in effect */
}

A bit ugly, but if it's wrapped in a library function
get_open_flags() who cares...
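Wrapped in a library function as suggested, the check might look like this. It is a sketch only: the flag values, the name `deny_read_granted`, and the two kernel behaviours it distinguishes are assumptions taken from the message above, not an existing interface.

```c
/* Hypothetical flag bits: neither O_TEST nor __O_DENY_READ exists in any
 * kernel, and these values are placeholders for illustration only. */
#define O_TEST        0x10000000
#define __O_DENY_READ 0x20000000
/* Parenthesised, so that (flags & O_DENY_READ) tests either bit. */
#define O_DENY_READ   (O_TEST | __O_DENY_READ)

/* Interpret the value returned by fcntl(fd, F_GETFL) after
 * open(..., O_DENY_READ).
 * Assumed old kernel: unknown open flags are echoed back, so both bits show.
 * Assumed new kernel: __O_DENY_READ is kept but O_TEST is never stored.
 * Returns 1 only in the new-kernel case, i.e. deny-read is really in force. */
static int deny_read_granted(int getfl_flags)
{
    return (getfl_flags & __O_DENY_READ) != 0 && (getfl_flags & O_TEST) == 0;
}
```

The extra O_TEST bit acts as a canary: its presence in F_GETFL output means the kernel stored the flags without understanding them.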

> > > That said, we could also look at a vfs-level mount option that would
> > > make the kernel enforce these for any opener. That could also be useful,
> > > and shouldn't be too hard to implement. Maybe even make it a vfsmount-
> > > level option (like -o ro is).
> > >
> >
> > Yeh, I am humbly going to leave this struggle to someone else.
> > Not important enough IMO and completely independent effort to the
> > advisory atomic open&lock API.
>
> Having the kernel allow setting deny modes on any open call is a non-
> starter, for the reasons Bruce outlined earlier. This _must_ be
> restricted in some fashion or we'll be opening up a ginormous DoS
> mechanism.
>
> My proposal was to make this only be enforced by applications that
> explicitly opt-in by setting O_SH*/O_EX* flags. It wouldn't be too
> difficult to also allow them to be enforced on a per-fs basis via mount
> option or something. Maybe we could expand the meaning of '-o mand' ?
>
> How would you propose that we restrict this?
>

Our communication channel is broken.
I did not intend to propose any implicit locking.
If samba and nfsd can opt-in with O_SHARE flags, I do not
understand why a mount option is helpful for the cause of
samba/nfsd interop.

If someone else is interested in samba/local interop then
yes, a mount option like suggested by Pavel could be a good option,
but it is an orthogonal effort IMO.


> > > If you're denied, what error should you get back when you try to open
> > > it? It should be something distinct. We may even want to add new error
> > > codes for this.
> >
> > IMO EBUSY does the job. It's distinct because open is not expected
> > to return EBUSY for regular files/dirs, and when open is expected to
> > return EBUSY for blockdev, it's for the exact same use case (i.e.
> > exclusive write open is acquired by userspace tools).
>
> That works for me.

From Pavel's v6 cover letter:
"Make nfs code return -EBUSY for share conflicts (was -EACCESS)."
;-)

>
> We should probably have a close look at the work that Pavel did several
> years ago too. It has almost certainly bitrotted by now, but it may
> serve as a starting point (and he may have valuable input here).

I looked at the patches. There's good stuff in there.
Once we agree on the specifications I can rip some code off ;-)

A lot of the work in Pavel's patches revolves around making the
mount option work and respecting O_DENYDELETE.
IMO, that is not a good use of up-streaming effort, because:
- NFS won't ask for deny delete
- IMO, Windows applications should be used to being denied
a DENY_DELETE and fall back to SHARE_DELETE

So while implementing DENYDELETE may fall into a category of making
samba server behave more like Windows server, I don't think it falls into
the category of better samba/nfs interop.

It is something that we can add later if anyone really cares about.

Thanks,
Amir.

2019-04-28 15:08:22

by Trond Myklebust

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sun, 2019-04-28 at 09:45 -0400, Amir Goldstein wrote:
> On Sun, Apr 28, 2019 at 8:09 AM Jeff Layton <[email protected]> wrote:
> > On Sat, 2019-04-27 at 16:16 -0400, Amir Goldstein wrote:
> > > [adding back samba/nfs and fsdevel]
> > >
> >
> > cc'ing Pavel too -- he did a bunch of work in this area a few years ago.
> >
> > > On Fri, Apr 26, 2019 at 6:22 PM Jeff Layton <[email protected]> wrote:
> > > > On Fri, 2019-04-26 at 10:50 -0400, J. Bruce Fields wrote:
> > > > > On Fri, Apr 26, 2019 at 04:11:00PM +0200, Amir Goldstein wrote:
> > > > > > On Fri, Apr 26, 2019, 4:00 PM J. Bruce Fields <[email protected]> wrote:
> > > > > >
> > > > That said, we could also look at a vfs-level mount option that would
> > > > make the kernel enforce these for any opener. That could also be useful,
> > > > and shouldn't be too hard to implement. Maybe even make it a vfsmount-
> > > > level option (like -o ro is).
> > > >
> > >
> > > Yeh, I am humbly going to leave this struggle to someone else.
> > > Not important enough IMO and completely independent effort to the
> > > advisory atomic open&lock API.
> >
> > Having the kernel allow setting deny modes on any open call is a non-
> > starter, for the reasons Bruce outlined earlier. This _must_ be
> > restricted in some fashion or we'll be opening up a ginormous DoS
> > mechanism.
> >
> > My proposal was to make this only be enforced by applications that
> > explicitly opt-in by setting O_SH*/O_EX* flags. It wouldn't be too
> > difficult to also allow them to be enforced on a per-fs basis via mount
> > option or something. Maybe we could expand the meaning of '-o mand' ?
> >
> > How would you propose that we restrict this?
> >
>
> Our communication channel is broken.
> I did not intend to propose any implicit locking.
> If samba and nfsd can opt-in with O_SHARE flags, I do not
> understand why a mount option is helpful for the cause of
> samba/nfsd interop.
>
> If someone else is interested in samba/local interop than
> yes, a mount option like suggested by Pavel could be a good option,
> but it is an orthogonal effort IMO.

If an NFS client 'opts in' to set share deny, then that still makes it
a non-optional lock for the other NFS clients, because all ordinary
open() calls will be gated by the server whether or not their
application specifies the O_SHARE flag. There is no flag in the NFS
protocol that could tell the server to ignore deny modes.

IOW: it would suffice for 1 client to use O_SHARE|O_DENY* to opt all
the other clients in.
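Trond's point follows from how an NFSv4 server checks share reservations. Every OPEN carries a share_access and share_deny pair, and the server refuses (NFS4ERR_SHARE_DENIED) any OPEN whose access overlaps an existing deny, or whose deny overlaps an existing access. The constants below are the OPEN4_SHARE_* values from RFC 7530. `struct share_table` and `nfs4_try_open` are a hypothetical, simplified model, not knfsd or Ganesha code. A plain open() from an ordinary Linux client is assumed to arrive with OPEN4_SHARE_DENY_NONE.

```c
#include <stdint.h>

/* share_access / share_deny values of the NFSv4 OPEN operation (RFC 7530).
 * READ and WRITE use the same bit in both fields, so they can be ANDed. */
#define OPEN4_SHARE_ACCESS_READ  0x00000001u
#define OPEN4_SHARE_ACCESS_WRITE 0x00000002u
#define OPEN4_SHARE_ACCESS_BOTH  0x00000003u
#define OPEN4_SHARE_DENY_NONE    0x00000000u
#define OPEN4_SHARE_DENY_READ    0x00000001u
#define OPEN4_SHARE_DENY_WRITE   0x00000002u
#define OPEN4_SHARE_DENY_BOTH    0x00000003u

#define MAX_OPENS 16

/* Hypothetical server-side table of the share reservations on one file. */
struct share_table {
    int n;                           /* number of recorded opens */
    uint32_t access[MAX_OPENS];
    uint32_t deny[MAX_OPENS];
};

/* Returns 1 and records the open if it is compatible with every existing
 * reservation, 0 if the server would refuse it with NFS4ERR_SHARE_DENIED,
 * or -1 if this toy table is full. */
static int nfs4_try_open(struct share_table *t, uint32_t access, uint32_t deny)
{
    for (int i = 0; i < t->n; i++)
        if ((access & t->deny[i]) || (deny & t->access[i]))
            return 0;
    if (t->n == MAX_OPENS)
        return -1;
    t->access[t->n] = access;
    t->deny[t->n] = deny;
    t->n++;
    return 1;
}
```

If one client opens with ACCESS_BOTH/DENY_WRITE, a second client's ordinary write open (ACCESS_WRITE/DENY_NONE) is refused even though its application never asked for a share mode, while a plain read open still succeeds.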

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-28 22:09:33

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sun, Apr 28, 2019 at 11:06 AM Trond Myklebust <[email protected]> wrote:
>
> On Sun, 2019-04-28 at 09:45 -0400, Amir Goldstein wrote:
> > On Sun, Apr 28, 2019 at 8:09 AM Jeff Layton <[email protected]> wrote:
> > > On Sat, 2019-04-27 at 16:16 -0400, Amir Goldstein wrote:
> > > > [adding back samba/nfs and fsdevel]
> > > >
> > >
> > > cc'ing Pavel too -- he did a bunch of work in this area a few years ago.
> > >
> > > > On Fri, Apr 26, 2019 at 6:22 PM Jeff Layton <[email protected]> wrote:
> > > > > On Fri, 2019-04-26 at 10:50 -0400, J. Bruce Fields wrote:
> > > > > > On Fri, Apr 26, 2019 at 04:11:00PM +0200, Amir Goldstein wrote:
> > > > > > > On Fri, Apr 26, 2019, 4:00 PM J. Bruce Fields <[email protected]> wrote:
> > > > > > >
> > > > > That said, we could also look at a vfs-level mount option that would
> > > > > make the kernel enforce these for any opener. That could also be useful,
> > > > > and shouldn't be too hard to implement. Maybe even make it a vfsmount-
> > > > > level option (like -o ro is).
> > > > >
> > > >
> > > > Yeh, I am humbly going to leave this struggle to someone else.
> > > > Not important enough IMO and completely independent effort to the
> > > > advisory atomic open&lock API.
> > >
> > > Having the kernel allow setting deny modes on any open call is a non-
> > > starter, for the reasons Bruce outlined earlier. This _must_ be
> > > restricted in some fashion or we'll be opening up a ginormous DoS
> > > mechanism.
> > >
> > > My proposal was to make this only be enforced by applications that
> > > explicitly opt-in by setting O_SH*/O_EX* flags. It wouldn't be too
> > > difficult to also allow them to be enforced on a per-fs basis via mount
> > > option or something. Maybe we could expand the meaning of '-o mand' ?
> > >
> > > How would you propose that we restrict this?
> > >
> >
> > Our communication channel is broken.
> > I did not intend to propose any implicit locking.
> > If samba and nfsd can opt-in with O_SHARE flags, I do not
> > understand why a mount option is helpful for the cause of
> > samba/nfsd interop.
> >
> > If someone else is interested in samba/local interop then
> > yes, a mount option like suggested by Pavel could be a good option,
> > but it is an orthogonal effort IMO.
>
> If an NFS client 'opts in' to set share deny, then that still makes it
> a non-optional lock for the other NFS clients, because all ordinary
> open() calls will be gated by the server whether or not their
> application specifies the O_SHARE flag. There is no flag in the NFS
> protocol that could tell the server to ignore deny modes.
>
> IOW: it would suffice for 1 client to use O_SHARE|O_DENY* to opt all
> the other clients in.
>

Sorry for being thick, I don't understand if we are in agreement or not.

My understanding is that the network file server implementations
(i.e. samba, knfsd, Ganesha) will always use share/deny modes.
So for example nfs v3 opens will always use O_DENY_NONE
in order to have correct interop with samba and nfs v4.

If I am misunderstanding something, please enlighten me.
If there is a reason why a mount option is needed for the sole purpose
of interop between network filesystem servers, please enlighten me.

Thanks,
Amir.

2019-04-28 22:15:34

by Trond Myklebust

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sun, 2019-04-28 at 18:00 -0400, Amir Goldstein wrote:
> Sorry for being thick, I don't understand if we are in agreement or
> not.
>
> My understanding is that the network file server implementations
> (i.e. samba, knfds, Ganesha) will always use share/deny modes.
> So for example nfs v3 opens will always use O_DENY_NONE
> in order to have correct interop with samba and nfs v4.
>
> If I am misunderstanding something, please enlighten me.
> If there is a reason why mount option is needed for the sole purpose
> of interop between network filesystem servers, please enlighten me.
>
>

Same difference. As long as nfsd and/or Ganesha are translating
OPEN4_SHARE_ACCESS_READ and OPEN4_SHARE_ACCESS_WRITE into share access
locks, then those will conflict with any deny locks set by whatever
application that uses them.

IOW: any open(O_RDONLY) and open(O_RDWR) will conflict with an
O_DENY_READ that is set on the server, and any open(O_WRONLY) and
open(O_RDWR) will conflict with an O_DENY_WRITE that is set on the
server. There is no opt-out for NFS clients on this issue, because
stateful NFSv4 opens MUST set one or more of OPEN4_SHARE_ACCESS_READ
and OPEN4_SHARE_ACCESS_WRITE.
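The conflict rule described above can be written as a two-way bitmask test. Below is a minimal C sketch, assuming per-file state is kept as the union of the access and deny bits of all current opens (a simplification; a real server tracks each open separately) and relying on NFSv4 access and deny bits sharing positions (READ = 0x1, WRITE = 0x2). The names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror OPEN4_SHARE_ACCESS_* and OPEN4_SHARE_DENY_* (RFC 7530),
 * which use the same positions for READ and WRITE. */
#define SHARE_ACCESS_READ  0x1u
#define SHARE_ACCESS_WRITE 0x2u
#define SHARE_DENY_READ    0x1u
#define SHARE_DENY_WRITE   0x2u

/* Simplified per-file state: union of the bits of all current opens. */
struct share_state {
    uint32_t access;
    uint32_t deny;
};

/* A new open conflicts if it requests access that an existing open denies,
 * or if it wants to deny access that an existing open already holds. */
static bool share_conflicts(const struct share_state *held,
                            uint32_t access, uint32_t deny)
{
    /* Stateful NFSv4 opens must request READ and/or WRITE access. */
    assert(access != 0);
    return (access & held->deny) != 0 || (deny & held->access) != 0;
}
```

Since every stateful NFSv4 open carries at least one access bit, a held O_DENY_READ or O_DENY_WRITE is always checked against it; the client has no way to opt out.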

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-28 22:34:25

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sun, Apr 28, 2019 at 6:08 PM Trond Myklebust <[email protected]> wrote:
>
> Same difference. As long as nfsd and/or Ganesha are translating
> OPEN4_SHARE_ACCESS_READ and OPEN4_SHARE_ACCESS_WRITE into share access
> locks, then those will conflict with any deny locks set by whatever
> application that uses them.
>
> IOW: any open(O_RDONLY) and open(O_RDWR) will conflict with an
> O_DENY_READ that is set on the server, and any open(O_WRONLY) and
> open(O_RDWR) will conflict with an O_DENY_WRITE that is set on the
> server. There is no opt-out for NFS clients on this issue, because
> stateful NFSv4 opens MUST set one or more of OPEN4_SHARE_ACCESS_READ
> and OPEN4_SHARE_ACCESS_WRITE.
>

Urgh! I *think* I understand the confusion.

I believe Jeff was talking about implementing a mount option,
similar to -o mand, for the local fs on the server.
With that mount option, *any* open() of a file on that mount,
by any application, will use O_DENY_NONE in order to interop
correctly with network servers that explicitly opt in to interop
on share modes.
I agree it's a nice feature that is easy to implement - not important
for the first version IMO.
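A minimal C sketch of the semantics attributed to such a mount option. The `share_interop` flag and the helper are hypothetical (no such mount option exists). With the option set, a plain local open() is treated as a DENY_NONE share open: it is refused by deny modes held by network servers, and its access is recorded so that later deny requests from those servers conflict with it. Without the option, local opens bypass share modes entirely.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHARE_ACCESS_READ  0x1u
#define SHARE_ACCESS_WRITE 0x2u

/* Hypothetical per-mount option, in the spirit of "-o mand". */
struct mount_opts {
    bool share_interop;
};

/* Union of access/deny bits held by current opens of one file. */
struct share_state {
    uint32_t access;
    uint32_t deny;
};

/* Returns true if a plain local open() may proceed. With the option set,
 * the open behaves as if it had passed DENY_NONE: it requests access but
 * denies nothing, and its access is recorded in the shared state. */
static bool local_open_allowed(const struct mount_opts *mnt,
                               struct share_state *held, int oflags)
{
    uint32_t access = 0;

    assert(mnt != NULL && held != NULL);
    if (!mnt->share_interop)
        return true;  /* local opens ignore share modes entirely */

    switch (oflags & O_ACCMODE) {
    case O_RDONLY: access = SHARE_ACCESS_READ; break;
    case O_WRONLY: access = SHARE_ACCESS_WRITE; break;
    case O_RDWR:   access = SHARE_ACCESS_READ | SHARE_ACCESS_WRITE; break;
    }
    if (access & held->deny)
        return false;  /* a server-held deny mode blocks this access */
    held->access |= access;  /* DENY_NONE: nothing is added to held->deny */
    return true;
}
```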

I *think* you are talking about an nfs client mount option to
opt in/out of share modes? There was no such intention.

Thanks,
Amir.

2019-04-29 11:45:29

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Sun, Apr 28, 2019 at 8:57 PM Trond Myklebust <[email protected]> wrote:
>
> No. I'm saying that whether you intended to or not, you _are_
> implementing a mandatory lock over NFS. No talk about O_SHARE flags and
> it being an opt-in process for local applications changes the fact that
> non-local applications (i.e. the ones that count ☺) are being subjected
> to a mandatory lock with all the potential for denial of service that
> implies.
> So we need a mechanism beyond O_SHARE in order to ensure this system
> cannot be used on sensitive files that need to be accessible to all. It
> could be an export option, or a mount option, or it could be a more
> specific mechanism (e.g. the setgid with no execute mode bit as used
> in POSIX mandatory locks).
>

I see. Thanks for making that concern clear.

If the server owner wishes to have samba/nfs interop, the server
owner should obviously configure both samba and nfs for interop.
nfs should thus make it configurable via export options IMO,
and not via a mount option (it is the server's responsibility).

Preventing O_DENY_X on a certain file... hmm
We can do that, but if the nfs protocol has O_DENY, what is the
logic behind wanting to override it?
What we need is a way to track and blame the resource holder, and
to release the resource administratively.

For that matter, assuming that nfsd and smbd (etc.) can contain
their own fds without leaking them to other modules (minus bugs),
then, provided with sufficient sysfs/procfs info (i.e. Bruce's new open
files tracking), the admin should be able to kill the offending nfs/smb
client to release the hogged file.

I believe that is the Windows server solution to the DoS that is
implied by O_DENY.

Thanks,
Amir.

2019-04-29 13:12:13

by Trond Myklebust

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Mon, 2019-04-29 at 07:42 -0400, Amir Goldstein wrote:
> I see. Thanks for making that concern clear.
>
> If server owner wishes to have samba/nfs interop obviously
> server owner should configure both samba and nfs for interop.
> nfs should thus have it configurable via export options IMO
> and not via mount option (server's responsibility).
>
> Preventing O_DENY_X on a certain file... hmm
> We can do that but, if nfs protocol has O_DENY what's the
> logic that we would want to override it?

It was added in order to support Windows clients. There is also
optional support for mandatory byte range locks.

However the fact that the protocol supports it doesn't automatically
make it a good idea. Design by committee...

> What we need is a way to track, blame the resource holder and
> release the resource administratively.
>
> For that matter, assuming the nfsd and smbd (etc) can contain
> their own fds without leaking them to other modules (minus bugs)
> then provided with sufficient sysfs/procfs info (i.e. Bruce's new open
> files tracking), admin should be able to kill the offending nfs/smb client
> to release the hogged file.
>
> I believe that is the Windows server solution to the DoS that is implied
> from O_DENY.
>

Relying on being able to access the clients is not good enough. In
general, server admins tend not to have access to the clients.

However it should indeed be possible to create a tool on the server to
revoke locks and open state. Most commercial servers have that kind of
functionality, and I would agree that makes sense for dealing with
rogue processes.
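A minimal C sketch of such a server-side revocation, assuming each share-mode open is recorded together with an identifier for the client holding it (`client_id` is hypothetical; it could stand for an NFSv4 client ID or an SMB session). Revoking a client drops all of its opens, so its deny bits stop blocking everyone else. A real server would also have to tell the client that its state is gone; NFSv4 defines the NFS4ERR_ADMIN_REVOKED error for that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OPENS 16

/* One share-mode open held on a file by a (hypothetical) client id. */
struct share_open {
    uint64_t client_id;
    uint32_t access;
    uint32_t deny;
};

struct file_shares {
    struct share_open opens[MAX_OPENS];
    size_t n;
};

/* Administrative revoke: drop every open held by client_id and return how
 * many were removed. Remaining opens keep their relative order. */
static size_t revoke_client(struct file_shares *fs, uint64_t client_id)
{
    size_t kept = 0, removed;

    assert(fs->n <= MAX_OPENS);
    for (size_t i = 0; i < fs->n; i++) {
        if (fs->opens[i].client_id != client_id)
            fs->opens[kept++] = fs->opens[i];
    }
    removed = fs->n - kept;
    fs->n = kept;
    return removed;
}

/* Union of the deny bits still in force on the file. */
static uint32_t current_deny(const struct file_shares *fs)
{
    uint32_t deny = 0;

    for (size_t i = 0; i < fs->n; i++)
        deny |= fs->opens[i].deny;
    return deny;
}
```

Keeping the per-open client identity is what makes both halves of the discussion possible: the same table can back a procfs-style listing of who holds what, and a targeted revoke that does not require access to the client machines.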

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-04-29 20:29:51

by Jeff Layton

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Mon, 2019-04-29 at 00:57 +0000, Trond Myklebust wrote:
> On Sun, 2019-04-28 at 18:33 -0400, Amir Goldstein wrote:
> > On Sun, Apr 28, 2019 at 6:08 PM Trond Myklebust <
> > [email protected]> wrote:
> > > On Sun, 2019-04-28 at 18:00 -0400, Amir Goldstein wrote:
> > > > On Sun, Apr 28, 2019 at 11:06 AM Trond Myklebust
> > > > <[email protected]> wrote:
> > > > > On Sun, 2019-04-28 at 09:45 -0400, Amir Goldstein wrote:
> > > > > > On Sun, Apr 28, 2019 at 8:09 AM Jeff Layton <
> > > > > > [email protected]>
> > > > > > wrote:
> > > > > > > On Sat, 2019-04-27 at 16:16 -0400, Amir Goldstein wrote:
> > > > > > > > [adding back samba/nfs and fsdevel]
> > > > > > > >
> > > > > > >
> > > > > > > cc'ing Pavel too -- he did a bunch of work in this area a
> > > > > > > few
> > > > > > > years
> > > > > > > ago.
> > > > > > >
> > > > > > > > On Fri, Apr 26, 2019 at 6:22 PM Jeff Layton <
> > > > > > > > [email protected]>
> > > > > > > > wrote:
> > > > > > > > > On Fri, 2019-04-26 at 10:50 -0400, J. Bruce Fields
> > > > > > > > > wrote:
> > > > > > > > > > On Fri, Apr 26, 2019 at 04:11:00PM +0200, Amir
> > > > > > > > > > Goldstein
> > > > > > > > > > wrote:
> > > > > > > > > > > On Fri, Apr 26, 2019, 4:00 PM J. Bruce Fields <
> > > > > > > > > > > [email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > That said, we could also look at a vfs-level mount
> > > > > > > > > option
> > > > > > > > > that
> > > > > > > > > would
> > > > > > > > > make the kernel enforce these for any opener. That
> > > > > > > > > could
> > > > > > > > > also
> > > > > > > > > be useful,
> > > > > > > > > and shouldn't be too hard to implement. Maybe even make
> > > > > > > > > it
> > > > > > > > > a
> > > > > > > > > vfsmount-
> > > > > > > > > level option (like -o ro is).
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yeh, I am humbly going to leave this struggle to someone
> > > > > > > > else.
> > > > > > > > Not important enough IMO and completely independent
> > > > > > > > effort to
> > > > > > > > the
> > > > > > > > advisory atomic open&lock API.
> > > > > > >
> > > > > > > Having the kernel allow setting deny modes on any open call
> > > > > > > is
> > > > > > > a
> > > > > > > non-
> > > > > > > starter, for the reasons Bruce outlined earlier. This
> > > > > > > _must_ be
> > > > > > > restricted in some fashion or we'll be opening up a
> > > > > > > ginormous
> > > > > > > DoS
> > > > > > > mechanism.
> > > > > > >
> > > > > > > My proposal was to make this only be enforced by
> > > > > > > applications
> > > > > > > that
> > > > > > > explicitly opt-in by setting O_SH*/O_EX* flags. It wouldn't
> > > > > > > be
> > > > > > > too
> > > > > > > difficult to also allow them to be enforced on a per-fs
> > > > > > > basis
> > > > > > > via
> > > > > > > mount
> > > > > > > option or something. Maybe we could expand the meaning of
> > > > > > > '-o
> > > > > > > mand'
> > > > > > > ?
> > > > > > >
> > > > > > > How would you propose that we restrict this?
> > > > > > >
> > > > > >
> > > > > > Our communication channel is broken.
> > > > > > I did not intend to propose any implicit locking.
> > > > > > If samba and nfsd can opt-in with O_SHARE flags, I do not
> > > > > > understand why a mount option is helpful for the cause of
> > > > > > samba/nfsd interop.
> > > > > >
> > > > > > If someone else is interested in samba/local interop than
> > > > > > yes, a mount option like suggested by Pavel could be a good
> > > > > > option,
> > > > > > but it is an orthogonal effort IMO.
> > > > >
> > > > > If an NFS client 'opts in' to set share deny, then that still
> > > > > makes
> > > > > it
> > > > > a non-optional lock for the other NFS clients, because all
> > > > > ordinary
> > > > > open() calls will be gated by the server whether or not their
> > > > > application specifies the O_SHARE flag. There is no flag in the
> > > > > NFS
> > > > > protocol that could tell the server to ignore deny modes.
> > > > >
> > > > > IOW: it would suffice for 1 client to use O_SHARE|O_DENY* to
> > > > > opt
> > > > > all
> > > > > the other clients in.
> > > > >
> > > >
> > > > Sorry for being thick, I don't understand if we are in agreement
> > > > or
> > > > not.
> > > >
> > > > My understanding is that the network file server implementations
> > > > (i.e. samba, knfds, Ganesha) will always use share/deny modes.
> > > > So for example nfs v3 opens will always use O_DENY_NONE
> > > > in order to have correct interop with samba and nfs v4.
> > > >
> > > > If I am misunderstanding something, please enlighten me.
> > > > If there is a reason why mount option is needed for the sole
> > > > purpose
> > > > of interop between network filesystem servers, please enlighten
> > > > me.
> > > >
> > > >
> > >
> > > Same difference. As long as nfsd and/or Ganesha are translating
> > > OPEN4_SHARE_ACCESS_READ and OPEN4_SHARE_ACCESS_WRITE into share
> > > access locks, then those will conflict with any deny locks set by
> > > whatever application uses them.
> > >
> > > IOW: any open(O_RDONLY) and open(O_RDWR) will conflict with an
> > > O_DENY_READ that is set on the server, and any open(O_WRONLY) and
> > > open(O_RDWR) will conflict with an O_DENY_WRITE that is set on the
> > > server. There is no opt-out for NFS clients on this issue, because
> > > stateful NFSv4 opens MUST set one or more of OPEN4_SHARE_ACCESS_READ
> > > and OPEN4_SHARE_ACCESS_WRITE.
> > >
> >
> > Urgh! I *think* I understand the confusion.
> >
> > I believe Jeff was talking about implementing a mount option
> > similar to -o mand for local fs on the server.
> > With that mount option, *any* open() by any app of a file from
> > that mount will use O_DENY_NONE to interop correctly with
> > network servers that explicitly opt-in for interop on share modes.
> > I agree it's a nice feature that is easy to implement - not important
> > for the first version IMO.
> >
> > I *think* you are talking about an nfs client mount option for
> > opt-in/out of share modes? There was no such intention.
> >
>
> No. I'm saying that whether you intended to or not, you _are_
> implementing a mandatory lock over NFS. No talk about O_SHARE flags and
> it being an opt-in process for local applications changes the fact that
> non-local applications (i.e. the ones that count ☺) are being subjected
> to a mandatory lock with all the potential for denial of service that
> implies.
> So we need a mechanism beyond O_SHARE in order to ensure this system
> cannot be used on sensitive files that need to be accessible to all. It
> could be an export option, or a mount option, or it could be a more
> specific mechanism (e.g. the setgid with no execute mode bit as used
> in POSIX mandatory locks).
>

That's a great point.

I was focused on the local fs piece in order to support NFS/SMB serving,
but we also have to consider that people using nfs or cifs filesystems
would want to use this interface to have their clients set deny bits as
well.

So, I think you're right that we can't really do this without involving
non-cooperating processes in some way.

A mount option sounds like the simplest way to do this. We have
SB_MANDLOCK now, so we'd just need a SB_DENYLOCK or something that would
enable the use of O_DENY_READ/WRITE on a file. Maybe '-o denymode' or
something.

You might still get back EBUSY on a nfs or cifs filesystem even without
that option, but there's not much we can do about that.
--
Jeff Layton <[email protected]>

2019-04-29 22:34:10

by Pavel Shilovskiy

Subject: RE: Better interop for NFS/SMB file share mode/reservation



On Mon, Apr 29, 2019 at 13:29, Jeff Layton <[email protected]> wrote:
>
> On Mon, 2019-04-29 at 00:57 +0000, Trond Myklebust wrote:
> > On Sun, 2019-04-28 at 18:33 -0400, Amir Goldstein wrote:
> > > On Sun, Apr 28, 2019 at 6:08 PM Trond Myklebust <[email protected]> wrote:
> > > > On Sun, 2019-04-28 at 18:00 -0400, Amir Goldstein wrote:
> > > > > On Sun, Apr 28, 2019 at 11:06 AM Trond Myklebust <[email protected]> wrote:
> > > > > > On Sun, 2019-04-28 at 09:45 -0400, Amir Goldstein wrote:
> > > > > > > On Sun, Apr 28, 2019 at 8:09 AM Jeff Layton <[email protected]> wrote:
> > > > > > > > On Sat, 2019-04-27 at 16:16 -0400, Amir Goldstein wrote:
> > > > > > > > > [adding back samba/nfs and fsdevel]
> > > > > > > > >
> > > > > > > >
> > > > > > > > cc'ing Pavel too -- he did a bunch of work in this area a
> > > > > > > > few years ago.
> > > > > > > >
> > > > > > > > > On Fri, Apr 26, 2019 at 6:22 PM Jeff Layton <[email protected]> wrote:
> > > > > > > > > > On Fri, 2019-04-26 at 10:50 -0400, J. Bruce Fields wrote:
> > > > > > > > > > > On Fri, Apr 26, 2019 at 04:11:00PM +0200, Amir Goldstein wrote:
> > > > > > > > > > > > On Fri, Apr 26, 2019, 4:00 PM J. Bruce Fields <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > That said, we could also look at a vfs-level mount
> > > > > > > > > > option that would make the kernel enforce these for any
> > > > > > > > > > opener. That could also be useful, and shouldn't be too
> > > > > > > > > > hard to implement. Maybe even make it a vfsmount-level
> > > > > > > > > > option (like -o ro is).
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yeah, I am humbly going to leave this struggle to someone
> > > > > > > > > else. Not important enough IMO, and a completely
> > > > > > > > > independent effort from the advisory atomic open&lock API.
> > > > > > > >
> > > > > > > > Having the kernel allow setting deny modes on any open call
> > > > > > > > is a non-starter, for the reasons Bruce outlined earlier.
> > > > > > > > This _must_ be restricted in some fashion or we'll be
> > > > > > > > opening up a ginormous DoS mechanism.
> > > > > > > >
> > > > > > > > My proposal was to make this only be enforced by
> > > > > > > > applications that explicitly opt-in by setting O_SH*/O_EX*
> > > > > > > > flags. It wouldn't be too difficult to also allow them to
> > > > > > > > be enforced on a per-fs basis via mount option or
> > > > > > > > something. Maybe we could expand the meaning of '-o mand'?
> > > > > > > >
> > > > > > > > How would you propose that we restrict this?
> > > > > > > >
> > > > > > >
> > > > > > > Our communication channel is broken.
> > > > > > > I did not intend to propose any implicit locking.
> > > > > > > If samba and nfsd can opt-in with O_SHARE flags, I do not
> > > > > > > understand why a mount option is helpful for the cause of
> > > > > > > samba/nfsd interop.
> > > > > > >
> > > > > > > If someone else is interested in samba/local interop then
> > > > > > > yes, a mount option like the one suggested by Pavel could be
> > > > > > > a good option, but it is an orthogonal effort IMO.
> > > > > >
> > > > > > If an NFS client 'opts in' to set share deny, then that still
> > > > > > makes it a non-optional lock for the other NFS clients, because
> > > > > > all ordinary open() calls will be gated by the server whether or
> > > > > > not their application specifies the O_SHARE flag. There is no
> > > > > > flag in the NFS protocol that could tell the server to ignore
> > > > > > deny modes.
> > > > > >
> > > > > > IOW: it would suffice for 1 client to use O_SHARE|O_DENY* to
> > > > > > opt all the other clients in.
> > > > > >
> > > > >
> > > > > Sorry for being thick, I don't understand whether we are in
> > > > > agreement or not.
> > > > >
> > > > > My understanding is that the network file server implementations
> > > > > (i.e. samba, knfsd, Ganesha) will always use share/deny modes.
> > > > > So for example nfs v3 opens will always use O_DENY_NONE
> > > > > in order to have correct interop with samba and nfs v4.
> > > > >
> > > > > If I am misunderstanding something, please enlighten me.
> > > > > If there is a reason why a mount option is needed for the sole
> > > > > purpose of interop between network filesystem servers, please
> > > > > enlighten me.
> > > > >
> > > > >
> > > >
> > > > Same difference. As long as nfsd and/or Ganesha are translating
> > > > OPEN4_SHARE_ACCESS_READ and OPEN4_SHARE_ACCESS_WRITE into share
> > > > access locks, then those will conflict with any deny locks set by
> > > > whatever application uses them.
> > > >
> > > > IOW: any open(O_RDONLY) and open(O_RDWR) will conflict with an
> > > > O_DENY_READ that is set on the server, and any open(O_WRONLY) and
> > > > open(O_RDWR) will conflict with an O_DENY_WRITE that is set on the
> > > > server. There is no opt-out for NFS clients on this issue, because
> > > > stateful NFSv4 opens MUST set one or more of OPEN4_SHARE_ACCESS_READ
> > > > and OPEN4_SHARE_ACCESS_WRITE.
> > > >
> > >
> > > Urgh! I *think* I understand the confusion.
> > >
> > > I believe Jeff was talking about implementing a mount option
> > > similar to -o mand for local fs on the server.
> > > With that mount option, *any* open() by any app of a file from
> > > that mount will use O_DENY_NONE to interop correctly with
> > > network servers that explicitly opt-in for interop on share modes.
> > > I agree it's a nice feature that is easy to implement - not important
> > > for the first version IMO.
> > >
> > > I *think* you are talking about an nfs client mount option for
> > > opt-in/out of share modes? There was no such intention.
> > >
> >
> > No. I'm saying that whether you intended to or not, you _are_
> > implementing a mandatory lock over NFS. No talk about O_SHARE flags and
> > it being an opt-in process for local applications changes the fact that
> > non-local applications (i.e. the ones that count ☺) are being subjected
> > to a mandatory lock with all the potential for denial of service that
> > implies.
> > So we need a mechanism beyond O_SHARE in order to ensure this system
> > cannot be used on sensitive files that need to be accessible to all. It
> > could be an export option, or a mount option, or it could be a more
> > specific mechanism (e.g. the setgid with no execute mode bit as used
> > in POSIX mandatory locks).
> >
>
> That's a great point.
>
> I was focused on the local fs piece in order to support NFS/SMB serving,
> but we also have to consider that people using nfs or cifs filesystems
> would want to use this interface to have their clients set deny bits as
> well.
>
> So, I think you're right that we can't really do this without involving
> non-cooperating processes in some way.

It's been 5+ years since I touched that code, but I still like the idea of having a separate mount option for mountpoints used by Samba and NFS servers and clients, to avoid security attacks on sensitive files. For some sensitive files on such mountpoints, a more selective mechanism may be used to prevent deny flags from being set (as mentioned above). Or we may think about adding another flag, e.g. O_DENYFORCE, available to root only, that tells the kernel to disregard deny flags already set on a file; this might be useful for recovery tools.
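A minimal sketch of the root-only override floated above, in plain C. O_DENYFORCE does not exist, so the `force` argument stands in for it; "root" is approximated as effective UID 0, and the -EPERM/-EBUSY choices are assumptions for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* 'force' stands in for the hypothetical O_DENYFORCE open flag.
 * 'conflicts_with_deny' says whether the open would hit a deny mode
 * already set on the file. */
static int open_gate(uid_t euid, bool force, bool conflicts_with_deny)
{
    if (force) {
        if (euid != 0)
            return -EPERM;  /* root-only in this sketch */
        return 0;           /* e.g. a recovery tool ignores deny modes */
    }
    return conflicts_with_deny ? -EBUSY : 0;
}
```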

About O_DENYDELETE: I don't understand how we may reach a good interop story without a proper implementation of this flag. Windows apps may set it and Samba needs to respect it. If an NFS client removes such an opened file, what will Samba tell the Windows client?

>
> A mount option sounds like the simplest way to do this. We have
> SB_MANDLOCK now, so we'd just need a SB_DENYLOCK or something that would
> enable the use of O_DENY_READ/WRITE on a file. Maybe '-o denymode' or
> something.

I remember it was 'sharelock' in my patchset, but naming is the least important part here, I guess.

>
> You might still get back EBUSY on a nfs or cifs filesystem even without
> that option, but there's not much we can do about that.

I ended up with a new ESHAREDENIED error code, which I found better for detectability: it might be useful to know the exact reason an open call failed. Say a DB instance wants to be sure that a partition file is already being served by another instance before it gives up trying to open it.
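A short sketch of how a database instance could act on such a code. It assumes an ESHAREDENIED errno as in the patchset described above; mainline Linux has no such errno, so the numeric value below is made up, and `classify_open()` is a hypothetical helper fed with open()'s return value and errno.

```c
#include <errno.h>

/* Hypothetical errno; Linux has no ESHAREDENIED, so an arbitrary
 * unused value is chosen for this sketch. */
#ifndef ESHAREDENIED
#define ESHAREDENIED 4096
#endif

enum open_outcome { OPENED, SERVED_ELSEWHERE, TRY_LATER, FATAL };

/* Classify the result of an open() call from its fd and saved errno. */
static enum open_outcome classify_open(int fd, int err)
{
    if (fd >= 0)
        return OPENED;
    switch (err) {
    case ESHAREDENIED:
        return SERVED_ELSEWHERE; /* share lock held by another instance */
    case EBUSY:
        return TRY_LATER;        /* ambiguous: many causes map to EBUSY */
    default:
        return FATAL;
    }
}
```

Called as `classify_open(fd, errno)` right after `open()`, SERVED_ELSEWHERE lets the instance give up at once, whereas a bare EBUSY leaves it guessing.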

--
Best regards,
Pavel Shilovsky

2019-04-30 00:32:45

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

...
> > > No. I'm saying that whether you intended to or not, you _are_
> > > implementing a mandatory lock over NFS. No talk about O_SHARE flags and
> > > it being an opt-in process for local applications changes the fact that
> > > non-local applications (i.e. the ones that count ☺) are being subjected
> > > to a mandatory lock with all the potential for denial of service that
> > > implies.
> > > So we need a mechanism beyond O_SHARE in order to ensure this system
> > > cannot be used on sensitive files that need to be accessible to all. It
> > > could be an export option, or a mount option, or it could be a more
> > > specific mechanism (e.g. the setgid with no execute mode bit as used
> > > in POSIX mandatory locks).
> > >
> >
> > That's a great point.
> >
> > I was focused on the local fs piece in order to support NFS/SMB serving,
> > but we also have to consider that people using nfs or cifs filesystems
> > would want to use this interface to have their clients set deny bits as
> > well.
> >
> > So, I think you're right that we can't really do this without involving
> > non-cooperating processes in some way.
>
> It's been 5+ years since I touched that code but I still like the idea of having a separate mount option for mountpoints used by Samba and NFS servers and clients to avoid security attacks on the sensitive files. For some sensitive files on such mountpoints a more selective mechanism may be used to prevent deny flags from being set (as mentioned above). Or we may think about adding another flag e.g. O_DENYFORCE available to root only that tells the kernel to not take into account deny flags already set on a file - might be useful for recovery tools.
>
> About O_DENYDELETE: I don't understand how we may reach a good interop story without a proper implementation of this flag. Windows apps may set it and Samba needs to respect it. If an NFS client removes such an opened file, what will Samba tell the Windows client?
>

Samba will tell the Windows client:
"Sorry, my administrator has decided to trade off DENY_DELETE
functionality for interop with nfs on share modes, so I cannot grant you
the DENY_DELETE that you requested."
Not sure if that is workable. Samba developers need to chime in.

Thanks,
Amir.

2019-04-30 08:15:02

by Uri Simchoni

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On 4/30/19 3:31 AM, Amir Goldstein via samba-technical wrote:
>>
>> About O_DENYDELETE: I don't understand how we may reach a good interop story without a proper implementation of this flag. Windows apps may set it and Samba needs to respect it. If an NFS client removes such an opened file, what will Samba tell the Windows client?
>>
>
> Samba will tell the Windows client:
> "Sorry, my administrator has decided to trade off interop with nfs on
> share modes, with DENY_DELETE functionality, so I cannot grant you
> DENY_DELETE that you requested."
> Not sure if that is workable. Samba developers need to chime in.
>
> Thanks,
> Amir.
>

On Windows you don't ask for DENY_DELETE, you get it by default unless
you ask to *allow* deletion. If you fopen() a file, even for
read-only access, the MSVC standard C library would open it with delete
denied because it does not explicitly request to allow it. My guess is
that runtimes of other high-level languages behave that way too on
Windows. That means pretty much everything would stop working.
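To make the consequence concrete, here is a small portable C model of Windows share-mode checking. The bit values mirror the Win32 FILE_SHARE_READ, FILE_SHARE_WRITE and FILE_SHARE_DELETE constants (0x1, 0x2, 0x4); reusing the same bits for requested access is a simplification, since real access masks use different values (e.g. DELETE). The share mode assumed for fopen() is read and write sharing without delete sharing, as described above.

```c
#include <stdbool.h>

/* Bit values mirror Win32 FILE_SHARE_READ/WRITE/DELETE. */
enum { SH_READ = 0x1, SH_WRITE = 0x2, SH_DELETE = 0x4 };

struct win_open {
    unsigned access;  /* what this handle wants to do (simplified) */
    unsigned share;   /* what this handle lets other handles do */
};

/* Read and write sharing, no delete sharing, as assumed for fopen(). */
static const unsigned FOPEN_SHARE = SH_READ | SH_WRITE;

/* A new open is granted only if each handle's access is permitted by
 * the other handle's share mode. */
static bool share_compatible(struct win_open held, struct win_open req)
{
    return (req.access & ~held.share) == 0 &&
           (held.access & ~req.share) == 0;
}
```

So any open for deletion fails while a plain fopen() handle is outstanding, which is why DENY_DELETE is the common case rather than a corner case.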

Thanks,
Uri.

2019-04-30 09:24:32

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Tue, Apr 30, 2019 at 4:12 AM Uri Simchoni <[email protected]> wrote:
>
> On 4/30/19 3:31 AM, Amir Goldstein via samba-technical wrote:
> >>
> >> About O_DENYDELETE: I don't understand how we may reach a good interop story without a proper implementation of this flag. Windows apps may set it and Samba needs to respect it. If an NFS client removes such an opened file, what will Samba tell the Windows client?
> >>
> >
> > Samba will tell the Windows client:
> > "Sorry, my administrator has decided to trade off interop with nfs on
> > share modes, with DENY_DELETE functionality, so I cannot grant you
> > DENY_DELETE that you requested."
> > Not sure if that is workable. Samba developers need to chime in.
> >
> > Thanks,
> > Amir.
> >
>
> On Windows you don't ask for DENY_DELETE, you get it by default unless
> you ask to *allow* deletion. If you fopen() a file, even for
> read-only access, the MSVC standard C library would open it with delete
> denied because it does not explicitly request to allow it. My guess is
> that runtimes of other high-level languages behave that way too on
> Windows. That means pretty much everything would stop working.
>

I see. I was wondering about something else.
Windows deletes a file by opening it with DELETE_ON_CLOSE,
and then "The file is to be deleted immediately after all of its handles are
closed, which includes the specified handle and any other open or
duplicated handles."
What about hardlinks?
Are open handles associated with a specific path rather than a specific inode?

I should note that the Linux NFS client does something similar, called silly
rename: to unlink a file that is still open, it renames the file to a
temporary name, then unlinks the temporary name when that client closes its
last handle to the file.
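A userspace sketch of the silly-rename idea using plain POSIX rename() and unlink(): an unlink requested while handles are open renames the file to a hidden name in the same directory, and the hidden name is removed on the last close. The `silly_file` struct and the `.nfs%08x` naming are hypothetical; only the `.nfs` prefix follows the Linux NFS client, whose real names are built differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct silly_file {
    char path[256];     /* current name of the file */
    int open_handles;   /* handles this client still holds */
    int unlink_pending; /* unlink was requested while still open */
};

/* Unlink, or silly-rename if the file is still open. */
static int silly_unlink(struct silly_file *f, unsigned counter)
{
    if (f->open_handles == 0)
        return unlink(f->path);

    /* Keep the directory part so the rename stays in the same directory. */
    const char *slash = strrchr(f->path, '/');
    int dirlen = slash ? (int)(slash - f->path + 1) : 0;
    char tmp[sizeof f->path];
    snprintf(tmp, sizeof tmp, "%.*s.nfs%08x", dirlen, f->path, counter);

    if (rename(f->path, tmp) != 0)
        return -1;
    memcpy(f->path, tmp, sizeof tmp);
    f->unlink_pending = 1;
    return 0;
}

/* Drop one handle; remove the hidden name when the last one goes away. */
static int silly_close(struct silly_file *f)
{
    if (--f->open_handles == 0 && f->unlink_pending) {
        f->unlink_pending = 0;
        return unlink(f->path);
    }
    return 0;
}
```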

If, and it's a very big if, samba could guess what the silly rename temp name
would be, DENY_DELETE could have been implemented by creating a link
to the file under the silly rename name.

Of course we cannot rely on the NFS client to enforce the samba interop,
but the nfsd v4 server and samba could both use a similar technique to
coordinate unlink/rename and DENY_DELETE.
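One way such a "similar technique" could look, as a purely hypothetical cooperative convention (nothing like this exists in nfsd or Samba, and it is a variant rather than the exact guess-the-silly-name idea above): the server granting DENY_DELETE pins the inode with an extra hard link under a reserved name in the same directory, and a cooperating server refuses to unlink a name whose pin still refers to the same inode. The `.denydelete.` prefix is made up, and non-cooperating processes are not bound by it at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build the reserved pin name next to the file: dir/.denydelete.<name>. */
static void pin_name(const char *path, char *out, size_t outlen)
{
    const char *slash = strrchr(path, '/');
    int dirlen = slash ? (int)(slash - path + 1) : 0;
    snprintf(out, outlen, "%.*s.denydelete.%s", dirlen, path, path + dirlen);
}

/* Grant DENY_DELETE by adding a hard link under the reserved name. */
static int grant_deny_delete(const char *path)
{
    char pin[512];
    pin_name(path, pin, sizeof pin);
    return link(path, pin);
}

static int release_deny_delete(const char *path)
{
    char pin[512];
    pin_name(path, pin, sizeof pin);
    return unlink(pin);
}

/* Cooperative unlink: fail with EBUSY while the pin is the same inode. */
static int coop_unlink(const char *path)
{
    char pin[512];
    struct stat a, b;
    pin_name(path, pin, sizeof pin);
    if (stat(path, &a) == 0 && stat(pin, &b) == 0 &&
        a.st_dev == b.st_dev && a.st_ino == b.st_ino) {
        errno = EBUSY;
        return -1;
    }
    return unlink(path);
}
```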

Thanks,
Amir.

2019-05-24 07:14:12

by Amir Goldstein

Subject: Re: Better interop for NFS/SMB file share mode/reservation

[dropping linux-fsdevel]

On Thu, Apr 25, 2019 at 9:11 PM Amir Goldstein <[email protected]> wrote:
>
> On Thu, Mar 7, 2019 at 1:04 PM Stefan Metzmacher <[email protected]> wrote:
> >
> > Am 06.03.19 um 22:25 schrieb Ralph Böhme via samba-technical:
> > >
> > > Jeremy Allison wrote:
> > >> On Wed, Mar 06, 2019 at 03:31:08PM -0500, Jeff Layton wrote:
> > >>> On Wed, 2019-03-06 at 10:11 -0500, J. Bruce Fields wrote:
> > >>>>
> > >>>> Jeff, wasn't there some work (on Ceph maybe?) on a userspace delegation
> > >>>> API? Is that close to what's needed?
> > >>>>
> > >>>
> > >>> Here's the C headers for that stuff:
> > >>>
> > >>> https://github.com/ceph/ceph/blob/7ba6bece4187eda5d05a9b84211fe6ba8dd287bd/src/include/cephfs/libcephfs.h#L1734
> > >>>
> > >>> It's simple enough and works for us in ganesha, and I think we can
> > >>> probably adapt it to samba without too much difficulty. The callback
> > >>> doesn't seem like it'll do for a kernel API though -- you'd almost
> > >>> certainly need to do something different there (signals? inotify?).
> > >>
> > >> SMB3 leases have R/RW and Handle-based leases.
> > >
> > > Just to be precise: SMB2.1+ has R, RH, RW and RWH leases.
> > >
> > >> Handle leases allow multiple opens of the same pathname
> > >> that get different handles to share the lease, allowing
> > >> a client redirector to delay opens or closes locally
> > >> so long as it has a handle lease.
> > >
> > > That's a property of leases in general, not just H-leases. The client provides a lease key, which is a GUID, with each lease request.
> > >
> > >>
> > >> Here are the semantics:
> > >>
> > >> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-smb2/d8df943d-6ad7-4b30-9f58-96ae90fc6204
> > >>
> > >> I'm not sure a simple file-descriptor based API is
> > >> enough for us. Can we have a uuid or token based
> > >> API instead, where the server can choose which fds
> > >> to cover with a token?
> > >
> > > Yes, that would be ideal.
>
> Getting back to this.
> Thanks all for the valuable inputs.
>
> Next week is LSF/MM and I was assigned a 30 minute slot on filesystems track
> to discuss "NFS/SMB file share".
>
> So let me try to echo what I read on this thread and how I understand what APIs
> samba needs from the kernel.
>
> >
> > If we want to design a useful API, we also need to think about
> > all features:
> > - file oplock/leases
>
> Kernel can have a flavor of leases which are not broken
> by opens from threads of the process holding the lease.
> Bruce has some patches along those lines for knfsd and SMB R/RW
> leases could use this flavor if it was exported to userspace?
>
> For SMB RH/RWH leases and Ganesha delegations, server
> could keep track of its own handles/clients and break leases within the
> same process without involving the kernel.
> Am I wrong?
>
> > - directory leases
>
> I have WIP on fsnotify directory pre modification hooks.
> There is opposition from fsnotify maintainer to add new userspace
> APIs that can create kernel->user->kernel deadlocks, like the
> deadlocks currently reported with fanotify permission events.
>
> Need to see if we can find a middle ground between
> "post modification notifications" and "pre modification permission"
> API, somewhere along the lines of regular file lease breaking API.
>
> > - share modes
>
> Volker told me he thinks samba can enforce share modes by
> a single daemon policing all opens in the system with fanotify.
> I think he is right. If anyone thinks differently please speak up.
>
> > - disconnected handles (for durable and persistent handles),
> > which exists within the kernel for a while and can be reattached
> > to process, using some kind of cookie and the same euid
>
> So this interface exists in the kernel.
> Nothing more required from the kernel API. Right?
>
> > - the API needs ways to use epoll in order to do async opens
> > and lease breaks. For opens the model of async socket connects
> > could be used. Leases could have a signalfd-style api.
>
> I should hope that the new AIO API (http://kernel.dk/io_uring.pdf)
> would solve those problems as well as other issues that
> samba has w.r.t dispatching AIO.
>
> >
> > We may not need everything at once, but we should have the full picture
> > in mind. And we need working code in kernel and userspace that passes
> > all tests (we may need to add additional test). Otherwise the kernel
> > creates new syscalls, which wouldn't be used by Samba in the end.
> >
>
> Tested interfaces - good idea ;-)
>
> If anyone has any comments about my view of required new interfaces,
> or important things that I missed, please say so before Tuesday!
>
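A small C model of the lease flavor described above, assuming Linux lease semantics (a read lease is broken by another open for write, a write lease by any other open) plus a hypothetical owner token in the spirit of the SMB lease key and the token-based API requested earlier: opens that present the holder's token never break the lease. The 16-byte token and all names are illustrative, not an existing kernel interface.

```c
#include <stdbool.h>
#include <string.h>

enum lease_type { LEASE_NONE, LEASE_READ, LEASE_WRITE };

/* Hypothetical lease record keyed by an opaque 16-byte owner token,
 * e.g. an SMB lease key (a GUID) chosen by the file server. */
struct lease {
    enum lease_type type;
    unsigned char token[16];
};

/* Must the lease be broken before an open by 'token' may proceed? */
static bool open_breaks_lease(const struct lease *l,
                              const unsigned char token[16],
                              bool want_write)
{
    if (l->type == LEASE_NONE)
        return false;
    if (memcmp(l->token, token, sizeof l->token) == 0)
        return false;      /* same owner: shares the lease */
    if (l->type == LEASE_WRITE)
        return true;       /* any foreign open breaks a write lease */
    return want_write;     /* read lease: only foreign writers break it */
}
```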

Hello Samba-team,

Some of you may have already seen the reports from my session at LSF/MM
on Samba/NFS interop: https://lwn.net/Articles/788335/

It should not be a surprise to anyone here to know that I have had interesting
and productive conversations with NFS folks about improving samba interop.
It should not be a surprise to anyone here to know that the rest of the audience
was, generally speaking, uninterested in the problem.

This reinforces the point I was trying to make in the session:
the path of least resistance for NFS-Samba interop is to communicate with
each other (both human and software wise) and try to leave the VFS out of the
discussion as much as possible (hence dropping linux-fsdevel from
this thread).

An idea that has already been thrown around is to use some samba daemon as
an arbitrator for opening files and locks. Of course, this would be an
opt-in feature for NFS servers.

For example, can we use fanotify permission hooks to delegate access control
checks from knfsd to a daemon? Right now, the information in permission
events is rather minimal, but as an fanotify developer, I can assure you that
we can enrich the information passed by knfsd on open permission events if
that is deemed useful.
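For reference, the existing fanotify permission API looks roughly like this: a privileged daemon (CAP_SYS_ADMIN) marks a mount for FAN_OPEN_PERM and must answer each event with FAN_ALLOW or FAN_DENY, and the opener stays blocked until it does. The `arbiter_fn` callback, `make_response()` and `arbitrate_mount()` are hypothetical glue. An event carries little more than the opener's PID and a file descriptor for the file; for opens done by knfsd that PID would presumably be an nfsd thread rather than anything identifying the NFS client, which is where richer event information would come in.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <sys/fanotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy hook: the daemon's share-mode table decides. */
typedef bool (*arbiter_fn)(pid_t opener, int file_fd);

/* Build the reply the kernel expects for one permission event. */
struct fanotify_response make_response(int event_fd, bool allow)
{
    struct fanotify_response r = {
        .fd = event_fd,
        .response = allow ? FAN_ALLOW : FAN_DENY,
    };
    return r;
}

/* Answer FAN_OPEN_PERM events on one mount until read() fails. */
int arbitrate_mount(const char *mount_path, arbiter_fn decide)
{
    int fan = fanotify_init(FAN_CLASS_CONTENT, O_RDONLY);
    if (fan < 0)
        return -1;
    if (fanotify_mark(fan, FAN_MARK_ADD | FAN_MARK_MOUNT, FAN_OPEN_PERM,
                      AT_FDCWD, mount_path) < 0) {
        close(fan);
        return -1;
    }

    struct fanotify_event_metadata buf[64];
    for (;;) {
        ssize_t len = read(fan, buf, sizeof buf);
        if (len <= 0)
            break;
        struct fanotify_event_metadata *m = buf;
        for (; FAN_EVENT_OK(m, len); m = FAN_EVENT_NEXT(m, len)) {
            if (m->fd < 0)
                continue;  /* queue overflow: no file attached */
            if (m->mask & FAN_OPEN_PERM) {
                struct fanotify_response r =
                    make_response(m->fd, decide(m->pid, m->fd));
                if (write(fan, &r, sizeof r) != (ssize_t)sizeof r) {
                    close(m->fd);
                    close(fan);
                    return -1;
                }
            }
            close(m->fd);  /* every event fd must be closed */
        }
    }
    close(fan);
    return 0;
}
```

Leaving an event unanswered blocks the opener, which is also the source of the kernel->user->kernel deadlock concerns raised about permission events earlier in this message.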

I will be attending SambaXP, so if any of the Samba guys would like to, we could
find a slot in the Hallway track or at a local bar to discuss those options.

Thanks,
Amir.

2019-05-24 13:16:20

by Ralph Boehme

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, May 24, 2019 at 10:12:10AM +0300, Amir Goldstein wrote:
>I will be attending SambaXP, so if any of the Samba guys would like to, we could
>find a slot in the Hallway track or at a local bar to discuss those options.

awesome! I'll join as well.

Looking forward to seeing you at SambaXP!
-slow

--
Ralph Boehme, Samba Team https://samba.org/
Samba Developer, SerNet GmbH https://sernet.de/en/samba/
GPG-Fingerprint FAE2C6088A24252051C559E4AA1E9B7126399E46

2019-05-24 15:08:51

by J. Bruce Fields

Subject: Re: Better interop for NFS/SMB file share mode/reservation

On Fri, May 24, 2019 at 10:12:10AM +0300, Amir Goldstein wrote:
> Some of you may have already seen the reports from my session at LSF/MM
> on Samba/NFS interop: https://lwn.net/Articles/788335/
>
> It should not be a surprise to anyone here to know that I have had interesting
> and productive conversations with NFS folks about improving samba interop.
> It should not be a surprise to anyone here to know that the rest of the audience
> was, generally speaking, uninterested in the problem.

Eh, especially after a couple of days of highly technical talks, people
have trouble focusing on stuff outside their area. I wouldn't take that
as opposition, if that's what you mean.

I think the only place where there's any entrenched opposition is (alas)
ACLs.

Lease/lock stuff, for example, should be no problem. It's mainly just a
matter of people finding time.

> This reinforces the point I was trying to make in the session:
> the path of least resistance for NFS-Samba interop is to communicate with
> each other (both human and software wise) and try to leave the VFS out of the
> discussion as much as possible (hence dropping linux-fsdevel from
> this thread).

I've got a strong preference for doing stuff in the VFS.

Maybe the approaches aren't incompatible--if we can do something without
new kernel interfaces for now, it doesn't rule out later moving some of
the logic into the kernel if that helps.

That said, I'm not comfortable depending on an assumption that knfsd and
SMB are the only users of a filesystem. If we're going to introduce
some new kind of lock, for example, I'd like it enforced against
everyone. In knfsd, we broke that rule for open deny modes and I think
it was a mistake.

--b.

> An idea that has already been thrown around is to use some samba daemon as
> an arbitrator for opening files and locks. Of course, this would be an
> opt-in feature for NFS servers.
>
> For example, can we use fanotify permission hooks to delegate access control
> checks from knfsd to a daemon? Right now, the information in permission
> events is rather minimal, but as an fanotify developer, I can assure you that
> we can enrich the information passed by knfsd on open permission events if
> that is deemed useful.
>
> I will be attending SambaXP, so if any of the Samba guys would like to, we could
> find a slot in the Hallway track or at a local bar to discuss those options.