Please pull the following nfsd updates from the for-2.6.37 branch at:
git://linux-nfs.org/~bfields/linux.git for-2.6.37
Neil Brown modifed the way we wait for upcalls to solve a performance
problem and a long-standing NFSv4 correctness problem.
Pavel Emelyanov started the job of making the NFS client and server
container-aware. (Note that for the sake of keeping his work in one
tree, Trond and I agreed to merge it through my tree, though it includes
client as well as server changes.) Thanks to Chuck Lever for help with
review.
Thanks also to Tom Tucker for some rdma fixes (and review help).
I'm continuing work to bring the NFSv4.1 server close enough to the spec
that we could turn it on by default without risking future
interoperability problems. We're not quite there yet.
A moment's silence for the spkm3 code, please. It's been lying unused
for years now, so we decided to finally put it out of its misery.
We're also deprecating (not removing quite yet) the old nfsd control
interfaces.
I have a few last-minute changes still under testing that I may try to
sneak in later in the week, but given the short merge window I thought
it would make more sense to send the bulk of the changes now.
--b.
Andy Adamson (1):
nfsd: remove duplicate NFS4_STATEID_SIZE declaration
Andy Shevchenko (1):
sunrpc/cache: don't use custom hex_to_bin() converter
Benny Halevy (1):
nfsd4: adjust buflen for encoded attrs bitmap based on actual bitmap length
Bryan Schumaker (1):
lockd: Mostly remove BKL from the server
Chuck Lever (2):
SUNRPC: Use conventional switch statement when reclassifying sockets
SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
J. Bruce Fields (39):
svcrpc: minor cache cleanup
svcrpc: cache deferral cleanup
Merge remote branch 'trond/bugfixes' into for-2.6.37
nfsd4: fix hang on fast-booting nfs servers
nfsd: fix /proc/net/rpc/nfsd.export/content display
nfsd4: remove spkm3
nfsd4: minor variable renaming (cb -> conn)
nfsd4: combine nfs4_rpc_args and nfsd4_cb_sequence
nfsd4: rename nfs4_rpc_args->nfsd4_cb_args
nfsd4: generic callback code
nfsd4: use generic callback code in null case
nfsd4: remove separate cb_args struct
nfsd4: Move callback setup to callback queue
nfsd4: fix alloc_init_session BUILD_BUG_ON()
nfsd4: fix alloc_init_session return type
nfsd4: clean up session allocation
nfsd4: keep per-session list of connections
nfsd: provide callbacks on svc_xprt deletion
nfsd4: use callbacks on svc_xprt_deletion
nfsd4: refactor connection allocation
nfsd4: add new connections to session
nfsd4: return expired on unfound stateid's
nfsd4: expire clients more promptly
nfsd4: don't cache seq_misordered replies
nfsd4: move callback setup into session init code
nfsd4: use client pointer to backchannel session
nfsd4: make backchannel sequence number per-session
nfsd4: confirm only on succesful create_session
nfsd4: track backchannel connections
nfsd4: callback program number is per-session
nfsd4: separate callback change and callback probe
nfsd4: delay session removal till free_client
nfsd4: move minorversion to client
nfsd4: only require krb5 principal for NFSv4.0 callbacks
nfsd4: fix connection allocation in sequence()
svcrpc: never clear XPT_BUSY on dead xprt
svcrpc: assume svc_delete_xprt() called only once
svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
NeilBrown (14):
sunrpc: extract some common sunrpc_cache code from nfsd
sunrpc: use seconds since boot in expiry cache
sunrpc/cache: allow threads to block while waiting for cache update.
sunrpc: close connection when a request is irretrievably lost.
nfsd: disable deferral for NFSv4
nfsd/idmap: drop special request deferal in favour of improved default.
svcauth_gss: replace a trivial 'switch' with an 'if'
sunrpc/cache: change deferred-request hash table to use hlist.
sunrpc/cache: fix recent breakage of cache_clean_deferred
nfsd: formally deprecate legacy nfsd syscall interface
nfsd: allow deprecated interface to be compiled out.
sunrpc: fix race in new cache_wait code.
sunrpc: Simplify cache_defer_req and related functions.
sunrpc/cache: centralise handling of size limit on deferred list.
Pavel Emelyanov (37):
nfsd: Export get_task_comm for nfsd
sunrpc: Pass the ip_map_parse's cd to lower calls
sunrpc: Make xprt auth cache release work with the xprt
sunrpc: Pass xprt to cached get/put routines
sunrpc: Add net to pure API calls
sunrpc: Add routines that allow registering per-net caches
sunrpc: Tag svc_xprt with net
sunrpc: The per-net skeleton
sunrpc: Make the /proc/net/rpc appear in net namespaces
sunrpc: Make the ip_map_cache be per-net
sunrpc: Factor out rpc_xprt allocation
sunrpc: Factor out rpc_xprt freeing
sunrpc: Add net argument to svc_create_xprt
sunrpc: Pull net argument downto svc_create_socket
sunrpc: Add net to rpc_create_args
sunrpc: Add net to xprt_create
sunrpc: Tag rpc_xprt with net
net: Export __sock_create
sunrpc: Create sockets in net namespaces
sunrpc: Use helper to set v4 mapped addr in ip_map_parse
sunrpc: Remove unused sock arg from xs_get_srcport
sunrpc: Remove unused sock arg from xs_next_srcport
sunrpc: Get xprt pointer once in xs_tcp_setup_socket
sunrpc: Remove duplicate xprt/transport arguments from calls
sunrpc: Factor out udp sockets creation
sunrpc: Factor out v4 sockets creation
sunrpc: Factor out v6 sockets creation
sunrpc: Call xs_create_sockX directly from setup_socket
sunrpc: Merge the xs_bind code
sunrpc: Merge xs_create_sock code
sunrpc: Pass family to setup_socket calls
sunrpc: Remove TCP worker wrappers
sunrpc: Remove UDP worker wrappers
sunrpc: Remove useless if (task == NULL) from xprt_reserve_xprt
sunrpc: Don't return NULL from rpcb_create
sunrpc: Remove dead "else" branch from bc xprt creation
sunrpc: Turn list_for_each-s into the ..._entry-s
Stephen Rothwell (1):
sunrpc: fix up rpcauth_remove_module section mismatch
Tejun Heo (1):
sunrpc/xprtrdma: clean up workqueue usage
Tom Tucker (2):
svcrdma: Change DMA mapping logic to avoid the page_address kernel API
svcrdma: Cleanup DMA unmapping in error paths.
Documentation/feature-removal-schedule.txt | 10 +
fs/Makefile | 5 +-
fs/compat.c | 2 +-
fs/lockd/host.c | 1 +
fs/lockd/mon.c | 1 +
fs/lockd/svc.c | 2 +-
fs/lockd/svc4proc.c | 2 -
fs/lockd/svclock.c | 31 ++-
fs/lockd/svcproc.c | 2 -
fs/nfs/callback.c | 4 +-
fs/nfs/client.c | 1 +
fs/nfs/dns_resolve.c | 6 +-
fs/nfs/mount_clnt.c | 2 +
fs/nfsd/Kconfig | 12 +
fs/nfsd/export.c | 73 +++--
fs/nfsd/nfs4callback.c | 245 ++++++++------
fs/nfsd/nfs4idmap.c | 105 +------
fs/nfsd/nfs4proc.c | 7 +-
fs/nfsd/nfs4state.c | 491 ++++++++++++++++++----------
fs/nfsd/nfs4xdr.c | 18 +-
fs/nfsd/nfsctl.c | 26 ++-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 5 +-
fs/nfsd/state.h | 52 ++-
include/linux/net.h | 2 +
include/linux/nfs4.h | 3 +
include/linux/sunrpc/auth.h | 4 +-
include/linux/sunrpc/cache.h | 37 ++-
include/linux/sunrpc/clnt.h | 1 +
include/linux/sunrpc/gss_spkm3.h | 55 ---
include/linux/sunrpc/stats.h | 23 +-
include/linux/sunrpc/svc_xprt.h | 32 ++-
include/linux/sunrpc/svcauth.h | 17 +-
include/linux/sunrpc/xprt.h | 4 +
net/socket.c | 3 +-
net/sunrpc/Kconfig | 19 -
net/sunrpc/auth.c | 2 +-
net/sunrpc/auth_generic.c | 2 +-
net/sunrpc/auth_gss/Makefile | 5 -
net/sunrpc/auth_gss/gss_spkm3_mech.c | 247 --------------
net/sunrpc/auth_gss/gss_spkm3_seal.c | 186 -----------
net/sunrpc/auth_gss/gss_spkm3_token.c | 267 ---------------
net/sunrpc/auth_gss/gss_spkm3_unseal.c | 127 -------
net/sunrpc/auth_gss/svcauth_gss.c | 51 ++--
net/sunrpc/cache.c | 288 ++++++++++++-----
net/sunrpc/clnt.c | 1 +
net/sunrpc/netns.h | 19 +
net/sunrpc/rpcb_clnt.c | 4 +-
net/sunrpc/stats.c | 43 ++-
net/sunrpc/sunrpc_syms.c | 58 +++-
net/sunrpc/svc.c | 3 +
net/sunrpc/svc_xprt.c | 59 +++--
net/sunrpc/svcauth_unix.c | 194 ++++++++---
net/sunrpc/svcsock.c | 27 +-
net/sunrpc/xprt.c | 39 ++-
net/sunrpc/xprtrdma/svc_rdma.c | 11 +-
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 19 +-
net/sunrpc/xprtrdma/svc_rdma_sendto.c | 82 ++++--
net/sunrpc/xprtrdma/svc_rdma_transport.c | 49 ++--
net/sunrpc/xprtrdma/transport.c | 25 +-
net/sunrpc/xprtsock.c | 358 +++++++-------------
61 files changed, 1534 insertions(+), 1937 deletions(-)
delete mode 100644 include/linux/sunrpc/gss_spkm3.h
delete mode 100644 net/sunrpc/auth_gss/gss_spkm3_mech.c
delete mode 100644 net/sunrpc/auth_gss/gss_spkm3_seal.c
delete mode 100644 net/sunrpc/auth_gss/gss_spkm3_token.c
delete mode 100644 net/sunrpc/auth_gss/gss_spkm3_unseal.c
create mode 100644 net/sunrpc/netns.h
On Wed, Oct 27, 2010 at 10:20:20PM +0200, Arnd Bergmann wrote:
> On Wednesday 27 October 2010 22:01:45 J. Bruce Fields wrote:
> > (You're missing Linus's comment fix, though, unless I'm confused.)
>
> Right, I didn't look into the attachment, so I hadn't noticed
> the new changelog. I've updated the changelog to Linus' version,
> as it's more verbose than mine.
He fixed a code comment too, though. No big deal.
> I also checked that the version in your tree is actually the
> right one, so I left the code untouched.
Yup, looks good.
--b.
On Wed, Oct 27, 2010 at 09:48:47PM +0200, Arnd Bergmann wrote:
> On Wednesday 27 October 2010 20:43:59 Linus Torvalds wrote:
> > On Wed, Oct 27, 2010 at 11:42 AM, Linus Torvalds
> > <[email protected]> wrote:
> > >
> > > Feel free to edit the message/patch to your hearts content.
> >
> > Oh, and if you want me to just commit this part, I can do so. It
> > doesn't make much sense without the other parts to actually make it
> > useful, though, so it probably makes more sense to come with them.
>
> Once Bruce is happy with the test results, you can pull it from
>
> git+ssh://master.kernel.org/pub/scm/linux/kernel/git/arnd/bkl.git flock
Not the useful url for us hoi polloi out here. OK, looking at
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl.git flock....
> I've rewritten all the changelogs to make more sense as a series,
> and split out the bits that turn lock_flocks into a spinlock to
> come last, so we get a bisectable series.
>
> The contents are left unchanged.
Looks fine to me!
I haven't retested that, but I can see that the result is identical to
what I tested.
(You're missing Linus's comment fix, though, unless I'm confused.)
--b.
>
> Arnd
> ---
>
> commit b3426739cc8f7c7dd127ca8dad5e25195930cac1
> Author: Arnd Bergmann <[email protected]>
> Date: Wed Oct 27 21:39:58 2010 +0200
>
> locks: turn lock_flocks into a spinlock
>
> Nothing depends on lock_flocks using the BKL
> any more, so we can do the switch over to
> a private spinlock.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
>
> fs/Kconfig | 1 -
> fs/locks.c | 5 +++--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> commit 55d3ff97c5c0b3dce39f705e2b1fe85818891822
> Author: Linus Torvalds <[email protected]>
> Date: Wed Oct 27 12:38:12 2010 -0400
>
> locks: avoid fasync allocation under lock_flocks
>
> This splits fasync_helper into four functions: fasync_alloc,
> fasync_free, fasync_insert_entry and fasync_remove_entry,
> in order to allow the lease handling to call them directly
> instead of going through fasync_helper.
>
> The fasync_helper interface really doesn't make sense outside
> of ->fasync file operations, which use the same calling
> conventions of passing flags in and out as the helper.
>
> After the change, fcntl_setlease can simply allocate the
> new fasync_struct outside of lock_flocks, which is required
> to turn that into a spinlock.
>
> Signed-off-by: Linus Torvalds <[email protected]>
> [[email protected]: rebase on top of my changes to Arnd's patch]
> Signed-off-by: J. Bruce Fields <[email protected]>
> [arnd: rewrite changelog text]
> Signed-off-by: Arnd Bergmann <[email protected]>
>
> fs/fcntl.c | 66 +++++++++++++++++++++++++++++++++++++++------------
> fs/locks.c | 18 +++++++++++++-
> include/linux/fs.h | 5 ++++
> 3 files changed, 72 insertions(+), 17 deletions(-)
>
> commit c5b1f0d92c36851aca09ac6c7c0c4f9690ac14f3
> Author: Arnd Bergmann <[email protected]>
> Date: Wed Oct 27 15:46:08 2010 +0200
>
> locks/nfsd: allocate file lock outside of spinlock
>
> As suggested by Christoph Hellwig, this moves allocation
> of new file locks out of generic_setlease into the
> callers, nfs4_open_delegation and fcntl_setlease in order
> to allow GFP_KERNEL allocations when lock_flocks has
> become a spinlock.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> Acked-by: J. Bruce Fields <[email protected]>
>
> fs/locks.c | 36 ++++++++++++------------------------
> fs/nfsd/nfs4state.c | 26 +++++++++++++++-----------
> include/linux/fs.h | 1 +
> 3 files changed, 28 insertions(+), 35 deletions(-)
>
> commit a282a1fa6b23bd21ba0b86e53ed2a316b001836f
> Author: J. Bruce Fields <[email protected]>
> Date: Tue Oct 26 18:25:30 2010 -0400
>
> lockd: fix nlmsvc_notify_blocked locking
>
> nlmsvc_notify_blocked walks the nlm_blocked list,
> which requires nlm_blocked_lock.
>
> Signed-off-by: J. Bruce Fields <[email protected]>
> Signed-off-by: Arnd Bergmann <[email protected]>
>
> fs/lockd/svclock.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> commit 763641d81202834e9d64de2019d1edec12868f4f
> Author: Arnd Bergmann <[email protected]>
> Date: Tue Oct 26 22:55:40 2010 +0200
>
> lockd: push lock_flocks down
>
> lockd should use lock_flocks() instead of lock_kernel()
> to lock against posix locks accessing the i_flock list.
>
> This is a prerequisite to turning lock_flocks into a
> spinlock.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> Acked-by: J. Bruce Fields <[email protected]>
>
> fs/lockd/svc.c | 11 -----------
> fs/lockd/svcsubs.c | 9 ++++++++-
> fs/nfs/Kconfig | 1 -
> fs/nfsd/Kconfig | 1 -
> 4 files changed, 8 insertions(+), 14 deletions(-)
On Wed, Oct 27, 2010 at 11:42 AM, Linus Torvalds
<[email protected]> wrote:
>
> Feel free to edit the message/patch to your hearts content.
Oh, and if you want me to just commit this part, I can do so. It
doesn't make much sense without the other parts to actually make it
useful, though, so it probably makes more sense to come with them.
Linus
On Tue, Oct 26, 2010 at 12:45:50PM -0400, J. Bruce Fields wrote:
> Please pull the following nfsd updates from the for-2.6.37 branch at:
>
> git://linux-nfs.org/~bfields/linux.git for-2.6.37
By the way, apologies, I see there's a conflict with upstream--looks
obvious (conflicting appends to
Documentation/feature-removal-schedule.txt), but I'm happy to fix it up
and add a merge commit on that branch if it saves you time.
--b.
>
> Neil Brown modifed the way we wait for upcalls to solve a performance
> problem and a long-standing NFSv4 correctness problem.
>
> Pavel Emelyanov started the job of making the NFS client and server
> container-aware. (Note that for the sake of keeping his work in one
> tree, Trond and I agreed to merge it through my tree, though it includes
> client as well as server changes.) Thanks to Chuck Lever for help with
> review.
>
> Thanks also to Tom Tucker for some rdma fixes (and review help).
>
> I'm continuing work to bring the NFSv4.1 server close enough to the spec
> that we could turn it on by default without risking future
> interoperability problems. We're not quite there yet.
>
> A moment's silence for the spkm3 code, please. It's been lying unused
> for years now, so we decided to finally put it out of its misery.
>
> We're also deprecating (not removing quite yet) the old nfsd control
> interfaces.
>
> I have a few last-minute changes still under testing that I may try to
> sneak in later in the week, but given the short merge window I thought
> it would make more sense to send the bulk of the changes now.
>
> --b.
>
> Andy Adamson (1):
> nfsd: remove duplicate NFS4_STATEID_SIZE declaration
>
> Andy Shevchenko (1):
> sunrpc/cache: don't use custom hex_to_bin() converter
>
> Benny Halevy (1):
> nfsd4: adjust buflen for encoded attrs bitmap based on actual bitmap length
>
> Bryan Schumaker (1):
> lockd: Mostly remove BKL from the server
>
> Chuck Lever (2):
> SUNRPC: Use conventional switch statement when reclassifying sockets
> SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
>
> J. Bruce Fields (39):
> svcrpc: minor cache cleanup
> svcrpc: cache deferral cleanup
> Merge remote branch 'trond/bugfixes' into for-2.6.37
> nfsd4: fix hang on fast-booting nfs servers
> nfsd: fix /proc/net/rpc/nfsd.export/content display
> nfsd4: remove spkm3
> nfsd4: minor variable renaming (cb -> conn)
> nfsd4: combine nfs4_rpc_args and nfsd4_cb_sequence
> nfsd4: rename nfs4_rpc_args->nfsd4_cb_args
> nfsd4: generic callback code
> nfsd4: use generic callback code in null case
> nfsd4: remove separate cb_args struct
> nfsd4: Move callback setup to callback queue
> nfsd4: fix alloc_init_session BUILD_BUG_ON()
> nfsd4: fix alloc_init_session return type
> nfsd4: clean up session allocation
> nfsd4: keep per-session list of connections
> nfsd: provide callbacks on svc_xprt deletion
> nfsd4: use callbacks on svc_xprt_deletion
> nfsd4: refactor connection allocation
> nfsd4: add new connections to session
> nfsd4: return expired on unfound stateid's
> nfsd4: expire clients more promptly
> nfsd4: don't cache seq_misordered replies
> nfsd4: move callback setup into session init code
> nfsd4: use client pointer to backchannel session
> nfsd4: make backchannel sequence number per-session
> nfsd4: confirm only on succesful create_session
> nfsd4: track backchannel connections
> nfsd4: callback program number is per-session
> nfsd4: separate callback change and callback probe
> nfsd4: delay session removal till free_client
> nfsd4: move minorversion to client
> nfsd4: only require krb5 principal for NFSv4.0 callbacks
> nfsd4: fix connection allocation in sequence()
> svcrpc: never clear XPT_BUSY on dead xprt
> svcrpc: assume svc_delete_xprt() called only once
> svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
> svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
>
> NeilBrown (14):
> sunrpc: extract some common sunrpc_cache code from nfsd
> sunrpc: use seconds since boot in expiry cache
> sunrpc/cache: allow threads to block while waiting for cache update.
> sunrpc: close connection when a request is irretrievably lost.
> nfsd: disable deferral for NFSv4
> nfsd/idmap: drop special request deferal in favour of improved default.
> svcauth_gss: replace a trivial 'switch' with an 'if'
> sunrpc/cache: change deferred-request hash table to use hlist.
> sunrpc/cache: fix recent breakage of cache_clean_deferred
> nfsd: formally deprecate legacy nfsd syscall interface
> nfsd: allow deprecated interface to be compiled out.
> sunrpc: fix race in new cache_wait code.
> sunrpc: Simplify cache_defer_req and related functions.
> sunrpc/cache: centralise handling of size limit on deferred list.
>
> Pavel Emelyanov (37):
> nfsd: Export get_task_comm for nfsd
> sunrpc: Pass the ip_map_parse's cd to lower calls
> sunrpc: Make xprt auth cache release work with the xprt
> sunrpc: Pass xprt to cached get/put routines
> sunrpc: Add net to pure API calls
> sunrpc: Add routines that allow registering per-net caches
> sunrpc: Tag svc_xprt with net
> sunrpc: The per-net skeleton
> sunrpc: Make the /proc/net/rpc appear in net namespaces
> sunrpc: Make the ip_map_cache be per-net
> sunrpc: Factor out rpc_xprt allocation
> sunrpc: Factor out rpc_xprt freeing
> sunrpc: Add net argument to svc_create_xprt
> sunrpc: Pull net argument downto svc_create_socket
> sunrpc: Add net to rpc_create_args
> sunrpc: Add net to xprt_create
> sunrpc: Tag rpc_xprt with net
> net: Export __sock_create
> sunrpc: Create sockets in net namespaces
> sunrpc: Use helper to set v4 mapped addr in ip_map_parse
> sunrpc: Remove unused sock arg from xs_get_srcport
> sunrpc: Remove unused sock arg from xs_next_srcport
> sunrpc: Get xprt pointer once in xs_tcp_setup_socket
> sunrpc: Remove duplicate xprt/transport arguments from calls
> sunrpc: Factor out udp sockets creation
> sunrpc: Factor out v4 sockets creation
> sunrpc: Factor out v6 sockets creation
> sunrpc: Call xs_create_sockX directly from setup_socket
> sunrpc: Merge the xs_bind code
> sunrpc: Merge xs_create_sock code
> sunrpc: Pass family to setup_socket calls
> sunrpc: Remove TCP worker wrappers
> sunrpc: Remove UDP worker wrappers
> sunrpc: Remove useless if (task == NULL) from xprt_reserve_xprt
> sunrpc: Don't return NULL from rpcb_create
> sunrpc: Remove dead "else" branch from bc xprt creation
> sunrpc: Turn list_for_each-s into the ..._entry-s
>
> Stephen Rothwell (1):
> sunrpc: fix up rpcauth_remove_module section mismatch
>
> Tejun Heo (1):
> sunrpc/xprtrdma: clean up workqueue usage
>
> Tom Tucker (2):
> svcrdma: Change DMA mapping logic to avoid the page_address kernel API
> svcrdma: Cleanup DMA unmapping in error paths.
>
> Documentation/feature-removal-schedule.txt | 10 +
> fs/Makefile | 5 +-
> fs/compat.c | 2 +-
> fs/lockd/host.c | 1 +
> fs/lockd/mon.c | 1 +
> fs/lockd/svc.c | 2 +-
> fs/lockd/svc4proc.c | 2 -
> fs/lockd/svclock.c | 31 ++-
> fs/lockd/svcproc.c | 2 -
> fs/nfs/callback.c | 4 +-
> fs/nfs/client.c | 1 +
> fs/nfs/dns_resolve.c | 6 +-
> fs/nfs/mount_clnt.c | 2 +
> fs/nfsd/Kconfig | 12 +
> fs/nfsd/export.c | 73 +++--
> fs/nfsd/nfs4callback.c | 245 ++++++++------
> fs/nfsd/nfs4idmap.c | 105 +------
> fs/nfsd/nfs4proc.c | 7 +-
> fs/nfsd/nfs4state.c | 491 ++++++++++++++++++----------
> fs/nfsd/nfs4xdr.c | 18 +-
> fs/nfsd/nfsctl.c | 26 ++-
> fs/nfsd/nfsd.h | 2 +-
> fs/nfsd/nfssvc.c | 5 +-
> fs/nfsd/state.h | 52 ++-
> include/linux/net.h | 2 +
> include/linux/nfs4.h | 3 +
> include/linux/sunrpc/auth.h | 4 +-
> include/linux/sunrpc/cache.h | 37 ++-
> include/linux/sunrpc/clnt.h | 1 +
> include/linux/sunrpc/gss_spkm3.h | 55 ---
> include/linux/sunrpc/stats.h | 23 +-
> include/linux/sunrpc/svc_xprt.h | 32 ++-
> include/linux/sunrpc/svcauth.h | 17 +-
> include/linux/sunrpc/xprt.h | 4 +
> net/socket.c | 3 +-
> net/sunrpc/Kconfig | 19 -
> net/sunrpc/auth.c | 2 +-
> net/sunrpc/auth_generic.c | 2 +-
> net/sunrpc/auth_gss/Makefile | 5 -
> net/sunrpc/auth_gss/gss_spkm3_mech.c | 247 --------------
> net/sunrpc/auth_gss/gss_spkm3_seal.c | 186 -----------
> net/sunrpc/auth_gss/gss_spkm3_token.c | 267 ---------------
> net/sunrpc/auth_gss/gss_spkm3_unseal.c | 127 -------
> net/sunrpc/auth_gss/svcauth_gss.c | 51 ++--
> net/sunrpc/cache.c | 288 ++++++++++++-----
> net/sunrpc/clnt.c | 1 +
> net/sunrpc/netns.h | 19 +
> net/sunrpc/rpcb_clnt.c | 4 +-
> net/sunrpc/stats.c | 43 ++-
> net/sunrpc/sunrpc_syms.c | 58 +++-
> net/sunrpc/svc.c | 3 +
> net/sunrpc/svc_xprt.c | 59 +++--
> net/sunrpc/svcauth_unix.c | 194 ++++++++---
> net/sunrpc/svcsock.c | 27 +-
> net/sunrpc/xprt.c | 39 ++-
> net/sunrpc/xprtrdma/svc_rdma.c | 11 +-
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 19 +-
> net/sunrpc/xprtrdma/svc_rdma_sendto.c | 82 ++++--
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 49 ++--
> net/sunrpc/xprtrdma/transport.c | 25 +-
> net/sunrpc/xprtsock.c | 358 +++++++-------------
> 61 files changed, 1534 insertions(+), 1937 deletions(-)
> delete mode 100644 include/linux/sunrpc/gss_spkm3.h
> delete mode 100644 net/sunrpc/auth_gss/gss_spkm3_mech.c
> delete mode 100644 net/sunrpc/auth_gss/gss_spkm3_seal.c
> delete mode 100644 net/sunrpc/auth_gss/gss_spkm3_token.c
> delete mode 100644 net/sunrpc/auth_gss/gss_spkm3_unseal.c
> create mode 100644 net/sunrpc/netns.h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday 27 October 2010 19:40:16 J. Bruce Fields wrote:
> Doesn't matter to me. What I've currently got (Arnd's patches with some
> minor fixes from me, and your patch, all in a poorly changelogged
> jumble) is in
>
> git://linux-nfs.org/~bfields/linux-topics.git TMP-BKL-TESTING
>
> You wouldn't want to merge that as is, but it does seem to work.
Ok, then let's use your tree with proper changelogs. That should be the
easiest way to make sure that the code you test is the one that gets in.
Arnd
Arnd Bergmann wrote:
> > If you don't hold lock_flocks throughout fcntl_setlease, the flp variable
> > points to a flock that may get modified by another thread and you call
> > time_out_leases() without holding lock_flocks, which it requires.
Whoops, thanks for catching that.
On Wed, Oct 27, 2010 at 04:39:24AM -0400, Christoph Hellwig wrote:
> Do locks_alloc_lock and initialization of the heap struct file_lock
> in the caller. This also avoids an entirely useless copy of the
> lock structure. free the passed in structure if we are modifying
> an existing lock structure.
That sounds good; I'll give it a try.
--b.
On Wednesday 27 October 2010 00:11:56 J. Bruce Fields wrote:
> > BUG: sleeping function called from invalid context at mm/slab.c:3101
> > in_atomic(): 1, irqs_disabled(): 0, pid: 4345, name: lease_tests
> > 1 lock held by lease_tests/4345:
> > #0: (file_lock_lock){+.+.+.}, at: [<ffffffff81128be5>] lock_flocks+0x15/0x20
> > Pid: 4345, comm: lease_tests Not tainted 2.6.36-05858-gbd5e20b #1028
> > Call Trace:
> > [<ffffffff8103141d>] __might_sleep+0x10d/0x140
> > [<ffffffff810e3ad3>] kmem_cache_alloc+0x1f3/0x230
> > [<ffffffff8112a4d2>] generic_setlease+0x112/0x2c0
> > [<ffffffff8112a6b5>] __vfs_setlease+0x35/0x40
> > [<ffffffff8112acfe>] fcntl_setlease+0xce/0x180
> > [<ffffffff810f7c2e>] sys_fcntl+0x2fe/0x630
> > [<ffffffff81961999>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [<ffffffff81002658>] system_call_fastpath+0x16/0x1b
> >
> > I'm testing a patch.
>
Thanks for the report!
> @@ -1524,8 +1528,6 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
> if (error)
> return error;
>
> - lock_flocks();
> -
> error = __vfs_setlease(filp, arg, &flp);
> if (error || arg == F_UNLCK)
> goto out_unlock;
> @@ -1541,7 +1543,6 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
>
> error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
> out_unlock:
> - unlock_flocks();
> return error;
> }
If you don't hold lock_flocks throughout fcntl_setlease, the flp variable
points to a flock that may get modified by another thread and you call
time_out_leases() without holding lock_flocks, which it requires.
The two alternatives I can see are to either use GFP_ATOMIC or to
take the lock inside of generic_setlease and drop it outside.
Neither of the two sounds particularly appealing.
Arnd
On Sat, Oct 30, 2010 at 05:31:14PM -0400, J. Bruce Fields wrote:
> We're depending on setlease to free the passed-in lease on failure.
But we would be much better to just free it in the caller. I'ts much
more natural - caller allocates, caller frees, and it's also simpler.
I'll send a patch to do so shortly, together with sorting out the
remaining nfs4d lock_manager_operations abuses. I think we're set with
that for 2.6.37, the setlease split can wait once we've sorted out the
lock freeing issue.
On Sat, Oct 30, 2010 at 05:31:16PM -0400, J. Bruce Fields wrote:
> The NFSv4 server was initializing the dp->dl_flock pointer by the
> somewhat ridiculous method of a locks_copy_lock callback.
>
> Now that setlease uses the passed-in lock instead of doing a copy,
> dl_flock no longer gets set, resulting in the lock leaking on delegation
> release, and later possible hangs (among other problems).
>
> So, initialize dl_flock and get rid of the callback.
>From what I can see this was the only instance of
lock_manager_operations.fl_copy_lock. Please kill it while you're at
it.
Also lock_manager_operations.fl_release_private has only exact one
instance in nfs4d which is part of the same abuse scheme. Please also
get rid of it. I recently noticed this while updating
Documentation/filesystems/Locking for the grand new BKL-less world.
We're depending on setlease to free the passed-in lease on failure.
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/locks.c | 12 +++++++-----
1 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 06c7773..63fbc41 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1371,20 +1371,22 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
struct inode *inode = dentry->d_inode;
int error, rdlease_count = 0, wrlease_count = 0;
+ lease = *flp;
+
+ error = -EACCES;
if ((current_fsuid() != inode->i_uid) && !capable(CAP_LEASE))
- return -EACCES;
+ goto out;
+ error = -EINVAL;
if (!S_ISREG(inode->i_mode))
- return -EINVAL;
+ goto out;
error = security_file_lock(filp, arg);
if (error)
- return error;
+ goto out;
time_out_leases(inode);
BUG_ON(!(*flp)->fl_lmops->fl_break);
- lease = *flp;
-
if (arg != F_UNLCK) {
error = -EAGAIN;
if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
--
1.7.1
This one was only used for a nasty hack in nfsd, which has recently
been removed.
Signed-off-by: Christoph Hellwig <[email protected]>
Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c 2010-10-31 07:10:07.649004084 -0400
+++ linux-2.6/fs/locks.c 2010-10-31 07:34:10.102255587 -0400
@@ -235,11 +235,8 @@ static void locks_copy_private(struct fi
fl->fl_ops->fl_copy_lock(new, fl);
new->fl_ops = fl->fl_ops;
}
- if (fl->fl_lmops) {
- if (fl->fl_lmops->fl_copy_lock)
- fl->fl_lmops->fl_copy_lock(new, fl);
+ if (fl->fl_lmops)
new->fl_lmops = fl->fl_lmops;
- }
}
/*
Index: linux-2.6/Documentation/filesystems/Locking
===================================================================
--- linux-2.6.orig/Documentation/filesystems/Locking 2010-10-31 07:34:13.269012536 -0400
+++ linux-2.6/Documentation/filesystems/Locking 2010-10-31 07:34:20.567292713 -0400
@@ -322,7 +322,6 @@ fl_release_private: yes yes
prototypes:
int (*fl_compare_owner)(struct file_lock *, struct file_lock *);
void (*fl_notify)(struct file_lock *); /* unblock callback */
- void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
void (*fl_release_private)(struct file_lock *);
void (*fl_break)(struct file_lock *); /* break_lease callback */
@@ -330,7 +329,6 @@ locking rules:
BKL may block
fl_compare_owner: yes no
fl_notify: yes no
-fl_copy_lock: yes no
fl_release_private: yes yes
fl_break: yes no
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h 2010-10-31 07:33:13.131262640 -0400
+++ linux-2.6/include/linux/fs.h 2010-10-31 07:33:25.959005833 -0400
@@ -1056,7 +1056,6 @@ struct lock_manager_operations {
int (*fl_compare_owner)(struct file_lock *, struct file_lock *);
void (*fl_notify)(struct file_lock *); /* unblock callback */
int (*fl_grant)(struct file_lock *, struct file_lock *, int);
- void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
void (*fl_release_private)(struct file_lock *);
void (*fl_break)(struct file_lock *);
int (*fl_mylease)(struct file_lock *, struct file_lock *);
> If you don't hold lock_flocks throughout fcntl_setlease, the flp variable
> points to a flock that may get modified by another thread and you call
> time_out_leases() without holding lock_flocks, which it requires.
>
> The two alternatives I can see are to either use GFP_ATOMIC or to
> take the lock inside of generic_setlease and drop it outside.
> Neither of the two sounds particularly appealing.
Do locks_alloc_lock and initialization of the heap struct file_lock
in the caller. This also avoids an entirely useless copy of the
lock structure. free the passed in structure if we are modifying
an existing lock structure.
On Wed, Oct 27, 2010 at 9:46 AM, J. Bruce Fields <[email protected]> wrote:
> On Wed, Oct 27, 2010 at 09:12:06AM -0700, Linus Torvalds wrote:
>>
>> Something like the attached (UNTESTED!) perhaps?
>
> Makes sense to me. ?Testing....
So I found a buglet in the patch: the
NOTE! It is very important that the FASYNC flag always
match the state "is the filp on a fasync list".
comment should be moved to be associated with "fasync_insert_entry()"
rather than "fasync_add_entry()", since it's the insert-entry thing
that does the actual FASYNC flag handling.
But that incorrect comment placement shouldn't affect testing, obviously ;)
Btw, who is going to collect these things assuming it passes testing?
Arnd? You? I'll happily sign off on the fasync patch (with the comment
movement) assuming it tests out ok, but there's all the other patches
too that have been passed around. I really do want to get this into
the merge window, because it would be a big shame if we couldn't
effectively get rid of the BKL now just because of these kinds of
smallish final details, so I'm just checking who wants to step up to
the plate to collect it all together and make sure I have it?
Linus
On Tue, Oct 26, 2010 at 06:11:56PM -0400, J. Bruce Fields wrote:
> On Tue, Oct 26, 2010 at 05:44:41PM -0400, J. Bruce Fields wrote:
> > On Tue, Oct 26, 2010 at 02:37:26PM -0700, Linus Torvalds wrote:
> > > On Tue, Oct 26, 2010 at 2:24 PM, J. Bruce Fields <[email protected]> wrote:
> > > >
> > > > I did a couple connectathon runs just now with no obvious ill effects
> > > > except for some sleep-within-spinlock warnings in the lease code.
> > >
> > > Hmm. Those sleep-within-spinlock warnings are very likely serious
> > > bugs.
> >
> > Yeah, didn't mean to belittle them.
> >
> > > Can you quote the whole warning with stack trace?
> >
> > It's just obvious allocations in setlease:
> >
> > BUG: sleeping function called from invalid context at mm/slab.c:3101
> > in_atomic(): 1, irqs_disabled(): 0, pid: 4345, name: lease_tests
> > 1 lock held by lease_tests/4345:
> > #0: (file_lock_lock){+.+.+.}, at: [<ffffffff81128be5>] lock_flocks+0x15/0x20
> > Pid: 4345, comm: lease_tests Not tainted 2.6.36-05858-gbd5e20b #1028
> > Call Trace:
> > [<ffffffff8103141d>] __might_sleep+0x10d/0x140
> > [<ffffffff810e3ad3>] kmem_cache_alloc+0x1f3/0x230
> > [<ffffffff8112a4d2>] generic_setlease+0x112/0x2c0
> > [<ffffffff8112a6b5>] __vfs_setlease+0x35/0x40
> > [<ffffffff8112acfe>] fcntl_setlease+0xce/0x180
> > [<ffffffff810f7c2e>] sys_fcntl+0x2fe/0x630
> > [<ffffffff81961999>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > [<ffffffff81002658>] system_call_fastpath+0x16/0x1b
> >
> > I'm testing a patch.
>
> This works for me.
>
> I'm not saying it's correct, but it does at least pass my dumb tests
> without complaining.
I can't think of any more missing locking, though I did notice this on a
quick look.
--b.
commit fc42117585672abd3cbf247dd311869233d1606a
Author: J. Bruce Fields <[email protected]>
Date: Tue Oct 26 18:25:30 2010 -0400
fix nlmsvc_notify_blocked locking
Signed-off-by: J. Bruce Fields <[email protected]>
diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 6f1ef00..c462d34 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -700,14 +700,16 @@ nlmsvc_notify_blocked(struct file_lock *fl)
struct nlm_block *block;
dprintk("lockd: VFS unblock notification for block %p\n", fl);
+ spin_lock(&nlm_blocked_lock);
list_for_each_entry(block, &nlm_blocked, b_list) {
if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
- nlmsvc_insert_block(block, 0);
+ nlmsvc_insert_block_locked(block, 0);
+ spin_unlock(&nlm_blocked_lock);
svc_wake_up(block->b_daemon);
return;
}
}
-
+ spin_unlock(&nlm_blocked_lock);
printk(KERN_WARNING "lockd: notification for unknown block!\n");
}
On Wed, Oct 27, 2010 at 10:32:20AM -0700, Linus Torvalds wrote:
> On Wed, Oct 27, 2010 at 9:46 AM, J. Bruce Fields <[email protected]> wrote:
> > On Wed, Oct 27, 2010 at 09:12:06AM -0700, Linus Torvalds wrote:
> >>
> >> Something like the attached (UNTESTED!) perhaps?
> >
> > Makes sense to me. Â Testing....
>
> So I found a buglet in the patch: the
>
> NOTE! It is very important that the FASYNC flag always
> match the state "is the filp on a fasync list".
>
> comment should be moved to be associated with "fasync_insert_entry()"
> rather than "fasync_add_entry()", since it's the insert-entry thing
> that does the actual FASYNC flag handling.
>
> But that incorrect comment placement shouldn't affect testing, obviously ;)
>
> Btw, who is going to collect these things assuming it passes testing?
> Arnd? You? I'll happily sign off on the fasync patch (with the comment
> movement) assuming it tests out ok, but there's all the other patches
> too that have been passed around. I really do want to get this into
> the merge window, because it would be a big shame if we couldn't
> effectively get rid of the BKL now just because of these kinds of
> smallish final details, so I'm just checking who wants to step up to
> the plate to collect it all together and make sure I have it?
Doesn't matter to me. What I've currently got (Arnd's patches with some
minor fixes from me, and your patch, all in a poorly changelogged
jumble) is in
git://linux-nfs.org/~bfields/linux-topics.git TMP-BKL-TESTING
You wouldn't want to merge that as is, but it does seem to work.
--b.
Removing a lock shouldn't require any allocations; a failure due to
ENOMEM leaves the caller with a choice between retrying or giving up and
leaking an unused lease.
Next we should split the other lease calls into add and delete cases.
I wanted to start with just the bugfix.
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/locks.c | 43 ++++++++++++++++++++++++++++++-------------
1 files changed, 30 insertions(+), 13 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 50ec159..06c7773 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1441,7 +1441,8 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
return 0;
out:
- locks_free_lock(lease);
+ if (arg != F_UNLCK)
+ locks_free_lock(lease);
return error;
}
EXPORT_SYMBOL(generic_setlease);
@@ -1493,17 +1494,16 @@ int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
}
EXPORT_SYMBOL_GPL(vfs_setlease);
-/**
- * fcntl_setlease - sets a lease on an open file
- * @fd: open file descriptor
- * @filp: file pointer
- * @arg: type of lease to obtain
- *
- * Call this fcntl to establish a lease on the file.
- * Note that you also need to call %F_SETSIG to
- * receive a signal when the lease is broken.
- */
-int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
+static int do_fcntl_delete_lease(struct file *filp)
+{
+ struct file_lock fl, *flp = &fl;
+
+ lease_init(filp, F_UNLCK, flp);
+
+ return vfs_setlease(filp, F_UNLCK, &flp);
+}
+
+static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
{
struct file_lock *fl;
struct fasync_struct *new;
@@ -1521,7 +1521,7 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
}
lock_flocks();
error = __vfs_setlease(filp, arg, &fl);
- if (error || arg == F_UNLCK)
+ if (error)
goto out_unlock;
/*
@@ -1550,6 +1550,23 @@ out_unlock:
}
/**
+ * fcntl_setlease - sets a lease on an open file
+ * @fd: open file descriptor
+ * @filp: file pointer
+ * @arg: type of lease to obtain
+ *
+ * Call this fcntl to establish a lease on the file.
+ * Note that you also need to call %F_SETSIG to
+ * receive a signal when the lease is broken.
+ */
+int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
+{
+ if (arg == F_UNLCK)
+ return do_fcntl_delete_lease(filp);
+ return do_fcntl_add_lease(fd, filp, arg);
+}
+
+/**
* flock_lock_file_wait - Apply a FLOCK-style lock to a file
* @filp: The file to apply the lock to
* @fl: The lock to be applied
--
1.7.1
On Tue, Oct 26, 2010 at 1:55 PM, Arnd Bergmann <[email protected]> wrote:
>
> Finally, we can build fs/locks.c, nfs and nfsd with CONFIG_BKL
> disabled.
>
> *not tested*
Ok, please make this so, so that we finally can effectively get rid of
the BKL in this release (I don't really care about the random old
drivers that may be "depends on BKL").
But that obviously requires at least some minimal testing by somebody
who is using NFS heavily, including locking. I assume that the
connectathon tests include file locking? And I presume that the nfs
developers run those at least semi-regularly and could do at least
that kind of minimal testing?
Linus
And two more fixups on top of the set sent by Bruce that already are
in Linus' tree.
On Wed, Oct 27, 2010 at 03:46:08PM +0200, Arnd Bergmann wrote:
> On Wednesday 27 October 2010, J. Bruce Fields wrote:
> > On Wed, Oct 27, 2010 at 04:39:24AM -0400, Christoph Hellwig wrote:
> > > Do locks_alloc_lock and initialization of the heap struct file_lock
> > > in the caller. This also avoids an entirely useless copy of the
> > > lock structure. free the passed in structure if we are modifying
> > > an existing lock structure.
> >
> > That sounds good; I'll give it a try.
>
> I've already started, this is what I've come up with. I don't have
> a good way to test it though, and don't know as much as you about this
> code. Feel free to use it as a starting point if you like.
> It does look correct to me, but I'm not very confident in this.
Hm, two problems:
- We introduce the possibility of fcntl(fd, F_SETLEASE, F_UNLCK)
failing with ENOMEM.
- fasync_helper(.,.,1,.) sleeps. Argh.
--b.
>
> Signed-off-by: Arnd Bergmann <[email protected]>
> ---
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 8b2b6ad..c5ec771 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -162,10 +162,11 @@ EXPORT_SYMBOL_GPL(unlock_flocks);
> static struct kmem_cache *filelock_cache __read_mostly;
>
> /* Allocate an empty lock structure. */
> -static struct file_lock *locks_alloc_lock(void)
> +struct file_lock *locks_alloc_lock(void)
> {
> return kmem_cache_alloc(filelock_cache, GFP_KERNEL);
> }
> +EXPORT_SYMBOL_GPL(locks_alloc_lock);
>
> void locks_release_private(struct file_lock *fl)
> {
> @@ -1365,7 +1366,6 @@ int fcntl_getlease(struct file *filp)
> int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
> {
> struct file_lock *fl, **before, **my_before = NULL, *lease;
> - struct file_lock *new_fl = NULL;
> struct dentry *dentry = filp->f_path.dentry;
> struct inode *inode = dentry->d_inode;
> int error, rdlease_count = 0, wrlease_count = 0;
> @@ -1385,11 +1385,6 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
> lease = *flp;
>
> if (arg != F_UNLCK) {
> - error = -ENOMEM;
> - new_fl = locks_alloc_lock();
> - if (new_fl == NULL)
> - goto out;
> -
> error = -EAGAIN;
> if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
> goto out;
> @@ -1434,23 +1429,19 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
> goto out;
> }
>
> - error = 0;
> if (arg == F_UNLCK)
> - goto out;
> + goto out_unlck;
>
> error = -EINVAL;
> if (!leases_enable)
> goto out;
>
> - locks_copy_lock(new_fl, lease);
> - locks_insert_lock(before, new_fl);
> -
> - *flp = new_fl;
> + locks_insert_lock(before, lease);
> +out_unlck:
> return 0;
>
> out:
> - if (new_fl != NULL)
> - locks_free_lock(new_fl);
> + locks_free_lock(lease);
> return error;
> }
> EXPORT_SYMBOL(generic_setlease);
> @@ -1514,26 +1505,31 @@ EXPORT_SYMBOL_GPL(vfs_setlease);
> */
> int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
> {
> - struct file_lock fl, *flp = &fl;
> + struct file_lock *fl;
> struct inode *inode = filp->f_path.dentry->d_inode;
> int error;
>
> - locks_init_lock(&fl);
> - error = lease_init(filp, arg, &fl);
> - if (error)
> + fl = locks_alloc_lock();
> + if (!fl)
> + return -ENOMEM;
> +
> + locks_init_lock(fl);
> + error = lease_init(filp, arg, fl);
> + if (error) {
> + locks_free_lock(fl);
> return error;
> + }
>
> lock_flocks();
> -
> - error = __vfs_setlease(filp, arg, &flp);
> + error = __vfs_setlease(filp, arg, &fl);
> if (error || arg == F_UNLCK)
> goto out_unlock;
>
> - error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
> + error = fasync_helper(fd, filp, 1, &fl->fl_fasync);
> if (error < 0) {
> /* remove lease just inserted by setlease */
> - flp->fl_type = F_UNLCK | F_INPROGRESS;
> - flp->fl_break_time = jiffies - 10;
> + fl->fl_type = F_UNLCK | F_INPROGRESS;
> + fl->fl_break_time = jiffies - 10;
> time_out_leases(inode);
> goto out_unlock;
> }
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 9019e8e..56347e0 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -2614,7 +2614,7 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, struct nfs4_sta
> struct nfs4_delegation *dp;
> struct nfs4_stateowner *sop = stp->st_stateowner;
> int cb_up = atomic_read(&sop->so_client->cl_cb_set);
> - struct file_lock fl, *flp = &fl;
> + struct file_lock *fl;
> int status, flag = 0;
>
> flag = NFS4_OPEN_DELEGATE_NONE;
> @@ -2648,20 +2648,24 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, struct nfs4_sta
> flag = NFS4_OPEN_DELEGATE_NONE;
> goto out;
> }
> - locks_init_lock(&fl);
> - fl.fl_lmops = &nfsd_lease_mng_ops;
> - fl.fl_flags = FL_LEASE;
> - fl.fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
> - fl.fl_end = OFFSET_MAX;
> - fl.fl_owner = (fl_owner_t)dp;
> - fl.fl_file = find_readable_file(stp->st_file);
> - BUG_ON(!fl.fl_file);
> - fl.fl_pid = current->tgid;
> + status = -ENOMEM;
> + fl = locks_alloc_lock();
> + if (!fl)
> + goto out;
> + locks_init_lock(fl);
> + fl->fl_lmops = &nfsd_lease_mng_ops;
> + fl->fl_flags = FL_LEASE;
> + fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
> + fl->fl_end = OFFSET_MAX;
> + fl->fl_owner = (fl_owner_t)dp;
> + fl->fl_file = find_readable_file(stp->st_file);
> + BUG_ON(!fl->fl_file);
> + fl->fl_pid = current->tgid;
>
> /* vfs_setlease checks to see if delegation should be handed out.
> * the lock_manager callbacks fl_mylease and fl_change are used
> */
> - if ((status = vfs_setlease(fl.fl_file, fl.fl_type, &flp))) {
> + if ((status = vfs_setlease(fl->fl_file, fl->fl_type, &fl))) {
> dprintk("NFSD: setlease failed [%d], no delegation\n", status);
> unhash_delegation(dp);
> flag = NFS4_OPEN_DELEGATE_NONE;
The caller allocated it, the caller should free it. The only issue so
far is that we could change the flp pointer even on an error return if
the fl_change callback failed. But we can simply move the flp assignment
after the fl_change invocation, as the callers don't care about the
flp return value if the setlease call failed.
Signed-off-by: Christoph Hellwig <[email protected]>
Index: linux-2.6/fs/cifs/cifsfs.c
===================================================================
--- linux-2.6.orig/fs/cifs/cifsfs.c 2010-10-31 07:10:07.636004223 -0400
+++ linux-2.6/fs/cifs/cifsfs.c 2010-10-31 07:10:10.922004154 -0400
@@ -625,11 +625,8 @@ static int cifs_setlease(struct file *fi
knows that the file won't be changed on the server
by anyone else */
return generic_setlease(file, arg, lease);
- else {
- if (arg != F_UNLCK)
- locks_free_lock(*lease);
+ else
return -EAGAIN;
- }
}
struct file_system_type cifs_fs_type = {
Index: linux-2.6/fs/gfs2/file.c
===================================================================
--- linux-2.6.orig/fs/gfs2/file.c 2010-10-31 07:10:07.643004363 -0400
+++ linux-2.6/fs/gfs2/file.c 2010-10-31 07:10:10.923003665 -0400
@@ -629,8 +629,6 @@ static ssize_t gfs2_file_aio_write(struc
static int gfs2_setlease(struct file *file, long arg, struct file_lock **fl)
{
- if (arg != F_UNLCK)
- locks_free_lock(*fl);
return -EINVAL;
}
Index: linux-2.6/fs/locks.c
===================================================================
--- linux-2.6.orig/fs/locks.c 2010-10-31 07:10:07.649004084 -0400
+++ linux-2.6/fs/locks.c 2010-10-31 07:34:10.102255587 -0400
@@ -1428,8 +1425,9 @@ int generic_setlease(struct file *filp,
goto out;
if (my_before != NULL) {
- *flp = *my_before;
error = lease->fl_lmops->fl_change(my_before, arg);
+ if (!error)
+ *flp = *my_before;
goto out;
}
@@ -1444,8 +1442,6 @@ int generic_setlease(struct file *filp,
return 0;
out:
- if (arg != F_UNLCK)
- locks_free_lock(lease);
return error;
}
EXPORT_SYMBOL(generic_setlease);
@@ -1524,8 +1520,11 @@ static int do_fcntl_add_lease(unsigned i
}
lock_flocks();
error = __vfs_setlease(filp, arg, &fl);
- if (error)
- goto out_unlock;
+ if (error) {
+ unlock_flocks();
+ locks_free_lock(fl);
+ goto out_free_fasync;
+ }
/*
* fasync_insert_entry() returns the old entry if any.
@@ -1541,12 +1540,12 @@ static int do_fcntl_add_lease(unsigned i
fl->fl_type = F_UNLCK | F_INPROGRESS;
fl->fl_break_time = jiffies - 10;
time_out_leases(inode);
- goto out_unlock;
+ } else {
+ error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
}
-
- error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
-out_unlock:
unlock_flocks();
+
+out_free_fasync:
if (new)
fasync_free(new);
return error;
Index: linux-2.6/fs/nfs/file.c
===================================================================
--- linux-2.6.orig/fs/nfs/file.c 2010-10-31 07:10:07.658003804 -0400
+++ linux-2.6/fs/nfs/file.c 2010-10-31 07:10:10.936003734 -0400
@@ -884,7 +884,5 @@ static int nfs_setlease(struct file *fil
dprintk("NFS: setlease(%s/%s, arg=%ld)\n",
file->f_path.dentry->d_parent->d_name.name,
file->f_path.dentry->d_name.name, arg);
- if (arg != F_UNLCK)
- locks_free_lock(*fl);
return -EINVAL;
}
Index: linux-2.6/fs/nfsd/nfs4state.c
===================================================================
--- linux-2.6.orig/fs/nfsd/nfs4state.c 2010-10-31 07:10:07.666004084 -0400
+++ linux-2.6/fs/nfsd/nfs4state.c 2010-10-31 07:32:56.906254608 -0400
@@ -2652,6 +2652,7 @@ nfs4_open_delegation(struct svc_fh *fh,
if ((status = vfs_setlease(fl->fl_file, fl->fl_type, &fl))) {
dprintk("NFSD: setlease failed [%d], no delegation\n", status);
dp->dl_flock = NULL;
+ locks_free_lock(fl);
unhash_delegation(dp);
flag = NFS4_OPEN_DELEGATE_NONE;
goto out;
The NFSv4 server was initializing the dp->dl_flock pointer by the
somewhat ridiculous method of a locks_copy_lock callback.
Now that setlease uses the passed-in lock instead of doing a copy,
dl_flock no longer gets set, resulting in the lock leaking on delegation
release, and later possible hangs (among other problems).
So, initialize dl_flock and get rid of the callback.
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/nfsd/nfs4state.c | 19 ++-----------------
1 files changed, 2 insertions(+), 17 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 56347e0..b7f818b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2310,22 +2310,6 @@ void nfsd_release_deleg_cb(struct file_lock *fl)
}
/*
- * Set the delegation file_lock back pointer.
- *
- * Called from setlease() with lock_kernel() held.
- */
-static
-void nfsd_copy_lock_deleg_cb(struct file_lock *new, struct file_lock *fl)
-{
- struct nfs4_delegation *dp = (struct nfs4_delegation *)new->fl_owner;
-
- dprintk("NFSD: nfsd_copy_lock_deleg_cb: new fl %p dp %p\n", new, dp);
- if (!dp)
- return;
- dp->dl_flock = new;
-}
-
-/*
* Called from setlease() with lock_kernel() held
*/
static
@@ -2355,7 +2339,6 @@ int nfsd_change_deleg_cb(struct file_lock **onlist, int arg)
static const struct lock_manager_operations nfsd_lease_mng_ops = {
.fl_break = nfsd_break_deleg_cb,
.fl_release_private = nfsd_release_deleg_cb,
- .fl_copy_lock = nfsd_copy_lock_deleg_cb,
.fl_mylease = nfsd_same_client_deleg_cb,
.fl_change = nfsd_change_deleg_cb,
};
@@ -2661,12 +2644,14 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, struct nfs4_sta
fl->fl_file = find_readable_file(stp->st_file);
BUG_ON(!fl->fl_file);
fl->fl_pid = current->tgid;
+ dp->dl_flock = fl;
/* vfs_setlease checks to see if delegation should be handed out.
* the lock_manager callbacks fl_mylease and fl_change are used
*/
if ((status = vfs_setlease(fl->fl_file, fl->fl_type, &fl))) {
dprintk("NFSD: setlease failed [%d], no delegation\n", status);
+ dp->dl_flock = NULL;
unhash_delegation(dp);
flag = NFS4_OPEN_DELEGATE_NONE;
goto out;
--
1.7.1
On Wednesday 27 October 2010, Christoph Hellwig wrote:
> On Wed, Oct 27, 2010 at 10:55:39AM -0400, J. Bruce Fields wrote:
> > Hm, two problems:
> > - We introduce the possibility of fcntl(fd, F_SETLEASE, F_UNLCK)
> > failing with ENOMEM.
>
> splitt ->setlease into ->add_least and ->delete_lease. No need to pass
> in a structure for the later. No need to return one either.
That sounds like a good way to get rid of a lot of special cases, too.
> > - fasync_helper(.,.,1,.) sleeps. Argh.
>
> That's not new..
It comes back to the original problem with Bruce's patch though:
fcntl_setlease wants to atomically add a lease or fail. If
fasync_helper fails, we want to remove the lease that was
just added before anyone can see it. At the very least we need
to keep the flock from getting freed in another thread while
we call fasync_helper without the lock.
locks_delete_lock is also called with lock_flocks held and calls
fasync_helper...
Arnd
On Wed, Oct 27, 2010 at 05:23:59PM +0200, Arnd Bergmann wrote:
> It comes back to the original problem with Bruce's patch though:
> fcntl_setlease wants to atomically add a lease or fail. If
> fasync_helper fails, we want to remove the lease that was
> just added before anyone can see it. At the very least we need
> to keep the flock from getting freed in another thread while
> we call fasync_helper without the lock.
a variant of fasync_add_entry that takes a preallocated
fasync_struct would indeed take care of that.
> locks_delete_lock is also called with lock_flocks held and calls
> fasync_helper...
but with on = 0, which calls into fasync_remove_entry, which doesn't
sleep.
On Tuesday 26 October 2010 22:35:50 Bryan Schumaker wrote:
> Hi
>
> I left the one call because I was unable to figure out what
> was being protected with the BKL in that section of the code.
> I figured I would leave it for the maintainers, since they
> know more about the code than I do.
My guess is that we need something like the patch below.
This moves the lock_flocks() call into the places where it
is needed and can be safely converted into a spinlock because
that code does not sleep.
Finally, we can build fs/locks.c, nfs and nfsd with CONFIG_BKL
disabled.
*not tested*
Signed-off-by: Arnd Bergmann <[email protected]>
---
Kconfig | 1 -
lockd/svc.c | 11 -----------
lockd/svcsubs.c | 9 ++++++++-
locks.c | 5 +++--
nfs/Kconfig | 1 -
nfsd/Kconfig | 1 -
6 files changed, 11 insertions(+), 17 deletions(-)
diff --git a/fs/Kconfig b/fs/Kconfig
index 65781de..3d18530 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -50,7 +50,6 @@ endif # BLOCK
config FILE_LOCKING
bool "Enable POSIX file locking API" if EMBEDDED
default y
- select BKL # while lockd still uses it.
help
This option enables standard file locking support, required
for filesystems like NFS and for the flock() system
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index b13aabc..abfff9d 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -22,7 +22,6 @@
#include <linux/in.h>
#include <linux/uio.h>
#include <linux/smp.h>
-#include <linux/smp_lock.h>
#include <linux/mutex.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
@@ -130,15 +129,6 @@ lockd(void *vrqstp)
dprintk("NFS locking service started (ver " LOCKD_VERSION ").\n");
- /*
- * FIXME: it would be nice if lockd didn't spend its entire life
- * running under the BKL. At the very least, it would be good to
- * have someone clarify what it's intended to protect here. I've
- * seen some handwavy posts about posix locking needing to be
- * done under the BKL, but it's far from clear.
- */
- lock_kernel();
-
if (!nlm_timeout)
nlm_timeout = LOCKD_DFLT_TIMEO;
nlmsvc_timeout = nlm_timeout * HZ;
@@ -195,7 +185,6 @@ lockd(void *vrqstp)
if (nlmsvc_ops)
nlmsvc_invalidate_all();
nlm_shutdown_hosts();
- unlock_kernel();
return 0;
}
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index d0ef94c..1ca0679 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -170,6 +170,7 @@ nlm_traverse_locks(struct nlm_host *host, struct nlm_file *file,
again:
file->f_locks = 0;
+ lock_flocks(); /* protects i_flock list */
for (fl = inode->i_flock; fl; fl = fl->fl_next) {
if (fl->fl_lmops != &nlmsvc_lock_operations)
continue;
@@ -181,6 +182,7 @@ again:
if (match(lockhost, host)) {
struct file_lock lock = *fl;
+ unlock_flocks();
lock.fl_type = F_UNLCK;
lock.fl_start = 0;
lock.fl_end = OFFSET_MAX;
@@ -192,6 +194,7 @@ again:
goto again;
}
}
+ unlock_flocks();
return 0;
}
@@ -226,10 +229,14 @@ nlm_file_inuse(struct nlm_file *file)
if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares)
return 1;
+ lock_flocks();
for (fl = inode->i_flock; fl; fl = fl->fl_next) {
- if (fl->fl_lmops == &nlmsvc_lock_operations)
+ if (fl->fl_lmops == &nlmsvc_lock_operations) {
+ unlock_flocks();
return 1;
+ }
}
+ unlock_flocks();
file->f_locks = 0;
return 0;
}
diff --git a/fs/locks.c b/fs/locks.c
index 8b2b6ad..a417901 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -142,6 +142,7 @@ int lease_break_time = 45;
static LIST_HEAD(file_lock_list);
static LIST_HEAD(blocked_list);
+static DEFINE_SPINLOCK(file_lock_lock);
/*
* Protects the two list heads above, plus the inode->i_flock list
@@ -149,13 +150,13 @@ static LIST_HEAD(blocked_list);
*/
void lock_flocks(void)
{
- lock_kernel();
+ spin_lock(&file_lock_lock);
}
EXPORT_SYMBOL_GPL(lock_flocks);
void unlock_flocks(void)
{
- unlock_kernel();
+ spin_unlock(&file_lock_lock);
}
EXPORT_SYMBOL_GPL(unlock_flocks);
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index fd66765..ba30665 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -1,7 +1,6 @@
config NFS_FS
tristate "NFS client support"
depends on INET && FILE_LOCKING
- depends on BKL # fix as soon as lockd is done
select LOCKD
select SUNRPC
select NFS_ACL_SUPPORT if NFS_V3_ACL
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index 31a78fc..18b3e89 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -2,7 +2,6 @@ config NFSD
tristate "NFS server support"
depends on INET
depends on FILE_LOCKING
- depends on BKL # fix as soon as lockd is done
select LOCKD
select SUNRPC
select EXPORTFS
On Wednesday 27 October 2010, J. Bruce Fields wrote:
> On Wed, Oct 27, 2010 at 04:39:24AM -0400, Christoph Hellwig wrote:
> > Do locks_alloc_lock and initialization of the heap struct file_lock
> > in the caller. This also avoids an entirely useless copy of the
> > lock structure. free the passed in structure if we are modifying
> > an existing lock structure.
>
> That sounds good; I'll give it a try.
I've already started, this is what I've come up with. I don't have
a good way to test it though, and don't know as much as you about this
code. Feel free to use it as a starting point if you like.
It does look correct to me, but I'm not very confident in this.
Signed-off-by: Arnd Bergmann <[email protected]>
---
diff --git a/fs/locks.c b/fs/locks.c
index 8b2b6ad..c5ec771 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -162,10 +162,11 @@ EXPORT_SYMBOL_GPL(unlock_flocks);
static struct kmem_cache *filelock_cache __read_mostly;
/* Allocate an empty lock structure. */
-static struct file_lock *locks_alloc_lock(void)
+struct file_lock *locks_alloc_lock(void)
{
return kmem_cache_alloc(filelock_cache, GFP_KERNEL);
}
+EXPORT_SYMBOL_GPL(locks_alloc_lock);
void locks_release_private(struct file_lock *fl)
{
@@ -1365,7 +1366,6 @@ int fcntl_getlease(struct file *filp)
int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
{
struct file_lock *fl, **before, **my_before = NULL, *lease;
- struct file_lock *new_fl = NULL;
struct dentry *dentry = filp->f_path.dentry;
struct inode *inode = dentry->d_inode;
int error, rdlease_count = 0, wrlease_count = 0;
@@ -1385,11 +1385,6 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
lease = *flp;
if (arg != F_UNLCK) {
- error = -ENOMEM;
- new_fl = locks_alloc_lock();
- if (new_fl == NULL)
- goto out;
-
error = -EAGAIN;
if ((arg == F_RDLCK) && (atomic_read(&inode->i_writecount) > 0))
goto out;
@@ -1434,23 +1429,19 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
goto out;
}
- error = 0;
if (arg == F_UNLCK)
- goto out;
+ goto out_unlck;
error = -EINVAL;
if (!leases_enable)
goto out;
- locks_copy_lock(new_fl, lease);
- locks_insert_lock(before, new_fl);
-
- *flp = new_fl;
+ locks_insert_lock(before, lease);
+out_unlck:
return 0;
out:
- if (new_fl != NULL)
- locks_free_lock(new_fl);
+ locks_free_lock(lease);
return error;
}
EXPORT_SYMBOL(generic_setlease);
@@ -1514,26 +1505,31 @@ EXPORT_SYMBOL_GPL(vfs_setlease);
*/
int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
{
- struct file_lock fl, *flp = &fl;
+ struct file_lock *fl;
struct inode *inode = filp->f_path.dentry->d_inode;
int error;
- locks_init_lock(&fl);
- error = lease_init(filp, arg, &fl);
- if (error)
+ fl = locks_alloc_lock();
+ if (!fl)
+ return -ENOMEM;
+
+ locks_init_lock(fl);
+ error = lease_init(filp, arg, fl);
+ if (error) {
+ locks_free_lock(fl);
return error;
+ }
lock_flocks();
-
- error = __vfs_setlease(filp, arg, &flp);
+ error = __vfs_setlease(filp, arg, &fl);
if (error || arg == F_UNLCK)
goto out_unlock;
- error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
+ error = fasync_helper(fd, filp, 1, &fl->fl_fasync);
if (error < 0) {
/* remove lease just inserted by setlease */
- flp->fl_type = F_UNLCK | F_INPROGRESS;
- flp->fl_break_time = jiffies - 10;
+ fl->fl_type = F_UNLCK | F_INPROGRESS;
+ fl->fl_break_time = jiffies - 10;
time_out_leases(inode);
goto out_unlock;
}
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9019e8e..56347e0 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2614,7 +2614,7 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, struct nfs4_sta
struct nfs4_delegation *dp;
struct nfs4_stateowner *sop = stp->st_stateowner;
int cb_up = atomic_read(&sop->so_client->cl_cb_set);
- struct file_lock fl, *flp = &fl;
+ struct file_lock *fl;
int status, flag = 0;
flag = NFS4_OPEN_DELEGATE_NONE;
@@ -2648,20 +2648,24 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open, struct nfs4_sta
flag = NFS4_OPEN_DELEGATE_NONE;
goto out;
}
- locks_init_lock(&fl);
- fl.fl_lmops = &nfsd_lease_mng_ops;
- fl.fl_flags = FL_LEASE;
- fl.fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
- fl.fl_end = OFFSET_MAX;
- fl.fl_owner = (fl_owner_t)dp;
- fl.fl_file = find_readable_file(stp->st_file);
- BUG_ON(!fl.fl_file);
- fl.fl_pid = current->tgid;
+ status = -ENOMEM;
+ fl = locks_alloc_lock();
+ if (!fl)
+ goto out;
+ locks_init_lock(fl);
+ fl->fl_lmops = &nfsd_lease_mng_ops;
+ fl->fl_flags = FL_LEASE;
+ fl->fl_type = flag == NFS4_OPEN_DELEGATE_READ? F_RDLCK: F_WRLCK;
+ fl->fl_end = OFFSET_MAX;
+ fl->fl_owner = (fl_owner_t)dp;
+ fl->fl_file = find_readable_file(stp->st_file);
+ BUG_ON(!fl->fl_file);
+ fl->fl_pid = current->tgid;
/* vfs_setlease checks to see if delegation should be handed out.
* the lock_manager callbacks fl_mylease and fl_change are used
*/
- if ((status = vfs_setlease(fl.fl_file, fl.fl_type, &flp))) {
+ if ((status = vfs_setlease(fl->fl_file, fl->fl_type, &fl))) {
dprintk("NFSD: setlease failed [%d], no delegation\n", status);
unhash_delegation(dp);
flag = NFS4_OPEN_DELEGATE_NONE;
On Tue, Oct 26, 2010 at 10:22 AM, J. Bruce Fields <[email protected]> wrote:
> On Tue, Oct 26, 2010 at 12:45:50PM -0400, J. Bruce Fields wrote:
>> Please pull the following nfsd updates from the for-2.6.37 branch at:
>>
>> ? ? ? git://linux-nfs.org/~bfields/linux.git for-2.6.37
>
> By the way, apologies, I see there's a conflict with upstream--looks
> obvious (conflicting appends to
> Documentation/feature-removal-schedule.txt), but I'm happy to fix it up
> and add a merge commit on that branch if it saves you time.
No, as mentioned elsewhere in other similar threads, I actually prefer
people _not_ to pre-merge. It results in significantly uglier history
(especially since I tend to merge a lot during the merge window, so
your pre-merge will not obviate my later merge, because my tree will
have moved forward), since criss-crossing merges really end up making
the gitk graph rather harder to read.
But to me personally, the "uglier history" is actually the secondary
concern - I much prefer seeing the conflicts rather than have them
hidden from me by a pre-merge. Now, for something like
feature-removal-schedule.txt, the conflicts are usually not
interesting from a development angle, but on the other hand they are
also so trivial that they don't inconvenience me and I'd rather just
do them myself and avoid the history pollution of extra merges. But
then when the conflicts get more interesting and touching actual code,
I end up spending more time at them, but I _still_ want to see them,
because they inform me where different people ended up stepping on
each others code.
And those conflicts are interesting to me as a top-level maintainer,
because it either shows code organizational problems, or it shows
where people really were touching the same code for good reasons. And
regardless of the reason for the conflict, I like being aware of it,
because merges like that are often indicative of "this might cause
issues".
If the merge conflict ends up being so involved that I can't resolve
it, I'll push back an email saying so. It happens, but not very often
I'm happy to say. Our source organization is good enough that we
seldom have really complicated merges.
One thing that some maintainers do is to do a private pre-merge just
to see if there are problems (and because it can give a more accurate
diffstat in the presense of criss-cross merges or duplicate changes)
and then if that merge is complex, expose both an unmerged branch and
a "here, if you have issues, this is how I resolved the merge" branch.
What I usually end up doing in that case is actually to do the merge
myself anyway (because of all the issues above - I just want to know
what is going on), but then also compare my end result with the
maintainer pre-merge result, just to verify.
But even that kind of "both non-merged and pre-merged" pull requests
should be reserved just for cases where there really was a somewhat
involved conflict. For trivial conflicts like a feature removal doc
clash, don't even bother. I do several trivial merges a day, it's
normal and it doesn't really take me any appreciable extra time. Git
command line of the day:
git log --oneline v2.6.36.. --merges --author=Torvalds --grep=onflict
to see at least a partial list of the trivial conflicts I've done in
just this merge window. It's 3-4 per day, it's perfectly normal.
Linus
On Wednesday 27 October 2010 20:43:59 Linus Torvalds wrote:
> On Wed, Oct 27, 2010 at 11:42 AM, Linus Torvalds
> <[email protected]> wrote:
> >
> > Feel free to edit the message/patch to your hearts content.
>
> Oh, and if you want me to just commit this part, I can do so. It
> doesn't make much sense without the other parts to actually make it
> useful, though, so it probably makes more sense to come with them.
Once Bruce is happy with the test results, you can pull it from
git+ssh://master.kernel.org/pub/scm/linux/kernel/git/arnd/bkl.git flock
I've rewritten all the changelogs to make more sense as a series,
and split out the bits that turn lock_flocks into a spinlock to
come last, so we get a bisectable series.
The contents are left unchanged.
Arnd
---
commit b3426739cc8f7c7dd127ca8dad5e25195930cac1
Author: Arnd Bergmann <[email protected]>
Date: Wed Oct 27 21:39:58 2010 +0200
locks: turn lock_flocks into a spinlock
Nothing depends on lock_flocks using the BKL
any more, so we can do the switch over to
a private spinlock.
Signed-off-by: Arnd Bergmann <[email protected]>
fs/Kconfig | 1 -
fs/locks.c | 5 +++--
2 files changed, 3 insertions(+), 3 deletions(-)
commit 55d3ff97c5c0b3dce39f705e2b1fe85818891822
Author: Linus Torvalds <[email protected]>
Date: Wed Oct 27 12:38:12 2010 -0400
locks: avoid fasync allocation under lock_flocks
This splits fasync_helper into four functions: fasync_alloc,
fasync_free, fasync_insert_entry and fasync_remove_entry,
in order to allow the lease handling to call them directly
instead of going through fasync_helper.
The fasync_helper interface really doesn't make sense outside
of ->fasync file operations, which use the same calling
conventions of passing flags in and out as the helper.
After the change, fcntl_setlease can simply allocate the
new fasync_struct outside of lock_flocks, which is required
to turn that into a spinlock.
Signed-off-by: Linus Torvalds <[email protected]>
[[email protected]: rebase on top of my changes to Arnd's patch]
Signed-off-by: J. Bruce Fields <[email protected]>
[arnd: rewrite changelog text]
Signed-off-by: Arnd Bergmann <[email protected]>
fs/fcntl.c | 66 +++++++++++++++++++++++++++++++++++++++------------
fs/locks.c | 18 +++++++++++++-
include/linux/fs.h | 5 ++++
3 files changed, 72 insertions(+), 17 deletions(-)
commit c5b1f0d92c36851aca09ac6c7c0c4f9690ac14f3
Author: Arnd Bergmann <[email protected]>
Date: Wed Oct 27 15:46:08 2010 +0200
locks/nfsd: allocate file lock outside of spinlock
As suggested by Christoph Hellwig, this moves allocation
of new file locks out of generic_setlease into the
callers, nfs4_open_delegation and fcntl_setlease in order
to allow GFP_KERNEL allocations when lock_flocks has
become a spinlock.
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: J. Bruce Fields <[email protected]>
fs/locks.c | 36 ++++++++++++------------------------
fs/nfsd/nfs4state.c | 26 +++++++++++++++-----------
include/linux/fs.h | 1 +
3 files changed, 28 insertions(+), 35 deletions(-)
commit a282a1fa6b23bd21ba0b86e53ed2a316b001836f
Author: J. Bruce Fields <[email protected]>
Date: Tue Oct 26 18:25:30 2010 -0400
lockd: fix nlmsvc_notify_blocked locking
nlmsvc_notify_blocked walks the nlm_blocked list,
which requires nlm_blocked_lock.
Signed-off-by: J. Bruce Fields <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
fs/lockd/svclock.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
commit 763641d81202834e9d64de2019d1edec12868f4f
Author: Arnd Bergmann <[email protected]>
Date: Tue Oct 26 22:55:40 2010 +0200
lockd: push lock_flocks down
lockd should use lock_flocks() instead of lock_kernel()
to lock against posix locks accessing the i_flock list.
This is a prerequisite to turning lock_flocks into a
spinlock.
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: J. Bruce Fields <[email protected]>
fs/lockd/svc.c | 11 -----------
fs/lockd/svcsubs.c | 9 ++++++++-
fs/nfs/Kconfig | 1 -
fs/nfsd/Kconfig | 1 -
4 files changed, 8 insertions(+), 14 deletions(-)
On Wed, Oct 27, 2010 at 8:23 AM, Arnd Bergmann <[email protected]> wrote:
>
> locks_delete_lock is also called with lock_flocks held and calls
> fasync_helper...
We don't really have to use fasync_helper.
In fact, the whole interface is pretty broken for something like file
locking, which isn't actually "fasync()". That whole "on/off as an
argument" is just crazy. It would be _trivial_ to expose a version of
fasync_helper() that takes a pre-allocated fasync_struct for add, and
that has separate helper functions for the add/delete case so that you
don't have the pointless crazy arguments (for "delete" the "fd"
argument is useless, and I do hate "modal" functions that take what
they should do as a flag).
Then fcntl_setlease() would trivially just allocate the dang thing before.
Something like the attached (UNTESTED!) perhaps?
Linus
On Wed, Oct 27, 2010 at 05:23:59PM +0200, Arnd Bergmann wrote:
> On Wednesday 27 October 2010, Christoph Hellwig wrote:
> > On Wed, Oct 27, 2010 at 10:55:39AM -0400, J. Bruce Fields wrote:
> > > Hm, two problems:
> > > - We introduce the possibility of fcntl(fd, F_SETLEASE, F_UNLCK)
> > > failing with ENOMEM.
> >
> > splitt ->setlease into ->add_least and ->delete_lease. No need to pass
> > in a structure for the later. No need to return one either.
>
> That sounds like a good way to get rid of a lot of special cases, too.
>
> > > - fasync_helper(.,.,1,.) sleeps. Argh.
> >
> > That's not new..
>
> It comes back to the original problem with Bruce's patch though:
> fcntl_setlease wants to atomically add a lease or fail. If
> fasync_helper fails, we want to remove the lease that was
> just added before anyone can see it. At the very least we need
> to keep the flock from getting freed in another thread while
> we call fasync_helper without the lock.
>
> locks_delete_lock is also called with lock_flocks held and calls
> fasync_helper...
Yeah, but just the fasync_remove_entry() case.
It would really seem nicer to me if people called
fasync_{add,remove}_entry() instead of having this silly fasync_helper()
and making the reader remember what the third argument means.
--b.
We modified setlease to require the caller to allocate the new lease in
the case of creating a new lease, but forgot to fix up the filesystem
methods.
Cc: Steven Whitehouse <[email protected]>
Cc: Steve French <[email protected]>
Cc: Trond Myklebust <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/cifs/cifsfs.c | 5 ++++-
fs/gfs2/file.c | 2 ++
fs/locks.c | 3 ++-
fs/nfs/file.c | 3 ++-
include/linux/fs.h | 1 +
5 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 75c4eaa..54745b6 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -625,8 +625,11 @@ static int cifs_setlease(struct file *file, long arg, struct file_lock **lease)
knows that the file won't be changed on the server
by anyone else */
return generic_setlease(file, arg, lease);
- else
+ else {
+ if (arg != F_UNLCK)
+ locks_free_lock(*lease);
return -EAGAIN;
+ }
}
struct file_system_type cifs_fs_type = {
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index aa99647..ac943c1 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -629,6 +629,8 @@ static ssize_t gfs2_file_aio_write(struct kiocb *iocb, const struct iovec *iov,
static int gfs2_setlease(struct file *file, long arg, struct file_lock **fl)
{
+ if (arg != F_UNLCK)
+ locks_free_lock(*fl);
return -EINVAL;
}
diff --git a/fs/locks.c b/fs/locks.c
index 63fbc41..5b526a9 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -186,7 +186,7 @@ void locks_release_private(struct file_lock *fl)
EXPORT_SYMBOL_GPL(locks_release_private);
/* Free a lock which is not in use. */
-static void locks_free_lock(struct file_lock *fl)
+void locks_free_lock(struct file_lock *fl)
{
BUG_ON(waitqueue_active(&fl->fl_wait));
BUG_ON(!list_empty(&fl->fl_block));
@@ -195,6 +195,7 @@ static void locks_free_lock(struct file_lock *fl)
locks_release_private(fl);
kmem_cache_free(filelock_cache, fl);
}
+EXPORT_SYMBOL(locks_free_lock);
void locks_init_lock(struct file_lock *fl)
{
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index e756075..1e524fb 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -884,6 +884,7 @@ static int nfs_setlease(struct file *file, long arg, struct file_lock **fl)
dprintk("NFS: setlease(%s/%s, arg=%ld)\n",
file->f_path.dentry->d_parent->d_name.name,
file->f_path.dentry->d_name.name, arg);
-
+ if (arg != F_UNLCK)
+ locks_free_lock(*fl);
return -EINVAL;
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7b7b507..1eb2939 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1129,6 +1129,7 @@ extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
extern int fcntl_getlease(struct file *filp);
/* fs/locks.c */
+void locks_free_lock(struct file_lock *fl);
extern void locks_init_lock(struct file_lock *);
extern struct file_lock * locks_alloc_lock(void);
extern void locks_copy_lock(struct file_lock *, struct file_lock *);
--
1.7.1
On Wed, Oct 27, 2010 at 10:55:39AM -0400, J. Bruce Fields wrote:
> Hm, two problems:
> - We introduce the possibility of fcntl(fd, F_SETLEASE, F_UNLCK)
> failing with ENOMEM.
splitt ->setlease into ->add_least and ->delete_lease. No need to pass
in a structure for the later. No need to return one either.
> - fasync_helper(.,.,1,.) sleeps. Argh.
That's not new..
On Sat, Oct 30, 2010 at 10:04:31PM -0400, Christoph Hellwig wrote:
> On Sat, Oct 30, 2010 at 05:31:16PM -0400, J. Bruce Fields wrote:
> > The NFSv4 server was initializing the dp->dl_flock pointer by the
> > somewhat ridiculous method of a locks_copy_lock callback.
> >
> > Now that setlease uses the passed-in lock instead of doing a copy,
> > dl_flock no longer gets set, resulting in the lock leaking on delegation
> > release, and later possible hangs (among other problems).
> >
> > So, initialize dl_flock and get rid of the callback.
>
> >From what I can see this was the only instance of
> lock_manager_operations.fl_copy_lock. Please kill it while you're at
> it.
>
> Also lock_manager_operations.fl_release_private has only exact one
> instance in nfs4d which is part of the same abuse scheme. Please also
> get rid of it. I recently noticed this while updating
> Documentation/filesystems/Locking for the grand new BKL-less world.
Yeah, I wanted to maximize chances of getting the minimal fixes into
-rc1, but I've got patches for that queued up too. I was planning on
waiting for the next merge window, but I'm happy enough to send them in
now.
--b.
Hi
I left the one call because I was unable to figure out what was being protected with the BKL in that section of the code. I figured I would leave it for the maintainers, since they know more about the code than I do.
Bryan
On 10/26/2010 04:18 PM, Arnd Bergmann wrote:
> On Tuesday 26 October 2010 18:45:50 J. Bruce Fields wrote:
>> Bryan Schumaker (1):
>> lockd: Mostly remove BKL from the server
>
> Could you explain the "mostly" part of this commit?
>
> The commit message only says "This patch removes all
> but one call to lock_kernel() from the server." This one
> call is what keeps us from removing the BKL from fs/locks.c
> because I can't tell if you still suspect that lockd
> needs to lock against posix file locks or if there was
> a different reason for leaving it in.
>
> I can't think of anything else that this might be locking
> against because everything that might interact with lockd
> now does not use the BKL any more and lockd is
> single-threaded by definition.
>
> My guess is that the only thing that really needs to
> lock_flocks() in lockd are the nlm_file_inuse and
> nlm_traverse_locks functions because they traverse
> the inode->i_flock list. All the exported functions
> from fs/lock.c take care of locking in their own way
> (possibly not lease_get_time, as I just discovered,
> but that was never called under the BKL...).
>
> Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 26, 2010 at 02:37:26PM -0700, Linus Torvalds wrote:
> On Tue, Oct 26, 2010 at 2:24 PM, J. Bruce Fields <[email protected]> wrote:
> >
> > I did a couple connectathon runs just now with no obvious ill effects
> > except for some sleep-within-spinlock warnings in the lease code.
>
> Hmm. Those sleep-within-spinlock warnings are very likely serious
> bugs.
Yeah, didn't mean to belittle them.
> Can you quote the whole warning with stack trace?
It's just obvious allocations in setlease:
BUG: sleeping function called from invalid context at mm/slab.c:3101
in_atomic(): 1, irqs_disabled(): 0, pid: 4345, name: lease_tests
1 lock held by lease_tests/4345:
#0: (file_lock_lock){+.+.+.}, at: [<ffffffff81128be5>] lock_flocks+0x15/0x20
Pid: 4345, comm: lease_tests Not tainted 2.6.36-05858-gbd5e20b #1028
Call Trace:
[<ffffffff8103141d>] __might_sleep+0x10d/0x140
[<ffffffff810e3ad3>] kmem_cache_alloc+0x1f3/0x230
[<ffffffff8112a4d2>] generic_setlease+0x112/0x2c0
[<ffffffff8112a6b5>] __vfs_setlease+0x35/0x40
[<ffffffff8112acfe>] fcntl_setlease+0xce/0x180
[<ffffffff810f7c2e>] sys_fcntl+0x2fe/0x630
[<ffffffff81961999>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff81002658>] system_call_fastpath+0x16/0x1b
I'm testing a patch.
--b.
On Tue, Oct 26, 2010 at 02:02:05PM -0700, Linus Torvalds wrote:
> On Tue, Oct 26, 2010 at 1:55 PM, Arnd Bergmann <[email protected]> wrote:
> >
> > Finally, we can build fs/locks.c, nfs and nfsd with CONFIG_BKL
> > disabled.
> >
> > *not tested*
>
> Ok, please make this so, so that we finally can effectively get rid of
> the BKL in this release (I don't really care about the random old
> drivers that may be "depends on BKL").
>
> But that obviously requires at least some minimal testing by somebody
> who is using NFS heavily, including locking. I assume that the
> connectathon tests include file locking? And I presume that the nfs
> developers run those at least semi-regularly and could do at least
> that kind of minimal testing?
I did a couple connectathon runs just now with no obvious ill effects
except for some sleep-within-spinlock warnings in the lease code.
Yeah, connecthon does include lock tests, though they're just basic
correctness tests.
--b.
On Tue, Oct 26, 2010 at 2:24 PM, J. Bruce Fields <[email protected]> wrote:
>
> I did a couple connectathon runs just now with no obvious ill effects
> except for some sleep-within-spinlock warnings in the lease code.
Hmm. Those sleep-within-spinlock warnings are very likely serious
bugs. Can you quote the whole warning with stack trace?
Linus
On Tuesday 26 October 2010 18:45:50 J. Bruce Fields wrote:
> Bryan Schumaker (1):
> lockd: Mostly remove BKL from the server
Could you explain the "mostly" part of this commit?
The commit message only says "This patch removes all
but one call to lock_kernel() from the server." This one
call is what keeps us from removing the BKL from fs/locks.c
because I can't tell if you still suspect that lockd
needs to lock against posix file locks or if there was
a different reason for leaving it in.
I can't think of anything else that this might be locking
against because everything that might interact with lockd
now does not use the BKL any more and lockd is
single-threaded by definition.
My guess is that the only thing that really needs to
lock_flocks() in lockd are the nlm_file_inuse and
nlm_traverse_locks functions because they traverse
the inode->i_flock list. All the exported functions
from fs/lock.c take care of locking in their own way
(possibly not lease_get_time, as I just discovered,
but that was never called under the BKL...).
Arnd
On Tue, Oct 26, 2010 at 05:44:41PM -0400, J. Bruce Fields wrote:
> On Tue, Oct 26, 2010 at 02:37:26PM -0700, Linus Torvalds wrote:
> > On Tue, Oct 26, 2010 at 2:24 PM, J. Bruce Fields <[email protected]> wrote:
> > >
> > > I did a couple connectathon runs just now with no obvious ill effects
> > > except for some sleep-within-spinlock warnings in the lease code.
> >
> > Hmm. Those sleep-within-spinlock warnings are very likely serious
> > bugs.
>
> Yeah, didn't mean to belittle them.
>
> > Can you quote the whole warning with stack trace?
>
> It's just obvious allocations in setlease:
>
> BUG: sleeping function called from invalid context at mm/slab.c:3101
> in_atomic(): 1, irqs_disabled(): 0, pid: 4345, name: lease_tests
> 1 lock held by lease_tests/4345:
> #0: (file_lock_lock){+.+.+.}, at: [<ffffffff81128be5>] lock_flocks+0x15/0x20
> Pid: 4345, comm: lease_tests Not tainted 2.6.36-05858-gbd5e20b #1028
> Call Trace:
> [<ffffffff8103141d>] __might_sleep+0x10d/0x140
> [<ffffffff810e3ad3>] kmem_cache_alloc+0x1f3/0x230
> [<ffffffff8112a4d2>] generic_setlease+0x112/0x2c0
> [<ffffffff8112a6b5>] __vfs_setlease+0x35/0x40
> [<ffffffff8112acfe>] fcntl_setlease+0xce/0x180
> [<ffffffff810f7c2e>] sys_fcntl+0x2fe/0x630
> [<ffffffff81961999>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [<ffffffff81002658>] system_call_fastpath+0x16/0x1b
>
> I'm testing a patch.
This works for me.
I'm not saying it's correct, but it does at least pass my dumb tests
without complaining.
--b.
diff --git a/fs/locks.c b/fs/locks.c
index 02b6e0e..db3afa0 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1379,7 +1379,9 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
if (error)
return error;
+ lock_flocks();
time_out_leases(inode);
+ unlock_flocks();
BUG_ON(!(*flp)->fl_lmops->fl_break);
@@ -1400,6 +1402,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
goto out;
}
+ lock_flocks();
/*
* At this point, we know that if there is an exclusive
* lease on this file, then we hold it on this filp
@@ -1427,28 +1430,31 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
error = -EAGAIN;
if ((arg == F_RDLCK && (wrlease_count > 0)) ||
(arg == F_WRLCK && ((rdlease_count + wrlease_count) > 0)))
- goto out;
+ goto out_unlock;
if (my_before != NULL) {
*flp = *my_before;
error = lease->fl_lmops->fl_change(my_before, arg);
- goto out;
+ goto out_unlock;
}
error = 0;
if (arg == F_UNLCK)
- goto out;
+ goto out_unlock;
error = -EINVAL;
if (!leases_enable)
- goto out;
+ goto out_unlock;
locks_copy_lock(new_fl, lease);
locks_insert_lock(before, new_fl);
*flp = new_fl;
+ unlock_flocks();
return 0;
+out_unlock:
+ unlock_flocks();
out:
if (new_fl != NULL)
locks_free_lock(new_fl);
@@ -1495,9 +1501,7 @@ int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
{
int error;
- lock_flocks();
error = __vfs_setlease(filp, arg, lease);
- unlock_flocks();
return error;
}
@@ -1524,8 +1528,6 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
if (error)
return error;
- lock_flocks();
-
error = __vfs_setlease(filp, arg, &flp);
if (error || arg == F_UNLCK)
goto out_unlock;
@@ -1541,7 +1543,6 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
out_unlock:
- unlock_flocks();
return error;
}
On Wednesday 27 October 2010 22:01:45 J. Bruce Fields wrote:
> (You're missing Linus's comment fix, though, unless I'm confused.)
Right, I didn't look into the attachment, so I hadn't noticed
the new changelog. I've updated the changelog to Linus' version,
as it's more verbose than mine.
I also checked that the version in your tree is actually the
right one, so I left the code untouched.
Arnd
On Wed, Oct 27, 2010 at 11:16:46AM -0400, J. Bruce Fields wrote:
> On Wed, Oct 27, 2010 at 10:59:29AM -0400, Christoph Hellwig wrote:
> > On Wed, Oct 27, 2010 at 10:55:39AM -0400, J. Bruce Fields wrote:
> > > Hm, two problems:
> > > - We introduce the possibility of fcntl(fd, F_SETLEASE, F_UNLCK)
> > > failing with ENOMEM.
> >
> > splitt ->setlease into ->add_least and ->delete_lease. No need to pass
> > in a structure for the later. No need to return one either.
>
> Sounds fine to me.
>
> >
> > > - fasync_helper(.,.,1,.) sleeps. Argh.
> >
> > That's not new..
>
> So we could do
>
> unlock_flocks();
> error = fasync_helper(fd, filp, 1, &fl->fl_fasync);
> lock_flocks();
>
> and say, hey, we didn't introduce any new bug there. But....
>
> I don't know, maybe add a version of fasync_add_entry() that takes a
> preallocated fasync_struct??
Or just convert the lock to a sleeping mutex. Now that we have adaptive
spinning the horrible behaviour that Willy saw years ago might not be
that bad any more. That'll need some benchmarking, though.
On Wed, Oct 27, 2010 at 11:20 AM, Arnd Bergmann <[email protected]> wrote:
>
> Ok, then let's use your tree with proper changelogs. That should be the
> easiest way to make sure that the code you test is the one that gets in.
Half of them are yours, and the one that is mine needs the comment
fixup. Can you organize it all for me to pull?
Here's my patch again, with the comment move and a suggested commit
message (and my sign-off and Bruce's tested-by: the only thing that
changed in the patch was literally some comment placement so the
tested-by should still be perfectly valid).
Feel free to edit the message/patch to your hearts content.
Thanks,
Linus
On Wed, Oct 27, 2010 at 10:59:29AM -0400, Christoph Hellwig wrote:
> On Wed, Oct 27, 2010 at 10:55:39AM -0400, J. Bruce Fields wrote:
> > Hm, two problems:
> > - We introduce the possibility of fcntl(fd, F_SETLEASE, F_UNLCK)
> > failing with ENOMEM.
>
> splitt ->setlease into ->add_least and ->delete_lease. No need to pass
> in a structure for the later. No need to return one either.
Sounds fine to me.
>
> > - fasync_helper(.,.,1,.) sleeps. Argh.
>
> That's not new..
So we could do
unlock_flocks();
error = fasync_helper(fd, filp, 1, &fl->fl_fasync);
lock_flocks();
and say, hey, we didn't introduce any new bug there. But....
I don't know, maybe add a version of fasync_add_entry() that takes a
preallocated fasync_struct??
--b.
I noticed a couple problems with the lease patches. Also:
On Wed, Oct 27, 2010 at 10:59:29AM -0400, Christoph Hellwig wrote:
> On Wed, Oct 27, 2010 at 10:55:39AM -0400, J. Bruce Fields wrote:
> > Hm, two problems:
> > - We introduce the possibility of fcntl(fd, F_SETLEASE, F_UNLCK)
> > failing with ENOMEM.
>
> splitt ->setlease into ->add_least and ->delete_lease. No need to pass
> in a structure for the later. No need to return one either.
We still need to do this part. I just did the minimum required to fix
the problem--I figure splitting out a separate ->delete_lease method
(which I agree would be cleaner) could wait till the next merge window.
Patches follow, tested on top of the merge of Arnd's patches; if they
look OK they can go in, or I can commit them on top of a few pending
nfsd fixes and send another pull request.
It'd be nice if we get this fixed by -rc1, but not the end of the world
if we don't.
--b.
On Sat, Oct 30, 2010 at 05:25:00PM -0400, J. Bruce Fields wrote:
> We still need to do this part. I just did the minimum required to fix
> the problem--I figure splitting out a separate ->delete_lease method
> (which I agree would be cleaner) could wait till the next merge window.
I'd say please do it now. The pre-allocated file_lock changes make
the semantics inside ->setlease really nasty right now. I think it's a
dumb idea to leave it that way even for a single merge window.
On Saturday 30 October 2010, J. Bruce Fields wrote:
> We still need to do this part. I just did the minimum required to fix
> the problem--I figure splitting out a separate ->delete_lease method
> (which I agree would be cleaner) could wait till the next merge window.
>
> Patches follow, tested on top of the merge of Arnd's patches; if they
> look OK they can go in, or I can commit them on top of a few pending
> nfsd fixes and send another pull request.
All four patches look good AFAICT.
Thanks for the follow-up!
Arnd
On Sat, Oct 30, 2010 at 10:07:08PM -0400, Christoph Hellwig wrote:
> On Sat, Oct 30, 2010 at 05:25:00PM -0400, J. Bruce Fields wrote:
> > We still need to do this part. I just did the minimum required to fix
> > the problem--I figure splitting out a separate ->delete_lease method
> > (which I agree would be cleaner) could wait till the next merge window.
>
> I'd say please do it now. The pre-allocated file_lock changes make
> the semantics inside ->setlease really nasty right now. I think it's a
> dumb idea to leave it that way even for a single merge window.
OK, will do.
--b.
On Tue, Oct 26, 2010 at 10:39:41AM -0700, Linus Torvalds wrote:
> But even that kind of "both non-merged and pre-merged" pull requests
> should be reserved just for cases where there really was a somewhat
> involved conflict. For trivial conflicts like a feature removal doc
> clash, don't even bother. I do several trivial merges a day, it's
> normal and it doesn't really take me any appreciable extra time. Git
> command line of the day:
>
> git log --oneline v2.6.36.. --merges --author=Torvalds --grep=onflict
>
> to see at least a partial list of the trivial conflicts I've done in
> just this merge window. It's 3-4 per day, it's perfectly normal.
Makes sense, thanks!
--b.
On Wed, Oct 27, 2010 at 09:12:06AM -0700, Linus Torvalds wrote:
> On Wed, Oct 27, 2010 at 8:23 AM, Arnd Bergmann <[email protected]> wrote:
> >
> > locks_delete_lock is also called with lock_flocks held and calls
> > fasync_helper...
>
> We don't really have to use fasync_helper.
>
> In fact, the whole interface is pretty broken for something like file
> locking, which isn't actually "fasync()". That whole "on/off as an
> argument" is just crazy. It would be _trivial_ to expose a version of
> fasync_helper() that takes a pre-allocated fasync_struct for add, and
> that has separate helper functions for the add/delete case so that you
> don't have the pointless crazy arguments (for "delete" the "fd"
> argument is useless, and I do hate "modal" functions that take what
> they should do as a flag).
>
> Then fcntl_setlease() would trivially just allocate the dang thing before.
>
> Something like the attached (UNTESTED!) perhaps?
Makes sense to me. Testing....
--b.
>
> Linus
> fs/fcntl.c | 66 +++++++++++++++++++++++++++++++++++++++------------
> fs/locks.c | 17 ++++++++++++-
> include/linux/fs.h | 5 ++++
> 3 files changed, 71 insertions(+), 17 deletions(-)
>
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index f8cc34f..dcdbc6f 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -640,7 +640,7 @@ static void fasync_free_rcu(struct rcu_head *head)
> * match the state "is the filp on a fasync list".
> *
> */
> -static int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp)
> +int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp)
> {
> struct fasync_struct *fa, **fp;
> int result = 0;
> @@ -666,21 +666,28 @@ static int fasync_remove_entry(struct file *filp, struct fasync_struct **fapp)
> return result;
> }
>
> +struct fasync_struct *fasync_alloc(void)
> +{
> + return kmem_cache_alloc(fasync_cache, GFP_KERNEL);
> +}
> +
> /*
> - * Add a fasync entry. Return negative on error, positive if
> - * added, and zero if did nothing but change an existing one.
> - *
> - * NOTE! It is very important that the FASYNC flag always
> - * match the state "is the filp on a fasync list".
> + * NOTE! This can be used only for unused fasync entries:
> + * entries that actually got inserted on the fasync list
> + * need to be released by rcu - see fasync_remove_entry.
> */
> -static int fasync_add_entry(int fd, struct file *filp, struct fasync_struct **fapp)
> +void fasync_free(struct fasync_struct *new)
> {
> - struct fasync_struct *new, *fa, **fp;
> - int result = 0;
> + kmem_cache_free(fasync_cache, new);
> +}
>
> - new = kmem_cache_alloc(fasync_cache, GFP_KERNEL);
> - if (!new)
> - return -ENOMEM;
> +/*
> + * Insert a new entry into the fasync list. Return the pointer to the
> + * old one if we didn't use the new one.
> + */
> +struct fasync_struct *fasync_insert_entry(int fd, struct file *filp, struct fasync_struct **fapp, struct fasync_struct *new)
> +{
> + struct fasync_struct *fa, **fp;
>
> spin_lock(&filp->f_lock);
> spin_lock(&fasync_lock);
> @@ -691,8 +698,6 @@ static int fasync_add_entry(int fd, struct file *filp, struct fasync_struct **fa
> spin_lock_irq(&fa->fa_lock);
> fa->fa_fd = fd;
> spin_unlock_irq(&fa->fa_lock);
> -
> - kmem_cache_free(fasync_cache, new);
> goto out;
> }
>
> @@ -702,13 +707,42 @@ static int fasync_add_entry(int fd, struct file *filp, struct fasync_struct **fa
> new->fa_fd = fd;
> new->fa_next = *fapp;
> rcu_assign_pointer(*fapp, new);
> - result = 1;
> filp->f_flags |= FASYNC;
>
> out:
> spin_unlock(&fasync_lock);
> spin_unlock(&filp->f_lock);
> - return result;
> + return fa;
> +}
> +
> +/*
> + * Add a fasync entry. Return negative on error, positive if
> + * added, and zero if did nothing but change an existing one.
> + *
> + * NOTE! It is very important that the FASYNC flag always
> + * match the state "is the filp on a fasync list".
> + */
> +static int fasync_add_entry(int fd, struct file *filp, struct fasync_struct **fapp)
> +{
> + struct fasync_struct *new;
> +
> + new = fasync_alloc();
> + if (!new)
> + return -ENOMEM;
> +
> + /*
> + * fasync_insert_entry() returns the old (update) entry if
> + * it existed.
> + *
> + * So free the (unused) new entry and return 0 to let the
> + * caller know that we didn't add any new fasync entries.
> + */
> + if (fasync_insert_entry(fd, filp, fapp, new)) {
> + fasync_free(new);
> + return 0;
> + }
> +
> + return 1;
> }
>
> /*
> diff --git a/fs/locks.c b/fs/locks.c
> index 4de3a26..9ff3f66 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1515,6 +1515,7 @@ EXPORT_SYMBOL_GPL(vfs_setlease);
> int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
> {
> struct file_lock fl, *flp = &fl;
> + struct fasync_struct *new;
> struct inode *inode = filp->f_path.dentry->d_inode;
> int error;
>
> @@ -1523,13 +1524,25 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
> if (error)
> return error;
>
> + new = fasync_alloc();
> + if (!new)
> + return -ENOMEM;
> +
> lock_flocks();
>
> error = __vfs_setlease(filp, arg, &flp);
> if (error || arg == F_UNLCK)
> goto out_unlock;
>
> - error = fasync_helper(fd, filp, 1, &flp->fl_fasync);
> + /*
> + * fasync_insert_entry() returns the old entry if any.
> + * If there was no old entry, then it used 'new' and
> + * inserted it into the fasync list. Clear new so that
> + * we don't release it here.
> + */
> + if (!fasync_insert_entry(fd, filp, &flp->fl_fasync, new))
> + new = NULL;
> +
> if (error < 0) {
> /* remove lease just inserted by setlease */
> flp->fl_type = F_UNLCK | F_INPROGRESS;
> @@ -1541,6 +1554,8 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
> error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
> out_unlock:
> unlock_flocks();
> + if (new)
> + fasync_free(new);
> return error;
> }
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 240eb1d..d487772 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1310,6 +1310,11 @@ struct fasync_struct {
>
> /* SMP safe fasync helpers: */
> extern int fasync_helper(int, struct file *, int, struct fasync_struct **);
> +extern struct fasync_struct *fasync_insert_entry(int, struct file *, struct fasync_struct **, struct fasync_struct *);
> +extern int fasync_remove_entry(struct file *, struct fasync_struct **);
> +extern struct fasync_struct *fasync_alloc(void);
> +extern void fasync_free(struct fasync_struct *);
> +
> /* can be called from interrupts */
> extern void kill_fasync(struct fasync_struct **, int, int);
>
Btw, the nfsd delegation code still has a few comments referring to the
BKL. This should probably be cleaned up:
fs/nfsd/nfs4state.c: * Called from break_lease() with lock_kernel() held.
fs/nfsd/nfs4state.c: /* only place dl_time is set. protected by lock_kernel*/
fs/nfsd/nfs4state.c: * Called by locks_free_lock() with lock_kernel() held.
fs/nfsd/nfs4state.c: * Called from setlease() with lock_kernel() held
On Sun, Oct 31, 2010 at 08:35:10AM -0400, Christoph Hellwig wrote:
> The caller allocated it, the caller should free it. The only issue so
> far is that we could change the flp pointer even on an error return if
> the fl_change callback failed. But we can simply move the flp assignment
> after the fl_change invocation, as the callers don't care about the
> flp return value if the setlease call failed.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
>
> Index: linux-2.6/fs/cifs/cifsfs.c
> ===================================================================
> --- linux-2.6.orig/fs/cifs/cifsfs.c 2010-10-31 07:10:07.636004223 -0400
> +++ linux-2.6/fs/cifs/cifsfs.c 2010-10-31 07:10:10.922004154 -0400
> @@ -625,11 +625,8 @@ static int cifs_setlease(struct file *fi
> knows that the file won't be changed on the server
> by anyone else */
> return generic_setlease(file, arg, lease);
> - else {
> - if (arg != F_UNLCK)
> - locks_free_lock(*lease);
> + else
> return -EAGAIN;
> - }
> }
>
> struct file_system_type cifs_fs_type = {
> Index: linux-2.6/fs/gfs2/file.c
> ===================================================================
> --- linux-2.6.orig/fs/gfs2/file.c 2010-10-31 07:10:07.643004363 -0400
> +++ linux-2.6/fs/gfs2/file.c 2010-10-31 07:10:10.923003665 -0400
> @@ -629,8 +629,6 @@ static ssize_t gfs2_file_aio_write(struc
>
> static int gfs2_setlease(struct file *file, long arg, struct file_lock **fl)
> {
> - if (arg != F_UNLCK)
> - locks_free_lock(*fl);
> return -EINVAL;
> }
>
> Index: linux-2.6/fs/locks.c
> ===================================================================
> --- linux-2.6.orig/fs/locks.c 2010-10-31 07:10:07.649004084 -0400
> +++ linux-2.6/fs/locks.c 2010-10-31 07:34:10.102255587 -0400
> @@ -1428,8 +1425,9 @@ int generic_setlease(struct file *filp,
> goto out;
>
> if (my_before != NULL) {
> - *flp = *my_before;
> error = lease->fl_lmops->fl_change(my_before, arg);
> + if (!error)
> + *flp = *my_before;
Argh, missed this: we're leaking the passed-in lease in this case.
--b.
> goto out;
> }
>
> @@ -1444,8 +1442,6 @@ int generic_setlease(struct file *filp,
> return 0;
>
> out:
> - if (arg != F_UNLCK)
> - locks_free_lock(lease);
> return error;
> }
> EXPORT_SYMBOL(generic_setlease);
> @@ -1524,8 +1520,11 @@ static int do_fcntl_add_lease(unsigned i
> }
> lock_flocks();
> error = __vfs_setlease(filp, arg, &fl);
> - if (error)
> - goto out_unlock;
> + if (error) {
> + unlock_flocks();
> + locks_free_lock(fl);
> + goto out_free_fasync;
> + }
>
> /*
> * fasync_insert_entry() returns the old entry if any.
> @@ -1541,12 +1540,12 @@ static int do_fcntl_add_lease(unsigned i
> fl->fl_type = F_UNLCK | F_INPROGRESS;
> fl->fl_break_time = jiffies - 10;
> time_out_leases(inode);
> - goto out_unlock;
> + } else {
> + error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
> }
> -
> - error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
> -out_unlock:
> unlock_flocks();
> +
> +out_free_fasync:
> if (new)
> fasync_free(new);
> return error;
> Index: linux-2.6/fs/nfs/file.c
> ===================================================================
> --- linux-2.6.orig/fs/nfs/file.c 2010-10-31 07:10:07.658003804 -0400
> +++ linux-2.6/fs/nfs/file.c 2010-10-31 07:10:10.936003734 -0400
> @@ -884,7 +884,5 @@ static int nfs_setlease(struct file *fil
> dprintk("NFS: setlease(%s/%s, arg=%ld)\n",
> file->f_path.dentry->d_parent->d_name.name,
> file->f_path.dentry->d_name.name, arg);
> - if (arg != F_UNLCK)
> - locks_free_lock(*fl);
> return -EINVAL;
> }
> Index: linux-2.6/fs/nfsd/nfs4state.c
> ===================================================================
> --- linux-2.6.orig/fs/nfsd/nfs4state.c 2010-10-31 07:10:07.666004084 -0400
> +++ linux-2.6/fs/nfsd/nfs4state.c 2010-10-31 07:32:56.906254608 -0400
> @@ -2652,6 +2652,7 @@ nfs4_open_delegation(struct svc_fh *fh,
> if ((status = vfs_setlease(fl->fl_file, fl->fl_type, &fl))) {
> dprintk("NFSD: setlease failed [%d], no delegation\n", status);
> dp->dl_flock = NULL;
> + locks_free_lock(fl);
> unhash_delegation(dp);
> flag = NFS4_OPEN_DELEGATE_NONE;
> goto out;
On Sat, Nov 06, 2010 at 03:03:32PM -0400, Christoph Hellwig wrote:
> On Wed, Nov 03, 2010 at 09:40:24PM -0400, J. Bruce Fields wrote:
> > The irritating thing is that the only lease user I understand is the
> > nfsd code, and it doesn't want this lease-merging behavior; the only
> > reason that fl_change is there is so it can just turn this case into an
> > error every time.
>
> Yes.
>
> > And I have no idea what the requirements are of any other users: do
> > leases behave like this on purpose, or was it just an arbitrary choice,
> > and does anyone depend on it now?
>
> Adding Willy and Stephen to the Cc list as they wrote the code.
>
> > In the end maybe it would be better just to leave leases as they are and
> > define a new lock type for nfsd.
> >
> > We'd probably have to do that eventually anyway, and it'd save me trying
> > to guess what the lease semantics are supposed to be....
>
> I'd rather see both leases and the nfs4 delegations detangled from the
> locks.c code.
What are you thinking of?
> It's far too much of a mess already anyway.
>
> > Subject: [PATCH 1/2] locks: fix leak on merging leases
> >
> > We must also free the passed-in lease in the case it wasn't used because
> > an existing lease was upgrade/downgraded or already existed.
> >
> > Note the nfsd caller doesn't care because it's fl_change callback
> > returns an error in those cases.
>
> The patch looks good to me. Care to feed it to Linus?
Yep, will do.
--b.
On Wed, Nov 03, 2010 at 04:41:48PM -0400, J. Bruce Fields wrote:
> On Sun, Oct 31, 2010 at 08:35:10AM -0400, Christoph Hellwig wrote:
> > Index: linux-2.6/fs/locks.c
> > ===================================================================
> > --- linux-2.6.orig/fs/locks.c 2010-10-31 07:10:07.649004084 -0400
> > +++ linux-2.6/fs/locks.c 2010-10-31 07:34:10.102255587 -0400
> > @@ -1428,8 +1425,9 @@ int generic_setlease(struct file *filp,
> > goto out;
> >
> > if (my_before != NULL) {
> > - *flp = *my_before;
> > error = lease->fl_lmops->fl_change(my_before, arg);
> > + if (!error)
> > + *flp = *my_before;
>
> Argh, missed this: we're leaking the passed-in lease in this case.
We could do something like this.
The irritating thing is that the only lease user I understand is the
nfsd code, and it doesn't want this lease-merging behavior; the only
reason that fl_change is there is so it can just turn this case into an
error every time.
And I have no idea what the requirements are of any other users: do
leases behave like this on purpose, or was it just an arbitrary choice,
and does anyone depend on it now?
In the end maybe it would be better just to leave leases as they are and
define a new lock type for nfsd.
We'd probably have to do that eventually anyway, and it'd save me trying
to guess what the lease semantics are supposed to be....
--b.
>From f5c6d51ae638af213d8bda31504f4b2287b8a801 Mon Sep 17 00:00:00 2001
From: J. Bruce Fields <[email protected]>
Date: Wed, 3 Nov 2010 16:49:44 -0400
Subject: [PATCH 1/2] locks: fix leak on merging leases
We must also free the passed-in lease in the case it wasn't used because
an existing lease was upgrade/downgraded or already existed.
Note the nfsd caller doesn't care because it's fl_change callback
returns an error in those cases.
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/locks.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 65765cb..61c22f7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1504,7 +1504,7 @@ static int do_fcntl_delete_lease(struct file *filp)
static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
{
- struct file_lock *fl;
+ struct file_lock *fl, *ret;
struct fasync_struct *new;
struct inode *inode = filp->f_path.dentry->d_inode;
int error;
@@ -1518,6 +1518,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
locks_free_lock(fl);
return -ENOMEM;
}
+ ret = fl;
lock_flocks();
error = __vfs_setlease(filp, arg, &fl);
if (error) {
@@ -1525,6 +1526,8 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
locks_free_lock(fl);
goto out_free_fasync;
}
+ if (ret != fl)
+ locks_free_lock(fl);
/*
* fasync_insert_entry() returns the old entry if any.
@@ -1532,7 +1535,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
* inserted it into the fasync list. Clear new so that
* we don't release it here.
*/
- if (!fasync_insert_entry(fd, filp, &fl->fl_fasync, new))
+ if (!fasync_insert_entry(fd, filp, &ret->fl_fasync, new))
new = NULL;
if (error < 0) {
--
1.7.1
On Wed, Nov 03, 2010 at 09:40:24PM -0400, J. Bruce Fields wrote:
> On Wed, Nov 03, 2010 at 04:41:48PM -0400, J. Bruce Fields wrote:
> > On Sun, Oct 31, 2010 at 08:35:10AM -0400, Christoph Hellwig wrote:
> > > Index: linux-2.6/fs/locks.c
> > > ===================================================================
> > > --- linux-2.6.orig/fs/locks.c 2010-10-31 07:10:07.649004084 -0400
> > > +++ linux-2.6/fs/locks.c 2010-10-31 07:34:10.102255587 -0400
> > > @@ -1428,8 +1425,9 @@ int generic_setlease(struct file *filp,
> > > goto out;
> > >
> > > if (my_before != NULL) {
> > > - *flp = *my_before;
> > > error = lease->fl_lmops->fl_change(my_before, arg);
> > > + if (!error)
> > > + *flp = *my_before;
> >
> > Argh, missed this: we're leaking the passed-in lease in this case.
>
> We could do something like this.
And while I was here I noticed:
--b.
>From 072be1f975c6f839661cfc4f6c45ccb944417eea Mon Sep 17 00:00:00 2001
From: J. Bruce Fields <[email protected]>
Date: Wed, 3 Nov 2010 18:09:18 -0400
Subject: [PATCH 2/2] locks: remove dead lease error-handling code
A minor oversight from f7347ce4ee7c65415f84be915c018473e7076f31,
"fasync: re-organize fasync entry insertion to allow it under a
spinlock": this cleanup-on-error was only needed to handle -ENOMEM. Now
that we're preallocating it's unneeded.
Signed-off-by: J. Bruce Fields <[email protected]>
---
fs/locks.c | 12 ++----------
1 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 61c22f7..0e62dd3 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1506,7 +1506,6 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
{
struct file_lock *fl, *ret;
struct fasync_struct *new;
- struct inode *inode = filp->f_path.dentry->d_inode;
int error;
fl = lease_alloc(filp, arg);
@@ -1520,7 +1519,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
}
ret = fl;
lock_flocks();
- error = __vfs_setlease(filp, arg, &fl);
+ error = __vfs_setlease(filp, arg, &ret);
if (error) {
unlock_flocks();
locks_free_lock(fl);
@@ -1538,14 +1537,7 @@ static int do_fcntl_add_lease(unsigned int fd, struct file *filp, long arg)
if (!fasync_insert_entry(fd, filp, &ret->fl_fasync, new))
new = NULL;
- if (error < 0) {
- /* remove lease just inserted by setlease */
- fl->fl_type = F_UNLCK | F_INPROGRESS;
- fl->fl_break_time = jiffies - 10;
- time_out_leases(inode);
- } else {
- error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
- }
+ error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
unlock_flocks();
out_free_fasync:
--
1.7.1
On Mon, Nov 01, 2010 at 01:24:40PM -0400, J. Bruce Fields wrote:
> I also have patches that get rid of fl_release_private, fl_mylease, and
> (almost done) fl_change.
>
> Unless you've a better suggestion I'll probably send them out for review
> and then queue them up with other nfsd changes for 2.6.38.
Sounds good. I was also wondering if we can get rid of ->setlease
entirely. The file_lock_operations should be enough abstraction to
reject the leases in theory, but I need to look into it a bit more.
On Sun, Oct 31, 2010 at 07:10:57AM -0400, Christoph Hellwig wrote:
> On Sat, Oct 30, 2010 at 05:31:14PM -0400, J. Bruce Fields wrote:
> > We're depending on setlease to free the passed-in lease on failure.
>
> But we would be much better to just free it in the caller. I'ts much
> more natural - caller allocates, caller frees, and it's also simpler.
>
> I'll send a patch to do so shortly, together with sorting out the
> remaining nfs4d lock_manager_operations abuses. I think we're set with
> that for 2.6.37, the setlease split can wait once we've sorted out the
> lock freeing issue.
I also have patches that get rid of fl_release_private, fl_mylease, and
(almost done) fl_change.
Unless you've a better suggestion I'll probably send them out for review
and then queue them up with other nfsd changes for 2.6.38.
--b.
On Wed, Nov 03, 2010 at 09:40:24PM -0400, J. Bruce Fields wrote:
> The irritating thing is that the only lease user I understand is the
> nfsd code, and it doesn't want this lease-merging behavior; the only
> reason that fl_change is there is so it can just turn this case into an
> error every time.
Yes.
> And I have no idea what the requirements are of any other users: do
> leases behave like this on purpose, or was it just an arbitrary choice,
> and does anyone depend on it now?
Adding Willy and Stephen to the Cc list as they wrote the code.
> In the end maybe it would be better just to leave leases as they are and
> define a new lock type for nfsd.
>
> We'd probably have to do that eventually anyway, and it'd save me trying
> to guess what the lease semantics are supposed to be....
I'd rather see both leases and the nfs4 delegations detangled from the
locks.c code. It's far too much of a mess already anyway.
> Subject: [PATCH 1/2] locks: fix leak on merging leases
>
> We must also free the passed-in lease in the case it wasn't used because
> an existing lease was upgrade/downgraded or already existed.
>
> Note the nfsd caller doesn't care because it's fl_change callback
> returns an error in those cases.
The patch looks good to me. Care to feed it to Linus?
On Mon, Nov 01, 2010 at 01:41:22PM -0400, Christoph Hellwig wrote:
> On Mon, Nov 01, 2010 at 01:24:40PM -0400, J. Bruce Fields wrote:
> > I also have patches that get rid of fl_release_private, fl_mylease, and
> > (almost done) fl_change.
> >
> > Unless you've a better suggestion I'll probably send them out for review
> > and then queue them up with other nfsd changes for 2.6.38.
>
> Sounds good. I was also wondering if we can get rid of ->setlease
> entirely. The file_lock_operations should be enough abstraction to
> reject the leases in theory, but I need to look into it a bit more.
Note that cifs has a real lease operation now. No idea how it works or
if it's really correct.
--b.
> A minor oversight from f7347ce4ee7c65415f84be915c018473e7076f31,
> "fasync: re-organize fasync entry insertion to allow it under a
> spinlock": this cleanup-on-error was only needed to handle -ENOMEM. Now
> that we're preallocating it's unneeded.
Looks good to me, too.
On Sun, Oct 31, 2010 at 08:34:20AM -0400, Christoph Hellwig wrote:
> And two more fixups on top of the set sent by Bruce that already are
> in Linus' tree.
Both look good to me, thanks!
--b.