Subject: Re: [PATCH 07/12] RFC: pnfs: full mount/umount infrastructure
From: Fred Isaman
To: Benny Halevy
Cc: linux-nfs@vger.kernel.org
Date: Mon, 20 Sep 2010 12:21:50 -0400
In-Reply-To: <4C976E9A.5080000@panasas.com>
References: <1284779874-10499-1-git-send-email-iisaman@netapp.com>
 <1284779874-10499-8-git-send-email-iisaman@netapp.com>
 <4C976E9A.5080000@panasas.com>

On Mon, Sep 20, 2010 at 10:24 AM, Benny Halevy wrote:
> On 2010-09-18 05:17, Fred Isaman wrote:
>> From: The pNFS Team
>>
>> Allow a module implementing a layout type to register, and
>> have its mount/umount routines called for filesystems for which
>> the server declares support.
>>
>> Signed-off-by: TBD - melding/reorganization of several patches
>> ---
>>  Documentation/filesystems/nfs/00-INDEX |    2 +
>>  Documentation/filesystems/nfs/pnfs.txt |   48 ++++++++++++++++++
>>  fs/nfs/Kconfig                         |    2 +-
>>  fs/nfs/pnfs.c                          |   82 +++++++++++++++++++++++++++++++-
>>  fs/nfs/pnfs.h                          |    9 ++++
>>  5 files changed, 140 insertions(+), 3 deletions(-)
>>  create mode 100644 Documentation/filesystems/nfs/pnfs.txt
>>
>> diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX
>> index 2f68cd6..8d930b9 100644
>> --- a/Documentation/filesystems/nfs/00-INDEX
>> +++ b/Documentation/filesystems/nfs/00-INDEX
>> @@ -12,5 +12,7 @@ nfs-rdma.txt
>>       - how to install and setup the Linux NFS/RDMA client and server software
>>  nfsroot.txt
>>       - short guide on setting up a diskless box with NFS root filesystem.
>> +pnfs.txt
>> +     - short explanation of some of the internals of the pnfs code
>
> that is, pnfs _client_ code...
>

OK

>>  rpc-cache.txt
>>       - introduction to the caching mechanisms in the sunrpc layer.
>> diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt
>> new file mode 100644
>> index 0000000..bc0b9cf
>> --- /dev/null
>> +++ b/Documentation/filesystems/nfs/pnfs.txt
>> @@ -0,0 +1,48 @@
>> +Reference counting in pnfs:
>> +==========================
>> +
>> +There are several inter-related caches.  We have layouts which can
>> +reference multiple devices, each of which can reference multiple data servers.
>> +Each data server can be referenced by multiple devices.  Each device
>> +can be referenced by multiple layouts.  To keep all of this straight,
>> +we need to reference count.
>> +
>> +
>> +struct pnfs_layout_hdr
>> +----------------------
>> +The on-the-wire command LAYOUTGET corresponds to struct
>> +pnfs_layout_segment, usually referred to by the variable name lseg.
>> +Each nfs_inode may hold a pointer to a cache of these layout
>> +segments in nfsi->layout, of type struct pnfs_layout_hdr.
>> +
>> +We reference the header for the inode pointing to it, across each
>> +outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN,
>> +LAYOUTCOMMIT), and for each lseg held within.
>> +
>> +Each header is also (when non-empty) put on a list associated with
>> +struct nfs_client (cl_layouts).  Being put on this list does not bump
>> +the reference count, as the layout is kept around by the lseg that
>> +keeps it in the list.
>> +
>> +deviceid_cache
>> +--------------
>> +lsegs reference device ids, which are resolved per nfs_client and
>> +layout driver type.  The device ids are held in an RCU cache (struct
>> +nfs4_deviceid_cache).  The cache itself is referenced across each
>> +mount.
>> +The entries (struct nfs4_deviceid) themselves are held across
>> +the lifetime of each lseg referencing them.
>> +
>> +RCU is used because the deviceid is basically a write once, read many
>> +data structure.  The hlist size of 32 buckets needs better
>> +justification, but seems reasonable given that we can have multiple
>> +deviceids per filesystem, and multiple filesystems per nfs_client.
>> +
>> +The hash code is copied from the nfsd code base.  A discussion of
>> +hashing and variations of this algorithm can be found at:
>> +http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809
>> +
>> +data server cache
>> +-----------------
>> +file driver devices refer to data servers, which are kept in a module
>> +level cache.  Its reference is held over the lifetime of the deviceid
>> +pointing to it.
>> diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
>> index 6c2aad4..5f1b936 100644
>> --- a/fs/nfs/Kconfig
>> +++ b/fs/nfs/Kconfig
>> @@ -78,7 +78,7 @@ config NFS_V4_1
>>       depends on NFS_V4 && EXPERIMENTAL
>>       help
>>         This option enables support for minor version 1 of the NFSv4 protocol
>> -       (draft-ietf-nfsv4-minorversion1) in the kernel's NFS client.
>> +       (RFC 5661) in the kernel's NFS client.
>>
>>         If unsure, say N.
>>
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 2e5dba1..5a8a676 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -32,16 +32,53 @@
>>
>>  #define NFSDBG_FACILITY         NFSDBG_PNFS
>>
>> -/* STUB that returns the equivalent of "no module found" */
>> +/* Locking:
>> + *
>> + * pnfs_spinlock:
>> + *      protects pnfs_modules_tbl.
>> + */
>> +static DEFINE_SPINLOCK(pnfs_spinlock);
>> +
>> +/*
>> + * pnfs_modules_tbl holds all pnfs modules
>> + */
>> +static LIST_HEAD(pnfs_modules_tbl);
>> +
>> +/* Return the registered pnfs layout driver module matching given id */
>> +static struct pnfs_layoutdriver_type *
>> +find_pnfs_driver_locked(u32 id) {
>
> nit: the curly brace should be moved down a line

OK

>
>> +     struct pnfs_layoutdriver_type *local;
>> +
>> +     dprintk("PNFS: %s: Searching for %u\n", __func__, id);
>
> I'd move this printk down, before returning, and print
> the result as well.
>

OK

>> +     list_for_each_entry(local, &pnfs_modules_tbl, pnfs_tblid)
>> +             if (local->id == id) {
>> +                     if (!try_module_get(local->owner))
>> +                             local = NULL;
>
> If this happens (e.g. in the case the module exited without
> calling pnfs_unregister_layoutdriver), another layout driver (or a
> different instance of the same one) might have registered on the same
> id, so we need to keep looking.

There can only be a single driver registered with a particular id, so
continuing the search is pointless.

>
>> +                     goto out;
>> +             }
>> +     local = NULL;
>> +out:
>> +     return local;
>> +}
>
> how about the following?
>
> static struct pnfs_layoutdriver_type *
> find_pnfs_driver_locked(u32 id)
> {
>        struct pnfs_layoutdriver_type *local, *found = NULL;
>
>        list_for_each_entry(local, &pnfs_modules_tbl, pnfs_tblid)
>                if (local->id == id) {
>                        if (!try_module_get(local->owner))
>                                continue;
>                        found = local;
>                        break;
>                }
>        dprintk("%s: layout_type %u found %p\n", __func__, id, found);
>        return found;
> }
>
>> +
>>  static struct pnfs_layoutdriver_type *
>>  find_pnfs_driver(u32 id) {
>> -     return NULL;
>> +     struct pnfs_layoutdriver_type *local;
>> +
>> +     spin_lock(&pnfs_spinlock);
>> +     local = find_pnfs_driver_locked(id);
>> +     spin_unlock(&pnfs_spinlock);
>> +     return local;
>>  }
>>
>>  /* Uninitialize a mountpoint in a layout driver */
>>  void
>>  unset_pnfs_layoutdriver(struct nfs_server *nfss)
>>  {
>> +     if (nfss->pnfs_curr_ld) {
>> +             nfss->pnfs_curr_ld->uninitialize_mountpoint(nfss);
>> +             module_put(nfss->pnfs_curr_ld->owner);
>> +     }
>>      nfss->pnfs_curr_ld = NULL;
>>  }
>>
>> @@ -68,6 +105,13 @@ set_pnfs_layoutdriver(struct nfs_server *server, u32 id)
>>                      goto out_no_driver;
>>              }
>>      }
>> +     if (ld_type->initialize_mountpoint(server)) {
>> +             printk(KERN_ERR
>> +                    "%s: Error initializing mount point for layout driver %u.\n",
>> +                    __func__, id);
>> +             module_put(ld_type->owner);
>> +             goto out_no_driver;
>> +     }
>>      server->pnfs_curr_ld = ld_type;
>>      dprintk("%s: pNFS module for %u set\n", __func__, id);
>>      return;
>> @@ -76,3 +120,37 @@ out_no_driver:
>>      dprintk("%s: Using NFSv4 I/O\n", __func__);
>>      server->pnfs_curr_ld = NULL;
>>  }
>> +
>> +int
>> +pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
>> +{
>> +     int status = -EINVAL;
>> +     struct pnfs_layoutdriver_type *tmp;
>> +
>
> Since we're relying on the fact that ld_type->id != 0,
> let's add
>
>        BUG_ON(ld_type->id == 0);
>
> Benny

OK.

Fred

>
>> +     spin_lock(&pnfs_spinlock);
>> +     tmp = find_pnfs_driver_locked(ld_type->id);
>> +     if (!tmp) {
>> +             list_add(&ld_type->pnfs_tblid, &pnfs_modules_tbl);
>> +             status = 0;
>> +             dprintk("%s Registering id:%u name:%s\n", __func__, ld_type->id,
>> +                     ld_type->name);
>> +     } else {
>> +             module_put(tmp->owner);
>> +             printk(KERN_ERR "%s Module with id %d already loaded!\n",
>> +                    __func__, ld_type->id);
>> +     }
>> +     spin_unlock(&pnfs_spinlock);
>> +
>> +     return status;
>> +}
>> +EXPORT_SYMBOL_GPL(pnfs_register_layoutdriver);
>> +
>> +void
>> +pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
>> +{
>> +     dprintk("%s Deregistering id:%u\n", __func__, ld_type->id);
>> +     spin_lock(&pnfs_spinlock);
>> +     list_del(&ld_type->pnfs_tblid);
>> +     spin_unlock(&pnfs_spinlock);
>> +}
>> +EXPORT_SYMBOL_GPL(pnfs_unregister_layoutdriver);
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index c628ef1..61531f3 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -36,8 +36,17 @@
>>
>>  /* Per-layout driver specific registration structure */
>>  struct pnfs_layoutdriver_type {
>> +     struct list_head pnfs_tblid;
>> +     const u32 id;
>> +     const char *name;
>> +     struct module *owner;
>> +     int (*initialize_mountpoint) (struct nfs_server *);
>> +     int (*uninitialize_mountpoint) (struct nfs_server *);
>>  };
>>
>> +extern int pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *);
>> +extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
>> +
>>  void set_pnfs_layoutdriver(struct nfs_server *, u32 id);
>>  void unset_pnfs_layoutdriver(struct nfs_server *);
>>