2014-03-01 02:17:09

by Oleg Drokin

Subject: [PATCH 00/17] Lustre stability patches

This patch series fixes most of the issues I hit while running the
Lustre regression test suite. All of the crashes I observed are gone as well.

Please consider for inclusion.

Alexey Lyashkov (1):
lustre/mdc: use ibits_known mask for lock match

Ann Koehler (1):
lustre/osc: Don't flush active extents.

Bruno Faccini (1):
lustre/ldlm: set l_lvb_type coherent when layout is returned

Hongchao Zhang (1):
lustre/recovery: free open/close request promptly

John L. Hammond (3):
staging/lustre/llite: fix open lock matching in ll_md_blocking_ast()
lustre/clio: honor O_NOATIME
lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c

Lai Siyao (1):
lustre/llite: simplify dentry revalidate

Liang Zhen (2):
lustre/ptlrpc: rq_commit_cb is called for twice
lustre/ptlrpc: re-enqueue ptlrpcd worker

Niu Yawei (1):
lustre/quota: improper assert in osc_quota_chkdq()

Oleg Drokin (3):
lustre/mdc: Check for all attributes validity in revalidate
lustre/llite: Do not send parent dir fid in getattr by fid
lustre/libcfs: warn if all HTs in a core are gone

Peng Tao (1):
lustre/ptlrpc: skip rpcs that fail ptl_send_rpc

Sebastien Buisson (1):
lustre/ptlrpc: fix 'data race condition' issues

wang di (1):
lustre/mdc: comments on LOOKUP and PERM lock

drivers/staging/lustre/lustre/include/cl_object.h | 6 +-
.../lustre/lustre/include/lustre/lustre_idl.h | 32 ++-
.../staging/lustre/lustre/include/lustre_export.h | 17 ++
.../staging/lustre/lustre/include/lustre_import.h | 11 +
drivers/staging/lustre/lustre/include/lustre_net.h | 2 +
drivers/staging/lustre/lustre/include/obd.h | 5 +-
drivers/staging/lustre/lustre/include/obd_class.h | 4 +-
drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 1 +
.../staging/lustre/lustre/libcfs/linux/linux-cpu.c | 19 +-
drivers/staging/lustre/lustre/llite/dcache.c | 290 ++-------------------
drivers/staging/lustre/lustre/llite/dir.c | 2 +-
drivers/staging/lustre/lustre/llite/file.c | 60 +++--
.../staging/lustre/lustre/llite/llite_internal.h | 6 +-
drivers/staging/lustre/lustre/llite/llite_lib.c | 3 +-
drivers/staging/lustre/lustre/llite/namei.c | 78 +++---
drivers/staging/lustre/lustre/lmv/lmv_intent.c | 1 -
drivers/staging/lustre/lustre/lmv/lmv_obd.c | 5 +-
drivers/staging/lustre/lustre/lov/lov_io.c | 1 +
drivers/staging/lustre/lustre/mdc/mdc_internal.h | 2 +-
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 102 ++++----
drivers/staging/lustre/lustre/mdc/mdc_reint.c | 1 +
drivers/staging/lustre/lustre/mdc/mdc_request.c | 27 +-
drivers/staging/lustre/lustre/obdclass/genops.c | 2 +
.../lustre/lustre/obdclass/lprocfs_status.c | 1 +
drivers/staging/lustre/lustre/osc/osc_cache.c | 6 +
drivers/staging/lustre/lustre/osc/osc_io.c | 14 +-
drivers/staging/lustre/lustre/osc/osc_quota.c | 7 +-
drivers/staging/lustre/lustre/ptlrpc/client.c | 155 ++++++++---
drivers/staging/lustre/lustre/ptlrpc/import.c | 33 ++-
drivers/staging/lustre/lustre/ptlrpc/niobuf.c | 4 +
drivers/staging/lustre/lustre/ptlrpc/recover.c | 57 +++-
31 files changed, 480 insertions(+), 474 deletions(-)

--
1.8.5.3


2014-03-01 02:17:19

by Oleg Drokin

Subject: [PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast()

From: "John L. Hammond" <[email protected]>

In ll_md_blocking_ast() match open locks before all others, ensuring
that MDS_INODELOCK_OPEN is not cleared from bits by another open lock
with a different mode. Change the int flags parameter of
ll_md_real_close() to fmode_t fmode. Clean up various style issues in
both functions.

Signed-off-by: John L. Hammond <[email protected]>
Reviewed-on: http://review.whamcloud.com/8718
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4429
Reviewed-by: Niu Yawei <[email protected]>
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/llite/file.c | 19 +++---
.../staging/lustre/lustre/llite/llite_internal.h | 2 +-
drivers/staging/lustre/lustre/llite/namei.c | 78 ++++++++++++----------
3 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 4c28f39..c9ee574 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -205,7 +205,7 @@ out:
return rc;
}

-int ll_md_real_close(struct inode *inode, int flags)
+int ll_md_real_close(struct inode *inode, fmode_t fmode)
{
struct ll_inode_info *lli = ll_i2info(inode);
struct obd_client_handle **och_p;
@@ -213,30 +213,33 @@ int ll_md_real_close(struct inode *inode, int flags)
__u64 *och_usecount;
int rc = 0;

- if (flags & FMODE_WRITE) {
+ if (fmode & FMODE_WRITE) {
och_p = &lli->lli_mds_write_och;
och_usecount = &lli->lli_open_fd_write_count;
- } else if (flags & FMODE_EXEC) {
+ } else if (fmode & FMODE_EXEC) {
och_p = &lli->lli_mds_exec_och;
och_usecount = &lli->lli_open_fd_exec_count;
} else {
- LASSERT(flags & FMODE_READ);
+ LASSERT(fmode & FMODE_READ);
och_p = &lli->lli_mds_read_och;
och_usecount = &lli->lli_open_fd_read_count;
}

mutex_lock(&lli->lli_och_mutex);
- if (*och_usecount) { /* There are still users of this handle, so
- skip freeing it. */
+ if (*och_usecount > 0) {
+ /* There are still users of this handle, so skip
+ * freeing it. */
mutex_unlock(&lli->lli_och_mutex);
return 0;
}
+
och=*och_p;
*och_p = NULL;
mutex_unlock(&lli->lli_och_mutex);

- if (och) { /* There might be a race and somebody have freed this och
- already */
+ if (och != NULL) {
+ /* There might be a race and this handle may already
+ be closed. */
rc = ll_close_inode_openhandle(ll_i2sbi(inode)->ll_md_exp,
inode, och, NULL);
}
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index e27efd1..47c5142 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -775,7 +775,7 @@ int ll_local_open(struct file *file,
int ll_release_openhandle(struct dentry *, struct lookup_intent *);
int ll_md_close(struct obd_export *md_exp, struct inode *inode,
struct file *file);
-int ll_md_real_close(struct inode *inode, int flags);
+int ll_md_real_close(struct inode *inode, fmode_t fmode);
void ll_ioepoch_close(struct inode *inode, struct md_op_data *op_data,
struct obd_client_handle **och, unsigned long flags);
void ll_done_writing_attr(struct inode *inode, struct md_op_data *op_data);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 93c3744..86ff708 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -195,101 +195,107 @@ static void ll_invalidate_negative_children(struct inode *dir)
int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
void *data, int flag)
{
- int rc;
struct lustre_handle lockh;
+ int rc;

switch (flag) {
case LDLM_CB_BLOCKING:
ldlm_lock2handle(lock, &lockh);
rc = ldlm_cli_cancel(&lockh, LCF_ASYNC);
if (rc < 0) {
- CDEBUG(D_INODE, "ldlm_cli_cancel: %d\n", rc);
+ CDEBUG(D_INODE, "ldlm_cli_cancel: rc = %d\n", rc);
return rc;
}
break;
case LDLM_CB_CANCELING: {
struct inode *inode = ll_inode_from_resource_lock(lock);
- struct ll_inode_info *lli;
__u64 bits = lock->l_policy_data.l_inodebits.bits;
- struct lu_fid *fid;
- ldlm_mode_t mode = lock->l_req_mode;

/* Inode is set to lock->l_resource->lr_lvb_inode
* for mdc - bug 24555 */
LASSERT(lock->l_ast_data == NULL);

- /* Invalidate all dentries associated with this inode */
if (inode == NULL)
break;

+ /* Invalidate all dentries associated with this inode */
LASSERT(lock->l_flags & LDLM_FL_CANCELING);

- if (bits & MDS_INODELOCK_XATTR)
+ if (!fid_res_name_eq(ll_inode2fid(inode),
+ &lock->l_resource->lr_name)) {
+ LDLM_ERROR(lock, "data mismatch with object "DFID"(%p)",
+ PFID(ll_inode2fid(inode)), inode);
+ LBUG();
+ }
+
+ if (bits & MDS_INODELOCK_XATTR) {
ll_xattr_cache_destroy(inode);
+ bits &= ~MDS_INODELOCK_XATTR;
+ }

/* For OPEN locks we differentiate between lock modes
* LCK_CR, LCK_CW, LCK_PR - bug 22891 */
- if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE |
- MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM))
- ll_have_md_lock(inode, &bits, LCK_MINMODE);
-
if (bits & MDS_INODELOCK_OPEN)
- ll_have_md_lock(inode, &bits, mode);
-
- fid = ll_inode2fid(inode);
- if (!fid_res_name_eq(fid, &lock->l_resource->lr_name))
- LDLM_ERROR(lock, "data mismatch with object "
- DFID" (%p)", PFID(fid), inode);
+ ll_have_md_lock(inode, &bits, lock->l_req_mode);

if (bits & MDS_INODELOCK_OPEN) {
- int flags = 0;
+ fmode_t fmode;
+
switch (lock->l_req_mode) {
case LCK_CW:
- flags = FMODE_WRITE;
+ fmode = FMODE_WRITE;
break;
case LCK_PR:
- flags = FMODE_EXEC;
+ fmode = FMODE_EXEC;
break;
case LCK_CR:
- flags = FMODE_READ;
+ fmode = FMODE_READ;
break;
default:
- CERROR("Unexpected lock mode for OPEN lock "
- "%d, inode %ld\n", lock->l_req_mode,
- inode->i_ino);
+ LDLM_ERROR(lock, "bad lock mode for OPEN lock");
+ LBUG();
}
- ll_md_real_close(inode, flags);
+
+ ll_md_real_close(inode, fmode);
}

- lli = ll_i2info(inode);
+ if (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_UPDATE |
+ MDS_INODELOCK_LAYOUT | MDS_INODELOCK_PERM))
+ ll_have_md_lock(inode, &bits, LCK_MINMODE);
+
if (bits & MDS_INODELOCK_LAYOUT) {
- struct cl_object_conf conf = { { 0 } };
+ struct cl_object_conf conf = {
+ .coc_opc = OBJECT_CONF_INVALIDATE,
+ .coc_inode = inode,
+ };

- conf.coc_opc = OBJECT_CONF_INVALIDATE;
- conf.coc_inode = inode;
rc = ll_layout_conf(inode, &conf);
- if (rc)
- CDEBUG(D_INODE, "invaliding layout %d.\n", rc);
+ if (rc < 0)
+ CDEBUG(D_INODE, "cannot invalidate layout of "
+ DFID": rc = %d\n",
+ PFID(ll_inode2fid(inode)), rc);
}

if (bits & MDS_INODELOCK_UPDATE) {
+ struct ll_inode_info *lli = ll_i2info(inode);
+
spin_lock(&lli->lli_lock);
lli->lli_flags &= ~LLIF_MDS_SIZE_LOCK;
spin_unlock(&lli->lli_lock);
}

- if (S_ISDIR(inode->i_mode) &&
- (bits & MDS_INODELOCK_UPDATE)) {
+ if ((bits & MDS_INODELOCK_UPDATE) && S_ISDIR(inode->i_mode)) {
CDEBUG(D_INODE, "invalidating inode %lu\n",
inode->i_ino);
truncate_inode_pages(inode->i_mapping, 0);
ll_invalidate_negative_children(inode);
}

- if (inode->i_sb->s_root &&
- inode != inode->i_sb->s_root->d_inode &&
- (bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)))
+ if ((bits & (MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM)) &&
+ inode->i_sb->s_root != NULL &&
+ inode != inode->i_sb->s_root->d_inode)
ll_invalidate_aliases(inode);
+
iput(inode);
break;
}
--
1.8.5.3
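
The ordering problem described in the commit message of patch 01 can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the names toy_lock, toy_have_lock, close_needed_old and close_needed_new are hypothetical and are not part of Lustre, and the model assumes that a "match any mode" lookup clears every bit covered by another held lock, which is how the commit message describes the failure.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the inodebits and lock modes (not Lustre's). */
enum { BIT_LOOKUP = 0x1, BIT_UPDATE = 0x2, BIT_OPEN = 0x4 };
enum toy_mode { MODE_ANY = 0, MODE_CR, MODE_CW, MODE_PR };

struct toy_lock {
	uint64_t bits;		/* inodebits covered by this lock */
	enum toy_mode mode;	/* mode the lock was requested with */
};

/*
 * Clear from *bits every bit that is still covered by another held lock
 * whose mode matches. MODE_ANY matches locks of every mode.
 */
void toy_have_lock(const struct toy_lock *held, int n,
		   uint64_t *bits, enum toy_mode mode)
{
	for (int i = 0; i < n; i++)
		if (mode == MODE_ANY || held[i].mode == mode)
			*bits &= ~held[i].bits;
}

/*
 * Old order: the any-mode match runs first, so an open lock of a
 * different mode can clear BIT_OPEN before the mode-specific check.
 * Returns 1 if the open handle for 'mode' would be closed.
 */
int close_needed_old(const struct toy_lock *held, int n,
		     uint64_t bits, enum toy_mode mode)
{
	if (bits & (BIT_LOOKUP | BIT_UPDATE))
		toy_have_lock(held, n, &bits, MODE_ANY);
	if (bits & BIT_OPEN)
		toy_have_lock(held, n, &bits, mode);
	return (bits & BIT_OPEN) != 0;
}

/*
 * New order: the mode-specific open match runs first and the close
 * decision is taken before any any-mode match can touch BIT_OPEN.
 */
int close_needed_new(const struct toy_lock *held, int n,
		     uint64_t bits, enum toy_mode mode)
{
	if (bits & BIT_OPEN)
		toy_have_lock(held, n, &bits, mode);
	return (bits & BIT_OPEN) != 0;
}
```

With one other held lock covering BIT_OPEN in MODE_CR, cancelling a MODE_CW lock that covers BIT_OPEN | BIT_LOOKUP skips the close in the old order and performs it in the new order.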

2014-03-01 02:17:22

by Oleg Drokin

Subject: [PATCH 08/17] lustre/recovery: free open/close request promptly

From: Hongchao Zhang <[email protected]>

- For a non-create open or a committed open, the open request
should be freed along with the close request as soon as the
close is done, regardless of whether the transno of the open/close
is greater than the last committed transno known by the client.

- Move committed open requests onto a separate dedicated list,
which avoids scanning a huge replay list on receiving each
reply (when there are many open files).

Signed-off-by: Niu Yawei <[email protected]>
Signed-off-by: Hongchao Zhang <[email protected]>
Reviewed-on: http://review.whamcloud.com/6665
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2613
Reviewed-by: Alex Zhuravlev <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
.../lustre/lustre/include/lustre/lustre_idl.h | 6 +-
.../staging/lustre/lustre/include/lustre_export.h | 9 +++
.../staging/lustre/lustre/include/lustre_import.h | 11 +++
drivers/staging/lustre/lustre/include/lustre_net.h | 2 +
drivers/staging/lustre/lustre/include/obd.h | 5 +-
drivers/staging/lustre/lustre/include/obd_class.h | 4 +-
drivers/staging/lustre/lustre/llite/file.c | 2 +-
drivers/staging/lustre/lustre/llite/llite_lib.c | 3 +-
drivers/staging/lustre/lustre/lmv/lmv_obd.c | 4 +-
drivers/staging/lustre/lustre/mdc/mdc_internal.h | 2 +-
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 2 +-
drivers/staging/lustre/lustre/mdc/mdc_reint.c | 1 +
drivers/staging/lustre/lustre/mdc/mdc_request.c | 27 +++++++-
drivers/staging/lustre/lustre/obdclass/genops.c | 2 +
.../lustre/lustre/obdclass/lprocfs_status.c | 1 +
drivers/staging/lustre/lustre/ptlrpc/client.c | 78 +++++++++++++++++-----
drivers/staging/lustre/lustre/ptlrpc/import.c | 33 ++++++---
drivers/staging/lustre/lustre/ptlrpc/recover.c | 57 +++++++++++++---
18 files changed, 198 insertions(+), 51 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 4c70c06..a55eebf 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1305,6 +1305,7 @@ extern void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
#define OBD_CONNECT_SHORTIO 0x2000000000000ULL/* short io */
#define OBD_CONNECT_PINGLESS 0x4000000000000ULL/* pings not required */
#define OBD_CONNECT_FLOCK_DEAD 0x8000000000000ULL/* flock deadlock detection */
+#define OBD_CONNECT_DISP_STRIPE 0x10000000000000ULL/*create stripe disposition*/

/* XXX README XXX:
* Please DO NOT add flag values here before first ensuring that this same
@@ -1344,7 +1345,9 @@ extern void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
OBD_CONNECT_LIGHTWEIGHT | OBD_CONNECT_UMASK | \
OBD_CONNECT_LVB_TYPE | OBD_CONNECT_LAYOUTLOCK |\
OBD_CONNECT_PINGLESS | OBD_CONNECT_MAX_EASIZE |\
- OBD_CONNECT_FLOCK_DEAD)
+ OBD_CONNECT_FLOCK_DEAD | \
+ OBD_CONNECT_DISP_STRIPE)
+
#define OST_CONNECT_SUPPORTED (OBD_CONNECT_SRVLOCK | OBD_CONNECT_GRANT | \
OBD_CONNECT_REQPORTAL | OBD_CONNECT_VERSION | \
OBD_CONNECT_TRUNCLOCK | OBD_CONNECT_INDEX | \
@@ -2114,6 +2117,7 @@ extern void lustre_swab_generic_32s (__u32 *val);
#define DISP_ENQ_CREATE_REF 0x01000000
#define DISP_OPEN_LOCK 0x02000000
#define DISP_OPEN_LEASE 0x04000000
+#define DISP_OPEN_STRIPE 0x08000000

/* INODE LOCK PARTS */
#define MDS_INODELOCK_LOOKUP 0x000001 /* For namespace, dentry etc, and also
diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h b/drivers/staging/lustre/lustre/include/lustre_export.h
index 82a230b..6f7f48c 100644
--- a/drivers/staging/lustre/lustre/include/lustre_export.h
+++ b/drivers/staging/lustre/lustre/include/lustre_export.h
@@ -388,6 +388,15 @@ static inline __u64 exp_connect_ibits(struct obd_export *exp)
return ocd->ocd_ibits_known;
}

+static inline bool imp_connect_disp_stripe(struct obd_import *imp)
+{
+ struct obd_connect_data *ocd;
+
+ LASSERT(imp != NULL);
+ ocd = &imp->imp_connect_data;
+ return ocd->ocd_connect_flags & OBD_CONNECT_DISP_STRIPE;
+}
+
extern struct obd_export *class_conn2export(struct lustre_handle *conn);
extern struct obd_device *class_conn2obd(struct lustre_handle *conn);

diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
index 67259eb..e9833ae 100644
--- a/drivers/staging/lustre/lustre/include/lustre_import.h
+++ b/drivers/staging/lustre/lustre/include/lustre_import.h
@@ -180,6 +180,17 @@ struct obd_import {
struct list_head imp_delayed_list;
/** @} */

+ /**
+ * List of requests that are retained for committed open replay. Once
+ * open is committed, open replay request will be moved from the
+ * imp_replay_list into the imp_committed_list.
+ * The imp_replay_cursor is for accelerating searching during replay.
+ * @{
+ */
+ struct list_head imp_committed_list;
+ struct list_head *imp_replay_cursor;
+ /** @} */
+
/** obd device for this import */
struct obd_device *imp_obd;

diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h
index d8d0880..11382ab 100644
--- a/drivers/staging/lustre/lustre/include/lustre_net.h
+++ b/drivers/staging/lustre/lustre/include/lustre_net.h
@@ -2621,6 +2621,8 @@ int ptlrpc_register_rqbd(struct ptlrpc_request_buffer_desc *rqbd);
* request queues, request management, etc.
* @{
*/
+void ptlrpc_request_committed(struct ptlrpc_request *req, int force);
+
void ptlrpc_init_client(int req_portal, int rep_portal, char *name,
struct ptlrpc_client *);
void ptlrpc_cleanup_client(struct obd_import *imp);
diff --git a/drivers/staging/lustre/lustre/include/obd.h b/drivers/staging/lustre/lustre/include/obd.h
index c3470ce..1b38695 100644
--- a/drivers/staging/lustre/lustre/include/obd.h
+++ b/drivers/staging/lustre/lustre/include/obd.h
@@ -1323,7 +1323,8 @@ struct md_open_data {
struct obd_client_handle *mod_och;
struct ptlrpc_request *mod_open_req;
struct ptlrpc_request *mod_close_req;
- atomic_t mod_refcount;
+ atomic_t mod_refcount;
+ bool mod_is_create;
};

struct lookup_intent;
@@ -1392,7 +1393,7 @@ struct md_ops {

int (*m_set_open_replay_data)(struct obd_export *,
struct obd_client_handle *,
- struct ptlrpc_request *);
+ struct lookup_intent *);
int (*m_clear_open_replay_data)(struct obd_export *,
struct obd_client_handle *);
int (*m_set_lock_data)(struct obd_export *, __u64 *, void *, __u64 *);
diff --git a/drivers/staging/lustre/lustre/include/obd_class.h b/drivers/staging/lustre/lustre/include/obd_class.h
index 1c2ba19..0a18820 100644
--- a/drivers/staging/lustre/lustre/include/obd_class.h
+++ b/drivers/staging/lustre/lustre/include/obd_class.h
@@ -2001,11 +2001,11 @@ static inline int md_getxattr(struct obd_export *exp,

static inline int md_set_open_replay_data(struct obd_export *exp,
struct obd_client_handle *och,
- struct ptlrpc_request *open_req)
+ struct lookup_intent *it)
{
EXP_CHECK_MD_OP(exp, set_open_replay_data);
EXP_MD_COUNTER_INCREMENT(exp, set_open_replay_data);
- return MDP(exp->exp_obd, set_open_replay_data)(exp, och, open_req);
+ return MDP(exp->exp_obd, set_open_replay_data)(exp, och, it);
}

static inline int md_clear_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 362f5ec..7ceec74 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -480,7 +480,7 @@ static int ll_och_fill(struct obd_export *md_exp, struct lookup_intent *it,
och->och_magic = OBD_CLIENT_HANDLE_MAGIC;
och->och_flags = it->it_flags;

- return md_set_open_replay_data(md_exp, och, req);
+ return md_set_open_replay_data(md_exp, och, it);
}

int ll_local_open(struct file *file, struct lookup_intent *it,
diff --git a/drivers/staging/lustre/lustre/llite/llite_lib.c b/drivers/staging/lustre/lustre/llite/llite_lib.c
index 85c01e1..7427f69 100644
--- a/drivers/staging/lustre/lustre/llite/llite_lib.c
+++ b/drivers/staging/lustre/lustre/llite/llite_lib.c
@@ -208,7 +208,8 @@ static int client_common_fill_super(struct super_block *sb, char *md, char *dt,
OBD_CONNECT_LAYOUTLOCK |
OBD_CONNECT_PINGLESS |
OBD_CONNECT_MAX_EASIZE |
- OBD_CONNECT_FLOCK_DEAD;
+ OBD_CONNECT_FLOCK_DEAD |
+ OBD_CONNECT_DISP_STRIPE;

if (sbi->ll_flags & LL_SBI_SOM_PREVIEW)
data->ocd_connect_flags |= OBD_CONNECT_SOM;
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 1bddd8f..40fbd44 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -2593,7 +2593,7 @@ int lmv_free_lustre_md(struct obd_export *exp, struct lustre_md *md)

int lmv_set_open_replay_data(struct obd_export *exp,
struct obd_client_handle *och,
- struct ptlrpc_request *open_req)
+ struct lookup_intent *it)
{
struct obd_device *obd = exp->exp_obd;
struct lmv_obd *lmv = &obd->u.lmv;
@@ -2603,7 +2603,7 @@ int lmv_set_open_replay_data(struct obd_export *exp,
if (IS_ERR(tgt))
return PTR_ERR(tgt);

- return md_set_open_replay_data(tgt->ltd_exp, och, open_req);
+ return md_set_open_replay_data(tgt->ltd_exp, och, it);
}

int lmv_clear_open_replay_data(struct obd_export *exp,
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_internal.h b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
index fc21777..c78bf00 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_internal.h
+++ b/drivers/staging/lustre/lustre/mdc/mdc_internal.h
@@ -122,7 +122,7 @@ int mdc_free_lustre_md(struct obd_export *exp, struct lustre_md *md);

int mdc_set_open_replay_data(struct obd_export *exp,
struct obd_client_handle *och,
- struct ptlrpc_request *open_req);
+ struct lookup_intent *it);

int mdc_clear_open_replay_data(struct obd_export *exp,
struct obd_client_handle *och);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 6110943..20706e7 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -641,7 +641,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
* happens immediately after swabbing below, new reply
* is swabbed by that handler correctly.
*/
- mdc_set_open_replay_data(NULL, NULL, req);
+ mdc_set_open_replay_data(NULL, NULL, it);
}

if ((body->valid & (OBD_MD_FLDIREA | OBD_MD_FLEASIZE)) != 0) {
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_reint.c b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
index 1aea154..d79aa16 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_reint.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_reint.c
@@ -165,6 +165,7 @@ int mdc_setattr(struct obd_export *exp, struct md_op_data *op_data,
req->rq_cb_data = *mod;
(*mod)->mod_open_req = req;
req->rq_commit_cb = mdc_commit_open;
+ (*mod)->mod_is_create = true;
/**
* Take an extra reference on \var mod, it protects \var
* mod from being freed on eviction (commit callback is
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 17c8e14..d9ddb39 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -722,11 +722,12 @@ void mdc_commit_open(struct ptlrpc_request *req)

int mdc_set_open_replay_data(struct obd_export *exp,
struct obd_client_handle *och,
- struct ptlrpc_request *open_req)
+ struct lookup_intent *it)
{
struct md_open_data *mod;
struct mdt_rec_create *rec;
struct mdt_body *body;
+ struct ptlrpc_request *open_req = it->d.lustre.it_data;
struct obd_import *imp = open_req->rq_import;

if (!open_req->rq_replay)
@@ -760,6 +761,8 @@ int mdc_set_open_replay_data(struct obd_export *exp,
spin_lock(&open_req->rq_lock);
och->och_mod = mod;
mod->mod_och = och;
+ mod->mod_is_create = it_disposition(it, DISP_OPEN_CREATE) ||
+ it_disposition(it, DISP_OPEN_STRIPE);
mod->mod_open_req = open_req;
open_req->rq_cb_data = mod;
open_req->rq_commit_cb = mdc_commit_open;
@@ -780,6 +783,23 @@ int mdc_set_open_replay_data(struct obd_export *exp,
return 0;
}

+static void mdc_free_open(struct md_open_data *mod)
+{
+ int committed = 0;
+
+ if (mod->mod_is_create == 0 &&
+ imp_connect_disp_stripe(mod->mod_open_req->rq_import))
+ committed = 1;
+
+ LASSERT(mod->mod_open_req->rq_replay == 0);
+
+ DEBUG_REQ(D_RPCTRACE, mod->mod_open_req, "free open request\n");
+
+ ptlrpc_request_committed(mod->mod_open_req, committed);
+ if (mod->mod_close_req)
+ ptlrpc_request_committed(mod->mod_close_req, committed);
+}
+
int mdc_clear_open_replay_data(struct obd_export *exp,
struct obd_client_handle *och)
{
@@ -793,6 +813,8 @@ int mdc_clear_open_replay_data(struct obd_export *exp,
return 0;

LASSERT(mod != LP_POISON);
+ LASSERT(mod->mod_open_req != NULL);
+ mdc_free_open(mod);

mod->mod_och = NULL;
och->och_mod = NULL;
@@ -991,6 +1013,9 @@ int mdc_done_writing(struct obd_export *exp, struct md_op_data *op_data,
if (mod) {
if (rc != 0)
mod->mod_close_req = NULL;
+ LASSERT(mod->mod_open_req != NULL);
+ mdc_free_open(mod);
+
/* Since now, mod is accessed through setattr req only,
* thus DW req does not keep a reference on mod anymore. */
obd_mod_put(mod);
diff --git a/drivers/staging/lustre/lustre/obdclass/genops.c b/drivers/staging/lustre/lustre/obdclass/genops.c
index d27f041..169c9ed 100644
--- a/drivers/staging/lustre/lustre/obdclass/genops.c
+++ b/drivers/staging/lustre/lustre/obdclass/genops.c
@@ -1010,6 +1010,8 @@ struct obd_import *class_new_import(struct obd_device *obd)
INIT_LIST_HEAD(&imp->imp_replay_list);
INIT_LIST_HEAD(&imp->imp_sending_list);
INIT_LIST_HEAD(&imp->imp_delayed_list);
+ INIT_LIST_HEAD(&imp->imp_committed_list);
+ imp->imp_replay_cursor = &imp->imp_committed_list;
spin_lock_init(&imp->imp_lock);
imp->imp_last_success_conn = 0;
imp->imp_state = LUSTRE_IMP_NEW;
diff --git a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
index 6e7d2e5..1432dd7 100644
--- a/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
+++ b/drivers/staging/lustre/lustre/obdclass/lprocfs_status.c
@@ -99,6 +99,7 @@ static const char * const obd_connect_names[] = {
"short_io",
"pingless",
"flock_deadlock",
+ "disp_stripe",
"unknown",
NULL
};
diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index eb33bb7..a32b722 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -2360,6 +2360,39 @@ int ptlrpc_unregister_reply(struct ptlrpc_request *request, int async)
}
EXPORT_SYMBOL(ptlrpc_unregister_reply);

+static void ptlrpc_free_request(struct ptlrpc_request *req)
+{
+ spin_lock(&req->rq_lock);
+ req->rq_replay = 0;
+ spin_unlock(&req->rq_lock);
+
+ if (req->rq_commit_cb != NULL)
+ req->rq_commit_cb(req);
+ list_del_init(&req->rq_replay_list);
+
+ __ptlrpc_req_finished(req, 1);
+}
+
+/**
+ * the request is committed and dropped from the replay list of its import
+ */
+void ptlrpc_request_committed(struct ptlrpc_request *req, int force)
+{
+ struct obd_import *imp = req->rq_import;
+
+ spin_lock(&imp->imp_lock);
+ if (list_empty(&req->rq_replay_list)) {
+ spin_unlock(&imp->imp_lock);
+ return;
+ }
+
+ if (force || req->rq_transno <= imp->imp_peer_committed_transno)
+ ptlrpc_free_request(req);
+
+ spin_unlock(&imp->imp_lock);
+}
+EXPORT_SYMBOL(ptlrpc_request_committed);
+
/**
* Iterates through replay_list on import and prunes
* all requests have transno smaller than last_committed for the
@@ -2370,9 +2403,9 @@ EXPORT_SYMBOL(ptlrpc_unregister_reply);
*/
void ptlrpc_free_committed(struct obd_import *imp)
{
- struct list_head *tmp, *saved;
- struct ptlrpc_request *req;
+ struct ptlrpc_request *req, *saved;
struct ptlrpc_request *last_req = NULL; /* temporary fire escape */
+ bool skip_committed_list = true;

LASSERT(imp != NULL);

@@ -2388,13 +2421,15 @@ void ptlrpc_free_committed(struct obd_import *imp)
CDEBUG(D_RPCTRACE, "%s: committing for last_committed "LPU64" gen %d\n",
imp->imp_obd->obd_name, imp->imp_peer_committed_transno,
imp->imp_generation);
+
+ if (imp->imp_generation != imp->imp_last_generation_checked)
+ skip_committed_list = false;
+
imp->imp_last_transno_checked = imp->imp_peer_committed_transno;
imp->imp_last_generation_checked = imp->imp_generation;

- list_for_each_safe(tmp, saved, &imp->imp_replay_list) {
- req = list_entry(tmp, struct ptlrpc_request,
- rq_replay_list);
-
+ list_for_each_entry_safe(req, saved, &imp->imp_replay_list,
+ rq_replay_list) {
/* XXX ok to remove when 1357 resolved - rread 05/29/03 */
LASSERT(req != last_req);
last_req = req;
@@ -2408,27 +2443,34 @@ void ptlrpc_free_committed(struct obd_import *imp)
GOTO(free_req, 0);
}

- if (req->rq_replay) {
- DEBUG_REQ(D_RPCTRACE, req, "keeping (FL_REPLAY)");
- continue;
- }
-
/* not yet committed */
if (req->rq_transno > imp->imp_peer_committed_transno) {
DEBUG_REQ(D_RPCTRACE, req, "stopping search");
break;
}

+ if (req->rq_replay) {
+ DEBUG_REQ(D_RPCTRACE, req, "keeping (FL_REPLAY)");
+ list_move_tail(&req->rq_replay_list,
+ &imp->imp_committed_list);
+ continue;
+ }
+
DEBUG_REQ(D_INFO, req, "commit (last_committed "LPU64")",
imp->imp_peer_committed_transno);
free_req:
- spin_lock(&req->rq_lock);
- req->rq_replay = 0;
- spin_unlock(&req->rq_lock);
- if (req->rq_commit_cb != NULL)
- req->rq_commit_cb(req);
- list_del_init(&req->rq_replay_list);
- __ptlrpc_req_finished(req, 1);
+ ptlrpc_free_request(req);
+ }
+ if (skip_committed_list)
+ return;
+
+ list_for_each_entry_safe(req, saved, &imp->imp_committed_list,
+ rq_replay_list) {
+ LASSERT(req->rq_transno != 0);
+ if (req->rq_import_generation < imp->imp_generation) {
+ DEBUG_REQ(D_RPCTRACE, req, "free stale open request");
+ ptlrpc_free_request(req);
+ }
}
}

diff --git a/drivers/staging/lustre/lustre/ptlrpc/import.c b/drivers/staging/lustre/lustre/ptlrpc/import.c
index 82db0ed..537aa62 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/import.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/import.c
@@ -560,17 +560,30 @@ static int ptlrpc_first_transno(struct obd_import *imp, __u64 *transno)
struct ptlrpc_request *req;
struct list_head *tmp;

- if (list_empty(&imp->imp_replay_list))
- return 0;
- tmp = imp->imp_replay_list.next;
- req = list_entry(tmp, struct ptlrpc_request, rq_replay_list);
- *transno = req->rq_transno;
- if (req->rq_transno == 0) {
- DEBUG_REQ(D_ERROR, req, "zero transno in replay");
- LBUG();
+ /* The requests in committed_list always have smaller transnos than
+ * the requests in replay_list */
+ if (!list_empty(&imp->imp_committed_list)) {
+ tmp = imp->imp_committed_list.next;
+ req = list_entry(tmp, struct ptlrpc_request, rq_replay_list);
+ *transno = req->rq_transno;
+ if (req->rq_transno == 0) {
+ DEBUG_REQ(D_ERROR, req,
+ "zero transno in committed_list");
+ LBUG();
+ }
+ return 1;
}
-
- return 1;
+ if (!list_empty(&imp->imp_replay_list)) {
+ tmp = imp->imp_replay_list.next;
+ req = list_entry(tmp, struct ptlrpc_request, rq_replay_list);
+ *transno = req->rq_transno;
+ if (req->rq_transno == 0) {
+ DEBUG_REQ(D_ERROR, req, "zero transno in replay_list");
+ LBUG();
+ }
+ return 1;
+ }
+ return 0;
}

/**
diff --git a/drivers/staging/lustre/lustre/ptlrpc/recover.c b/drivers/staging/lustre/lustre/ptlrpc/recover.c
index 84c39e0..48ae328 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/recover.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/recover.c
@@ -105,24 +105,59 @@ int ptlrpc_replay_next(struct obd_import *imp, int *inflight)
* imp_lock is being held by ptlrpc_replay, but it's not. it's
* just a little race...
*/
- list_for_each_safe(tmp, pos, &imp->imp_replay_list) {
+
+ /* Replay all the committed open requests on committed_list first */
+ if (!list_empty(&imp->imp_committed_list)) {
+ tmp = imp->imp_committed_list.prev;
req = list_entry(tmp, struct ptlrpc_request,
rq_replay_list);

- /* If need to resend the last sent transno (because a
- reconnect has occurred), then stop on the matching
- req and send it again. If, however, the last sent
- transno has been committed then we continue replay
- from the next request. */
+ /* The last request on committed_list hasn't been replayed */
if (req->rq_transno > last_transno) {
- if (imp->imp_resend_replay)
- lustre_msg_add_flags(req->rq_reqmsg,
- MSG_RESENT);
- break;
+ /* Since the imp_committed_list is immutable before
+ * all of it's requests being replayed, it's safe to
+ * use a cursor to accelerate the search */
+ imp->imp_replay_cursor = imp->imp_replay_cursor->next;
+
+ while (imp->imp_replay_cursor !=
+ &imp->imp_committed_list) {
+ req = list_entry(imp->imp_replay_cursor,
+ struct ptlrpc_request,
+ rq_replay_list);
+ if (req->rq_transno > last_transno)
+ break;
+
+ req = NULL;
+ imp->imp_replay_cursor =
+ imp->imp_replay_cursor->next;
+ }
+ } else {
+ /* All requests on committed_list have been replayed */
+ imp->imp_replay_cursor = &imp->imp_committed_list;
+ req = NULL;
+ }
+ }
+
+ /* All the requests in committed list have been replayed, let's replay
+ * the imp_replay_list */
+ if (req == NULL) {
+ list_for_each_safe(tmp, pos, &imp->imp_replay_list) {
+ req = list_entry(tmp, struct ptlrpc_request,
+ rq_replay_list);
+
+ if (req->rq_transno > last_transno)
+ break;
+ req = NULL;
}
- req = NULL;
}

+ /* If need to resend the last sent transno (because a reconnect
+ * has occurred), then stop on the matching req and send it again.
+ * If, however, the last sent transno has been committed then we
+ * continue replay from the next request. */
+ if (req != NULL && imp->imp_resend_replay)
+ lustre_msg_add_flags(req->rq_reqmsg, MSG_RESENT);
+
spin_lock(&imp->imp_lock);
imp->imp_resend_replay = 0;
spin_unlock(&imp->imp_lock);
--
1.8.5.3
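
Note on the replay cursor in the recover.c hunk above: the following is
only a minimal sketch of the idea, not the patch's code. It assumes the
committed list is ordered by transno and does not change while replay is
in progress, and it models the list as a plain array. All sketch_* names
are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, array-based stand-in for imp_committed_list. */
struct sketch_import {
	const unsigned long long *committed; /* transnos, ascending */
	size_t ncommitted;
	size_t cursor;                       /* next entry to examine */
};

/*
 * Return the index of the next request whose transno is greater than
 * last_transno, or ncommitted if everything has been replayed.
 * The cursor persists across calls, so entries that were already
 * skipped are never scanned again.
 */
size_t sketch_next_replay(struct sketch_import *imp,
			  unsigned long long last_transno)
{
	while (imp->cursor < imp->ncommitted &&
	       imp->committed[imp->cursor] <= last_transno)
		imp->cursor++;
	return imp->cursor;
}
```

Because the cursor only moves forward, replaying the whole list costs one
pass in total instead of one scan from the head per replayed request.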

2014-03-01 02:17:27

by Oleg Drokin

Subject: [PATCH 11/17] lustre/ptlrpc: rq_commit_cb is called for twice

From: Liang Zhen <[email protected]>

If a ptlrpc_request is already on imp::imp_replay_list when it is
replayed and replied to, after_reply() calls req::rq_commit_cb for
the request, and ptlrpc_free_committed() then calls it a second time.
Skip the call in after_reply() when the request is on the replay list.

Signed-off-by: Liang Zhen <[email protected]>
Reviewed-on: http://review.whamcloud.com/8815
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3618
Reviewed-by: Alex Zhuravlev <[email protected]>
Reviewed-by: Bobi Jam <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/client.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
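
Note: the double call described in the commit message can be modelled
in a few lines. This is a simplified sketch under stated assumptions,
not the ptlrpc code: a boolean stands in for
!list_empty(&req->rq_replay_list), and every sketch_* name is
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ptlrpc_request. */
struct sketch_req {
	bool on_replay_list;  /* models !list_empty(&req->rq_replay_list) */
	int commit_calls;     /* how many times the callback has run */
	void (*commit_cb)(struct sketch_req *req);
};

void sketch_count_commit(struct sketch_req *req)
{
	req->commit_calls++;
}

/* Reply path: fire the callback only if no later pass will do it. */
void sketch_after_reply(struct sketch_req *req)
{
	if (req->commit_cb != NULL && !req->on_replay_list)
		req->commit_cb(req);
}

/* Later pass over the replay list: unlink and fire the callback. */
void sketch_free_committed(struct sketch_req *req)
{
	if (!req->on_replay_list)
		return;
	req->on_replay_list = false;
	if (req->commit_cb != NULL)
		req->commit_cb(req);
}
```

With the list check in place the callback runs exactly once whether or
not the request was on the replay list when the reply arrived.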

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index a32b722..b6d831a 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1313,7 +1313,11 @@ static int after_reply(struct ptlrpc_request *req)
/** version recovery */
ptlrpc_save_versions(req);
ptlrpc_retain_replayable_request(req, imp);
- } else if (req->rq_commit_cb != NULL) {
+ } else if (req->rq_commit_cb != NULL &&
+ list_empty(&req->rq_replay_list)) {
+ /* NB: don't call rq_commit_cb if it's already on
+ * rq_replay_list, ptlrpc_free_committed() will call
+ * it later, see LU-3618 for details */
spin_unlock(&imp->imp_lock);
req->rq_commit_cb(req);
spin_lock(&imp->imp_lock);
--
1.8.5.3

2014-03-01 02:17:34

by Oleg Drokin

Subject: [PATCH 15/17] lustre/osc: Don't flush active extents.

From: Ann Koehler <[email protected]>

The extent is active so we need to abort and let the caller
re-dirty the page. If we continued on here, and we were the
one making the extent active, we could deadlock waiting for
the page writeback to clear but it won't because the extent
is active and won't be written out.

Signed-off-by: Ann Koehler <[email protected]>
Reviewed-on: http://review.whamcloud.com/8278
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4253
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Alex Zhuravlev <[email protected]>
Reviewed-by: Alexey Lyashkov <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/osc/osc_cache.c | 6 ++++++
1 file changed, 6 insertions(+)
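
Note: a stripped-down model of the state check may help when reading
the hunk below. It is a sketch only; the enum and function names are
hypothetical and the real osc_flush_async_page() handles more states
than shown here.

```c
#include <errno.h>

/* Hypothetical subset of extent states, for illustration only. */
enum sketch_ext_state {
	SKETCH_CACHE,   /* cached and idle, may be flushed */
	SKETCH_ACTIVE,  /* someone is still adding pages to it */
	SKETCH_TRUNC    /* being truncated */
};

/*
 * Return 0 if the flush may proceed, or -EAGAIN if the caller should
 * back off and re-dirty the page instead of waiting for writeback.
 */
int sketch_flush_check(enum sketch_ext_state state)
{
	switch (state) {
	case SKETCH_TRUNC:   /* racing with truncate */
	case SKETCH_ACTIVE:  /* waiting here could deadlock on ourselves */
		return -EAGAIN;
	default:
		return 0;
	}
}
```

Returning -EAGAIN rather than waiting matters when the flusher is the
thread that made the extent active: no one else would ever write it out.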

diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index b92a02e..af25c19 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -2394,6 +2394,12 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io,
* really sending the RPC. */
case OES_TRUNC:
/* race with truncate, page will be redirtied */
+ case OES_ACTIVE:
+ /* The extent is active so we need to abort and let the caller
+ * re-dirty the page. If we continued on here, and we were the
+ * one making the extent active, we could deadlock waiting for
+ * the page writeback to clear but it won't because the extent
+ * is active and won't be written out. */
GOTO(out, rc = -EAGAIN);
default:
break;
--
1.8.5.3

2014-03-01 02:17:37

by Oleg Drokin

Subject: [PATCH 17/17] lustre/libcfs: warn if all HTs in a core are gone

The libcfs CPU partition code does not support CPU hotplug, but
plugging in a new CPU or enabling/disabling hyper-threading is safe.
Only unplugging a CPU is risky, because it may break the CPU
affinity of Lustre threads.

libcfs currently prints a warning for every CPU notification. This
patch changes that behavior to print the warning only when all HTs in
a CPU core are lost, which may have broken the affinity of Lustre threads.

Signed-off-by: Liang Zhen <[email protected]>
Reviewed-on: http://review.whamcloud.com/8770
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4454
Reviewed-by: Bobi Jam <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
.../staging/lustre/lustre/libcfs/linux/linux-cpu.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
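
Note: the "all HTs in a core are gone" test boils down to a mask
intersection. The sketch below assumes at most 64 logical CPUs held in
a 64-bit mask; the kernel code uses cpumask helpers instead, and the
function name here is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * online_mask:  bit n set if logical CPU n is still online
 * sibling_mask: bit n set if logical CPU n is a hardware thread of
 *               the core that just lost a CPU (including that CPU)
 *
 * Returns true when no hardware thread of that core remains online,
 * which is the only case where thread affinity may have been broken.
 */
bool sketch_core_fully_offline(uint64_t online_mask, uint64_t sibling_mask)
{
	return (online_mask & sibling_mask) == 0;
}
```

For example, on an 8-thread machine where CPUs 0 and 4 share a core,
taking CPU 0 offline leaves CPU 4 online, so no warning is needed; if
CPU 4 was already offline, the core is gone and the warning applies.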

diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
index 58bb256..77b1ef6 100644
--- a/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
@@ -952,6 +952,7 @@ static int
cfs_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
{
unsigned int cpu = (unsigned long)hcpu;
+ bool warn;

switch (action) {
case CPU_DEAD:
@@ -962,9 +963,21 @@ cfs_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
cpt_data.cpt_version++;
spin_unlock(&cpt_data.cpt_lock);
default:
- CWARN("Lustre: can't support CPU hotplug well now, "
- "performance and stability could be impacted"
- "[CPU %u notify: %lx]\n", cpu, action);
+ if (action != CPU_DEAD && action != CPU_DEAD_FROZEN) {
+ CDEBUG(D_INFO, "CPU changed [cpu %u action %lx]\n",
+ cpu, action);
+ break;
+ }
+
+ down(&cpt_data.cpt_mutex);
+ /* if all HTs in a core are offline, it may break affinity */
+ cfs_cpu_ht_siblings(cpu, cpt_data.cpt_cpumask);
+ warn = any_online_cpu(*cpt_data.cpt_cpumask) >= nr_cpu_ids;
+ up(&cpt_data.cpt_mutex);
+ CDEBUG(warn ? D_WARNING : D_INFO,
+ "Lustre: can't support CPU plug-out well now, "
+ "performance and stability could be impacted "
+ "[CPU %u action: %lx]\n", cpu, action);
}

return NOTIFY_OK;
--
1.8.5.3

2014-03-01 02:17:32

by Oleg Drokin

Subject: [PATCH 13/17] lustre/ptlrpc: fix 'data race condition' issues

From: Sebastien Buisson <[email protected]>

Fix 'data race condition' defects found by Coverity version
6.5.0:
Data race condition (MISSING_LOCK)
Accessing variable without holding lock. Elsewhere,
this variable is accessed with lock held.

Signed-off-by: Sebastien Buisson <[email protected]>
Reviewed-on: http://review.whamcloud.com/6575
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2744
Reviewed-by: Andreas Dilger <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/client.c | 6 ++++++
drivers/staging/lustre/lustre/ptlrpc/niobuf.c | 4 ++++
2 files changed, 10 insertions(+)
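
Note: a sketch of why single-bit flag writes are taken under the lock.
It assumes the request flags are bitfields packed into one word, so
that setting one flag is a read-modify-write of its neighbours as
well. A C11 atomic_flag spin loop stands in for spinlock_t, and all
sketch_* names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the flag word in struct ptlrpc_request. */
struct sketch_flags {
	atomic_flag lock;        /* models rq_lock */
	unsigned int resend:1;   /* flags share one storage unit */
	unsigned int err:1;
	unsigned int net_err:1;
};

void sketch_flags_init(struct sketch_flags *f)
{
	atomic_flag_clear(&f->lock);
	f->resend = 0;
	f->err = 0;
	f->net_err = 0;
}

static void sketch_lock(struct sketch_flags *f)
{
	while (atomic_flag_test_and_set_explicit(&f->lock,
						 memory_order_acquire)) {
		/* spin until the holder releases the lock */
	}
}

static void sketch_unlock(struct sketch_flags *f)
{
	atomic_flag_clear_explicit(&f->lock, memory_order_release);
}

/* Setting one bit rewrites the whole word, so serialize all writers. */
void sketch_set_err(struct sketch_flags *f)
{
	sketch_lock(f);
	f->err = 1;
	sketch_unlock(f);
}

void sketch_set_net_err(struct sketch_flags *f)
{
	sketch_lock(f);
	f->net_err = 1;
	sketch_unlock(f);
}
```

Without the lock, two CPUs setting different bits at the same time can
each write back a stale copy of the word and lose the other's update.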

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 98041e8..7b97c64 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1190,7 +1190,9 @@ static int after_reply(struct ptlrpc_request *req)
* will roundup it */
req->rq_replen = req->rq_nob_received;
req->rq_nob_received = 0;
+ spin_lock(&req->rq_lock);
req->rq_resend = 1;
+ spin_unlock(&req->rq_lock);
return 0;
}

@@ -1412,7 +1414,9 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req)
req->rq_status = rc;
return 1;
} else {
+ spin_lock(&req->rq_lock);
req->rq_wait_ctx = 1;
+ spin_unlock(&req->rq_lock);
return 0;
}
}
@@ -1427,7 +1431,9 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req)
rc = ptl_send_rpc(req, 0);
if (rc) {
DEBUG_REQ(D_HA, req, "send failed (%d); expect timeout", rc);
+ spin_lock(&req->rq_lock);
req->rq_net_err = 1;
+ spin_unlock(&req->rq_lock);
return rc;
}
return 0;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
index 1e94597..a47a8d8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
@@ -511,7 +511,9 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
CDEBUG(D_HA, "muting rpc for failed imp obd %s\n",
request->rq_import->imp_obd->obd_name);
/* this prevents us from waiting in ptlrpc_queue_wait */
+ spin_lock(&request->rq_lock);
request->rq_err = 1;
+ spin_unlock(&request->rq_lock);
request->rq_status = -ENODEV;
return -ENODEV;
}
@@ -553,7 +555,9 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
if (rc) {
/* this prevents us from looping in
* ptlrpc_queue_wait */
+ spin_lock(&request->rq_lock);
request->rq_err = 1;
+ spin_unlock(&request->rq_lock);
request->rq_status = rc;
GOTO(cleanup_bulk, rc);
}
--
1.8.5.3

2014-03-01 02:18:15

by Oleg Drokin

Subject: [PATCH 16/17] lustre/quota: improper assert in osc_quota_chkdq()

From: Niu Yawei <[email protected]>

In osc_quota_chkdq(), we should never try to access oqi found
from hash, since it could have been freed by osc_quota_setdq().

Signed-off-by: Niu Yawei <[email protected]>
Reviewed-on: http://review.whamcloud.com/8460
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4336
Reviewed-by: Johann Lombardi <[email protected]>
Reviewed-by: Fan Yong <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/osc/osc_quota.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
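
Note: the fix treats the hash lookup purely as a presence test. The
sketch below illustrates that pattern with a plain array in place of
the hash table; the names and return values are hypothetical, not the
ones used by osc_quota.c.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_QUOTA_OK 0
#define SKETCH_NO_QUOTA 1

/* Hypothetical stand-in for the per-type quota hash: a set of ids. */
struct sketch_quota_set {
	const unsigned int *ids;
	size_t count;
};

/* Presence test only; hands out no pointer that could go stale. */
bool sketch_quota_present(const struct sketch_quota_set *set,
			  unsigned int id)
{
	for (size_t i = 0; i < set->count; i++) {
		if (set->ids[i] == id)
			return true;
	}
	return false;
}

/* The decision depends on presence alone, never on entry contents. */
int sketch_chkdq(const struct sketch_quota_set *set, unsigned int id)
{
	return sketch_quota_present(set, id) ? SKETCH_NO_QUOTA
					     : SKETCH_QUOTA_OK;
}
```

Since nothing is read from the entry after the lookup, a concurrent
removal can no longer turn the check into a use-after-free.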

diff --git a/drivers/staging/lustre/lustre/osc/osc_quota.c b/drivers/staging/lustre/lustre/osc/osc_quota.c
index 6045a78..f395ae4 100644
--- a/drivers/staging/lustre/lustre/osc/osc_quota.c
+++ b/drivers/staging/lustre/lustre/osc/osc_quota.c
@@ -51,11 +51,8 @@ int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[])

oqi = cfs_hash_lookup(cli->cl_quota_hash[type], &qid[type]);
if (oqi) {
- obd_uid id = oqi->oqi_id;
-
- LASSERTF(id == qid[type],
- "The ids don't match %u != %u\n",
- id, qid[type]);
+ /* do not try to access oqi here, it could have been
+ * freed by osc_quota_setdq() */

/* the slot is busy, the user is about to run out of
* quota space on this OST */
--
1.8.5.3

2014-03-01 02:17:30

by Oleg Drokin

Subject: [PATCH 14/17] lustre/ptlrpc: re-enqueue ptlrpcd worker

From: Liang Zhen <[email protected]>

osc_extent_wait can get stuck in a scenario like this:

1) thread-1 holds an active extent
2) thread-2 calls flush cache, and marks this extent as "urgent"
and "sync_wait"
3) thread-3 wants to write to the same extent; osc_extent_find
reports a "conflict" because this extent is "sync_wait", so it
starts to wait...
4) cl_writeback_work has been scheduled by thread-4 to write some
other extents; it has sent RPCs but has not returned yet.
5) thread-1 finishes its work and calls osc_extent_release()->
osc_io_unplug_async()->ptlrpcd_queue_work(), but finds that
cl_writeback_work is still running, so the request is ignored (-EBUSY)
6) thread-3 is stuck because nobody will wake it up.

This patch allows ptlrpcd_work to be rescheduled, so it will not
miss requests anymore.

Signed-off-by: Liang Zhen <[email protected]>
Reviewed-on: http://review.whamcloud.com/8922
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4509
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Bobi Jam <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/client.c | 64 +++++++++++++++++----------
1 file changed, 40 insertions(+), 24 deletions(-)
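
Note: the rescheduling scheme can be modelled with a single counter.
This is a sketch of the idea using C11 atomics, not the kernel code;
the struct, the counters and all sketch_* names are hypothetical.
A refcount of 1 means idle, 2 means queued or running, and anything
above 2 means a re-run was requested in the meantime.

```c
#include <stdatomic.h>

struct sketch_work {
	atomic_int refcount;  /* 1 idle, 2 queued/running, >2 re-run asked */
	int enqueued;         /* times handed to the worker thread */
	int runs;             /* times the callback actually ran */
};

void sketch_work_init(struct sketch_work *w)
{
	atomic_init(&w->refcount, 1);
	w->enqueued = 0;
	w->runs = 0;
}

/* Stand-in for handing the request to the worker thread. */
static void sketch_enqueue(struct sketch_work *w)
{
	w->enqueued++;
}

/* Always record the request; enqueue only on the idle -> queued edge. */
void sketch_queue_work(struct sketch_work *w)
{
	/* fetch_add returns the old value: old == 1 means new == 2 */
	if (atomic_fetch_add(&w->refcount, 1) == 1)
		sketch_enqueue(w);
}

/* Worker side: run the callback, then re-enqueue if asked meanwhile. */
void sketch_run_once(struct sketch_work *w)
{
	w->runs++;
	/* old > 2 means at least one queue request arrived while running */
	if (atomic_fetch_sub(&w->refcount, 1) > 2) {
		atomic_store(&w->refcount, 2);
		sketch_enqueue(w);
	}
}
```

The point of the design is that a queue request made while the work is
running is remembered in the counter instead of being dropped with
-EBUSY, so step 5 of the scenario above leads to one more run.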

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 7b97c64..4c9e006 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -48,6 +48,7 @@
#include "ptlrpc_internal.h"

static int ptlrpc_send_new_req(struct ptlrpc_request *req);
+static int ptlrpcd_check_work(struct ptlrpc_request *req);

/**
* Initialize passed in client structure \a cl.
@@ -1784,6 +1785,10 @@ interpret:

ptlrpc_req_interpret(env, req, req->rq_status);

+ if (ptlrpcd_check_work(req)) {
+ atomic_dec(&set->set_remaining);
+ continue;
+ }
ptlrpc_rqphase_move(req, RQ_PHASE_COMPLETE);

CDEBUG(req->rq_reqmsg != NULL ? D_RPCTRACE : 0,
@@ -2957,22 +2962,50 @@ EXPORT_SYMBOL(ptlrpc_sample_next_xid);
* have delay before it really runs by ptlrpcd thread.
*/
struct ptlrpc_work_async_args {
- __u64 magic;
int (*cb)(const struct lu_env *, void *);
void *cbdata;
};

-#define PTLRPC_WORK_MAGIC 0x6655436b676f4f44ULL /* magic code */
+static void ptlrpcd_add_work_req(struct ptlrpc_request *req)
+{
+ /* re-initialize the req */
+ req->rq_timeout = obd_timeout;
+ req->rq_sent = cfs_time_current_sec();
+ req->rq_deadline = req->rq_sent + req->rq_timeout;
+ req->rq_reply_deadline = req->rq_deadline;
+ req->rq_phase = RQ_PHASE_INTERPRET;
+ req->rq_next_phase = RQ_PHASE_COMPLETE;
+ req->rq_xid = ptlrpc_next_xid();
+ req->rq_import_generation = req->rq_import->imp_generation;
+
+ ptlrpcd_add_req(req, PDL_POLICY_ROUND, -1);
+}

static int work_interpreter(const struct lu_env *env,
struct ptlrpc_request *req, void *data, int rc)
{
struct ptlrpc_work_async_args *arg = data;

- LASSERT(arg->magic == PTLRPC_WORK_MAGIC);
+ LASSERT(ptlrpcd_check_work(req));
LASSERT(arg->cb != NULL);

- return arg->cb(env, arg->cbdata);
+ rc = arg->cb(env, arg->cbdata);
+
+ list_del_init(&req->rq_set_chain);
+ req->rq_set = NULL;
+
+ if (atomic_dec_return(&req->rq_refcount) > 1) {
+ atomic_set(&req->rq_refcount, 2);
+ ptlrpcd_add_work_req(req);
+ }
+ return rc;
+}
+
+static int worker_format;
+
+static int ptlrpcd_check_work(struct ptlrpc_request *req)
+{
+ return req->rq_pill.rc_fmt == (void *)&worker_format;
}

/**
@@ -3005,6 +3038,7 @@ void *ptlrpcd_alloc_work(struct obd_import *imp,
req->rq_receiving_reply = 0;
req->rq_must_unlink = 0;
req->rq_no_delay = req->rq_no_resend = 1;
+ req->rq_pill.rc_fmt = (void *)&worker_format;

spin_lock_init(&req->rq_lock);
INIT_LIST_HEAD(&req->rq_list);
@@ -3018,7 +3052,6 @@ void *ptlrpcd_alloc_work(struct obd_import *imp,

CLASSERT(sizeof(*args) <= sizeof(req->rq_async_args));
args = ptlrpc_req_async_args(req);
- args->magic = PTLRPC_WORK_MAGIC;
args->cb = cb;
args->cbdata = cbdata;

@@ -3048,25 +3081,8 @@ int ptlrpcd_queue_work(void *handler)
* req as opaque data. - Jinshan
*/
LASSERT(atomic_read(&req->rq_refcount) > 0);
- if (atomic_read(&req->rq_refcount) > 1)
- return -EBUSY;
-
- if (atomic_inc_return(&req->rq_refcount) > 2) { /* race */
- atomic_dec(&req->rq_refcount);
- return -EBUSY;
- }
-
- /* re-initialize the req */
- req->rq_timeout = obd_timeout;
- req->rq_sent = cfs_time_current_sec();
- req->rq_deadline = req->rq_sent + req->rq_timeout;
- req->rq_reply_deadline = req->rq_deadline;
- req->rq_phase = RQ_PHASE_INTERPRET;
- req->rq_next_phase = RQ_PHASE_COMPLETE;
- req->rq_xid = ptlrpc_next_xid();
- req->rq_import_generation = req->rq_import->imp_generation;
-
- ptlrpcd_add_req(req, PDL_POLICY_ROUND, -1);
+ if (atomic_inc_return(&req->rq_refcount) == 2)
+ ptlrpcd_add_work_req(req);
return 0;
}
EXPORT_SYMBOL(ptlrpcd_queue_work);
--
1.8.5.3

2014-03-01 02:17:25

by Oleg Drokin

Subject: [PATCH 09/17] lustre/llite: simplify dentry revalidate

From: Lai Siyao <[email protected]>

Lustre client dentry validity is protected by an LDLM lock, so
any time a dentry is found it is valid and there is no need to
revalidate it from the MDS. Even if it were revalidated, there is a
race in which it may be invalidated after revalidation has finished.

Signed-off-by: Lai Siyao <[email protected]>
Reviewed-on: http://review.whamcloud.com/7475
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3544
Reviewed-by: Peng Tao <[email protected]>
Reviewed-by: Bob Glossman <[email protected]>
Reviewed-by: John L. Hammond <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
.../lustre/lustre/include/lustre/lustre_idl.h | 2 +-
drivers/staging/lustre/lustre/llite/dcache.c | 290 ++-------------------
drivers/staging/lustre/lustre/llite/file.c | 8 +-
.../staging/lustre/lustre/llite/llite_internal.h | 4 +-
drivers/staging/lustre/lustre/lmv/lmv_intent.c | 1 -
drivers/staging/lustre/lustre/lmv/lmv_obd.c | 1 -
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 52 ++--
7 files changed, 45 insertions(+), 313 deletions(-)
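
Note: the simplified revalidate path reduces to a small decision on the
lookup flags. The sketch below mirrors the order of the checks in the
new ll_revalidate_dentry(), but the flag values are made up for
illustration (the kernel's LOOKUP_* constants differ), and the
statahead test is passed in as a boolean.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define SKETCH_LOOKUP_PARENT 0x1u
#define SKETCH_LOOKUP_OPEN   0x2u
#define SKETCH_LOOKUP_CREATE 0x4u
#define SKETCH_LOOKUP_RCU    0x8u

/*
 * Returns 0 to force a fresh lookup, 1 if the dentry can be used as
 * is, or -ECHILD to ask the VFS to retry outside RCU-walk mode.
 */
int sketch_revalidate(unsigned int flags, bool need_statahead)
{
	const unsigned int open_create =
		SKETCH_LOOKUP_OPEN | SKETCH_LOOKUP_CREATE;

	/* open+create must go to the server so the file gets created */
	if ((flags & open_create) == open_create)
		return 0;

	/* no statahead for parent, open or create lookups */
	if (flags & (SKETCH_LOOKUP_PARENT | open_create))
		return 1;

	if (!need_statahead)
		return 1;

	/* entering statahead may block, which RCU-walk does not allow */
	if (flags & SKETCH_LOOKUP_RCU)
		return -ECHILD;

	return 1;
}
```

No branch sends an RPC: the cached dentry is trusted unless the caller
is doing open+create, and the only other outcome is a request to leave
RCU-walk before statahead is entered.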

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index a55eebf..5f5b0ba 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2112,7 +2112,7 @@ extern void lustre_swab_generic_32s (__u32 *val);
#define DISP_LOOKUP_POS 0x00000008
#define DISP_OPEN_CREATE 0x00000010
#define DISP_OPEN_OPEN 0x00000020
-#define DISP_ENQ_COMPLETE 0x00400000
+#define DISP_ENQ_COMPLETE 0x00400000 /* obsolete and unused */
#define DISP_ENQ_OPEN_REF 0x00800000
#define DISP_ENQ_CREATE_REF 0x01000000
#define DISP_OPEN_LOCK 0x02000000
diff --git a/drivers/staging/lustre/lustre/llite/dcache.c b/drivers/staging/lustre/lustre/llite/dcache.c
index 3907c87..f971a54 100644
--- a/drivers/staging/lustre/lustre/llite/dcache.c
+++ b/drivers/staging/lustre/lustre/llite/dcache.c
@@ -241,9 +241,6 @@ void ll_intent_release(struct lookup_intent *it)
ptlrpc_req_finished(it->d.lustre.it_data); /* ll_file_open */
if (it_disposition(it, DISP_ENQ_CREATE_REF)) /* create rec */
ptlrpc_req_finished(it->d.lustre.it_data);
- if (it_disposition(it, DISP_ENQ_COMPLETE)) /* saved req from revalidate
- * to lookup */
- ptlrpc_req_finished(it->d.lustre.it_data);

it->d.lustre.it_disposition = 0;
it->d.lustre.it_data = NULL;
@@ -328,262 +325,32 @@ void ll_frob_intent(struct lookup_intent **itp, struct lookup_intent *deft)

}

-int ll_revalidate_it(struct dentry *de, int lookup_flags,
- struct lookup_intent *it)
+static int ll_revalidate_dentry(struct dentry *dentry,
+ unsigned int lookup_flags)
{
- struct md_op_data *op_data;
- struct ptlrpc_request *req = NULL;
- struct lookup_intent lookup_it = { .it_op = IT_LOOKUP };
- struct obd_export *exp;
- struct inode *parent = de->d_parent->d_inode;
- int rc;
-
- CDEBUG(D_VFSTRACE, "VFS Op:name=%s,intent=%s\n", de->d_name.name,
- LL_IT2STR(it));
-
- LASSERT(de != de->d_sb->s_root);
-
- if (de->d_inode == NULL) {
- __u64 ibits;
-
- /* We can only use negative dentries if this is stat or lookup,
- for opens and stuff we do need to query server. */
- /* If there is IT_CREAT in intent op set, then we must throw
- away this negative dentry and actually do the request to
- kernel to create whatever needs to be created (if possible)*/
- if (it && (it->it_op & IT_CREAT))
- return 0;
+ struct inode *dir = dentry->d_parent->d_inode;

- if (d_lustre_invalid(de))
- return 0;
-
- ibits = MDS_INODELOCK_UPDATE;
- rc = ll_have_md_lock(parent, &ibits, LCK_MINMODE);
- GOTO(out_sa, rc);
- }
-
- /* Never execute intents for mount points.
- * Attributes will be fixed up in ll_inode_revalidate_it */
- if (d_mountpoint(de))
- GOTO(out_sa, rc = 1);
-
- exp = ll_i2mdexp(de->d_inode);
-
- OBD_FAIL_TIMEOUT(OBD_FAIL_MDC_REVALIDATE_PAUSE, 5);
- ll_frob_intent(&it, &lookup_it);
- LASSERT(it);
+ /*
+ * if open&create is set, talk to MDS to make sure file is created if
+ * necessary, because we can't do this in ->open() later since that's
+ * called on an inode. return 0 here to let lookup to handle this.
+ */
+ if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
+ (LOOKUP_OPEN | LOOKUP_CREATE))
+ return 0;

- if (it->it_op == IT_LOOKUP && !d_lustre_invalid(de))
+ if (lookup_flags & (LOOKUP_PARENT | LOOKUP_OPEN | LOOKUP_CREATE))
return 1;

- if (it->it_op == IT_OPEN) {
- struct inode *inode = de->d_inode;
- struct ll_inode_info *lli = ll_i2info(inode);
- struct obd_client_handle **och_p;
- __u64 ibits;
-
- /*
- * We used to check for MDS_INODELOCK_OPEN here, but in fact
- * just having LOOKUP lock is enough to justify inode is the
- * same. And if inode is the same and we have suitable
- * openhandle, then there is no point in doing another OPEN RPC
- * just to throw away newly received openhandle. There are no
- * security implications too, if file owner or access mode is
- * change, LOOKUP lock is revoked.
- */
-
-
- if (it->it_flags & FMODE_WRITE)
- och_p = &lli->lli_mds_write_och;
- else if (it->it_flags & FMODE_EXEC)
- och_p = &lli->lli_mds_exec_och;
- else
- och_p = &lli->lli_mds_read_och;
-
- /* Check for the proper lock. */
- ibits = MDS_INODELOCK_LOOKUP;
- if (!ll_have_md_lock(inode, &ibits, LCK_MINMODE))
- goto do_lock;
- mutex_lock(&lli->lli_och_mutex);
- if (*och_p) { /* Everything is open already, do nothing */
- /* Originally it was idea to do not let them steal our
- * open handle from under us by (*och_usecount)++ here.
- * But in case we have the handle, but we cannot use it
- * due to later checks (e.g. O_CREAT|O_EXCL flags set),
- * nobody would decrement counter increased here. So we
- * just hope the lock won't be invalidated in between.
- * But if it would be, we'll reopen the open request to
- * MDS later during file open path.
- */
- mutex_unlock(&lli->lli_och_mutex);
- return 1;
- }
- mutex_unlock(&lli->lli_och_mutex);
- }
-
- if (it->it_op == IT_GETATTR) {
- rc = ll_statahead_enter(parent, &de, 0);
- if (rc == 1)
- goto mark;
- else if (rc != -EAGAIN && rc != 0)
- GOTO(out, rc = 0);
- }
-
-do_lock:
- op_data = ll_prep_md_op_data(NULL, parent, de->d_inode,
- de->d_name.name, de->d_name.len,
- 0, LUSTRE_OPC_ANY, NULL);
- if (IS_ERR(op_data))
- return PTR_ERR(op_data);
-
- if (!IS_POSIXACL(parent) || !exp_connect_umask(exp))
- it->it_create_mode &= ~current_umask();
- it->it_create_mode |= M_CHECK_STALE;
- rc = md_intent_lock(exp, op_data, NULL, 0, it,
- lookup_flags,
- &req, ll_md_blocking_ast, 0);
- it->it_create_mode &= ~M_CHECK_STALE;
- ll_finish_md_op_data(op_data);
-
- /* If req is NULL, then md_intent_lock only tried to do a lock match;
- * if all was well, it will return 1 if it found locks, 0 otherwise. */
- if (req == NULL && rc >= 0) {
- if (!rc)
- goto do_lookup;
- GOTO(out, rc);
- }
-
- if (rc < 0) {
- if (rc != -ESTALE) {
- CDEBUG(D_INFO, "ll_intent_lock: rc %d : it->it_status "
- "%d\n", rc, it->d.lustre.it_status);
- }
- GOTO(out, rc = 0);
- }
-
-revalidate_finish:
- rc = ll_revalidate_it_finish(req, it, de);
- if (rc != 0) {
- if (rc != -ESTALE && rc != -ENOENT)
- ll_intent_release(it);
- GOTO(out, rc = 0);
- }
-
- if ((it->it_op & IT_OPEN) && de->d_inode &&
- !S_ISREG(de->d_inode->i_mode) &&
- !S_ISDIR(de->d_inode->i_mode)) {
- ll_release_openhandle(de, it);
- }
- rc = 1;
-
-out:
- /* We do not free request as it may be reused during following lookup
- * (see comment in mdc/mdc_locks.c::mdc_intent_lock()), request will
- * be freed in ll_lookup_it or in ll_intent_release. But if
- * request was not completed, we need to free it. (bug 5154, 9903) */
- if (req != NULL && !it_disposition(it, DISP_ENQ_COMPLETE))
- ptlrpc_req_finished(req);
- if (rc == 0) {
- /* mdt may grant layout lock for the newly created file, so
- * release the lock to avoid leaking */
- ll_intent_drop_lock(it);
- ll_invalidate_aliases(de->d_inode);
- } else {
- __u64 bits = 0;
- __u64 matched_bits = 0;
-
- CDEBUG(D_DENTRY, "revalidated dentry %.*s (%p) parent %p "
- "inode %p refc %d\n", de->d_name.len,
- de->d_name.name, de, de->d_parent, de->d_inode,
- d_count(de));
-
- ll_set_lock_data(exp, de->d_inode, it, &bits);
-
- /* Note: We have to match both LOOKUP and PERM lock
- * here to make sure the dentry is valid and no one
- * changing the permission.
- * But if the client connects < 2.4 server, which will
- * only grant LOOKUP lock, so we can only Match LOOKUP
- * lock for old server */
- if (exp_connect_flags(ll_i2mdexp(de->d_inode)) &&
- OBD_CONNECT_LVB_TYPE)
- matched_bits =
- MDS_INODELOCK_LOOKUP | MDS_INODELOCK_PERM;
- else
- matched_bits = MDS_INODELOCK_LOOKUP;
-
- if (((bits & matched_bits) == matched_bits) &&
- d_lustre_invalid(de))
- d_lustre_revalidate(de);
- ll_lookup_finish_locks(it, de);
- }
-
-mark:
- if (it != NULL && it->it_op == IT_GETATTR && rc > 0)
- ll_statahead_mark(parent, de);
- return rc;
+ if (d_need_statahead(dir, dentry) <= 0)
+ return 1;

- /*
- * This part is here to combat evil-evil race in real_lookup on 2.6
- * kernels. The race details are: We enter do_lookup() looking for some
- * name, there is nothing in dcache for this name yet and d_lookup()
- * returns NULL. We proceed to real_lookup(), and while we do this,
- * another process does open on the same file we looking up (most simple
- * reproducer), open succeeds and the dentry is added. Now back to
- * us. In real_lookup() we do d_lookup() again and suddenly find the
- * dentry, so we call d_revalidate on it, but there is no lock, so
- * without this code we would return 0, but unpatched real_lookup just
- * returns -ENOENT in such a case instead of retrying the lookup. Once
- * this is dealt with in real_lookup(), all of this ugly mess can go and
- * we can just check locks in ->d_revalidate without doing any RPCs
- * ever.
- */
-do_lookup:
- if (it != &lookup_it) {
- /* MDS_INODELOCK_UPDATE needed for IT_GETATTR case. */
- if (it->it_op == IT_GETATTR)
- lookup_it.it_op = IT_GETATTR;
- ll_lookup_finish_locks(it, de);
- it = &lookup_it;
- }
+ if (lookup_flags & LOOKUP_RCU)
+ return -ECHILD;

- /* Do real lookup here. */
- op_data = ll_prep_md_op_data(NULL, parent, NULL, de->d_name.name,
- de->d_name.len, 0, (it->it_op & IT_CREAT ?
- LUSTRE_OPC_CREATE :
- LUSTRE_OPC_ANY), NULL);
- if (IS_ERR(op_data))
- return PTR_ERR(op_data);
-
- rc = md_intent_lock(exp, op_data, NULL, 0, it, 0, &req,
- ll_md_blocking_ast, 0);
- if (rc >= 0) {
- struct mdt_body *mdt_body;
- struct lu_fid fid = {.f_seq = 0, .f_oid = 0, .f_ver = 0};
- mdt_body = req_capsule_server_get(&req->rq_pill, &RMF_MDT_BODY);
-
- if (de->d_inode)
- fid = *ll_inode2fid(de->d_inode);
-
- /* see if we got same inode, if not - return error */
- if (lu_fid_eq(&fid, &mdt_body->fid1)) {
- ll_finish_md_op_data(op_data);
- op_data = NULL;
- goto revalidate_finish;
- }
- ll_intent_release(it);
- }
- ll_finish_md_op_data(op_data);
- GOTO(out, rc = 0);
-
-out_sa:
- /*
- * For rc == 1 case, should not return directly to prevent losing
- * statahead windows; for rc == 0 case, the "lookup" will be done later.
- */
- if (it != NULL && it->it_op == IT_GETATTR && rc == 1)
- ll_statahead_enter(parent, &de, 1);
- goto mark;
+ do_statahead_enter(dir, &dentry, dentry->d_inode == NULL);
+ ll_statahead_mark(dir, dentry);
+ return 1;
}

/*
@@ -591,24 +358,13 @@ out_sa:
*/
int ll_revalidate_nd(struct dentry *dentry, unsigned int flags)
{
- struct inode *parent = dentry->d_parent->d_inode;
- int unplug = 0;
+ int rc;

- CDEBUG(D_VFSTRACE, "VFS Op:name=%s,flags=%u\n",
+ CDEBUG(D_VFSTRACE, "VFS Op:name=%s, flags=%u\n",
dentry->d_name.name, flags);

- if (!(flags & (LOOKUP_PARENT|LOOKUP_OPEN|LOOKUP_CREATE)) &&
- ll_need_statahead(parent, dentry) > 0) {
- if (flags & LOOKUP_RCU)
- return -ECHILD;
-
- if (dentry->d_inode == NULL)
- unplug = 1;
- do_statahead_enter(parent, &dentry, unplug);
- ll_statahead_mark(parent, dentry);
- }
-
- return 1;
+ rc = ll_revalidate_dentry(dentry, flags);
+ return rc;
}


diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 7ceec74..70b48ab 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -446,8 +446,7 @@ static int ll_intent_file_open(struct file *file, void *lmm,
itp, NULL);

out:
- ptlrpc_req_finished(itp->d.lustre.it_data);
- it_clear_disposition(itp, DISP_ENQ_COMPLETE);
+ ptlrpc_req_finished(req);
ll_intent_drop_lock(itp);

return rc;
@@ -815,10 +814,7 @@ struct obd_client_handle *ll_lease_open(struct inode *inode, struct file *file,
* doesn't deal with openhandle, so normal openhandle will be leaked. */
LDLM_FL_NO_LRU | LDLM_FL_EXCL);
ll_finish_md_op_data(op_data);
- if (req != NULL) {
- ptlrpc_req_finished(req);
- it_clear_disposition(&it, DISP_ENQ_COMPLETE);
- }
+ ptlrpc_req_finished(req);
if (rc < 0)
GOTO(out_release_it, rc);

diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index 47c5142..f67c508 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -1309,7 +1309,7 @@ ll_statahead_mark(struct inode *dir, struct dentry *dentry)
}

static inline int
-ll_need_statahead(struct inode *dir, struct dentry *dentryp)
+d_need_statahead(struct inode *dir, struct dentry *dentryp)
{
struct ll_inode_info *lli;
struct ll_dentry_data *ldd;
@@ -1354,7 +1354,7 @@ ll_statahead_enter(struct inode *dir, struct dentry **dentryp, int only_unplug)
{
int ret;

- ret = ll_need_statahead(dir, *dentryp);
+ ret = d_need_statahead(dir, *dentryp);
if (ret <= 0)
return ret;

diff --git a/drivers/staging/lustre/lustre/lmv/lmv_intent.c b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
index 56dedce..9ba5a0a 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_intent.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_intent.c
@@ -119,7 +119,6 @@ static int lmv_intent_remote(struct obd_export *exp, void *lmm,
CDEBUG(D_INODE, "REMOTE_INTENT with fid="DFID" -> mds #%d\n",
PFID(&body->fid1), tgt->ltd_idx);

- it->d.lustre.it_disposition &= ~DISP_ENQ_COMPLETE;
rc = md_intent_lock(tgt->ltd_exp, op_data, lmm, lmmsize, it,
flags, &req, cb_blocking, extra_lock_flags);
if (rc)
diff --git a/drivers/staging/lustre/lustre/lmv/lmv_obd.c b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
index 40fbd44..3ba0a0a 100644
--- a/drivers/staging/lustre/lustre/lmv/lmv_obd.c
+++ b/drivers/staging/lustre/lustre/lmv/lmv_obd.c
@@ -1744,7 +1744,6 @@ lmv_enqueue_remote(struct obd_export *exp, struct ldlm_enqueue_info *einfo,
it->d.lustre.it_data = NULL;
fid1 = body->fid1;

- it->d.lustre.it_disposition &= ~DISP_ENQ_COMPLETE;
ptlrpc_req_finished(req);

tgt = lmv_find_target(lmv, &fid1);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 20706e7..81adc2b 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -968,7 +968,6 @@ static int mdc_finish_intent_lock(struct obd_export *exp,
if (fid_is_sane(&op_data->op_fid2) &&
it->it_create_mode & M_CHECK_STALE &&
it->it_op != IT_GETATTR) {
- it_set_disposition(it, DISP_ENQ_COMPLETE);

/* Also: did we find the same inode? */
/* sever can return one of two fids:
@@ -1139,6 +1138,12 @@ int mdc_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
ldlm_blocking_callback cb_blocking,
__u64 extra_lock_flags)
{
+ struct ldlm_enqueue_info einfo = {
+ .ei_type = LDLM_IBITS,
+ .ei_mode = it_to_lock_mode(it),
+ .ei_cb_bl = cb_blocking,
+ .ei_cb_cp = ldlm_completion_ast,
+ };
struct lustre_handle lockh;
int rc = 0;

@@ -1164,42 +1169,19 @@ int mdc_intent_lock(struct obd_export *exp, struct md_op_data *op_data,
return rc;
}

- /* lookup_it may be called only after revalidate_it has run, because
- * revalidate_it cannot return errors, only zero. Returning zero causes
- * this call to lookup, which *can* return an error.
- *
- * We only want to execute the request associated with the intent one
- * time, however, so don't send the request again. Instead, skip past
- * this and use the request from revalidate. In this case, revalidate
- * never dropped its reference, so the refcounts are all OK */
- if (!it_disposition(it, DISP_ENQ_COMPLETE)) {
- struct ldlm_enqueue_info einfo = {
- .ei_type = LDLM_IBITS,
- .ei_mode = it_to_lock_mode(it),
- .ei_cb_bl = cb_blocking,
- .ei_cb_cp = ldlm_completion_ast,
- };
-
- /* For case if upper layer did not alloc fid, do it now. */
- if (!fid_is_sane(&op_data->op_fid2) && it->it_op & IT_CREAT) {
- rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
- if (rc < 0) {
- CERROR("Can't alloc new fid, rc %d\n", rc);
- return rc;
- }
- }
- rc = mdc_enqueue(exp, &einfo, it, op_data, &lockh,
- lmm, lmmsize, NULL, extra_lock_flags);
- if (rc < 0)
+ /* For case if upper layer did not alloc fid, do it now. */
+ if (!fid_is_sane(&op_data->op_fid2) && it->it_op & IT_CREAT) {
+ rc = mdc_fid_alloc(exp, &op_data->op_fid2, op_data);
+ if (rc < 0) {
+ CERROR("Can't alloc new fid, rc %d\n", rc);
return rc;
- } else if (!fid_is_sane(&op_data->op_fid2) ||
- !(it->it_create_mode & M_CHECK_STALE)) {
- /* DISP_ENQ_COMPLETE set means there is extra reference on
- * request referenced from this intent, saved for subsequent
- * lookup. This path is executed when we proceed to this
- * lookup, so we clear DISP_ENQ_COMPLETE */
- it_clear_disposition(it, DISP_ENQ_COMPLETE);
+ }
}
+ rc = mdc_enqueue(exp, &einfo, it, op_data, &lockh, lmm, lmmsize, NULL,
+ extra_lock_flags);
+ if (rc < 0)
+ return rc;
+
*reqp = it->d.lustre.it_data;
rc = mdc_finish_intent_lock(exp, *reqp, op_data, it, &lockh);
return rc;
--
1.8.5.3

2014-03-01 02:18:58

by Oleg Drokin

Subject: [PATCH 12/17] lustre/ptlrpc: skip rpcs that fail ptl_send_rpc

From: Peng Tao <[email protected]>

ptl_send_rpc is not dealing with -ENOMEM in some
situations. When ptl_send_rpc fails we need to
set the error and skip further processing of the
request; otherwise we trigger an LBUG.
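
The intended control flow can be sketched in plain userspace C as below.
This is only an illustration under stated assumptions, not the ptlrpc code:
struct fake_req, fake_send() and check_set() are hypothetical names, and
the "processed" flag stands in for whatever post-send handling would
otherwise run on a request that was never sent.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a request in a set (not struct ptlrpc_request). */
struct fake_req {
	int send_rc;	/* result the pretend send will report */
	int net_err;	/* set when the send failed */
	int processed;	/* set when post-send processing ran */
};

/* Pretend send: simply reports the preconfigured result. */
static int fake_send(const struct fake_req *req)
{
	return req->send_rc;
}

/* Walk the set; returns the number of requests that were skipped. */
size_t check_set(struct fake_req *reqs, size_t n)
{
	size_t skipped = 0;
	size_t i;

	assert(reqs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct fake_req *req = &reqs[i];

		if (fake_send(req) != 0) {
			req->net_err = 1;
			skipped++;
			/* Do not fall through to post-send processing. */
			continue;
		}
		req->processed = 1;
	}
	return skipped;
}
```

Without the continue, the failed request would reach the same post-send
code as a request that was actually sent, which is the situation the
patch avoids.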

Signed-off-by: Keith Mannthey <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
Reviewed-on: http://review.whamcloud.com/7411
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3698
Reviewed-by: Mike Pershin <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/ptlrpc/client.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index b6d831a..98041e8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1692,6 +1692,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
spin_lock(&req->rq_lock);
req->rq_net_err = 1;
spin_unlock(&req->rq_lock);
+ continue;
}
/* need to reset the timeout */
force_timer_recalc = 1;
--
1.8.5.3

2014-03-01 02:19:35

by Oleg Drokin

Subject: [PATCH 10/17] lustre/ldlm: set l_lvb_type coherent when layout is returned

From: Bruno Faccini <[email protected]>

If a layout has been packed into the server reply without having been
requested, the lock's l_lvb_type must be set accordingly.
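
As a rough illustration of why the type tag should be written together
with the payload, here is a small userspace sketch. All names in it
(struct fake_lock, enum fake_lvb_type, fake_install_layout(),
fake_layout_len()) are hypothetical and only mimic the shape of the
change; how the real code consumes l_lvb_type is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag values; not the real LVB type enum. */
enum fake_lvb_type { FAKE_LVB_NONE = 0, FAKE_LVB_LAYOUT = 1 };

/* Hypothetical lock carrying a tagged payload. */
struct fake_lock {
	enum fake_lvb_type lvb_type;
	const void *lvb_data;
	size_t lvb_len;
};

/* Install a layout payload; returns 1 if installed, 0 if one was present. */
int fake_install_layout(struct fake_lock *lock, const void *data, size_t len)
{
	assert(lock != NULL);
	if (lock->lvb_data != NULL)
		return 0;
	/* Set the tag together with the data so readers see both or neither. */
	lock->lvb_type = FAKE_LVB_LAYOUT;
	lock->lvb_data = data;
	lock->lvb_len = len;
	return 1;
}

/* A reader that trusts the tag: reports a length only for layout payloads. */
size_t fake_layout_len(const struct fake_lock *lock)
{
	if (lock->lvb_type != FAKE_LVB_LAYOUT || lock->lvb_data == NULL)
		return 0;
	return lock->lvb_len;
}
```

In this sketch a reader that checks the tag would ignore a payload whose
type was never set, even though the data pointer is valid.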

Signed-off-by: Bruno Faccini <[email protected]>
Reviewed-on: http://review.whamcloud.com/8270
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4194
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Johann Lombardi <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 1 +
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
index 3ed020e..d87048d 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
@@ -228,6 +228,7 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req,

lock_res_and_lock(lock);
LASSERT(lock->l_lvb_data == NULL);
+ lock->l_lvb_type = LVB_T_LAYOUT;
lock->l_lvb_data = lvb_data;
lock->l_lvb_len = lvb_len;
unlock_res_and_lock(lock);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 81adc2b..b0d0e2a 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -753,6 +753,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
/* install lvb_data */
lock_res_and_lock(lock);
if (lock->l_lvb_data == NULL) {
+ lock->l_lvb_type = LVB_T_LAYOUT;
lock->l_lvb_data = lmm;
lock->l_lvb_len = lvb_len;
lmm = NULL;
--
1.8.5.3

2014-03-01 02:19:58

by Oleg Drokin

Subject: [PATCH 07/17] lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c

From: "John L. Hammond" <[email protected]>

In mdc_intent_open_pack() return an ERR_PTR() rather than NULL when
ldlm_prep_enqueue_req() fails. In mdc_intent_getattr_async() check the
return value of mdc_intent_getattr_pack() using IS_ERR(). Clean up the
includes in mdc_locks.c.
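
The convention being restored here can be sketched in userspace C. The
helpers err_ptr(), ptr_err() and is_err() below are local imitations of
the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), written for illustration
only; struct fake_request, request_pack() and request_send() are
hypothetical names, and prep_rc merely simulates the result of a
preparation step.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel helpers, for illustration only. */
static void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical request type. */
struct fake_request {
	int opcode;
};

/* Producer: encodes every failure as an error pointer, never as NULL. */
struct fake_request *request_pack(int prep_rc)
{
	struct fake_request *req = malloc(sizeof(*req));

	if (req == NULL)
		return err_ptr(-ENOMEM);
	if (prep_rc < 0) {
		free(req);
		return err_ptr(prep_rc);
	}
	req->opcode = 1;
	return req;
}

/* Consumer: checks with is_err() and propagates the original error code. */
int request_send(int prep_rc)
{
	struct fake_request *req = request_pack(prep_rc);

	if (is_err(req))
		return (int)ptr_err(req);
	free(req);
	return 0;
}
```

The point of the sketch is that producer and consumer must agree: a
NULL check does not catch an error pointer, and is_err() does not
catch NULL.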

Signed-off-by: John L. Hammond <[email protected]>
Reviewed-on: http://review.whamcloud.com/7886
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4078
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Nathaniel Clark <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 6ef9e28..6110943 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -37,15 +37,15 @@
#define DEBUG_SUBSYSTEM S_MDC

# include <linux/module.h>
-# include <linux/pagemap.h>
-# include <linux/miscdevice.h>

-#include <lustre_acl.h>
+#include <linux/lustre_intent.h>
+#include <obd.h>
#include <obd_class.h>
#include <lustre_dlm.h>
-/* fid_res_name_eq() */
-#include <lustre_fid.h>
-#include <lprocfs_status.h>
+#include <lustre_fid.h> /* fid_res_name_eq() */
+#include <lustre_mdc.h>
+#include <lustre_net.h>
+#include <lustre_req_layout.h>
#include "mdc_internal.h"

struct mdc_getattr_args {
@@ -336,9 +336,9 @@ static struct ptlrpc_request *mdc_intent_open_pack(struct obd_export *exp,
max(lmmsize, obddev->u.cli.cl_default_mds_easize));

rc = ldlm_prep_enqueue_req(exp, req, &cancels, count);
- if (rc) {
+ if (rc < 0) {
ptlrpc_request_free(req);
- return NULL;
+ return ERR_PTR(rc);
}

spin_lock(&req->rq_lock);
@@ -1281,8 +1281,8 @@ int mdc_intent_getattr_async(struct obd_export *exp,

fid_build_reg_res_name(&op_data->op_fid1, &res_id);
req = mdc_intent_getattr_pack(exp, it, op_data);
- if (!req)
- return -ENOMEM;
+ if (IS_ERR(req))
+ return PTR_ERR(req);

rc = mdc_enter_request(&obddev->u.cli);
if (rc != 0) {
--
1.8.5.3

2014-03-01 02:20:26

by Oleg Drokin

Subject: [PATCH 05/17] lustre/mdc: use ibits_known mask for lock match

From: Alexey Lyashkov <[email protected]>

Before revalidating a lock on the client, mask the lock bits against
the lock bits supported by the server (ibits_known), so newer clients
will find valid locks given by older server versions.
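
A minimal sketch of the masking idea, in userspace C. The bit values are
the ones listed in lustre_idl.h in this series; ibits_match() and
ibits_match_known() are hypothetical helpers, and the rule that a lock
matches only when it covers every requested bit is an assumption made
for the illustration. The set of bits an "old" server knows is also
made up.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as listed in lustre_idl.h in this series. */
#define MDS_INODELOCK_LOOKUP 0x000001
#define MDS_INODELOCK_UPDATE 0x000002
#define MDS_INODELOCK_OPEN   0x000004
#define MDS_INODELOCK_LAYOUT 0x000008
#define MDS_INODELOCK_PERM   0x000010
#define MDS_INODELOCK_XATTR  0x000020

/* Assumed rule: a lock matches only if it covers every wanted bit. */
int ibits_match(uint64_t lock_bits, uint64_t wanted)
{
	return (lock_bits & wanted) == wanted;
}

/* Clear the bits the server does not know about before matching. */
int ibits_match_known(uint64_t lock_bits, uint64_t wanted,
		      uint64_t ibits_known)
{
	wanted &= ibits_known;
	return ibits_match(lock_bits, wanted);
}
```

With a server that does not know the PERM bit, a lock granted with
LOOKUP | UPDATE can never cover a request for UPDATE | LOOKUP | PERM
unless the unknown bit is masked off first.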

Signed-off-by: Patrick Farrell <[email protected]>
Signed-off-by: Alexey Lyashkov <[email protected]>
Reviewed-on: http://review.whamcloud.com/8636
Xyratex-bug-id: MRP-1583
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4405
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/include/lustre_export.h | 8 ++++++++
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 8 +++++---
2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h b/drivers/staging/lustre/lustre/include/lustre_export.h
index 2feb38b..82a230b 100644
--- a/drivers/staging/lustre/lustre/include/lustre_export.h
+++ b/drivers/staging/lustre/lustre/include/lustre_export.h
@@ -380,6 +380,14 @@ static inline bool imp_connect_lvb_type(struct obd_import *imp)
return false;
}

+static inline __u64 exp_connect_ibits(struct obd_export *exp)
+{
+ struct obd_connect_data *ocd;
+
+ ocd = &exp->exp_connect_data;
+ return ocd->ocd_ibits_known;
+}
+
extern struct obd_export *class_conn2export(struct lustre_handle *conn);
extern struct obd_device *class_conn2obd(struct lustre_handle *conn);

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index d9017a5..6ef9e28 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -160,6 +160,8 @@ ldlm_mode_t mdc_lock_match(struct obd_export *exp, __u64 flags,
ldlm_mode_t rc;

fid_build_reg_res_name(fid, &res_id);
+ /* LU-4405: Clear bits not supported by server */
+ policy->l_inodebits.bits &= exp_connect_ibits(exp);
rc = ldlm_lock_match(class_exp2obd(exp)->obd_namespace, flags,
&res_id, type, policy, mode, lockh, 0);
return rc;
@@ -1087,10 +1089,10 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
break;
}

- mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
- LDLM_FL_BLOCK_GRANTED, &res_id,
+ mode = mdc_lock_match(exp, LDLM_FL_BLOCK_GRANTED, fid,
LDLM_IBITS, &policy,
- LCK_CR|LCK_CW|LCK_PR|LCK_PW, &lockh, 0);
+ LCK_CR | LCK_CW | LCK_PR | LCK_PW,
+ &lockh);
}

if (mode) {
--
1.8.5.3

2014-03-01 02:17:17

by Oleg Drokin

Subject: [PATCH 03/17] lustre/llite: Do not send parent dir fid in getattr by fid

Sending getattr by fid in this case is pointless, as the parent
might have long changed and we have no control over it, but it's
irrelevant anyway, since we already have the child fid.

Signed-off-by: Oleg Drokin <[email protected]>
Reviewed-on: http://review.whamcloud.com/7910
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: wangdi <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
---
drivers/staging/lustre/lustre/llite/dir.c | 2 +-
drivers/staging/lustre/lustre/llite/file.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index fd0dd20e..7fbc18e 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -362,7 +362,7 @@ struct page *ll_get_dir_page(struct inode *dir, __u64 hash,
struct ptlrpc_request *request;
struct md_op_data *op_data;

- op_data = ll_prep_md_op_data(NULL, dir, NULL, NULL, 0, 0,
+ op_data = ll_prep_md_op_data(NULL, dir, dir, NULL, 0, 0,
LUSTRE_OPC_ANY, NULL);
if (IS_ERR(op_data))
return (void *)op_data;
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index c9ee574..36c54aa 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2891,7 +2891,7 @@ int __ll_inode_revalidate_it(struct dentry *dentry, struct lookup_intent *it,
oit.it_op = IT_LOOKUP;

/* Call getattr by fid, so do not provide name at all. */
- op_data = ll_prep_md_op_data(NULL, dentry->d_parent->d_inode,
+ op_data = ll_prep_md_op_data(NULL, dentry->d_inode,
dentry->d_inode, NULL, 0, 0,
LUSTRE_OPC_ANY, NULL);
if (IS_ERR(op_data))
--
1.8.5.3

2014-03-01 02:17:16

by Oleg Drokin

Subject: [PATCH 02/17] lustre/mdc: Check for all attributes validity in revalidate

GETATTR needs to return attributes protected by different bits, so
we need to ensure we have locks with all of those bits, not
just the UPDATE bit.
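
To illustrate the limitation this implies, here is a small userspace
sketch. It assumes that a single cached lock must cover every requested
bit for a match to succeed; find_covering_lock() and GETATTR_BITS are
hypothetical names, while the bit values are the ones from lustre_idl.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as listed in lustre_idl.h. */
#define MDS_INODELOCK_LOOKUP 0x000001
#define MDS_INODELOCK_UPDATE 0x000002
#define MDS_INODELOCK_PERM   0x000010

/* The bits this patch asks for on IT_GETATTR. */
#define GETATTR_BITS (MDS_INODELOCK_UPDATE | MDS_INODELOCK_LOOKUP | \
		      MDS_INODELOCK_PERM)

/*
 * Return the index of the first lock whose bits cover all wanted bits,
 * or -1 if no single lock does. Bits spread over several locks do not
 * count as a match in this sketch.
 */
int find_covering_lock(const uint64_t *lock_bits, size_t n, uint64_t wanted)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((lock_bits[i] & wanted) == wanted)
			return (int)i;
	return -1;
}
```

When the three bits are split across two locks, the search fails even
though the client holds all of them, and under that assumption an extra
RPC is needed to fetch them together.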

Signed-off-by: Alexey Lyashkov <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
Reviewed-on: http://review.whamcloud.com/6460
Xyratex-bug-id: MRP-1052
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Dmitry Eremin <[email protected]>
Reviewed-by: wangdi <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
---
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 288a41e..1336d47 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1061,7 +1061,20 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
fid_build_reg_res_name(fid, &res_id);
switch (it->it_op) {
case IT_GETATTR:
- policy.l_inodebits.bits = MDS_INODELOCK_UPDATE;
+ /* File attributes are held under multiple bits:
+ * nlink is under lookup lock, size and times are
+ * under UPDATE lock and recently we've also got
+ * a separate permissions lock for owner/group/acl that
+ * were protected by lookup lock before.
+ * Getattr must provide all of that information,
+ * so we need to ensure we have all of those locks.
+ * Unfortunately, if the bits are split across multiple
+ * locks, there's no easy way to match all of them here,
+ * so an extra RPC would be performed to fetch all
+ * of those bits at once for now. */
+ policy.l_inodebits.bits = MDS_INODELOCK_UPDATE |
+ MDS_INODELOCK_LOOKUP |
+ MDS_INODELOCK_PERM;
break;
case IT_LAYOUT:
policy.l_inodebits.bits = MDS_INODELOCK_LAYOUT;
@@ -1070,6 +1083,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
policy.l_inodebits.bits = MDS_INODELOCK_LOOKUP;
break;
}
+
mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
LDLM_FL_BLOCK_GRANTED, &res_id,
LDLM_IBITS, &policy,
--
1.8.5.3

2014-03-01 02:21:13

by Oleg Drokin

Subject: [PATCH 06/17] lustre/clio: honor O_NOATIME

From: "John L. Hammond" <[email protected]>

Add a ci_noatime bit to struct cl_io. In ll_io_init() set this bit if
O_NOATIME is set in f_flags. Ensure that this bit is propagated down
to lower layers. In osc_io_read_start() don't update atime if this bit
is set. Add sanity test 39n to check that passing O_NOATIME to open()
is honored.
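
The propagation described above can be sketched in userspace C as
follows. Everything here is hypothetical (the FAKE_* flag values,
struct fake_file, struct fake_io and the three functions); it only
mirrors the idea of deciding once at I/O setup time and consulting the
resulting bit when a read starts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; not the kernel's O_NOATIME or MNT_* values. */
#define FAKE_O_NOATIME		0x1u
#define FAKE_MNT_NOATIME	0x1u
#define FAKE_MNT_READONLY	0x2u
#define FAKE_MNT_NODIRATIME	0x4u

struct fake_file {
	unsigned int f_flags;	/* open flags */
	unsigned int mnt_flags;	/* mount flags */
	bool is_dir;
};

struct fake_io {
	bool noatime;	/* decided once when the I/O is set up */
	bool lockless;
};

/* Decide whether atime updates should be suppressed for this file. */
bool fake_file_is_noatime(const struct fake_file *file)
{
	if (file->f_flags & FAKE_O_NOATIME)
		return true;
	if (file->mnt_flags & (FAKE_MNT_NOATIME | FAKE_MNT_READONLY))
		return true;
	if ((file->mnt_flags & FAKE_MNT_NODIRATIME) && file->is_dir)
		return true;
	return false;
}

/* Record the decision in the I/O descriptor so lower layers can see it. */
void fake_io_init(struct fake_io *io, const struct fake_file *file)
{
	io->lockless = false;
	io->noatime = fake_file_is_noatime(file);
}

/* Update *atime only when allowed; returns true if it was updated. */
bool fake_read_start(const struct fake_io *io, long now, long *atime)
{
	if (io->lockless || io->noatime)
		return false;
	*atime = now;
	return true;
}
```

Deciding in one place keeps the lower layer simple: it checks one bit
instead of repeating the flag checks.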

Signed-off-by: John L. Hammond <[email protected]>
Reviewed-on: http://review.whamcloud.com/7442
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3832
Reviewed-by: Jinshan Xiong <[email protected]>
Reviewed-by: Lai Siyao <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
drivers/staging/lustre/lustre/include/cl_object.h | 6 ++++-
drivers/staging/lustre/lustre/llite/file.c | 29 +++++++++++++++++++++++
drivers/staging/lustre/lustre/lov/lov_io.c | 1 +
drivers/staging/lustre/lustre/osc/osc_io.c | 14 ++++-------
4 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 4d692dc..76e1b68 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -2392,7 +2392,11 @@ struct cl_io {
/**
* file is released, restore has to to be triggered by vvp layer
*/
- ci_restore_needed:1;
+ ci_restore_needed:1,
+ /**
+ * O_NOATIME
+ */
+ ci_noatime:1;
/**
* Number of pages owned by this IO. For invariant checking.
*/
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 36c54aa..362f5ec 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1035,6 +1035,33 @@ int ll_glimpse_ioctl(struct ll_sb_info *sbi, struct lov_stripe_md *lsm,
return rc;
}

+static bool file_is_noatime(const struct file *file)
+{
+ const struct vfsmount *mnt = file->f_path.mnt;
+ const struct inode *inode = file->f_path.dentry->d_inode;
+
+ /* Adapted from file_accessed() and touch_atime().*/
+ if (file->f_flags & O_NOATIME)
+ return true;
+
+ if (inode->i_flags & S_NOATIME)
+ return true;
+
+ if (IS_NOATIME(inode))
+ return true;
+
+ if (mnt->mnt_flags & (MNT_NOATIME | MNT_READONLY))
+ return true;
+
+ if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
+ return true;
+
+ if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
+ return true;
+
+ return false;
+}
+
void ll_io_init(struct cl_io *io, const struct file *file, int write)
{
struct inode *inode = file->f_dentry->d_inode;
@@ -1054,6 +1081,8 @@ void ll_io_init(struct cl_io *io, const struct file *file, int write)
} else if (file->f_flags & O_APPEND) {
io->ci_lockreq = CILR_MANDATORY;
}
+
+ io->ci_noatime = file_is_noatime(file);
}

static ssize_t
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 5a6ab70..65133ea 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -194,6 +194,7 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
sub_io->ci_lockreq = io->ci_lockreq;
sub_io->ci_type = io->ci_type;
sub_io->ci_no_srvlock = io->ci_no_srvlock;
+ sub_io->ci_noatime = io->ci_noatime;

lov_sub_enter(sub);
result = cl_io_sub_init(sub->sub_env, sub_io,
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 777ae24..5f3c545 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -512,19 +512,15 @@ static int osc_io_read_start(const struct lu_env *env,
struct osc_io *oio = cl2osc_io(env, slice);
struct cl_object *obj = slice->cis_obj;
struct cl_attr *attr = &osc_env_info(env)->oti_attr;
- int result = 0;
+ int rc = 0;

- if (oio->oi_lockless == 0) {
+ if (oio->oi_lockless == 0 && !slice->cis_io->ci_noatime) {
cl_object_attr_lock(obj);
- result = cl_object_attr_get(env, obj, attr);
- if (result == 0) {
- attr->cat_atime = LTIME_S(CURRENT_TIME);
- result = cl_object_attr_set(env, obj, attr,
- CAT_ATIME);
- }
+ attr->cat_atime = LTIME_S(CURRENT_TIME);
+ rc = cl_object_attr_set(env, obj, attr, CAT_ATIME);
cl_object_attr_unlock(obj);
}
- return result;
+ return rc;
}

static int osc_io_write_start(const struct lu_env *env,
--
1.8.5.3

2014-03-01 02:17:14

by Oleg Drokin

Subject: [PATCH 04/17] lustre/mdc: comments on LOOKUP and PERM lock

From: wang di <[email protected]>

Add more comments for MDS_INODELOCK_PERM and
MDS_INODELOCK_LOOKUP

Signed-off-by: wang di <[email protected]>
Reviewed-on: http://review.whamcloud.com/7937
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Andreas Dilger <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
Signed-off-by: Oleg Drokin <[email protected]>
---
.../lustre/lustre/include/lustre/lustre_idl.h | 24 ++++++++++++++++------
drivers/staging/lustre/lustre/mdc/mdc_locks.c | 3 +++
2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 4183a35..4c70c06 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2116,12 +2116,24 @@ extern void lustre_swab_generic_32s (__u32 *val);
#define DISP_OPEN_LEASE 0x04000000

/* INODE LOCK PARTS */
-#define MDS_INODELOCK_LOOKUP 0x000001 /* dentry, mode, owner, group */
-#define MDS_INODELOCK_UPDATE 0x000002 /* size, links, timestamps */
-#define MDS_INODELOCK_OPEN 0x000004 /* For opened files */
-#define MDS_INODELOCK_LAYOUT 0x000008 /* for layout */
-#define MDS_INODELOCK_PERM 0x000010 /* for permission */
-#define MDS_INODELOCK_XATTR 0x000020 /* extended attributes */
+#define MDS_INODELOCK_LOOKUP 0x000001 /* For namespace, dentry etc, and also
+ * was used to protect permission (mode,
+ * owner, group etc) before 2.4. */
+#define MDS_INODELOCK_UPDATE 0x000002 /* size, links, timestamps */
+#define MDS_INODELOCK_OPEN 0x000004 /* For opened files */
+#define MDS_INODELOCK_LAYOUT 0x000008 /* for layout */
+
+/* The PERM bit is added in 2.4, and it is used to protect permission(mode,
+ * owner, group, acl etc), so to separate the permission from LOOKUP lock.
+ * Because for remote directories(in DNE), these locks will be granted by
+ * different MDTs(different ldlm namespace).
+ *
+ * For local directory, MDT will always grant UPDATE_LOCK|PERM_LOCK together.
+ * For Remote directory, the master MDT, where the remote directory is, will
+ * grant UPDATE_LOCK|PERM_LOCK, and the remote MDT, where the name entry is,
+ * will grant LOOKUP_LOCK. */
+#define MDS_INODELOCK_PERM 0x000010
+#define MDS_INODELOCK_XATTR 0x000020 /* extended attributes */

#define MDS_INODELOCK_MAXSHIFT 5
/* This FULL lock is useful to take on unlink sort of operations */
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 1336d47..d9017a5 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1072,6 +1072,9 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
* locks, there's no easy way to match all of them here,
* so an extra RPC would be performed to fetch all
* of those bits at once for now. */
+ /* For new MDTs(> 2.4), UPDATE|PERM should be enough,
+ * but for old MDTs (< 2.4), permission is covered
+ * by LOOKUP lock, so it needs to match all bits here.*/
policy.l_inodebits.bits = MDS_INODELOCK_UPDATE |
MDS_INODELOCK_LOOKUP |
MDS_INODELOCK_PERM;
--
1.8.5.3