2008-07-11 11:01:34

by Steven Whitehouse

[permalink] [raw]
Subject: [GFS2] Pre-pull patch posting

So, although the merge window isn't yet open, I'm guessing that its
probably not too far away, hence this posting of the contents of
the current GFS2 -nmw git tree.

This time the big news is locking changes, although having said that,
there are far fewer queued patches in total than I've had for previous
merge windows and I believe that this is an indication of the
growing maturity of GFS2.

The first patch in the series is really the main change and is
a big clean up of the core of the glocks, which are really the
core of GFS2 in a lot of ways. Further through the series is
a documentation patch, which explains the fine detail of how
glocks work and the assumptions made by the glock core when
calling the functions relating to individual glock types.

Other notable changes include merging the lock_nolock module
into the core of GFS2 since there is little point in retaining
it separately. There is a plan to do the same to lock_dlm as
well in the future (not the DLM itself obviously, just the
interface module thats part of GFS2).

Most of the remaing changes are bug fixes or futher optimisations
over the initial glock changes, plus one or two minor clean ups
along the way.

Steve.


2008-07-11 11:01:52

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 01/18] [GFS2] Clean up the glock core

From: Steven Whitehouse <[email protected]>

This patch implements a number of cleanups to the core of the
GFS2 glock code. As a result a lot of code is removed. It looks
like a really big change, but actually a large part of this patch
is either removing or moving existing code.

There are some new bits too though, such as the new run_queue()
function which is considerably streamlined. Highlights of this
patch include:

o Fixes a cluster coherency bug during SH -> EX lock conversions
o Removes the "glmutex" code in favour of a single bit lock
o Removes the ->go_xmote_bh() for inodes since it was duplicating
->go_lock()
o We now only use the ->lm_lock() function for both locks and
unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
o The fast path is considerably shortly, giving performance gains
especially with lock_nolock
o The glock_workqueue is now used for all the callbacks from the DLM
which allows us to simplify the lock_dlm module (see following patch)
o The way is now open to make further changes such as eliminating the two
threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
scheme.

This patch has undergone extensive testing with various test suites
so it should be pretty stable by now.

Signed-off-by: Steven Whitehouse <[email protected]>
Cc: Bob Peterson <[email protected]>

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index d636b3e..519a54c 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -45,21 +45,19 @@ struct gfs2_gl_hash_bucket {
struct hlist_head hb_list;
};

-struct glock_iter {
- int hash; /* hash bucket index */
- struct gfs2_sbd *sdp; /* incore superblock */
- struct gfs2_glock *gl; /* current glock struct */
- struct seq_file *seq; /* sequence file for debugfs */
- char string[512]; /* scratch space */
+struct gfs2_glock_iter {
+ int hash; /* hash bucket index */
+ struct gfs2_sbd *sdp; /* incore superblock */
+ struct gfs2_glock *gl; /* current glock struct */
+ char string[512]; /* scratch space */
};

typedef void (*glock_examiner) (struct gfs2_glock * gl);

static int gfs2_dump_lockstate(struct gfs2_sbd *sdp);
-static int dump_glock(struct glock_iter *gi, struct gfs2_glock *gl);
-static void gfs2_glock_xmote_th(struct gfs2_glock *gl, struct gfs2_holder *gh);
-static void gfs2_glock_drop_th(struct gfs2_glock *gl);
-static void run_queue(struct gfs2_glock *gl);
+static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl);
+#define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { __dump_glock(NULL, gl); BUG(); } } while(0)
+static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);

static DECLARE_RWSEM(gfs2_umount_flush_sem);
static struct dentry *gfs2_root;
@@ -123,33 +121,6 @@ static inline rwlock_t *gl_lock_addr(unsigned int x)
#endif

/**
- * relaxed_state_ok - is a requested lock compatible with the current lock mode?
- * @actual: the current state of the lock
- * @requested: the lock state that was requested by the caller
- * @flags: the modifier flags passed in by the caller
- *
- * Returns: 1 if the locks are compatible, 0 otherwise
- */
-
-static inline int relaxed_state_ok(unsigned int actual, unsigned requested,
- int flags)
-{
- if (actual == requested)
- return 1;
-
- if (flags & GL_EXACT)
- return 0;
-
- if (actual == LM_ST_EXCLUSIVE && requested == LM_ST_SHARED)
- return 1;
-
- if (actual != LM_ST_UNLOCKED && (flags & LM_FLAG_ANY))
- return 1;
-
- return 0;
-}
-
-/**
* gl_hash() - Turn glock number into hash bucket number
* @lock: The glock number
*
@@ -211,17 +182,14 @@ static void gfs2_glock_hold(struct gfs2_glock *gl)
int gfs2_glock_put(struct gfs2_glock *gl)
{
int rv = 0;
- struct gfs2_sbd *sdp = gl->gl_sbd;

write_lock(gl_lock_addr(gl->gl_hash));
if (atomic_dec_and_test(&gl->gl_ref)) {
hlist_del(&gl->gl_list);
write_unlock(gl_lock_addr(gl->gl_hash));
- gfs2_assert(sdp, gl->gl_state == LM_ST_UNLOCKED);
- gfs2_assert(sdp, list_empty(&gl->gl_reclaim));
- gfs2_assert(sdp, list_empty(&gl->gl_holders));
- gfs2_assert(sdp, list_empty(&gl->gl_waiters1));
- gfs2_assert(sdp, list_empty(&gl->gl_waiters3));
+ GLOCK_BUG_ON(gl, gl->gl_state != LM_ST_UNLOCKED);
+ GLOCK_BUG_ON(gl, !list_empty(&gl->gl_reclaim));
+ GLOCK_BUG_ON(gl, !list_empty(&gl->gl_holders));
glock_free(gl);
rv = 1;
goto out;
@@ -281,16 +249,382 @@ static struct gfs2_glock *gfs2_glock_find(const struct gfs2_sbd *sdp,
return gl;
}

+/**
+ * may_grant - check if its ok to grant a new lock
+ * @gl: The glock
+ * @gh: The lock request which we wish to grant
+ *
+ * Returns: true if its ok to grant the lock
+ */
+
+static inline int may_grant(const struct gfs2_glock *gl, const struct gfs2_holder *gh)
+{
+ const struct gfs2_holder *gh_head = list_entry(gl->gl_holders.next, const struct gfs2_holder, gh_list);
+ if ((gh->gh_state == LM_ST_EXCLUSIVE ||
+ gh_head->gh_state == LM_ST_EXCLUSIVE) && gh != gh_head)
+ return 0;
+ if (gl->gl_state == gh->gh_state)
+ return 1;
+ if (gh->gh_flags & GL_EXACT)
+ return 0;
+ if (gh->gh_state == LM_ST_SHARED && gl->gl_state == LM_ST_EXCLUSIVE)
+ return 1;
+ if (gl->gl_state != LM_ST_UNLOCKED && (gh->gh_flags & LM_FLAG_ANY))
+ return 1;
+ return 0;
+}
+
+static void gfs2_holder_wake(struct gfs2_holder *gh)
+{
+ clear_bit(HIF_WAIT, &gh->gh_iflags);
+ smp_mb__after_clear_bit();
+ wake_up_bit(&gh->gh_iflags, HIF_WAIT);
+}
+
+/**
+ * do_promote - promote as many requests as possible on the current queue
+ * @gl: The glock
+ *
+ * Returns: true if there is a blocked holder at the head of the list
+ */
+
+static int do_promote(struct gfs2_glock *gl)
+{
+ const struct gfs2_glock_operations *glops = gl->gl_ops;
+ struct gfs2_holder *gh, *tmp;
+ int ret;
+
+restart:
+ list_for_each_entry_safe(gh, tmp, &gl->gl_holders, gh_list) {
+ if (test_bit(HIF_HOLDER, &gh->gh_iflags))
+ continue;
+ if (may_grant(gl, gh)) {
+ if (gh->gh_list.prev == &gl->gl_holders &&
+ glops->go_lock) {
+ spin_unlock(&gl->gl_spin);
+ /* FIXME: eliminate this eventually */
+ ret = glops->go_lock(gh);
+ spin_lock(&gl->gl_spin);
+ if (ret) {
+ gh->gh_error = ret;
+ list_del_init(&gh->gh_list);
+ gfs2_holder_wake(gh);
+ goto restart;
+ }
+ set_bit(HIF_HOLDER, &gh->gh_iflags);
+ gfs2_holder_wake(gh);
+ goto restart;
+ }
+ set_bit(HIF_HOLDER, &gh->gh_iflags);
+ gfs2_holder_wake(gh);
+ continue;
+ }
+ if (gh->gh_list.prev == &gl->gl_holders)
+ return 1;
+ break;
+ }
+ return 0;
+}
+
+/**
+ * do_error - Something unexpected has happened during a lock request
+ *
+ */
+
+static inline void do_error(struct gfs2_glock *gl, const int ret)
+{
+ struct gfs2_holder *gh, *tmp;
+
+ list_for_each_entry_safe(gh, tmp, &gl->gl_holders, gh_list) {
+ if (test_bit(HIF_HOLDER, &gh->gh_iflags))
+ continue;
+ if (ret & LM_OUT_ERROR)
+ gh->gh_error = -EIO;
+ else if (gh->gh_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB))
+ gh->gh_error = GLR_TRYFAILED;
+ else
+ continue;
+ list_del_init(&gh->gh_list);
+ gfs2_holder_wake(gh);
+ }
+}
+
+/**
+ * find_first_waiter - find the first gh that's waiting for the glock
+ * @gl: the glock
+ */
+
+static inline struct gfs2_holder *find_first_waiter(const struct gfs2_glock *gl)
+{
+ struct gfs2_holder *gh;
+
+ list_for_each_entry(gh, &gl->gl_holders, gh_list) {
+ if (!test_bit(HIF_HOLDER, &gh->gh_iflags))
+ return gh;
+ }
+ return NULL;
+}
+
+/**
+ * state_change - record that the glock is now in a different state
+ * @gl: the glock
+ * @new_state the new state
+ *
+ */
+
+static void state_change(struct gfs2_glock *gl, unsigned int new_state)
+{
+ int held1, held2;
+
+ held1 = (gl->gl_state != LM_ST_UNLOCKED);
+ held2 = (new_state != LM_ST_UNLOCKED);
+
+ if (held1 != held2) {
+ if (held2)
+ gfs2_glock_hold(gl);
+ else
+ gfs2_glock_put(gl);
+ }
+
+ gl->gl_state = new_state;
+ gl->gl_tchange = jiffies;
+}
+
+static void gfs2_demote_wake(struct gfs2_glock *gl)
+{
+ gl->gl_demote_state = LM_ST_EXCLUSIVE;
+ clear_bit(GLF_DEMOTE, &gl->gl_flags);
+ smp_mb__after_clear_bit();
+ wake_up_bit(&gl->gl_flags, GLF_DEMOTE);
+}
+
+/**
+ * finish_xmote - The DLM has replied to one of our lock requests
+ * @gl: The glock
+ * @ret: The status from the DLM
+ *
+ */
+
+static void finish_xmote(struct gfs2_glock *gl, unsigned int ret)
+{
+ const struct gfs2_glock_operations *glops = gl->gl_ops;
+ struct gfs2_holder *gh;
+ unsigned state = ret & LM_OUT_ST_MASK;
+
+ spin_lock(&gl->gl_spin);
+ state_change(gl, state);
+ gh = find_first_waiter(gl);
+
+ /* Demote to UN request arrived during demote to SH or DF */
+ if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags) &&
+ state != LM_ST_UNLOCKED && gl->gl_demote_state == LM_ST_UNLOCKED)
+ gl->gl_target = LM_ST_UNLOCKED;
+
+ /* Check for state != intended state */
+ if (unlikely(state != gl->gl_target)) {
+ if (gh && !test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags)) {
+ /* move to back of queue and try next entry */
+ if (ret & LM_OUT_CANCELED) {
+ if ((gh->gh_flags & LM_FLAG_PRIORITY) == 0)
+ list_move_tail(&gh->gh_list, &gl->gl_holders);
+ gh = find_first_waiter(gl);
+ gl->gl_target = gh->gh_state;
+ goto retry;
+ }
+ /* Some error or failed "try lock" - report it */
+ if ((ret & LM_OUT_ERROR) ||
+ (gh->gh_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB))) {
+ gl->gl_target = gl->gl_state;
+ do_error(gl, ret);
+ goto out;
+ }
+ }
+ switch(state) {
+ /* Unlocked due to conversion deadlock, try again */
+ case LM_ST_UNLOCKED:
+retry:
+ do_xmote(gl, gh, gl->gl_target);
+ break;
+ /* Conversion fails, unlock and try again */
+ case LM_ST_SHARED:
+ case LM_ST_DEFERRED:
+ do_xmote(gl, gh, LM_ST_UNLOCKED);
+ break;
+ default: /* Everything else */
+ printk(KERN_ERR "GFS2: wanted %u got %u\n", gl->gl_target, state);
+ GLOCK_BUG_ON(gl, 1);
+ }
+ spin_unlock(&gl->gl_spin);
+ gfs2_glock_put(gl);
+ return;
+ }
+
+ /* Fast path - we got what we asked for */
+ if (test_and_clear_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags))
+ gfs2_demote_wake(gl);
+ if (state != LM_ST_UNLOCKED) {
+ if (glops->go_xmote_bh) {
+ int rv;
+ spin_unlock(&gl->gl_spin);
+ rv = glops->go_xmote_bh(gl, gh);
+ if (rv == -EAGAIN)
+ return;
+ spin_lock(&gl->gl_spin);
+ if (rv) {
+ do_error(gl, rv);
+ goto out;
+ }
+ }
+ do_promote(gl);
+ }
+out:
+ clear_bit(GLF_LOCK, &gl->gl_flags);
+ spin_unlock(&gl->gl_spin);
+ gfs2_glock_put(gl);
+}
+
+static unsigned int gfs2_lm_lock(struct gfs2_sbd *sdp, void *lock,
+ unsigned int cur_state, unsigned int req_state,
+ unsigned int flags)
+{
+ int ret = LM_OUT_ERROR;
+ if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+ ret = sdp->sd_lockstruct.ls_ops->lm_lock(lock, cur_state,
+ req_state, flags);
+ return ret;
+}
+
+/**
+ * do_xmote - Calls the DLM to change the state of a lock
+ * @gl: The lock state
+ * @gh: The holder (only for promotes)
+ * @target: The target lock state
+ *
+ */
+
+static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target)
+{
+ const struct gfs2_glock_operations *glops = gl->gl_ops;
+ struct gfs2_sbd *sdp = gl->gl_sbd;
+ unsigned int lck_flags = gh ? gh->gh_flags : 0;
+ int ret;
+
+ lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
+ LM_FLAG_PRIORITY);
+ BUG_ON(gl->gl_state == target);
+ BUG_ON(gl->gl_state == gl->gl_target);
+ if ((target == LM_ST_UNLOCKED || target == LM_ST_DEFERRED) &&
+ glops->go_inval) {
+ set_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
+ do_error(gl, 0); /* Fail queued try locks */
+ }
+ spin_unlock(&gl->gl_spin);
+ if (glops->go_xmote_th)
+ glops->go_xmote_th(gl);
+ if (test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
+ glops->go_inval(gl, target == LM_ST_DEFERRED ? 0 : DIO_METADATA);
+ clear_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
+
+ gfs2_glock_hold(gl);
+ if (target != LM_ST_UNLOCKED && (gl->gl_state == LM_ST_SHARED ||
+ gl->gl_state == LM_ST_DEFERRED) &&
+ !(lck_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB)))
+ lck_flags |= LM_FLAG_TRY_1CB;
+ ret = gfs2_lm_lock(sdp, gl->gl_lock, gl->gl_state, target, lck_flags);
+
+ if (!(ret & LM_OUT_ASYNC)) {
+ finish_xmote(gl, ret);
+ gfs2_glock_hold(gl);
+ if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
+ gfs2_glock_put(gl);
+ } else {
+ GLOCK_BUG_ON(gl, ret != LM_OUT_ASYNC);
+ }
+ spin_lock(&gl->gl_spin);
+}
+
+/**
+ * find_first_holder - find the first "holder" gh
+ * @gl: the glock
+ */
+
+static inline struct gfs2_holder *find_first_holder(const struct gfs2_glock *gl)
+{
+ struct gfs2_holder *gh;
+
+ if (!list_empty(&gl->gl_holders)) {
+ gh = list_entry(gl->gl_holders.next, struct gfs2_holder, gh_list);
+ if (test_bit(HIF_HOLDER, &gh->gh_iflags))
+ return gh;
+ }
+ return NULL;
+}
+
+/**
+ * run_queue - do all outstanding tasks related to a glock
+ * @gl: The glock in question
+ * @nonblock: True if we must not block in run_queue
+ *
+ */
+
+static void run_queue(struct gfs2_glock *gl, const int nonblock)
+{
+ struct gfs2_holder *gh = NULL;
+
+ if (test_and_set_bit(GLF_LOCK, &gl->gl_flags))
+ return;
+
+ GLOCK_BUG_ON(gl, test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags));
+
+ if (test_bit(GLF_DEMOTE, &gl->gl_flags) &&
+ gl->gl_demote_state != gl->gl_state) {
+ if (find_first_holder(gl))
+ goto out;
+ if (nonblock)
+ goto out_sched;
+ set_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
+ gl->gl_target = gl->gl_demote_state;
+ } else {
+ if (test_bit(GLF_DEMOTE, &gl->gl_flags))
+ gfs2_demote_wake(gl);
+ if (do_promote(gl) == 0)
+ goto out;
+ gh = find_first_waiter(gl);
+ gl->gl_target = gh->gh_state;
+ if (!(gh->gh_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB)))
+ do_error(gl, 0); /* Fail queued try locks */
+ }
+ do_xmote(gl, gh, gl->gl_target);
+ return;
+
+out_sched:
+ gfs2_glock_hold(gl);
+ if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
+ gfs2_glock_put(gl);
+out:
+ clear_bit(GLF_LOCK, &gl->gl_flags);
+}
+
static void glock_work_func(struct work_struct *work)
{
+ unsigned long delay = 0;
struct gfs2_glock *gl = container_of(work, struct gfs2_glock, gl_work.work);

+ if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags))
+ finish_xmote(gl, gl->gl_reply);
spin_lock(&gl->gl_spin);
- if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags))
- set_bit(GLF_DEMOTE, &gl->gl_flags);
- run_queue(gl);
+ if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags)) {
+ unsigned long holdtime, now = jiffies;
+ holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
+ if (time_before(now, holdtime))
+ delay = holdtime - now;
+ set_bit(delay ? GLF_PENDING_DEMOTE : GLF_DEMOTE, &gl->gl_flags);
+ }
+ run_queue(gl, 0);
spin_unlock(&gl->gl_spin);
- gfs2_glock_put(gl);
+ if (!delay ||
+ queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
+ gfs2_glock_put(gl);
}

static int gfs2_lm_get_lock(struct gfs2_sbd *sdp, struct lm_lockname *name,
@@ -342,12 +676,10 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
gl->gl_name = name;
atomic_set(&gl->gl_ref, 1);
gl->gl_state = LM_ST_UNLOCKED;
+ gl->gl_target = LM_ST_UNLOCKED;
gl->gl_demote_state = LM_ST_EXCLUSIVE;
gl->gl_hash = hash;
- gl->gl_owner_pid = NULL;
- gl->gl_ip = 0;
gl->gl_ops = glops;
- gl->gl_req_gh = NULL;
gl->gl_stamp = jiffies;
gl->gl_tchange = jiffies;
gl->gl_object = NULL;
@@ -447,13 +779,6 @@ void gfs2_holder_uninit(struct gfs2_holder *gh)
gh->gh_ip = 0;
}

-static void gfs2_holder_wake(struct gfs2_holder *gh)
-{
- clear_bit(HIF_WAIT, &gh->gh_iflags);
- smp_mb__after_clear_bit();
- wake_up_bit(&gh->gh_iflags, HIF_WAIT);
-}
-
static int just_schedule(void *word)
{
schedule();
@@ -466,14 +791,6 @@ static void wait_on_holder(struct gfs2_holder *gh)
wait_on_bit(&gh->gh_iflags, HIF_WAIT, just_schedule, TASK_UNINTERRUPTIBLE);
}

-static void gfs2_demote_wake(struct gfs2_glock *gl)
-{
- gl->gl_demote_state = LM_ST_EXCLUSIVE;
- clear_bit(GLF_DEMOTE, &gl->gl_flags);
- smp_mb__after_clear_bit();
- wake_up_bit(&gl->gl_flags, GLF_DEMOTE);
-}
-
static void wait_on_demote(struct gfs2_glock *gl)
{
might_sleep();
@@ -481,217 +798,6 @@ static void wait_on_demote(struct gfs2_glock *gl)
}

/**
- * rq_mutex - process a mutex request in the queue
- * @gh: the glock holder
- *
- * Returns: 1 if the queue is blocked
- */
-
-static int rq_mutex(struct gfs2_holder *gh)
-{
- struct gfs2_glock *gl = gh->gh_gl;
-
- list_del_init(&gh->gh_list);
- /* gh->gh_error never examined. */
- set_bit(GLF_LOCK, &gl->gl_flags);
- clear_bit(HIF_WAIT, &gh->gh_iflags);
- smp_mb();
- wake_up_bit(&gh->gh_iflags, HIF_WAIT);
-
- return 1;
-}
-
-/**
- * rq_promote - process a promote request in the queue
- * @gh: the glock holder
- *
- * Acquire a new inter-node lock, or change a lock state to more restrictive.
- *
- * Returns: 1 if the queue is blocked
- */
-
-static int rq_promote(struct gfs2_holder *gh)
-{
- struct gfs2_glock *gl = gh->gh_gl;
-
- if (!relaxed_state_ok(gl->gl_state, gh->gh_state, gh->gh_flags)) {
- if (list_empty(&gl->gl_holders)) {
- gl->gl_req_gh = gh;
- set_bit(GLF_LOCK, &gl->gl_flags);
- spin_unlock(&gl->gl_spin);
- gfs2_glock_xmote_th(gh->gh_gl, gh);
- spin_lock(&gl->gl_spin);
- }
- return 1;
- }
-
- if (list_empty(&gl->gl_holders)) {
- set_bit(HIF_FIRST, &gh->gh_iflags);
- set_bit(GLF_LOCK, &gl->gl_flags);
- } else {
- struct gfs2_holder *next_gh;
- if (gh->gh_state == LM_ST_EXCLUSIVE)
- return 1;
- next_gh = list_entry(gl->gl_holders.next, struct gfs2_holder,
- gh_list);
- if (next_gh->gh_state == LM_ST_EXCLUSIVE)
- return 1;
- }
-
- list_move_tail(&gh->gh_list, &gl->gl_holders);
- gh->gh_error = 0;
- set_bit(HIF_HOLDER, &gh->gh_iflags);
-
- gfs2_holder_wake(gh);
-
- return 0;
-}
-
-/**
- * rq_demote - process a demote request in the queue
- * @gh: the glock holder
- *
- * Returns: 1 if the queue is blocked
- */
-
-static int rq_demote(struct gfs2_glock *gl)
-{
- if (!list_empty(&gl->gl_holders))
- return 1;
-
- if (gl->gl_state == gl->gl_demote_state ||
- gl->gl_state == LM_ST_UNLOCKED) {
- gfs2_demote_wake(gl);
- return 0;
- }
-
- set_bit(GLF_LOCK, &gl->gl_flags);
- set_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
-
- if (gl->gl_demote_state == LM_ST_UNLOCKED ||
- gl->gl_state != LM_ST_EXCLUSIVE) {
- spin_unlock(&gl->gl_spin);
- gfs2_glock_drop_th(gl);
- } else {
- spin_unlock(&gl->gl_spin);
- gfs2_glock_xmote_th(gl, NULL);
- }
-
- spin_lock(&gl->gl_spin);
- clear_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
-
- return 0;
-}
-
-/**
- * run_queue - process holder structures on a glock
- * @gl: the glock
- *
- */
-static void run_queue(struct gfs2_glock *gl)
-{
- struct gfs2_holder *gh;
- int blocked = 1;
-
- for (;;) {
- if (test_bit(GLF_LOCK, &gl->gl_flags))
- break;
-
- if (!list_empty(&gl->gl_waiters1)) {
- gh = list_entry(gl->gl_waiters1.next,
- struct gfs2_holder, gh_list);
- blocked = rq_mutex(gh);
- } else if (test_bit(GLF_DEMOTE, &gl->gl_flags)) {
- blocked = rq_demote(gl);
- if (test_bit(GLF_WAITERS2, &gl->gl_flags) &&
- !blocked) {
- set_bit(GLF_DEMOTE, &gl->gl_flags);
- gl->gl_demote_state = LM_ST_UNLOCKED;
- }
- clear_bit(GLF_WAITERS2, &gl->gl_flags);
- } else if (!list_empty(&gl->gl_waiters3)) {
- gh = list_entry(gl->gl_waiters3.next,
- struct gfs2_holder, gh_list);
- blocked = rq_promote(gh);
- } else
- break;
-
- if (blocked)
- break;
- }
-}
-
-/**
- * gfs2_glmutex_lock - acquire a local lock on a glock
- * @gl: the glock
- *
- * Gives caller exclusive access to manipulate a glock structure.
- */
-
-static void gfs2_glmutex_lock(struct gfs2_glock *gl)
-{
- spin_lock(&gl->gl_spin);
- if (test_and_set_bit(GLF_LOCK, &gl->gl_flags)) {
- struct gfs2_holder gh;
-
- gfs2_holder_init(gl, 0, 0, &gh);
- set_bit(HIF_WAIT, &gh.gh_iflags);
- list_add_tail(&gh.gh_list, &gl->gl_waiters1);
- spin_unlock(&gl->gl_spin);
- wait_on_holder(&gh);
- gfs2_holder_uninit(&gh);
- } else {
- gl->gl_owner_pid = get_pid(task_pid(current));
- gl->gl_ip = (unsigned long)__builtin_return_address(0);
- spin_unlock(&gl->gl_spin);
- }
-}
-
-/**
- * gfs2_glmutex_trylock - try to acquire a local lock on a glock
- * @gl: the glock
- *
- * Returns: 1 if the glock is acquired
- */
-
-static int gfs2_glmutex_trylock(struct gfs2_glock *gl)
-{
- int acquired = 1;
-
- spin_lock(&gl->gl_spin);
- if (test_and_set_bit(GLF_LOCK, &gl->gl_flags)) {
- acquired = 0;
- } else {
- gl->gl_owner_pid = get_pid(task_pid(current));
- gl->gl_ip = (unsigned long)__builtin_return_address(0);
- }
- spin_unlock(&gl->gl_spin);
-
- return acquired;
-}
-
-/**
- * gfs2_glmutex_unlock - release a local lock on a glock
- * @gl: the glock
- *
- */
-
-static void gfs2_glmutex_unlock(struct gfs2_glock *gl)
-{
- struct pid *pid;
-
- spin_lock(&gl->gl_spin);
- clear_bit(GLF_LOCK, &gl->gl_flags);
- pid = gl->gl_owner_pid;
- gl->gl_owner_pid = NULL;
- gl->gl_ip = 0;
- run_queue(gl);
- spin_unlock(&gl->gl_spin);
-
- put_pid(pid);
-}
-
-/**
* handle_callback - process a demote request
* @gl: the glock
* @state: the state the caller wants us to change to
@@ -705,398 +811,45 @@ static void handle_callback(struct gfs2_glock *gl, unsigned int state,
{
int bit = delay ? GLF_PENDING_DEMOTE : GLF_DEMOTE;

- spin_lock(&gl->gl_spin);
set_bit(bit, &gl->gl_flags);
if (gl->gl_demote_state == LM_ST_EXCLUSIVE) {
gl->gl_demote_state = state;
gl->gl_demote_time = jiffies;
if (remote && gl->gl_ops->go_type == LM_TYPE_IOPEN &&
- gl->gl_object) {
+ gl->gl_object)
gfs2_glock_schedule_for_reclaim(gl);
- spin_unlock(&gl->gl_spin);
- return;
- }
} else if (gl->gl_demote_state != LM_ST_UNLOCKED &&
gl->gl_demote_state != state) {
- if (test_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags))
- set_bit(GLF_WAITERS2, &gl->gl_flags);
- else
- gl->gl_demote_state = LM_ST_UNLOCKED;
+ gl->gl_demote_state = LM_ST_UNLOCKED;
}
- spin_unlock(&gl->gl_spin);
}

/**
- * state_change - record that the glock is now in a different state
- * @gl: the glock
- * @new_state the new state
- *
- */
-
-static void state_change(struct gfs2_glock *gl, unsigned int new_state)
-{
- int held1, held2;
-
- held1 = (gl->gl_state != LM_ST_UNLOCKED);
- held2 = (new_state != LM_ST_UNLOCKED);
-
- if (held1 != held2) {
- if (held2)
- gfs2_glock_hold(gl);
- else
- gfs2_glock_put(gl);
- }
-
- gl->gl_state = new_state;
- gl->gl_tchange = jiffies;
-}
-
-/**
- * drop_bh - Called after a lock module unlock completes
- * @gl: the glock
- * @ret: the return status
- *
- * Doesn't wake up the process waiting on the struct gfs2_holder (if any)
- * Doesn't drop the reference on the glock the top half took out
- *
- */
-
-static void drop_bh(struct gfs2_glock *gl, unsigned int ret)
-{
- struct gfs2_sbd *sdp = gl->gl_sbd;
- struct gfs2_holder *gh = gl->gl_req_gh;
-
- gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
- gfs2_assert_warn(sdp, list_empty(&gl->gl_holders));
- gfs2_assert_warn(sdp, !ret);
-
- state_change(gl, LM_ST_UNLOCKED);
-
- if (test_and_clear_bit(GLF_CONV_DEADLK, &gl->gl_flags)) {
- spin_lock(&gl->gl_spin);
- gh->gh_error = 0;
- spin_unlock(&gl->gl_spin);
- gfs2_glock_xmote_th(gl, gl->gl_req_gh);
- gfs2_glock_put(gl);
- return;
- }
-
- spin_lock(&gl->gl_spin);
- gfs2_demote_wake(gl);
- clear_bit(GLF_LOCK, &gl->gl_flags);
- spin_unlock(&gl->gl_spin);
- gfs2_glock_put(gl);
-}
-
-/**
- * xmote_bh - Called after the lock module is done acquiring a lock
- * @gl: The glock in question
- * @ret: the int returned from the lock module
- *
- */
-
-static void xmote_bh(struct gfs2_glock *gl, unsigned int ret)
-{
- struct gfs2_sbd *sdp = gl->gl_sbd;
- const struct gfs2_glock_operations *glops = gl->gl_ops;
- struct gfs2_holder *gh = gl->gl_req_gh;
- int op_done = 1;
-
- if (!gh && (ret & LM_OUT_ST_MASK) == LM_ST_UNLOCKED) {
- drop_bh(gl, ret);
- return;
- }
-
- gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
- gfs2_assert_warn(sdp, list_empty(&gl->gl_holders));
- gfs2_assert_warn(sdp, !(ret & LM_OUT_ASYNC));
-
- state_change(gl, ret & LM_OUT_ST_MASK);
-
- /* Deal with each possible exit condition */
-
- if (!gh) {
- gl->gl_stamp = jiffies;
- if (ret & LM_OUT_CANCELED) {
- op_done = 0;
- } else {
- spin_lock(&gl->gl_spin);
- if (gl->gl_state != gl->gl_demote_state) {
- spin_unlock(&gl->gl_spin);
- gfs2_glock_drop_th(gl);
- gfs2_glock_put(gl);
- return;
- }
- gfs2_demote_wake(gl);
- spin_unlock(&gl->gl_spin);
- }
- } else {
- spin_lock(&gl->gl_spin);
- if (ret & LM_OUT_CONV_DEADLK) {
- gh->gh_error = 0;
- set_bit(GLF_CONV_DEADLK, &gl->gl_flags);
- spin_unlock(&gl->gl_spin);
- gfs2_glock_drop_th(gl);
- gfs2_glock_put(gl);
- return;
- }
- list_del_init(&gh->gh_list);
- gh->gh_error = -EIO;
- if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
- goto out;
- gh->gh_error = GLR_CANCELED;
- if (ret & LM_OUT_CANCELED)
- goto out;
- if (relaxed_state_ok(gl->gl_state, gh->gh_state, gh->gh_flags)) {
- list_add_tail(&gh->gh_list, &gl->gl_holders);
- gh->gh_error = 0;
- set_bit(HIF_HOLDER, &gh->gh_iflags);
- set_bit(HIF_FIRST, &gh->gh_iflags);
- op_done = 0;
- goto out;
- }
- gh->gh_error = GLR_TRYFAILED;
- if (gh->gh_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB))
- goto out;
- gh->gh_error = -EINVAL;
- if (gfs2_assert_withdraw(sdp, 0) == -1)
- fs_err(sdp, "ret = 0x%.8X\n", ret);
-out:
- spin_unlock(&gl->gl_spin);
- }
-
- if (glops->go_xmote_bh)
- glops->go_xmote_bh(gl);
-
- if (op_done) {
- spin_lock(&gl->gl_spin);
- gl->gl_req_gh = NULL;
- clear_bit(GLF_LOCK, &gl->gl_flags);
- spin_unlock(&gl->gl_spin);
- }
-
- gfs2_glock_put(gl);
-
- if (gh)
- gfs2_holder_wake(gh);
-}
-
-static unsigned int gfs2_lm_lock(struct gfs2_sbd *sdp, void *lock,
- unsigned int cur_state, unsigned int req_state,
- unsigned int flags)
-{
- int ret = 0;
- if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
- ret = sdp->sd_lockstruct.ls_ops->lm_lock(lock, cur_state,
- req_state, flags);
- return ret;
-}
-
-/**
- * gfs2_glock_xmote_th - Call into the lock module to acquire or change a glock
- * @gl: The glock in question
- * @state: the requested state
- * @flags: modifier flags to the lock call
- *
- */
-
-static void gfs2_glock_xmote_th(struct gfs2_glock *gl, struct gfs2_holder *gh)
-{
- struct gfs2_sbd *sdp = gl->gl_sbd;
- int flags = gh ? gh->gh_flags : 0;
- unsigned state = gh ? gh->gh_state : gl->gl_demote_state;
- const struct gfs2_glock_operations *glops = gl->gl_ops;
- int lck_flags = flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB |
- LM_FLAG_NOEXP | LM_FLAG_ANY |
- LM_FLAG_PRIORITY);
- unsigned int lck_ret;
-
- if (glops->go_xmote_th)
- glops->go_xmote_th(gl);
- if (state == LM_ST_DEFERRED && glops->go_inval)
- glops->go_inval(gl, DIO_METADATA);
-
- gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
- gfs2_assert_warn(sdp, list_empty(&gl->gl_holders));
- gfs2_assert_warn(sdp, state != LM_ST_UNLOCKED);
- gfs2_assert_warn(sdp, state != gl->gl_state);
-
- gfs2_glock_hold(gl);
-
- lck_ret = gfs2_lm_lock(sdp, gl->gl_lock, gl->gl_state, state, lck_flags);
-
- if (gfs2_assert_withdraw(sdp, !(lck_ret & LM_OUT_ERROR)))
- return;
-
- if (lck_ret & LM_OUT_ASYNC)
- gfs2_assert_warn(sdp, lck_ret == LM_OUT_ASYNC);
- else
- xmote_bh(gl, lck_ret);
-}
-
-static unsigned int gfs2_lm_unlock(struct gfs2_sbd *sdp, void *lock,
- unsigned int cur_state)
-{
- int ret = 0;
- if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
- ret = sdp->sd_lockstruct.ls_ops->lm_unlock(lock, cur_state);
- return ret;
-}
-
-/**
- * gfs2_glock_drop_th - call into the lock module to unlock a lock
- * @gl: the glock
- *
- */
-
-static void gfs2_glock_drop_th(struct gfs2_glock *gl)
-{
- struct gfs2_sbd *sdp = gl->gl_sbd;
- const struct gfs2_glock_operations *glops = gl->gl_ops;
- unsigned int ret;
-
- if (glops->go_xmote_th)
- glops->go_xmote_th(gl);
- if (glops->go_inval)
- glops->go_inval(gl, DIO_METADATA);
-
- gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
- gfs2_assert_warn(sdp, list_empty(&gl->gl_holders));
- gfs2_assert_warn(sdp, gl->gl_state != LM_ST_UNLOCKED);
-
- gfs2_glock_hold(gl);
-
- ret = gfs2_lm_unlock(sdp, gl->gl_lock, gl->gl_state);
-
- if (gfs2_assert_withdraw(sdp, !(ret & LM_OUT_ERROR)))
- return;
-
- if (!ret)
- drop_bh(gl, ret);
- else
- gfs2_assert_warn(sdp, ret == LM_OUT_ASYNC);
-}
-
-/**
- * do_cancels - cancel requests for locks stuck waiting on an expire flag
- * @gh: the LM_FLAG_PRIORITY holder waiting to acquire the lock
- *
- * Don't cancel GL_NOCANCEL requests.
- */
-
-static void do_cancels(struct gfs2_holder *gh)
-{
- struct gfs2_glock *gl = gh->gh_gl;
- struct gfs2_sbd *sdp = gl->gl_sbd;
-
- spin_lock(&gl->gl_spin);
-
- while (gl->gl_req_gh != gh &&
- !test_bit(HIF_HOLDER, &gh->gh_iflags) &&
- !list_empty(&gh->gh_list)) {
- if (!(gl->gl_req_gh && (gl->gl_req_gh->gh_flags & GL_NOCANCEL))) {
- spin_unlock(&gl->gl_spin);
- if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
- sdp->sd_lockstruct.ls_ops->lm_cancel(gl->gl_lock);
- msleep(100);
- spin_lock(&gl->gl_spin);
- } else {
- spin_unlock(&gl->gl_spin);
- msleep(100);
- spin_lock(&gl->gl_spin);
- }
- }
-
- spin_unlock(&gl->gl_spin);
-}
-
-/**
- * glock_wait_internal - wait on a glock acquisition
+ * gfs2_glock_wait - wait on a glock acquisition
* @gh: the glock holder
*
* Returns: 0 on success
*/

-static int glock_wait_internal(struct gfs2_holder *gh)
+int gfs2_glock_wait(struct gfs2_holder *gh)
{
- struct gfs2_glock *gl = gh->gh_gl;
- struct gfs2_sbd *sdp = gl->gl_sbd;
- const struct gfs2_glock_operations *glops = gl->gl_ops;
-
- if (test_bit(HIF_ABORTED, &gh->gh_iflags))
- return -EIO;
-
- if (gh->gh_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB)) {
- spin_lock(&gl->gl_spin);
- if (gl->gl_req_gh != gh &&
- !test_bit(HIF_HOLDER, &gh->gh_iflags) &&
- !list_empty(&gh->gh_list)) {
- list_del_init(&gh->gh_list);
- gh->gh_error = GLR_TRYFAILED;
- run_queue(gl);
- spin_unlock(&gl->gl_spin);
- return gh->gh_error;
- }
- spin_unlock(&gl->gl_spin);
- }
-
- if (gh->gh_flags & LM_FLAG_PRIORITY)
- do_cancels(gh);
-
wait_on_holder(gh);
- if (gh->gh_error)
- return gh->gh_error;
-
- gfs2_assert_withdraw(sdp, test_bit(HIF_HOLDER, &gh->gh_iflags));
- gfs2_assert_withdraw(sdp, relaxed_state_ok(gl->gl_state, gh->gh_state,
- gh->gh_flags));
-
- if (test_bit(HIF_FIRST, &gh->gh_iflags)) {
- gfs2_assert_warn(sdp, test_bit(GLF_LOCK, &gl->gl_flags));
-
- if (glops->go_lock) {
- gh->gh_error = glops->go_lock(gh);
- if (gh->gh_error) {
- spin_lock(&gl->gl_spin);
- list_del_init(&gh->gh_list);
- spin_unlock(&gl->gl_spin);
- }
- }
-
- spin_lock(&gl->gl_spin);
- gl->gl_req_gh = NULL;
- clear_bit(GLF_LOCK, &gl->gl_flags);
- run_queue(gl);
- spin_unlock(&gl->gl_spin);
- }
-
return gh->gh_error;
}

-static inline struct gfs2_holder *
-find_holder_by_owner(struct list_head *head, struct pid *pid)
-{
- struct gfs2_holder *gh;
-
- list_for_each_entry(gh, head, gh_list) {
- if (gh->gh_owner_pid == pid)
- return gh;
- }
-
- return NULL;
-}
-
-static void print_dbg(struct glock_iter *gi, const char *fmt, ...)
+void gfs2_print_dbg(struct seq_file *seq, const char *fmt, ...)
{
va_list args;

va_start(args, fmt);
- if (gi) {
+ if (seq) {
+ struct gfs2_glock_iter *gi = seq->private;
vsprintf(gi->string, fmt, args);
- seq_printf(gi->seq, gi->string);
- }
- else
+ seq_printf(seq, gi->string);
+ } else {
+ printk(KERN_ERR " ");
vprintk(fmt, args);
+ }
va_end(args);
}

@@ -1104,50 +857,75 @@ static void print_dbg(struct glock_iter *gi, const char *fmt, ...)
* add_to_queue - Add a holder to the wait queue (but look for recursion)
* @gh: the holder structure to add
*
+ * Eventually we should move the recursive locking trap to a
+ * debugging option or something like that. This is the fast
+ * path and needs to have the minimum number of distractions.
+ *
*/

-static void add_to_queue(struct gfs2_holder *gh)
+static inline void add_to_queue(struct gfs2_holder *gh)
{
struct gfs2_glock *gl = gh->gh_gl;
- struct gfs2_holder *existing;
+ struct gfs2_sbd *sdp = gl->gl_sbd;
+ struct list_head *insert_pt = NULL;
+ struct gfs2_holder *gh2;
+ int try_lock = 0;

BUG_ON(gh->gh_owner_pid == NULL);
if (test_and_set_bit(HIF_WAIT, &gh->gh_iflags))
BUG();

- if (!(gh->gh_flags & GL_FLOCK)) {
- existing = find_holder_by_owner(&gl->gl_holders,
- gh->gh_owner_pid);
- if (existing) {
- print_symbol(KERN_WARNING "original: %s\n",
- existing->gh_ip);
- printk(KERN_INFO "pid : %d\n",
- pid_nr(existing->gh_owner_pid));
- printk(KERN_INFO "lock type : %d lock state : %d\n",
- existing->gh_gl->gl_name.ln_type,
- existing->gh_gl->gl_state);
- print_symbol(KERN_WARNING "new: %s\n", gh->gh_ip);
- printk(KERN_INFO "pid : %d\n",
- pid_nr(gh->gh_owner_pid));
- printk(KERN_INFO "lock type : %d lock state : %d\n",
- gl->gl_name.ln_type, gl->gl_state);
- BUG();
- }
-
- existing = find_holder_by_owner(&gl->gl_waiters3,
- gh->gh_owner_pid);
- if (existing) {
- print_symbol(KERN_WARNING "original: %s\n",
- existing->gh_ip);
- print_symbol(KERN_WARNING "new: %s\n", gh->gh_ip);
- BUG();
+ if (gh->gh_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB)) {
+ if (test_bit(GLF_LOCK, &gl->gl_flags))
+ try_lock = 1;
+ if (test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
+ goto fail;
+ }
+
+ list_for_each_entry(gh2, &gl->gl_holders, gh_list) {
+ if (unlikely(gh2->gh_owner_pid == gh->gh_owner_pid &&
+ (gh->gh_gl->gl_ops->go_type != LM_TYPE_FLOCK)))
+ goto trap_recursive;
+ if (try_lock &&
+ !(gh2->gh_flags & (LM_FLAG_TRY | LM_FLAG_TRY_1CB)) &&
+ !may_grant(gl, gh)) {
+fail:
+ gh->gh_error = GLR_TRYFAILED;
+ gfs2_holder_wake(gh);
+ return;
}
+ if (test_bit(HIF_HOLDER, &gh2->gh_iflags))
+ continue;
+ if (unlikely((gh->gh_flags & LM_FLAG_PRIORITY) && !insert_pt))
+ insert_pt = &gh2->gh_list;
+ }
+ if (likely(insert_pt == NULL)) {
+ list_add_tail(&gh->gh_list, &gl->gl_holders);
+ if (unlikely(gh->gh_flags & LM_FLAG_PRIORITY))
+ goto do_cancel;
+ return;
}
+ list_add_tail(&gh->gh_list, insert_pt);
+do_cancel:
+ gh = list_entry(gl->gl_holders.next, struct gfs2_holder, gh_list);
+ if (!(gh->gh_flags & LM_FLAG_PRIORITY)) {
+ spin_unlock(&gl->gl_spin);
+ sdp->sd_lockstruct.ls_ops->lm_cancel(gl->gl_lock);
+ spin_lock(&gl->gl_spin);
+ }
+ return;

- if (gh->gh_flags & LM_FLAG_PRIORITY)
- list_add(&gh->gh_list, &gl->gl_waiters3);
- else
- list_add_tail(&gh->gh_list, &gl->gl_waiters3);
+trap_recursive:
+ print_symbol(KERN_ERR "original: %s\n", gh2->gh_ip);
+ printk(KERN_ERR "pid: %d\n", pid_nr(gh2->gh_owner_pid));
+ printk(KERN_ERR "lock type: %d req lock state : %d\n",
+ gh2->gh_gl->gl_name.ln_type, gh2->gh_state);
+ print_symbol(KERN_ERR "new: %s\n", gh->gh_ip);
+ printk(KERN_ERR "pid: %d\n", pid_nr(gh->gh_owner_pid));
+ printk(KERN_ERR "lock type: %d req lock state : %d\n",
+ gh->gh_gl->gl_name.ln_type, gh->gh_state);
+ __dump_glock(NULL, gl);
+ BUG();
}

/**
@@ -1165,24 +943,16 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
struct gfs2_sbd *sdp = gl->gl_sbd;
int error = 0;

-restart:
- if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) {
- set_bit(HIF_ABORTED, &gh->gh_iflags);
+ if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
return -EIO;
- }

spin_lock(&gl->gl_spin);
add_to_queue(gh);
- run_queue(gl);
+ run_queue(gl, 1);
spin_unlock(&gl->gl_spin);

- if (!(gh->gh_flags & GL_ASYNC)) {
- error = glock_wait_internal(gh);
- if (error == GLR_CANCELED) {
- msleep(100);
- goto restart;
- }
- }
+ if (!(gh->gh_flags & GL_ASYNC))
+ error = gfs2_glock_wait(gh);

return error;
}
@@ -1196,48 +966,7 @@ restart:

int gfs2_glock_poll(struct gfs2_holder *gh)
{
- struct gfs2_glock *gl = gh->gh_gl;
- int ready = 0;
-
- spin_lock(&gl->gl_spin);
-
- if (test_bit(HIF_HOLDER, &gh->gh_iflags))
- ready = 1;
- else if (list_empty(&gh->gh_list)) {
- if (gh->gh_error == GLR_CANCELED) {
- spin_unlock(&gl->gl_spin);
- msleep(100);
- if (gfs2_glock_nq(gh))
- return 1;
- return 0;
- } else
- ready = 1;
- }
-
- spin_unlock(&gl->gl_spin);
-
- return ready;
-}
-
-/**
- * gfs2_glock_wait - wait for a lock acquisition that ended in a GLR_ASYNC
- * @gh: the holder structure
- *
- * Returns: 0, GLR_TRYFAILED, or errno on failure
- */
-
-int gfs2_glock_wait(struct gfs2_holder *gh)
-{
- int error;
-
- error = glock_wait_internal(gh);
- if (error == GLR_CANCELED) {
- msleep(100);
- gh->gh_flags &= ~GL_ASYNC;
- error = gfs2_glock_nq(gh);
- }
-
- return error;
+ return test_bit(HIF_WAIT, &gh->gh_iflags) ? 0 : 1;
}

/**
@@ -1251,26 +980,30 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
struct gfs2_glock *gl = gh->gh_gl;
const struct gfs2_glock_operations *glops = gl->gl_ops;
unsigned delay = 0;
+ int fast_path = 0;

+ spin_lock(&gl->gl_spin);
if (gh->gh_flags & GL_NOCACHE)
handle_callback(gl, LM_ST_UNLOCKED, 0, 0);

- gfs2_glmutex_lock(gl);
-
- spin_lock(&gl->gl_spin);
list_del_init(&gh->gh_list);
-
- if (list_empty(&gl->gl_holders)) {
+ if (find_first_holder(gl) == NULL) {
if (glops->go_unlock) {
+ GLOCK_BUG_ON(gl, test_and_set_bit(GLF_LOCK, &gl->gl_flags));
spin_unlock(&gl->gl_spin);
glops->go_unlock(gh);
spin_lock(&gl->gl_spin);
+ clear_bit(GLF_LOCK, &gl->gl_flags);
}
gl->gl_stamp = jiffies;
+ if (list_empty(&gl->gl_holders) &&
+ !test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
+ !test_bit(GLF_DEMOTE, &gl->gl_flags))
+ fast_path = 1;
}
-
- clear_bit(GLF_LOCK, &gl->gl_flags);
spin_unlock(&gl->gl_spin);
+ if (likely(fast_path))
+ return;

gfs2_glock_hold(gl);
if (test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
@@ -1469,20 +1202,14 @@ int gfs2_lvb_hold(struct gfs2_glock *gl)
{
int error;

- gfs2_glmutex_lock(gl);
-
if (!atomic_read(&gl->gl_lvb_count)) {
error = gfs2_lm_hold_lvb(gl->gl_sbd, gl->gl_lock, &gl->gl_lvb);
- if (error) {
- gfs2_glmutex_unlock(gl);
+ if (error)
return error;
- }
gfs2_glock_hold(gl);
}
atomic_inc(&gl->gl_lvb_count);

- gfs2_glmutex_unlock(gl);
-
return 0;
}

@@ -1497,8 +1224,6 @@ void gfs2_lvb_unhold(struct gfs2_glock *gl)
struct gfs2_sbd *sdp = gl->gl_sbd;

gfs2_glock_hold(gl);
- gfs2_glmutex_lock(gl);
-
gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_lvb_count) > 0);
if (atomic_dec_and_test(&gl->gl_lvb_count)) {
if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
@@ -1506,8 +1231,6 @@ void gfs2_lvb_unhold(struct gfs2_glock *gl)
gl->gl_lvb = NULL;
gfs2_glock_put(gl);
}
-
- gfs2_glmutex_unlock(gl);
gfs2_glock_put(gl);
}

@@ -1527,7 +1250,9 @@ static void blocking_cb(struct gfs2_sbd *sdp, struct lm_lockname *name,
if (time_before(now, holdtime))
delay = holdtime - now;

+ spin_lock(&gl->gl_spin);
handle_callback(gl, state, 1, delay);
+ spin_unlock(&gl->gl_spin);
if (queue_delayed_work(glock_workqueue, &gl->gl_work, delay) == 0)
gfs2_glock_put(gl);
}
@@ -1568,7 +1293,8 @@ void gfs2_glock_cb(void *cb_data, unsigned int type, void *data)
gl = gfs2_glock_find(sdp, &async->lc_name);
if (gfs2_assert_warn(sdp, gl))
return;
- xmote_bh(gl, async->lc_ret);
+ gl->gl_reply = async->lc_ret;
+ set_bit(GLF_REPLY_PENDING, &gl->gl_flags);
if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
gfs2_glock_put(gl);
up_read(&gfs2_umount_flush_sem);
@@ -1646,6 +1372,7 @@ void gfs2_glock_schedule_for_reclaim(struct gfs2_glock *gl)
void gfs2_reclaim_glock(struct gfs2_sbd *sdp)
{
struct gfs2_glock *gl;
+ int done_callback = 0;

spin_lock(&sdp->sd_reclaim_lock);
if (list_empty(&sdp->sd_reclaim_list)) {
@@ -1660,14 +1387,16 @@ void gfs2_reclaim_glock(struct gfs2_sbd *sdp)
atomic_dec(&sdp->sd_reclaim_count);
atomic_inc(&sdp->sd_reclaimed);

- if (gfs2_glmutex_trylock(gl)) {
- if (list_empty(&gl->gl_holders) &&
- gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
- handle_callback(gl, LM_ST_UNLOCKED, 0, 0);
- gfs2_glmutex_unlock(gl);
+ spin_lock(&gl->gl_spin);
+ if (find_first_holder(gl) == NULL &&
+ gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl)) {
+ handle_callback(gl, LM_ST_UNLOCKED, 0, 0);
+ done_callback = 1;
}
-
- gfs2_glock_put(gl);
+ spin_unlock(&gl->gl_spin);
+ if (!done_callback ||
+ queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
+ gfs2_glock_put(gl);
}

/**
@@ -1724,18 +1453,14 @@ static void scan_glock(struct gfs2_glock *gl)
{
if (gl->gl_ops == &gfs2_inode_glops && gl->gl_object)
return;
+ if (test_bit(GLF_LOCK, &gl->gl_flags))
+ return;

- if (gfs2_glmutex_trylock(gl)) {
- if (list_empty(&gl->gl_holders) &&
- gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
- goto out_schedule;
- gfs2_glmutex_unlock(gl);
- }
- return;
-
-out_schedule:
- gfs2_glmutex_unlock(gl);
- gfs2_glock_schedule_for_reclaim(gl);
+ spin_lock(&gl->gl_spin);
+ if (find_first_holder(gl) == NULL &&
+ gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl))
+ gfs2_glock_schedule_for_reclaim(gl);
+ spin_unlock(&gl->gl_spin);
}

/**
@@ -1760,12 +1485,13 @@ static void clear_glock(struct gfs2_glock *gl)
spin_unlock(&sdp->sd_reclaim_lock);
}

- if (gfs2_glmutex_trylock(gl)) {
- if (list_empty(&gl->gl_holders) &&
- gl->gl_state != LM_ST_UNLOCKED)
- handle_callback(gl, LM_ST_UNLOCKED, 0, 0);
- gfs2_glmutex_unlock(gl);
- }
+ spin_lock(&gl->gl_spin);
+ if (find_first_holder(gl) == NULL && gl->gl_state != LM_ST_UNLOCKED)
+ handle_callback(gl, LM_ST_UNLOCKED, 0, 0);
+ spin_unlock(&gl->gl_spin);
+ gfs2_glock_hold(gl);
+ if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
+ gfs2_glock_put(gl);
}

/**
@@ -1810,180 +1536,164 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp, int wait)
}
}

-/*
- * Diagnostic routines to help debug distributed deadlock
- */
-
-static void gfs2_print_symbol(struct glock_iter *gi, const char *fmt,
- unsigned long address)
+static const char *state2str(unsigned state)
{
- char buffer[KSYM_SYMBOL_LEN];
-
- sprint_symbol(buffer, address);
- print_dbg(gi, fmt, buffer);
+ switch(state) {
+ case LM_ST_UNLOCKED:
+ return "UN";
+ case LM_ST_SHARED:
+ return "SH";
+ case LM_ST_DEFERRED:
+ return "DF";
+ case LM_ST_EXCLUSIVE:
+ return "EX";
+ }
+ return "??";
+}
+
+static const char *hflags2str(char *buf, unsigned flags, unsigned long iflags)
+{
+ char *p = buf;
+ if (flags & LM_FLAG_TRY)
+ *p++ = 't';
+ if (flags & LM_FLAG_TRY_1CB)
+ *p++ = 'T';
+ if (flags & LM_FLAG_NOEXP)
+ *p++ = 'e';
+ if (flags & LM_FLAG_ANY)
+ *p++ = 'a';
+ if (flags & LM_FLAG_PRIORITY)
+ *p++ = 'p';
+ if (flags & GL_ASYNC)
+ *p++ = 'a';
+ if (flags & GL_EXACT)
+ *p++ = 'E';
+ if (flags & GL_ATIME)
+ *p++ = 'a';
+ if (flags & GL_NOCACHE)
+ *p++ = 'c';
+ if (test_bit(HIF_HOLDER, &iflags))
+ *p++ = 'H';
+ if (test_bit(HIF_WAIT, &iflags))
+ *p++ = 'W';
+ if (test_bit(HIF_FIRST, &iflags))
+ *p++ = 'F';
+ *p = 0;
+ return buf;
}

/**
* dump_holder - print information about a glock holder
- * @str: a string naming the type of holder
+ * @seq: the seq_file struct
* @gh: the glock holder
*
* Returns: 0 on success, -ENOBUFS when we run out of space
*/

-static int dump_holder(struct glock_iter *gi, char *str,
- struct gfs2_holder *gh)
+static int dump_holder(struct seq_file *seq, const struct gfs2_holder *gh)
{
- unsigned int x;
- struct task_struct *gh_owner;
+ struct task_struct *gh_owner = NULL;
+ char buffer[KSYM_SYMBOL_LEN];
+ char flags_buf[32];

- print_dbg(gi, " %s\n", str);
- if (gh->gh_owner_pid) {
- print_dbg(gi, " owner = %ld ",
- (long)pid_nr(gh->gh_owner_pid));
+ sprint_symbol(buffer, gh->gh_ip);
+ if (gh->gh_owner_pid)
gh_owner = pid_task(gh->gh_owner_pid, PIDTYPE_PID);
- if (gh_owner)
- print_dbg(gi, "(%s)\n", gh_owner->comm);
- else
- print_dbg(gi, "(ended)\n");
- } else
- print_dbg(gi, " owner = -1\n");
- print_dbg(gi, " gh_state = %u\n", gh->gh_state);
- print_dbg(gi, " gh_flags =");
- for (x = 0; x < 32; x++)
- if (gh->gh_flags & (1 << x))
- print_dbg(gi, " %u", x);
- print_dbg(gi, " \n");
- print_dbg(gi, " error = %d\n", gh->gh_error);
- print_dbg(gi, " gh_iflags =");
- for (x = 0; x < 32; x++)
- if (test_bit(x, &gh->gh_iflags))
- print_dbg(gi, " %u", x);
- print_dbg(gi, " \n");
- gfs2_print_symbol(gi, " initialized at: %s\n", gh->gh_ip);
-
+ gfs2_print_dbg(seq, " H: s:%s f:%s e:%d p:%ld [%s] %s\n",
+ state2str(gh->gh_state),
+ hflags2str(flags_buf, gh->gh_flags, gh->gh_iflags),
+ gh->gh_error,
+ gh->gh_owner_pid ? (long)pid_nr(gh->gh_owner_pid) : -1,
+ gh_owner ? gh_owner->comm : "(ended)", buffer);
return 0;
}

-/**
- * dump_inode - print information about an inode
- * @ip: the inode
- *
- * Returns: 0 on success, -ENOBUFS when we run out of space
- */
-
-static int dump_inode(struct glock_iter *gi, struct gfs2_inode *ip)
-{
- unsigned int x;
-
- print_dbg(gi, " Inode:\n");
- print_dbg(gi, " num = %llu/%llu\n",
- (unsigned long long)ip->i_no_formal_ino,
- (unsigned long long)ip->i_no_addr);
- print_dbg(gi, " type = %u\n", IF2DT(ip->i_inode.i_mode));
- print_dbg(gi, " i_flags =");
- for (x = 0; x < 32; x++)
- if (test_bit(x, &ip->i_flags))
- print_dbg(gi, " %u", x);
- print_dbg(gi, " \n");
- return 0;
+static const char *gflags2str(char *buf, const unsigned long *gflags)
+{
+ char *p = buf;
+ if (test_bit(GLF_LOCK, gflags))
+ *p++ = 'l';
+ if (test_bit(GLF_STICKY, gflags))
+ *p++ = 's';
+ if (test_bit(GLF_DEMOTE, gflags))
+ *p++ = 'D';
+ if (test_bit(GLF_PENDING_DEMOTE, gflags))
+ *p++ = 'd';
+ if (test_bit(GLF_DEMOTE_IN_PROGRESS, gflags))
+ *p++ = 'p';
+ if (test_bit(GLF_DIRTY, gflags))
+ *p++ = 'y';
+ if (test_bit(GLF_LFLUSH, gflags))
+ *p++ = 'f';
+ if (test_bit(GLF_INVALIDATE_IN_PROGRESS, gflags))
+ *p++ = 'i';
+ if (test_bit(GLF_REPLY_PENDING, gflags))
+ *p++ = 'r';
+ *p = 0;
+ return buf;
}

/**
- * dump_glock - print information about a glock
+ * __dump_glock - print information about a glock
+ * @seq: The seq_file struct
* @gl: the glock
- * @count: where we are in the buffer
+ *
+ * The file format is as follows:
+ * One line per object, capital letters are used to indicate objects
+ * G = glock, I = Inode, R = rgrp, H = holder. Glocks are not indented,
+ * other objects are indented by a single space and follow the glock to
+ * which they are related. Fields are indicated by lower case letters
+ * followed by a colon and the field value, except for strings which are in
+ * [] so that its possible to see if they are composed of spaces for
+ * example. The field's are n = number (id of the object), f = flags,
+ * t = type, s = state, r = refcount, e = error, p = pid.
*
* Returns: 0 on success, -ENOBUFS when we run out of space
*/

-static int dump_glock(struct glock_iter *gi, struct gfs2_glock *gl)
+static int __dump_glock(struct seq_file *seq, const struct gfs2_glock *gl)
{
- struct gfs2_holder *gh;
- unsigned int x;
- int error = -ENOBUFS;
- struct task_struct *gl_owner;
+ const struct gfs2_glock_operations *glops = gl->gl_ops;
+ unsigned long long dtime;
+ const struct gfs2_holder *gh;
+ char gflags_buf[32];
+ int error = 0;

- spin_lock(&gl->gl_spin);
+ dtime = jiffies - gl->gl_demote_time;
+ dtime *= 1000000/HZ; /* demote time in uSec */
+ if (!test_bit(GLF_DEMOTE, &gl->gl_flags))
+ dtime = 0;
+ gfs2_print_dbg(seq, "G: s:%s n:%u/%llu f:%s t:%s d:%s/%llu l:%d a:%d r:%d\n",
+ state2str(gl->gl_state),
+ gl->gl_name.ln_type,
+ (unsigned long long)gl->gl_name.ln_number,
+ gflags2str(gflags_buf, &gl->gl_flags),
+ state2str(gl->gl_target),
+ state2str(gl->gl_demote_state), dtime,
+ atomic_read(&gl->gl_lvb_count),
+ atomic_read(&gl->gl_ail_count),
+ atomic_read(&gl->gl_ref));

- print_dbg(gi, "Glock 0x%p (%u, 0x%llx)\n", gl, gl->gl_name.ln_type,
- (unsigned long long)gl->gl_name.ln_number);
- print_dbg(gi, " gl_flags =");
- for (x = 0; x < 32; x++) {
- if (test_bit(x, &gl->gl_flags))
- print_dbg(gi, " %u", x);
- }
- if (!test_bit(GLF_LOCK, &gl->gl_flags))
- print_dbg(gi, " (unlocked)");
- print_dbg(gi, " \n");
- print_dbg(gi, " gl_ref = %d\n", atomic_read(&gl->gl_ref));
- print_dbg(gi, " gl_state = %u\n", gl->gl_state);
- if (gl->gl_owner_pid) {
- gl_owner = pid_task(gl->gl_owner_pid, PIDTYPE_PID);
- if (gl_owner)
- print_dbg(gi, " gl_owner = pid %d (%s)\n",
- pid_nr(gl->gl_owner_pid), gl_owner->comm);
- else
- print_dbg(gi, " gl_owner = %d (ended)\n",
- pid_nr(gl->gl_owner_pid));
- } else
- print_dbg(gi, " gl_owner = -1\n");
- print_dbg(gi, " gl_ip = %lu\n", gl->gl_ip);
- print_dbg(gi, " req_gh = %s\n", (gl->gl_req_gh) ? "yes" : "no");
- print_dbg(gi, " lvb_count = %d\n", atomic_read(&gl->gl_lvb_count));
- print_dbg(gi, " object = %s\n", (gl->gl_object) ? "yes" : "no");
- print_dbg(gi, " reclaim = %s\n",
- (list_empty(&gl->gl_reclaim)) ? "no" : "yes");
- if (gl->gl_aspace)
- print_dbg(gi, " aspace = 0x%p nrpages = %lu\n", gl->gl_aspace,
- gl->gl_aspace->i_mapping->nrpages);
- else
- print_dbg(gi, " aspace = no\n");
- print_dbg(gi, " ail = %d\n", atomic_read(&gl->gl_ail_count));
- if (gl->gl_req_gh) {
- error = dump_holder(gi, "Request", gl->gl_req_gh);
- if (error)
- goto out;
- }
list_for_each_entry(gh, &gl->gl_holders, gh_list) {
- error = dump_holder(gi, "Holder", gh);
- if (error)
- goto out;
- }
- list_for_each_entry(gh, &gl->gl_waiters1, gh_list) {
- error = dump_holder(gi, "Waiter1", gh);
- if (error)
- goto out;
- }
- list_for_each_entry(gh, &gl->gl_waiters3, gh_list) {
- error = dump_holder(gi, "Waiter3", gh);
+ error = dump_holder(seq, gh);
if (error)
goto out;
}
- if (test_bit(GLF_DEMOTE, &gl->gl_flags)) {
- print_dbg(gi, " Demotion req to state %u (%llu uS ago)\n",
- gl->gl_demote_state, (unsigned long long)
- (jiffies - gl->gl_demote_time)*(1000000/HZ));
- }
- if (gl->gl_ops == &gfs2_inode_glops && gl->gl_object) {
- if (!test_bit(GLF_LOCK, &gl->gl_flags) &&
- list_empty(&gl->gl_holders)) {
- error = dump_inode(gi, gl->gl_object);
- if (error)
- goto out;
- } else {
- error = -ENOBUFS;
- print_dbg(gi, " Inode: busy\n");
- }
- }
-
- error = 0;
-
+ if (gl->gl_state != LM_ST_UNLOCKED && glops->go_dump)
+ error = glops->go_dump(seq, gl);
out:
- spin_unlock(&gl->gl_spin);
return error;
}

+static int dump_glock(struct seq_file *seq, struct gfs2_glock *gl)
+{
+ int ret;
+ spin_lock(&gl->gl_spin);
+ ret = __dump_glock(seq, gl);
+ spin_unlock(&gl->gl_spin);
+ return ret;
+}
+
/**
* gfs2_dump_lockstate - print out the current lockstate
* @sdp: the filesystem
@@ -2086,7 +1796,7 @@ void gfs2_glock_exit(void)
module_param(scand_secs, uint, S_IRUGO|S_IWUSR);
MODULE_PARM_DESC(scand_secs, "The number of seconds between scand runs");

-static int gfs2_glock_iter_next(struct glock_iter *gi)
+static int gfs2_glock_iter_next(struct gfs2_glock_iter *gi)
{
struct gfs2_glock *gl;

@@ -2104,7 +1814,7 @@ restart:
gfs2_glock_put(gl);
if (gl && gi->gl == NULL)
gi->hash++;
- while(gi->gl == NULL) {
+ while (gi->gl == NULL) {
if (gi->hash >= GFS2_GL_HASH_SIZE)
return 1;
read_lock(gl_lock_addr(gi->hash));
@@ -2122,58 +1832,34 @@ restart:
return 0;
}

-static void gfs2_glock_iter_free(struct glock_iter *gi)
+static void gfs2_glock_iter_free(struct gfs2_glock_iter *gi)
{
if (gi->gl)
gfs2_glock_put(gi->gl);
- kfree(gi);
-}
-
-static struct glock_iter *gfs2_glock_iter_init(struct gfs2_sbd *sdp)
-{
- struct glock_iter *gi;
-
- gi = kmalloc(sizeof (*gi), GFP_KERNEL);
- if (!gi)
- return NULL;
-
- gi->sdp = sdp;
- gi->hash = 0;
- gi->seq = NULL;
gi->gl = NULL;
- memset(gi->string, 0, sizeof(gi->string));
-
- if (gfs2_glock_iter_next(gi)) {
- gfs2_glock_iter_free(gi);
- return NULL;
- }
-
- return gi;
}

-static void *gfs2_glock_seq_start(struct seq_file *file, loff_t *pos)
+static void *gfs2_glock_seq_start(struct seq_file *seq, loff_t *pos)
{
- struct glock_iter *gi;
+ struct gfs2_glock_iter *gi = seq->private;
loff_t n = *pos;

- gi = gfs2_glock_iter_init(file->private);
- if (!gi)
- return NULL;
+ gi->hash = 0;

- while(n--) {
+ do {
if (gfs2_glock_iter_next(gi)) {
gfs2_glock_iter_free(gi);
return NULL;
}
- }
+ } while (n--);

- return gi;
+ return gi->gl;
}

-static void *gfs2_glock_seq_next(struct seq_file *file, void *iter_ptr,
+static void *gfs2_glock_seq_next(struct seq_file *seq, void *iter_ptr,
loff_t *pos)
{
- struct glock_iter *gi = iter_ptr;
+ struct gfs2_glock_iter *gi = seq->private;

(*pos)++;

@@ -2182,24 +1868,18 @@ static void *gfs2_glock_seq_next(struct seq_file *file, void *iter_ptr,
return NULL;
}

- return gi;
+ return gi->gl;
}

-static void gfs2_glock_seq_stop(struct seq_file *file, void *iter_ptr)
+static void gfs2_glock_seq_stop(struct seq_file *seq, void *iter_ptr)
{
- struct glock_iter *gi = iter_ptr;
- if (gi)
- gfs2_glock_iter_free(gi);
+ struct gfs2_glock_iter *gi = seq->private;
+ gfs2_glock_iter_free(gi);
}

-static int gfs2_glock_seq_show(struct seq_file *file, void *iter_ptr)
+static int gfs2_glock_seq_show(struct seq_file *seq, void *iter_ptr)
{
- struct glock_iter *gi = iter_ptr;
-
- gi->seq = file;
- dump_glock(gi, gi->gl);
-
- return 0;
+ return dump_glock(seq, iter_ptr);
}

static const struct seq_operations gfs2_glock_seq_ops = {
@@ -2211,17 +1891,14 @@ static const struct seq_operations gfs2_glock_seq_ops = {

static int gfs2_debugfs_open(struct inode *inode, struct file *file)
{
- struct seq_file *seq;
- int ret;
-
- ret = seq_open(file, &gfs2_glock_seq_ops);
- if (ret)
- return ret;
-
- seq = file->private_data;
- seq->private = inode->i_private;
-
- return 0;
+ int ret = seq_open_private(file, &gfs2_glock_seq_ops,
+ sizeof(struct gfs2_glock_iter));
+ if (ret == 0) {
+ struct seq_file *seq = file->private_data;
+ struct gfs2_glock_iter *gi = seq->private;
+ gi->sdp = inode->i_private;
+ }
+ return ret;
}

static const struct file_operations gfs2_debug_fops = {
@@ -2229,7 +1906,7 @@ static const struct file_operations gfs2_debug_fops = {
.open = gfs2_debugfs_open,
.read = seq_read,
.llseek = seq_lseek,
- .release = seq_release
+ .release = seq_release_private,
};

int gfs2_create_debugfs_file(struct gfs2_sbd *sdp)
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index cdad3e6..7389f8e 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -26,11 +26,8 @@
#define GL_SKIP 0x00000100
#define GL_ATIME 0x00000200
#define GL_NOCACHE 0x00000400
-#define GL_FLOCK 0x00000800
-#define GL_NOCANCEL 0x00001000

#define GLR_TRYFAILED 13
-#define GLR_CANCELED 14

static inline struct gfs2_holder *gfs2_glock_is_locked_by_me(struct gfs2_glock *gl)
{
@@ -41,6 +38,8 @@ static inline struct gfs2_holder *gfs2_glock_is_locked_by_me(struct gfs2_glock *
spin_lock(&gl->gl_spin);
pid = task_pid(current);
list_for_each_entry(gh, &gl->gl_holders, gh_list) {
+ if (!test_bit(HIF_HOLDER, &gh->gh_iflags))
+ break;
if (gh->gh_owner_pid == pid)
goto out;
}
@@ -70,7 +69,7 @@ static inline int gfs2_glock_is_blocking(struct gfs2_glock *gl)
{
int ret;
spin_lock(&gl->gl_spin);
- ret = test_bit(GLF_DEMOTE, &gl->gl_flags) || !list_empty(&gl->gl_waiters3);
+ ret = test_bit(GLF_DEMOTE, &gl->gl_flags);
spin_unlock(&gl->gl_spin);
return ret;
}
@@ -98,6 +97,7 @@ int gfs2_glock_nq_num(struct gfs2_sbd *sdp,
int gfs2_glock_nq_m(unsigned int num_gh, struct gfs2_holder *ghs);
void gfs2_glock_dq_m(unsigned int num_gh, struct gfs2_holder *ghs);
void gfs2_glock_dq_uninit_m(unsigned int num_gh, struct gfs2_holder *ghs);
+void gfs2_print_dbg(struct seq_file *seq, const char *fmt, ...);

/**
* gfs2_glock_nq_init - intialize a holder and enqueue it on a glock
@@ -130,7 +130,6 @@ int gfs2_lvb_hold(struct gfs2_glock *gl);
void gfs2_lvb_unhold(struct gfs2_glock *gl);

void gfs2_glock_cb(void *cb_data, unsigned int type, void *data);
-
void gfs2_glock_schedule_for_reclaim(struct gfs2_glock *gl);
void gfs2_reclaim_glock(struct gfs2_sbd *sdp);
void gfs2_gl_hash_clear(struct gfs2_sbd *sdp, int wait);
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 07d84d1..c6c318c 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -13,6 +13,7 @@
#include <linux/buffer_head.h>
#include <linux/gfs2_ondisk.h>
#include <linux/lm_interface.h>
+#include <linux/bio.h>

#include "gfs2.h"
#include "incore.h"
@@ -172,26 +173,6 @@ static void inode_go_sync(struct gfs2_glock *gl)
}

/**
- * inode_go_xmote_bh - After promoting/demoting a glock
- * @gl: the glock
- *
- */
-
-static void inode_go_xmote_bh(struct gfs2_glock *gl)
-{
- struct gfs2_holder *gh = gl->gl_req_gh;
- struct buffer_head *bh;
- int error;
-
- if (gl->gl_state != LM_ST_UNLOCKED &&
- (!gh || !(gh->gh_flags & GL_SKIP))) {
- error = gfs2_meta_read(gl, gl->gl_name.ln_number, 0, &bh);
- if (!error)
- brelse(bh);
- }
-}
-
-/**
* inode_go_inval - prepare a inode glock to be released
* @gl: the glock
* @flags:
@@ -267,6 +248,26 @@ static int inode_go_lock(struct gfs2_holder *gh)
}

/**
+ * inode_go_dump - print information about an inode
+ * @seq: The iterator
+ * @ip: the inode
+ *
+ * Returns: 0 on success, -ENOBUFS when we run out of space
+ */
+
+static int inode_go_dump(struct seq_file *seq, const struct gfs2_glock *gl)
+{
+ const struct gfs2_inode *ip = gl->gl_object;
+ if (ip == NULL)
+ return 0;
+ gfs2_print_dbg(seq, " I: n:%llu/%llu t:%u f:0x%08lx\n",
+ (unsigned long long)ip->i_no_formal_ino,
+ (unsigned long long)ip->i_no_addr,
+ IF2DT(ip->i_inode.i_mode), ip->i_flags);
+ return 0;
+}
+
+/**
* rgrp_go_demote_ok - Check to see if it's ok to unlock a RG's glock
* @gl: the glock
*
@@ -306,6 +307,22 @@ static void rgrp_go_unlock(struct gfs2_holder *gh)
}

/**
+ * rgrp_go_dump - print out an rgrp
+ * @seq: The iterator
+ * @gl: The glock in question
+ *
+ */
+
+static int rgrp_go_dump(struct seq_file *seq, const struct gfs2_glock *gl)
+{
+ const struct gfs2_rgrpd *rgd = gl->gl_object;
+ if (rgd == NULL)
+ return 0;
+ gfs2_print_dbg(seq, " R: n:%llu\n", (unsigned long long)rgd->rd_addr);
+ return 0;
+}
+
+/**
* trans_go_sync - promote/demote the transaction glock
* @gl: the glock
* @state: the requested state
@@ -330,7 +347,7 @@ static void trans_go_sync(struct gfs2_glock *gl)
*
*/

-static void trans_go_xmote_bh(struct gfs2_glock *gl)
+static int trans_go_xmote_bh(struct gfs2_glock *gl, struct gfs2_holder *gh)
{
struct gfs2_sbd *sdp = gl->gl_sbd;
struct gfs2_inode *ip = GFS2_I(sdp->sd_jdesc->jd_inode);
@@ -338,8 +355,7 @@ static void trans_go_xmote_bh(struct gfs2_glock *gl)
struct gfs2_log_header_host head;
int error;

- if (gl->gl_state != LM_ST_UNLOCKED &&
- test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags)) {
+ if (test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags)) {
j_gl->gl_ops->go_inval(j_gl, DIO_METADATA);

error = gfs2_find_jhead(sdp->sd_jdesc, &head);
@@ -354,6 +370,7 @@ static void trans_go_xmote_bh(struct gfs2_glock *gl)
gfs2_log_pointers_init(sdp, head.lh_blkno);
}
}
+ return 0;
}

/**
@@ -375,12 +392,12 @@ const struct gfs2_glock_operations gfs2_meta_glops = {

const struct gfs2_glock_operations gfs2_inode_glops = {
.go_xmote_th = inode_go_sync,
- .go_xmote_bh = inode_go_xmote_bh,
.go_inval = inode_go_inval,
.go_demote_ok = inode_go_demote_ok,
.go_lock = inode_go_lock,
+ .go_dump = inode_go_dump,
.go_type = LM_TYPE_INODE,
- .go_min_hold_time = HZ / 10,
+ .go_min_hold_time = HZ / 5,
};

const struct gfs2_glock_operations gfs2_rgrp_glops = {
@@ -389,8 +406,9 @@ const struct gfs2_glock_operations gfs2_rgrp_glops = {
.go_demote_ok = rgrp_go_demote_ok,
.go_lock = rgrp_go_lock,
.go_unlock = rgrp_go_unlock,
+ .go_dump = rgrp_go_dump,
.go_type = LM_TYPE_RGRP,
- .go_min_hold_time = HZ / 10,
+ .go_min_hold_time = HZ / 5,
};

const struct gfs2_glock_operations gfs2_trans_glops = {
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index eabe5ea..4b734c6 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -128,20 +128,20 @@ struct gfs2_bufdata {

struct gfs2_glock_operations {
void (*go_xmote_th) (struct gfs2_glock *gl);
- void (*go_xmote_bh) (struct gfs2_glock *gl);
+ int (*go_xmote_bh) (struct gfs2_glock *gl, struct gfs2_holder *gh);
void (*go_inval) (struct gfs2_glock *gl, int flags);
int (*go_demote_ok) (struct gfs2_glock *gl);
int (*go_lock) (struct gfs2_holder *gh);
void (*go_unlock) (struct gfs2_holder *gh);
+ int (*go_dump)(struct seq_file *seq, const struct gfs2_glock *gl);
const int go_type;
const unsigned long go_min_hold_time;
};

enum {
/* States */
- HIF_HOLDER = 6,
+ HIF_HOLDER = 6, /* Set for gh that "holds" the glock */
HIF_FIRST = 7,
- HIF_ABORTED = 9,
HIF_WAIT = 10,
};

@@ -154,20 +154,20 @@ struct gfs2_holder {
unsigned gh_flags;

int gh_error;
- unsigned long gh_iflags;
+ unsigned long gh_iflags; /* HIF_... */
unsigned long gh_ip;
};

enum {
- GLF_LOCK = 1,
- GLF_STICKY = 2,
- GLF_DEMOTE = 3,
- GLF_PENDING_DEMOTE = 4,
- GLF_DIRTY = 5,
- GLF_DEMOTE_IN_PROGRESS = 6,
- GLF_LFLUSH = 7,
- GLF_WAITERS2 = 8,
- GLF_CONV_DEADLK = 9,
+ GLF_LOCK = 1,
+ GLF_STICKY = 2,
+ GLF_DEMOTE = 3,
+ GLF_PENDING_DEMOTE = 4,
+ GLF_DEMOTE_IN_PROGRESS = 5,
+ GLF_DIRTY = 6,
+ GLF_LFLUSH = 7,
+ GLF_INVALIDATE_IN_PROGRESS = 8,
+ GLF_REPLY_PENDING = 9,
};

struct gfs2_glock {
@@ -179,19 +179,14 @@ struct gfs2_glock {
spinlock_t gl_spin;

unsigned int gl_state;
+ unsigned int gl_target;
+ unsigned int gl_reply;
unsigned int gl_hash;
unsigned int gl_demote_state; /* state requested by remote node */
unsigned long gl_demote_time; /* time of first demote request */
- struct pid *gl_owner_pid;
- unsigned long gl_ip;
struct list_head gl_holders;
- struct list_head gl_waiters1; /* HIF_MUTEX */
- struct list_head gl_waiters3; /* HIF_PROMOTE */

const struct gfs2_glock_operations *gl_ops;
-
- struct gfs2_holder *gl_req_gh;
-
void *gl_lock;
char *gl_lvb;
atomic_t gl_lvb_count;
diff --git a/fs/gfs2/locking/dlm/lock.c b/fs/gfs2/locking/dlm/lock.c
index cf7ea8a..fed9a67 100644
--- a/fs/gfs2/locking/dlm/lock.c
+++ b/fs/gfs2/locking/dlm/lock.c
@@ -308,6 +308,9 @@ unsigned int gdlm_lock(void *lock, unsigned int cur_state,
{
struct gdlm_lock *lp = lock;

+ if (req_state == LM_ST_UNLOCKED)
+ return gdlm_unlock(lock, cur_state);
+
clear_bit(LFL_DLM_CANCEL, &lp->flags);
if (flags & LM_FLAG_NOEXP)
set_bit(LFL_NOBLOCK, &lp->flags);
diff --git a/fs/gfs2/locking/nolock/main.c b/fs/gfs2/locking/nolock/main.c
index 284a5ec..627bfb7 100644
--- a/fs/gfs2/locking/nolock/main.c
+++ b/fs/gfs2/locking/nolock/main.c
@@ -107,6 +107,8 @@ static void nolock_put_lock(void *lock)
static unsigned int nolock_lock(void *lock, unsigned int cur_state,
unsigned int req_state, unsigned int flags)
{
+ if (req_state == LM_ST_UNLOCKED)
+ return 0;
return req_state | LM_OUT_CACHEABLE;
}

diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 053e2eb..bcc668d 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -40,8 +40,6 @@ static void gfs2_init_glock_once(struct kmem_cache *cachep, void *foo)
INIT_HLIST_NODE(&gl->gl_list);
spin_lock_init(&gl->gl_spin);
INIT_LIST_HEAD(&gl->gl_holders);
- INIT_LIST_HEAD(&gl->gl_waiters1);
- INIT_LIST_HEAD(&gl->gl_waiters3);
gl->gl_lvb = NULL;
atomic_set(&gl->gl_lvb_count, 0);
INIT_LIST_HEAD(&gl->gl_reclaim);
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 78d75f8..0985362 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -129,7 +129,7 @@ void gfs2_meta_sync(struct gfs2_glock *gl)
}

/**
- * getbuf - Get a buffer with a given address space
+ * gfs2_getbuf - Get a buffer with a given address space
* @gl: the glock
* @blkno: the block number (filesystem scope)
* @create: 1 if the buffer should be created
@@ -137,7 +137,7 @@ void gfs2_meta_sync(struct gfs2_glock *gl)
* Returns: the buffer
*/

-static struct buffer_head *getbuf(struct gfs2_glock *gl, u64 blkno, int create)
+struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create)
{
struct address_space *mapping = gl->gl_aspace->i_mapping;
struct gfs2_sbd *sdp = gl->gl_sbd;
@@ -205,7 +205,7 @@ static void meta_prep_new(struct buffer_head *bh)
struct buffer_head *gfs2_meta_new(struct gfs2_glock *gl, u64 blkno)
{
struct buffer_head *bh;
- bh = getbuf(gl, blkno, CREATE);
+ bh = gfs2_getbuf(gl, blkno, CREATE);
meta_prep_new(bh);
return bh;
}
@@ -223,7 +223,7 @@ struct buffer_head *gfs2_meta_new(struct gfs2_glock *gl, u64 blkno)
int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
struct buffer_head **bhp)
{
- *bhp = getbuf(gl, blkno, CREATE);
+ *bhp = gfs2_getbuf(gl, blkno, CREATE);
if (!buffer_uptodate(*bhp)) {
ll_rw_block(READ_META, 1, bhp);
if (flags & DIO_WAIT) {
@@ -346,7 +346,7 @@ void gfs2_meta_wipe(struct gfs2_inode *ip, u64 bstart, u32 blen)
struct buffer_head *bh;

while (blen) {
- bh = getbuf(ip->i_gl, bstart, NO_CREATE);
+ bh = gfs2_getbuf(ip->i_gl, bstart, NO_CREATE);
if (bh) {
lock_buffer(bh);
gfs2_log_lock(sdp);
@@ -421,7 +421,7 @@ struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 extlen)
if (extlen > max_ra)
extlen = max_ra;

- first_bh = getbuf(gl, dblock, CREATE);
+ first_bh = gfs2_getbuf(gl, dblock, CREATE);

if (buffer_uptodate(first_bh))
goto out;
@@ -432,7 +432,7 @@ struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 extlen)
extlen--;

while (extlen) {
- bh = getbuf(gl, dblock, CREATE);
+ bh = gfs2_getbuf(gl, dblock, CREATE);

if (!buffer_uptodate(bh) && !buffer_locked(bh))
ll_rw_block(READA, 1, &bh);
diff --git a/fs/gfs2/meta_io.h b/fs/gfs2/meta_io.h
index 73e3b1c..b1a5f36 100644
--- a/fs/gfs2/meta_io.h
+++ b/fs/gfs2/meta_io.h
@@ -47,6 +47,7 @@ struct buffer_head *gfs2_meta_new(struct gfs2_glock *gl, u64 blkno);
int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno,
int flags, struct buffer_head **bhp);
int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh);
+struct buffer_head *gfs2_getbuf(struct gfs2_glock *gl, u64 blkno, int create);

void gfs2_attach_bufdata(struct gfs2_glock *gl, struct buffer_head *bh,
int meta);
diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index f55394e..2b556dd 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -507,26 +507,23 @@ static int __gfs2_readpage(void *file, struct page *page)
static int gfs2_readpage(struct file *file, struct page *page)
{
struct gfs2_inode *ip = GFS2_I(page->mapping->host);
- struct gfs2_holder *gh;
+ struct gfs2_holder gh;
int error;

- gh = gfs2_glock_is_locked_by_me(ip->i_gl);
- if (!gh) {
- gh = kmalloc(sizeof(struct gfs2_holder), GFP_NOFS);
- if (!gh)
- return -ENOBUFS;
- gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME, gh);
+ gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME|LM_FLAG_TRY_1CB, &gh);
+ error = gfs2_glock_nq_atime(&gh);
+ if (unlikely(error)) {
unlock_page(page);
- error = gfs2_glock_nq_atime(gh);
- if (likely(error != 0))
- goto out;
- return AOP_TRUNCATED_PAGE;
+ goto out;
}
error = __gfs2_readpage(file, page);
- gfs2_glock_dq(gh);
+ gfs2_glock_dq(&gh);
out:
- gfs2_holder_uninit(gh);
- kfree(gh);
+ gfs2_holder_uninit(&gh);
+ if (error == GLR_TRYFAILED) {
+ yield();
+ return AOP_TRUNCATED_PAGE;
+ }
return error;
}

diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
index e1b7d52..0ff512a 100644
--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -669,8 +669,7 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
int error = 0;

state = (fl->fl_type == F_WRLCK) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
- flags = (IS_SETLKW(cmd) ? 0 : LM_FLAG_TRY) | GL_EXACT | GL_NOCACHE
- | GL_FLOCK;
+ flags = (IS_SETLKW(cmd) ? 0 : LM_FLAG_TRY) | GL_EXACT | GL_NOCACHE;

mutex_lock(&fp->f_fl_mutex);

@@ -683,9 +682,8 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl)
gfs2_glock_dq_wait(fl_gh);
gfs2_holder_reinit(state, flags, fl_gh);
} else {
- error = gfs2_glock_get(GFS2_SB(&ip->i_inode),
- ip->i_no_addr, &gfs2_flock_glops,
- CREATE, &gl);
+ error = gfs2_glock_get(GFS2_SB(&ip->i_inode), ip->i_no_addr,
+ &gfs2_flock_glops, CREATE, &gl);
if (error)
goto out;
gfs2_holder_init(gl, state, flags, fl_gh);
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 2888e4b..fdd3f0f 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -505,7 +505,7 @@ int gfs2_recover_journal(struct gfs2_jdesc *jd)

error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_SHARED,
LM_FLAG_NOEXP | LM_FLAG_PRIORITY |
- GL_NOCANCEL | GL_NOCACHE, &t_gh);
+ GL_NOCACHE, &t_gh);
if (error)
goto fail_gunlock_ji;

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 7aeacbc..12fe38f 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -941,8 +941,7 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
}

error = gfs2_glock_nq_init(sdp->sd_trans_gl, LM_ST_DEFERRED,
- LM_FLAG_PRIORITY | GL_NOCACHE,
- t_gh);
+ GL_NOCACHE, t_gh);

list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) {
error = gfs2_jdesc_check(jd);
--
1.5.1.2

2008-07-11 11:02:19

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 02/18] [GFS2] Fix ordering bug in lock_dlm

From: Steven Whitehouse <[email protected]>

This looks like a lot of change, but in fact its not. Mostly its
things moving from one file to another. The change is just that
instead of queuing lock completions and callbacks from the DLM
we now pass them directly to GFS2.

This gives us a net loss of two list heads per glock (a fair
saving in memory) plus a reduction in the latency of delivering
the messages to GFS2, plus we now have one thread fewer as well.
There was a bug where callbacks and completions could be delivered
in the wrong order due to this unnecessary queuing which is fixed
by this patch.

Signed-off-by: Steven Whitehouse <[email protected]>
Cc: Bob Peterson <[email protected]>

diff --git a/fs/gfs2/locking/dlm/lock.c b/fs/gfs2/locking/dlm/lock.c
index fed9a67..871ffc9 100644
--- a/fs/gfs2/locking/dlm/lock.c
+++ b/fs/gfs2/locking/dlm/lock.c
@@ -11,46 +11,63 @@

static char junk_lvb[GDLM_LVB_SIZE];

-static void queue_complete(struct gdlm_lock *lp)
+
+/* convert dlm lock-mode to gfs lock-state */
+
+static s16 gdlm_make_lmstate(s16 dlmmode)
{
- struct gdlm_ls *ls = lp->ls;
+ switch (dlmmode) {
+ case DLM_LOCK_IV:
+ case DLM_LOCK_NL:
+ return LM_ST_UNLOCKED;
+ case DLM_LOCK_EX:
+ return LM_ST_EXCLUSIVE;
+ case DLM_LOCK_CW:
+ return LM_ST_DEFERRED;
+ case DLM_LOCK_PR:
+ return LM_ST_SHARED;
+ }
+ gdlm_assert(0, "unknown DLM mode %d", dlmmode);
+ return -1;
+}

- clear_bit(LFL_ACTIVE, &lp->flags);
+/* A lock placed on this queue is re-submitted to DLM as soon as the lock_dlm
+ thread gets to it. */
+
+static void queue_submit(struct gdlm_lock *lp)
+{
+ struct gdlm_ls *ls = lp->ls;

spin_lock(&ls->async_lock);
- list_add_tail(&lp->clist, &ls->complete);
+ list_add_tail(&lp->delay_list, &ls->submit);
spin_unlock(&ls->async_lock);
wake_up(&ls->thread_wait);
}

-static inline void gdlm_ast(void *astarg)
+static void wake_up_ast(struct gdlm_lock *lp)
{
- queue_complete(astarg);
+ clear_bit(LFL_AST_WAIT, &lp->flags);
+ smp_mb__after_clear_bit();
+ wake_up_bit(&lp->flags, LFL_AST_WAIT);
}

-static inline void gdlm_bast(void *astarg, int mode)
+static void gdlm_delete_lp(struct gdlm_lock *lp)
{
- struct gdlm_lock *lp = astarg;
struct gdlm_ls *ls = lp->ls;

- if (!mode) {
- printk(KERN_INFO "lock_dlm: bast mode zero %x,%llx\n",
- lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number);
- return;
- }
-
spin_lock(&ls->async_lock);
- if (!lp->bast_mode) {
- list_add_tail(&lp->blist, &ls->blocking);
- lp->bast_mode = mode;
- } else if (lp->bast_mode < mode)
- lp->bast_mode = mode;
+ if (!list_empty(&lp->delay_list))
+ list_del_init(&lp->delay_list);
+ gdlm_assert(!list_empty(&lp->all_list), "%x,%llx", lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number);
+ list_del_init(&lp->all_list);
+ ls->all_locks_count--;
spin_unlock(&ls->async_lock);
- wake_up(&ls->thread_wait);
+
+ kfree(lp);
}

-void gdlm_queue_delayed(struct gdlm_lock *lp)
+static void gdlm_queue_delayed(struct gdlm_lock *lp)
{
struct gdlm_ls *ls = lp->ls;

@@ -59,6 +76,249 @@ void gdlm_queue_delayed(struct gdlm_lock *lp)
spin_unlock(&ls->async_lock);
}

+static void process_complete(struct gdlm_lock *lp)
+{
+ struct gdlm_ls *ls = lp->ls;
+ struct lm_async_cb acb;
+ s16 prev_mode = lp->cur;
+
+ memset(&acb, 0, sizeof(acb));
+
+ if (lp->lksb.sb_status == -DLM_ECANCEL) {
+ log_info("complete dlm cancel %x,%llx flags %lx",
+ lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number,
+ lp->flags);
+
+ lp->req = lp->cur;
+ acb.lc_ret |= LM_OUT_CANCELED;
+ if (lp->cur == DLM_LOCK_IV)
+ lp->lksb.sb_lkid = 0;
+ goto out;
+ }
+
+ if (test_and_clear_bit(LFL_DLM_UNLOCK, &lp->flags)) {
+ if (lp->lksb.sb_status != -DLM_EUNLOCK) {
+ log_info("unlock sb_status %d %x,%llx flags %lx",
+ lp->lksb.sb_status, lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number,
+ lp->flags);
+ return;
+ }
+
+ lp->cur = DLM_LOCK_IV;
+ lp->req = DLM_LOCK_IV;
+ lp->lksb.sb_lkid = 0;
+
+ if (test_and_clear_bit(LFL_UNLOCK_DELETE, &lp->flags)) {
+ gdlm_delete_lp(lp);
+ return;
+ }
+ goto out;
+ }
+
+ if (lp->lksb.sb_flags & DLM_SBF_VALNOTVALID)
+ memset(lp->lksb.sb_lvbptr, 0, GDLM_LVB_SIZE);
+
+ if (lp->lksb.sb_flags & DLM_SBF_ALTMODE) {
+ if (lp->req == DLM_LOCK_PR)
+ lp->req = DLM_LOCK_CW;
+ else if (lp->req == DLM_LOCK_CW)
+ lp->req = DLM_LOCK_PR;
+ }
+
+ /*
+ * A canceled lock request. The lock was just taken off the delayed
+ * list and was never even submitted to dlm.
+ */
+
+ if (test_and_clear_bit(LFL_CANCEL, &lp->flags)) {
+ log_info("complete internal cancel %x,%llx",
+ lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number);
+ lp->req = lp->cur;
+ acb.lc_ret |= LM_OUT_CANCELED;
+ goto out;
+ }
+
+ /*
+ * An error occured.
+ */
+
+ if (lp->lksb.sb_status) {
+ /* a "normal" error */
+ if ((lp->lksb.sb_status == -EAGAIN) &&
+ (lp->lkf & DLM_LKF_NOQUEUE)) {
+ lp->req = lp->cur;
+ if (lp->cur == DLM_LOCK_IV)
+ lp->lksb.sb_lkid = 0;
+ goto out;
+ }
+
+ /* this could only happen with cancels I think */
+ log_info("ast sb_status %d %x,%llx flags %lx",
+ lp->lksb.sb_status, lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number,
+ lp->flags);
+ if (lp->lksb.sb_status == -EDEADLOCK &&
+ lp->ls->fsflags & LM_MFLAG_CONV_NODROP) {
+ lp->req = lp->cur;
+ acb.lc_ret |= LM_OUT_CONV_DEADLK;
+ if (lp->cur == DLM_LOCK_IV)
+ lp->lksb.sb_lkid = 0;
+ goto out;
+ } else
+ return;
+ }
+
+ /*
+ * This is an AST for an EX->EX conversion for sync_lvb from GFS.
+ */
+
+ if (test_and_clear_bit(LFL_SYNC_LVB, &lp->flags)) {
+ wake_up_ast(lp);
+ return;
+ }
+
+ /*
+ * A lock has been demoted to NL because it initially completed during
+ * BLOCK_LOCKS. Now it must be requested in the originally requested
+ * mode.
+ */
+
+ if (test_and_clear_bit(LFL_REREQUEST, &lp->flags)) {
+ gdlm_assert(lp->req == DLM_LOCK_NL, "%x,%llx",
+ lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number);
+ gdlm_assert(lp->prev_req > DLM_LOCK_NL, "%x,%llx",
+ lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number);
+
+ lp->cur = DLM_LOCK_NL;
+ lp->req = lp->prev_req;
+ lp->prev_req = DLM_LOCK_IV;
+ lp->lkf &= ~DLM_LKF_CONVDEADLK;
+
+ set_bit(LFL_NOCACHE, &lp->flags);
+
+ if (test_bit(DFL_BLOCK_LOCKS, &ls->flags) &&
+ !test_bit(LFL_NOBLOCK, &lp->flags))
+ gdlm_queue_delayed(lp);
+ else
+ queue_submit(lp);
+ return;
+ }
+
+ /*
+ * A request is granted during dlm recovery. It may be granted
+ * because the locks of a failed node were cleared. In that case,
+ * there may be inconsistent data beneath this lock and we must wait
+ * for recovery to complete to use it. When gfs recovery is done this
+ * granted lock will be converted to NL and then reacquired in this
+ * granted state.
+ */
+
+ if (test_bit(DFL_BLOCK_LOCKS, &ls->flags) &&
+ !test_bit(LFL_NOBLOCK, &lp->flags) &&
+ lp->req != DLM_LOCK_NL) {
+
+ lp->cur = lp->req;
+ lp->prev_req = lp->req;
+ lp->req = DLM_LOCK_NL;
+ lp->lkf |= DLM_LKF_CONVERT;
+ lp->lkf &= ~DLM_LKF_CONVDEADLK;
+
+ log_debug("rereq %x,%llx id %x %d,%d",
+ lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number,
+ lp->lksb.sb_lkid, lp->cur, lp->req);
+
+ set_bit(LFL_REREQUEST, &lp->flags);
+ queue_submit(lp);
+ return;
+ }
+
+ /*
+ * DLM demoted the lock to NL before it was granted so GFS must be
+ * told it cannot cache data for this lock.
+ */
+
+ if (lp->lksb.sb_flags & DLM_SBF_DEMOTED)
+ set_bit(LFL_NOCACHE, &lp->flags);
+
+out:
+ /*
+ * This is an internal lock_dlm lock
+ */
+
+ if (test_bit(LFL_INLOCK, &lp->flags)) {
+ clear_bit(LFL_NOBLOCK, &lp->flags);
+ lp->cur = lp->req;
+ wake_up_ast(lp);
+ return;
+ }
+
+ /*
+ * Normal completion of a lock request. Tell GFS it now has the lock.
+ */
+
+ clear_bit(LFL_NOBLOCK, &lp->flags);
+ lp->cur = lp->req;
+
+ acb.lc_name = lp->lockname;
+ acb.lc_ret |= gdlm_make_lmstate(lp->cur);
+
+ if (!test_and_clear_bit(LFL_NOCACHE, &lp->flags) &&
+ (lp->cur > DLM_LOCK_NL) && (prev_mode > DLM_LOCK_NL))
+ acb.lc_ret |= LM_OUT_CACHEABLE;
+
+ ls->fscb(ls->sdp, LM_CB_ASYNC, &acb);
+}
+
+static void gdlm_ast(void *astarg)
+{
+ struct gdlm_lock *lp = astarg;
+ clear_bit(LFL_ACTIVE, &lp->flags);
+ process_complete(lp);
+}
+
+static void process_blocking(struct gdlm_lock *lp, int bast_mode)
+{
+ struct gdlm_ls *ls = lp->ls;
+ unsigned int cb = 0;
+
+ switch (gdlm_make_lmstate(bast_mode)) {
+ case LM_ST_EXCLUSIVE:
+ cb = LM_CB_NEED_E;
+ break;
+ case LM_ST_DEFERRED:
+ cb = LM_CB_NEED_D;
+ break;
+ case LM_ST_SHARED:
+ cb = LM_CB_NEED_S;
+ break;
+ default:
+ gdlm_assert(0, "unknown bast mode %u", bast_mode);
+ }
+
+ ls->fscb(ls->sdp, cb, &lp->lockname);
+}
+
+
+static void gdlm_bast(void *astarg, int mode)
+{
+ struct gdlm_lock *lp = astarg;
+
+ if (!mode) {
+ printk(KERN_INFO "lock_dlm: bast mode zero %x,%llx\n",
+ lp->lockname.ln_type,
+ (unsigned long long)lp->lockname.ln_number);
+ return;
+ }
+
+ process_blocking(lp, mode);
+}
+
/* convert gfs lock-state to dlm lock-mode */

static s16 make_mode(s16 lmstate)
@@ -77,24 +337,6 @@ static s16 make_mode(s16 lmstate)
return -1;
}

-/* convert dlm lock-mode to gfs lock-state */
-
-s16 gdlm_make_lmstate(s16 dlmmode)
-{
- switch (dlmmode) {
- case DLM_LOCK_IV:
- case DLM_LOCK_NL:
- return LM_ST_UNLOCKED;
- case DLM_LOCK_EX:
- return LM_ST_EXCLUSIVE;
- case DLM_LOCK_CW:
- return LM_ST_DEFERRED;
- case DLM_LOCK_PR:
- return LM_ST_SHARED;
- }
- gdlm_assert(0, "unknown DLM mode %d", dlmmode);
- return -1;
-}

/* verify agreement with GFS on the current lock state, NB: DLM_LOCK_NL and
DLM_LOCK_IV are both considered LM_ST_UNLOCKED by GFS. */
@@ -173,10 +415,6 @@ static int gdlm_create_lp(struct gdlm_ls *ls, struct lm_lockname *name,
make_strname(name, &lp->strname);
lp->ls = ls;
lp->cur = DLM_LOCK_IV;
- lp->lvb = NULL;
- lp->hold_null = NULL;
- INIT_LIST_HEAD(&lp->clist);
- INIT_LIST_HEAD(&lp->blist);
INIT_LIST_HEAD(&lp->delay_list);

spin_lock(&ls->async_lock);
@@ -188,26 +426,6 @@ static int gdlm_create_lp(struct gdlm_ls *ls, struct lm_lockname *name,
return 0;
}

-void gdlm_delete_lp(struct gdlm_lock *lp)
-{
- struct gdlm_ls *ls = lp->ls;
-
- spin_lock(&ls->async_lock);
- if (!list_empty(&lp->clist))
- list_del_init(&lp->clist);
- if (!list_empty(&lp->blist))
- list_del_init(&lp->blist);
- if (!list_empty(&lp->delay_list))
- list_del_init(&lp->delay_list);
- gdlm_assert(!list_empty(&lp->all_list), "%x,%llx", lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number);
- list_del_init(&lp->all_list);
- ls->all_locks_count--;
- spin_unlock(&ls->async_lock);
-
- kfree(lp);
-}
-
int gdlm_get_lock(void *lockspace, struct lm_lockname *name,
void **lockp)
{
@@ -261,7 +479,7 @@ unsigned int gdlm_do_lock(struct gdlm_lock *lp)

if ((error == -EAGAIN) && (lp->lkf & DLM_LKF_NOQUEUE)) {
lp->lksb.sb_status = -EAGAIN;
- queue_complete(lp);
+ gdlm_ast(lp);
error = 0;
}

@@ -311,6 +529,9 @@ unsigned int gdlm_lock(void *lock, unsigned int cur_state,
if (req_state == LM_ST_UNLOCKED)
return gdlm_unlock(lock, cur_state);

+ if (req_state == LM_ST_UNLOCKED)
+ return gdlm_unlock(lock, cur_state);
+
clear_bit(LFL_DLM_CANCEL, &lp->flags);
if (flags & LM_FLAG_NOEXP)
set_bit(LFL_NOBLOCK, &lp->flags);
@@ -354,7 +575,7 @@ void gdlm_cancel(void *lock)
if (delay_list) {
set_bit(LFL_CANCEL, &lp->flags);
set_bit(LFL_ACTIVE, &lp->flags);
- queue_complete(lp);
+ gdlm_ast(lp);
return;
}

diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h
index a243cf6..ad944c6 100644
--- a/fs/gfs2/locking/dlm/lock_dlm.h
+++ b/fs/gfs2/locking/dlm/lock_dlm.h
@@ -72,15 +72,12 @@ struct gdlm_ls {
int recover_jid_done;
int recover_jid_status;
spinlock_t async_lock;
- struct list_head complete;
- struct list_head blocking;
struct list_head delayed;
struct list_head submit;
struct list_head all_locks;
u32 all_locks_count;
wait_queue_head_t wait_control;
- struct task_struct *thread1;
- struct task_struct *thread2;
+ struct task_struct *thread;
wait_queue_head_t thread_wait;
unsigned long drop_time;
int drop_locks_count;
@@ -117,10 +114,6 @@ struct gdlm_lock {
u32 lkf; /* dlm flags DLM_LKF_ */
unsigned long flags; /* lock_dlm flags LFL_ */

- int bast_mode; /* protected by async_lock */
-
- struct list_head clist; /* complete */
- struct list_head blist; /* blocking */
struct list_head delay_list; /* delayed */
struct list_head all_list; /* all locks for the fs */
struct gdlm_lock *hold_null; /* NL lock for hold_lvb */
@@ -159,11 +152,8 @@ void gdlm_release_threads(struct gdlm_ls *);

/* lock.c */

-s16 gdlm_make_lmstate(s16);
-void gdlm_queue_delayed(struct gdlm_lock *);
void gdlm_submit_delayed(struct gdlm_ls *);
int gdlm_release_all_locks(struct gdlm_ls *);
-void gdlm_delete_lp(struct gdlm_lock *);
unsigned int gdlm_do_lock(struct gdlm_lock *);

int gdlm_get_lock(void *, struct lm_lockname *, void **);
diff --git a/fs/gfs2/locking/dlm/mount.c b/fs/gfs2/locking/dlm/mount.c
index 470bdf6..0628520 100644
--- a/fs/gfs2/locking/dlm/mount.c
+++ b/fs/gfs2/locking/dlm/mount.c
@@ -28,15 +28,11 @@ static struct gdlm_ls *init_gdlm(lm_callback_t cb, struct gfs2_sbd *sdp,
ls->sdp = sdp;
ls->fsflags = flags;
spin_lock_init(&ls->async_lock);
- INIT_LIST_HEAD(&ls->complete);
- INIT_LIST_HEAD(&ls->blocking);
INIT_LIST_HEAD(&ls->delayed);
INIT_LIST_HEAD(&ls->submit);
INIT_LIST_HEAD(&ls->all_locks);
init_waitqueue_head(&ls->thread_wait);
init_waitqueue_head(&ls->wait_control);
- ls->thread1 = NULL;
- ls->thread2 = NULL;
ls->drop_time = jiffies;
ls->jid = -1;

diff --git a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
index e53db6f..f30350a 100644
--- a/fs/gfs2/locking/dlm/thread.c
+++ b/fs/gfs2/locking/dlm/thread.c
@@ -9,255 +9,12 @@

#include "lock_dlm.h"

-/* A lock placed on this queue is re-submitted to DLM as soon as the lock_dlm
- thread gets to it. */
-
-static void queue_submit(struct gdlm_lock *lp)
-{
- struct gdlm_ls *ls = lp->ls;
-
- spin_lock(&ls->async_lock);
- list_add_tail(&lp->delay_list, &ls->submit);
- spin_unlock(&ls->async_lock);
- wake_up(&ls->thread_wait);
-}
-
-static void process_blocking(struct gdlm_lock *lp, int bast_mode)
-{
- struct gdlm_ls *ls = lp->ls;
- unsigned int cb = 0;
-
- switch (gdlm_make_lmstate(bast_mode)) {
- case LM_ST_EXCLUSIVE:
- cb = LM_CB_NEED_E;
- break;
- case LM_ST_DEFERRED:
- cb = LM_CB_NEED_D;
- break;
- case LM_ST_SHARED:
- cb = LM_CB_NEED_S;
- break;
- default:
- gdlm_assert(0, "unknown bast mode %u", lp->bast_mode);
- }
-
- ls->fscb(ls->sdp, cb, &lp->lockname);
-}
-
-static void wake_up_ast(struct gdlm_lock *lp)
-{
- clear_bit(LFL_AST_WAIT, &lp->flags);
- smp_mb__after_clear_bit();
- wake_up_bit(&lp->flags, LFL_AST_WAIT);
-}
-
-static void process_complete(struct gdlm_lock *lp)
-{
- struct gdlm_ls *ls = lp->ls;
- struct lm_async_cb acb;
- s16 prev_mode = lp->cur;
-
- memset(&acb, 0, sizeof(acb));
-
- if (lp->lksb.sb_status == -DLM_ECANCEL) {
- log_info("complete dlm cancel %x,%llx flags %lx",
- lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number,
- lp->flags);
-
- lp->req = lp->cur;
- acb.lc_ret |= LM_OUT_CANCELED;
- if (lp->cur == DLM_LOCK_IV)
- lp->lksb.sb_lkid = 0;
- goto out;
- }
-
- if (test_and_clear_bit(LFL_DLM_UNLOCK, &lp->flags)) {
- if (lp->lksb.sb_status != -DLM_EUNLOCK) {
- log_info("unlock sb_status %d %x,%llx flags %lx",
- lp->lksb.sb_status, lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number,
- lp->flags);
- return;
- }
-
- lp->cur = DLM_LOCK_IV;
- lp->req = DLM_LOCK_IV;
- lp->lksb.sb_lkid = 0;
-
- if (test_and_clear_bit(LFL_UNLOCK_DELETE, &lp->flags)) {
- gdlm_delete_lp(lp);
- return;
- }
- goto out;
- }
-
- if (lp->lksb.sb_flags & DLM_SBF_VALNOTVALID)
- memset(lp->lksb.sb_lvbptr, 0, GDLM_LVB_SIZE);
-
- if (lp->lksb.sb_flags & DLM_SBF_ALTMODE) {
- if (lp->req == DLM_LOCK_PR)
- lp->req = DLM_LOCK_CW;
- else if (lp->req == DLM_LOCK_CW)
- lp->req = DLM_LOCK_PR;
- }
-
- /*
- * A canceled lock request. The lock was just taken off the delayed
- * list and was never even submitted to dlm.
- */
-
- if (test_and_clear_bit(LFL_CANCEL, &lp->flags)) {
- log_info("complete internal cancel %x,%llx",
- lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number);
- lp->req = lp->cur;
- acb.lc_ret |= LM_OUT_CANCELED;
- goto out;
- }
-
- /*
- * An error occured.
- */
-
- if (lp->lksb.sb_status) {
- /* a "normal" error */
- if ((lp->lksb.sb_status == -EAGAIN) &&
- (lp->lkf & DLM_LKF_NOQUEUE)) {
- lp->req = lp->cur;
- if (lp->cur == DLM_LOCK_IV)
- lp->lksb.sb_lkid = 0;
- goto out;
- }
-
- /* this could only happen with cancels I think */
- log_info("ast sb_status %d %x,%llx flags %lx",
- lp->lksb.sb_status, lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number,
- lp->flags);
- if (lp->lksb.sb_status == -EDEADLOCK &&
- lp->ls->fsflags & LM_MFLAG_CONV_NODROP) {
- lp->req = lp->cur;
- acb.lc_ret |= LM_OUT_CONV_DEADLK;
- if (lp->cur == DLM_LOCK_IV)
- lp->lksb.sb_lkid = 0;
- goto out;
- } else
- return;
- }
-
- /*
- * This is an AST for an EX->EX conversion for sync_lvb from GFS.
- */
-
- if (test_and_clear_bit(LFL_SYNC_LVB, &lp->flags)) {
- wake_up_ast(lp);
- return;
- }
-
- /*
- * A lock has been demoted to NL because it initially completed during
- * BLOCK_LOCKS. Now it must be requested in the originally requested
- * mode.
- */
-
- if (test_and_clear_bit(LFL_REREQUEST, &lp->flags)) {
- gdlm_assert(lp->req == DLM_LOCK_NL, "%x,%llx",
- lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number);
- gdlm_assert(lp->prev_req > DLM_LOCK_NL, "%x,%llx",
- lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number);
-
- lp->cur = DLM_LOCK_NL;
- lp->req = lp->prev_req;
- lp->prev_req = DLM_LOCK_IV;
- lp->lkf &= ~DLM_LKF_CONVDEADLK;
-
- set_bit(LFL_NOCACHE, &lp->flags);
-
- if (test_bit(DFL_BLOCK_LOCKS, &ls->flags) &&
- !test_bit(LFL_NOBLOCK, &lp->flags))
- gdlm_queue_delayed(lp);
- else
- queue_submit(lp);
- return;
- }
-
- /*
- * A request is granted during dlm recovery. It may be granted
- * because the locks of a failed node were cleared. In that case,
- * there may be inconsistent data beneath this lock and we must wait
- * for recovery to complete to use it. When gfs recovery is done this
- * granted lock will be converted to NL and then reacquired in this
- * granted state.
- */
-
- if (test_bit(DFL_BLOCK_LOCKS, &ls->flags) &&
- !test_bit(LFL_NOBLOCK, &lp->flags) &&
- lp->req != DLM_LOCK_NL) {
-
- lp->cur = lp->req;
- lp->prev_req = lp->req;
- lp->req = DLM_LOCK_NL;
- lp->lkf |= DLM_LKF_CONVERT;
- lp->lkf &= ~DLM_LKF_CONVDEADLK;
-
- log_debug("rereq %x,%llx id %x %d,%d",
- lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number,
- lp->lksb.sb_lkid, lp->cur, lp->req);
-
- set_bit(LFL_REREQUEST, &lp->flags);
- queue_submit(lp);
- return;
- }
-
- /*
- * DLM demoted the lock to NL before it was granted so GFS must be
- * told it cannot cache data for this lock.
- */
-
- if (lp->lksb.sb_flags & DLM_SBF_DEMOTED)
- set_bit(LFL_NOCACHE, &lp->flags);
-
-out:
- /*
- * This is an internal lock_dlm lock
- */
-
- if (test_bit(LFL_INLOCK, &lp->flags)) {
- clear_bit(LFL_NOBLOCK, &lp->flags);
- lp->cur = lp->req;
- wake_up_ast(lp);
- return;
- }
-
- /*
- * Normal completion of a lock request. Tell GFS it now has the lock.
- */
-
- clear_bit(LFL_NOBLOCK, &lp->flags);
- lp->cur = lp->req;
-
- acb.lc_name = lp->lockname;
- acb.lc_ret |= gdlm_make_lmstate(lp->cur);
-
- if (!test_and_clear_bit(LFL_NOCACHE, &lp->flags) &&
- (lp->cur > DLM_LOCK_NL) && (prev_mode > DLM_LOCK_NL))
- acb.lc_ret |= LM_OUT_CACHEABLE;
-
- ls->fscb(ls->sdp, LM_CB_ASYNC, &acb);
-}
-
-static inline int no_work(struct gdlm_ls *ls, int blocking)
+static inline int no_work(struct gdlm_ls *ls)
{
int ret;

spin_lock(&ls->async_lock);
- ret = list_empty(&ls->complete) && list_empty(&ls->submit);
- if (ret && blocking)
- ret = list_empty(&ls->blocking);
+ ret = list_empty(&ls->submit);
spin_unlock(&ls->async_lock);

return ret;
@@ -276,100 +33,55 @@ static inline int check_drop(struct gdlm_ls *ls)
return 0;
}

-static int gdlm_thread(void *data, int blist)
+static int gdlm_thread(void *data)
{
struct gdlm_ls *ls = (struct gdlm_ls *) data;
struct gdlm_lock *lp = NULL;
- uint8_t complete, blocking, submit, drop;
-
- /* Only thread1 is allowed to do blocking callbacks since gfs
- may wait for a completion callback within a blocking cb. */

while (!kthread_should_stop()) {
wait_event_interruptible(ls->thread_wait,
- !no_work(ls, blist) || kthread_should_stop());
-
- complete = blocking = submit = drop = 0;
+ !no_work(ls) || kthread_should_stop());

spin_lock(&ls->async_lock);

- if (blist && !list_empty(&ls->blocking)) {
- lp = list_entry(ls->blocking.next, struct gdlm_lock,
- blist);
- list_del_init(&lp->blist);
- blocking = lp->bast_mode;
- lp->bast_mode = 0;
- } else if (!list_empty(&ls->complete)) {
- lp = list_entry(ls->complete.next, struct gdlm_lock,
- clist);
- list_del_init(&lp->clist);
- complete = 1;
- } else if (!list_empty(&ls->submit)) {
+ if (!list_empty(&ls->submit)) {
lp = list_entry(ls->submit.next, struct gdlm_lock,
delay_list);
list_del_init(&lp->delay_list);
- submit = 1;
- }
-
- drop = check_drop(ls);
- spin_unlock(&ls->async_lock);
-
- if (complete)
- process_complete(lp);
-
- else if (blocking)
- process_blocking(lp, blocking);
-
- else if (submit)
+ spin_unlock(&ls->async_lock);
gdlm_do_lock(lp);
-
- if (drop)
+ spin_lock(&ls->async_lock);
+ }
+ /* Does this ever happen these days? I hope not anyway */
+ if (check_drop(ls)) {
+ spin_unlock(&ls->async_lock);
ls->fscb(ls->sdp, LM_CB_DROPLOCKS, NULL);
-
- schedule();
+ spin_lock(&ls->async_lock);
+ }
+ spin_unlock(&ls->async_lock);
}

return 0;
}

-static int gdlm_thread1(void *data)
-{
- return gdlm_thread(data, 1);
-}
-
-static int gdlm_thread2(void *data)
-{
- return gdlm_thread(data, 0);
-}
-
int gdlm_init_threads(struct gdlm_ls *ls)
{
struct task_struct *p;
int error;

- p = kthread_run(gdlm_thread1, ls, "lock_dlm1");
- error = IS_ERR(p);
- if (error) {
- log_error("can't start lock_dlm1 thread %d", error);
- return error;
- }
- ls->thread1 = p;
-
- p = kthread_run(gdlm_thread2, ls, "lock_dlm2");
+ p = kthread_run(gdlm_thread, ls, "lock_dlm");
error = IS_ERR(p);
if (error) {
- log_error("can't start lock_dlm2 thread %d", error);
- kthread_stop(ls->thread1);
+ log_error("can't start lock_dlm thread %d", error);
return error;
}
- ls->thread2 = p;
+ ls->thread = p;

return 0;
}

void gdlm_release_threads(struct gdlm_ls *ls)
{
- kthread_stop(ls->thread1);
- kthread_stop(ls->thread2);
+ kthread_stop(ls->thread);
}

--
1.5.1.2

2008-07-11 11:02:34

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 03/18] [GFS2] No lock_nolock

From: Steven Whitehouse <[email protected]>

This patch merges the lock_nolock module into GFS2 itself. As well as removing
some of the overhead of the module, it also means that its now impossible to
build GFS2 without a lock module (which would be a pointless thing to do
anyway).

We also plan to merge lock_dlm into GFS2 in the future, but that is a more
tricky task, and will therefore be a separate patch.

Signed-off-by: Steven Whitehouse <[email protected]>
Cc: David Teigland <[email protected]>

diff --git a/fs/gfs2/Kconfig b/fs/gfs2/Kconfig
index 7f7947e..ab2f57e 100644
--- a/fs/gfs2/Kconfig
+++ b/fs/gfs2/Kconfig
@@ -14,23 +14,11 @@ config GFS2_FS
GFS is perfect consistency -- changes made to the filesystem on one
machine show up immediately on all other machines in the cluster.

- To use the GFS2 filesystem, you will need to enable one or more of
- the below locking modules. Documentation and utilities for GFS2 can
+ To use the GFS2 filesystem in a cluster, you will need to enable
+ the locking module below. Documentation and utilities for GFS2 can
be found here: http://sources.redhat.com/cluster

-config GFS2_FS_LOCKING_NOLOCK
- tristate "GFS2 \"nolock\" locking module"
- depends on GFS2_FS
- help
- Single node locking module for GFS2.
-
- Use this module if you want to use GFS2 on a single node without
- its clustering features. You can still take advantage of the
- large file support, and upgrade to running a full cluster later on
- if required.
-
- If you will only be using GFS2 in cluster mode, you do not need this
- module.
+ The "nolock" lock module is now built in to GFS2 by default.

config GFS2_FS_LOCKING_DLM
tristate "GFS2 DLM locking module"
diff --git a/fs/gfs2/Makefile b/fs/gfs2/Makefile
index e2350df..ec65851 100644
--- a/fs/gfs2/Makefile
+++ b/fs/gfs2/Makefile
@@ -5,6 +5,5 @@ gfs2-y := acl.o bmap.o daemon.o dir.o eaops.o eattr.o glock.o \
ops_fstype.o ops_inode.o ops_super.o quota.o \
recovery.o rgrp.o super.o sys.o trans.o util.o

-obj-$(CONFIG_GFS2_FS_LOCKING_NOLOCK) += locking/nolock/
obj-$(CONFIG_GFS2_FS_LOCKING_DLM) += locking/dlm/

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 519a54c..be7ed50 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -153,7 +153,7 @@ static void glock_free(struct gfs2_glock *gl)
struct gfs2_sbd *sdp = gl->gl_sbd;
struct inode *aspace = gl->gl_aspace;

- if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+ if (sdp->sd_lockstruct.ls_ops->lm_put_lock)
sdp->sd_lockstruct.ls_ops->lm_put_lock(gl->gl_lock);

if (aspace)
@@ -488,6 +488,10 @@ static unsigned int gfs2_lm_lock(struct gfs2_sbd *sdp, void *lock,
unsigned int flags)
{
int ret = LM_OUT_ERROR;
+
+ if (!sdp->sd_lockstruct.ls_ops->lm_lock)
+ return req_state == LM_ST_UNLOCKED ? 0 : req_state;
+
if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
ret = sdp->sd_lockstruct.ls_ops->lm_lock(lock, cur_state,
req_state, flags);
@@ -631,6 +635,8 @@ static int gfs2_lm_get_lock(struct gfs2_sbd *sdp, struct lm_lockname *name,
void **lockp)
{
int error = -EIO;
+ if (!sdp->sd_lockstruct.ls_ops->lm_get_lock)
+ return 0;
if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
error = sdp->sd_lockstruct.ls_ops->lm_get_lock(
sdp->sd_lockstruct.ls_lockspace, name, lockp);
@@ -910,7 +916,8 @@ do_cancel:
gh = list_entry(gl->gl_holders.next, struct gfs2_holder, gh_list);
if (!(gh->gh_flags & LM_FLAG_PRIORITY)) {
spin_unlock(&gl->gl_spin);
- sdp->sd_lockstruct.ls_ops->lm_cancel(gl->gl_lock);
+ if (sdp->sd_lockstruct.ls_ops->lm_cancel)
+ sdp->sd_lockstruct.ls_ops->lm_cancel(gl->gl_lock);
spin_lock(&gl->gl_spin);
}
return;
@@ -1187,6 +1194,8 @@ void gfs2_glock_dq_uninit_m(unsigned int num_gh, struct gfs2_holder *ghs)
static int gfs2_lm_hold_lvb(struct gfs2_sbd *sdp, void *lock, char **lvbp)
{
int error = -EIO;
+ if (!sdp->sd_lockstruct.ls_ops->lm_hold_lvb)
+ return 0;
if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
error = sdp->sd_lockstruct.ls_ops->lm_hold_lvb(lock, lvbp);
return error;
@@ -1226,7 +1235,7 @@ void gfs2_lvb_unhold(struct gfs2_glock *gl)
gfs2_glock_hold(gl);
gfs2_assert(gl->gl_sbd, atomic_read(&gl->gl_lvb_count) > 0);
if (atomic_dec_and_test(&gl->gl_lvb_count)) {
- if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+ if (sdp->sd_lockstruct.ls_ops->lm_unhold_lvb)
sdp->sd_lockstruct.ls_ops->lm_unhold_lvb(gl->gl_lock, gl->gl_lvb);
gl->gl_lvb = NULL;
gfs2_glock_put(gl);
diff --git a/fs/gfs2/locking.c b/fs/gfs2/locking.c
index 663fee7..a4a367a 100644
--- a/fs/gfs2/locking.c
+++ b/fs/gfs2/locking.c
@@ -23,12 +23,54 @@ struct lmh_wrapper {
const struct lm_lockops *lw_ops;
};

+static int nolock_mount(char *table_name, char *host_data,
+ lm_callback_t cb, void *cb_data,
+ unsigned int min_lvb_size, int flags,
+ struct lm_lockstruct *lockstruct,
+ struct kobject *fskobj);
+
/* List of registered low-level locking protocols. A file system selects one
of them by name at mount time, e.g. lock_nolock, lock_dlm. */

+static const struct lm_lockops nolock_ops = {
+ .lm_proto_name = "lock_nolock",
+ .lm_mount = nolock_mount,
+};
+
+static struct lmh_wrapper nolock_proto = {
+ .lw_list = LIST_HEAD_INIT(nolock_proto.lw_list),
+ .lw_ops = &nolock_ops,
+};
+
static LIST_HEAD(lmh_list);
static DEFINE_MUTEX(lmh_lock);

+static int nolock_mount(char *table_name, char *host_data,
+ lm_callback_t cb, void *cb_data,
+ unsigned int min_lvb_size, int flags,
+ struct lm_lockstruct *lockstruct,
+ struct kobject *fskobj)
+{
+ char *c;
+ unsigned int jid;
+
+ c = strstr(host_data, "jid=");
+ if (!c)
+ jid = 0;
+ else {
+ c += 4;
+ sscanf(c, "%u", &jid);
+ }
+
+ lockstruct->ls_jid = jid;
+ lockstruct->ls_first = 1;
+ lockstruct->ls_lvb_size = min_lvb_size;
+ lockstruct->ls_ops = &nolock_ops;
+ lockstruct->ls_flags = LM_LSFLAG_LOCAL;
+
+ return 0;
+}
+
/**
* gfs2_register_lockproto - Register a low-level locking protocol
* @proto: the protocol definition
@@ -116,9 +158,13 @@ int gfs2_mount_lockproto(char *proto_name, char *table_name, char *host_data,
int try = 0;
int error, found;

+
retry:
mutex_lock(&lmh_lock);

+ if (list_empty(&nolock_proto.lw_list))
+ list_add(&lmh_list, &nolock_proto.lw_list);
+
found = 0;
list_for_each_entry(lw, &lmh_list, lw_list) {
if (!strcmp(lw->lw_ops->lm_proto_name, proto_name)) {
@@ -139,7 +185,8 @@ retry:
goto out;
}

- if (!try_module_get(lw->lw_ops->lm_owner)) {
+ if (lw->lw_ops->lm_owner &&
+ !try_module_get(lw->lw_ops->lm_owner)) {
try = 0;
mutex_unlock(&lmh_lock);
msleep(1000);
@@ -158,7 +205,8 @@ out:
void gfs2_unmount_lockproto(struct lm_lockstruct *lockstruct)
{
mutex_lock(&lmh_lock);
- lockstruct->ls_ops->lm_unmount(lockstruct->ls_lockspace);
+ if (lockstruct->ls_ops->lm_unmount)
+ lockstruct->ls_ops->lm_unmount(lockstruct->ls_lockspace);
if (lockstruct->ls_ops->lm_owner)
module_put(lockstruct->ls_ops->lm_owner);
mutex_unlock(&lmh_lock);
diff --git a/fs/gfs2/locking/nolock/Makefile b/fs/gfs2/locking/nolock/Makefile
deleted file mode 100644
index 35e9730..0000000
--- a/fs/gfs2/locking/nolock/Makefile
+++ /dev/null
@@ -1,3 +0,0 @@
-obj-$(CONFIG_GFS2_FS_LOCKING_NOLOCK) += lock_nolock.o
-lock_nolock-y := main.o
-
diff --git a/fs/gfs2/locking/nolock/main.c b/fs/gfs2/locking/nolock/main.c
deleted file mode 100644
index 627bfb7..0000000
--- a/fs/gfs2/locking/nolock/main.c
+++ /dev/null
@@ -1,240 +0,0 @@
-/*
- * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved.
- * Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved.
- *
- * This copyrighted material is made available to anyone wishing to use,
- * modify, copy, or redistribute it subject to the terms and conditions
- * of the GNU General Public License version 2.
- */
-
-#include <linux/module.h>
-#include <linux/slab.h>
-#include <linux/init.h>
-#include <linux/types.h>
-#include <linux/fs.h>
-#include <linux/lm_interface.h>
-
-struct nolock_lockspace {
- unsigned int nl_lvb_size;
-};
-
-static const struct lm_lockops nolock_ops;
-
-static int nolock_mount(char *table_name, char *host_data,
- lm_callback_t cb, void *cb_data,
- unsigned int min_lvb_size, int flags,
- struct lm_lockstruct *lockstruct,
- struct kobject *fskobj)
-{
- char *c;
- unsigned int jid;
- struct nolock_lockspace *nl;
-
- c = strstr(host_data, "jid=");
- if (!c)
- jid = 0;
- else {
- c += 4;
- sscanf(c, "%u", &jid);
- }
-
- nl = kzalloc(sizeof(struct nolock_lockspace), GFP_KERNEL);
- if (!nl)
- return -ENOMEM;
-
- nl->nl_lvb_size = min_lvb_size;
-
- lockstruct->ls_jid = jid;
- lockstruct->ls_first = 1;
- lockstruct->ls_lvb_size = min_lvb_size;
- lockstruct->ls_lockspace = nl;
- lockstruct->ls_ops = &nolock_ops;
- lockstruct->ls_flags = LM_LSFLAG_LOCAL;
-
- return 0;
-}
-
-static void nolock_others_may_mount(void *lockspace)
-{
-}
-
-static void nolock_unmount(void *lockspace)
-{
- struct nolock_lockspace *nl = lockspace;
- kfree(nl);
-}
-
-static void nolock_withdraw(void *lockspace)
-{
-}
-
-/**
- * nolock_get_lock - get a lm_lock_t given a descripton of the lock
- * @lockspace: the lockspace the lock lives in
- * @name: the name of the lock
- * @lockp: return the lm_lock_t here
- *
- * Returns: 0 on success, -EXXX on failure
- */
-
-static int nolock_get_lock(void *lockspace, struct lm_lockname *name,
- void **lockp)
-{
- *lockp = lockspace;
- return 0;
-}
-
-/**
- * nolock_put_lock - get rid of a lock structure
- * @lock: the lock to throw away
- *
- */
-
-static void nolock_put_lock(void *lock)
-{
-}
-
-/**
- * nolock_lock - acquire a lock
- * @lock: the lock to manipulate
- * @cur_state: the current state
- * @req_state: the requested state
- * @flags: modifier flags
- *
- * Returns: A bitmap of LM_OUT_*
- */
-
-static unsigned int nolock_lock(void *lock, unsigned int cur_state,
- unsigned int req_state, unsigned int flags)
-{
- if (req_state == LM_ST_UNLOCKED)
- return 0;
- return req_state | LM_OUT_CACHEABLE;
-}
-
-/**
- * nolock_unlock - unlock a lock
- * @lock: the lock to manipulate
- * @cur_state: the current state
- *
- * Returns: 0
- */
-
-static unsigned int nolock_unlock(void *lock, unsigned int cur_state)
-{
- return 0;
-}
-
-static void nolock_cancel(void *lock)
-{
-}
-
-/**
- * nolock_hold_lvb - hold on to a lock value block
- * @lock: the lock the LVB is associated with
- * @lvbp: return the lm_lvb_t here
- *
- * Returns: 0 on success, -EXXX on failure
- */
-
-static int nolock_hold_lvb(void *lock, char **lvbp)
-{
- struct nolock_lockspace *nl = lock;
- int error = 0;
-
- *lvbp = kzalloc(nl->nl_lvb_size, GFP_NOFS);
- if (!*lvbp)
- error = -ENOMEM;
-
- return error;
-}
-
-/**
- * nolock_unhold_lvb - release a LVB
- * @lock: the lock the LVB is associated with
- * @lvb: the lock value block
- *
- */
-
-static void nolock_unhold_lvb(void *lock, char *lvb)
-{
- kfree(lvb);
-}
-
-static int nolock_plock_get(void *lockspace, struct lm_lockname *name,
- struct file *file, struct file_lock *fl)
-{
- posix_test_lock(file, fl);
-
- return 0;
-}
-
-static int nolock_plock(void *lockspace, struct lm_lockname *name,
- struct file *file, int cmd, struct file_lock *fl)
-{
- int error;
- error = posix_lock_file_wait(file, fl);
- return error;
-}
-
-static int nolock_punlock(void *lockspace, struct lm_lockname *name,
- struct file *file, struct file_lock *fl)
-{
- int error;
- error = posix_lock_file_wait(file, fl);
- return error;
-}
-
-static void nolock_recovery_done(void *lockspace, unsigned int jid,
- unsigned int message)
-{
-}
-
-static const struct lm_lockops nolock_ops = {
- .lm_proto_name = "lock_nolock",
- .lm_mount = nolock_mount,
- .lm_others_may_mount = nolock_others_may_mount,
- .lm_unmount = nolock_unmount,
- .lm_withdraw = nolock_withdraw,
- .lm_get_lock = nolock_get_lock,
- .lm_put_lock = nolock_put_lock,
- .lm_lock = nolock_lock,
- .lm_unlock = nolock_unlock,
- .lm_cancel = nolock_cancel,
- .lm_hold_lvb = nolock_hold_lvb,
- .lm_unhold_lvb = nolock_unhold_lvb,
- .lm_plock_get = nolock_plock_get,
- .lm_plock = nolock_plock,
- .lm_punlock = nolock_punlock,
- .lm_recovery_done = nolock_recovery_done,
- .lm_owner = THIS_MODULE,
-};
-
-static int __init init_nolock(void)
-{
- int error;
-
- error = gfs2_register_lockproto(&nolock_ops);
- if (error) {
- printk(KERN_WARNING
- "lock_nolock: can't register protocol: %d\n", error);
- return error;
- }
-
- printk(KERN_INFO
- "Lock_Nolock (built %s %s) installed\n", __DATE__, __TIME__);
- return 0;
-}
-
-static void __exit exit_nolock(void)
-{
- gfs2_unregister_lockproto(&nolock_ops);
-}
-
-module_init(init_nolock);
-module_exit(exit_nolock);
-
-MODULE_DESCRIPTION("GFS Nolock Locking Module");
-MODULE_AUTHOR("Red Hat, Inc.");
-MODULE_LICENSE("GPL");
-
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index b2028c8..9bd97c5 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -364,6 +364,8 @@ static int map_journal_extents(struct gfs2_sbd *sdp)

static void gfs2_lm_others_may_mount(struct gfs2_sbd *sdp)
{
+ if (!sdp->sd_lockstruct.ls_ops->lm_others_may_mount)
+ return;
if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
sdp->sd_lockstruct.ls_ops->lm_others_may_mount(
sdp->sd_lockstruct.ls_lockspace);
@@ -741,8 +743,7 @@ static int gfs2_lm_mount(struct gfs2_sbd *sdp, int silent)
goto out;
}

- if (gfs2_assert_warn(sdp, sdp->sd_lockstruct.ls_lockspace) ||
- gfs2_assert_warn(sdp, sdp->sd_lockstruct.ls_ops) ||
+ if (gfs2_assert_warn(sdp, sdp->sd_lockstruct.ls_ops) ||
gfs2_assert_warn(sdp, sdp->sd_lockstruct.ls_lvb_size >=
GFS2_MIN_LVB_SIZE)) {
gfs2_unmount_lockproto(&sdp->sd_lockstruct);
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index fdd3f0f..d5e91f4 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -428,6 +428,9 @@ static int clean_journal(struct gfs2_jdesc *jd, struct gfs2_log_header_host *hea
static void gfs2_lm_recovery_done(struct gfs2_sbd *sdp, unsigned int jid,
unsigned int message)
{
+ if (!sdp->sd_lockstruct.ls_ops->lm_recovery_done)
+ return;
+
if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
sdp->sd_lockstruct.ls_ops->lm_recovery_done(
sdp->sd_lockstruct.ls_lockspace, jid, message);
--
1.5.1.2

2008-07-11 11:02:51

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 05/18] [GFS2] Fix ordering of args for list_add

From: Steven Whitehouse <[email protected]>

The patch to remove lock_nolock managed to get the arguments
of this list_add backwards. This fixes it.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/locking.c b/fs/gfs2/locking.c
index a4a367a..523243a 100644
--- a/fs/gfs2/locking.c
+++ b/fs/gfs2/locking.c
@@ -163,7 +163,7 @@ retry:
mutex_lock(&lmh_lock);

if (list_empty(&nolock_proto.lw_list))
- list_add(&lmh_list, &nolock_proto.lw_list);
+ list_add(&nolock_proto.lw_list, &lmh_list);

found = 0;
list_for_each_entry(lw, &lmh_list, lw_list) {
--
1.5.1.2

2008-07-11 11:03:16

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 06/18] [GFS2] Revise readpage locking

From: Steven Whitehouse <[email protected]>

The previous attempt to fix the locking in readpage failed due
to the use of a "try lock" which resulted in occasional high
cpu usage during testing (due to repeated tries) and also it
did not resolve all the ordering problems wrt the transaction
lock (although it did solve all the inode lock ordering problems).

This patch avoids the problem by unlocking the page and getting the
locks in the correct order. This means that we have to retest the
page to ensure that it hasn't changed when we relock the page.

This now passes the tests which were previously failing.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c
index 2b556dd..e64a1b0 100644
--- a/fs/gfs2/ops_address.c
+++ b/fs/gfs2/ops_address.c
@@ -499,31 +499,34 @@ static int __gfs2_readpage(void *file, struct page *page)
* @file: The file to read
* @page: The page of the file
*
- * This deals with the locking required. We use a trylock in order to
- * avoid the page lock / glock ordering problems returning AOP_TRUNCATED_PAGE
- * in the event that we are unable to get the lock.
+ * This deals with the locking required. We have to unlock and
+ * relock the page in order to get the locking in the right
+ * order.
*/

static int gfs2_readpage(struct file *file, struct page *page)
{
- struct gfs2_inode *ip = GFS2_I(page->mapping->host);
+ struct address_space *mapping = page->mapping;
+ struct gfs2_inode *ip = GFS2_I(mapping->host);
struct gfs2_holder gh;
int error;

- gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME|LM_FLAG_TRY_1CB, &gh);
+ unlock_page(page);
+ gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME, &gh);
error = gfs2_glock_nq_atime(&gh);
- if (unlikely(error)) {
- unlock_page(page);
+ if (unlikely(error))
goto out;
- }
- error = __gfs2_readpage(file, page);
+ error = AOP_TRUNCATED_PAGE;
+ lock_page(page);
+ if (page->mapping == mapping && !PageUptodate(page))
+ error = __gfs2_readpage(file, page);
+ else
+ unlock_page(page);
gfs2_glock_dq(&gh);
out:
gfs2_holder_uninit(&gh);
- if (error == GLR_TRYFAILED) {
- yield();
- return AOP_TRUNCATED_PAGE;
- }
+ if (error && error != AOP_TRUNCATED_PAGE)
+ lock_page(page);
return error;
}

--
1.5.1.2

2008-07-11 11:03:47

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 08/18] [GFS2] Remove remote lock dropping code

From: Steven Whitehouse <[email protected]>

There are several reasons why this is undesirable:

1. It never happens during normal operation anyway
2. If it does happen it causes performance to be very, very poor
3. It isn't likely to solve the original problem (memory shortage
on remote DLM node) it was supposed to solve
4. It uses a bunch of arbitrary constants which are unlikely to be
correct for any particular situation and for which the tuning seems
to be a black art.
5. In an N node cluster, only 1/N of the dropped locked will actually
contribute to solving the problem on average.

So all in all we are better off without it. This also makes merging
the lock_dlm module into GFS2 a bit easier.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/gfs2.h b/fs/gfs2/gfs2.h
index 3bb11c0..ef606e3 100644
--- a/fs/gfs2/gfs2.h
+++ b/fs/gfs2/gfs2.h
@@ -16,11 +16,6 @@ enum {
};

enum {
- NO_WAIT = 0,
- WAIT = 1,
-};
-
-enum {
NO_FORCE = 0,
FORCE = 1,
};
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index be7ed50..8d5450f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1316,11 +1316,6 @@ void gfs2_glock_cb(void *cb_data, unsigned int type, void *data)
wake_up_process(sdp->sd_recoverd_process);
return;

- case LM_CB_DROPLOCKS:
- gfs2_gl_hash_clear(sdp, NO_WAIT);
- gfs2_quota_scan(sdp);
- return;
-
default:
gfs2_assert_warn(sdp, 0);
return;
@@ -1508,11 +1503,10 @@ static void clear_glock(struct gfs2_glock *gl)
* @sdp: the filesystem
* @wait: wait until it's all gone
*
- * Called when unmounting the filesystem, or when inter-node lock manager
- * requests DROPLOCKS because it is running out of capacity.
+ * Called when unmounting the filesystem.
*/

-void gfs2_gl_hash_clear(struct gfs2_sbd *sdp, int wait)
+void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
{
unsigned long t;
unsigned int x;
@@ -1527,7 +1521,7 @@ void gfs2_gl_hash_clear(struct gfs2_sbd *sdp, int wait)
cont = 1;
}

- if (!wait || !cont)
+ if (!cont)
break;

if (time_after_eq(jiffies,
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 7389f8e..971d92a 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -132,7 +132,7 @@ void gfs2_lvb_unhold(struct gfs2_glock *gl);
void gfs2_glock_cb(void *cb_data, unsigned int type, void *data);
void gfs2_glock_schedule_for_reclaim(struct gfs2_glock *gl);
void gfs2_reclaim_glock(struct gfs2_sbd *sdp);
-void gfs2_gl_hash_clear(struct gfs2_sbd *sdp, int wait);
+void gfs2_gl_hash_clear(struct gfs2_sbd *sdp);

int __init gfs2_glock_init(void);
void gfs2_glock_exit(void);
diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h
index ad944c6..845a27f 100644
--- a/fs/gfs2/locking/dlm/lock_dlm.h
+++ b/fs/gfs2/locking/dlm/lock_dlm.h
@@ -79,9 +79,6 @@ struct gdlm_ls {
wait_queue_head_t wait_control;
struct task_struct *thread;
wait_queue_head_t thread_wait;
- unsigned long drop_time;
- int drop_locks_count;
- int drop_locks_period;
};

enum {
diff --git a/fs/gfs2/locking/dlm/mount.c b/fs/gfs2/locking/dlm/mount.c
index 0628520..fa31c54 100644
--- a/fs/gfs2/locking/dlm/mount.c
+++ b/fs/gfs2/locking/dlm/mount.c
@@ -22,8 +22,6 @@ static struct gdlm_ls *init_gdlm(lm_callback_t cb, struct gfs2_sbd *sdp,
if (!ls)
return NULL;

- ls->drop_locks_count = GDLM_DROP_COUNT;
- ls->drop_locks_period = GDLM_DROP_PERIOD;
ls->fscb = cb;
ls->sdp = sdp;
ls->fsflags = flags;
@@ -33,7 +31,6 @@ static struct gdlm_ls *init_gdlm(lm_callback_t cb, struct gfs2_sbd *sdp,
INIT_LIST_HEAD(&ls->all_locks);
init_waitqueue_head(&ls->thread_wait);
init_waitqueue_head(&ls->wait_control);
- ls->drop_time = jiffies;
ls->jid = -1;

strncpy(buf, table_name, 256);
diff --git a/fs/gfs2/locking/dlm/sysfs.c b/fs/gfs2/locking/dlm/sysfs.c
index a4ff271..4ec571c 100644
--- a/fs/gfs2/locking/dlm/sysfs.c
+++ b/fs/gfs2/locking/dlm/sysfs.c
@@ -114,17 +114,6 @@ static ssize_t recover_status_show(struct gdlm_ls *ls, char *buf)
return sprintf(buf, "%d\n", ls->recover_jid_status);
}

-static ssize_t drop_count_show(struct gdlm_ls *ls, char *buf)
-{
- return sprintf(buf, "%d\n", ls->drop_locks_count);
-}
-
-static ssize_t drop_count_store(struct gdlm_ls *ls, const char *buf, size_t len)
-{
- ls->drop_locks_count = simple_strtol(buf, NULL, 0);
- return len;
-}
-
struct gdlm_attr {
struct attribute attr;
ssize_t (*show)(struct gdlm_ls *, char *);
@@ -144,7 +133,6 @@ GDLM_ATTR(first_done, 0444, first_done_show, NULL);
GDLM_ATTR(recover, 0644, recover_show, recover_store);
GDLM_ATTR(recover_done, 0444, recover_done_show, NULL);
GDLM_ATTR(recover_status, 0444, recover_status_show, NULL);
-GDLM_ATTR(drop_count, 0644, drop_count_show, drop_count_store);

static struct attribute *gdlm_attrs[] = {
&gdlm_attr_proto_name.attr,
@@ -157,7 +145,6 @@ static struct attribute *gdlm_attrs[] = {
&gdlm_attr_recover.attr,
&gdlm_attr_recover_done.attr,
&gdlm_attr_recover_status.attr,
- &gdlm_attr_drop_count.attr,
NULL,
};

diff --git a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c
index f30350a..38823ef 100644
--- a/fs/gfs2/locking/dlm/thread.c
+++ b/fs/gfs2/locking/dlm/thread.c
@@ -20,19 +20,6 @@ static inline int no_work(struct gdlm_ls *ls)
return ret;
}

-static inline int check_drop(struct gdlm_ls *ls)
-{
- if (!ls->drop_locks_count)
- return 0;
-
- if (time_after(jiffies, ls->drop_time + ls->drop_locks_period * HZ)) {
- ls->drop_time = jiffies;
- if (ls->all_locks_count >= ls->drop_locks_count)
- return 1;
- }
- return 0;
-}
-
static int gdlm_thread(void *data)
{
struct gdlm_ls *ls = (struct gdlm_ls *) data;
@@ -52,12 +39,6 @@ static int gdlm_thread(void *data)
gdlm_do_lock(lp);
spin_lock(&ls->async_lock);
}
- /* Does this ever happen these days? I hope not anyway */
- if (check_drop(ls)) {
- spin_unlock(&ls->async_lock);
- ls->fscb(ls->sdp, LM_CB_DROPLOCKS, NULL);
- spin_lock(&ls->async_lock);
- }
spin_unlock(&ls->async_lock);
}

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 9bd97c5..6ba69dd 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -874,7 +874,7 @@ fail_sb:
fail_locking:
init_locking(sdp, &mount_gh, UNDO);
fail_lm:
- gfs2_gl_hash_clear(sdp, WAIT);
+ gfs2_gl_hash_clear(sdp);
gfs2_lm_unmount(sdp);
while (invalidate_inodes(sb))
yield();
diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
index 6690792..f66ea0f 100644
--- a/fs/gfs2/ops_super.c
+++ b/fs/gfs2/ops_super.c
@@ -126,7 +126,7 @@ static void gfs2_put_super(struct super_block *sb)
gfs2_clear_rgrpd(sdp);
gfs2_jindex_free(sdp);
/* Take apart glock structures and buffer lists */
- gfs2_gl_hash_clear(sdp, WAIT);
+ gfs2_gl_hash_clear(sdp);
/* Unmount the locking protocol */
gfs2_lm_unmount(sdp);

diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 9ab9fc8..6f7e2e5 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -110,18 +110,6 @@ static ssize_t statfs_sync_store(struct gfs2_sbd *sdp, const char *buf,
return len;
}

-static ssize_t shrink_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
-{
- if (!capable(CAP_SYS_ADMIN))
- return -EACCES;
-
- if (simple_strtol(buf, NULL, 0) != 1)
- return -EINVAL;
-
- gfs2_gl_hash_clear(sdp, NO_WAIT);
- return len;
-}
-
static ssize_t quota_sync_store(struct gfs2_sbd *sdp, const char *buf,
size_t len)
{
@@ -175,7 +163,6 @@ static struct gfs2_attr gfs2_attr_##name = __ATTR(name, mode, show, store)
GFS2_ATTR(id, 0444, id_show, NULL);
GFS2_ATTR(fsname, 0444, fsname_show, NULL);
GFS2_ATTR(freeze, 0644, freeze_show, freeze_store);
-GFS2_ATTR(shrink, 0200, NULL, shrink_store);
GFS2_ATTR(withdraw, 0644, withdraw_show, withdraw_store);
GFS2_ATTR(statfs_sync, 0200, NULL, statfs_sync_store);
GFS2_ATTR(quota_sync, 0200, NULL, quota_sync_store);
@@ -186,7 +173,6 @@ static struct attribute *gfs2_attrs[] = {
&gfs2_attr_id.attr,
&gfs2_attr_fsname.attr,
&gfs2_attr_freeze.attr,
- &gfs2_attr_shrink.attr,
&gfs2_attr_withdraw.attr,
&gfs2_attr_statfs_sync.attr,
&gfs2_attr_quota_sync.attr,
diff --git a/include/linux/lm_interface.h b/include/linux/lm_interface.h
index f274997..d0a7112 100644
--- a/include/linux/lm_interface.h
+++ b/include/linux/lm_interface.h
@@ -138,9 +138,6 @@ typedef void (*lm_callback_t) (void *ptr, unsigned int type, void *data);
* LM_CB_NEED_RECOVERY
* The given journal needs to be recovered.
*
- * LM_CB_DROPLOCKS
- * Reduce the number of cached locks.
- *
* LM_CB_ASYNC
* The given lock has been granted.
*/
@@ -149,7 +146,6 @@ typedef void (*lm_callback_t) (void *ptr, unsigned int type, void *data);
#define LM_CB_NEED_D 258
#define LM_CB_NEED_S 259
#define LM_CB_NEED_RECOVERY 260
-#define LM_CB_DROPLOCKS 261
#define LM_CB_ASYNC 262

/*
--
1.5.1.2

2008-07-11 11:03:32

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 07/18] [GFS2] kernel panic mounting volume

From: Bob Peterson <[email protected]>

This patch fixes Red Hat bugzilla bug 450156.

This started with a not-too-improbable mount failure because the
locking protocol was never set back to its proper "lock_dlm" after the
system was rebooted in the middle of a gfs2_fsck. That left a
(purposely) invalid locking protocol in the superblock, which caused an
error when the file system was mounted the next time.

When there's an error mounting, vfs calls DQUOT_OFF, which calls
vfs_quota_off which calls gfs2_sync_fs. Next, gfs2_sync_fs calls
gfs2_log_flush passing s_fs_info. But due to the error, s_fs_info
had been previously set to NULL, and so we have the kernel oops.

My solution in this patch is to test for the NULL value before passing
it. I tested this patch and it fixes the problem.

Signed-off-by: Bob Peterson <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c
index 0b7cc92..6690792 100644
--- a/fs/gfs2/ops_super.c
+++ b/fs/gfs2/ops_super.c
@@ -155,7 +155,7 @@ static void gfs2_write_super(struct super_block *sb)
static int gfs2_sync_fs(struct super_block *sb, int wait)
{
sb->s_dirt = 0;
- if (wait)
+ if (wait && sb->s_fs_info)
gfs2_log_flush(sb->s_fs_info, NULL);
return 0;
}
--
1.5.1.2

2008-07-11 11:04:06

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 09/18] [GFS2] Remove obsolete conversion deadlock avoidance code

From: Steven Whitehouse <[email protected]>

This is only used by GFS1 so can be removed.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/locking/dlm/lock.c b/fs/gfs2/locking/dlm/lock.c
index 871ffc9..894df45 100644
--- a/fs/gfs2/locking/dlm/lock.c
+++ b/fs/gfs2/locking/dlm/lock.c
@@ -80,7 +80,6 @@ static void process_complete(struct gdlm_lock *lp)
{
struct gdlm_ls *ls = lp->ls;
struct lm_async_cb acb;
- s16 prev_mode = lp->cur;

memset(&acb, 0, sizeof(acb));

@@ -160,15 +159,7 @@ static void process_complete(struct gdlm_lock *lp)
lp->lksb.sb_status, lp->lockname.ln_type,
(unsigned long long)lp->lockname.ln_number,
lp->flags);
- if (lp->lksb.sb_status == -EDEADLOCK &&
- lp->ls->fsflags & LM_MFLAG_CONV_NODROP) {
- lp->req = lp->cur;
- acb.lc_ret |= LM_OUT_CONV_DEADLK;
- if (lp->cur == DLM_LOCK_IV)
- lp->lksb.sb_lkid = 0;
- goto out;
- } else
- return;
+ return;
}

/*
@@ -268,10 +259,6 @@ out:
acb.lc_name = lp->lockname;
acb.lc_ret |= gdlm_make_lmstate(lp->cur);

- if (!test_and_clear_bit(LFL_NOCACHE, &lp->flags) &&
- (lp->cur > DLM_LOCK_NL) && (prev_mode > DLM_LOCK_NL))
- acb.lc_ret |= LM_OUT_CACHEABLE;
-
ls->fscb(ls->sdp, LM_CB_ASYNC, &acb);
}

@@ -376,14 +363,6 @@ static inline unsigned int make_flags(struct gdlm_lock *lp,

if (lp->lksb.sb_lkid != 0) {
lkf |= DLM_LKF_CONVERT;
-
- /* Conversion deadlock avoidance by DLM */
-
- if (!(lp->ls->fsflags & LM_MFLAG_CONV_NODROP) &&
- !test_bit(LFL_FORCE_PROMOTE, &lp->flags) &&
- !(lkf & DLM_LKF_NOQUEUE) &&
- cur > DLM_LOCK_NL && req > DLM_LOCK_NL && cur != req)
- lkf |= DLM_LKF_CONVDEADLK;
}

if (lp->lvb)
diff --git a/include/linux/lm_interface.h b/include/linux/lm_interface.h
index d0a7112..2ed8fa1 100644
--- a/include/linux/lm_interface.h
+++ b/include/linux/lm_interface.h
@@ -122,11 +122,9 @@ typedef void (*lm_callback_t) (void *ptr, unsigned int type, void *data);
*/

#define LM_OUT_ST_MASK 0x00000003
-#define LM_OUT_CACHEABLE 0x00000004
#define LM_OUT_CANCELED 0x00000008
#define LM_OUT_ASYNC 0x00000080
#define LM_OUT_ERROR 0x00000100
-#define LM_OUT_CONV_DEADLK 0x00000200

/*
* lm_callback_t types
--
1.5.1.2

2008-07-11 11:04:31

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 04/18] [GFS2] trivial sparse lock annotations

From: Harvey Harrison <[email protected]>

Annotate the &sdp->sd_log_lock.

Signed-off-by: Harvey Harrison <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 548264b..6c6af9f 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -87,6 +87,8 @@ void gfs2_remove_from_ail(struct gfs2_bufdata *bd)
*/

static void gfs2_ail1_start_one(struct gfs2_sbd *sdp, struct gfs2_ail *ai)
+__releases(&sdp->sd_log_lock)
+__acquires(&sdp->sd_log_lock)
{
struct gfs2_bufdata *bd, *s;
struct buffer_head *bh;
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 7711528..7c64510 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -21,6 +21,7 @@
*/

static inline void gfs2_log_lock(struct gfs2_sbd *sdp)
+__acquires(&sdp->sd_log_lock)
{
spin_lock(&sdp->sd_log_lock);
}
@@ -32,6 +33,7 @@ static inline void gfs2_log_lock(struct gfs2_sbd *sdp)
*/

static inline void gfs2_log_unlock(struct gfs2_sbd *sdp)
+__releases(&sdp->sd_log_lock)
{
spin_unlock(&sdp->sd_log_lock);
}
--
1.5.1.2

2008-07-11 11:05:09

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 11/18] [GFS2] Glock documentation

From: Steven Whitehouse <[email protected]>

This patch adds a file describing the internals of GFS2's glock
abstraction.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/Documentation/filesystems/gfs2-glocks.txt b/Documentation/filesystems/gfs2-glocks.txt
new file mode 100644
index 0000000..4dae9a3
--- /dev/null
+++ b/Documentation/filesystems/gfs2-glocks.txt
@@ -0,0 +1,114 @@
+ Glock internal locking rules
+ ------------------------------
+
+This documents the basic principles of the glock state machine
+internals. Each glock (struct gfs2_glock in fs/gfs2/incore.h)
+has two main (internal) locks:
+
+ 1. A spinlock (gl_spin) which protects the internal state such
+ as gl_state, gl_target and the list of holders (gl_holders)
+ 2. A non-blocking bit lock, GLF_LOCK, which is used to prevent other
+ threads from making calls to the DLM, etc. at the same time. If a
+ thread takes this lock, it must then call run_queue (usually via the
+ workqueue) when it releases it in order to ensure any pending tasks
+ are completed.
+
+The gl_holders list contains all the queued lock requests (not
+just the holders) associated with the glock. If there are any
+held locks, then they will be contiguous entries at the head
+of the list. Locks are granted in strictly the order that they
+are queued, except for those marked LM_FLAG_PRIORITY which are
+used only during recovery, and even then only for journal locks.
+
+There are three lock states that users of the glock layer can request,
+namely shared (SH), deferred (DF) and exclusive (EX). Those translate
+to the following DLM lock modes:
+
+Glock mode | DLM lock mode
+------------------------------
+ UN | IV/NL Unlocked (no DLM lock associated with glock) or NL
+ SH | PR (Protected read)
+ DF | CW (Concurrent write)
+ EX | EX (Exclusive)
+
+Thus DF is basically a shared mode which is incompatible with the "normal"
+shared lock mode, SH. In GFS2 the DF mode is used exclusively for direct I/O
+operations. The glocks are basically a lock plus some routines which deal
+with cache management. The following rules apply for the cache:
+
+Glock mode | Cache data | Cache Metadata | Dirty Data | Dirty Metadata
+--------------------------------------------------------------------------
+ UN | No | No | No | No
+ SH | Yes | Yes | No | No
+ DF | No | Yes | No | No
+ EX | Yes | Yes | Yes | Yes
+
+These rules are implemented using the various glock operations which
+are defined for each type of glock. Not all types of glocks use
+all the modes. Only inode glocks use the DF mode for example.
+
+Table of glock operations and per type constants:
+
+Field | Purpose
+----------------------------------------------------------------------------
+go_xmote_th | Called before remote state change (e.g. to sync dirty data)
+go_xmote_bh | Called after remote state change (e.g. to refill cache)
+go_inval | Called if remote state change requires invalidating the cache
+go_demote_ok | Returns boolean value of whether its ok to demote a glock
+ | (e.g. checks timeout, and that there is no cached data)
+go_lock | Called for the first local holder of a lock
+go_unlock | Called on the final local unlock of a lock
+go_dump | Called to print content of object for debugfs file, or on
+ | error to dump glock to the log.
+go_type; | The type of the glock, LM_TYPE_.....
+go_min_hold_time | The minimum hold time
+
+The minimum hold time for each lock is the time after a remote lock
+grant for which we ignore remote demote requests. This is in order to
+prevent a situation where locks are being bounced around the cluster
+from node to node with none of the nodes making any progress. This
+tends to show up most with shared mmaped files which are being written
+to by multiple nodes. By delaying the demotion in response to a
+remote callback, that gives the userspace program time to make
+some progress before the pages are unmapped.
+
+There is a plan to try and remove the go_lock and go_unlock callbacks
+if possible, in order to try and speed up the fast path though the locking.
+Also, eventually we hope to make the glock "EX" mode locally shared
+such that any local locking will be done with the i_mutex as required
+rather than via the glock.
+
+Locking rules for glock operations:
+
+Operation | GLF_LOCK bit lock held | gl_spin spinlock held
+-----------------------------------------------------------------
+go_xmote_th | Yes | No
+go_xmote_bh | Yes | No
+go_inval | Yes | No
+go_demote_ok | Sometimes | Yes
+go_lock | Yes | No
+go_unlock | Yes | No
+go_dump | Sometimes | Yes
+
+N.B. Operations must not drop either the bit lock or the spinlock
+if its held on entry. go_dump and do_demote_ok must never block.
+Note that go_dump will only be called if the glock's state
+indicates that it is caching uptodate data.
+
+Glock locking order within GFS2:
+
+ 1. i_mutex (if required)
+ 2. Rename glock (for rename only)
+ 3. Inode glock(s)
+ (Parents before children, inodes at "same level" with same parent in
+ lock number order)
+ 4. Rgrp glock(s) (for (de)allocation operations)
+ 5. Transaction glock (via gfs2_trans_begin) for non-read operations
+ 6. Page lock (always last, very important!)
+
+There are two glocks per inode. One deals with access to the inode
+itself (locking order as above), and the other, known as the iopen
+glock is used in conjunction with the i_nlink field in the inode to
+determine the lifetime of the inode in question. Locking of inodes
+is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
+
--
1.5.1.2

2008-07-11 11:05:38

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 12/18] [GFS2] Fix module building

From: Steven Whitehouse <[email protected]>

Two lines missed from the previous patch.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h
index 21cf466..3c98e7c 100644
--- a/fs/gfs2/locking/dlm/lock_dlm.h
+++ b/fs/gfs2/locking/dlm/lock_dlm.h
@@ -148,7 +148,6 @@ void gdlm_release_threads(struct gdlm_ls *);
/* lock.c */

void gdlm_submit_delayed(struct gdlm_ls *);
-int gdlm_release_all_locks(struct gdlm_ls *);
unsigned int gdlm_do_lock(struct gdlm_lock *);

int gdlm_get_lock(void *, struct lm_lockname *, void **);
diff --git a/fs/gfs2/locking/dlm/mount.c b/fs/gfs2/locking/dlm/mount.c
index 947e706..09d78c2 100644
--- a/fs/gfs2/locking/dlm/mount.c
+++ b/fs/gfs2/locking/dlm/mount.c
@@ -221,7 +221,6 @@ static void gdlm_withdraw(void *lockspace)

dlm_release_lockspace(ls->dlm_lockspace, 2);
gdlm_release_threads(ls);
- gdlm_release_all_locks(ls);
gdlm_kobject_release(ls);
}

--
1.5.1.2

2008-07-11 11:04:49

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 10/18] [GFS2] Remove all_list from lock_dlm

From: Steven Whitehouse <[email protected]>

I discovered that we had a list onto which every lock_dlm
lock was being put. Its only function was to discover whether
we'd got any locks left after umount. Since there was already
a counter for that purpose as well, I removed the list. The
saving is sizeof(struct list_head) per glock - well worth
having.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/locking/dlm/lock.c b/fs/gfs2/locking/dlm/lock.c
index 894df45..2482c90 100644
--- a/fs/gfs2/locking/dlm/lock.c
+++ b/fs/gfs2/locking/dlm/lock.c
@@ -58,9 +58,6 @@ static void gdlm_delete_lp(struct gdlm_lock *lp)
spin_lock(&ls->async_lock);
if (!list_empty(&lp->delay_list))
list_del_init(&lp->delay_list);
- gdlm_assert(!list_empty(&lp->all_list), "%x,%llx", lp->lockname.ln_type,
- (unsigned long long)lp->lockname.ln_number);
- list_del_init(&lp->all_list);
ls->all_locks_count--;
spin_unlock(&ls->async_lock);

@@ -397,7 +394,6 @@ static int gdlm_create_lp(struct gdlm_ls *ls, struct lm_lockname *name,
INIT_LIST_HEAD(&lp->delay_list);

spin_lock(&ls->async_lock);
- list_add(&lp->all_list, &ls->all_locks);
ls->all_locks_count++;
spin_unlock(&ls->async_lock);

@@ -710,22 +706,3 @@ void gdlm_submit_delayed(struct gdlm_ls *ls)
wake_up(&ls->thread_wait);
}

-int gdlm_release_all_locks(struct gdlm_ls *ls)
-{
- struct gdlm_lock *lp, *safe;
- int count = 0;
-
- spin_lock(&ls->async_lock);
- list_for_each_entry_safe(lp, safe, &ls->all_locks, all_list) {
- list_del_init(&lp->all_list);
-
- if (lp->lvb && lp->lvb != junk_lvb)
- kfree(lp->lvb);
- kfree(lp);
- count++;
- }
- spin_unlock(&ls->async_lock);
-
- return count;
-}
-
diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h
index 845a27f..21cf466 100644
--- a/fs/gfs2/locking/dlm/lock_dlm.h
+++ b/fs/gfs2/locking/dlm/lock_dlm.h
@@ -74,7 +74,6 @@ struct gdlm_ls {
spinlock_t async_lock;
struct list_head delayed;
struct list_head submit;
- struct list_head all_locks;
u32 all_locks_count;
wait_queue_head_t wait_control;
struct task_struct *thread;
@@ -112,7 +111,6 @@ struct gdlm_lock {
unsigned long flags; /* lock_dlm flags LFL_ */

struct list_head delay_list; /* delayed */
- struct list_head all_list; /* all locks for the fs */
struct gdlm_lock *hold_null; /* NL lock for hold_lvb */
};

diff --git a/fs/gfs2/locking/dlm/mount.c b/fs/gfs2/locking/dlm/mount.c
index fa31c54..947e706 100644
--- a/fs/gfs2/locking/dlm/mount.c
+++ b/fs/gfs2/locking/dlm/mount.c
@@ -28,7 +28,6 @@ static struct gdlm_ls *init_gdlm(lm_callback_t cb, struct gfs2_sbd *sdp,
spin_lock_init(&ls->async_lock);
INIT_LIST_HEAD(&ls->delayed);
INIT_LIST_HEAD(&ls->submit);
- INIT_LIST_HEAD(&ls->all_locks);
init_waitqueue_head(&ls->thread_wait);
init_waitqueue_head(&ls->wait_control);
ls->jid = -1;
@@ -173,7 +172,6 @@ out:
static void gdlm_unmount(void *lockspace)
{
struct gdlm_ls *ls = lockspace;
- int rv;

log_debug("unmount flags %lx", ls->flags);

@@ -187,9 +185,7 @@ static void gdlm_unmount(void *lockspace)
gdlm_kobject_release(ls);
dlm_release_lockspace(ls->dlm_lockspace, 2);
gdlm_release_threads(ls);
- rv = gdlm_release_all_locks(ls);
- if (rv)
- log_info("gdlm_unmount: %d stray locks freed", rv);
+ BUG_ON(ls->all_locks_count);
out:
kfree(ls);
}
--
1.5.1.2

2008-07-11 11:06:04

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 14/18] [GFS2] Fix delayed demote race

From: Steven Whitehouse <[email protected]>

There is a race in the delayed demote code where it does the wrong thing
if a demotion to UN has occurred for other reasons before the delay has
expired. This patch adds an assert to catch that condition as well as
fixing the root cause by adding an additional check for the UN state.

Signed-off-by: Steven Whitehouse <[email protected]>
Cc: Bob Peterson <[email protected]>

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 8d5450f..cd0aa21 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -587,6 +587,7 @@ static void run_queue(struct gfs2_glock *gl, const int nonblock)
if (nonblock)
goto out_sched;
set_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
+ GLOCK_BUG_ON(gl, gl->gl_demote_state == LM_ST_EXCLUSIVE);
gl->gl_target = gl->gl_demote_state;
} else {
if (test_bit(GLF_DEMOTE, &gl->gl_flags))
@@ -617,7 +618,9 @@ static void glock_work_func(struct work_struct *work)
if (test_and_clear_bit(GLF_REPLY_PENDING, &gl->gl_flags))
finish_xmote(gl, gl->gl_reply);
spin_lock(&gl->gl_spin);
- if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags)) {
+ if (test_and_clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags) &&
+ gl->gl_state != LM_ST_UNLOCKED &&
+ gl->gl_demote_state != LM_ST_EXCLUSIVE) {
unsigned long holdtime, now = jiffies;
holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
if (time_before(now, holdtime))
--
1.5.1.2

2008-07-11 11:06:36

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 15/18] [GFS2] Allow local DF locks when holding a cached EX glock

From: Steven Whitehouse <[email protected]>

We already allow local SH locks while we hold a cached EX glock, so here
we allow DF locks as well. This works only because we rely on the VFS's
invalidation for locally cached data, and because if we hold an EX lock,
then we know that no other node can be caching data relating to this
file.

It dramatically speeds up initial writes to O_DIRECT files since we fall
back to buffered I/O for this and would otherwise bounce between DF and
EX modes on each and every write call. The lessons to be learned from
that are to ensure that (for the time being anyway) O_DIRECT files are
preallocated and that they are written to using reasonably large I/O
sizes. Even so this change fixes that corner case nicely

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index cd0aa21..13391e5 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -267,8 +267,12 @@ static inline int may_grant(const struct gfs2_glock *gl, const struct gfs2_holde
return 1;
if (gh->gh_flags & GL_EXACT)
return 0;
- if (gh->gh_state == LM_ST_SHARED && gl->gl_state == LM_ST_EXCLUSIVE)
- return 1;
+ if (gl->gl_state == LM_ST_EXCLUSIVE) {
+ if (gh->gh_state == LM_ST_SHARED && gh_head->gh_state == LM_ST_SHARED)
+ return 1;
+ if (gh->gh_state == LM_ST_DEFERRED && gh_head->gh_state == LM_ST_DEFERRED)
+ return 1;
+ }
if (gl->gl_state != LM_ST_UNLOCKED && (gh->gh_flags & LM_FLAG_ANY))
return 1;
return 0;
--
1.5.1.2

2008-07-11 11:07:06

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 16/18] [GFS2] Replace rgrp "recent list" with mru list

From: Steven Whitehouse <[email protected]>

This patch removes the "recent list" which is used during allocation
and replaces it with the (already existing) mru list used during
deletion. The "recent list" was not a true mru list leading to a number
of inefficiencies including a "next" function which made scanning the
list an order N^2 operation wrt to the number of list elements.

This should increase allocation performance with large numbers of rgrps.
Its also a useful preparation and cleanup before some further changes
which are planned in this area.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 4b734c6..4ab3c3a 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -77,7 +77,6 @@ struct gfs2_rgrp_host {
struct gfs2_rgrpd {
struct list_head rd_list; /* Link with superblock */
struct list_head rd_list_mru;
- struct list_head rd_recent; /* Recently used rgrps */
struct gfs2_glock *rd_gl; /* Glock for this rgrp */
u64 rd_addr; /* grp block disk address */
u64 rd_data0; /* first data location */
@@ -529,7 +528,6 @@ struct gfs2_sbd {
struct mutex sd_rindex_mutex;
struct list_head sd_rindex_list;
struct list_head sd_rindex_mru_list;
- struct list_head sd_rindex_recent_list;
struct gfs2_rgrpd *sd_rindex_forward;
unsigned int sd_rgrps;

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 6ba69dd..b4d1d64 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -64,7 +64,6 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
mutex_init(&sdp->sd_rindex_mutex);
INIT_LIST_HEAD(&sdp->sd_rindex_list);
INIT_LIST_HEAD(&sdp->sd_rindex_mru_list);
- INIT_LIST_HEAD(&sdp->sd_rindex_recent_list);

INIT_LIST_HEAD(&sdp->sd_jindex_list);
spin_lock_init(&sdp->sd_jindex_spin);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 3401628..2d90fb2 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -371,11 +371,6 @@ static void clear_rgrpdi(struct gfs2_sbd *sdp)

spin_lock(&sdp->sd_rindex_spin);
sdp->sd_rindex_forward = NULL;
- head = &sdp->sd_rindex_recent_list;
- while (!list_empty(head)) {
- rgd = list_entry(head->next, struct gfs2_rgrpd, rd_recent);
- list_del(&rgd->rd_recent);
- }
spin_unlock(&sdp->sd_rindex_spin);

head = &sdp->sd_rindex_list;
@@ -945,107 +940,30 @@ static struct inode *try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked)
}

/**
- * recent_rgrp_first - get first RG from "recent" list
- * @sdp: The GFS2 superblock
- * @rglast: address of the rgrp used last
- *
- * Returns: The first rgrp in the recent list
- */
-
-static struct gfs2_rgrpd *recent_rgrp_first(struct gfs2_sbd *sdp,
- u64 rglast)
-{
- struct gfs2_rgrpd *rgd;
-
- spin_lock(&sdp->sd_rindex_spin);
-
- if (rglast) {
- list_for_each_entry(rgd, &sdp->sd_rindex_recent_list, rd_recent) {
- if (rgrp_contains_block(rgd, rglast))
- goto out;
- }
- }
- rgd = NULL;
- if (!list_empty(&sdp->sd_rindex_recent_list))
- rgd = list_entry(sdp->sd_rindex_recent_list.next,
- struct gfs2_rgrpd, rd_recent);
-out:
- spin_unlock(&sdp->sd_rindex_spin);
- return rgd;
-}
-
-/**
* recent_rgrp_next - get next RG from "recent" list
* @cur_rgd: current rgrp
- * @remove:
*
* Returns: The next rgrp in the recent list
*/

-static struct gfs2_rgrpd *recent_rgrp_next(struct gfs2_rgrpd *cur_rgd,
- int remove)
+static struct gfs2_rgrpd *recent_rgrp_next(struct gfs2_rgrpd *cur_rgd)
{
struct gfs2_sbd *sdp = cur_rgd->rd_sbd;
struct list_head *head;
struct gfs2_rgrpd *rgd;

spin_lock(&sdp->sd_rindex_spin);
-
- head = &sdp->sd_rindex_recent_list;
-
- list_for_each_entry(rgd, head, rd_recent) {
- if (rgd == cur_rgd) {
- if (cur_rgd->rd_recent.next != head)
- rgd = list_entry(cur_rgd->rd_recent.next,
- struct gfs2_rgrpd, rd_recent);
- else
- rgd = NULL;
-
- if (remove)
- list_del(&cur_rgd->rd_recent);
-
- goto out;
- }
+ head = &sdp->sd_rindex_mru_list;
+ if (unlikely(cur_rgd->rd_list_mru.next == head)) {
+ spin_unlock(&sdp->sd_rindex_spin);
+ return NULL;
}
-
- rgd = NULL;
- if (!list_empty(head))
- rgd = list_entry(head->next, struct gfs2_rgrpd, rd_recent);
-
-out:
+ rgd = list_entry(cur_rgd->rd_list_mru.next, struct gfs2_rgrpd, rd_list_mru);
spin_unlock(&sdp->sd_rindex_spin);
return rgd;
}

/**
- * recent_rgrp_add - add an RG to tail of "recent" list
- * @new_rgd: The rgrp to add
- *
- */
-
-static void recent_rgrp_add(struct gfs2_rgrpd *new_rgd)
-{
- struct gfs2_sbd *sdp = new_rgd->rd_sbd;
- struct gfs2_rgrpd *rgd;
- unsigned int count = 0;
- unsigned int max = sdp->sd_rgrps / gfs2_jindex_size(sdp);
-
- spin_lock(&sdp->sd_rindex_spin);
-
- list_for_each_entry(rgd, &sdp->sd_rindex_recent_list, rd_recent) {
- if (rgd == new_rgd)
- goto out;
-
- if (++count >= max)
- goto out;
- }
- list_add_tail(&new_rgd->rd_recent, &sdp->sd_rindex_recent_list);
-
-out:
- spin_unlock(&sdp->sd_rindex_spin);
-}
-
-/**
* forward_rgrp_get - get an rgrp to try next from full list
* @sdp: The GFS2 superblock
*
@@ -1112,9 +1030,7 @@ static struct inode *get_local_rgrp(struct gfs2_inode *ip, u64 *last_unlinked)
int loops = 0;
int error, rg_locked;

- /* Try recently successful rgrps */
-
- rgd = recent_rgrp_first(sdp, ip->i_goal);
+ rgd = gfs2_blk2rgrpd(sdp, ip->i_goal);

while (rgd) {
rg_locked = 0;
@@ -1136,11 +1052,9 @@ static struct inode *get_local_rgrp(struct gfs2_inode *ip, u64 *last_unlinked)
gfs2_glock_dq_uninit(&al->al_rgd_gh);
if (inode)
return inode;
- rgd = recent_rgrp_next(rgd, 1);
- break;
-
+ /* fall through */
case GLR_TRYFAILED:
- rgd = recent_rgrp_next(rgd, 0);
+ rgd = recent_rgrp_next(rgd);
break;

default:
@@ -1199,7 +1113,9 @@ static struct inode *get_local_rgrp(struct gfs2_inode *ip, u64 *last_unlinked)

out:
if (begin) {
- recent_rgrp_add(rgd);
+ spin_lock(&sdp->sd_rindex_spin);
+ list_move(&rgd->rd_list_mru, &sdp->sd_rindex_mru_list);
+ spin_unlock(&sdp->sd_rindex_spin);
rgd = gfs2_rgrpd_get_next(rgd);
if (!rgd)
rgd = gfs2_rgrpd_get_first(sdp);
--
1.5.1.2

2008-07-11 11:07:43

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 17/18] [GFS2] Remove support for unused and pointless flag

From: Steven Whitehouse <[email protected]>

The ability to mark files for direct i/o access when opened
normally is both unused and pointless, so this patch removes
support for that feature.

Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 4ab3c3a..448697a 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -421,7 +421,6 @@ struct gfs2_tune {
unsigned int gt_quota_quantum; /* Secs between syncs to quota file */
unsigned int gt_atime_quantum; /* Min secs between atime updates */
unsigned int gt_new_files_jdata;
- unsigned int gt_new_files_directio;
unsigned int gt_max_readahead; /* Max bytes to read-ahead from disk */
unsigned int gt_stall_secs; /* Detects trouble! */
unsigned int gt_complain_secs;
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index caf4090..6da0ab3 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -789,13 +789,8 @@ static void init_dinode(struct gfs2_inode *dip, struct gfs2_glock *gl,
if ((dip->i_di.di_flags & GFS2_DIF_INHERIT_JDATA) ||
gfs2_tune_get(sdp, gt_new_files_jdata))
di->di_flags |= cpu_to_be32(GFS2_DIF_JDATA);
- if ((dip->i_di.di_flags & GFS2_DIF_INHERIT_DIRECTIO) ||
- gfs2_tune_get(sdp, gt_new_files_directio))
- di->di_flags |= cpu_to_be32(GFS2_DIF_DIRECTIO);
} else if (S_ISDIR(mode)) {
di->di_flags |= cpu_to_be32(dip->i_di.di_flags &
- GFS2_DIF_INHERIT_DIRECTIO);
- di->di_flags |= cpu_to_be32(dip->i_di.di_flags &
GFS2_DIF_INHERIT_JDATA);
}

diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
index 1737af9..21b397d 100644
--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -134,7 +134,6 @@ static const u32 fsflags_to_gfs2[32] = {
[7] = GFS2_DIF_NOATIME,
[12] = GFS2_DIF_EXHASH,
[14] = GFS2_DIF_INHERIT_JDATA,
- [20] = GFS2_DIF_INHERIT_DIRECTIO,
};

static const u32 gfs2_to_fsflags[32] = {
@@ -143,7 +142,6 @@ static const u32 gfs2_to_fsflags[32] = {
[gfs2fl_AppendOnly] = FS_APPEND_FL,
[gfs2fl_NoAtime] = FS_NOATIME_FL,
[gfs2fl_ExHash] = FS_INDEX_FL,
- [gfs2fl_InheritDirectio] = FS_DIRECTIO_FL,
[gfs2fl_InheritJdata] = FS_JOURNAL_DATA_FL,
};

@@ -161,12 +159,8 @@ static int gfs2_get_flags(struct file *filp, u32 __user *ptr)
return error;

fsflags = fsflags_cvt(gfs2_to_fsflags, ip->i_di.di_flags);
- if (!S_ISDIR(inode->i_mode)) {
- if (ip->i_di.di_flags & GFS2_DIF_JDATA)
- fsflags |= FS_JOURNAL_DATA_FL;
- if (ip->i_di.di_flags & GFS2_DIF_DIRECTIO)
- fsflags |= FS_DIRECTIO_FL;
- }
+ if (!S_ISDIR(inode->i_mode) && ip->i_di.di_flags & GFS2_DIF_JDATA)
+ fsflags |= FS_JOURNAL_DATA_FL;
if (put_user(fsflags, ptr))
error = -EFAULT;

@@ -195,13 +189,11 @@ void gfs2_set_inode_flags(struct inode *inode)

/* Flags that can be set by user space */
#define GFS2_FLAGS_USER_SET (GFS2_DIF_JDATA| \
- GFS2_DIF_DIRECTIO| \
GFS2_DIF_IMMUTABLE| \
GFS2_DIF_APPENDONLY| \
GFS2_DIF_NOATIME| \
GFS2_DIF_SYNC| \
GFS2_DIF_SYSTEM| \
- GFS2_DIF_INHERIT_DIRECTIO| \
GFS2_DIF_INHERIT_JDATA)

/**
@@ -292,8 +284,6 @@ static int gfs2_set_flags(struct file *filp, u32 __user *ptr)
if (!S_ISDIR(inode->i_mode)) {
if (gfsflags & GFS2_DIF_INHERIT_JDATA)
gfsflags ^= (GFS2_DIF_JDATA | GFS2_DIF_INHERIT_JDATA);
- if (gfsflags & GFS2_DIF_INHERIT_DIRECTIO)
- gfsflags ^= (GFS2_DIF_DIRECTIO | GFS2_DIF_INHERIT_DIRECTIO);
return do_gfs2_set_flags(filp, gfsflags, ~0);
}
return do_gfs2_set_flags(filp, gfsflags, ~GFS2_DIF_JDATA);
@@ -494,11 +484,6 @@ static int gfs2_open(struct inode *inode, struct file *file)
goto fail_gunlock;
}

- /* Listen to the Direct I/O flag */
-
- if (ip->i_di.di_flags & GFS2_DIF_DIRECTIO)
- file->f_flags |= O_DIRECT;
-
gfs2_glock_dq_uninit(&i_gh);
}

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 12fe38f..63a8a90 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -65,7 +65,6 @@ void gfs2_tune_init(struct gfs2_tune *gt)
gt->gt_quota_quantum = 60;
gt->gt_atime_quantum = 3600;
gt->gt_new_files_jdata = 0;
- gt->gt_new_files_directio = 0;
gt->gt_max_readahead = 1 << 18;
gt->gt_stall_secs = 600;
gt->gt_complain_secs = 10;
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 6f7e2e5..7484655 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -412,7 +412,6 @@ TUNE_ATTR(max_readahead, 0);
TUNE_ATTR(complain_secs, 0);
TUNE_ATTR(statfs_slow, 0);
TUNE_ATTR(new_files_jdata, 0);
-TUNE_ATTR(new_files_directio, 0);
TUNE_ATTR(quota_simul_sync, 1);
TUNE_ATTR(quota_cache_secs, 1);
TUNE_ATTR(stall_secs, 1);
@@ -441,7 +440,6 @@ static struct attribute *tune_attrs[] = {
&tune_attr_quotad_secs.attr,
&tune_attr_quota_scale.attr,
&tune_attr_new_files_jdata.attr,
- &tune_attr_new_files_directio.attr,
NULL,
};

--
1.5.1.2

2008-07-11 11:08:12

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 13/18] [GFS2] don't call permission()

From: Miklos Szeredi <[email protected]>

GFS2 calls permission() to verify permissions after locks on the files
have been taken.

For this it's sufficient to call gfs2_permission() instead. This
results in the following changes:

- IS_RDONLY() check is not performed
- IS_IMMUTABLE() check is not performed
- devcgroup_inode_permission() is not called
- security_inode_permission() is not called

IS_RDONLY() should be unnecessary anyway, as the per-mount read-only
flag should provide protection against read-only remounts during
operations. do_gfs2_set_flags() has been fixed to perform
mnt_want_write()/mnt_drop_write() to protect against remounting
read-only.

IS_IMMUTABLE has been added to gfs2_permission()

Repeating the security checks seems to be pointless, as they don't
normally change, and if they do, it's independent of the filesystem
state.

Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 09453d0..caf4090 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -504,7 +504,7 @@ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name,
}

if (!is_root) {
- error = permission(dir, MAY_EXEC, NULL);
+ error = gfs2_permission(dir, MAY_EXEC);
if (error)
goto out;
}
@@ -667,7 +667,7 @@ static int create_ok(struct gfs2_inode *dip, const struct qstr *name,
{
int error;

- error = permission(&dip->i_inode, MAY_WRITE | MAY_EXEC, NULL);
+ error = gfs2_permission(&dip->i_inode, MAY_WRITE | MAY_EXEC);
if (error)
return error;

@@ -1134,7 +1134,7 @@ int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name,
if (IS_APPEND(&dip->i_inode))
return -EPERM;

- error = permission(&dip->i_inode, MAY_WRITE | MAY_EXEC, NULL);
+ error = gfs2_permission(&dip->i_inode, MAY_WRITE | MAY_EXEC);
if (error)
return error;

diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index 580da45..04e9fef 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -91,6 +91,7 @@ int gfs2_rmdiri(struct gfs2_inode *dip, const struct qstr *name,
struct gfs2_inode *ip);
int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name,
const struct gfs2_inode *ip);
+int gfs2_permission(struct inode *inode, int mask);
int gfs2_ok_to_move(struct gfs2_inode *this, struct gfs2_inode *to);
int gfs2_readlinki(struct gfs2_inode *ip, char **buf, unsigned int *len);
int gfs2_glock_nq_atime(struct gfs2_holder *gh);
diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c
index 0ff512a..1737af9 100644
--- a/fs/gfs2/ops_file.c
+++ b/fs/gfs2/ops_file.c
@@ -15,6 +15,7 @@
#include <linux/uio.h>
#include <linux/blkdev.h>
#include <linux/mm.h>
+#include <linux/mount.h>
#include <linux/fs.h>
#include <linux/gfs2_ondisk.h>
#include <linux/ext2_fs.h>
@@ -220,10 +221,14 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask)
int error;
u32 new_flags, flags;

- error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
+ error = mnt_want_write(filp->f_path.mnt);
if (error)
return error;

+ error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
+ if (error)
+ goto out_drop_write;
+
flags = ip->i_di.di_flags;
new_flags = (flags & ~mask) | (reqflags & mask);
if ((new_flags ^ flags) == 0)
@@ -242,7 +247,7 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask)
!capable(CAP_LINUX_IMMUTABLE))
goto out;
if (!IS_IMMUTABLE(inode)) {
- error = permission(inode, MAY_WRITE, NULL);
+ error = gfs2_permission(inode, MAY_WRITE);
if (error)
goto out;
}
@@ -272,6 +277,8 @@ out_trans_end:
gfs2_trans_end(sdp);
out:
gfs2_glock_dq_uninit(&gh);
+out_drop_write:
+ mnt_drop_write(filp->f_path.mnt);
return error;
}

diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c
index 2686ad4..1e252df 100644
--- a/fs/gfs2/ops_inode.c
+++ b/fs/gfs2/ops_inode.c
@@ -163,7 +163,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir,
if (error)
goto out;

- error = permission(dir, MAY_WRITE | MAY_EXEC, NULL);
+ error = gfs2_permission(dir, MAY_WRITE | MAY_EXEC);
if (error)
goto out_gunlock;

@@ -669,7 +669,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
}
}
} else {
- error = permission(ndir, MAY_WRITE | MAY_EXEC, NULL);
+ error = gfs2_permission(ndir, MAY_WRITE | MAY_EXEC);
if (error)
goto out_gunlock;

@@ -704,7 +704,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry,
/* Check out the dir to be renamed */

if (dir_rename) {
- error = permission(odentry->d_inode, MAY_WRITE, NULL);
+ error = gfs2_permission(odentry->d_inode, MAY_WRITE);
if (error)
goto out_gunlock;
}
@@ -891,7 +891,7 @@ static void *gfs2_follow_link(struct dentry *dentry, struct nameidata *nd)
* Returns: errno
*/

-static int gfs2_permission(struct inode *inode, int mask, struct nameidata *nd)
+int gfs2_permission(struct inode *inode, int mask)
{
struct gfs2_inode *ip = GFS2_I(inode);
struct gfs2_holder i_gh;
@@ -905,13 +905,22 @@ static int gfs2_permission(struct inode *inode, int mask, struct nameidata *nd)
unlock = 1;
}

- error = generic_permission(inode, mask, gfs2_check_acl);
+ if ((mask & MAY_WRITE) && IS_IMMUTABLE(inode))
+ error = -EACCES;
+ else
+ error = generic_permission(inode, mask, gfs2_check_acl);
if (unlock)
gfs2_glock_dq_uninit(&i_gh);

return error;
}

+static int gfs2_iop_permission(struct inode *inode, int mask,
+ struct nameidata *nd)
+{
+ return gfs2_permission(inode, mask);
+}
+
static int setattr_size(struct inode *inode, struct iattr *attr)
{
struct gfs2_inode *ip = GFS2_I(inode);
@@ -1141,7 +1150,7 @@ static int gfs2_removexattr(struct dentry *dentry, const char *name)
}

const struct inode_operations gfs2_file_iops = {
- .permission = gfs2_permission,
+ .permission = gfs2_iop_permission,
.setattr = gfs2_setattr,
.getattr = gfs2_getattr,
.setxattr = gfs2_setxattr,
@@ -1160,7 +1169,7 @@ const struct inode_operations gfs2_dir_iops = {
.rmdir = gfs2_rmdir,
.mknod = gfs2_mknod,
.rename = gfs2_rename,
- .permission = gfs2_permission,
+ .permission = gfs2_iop_permission,
.setattr = gfs2_setattr,
.getattr = gfs2_getattr,
.setxattr = gfs2_setxattr,
@@ -1172,7 +1181,7 @@ const struct inode_operations gfs2_dir_iops = {
const struct inode_operations gfs2_symlink_iops = {
.readlink = gfs2_readlink,
.follow_link = gfs2_follow_link,
- .permission = gfs2_permission,
+ .permission = gfs2_iop_permission,
.setattr = gfs2_setattr,
.getattr = gfs2_getattr,
.setxattr = gfs2_setxattr,
--
1.5.1.2

2008-07-11 11:08:28

by Steven Whitehouse

[permalink] [raw]
Subject: [PATCH 18/18] [GFS2] Remove unused declaration

From: Li Xiaodong <[email protected]>

The implementation of gfs2_inode_attr_in is removed.
So remove its declaration.

Signed-off-by: Li Xiaodong <[email protected]>
Signed-off-by: Steven Whitehouse <[email protected]>

diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h
index 04e9fef..6074c25 100644
--- a/fs/gfs2/inode.h
+++ b/fs/gfs2/inode.h
@@ -72,7 +72,6 @@ static inline void gfs2_inum_out(const struct gfs2_inode *ip,
}


-void gfs2_inode_attr_in(struct gfs2_inode *ip);
void gfs2_set_iop(struct inode *inode);
struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type,
u64 no_addr, u64 no_formal_ino,
--
1.5.1.2

2008-07-11 11:21:16

by linux-os (Dick Johnson)

[permalink] [raw]
Subject: Re: [PATCH 17/18] [GFS2] Remove support for unused and pointless flag


On Fri, 11 Jul 2008 [email protected] wrote:

> From: Steven Whitehouse <[email protected]>
>
> The ability to mark files for direct i/o access when opened
> normally is both unused and pointless, so this patch removes
> support for that feature.
[Snipped...]

So linux is no longer going to support commercial databases?
Oracle and others need the O_DIRECT attribute. Linux can
probably igonore it, but such an open cannot fail or else
Linux gets thrown out of the commercial enterprise when
an upgrade disables an entire financial institution.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.22.1 on an i686 machine (5588.28 BogoMips).
My book : http://www.AbominableFirebug.com/
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2008-07-11 11:53:18

by Steven Whitehouse

[permalink] [raw]
Subject: Re: [PATCH 17/18] [GFS2] Remove support for unused and pointless flag

Hi,

On Fri, 2008-07-11 at 07:19 -0400, linux-os (Dick Johnson) wrote:
> On Fri, 11 Jul 2008 [email protected] wrote:
>
> > From: Steven Whitehouse <[email protected]>
> >
> > The ability to mark files for direct i/o access when opened
> > normally is both unused and pointless, so this patch removes
> > support for that feature.
> [Snipped...]
>
> So linux is no longer going to support commercial databases?
> Oracle and others need the O_DIRECT attribute. Linux can
> probably igonore it, but such an open cannot fail or else
> Linux gets thrown out of the commercial enterprise when
> an upgrade disables an entire financial institution.
>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.22.1 on an i686 machine (5588.28 BogoMips).
> My book : http://www.AbominableFirebug.com/
> _
>
No, thats not what it means...., but perhaps I could have explained it
better. You can still use O_DIRECT flags in open syscalls in exactly the
same way as before. What this patch removes is the (never used, and IMHO
pointless) feature of being able to set a flag on the inode (via
chattr/setattr) to force open's to set O_DIRECT in every case, even when
the application didn't ask for it. The reason that its pointless is that
applications which don't otherwise support O_DIRECT are very unlikely to
supply suitably aligned buffers and I/O requests, so that overriding the
O_DIRECT flag for such applications will most likely result in an I/O
error.

GFS2 will still support O_DIRECT just the same as it has always done,
and this doesn't affect any other filesystem's support for that feature
either,

Steve.

2008-07-11 12:50:06

by linux-os (Dick Johnson)

[permalink] [raw]
Subject: Re: [PATCH 17/18] [GFS2] Remove support for unused and pointlessflag


On Fri, 11 Jul 2008, Steven Whitehouse wrote:

> Hi,
>
> On Fri, 2008-07-11 at 07:19 -0400, linux-os (Dick Johnson) wrote:
>> On Fri, 11 Jul 2008 [email protected] wrote:
>>
>>> From: Steven Whitehouse <[email protected]>
>>>
>>> The ability to mark files for direct i/o access when opened
>>> normally is both unused and pointless, so this patch removes
>>> support for that feature.
>> [Snipped...]
>>
>> So linux is no longer going to support commercial databases?
[Snipped...]

>>
> No, thats not what it means...., but perhaps I could have explained it
> better. You can still use O_DIRECT flags in open syscalls in exactly the
> same way as before. What this patch removes is the (never used, and IMHO
> pointless) feature of being able to set a flag on the inode (via
> chattr/setattr) to force open's to set O_DIRECT in every case, even when
> the application didn't ask for it. The reason that its pointless is that
> applications which don't otherwise support O_DIRECT are very unlikely to
> supply suitably aligned buffers and I/O requests, so that overriding the
> O_DIRECT flag for such applications will most likely result in an I/O
> error.
>
> GFS2 will still support O_DIRECT just the same as it has always done,
> and this doesn't affect any other filesystem's support for that feature
> either,
>
> Steve.

Thanks for the clarification!

Cheers,
Dick Johnson
Penguin : Linux version 2.6.22.1 on an i686 machine (5588.28 BogoMips).
My book : http://www.AbominableFirebug.com/
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2008-07-14 20:52:17

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH 11/18] [GFS2] Glock documentation

On Fri, 11 Jul 2008 11:11:12 +0100 [email protected] wrote:

> From: Steven Whitehouse <[email protected]>
>
> This patch adds a file describing the internals of GFS2's glock
> abstraction.
>
> Signed-off-by: Steven Whitehouse <[email protected]>
>
> diff --git a/Documentation/filesystems/gfs2-glocks.txt b/Documentation/filesystems/gfs2-glocks.txt
> new file mode 100644
> index 0000000..4dae9a3
> --- /dev/null
> +++ b/Documentation/filesystems/gfs2-glocks.txt
> @@ -0,0 +1,114 @@
> + Glock internal locking rules
> + ------------------------------
> +
> +This documents the basic principles of the glock state machine
> +internals. Each glock (struct gfs2_glock in fs/gfs2/incore.h)
> +has two main (internal) locks:
> +
> + 1. A spinlock (gl_spin) which protects the internal state such
> + as gl_state, gl_target and the list of holders (gl_holders)
> + 2. A non-blocking bit lock, GLF_LOCK, which is used to prevent other
> + threads from making calls to the DLM, etc. at the same time. If a
> + thread takes this lock, it must then call run_queue (usually via the
> + workqueue) when it releases it in order to ensure any pending tasks
> + are completed.
> +
> +The gl_holders list contains all the queued lock requests (not
> +just the holders) associated with the glock. If there are any
> +held locks, then they will be contiguous entries at the head
> +of the list. Locks are granted in strictly the order that they
> +are queued, except for those marked LM_FLAG_PRIORITY which are
> +used only during recovery, and even then only for journal locks.
> +
> +There are three lock states that users of the glock layer can request,
> +namely shared (SH), deferred (DF) and exclusive (EX). Those translate
> +to the following DLM lock modes:
> +
> +Glock mode | DLM lock mode
> +------------------------------
> + UN | IV/NL Unlocked (no DLM lock associated with glock) or NL
> + SH | PR (Protected read)
> + DF | CW (Concurrent write)
> + EX | EX (Exclusive)
> +
> +Thus DF is basically a shared mode which is incompatible with the "normal"
> +shared lock mode, SH. In GFS2 the DF mode is used exclusively for direct I/O
> +operations. The glocks are basically a lock plus some routines which deal
> +with cache management. The following rules apply for the cache:
> +
> +Glock mode | Cache data | Cache Metadata | Dirty Data | Dirty Metadata
> +--------------------------------------------------------------------------
> + UN | No | No | No | No
> + SH | Yes | Yes | No | No
> + DF | No | Yes | No | No
> + EX | Yes | Yes | Yes | Yes
> +
> +These rules are implemented using the various glock operations which
> +are defined for each type of glock. Not all types of glocks use
> +all the modes. Only inode glocks use the DF mode for example.
> +
> +Table of glock operations and per type constants:
> +
> +Field | Purpose
> +----------------------------------------------------------------------------
> +go_xmote_th | Called before remote state change (e.g. to sync dirty data)
> +go_xmote_bh | Called after remote state change (e.g. to refill cache)
> +go_inval | Called if remote state change requires invalidating the cache
> +go_demote_ok | Returns boolean value of whether its ok to demote a glock

it's

> + | (e.g. checks timeout, and that there is no cached data)
> +go_lock | Called for the first local holder of a lock
> +go_unlock | Called on the final local unlock of a lock
> +go_dump | Called to print content of object for debugfs file, or on
> + | error to dump glock to the log.
> +go_type; | The type of the glock, LM_TYPE_.....

^drop ';'

> +go_min_hold_time | The minimum hold time
> +
> +The minimum hold time for each lock is the time after a remote lock
> +grant for which we ignore remote demote requests. This is in order to
> +prevent a situation where locks are being bounced around the cluster
> +from node to node with none of the nodes making any progress. This
> +tends to show up most with shared mmaped files which are being written
> +to by multiple nodes. By delaying the demotion in response to a
> +remote callback, that gives the userspace program time to make
> +some progress before the pages are unmapped.
> +
> +There is a plan to try and remove the go_lock and go_unlock callbacks
> +if possible, in order to try and speed up the fast path though the locking.
> +Also, eventually we hope to make the glock "EX" mode locally shared
> +such that any local locking will be done with the i_mutex as required
> +rather than via the glock.
> +
> +Locking rules for glock operations:
> +
> +Operation | GLF_LOCK bit lock held | gl_spin spinlock held
> +-----------------------------------------------------------------
> +go_xmote_th | Yes | No
> +go_xmote_bh | Yes | No
> +go_inval | Yes | No
> +go_demote_ok | Sometimes | Yes
> +go_lock | Yes | No
> +go_unlock | Yes | No
> +go_dump | Sometimes | Yes
> +
> +N.B. Operations must not drop either the bit lock or the spinlock
> +if its held on entry. go_dump and do_demote_ok must never block.

it's

> +Note that go_dump will only be called if the glock's state
> +indicates that it is caching uptodate data.
> +
> +Glock locking order within GFS2:
> +
> + 1. i_mutex (if required)
> + 2. Rename glock (for rename only)
> + 3. Inode glock(s)
> + (Parents before children, inodes at "same level" with same parent in
> + lock number order)
> + 4. Rgrp glock(s) (for (de)allocation operations)
> + 5. Transaction glock (via gfs2_trans_begin) for non-read operations
> + 6. Page lock (always last, very important!)
> +
> +There are two glocks per inode. One deals with access to the inode
> +itself (locking order as above), and the other, known as the iopen
> +glock is used in conjunction with the i_nlink field in the inode to
> +determine the lifetime of the inode in question. Locking of inodes
> +is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
> +
> --

---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/