LinuxLists.cc - [PATCH 1/2] seqlock: Add a new blocking reader type

2013-09-11 14:29:06

Subject: [PATCH 1/2] seqlock: Add a new blocking reader type

The sequence lock (seqlock) was originally designed for the cases
where the readers do not need to block the writers by making the
readers retry the read operation when the data change.

Since then, the use cases have been expanded to include situations
where a thread does not need to change the data (effectively a reader)
at all but have to take the writer lock because it can't tolerate
changes to the protected structure. Some examples are the d_path()
function and the getcwd() syscall in fs/dcache.c where the functions
take the writer lock on rename_lock even though they don't need
to change anything in the protected data structure at all. This is
inefficient as a reader is now blocking other non-blocking readers
by pretending to be a writer.

This patch tries to eliminate this inefficiency by introducing a new
type of blocking reader to the seqlock locking mechanism. This new
blocking reader will not block other non-blocking readers, but will
block other blocking readers and writers.

Signed-off-by: Waiman Long <[email protected]>
---
include/linux/seqlock.h | 65 +++++++++++++++++++++++++++++++++++++++++++---
1 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 1829905..26be0d9 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -3,15 +3,18 @@
/*
* Reader/writer consistent mechanism without starving writers. This type of
* lock for data where the reader wants a consistent set of information
- * and is willing to retry if the information changes. Readers never
- * block but they may have to retry if a writer is in
- * progress. Writers do not wait for readers.
+ * and is willing to retry if the information changes. There are two types
+ * of readers:
+ * 1. Non-blocking readers which never block but they may have to retry if
+ * a writer is in progress. Writers do not wait for non-blocking readers.
+ * 2. Blocking readers which will block if a writer is in progress. A
+ * blocking reader in progress will also block a writer.
*
- * This is not as cache friendly as brlock. Also, this will not work
+ * This is not as cache friendly as brlock. Also, this may not work well
* for data that contains pointers, because any writer could
* invalidate a pointer that a reader was following.
*
- * Expected reader usage:
+ * Expected non-blocking reader usage:
* do {
* seq = read_seqbegin(&foo);
* ...
@@ -268,4 +271,56 @@ write_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
spin_unlock_irqrestore(&sl->lock, flags);
}

+/*
+ * The blocking reader lock out other writers, but doesn't update the count.
+ * Acts like a normal spin_lock/unlock.
+ * Don't need preempt_disable() because that is in the spin_lock already.
+ */
+static inline void read_seqlock(seqlock_t *sl)
+{
+ spin_lock(&sl->lock);
+}
+
+static inline void read_sequnlock(seqlock_t *sl)
+{
+ spin_unlock(&sl->lock);
+}
+
+static inline void read_seqlock_bh(seqlock_t *sl)
+{
+ spin_lock_bh(&sl->lock);
+}
+
+static inline void read_sequnlock_bh(seqlock_t *sl)
+{
+ spin_unlock_bh(&sl->lock);
+}
+
+static inline void read_seqlock_irq(seqlock_t *sl)
+{
+ spin_lock_irq(&sl->lock);
+}
+
+static inline void read_sequnlock_irq(seqlock_t *sl)
+{
+ spin_unlock_irq(&sl->lock);
+}
+
+static inline unsigned long __read_seqlock_irqsave(seqlock_t *sl)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&sl->lock, flags);
+ return flags;
+}
+
+#define read_seqlock_irqsave(lock, flags) \
+ do { flags = __read_seqlock_irqsave(lock); } while (0)
+
+static inline void
+read_sequnlock_irqrestore(seqlock_t *sl, unsigned long flags)
+{
+ spin_unlock_irqrestore(&sl->lock, flags);
+}
+
#endif /* __LINUX_SEQLOCK_H */
--
1.7.1

2013-09-11 14:29:07

by Waiman Long

[permalink] [raw]

Subject: [PATCH 2/2] dcache: use read_seqlock/unlock() in read_seqbegin_or_lock() & friend

This patch modifies read_seqbegin_or_lock() and need_seqretry() to
use newly introduced read_seqlock() and read_sequnlock() primitives
so that they won't change the sequence number even if they fall back
to take the lock. This is OK as no change to the protected data
structure is being made. It will prevent one fallback to lock taking
from cascading into a series of lock taking reducing performance
because of the sequence number change. It will also allow other
sequence reader to go ahead while a reader lock is taken.

This patch also updates some of the inaccurate comments in the code.

Signed-off-by: Waiman Long <[email protected]>
---
fs/dcache.c | 31 ++++++++++++++++---------------
1 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 4d9df3c..8191ca5 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -90,8 +90,8 @@ static struct kmem_cache *dentry_cache __read_mostly;

/**
* read_seqbegin_or_lock - begin a sequence number check or locking block
- * lock: sequence lock
- * seq : sequence number to be checked
+ * @lock: sequence lock
+ * @seq : sequence number to be checked
*
* First try it once optimistically without taking the lock. If that fails,
* take the lock. The sequence number is also used as a marker for deciding
@@ -103,7 +103,7 @@ static inline void read_seqbegin_or_lock(seqlock_t *lock, int *seq)
if (!(*seq & 1)) /* Even */
*seq = read_seqbegin(lock);
else /* Odd */
- write_seqlock(lock);
+ read_seqlock(lock);
}

static inline int need_seqretry(seqlock_t *lock, int seq)
@@ -114,7 +114,7 @@ static inline int need_seqretry(seqlock_t *lock, int seq)
static inline void done_seqretry(seqlock_t *lock, int seq)
{
if (seq & 1)
- write_sequnlock(lock);
+ read_sequnlock(lock);
}

/*
@@ -2673,9 +2673,9 @@ static int prepend(char **buffer, int *buflen, const char *str, int namelen)

/**
* prepend_name - prepend a pathname in front of current buffer pointer
- * buffer: buffer pointer
- * buflen: allocated length of the buffer
- * name: name string and length qstr structure
+ * @buffer: buffer pointer
+ * @buflen: allocated length of the buffer
+ * @name: name string and length qstr structure
*
* With RCU path tracing, it may race with d_move(). Use ACCESS_ONCE() to
* make sure that either the old or the new name pointer and length are
@@ -2713,14 +2713,15 @@ static int prepend_name(char **buffer, int *buflen, struct qstr *name)
* @buffer: pointer to the end of the buffer
* @buflen: pointer to buffer length
*
- * The function tries to write out the pathname without taking any lock other
- * than the RCU read lock to make sure that dentries won't go away. It only
- * checks the sequence number of the global rename_lock as any change in the
- * dentry's d_seq will be preceded by changes in the rename_lock sequence
- * number. If the sequence number had been change, it will restart the whole
- * pathname back-tracing sequence again. It performs a total of 3 trials of
- * lockless back-tracing sequences before falling back to take the
- * rename_lock.
+ * The function will first try to write out the pathname without taking any
+ * lock other than the RCU read lock to make sure that dentries won't go away.
+ * It only checks the sequence number of the global rename_lock as any change
+ * in the dentry's d_seq will be preceded by changes in the rename_lock
+ * sequence number. If the sequence number had been changed, it will restart
+ * the whole pathname back-tracing sequence again by taking the rename_lock.
+ * In this case, there is no need to take the RCU read lock as the recursive
+ * parent pointer references will keep the dentry chain alive as long as no
+ * rename operation is performed.
*/
static int prepend_path(const struct path *path,
const struct path *root,
--
1.7.1

2013-09-11 14:56:02

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH 1/2] seqlock: Add a new blocking reader type

On Wed, Sep 11, 2013 at 10:28:26AM -0400, Waiman Long wrote:
> The sequence lock (seqlock) was originally designed for the cases
> where the readers do not need to block the writers by making the
> readers retry the read operation when the data change.
>
> Since then, the use cases have been expanded to include situations
> where a thread does not need to change the data (effectively a reader)
> at all but have to take the writer lock because it can't tolerate
> changes to the protected structure. Some examples are the d_path()
> function and the getcwd() syscall in fs/dcache.c where the functions
> take the writer lock on rename_lock even though they don't need
> to change anything in the protected data structure at all. This is
> inefficient as a reader is now blocking other non-blocking readers
> by pretending to be a writer.
>
> This patch tries to eliminate this inefficiency by introducing a new
> type of blocking reader to the seqlock locking mechanism. This new
> blocking reader will not block other non-blocking readers, but will
> block other blocking readers and writers.

Umm... That's misleading - it doesn't _block_, it spins. Moroever,
seq_readbegin() also spins in presense of writer; the main property
of this one is that it keeps writers away.

Folks, any suggestions on better names? The semantics we are getting is
this:
* a thread is a writer from the moment of seq_writelock() to
seq_writeunlock()
* a thread is exclusive reader from the moment of seq_readlock()
to seq_readunlock() [and these names and/or descriptions might need
replacement]
* at most one writer or excluding reader at any moment
* seq_readbegin() spins in presense of writers; it doesn't
care about exclusive readers.
* seq_readretry() checks if there had been any writers
since the moment of matching seq_readbegin(); again, it doesn't
care about exclusive readers

IOW, it's not writer vs. reader in sense of rwlock (i.e. exclusive vs.
shared); it's "does vs. doesn't disrupt the structures seq_read{begin/retry}
sections care about".

2013-09-11 16:33:50

by Waiman Long

[permalink] [raw]

Subject: Re: [PATCH 1/2] seqlock: Add a new blocking reader type

On 09/11/2013 10:55 AM, Al Viro wrote:
> On Wed, Sep 11, 2013 at 10:28:26AM -0400, Waiman Long wrote:
>> The sequence lock (seqlock) was originally designed for the cases
>> where the readers do not need to block the writers by making the
>> readers retry the read operation when the data change.
>>
>> Since then, the use cases have been expanded to include situations
>> where a thread does not need to change the data (effectively a reader)
>> at all but have to take the writer lock because it can't tolerate
>> changes to the protected structure. Some examples are the d_path()
>> function and the getcwd() syscall in fs/dcache.c where the functions
>> take the writer lock on rename_lock even though they don't need
>> to change anything in the protected data structure at all. This is
>> inefficient as a reader is now blocking other non-blocking readers
>> by pretending to be a writer.
>>
>> This patch tries to eliminate this inefficiency by introducing a new
>> type of blocking reader to the seqlock locking mechanism. This new
>> blocking reader will not block other non-blocking readers, but will
>> block other blocking readers and writers.
> Umm... That's misleading - it doesn't _block_, it spins. Moroever,
> seq_readbegin() also spins in presense of writer; the main property
> of this one is that it keeps writers away.

I used "block" in the sense that it will stop a writer from moving
forward. I will update the commit log to make that more clear.
> Folks, any suggestions on better names? The semantics we are getting is

I will welcome any better name suggestion and will incorporate that in
the patch.

-Longman

2013-09-11 17:26:32

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH 1/2] seqlock: Add a new blocking reader type

On Wed, Sep 11, 2013 at 12:33:35PM -0400, Waiman Long wrote:

> >Folks, any suggestions on better names? The semantics we are getting is
>
> I will welcome any better name suggestion and will incorporate that
> in the patch.

FWIW, the suggestions I've seen so far had been

seq_exreadlock() [ex for exclusive]
seq_exclreadlock() [ditto, and IMO fails the "easily read over the phone"
test - /sekv-excre...ARRGH/]
seq_prot_readlock() [prot for protected, as in DLM protected read]

2013-09-11 18:35:02

by Waiman Long

[permalink] [raw]

Subject: Re: [PATCH 1/2] seqlock: Add a new blocking reader type

On 09/11/2013 01:26 PM, Al Viro wrote:
> On Wed, Sep 11, 2013 at 12:33:35PM -0400, Waiman Long wrote:
>
>>> Folks, any suggestions on better names? The semantics we are getting is
>> I will welcome any better name suggestion and will incorporate that
>> in the patch.
> FWIW, the suggestions I've seen so far had been
>
> seq_exreadlock() [ex for exclusive]
> seq_exclreadlock() [ditto, and IMO fails the "easily read over the phone"
> test - /sekv-excre...ARRGH/]
> seq_prot_readlock() [prot for protected, as in DLM protected read]

Following the naming convention in seqlock.h that all functions begin
with read_ or write_, is read_seqexcl_lock() look OK to you?

-Longman

2013-09-11 18:40:44

by J. Bruce Fields

[permalink] [raw]

Subject: Re: [PATCH 1/2] seqlock: Add a new blocking reader type

On Wed, Sep 11, 2013 at 06:26:25PM +0100, Al Viro wrote:
> On Wed, Sep 11, 2013 at 12:33:35PM -0400, Waiman Long wrote:
>
> > >Folks, any suggestions on better names? The semantics we are getting is
> >
> > I will welcome any better name suggestion and will incorporate that
> > in the patch.
>
> FWIW, the suggestions I've seen so far had been
>
> seq_exreadlock() [ex for exclusive]
> seq_exclreadlock() [ditto, and IMO fails the "easily read over the phone"
> test - /sekv-excre...ARRGH/]
> seq_prot_readlock() [prot for protected, as in DLM protected read]

Though the DLM protected read doesn't self-conflict either so that's a
poor analogy, my bad.

(Do the users really require that the read be exclusive?)

--b.

2013-09-11 18:52:56

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH 1/2] seqlock: Add a new blocking reader type

On Wed, Sep 11, 2013 at 02:40:38PM -0400, J. Bruce Fields wrote:
> On Wed, Sep 11, 2013 at 06:26:25PM +0100, Al Viro wrote:
> > On Wed, Sep 11, 2013 at 12:33:35PM -0400, Waiman Long wrote:
> >
> > > >Folks, any suggestions on better names? The semantics we are getting is
> > >
> > > I will welcome any better name suggestion and will incorporate that
> > > in the patch.
> >
> > FWIW, the suggestions I've seen so far had been
> >
> > seq_exreadlock() [ex for exclusive]
> > seq_exclreadlock() [ditto, and IMO fails the "easily read over the phone"
> > test - /sekv-excre...ARRGH/]
> > seq_prot_readlock() [prot for protected, as in DLM protected read]
>
> Though the DLM protected read doesn't self-conflict either so that's a
> poor analogy, my bad.
>
> (Do the users really require that the read be exclusive?)

We want to exclude writers and since the writer is
lock:
spin_lock(&sl->lock), bump sl->sequence by 1, smp_wmb()
unlock:
smp_wmb(), bump sl->sequence by 1, spin_unlock(&sl->lock)
the obvious implementation for this new primitive is simply spin_{lock,unlock}
on the same spinlock...