2020-08-27 16:25:06

by Peter Zijlstra

[permalink] [raw]
Subject: [RFC][PATCH 6/7] freelist: Lock less freelist



Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
include/linux/freelist.h | 129 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 129 insertions(+)

--- /dev/null
+++ b/include/linux/freelist.h
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+#ifndef FREELIST_H
+#define FREELIST_H
+
+#include <linux/atomic.h>
+
+/*
+ * Copyright: [email protected]
+ *
+ * A simple CAS-based lock-free free list. Not the fastest thing in the world
+ * under heavy contention, but simple and correct (assuming nodes are never
+ * freed until after the free list is destroyed), and fairly speedy under low
+ * contention.
+ *
+ * Adapted from: https://moodycamel.com/blog/2014/solving-the-aba-problem-for-lock-free-free-lists
+ */
+
+struct freelist_node {
+ atomic_t refs;
+ struct freelist_node *next;
+};
+
+struct freelist_head {
+ struct freelist_node *head;
+};
+
+#define REFS_ON_FREELIST 0x80000000
+#define REFS_MASK 0x7FFFFFFF
+
+static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
+{
+ /*
+ * Since the refcount is zero, and nobody can increase it once it's
+ * zero (except us, and we run only one copy of this method per node at
+ * a time, i.e. the single thread case), then we know we can safely
+ * change the next pointer of the node; however, once the refcount is
+ * back above zero, then other threads could increase it (happens under
+ * heavy contention, when the refcount goes to zero in between a load
+ * and a refcount increment of a node in try_get, then back up to
+ * something non-zero, then the refcount increment is done by the other
+ * thread) -- so if the CAS to add the node to the actual list fails,
+ * decrese the refcount and leave the add operation to the next thread
+ * who puts the refcount back to zero (which could be us, hence the
+ * loop).
+ */
+ struct freelist_node *head = READ_ONCE(list->head);
+
+ for (;;) {
+ WRITE_ONCE(node->next, head);
+ atomic_set_release(&node->refs, 1);
+
+ if (!try_cmpxchg_release(&list->head, &head, node)) {
+ /*
+ * Hmm, the add failed, but we can only try again when
+ * the refcount goes back to zero.
+ */
+ if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
+ continue;
+ }
+ return;
+ }
+}
+
+static inline void freelist_add(struct freelist_node *node, struct freelist_head *list)
+{
+ /*
+ * We know that the should-be-on-freelist bit is 0 at this point, so
+ * it's safe to set it using a fetch_add.
+ */
+ if (!atomic_fetch_add_release(REFS_ON_FREELIST, &node->refs)) {
+ /*
+ * Oh look! We were the last ones referencing this node, and we
+ * know we want to add it to the free list, so let's do it!
+ */
+ __freelist_add(node, list);
+ }
+}
+
+static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
+{
+ struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
+ unsigned int refs;
+
+ while (head) {
+ prev = head;
+ refs = atomic_read(&head->refs);
+ if ((refs & REFS_MASK) == 0 ||
+ !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
+ head = smp_load_acquire(&list->head);
+ continue;
+ }
+
+ /*
+ * Good, reference count has been incremented (it wasn't at
+ * zero), which means we can read the next and not worry about
+ * it changing between now and the time we do the CAS.
+ */
+ next = READ_ONCE(head->next);
+ if (try_cmpxchg_acquire(&list->head, &head, next)) {
+ /*
+ * Yay, got the node. This means it was on the list,
+ * which means should-be-on-freelist must be false no
+ * matter the refcount (because nobody else knows it's
+ * been taken off yet, it can't have been put back on).
+ */
+ WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
+
+ /*
+ * Decrease refcount twice, once for our ref, and once
+ * for the list's ref.
+ */
+ atomic_fetch_add(-2, &head->refs);
+
+ return head;
+ }
+
+ /*
+ * OK, the head must have changed on us, but we still need to decrement
+ * the refcount we increased.
+ */
+ refs = atomic_fetch_add(-1, &prev->refs);
+ if (refs == REFS_ON_FREELIST + 1)
+ __freelist_add(prev, list);
+ }
+
+ return NULL;
+}
+
+#endif /* FREELIST_H */



2020-08-27 16:39:46

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On Thu, Aug 27, 2020 at 06:12:43PM +0200, Peter Zijlstra wrote:
> +struct freelist_node {
> + atomic_t refs;
> + struct freelist_node *next;
> +};

Bah, the next patch relies on this structure to be ordered the other
way around.

Clearly writing code and listening to talks isn't ideal :-)

2020-08-27 16:57:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On Thu, Aug 27, 2020 at 12:49:20PM -0400, Cameron wrote:
> For what it's worth, the freelist.h code seems to be a faithful adaptation
> of my original blog post code. Didn't think it would end up in the Linux
> kernel one day :-)

Hehe, I ran into the traditional ABA problem for the lockless stack and
asked google, your blog came up.

I'll try and actually think about it a little when the current (virtual)
conference is over.

Are you Ok with the License I put on it, GPLv2 or BDS-2 ?

> I'm just wondering if the assumption that "nodes are never freed until
> after the free list is destroyed" will hold true in the intended use case?

It does, the nodes are only deleted once the whole freelist object is
discarded.

2020-08-27 17:02:18

by Cameron

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On Thu, Aug 27, 2020 at 12:56 PM <[email protected]> wrote:
> Are you Ok with the License I put on it, GPLv2 or BDS-2 ?
Yes, both GPLv2 and BSD-2 (my personal favourite) are fine.

2020-08-27 19:11:10

by Boqun Feng

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On Thu, Aug 27, 2020 at 06:12:43PM +0200, Peter Zijlstra wrote:
>
>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> include/linux/freelist.h | 129 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 129 insertions(+)
>
> --- /dev/null
> +++ b/include/linux/freelist.h
> @@ -0,0 +1,129 @@
> +// SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +#ifndef FREELIST_H
> +#define FREELIST_H
> +
> +#include <linux/atomic.h>
> +
> +/*
> + * Copyright: [email protected]
> + *
> + * A simple CAS-based lock-free free list. Not the fastest thing in the world
> + * under heavy contention, but simple and correct (assuming nodes are never
> + * freed until after the free list is destroyed), and fairly speedy under low
> + * contention.
> + *
> + * Adapted from: https://moodycamel.com/blog/2014/solving-the-aba-problem-for-lock-free-free-lists
> + */
> +
> +struct freelist_node {
> + atomic_t refs;
> + struct freelist_node *next;
> +};
> +
> +struct freelist_head {
> + struct freelist_node *head;
> +};
> +
> +#define REFS_ON_FREELIST 0x80000000
> +#define REFS_MASK 0x7FFFFFFF
> +
> +static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
> +{
> + /*
> + * Since the refcount is zero, and nobody can increase it once it's
> + * zero (except us, and we run only one copy of this method per node at
> + * a time, i.e. the single thread case), then we know we can safely
> + * change the next pointer of the node; however, once the refcount is
> + * back above zero, then other threads could increase it (happens under
> + * heavy contention, when the refcount goes to zero in between a load
> + * and a refcount increment of a node in try_get, then back up to
> + * something non-zero, then the refcount increment is done by the other
> + * thread) -- so if the CAS to add the node to the actual list fails,
> + * decrese the refcount and leave the add operation to the next thread
> + * who puts the refcount back to zero (which could be us, hence the
> + * loop).
> + */
> + struct freelist_node *head = READ_ONCE(list->head);
> +
> + for (;;) {
> + WRITE_ONCE(node->next, head);
> + atomic_set_release(&node->refs, 1);
> +
> + if (!try_cmpxchg_release(&list->head, &head, node)) {
> + /*
> + * Hmm, the add failed, but we can only try again when
> + * the refcount goes back to zero.
> + */
> + if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
> + continue;
> + }
> + return;
> + }
> +}
> +
> +static inline void freelist_add(struct freelist_node *node, struct freelist_head *list)
> +{
> + /*
> + * We know that the should-be-on-freelist bit is 0 at this point, so
> + * it's safe to set it using a fetch_add.
> + */
> + if (!atomic_fetch_add_release(REFS_ON_FREELIST, &node->refs)) {
> + /*
> + * Oh look! We were the last ones referencing this node, and we
> + * know we want to add it to the free list, so let's do it!
> + */
> + __freelist_add(node, list);
> + }
> +}
> +
> +static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
> +{
> + struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
> + unsigned int refs;
> +
> + while (head) {
> + prev = head;
> + refs = atomic_read(&head->refs);
> + if ((refs & REFS_MASK) == 0 ||
> + !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
> + head = smp_load_acquire(&list->head);
> + continue;
> + }
> +
> + /*
> + * Good, reference count has been incremented (it wasn't at
> + * zero), which means we can read the next and not worry about
> + * it changing between now and the time we do the CAS.
> + */
> + next = READ_ONCE(head->next);
> + if (try_cmpxchg_acquire(&list->head, &head, next)) {

So if try_cmpxchg_acquire() fails, we don't have ACQUIRE semantics on
read of the new list->head, right? Then probably a
smp_mb__after_atomic() is needed in that case?

Regards,
Boqun

> + /*
> + * Yay, got the node. This means it was on the list,
> + * which means should-be-on-freelist must be false no
> + * matter the refcount (because nobody else knows it's
> + * been taken off yet, it can't have been put back on).
> + */
> + WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
> +
> + /*
> + * Decrease refcount twice, once for our ref, and once
> + * for the list's ref.
> + */
> + atomic_fetch_add(-2, &head->refs);
> +
> + return head;
> + }
> +
> + /*
> + * OK, the head must have changed on us, but we still need to decrement
> + * the refcount we increased.
> + */
> + refs = atomic_fetch_add(-1, &prev->refs);
> + if (refs == REFS_ON_FREELIST + 1)
> + __freelist_add(prev, list);
> + }
> +
> + return NULL;
> +}
> +
> +#endif /* FREELIST_H */
>
>

2020-08-27 19:58:38

by Cameron

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On Thu, Aug 27, 2020 at 3:08 PM Boqun Feng <[email protected]> wrote:
> So if try_cmpxchg_acquire() fails, we don't have ACQUIRE semantics on
> read of the new list->head, right? Then probably a
> smp_mb__after_atomic() is needed in that case?

Yes, there needs to be an acquire on the head after a failed cmpxchg;
does the atomic_fetch_add following that not have acquire semantics?

Cameron

2020-08-28 01:37:49

by Boqun Feng

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On Thu, Aug 27, 2020 at 03:57:22PM -0400, Cameron wrote:
> On Thu, Aug 27, 2020 at 3:08 PM Boqun Feng <[email protected]> wrote:
> > So if try_cmpxchg_acquire() fails, we don't have ACQUIRE semantics on
> > read of the new list->head, right? Then probably a
> > smp_mb__after_atomic() is needed in that case?
>
> Yes, there needs to be an acquire on the head after a failed cmpxchg;
> does the atomic_fetch_add following that not have acquire semantics?
>

Yes, you're right, the atomic_fecth_add() following is a fully-ordered
atomic, so could provide the necessary ACQUIRE semantics. I was missing
that. Maybe a few words explaining this helps.

Regards,
Boqun

> Cameron

2020-08-28 04:06:27

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On Fri, Aug 28, 2020 at 12:23 AM Peter Zijlstra <[email protected]> wrote:

> +static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
> +{
> + /*
> + * Since the refcount is zero, and nobody can increase it once it's
> + * zero (except us, and we run only one copy of this method per node at
> + * a time, i.e. the single thread case), then we know we can safely


> +
> + /*
> + * OK, the head must have changed on us, but we still need to decrement
> + * the refcount we increased.
> + */
> + refs = atomic_fetch_add(-1, &prev->refs);
> + if (refs == REFS_ON_FREELIST + 1)
> + __freelist_add(prev, list);

I'm curious whether it is correct to just set the prev->refs to zero and return
@prev? So that it can remove an unneeded "add()&get()" pair (although in
an unlikely branch) and __freelist_add() can be folded into freelist_add()
for tidier code.

Thanks
Lai.

2020-08-28 14:48:09

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On 08/27, Peter Zijlstra wrote:
>
> 1 file changed, 129 insertions(+)

129 lines! And I spent more than 2 hours trying to understand these
129 lines ;) looks correct...

However, I still can't understand the usage of _acquire/release ops
in this code.

> +static inline void __freelist_add(struct freelist_node *node, struct freelist_head *list)
> +{
> + /*
> + * Since the refcount is zero, and nobody can increase it once it's
> + * zero (except us, and we run only one copy of this method per node at
> + * a time, i.e. the single thread case), then we know we can safely
> + * change the next pointer of the node; however, once the refcount is
> + * back above zero, then other threads could increase it (happens under
> + * heavy contention, when the refcount goes to zero in between a load
> + * and a refcount increment of a node in try_get, then back up to
> + * something non-zero, then the refcount increment is done by the other
> + * thread) -- so if the CAS to add the node to the actual list fails,
> + * decrese the refcount and leave the add operation to the next thread
> + * who puts the refcount back to zero (which could be us, hence the
> + * loop).
> + */
> + struct freelist_node *head = READ_ONCE(list->head);
> +
> + for (;;) {
> + WRITE_ONCE(node->next, head);
> + atomic_set_release(&node->refs, 1);
> +
> + if (!try_cmpxchg_release(&list->head, &head, node)) {

OK, these 2 _release above look understandable, they pair with
atomic_try_cmpxchg_acquire/try_cmpxchg_acquire in freelist_try_get().

> + /*
> + * Hmm, the add failed, but we can only try again when
> + * the refcount goes back to zero.
> + */
> + if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
> + continue;

Do we really need _release ? Why can't atomic_fetch_add_relaxed() work?

> +static inline struct freelist_node *freelist_try_get(struct freelist_head *list)
> +{
> + struct freelist_node *prev, *next, *head = smp_load_acquire(&list->head);
> + unsigned int refs;
> +
> + while (head) {
> + prev = head;
> + refs = atomic_read(&head->refs);
> + if ((refs & REFS_MASK) == 0 ||
> + !atomic_try_cmpxchg_acquire(&head->refs, &refs, refs+1)) {
> + head = smp_load_acquire(&list->head);
> + continue;
> + }
> +
> + /*
> + * Good, reference count has been incremented (it wasn't at
> + * zero), which means we can read the next and not worry about
> + * it changing between now and the time we do the CAS.
> + */
> + next = READ_ONCE(head->next);
> + if (try_cmpxchg_acquire(&list->head, &head, next)) {
> + /*
> + * Yay, got the node. This means it was on the list,
> + * which means should-be-on-freelist must be false no
> + * matter the refcount (because nobody else knows it's
> + * been taken off yet, it can't have been put back on).
> + */
> + WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
> +
> + /*
> + * Decrease refcount twice, once for our ref, and once
> + * for the list's ref.
> + */
> + atomic_fetch_add(-2, &head->refs);

Do we the barriers implied by _fetch_? Why can't atomic_sub(2, refs) work?

> + /*
> + * OK, the head must have changed on us, but we still need to decrement
> + * the refcount we increased.
> + */
> + refs = atomic_fetch_add(-1, &prev->refs);

Cosmetic, but why not atomic_fetch_dec() ?

Oleg.

2020-08-28 15:31:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

On Fri, Aug 28, 2020 at 04:46:52PM +0200, Oleg Nesterov wrote:
> On 08/27, Peter Zijlstra wrote:
> >
> > 1 file changed, 129 insertions(+)
>
> 129 lines! And I spent more than 2 hours trying to understand these
> 129 lines ;) looks correct...

Yes, even though it already has a bunch of comments, I do feel we can
maybe improve on that a little.

For now I went for a 1:1 transliteration of the blog post though.

> > + /*
> > + * Yay, got the node. This means it was on the list,
> > + * which means should-be-on-freelist must be false no
> > + * matter the refcount (because nobody else knows it's
> > + * been taken off yet, it can't have been put back on).
> > + */
> > + WARN_ON_ONCE(atomic_read(&head->refs) & REFS_ON_FREELIST);
> > +
> > + /*
> > + * Decrease refcount twice, once for our ref, and once
> > + * for the list's ref.
> > + */
> > + atomic_fetch_add(-2, &head->refs);
>
> Do we the barriers implied by _fetch_? Why can't atomic_sub(2, refs) work?

I think we can, the original has std::memory_order_relaxed here. So I
should've used atomic_fetch_add_relaxed() but since we don't use the
return value, atomic_sub() would work just fine too.

> > + /*
> > + * OK, the head must have changed on us, but we still need to decrement
> > + * the refcount we increased.
> > + */
> > + refs = atomic_fetch_add(-1, &prev->refs);
>
> Cosmetic, but why not atomic_fetch_dec() ?

The original had that, I didn't want to risk more bugs by 'improving'
things. But yes, that can definitely become dec().

2020-08-29 03:07:14

by Cameron

[permalink] [raw]
Subject: Re: [RFC][PATCH 6/7] freelist: Lock less freelist

> I'm curious whether it is correct to just set the prev->refs to zero and return
> @prev? So that it can remove an unneeded "add()&get()" pair (although in
> an unlikely branch) and __freelist_add() can be folded into freelist_add()
> for tidier code.

That is a very good question. I believe it would be correct, yes, and
would certainly simplify the code. The reference count is zero, so
nobody can increment it again, and REFS_ON_FREELIST is set (the
should-be-on-freelist flag), so instead of adding it to the freelist
to be removed later, it can be returned directly.

> On Fri, Aug 28, 2020 at 04:46:52PM +0200, Oleg Nesterov wrote:
> > 129 lines! And I spent more than 2 hours trying to understand these
> > 129 lines ;) looks correct...

Hahaha. Sorry about that. Some of the most mind-bending code I've ever
written. :-)

> > + /*
> > + * Hmm, the add failed, but we can only try again when
> > + * the refcount goes back to zero.
> > + */
> > + if (atomic_fetch_add_release(REFS_ON_FREELIST - 1, &node->refs) == 1)
> > + continue;
> Do we really need _release ? Why can't atomic_fetch_add_relaxed() work?

The release is to synchronize with the acquire in the compare-exchange
of try_get. However, I think you're right, between the previous
release-write to the refs and that point, there are no memory effects
that need to be synchronized, and so it could be safely replaced with
a relaxed operation.

> Cosmetic, but why not atomic_fetch_dec() ?

This is just one of my idiosyncrasies. I prefer exclusively using
atomic add, I find it easier to read. Feel free to change them all to
subs!

Cameron


> >
> > Do we the barriers implied by _fetch_? Why can't atomic_sub(2, refs) work?
>
> I think we can, the original has std::memory_order_relaxed here. So I
> should've used atomic_fetch_add_relaxed() but since we don't use the
> return value, atomic_sub() would work just fine too.
>
> > > + /*
> > > + * OK, the head must have changed on us, but we still need to decrement
> > > + * the refcount we increased.
> > > + */
> > > + refs = atomic_fetch_add(-1, &prev->refs);
> >
> > Cosmetic, but why not atomic_fetch_dec() ?
>
> The original had that, I didn't want to risk more bugs by 'improving'
> things. But yes, that can definitely become dec().