A lockless_dereference() appears to be missing in llist_del_first().
It should only matter for Alpha in practice.
Signed-off-by: Mathieu Desnoyers <[email protected]>
CC: Huang Ying <[email protected]>
CC: Paul McKenney <[email protected]>
CC: David Howells <[email protected]>
CC: Pranith Kumar <[email protected]>
CC: [email protected] # v3.1+
---
lib/llist.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/lib/llist.c b/lib/llist.c
index f76196d..f34e176 100644
--- a/lib/llist.c
+++ b/lib/llist.c
@@ -26,6 +26,7 @@
#include <linux/export.h>
#include <linux/interrupt.h>
#include <linux/llist.h>
+#include <linux/rcupdate.h>
/**
@@ -67,7 +68,12 @@ struct llist_node *llist_del_first(struct llist_head *head)
{
struct llist_node *entry, *old_entry, *next;
- entry = head->first;
+ /*
+ * Load entry before entry->next. Matches the implicit
+ * memory barrier before the cmpxchg in llist_add_batch(),
+ * which ensures entry->next is stored before entry.
+ */
+ entry = lockless_dereference(head->first);
for (;;) {
if (entry == NULL)
return NULL;
--
2.1.4
On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> A lockless_dereference() appears to be missing in llist_del_first().
> It should only matter for Alpha in practice.
Meta-comment, do we really care about Alpha anymore? Is it still
consered an "active" arch we support? I haven't seen a single
alpha-related stable patch in _years_ if at all, which implies to me
that no one is even using it.
Not that stable patches for architectures are a valid reference for how
much they are used, but it does give me a good indication of what arches
have users that actually care about a modern (i.e. within the past 5
years) kernel.
thanks,
greg k-h
----- Original Message -----
> From: "Greg KH" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>
> Cc: "Huang Ying" <[email protected]>, [email protected], "Paul McKenney" <[email protected]>,
> "David Howells" <[email protected]>, "Pranith Kumar" <[email protected]>, [email protected]
> Sent: Saturday, February 7, 2015 5:16:25 PM
> Subject: Re: [PATCH] llist: Fix missing lockless_dereference()
>
> On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > A lockless_dereference() appears to be missing in llist_del_first().
> > It should only matter for Alpha in practice.
>
> Meta-comment, do we really care about Alpha anymore? Is it still
> consered an "active" arch we support? I haven't seen a single
> alpha-related stable patch in _years_ if at all, which implies to me
> that no one is even using it.
>
> Not that stable patches for architectures are a valid reference for how
> much they are used, but it does give me a good indication of what arches
> have users that actually care about a modern (i.e. within the past 5
> years) kernel.
Good question. Adding the Alpha maintainers to the CC.
Thanks,
Mathieu
>
> thanks,
>
> greg k-h
>
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
On Sun, Feb 08, 2015 at 06:16:25AM +0800, Greg KH wrote:
> On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > A lockless_dereference() appears to be missing in llist_del_first().
> > It should only matter for Alpha in practice.
>
> Meta-comment, do we really care about Alpha anymore? Is it still
> consered an "active" arch we support? I haven't seen a single
> alpha-related stable patch in _years_ if at all, which implies to me
> that no one is even using it.
>
> Not that stable patches for architectures are a valid reference for how
> much they are used, but it does give me a good indication of what arches
> have users that actually care about a modern (i.e. within the past 5
> years) kernel.
I get a reasonable number of objections whenever I suggest something that
would cause problems for Alpha. That said, my most recent suggestion
turns out to be mandated by recent versions of the C standard, so I think
that they have no choice but to get their compiler back-ends up to snuff.
(Before C11, a C compiler could legally compile a byte store as a
non-atomic read-modify-write sequence on the surrounding 32-bit quantity.
C11 and later outlaw this practice because it can introduce data races,
even in programs that use nothing but locking for synchronization.
The fix for this was introduced into gcc 4.7.)
Thanx, Paul
On Sat, Feb 7, 2015 at 2:30 PM, Mathieu Desnoyers
<[email protected]> wrote:
> ----- Original Message -----
>> From: "Greg KH" <[email protected]>
>> To: "Mathieu Desnoyers" <[email protected]>
>> Cc: "Huang Ying" <[email protected]>, [email protected], "Paul McKenney" <[email protected]>,
>> "David Howells" <[email protected]>, "Pranith Kumar" <[email protected]>, [email protected]
>> Sent: Saturday, February 7, 2015 5:16:25 PM
>> Subject: Re: [PATCH] llist: Fix missing lockless_dereference()
>>
>> On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
>> > A lockless_dereference() appears to be missing in llist_del_first().
>> > It should only matter for Alpha in practice.
>>
>> Meta-comment, do we really care about Alpha anymore? Is it still
>> consered an "active" arch we support? I haven't seen a single
>> alpha-related stable patch in _years_ if at all, which implies to me
>> that no one is even using it.
>>
>> Not that stable patches for architectures are a valid reference for how
>> much they are used, but it does give me a good indication of what arches
>> have users that actually care about a modern (i.e. within the past 5
>> years) kernel.
>
> Good question. Adding the Alpha maintainers to the CC.
>
> Thanks,
>
> Mathieu
Hello,
Yes, Gentoo has a maintained Alpha port. We care about having modern
kernels (though I have not personally had a lot of time to work on
that recently)
Thanks,
Matt
On Sat, Feb 07, 2015 at 04:18:14PM -0800, Matt Turner wrote:
> On Sat, Feb 7, 2015 at 2:30 PM, Mathieu Desnoyers
> <[email protected]> wrote:
> > ----- Original Message -----
> >> From: "Greg KH" <[email protected]>
> >> To: "Mathieu Desnoyers" <[email protected]>
> >> Cc: "Huang Ying" <[email protected]>, [email protected], "Paul McKenney" <[email protected]>,
> >> "David Howells" <[email protected]>, "Pranith Kumar" <[email protected]>, [email protected]
> >> Sent: Saturday, February 7, 2015 5:16:25 PM
> >> Subject: Re: [PATCH] llist: Fix missing lockless_dereference()
> >>
> >> On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> >> > A lockless_dereference() appears to be missing in llist_del_first().
> >> > It should only matter for Alpha in practice.
> >>
> >> Meta-comment, do we really care about Alpha anymore? Is it still
> >> consered an "active" arch we support? I haven't seen a single
> >> alpha-related stable patch in _years_ if at all, which implies to me
> >> that no one is even using it.
> >>
> >> Not that stable patches for architectures are a valid reference for how
> >> much they are used, but it does give me a good indication of what arches
> >> have users that actually care about a modern (i.e. within the past 5
> >> years) kernel.
> >
> > Good question. Adding the Alpha maintainers to the CC.
> >
> > Thanks,
> >
> > Mathieu
>
> Hello,
>
> Yes, Gentoo has a maintained Alpha port. We care about having modern
> kernels (though I have not personally had a lot of time to work on
> that recently)
Ok, fair enough, thanks for letting me know. I guess we can't drop it
just yet :)
greg k-h
On Sat, Feb 07, 2015 at 10:30:44PM +0000, Mathieu Desnoyers wrote:
> > On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > > A lockless_dereference() appears to be missing in llist_del_first().
> > > It should only matter for Alpha in practice.
What could one anticipate to be the symptoms of such a missing
lockless_dereference()?
The Alpha kernel is behaving pretty well provided one builds a machine
specific kernel and UP. When running an SMP kernel some packages
(most notably the java runtime, but there are a few others) occasionally
lock up in a pthread call --- could be a problem in libc rather then the
kernel.
> > Meta-comment, do we really care about Alpha anymore? Is it still
> > consered an "active" arch we support?
There are a few of us still running recent kernels on Alpha. I am
maintaining the unofficial Debian alpha port at debian-ports, and the
Debian popcon shows about 10 installations of Debian Alpha.
Cheers
Michael.
On Sun, Feb 08, 2015 at 01:47:29PM +1300, Michael Cree wrote:
> On Sat, Feb 07, 2015 at 10:30:44PM +0000, Mathieu Desnoyers wrote:
> > > On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > > > A lockless_dereference() appears to be missing in llist_del_first().
> > > > It should only matter for Alpha in practice.
>
> What could one anticipate to be the symptoms of such a missing
> lockless_dereference()?
>
> The Alpha kernel is behaving pretty well provided one builds a machine
> specific kernel and UP. When running an SMP kernel some packages
> (most notably the java runtime, but there are a few others) occasionally
> lock up in a pthread call --- could be a problem in libc rather then the
> kernel.
Hm, if only UP alpha needs to be supported, odds are we could rip a lot
of odd stuff out of the kernel that deals with memory barriers and other
nasty locking things that the Alpha requires.
Would that be ok? Or is someone somewhere going to want to be running a
SMP kernel on Alpha in the future?
thanks,
greg k-h
On Sun, Feb 08, 2015 at 08:59:41AM +0800, Greg KH wrote:
> On Sun, Feb 08, 2015 at 01:47:29PM +1300, Michael Cree wrote:
> > On Sat, Feb 07, 2015 at 10:30:44PM +0000, Mathieu Desnoyers wrote:
> > > > On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > > > > A lockless_dereference() appears to be missing in llist_del_first().
> > > > > It should only matter for Alpha in practice.
> >
> > What could one anticipate to be the symptoms of such a missing
> > lockless_dereference()?
> >
> > The Alpha kernel is behaving pretty well provided one builds a machine
> > specific kernel and UP. When running an SMP kernel some packages
> > (most notably the java runtime, but there are a few others) occasionally
> > lock up in a pthread call --- could be a problem in libc rather then the
> > kernel.
>
> Hm, if only UP alpha needs to be supported, odds are we could rip a lot
> of odd stuff out of the kernel that deals with memory barriers and other
> nasty locking things that the Alpha requires.
>
> Would that be ok? Or is someone somewhere going to want to be running a
> SMP kernel on Alpha in the future?
I am running an SMP kernel on a 3-cpu Alpha system; it mostly works
just fine.
I was just noting that there is something up with java---it locks up
occassionally in a pthread call, and there are a few other packages
that occasionally fail in test suites when being built under an SMP
kernel but always pass when built under an UP kernel which suggests
there is a little buglet somewhere in the SMP code, either in the
kernel or in libc.
Running an SMP system for the Debian Alpha build daemon at debian-ports
is really useful for keeping up with the other architectures.
Cheers
Michael.
On Sun, Feb 08, 2015 at 02:12:04PM +1300, Michael Cree wrote:
> On Sun, Feb 08, 2015 at 08:59:41AM +0800, Greg KH wrote:
> > On Sun, Feb 08, 2015 at 01:47:29PM +1300, Michael Cree wrote:
> > > On Sat, Feb 07, 2015 at 10:30:44PM +0000, Mathieu Desnoyers wrote:
> > > > > On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > > > > > A lockless_dereference() appears to be missing in llist_del_first().
> > > > > > It should only matter for Alpha in practice.
> > >
> > > What could one anticipate to be the symptoms of such a missing
> > > lockless_dereference()?
> > >
> > > The Alpha kernel is behaving pretty well provided one builds a machine
> > > specific kernel and UP. When running an SMP kernel some packages
> > > (most notably the java runtime, but there are a few others) occasionally
> > > lock up in a pthread call --- could be a problem in libc rather then the
> > > kernel.
> >
> > Hm, if only UP alpha needs to be supported, odds are we could rip a lot
> > of odd stuff out of the kernel that deals with memory barriers and other
> > nasty locking things that the Alpha requires.
> >
> > Would that be ok? Or is someone somewhere going to want to be running a
> > SMP kernel on Alpha in the future?
>
> I am running an SMP kernel on a 3-cpu Alpha system; it mostly works
> just fine.
>
> I was just noting that there is something up with java---it locks up
> occassionally in a pthread call, and there are a few other packages
> that occasionally fail in test suites when being built under an SMP
> kernel but always pass when built under an UP kernel which suggests
> there is a little buglet somewhere in the SMP code, either in the
> kernel or in libc.
>
> Running an SMP system for the Debian Alpha build daemon at debian-ports
> is really useful for keeping up with the other architectures.
Ok, sorry, I got the impression that you weren't running a SMP kernel,
nevermind then, we'll go back to keeping this ancient beast alive :)
greg k-h
----- Original Message -----
> From: "Michael Cree" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>
> Cc: "Greg KH" <[email protected]>, [email protected], "Richard Henderson" <[email protected]>, "Ivan
> Kokshaysky" <[email protected]>, "Matt Turner" <[email protected]>, "Huang Ying" <[email protected]>,
> [email protected], "Paul McKenney" <[email protected]>, "David Howells" <[email protected]>,
> "Pranith Kumar" <[email protected]>, [email protected]
> Sent: Saturday, February 7, 2015 7:47:29 PM
> Subject: Re: [PATCH] llist: Fix missing lockless_dereference()
>
> On Sat, Feb 07, 2015 at 10:30:44PM +0000, Mathieu Desnoyers wrote:
> > > On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > > > A lockless_dereference() appears to be missing in llist_del_first().
> > > > It should only matter for Alpha in practice.
>
> What could one anticipate to be the symptoms of such a missing
> lockless_dereference()?
This can trigger corruption of the lockless linked-list, which is
used across a few subsystems. AFAIU, the scenario is as follows.
Please bear with me, because it's been a while since I've read on
the Alpha multi-cache-banks behavior.
The list here would be initially non-empty. Initial state of
new_last->next is unset (newly allocated); IOW: garbage. CPU A
adds a node into the list while CPU B removes a node from the
head of the list.
CPU A CPU B
llist_add_batch()
- Stores to new_last->next
- implicit full mb before cmpxchg makes the
update to CPU A's cache bank containing
new_last->next visible to other CPUs
before CPU A's cache bank update making
head->first visible to other CPUs.
- cmpxchg updates head->first = new_first
llist_del_first()
- entry = load head->first
-> here, lack of barrier on Alpha creates a window where
CPU B's cache bank can see the updated "head->first",
but the cache bank holding the next value did not
receive the update yet, since each cache bank have
their own channel, which can be independently
saturated.
- next = load entry->next (dereference entry pointer)
- cmpxchg updates head->first = next
-> can store unset "next" value into head->first, thus
corrupting the linked list.
>
> The Alpha kernel is behaving pretty well provided one builds a machine
> specific kernel and UP. When running an SMP kernel some packages
> (most notably the java runtime, but there are a few others) occasionally
> lock up in a pthread call --- could be a problem in libc rather then the
> kernel.
Are those lockups always occasional, or you have ways to reproduce them
frequently with stress-tests ?
Thanks,
Mathieu
>
> > > Meta-comment, do we really care about Alpha anymore? Is it still
> > > consered an "active" arch we support?
>
> There are a few of us still running recent kernels on Alpha. I am
> maintaining the unofficial Debian alpha port at debian-ports, and the
> Debian popcon shows about 10 installations of Debian Alpha.
>
> Cheers
> Michael.
>
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Hi, Mathieu,
On Sun, 2015-02-08 at 04:25 +0000, Mathieu Desnoyers wrote:
> ----- Original Message -----
> > From: "Michael Cree" <[email protected]>
> > To: "Mathieu Desnoyers" <[email protected]>
> > Cc: "Greg KH" <[email protected]>, [email protected], "Richard Henderson" <[email protected]>, "Ivan
> > Kokshaysky" <[email protected]>, "Matt Turner" <[email protected]>, "Huang Ying" <[email protected]>,
> > [email protected], "Paul McKenney" <[email protected]>, "David Howells" <[email protected]>,
> > "Pranith Kumar" <[email protected]>, [email protected]
> > Sent: Saturday, February 7, 2015 7:47:29 PM
> > Subject: Re: [PATCH] llist: Fix missing lockless_dereference()
> >
> > On Sat, Feb 07, 2015 at 10:30:44PM +0000, Mathieu Desnoyers wrote:
> > > > On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > > > > A lockless_dereference() appears to be missing in llist_del_first().
> > > > > It should only matter for Alpha in practice.
> >
> > What could one anticipate to be the symptoms of such a missing
> > lockless_dereference()?
>
> This can trigger corruption of the lockless linked-list, which is
> used across a few subsystems. AFAIU, the scenario is as follows.
> Please bear with me, because it's been a while since I've read on
> the Alpha multi-cache-banks behavior.
>
> The list here would be initially non-empty. Initial state of
> new_last->next is unset (newly allocated); IOW: garbage. CPU A
> adds a node into the list while CPU B removes a node from the
> head of the list.
>
> CPU A CPU B
> llist_add_batch()
> - Stores to new_last->next
> - implicit full mb before cmpxchg makes the
> update to CPU A's cache bank containing
> new_last->next visible to other CPUs
> before CPU A's cache bank update making
> head->first visible to other CPUs.
> - cmpxchg updates head->first = new_first
> llist_del_first()
> - entry = load head->first
> -> here, lack of barrier on Alpha creates a window where
> CPU B's cache bank can see the updated "head->first",
> but the cache bank holding the next value did not
> receive the update yet, since each cache bank have
> their own channel, which can be independently
> saturated.
> - next = load entry->next (dereference entry pointer)
> - cmpxchg updates head->first = next
> -> can store unset "next" value into head->first, thus
> corrupting the linked list.
If my understanding were correct, cmpxchg will imply a full mb before
and after it, so that there is a mb between load head->first in cmpxchg
and load entry->next. If so, the memory barrier is only needed before
the loop.
Best Regards,
Huang, Ying
> >
> > The Alpha kernel is behaving pretty well provided one builds a machine
> > specific kernel and UP. When running an SMP kernel some packages
> > (most notably the java runtime, but there are a few others) occasionally
> > lock up in a pthread call --- could be a problem in libc rather then the
> > kernel.
>
> Are those lockups always occasional, or you have ways to reproduce them
> frequently with stress-tests ?
>
> Thanks,
>
> Mathieu
>
> >
> > > > Meta-comment, do we really care about Alpha anymore? Is it still
> > > > consered an "active" arch we support?
> >
> > There are a few of us still running recent kernels on Alpha. I am
> > maintaining the unofficial Debian alpha port at debian-ports, and the
> > Debian popcon shows about 10 installations of Debian Alpha.
> >
> > Cheers
> > Michael.
> >
>
----- Original Message -----
> From: "Huang Ying" <[email protected]>
> To: "Mathieu Desnoyers" <[email protected]>
> Cc: "Michael Cree" <[email protected]>, "Greg KH" <[email protected]>, [email protected],
> "Richard Henderson" <[email protected]>, "Ivan Kokshaysky" <[email protected]>, "Matt Turner"
> <[email protected]>, [email protected], "Paul McKenney" <[email protected]>, "David Howells"
> <[email protected]>, "Pranith Kumar" <[email protected]>, [email protected]
> Sent: Monday, February 9, 2015 8:52:28 PM
> Subject: Re: [PATCH] llist: Fix missing lockless_dereference()
>
> Hi, Mathieu,
>
> On Sun, 2015-02-08 at 04:25 +0000, Mathieu Desnoyers wrote:
> > ----- Original Message -----
> > > From: "Michael Cree" <[email protected]>
> > > To: "Mathieu Desnoyers" <[email protected]>
> > > Cc: "Greg KH" <[email protected]>, [email protected],
> > > "Richard Henderson" <[email protected]>, "Ivan
> > > Kokshaysky" <[email protected]>, "Matt Turner"
> > > <[email protected]>, "Huang Ying" <[email protected]>,
> > > [email protected], "Paul McKenney"
> > > <[email protected]>, "David Howells" <[email protected]>,
> > > "Pranith Kumar" <[email protected]>, [email protected]
> > > Sent: Saturday, February 7, 2015 7:47:29 PM
> > > Subject: Re: [PATCH] llist: Fix missing lockless_dereference()
> > >
> > > On Sat, Feb 07, 2015 at 10:30:44PM +0000, Mathieu Desnoyers wrote:
> > > > > On Fri, Feb 06, 2015 at 09:08:21PM -0500, Mathieu Desnoyers wrote:
> > > > > > A lockless_dereference() appears to be missing in
> > > > > > llist_del_first().
> > > > > > It should only matter for Alpha in practice.
> > >
> > > What could one anticipate to be the symptoms of such a missing
> > > lockless_dereference()?
> >
> > This can trigger corruption of the lockless linked-list, which is
> > used across a few subsystems. AFAIU, the scenario is as follows.
> > Please bear with me, because it's been a while since I've read on
> > the Alpha multi-cache-banks behavior.
> >
> > The list here would be initially non-empty. Initial state of
> > new_last->next is unset (newly allocated); IOW: garbage. CPU A
> > adds a node into the list while CPU B removes a node from the
> > head of the list.
> >
> > CPU A CPU B
> > llist_add_batch()
> > - Stores to new_last->next
> > - implicit full mb before cmpxchg makes the
> > update to CPU A's cache bank containing
> > new_last->next visible to other CPUs
> > before CPU A's cache bank update making
> > head->first visible to other CPUs.
> > - cmpxchg updates head->first = new_first
> > llist_del_first()
> > - entry = load head->first
> > -> here, lack of barrier on
> > Alpha creates a window where
> > CPU B's cache bank can see
> > the updated "head->first",
> > but the cache bank holding
> > the next value did not
> > receive the update yet, since
> > each cache bank have
> > their own channel, which can
> > be independently
> > saturated.
> > - next = load entry->next
> > (dereference entry pointer)
> > - cmpxchg updates head->first =
> > next
> > -> can store unset "next"
> > value into head->first, thus
> > corrupting the linked list.
>
> If my understanding were correct, cmpxchg will imply a full mb before
> and after it, so that there is a mb between load head->first in cmpxchg
> and load entry->next. If so, the memory barrier is only needed before
> the loop.
Yes, indeed, and by using lockless_dereference(), this is
what we end up doing.
FWIW, the reason why I moved smp_read_barrier_depends() into
the loop was to issue it after the check for NULL pointer,
assuming that getting a NULL pointer was a relatively
frequent case compared to a failing cmpxchg. But we're
talking about very minor optimisations compared to the
upside of lockless_dereference() making the code easier
to understand.
Thanks,
Mathieu
>
> Best Regards,
> Huang, Ying
>
> > >
> > > The Alpha kernel is behaving pretty well provided one builds a machine
> > > specific kernel and UP. When running an SMP kernel some packages
> > > (most notably the java runtime, but there are a few others) occasionally
> > > lock up in a pthread call --- could be a problem in libc rather then the
> > > kernel.
> >
> > Are those lockups always occasional, or you have ways to reproduce them
> > frequently with stress-tests ?
> >
> > Thanks,
> >
> > Mathieu
> >
> > >
> > > > > Meta-comment, do we really care about Alpha anymore? Is it still
> > > > > consered an "active" arch we support?
> > >
> > > There are a few of us still running recent kernels on Alpha. I am
> > > maintaining the unofficial Debian alpha port at debian-ports, and the
> > > Debian popcon shows about 10 installations of Debian Alpha.
> > >
> > > Cheers
> > > Michael.
> > >
> >
>
>
>
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
On Sun, Feb 08, 2015 at 04:25:46AM +0000, Mathieu Desnoyers wrote:
> > From: "Michael Cree" <[email protected]>
> > The Alpha kernel is behaving pretty well provided one builds a machine
> > specific kernel and UP. When running an SMP kernel some packages
> > (most notably the java runtime, but there are a few others) occasionally
> > lock up in a pthread call --- could be a problem in libc rather then the
> > kernel.
>
> Are those lockups always occasional, or you have ways to reproduce them
> frequently with stress-tests ?
They are occasional but a build of openjdk-7 bootstrapping from itself
is very likely to end up with a locked up javac process. I've just set
such a build going with the openjdk-7 build-dependencies and some extra
debug packages installed in the build chroot to see if I can get a
useful backtrace. Will be tomorrow morning when I look as I am now
heading off for the night.
Cheers
Michael.
On 02/06/2015 09:08 PM, Mathieu Desnoyers wrote:
> A lockless_dereference() appears to be missing in llist_del_first().
> It should only matter for Alpha in practice.
>
> Signed-off-by: Mathieu Desnoyers <[email protected]>
> CC: Huang Ying <[email protected]>
> CC: Paul McKenney <[email protected]>
> CC: David Howells <[email protected]>
> CC: Pranith Kumar <[email protected]>
> CC: [email protected] # v3.1+
> ---
> lib/llist.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/lib/llist.c b/lib/llist.c
> index f76196d..f34e176 100644
> --- a/lib/llist.c
> +++ b/lib/llist.c
> @@ -26,6 +26,7 @@
> #include <linux/export.h>
> #include <linux/interrupt.h>
> #include <linux/llist.h>
> +#include <linux/rcupdate.h>
Pranith,
I didn't realize you put lockless_dereference() in rcupdate.h
If the point of lockless_reference() is to provide a utility function for
situations _not_ involving RCU, then it doesn't make sense to provide it
in an RCU header file.
Regards,
Peter Hurley
PS - That's not an objection to this patch, though.
> /**
> @@ -67,7 +68,12 @@ struct llist_node *llist_del_first(struct llist_head *head)
> {
> struct llist_node *entry, *old_entry, *next;
>
> - entry = head->first;
> + /*
> + * Load entry before entry->next. Matches the implicit
> + * memory barrier before the cmpxchg in llist_add_batch(),
> + * which ensures entry->next is stored before entry.
> + */
> + entry = lockless_dereference(head->first);
> for (;;) {
> if (entry == NULL)
> return NULL;
>
On Tue, Feb 10, 2015 at 09:03:50AM -0500, Peter Hurley wrote:
> On 02/06/2015 09:08 PM, Mathieu Desnoyers wrote:
> > A lockless_dereference() appears to be missing in llist_del_first().
> > It should only matter for Alpha in practice.
> >
> > Signed-off-by: Mathieu Desnoyers <[email protected]>
> > CC: Huang Ying <[email protected]>
> > CC: Paul McKenney <[email protected]>
> > CC: David Howells <[email protected]>
> > CC: Pranith Kumar <[email protected]>
> > CC: [email protected] # v3.1+
> > ---
> > lib/llist.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/llist.c b/lib/llist.c
> > index f76196d..f34e176 100644
> > --- a/lib/llist.c
> > +++ b/lib/llist.c
> > @@ -26,6 +26,7 @@
> > #include <linux/export.h>
> > #include <linux/interrupt.h>
> > #include <linux/llist.h>
> > +#include <linux/rcupdate.h>
>
> Pranith,
>
> I didn't realize you put lockless_dereference() in rcupdate.h
>
> If the point of lockless_reference() is to provide a utility function for
> situations _not_ involving RCU, then it doesn't make sense to provide it
> in an RCU header file.
OK, I'll bite. Just where do you suggest putting it? ;-)
That question aside, lockless_dereference() does resemble the
rcu_dereference() family of APIs. This of course means that having it in
rcupdate.h near rcu_dereference() makes it easier to maintain, given that
needed changes to one are likely to require at least review of the rest.
Thanx, Paul
> Regards,
> Peter Hurley
>
> PS - That's not an objection to this patch, though.
>
> > /**
> > @@ -67,7 +68,12 @@ struct llist_node *llist_del_first(struct llist_head *head)
> > {
> > struct llist_node *entry, *old_entry, *next;
> >
> > - entry = head->first;
> > + /*
> > + * Load entry before entry->next. Matches the implicit
> > + * memory barrier before the cmpxchg in llist_add_batch(),
> > + * which ensures entry->next is stored before entry.
> > + */
> > + entry = lockless_dereference(head->first);
> > for (;;) {
> > if (entry == NULL)
> > return NULL;
> >
>
On 02/10/2015 11:38 AM, Paul E. McKenney wrote:
> On Tue, Feb 10, 2015 at 09:03:50AM -0500, Peter Hurley wrote:
>> On 02/06/2015 09:08 PM, Mathieu Desnoyers wrote:
>>> A lockless_dereference() appears to be missing in llist_del_first().
>>> It should only matter for Alpha in practice.
>>>
>>> Signed-off-by: Mathieu Desnoyers <[email protected]>
>>> CC: Huang Ying <[email protected]>
>>> CC: Paul McKenney <[email protected]>
>>> CC: David Howells <[email protected]>
>>> CC: Pranith Kumar <[email protected]>
>>> CC: [email protected] # v3.1+
>>> ---
>>> lib/llist.c | 8 +++++++-
>>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/lib/llist.c b/lib/llist.c
>>> index f76196d..f34e176 100644
>>> --- a/lib/llist.c
>>> +++ b/lib/llist.c
>>> @@ -26,6 +26,7 @@
>>> #include <linux/export.h>
>>> #include <linux/interrupt.h>
>>> #include <linux/llist.h>
>>> +#include <linux/rcupdate.h>
>>
>> Pranith,
>>
>> I didn't realize you put lockless_dereference() in rcupdate.h
>>
>> If the point of lockless_reference() is to provide a utility function for
>> situations _not_ involving RCU, then it doesn't make sense to provide it
>> in an RCU header file.
>
> OK, I'll bite. Just where do you suggest putting it? ;-)
Two possibilities:
1. linux/compiler.h where READ/WRITE/ACCESS_ONCE() are, or
2. a new arch-independent header sucked in by asm/barrier.h (because it's
basically a barrier abstraction, in the same way that smp_load_acquire/
smp_store_release are)
> That question aside, lockless_dereference() does resemble the
> rcu_dereference() family of APIs. This of course means that having it in
> rcupdate.h near rcu_dereference() makes it easier to maintain, given that
> needed changes to one are likely to require at least review of the rest.
I can understand how and why it got there.
But it's not an RCU abstraction, so having random users pulling in RCU headers
to get at a convenient (but not strictly necessary) helper function is less than
ideal.
Honestly, I'd rather see the naked smp_read_barrier_depends() than wondering why
someone grabbed linux/rcupdate.h for the lockless list implementation.
Regards,
Peter Hurley
>>> /**
>>> @@ -67,7 +68,12 @@ struct llist_node *llist_del_first(struct llist_head *head)
>>> {
>>> struct llist_node *entry, *old_entry, *next;
>>>
>>> - entry = head->first;
>>> + /*
>>> + * Load entry before entry->next. Matches the implicit
>>> + * memory barrier before the cmpxchg in llist_add_batch(),
>>> + * which ensures entry->next is stored before entry.
>>> + */
>>> + entry = lockless_dereference(head->first);
>>> for (;;) {
>>> if (entry == NULL)
>>> return NULL;
>>>
>>
>
On Tue, Feb 10, 2015 at 12:29:24PM -0500, Peter Hurley wrote:
> On 02/10/2015 11:38 AM, Paul E. McKenney wrote:
> > On Tue, Feb 10, 2015 at 09:03:50AM -0500, Peter Hurley wrote:
> >> On 02/06/2015 09:08 PM, Mathieu Desnoyers wrote:
> >>> A lockless_dereference() appears to be missing in llist_del_first().
> >>> It should only matter for Alpha in practice.
> >>>
> >>> Signed-off-by: Mathieu Desnoyers <[email protected]>
> >>> CC: Huang Ying <[email protected]>
> >>> CC: Paul McKenney <[email protected]>
> >>> CC: David Howells <[email protected]>
> >>> CC: Pranith Kumar <[email protected]>
> >>> CC: [email protected] # v3.1+
> >>> ---
> >>> lib/llist.c | 8 +++++++-
> >>> 1 file changed, 7 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/lib/llist.c b/lib/llist.c
> >>> index f76196d..f34e176 100644
> >>> --- a/lib/llist.c
> >>> +++ b/lib/llist.c
> >>> @@ -26,6 +26,7 @@
> >>> #include <linux/export.h>
> >>> #include <linux/interrupt.h>
> >>> #include <linux/llist.h>
> >>> +#include <linux/rcupdate.h>
> >>
> >> Pranith,
> >>
> >> I didn't realize you put lockless_dereference() in rcupdate.h
> >>
> >> If the point of lockless_reference() is to provide a utility function for
> >> situations _not_ involving RCU, then it doesn't make sense to provide it
> >> in an RCU header file.
> >
> > OK, I'll bite. Just where do you suggest putting it? ;-)
>
> Two possibilities:
> 1. linux/compiler.h where READ/WRITE/ACCESS_ONCE() are, or
> 2. a new arch-independent header sucked in by asm/barrier.h (because it's
> basically a barrier abstraction, in the same way that smp_load_acquire/
> smp_store_release are)
>
>
> > That question aside, lockless_dereference() does resemble the
> > rcu_dereference() family of APIs. This of course means that having it in
> > rcupdate.h near rcu_dereference() makes it easier to maintain, given that
> > needed changes to one are likely to require at least review of the rest.
>
> I can understand how and why it got there.
> But it's not an RCU abstraction, so having random users pulling in RCU headers
> to get at a convenient (but not strictly necessary) helper function is less than
> ideal.
>
> Honestly, I'd rather see the naked smp_read_barrier_depends() than wondering why
> someone grabbed linux/rcupdate.h for the lockless list implementation.
The usual fix for this problem is to list the API member as a comment
at the end of the #include line.
Thanx, Paul
> Regards,
> Peter Hurley
>
>
> >>> /**
> >>> @@ -67,7 +68,12 @@ struct llist_node *llist_del_first(struct llist_head *head)
> >>> {
> >>> struct llist_node *entry, *old_entry, *next;
> >>>
> >>> - entry = head->first;
> >>> + /*
> >>> + * Load entry before entry->next. Matches the implicit
> >>> + * memory barrier before the cmpxchg in llist_add_batch(),
> >>> + * which ensures entry->next is stored before entry.
> >>> + */
> >>> + entry = lockless_dereference(head->first);
> >>> for (;;) {
> >>> if (entry == NULL)
> >>> return NULL;
> >>>
> >>
> >
>