2004-06-26 16:09:00

by James Bottomley

[permalink] [raw]
Subject: [PATCH] Fix the cpumask rewrite

This patch:

ChangeSet 1.1828.1.25 2004/06/24 08:51:00 [email protected]
[PATCH] cpumask: rewrite cpumask.h - single bitmap based implementation

From: Paul Jackson <[email protected]>

Major rewrite of cpumask to use a single implementation, as a struct-wrapped
bitmap.

Altered the cpumask iterators to add a volatile to the bitmap type.
This might be correct for x86 and itanium, but it isn't for parisc where
our bitmap operators don't take volatile pointers. This makes our
compile incredibly noisy in just about every file:

CC mm/rmap.o
In file included from include/linux/sched.h:15,
from include/linux/mm.h:4,
from mm/rmap.c:29:
include/linux/cpumask.h: In function `__cpu_set':
include/linux/cpumask.h:86: warning: passing arg 2 of `set_bit' discards
qualifiers from pointer target type
include/linux/cpumask.h: In function `__cpu_clear':
include/linux/cpumask.h:92: warning: passing arg 2 of `clear_bit'
discards qualifiers from pointer target type

Since whether the bitmap operators are volatile or not is within the
province of the architecture to decide, I think the correct fix is just
to drop the spurious volatiles from cpumask.h

James

===== include/linux/cpumask.h 1.15 vs edited =====
--- 1.15/include/linux/cpumask.h 2004-06-24 13:57:31 -05:00
+++ edited/include/linux/cpumask.h 2004-06-26 11:00:14 -05:00
@@ -81,13 +81,13 @@
extern cpumask_t _unused_cpumask_arg_;

#define cpu_set(cpu, dst) __cpu_set((cpu), &(dst))
-static inline void __cpu_set(int cpu, volatile cpumask_t *dstp)
+static inline void __cpu_set(int cpu, cpumask_t *dstp)
{
set_bit(cpu, dstp->bits);
}

#define cpu_clear(cpu, dst) __cpu_clear((cpu), &(dst))
-static inline void __cpu_clear(int cpu, volatile cpumask_t *dstp)
+static inline void __cpu_clear(int cpu, cpumask_t *dstp)
{
clear_bit(cpu, dstp->bits);
}


2004-06-26 16:34:03

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite



On Sat, 26 Jun 2004, James Bottomley wrote:
>
> This might be correct for x86 and itanium, but it isn't for parisc where
> our bitmap operators don't take volatile pointers.

Why not? The thing is, the bitmap operators are supposed to work on
volatile data, ie people are literally using them for things like

while (test_bit(TASKLET_STATE_SCHED, &t->state));

and the thing is supposed to work.

Now, I personally am not a big believer in the "volatile" keyword itself,
since I believe that anybody who expects the compiler to generate
different code for volatiles and non-volatiles is pretty much waiting for
a bug to happen, but there are two cases where I think it's ok:

- in function prototypes to show that the function can take volatile data
(and not complain).

- as an arch-specific low-level implementation detail to avoid having to
use inline assembly just to load a value. Ie a _data_structure_ should
never be volatile, but it's ok to use a volatile pointer for a single
access.

I believe the bitop functions fall under #1 - the function is clearly
supposed to handle the case of a volatile pointer, and if it is inlined,
the above endless while-loop must not just load the bit once and turn it
into an endless loop - it needs to re-load the thing every iteration.

> Since whether the bitmap operators are volatile or not is within the
> province of the architecture to decide,

I disagree. Why wouldn't they be volatile?

Linus

2004-06-26 16:46:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite



On Sat, 26 Jun 2004, Linus Torvalds wrote:
>
> Why not? The thing is, the bitmap operators are supposed to work on
> volatile data, ie people are literally using them for things like
>
> while (test_bit(TASKLET_STATE_SCHED, &t->state));

Ahh. That one is part of a "do while ()" loop, and grep mis-identifies it.
So parisc gets it right, because the whole loop is

do
yield();
while (test_bit(TASKLET_STATE_SCHED, &t->state));

so at least the compiler didn't get to optimize it away.

Hopefully we don't have any other busy loops either - even if it is a busy
loop, we should have a "cpu_relax()" or something there just to keep power
levels down, and that should also tell the compiler not to do anything
silly. But the conceptual point stands.

Linus

2004-06-26 16:48:17

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

On Sat, 2004-06-26 at 11:32, Linus Torvalds wrote:
> Why not? The thing is, the bitmap operators are supposed to work on
> volatile data, ie people are literally using them for things like
>
> while (test_bit(TASKLET_STATE_SCHED, &t->state));
>
> and the thing is supposed to work.

Well, I agree it's supposed to work, what I don't agree about is that
generic code gets to designate critical data as volatile.

> Now, I personally am not a big believer in the "volatile" keyword itself,
> since I believe that anybody who expects the compiler to generate
> different code for volatiles and non-volatiles is pretty much waiting for
> a bug to happen, but there are two cases where I think it's ok:
>
> - in function prototypes to show that the function can take volatile data
> (and not complain).

This is a real problem for us on parisc though. By designating the
function prototype as volatile, you force the function implementation
*also* to be volatile. Unfortunately, in our case, gcc seems to
generate worse code than a pigs arse when it sees volatile data lurking
anywhere in a function.

Our current set bit functions are *coded* to operate on volatile data,
we just don't use the volatile keyword to persuade gcc to generate
better code.

I realise we could hand code in assembly to get around this problem, but
our implementation actually has to use hashed spinlocks for the
volatility, so we'd rather not if we really don't have to.

The kernel has survived this far with no other bit operator data being
volatile.

> - as an arch-specific low-level implementation detail to avoid having to
> use inline assembly just to load a value. Ie a _data_structure_ should
> never be volatile, but it's ok to use a volatile pointer for a single
> access.

Definitely agree here ... the arch code is really the place we know the
compiler penalty of being volatile.

James


2004-06-26 16:54:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite



On Sat, 26 Jun 2004, James Bottomley wrote:
>
> On Sat, 2004-06-26 at 11:32, Linus Torvalds wrote:
> > Why not? The thing is, the bitmap operators are supposed to work on
> > volatile data, ie people are literally using them for things like
> >
> > while (test_bit(TASKLET_STATE_SCHED, &t->state));
> >
> > and the thing is supposed to work.
>
> Well, I agree it's supposed to work, what I don't agree about is that
> generic code gets to designate critical data as volatile.

I agree in the sense that I don't think the _data_ should be volatile.

But I think the functions to access various pieces of data should be able
to take volatiles without warning.

See the difference? Same way "memcpy()" takes a "const" argument for the
source. That doesn't mean that the source _has_ to be const, it just means
that it won't complain if it is.

And the same is true of "volatile" for the bitop functions. They are
volatile not because they require the data to be volatile, but because
they have at least traditionally been used for various cases, _including_
volatile.

Now, we could say that we don't do that any more, and decide that the
regular bitop functions really cannot be used on volatile stuff. But
that's a BIG decision. And it's certainly not a decision that parisc
users should make.

> Our current set bit functions are *coded* to operate on volatile data,
> we just don't use the volatile keyword to persuade gcc to generate
> better code.

Well, at least judging by your "test_bit()", the function literally is
_not_ coded to work with volatile data.

If the above loop had been a real case, gcc on parisc would have optimized
it away, and done the wrong thing.

Linus

2004-06-26 17:18:41

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

On Sat, 2004-06-26 at 11:54, Linus Torvalds wrote:
> On Sat, 26 Jun 2004, James Bottomley wrote:
> >
> > On Sat, 2004-06-26 at 11:32, Linus Torvalds wrote:
> > > Why not? The thing is, the bitmap operators are supposed to work on
> > > volatile data, ie people are literally using them for things like
> > >
> > > while (test_bit(TASKLET_STATE_SCHED, &t->state));

I tried this on PA. Our gcc definitely generates the correct code, even
without the volatile...

By the way, I had to try with a genuine volatile since
tasklet_struct.state isn't actually volatile in linux/interrupt.h

> > > and the thing is supposed to work.
> >
> > Well, I agree it's supposed to work, what I don't agree about is that
> > generic code gets to designate critical data as volatile.
>
> I agree in the sense that I don't think the _data_ should be volatile.
>
> But I think the functions to access various pieces of data should be able
> to take volatiles without warning.
>
> See the difference? Same way "memcpy()" takes a "const" argument for the
> source. That doesn't mean that the source _has_ to be const, it just means
> that it won't complain if it is.
>
> And the same is true of "volatile" for the bitop functions. They are
> volatile not because they require the data to be volatile, but because
> they have at least traditionally been used for various cases, _including_
> volatile.

But in the current kernel, there are no bitops on volatile data in the
parisc tree. This cpumask is the first such one that we've come
across...

> Now, we could say that we don't do that any more, and decide that the
> regular bitop functions really cannot be used on volatile stuff. But
> that's a BIG decision. And it's certainly not a decision that parisc
> users should make.

But that's my point. Currently there is no volatile pointer bitops (and
I have bitten the heads off a few people who tried to introduce some),
so there's no existing case to justify being backward compatible with.

> Well, at least judging by your "test_bit()", the function literally is
> _not_ coded to work with volatile data.

test_bit's a special case. The lock is to prevent data corruption, not
racing between test_bit and set_bit.

> If the above loop had been a real case, gcc on parisc would have optimized
> it away, and done the wrong thing.

That's not what our gcc seems to do (both 3.0.4 and 3.3.4)

James


2004-06-26 18:01:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite



On Sat, 26 Jun 2004, James Bottomley wrote:
>
> But in the current kernel, there are no bitops on volatile data in the
> parisc tree. This cpumask is the first such one that we've come
> across...

So?

You're ignoring my point.

I'm saying that data structures ARE NOT VOLATILE. I personally believe
that the notion of a "volatile" data structure is complete and utter shit.

But _code_ can act "with volatility". For bitops, it's just another way of
saying "people may use this function without locking", because it's
totally valid to use "test_bit()" to actually implement a lock.

So the rule has always been (and this has nothing to do with parisc, and
parisc DOES NOT GET TO SET THE RULES!) that "test_bit()" acts in a
volatile manner, and that you traditionally could write

while (test_and_set_bit(xxx, field))
while (test_bit(xx, field)) /* Nothing */;

to do a lock.

NOTE! There are no volatile data structures _anywhere_, because as
mentioned, I think that whole notion is a total piece of crap, and is not
even a well-defined part of the C language. But the test_bit() function
has to act as if the data it is passed down can change. To repeat:
_codepaths_ may have volatility attributes depending on usage of the data.

Now, we generally discouage this kind of hand-written locking code, and we
these days have a "bit_spin_lock()" helper function to do the above (which
also has the appropriate "cpu_relax()" in place), but the point is that
this code has existed, and test_bit() HAS TO FOLLOW THE RULES. Even on
pa-risc.

I repeat: this has NOTHING to do with "volatile data structures". The
kernel does not belive in that notion, and you generally have to use locks
to make sure that all accesses are serialized.

But sometimes non-locked accesses are ok. test_bit() historically is very
much one of those things. A classic example of this is how something like
bh->b_flags is not marked volatile (some fields of it require explicit
locking rules), but it's ok to check for certain bits of it asynchronously
without having locked the bh.

See?

- CODE can be volatile ("I must re-load this value").
- DATA has access rules ("this value should only be changed with write
lock xxx held").

And in this case, test_bit() has the "I must re-load this value" rule for
historical reasons.

AND PA-RISC IS WRONG IF IT DOESN'T FOLLOW THE RULES!

Final note: I might be willing to just change the rules, if people can
show that no paths that might need the volatile behaviour exist any more.
They definitely used to exist, though, and that's a BIG decision to make.

Linus

2004-06-26 18:28:22

by Vojtech Pavlik

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

On Sat, Jun 26, 2004 at 11:01:14AM -0700, Linus Torvalds wrote:

> Final note: I might be willing to just change the rules, if people can
> show that no paths that might need the volatile behaviour exist any more.
> They definitely used to exist, though, and that's a BIG decision to make.

At least input pretty much relies on the fact that bitops don't need
locking and act as memory barriers.

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2004-06-26 18:55:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite


On Sat, 26 Jun 2004, Vojtech Pavlik wrote:
>
> At least input pretty much relies on the fact that bitops don't need
> locking and act as memory barriers.

Well, plain test_bit() has always been more relaxed than the others, and
has never implied a memory barrier. Only the "test_and_set/clear()" things
imply memory barriers.

What we _could_ do (without changing any existing rules) is to add a
"__test_bit()" that is the relaxed version that doesn't do any of the
volatile etc. That would match the "__" versions of the other bit
operations.

Then people who know that they use the bits without any volatility issues
can use that one, and let the compiler optimize more.

Hmm?

Linus

2004-06-26 18:59:14

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

On Sat, 2004-06-26 at 13:01, Linus Torvalds wrote:
> So?
>
> You're ignoring my point.

You're making 3 points, I think

1) Volatility is a property of the code path, not the data, which I
agree with.

2) the bit operators need to operate on volatile data (i.e. need a
volatile in their prototypes) without gcc issuing a qualifier dropped
warning.

This I disagree with because we have no volatile data currently in the
kernel that necessitates this behavour.

3) The parisc bit operator implementations as inline functions need to
have volatile data in their function templates because otherwise gcc
will not implement them correctly and may optimise them away when it
shouldn't.

I disagree with this too...although I'm on shakier ground, and I'd
prefer gcc experts quantify why this happens, but if, on parisc, I look
at the assembly output of your example

while (test_and_set_bit(xxx, field))
while (test_bit(xx, field)) /* Nothing */;

I find it to be coded exactly correctly. Even without using a volatile
pointer in our test_and_set_bit() and test_bit() implementations, the
compiler still does both checks in the loop.

So my contention is that even without the volatile pointers in our
implementation, we still correctly treat this code path as having
volatile (i.e. we test the bits each time around the loop). All the
addition of the volatile to the cpumask patch does is cause us to emit
spurious warnings.

> And in this case, test_bit() has the "I must re-load this value" rule for
> historical reasons.

Right, I agree exactly with this. However, empirically we do do this,
even without having to declare the data as volatile.

> AND PA-RISC IS WRONG IF IT DOESN'T FOLLOW THE RULES!

I belive we do follow the rules.

> Final note: I might be willing to just change the rules, if people can
> show that no paths that might need the volatile behaviour exist any more.
> They definitely used to exist, though, and that's a BIG decision to make.

Actually, I don't want to change the rules, I just want parisc to be
able to implement them without being forced to have a performance
killing volatile in its implementation functions.

James


2004-06-26 19:03:39

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

On Sat, 2004-06-26 at 13:54, Linus Torvalds wrote:
> On Sat, 26 Jun 2004, Vojtech Pavlik wrote:
> > At least input pretty much relies on the fact that bitops don't need
> > locking and act as memory barriers.
>
> Well, plain test_bit() has always been more relaxed than the others, and
> has never implied a memory barrier. Only the "test_and_set/clear()" things
> imply memory barriers.
>
> What we _could_ do (without changing any existing rules) is to add a
> "__test_bit()" that is the relaxed version that doesn't do any of the
> volatile etc. That would match the "__" versions of the other bit
> operations.
>
> Then people who know that they use the bits without any volatility issues
> can use that one, and let the compiler optimize more.

Well, we can do this, yes.

Our test bit implementation would then become:

static __inline__ int test_bit(int nr, const volatile void *address)
{
return __test_bit(nr, (const void *)address);
}

That would keep our implementation happy.

James


2004-06-26 19:13:23

by Vojtech Pavlik

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

On Sat, Jun 26, 2004 at 11:54:24AM -0700, Linus Torvalds wrote:

> On Sat, 26 Jun 2004, Vojtech Pavlik wrote:
> >
> > At least input pretty much relies on the fact that bitops don't need
> > locking and act as memory barriers.
>
> Well, plain test_bit() has always been more relaxed than the others, and
> has never implied a memory barrier. Only the "test_and_set/clear()" things
> imply memory barriers.

Ouch. I'll have to revisit some code then.

> What we _could_ do (without changing any existing rules) is to add a
> "__test_bit()" that is the relaxed version that doesn't do any of the
> volatile etc. That would match the "__" versions of the other bit
> operations.
>
> Then people who know that they use the bits without any volatility issues
> can use that one, and let the compiler optimize more.
>
> Hmm?

That makes a lot of sense to me, as we already have the __ variants for
most of the other bitops already.

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2004-06-26 19:13:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite



On Sat, 26 Jun 2004, James Bottomley wrote:
>
> 1) Volatility is a property of the code path, not the data, which I
> agree with.

Right.

> 2) the bit operators need to operate on volatile data (i.e. need a
> volatile in their prototypes) without gcc issuing a qualifier dropped
> warning.
>
> This I disagree with because we have no volatile data currently in the
> kernel that necessitates this behavour.

Why do you disagree, since
- right now PA-RISC is BUGGY because of the lack of volatile. gcc _does_
optimize and combine accesses since it's an inline function _without_
any volatile specifier on the pointers.
- the volatile _documents_ the fact that the bitops have volatile
behaviour.

> 3) The parisc bit operator implementations as inline functions need to
> have volatile data in their function templates because otherwise gcc
> will not implement them correctly and may optimise them away when it
> shouldn't.
>
> I disagree with this too...although I'm on shakier ground, and I'd
> prefer gcc experts quantify why this happens, but if, on parisc, I look
> at the assembly output of your example
>
> while (test_and_set_bit(xxx, field))
> while (test_bit(xx, field)) /* Nothing */;

It seems the pa-risc optimizer for gcc is somehow broken. I just checked
on x86:

#define test_bit(x,y) \
(!!((1ul << x) & *(y)))

int test(unsigned long *a)
{
while (test_bit(0, a));
}

definitely does not re-load the value. I checked with an inline function
too, same result:

testb $1, (%eax)
.p2align 2,,3
.L2:
jne .L2

ie gcc _does_ optimize this away without the volatile. With the volatile
it turns into

.p2align 2,,3
.L2:
movl (%edx), %eax
testb $1, %al
jne .L2

so there's a clear difference here.

Now, why gcc on pa-risc misses that optimization, I have no clue.

> So my contention is that even without the volatile pointers in our
> implementation, we still correctly treat this code path as having
> volatile (i.e. we test the bits each time around the loop). All the
> addition of the volatile to the cpumask patch does is cause us to emit
> spurious warnings.

That's apparently just you being lucky due to having a broken gcc.

Linus

2004-06-26 19:15:36

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite



On Sat, 26 Jun 2004, James Bottomley wrote:
>
> Our test bit implementation would then become:
>
> static __inline__ int test_bit(int nr, const volatile void *address)
> {
> return __test_bit(nr, (const void *)address);
> }
>
> That would keep our implementation happy.

You just _want_ to be screwed over whenever your gcc bugs are fixed, don't
you?

Are you going to complain to the gcc people when they fix their bugs? Or
are you going to spend months to debug problems that only happen for other
people, because they happen to have fixed compilers?

There's a real reason why there is a "volatile" there on other
architectures.

Linus

2004-06-26 19:34:08

by James Bottomley

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

On Sat, 2004-06-26 at 14:11, Linus Torvalds wrote:
> It seems the pa-risc optimizer for gcc is somehow broken. I just checked
> on x86:
>
> #define test_bit(x,y) \
> (!!((1ul << x) & *(y)))
>
> int test(unsigned long *a)
> {
> while (test_bit(0, a));
> }

OK, this one definitely compiles to a non reaload loop on parisc. I
concede we need the volatile.

James


2004-06-26 22:18:16

by Chris Wedgwood

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

On Sat, Jun 26, 2004 at 09:32:15AM -0700, Linus Torvalds wrote:

> Now, I personally am not a big believer in the "volatile" keyword
> itself, since I believe that anybody who expects the compiler to
> generate different code for volatiles and non-volatiles is pretty
> much waiting for a bug to happen

I recently had to change jiffies_64 (include/linux/jiffies.h) to be
volatile as gcc produced code that didn't work as a result of it.

Clearly in some cases gcc does know about volatile and does produce
'the right thing' --- I don't really see why people claim volatile is
a bad thing, there are clearly places where we need this and gcc seems
to do the right thing.


--cw

2004-06-26 22:48:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite



On Sat, 26 Jun 2004, Chris Wedgwood wrote:
>
> I recently had to change jiffies_64 (include/linux/jiffies.h) to be
> volatile as gcc produced code that didn't work as a result of it.

"jiffies" is one of the few things that I accept as volatile, since it's
basically read-only and accessed directly and has no internal structure
(ie it's a single word).

However, "jiffies_64" should _not_ be accessed directly. Anything that
does so is likely buggy.

But for most data structures, the way to control access is either with
proper locking (at which point they aren't volatile any more) or through
proper accessor functions (ie "jiffies_64" should generally only be
accessed with something that understands about low/high word and update
ordering and re-testing).

And once you do that proper access function, you should put the volatile
_there_ - since THAT is the place that understands what the access
requireemnts are, not the compiler.

I repeat: it is the _code_ that knows about volatile rules, not the data
structure.

> Clearly in some cases gcc does know about volatile and does produce
> 'the right thing' --- I don't really see why people claim volatile is
> a bad thing, there are clearly places where we need this and gcc seems
> to do the right thing.

See above. There are basically _no_ places where the compiler can do the
right thing except by pure luck. "Programming-by-luck" may work, but it's
not a good idea.

So please tell me where the jiffies64 thing was suddenly correct as a
"volatile", and I can almost guarantee you that you were WRONG to mark it
volatile. It may work 99% of the time, but since the only correct way to
access jiffies_64 is either:

- knowing that you are the person that updates it (under xtime_lock),
thus making it non-volatile
- using "get_jiffies_64()", using the proper seq_begin() etc.

But you're right, on a 64-bit architecture, jiffies_64 falls back to the
"jiffies" case, and there it would be acceptable to make it volatile. Bit
since it had better be accessed with "get_jiffies_64()" anyway, you're
much better off putting the volatile THERE, and not cause problems for
code generation in the places that don't need it.

See? The volatile is (again) wrong on the data structure (jiffies_64), and
you should have added it to the code (get_jiffies_64).

Linus

2004-06-26 23:39:21

by Chris Wedgwood

[permalink] [raw]
Subject: jiffies_64

On Sat, Jun 26, 2004 at 03:48:34PM -0700, Linus Torvalds wrote:

> So please tell me where the jiffies64 thing was suddenly correct as
> a "volatile", and I can almost guarantee you that you were WRONG to
> mark it volatile.

In this case it's a platform where moving jiffies around is very
painful so I have a patch to make jiffies per-CPU using the local APIC
timer (jiffies becomes a macro to get_local_jiffies() with all the
horrors that come with that).

Because this drifts we use the PIT as a referrence and resynchronize
every few seconds against jiffies_64.

A few places can't actually use get_local_jiffies() early on like the
bogomips calibration code (because the local APIC timer isn't working
yet) so I've got some hacks in there where they use jiffies_64.

> But you're right, on a 64-bit architecture, jiffies_64 falls back to
> the "jiffies" case, and there it would be acceptable to make it
> volatile.

Actually, I don't like the way we use xtime_lock at all for jiffies_64
now. Things are unclear and fragile in places. In my tree I have
jiffies_64 protected by a spinlock instead of the seqlock thing (as it
still needs to be locked even though it's incremented only from one
CPU in irq context) and was looking at using cmpxchg8 on platforms
what can to remove the need to lock jiffies_64 completely.

I've wondered if we can't do the same thing for xtime as well,
however I need to make per-CPU xtime so I'll revisit that when I get a
chance.

> See? The volatile is (again) wrong on the data structure (jiffies_64), and
> you should have added it to the code (get_jiffies_64).

get_jiffies_64() in my case reads the per-CPU value and like I said a
few places don't like that.


--cw

2004-06-26 23:58:27

by Alan

[permalink] [raw]
Subject: Re: [parisc-linux] Re: [PATCH] Fix the cpumask rewrite

On Sad, 2004-06-26 at 23:48, Linus Torvalds wrote:
> "jiffies" is one of the few things that I accept as volatile, since it's
> basically read-only and accessed directly and has no internal structure
> (ie it's a single word).

For most uses jiffies should die. If drivers could not access jiffies
except by a (possibly trivial) helper then it would be a huge step
closer to being able to run embedded linux without a continually running
timer.


2004-06-27 00:05:48

by Chris Wedgwood

[permalink] [raw]
Subject: Re: [parisc-linux] Re: [PATCH] Fix the cpumask rewrite

On Sat, Jun 26, 2004 at 11:54:38PM +0100, Alan Cox wrote:

> For most uses jiffies should die. If drivers could not access jiffies
> except by a (possibly trivial) helper then it would be a huge step
> closer to being able to run embedded linux without a continually running
> timer.

I'm all for that, except last I counted there are about 5000 users of
jiffies. What do you suggest as a replacement API?



--cw

2004-06-27 01:55:22

by Chris Wedgwood

[permalink] [raw]
Subject: more (insane) jiffies ranting

Continuing my rant...

On Sat, Jun 26, 2004 at 03:48:34PM -0700, Linus Torvalds wrote:

> But for most data structures, the way to control access is either
> with proper locking (at which point they aren't volatile any more)
> or through proper accessor functions (ie "jiffies_64" should
> generally only be accessed with something that understands about
> low/high word and update ordering and re-testing).

I don't entirely buy this. Right now x86 code just assumes 32-bit
loads are atomic and does them blindly in lots of places (ie. every
user of jiffies just about).

Without the volatile it seems entirely reasonable gcc will produce
correct, but wrong code here so I would argue 'volatile' is a property
of the data in this case.

> I repeat: it is the _code_ that knows about volatile rules, not the
> data structure.

Except as I mentioned we have exceptions to this right now.

As far as I can tell jiffies is a mess (I'm talking mostly ia32 here):

jiffies_64 is protected by xtime_lock, this is a seqlock_t which
is IMO overly complicated and unnecessary, and this lock is shared
for xtime as well

jiffies_64 could be done locklessly as far as I can tell anyhow.

jiffies is linker-magic to jiffies_64 and works because a
little-endian load at the same address gives you the 32 lower bits.
I'm not opposed to this, but a comment wouldn't kill anyone.

we also have wall_jiffies which is 32-bit (unsigned long, ia32) and
is used get the gettimeofday code to detect lost ticks, having
this as well as jiffies_64 seems overkill

we do xtime updates w/o a lock on most platforms

Perhaps I misunderstand the code right now, the need for the
complexity and what-not --- but I don't like it.

It's either because it's too complicated or it's not-clear what is
going on, I don't know which one matches reality most closely,
probably the latter.

I know there are NTP implications of this this code but it feels more
like "it happens to work, please don't touch it" rather than anything
clean and well designed. I'm pretty sure there are some tricky corner
cases and subtle interactions to worry about, especially when we look
at leap-seconds.

Maybe having all this code moved into a single place would help, I'm
not sure, with platform-provided abstraction as required so older
platforms which can't do atomic 64-bit updates will still behave
sanely.

To be quite honest I'd like to see jiffies die and xtime become a
per-cpu thing (if that can be made to work reliably, I have some
concerns), or at least have this option on a platform by platform
basis.



--cw

2004-06-27 15:42:13

by Alan

[permalink] [raw]
Subject: Re: [parisc-linux] Re: [PATCH] Fix the cpumask rewrite

On Sul, 2004-06-27 at 01:05, Chris Wedgwood wrote:
> I'm all for that, except last I counted there are about 5000 users of
> jiffies. What do you suggest as a replacement API?

For a lot of them they shouldn't be polling. For those that do in
most cases something like

struct timeout_timer t;

timeout_set(t, 5 * HZ)
timeout_cancel(t)

if(timeout_expired(t))

Where timeout_timer is effectively an existing timer of add_timer
style and a single variable. The code to build that kind of timer
on top of add_timer is trivial.

2004-06-27 17:40:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: more (insane) jiffies ranting



On Sat, 26 Jun 2004, Chris Wedgwood wrote:
>
> On Sat, Jun 26, 2004 at 03:48:34PM -0700, Linus Torvalds wrote:
>
> > But for most data structures, the way to control access is either
> > with proper locking (at which point they aren't volatile any more)
> > or through proper accessor functions (ie "jiffies_64" should
> > generally only be accessed with something that understands about
> > low/high word and update ordering and re-testing).
>
> I don't entirely buy this. Right now x86 code just assumes 32-bit
> loads are atomic and does them blindly in lots of places (ie. every
> user of jiffies just about).
>
> Without the volatile it seems entirely reasonable gcc will produce
> correct, but wrong code here so I would argue 'volatile' is a property
> of the data in this case.

It's a property of the data _iff_:
- it is _always_ volatile
- it is only ever used atomically: this also means that it must be
totally independent of _all_ other data structures and have no linkages
to anything else.

Snd basically, the above is pretty much never true except possibly for
real I/O accesses and sometimes things like simple "flags" (ie it's fine
to use "volatile sigatomic_t flag;" in user programs to have signal
handlers say "something happened" in a single-threaded environment).

NOTE! The "single-threaded environment" part really is important, and is
one of the historical reasons for volatile having been more useful than it
is today. If you are single-threaded and don't have issues like CPU memory
ordering etc, then you can let the compiler do more of the work, and there
are a lot of lockless algorithms that you can use that only depend on
fairly simple semantics for "volatile".

But the fact is, for the kernel none of the above is ever really true.
A 32-bit-atomic "jiffies" comes the closest, but even there the "always"
property wasn't true - it wasn't true in the update phase, and we
literally used to have something like this:

*((unsigned long *)&jiffies)++;

to update jiffies and still get good code generation (now that we have a
64-bit jiffies and need to do more complex stuff anyway, we don't have
that any more, but you should be able to find it in 2.3.x kernels if I
remember correctly).

And _anything_ that has any data dependencies, "volatile" is totally
useless. Even the (acceptable in single-threaded user-space) "flag" thing
is not valid usage in the kernel, since for a flag in a multi-threaded
environment you still need an explicit CPU memory barrier in the code,
making it impossible for the compiler to do the right thing anyway.

> > I repeat: it is the _code_ that knows about volatile rules, not the
> > data structure.
>
> Except as I mentioned we have exceptions to this right now.

No we don't. The _only_ accepted exception is the special case of "the low
bits of jiffies", and that's accepted partly because of historical
reasons, and partly because it's fundamentally a data structure we don't
really care that much about. There should be no other ones.

And that special case _literally_ is only for people who don't care that
much. Anybody who cares about "real time" needs to get xtime_lock and do
the proper magic to get a real date.

So I don't see your argument. I'm obviously saying that "yes, we have
_one_ case where we make a data structure volatile", but at the same time,
that case is very much a "we don't really care too much about precision
there, and even so people think we should have real accessor functions".

So I stand by the rule: we should make _code_ have the access rules, and
the data itself should never be volatile. And yes, jiffies breaks that
rule, but hey, that's not something I'm proud of.

Linus

2004-06-28 23:16:52

by Jonathan Lundell

[permalink] [raw]
Subject: Re: [PATCH] Fix the cpumask rewrite

At 11:01 AM -0700 6/26/04, Linus Torvalds wrote:
>I'm saying that data structures ARE NOT VOLATILE. I personally believe
>that the notion of a "volatile" data structure is complete and utter shit.

Perhaps, but surely they exist. I'm thinking specifically of
memory-mapped hardware registers and data structures that are shared
with DMA devices. Most recent Ethernet controllers fall into the
latter category, and in either case write-locking is not an option.

If I can find some way to force my code to reload the data, then
sure, call the code "volatile" if you like. But the data is simply
volatile, in the sense that it can (and is expected to) change
independent of my code paths.
--
/Jonathan Lundell.