The Alpha Architecture Reference Manual states that any memory access
performed between an LD_xL and a STx_C instruction may cause the
store-conditional to fail unconditionally and, as such, `no useful
program should do this'.

Linux is a useful program, so fix up the Alpha spinlock implementation
to use logical operations rather than load-address instructions for
generating immediates.

Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
---
arch/alpha/include/asm/spinlock.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/alpha/include/asm/spinlock.h b/arch/alpha/include/asm/spinlock.h
index 3bba21e..0c357cd 100644
--- a/arch/alpha/include/asm/spinlock.h
+++ b/arch/alpha/include/asm/spinlock.h
@@ -29,7 +29,7 @@ static inline void arch_spin_lock(arch_spinlock_t * lock)
__asm__ __volatile__(
"1: ldl_l %0,%1\n"
" bne %0,2f\n"
- " lda %0,1\n"
+ " mov 1,%0\n"
" stl_c %0,%1\n"
" beq %0,2f\n"
" mb\n"
@@ -86,7 +86,7 @@ static inline void arch_write_lock(arch_rwlock_t *lock)
__asm__ __volatile__(
"1: ldl_l %1,%0\n"
" bne %1,6f\n"
- " lda %1,1\n"
+ " mov 1,%1\n"
" stl_c %1,%0\n"
" beq %1,6f\n"
" mb\n"
@@ -106,7 +106,7 @@ static inline int arch_read_trylock(arch_rwlock_t * lock)
__asm__ __volatile__(
"1: ldl_l %1,%0\n"
- " lda %2,0\n"
+ " mov 0,%2\n"
" blbs %1,2f\n"
" subl %1,2,%2\n"
" stl_c %2,%0\n"
@@ -128,9 +128,9 @@ static inline int arch_write_trylock(arch_rwlock_t * lock)
__asm__ __volatile__(
"1: ldl_l %1,%0\n"
- " lda %2,0\n"
+ " mov 0,%2\n"
" bne %1,2f\n"
- " lda %2,1\n"
+ " mov 1,%2\n"
" stl_c %2,%0\n"
" beq %2,6f\n"
"2: mb\n"
--
1.8.2.2
On Mon, May 6, 2013 at 1:01 PM, Will Deacon <[email protected]> wrote:
> The Alpha Architecture Reference Manual states that any memory access
> performed between an LD_xL and a STx_C instruction may cause the
> store-conditional to fail unconditionally and, as such, `no useful
> program should do this'.
>
> Linux is a useful program, so fix up the Alpha spinlock implementation
> to use logical operations rather than load-address instructions for
> generating immediates.
>
> [...]
I'm not sure of the interpretation that LDA counts as a memory access.
The manual says it's Ra <- Rbv + SEXT(disp).
It's not touching memory that I can see.
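A minimal C sketch of that operation, to make the point concrete. The helper names (sext16, lda, ldah) are made up for illustration, and the LDAH line assumes the usual Rbv + SEXT(disp*65536) form; only registers and the displacement encoded in the instruction are involved, nothing is dereferenced.

```c
#include <stdint.h>

/* SEXT(disp): sign-extend the 16-bit displacement field to 64 bits. */
static int64_t sext16(uint16_t disp)
{
	return (int64_t)(int16_t)disp;
}

/* LDA: Ra <- Rbv + SEXT(disp). Register arithmetic only, no load. */
static uint64_t lda(uint64_t rbv, uint16_t disp)
{
	return rbv + (uint64_t)sext16(disp);
}

/* LDAH (assumed form): Ra <- Rbv + SEXT(disp * 65536). */
static uint64_t ldah(uint64_t rbv, uint16_t disp)
{
	return rbv + (uint64_t)(sext16(disp) * 65536);
}
```

With Rb reading as zero, lda(0, 1) is simply the constant 1, which is how "lda %0,1" is being used in the spinlock code.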
Does this fix a known problem or is it just something that you noticed?
Matt
On Mon, May 06, 2013 at 09:01:05PM +0100, Will Deacon wrote:
> The Alpha Architecture Reference Manual states that any memory access
> performed between an LD_xL and a STx_C instruction may cause the
> store-conditional to fail unconditionally and, as such, `no useful
> program should do this'.
>
> Linux is a useful program, so fix up the Alpha spinlock implementation
> to use logical operations rather than load-address instructions for
> generating immediates.
Huh? Relevant quote is "If any other memory access (ECB, LDx, LDQ_U,
STx_C, STQ_U, WH64x) is executed on the given processor between the
LDx_L and the STx_C, the sequence above may always fail on some
implementations; hence, no useful programs should do this". Where
do you see LDA in that list and why would it possibly be there? And
no, LDx does *not* cover it - the same reference manual gives
LD{Q,L,WU,BU} as expansion for LDx, using LDAx for LD{A,AH}; it's
a separate group of instructions and it does *NOT* do any kind of
memory access.
On Mon, May 06, 2013 at 01:19:51PM -0700, Matt Turner wrote:
> I'm not sure of the interpretation that LDA counts as a memory access.
>
> The manual says it's Ra <- Rbv + SEXT(disp).
>
> It's not touching memory that I can see.
More to the point, the same manual gives explicit list of instructions
that shouldn't occur between LDx_L and STx_C, and LDA does not belong to any
of those. I suspect that Will has misparsed the notations in there - LDx is
present in the list, but it's _not_ "all instructions with mnemonics starting
with LD", just the 4 "load integer from memory" ones. FWIW, instructions
with that encoding (x01xxx<a:5><b:5><offs:16>) are grouped so:
    LDAx  - LDA, LDAH; load address
    LDx   - LDL, LDQ, LDBU, LDWU; load memory data into integer register
    LDQ_U; load unaligned
    LDx_L - LDL_L, LDQ_L; load locked
    STx_C - STL_C, STQ_C; store conditional
    STx   - STL, STQ, STB, STW; store
    STQ_U; store unaligned
They all have the same encoding, naturally enough (operation/register/address
representation), but that's it... See section 4.2 in reference manual for
details; relevant note follows discussion of LDx_L and it spells the list
out. LDx is present, LDAx isn't (and neither is LDA by itself).
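The grouping above can be sketched as a small C classifier. This is only an illustration: the enum and function names are invented here, and the opcode numbers are the major opcodes as I recall them from the opcode summary (they do fit the x01xxx pattern), so check them against the manual before relying on them.

```c
#include <stdint.h>

enum mem_group {
	G_LDAX, G_LDX, G_LDQ_U, G_LDX_L, G_STX_C, G_STX, G_STQ_U, G_OTHER
};

/* Memory format: opcode<31:26>, Ra<25:21>, Rb<20:16>, disp<15:0>. */
static uint32_t encode_mem(unsigned op, unsigned ra, unsigned rb, uint16_t disp)
{
	return ((uint32_t)(op & 0x3f) << 26) | ((uint32_t)(ra & 0x1f) << 21) |
	       ((uint32_t)(rb & 0x1f) << 16) | disp;
}

static enum mem_group classify(uint32_t insn)
{
	switch (insn >> 26) {
	case 0x08: case 0x09:				/* LDA, LDAH */
		return G_LDAX;
	case 0x0a: case 0x0c: case 0x28: case 0x29:	/* LDBU, LDWU, LDL, LDQ */
		return G_LDX;
	case 0x0b:					/* LDQ_U */
		return G_LDQ_U;
	case 0x2a: case 0x2b:				/* LDL_L, LDQ_L */
		return G_LDX_L;
	case 0x2e: case 0x2f:				/* STL_C, STQ_C */
		return G_STX_C;
	case 0x0d: case 0x0e: case 0x2c: case 0x2d:	/* STW, STB, STL, STQ */
		return G_STX;
	case 0x0f:					/* STQ_U */
		return G_STQ_U;
	default:					/* not one of the groups listed */
		return G_OTHER;
	}
}

/* True only for the integer load/store groups above; LDAx is excluded. */
static int is_integer_mem_access(uint32_t insn)
{
	enum mem_group g = classify(insn);

	return g != G_LDAX && g != G_OTHER;
}
```

LDA and LDAH share the instruction format with the loads and stores but land in their own group, which is the whole point.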
Hi Al, Matt,
On Mon, May 06, 2013 at 09:53:30PM +0100, Al Viro wrote:
> On Mon, May 06, 2013 at 01:19:51PM -0700, Matt Turner wrote:
>
> > I'm not sure of the interpretation that LDA counts as a memory access.
> >
> > The manual says it's Ra <- Rbv + SEXT(disp).
> >
> > It's not touching memory that I can see.
>
> More to the point, the same manual gives explicit list of instructions
> that shouldn't occur between LDx_L and STx_C, and LDA does not belong to any
> of those. I suspect that Will has misparsed the notations in there - LDx is
> present in the list, but it's _not_ "all instructions with mnemonics starting
> with LD", just the 4 "load integer from memory" ones. FWIW, instructions
> with that encoding (x01xxx<a:5><b:5><offs:16>) are grouped so:
> LDAx - LDA, LDAH; load address
> LDx - LDL, LDQ, LDBU, LDWU; load memory data into integer register
> LDQ_U; load unaligned
> LDx_L - LDL_L, LDQ_L; load locked
> STx_C - STL_C, STQ_C; store conditional
> STx - STL, STQ, STB, STW; store
> STQ_U; store unaligned
Your suspicions are right! I did assume that LDA fell under the LDx class,
so apologies for the false alarm. I suspect I should try and get out more,
rather than ponder over this reference manual.
The other (hopefully also wrong) worry that I had was when the manual
states that:
`If the virtual and physical addresses for a LDx_L and STx_C sequence are
not within the same naturally aligned 16-byte sections of virtual and
physical memory, that sequence may always fail, or may succeed despite
another processor’s store to the lock range; hence, no useful program
should do this'
This seems like it might have a curious interaction with CoW paging if
userspace is trying to use these instructions for a lock, since the
physical address for the conditional store might differ from the one which
was passed to the load due to CoW triggered by a different thread. Anyway,
I was still thinking about that one and haven't got as far as TLB
invalidation yet :)
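To spell out the condition being worried about, here is a rough C sketch of the "same naturally aligned 16-byte section" test applied to both address spaces. The function names and the idea of passing the four addresses explicitly are assumptions for illustration only; real hardware does this check internally, if at all.

```c
#include <stdint.h>

/* True if a and b lie in the same naturally aligned 16-byte block. */
static int same_16b_block(uint64_t a, uint64_t b)
{
	return (a >> 4) == (b >> 4);
}

/*
 * The quoted rule covers both the virtual and the physical address of
 * the load-locked / store-conditional pair.
 */
static int ll_sc_range_ok(uint64_t ld_va, uint64_t st_va,
			  uint64_t ld_pa, uint64_t st_pa)
{
	return same_16b_block(ld_va, st_va) && same_16b_block(ld_pa, st_pa);
}
```

In the CoW case the virtual addresses match but the physical ones do not, e.g. ll_sc_range_ok(0x1000, 0x1000, 0x2000, 0x5000) is false.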
> They all have the same encoding, naturally enough (operation/register/address
> representation), but that's it... See section 4.2 in reference manual for
> details; relevant note follows discussion of LDx_L and it spells the list
> out. LDx is present, LDAx isn't (and neither is LDA by itself).
Indeed, and looking at the disassembly, you can see the immediate operand to
LDA encoded into the instruction. I thought that perhaps it might behave
like ldr =<imm> on ARM, which goes and fetches the immediate value from the
literal pool.
Cheers for the explanation,
Will
Will Deacon <[email protected]> writes:
> Hi Al, Matt,
>
> On Mon, May 06, 2013 at 09:53:30PM +0100, Al Viro wrote:
>> On Mon, May 06, 2013 at 01:19:51PM -0700, Matt Turner wrote:
>>
>> > I'm not sure of the interpretation that LDA counts as a memory access.
>> >
>> > The manual says it's Ra <- Rbv + SEXT(disp).
>> >
>> > It's not touching memory that I can see.
>>
>> More to the point, the same manual gives explicit list of instructions
>> that shouldn't occur between LDx_L and STx_C, and LDA does not belong to any
>> of those. I suspect that Will has misparsed the notations in there - LDx is
>> present in the list, but it's _not_ "all instructions with mnemonics starting
>> with LD", just the 4 "load integer from memory" ones. FWIW, instructions
>> with that encoding (x01xxx<a:5><b:5><offs:16>) are grouped so:
>> LDAx - LDA, LDAH; load address
>> LDx - LDL, LDQ, LDBU, LDWU; load memory data into integer register
>> LDQ_U; load unaligned
>> LDx_L - LDL_L, LDQ_L; load locked
>> STx_C - STL_C, STQ_C; store conditional
>> STx - STL, STQ, STB, STW; store
>> STQ_U; store unaligned
>
> Your suspicions are right! I did assume that LDA fell under the LDx class,
> so apologies for the false alarm. I suspect I should try and get out more,
> rather than ponder over this reference manual.
LDA uses the address generation circuitry from the load/store unit, but
it does not actually access memory. It is merely a convenient way of
performing certain arithmetic operations, be it for scheduling reasons
or for the different range of immediate values available.
--
Måns Rullgård
[email protected]
On Mon, May 06, 2013 at 10:12:38PM +0100, Will Deacon wrote:
> The other (hopefully also wrong) worry that I had was when the manual
> states that:
>
> `If the virtual and physical addresses for a LDx_L and STx_C sequence are
> not within the same naturally aligned 16-byte sections of virtual and
> physical memory, that sequence may always fail, or may succeed despite
> another processor’s store to the lock range; hence, no useful program
> should do this'
>
> This seems like it might have a curious interaction with CoW paging if
> userspace is trying to use these instructions for a lock, since the
> physical address for the conditional store might differ from the one which
> was passed to the load due to CoW triggered by a different thread. Anyway,
> I was still thinking about that one and haven't got as far as TLB
> invalidation yet :)
In case anybody is interested, the software broadcasting of TLB maintenance
solves this problem because the PAL_rti on the ret_to_user path will clear
the lock flag.
Will
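A toy C model of that argument, for anyone who wants to see the sequence of events. Everything here (struct toy_cpu and the toy_* helpers) is hypothetical and models only the one property relied on above: returning via PAL_rti clears the lock flag, so a store-conditional that follows it fails and the caller has to retry.

```c
#include <stdint.h>

struct toy_cpu {
	int lock_flag;	/* set by load-locked, cleared by rti and by stl_c */
};

/* Load-locked: read the value and arm the lock flag. */
static uint32_t toy_ldl_l(struct toy_cpu *cpu, const uint32_t *addr)
{
	cpu->lock_flag = 1;
	return *addr;
}

/* Return from interrupt (e.g. after a TLB maintenance request): flag cleared. */
static void toy_rti(struct toy_cpu *cpu)
{
	cpu->lock_flag = 0;
}

/* Store-conditional: returns 1 and stores only if the flag is still set. */
static int toy_stl_c(struct toy_cpu *cpu, uint32_t *addr, uint32_t val)
{
	int ok = cpu->lock_flag;

	cpu->lock_flag = 0;
	if (ok)
		*addr = val;
	return ok;
}
```

If toy_rti() runs between toy_ldl_l() and toy_stl_c(), the store fails and the lock word is left untouched, so the retry picks up whatever translation is current by then.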