On Fri, May 30, 2014 at 11:44:00AM -0400, Waiman Long wrote:
> @@ -19,13 +19,46 @@ extern struct static_key virt_unfairlocks_enabled;
> * that the clearing the lock bit is done ASAP without artificial delay
> * due to compiler optimization.
> */
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +static __always_inline void __queue_spin_unlock(struct qspinlock *lock)
> +#else
> static inline void queue_spin_unlock(struct qspinlock *lock)
> +#endif
> {
> barrier();
> ACCESS_ONCE(*(u8 *)lock) = 0;
> barrier();
> }
>
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +/*
> + * The lock byte can have a value of _Q_LOCKED_SLOWPATH to indicate
> + * that it needs to go through the slowpath to do the unlocking.
> + */
> +#define _Q_LOCKED_SLOWPATH (_Q_LOCKED_VAL | 2)
> +
> +extern void queue_spin_unlock_slowpath(struct qspinlock *lock);
> +
> +static inline void queue_spin_unlock(struct qspinlock *lock)
> +{
> + barrier();
> +	if (static_key_false(&paravirt_spinlocks_enabled)) {
> + /*
> + * Need to atomically clear the lock byte to avoid racing with
> +		 * queue head waiter trying to set _Q_LOCKED_SLOWPATH.
> + */
> + if (likely(cmpxchg((u8 *)lock, _Q_LOCKED_VAL, 0)
> + == _Q_LOCKED_VAL))
> + return;
> + else
> + queue_spin_unlock_slowpath(lock);
> +
> + } else {
> + __queue_spin_unlock(lock);
> + }
> + barrier();
> +}
> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
Ideally we'd make all this use alternatives or so, such that the actual
function remains short enough to actually inline;
static inline void queue_spin_unlock(struct qspinlock *lock)
{
pv_spinlock_alternative(
ACCESS_ONCE(*(u8 *)lock) = 0,
pv_queue_spin_unlock(lock));
}
Or however that trickery works.
On 06/12/2014 04:17 AM, Peter Zijlstra wrote:
> Ideally we'd make all this use alternatives or so, such that the actual
> function remains short enough to actually inline;
>
> static inline void queue_spin_unlock(struct qspinlock *lock)
> {
> pv_spinlock_alternative(
> ACCESS_ONCE(*(u8 *)lock) = 0,
> pv_queue_spin_unlock(lock));
> }
>
> Or however that trickery works.
I think the paravirt version of the unlock function is already short
enough. In addition, whenever PARAVIRT_SPINLOCKS is enabled, inlining of
the unlock function is disabled so that the paravirt_spinlocks_enabled
jump label isn't replicated at every call site in the core kernel and in
kernel modules. This is true for the ticket spinlock and will also be
true for the queue spinlock.
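The uninlining mentioned here works roughly like this (a simplified
sketch from my reading of the 2014-era tree; option names and file
locations are approximate):

#ifndef CONFIG_UNINLINE_SPIN_UNLOCK
/*
 * Default case (include/linux/spinlock_api_smp.h): the unlock is
 * expanded inline at every call site.
 */
#define _raw_spin_unlock(lock) __raw_spin_unlock(lock)
#endif

#ifdef CONFIG_UNINLINE_SPIN_UNLOCK
/*
 * PARAVIRT_SPINLOCKS selects UNINLINE_SPIN_UNLOCK (arch/x86/Kconfig),
 * so every caller makes a real call into kernel/locking/spinlock.c and
 * the jump label test exists in exactly one place.
 */
void __lockfunc _raw_spin_unlock(raw_spinlock_t *lock)
{
	__raw_spin_unlock(lock);
}
EXPORT_SYMBOL(_raw_spin_unlock);
#endif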
I don't have a good understanding of the kernel alternatives mechanism.
I think it allows boot-time modification of the kernel code according to
the CPU type, for example. The jump label mechanism is similar and is
much easier to use than the alternatives. I don't see a need to use the
alternatives unless you see a big advantage in using them that I am not
aware of.
-Longman
On Thu, Jun 12, 2014 at 04:48:41PM -0400, Waiman Long wrote:
> I don't have a good understanding of the kernel alternatives mechanism.
I didn't either; I do now, cost me a whole day reading up on
alternative/paravirt code patching.
See the patches I just sent out; I got the 'native' case with paravirt
enabled to be one NOP worse than the native case without paravirt -- for
queue_spin_unlock.
The lock slowpath is several nops and some pointless movs more expensive.
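For readers without those patches at hand, the shape of the approach is
roughly the following (a sketch modeled on the code that later went
upstream under the queued_* names; the exact PVOP flavour and file
split in the posted series may differ):

/*
 * arch/x86/include/asm/paravirt.h (sketch): a callee-saved pvop, so
 * the patched call site does not have to spill caller-saved registers.
 */
static __always_inline void pv_queue_spin_unlock(struct qspinlock *lock)
{
	PVOP_VCALLEE1(pv_lock_ops.queue_spin_unlock, lock);
}

/*
 * arch/x86/include/asm/qspinlock.h (sketch): the inline unlock is now
 * just the pvop; no jump label or cmpxchg in the fast path.
 */
static inline void queue_spin_unlock(struct qspinlock *lock)
{
	pv_queue_spin_unlock(lock);
}

/*
 * arch/x86/kernel/paravirt_patch_64.c (sketch): on bare metal the call
 * is patched in place to a single byte store; the leftover bytes of
 * the call site become the one NOP mentioned above.
 */
DEF_NATIVE(pv_lock_ops, queue_spin_unlock, "movb $0, (%rdi)");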
On Sun, Jun 15, 2014 at 03:16:54PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 12, 2014 at 04:48:41PM -0400, Waiman Long wrote:
> > I don't have a good understanding of the kernel alternatives mechanism.
>
> I didn't either; I do now, cost me a whole day reading up on
> alternative/paravirt code patching.
>
> See the patches I just sent out; I got the 'native' case with paravirt
> enabled to be one NOP worse than the native case without paravirt -- for
> queue_spin_unlock.
>
> The lock slowpath is several nops and some pointless movs more expensive.
You could use asm goto, which would optimize the fast path to be the
'native' case. That way you wouldn't have the nops and movs in the
path.
(And asm goto also uses the alternative_asm macros).
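For reference, a minimal, compilable illustration of the asm goto
pattern being suggested (plain userspace C rather than the kernel
macros; needs gcc 4.5 or later): the fast path stays a straight
fall-through with a single patchable NOP site, and the unlikely branch
sits out of line.

#include <stdio.h>

/*
 * The kernel's static_key_false()/arch_static_branch() wraps this same
 * idea: a NOP that can be rewritten at runtime into "jmp %l[slow]".
 * Here the NOP is never patched, so the fast path is always taken.
 */
static inline int key_false(void)
{
	asm goto("nop"		/* patch site: NOP now, jmp to 'slow' when enabled */
		 :		/* asm goto allows no outputs */
		 :		/* no inputs */
		 :		/* no clobbers */
		 : slow);
	return 0;		/* fast path: falls straight through, no movs */
slow:
	return 1;		/* unlikely path: only reachable once patched */
}

int main(void)
{
	printf("took the %s path\n", key_false() ? "slow" : "fast");
	return 0;
}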