Subject: Re: [Question]: try to fix contention between expire_timers and
 try_to_del_timer_sync
To: Thomas Gleixner <tglx@linutronix.de>
CC: John Stultz <john.stultz@linaro.org>, <sboyd@codeaurora.org>,
        LKML <linux-kernel@vger.kernel.org>,
        Wang Wilbur <wilburwang@asrmicro.com>,
        Marc Zyngier <marc.zyngier@arm.com>, Will Deacon <will.deacon@arm.com>,
        Peter Zijlstra <peterz@infradead.org>
References: <3d2459c7-defd-a47e-6cea-007c10cecaac@asrmicro.com>
 <alpine.DEB.2.20.1707261548560.2186@nanos>
From: qiaozhou <qiaozhou@asrmicro.com>
Message-ID: <dcb18367-c747-96a8-9927-d8ba6954c496@asrmicro.com>
Date: Thu, 27 Jul 2017 09:29:20 +0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <alpine.DEB.2.20.1707261548560.2186@nanos>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 6511
Lines: 162


On 2017年07月26日 22:16, Thomas Gleixner wrote:
> On Wed, 26 Jul 2017, qiaozhou wrote:
>
> Cc'ed ARM folks.
>
>> I want to ask you for suggestions about how to fix one contention between
>> expire_timers and try_to_del_timer_sync. Thanks in advance.
>> The issue is a hard-lockup issue detected on our platform(arm64, one cluster
>> with 4 a53, and the other with 4 a73). The sequence is as below:
>> 1. core0 checks expired timers, and wakes up a process on a73, in step 1.3.
>> 2. core4 starts to run the process and try to delete the timer in sync method,
>> in step 2.1.
>> 3. Before core0 can run to 1.4 and get the lock, core4 has already run from
>> 2.1 to 2.6 in while loop. And in step 2.4, it fails since the timer is still
>> the running timer.
>>
>> core0(a53): core4(a73):
>> run_timer_softirq
>> __run_timers
>>      spin_lock_irq(&base->lock)
>>          expire_timers()
>>              1.1: base->running_timer = timer;
>>              1.2: spin_unlock_irq(&base->lock);
>>              1.3: call_timer_fn(timer, fn, data);
>> schedule()
>>              1.4: spin_lock_irq(&base->lock);     del_timer_sync(&timer)
>>              1.5: back to 1.1 in while loop
>> 2.1: try_to_del_timer_sync()
>>          2.2: get_timer_base()
>>          2.3: spin_lock_irqsave(&base->lock, flags);
>>            2.4: check "base->running_timer != timer"
>>          2.5: spin_unlock_irqrestore(&base->lock, flags);
>>     2.6:cpu_relax(); (back to 2.1 in while loop)
> This is horribly formatted.
Sorry for the format. Updated the calling sequence on two cores.

core0(a53): core4(a73):
run_timer_softirq
__run_timers
     spin_lock_irq(&base->lock)
         expire_timers()
             1.1: base->running_timer = timer;
             1.2: spin_unlock_irq(&base->lock);
             1.3: call_timer_fn(timer, fn, data);
             1.4: spin_lock_irq(&base->lock);
             1.5: back to 1.1 in while loop

core4(a73):
schedule()
   del_timer_sync(&timer)
      2.1: try_to_del_timer_sync()
         2.2: get_timer_base()
         2.3: spin_lock_irqsave(&base->lock, flags);
         2.4: check "base->running_timer != timer"
         2.5: spin_unlock_irqrestore(&base->lock, flags);
      2.6:cpu_relax(); (back to 2.1 in while loop)

>
>> Core0 runs @832MHz, and core4 runs @1.8GHz. A73 is also much more powerful in
>> design than a53. So the actual running is that a73 keeps looping in spin_lock
>> -> check running_timer -> spin_unlock -> spin_lock ->...., while a53 can't
>> even succeed to complete one store exclusive instruction, before it can enter
>> WFE to wait for lock release. It just keeps looping in +30 and+3c in below
>> asm, due to stxr fails.
>>
>> So it deadloop, a53 can't jump out ldaxr/stxr loop, while a73 can't pass the
>> running_timer check. Finally the hard-lockup is triggered.(BTW, the issue
>> needs a long time to occur.)
>>
>> <_raw_spin_lock_irq+0x2c>:   prfm    pstl1strm, [x19]
>> /<_raw_spin_lock_irq+0x30>:   ldaxr   w0, [x19]//
>> //<_raw_spin_lock_irq+0x34>:   add     w1, w0, #0x10, lsl #12//
>> //<_raw_spin_lock_irq+0x38>:   stxr    w2, w1, [x19]//
>> //<_raw_spin_lock_irq+0x3c>:   cbnz    w2, 0xffffff80089f52e0
>> <_raw_spin_lock_irq+0x30>/
>> <_raw_spin_lock_irq+0x40>:   eor     w1, w0, w0, ror #16
>> <_raw_spin_lock_irq+0x44>:   cbz     w1, 0xffffff80089f530c
>> <_raw_spin_lock_irq+0x5c>
>> <_raw_spin_lock_irq+0x48>:   sevl
>> <_raw_spin_lock_irq+0x4c>:   wfe
>> <_raw_spin_lock_irq+0x50>:   ldaxrh  w2, [x19]
>> <_raw_spin_lock_irq+0x54>:   eor     w1, w2, w0, lsr #16
>> <_raw_spin_lock_irq+0x58>:   cbnz    w1, 0xffffff80089f52fc
>> <_raw_spin_lock_irq+0x4c>
>>
>> The loop on a53 only has 4 instructions, and loop on a73 has ~100
>> instructions. Still a53 has no chance to store exclusive successfully. It may
>> be related with ldaxr/stxr cost, core frequency, core number etc.
>>
>> I have no idea of fixing it in spinlock/unlock implement, so I try to fix it
>> in timer driver. I prepared a raw patch, not sure it's the correct direction
>> to solve this issue. Could you help to give some suggestions? Thanks.
>>
>>  From fb8fbfeb8f9f92fdadd9920ce234fd433bc883e1 Mon Sep 17 00:00:00 2001
>> From: Qiao Zhou <qiaozhou@asrmicro.com>
>> Date: Wed, 26 Jul 2017 20:30:33 +0800
>> Subject: [PATCH] RFC: timers: try to fix contention between expire_timers and
>>   del_timer_sync
>>
>> try to fix contention between expire_timers and del_timer_sync by
>> adding TIMER_WOKEN status, which means that the timer has already
>> woken up corresponding process.
> Timers do a lot more things than waking up a process.
>
> Just because this happens in a particular case for you where a wakeup is
> involved this does not mean, that this is generally true.
Yes, you're right. My intention is that as the timer function is 
executed, maybe we can do something.
>
> And that whole flag business is completely broken. There is a comment in
> call_timer_fn() which says:
>
>          /*
>           * It is permissible to free the timer from inside the
>           * function that is called from it ....
>
> So you _CANNOT_ touch timer after the function returned. It might be gone
> already.
You're right. Touching timer here has risk.
>
> For that particular timer case we can clear base->running_timer w/o the
> lock held (see patch below), but this kind of
>
>       lock -> test -> unlock -> retry
>
> loops are all over the place in the kernel, so this is going to hurt you
> sooner than later in some other place.
It's true. This is the way spinlock is used normally and widely in 
kernel. I'll also ask ARM experts whether we can do something to avoid 
or reduce the chance of such issue. ARMv8.1 has one single 
instruction(ldadda) to replace the ldaxr/stxr loop. Hope it can improve 
and reduce the chance.
>
> Thanks,
>
> 	tglx
>
> 8<------------
>
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1301,10 +1301,12 @@ static void expire_timers(struct timer_b
>   		if (timer->flags & TIMER_IRQSAFE) {
>   			raw_spin_unlock(&base->lock);
>   			call_timer_fn(timer, fn, data);
> +			base->running_timer = NULL;
>   			raw_spin_lock(&base->lock);
>   		} else {
>   			raw_spin_unlock_irq(&base->lock);
>   			call_timer_fn(timer, fn, data);
> +			base->running_timer = NULL;
>   			raw_spin_lock_irq(&base->lock);
>   		}
>   	}
It should work for this particular issue and I'll test it. Previously I 
thought it was unsafe to touch base->running_timer without holding lock.

Thanks a lot for all the suggestions.

Best Regards
Qiao