Date: Thu, 27 Aug 2015 00:56:30 +0200
From: Frederic Weisbecker
To: Jason Low
Cc: Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Oleg Nesterov,
    "Paul E. McKenney", linux-kernel@vger.kernel.org, Linus Torvalds,
    Davidlohr Bueso, Steven Rostedt, Andrew Morton, Terry Rudd,
    Rik van Riel, Scott J Norton
Subject: Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
Message-ID: <20150826225628.GD11992@lerouge>
References: <1440559068-29680-1-git-send-email-jason.low2@hp.com>
    <1440559068-29680-4-git-send-email-jason.low2@hp.com>
In-Reply-To: <1440559068-29680-4-git-send-email-jason.low2@hp.com>

On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> It was found while running a database workload on large systems that
> significant time was spent trying to acquire the sighand lock.
>
> The issue was that whenever an itimer expired, many threads ended up
> simultaneously trying to send the signal. Most of the time, nothing
> happened after acquiring the sighand lock because another thread
> had already sent the signal and updated the "next expire" time. The
> fastpath_timer_check() didn't help much since the "next expire" time
> was updated later.
>
> This patch addresses this by having the thread_group_cputimer structure
> maintain a boolean to signify when a thread in the group is already
> checking for process wide timers, and adds extra logic in the fastpath
> to check the boolean.
>
> Signed-off-by: Jason Low
> ---
>  include/linux/init_task.h      |  1 +
>  include/linux/sched.h          |  3 +++
>  kernel/time/posix-cpu-timers.c | 19 +++++++++++++++++--
>  3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index d0b380e..3350c77 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
>          .cputimer       = {                                     \
>                  .cputime_atomic = INIT_CPUTIME_ATOMIC,          \
>                  .running        = 0,                            \
> +                .checking_timer = 0,                            \
>          },                                                      \
>          INIT_PREV_CPUTIME(sig)                                  \
>          .cred_guard_mutex =                                     \
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 119823d..a6c8334 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -619,6 +619,8 @@ struct task_cputime_atomic {
>   * @cputime_atomic:     atomic thread group interval timers.
>   * @running:            non-zero when there are timers running and
>   *                      @cputime receives updates.
> + * @checking_timer:     non-zero when a thread is in the process of
> + *                      checking for thread group timers.
>   *
>   * This structure contains the version of task_cputime, above, that is
>   * used for thread group CPU timer calculations.
> @@ -626,6 +628,7 @@ struct task_cputime_atomic {
>  struct thread_group_cputimer {
>          struct task_cputime_atomic cputime_atomic;
>          int running;
> +        int checking_timer;

How about a flag in the "running" field instead?

1) Space in signal_struct is not as important as in task_struct but it
   still matters.

2) We already read the "running" field locklessly. Adding a new field like
   checking_timer makes things even more complicated. Ideally there should
   be at least a paired memory barrier between both. Let's just simplify
   that with a single field.
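
A minimal sketch of the single-field idea, written with user-space C11
atomics rather than kernel primitives. The bit names CPUTIMER_RUNNING and
CPUTIMER_CHECKING, struct cputimer_state and all helper functions are
hypothetical illustrations; none of them come from the patch or from the
kernel sources:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit names; neither the patch nor the reply defines them. */
#define CPUTIMER_RUNNING  (1u << 0) /* timers armed, cputime being updated */
#define CPUTIMER_CHECKING (1u << 1) /* a thread is checking group timers */

/* One word stands in for separate "running" and "checking_timer" ints. */
struct cputimer_state {
	atomic_uint flags;
};

/* Mark the group timer as running. */
static void cputimer_start(struct cputimer_state *s)
{
	atomic_fetch_or_explicit(&s->flags, CPUTIMER_RUNNING,
				 memory_order_release);
}

/* Fast path: true only if timers run and no thread is already checking. */
static bool cputimer_needs_check(struct cputimer_state *s)
{
	unsigned int f = atomic_load_explicit(&s->flags, memory_order_acquire);

	return (f & (CPUTIMER_RUNNING | CPUTIMER_CHECKING)) == CPUTIMER_RUNNING;
}

/* Claim the checker role; succeeds for at most one thread at a time. */
static bool cputimer_begin_check(struct cputimer_state *s)
{
	unsigned int old = atomic_load_explicit(&s->flags, memory_order_relaxed);

	do {
		if ((old & (CPUTIMER_RUNNING | CPUTIMER_CHECKING)) !=
		    CPUTIMER_RUNNING)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&s->flags, &old,
			old | CPUTIMER_CHECKING,
			memory_order_acquire, memory_order_relaxed));
	return true;
}

/* Drop the checker role; the release pairs with the acquire loads above. */
static void cputimer_end_check(struct cputimer_state *s)
{
	atomic_fetch_and_explicit(&s->flags, ~CPUTIMER_CHECKING,
				  memory_order_release);
}
```

Because both bits live in the same word, a single load observes a
consistent pair of values, so no extra barrier is needed to order "running"
against the checking flag. The sketch does not make other threads observe
the flag any sooner than a separate field would.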

Now concerning the solution for your problem, I'm a bit uncomfortable with
lockless magic like this. When the thread sets checking_timer to 1, there is
no guarantee that the other threads in the process will see it "fast" enough
to avoid the slow path checks. Then there is also the risk that the threads
don't see "fast" enough that checking_timer has toggled back to 0, and as a
result a timer may expire late. Now, the lockless access of "running" already
induces such a race. So if it really solves issues in practice, why not.