Date: Thu, 27 Aug 2015 00:56:30 +0200
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Jason Low <jason.low2@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>, Oleg Nesterov <oleg@redhat.com>,
        "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
        linux-kernel@vger.kernel.org,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Davidlohr Bueso <dave@stgolabs.net>,
        Steven Rostedt <rostedt@goodmis.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Terry Rudd <terry.rudd@hp.com>, Rik van Riel <riel@redhat.com>,
        Scott J Norton <scott.norton@hp.com>
Subject: Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention
Message-ID: <20150826225628.GD11992@lerouge>
References: <1440559068-29680-1-git-send-email-jason.low2@hp.com>
 <1440559068-29680-4-git-send-email-jason.low2@hp.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1440559068-29680-4-git-send-email-jason.low2@hp.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3437
Lines: 76

On Tue, Aug 25, 2015 at 08:17:48PM -0700, Jason Low wrote:
> It was found while running a database workload on large systems that
> significant time was spent trying to acquire the sighand lock.
> 
> The issue was that whenever an itimer expired, many threads ended up
> simultaneously trying to send the signal. Most of the time, nothing
> happened after acquiring the sighand lock because another thread
> had already sent the signal and updated the "next expire" time. The
> fastpath_timer_check() didn't help much since the "next expire" time
> was updated later.
>  
> This patch addresses this by having the thread_group_cputimer structure
> maintain a boolean to signify when a thread in the group is already
> checking for process wide timers, and adds extra logic in the fastpath
> to check the boolean.
> 
> Signed-off-by: Jason Low <jason.low2@hp.com>
> ---
>  include/linux/init_task.h      |    1 +
>  include/linux/sched.h          |    3 +++
>  kernel/time/posix-cpu-timers.c |   19 +++++++++++++++++--
>  3 files changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index d0b380e..3350c77 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -53,6 +53,7 @@ extern struct fs_struct init_fs;
>  	.cputimer	= { 						\
>  		.cputime_atomic	= INIT_CPUTIME_ATOMIC,			\
>  		.running	= 0,					\
> +		.checking_timer	= 0,					\
>  	},								\
>  	INIT_PREV_CPUTIME(sig)						\
>  	.cred_guard_mutex =						\
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 119823d..a6c8334 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -619,6 +619,8 @@ struct task_cputime_atomic {
>   * @cputime_atomic:	atomic thread group interval timers.
>   * @running:		non-zero when there are timers running and
>   * 			@cputime receives updates.
> + * @checking_timer:	non-zero when a thread is in the process of
> + *			checking for thread group timers.
>   *
>   * This structure contains the version of task_cputime, above, that is
>   * used for thread group CPU timer calculations.
> @@ -626,6 +628,7 @@ struct task_cputime_atomic {
>  struct thread_group_cputimer {
>  	struct task_cputime_atomic cputime_atomic;
>  	int running;
> +	int checking_timer;

How about a flag in the "running" field instead?

1) Space in signal_struct is not as important as in task_struct but it
   still matters.

2) We already read the "running" field locklessly. Adding a second field like
   checking_timer makes that even more complicated: ideally the accesses to
   the two fields would need at least a paired memory barrier between them.
   Let's just simplify that with a single field.
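
To make the single-field idea concrete, here is a rough userspace sketch
using C11 atomics rather than kernel primitives. The names
(CPUTIMER_RUNNING, CPUTIMER_CHECKING, struct cputimer_state and the three
helpers) are made up for illustration and are not part of the patch, and
claiming the check with a compare-and-exchange is only one possible way to
do it:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit layout; these names do not come from the patch. */
#define CPUTIMER_RUNNING	(1u << 0)	/* group timers are armed */
#define CPUTIMER_CHECKING	(1u << 1)	/* a thread is in the slow path */

/* Stand-in for thread_group_cputimer, reduced to the one state word. */
struct cputimer_state {
	atomic_uint state;
};

/* Fast path: a single load answers both "running?" and "being checked?". */
static bool should_check_timers(struct cputimer_state *ct)
{
	unsigned int s = atomic_load_explicit(&ct->state, memory_order_acquire);

	return (s & CPUTIMER_RUNNING) && !(s & CPUTIMER_CHECKING);
}

/* Try to become the one checker; only the thread that sets the bit wins. */
static bool begin_check(struct cputimer_state *ct)
{
	unsigned int old = atomic_load_explicit(&ct->state, memory_order_relaxed);

	do {
		if (!(old & CPUTIMER_RUNNING) || (old & CPUTIMER_CHECKING))
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&ct->state, &old,
							old | CPUTIMER_CHECKING,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}

/* Clear the flag once the slow path is done; RUNNING is left untouched. */
static void end_check(struct cputimer_state *ct)
{
	atomic_fetch_and_explicit(&ct->state, ~CPUTIMER_CHECKING,
				  memory_order_release);
}
```

Since both bits live in the same word, a reader always sees a consistent
pair from one load, and there is no ordering between two separate fields
left to reason about.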

Now concerning the solution to your problem, I'm a bit uncomfortable with
lockless magic like this. When a thread sets checking_timer to 1, there is
no guarantee that the other threads in the process will see it "fast" enough
to avoid the slow path checks. There is also the risk that the threads don't
see "fast" enough that checking_timer has toggled back to 0, and as a result
a timer may expire late. That said, the lockless access to "running" already
induces such a race. So if it really solves issues in practice, why not.