2015-08-26 19:33:15

by George Spelvin

Subject: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

And some more comments on the series...

> @@ -626,6 +628,7 @@ struct task_cputime_atomic {
>  struct thread_group_cputimer {
>  	struct task_cputime_atomic cputime_atomic;
>  	int running;
> +	int checking_timer;
>  };

Why not make both running and checking_timer actual booleans, so the
pair is only 16 bits rather than 64?

I suppose since sum_exec_runtime in struct task_cputime_atomic is 64 bits,
alignment means there's no actual improvement. It's just my personal
preference (Linus disagrees with me!) for the bool type for documentation.
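
To make the padding point concrete, here's a throwaway userspace demo
(the u64 stands in for the atomic64_t sums, and the struct names are
invented):

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	struct timer_ints  { uint64_t sum; int  running, checking_timer; };
	struct timer_bools { uint64_t sum; bool running, checking_timer; };

	int main(void)
	{
		/* Both print 16 on LP64: the flag pair shrinks from 8 bytes
		 * to 2, but the 8-byte alignment of 'sum' pads it back. */
		printf("%zu %zu\n", sizeof(struct timer_ints),
				    sizeof(struct timer_bools));
		return 0;
	}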

(Compile-tested patch appended.)


But when it comes to really understanding this code, I'm struggling.
It seems that this changes the return value of fastpath_timer_check.
I'm trying to wrap my head around exactly why that is safe.

Any time I see things like READ_ONCE and WRITE_ONCE, I wonder if there
need to be memory barriers as well. (Because x86 has strong default
memory ordering, testing there doesn't prove the code right on other
platforms.)
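
If ordering did matter here, I'd expect to see a release/acquire
pairing, something like the following. (Purely hypothetical;
update_expiry_times() and sample_and_check() are names I just made up,
and I'm not claiming this code needs it.)

	/* Writer, e.g. while arming a timer under sighand lock:
	 * publish the data, then the flag, with release semantics. */
	update_expiry_times(sig);			/* data stores... */
	smp_store_release(&sig->cputimer.running, true);/* ...then flag */

	/* Reader, e.g. in fastpath_timer_check(): observe the flag
	 * with acquire semantics before trusting what it guards. */
	if (smp_load_acquire(&sig->cputimer.running))	/* flag load... */
		sample_and_check(sig);			/* ...then data */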

For the writes, the surrounding spinlock does the job.

But on the read side (fastpath_timer_check), I'm not sure
what the rules should be.

Or is it basically okay if this is massively racy, since process-wide
CPU timers are inherently sloppy? A race will just cause an expiration
check to be missed, but it will be retried soon anyway.


Other half-baked thoughts that went through my head:

If the problem is that you have contention for read access to
sig->cputimer.cputime, can that be changed to a seqlock?
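
Something like this (untested sketch; the "seq" field is invented
for illustration):

	/* Writer, wherever cputime is updated today: */
	write_seqlock(&sig->cputimer.seq);
	sig->cputimer.cputime = sum;
	write_sequnlock(&sig->cputimer.seq);

	/* Lockless reader in fastpath_timer_check(): */
	unsigned int seq;
	do {
		seq = read_seqbegin(&sig->cputimer.seq);
		group_sample = sig->cputimer.cputime;
	} while (read_seqretry(&sig->cputimer.seq, seq));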

Another possibility is to use a trylock before testing
checking_timer:

	if (sig->cputimer.running) {
		struct task_cputime group_sample;

-		raw_spin_lock(&sig->cputimer.lock);
+		if (!raw_spin_trylock(&sig->cputimer.lock)) {
+			if (READ_ONCE(sig->cputimer.checking_timer))
+				return false;	/* Lock holder's doing it for us */
+			raw_spin_lock(&sig->cputimer.lock);
+		}
		group_sample = sig->cputimer.cputime;
		raw_spin_unlock(&sig->cputimer.lock);

		if (task_cputime_expired(&group_sample, &sig->cputime_expires))
			return true;
	}
	return false;
}




----8<---- This is the patch mentioned above ----8<----

From 0058bf5ed6a9e483ae33746685332e3ef9562ed5 Mon Sep 17 00:00:00 2001
From: George Spelvin <[email protected]>
Date: Wed, 26 Aug 2015 19:15:54 +0000
Subject: [PATCH] timer: Use bool more in kernel/time/posix-cpu-timers.c

This is mostly function return values, for documentation.

One structure field is changed (from int), but alignment
padding precludes any actual space saving.

Signed-off-by: George Spelvin <[email protected]>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 26a2e612..fac3a7e2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -585,7 +585,7 @@ struct task_cputime {
/**
* struct thread_group_cputimer - thread group interval timer counts
* @cputime: thread group interval timers.
- * @running: non-zero when there are timers running and
+ * @running: %true when there are timers running and
* @cputime receives updates.
* @lock: lock for fields in this struct.
*
@@ -594,7 +594,7 @@ struct task_cputime {
*/
struct thread_group_cputimer {
struct task_cputime cputime;
- int running;
+ bool running;
raw_spinlock_t lock;
};

diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 0075da74..d8483ec5 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -113,14 +113,12 @@ static void bump_cpu_timer(struct k_itimer *timer,
*
* @cputime: The struct to compare.
*
- * Checks @cputime to see if all fields are zero. Returns true if all fields
- * are zero, false if any field is nonzero.
+ * Checks @cputime to see if all fields are zero. Returns %true if all fields
+ * are zero, %false if any field is nonzero.
*/
-static inline int task_cputime_zero(const struct task_cputime *cputime)
+static inline bool task_cputime_zero(const struct task_cputime *cputime)
{
- if (!cputime->utime && !cputime->stime && !cputime->sum_exec_runtime)
- return 1;
- return 0;
+ return !cputime->utime && !cputime->stime && !cputime->sum_exec_runtime;
}

static inline unsigned long long prof_ticks(struct task_struct *p)
@@ -223,7 +221,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
*/
thread_group_cputime(tsk, &sum);
raw_spin_lock_irqsave(&cputimer->lock, flags);
- cputimer->running = 1;
+ cputimer->running = true;
update_gt_cputime(&cputimer->cputime, &sum);
} else
raw_spin_lock_irqsave(&cputimer->lock, flags);
@@ -435,8 +433,9 @@ void posix_cpu_timers_exit_group(struct task_struct *tsk)
cleanup_timers(tsk->signal->cpu_timers);
}

-static inline int expires_gt(cputime_t expires, cputime_t new_exp)
+static inline bool expires_gt(cputime_t expires, cputime_t new_exp)
{
+ /* Could also be written "expires - 1 >= new_exp" */
return expires == 0 || expires > new_exp;
}

@@ -888,7 +887,7 @@ static void stop_process_timers(struct signal_struct *sig)
unsigned long flags;

raw_spin_lock_irqsave(&cputimer->lock, flags);
- cputimer->running = 0;
+ cputimer->running = false;
raw_spin_unlock_irqrestore(&cputimer->lock, flags);
}

@@ -1066,20 +1065,20 @@ out:
* @expires: Expiration times, against which @sample will be checked.
*
* Checks @sample against @expires to see if any field of @sample has expired.
- * Returns true if any field of the former is greater than the corresponding
- * field of the latter if the latter field is set. Otherwise returns false.
+ * Returns %true if any field of the former is greater than the corresponding
+ * field of the latter if the latter field is set. Otherwise returns %false.
*/
-static inline int task_cputime_expired(const struct task_cputime *sample,
+static inline bool task_cputime_expired(const struct task_cputime *sample,
const struct task_cputime *expires)
{
if (expires->utime && sample->utime >= expires->utime)
- return 1;
+ return true;
if (expires->stime && sample->utime + sample->stime >= expires->stime)
- return 1;
+ return true;
if (expires->sum_exec_runtime != 0 &&
sample->sum_exec_runtime >= expires->sum_exec_runtime)
- return 1;
- return 0;
+ return true;
+ return false;
}

/**
@@ -1088,11 +1087,11 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
* @tsk: The task (thread) being checked.
*
* Check the task and thread group timers. If both are zero (there are no
- * timers set) return false. Otherwise snapshot the task and thread group
+ * timers set) return %false. Otherwise snapshot the task and thread group
* timers and compare them with the corresponding expiration times. Return
- * true if a timer has expired, else return false.
+ * %true if a timer has expired, else return %false.
*/
-static inline int fastpath_timer_check(struct task_struct *tsk)
+static inline bool fastpath_timer_check(struct task_struct *tsk)
{
struct signal_struct *sig;
cputime_t utime, stime;
@@ -1107,7 +1106,7 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
};

if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
- return 1;
+ return true;
}

sig = tsk->signal;
@@ -1119,10 +1118,10 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
raw_spin_unlock(&sig->cputimer.lock);

if (task_cputime_expired(&group_sample, &sig->cputime_expires))
- return 1;
+ return true;
}

- return 0;
+ return false;
}

/*


2015-08-26 23:44:21

by Jason Low

Subject: Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

On Wed, 2015-08-26 at 15:33 -0400, George Spelvin wrote:
> And some more comments on the series...
>
> > @@ -626,6 +628,7 @@ struct task_cputime_atomic {
> >  struct thread_group_cputimer {
> >  	struct task_cputime_atomic cputime_atomic;
> >  	int running;
> > +	int checking_timer;
> >  };
>
> Why not make both running and checking_timer actual booleans, so the
> pair is only 16 bits rather than 64?
>
> I suppose since sum_exec_runtime in struct task_cputime_atomic is 64 bits,
> alignment means there's no actual improvement. It's just my personal
> preference (Linus disagrees with me!) for the bool type for documentation.
>
> (Compile-tested patch appended.)

I can include your patch in the series and then use a boolean for the
new checking_timer field. However, it looks like this applies to an old
kernel. For example, the spin_lock field has already been removed from
the structure.

> But when it comes to really understanding this code, I'm struggling.
> It seems that this changes the return value of fastpath_timer_check.
> I'm trying to wrap my head around exactly why that is safe.
>
> Any time I see things like READ_ONCE and WRITE_ONCE, I wonder if there
> need to be memory barriers as well. (Because x86 has strong default
> memory ordering, testing there doesn't prove the code right on other
> platforms.)
>
> For the writes, the surrounding spinlock does the job.
>
> But on the read side (fastpath_timer_check), I'm not sure
> what the rules should be.
>
> Or is it basically okay if this is massively racy, since process-wide
> CPU timers are inherently sloppy? A race will just cause an expiration
> check to be missed, but it will be retried soon anyway.

Yes, the worst-case scenario is that we wait until the next thread
comes along and handles the next expired timer. However, this "delay"
already occurs now (for example: a timer can expire right after a thread
exits check_process_timers()).

> Other half-baked thoughts that went through my head:
>
> If the problem is that you have contention for read access to
> sig->cputimer.cputime, can that be changed to a seqlock?
>
> Another possibility is to use a trylock before testing
> checking_timer:
>
>	if (sig->cputimer.running) {
>		struct task_cputime group_sample;
>
> -		raw_spin_lock(&sig->cputimer.lock);
> +		if (!raw_spin_trylock(&sig->cputimer.lock)) {
> +			if (READ_ONCE(sig->cputimer.checking_timer))
> +				return false;	/* Lock holder's doing it for us */
> +			raw_spin_lock(&sig->cputimer.lock);
> +		}

The spinlock call has already been removed by a previous patch. The
issue now is contention on the sighand lock.

Thanks,
Jason

2015-08-27 01:28:31

by George Spelvin

Subject: Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

> I can include your patch in the series and then use a boolean for the
> new checking_timer field. However, it looks like this applies to an old
> kernel. For example, the spin_lock field has already been removed from
> the structure.

Apologies; that was 4.1.6. A 4.2-rc8 patch is appended (it's a pretty
trivial merge once you look at it).

> The spinlock call has already been removed by a previous patch. The
> issue now is contention on the sighand lock.

I'll look some more and try to wrap my head around it.

>> Or is it basically okay if this is massively racy, since process-wide
>> CPU timers are inherently sloppy? A race will just cause an expiration
>> check to be missed, but it will be retried soon anyway.

> Yes, the worst-case scenario is that we wait until the next thread
> comes along and handles the next expired timer. However, this "delay"
> already occurs now (for example: a timer can expire right after a thread
> exits check_process_timers()).

And is this polled, or is there some non-polled mechanism that will
trigger another call to check_process_timers()?

E.g. suppose a process fails to notice that it blew past a CPU time
timeout before blocking. Does anything guarantee that it will get
the timeout signal in finite real time?



From 95349f9b16c30aea518ce79d72e8e0f0c5d12069 Mon Sep 17 00:00:00 2001
From: George Spelvin <[email protected]>
Date: Wed, 26 Aug 2015 19:15:54 +0000
Subject: [PATCH] timer: Use bool more in kernel/time/posix-cpu-timers.c

This is mostly function return values, for documentation.

One structure field is changed (from int), but alignment
padding precludes any actual space saving.

Signed-off-by: George Spelvin <[email protected]>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 04b5ada4..0cd80f76 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -607,7 +607,7 @@ struct task_cputime_atomic {
/**
* struct thread_group_cputimer - thread group interval timer counts
* @cputime_atomic: atomic thread group interval timers.
- * @running: non-zero when there are timers running and
+ * @running: %true when there are timers running and
* @cputime receives updates.
*
* This structure contains the version of task_cputime, above, that is
@@ -615,7 +615,7 @@ struct task_cputime_atomic {
*/
struct thread_group_cputimer {
struct task_cputime_atomic cputime_atomic;
- int running;
+ bool running;
};

#include <linux/rwsem.h>
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 892e3dae..106368c5 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -113,14 +113,12 @@ static void bump_cpu_timer(struct k_itimer *timer,
*
* @cputime: The struct to compare.
*
- * Checks @cputime to see if all fields are zero. Returns true if all fields
- * are zero, false if any field is nonzero.
+ * Checks @cputime to see if all fields are zero. Returns %true if all fields
+ * are zero, %false if any field is nonzero.
*/
-static inline int task_cputime_zero(const struct task_cputime *cputime)
+static inline bool task_cputime_zero(const struct task_cputime *cputime)
{
- if (!cputime->utime && !cputime->stime && !cputime->sum_exec_runtime)
- return 1;
- return 0;
+ return !cputime->utime && !cputime->stime && !cputime->sum_exec_runtime;
}

static inline unsigned long long prof_ticks(struct task_struct *p)
@@ -249,7 +247,7 @@ void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
* but barriers are not required because update_gt_cputime()
* can handle concurrent updates.
*/
- WRITE_ONCE(cputimer->running, 1);
+ WRITE_ONCE(cputimer->running, true);
}
sample_cputime_atomic(times, &cputimer->cputime_atomic);
}
@@ -458,8 +456,9 @@ void posix_cpu_timers_exit_group(struct task_struct *tsk)
cleanup_timers(tsk->signal->cpu_timers);
}

-static inline int expires_gt(cputime_t expires, cputime_t new_exp)
+static inline bool expires_gt(cputime_t expires, cputime_t new_exp)
{
+ /* Could also be written "expires - 1 >= new_exp" */
return expires == 0 || expires > new_exp;
}

@@ -911,7 +910,7 @@ static inline void stop_process_timers(struct signal_struct *sig)
struct thread_group_cputimer *cputimer = &sig->cputimer;

/* Turn off cputimer->running. This is done without locking. */
- WRITE_ONCE(cputimer->running, 0);
+ WRITE_ONCE(cputimer->running, false);
}

static u32 onecputick;
@@ -1088,20 +1087,20 @@ out:
* @expires: Expiration times, against which @sample will be checked.
*
* Checks @sample against @expires to see if any field of @sample has expired.
- * Returns true if any field of the former is greater than the corresponding
- * field of the latter if the latter field is set. Otherwise returns false.
+ * Returns %true if any field of the former is greater than the corresponding
+ * field of the latter if the latter field is set. Otherwise returns %false.
*/
-static inline int task_cputime_expired(const struct task_cputime *sample,
+static inline bool task_cputime_expired(const struct task_cputime *sample,
const struct task_cputime *expires)
{
if (expires->utime && sample->utime >= expires->utime)
- return 1;
+ return true;
if (expires->stime && sample->utime + sample->stime >= expires->stime)
- return 1;
+ return true;
if (expires->sum_exec_runtime != 0 &&
sample->sum_exec_runtime >= expires->sum_exec_runtime)
- return 1;
- return 0;
+ return true;
+ return false;
}

/**
@@ -1110,11 +1109,11 @@ static inline int task_cputime_expired(const struct task_cputime *sample,
* @tsk: The task (thread) being checked.
*
* Check the task and thread group timers. If both are zero (there are no
- * timers set) return false. Otherwise snapshot the task and thread group
+ * timers set) return %false. Otherwise snapshot the task and thread group
* timers and compare them with the corresponding expiration times. Return
- * true if a timer has expired, else return false.
+ * %true if a timer has expired, else return %false.
*/
-static inline int fastpath_timer_check(struct task_struct *tsk)
+static inline bool fastpath_timer_check(struct task_struct *tsk)
{
struct signal_struct *sig;
cputime_t utime, stime;
@@ -1129,7 +1128,7 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
};

if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
- return 1;
+ return true;
}

sig = tsk->signal;
@@ -1140,10 +1139,10 @@ static inline int fastpath_timer_check(struct task_struct *tsk)
sample_cputime_atomic(&group_sample, &sig->cputimer.cputime_atomic);

if (task_cputime_expired(&group_sample, &sig->cputime_expires))
- return 1;
+ return true;
}

- return 0;
+ return false;
}

/*

2015-08-27 21:56:35

by Jason Low

Subject: Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

On Wed, 2015-08-26 at 21:28 -0400, George Spelvin wrote:
> > I can include your patch in the series and then use a boolean for the
> > new checking_timer field. However, it looks like this applies to an old
> > kernel. For example, the spin_lock field has already been removed from
> > the structure.
>
> Apologies; that was 4.1.6. A 4.2-rc8 patch is appended (it's a pretty
> trivial merge once you look at it).

Frederic suggested that we just use a single "status" variable and
access the bits for the running and checking fields. I am leaning towards
that method, so I might not include the rest of the boolean changes in
this patchset.

> > The spinlock call has already been removed by a previous patch. The
> > issue now is contention on the sighand lock.
>
> I'll look some more and try to wrap my head around it.
>
> >> Or is it basically okay if this is massively racy, since process-wide
> >> CPU timers are inherently sloppy? A race will just cause an expiration
> >> check to be missed, but it will be retried soon anyway.
>
> > Yes, the worst-case scenario is that we wait until the next thread
> > comes along and handles the next expired timer. However, this "delay"
> > already occurs now (for example: a timer can expire right after a thread
> > exits check_process_timers()).
>
> And is this polled, or is there some non-polled mechanism that will
> trigger another call to check_process_timers()?
>
> E.g. suppose a process fails to notice that it blew past a CPU time
> timeout before blocking. Does anything guarantee that it will get
> the timeout signal in finite real time?

Yep, the check_process_timers will get called again during the next
scheduler interrupt (approximately after 1 jiffy) and send the signal if
it finds that the timer expired then.
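
Roughly, the polling path is (paraphrased rather than quoted):

	/* every tick, for whatever task is current on each CPU */
	update_process_times()
	  -> run_posix_cpu_timers(current)
	       -> fastpath_timer_check()	/* lockless */
	       -> if expired: take sighand lock, check_process_timers(),
	          and queue the signal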

2015-08-27 22:43:09

by George Spelvin

Subject: Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

Jason Low wrote:
> Frederic suggested that we just use a single "status" variable and
> access the bits for the running and checking fields. I am leaning towards
> that method, so I might not include the rest of the boolean changes in
> this patchset.

Don't worry, I'm not offended. I just started editing and figured
I might as well share it.

Whichever solution is easier. My only complaint about bitmapped variables
is that so many are "unsigned long" because the Linux atomic access
primitives, originally designed for file system bitmaps, use that type.

But using that type for a non-array wastes 4 bytes on 64-bit platforms,
bytes which can't be used if the code is to work on 32-bit ones.
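
E.g., if the writers are already serialized (say, by sighand lock),
a plain unsigned int with explicit masks would do. Sketch only; the
CPUTIMER_* names are invented:

	#define CPUTIMER_RUNNING	0x1
	#define CPUTIMER_CHECKING	0x2

	struct thread_group_cputimer {
		struct task_cputime_atomic cputime_atomic;
		unsigned int status;		/* CPUTIMER_* flags */
	};

	/* Writer (serialized, so plain read-modify-write is safe): */
	WRITE_ONCE(sig->cputimer.status,
		   sig->cputimer.status | CPUTIMER_CHECKING);

	/* Lockless reader: */
	if (READ_ONCE(sig->cputimer.status) & CPUTIMER_RUNNING)
		/* ... */;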

>> E.g. suppose a process fails to notice that it blew past a CPU time
>> timeout before blocking. Does anything guarantee that it will get
>> the timeout signal in finite real time?
>
> Yep, the check_process_timers will get called again during the next
> scheduler interrupt (approximately after 1 jiffy) and send the signal if
> it finds that the timer expired then.

Will it? I thought it got called on the running process only,
which is not the (blocked, by assumption) process of interest.

I don't suspect that this would be a problem in practice, as CPU-time
timers are used on activities which use a *lot* of it. But it
seemed like a flaw worth either acknowledging or disproving.

2015-08-28 04:32:22

by Jason Low

Subject: Re: [PATCH 3/3] timer: Reduce unnecessary sighand lock contention

On Thu, 2015-08-27 at 18:43 -0400, George Spelvin wrote:
> Jason Low wrote:
> > Frederic suggested that we just use a single "status" variable and
> > access the bits for the running and checking fields. I am leaning towards
> > that method, so I might not include the rest of the boolean changes in
> > this patchset.
>
> Don't worry, I'm not offended. I just started editing and figured
> I might as well share it.
>
> Whichever solution is easier. My only complaint about bitmapped variables
> is that so many are "unsigned long" because the Linux atomic access
> primitives, originally designed for file system bitmaps, use that type.
>
> But using that type for a non-array wastes 4 bytes on 64-bit platforms,
> bytes which can't be used if the code is to work on 32-bit ones.
>
> >> E.g. suppose a process fails to notice that it blew past a CPU time
> >> timeout before blocking. Does anything guarantee that it will get
> >> the timeout signal in finite real time?
> >
> > Yep, the check_process_timers will get called again during the next
> > scheduler interrupt (approximately after 1 jiffy) and send the signal if
> > it finds that the timer expired then.
>
> Will it? I thought it got called on the running process only,
> which is not the (blocked, by assumption) process of interest.
>
> I don't suspect that this would be a problem in practice, as CPU-time
> timers are used on activities which use a *lot* of it. But it
> seemed like a flaw worth either acknowledging or disproving.

You're right, this is only called on running processes. If the process
is blocked, sending the signal can get delayed. However, in practice,
this is not really a problem, as you mentioned. Also, this "issue" is
not really avoidable even without this patch. For instance, a timer
may expire and the process can block before the next scheduler
interrupt.

Thanks,
Jason