Does anyone know why calc_load() must be called under the protection of the
xtime_lock???
One of the big-system hot spots that I am chasing involves timers, nsec
clocks, hrtimer, and related functions. I added debug code to see why
these paths are so bad on large systems.
At least part of the problem is in the following code path:
timer_interrupt
lock xtime_lock
do_timer
update_times
calc_load
unlock xtime_lock
Once every 5 sec, update_times() calls calc_load(). On a large system,
depending on load, it can take several msec to calculate the load averages.
During this entire time, the xtime_lock is held. Any other cpu that calls
current_kernel_time(), getnstimeofday(), do_gettimeofday() or similar
functions will spin for the remaining time that xtime_lock is held.
Code added to getnstimeofday() shows that this function iterates an average
of 3200 times through the read_seqbegin/read_seqretry loop once every 5 sec
when the system is running an application that causes a lot of memory
traffic (512p system).
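For reference, the read side in getnstimeofday() and friends is roughly the
loop below (a simplified sketch; the _sketch name is mine and the
time-interpolator adjustment is omitted). While another cpu holds xtime_lock
via write_seqlock() - for example for the entire duration of calc_load() -
read_seqretry() keeps failing and the reader spins, which is where those
iterations come from.

#include <linux/seqlock.h>
#include <linux/time.h>

static void getnstimeofday_sketch(struct timespec *ts)
{
	unsigned long seq;

	do {
		seq = read_seqbegin(&xtime_lock);
		*ts = xtime;		/* snapshot the wall clock */
	} while (read_seqretry(&xtime_lock, seq));
}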
Question: why is it necessary to hold the xtime_lock while calc_load() runs?
calc_load() recalculates the recent system load averages. This calculation is
statistical in nature - it does not seem to warrant a heavyweight lock such
as xtime_lock.
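For reference, the avenrun[] update itself is just a fixed-point exponential
decay; the sketch below is abridged from kernel/timer.c and the
FSHIFT/CALC_LOAD definitions in include/linux/sched.h (the _sketch name is
mine). The expensive part is count_active_tasks(), which sums per-cpu
counters and is what can take several msec on a 512p system - none of it
touches the timekeeping state that xtime_lock actually protects.

#define FSHIFT		11			/* nr of bits of precision */
#define FIXED_1		(1 << FSHIFT)		/* 1.0 as fixed-point */
#define LOAD_FREQ	(5*HZ)			/* 5 sec intervals */
#define EXP_1		1884			/* 1/exp(5sec/1min) as fixed-point */
#define EXP_5		2014			/* 1/exp(5sec/5min) */
#define EXP_15		2037			/* 1/exp(5sec/15min) */

#define CALC_LOAD(load, exp, n) \
	load *= exp; \
	load += n*(FIXED_1 - exp); \
	load >>= FSHIFT;

unsigned long avenrun[3];		/* 1-, 5- and 15-minute averages */

static void calc_load_sketch(unsigned long ticks)
{
	unsigned long active_tasks;	/* fixed-point */
	static int count = LOAD_FREQ;

	count -= ticks;
	if (count < 0) {
		count += LOAD_FREQ;
		/* nr_running() + nr_uninterruptible(), scaled by FIXED_1;
		 * summing the per-cpu counters is the slow part */
		active_tasks = count_active_tasks();
		CALC_LOAD(avenrun[0], EXP_1, active_tasks);
		CALC_LOAD(avenrun[1], EXP_5, active_tasks);
		CALC_LOAD(avenrun[2], EXP_15, active_tasks);
	}
}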
calc_load() updates avenrun[]. There are only 3 consumers of this
data:
loadavg_read_proc - this function ignores xtime_lock. It reads the
avenrun[] without checking locks. Removing the xtime_lock
will have no effect.
sys_sysinfo - this function uses xtime_lock to atomically read
avenrun, getnstimeofday() & nr_threads. I understand
why getnstimeofday & wall_to_monotonic should be read
atomically. However, I don't see why avenrun[]
must also be read atomically.
net/sched/em_meta.c - ignores xtime_lock.
None of these would appear to require a lock for accesses to avenrun[].
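For example, the /proc/loadavg path already reads avenrun[] with no locking
at all; it just converts the fixed-point values for printing. Roughly (an
abridged sketch of loadavg_read_proc, reusing the FSHIFT/FIXED_1/avenrun
definitions shown above; the _sketch name is mine):

#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

static int loadavg_sketch(char *page)
{
	unsigned long a, b, c;

	/* no xtime_lock here - avenrun[] is read without any locking */
	a = avenrun[0] + (FIXED_1/200);
	b = avenrun[1] + (FIXED_1/200);
	c = avenrun[2] + (FIXED_1/200);
	return sprintf(page, "%lu.%02lu %lu.%02lu %lu.%02lu\n",
		       LOAD_INT(a), LOAD_FRAC(a),
		       LOAD_INT(b), LOAD_FRAC(b),
		       LOAD_INT(c), LOAD_FRAC(c));
}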
What have I overlooked??
Here is the patch that I am proposing. This patch is incomplete because it
addresses only the IA64 architecture. If this approach is acceptable, I'll
update the patch to cover all architectures.
Signed-off-by: Jack Steiner <[email protected]>
arch/ia64/kernel/time.c | 5 +++--
include/linux/sched.h | 3 ++-
kernel/timer.c | 13 ++++++++-----
3 files changed, 13 insertions(+), 8 deletions(-)
Index: linux/arch/ia64/kernel/time.c
===================================================================
--- linux.orig/arch/ia64/kernel/time.c 2006-03-21 10:15:38.000000000 -0600
+++ linux/arch/ia64/kernel/time.c 2006-03-21 10:17:23.746892642 -0600
@@ -50,7 +50,7 @@ static struct time_interpolator itc_inte
static irqreturn_t
timer_interrupt (int irq, void *dev_id, struct pt_regs *regs)
{
- unsigned long new_itm;
+ unsigned long new_itm, ticks;
if (unlikely(cpu_is_offline(smp_processor_id()))) {
return IRQ_HANDLED;
@@ -79,9 +79,10 @@ timer_interrupt (int irq, void *dev_id,
* xtime_lock.
*/
write_seqlock(&xtime_lock);
- do_timer(regs);
+ ticks = do_timer(regs);
local_cpu_data->itm_next = new_itm;
write_sequnlock(&xtime_lock);
+ calc_load(ticks);
} else
local_cpu_data->itm_next = new_itm;
Index: linux/include/linux/sched.h
===================================================================
--- linux.orig/include/linux/sched.h 2006-03-21 10:15:37.000000000 -0600
+++ linux/include/linux/sched.h 2006-03-21 10:16:19.562739421 -0600
@@ -1152,7 +1152,8 @@ extern void switch_uid(struct user_struc
#include <asm/current.h>
-extern void do_timer(struct pt_regs *);
+extern unsigned long do_timer(struct pt_regs *);
+extern void calc_load(unsigned long ticks);
extern int FASTCALL(wake_up_state(struct task_struct * tsk, unsigned int state));
extern int FASTCALL(wake_up_process(struct task_struct * tsk));
Index: linux/kernel/timer.c
===================================================================
--- linux.orig/kernel/timer.c 2006-03-21 10:15:37.000000000 -0600
+++ linux/kernel/timer.c 2006-03-21 10:16:19.564692355 -0600
@@ -871,7 +871,7 @@ EXPORT_SYMBOL(avenrun);
* calc_load - given tick count, update the avenrun load estimates.
* This is called while holding a write_lock on xtime_lock.
*/
-static inline void calc_load(unsigned long ticks)
+void calc_load(unsigned long ticks)
{
unsigned long active_tasks; /* fixed-point */
static int count = LOAD_FREQ;
@@ -923,7 +923,7 @@ void run_local_timers(void)
* Called by the timer interrupt. xtime_lock must already be taken
* by the timer IRQ!
*/
-static inline void update_times(void)
+static inline unsigned long update_times(void)
{
unsigned long ticks;
@@ -932,7 +932,7 @@ static inline void update_times(void)
wall_jiffies += ticks;
update_wall_time(ticks);
}
- calc_load(ticks);
+ return ticks;
}
/*
@@ -941,13 +941,16 @@ static inline void update_times(void)
* jiffies is defined in the linker script...
*/
-void do_timer(struct pt_regs *regs)
+unsigned long do_timer(struct pt_regs *regs)
{
+ unsigned long ticks;
+
jiffies_64++;
/* prevent loading jiffies before storing new jiffies_64 value. */
barrier();
- update_times();
+ ticks = update_times();
softlockup_tick(regs);
+ return ticks;
}
#ifdef __ARCH_WANT_SYS_ALARM
* Jack Steiner <[email protected]> wrote:
> Here is the patch that I am proposing. This patch is incomplete
> because it addresses only the IA64 architecture. If this approach is
> acceptable, I'll update the patch to cover all architectures.
>
> Signed-off-by: Jack Steiner <[email protected]>
would be nice to base this on the GTOD patchset - that way you'll only
have to modify kernel/time/*.c, not a gazillion of architectures...
i agree with your analysis - there is no reason calc_load() should be
under xtime_lock. I guess no-one noticed this so far because calc_load()
iterating over hundreds of CPUs isn't too common.
Ingo
On Tue, Mar 21, 2006 at 09:36:47PM +0100, Ingo Molnar wrote:
> * Jack Steiner <[email protected]> wrote:
>
> > Here is the patch that I am proposing. This patch is incomplete
> > because it addresses only the IA64 architecture. If this approach is
> > acceptable, I'll update the patch to cover all architectures.
> >
> > Signed-off-by: Jack Steiner <[email protected]>
>
> i agree with your analysis - there is no reason calc_load() should be
> under xtime_lock. I guess no-one noticed this so far because calc_load()
> iterating over hundreds of CPUs isn't too common.
I tested this patch and it works well for eliminating latencies due to
contention for the xtime_lock. Without the patch the latencies are
quite substantial at higher cpu counts.