I posted something about this back in October, but received little response.
Maybe others have run into problems with this since then.
I've noticed much less deterministic and more widely varying thread wakeup
(scheduling) delays on recent kernels. Even with isolated processors, the
maximum delay to wakeup has gotten much longer (configured with or without
CONFIG_PREEMPT).
The maximum delay to wakeup is now more than 10x longer than it was in
2.6.13.4 and previous kernels, and that's on isolated processors (as much
as 300 usec on a 1GHz cpu), although nominal values remain largely unchanged.
The latest version I've tested is 2.6.15-rc5.
Delving into this further, I discovered that this is due to the execution
time of file_free_rcu(), running from rcu_process_callbacks() in ksoftirqd.
The modification that appears to have caused this is:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ab2af1f5005069321c5d130f09cce577b03f43ef
With the following change, thread wakeup delays on isolated cpus become
consistent again, similar to what we had on kernels prior to the
above-mentioned mod (I know this change is incorrect; it is just for
test purposes):
fs/file_table.c
@@ -62,7 +62,7 @@
 static inline void file_free(struct file *f)
 {
-	call_rcu(&f->f_rcuhead, file_free_rcu);
+	kmem_cache_free(filp_cachep, f);
 }
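To make the difference between the two lines of the diff concrete, here is a toy user-space model, not kernel code. It assumes the long delays come from many deferred frees being run back to back in a single pass. Every name in it (toy_head, toy_queue, toy_call_deferred, toy_process, toy_file) is hypothetical and only mimics the shape of call_rcu() and its callbacks; the optional per-pass limit is an illustration of one way such work could be bounded, not a description of what the kernel does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for struct rcu_head: one node in a list of callbacks. */
struct toy_head {
	struct toy_head *next;
	void (*func)(struct toy_head *head);
};

/* Toy stand-in for a per-cpu callback list. */
struct toy_queue {
	struct toy_head *first;
	struct toy_head **tail;
	size_t len;
};

static void toy_queue_init(struct toy_queue *q)
{
	q->first = NULL;
	q->tail = &q->first;
	q->len = 0;
}

/* Analogue of call_rcu(): remember the callback, do no work now. */
static void toy_call_deferred(struct toy_queue *q, struct toy_head *head,
			      void (*func)(struct toy_head *head))
{
	assert(func != NULL);
	head->func = func;
	head->next = NULL;
	*q->tail = head;
	q->tail = &head->next;
	q->len++;
}

/*
 * Run queued callbacks in order. A limit of 0 means "run everything",
 * so the time spent here grows with the queue length. A non-zero limit
 * bounds the work done in one pass. Returns the number of callbacks run.
 */
static size_t toy_process(struct toy_queue *q, size_t limit)
{
	size_t done = 0;

	while (q->first && (limit == 0 || done < limit)) {
		struct toy_head *head = q->first;

		q->first = head->next;
		if (!q->first)
			q->tail = &q->first;
		q->len--;
		head->func(head);
		done++;
	}
	return done;
}

/* Toy stand-in for struct file with an embedded callback head. */
struct toy_file {
	int fd;
	struct toy_head rcu;
};

/* Deferred free: recover the enclosing object from its embedded head. */
static void toy_file_free(struct toy_head *head)
{
	struct toy_file *f = (struct toy_file *)
		((char *)head - offsetof(struct toy_file, rcu));

	free(f);
}
```

Queuing a thousand toy_file objects with toy_call_deferred() and then calling toy_process(&q, 0) frees all of them in one uninterrupted run, whereas toy_process(&q, 10) frees ten and leaves the rest for a later pass. The test change in the diff corresponds to skipping the queue entirely and freeing at once.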
I am wondering if there is some way we can return to consistently fast
and predictable scheduling of threads to be woken? If not on the
system in general, maybe at least on certain specified processors?
Dimitri
On Tue, Dec 20, 2005 at 09:17:22AM -0600, Dimitri Sivanich wrote:
> I posted something about this back in October, but received little response.
> Maybe others have run into problems with this since then.
>
> I've noticed much less deterministic and more widely varying thread wakeup
> (scheduling) delays on recent kernels. Even with isolated processors, the
> maximum delay to wakeup has gotten much longer (configured with or without
> CONFIG_PREEMPT).
Interesting -- what workload are you running, and what mechanism are
you using to check scheduling delays?
What happens when you run this workload on a -rt kernel?
Thanx, Paul
Paul,
On Wed, Dec 21, 2005 at 05:38:47AM -0800, Paul E. McKenney wrote:
> On Tue, Dec 20, 2005 at 09:17:22AM -0600, Dimitri Sivanich wrote:
> > I posted something about this back in October, but received little response.
> > Maybe others have run into problems with this since then.
> >
> > I've noticed much less deterministic and more widely varying thread wakeup
> > (scheduling) delays on recent kernels. Even with isolated processors, the
> > maximum delay to wakeup has gotten much longer (configured with or without
> > CONFIG_PREEMPT).
>
> Interesting -- what workload are you running, and what mechanism are
> you using to check scheduling delays?
For this issue, it takes just a simple mechanism and little workload.
Aside from the 2 processors involved in the test, the rest of the system
is fairly idle. Basically, it looks like the following:
  1. Thread pinned to processor A waits in the kernel on a mutex.
  2. Thread pinned to processor B sets up some timer hardware to interrupt
     processor A at a specified instant in time.
  3. Processor A gets the interrupt and records an RTC time stamp,
     then trips the mutex to wake the sleeping thread on this same cpu.
  4. Thread on A wakes up and records another RTC time stamp.
  5. Thread on A opens a results file. Should the difference between the
     timestamps be > some threshold, data is written out to the file. The
     file is then closed.
  6. Thread on A waits again on the mutex.
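For anyone who wants to try something similar without the timer hardware, the loop above can be roughly approximated in user space. This is only a sketch under stated assumptions: it replaces the hardware interrupt and kernel mutex with an absolute-deadline clock_nanosleep(), uses CLOCK_MONOTONIC instead of the RTC, and leaves out the cpu pinning and the results file. The helper names (now_ns, wakeup_delay_ns, max_wakeup_delay_ns) are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static int64_t now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/*
 * Sleep until an absolute deadline delay_ns from now, then return how
 * many nanoseconds past the deadline the thread actually resumed.
 * Returns -1 if the sleep fails for a reason other than a signal.
 */
static int64_t wakeup_delay_ns(int64_t delay_ns)
{
	int64_t deadline = now_ns() + delay_ns;
	struct timespec ts = {
		.tv_sec = deadline / NSEC_PER_SEC,
		.tv_nsec = deadline % NSEC_PER_SEC,
	};
	int rc;

	/* clock_nanosleep() returns the error number; retry on EINTR. */
	do {
		rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &ts, NULL);
	} while (rc == EINTR);
	if (rc != 0)
		return -1;

	return now_ns() - deadline;
}

/* Repeat the measurement and return the worst delay seen, or -1 on error. */
static int64_t max_wakeup_delay_ns(int iterations, int64_t delay_ns)
{
	int64_t worst = 0;

	for (int i = 0; i < iterations; i++) {
		int64_t d = wakeup_delay_ns(delay_ns);

		if (d < 0)
			return -1;
		if (d > worst)
			worst = d;
	}
	return worst;
}
```

A caller would pin itself to the cpu under test first (for example with sched_setaffinity()), run max_wakeup_delay_ns() over many iterations, and compare the result against a threshold. The numbers will not match the in-kernel test, since this path includes timer and syscall overhead, but large outliers should still show up.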
If I look at the stack for processor A at the threshold time, it's always
somewhere in file_free_rcu. Nominal values for this test are well below
threshold (even now).
And again, setting either CONFIG_PREEMPT or the equivalent of
CONFIG_PREEMPT_VOLUNTARY (setting might_resched()=cond_resched()) makes
no difference.
>
> What happens when you run this workload on a -rt kernel?
I haven't tried it.
Thanks.
Dimitri