These are patches designed to improve system responsiveness and interactivity.
They are configurable for any workload, but the default ck patch is aimed at the
desktop, and cks is available with more emphasis on serverspace.
THESE INCLUDE THE PATCHES FROM 2.6.16.12 SO START WITH 2.6.16 AS YOUR BASE
Apply to 2.6.16
http://www.kernel.org/pub/linux/kernel/people/ck/patches/2.6/2.6.16/2.6.16-ck9/patch-2.6.16-ck9.bz2
or server version
http://www.kernel.org/pub/linux/kernel/people/ck/patches/cks/patch-2.6.16-cks9.bz2
web:
http://kernel.kolivas.org
all patches:
http://www.kernel.org/pub/linux/kernel/people/ck/patches/
Split patches available.
Changes since 2.6.16-ck8:
Added:
+sched-fix_idleprio.patch
A small bug crept in that prevented SCHED_IDLEPRIO tasks from being scheduled
normally when they held a semaphore, making it possible to livelock. This
fixes it.
Modified:
-patch-2.6.16.11
+patch-2.6.16.12
Resync with mainline
-2.6.16-ck8-version.patch
+2.6.16-ck9-version.patch
Version update
--
-ck
On Tue, 2006-05-02 at 16:38, Con Kolivas wrote:
> +sched-fix_idleprio.patch
> A small bug crept in that prevented SCHED_IDLEPRIO tasks from being scheduled
> normally when they held a semaphore, making it possible to livelock. This
> fixes it.
Hmm...
I tried to run SetiAtHome at IDLEPRIO, but it competes equally with a
while(1); loop run at nice 19. I'm starting to wonder if there isn't
some kind of bug in the kernel which results in a program returning from
a system call with an in-kernel semaphore held. After all, according to
top, SetiAtHome consumes over 90% CPU, and the system consumes only
about 1%, so it can't be making system calls all the time either. Or
maybe there's some case where the calculations can become confused and
think that a semaphore is still being held when it's not.
Is there any way to test this?
Anyway, I ran strace on FahCore (a program, launched by SetiAtHome main
program, that actually consumes the CPU), and got this:
rt_sigprocmask(SIG_BLOCK, [CHLD], [RTMIN], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
nanosleep({5, 0}, {5, 0}) = 0
nanosleep({0, 0}, NULL) = 0
This pattern just keeps on repeating, endlessly. Occasionally it also
has
kill(5432, SIG_0) = 0
attached to it. 5432 is the parent process, the FAH502-Linux.exe.
There is something very strange going on here...
On Wednesday 03 May 2006 06:57, Juho Saarikko wrote:
> Hmm...
>
> I tried to run SetiAtHome at IDLEPRIO, but it competes equally with a
> while(1); loop run at nice 19. I'm starting to wonder if there isn't
> some kind of bug in the kernel which results in a program returning from
> a system call with an in-kernel semaphore held. After all, according to
> top, SetiAtHome consumes over 90% CPU, and the system consumes only
> about 1%, so it can't be making system calls all the time either. Or
> maybe there's some case where the calculations can become confused and
> think that a semaphore is still being held when it's not.
>
> Is there any way to test this ?
>
> Anyway, I ran strace on FahCore (a program, launched by SetiAtHome main
> program, that actually consumes the CPU), and got this:
>
>
> rt_sigprocmask(SIG_BLOCK, [CHLD], [RTMIN], 8) = 0
> rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
> rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
> nanosleep({5, 0}, {5, 0}) = 0
> nanosleep({0, 0}, NULL) = 0
>
>
> This pattern just keeps on repeating, endlessly. Occasionally it also
> has
>
> kill(5432, SIG_0) = 0
>
> attached to it. 5432 is the parent process, the FAH502-Linux.exe.
>
> There is something very strange going on here...
Find all the running threads with this command:
ps -wweALo spid,user,priority,ni,pcpu,vsize,time,args
The spid column will show you any threads whose pids differ from the main
task's. Then check the actual scheduling policy they run at. Perhaps FahCore
actually manually sets them to SCHED_NORMAL.
Run:
schedtool $spid
for each thread to see what policy it is.
--
-ck
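The per-thread check described above can also be scripted. The following is
only a rough sketch, assuming a Linux system with /proc mounted and Python 3;
the helper name thread_policies is made up for illustration. It knows only the
policy constants that mainline Python exposes, so a -ck specific policy such as
SCHED_IDLEPRIO would show up as an unknown number rather than by name.

```python
import os
import sys

# Map the policy constants this Python build exposes to readable names.
# Assumption: patch-specific policies (e.g. -ck's SCHED_IDLEPRIO) are not
# in this table and will be reported as "unknown(<number>)".
POLICY_NAMES = {
    getattr(os, name): name
    for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR", "SCHED_BATCH", "SCHED_IDLE")
    if hasattr(os, name)
}


def thread_policies(pid):
    """Return {thread id: policy name} for every thread of process `pid`."""
    policies = {}
    # Each thread of a process appears as a directory under /proc/<pid>/task.
    for entry in os.listdir(f"/proc/{pid}/task"):
        tid = int(entry)
        try:
            policy = os.sched_getscheduler(tid)
        except ProcessLookupError:
            continue  # the thread exited while we were looking
        policies[tid] = POLICY_NAMES.get(policy, f"unknown({policy})")
    return policies


if __name__ == "__main__":
    # Inspect the pid given on the command line, or this process by default.
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for tid, name in sorted(thread_policies(target).items()):
        print(tid, name)
```

Run against the pid of the client's main task, this lists every thread next to
the policy the kernel reports for it, which is the same information the
ps and schedtool steps give one thread at a time.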
On Wed, 2006-05-03 at 01:01, Con Kolivas wrote:
> On Wednesday 03 May 2006 06:57, Juho Saarikko wrote:
> > I tried to run SetiAtHome at IDLEPRIO, but it competes equally with a
> > while(1); loop run at nice 19. I'm starting to wonder if there isn't
> > some kind of bug in the kernel which results in a program returning from
> > a system call with an in-kernel semaphore held. After all, according to
> > top, SetiAtHome consumes over 90% CPU, and the system consumes only
> > about 1%, so it can't be making system calls all the time either. Or
> > maybe there's some case where the calculations can become confused and
> > think that a semaphore is still being held when it's not.
> >
> > Is there any way to test this ?
> >
> > Anyway, I ran strace on FahCore (a program, launched by SetiAtHome main
> > program, that actually consumes the CPU), and got this:
> >
> >
> > rt_sigprocmask(SIG_BLOCK, [CHLD], [RTMIN], 8) = 0
> > rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
> > rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
> > nanosleep({5, 0}, {5, 0}) = 0
> > nanosleep({0, 0}, NULL) = 0
> >
> >
> > This pattern just keeps on repeating, endlessly. Occasionally it also
> > has
> >
> > kill(5432, SIG_0) = 0
> >
> > attached to it. 5432 is the parent process, the FAH502-Linux.exe.
> >
> > There is something very strange going on here...
>
> Find all the threads running with this command:
> ps -wweALo spid,user,priority,ni,pcpu,vsize,time,args
My version of ps doesn't seem to support spid, but by dropping it I got
the thread pids anyway.
> The spid will show you any threads with different pids to the main task. Then
> check the actual scheduling policy they run at. Perhaps FahCore actually
> manually sets them to SCHED_NORMAL.
And so it does. Annoying. Time to hack the kernel to add a new scheduling
policy, SCHED_STAYIDLE, which is like SCHED_IDLE but cannot be unset
except by root.
Can't make it the default, since a program running at SCHED_IDLE in a
machine with 100% CPU usage by some other program will never process
SIGKILL, and thus can only be killed by setting its scheduling policy to
normal...
Darn obnoxious program, SetiAtHome...
On Wednesday 03 May 2006 18:54, Juho Saarikko wrote:
> On Wed, 2006-05-03 at 01:01, Con Kolivas wrote:
> > The spid will show you any threads with different pids to the main task.
> > Then check the actual scheduling policy they run at. Perhaps FahCore
> > actually manually sets them to SCHED_NORMAL.
>
> And so it does. Annoying. Time to hack the kernel to add a new scheduling
> policy, SCHED_STAYIDLE, which is like SCHED_IDLE but cannot be unset
> except by root.
>
> Can't make it the default, since a program running at SCHED_IDLE in a
> machine with 100% CPU usage by some other program will never process
> SIGKILL, and thus can only be killed by setting its scheduling policy to
> normal...
I toyed with the idea of making the conversion one-way, so that tasks could be
switched to SCHED_IDLEPRIO but not back to SCHED_NORMAL, much like we allow
nicing tasks up but not back down again. However, I personally found this very
inconvenient, as I often run something idleprio for a while and then change it
back. It seems a fair thing for a normal user to do.
> Darn obnoxious program, SetiAtHome...
Obviously, when they wrote the Linux client and added the ability to set the
priority from within the program to nice 19, they also explicitly set the
scheduling policy at the same time. This might make sense on some other OS...
but not Linux.
--
-ck
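To illustrate the behaviour discussed above, here is a hedged sketch of what a
client might do when it sets its own priority. The real client's code is not
shown anywhere in this thread, so the function name mimic_client_priority_setup
and the exact sequence of calls are assumptions. It uses Python's os module on
Linux, where SCHED_OTHER is the userspace name for the kernel's SCHED_NORMAL.

```python
import os


def mimic_client_priority_setup(nice_value=19):
    """Raise the nice value and, in the same step, force the default policy."""
    # Raising our own nice value needs no privileges.
    os.setpriority(os.PRIO_PROCESS, 0, nice_value)
    # Explicitly select SCHED_OTHER (SCHED_NORMAL inside the kernel) for the
    # calling process. Any policy chosen beforehand from outside, for example
    # with schedtool, is replaced by this call.
    os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
    return os.getpriority(os.PRIO_PROCESS, 0), os.sched_getscheduler(0)


if __name__ == "__main__":
    nice_now, policy_now = mimic_client_priority_setup()
    print(f"nice={nice_now} policy={policy_now} (SCHED_OTHER={os.SCHED_OTHER})")
```

A process started under an idle policy and then running code like this would
end up as an ordinary nice 19 task, which matches the symptom reported earlier
in the thread: it competes equally with another nice 19 busy loop.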
>
>I tried to run SetiAtHome at IDLEPRIO, but it competes equally with a
>while(1); loop run at nice 19. I'm starting to wonder if there isn't
>some kind of bug in the kernel which results in a program returning from
>a system call with an in-kernel semaphore held. After all, according to
>top, SetiAtHome consumes over 90% CPU, and the system consumes only
>about 1%, so it can't be making system calls all the time either.
SAH does make very few system calls in relation to its computing, in fact.
[It's a guess, not a proven answer.] The boinc supervisor process is mostly
the syscall, filesystem and networking part.
>This pattern just keeps on repeating, endlessly. Occasionally it also
>has
>
>kill(5432, SIG_0) = 0
>
>attached to it. 5432 is the parent process, the FAH502-Linux.exe.
You don't use boinc?
>There is something very strange going on here...
Jan Engelhardt
--
>> And so it does. Annoying. Time to hack the kernel to add a new scheduling
>> policy, SCHED_STAYIDLE, which is like SCHED_IDLE but cannot be unset
>> except by root.
>>
>> Can't make it the default, since a program running at SCHED_IDLE in a
>> machine with 100% CPU usage by some other program will never process
>> SIGKILL, and thus can only be killed by setting its scheduling policy to
>> normal...
Try making SCHED_STAYIDLE non-idle enough so that non-catchable signals get
processed in an appropriate time.
>> Darn obnoxious program, SetiAtHome...
>
>Obviously, when they wrote the Linux client and added the ability to set the
>priority from within the program to nice 19, they also explicitly set the
>scheduling policy at the same time. This might make sense on some other OS...
>but not Linux.
>
You have the source, you can change the behavior as a temporary workaround.
Jan Engelhardt
--
On Wed, 2006-05-03 at 19:25, Jan Engelhardt wrote:
> >
> >I tried to run SetiAtHome at IDLEPRIO, but it competes equally with a
> >while(1); loop run at nice 19. I'm starting to wonder if there isn't
> >some kind of bug in the kernel which results in a program returning from
> >a system call with an in-kernel semaphore held. After all, according to
> >top, SetiAtHome consumes over 90% CPU, and the system consumes only
> >about 1%, so it can't be making system calls all the time either.
>
> SAH does make very few system calls in relation to its computing, in fact.
> [It's a guess, not a proven answer.] The boinc supervisor process is mostly
> the syscall, filesystem and networking part.
>
> >This pattern just keeps on repeating, endlessly. Occasionally it also
> >has
> >
> >kill(5432, SIG_0) = 0
> >
> >attached to it. 5432 is the parent process, the FAH502-Linux.exe.
>
> You don't use boinc?
AARRGGHH. I meant FoldingAtHome.
That's what I get for not paying attention to what I'm typing ;(.