2009-09-03 21:57:31

by Marton Balint

[permalink] [raw]
Subject: Re: CPU scheduler weirdness?



On Sat, 29 Aug 2009, Marton Balint wrote:

>
>
> On Thu, 20 Aug 2009, Marton Balint wrote:
>>
>>
>> On Thu, 20 Aug 2009, Ingo Molnar wrote:
>>
>>>
>>> * Marton Balint <[email protected]> wrote:
>>>
>>>>
>>>> On Wed, 19 Aug 2009, Peter Zijlstra wrote:
>>>>
>>>>> On Wed, 2009-08-19 at 14:34 +0200, Marton Balint wrote:
>>>>>>
>>>>>> On Wed, 19 Aug 2009, Peter Zijlstra wrote:
>>>>>>
>>>>>>> On Wed, 2009-08-19 at 14:01 +0200, Marton Balint wrote:
>>>>>>>> On Wed, 19 Aug 2009, Peter Zijlstra wrote:
>>>>>>>>> On Tue, 2009-08-18 at 21:49 +0200, Marton Balint wrote:
>>>>>>>>>
>>>>>>>>>> In the meantime, I was able to create a tiny C program which always
>>>>>>>>>> succesfully reproduces the bug. It's basically an endless loop
>>>>>>>>>> which does
>>>>>>>>>> not stop while the process is running on the last CPU core. The
>>>>>>>>>> program
>>>>>>>>>> creates multiple instances of itself, to be able to keep all of the
>>>>>>>>>> CPU
>>>>>>>>>> cores busy. After 1 second, the processes running on other than the
>>>>>>>>>> last
>>>>>>>>>> CPU core die, the processes running on the last CPU core remain
>>>>>>>>>> stuck
>>>>>>>>>> there...
>>>>>>>>>>
>>>>>>>>>> I tested it on my dual core system, if someone could test it on a
>>>>>>>>>> quad
>>>>>>>>>> core and report back that would probably be useful.
>>>>>>>>>>
>>>>>>>>>> Usage: ./schedtest <number of CPU cores>
>>>>>>>>>>
>>>>>>>>>> And don't forget to kill the stuck processes after using the
>>>>>>>>>> program! :)
>>>>>>>>>
>>>>>>>>> So what's the bug? Sure one task will stay on the cpu, and because
>>>>>>>>> there
>>>>>>>>> is no contention it doesn't get migrated, and therefore won't quit,
>>>>>>>>> how's that a problem?
>>>>>>>>
>>>>>>>> Problem is that more than one processes remain on that CPU core, and
>>>>>>>> none
>>>>>>>> of them get migrated to other (idle) cores. I tested it with my E8400
>>>>>>>> processor and 2.6.31-rc5-git3 kernel.
>>>>>>>
>>>>>>> Only one remains here.. on a c2q running 2.6.31-rc6-tip
>>>>>>>
>>>>>>> Do you have a .config handy?
>>>>>>>
>>>>>>
>>>>>> Yes it's in my original post:
>>>>>>
>>>>>> http://marc.info/?l=linux-kernel&m=125012584709800&w=2
>>>>>
>>>>> Right you are,.. so I build a kernel with the cgroup scheduler in and
>>>>> tested it on a dual-core opteron machine, but I can't seem to reproduce
>>>>> this.
>>>>>
>>>>> Are you using cgroups in any way, or do you simply have it enabled in
>>>>> your config?
>>>>
>>>> No, it's just enabled. Actually the kernel is from the
>>>> openSUSE build service:
>>>>
>>>> http://download.opensuse.org/repositories/Kernel:/HEAD/openSUSE_11.1/x86_64/
>>>>
>>>> But the problem is present for both the kernel-default
>>>> kernel and the kernel-vanilla kernel which does not
>>>> contain any suse-specific patches.
>>>>
>>>> This evening I had a bit more time to test, and I've
>>>> made a surprising discovery: I can only reproduce the
>>>> bug if the kernel module of my TV tuner card is loaded.
>>>> I have a Leadtek Winfast 2000 XP Expert TV card, it
>>>> uses the cx8800 kernel module. It seems that the
>>>> problem is somehow related to the infrared sensor of
>>>> the TV card, because I recompiled the module with the
>>>> 'case CX88_BOARD_WINFAST2000XP_EXPERT:' line removed
>>>> from cx88-input.c and I couldn't reproduce the bug with
>>>> the new kernel module.
>>>
>>> Extremely weird. Are timers somehow busted?
>>
>> How can I check that?
>>
>> In the meantime, I updated my original C program and also created a kernel
>> module (schedtest_mod.c) which causes the same scheduling problems as the
>> kernel module of my TV card. The kernel module is a skeleton of the
>> infrared sensor polling code in cx88-input.c. It uses
>> schedule_delayed_work, this seems to cause the problem. The C program
>> (schedtest.c) is also updated, it now detects the number of CPU cores, from
>> now, what you can set as a command line parameter is the CPU core number,
>> on which the schedtest processes will not quit. (previously this was always
>> the last core).
>>
>> So to reproduce the bug on a dual core system, compile and insert the
>> kernel module (schedtest_mod.c). Then check dmesg, it should contain on
>> which CPU core is the delayed_work running. You should use the CPU core id
>> of the _other_ CPU core as a command line parameter to the updated
>> schedtest program.
>>
>> And by the way, thank you guys for the help so far, hopefully we'll get to
>> the bottom of this :)
>
> I reproduced the bug with the previously provided kernel module and C program
> on a different computer (it's a laptop with a core2 duo P8400 CPU), and also
> bisected the bug to this commit:
>
> sched: fine-tune SD_MC_INIT:
> 14800984706bf6936bbec5187f736e928be5c218
>
> If I add again the removed SD_BALANCE_NEWIDLE to flags, then everything works
> as expected. So what would be the correct fix for this bug? Revert the patch?
> Or just add SD_BALANCE_NEWIDLE to flags?


Ingo, Peter, could any of you guys have a look at the commit that caused
this bug? Is it OK to revert it? Or a fix somewhere else is necessary? I'm
pushing this because I hope that this bug will get fixed in the upcoming
stable kernel...

Regards,
Marton


2009-09-04 06:26:06

by Mike Galbraith

[permalink] [raw]
Subject: Re: CPU scheduler weirdness?

On Thu, 2009-09-03 at 23:57 +0200, Marton Balint wrote:

> >> In the meantime, I updated my original C program and also created a kernel
> >> module (schedtest_mod.c) which causes the same scheduling problems as the
> >> kernel module of my TV card. The kernel module is a skeleton of the
> >> infrared sensor polling code in cx88-input.c. It uses
> >> schedule_delayed_work, this seems to cause the problem. The C program
> >> (schedtest.c) is also updated, it now detects the number of CPU cores, from
> >> now, what you can set as a command line parameter is the CPU core number,
> >> on which the schedtest processes will not quit. (previously this was always
> >> the last core).
> >>
> >> So to reproduce the bug on a dual core system, compile and insert the
> >> kernel module (schedtest_mod.c). Then check dmesg, it should contain on
> >> which CPU core is the delayed_work running. You should use the CPU core id
> >> of the _other_ CPU core as a command line parameter to the updated
> >> schedtest program.
> >>
> >> And by the way, thank you guys for the help so far, hopefully we'll get to
> >> the bottom of this :)
> >
> > I reproduced the bug with the previously provided kernel module and C program
> > on a different computer (it's a laptop with a core2 duo P8400 CPU), and also
> > bisected the bug to this commit:
> >
> > sched: fine-tune SD_MC_INIT:
> > 14800984706bf6936bbec5187f736e928be5c218
> >
> > If I add again the removed SD_BALANCE_NEWIDLE to flags, then everything works
> > as expected. So what would be the correct fix for this bug? Revert the patch?
> > Or just add SD_BALANCE_NEWIDLE to flags?

Or, figure out what's going weird with that module loaded.

> Ingo, Peter, could any of you guys have a look at the commit that caused
> this bug? Is it OK to revert it? Or a fix somewhere else is necessary? I'm
> pushing this because I hope that this bug will get fixed in the upcoming
> stable kernel...

Where does your schedtest.c and schedtest_mod.c live?

-Mike

2009-09-04 07:53:51

by Marton Balint

[permalink] [raw]
Subject: Re: CPU scheduler weirdness?



On Fri, 4 Sep 2009, Mike Galbraith wrote:

> On Thu, 2009-09-03 at 23:57 +0200, Marton Balint wrote:
>
>>>> In the meantime, I updated my original C program and also created a kernel
>>>> module (schedtest_mod.c) which causes the same scheduling problems as the
>>>> kernel module of my TV card. The kernel module is a skeleton of the
>>>> infrared sensor polling code in cx88-input.c. It uses
>>>> schedule_delayed_work, this seems to cause the problem. The C program
>>>> (schedtest.c) is also updated, it now detects the number of CPU cores, from
>>>> now, what you can set as a command line parameter is the CPU core number,
>>>> on which the schedtest processes will not quit. (previously this was always
>>>> the last core).
>>>>
>>>> So to reproduce the bug on a dual core system, compile and insert the
>>>> kernel module (schedtest_mod.c). Then check dmesg, it should contain on
>>>> which CPU core is the delayed_work running. You should use the CPU core id
>>>> of the _other_ CPU core as a command line parameter to the updated
>>>> schedtest program.
>>>>
>>>> And by the way, thank you guys for the help so far, hopefully we'll get to
>>>> the bottom of this :)
>>>
>>> I reproduced the bug with the previously provided kernel module and C program
>>> on a different computer (it's a laptop with a core2 duo P8400 CPU), and also
>>> bisected the bug to this commit:
>>>
>>> sched: fine-tune SD_MC_INIT:
>>> 14800984706bf6936bbec5187f736e928be5c218
>>>
>>> If I add again the removed SD_BALANCE_NEWIDLE to flags, then everything works
>>> as expected. So what would be the correct fix for this bug? Revert the patch?
>>> Or just add SD_BALANCE_NEWIDLE to flags?
>
> Or, figure out what's going weird with that module loaded.

The problem is most likely caused by scheduled_delayed_work, a work
function is called every time a CPU wakes up.

>> Ingo, Peter, could any of you guys have a look at the commit that caused
>> this bug? Is it OK to revert it? Or a fix somewhere else is necessary? I'm
>> pushing this because I hope that this bug will get fixed in the upcoming
>> stable kernel...
>
> Where does your schedtest.c and schedtest_mod.c live?

They were attached to one of my previous mails, i'm inlining them here to
make the discussion easier. Thanks for looking into this.

Regards,
Marton


schedtest_mod.c
-------------------
#include <linux/module.h>
#include <linux/init.h>
#include <linux/workqueue.h>
#include <asm/smp.h>

static int i;
static struct delayed_work d_work;

static void schedtest_work(struct work_struct *work)
{
schedule_delayed_work(&d_work, msecs_to_jiffies(1));
if (i++ % 500 == 0) {
printk(KERN_DEBUG "schedtest: I am on CPU %d.\n", get_cpu());
put_cpu();
}
}

static int __init schedtest_init_module(void)
{
INIT_DELAYED_WORK(&d_work, schedtest_work);
schedule_delayed_work(&d_work, 0);
return 0;
}

static void __exit schedtest_cleanup_module(void)
{
cancel_delayed_work_sync(&d_work);
}

module_init(schedtest_init_module);
module_exit(schedtest_cleanup_module);

MODULE_LICENSE("GPL");



schedtest.c:
--------------------

#define _GNU_SOURCE
#include <utmpx.h>
#include <sys/time.h>
#include <unistd.h>

/* Usage: ./schedtest <cpu core to test> */

int miliseconds() {
struct timeval tv;
gettimeofday(&tv, 0);
return tv.tv_usec/1000;
}

int main(int argc, char *argv[]) {
int lives = 1000, time, lasttime, childs, cores, core_to_test;
cores = sysconf(_SC_NPROCESSORS_ONLN);
childs = cores * 2;
if (argc > 1)
core_to_test = atoi(argv[1]);
else
core_to_test = cores-1;
while (childs-- && !fork());
while (lives) {
time = miliseconds();
if (lasttime != time && sched_getcpu() != core_to_test)
lives--;
lasttime = time;
}
return 0;
}

2009-09-04 08:40:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: CPU scheduler weirdness?

On Thu, 2009-09-03 at 23:57 +0200, Marton Balint wrote:
> > sched: fine-tune SD_MC_INIT:
> > 14800984706bf6936bbec5187f736e928be5c218
> >
> > If I add again the removed SD_BALANCE_NEWIDLE to flags, then everything works
> > as expected. So what would be the correct fix for this bug? Revert the patch?
> > Or just add SD_BALANCE_NEWIDLE to flags?
>
>
> Ingo, Peter, could any of you guys have a look at the commit that caused
> this bug? Is it OK to revert it? Or a fix somewhere else is necessary? I'm
> pushing this because I hope that this bug will get fixed in the upcoming
> stable kernel...

I'm fine with re-adding SD_BALANCE_IDLE and SD_WAKE_IDLE on SMT/MC/CPU
levels.

Ingo?

2009-09-04 12:26:17

by Mike Galbraith

[permalink] [raw]
Subject: Re: CPU scheduler weirdness?

On Fri, 2009-09-04 at 09:53 +0200, Marton Balint wrote:
>
> On Fri, 4 Sep 2009, Mike Galbraith wrote:

> > Where does your schedtest.c and schedtest_mod.c live?
>
> They were attached to one of my previous mails, i'm inlining them here to
> make the discussion easier. Thanks for looking into this.

Thanks. Ingo has already resolved it though.

-Mike

2009-09-04 12:40:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: CPU scheduler weirdness?


* Peter Zijlstra <[email protected]> wrote:

> On Thu, 2009-09-03 at 23:57 +0200, Marton Balint wrote:
> > > sched: fine-tune SD_MC_INIT:
> > > 14800984706bf6936bbec5187f736e928be5c218
> > >
> > > If I add again the removed SD_BALANCE_NEWIDLE to flags, then everything works
> > > as expected. So what would be the correct fix for this bug? Revert the patch?
> > > Or just add SD_BALANCE_NEWIDLE to flags?
> >
> >
> > Ingo, Peter, could any of you guys have a look at the commit
> > that caused this bug? Is it OK to revert it? Or a fix somewhere
> > else is necessary? I'm pushing this because I hope that this bug
> > will get fixed in the upcoming stable kernel...
>
> I'm fine with re-adding SD_BALANCE_IDLE and SD_WAKE_IDLE on
> SMT/MC/CPU levels.
>
> Ingo?

Ok, agreed. I have re-benchmarked a couple of key workload and it
seems like a good change, on top of your load-balancer fixes.

Marton, could you please double check the latest -tip tree:

http://people.redhat.com/mingo/tip.git/README

Does it resolve the problem?

Thanks,

Ingo

2009-09-04 15:31:31

by Marton Balint

[permalink] [raw]
Subject: Re: CPU scheduler weirdness?


On Fri, 4 Sep 2009, Ingo Molnar wrote:

>
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Thu, 2009-09-03 at 23:57 +0200, Marton Balint wrote:
>>>> sched: fine-tune SD_MC_INIT:
>>>> 14800984706bf6936bbec5187f736e928be5c218
>>>>
>>>> If I add again the removed SD_BALANCE_NEWIDLE to flags, then everything works
>>>> as expected. So what would be the correct fix for this bug? Revert the patch?
>>>> Or just add SD_BALANCE_NEWIDLE to flags?
>>>
>>>
>>> Ingo, Peter, could any of you guys have a look at the commit
>>> that caused this bug? Is it OK to revert it? Or a fix somewhere
>>> else is necessary? I'm pushing this because I hope that this bug
>>> will get fixed in the upcoming stable kernel...
>>
>> I'm fine with re-adding SD_BALANCE_IDLE and SD_WAKE_IDLE on
>> SMT/MC/CPU levels.
>>
>> Ingo?
>
> Ok, agreed. I have re-benchmarked a couple of key workload and it
> seems like a good change, on top of your load-balancer fixes.
>
> Marton, could you please double check the latest -tip tree:
>
> http://people.redhat.com/mingo/tip.git/README
>
> Does it resolve the problem?

Yes it does, thanks.

Regards,
Marton