2002-01-11 00:43:28

by Ed Tomlinson

Subject: [patch] O(1) scheduler, -H4 - 2.4.17 problems

Hi,

The H4 scheduler does not boot here. The G1 version worked. The H4 version gets
as far as:

PCI: Using IRQ router VIA [1106/0586] at 00:07.0
Activating ISA DMA hang workarounds.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket

and stalls. The next messages are normally:

Starting kswapd
matroxfb: Matrox Millennium G400 MAX (AGP) detected
matroxfb: MTRR's turned on
matroxfb: 640x480x8bpp (virtual: 640x26208)
matroxfb: framebuffer at 0xE8000000, mapped to 0xe0805000, size 33554432
Console: switching to colour frame buffer device 80x30

The rmap11 patch is also installed, with the following patch used to resolve the conflict.
I have been using this variant since E1.

In case I messed up removing and repatching, I tried from a clean kernel, with the same results.
Anyone else seeing this?

Box is K6-III 400.

Ed Tomlinson

--- linux/mm/page_alloc.c.rmap Tue Jan 8 18:24:10 2002
+++ linux/mm/page_alloc.c Tue Jan 8 18:25:00 2002
@@ -471,9 +471,7 @@
* NFS: we must yield the CPU (to rpciod) to avoid deadlock.
*/
if (gfp_mask & __GFP_WAIT) {
- __set_current_state(TASK_RUNNING);
- current->policy |= SCHED_YIELD;
- schedule();
+ yield();
if (!order || free_high(ALL_ZONES) >= 0) {
int progress = try_to_free_pages(gfp_mask);
if (progress || (gfp_mask & __GFP_FS))
--- linux/include/linux/sched.h.rmap Tue Jan 8 18:28:15 2002
+++ linux/include/linux/sched.h Tue Jan 8 18:33:36 2002
@@ -298,34 +298,50 @@

int lock_depth; /* Lock depth */

-/*
- * offset 32 begins here on 32-bit platforms. We keep
- * all fields in a single cacheline that are needed for
- * the goodness() loop in schedule().
- */
- long counter;
- long nice;
- unsigned long policy;
- struct mm_struct *mm;
- int processor;
/*
- * cpus_runnable is ~0 if the process is not running on any
- * CPU. It's (1 << cpu) if it's running on a CPU. This mask
- * is updated under the runqueue lock.
- *
- * To determine whether a process might run on a CPU, this
- * mask is AND-ed with cpus_allowed.
+ * offset 32 begins here on 32-bit platforms.
*/
- unsigned long cpus_runnable, cpus_allowed;
+ unsigned int cpu;
+ int prio;
+ long __nice;
+ list_t run_list;
+ prio_array_t *array;
+
+ unsigned int time_slice;
+ unsigned long sleep_timestamp, run_timestamp;
+
/*
- * (only the 'next' pointer fits into the cacheline, but
- * that's just fine.)
+ * A task's four 'sleep history' entries.
+ *
+ * We track the last 4 seconds of time. (including the current second).
+ *
+ * A value of '0' means it has spent no time sleeping in that
+ * particular past second. The maximum value of 'HZ' means that
+ * the task spent all its time running in that particular second.
+ *
+ * 'hist_idx' points to the current second, which, unlike the other
+ * 3 entries, is only partially complete. This means that a value of
+ * '25' does not mean the task slept 25% of the time in the current
+ * second, it means that it spent 25 timer ticks sleeping in the
+ * current second.
+ *
+ * All this might look a bit complex, but it can be maintained very
+ * small overhead and it gives very good statistics, based on which
+ * the scheduler can decide whether a task is 'interactive' or a
+ * 'CPU hog'. See sched.c for more details.
*/
- struct list_head run_list;
- unsigned long sleep_time;
+ #define SLEEP_HIST_SIZE 4
+
+ int hist_idx;
+ int hist[SLEEP_HIST_SIZE];
+
+ unsigned long policy;
+ unsigned long cpus_allowed;

struct task_struct *next_task, *prev_task;
- struct mm_struct *active_mm;
+
+ struct mm_struct *mm, *active_mm;
+ struct list_head local_pages;

/* task state */
struct linux_binfmt *binfmt;
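
[Editorial sketch, not part of the patch above.] The comment in the sched.h hunk describes a four-entry 'sleep history' indexed by hist_idx. A minimal userspace model of that bookkeeping might look like the following. Only SLEEP_HIST_SIZE, hist_idx and hist come from the patch; struct sleep_hist, the helper names, the HZ value of 100 and the clamping behaviour are assumptions made for illustration. The real logic lives in the patched sched.c, which is not shown in this thread.

```c
#include <assert.h>

#define SLEEP_HIST_SIZE 4   /* from the patch */
#define HZ 100              /* assumed tick rate, for illustration only */

/* Hypothetical container for the two fields the patch adds to task_struct. */
struct sleep_hist {
    int hist_idx;               /* slot for the current, partial second */
    int hist[SLEEP_HIST_SIZE];  /* ticks slept in each tracked second */
};

/* Start a new second: advance the index and clear the slot being reused. */
static void hist_next_second(struct sleep_hist *h)
{
    h->hist_idx = (h->hist_idx + 1) % SLEEP_HIST_SIZE;
    h->hist[h->hist_idx] = 0;
}

/* Credit slept ticks to the current second, never exceeding HZ (assumed). */
static void hist_add_sleep(struct sleep_hist *h, int ticks)
{
    int *cur = &h->hist[h->hist_idx];

    assert(ticks >= 0);
    *cur += ticks;
    if (*cur > HZ)
        *cur = HZ;
}

/* Total ticks slept over the whole tracked window. */
static int hist_total_sleep(const struct sleep_hist *h)
{
    int sum = 0;

    for (int i = 0; i < SLEEP_HIST_SIZE; i++)
        sum += h->hist[i];
    return sum;
}
```

In this model a scheduler could compare hist_total_sleep() against SLEEP_HIST_SIZE * HZ to estimate how interactive a task is; the threshold and its use are likewise assumptions.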




2002-01-11 00:51:29

by khromy

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

On Thu, Jan 10, 2002 at 07:43:04PM -0500, Ed Tomlinson wrote:
> In case I messed up removing and repatching, I tried from a clean kernel, with the same results.
> Anyone else seeing this?

Yes.. This is a PII350 with 128MiB... If anybody needs any more info let
me know.

2002-01-11 02:12:39

by Dieter Nützel

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

On Fri, Jan 11, 2002 at 00:52:16AM, khromy wrote:
> On Thu, Jan 10, 2002 at 07:43:04PM -0500, Ed Tomlinson wrote:
> > In case I messed up removing and repatching, I tried from a clean kernel,
> > with the same results.
> > Anyone else seeing this?
>
> Yes.. This is a PII350 with 128MiB... If anybody needs any more info let
> me know.

-H5 (-G1, latest I've tried worked)

1 GHz Athlon II, 640 MB
hang hard right after
Initializing RT netlink socket

--
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
@home: [email protected]

2002-01-11 02:26:32

by Robert Love

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

On Thu, 2002-01-10 at 19:43, Ed Tomlinson wrote:

> The H4 scheduler does not boot here. The G1 version worked. The H4 version
> gets as far as:

It seems most people are having problems with scheduler versions released
some time after G1, at least on UP and 2.4.

Same problem here: it stops just prior to or after 'Starting kswapd' ...

I'll see what changed ...

Robert Love

2002-01-11 02:37:53

by Davide Libenzi

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

On Fri, 11 Jan 2002, Dieter Nützel wrote:

> On Fri, Jan 11, 2002 at 00:52:16AM, khromy wrote:
> > On Thu, Jan 10, 2002 at 07:43:04PM -0500, Ed Tomlinson wrote:
> > > In case I messed up removing and repatching, I tried from a clean kernel,
> > > with the same results.
> > > Anyone else seeing this?
> >
> > Yes.. This is a PII350 with 128MiB... If anybody needs any more info let
> > me know.
>
> -H5 (-G1, latest I've tried worked)
>
> 1 GHz Athlon II, 640 MB
> hang hard right after
> Initializing RT netlink socket

Look in init/main.c, if kernel_thread() is called before init_idle().




- Davide


2002-01-11 02:58:38

by Robert Love

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

On Thu, 2002-01-10 at 21:42, Davide Libenzi wrote:

> Look in init/main.c, if kernel_thread() is called before init_idle().

init is started via kernel_thread prior to init_idle ...

Robert Love

2002-01-11 03:06:09

by Davide Libenzi

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

On 10 Jan 2002, Robert Love wrote:

> On Thu, 2002-01-10 at 21:42, Davide Libenzi wrote:
>
> > Look in init/main.c, if kernel_thread() is called before init_idle().
>
> init is started via kernel_thread prior to init_idle ...

Move init_idle() before the first call to kernel_thread().
This should fix it.
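
[Editorial sketch.] The ordering concern can be illustrated with a toy userspace model in which thread creation is refused until the idle setup has run. Every name here (toy_runqueue, toy_init_idle, toy_kernel_thread, TOY_MAX_TASKS) is hypothetical. The real kernel_thread() and init_idle() do not behave like this: a real kernel would typically hang rather than return an error. The sketch only shows the "initialize before first use" dependency, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_TASKS 8  /* arbitrary limit for the toy model */

/* Hypothetical stand-in for per-CPU scheduler state. */
struct toy_runqueue {
    bool idle_ready;  /* set once the idle task has been registered */
    int nr_tasks;     /* number of threads created so far */
};

/* Toy analogue of init_idle(): marks the runqueue as usable. */
static void toy_init_idle(struct toy_runqueue *rq)
{
    assert(rq != 0);
    rq->idle_ready = true;
}

/*
 * Toy analogue of kernel_thread(): returns a task id, or -1 when the
 * runqueue has not been set up yet or is full.
 */
static int toy_kernel_thread(struct toy_runqueue *rq)
{
    if (!rq->idle_ready)
        return -1;
    if (rq->nr_tasks >= TOY_MAX_TASKS)
        return -1;
    return rq->nr_tasks++;
}
```

In this model, calling toy_kernel_thread() before toy_init_idle() fails, and the same call succeeds once the order is swapped.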



- Davide


2002-01-11 03:39:29

by Robert Love

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

On Thu, 2002-01-10 at 22:11, Davide Libenzi wrote:

> Move init_idle() before the first call to kernel_thread().
> This should fix it.

On 2.4.17-pre3, UP, with -H5, the following patch failed to fix the
problem: it still hard-locks at "Starting kswapd". I still suspect the
problem is with the init_idle changes, though ...

--- linux-2.4.18-pre3-ingo/init/main.c Thu Jan 10 21:13:12 2002
+++ linux/init/main.c Thu Jan 10 22:33:46 2002
@@ -590,14 +590,14 @@
check_bugs();
printk("POSIX conformance testing by UNIFIX\n");

+ smp_init();
+ init_idle();
kernel_thread(init, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
/*
* We count on the initial thread going ok
* Like idlers init is an unlocked kernel thread, which will
* make syscalls (and thus be locked).
*/
- smp_init();
- init_idle();
unlock_kernel();
printk("All processors have done init_idle\n");

Robert Love

2002-01-11 08:46:22

by Alex Davis

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

I also tried 2.4.18pre3 with H5. The boot process hung in
spawn_ksoftirqd at the kernel_thread call.


2002-01-11 14:30:58

by Ingo Molnar

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems


On Thu, 10 Jan 2002, Davide Libenzi wrote:

> > -H5 (-G1, latest I've tried worked)
> >
> > 1 GHz Athlon II, 640 MB
> > hang hard right after
> > Initializing RT netlink socket
>
> Look in init/main.c, if kernel_thread() is called before init_idle().

Look at my patches; they have included this fix for quite some time.
(Linus found it, or I did; I don't remember.) So whatever problem Dieter has,
it's not this particular bug.

Ingo

2002-01-11 14:48:21

by Taco IJsselmuiden

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

> > > -H5 (-G1, latest I've tried worked)
> > >
> > > 1 GHz Athlon II, 640 MB
> > > hang hard right after
> > > Initializing RT netlink socket
> >
> > Look in init/main.c, if kernel_thread() is called before init_idle().
>
> Look at my patches; they have included this fix for quite some time.
> (Linus found it, or I did; I don't remember.) So whatever problem Dieter has,
> it's not this particular bug.

I've got the same problem (UP, Athlon 500 MHz).
Anything I can check or test?

Cheers,
Taco.

2002-01-11 17:34:13

by Taco IJsselmuiden

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

> quick question: your box is not SMP, correct? if it's a UP box then the
> softirq.c change probably makes no difference to the lockup.

OK, I just booted (remotely) with 2.5.2-pre11-H6 and it works!

I haven't looked at the patch yet, but it sure fixed it for me ;)
Let's hope the same goes for 2.4.x people...


Cheers,
Taco.

2002-01-12 04:47:37

by Robert Love

Subject: Re: [patch] O(1) scheduler, -H4 - 2.4.17 problems

On Fri, 2002-01-11 at 12:33, Taco IJsselmuiden wrote:

> OK, I just booted (remotely) with 2.5.2-pre11-H6 and it works!
>
> I haven't looked at the patch yet, but it sure fixed it for me ;)
> Let's hope the same goes for 2.4.x people...

The problem is 2.4-only afaik. That said, I think H6 fixed it, so good
luck.

Robert Love