From: Ed Tomlinson
Organization: me
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar
Subject: Re: [PATCH] improving O(1)-J9 in heavily threaded situations
Date: Sun, 3 Feb 2002 01:00:00 -0500

Found a bug in the patch:

	if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
		if (!rq->expired_timestamp)
			rq->expired_timestamp = jiffies;
		enqueue_task(p, rq->expired);
>>>		goto out;
	} else
		enqueue_task(p, rq->active);
	}

	/*
	 * Here we stop a single interactive task or a group of tasks sharing
	 * a mm_struct (like java threads) from monopolizing the cpu.
	 */
	if (p->mm) {
>>>		if (p->mm->mm_hogstamp[smp_processor_id()] != rq->switch_timestamp) {

I added a 'goto out;' after the enqueue_task(p, rq->expired); I suspect
bad things can happen if we later try to dequeue the task from the wrong
array.  I also changed the test 'p->mm->mm_hogstamp[smp_processor_id()] <
rq->switch_timestamp' from < to !=, as that is safer when jiffies rolls
over.

Ed Tomlinson

On February 2, 2002 10:50 am, Ed Tomlinson wrote:
> The following patch improves the performance of O(1) under load when
> threaded programs are running.  For this patch, "threaded" is taken to
> mean processes sharing their vm (p->mm).  While this style of
> programming is usually not optimal, it is very common (e.g. java).
>
> Under load, with threaded tasks running niced and a cpu bound task at
> normal priority, we quickly notice the load average shooting up.  What
> is happening is that all the threads are trying to run.  Due to the
> nature of O(1), every process in the active array must use up its
> timeslice before the arrays switch, so the higher priority cpu bound
> task quickly starves.  Running the threaded application at nice +19 can
> help but does not solve the underlying issue.  On UP, I have observed
> loads of between 20 and 40 while this is happening.
>
> I tried various approaches to correcting this.  Neither reducing the
> timeslices nor playing with the effective prio of the threaded tasks
> helped much.  What does seem quite effective is to monitor the total
> time _all_ the tasks sharing a vm use.  Once a threshold is exceeded,
> we move any tasks in the same group directly to the expired array,
> temporarily increasing their effective prio.
>
> With this change the cpu bound task can get over 80% of the cpu.
>
> A couple of comments.  The patch applies its logic to all processes,
> which limits the number of times interactive tasks can requeue.  With
> this in place we can probably drop the expired_timestamp logic.
>
> I see no reason why some of the tunables should not become part of the
> standard scheduler, provided they do not add significant overhead.  The
> THREAD_PENALTY tunable is only used in the tick processing logic (not
> in the common path), so keeping it tunable should not introduce
> measurable overhead.
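To make the shared-budget accounting described above concrete, here is a
minimal user-space sketch (one CPU, a plain loop standing in for the tick
handler, and made-up slice values are all simplifying assumptions; only
the bookkeeping mirrors the patch, with the != fix applied):

#include <stdio.h>

#define THREAD_PENALTY 8

struct mm {
	unsigned long hogstamp;   /* epoch of the last budget refill */
	unsigned long hogslice;   /* budget shared by all tasks on this mm */
};

/*
 * Charge one tick against the mm's shared budget.  Returns 1 when the
 * group has exhausted its budget and should be expired early.
 */
static int charge_tick(struct mm *mm, unsigned long switch_timestamp,
		       unsigned long timeslice)
{
	if (mm->hogstamp != switch_timestamp) {   /* new epoch: refill */
		mm->hogstamp = switch_timestamp;
		mm->hogslice = timeslice * THREAD_PENALTY;
	}
	if (!mm->hogslice)
		return 1;                         /* group is a hog */
	mm->hogslice--;
	return 0;
}

int main(void)
{
	struct mm java = { 0, 0 };
	int tick;

	/* 50 ticks against one mm, timeslice 5: budget = 5 * 8 = 40 */
	for (tick = 1; tick <= 50; tick++)
		if (charge_tick(&java, 1, 5))
			printf("tick %2d: group expired early\n", tick);
	return 0;
}

With a budget of 40 ticks, the group runs freely for 40 ticks and is
expired early from tick 41 on, until the next array switch starts a new
epoch and refills the budget.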
>
> Patch against 2.4.17pre7
>
> Comments and feedback welcome,
> Ed Tomlinson
>
> --- linux/include/linux/sched.h.orig	Fri Feb 1 23:19:44 2002
> +++ linux/include/linux/sched.h	Fri Feb 1 23:16:34 2002
> @@ -211,6 +211,8 @@
>  	pgd_t * pgd;
>  	atomic_t mm_users;	/* How many users with user space? */
>  	atomic_t mm_count;	/* How many references to "struct mm_struct" (users count as 1) */
> +	unsigned long mm_hogstamp[NR_CPUS];	/* timestamp to reset the hogslice */
> +	unsigned long mm_hogslice[NR_CPUS];	/* when this reaches 0, tasks using this mm are hogs */
>  	int map_count;	/* number of VMAs */
>  	struct rw_semaphore mmap_sem;
>  	spinlock_t page_table_lock;	/* Protects task page tables and mm->rss */
> @@ -486,6 +488,7 @@
>  #define INTERACTIVE_DELTA	3
>  #define MAX_SLEEP_AVG	(2*HZ)
>  #define STARVATION_LIMIT	(2*HZ)
> +#define THREAD_PENALTY	8
>  
>  #define USER_PRIO(p)	((p)-MAX_RT_PRIO)
>  #define MAX_USER_PRIO	(USER_PRIO(MAX_PRIO))
> --- linux/kernel/sched.c.orig	Fri Feb 1 23:23:50 2002
> +++ linux/kernel/sched.c	Fri Feb 1 23:12:49 2002
> @@ -41,7 +41,7 @@
>   */
>  struct runqueue {
>  	spinlock_t lock;
> -	unsigned long nr_running, nr_switches, expired_timestamp;
> +	unsigned long nr_running, nr_switches, expired_timestamp, switch_timestamp;
>  	task_t *curr, *idle;
>  	prio_array_t *active, *expired, arrays[2];
>  	int prev_nr_running[NR_CPUS];
> @@ -614,6 +614,28 @@
>  	} else
>  		enqueue_task(p, rq->active);
>  	}
> +
> +	/*
> +	 * Here we stop a single interactive task or a group of tasks sharing
> +	 * a mm_struct (like java threads) from monopolizing the cpu.
> +	 */
> +	if (p->mm) {
> +		if (p->mm->mm_hogstamp[smp_processor_id()] < rq->switch_timestamp) {
> +			p->mm->mm_hogstamp[smp_processor_id()] = rq->switch_timestamp;
> +			p->mm->mm_hogslice[smp_processor_id()] =
> +				NICE_TO_TIMESLICE(p->__nice) * THREAD_PENALTY;
> +		}
> +		if (!p->mm->mm_hogslice[smp_processor_id()]) {
> +			dequeue_task(p, rq->active);
> +			p->need_resched = 1;
> +			p->prio--;
> +			if (p->prio < MAX_RT_PRIO)
> +				p->prio = MAX_RT_PRIO;
> +			enqueue_task(p, rq->expired);
> +		} else
> +			p->mm->mm_hogslice[smp_processor_id()]--;
> +	}
> +
>  out:
>  #if CONFIG_SMP
>  	if (!(now % BUSY_REBALANCE_TICK))
> @@ -676,6 +698,7 @@
>  		rq->expired = array;
>  		array = rq->active;
>  		rq->expired_timestamp = 0;
> +		rq->switch_timestamp = jiffies;
>  	}
>  
>  	idx = sched_find_first_bit(array->bitmap);
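On the rollover point: with a plain '<' the comparison inverts its answer
once jiffies wraps past ULONG_MAX, so the per-cpu budget would stop being
refilled for a long stretch after a wrap.  '!=' avoids that, as would a
signed-difference test in the style of the 2.4 time_before() macro.  A
small stand-alone demonstration (wrapsafe_before here is an illustration,
not the kernel macro itself):

#include <limits.h>
#include <stdio.h>

/* Same idea as the kernel's time_before(): compare via signed difference. */
#define wrapsafe_before(a, b)	((long)(a) - (long)(b) < 0)

int main(void)
{
	unsigned long stamp = ULONG_MAX - 15;  /* old hogstamp, just before the wrap */
	unsigned long now   = 16;              /* switch_timestamp, just after it */

	/* naive '<' claims the stamp is current, so the budget never refills */
	printf("stamp <  now      : %d  (wrong after wrap)\n", stamp < now);

	/* '!=' notices the epoch changed, whichever side of the wrap we are on */
	printf("stamp != now      : %d  (safe)\n", stamp != now);

	/* a signed-difference test also stays correct across the wrap */
	printf("wrapsafe_before() : %d  (safe)\n", wrapsafe_before(stamp, now));
	return 0;
}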