Content-Type: text/plain; charset=US-ASCII
From: Ed Tomlinson <tomlins@cam.org>
Organization: me
To: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] improving O(1)-J9 in heavily threaded situations
Date: Sun, 3 Feb 2002 01:00:00 -0500
Cc: Ingo Molnar <mingo@elte.hu>
MIME-Version: 1.0
Content-Transfer-Encoding: 7BIT
Message-Id: <20020203060001.10E694D94@oscar.casa.dyndns.org>
Sender: linux-kernel-owner@vger.kernel.org

Found a bug in the patch:

                if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
                        if (!rq->expired_timestamp)
                                rq->expired_timestamp = jiffies;
                        enqueue_task(p, rq->expired);
>>>                     goto out;
                } else
                        enqueue_task(p, rq->active);
        }

        /*
         * Here we stop a single interactive task or a group of tasks sharing
         * a mm_struct (like java threads) from monopolizing the cpu.
         */
        if (p->mm) {
>>>		if (p->mm->mm_hogstamp[smp_processor_id()] != rq->switch_timestamp) {

I added a 'goto out;' after the enqueue_task(p, rq->expired);  Suspect bad
things can happen if I dequeue from the wrong array...  Also changed the test
in 'p->mm->mm_hogstamp[smp_processor_id()] < rq->switch_timestamp)' from <
to != as it will be safer when jiffies rolls over.

Ed Tomlinson

On February 2, 2002 10:50 am, Ed Tomlinson wrote:
> The following patch improves the performance of O(1) when under load
> and threaded programs are running.  For this patch, threaded is taken
> to mean processes sharing their vm (p->mm).  While this type of
> programming is usually not optimal it is very common (ie. java).
>
> Under load, with threaded tasks running niced, and a cpu bound task at
> normal priority we quickly notice the load average shooting up.  What
> is happening is all the threads are trying to run.  Due to the nature
> of O(1) all the process in the current array must run to get on to the
> inactive array.  This quickly leads to the higher priority cpu bound
> task starving.  Running the threaded application at nice +19 can help
> but does not solve the base issue.  On UP, I have observed loads of
> between 20 and 40 when this is happening,
>
> I tried various approaches to correcting this.  Neither reducing the
> timeslices or playing the the effective prio of the threaded tasks
> helped much.  What does seems quite effective is to monitor the total
> time _all_ the tasks sharing a vm use.  Once a threshold is exceeded
> we move any tasks in the same group directly to the inactive array
> temporarily increasing their effective prio.
>
> With this change the cpu bound tasks can get over 80% of the cpu.
>
> A couple of comments.  The patch applies its logic to all processes.
> This limits the number of times interactive tasks can requeue.  With
> this in place we can probably drop the expired_timeslice logic.
>
> I see no reason that some of the tunables should not become part of
> the standard scheduler provided they are not adding significant
> overhead.  The THREAD_PENALTY tunable is only used in the tick
> processing logic (and not in the common path) - keeping it tunable
> should not introduce measureable overhead.
>
> Patch against 2.4.17pre7
>
> Comments and feedback welcome,
> Ed Tomlinson
>
> --- linux/include/linux/sched.h.orig	Fri Feb  1 23:19:44 2002
> +++ linux/include/linux/sched.h	Fri Feb  1 23:16:34 2002
> @@ -211,6 +211,8 @@
>  	pgd_t * pgd;
>  	atomic_t mm_users;			/* How many users with user space? */
>  	atomic_t mm_count;			/* How many references to "struct mm_struct" (users
> count as 1) */ +	unsigned long mm_hogstamp[NR_CPUS];	/* timestamp to reset
> the hogslice */ +	unsigned long mm_hogslice[NR_CPUS];	/* when this reaches
> 0, task using this mm are hogs */ int map_count;				/* number of VMAs */
>  	struct rw_semaphore mmap_sem;
>  	spinlock_t page_table_lock;		/* Protects task page tables and mm->rss */
> @@ -486,6 +488,7 @@
>  #define INTERACTIVE_DELTA       3
>  #define MAX_SLEEP_AVG           (2*HZ)
>  #define STARVATION_LIMIT        (2*HZ)
> +#define THREAD_PENALTY		8
>
>  #define USER_PRIO(p)		((p)-MAX_RT_PRIO)
>  #define MAX_USER_PRIO		(USER_PRIO(MAX_PRIO))
> --- linux/kernel/sched.c.orig	Fri Feb  1 23:23:50 2002
> +++ linux/kernel/sched.c	Fri Feb  1 23:12:49 2002
> @@ -41,7 +41,7 @@
>   */
>  struct runqueue {
>  	spinlock_t lock;
> -	unsigned long nr_running, nr_switches, expired_timestamp;
> +	unsigned long nr_running, nr_switches, expired_timestamp,
> switch_timestamp; task_t *curr, *idle;
>  	prio_array_t *active, *expired, arrays[2];
>  	int prev_nr_running[NR_CPUS];
> @@ -614,6 +614,28 @@
> 		} else
>  			enqueue_task(p, rq->active);
>  	}
> +
> +	/*
> +	 * Here we stop a single interactive task or a group of tasks sharing
> +	 * a mm_struct (like java threads) from monopolizing the cpu.
> +	 */
> +	if (p->mm) {
> +		if (p->mm->mm_hogstamp[smp_processor_id()] < rq->switch_timestamp) {
> +			p->mm->mm_hogstamp[smp_processor_id()] = rq->switch_timestamp;
> +			p->mm->mm_hogslice[smp_processor_id()] =
> +				NICE_TO_TIMESLICE(p->__nice) * THREAD_PENALTY;
> +		}
> +		if (!p->mm->mm_hogslice[smp_processor_id()]) {
> +			dequeue_task(p, rq->active);
> +			p->need_resched = 1;
> +			p->prio--;
> +        		if (p->prio < MAX_RT_PRIO)
> +                		p->prio = MAX_RT_PRIO;
> +			enqueue_task(p, rq->expired);
> +		} else
> +			p->mm->mm_hogslice[smp_processor_id()]--;
> +	}
> +
>  out:
>  #if CONFIG_SMP
>  	if (!(now % BUSY_REBALANCE_TICK))
> @@ -676,6 +698,7 @@
>  		rq->expired = array;
>  		array = rq->active;
>  		rq->expired_timestamp = 0;
> +		rq->switch_timestamp = jiffies;
>  	}
>
>  	idx = sched_find_first_bit(array->bitmap);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/