Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1758891AbYF3JIR (ORCPT );
	Mon, 30 Jun 2008 05:08:17 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1753502AbYF3JIE (ORCPT );
	Mon, 30 Jun 2008 05:08:04 -0400
Received: from mtagate1.uk.ibm.com ([195.212.29.134]:29150 "EHLO
	mtagate1.uk.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752685AbYF3JIB (ORCPT );
	Mon, 30 Jun 2008 05:08:01 -0400
Date: Mon, 30 Jun 2008 11:07:44 +0200
From: Heiko Carstens
To: Dmitry Adamushko
Cc: Ingo Molnar , Peter Zijlstra , Avi Kivity ,
	linux-kernel@vger.kernel.org
Subject: Re: [BUG] CFS vs cpu hotplug
Message-ID: <20080630090744.GB6598@osiris.boeblingen.de.ibm.com>
References: <20080619161949.GA11062@osiris.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.17 (2007-11-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 1579
Lines: 39

On Sun, Jun 29, 2008 at 12:16:56AM +0200, Dmitry Adamushko wrote:
> Hello,
>
>
> it seems to be related to migrate_dead_tasks().
>
> Firstly I added traces to see all tasks being migrated with
> migrate_live_tasks() and migrate_dead_tasks(). On my setup the problem
> pops up (the one with "se == NULL" in the loop of
> pick_next_task_fair()) shortly after the traces indicate that some has
> been migrated with migrate_dead_tasks()). btw., I can reproduce it
> much faster now with just a plain cpu down/up loop.
>
> [disclaimer] Well, unless I'm really missing something important in
> this late hour [/disclaimer] pick_next_task() is not something
> appropriate for migrate_dead_tasks() :-)
>
> the following change seems to eliminate the problem on my setup
> (although, I kept it running only for a few minutes to get a few
> messages indicating migrate_dead_tasks() does move tasks and the
> system is still ok)
>
> [ quick hack ]
>
> @@ -5887,6 +5907,7 @@ static void migrate_dead_tasks(unsigned int dead_cpu)
>  		next = pick_next_task(rq, rq->curr);
>  		if (!next)
>  			break;
> +		next->sched_class->put_prev_task(rq, next);
>  		migrate_dead(dead_cpu, next);
>
>  	}

Thanks Dmitry! With your patch I cannot reproduce the bug anymore.
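
For readers following the thread, here is a rough userspace sketch of why pairing the pick with a put_prev_task() call might matter. It is a toy model only, not kernel code: every toy_* name is hypothetical, and the assumption that "pick" unlinks the chosen entity from the queue and remembers it as current, while "put" links it back and clears current, is a simplification. It does not reproduce the actual "se == NULL" failure in pick_next_task_fair(); it only shows the bookkeeping being left unbalanced when the put is skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 4

/* Toy stand-ins; none of these are the kernel's real types. */
struct toy_task {
    bool queued;   /* task is runnable on this queue */
    bool in_tree;  /* task is currently linked into the "tree" */
};

struct toy_rq {
    struct toy_task tasks[MAX_TASKS];
    struct toy_task *curr;  /* entity handed out by pick; not in the tree */
    int nr_running;
};

static void toy_rq_init(struct toy_rq *rq, int n)
{
    assert(n >= 0 && n <= MAX_TASKS);
    rq->curr = NULL;
    rq->nr_running = n;
    for (int i = 0; i < MAX_TASKS; i++) {
        rq->tasks[i].queued = i < n;
        rq->tasks[i].in_tree = i < n;
    }
}

/* Pick: unlink the first entity from the tree and mark it current. */
static struct toy_task *toy_pick_next(struct toy_rq *rq)
{
    for (int i = 0; i < MAX_TASKS; i++) {
        struct toy_task *t = &rq->tasks[i];
        if (t->in_tree) {
            t->in_tree = false;
            rq->curr = t;
            return t;
        }
    }
    return NULL;
}

/* Put: link the entity back into the tree and clear current. */
static void toy_put_prev(struct toy_rq *rq, struct toy_task *t)
{
    if (t->queued)
        t->in_tree = true;
    rq->curr = NULL;
}

/* Dequeue for migration: the current entity is assumed to be out of the tree. */
static void toy_dequeue(struct toy_rq *rq, struct toy_task *t)
{
    if (t != rq->curr)
        t->in_tree = false;
    t->queued = false;
    rq->nr_running--;
}

/*
 * Drain the queue the way the loop in the quoted hunk does.
 * Returns true if the queue is empty and no stale current pointer remains.
 */
static bool toy_drain(struct toy_rq *rq, bool with_put_prev)
{
    struct toy_task *next;

    while ((next = toy_pick_next(rq)) != NULL) {
        if (with_put_prev)
            toy_put_prev(rq, next);
        toy_dequeue(rq, next);
    }
    return rq->nr_running == 0 && rq->curr == NULL;
}
```

In this model, draining a freshly initialised queue with with_put_prev set to false leaves rq->curr pointing at a task that has already been moved away, while draining with it set to true leaves the queue empty and curr cleared.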