Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756204AbYHNIG1 (ORCPT ); Thu, 14 Aug 2008 04:06:27 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1752464AbYHNIGM (ORCPT ); Thu, 14 Aug 2008 04:06:12 -0400
Received: from gv-out-0910.google.com ([216.239.58.190]:14704
        "EHLO gv-out-0910.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1752357AbYHNIGJ (ORCPT );
        Thu, 14 Aug 2008 04:06:09 -0400
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
        h=subject:from:to:cc:in-reply-to:references:content-type:date
         :message-id:mime-version:x-mailer:content-transfer-encoding;
        b=j+yK55VCDG13r3vv0L8XNqtIPjkPMHEYx8NURZnSdZP+7PCrfmouOqeLw+eOpJKoVs
         yxEWnXnEBpk5bESzJcgIQHjvNbxfuki+Qrsva1+Uh/2Lsfmi5qdCcSikHQs0YSMZoVtg
         3uvdZMmta4sQ57oSomcZvR2wfSyQ98FkTa2h8=
Subject: RE: Possible CPU_IDLE bug [WAS: Re: Timer unstability on when
        using C2 and deeper sleep states (Dell Latitude XT)]
From: Milan Plzik
To: "Pallipadi, Venkatesh"
Cc: "linux-kernel@vger.kernel.org", "tglx@linutronix.de"
In-Reply-To: <7E82351C108FA840AB1866AC776AEC460945BFF2@orsmsx505.amr.corp.intel.com>
References: <1218566008.13866.41.camel@localhost>
        <1218625125.4304.7.camel@localhost>
        <1218636980.5250.12.camel@localhost>
        <7E82351C108FA840AB1866AC776AEC460945BB44@orsmsx505.amr.corp.intel.com>
        <1218658873.4336.15.camel@localhost>
        <7E82351C108FA840AB1866AC776AEC460945BFF2@orsmsx505.amr.corp.intel.com>
Content-Type: text/plain
Date: Thu, 14 Aug 2008 10:05:51 +0200
Message-Id: <1218701151.4285.9.camel@localhost>
Mime-Version: 1.0
X-Mailer: Evolution 2.22.3.1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6073
Lines: 150

On St, 2008-08-13 at 14:22 -0700, Pallipadi, Venkatesh wrote:
>
> >-----Original Message-----
> >From: Milan Plzik [mailto:milan.plzik@gmail.com]
> >Sent: Wednesday, August 13, 2008 1:21 PM
> >To: Pallipadi, Venkatesh
> >Cc:
> > linux-kernel@vger.kernel.org
> >Subject: RE: Possible CPU_IDLE bug [WAS: Re: Timer unstability
> >on when using C2 and deeper sleep states (Dell Latitude XT)]
> >
> >On St, 2008-08-13 at 11:14 -0700, Pallipadi, Venkatesh wrote:
> >>
> >> >-----Original Message-----
> >> >From: linux-kernel-owner@vger.kernel.org
> >> >[mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Milan Plzik
> >> >Sent: Wednesday, August 13, 2008 7:16 AM
> >> >To: linux-kernel@vger.kernel.org
> >> >Subject: Possible CPU_IDLE bug [WAS: Re: Timer unstability on
> >> >when using C2 and deeper sleep states (Dell Latitude XT)]
> >> >
> >> > Hello again,
> >> >
> >> >On St, 2008-08-13 at 12:58 +0200, Milan Plzik wrote:
> >> >> I apologize for replying on my own mail (and also for top-posting,
> >> >> but this information is global update, not exactly fitting any of
> >> >> topics mentioned below).
> >> >>
> >> >> After playing for a longer while I found out that the system ends
> >> >> sometimes in state where, in order to do anything useful, I need to
> >> >> press keys on keyboard. Otherwise, the system just stalls and does
> >> >> nothing. I have no idea why does this happen (especially when I know
> >> >> that OHCI or wireless network adapter produce fair amount of
> >> >> interrupts). My /proc/interrupts is below. Just for the record,
> >> >> chipset on board is ATI RS600 with (apparently from lspci) ATI SB600
> >> >> southbridge.
> >> >
> >> > it looks like this problem disappears if CONFIG_CPU_IDLE option is
> >> >disabled, system seems to be stable for more than one hour. This
> >> >suggests that something may be wrong with the CPU_IDLE code. I can not
> >> >spend much more time by debugging the kernel, but if anyone has an
> >> >suggestion about what to fix, I will gladly test it.
> >> >
> >> > Best regards,
> >> > Milan Plzik
> >>
> >> It may not be a problem with cpuidle code per se.
> >> We have had issues earlier like this one
> >>
> >> http://bugzilla.kernel.org/show_bug.cgi?id=10011
> >>
> >> Cpuidle tries to go to C3 state aggressively and thus may be indirectly
> >> causing the problem with graphics hardware or something like that.
> >>
> >> From Dave's comment in the above bugzilla:
> >> can you try with Option "DRI" "Off" in your xorg.conf
> >>
> >> Does that change anything?
> >
> > The DRI flag itself seems to have little to no effect on what actually
> >happens. I noticed that the problems are really visible with
> >CONFIG_HZ_1000 and no preemption, other settings seem to blur the
> >problem a little (but it seems to be still there). I did some additional
> >testing, below are the results. Testing programs were: powertop (ran
> >immediately after booting), X server startup and starting mplayer with
> >some videos.
> >
> >1) plain boot with processor.max_cstate not set, DRI off:
> > (boot process seemed to stall here and there)
> > a) powertop on console (before running X server) returns bogus
> >numbers, like 20000 wakeups/sec.
> > b) starting X server -- succeeds, but only after tapping keys on
> >keyboard, otherwise seems to stall.
> > c) mplayer seems to get stuck here and there, keypresses help and it
> >is able to play a little more of the video for a while.
> > d) additional observation: keyboard autorepeat stopped (mostly)
> >working, though it was enabled in both X server and console
> >
> >2) processor.max_cstate=2, DRI off
> > a) powertop on console starts giving rational numbers, such as 300
> >wakeups/sec
> > b) X server seems to start correctly
> > c) mplayer seems to play files for a while, then it starts flickering
> >as if it wasn't able to keep up with speed; at the same time powertop
> >reports 90% of time spent in C2
> >
> >2a) processor.max_cstate=2, DRI on (just changed X server configuration
> >without reboot)
> > video playback seems to be more stable, but that might be just GPU
> >acceleration
> >
> >3) processor.max_cstate=2, DRI on after cold reboot
> > symptoms like with attempt 1), but powertop returns correct numbers
> >
> >4) processor.max_cstate=1, DRI on
> > in this state I'm writing this e-mail and so far seems to be
> >stable :)
> >
> >
> > I can just guess what causes these problems... 1) might seem like
> >improper timer setup after resuming from C3 (at least that would explain
> >those weird powertop numbers).
> >
> > The issue with keyboard needing to be pressed seems more like some
> >race condition, when sometimes the interrupts are not properly enabled
> >-- sometimes it works, sometimes not.
> >
> >
> > I hope these results will help at least a little. If something other
> >is necessary, I'll try to do it ASAP.
> >
>
> Were all these tests with 2.6.26? Can you try with 2.6.27-rc3?
>
> There is one bugfix patch that, IIRC, went in after 2.6.26.
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b8f8c3cf0a4ac0632ec3f0e15e9dc0c29de917af

 I tried it just now. It performs a bit better than 2.6.26 (e.g. I no
longer get those "press any key, otherwise nothing happens" states) and
even reports somewhat more reasonable wakeup counts, but the system
sometimes becomes rather slow (e.g. when playing video).
I was not able to compile the fglrx driver, so I had to switch to the
radeon one. The reported number of wakeups is still not very convincing,
though: it changes from 300 to 600 (which is about twice the sum of
wakeups) without any apparent reason, and sometimes goes even higher. I
also tried the nolapic_timer option, but it didn't help.

>
> Thanks,
> Venki

 Thank you,
 Milan