Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751987Ab1DQQYz (ORCPT ); Sun, 17 Apr 2011 12:24:55 -0400
Received: from mailservices.uwaterloo.ca ([129.97.128.141]:53362 "EHLO
	mailchk-m02.uwaterloo.ca" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1751017Ab1DQQYu (ORCPT ); Sun, 17 Apr 2011 12:24:50 -0400
Date: Sun, 17 Apr 2011 12:24:27 -0400
From: Kyle Spaans
To: Marcin Slusarz
Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	Dominik Brodowski, Ben Skeggs, airlied@redhat.com,
	dri-devel@lists.freedesktop.org, mjg@redhat.com,
	maciej.rutecki@gmail.com, nouveau@lists.freedesktop.org,
	Nigel Cunningham, Nick Piggin
Subject: Re: 2.6.39-rc1 nouveau regression (bisected)
Message-ID: <20110417162427.GB25242@taurine.csclub.uwaterloo.ca>
References: <20110403181206.GA19291@comet.dominikbrodowski.net>
	<20110407151129.GA24977@comet.dominikbrodowski.net>
	<20110414170559.GA10768@comet.dominikbrodowski.net>
	<20110414190117.GA3493@joi.lan>
	<20110415061136.GA21979@isilmar-3.linta.de>
	<4DAA1453.5000604@nigelcunningham.com.au>
	<20110416235028.GA6096@taurine.csclub.uwaterloo.ca>
	<20110417151204.GA24519@taurine.csclub.uwaterloo.ca>
	<20110417154557.GA2871@joi.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20110417154557.GA2871@joi.lan>
X-PGP-Key: http://csclub.uwaterloo.ca/~kspaans/kspaans-pubkey.asc
User-Agent: Mutt/1.5.20 (2009-06-14)
X-UUID: 9ae0cd04-3567-4b53-8011-f0c9e9ce35f1
X-Miltered: at mailchk-m02 with ID 4DAB143B.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by
	milter-greylist-3.0 (mailchk-m02.uwaterloo.ca [129.97.128.141]);
	Sun, 17 Apr 2011 12:24:32 -0400 (EDT)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 8398
Lines: 148

On Sun, Apr 17, 2011 at 05:45:57PM +0200, Marcin Slusarz wrote:
> On Sun, Apr 17, 2011 at 11:12:04AM -0400, Kyle Spaans wrote:
> > On Sat, Apr 16, 2011 at 07:50:28PM -0400, Kyle Spaans wrote:
> > > On Sun, Apr 17, 2011 at 08:12:35AM +1000, Nigel Cunningham wrote:
> > > > On 15/04/11 16:11, Dominik Brodowski wrote:
> > > > > On Thu, Apr 14, 2011 at 09:02:01PM +0200, Marcin Slusarz wrote:
> > > > >> On Thu, Apr 14, 2011 at 07:05:59PM +0200, Dominik Brodowski wrote:
> > > > >>> Thought about CCing Linus to show him that 2.6.39-rcX isn't as "calm"
> > > > >>> to everyone, but then chose to CC Maciej instead: Would you be so kind and
> > > > >>> add this to your regression list? Thanks!
> > > > >>>
> > > > >>> Since commit 38f1cff
> > > > >>>
> > > > >>> From: Dave Airlie
> > > > >>> Date: Wed, 16 Mar 2011 11:34:41 +1000
> > > > >>> Subject: [PATCH] Merge commit '5359533801e3dd3abca5b7d3d985b0b33fd9fe8b' into dr
> > > > >>>
> > > > >>> This commit changed an internal radeon structure, that meant a new driver
> > > > >>> in -next had to be fixed up, merge in the commit and fix up the driver.
> > > > >>>
> > > > >>> Also fixes a trivial nouveau merge.
> > > > >>>
> > > > >>> Conflicts:
> > > > >>> drivers/gpu/drm/nouveau/nouveau_mem.c
> > > > >>>
> > > > >>> booting my atom/NM10/ION2 system crashes hard during boot, right after
> > > > >>> blanking the screen, and before the initramfs gets loaded. I just
> > > > >>> re-checked: both parent commits ( 5359533 and 4819d2e ) do indeed work
> > > > >>> just fine, but the merge commit ( 38f1cff ) fails, same as tip ( 85f2e68 ).
> > > > >> Can you activate netconsole and check whether kernel spits anything interesting?
> > > > >> You might try to load nouveau module after boot - maybe something will be saved
> > > > >> to /var/log or you could even ssh into the box and check dmesg...
> > > > > Compiling it as a module seems to work fine. When I do so, no regression is
> > > > > obvious from what gets reported in "dmesg". However, somehow I now do get
> > > > > some output: The last message I see is
> > > > >
> > > > > [drm] nouveau 0000:01:00.0: allocated 1680x1050, fb 0x40.... b0
> > > > >
> > > > > Then, nothing more. However, it really is quite strange why this error only
> > > > > appears in the CONFIG_NOUVEAU=y case, not in the =m case...
> > > > Try disabling CONFIG_BOOT_LOGO. I reported on freedesktop.org that it is
> > > > causing me an oops at boot, but my bug has been ignored there so far -
> > > > perhaps I should have posted it here instead.
> > >
> > > I'm getting the exact same symptoms on my Atom + ION hardware. Crashes before it
> > > can write any logs if it's compiled in and the logo is selected, but boots fine
> > > if compiled as a module or the logo is removed.
> > >
> > > In my case I bisected and found 8969960 by Nick Piggin (change to mm/vmalloc.c)
> > > to be the first bad one in 2.6.38+. This makes me think that it's not a bug in
> > > nouveau, but maybe a bug in the order that things are initialized?
> >
> > FWIW, reverting commit 89699605fe7cfd8611900346f61cb6cbf179b10a on 2.6.39-rc3+
> > makes my system boot just fine with the nouveau drivers compiled into the
> > kernel. I've seen some similar looking bugs on LKML that this regression may or
> > may not be related to? It works fine on 2.6.38.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=33272
> > http://lkml.org/lkml/2011/4/15/194
> >
> > I'm still trying to figure out exactly where the kernel is crashing after
> > printing
> > [drm] nouveau 0000:03:00.0: allocated 1280x1024 fb: 0x40000000, b0 f4cf7600
> >
> > Any thoughts on what else I should look for?
>
> I reproduced this bug today, and reverting 89699605fe7cfd8611900346f61cb6cbf179b10a
> does not fix it for me. Here's the backtrace:
>
> Entering kdb (current=0xffff8801becb0000, pid 1) on processor 6 Oops: (null)
> due to oops @ 0xffffffff81255081
> CPU 6 Modules linked in:
>
> Pid: 1, comm: swapper Not tainted 2.6.39-rc2-nv+ #640 System manufacturer System Product Name/P6T SE
> RIP: 0010:[] [] iowrite32+0x12/0x34
> RSP: 0000:ffff8801becab4b0 EFLAGS: 00010296
> RAX: 00000000ffffffff RBX: ffff8801bd334800 RCX: 00000000000016fc
> RDX: 00000000ffffffff RSI: ffffc900100bbf4c RDI: ffffc900100bbf4c
> RBP: ffff8801becab4b0 R08: 0000000000000002 R09: 0000000000000001
> R10: 00000000000000bb R11: ffff8801becab540 R12: ffff8801bd336000
> R13: ffff8801bd334818 R14: ffff8801bd600000 R15: 0000000000000020
> FS: 0000000000000000(0000) GS:ffff8801bfd80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: ffffc900100bbf4c CR3: 0000000001a2b000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 1, threadinfo ffff8801becaa000, task ffff8801becb0000)
> <0>Stack:
> ffff8801becab4c0 ffffffff812f5bd5 ffff8801becab4f0 ffffffff8130f1f8
> ffff8801bd336000 ffffc90012a00000 ffff8801becab620 0000000000000000
> ffff8801becab590 ffffffff8127b4c8 ffff8801becb0000 ffffffff814c8c44
> <0>Call Trace:
> <0> [] nouveau_bo_wr32+0x21/0x27
> <0> [] nouveau_fbcon_sync+0x19b/0x26e
> <0> [] cfb_imageblit+0x80/0x450
> <0> [] ? __mutex_unlock_slowpath+0x100/0x124
> <0> [] ? trace_hardirqs_on_caller+0x118/0x13c
> <0> [] ? nouveau_fbcon_imageblit+0x62/0xd8
> <0> [] nouveau_fbcon_imageblit+0xcd/0xd8
> <0> [] fb_show_logo+0x5ea/0x73a
> <0> [] ? nouveau_fbcon_fillrect+0xae/0xd8
> <0> [] ? bit_clear_margins+0x141/0x14e
> <0> [] fbcon_switch+0x3fd/0x475
> <0> [] redraw_screen+0x125/0x1fd
> <0> [] bind_con_driver+0x5aa/0x637
> <0> [] take_over_console+0x38/0x45
> <0> [] fbcon_takeover+0x57/0x91
> <0> [] fbcon_event_notify+0x32d/0x65a
> <0> [] notifier_call_chain+0x74/0xa1
> <0> [] __blocking_notifier_call_chain+0x71/0x8e
> <0> [] blocking_notifier_call_chain+0xf/0x11
> <0> [] fb_notifier_call_chain+0x16/0x18
> <0> [] register_framebuffer+0x25a/0x271
> <0> [] drm_fb_helper_single_fb_probe+0x1bd/0x26f
> <0> [] drm_fb_helper_initial_config+0x4a8/0x4bf
> <0> [] ? mark_held_locks+0x52/0x70
> <0> [] nouveau_fbcon_init+0xd4/0xe0
> <0> [] nouveau_card_init+0x109e/0x11b9
> <0> [] nouveau_load+0x52d/0x56c
> <0> [] drm_get_pci_dev+0x16a/0x26f
> <0> [] nouveau_pci_probe+0x10/0x12
> <0> [] local_pci_probe+0x12/0x16
> <0> [] pci_device_probe+0x60/0x8f
> <0> [] ? driver_sysfs_add+0x6b/0x90
> <0> [] driver_probe_device+0xa7/0x136
> <0> [] __driver_attach+0x5c/0x80
> <0> [] ? driver_probe_device+0x136/0x136
> <0> [] bus_for_each_dev+0x54/0x89
> <0> [] driver_attach+0x19/0x1b
> <0> [] bus_add_driver+0xcd/0x219
> <0> [] driver_register+0x99/0x10a
> <0> [] __pci_register_driver+0x63/0xd3
> <0> [] drm_pci_init+0x83/0xe8
> <0> [] ? ttm_init+0x62/0x62
>
> It crashes on:
> nouveau_bo_wr32(chan->notifier_bo, chan->m2mf_ntfy + 3, 0xffffffff);
> in nouveau_fbcon_sync.

Looks like your kernel is getting farther along than mine. I get a hard freeze
with no oops though. From what I can tell mine is also following the same path
through the fb_notifier stuff just before crashing. Can you "fix" your kernel
by changing nouveau to be a module or turning off the tux boot logo?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/