Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161086Ab2KNKJ4 (ORCPT ); Wed, 14 Nov 2012 05:09:56 -0500 Received: from mail-wg0-f44.google.com ([74.125.82.44]:50583 "EHLO mail-wg0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161008Ab2KNKJy (ORCPT ); Wed, 14 Nov 2012 05:09:54 -0500 MIME-Version: 1.0 X-Originating-IP: [178.83.130.250] Date: Wed, 14 Nov 2012 11:09:52 +0100 Message-ID: Subject: Regression due to "mm: fix-up zone present pages" From: Daniel Vetter To: Chris Wilson , "Lu, HuaX" , "Sun, Yi" , "Jin, Gordon" , Jianguo Wu , Jiang Liu , Andrew Morton , Linus Torvalds Cc: intel-gfx , dri-devel , Linux Kernel Mailing List , linux-mm@kvack.org, "Luck, Tony" , Mel Gorman , Yinghai Lu , Minchan Kim , Johannes Weiner , David Rientjes Content-Type: text/plain; charset=ISO-8859-1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2877 Lines: 68 Hi all, Our QA noticed a regression in one of our i915/GEM testcases in 3.7: https://bugs.freedesktop.org/show_bug.cgi?id=56859 Direct link to dmesg of the machine: https://bugs.freedesktop.org/attachment.cgi?id=70052 Note that the machine is 32bit, which seems to be important since Chris Wilson confirmed the bug on his 32bit Sandybridge machine, whereas mine here with a 64bit kernel works flawlessly. The testcase is gem_tiled_swapping: http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/tree/tests/gem_tiled_swapping.c Quick high-level description of the workload: It allocates a working set larger than available memory, then fills it by writing it through the gpu gart (required to get a linear view of tiled buffers) and afterwards reads it to check whether anything got corrupted. Since the working set is too large to fit into ram, this will force all buffers through swap. We've written this testcase to exercise the reswizzle swapin path since some platforms have a tiling layout depending upon physical pfn (awesome feature btw), but not snb. So within the kernel this workload simply grabs the backing storage from shmemfs with shmem_read_mapping_page_gfp and then binds them into the gpu pagetables (the GTT). This happens in the i915_gem_fault fucntion. Unbinding in this workload happens either directly (if the gem code can't get enough memory) or through our shrinker (i915_gem_inactive_shrink). Swapout is then left to shmemfs to handle. All the above stuff is in drivers/gpu/drm/i915_gem.c Testcase fails because it detects a mismatch between what has been written and what has been read back. Our qa people bisected the regression to commit 7f1290f2f2a4d2c3f1b7ce8e87256e052ca23125 Author: Jianguo Wu Date: Mon Oct 8 16:33:06 2012 -0700 mm: fix-up zone present pages and confirmed the revert on top of the latest drm-intel-nightly branch (which is based on top of 3.7-rc2 and contains the -next stuff for 3.8). They've also tested the for-QA branch which had latest Linus upstream merged in, which did not fix the problem. For reference the intel trees are at (but I don't think it matters really that it's not plain upstream, nothing really changed in the relevant i915/gem paths compared to upstream): http://cgit.freedesktop.org/~danvet/drm-intel I have no idea how that early boot zone init fix could even corrupt swapping in such a fashion, so ideas highly welcome. QA people are cc'ed, and hopefully I haven't missed anyone else on the cc list. Yours, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/