
Date: Wed, 29 Jul 2015 20:29:04 -0400
Subject: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848
From: Ming Lei
To: Josh Boyer
Cc: Johannes Weiner, Tejun Heo, Jens Axboe, Linux-Kernel@Vger.Kernel.Org

On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer wrote:
> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei wrote:
>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner wrote:
>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>>>> Hi All,
>>>>
>>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>>>> images are all failing on 32-bit VMs/machines. Looking at the first
>>>> instance of the oops, it seems to be a bad page state where a page is
>>>> still charged to a group and it is trying to be freed. The oops
>>>> output is below.
>>>>
>>>> Has anyone seen this in their 32-bit testing at all? Thus far nobody
>>>> can recreate this on a 64-bit machine/VM.
>>>>
>>>> josh
>>>>
>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>>>>
>>>> [ 9.026738] systemd[1]: Switching root.
>>>> [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>>>> [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
>>>> [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>>>> [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>>>> [ 9.087284] page dumped because: page still charged to cgroup
>>>> [ 9.088772] bad because of flags:
>>>> [ 9.089731] flags: 0x21(locked|lru)
>>>> [ 9.090818] page->mem_cgroup:f2c3e400
>>>
>>> It's also still locked and on the LRU. This page shouldn't have been
>>> freed.
>>>
>>>> [ 9.117848] Call Trace:
>>>> [ 9.118738] [] dump_stack+0x41/0x52
>>>> [ 9.120034] [] bad_page.part.80+0xaa/0x100
>>>> [ 9.121461] [] free_pages_prepare+0x3b9/0x3f0
>>>> [ 9.122934] [] free_hot_cold_page+0x22/0x160
>>>> [ 9.124400] [] ? copy_to_iter+0x1af/0x2a0
>>>> [ 9.125750] [] ? mempool_free_slab+0x13/0x20
>>>> [ 9.126840] [] __free_pages+0x37/0x50
>>>> [ 9.127849] [] mempool_free_pages+0xd/0x10
>>>> [ 9.128908] [] mempool_free+0x26/0x80
>>>> [ 9.129895] [] bounce_end_io+0x56/0x80
>>>
>>> The page state looks completely off for a bounce buffer page. Did
>>> somebody mess with a bounce bio's bv_page?
>>
>> It looks like the page isn't touched by either lo_read_transfer() or
>> lo_read_simple().
>>
>> Maybe it is related to aa4d86163e4e (block: loop: switch to VFS ITER_BVEC),
>> or it might be helpful to run 'git bisect' if reverting aa4d86163e4e doesn't
>> fix the issue, assuming the issue can be reproduced easily.
>
> I can try reverting that and getting someone to test it. It is
> somewhat complicated by having to spin a new install ISO, so a report
> back will be somewhat delayed. In the meantime, I'm also asking
> people to track down the first kernel build that hits this, so
> hopefully that gives us more of a clue as well.
>
> It is odd that only 32-bit hits this issue though. At least from what
> we've seen thus far.

Page bouncing may only happen on 32-bit, and I will try to find an ARM
box to see if the issue can be reproduced easily.

BTW, are there any extra steps needed to reproduce the issue, such as
cgroup operations?

Thanks,