From: Joshua Kinard
Date: Sat, 25 Apr 2015 11:56:12 -0400
To: LKML, Linux MIPS List
Subject: MIPS: BUG() in isolate_lru_pages in mm/vmscan.c?
Message-ID: <553BB91C.3010308@gentoo.org>

I keep tripping a BUG() in isolate_lru_pages in mm/vmscan.c:1345:

		switch (__isolate_lru_page(page, mode)) {
		case 0:
			nr_pages = hpage_nr_pages(page);
			mem_cgroup_update_lru_size(lruvec, lru, -nr_pages);
			list_move(&page->lru, dst);
			nr_taken += nr_pages;
			break;

		case -EBUSY:
			/* else it is being freed elsewhere */
			list_move(&page->lru, src);
			continue;

		default:
			BUG();
		}

This is on an SGI Onyx2 platform (MIPS, IP27), two node boards (4x
R14000 CPUs), and 8GB of RAM. The problem appears tied to heavy disk
I/O, typically writes. I can reproduce it sometimes with a long bonnie++
run, but I haven't gotten a recent panic() message under 4.0 yet. Most
of the time, it silently hardlocks. I only have serial console access at
9600bps, so it may lock up before the serial driver can dump the panic.

Is there any information on the purpose or triggers of this BUG()? I
went back in git all the way to the initial 2006 commit that added this
function, but could not find any comments or explanation of just what
it's protecting against. That makes it hard to know where to start
debugging.
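
A minimal sketch of one way to narrow this down, written as a user-space
mock-up rather than a kernel patch: record the value that fell through to
the default arm before crashing, so that even a single line on the serial
console says which return code it was. The enum and the helper
classify_isolate_result() are hypothetical names invented for this
illustration; only the three switch arms come from the snippet above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Outcome of one pass through the switch, mirroring its three arms. */
enum isolate_outcome {
	ISOLATE_TAKEN,      /* case 0: page moved to dst */
	ISOLATE_REQUEUED,   /* case -EBUSY: page moved back to src */
	ISOLATE_UNEXPECTED, /* default: the arm that calls BUG() */
};

/*
 * Hypothetical helper (not kernel code): classify a return value the
 * way the switch does and, for the default arm only, format a one-line
 * report into buf so the value is on record before the crash.
 */
static enum isolate_outcome classify_isolate_result(int ret, char *buf,
						    size_t len)
{
	assert(buf != NULL && len > 0);
	buf[0] = '\0';	/* empty report unless the default arm is hit */

	switch (ret) {
	case 0:
		return ISOLATE_TAKEN;
	case -EBUSY:
		return ISOLATE_REQUEUED;
	default:
		snprintf(buf, len,
			 "isolate_lru_pages: unexpected return %d", ret);
		return ISOLATE_UNEXPECTED;
	}
}
```

In the kernel itself the equivalent would presumably be to store the
return value of __isolate_lru_page() in a local and print it (for
example with pr_err()) immediately before BUG(). Whether a short line
like that makes it out at 9600bps before the lockup is an assumption
that would need testing.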

I've already tried switching filesystems, first ext4, now XFS. Enabling
CONFIG_NUMA seems to make it harder to trigger, but that's not an
objective observation. An md RAID resync doesn't appear to trigger it
either.

Help?