Date: Thu, 31 May 2012 20:08:34 +0200
From: Andrea Arcangeli
To: Petr Holasek
Cc: "Kirill A. Shutemov", linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Hillf Danton, Dan Smith, Peter Zijlstra, Linus Torvalds, Andrew Morton,
    Thomas Gleixner, Ingo Molnar, Paul Turner, Suresh Siddha, Mike Galbraith,
    "Paul E. McKenney", Lai Jiangshan, Bharata B Rao, Lee Schermerhorn,
    Rik van Riel, Johannes Weiner, Srivatsa Vaddagiri, Christoph Lameter
Subject: AutoNUMA15
Message-ID: <20120531180834.GP21339@redhat.com>
References: <1337965359-29725-1-git-send-email-aarcange@redhat.com>
    <20120529133627.GA7637@shutemov.name>
    <20120529154308.GA10790@dhcp-27-244.brq.redhat.com>
In-Reply-To: <20120529154308.GA10790@dhcp-27-244.brq.redhat.com>

Hi,

On Tue, May 29, 2012 at 05:43:09PM +0200, Petr Holasek wrote:
> Similar problem with __autonuma_migrate_page_remove here.
>
> [ 1945.516632] ------------[ cut here ]------------
> [ 1945.516636] WARNING: at lib/list_debug.c:50 __list_del_entry+0x63/0xd0()
> [ 1945.516642] Hardware name: ProLiant DL585 G5
> [ 1945.516651] list_del corruption, ffff88017d68b068->next is LIST_POISON1 (dead000000100100)
> [ 1945.516682] Modules linked in: ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle lockd ip6t_REJECT sunrpc nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack mperf freq_table kvm_amd kvm pcspkr amd64_edac_mod edac_core serio_raw bnx2 microcode edac_mce_amd shpchp k10temp hpilo ipmi_si ipmi_msghandler hpwdt qla2xxx hpsa ata_generic pata_acpi scsi_transport_fc scsi_tgt cciss pata_amd radeon i2c_algo_bit drm_kms_helper ttm drm i2c_core [last unloaded: scsi_wait_scan]
> [ 1945.516694] Pid: 150, comm: knuma_migrated0 Tainted: G W 3.4.0aa_alpha+ #3
> [ 1945.516701] Call Trace:
> [ 1945.516710] [] warn_slowpath_common+0x7f/0xc0
> [ 1945.516717] [] warn_slowpath_fmt+0x46/0x50
> [ 1945.516726] [] __list_del_entry+0x63/0xd0
> [ 1945.516735] [] list_del+0x11/0x40
> [ 1945.516743] [] __autonuma_migrate_page_remove+0x48/0x80
> [ 1945.516746] [] knuma_migrated+0x296/0x8a0
> [ 1945.516749] [] ? wake_up_bit+0x40/0x40
> [ 1945.516758] [] ? __autonuma_migrate_page_remove+0x80/0x80
> [ 1945.516766] [] kthread+0x93/0xa0
> [ 1945.516780] [] kernel_thread_helper+0x4/0x10
> [ 1945.516791] [] ? flush_kthread_worker+0x80/0x80
> [ 1945.516798] [] ? gs_change+0x13/0x13
> [ 1945.516800] ---[ end trace 7cab294af87bd79f ]---

I didn't manage to reproduce it on my hardware, but it seems this was
caused by autonuma_migrate_split_huge_page: the tail page list linking
wasn't done under the compound lock, so the list insertion and the
migrate_nid setting weren't atomic the way they are everywhere else.
The caller holding the lock on the head page wasn't enough to keep the
tails stable too.
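To make the locking rule above concrete, here is a minimal userspace
sketch, not the actual AutoNUMA patch. Every name in it (model_page,
model_queue, model_remove, model_split_tails) is hypothetical, and an
atomic_flag spinlock stands in for the per-page compound lock. It assumes
only what is described above: list linking and the migrate_nid update must
happen as one step under the lock of the page being linked, tail pages
included.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct list_node {
	struct list_node *prev, *next;
};

struct model_page {
	atomic_flag lock;	/* stands in for the compound lock bit */
	struct list_node link;	/* entry on a per-node migration list */
	int migrate_nid;	/* -1 means "not queued" */
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void model_page_init(struct model_page *page)
{
	atomic_flag_clear(&page->lock);
	page->link.prev = page->link.next = NULL;
	page->migrate_nid = -1;
}

static void model_lock(struct model_page *page)
{
	while (atomic_flag_test_and_set_explicit(&page->lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void model_unlock(struct model_page *page)
{
	atomic_flag_clear_explicit(&page->lock, memory_order_release);
}

/*
 * Link the page and set migrate_nid as one step under the page's own lock.
 * A real shared list would also need its own lock; omitted in this model.
 */
static void model_queue(struct model_page *page, struct list_node *head,
			int nid)
{
	model_lock(page);
	if (page->migrate_nid < 0) {
		page->link.prev = head->prev;
		page->link.next = head;
		head->prev->next = &page->link;
		head->prev = &page->link;
		page->migrate_nid = nid;
	}
	model_unlock(page);
}

/* Unlink only if migrate_nid says the page is queued; returns 1 if removed. */
static int model_remove(struct model_page *page)
{
	int removed = 0;

	model_lock(page);
	if (page->migrate_nid >= 0) {
		assert(page->link.next != NULL); /* never unlink twice */
		page->link.prev->next = page->link.next;
		page->link.next->prev = page->link.prev;
		page->link.prev = page->link.next = NULL;
		page->migrate_nid = -1;
		removed = 1;
	}
	model_unlock(page);
	return removed;
}

/* Split path: each tail takes its own lock; the head's lock is not enough. */
static void model_split_tails(int head_nid, struct model_page *tails,
			      size_t count, struct list_node *head)
{
	for (size_t i = 0; i < count; i++)
		model_queue(&tails[i], head, head_nid);
}
```

In this model a second model_remove() on the same page sees migrate_nid
already reset and returns without touching the list, which is the kind of
double list_del the warning above reports.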
I released an AutoNUMA15 branch that includes all pending fixes:

	git clone --reference linux -b autonuma15 git://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git

Thanks,
Andrea