From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds
Cc: Michal Hocko, Ingo Molnar, linux-mm@kvack.org, LKML, David Miller, Eric Dumazet
Subject: Re: BUG: Bad page map in process udevd (anon_vma: (null)) in 2.6.38-rc4
Date: Fri, 18 Feb 2011 10:08:52 -0800
In-Reply-To: (Linus Torvalds's message of "Fri, 18 Feb 2011 08:39:02 -0800")
References: <20110216185234.GA11636@tiehlicka.suse.cz>
 <20110216193700.GA6377@elte.hu>
 <20110217090910.GA3781@tiehlicka.suse.cz>
 <20110217163531.GF14168@elte.hu>
 <20110218122938.GB26779@tiehlicka.suse.cz>
 <20110218162623.GD4862@tiehlicka.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

Linus Torvalds writes:

> On Fri, Feb 18, 2011 at 8:26 AM, Michal Hocko wrote:
>>> Now, I will try with the 2 patches patches in this thread. I will also
>>> turn on DEBUG_LIST and DEBUG_PAGEALLOC.
>>
>> I am not able to reproduce with those 2 patches applied.
>
> Thanks for verifying. Davem/EricD - you can add Michal's tested-by to
> the patches too.
>
> And I think we can consider this whole thing solved. It hopefully also
> explains all the other random crashes that EricB saw - just random
> memory corruption in other datastructures.
>
> EricB - do all your stress-testers run ok now?

Things are looking better and PAGEALLOC debug isn't firing. So this
looks like one bug down. I have not seen the bad page map symptom.

I am still getting programs segfaulting, but that is also happening on
other machines running older kernels, so I am going to chalk that up to
a buggy test and a false positive.

I am having OOM problems getting my test runs to complete. On a good
day that happens about 1 time in 3 right now. I'm guessing I will have
to turn off DEBUG_PAGEALLOC to get everything to complete.
DEBUG_PAGEALLOC causes us to use more memory, doesn't it?

The most interesting thing I have right now is a networking lockdep
issue. Does anyone know what is going on there?

Eric

=================================
[ INFO: inconsistent lock state ]
2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kworker/u:1/10833 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (tcp_death_row.death_lock){+.?...}, at: [] inet_twsk_deschedule+0x29/0xa0
{IN-SOFTIRQ-W} state was registered at:
  [] __lock_acquire+0x70e/0x1d30
  [] lock_acquire+0x9f/0x120
  [] _raw_spin_lock+0x2c/0x40
  [] inet_twsk_schedule+0x3b/0x1e0
  [] tcp_time_wait+0x20d/0x380
  [] tcp_fin.clone.39+0x10e/0x1c0
  [] tcp_data_queue+0x798/0xd50
  [] tcp_rcv_state_process+0x799/0xbb0
  [] tcp_v4_do_rcv+0x238/0x500
  [] tcp_v4_rcv+0x86a/0xbe0
  [] ip_local_deliver_finish+0x10d/0x380
  [] ip_local_deliver+0x80/0x90
  [] ip_rcv_finish+0x192/0x5a0
  [] ip_rcv+0x234/0x300
  [] __netif_receive_skb+0x443/0x700
  [] netif_receive_skb+0xb8/0xf0
  [] napi_skb_finish+0x48/0x60
  [] napi_gro_receive+0xb5/0xc0
  [] igb_poll+0x89f/0xd20 [igb]
  [] net_rx_action+0x149/0x270
  [] __do_softirq+0xc0/0x1f0
  [] call_softirq+0x1c/0x30
  [] do_softirq+0xa5/0xe0
  [] irq_exit+0x8d/0xa0
  [] do_IRQ+0x61/0xe0
  [] ret_from_intr+0x0/0x1a
  [] ____pagevec_lru_add+0x16d/0x1a0
  [] lru_add_drain+0x73/0xe0
  [] exit_mmap+0x5c/0x180
  [] mmput+0x52/0xe0
  [] exit_mm+0x120/0x150
  [] do_exit+0x132/0x8c0
  [] do_group_exit+0x59/0xd0
  [] sys_exit_group+0x12/0x20
  [] system_call_fastpath+0x16/0x1b
irq event stamp: 187417
hardirqs last enabled at (187417): [] kmem_cache_free+0x125/0x160
hardirqs last disabled at (187416): [] kmem_cache_free+0x72/0x160
softirqs last enabled at (187410): [] sk_common_release+0x62/0xc0
softirqs last disabled at (187408): [] _raw_write_lock_bh+0x11/0x40

other info that might help us debug this:
3 locks held by kworker/u:1/10833:
 #0: (netns){.+.+.+}, at: [] process_one_work+0x121/0x4b0
 #1: (net_cleanup_work){+.+.+.}, at: [] process_one_work+0x121/0x4b0
 #2: (net_mutex){+.+.+.}, at: [] cleanup_net+0x80/0x1b0

stack backtrace:
Pid: 10833, comm: kworker/u:1 Not tainted 2.6.38-rc4-359399.2010AroraKernelBeta.fc14.x86_64 #1
Call Trace:
 [] ? print_usage_bug+0x170/0x180
 [] ? mark_lock+0x37f/0x400
 [] ? __lock_acquire+0x790/0x1d30
 [] ? __lock_acquire+0x3cf/0x1d30
 [] ? check_object+0xaf/0x270
 [] ? inet_twsk_deschedule+0x29/0xa0
 [] ? lock_acquire+0x9f/0x120
 [] ? inet_twsk_deschedule+0x29/0xa0
 [] ? __sk_free+0xd9/0x160
 [] ? _raw_spin_lock+0x2c/0x40
 [] ? inet_twsk_deschedule+0x29/0xa0
 [] ? inet_twsk_deschedule+0x29/0xa0
 [] ? inet_twsk_purge+0xf6/0x180
 [] ? inet_twsk_purge+0x30/0x180
 [] ? tcp_sk_exit_batch+0x1c/0x20
 [] ? ops_exit_list.clone.0+0x53/0x60
 [] ? cleanup_net+0x100/0x1b0
 [] ? process_one_work+0x187/0x4b0
 [] ? process_one_work+0x121/0x4b0
 [] ? cleanup_net+0x0/0x1b0
 [] ? worker_thread+0x15c/0x330
 [] ? worker_thread+0x0/0x330
 [] ? kthread+0xb6/0xc0
 [] ? trace_hardirqs_on_caller+0x13d/0x180
 [] ? kernel_thread_helper+0x4/0x10
 [] ? restore_args+0x0/0x30
 [] ? kthread+0x0/0xc0
 [] ? kernel_thread_helper+0x0/0x10