Date: Thu, 20 Feb 2003 10:01:15 -0800 (PST)
From: Linus Torvalds
To: Ingo Molnar
cc: Zwane Mwaikambo, Chris Wedgwood, Kernel Mailing List, "Martin J. Bligh", William Lee Irwin III
Subject: Re: doublefault debugging (was Re: Linux v2.5.62 --- spontaneous reboots)

On Thu, 20 Feb 2003, Ingo Molnar wrote:
>
> a true heisenbug. I cannot reproduce it anymore. Anyway, from the serial
> console i collected 3 instances of crashes - whatever it's worth.

Pretty much every single time, release_task() has been there on the
backtrace. In fact, I bet you this code in do_exit() is the cause:

	preempt_disable();
	if (tsk->exit_signal == -1)
 ***		release_task(tsk);   ***
	schedule();

Note how "release_task()" will be releasing the stack that the process is
running on right now. And the reason it doesn't crash _every_ time is
simply that you need to have:

 - another memory allocation that picks up that page and fills it with
   something else in order to get a corrupted stack
 - and something delays schedule() so that you have time to race _and_ you
   need the stack. Which is why most of the oopses have an interrupt
   come in inside schedule (see the "common_interrupt()" thing).

In other words, I think we need to have schedule_tail() do the
release_task(), otherwise we'd release it too early while the task
structure (and the stack) are both still in use.

You owe me a patch.

		Linus

---
> [] release_task+0x17d/0x200
> [] mmput+0x1f/0xc0
> [] do_exit+0x31d/0x3b0
> [] common_interrupt+0x18/0x20
> [] error_code+0x2d/0x38
> [] schedule+0x3a1/0x3d0
> [] release_task+0x17d/0x200
> [] mmput+0x1f/0xc0
> [] do_exit+0x31d/0x3b0
> [] handle_IRQ_event+0x38/0x60
> [] do_IRQ+0x14b/0x1e0
> [] common_interrupt+0x18/0x20
> [] error_code+0x2d/0x38
> [] schedule+0x3a1/0x3d0
> [] release_task+0x17d/0x200
> [] mmput+0x1f/0xc0
> [] do_exit+0x31d/0x3b0
> [] do_IRQ+0x14b/0x1e0
> [] common_interrupt+0x18/0x20
> [] error_code+0x2d/0x38
> [] schedule+0x3a1/0x3d0
> [] release_task+0x17d/0x200
> [] mmput+0x1f/0xc0
> [] do_exit+0x31d/0x3b0
> [] do_IRQ+0x14b/0x1e0
> [] common_interrupt+0x18/0x20
> [] error_code+0x2d/0x38
> [] schedule+0x3a1/0x3d0
> [] release_task+0x17d/0x200
> [] mmput+0x1f/0xc0
> [] do_exit+0x31d/0x3b0
> [] __put_task_struct+0x7c/0x90
> [] do_exit+0x31d/0x3b0
> [] handle_IRQ_event+0x38/0x60
> [] do_IRQ+0x14b/0x1e0
> [] common_interrupt+0x18/0x20
> [] error_code+0x2d/0x38
> [] schedule+0x3a1/0x3d0
> [] release_task+0x17d/0x200
> [] mmput+0x1f/0xc0
> [] do_exit+0x31d/0x3b0
> [] handle_IRQ_event+0x38/0x60
> [] do_IRQ+0x14b/0x1e0
> [] common_interrupt+0x18/0x20
> [] error_code+0x2d/0x38
> [] schedule+0x3a1/0x3d0
> [] release_task+0x17d/0x200
> [] mmput+0x1f/0xc0
> [] do_exit+0x31d/0x3b0
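
For illustration only, the ordering problem described in the message (a
task freeing the stack it is still running on, versus deferring that free
until after the switch) can be modelled in user space. This is a rough
sketch, not kernel code and not the eventual patch: every toy_* name below
is hypothetical, "stack" is just a malloc'ed buffer, and the "context
switch" is a pointer assignment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_STACK_SIZE 4096

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct toy_task {
	unsigned char *stack;	/* stands in for the task's kernel stack */
	int dead;		/* set on exit: the task must never run again */
};

static struct toy_task *toy_current;	/* task whose "stack" is in use */

static struct toy_task *toy_task_new(void)
{
	struct toy_task *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	t->stack = malloc(TOY_STACK_SIZE);
	if (!t->stack) {
		free(t);
		return NULL;
	}
	memset(t->stack, 0xAA, TOY_STACK_SIZE);
	return t;
}

/* Model of the release step: give the stack back to the allocator. */
static void toy_release(struct toy_task *t)
{
	free(t->stack);
	t->stack = NULL;
}

static void toy_task_free(struct toy_task *t)
{
	if (!t)
		return;
	free(t->stack);		/* free(NULL) is a no-op */
	free(t);
}

/* Exit path: only mark the task dead; it is still using its stack. */
static void toy_exit(void)
{
	toy_current->dead = 1;
}

/* Runs after the switch, in the next task's context: reaping is safe. */
static void toy_schedule_tail(struct toy_task *prev)
{
	if (prev && prev->dead)
		toy_release(prev);
}

static void toy_schedule(struct toy_task *next)
{
	struct toy_task *prev = toy_current;

	toy_current = next;	/* the "context switch" */
	toy_schedule_tail(prev);
}

/* Returns 0 if the exiting task kept its stack until after the switch. */
static int toy_demo(void)
{
	struct toy_task *a = toy_task_new();
	struct toy_task *b = toy_task_new();
	int ok = 0;

	if (a && b) {
		toy_current = a;
		toy_exit();
		ok = (a->stack != NULL);	/* still valid before the switch */
		toy_schedule(b);
		ok = ok && a->stack == NULL && toy_current == b;
	}
	toy_task_free(a);
	toy_task_free(b);
	return ok ? 0 : 1;
}
```

Calling toy_demo() from a main() should return 0. The point of the model is
only the ordering: the window between toy_exit() and toy_schedule() is where
an interrupt could still need the stack, so nothing is freed until the next
task is the one running.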