Subject: Re: [BUG RT] dump-capture kernel not executed for panic in interrupt context
To: "Eric W. Biederman"
Cc: peterz@infradead.org, Steven Rostedt, Andrew Morton, Thomas Gleixner, Sebastian Andrzej Siewior, Huang Ying, linux-kernel@vger.kernel.org, Joerg Vehlow
From: Joerg Vehlow
Date: Mon, 14 Sep 2020 08:03:27 +0200

Hi Eric,

> What is this patch supposed to be doing?
>
> What bug is it fixing?

This information is in the first message of this mail thread. The patch was intended for the active discussion in this thread, not for a broad review.

A short summary: In the rt kernel, a panic in interrupt context does not start the dump-capture kernel, because there is a mutex_trylock in __crash_kexec.
If this is called in interrupt context, it always fails. In the non-rt kernel, calling mutex_trylock in interrupt context is not allowed according to the comment on the function, but it still works.

> A BUG_ON that triggers inside of BUG_ONs seems not just suspect but
> outright impossible to make use of.

I am not entirely sure what would happen here. But even if it gets into some kind of endless loop, I guess this is ok, because it allows finding the problem. A piece of code in the function that enforces the precondition is a lot better than relying only on a comment. If this check had been in mutex_trylock, the bug described above wouldn't have sneaked in 12 years ago...

> I get the feeling skimming this that it is time to sort out and simplify
> the locking here, rather than make it more complex, and more likely to
> fail.

I would very much like that, but sadly it looks like it is not possible. Either it would require blocking locks, which may fail, or not locking at all, which may also fail. Using a different kind of lock (like a spinlock) is also not possible, because spinlock_trylock again uses mutex_trylock in the rt kernel.

>
> I get the feeling that over the years somehow the assumption that the
> rest of the kernel is broken and that we need to get out of the broken
> kernel as fast and as simply as possible has been lost.

Yes, I also have the feeling that the mutexes need fixing, but I wouldn't want to post any patch for that. At the moment, given the interface of the mutex, this is clearly a bug in kexec, even if it works in the non-rt kernel.

Jörg
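
For illustration only: the argument that a check inside the function beats a comment can be modeled in plain userspace C. This is a sketch under assumptions, not kernel code. The names toy_mutex, toy_in_interrupt and toy_warn_count are hypothetical stand-ins, and the real kernel would use its own context checks and warning machinery rather than a flag and fprintf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: every name here is hypothetical, not a kernel API. */
struct toy_mutex {
    bool locked;
};

static bool toy_in_interrupt;  /* stands in for "we are in interrupt context" */
static int toy_warn_count;     /* how many precondition violations were reported */

/* Try to take the lock without blocking; returns true on success. */
static bool toy_mutex_trylock(struct toy_mutex *m)
{
    /* Enforce the documented precondition in code, not only in a comment. */
    if (toy_in_interrupt) {
        toy_warn_count++;
        fprintf(stderr, "toy_mutex_trylock: called in interrupt context\n");
        return false;
    }
    if (m->locked)
        return false;  /* ordinary contention: somebody else holds the lock */
    m->locked = true;
    return true;
}

/* Release the lock; releasing a lock that is not held is a caller bug. */
static void toy_mutex_unlock(struct toy_mutex *m)
{
    assert(m->locked);
    m->locked = false;
}
```

With such a check, a caller in the wrong context gets a visible report the first time the path is exercised, instead of a failure that looks the same as ordinary contention.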