Subject: Re: [PATCH 1/8] signal: Make SIGKILL during coredumps an explicit special case
To: "Eric W. Biederman"
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, linux-api@vger.kernel.org, Linus Torvalds, Alexey Gladkov, Kyle Huey, Oleg Nesterov, Kees Cook, Al Viro
References: <87a6ha4zsd.fsf@email.froward.int.ebiederm.org> <20211213225350.27481-1-ebiederm@xmission.com> <9363765f-9883-75ee-70f1-a1a8e9841812@gmail.com> <87pmp67y4r.fsf@email.froward.int.ebiederm.org>
From: Dmitry Osipenko
Message-ID: <5bbb54c4-7504-cd28-5dde-4e5965496625@gmail.com>
Date: Thu, 6 Jan 2022 00:39:10 +0300
In-Reply-To: <87pmp67y4r.fsf@email.froward.int.ebiederm.org>

05.01.2022 22:58, Eric W. Biederman wrote:
> Dmitry Osipenko writes:
>
>> 14.12.2021 01:53, Eric W. Biederman wrote:
>>> Simplify the code that allows SIGKILL during coredumps to terminate
>>> the coredump.  As far as I can tell I have avoided breaking it
>>> by dumb luck.
>>>
>>> Historically with all of the other threads stopping in exit_mm the
>>> wants_signal loop in complete_signal would find the dumper task and
>>> then complete_signal would wake the dumper task with signal_wake_up.
>>>
>>> After moving the coredump_task_exit above the setting of PF_EXITING in
>>> commit 92307383082d ("coredump: Don't perform any cleanups before
>>> dumping core") wants_signal will consider all of the threads in a
>>> multi-threaded process for waking up, not just the core dumping task.
>>>
>>> Luckily complete_signal short circuits SIGKILL during a coredump marks
>>> every thread with SIGKILL and signal_wake_up.  This code is arguably
>>> buggy however as it tries to skip creating a group exit when is already
>>> present, and it fails that a coredump is in progress.
>>>
>>> Ever since commit 06af8679449d ("coredump: Limit what can interrupt
>>> coredumps") was added dump_interrupted needs not just TIF_SIGPENDING
>>> set on the dumper task but also SIGKILL set in its pending bitmap.
>>> This means that if the code is ever fixed not to short-circuit and
>>> kill a process after it has already been killed the special case
>>> for SIGKILL during a coredump will be broken.
>>>
>>> Sort all of this out by making the coredump special case more special,
>>> and perform all of the work in prepare_signal and leave the rest of
>>> the signal delivery path out of it.
>>>
>>> In prepare_signal when the process coredumping is sent SIGKILL find
>>> the task performing the coredump and use sigaddset and signal_wake_up
>>> to ensure that task reports fatal_signal_pending.
>>>
>>> Return false from prepare_signal to tell the rest of the signal
>>> delivery path to ignore the signal.
>>>
>>> Update wait_for_dump_helpers to perform a wait_event_killable wait
>>> so that if signal_pending gets set spuriously the wait will not
>>> be interrupted unless fatal_signal_pending is true.
>>>
>>> I have tested this and verified I did not break SIGKILL during
>>> coredumps by accident (before or after this change).  I actually
>>> thought I had and I had to figure out what I had misread that kept
>>> SIGKILL during coredumps working.
>>>
>>> Signed-off-by: "Eric W. Biederman"
>>> ---
>>>  fs/coredump.c   |  4 ++--
>>>  kernel/signal.c | 11 +++++++++--
>>>  2 files changed, 11 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/coredump.c b/fs/coredump.c
>>> index a6b3c196cdef..7b91fb32dbb8 100644
>>> --- a/fs/coredump.c
>>> +++ b/fs/coredump.c
>>> @@ -448,7 +448,7 @@ static void coredump_finish(bool core_dumped)
>>>  static bool dump_interrupted(void)
>>>  {
>>>  	/*
>>> -	 * SIGKILL or freezing() interrupt the coredumping. Perhaps we
>>> +	 * SIGKILL or freezing() interrupted the coredumping. Perhaps we
>>>  	 * can do try_to_freeze() and check __fatal_signal_pending(),
>>>  	 * but then we need to teach dump_write() to restart and clear
>>>  	 * TIF_SIGPENDING.
>>> @@ -471,7 +471,7 @@ static void wait_for_dump_helpers(struct file *file)
>>>  	 * We actually want wait_event_freezable() but then we need
>>>  	 * to clear TIF_SIGPENDING and improve dump_interrupted().
>>>  	 */
>>> -	wait_event_interruptible(pipe->rd_wait, pipe->readers == 1);
>>> +	wait_event_killable(pipe->rd_wait, pipe->readers == 1);
>>>
>>>  	pipe_lock(pipe);
>>>  	pipe->readers--;
>>> diff --git a/kernel/signal.c b/kernel/signal.c
>>> index 8272cac5f429..7e305a8ec7c2 100644
>>> --- a/kernel/signal.c
>>> +++ b/kernel/signal.c
>>> @@ -907,8 +907,15 @@ static bool prepare_signal(int sig, struct task_struct *p, bool force)
>>>  	sigset_t flush;
>>>
>>>  	if (signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_COREDUMP)) {
>>> -		if (!(signal->flags & SIGNAL_GROUP_EXIT))
>>> -			return sig == SIGKILL;
>>> +		struct core_state *core_state = signal->core_state;
>>> +		if (core_state) {
>>> +			if (sig == SIGKILL) {
>>> +				struct task_struct *dumper = core_state->dumper.task;
>>> +				sigaddset(&dumper->pending.signal, SIGKILL);
>>> +				signal_wake_up(dumper, 1);
>>> +			}
>>> +			return false;
>>> +		}
>>>  		/*
>>>  		 * The process is in the middle of dying, nothing to do.
>>>  		 */
>>>
>>
>> Hi,
>>
>> This patch breaks userspace, in particular it breaks gst-plugin-scanner
>> of GStreamer which hangs now on next-20211224. IIUC, this tool builds a
>> registry of good/working GStreamer plugins by loading them and
>> blacklisting those that don't work (crash). Before the hang I see the
>> systemd-coredump process running, taking a snapshot of gst-plugin-scanner,
>> and then gst-plugin-scanner gets stuck.
>>
>> Bisection points at this patch, reverting it restores
>> gst-plugin-scanner. systemd-coredump is still running, but there is no
>> hang anymore and everything works properly as before.
>>
>> I'm seeing this problem on ARM32 and haven't checked other arches.
>> Please fix, thanks in advance.
>
>
> I have not yet been able to figure out how to run gst-plugin-scanner in
> a way that triggers this.  In truth I can't figure out how to
> run gst-plugin-scanner in a useful way.
>
> I am going to set up some unit tests and see if I can reproduce your
> hang another way, but if you could give me some more information on what
> you are doing to trigger this I would appreciate it.

Thanks, Eric. The distro is Arch Linux, but it's a development
environment where I'm running the latest GStreamer from git master. I'll
try to figure out the reproduction steps and get back to you.
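
For reference while debugging, the decision logic of the patched prepare_signal hunk quoted above can be modelled in plain userspace C. This is only a rough sketch and not kernel code: every model_* name and MODEL_* flag is a hypothetical stand-in, the tif_sigpending field merely approximates the effect of signal_wake_up(dumper, 1), and the part of prepare_signal outside the quoted hunk is reduced to a placeholder return value.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for SIGNAL_GROUP_EXIT / SIGNAL_GROUP_COREDUMP. */
#define MODEL_GROUP_EXIT     0x1u
#define MODEL_GROUP_COREDUMP 0x2u

/* Hypothetical stand-in for the dumper task. */
struct model_task {
	unsigned long pending;   /* bit (sig - 1) set means sig is pending */
	bool tif_sigpending;     /* approximates what signal_wake_up() sets */
};

/* Hypothetical stand-in for struct core_state. */
struct model_core_state {
	struct model_task *dumper;
};

/* Hypothetical stand-in for the per-process signal state. */
struct model_signal {
	unsigned int flags;
	struct model_core_state *core_state; /* non-NULL during a coredump */
};

/* Models the check the dumper relies on: SIGKILL pending and flagged. */
static bool model_fatal_signal_pending(const struct model_task *t)
{
	return t->tif_sigpending && (t->pending & (1UL << (SIGKILL - 1)));
}

/*
 * Models only the branch touched by the quoted hunk.  Returns true if
 * the signal would continue down the delivery path.
 */
static bool model_prepare_signal(struct model_signal *signal, int sig)
{
	if (signal->flags & (MODEL_GROUP_EXIT | MODEL_GROUP_COREDUMP)) {
		struct model_core_state *core_state = signal->core_state;

		if (core_state) {
			if (sig == SIGKILL) {
				struct model_task *dumper = core_state->dumper;

				assert(dumper != NULL);
				/* Mirrors sigaddset() + signal_wake_up(). */
				dumper->pending |= 1UL << (SIGKILL - 1);
				dumper->tif_sigpending = true;
			}
			/* Every signal sent during a coredump stops here. */
			return false;
		}
	}
	/*
	 * Assumption for illustration: the remainder of the real function
	 * is not part of the quoted hunk and is not modelled.
	 */
	return true;
}
```

In this model, a SIGKILL sent while core_state is set marks only the dumper and is then dropped from the normal delivery path, and any other signal sent during the coredump is dropped without touching the dumper. That is the behaviour the commit message describes; whether it is related to the gst-plugin-scanner hang is not something this sketch can show.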