From: Thomas Lamprecht
Date: Wed, 26 Jul 2023 08:51:24 +0200
Subject: Re: segfaults of processes while being killed after commit "mm: make the page fault mmap locking killable"
To: Linus Torvalds, Fiona Ebner, "Eric W. Biederman", Oleg Nesterov
Cc: akpm@linux-foundation.org, Wolfgang Bumiller, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-ID: <85876d36-ca1f-4ba4-9065-4e7fc58329c0@proxmox.com>
References: <8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com>

On 25/07/2023 18:38, Linus Torvalds wrote:
> But before we revert it, would you mind trying out the attached
> trivial patch instead?

Not Fiona, but as I was still online yesterday, I already got around to
trying that patch out, after adding the missing `tsk` task_struct
parameter to the fatal_signal_pending() call.

With the patched kernel booted, the original case we found in the wild
went from logging a segfault roughly twice per hour to none at all, over
a bit more than 10h of uptime.

Fiona might be able to give a more definitive confirmation, as IIRC she
has a better (i.e., faster) reproducer, which she used for bisecting.

>
> I'd also still be interested if the symptoms were anything else than
> 'show_unhandled_signals' causing the show_signal_msg() dance, and
> resulting in a message something like
>
>   a.out[1567]: segfault at xyz ip [..] likely on CPU X
>
> in dmesg...

Exactly, it was just like that, with no actual fallout. The messages
looked like:

    pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)

The slightly odd code triggering this was basically a fork, where the
child wrote a message to the parent via a Unix socket pair and then
called exit.
The parent read that message and then sent a SIGKILL to the child
process, i.e., the child exiting and the parent killing it were pretty
closely aligned in time, basically racing with each other.

cheers,
Thomas
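For illustration, here is a minimal C sketch of the fork/socketpair/SIGKILL pattern described above. It is an assumption-laden approximation, not the original pverados code (which ran under Perl, per the log line) and not Fiona's reproducer; the function names race_once() and run_race(), the message contents, and the fflush(NULL) before fork() are hypothetical additions. Whether it hits the race on an affected kernel depends on timing. Since the reported symptom was only the segfault line in dmesg, the returned count merely flags children that ended in some way other than exit(0) or our SIGKILL. It should not be read as a detector for the regression itself.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* One round of the pattern (hypothetical helper). Returns 0 if the child
 * ended via exit(0) or via our SIGKILL, 1 for any other outcome, and -1
 * if socketpair(), fork() or waitpid() failed. */
static int race_once(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    fflush(NULL); /* avoid the child re-flushing inherited stdio buffers */
    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: send a short message to the parent, then exit at once. */
        static const char msg[] = "done";
        close(sv[0]);
        if (write(sv[1], msg, sizeof msg) < 0)
            exit(1); /* reported as an unexpected outcome */
        exit(0);
    }

    /* Parent: wait for the message, then SIGKILL the child, which is most
     * likely already on its way out, so exit and kill race each other. */
    close(sv[1]);
    char buf[16];
    ssize_t got = read(sv[0], buf, sizeof buf);
    (void)got; /* kill regardless of what (or whether) we read */
    kill(pid, SIGKILL);
    close(sv[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
        return 0;
    return 1;
}

/* Runs `iterations` rounds; returns how many had an unexpected outcome,
 * or -1 on a setup error (hypothetical helper). */
int run_race(int iterations)
{
    int unexpected = 0;
    for (int i = 0; i < iterations; i++) {
        int r = race_once();
        if (r < 0)
            return -1;
        unexpected += r;
    }
    return unexpected;
}
```

A caller could run something like run_race(100000) from main() and then check dmesg for "segfault at" lines. With show_unhandled_signals (the debug.exception-trace sysctl) enabled, that log line is where a spurious segfault report would most likely show up.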