Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp1979907ybl; Sat, 14 Dec 2019 03:52:49 -0800 (PST) X-Google-Smtp-Source: APXvYqxl6709zk/mSd6398quVpkE4aiSZzw51dTU/htzpFdBqtRV0YjZqSZYcSEoWmdcewu+S5nZ X-Received: by 2002:a9d:32e:: with SMTP id 43mr20134821otv.301.1576324369048; Sat, 14 Dec 2019 03:52:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1576324369; cv=none; d=google.com; s=arc-20160816; b=HwukWhNuQRWtsCPbdSB7Pt8n4/srmXG5P7ybR/gaiGUssDUCkQmFXelCWsQsBfhMai uFKXRupq7Az6juh5Jl4i55xqmtd/K2cjeMxoD25jj8iHDUXKp3uZyVXwhKoZBFjoyNDn ehY4eiAnDnlavGKd9nBaqQ12ICo7C6ztJpHvEZq9qxsFUuNgBc94Dwsg7i6tIp1VypFr gnN2eGXsH4n1VhdIX/oykG3o9B4vn6CULRj7gGRUKXUUnwMoQ8cXePR+bYgCI3T7c6R7 C+3B1QN5UOh692th2O3LlQKUm8+icBgqsov/48rRzwGIq2nOfFS7IXY0ZOxqLPrD9YYM pbPg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=Rro3XseHRgKMoM1kcdo0yzNDPdgoAB4oWA1nlFC+IIQ=; b=TJXaxEVxzW6wNKr5/qhQQIxvrZjn98rBVWID3R1BJDNObraMYg4FlUr2QJ5aq43lXs SfzAwWgcqSs62GbAoqdg6qoRtqdT8gZjOSoTjqa/KJPBtSwSj0foCU1JmatzEMOsOEp9 8DGrqJJTZ8JrrSdqcxAsrHZpXSGfZrrd7OI7sdO8HkXpiPyecIQJg0VHGz16QV7P1Zul OLz8CZF24mSbVvMUPz7FVwoCorGXSfG21kXcj3uXv/9qkSuSGWb8szjhWYKe3bBsJ6am 9j0L3jXXlAX/6nj9CTsDxbn/c4u4h8d9/bq0tHGWJwJ7m5k6Ze+EgNG9yVwrttZ803Dh czLg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x7si6860548oia.165.2019.12.14.03.52.24; Sat, 14 Dec 2019 03:52:49 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725975AbfLNLsW (ORCPT + 99 others); Sat, 14 Dec 2019 06:48:22 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:55886 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725883AbfLNLsW (ORCPT ); Sat, 14 Dec 2019 06:48:22 -0500 Received: from [213.220.153.21] (helo=wittgenstein) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1ig5u8-0005YC-B3; Sat, 14 Dec 2019 11:48:12 +0000 Date: Sat, 14 Dec 2019 12:48:11 +0100 From: Christian Brauner To: chenqiwu Cc: peterz@infradead.org, mingo@kernel.org, kernel-team@android.com, linux-kernel@vger.kernel.org, chenqiwu@xiaomi.com, oleg@redhat.com Subject: Re: [PATCH] kernel/exit: do panic earlier to get coredump if global init task exit Message-ID: <20191214114810.ftfxtx4qaa5nzji4@wittgenstein> References: <1576131255-3433-1-git-send-email-qiwuchen55@gmail.com> <20191212095127.GA5460@redhat.com> <20191212100838.GB5460@redhat.com> <20191212110513.qf2sapgggnp46voc@wittgenstein> <20191214062704.GA5580@cqw-OptiPlex-7050> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20191214062704.GA5580@cqw-OptiPlex-7050> User-Agent: NeoMutt/20180716 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sat, Dec 14, 2019 at 02:27:04PM +0800, chenqiwu wrote: > On Thu, Dec 12, 2019 at 12:05:14PM +0100, Christian Brauner wrote: > > On Thu, Dec 12, 2019 at 11:08:38AM +0100, Oleg Nesterov wrote: > > > can't you use is_global_init() && group_dead ? > > > > Seems reasonable. > > Looks like we can move > > group_dead = atomic_dec_and_test(&tsk->signal->live); > > further up... > > > > (Ideally I'd like to have a test for this to ensure that this lets you > > capture a global init coredump but that might be tricky. But since you've > > seem to have run into this case maybe you even have something that could > > be turned into a test? (Similar to how we already have a purely opt-in > > test for pstore.)) > > > > Christian > > Hi all, > I agree that using is_global_init() && group_dead is more reasonable. > > The crash isuee happened on a Android phone by reboot stress test. > panic log: > [ 84.048521] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b > [ 84.048521] > [ 84.048540] CPU: 2 PID: 1 Comm: init Tainted: G S O 4.14.117-perf-g8035d1a #1 > [ 84.048544] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 RAPHAEL (DT) > [ 84.048550] Call trace: > [ 84.048564] dump_backtrace+0x0/0x268 > [ 84.048569] show_stack+0x14/0x20 > [ 84.048577] dump_stack+0xc4/0x100 > [ 84.048584] panic+0x1f0/0x410 > [ 84.048591] complete_and_exit+0x0/0x20 > [ 84.048596] do_group_exit+0x8c/0xa0 > [ 84.048602] get_signal+0x1c0/0x790 > [ 84.048608] do_notify_resume+0x184/0xc30 > [ 84.048613] work_pending+0x8/0x10 > > From the kdump loaded by crash utility, all threads of global init have exited, > the group_dead value of global init has truned to 0 by atomic_dec_and_test(). > crash> ps init > PID PPID CPU TASK ST %MEM VSZ RSS COMM > > 1 0 2 ffffffcd77526000 ?? 0.0 0 0 init > 534 1 4 ffffffcd6b9a9000 ZO 0.0 0 0 init > 535 1 4 ffffffcd6b9aa000 ZO 0.0 0 0 init > crash> ps -g 1 > PID: 1 TASK: ffffffcd77526000 CPU: 2 COMMAND: "init" > (no threads) > crash> struct task_struct.signal ffffffcd77526000 > signal = 0xffffffcd77530000 > crash> struct signal_struct 0xffffffcd77530000 > struct signal_struct { > sigcnt = { > counter = 1 > }, > live = { > counter = 0 > }, > nr_threads = 1, > thread_head = { > next = 0xffffffcd77526730, > prev = 0xffffffcd77526730 > }, > group_exit_code = 11, > notify_count = 0, > group_exit_task = 0x0, > group_stop_count = 0, > flags = 4, > ... > } > > However, as Christian said, the test for this is tricky since we must > make sure all of init threads exited. I make a test for is_global_init() > and send a SIGSEGV signal to global init task in userspace. The phone > crash imeddiately and reboot to collect kdump. Then I extract the coredump > of global init task from kdump successfully. > (gdb) core-file core.1.init > Core was generated by `/system/bin/init second_stage'. > #0 _exit () at bionic/libc/arch-arm64/syscalls/_exit.S:9 > 9 cmn x0, #(MAX_ERRNO + 1) > (gdb) bt > #0 _exit () at bionic/libc/arch-arm64/syscalls/_exit.S:9 > #1 0x00000055606db11c in android::init::InstallRebootSignalHandlers()::$_14::operator()(int) const (this=, signal=11) > at system/core/init/reboot_utils.cpp:141 > #2 0x00000055606db100 in android::init::InstallRebootSignalHandlers()::$_14::__invoke(int) (signal=11) at system/core/init/reboot_utils.cpp:138 > #3 0x0000007f8de236a0 in ?? () > Backtrace stopped: previous frame identical to this frame (corrupt stack?) > > So from the following test, we have confidence that the following patch can help us Thanks. Can you please resend this as a proper patch for v2? Christian