Subject: Re: Question: livepatch failed for new fork() task stack unreliable
From: "Wangshaobo (bobo)"
To: Josh Poimboeuf
Date: Tue, 2 Jun 2020 09:22:30 +0800
In-Reply-To: <20200601180538.o5agg5trbdssqken@treble>
X-Mailing-List: linux-kernel@vger.kernel.org
On 2020/6/2 2:05, Josh Poimboeuf wrote:
> On Sat, May 30, 2020 at 10:21:19AM +0800, Wangshaobo (bobo) wrote:
>> 1) When a user-mode task has just forked and starts executing
>> ret_from_fork() up to schedule_tail(), unwind_next_frame() finds that
>> orc->sp_reg is ORC_REG_UNDEFINED but orc->end is not zero. At this
>> point arch_stack_walk_reliable() terminates its backtracing loop
>> because unwind_done() returns true; then 'if (!(task->flags &
>> (PF_KTHREAD | PF_IDLE)))' in arch_stack_walk_reliable() is true and it
>> returns -EINVAL.
>>
>> * The stack trace looks like this:
>>
>> ret_from_fork
>>       -=> UNWIND_HINT_EMPTY
>>       -=> schedule_tail             /* schedule out */
>>       ...
>>       -=> UNWIND_HINT_REGS          /* UNDO */
>
> Yes, makes sense.
>
>> 2) When call_usermodehelper_exec_async() is used to create a user-mode
>> task, ret_from_fork() has still not executed even though the task has
>> already been scheduled in __schedule(). At this point orc->sp_reg is
>> ORC_REG_UNDEFINED but orc->end equals zero, so unwind_error() returns
>> true, which also terminates arch_stack_walk_reliable()'s backtracing
>> loop, and it ends up returning from the 'if (unwind_error())' branch.
>>
>> * The stack trace looks like this:
>>
>> -=> call_usermodehelper_exec
>>       -=> do_exec
>>             -=> search_binary_handler
>>                   -=> load_elf_binary
>>                         -=> elf_map
>>                               -=> vm_mmap_pgoff
>>                                     -=> down_write_killable
>>                                           -=> _cond_resched
>>                                                 -=> __schedule   /* scheduled to work */
>> -=> ret_from_fork       /* UNDO */
>
> I don't quite follow the stacktrace, but it sounds like the issue is the
> same as the first one you originally reported:

Yes, true, it is the same as the first one. The only difference I want to
point out is that this task has been scheduled, while the first one has
not.

>> 1) The task was not actually scheduled to execute; at this time
>> UNWIND_HINT_EMPTY in ret_from_fork() has not reset the unwind hint, its
>> sp_reg and end fields remain the default values, and this ends up
>> throwing an error in unwind_next_frame() when called by
>> arch_stack_walk_reliable();
>
> Or am I misunderstanding?
>
> And to reiterate, these are not "livepatch failures", right? Livepatch
> doesn't fail when stack_trace_save_tsk_reliable() returns an error. It
> recovers gracefully and tries again later.

Yes, you are right. By "livepatch failures" I only meant several retry
failures: we found that if fork() happens frequently in the current
system, it is easier to trigger retries, but they still always end up
succeeding.

So I think this question is related to the ORC unwinder. Could I ask
whether you have a strategy or plan to avoid this problem?

Thanks,

Wang ShaoBo
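
P.S. To make the two bail-out paths easier to see, here is a rough sketch
of the control flow of arch_stack_walk_reliable() as I read it -- a
simplified paraphrase for discussion only, not the exact upstream source.
The function name reliable_walk_sketch is just for illustration, and the
consume_entry()/pt_regs handling inside the loop is omitted:

#include <linux/sched.h>
#include <asm/unwind.h>

/* Simplified paraphrase of the reliable stack walk, for discussion only. */
static int reliable_walk_sketch(struct task_struct *task)
{
	struct unwind_state state;

	for (unwind_start(&state, task, NULL, NULL);
	     !unwind_done(&state) && !unwind_error(&state);
	     unwind_next_frame(&state)) {
		/* ... each return address would be handed to consume_entry() ... */
	}

	/*
	 * Case 2) above: orc->end == 0 while sp_reg is still
	 * ORC_REG_UNDEFINED, so the unwinder flags an error and the walk
	 * is rejected here.
	 */
	if (unwind_error(&state))
		return -EINVAL;

	/*
	 * Case 1) above: unwind_done() returned true before the trace
	 * reached user pt_regs, so a user task (neither PF_KTHREAD nor
	 * PF_IDLE) is rejected here as well.
	 */
	if (!(task->flags & (PF_KTHREAD | PF_IDLE)))
		return -EINVAL;

	return 0;
}

Either way the caller, stack_trace_save_tsk_reliable(), sees an error and
livepatch just retries the transition later, which matches what we observe
under frequent fork().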