Message-ID: <74d2302f-88fc-c75c-6d2d-4aece1a515bb@molgen.mpg.de>
Date: Mon, 14 Feb 2022 14:45:49 +0100
Subject: Re: [PATCH v3 0/9] Parallel CPU bringup for x86_64
To: David Woodhouse
Cc: Thomas
 Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Paolo Bonzini, "Paul E. McKenney",
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org, rcu@vger.kernel.org,
 mimoja@mimoja.de, hewenliang4@huawei.com, hushiyuan@huawei.com,
 luolongjun@huawei.com, hejingxian@huawei.com
References: <20211215145633.5238-1-dwmw2@infradead.org>
 <9a47b5ec-f2d1-94d9-3a48-9b326c88cfcb@molgen.mpg.de>
 <3bfacf45d2d0f3dfa3789ff5a2dcb46744aacff7.camel@infradead.org>
From: Paul Menzel
X-Mailing-List: linux-kernel@vger.kernel.org

Dear David,


On 29.12.21 at 14:54, David Woodhouse wrote:
> On Wed, 2021-12-29 at 14:18 +0100, Paul Menzel wrote:
>>> Or the one in
>>> https://lore.kernel.org/lkml/d4cde50b4aab24612823714dfcbe69bc4bb63b60.camel@infradead.org
>>>
>>> which makes it do nothing except prepare all the CPUs before bringing
>>> them up one at a time?
>>
>> I applied it on top the other one, and it made no difference either.
>
> It's possible I missed something else in the prepare stage that doesn't
> cope with all CPUs being prepared first.
>
> My next attempt might be to change the loop in bringup_nonboot_cpus()
> to bring all the CPUs not to the CPUHP_BP_PARALLEL_DYN state(s) but
> instead just bring them to somewhere like CPUHP_RCUTREE_PREP, which is
> somewhere in the middle between CPUHP_OFFLINE and CPUHP_BRINGUP_CPU.
>
> Then a binary chop search: if that one boots, try maybe
> CPUHP_TOPOLOGY_PREPARE. And if not, try CPUHP_PROFILE_PREPARE. Etc.
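
The binary chop described above could be sketched roughly as follows. This is only an illustration, not code from the series: the state list is an abbreviated subset limited to the states named in this thread (the real enum cpuhp_state has many more entries in between), `chop()` is a hypothetical helper, and `boots()` is a hypothetical stand-in for building a kernel that prepares all CPUs up to the given state and checking by hand whether it boots. It also assumes that the first state boots, the last one does not, and that once a target state fails every later one fails too.

```python
from typing import Callable, List, Sequence, Tuple

# Abbreviated, ordered subset of hotplug states, limited to the ones named in
# this thread. The real enum cpuhp_state has many more entries in between.
STATES: List[str] = [
    "CPUHP_OFFLINE",           # assumed known-good end: nothing prepared early
    "CPUHP_PROFILE_PREPARE",
    "CPUHP_RCUTREE_PREP",
    "CPUHP_TOPOLOGY_PREPARE",
    "CPUHP_BRINGUP_CPU",       # assumed known-bad end of the range
]


def chop(states: Sequence[str],
         boots: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Return (earliest failing state, states tried in order).

    Assumes states[0] boots, states[-1] does not, and that once a target
    state fails every later target state fails as well.
    """
    good, bad = 0, len(states) - 1
    tried: List[str] = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tried.append(states[mid])
        if boots(states[mid]):   # hypothetical: one manual boot test per call
            good = mid
        else:
            bad = mid
    return states[bad], tried


if __name__ == "__main__":
    # Simulated outcome for illustration only, not a measured result:
    # pretend every target from CPUHP_TOPOLOGY_PREPARE onwards hangs.
    def fake_boots(state: str) -> bool:
        return STATES.index(state) < STATES.index("CPUHP_TOPOLOGY_PREPARE")

    print(chop(STATES, fake_boots))
```

With this abbreviated list the first state tried is CPUHP_RCUTREE_PREP, followed by CPUHP_TOPOLOGY_PREPARE if it boots or CPUHP_PROFILE_PREPARE if it does not, which matches the order suggested in the quoted text. Each step costs one kernel build and boot test, so the number of tests grows with the logarithm of the number of states in the list.
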
>
>>> My current theory (not that I've spent that much time thinking about it
>>> in the last week) is that there's something about the existing CPU
>>> bringup, possibly a CPU bug or something special about the AMD CPUs,
>>> which is triggered by just making it a little bit *faster*, which is
>>> why bringing them up from kexec (especially in qemu) can cause it too?
>>
>> Would having the serial console enabled make a difference?
>
> Yes. I couldn't make this fail in my EC2 m6a instance (for clean boots;
> I have never managed to kexec it) until I turned off the serial console
> to make things go faster.
>
>>> Tom seemed to find that it was in load_TR_desc(), so if you could try
>>> this hack on a machine that doesn't magically wink out of existence on
>>> a triplefault before even flushing its serial output, that would be
>>> much appreciated...
>
>> Unfortunately, no more messages were printed on the serial console.
>
> I suppose we need to litter those outputs somewhere earlier in the
> trampoline then, perhaps it *isn't* getting to load_TR_desc() in your
> case?
>
> Will be back online properly next week and can actually provide some of
> the above suggestions in patch form if you're willing to keep testing.

Sorry for replying so late. I saw your v4 patches and tried commit
5e3524d21d2a () from your branch `parallel-5.17-part1`. Unfortunately,
the boot problem still persists on the AMD Ryzen 3 2200G system I
tested with. Please tell me where I should report these results (here
or in reply to the posted v4 patches).

Also, do you have (physical) access to a system with an AMD CPU? If not,
maybe we can get you one, so it's more convenient for you to test.


Kind regards,

Paul