From: Pavel Tatashin
Date: Fri, 20 Apr 2018 11:42:48 -0400
Subject: Re: c9e97a1997 BUG: kernel reboot-without-warning in early-boot stage, last printk: early console in setup code
To: Fengguang Wu
Cc: Dennis Zhou, Daniel Jordan, Steven Sistare, Andrew Morton, Linux Memory Management List, LKML, LKP, Tejun Heo, Christoph Lameter, Linus Torvalds, Josef Bacik
References: <20180418233825.GA33106@big-sky.local> <20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com>
Sender:
 linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

I have root caused the issue, and will submit a fix shortly. The fix also
fixes the per_cpu_ptr_to_phys bug that was reported in a separate thread.

The issue arises in this stack:

start_kernel()
 trap_init()
  setup_cpu_entry_areas()
   setup_cpu_entry_area(cpu)
    get_cpu_gdt_paddr(cpu)
     per_cpu_ptr_to_phys(addr)
      pcpu_addr_to_page(addr)
       virt_to_page(addr)
        pfn_to_page(__pa(addr) >> PAGE_SHIFT)

The returned "struct page" is sometimes uninitialized, and thus fails later
when it is used. It turns out "sometimes" is because it depends on KASLR.

When boot is failing, we have this when pfn_to_page() is called:

kaslr: 0x000000000d600000
addr:  ffffffff83e0d000
pa:    1040d000
pfn:   1040d
page:  ffff88001f113340
page->flags ffffffffffffffff <- Uninitialized!

When boot is successful:

kaslr: 0x000000000a800000
addr:  ffffffff83e0d000
pa:    d60d000
pfn:   d60d
page:  ffff88001f05b340
page->flags 280000000000 <- Initialized!

Here are the physical addresses that the BIOS provided to us:

[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
[ 0.000000] BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved

In both cases, working and non-working, the physical address without the
KASLR offset is the same:

pa - kaslr = 0x2E0D000

The only thing that is different is the PFN.

We initialize struct pages in four places:

1. Early in boot, a small set of struct pages is initialized to fill the
   first section and the lower zones.
2. During mm_init(), we initialize struct pages for all the memory that is
   allocated, i.e. reserved in memblock.
3. Using on-demand logic, when pages are allocated after the mm_init() call.
4. After smp_init(), when the rest of the free deferred pages are
   initialized.

The above path happens before deferred memory is initialized, and thus it
must be covered by either 1, 2 or 3.

So, let's check which PFNs are initialized after (1).

memmap_init_zone() is called for pfn ranges 0x1 - 0x1000 and
0x1000 - 0x1ffe0, but it quits after reaching pfn 0x10000, as it leaves the
rest to be initialized as deferred pages.

In the working scenario the pfn (0xd60d) ended up being below 0x10000, but
in the failing scenario (0x1040d) it is above. Hence, we must initialize
this page in (2). But trap_init() is called before mm_init().

The bug was introduced by "mm: initialize pages on demand during boot",
because we lowered the number of pages that are initialized in step (1).
However, it could still have happened before that change, because the
number of pages initialized early was only a guess.

The proposed fix is this:

diff --git a/init/main.c b/init/main.c
index b795aa341a3a..870f75581cea 100644
--- a/init/main.c
+++ b/init/main.c
@@ -585,8 +585,8 @@ asmlinkage __visible void __init start_kernel(void)
 	setup_log_buf(0);
 	vfs_caches_init_early();
 	sort_main_extable();
-	trap_init();
 	mm_init();
+	trap_init();
 	ftrace_init();
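
For anyone who wants to double-check the arithmetic above, here is a minimal userspace sketch (not kernel code). It assumes 4 KiB pages (PAGE_SHIFT of 12) and takes the 0x10000 cutoff from the memmap_init_zone() description above. The helper names pa_to_pfn(), pa_without_kaslr() and pfn_early_initialized() are made up for illustration and do not exist in the kernel. It also ignores the lower bound of the early range and any holes in the e820 map.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12               /* assumption: 4 KiB pages (x86_64) */
#define EARLY_INIT_END_PFN 0x10000  /* pfn where memmap_init_zone() stopped, per the text */

/* Hypothetical helper: physical address -> page frame number. */
static inline uint64_t pa_to_pfn(uint64_t pa)
{
	return pa >> PAGE_SHIFT;
}

/* Hypothetical helper: physical address with the KASLR offset subtracted. */
static inline uint64_t pa_without_kaslr(uint64_t pa, uint64_t kaslr)
{
	return pa - kaslr;
}

/*
 * Hypothetical helper: true if the pfn falls below the point where the
 * early (step 1) initialization stopped, i.e. its struct page is already
 * set up when trap_init() runs before mm_init().
 */
static inline bool pfn_early_initialized(uint64_t pfn)
{
	return pfn < EARLY_INIT_END_PFN;
}
```

Feeding in the two dumps: pa 0x1040d000 gives pfn 0x1040d, which is at or above the cutoff, while pa 0xd60d000 gives pfn 0xd60d, which is below it. Subtracting the respective KASLR offsets gives 0x2E0D000 in both cases.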