From: Pavel Tatashin
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, tglx@linutronix.de, mhocko@suse.com, linux-mm@kvack.org, mgorman@techsingularity.net, mingo@kernel.org, peterz@infradead.org, rostedt@goodmis.org, fengguang.wu@intel.com, dennisszhou@gmail.com
Subject: [PATCH v2] mm: access to uninitialized struct page
Date: Thu, 26 Apr 2018 16:26:19 -0400
Message-Id: <20180426202619.2768-1-pasha.tatashin@oracle.com>

The following two bugs were reported by Fengguang Wu:

kernel reboot-without-warning in early-boot stage, last printk: early console in setup code
http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com

And also:

[per_cpu_ptr_to_phys] PANIC: early exception 0x0d IP 10:ffffffffa892f15f error 0 cr2 0xffff88001fbff000
http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com

Both of the problems are due to accessing an uninitialized struct page from
trap_init(). We must first call mm_init() to initialize the allocated struct
pages; only then can we access the fields of a struct page that belongs to
allocated memory.

Below is an explanation of the root cause.

The issue arises in this stack:

start_kernel()
 trap_init()
  setup_cpu_entry_areas()
   setup_cpu_entry_area(cpu)
    get_cpu_gdt_paddr(cpu)
     per_cpu_ptr_to_phys(addr)
      pcpu_addr_to_page(addr)
       virt_to_page(addr)
        pfn_to_page(__pa(addr) >> PAGE_SHIFT)

The returned "struct page" is sometimes uninitialized, which causes a failure
later when it is used. It fails only sometimes because the outcome depends on
KASLR.

When boot fails, we have this when pfn_to_page() is called:

kaslr: 0x000000000d600000
addr: ffffffff83e0d000
pa: 1040d000
pfn: 1040d
page: ffff88001f113340
page->flags ffffffffffffffff <- Uninitialized!

When boot succeeds:

kaslr: 0x000000000a800000
addr: ffffffff83e0d000
pa: d60d000
pfn: d60d
page: ffff88001f05b340
page->flags 280000000000 <- Initialized!

Here are the physical addresses that the BIOS provided to us:

e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x000000001ffdffff] usable
BIOS-e820: [mem 0x000000001ffe0000-0x000000001fffffff] reserved
BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved

In both cases, working and non-working, the physical address with the KASLR
offset removed is the same:

pa - kaslr = 0x2E0D000

The only thing that differs is the PFN.

We initialize struct pages in four places:

1. Early in boot, a small set of struct pages is initialized to fill the
   first section and the lower zones.
2. During mm_init(), we initialize the struct pages for all memory that is
   allocated, i.e., reserved in memblock.
3. On demand, when pages are allocated after the mm_init() call.
4. After smp_init(), when the remaining free deferred pages are initialized.

The path above runs before deferred memory is initialized, so it must be
covered by (1), (2), or (3).

So, let's check which PFNs are initialized after (1).

memmap_init_zone() is called for the PFN ranges 0x1-0x1000 and
0x1000-0x1ffe0, but it stops after reaching PFN 0x10000, as it leaves the
rest to be initialized as deferred pages. In the working scenario the PFN
(0xd60d) is below 0x10000, but in the failing scenario (0x1040d) it is above.
Hence, this page must be initialized in (2). But trap_init() is called before
mm_init().

The bug was introduced by "mm: initialize pages on demand during boot",
because that commit lowered the number of pages initialized in step (1).
However, the problem could occur even without that change, because the
number of pages initialized in step (1) was only ever a guess.

This fix moves trap_init() after mm_init(). As an alternative, we could
increase pgdat->static_init_pgcnt, which free_area_init_node() sets to:

pgdat->static_init_pgcnt = min_t(unsigned long, PAGES_PER_SECTION,
				 pgdat->node_spanned_pages);

Instead of one PAGES_PER_SECTION, we could use several, so that the text is
covered for all KASLR offsets. But this would still be guessing. Therefore,
I prefer the current fix.

Fixes: c9e97a1997fb ("mm: initialize pages on demand during boot")
Signed-off-by: Pavel Tatashin
Reviewed-by: Steven Rostedt (VMware)
---
 init/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index b795aa341a3a..870f75581cea 100644
--- a/init/main.c
+++ b/init/main.c
@@ -585,8 +585,8 @@ asmlinkage __visible void __init start_kernel(void)
 	setup_log_buf(0);
 	vfs_caches_init_early();
 	sort_main_extable();
-	trap_init();
 	mm_init();
+	trap_init();
 
 	ftrace_init();
 
-- 
2.17.0