From: Kairui Song
To: linux-kernel@vger.kernel.org
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin",
    x86@kernel.org, Alexey Dobriyan, Andrew Morton, Omar Sandoval,
    Jiri Bohac, Baoquan He, Dave Young, Kairui Song
Subject: [PATCH v4] x86/gart/kcore: Exclude GART aperture from kcore
Date: Wed, 6 Mar 2019 19:38:59 +0800
Message-Id: <20190306113859.19263-1-kasong@redhat.com>

On machines where the GART aperture is mapped over physical RAM,
/proc/kcore contains the GART aperture range, and reading it may lead to
a kernel panic.

Vmcore used to have the same issue until we fixed it in
commit 2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore"),
which leverages the existing hook infrastructure in vmcore to make
/proc/vmcore return zeroes when the aperture region is read, so the
actual memory is never accessed.

Apply the same workaround to kcore. First implement the same hook
infrastructure for kcore, then reuse the hook function introduced in the
previous vmcore fix, with some minor adjustments: rename some functions
for more general usage, and simplify the hook infrastructure a bit, as
there is no module usage yet.

Suggested-by: Baoquan He
Signed-off-by: Kairui Song
---
Update from V3:
- Reuse the approach in V2, as Jiri noticed that the V3 approach may fail
  some use cases. It introduces an overlapping region in kcore and can't
  guarantee that the read request will fall into the region we want.
- Improve some function naming as suggested by Baoquan in V2.
- Simplify the hook registering and checking; we are not exporting the
  hook register function for now, so there is no need to make it that
  complex.
- Simplify the commit message

Update from V2:
  Instead of repeating the same hook infrastructure for kcore, introduce
  a new kcore area type to avoid reading from, and let kcore always
  bypass this kind of area.

Update from V1:
  Fix a compile error when CONFIG_PROC_KCORE is not set

 arch/x86/kernel/aperture_64.c | 20 +++++++++++++-------
 fs/proc/kcore.c               | 32 ++++++++++++++++++++++++++++++++
 include/linux/kcore.h         |  3 +++
 3 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/aperture_64.c b/arch/x86/kernel/aperture_64.c
index 58176b56354e..c1319567b441 100644
--- a/arch/x86/kernel/aperture_64.c
+++ b/arch/x86/kernel/aperture_64.c
@@ -14,6 +14,7 @@
 #define pr_fmt(fmt) "AGP: " fmt
 
 #include
+#include
 #include
 #include
 #include
@@ -57,7 +58,7 @@ int fallback_aper_force __initdata;
 
 int fix_aperture __initdata = 1;
 
-#ifdef CONFIG_PROC_VMCORE
+#if defined(CONFIG_PROC_VMCORE) || defined(CONFIG_PROC_KCORE)
 /*
  * If the first kernel maps the aperture over e820 RAM, the kdump kernel will
  * use the same range because it will remain configured in the northbridge.
@@ -66,20 +67,25 @@ int fix_aperture __initdata = 1;
  */
 static unsigned long aperture_pfn_start, aperture_page_count;
 
-static int gart_oldmem_pfn_is_ram(unsigned long pfn)
+static int gart_mem_pfn_is_ram(unsigned long pfn)
 {
 	return likely((pfn < aperture_pfn_start) ||
 		      (pfn >= aperture_pfn_start + aperture_page_count));
 }
 
-static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+static void exclude_from_core(u64 aper_base, u32 aper_order)
 {
 	aperture_pfn_start = aper_base >> PAGE_SHIFT;
 	aperture_page_count = (32 * 1024 * 1024) << aper_order >> PAGE_SHIFT;
-	WARN_ON(register_oldmem_pfn_is_ram(&gart_oldmem_pfn_is_ram));
+#ifdef CONFIG_PROC_VMCORE
+	WARN_ON(register_oldmem_pfn_is_ram(&gart_mem_pfn_is_ram));
+#endif
+#ifdef CONFIG_PROC_KCORE
+	WARN_ON(register_mem_pfn_is_ram(&gart_mem_pfn_is_ram));
+#endif
 }
 #else
-static void exclude_from_vmcore(u64 aper_base, u32 aper_order)
+static void exclude_from_core(u64 aper_base, u32 aper_order)
 {
 }
 #endif
@@ -474,7 +480,7 @@ int __init gart_iommu_hole_init(void)
 		 * may have allocated the range over its e820 RAM
 		 * and fixed up the northbridge
 		 */
-		exclude_from_vmcore(last_aper_base, last_aper_order);
+		exclude_from_core(last_aper_base, last_aper_order);
 
 		return 1;
 	}
@@ -520,7 +526,7 @@ int __init gart_iommu_hole_init(void)
 	 * overlap with the first kernel's memory. We can't access the
 	 * range through vmcore even though it should be part of the dump.
 	 */
-	exclude_from_vmcore(aper_alloc, aper_order);
+	exclude_from_core(aper_alloc, aper_order);
 
 	/* Fix up the north bridges */
 	for (i = 0; i < amd_nb_bus_dev_ranges[i].dev_limit; i++) {
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index bbcc185062bb..e51b324450d6 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -54,6 +54,33 @@ static LIST_HEAD(kclist_head);
 static DECLARE_RWSEM(kclist_lock);
 static int kcore_need_update = 1;
 
+/*
+ * Returns > 0 for RAM pages, 0 for non-RAM pages, < 0 on error
+ * Same as oldmem_pfn_is_ram in vmcore
+ */
+static int (*mem_pfn_is_ram)(unsigned long pfn);
+
+int register_mem_pfn_is_ram(int (*fn)(unsigned long pfn))
+{
+	if (mem_pfn_is_ram)
+		return -EBUSY;
+	mem_pfn_is_ram = fn;
+	return 0;
+}
+
+void unregister_mem_pfn_is_ram(void)
+{
+	mem_pfn_is_ram = NULL;
+}
+
+static int pfn_is_ram(unsigned long pfn)
+{
+	if (mem_pfn_is_ram)
+		return mem_pfn_is_ram(pfn);
+	else
+		return 1;
+}
+
 /* This doesn't grab kclist_lock, so it should only be used at init time. */
 void __init kclist_add(struct kcore_list *new, void *addr, size_t size,
 		       int type)
@@ -465,6 +492,11 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 				goto out;
 			}
 			m = NULL;	/* skip the list anchor */
+		} else if (!pfn_is_ram(__pa(start) >> PAGE_SHIFT)) {
+			if (clear_user(buffer, tsz)) {
+				ret = -EFAULT;
+				goto out;
+			}
 		} else if (m->type == KCORE_VMALLOC) {
 			vread(buf, (char *)start, tsz);
 			/* we have to zero-fill user buffer even if no read */
diff --git a/include/linux/kcore.h b/include/linux/kcore.h
index 8c3f8c14eeaa..2130a35f883b 100644
--- a/include/linux/kcore.h
+++ b/include/linux/kcore.h
@@ -44,6 +44,9 @@ void kclist_add_remap(struct kcore_list *m, void *addr, void *vaddr, size_t sz)
 	m->vaddr = (unsigned long)vaddr;
 	kclist_add(m, addr, sz, KCORE_REMAP);
 }
+
+extern int register_mem_pfn_is_ram(int (*fn)(unsigned long pfn));
+extern void unregister_mem_pfn_is_ram(void);
 #else
 static inline
 void kclist_add(struct kcore_list *new, void *addr, size_t size, int type)
-- 
2.20.1
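
For readers following the hook logic, here is a minimal userspace sketch
of the mechanism the patch adds. It is not the kernel code itself. It
mirrors register_mem_pfn_is_ram(), pfn_is_ram(), gart_mem_pfn_is_ram()
and exclude_from_core() from the diff. The following are assumptions made
for illustration only: PAGE_SHIFT is fixed at 12, exclude_from_core()
returns the registration result instead of wrapping it in WARN_ON(), the
aperture size is computed in 64-bit arithmetic, and read_page() is a
hypothetical stand-in for the new read_kcore() branch, with memset()
playing the role of clear_user().

```c
/* Userspace model of the kcore hook; a sketch, not kernel code. */
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

static unsigned long aperture_pfn_start, aperture_page_count;

/* Single-slot hook: returns > 0 for RAM pages, 0 for non-RAM pages. */
static int (*mem_pfn_is_ram)(unsigned long pfn);

static int register_mem_pfn_is_ram(int (*fn)(unsigned long pfn))
{
	if (mem_pfn_is_ram)
		return -EBUSY;	/* only one hook may be registered */
	mem_pfn_is_ram = fn;
	return 0;
}

static void unregister_mem_pfn_is_ram(void)
{
	mem_pfn_is_ram = NULL;
}

/* With no hook registered, every page is treated as RAM. */
static int pfn_is_ram(unsigned long pfn)
{
	return mem_pfn_is_ram ? mem_pfn_is_ram(pfn) : 1;
}

/* A PFN counts as RAM only if it lies outside the aperture range. */
static int gart_mem_pfn_is_ram(unsigned long pfn)
{
	return (pfn < aperture_pfn_start) ||
	       (pfn >= aperture_pfn_start + aperture_page_count);
}

/* The aperture covers 32 MiB << aper_order, starting at aper_base. */
static int exclude_from_core(unsigned long long aper_base,
			     unsigned int aper_order)
{
	aperture_pfn_start = aper_base >> PAGE_SHIFT;
	aperture_page_count =
		((32ULL * 1024 * 1024) << aper_order) >> PAGE_SHIFT;
	return register_mem_pfn_is_ram(&gart_mem_pfn_is_ram);
}

/*
 * Hypothetical stand-in for the read_kcore() branch: zero-fill the
 * destination for non-RAM pages instead of touching the source.
 */
static size_t read_page(unsigned long pfn, char *dst, const char *src,
			size_t len)
{
	if (!pfn_is_ram(pfn))
		memset(dst, 0, len);
	else
		memcpy(dst, src, len);
	return len;
}
```

With aper_base = 0x80000000 and aper_order = 1, this model treats PFNs
0x80000 through 0x83fff as non-RAM, so read_page() zero-fills them. A
second registration returns -EBUSY because the hook is a single slot.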