From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Andrew Morton, "Michael S. Tsirkin", Jason Wang,
    Alexey Dobriyan, Mike Rapoport, "Matthew Wilcox (Oracle)",
    Oscar Salvador, Michal Hocko, Roman Gushchin, Alex Shi, Steven Price,
    Mike Kravetz, Aili Yao, Jiri Bohac, "K. Y. Srinivasan", Haiyang Zhang,
    Stephen Hemminger, Wei Liu, Naoya Horiguchi,
    linux-hyperv@vger.kernel.org,
    virtualization@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Mike Rapoport
Subject: [PATCH v2 3/6] fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages
Date: Fri, 14 May 2021 19:22:44 +0200
Message-Id: <20210514172247.176750-4-david@redhat.com>
In-Reply-To: <20210514172247.176750-1-david@redhat.com>
References: <20210514172247.176750-1-david@redhat.com>

Let's avoid reading:

1) Offline memory sections: the content of offline memory sections is
   stale, as the memory is effectively unused by the kernel. On s390x
   with standby memory, offline memory sections (belonging to offline
   storage increments) are not accessible. With virtio-mem and the
   hyper-v balloon, we can have unavailable memory chunks that should
   not be accessed inside offline memory sections. Last but not least,
   offline memory sections might contain hwpoisoned pages which we can
   no longer identify because the memmap is stale.

2) PG_offline pages: logically offline pages that are documented as
   "The content of these pages is effectively stale. Such pages should
   not be touched (read/write/dump/save) except by their owner.".
   Examples include pages inflated in a balloon or unavailable memory
   ranges inside hotplugged memory sections with virtio-mem or the
   hyper-v balloon.

3) PG_hwpoison pages: Reading pages marked as hwpoisoned can be fatal.
   As documented: "Accessing is not safe since it may cause another
   machine check. Don't touch!"

Introduce is_page_hwpoison(), adding a comment that it is inherently
racy but the best we can really do.

Reading /proc/kcore now performs similar checks as when reading
/proc/vmcore for kdump via makedumpfile: problematic pages are excluded.
It's also similar to the hibernation code; however, we don't yet skip
hwpoisoned pages when processing pages in
kernel/power/snapshot.c:saveable_page().

Note 1: we can race against memory offlining code, especially memory
going offline and getting unplugged; however, we will properly tear
down the identity mapping and handle faults gracefully when accessing
this memory from kcore code.

Note 2: we can race against drivers setting PageOffline() and turning
memory inaccessible in the hypervisor. We'll handle this in a follow-up
patch.

Reviewed-by: Mike Rapoport
Signed-off-by: David Hildenbrand
---
 fs/proc/kcore.c            | 14 +++++++++++++-
 include/linux/page-flags.h | 12 ++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index ed6fbb3bd50c..92ff1e4436cb 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -465,6 +465,9 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 
 	m = NULL;
 	while (buflen) {
+		struct page *page;
+		unsigned long pfn;
+
 		/*
 		 * If this is the first iteration or the address is not within
 		 * the previous entry, search for a matching entry.
@@ -503,7 +506,16 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			}
 			break;
 		case KCORE_RAM:
-			if (!pfn_is_ram(__pa(start) >> PAGE_SHIFT)) {
+			pfn = __pa(start) >> PAGE_SHIFT;
+			page = pfn_to_online_page(pfn);
+
+			/*
+			 * Don't read offline sections, logically offline pages
+			 * (e.g., inflated in a balloon), hwpoisoned pages,
+			 * and explicitly excluded physical ranges.
+			 */
+			if (!page || PageOffline(page) ||
+			    is_page_hwpoison(page) || !pfn_is_ram(pfn)) {
 				if (clear_user(buffer, tsz)) {
 					ret = -EFAULT;
 					goto out;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..daed82744f4b 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -694,6 +694,18 @@ PAGEFLAG_FALSE(DoubleMap)
 	TESTSCFLAG_FALSE(DoubleMap)
 #endif
 
+/*
+ * Check if a page is currently marked HWPoisoned. Note that this check is
+ * best effort only and inherently racy: there is no way to synchronize with
+ * failing hardware.
+ */
+static inline bool is_page_hwpoison(struct page *page)
+{
+	if (PageHWPoison(page))
+		return true;
+	return PageHuge(page) && PageHWPoison(compound_head(page));
+}
+
 /*
  * For pages that are never mapped to userspace (and aren't PageSlab),
  * page_type may be used. Because it is initialised to -1, we invert the
-- 
2.31.1
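The per-page decision that the KCORE_RAM case now makes, together with the head-page fallback in is_page_hwpoison(), can be exercised outside the kernel with a small user-space model. This is a sketch under stated assumptions, not kernel code. struct model_page and every model_* function are hypothetical stand-ins. A NULL page models pfn_to_online_page() returning NULL for an offline section, and the is_ram flag models pfn_is_ram(). Zero-filling with memset() stands in for clear_user().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct page: only the state the check reads. */
struct model_page {
	bool offline;            /* models PageOffline() */
	bool hwpoison;           /* models PG_hwpoison set on this very page */
	bool huge;               /* models PageHuge() (true for head and tails) */
	struct model_page *head; /* models compound_head(); self for head/order-0 */
};

/* Mirrors is_page_hwpoison(): check the page, then the hugetlb head page. */
static bool model_is_page_hwpoison(const struct model_page *page)
{
	assert(page != NULL);
	if (page->hwpoison)
		return true;
	return page->huge && page->head->hwpoison;
}

/*
 * Mirrors the KCORE_RAM condition: page == NULL models an offline section,
 * is_ram models pfn_is_ram(). Returns true only if the page may be read.
 */
static bool model_kcore_may_read(const struct model_page *page, bool is_ram)
{
	return page && !page->offline && !model_is_page_hwpoison(page) && is_ram;
}

/* Copy a chunk if it may be read, otherwise zero-fill it (like clear_user()). */
static void model_read_chunk(char *dst, const char *src, size_t len,
			     const struct model_page *page, bool is_ram)
{
	if (model_kcore_may_read(page, is_ram))
		memcpy(dst, src, len);
	else
		memset(dst, 0, len);
}
```

For example, consider a tail page whose own flag is clear but whose hugetlb head page is marked poisoned. model_is_page_hwpoison() reports it as poisoned, so model_read_chunk() hands back zeros instead of the underlying bytes. This mirrors why the patch consults compound_head() for huge pages.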