From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Andrew Morton, "Michael S. Tsirkin", Jason Wang,
    Alexey Dobriyan, Mike Rapoport, "Matthew Wilcox (Oracle)",
    Oscar Salvador, Michal Hocko, Roman Gushchin, Alex Shi, Steven Price,
    Mike Kravetz, Aili Yao, Jiri Bohac, "K. Y. Srinivasan", Haiyang Zhang,
    Stephen Hemminger, Wei Liu, Naoya Horiguchi,
    linux-hyperv@vger.kernel.org,
    virtualization@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v1 4/7] fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages
Date: Thu, 29 Apr 2021 14:25:16 +0200
Message-Id: <20210429122519.15183-5-david@redhat.com>
In-Reply-To: <20210429122519.15183-1-david@redhat.com>
References: <20210429122519.15183-1-david@redhat.com>

Let's avoid reading:

1) Offline memory sections: the content of offline memory sections is
   stale as the memory is effectively unused by the kernel. On s390x with
   standby memory, offline memory sections (belonging to offline storage
   increments) are not accessible. With virtio-mem and the hyper-v
   balloon, we can have unavailable memory chunks that should not be
   accessed inside offline memory sections. Last but not least, offline
   memory sections might contain hwpoisoned pages which we can no longer
   identify because the memmap is stale.

2) PG_offline pages: logically offline pages that are documented as "The
   content of these pages is effectively stale. Such pages should not be
   touched (read/write/dump/save) except by their owner.".
   Examples include pages inflated in a balloon or unavailable memory
   ranges inside hotplugged memory sections with virtio-mem or the
   hyper-v balloon.

3) PG_hwpoison pages: Reading pages marked as hwpoisoned can be fatal.
   As documented: "Accessing is not safe since it may cause another
   machine check. Don't touch!"

Reading /proc/kcore now performs checks similar to those performed when
reading /proc/vmcore for kdump via makedumpfile: problematic pages are
excluded. It's also similar to hibernation code; however, we don't yet
skip hwpoisoned pages when processing pages in
kernel/power/snapshot.c:saveable_page().

Note 1: we can race against memory offlining code, especially memory
going offline and getting unplugged; however, we will properly tear down
the identity mapping and handle faults gracefully when accessing this
memory from kcore code.

Note 2: we can race against drivers setting PageOffline() and turning
memory inaccessible in the hypervisor. We'll handle this in a follow-up
patch.

Signed-off-by: David Hildenbrand
---
 fs/proc/kcore.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index ed6fbb3bd50c..92ff1e4436cb 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -465,6 +465,9 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 
 	m = NULL;
 	while (buflen) {
+		struct page *page;
+		unsigned long pfn;
+
 		/*
 		 * If this is the first iteration or the address is not within
 		 * the previous entry, search for a matching entry.
@@ -503,7 +506,16 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
 			}
 			break;
 		case KCORE_RAM:
-			if (!pfn_is_ram(__pa(start) >> PAGE_SHIFT)) {
+			pfn = __pa(start) >> PAGE_SHIFT;
+			page = pfn_to_online_page(pfn);
+
+			/*
+			 * Don't read offline sections, logically offline pages
+			 * (e.g., inflated in a balloon), hwpoisoned pages,
+			 * and explicitly excluded physical ranges.
+			 */
+			if (!page || PageOffline(page) ||
+			    is_page_hwpoison(page) || !pfn_is_ram(pfn)) {
 				if (clear_user(buffer, tsz)) {
 					ret = -EFAULT;
 					goto out;
-- 
2.30.2
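The KCORE_RAM check in the diff can be modeled in userspace to show why the order of the conditions matters and what a reader of /proc/kcore observes for a skipped page. This is a minimal sketch under stated assumptions, not kernel code: struct fake_page, must_skip(), read_model_page() and MODEL_PAGE_SIZE are hypothetical names. A NULL page stands in for pfn_to_online_page() failing on an offline section, the flag fields stand in for PageOffline(), is_page_hwpoison() and pfn_is_ram(), and memset()/memcpy() stand in for clear_user() and the real copy.

```c
/*
 * Hypothetical userspace model of the KCORE_RAM decision in read_kcore().
 * None of these names exist in the kernel; they only mirror the logic.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 8 /* tiny page size to keep the model small */

struct fake_page {
	bool pg_offline; /* models PageOffline(), e.g. a ballooned page */
	bool hwpoison;   /* models is_page_hwpoison() */
	bool is_ram;     /* models pfn_is_ram(); false = explicitly excluded */
	unsigned char data[MODEL_PAGE_SIZE];
};

/*
 * Mirrors the condition added by the patch. A NULL page models an offline
 * memory section. The short-circuit order guarantees the page is never
 * dereferenced when it is NULL.
 */
bool must_skip(const struct fake_page *page)
{
	return !page || page->pg_offline || page->hwpoison || !page->is_ram;
}

/*
 * Fill len bytes of out from the page. Pages that must not be touched are
 * zero-filled instead of read. Returns the number of bytes written.
 */
size_t read_model_page(const struct fake_page *page, unsigned char *out,
		       size_t len)
{
	assert(len <= MODEL_PAGE_SIZE);
	if (must_skip(page))
		memset(out, 0, len);
	else
		memcpy(out, page->data, len);
	return len;
}
```

Skipped ranges still consume their full length, so a reader receives a buffer of the size it requested, with zeros where offline, ballooned, poisoned or excluded memory would have been. The kernel code behaves the same way: after clear_user() it breaks out of the switch and advances past the chunk as if it had been read.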