Date: Mon, 3 May 2021 12:28:58 +0300
From: Mike Rapoport
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, Andrew Morton, "Michael S. Tsirkin",
    Jason Wang, Alexey Dobriyan, "Matthew Wilcox (Oracle)", Oscar Salvador,
    Michal Hocko, Roman Gushchin, Alex Shi, Steven Price, Mike Kravetz,
    Aili Yao, Jiri Bohac, "K. Y. Srinivasan", Haiyang Zhang,
    Stephen Hemminger, Wei Liu, Naoya Horiguchi,
    linux-hyperv@vger.kernel.org,
    virtualization@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 7/7] fs/proc/kcore: use page_offline_(freeze|unfreeze)
References: <20210429122519.15183-1-david@redhat.com>
    <20210429122519.15183-8-david@redhat.com>
    <5a5a7552-4f0a-75bc-582f-73d24afcf57b@redhat.com>
In-Reply-To: <5a5a7552-4f0a-75bc-582f-73d24afcf57b@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 03, 2021 at 10:28:36AM +0200, David Hildenbrand wrote:
> On 02.05.21 08:34, Mike Rapoport wrote:
> > On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
> > > Let's properly synchronize with drivers that set PageOffline(). Unfreeze
> > > every now and then, so drivers that want to set PageOffline() can make
> > > progress.
> > >
> > > Signed-off-by: David Hildenbrand
> > > ---
> > >   fs/proc/kcore.c | 15 +++++++++++++++
> > >   1 file changed, 15 insertions(+)
> > >
> > > diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
> > > index 92ff1e4436cb..3d7531f47389 100644
> > > --- a/fs/proc/kcore.c
> > > +++ b/fs/proc/kcore.c
> > > @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
> > >   static ssize_t
> > >   read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> > >   {
> > > +         size_t page_offline_frozen = 0;
> > >           char *buf = file->private_data;
> > >           size_t phdrs_offset, notes_offset, data_offset;
> > >           size_t phdrs_len, notes_len;
> > > @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
> > >                           pfn = __pa(start) >> PAGE_SHIFT;
> > >                           page = pfn_to_online_page(pfn);
> >
> > Can't this race with page offlining for the first time we get here?
>
>
> To clarify, we have three types of offline pages in the kernel ...
>
> a) Pages part of an offline memory section; the memmap is stale and not
> trustworthy. pfn_to_online_page() checks that. We *can* protect against
> memory offlining using get_online_mems()/put_online_mems(), but usually
> avoid doing so as the race window is very small (and a problem all over the
> kernel we basically never hit) and locking is rather expensive. In the
> future, we might switch to rcu to handle that more efficiently and avoiding
> these possible races.
>
> b) PageOffline(): logically offline pages contained in an online memory
> section with a sane memmap. virtio-mem calls these pages "fake offline";
> something like a "temporary" memory hole. The new mechanism I propose will
> be used to handle synchronization as races can be more severe, e.g., when
> reading actual page content here.
>
> c) Soft offline pages: hwpoisoned pages that are not actually harmful yet,
> but could become harmful in the future. So we better try to remove the page
> from the page allocator and try to migrate away existing users.
>
>
> So page_offline_* handle "b) PageOffline()" only. There is a tiny race
> between pfn_to_online_page(pfn) and looking at the memmap as we have in many
> cases already throughout the kernel, to be tackled in the future.

Right, but you add locking here anyway, so why exclude the first iteration?

> (A better name for PageOffline() might make sense; PageSoftOffline() would
> be catchy but interferes with c). PageLogicallyOffline() is ugly;
> PageFakeOffline() might do)
>
> > > +                         /*
> > > +                          * Don't race against drivers that set PageOffline()
> > > +                          * and expect no further page access.
> > > +                          */
> > > +                         if (page_offline_frozen == MAX_ORDER_NR_PAGES) {
> > > +                                 page_offline_unfreeze();
> > > +                                 page_offline_frozen = 0;
> > > +                                 cond_resched();
> > > +                         }
> > > +                         if (!page_offline_frozen++)
> > > +                                 page_offline_freeze();
> > > +

BTW, did you consider something like

        if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
                page_offline_unfreeze();
                cond_resched();
                page_offline_freeze();
        }

We don't seem to care about page_offline_frozen overflows here, do we?

--
Sincerely yours,
Mike.