To: Mike Rapoport
Cc: linux-kernel@vger.kernel.org, Andrew Morton, "Michael S. Tsirkin",
    Jason Wang, Alexey Dobriyan, "Matthew Wilcox (Oracle)", Oscar Salvador,
    Michal Hocko, Roman Gushchin, Alex Shi, Steven Price, Mike Kravetz,
    Aili Yao, Jiri Bohac, "K. Y.
    Srinivasan", Haiyang Zhang, Stephen Hemminger, Wei Liu, Naoya Horiguchi,
    linux-hyperv@vger.kernel.org, virtualization@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
References: <20210429122519.15183-1-david@redhat.com>
    <20210429122519.15183-8-david@redhat.com>
    <5a5a7552-4f0a-75bc-582f-73d24afcf57b@redhat.com>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: [PATCH v1 7/7] fs/proc/kcore: use page_offline_(freeze|unfreeze)
Message-ID: <2f66cbfc-aa29-b3ef-4c6a-0da8b29b56f6@redhat.com>
Date: Mon, 3 May 2021 12:13:45 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 03.05.21 11:28, Mike Rapoport wrote:
> On Mon, May 03, 2021 at 10:28:36AM +0200, David Hildenbrand wrote:
>> On 02.05.21 08:34, Mike Rapoport wrote:
>>> On Thu, Apr 29, 2021 at 02:25:19PM +0200, David Hildenbrand wrote:
>>>> Let's properly synchronize with drivers that set PageOffline(). Unfreeze
>>>> every now and then, so drivers that want to set PageOffline() can make
>>>> progress.
>>>>
>>>> Signed-off-by: David Hildenbrand
>>>> ---
>>>>  fs/proc/kcore.c | 15 +++++++++++++++
>>>>  1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
>>>> index 92ff1e4436cb..3d7531f47389 100644
>>>> --- a/fs/proc/kcore.c
>>>> +++ b/fs/proc/kcore.c
>>>> @@ -311,6 +311,7 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
>>>>  static ssize_t
>>>>  read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>>>  {
>>>> +	size_t page_offline_frozen = 0;
>>>>  	char *buf = file->private_data;
>>>>  	size_t phdrs_offset, notes_offset, data_offset;
>>>>  	size_t phdrs_len, notes_len;
>>>> @@ -509,6 +510,18 @@ read_kcore(struct file *file, char __user *buffer, size_t buflen, loff_t *fpos)
>>>>  			pfn = __pa(start) >> PAGE_SHIFT;
>>>>  			page = pfn_to_online_page(pfn);
>>>
>>> Can't this race with page offlining for the first time we get here?
>>
>>
>> To clarify, we have three types of offline pages in the kernel ...
>>
>> a) Pages part of an offline memory section; the memmap is stale and not
>> trustworthy. pfn_to_online_page() checks that. We *can* protect against
>> memory offlining using get_online_mems()/put_online_mems(), but usually
>> avoid doing so as the race window is very small (and a problem all over the
>> kernel we basically never hit) and locking is rather expensive. In the
>> future, we might switch to RCU to handle that more efficiently and avoid
>> these possible races.
>>
>> b) PageOffline(): logically offline pages contained in an online memory
>> section with a sane memmap. virtio-mem calls these pages "fake offline";
>> something like a "temporary" memory hole. The new mechanism I propose will
>> be used to handle synchronization as races can be more severe, e.g., when
>> reading actual page content here.
>>
>> c) Soft offline pages: hwpoisoned pages that are not actually harmful yet,
>> but could become harmful in the future.
>> So we better try to remove the page from the page allocator and try to
>> migrate away existing users.
>>
>>
>> So page_offline_* handle "b) PageOffline()" only. There is a tiny race
>> between pfn_to_online_page(pfn) and looking at the memmap as we have in many
>> cases already throughout the kernel, to be tackled in the future.
>
> Right, but here you anyway add locking, so why exclude the first iteration?

What we're protecting is PageOffline() below. If I didn't mess up, we
should always be calling page_offline_freeze() before calling
PageOffline().

Or am I missing something?

>
> BTW, did you consider something like

Yes, I played with something like that. We'd have to handle the first
page_offline_freeze() differently, though, and that's where things got a
bit ugly in my attempts.

>
> 	if (page_offline_frozen++ % MAX_ORDER_NR_PAGES == 0) {
> 		page_offline_unfreeze();
> 		cond_resched();
> 		page_offline_freeze();
> 	}
>
> We don't seem to care about page_offline_frozen overflows here, do we?

No, the buffer size is also size_t and gets incremented on a per-byte
basis, so the per-page counter cannot overflow first. The variant I have
right now looked the cleanest to me. Happy to hear simpler alternatives.

--
Thanks,

David / dhildenb
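
For readers following the thread, here is a rough user-space sketch of the "unfreeze every now and then" pattern discussed above, including the special handling of the first iteration. It is not the code from the patch: page_offline_freeze() and page_offline_unfreeze() are stubbed out to only count calls, and BATCH (a tiny stand-in for MAX_ORDER_NR_PAGES), walk_pages(), the frozen flag and the counters are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MAX_ORDER_NR_PAGES, kept tiny for illustration. */
#define BATCH 4

/* Stub state: not kernel code, just enough to observe call ordering. */
static int frozen_depth;        /* 1 while "frozen", 0 otherwise */
static int unbalanced;          /* unfreeze calls that had no matching freeze */
static size_t freeze_calls;
static size_t unfreeze_calls;

/* Stub: the real function blocks drivers from setting PageOffline(). */
static void page_offline_freeze(void)
{
	frozen_depth++;
	freeze_calls++;
}

/* Stub: the real function lets those drivers make progress again. */
static void page_offline_unfreeze(void)
{
	if (frozen_depth == 0)
		unbalanced++;
	else
		frozen_depth--;
	unfreeze_calls++;
}

/* Walk npages "pages", dropping and retaking the freeze every BATCH pages. */
static void walk_pages(size_t npages)
{
	size_t page_offline_frozen = 0;
	bool frozen = false;

	for (size_t i = 0; i < npages; i++) {
		if (page_offline_frozen++ % BATCH == 0) {
			/* Nothing to unfreeze on the very first iteration. */
			if (frozen)
				page_offline_unfreeze();
			/* cond_resched() would go here in the kernel. */
			page_offline_freeze();
			frozen = true;
		}
		/* The PageOffline() check and the page read would go here. */
		assert(frozen_depth == 1);
	}
	/* Drop the freeze taken by the last batch, if any. */
	if (frozen)
		page_offline_unfreeze();
}
```

Calling walk_pages(10) with BATCH set to 4 should freeze three times (at pages 0, 4 and 8) and unfreeze three times, with no unbalanced unfreeze. The separate frozen flag is one possible way to special-case the first iteration without depending on the counter's value; whether it is any cleaner than the variant in the patch is a matter of taste.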