Date: Wed, 5 May 2021 15:24:19 +0200
From: Michal Hocko
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, Andrew Morton, "Michael S. Tsirkin",
	Jason Wang, Alexey Dobriyan, Mike Rapoport,
	"Matthew Wilcox (Oracle)", Oscar Salvador, Roman Gushchin, Alex Shi,
	Steven Price, Mike Kravetz, Aili Yao, Jiri Bohac, "K. Y. Srinivasan",
	Haiyang Zhang, Stephen Hemminger, Wei Liu, Naoya Horiguchi,
	linux-hyperv@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 5/7] mm: introduce page_offline_(begin|end|freeze|unfreeze) to synchronize setting PageOffline()
References: <20210429122519.15183-1-david@redhat.com>
	<20210429122519.15183-6-david@redhat.com>
In-Reply-To: <20210429122519.15183-6-david@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 29-04-21 14:25:17, David Hildenbrand wrote:
> A driver might set a page logically offline -- PageOffline() -- and
> turn the page inaccessible in the hypervisor; after that, access to page
> content can be fatal. One example is virtio-mem; while unplugged memory
> -- marked as PageOffline() -- can currently be read in the hypervisor, this
> will no longer be the case in the future; for example, when having
> a virtio-mem device backed by huge pages in the hypervisor.
>
> Some special PFN walkers -- i.e., /proc/kcore -- read content of random
> pages after checking PageOffline(); however, these PFN walkers can race
> with drivers that set PageOffline().
>
> Let's introduce page_offline_(begin|end|freeze|unfreeze) for
> synchronizing.
>
> page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
> synchronize with such drivers, achieving that a page cannot be set
> PageOffline() while frozen.
>
> page_offline_begin()/page_offline_end() is used by drivers that care about
> such races when setting a page PageOffline().
>
> For simplicity, use a rwsem for now; neither drivers nor users are
> performance sensitive.

Please add a note to the PageOffline documentation as well. Since you
are adding the API right next to it, an explicit note there wouldn't
hurt.
> Signed-off-by: David Hildenbrand

As to the patch itself, I am slightly worried that other PFN walkers
might be less tolerant of the locking than the proc ones. On the other
hand, most users shouldn't really care, as they do not tend to touch the
memory content, and a PageOffline check without any synchronization
should be sufficient for those.

Let's try this out and see where we get...

Acked-by: Michal Hocko

> ---
>  include/linux/page-flags.h |  5 +++++
>  mm/util.c                  | 38 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 43 insertions(+)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index b8c56672a588..e3d00c72f459 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -767,6 +767,11 @@ PAGE_TYPE_OPS(Buddy, buddy)
>   */
>  PAGE_TYPE_OPS(Offline, offline)
>  
> +extern void page_offline_freeze(void);
> +extern void page_offline_unfreeze(void);
> +extern void page_offline_begin(void);
> +extern void page_offline_end(void);
> +
>  /*
>   * Marks pages in use as page tables.
>   */
> diff --git a/mm/util.c b/mm/util.c
> index 54870226cea6..95395d4e4209 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1013,3 +1013,41 @@ void mem_dump_obj(void *object)
>  	}
>  	pr_cont(" non-slab/vmalloc memory.\n");
>  }
> +
> +/*
> + * A driver might set a page logically offline -- PageOffline() -- and
> + * turn the page inaccessible in the hypervisor; after that, access to page
> + * content can be fatal.
> + *
> + * Some special PFN walkers -- i.e., /proc/kcore -- read content of random
> + * pages after checking PageOffline(); however, these PFN walkers can race
> + * with drivers that set PageOffline().
> + *
> + * page_offline_freeze()/page_offline_unfreeze() allows for a subsystem to
> + * synchronize with such drivers, achieving that a page cannot be set
> + * PageOffline() while frozen.
> + *
> + * page_offline_begin()/page_offline_end() is used by drivers that care about
> + * such races when setting a page PageOffline().
> + */
> +static DECLARE_RWSEM(page_offline_rwsem);
> +
> +void page_offline_freeze(void)
> +{
> +	down_read(&page_offline_rwsem);
> +}
> +
> +void page_offline_unfreeze(void)
> +{
> +	up_read(&page_offline_rwsem);
> +}
> +
> +void page_offline_begin(void)
> +{
> +	down_write(&page_offline_rwsem);
> +}
> +
> +void page_offline_end(void)
> +{
> +	up_write(&page_offline_rwsem);
> +}
> -- 
> 2.30.2
>

-- 
Michal Hocko
SUSE Labs