Date: Mon, 28 May 2018 16:28:46 +0800
From: Dave Young
To: David Hildenbrand
Cc: Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Alexander Potapenko, Andrew Morton, Andrey Ryabinin, Balbir Singh,
    Baoquan He, Benjamin Herrenschmidt, Boris Ostrovsky, Dan Williams,
    Dmitry Vyukov, Greg Kroah-Hartman, Hari Bathini, Huang Ying,
    Hugh Dickins, Ingo Molnar, Jaewon Kim, Jan Kara, Jérôme Glisse,
    Joonsoo Kim, Juergen Gross, Kate Stewart, "Kirill A. Shutemov",
    Matthew Wilcox, Mel Gorman, Michael Ellerman, Miles Chen,
    Oscar Salvador, Paul Mackerras, Pavel Tatashin, Philippe Ombredanne,
    Rashmica Gupta, Reza Arbab, Souptick Joarder, Tetsuo Handa,
    Thomas Gleixner, Vlastimil Babka
Subject: Re: [PATCH v1 00/10] mm: online/offline 4MB chunks controlled by device driver
Message-ID: <20180528082846.GA7884@dhcp-128-65.nay.redhat.com>
References: <20180523151151.6730-1-david@redhat.com>
    <20180524075327.GU20441@dhcp22.suse.cz>
    <14d79dad-ad47-f090-2ec0-c5daf87ac529@redhat.com>
    <20180524085610.GA5467@dhcp-128-65.nay.redhat.com>
User-Agent: Mutt/1.9.5 (2018-04-13)
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/24/18 at 11:14am, David Hildenbrand wrote:
> On 24.05.2018 10:56, Dave Young wrote:
> > Hi,
> >
> > [snip]
> >>>
> >>>> For kdump and onlining/offlining code, we
> >>>> have to mark pages as offline before a new segment is visible to the system
> >>>> (e.g. as these pages might not be backed by real memory in the hypervisor).
> >>>
> >>> Please expand on the kdump part. That is really confusing because
> >>> hotplug should simply not depend on kdump at all. Moreover why don't you
> >>> simply mark those pages reserved and pull them out from the page
> >>> allocator?
> >>
> >> 1. "hotplug should simply not depend on kdump at all"
> >>
> >> In theory yes.
> >> In the current state we already have to trigger kdump to
> >> reload whenever we add/remove a memory block.
> >>
> >>
> >> 2. kdump part
> >>
> >> Whenever we offline a page and tell the hypervisor about it ("unplug"),
> >> we should not assume that we can read that page again. Now, if dumping
> >> tools assume they can read all memory that is offline, we are in trouble.
> >>
> >> It is the same thing as we already have with PG_hwpoison. Just a
> >> different meaning - "don't touch this page, it is offline" compared to
> >> "don't touch this page, hw is broken".
> >
> > Does that mean that in the offline case there is no kdump reload as
> > mentioned in 1)?
> >
> > If we have the offline event and reload kdump, I assume the memory state
> > is refreshed so kdump will not read the memory offlined, am I missing
> > something?
>
> If a whole section is offline: yes. (ACPI hotplug)
>
> If pages are online but broken ("logically offline" - hwpoison): no
>
> If single pages are logically offline: no. (Balloon inflation - let's
> call it unplug as that's what some people refer to)
>
> If only subsections (4MB chunks) are offline: no.
>
> Exporting memory ranges in a smaller granularity to kdump than section
> size would a) be heavily complicated b) introduce a lot of overhead for
> this tracking data c) make us retrigger kdump way too often.
>
> So simply marking pages offline in the struct pages and telling kdump
> about it is the straightforward thing to do. And it is fairly easy to
> add and implement as we have the exact same thing in place for hwpoison.

Ok, it is clear enough. In that case a fine-grained offline page is
like a hwpoison page, so a userspace patch for makedumpfile is needed
to exclude such pages when copying vmcore.

>
> >
> >>
> >> Balloon drivers solve this problem by always allowing to read unplugged
> >> memory. In virtio-mem, this cannot and should not even be guaranteed.
> >>
> >
> > Hmm, that sounds like a bug..
>
> I can give you a simple example why reading such unplugged (or balloon
> inflated) memory is problematic: Huge page backed guests.
>
> There is no zero page for huge pages. So if we allow the guest to read
> that memory any time, we cannot guarantee that we actually consume less
> memory in the hypervisor. This is absolutely to be avoided.
>
> Existing balloon drivers don't support huge page backed guests. (well
> you can inflate, but the hypervisor cannot madvise() 4k on a huge page,
> resulting in no action being performed). This scenario is to be
> supported with virtio-mem.
>
>
> So yes, this is actually a bug in e.g. virtio-balloon implementations:
>
> With "VIRTIO_BALLOON_F_MUST_TELL_HOST" we have to tell the hypervisor
> before we access a page again. kdump cannot do this and does not care,
> so this page is silently accessed and dumped. One of the main problems
> why extending virtio-balloon hypervisor implementations to support
> host-enforced R/W protection is impossible.

I'm not sure I got all of the virt-related background, but thank you
for the detailed explanation anyway. This is the first time I have
heard about this; nobody complained before :(

>
> >
> >> And what we have to do to make this work is actually pretty simple: Just
> >> like PG_hwpoison, track per page if it is online and provide this
> >> information to kdump.
> >>
> >>
> >
> > Thanks
> > Dave
> >
>
>
> --
>
> Thanks,
>
> David / dhildenb

Thanks
Dave
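
For illustration, the dump-side filtering discussed above could look
roughly like the sketch below. It assumes a hypothetical per-page
"offline" flag next to the hwpoison one. The bit values and the names
PG_HWPOISON, PG_OFFLINE and dumpable_pfns are invented for this example
and are not the kernel's or makedumpfile's actual definitions.

```python
# Illustrative only: the flag bit values are made up and do not match
# the kernel's real page flag layout.
PG_HWPOISON = 1 << 0   # hypothetical bit: "don't touch this page, hw is broken"
PG_OFFLINE = 1 << 1    # hypothetical bit: "don't touch this page, it is offline"

# A page carrying any of these bits must not be read by the dump tool.
SKIP_MASK = PG_HWPOISON | PG_OFFLINE


def dumpable_pfns(page_flags):
    """Return the page frame numbers that are safe to copy into the dump.

    page_flags maps pfn -> flags word, standing in for what a dump tool
    would read from the struct pages of the crashed kernel.
    """
    return [pfn for pfn, flags in sorted(page_flags.items())
            if not flags & SKIP_MASK]


if __name__ == "__main__":
    flags = {0: 0, 1: PG_OFFLINE, 2: PG_HWPOISON, 3: 0}
    print(dumpable_pfns(flags))
```

The point is only that offline pages get the same treatment as
hwpoison pages: the tool checks a per-page flag and skips the page
instead of reading it.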