Subject: Re: [PATCH v1 05/29] virtio-mem: generalize check for added memory
To: Wei Yang
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    virtualization@lists.linux-foundation.org, Andrew Morton,
    "Michael S. Tsirkin", Jason Wang, Pankaj Gupta
References: <20201012125323.17509-1-david@redhat.com>
 <20201012125323.17509-6-david@redhat.com>
 <20201015082808.GE86495@L-31X9LVDL-1304.local>
 <994394f3-c16d-911c-c9fc-d2280f32e7b1@redhat.com>
 <20201016021651.GI86495@L-31X9LVDL-1304.local>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <5caec772-295c-436a-2b19-ca261ea1ad0c@redhat.com>
Date: Fri, 16 Oct 2020 11:11:24 +0200
In-Reply-To: <20201016021651.GI86495@L-31X9LVDL-1304.local>

>> That's an interesting corner case. Assume you have a 128MB memory block
>> but only 64MB are plugged.
> 
> Since we just plug a part of the memory block, its state is
> OFFLINE_PARTIAL first. But then we would add this memory and online it.
> This means the state of this memory block is ONLINE_PARTIAL.
> 
> When is this state changed to OFFLINE_PARTIAL again?

Please note that memory onlining is *completely* controllable by user
space. User space can offline/online memory blocks as it wants. Not
saying this might actually be the right thing to do - but we cannot
trust that user space does the right thing.

So at any point in time, you have to assume that

a) added memory might not get onlined
b) previously onlined memory might get offlined
c) previously offline memory might get onlined
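(To illustrate how little the driver is involved: a minimal user-space
sketch, not code from the patch - the block number "memory32" is made up:)

#include <stdio.h>

int main(void)
{
	/* Any privileged process can request a memory block state change. */
	FILE *f = fopen("/sys/devices/system/memory/memory32/state", "w");

	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Writing "offline" here works just as well. */
	fprintf(f, "online\n");
	fclose(f);
	return 0;
}

The kernel may still refuse the transition (e.g., via notifiers or busy
pages), but user space is free to try this at any time.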
> 
>> 
>> As long as we have our online_pages callback in place, we can hinder the
>> unplugged 64MB from getting exposed to the buddy
>> (virtio_mem_online_page_cb()). However, once we unloaded the driver,
> 
> Yes,
> 
> virtio_mem_set_fake_offline() would __SetPageOffline() on those pages.
> 
>> this is no longer the case. If someone would online that memory block,
>> we would expose unplugged memory to the buddy - very bad.
>> 
> 
> Per my understanding, at this point in time, the memory block is in the
> online state, even if part of it is set to *fake* offline.
> 
> So how could the user trigger another online from the sysfs interface?

Assume we added a partially plugged memory block, which is now offline.
Further assume user space did not online the memory block (e.g., no udev
rules). User space could happily online the block after unloading the
driver. Again, we have to assume user space could do crazy things.

> 
>> So we have to remove these partially plugged, offline memory blocks when
>> losing control over them.
>> 
>> I tried to document that via:
>> 
>> "After we unregistered our callbacks, user space can online partially
>> plugged offline blocks. Make sure to remove them."
>> 
>>> 
>>> Also, during virtio_mem_remove(), we just handle OFFLINE_PARTIAL memory
>>> blocks. How about memory blocks in other states? Is it not necessary to
>>> remove ONLINE[_PARTIAL] memory blocks?
>> 
>> Blocks that are fully plugged (ONLINE or OFFLINE) can get
>> onlined/offlined without us having to care. Works fine - we only have to
>> care about partially plugged blocks.
>> 
>> While we *could* unplug OFFLINE blocks, there is no way we can
>> deterministically offline+remove ONLINE blocks. So that memory has to
>> stay, even after we unloaded the driver (similar to the dax/kmem
>> driver).
> 
> For OFFLINE memory blocks, would that leave the situation:
> 
> Guest doesn't need those pages, while the host still maps them?

Yes, but the guest could online the memory and make use of it. (Again,
whoever decides to unload the driver had better know what they are
doing.)

To do it even more cleanly, we would

a) have to remove completely plugged offline blocks (not done)
b) have to remove partially plugged offline blocks (done)
c) actually have to send unplug requests to the hypervisor

Right now, only b) is done, because leaving such blocks around might
actually cause harm (as discussed). However, the problem is that c)
might actually fail.

Long story short: we could add a) if it turns out to be a real issue.
But then, unloading the driver isn't really suggested; the current
implementation just "keeps things working without crashes" - and I guess
that's good enough for now.

> 
>> 
>> ONLINE_PARTIAL is already taken care of: it cannot get offlined anymore,
>> as we still hold references to these struct pages
>> (virtio_mem_set_fake_offline()), and as we no longer have the memory
>> notifier in place, we can no longer agree to offline this memory (when
>> going_offline).
>> 
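To make the notifier part concrete, here is a rough sketch of the
general mechanism only - not the actual virtio-mem code, and the demo_*
names are made up. While such a notifier is registered, the driver gets
a say at MEM_GOING_OFFLINE time; once it is unregistered, only the extra
references held on fake-offline pages stand in the way of offlining:

#include <linux/module.h>
#include <linux/memory.h>
#include <linux/notifier.h>

static int demo_memory_cb(struct notifier_block *nb, unsigned long action,
			  void *arg)
{
	/* While this notifier is registered, offlining can be refused. */
	if (action == MEM_GOING_OFFLINE)
		return notifier_from_errno(-EBUSY);
	return NOTIFY_OK;
}

static struct notifier_block demo_memory_nb = {
	.notifier_call = demo_memory_cb,
};

static int __init demo_init(void)
{
	return register_memory_notifier(&demo_memory_nb);
}

static void __exit demo_exit(void)
{
	/* From here on, only extra page references can block offlining. */
	unregister_memory_notifier(&demo_memory_nb);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

The real virtio-mem notifier does considerably more than a blanket
refusal; the point here is only the registration lifetime.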
> Ok, I seem to understand the logic now.
> 
> But how do we prevent an ONLINE_PARTIAL memory block from getting
> offlined? There are three calls in virtio_mem_set_fake_offline(), while
> all of them adjust page flags. How do they hold a reference to the
> struct page?

Sorry, I should have given you the right pointer (similar to my other
reply).

We hold a reference either via

1. alloc_contig_range()
2. the memmap init code, when not calling generic_online_page()

So these fake-offline pages can never actually be offlined, because we
no longer have the memory notifier registered to fix that up.

-- 
Thanks,

David / dhildenb