Subject: Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
To: Liang Li
Cc: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli, Dan Williams, "Michael S. Tsirkin", Jason Wang, Dave Hansen, Michal Hocko, Liang Li, linux-mm@kvack.org, linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org
References: <20201221162519.GA22504@open-light-1.localdomain> <7bf0e895-52d6-9e2d-294b-980c33cf08e4@redhat.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <840ff69d-20d5-970a-1635-298000196f3e@redhat.com>
Date: Tue, 22 Dec 2020 12:57:54 +0100
X-Mailing-List: linux-kernel@vger.kernel.org

>
>>>
>>> Virtulization
>>> =============
>>> Speed up VM creation and shorten guest boot time, especially for PCI
>>> SR-IOV device passthrough scenario. Compared with some of the para
>>> vitalization solutions, it is easy to deploy because it’s transparent
>>> to guest and can handle DMA properly in BIOS stage, while the para
>>> virtualization solution can’t handle it well.
>>
>> What is the "para virtualization" approach you are talking about?
>
> I refer two topic in the KVM forum 2020, the doc can give more details :
> https://static.sched.com/hosted_files/kvmforum2020/48/coIOMMU.pdf
> https://static.sched.com/hosted_files/kvmforum2020/51/The%20Practice%20Method%20to%20Speed%20Up%2010x%20Boot-up%20Time%20for%20Guest%20in%20Alibaba%20Cloud.pdf
>
> and the flowing link is mine:
> https://static.sched.com/hosted_files/kvmforum2020/90/Speed%20Up%20Creation%20of%20a%20VM%20With%20Passthrough%20GPU.pdf

Thanks for the pointers! I actually did watch your presentation.

>>
>>>
>>> Improve guest performance when use VIRTIO_BALLOON_F_REPORTING for memory
>>> overcommit. The VIRTIO_BALLOON_F_REPORTING feature will report guest page
>>> to the VMM, VMM will unmap the corresponding host page for reclaim,
>>> when guest allocate a page just reclaimed, host will allocate a new page
>>> and zero it out for guest, in this case pre zero out free page will help
>>> to speed up the proccess of fault in and reduce the performance impaction.
>>
>> Such faults in the VMM are no different to other faults, when first
>> accessing a page to be populated. Again, I wonder how much of a
>> difference it actually makes.
>>
>
> I am not just referring to faults in the VMM, I mean the whole process
> that handles guest page faults.
> without VIRTIO_BALLOON_F_REPORTING, pages used by guests will be zero
> out only once by host. With VIRTIO_BALLOON_F_REPORTING, free pages are
> reclaimed by the host and may return to the host buddy
> free list. When the pages are given back to the guest, the host kernel
> needs to zero out it again. It means
> with VIRTIO_BALLOON_F_REPORTING, guest memory performance will be
> degraded for frequently
> zero out operation on host side. The performance degradation will be
> obvious for huge page case. Free
> page pre zero out can help to make guest memory performance almost the
> same as without
> VIRTIO_BALLOON_F_REPORTING.

Yes, what I am saying is that this fault handling is no different from
ordinary faults when accessing a virtual memory location the first time
and populating a page. The only difference is that it happens
continuously, not only the first time we touch a page.

And we might be able to improve handling in the hypervisor in the
future. We have been discussing using MADV_FREE instead of MADV_DONTNEED
in QEMU for handling free page reporting. Then, guest-reported pages
will only get reclaimed by the hypervisor when there is actual memory
pressure in the hypervisor (e.g., when about to swap). And zeroing a
page is an obvious improvement over going to swap. The price for zeroing
pages has to be paid at some point.

Also note that we've been discussing cache-related things already. If
you zero out before giving the page to the guest, the page will already
be in the cache - where the guest directly wants to access it.

[...]

>>>
>>> Security
>>> ========
>>> This is a weak version of "introduce init_on_alloc=1 and init_on_free=1
>>> boot options", which zero out page in a asynchronous way. For users can't
>>> tolerate the impaction of 'init_on_alloc=1' or 'init_on_free=1' brings,
>>> this feauture provide another choice.
>> "we don’t pre zero out all the free pages" so this is of little actual use.
>
> OK. It seems none of the reasons listed above is strong enough for

I was rather saying that for security it's of little use IMHO.
Application/VM start up time might be improved by using huge pages (and
pre-zeroing these). Free page reporting might be improved by using
MADV_FREE instead of MADV_DONTNEED in the hypervisor.

> this feature, above all of them, which one is likely to become the
> most strong one? From the implementation, you will find it is
> configurable, users don't want to use it can turn it off. This is not
> an option?

Well, we have to maintain the feature and sacrifice a page flag.
For example, do we expect someone to explicitly enable the feature just
to speed up startup time of an app that consumes a lot of memory? I
highly doubt it.

I'd love to hear opinions of other people. (a lot of people are offline
until the beginning of January, including, well, actually me :) )

-- 
Thanks,

David / dhildenb
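
The MADV_FREE versus MADV_DONTNEED distinction mentioned above can be
observed from user space. The following is only a minimal sketch, assuming
Linux and Python 3.8 or newer; it is not QEMU's code, and the helper name
discard_and_reread is hypothetical. It fills a private anonymous mapping,
applies the given advice, and reads the mapping back.

```python
import mmap

PAGE = mmap.PAGESIZE


def discard_and_reread(advice: int, length: int = 4 * PAGE) -> bytes:
    """Fill an anonymous private mapping, apply madvise(advice), read it back.

    Hypothetical helper for illustration only (assumes Linux semantics).
    """
    buf = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf[:] = b"\xaa" * length   # touch every page so it is populated
        buf.madvise(advice)         # advise the kernel about the whole range
        return buf[:]               # reading back may fault pages in again
    finally:
        buf.close()


if __name__ == "__main__":
    # MADV_DONTNEED drops the pages right away; the next access faults in
    # fresh zero-filled pages, so the zeroing cost is paid on every reuse.
    data = discard_and_reread(mmap.MADV_DONTNEED)
    print("MADV_DONTNEED, all zero:", data == bytes(len(data)))

    # MADV_FREE (if this Python/kernel exposes it) only marks the pages as
    # reclaimable. Without memory pressure the old contents may survive,
    # so the result here is intentionally not predicted.
    madv_free = getattr(mmap, "MADV_FREE", None)
    if madv_free is not None:
        data = discard_and_reread(madv_free)
        print("MADV_FREE, all zero:", data == bytes(len(data)))
```

With MADV_DONTNEED the read-back is zero-filled on Linux, which is the
repeated fault-and-zero cost discussed in this thread. With MADV_FREE the
kernel is free to keep the old pages until it actually needs the memory.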