Date: Wed, 24 Jul 2019 16:38:39 -0400
From: "Michael S. Tsirkin"
To: Alexander Duyck
Cc: Nitesh Narayan Lal, Alexander Duyck, kvm@vger.kernel.org,
 david@redhat.com, dave.hansen@intel.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, akpm@linux-foundation.org, yang.zhang.wz@gmail.com,
 pagupta@redhat.com, riel@surriel.com, konrad.wilk@oracle.com,
 lcapitulino@redhat.com, wei.w.wang@intel.com, aarcange@redhat.com,
 pbonzini@redhat.com, dan.j.williams@intel.com
Subject: Re: [PATCH v2 0/5] mm / virtio: Provide support for page hinting
Message-ID: <20190724163516-mutt-send-email-mst@kernel.org>
References: <20190724165158.6685.87228.stgit@localhost.localdomain>
 <0c520470-4654-cdf2-cf4d-d7c351d25e8b@redhat.com>
 <088abe33117e891dd6265179f678847bd574c744.camel@linux.intel.com>
In-Reply-To: <088abe33117e891dd6265179f678847bd574c744.camel@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 24, 2019 at 01:27:35PM -0700, Alexander Duyck wrote:
> On Wed, 2019-07-24 at 14:40 -0400, Nitesh Narayan Lal wrote:
> > On 7/24/19 12:54 PM, Alexander Duyck wrote:
> > > This series provides an asynchronous means of hinting to a hypervisor
> > > that a guest page is no longer in use and can have the data associated
> > > with it dropped. To do this I have implemented functionality that allows
> > > for what I am referring to as page hinting
> > >
> > > The functionality for this is fairly simple. When enabled it will allocate
> > > statistics to track the number of hinted pages in a given free area.
> > > When the number of free pages exceeds this value plus a high water value,
> > > currently 32,
> > Shouldn't we configure this to a lower number such as 16?
>
> Yes, we could do 16.
>
> > > it will begin performing page hinting which consists of
> > > pulling pages off of the free list and placing them into a scatterlist. The
> > > scatterlist is then given to the page hinting device and it will perform
> > > the required action to make the pages "hinted", in the case of
> > > virtio-balloon this results in the pages being madvised as MADV_DONTNEED
> > > and as such they are forced out of the guest. After this they are placed
> > > back on the free list, and an additional bit is added if they are not
> > > merged indicating that they are a hinted buddy page instead of a standard
> > > buddy page. The cycle then repeats with additional non-hinted pages being
> > > pulled until the free areas all consist of hinted pages.
> > >
> > > I am leaving a number of things hard-coded such as limiting the lowest
> > > order processed to PAGEBLOCK_ORDER,
> > Have you considered making this option configurable at compile time?
>
> We could. However, PAGEBLOCK_ORDER is already configurable on some
> architectures. I didn't see much point in making it configurable in the
> case of x86 as there are only really 2 orders that this could be used in
> that provided good performance and that is MAX_ORDER - 1 and PAGEBLOCK_ORDER.
>
> > > and have left it up to the guest to
> > > determine what the limit is on how many pages it wants to allocate to
> > > process the hints.
> > It might make sense to set the number of pages to be hinted at a time from the
> > hypervisor.
>
> We could do that. Although I would still want some upper limit on that as
> I would prefer to keep the high water mark as a static value since it is
> used in an inline function. Currently the virtio driver is the one
> defining the capacity of pages per request.
>
> > > My primary testing has just been to verify the memory is being freed after
> > > allocation by running memhog 79g on a 80g guest and watching the total
> > > free memory via /proc/meminfo on the host. With this I have verified most
> > > of the memory is freed after each iteration. As far as performance I have
> > > been mainly focusing on the will-it-scale/page_fault1 test running with
> > > 16 vcpus. With that I have seen at most a 2% difference between the base
> > > kernel without these patches and the patches with virtio-balloon disabled.
> > > With the patches and virtio-balloon enabled with hinting the results
> > > largely depend on the host kernel. On a 3.10 RHEL kernel I saw up to a 2%
> > > drop in performance as I approached 16 threads,
> > I think this is acceptable.
> > > however on the latest
> > > linux-next kernel I saw roughly a 4% to 5% improvement in performance for
> > > all tests with 8 or more threads.
> > Do you mean that with your patches the will-it-scale/page_fault1 numbers were
> > better by 4-5% over an unmodified kernel?
>
> Yes. That is the odd thing. I am wondering if there was some improvement
> in the zeroing of THP pages or something that is somehow improving the
> cache performance for the accessing of the pages by the test in the guest.

Well, cache is indexed by the PA on Intel, right? So if you end up never
writing into the pages, reading them will be faster because you will end
up with a zero page. This will be offset by a fault when you finally do
write into the page.

> > > I believe the difference seen is due to
> > > the overhead for faulting pages back into the guest and zeroing of memory.
> > It may also make sense to test these patches with netperf to observe how much
> > performance drop it is introducing.
>
> Do you have some test you were already using? I ask because I am not sure
> netperf would generate a large enough memory window size to really trigger
> much of a change in terms of hinting.
> If you have some test in mind I
> could probably set it up and run it pretty quick.
>
> > > Patch 4 is a bit on the large side at about 600 lines of change, however
> > > I really didn't see a good way to break it up since each piece feeds into
> > > the next. So I couldn't add the statistics by themselves as it didn't
> > > really make sense to add them without something that will either read or
> > > increment/decrement them, or add the Hinted state without something that
> > > would set/unset it. As such I just ended up adding the entire thing as
> > > one patch. It makes it a bit bigger but avoids the issues in the previous
> > > set where I was referencing things before they had been added.
> > >
> > > Changes from the RFC:
> > > https://lore.kernel.org/lkml/20190530215223.13974.22445.stgit@localhost.localdomain/
> > > Moved aeration requested flag out of aerator and into zone->flags.
> > > Moved boundary out of free_area and into local variables for aeration.
> > > Moved aeration cycle out of interrupt and into workqueue.
> > > Left nr_free as total pages instead of splitting it between raw and aerated.
> > > Combined size and physical address values in virtio ring into one 64b value.
> > >
> > > Changes from v1:
> > > https://lore.kernel.org/lkml/20190619222922.1231.27432.stgit@localhost.localdomain/
> > > Dropped "waste page treatment" in favor of "page hinting"
> > We may still have to try and find a better name for virtio-balloon side changes.
> > As "FREE_PAGE_HINT" and "PAGE_HINTING" are still confusing.
>
> We just need to settle on a name. Essentially all this requires is just a
> quick find and replace with whatever name we decide on.
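
For readers following the thread, the hinting cycle described in the quoted
cover letter can be modelled roughly as below. This is only a toy user-space
sketch in Python, not the kernel code from the series: FreeArea, hint_cycle,
report and BATCH_CAPACITY are hypothetical names, and the batch size of 16 is
an assumed value standing in for the per-request capacity that the virtio
driver defines. Only the high water value of 32 comes from the cover letter.

```python
from dataclasses import dataclass, field

HIGH_WATER = 32       # high water value quoted in the cover letter
BATCH_CAPACITY = 16   # ASSUMPTION: pages per request; the real limit is set by the driver


@dataclass
class FreeArea:
    """Toy model of one free area: page ids that are not yet hinted vs. hinted."""
    raw: list = field(default_factory=list)
    hinted: list = field(default_factory=list)

    @property
    def nr_free(self):
        # nr_free is kept as the total page count, not split between raw and hinted.
        return len(self.raw) + len(self.hinted)

    def needs_hinting(self):
        # Hinting starts once free pages exceed hinted pages plus the high water value.
        return self.nr_free > len(self.hinted) + HIGH_WATER


def hint_cycle(area, report):
    """Pull non-hinted pages in batches, report each batch, put them back as hinted.

    `report` stands in for handing a scatterlist to the hinting device.
    Returns the number of requests that were issued.
    """
    requests = 0
    if not area.needs_hinting():
        return requests
    # Once triggered, repeat until the free area consists only of hinted pages.
    while area.raw:
        batch = area.raw[:BATCH_CAPACITY]
        del area.raw[:BATCH_CAPACITY]
        report(batch)
        area.hinted.extend(batch)   # back on the free list, now marked hinted
        requests += 1
    return requests
```

Calling hint_cycle(area, print) on a FreeArea holding 32 or fewer raw pages
issues no requests; once more pages are freed, the raw list is drained in
batches of at most BATCH_CAPACITY. The model ignores page order, merging of
buddies, and the asynchronous workqueue, all of which matter in the real
series.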