Date: Thu, 5 May 2022 07:24:26 -0700
Subject: Re: RFC: Memory Tiering Kernel Interfaces
To: Wei Xu
Cc: Alistair Popple, Davidlohr Bueso, Andrew Morton, Dave Hansen,
 Huang Ying, Dan Williams, Yang Shi, Linux MM, Greg Thelen,
 "Aneesh Kumar K.V", Jagdish Gediya, Linux Kernel Mailing List,
 Michal Hocko, Baolin Wang, Brice Goglin, Feng Tang, Jonathan Cameron
References: <20220501175813.tvytoosygtqlh3nn@offworld>
 <87o80eh65f.fsf@nvdebian.thelocal>
 <87mtfygoxs.fsf@nvdebian.thelocal>
 <9fb22767-54de-d316-7e6b-5aac375c9c49@intel.com>
 <52541497-c097-5a51-4718-feed13660255@intel.com>
From: Dave Hansen
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/4/22 23:35, Wei Xu wrote:
> On Wed, May 4, 2022
at 10:02 AM Dave Hansen wrote:
>> That means a lot of page table and EPT walks to map those linear
>> addresses back to physical. That adds to the inefficiency.
>
> That's true if the tracking is purely based on physical pages. For
> hot page tracking from PEBS, we can consider tracking in
> virtual/linear addresses. We don't need to maintain the history for
> all linear page addresses nor for an indefinite amount of time. After
> all, we just need to identify pages accessed frequently recently and
> promote them.

Except that you don't want to promote on *every* access. That might
lead to too much churn.

You're also assuming that all accesses to a physical page are via a
single linear address, which ignores shared memory mapped at different
linear addresses. Our (maybe wrong) assumption has been that shared
memory is important enough to manage that it can't be ignored.

>> In the end, you get big PEBS buffers with lots of irrelevant data that
>> needs significant post-processing to make sense of it.
>
> I am curious about what are "lots of irrelevant data" if PEBS data is
> filtered on data sources (e.g. DRAM vs PMEM) by hardware. If we need
> to have different policies for the pages from the same data source,
> then I agree that the software has to do a lot of filtering work.

Perhaps "irrelevant" was a bad term to use. I meant that you can't just
take the PEBS data and act directly on it. It has to be post-processed
and you will see things in there like lots of adjacent accesses to a
page. Those additional accesses can be interesting but at some point
you have all the weight you need to promote the page and the _rest_ are
irrelevant.

>> The folks at Intel that tried this really struggled to take this
>> mess and turn it into a successful hot-page tracking.
>>
>> Maybe someone else will find a better way to do it, but we tried and
>> gave up.
>
> It might be challenging to use PEBS as the only and universal hot page
> tracking hardware mechanism. For example, there are challenges to use
> PEBS to sample KVM guest accesses from the host.

Yep, agreed. This aspect of the hardware is very painful at the moment.

> On the other hand, PEBS with hardware-based data source filtering can
> be a useful mechanism to improve hot page tracking in conjunction
> with other techniques.

Rather than "can", I'd say: "might". Backing up to what I said originally:

> So, in practice, these events (PEBS) weren't very useful
> for driving memory tiering.

By "driving" I really meant solely driving. Like, can PEBS be used as
the one and only mechanism? We couldn't make it work. But, the
hardware _is_ sitting there mostly unused. It might be great to augment
what is there, and nobody should be discouraged from looking at it again.
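
To make the post-processing point concrete, here is a toy sketch in
Python. It is not what was tried at Intel and not kernel code.
Everything in it is an assumption made up for illustration: the
HotPageTracker class, the PROMOTE_THRESHOLD value, the (pid, linear
address) sample format, and the translate() callback that stands in for
the page table / EPT walk. It only shows the shape of the problem:
samples have to be folded onto physical pages so that shared mappings
are counted together, promotion should wait for some accumulated weight
rather than fire on every access, and once a page has that weight the
remaining samples for it carry no new information.

```python
from collections import defaultdict

PAGE_SHIFT = 12          # assumption: 4 KiB pages
PROMOTE_THRESHOLD = 4    # hypothetical: samples needed before promotion


class HotPageTracker:
    """Toy post-processor for (pid, linear address) access samples."""

    def __init__(self, translate, threshold=PROMOTE_THRESHOLD):
        # translate(pid, vaddr) -> physical address. Hypothetical stand-in
        # for the page table / EPT walk, which is the costly step in practice.
        self.translate = translate
        self.threshold = threshold
        self.weight = defaultdict(int)  # physical frame number -> sample count
        self.promoted = set()           # frames that already reached the threshold
        self.ignored = 0                # samples that added no information

    def record(self, pid, vaddr):
        """Account one sample; return the frame number if it just became hot."""
        pfn = self.translate(pid, vaddr) >> PAGE_SHIFT
        if pfn in self.promoted:
            # The page already has all the weight it needs.
            self.ignored += 1
            return None
        self.weight[pfn] += 1
        if self.weight[pfn] >= self.threshold:
            self.promoted.add(pfn)
            del self.weight[pfn]
            return pfn
        return None


# Hypothetical example: two processes map the same physical page at
# different linear addresses (shared memory).
PAGE_MASK = (1 << PAGE_SHIFT) - 1
MAPPINGS = {
    (1, 0x7F0000001000): 0x40000000,
    (2, 0x7E0000005000): 0x40000000,
}


def lookup(pid, vaddr):
    """Translate via the toy mapping table above."""
    return MAPPINGS[(pid, vaddr & ~PAGE_MASK)] | (vaddr & PAGE_MASK)
```

Tracking purely by linear address would split the samples from the two
processes above into two separate counters, and neither might ever
cross the threshold. Keying on the physical frame avoids that, at the
cost of one translation per sample.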