From: "Huang, Ying"
To: Bharata B Rao
Subject: Re: [RFC PATCH 0/2] Hot page promotion optimization for large address space
In-Reply-To: <9ec3b04b-bde8-42ce-be1b-34f7d8e6762d@amd.com> (Bharata B. Rao's message of "Tue, 2 Apr 2024 14:56:37 +0530")
References: <20240327160237.2355-1-bharata@amd.com>
 <87il16lxzl.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87edbulwom.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <929b22ca-bb51-4307-855f-9b4ae0a102e3@amd.com>
 <875xx5lu05.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <7e373c71-b2dc-4ae4-9746-c840f2a513a5@amd.com>
 <87o7asfrm1.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <9ec3b04b-bde8-42ce-be1b-34f7d8e6762d@amd.com>
Date: Wed, 03 Apr 2024 16:40:28 +0800
Message-ID: <87il0yet4z.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Bharata B Rao writes:

> On 02-Apr-24 7:33 AM, Huang, Ying wrote:
>> Bharata B Rao writes:
>>
>>> On 29-Mar-24 6:44 AM, Huang, Ying wrote:
>>>> Bharata B Rao writes:
>>>
>>>>> I don't think the pages are cold but rather the existing mechanism fails
>>>>> to categorize them as hot. This is because the pages were scanned way
>>>>> before the accesses start happening. When repeated accesses are made to
>>>>> a chunk of memory that has been scanned a while back, none of those
>>>>> accesses get classified as hot because the scan time is way behind
>>>>> the current access time. That's the reason we are seeing the value
>>>>> of latency ranging from 20s to 630s as shown above.
>>>>
>>>> If repeated accesses continue, the page will be identified as hot when
>>>> it is scanned next time even if we don't expand the threshold range. If
>>>> the repeated accesses only last very short time, it makes little sense
>>>> to identify the pages as hot. Right?
>>>
>>> The total allocated memory here is 192G and the chunk size is 1G. Each
>>> time one such 1G chunk is taken up randomly for generating memory accesses.
>>> Within that 1G, 262144 random accesses are performed and 262144 such
>>> accesses are repeated for 512 times. I thought that should be enough
>>> to classify that chunk of memory as hot.
>>
>> IIUC, some pages are accessed in very short time (maybe within 1ms).
>> This isn't repeated access in a long period. I think that pages
>> accessed repeatedly in a long period are good candidates for promoting.
>> But pages accessed frequently in only very short time aren't.
>
> Here are the numbers for the 192nd chunk:
>
> Each iteration of 262144 random accesses takes around ~10ms
> 512 such iterations are taking ~5s
> numa_scan_seq is 16 when this chunk is accessed.
> And no page promotions were done from this chunk. All the
> time should_numa_migrate_memory() found the NUMA hint fault
> latency to be higher than threshold.
>
> Are these time periods considered too short for the pages
> to be detected as hot and promoted?

Yes. I think so. This is burst access, not repeated access. IIUC,
NUMA balancing based promotion only works for accesses that are
repeated over a long time, for example, >100s.

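To illustrate the difference, here is a toy model in Python. It is not
kernel code, only a sketch under assumptions: a page takes a hint fault
only on its first access after a scan, the scan time of a page that is
still marked is not refreshed by later scans, the hot threshold is 1s,
and the scan interval is 60s. The function and variable names are made
up. Only the 10ms iteration time, the 512 iterations, and the 20s lag
come from the numbers in this thread.

```python
HOT_THRESHOLD_MS = 1000   # assumed hot threshold, for illustration only
SCAN_INTERVAL_MS = 60000  # assumed time between scans of the same page


def hint_fault_latencies(scan_times_ms, access_times_ms):
    """Return the latency (fault time - scan time) of every hint fault.

    A scan marks the page; the first access after that takes the hint
    fault and clears the mark. A scan that finds the page still marked
    skips it and keeps the old scan time.
    """
    SCAN, ACCESS = 0, 1
    events = sorted([(t, SCAN) for t in scan_times_ms] +
                    [(t, ACCESS) for t in access_times_ms])
    latencies = []
    marked_at = None  # scan time of the pending mark, if any
    for t, kind in events:
        if kind == SCAN:
            if marked_at is None:
                marked_at = t
        elif marked_at is not None:
            latencies.append(t - marked_at)
            marked_at = None
    return latencies


def hot_faults(latencies_ms, threshold_ms=HOT_THRESHOLD_MS):
    """Keep only the faults that would classify the page as hot."""
    return [lat for lat in latencies_ms if lat < threshold_ms]


scans = [0, SCAN_INTERVAL_MS, 2 * SCAN_INTERVAL_MS]

# Burst: 512 iterations, 10ms apart, starting 20s after the scan.
burst = hint_fault_latencies(scans, range(20000, 20000 + 512 * 10, 10))

# Repeated access: the same 10ms pattern, kept up for about 120s.
steady = hint_fault_latencies(scans, range(20005, 140000, 10))

print("burst :", burst, "hot faults:", hot_faults(burst))
print("steady:", steady, "hot faults:", hot_faults(steady))
```

In this model the burst produces a single hint fault with a latency of
about 20s, so nothing is classified as hot. The long-running pattern
also misses on its first fault, but it is caught after each later scan
because an access follows the scan within a few milliseconds.
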
>>
>>> But as we see, often times
>>> the scan time is lagging the access time by a large value.
>>>
>>> Let me instrument the code further to learn more insights (if possible)
>>> about the scanning/fault time behaviors here.
>>>
>>> Leaving the fault count based threshold apart, do you think there is
>>> value in updating the scan time for skipped pages/PTEs during every
>>> scan so that the scan time remains current for all the pages?
>>
>> No, I don't think so. That makes hint page fault latency more
>> inaccurate.
>
> For the case that I have shown, depending on an old value of scan
> time doesn't work well when pages get accessed after a long time
> since scanning. At least with the scheme I show in patch 2/2,
> probability of detecting pages as hot increases.

Yes. This may help in your case, but it will hurt other cases by
making the hint page fault latency incorrect.

To resolve your issue, we can increase the max value of the hot
threshold automatically. We can work on that if you can find a real
workload.

--
Best Regards,
Huang, Ying