From: Stefan Roesch
To: kernel-team@fb.com
Cc: shr@devkernel.io, akpm@linux-foundation.org, david@redhat.com, hannes@cmpxchg.org, riel@surriel.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 0/4] mm/ksm: Add ksm advisor
Date: Fri, 27 Oct 2023 17:09:41 -0700
Message-Id: <20231028000945.2428830-1-shr@devkernel.io>

What is the KSM advisor?
========================

The KSM advisor automatically manages the pages_to_scan setting to achieve a target scan time. The target scan time defines how many seconds it should take to scan all the candidate KSM pages. In other words, the advisor changes the pages_to_scan rate to achieve the target scan time.

Why do we need a KSM advisor?
=============================

The number of candidate pages for KSM is dynamic. It can often be observed that more candidate pages need to be processed during the startup of an application. Without an advisor, the pages_to_scan parameter needs to be sized for the maximum number of candidate pages. With the scan time advisor, the pages_to_scan parameter can be changed based on demand.

Algorithm
=========

The algorithm calculates the change value based on the target scan time and the previous scan time. To avoid perturbations, an exponentially weighted moving average is applied.
The algorithm has a max and min value to:
- guarantee responsiveness to changes
- avoid spending too much CPU

Parameters to influence the KSM scan advisor
============================================

The respective parameters are:
- ksm_advisor_mode
  0: None (default), 1: scan time advisor
- ksm_advisor_target_scan_time
  how many seconds a scan of all candidate pages should take
- ksm_advisor_min_cpu
  lower limit for the CPU usage, in percent, of the ksmd background thread
- ksm_advisor_max_cpu
  upper limit for the CPU usage, in percent, of the ksmd background thread

The initial value and the max value for the pages_to_scan parameter can be limited with:
- ksm_advisor_min_pages
  minimum value for pages_to_scan per batch
- ksm_advisor_max_pages
  maximum value for pages_to_scan per batch

The default settings for the above two parameters should be suitable for most workloads.

The parameters are exposed as knobs in /sys/kernel/mm/ksm. By default the scan time advisor is disabled.

Currently there are two advisors:
- none and
- scan time.

Resource savings
================

Tests with various workloads have shown considerable CPU savings. Most of the workloads I have investigated have more candidate pages during startup. Once the workload is stable in terms of memory, the number of candidate pages is reduced. Without the advisor, pages_to_scan needs to be sized for the maximum number of candidate pages, so the advisor helps reduce CPU consumption.

For the Instagram workload, the advisor achieves a 25% CPU reduction. Once memory usage is stable, the pages_to_scan parameter gets reduced to about 40% of its max value.

The new advisor works especially well if the smart scan feature is also enabled.

How is defining a target scan time better?
==========================================

For an administrator it is more logical to set a target scan time. The administrator can determine how many pages are scanned in each full scan; therefore, setting a target scan time makes more sense. In addition, the administrator might have a good idea about the memory sizing of their respective workloads.

Setting CPU limits is easier than setting the pages_to_scan parameter. The pages_to_scan parameter is per batch, which makes it difficult for the administrator to set.

Tracing
=======

A new tracing event has been added for the scan time advisor. The new trace event is called ksm_advisor. It reports the scan time, the new pages_to_scan setting and the CPU usage of the ksmd background thread.

Other approaches
================

Approach 1: Adapt pages_to_scan after processing each batch. If KSM merges pages, increase the scan rate; if fewer KSM pages are merged, reduce the pages_to_scan rate. This doesn't work too well. While it increases pages_to_scan for a short period, it generally ends up with too low a pages_to_scan rate.

Approach 2: Adapt pages_to_scan after each scan. The problem with that approach is that the calculated scan rate tends to be high. The more aggressively KSM scans, the more pages it can de-duplicate.

There have been earlier attempts at an advisor:
  propose auto-run mode of ksm and its tests
  (https://marc.info/?l=linux-mm&m=166029880214485&w=2)

Changes:
========

V2:
- Use functions for long long calculations to support 32-bit platforms
- Use cpu min and cpu max settings for the advisor instead of the pages min and max parameters.
- pages min and max values are now used for the initial and max values. Generally they are not required to be changed.
- Add cpu percent usage value to tracepoint definition
- Update documentation for cpu min and cpu max values
- Update commit messages for the above changes

Stefan Roesch (4):
  mm/ksm: add ksm advisor
  mm/ksm: add sysfs knobs for advisor
  mm/ksm: add tracepoint for ksm advisor
  mm/ksm: document ksm advisor and its sysfs knobs

 Documentation/admin-guide/mm/ksm.rst |  66 ++++++
 include/trace/events/ksm.h           |  33 +++
 mm/ksm.c                             | 314 ++++++++++++++++++++++++++-
 3 files changed, 412 insertions(+), 1 deletion(-)

base-commit: 12d04a7bf0da67321229d2bc8b1a7074d65415a9
-- 
2.39.3