Received: by 2002:a25:e7d8:0:0:0:0:0 with SMTP id e207csp308862ybh; Thu, 12 Mar 2020 02:22:25 -0700 (PDT) X-Google-Smtp-Source: ADFU+vvFZM/3QhaD3aYMtIcno0JvdRy71ThEdGLvP4KS1FYvylN1VqTK7jklZhN+hheku68gHG0f X-Received: by 2002:a05:6830:2091:: with SMTP id y17mr1759596otq.206.1584004945506; Thu, 12 Mar 2020 02:22:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1584004945; cv=none; d=google.com; s=arc-20160816; b=PQTrT7nEVziNx5ITZqDEhgxyUHzUWJhCoGrSfjlL1DJENCuJAeHEYJTb8OipqOo2vT ynzV2A2kyEjShByuD5P0ZY9r/IWJa4jDNIBKm07KxXD6Hk9uXVvnQa7RFXTzv2ahrqSf jrBy5J/pj3K5wd6XsJdoaAuEmEFUEQbdAMmA1KFO9dDC83Z5mFwKsfSYf9ClZ/w3uegH hv6H5Bi8v23Z7O/yLb7JNejJxmbKzYuFGDpeD7brAS/pYWGk9xoiIyC3rfIqHAeXFRNt NwhHZAod+S9zEvpypRruyoZoTq9Kx/SU3SikwAL1RM1nakVd1X8nBShTIi3IrGPf3hWr yOsg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:in-reply-to:message-id:date :subject:cc:to:from:ironport-sdr:dkim-signature; bh=2uCJ+4ws4cvlEV+wb61ZNYapETdHtIGmrv87apB/d5Y=; b=QDoOX+/DNdXC8LUD1ZCc10xvVo+VMBfQ0jYYHlj+DWavd9x8qaPrDyVFRCr673YNBF zj70TrKaZEySMD0/Crh5Qveko8D0Mxr3/0ncGQXVD/nLdBv6JVLvz+9rohW0/V5c5J/p StVb1lckNj6bISHdCeI7LX7+C9MdmfXCLz0HKXTrOBcDqKjVJoevG5P2pKjA1tjUvE8A sNlX2TPahXqqD7RLKRw4AnAK3wujgZyHYBZGfOvkk+3mILxQCThxz6TbDMZ08c80v6Xj ciQxUm5+S4WNVKIp/9OBZV4RE4govs6ORgE/tRYtdCzMmHVSkc2BrOkgPPNvmgOV/Lkp Q5ug== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=j8TK+EQF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h10si2265937oie.63.2020.03.12.02.22.12; Thu, 12 Mar 2020 02:22:25 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@amazon.com header.s=amazon201209 header.b=j8TK+EQF; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amazon.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726594AbgCLJVQ (ORCPT + 99 others); Thu, 12 Mar 2020 05:21:16 -0400 Received: from smtp-fw-33001.amazon.com ([207.171.190.10]:21505 "EHLO smtp-fw-33001.amazon.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726099AbgCLJVP (ORCPT ); Thu, 12 Mar 2020 05:21:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1584004874; x=1615540874; h=from:to:cc:subject:date:message-id:in-reply-to: mime-version; bh=2uCJ+4ws4cvlEV+wb61ZNYapETdHtIGmrv87apB/d5Y=; b=j8TK+EQFXF/EHi+avbrNqU8NWVYDhBw09s12w/PIVns7kN2QqEwrjDXX 9Vkvsw6FXH6cUXSM3i2PojN/4jJjlQBl1VaPlkl30aiZou5e200GATLFH 6sHBk9WXnJdDhf0NAc5itj4sYUCLq3H/mKXFRWlPBS1FstdE+heDjlbhz 8=; IronPort-SDR: byhSOgQR6MQP6GXvYn18sTTUXQ1OXhL4l7uE+EVjmfFsOwkRWzCdefsL6goSsigDWavy6wQX8V L/f9xsbvJEUA== X-IronPort-AV: E=Sophos;i="5.70,544,1574121600"; d="scan'208";a="32157934" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-1d-38ae4ad2.us-east-1.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-33001.sea14.amazon.com with ESMTP; 12 Mar 2020 09:21:10 +0000 Received: from EX13MTAUEA002.ant.amazon.com (iad55-ws-svc-p15-lb9-vlan3.iad.amazon.com [10.40.159.166]) by email-inbound-relay-1d-38ae4ad2.us-east-1.amazon.com (Postfix) with ESMTPS id D137CA1D81; Thu, 12 Mar 2020 09:21:00 +0000 (UTC) Received: from EX13D31EUA001.ant.amazon.com (10.43.165.15) by EX13MTAUEA002.ant.amazon.com (10.43.61.77) with Microsoft SMTP Server (TLS) id 15.0.1236.3; Thu, 12 Mar 2020 09:20:59 +0000 Received: from u886c93fd17d25d.ant.amazon.com (10.43.161.152) by EX13D31EUA001.ant.amazon.com (10.43.165.15) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 12 Mar 2020 09:20:47 +0000 From: SeongJae Park To: Jonathan Cameron CC: SeongJae Park , , "SeongJae Park" , , , , , , , , , , , , , , , , , , , , , , , , , , , , , Subject: Re: Re: [PATCH v6 02/14] mm/damon: Implement region based sampling Date: Thu, 12 Mar 2020 10:20:30 +0100 Message-ID: <20200312092030.347-1-sjpark@amazon.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200310173938.00002af4@Huawei.com> (raw) MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [10.43.161.152] X-ClientProxiedBy: EX13D15UWB003.ant.amazon.com (10.43.161.138) To EX13D31EUA001.ant.amazon.com (10.43.165.15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 10 Mar 2020 17:39:38 +0000 Jonathan Cameron wrote: > On Tue, 10 Mar 2020 17:22:40 +0100 > SeongJae Park wrote: > > > On Tue, 10 Mar 2020 15:55:10 +0000 Jonathan Cameron wrote: > > > > > On Tue, 10 Mar 2020 12:52:33 +0100 > > > SeongJae Park wrote: > > > > > > > Added replies to your every comment in line below. I agree to your whole > > > > opinions, will apply those in next spin! :) > > > > > > > > > > One additional question inline that came to mind. Using a single statistic > > > to monitor huge page and normal page hits is going to give us problems > > > I think. > > > > Ah, you're right!!! This is indeed a critical bug! > > > > > > > > Perhaps I'm missing something? > > > > > > > > > +/* > > > > > > + * Check whether the given region has accessed since the last check > > > > > > > > > > Should also make clear that this sets us up for the next access check at > > > > > a different memory address it the region. > > > > > > > > > > Given the lack of connection between activities perhaps just split this into > > > > > two functions that are always called next to each other. > > > > > > > > Will make the description more clearer as suggested. > > > > > > > > Also, I found that I'm not clearing *pte and *pmd before going 'mkold', thanks > > > > to this comment. Will fix it, either. > > > > > > > > > > > > > > > + * > > > > > > + * mm 'mm_struct' for the given virtual address space > > > > > > + * r the region to be checked > > > > > > + */ > > > > > > +static void kdamond_check_access(struct damon_ctx *ctx, > > > > > > + struct mm_struct *mm, struct damon_region *r) > > > > > > +{ > > > > > > + pte_t *pte = NULL; > > > > > > + pmd_t *pmd = NULL; > > > > > > + spinlock_t *ptl; > > > > > > + > > > > > > + if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl)) > > > > > > + goto mkold; > > > > > > + > > > > > > + /* Read the page table access bit of the page */ > > > > > > + if (pte && pte_young(*pte)) > > > > > > + r->nr_accesses++; > > > > > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > > > > > > > > > > Is it worth having this protection? Seems likely to have only a very small > > > > > influence on performance and makes it a little harder to reason about the code. > > > > > > > > It was necessary for addressing 'implicit declaration' problem of 'pmd_young()' > > > > and 'pmd_mkold()' for build of DAMON on several architectures including User > > > > Mode Linux. > > > > > > > > Will modularize the code for better readability. > > > > > > > > > > > > > > > + else if (pmd && pmd_young(*pmd)) > > > > > > + r->nr_accesses++; > > > > > > So we increment a region count by one if we have an access in a huge page, or > > > in a normal page. > > > > > > If we get a region that has a mixture of the two, this seems likely to give a > > > bad approximation. > > > > > > Assume the region is accessed 'evenly' but each " 4k page" is only hit 10% of the time > > > (where a hit is in one check period) > > > > > > If our address in a page, then we'll hit 10% of the time, but if it is in a 2M > > > huge page then we'll hit a much higher percentage of the time. > > > 1 - (0.9^512) ~= 1 > > > > > > Should we look to somehow account for this? > > > > Yes, this is really critical bug and we should fix this! Thank you so much for > > finding this! > > > > > > > > > > > +#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ > > > > > > + > > > > > > + spin_unlock(ptl); > > > > > > + > > > > > > +mkold: > > > > > > + /* mkold next target */ > > > > > > + r->sampling_addr = damon_rand(ctx, r->vm_start, r->vm_end); > > > > > > + > > > > > > + if (follow_pte_pmd(mm, r->sampling_addr, NULL, &pte, &pmd, &ptl)) > > > > > > + return; > > > > > > + > > > > > > + if (pte) { > > > > > > + if (pte_young(*pte)) { > > > > > > + clear_page_idle(pte_page(*pte)); > > > > > > + set_page_young(pte_page(*pte)); > > > > > > + } > > > > > > + *pte = pte_mkold(*pte); > > > > > > + } > > > > > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE > > > > > > + else if (pmd) { > > > > > > + if (pmd_young(*pmd)) { > > > > > > + clear_page_idle(pmd_page(*pmd)); > > > > > > + set_page_young(pmd_page(*pmd)); > > > > > > + } > > > > > > + *pmd = pmd_mkold(*pmd); > > > > > > + } > > > > This is also very problematic if several regions are backed by a single huge > > page, as only one region in the huge page will be checked as accessed. > > > > Will address these problems in next spin! > > Good point. There is little point in ever having multiple regions including > a single huge page. Would it be possible to tweak the region splitting algorithm > to not do this? Yes, it would be a good solution. However, I believe this is a problem of the access checking mechanism, as the definition of the region is only 'memory area having similar access frequency'. Adding more rules such as 'it should be aligned by HUGE PAGE size' might make things more complex. Also, we're currently using page table Accessed bits as the primitive for the access check, but it could be extended to other primitives in future. Therefore, I would like to modify the access checking mechanism to aware the huge pages existance. For regions containing both regular pages and huge pages, the huge pages will make some errorneous high access frequency as you noted before, but the adaptive regions adjustment will eventually split them. If you have other concerns or opinions, please let me know. Thanks, SeongJae Park > > Jonathan > > > > > > > Thanks, > > SeongJae Park > > > > > > > > +#endif > > > > > > + > > > > > > + spin_unlock(ptl); > > > > > > +} > > > > > > + > > > > > > > >