From: "Huang, Ying"
To: kernel test robot
Cc: Rik van Riel, Andrew Morton, Yang Shi, Matthew Wilcox
Subject: Re: [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression
Date: Wed, 19 Oct 2022 10:05:50 +0800
In-Reply-To: <202210181535.7144dd15-yujie.liu@intel.com> (kernel test robot's message of "Tue, 18 Oct 2022 16:44:59 +0800")
Message-ID: <87edv4r2ip.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi, Yujie,

>      32528  48%    +147.6%      80547  38%  numa-meminfo.node0.AnonHugePages
>      92821  23%     +59.3%     147839  28%  numa-meminfo.node0.AnonPages

With the commit, many more anonymous pages are allocated than with the
parent commit. This is expected, because THP instead of normal pages
will be allocated for aligned memory areas.

>      95.23   -79.8   15.41  6%  perf-profile.calltrace.cycles-pp.__munmap
>      95.08   -79.7   15.40  6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
>      95.02   -79.6   15.39  6%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.96   -79.6   15.37  6%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.95   -79.6   15.37  6%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.86   -79.5   15.35  6%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      94.38   -79.2   15.22  6%  perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>      42.74   -42.7    0.00      perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
>      42.74   -42.7    0.00      perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
>      42.72   -42.7    0.00      perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
>      41.84   -41.8    0.00      perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
>      41.70   -41.7    0.00      perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
>      41.62   -41.6    0.00      perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
>      41.55   -41.6    0.00      perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
>      41.52   -41.5    0.00      perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
>      41.28   -41.3    0.00      perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush

In the parent commit, most CPU cycles are spent contending on the LRU
lock.

>       0.00    +4.8    4.82  7%  perf-profile.calltrace.cycles-pp.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>       0.00    +4.9    4.88  7%  perf-profile.calltrace.cycles-pp.zap_huge_pmd.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
>       0.00    +8.2    8.22  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist
>       0.00    +8.2    8.23  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
>       0.00    +8.3    8.35  8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages
>       0.00    +8.3    8.35  8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush
>       0.00    +8.4    8.37  8%  perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
>       0.00    +9.6    9.60  6%  perf-profile.calltrace.cycles-pp.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
>       0.00   +65.5   65.48  2%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>       0.00   +72.5   72.51  2%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault

With the commit, most CPU cycles are spent clearing huge pages. This is
expected: we allocate more pages, so we need more cycles to clear them.

Checking the source code of the test case (will-it-scale/malloc1), I
found that it allocates some memory with malloc() and then frees it.

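For reference, the malloc()-then-free() pattern described above can be sketched roughly as below. This is not the actual will-it-scale source; the name malloc_free_loop and its parameters are made up for illustration, and the allocation size is left to the caller because it is not stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical sketch of the test pattern: allocate a buffer and free it
 * again right away, without the program itself reading or writing it.
 * Returns the number of rounds that completed.
 *
 * Note: an optimizing compiler may remove an unused malloc()/free()
 * pair, so build without optimization to exercise the real allocator.
 */
static unsigned long malloc_free_loop(size_t size, unsigned long rounds)
{
    unsigned long done = 0;

    assert(size > 0);

    for (unsigned long i = 0; i < rounds; i++) {
        void *addr = malloc(size);

        if (addr == NULL)   /* stop early if the allocation fails */
            break;
        free(addr);
        done++;
    }
    return done;
}
```

With a large size, the C library may serve each request with a fresh anonymous mapping and unmap it again on free(), which would be consistent with the munmap-heavy profile quoted above. Whether that happens depends on the allocator and its settings, so treat it as an assumption.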
In the parent commit, because the virtual memory address isn't aligned
to 2M, normal pages are allocated. With the commit, THP is allocated,
so there is more page clearing and less LRU lock contention. I think
this is the expected behavior of the commit. And the pattern in the test
case isn't common (malloc() then free(), without accessing the allocated
memory). So this regression isn't important. We can just ignore it.

Best Regards,
Huang, Ying
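
The 2M alignment point made in the message can be illustrated with a small user-space check. This is a simplified sketch, not kernel code: the helper name huge_page_fits and the HUGE_SIZE constant are hypothetical, and it only models the idea that a 2M huge page can back a faulting address when the 2M-aligned block around that address lies entirely inside the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HUGE_SIZE ((uintptr_t)2 * 1024 * 1024)   /* assumed 2M huge page size */

/*
 * Hypothetical helper: returns true when the 2M-aligned block containing
 * fault_addr lies fully inside the mapping [start, end).
 */
static bool huge_page_fits(uintptr_t start, uintptr_t end, uintptr_t fault_addr)
{
    assert(start <= fault_addr && fault_addr < end);

    /* Round the faulting address down to a 2M boundary. */
    uintptr_t huge_start = fault_addr & ~(HUGE_SIZE - 1);

    /* The whole aligned block must be covered by the mapping. */
    return huge_start >= start && end - huge_start >= HUGE_SIZE;
}
```

Under this model, a fault on the first byte of a mapping that starts at an unaligned address such as 0x40001000 cannot use a huge page, while the same fault in a mapping that starts at 0x40200000 can, provided the mapping is at least 2M long.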