From: Rik van Riel
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, luto@kernel.org, dave.hansen@linux.intel.com, mingo@kernel.org, kernel-team@fb.com, tglx@linutronix.de, efault@gmx.de, songliubraving@fb.com
Subject: [PATCH v2 0/7] x86,tlb,mm: make lazy TLB mode even lazier
Date: Tue, 26 Jun 2018 13:31:20 -0400
Message-Id: <20180626173126.12296-1-riel@surriel.com>

Song noticed switch_mm_irqs_off taking a lot of CPU time in recent
kernels, using 1.9% of a 48 CPU system during a netperf run. Digging
into the profile, the atomic operations in cpumask_clear_cpu and
cpumask_set_cpu are responsible for about half of that CPU use.
However, the CPUs running netperf are simply switching back and forth
between netperf and the idle task, which would not require any changes
to the mm_cpumask at all if lazy TLB mode were used. Additionally, the
init_mm cpumask ends up being the most heavily contended one in the
system, for no reason at all.

By making really lazy TLB mode work again on modern kernels, sending a
shootdown IPI only when page table pages are being unmapped, we get
back some performance.

v2 of the series implements things in the way suggested by Andy
Lutomirski, which is a nice simplification from before. If patch 3
looks larger, it is because some of the existing code changed
indentation, so it can easily be used by both sides of the if/else
test.

Song's netperf performance results:

w/o patchset:
    0.95%  swapper    [kernel.vmlinux]  [k] switch_mm_irqs_off
    0.77%  netserver  [kernel.vmlinux]  [k] switch_mm_irqs_off

w/ patchset:
Throughput: 1.74075e+06
    0.87%  swapper    [kernel.vmlinux]  [k] switch_mm_irqs_off

With these patches, netserver no longer calls switch_mm_irqs_off, and
the CPU use of enter_lazy_tlb was below the 0.05% threshold of
statistics gathered by Song's scripts.

With a memcache style workload, performance does not change measurably,
but the amount of CPU time used by switch_mm_irqs_off and other parts
of the context switch code does appear to go down in profiles.

I am still working on a patch to also get rid of the continuous
pounding on mm->count during lazy TLB entry and exit, when the same
mm_struct is being used all the time. I do not have that working yet.

Until then, these patches provide a nice performance boost, as well as
a small memory saving by shrinking the size of mm_struct.