References:
 <20200710015646.2020871-1-npiggin@gmail.com> <20200710015646.2020871-8-npiggin@gmail.com>
In-Reply-To: <20200710015646.2020871-8-npiggin@gmail.com>
From: Andy Lutomirski
Date: Mon, 13 Jul 2020 08:59:04 -0700
Subject: Re: [RFC PATCH 7/7] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
To: Nicholas Piggin
Cc: linux-arch, X86 ML, Mathieu Desnoyers, Arnd Bergmann, Peter Zijlstra, LKML, linuxppc-dev, Linux-MM, Anton Blanchard
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 9, 2020 at 6:57 PM Nicholas Piggin wrote:
>
> On big systems, the mm refcount can become highly contended when doing
> a lot of context switching with threaded applications (particularly
> switching between the idle thread and an application thread).
>
> Abandoning lazy tlb slows switching down quite a bit in the important
> user->idle->user cases, so instead implement a non-refcounted scheme
> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> any remaining lazy ones.
>
> On a 16-socket 192-core POWER8 system, a context switching benchmark
> with as many software threads as CPUs (so each switch will go in and
> out of idle), upstream can achieve a rate of about 1 million context
> switches per second. After this patch it goes up to 118 million.
>

I read the patch a couple of times, and I have a suggestion that could
be nonsense. You are, effectively, using mm_cpumask() as a sort of
refcount. You're saying "hey, this mm has no more references, but it
still has nonempty mm_cpumask(), so let's send an IPI and shoot down
those references too."

I'm wondering whether you actually need the IPI. What if, instead, you
actually treated mm_cpumask() as a refcount for real? Roughly, in
__mmdrop(), you would only free the page tables if mm_cpumask() is empty.
And, in the code that removes a CPU from mm_cpumask(), you would check
if mm_users == 0 and, if so, check if you just removed the last bit
from mm_cpumask() and potentially free the mm.

Getting the locking right here could be a bit tricky -- you need to
avoid two CPUs simultaneously exiting lazy TLB and both thinking they
should free the mm, and you also need to handle mm_users hitting zero
concurrently with the last remote CPU that was using the mm lazily
exiting lazy TLB. Perhaps this could be resolved by having
mm_count == 1 mean "mm_cpumask() might contain bits and, if so, it owns
the mm" and mm_count == 0 mean "now it's dead", and using some careful
cmpxchg or dec_return to make sure that only one CPU frees it.

Or maybe you'd need a lock or RCU for this, but the idea would be to
only ever take the lock after mm_users goes to zero.

--Andy
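
To make the "only one CPU frees it" idea concrete, here is a rough
userspace model of the scheme described above. It is only a sketch under
loud assumptions: C11 atomics stand in for kernel primitives, a 64-bit
word stands in for mm_cpumask(), and every name in it (mm_model,
model_mmput, model_exit_lazy, model_try_free) is hypothetical rather
than kernel API. It counts frees instead of freeing anything, so it says
nothing about the lifetime of the struct itself, which is where a lock
or RCU might still be needed.

```c
/* Userspace model only; all names here are hypothetical, not kernel API. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 64

struct mm_model {
    atomic_ullong cpumask; /* stand-in for mm_cpumask(): one bit per CPU */
    atomic_int mm_users;   /* real users of the address space */
    atomic_int mm_count;   /* 1: cpumask may hold bits and owns the mm; 0: dead */
    atomic_int frees;      /* how many times the "free" path ran */
};

void model_init(struct mm_model *mm, int users, unsigned long long mask)
{
    atomic_init(&mm->cpumask, mask);
    atomic_init(&mm->mm_users, users);
    atomic_init(&mm->mm_count, 1);
    atomic_init(&mm->frees, 0);
}

/* Only the caller that moves mm_count from 1 to 0 performs the free. */
bool model_try_free(struct mm_model *mm)
{
    int expected = 1;

    if (!atomic_compare_exchange_strong(&mm->mm_count, &expected, 0))
        return false;                /* someone else already won the race */
    atomic_fetch_add(&mm->frees, 1); /* a real version would free page tables here */
    return true;
}

/* Drop a user reference; free only if no CPU still holds the mm lazily. */
bool model_mmput(struct mm_model *mm)
{
    if (atomic_fetch_sub(&mm->mm_users, 1) != 1)
        return false;                /* other users remain */
    if (atomic_load(&mm->cpumask) != 0)
        return false;                /* lazy CPUs remain; the last one frees */
    return model_try_free(mm);
}

/* A CPU leaves lazy TLB mode: clear its bit, then free if it was the last. */
bool model_exit_lazy(struct mm_model *mm, unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    unsigned long long bit = 1ULL << cpu;
    unsigned long long old = atomic_fetch_and(&mm->cpumask, ~bit);

    if (!(old & bit))
        return false;                /* this CPU was not in the mask */
    if ((old & ~bit) != 0)
        return false;                /* other CPUs are still in the mask */
    if (atomic_load(&mm->mm_users) != 0)
        return false;                /* still in use; the final mmput handles it */
    return model_try_free(mm);
}
```

With sequentially consistent atomics, if the final model_mmput() races
with the last model_exit_lazy(), at least one of them observes the
other's update, and if both do, the compare-and-exchange on mm_count
lets only one of them free. Whether the ordering and the cost work out
on real hardware is a separate question that this model does not answer.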