Date: Mon, 9 Mar 2020 11:22:42 +0000
From: Catalin Marinas
To: Andrea Arcangeli
Cc: Will Deacon, Rafael Aquini, Mark Salter, Jon Masters,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org, Michal Hocko, QI Fuli
Subject: Re: [PATCH 3/3] arm64: tlb: skip tlbi broadcast
Message-ID: <20200309112242.GB2487@mbp>
References: <20200223192520.20808-1-aarcange@redhat.com>
    <20200223192520.20808-4-aarcange@redhat.com>
In-Reply-To: <20200223192520.20808-4-aarcange@redhat.com>

Hi Andrea,

On Sun, Feb 23, 2020 at 02:25:20PM -0500, Andrea
Arcangeli wrote:
>  switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  	  struct task_struct *tsk)
>  {
> -	if (prev != next)
> -		__switch_mm(next);
> +	unsigned int cpu = smp_processor_id();
> +
> +	if (!per_cpu(cpu_not_lazy_tlb, cpu)) {
> +		per_cpu(cpu_not_lazy_tlb, cpu) = true;
> +		atomic_inc(&next->context.nr_active_mm);
> +		__switch_mm(next, cpu);
> +	} else if (prev != next) {
> +		atomic_inc(&next->context.nr_active_mm);
> +		__switch_mm(next, cpu);
> +		atomic_dec(&prev->context.nr_active_mm);
> +	}

IIUC, nr_active_mm keeps track of how many instances of the current pgd
(TTBR0_EL1) are active.

> +enum tlb_flush_types tlb_flush_check(struct mm_struct *mm, unsigned int cpu)
> +{
> +	if (atomic_read(&mm->context.nr_active_mm) <= 1) {
> +		bool is_local = current->active_mm == mm &&
> +			per_cpu(cpu_not_lazy_tlb, cpu);
> +		cpumask_t *stale_cpumask = mm_cpumask(mm);
> +		unsigned int next_zero = cpumask_next_zero(-1, stale_cpumask);
> +		bool local_is_clear = false;
> +		if (next_zero < nr_cpu_ids &&
> +		    (is_local && next_zero == cpu)) {
> +			next_zero = cpumask_next_zero(next_zero, stale_cpumask);
> +			local_is_clear = true;
> +		}
> +		if (next_zero < nr_cpu_ids) {
> +			cpumask_setall(stale_cpumask);
> +			local_is_clear = false;
> +		}
> +
> +		/*
> +		 * Enforce CPU ordering between the above
> +		 * cpumask_setall(mm_cpumask) and the below
> +		 * atomic_read(nr_active_mm).
> +		 */
> +		smp_mb();
> +
> +		if (likely(atomic_read(&mm->context.nr_active_mm)) <= 1) {
> +			if (is_local) {
> +				if (!local_is_clear)
> +					cpumask_clear_cpu(cpu, stale_cpumask);
> +				return TLB_FLUSH_LOCAL;
> +			}
> +			if (atomic_read(&mm->context.nr_active_mm) == 0)
> +				return TLB_FLUSH_NO;
> +		}
> +	}
> +	return TLB_FLUSH_BROADCAST;

And this code here can assume that if nr_active_mm <= 1, no broadcast is
necessary.

One concern I have is the ordering between TTBR0_EL1 update in
cpu_do_switch_mm() and the nr_active_mm, both on a different CPU.
We only have an ISB for context synchronisation on that CPU but I don't
think the architecture guarantees any relation between sysreg access and
the memory update. We have a DSB but that's further down in switch_to().

However, what worries me more is that you can now potentially do a TLB
shootdown without clearing the intermediate (e.g. VA to pte) walk caches
from the TLB. Even if the corresponding pgd and ASID are no longer active
on other CPUs, I'm not sure it's entirely safe to free (and re-allocate)
pages belonging to a pgtable without first flushing the TLB. All the
architecture spec states is that the software must first clear the entry
and then issue a TLBI (the break-before-make rules).

That said, the benchmark numbers are not very encouraging. An improvement
of around 1% in a single run could just as well be noise. Something like
hackbench may also show a slight impact on the context switch path. Maybe
with a true NUMA machine with hundreds of CPUs we may see a difference,
but it depends on how well the TLBI is implemented.

Thanks.

-- 
Catalin
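
For reference, the flush-type decision in the quoted tlb_flush_check() can
be modelled in user space roughly as below. This is only a sketch under
stated assumptions, not the patch's implementation: struct mm_model and
flush_type() are hypothetical names, the counter is read once instead of
being re-read after a barrier, and the mm_cpumask bookkeeping and smp_mb()
are left out entirely. Only the enum names come from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Enum names follow the quoted patch; everything else is illustrative. */
enum tlb_flush_types {
	TLB_FLUSH_NO,
	TLB_FLUSH_LOCAL,
	TLB_FLUSH_BROADCAST,
};

/* Hypothetical stand-in for the mm context: only the active-user count. */
struct mm_model {
	atomic_int nr_active_mm;
};

/*
 * Simplified decision. is_local stands for
 * "current->active_mm == mm and this CPU is not in lazy-TLB mode".
 * Assumption: a single read of the counter; the patch reads it again
 * after smp_mb() and also updates mm_cpumask, which is omitted here.
 */
static enum tlb_flush_types flush_type(struct mm_model *mm, bool is_local)
{
	int active = atomic_load(&mm->nr_active_mm);

	if (active <= 1) {
		if (is_local)
			return TLB_FLUSH_LOCAL;	/* only this CPU runs the mm */
		if (active == 0)
			return TLB_FLUSH_NO;	/* nobody runs the mm */
	}
	/* Some other CPU may be running the mm: broadcast. */
	return TLB_FLUSH_BROADCAST;
}
```

In this model a count of 1 with is_local false still yields
TLB_FLUSH_BROADCAST, since the single active user is then some other CPU.
The model says nothing about the ordering and walk-cache concerns raised
above.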