Date: Mon, 13 May 2019 17:37:52 +0100
From: Will Deacon
To: Nadav Amit
Cc: Peter Zijlstra, Yang Shi, "jstancek@redhat.com", Andrew Morton,
	"stable@vger.kernel.org", Linux-MM, LKML, "Aneesh Kumar K.V",
V" , Nick Piggin , Minchan Kim , Mel Gorman Subject: Re: [PATCH] mm: mmu_gather: remove __tlb_reset_range() for force flush Message-ID: <20190513163752.GA10754@fuggles.cambridge.arm.com> References: <1557264889-109594-1-git-send-email-yang.shi@linux.alibaba.com> <20190509083726.GA2209@brain-police> <20190509103813.GP2589@hirez.programming.kicks-ass.net> <20190509182435.GA2623@hirez.programming.kicks-ass.net> <04668E51-FD87-4D53-A066-5A35ABC3A0D6@vmware.com> <20190509191120.GD2623@hirez.programming.kicks-ass.net> <7DA60772-3EE3-4882-B26F-2A900690DA15@vmware.com> <20190513083606.GL2623@hirez.programming.kicks-ass.net> <75FD46B2-2E0C-41F2-9308-AB68C8780E33@vmware.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <75FD46B2-2E0C-41F2-9308-AB68C8780E33@vmware.com> User-Agent: Mutt/1.11.1+86 (6f28e57d73f2) () Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 13, 2019 at 09:11:38AM +0000, Nadav Amit wrote: > > On May 13, 2019, at 1:36 AM, Peter Zijlstra wrote: > > > > On Thu, May 09, 2019 at 09:21:35PM +0000, Nadav Amit wrote: > > > >>>>> And we can fix that by having tlb_finish_mmu() sync up. Never let a > >>>>> concurrent tlb_finish_mmu() complete until all concurrenct mmu_gathers > >>>>> have completed. > >>>>> > >>>>> This should not be too hard to make happen. > >>>> > >>>> This synchronization sounds much more expensive than what I proposed. But I > >>>> agree that cache-lines that move from one CPU to another might become an > >>>> issue. But I think that the scheme I suggested would minimize this overhead. > >>> > >>> Well, it would have a lot more unconditional atomic ops. My scheme only > >>> waits when there is actual concurrency. > >> > >> Well, something has to give. I didn’t think that if the same core does the > >> atomic op it would be too expensive. > > > > They're still at least 20 cycles a pop, uncontended. > > > >>> I _think_ something like the below ought to work, but its not even been > >>> near a compiler. The only problem is the unconditional wakeup; we can > >>> play games to avoid that if we want to continue with this. > >>> > >>> Ideally we'd only do this when there's been actual overlap, but I've not > >>> found a sensible way to detect that. > >>> > >>> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > >>> index 4ef4bbe78a1d..b70e35792d29 100644 > >>> --- a/include/linux/mm_types.h > >>> +++ b/include/linux/mm_types.h > >>> @@ -590,7 +590,12 @@ static inline void dec_tlb_flush_pending(struct mm_struct *mm) > >>> * > >>> * Therefore we must rely on tlb_flush_*() to guarantee order. > >>> */ > >>> - atomic_dec(&mm->tlb_flush_pending); > >>> + if (atomic_dec_and_test(&mm->tlb_flush_pending)) { > >>> + wake_up_var(&mm->tlb_flush_pending); > >>> + } else { > >>> + wait_event_var(&mm->tlb_flush_pending, > >>> + !atomic_read_acquire(&mm->tlb_flush_pending)); > >>> + } > >>> } > >> > >> It still seems very expensive to me, at least for certain workloads (e.g., > >> Apache with multithreaded MPM). > > > > Is that Apache-MPM workload triggering this lots? Having a known > > benchmark for this stuff is good for when someone has time to play with > > things. > > Setting Apache2 with mpm_worker causes every request to go through > mmap-writev-munmap flow on every thread. I didn’t run this workload after > the patches that downgrade the mmap_sem to read before the page-table > zapping were introduced. 
> I presume these patches would allow the page-table
> zapping to be done concurrently, and therefore would hit this flow.

Hmm, I don't think so: munmap() still has to take the semaphore for
write initially, so it will be serialised against other munmap() threads
even after they've downgraded afaict.

The initial bug report was about concurrent madvise() vs munmap().

Will
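
On the serialisation point above, a simplified sketch of the munmap() flow in
question (condensed from the __vm_munmap()/__do_munmap() path in mm/mmap.c
around v5.1; the function name, error handling and the elided steps are
illustrative, not verbatim kernel code):

#include <linux/errno.h>
#include <linux/mm_types.h>
#include <linux/rwsem.h>
#include <linux/types.h>

/* Illustrative only: condensed from the v5.1-era munmap() downgrade flow. */
static int munmap_downgrade_sketch(struct mm_struct *mm,
				   unsigned long start, size_t len)
{
	if (down_write_killable(&mm->mmap_sem))
		return -EINTR;			/* every munmap() serialises here */

	/* VMA lookup, splitting and detach happen under the write lock. */

	downgrade_write(&mm->mmap_sem);		/* write -> read */

	/*
	 * Page-table zapping and the TLB flush now run under the read lock,
	 * concurrently with readers such as madvise(MADV_DONTNEED), but a
	 * second munmap() is still blocked on the write lock above.
	 */

	up_read(&mm->mmap_sem);
	return 0;
}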
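
Further up the thread, Peter's draft calls wait_event_var(), which I take to
mean the kernel's wait_var_event()/wake_up_var() pair from <linux/wait_bit.h>;
under that assumption, an untested, standalone rendering of the same idea
(the helper name is invented for illustration):

#include <linux/atomic.h>
#include <linux/mm_types.h>
#include <linux/wait_bit.h>

/* Untested sketch of the wake/wait scheme, not the patch as posted. */
static inline void dec_tlb_flush_pending_sync(struct mm_struct *mm)
{
	/*
	 * The last mmu_gather to finish wakes any waiters; everyone else
	 * waits for the count to hit zero, so a concurrent tlb_finish_mmu()
	 * cannot complete until all overlapping mmu_gathers have.
	 */
	if (atomic_dec_and_test(&mm->tlb_flush_pending))
		wake_up_var(&mm->tlb_flush_pending);
	else
		wait_var_event(&mm->tlb_flush_pending,
			       !atomic_read_acquire(&mm->tlb_flush_pending));
}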