Date: Thu, 30 Apr 2020 22:39:19 -0400
From: Steven Rostedt
To: Mathieu Desnoyers
Cc: Joerg Roedel, linux-kernel, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov, Andrew Morton, Shile Zhang, Andy Lutomirski, "Rafael J. Wysocki", Dave Hansen, Tzvetomir Stoyanov
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Message-ID: <20200430223919.50861011@gandalf.local.home>
In-Reply-To: <1902703609.78863.1588300015661.JavaMail.zimbra@efficios.com>

On Thu, 30 Apr 2020 22:26:55 -0400 (EDT)
Mathieu Desnoyers wrote:

> ----- On Apr 30, 2020, at 9:13 PM, rostedt rostedt@goodmis.org wrote:
>
> > [ Joerg, sending again this time not just to you. (hit reply to sender
> > and not reply to all). Feel free to resend what you wrote before to this ]
> >
> > On Thu, 30 Apr 2020 21:14:34 +0200
> > Joerg Roedel wrote:
> >
> >> And alloc_percpu() calls down into pcpu_alloc(), which allocates new
> >> percpu chunks using vmalloc() on x86. And there we are again in the
> >> vmalloc area.
> >
> > So after a vmalloc() is made, should the page tables be synced?
>
> Why should it ? Usually, the page fault handler is able to resolve the
> resulting minor page faults lazily.
>
> >
> > This is a rather subtle bug, and I don't think it should be the caller of
> > percpu_alloc() that needs to call vmalloc_sync_mappings().
>
> Who said tracing was easy ? ;-)

But anyone can hook to a tracepoint, and then if they hook to one that is
in the page fault handler, and they use vmalloc, they can lock up the
machine.

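To make that failure mode concrete, here is a toy userspace model of it.
This is only a sketch and not kernel code: every toy_* name is made up for
illustration, a "slot" stands in for one top-level page table entry covering
part of the vmalloc area, and the model assumes the probe runs in the fault
path before the missing entry is copied in.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NR_MM    4    /* hypothetical: number of per-process page tables */
#define TOY_NR_SLOTS 8    /* hypothetical: top-level entries for vmalloc space */
#define TOY_NO_PROBE (-1) /* nothing is hooked into the fault path */

/* Reference table (always up to date) and the per-process copies. */
static bool toy_ref[TOY_NR_SLOTS];
static bool toy_mm[TOY_NR_MM][TOY_NR_SLOTS];

/* Clear all tables so a scenario starts from a known state. */
void toy_reset(void)
{
	memset(toy_ref, 0, sizeof(toy_ref));
	memset(toy_mm, 0, sizeof(toy_mm));
}

/* Toy vmalloc(): populates the reference table only. */
void toy_vmalloc(int slot)
{
	assert(slot >= 0 && slot < TOY_NR_SLOTS);
	toy_ref[slot] = true;
}

/* Toy vmalloc_sync_mappings(): push reference entries to every copy. */
void toy_sync_mappings(void)
{
	for (int mm = 0; mm < TOY_NR_MM; mm++)
		memcpy(toy_mm[mm], toy_ref, sizeof(toy_ref));
}

/*
 * Touch an allocated slot from process table mm. probe_slot is the slot
 * that a probe hooked into the fault path touches, or TOY_NO_PROBE.
 *
 * Returns  0 if the entry was already present (no fault),
 *          1 if a fault was taken and resolved lazily,
 *         -1 if the probe itself faulted inside the fault path.
 */
int toy_touch(int mm, int slot, int probe_slot)
{
	assert(mm >= 0 && mm < TOY_NR_MM);
	assert(slot >= 0 && slot < TOY_NR_SLOTS && toy_ref[slot]);
	assert(probe_slot == TOY_NO_PROBE ||
	       (probe_slot >= 0 && probe_slot < TOY_NR_SLOTS));

	if (toy_mm[mm][slot])
		return 0;

	/* Fault path: the probe fires before the entry is fixed up. */
	if (probe_slot != TOY_NO_PROBE && !toy_mm[mm][probe_slot])
		return -1;

	/* Lazy fix-up: copy the missing entry from the reference table. */
	toy_mm[mm][slot] = toy_ref[slot];
	return 1;
}
```

With no probe, the first toy_touch() of a fresh slot takes one lazy fault and
the next one takes none. With a probe whose own slot was never synced, the
same touch returns -1, which is the recursion in this model. Calling
toy_sync_mappings() after the probe's slot is allocated, and before the probe
can run, makes that case go away.
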
> >
> > What's your suggestion for a fix?
>
> I know the question is not addressed to me, but here are my 2 cents:
>
> It's subtle because ftrace is tracing the page fault handler through
> tracepoints. It would not make sense to slow down all vmalloc or
> percpu_alloc() just because tracing recurses when tracing page faults.

What's so damn special about alloc_percpu()? It's definitely not a fast
path. And it's not used often.

>
> I think the right approach to solve this is to call vmalloc_sync_mappings()
> before any vmalloc'd memory ends up being observable by instrumentation.
> This can be achieved by adding a vmalloc_sync_mappings call on tracepoint
> registration like I proposed in my patchset a few weeks ago:
>
> https://lore.kernel.org/r/20200409193543.18115-2-mathieu.desnoyers@efficios.com
>
> The tracers just have to make sure they perform their vmalloc'd memory
> allocation before registering the tracepoint which can touch it, else they
> need to issue vmalloc_sync_mappings() on their own before making the
> newly allocated memory observable by instrumentation.
>
> This approach is not new: register_die_notifier() does exactly that today.
>

I'll give the answer I gave to Joerg when he replied to my accidental
private (not public) email:

Or even my original patch would be better than having the generic tracing
code understand the intrinsic properties of vmalloc() and
alloc_percpu() on x86_64.

I really don't think it is wise to have:

	foo = alloc_percpu();

	/*
	 * Because of some magic with the way alloc_percpu() works on
	 * x86_64, we need to synchronize the pgd of all the tables,
	 * otherwise the trace events that happen in x86_64 page fault
	 * handlers can't cope with the chance that alloc_percpu()'d
	 * memory might be touched in the page fault trace event. Oh,
	 * and we need to audit all alloc_percpu() and vmalloc() calls
	 * in tracing, because something might get triggered within a
	 * page fault trace event!
	 */
	vmalloc_sync_mappings();

That would be exactly what I would add as a comment if it were to be added
in the generic tracing code.

And we would need to audit any percpu alloc'd code in all tracing, or
anything that might get hooked into something that hooks to the page fault
trace point.

Since this worked for a decade without this, I'm strongly against adding it
in the generic code due to some issues with a single architecture.

-- Steve