Date: Fri, 1 May 2020 12:16:08 +0200
From: Joerg Roedel
To: Steven Rostedt
Cc: Mathieu Desnoyers, linux-kernel, Ingo Molnar, Thomas Gleixner,
    Peter Zijlstra, Borislav Petkov, Andrew Morton, Shile Zhang,
    Andy Lutomirski, "Rafael J. Wysocki", Dave Hansen,
    Tzvetomir Stoyanov
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Message-ID: <20200501101608.GE8135@suse.de>
In-Reply-To: <20200430223919.50861011@gandalf.local.home>

On Thu, Apr 30, 2020 at 10:39:19PM -0400, Steven Rostedt wrote:
> I'll give the answer I gave to Joerg when he replied to my accidental
> private (not public) email:
>
> Or even my original patch would be better than having the generic tracing
> code understand the intrinsic properties of vmalloc() and
> alloc_percpu() on x86_64. I really don't think it is wise to have:
>
>	foo = alloc_percpu();
>
>	/*
>	 * Because of some magic with the way alloc_percpu() works on
>	 * x86_64, we need to synchronize the pgd of all the tables,
>	 * otherwise the trace events that happen in x86_64 page fault
>	 * handlers can't cope with the chance that alloc_percpu()'d
>	 * memory might be touched in the page fault trace event.
>	 * Oh, and we need to audit all alloc_percpu() and vmalloc()
>	 * calls in tracing, because something might get triggered within a
>	 * page fault trace event!
>	 */
>	vmalloc_sync_mappings();
>
> That would be exactly what I would add as a comment if it were to be
> added in the generic tracing code.
>
> And we would need to audit any percpu alloc'd code in all tracing, or
> anything that might get hooked into something that hooks to the page fault
> trace point.
>
> Since this worked for a decade without this, I'm strongly against adding it
> in the generic code due to some issues with a single architecture.

That is exactly the problem with vmalloc_sync_mappings()/unmappings():
it is not at all clear when it needs to be called and why, or even who
is responsible for calling it. The existing call-sites in the notifier
and ACPI code have no comment on why it is necessary to synchronize the
vmalloc mappings there.

It is only needed on x86, and we could get rid of it completely if:

	1) On x86-64 we pre-allocate all 64 P4D/PUD pages for the
	   vmalloc area in init_mm at boot time. This needs 256 KiB of
	   memory per system, most of it potentially unused, as each
	   P4D/PUD page maps 512 GiB of address space.

	2) On x86-32 we disable large pages for vmalloc/ioremap
	   mappings and pre-allocate the PTE pages for the vmalloc area
	   in init_mm. Depending on how much memory the system has and
	   the configured kernel/user split, this might take more than
	   64 pages.

With that we could get rid of the vmalloc_sync interface and also of
the vmalloc-fault code in general, reducing complexity. This interface
has caused problems more than once. On the other hand, it would trade
memory usage against complexity.

Regards,

	Joerg
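
P.S.: To make point 1) concrete, an untested sketch of what such a boot-time
pre-allocation could look like is below. It walks the vmalloc range once and
allocates the P4D (or, on 4-level paging, PUD) pages in init_mm, so every
process inherits them through the shared kernel PGD entries and no runtime
synchronization is needed. This is illustrative only, not a patch; the exact
iteration step and error handling would need care, and the helper names just
follow the usual page-table API conventions.

	/* Sketch only -- would live somewhere in arch/x86/mm/ */
	static void __init preallocate_vmalloc_pages(void)
	{
		unsigned long addr;

		/* One iteration per top-level (PGD) entry of the vmalloc area */
		for (addr = VMALLOC_START; addr <= VMALLOC_END;
		     addr = ALIGN(addr + 1, PGDIR_SIZE)) {
			pgd_t *pgd = pgd_offset_k(addr);
			p4d_t *p4d;
			pud_t *pud;

			/* Populate the P4D level in init_mm */
			p4d = p4d_alloc(&init_mm, pgd, addr);
			if (!p4d)
				goto failed;

			/* With 5-level paging the P4D page is all we need */
			if (pgtable_l5_enabled())
				continue;

			/* With 4-level paging, populate the PUD level instead */
			pud = pud_alloc(&init_mm, p4d, addr);
			if (!pud)
				goto failed;
		}
		return;

	failed:
		/* Boot-time allocation failure is not recoverable */
		panic("Failed to pre-allocate page-tables for the vmalloc area\n");
	}

Called once from init code, this would make the top-level kernel mappings
for the vmalloc area static for the lifetime of the system.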