Date: Wed, 29 Apr 2020 12:59:41 +0200
From: Joerg Roedel
To: Steven Rostedt
Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
 Andrew Morton, Shile Zhang, Andy Lutomirski, "Rafael J. Wysocki",
 Dave Hansen, Tzvetomir Stoyanov
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Message-ID: <20200429105941.GQ30814@suse.de>
References: <20200429054857.66e8e333@oasis.local.home>
In-Reply-To: <20200429054857.66e8e333@oasis.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steven,

On Wed, Apr 29, 2020 at 05:48:57AM -0400, Steven Rostedt wrote:
> From: Steven Rostedt (VMware)
>
> Tzvetomir was adding a feature to trace-cmd that would allow the user
> to specify filtering on process IDs within a tracing instance (or
> buffer).
> When he added this feature and tested it on tracing PIDs 1 and
> 2, it caused his kernel to hang.
>
> He sent me his code and I was able to reproduce the hang as well. I
> bisected it down to this commit 763802b53a42 ("x86/mm: split
> vmalloc_sync_all()"). It was 100% reproducible. With the commit it
> would hang, and reverting the commit, it would work.
>
> Adding a bunch of printk()s, I found where it locked up. It was after
> the recording was finished, and a write of "0" to
> tracefs/instance/foo/events/enable. And in the code, it was:
>
> (you may skip to the end of the chain)
>
> system_enable_write() {
>   __ftrace_set_clr_event() {
>     __ftrace_set_clr_event_nolock() {
>       ftrace_event_enable_disable() {
>         __ftrace_event_enable_disable() {
>           call->class->reg() {
>             trace_point_probe_unregister() {
>               tracepoint_remove_func() {
>                 static_key_slow_dec() {
>                   __static_key_slow_dec() {
>
>
>
>                     __static_key_slow_dec_cpus_locked() {
>                       jump_label_update() {
>                         __jump_label_update()
>                           arch_jump_label_transform() {
>                             jump_label_transform() {
>                               __jump_label_transform() {
>                                 text_poke_bp() {
>                                   text_poke_bp_batch() {
>                                     text_poke() {
>                                       __text_poke() {
>
> (This is where you want to see)
>
>                                         use_temporary_mm() {
>                                           switch_mm_irqs_off() {
>                                             load_new_mm_cr3() {
>                                               write_cr3() <<--- Lock up!

I don't see how it could lock up in write_cr3(), at least on bare-metal.
In what environment does this happen: 32 or 64 bit, in a VM or on
bare-metal?

I think it is more likely that your lockup is actually a page-fault
loop, where the #PF handler does not map the faulting address correctly.

But I have to look closer into how text_poke() works before I can say
more.

Btw, in case it happens on x86-64, does it also happen without
vmalloc-stacks?

Regards,

	Joerg