Date: Thu, 30 Apr 2020 12:11:36 -0400
From: Steven Rostedt
To: Joerg Roedel
Cc: LKML, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Borislav Petkov,
 Andrew Morton, Shile Zhang, Andy Lutomirski, "Rafael J. Wysocki",
 Dave Hansen, Tzvetomir Stoyanov, Mathieu Desnoyers
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Message-ID: <20200430121136.6d7aeb22@gandalf.local.home>
In-Reply-To: <20200430141120.GA8135@suse.de>
References: <20200429054857.66e8e333@oasis.local.home>
 <20200429105941.GQ30814@suse.de>
 <20200429082854.6e1796b5@oasis.local.home>
 <20200429100731.201312a9@gandalf.local.home>
 <20200430141120.GA8135@suse.de>

On Thu, 30 Apr 2020 16:11:21 +0200
Joerg Roedel wrote:

> Hi,
>
> On Wed, Apr 29, 2020 at 10:07:31AM -0400, Steven Rostedt wrote:
> > Talking with Mathieu about this on IRC, he pointed out that my code
> > does have a vzalloc() that is called:
> >
> > in trace_pid_write()
> >
> > 	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
> >
> > This is done when -P1,2 is on the trace-cmd command line.
>
> Okay, tracked it down, some instrumentation in the page-fault and
> double-fault handler gave me the stack-traces. Here is what happens:
>
> As already pointed out, it all happens because of page-faults on the
> vzalloc'ed pid bitmap. It starts with this stack-trace:
>
> RIP: 0010:trace_event_ignore_this_pid+0x23/0x30

Interesting. Because that function is this:

bool trace_event_ignore_this_pid(struct trace_event_file *trace_file)
{
	struct trace_array *tr = trace_file->tr;
	struct trace_array_cpu *data;
	struct trace_pid_list *no_pid_list;
	struct trace_pid_list *pid_list;

	pid_list = rcu_dereference_raw(tr->filtered_pids);
	no_pid_list = rcu_dereference_raw(tr->filtered_no_pids);

	if (!pid_list && !no_pid_list)
		return false;

	data = this_cpu_ptr(tr->array_buffer.data);

	return data->ignore_pid;
}

Where it only sees if the pid masks exist.
That is, it only looks to see whether pointers to them exist; it does not
actually touch the vmalloc'd area. This check handles a race between
allocating or deallocating the buffers and setting the ignore_pid bit. The
reading of these arrays is done at sched_switch time, which sets or clears
the ignore_pid field.

That said, since this only happens on buffer instances (it does not
trigger on the top level instance, which uses the same code for the pid
masks), could this possibly be for the tr->array_buffer.data, which is
allocated with:

 allocate_trace_buffer()
 {
	[..]
	buf->data = alloc_percpu(struct trace_array_cpu);

That is, the bug isn't the vmalloc being a problem; perhaps it is the
per_cpu allocation. This would explain why this crashes with a buffer
instance and not with the top level instance. If it were related to the
pid masks, then it would trigger for either (because they act the same,
allocating at time of use). But when an instance is made, the
tr->array_buffer.data is created. For the top level this happens at boot
up, and those pages would have been synced long ago. But for a newly
created instance, this happens just before it is used.

This could also explain why it is not a problem when doing it manually by
hand: the time between creating the instance and starting and stopping the
tracing is long enough for something to sync the page tables.

tl;dr: It's not an issue with the vmalloc, it's an issue with per_cpu
allocations!

-- Steve