Date: Thu, 30 Apr 2020 12:25:58 -0400
From: Steven Rostedt
To: Mathieu Desnoyers
Cc: Joerg Roedel, linux-kernel, Ingo Molnar, Thomas Gleixner, Peter Zijlstra,
 Borislav Petkov, Andrew Morton, Shile Zhang, Andy Lutomirski,
 "Rafael J. Wysocki", Dave Hansen, Tzvetomir Stoyanov
Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
Message-ID: <20200430122558.406c9755@gandalf.local.home>
In-Reply-To: <505666080.77869.1588263380070.JavaMail.zimbra@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 30 Apr 2020 12:16:20 -0400 (EDT)
Mathieu Desnoyers wrote:

> ----- On Apr 30, 2020, at 12:11 PM, rostedt rostedt@goodmis.org wrote:
> 
> > On Thu, 30 Apr 2020 16:11:21 +0200
> > Joerg Roedel wrote:
> > 
> >> Hi,
> >> 
> >> On Wed, Apr 29, 2020 at 10:07:31AM -0400, Steven Rostedt wrote:
> >> > Talking with Mathieu about this on IRC, he pointed out that my code does
> >> > have a vzalloc() that is called:
> >> > 
> >> > in trace_pid_write()
> >> > 
> >> > 	pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3);
> >> > 
> >> > This is done when -P1,2 is on the trace-cmd command line.
> >> 
> >> Okay, tracked it down, some instrumentation in the page-fault and
> >> double-fault handler gave me the stack-traces. Here is what happens:
> >> 
> >> As already pointed out, it all happens because of page-faults on the
> >> vzalloc'ed pid bitmap. It starts with this stack-trace:
> >> 
> >>  RIP: 0010:trace_event_ignore_this_pid+0x23/0x30
> > 
> > Interesting.
> > Because that function is this:
> > 
> > bool trace_event_ignore_this_pid(struct trace_event_file *trace_file)
> > {
> > 	struct trace_array *tr = trace_file->tr;
> > 	struct trace_array_cpu *data;
> > 	struct trace_pid_list *no_pid_list;
> > 	struct trace_pid_list *pid_list;
> > 
> > 	pid_list = rcu_dereference_raw(tr->filtered_pids);
> > 	no_pid_list = rcu_dereference_raw(tr->filtered_no_pids);
> > 
> > 	if (!pid_list && !no_pid_list)
> > 		return false;
> > 
> > 	data = this_cpu_ptr(tr->array_buffer.data);
> > 
> > 	return data->ignore_pid;
> > }
> > 
> > Where it only sees if the pid masks exist. That is, it looks to see if
> > there are pointers to them; it doesn't actually touch the vmalloc'd area.
> > This check is to handle a race between allocating and deallocating the
> > buffers and setting the ignore_pid bit. The reading of these arrays is done
> > at sched_switch time, which sets or clears the ignore_pid field.
> > 
> > That said, since this only happens on buffer instances (it does not trigger
> > on the top level instance, which uses the same code for the pid masks),
> > could this possibly be for the tr->array_buffer.data, which is allocated
> > with:
> > 
> > allocate_trace_buffer() {
> > 	[..]
> > 	buf->data = alloc_percpu(struct trace_array_cpu);
> > 
> > That is, the bug isn't the vmalloc being a problem, but perhaps the per_cpu
> > allocation. This would explain why this crashes with the buffer instance
> > and not with the top level instance. If it was related to the pid masks,
> > then it would trigger for either (because they act the same, allocating
> > at time of use). But when an instance is made, the tr->array_buffer.data is
> > created, which for the top level happens at boot up, so those pages would
> > have been synced long ago. But for a newly created instance, this happens
> > just before it's used.
> > This could possibly explain why it's not a problem
> > when doing it manually by hand, because the time between creating the
> > instance and the time to start and stop the tracing is long enough for
> > something to sync the page tables.
> > 
> > tl;dr: It's not an issue with the vmalloc, it's an issue with per_cpu
> > allocations!
> 
> Did I mention that alloc_percpu uses:
> 
> static void *pcpu_mem_zalloc(size_t size, gfp_t gfp)
> {
> 	if (WARN_ON_ONCE(!slab_is_available()))
> 		return NULL;
> 
> 	if (size <= PAGE_SIZE)
> 		return kzalloc(size, gfp);
> 	else
> 		return __vmalloc(size, gfp | __GFP_ZERO, PAGE_KERNEL);
> }
> 
> So yeah, it's vmalloc'd memory when size > PAGE_SIZE.

I certainly hope that struct trace_array_cpu is not bigger than PAGE_SIZE!

-- Steve
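[Editor's note: the vzalloc() size in the quoted trace_pid_write() call, (pid_list->pid_max + 7) >> 3, is just a bit count rounded up to whole bytes for the pid bitmap. A minimal userspace sketch of that arithmetic; the helper name is made up for illustration:]

```c
#include <stddef.h>

/* Bytes needed to hold 'nbits' bits, rounded up to a whole byte.
 * This is the same arithmetic as the quoted
 * vzalloc((pid_list->pid_max + 7) >> 3) call. */
size_t bits_to_bytes(size_t nbits)
{
	return (nbits + 7) >> 3;
}
```

[With the common pid_max default of 32768 this comes out to 4096 bytes, a single page, but vzalloc() maps it through vmalloc space regardless of size, which is why the bitmap was the first suspect.]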
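[Editor's note: the size cutoff in the quoted pcpu_mem_zalloc() is the crux -- a per-cpu chunk larger than one page silently becomes vmalloc'd memory. A userspace sketch of just that dispatch, with calloc() standing in for both kzalloc() and __vmalloc(), a hard-coded 4096-byte PAGE_SIZE, and a flag recording which path was taken; all of these are stand-ins for illustration, not kernel API:]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096	/* assumption: typical x86-64 page size */

/* Records which path the last allocation took, mirroring the
 * kzalloc-vs-__vmalloc split in the quoted pcpu_mem_zalloc(). */
bool last_was_vmalloc;

void *mem_zalloc(size_t size)
{
	/* Requests up to one page take the slab path (kzalloc in the
	 * kernel); anything larger falls through to vmalloc. Both are
	 * modeled with calloc here -- only the dispatch is the point. */
	last_was_vmalloc = (size > PAGE_SIZE);
	return calloc(1, size);
}
```

[So whether a given struct trace_array_cpu ends up in vmalloc space depends purely on whether its size crosses PAGE_SIZE, which is what the closing remark in the mail is worrying about.]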