Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp2013893ybz; Thu, 30 Apr 2020 09:20:17 -0700 (PDT) X-Google-Smtp-Source: APiQypLG03Mc9p7x4A3Oek7ehLI9pQmUiLuVkR3pOsilPzGhEe1+YEvZ0ny2ZJikergPWRsDx2CS X-Received: by 2002:a05:6402:1a2f:: with SMTP id be15mr3502675edb.385.1588263617594; Thu, 30 Apr 2020 09:20:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588263617; cv=none; d=google.com; s=arc-20160816; b=Rwf6sUZcjKxYhS4YKMhs6CPkHuZKcygm6J7jCnZp925RDIXuvEDnSItGOc3DQb7AwM ChieiCsgmTwdcyD4OrWPR/egejyeUrEE1z0u959eVaCzmtQYmyLW0lcVg2ZLhCB4w1CZ Rvl5wsGFXrmtCROi29UVMN2x3YhFRugEMYuqh6JwQ/7Bstiv6Lrdgq8NjpFHTm3FXpi0 xCm3ACdjk8zQ4lOc1uR5GUQc/jJo4TGkQ+59PRCLRGKBwFLwV1KywwLB8Nn2Li5VRjPe fLvyKZY61yTPcChd1v5jpig/wDGLniueTNts9ZIiYn1R2vHFLsevHQQn4AtjOvkZztbF Xt8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:thread-index:thread-topic :content-transfer-encoding:mime-version:subject:references :in-reply-to:message-id:cc:to:from:date:dkim-signature:dkim-filter; bh=3peYxHoeMxttyPs/kK0oeQFnIA8w0ML3Z5mFQwUuuw4=; b=HjhstjpRciqYWKaiucXg7JzabvG58z29tzbHhZmkjkbyTHJhzAtm4gyifO9M/WbnwF ytyFB4C+srlgWm3mynETHElSO3G8KHRsp2FmzRxS/hxKx19qIT/Mr9pTPJe3lNI5WYmU UiwKezpb+yWhvOg+/h+U0RwSl5rqttnGCG5XIUMVxxillroigNXOrJCreS5NsEGoasWD G1fqInOY102UVTdHFTwgtnvJ5dyCmHPy3L/e7XARW0qq2nUu3XfccSzv5d9fBMMxH6iD gim0v0NthNhFQAGKxRKAnhEyW1QdE0fBv86gWOkrqlr3cmEVDuyItokcNpGgF62PxlNM Uukg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@efficios.com header.s=default header.b=HGgxy6OF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id l22si42732ejg.362.2020.04.30.09.19.53; Thu, 30 Apr 2020 09:20:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@efficios.com header.s=default header.b=HGgxy6OF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727104AbgD3QQW (ORCPT + 99 others); Thu, 30 Apr 2020 12:16:22 -0400 Received: from mail.efficios.com ([167.114.26.124]:38382 "EHLO mail.efficios.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726482AbgD3QQV (ORCPT ); Thu, 30 Apr 2020 12:16:21 -0400 Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id 8853028FB2F; Thu, 30 Apr 2020 12:16:20 -0400 (EDT) Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id xDaO_UPW64z2; Thu, 30 Apr 2020 12:16:20 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by mail.efficios.com (Postfix) with ESMTP id 2862728FB2E; Thu, 30 Apr 2020 12:16:20 -0400 (EDT) DKIM-Filter: OpenDKIM Filter v2.10.3 mail.efficios.com 2862728FB2E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=efficios.com; s=default; t=1588263380; bh=3peYxHoeMxttyPs/kK0oeQFnIA8w0ML3Z5mFQwUuuw4=; h=Date:From:To:Message-ID:MIME-Version; b=HGgxy6OFkI+B2RtYLYDdQuTTa12hQjnYMjWP4XHHzt99nrJfOJKd7p/4232pNS2M7 BCjUTQ8ggo2ws08s2w/ThMutUQmT2qdsOhSfz1qROwCzDm/ZCCUQLAfXAC3i1RoNAY fvhCZ8MprAIwkattoi3Vl09m7b3e3DUtVP659pFiHRhIce9mu7yxaA+WpxWbcm3SRx Va3Wyw3OSBlDA2mcoJJpfMV8jbqwDnJhJWBvyANajN+efL1gMzF/UPaL9wnRcL6x18 30tgT0g6kcMUCk8rIUSn2WvfY75abPj9OgNkEtW7lRZ8VZvsV3KWPoqyp2lw8ZL7vt okhQiAZIln0XQ== X-Virus-Scanned: amavisd-new at efficios.com Received: from mail.efficios.com ([127.0.0.1]) by localhost (mail03.efficios.com [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id cVUp0jkXj21a; Thu, 30 Apr 2020 12:16:20 -0400 (EDT) Received: from mail03.efficios.com (mail03.efficios.com [167.114.26.124]) by mail.efficios.com (Postfix) with ESMTP id 1614F28F79B; Thu, 30 Apr 2020 12:16:20 -0400 (EDT) Date: Thu, 30 Apr 2020 12:16:20 -0400 (EDT) From: Mathieu Desnoyers To: rostedt Cc: Joerg Roedel , linux-kernel , Ingo Molnar , Thomas Gleixner , Peter Zijlstra , Borislav Petkov , Andrew Morton , Shile Zhang , Andy Lutomirski , "Rafael J. Wysocki" , Dave Hansen , Tzvetomir Stoyanov Message-ID: <505666080.77869.1588263380070.JavaMail.zimbra@efficios.com> In-Reply-To: <20200430121136.6d7aeb22@gandalf.local.home> References: <20200429054857.66e8e333@oasis.local.home> <20200429105941.GQ30814@suse.de> <20200429082854.6e1796b5@oasis.local.home> <20200429100731.201312a9@gandalf.local.home> <20200430141120.GA8135@suse.de> <20200430121136.6d7aeb22@gandalf.local.home> Subject: Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke() MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [167.114.26.124] X-Mailer: Zimbra 8.8.15_GA_3918 (ZimbraWebClient - FF75 (Linux)/8.8.15_GA_3895) Thread-Topic: x86/mm: Sync all vmalloc mappings before text_poke() Thread-Index: lAOrEjXtq/pzs5xj09aTYCikDgN7cA== Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org ----- On Apr 30, 2020, at 12:11 PM, rostedt rostedt@goodmis.org wrote: > On Thu, 30 Apr 2020 16:11:21 +0200 > Joerg Roedel wrote: > >> Hi, >> >> On Wed, Apr 29, 2020 at 10:07:31AM -0400, Steven Rostedt wrote: >> > Talking with Mathieu about this on IRC, he pointed out that my code does >> > have a vzalloc() that is called: >> > >> > in trace_pid_write() >> > >> > pid_list->pids = vzalloc((pid_list->pid_max + 7) >> 3); >> > >> > This is done when -P1,2 is on the trace-cmd command line. >> >> Okay, tracked it down, some instrumentation in the page-fault and >> double-fault handler gave me the stack-traces. Here is what happens: >> >> As already pointed out, it all happens because of page-faults on the >> vzalloc'ed pid bitmap. It starts with this stack-trace: >> >> RIP: 0010:trace_event_ignore_this_pid+0x23/0x30 > > Interesting. Because that function is this: > > bool trace_event_ignore_this_pid(struct trace_event_file *trace_file) > { > struct trace_array *tr = trace_file->tr; > struct trace_array_cpu *data; > struct trace_pid_list *no_pid_list; > struct trace_pid_list *pid_list; > > pid_list = rcu_dereference_raw(tr->filtered_pids); > no_pid_list = rcu_dereference_raw(tr->filtered_no_pids); > > if (!pid_list && !no_pid_list) > return false; > > data = this_cpu_ptr(tr->array_buffer.data); > > return data->ignore_pid; > } > > Where it only sees if the pid masks exist. That is, it looks to see if > there's pointers to them, it doesn't actually touch the vmalloc'd area. > This check is to handle a race between allocating and deallocating the > buffers and setting the ignore_pid bit. The reading of these arrays is done > at sched_switch time, which sets or clears the ignore_pid field. > > That said, since this only happens on buffer instances (it does not trigger > on the top level instance, which uses the same code for the pid masks) > > Could this possibly be for the tr->array_buffer.data, which is allocated > with: > > allocate_trace_buffer() { > [..] > buf->data = alloc_percpu(struct trace_array_cpu); > > That is, the bug isn't the vmalloc being a problem, but perhaps the per_cpu > allocation. This would explain why this crashes with the buffer instance > and not with the top level instance. If it was related to the pid masks, > then it would trigger for either (because they act the same in allocating > at time of use). But when an instance is made, the tr->array_buffer.data is > created. Which for the top level happens at boot up and the pages would > have been synced long ago. But for a newly created instance, this happens > just before its used. This could possibly explain why it's not a problem > when doing it manually by hand, because the time between creating the > instance, and the time to start and stop the tracing, is long enough for > something to sync them page tables. > > tl;dr; It's not an issue with the vmalloc, it's an issue with per_cpu > allocations! Did I mention that alloc_percpu uses: static void *pcpu_mem_zalloc(size_t size, gfp_t gfp) { if (WARN_ON_ONCE(!slab_is_available())) return NULL; if (size <= PAGE_SIZE) return kzalloc(size, gfp); else return __vmalloc(size, gfp | __GFP_ZERO, PAGE_KERNEL); } So yeah, it's vmalloc'd memory when size > PAGE_SIZE. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com