Date: Mon, 6 May 2019 20:10:14 -0400
From: Steven Rostedt
To: Linus Torvalds
Cc: Peter Zijlstra, Andy Lutomirski, Linux List Kernel Mailing,
 Ingo Molnar, Andrew Morton, Andy Lutomirski, Nicolai Stange,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Peter Anvin" , "the arch/x86 maintainers" , Josh Poimboeuf , Jiri Kosina , Miroslav Benes , Petr Mladek , Joe Lawrence , Shuah Khan , Konrad Rzeszutek Wilk , Tim Chen , Sebastian Andrzej Siewior , Mimi Zohar , Juergen Gross , Nick Desaulniers , Nayna Jain , Masahiro Yamada , Joerg Roedel , "open list:KERNEL SELFTEST FRAMEWORK" , stable , Masami Hiramatsu Subject: Re: [RFC][PATCH 1/2] x86: Allow breakpoints to emulate call functions Message-ID: <20190506201014.484e7b65@oasis.local.home> In-Reply-To: References: <20190502181811.GY2623@hirez.programming.kicks-ass.net> <20190503092247.20cc1ff0@gandalf.local.home> <2045370D-38D8-406C-9E94-C1D483E232C9@amacapital.net> <20190506081951.GJ2606@hirez.programming.kicks-ass.net> <20190506095631.6f71ad7c@gandalf.local.home> <20190506130643.62c35eeb@gandalf.local.home> <20190506145745.17c59596@gandalf.local.home> <20190506162915.380993f9@gandalf.local.home> <20190506174511.2f8b696b@gandalf.local.home> X-Mailer: Claws Mail 3.17.3 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 6 May 2019 15:31:57 -0700 Linus Torvalds wrote: > On Mon, May 6, 2019 at 3:06 PM Linus Torvalds > wrote: > > > > Why are you emulating something different than what you are rewriting? > > Side note: I'm also finding another bug on the ftrace side, which is a > simple race condition. > > In particular, the logic with 'modifying_ftrace_code' is fundamentally racy. > > What can happen is that on one CPU we rewrite one instruction: > > ftrace_update_func = ip; > /* Make sure the breakpoints see the ftrace_update_func update */ > smp_wmb(); > > /* See comment above by declaration of modifying_ftrace_code */ > atomic_inc(&modifying_ftrace_code); > > ret = ftrace_modify_code(ip, old, new); > > atomic_dec(&modifying_ftrace_code); > > but then another CPU hits the 'int3' while the modification is > going on, and takes the fault. > > The fault handler does that > > if (unlikely(atomic_read(&modifying_ftrace_code)).. > > and sees that "yes, it's in the middle of modifying the ftrace code", > and calls ftrace_int3_handler(). All good and "obviously correct" so > far, no? > > HOWEVER. It's actually buggy. Because in the meantime, the CPU that > was rewriting instructions has finished, and decrements the > modifying_ftrace_code, which doesn't hurt us (because we already saw > that the int3 was due to the modification. But the CPU that was rewriting instructions does a run_sync() after removing the int3: static void run_sync(void) { int enable_irqs; /* No need to sync if there's only one CPU */ if (num_online_cpus() == 1) return; enable_irqs = irqs_disabled(); /* We may be called with interrupts disabled (on bootup). */ if (enable_irqs) local_irq_enable(); on_each_cpu(do_sync_core, NULL, 1); if (enable_irqs) local_irq_disable(); } Which sends an IPI to all CPUs to make sure they no longer see the int3. > > BUT! There are two different races here: > > (a) maybe the fault handling was slow, and we saw the 'int3' and took > the fault, but the modifying CPU had already finished, so that > atomic_read(&modifying_ftrace_code) didn't actually trigger at all. 
>
> (b) maybe the int3-faulting CPU *did* see the proper value of
> modifying_ftrace_code, but the modifying CPU went on and started
> *another* modification, and has changed ftrace_update_func in the
> meantime, so now the int3 handling is looking at the wrong values!
>
> In the case of (a), we'll die with an oops due to the inexplicable
> 'int3' we hit. And in the case of (b) we'll be fixing up using the
> wrong address.
>
> Things like this is why I'm wondering how much of the problems are due
> to the entry code, and how much of it is due to simply races and
> timing differences?
>
> Again, I don't actually know the ftrace code, and maybe I'm missing
> something, but this really looks like _another_ fundamental bug.
>
> The way to handle that modifying_ftrace_code thing is most likely by
> using a sequence counter. For example, one way to actually do
> something like this might be
>
>         ftrace_update_func = ip;
>         ftrace_update_target = func;
>         smp_wmb();
>         atomic_inc(&modifying_ftrace_head);
>
>         ret = ftrace_modify_code(ip, old, new);
>
>         atomic_inc(&modifying_ftrace_tail);
>         smp_wmb();
>
> and now the int3 code could do something like
>
>         int head, tail;
>
>         head = atomic_read(&modifying_ftrace_head);
>         smp_rmb();
>         tail = atomic_read(&modifying_ftrace_tail);
>
>         /* Are we still in the process of modification? */
>         if (unlikely(head != tail+1))
>                 return 0;
>
>         ip = ftrace_update_func;
>         func = ftrace_update_target;
>         smp_rmb();
>         /* Need to re-check that the above two values are consistent
>            and we didn't exit */
>         if (atomic_read(&modifying_ftrace_tail) != tail)
>                 return 0;
>
>         *pregs int3_emulate_call(regs, ip, func);
>         return 1;
>
> although it probably really would be better to use a seqcount instead
> of writing it out like the above.
>
> NOTE! The above only fixes the (b) race. The (a) race is probably best
> handled by actually checking if the 'int3' instruction is still there
> before dying.
>
> Again, maybe there's something I'm missing, but having looked at that
> patch now what feels like a million times, I'm finding more worrisome
> things in the ftrace code than in the kernel entry code..

I think you are missing the run_sync(), which is the heavy hammer to
make sure all CPUs are in sync. And this is done at each stage:

	add int3
	run_sync()
	update call site outside of int3
	run_sync()
	remove int3
	run_sync()

HPA said that the last run_sync() isn't needed, but I kept it because I
wanted to make sure. Looks like your analysis shows that it is needed.

-- Steve
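
Below is a minimal, self-contained sketch of the head/tail scheme Linus
outlines above, written with C11 atomics instead of the kernel's atomic_t
and smp_wmb()/smp_rmb() primitives. The fake_modify_code() and
fake_int3_handler() names, the trap_ip check, and the standalone layout are
assumptions made for illustration only; they are not the real ftrace/x86
code, and seq_cst atomics are used throughout so the ordering is at least
as strong as the barriers in the outline.

/*
 * Sketch only: models the head/tail counters from the discussion above.
 * The real code would use atomic_t, explicit barriers and
 * int3_emulate_call(); everything here is a simplified stand-in.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_int modifying_ftrace_head;
static atomic_int modifying_ftrace_tail;
static _Atomic uintptr_t ftrace_update_func;   /* ip being rewritten   */
static _Atomic uintptr_t ftrace_update_target; /* func to emulate-call */

/* Writer: one instruction rewrite, bracketed by head/tail increments. */
static void fake_modify_code(uintptr_t ip, uintptr_t func)
{
	atomic_store(&ftrace_update_func, ip);
	atomic_store(&ftrace_update_target, func);
	/* head++ announces "modification in progress" after ip/func are visible */
	atomic_fetch_add(&modifying_ftrace_head, 1);

	/* ... install int3, sync cores, rewrite the call, remove int3 ... */

	/* tail++ announces "modification finished" */
	atomic_fetch_add(&modifying_ftrace_tail, 1);
}

/*
 * Reader (the int3 handler): handle the trap only if a modification is
 * still in flight and the recorded ip/func did not change under us.
 * Returns true when the caller should emulate a call to *target.
 */
static bool fake_int3_handler(uintptr_t trap_ip, uintptr_t *target)
{
	int head = atomic_load(&modifying_ftrace_head);
	int tail = atomic_load(&modifying_ftrace_tail);

	/* Race (a): no modification in flight; this int3 is not ours. */
	if (head != tail + 1)
		return false;

	uintptr_t ip   = atomic_load(&ftrace_update_func);
	uintptr_t func = atomic_load(&ftrace_update_target);

	/* Race (b): the writer moved on; the ip/func we read are stale. */
	if (atomic_load(&modifying_ftrace_tail) != tail)
		return false;

	if (trap_ip != ip)
		return false;

	*target = func;
	return true;
}

The point of the double read of modifying_ftrace_tail is the same as in
the outline: if the writer finished and started another rewrite between
our two reads, the check fails and we refuse to fix up with stale values.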