From: Thomas Gleixner
To: LKML
Cc: Peter Zijlstra, Steven Rostedt, Masami Hiramatsu, Alexei Starovoitov,
    Mathieu Desnoyers, "Paul E. McKenney", Joel Fernandes, Frederic Weisbecker
Subject: Instrumentation and RCU
Date: Mon, 09 Mar 2020 18:02:32 +0100
Message-ID: <87mu8p797b.fsf@nanos.tec.linutronix.de>

Folks,

I'm starting a new conversation because there are about 20 different
threads which look at that problem in various ways and the information
is so scattered that creating a coherent picture is pretty much
impossible.

There are several problems to solve:

   1) Fragile low level entry code

   2) Breakpoint utilization

   3) RCU idle

   4) Callchain protection

#1) Fragile low level entry code

   While I understand the desire of instrumentation to observe
   everything, we really have to ask the question whether it is worth
   the trouble, especially with entry trainwrecks like x86, PTI and
   other horrors in that area.

   I don't think so and we really should just bite the bullet and
   forbid any instrumentation in that code unless it is explicitly
   designed for that case, makes sense and has a real value from an
   observation perspective.

   This is very much related to #3.

#2) Breakpoint utilization

   As recent findings have shown, breakpoint utilization needs to be
   extremely careful about not creating infinite breakpoint recursions.

   I think that's pretty much obvious, but falls into the overall
   question of how to protect callchains.

#3) RCU idle

   Being able to trace code inside RCU idle sections is very similar to
   the question raised in #1.

   Assume all of the instrumentation would be doing conditional RCU
   schemes, i.e.:

   if (rcuidle)
	....
   else
	rcu_read_lock_sched()

   before invoking the actual instrumentation functions and of course
   undoing that right after it, that really begs the question whether
   it's worth it.

   Especially constructs like:

   trace_hardirqs_off()
	idx = srcu_read_lock()
	rcu_irq_enter_irqson();
	...
	rcu_irq_exit_irqson();
	srcu_read_unlock(idx);

   if (user_mode)
	user_exit_irqsoff();
   else
	rcu_irq_enter();

   are really more than questionable. For 99.9999% of instrumentation
   users it's absolutely irrelevant whether this traces the interrupt
   disabled time of user_exit_irqsoff() or rcu_irq_enter() or not.

   But what's relevant is the tracer overhead which is e.g. inflicted
   with today's trace_hardirqs_off/on() implementation because that
   unconditionally uses the rcuidle variant with the srcu/rcu_irq dance
   around every tracepoint.

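   To make the shape of that overhead concrete, here is a minimal
   user-space sketch. Everything in it is an assumption for
   illustration: the functions are counting stubs that only borrow the
   names of the kernel helpers mentioned above, tracer_body() and
   rcu_ops_per_hit() are hypothetical, and pairing
   rcu_read_lock_sched() with rcu_read_unlock_sched() is my reading of
   "undoing that right after it". It counts bookkeeping calls per
   tracepoint hit, nothing more.

```c
/*
 * User-space model only. Each stub stands in for the kernel helper of
 * the same name and merely counts calls; none of this is kernel code.
 */
#include <assert.h>
#include <stdbool.h>

static int rcu_ops;      /* RCU/SRCU bookkeeping calls made so far */
static int tracer_calls; /* how often the (hypothetical) tracer body ran */

static int  srcu_read_lock(void)        { rcu_ops++; return 0; }
static void srcu_read_unlock(int idx)   { (void)idx; rcu_ops++; }
static void rcu_irq_enter_irqson(void)  { rcu_ops++; }
static void rcu_irq_exit_irqson(void)   { rcu_ops++; }
static void rcu_read_lock_sched(void)   { rcu_ops++; }
static void rcu_read_unlock_sched(void) { rcu_ops++; }

/* Hypothetical placeholder for the actual instrumentation function. */
static void tracer_body(void) { tracer_calls++; }

/* The rcuidle variant: the srcu/rcu_irq dance around the tracer. */
static void trace_rcuidle(void)
{
	int idx = srcu_read_lock();

	rcu_irq_enter_irqson();
	tracer_body();
	rcu_irq_exit_irqson();
	srcu_read_unlock(idx);
}

/* Conditional scheme: only take the expensive path when RCU is idle. */
static void trace_conditional(bool rcuidle)
{
	if (rcuidle) {
		trace_rcuidle();
	} else {
		rcu_read_lock_sched();
		tracer_body();
		rcu_read_unlock_sched();
	}
}

/*
 * Hypothetical helper: number of bookkeeping calls for one tracepoint
 * hit. With 'unconditional' set, the rcuidle variant is always used,
 * regardless of whether RCU is actually idle.
 */
int rcu_ops_per_hit(bool unconditional, bool rcuidle)
{
	int before = tracer_calls;

	rcu_ops = 0;
	if (unconditional)
		trace_rcuidle();
	else
		trace_conditional(rcuidle);

	/* The tracer itself must have run exactly once either way. */
	assert(tracer_calls == before + 1);
	return rcu_ops;
}
```

   In this model a hit costs four bookkeeping calls whenever the
   rcuidle variant is used and two otherwise, so the unconditional
   scheme pays the higher price even when RCU is watching. The counts
   say nothing about real cycle costs; they only show where the extra
   work comes from.
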
   Even if the tracepoint sits in the ASM code, it just covers about 20
   more low level ASM instructions. The tracer invocation, which is
   even done twice when coming from user space on x86 (the second call
   is optimized in the tracer C-code), definitely costs way more
   cycles. When you take the srcu/rcu_irq dance into account, it's a
   complete disaster performance-wise.

#4) Protecting call chains

   Our current approach of annotating functions with notrace/noprobe is
   pretty much broken.

   Functions which are marked NOPROBE or notrace call out into
   functions which are not marked, and while this might be ok, there
   are enough places where it is not. But we have no way to verify
   that.

   That's just a recipe for disaster. We really cannot ask sysadmins
   who want to use instrumentation to stare at the code first to figure
   out whether they can place/enable an instrumentation point
   somewhere. That'd be just a bad joke.

   I really think we need to have proper text sections which are off
   limits for any form of instrumentation and have tooling to analyze
   the calls into other sections. These calls need to be annotated as
   safe and intentional.

Thoughts?

Thanks,

	tglx