From: Andrey Konovalov
Date: Sun, 22 May 2022 00:30:52 +0200
Subject: Re: [PATCH v3 0/3] kasan, arm64, scs: collect stack traces from Shadow Call Stack
To: Mark Rutland
Cc: andrey.konovalov@linux.dev, Marco Elver, Alexander Potapenko,
    Dmitry Vyukov, Andrey Ryabinin, kasan-dev, Catalin Marinas,
    Will Deacon, Vincenzo Frascino, Sami Tolvanen, Linux ARM,
    Peter Collingbourne, Evgenii Stepanov, Florian Mayer, Andrew Morton,
    Linux Memory Management List, LKML, Andrey Konovalov
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 14, 2022 at 2:37 PM Mark Rutland wrote:
>

Hi Mark,

Sorry for the delayed response, it took some time getting my hands on
hardware for testing these changes.

> Just to be clear: QEMU TCG mode is *in no way* representative of HW
> performance, and has drastically different performance characteristics
> compared to real HW. Please be very clear when you are quoting
> performance figures from QEMU TCG mode.
>
> Previously you said you were trying to optimize this so that some
> version of KASAN could be enabled in production builds, and the above is
> not a suitable benchmark system for that.

Understood.

My expectation was that performance numbers from QEMU would be close to
hardware. I knew that there are instructions that take longer to be
emulated, but I expected that they would be uniformly spread across the
code. However, your explanation proved this wrong. This indeed doesn't
apply when measuring the performance of a piece of code with a different
density of function calls.

Thank you for the detailed explanation! Those QEMU arguments will
definitely be handy when I need a faster QEMU setup.

> Is that *actually* what you're trying to enable, or are you just trying
> to speed up running instances under QEMU (e.g. for arm64 Syzkaller runs
> on GCE)?

No, I'm not trying to speed up QEMU. QEMU was just the only setup that I
had access to at that moment.

The goal is to allow enabling stack trace collection in production on
HW_TAGS-enabled devices once those are created.

[...]

> While the SCS unwinder is still faster, the difference is nowhere near
> as pronounced. As I mentioned before, there are changes that we can make
> to the regular unwinder to close that gap somewhat, some of which I
> intend to make as part of ongoing cleanup/rework in that area.

I tried running the same experiments on Pixel 6.
Unfortunately, I was only able to test the OUTLINE SW_TAGS mode (without
STACK instrumentation, as HW_TAGS doesn't support STACK at the moment).
All of the other modes either fail to flash or fail to boot with AOSP on
Pixel 6 :(

The results are (timestamps were measured when "ALSA device list" was
printed to the kernel log):

sw_tags outline nostacks:             2.218
sw_tags outline:                      2.516 (+13.4%)
sw_tags outline nosanitize:           2.364 (+6.5%)
sw_tags outline nosanitize __set_bit: 2.364 (+6.5%)
sw_tags outline nosanitize scs:       2.236 (+0.8%)

Used markings:

nostacks: patch from master-no-stack-traces applied
nosanitize: KASAN_SANITIZE_stacktrace.o := n
__set_bit: set_bit -> __set_bit change applied
scs: patches from up-scs-stacks-v3 applied

First, disabling instrumentation of stacktrace.c is indeed a great idea
for software KASAN modes! I will send a patch for this later.

Changing set_bit to __set_bit seems to make no difference on Pixel 6.

The awesome part is that the overhead of collecting stack traces with
SCS and even saving them into the stack depot is less than 1% relative
to the nostacks baseline.

However, once again, note that this is for OUTLINE SW_TAGS without STACK.

> I haven't bothered testing HW_TAGS, because the performance
> characteristics of emulated MTE are also nothing like that of a real HW
> implementation.
>
> So, given that and the problems I mentioned before, I don't think
> there's a justification for adding a separate SCS unwinder. As before,
> I'm still happy to try to make the regular unwinder faster (and I'm
> happy to make changes which benefit QEMU TCG mode if those don't harm
> the maintainability of the unwinder).
>
> NAK to adding an SCS-specific unwinder, regardless of where in the
> source tree that is placed.

I see.

Perhaps it makes sense to wait until there's HW_TAGS-enabled hardware
available before continuing to look into this. In the end, the
performance overhead for that setup is what matters.

I'll look into improving the performance of the existing unwinder a bit more.
However, I don't think I'll be able to get its overhead below 1%, which
means that we'll likely need a sample-based approach for HW_TAGS stack
collection to reduce the overhead.

Thank you!
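
To illustrate what a sample-based approach could mean, here is a minimal
userspace sketch in C. It is only an assumption about one possible
scheme (collect a stack trace for one out of every `interval` events),
not the design of any existing patch. The names stack_sampler,
stack_sampler_init, stack_sampler_should_collect and stack_sampler_count
are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sampler state; not a kernel structure. */
struct stack_sampler {
	uint32_t interval;	/* collect one trace per `interval` events */
	uint32_t countdown;	/* events left until the next collection */
};

void stack_sampler_init(struct stack_sampler *s, uint32_t interval)
{
	assert(interval > 0);
	s->interval = interval;
	s->countdown = interval;
}

/* Returns true when a stack trace should be collected for this event. */
bool stack_sampler_should_collect(struct stack_sampler *s)
{
	if (--s->countdown > 0)
		return false;
	s->countdown = s->interval;
	return true;
}

/* Counts how many of `events` events would have a trace collected. */
uint32_t stack_sampler_count(uint32_t interval, uint32_t events)
{
	struct stack_sampler s;
	uint32_t collected = 0;

	stack_sampler_init(&s, interval);
	for (uint32_t i = 0; i < events; i++)
		if (stack_sampler_should_collect(&s))
			collected++;
	return collected;
}
```

With an interval of N, roughly one in N events pays the unwinding cost,
so the collection overhead would be expected to shrink by about that
factor, at the price of some reports lacking a stack trace. A real
implementation would also have to deal with concurrency (for example,
per-CPU counters), which this sketch ignores.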