From: Ivan Babrou
Date: Wed, 3 Feb 2021 16:52:42 -0800
Subject: Re: BUG: KASAN: stack-out-of-bounds in unwind_next_frame+0x1df5/0x2650
To: Josh Poimboeuf
Cc: Peter Zijlstra, kernel-team, Ignat Korchagin, Hailong liu,
    Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrew Morton,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
    "H. Peter Anvin", Miroslav Benes, Julien Thierry, Jiri Slaby,
    kasan-dev@googlegroups.com, linux-mm@kvack.org, linux-kernel,
    Alasdair Kergon, Mike Snitzer, dm-devel@redhat.com,
    "Steven Rostedt (VMware)", Alexei Starovoitov, Daniel Borkmann,
    Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
    John Fastabend, KP Singh, Robert Richter, "Joel Fernandes (Google)",
    Mathieu Desnoyers, Linux Kernel Network Developers,
    bpf@vger.kernel.org, Alexey Kardashevskiy
References: <20210203190518.nlwghesq75enas6n@treble>
    <20210203232735.nw73kugja56jp4ls@treble>
    <20210204001700.ry6dpqvavcswyvy7@treble>
In-Reply-To: <20210204001700.ry6dpqvavcswyvy7@treble>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 3, 2021 at 4:17 PM Josh Poimboeuf wrote:
>
> On Wed, Feb 03, 2021 at 03:30:35PM -0800, Ivan Babrou wrote:
> > > > > Can you recreate with this patch, and add "unwind_debug" to the cmdline?
> > > > > It will spit out a bunch of stack data.
> > > >
> > > > Here's the tree I'm building:
> > > >
> > > > * https://github.com/bobrik/linux/tree/ivan/static-call-5.9
> > > >
> > > > It contains:
> > > >
> > > > * v5.9 tag as the base
> > > > * static_call-2020-10-12 tag
> > > > * dm-crypt patches to reproduce the issue with KASAN
> > > > * x86/unwind: Add 'unwind_debug' cmdline option
> > > > * tracepoint: Fix race between tracing and removing tracepoint
> > > >
> > > > The very same issue can be reproduced on 5.10.11 with no patches,
> > > > but I'm going with 5.9, since it boils down to static call changes.
> > > >
> > > > Here's the decoded stack from the kernel with unwind debug enabled:
> > > >
> > > > * https://gist.github.com/bobrik/ed052ac0ae44c880f3170299ad4af56b
> > > >
> > > > See my first email for the exact commands that trigger this.
> > >
> > > Thanks. Do you happen to have the original dmesg, before running it
> > > through the post-processing script?
> >
> > Yes, here it is:
> >
> > * https://gist.github.com/bobrik/8c13e6a02555fb21cadabb74cdd6f9ab
>
> It appears the unwinder is getting lost in crypto code. No idea what
> this has to do with static calls though. Or maybe you're seeing
> multiple issues.
>
> Does this fix it?

It does for the dm-crypt case! But so does the following commit in
5.11 (and 5.10.12):

* https://github.com/torvalds/linux/commit/ce8f86ee94?w=1

I stuck with the dm-crypt reproduction because it reproduces reliably.
We also have the following stack that doesn't touch any crypto:

* https://gist.github.com/bobrik/40e2559add2f0b26ae39da30dc451f1e

I cannot reproduce this one on demand; it took 2 days of uptime for it
to happen. Is there anything I can do to help diagnose it?

My goal is to enable multishot KASAN in our pre-production
environment, but currently it sometimes starves the TX queues on the
NIC: multiple reports about unwind_next_frame in a row in an interrupt
disable the network interface, which is not something we can tolerate.