From: Dmitry Vyukov
Date: Thu, 9 Jan 2020 11:05:13 +0100
Subject: Re: INFO: rcu detected stall in sys_kill
To: Casey Schaufler, Daniel Axtens, Alexander Potapenko, clang-built-linux
Cc: Tetsuo Handa, syzbot, kasan-dev, Andrew Morton, LKML, syzkaller-bugs
References: <00000000000036decf0598c8762e@google.com>
 <87a787ekd0.fsf@dja-thinkpad.axtens.net>
 <87h81zax74.fsf@dja-thinkpad.axtens.net>
 <0b60c93e-a967-ecac-07e7-67aea1a0208e@I-love.SAKURA.ne.jp>
 <6d009462-74d9-96e9-ab3f-396842a58011@schaufler-ca.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 9, 2020 at 10:29 AM Dmitry Vyukov wrote:
> > > >
> > > > On 1/8/2020 2:25 AM, Tetsuo Handa wrote:
> > > > > On 2020/01/08 15:20, Dmitry Vyukov wrote:
> > > > >> I temporarily re-enabled smack instance and it produced another 50
> > > > >> stalls all over the kernel, and now keeps spewing a dozen every hour.
> > > >
> > > > Do I have to be using clang to test this? I'm setting up to work on this,
> > > > and don't want to waste time using my current tool chain if the problem
> > > > is clang specific.
> > >
> > > Humm, interesting. Initially I was going to say that most likely it's
> > > not clang-related. But smack instance is actually the only one that
> > > uses clang as well (except for KMSAN of course).
> > > So maybe it's indeed clang-related rather than smack-related. Let me try to
> > > build a kernel with clang.
> >
> > +clang-built-linux, glider
> >
> > [clang-built linux is severely broken since early Dec]
> >
> > Building kernel with clang I can immediately reproduce this locally:
> >
> > $ syz-manager
> > 2020/01/09 09:27:15 loading corpus...
> > 2020/01/09 09:27:17 serving http on http://0.0.0.0:50001
> > 2020/01/09 09:27:17 serving rpc on tcp://[::]:45851
> > 2020/01/09 09:27:17 booting test machines...
> > 2020/01/09 09:27:17 wait for the connection from test machine...
> > 2020/01/09 09:29:23 machine check:
> > 2020/01/09 09:29:23 syscalls                : 2961/3195
> > 2020/01/09 09:29:23 code coverage           : enabled
> > 2020/01/09 09:29:23 comparison tracing      : enabled
> > 2020/01/09 09:29:23 extra coverage          : enabled
> > 2020/01/09 09:29:23 setuid sandbox          : enabled
> > 2020/01/09 09:29:23 namespace sandbox       : enabled
> > 2020/01/09 09:29:23 Android sandbox         : /sys/fs/selinux/policy does not exist
> > 2020/01/09 09:29:23 fault injection         : enabled
> > 2020/01/09 09:29:23 leak checking           : CONFIG_DEBUG_KMEMLEAK is not enabled
> > 2020/01/09 09:29:23 net packet injection    : enabled
> > 2020/01/09 09:29:23 net device setup        : enabled
> > 2020/01/09 09:29:23 concurrency sanitizer   : /sys/kernel/debug/kcsan does not exist
> > 2020/01/09 09:29:23 devlink PCI setup       : PCI device 0000:00:10.0 is not available
> > 2020/01/09 09:29:27 corpus                  : 50226 (0 deleted)
> > 2020/01/09 09:29:27 VMs 20, executed 0, cover 0, crashes 0, repro 0
> > 2020/01/09 09:29:37 VMs 20, executed 45, cover 0, crashes 0, repro 0
> > 2020/01/09 09:29:47 VMs 20, executed 74, cover 0, crashes 0, repro 0
> > 2020/01/09 09:29:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:27 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:37 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:47 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:30:57 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:31:07 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:31:17 VMs 20, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:31:26 vm-10: crash: INFO: rcu detected stall in do_idle
> > 2020/01/09 09:31:27 VMs 13, executed 80, cover 0, crashes 0, repro 0
> > 2020/01/09 09:31:28 vm-1: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:29 vm-4: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:31 vm-0: crash: INFO: rcu detected stall in sys_getsockopt
> > 2020/01/09 09:31:33 vm-18: crash: INFO: rcu detected stall in sys_clone3
> > 2020/01/09 09:31:35 vm-3: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:36 vm-8: crash: INFO: rcu detected stall in do_idle
> > 2020/01/09 09:31:37 VMs 7, executed 80, cover 0, crashes 6, repro 0
> > 2020/01/09 09:31:38 vm-19: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:40 vm-6: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:42 vm-2: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:44 vm-12: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:46 vm-15: crash: INFO: rcu detected stall in sys_nanosleep
> > 2020/01/09 09:31:47 VMs 1, executed 80, cover 0, crashes 11, repro 0
> > 2020/01/09 09:31:48 vm-16: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:50 vm-9: crash: INFO: rcu detected stall in schedule
> > 2020/01/09 09:31:52 vm-13: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:54 vm-11: crash: INFO: rcu detected stall in schedule_tail
> > 2020/01/09 09:31:56 vm-17: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:31:57 VMs 0, executed 80, cover 0, crashes 16, repro 0
> > 2020/01/09 09:31:58 vm-7: crash: INFO: rcu detected stall in sys_futex
> > 2020/01/09 09:32:00 vm-5: crash: INFO: rcu detected stall in dput
> > 2020/01/09 09:32:02 vm-14: crash: INFO: rcu detected stall in sys_nanosleep
> >
> >
> > Then I switched LSM to selinux and I _still_ can reproduce this. So,
> > Casey, you may relax, this is not smack-specific :)
> >
> > Then I disabled CONFIG_KASAN_VMALLOC and CONFIG_VMAP_STACK and it
> > started working normally.
> >
> > So this is somehow related to both clang and KASAN/VMAP_STACK.
> >
> > The clang I used is:
> > https://storage.googleapis.com/syzkaller/clang-kmsan-362913.tar.gz
> > (the one we use on syzbot).
>
>
> Clustering the hangs, they all happen within a very limited section of the code:
>
>       1 free_thread_stack+0x124/0x590 kernel/fork.c:284
>       5 free_thread_stack+0x12e/0x590 kernel/fork.c:280
>      39 free_thread_stack+0x12e/0x590 kernel/fork.c:284
>       6 free_thread_stack+0x133/0x590 kernel/fork.c:280
>       5 free_thread_stack+0x13d/0x590 kernel/fork.c:280
>       2 free_thread_stack+0x141/0x590 kernel/fork.c:280
>       6 free_thread_stack+0x14c/0x590 kernel/fork.c:280
>       9 free_thread_stack+0x151/0x590 kernel/fork.c:280
>       3 free_thread_stack+0x15b/0x590 kernel/fork.c:280
>      67 free_thread_stack+0x168/0x590 kernel/fork.c:280
>       6 free_thread_stack+0x16d/0x590 kernel/fork.c:284
>       2 free_thread_stack+0x177/0x590 kernel/fork.c:284
>       1 free_thread_stack+0x182/0x590 kernel/fork.c:284
>       1 free_thread_stack+0x186/0x590 kernel/fork.c:284
>      16 free_thread_stack+0x18b/0x590 kernel/fork.c:284
>       4 free_thread_stack+0x195/0x590 kernel/fork.c:284
>
> Here is the disassembly of the function:
> https://gist.githubusercontent.com/dvyukov/a283d1aaf2ef7874001d56525279ccbd/raw/ac2478bff6472bc473f57f91a75f827cd72bb6bf/gistfile1.txt
>
> But if I am not mistaken, the function only ever jumps down. So how
> can it loop?...

This is a miscompilation related to static branches.

objdump shows:

ffffffff814878f8:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
./arch/x86/include/asm/jump_label.h:25
asm_volatile_goto("1:"

However, the actual instruction in memory at the time is:

   0xffffffff814878f8 <+408>: jmpq   0xffffffff8148787f

Which jumps to a wrong location in free_thread_stack and makes it loop.

The static branch is this:

static inline bool memcg_kmem_enabled(void)
{
	return static_branch_unlikely(&memcg_kmem_enabled_key);
}

static inline void memcg_kmem_uncharge(struct page *page, int order)
{
	if (memcg_kmem_enabled())
		__memcg_kmem_uncharge(page, order);
}

I suspect it may have something to do with loop unrolling. It may jump
to the right location, but in the wrong unrolled iteration.