From: Joel Fernandes
Date: Sat, 24 Mar 2018 23:29:42 -0700
Subject: Re: syzbot rcu/debugobjects warning
To: Thomas Gleixner
Cc: Paul McKenney, LKML, Todd Poynor, "open list:BPF (Safe dynamic programs and tools)", Ben Hutchings, Greg Kroah-Hartman, Guillaume Nault
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 23, 2018 at 1:41 PM, Thomas Gleixner wrote:
> On Fri, 23 Mar 2018, Joel Fernandes wrote:
>> On Fri, Mar 23, 2018 at 2:11 AM, Thomas Gleixner wrote:
>> > On Thu, 22 Mar 2018, Joel Fernandes wrote:
>> Sorry. Here is the raw crash log: https://pastebin.com/raw/puvh0cXE
>> (The kernel logs are toward the end with the above).
>
> And that is interesting:
>
> [ 150.629667]
> [ 150.631700] [] dump_stack+0xc1/0x128
> [ 150.637051] [] ? __debug_object_init+0x526/0xc40
> [ 150.643431] [] panic+0x1bc/0x3a8
> [ 150.648416] [] ? percpu_up_read_preempt_enable.constprop.53+0xd7/0xd7
> [ 150.656611] [] ? load_image_and_restore+0xf9/0xf9
> [ 150.663070] [] ? vprintk_default+0x1d/0x30
> [ 150.668925] [] ? __warn+0x1a9/0x1e0
> [ 150.674170] [] ? __debug_object_init+0x526/0xc40
> [ 150.680543] [] __warn+0x1c4/0x1e0
> [ 150.685614] [] warn_slowpath_null+0x2c/0x40
> [ 150.691972] [] __debug_object_init+0x526/0xc40
> [ 150.698174] [] ? debug_object_fixup+0x30/0x30
> [ 150.704283] [] debug_object_init_on_stack+0x19/0x20
> [ 150.710917] [] __wait_rcu_gp+0x93/0x1b0
> [ 150.716508] [] synchronize_rcu.part.65+0x101/0x110
> [ 150.723054] [] ? rcu_pm_notify+0xc0/0xc0
> [ 150.728735] [] ? __call_rcu.constprop.72+0x910/0x910
> [ 150.735459] [] ? __lock_is_held+0xa1/0xf0
> [ 150.741223] [] synchronize_rcu+0x27/0x90
>
> So this calls synchronize_rcu from a rcu callback. That's a nono. This is
> on the back of an interrupt in softirq context and __wait_rcu_gp() can
> sleep, which is obviously a bad idea in softirq context....
>
> Cc'ed netdev ....
>
> And that also explains the debug object splat because this is not running
> on the task stack. It's running on the softirq stack ....
>
> [ 150.746908] [] __l2tp_session_unhash+0x3d5/0x550
> [ 150.753281] [] ? __l2tp_session_unhash+0x1bf/0x550
> [ 150.759828] [] ? __local_bh_enable_ip+0x6a/0xd0
> [ 150.766123] [] ? l2tp_udp_encap_recv+0xd90/0xd90
> [ 150.772497] [] l2tp_tunnel_closeall+0x1e7/0x3a0
> [ 150.778782] [] l2tp_tunnel_destruct+0x30e/0x5a0
> [ 150.785067] [] ? l2tp_tunnel_destruct+0x1aa/0x5a0
> [ 150.791537] [] ? l2tp_tunnel_del_work+0x460/0x460
> [ 150.797997] [] __sk_destruct+0x53/0x570
> [ 150.803588] [] rcu_process_callbacks+0x898/0x1300
> [ 150.810048] [] ? rcu_process_callbacks+0x977/0x1300
> [ 150.816684] [] ? __sk_dst_check+0x240/0x240
> [ 150.822625] [] __do_softirq+0x206/0x951
> [ 150.828223] [] irq_exit+0x165/0x190
> [ 150.833557] [] smp_apic_timer_interrupt+0x7b/0xa0
> [ 150.840018] [] apic_timer_interrupt+0xa0/0xb0
> [ 150.846132]
> [ 150.848166] [] ? native_safe_halt+0x6/0x10
> [ 150.854036] [] ? trace_hardirqs_on+0xd/0x10
> [ 150.859973] [] default_idle+0x55/0x360
> [ 150.865478] [] arch_cpu_idle+0xa/0x10
> [ 150.870896] [] default_idle_call+0x36/0x60
> [ 150.876751] [] cpu_startup_entry+0x2b0/0x380
> [ 150.882787] [] ? cpu_in_idle+0x20/0x20
> [ 150.888291] [] ? clockevents_register_device+0x123/0x200
> [ 150.895358] [] start_secondary+0x303/0x3e0
> [ 150.901209] [] ? set_cpu_sibling_map+0x11f0/0x11f0

Thomas, thanks a lot. It appears this issue will not happen on mainline:
since commit 765924e362d1 ("l2tp: don't close sessions in
l2tp_tunnel_destruct()"), l2tp_tunnel_closeall() is no longer called from
l2tp_tunnel_destruct(). Judging from that commit message, one of the
motivations was to fix a scheduling-while-atomic issue.

However, applying this change to android-4.9 and/or 4.9 stable depends on
several other l2tp patches, and they aren't straightforward cherry-picks
from mainline (and I don't have much background with this driver).

v3.16.56 stable seems to be further along with l2tp than v4.9.89, in that
it at least has more of the upstream patches that the above patch depends
on adapted for it. Since this is also related to stable, I am CC'ing
Greg KH and Ben.

Here are some of the commits in 3.16 stable that I couldn't find applied
to v4.9 stable. The above fix lists the patches below as dependencies, so
they would need to be backported to stable. Also CC'ing Guillaume since he
authored the above-mentioned fix.

0c15ddabbcf l2tp: don't register sessions in l2tp_session_create()
a3c5d5b70f4e l2tp: fix race condition in l2tp_tunnel_delete
5b216e8dcda2 l2tp: prevent creation of sessions on terminated tunnels
76ff5e22f1e0 l2tp: hold tunnel while looking up sessions in l2tp_netlink
ceb8f6b23a38 l2tp: define parameters of l2tp_session_get*() as "const"
0295d020b63f l2tp: initialise session's refcount before making it reachable
29a77518927e l2tp: take reference on sessions being dumped
b301c9b7782f l2tp: take a reference on sessions used in genetlink handlers

By the way, I think the scheduling-while-atomic checks didn't show up
because the debugobjects warning caused a panic first, before they could
trigger.

- Joel

PS: There's also 12d656af4e3d2 ("l2tp: Avoid schedule while atomic in
exit_net"), which fixed a call to synchronize_rcu in the same
path/function, but there the caller originated from l2tp_exit_net. That
patch is already in the stable trees; I am just mentioning it here for
completeness.
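
For anyone repeating the comparison of stable branches described above:
backported commits get new hashes, so the matching has to go by subject
line rather than by hash. A minimal sketch in Python, assuming each input
is the text output of `git log --oneline` for one branch. The function
names and the sample target log are hypothetical, for illustration only.

```python
def parse_oneline(log_text):
    """Parse `git log --oneline` style text into (hash, subject) pairs."""
    entries = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The abbreviated hash is everything before the first space.
        commit, _, subject = line.partition(" ")
        entries.append((commit, subject))
    return entries


def missing_from(wanted_log, target_log):
    """Return entries of wanted_log whose subject is absent from target_log."""
    present = {subject for _, subject in parse_oneline(target_log)}
    return [(c, s) for c, s in parse_oneline(wanted_log) if s not in present]


# Two of the dependencies listed above.
WANTED = """\
a3c5d5b70f4e l2tp: fix race condition in l2tp_tunnel_delete
5b216e8dcda2 l2tp: prevent creation of sessions on terminated tunnels
"""

# Hypothetical target-branch log; the hash is a placeholder, not a real commit.
TARGET = """\
000000000000 l2tp: fix race condition in l2tp_tunnel_delete
"""

for commit, subject in missing_from(WANTED, TARGET):
    print(commit, subject)
```

Exact subject matching will miss backports whose subject was reworded, so
anything this reports as missing still needs a manual check.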