From: Dmitry Vyukov
Date: Thu, 19 Nov 2020 15:36:09 +0100
Subject: Re: [PATCH v3] lockdep: Allow tuning tracing capacity constants.
To: Tetsuo Handa
Cc: syzkaller, Ingo Molnar, Will Deacon, Andrew Morton, LKML, Linus Torvalds, Peter Zijlstra
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Nov 19, 2020 at 3:30 PM Dmitry Vyukov wrote:
> >
> > On Thu, Nov 19, 2020 at 2:45 PM Tetsuo Handa wrote:
> > >
> > > On 2020/11/19 22:06, Dmitry Vyukov wrote:
> > > >>>>
> > > >>>> I am trying to reproduce this locally first.
> > > >>>> syzbot claims it can
> > > >>>> reproduce it with a number of much simpler reproducers (like spawn
> > > >>>> process, unshare, create socket):
> > > >>>> https://syzkaller.appspot.com/bug?id=8a18efe79140782a88dcd098808d6ab20ed740cc
> > > >>>>
> > > >>>> I see a drift, but it's very slow, so I only get to:
> > > >>>> direct dependencies: 22072 [max: 32768]
> > > >>>>
> > > >>>> But that's running a very uniform workload.
> > > >>>>
> > > >>>> However, when I tried to cat /proc/lockdep to see if there is anything
> > > >>>> fishy already, I got this (on c2e7554e1b85935d962127efa3c2a76483b0b3b6).
> > > >>>>
> > > >>>> Some missing locks?
> > >
> > > Not a TOMOYO bug. Maybe a lockdep bug.
> > >
> > > >
> > > > But I don't know if it's enough to explain the overflow or not...
> > > >
> > >
> > > Since you can't hit the limit locally, I guess we need to ask syzbot to
> > > run massive testcases.
> >
> > I am trying to test the code that will do this. Otherwise we will get
> > days-long round-trips for stupid bugs. These files are also quite
> > huge; I'm afraid they may not fit into storage.
> >
> > So far I get to at most:
> >
> > lock-classes: 2901 [max: 8192]
> > direct dependencies: 25574 [max: 32768]
> > dependency chains: 40605 [max: 65536]
> > dependency chain hlocks used: 176814 [max: 327680]
> > stack-trace entries: 258590 [max: 524288]
> >
> > with these worst offenders:
> >
> > # egrep "BD: [0-9]" /proc/lockdep
> > 00000000df5b6792 FD: 2 BD: 1235 -.-.: &obj_hash[i].lock
> > 000000005dfeb73c FD: 1 BD: 1236 ..-.: pool_lock
> > 00000000b86254b1 FD: 14 BD: 1111 -.-.: &rq->lock
> > 00000000866efb75 FD: 1 BD: 1112 ....: &cfs_b->lock
> > 000000006970cf1a FD: 2 BD: 1126 ----: tk_core.seq.seqcount
> > 00000000f49d95b0 FD: 3 BD: 1180 -.-.: &base->lock
> > 00000000ba3f8454 FD: 5 BD: 1115 -.-.: hrtimer_bases.lock
> > 00000000fb340f16 FD: 16 BD: 1030 -.-.: &p->pi_lock
> > 00000000c9f6f58c FD: 1 BD: 1114 -.-.: &per_cpu_ptr(group->pcpu, cpu)->seq
> > 0000000049d3998c FD: 1 BD: 1112 -.-.: &cfs_rq->removed.lock
> > 00000000fdf7f396 FD: 7 BD: 1112 -...: &rt_b->rt_runtime_lock
> > 0000000021aedb8d FD: 1 BD: 1113 -...: &rt_rq->rt_runtime_lock
> > 000000004e34c8d4 FD: 1 BD: 1112 ....: &cp->lock
> > 00000000b2ac5d96 FD: 1 BD: 1127 -.-.: pvclock_gtod_data
> > 00000000c5df4dc3 FD: 1 BD: 1031 ..-.: &tsk->delays->lock
> > 00000000fe623698 FD: 1 BD: 1112 -...: per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)
> >
> >
> > But the kernel continues to crash on different unrelated bugs...
>
>
> Here is one successful sample. How do we debug it? What should we be
> looking for?
>
> p.s. It's indeed huge: the full log was 11MB, so this probably won't be
> chewed by syzbot.
> Peter, are these [hex numbers] needed? Could we strip them during
> post-processing? At first sight they look like derivatives of the
> name.
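On the post-processing question: a minimal Python sketch of stripping the keys, assuming class lines in /proc/lockdep start with the %p-printed class key followed by the FD:/BD: counts, and dependency lines look like " -> [key] name". The helper name strip_lockdep_keys is made up for illustration, and the optional OPS: column is an assumption about some configs. Whether the keys are actually redundant is the open question to Peter; this only shows the mechanics.

```python
import re

# Assumed /proc/lockdep line shapes (keys are %p-hashed class key pointers):
#   class line:      "00000000df5b6792 FD: 2 BD: 1235 -.-.: &obj_hash[i].lock"
#   dependency line: " -> [00000000df5b6792] &obj_hash[i].lock"
# Assumption: on some configs an "OPS:" column may precede "FD:".
CLASS_KEY = re.compile(r"^[0-9a-f]+\s+(?=(?:OPS:|FD:))")
DEP_KEY = re.compile(r"-> \[[0-9a-f]+\] ")


def strip_lockdep_keys(line: str) -> str:
    """Hypothetical helper: drop the hex class key from one /proc/lockdep line."""
    # Remove the leading key only when it is followed by the counters column.
    line = CLASS_KEY.sub("", line, count=1)
    # Remove the bracketed key on "->" dependency lines, keeping the arrow.
    return DEP_KEY.sub("-> ", line)
```

Other lines (e.g. the lockdep_stats counters) pass through unchanged, and trailing newlines are preserved, so a saved dump could be filtered with something like sys.stdout.writelines(map(strip_lockdep_keys, sys.stdin)).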
The worst back-edge offenders are:

00000000b445a595 FD: 2 BD: 1595 -.-.: &obj_hash[i].lock
0000000055ae0468 FD: 1 BD: 1596 ..-.: pool_lock
00000000b1336dc4 FD: 2 BD: 1002 ..-.: &zone->lock
000000009a0cabce FD: 1 BD: 1042 ...-: &____s->seqcount
000000001f2849b5 FD: 1 BD: 1192 ..-.: depot_lock
00000000d044255b FD: 1 BD: 1038 -.-.: &n->list_lock
000000005868699e FD: 17 BD: 1447 -.-.: &rq->lock
00000000bb52ab59 FD: 1 BD: 1448 ....: &cfs_b->lock
000000004f442fff FD: 2 BD: 1469 ----: tk_core.seq.seqcount
00000000c908cc32 FD: 3 BD: 1512 -.-.: &base->lock
00000000478677cc FD: 5 BD: 1435 -.-.: hrtimer_bases.lock
00000000b5b65cb1 FD: 19 BD: 1255 -.-.: &p->pi_lock
000000007f313bd5 FD: 1 BD: 1451 -.-.: &per_cpu_ptr(group->pcpu, cpu)->seq
00000000bac5d8ed FD: 1 BD: 1004 ...-: &____s->seqcount#2
000000000f57e411 FD: 1 BD: 1448 -.-.: &cfs_rq->removed.lock
0000000013c1ab65 FD: 7 BD: 1449 -.-.: &rt_b->rt_runtime_lock
000000003bdf78f4 FD: 1 BD: 1450 -.-.: &rt_rq->rt_runtime_lock
00000000975d5b80 FD: 1 BD: 1448 ....: &cp->lock
000000002586e81b FD: 1 BD: 1471 -.-.: pvclock_gtod_data
00000000d03aed24 FD: 1 BD: 1275 ..-.: &tsk->delays->lock
000000001119414f FD: 1 BD: 1448 -...: per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)
000000006f3d793b FD: 6 BD: 1449 -.-.: &ctx->lock
00000000f3f0190c FD: 9 BD: 1448 -...: &rq->lock/1
000000007410cf1a FD: 1 BD: 1448 -...: &rd->rto_lock

There are 19 of them with ~1500 incoming edges each, so that's roughly 28K
(19 x ~1500). In my local testing I was at around 20-something K direct
dependencies, and these worst offenders were at ~1000 back edges. Now they
got to ~1500, so that is what got us over the 32K limit, right?

Does this analysis make sense? Any ideas what to do with these?
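To sanity-check the arithmetic, a small Python sketch that parses lines in the format shown above and counts and sums the BD column over a threshold. The regex, the LockClass type, and the threshold value are assumptions made for illustration. It also treats BD as "incoming edges" the way the text does, although lockdep may count transitively reachable classes in that column rather than only direct edges, so the sum is a rough proxy at best.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple

# Assumed shape of the lines above: "<hex key> FD: <n> BD: <n> <usage>: <class name>".
# The class name may contain spaces, e.g. "per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu)".
LINE_RE = re.compile(
    r"^\s*(?:[0-9a-f]+\s+)?FD:\s*(?P<fd>\d+)\s+BD:\s*(?P<bd>\d+)\s+\S*:\s+(?P<name>.+?)\s*$"
)


class LockClass(NamedTuple):
    name: str
    fd: int  # FD column, forward dependency count
    bd: int  # BD column, backward dependency count


def parse_line(line: str) -> Optional[LockClass]:
    """Parse one class line; return None for anything else (stats, '->' lines)."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return LockClass(m["name"], int(m["fd"]), int(m["bd"]))


def summarize(lines: Iterable[str], threshold: int) -> Tuple[int, int]:
    """Return (number of classes with BD >= threshold, sum of their BD values)."""
    hits = [c for c in map(parse_line, lines) if c is not None and c.bd >= threshold]
    return len(hits), sum(c.bd for c in hits)
```

Fed the 24 lines above with a threshold of 1200, it selects the 19 classes mentioned and their BD values sum to 27,542; raising the threshold to 1400 drops &p->pi_lock and &tsk->delays->lock, leaving 17.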