From: NeilBrown
To: Linus Torvalds, Fengguang Wu, Andrey Ryabinin
Cc: Oleg Drokin, Andreas Dilger, James Simmons, Greg Kroah-Hartman, Denis Petrovic, lustre-devel@lists.lustre.org, Staging subsystem List, Linux Kernel Mailing List, LKP
Date: Thu, 19 Apr 2018 08:38:10 +1000
Subject: Re: [cfs_trace_lock_tcd] BUG: KASAN: null-ptr-deref in cfs_trace_lock_tcd+0x25/0xeb
References: <20180418133831.uef7d77ejdyjtxgh@wfg-t540p.sh.intel.com> <20180418134058.l3orjjxcpv7cxjfw@wfg-t540p.sh.intel.com>
Message-ID: <87k1t4qgx9.fsf@notabene.neil.brown.name>

On Wed, Apr 18 2018, Linus Torvalds wrote:
> Ugh, that lustre code is disgusting.
>
> I thought we were getting rid of it.

Lots of people seem to get value out of it. So we're trying to polish
the code to make it less disgusting.

This is just a little fall-out. The smoking gun is

  [ 6.528851] LNetError: 1:0:(module.c:546:libcfs_init()) misc_register: error -16

lustre registers a misc char device with the same minor number as
USERIO. If they both try to register, one fails.

Until recently, lustre could only be built as a module, so when lustre
failed to register the char dev, the module load failed. Now it can be
built monolithically (which makes my testing easier) and the failure
mode is different. The module that tried to register the chardev rewinds
some initialization, a subsequent module assumes that init was done, and
it explodes.

There are patches in Greg's inbox to change lustre to use a dynamically
allocated minor. And it is on my todo list to get lustre to do less
initialization at module-init time (where, in a monolithic build, it is
hard to give up if some previous module failed), and more at mount time.

So this is a known bug (maybe a new manifestation) and a fix has been
posted. There is certainly room for lots more cleanup, and that is
slowly happening. I'll make a note to look into the large stack frames
you observed.

The previous report of this bug was:
  Subject: [staging] 184ecc5ceb: BUG:unable_to_handle_kernel
  Message-ID: <20180319091931.gt6ijdw7ahkbtvrq@inn>

Thanks,
NeilBrown

>
> Anyway, I started looking at why the stack trace is such an incredible
> mess, with lots of stale entries.
>
> The reason (well, _one_ reason) seems to be "ksocknal_startup". It has
> a 500-byte stack frame for some incomprehensible reason. I assume due
> to excessive inlining, because the function itself doesn't seem to be
> that bad.
>
> Similarly, LNetNIInit has a 300-byte stack frame. So it gets pretty deep.
>
> I'm getting the feeling that KASAN is making things worse because
> probably it's disabling all the sane stack frame stuff (ie no merging
> of stack slot entries, perhaps?).
>
> Without KASAN (but also without a lot of other things, so I might be
> blaming KASAN incorrectly), the stack usage of ksocknal_startup() is
> just under 100 bytes, so if it is KASAN, it's really a big difference.
>
> Anyway, apart from the excessive elements, the report seems fine, but
> I'm adding Neil Brown to the cc, since he's the one that has been
> making most of the lustre/lnet changes this merge window.
>
> Also adding Andrey to check about the oddly large stack usage.
>
> Not including the whole email with the attachments - Neil, it's on
> lkml and lustre-devel if you hadn't seen it.
>
> Linus
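The minor-number clash Neil describes (two drivers asking misc_register() for the same fixed minor, the loser getting error -16, i.e. -EBUSY) can be modelled in a few lines of userspace C. This is a simplified sketch under stated assumptions, not kernel code: every sim_-prefixed name is hypothetical, and only MISC_DYNAMIC_MINOR borrows a real kernel macro name. It shows why a hard-coded minor collides while a dynamically allocated minor, the approach the patches in Greg's inbox are said to take, does not.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Userspace model of misc char-device minor allocation; NOT kernel code.
 * MISC_DYNAMIC_MINOR (255) mirrors the macro in include/linux/miscdevice.h;
 * all sim_* names are hypothetical stand-ins. */
#define MISC_DYNAMIC_MINOR 255
#define SIM_NR_MINORS 256

struct sim_miscdevice {
	int minor;        /* fixed minor, or MISC_DYNAMIC_MINOR */
	const char *name;
};

/* Owner of each minor; NULL means the minor is free. */
static const char *sim_owner[SIM_NR_MINORS];

/* Forget all registrations. */
static void sim_reset(void)
{
	for (int m = 0; m < SIM_NR_MINORS; m++)
		sim_owner[m] = NULL;
}

/* Modelled on misc_register(): a fixed minor that is already taken fails
 * with -EBUSY (-16 on Linux, as in the LNetError line). A dynamic request
 * is handed a free minor, written back into dev->minor. This model simply
 * scans downward from 254; the kernel's real allocator differs in detail. */
static int sim_misc_register(struct sim_miscdevice *dev)
{
	assert(dev != NULL);
	if (dev->minor == MISC_DYNAMIC_MINOR) {
		for (int m = MISC_DYNAMIC_MINOR - 1; m >= 0; m--) {
			if (sim_owner[m] == NULL) {
				sim_owner[m] = dev->name;
				dev->minor = m;
				return 0;
			}
		}
		return -EBUSY; /* no free minor left */
	}
	if (dev->minor < 0 || dev->minor >= MISC_DYNAMIC_MINOR)
		return -EINVAL;
	if (sim_owner[dev->minor] != NULL)
		return -EBUSY; /* someone else already owns this fixed minor */
	sim_owner[dev->minor] = dev->name;
	return 0;
}
```

Registering a "userio" device and then a "lustre" device with the same fixed minor makes the second call return -EBUSY; giving the lustre device MISC_DYNAMIC_MINOR instead lets both succeed. In real kernel code the corresponding change is to set `.minor = MISC_DYNAMIC_MINOR` in the `struct miscdevice` passed to `misc_register()`, which then stores the assigned minor back in `.minor`.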