From: Jan Kara
Subject: Re: Call trace in ext4_es_lru_add on 3.10 stable
Date: Tue, 23 Sep 2014 16:43:40 +0200
Message-ID: <20140923144340.GI2359@quack.suse.cz>
References: <20140918192131.GD19520@thunk.org> <541B32A1.3080706@profihost.ag> <20140918194311.GE19520@thunk.org> <541FC817.7030401@profihost.ag> <20140922164715.GB4572@thunk.org> <54206AA2.1050607@profihost.ag> <20140922202004.GF4572@thunk.org> <54212641.9010808@profihost.ag> <20140923094204.GB2359@quack.suse.cz> <54216641.8090608@profihost.ag>
In-Reply-To: <54216641.8090608@profihost.ag>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
To: Stefan Priebe
Cc: Jan Kara, Theodore Ts'o, linux-ext4@vger.kernel.org, "p.herz@profihost.ag >> Philipp Herz - Profihost AG", stable@vger.kernel.org, Zheng Liu
Sender: stable-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On Tue 23-09-14 14:23:29, Stefan Priebe wrote:
>
> On 23.09.2014 11:42, Jan Kara wrote:
> >On Tue 23-09-14 09:50:25, Stefan Priebe - Profihost AG wrote:
> >>
> >>On 22.09.2014 at 22:20, Theodore Ts'o wrote:
> >>>On Mon, Sep 22, 2014 at 08:29:54PM +0200, Stefan Priebe wrote:
> >>>>Hi,
> >>>>On 22.09.2014 18:47, Theodore Ts'o wrote:
> >>>>>On Mon, Sep 22, 2014 at 08:56:23AM +0200, Stefan Priebe wrote:
> >>>>>>>That's not the whole message; you just weren't able to capture it all.
> >>>>>>>How are you capturing these messages, by the way?  Serial console?
> >>>>>>
> >>>>>>Sorry, this was an incomplete copy and paste by me.
> >>>>>>
> >>>>>>Here is the complete output:
> >>>>>>[1578544.839610] BUG: soft lockup - CPU#7 stuck for 22s! [mysqld:29281]
> >>>>>>[1578544.893450] Modules linked in: nf_conntrack_ipv4 nf_defrag_ipv4
> >>>>>
> >>>>>OK, thanks, this is a known bug, where when ext4 is under heavy memory
> >>>>>pressure, we can end up stalling in reclaim.  This message indicates
> >>>>>that the system got stalled for 22 seconds, which is not good, since
> >>>>>it impacts the interactivity of your system, and increases the
> >>>>>long-tail latency of requests to servers running on your system, but
> >>>>>it doesn't cause any data loss, nor will it cause any of your processes
> >>>>>to crash or otherwise stop functioning (except temporarily).
> >>>>>
> >>>>>It's something that we are working on, and there are patches which
> >>>>>Zheng Liu submitted that still need a bit of polishing, but I hope to
> >>>>>have it addressed soon.
> >>>>
> >>>>Thanks for your feedback. Will those patches go to stable? Any link to
> >>>>those patches?
> >>>
> >>>I'm not sure they will go to Stable when they are ready, because the
> >>>patches are somewhat complex and so they may not apply cleanly to much
> >>>older kernels.
> >>>
> >>>The patches under discussion (some have been applied, others have been
> >>>waiting for some requested changes) can be found here:
> >>>
> >>>http://patchwork.ozlabs.org/patch/377720
> >>>http://patchwork.ozlabs.org/patch/377721
> >>>http://patchwork.ozlabs.org/patch/377722
> >>>http://patchwork.ozlabs.org/patch/377723
> >>>http://patchwork.ozlabs.org/patch/377724
> >>>http://patchwork.ozlabs.org/patch/377725
> >>>http://patchwork.ozlabs.org/patch/377727
> >>
> >>Whoa, that's a lot. Are they ALL needed to fix this?
> >  Yes, all of them are needed.
>
> How can I get notified when they're ready / polished?
  Watching changes to fs/ext4/extents_status.c is probably the most reliable
way. Or maybe Zheng (Cced) can add you to the CC list when submitting the
patch set next time.

								Honza
-- 
Jan Kara
SUSE Labs, CR