Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S934893AbcLMWST (ORCPT ); Tue, 13 Dec 2016 17:18:19 -0500
Received: from mail-oi0-f66.google.com ([209.85.218.66]:34423 "EHLO
        mail-oi0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932661AbcLMWSS (ORCPT );
        Tue, 13 Dec 2016 17:18:18 -0500
MIME-Version: 1.0
In-Reply-To:
References: <87inqo4ip1.fsf@yhuang-dev.intel.com> <87oa0fpsqs.fsf@xmission.com>
From: Andrey Vagin
Date: Tue, 13 Dec 2016 14:18:15 -0800
X-Google-Sender-Auth: yn8lItjbYxI3xlQeJqiyqNnWfeY
Message-ID:
Subject: Re: [inotify] fee1df54b6: BUG_kmalloc-#(Not_tainted):Freepointer_corrupt
To: Nikolay Borisov
Cc: "Eric W. Biederman", Linux Containers, Jan Kara, LKML, Serge Hallyn
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2835
Lines: 78

On Tue, Dec 13, 2016 at 11:34 AM, Nikolay Borisov wrote:
>
>
> On 13.12.2016 20:51, Eric W. Biederman wrote:
>> Nikolay Borisov writes:
>>
>>> So this thing resurfaced again and I took a hard look into the code but
>>> couldn't find anything suspicious. So the allocating and freeing
>>> contexts leads me to believe it's the 'tbl' pointer that is being
>>> corrupted. The only thing which I do with it is to increase it by two.
>>>
>>> Perhaps some liveness issues.
>>
>> To me it feels like a double free somewhere. Like we call dec_ucount
>> and thus put_ucount multiple times in a way that goes to 0.
>>
>> Perhaps there is a peculiarity in the existing code which allows the
>> count to go to zero which we don't notice because we don't free anything
>> when the count goes to zero today.
>>
>> Perhaps there is some subtle semantic mismatch between your conversion
>> and the inotify code.
>>
>> I don't know if you made a subtle misreading of the code, or if
>> there is an existing bug that your changes took from harmless to
>> problematic, but the evidence is overwhelming that something
>> is going wrong and it is your patch that brings it out.
>>
>> If it helps the openvz folks apparently reproduced this with the criu
>> regression tests and the appropriate kernel debug options, and confirmed
>> the failure was your patch.
>
> Great but I think I missed this conversation, care to send relevant
> threads? I'd like to get to the bottom of this and have it merged?
>
> @openvz guys - if you care to shout with more details I'd love to work
> on getting this fixed!

Hi Nikolay,

We run the CRIU tests on linux-next, and a few days ago they triggered
a kernel bug:
http://www.spinics.net/lists/linux-mm/msg118204.html

If you want to run these tests to reproduce the bug, follow these steps:

$ apt-get install gcc make protobuf-c-compiler libprotobuf-c0-dev libaio-dev \
    libprotobuf-dev protobuf-compiler python-ipaddr libcap-dev \
    libnl-3-dev gdb bash python-protobuf
$ git clone https://github.com/xemul/criu.git
$ cd criu
$ make
$ python test/zdtm.py run -a -p 4

Here is the config file that we use to compile the kernel:
https://github.com/avagin/criu-jenkins-digitalocean/blob/master/jenkins-scripts/config

I recommend booting the kernel with slub_debug=FZ.

Don't hesitate to ask me if you have any questions.

Thanks,
Andrei

>
>>
>> The current state of play is that I would love to merge this if we can
>> track down this issue. I dropped this from my tree before I sent my pull
>> request to Linus so there is no emergency to get this fixed.
>>
>> Eric
>>
>>
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/containers
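
The reproduction above depends on the kernel having been booted with
slub_debug=FZ, so it may be worth checking that before starting a long
zdtm run. The following is only a sketch of such a pre-flight check. The
helper names slub_debug_flags and has_fz are made up for illustration and
are not part of CRIU or the kernel tree. The sketch assumes the usual
"slub_debug=<flags>[,<slab names>]" form of the parameter and treats a
bare "slub_debug" (no "=") as not matching.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers; names are illustrative only.

# slub_debug_flags CMDLINE
# Print the flag letters of the first slub_debug=... parameter in CMDLINE.
# Returns non-zero if no such parameter is present.
slub_debug_flags() {
    for arg in $1; do
        case "$arg" in
            slub_debug=*)
                flags=${arg#slub_debug=}
                # Drop an optional ",<slab names>" suffix.
                printf '%s\n' "${flags%%,*}"
                return 0
                ;;
        esac
    done
    return 1
}

# has_fz CMDLINE
# Succeed only if both F (sanity checks) and Z (red zoning) are enabled.
has_fz() {
    flags=$(slub_debug_flags "$1") || return 1
    case "$flags" in *F*) ;; *) return 1 ;; esac
    case "$flags" in *Z*) ;; *) return 1 ;; esac
    return 0
}

# Example: inspect the running kernel before starting the tests.
if [ -r /proc/cmdline ] && has_fz "$(cat /proc/cmdline)"; then
    echo "slub_debug has F and Z set"
else
    echo "warning: kernel not booted with slub_debug=FZ" >&2
fi
```

The check only reads the command line; it does not confirm that the
kernel was built with the config file linked above.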