Subject: Re: [inotify] fee1df54b6: BUG_kmalloc-#(Not_tainted):Freepointer_corrupt
From: Nikolay Borisov
To: "Eric W. Biederman"
Cc: Jan Kara, containers@lists.linux-foundation.org, LKML, Serge Hallyn, avagin@openvz.org
Date: Tue, 13 Dec 2016 21:34:35 +0200
References: <87inqo4ip1.fsf@yhuang-dev.intel.com> <87oa0fpsqs.fsf@xmission.com>
In-Reply-To: <87oa0fpsqs.fsf@xmission.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 13.12.2016 20:51, Eric W. Biederman wrote:
> Nikolay Borisov writes:
>
>> So this thing resurfaced again and I took a hard look into the code but
>> couldn't find anything suspicious. So the allocating and freeing
>> contexts lead me to believe it's the 'tbl' pointer that is being
>> corrupted. The only thing which I do with it is to increase it by two.
>>
>> Perhaps some liveness issues.
>
> To me it feels like a double free somewhere. Like we call dec_ucount
> and thus put_ucount multiple times in a way that goes to 0.
>
> Perhaps there is a peculiarity in the existing code which allows the
> count to go to zero which we don't notice because we don't free anything
> when the count goes to zero today.
>
> Perhaps there is some subtle semantic mismatch between your conversion
> and the inotify code.
>
> I don't know if you made a subtle misreading of the code, or if
> there is an existing bug that your changes took from harmless to
> problematic, but the evidence is overwhelming that something
> is going wrong and it is your patch that brings it out.
>
> If it helps, the openvz folks apparently reproduced this with the criu
> regression tests and the appropriate kernel debug options, and confirmed
> the failure was your patch.

Great, but I think I missed this conversation. Could you send the
relevant threads? I'd like to get to the bottom of this and have it
merged.

@openvz guys: if you can share more details, I'd love to work on
getting this fixed!

>
> The current state of play is that I would love to merge this if we can
> track down this issue. I dropped this from my tree before I sent my pull
> request to Linus so there is no emergency to get this fixed.
>
> Eric