Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756519Ab3IYX3b (ORCPT ); Wed, 25 Sep 2013 19:29:31 -0400 Received: from ipmail07.adl2.internode.on.net ([150.101.137.131]:8506 "EHLO ipmail07.adl2.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750825Ab3IYX33 (ORCPT ); Wed, 25 Sep 2013 19:29:29 -0400 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ag8HANdwQ1J5LCC0/2dsb2JhbABagiVig0u4V4U8gRwXdIIlAQEEATocFQkFBQsIAxgJJQ8FJQMhE4gABbsOgT8WjzsHhB0Dl3uReIM2Kg Date: Thu, 26 Sep 2013 09:29:15 +1000 From: Dave Chinner To: "Kirill A. Shutemov" Cc: Andrew Morton , Andrea Arcangeli , Al Viro , Hugh Dickins , Wu Fengguang , Jan Kara , Mel Gorman , linux-mm@kvack.org, Andi Kleen , Matthew Wilcox , "Kirill A. Shutemov" , Hillf Danton , Dave Hansen , Ning Qu , Alexander Shishkin , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCHv6 00/22] Transparent huge page cache: phase 1, everything but mmap() Message-ID: <20130925232915.GK26872@dastard> References: <1379937950-8411-1-git-send-email-kirill.shutemov@linux.intel.com> <20130924163740.4bc7db61e3e520798220dc4c@linux-foundation.org> <20130925095105.06464E0090@blue.fi.intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130925095105.06464E0090@blue.fi.intel.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2967 Lines: 74 On Wed, Sep 25, 2013 at 12:51:04PM +0300, Kirill A. Shutemov wrote: > Andrew Morton wrote: > > On Mon, 23 Sep 2013 15:05:28 +0300 "Kirill A. Shutemov" wrote: > > > > > It brings thp support for ramfs, but without mmap() -- it will be posted > > > separately. > > > > We were never going to do this :( > > > > Has anyone reviewed these patches much yet? > > Dave did very good review. Few other people looked to separate patches. > See Reviewed-by/Acked-by tags in patches. > > It looks like most mm experts are busy with numa balancing nowadays, so > it's hard to get more review. Nobody has reviewed it from the filesystem side, though. The changes that require special code paths for huge pages in the write_begin/write_end paths are nasty. You're adding conditional code that depends on the page size and then having to add checks to ensure that large page operations don't step over small page boundaries and other such corner cases. It's an extremely fragile design, IMO. In general, I don't like all the if (thp) {} else {}; code that this series introduces - they are code paths that simply won't get tested with any sort of regularity and make the code more complex for those that aren't using THP to understand and debug... Then there is a new per-inode lock that is used in generic_perform_write() which is held across page faults and calls to filesystem block mapping callbacks. This inserts into the middle of an existing locking chain that needs to be strictly ordered, and as such will lead to the same type of lock inversion problems that the mmap_sem had. We do not want to introduce a new lock that has this same problem just as we are getting rid of that long standing nastiness from the page fault path... I also note that you didn't convert invalidate_inode_pages2_range() to support huge pages which is needed by real filesystems that support direct IO. There are other truncate/invalidate interfaces that you didn't convert, either, and some of them will present you with interesting locking challenges as a result of adding that new lock... > The patchset was mostly ignored for few rounds and Dave suggested to split > to have less scary patch number. It's still being ignored by filesystem people because you haven't actually tried to implement support into a real filesystem..... > > > Please review and consider applying. > > > > It appears rather too immature at this stage. > > More review is always welcome and I'm committed to address issues. IMO, supporting a real block based filesystem like ext4 or XFS and demonstrating that everything works is necessary before we go any further... Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/