Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934067Ab3GPVCS (ORCPT ); Tue, 16 Jul 2013 17:02:18 -0400 Received: from mail-vb0-f48.google.com ([209.85.212.48]:54938 "EHLO mail-vb0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933320Ab3GPVCO (ORCPT ); Tue, 16 Jul 2013 17:02:14 -0400 MIME-Version: 1.0 In-Reply-To: <20130716204335.GH11674@dastard> References: <20130716015305.GB30569@redhat.com> <20130716023847.GA31481@redhat.com> <20130716060351.GE11674@dastard> <20130716193332.GB3572@sgi.com> <20130716204335.GH11674@dastard> Date: Tue, 16 Jul 2013 14:02:12 -0700 X-Google-Sender-Auth: EsUysDh-p2Ez6DwSWC3L1pZHWJ0 Message-ID: Subject: Re: splice vs execve lockdep trace. From: Linus Torvalds To: Dave Chinner Cc: Ben Myers , Peter Zijlstra , Oleg Nesterov , Linux Kernel , Alexander Viro , Dave Jones , xfs@oss.sgi.com Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2202 Lines: 50 On Tue, Jul 16, 2013 at 1:43 PM, Dave Chinner wrote: > > Yes - IO is serialised based on the ip->i_iolock, not i_mutex. We > don't use i_mutex for many things IO related, and so internal > locking is needed to serialise against stuff like truncate, hole > punching, etc, that are run through non-vfs interfaces. Umm. But the page IO isn't serialized by i_mutext *either*. You don't hold it across page faults. In fact you don't even take it at all across page faults. That's kind of my point. splice is about the page IO, and it's serialized purely by the page lock. And then "->readpage()" will get whatever IO mutex in order to do that right, but think about the case where things are already in the page cache. There's no reason for any serialization what-so-ever. So this isn't about i_mutex. At all. > Read isn't the problem - it's write that's the deadlock issue... I agree, and I think your patches are needed, as I said in that email you replied to. But due to this issue, I was looking at the XFS splice support, and the read-side splice support seems inefficient and overly complex. I'm not seeing why it needs that i_iolock. And no, this really has nothing to do with i_mutex. Go look at generic_file_splice_read(). There's no i_mutex there at all. It's more like a series of magic page-faults without the actual page table actions. Which is kind of the whole point of splice - zero-copy without bothering with page table setup/teardown. Now, it's perfectly possible that XFS really needs some odd locking here, but your reply about i_mutex makes me think that you did it because you were confused about what it actually wants. *Every* other local filesystem uses generic_file_splice_read() with just a single .splice_read = generic_file_splice_read, in the file ops initializer. Sure, nfs and ocfs2 wrap things like xfs does, but they basically do it to revalidate their caches. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/