Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934315Ab3HIPCI (ORCPT ); Fri, 9 Aug 2013 11:02:08 -0400 Received: from relay.parallels.com ([195.214.232.42]:51309 "EHLO relay.parallels.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933939Ab3HIPCG (ORCPT ); Fri, 9 Aug 2013 11:02:06 -0400 Message-ID: <52050474.8040608@parallels.com> Date: Fri, 9 Aug 2013 19:02:12 +0400 From: Maxim Patlasov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Miklos Szeredi CC: , Kirill Korotaev , Pavel Emelianov , fuse-devel , Brian Foster , Kernel Mailing List , James Bottomley , , Al Viro , Linux-Fsdevel , Andrew Morton , , , Mel Gorman Subject: Re: [PATCH 10/16] fuse: Implement writepages callback References: <20130629172211.20175.70154.stgit@maximpc.sw.ru> <20130629174525.20175.18987.stgit@maximpc.sw.ru> <20130719165037.GA18358@tucsk.piliscsaba.szeredi.hu> <51FBD2DF.50506@parallels.com> In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.30.17.2] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3731 Lines: 86 Hi Miklos, 08/06/2013 08:25 PM, Miklos Szeredi пишет: > On Fri, Aug 2, 2013 at 5:40 PM, Maxim Patlasov wrote: >> 07/19/2013 08:50 PM, Miklos Szeredi пишет: >> >>> On Sat, Jun 29, 2013 at 09:45:29PM +0400, Maxim Patlasov wrote: >>>> From: Pavel Emelyanov >>>> >>>> The .writepages one is required to make each writeback request carry more >>>> than >>>> one page on it. The patch enables optimized behaviour unconditionally, >>>> i.e. mmap-ed writes will benefit from the patch even if >>>> fc->writeback_cache=0. >>> I rewrote this a bit, so we won't have to do the thing in two passes, >>> which >>> makes it simpler and more robust. Waiting for page writeback here is >>> wrong >>> anyway, see comment above fuse_page_mkwrite(). BTW we had a race there >>> because >>> fuse_page_mkwrite() didn't take the page lock. I've also fixed that up >>> and >>> pushed a series containing these patches up to implementing ->writepages() >>> to >>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git >>> writepages >>> >>> Passed some trivial testing but more is needed. >> >> Thanks a lot for efforts. The approach you implemented looks promising, but >> it introduces the following assumption: a page cannot become dirty before we >> have a chance to wait on fuse writeback holding the page locked. This is >> already true for mmap-ed writes (due to your fixes) and it seems doable for >> cached writes as well (like we do in fuse_perform_write). But the assumption >> seems to be broken in case of direct read from local fs (e.g. ext4) to a >> memory region mmap-ed to a file on fuse fs. See how dio_bio_submit() marks >> pages dirty by bio_set_pages_dirty(). I can't see any solution for this >> use-case. Do you? > Hmm. Direct IO on an mmaped file will do get_user_pages() which will > do the necessary page fault magic and ->page_mkwrite() will be called. > At least AFAICS. Yes, I agree. > > The page cannot become dirty through a memory mapping without first > switching the pte from read-only to read-write first. Page accounting > logic relies on this too. The other way the page can become dirty is > through write(2) on the fs. But we do get notified about that too. Yes, that's correct, but I don't understand why you disregard two other cases of marking page dirty (both related to direct AIO read from a file to a memory region mmap-ed to a fuse file): 1. dio_bio_submit() --> bio_set_pages_dirty() --> set_page_dirty_lock() 2. dio_bio_complete() --> bio_check_pages_dirty() --> bio_dirty_fn() --> bio_set_pages_dirty() --> set_page_dirty_lock() As soon as a page became dirty through a memory mapping (exactly as you explained), nothing would prevent it to be written-back. And fuse will call end_page_writeback almost immediately after copying the real page to a temporary one. Then dio_bio_submit may re-dirty page speculatively w/o notifying fuse. And again, since then nothing would prevent it to be written-back once more. Hence we can end up in more then one temporary page in fuse write-back. And similar concern for dio_bio_complete() re-dirty. This make me think that we do need fuse_page_is_writeback() in fuse_writepages_fill(). But it shouldn't be harmful because it will no-op practically always due to waiting for fuse writeback in ->page_mkwrite() and in course of handling write(2). Thanks, Maxim -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/