Date: Thu, 30 Jul 2015 23:13:17 -0400
Subject: Re: [PATCH v8 6/6] block: loop: support DIO & AIO
From: Ming Lei
To: Christoph Hellwig
Cc: Jens Axboe, Linux Kernel Mailing List, Dave Kleikamp, Zach Brown, Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo, Dave Chinner
In-Reply-To: <20150730164233.GA14158@infradead.org>
References: <1438256184-23645-1-git-send-email-ming.lei@canonical.com> <1438256184-23645-7-git-send-email-ming.lei@canonical.com> <20150730164233.GA14158@infradead.org>

On Thu, Jul 30, 2015 at 12:42 PM, Christoph Hellwig wrote:
> On Thu, Jul 30, 2015 at 07:36:24AM -0400, Ming Lei wrote:
>> +	/*
>> +	 * When working in direct I/O mode, in very unusual cases,
>> +	 * such as unaligned direct I/O from the application or
>> +	 * access to the loop block device with an unaligned offset
>> +	 * or size, we have to fall back to non-dio mode.
>> +	 *
>> +	 * During the switch between dio and non-dio, the page cache
>> +	 * has to be flushed to the backing file.
>> +	 */
>> +	if (unlikely(lo->use_dio && lo->last_use_dio != cmd->use_aio))
>> +		vfs_fsync(lo->lo_backing_file, 0);
>
> Filesystems do the cache flushing for you.

If it depends on the specific filesystem, it is better to do it here
explicitly; either way, the page cache is still written back only once
(either here or by the filesystem).
>
>> +static inline bool req_dio_aligned(struct loop_device *lo,
>> +		const struct request *rq)
>> +{
>> +	return !((blk_rq_pos(rq) << 9) & lo->dio_align) &&
>> +		!(blk_rq_bytes(rq) & lo->dio_align);
>> +}
>> +
>>  static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  		const struct blk_mq_queue_data *bd)
>>  {
>> @@ -1554,6 +1658,13 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  	if (lo->lo_state != Lo_bound)
>>  		return -EIO;
>>
>> +	if (lo->use_dio && !lo->transfer &&
>> +			req_dio_aligned(lo, bd->rq) &&
>> +			!(cmd->rq->cmd_flags & (REQ_FLUSH | REQ_DISCARD)))
>> +		cmd->use_aio = true;
>> +	else
>> +		cmd->use_aio = false;
>
> But honestly run time switching between buffered I/O and direct I/O from
> the same I/O stream is almost asking for triggering every possible
> race in the dio vs buffered I/O synchronization. And there have been
> a lot of those..

Yes, I agree; that is why I was a bit reluctant to do the runtime switch
in v7. I can think of another way to avoid the race, by serializing dio
and buffered I/O strictly:

- before switching from dio to buffered I/O, wait for completion of all
  pending dio
- run fsync()
- switch to buffered I/O

Then concurrent dio and buffered I/O become impossible. What do you
think about this approach?

>
> I'd feel much more comfortable with a setup time check.

Yes, that is what v7 did. But at setup time we can't know each request's
size and offset, so the simplest way is to support only a 512-byte
sector size on the backing device, to avoid runtime switching entirely.

Christoph and Dave, or do you have a better idea?

Thanks,