Date: Thu, 30 Jul 2015 23:13:17 -0400
Subject: Re: [PATCH v8 6/6] block: loop: support DIO & AIO
From: Ming Lei
To: Christoph Hellwig
Cc: Jens Axboe, Linux Kernel Mailing List, Dave Kleikamp, Zach Brown, Maxim Patlasov, Andrew Morton, Alexander Viro, Tejun Heo, Dave Chinner
In-Reply-To: <20150730164233.GA14158@infradead.org>
References: <1438256184-23645-1-git-send-email-ming.lei@canonical.com> <1438256184-23645-7-git-send-email-ming.lei@canonical.com> <20150730164233.GA14158@infradead.org>

On Thu, Jul 30, 2015 at 12:42 PM, Christoph Hellwig wrote:
> On Thu, Jul 30, 2015 at 07:36:24AM -0400, Ming Lei wrote:
>> +	/*
>> +	 * When working in direct I/O mode, in very unusual cases,
>> +	 * such as unaligned direct I/O from the application or
>> +	 * access to the loop block device with an unaligned offset
>> +	 * or size, we have to fall back to non-dio mode.
>> +	 *
>> +	 * During the switch between dio and non-dio, the page cache
>> +	 * has to be flushed to the backing file.
>> +	 */
>> +	if (unlikely(lo->use_dio && lo->last_use_dio != cmd->use_aio))
>> +		vfs_fsync(lo->lo_backing_file, 0);
>
> Filesystems do the cache flushing for you.

If it depends on the specific filesystem, it is better to do it here
explicitly; either way, the page cache is still written back only once
(either here or by the filesystem).
>
>> +static inline bool req_dio_aligned(struct loop_device *lo,
>> +		const struct request *rq)
>> +{
>> +	return !((blk_rq_pos(rq) << 9) & lo->dio_align) &&
>> +		!(blk_rq_bytes(rq) & lo->dio_align);
>> +}
>> +
>>  static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  		const struct blk_mq_queue_data *bd)
>>  {
>> @@ -1554,6 +1658,13 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
>>  	if (lo->lo_state != Lo_bound)
>>  		return -EIO;
>>
>> +	if (lo->use_dio && !lo->transfer &&
>> +			req_dio_aligned(lo, bd->rq) &&
>> +			!(cmd->rq->cmd_flags & (REQ_FLUSH | REQ_DISCARD)))
>> +		cmd->use_aio = true;
>> +	else
>> +		cmd->use_aio = false;
>
> But honestly run time switching between buffered I/O and direct I/O from
> the same I/O stream is almost asking for triggering every possible
> race in the dio vs buffered I/O synchronization. And there have been
> a lot of those..

Yes, I agree; that is why I was a bit reluctant to do the runtime switch
in v7. I can think of another way to avoid the race, by serializing dio
and buffered I/O strictly:

- before switching from dio to buffered I/O, wait for completion of all
  pending dio
- run fsync()
- switch to buffered I/O

Then concurrent dio and buffered I/O become impossible. What do you
think about this approach?

>
> I'd feel much more comfortable with a setup time check.

Yes, that is what v7 did. But at setup time we can't know each request's
size and offset, so the simplest way is to support only a 512-byte
sector size on the backing device, to avoid runtime switching entirely.

Christoph and Dave, or do you have a better idea?

Thanks,