Subject: Re: [PATCH 8/8] writeback: throttle buffered writeback
To: xiakaixu
References: <1460953487-3430-1-git-send-email-axboe@fb.com> <1460953487-3430-9-git-send-email-axboe@fb.com> <571B3073.2010206@huawei.com> <571BEB03.5060906@fb.com> <571E024C.2020307@huawei.com>
CC: "miaoxie (A)", Bintian, Huxinwei
From: Jens Axboe
Message-ID: <571E2BBD.7040804@fb.com>
Date: Mon, 25 Apr 2016 08:37:49 -0600
In-Reply-To: <571E024C.2020307@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/25/2016 05:41 AM, xiakaixu wrote:
> On 2016/4/24 5:37, Jens Axboe wrote:
>> On 04/23/2016 02:21 AM, xiakaixu wrote:
>>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>>> index 40b57bf4852c..d941f69dfb4b 100644
>>>> --- a/block/blk-core.c
>>>> +++ b/block/blk-core.c
>>>> @@ -39,6 +39,7 @@
>>>>
>>>>  #include "blk.h"
>>>>  #include "blk-mq.h"
>>>> +#include "blk-wb.h"
>>>>
>>>>  EXPORT_TRACEPOINT_SYMBOL_GPL(block_bio_remap);
>>>>  EXPORT_TRACEPOINT_SYMBOL_GPL(block_rq_remap);
>>>> @@ -880,6 +881,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
>>>>
>>>>  fail:
>>>>  	blk_free_flush_queue(q->fq);
>>>> +	blk_wb_exit(q);
>>>>  	return NULL;
>>>>  }
>>>>  EXPORT_SYMBOL(blk_init_allocated_queue);
>>>> @@ -1395,6 +1397,7 @@ void blk_requeue_request(struct request_queue *q, struct request *rq)
>>>>  	blk_delete_timer(rq);
>>>>  	blk_clear_rq_complete(rq);
>>>>  	trace_block_rq_requeue(q, rq);
>>>> +	blk_wb_requeue(q->rq_wb, rq);
>>>>
>>>>  	if (rq->cmd_flags & REQ_QUEUED)
>>>>  		blk_queue_end_tag(q, rq);
>>>> @@ -1485,6 +1488,8 @@ void __blk_put_request(struct request_queue *q, struct request *req)
>>>>  	/* this is a bio leak */
>>>>  	WARN_ON(req->bio != NULL);
>>>>
>>>> +	blk_wb_done(q->rq_wb, req);
>>>> +
>>>>  	/*
>>>>  	 * Request may not have originated from ll_rw_blk. if not,
>>>>  	 * it didn't come out of our reserved rq pools
>>>> @@ -1714,6 +1719,7 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>>>>  	int el_ret, rw_flags, where = ELEVATOR_INSERT_SORT;
>>>>  	struct request *req;
>>>>  	unsigned int request_count = 0;
>>>> +	bool wb_acct;
>>>>
>>>>  	/*
>>>>  	 * low level driver can indicate that it wants pages above a
>>>> @@ -1766,6 +1772,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>>>>  	}
>>>>
>>>>  get_rq:
>>>> +	wb_acct = blk_wb_wait(q->rq_wb, bio, q->queue_lock);
>>>> +
>>>>  	/*
>>>>  	 * This sync check and mask will be re-done in init_request_from_bio(),
>>>>  	 * but we need to set it earlier to expose the sync flag to the
>>>> @@ -1781,11 +1789,16 @@ get_rq:
>>>>  	 */
>>>>  	req = get_request(q, rw_flags, bio, GFP_NOIO);
>>>>  	if (IS_ERR(req)) {
>>>> +		if (wb_acct)
>>>> +			__blk_wb_done(q->rq_wb);
>>>>  		bio->bi_error = PTR_ERR(req);
>>>>  		bio_endio(bio);
>>>>  		goto out_unlock;
>>>>  	}
>>>>
>>>> +	if (wb_acct)
>>>> +		req->cmd_flags |= REQ_BUF_INFLIGHT;
>>>> +
>>>>  	/*
>>>>  	 * After dropping the lock and possibly sleeping here, our request
>>>>  	 * may now be mergeable after it had proven unmergeable (above).
>>>> @@ -2515,6 +2528,7 @@ void blk_start_request(struct request *req)
>>>>  	blk_dequeue_request(req);
>>>>
>>>>  	req->issue_time = ktime_to_ns(ktime_get());
>>>> +	blk_wb_issue(req->q->rq_wb, req);
>>>>
>>>>  	/*
>>>>  	 * We are now handing the request to the hardware, initialize
>>>> @@ -2751,6 +2765,7 @@ void blk_finish_request(struct request *req, int error)
>>>>  		blk_unprep_request(req);
>>>>
>>>>  	blk_account_io_done(req);
>>>> +	blk_wb_done(req->q->rq_wb, req);
>>>
>>> Hi Jens,
>>>
>>> It seems the function blk_wb_done() will be executed twice even if the
>>> end_io callback is set. Maybe the same thing would happen in blk-mq.c.
>>
>> Yeah, that was a mistake; the current version has it fixed. It was
>> inadvertently added when I discovered that the flush request didn't work
>> properly. Now it just duplicates the call inside the check for whether
>> the request has an ->end_io() defined, since we don't use the normal
>> completion path for that.
>>
> Hi Jens,
>
> I have checked the wb-buf-throttle branch in your block git repo, and I am
> not sure it is the complete version. It seems the problem is only fixed in
> blk-mq.c; blk_wb_done() would still be executed twice in blk-core.c, once
> in blk_finish_request() and once in __blk_put_request().
> Maybe we can add a flag to mark whether blk_wb_done() has already been done.

Good catch, looks like I only patched up the mq bits. It's still not
perfect, since we could potentially double account a request that has a
private ->end_io(), if it was allocated through the normal block rq
allocator. That will skew the unrelated-io timestamp a bit, but it's not a
big deal; the inflight count will stay consistent, which is the important
part.

We currently have just one bit to tell whether a request is tracked, so we
can't tell if a tracked request has already been seen. I'll fix up the
blk-core part to be identical to the blk-mq fix.
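
Your flag suggestion could actually reuse the bit we already have: if
blk_wb_done() clears REQ_BUF_INFLIGHT on the first call, the second call
turns into a no-op. Roughly like the below, completely untested, and the
exact shape of blk_wb_done() in blk-wb.c may end up different:

static void blk_wb_done(struct rq_wb *rwb, struct request *rq)
{
	if (!rwb)
		return;

	/*
	 * On the legacy path a request can get here twice, once from
	 * blk_finish_request() and once from __blk_put_request(). Clear
	 * the tracking bit on the first call, so the second call sees
	 * an untracked request and leaves the inflight count alone.
	 */
	if (rq->cmd_flags & REQ_BUF_INFLIGHT) {
		rq->cmd_flags &= ~REQ_BUF_INFLIGHT;
		__blk_wb_done(rwb);
	}
}

That way the inflight count is only dropped once no matter how many times
the completion paths run, and we don't have to grow another request flag.

--
Jens Axboe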