From: Edward Hsieh <[email protected]>
For chained bio, trace_block_bio_complete in bio_endio is currently called
only by the parent bio once upon all chained bio completed.
However, the sector and size for the parent bio are modified in bio_split.
Therefore, the size and sector of the complete events might not match the
queue events in blktrace.
The original fix of bio completion trace <fbbaf700e7b1> ("block: trace
completion of all bios.") wants multiple complete events to correspond
to one queue event but missed this.
md/raid5 read with bio cross chunks can reproduce this issue.
To fix, move trace completion into the loop for every chained bio to call.
Fixes: fbbaf700e7b1 ("block: trace completion of all bios.")
Reviewed-by: Wade Liang <[email protected]>
Reviewed-by: BingJing Chang <[email protected]>
Signed-off-by: Edward Hsieh <[email protected]>
---
block/bio.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index a1c4d29..2ff72cb 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1397,8 +1397,7 @@ static inline bool bio_remaining_done(struct bio *bio)
*
* bio_endio() can be called several times on a bio that has been chained
* using bio_chain(). The ->bi_end_io() function will only be called the
- * last time. At this point the BLK_TA_COMPLETE tracing event will be
- * generated if BIO_TRACE_COMPLETION is set.
+ * last time.
**/
void bio_endio(struct bio *bio)
{
@@ -1411,6 +1410,11 @@ void bio_endio(struct bio *bio)
if (bio->bi_bdev)
rq_qos_done_bio(bio->bi_bdev->bd_disk->queue, bio);
+ if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
+ trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
+ bio_clear_flag(bio, BIO_TRACE_COMPLETION);
+ }
+
/*
* Need to have a real endio function for chained bios, otherwise
* various corner cases will break (like stacking block devices that
@@ -1424,11 +1428,6 @@ void bio_endio(struct bio *bio)
goto again;
}
- if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
- trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
- bio_clear_flag(bio, BIO_TRACE_COMPLETION);
- }
-
blk_throtl_bio_endio(bio);
/* release cgroup info */
bio_uninit(bio);
--
2.7.4
Hi Jens and Neil,
Is there any feedback on this patch?
Thank you,
Edward Hsieh
On 3/3/2021 11:22 AM, Edward Hsieh wrote:
> From: Edward Hsieh <[email protected]>
>
> For chained bio, trace_block_bio_complete in bio_endio is currently called
> only by the parent bio once upon all chained bio completed.
> However, the sector and size for the parent bio are modified in bio_split.
> Therefore, the size and sector of the complete events might not match the
> queue events in blktrace.
>
> The original fix of bio completion trace <fbbaf700e7b1> ("block: trace
> completion of all bios.") wants multiple complete events to correspond
> to one queue event but missed this.
>
> md/raid5 read with bio cross chunks can reproduce this issue.
>
> To fix, move trace completion into the loop for every chained bio to call.
>
> Fixes: fbbaf700e7b1 ("block: trace completion of all bios.")
> Reviewed-by: Wade Liang <[email protected]>
> Reviewed-by: BingJing Chang <[email protected]>
> Signed-off-by: Edward Hsieh <[email protected]>
> ---
> block/bio.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index a1c4d29..2ff72cb 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1397,8 +1397,7 @@ static inline bool bio_remaining_done(struct bio *bio)
> *
> * bio_endio() can be called several times on a bio that has been chained
> * using bio_chain(). The ->bi_end_io() function will only be called the
> - * last time. At this point the BLK_TA_COMPLETE tracing event will be
> - * generated if BIO_TRACE_COMPLETION is set.
> + * last time.
> **/
> void bio_endio(struct bio *bio)
> {
> @@ -1411,6 +1410,11 @@ void bio_endio(struct bio *bio)
> if (bio->bi_bdev)
> rq_qos_done_bio(bio->bi_bdev->bd_disk->queue, bio);
>
> + if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
> + trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
> + bio_clear_flag(bio, BIO_TRACE_COMPLETION);
> + }
> +
> /*
> * Need to have a real endio function for chained bios, otherwise
> * various corner cases will break (like stacking block devices that
> @@ -1424,11 +1428,6 @@ void bio_endio(struct bio *bio)
> goto again;
> }
>
> - if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
> - trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
> - bio_clear_flag(bio, BIO_TRACE_COMPLETION);
> - }
> -
> blk_throtl_bio_endio(bio);
> /* release cgroup info */
> bio_uninit(bio);
>
On Wed, Mar 03 2021, edwardh wrote:
> From: Edward Hsieh <[email protected]>
>
> For chained bio, trace_block_bio_complete in bio_endio is currently called
> only by the parent bio once upon all chained bio completed.
> However, the sector and size for the parent bio are modified in bio_split.
> Therefore, the size and sector of the complete events might not match the
> queue events in blktrace.
>
> The original fix of bio completion trace <fbbaf700e7b1> ("block: trace
> completion of all bios.") wants multiple complete events to correspond
> to one queue event but missed this.
>
> md/raid5 read with bio cross chunks can reproduce this issue.
>
> To fix, move trace completion into the loop for every chained bio to call.
Thanks. I think this is correct as far as tracing goes.
However the code still looks a bit odd.
The comment for the handling of bio_chain_endio suggests that the *only*
purpose for that is to avoid deep recursion. That suggests it should be
at the end of the function.
As it is blk_throtl_bio_endio() and bio_unint() are only called on the
last bio in a chain.
That seems wrong.
I'd be more comfortable if the patch moved the bio_chain_endio()
handling to the end, after all of that.
So the function would end.
if (bio->bi_end_io == bio_chain_endio) {
bio = __bio_chain_endio(bio);
goto again;
} else if (bio->bi_end_io)
bio->bi_end_io(bio);
Jens: can you see any reason why that functions must only be called on
the last bio in the chain?
Thanks,
NeilBrown
>
> Fixes: fbbaf700e7b1 ("block: trace completion of all bios.")
> Reviewed-by: Wade Liang <[email protected]>
> Reviewed-by: BingJing Chang <[email protected]>
> Signed-off-by: Edward Hsieh <[email protected]>
> ---
> block/bio.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/block/bio.c b/block/bio.c
> index a1c4d29..2ff72cb 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1397,8 +1397,7 @@ static inline bool bio_remaining_done(struct bio *bio)
> *
> * bio_endio() can be called several times on a bio that has been chained
> * using bio_chain(). The ->bi_end_io() function will only be called the
> - * last time. At this point the BLK_TA_COMPLETE tracing event will be
> - * generated if BIO_TRACE_COMPLETION is set.
> + * last time.
> **/
> void bio_endio(struct bio *bio)
> {
> @@ -1411,6 +1410,11 @@ void bio_endio(struct bio *bio)
> if (bio->bi_bdev)
> rq_qos_done_bio(bio->bi_bdev->bd_disk->queue, bio);
>
> + if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
> + trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
> + bio_clear_flag(bio, BIO_TRACE_COMPLETION);
> + }
> +
> /*
> * Need to have a real endio function for chained bios, otherwise
> * various corner cases will break (like stacking block devices that
> @@ -1424,11 +1428,6 @@ void bio_endio(struct bio *bio)
> goto again;
> }
>
> - if (bio->bi_bdev && bio_flagged(bio, BIO_TRACE_COMPLETION)) {
> - trace_block_bio_complete(bio->bi_bdev->bd_disk->queue, bio);
> - bio_clear_flag(bio, BIO_TRACE_COMPLETION);
> - }
> -
> blk_throtl_bio_endio(bio);
> /* release cgroup info */
> bio_uninit(bio);
> --
> 2.7.4
On 3/23/2021 5:22 AM, NeilBrown wrote:
> On Wed, Mar 03 2021, edwardh wrote:
>
>> From: Edward Hsieh <[email protected]>
>>
>> For chained bio, trace_block_bio_complete in bio_endio is currently called
>> only by the parent bio once upon all chained bio completed.
>> However, the sector and size for the parent bio are modified in bio_split.
>> Therefore, the size and sector of the complete events might not match the
>> queue events in blktrace.
>>
>> The original fix of bio completion trace <fbbaf700e7b1> ("block: trace
>> completion of all bios.") wants multiple complete events to correspond
>> to one queue event but missed this.
>>
>> md/raid5 read with bio cross chunks can reproduce this issue.
>>
>> To fix, move trace completion into the loop for every chained bio to call.
>
> Thanks. I think this is correct as far as tracing goes.
> However the code still looks a bit odd.
>
> The comment for the handling of bio_chain_endio suggests that the *only*
> purpose for that is to avoid deep recursion. That suggests it should be
> at the end of the function.
> As it is blk_throtl_bio_endio() and bio_unint() are only called on the
> last bio in a chain.
> That seems wrong.
>
> I'd be more comfortable if the patch moved the bio_chain_endio()
> handling to the end, after all of that.
> So the function would end.
>
> if (bio->bi_end_io == bio_chain_endio) {
> bio = __bio_chain_endio(bio);
> goto again;
> } else if (bio->bi_end_io)
> bio->bi_end_io(bio);
>
> Jens: can you see any reason why that functions must only be called on
> the last bio in the chain?
>
> Thanks,
> NeilBrown
>
Hi Neil and Jens,
From the commit message, bio_uninit is put here for bio allocated in
special ways (e.g., on stack), that will not be release by bio_free. For
chained bio, __bio_chain_endio invokes bio_put and release the
resources, so it seems that we don't need to call bio_uninit for chained
bio.
The blk_throtl_bio_endio is used to update the latency for the throttle
group. I think the latency should only be updated after the whole bio is
finished?
To make sense for the "tail call optimization" in the comment, I'll
suggest to wrap the whole statement with an else. What do you think?
if (bio->bi_end_io == bio_chain_endio) {
bio = __bio_chain_endio(bio);
goto again;
} else {
blk_throtl_bio_endio(bio);
/* release cgroup info */
bio_uninit(bio);
if (bio->bi_end_io)
bio->bi_end_io(bio);
}
Thanks,
Edward Hsieh
On 4/23/2021 4:04 PM, Edward Hsieh wrote:
> On 3/23/2021 5:22 AM, NeilBrown wrote:
>> On Wed, Mar 03 2021, edwardh wrote:
>>
>>> From: Edward Hsieh <[email protected]>
>>>
>>> For chained bio, trace_block_bio_complete in bio_endio is currently
>>> called
>>> only by the parent bio once upon all chained bio completed.
>>> However, the sector and size for the parent bio are modified in
>>> bio_split.
>>> Therefore, the size and sector of the complete events might not match
>>> the
>>> queue events in blktrace.
>>>
>>> The original fix of bio completion trace <fbbaf700e7b1> ("block: trace
>>> completion of all bios.") wants multiple complete events to correspond
>>> to one queue event but missed this.
>>>
>>> md/raid5 read with bio cross chunks can reproduce this issue.
>>>
>>> To fix, move trace completion into the loop for every chained bio to
>>> call.
>>
>> Thanks. I think this is correct as far as tracing goes.
>> However the code still looks a bit odd.
>>
>> The comment for the handling of bio_chain_endio suggests that the *only*
>> purpose for that is to avoid deep recursion. That suggests it should be
>> at the end of the function.
>> As it is blk_throtl_bio_endio() and bio_unint() are only called on the
>> last bio in a chain.
>> That seems wrong.
>>
>> I'd be more comfortable if the patch moved the bio_chain_endio()
>> handling to the end, after all of that.
>> So the function would end.
>>
>> if (bio->bi_end_io == bio_chain_endio) {
>> bio = __bio_chain_endio(bio);
>> goto again;
>> } else if (bio->bi_end_io)
>> bio->bi_end_io(bio);
>>
>> Jens: can you see any reason why that functions must only be called on
>> the last bio in the chain?
>>
>> Thanks,
>> NeilBrown
>>
>
> Hi Neil and Jens,
>
> From the commit message, bio_uninit is put here for bio allocated in
> special ways (e.g., on stack), that will not be release by bio_free. For
> chained bio, __bio_chain_endio invokes bio_put and release the
> resources, so it seems that we don't need to call bio_uninit for chained
> bio.
>
> The blk_throtl_bio_endio is used to update the latency for the throttle
> group. I think the latency should only be updated after the whole bio is
> finished?
>
> To make sense for the "tail call optimization" in the comment, I'll
> suggest to wrap the whole statement with an else. What do you think?
>
> if (bio->bi_end_io == bio_chain_endio) {
> bio = __bio_chain_endio(bio);
> goto again;
> } else {
> blk_throtl_bio_endio(bio);
> /* release cgroup info */
> bio_uninit(bio);
> if (bio->bi_end_io)
> bio->bi_end_io(bio);
> }
>
> Thanks,
> Edward Hsieh
Hi Neil and Jens,
Any feedback on this one?
Thank you,
Edward Hsieh
On 5/10/2021 10:06 AM, Edward Hsieh wrote:
>
> On 4/23/2021 4:04 PM, Edward Hsieh wrote:
>> On 3/23/2021 5:22 AM, NeilBrown wrote:
>>> On Wed, Mar 03 2021, edwardh wrote:
>>>
>>>> From: Edward Hsieh <[email protected]>
>>>>
>>>> For chained bio, trace_block_bio_complete in bio_endio is currently
>>>> called
>>>> only by the parent bio once upon all chained bio completed.
>>>> However, the sector and size for the parent bio are modified in
>>>> bio_split.
>>>> Therefore, the size and sector of the complete events might not
>>>> match the
>>>> queue events in blktrace.
>>>>
>>>> The original fix of bio completion trace <fbbaf700e7b1> ("block: trace
>>>> completion of all bios.") wants multiple complete events to correspond
>>>> to one queue event but missed this.
>>>>
>>>> md/raid5 read with bio cross chunks can reproduce this issue.
>>>>
>>>> To fix, move trace completion into the loop for every chained bio to
>>>> call.
>>>
>>> Thanks. I think this is correct as far as tracing goes.
>>> However the code still looks a bit odd.
>>>
>>> The comment for the handling of bio_chain_endio suggests that the *only*
>>> purpose for that is to avoid deep recursion. That suggests it should be
>>> at the end of the function.
>>> As it is blk_throtl_bio_endio() and bio_unint() are only called on the
>>> last bio in a chain.
>>> That seems wrong.
>>>
>>> I'd be more comfortable if the patch moved the bio_chain_endio()
>>> handling to the end, after all of that.
>>> So the function would end.
>>>
>>> if (bio->bi_end_io == bio_chain_endio) {
>>> bio = __bio_chain_endio(bio);
>>> goto again;
>>> } else if (bio->bi_end_io)
>>> bio->bi_end_io(bio);
>>>
>>> Jens: can you see any reason why that functions must only be called on
>>> the last bio in the chain?
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>
>> Hi Neil and Jens,
>>
>> From the commit message, bio_uninit is put here for bio allocated in
>> special ways (e.g., on stack), that will not be release by bio_free. For
>> chained bio, __bio_chain_endio invokes bio_put and release the
>> resources, so it seems that we don't need to call bio_uninit for chained
>> bio.
>>
>> The blk_throtl_bio_endio is used to update the latency for the throttle
>> group. I think the latency should only be updated after the whole bio is
>> finished?
>>
>> To make sense for the "tail call optimization" in the comment, I'll
>> suggest to wrap the whole statement with an else. What do you think?
>>
>> if (bio->bi_end_io == bio_chain_endio) {
>> bio = __bio_chain_endio(bio);
>> goto again;
>> } else {
>> blk_throtl_bio_endio(bio);
>> /* release cgroup info */
>> bio_uninit(bio);
>> if (bio->bi_end_io)
>> bio->bi_end_io(bio);
>> }
>>
>> Thanks,
>> Edward Hsieh
>
> Hi Neil and Jens,
>
> Any feedback on this one?
>
> Thank you,
> Edward Hsieh >
Hi Neil and Jens,
Any comments?
Thank you,
Edward Hsieh