Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
app's latency thoroughly, add a new tracepoint to track this delay.
Trace info likes below:
fsstress-5068 [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
fsstress-5070 [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5069 [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5067 [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
fsstress-5063 [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5070 [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
fsstress-5071 [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1
Signed-off-by: Xiaoguang Wang <[email protected]>
---
fs/jbd2/transaction.c | 3 +++
include/trace/events/jbd2.h | 21 +++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 990e7b5062e7..84974fb9d4f9 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -991,7 +991,10 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
if (buffer_shadow(bh)) {
JBUFFER_TRACE(jh, "on shadow: sleep");
jbd_unlock_bh_state(bh);
+ start_lock = jiffies;
wait_on_bit_io(&bh->b_state, BH_Shadow, TASK_UNINTERRUPTIBLE);
+ trace_jbd2_sleep_on_shadow(bh->b_bdev->bd_dev,
+ jiffies_to_msecs(jiffies - start_lock));
goto repeat;
}
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 2310b259329f..2f048fdb63c6 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -380,6 +380,27 @@ TRACE_EVENT(jbd2_lock_buffer_stall,
__entry->stall_ms)
);
+TRACE_EVENT(jbd2_sleep_on_shadow,
+
+ TP_PROTO(dev_t dev, unsigned long stall_ms),
+
+ TP_ARGS(dev, stall_ms),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field(unsigned long, stall_ms )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->stall_ms = stall_ms;
+ ),
+
+ TP_printk("dev %d,%d sleep %lu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->stall_ms)
+);
+
#endif /* _TRACE_JBD2_H */
/* This part must be outside protection */
--
2.17.2
Sometimes process will be stalled in wait_transaction_locked() for a while,
also add a new tracepoint to track this delay.
Trace info likes below:
fsstress-1672 [009] .... 184.663043: jbd2_wait_on_transaction_locked: dev 254,17 wait 0
fsstress-1674 [002] .... 184.771556: jbd2_wait_on_transaction_locked: dev 254,17 wait 42
fsstress-1676 [005] .... 184.771562: jbd2_wait_on_transaction_locked: dev 254,17 wait 100
fsstress-1677 [003] .... 184.771567: jbd2_wait_on_transaction_locked: dev 254,17 wait 102
kworker/13:1-160 [013] .... 184.771619: jbd2_wait_on_transaction_locked: dev 254,17 wait 102
fsstress-1673 [003] .... 184.771675: jbd2_wait_on_transaction_locked: dev 254,17 wait 95
Signed-off-by: Xiaoguang Wang <[email protected]>
---
fs/jbd2/transaction.c | 3 +++
include/trace/events/jbd2.h | 21 +++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 84974fb9d4f9..43f4f7fadaec 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -148,6 +148,7 @@ static void wait_transaction_locked(journal_t *journal)
DEFINE_WAIT(wait);
int need_to_start;
tid_t tid = journal->j_running_transaction->t_tid;
+ unsigned long start = jiffies;
prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
TASK_UNINTERRUPTIBLE);
@@ -158,6 +159,8 @@ static void wait_transaction_locked(journal_t *journal)
jbd2_might_wait_for_commit(journal);
schedule();
finish_wait(&journal->j_wait_transaction_locked, &wait);
+ trace_jbd2_wait_on_transaction_locked(journal->j_fs_dev->bd_dev,
+ jiffies_to_msecs(jiffies - start));
}
/*
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 2f048fdb63c6..6f091f901223 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -401,6 +401,27 @@ TRACE_EVENT(jbd2_sleep_on_shadow,
__entry->stall_ms)
);
+TRACE_EVENT(jbd2_wait_on_transaction_locked,
+
+ TP_PROTO(dev_t dev, unsigned long stall_ms),
+
+ TP_ARGS(dev, stall_ms),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field(unsigned long, stall_ms )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->stall_ms = stall_ms;
+ ),
+
+ TP_printk("dev %d,%d wait %lu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->stall_ms)
+);
+
#endif /* _TRACE_JBD2_H */
/* This part must be outside protection */
--
2.17.2
On Mon, Sep 02, 2019 at 10:54:41PM +0800, Xiaoguang Wang wrote:
> Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
> BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
> app's latency thoroughly, add a new tracepoint to track this delay.
>
> Trace info likes below:
> fsstress-5068 [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
> fsstress-5070 [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5069 [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5067 [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
> fsstress-5063 [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5070 [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
> fsstress-5071 [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>
> Signed-off-by: Xiaoguang Wang <[email protected]>
I think maybe it might be better to use units of microseconds and then
change sleep to usleep so the units are clear? This is a spinlock, so
it should be quick.
For the other patch in this series, milliseconds seems fine, but if we
change the trace info to use "msleep" instead that would be clearer
--- or you could change it to use microseconds as well just for
consistency; I think either would be fine.
What do you think?
Cheers,
- Ted
hi,
Thanks for reviewing.
> On Mon, Sep 02, 2019 at 10:54:41PM +0800, Xiaoguang Wang wrote:
>> Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
>> BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
>> app's latency thoroughly, add a new tracepoint to track this delay.
>>
>> Trace info likes below:
>> fsstress-5068 [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>> fsstress-5070 [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5069 [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5067 [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>> fsstress-5063 [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5070 [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
>> fsstress-5071 [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>>
>> Signed-off-by: Xiaoguang Wang <[email protected]>
>
> I think maybe it might be better to use units of microseconds and then
> change sleep to usleep so the units are clear? This is a spinlock, so
> it should be quick.
Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
should use microseconds? For these two patches, they do not use usleep or msleep to do
real sleep work, they just record the duration which process takes to wait bh_shadow flag
to be cleared or transaction to be unlocked.
Regards,
Xiaougang Wang
>
> For the other patch in this series, milliseconds seems fine, but if we
> change the trace info to use "msleep" instead that would be clearer
> --- or you could change it to use microseconds as well just for
> consistency; I think either would be fine.
>
> What do you think?
>
> Cheers,
>
> - Ted
>
On Wed, Sep 11, 2019 at 02:52:51PM +0800, Xiaoguang Wang wrote:
> > I think maybe it might be better to use units of microseconds and then
> > change sleep to usleep so the units are clear? This is a spinlock, so
> > it should be quick.
>
> Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
> should use microseconds? For these two patches, they do not use usleep or msleep to do
> real sleep work, they just record the duration which process takes to wait bh_shadow flag
> to be cleared or transaction to be unlocked.
Apologies, I should have been clear enough. Yes, my concern that
milliseconds might not be fine-grained enough. The sample results
which you showed had values of 2ms, 1ms, and 0ms. And the single 0ms
result in particular raised the concern that we should use a
microseconds instead of milliseconds.
In fact, instead of "sleep", maybe "latency(us)" or "latency(ms)"
would be a better label?
Regards,
- Ted
hi,
> On Wed, Sep 11, 2019 at 02:52:51PM +0800, Xiaoguang Wang wrote:
>>> I think maybe it might be better to use units of microseconds and then
>>> change sleep to usleep so the units are clear? This is a spinlock, so
>>> it should be quick.
>>
>> Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
>> should use microseconds? For these two patches, they do not use usleep or msleep to do
>> real sleep work, they just record the duration which process takes to wait bh_shadow flag
>> to be cleared or transaction to be unlocked.
>
> Apologies, I should have been clear enough. Yes, my concern that
> milliseconds might not be fine-grained enough. The sample results
> which you showed had values of 2ms, 1ms, and 0ms. And the single 0ms
> result in particular raised the concern that we should use a
> microseconds instead of milliseconds.
>
> In fact, instead of "sleep", maybe "latency(us)" or "latency(ms)"
> would be a better label?
OK, I'll update a v2, thanks.
Regards,
Xiaoguang Wang
>
> Regards,
>
> - Ted
>