LinuxLists.cc - [PATCH 1/2] jbd2: add new tracepoint jbd2_sleep_on

2019-09-02 14:57:17

Subject: [PATCH 1/2] jbd2: add new tracepoint jbd2_sleep_on_shadow

Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
app's latency thoroughly, add a new tracepoint to track this delay.

Trace info likes below:
fsstress-5068 [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
fsstress-5070 [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5069 [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5067 [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
fsstress-5063 [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5070 [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
fsstress-5071 [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1

Signed-off-by: Xiaoguang Wang <[email protected]>
---
fs/jbd2/transaction.c | 3 +++
include/trace/events/jbd2.h | 21 +++++++++++++++++++++
2 files changed, 24 insertions(+)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 990e7b5062e7..84974fb9d4f9 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -991,7 +991,10 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
if (buffer_shadow(bh)) {
JBUFFER_TRACE(jh, "on shadow: sleep");
jbd_unlock_bh_state(bh);
+ start_lock = jiffies;
wait_on_bit_io(&bh->b_state, BH_Shadow, TASK_UNINTERRUPTIBLE);
+ trace_jbd2_sleep_on_shadow(bh->b_bdev->bd_dev,
+ jiffies_to_msecs(jiffies - start_lock));
goto repeat;
}

diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 2310b259329f..2f048fdb63c6 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -380,6 +380,27 @@ TRACE_EVENT(jbd2_lock_buffer_stall,
__entry->stall_ms)
);

+TRACE_EVENT(jbd2_sleep_on_shadow,
+
+ TP_PROTO(dev_t dev, unsigned long stall_ms),
+
+ TP_ARGS(dev, stall_ms),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field(unsigned long, stall_ms )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->stall_ms = stall_ms;
+ ),
+
+ TP_printk("dev %d,%d sleep %lu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->stall_ms)
+);
+
#endif /* _TRACE_JBD2_H */

/* This part must be outside protection */
--
2.17.2

2019-09-02 14:58:55

by Xiaoguang Wang

[permalink] [raw]

Subject: [PATCH 2/2] jbd2: add new tracepoint jbd2_wait_on_transaction_locked

Sometimes process will be stalled in wait_transaction_locked() for a while,
also add a new tracepoint to track this delay.

Trace info likes below:
fsstress-1672 [009] .... 184.663043: jbd2_wait_on_transaction_locked: dev 254,17 wait 0
fsstress-1674 [002] .... 184.771556: jbd2_wait_on_transaction_locked: dev 254,17 wait 42
fsstress-1676 [005] .... 184.771562: jbd2_wait_on_transaction_locked: dev 254,17 wait 100
fsstress-1677 [003] .... 184.771567: jbd2_wait_on_transaction_locked: dev 254,17 wait 102
kworker/13:1-160 [013] .... 184.771619: jbd2_wait_on_transaction_locked: dev 254,17 wait 102
fsstress-1673 [003] .... 184.771675: jbd2_wait_on_transaction_locked: dev 254,17 wait 95

Signed-off-by: Xiaoguang Wang <[email protected]>
---
fs/jbd2/transaction.c | 3 +++
include/trace/events/jbd2.h | 21 +++++++++++++++++++++
2 files changed, 24 insertions(+)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 84974fb9d4f9..43f4f7fadaec 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -148,6 +148,7 @@ static void wait_transaction_locked(journal_t *journal)
DEFINE_WAIT(wait);
int need_to_start;
tid_t tid = journal->j_running_transaction->t_tid;
+ unsigned long start = jiffies;

prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
TASK_UNINTERRUPTIBLE);
@@ -158,6 +159,8 @@ static void wait_transaction_locked(journal_t *journal)
jbd2_might_wait_for_commit(journal);
schedule();
finish_wait(&journal->j_wait_transaction_locked, &wait);
+ trace_jbd2_wait_on_transaction_locked(journal->j_fs_dev->bd_dev,
+ jiffies_to_msecs(jiffies - start));
}

/*
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 2f048fdb63c6..6f091f901223 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -401,6 +401,27 @@ TRACE_EVENT(jbd2_sleep_on_shadow,
__entry->stall_ms)
);

+TRACE_EVENT(jbd2_wait_on_transaction_locked,
+
+ TP_PROTO(dev_t dev, unsigned long stall_ms),
+
+ TP_ARGS(dev, stall_ms),
+
+ TP_STRUCT__entry(
+ __field( dev_t, dev )
+ __field(unsigned long, stall_ms )
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dev;
+ __entry->stall_ms = stall_ms;
+ ),
+
+ TP_printk("dev %d,%d wait %lu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ __entry->stall_ms)
+);
+
#endif /* _TRACE_JBD2_H */

/* This part must be outside protection */
--
2.17.2

2019-09-08 12:58:27

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH 1/2] jbd2: add new tracepoint jbd2_sleep_on_shadow

On Mon, Sep 02, 2019 at 10:54:41PM +0800, Xiaoguang Wang wrote:
> Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
> BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
> app's latency thoroughly, add a new tracepoint to track this delay.
>
> Trace info likes below:
> fsstress-5068 [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
> fsstress-5070 [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5069 [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5067 [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
> fsstress-5063 [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5070 [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
> fsstress-5071 [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>
> Signed-off-by: Xiaoguang Wang <[email protected]>

I think maybe it might be better to use units of microseconds and then
change sleep to usleep so the units are clear? This is a spinlock, so
it should be quick.

For the other patch in this series, milliseconds seems fine, but if we
change the trace info to use "msleep" instead that would be clearer
--- or you could change it to use microseconds as well just for
consistency; I think either would be fine.

What do you think?

Cheers,

- Ted

2019-09-11 06:53:39

by Xiaoguang Wang

[permalink] [raw]

Subject: Re: [PATCH 1/2] jbd2: add new tracepoint jbd2_sleep_on_shadow

hi,

Thanks for reviewing.
> On Mon, Sep 02, 2019 at 10:54:41PM +0800, Xiaoguang Wang wrote:
>> Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
>> BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
>> app's latency thoroughly, add a new tracepoint to track this delay.
>>
>> Trace info likes below:
>> fsstress-5068 [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>> fsstress-5070 [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5069 [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5067 [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>> fsstress-5063 [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5070 [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
>> fsstress-5071 [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>>
>> Signed-off-by: Xiaoguang Wang <[email protected]>
>
> I think maybe it might be better to use units of microseconds and then
> change sleep to usleep so the units are clear? This is a spinlock, so
> it should be quick.
Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
should use microseconds? For these two patches, they do not use usleep or msleep to do
real sleep work, they just record the duration which process takes to wait bh_shadow flag
to be cleared or transaction to be unlocked.

Regards,
Xiaougang Wang

>
> For the other patch in this series, milliseconds seems fine, but if we
> change the trace info to use "msleep" instead that would be clearer
> --- or you could change it to use microseconds as well just for
> consistency; I think either would be fine.
>
> What do you think?
>
> Cheers,
>
> - Ted
>

2019-09-11 14:02:07

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH 1/2] jbd2: add new tracepoint jbd2_sleep_on_shadow

On Wed, Sep 11, 2019 at 02:52:51PM +0800, Xiaoguang Wang wrote:
> > I think maybe it might be better to use units of microseconds and then
> > change sleep to usleep so the units are clear? This is a spinlock, so
> > it should be quick.
>
> Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
> should use microseconds? For these two patches, they do not use usleep or msleep to do
> real sleep work, they just record the duration which process takes to wait bh_shadow flag
> to be cleared or transaction to be unlocked.

Apologies, I should have been clear enough. Yes, my concern that
milliseconds might not be fine-grained enough. The sample results
which you showed had values of 2ms, 1ms, and 0ms. And the single 0ms
result in particular raised the concern that we should use a
microseconds instead of milliseconds.

In fact, instead of "sleep", maybe "latency(us)" or "latency(ms)"
would be a better label?

Regards,

- Ted

2019-09-16 14:22:49

by Xiaoguang Wang

[permalink] [raw]

Subject: Re: [PATCH 1/2] jbd2: add new tracepoint jbd2_sleep_on_shadow

hi,

> On Wed, Sep 11, 2019 at 02:52:51PM +0800, Xiaoguang Wang wrote:
>>> I think maybe it might be better to use units of microseconds and then
>>> change sleep to usleep so the units are clear? This is a spinlock, so
>>> it should be quick.
>>
>> Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
>> should use microseconds? For these two patches, they do not use usleep or msleep to do
>> real sleep work, they just record the duration which process takes to wait bh_shadow flag
>> to be cleared or transaction to be unlocked.
>
> Apologies, I should have been clear enough. Yes, my concern that
> milliseconds might not be fine-grained enough. The sample results
> which you showed had values of 2ms, 1ms, and 0ms. And the single 0ms
> result in particular raised the concern that we should use a
> microseconds instead of milliseconds.
>
> In fact, instead of "sleep", maybe "latency(us)" or "latency(ms)"
> would be a better label?
OK, I'll update a v2, thanks.

Regards,
Xiaoguang Wang

>
> Regards,
>
> - Ted
>