LinuxLists.cc - Help required for Debugging JBD

2011-06-21 16:54:34

Subject: Help required for Debugging JBD

Hi all,
I have modified my JBD (no change to Ext3) to support
Transactional-Flash. For that I only need to collect the metadata and
data blocks and send them simultaneously (no transaction record, revoke
record, etc.).
The problem is that, though it works well for a small number of
operations, the kernel hangs completely when I run any benchmark (like
blogbench or postmark) over it. The bad thing is that no trace is left
afterwards, i.e. the logs contain no messages from after the operation
was started, no kernel OOPS is shown, no faults are shown, and no panic
is shown even though I have enabled panic on hard/soft lockup.
I have to hard reboot the machine each time.
So essentially, I am totally clueless about the point at which
it is crashing or the reason behind it. There is a small possibility that
the bug is in the modified MTD layer (which I've written myself), but since
I have run unmodified Ext3 on that MTD layer without any bug, the chance
of a buggy MTD layer appears very slim.

Any help is greatly appreciated.

Niraj

2011-06-22 06:43:54

by Amir Goldstein

Subject: Re: Help required for Debugging JBD

On Tue, Jun 21, 2011 at 7:52 PM, Niraj Kulkarni wrote:
> Hi all,
> I have modified my JBD (no change to Ext3) to support
> Transactional-Flash. For that I only need to collect the metadata and data
> blocks and send them simultaneously (no transaction record, revoke record,
> etc.).

It's hard to help without knowing what your patches do in more detail.
Could you post the patches or give a more detailed description of how they
work?
For example, how do you collect data blocks without them having already
been written? Or is this not a requirement for your use case?

> The problem is that, though it works well for a small number of operations,
> the kernel hangs completely when I run any benchmark (like blogbench or
> postmark) over it. The bad thing is that no trace is left afterwards,
> i.e. the logs contain no messages from after the operation was started,
> no kernel OOPS is shown, no faults are shown, and no panic is shown
> even though I have enabled panic on hard/soft lockup. I have to hard reboot
> the machine each time.

Did you try alt+sysrq+w to dump waiting tasks?
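
For reference, the same dump can also be requested without the keyboard by writing the command character to /proc/sysrq-trigger as root, provided the machine is still responsive enough to run a process. Below is a minimal sketch in C; the function name sysrq_send is hypothetical, and the path is a parameter so the helper can be tried on an ordinary file first. The 'w' command asks the kernel to dump tasks in uninterruptible (blocked) state to the kernel log.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write one SysRq command character to a trigger file.
 * On a real system the path would be "/proc/sysrq-trigger" (needs root).
 * sysrq_send is an illustrative name, not a kernel or libc API.
 * Returns 0 on success, -1 on failure.
 */
int sysrq_send(const char *trigger_path, char cmd)
{
    assert(trigger_path != NULL);

    FILE *f = fopen(trigger_path, "w");
    if (f == NULL)
        return -1;

    int ok = (fputc(cmd, f) != EOF);
    if (fclose(f) != 0)   /* the write may only be reported at close time */
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling sysrq_send("/proc/sysrq-trigger", 'w') as root should make the blocked-task dump appear in the kernel log (readable with dmesg). If the machine is hung so hard that no process can run, only the keyboard combination remains.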

> So essentially, I am totally clueless about the point at which it is
> crashing or the reason behind it. There is a small possibility that the bug
> is in the modified MTD layer (which I've written myself), but since I have
> run unmodified Ext3 on that MTD layer without any bug, the chance of a buggy
> MTD layer appears very slim.
>
> Any help is greatly appreciated.
>
> Niraj

2011-06-22 18:30:32

by Niraj Kulkarni

Subject: Re: Help required for Debugging JBD

Hi,
Thanks for that SysRq tip. Now I am able to get some logs.

The OOPS message showed an assertion failure on
J_ASSERT_JH(jh, jh->b_transaction ==
journal->j_committing_transaction);

In my code, I've modified journal_commit_transaction so that it
collects all buffer_heads in a linked list, with their corresponding
buffer numbers in another list.
I collect all buffers (data + metadata), push them all
simultaneously, and pass the list of block numbers through a special ioctl call.

The problem that I see in my code is that all buffers are handled in the
same way as data buffers are in the original code, i.e. metadata buffers
are getting unfiled instead of refiled.
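
To make the suspected failure mode concrete, here is a toy user-space model in C. It is only a sketch: every toy_ name is hypothetical and none of this is real JBD code. It assumes that unfiling a buffer clears its owning-transaction pointer, after which the condition checked by the failing assertion no longer holds for that buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields needed here. */
struct toy_transaction { int tid; };

struct toy_journal_head {
    struct toy_transaction *b_transaction;   /* owner, NULL once unfiled */
};

struct toy_journal {
    struct toy_transaction *j_committing_transaction;
};

/* Assumption: unfiling detaches the buffer from its transaction. */
void toy_unfile(struct toy_journal_head *jh)
{
    jh->b_transaction = NULL;
}

/* The condition the failing J_ASSERT_JH expresses. */
bool toy_owned_by_committing(const struct toy_journal *j,
                             const struct toy_journal_head *jh)
{
    return jh->b_transaction == j->j_committing_transaction;
}

/* Toy equivalent of the assertion: aborts when the condition is false. */
#define TOY_ASSERT_JH(j, jh) assert(toy_owned_by_committing((j), (jh)))
```

In this model, a metadata buffer that is unfiled too early and then examined again later in the commit would trip TOY_ASSERT_JH. Whether that is what actually happens in the patched commit path can only be confirmed against the patch itself.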

I am attaching my patch. Could you please check whether that is indeed
the problem here?

Also, what would be a possible solution to it? Separating the buffers into
two lists (data, metadata) and handling them separately?

(Being a kernel noob, my coding does not conform to any standard. So
please point out any blunders I've committed in my patch.)

Thank You
Niraj

Attachments:

patch (31.41 kB)

2011-06-22 19:29:29

by Amir Goldstein

Subject: Re: Help required for Debugging JBD

On Wed, Jun 22, 2011 at 9:27 PM, Niraj Kulkarni wrote:
> Hi,
> Thanks for that SysRq tip. Now I am able to get some logs.
>
> The OOPS message showed an assertion failure on
>     J_ASSERT_JH(jh, jh->b_transaction == journal->j_committing_transaction);
>
> In my code, I've modified journal_commit_transaction so that it collects
> all buffer_heads in a linked list, with their corresponding
> buffer numbers in another list.
> I collect all buffers (data + metadata), push them all
> simultaneously, and pass the list of block numbers through a special ioctl call.
>
> The problem that I see in my code is that all buffers are handled in the
> same way as data buffers are in the original code, i.e. metadata buffers
> are getting unfiled instead of refiled.

I am not sure what you are saying, but it sounds bad.
Data buffers and metadata buffers are handled very differently.

>
> I am attaching my patch. Could you please check whether that is indeed
> the problem here?
>

JBD is one complicated piece of work (to me), so even if I do find time
to review your patch, it's not going to be easy for me.

> Also, what would be a possible solution to it? Separating the buffers into
> two lists (data, metadata) and handling them separately?
>

Without looking at your patches, I have a lead for you.
Look up the "Journal guided RAID resync" patches.
They do something similar to what you describe, for a different purpose,
but they also maintain a list of data blocks and yes, they deal with them
separately.
These patches have already been tested, so they used to be in good shape,
but they are not up to date.
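
As a rough illustration of what "dealing with them separately" could look like, here is a toy user-space sketch in C. All toy_ names are hypothetical; this is not JBD code and not the method used by the RAID resync patches. It assumes, following the description earlier in the thread, that data buffers are unfiled once they have been written while metadata buffers are refiled.

```c
#include <assert.h>
#include <stddef.h>

enum toy_kind  { TOY_DATA, TOY_METADATA };
enum toy_state { TOY_ON_COMMIT_LIST, TOY_UNFILED, TOY_REFILED };

/* Toy buffer: just enough state to show the two code paths. */
struct toy_buf {
    enum toy_kind   kind;
    enum toy_state  state;
    unsigned long   blocknr;
    struct toy_buf *next;
};

struct toy_lists {
    struct toy_buf *data;       /* data buffers only */
    struct toy_buf *metadata;   /* metadata buffers only */
};

/* Split one mixed list into a data list and a metadata list. */
void toy_split(struct toy_buf *all, struct toy_lists *out)
{
    assert(out != NULL);
    out->data = NULL;
    out->metadata = NULL;

    while (all != NULL) {
        struct toy_buf *next = all->next;
        struct toy_buf **head =
            (all->kind == TOY_DATA) ? &out->data : &out->metadata;

        all->next = *head;   /* push onto the front of the chosen list */
        *head = all;
        all = next;
    }
}

/* After the write-out: each kind of buffer gets its own treatment. */
void toy_finish(struct toy_lists *lists)
{
    for (struct toy_buf *b = lists->data; b != NULL; b = b->next)
        b->state = TOY_UNFILED;    /* data: dropped from the journal */

    for (struct toy_buf *b = lists->metadata; b != NULL; b = b->next)
        b->state = TOY_REFILED;    /* metadata: kept, moved to its next list */
}
```

The point of the split is that the write-out can still push both lists together, while the cleanup afterwards never has to guess which kind of buffer it is looking at.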

Good luck,
Amir.

> (Being a kernel noob, my coding does not conform to any standard. So please
> point out any blunders I've committed in my patch)
>
> Thank You
> Niraj
>