LinuxLists.cc - [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint

2023-05-31 11:52:02

Subject: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint

From: Zhihao Cheng <[email protected]>

Following process,

jbd2_journal_commit_transaction
// there are several dirty buffer heads in transaction->t_checkpoint_list
P1 wb_workfn
jbd2_log_do_checkpoint
if (buffer_locked(bh)) // false
__block_write_full_page
trylock_buffer(bh)
test_clear_buffer_dirty(bh)
if (!buffer_dirty(bh))
__jbd2_journal_remove_checkpoint(jh)
if (buffer_write_io_error(bh)) // false
>> bh IO error occurs <<
jbd2_cleanup_journal_tail
__jbd2_update_log_tail
jbd2_write_superblock
// The bh won't be replayed in next mount.
, which could corrupt the ext4 image, fetch a reproducer in [Link].

Since writeback process clears buffer dirty after locking buffer head,
we can fix it by checking buffer dirty firstly and then checking buffer
locked, the buffer head can be removed if it is neither dirty nor locked.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Zhihao Cheng <[email protected]>
Signed-off-by: Zhang Yi <[email protected]>
---
fs/jbd2/checkpoint.c | 48 ++++++++++++++++++++++----------------------
1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 3f52560912a9..620f3d345f3d 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -204,20 +204,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);

- /*
- * The buffer may be writing back, or flushing out in the
- * last couple of cycles, or re-adding into a new transaction,
- * need to check it again until it's unlocked.
- */
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- goto retry;
- }
if (jh->b_transaction != NULL) {
transaction_t *t = jh->b_transaction;
tid_t tid = t->t_tid;
@@ -252,16 +238,7 @@ int jbd2_log_do_checkpoint(journal_t *journal)
spin_lock(&journal->j_list_lock);
goto restart;
}
- if (!buffer_dirty(bh)) {
- BUFFER_TRACE(bh, "remove from checkpoint");
- /*
- * If the transaction was released or the checkpoint
- * list was empty, we're done.
- */
- if (__jbd2_journal_remove_checkpoint(jh) ||
- !transaction->t_checkpoint_list)
- goto out;
- } else {
+ if (buffer_dirty(bh)) {
/*
* We are about to write the buffer, it could be
* raced by some other transaction shrink or buffer
@@ -275,6 +252,29 @@ int jbd2_log_do_checkpoint(journal_t *journal)
journal->j_chkpt_bhs[batch_count++] = bh;
transaction->t_chp_stats.cs_written++;
transaction->t_checkpoint_list = jh->b_cpnext;
+ } else if (buffer_locked(bh)) {
+ /*
+ * The buffer may be writing back, or flushing out
+ * in the last couple of cycles, or re-adding into
+ * a new transaction, need to check it again until
+ * it's unlocked.
+ */
+ get_bh(bh);
+ spin_unlock(&journal->j_list_lock);
+ wait_on_buffer(bh);
+ /* the journal_head may have gone by now */
+ BUFFER_TRACE(bh, "brelse");
+ __brelse(bh);
+ goto retry;
+ } else {
+ BUFFER_TRACE(bh, "remove from checkpoint");
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
+ goto out;
}

if ((batch_count == JBD2_NR_BATCH) ||
--
2.31.1

2023-06-01 09:44:12

by Jan Kara

[permalink] [raw]

Subject: Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint

On Wed 31-05-23 19:50:59, Zhang Yi wrote:
> From: Zhihao Cheng <[email protected]>
>
> Following process,
>
> jbd2_journal_commit_transaction
> // there are several dirty buffer heads in transaction->t_checkpoint_list
> P1 wb_workfn
> jbd2_log_do_checkpoint
> if (buffer_locked(bh)) // false
> __block_write_full_page
> trylock_buffer(bh)
> test_clear_buffer_dirty(bh)
> if (!buffer_dirty(bh))
> __jbd2_journal_remove_checkpoint(jh)
> if (buffer_write_io_error(bh)) // false
> >> bh IO error occurs <<
> jbd2_cleanup_journal_tail
> __jbd2_update_log_tail
> jbd2_write_superblock
> // The bh won't be replayed in next mount.
> , which could corrupt the ext4 image, fetch a reproducer in [Link].
>
> Since writeback process clears buffer dirty after locking buffer head,
> we can fix it by checking buffer dirty firstly and then checking buffer
> locked, the buffer head can be removed if it is neither dirty nor locked.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
> Signed-off-by: Zhihao Cheng <[email protected]>
> Signed-off-by: Zhang Yi <[email protected]>

OK, the analysis is correct but I'm afraid the fix won't be that easy. The
reordering of tests you did below doesn't really help because CPU or the
compiler are free to order the loads (and stores) in whatever way they
wish. You'd have to use memory barriers when reading and modifying bh flags
(although the modification side is implicitely handled by the bitlock
code) to make this work reliably. But that is IMHO too subtle for this
code.

What we should be doing to avoid these races is to lock the bh. So
something like:

if (jh->b_transaction != NULL) {
do stuff
}
if (!trylock_buffer(bh)) {
buffer_locked() branch
}
... Now we have the buffer locked and can safely check for dirtyness

And we need to do a similar treatment for journal_clean_one_cp_list() and
journal_shrink_one_cp_list().

BTW, I think we could merge journal_clean_one_cp_list() and
journal_shrink_one_cp_list() into a single common function. I think we can
drop the nr_to_scan argument and just always cleanup the whole checkpoint
list and return the number of freed buffers. That way we have one less
function to deal with checkpoint list cleaning.

Thinking about it some more maybe we can have a function like:

int jbd2_try_remove_checkpoint(struct journal_head *jh)
{
struct buffer_head *bh = jh2bh(jh);

if (!trylock_buffer(bh) || buffer_dirty(bh))
return -EBUSY;
/*
* Buffer is clean and the IO has finished (we hold the buffer lock) so
* the checkpoint is done. We can safely remove the buffer from this
* transaction.
*/
unlock_buffer(bh);
return __jbd2_journal_remove_checkpoint(jh);
}

and that can be used with a bit of care in the checkpointing functions as
well as in jbd2_journal_forget(), __journal_try_to_free_buffer(),
journal_unmap_buffer().

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-06-01 13:48:48

by Zhihao Cheng

[permalink] [raw]

Subject: Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint

?? 2023/6/1 17:41, Jan Kara д??:

Hi, Jan
> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
>> From: Zhihao Cheng <[email protected]>
>>
>> Following process,
>>
>> jbd2_journal_commit_transaction
>> // there are several dirty buffer heads in transaction->t_checkpoint_list
>> P1 wb_workfn
>> jbd2_log_do_checkpoint
>> if (buffer_locked(bh)) // false
>> __block_write_full_page
>> trylock_buffer(bh)
>> test_clear_buffer_dirty(bh)
>> if (!buffer_dirty(bh))
>> __jbd2_journal_remove_checkpoint(jh)
>> if (buffer_write_io_error(bh)) // false
>> >> bh IO error occurs <<
>> jbd2_cleanup_journal_tail
>> __jbd2_update_log_tail
>> jbd2_write_superblock
>> // The bh won't be replayed in next mount.
>> , which could corrupt the ext4 image, fetch a reproducer in [Link].
>>
>> Since writeback process clears buffer dirty after locking buffer head,
>> we can fix it by checking buffer dirty firstly and then checking buffer
>> locked, the buffer head can be removed if it is neither dirty nor locked.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
>> Signed-off-by: Zhihao Cheng <[email protected]>
>> Signed-off-by: Zhang Yi <[email protected]>
>
> OK, the analysis is correct but I'm afraid the fix won't be that easy. The
> reordering of tests you did below doesn't really help because CPU or the
> compiler are free to order the loads (and stores) in whatever way they
> wish. You'd have to use memory barriers when reading and modifying bh flags
> (although the modification side is implicitely handled by the bitlock
> code) to make this work reliably. But that is IMHO too subtle for this
> code.
>

Do you mean there might be a sequence like following:

jbd2_log_do_checkpoint
if (buffer_dirty(bh))
else if (buffer_locked(bh))
else
__jbd2_journal_remove_checkpoint(jh)

CPU re-arranges the order of getting buffer state.
reg_1 = buffer_locked(bh) // false
lock_buffer(bh)
clear_buffer(bh)
reg_2 = buffer_dirty(bh) // false

Then, jbd2_log_do_checkpoint() could become:
if (reg_2)
else if (reg_1)
else
__jbd2_journal_remove_checkpoint(jh) // enter !

Am I understanding right?

> What we should be doing to avoid these races is to lock the bh. So
> something like:
>
> if (jh->b_transaction != NULL) {
> do stuff
> }
> if (!trylock_buffer(bh)) {
> buffer_locked() branch
> }
> ... Now we have the buffer locked and can safely check for dirtyness
>
> And we need to do a similar treatment for journal_clean_one_cp_list() and
> journal_shrink_one_cp_list().
>
> BTW, I think we could merge journal_clean_one_cp_list() and
> journal_shrink_one_cp_list() into a single common function. I think we can
> drop the nr_to_scan argument and just always cleanup the whole checkpoint
> list and return the number of freed buffers. That way we have one less
> function to deal with checkpoint list cleaning.
>
> Thinking about it some more maybe we can have a function like:
>
> int jbd2_try_remove_checkpoint(struct journal_head *jh)
> {
> struct buffer_head *bh = jh2bh(jh);
>
> if (!trylock_buffer(bh) || buffer_dirty(bh))
> return -EBUSY;
> /*
> * Buffer is clean and the IO has finished (we hold the buffer lock) so
> * the checkpoint is done. We can safely remove the buffer from this
> * transaction.
> */
> unlock_buffer(bh);
> return __jbd2_journal_remove_checkpoint(jh);
> }
>
> and that can be used with a bit of care in the checkpointing functions as
> well as in jbd2_journal_forget(), __journal_try_to_free_buffer(),
> journal_unmap_buffer().
>
> Honza
>

2023-06-01 14:25:16

by Zhang Yi

[permalink] [raw]

Subject: Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint

On 2023/6/1 21:44, Zhihao Cheng wrote:
> 在 2023/6/1 17:41, Jan Kara 写道:
>
> Hi, Jan
>> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
>>> From: Zhihao Cheng <[email protected]>
>>>
>>> Following process,
>>>
>>> jbd2_journal_commit_transaction
>>> // there are several dirty buffer heads in transaction->t_checkpoint_list
>>>            P1                   wb_workfn
>>> jbd2_log_do_checkpoint
>>> if (buffer_locked(bh)) // false
>>>                              __block_write_full_page
>>>                               trylock_buffer(bh)
>>>                               test_clear_buffer_dirty(bh)
>>> if (!buffer_dirty(bh))
>>>    __jbd2_journal_remove_checkpoint(jh)
>>>     if (buffer_write_io_error(bh)) // false
>>>                               >> bh IO error occurs <<
>>> jbd2_cleanup_journal_tail
>>>    __jbd2_update_log_tail
>>>     jbd2_write_superblock
>>>     // The bh won't be replayed in next mount.
>>> , which could corrupt the ext4 image, fetch a reproducer in [Link].
>>>
>>> Since writeback process clears buffer dirty after locking buffer head,
>>> we can fix it by checking buffer dirty firstly and then checking buffer
>>> locked, the buffer head can be removed if it is neither dirty nor locked.
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
>>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
>>> Signed-off-by: Zhihao Cheng <[email protected]>
>>> Signed-off-by: Zhang Yi <[email protected]>
>>
>> OK, the analysis is correct but I'm afraid the fix won't be that easy. The
>> reordering of tests you did below doesn't really help because CPU or the
>> compiler are free to order the loads (and stores) in whatever way they
>> wish. You'd have to use memory barriers when reading and modifying bh flags
>> (although the modification side is implicitely handled by the bitlock
>> code) to make this work reliably. But that is IMHO too subtle for this
>> code.
>>
>

I write two litmus-test files in tools/memory-model to check the memory module
of these two cases as jbd2_log_do_checkpoint() and __cp_buffer_busy() does.

**1. test-ww-rr.litmus** //simulate our changes in jbd2_log_do_checkpoint()

C WW-RR

(*
* Result: Never
*
* This test shows that write-write ordering
* (in P0()) is visible to external process P2().
*
* bit 0: lock
* bit 1: dirty
*)

{
x=2;
}

P0(int *x)
{
int r1;

r1 = READ_ONCE(*x);
r1 = r1 | 1; //lock
WRITE_ONCE(*x, r1);

r1 = READ_ONCE(*x);
r1 = r1 & 253; //&~2, clear dirty
WRITE_ONCE(*x, r1);
}

P1(int *x)
{
int r0;
int r1;

r0 = READ_ONCE(*x);
r1 = READ_ONCE(*x);
}

exists (1:r1=2 /\ 1:r0=1)

The test results are:

$ herd7 -conf linux-kernel.cfg litmus-tests/test-ww-rr.litmus
Test WW-RR Allowed
States 6
1:r0=1; 1:r1=1;
1:r0=2; 1:r1=1;
1:r0=2; 1:r1=2;
1:r0=2; 1:r1=3;
1:r0=3; 1:r1=1;
1:r0=3; 1:r1=3;
No
Witnesses
Positive: 0 Negative: 6
Condition exists (1:r1=2 /\ 1:r0=1)
Observation WW-RR Never 0 6
Time WW-RR 0.02
Hash=d91ecee2379f8ac878b8d06f17967874

**2. test-ww-r.litmus** //simulate our changes in __cp_buffer_busy()

C WW-R

(*
* Result: Never
*
* This test shows that write-write ordering
* (in P0()) is visible to external process P2().
*
* bit 0: lock
* bit 1: dirty
*)

{
x=2;
}

P0(int *x)
{
int r1;

r1 = READ_ONCE(*x);
r1 = r1 | 1; //lock
WRITE_ONCE(*x, r1);

r1 = READ_ONCE(*x);
r1 = r1 & 253; //&~2, clear dirty
WRITE_ONCE(*x, r1);
}

P1(int *x)
{
int r0;

r0 = READ_ONCE(*x);
}

exists (1:r0=0)

The test results are:

$ herd7 -conf linux-kernel.cfg litmus-tests/test-ww-r.litmus
Test WW-R Allowed
States 3
1:r0=1;
1:r0=2;
1:r0=3;
No
Witnesses
Positive: 0 Negative: 3
Condition exists (1:r0=0)
Observation WW-R Never 0 3
Time WW-R 0.01
Hash=d76bb39c367dc0cbd0c363a87c6c9eb7

So it looks like the out-of-order situation cannot happen, am I miss something?

Thanks,
Yi.

> Do you mean there might be a sequence like following:
>
> jbd2_log_do_checkpoint
> if (buffer_dirty(bh))
> else if (buffer_locked(bh))
> else
>    __jbd2_journal_remove_checkpoint(jh)
>
> CPU re-arranges the order of getting buffer state.
> reg_1 = buffer_locked(bh) // false
>                            lock_buffer(bh)
>                            clear_buffer(bh)
> reg_2 = buffer_dirty(bh)   // false
>
> Then, jbd2_log_do_checkpoint() could become:
> if (reg_2)
> else if (reg_1)
> else
> __jbd2_journal_remove_checkpoint(jh) // enter !
>
> Am I understanding right?
>
>> What we should be doing to avoid these races is to lock the bh. So
>> something like:
>>
>>     if (jh->b_transaction != NULL) {
>>         do stuff
>>     }
>>     if (!trylock_buffer(bh)) {
>>         buffer_locked() branch
>>     }
>>     ... Now we have the buffer locked and can safely check for dirtyness
>>
>> And we need to do a similar treatment for journal_clean_one_cp_list() and
>> journal_shrink_one_cp_list().
>>
>> BTW, I think we could merge journal_clean_one_cp_list() and
>> journal_shrink_one_cp_list() into a single common function. I think we can
>> drop the nr_to_scan argument and just always cleanup the whole checkpoint
>> list and return the number of freed buffers. That way we have one less
>> function to deal with checkpoint list cleaning.
>>
>> Thinking about it some more maybe we can have a function like:
>>
>> int jbd2_try_remove_checkpoint(struct journal_head *jh)
>> {
>>     struct buffer_head *bh = jh2bh(jh);
>>
>>     if (!trylock_buffer(bh) || buffer_dirty(bh))
>>         return -EBUSY;
>>     /*
>>      * Buffer is clean and the IO has finished (we hold the buffer lock) so
>>      * the checkpoint is done. We can safely remove the buffer from this
>>      * transaction.
>>      */
>>     unlock_buffer(bh);
>>     return __jbd2_journal_remove_checkpoint(jh);
>> }
>>
>> and that can be used with a bit of care in the checkpointing functions as
>> well as in jbd2_journal_forget(), __journal_try_to_free_buffer(),
>> journal_unmap_buffer().
>>
>>                                 Honza
>>

2023-06-01 16:39:55

by Jan Kara

[permalink] [raw]

Subject: Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint

On Thu 01-06-23 22:20:38, Zhang Yi wrote:
> On 2023/6/1 21:44, Zhihao Cheng wrote:
> > 在 2023/6/1 17:41, Jan Kara 写道:
> >
> > Hi, Jan
> >> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
> >>> From: Zhihao Cheng <[email protected]>
> >>>
> >>> Following process,
> >>>
> >>> jbd2_journal_commit_transaction
> >>> // there are several dirty buffer heads in transaction->t_checkpoint_list
> >>>            P1                   wb_workfn
> >>> jbd2_log_do_checkpoint
> >>> if (buffer_locked(bh)) // false
> >>>                              __block_write_full_page
> >>>                               trylock_buffer(bh)
> >>>                               test_clear_buffer_dirty(bh)
> >>> if (!buffer_dirty(bh))
> >>>    __jbd2_journal_remove_checkpoint(jh)
> >>>     if (buffer_write_io_error(bh)) // false
> >>>                               >> bh IO error occurs <<
> >>> jbd2_cleanup_journal_tail
> >>>    __jbd2_update_log_tail
> >>>     jbd2_write_superblock
> >>>     // The bh won't be replayed in next mount.
> >>> , which could corrupt the ext4 image, fetch a reproducer in [Link].
> >>>
> >>> Since writeback process clears buffer dirty after locking buffer head,
> >>> we can fix it by checking buffer dirty firstly and then checking buffer
> >>> locked, the buffer head can be removed if it is neither dirty nor locked.
> >>>
> >>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
> >>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
> >>> Signed-off-by: Zhihao Cheng <[email protected]>
> >>> Signed-off-by: Zhang Yi <[email protected]>
> >>
> >> OK, the analysis is correct but I'm afraid the fix won't be that easy. The
> >> reordering of tests you did below doesn't really help because CPU or the
> >> compiler are free to order the loads (and stores) in whatever way they
> >> wish. You'd have to use memory barriers when reading and modifying bh flags
> >> (although the modification side is implicitely handled by the bitlock
> >> code) to make this work reliably. But that is IMHO too subtle for this
> >> code.
> >>
> >
>
> I write two litmus-test files in tools/memory-model to check the memory module
> of these two cases as jbd2_log_do_checkpoint() and __cp_buffer_busy() does.

<snip litmus tests>

> So it looks like the out-of-order situation cannot happen, am I miss something?

After thinking about it a bit, indeed CPU cannot reorder the two loads
because they are from the same location in memory. Thanks for correcting me
on this. I'm not sure whether a C compiler still could not reorder the
tests - technically I suspect the C standard does not forbid this although
it would have to be really evil compiler to do this.

But still I think with the helper function I've suggested things are much
more obviously correct.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2023-06-02 02:00:14

by Zhihao Cheng

[permalink] [raw]

Subject: Re: [PATCH 4/5] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint

在 2023/6/2 0:31, Jan Kara 写道:
> On Thu 01-06-23 22:20:38, Zhang Yi wrote:
>> On 2023/6/1 21:44, Zhihao Cheng wrote:
>>> 在 2023/6/1 17:41, Jan Kara 写道:
>>>
>>> Hi, Jan
>>>> On Wed 31-05-23 19:50:59, Zhang Yi wrote:
>>>>> From: Zhihao Cheng <[email protected]>
>>>>>
>>>>> Following process,
>>>>>
>>>>> jbd2_journal_commit_transaction
>>>>> // there are several dirty buffer heads in transaction->t_checkpoint_list
>>>>>            P1                   wb_workfn
>>>>> jbd2_log_do_checkpoint
>>>>> if (buffer_locked(bh)) // false
>>>>>                              __block_write_full_page
>>>>>                               trylock_buffer(bh)
>>>>>                               test_clear_buffer_dirty(bh)
>>>>> if (!buffer_dirty(bh))
>>>>>    __jbd2_journal_remove_checkpoint(jh)
>>>>>     if (buffer_write_io_error(bh)) // false
>>>>>                               >> bh IO error occurs <<
>>>>> jbd2_cleanup_journal_tail
>>>>>    __jbd2_update_log_tail
>>>>>     jbd2_write_superblock
>>>>>     // The bh won't be replayed in next mount.
>>>>> , which could corrupt the ext4 image, fetch a reproducer in [Link].
>>>>>
>>>>> Since writeback process clears buffer dirty after locking buffer head,
>>>>> we can fix it by checking buffer dirty firstly and then checking buffer
>>>>> locked, the buffer head can be removed if it is neither dirty nor locked.
>>>>>
>>>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
>>>>> Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
>>>>> Signed-off-by: Zhihao Cheng <[email protected]>
>>>>> Signed-off-by: Zhang Yi <[email protected]>
>>>>
>>>> OK, the analysis is correct but I'm afraid the fix won't be that easy. The
>>>> reordering of tests you did below doesn't really help because CPU or the
>>>> compiler are free to order the loads (and stores) in whatever way they
>>>> wish. You'd have to use memory barriers when reading and modifying bh flags
>>>> (although the modification side is implicitely handled by the bitlock
>>>> code) to make this work reliably. But that is IMHO too subtle for this
>>>> code.
>>>>
>>>
>>
>> I write two litmus-test files in tools/memory-model to check the memory module
>> of these two cases as jbd2_log_do_checkpoint() and __cp_buffer_busy() does.
>
> <snip litmus tests>
>
>> So it looks like the out-of-order situation cannot happen, am I miss something?
>
> After thinking about it a bit, indeed CPU cannot reorder the two loads
> because they are from the same location in memory. Thanks for correcting me
> on this. I'm not sure whether a C compiler still could not reorder the
> tests - technically I suspect the C standard does not forbid this although
> it would have to be really evil compiler to do this.
>
> But still I think with the helper function I've suggested things are much
> more obviously correct.

Thanks for suggestions, we will modify it in next iteration.