A bitwise flag comparison can be done more efficiently with a single bit-ops expression.
Signed-off-by: Jianyu Zhan <[email protected]>
---
include/linux/blkdev.h | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1e1fa3f..adfa40a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -655,16 +655,9 @@ static inline bool rq_mergeable(struct request *rq)
static inline bool blk_check_merge_flags(unsigned int flags1,
unsigned int flags2)
{
- if ((flags1 & REQ_DISCARD) != (flags2 & REQ_DISCARD))
- return false;
-
- if ((flags1 & REQ_SECURE) != (flags2 & REQ_SECURE))
- return false;
-
- if ((flags1 & REQ_WRITE_SAME) != (flags2 & REQ_WRITE_SAME))
- return false;
-
- return true;
+ return ((flags1 & (REQ_DISCARD | REQ_SECURE | REQ_WRITE_SAME)) ^
+ (flags2 & (REQ_DISCARD | REQ_SECURE | REQ_WRITE_SAME)))
+ == 0;
}
static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
--
1.9.0.GIT
On Wed 02-04-14 23:19:06, Jianyu Zhan wrote:
> A bitwise flag comparison can be done more efficiently with a single bit-ops expression.
OK, but have you checked the generated code is actually any better? This
is something I'd expect a compiler might be able to optimize anyway. And the
original code looks more readable to me.
Honza
> Signed-off-by: Jianyu Zhan <[email protected]>
> ---
> include/linux/blkdev.h | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 1e1fa3f..adfa40a 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -655,16 +655,9 @@ static inline bool rq_mergeable(struct request *rq)
> static inline bool blk_check_merge_flags(unsigned int flags1,
> unsigned int flags2)
> {
> - if ((flags1 & REQ_DISCARD) != (flags2 & REQ_DISCARD))
> - return false;
> -
> - if ((flags1 & REQ_SECURE) != (flags2 & REQ_SECURE))
> - return false;
> -
> - if ((flags1 & REQ_WRITE_SAME) != (flags2 & REQ_WRITE_SAME))
> - return false;
> -
> - return true;
> + return ((flags1 & (REQ_DISCARD | REQ_SECURE | REQ_WRITE_SAME)) ^
> + (flags2 & (REQ_DISCARD | REQ_SECURE | REQ_WRITE_SAME)))
> + == 0;
> }
>
> static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
> --
> 1.9.0.GIT
>
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Thu, Apr 3, 2014 at 2:13 AM, Jan Kara <[email protected]> wrote:
> OK, but have you checked the generated code is actually any better? This
> is something I'd expect a compiler might be able to optimize anyway. And the
> original code looks more readable to me.
Hi, Jan,
I've disassembled the code on my x86_64 box
(the function is inlined, so I looked at its call site),
and found that this patch DOES make it more efficient.
Orig asm snippet                         Asm snippet with patch
================                         ======================
mov %edx,%ecx mov %rdx,%r9
xor %r8d,%ecx xor %r8d,%r8d
test $0x80,%cl and $0x380,%r9d
jne 14c5 <blk_rq_merge_ok+0x15> test $0x380,%ecx
and $0x3,%ch sete %r8b
jne 14c5 <blk_rq_merge_ok+0x15> cmp %r8,%r9
je 14b5 <blk_rq_merge_ok+0x15>
This saves a branch.
Furthermore, I found that gcc is smart enough to optimize such
code itself; if we write it like this, it generates the most
compact and optimal code:
static inline bool blk_check_merge_flags(unsigned int flags1,
                                         unsigned int flags2)
{
        return ((flags1 ^ flags2) &
                (REQ_DISCARD | REQ_SECURE | REQ_WRITE_SAME))
                == 0;
}
this gives:
mov %edx,%r8d
xor %ecx,%r8d
and $0x380,%r8d
jne 14a5 <blk_rq_merge_ok+0x15>
But yes, it compromises readability.
Regards,
Jianyu Zhan
On Thu 03-04-14 16:00:44, Zhan Jianyu wrote:
> On Thu, Apr 3, 2014 at 2:13 AM, Jan Kara <[email protected]> wrote:
> > OK, but have you checked the generated code is actually any better? This
> > is something I'd expect a compiler might be able to optimize anyway. And the
> > original code looks more readable to me.
>
> Hi, Jan,
>
> I've disassembled the code on my x86_64 box
> (the function is inlined, so I looked at its call site),
> and found that this patch DOES make it more efficient.
>
> Orig asm snippet                         Asm snippet with patch
> ================                         ======================
>
> mov %edx,%ecx mov %rdx,%r9
> xor %r8d,%ecx xor %r8d,%r8d
> test $0x80,%cl and $0x380,%r9d
> jne 14c5 <blk_rq_merge_ok+0x15> test $0x380,%ecx
> and $0x3,%ch sete %r8b
> jne 14c5 <blk_rq_merge_ok+0x15> cmp %r8,%r9
>
> je 14b5 <blk_rq_merge_ok+0x15>
>
> This saves a branch.
>
> Furthermore, I found that gcc is smart enough to optimize such
> code itself; if we write it like this, it generates the most
> compact and optimal code:
>
>
> static inline bool blk_check_merge_flags(unsigned int flags1,
>                                          unsigned int flags2)
> {
> return ((flags1 ^ flags2) &
> (REQ_DISCARD | REQ_SECURE | REQ_WRITE_SAME))
> == 0;
> }
>
> this gives:
>
> mov %edx,%r8d
> xor %ecx,%r8d
> and $0x380,%r8d
> jne 14a5 <blk_rq_merge_ok+0x15>
>
> But yes, it compromises readability.
OK, I'd have expected gcc to be more clever ;). Thanks for the comparison. Anyway,
if that function is performance sensitive, we can use your optimization.
Just add a comment there explaining that we want to check whether the three
flags are the same in both sets, and that checking it your way generates better code.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
Hi, Jan,
I've just updated the patch as you suggested. Actually it isn't particularly
performance sensitive, but the point is that one less branch means
less penalty from branch-prediction failures. OK, this may be
way too paranoid. :-)
A bitwise flag comparison can be done more efficiently with a single bit-ops
expression, by mimicking GCC's logic for optimizing such bitwise comparisons.
Signed-off-by: Jianyu Zhan <[email protected]>
---
include/linux/blkdev.h | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1e1fa3f..f2b79fc 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -655,16 +655,18 @@ static inline bool rq_mergeable(struct request *rq)
static inline bool blk_check_merge_flags(unsigned int flags1,
unsigned int flags2)
{
- if ((flags1 & REQ_DISCARD) != (flags2 & REQ_DISCARD))
- return false;
-
- if ((flags1 & REQ_SECURE) != (flags2 & REQ_SECURE))
- return false;
-
- if ((flags1 & REQ_WRITE_SAME) != (flags2 & REQ_WRITE_SAME))
- return false;
-
- return true;
+ /*
+ * Check whether all three flags are the same in both
+ * flag sets.
+ *
+ * Replace the original three-if comparison with a
+ * more efficient method, by mimicking the GCC logic of
+ * optimizing such bitwise comparisons. This lets GCC
+ * emit the most compact and least-branchy code.
+ */
+ return ((flags1 ^ flags2) &
+ (REQ_DISCARD | REQ_SECURE | REQ_WRITE_SAME))
+ == 0;
}
static inline bool blk_write_same_mergeable(struct bio *a, struct bio *b)
--
1.9.0.GIT