Just call kmem_cache_zalloc() instead of calling kmem_cache_alloc().
We're just initializing most fields to 0, false and NULL later on
_anyway_, so to make the code more readable and potentially gain
a bit of performance (completely untested claim), we should fill our
btrfs_trans_handle with zeros on allocation and then initialize only
the five remaining fields (not counting the list_heads) as normal.
Signed-off-by: Alexandru Moise <[email protected]>
---
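(Note, not part of the commit message: a minimal sketch of the pattern being
applied, using a made-up foo_handle struct and foo_cachep cache so it stands
alone; the real change of course operates on btrfs_trans_handle and
btrfs_trans_handle_cachep.)

#include <linux/slab.h>
#include <linux/list.h>
#include <linux/types.h>

struct foo_handle {
	u64 bytes_reserved;	/* zero on a fresh handle */
	bool sync;		/* false on a fresh handle */
	unsigned int use_count;	/* non-zero, must be set explicitly */
	struct list_head pending;
};

/* assumed to be created elsewhere with kmem_cache_create() */
static struct kmem_cache *foo_cachep;

/* Before: plain allocation, every field cleared by hand. */
static struct foo_handle *foo_alloc_old(void)
{
	struct foo_handle *h = kmem_cache_alloc(foo_cachep, GFP_NOFS);

	if (!h)
		return NULL;
	h->bytes_reserved = 0;
	h->sync = false;
	h->use_count = 1;
	INIT_LIST_HEAD(&h->pending);
	return h;
}

/* After: zeroed allocation, only the non-zero fields are set. */
static struct foo_handle *foo_alloc_new(void)
{
	struct foo_handle *h = kmem_cache_zalloc(foo_cachep, GFP_NOFS);

	if (!h)
		return NULL;
	h->use_count = 1;
	INIT_LIST_HEAD(&h->pending);
	return h;
}
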
fs/btrfs/transaction.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f5021fc..d874ce25 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -468,7 +468,7 @@ start_transaction(struct btrfs_root *root, u64 num_items, unsigned int type,
goto reserve_fail;
}
again:
- h = kmem_cache_alloc(btrfs_trans_handle_cachep, GFP_NOFS);
+ h = kmem_cache_zalloc(btrfs_trans_handle_cachep, GFP_NOFS);
if (!h) {
ret = -ENOMEM;
goto alloc_fail;
@@ -509,22 +509,9 @@ again:
h->transid = cur_trans->transid;
h->transaction = cur_trans;
- h->blocks_used = 0;
- h->bytes_reserved = 0;
- h->chunk_bytes_reserved = 0;
h->root = root;
- h->delayed_ref_updates = 0;
h->use_count = 1;
- h->adding_csums = 0;
- h->block_rsv = NULL;
- h->orig_rsv = NULL;
- h->aborted = 0;
- h->qgroup_reserved = 0;
- h->delayed_ref_elem.seq = 0;
h->type = type;
- h->allocating_chunk = false;
- h->reloc_reserved = false;
- h->sync = false;
INIT_LIST_HEAD(&h->qgroup_ref_list);
INIT_LIST_HEAD(&h->new_bgs);
INIT_LIST_HEAD(&h->ordered);
--
2.5.0
On Thu, Aug 27, 2015 at 11:53:45PM +0000, Alexandru Moise wrote:
> Just call kmem_cache_zalloc() instead of calling kmem_cache_alloc().
> We're just initializing most fields to 0, false and NULL later on
> _anyway_, so to make the code more readable and potentially gain
> a bit of performance (completely untested claim), we should fill our
> btrfs_trans_handle with zeros on allocation and then initialize only
> the five remaining fields (not counting the list_heads) as normal.
>
> Signed-off-by: Alexandru Moise <[email protected]>
The performance gain is arguable but the generated code should be
smaller, which counts.
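
For the record (as I read include/linux/slab.h, so take this as a sketch
rather than gospel), kmem_cache_zalloc() is just kmem_cache_alloc() with
__GFP_ZERO, roughly:

static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
{
	return kmem_cache_alloc(k, flags | __GFP_ZERO);
}

so the zeroing moves into the allocator and the per-field clearing stores
drop out of start_transaction() itself.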
Reviewed-by: David Sterba <[email protected]>
On Fri, Aug 28, 2015 at 07:38:56PM +0200, David Sterba wrote:
> On Thu, Aug 27, 2015 at 11:53:45PM +0000, Alexandru Moise wrote:
> > Just call kmem_cache_zalloc() instead of calling kmem_cache_alloc().
> > We're just initializing most fields to 0, false and NULL later on
> > _anyway_, so to make the code more readable and potentially gain
> > a bit of performance (completely untested claim), we should fill our
> > btrfs_trans_handle with zeros on allocation and then initialize only
> > the five remaining fields (not counting the list_heads) as normal.
> >
> > Signed-off-by: Alexandru Moise <[email protected]>
>
> The performance gain is arguable but the generated code should be
> smaller, which counts.
>
> Reviewed-by: David Sterba <[email protected]>
Yeah, I ran a few iozone benchmarks on a Samsung 850 PRO SSD
on 3 kernels: the latest Arch Linux kernel, my custom kernel, which has:
CONFIG_BTRFS_ASSERT=y
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y
CONFIG_BTRFS_FS_POSIX_ACL=y
with the patch, and my custom kernel without the patch.
I ran iozone 5 times on each kernel. There were huge differences
between my custom kernels and Arch's kernel, but nothing conclusive
between my custom kernel with and without the patch. So it's safe
to say the patch doesn't have much of a visible effect on performance.
Thank you for your time!