2016-04-08 18:58:54

by Denys Vlasenko

[permalink] [raw]
Subject: [PATCH] drivers/infiniband/hw/nes/nes_verbs.c: Deinline nes_free_qp_mem, save 1072 bytes

This function compiles to 550 bytes of machine code.
Three callsites, all in nes_create_qp.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Faisal Latif <[email protected]>
CC: Doug Ledford <[email protected]>
CC: [email protected]
CC: [email protected]
---
drivers/infiniband/hw/nes/nes_verbs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index fba69a3..5f48d08 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -981,7 +981,7 @@ static int nes_setup_mmap_qp(struct nes_qp *nesqp, struct nes_vnic *nesvnic,
/**
* nes_free_qp_mem() is to free up the qp's pci_alloc_consistent() memory.
*/
-static inline void nes_free_qp_mem(struct nes_device *nesdev,
+static void nes_free_qp_mem(struct nes_device *nesdev,
struct nes_qp *nesqp, int virt_wqs)
{
unsigned long flags;
--
2.1.0


2016-04-08 18:59:02

by Denys Vlasenko

[permalink] [raw]
Subject: [PATCH] drivers/virtio/virtio_ring.c: Deinline virtqueue_add, save 1016 bytes

This function compiles to 839 bytes of machine code.
In C, it is ~150 lines long.

This function has 3 callsites.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: "Michael S. Tsirkin" <[email protected]>
CC: [email protected]
CC: [email protected]
---
drivers/virtio/virtio_ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e12e385..77a4771 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -126,7 +126,7 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
return desc;
}

-static inline int virtqueue_add(struct virtqueue *_vq,
+static int virtqueue_add(struct virtqueue *_vq,
struct scatterlist *sgs[],
unsigned int total_sg,
unsigned int out_sgs,
--
2.1.0

2016-04-08 18:59:14

by Denys Vlasenko

[permalink] [raw]
Subject: [PATCH] drivers/scsi/fnic/fnic_scsi.c: Deinline fnic_queue_abort_io_req, save 1792 bytes

This function compiles to 511 bytes of machine code.

Abort commands are not time-critical at all.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: James Bottomley <[email protected]>
CC: Hiral Patel <[email protected]>
CC: Suma Ramars <[email protected]>
CC: Brian Uchino <[email protected]>
CC: [email protected]
CC: [email protected]
---
drivers/scsi/fnic/fnic_scsi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 266b909..0a3edee 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -1435,7 +1435,7 @@ wq_copy_cleanup_scsi_cmd:
}
}

-static inline int fnic_queue_abort_io_req(struct fnic *fnic, int tag,
+static int fnic_queue_abort_io_req(struct fnic *fnic, int tag,
u32 task_req, u8 *fc_lun,
struct fnic_io_req *io_req)
{
--
2.1.0

2016-04-08 18:59:11

by Denys Vlasenko

[permalink] [raw]
Subject: [PATCH] fs/gfs2/glock.c: Deinline do_error, save 1856 bytes

This function compiles to 522 bytes of machine code.

Error paths are not very time critical.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Steven Whitehouse <[email protected]>
CC: Bob Peterson <[email protected]>
CC: [email protected]
CC: [email protected]
---
fs/gfs2/glock.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 6539131..c3d5172 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -218,7 +218,7 @@ static void gfs2_holder_wake(struct gfs2_holder *gh)
*
*/

-static inline void do_error(struct gfs2_glock *gl, const int ret)
+static void do_error(struct gfs2_glock *gl, const int ret)
{
struct gfs2_holder *gh, *tmp;

--
2.1.0

2016-04-08 18:59:43

by Denys Vlasenko

[permalink] [raw]
Subject: [PATCH] lockdep: Deinline register_lock_class, save 2328 bytes

This function compiles to 1328 bytes of machine code. Three callsites.

Registering a new lock class is definitely not *that* time-critical to inline it.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: [email protected]
---
kernel/locking/lockdep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 53ab2f8..ffad373 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -708,7 +708,7 @@ look_up_lock_class(struct lockdep_map *lock, unsigned int subclass)
* yet. Otherwise we look it up. We cache the result in the lock object
* itself, so actual lookup of the hash should be once per lock object.
*/
-static inline struct lock_class *
+static struct lock_class *
register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
{
struct lockdep_subclass_key *key;
--
2.1.0

2016-04-08 19:08:52

by Laurence Oberman

[permalink] [raw]
Subject: Re: [PATCH] drivers/scsi/fnic/fnic_scsi.c: Deinline fnic_queue_abort_io_req, save 1792 bytes

Simple change, looks fine to me.

Reviewed-by: Laurence Oberman <[email protected]>

Laurence Oberman
Principal Software Maintenance Engineer
Red Hat Global Support Services

----- Original Message -----
From: "Denys Vlasenko" <[email protected]>
To: "James Bottomley" <[email protected]>
Cc: "Denys Vlasenko" <[email protected]>, "Hiral Patel" <[email protected]>, "Suma Ramars" <[email protected]>, "Brian Uchino" <[email protected]>, [email protected], [email protected]
Sent: Friday, April 8, 2016 2:58:43 PM
Subject: [PATCH] drivers/scsi/fnic/fnic_scsi.c: Deinline fnic_queue_abort_io_req, save 1792 bytes

This function compiles to 511 bytes of machine code.

Abort commands are not time-critical at all.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: James Bottomley <[email protected]>
CC: Hiral Patel <[email protected]>
CC: Suma Ramars <[email protected]>
CC: Brian Uchino <[email protected]>
CC: [email protected]
CC: [email protected]
---
drivers/scsi/fnic/fnic_scsi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/fnic/fnic_scsi.c b/drivers/scsi/fnic/fnic_scsi.c
index 266b909..0a3edee 100644
--- a/drivers/scsi/fnic/fnic_scsi.c
+++ b/drivers/scsi/fnic/fnic_scsi.c
@@ -1435,7 +1435,7 @@ wq_copy_cleanup_scsi_cmd:
}
}

-static inline int fnic_queue_abort_io_req(struct fnic *fnic, int tag,
+static int fnic_queue_abort_io_req(struct fnic *fnic, int tag,
u32 task_req, u8 *fc_lun,
struct fnic_io_req *io_req)
{
--
2.1.0

2016-04-08 19:18:01

by Bob Peterson

[permalink] [raw]
Subject: Re: [PATCH] fs/gfs2/glock.c: Deinline do_error, save 1856 bytes

----- Original Message -----
> This function compiles to 522 bytes of machine code.
>
> Error paths are not very time critical.
>
> Signed-off-by: Denys Vlasenko <[email protected]>
> CC: Steven Whitehouse <[email protected]>
> CC: Bob Peterson <[email protected]>
> CC: [email protected]
> CC: [email protected]
> ---
> fs/gfs2/glock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 6539131..c3d5172 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -218,7 +218,7 @@ static void gfs2_holder_wake(struct gfs2_holder *gh)
> *
> */
>
> -static inline void do_error(struct gfs2_glock *gl, const int ret)
> +static void do_error(struct gfs2_glock *gl, const int ret)
> {
> struct gfs2_holder *gh, *tmp;
>
> --
> 2.1.0
>
>

Hi Denys,

The name is misleading. Function do_error() isn't really an error path.
Its job is to "fail" all the holders for a glock that are doing a "try" lock
in cases where trying the lock has been determined to have failed.

Is there a reason why you want to trade memory for speed? Are you
optimizing for memory on an embedded device or something?
I guess I have no fundamental problem in adding this patch, but perhaps
Steve or someone can offer a second opinion before I do.

Regards,

Bob Peterson
Red Hat File Systems

2016-04-08 19:46:11

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH] fs/gfs2/glock.c: Deinline do_error, save 1856 bytes

On 04/08/2016 09:17 PM, Bob Peterson wrote:
> ----- Original Message -----
>> This function compiles to 522 bytes of machine code.
>>
> Is there a reason why you want to trade memory for speed? Are you
> optimizing for memory on an embedded device or something?

Yes. I did a scan for really large inlines and this function came up.
If you feel it is indeed performance critical, please ignore my patch.

2016-04-09 20:14:42

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH] drivers/virtio/virtio_ring.c: Deinline virtqueue_add, save 1016 bytes

On Fri, Apr 08, 2016 at 08:58:44PM +0200, Denys Vlasenko wrote:
> This function compiles to 839 bytes of machine code.
> In C, it is ~150 lines long.
>
> This function has 3 callsites.
>
> Signed-off-by: Denys Vlasenko <[email protected]>
> CC: "Michael S. Tsirkin" <[email protected]>
> CC: [email protected]
> CC: [email protected]

This function is one of the most performance critical ones in the driver, a
bunch of tuning went into it, making this inline intentionally. I'd
have to see some numbers showing making it non-inline is a worth-while
tradeoff.

> ---
> drivers/virtio/virtio_ring.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index e12e385..77a4771 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -126,7 +126,7 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> return desc;
> }
>
> -static inline int virtqueue_add(struct virtqueue *_vq,
> +static int virtqueue_add(struct virtqueue *_vq,
> struct scatterlist *sgs[],
> unsigned int total_sg,
> unsigned int out_sgs,
> --
> 2.1.0

2016-04-11 04:50:48

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH] drivers/infiniband/hw/nes/nes_verbs.c: Deinline nes_free_qp_mem, save 1072 bytes

On Fri, Apr 08, 2016 at 08:58:42PM +0200, Denys Vlasenko wrote:
> This function compiles to 550 bytes of machine code.
> Three callsites, all in nes_create_qp.

I agree with you, the functions which calls below and after this
function are not optimized for speed and there is no need to inline
this function.

I have two requests from you.
1)
Can you please change title to be more convenient?
[PATCH] drivers/infiniband/hw/nes/nes_verbs.c: Deinline nes_free_qp_mem, save 1072 bytes
--->
[PATCH] IB/nes: Deinline nes_free_qp_mem

2) Add bloat-o-meter output to the commit message.

And after that feel free to add my RB tag.

Reviewed-By: Leon Romanovsky <[email protected]>


Attachments:
(No filename) (677.00 B)
signature.asc (819.00 B)
Digital signature
Download all attachments

2016-04-12 02:16:53

by Martin K. Petersen

[permalink] [raw]
Subject: Re: [PATCH] drivers/scsi/fnic/fnic_scsi.c: Deinline fnic_queue_abort_io_req, save 1792 bytes

>>>>> "Denys" == Denys Vlasenko <[email protected]> writes:

Denys> This function compiles to 511 bytes of machine code. Abort
Denys> commands are not time-critical at all.

Satish, please review.

https://patchwork.kernel.org/patch/8785281/

--
Martin K. Petersen Oracle Linux Engineering

2016-04-12 16:49:34

by Bob Peterson

[permalink] [raw]
Subject: Re: [PATCH] fs/gfs2/glock.c: Deinline do_error, save 1856 bytes

----- Original Message -----
> This function compiles to 522 bytes of machine code.
>
> Error paths are not very time critical.
>
> Signed-off-by: Denys Vlasenko <[email protected]>
> CC: Steven Whitehouse <[email protected]>
> CC: Bob Peterson <[email protected]>
> CC: [email protected]
> CC: [email protected]
> ---
> fs/gfs2/glock.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 6539131..c3d5172 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -218,7 +218,7 @@ static void gfs2_holder_wake(struct gfs2_holder *gh)
> *
> */
>
> -static inline void do_error(struct gfs2_glock *gl, const int ret)
> +static void do_error(struct gfs2_glock *gl, const int ret)
> {
> struct gfs2_holder *gh, *tmp;
>
> --
> 2.1.0
>
>

Hi,

Thanks. This is now applied to the for-next branch of the linux-gfs2 tree:
https://git.kernel.org/cgit/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=a527b38e1475211b67eb59b3fadb40689f035529

Regards,

Bob Peterson
Red Hat File Systems

Subject: [tip:locking/core] locking/lockdep: Deinline register_lock_class(), save 2328 bytes

Commit-ID: c003ed928962a55eb446e78c544b1d7c4f6cb88a
Gitweb: http://git.kernel.org/tip/c003ed928962a55eb446e78c544b1d7c4f6cb88a
Author: Denys Vlasenko <[email protected]>
AuthorDate: Fri, 8 Apr 2016 20:58:46 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 13 Apr 2016 10:06:13 +0200

locking/lockdep: Deinline register_lock_class(), save 2328 bytes

This function compiles to 1328 bytes of machine code. Three callsites.

Registering a new lock class is definitely not *that* time-critical to inline it.

Signed-off-by: Denys Vlasenko <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/locking/lockdep.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index ed94109..7cc43ef 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -708,7 +708,7 @@ look_up_lock_class(struct lockdep_map *lock, unsigned int subclass)
* yet. Otherwise we look it up. We cache the result in the lock object
* itself, so actual lookup of the hash should be once per lock object.
*/
-static inline struct lock_class *
+static struct lock_class *
register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
{
struct lockdep_subclass_key *key;