2008-03-27 12:38:50

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 0/7]: uninline some net related static inline in .h bloaters

Hi all,

As suggested by Andrew, I rerun the uninlining of all what's in .h
(v2.6.25-rc2-mm1 this time) to fix the (un)likely profiling overhead
which showed up in the allyesconfig results. The config was manually
tweaked from allyesconfig to not include a number of debug related
options [1]. Other setup: 32-bit x86, gcc (GCC) 4.1.2 20070626
(Red Hat 4.1.2-13). I also tweaked the uninlining machinery a bit and
got even better coverage than last time.

IS_ERR is now successfully off the top :-). The numbers are smaller
than what was previously measured, esp. when a function has a large
number of call-sites and has (un)likely, nevertheless, mostly the
same functions show up among top bloaters as with plain allyesconfig.
Also my earlier, smaller scale tests yielded similar conclusion.
Interesting new comers, however, are those list related functions
which had non-inlined debug versions with allyesconfig.

Ok, here's the top of the list (5000+ bytes):

-60744 855 funcs, 861 +, 61605 -, diff: -60744 --- skb_put
-33129 42 funcs, 199 +, 33328 -, diff: -33129 --- cfi_build_cmd
-32338 1182 funcs, 594 +, 32932 -, diff: -32338 --- atomic_dec_and_test
-22906 1208 funcs, 462 +, 23368 -, diff: -22906 --- list_del
-18399 468 funcs, 282 +, 18681 -, diff: -18399 --- netif_wake_queue
-14283 14 funcs, 365 +, 14648 -, diff: -14283 --- cfi_send_gen_cmd
-13890 341 funcs, 189 +, 14079 -, diff: -13890 --- skb_push
-12853 35 funcs, 114 +, 12967 -, diff: -12853 --- le_key_k_type
-12178 382 funcs, 157 +, 12335 -, diff: -12178 --- dev_alloc_skb
-11675 1178 funcs, 1471 +, 13146 -, diff: -11675 --- list_add_tail
-10996 1688 funcs, 1144 +, 12140 -, diff: -10996 --- raw_local_irq_enable
-10838 1832 funcs, 3763 +, 14601 -, diff: -10838 --- __list_add
-10283 401 funcs, 208 +, 10491 -, diff: -10283 --- __dev_alloc_skb
-9697 338 funcs, 221 +, 9918 -, diff: -9697 --- skb_pull
-8963 836 funcs, 1903 +, 10866 -, diff: -8963 --- list_add
-7939 29 funcs, 304 +, 8243 -, diff: -7939 --- __vlan_hwaccel_rx
-7937 238 funcs, 314 +, 8251 -, diff: -7937 --- i_size_read
-7768 106 funcs, 254 +, 8022 -, diff: -7768 --- __nlmsg_put
-7569 376 funcs, 246 +, 7815 -, diff: -7569 --- __skb_pull
-7467 658 funcs, 503 +, 7970 -, diff: -7467 --- list_del_init
-7360 192 funcs, 131 +, 7491 -, diff: -7360 --- skb_trim
-7257 186 funcs, 70 +, 7327 -, diff: -7257 --- dst_release
-6943 234 funcs, 109 +, 7052 -, diff: -6943 --- __skb_trim
-6923 71 funcs, 296 +, 7219 -, diff: -6923 --- nlmsg_put
-6662 475 funcs, 410 +, 7072 -, diff: -6662 --- add_timer
-6535 380 funcs, 1491 +, 8026 -, diff: -6535 --- ___arch__swab64
-6457 372 funcs, 1429 +, 7886 -, diff: -6457 --- __fswab64
-5625 48 funcs, 79 +, 5704 -, diff: -5625 --- tty_insert_flip_char
-5615 25 funcs, 284 +, 5899 -, diff: -5615 --- vlan_hwaccel_receive_skb
-5404 23 funcs, 509 +, 5913 -, diff: -5404 --- jhash
-5396 214 funcs, 155 +, 5551 -, diff: -5396 --- dev_to_shost
-5310 110 funcs, 133 +, 5443 -, diff: -5310 --- skb_header_pointer
-5169 403 funcs, 131 +, 5300 -, diff: -5169 --- pci_write_config_byte

Please note that because one inline was tested (function uninlined)
at a time, the actual benefits of removing multiple inlines may well
be below what the sum of those individually is (especially when
something calls __-func with equal name, e.g., dev_alloc_skb that
basically just calls __dev_alloc_skb).

~210 are 1000+ bytes (was ~250 with allyesconfig) and ~350 in 500+
(was ~440 previously). Full list has small number of entries without
any details indicating compile failures, and a bit larger set of cases
where the static inline was preprocessed away due to #if blocks. Except
for that, it should be self explinary:
http://www.cs.helsinki.fi/u/ijjarvin/inlines/sorted.v2.6.25-rc2-mm1

I include couple of net related uninlines in this patch series. I don't
include the jhash to lib/ patch this time (was there previously) because
I haven't had time to finish it up.

Another view to the results showing size of the uninlined bodies is
available in here:
http://www.cs.helsinki.fi/u/ijjarvin/inlines/bodies.v2.6.25-rc2-mm1

The tools I used are available here except the site-specific
distribute machinery (in addition one needs pretty late
codiff from Arnaldo's toolset because there were some inline
related bugs fixed lately):

http://www.cs.helsinki.fi/u/ijjarvin/inline-tools.git/

As stated earlier, similar analysis should be performed on .h files
not under include/, but it would require minor modifications to
those tools.

--
i.

[1] http://www.cs.helsinki.fi/u/ijjarvin/inlines/config.nodebug.v2.6.25-rc2-mm1



2008-03-27 12:38:30

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 3/7] [NET]: uninline dev_alloc_skb, de-bloats a lot

Allyesconfig (v2.6.24-mm1):

-23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb

Without many debug CONFIGs (v2.6.25-rc2-mm1):

-12178 382 funcs, 157 +, 12335 -, diff: -12178 --- dev_alloc_skb
dev_alloc_skb | +37

Signed-off-by: Ilpo J?rvinen <[email protected]>
---
include/linux/skbuff.h | 17 +----------------
net/core/skbuff.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6d6cde7..01a11b0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1272,22 +1272,7 @@ static inline struct sk_buff *__dev_alloc_skb(unsigned int length,
return skb;
}

-/**
- * dev_alloc_skb - allocate an skbuff for receiving
- * @length: length to allocate
- *
- * Allocate a new &sk_buff and assign it a usage count of one. The
- * buffer has unspecified headroom built in. Users should allocate
- * the headroom they think they need without accounting for the
- * built in space. The built in space is used for optimisations.
- *
- * %NULL is returned if there is no free memory. Although this function
- * allocates memory it can be called from an interrupt.
- */
-static inline struct sk_buff *dev_alloc_skb(unsigned int length)
-{
- return __dev_alloc_skb(length, GFP_ATOMIC);
-}
+extern struct sk_buff *dev_alloc_skb(unsigned int length);

extern struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
unsigned int length, gfp_t gfp_mask);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index cf489b6..0daf5c0 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -263,6 +263,24 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
return skb;
}

+/**
+ * dev_alloc_skb - allocate an skbuff for receiving
+ * @length: length to allocate
+ *
+ * Allocate a new &sk_buff and assign it a usage count of one. The
+ * buffer has unspecified headroom built in. Users should allocate
+ * the headroom they think they need without accounting for the
+ * built in space. The built in space is used for optimisations.
+ *
+ * %NULL is returned if there is no free memory. Although this function
+ * allocates memory it can be called from an interrupt.
+ */
+struct sk_buff *dev_alloc_skb(unsigned int length)
+{
+ return __dev_alloc_skb(length, GFP_ATOMIC);
+}
+EXPORT_SYMBOL(dev_alloc_skb);
+
static void skb_drop_list(struct sk_buff **listp)
{
struct sk_buff *list = *listp;
--
1.5.2.2

2008-03-27 12:39:15

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot

Allyesconfig (v2.6.24-mm1):

~500 files changed
...
869 funcs, 198 +, 111003 -, diff: -110805 --- skb_put
skb_put | +104

Without number of debug related CONFIGs (v2.6.25-rc2-mm1):

-60744 855 funcs, 861 +, 61605 -, diff: -60744 --- skb_put
skb_put | +57

Signed-off-by: Ilpo J?rvinen <[email protected]>
---
include/linux/skbuff.h | 21 +--------------------
net/core/skbuff.c | 21 +++++++++++++++++++++
2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7beb239..f085955 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -892,6 +892,7 @@ static inline void skb_set_tail_pointer(struct sk_buff *skb, const int offset)
/*
* Add data to an sk_buff
*/
+extern unsigned char *skb_put(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
{
unsigned char *tmp = skb_tail_pointer(skb);
@@ -901,26 +902,6 @@ static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
return tmp;
}

-/**
- * skb_put - add data to a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer. If this would
- * exceed the total buffer size the kernel will panic. A pointer to the
- * first byte of the extra data is returned.
- */
-static inline unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
-{
- unsigned char *tmp = skb_tail_pointer(skb);
- SKB_LINEAR_ASSERT(skb);
- skb->tail += len;
- skb->len += len;
- if (unlikely(skb->tail > skb->end))
- skb_over_panic(skb, len, current_text_addr());
- return tmp;
-}
-
static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
{
skb->data -= len;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0d0fd28..3402eca 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -857,6 +857,27 @@ free_skb:
return err;
}

+/**
+ * skb_put - add data to a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer. If this would
+ * exceed the total buffer size the kernel will panic. A pointer to the
+ * first byte of the extra data is returned.
+ */
+unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
+{
+ unsigned char *tmp = skb_tail_pointer(skb);
+ SKB_LINEAR_ASSERT(skb);
+ skb->tail += len;
+ skb->len += len;
+ if (unlikely(skb->tail > skb->end))
+ skb_over_panic(skb, len, __builtin_return_address(0));
+ return tmp;
+}
+EXPORT_SYMBOL(skb_put);
+
/* Trims skb to length len. It can change skb pointers.
*/

--
1.5.2.2

2008-03-27 12:39:31

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 4/7] [NET]: uninline skb_push, de-bloats a lot

Allyesconfig (v2.6.24-mm1):

-21593 356 funcs, 2418 +, 24011 -, diff: -21593 --- skb_push

Without many debug related CONFIGs (v2.6.25-rc2-mm1):

-13890 341 funcs, 189 +, 14079 -, diff: -13890 --- skb_push
skb_push | +46

Signed-off-by: Ilpo J?rvinen <[email protected]>
---
include/linux/skbuff.h | 19 +------------------
net/core/skbuff.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 01a11b0..1baf4d4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -902,6 +902,7 @@ static inline unsigned char *__skb_put(struct sk_buff *skb, unsigned int len)
return tmp;
}

+extern unsigned char *skb_push(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
{
skb->data -= len;
@@ -909,24 +910,6 @@ static inline unsigned char *__skb_push(struct sk_buff *skb, unsigned int len)
return skb->data;
}

-/**
- * skb_push - add data to the start of a buffer
- * @skb: buffer to use
- * @len: amount of data to add
- *
- * This function extends the used data area of the buffer at the buffer
- * start. If this would exceed the total buffer headroom the kernel will
- * panic. A pointer to the first byte of the extra data is returned.
- */
-static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
-{
- skb->data -= len;
- skb->len += len;
- if (unlikely(skb->data<skb->head))
- skb_under_panic(skb, len, current_text_addr());
- return skb->data;
-}
-
extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
{
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0daf5c0..a37127b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -897,6 +897,25 @@ unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
EXPORT_SYMBOL(skb_put);

/**
+ * skb_push - add data to the start of a buffer
+ * @skb: buffer to use
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer at the buffer
+ * start. If this would exceed the total buffer headroom the kernel will
+ * panic. A pointer to the first byte of the extra data is returned.
+ */
+unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
+{
+ skb->data -= len;
+ skb->len += len;
+ if (unlikely(skb->data<skb->head))
+ skb_under_panic(skb, len, __builtin_return_address(0));
+ return skb->data;
+}
+EXPORT_SYMBOL(skb_push);
+
+/**
* skb_pull - remove data from the start of a buffer
* @skb: buffer to use
* @len: amount of data to remove
--
1.5.2.2

2008-03-27 12:39:48

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 5/7] [NET]: uninline dst_release

Codiff stats (allyesconfig, v2.6.24-mm1):
-16420 187 funcs, 103 +, 16523 -, diff: -16420 --- dst_release

Without number of debug related CONFIGs (v2.6.25-rc2-mm1):
-7257 186 funcs, 70 +, 7327 -, diff: -7257 --- dst_release
dst_release | +40

Signed-off-by: Ilpo J?rvinen <[email protected]>
---
include/net/dst.h | 10 +---------
net/core/dst.c | 10 ++++++++++
2 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index ae13370..002500e 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -163,15 +163,7 @@ struct dst_entry * dst_clone(struct dst_entry * dst)
return dst;
}

-static inline
-void dst_release(struct dst_entry * dst)
-{
- if (dst) {
- WARN_ON(atomic_read(&dst->__refcnt) < 1);
- smp_mb__before_atomic_dec();
- atomic_dec(&dst->__refcnt);
- }
-}
+extern void dst_release(struct dst_entry *dst);

/* Children define the path of the packet through the
* Linux networking. Thus, destinations are stackable.
diff --git a/net/core/dst.c b/net/core/dst.c
index 694cd2a..fe03266 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -259,6 +259,16 @@ again:
return NULL;
}

+void dst_release(struct dst_entry *dst)
+{
+ if (dst) {
+ WARN_ON(atomic_read(&dst->__refcnt) < 1);
+ smp_mb__before_atomic_dec();
+ atomic_dec(&dst->__refcnt);
+ }
+}
+EXPORT_SYMBOL(dst_release);
+
/* Dirty hack. We did it in 2.2 (in __dst_free),
* we have _very_ good reasons not to repeat
* this mistake in 2.3, but we have no choice
--
1.5.2.2

2008-03-27 12:40:03

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 2/7] [NET]: uninline skb_pull, de-bloats a lot

Allyesconfig (v2.6.24-mm1):

-28162 354 funcs, 3005 +, 31167 -, diff: -28162 --- skb_pull

Without number of debug related CONFIGs (v2.6.25-rc2-mm1):

-9697 338 funcs, 221 +, 9918 -, diff: -9697 --- skb_pull
skb_pull | +44

Signed-off-by: Ilpo J?rvinen <[email protected]>
---
include/linux/skbuff.h | 16 +---------------
net/core/skbuff.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f085955..6d6cde7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -927,6 +927,7 @@ static inline unsigned char *skb_push(struct sk_buff *skb, unsigned int len)
return skb->data;
}

+extern unsigned char *skb_pull(struct sk_buff *skb, unsigned int len);
static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
{
skb->len -= len;
@@ -934,21 +935,6 @@ static inline unsigned char *__skb_pull(struct sk_buff *skb, unsigned int len)
return skb->data += len;
}

-/**
- * skb_pull - remove data from the start of a buffer
- * @skb: buffer to use
- * @len: amount of data to remove
- *
- * This function removes data from the start of a buffer, returning
- * the memory to the headroom. A pointer to the next data in the buffer
- * is returned. Once the data has been pulled future pushes will overwrite
- * the old data.
- */
-static inline unsigned char *skb_pull(struct sk_buff *skb, unsigned int len)
-{
- return unlikely(len > skb->len) ? NULL : __skb_pull(skb, len);
-}
-
extern unsigned char *__pskb_pull_tail(struct sk_buff *skb, int delta);

static inline unsigned char *__pskb_pull(struct sk_buff *skb, unsigned int len)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3402eca..cf489b6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -878,6 +878,22 @@ unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
}
EXPORT_SYMBOL(skb_put);

+/**
+ * skb_pull - remove data from the start of a buffer
+ * @skb: buffer to use
+ * @len: amount of data to remove
+ *
+ * This function removes data from the start of a buffer, returning
+ * the memory to the headroom. A pointer to the next data in the buffer
+ * is returned. Once the data has been pulled future pushes will overwrite
+ * the old data.
+ */
+unsigned char *skb_pull(struct sk_buff *skb, unsigned int len)
+{
+ return unlikely(len > skb->len) ? NULL : __skb_pull(skb, len);
+}
+EXPORT_SYMBOL(skb_pull);
+
/* Trims skb to length len. It can change skb pointers.
*/

--
1.5.2.2

2008-03-27 12:40:32

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 7/7] [SCTP]: Remove sctp_add_cmd_sf wrapper bloat

With a was number of callsites sctp_add_cmd_sf wrapper bloats
kernel by some amount. Due to unlikely tracking allyesconfig,
with the initial result were around ~7kB (thus caught my
attention) while a non-debug config produced only ~2.3kB effect.

I (ij) proposed first a patch to uninline it but Vlad responded
with a patch that removed the only sctp_add_cmd call which is
wrapped by sctp_add_cmd_sf (I wasn't sure if I could do that).
I did minor cleanup to Vlad's patch.

Signed-off-by: Ilpo J?rvinen <[email protected]>
Signed-off-by: Vlad Yasevich <[email protected]>
---
include/net/sctp/command.h | 3 +--
include/net/sctp/sm.h | 8 --------
net/sctp/command.c | 10 ++--------
net/sctp/sm_statefuns.c | 8 ++------
4 files changed, 5 insertions(+), 24 deletions(-)

diff --git a/include/net/sctp/command.h b/include/net/sctp/command.h
index 10ae2da..4263af8 100644
--- a/include/net/sctp/command.h
+++ b/include/net/sctp/command.h
@@ -205,12 +205,11 @@ typedef struct {
int sctp_init_cmd_seq(sctp_cmd_seq_t *seq);

/* Add a command to an sctp_cmd_seq_t.
- * Return 0 if the command sequence is full.
*
* Use the SCTP_* constructors defined by SCTP_ARG_CONSTRUCTOR() above
* to wrap data which goes in the obj argument.
*/
-int sctp_add_cmd(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj);
+void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj);

/* Return the next command structure in an sctp_cmd_seq.
* Return NULL at the end of the sequence.
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index ef9e7ed..2481173 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -385,14 +385,6 @@ static inline int ADDIP_SERIAL_gte(__u16 s, __u16 t)
return (((s) == (t)) || (((t) - (s)) & ADDIP_SERIAL_SIGN_BIT));
}

-
-/* Run sctp_add_cmd() generating a BUG() if there is a failure. */
-static inline void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
-{
- if (unlikely(!sctp_add_cmd(seq, verb, obj)))
- BUG();
-}
-
/* Check VTAG of the packet matches the sender's own tag. */
static inline int
sctp_vtag_verify(const struct sctp_chunk *chunk,
diff --git a/net/sctp/command.c b/net/sctp/command.c
index bb97733..c004401 100644
--- a/net/sctp/command.c
+++ b/net/sctp/command.c
@@ -52,18 +52,12 @@ int sctp_init_cmd_seq(sctp_cmd_seq_t *seq)
/* Add a command to a sctp_cmd_seq_t.
* Return 0 if the command sequence is full.
*/
-int sctp_add_cmd(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
+void sctp_add_cmd_sf(sctp_cmd_seq_t *seq, sctp_verb_t verb, sctp_arg_t obj)
{
- if (seq->next_free_slot >= SCTP_MAX_NUM_COMMANDS)
- goto fail;
+ BUG_ON(seq->next_free_slot >= SCTP_MAX_NUM_COMMANDS);

seq->cmds[seq->next_free_slot].verb = verb;
seq->cmds[seq->next_free_slot++].obj = obj;
-
- return 1;
-
-fail:
- return 0;
}

/* Return the next command structure in a sctp_cmd_seq.
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 6545b5f..b534dbe 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3135,12 +3135,8 @@ sctp_disposition_t sctp_sf_operr_notify(const struct sctp_endpoint *ep,
if (!ev)
goto nomem;

- if (!sctp_add_cmd(commands, SCTP_CMD_EVENT_ULP,
- SCTP_ULPEVENT(ev))) {
- sctp_ulpevent_free(ev);
- goto nomem;
- }
-
+ sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
+ SCTP_ULPEVENT(ev));
sctp_add_cmd_sf(commands, SCTP_CMD_PROCESS_OPERR,
SCTP_CHUNK(chunk));
}
--
1.5.2.2

2008-03-27 12:40:49

by Ilpo Järvinen

[permalink] [raw]
Subject: [PATCH 6/7] [NET]: uninline skb_trim, de-bloats

Allyesconfig (v2.6.24-mm1):
-10976 209 funcs, 123 +, 11099 -, diff: -10976 --- skb_trim

Without number of debug related CONFIGs (v2.6.25-rc2-mm1):
-7360 192 funcs, 131 +, 7491 -, diff: -7360 --- skb_trim
skb_trim | +42

Signed-off-by: Ilpo J?rvinen <[email protected]>
---
include/linux/skbuff.h | 16 +---------------
net/core/skbuff.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1baf4d4..ff72145 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1158,21 +1158,7 @@ static inline void __skb_trim(struct sk_buff *skb, unsigned int len)
skb_set_tail_pointer(skb, len);
}

-/**
- * skb_trim - remove end from a buffer
- * @skb: buffer to alter
- * @len: new length
- *
- * Cut the length of a buffer down by removing data from the tail. If
- * the buffer is already under the length specified it is not modified.
- * The skb must be linear.
- */
-static inline void skb_trim(struct sk_buff *skb, unsigned int len)
-{
- if (skb->len > len)
- __skb_trim(skb, len);
-}
-
+extern void skb_trim(struct sk_buff *skb, unsigned int len);

static inline int __pskb_trim(struct sk_buff *skb, unsigned int len)
{
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a37127b..86e5682 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -931,6 +931,22 @@ unsigned char *skb_pull(struct sk_buff *skb, unsigned int len)
}
EXPORT_SYMBOL(skb_pull);

+/**
+ * skb_trim - remove end from a buffer
+ * @skb: buffer to alter
+ * @len: new length
+ *
+ * Cut the length of a buffer down by removing data from the tail. If
+ * the buffer is already under the length specified it is not modified.
+ * The skb must be linear.
+ */
+void skb_trim(struct sk_buff *skb, unsigned int len)
+{
+ if (skb->len > len)
+ __skb_trim(skb, len);
+}
+EXPORT_SYMBOL(skb_trim);
+
/* Trims skb to length len. It can change skb pointers.
*/

--
1.5.2.2

2008-03-27 19:11:39

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot

On Thu, 2008-03-27 at 14:38 +0200, Ilpo Järvinen wrote:
> Allyesconfig (v2.6.24-mm1):

I think this change is only good in severely memory
limited uses. This will very likely negatively impact
high speed networking. It's a speed/size trade off.

Perhaps a CONFIG option to reduce net code size would
be appropriate.

2008-03-27 22:05:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot

From: Joe Perches <[email protected]>
Date: Thu, 27 Mar 2008 12:10:50 -0700

> On Thu, 2008-03-27 at 14:38 +0200, Ilpo J?rvinen wrote:
> > Allyesconfig (v2.6.24-mm1):
>
> I think this change is only good in severely memory
> limited uses. This will very likely negatively impact
> high speed networking. It's a speed/size trade off.

I severely doubt this, the bulk of the overhead of
skb_put() is the atomic operation, not whether the
instructions get executed inline or not.

Please run some tests before making claims like this.
I might give you some slack in this area if you've
been doing networking performance tuning for 10+ years
but you haven't.

2008-03-27 23:38:57

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 3/7] [NET]: uninline dev_alloc_skb, de-bloats a lot

On Thursday 27 March 2008 13:38, Ilpo J?rvinen wrote:
> Allyesconfig (v2.6.24-mm1):
>
> -23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb
>
> Without many debug CONFIGs (v2.6.25-rc2-mm1):
>
> -12178 382 funcs, 157 +, 12335 -, diff: -12178 --- dev_alloc_skb
> dev_alloc_skb | +37


This will be very confusing for casual reader:

> +struct sk_buff *dev_alloc_skb(unsigned int length)
> +{
> + return __dev_alloc_skb(length, GFP_ATOMIC);
> +}
> +EXPORT_SYMBOL(dev_alloc_skb);

"what, they want to save one push instruction per callsite??".

Can you add a comment which explains the intent?

+struct sk_buff *dev_alloc_skb(unsigned int length)
+{
+ /* There is more code here than it seems:
+ * __dev_alloc_skb is an inline */
+ return __dev_alloc_skb(length, GFP_ATOMIC);
+}
+EXPORT_SYMBOL(dev_alloc_skb);


Another good chunk of code size saving can be achieved
by introducing dev_alloc_skb_or_warn(), and using it
in places like these:

drivers/net/irda/nsc-ircc.c:

skb = dev_alloc_skb(len+1);
if (skb == NULL) {
IRDA_WARNING("%s(), memory squeeze, "
"dropping frame.\n",
__FUNCTION__);

drivers/net/appletalk/ltpc.c:

skb = dev_alloc_skb(3+sklen);
if (skb == NULL)
{
printk("%s: dropping packet due to memory squeeze.\n",

net/econet/af_econet.c:

newskb = alloc_skb((len - sizeof(struct aunhdr) + 15) & ~15,
GFP_ATOMIC);
if (newskb == NULL)
{
printk(KERN_DEBUG "AUN: memory squeeze, dropping packet.\n");
/* Send nack and hope sender tries again */
goto bad;
}

(hmm, this last one also wants s/alloc_skb(GFP_ATOMIC)/dev_alloc_skb/)

--
vda

2008-03-28 00:04:19

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot

On Thu, 2008-03-27 at 15:04 -0700, David Miller wrote:
> From: Joe Perches <[email protected]>
> > This will very likely negatively impact
> > high speed networking.
> Please run some tests before making claims like this.

I think this is the sort of change you might have argued
against accepting without performance testing not long ago.

> I might give you some slack in this area if you've
> been doing networking performance tuning for 10+ years
> but you haven't.

I've done some work in network performance testing.

cheers, Joe

2008-03-28 00:05:11

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot

From: Joe Perches <[email protected]>
Date: Thu, 27 Mar 2008 17:02:37 -0700

> I think this is the sort of change you might have argued
> against accepting without performance testing not long ago.

Look at how complicated the assembler of this function
is. Any inlining performance gains will be trumped by
the instruction cache usage bloat.

2008-03-28 00:12:56

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot


On Thu, 2008-03-27 at 15:04 -0700, David Miller wrote:
> From: Joe Perches <[email protected]>
> Date: Thu, 27 Mar 2008 12:10:50 -0700
>
> > On Thu, 2008-03-27 at 14:38 +0200, Ilpo Järvinen wrote:
> > > Allyesconfig (v2.6.24-mm1):
> >
> > I think this change is only good in severely memory
> > limited uses. This will very likely negatively impact
> > high speed networking. It's a speed/size trade off.
>
> I severely doubt this, the bulk of the overhead of
> skb_put() is the atomic operation, not whether the
> instructions get executed inline or not.

More generally, we have to weigh the cost of a function call against the
cost of a cache miss here or -somewhere else-. That is, running multiple
copies of this code inline means that much other code gets pushed out of
cache. Further, consolidating multiple copies of this code into one
means it's that much more likely to already be in cache when we hit it.

In the 486 era, when CPU performance was close to 1:1 with memory,
branches were more expensive than sequential memory fetches, and
registers were scarce, inlining made a fair amount of sense.

But now we've moved very far away from that indeed: CPU is orders of
magnitude faster than memory, branches are quite cheap, and register
are.. well, not quite as scarce. All of that means that a cache miss is
much more expensive than a function call.

Inlining typical only makes sense for fairly trivial transformations
(something you'd consider doing with a macro) where the code to set up a
function call is comparable to the size of the function itself and code
in innermost loops. And if the inline is in a header file, it's probably
not in the latter class.

In the case of this patch, removing 60-100k from the network stack means
we're almost certainly avoiding a lot of cache misses in the big picture
while taking a few cycle hit per packet in the smallest scale.

--
Mathematics is the supreme nostalgia of our time.

2008-03-28 00:44:00

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot

From: "Ilpo_J?rvinen" <[email protected]>
Date: Thu, 27 Mar 2008 14:38:00 +0200

> Allyesconfig (v2.6.24-mm1):
>
> ~500 files changed
> ...
> 869 funcs, 198 +, 111003 -, diff: -110805 --- skb_put
> skb_put | +104
>
> Without number of debug related CONFIGs (v2.6.25-rc2-mm1):
>
> -60744 855 funcs, 861 +, 61605 -, diff: -60744 --- skb_put
> skb_put | +57
>
> Signed-off-by: Ilpo J?rvinen <[email protected]>

Applied.

2008-03-28 00:51:52

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 3/7] [NET]: uninline dev_alloc_skb, de-bloats a lot

From: "Ilpo_J?rvinen" <[email protected]>
Date: Thu, 27 Mar 2008 14:38:02 +0200

> Allyesconfig (v2.6.24-mm1):
>
> -23668 392 funcs, 104 +, 23772 -, diff: -23668 --- dev_alloc_skb
>
> Without many debug CONFIGs (v2.6.25-rc2-mm1):
>
> -12178 382 funcs, 157 +, 12335 -, diff: -12178 --- dev_alloc_skb
> dev_alloc_skb | +37
>
> Signed-off-by: Ilpo J?rvinen <[email protected]>

Applied.

2008-03-28 00:52:25

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 3/7] [NET]: uninline dev_alloc_skb, de-bloats a lot

From: Denys Vlasenko <[email protected]>
Date: Fri, 28 Mar 2008 00:36:59 +0100

> Can you add a comment which explains the intent?
>
> +struct sk_buff *dev_alloc_skb(unsigned int length)
> +{
> + /* There is more code here than it seems:
> + * __dev_alloc_skb is an inline */
> + return __dev_alloc_skb(length, GFP_ATOMIC);
> +}
> +EXPORT_SYMBOL(dev_alloc_skb);

I've applied his patch already, if you want this comment
please submit the patch to add it and also please use
the correct formatting of the comment.

2008-03-28 00:53:07

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 4/7] [NET]: uninline skb_push, de-bloats a lot

From: "Ilpo_J?rvinen" <[email protected]>
Date: Thu, 27 Mar 2008 14:38:03 +0200

> Allyesconfig (v2.6.24-mm1):
>
> -21593 356 funcs, 2418 +, 24011 -, diff: -21593 --- skb_push
>
> Without many debug related CONFIGs (v2.6.25-rc2-mm1):
>
> -13890 341 funcs, 189 +, 14079 -, diff: -13890 --- skb_push
> skb_push | +46
>
> Signed-off-by: Ilpo J?rvinen <[email protected]>

Applied.

2008-03-28 00:53:52

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 5/7] [NET]: uninline dst_release

From: "Ilpo_J?rvinen" <[email protected]>
Date: Thu, 27 Mar 2008 14:38:04 +0200

> Codiff stats (allyesconfig, v2.6.24-mm1):
> -16420 187 funcs, 103 +, 16523 -, diff: -16420 --- dst_release
>
> Without number of debug related CONFIGs (v2.6.25-rc2-mm1):
> -7257 186 funcs, 70 +, 7327 -, diff: -7257 --- dst_release
> dst_release | +40
>
> Signed-off-by: Ilpo J?rvinen <[email protected]>

Applied.

2008-03-28 00:54:26

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 6/7] [NET]: uninline skb_trim, de-bloats

From: "Ilpo_J?rvinen" <[email protected]>
Date: Thu, 27 Mar 2008 14:38:05 +0200

> Allyesconfig (v2.6.24-mm1):
> -10976 209 funcs, 123 +, 11099 -, diff: -10976 --- skb_trim
>
> Without number of debug related CONFIGs (v2.6.25-rc2-mm1):
> -7360 192 funcs, 131 +, 7491 -, diff: -7360 --- skb_trim
> skb_trim | +42
>
> Signed-off-by: Ilpo J?rvinen <[email protected]>

Applied.

2008-03-28 00:54:51

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 7/7] [SCTP]: Remove sctp_add_cmd_sf wrapper bloat

From: "Ilpo_J?rvinen" <[email protected]>
Date: Thu, 27 Mar 2008 14:38:06 +0200

> With a was number of callsites sctp_add_cmd_sf wrapper bloats
> kernel by some amount. Due to unlikely tracking allyesconfig,
> with the initial result were around ~7kB (thus caught my
> attention) while a non-debug config produced only ~2.3kB effect.
>
> I (ij) proposed first a patch to uninline it but Vlad responded
> with a patch that removed the only sctp_add_cmd call which is
> wrapped by sctp_add_cmd_sf (I wasn't sure if I could do that).
> I did minor cleanup to Vlad's patch.
>
> Signed-off-by: Ilpo J?rvinen <[email protected]>
> Signed-off-by: Vlad Yasevich <[email protected]>

Applied, thanks.

2008-03-28 00:55:17

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 0/7]: uninline some net related static inline in .h bloaters

From: "Ilpo_J?rvinen" <[email protected]>
Date: Thu, 27 Mar 2008 14:37:59 +0200

> As suggested by Andrew, I rerun the uninlining of all what's in .h
> (v2.6.25-rc2-mm1 this time) to fix the (un)likely profiling overhead
> which showed up in the allyesconfig results. The config was manually
> tweaked from allyesconfig to not include a number of debug related
> options [1]. Other setup: 32-bit x86, gcc (GCC) 4.1.2 20070626
> (Red Hat 4.1.2-13). I also tweaked the uninlining machinery a bit and
> got even better coverage than last time.

I applied everything and will push to net-2.6.26 after some
build tests.

Thanks!

2008-03-28 00:55:45

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot

On Thu, 2008-03-27 at 19:11 -0500, Matt Mackall wrote:
> In the 486 era, when CPU performance was close to 1:1 with memory,
> branches were more expensive than sequential memory fetches, and
> registers were scarce, inlining made a fair amount of sense.
>
> But now we've moved very far away from that indeed:

Systems have certainly improved but Linux is used in a
wide variety of CPU Hz, memory & register architectures.

Some of those systems haven't changed at all.
Some of those systems have sufficient cache for a
networking stack.

I think this change could negatively impact some of these
different uses and systems.

> In the case of this patch, removing 60-100k from the network stack means
> we're almost certainly avoiding a lot of cache misses in the big picture
> while taking a few cycle hit per packet in the smallest scale.

I think the quantities of big v small are instance dependent
and it might be prudent to have the capability to keep these
functions inline.

2008-03-28 01:20:17

by Matt Mackall

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot


On Thu, 2008-03-27 at 17:54 -0700, Joe Perches wrote:
> On Thu, 2008-03-27 at 19:11 -0500, Matt Mackall wrote:
> > In the 486 era, when CPU performance was close to 1:1 with memory,
> > branches were more expensive than sequential memory fetches, and
> > registers were scarce, inlining made a fair amount of sense.
> >
> > But now we've moved very far away from that indeed:
>
> Systems have certainly improved but Linux is used in a
> wide variety of CPU Hz, memory & register architectures.
>
> Some of those systems haven't changed at all.

It's true. In particular, 486s haven't changed at all since the 486 era.
What's changed is that people no longer run 486s to go fast, they run
them to save money. Saving memory is a win for those people.

The same goes for embedded systems. Saving memory is much higher on the
priority scale than performance. And the fact that saving memory on the
low end aligns very nicely with increasing performance on the high end
means that's the direction we're going.

--
Mathematics is the supreme nostalgia of our time.

2008-03-28 01:42:46

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 3/7] [NET]: uninline dev_alloc_skb, de-bloats a lot

Em Thu, Mar 27, 2008 at 05:52:12PM -0700, David Miller escreveu:
> From: Denys Vlasenko <[email protected]>
> Date: Fri, 28 Mar 2008 00:36:59 +0100
>
> > Can you add a comment which explains the intent?
> >
> > +struct sk_buff *dev_alloc_skb(unsigned int length)
> > +{
> > + /* There is more code here than it seems:
> > + * __dev_alloc_skb is an inline */
> > + return __dev_alloc_skb(length, GFP_ATOMIC);
> > +}
> > +EXPORT_SYMBOL(dev_alloc_skb);
>
> I've applied his patch already, if you want this comment
> please submit the patch to add it and also please use
> the correct formatting of the comment.

Which is, of course:

/*
* There is more code here than it seems:
* __def_alloc_skb is an inline
*/

Ilpo: Keep up the debloating jihad/crusade/campaign/surge/kamppailla! 8-)

- arnaldo

2008-03-28 19:57:18

by Ilpo Järvinen

[permalink] [raw]
Subject: Re: [PATCH 1/7] [NET]: uninline skb_put, de-bloats a lot

On Thu, 27 Mar 2008, Matt Mackall wrote:
> On Thu, 2008-03-27 at 17:54 -0700, Joe Perches wrote:
> > On Thu, 2008-03-27 at 19:11 -0500, Matt Mackall wrote:
> > > In the 486 era, when CPU performance was close to 1:1 with memory,
> > > branches were more expensive than sequential memory fetches, and
> > > registers were scarce, inlining made a fair amount of sense.
> > >
> > > But now we've moved very far away from that indeed:
> >
> > Systems have certainly improved but Linux is used in a
> > wide variety of CPU Hz, memory & register architectures.
> >
> > Some of those systems haven't changed at all.
>
> It's true. In particular, 486s haven't changed at all since the 486 era.
> What's changed is that people no longer run 486s to go fast, they run
> them to save money. Saving memory is a win for those people.

I'd guess though that they won't get that big size saving because they
probably have carefully tuned configs as well.

> The same goes for embedded systems. Saving memory is much higher on the
> priority scale than performance. And the fact that saving memory on the
> low end aligns very nicely with increasing performance on the high end
> means that's the direction we're going.

Also if some TLB misses are avoided due to uninlining, it may well
rebalance the equation (each config & scenario is quite unique though).

--
i.

2008-03-28 22:36:22

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [PATCH 3/7] [NET]: uninline dev_alloc_skb, de-bloats a lot

On Friday 28 March 2008 01:52, David Miller wrote:
> From: Denys Vlasenko <[email protected]>
> Date: Fri, 28 Mar 2008 00:36:59 +0100
>
> > Can you add a comment which explains the intent?
> >
> > +struct sk_buff *dev_alloc_skb(unsigned int length)
> > +{
> > + /* There is more code here than it seems:
> > + * __dev_alloc_skb is an inline */
> > + return __dev_alloc_skb(length, GFP_ATOMIC);
> > +}
> > +EXPORT_SYMBOL(dev_alloc_skb);
>
> I've applied his patch already, if you want this comment
> please submit the patch to add it and also please use
> the correct formatting of the comment.

Here it is.

Signed-off-by: Denys Vlasenko <[email protected]>
--
vda


Attachments:
(No filename) (695.00 B)
dev_alloc_skb_comment.diff (374.00 B)
Download all attachments

2008-03-28 22:58:16

by David Miller

[permalink] [raw]
Subject: Re: [PATCH 3/7] [NET]: uninline dev_alloc_skb, de-bloats a lot

From: Denys Vlasenko <[email protected]>
Date: Fri, 28 Mar 2008 23:34:29 +0100

> Signed-off-by: Denys Vlasenko <[email protected]>

Applied, thanks!

2008-03-29 04:52:47

by Bill Fink

[permalink] [raw]
Subject: Re: [PATCH 3/7] [NET]: uninline dev_alloc_skb, de-bloats a lot

Typo in patch (see below):

On Fri, 28 Mar 2008, Denys Vlasenko wrote:

> On Friday 28 March 2008 01:52, David Miller wrote:
> > From: Denys Vlasenko <[email protected]>
> > Date: Fri, 28 Mar 2008 00:36:59 +0100
> >
> > > Can you add a comment which explains the intent?
> > >
> > > +struct sk_buff *dev_alloc_skb(unsigned int length)
> > > +{
> > > + /* There is more code here than it seems:
> > > + * __dev_alloc_skb is an inline */
> > > + return __dev_alloc_skb(length, GFP_ATOMIC);
> > > +}
> > > +EXPORT_SYMBOL(dev_alloc_skb);
> >
> > I've applied his patch already, if you want this comment
> > please submit the patch to add it and also please use
> > the correct formatting of the comment.
>
> Here it is.
>
> Signed-off-by: Denys Vlasenko <[email protected]>
> --
> vda
>
> --- net-2.6.26/net/core/skbuff.c Fri Mar 28 23:31:05 2008
> +++ net-2.6.26.comment/net/core/skbuff.c Fri Mar 28 23:33:03 2008
> @@ -277,6 +277,10 @@
> */
> struct sk_buff *dev_alloc_skb(unsigned int length)
> {
> + /*
> + * There is more code here than it seems:
> + * __def_alloc_skb is an inline
^
v

> + */
> return __dev_alloc_skb(length, GFP_ATOMIC);
> }
> EXPORT_SYMBOL(dev_alloc_skb);

-Bill