Hi,
Repeated calling to printf() for 13 times is a dire waste of CPU cycles.
For performance, combine all those calls into one while source code
formatting is preserved for readability.
Compile tested only.
Signed-off-by: Jean Sacren <[email protected]>
---
arch/x86/boot/compressed/mkpiggy.c | 30 ++++++++++++++++--------------
1 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index 46a8238..0b6d82c 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -44,6 +44,7 @@ int main(int argc, char *argv[])
long ilen;
unsigned long offs;
FILE *f;
+ char *msg;
if (argc < 2) {
fprintf(stderr, "Usage: %s compressed_file\n", argv[0]);
@@ -82,21 +83,22 @@ int main(int argc, char *argv[])
offs += 64*1024 + 128; /* Add 64K + 128 bytes slack */
offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
- printf(".section \".rodata..compressed\",\"a\",@progbits\n");
- printf(".globl z_input_len\n");
- printf("z_input_len = %lu\n", ilen);
- printf(".globl z_output_len\n");
- printf("z_output_len = %lu\n", (unsigned long)olen);
- printf(".globl z_extract_offset\n");
- printf("z_extract_offset = 0x%lx\n", offs);
/* z_extract_offset_negative allows simplification of head_32.S */
- printf(".globl z_extract_offset_negative\n");
- printf("z_extract_offset_negative = -0x%lx\n", offs);
-
- printf(".globl input_data, input_data_end\n");
- printf("input_data:\n");
- printf(".incbin \"%s\"\n", argv[1]);
- printf("input_data_end:\n");
+ msg = ".section \".rodata..compressed\",\"a\",@progbits\n"
+ ".globl z_input_len\n"
+ "z_input_len = %lu\n"
+ ".globl z_output_len\n"
+ "z_output_len = %lu\n"
+ ".globl z_extract_offset\n"
+ "z_extract_offset = 0x%lx\n"
+ ".globl z_extract_offset_negative\n"
+ "z_extract_offset_negative = -0x%lx\n"
+ ".globl input_data, input_data_end\n"
+ "input_data:\n"
+ ".incbin \"%s\"\n"
+ "input_data_end:\n";
+
+ printf(msg, ilen, (unsigned long)olen, offs, offs, argv[1]);
return 0;
}
--
1.7.2.2
--
Jean Sacren
On 06/05/2011 05:40 PM, Jean Sacren wrote:
> Hi,
>
> Repeated calling to printf() for 13 times is a dire waste of CPU cycles.
> For performance, combine all those calls into one while source code
> formatting is preserved for readability.
>
> Compile tested only.
>
> Signed-off-by: Jean Sacren <[email protected]>
You're got to be bloody kidding.
First of all, this is a build time tool which is executed exactly once
during the entire kernel build.
Second, printf execution time is largely dependent on the size
formatting string; since the I/O is buffered it is only issued once
anyway... which basically means that there is no time saved at all.
Third, the resulting code is substantially harder to read.
Fourth, carrying this as a patch will cost kernel developers more time
in additional git execution time than it ever will save them in build time.
Nacked-by: H. Peter Anvin <[email protected]>
-hpa
From: "H. Peter Anvin" <[email protected]>
Date: Mon, 06 Jun 2011 10:08:32 -0700
>
> On 06/05/2011 05:40 PM, Jean Sacren wrote:
> > Hi,
> >
> > Repeated calling to printf() for 13 times is a dire waste of CPU cycles.
> > For performance, combine all those calls into one while source code
> > formatting is preserved for readability.
> >
> > Compile tested only.
> >
> > Signed-off-by: Jean Sacren <[email protected]>
>
> You're got to be bloody kidding.
You're quite strong in opinion, but rather weak in code. Sadly you even
didn't get the contraction right...
>
> First of all, this is a build time tool which is executed exactly once
> during the entire kernel build.
>
> Second, printf execution time is largely dependent on the size
> formatting string; since the I/O is buffered it is only issued once
> anyway... which basically means that there is no time saved at all.
The above two arguments have nothing to do with the fact you printed out
13 lines individually, where they should have been printed out
collectively. To make my point here, why you didn't print out each
character individually?
I looked at the history of the file and the way you did it is a birth
shortcoming. For the past two years, no code has ever been necessarily
inserted between these 13 printf() calls. Looking into the future, that
block of code shall be facilitated by the updated patch.
>
> Third, the resulting code is substantially harder to read.
With the updated patch, this argument doesn't stand at all.
>
> Fourth, carrying this as a patch will cost kernel developers more time
> in additional git execution time than it ever will save them in build time.
With the updated patch, this argument doesn't make any sense at all.
>
> Nacked-by: H. Peter Anvin <[email protected]>
>
> -hpa
From: Jean Sacren <[email protected]>
Date: Sun, 12 Jun 2011 04:15:30 -0600
Subject: [PATCH 1/1] boot: Enhance performance by eliminating unnecessary calls to printf()
Repeated calling to printf() for 13 times is a dire waste of CPU cycles.
For performance, combine all those calls into one while source code
formatting is preserved for readability.
Signed-off-by: Jean Sacren <[email protected]>
---
arch/x86/boot/compressed/mkpiggy.c | 28 ++++++++++++++--------------
1 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/x86/boot/compressed/mkpiggy.c b/arch/x86/boot/compressed/mkpiggy.c
index 46a8238..1e21fca 100644
--- a/arch/x86/boot/compressed/mkpiggy.c
+++ b/arch/x86/boot/compressed/mkpiggy.c
@@ -82,21 +82,21 @@ int main(int argc, char *argv[])
offs += 64*1024 + 128; /* Add 64K + 128 bytes slack */
offs = (offs+4095) & ~4095; /* Round to a 4K boundary */
- printf(".section \".rodata..compressed\",\"a\",@progbits\n");
- printf(".globl z_input_len\n");
- printf("z_input_len = %lu\n", ilen);
- printf(".globl z_output_len\n");
- printf("z_output_len = %lu\n", (unsigned long)olen);
- printf(".globl z_extract_offset\n");
- printf("z_extract_offset = 0x%lx\n", offs);
/* z_extract_offset_negative allows simplification of head_32.S */
- printf(".globl z_extract_offset_negative\n");
- printf("z_extract_offset_negative = -0x%lx\n", offs);
-
- printf(".globl input_data, input_data_end\n");
- printf("input_data:\n");
- printf(".incbin \"%s\"\n", argv[1]);
- printf("input_data_end:\n");
+ printf(".section \".rodata..compressed\",\"a\",@progbits\n"
+ ".globl z_input_len\n"
+ "z_input_len = %lu\n"
+ ".globl z_output_len\n"
+ "z_output_len = %lu\n"
+ ".globl z_extract_offset\n"
+ "z_extract_offset = 0x%lx\n"
+ ".globl z_extract_offset_negative\n"
+ "z_extract_offset_negative = -0x%lx\n"
+ ".globl input_data, input_data_end\n"
+ "input_data:\n"
+ ".incbin \"%s\"\n"
+ "input_data_end:\n",
+ ilen, (unsigned long)olen, offs, offs, argv[1]);
return 0;
}
--
1.7.2.2
--
Jean Sacren
On Sun, 12 Jun 2011 12:36:28 MDT, Jean Sacren said:
> Subject: [PATCH 1/1] boot: Enhance performance by eliminating unnecessary calls to printf()
>
> Repeated calling to printf() for 13 times is a dire waste of CPU cycles.
> For performance, combine all those calls into one while source code
> formatting is preserved for readability.
*facepalm*
I defy anybody to find a *less* performance-critical piece of code to
micro-optimize. It doesn't even make a boot faster, it makes a kernel build
non-measurably faster.
If you wanted to make kernel builds faster, go look at Ingo's numbers regarding
the current #include mess - fixing *that* will be of more benefit than this...
On Sun, Jun 12, 2011 at 14:36, Jean Sacren <[email protected]> wrote:
> From: "H. Peter Anvin" <[email protected]>
> Date: Mon, 06 Jun 2011 10:08:32 -0700
>>
>> On 06/05/2011 05:40 PM, Jean Sacren wrote:
>> First of all, this is a build time tool which is executed exactly once
>> during the entire kernel build.
>>
>> Second, printf execution time is largely dependent on the size
>> formatting string; since the I/O is buffered it is only issued once
>> anyway... which basically means that there is no time saved at all.
>
> The above two arguments have nothing to do with the fact you printed out
> 13 lines individually, where they should have been printed out
> collectively. To make my point here, why you didn't print out each
> character individually?
Because that would be substantially harder to read. And harder to
write, at that.
>
> I looked at the history of the file and the way you did it is a birth
> shortcoming. For the past two years, no code has ever been necessarily
> inserted between these 13 printf() calls. Looking into the future, that
> block of code shall be facilitated by the updated patch.
>>
>> Third, the resulting code is substantially harder to read.
>
> With the updated patch, this argument doesn't stand at all.
Sure it does. With the original code, the arguments to printf are
right next to the format specifiers; with your patch you have to count
format specifiers to figure out which argument goes to what. And for
what? A millisecond or two of time saved, at compile time, probably
less?
>> Fourth, carrying this as a patch will cost kernel developers more time
>> in additional git execution time than it ever will save them in build time.
>
> With the updated patch, this argument doesn't make any sense at all.
You've already cost kernel developers more time in responding to your
patch than they'll save in build time. Asking them to apply it to git
wastes even more time.
> + ? ? ? printf(".section \".rodata..compressed\",\"a\",@progbits\n"
> + ? ? ? ? ? ? ?".globl z_input_len\n"
> + ? ? ? ? ? ? ?"z_input_len = %lu\n"
> + ? ? ? ? ? ? ?".globl z_output_len\n"
> + ? ? ? ? ? ? ?"z_output_len = %lu\n"
> + ? ? ? ? ? ? ?".globl z_extract_offset\n"
> + ? ? ? ? ? ? ?"z_extract_offset = 0x%lx\n"
> + ? ? ? ? ? ? ?".globl z_extract_offset_negative\n"
> + ? ? ? ? ? ? ?"z_extract_offset_negative = -0x%lx\n"
> + ? ? ? ? ? ? ?".globl input_data, input_data_end\n"
> + ? ? ? ? ? ? ?"input_data:\n"
> + ? ? ? ? ? ? ?".incbin \"%s\"\n"
> + ? ? ? ? ? ? ?"input_data_end:\n",
> + ? ? ? ? ? ? ?ilen, (unsigned long)olen, offs, offs, argv[1]);
Quick! Which printf argument does z_extract_offset correspond to?
If you had to count (or it took you more than half a second
otherwise), this code is less readable than it was before.