Hi!
I recently posted a module for twofish which implements the algorithm in
assembler (http://marc.theaimsgroup.com/?l=linux-kernel&m=106210815132365&w=2)
Unfortunately the assembler used is masm. I'd like to change that. Netwide
Assembler (nasm) is the assembler of my choice since it focuses on
portability and has a more powerful macro facility (macros are heavily used
by 2fish_86.asm). But as I'd like to make my work useful (aiming for
inclusion in the kernel), I noticed that this would be the first module to
depend on nasm. Everything else uses gas.
So the question is: Is a patch which depends on nasm likely to be merged?
For more information on "what is nasm":
http://nasm.sourceforge.net/doc/html/nasmdoc1.html#section-1.1
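To give an idea of the macro facility I mean, here is a minimal nasm-style
multi-line macro (an illustrative sketch only - the macro name and body are
made up, not taken from 2fish_86.asm):

%macro mix_step 2               ; two parameters, referenced as %1 and %2
        rol     %1, %2          ; rotate the first argument left
        xor     eax, %1         ; fold it into eax
%endmacro

        mix_step ebx, 8         ; expands to: rol ebx, 8 / xor eax, ebx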
Regards, Clemens
fruhwirth clemens wrote:
> Hi!
>
> I recently posted a module for twofish which implements the algorithm in
> assembler (http://marc.theaimsgroup.com/?l=linux-kernel&m=106210815132365&w=2)
>
> Unfortunately the assembler used is masm. I'd like to change that. Netwide
> Assembler (nasm) is the assembler of my choice since it focuses on
> portability and has a more powerful macro facility (macros are heavily used
> by 2fish_86.asm). But as I'd like to make my work useful (aiming for
> inclusion in the kernel), I noticed that this would be the first module to
> depend on nasm. Everything else uses gas.
>
> So the question is: Is a patch which depends on nasm likely to be merged?
>
I hope not ...
Some years ago, we converted the only part of the kernel that used as86
to GNU as: see arch/i386/boot. I think this was an improvement.
Using nasm for only one small piece of code would be a regression, imho.
Regards.
PS: GCC passes .S assembler source files through cpp, so you get macro
expansion.
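For example, a .S file can use ordinary cpp macros (a minimal sketch; the
names are made up, though the ENTRY() idiom is similar to what the kernel's
linux/linkage.h does):

#define SYS_EXIT 1                      /* i386 sys_exit syscall number */
#define ENTRY(name) .globl name; name:

        .text
ENTRY(exit_stub)
        movl    $SYS_EXIT, %eax         /* expanded by cpp before gas runs */
        xorl    %ebx, %ebx              /* exit code 0 */
        int     $0x80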
--
Yann Droneaud <[email protected]>
<[email protected]> <[email protected]>
On Thu, 4 Sep 2003, Yann Droneaud wrote:
> fruhwirth clemens wrote:
>
> > Hi!
> >
> > I recently posted a module for twofish which implements the algorithm in
> > assembler (http://marc.theaimsgroup.com/?l=linux-kernel&m=106210815132365&w=2)
> >
> > Unfortunately the assembler used is masm. I'd like to change that. Netwide
> > Assembler (nasm) is the assembler of my choice since it focuses on
> > portability and has a more powerful macro facility (macros are heavily used
> > by 2fish_86.asm). But as I'd like to make my work useful (aiming for
> > inclusion in the kernel), I noticed that this would be the first module to
> > depend on nasm. Everything else uses gas.
> >
> > So the question is: Is a patch which depends on nasm likely to be merged?
> >
>
> I hope not ...
>
> Some years ago, we converted the only part of the kernel that used as86
> to GNU as: see arch/i386/boot. I think this was an improvement.
> Using nasm for only one small piece of code would be a regression, imho.
>
> Regards.
>
> PS: GCC pass .S assembler source files through cpp, so you get macros
> expanding.
>
GAS also has macro capability. It's just "strange" - backwards, etc. - but
it does everything MASM (/ducks/) can do. It takes some getting used to.
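For reference, a gas macro looks roughly like this (a minimal sketch; the
macro name is made up):

.macro mix_step reg, count      # arguments are referenced as \reg and \count
        roll    $\count, \reg
        xorl    \reg, %eax
.endm

        mix_step %ebx, 8        # expands to: roll $8, %ebx / xorl %ebx, %eax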
If you decide to use gcc as a preprocessor, you can't use "#" comments,
NotGood(tm), because the "#" and some stuff after it gets "interpreted"
by cpp.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
"Richard B. Johnson" <[email protected]> writes:
> If you decide to use gcc as a preprocessor, you can't use "#" comments,
> NotGood(tm), because the "#" and some stuff after it gets "interpreted"
> by cpp.
Although one could use C-style comments in this scenario, yes?
--
Ah bay tsay day vitamin.
On Thu, 4 Sep 2003, Sean Neakums wrote:
> "Richard B. Johnson" <[email protected]> writes:
>
> > If you decide to use gcc as a preprocessor, you can't use "#" comments,
> > NotGood(tm), because the "#" and some stuff after it gets "interpreted"
> > by cpp.
>
> Although one could use C-style comments in this scenario, yes?
>
Sure. Then it's not assembly. It's some polymorphic conglomeration
of crap ......... don't get me started. If you write in assembler,
please learn to use the assembler. Assembly is not 'C'.
Use the right tool for the right thing. Both are tools, the fact
that you can shovel with an axe does not make the axe a shovel.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
fruhwirth clemens wrote:
> Hi!
>
> I recently posted a module for twofish which implements the algorithm in
> assembler (http://marc.theaimsgroup.com/?l=linux-kernel&m=106210815132365&w=2)
>
> Unfortunately the assembler used is masm. I'd like to change that. Netwide
> Assembler (nasm) is the assembler of my choice since it focuses on
> portability and has a more powerful macro facility (macros are heavily used
> by 2fish_86.asm). But as I'd like to make my work useful (aiming for
> inclusion in the kernel), I noticed that this would be the first module to
> depend on nasm. Everything else uses gas.
>
Check these archives about the boot code rewrite:
http://www.ussg.iu.edu/hypermail/linux/kernel/9908.0/0107.html
http://www.ussg.iu.edu/hypermail/linux/kernel/9908.1/0083.html
http://www.ussg.iu.edu/hypermail/linux/kernel/9907.3/0960.html
--
Yann Droneaud <[email protected]>
<[email protected]> <[email protected]>
On Thursday 04 September 2003 21:44, Yann Droneaud wrote:
> fruhwirth clemens wrote:
> > Hi!
> >
> > I recently posted a module for twofish which implements the algorithm in
> > assembler
> > (http://marc.theaimsgroup.com/?l=linux-kernel&m=106210815132365&w=2)
> >
> > Unfortunately the assembler used is masm. I'd like to change that.
> > Netwide Assembler (nasm) is the assembler of my choice since it focuses
> > on portability and has a more powerful macro facility (macros are heavily
> > used by 2fish_86.asm). But as I'd like to make my work useful (aiming for
> > inclusion in the kernel), I noticed that this would be the first module to
> > depend on nasm. Everything else uses gas.
> >
> > So the question is: Is a patch which depends on nasm likely to be merged?
>
> I hope not ...
>
> Some years ago, we converted the only part of the kernel that used as86
> to GNU as: see arch/i386/boot. I think this was an improvement.
> Using nasm for only one small piece of code would be a regression, imho.
>
Concur, not worthwhile to start using a fairly unsupported tool in the kernel.
As to using assembler, it is better to get rid of it except in special cases.
Today's compilers are the better coders in 98+% of applications, and if you
follow some of the discussions here on the list, you will be amazed what
people do with a C compiler - all portable and much more maintainable.
I guess your code should be 80-90% C and 10-20% assembler. This will make it
up to 10 times as portable.
As to using nasm, note for gas and gcc 3.2+:
+ GAS does intel syntax too using the directive
.intel_syntax
+ GCC can do intel syntax asm output as well; tried
it on some mid-size apps - it works fine.
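A minimal sketch of the directive in use (assuming a reasonably recent gas;
gcc's corresponding output option is -masm=intel):

        .intel_syntax noprefix  # Intel operand order, no % register prefixes
        mov     eax, [esp+4]
        rol     eax, 8
        .att_syntax prefix      # back to the default AT&T syntax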
As to "abuse" of macros, macros beyond C-style pre-processing are generaly
obsolete, better use C, or write an app-specific front end.
I did the latter to "preprocess" a common source, running 16 bit/embedded
source through wasm and 32bit/linux source through gas.
Regards
Michael
On Thu, Sep 04, 2003 at 12:42:45PM +0200, Fruhwirth Clemens wrote:
> Hi!
>
> I recently posted a module for twofish which implements the algorithm in
> assembler (http://marc.theaimsgroup.com/?l=linux-kernel&m=106210815132365&w=2)
>
> Unfortunately the assembler used is masm. I'd like to change that. Netwide
> Assembler (nasm) is the assembler of my choice since it focuses on
> portability and has a more powerful macro facility (macros are heavily used
> by 2fish_86.asm). But as I'd like to make my work useful (aiming for
> inclusion in the kernel), I noticed that this would be the first module to
> depend on nasm. Everything else uses gas.
>
> So the question is: Is a patch which depends on nasm likely to be merged?
>
> For more information on "what is nasm":
> http://nasm.sourceforge.net/doc/html/nasmdoc1.html#section-1.1
nasm is i386-only.
gas is part of binutils and supports lots of target CPUs: I think the
first part of a new gcc architecture backend is the machine description
so that gas can assemble code for it.
Greets, Antonio.
Richard B. Johnson wrote:
>
> GAS also has macro capability. It's just "strange" - backwards, etc. - but
> it does everything MASM (/ducks/) can do. It takes some getting used to.
>
> If you decide to use gcc as a preprocessor, you can't use "#" comments,
> NotGood(tm), because the "#" and some stuff after it gets "interpreted"
> by cpp.
>
Yep, this is why arch/i386/boot/Makefile uses the -traditional flag.
In the past I rewrote all the comments in the C++ (and now C99) form:
// comment
but my changes weren't applied.
Files were hosted at http://project.meuh.eu.org/kernel/,
but they are no longer online, and I'm away from the computer where they
are backed up.
--
Yann Droneaud <[email protected]>
<[email protected]> <[email protected]>
Richard B. Johnson wrote:
> On Thu, 4 Sep 2003, Sean Neakums wrote:
>
>
>>"Richard B. Johnson" <[email protected]> writes:
>>
>>
>>>If you decide to use gcc as a preprocessor, you can't use "#" comments,
>>>NotGood(tm), because the "#" and some stuff after it gets "interpreted"
>>>by cpp.
>>
>>Although one could use C-style comments in this scenario, yes?
>>
>
>
> Sure. Then it's not assembly. It's some polymorphic conglomeration
> of crap ......... don't get me started. If you write in assembler,
> please learn to use the assembler. Assembly is not 'C'.
>
Comments are not useful to the assembler; it doesn't look at them ;)
So whether they are removed by cpp or by the assembler itself does not
matter to me.
GAS supports C-style comments, and cpp has an option (-C) to keep comments
if you want them in the assembler listing.
The preprocessor is 'just a kind of filter'; you can preprocess your
files through m4, sed, perl, etc. ... what's the problem with that?
> Use the right tool for the right thing. Both are tools, the fact
> that you can shovel with an axe does not make the axe a shovel.
>
...
--
Yann Droneaud <[email protected]>
<[email protected]> <[email protected]>
On Thu, Sep 04, 2003 at 10:57:12PM +0800, Michael Frank wrote:
> On Thursday 04 September 2003 21:44, Yann Droneaud wrote:
> >
> > Using nasm for only one small piece of code would be a regression, imho.
>
> Concur, not worthwhile to start using a fairly unsupported tool in the kernel.
>
> As to using assembler, it is better to get rid of it except in special cases.
> Today's compilers are the better coders in 98+% of applications, and if you
> follow some of the discussions here on the list, you will be amazed what
> people do with a C compiler - all portable and much more maintainable.
gcc optimizes code much better than I do, for sure, but gcc's
optimization capabilities are a joke compared to those of the guy who wrote
2fish_86.asm (just have a look at the source). The assembler implementation
is twice as fast as the C implementation we have in the kernel. The same is
true for AES (although just 50% faster instead of 100%:
http://clemens.endorphin.org/patches/aes-i586-asm-2.5.58.diff . That's gas,
btw.)
> I guess your code should be 80-90% C and 10-20% assembler. This will make it
> up to 10 times as portable.
The Twofish code is C but has hooks to use an asm backend in special cases
(keysetup, en/decrypt). But a plain C version of twofish is already present
in the kernel.
> As to using nasm, note for gas and gcc 3.2+:
>
> + GAS does intel syntax too using the directive
> .intel_syntax
That's certainly nice to hear. At least some cut/pasting can be done :)
Regards, Clemens
On Thursday 04 September 2003 17:57, Michael Frank wrote:
> Concur, not worthwhile to start using a fairly unsupported tool in the
> kernel.
>
> As to using assembler, it is better to get rid of it except in special cases.
> Today's compilers are the better coders in 98+% of applications, and if you
Better coders? Show me the evidence.
> follow some of the discussions here on the list, you will be amazed what
> people do with a C compiler - all portable and much more maintainable.
Portable yes. Maintainable yes. Better code _no_.
I'd say compiler-generated asm code quality can be anywhere between
"hair-raising crawling horror" and "not so bad, although I can do better".
I have never seen a really clever compiler yet. Writing a good compiler
is a very tough thing to do.
--
vda
On Thu, 4 September 2003 12:42:45 +0200, Fruhwirth Clemens wrote:
>
> I recently posted a module for twofish which implements the algorithm in
> assembler (http://marc.theaimsgroup.com/?l=linux-kernel&m=106210815132365&w=2)
>
> Unfortunately the assembler used is masm. I'd like to change that. Netwide
> Assembler (nasm) is the assembler of my choice since it focuses on
> portability and has a more powerful macro facility (macros are heavily used
> by 2fish_86.asm). But as I'd like to make my work useful (aiming for
> inclusion in the kernel), I noticed that this would be the first module to
> depend on nasm. Everything else uses gas.
Orthogonally to the nasm/gas question, there are the problems of
performance and maintenance.
Do some benchmarks on lots of different machines and measure the
performance of the asm and c code. If it's faster on PPro but not on
PIII or Athlon, forget about it.
How big is the .text of the asm and c variant? If the text of yours
is much bigger, you just traded 2fish performance for general
performance. Everything else will suffer from cache misses. Forget
your microbenchmark, your variant will make the machine slower.
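(Checking is cheap; for example, you can compare the section sizes of the
two modules with binutils - the module names here follow the ones used
later in this thread:)

        $ size twofish.o twofish-i586.o
        $ objdump -h twofish-i586.o | grep '\.text'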
How many bugs are in your code? Are there any buffer overflows or
other security holes? How can you be sure about it? (Most people
aren't sure about c either, but it is much easier to check.)
If your code fails on any one of these questions, forget about it. If
it survives them, post your results and have someone else verify them.
As to nasm/gas, you will have a hard time explaining why Joe User
needs yet another tool to compile his own kernel. There may be good
arguments in favor, but they had better be good; otherwise, translate to gas.
Jörn
--
Homo Sapiens is a goal, not a description.
-- unknown
> Do some benchmarks on lots of different machines and measure the
> performance of the asm and c code. If it's faster on PPro but not on
> PIII or Athlon, forget about it.
Presumably the asm code is tuned for a specific processor, and
intended to be used only on kernels optimised for that CPU.
On the other hand, unless it's translated to gas, it's more or less
useless in the context of the kernel - remember the 'perl in the
toolchain' discussion?
John.
On Fri, Sep 05, 2003 at 01:42:20PM +0200, Jörn Engel wrote:
> On Thu, 4 September 2003 12:42:45 +0200, Fruhwirth Clemens wrote:
>
> Do some benchmarks on lots of different machines and measure the
> performance of the asm and c code. If it's faster on PPro but not on
> PIII or Athlon, forget about it.
>
> How big is the .text of the asm and c variant? If the text of yours
> is much bigger, you just traded 2fish performance for general
> performance. Everything else will suffer from cache misses. Forget
> your microbenchmark, your variant will make the machine slower.
Man! Why is everyone doubting the usefulness of assembler-optimized parts?
It's twice as fast on my Athlon. I assert the same is true for P3/P4. Just
test.
twofish-i586.ko's .text section is smaller than the kernel's twofish.ko's -
945 bytes smaller, to be precise. Please note that twofish-i586 includes TWO
implementations: C and assembler. Just think about how much smaller it will
be when I rip out the C part.
So much for that.
> How many bugs are in your code?
42... Is this a serious question?
> Are there any buffer overflows or other security holes?
> How can you be sure about it?
How can you be sure? Mathematical program verification applies quite badly to
assembler.
> If your code fails on any one of these questions, forget about it. If
> it survives them, post your results and have someone else verify them.
I'm sorry, your critique is too general to be useful.
Regards, Clemens
> > Are there any buffer overflows or other security holes?
> > How can you be sure about it?
>
> How can you be sure? Mathematical program verification applies quite badly
> to assembler.
The point is, if somebody does find a bug they will want to
re-assemble with Gas after they've fixed it.
> > If your code fails on any one of these questions, forget about it. If
> > it survives them, post your results and have someone else verify them.
>
> I'm sorry, your critique is too general to be useful.
It's not; all along, the argument has not been against the assembler code,
but rather against $assembler!=Gas.
John.
On Fri, Sep 05, 2003 at 01:25:24PM +0100, John Bradford wrote:
> > > Are there any buffer overflows or other security holes?
> > > How can you be sure about it?
> >
> > How can you be sure? Mathematical program verification applies quite badly
> > to assembler.
>
> The point is, if somebody does find a bug they will want to
> re-assemble with Gas after they've fixed it.
If you're referring to my precompiled masm binaries, yes: if one wants to
change the source, getting masm is not nice.
But if the source is written for nasm, nasm (LGPL) can be installed
easily..
However, the kernel folks seem to dislike depending on an additional tool.
Actually, that's the answer to my original question. Now I just have to
ponder whether I favour the preferences of the kernel over the prefs of
user-space programs. There are lots of user-space crypto implementations,
which are potential candidates.. and for these apps an additional
dependency on nasm is no problem.
Regards, Clemens
On Fri, 5 September 2003 14:04:46 +0200, Fruhwirth Clemens wrote:
> On Fri, Sep 05, 2003 at 01:42:20PM +0200, Jörn Engel wrote:
> > On Thu, 4 September 2003 12:42:45 +0200, Fruhwirth Clemens wrote:
> >
> > Do some benchmarks on lots of different machines and measure the
> > performance of the asm and c code. If it's faster on PPro but not on
> > PIII or Athlon, forget about it.
> >
> > How big is the .text of the asm and c variant? If the text of yours
> > is much bigger, you just traded 2fish performance for general
> > performance. Everything else will suffer from cache misses. Forget
> > your microbenchmark, your variant will make the machine slower.
>
> Man! Why is everyone doubting the usefulness of assembler-optimized parts?
Because assembler is such a pain. :)
In general, you don't want any code to be in assembler, so it is your
duty to prove that this is a valid exception.
> It's twice as fast on my Athlon. I assert the same is true for P3/P4. Just
> test.
Again, that is your job. It is faster and smaller on Athlon, good.
How about i386? Does it even run on that machine? If not, you need
Kconfig to make sure it isn't compiled in for i386. Repeat for all
the other cpus.
> twofish-i586.ko's .text section is smaller than the kernel's twofish.ko's. 945
> bytes to be precise. Please note that twofish-i586 includes TWO
> implementations: C and assembler. Just think about how much smaller it will
> be when I rip out the C part.
>
> So much for that.
Ok, that is good. It might still be bigger, depending on the target
cpu, but that is unlikely.
> > How many bugs are in your code?
>
> 42... Is this a serious question?
It is. Your code should at least survive test runs against a different
implementation, to make sure the good case works correctly. I
guess you have done that, but have you really? You didn't tell us yet.
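Something like this would be a start (a sketch only; the twofish_*_encrypt
entry points are hypothetical stand-ins for the real interfaces of the two
implementations):

        #include <stdio.h>
        #include <string.h>

        /* hypothetical stand-ins for the C and asm implementations */
        extern void twofish_c_encrypt(const unsigned char key[32],
                                      const unsigned char in[16],
                                      unsigned char out[16]);
        extern void twofish_asm_encrypt(const unsigned char key[32],
                                        const unsigned char in[16],
                                        unsigned char out[16]);

        int main(void)
        {
                unsigned char key[32] = {0}, in[16] = {0};
                unsigned char out_c[16], out_asm[16];
                int i, block;

                for (block = 0; block < 100000; block++) {
                        twofish_c_encrypt(key, in, out_c);
                        twofish_asm_encrypt(key, in, out_asm);
                        if (memcmp(out_c, out_asm, 16) != 0) {
                                printf("mismatch at block %d\n", block);
                                return 1;
                        }
                        memcpy(in, out_c, 16);   /* chain the output back in */
                        for (i = 0; i < 16; i++) /* and stir the key a bit */
                                key[i] ^= out_c[i];
                }
                printf("implementations agree on %d blocks\n", block);
                return 0;
        }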
> > Are there any buffer overflows or other security holes?
> > How can you be sure about it?
>
> How can you be sure? Mathematical program verification applies quite badly to
> assembler.
I'm not sure about the rest of the kernel either, but I have a much
better feeling. The encryption implementations in the kernel were
based on existing and tested code, got some more testing, were
reviewed by several people...
If no one else ever went over your code, I will assume there are some
security holes left and a machine using that code is at least
vulnerable to local users. This may still be ok, but it should be
noted in BIG LETTERS: DANGEROUS.
> > If your code fails on any one of these questions, forget about it. If
> > it survives them, post your results and have someone else verify them.
>
> I'm sorry, your critique is too general to be useful.
Sorry, I don't feel like going through intel assembler in nasm syntax
in my free time, especially if I have to go and fetch the code myself.
Your idea looks promising - the reason why I bother to answer at all -
but if it should go in, it still needs a lot of work. The fun part is
finished, you are 20% done. ;)
Jörn
--
It does not matter how slowly you go, so long as you do not stop.
-- Confucius
On Fri, 5 Sep 2003, John Bradford wrote:
> > > Are there any buffer overflows or other security holes?
> > > How can you be sure about it?
> >
> > How can you be sure? Mathematical program verification applies quite badly
> > to assembler.
>
> The point is, if somebody does find a bug they will want to
> re-assemble with Gas after they've fixed it.
>
> > > If your code fails on any one of these questions, forget about it. If
> > > it survives them, post your results and have someone else verify them.
> >
> > > I'm sorry, your critique is too general to be useful.
>
> It's not; all along, the argument has not been against the assembler code,
> but rather against $assembler!=Gas.
>
> John.
All assemblers suck. However, they are exceedingly useful. The
code ends up being exactly what you write. Usually one only
needs to learn one assembler per platform. It was a real shock
for me to have to learn GAS, it was "backwards", seemed to
think everything was a '68000, and basically sucked. However,
once I learned how to use it, it became a useful tool. In
a mini-'C' library I wrote for a project, the total sharable
runtime-library size is:
crt.so: ELF 32-bit LSB shared object, Intel 80386, version 1, stripped
-rwxr-xr-x 1 root root 77896 Aug 20 2000 assembly/crt.so
-rw-r--r-- 1 root root 1448 Aug 20 2000 assembly/start.o
This includes most of the string stuff and the 'C' interface to
Linux.
The test of code that works in the 'real' world is called
regression-testing. Basically, you run the stuff. You execute
all "known" possible execution paths. If it works, it works.
If it doesn't, you fix it until it does. Seeding with faults
to see if your regression test picks it up, as is proposed
by a bunch of different testing methods, is absurd whether it's
written in assembly or C#. It doesn't matter what the
language is. You need to test procedures as "black-boxes" with
specified inputs and outputs. You also have to violate the
input specifications and show that an error, so created, doesn't
propagate. Such an error need not crash or kill the system, but
it must be detected so that invalid output doesn't occur.
Error-checkers like Lint, that use a specific language such as 'C',
can provide the programmer with a false sense of security. You
end up with 'perfect' code with all the unwanted return-values
cast to "void", but the logic remains wrong and will fail once
the high-bit in an integer is set. So, in some sense, writing
procedures in assembly is "safer". You know what the code will
do before you run it. If you don't, stay away from assembly.
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
On Friday 05 September 2003 06:28, insecure wrote:
> On Thursday 04 September 2003 17:57, Michael Frank wrote:
> > Concur, not worthwhile to start using a fairly unsupported tool in the
> > kernel.
> >
> > As to using assembler, it is better to get rid of it except in special
> > cases. Today's compilers are the better coders in 98+% of applications,
> > and if you
>
> Better coders? Show me the evidence.
>
> > follow some of the discussions here on the list, you will be amazed what
> > people do with a C compiler - all portable and much more maintainable.
>
> Portable yes. Maintainable yes. Better code _no_.
>
> I'd say compiler-generated asm code quality can be anywhere between
> "hair-raising crawling horror" and "not so bad, although I can do better".
>
> I have never seen a really clever compiler yet. Writing a good compiler
> is a very tough thing to do.
Just got another reply to this thread which helps to explain what I meant by
"better coders in 98+% of applications"
On Friday 05 September 2003 19:42, Jörn Engel wrote:
> How big is the .text of the asm and c variant? If the text of yours
> is much bigger, you just traded 2fish performance for general
> performance. Everything else will suffer from cache misses. Forget
> your microbenchmark, your variant will make the machine slower.
>
> How many bugs are in your code? Are there any buffer overflows or
> other security holes? How can you be sure about it? (Most people
> aren't sure about c either, but it is much easier to check.)
>
> If your code fails on any one of these questions, forget about it. If
> it survives them, post your results and have someone else verify them.
There is another technical argument - one which I am not very familiar with:
modern and future CPUs are optimized for high-level languages; it is
just too troublesome to arrange all the instructions best-case for the
hardware to be well utilized.
Back to my original message, my implied definition of "Better coders"
is the compromise between performance, development effort, stability
and security (and more).
It does not just refer to the best possible "perfect" code.
Let me give you an example of "best possible code" (to the best of my
ability):
I do mostly embedded applications; years ago I did consumer design for the
kind of Hong Kong-made $19.99 gimmicks ($6 FOB) priced to be purchased
"on impulse" by Joe Consumer.
For one of those gimmicks, I used a 4-bit micro running on 1.5V with 1K
instruction ROM and 64 nibbles of RAM doing 32 KIPS (32768 instructions per
second) to establish the speed of a tennis ball it was built into.
It required floating-point calculations and display on a built-in
LCD with 64 segments.
This takes __clever__ __optimized__ code, and is at least a week of work,
and affordable only in high-volume applications.
Consider, one week of work for what you can do in C using GLIBC within
1 hour or less!
Now, please consider a real-life (Linux) system, which you use every day:
of course you could make every piece of code "better" by hand-coding and
optimizing, but what is the real benefit?
Assuming those millions of lines of C had been implemented in optimized
assembly, would it perform that much faster (if that is what you call
"better"), or would you use "half" the memory for the same job?
Now, what about its stability and maintainability - not to mention COST,
even if you could find all those great human coders?
Guess the pioneering days are over ;)
Regards
Michael
> > > > Are there any buffer overflows or other security holes?
> > > > How can you be sure about it?
> > >
> > > How can you be sure? Mathematical program verification applies quite badly
> > > to assembler.
> >
> > The point is, if somebody does find a bug they will want to
> > re-assemble with Gas after they've fixed it.
> >
> > > > If your code fails on any one of these questions, forget about it. If
> > > > it survives them, post your results and have someone else verify them.
> > >
> > > I'm sorry, your critique is too general to be useful.
> >
> > It's not, all the time the argument is not against the assembler code,
> > but rather against $assembler!=Gas.
> >
> > John.
>
> All assemblers suck. However, they are exceedingly useful. The
> code ends up being exactly what you write. Usually one only
> needs to learn one assembler per platform. It was a real shock
> for me to have to learn GAS, it was "backwards", seemed to
> think everything was a '68000, and basically sucked. However,
> once I learned how to use it, it became a useful tool.
Not sure whether you're agreeing with me or not, quite possibly
because my last comment used a double negative and was somewhat
ambiguous :-).
What I meant was that if a piece of perfect code exists (and as you
point out, this can be mathematically _proven_ with assembler code,
not just demonstrated), the requirement for an open source assembler
other than Gas is not so much of a problem, because nobody should need
to touch that code. If they do, they can translate it to Gas syntax.
If the possibility of bugs exists in the code, relying on
$assembler!=Gas is a bad thing, because there will be fewer people
willing to maintain it.
> The test of code that works in the 'real' world is called
> regression-testing. Basically, you run the stuff. You execute
> all "known" possible execution paths. If it works, it works.
> If it doesn't, you fix it until it does.
I totally agree.
> You need to test procedures as "black-boxes" with
> specified inputs and outputs. You also have to violate the
> input specifications and show that an error, so created, doesn't
> propagate. Such an error need not crash or kill the system, but
> it must be detected so that invalid output doesn't occur.
>
> Error-checkers like Lint, that use a specific language such as 'C',
> can provide the programmer with a false sense of security. You
> end up with 'perfect' code with all the unwanted return-values
> cast to "void", but the logic remains wrong and will fail once
> the high-bit in an integer is set. So, in some sense, writing
> procedures in assembly is "safer". You know what the code will
> do before you run it. If you don't, stay away from assembly.
This is part of what makes someone a 'real' programmer, in my
opinion.
In my experience, 'Unreal' programmers tend to excessively re-use code
from other applications they've written, and just hack it about until
it works, at times leaving in code for features that are never used in
the new context :-).
John.
On Thursday 04 September 2003 17:28, insecure wrote:
> On Thursday 04 September 2003 17:57, Michael Frank wrote:
> > Concur, not worthwhile to start using a fairly unsupported tool in the
> > kernel.
> >
> > As to using assembler, it is better to get rid of it except in special
> > cases. Today's compilers are the better coders in 98+% of applications,
> > and if you
>
> Better coders? Show me the evidence.
>
> > follow some of the discussions here on the list, you will be amazed what
> > people do with a C compiler - all portable and much more maintainable.
>
> Portable yes. Maintainable yes. Better code _no_.
>
> I'd say compiler-generated asm code quality can be anywhere between
> "hair-raising crawling horror" and "not so bad, although I can do better".
>
> I have never seen a really clever compiler yet. Writing a good compiler
> is a very tough thing to do.
Actually, you mean "writing a good optimizer is a very tough thing to do".
The problem is NOT the compiler, or the coder. A "well defined" algorithm
such as the twofish mentioned CAN be compiled well by almost any compiler,
but the code optimizer MUST be at the highest level. There is also the
problem that what is optimum for one CPU is NOT optimum for the next
generation, even if it uses the same identical architecture. This is where
the human coder can beat the compiler. That person will make many, many passes
through the code, and try similar/related instructions to optimize the result.
This amount of optimization also means that the result may not work at all on
a different processor member of the architecture; it is MUCH more likely to
run slower. Even minor things like the size of cache in the processor will
affect the human coder.
This a compiler cannot do since the optimizer is targeted toward a family of
processors and not a single member of that family. It will provide the most
compatibility. And since it is in a higher level language, the translation to
many other architectures is possible, even those that do not have available
human coders. The advantage the compiler has is that the optimizer can receive
input from MANY excellent coders contributing rules for code generation. This
gives the compiler the ability to surpass 90+% of the coders, and exceed the
productivity of all the assembler coders.
The final question is "Is it fast enough?"
Only if the answer is NO does it make sense to do assembler. It doesn't
matter if it can be faster. And portability means you don't have to spend
hours rewriting it...
>> Error-checkers like Lint, that use a specific language such
>> as 'C', can provide the programmer with a false sense of
>> security. You end up with 'perfect' code with all the
>> unwanted return-values cast to "void", but the logic remains
>> wrong and will fail once the high-bit in an integer is set.
>> So, in some sense, writing procedures in assembly is
>> "safer". You know what the code will do before you run it.
>> If you don't, stay away from assembly.
> This is part of what makes someone a 'real' programmer, in my opinion.
> In my experience, 'Unreal' programmers tend to excessively
> re-use code from other applications they've written, and just
> hack it about until it works, at times leaving in code for
> features that are never used in the new context :-).
Code re-use is not a bad thing in computer science, because it can save
you much work. But it has to be done correctly. The best thing is to use
so-called "design patterns": solutions to common problems that have been
proven to work in many different environments. So if you solved some
problem in your past programs (of course, having specified it well
beforehand) and you can show that the solution is not specific to that
particular program, then there's no need to reinvent the wheel. For
example, that's why you use standard libraries for basic operations like
output to the console.
You're right in the part that one should not have to hack the re-used
code until it works, because that leads to dirty coding.
I'd also like to mention that algorithms implemented in high-level
languages can be mathematically proven too, for example with the Hoare
calculus, which provides basic axioms for handling sequences, loops
and conditional statements.
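For instance, the assignment axiom of the Hoare calculus (standard textbook
form, given here only for illustration) reads:

        { Q[E/x] }  x := E  { Q }

i.e. to establish Q after the assignment, Q with E substituted for x must
hold before it. A concrete instance:

        { x + 1 > 0 }  x := x + 1  { x > 0 }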
Mehmet
On Friday 05 September 2003 15:59, Michael Frank wrote:
> Just got another reply to this thread which helps to explain what I meant
> by "better coders in 98+% of applications"
>
> On Friday 05 September 2003 19:42, Jörn Engel wrote:
> > How big is the .text of the asm and c variant? If the text of yours
> > is much bigger, you just traded 2fish performance for general
> > performance. Everything else will suffer from cache misses. Forget
> > your microbenchmark, your variant will make the machine slower.
A random example from one small unrelated program (gcc 3.2):
main:
pushl %ebp
pushl %edi
pushl %esi
pushl %ebx
subl $32, %esp
xorl %ebp, %ebp
cmpl $1, 52(%esp)
movl $0, 20(%esp)
movl $1000000, %edi <----
movl $1000000, 16(%esp) <----
movl $0, 12(%esp)
movl $.LC27, 8(%esp)
je .L274
movl $1, %esi
cmpl 52(%esp), %esi
jge .L272
No sane human will do that.
main:
pushl %ebp
pushl %edi
pushl %esi
pushl %ebx
subl $32, %esp
xorl %ebp, %ebp
cmpl $1, 52(%esp)
movl $0, 20(%esp)
movl $1000000, %edi
movl %edi, 16(%esp) <-- save 4 bytes
movl %ebp, 12(%esp) <-- save 4 bytes
movl $.LC27, 8(%esp)
je .L274
movl $1, %esi
cmpl 52(%esp), %esi
jge .L272
And this is only from a cursory examination.
> There is another technical argument - one which I am not very familiar with:
> modern and future CPUs are optimized for high-level languages; it is
> just too troublesome to arrange all the instructions best-case for the
> hardware to be well utilized.
You took marketspeak too seriously.
> Back to my original message, my implied definition of "Better coders"
> is the compromise between performance, development effort, stability
> and security (and more).
>
> It does not just refer to the best possible "perfect" code.
>
> Let me give you an example of "best possible code" (to the best of my
> ability):
>
> I do mostly embedded applications; years ago I did consumer design for the
> kind of Hong Kong-made $19.99 gimmicks ($6 FOB) priced to be purchased
> "on impulse" by Joe Consumer.
>
> For one of those gimmicks, I used a 4-bit micro running on 1.5V with 1K
> instruction ROM and 64 nibbles of RAM doing 32 KIPS (32768 instructions per
> second) to establish the speed of a tennis ball it was built into.
> It required floating-point calculations and display on a built-in
> LCD with 64 segments.
>
> This takes __clever__ __optimized__ code, and is at least a week of work,
> and affordable only in high-volume applications.
>
> Consider, one week of work for what you can do in C using GLIBC within
> 1 hour or less!
>
> Now, please consider a real-life (Linux) system, which you use every day:
> of course you could make every piece of code "better" by hand-coding and
> optimizing, but what is the real benefit?
>
> Assuming those millions of lines of C had been implemented in optimized
> assembly, would it perform that much faster (if that is what you call
> "better"), or would you use "half" the memory for the same job?
>
> Now, what about its stability and maintainability - not to mention COST,
> even if you could find all those great human coders?
>
> Guess the pioneering days are over ;)
What gives you the impression that anyone is going to rewrite Linux in asm?
I am _only_ saying that compiler-generated asm is not 'good'. It's mediocre.
Nothing more. I am not an asm zealot.
--
vda
On Fri, 5 September 2003 20:28:37 +0300, insecure wrote:
>
> What gives you the impression that anyone is going to rewrite Linux in asm?
> I am _only_ saying that compiler-generated asm is not 'good'. It's mediocre.
Depends. A couple of weeks back, I entered the Linuxtag coding
contest with a friend. The objective was to optimize some matrix
multiplication. We produced *much* faster code than anyone who
tried to do it in assembler, and we entered the contest during a slow
hour, when it was half over.
Given infinite developer time, you can create better assembler code
than the compiler can. But with limited time, it is a challenge.
Plus, the code isn't automagically updated for new cpus simply by
recompiling (see the sketch below).
So in the real world, compiler-generated assembler is not perfect, but
it is still faster than what most humans would come up with even if
they had the time.
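(A sketch of what "updated by recompiling" means in practice; these are
ordinary gcc 3.x flags, and matmul.c is a made-up file name:)

        $ gcc -O2 -march=athlon   -c matmul.c   # scheduled/tuned for Athlon
        $ gcc -O2 -march=pentium4 -c matmul.c   # same source, retuned for P4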
Jörn
--
Public Domain - Free as in Beer
General Public - Free as in Speech
BSD License - Free as in Enterprise
Shared Source - Free as in "Work will make you..."
Yann Droneaud wrote:
> Richard B. Johnson wrote:
>
>
>>GAS also has macro capability. It's just "strange" - backwards, etc. - but
>>it does everything MASM (/ducks/) can do. It takes some getting used to.
>>
>>If you decide to use gcc as a preprocessor, you can't use "#" comments,
>>NotGood(tm), because the "#" and some stuff after it gets "interpreted"
>>by cpp.
>>
>
>
> Yep, this is why arch/i386/boot/Makefile uses the -traditional flag.
Isn't this throwing out the baby with the bath water? It makes writing a
header that is used in both C and ASM all that much harder.
--
George Anzinger [email protected]
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
On Thu, Sep 04, 2003 at 10:57:12PM +0800, Michael Frank wrote:
> As to using assembler, it is better to get rid of it except in special cases.
> Today's compilers are the better coders in 98+% of applications
This has already been shot down pretty well, but a good example of why
it's not true is George Woltman's prime95 (mprime on Linux) program.
The core is an FFT which has been painfully rewritten several times for
different cores to maximize performance. It makes extensive use of
SSE2 on the P4, which I haven't seen compilers do much of. George is a
great coder and he's pipelined everything so well that there really
isn't much room for improvement. This assembly is several times faster
than what gcc would be able to generate.
On Fri, 05 Sep 2003 16:51:21 PDT, Aaron Lehmann said:
> On Thu, Sep 04, 2003 at 10:57:12PM +0800, Michael Frank wrote:
> > As to using assembler, it is better to get rid of it except in special cases.
> > Today's compilers are the better coders in 98+% of applications
> isn't much room for improvement. This assembly is several times faster
> than what gcc would be able to generate.
You're making the rash leap of logic that "gcc" is anywhere near representative
of what "today's compilers" are capable of. For example, IBM recently released
a new version of their 'xlc' compiler with support for the PPC970 core:
http://www.spscicomp.org/ScicomP7/Presentations/Blainey-SciComp7_compiler_update.pdf
20 pages of marketing fluff, interesting graphs on the last 2 pages. The upshot
is that for the SPECint2000 and SPECfp2000 benchmark suite, IBM's xlc was easily
able to generate code that ran almost twice as fast as gcc. It was particularly
impressive on the eon, apsi, and sixtrack tests.
Not 10% faster code. Not 20%. *TWICE* as fast.
It usually isn't hard to hand-write code that's 20% faster than gcc's generated
code. Anybody think they can consistently generate code by hand that's twice as fast?
/Valdis (who wonders if the kernel could be made xlc-compilable.. ;)
On Friday 05 September 2003 20:45, Jörn Engel wrote:
> On Fri, 5 September 2003 20:28:37 +0300, insecure wrote:
> > What gives you the impression that anyone is going to rewrite Linux in
> > asm? I am _only_ saying that compiler-generated asm is not 'good'. It's
> > mediocre.
>
> Depends. A couple of weeks back, I entered the Linuxtag coding
> contest with a friend. The objective was to optimize some matrix
> multiplication. We produced *much* faster code than anyone who
> tried to do it in assembler, and we entered the contest during a slow
> hour, when it was half over.
Can I see the code?
--
vda
Mehmet Ceyran wrote:
>>>Error-checkers like Lint, that use a specific language such
>>>as 'C', can provide the programmer with a false sense of
>>>security. You end up with 'perfect' code with all the
>>>unwanted return-values cast to "void", but the logic remains
>>>wrong and will fail once the high-bit in an integer is set.
>>>So, in some sense, writing procedures in assembly is
>>>"safer". You know what the code will do before you run it.
>>>If you don't, stay away from assembly.
>>
>>This is part of what makes someone a 'real' programmer, in my opinion.
>>In my experience, 'Unreal' programmers tend to excessively
>>re-use code from other applications they've written, and just
>>hack it about until it works, at times leaving in code for
>>features that are never used in the new context :-).
>
>
> Code re-use is not a bad thing in computer science, because it can save
> you much work. But it has to be done correctly. The best thing is to use
> so-called "design patterns": solutions to common problems that have been
> proven to work in many different environments. So if you solved some
> problem in your past programs (of course, having specified it well
> beforehand) and you can show that the solution is not specific to that
> particular program, then there's no need to reinvent the wheel. For
> example, that's why you use standard libraries for basic operations like
> output to the console.
>
> You're right in the part that one should not have to hack the re-used
> code until it works, because that leads to dirty coding.
>
> I'd also like to mention that algorithms implemented in high-level
> languages can be mathematically proven too, for example with the Hoare
> calculus, which provides basic axioms for handling sequences, loops
> and conditional statements.
>
> Mehmet
Mathematical proof holds only within the static, non-executing realm. Add in
the rest of the executing environment and you are out of luck. A
correctly written, logically correct program is _not_ guaranteed to
produce correct results.
Cheers,
Dave
On Fri, Sep 05, 2003 at 02:25:01PM +0200, Fruhwirth Clemens wrote:
> On Fri, Sep 05, 2003 at 01:25:24PM +0100, John Bradford wrote:
> > The point is, if somebody does find a bug they will want to
> > re-assemble with Gas after they've fixed it.
>
> If you referring to my precompiled masm binaries, yes, if one wants to
> change the source, getting masm is not nice.
>
> But if the source is written for nasm, nasm (LGPL) can be installed
> easily..
>
> However, the kernel folks seem to dislike depending on an additional tool.
> Actually, that's the answer to my original question. Now I just have to
> ponder whether I favour the preferences of the kernel over the prefs of
> user-space programs. There are lots of user-space crypto implementations,
> which are potential candidates.. and for these apps an additional
> dependency on nasm is no problem.
what is the problem with gas anyway? why not convert
the masterpiece to GNU Assembler? there even exists
some script to aid in masm to gas conversion ...
http://www.delorie.com/djgpp/faq/converting/asm2s-sed.html
best,
Herbert
> Regards, Clemens
insecure <[email protected]> writes:
> On Friday 05 September 2003 15:59, Michael Frank wrote:
> > Just got another reply to this thread which helps to explain what I meant
> > by "better coders in 98+% of applications"
> >
> > On Friday 05 September 2003 19:42, Jörn Engel wrote:
> > > How big is the .text of the asm and c variant? If the text of yours
> > > is much bigger, you just traded 2fish performance for general
> > > performance. Everything else will suffer from cache misses. Forget
> > > your microbenchmark, your variant will make the machine slower.
>
> A random example from one small unrelated program (gcc 3.2):
>
> main:
> pushl %ebp
> pushl %edi
> pushl %esi
> pushl %ebx
> subl $32, %esp
> xorl %ebp, %ebp
> cmpl $1, 52(%esp)
> movl $0, 20(%esp)
> movl $1000000, %edi <----
> movl $1000000, 16(%esp) <----
> movl $0, 12(%esp)
> movl $.LC27, 8(%esp)
> je .L274
> movl $1, %esi
> cmpl 52(%esp), %esi
> jge .L272
>
> No sane human will do that.
>
> main:
> pushl %ebp
> pushl %edi
> pushl %esi
> pushl %ebx
> subl $32, %esp
> xorl %ebp, %ebp
> cmpl $1, 52(%esp)
> movl $0, 20(%esp)
> movl $1000000, %edi
> movl %edi, 16(%esp) <-- save 4 bytes
> movl %ebp, 12(%esp) <-- save 4 bytes
> movl $.LC27, 8(%esp)
> je .L274
> movl $1, %esi
> cmpl 52(%esp), %esi
> jge .L272
>
> And this is only from a cursory examination.
Actually it is not as simple as that. With the instruction that uses
%edi following immediately after the instruction that populates it, you cannot
execute those two instructions in parallel. So the code may be slower. The
exact rules depend on the architecture of the cpu.
> What gives you an impression that anyone is going to rewrite linux in asm?
> I _only_ saying that compiler-generated asm is not 'good'. It's mediocre.
> Nothing more. I am not asm zealot.
I think I would agree with that statement: most compiler-generated assembly
code is mediocre in general. At the same time I would add that most
human-generated assembly is poor, and a pain to maintain.
If you concentrate on those handful of places where you need to
optimize, that is reasonable. Beyond that, there simply are not the
developer resources to do good assembly. And things like algorithmic
transformations in assembly are an absolute nightmare, whereas they are
quite simple in C.
And if the average generated code quality bothers you enough, with C
the compiler can be fixed, or another compiler can be written that
does a better job, and the benefit applies to a lot more code.
Eric
Eric W. Biederman wrote:
> Actually it is not as simple as that. With the instruction that uses
> %edi following immediately after the instruction that populates it, you cannot
> execute those two instructions in parallel. So the code may be slower. The
> exact rules depend on the architecture of the cpu.
I remember inserting a "nop" into a loop and it went significantly
faster on a Pentium Pro :)
> If you concentrate on those handful of places where you need to
> optimize that is reasonable. Beyond that there simply are not the
> developer resources to do good assembly. And things like algorithmic
> transformations in assembly are an absolute nightmare. Where they are
> quite simple in C.
If we had enough developer resources to write the whole thing in good
assembly, then for _sure_ we'd have enough to write a perfect compiler!
I would argue that the most powerful algorithmic transformations are a
nightmare in C, too. Less so, though.
-- Jamie
On Sun, Sep 07, 2003 at 12:08:00AM +0200, Herbert Poetzl wrote:
>
> what is the problem with gas anyway? why not convert
> the masterpiece to GNU Assembler? there even exists
> some script to aid in masm to gas conversion ...
>
> http://www.delorie.com/djgpp/faq/converting/asm2s-sed.html
Thanks. Already found, tried, does not work.
It doesn't even convert register names properly. I know enough of sed to fix
broken scripts resulting from incorrect line wrapping, but I'm certainly not
going to work on this.
I started to work on converting it to gas, but I stopped after the first
hour. It's just too much work to be fun. I won't convert it.
Regards, Clemens
Eric W. Biederman wrote:
> insecure <[email protected]> writes:
>> movl $0, 20(%esp)
>> movl $1000000, %edi <----
>> movl $1000000, 16(%esp) <----
>> movl $0, 12(%esp)
>>
>>No sane human will do that.
>>main:
>> movl $1000000, %edi
>> movl %edi, 16(%esp) <-- save 4 bytes
>> movl %ebp, 12(%esp) <-- save 4 bytes
>> movl $.LC27, 8(%esp)
>>
>>And this is only from a cursory examination.
>
> Actually it is not as simple as that. With the instruction that uses
> %edi following immediately after the instruction that populates it, you cannot
> execute those two instructions in parallel. So the code may be slower. The
> exact rules depend on the architecture of the cpu.
>
It will depend on the CPU architecture only if you have unlimited i$ size.
Servers with 8MB of cache - yes, it is faster.
Celeron with 128k of cache - +4 bytes == higher probability of an i$ miss
== lower performance.
>
>>What gives you the impression that anyone is going to rewrite Linux in asm?
>>I am _only_ saying that compiler-generated asm is not 'good'. It's mediocre.
>>Nothing more. I am not an asm zealot.
>
>
> I think I would agree with that statement: most compiler-generated assembly
> code is mediocre in general. At the same time I would add that most
> human-generated assembly is poor, and a pain to maintain.
>
> If you concentrate on those handful of places where you need to
> optimize, that is reasonable. Beyond that, there simply are not the
> developer resources to do good assembly. And things like algorithmic
> transformations in assembly are an absolute nightmare, whereas they are
> quite simple in C.
>
> And if the average generated code quality bothers you enough, with C
> the compiler can be fixed, or another compiler can be written that
> does a better job, and the benefit applies to a lot more code.
>
e.g. the C-- project: something like C, where you can operate with
registers just like any other variables. Under DOS it produced .com
files without any overhead: a program with only 'int main() { return 0; }'
was optimized to a one-byte 'ret' ;-) But sure, it was not a complete C
implementation.
Sure, I would prefer to have nasm used for kernel asm parts - but
obviously gas has already become the standard.
P.S. And having a good macro processor for assembler is a must: CPP is
terribly stupid by design. I believe gas has no preprocessor comparable
to masm's? I bet they are using C's cpp. This is degradation: macros
are the major feature of any translator I have worked with. They can save
you a lot of time and make code much cleaner/more readable/maintainable.
CPP is just too dumb for asm...
Good old times, when people were responsible for _every_ byte of their
programs... Yeah... Memory/programmers are cheap nowadays...
Fruhwirth Clemens wrote:
>
> I started to work on converting it to gas, but I stopped after the first
> hour. It's just too much work to be fun. I won't convert it.
>
> Regards, Clemens
% objdump -d <the masm-built binary> | less
Probably this way it could be easier/funnier ;-)
On Mon, 8 Sep 2003, Ihar 'Philips' Filipau wrote:
> Eric W. Biederman wrote:
> > insecure <[email protected]> writes:
> >> movl $0, 20(%esp)
> >> movl $1000000, %edi <----
> >> movl $1000000, 16(%esp) <----
> >> movl $0, 12(%esp)
> >>
> >>No sane human will do that.
> >>main:
> >> movl $1000000, %edi
> >> movl %edi, 16(%esp) <-- save 4 bytes
> >> movl %ebp, 12(%esp) <-- save 4 bytes
> >> movl $.LC27, 8(%esp)
> >>
> >>And this is only from a cursory examination.
> >
> > Actually it is not as simple as that. With the instruction that uses
> > %edi following immediately after the instruction that populates it, you
> > cannot
> > execute those two instructions in parallel.
With a single-CPU ix86, the only instructions that operate in
parallel are the instructions that calculate the next address, and
this only if you use 'leal'. However, there is an instruction
pipe-line, so many memory accesses may seem to be unrelated to the
current execution context and are therefore assumed to be 'parallel'.
> > So the code may be slower. The
> > exact rules depend on the architecture of the cpu.
> >
>
> It will depend on the CPU architecture only if you have unlimited i$ size.
> Servers with 8MB of cache - yes, it is faster.
> Celeron with 128k of cache - +4 bytes == higher probability of an i$ miss
> == lower performance.
>
> >
> >>What gives you the impression that anyone is going to rewrite Linux in asm?
> >>I am _only_ saying that compiler-generated asm is not 'good'. It's mediocre.
> >>Nothing more. I am not an asm zealot.
> >
> >
> > I think I would agree with that statement: most compiler-generated assembly
> > code is mediocre in general. At the same time I would add that most
> > human-generated assembly is poor, and a pain to maintain.
> >
The compiler-generated assembly is, by design, "universal" so that
any legal 'C' statement may follow any other legal 'C' statement.
This means that, at each sequence-point, the assembly generation
is complete. This results in a lot of code duplication, etc. A
really good optimizer could perform a fix-up that, based upon
the current 'C' code context, removes a lot of redundancy. Currently,
some such optimization is done by gcc, such as loop-unrolling, etc.
A really good project would be an assembly-optimizer operated
like:
gcc -O2 -S -o - prog.c | optimizer | as -o prog.o -
Just make that optimizer and away you go! I hate parsers and
other text-based stuff, so I'm not a candidate to make one of
these things.
> > If you concentrate on those handful of places where you need to
> > optimize, that is reasonable. Beyond that, there simply are not the
> > developer resources to do good assembly. And things like algorithmic
> > transformations in assembly are an absolute nightmare, whereas they are
> > quite simple in C.
> >
> > And if the average generated code quality bothers you enough, with C
> > the compiler can be fixed, or another compiler can be written that
> > does a better job, and the benefit applies to a lot more code.
> >
>
> e.g. the C-- project: something like C, where you can operate with
> registers just like any other variables. Under DOS it produced .com
> files without any overhead: a program with only 'int main() { return 0; }'
> was optimized to a one-byte 'ret' ;-) But sure, it was not a complete C
> implementation.
>
> Sure, I would prefer to have nasm used for kernel asm parts - but
> obviously gas has already become the standard.
>
> P.S. And having a good macro processor for assembler is a must: CPP is
> terribly stupid by design. I believe gas has no preprocessor comparable
> to masm's? I bet they are using C's cpp. This is degradation: macros
> are the major feature of any translator I have worked with. They can save
> you a lot of time and make code much cleaner/more readable/maintainable.
> CPP is just too dumb for asm...
> Good old times, when people were responsible for _every_ byte of their
> programs... Yeah... Memory/programmers are cheap nowadays...
This is for information only. I certainly don't advocate
writing everything in assembly language.
Attached is a tar file containing source and a Makefile.
It generates two tiny programs, "hello" and "world".
Both write "Hello world!" to standard-output. One is
written in assembly and the other is written in 'C'.
The one written in 'C' uses your installed shared
runtime library as is normal for such programs. Even
then, it is 2,948 bytes in length. The one written
in assembly results in a complete executable that
doesn't require any runtime support, i.e., static.
It is only 456 bytes in length.
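(The tar attachment isn't reproduced here; a minimal sketch of what the two
sources typically look like, assuming the usual i386 Linux int $0x80 syscall
interface, would be:)

        /* hello.c - the ordinary libc version */
        #include <stdio.h>

        int main(void)
        {
                printf("Hello world!\n");
                return 0;
        }

        # world.S - no libc at all, talks to the kernel directly
                .data
        msg:    .ascii  "Hello world!\n"
        len =   . - msg

                .text
                .globl  _start
        _start:
                movl    $4, %eax        # __NR_write
                movl    $1, %ebx        # fd 1 = stdout
                movl    $msg, %ecx
                movl    $len, %edx
                int     $0x80
                movl    $1, %eax        # __NR_exit
                xorl    %ebx, %ebx
                int     $0x80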
gcc -Wall -O4 -o hello hello.c
strip hello
as -o world.o world.S
ld -o world world.o
strip world
ls -la hello world
-rwxr-xr-x 1 root root 2948 Sep 8 08:34 hello
-rwxr-xr-x 1 root root 456 Sep 8 08:34 world
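(The tar attachment isn't reproduced here. A minimal static "world.S"
along these lines - my sketch, assuming plain Linux/i386 int $0x80
system calls rather than Dick's actual source - would be:

.data
msg:    .ascii  "Hello world!\n"
len =   . - msg

.text
.globl _start
_start: movl    $4, %eax        # sys_write
        movl    $1, %ebx        # fd 1, standard-output
        movl    $msg, %ecx      # buffer
        movl    $len, %edx      # byte count
        int     $0x80
        movl    $1, %eax        # sys_exit
        xorl    %ebx, %ebx      # exit status 0
        int     $0x80

Assembled and linked exactly as in the commands above, it needs no
runtime library at all.)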
The point is that if you really need to save some application
size, in many cases you can do the work in assembly. It is
a very useful tool. Also, if you have critical sections of
code you need to pipe-line for speed, you can do it in assembly
and make sure the optimization doesn't disappear the next
time somebody updates (improves) your tools. What you write
in assembly is what you get.
I don't like "in-line" assembly. Sometimes you don't have
much choice because you can't call some assembly-language
function to perform the work. However, when you can afford
the overhead of calling a function written in assembly, the
following applies.
Assume you have:
extern int funct(int one, int two, int three);
Your assembly would obtain parameters as:
one = 0x04
two = 0x08
three = 0x0c
funct: movl one(%esp), %eax # Get first passed parameter
movl two(%esp), %ebx # Get second parameter
movl three(%esp), %ecx # Get third parameter
...etc
Now, gcc requires that your function not destroy any index
registers, %ebp, or any segment registers so, in the case
above, we need to save %ebx (an index register) before we
modify its value. To do this, we push it onto the stack.
This will alter the stack offsets where we obtain our input
parameters.
one = 0x08
two = 0x0c
three = 0x10
funct: pushl %ebx # Save index register
movl one(%esp), %eax # Get first passed parameter
movl two(%esp), %ebx # Get second parameter
movl three(%esp), %ecx # Get third parameter
...etc
popl %ebx # Restore index register
So, we could define a macro that allows us to adjust the offsets
based upon the number of registers saved; a rough sketch of one
follows.
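(My sketch, untested, using GAS's built-in .macro facility and
assuming 4-byte argument slots:

# GETARG fetches parameter \n into \reg; \saved is the number of
# registers pushed since function entry (each shifts %esp by 4).
.macro GETARG n, reg, saved=0
        movl (4 * (\n + \saved))(%esp), \reg
.endm

funct:  pushl %ebx              # save index register
        GETARG 1, %eax, 1       # get first passed parameter
        GETARG 2, %ebx, 1       # get second parameter
        GETARG 3, %ecx, 1       # get third parameter
        # ... function body ...
        popl  %ebx              # restore index register
        ret

The same invocations work unchanged if more registers are pushed;
only the 'saved' count changes.)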
In almost all cases, any value returned from the function is returned
in the %eax register. If you need to return a 'long long', both
%edx and %eax are used. Some functions may return values in the
floating-point unit so, when replacing existing 'C' code, you
need to see what the convention was.
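(For example - a hypothetical function of my own, not from the
original mail - 'extern long long make64(int hi, int lo);' returns
the low half in %eax and the high half in %edx:

make64: movl 8(%esp), %eax      # low 32 bits, from 'lo'
        movl 4(%esp), %edx      # high 32 bits, from 'hi'
        ret
)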
When I write assembly-language functions I usually do it to
replace 'C' functions that (usually) somebody else has written.
Those 'C' functions are known to work. In other words, they
perform the correct mathematics. However, they need to be
sped up, or they need to be pared down to a more reasonable
size to fit in some embedded system.
Recently we had a function that calculated the RMS value of
an array of floating-point (double) numbers. With a particular
array size, the time necessary was something like 300 milliseconds.
By rewriting it in assembly, and using the knowledge that the
array will never be less than 512 doubles in length, and always
a power of two, the execution time went down to 40 milliseconds.
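(The actual function isn't shown; here is a sketch of the general
idea - my reconstruction, hypothetical signature, untested - which
unrolls the loop four-fold because the length is known to be a large
power of two:

# double rms(const double *x, unsigned long n);  n >= 512, power of two
rms:    movl  4(%esp), %edx     # x
        movl  8(%esp), %ecx     # n
        fldz                    # running sum of squares
        shrl  $2, %ecx          # four doubles per pass, no remainder
1:      fldl  (%edx)
        fmul  %st(0), %st(0)    # square
        faddp                   # accumulate
        fldl  8(%edx)
        fmul  %st(0), %st(0)
        faddp
        fldl  16(%edx)
        fmul  %st(0), %st(0)
        faddp
        fldl  24(%edx)
        fmul  %st(0), %st(0)
        faddp
        addl  $32, %edx
        decl  %ecx
        jnz   1b
        fidivl 8(%esp)          # sum / n (n read as a 32-bit integer)
        fsqrt                   # result left in %st(0), the C convention
        ret
)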
Also, you can't "cheat" with an FP unit. There are always memory
accesses that eat valuable CPU time. You can't keep temporary float
values in registers.
I strongly suggest that if you have an interest in assembly, you
cultivate that interest. Soon almost all mundane coding will be
performed by machine from a specification written by "Sales".
The only "real" programming will be done by those who can make
the interface between the hardware and the "coding machine". That's
assembly!
Cheers,
Dick Johnson
Penguin : Linux version 2.4.22 on an i686 machine (794.73 BogoMips).
Note 96.31% of all statistics are fiction.
Richard B. Johnson wrote:
> > > Actually it is not as simple as that. With the instruction that uses
> > > %edi following immediately after the instruction that populates it,
> > > you cannot execute those two instructions in parallel.
>
> With a single-CPU ix86, the only instructions that operate in
> parallel are the instructions that calculate the next address, and
> this only if you use 'leal'. However, there is an instruction
> pipe-line, so many memory accesses may seem to be unrelated to the
> current execution context and are therefore assumed to be 'parallel'.
That was true on the 486. The Pentium famously executed one or two
instructions per cycle, depending on whether they are "pairable". The
Pentium Pro and later can issue up to 3 instructions per cycle,
depending on the instruction types. If they are the right
instructions, it will sustain that rate over multiple cycles.
Nowadays all the major x86 CPUs issue multiple instructions per clock cycle.
-- Jamie
Ihar 'Philips' Filipau wrote:
> It will depend only on the CPU architecture if you have an unlimited i$
> size. Servers with 8MB of cache - yes, it is faster.
> Celeron with 128k of cache - +4 bytes == higher probability of an i$ miss
> == lower performance.
Higher probability != optimal performance.
It depends on your execution context. If it's part of a tight loop
which is executed often, then saving a cycle in the loop gains more
performance than saving icache, even on a 128k Celeron.
The execution context can depend on the input to the program, in which
case the faster of the two code sequences can depend on the program's
input too. Then, for optimal performance, you need to profile the
"expected" inputs.
> P.S. And having a good macro processor for an assembler is a must: CPP is
> terribly stupid by design. I believe gas has no preprocessor comparable
> to masm's? I bet they are using C's cpp.
You obviously have not read the GAS documentation.
It has quite a good macro facility built in.
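(For instance - my sketch, not taken from the GAS manual - .macro
combined with .irp gives you variadic expansion:

# Push an arbitrary list of registers with one macro call.
.macro SAVE regs:vararg
        .irp r, \regs
        pushl %\r
        .endr
.endm

        SAVE eax, ebx, ecx      # expands to three pushl instructions
)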
-- Jamie
Jamie Lokier wrote:
> Ihar 'Philips' Filipau wrote:
>
>> It will depend only on the CPU architecture if you have an unlimited i$
>> size. Servers with 8MB of cache - yes, it is faster.
>> Celeron with 128k of cache - +4 bytes == higher probability of an i$ miss
>> == lower performance.
>
> Higher probability != optimal performance.
>
> It depends on your execution context. If it's part of a tight loop
> which is executed often, then saving a cycle in the loop gains more
> performance than saving icache, even on a 128k Celeron.
>
You think like a system programmer.
Every bit of i$ waste hits user-space applications too.
The 128k of $ is shared by every app.
If you gained one cycle by polluting one more cache line - do not
forget that this cache line probably contained some info which could
have avoided a cache miss for another application. So you gained a
cycle here - and lost it immediately in another app. Not good.
If you can improve performance by NOT polluting the cache - that would
be another story :-)))
> The execution context can depend on the input to the program, in which
> case the faster of the two code sequences can depend on the program's
> input too. Then, for optimal performance, you need to profile the
> "expected" inputs.
>
>
> You obviously have not read the GAS documentation.
>
> It has quite a good macro facility built in.
>
Indeed. RTFM quickly showed some good examples.
But still, I have never seen this kind of thing being used in the
kernel. Instead of writing normal asm we have something like
i386/mmx.c. And i386/checksum.S is not the best example of asm in the
kernel either. Sad.
--
Ihar 'Philips' Filipau / with best regards from Saarbruecken.
- - - - - - - - - - - - - - - - - - - -
* Please avoid sending me Word/PowerPoint/Excel attachments.
* See http://www.fsf.org/philosophy/no-word-attachments.html
- - - - - - - - - - - - - - - - - - - -
There should be some SCO's source code in Linux -
my servers sometimes are crashing. -- People
Ihar 'Philips' Filipau wrote:
> Jamie Lokier wrote:
> >Ihar 'Philips' Filipau wrote:
> >
> >> It will depend only on the CPU architecture if you have an unlimited i$
> >> size. Servers with 8MB of cache - yes, it is faster.
> >> Celeron with 128k of cache - +4 bytes == higher probability of an i$
> >> miss == lower performance.
> >
> >Higher probability != optimal performance.
> >
> >It depends on your execution context. If it's part of a tight loop
> >which is executed often, then saving a cycle in the loop gains more
> >performance than saving icache, even on a 128k Celeron.
> >
>
> You think like a system programmer.
> Every bit of i$ waste hits user-space applications too.
> The 128k of $ is shared by every app.
>
> If you gained one cycle by polluting one more cache line - do not
> forget that this cache line probably contained some info which could
> have avoided a cache miss for another application. So you gained a
> cycle here - and lost it immediately in another app. Not good.
Usually the whole L1 cache is flushed between application context
switches anyway, so the cost of a miss is borne by the application
which causes it.
And that _still_ doesn't change the truth of my statement. One cycle
saved in a loop which executes 1000 times is worth more than an L1
i-cache miss, always.
> If you can improve performance by NOT polluting the cache - that would
> be another story :-)))
Yes, of course that is better when it is possible.
Modern OOO CPUs are subtle beasts. Like I said, I added a single
"nop" (one-byte instruction) to a tight graphics loop once, and the
loop went significantly faster. I could not explain it, except that I
know the Pentium Pro instruction decode stage has many quirks.
> But still, I have never seen this kind of thing being used in the
> kernel. Instead of writing normal asm we have something like
> i386/mmx.c. And i386/checksum.S is not the best example of asm in the
> kernel either. Sad.
Those were written before GAS had a macro facility. I agree with you,
it should be used more in the kernel.
The two examples you gave have been carefully tuned on particular
CPUs, by trial and error. Changing the instruction order makes a big
difference to their performance.
-- Jamie
On Mon, Sep 08, 2003 at 02:03:21PM +0200, Ihar 'Philips' Filipau wrote:
> e.g. the C-- project: something like C, where you can operate on
> registers just like other variables. Under DOS it produced .com
> files without any overhead: a program with only 'int main() { return 0; }'
> was optimized to the one-byte 'ret' ;-) But of course it was not a
> complete C implementation.
There is already a C-- project and it is unrelated to your suggestion.
c.f. http://cminusminus.org/
-- wli
William Lee Irwin III wrote:
> On Mon, Sep 08, 2003 at 02:03:21PM +0200, Ihar 'Philips' Filipau wrote:
>
>> e.g. the C-- project: something like C, where you can operate on
>> registers just like other variables. Under DOS it produced .com
>> files without any overhead: a program with only 'int main() { return 0; }'
>> was optimized to the one-byte 'ret' ;-) But of course it was not a
>> complete C implementation.
>
>
> There is already a C-- project and it is unrelated to your suggestion.
>
> c.f. http://cminusminus.org/
>
Actually, I have tried to find the project I knew a long time ago.
But let's say it was the pre-Internet era.
I believe C-- was distributed as shareware. I cannot be sure about
the licensing - I come from the USSR ;-)
[ ...removing the dust of time from my archives... ]
[ ...@#$%^&* it is archived with arj - VMware... ]
[ Wow - it has name! Sphinx C-- - and Google has some docs on-line
http://www.goosee.com/cmm/c--doc.htm (go directly to
http://www.goosee.com/cmm/c--doc.htm#Expressions) ]
[ Home page is http://www.goosee.com/cmm/ ]
Sample from my archive follows.
Note the comment "RUN FILE SIZE" - 457 bytes. Far from ideal, but
already 'very good'.
/*
NAME: FIRE.C--
INFO: Written by Midnight.
Conversion to C-- (and a little palette messing) by SPHINX.
DESCRIPTION: This program displays a fire like simulation on a
VGA 320x200 256 colour screen.
RUN FILE SIZE: 457 bytes.
*/
?use80386
?resize FALSE
?assumeDSSS TRUE
?include "VIDEO.H--"
?include "VGA.H--"
?include "RANDOM.H--"
?include "KEYCODES.H--"
byte palette[768];
// byte pic = FROM "c--.cut";
word F;
word LowLimit;
void SetCols ()
byte N;
{
/*
BX = 0;
do {
SI = BX+BX+BX;
palette[SI] = BL >> 2;
palette[SI+1] = BL >> 3;
palette[SI+2] = 0;
BX++;
} while( BX < 256 );
*/
palette[0]=0;
palette[1]=0;
palette[2]=0;
BX = 0;
do {
DI = BX+BX+BX;
AX = BX * 64 / 85;
DL = AL;
palette[DI+3] = DL;
palette[DI+3+1] = 0;
palette[DI+3+2] = 0;
palette[DI+3+85+85+85] = 63;
palette[DI+3+85+85+85+1] = DL;
palette[DI+3+85+85+85+2] = 0;
palette[DI+3+85+85+85+85+85+85] = 63;
palette[DI+3+85+85+85+85+85+85+1] = 63;
palette[DI+3+85+85+85+85+85+85+2] = DL;
BX++;
} while( BX < 85 );
SETVGAPALETTE( ,0,256,#palette);
}
void AddFire (word N)
{
loop(N)
{
AX = RAND()%10+1;
SI = AX+AX;
AX = RAND()%298 + 64010;
DI = AX;
AL = RAND()&127 + 128;
loop(SI)
{
ESBYTE[DI] = AL;
ESBYTE[DI+640] = AL;
DI++;
}
}
}
void CopyAvg ()
{
DI = 1280;
do {
BX = 2;
do {
AX = ESBYTE[DI+BX-2] + ESBYTE[DI+BX+2] + ESBYTE[DI+BX] +
ESBYTE[DI+BX+640] / 4;
AX = AX + ESBYTE[DI+BX-640] >> 1;
IF( AL > 128 )
AL-=2;
ELSE IF( AL > 3)
AL -=4;
ELSE
AL = 0;
AH = AL;
ESWORD[DI+BX-1280] = AX;
ESWORD[DI+BX-1280+320] = AX;
BX += 2;
} while( BX < 318 );
DI += 640;
} while( DI < 320*204 );
}
void main ()
{
@ SETVIDEOMODE(byte vid_320x200_256);
SetCols();
ES = 0xA000;
F = 0;
do {
AddFire(F / 32 + 1);
CopyAvg();
IF( F < 512 )
F++;
LowLimit = F >> 2 + 128 >> 2;
// IF( F % 80 == 0 )
// overimage19(80,90,#pic,0);
} while( BIOSKEYCHECK() == 0 );
do {BIOSREADKEY();
} while( BIOSKEYCHECK() != 0 );
@ SETVIDEOMODE(byte vid_text80c);
}
/* end of FIRE.C-- */
Hi!
> A random example from one small unrelated program (gcc 3.2):
>
> main:
> pushl %ebp
> pushl %edi
> pushl %esi
> pushl %ebx
> subl $32, %esp
> xorl %ebp, %ebp
> cmpl $1, 52(%esp)
> movl $0, 20(%esp)
> movl $1000000, %edi <----
> movl $1000000, 16(%esp) <----
> movl $0, 12(%esp)
> movl $.LC27, 8(%esp)
> je .L274
> movl $1, %esi
> cmpl 52(%esp), %esi
> jge .L272
>
> No sane human will do that.
>
> main:
> pushl %ebp
> pushl %edi
> pushl %esi
> pushl %ebx
> subl $32, %esp
> xorl %ebp, %ebp
> cmpl $1, 52(%esp)
> movl $0, 20(%esp)
> movl $1000000, %edi
> movl %edi, 16(%esp) <-- save 4 bytes
> movl %ebp, 12(%esp) <-- save 4 bytes
> movl $.LC27, 8(%esp)
Hmm, but the gcc version is likely faster. No sane person would write
a multiply by 5 using a single lea instruction, yet gcc will do that...
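(The trick in question, for the record:

        leal (%eax,%eax,4), %eax        # %eax + 4*%eax = 5*%eax
)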
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...
On Sunday 07 September 2003 21:49, Eric W. Biederman wrote:
> insecure <[email protected]> writes:
> > On Friday 05 September 2003 15:59, Michael Frank wrote:
> > > Just got another reply to this thread which helps to explain what I
> > > meant by "better coders in 98+% of applications"
> > >
> > > > On Friday 05 September 2003 19:42, Jörn Engel wrote:
> > > > How big is the .text of the asm and c variant? If the text of yours
> > > > is much bigger, you just traded 2fish performance for general
> > > > performance. Everything else will suffer from cache misses. Forget
> > > > your microbenchmark, your variant will make the machine slower.
> >
> > A random example from one small unrelated program (gcc 3.2):
> >
> > main:
> > pushl %ebp
> > pushl %edi
> > pushl %esi
> > pushl %ebx
> > subl $32, %esp
> > xorl %ebp, %ebp
> > cmpl $1, 52(%esp)
> > movl $0, 20(%esp)
> > movl $1000000, %edi <----
> > movl $1000000, 16(%esp) <----
> > movl $0, 12(%esp)
> > movl $.LC27, 8(%esp)
> > je .L274
> > movl $1, %esi
> > cmpl 52(%esp), %esi
> > jge .L272
> >
> > No sane human will do that.
> >
> >
> > main:
> > pushl %ebp
> > pushl %edi
> > pushl %esi
> > pushl %ebx
> > subl $32, %esp
> > xorl %ebp, %ebp
> > cmpl $1, 52(%esp)
> > movl $0, 20(%esp)
> > movl $1000000, %edi
> > movl %edi, 16(%esp) <-- save 4 bytes
> > movl %ebp, 12(%esp) <-- save 4 bytes
> > movl $.LC27, 8(%esp)
> > je .L274
> > movl $1, %esi
> > cmpl 52(%esp), %esi
> > jge .L272
> >
> > And this is only from a cursory examination.
>
> Actually it is not as simple as that. With the instruction that uses
> %edi following immediately after the instruction that populates it, you
> cannot execute those two instructions in parallel. So the code may be
> slower. The exact rules depend on the architecture of the cpu.
That instruction is in main()'s initialization sequence, i.e.
it is executed once per program invocation.
Summary: we lost 8 bytes for no gain. There's not even a speed gain -
we lost 8 bytes of _icache_, and that will bite us somewhere else.
> > What gives you the impression that anyone is going to rewrite Linux in
> > asm? I am _only_ saying that compiler-generated asm is not 'good'. It's
> > mediocre. Nothing more. I am not an asm zealot.
>
> I think I would agree with that statement: most compiler-generated
> assembly code is mediocre in general. At the same time I would add that
> most human-generated assembly is poor, and a pain to maintain.
I had the impression people think gcc generates code which
is 'mostly good' even compared to handwritten code.
That is not true (yet).
--
vda
On Sunday 07 September 2003 22:30, Jamie Lokier wrote:
> Eric W. Biederman wrote:
> > Actually it is not as simple as that. With the instruction that uses
> > %edi following immediately after the instruction that populates it, you
> > cannot execute those two instructions in parallel. So the code may be
> > slower. The exact rules depend on the architecture of the cpu.
>
> I remember inserting a "nop" into a loop and it went significantly
> faster on a Pentium Pro :)
My example is _not_ a loop, far from it. That's the point.
GCC thinks everything is a loop.
> > If you concentrate on those handful of places where you need to
> > optimize that is reasonable. Beyond that there simply are not the
> > developer resources to do good assembly. And things like algorithmic
> > transformations in assembly are an absolute nightmare. Where they are
> > quite simple in C.
>
> If we had enough developer resources to write the whole thing in good
> assembly, then for _sure_ we'd have enough to write a perfect compiler!
Peace, Jamie. I do _not_ advocate using asm anywhere except
speed-critical code.
--
vda
On Wed, 10 Sep 2003 00:34:57 +0300, insecure wrote:
> That instruction is in main()'s initialization sequence, i.e. it is
> executed once per program invocation. Summary: we lost 8 bytes for no
> gain. There's not even a speed gain - we lost 8 bytes of _icache_, and
> that will bite us somewhere else.
You're quite right, but the I-cache is a non-issue: this code will be
evicted when there is a need to cache something else. And because it's
only run once at the beginning of the program, it won't cause anything
important to be evicted. You can complain about the time it takes to
fetch the code from RAM, though.
Quoting another post from you: "I do _not_ advocate using asm anywhere
except speed-critical code."
This code is obviously not critical. So it makes a bad choice for
discussion.
--
Ricardo
insecure <[email protected]> writes:
> On Sunday 07 September 2003 21:49, Eric W. Biederman wrote:
> > insecure <[email protected]> writes:
> > > On Friday 05 September 2003 15:59, Michael Frank wrote:
> > > What gives you the impression that anyone is going to rewrite Linux
> > > in asm? I am _only_ saying that compiler-generated asm is not 'good'.
> > > It's mediocre. Nothing more. I am not an asm zealot.
> >
> > I think I would agree with that statement: most compiler-generated
> > assembly code is mediocre in general. At the same time I would add that
> > most human-generated assembly is poor, and a pain to maintain.
>
> I had the impression people think gcc generates code which
> is 'mostly good' even compared to handwritten code.
> That is not true (yet).
It is true. Not when compared to hand-optimized code, but compared
to the day-to-day churn it is true. Although I still tend to prefer
gcc 2.95 for the code size.
Eric
Eric W. Biederman wrote:
> Although I still tend to prefer gcc 2.95 for the code size.
I just compiled a small C function with GCC 3.2.2.
With -O2, it had two completely redundant stack adjustment instructions.
With -Os, those instructions were gone and it was good code.
Why, oh why is -O2 still so lame after all these years? :)
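(Schematically - my reconstruction, not the actual function - the -O2
output looked like:

f:      subl  $8, %esp          # reserves stack space it never uses
        movl  12(%esp), %eax
        incl  %eax
        addl  $8, %esp          # and releases it again
        ret

while -Os simply dropped both adjustments:

f:      movl  4(%esp), %eax
        incl  %eax
        ret
)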
-- Jamie
On Thursday 11 September 2003 14:07, Ricardo Bugalho wrote:
> On Wed, 10 Sep 2003 00:34:57 +0300, insecure wrote:
> > That instruction is in main()'s initialization sequence, i.e. it is
> > executed once per program invocation. Summary: we lost 8 bytes for no
> > gain. There's not even a speed gain - we lost 8 bytes of _icache_, and
> > that will bite us somewhere else.
>
> You're quite right, but the I-cache is a non-issue: this code will be
Please disable icache on your CPU ;)
> evicted when there is a need to cache something else. And because it's
> only run once at the beginning of the program, it won't cause anything
> important to be evicted.
How can you know that it won't evict useful code?
> You can complain about the time it takes to fetch the code from
> RAM, though.
Thanks for the tip. I missed that!
> Quoting another post from you: "I do _not_ advocate using asm anywhere
> except speed-critical code."
> This code is obviously not critical. So it makes a bad choice for
> discussion.
It makes a perfectly fine point that gcc's code is not good.
It just wasted 8 bytes in a rather simple code sequence.
--
vda
On Fri, 2003-09-12 at 16:26, insecure wrote:
> > You're quite right, but the I-cache is a non-issue: this code will be
> Please disable icache on your CPU ;)
[snip]
> How can you know that it won't evict useful code?
a) the code is at the beginning of the program
b) it's only run once
Therefore, its impact on the i-cache is a non-issue. I wasn't discussing
the merits of an i-cache.
> > You can complain about the time it takes to fetch the code from
> > RAM, though.
>
> Thanks for the tip. I missed that!
Welcome.
> It makes a perfectly fine point that gcc's code is not good.
> It just wasted 8 bytes in a rather simple code sequence.
First of all, it would have been nice to see the relevant code and
optimization flags.
I'll assume the goal is speed, not code size. Let's look at that piece
of code:
a) it's only run once, so it doesn't benefit from caching
b) there are no loads there, so it can't be prefetched while the CPU
waits for loads
And that's what would make its performance benefit from smaller code
size. But it's also irrelevant to global performance. And I can't even
imagine any code that meets a) and b) and is relevant for performance.
In modern, general-purpose computer systems, code size is irrelevant.
It has been for 15 years and it's not going to change.
And I'm not going to complain about the compiler making a bad decision
in such an irrelevant case, one that is not worth rewriting in
assembly.
--
Ricardo
On Fri, 12 September 2003 18:27:29 +0100, Ricardo Bugalho wrote:
> On Fri, 2003-09-12 at 16:26, insecure wrote:
>
> > How can you know that it won't evict useful code?
>
> a) the code is at the beginning of the program
> b) it's only run once
>
> Therefore, its impact on the i-cache is a non-issue.
> > > You can complain about the time it takes to fetch the code from
> > > RAM, though.
Non-issue, eh? ;)
> I'll assume the goal is speed, not code size. Let's look at that piece of
> code:
> a) it's only run once, so it doesn't benefit from caching
While it may not benefit from caching, cache misses still hurt it.
And as it is only run once, there are only cache misses (ignoring the
cacheline effect). Shorter code is faster code.
> In modern, general-purpose computer systems, code size is irrelevant.
> It has been for 15 years and it's not going to change.
- How long does your 50MB word processor take to load?
- Why do dietlibc and friends speed up server workloads?
- Why has Alan measured faster kernels with -Os than with -O2?
Code size *does* matter.
Jörn
--
Victory in war is not repetitious.
-- Sun Tzu
Jörn Engel wrote:
> - Why has Alan measured faster kernels with -Os than with -O2?
>
> Code size *does* matter.
That's not just i-cache pressure. It is partly a GCC problem, and
it's possible -Os would run faster than -O2 even with no i-cache.
I've observed -Os emitting exactly the same code as -O2 for some
trivial functions, except that -O2 has a few extra redundant
instructions.
Obviously, the _intent_ of -O2 is to compile for speed, but it's clear
that GCC often emits trivially redundant instructions (like stack
adjustments) that don't serve to speed up the program at all.
-- Jamie
On Sat, 13 September 2003 20:25:39 +0100, Jamie Lokier wrote:
> Jörn Engel wrote:
> > - Why has Alan measured faster kernels with -Os than with -O2?
> >
> > Code size *does* matter.
>
> That's not just i-cache pressure. It is partly a GCC problem, and
> it's possible -Os would run faster than -O2 even with no i-cache.
>
> I've observed -Os emitting exactly the same code as -O2 for some
> trivial functions, except that -O2 has a few extra redundant
> instructions.
>
> Obviously, the _intent_ of -O2 is to compile for speed, but it's clear
> that GCC often emits trivially redundant instructions (like stack
> adjustments) that don't serve to speed up the program at all.
I haven't collected too many numbers, but the few I did collect show
-O2 code actually being faster than -Os, as long as you stay in
userspace and the code is small and loopy. It may get worse for large
run-once code, but I don't have numbers for that.
My explanation for Alan's results is the nature of kernel code.
Usually, kernel code execution takes only a fraction of the cpu time,
so the user code run in between effectively flushes the cache. Each
system call causes near 100% cache misses, so smaller code is almost
always faster.
So even if your observations were wrong and gcc created perfect
code for both -O2 and -Os, I wouldn't expect -O2 to be faster for
kernel code.
Jörn
--
More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity.
-- W. A. Wulf