Vojtech Pavlik wrote:
> Could you run a test just for me? Take vanilla 2.4.21 and then
> make oldconfig; make dep; time make bzImage
> That's basically what I want to know how long will take, since
> it's one of the most common time consuming tasks the thing will
> have to handle.
Done! Here are the results:
Desktop - Pentium III 1 GHz 754 MB -> 10.x min.
Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
According to the Fresh Diagnose benchmark, the TPC has about 2x faster RAM.
I used tmpfs for the whole process, so disk speed didn't count.
Both tests ran without X or any foreground process, using
2.4.21-ac1 and the Red Hat kernel.
What do you think?
Shouldn't a TM5800 with a 4-wide VLIW engine and 64 registers,
working on a single task, run as fast as a Pentium III?
Why does it take 70% longer for such a small process (make+gcc+as)?
There must be something wrong.
Followup to: <[email protected]>
By author: Samphan Raruenrom <[email protected]>
In newsgroup: linux.dev.kernel
>
> Vojtech Pavlik wrote:
> > Could you a test just for me? Take vanilla 2.4.21 and then
> > make oldconfig; make dep; time make bzImage
> > That's basically what I want to know how long will take, since
> > it's one of the most common time consuming tasks the thing will
> > have to handle.
> Done! Here're the results:-
>
> Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
> Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
>
> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
> I use tmpfs for the whole process so disk speed didn't count.
> Both test run without X or any foreground process using
> 2.4.21-ac1 and RedHat kernel.
>
> What do you think?
> Shouldn't TM5800 with 4-wide VLIW engine and 64 registers,
> working on a single task, run as fast as a Pentium III?
> Why it take 70% longer for such small process (make+gcc+as)!
> There must be something wrong.
>
Which version of gcc are you running on the two machines?
-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
On Monday 23 of June 2003 04:58, Samphan Raruenrom wrote:
> Done! Here're the results:-
>
> Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
> Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
>
> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
Real-life example: I had to compile the latest Kopete (a KDE IM application). It
took 4 h and some on my 600 MHz Crusoe (128 MB of RAM). On a 600 MHz P3 it takes
1.5 h. This is something I've just gotten used to: this processor is fast only for
non-complex operations, since it has to "emulate" x86.
I know that for some tablets you can download one BIOS dedicated to Windows and a
separate one for Linux. I think it was AQUA or similar. You can even
buy this one with Linux preinstalled (I believe it is Midori Linux,
Transmeta's embedded Linux distro).
--
Grzegorz Jaskiewicz
K4 Labs
On Mon, Jun 23, 2003 at 10:58:12AM +0700, Samphan Raruenrom wrote:
> Vojtech Pavlik wrote:
> > Could you a test just for me? Take vanilla 2.4.21 and then
> > make oldconfig; make dep; time make bzImage
> > That's basically what I want to know how long will take, since
> > it's one of the most common time consuming tasks the thing will
> > have to handle.
> Done! Here're the results:-
>
> Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
> Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
>
> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
> I use tmpfs for the whole process so disk speed didn't count.
> Both test run without X or any foreground process using
> 2.4.21-ac1 and RedHat kernel.
>
> What do you think?
> Shouldn't TM5800 with 4-wide VLIW engine and 64 registers,
> working on a single task, run as fast as a Pentium III?
> Why it take 70% longer for such small process (make+gcc+as)!
> There must be something wrong.
Same GCC version on both? Which?
--
Vojtech Pavlik
SuSE Labs, SuSE CR
On Mon, Jun 23, 2003 at 10:58:12AM +0700, Samphan Raruenrom wrote:
> Vojtech Pavlik wrote:
> > Could you a test just for me? Take vanilla 2.4.21 and then
> > make oldconfig; make dep; time make bzImage
> > That's basically what I want to know how long will take, since
> > it's one of the most common time consuming tasks the thing will
> > have to handle.
> Done! Here're the results:-
>
> Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
> Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
>
> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
> I use tmpfs for the whole process so disk speed didn't count.
> Both test run without X or any foreground process using
> 2.4.21-ac1 and RedHat kernel.
Desktop - 1.1 GHz Athlon Tbird 512M RAM, using disk -> 3.7 min
This is with gcc 2.95.2, which may give it an unfair advantage, though.
Or is something else wrong here?
--
Vojtech Pavlik
SuSE Labs, SuSE CR
Vojtech Pavlik wrote:
> On Mon, Jun 23, 2003 at 10:58:12AM +0700, Samphan Raruenrom wrote:
>>Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
>>Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
>> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
>>I use tmpfs for the whole process so disk speed didn't count.
>>Both test run without X or any foreground process using
>>2.4.21-ac1 and RedHat kernel.
> Desktop - 1.1 GHz Athlon Tbird 512M RAM, using disk -> 3.7 min
> This is with gcc 2.95.2, which may give it an unfair advantage, though.
> Or is something else wrong here?
Both use gcc 3.2. The desktop Pentium III is very old; slow RAM/bus may
be the reason. Or is this the fastest a 1 GHz Pentium III can perform?
Large RAM doesn't help the Pentium III; it doesn't need it here.
Large RAM should help the Crusoe a lot: CMS should use that RAM as translation
cache and keep the entire processes (make, gcc, as) in it.
I guess the 17.x min kernel compile time results from CMS still spending time
interpreting x86 code (because it decides to do so to conserve translation
cache space?).
I wish the Linux kernel could hint CMS to do a better job, if possible, such as
when to interpret, translate, or fully optimize and save to disk for
later use.
Followup to: <[email protected]>
By author: Vojtech Pavlik <[email protected]>
In newsgroup: linux.dev.kernel
>
> Desktop - 1.1 GHz Athlon Tbird 512M RAM, using disk -> 3.7 min
>
> This is with gcc 2.95.2, which may give it an unfair advantage, though.
> Or is something else wrong here?
>
gcc has been getting *massively* slower with every version since 2.7.2
or so, so it's important to compare the same gcc version.
-hpa
Followup to: <[email protected]>
By author: Samphan Raruenrom <[email protected]>
In newsgroup: linux.dev.kernel
>
> Vojtech Pavlik wrote:
> > Could you a test just for me? Take vanilla 2.4.21 and then
> > make oldconfig; make dep; time make bzImage
> > That's basically what I want to know how long will take, since
> > it's one of the most common time consuming tasks the thing will
> > have to handle.
> Done! Here're the results:-
>
> Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
> Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
>
> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
> I use tmpfs for the whole process so disk speed didn't count.
> Both test run without X or any foreground process using
> 2.4.21-ac1 and RedHat kernel.
>
> What do you think?
> Shouldn't TM5800 with 4-wide VLIW engine and 64 registers,
> working on a single task, run as fast as a Pentium III?
> Why it take 70% longer for such small process (make+gcc+as)!
> There must be something wrong.
>
I just realized something ... newer kernels if you do "make oldconfig"
without a .config file in the directory will look for one in /boot.
This could greatly skew the result. Please create a .config and use
it on both systems to make sure that it's not an issue of what is
being compiled in.
I'm not saying that's the problem, I'm just trying to figure out what
the heck is going on here.
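
One way to rule out a .config mismatch is to compare the two files byte for
byte before building. This is only a minimal sketch; the function name
same_config and the file paths are illustrative assumptions, not anything
taken from the kernel build system.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) only if the two
# kernel .config files given as arguments are byte-identical.
same_config() {
    cmp -s "$1" "$2"
}
```

Copy the .config from one machine to the other, then run something like
"same_config ./.config /tmp/other-machine.config && echo identical" in the
kernel source directory before starting the timed build.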
-hpa
On Mon, 23 Jun 2003, Samphan Raruenrom wrote:
> Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
> Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
how much real memory are in these boxes? the above don't look like any
real memory sizes i'm aware of (even if i try to guess if your M=10^6
or M=2^20).
> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
you're mistaken if you believe marketing doodoo which says that DDR is
"twice as fast" as SDR -- only data transfer is clocked at twice the
bus speed. there are command bus costs which are identical between SDR
and DDR -- and it's these costs which dominate the bus in non-sequential
benchmarks such as compilation (as opposed to excessively sequential
benchmarks which are used for marketing purposes).
expansion memory for the tablet PC is PC133 -- you can verify this
by reading the part numbers off the SODIMM and doing a lookup on the
manufacturer's website...
if you don't have any expansion memory in your tablet PC then you've
got only 256MB of RAM and i really don't think you should be using tmpfs.
> Shouldn't TM5800 with 4-wide VLIW engine and 64 registers,
> working on a single task, run as fast as a Pentium III?
nope.
you're assuming that the VLIW has 4 completely orthogonal processing
units -- it doesn't. try measuring code sequences such as:
add %eax,%ebx
add %eax,%ecx
add %eax,%edx
add %eax,%edi
add %eax,%ebx
add %eax,%ecx
add %eax,%edx
add %eax,%edi
...
you'll quickly figure out how many ALUs the machine has, plus you can
figure out their latencies and throughputs (by increasing or decreasing
the number of independent additions). if you do this you'll find
that p3 and tm5800 are essentially just as "wide" as each other for
ALU operations: a pair of 32-bit ALUs with single-cycle latency and
single-cycle throughput.
similar microarchitectural analysis of other aspects of the cores can
help explain the differences you see.
traditionally VLIW have been used for DSP and other such related tasks
in which there are a large number of orthogonal units. in tm5800 you
can think of the VLIW as a set of up to 4 micro-ops which feed the front
of up to 4 pipelines in each cycle -- much like the decoded micro-ops
in the p3 feed one of several pipelines in each cycle.
but i'm really guessing you're causing excessive disk i/o by having a
small memory system use a huge tmpfs... get rid of the tmpfs and
see what happens. and also consider doing i/o benchmarks while
running something which soaks up idle cycles (i.e. a tight loop
incrementing a counter) to see how the two architectures differ.
-dean
On Mon, 23 Jun 2003, dean gaudet wrote:
> On Mon, 23 Jun 2003, Samphan Raruenrom wrote:
>
> > Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
> > Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
...
> but i'm really guessing you're causing excessive disk i/o by having a
> small memory system use a huge tmpfs... get rid of the tmpfs and
you know, a few other things occurred to me -- you should use "vmstat 5" to
find out if any disk i/o is occurring (note that if longrun is doing its job
then the "idle%" cpu statistic is completely useless -- longrun should
be making it as close to 100% as possible). i'm guessing you've got a
desktop disk drive in your p3 system -- which almost certainly outperforms
the laptop disk in the tablet pc...
not only for reasons like platter transfer speed, and seek latency, but
it's also possible your tablet isn't using anything faster than UDMA33 --
and in fact it's entirely possible you're not even using UDMA33. do
something like "grep hda /var/log/dmesg" to see what the bootup messages
said. try "hdparm -d /dev/hda" to see if dma is active -- and if it
isn't, try "hdparm -d1 /dev/hda" to enable it.
if that doesn't work then it's most likely you don't have the IDE driver
necessary for your tablet -- try "lspci -v" and try to find your
southbridge IDE controller... i don't recall which southbridge is in the
tablet, many crusoe boxes have ALi southbridges which have a kernel
driver, but i think the tablet has something other than ALi... if in doubt
post here and i'm sure someone can point you to the right driver.
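
the DMA check described above could be scripted roughly as follows. this is
a sketch only: dma_state is a made-up helper name, and it assumes that
"hdparm -d" prints a line of the form " using_dma = 1 (on)", which may
differ between hdparm versions.

```shell
#!/bin/bash
# Hypothetical helper: reads `hdparm -d <device>` output on stdin and
# prints "on", "off", or "unknown" if no using_dma line is present.
# Assumes the line looks like " using_dma    =  1 (on)".
dma_state() {
    awk '/using_dma/ { print ($3 == 1 ? "on" : "off"); found = 1 }
         END { if (!found) print "unknown" }'
}
```

usage would be something like "hdparm -d /dev/hda | dma_state", run as root;
if it prints "off", try "hdparm -d1 /dev/hda" and check again.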
-dean
>>The time used to compile 2.4.21 kernel off the same .config
>>Desktop - Pentium III 1 G Hz 756 MB -> 10.x min.
>>Tablet PC - Crusoe TM5800 1 GHz 733 MB -> 17.x min.
> how much real memory are in these boxes? the above don't look like any
> real memory sizes i'm aware of
I copied the numbers from the 'total' column of free -m. Both machines have
about 256x3 MB, with the Crusoe giving about 24 MB to CMS (I guess).
I ran the tests again with swapoff, on tmpfs and on disk; the results
are approximately the same. The kernel source being compiled and
its settings are exactly the same (2.4.21-ac1). The kernels used to run the
test and their settings are about the same (2.4.21-ac1 vs. -ac2).
The desktop is Red Hat 8, gcc 3.2. The TPC is Red Hat 9, gcc 3.2.
So the results are real, not disk-speed related. What do you think?
I can't look at the running CMS or the translation cache, so I don't know
what it is doing, but I guess that after several runs of gcc+as+ld,
CMS still decides to interpret most of their x86 code.
Am I wrong?
Or is the TM5800 already running as fast as possible (from the tcache)?
On Wed, 25 Jun 2003, Samphan Raruenrom wrote:
> So the result are real, no disk speed related. What do you think?
did you verify that IDE DMA is in use on the tablet PC?
-dean
Followup to: <[email protected]>
By author: Samphan Raruenrom <[email protected]>
In newsgroup: linux.dev.kernel
>
> Vojtech Pavlik wrote:
> > Could you a test just for me? Take vanilla 2.4.21 and then
> > make oldconfig; make dep; time make bzImage
> > That's basically what I want to know how long will take, since
> > it's one of the most common time consuming tasks the thing will
> > have to handle.
> Done! Here're the results:-
>
> Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
> Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
>
> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
> I use tmpfs for the whole process so disk speed didn't count.
> Both test run without X or any foreground process using
> 2.4.21-ac1 and RedHat kernel.
>
For what it's worth, we have been completely unable to reproduce these
kinds of results at Transmeta; our results are in fact very consistent
with the numbers reported by some people for the Sharp MM-10 "Kitty"
which is also a 1 GHz TM5800; all of them have been in the 10 minute
ballpark.
I have written a script to try to give a consistent compile benchmark;
however, one still needs to make sure that DMA is turned on (hdparm -d
/dev/hda); obviously, the compiler etc should not be on NFS.
The timed portion (make -j3 bzImage) of this script takes
10m15.035s real time (user 9m10.890s, sys 0m43.350s) on my 1067 MHz
Crusoe prototype system (256MB SDR, 256MB DDR, ATA33 disk) -- don't
have TC1000 Tablet PC numbers yet, but I have asked someone to run it
-- running RedHat 9 including distro kernel and gcc 3.2.2. It
produced a bzImage file that's 1151608 bytes long when I ran it.
Note that it uses "make -j3" for the bzImage, so the result isn't really
comparable to your times listed above.
You obviously need to point the KERNEL variable at a suitable copy of
linux-2.4.21.tar.gz. The script needs to run as root in order to
create the tmpfs.
-hpa
Followup to: <[email protected]>
By author: "H. Peter Anvin" <[email protected]>
In newsgroup: linux.dev.kernel
>
> I have written a script to try to give a consistent compile benchmark;
> however, one still needs to make sure that DMA is turned on (hdparm -d
> /dev/hda); obviously, the compiler etc should not be on NFS.
>
Leave it to me to actually forget the script...
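#!/bin/bash -x
# Tarball of the kernel source to build; point this at your local copy.
KERNEL=/home/mirror/kernel.org/linux/kernel/v2.4/linux-2.4.21.tar.gz

# Start from a clean, empty tmpfs so disk speed does not affect the build.
if [ -d /tmp/build ]; then
    umount /tmp/build > /dev/null 2>&1
    rmdir /tmp/build
fi
mkdir -p /tmp/build
mount -t tmpfs none /tmp/build
cd /tmp/build || exit 1
tar xfz "$KERNEL"
cd linux* || exit 1

# Use the stock i386 defconfig so every machine builds the same thing.
cp -f arch/i386/defconfig .config
yes "" | make oldconfig
make dep

# Only the bzImage build itself is timed.
start=$(date)
time bash -c 'make -j3 bzImage > build.log 2>&1'
end=$(date)
echo "Started: $start Ended: $end"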
-hpa
>> Vojtech Pavlik wrote:
>> > Could you a test just for me? Take vanilla 2.4.21 and then
>> > make oldconfig; make dep; time make bzImage
>> > That's basically what I want to know how long will take, since
>> > it's one of the most common time consuming tasks the thing will
>> > have to handle.
> Done! Here're the results:-
>> Desktop - Pentium III 1 G Hz 754 MB -> 10.x min.
>> Tablet PC - Crusoe TM5800 1 GHz 731 MB -> 17.x min.
>> From freshdiagnos benchmack, the TPC has about 2x faster RAM.
>> I use tmpfs for the whole process so disk speed didn't count.
>> Both test run without X or any foreground process using
>> 2.4.21-ac1 and RedHat kernel.
>For what it's worth, we have been completely unable to reproduce these
>kinds of results at Transmeta; our results are in fact very consistent
>with the numbers reported by some people for the Sharp MM-10 "Kitty"
>which is also a 1 GHz TM5800; all of them have been in the 10 minute
>ballpark.
:-( I'm sorry. It's really my fault. All this time everyone thought
that I had done exactly as Vojtech told me.
For some incomprehensible reason, I hadn't: I took the chance to upgrade my kernel
to 2.4.21-ac2, then ran 'make menuconfig' instead of 'make oldconfig', so
I had a tc1000-specific kernel with e100, VIA EIDE/sound, USB, IrDA,
Bluetooth, even PPP and netfilter. So it really does take 17 min to 'make
modules bzImage' on the tc1000 on tmpfs with DMA on. Sorry for causing this
confusion :~~ I've just tried following exactly what Vojtech said and yes,
it takes about 9.5 min. I'm happy now :-)
But anyway, I used exactly the same source and the same .config (with the Crusoe
settings) and built it the same way on my Pentium III, on tmpfs with DMA on.
So that 10 min vs. 17 min should still mean something, right?
My comparison seems interesting (at least to me) because the Crusoe
is usually said to be comparable to a Pentium III. I happen to have
a desktop machine with equal RAM, so the comparison should be fair
as long as the benchmark doesn't use the hard disk.
You don't need me for this comparison, though. Try to find a
1 GHz Pentium III and run that 2.4.21 kernel build benchmark.
I guess it should take 5.5-5.7 min (if it scales that easily).
I hope this didn't keep you too busy. Tell me if you need a hand.
I'd love to help.
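
The 5.5-5.7 min guess is just the 9.5 min Crusoe time scaled by the
10 min : 17 min ratio measured on the larger build. A minimal sketch of that
arithmetic, assuming the ratio carries over unchanged to the smaller build
(scale_time is a made-up helper name):

```shell
#!/bin/bash
# Hypothetical helper: scale a measured build time by the ratio of two
# other timings, i.e. print MEASURED * FAST / SLOW with one decimal place.
# usage: scale_time MEASURED FAST SLOW
scale_time() {
    awk -v m="$1" -v f="$2" -v s="$3" 'BEGIN { printf "%.1f\n", m * f / s }'
}
```

Running "scale_time 9.5 10 17" prints 5.6, which falls inside the range
guessed above.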
Samphan Raruenrom,
The Open Source Project,
National Electronics and Computer Technology Center,
Thailand.