2006-01-09 06:29:06

by Nauman Tahir

Subject: X86_64 and X86_32 bit performance difference [Revisited]

Hello All
I have posted this problem before. Now mailing again after testing as
recommended in previous replies.
My configuration is:

Hardware:
HP ProLiant DL145 (2 x AMD Opteron 144)
14 GB RAM

OS:
FC 4

Kernel
2.6.xx

As someone suggested, I compiled the same kernel for both 32 and 64 bit,
keeping as many configuration options in common as possible. I tested my
driver and got the same result.
Let me explain in detail what's going on.
I have a block device driver which uses my RAMDISK to cache the
data for a Target disk.
I have implemented two simple caching policies in it. I am running
IOTEST to measure the IO rate of my driver. My RAMDISK differs between
the 32 and 64 bit versions: the 32 bit version uses the kmap family to
read/write data to/from memory, while the 64 bit version uses __va to get
the virtual address directly, to avoid ioremap, which sleeps and slows down
the IO rate considerably. On its own, the RAMDISK gives a very high IO rate
under IOTEST, but with my driver on top the performance drops to about one
fourth. This only happens when I run the whole thing on an X86_64 kernel.
Things work well on the 32 bit version. The driver is the same for both versions.
I also cannot figure out which kernel configuration option, if any, is
making the difference.

My code does not seem to have portability issues; for example,
calculations are based on unsigned long. A few threads are involved,
created with kernel_thread as in the MD driver.

Any ideas what is causing the performance difference? Which areas should I
look at?

Nauman


2006-01-09 07:51:24

by Arjan van de Ven

Subject: Re: X86_64 and X86_32 bit performance difference [Revisited]

On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> Hello All
> I have posted this problem before. Now mailing again after testing as
> recommended in previous replies.
> My configuration is:
>
> Hardware:
> HP ProLiant DL145 (2 x AMD Opteron 144)
> 14 GB RAM
>
> OS:
> FC 4
>
> Kernel
> 2.6.xx

You *STILL* have not posted the URL to your source code.
How is anyone supposed to help you without that?????




2006-01-09 18:27:23

by Andi Kleen

Subject: Re: X86_64 and X86_32 bit performance difference [Revisited]

Nauman Tahir <[email protected]> writes:

> I have posted this problem before. Now mailing again after testing as
> recommeded in previous replys.
> My configuration is:

Most likely it's related to you misusing the PCI DMA API in some way.
Review Documentation/DMA-mapping.txt closely.

If that doesn't turn on the light try oprofile.

-Andi

2006-01-10 10:49:36

by Nauman Tahir

Subject: Re: X86_64 and X86_32 bit performance difference [Revisited]

On 1/9/06, Arjan van de Ven <[email protected]> wrote:
> On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> > Hello All
> > I have posted this problem before. Now mailing again after testing as
> > recommended in previous replies.
> > My configuration is:
> >
> > Hardware:
> > HP ProLiant DL145 (2 x AMD Opteron 144)
> > 14 GB RAM
> >
> > OS:
> > FC 4
> >
> > Kernel
> > 2.6.xx
>
> You *STILL* have not posted the URL to your source code.
> How is anyone supposed to help you without that?????

I have attached a file which I use as a thread API. The complete code is
quite large and also needs a proper description, which I will post
if needed.
I hope I have made my problem clear. To repeat: the same code shows a lot
of performance degradation on the configuration mentioned above. One
suspect is the thread library.


dts_thread_t *dts_register_thread(void (*run)(void *), const char *name, void *private)

is the function that registers my thread handler.

void dts_wakeup_thread(dts_thread_t *thread)

is the function in dts_thread.c which I use to run my thread.

All my thread handlers either
call generic_make_request, sometimes for my RAMDISK and sometimes for
my Target device [SCSI disk or local HDD partition],
OR
use list.h.




2006-01-10 10:50:17

by Nauman Tahir

Subject: Re: X86_64 and X86_32 bit performance difference [Revisited]

On 09 Jan 2006 19:27:20 +0100, Andi Kleen <[email protected]> wrote:
> Nauman Tahir <[email protected]> writes:
>
> > I have posted this problem before. Now mailing again after testing as
> > recommended in previous replies.
> > My configuration is:
>
> Most likely it's related to you misusing the PCI DMA API in some way.
> Review Documentation/DMA-mapping.txt closely.
>
> If that doesn't turn on the light try oprofile.

What is oprofile?
>
> -Andi
>

2006-01-10 10:53:56

by Nauman Tahir

Subject: Re: X86_64 and X86_32 bit performance difference [Revisited]


Attachments:
(No filename) (1.37 kB)
dts_thread.c (2.42 kB)

2006-01-10 21:14:32

by Arjan van de Ven

Subject: Re: X86_64 and X86_32 bit performance difference [Revisited]

On Tue, 2006-01-10 at 02:49 -0800, Nauman Tahir wrote:
> On 1/9/06, Arjan van de Ven <[email protected]> wrote:
> > On Sun, 2006-01-08 at 22:29 -0800, Nauman Tahir wrote:
> > > Hello All
> > > I have posted this problem before. Now mailing again after testing as
> > > recommended in previous replies.
> > > My configuration is:
> > >
> > > Hardware:
> > > HP ProLiant DL145 (2 x AMD Opteron 144)
> > > 14 GB RAM
> > >
> > > OS:
> > > FC 4
> > >
> > > Kernel
> > > 2.6.xx
> >
> > You *STILL* have not posted the URL to your source code.
> > How is anyone supposed to help you without that?????
>
> I have attached a file which I use as a thread API. The complete code is
> quite large and also needs a proper description, which I will post
> if needed.

Well, you don't include any of the block layer code; I'd say more code is
needed. Just put all of it online somewhere and post the URL...