2002-09-04 21:57:13

by Paolo Ciarrocchi

Subject: BYTE Unix Benchmarks Version 3.6

Hi all,
I've just run the BYTE Unix Benchmarks Version 3.6 on the 2.4.19 and 2.5.33 kernels.
Here are the results:

BYTE UNIX Benchmarks (Version 3.11)
System -- Linux localhost.localdomain 2.4.19 #10 Fri Aug 23 20:53:06 BST 2002 i686 unknown
Start Benchmark Run: Wed Sep 4 22:11:32 BST 2002
1 interactive users.
Dhrystone 2 without register variables 1499020.6 lps (10 secs, 6 samples)
Dhrystone 2 using register variables 1501168.4 lps (10 secs, 6 samples)
Arithmetic Test (type = arithoh) 3598100.4 lps (10 secs, 6 samples)
Arithmetic Test (type = register) 201521.0 lps (10 secs, 6 samples)
Arithmetic Test (type = short) 190245.9 lps (10 secs, 6 samples)
Arithmetic Test (type = int) 201904.5 lps (10 secs, 6 samples)
Arithmetic Test (type = long) 201906.4 lps (10 secs, 6 samples)
Arithmetic Test (type = float) 210562.7 lps (10 secs, 6 samples)
Arithmetic Test (type = double) 210385.9 lps (10 secs, 6 samples)
System Call Overhead Test 407402.6 lps (10 secs, 6 samples)
Pipe Throughput Test 476268.6 lps (10 secs, 6 samples)
Pipe-based Context Switching Test 218969.9 lps (10 secs, 6 samples)
Process Creation Test 9078.6 lps (10 secs, 6 samples)
Execl Throughput Test 998.0 lps (9 secs, 6 samples)
File Read (10 seconds) 1571652.0 KBps (10 secs, 6 samples)
File Write (10 seconds) 109237.0 KBps (10 secs, 6 samples)
File Copy (10 seconds) 24329.0 KBps (10 secs, 6 samples)
File Read (30 seconds) 1562505.0 KBps (30 secs, 6 samples)
File Write (30 seconds) 113152.0 KBps (30 secs, 6 samples)
File Copy (30 seconds) 14334.0 KBps (30 secs, 6 samples)
C Compiler Test 470.9 lpm (60 secs, 3 samples)
Shell scripts (1 concurrent) 980.4 lpm (60 secs, 3 samples)
Shell scripts (2 concurrent) 544.1 lpm (60 secs, 3 samples)
Shell scripts (4 concurrent) 287.0 lpm (60 secs, 3 samples)
Shell scripts (8 concurrent) 147.0 lpm (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places 42311.6 lpm (60 secs, 6 samples)
Recursion Test--Tower of Hanoi 18915.4 lps (10 secs, 6 samples)


INDEX VALUES
TEST                                      BASELINE     RESULT    INDEX

Arithmetic Test (type = double)             2541.7   210385.9     82.8
Dhrystone 2 without register variables     22366.3  1499020.6     67.0
Execl Throughput Test                         16.5      998.0     60.5
File Copy (30 seconds)                       179.0    14334.0     80.1
Pipe-based Context Switching Test           1318.5   218969.9    166.1
Shell scripts (8 concurrent)                   4.0      147.0     36.8
                                                             =========
SUM of 6 items                                                   493.2
AVERAGE                                                           82.2



sh pgms/report.sh results/log > results/report
sh pgms/index.sh pgms/index.base results/log >> results/report
cat results/report

BYTE UNIX Benchmarks (Version 3.11)
System -- Linux localhost.localdomain 2.5.33 #32 Tue Sep 3 22:18:19 BST 2002 i686 unknown
Start Benchmark Run: Wed Sep 4 20:46:31 BST 2002
1 interactive users.
Dhrystone 2 without register variables 1488327.9 lps (10 secs, 6 samples)
Dhrystone 2 using register variables 1488265.3 lps (10 secs, 6 samples)
Arithmetic Test (type = arithoh) 3435944.6 lps (10 secs, 6 samples)
Arithmetic Test (type = register) 197870.4 lps (10 secs, 6 samples)
Arithmetic Test (type = short) 145140.8 lps (10 secs, 6 samples)
Arithmetic Test (type = int) 104440.5 lps (10 secs, 6 samples)
Arithmetic Test (type = long) 177757.4 lps (10 secs, 6 samples)
Arithmetic Test (type = float) 208476.4 lps (10 secs, 6 samples)
Arithmetic Test (type = double) 208443.3 lps (10 secs, 6 samples)
System Call Overhead Test 397276.7 lps (10 secs, 6 samples)
Pipe Throughput Test 434561.9 lps (10 secs, 6 samples)
Pipe-based Context Switching Test 148653.5 lps (10 secs, 6 samples)
Process Creation Test 5422.1 lps (10 secs, 6 samples)
Execl Throughput Test 771.6 lps (10 secs, 6 samples)
File Read (10 seconds) 1553289.0 KBps (10 secs, 6 samples)
File Write (10 seconds) 132002.0 KBps (10 secs, 6 samples)
File Copy (10 seconds) 17994.0 KBps (10 secs, 6 samples)
File Read (30 seconds) 1540682.0 KBps (30 secs, 6 samples)
File Write (30 seconds) 137781.0 KBps (30 secs, 6 samples)
File Copy (30 seconds) 11460.0 KBps (30 secs, 6 samples)
C Compiler Test 450.9 lpm (60 secs, 3 samples)
Shell scripts (1 concurrent) 876.7 lpm (60 secs, 3 samples)
Shell scripts (2 concurrent) 480.3 lpm (60 secs, 3 samples)
Shell scripts (4 concurrent) 251.0 lpm (60 secs, 3 samples)
Shell scripts (8 concurrent) 126.0 lpm (60 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places 33530.4 lpm (60 secs, 6 samples)
Recursion Test--Tower of Hanoi 18514.3 lps (10 secs, 6 samples)


INDEX VALUES
TEST                                      BASELINE     RESULT    INDEX

Arithmetic Test (type = double)             2541.7   208443.3     82.0
Dhrystone 2 without register variables     22366.3  1488327.9     66.5
Execl Throughput Test                         16.5      771.6     46.8
File Copy (30 seconds)                       179.0    11460.0     64.0
Pipe-based Context Switching Test           1318.5   148653.5    112.7
Shell scripts (8 concurrent)                   4.0      126.0     31.5
                                                             =========
SUM of 6 items                                                   403.6
AVERAGE                                                           67.3

Comments?

--
Get your free email from http://www.linuxmail.org


Powered by Outblaze


2002-09-04 22:40:24

by Cliff White

Subject: Re: BYTE Unix Benchmarks Version 3.6

> Hi all,
> I've just ran the BYTE Unix Benchmarks Version 3.6 on the 2.4.19 and on the 2.5.33 kernel.
> Here it goes the results:
>
[snipped]
Always useful to compare. I think the data from STP show something similar.
I've pulled some reports; rather than send a huge email, here are the URLs
(btw STP runs v4.0.1 UNIXbench):

Runs made on a 2-CPU platform:
2.4.19: http://khack.osdl.org/stp/3877/results/report
2.5.33: http://khack.osdl.org/stp/4928/results/report

Another comparison: why does pre5 appear a little worse than pre4?
2.4.20-pre5: http://khack.osdl.org/stp/4836/results/report
2.4.20-pre4: http://khack.osdl.org/stp/4581/results/report


cliffw




2002-09-04 23:52:52

by Andrea Arcangeli

Subject: Re: BYTE Unix Benchmarks Version 3.6

On Thu, Sep 05, 2002 at 06:00:55AM +0800, Paolo Ciarrocchi wrote:
> Hi all,
> I've just ran the BYTE Unix Benchmarks Version 3.6 on the 2.4.19 and on the 2.5.33 kernel.
> Here it goes the results:

Can you do a run on 2.4.20pre5aa1 too? Thanks.

> Comments?

I'm just guessing, but I think part of the global regression in most
numbers is mostly due to the increased HZ and rmap.

Andrea

2002-09-05 13:43:56

by bert hubert

Subject: side-by-side Re: BYTE Unix Benchmarks Version 3.6

Side-by-side, with the marked changes highlighted by '>':

                                                   2.4.19          2.5.33
-------------------------------------------------------------------------
 Dhrystone 2 without register variables     1499020.6 lps   1488327.9 lps
 Dhrystone 2 using register variables       1501168.4 lps   1488265.3 lps
 Arithmetic Test (type = arithoh)           3598100.4 lps   3435944.6 lps
 Arithmetic Test (type = register)           201521.0 lps    197870.4 lps
 Arithmetic Test (type = short)              190245.9 lps    145140.8 lps
 Arithmetic Test (type = int)                201904.5 lps    104440.5 lps
 Arithmetic Test (type = long)               201906.4 lps    177757.4 lps
 Arithmetic Test (type = float)              210562.7 lps    208476.4 lps
 Arithmetic Test (type = double)             210385.9 lps    208443.3 lps
 System Call Overhead Test                   407402.6 lps    397276.7 lps
>Pipe Throughput Test                        476268.6 lps    434561.9 lps
>Pipe-based Context Switching Test           218969.9 lps    148653.5 lps
>Process Creation Test                         9078.6 lps      5422.1 lps
 Execl Throughput Test                          998.0 lps       771.6 lps
 File Read (10 seconds)                    1571652.0 KBps  1553289.0 KBps
 File Write (10 seconds)                    109237.0 KBps   132002.0 KBps
>File Copy (10 seconds)                      24329.0 KBps    17994.0 KBps
 File Read (30 seconds)                    1562505.0 KBps  1540682.0 KBps
 File Write (30 seconds)                    113152.0 KBps   137781.0 KBps
 File Copy (30 seconds)                      14334.0 KBps    11460.0 KBps
 C Compiler Test                                470.9 lpm       450.9 lpm
 Shell scripts (1 concurrent)                   980.4 lpm       876.7 lpm
 Shell scripts (2 concurrent)                   544.1 lpm       480.3 lpm
 Shell scripts (4 concurrent)                   287.0 lpm       251.0 lpm
 Shell scripts (8 concurrent)                   147.0 lpm       126.0 lpm
>Dc: sqrt(2) to 99 decimal places             42311.6 lpm     33530.4 lpm
 Recursion Test--Tower of Hanoi               18915.4 lps     18514.3 lps


 INDEX VALUES                                2.4.19    2.5.33
 TEST                                         INDEX     INDEX

 Arithmetic Test (type = double)               82.8      82.0
 Dhrystone 2 without register variables        67.0      66.5
 Execl Throughput Test                         60.5      46.8
 File Copy (30 seconds)                        80.1      64.0
 Pipe-based Context Switching Test            166.1     112.7
 Shell scripts (8 concurrent)                  36.8      31.5
                                          ========= =========
 SUM of 6 items                               493.2     403.6
 AVERAGE                                       82.2      67.3

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2002-09-05 15:07:31

by Luigi Genoni

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6


I run the BYTE benchmark regularly with every new kernel, so some of these
results look strange to me.

From your numbers, I would say you are using a PIII 600/900 MHz (more or
less). It is not an AMD Athlon or a PIV, since float and double are too
slow, nor is it a K6, because they are too fast.

On Thu, 5 Sep 2002, bert hubert wrote:

> Side-by-side with some marked changes highlighted:
>
> 2.4.19 2.5.33
> -----------------------------------------------------------------------
> Dhrystone 2 without register variable 1499020.6 lps 1488327.9 lps
> Dhrystone 2 using register variables 1501168.4 lps 1488265.3 lps
> Arithmetic Test (type = arithoh) 3598100.4 lps 3435944.6 lps

this could vary a little

> Arithmetic Test (type = register) 201521.0 lps 197870.4 lps
> Arithmetic Test (type = short) 190245.9 lps 145140.8 lps

the difference should never be so big

> Arithmetic Test (type = int) 201904.5 lps 104440.5 lps

the difference should never be so big

> Arithmetic Test (type = long) 201906.4 lps 177757.4 lps

the difference should never be so big


Seeing this, I think you had something running in the background using your CPU
while you were running the int tests. If you look at bm/results/log
(log.accum if you did some other run recently),
you should find lines like:

Arithmetic Test (type = int)|10.0|lps|227163.1|227158.7|6

that is a little more interesting if you are under load.



> Arithmetic Test (type = float) 210562.7 lps 208476.4 lps
> Arithmetic Test (type = double) 210385.9 lps 208443.3 lps
> System Call Overhead Test 407402.6 lps 397276.7 lps
> >Pipe Throughput Test 476268.6 lps 434561.9 lps


> >Pipe-based Context Switching Test 218969.9 lps 148653.5 lps

this could vary because of many factors, from bad page
colouring to sendmail activity.

> >Process Creation Test 9078.6 lps 5422.1 lps
> Execl Throughput Test 998.0 lps 771.6 lps

this is interesting, but given the previous results for int and short,
I am curious about your real load. I am also curious whether you are
using kernel preemption with 2.5.

> File Read (10 seconds) 1571652.0 KBps 1553289.0 KBps
> File Write (10 seconds) 109237.0 KBps 132002.0 KBps
> >File Copy (10 seconds) 24329.0 KBps 17994.0 KBps
> File Read (30 seconds) 1562505.0 KBps 1540682.0 KBps
> File Write (30 seconds) 113152.0 KBps 137781.0 KBps
> File Copy (30 seconds) 14334.0 KBps 11460.0 KBps

I saw the same with IDE disks... again, are you using kernel preemption?


> C Compiler Test 470.9 lpm 450.9 lpm
> Shell scripts (1 concurrent) 980.4 lpm 876.7 lpm
> Shell scripts (2 concurrent) 544.1 lpm 480.3 lpm
> Shell scripts (4 concurrent) 287.0 lpm 251.0 lpm
> Shell scripts (8 concurrent) 147.0 lpm 126.0 lpm

In my tests generally shell scripts are faster with 2.5 kernel.

> >Dc: sqrt(2) to 99 decimal places 42311.6 lpm 33530.4 lpm

> Recursion Test--Tower of Hanoi 18915.4 lps 18514.3 lps
>
>
> INDEX VALUES 2.4.19 2.5
> TEST INDEX INDEX
>
> Arithmetic Test (type = double) 82.8 82.0
> Dhrystone 2 without register variables 67.0 66.5
> Execl Throughput Test 60.5 46.8
> File Copy (30 seconds) 80.1 64.0
> Pipe-based Context Switching Test 166.1 112.7
> Shell scripts (8 concurrent) 36.8 31.5
> ========= =========
> SUM of 6 items 493.2 403.6
> AVERAGE 82.2 67.3
>

Luigi


2002-09-05 15:32:41

by Paolo Ciarrocchi

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

From: Luigi Genoni

> I usually run byte bench regularly with every new kernel, so I see some
> strange results here.
>
> From your numbers, I would say you are using a PIII 600/900 Mhz (more or
> less). It is not an AMD AThlon or a PIV, since float and double are too
> slow, not it is a K6 because they are too fast.
Yes, I ran the test on an HP Omnibook 600 (PIII @ 900 MHz).

[...]
> seeing this I think you had something running in background using your CPU
> while you where running int tests. if you loock at bm/results/log
> (log.accum if you did some other run recently)
> should find lines like:
>
> Arithmetic Test (type = int)|10.0|lps|227163.1|227158.7|6
>
> that is a little more interesting if you are under load.
No other load, just top and less on a few files.

[...]
> > >Process Creation Test 9078.6 lps 5422.1 lps
> > Execl Throughput Test 998.0 lps 771.6 lps
>
> this is interesting, but seeing previous results about int and short,
> I am curious about your real load. I am quite curious if with 2.5 you are
> using kernel preemption.
No load, but yes, preemption is enabled.

> > File Read (10 seconds) 1571652.0 KBps 1553289.0 KBps
> > File Write (10 seconds) 109237.0 KBps 132002.0 KBps
> > >File Copy (10 seconds) 24329.0 KBps 17994.0 KBps
> > File Read (30 seconds) 1562505.0 KBps 1540682.0 KBps
> > File Write (30 seconds) 113152.0 KBps 137781.0 KBps
> > File Copy (30 seconds) 14334.0 KBps 11460.0 KBps
>
> I saw the save with IDE disks... again, are you using kernel preemption?
And again, yes ;-)

> > C Compiler Test 470.9 lpm 450.9 lpm
> > Shell scripts (1 concurrent) 980.4 lpm 876.7 lpm
> > Shell scripts (2 concurrent) 544.1 lpm 480.3 lpm
> > Shell scripts (4 concurrent) 287.0 lpm 251.0 lpm
> > Shell scripts (8 concurrent) 147.0 lpm 126.0 lpm
>
> In my tests generally shell scripts are faster with 2.5 kernel.

In any case, I'll run the test again with version 4.1 of UnixBench.
I'll post the results using 2.4.19 as the "baseline", compared against 2.5.33 and hopefully 2.4.20-pre5aa1.

Ciao,
Paolo

2002-09-06 03:16:46

by Daniel Phillips

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

On Thursday 05 September 2002 15:48, bert hubert wrote:
> Arithmetic Test (type = arithoh) 3598100.4 lps 3435944.6 lps
> Arithmetic Test (type = register) 201521.0 lps 197870.4 lps
> Arithmetic Test (type = short) 190245.9 lps 145140.8 lps
> Arithmetic Test (type = int) 201904.5 lps 104440.5 lps
> Arithmetic Test (type = long) 201906.4 lps 177757.4 lps
> Arithmetic Test (type = float) 210562.7 lps 208476.4 lps
> Arithmetic Test (type = double) 210385.9 lps 208443.3 lps

What kind of arithmetic is this? Why on earth would arithmetic vary
from one kernel to another?

--
Daniel

2002-09-06 03:31:31

by Imran Badr

Subject: Calculating kernel logical address ..

Hi,

I need help correctly calculating the kernel logical address for a user pointer
which I mmap'ed from a device driver. In the mmap() file operation, I allocate some
memory using kmalloc() and call:

remap_page_range(vma->vm_start,
virt_to_phys((void *)(Uint32)kmalloc_buffer),
size,
PAGE_SHARED);

after reserving all pages and doing some other stuff. Now, when I get a user
pointer and need to calculate the corresponding kernel logical address, I use
the following code:

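pgd_t *pgd;
pmd_t *pmd;
pte_t *ptep, pte;
unsigned long adr, kaddr = 0;

adr = user_address;
pgd = pgd_offset(current->mm, adr);    /* the result must be stored in pgd */

if (!pgd_none(*pgd)) {
    pmd = pmd_offset(pgd, adr);
    if (!pmd_none(*pmd)) {
        ptep = pte_offset(pmd, adr);   /* 2.4 interface */
        pte = *ptep;
        if (pte_present(pte)) {
            /* kernel address of the page plus the offset within the page */
            kaddr = (unsigned long) page_address(pte_page(pte));
            kaddr |= (adr & (PAGE_SIZE - 1));
        }
    }
}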

Will this code always give me correct kernel logical address?

I will really appreciate any guidance.

Thanks,
Imran.


2002-09-06 07:04:35

by bert hubert

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

On Fri, Sep 06, 2002 at 05:23:48AM +0200, Daniel Phillips wrote:
> On Thursday 05 September 2002 15:48, bert hubert wrote:
> > Arithmetic Test (type = arithoh) 3598100.4 lps 3435944.6 lps
> > Arithmetic Test (type = register) 201521.0 lps 197870.4 lps
> > Arithmetic Test (type = short) 190245.9 lps 145140.8 lps
> > Arithmetic Test (type = int) 201904.5 lps 104440.5 lps
> > Arithmetic Test (type = long) 201906.4 lps 177757.4 lps
> > Arithmetic Test (type = float) 210562.7 lps 208476.4 lps
> > Arithmetic Test (type = double) 210385.9 lps 208443.3 lps
>
> What kind of arithmetic is this? Why on earth would arithmetic vary
> from one kernel to another?

I wasn't involved in this benchmark; I just reformatted the results.
However, it might be that this benchmark is a tad braindead and 'suffers'
from far better timing resolution because of HZ=1000. I'm unsure. I saw that
this benchmark used something like a SPARCstation 5 as a reference platform,
so maybe it is not geared for today's processors.

Regards,

bert hubert


2002-09-06 07:40:05

by Paolo Ciarrocchi

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

From: Daniel Phillips

[...]
> What kind of arithmetic is this? Why on earth would arithmetic vary
> from one kernel to another?

Yep, you are right!
There is something wrong in that test.
Look at the new tests I've just posted using UnixBench version 4.1; they are more interesting.

Ciao,
Paolo

2002-09-06 15:39:53

by Manfred Spraul

Subject: Re: Calculating kernel logical address ..

> adr = user_address;
> pgd_offset(current->mm, adr);
>
> if (!pgd_none(*pgd)) {
> pmd = pmd_offset(pgd, adr);
> if (!pmd_none(*pmd)) {
> ptep = pte_offset(pmd, adr);
> pte = *ptep;
> if(pte_present(pte)) {
> kaddr = (unsigned long) page_address(pte_page(pte));
> kaddr |= (adr & (PAGE_SIZE - 1));
> }
> }
> }
>
> Will this code always give me correct kernel logical address?
>
What about

kmalloc_buffer+(user_address-vma->vm_start)

?
A driver should avoid accessing the page tables.

--
Manfred

2002-09-06 17:11:09

by Imran Badr

Subject: RE: Calculating kernel logical address ..



Manfred Spraul wrote:
> What about
>
> kmalloc_buffer+(user_address-vma->vm_start)
>
> ?
> A driver should avoid accessing the page tables.

I was wondering whether the code I am using will always give me correct
addresses, whether or not HIGHMEM is enabled in the kernel configuration. I
believe it should not be a problem, because I am mmap'ing kmalloc'ed memory,
which is always mapped. But what's happening in my lab is different: if I
enable HIGHMEM in the kernel configuration and install 2GB of memory in my
server, the kernel crashes where I try to access the kaddr calculated by the
above code. Any idea?

The problem with your suggestion is that at the point where the user gives me an
address for DMA, I do not know the values of kmalloc_buffer and vma->vm_start.
Also, if more than one process is accessing the driver, how am I going to keep
track of all the mmap'ed memory?

Thanks,
Imran.





2002-09-06 20:53:42

by Pavel Machek

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

Hi!

> > I usually run byte bench regularly with every new kernel, so I see some
> > strange results here.
> >
> > From your numbers, I would say you are using a PIII 600/900 Mhz (more or
> > less). It is not an AMD AThlon or a PIV, since float and double are too
> > slow, not it is a K6 because they are too fast.
> Yes, I ran the test on a HP Omnibook 600 (PIII@900)

APM or ACPI? How did you make sure it never went into power-save mode?
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2002-09-07 01:37:42

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Friday 06 September 2002 05:34, Imran Badr wrote:
> if (!pgd_none(*pgd)) {
> pmd = pmd_offset(pgd, adr);
> if (!pmd_none(*pmd)) {
> ptep = pte_offset(pmd, adr);
> pte = *ptep;
> if(pte_present(pte)) {
> kaddr = (unsigned long) page_address(pte_page(pte));
> kaddr |= (adr & (PAGE_SIZE - 1));
> }
> }
> }
>
> Will this code always give me correct kernel logical address?
>
> I will really appreciate any guidance.

It looks good to me. Note that somebody has added some new voodoo in 2.5
so that page table pages can be in highmem, with the result that the above
code won't work in 2.5, whether or not highmem is configured.

--
Daniel

2002-09-07 07:05:56

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Friday 06 September 2002 19:13, Imran Badr wrote:
> > adr = user_address;
> > pgd_offset(current->mm, adr);
> >
> > if (!pgd_none(*pgd)) {
> > pmd = pmd_offset(pgd, adr);
> > if (!pmd_none(*pmd)) {
> > ptep = pte_offset(pmd, adr);
> > pte = *ptep;
> > if(pte_present(pte)) {
> > kaddr = (unsigned long) page_address(pte_page(pte));
> > kaddr |= (adr & (PAGE_SIZE - 1));
> > }
> > }
> > }
> >
> > Will this code always give me correct kernel logical address?
>
> I was wondering if the code which I am using, will always give me addresses
> no matter whether HIGHMEM is defined in kernel configuration or not.

On second thought, this code does have problems with highmem. The page in
question was never kmapped, so no kernel address was assigned if the page
was a high memory page. Besides that, there are other changes to
page_address() that imply it can't be used with a high memory page. So the
above code isn't generic.

> I
> belive that it should not be problem because I am mmap'ing kmalloc'ed memory
> which always returns mapped memory. But whats happeing in my lab is
> different. If I define HIGHMEM in kernel configuration and install 2GB of
> memory in my server then I see a crash in the kernel where I try to access
> kaddr calculated bu above code. Any idea?

Because of what I just said.

> The problem with your suggestion is that at the point where user gives me an
> address for DMA, I do not know what kmalloc_buffer and vma->vm_start values
> are. Also, if there are more than one processes accessing the driver, then
> how am I going to keep track of all mmap'ed memory.

--
Daniel

2002-09-07 12:17:41

by Paolo Ciarrocchi

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

From: Pavel Machek

[...]
> > Yes, I ran the test on a HP Omnibook 600 (PIII@900)
>
> APM or ACPI? How did you guarantee not going powersave?
APM, and I pressed the shift key every few minutes,
so it never went into power-save mode.

Ciao,
Paolo

2002-09-08 00:04:41

by David Miller

Subject: Re: Calculating kernel logical address ..

From: Daniel Phillips
Date: Sat, 7 Sep 2002 03:44:53 +0200

> It looks good to me. Note that somebody has added some new voodoo in 2.5
> so that page table pages can be in highmem, with the result that the above
> code won't work in 2.5, whether or not highmem is configured.

The example given won't work for kernel text/data addresses on a few
platforms (sparc64 is one). And in fact on MIPS the KSEG0 pages lack
any page tables.

There are only three things one can portably obtain a physical address
of:

1) A user address, for a known MM

2) a kmalloc/get_free_page kernel page

3) A vmalloc page

For anything else you're in non-portable land, including, in
particular:

1) kernel stack addresses
2) addresses within the main kernel image text/data/bss

2002-09-08 14:05:09

by Luigi Genoni

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

On Fri, 6 Sep 2002, Pavel Machek wrote:

> Hi!
>
> > > I usually run byte bench regularly with every new kernel, so I see some
> > > strange results here.
> > >
> > > From your numbers, I would say you are using a PIII 600/900 Mhz (more or
> > > less). It is not an AMD AThlon or a PIV, since float and double are too
> > > slow, not it is a K6 because they are too fast.
> > Yes, I ran the test on a HP Omnibook 600 (PIII@900)
>
> APM or ACPI? How did you guarantee not going powersave?
>
I assumed Paolo had disabled power saving in both the BIOS and the kernel,
of course. If not, the differences I noticed could easily be explained.
Thanks

Luigi


2002-09-08 19:21:36

by Pavel Machek

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6


Hi!

> > > Yes, I ran the test on a HP Omnibook 600 (PIII@900)
> >
> > APM or ACPI? How did you guarantee not going powersave?
> APM, and I pressed the shift key every few minutes,
> therefore no powersafe.

That still means APM bios calls when idle, right?
Pavel
--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

2002-09-08 22:52:44

by Paolo Ciarrocchi

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

From: Pavel Machek
[...]
> > APM, and I pressed the shift key every few minutes,
> > therefore no powersafe.
>
> That still means APM bios calls when idle, right?

Yes, you are right.
But again, with UnixBench version 4.1 I got much
more interesting results with no "strange" numbers;
I ran that test a few hours ago.
I know I can disable APM in both the kernel and the BIOS, but I'd like to test the kernel I use day to day. What do you think? Would you suggest a different configuration when I run the test?
And which are the "best" benchmarks?
I use dbench, LMbench, and UnixBench 4.1.

Cheers {ciao},
Paolo

2002-09-09 04:54:30

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Sunday 08 September 2002 02:01, David S. Miller wrote:
> From: Daniel Phillips
> Date: Sat, 7 Sep 2002 03:44:53 +0200
>
> > It looks good to me. Note that somebody has added some new voodoo in 2.5
> > so that page table pages can be in highmem, with the result that the above
> > code won't work in 2.5, whether or not highmem is configured.
>
> The example given won't work for kernel text/data addresses on a few
> platforms (sparc64 is one). And in fact on MIPS the KSEG0 pages lack
> any page tables.
>
> There are only three things one can portably obtain a physical address
> of:
>
> 1) A user address, for a known MM
>
> 2) a kmalloc/get_free_page kernel page
>
> 3) A vmalloc page

Actually, he was trying to obtain a kernel virtual address, which
presents its own difficulties, particularly with respect to highmem.

> For anything else you're in non-portablt land, including and
> in partiular:
>
> 1) kernel stack addresses

Could you elaborate on what bad things happen here?

> 2) addresses within the main kernel image text/data/bss

Yep. MIPS's KSEG0 (a stupid design if there ever was one) and i386 large
kernel image pages are just two examples of wrinkles that would need to
be handled. The general principle is: mappings in the kernel's virtual
address space are not maintained by the faulting mechanism, they are
maintained 'by hand'; and the cross-platform N-level page tree is defined
only for addresses that can fault.

Where the page tree does define a mapping to physical memory, obviously
only a physical address may be obtained that way, and due to highmem,
this address may not be mapped into the kernel's virtual address range,
in which case a further kmapping step has to be performed to obtain a
kernel-usable address, subject to bizarre rules that tend to change on
a weekly basis.

Somebody needs to write a book about this ;-)

--
Daniel

2002-09-09 05:03:04

by David Miller

Subject: Re: Calculating kernel logical address ..

From: Daniel Phillips
Date: Sun, 8 Sep 2002 20:44:17 +0200

> > For anything else you're in non-portablt land, including and
> > in partiular:
> >
> > 1) kernel stack addresses
>
> Could you elaborate on what bad things happen here?

Kernel stack allocation is defined per-architecture. On
sun4c sparc systems, we carve virtual pages out from the kernel
address space and hard map them into the TLB by hand.

> > 2) addresses within the main kernel image text/data/bss
>
> Yep. MIPS's KSEG0 (a stupid design if there ever was one)

Actually, KSEG0 is the most Linux-friendly design in the world,
particularly in 64-bit mode. There is no need to have page tables at
all for the main kernel physical memory map. It would shave a lot of
code from the sparc64 TLB miss handlers if I didn't have to handle
PAGE_OFFSET pages, for example.

Alpha does something akin to KSEG0 as well.

I pine constantly for it appearing some day on a future UltraSPARC
revision :-)

2002-09-09 05:10:09

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 07:00, David S. Miller wrote:
> > 2) addresses within the main kernel image text/data/bss
>
> Yep. MIPS's KSEG0 (a stupid design if there ever was one)
>
> Actually, KSEG0 the most Linux friendly design in the world
> particularly in 64-bit mode.

That's easy to say until you try and work with it (I assume you have,
and forgot). Just try to do a 3G/1G split on it, for example.

> I pine constantly for it appearing some day on a future UltraSPARC
> revision :-)

Don't wish too hard, you might get it ;-)

--
Daniel

2002-09-09 05:31:08

by David Miller

Subject: Re: Calculating kernel logical address ..

From: Daniel Phillips
Date: Mon, 9 Sep 2002 07:17:30 +0200

> On Monday 09 September 2002 07:00, David S. Miller wrote:
> > Actually, KSEG0 the most Linux friendly design in the world
> > particularly in 64-bit mode.
>
> That's easy to say until you try and work with it (I assume you have,
> and forgot). Just try to do a 3G/1G split on it, for example.

Maybe you missed the "64-bit mode" part of what I said. :-)

In 64-bit mode there is no need to do any kind of split.
You just use the KSEG mapping with full cache coherency for
all of physical memory as the PAGE_OFFSET area.

I forget if it was KSEG0 or some other number, but I know it
works.


2002-09-09 10:41:21

by Pavel Machek

Subject: Re: side-by-side Re: BYTE Unix Benchmarks Version 3.6

Hi!

> > > APM, and I pressed the shift key every few minutes,
> > > therefore no power saving.
> >
> > That still means APM BIOS calls when idle, right?
>
> Yes, you are right.
> But again, with Byte Unix version 4.1 I got much
> more interesting results with no "strange" numbers;
> I tried that test a few hours ago.
> > > I know I can disable APM from both the kernel and the BIOS but I'd
> > > like to test the kernel I use in "daily" usage. What do you
> > > think about it? Would you suggest I use a different
> > > configuration when I run the test?

Disable power management. What you are doing is a test of the power management
subsystem, I believe; that's okay, but you did not label it as such.

Pavel
--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

2002-09-09 17:07:03

by Imran Badr

Subject: RE: Calculating kernel logical address ..


So, what do you gurus suggest I do? How can I get the physical address of a
user buffer (which was originally mmap()'ed from a kmalloc() allocation) in a
way that is also portable across multiple platforms?

Thanks.
Imran.





2002-09-09 17:22:45

by Richard B. Johnson

Subject: RE: Calculating kernel logical address ..

On Mon, 9 Sep 2002, Imran Badr wrote:

>
> So, what do you gurus suggest I do? How can I get the physical address of a
> user buffer (which was originally mmap()'ed from a kmalloc() allocation) in a
> way that is also portable across multiple platforms?
>
> Thanks.
> Imran.

I think there is a virt_to_bus() macro and its inverse. The 'bus' address
is what you need to give to bus-masters that do DMA. This is different
than virt_to_phys(), which happens to be the same on some platforms
but would not be the same on those, like PPC (Motorola), which have
separate address spaces for different things (RAM, I/O, etc).
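
To make the bus/physical distinction concrete, here is a small model in which a device sees RAM at the physical address plus a fixed per-platform offset. The offsets are assumptions for illustration: zero where bus and physical addresses coincide (as on typical i386 PCI), and a hypothetical nonzero window elsewhere. Real platforms can be more complicated (IOMMUs, several windows), so this is only a sketch of the idea.

```c
#include <stdint.h>

/* Hypothetical description of how a platform exposes RAM to bus masters. */
struct platform_model {
    uint64_t bus_offset;   /* added to a physical address to get a bus address */
};

static const struct platform_model identity_platform = { 0 };
/* Placeholder value for a platform whose PCI window onto RAM is shifted. */
static const struct platform_model offset_platform = { 0x80000000ULL };

/* The address a bus-master device must be programmed with. */
static uint64_t phys_to_bus_model(const struct platform_model *p, uint64_t pa)
{
    return pa + p->bus_offset;
}

static uint64_t bus_to_phys_model(const struct platform_model *p, uint64_t ba)
{
    return ba - p->bus_offset;
}
```

Handing the device a physical address where a bus address is expected works on the first platform and silently points the DMA at the wrong memory on the second.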

Isn't this what you want?

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-09-09 17:32:25

by Martin J. Bligh

Subject: Re: Calculating kernel logical address ..

> If you meant virt_to_phys(), this does not work on arbitrary
> kernel virtual addresses either, only direct-mapped ones
> (i.e. kmalloc() or get_free_page() data).

Though maybe it should ... ;-) This doesn't seem like an
impossible modification, and would be most useful. If people
are concerned about speed, we could create __virt_to_phys
if you knew it was direct mapped ...
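
One way to read this suggestion, sketched as a user-space model: a checked virt_to_phys() that takes a fast path for direct-mapped addresses and falls back to a page-table lookup for everything else (for example vmalloc() space), plus an unchecked helper in the spirit of the proposed __virt_to_phys for callers that already know the address is direct-mapped. The address layout, the flat lookup table and all names are hypothetical simplifications, not kernel structures.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT    12
#define MODEL_PAGE_SIZE     (1ULL << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_OFFSET   0xC0000000ULL     /* start of the direct map */
#define MODEL_DIRECT_BYTES  (896ULL << 20)    /* size of the direct map */
#define MODEL_VMALLOC_START (MODEL_PAGE_OFFSET + MODEL_DIRECT_BYTES)
#define MODEL_VMALLOC_PAGES 4
#define MODEL_BAD_ADDR      UINT64_MAX

/* Toy page table for the remapped area: one frame number per page,
 * 0 meaning "not mapped". */
static const uint64_t vmalloc_pfn[MODEL_VMALLOC_PAGES] = {
    0x12345, 0, 0x00400, 0x9ABCD
};

/* Fast path: the caller promises va is direct-mapped. */
static uint64_t virt_to_phys_fast_model(uint64_t va)
{
    return va - MODEL_PAGE_OFFSET;
}

/* Checked path: direct map first, then the page table.
 * Returns MODEL_BAD_ADDR for unmapped addresses. */
static uint64_t virt_to_phys_checked_model(uint64_t va)
{
    if (va >= MODEL_PAGE_OFFSET && va < MODEL_VMALLOC_START)
        return virt_to_phys_fast_model(va);

    if (va >= MODEL_VMALLOC_START) {
        uint64_t idx = (va - MODEL_VMALLOC_START) >> MODEL_PAGE_SHIFT;
        if (idx < MODEL_VMALLOC_PAGES && vmalloc_pfn[idx] != 0)
            return (vmalloc_pfn[idx] << MODEL_PAGE_SHIFT)
                   | (va & (MODEL_PAGE_SIZE - 1));
    }
    return MODEL_BAD_ADDR;
}
```

The cost of generality is the extra range check and, off the direct map, a table lookup, which is why keeping a separate unchecked fast path is attractive.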

M.

2002-09-09 17:31:03

by Imran Badr

Subject: RE: Calculating kernel logical address ..


The virt_to_bus() macro would work only for kernel logical addresses. I am
trying to find a portable way to figure out the kernel logical address of a
user buffer so that I could use virt_to_bus() for DMA. The user address is
mmap'ed from kmalloc'ed buffer in the mmap() entry of my driver. Now when
the user wants to send this data to the PCI device, it makes an ioctl call
and gives the user address to the driver. Now the driver has to figure out the
kernel logical address for DMA.
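
Because the driver itself created this user mapping with remap_page_range(), one portable alternative to walking page tables is plain offset arithmetic: record the kernel buffer and the vma's start address at mmap() time, then translate an ioctl() address by its distance from vm_start after a bounds check. The sketch below models this in user space; the struct and function names are hypothetical, and a real driver would still have to handle processes with several mappings or none.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mapping record a driver could save in its mmap() handler. */
struct drv_mapping {
    uintptr_t      user_start;  /* vma->vm_start in the real driver */
    size_t         len;         /* vma->vm_end - vma->vm_start */
    unsigned char *kbuf;        /* page-aligned kernel buffer behind it */
};

/* Translate [uaddr, uaddr + count) to a kernel pointer, or NULL if the
 * range does not lie entirely inside the mapping. */
static unsigned char *user_to_kernel_model(const struct drv_mapping *m,
                                           uintptr_t uaddr, size_t count)
{
    if (uaddr < m->user_start)
        return NULL;

    size_t off = (size_t)(uaddr - m->user_start);
    if (off > m->len || count > m->len - off)
        return NULL;            /* starts or ends outside the mapping */

    return m->kbuf + off;
}
```

The pointer this yields lies inside the original kmalloc() buffer, so it is a direct-mapped kernel address of the kind the DMA-mapping interfaces accept.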

Thanks,
Imran.



2002-09-09 17:27:02

by David Miller

Subject: Re: Calculating kernel logical address ..

From: "Richard B. Johnson" <[email protected]>
Date: Mon, 9 Sep 2002 13:29:42 -0400 (EDT)

> I think there is a virt_to_bus() macro and its inverse.

Which is deprecated and not to be used by any new code.
Use Documentation/DMA-mapping.txt instead.

If you meant virt_to_phys(), this does not work on arbitrary
kernel virtual addresses either, only direct-mapped ones
(i.e. kmalloc() or get_free_page() data).

2002-09-09 17:53:26

by Richard B. Johnson

Subject: RE: Calculating kernel logical address ..

On Mon, 9 Sep 2002, Imran Badr wrote:

>
> The virt_to_bus() macro would work only for kernel logical addresses. I am
> trying to find a portable way to figure out the kernel logical address of a
> user buffer so that I could use virt_to_bus() for DMA. The user address is
> mmap'ed from kmalloc'ed buffer in the mmap() entry of my driver. Now when
> the user wants to send this data to the PCI device, it makes an ioctl call
> and gives the user address to the driver. Now the driver has to figure out the
> kernel logical address for DMA.
>
> Thanks,
> Imran.
>

Well I just read Documentation/DMA-mapping.txt as advised by David
and it seems as though it will no longer be possible to do what
many programmers have been wanting to do, to wit:

(1) In user-code, allocate a buffer.
(2) Lock that buffer into memory.
(3) Call some driver that DMAs data to/from that buffer.
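
Steps (1) and (2) on the user side might look like this POSIX sketch: a page-aligned allocation locked with mlock(). The helper names are hypothetical, and step (3) is omitted because it depends on the driver. Note that mlock() only keeps the pages resident; as later replies in this thread point out, the driver still has to pin the pages itself (with get_user_pages()) before handing them to a device.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Step 1 + 2: allocate len bytes on a page boundary and lock them in RAM.
 * Returns NULL on failure (mlock() may fail with EPERM or ENOMEM when
 * RLIMIT_MEMLOCK is too small). */
static void *alloc_locked_buffer(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    if (page <= 0 || posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    if (mlock(buf, len) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Undo both steps once the driver is finished with the buffer. */
static void free_locked_buffer(void *buf, size_t len)
{
    munlock(buf, len);
    free(buf);
}
```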

Although I have never done this, I have heard that this is what
screen-cards (X-Servers), and audio boards have been doing. Also,
I'm told by some M$xperts that this is what "Direct-X" does. I
don't know anything about the direct-to/from user DMA, as is obvious,
but if that's being closed-off, there may be a problem that's
just beginning.

For some reason, (claimed performance reasons) user-mode code
has to be able to get data directly from hardware with no
intervening copy operation. I think any claimed advantage goes
away when you look at the overhead necessary for user-mode
code to sleep before, and awaken after, the DMA operation but
often marketing departments make those decisions.

So, is it correct that you cannot DMA to/from a user buffer?

Cheers,
Dick Johnson

2002-09-09 18:03:24

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 19:23, David S. Miller wrote:
> From: "Richard B. Johnson" <[email protected]>
> Date: Mon, 9 Sep 2002 13:29:42 -0400 (EDT)
>
> I think there is a virt_to_bus() macro and its inverse.
>
> Which is deprecated and not to be used by any new code.
> Use Documentation/DMA-mapping.txt instead.
>
> If you meant virt_to_phys(), this does not work on arbitrary
> kernel virtual addresses either, only direct-mapped ones
> (i.e. kmalloc() or get_free_page() data).

In this case he starts with a kmalloc, then mmaps it somehow. Imran,
exactly what code do you use to mmap the kmalloced memory?

--
Daniel

2002-09-09 18:04:24

by Andrew Morton

Subject: Re: Calculating kernel logical address ..

Imran Badr wrote:
>
> The virt_to_bus() macro would work only for kernel logical addresses. I am
> trying to find a portable way to figure out the kernel logical address of a
> user buffer so that I could use virt_to_bus() for DMA. The user address is
> mmap'ed from kmalloc'ed buffer in the mmap() entry of my driver. Now when
> the user wants to send this data to the PCI device, it makes an ioctl call
> and gives the user address to the driver. Now the driver has to figure out the
> kernel logical address for DMA.
>

You can obtain this info by walking the user's pagetables with
get_user_pages(). That gives `struct page' pointers, with which
all things are possible.
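
Before calling get_user_pages(), a driver has to turn the user's (address, length) pair into a page-aligned start address and a page count, and remember the offset into the first page. A small sketch of that arithmetic, assuming 4 KB pages (PAGE_SHIFT of 12, as on i386); the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((uintptr_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK  (~(MODEL_PAGE_SIZE - 1))

/* What a caller would derive before pinning a user range. */
struct page_span {
    uintptr_t first_page;   /* page-aligned start address */
    size_t    nr_pages;     /* number of pages the range touches */
    size_t    offset;       /* byte offset of uaddr within first_page */
};

static struct page_span span_for_range(uintptr_t uaddr, size_t len)
{
    struct page_span s;
    uintptr_t first = uaddr >> MODEL_PAGE_SHIFT;
    uintptr_t last  = (uaddr + len - 1) >> MODEL_PAGE_SHIFT;

    s.first_page = uaddr & MODEL_PAGE_MASK;
    s.offset     = (size_t)(uaddr & ~MODEL_PAGE_MASK);
    s.nr_pages   = len ? (size_t)(last - first + 1) : 0;
    return s;
}
```

An unaligned 4096-byte buffer touches two pages, not one, which is the usual off-by-one when sizing the page array.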

2002-09-09 18:10:02

by Jesse Barnes

Subject: Re: Calculating kernel logical address ..

On Mon, Sep 09, 2002 at 02:00:35PM -0400, Richard B. Johnson wrote:
> Well I just read Documentation/DMA-mapping.txt as advised by David
> and it seems as though it will no longer be possible to do what
> many programmers have been wanting to do, to wit:
>
> (1) In user-code, allocate a buffer.
> (2) Lock that buffer into memory.
> (3) Call some driver that DMAs data to/from that buffer.

It looks like drivers/media/video/video-buf.c uses alloc_kiovec() and
map_user_kiobuf() to do it. And I think Ben LaHaise was talking about
removing these functions and creating some other, lightweight
interface for the same purpose? OTOH, it's been a while since I looked
at this stuff, so I'm not sure how it works anymore; I'm sure someone
else could provide more useful info.

Jesse

2002-09-09 18:10:19

by Imran Badr

Subject: RE: Calculating kernel logical address ..

I believe that screen cards and audio drivers do exactly what I am
doing. You do not allocate memory in user space for DMA because that memory
is not guaranteed to be contiguous in physical space. Instead, you call the
mmap() entry point of the driver, and the driver maps kernel memory (allocated
by kmalloc or get_free_pages, or device memory) into the process's space.
Now, the user program can directly access device memory or copy data
directly to that buffer for DMA. This eliminates the copy_from/to_user call,
which could be expensive. I have seen a 30-40% performance improvement on my
i386 system.

But my question here is still begging for an answer: What would be the portable way
to calculate the kernel logical address of that user buffer?

Thanks,
Imran.






2002-09-09 18:10:56

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 20:00, Richard B. Johnson wrote:
> For some reason, (claimed performance reasons) user-mode code
> has to be able to get data directly from hardware with no
> intervening copy operation. I think any claimed advantage goes
> away when you look at the overhead necessary for user-mode
> code to sleep before, and awaken after, the DMA operation but
> often marketing departments make those decisions.

Pfft. Try turning off ide dma and see what happens.

--
Daniel

2002-09-09 18:12:53

by Kurt Ferreira

Subject: RE: Calculating kernel logical address ..

On Mon, 9 Sep 2002, Richard B. Johnson wrote:

> Well I just read Documentation/DMA-mapping.txt as advised by David
> and it seems as though it will no longer be possible to do what
> many programmers have been wanting to do, to wit:
>
> (1) In user-code, allocate a buffer.
> (2) Lock that buffer into memory.
> (3) Call some driver that DMAs data to/from that buffer.
>
Hmm. IIRC (big if), didn't kiobufs allow something similar to this? It
has been a long time since I looked, though.

Kurt

2002-09-09 18:19:46

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 20:12, Imran Badr wrote:
> But my question here is still begging for an answer: What would be the portable way
> to calculate kernel logical address of that user buffer?

Could you please post your code for doing the kmalloc and mmap?

--
Daniel

2002-09-09 18:20:30

by David Miller

Subject: Re: Calculating kernel logical address ..

From: "Richard B. Johnson" <[email protected]>
Date: Mon, 9 Sep 2002 14:00:35 -0400 (EDT)

> Well I just read Documentation/DMA-mapping.txt as advised by David
> and it seems as though it will no longer be possible to do what
> many programmers have been wanting to do, to wit:
>
> (1) In user-code, allocate a buffer.
> (2) Lock that buffer into memory.
> (3) Call some driver that DMAs data to/from that buffer.

Video capture drivers and ALSA layer in 2.5.x kernel do
this perfectly fine. Perhaps you should have a look
at how they handle DMA on PCI.

2002-09-09 18:37:12

by Richard B. Johnson

Subject: Re: Calculating kernel logical address ..

On Mon, 9 Sep 2002, Daniel Phillips wrote:

> On Monday 09 September 2002 20:00, Richard B. Johnson wrote:
> > For some reason, (claimed performance reasons) user-mode code
> > has to be able to get data directly from hardware with no
> > intervening copy operation. I think any claimed advantage goes
> > away when you look at the overhead necessary for user-mode
> > code to sleep before, and awaken after, the DMA operation but
> > often marketing departments make those decisions.
>
> Pfft. Try turning off ide dma and see what happens.

I know that DMA works, I'm talking about DMA direct-to-user
which is not what the file-systems that use DMA do.


Cheers,
Dick Johnson

2002-09-09 18:36:54

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 20:08, Andrew Morton wrote:
> Imran Badr wrote:
> >
> > The virt_to_bus() macro would work only for kernel logical addresses. I am
> > trying to find a portable way to figure out the kernel logical address of a
> > user buffer so that I could use virt_to_bus() for DMA. The user address is
> > mmap'ed from kmalloc'ed buffer in the mmap() entry of my driver. Now when
> > the user wants to send this data to the PCI device, it makes an ioctl call
> > and gives the user address to the driver. Now the driver has to figure out the
> > kernel logical address for DMA.
>
> You can obtain this info by walking the user's pagetables with
> get_user_pages(). That gives `struct page' pointers, with which
> all things are possible.

As long as you can be sure they won't spontaneously vanish on you.

--
Daniel

2002-09-09 18:38:49

by Andrew Morton

Subject: Re: Calculating kernel logical address ..

Jesse Barnes wrote:
>
> On Mon, Sep 09, 2002 at 02:00:35PM -0400, Richard B. Johnson wrote:
> > Well I just read Documentation/DMA-mapping.txt as advised by David
> > and it seems as though it will no longer be possible to do what
> > many programmers have been wanting to do, to wit:
> >
> > (1) In user-code, allocate a buffer.
> > (2) Lock that buffer into memory.
> > (3) Call some driver that DMAs data to/from that buffer.
>
> It looks like drivers/media/video/video-buf.c uses alloc_kiovec() and
> map_user_kiobuf() to do it.

For video-buf.c and for Imran's application, that's just a wrapper
which is used to get at get_user_pages().

It would be best to not use the kiobuf code please - its future is
up in the air. Just use get_user_pages() directly for now.

2002-09-09 18:39:06

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 20:13, Jesse Barnes wrote:
> On Mon, Sep 09, 2002 at 02:00:35PM -0400, Richard B. Johnson wrote:
> > Well I just read Documentation/DMA-mapping.txt as advised by David
> > and it seems as though it will no longer be possible to do what
> > many programmers have been wanting to do, to wit:
> >
> > (1) In user-code, allocate a buffer.
> > (2) Lock that buffer into memory.
> > (3) Call some driver that DMAs data to/from that buffer.
>
> It looks like drivers/media/video/video-buf.c uses alloc_kiovec() and
> map_user_kiobuf() to do it. And I think Ben LaHaise was talking about
> removing these functions and creating some other, lightweight
> interface for the same purpose?

Hopefully. My understanding is that kio is obsoleted by bio and aio,
anyone want to confirm/deny this?

--
Daniel

2002-09-09 18:41:18

by Imran Badr

Subject: RE: Calculating kernel logical address ..



-----Original Message-----
From: Daniel Phillips [mailto:[email protected]]
Sent: Monday, September 09, 2002 11:27 AM
To: [email protected]; [email protected]
Cc: 'David S. Miller'; [email protected]
Subject: Re: Calculating kernel logical address ..


On Monday 09 September 2002 20:12, Imran Badr wrote:
> But my question here is still begging for an answer: What would be the portable way
> to calculate the kernel logical address of that user buffer?

>Could you please post your code for doing the kmalloc and mmap?
>
>--
>Daniel


Sure, in mmap():

size = vma->vm_end - vma->vm_start;
if (size % PAGE_SIZE)
{
    printk(KERN_CRIT "mmap: size (%ld) not multiple of PAGE_SIZE.\n", size);
    return -ENXIO;
}

offset = vma->vm_pgoff << PAGE_SHIFT;
if (offset & ~PAGE_MASK)
{
    printk(KERN_CRIT "mmap: offset (%ld) not aligned.\n", offset);
    return -ENXIO;
}

kmalloc_ptr = (Uint8 *)kmalloc(size + (2 * PAGE_SIZE), GFP_KERNEL);
if (kmalloc_ptr == NULL)
{
    printk(KERN_CRIT "mmap: not enough memory.\n");
    return -ENOMEM;
}

/* align it to page boundary */
kmalloc_area = (Uint8 *)(((Uint32)kmalloc_ptr + PAGE_SIZE - 1) & PAGE_MASK);

/* reserve all pages */
for (virt_addr = (Uint32)kmalloc_area;
     virt_addr < (Uint32)kmalloc_area + size;
     virt_addr += PAGE_SIZE)
{
    mem_map_reserve(virt_to_page(virt_addr));
}

/* lock the area */
vma->vm_flags |= VM_LOCKED;

if (remap_page_range(vma->vm_start,
                     virt_to_phys((void *)(Uint32)kmalloc_area),
                     size,
                     PAGE_SHARED))
{
    printk(KERN_CRIT "mmap: remap page range failed.\n");
    return -ENXIO;
}

vma->vm_ops = &pkp_vma_ops;
vma->vm_private_data = kmalloc_ptr;
return 0;
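
The alignment line above casts the pointer through Uint32, which (assuming Uint32 is a 32-bit type, as its name suggests) truncates addresses on 64-bit platforms. A portable form of the same round-up arithmetic uses uintptr_t, as in this user-space sketch; align_up_ptr and alloc_aligned_area are hypothetical helpers. It also shows that one extra page of slack is enough, since rounding up never skips more than PAGE_SIZE - 1 bytes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Round p up to the next multiple of align (a power of two).
 * uintptr_t is wide enough for a pointer on 32- and 64-bit targets. */
static void *align_up_ptr(void *p, uintptr_t align)
{
    uintptr_t v = (uintptr_t)p;
    return (void *)((v + align - 1) & ~(align - 1));
}

/* Allocate size bytes starting on an align boundary.
 * *raw receives the pointer that must later be passed to free(). */
static void *alloc_aligned_area(size_t size, uintptr_t align, void **raw)
{
    *raw = malloc(size + align);    /* one unit of slack suffices */
    if (*raw == NULL)
        return NULL;
    return align_up_ptr(*raw, align);
}
```

In the driver itself, allocating with __get_free_pages() would avoid the manual alignment altogether, since it returns whole, page-aligned pages.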



This works just fine on my i386 platform (SMP, non-SMP). Now in my ioctl()
entry I get the kernel logical address by using the following code:

adr = user_address;
pgd = pgd_offset(current->mm, adr);
if (!pgd_none(*pgd)) {
    pmd = pmd_offset(pgd, adr);
    if (!pmd_none(*pmd)) {
        ptep = pte_offset(pmd, adr);
        pte = *ptep;
        if (pte_present(pte)) {
            kaddr = (unsigned long) page_address(pte_page(pte));
            kaddr |= (adr & (PAGE_SIZE - 1));
        }
    }
}

Now for DMA, I get the bus address by using virt_to_bus(kaddr). So, is there any
portability issue in this scheme?
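
For reference, the walk in the ioctl() code above splits the address into per-level indices. The model below mimics the classic two-level i386 layout without PAE (10 bits of directory index, 10 bits of table index, 12 bits of offset; the pmd level is folded away) using toy in-memory tables. It is a user-space illustration with hypothetical names, not kernel code, and it does none of the locking discussed later in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PGDIR_SHIFT 22      /* bits 31..22: page-directory index */
#define TOY_PAGE_SHIFT  12      /* bits 11..0: offset within the page */
#define TOY_PTRS        1024    /* entries per table (10 index bits) */
#define TOY_PRESENT     0x1u    /* "present" bit in an entry */
#define TOY_OFFSET_MASK ((1u << TOY_PAGE_SHIFT) - 1)

typedef uint32_t toy_pte_t;     /* frame address | flag bits */

/* Toy address space: one page directory; a NULL slot plays pgd_none(). */
struct toy_mm {
    toy_pte_t *pgd[TOY_PTRS];
};

/* Walk directory -> table -> entry, mirroring pgd_offset()/pte_offset(). */
static bool toy_walk(const struct toy_mm *mm, uint32_t va, uint32_t *pa)
{
    const toy_pte_t *table = mm->pgd[va >> TOY_PGDIR_SHIFT];
    if (table == NULL)
        return false;

    toy_pte_t pte = table[(va >> TOY_PAGE_SHIFT) & (TOY_PTRS - 1)];
    if (!(pte & TOY_PRESENT))
        return false;

    /* Frame base from the entry, offset from the virtual address. */
    *pa = (pte & ~TOY_OFFSET_MASK) | (va & TOY_OFFSET_MASK);
    return true;
}
```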

Thanks,
Imran.






2002-09-09 18:47:29

by Imran Badr

Subject: RE: Calculating kernel logical address ..



-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Daniel Phillips
Sent: Monday, September 09, 2002 11:23 AM
To: Andrew Morton; [email protected]
Cc: [email protected]; 'David S. Miller';
[email protected]
Subject: Re: Calculating kernel logical address ..


On Monday 09 September 2002 20:08, Andrew Morton wrote:
> Imran Badr wrote:
> >
> > The virt_to_bus() macro would work only for kernel logical addresses. I am
> > trying to find a portable way to figure out the kernel logical address of a
> > user buffer so that I could use virt_to_bus() for DMA. The user address is
> > mmap'ed from kmalloc'ed buffer in the mmap() entry of my driver. Now when
> > the user wants to send this data to the PCI device, it makes an ioctl call
> > and gives the user address to the driver. Now the driver has to figure out the
> > kernel logical address for DMA.
>
> You can obtain this info by walking the user's pagetables with
> get_user_pages(). That gives `struct page' pointers, with which
> all things are possible.

>As long as you can be sure they won't spontaneously vanish on you.

>--
>Daniel



down(&current->mm->mmap_sem) would help.

Imran.




2002-09-09 18:49:45

by David Miller

Subject: Re: Calculating kernel logical address ..

From: "Imran Badr" <[email protected]>
Date: Mon, 9 Sep 2002 11:49:02 -0700

> down(&current->mm->mmap_sem) would help.

Yes and get_user_pages() grabs a reference to all the pages
for you.
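
Why that reference matters can be shown with a toy reference-counted page: a page is only freed when its count drops to zero, so an extra count taken by the pinning code keeps it alive even if the process unmaps it while I/O is pending. Everything here, including the names, is a hypothetical user-space model, not kernel code.

```c
#include <stdbool.h>

/* Toy page: a count of 1 means "referenced only by the mapping". */
struct toy_page {
    int  count;
    bool freed;
};

static void toy_get_page(struct toy_page *p) { p->count++; }

static void toy_put_page(struct toy_page *p)
{
    if (--p->count == 0)
        p->freed = true;        /* last reference gone: page may be reused */
}

/* Pin n pages for I/O: each gets an extra reference. */
static void toy_pin_pages(struct toy_page **pages, int n)
{
    for (int i = 0; i < n; i++)
        toy_get_page(pages[i]);
}

/* Drop the I/O references once the DMA has completed. */
static void toy_unpin_pages(struct toy_page **pages, int n)
{
    for (int i = 0; i < n; i++)
        toy_put_page(pages[i]);
}
```

The invariant is that every pin is matched by exactly one unpin after the device is done; forgetting the unpin leaks the page, and unpinning early reopens the race.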

2002-09-09 18:50:54

by Richard B. Johnson

Subject: Re: Calculating kernel logical address ..

On Mon, 9 Sep 2002, Daniel Phillips wrote:

> On Monday 09 September 2002 20:43, Richard B. Johnson wrote:
> > On Mon, 9 Sep 2002, Daniel Phillips wrote:
> >
> > > On Monday 09 September 2002 20:00, Richard B. Johnson wrote:
> > > > For some reason, (claimed performance reasons) user-mode code
> > > > has to be able to get data directly from hardware with no
> > > > intervening copy operation. I think any claimed advantage goes
> > > > away when you look at the overhead necessary for user-mode
> > > > code to sleep before, and awaken after, the DMA operation but
> > > > often marketing departments make those decisions.
> > >
> > > Pfft. Try turning off ide dma and see what happens.
> >
> > I know that DMA works, I'm talking about DMA direct-to-user
> > which is not what the file-systems that use DMA do.
>
> The next generation of fast, parallel filesystems relies on dma
> to/from user space. Besides, what do you think happens when you
> read/write a mmap?

You write to some memory that may eventually (or perhaps never) be written to
the underlying device, using whatever I/O method that underlying
device uses, including network.

And, if you are going to DMA direct to/from user-space, you have
a real big performance problem when the user changes a single byte
or a small number of bytes in a file. So your (theoretical) next
generation, as you say, "fast" parallel filesystems won't be doing
this.

Cheers,
Dick Johnson

2002-09-09 18:58:46

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 20:49, Imran Badr wrote:
> > You can obtain this info by walking the user's pagetables with
> > get_user_pages(). That gives `struct page' pointers, with which
> > all things are possible.
>
> >As long as you can be sure they won't spontaneously vanish on you.
>
> >--
> >Daniel
>
> down(&current->mm->mmap_sem) would help.

Not for anon pages, and how do you know whether it's anon or not before
looking at the page, which may be free by the time you look at it?
In other words, mm->page_table_lock is the one, because it's required
for unmapping a pte, and any mapped page will be forced to hold a count
increment until it gets past that lock. Without this lock, the results
of pte_page are unstable.

--
Daniel

2002-09-09 19:04:00

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 20:43, Richard B. Johnson wrote:
> On Mon, 9 Sep 2002, Daniel Phillips wrote:
>
> > On Monday 09 September 2002 20:00, Richard B. Johnson wrote:
> > > For some reason, (claimed performance reasons) user-mode code
> > > has to be able to get data directly from hardware with no
> > > intervening copy operation. I think any claimed advantage goes
> > > away when you look at the overhead necessary for user-mode
> > > code to sleep before, and awaken after, the DMA operation but
> > > often marketing departments make those decisions.
> >
> > Pfft. Try turning off ide dma and see what happens.
>
> I know that DMA works, I'm talking about DMA direct-to-user
> which is not what the file-systems that use DMA do.

The next generation of fast, parallel filesystems relies on dma
to/from user space. Besides, what do you think happens when you
read/write a mmap?

--
Daniel

2002-09-09 19:10:23

by Andrew Morton

Subject: Re: Calculating kernel logical address ..

Daniel Phillips wrote:
>
> ...
> > down(&current->mm->mmap_sem) would help.
>
> Not for anon pages, and how do you know whether it's anon or not before
> looking at the page, which may be free by the time you look at it?
> In other words, mm->page_table_lock is the one, because it's required
> for unmapping a pte, and any mapped page will be forced to hold a count
> increment until it gets past that lock. Without this lock, the results
> of pte_page are unstable.

The caller of get_user_pages() needs to hold mmap_sem for reading
to prevent the vmas from going away. get_user_pages() does the
right thing wrt page_table_lock. (As a quick peek at the code
would reveal...)
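
A sketch of the check a driver would typically make while holding mmap_sem for reading, before calling get_user_pages(): the whole user range must fall inside one mapping that has the needed permission. The sorted-array "vma list" and all names are hypothetical user-space stand-ins; in the kernel the lookup would use find_vma() on current->mm, which likewise returns the first area ending above the address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VM_READ  0x1u
#define TOY_VM_WRITE 0x2u

struct toy_vma {
    uintptr_t start, end;   /* [start, end), like vm_start/vm_end */
    unsigned  flags;
};

/* Like find_vma() on a sorted list: first area whose end lies above addr. */
static const struct toy_vma *toy_find_vma(const struct toy_vma *v, size_t n,
                                          uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr < v[i].end)
            return &v[i];
    return NULL;
}

/* True if [addr, addr + len) lies inside one area that allows 'need'.
 * In a driver this would run with mmap_sem held for reading. */
static bool toy_range_ok(const struct toy_vma *v, size_t n,
                         uintptr_t addr, size_t len, unsigned need)
{
    const struct toy_vma *vma = toy_find_vma(v, n, addr);

    if (vma == NULL || addr < vma->start || len == 0)
        return false;           /* unmapped, in a gap, or empty */
    if (len > vma->end - addr)
        return false;           /* runs past the end of the area */
    return (vma->flags & need) == need;
}
```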

2002-09-09 19:21:17

by Imran Badr

Subject: RE: Calculating kernel logical address ..



-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Andrew Morton
Sent: Monday, September 09, 2002 12:15 PM
To: Daniel Phillips
Cc: [email protected]; [email protected]; 'David S. Miller';
[email protected]
Subject: Re: Calculating kernel logical address ..


Daniel Phillips wrote:
>
> ...
> > down(&current->mm->mmap_sem) would help.
>
> Not for anon pages, and how do you know whether it's anon or not before
> looking at the page, which may be free by the time you look at it?
> In other words, mm->page_table_lock is the one, because it's required
> for unmapping a pte, and any mapped page will be forced to hold a count
> increment until it gets past that lock. Without this lock, the results
> of pte_page are unstable.

>The caller of get_user_pages() needs to hold mmap_sem for reading
>to prevent the vmas from going away. get_user_pages() does the
>right thing wrt page_table_lock. (As a quick peek at the code
>would reveal...)


So, I am hearing that get_user_pages() is the right choice for me. BTW, did
anybody take a look at the code snippet that I posted earlier? That code
mmap's kmalloc'ed memory to process space and then in the ioctl call, I
calculate kernel logical address.
Please have a look and advise any portability issue.

Thanks,
Imran.


2002-09-09 19:36:27

by Andrew Morton

Subject: Re: Calculating kernel logical address ..

Daniel Phillips wrote:
>
> On Monday 09 September 2002 20:13, Jesse Barnes wrote:
> > On Mon, Sep 09, 2002 at 02:00:35PM -0400, Richard B. Johnson wrote:
> > > Well I just read Documentation/DMA-mapping.txt as advised by David
> > > and it seems as though it will no longer be possible to do what
> > > many programmers have been wanting to do, to wit:
> > >
> > > (1) In user-code, allocate a buffer.
> > > (2) Lock that buffer into memory.
> > > (3) Call some driver that DMAs data to/from that buffer.
> >
> > It looks like drivers/media/video/video-buf.c uses alloc_kiovec() and
> > map_user_kiobuf() to do it. And I think Ben LaHaise was talking about
> > removing these functions and creating some other, lightweight
> > interface for the same purpose?
>
> Hopefully. My understanding is that kio is obsoleted by bio and aio,
> anyone want to confirm/deny this?

Mumble, mutter, dunno.

There are two sides to kiobufs: they can be used as a front-end to
get_user_pages (video-buf.c, Imran's application and at least one
proprietary mpeg streaming driver of which I am aware). And they
can be used as a container for direct IO to a block device (mtdblk.c
and LVM1).

Nobody seems to have come forth to implement a thought-out scatter/gather,
map-user-pages library infrastructure so I'd be a bit reluctant to
break stuff without offering a replacement.

We need a general-purpose "read or write these pages to this blockdev"
library function. For mtdblk, LVM1/LVM2 and probably swapper_space.
With that we can remove the block IO stuff from kiovecs. And convert
the other drivers to use get_user_pages() directly into an ad-hoc private
page array. Those things would allow kiovecs/kiobufs to be retired.
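
The "ad-hoc private page array" usually ends up feeding a scatter/gather list. Below is a user-space sketch, with hypothetical names and 4 KB pages assumed, that turns an array of page frame numbers plus a starting offset and length into (physical address, length) segments, merging pages that happen to be physically adjacent. A real driver would pass the pages to the DMA-mapping API rather than use raw physical addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define SG_PAGE_SHIFT 12
#define SG_PAGE_SIZE  ((uint64_t)1 << SG_PAGE_SHIFT)

struct sg_seg {
    uint64_t addr;   /* physical start of the segment */
    uint64_t len;    /* bytes */
};

/* Build segments for len bytes starting offset bytes into pfns[0]
 * (offset must be smaller than the page size). Returns the number of
 * segments written, at most max_segs. */
static size_t build_sg(const uint64_t *pfns, size_t npages, uint64_t offset,
                       uint64_t len, struct sg_seg *segs, size_t max_segs)
{
    size_t nsegs = 0;

    for (size_t i = 0; i < npages && len > 0; i++) {
        uint64_t skip  = (i == 0) ? offset : 0;
        uint64_t start = (pfns[i] << SG_PAGE_SHIFT) + skip;
        uint64_t chunk = SG_PAGE_SIZE - skip;
        if (chunk > len)
            chunk = len;

        if (nsegs > 0 && segs[nsegs - 1].addr + segs[nsegs - 1].len == start) {
            segs[nsegs - 1].len += chunk;     /* physically contiguous: merge */
        } else {
            if (nsegs == max_segs)
                break;                        /* out of segment slots */
            segs[nsegs].addr = start;
            segs[nsegs].len  = chunk;
            nsegs++;
        }
        len -= chunk;
    }
    return nsegs;
}
```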

I guess we need to get more motivated about this, before some large
piece of infrastructure (EVMS/LVM2) lands in the tree using ll_rw_kiovec.

This:

generic_direct_IO(int rw, struct inode *inode, const struct iovec *iov,
loff_t offset, unsigned long nr_segs, get_blocks_t get_blocks)

is getting close to what we need. But it is synchronous, and too
heavyweight for swap I/O.

2002-09-09 20:34:21

by Daniel Phillips

Subject: Re: Calculating kernel logical address ..

On Monday 09 September 2002 21:40, Andrew Morton wrote:
> We need a general-purpose "read or write these pages to this blockdev"
> library function.

I thought bio was supposed to be that. In what way does it not suffice?
Simply because of not having a suitable wrapper?

> For mtdblk, LVM1/LVM2 and probably swapper_space.
> With that we can remove the block IO stuff from kiovecs. And convert
> the other drivers to use get_user_pages() directly into an ad-hoc private
> page array. Those things would allow kiovecs/kiobufs to be retired.

As far as pressing generic_direct_IO into use for this purpose goes, why
not forget about that (crufty looking) layer and sit directly on top of
bio?

--
Daniel

2002-09-09 20:57:51

by Manfred Spraul

[permalink] [raw]
Subject: RE: Calculating kernel logical address ..

>>As long as you can be sure they won't spontaneously vanish on you.
>
>>--
>>Daniel
>>-
>>To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>the body of a message to [email protected]
>>More majordomo info at http://vger.kernel.org/majordomo-info.html
>>Please read the FAQ at http://www.tux.org/lkml/
>
>
>
> down(&current->mm->mmap_sem) would help.
>

Wrong. Acquiring the mmap semaphore does NOT prevent the swapper from
swapping out pages.

Only the page_table_lock prevents the swapper from touching a task.

--
Manfred

2002-09-09 21:05:37

by Imran Badr

[permalink] [raw]
Subject: RE: Calculating kernel logical address ..



-----Original Message-----
From: Manfred Spraul [mailto:[email protected]]
Sent: Monday, September 09, 2002 2:02 PM
To: Imran Badr; [email protected]
Subject: RE: Calculating kernel logical address ..


>>As long as you can be sure they won't spontaneously vanish on you.
>
>>--
>>Daniel
>
>
>
> down(&current->mm->mmap_sem) would help.
>

>Wrong. Acquiring the mmap semaphore does NOT prevent the swapper from
>swapping out pages.

>Only the page_table_lock prevents the swapper from touching a task.

>--
> Manfred


I think you missed the whole context of the discussion. The next step is to
call get_user_pages() which takes appropriate actions to prevent page swaps.

Thanks,
Imran.



2002-09-09 21:14:43

by Manfred Spraul

[permalink] [raw]
Subject: Re: Calculating kernel logical address ..

Andrew Morton wrote:
> Nobody seems to have come forth to implement a thought-out scatter/gather,
> map-user-pages library infrastructure so I'd be a bit reluctant to
> break stuff without offering a replacement.
>

We'd need one.

get_user_pages() is broken if a kernel module accesses the virtual address
of the page and the CPU caches are not coherent:
Most of the flush functions need the vma pointer, but it's impossible to
guarantee that it still exists when the get_user_pages() user calls
page_cache_release().

--
Manfred

2002-09-09 21:43:05

by Andrew Morton

[permalink] [raw]
Subject: Re: Calculating kernel logical address ..

Manfred Spraul wrote:
>
> Andrew Morton wrote:
> > Nobody seems to have come forth to implement a thought-out scatter/gather,
> > map-user-pages library infrastructure so I'd be a bit reluctant to
> > break stuff without offering a replacement.
> >
>
> We'd need one.
>
> get_user_pages() is broken if a kernel module access the virtual address
> of the page and the cpu caches are not coherent:

OK. Most users seem to just want to put the pages under DMA though.

> Most of the flush functions need the vma pointer, but it's impossible to
> guarantee that it still exists when the get_user_pages() user calls
> page_cache_release().

Well presumably, if the driver is altering user memory by hand,
it is synchronous and they can hang onto mmap_sem while doing it?

2002-09-09 21:48:30

by Alan

[permalink] [raw]
Subject: RE: Calculating kernel logical address ..

On Mon, 2002-09-09 at 19:12, Imran Badr wrote:
> But my question here still begging an answer: What would be the portable way
> to calculate kernel logical address of that user buffer?

Who says it even has one? Not all user-allocated pages are even mapped
into the kernel by default. The kiobuf stuff used in 2.4 will do the job
for 2.4. For 2.5 the API will probably look a little different and be a
fair bit faster.
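Alan's point can be made concrete with a little address arithmetic. The sketch below is a userspace toy model, assuming the classic i386 3G/1G layout (PAGE_OFFSET of 0xC0000000 and roughly 896 MB of directly mapped low memory). The toy_* names are hypothetical and are not kernel APIs. A physical page above that limit (highmem) has no permanent kernel virtual address at all, which is why a user page may not "have one".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed i386 defaults for this model, not read from a real kernel. */
#define TOY_PAGE_OFFSET  0xC0000000UL
#define TOY_LOWMEM_LIMIT (896UL << 20)   /* 896 MB of physical RAM */

/* Does this physical address sit in the kernel's linear (lowmem) map? */
static bool toy_has_kernel_mapping(uint32_t phys)
{
    return phys < TOY_LOWMEM_LIMIT;
}

/* Same idea as __va(): only meaningful for lowmem addresses. */
static uint32_t toy_phys_to_virt(uint32_t phys)
{
    assert(toy_has_kernel_mapping(phys));
    return (uint32_t)(phys + TOY_PAGE_OFFSET);
}

/* Same idea as __pa(): the inverse, for linearly mapped addresses. */
static uint32_t toy_virt_to_phys(uint32_t virt)
{
    assert(virt >= TOY_PAGE_OFFSET);
    return (uint32_t)(virt - TOY_PAGE_OFFSET);
}
```

A kmalloc()ed buffer always comes from the linear map, which is why the kmalloc() case discussed next is the easy one.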

2002-09-09 22:50:59

by Imran Badr

[permalink] [raw]
Subject: RE: Calculating kernel logical address ..


But the buffer which I am concerned about was allocated by kmalloc() and
mapped to the process space in mmap(). AFAIK, kmalloc'ed buffers are
guaranteed to be mapped.

Here is how I am doing it:

1) Implement the mmap method in the driver.
2) Allocate memory with kmalloc().
3) Set the reserved bit for all allocated pages.
4) Set the VM_LOCKED flag on the memory area ( vma->vm_flags |= VM_LOCKED; ).
5) Call remap_page_range() to map the physical address of the buffer to the
user space address.
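Step 3 has to cover every page the buffer touches. If the buffer does not start on a page boundary, it spans one more page than size / PAGE_SIZE suggests. A driver of that era would typically walk the range with virt_to_page() and mark each page; the toy below only models the address arithmetic, assuming 4 KB pages, and the toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12                        /* assumed 4 KB pages */
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK  (~(TOY_PAGE_SIZE - 1))

/* Number of pages touched by the byte range [addr, addr + size). */
static size_t toy_pages_spanned(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first = addr & TOY_PAGE_MASK;
    uintptr_t last  = (addr + size - 1) & TOY_PAGE_MASK;
    return (size_t)((last - first) >> TOY_PAGE_SHIFT) + 1;
}

/* Write the page-aligned start of each page in the range to out[],
 * i.e. the pages step 3 would mark reserved. Returns the number of
 * entries written, never more than max. */
static size_t toy_list_pages(uintptr_t addr, size_t size,
                             uintptr_t *out, size_t max)
{
    assert(size > 0);
    size_t n = 0;
    uintptr_t last = (addr + size - 1) & TOY_PAGE_MASK;
    for (uintptr_t p = addr & TOY_PAGE_MASK; p <= last && n < max;
         p += TOY_PAGE_SIZE)
        out[n++] = p;
    return n;
}
```

For example, a 4096-byte buffer starting at 0x1800 touches two pages (0x1000 and 0x2000), and both must be marked.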

I have posted the complete code in a previous post under the same subject.
Now, in the ioctl() method, the user gives me the memory address. I know that it
was mmaped by looking at the ioctl code. Now I have to calculate the physical
address of the user memory address. I know that it was allocated by kmalloc
so it should be mapped. I am currently accessing the process page tables to
find the kernel logical address so that I can use virt_to_bus() to get the bus
address.

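unsigned long adr, kaddr = 0;
pgd_t *pgd;
pmd_t *pmd;
pte_t *ptep, pte;

adr = user_address;
spin_lock(&current->mm->page_table_lock);   /* keep the swapper out during the walk */
pgd = pgd_offset(current->mm, adr);         /* top-level entry for adr */
if (!pgd_none(*pgd)) {
    pmd = pmd_offset(pgd, adr);
    if (!pmd_none(*pmd)) {
        ptep = pte_offset(pmd, adr);        /* 2.4 name for the pte lookup */
        pte = *ptep;
        if (pte_present(pte)) {
            /* kernel address of the page plus the offset within it */
            kaddr = (unsigned long) page_address(pte_page(pte));
            kaddr |= (adr & (PAGE_SIZE - 1));
        }
    }
}
spin_unlock(&current->mm->page_table_lock);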
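The walk above ends by combining the base of the page with the low PAGE_SIZE bits of the user address. A userspace toy model of that translation, assuming the non-PAE i386 layout (two levels, 4 KB pages, the pmd folded into the pgd): the toy_* types and flat arrays are hypothetical stand-ins for pgd_t/pte_t, not kernel structures. The real code then goes through page_address(), which only yields an address for pages that have a kernel mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit address = 10 bits directory index | 10 bits table index | 12 bits offset */
#define TOY_PAGE_SHIFT   12
#define TOY_PAGE_SIZE    (1u << TOY_PAGE_SHIFT)
#define TOY_PTRS_PER_TBL 1024u
#define TOY_PRESENT      0x1u

typedef struct {
    uint32_t entry[TOY_PTRS_PER_TBL];       /* frame base | flag bits */
} toy_pte_table;

typedef struct {
    toy_pte_table *table[TOY_PTRS_PER_TBL]; /* NULL plays the role of pgd_none() */
} toy_pgd;

/* Returns 0 and stores the translated address in *out if the page is
 * present; returns -1 for a missing table or a non-present entry. */
static int toy_translate(const toy_pgd *pgd, uint32_t adr, uint32_t *out)
{
    assert(pgd != NULL && out != NULL);
    uint32_t dir = adr >> 22;
    uint32_t idx = (adr >> TOY_PAGE_SHIFT) & (TOY_PTRS_PER_TBL - 1);
    const toy_pte_table *tbl = pgd->table[dir];

    if (tbl == NULL)
        return -1;
    uint32_t pte = tbl->entry[idx];
    if (!(pte & TOY_PRESENT))
        return -1;
    /* frame base from the entry, offset within the page from adr */
    *out = (pte & ~(TOY_PAGE_SIZE - 1)) | (adr & (TOY_PAGE_SIZE - 1));
    return 0;
}
```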

One suggestion was to use get_user_pages() after taking the appropriate
semaphore, but I have learned that this API is fundamentally broken for
architectures with noncoherent caches. Does anybody have a solution?

Thanks,
Imran.


-----Original Message-----
From: Alan Cox [mailto:[email protected]]
Sent: Monday, September 09, 2002 2:55 PM
To: [email protected]
Cc: [email protected]; 'David S. Miller'; [email protected];
[email protected]
Subject: RE: Calculating kernel logical address ..


On Mon, 2002-09-09 at 19:12, Imran Badr wrote:
> But my question here still begging an answer: What would be the portable way
> to calculate kernel logical address of that user buffer?

Who says it even has one ? Not all user allocated pages are even mapped
into the kernel by default. The kiobuf stuff used in 2.4 will do the job
for 2.4. For 2.5 the API will probably look a little different and be a
fair bit faster


2002-09-09 23:02:34

by Alan

[permalink] [raw]
Subject: RE: Calculating kernel logical address ..

On Mon, 2002-09-09 at 23:52, Imran Badr wrote:
>
> But the buffer which I am concerned about was allocated my kmalloc() and
> mapped to the process space in mmap(). AFAIK, kmalloc'ed buffers are
> guaranteed to be mapped.

In which case it's all nice and easy. You can safely use virt_to_bus on
the buffer in question (although for 2.5 you will need to use the PCI
API). There is a nice clean worked example in each of the PCI sound
drivers. Basically it goes

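addr = kmalloc(size, GFP_KERNEL)
bus_addr = virt_to_bus(addr)      /* address the device uses for DMA */
phys_addr = virt_to_phys(addr)    /* address remap_page_range() expects */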

and then use

remap_page_range(vma->vm_start, phys_addr, size, vma->vm_page_prot)

in the mmap function

2002-09-10 06:39:37

by Jens Axboe

[permalink] [raw]
Subject: Re: Calculating kernel logical address ..

On Mon, Sep 09 2002, Daniel Phillips wrote:
> On Monday 09 September 2002 21:40, Andrew Morton wrote:
> > We need a general-purpose "read or write these pages to this blockdev"
> > library function.
>
> I thought bio was supposed to be that. In what way does it not suffice?
> Simply because of not having a suitable wrapper?

A bio _can_ hold a number of pages, it's just that no one has written
bio_rw_pages() yet. Not that it would be hard...
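Since a single bio carries a bounded number of pages, a bio_rw_pages()-style helper (the name is Jens's; no such function existed at the time) would have to spread a long page array across several bios. A sketch of just that splitting arithmetic, assuming a hypothetical fixed cap of 16 pages per container; real limits depend on the kernel version and the queue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-container page limit, chosen only for illustration. */
#define TOY_MAX_VECS 16u

/* How many bio-like containers are needed for nr_pages pages. */
static size_t toy_nr_containers(size_t nr_pages)
{
    return (nr_pages + TOY_MAX_VECS - 1) / TOY_MAX_VECS;
}

/* Number of pages carried by container idx (0-based). */
static size_t toy_container_pages(size_t nr_pages, size_t idx)
{
    size_t start = idx * TOY_MAX_VECS;
    assert(start < nr_pages);
    size_t left = nr_pages - start;
    return left < TOY_MAX_VECS ? left : TOY_MAX_VECS;
}
```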

--
Jens Axboe

2002-09-10 06:53:09

by Andrew Morton

[permalink] [raw]
Subject: Re: Calculating kernel logical address ..

Jens Axboe wrote:
>
> On Mon, Sep 09 2002, Daniel Phillips wrote:
> > On Monday 09 September 2002 21:40, Andrew Morton wrote:
> > > We need a general-purpose "read or write these pages to this blockdev"
> > > library function.
> >
> > I thought bio was supposed to be that. In what way does it not suffice?
> > Simply because of not having a suitable wrapper?
>
> a bio _can_ hold a number of pages, it's just that noone has written the
> bio_rw_pages() yet. Not that it would be hard...

It's simple if it's synchronous. When I discussed this a while
back with the LVM and EVMS developers the consensus was that an
async API would be better - so we'd need some sort of completion
cookie or callback or whatever.
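One possible shape for that completion cookie or callback, as a userspace sketch: struct toy_page_io, toy_submit() and toy_count_done() are hypothetical names, and the real `struct dio' carries far more state than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical asynchronous request: the submitter fills in a callback
 * and an opaque cookie; the I/O layer calls back when the I/O is done. */
struct toy_page_io {
    size_t nr_pages;
    int error;                               /* 0 on success */
    void (*end_io)(struct toy_page_io *io);  /* completion callback */
    void *cookie;                            /* caller's private data */
};

/* Stand-in for the I/O layer. A real one would queue the request and
 * typically run end_io later from interrupt context; this toy completes
 * immediately so the control flow is easy to follow. */
static void toy_submit(struct toy_page_io *io)
{
    assert(io->end_io != NULL);
    io->error = (io->nr_pages == 0) ? -1 : 0;
    io->end_io(io);
}

/* Example completion: bump a counter reached through the cookie. */
static void toy_count_done(struct toy_page_io *io)
{
    int *done = io->cookie;
    if (io->error == 0)
        (*done)++;
}
```

The cookie lets one callback serve many submitters (LVM, EVMS, swap) without any shared global state.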

It would end up with almost as much state as the rather amazing
`struct dio'.

Of course, one could do a synchronous API and see if anyone really,
really complains ;) But a bit of requirements-gathering would be
needed before getting in and coding it.

2002-09-10 07:15:17

by Gerd Knorr

[permalink] [raw]
Subject: Re: Calculating kernel logical address ..

> > It looks drivers/media/video/video-buf.c uses alloc_kiovec() and
> > map_user_kiobuf() to do it.
>
> For video-buf.c and for Imran's application, that's just a wrapper
> which is used to get at get_user_pages().

My latest video-buf.c version already uses get_user_pages() directly.
Get the latest bttv bits from http://bytesex.org/snapshot/ if you want
to have a look at the code. Will go into 2.5 with the next batch of
video4linux updates.

Gerd

--
You can't please everybody. And usually if you _try_ to please
everybody, the end result is one big mess.
-- Linus Torvalds, 2002-04-20

2002-09-10 16:57:00

by Manfred Spraul

[permalink] [raw]
Subject: Re: Calculating kernel logical address ..

Andrew Morton wrote:
> Manfred Spraul wrote:
>
>>Andrew Morton wrote:
>>
>>>Nobody seems to have come forth to implement a thought-out scatter/gather,
>>>map-user-pages library infrastructure so I'd be a bit reluctant to
>>>break stuff without offering a replacement.
>>>
>>
>>We'd need one.
>>
>>get_user_pages() is broken if a kernel module access the virtual address
>>of the page and the cpu caches are not coherent:
>
>
> OK. Most users seem to just want to put the pages under DMA though.
>
>
>>Most of the flush functions need the vma pointer, but it's impossible to
>>guarantee that it still exists when the get_user_pages() user calls
>>page_cache_release().
>
>
> Well presumably, if the driver is altering user memory by hand,
> it is synchronous and they can hang onto mmap_sem while doing it?
>

That's how it's done right now, and it works, but IMHO it's ugly.
You switch from RAID-1 to RAID-5, and suddenly you might get
inexplicable data corruption with O_DIRECT.

--
Manfred