2005-02-01 00:30:52

by Chris Friesen

[permalink] [raw]
Subject: question on symbol exports

It appears that in 2.6.9 the ppc64 version of flush_tlb_page() depends
on two symbols which are not currently exported: the function
__flush_tlb_pending(), and the per-cpu variable ppc64_tlb_batch.

Is there any particular reason why modules should not be allowed to
flush the tlb, or is this an oversight?

Chris


2005-02-01 07:37:01

by Arjan van de Ven

[permalink] [raw]
Subject: Re: question on symbol exports

On Mon, 2005-01-31 at 18:15 -0600, Chris Friesen wrote:
> It appears that in 2.6.9 the ppc64 version of flush_tlb_page() depends
> on two symbols which are not currently exported: the function
> __flush_tlb_pending(), and the per-cpu variable ppc64_tlb_batch.
>
> Is there any particular reason why modules should not be allowed to
> flush the tlb, or is this an oversight?

can you point at the url to your module source? I suspect modules doing
tlb flushes is the wrong thing, but without seeing the source it's hard
to tell.


2005-02-01 15:38:02

by Chris Friesen

[permalink] [raw]
Subject: Re: question on symbol exports

Arjan van de Ven wrote:
> On Mon, 2005-01-31 at 18:15 -0600, Chris Friesen wrote:

>>Is there any particular reason why modules should not be allowed to
>>flush the tlb, or is this an oversight?
>
> can you point at the url to your module source? I suspect modules doing
> tlb flushes is the wrong thing, but without seeing the source it's hard
> to tell.

I've included the relevent code at the bottom. The module will be
released under the GPL.

I've got a module that I'm porting forward from 2.4. The basic idea is
that we want to be able to track pages dirtied by an application. The
system has no swap, so we use the dirty bit to get this information. On
demand we walk the page tables belonging to the process, store the
addresses of any dirty ones, flush the tlb, and mark them clean.

I (obviously) don't have a good understanding of how the tlb interacts
with the software page tables. If we don't need to flush the tlb I'd
love to hear it. If there's an easier way than walking the tables
manually please let me know.

If it matters, some of the dirty pages may be code (it's used by an
emulator for a system that can handle on-the-fly binary patching).

Thanks,

Chris







Note: this code is run while holding &mm->mmap_sem and &mm->page_table_lock.

/* scan through the entire address space given */
dirty_count = 0;
for(addr=start&PAGE_MASK; addr<=end; addr+=PAGE_SIZE) {
pgd_t *pgd;
pmd_t *pmd;
pte_t *ptep, pte;

/* Page table walking code stolen from follow_page() except
* that this version does not support huge tlbs.
*/
pgd = pgd_offset(mm, addr);
if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
continue;

pmd = pmd_offset(pgd, addr);
if (pmd_none(*pmd))
continue;
if (unlikely(pmd_bad(*pmd)))
continue;

ptep = pte_offset_map(pmd, addr);
if (!ptep)
continue;

pte = *ptep;
pte_unmap(ptep);
if (!pte_present(pte))
continue;

if (!pte_dirty(pte))
continue;

if (!pte_read(pte))
continue;

/* We have a user readable dirty page. Count it.*/
dirty_count++;

if (dirty_count > entries) {
continue;
} else {
__put_user(addr, buf);
buf++;
}

flush_tlb_page(find_vma(mm,addr), addr);
pte = pte_mkclean(pte);
}

2005-02-01 15:51:06

by Arjan van de Ven

[permalink] [raw]
Subject: Re: question on symbol exports

On Tue, 2005-02-01 at 09:37 -0600, Chris Friesen wrote:
> Arjan van de Ven wrote:
> > On Mon, 2005-01-31 at 18:15 -0600, Chris Friesen wrote:
>
> >>Is there any particular reason why modules should not be allowed to
> >>flush the tlb, or is this an oversight?
> >
> > can you point at the url to your module source? I suspect modules doing
> > tlb flushes is the wrong thing, but without seeing the source it's hard
> > to tell.
>
> I've included the relevent code at the bottom. The module will be
> released under the GPL.
>
> I've got a module that I'm porting forward from 2.4. The basic idea is
> that we want to be able to track pages dirtied by an application. The
> system has no swap, so we use the dirty bit to get this information. On
> demand we walk the page tables belonging to the process, store the
> addresses of any dirty ones, flush the tlb, and mark them clean.

afaik one doesn't need to do a tlb flush in code that clears the dirty
bit, as long as you use the proper vm functions to do so.
(if those need a tlb flush, those are supposed to do that for you
afaik).

Also note that your code isn't dealing with 4 level pagetables.... And
pagetable walking in drivers is basically almost always a mistake and a
sign that something is wrong.



2005-02-01 17:01:17

by Chris Friesen

[permalink] [raw]
Subject: Re: question on symbol exports

Arjan van de Ven wrote:
> On Tue, 2005-02-01 at 09:37 -0600, Chris Friesen wrote:

>>I've got a module that I'm porting forward from 2.4. The basic idea is
>>that we want to be able to track pages dirtied by an application. The
>>system has no swap, so we use the dirty bit to get this information. On
>>demand we walk the page tables belonging to the process, store the
>>addresses of any dirty ones, flush the tlb, and mark them clean.
>
>
> afaik one doesn't need to do a tlb flush in code that clears the dirty
> bit, as long as you use the proper vm functions to do so.
> (if those need a tlb flush, those are supposed to do that for you
> afaik).

I've been in contact with one of the developers of the code. The reason
we flush the tlb is so that on the next write the cpu has to fault it in
and set the dirty bit. Does that make sense? I should try
experimenting.....

> Also note that your code isn't dealing with 4 level pagetables....

2.6.9 doesn't have 4 level pagetables. We'll have to port it forward
when we eventually upgrade.

> And
> pagetable walking in drivers is basically almost always a mistake and a
> sign that something is wrong.

I'd rather not be doing it, but there's no nice generic va_to_pte()
function. There have been patches, and ppc has one, but as a whole I
don't know of any nice way to do this.

Chris

2005-02-04 20:15:45

by Chris Friesen

[permalink] [raw]
Subject: Re: question on symbol exports

I've added the ppc64 list to the addressees, in case they are interested.


Marcelo Tosatti wrote:
> On Tue, Feb 01, 2005 at 04:50:16PM +0100, Arjan van de Ven wrote:

>>afaik one doesn't need to do a tlb flush in code that clears the dirty
>>bit, as long as you use the proper vm functions to do so.
>>(if those need a tlb flush, those are supposed to do that for you
>>afaik).

> Yep, and "proper VM function" is include/asm-generic/pgtable.h::ptep_clear_flush_dirty(),
> which on PPC flushes the TLB.

It turns out that to call ptep_clear_flush_dirty() on ppc64 from a
module I needed to export the following symbols:

__flush_tlb_pending
ppc64_tlb_batch
hpte_update

>>Also note that your code isn't dealing with 4 level pagetables.... And
>>pagetable walking in drivers is basically almost always a mistake and a
>>sign that something is wrong.

> Or a sign that the core kernel lacks helper functions :)

Absolutely. It'd be so nice if there was a simple va_to_ptep() helper
function available.

Chris

2005-02-05 09:29:42

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: question on symbol exports


> It turns out that to call ptep_clear_flush_dirty() on ppc64 from a
> module I needed to export the following symbols:
>
> __flush_tlb_pending
> ppc64_tlb_batch
> hpte_update

Any reason why you need to call that from a module ? Is the module
GPL'd ?

Ben.


2005-02-07 14:48:48

by Chris Friesen

[permalink] [raw]
Subject: Re: question on symbol exports

Benjamin Herrenschmidt wrote:
>>It turns out that to call ptep_clear_flush_dirty() on ppc64 from a
>>module I needed to export the following symbols:
>>
>>__flush_tlb_pending
>>ppc64_tlb_batch
>>hpte_update
>
>
> Any reason why you need to call that from a module ? Is the module
> GPL'd ?

I explained this at the beginning of the thread, but I'll do so again.
The module will be released under the GPL.

The basic idea is that we want to be able to track pages dirtied by a
userspace process. The system has no swap, so we use the dirty bit for
this. On demand we look up the page tables for an address range
specified by the caller, store the addresses of any dirty pages, then
mark them clean so that the next write causes them to get marked dirty
again. It is this act of marking them clean that requires the
additional exports.

I've included the current code below. If there is any way to accomplish
this without the additional exports, I'd love to hear about it.

Chris








Note: this code is run while holding &mm->mmap_sem and &mm->page_table_lock.


for(addr=start&PAGE_MASK; addr<=end; addr+=PAGE_SIZE) {
pte_t *ptep=0;

ptep = va_to_ptep_map(mm, addr);
if (!ptep)
goto unmap_continue;

if (!pte_dirty(*ptep))
goto unmap_continue;

/* We have a user readable dirty page. Count it.*/
dirty_count++;

if (dirty_count <= entries) {
__put_user(addr, buf);
buf++;
ptep_clear_flush_dirty(find_vma(mm, addr), addr, ptep);

/* Handle option to stop early. */
if ((dirty_count == entries) &&
(options & STOP_WHEN_BUF_FULL))
addr=end+1;
}

unmap_continue:
if (ptep)
pte_unmap(ptep);
}

2005-02-07 21:36:52

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: question on symbol exports

On Mon, 2005-02-07 at 08:44 -0600, Chris Friesen wrote:
> Benjamin Herrenschmidt wrote:
> >>It turns out that to call ptep_clear_flush_dirty() on ppc64 from a
> >>module I needed to export the following symbols:
> >>
> >>__flush_tlb_pending
> >>ppc64_tlb_batch
> >>hpte_update
> >
> >
> > Any reason why you need to call that from a module ? Is the module
> > GPL'd ?
>
> I explained this at the beginning of the thread, but I'll do so again.
> The module will be released under the GPL.
>
> The basic idea is that we want to be able to track pages dirtied by a
> userspace process. The system has no swap, so we use the dirty bit for
> this. On demand we look up the page tables for an address range
> specified by the caller, store the addresses of any dirty pages, then
> mark them clean so that the next write causes them to get marked dirty
> again. It is this act of marking them clean that requires the
> additional exports.
>
> I've included the current code below. If there is any way to accomplish
> this without the additional exports, I'd love to hear about it.

Interesting... more than no swap, you must also make sure you have no
r/w mmap'ed file (which are technically equivalent to swap).

I'm not too fan about exporting those symbols, but I'll talk to paulus,
it should be possible at least to EXPORT_SYMBOL_GPL them...

Ben.


2005-02-07 23:03:24

by Chris Friesen

[permalink] [raw]
Subject: Re: question on symbol exports

Benjamin Herrenschmidt wrote:

> Interesting... more than no swap, you must also make sure you have no
> r/w mmap'ed file (which are technically equivalent to swap).

Ah...thanks for the warning.

We want to eventually make it work with swap as well, but that's
substantially more complicated.

> I'm not too fan about exporting those symbols, but I'll talk to paulus,
> it should be possible at least to EXPORT_SYMBOL_GPL them...

I understand the reluctance. I'm perfectly willing to export it GPL in
my private branch as long as you guys don't consider it evil--the module
is going to be GPL anyways.

The alternative would be for me to build my code directly in to the
kernel...just makes it harder for me to debug.

Chris

2005-02-07 23:43:14

by Dan Malek

[permalink] [raw]
Subject: Re: question on symbol exports


On Feb 7, 2005, at 4:35 PM, Benjamin Herrenschmidt wrote:

> Interesting... more than no swap, you must also make sure you have no
> r/w mmap'ed file (which are technically equivalent to swap).

Yeah, I kinda had a similar thought. Just because you aren't
swapping doesn't mean the VM subsystem isn't looking at dirty bits,
too. It could potentially steal a page that it thinks can be replaced
from either a zero-fill or reading again from persistent storage.


-- Dan

2005-02-08 15:37:08

by Chris Friesen

[permalink] [raw]
Subject: Re: question on symbol exports

Dan Malek wrote:
>
> On Feb 7, 2005, at 4:35 PM, Benjamin Herrenschmidt wrote:
>
>> Interesting... more than no swap, you must also make sure you have no
>> r/w mmap'ed file (which are technically equivalent to swap).
>
>
> Yeah, I kinda had a similar thought. Just because you aren't
> swapping doesn't mean the VM subsystem isn't looking at dirty bits,
> too. It could potentially steal a page that it thinks can be replaced
> from either a zero-fill or reading again from persistent storage.

In our existing case, the app also mlock()s the pages in question. This
should get around these two possible sources of inaccuracy.

Chris