Bartlomiej Zolnierkiewicz wrote:
> We have struct page of the first page and an offset.
> We need to obtain the struct page of the current page and map it.
Opening this question to a wider audience.
struct scatterlist gives us struct page*, and an offset+length pair.
The struct page* is the _starting_ page of a potentially multi-page run
of data.
The question: how does one get struct page* for the second, and
successive pages in a known-contiguous multi-page run, if one only knows
the first page?
Jeff
> Bartlomiej Zolnierkiewicz wrote:
>> We have struct page of the first page and an offset.
>> We need to obtain the struct page of the current page and map it.
>
>
> Opening this question to a wider audience.
>
> struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
>
> The question: how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?
If it's a higher order allocation, just page+1 should be safe. If it just
happens to be contig, it might cross a discontig boundary, and not obey
that rule. Very unlikely, but possible.
M.
"Martin J. Bligh" <[email protected]> wrote (on Wednesday, October 27, 2004 08:52:39 -0700):
>> Bartlomiej Zolnierkiewicz wrote:
>>> We have struct page of the first page and an offset.
>>> We need to obtain the struct page of the current page and map it.
>>
>>
>> Opening this question to a wider audience.
>>
>> struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
>>
>> The question: how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?
>
> If it's a higher order allocation, just page+1 should be safe. If it just
> happens to be contig, it might cross a discontig boundary, and not obey
> that rule. Very unlikely, but possible.
To repeat what I said in IRC ... ;-)
Actually, you could check this by testing whether the pfns are the same when shifted right by MAX_ORDER-1.
We should be aligned on a MAX_ORDER boundary, I think.
However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
M.
Martin J. Bligh wrote:
>>Bartlomiej Zolnierkiewicz wrote:
>>
>>>We have struct page of the first page and an offset.
>>>We need to obtain the struct page of the current page and map it.
>>
>>
>>Opening this question to a wider audience.
>>
>>struct scatterlist gives us struct page*, and an offset+length pair. The struct page* is the _starting_ page of a potentially multi-page run of data.
>>
>>The question: how does one get struct page* for the second, and successive pages in a known-contiguous multi-page run, if one only knows the first page?
>
>
> If it's a higher order allocation, just page+1 should be safe. If it just
> happens to be contig, it might cross a discontig boundary, and not obey
> that rule. Very unlikely, but possible.
Unfortunately, it's not.
The block layer just tells us "it's a contiguous run of memory", which
implies nothing really about the allocation size.
Bart and I (and others?) essentially need a "page+1" thing (for 2.4.x
too!), that won't break in the face of NUMA/etc.
Alternatively (or additionally), we may need to make sure the block
layer doesn't merge across zones or NUMA boundaries or whatnot.
Jeff
===== include/linux/mm.h 1.193 vs edited =====
--- 1.193/include/linux/mm.h 2004-10-20 04:37:06 -04:00
+++ edited/include/linux/mm.h 2004-10-27 12:33:28 -04:00
@@ -41,6 +41,8 @@
#define MM_VM_SIZE(mm) TASK_SIZE
#endif
+#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
+
/*
* Linux kernel virtual memory manager primitives.
* The idea being to have a "virtual" mm in the same way
> Unfortunately, it's not.
>
> The block layer just tells us "it's a contiguous run of memory", which implies nothing really about the allocation size.
>
> Bart and I (and others?) essentially need a "page+1" thing (for 2.4.x too!), that won't break in the face of NUMA/etc.
>
> Alternatively (or additionally), we may need to make sure the block layer doesn't merge across zones or NUMA boundaries or whatnot.
The latter would be rather more efficient. I don't know how often you
end up doing each operation though ... the page+1 vs the attempted merge.
Depends on the ratio, I guess.
M.
Christoph Hellwig wrote:
>> I think this exposes the wrong level of interface. Just add two helpers,
>> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
>> even if it's bigger than a page.
On Wed, Oct 27, 2004 at 02:33:45PM -0400, Jeff Garzik wrote:
> Why bother mapping anything larger than a page, when none of the users
> need it?
> P.S. In your scheme you would need four helpers; you forgot kmap_sg()
> and kunmap_sg().
This is all a non-issue. To the block layer, in the context of merging,
the page structure represents little more than a physical address,
so the pfn_to_page(page_to_pfn(...) + ...) bits calculate this properly.
There is just nothing interesting going on here. Generate the page
structure for the piece of the segment, kmap_atomic() it, and it's done.
-- wli
Jeff Garzik <[email protected]> wrote:
>
> > However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
>
>
> Is this patch acceptable to everyone? Andrew?
spose so. The scatterlist API is being a bit silly there.
It might be worthwhile doing:
#ifdef CONFIG_DISCONTIGMEM
#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
#else
#define nth_page(page,n) ((page)+(n))
#endif
Jeff Garzik <[email protected]> wrote:
>> However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If
>> rather slower. Is this patch acceptable to everyone? Andrew?
On Wed, Oct 27, 2004 at 02:29:14PM -0700, Andrew Morton wrote:
> spose so. The scatterlist API is being a bit silly there.
> It might be worthwhile doing:
> #ifdef CONFIG_DISCONTIGMEM
> #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
> #else
> #define nth_page(page,n) ((page)+(n))
> #endif
This is actually not quite good enough. Zones are not guaranteed
to have adjacent mem_map[]'s even with CONFIG_DISCONTIGMEM=n. It may
make sense to prevent merging from spanning zones, but frankly the
overhead of pfn_to_page()/page_to_pfn() is negligible compared to the
data movement and (when applicable) virtual windowing, whereas in the
merging code CPU overhead is a greater concern, particularly for
devices that don't require manual data movement.
-- wli
On Wed, Oct 27, 2004 at 02:29:14PM -0700, Andrew Morton wrote:
> spose so. The scatterlist API is being a bit silly there.
Well, it depends on your perspective :)
Each scatterlist entry is supposed to map to a physical segment to be
passed to h/w. Hardware S/G tables just want to see an addr/len pair,
and don't care about machine page size. scatterlist follows a similar
model.
dma_map_sg() and other helpers create a favorable situation, where >90%
of the drivers don't have to care about the VM page-size details.
Unfortunately, those drivers that need to do their own data transfer
(like ATA's PIO, instead of DMA) need direct access to each member of an
s/g list.
Jeff
Christoph Hellwig wrote:
>>To repeat what I said in IRC ... ;-)
>>
>>Actually, you could check this by testing whether the pfns are the same when shifted right by MAX_ORDER-1.
>>We should be aligned on a MAX_ORDER boundary, I think.
>>
>>However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
>
>
> I think this exposes the wrong level of interface. Just add two helpers,
> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
> even if it's bigger than a page.
Why bother mapping anything larger than a page, when none of the users
need it?
Jeff
P.S. In your scheme you would need four helpers; you forgot kmap_sg()
and kunmap_sg().
> To repeat what I said in IRC ... ;-)
>
> Actually, you could check this by testing whether the pfns are the same when shifted right by MAX_ORDER-1.
> We should be aligned on a MAX_ORDER boundary, I think.
>
> However, pfn_to_page(page_to_pfn(page) + 1) might be safer. If rather slower.
I think this exposes the wrong level of interface. Just add two helpers,
kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
even if it's bigger than a page.
Christoph Hellwig wrote:
>> I think this exposes the wrong level of interface. Just add two helpers,
>> kmap_atomic_sg/kunmap_atomic_sg, that guarantee to map/unmap an sg list entry,
>> even if it's bigger than a page.
On Wed, Oct 27, 2004 at 02:33:45PM -0400, Jeff Garzik wrote:
> Why bother mapping anything larger than a page, when none of the users
> need it?
> P.S. In your scheme you would need four helpers; you forgot kmap_sg()
> and kunmap_sg().
The scheme hch suggested is highly invasive in the area of architecture-
specific fixmap layout, and introduces a dependency of the fixmap layout
on the maximum segment size, which may make current typical maximum
segment sizes consume prohibitive amounts of vmalloc space on 32-bit
architectures. So I'd drop that suggestion, though it's not particularly
farfetched.
-- wli