2000-10-27 20:22:08

by Christoph Hellwig

Subject: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

Ok, forgot to Cc linux-kernel ...
Please Cc Linus on reply.

----- Forwarded message from Christoph Hellwig <[email protected]> -----

Date: Fri, 27 Oct 2000 22:03:54 +0200
From: Christoph Hellwig <[email protected]>
To: Linus Torvalds <[email protected]>
Subject: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6
X-Mailer: Mutt 1.0i

Hi Linus,

Stephen Tweedie's last kiobuf patchset contained a lot of bug fixes
besides new features. These bug fixes are not yet merged into 2.4.0.

This patch contains forward-ports of the following fixes
(quote from his 00README):

01-mapfix.diff

map_user_kiobuf() retries failed maps to cover a race in which
the swapper steals a page before the kiobuf has grabbed and
locked it.

02-iocount.diff

Kanoj Sarcar's fixes to allow kiobufs to work properly over
fork(), even on threaded applications.

04-eiofix.diff

Fix to return -EIO instead of 0 if a raw I/O read or write
encounters an error in the first block.

06-enxio.diff

Return ENXIO on a read or write at or beyond the end of the device
for raw I/O.
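
To illustrate the user-visible effect of 06-enxio, here is a rough
user-space sketch (not part of the patch; it assumes /dev/raw1 is bound
to a small block device): a read at or past the end of a raw device now
fails with ENXIO instead of silently returning 0.

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[512];
        int fd = open("/dev/raw1", O_RDONLY);   /* example raw device */

        if (fd < 0)
            return 1;
        /* seek well past the end of the (assumed small) device;
         * offset and size stay sector-aligned as raw I/O requires */
        lseek(fd, 0x10000000, SEEK_SET);
        if (read(fd, buf, sizeof(buf)) < 0 && errno == ENXIO)
            printf("read past end-of-device: ENXIO\n");
        close(fd);
        return 0;
    }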

Please apply.

Christoph

--
Always remember that you are unique. Just like everyone else.


diff -uNr --exclude-from=dontdiff linux.orig/drivers/char/raw.c linux/drivers/char/raw.c
--- linux.orig/drivers/char/raw.c Thu Oct 19 13:21:24 2000
+++ linux/drivers/char/raw.c Tue Oct 24 13:25:47 2000
@@ -277,8 +277,11 @@

if ((*offp & sector_mask) || (size & sector_mask))
return -EINVAL;
- if ((*offp >> sector_bits) > limit)
+ if ((*offp >> sector_bits) > limit) {
+ if (size)
+ return -ENXIO;
return 0;
+ }

/*
* We'll just use one kiobuf
diff -uNr --exclude-from=dontdiff linux.orig/fs/buffer.c linux/fs/buffer.c
--- linux.orig/fs/buffer.c Tue Oct 24 13:15:49 2000
+++ linux/fs/buffer.c Tue Oct 24 13:26:31 2000
@@ -1924,6 +1924,8 @@

spin_unlock(&unused_list_lock);

+ if (!iosize)
+ return -EIO;
return iosize;
}

diff -uNr --exclude-from=dontdiff linux.orig/include/linux/mm.h linux/include/linux/mm.h
--- linux.orig/include/linux/mm.h Tue Oct 24 13:15:56 2000
+++ linux/include/linux/mm.h Tue Oct 24 14:41:46 2000
@@ -157,8 +157,9 @@
wait_queue_head_t wait;
struct page **pprev_hash;
struct buffer_head * buffers;
- void *virtual; /* non-NULL if kmapped */
+ void *virtual; /* non-NULL if kmapped */
struct zone_struct *zone;
+ atomic_t rawcount; /* count of raw io in progress */
} mem_map_t;

#define get_page(p) atomic_inc(&(p)->count)
diff -uNr --exclude-from=dontdiff linux.orig/mm/memory.c linux/mm/memory.c
--- linux.orig/mm/memory.c Tue Oct 24 13:15:58 2000
+++ linux/mm/memory.c Tue Oct 24 16:09:22 2000
@@ -138,6 +138,30 @@
check_pgt_cache();
}

+/*
+ * Establish a new mapping:
+ * - flush the old one
+ * - update the page tables
+ * - inform the TLB about the new one
+ */
+static inline void establish_pte(struct vm_area_struct * vma, unsigned long address,
+ pte_t *page_table, pte_t entry)
+{
+ flush_tlb_page(vma, address);
+ set_pte(page_table, entry);
+ update_mmu_cache(vma, address, entry);
+}
+
+static inline void break_cow(struct vm_area_struct * vma, struct page * old_page,
+ struct page * new_page, unsigned long address,
+ pte_t *page_table)
+{
+ copy_cow_page(old_page,new_page,address);
+ flush_page_to_ram(new_page);
+ flush_cache_page(vma, address);
+ establish_pte(vma, address, page_table, pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot))));
+}
+
#define PTE_TABLE_MASK ((PTRS_PER_PTE-1) * sizeof(pte_t))
#define PMD_TABLE_MASK ((PTRS_PER_PMD-1) * sizeof(pmd_t))

@@ -227,6 +251,22 @@

/* If it's a COW mapping, write protect it both in the parent and the child */
if (cow) {
+ /* Rawio in progress? */
+ if (atomic_read(&ptepage->rawcount)) {
+ /*
+ * If pte is dirty, its a private page,
+ * rawio was initiated by a clone.
+ * For dmain operation, need to break
+ * cow.
+ */
+ if (pte_dirty(pte)) {
+ struct page * new_page = alloc_page(GFP_HIGHUSER);
+ if (!new_page)
+ goto nomem;
+ break_cow(vma, ptepage, new_page, address, dst_pte);
+ goto cont_copy_pte_range;
+ }
+ }
ptep_clear_wrprotect(src_pte);
pte = *src_pte;
}
@@ -382,9 +422,12 @@


/*
- * Do a quick page-table lookup for a single page.
+ * Do a quick page-table lookup for a single page. We have already verified
+ * access type, and done a fault in. But, kswapd might have stolen the page
+ * in the meantime. Return an indication of whether we should retry the fault
+ * in. Writability test is superfluous but conservative.
*/
-static struct page * follow_page(unsigned long address)
+static struct page * follow_page(unsigned long address, int writeacc, int * ret)
{
pgd_t *pgd;
pmd_t *pmd;
@@ -393,10 +436,15 @@
pmd = pmd_offset(pgd, address);
if (pmd) {
pte_t * pte = pte_offset(pmd, address);
- if (pte && pte_present(*pte))
+ if (pte && pte_present(*pte)) {
+ if (writeacc && !pte_write(*pte))
+ goto retry;
return pte_page(*pte);
+ }
}
-
+
+retry:
+ *ret = 1;
return NULL;
}

@@ -428,7 +476,8 @@
struct page * map;
int i;
int datain = (rw == READ);
-
+ int failed;
+
/* Make sure the iobuf is not already mapped somewhere. */
if (iobuf->nr_pages)
return -EINVAL;
@@ -467,23 +516,28 @@
}
if (((datain) && (!(vma->vm_flags & VM_WRITE))) ||
(!(vma->vm_flags & VM_READ))) {
- err = -EACCES;
goto out_unlock;
}
}
+
+faultin:
if (handle_mm_fault(current->mm, vma, ptr, datain) <= 0)
goto out_unlock;
spin_lock(&mm->page_table_lock);
- map = follow_page(ptr);
- if (!map) {
+ map = follow_page(ptr, datain, &failed);
+ if (failed) {
+ /*
+ * Page got stolen before we could lock it down.
+ * Retry.
+ */
spin_unlock(&mm->page_table_lock);
- dprintk (KERN_ERR "Missing page in map_user_kiobuf\n");
- goto out_unlock;
+ goto faultin;
}
map = get_page_map(map);
- if (map)
+ if (map) {
atomic_inc(&map->count);
- else
+ atomic_inc(&map->rawcount);
+ } else
printk (KERN_INFO "Mapped page missing [%d]\n", i);
spin_unlock(&mm->page_table_lock);
iobuf->maplist[i] = map;
@@ -519,6 +573,7 @@
if (map) {
if (iobuf->locked)
UnlockPage(map);
+ atomic_dec(&map->rawcount);
__free_page(map);
}
}
@@ -771,28 +826,6 @@
} while (from && (from < end));
flush_tlb_range(current->mm, beg, end);
return error;
-}
-
-/*
- * Establish a new mapping:
- * - flush the old one
- * - update the page tables
- * - inform the TLB about the new one
- */
-static inline void establish_pte(struct vm_area_struct * vma, unsigned long address, pte_t *page_table, pte_t entry)
-{
- flush_tlb_page(vma, address);
- set_pte(page_table, entry);
- update_mmu_cache(vma, address, entry);
-}
-
-static inline void break_cow(struct vm_area_struct * vma, struct page * old_page, struct page * new_page, unsigned long address,
- pte_t *page_table)
-{
- copy_cow_page(old_page,new_page,address);
- flush_page_to_ram(new_page);
- flush_cache_page(vma, address);
- establish_pte(vma, address, page_table, pte_mkwrite(pte_mkdirty(mk_pte(new_page, vma->vm_page_prot))));
}

/*
diff -uNr --exclude-from=dontdiff linux.orig/mm/page_alloc.c linux/mm/page_alloc.c
--- linux.orig/mm/page_alloc.c Tue Oct 24 13:15:58 2000
+++ linux/mm/page_alloc.c Tue Oct 24 13:36:44 2000
@@ -98,6 +98,8 @@
BUG();
if (PageInactiveClean(page))
BUG();
+ if (atomic_read(&page->rawcount))
+ BUG();

page->flags &= ~(1<<PG_referenced);
page->age = PAGE_AGE_START;
@@ -819,6 +821,7 @@
*/
for (p = lmem_map; p < lmem_map + totalpages; p++) {
set_page_count(p, 0);
+ atomic_set(&(p)->rawcount, 0);
SetPageReserved(p);
init_waitqueue_head(&p->wait);
memlist_init(&p->list);

----- End forwarded message -----

--
Always remember that you are unique. Just like everyone else.


2000-10-30 11:46:15

by Christoph Hellwig

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

On Fri, Oct 27, 2000 at 02:23:04PM -0700, Linus Torvalds wrote:
>
> [...]
>
> That solution, btw, might be as simple as just saying:
>
> - raw IO is based on physical pages, and the COW mapping crated by
> fork() may cause the changes to be visibile to either child or parent
> or both, depending on usage patterns to the page in question. For
> repeatable behaviour, do not have outstanding direct IO in progress
> over a fork().
>
> Ie, just _document_ it. It's not _wrong_, it can just be surprising (but
> it is actually entirely straightforward and sane if you just look at it
> the right way).

Ok, here is an updated patch without that change, but instead with a little
piece of kiobuf documentation that does document this and other things
related to kiobufs.

Christoph

--
Always remember that you are unique. Just like everyone else.


--- linux.orig/drivers/char/raw.c Thu Oct 19 13:21:24 2000
+++ linux/drivers/char/raw.c Sun Oct 29 20:55:43 2000
@@ -277,8 +277,11 @@

if ((*offp & sector_mask) || (size & sector_mask))
return -EINVAL;
- if ((*offp >> sector_bits) > limit)
+ if ((*offp >> sector_bits) > limit) {
+ if (size)
+ return -ENXIO;
return 0;
+ }

/*
* We'll just use one kiobuf
--- linux.orig/fs/buffer.c Fri Oct 27 12:28:40 2000
+++ linux/fs/buffer.c Sun Oct 29 20:55:43 2000
@@ -1924,6 +1924,8 @@

spin_unlock(&unused_list_lock);

+ if (!iosize)
+ return -EIO;
return iosize;
}

--- linux.orig/mm/memory.c Fri Oct 27 12:28:42 2000
+++ linux/mm/memory.c Sun Oct 29 20:56:09 2000
@@ -382,9 +382,12 @@


/*
- * Do a quick page-table lookup for a single page.
+ * Do a quick page-table lookup for a single page. We have already verified
+ * access type, and done a fault in. But, kswapd might have stolen the page
+ * in the meantime. Return an indication of whether we should retry the fault
+ * in. Writability test is superfluous but conservative.
*/
-static struct page * follow_page(unsigned long address)
+static struct page * follow_page(unsigned long address, int writeacc, int * ret)
{
pgd_t *pgd;
pmd_t *pmd;
@@ -393,10 +396,15 @@
pmd = pmd_offset(pgd, address);
if (pmd) {
pte_t * pte = pte_offset(pmd, address);
- if (pte && pte_present(*pte))
+ if (pte && pte_present(*pte)) {
+ if (writeacc && !pte_write(*pte))
+ goto retry;
return pte_page(*pte);
+ }
}
-
+
+retry:
+ *ret = 1;
return NULL;
}

@@ -428,7 +436,8 @@
struct page * map;
int i;
int datain = (rw == READ);
-
+ int failed;
+
/* Make sure the iobuf is not already mapped somewhere. */
if (iobuf->nr_pages)
return -EINVAL;
@@ -467,18 +476,22 @@
}
if (((datain) && (!(vma->vm_flags & VM_WRITE))) ||
(!(vma->vm_flags & VM_READ))) {
- err = -EACCES;
goto out_unlock;
}
}
+
+faultin:
if (handle_mm_fault(current->mm, vma, ptr, datain) <= 0)
goto out_unlock;
spin_lock(&mm->page_table_lock);
- map = follow_page(ptr);
- if (!map) {
+ map = follow_page(ptr, datain, &failed);
+ if (failed) {
+ /*
+ * Page got stolen before we could lock it down.
+ * Retry.
+ */
spin_unlock(&mm->page_table_lock);
- dprintk (KERN_ERR "Missing page in map_user_kiobuf\n");
- goto out_unlock;
+ goto faultin;
}
map = get_page_map(map);
if (map)
diff -uNr linux.orig/Documentation/kiobuf.txt linux/Documentation/kiobuf.txt
--- linux.orig/Documentation/kiobuf.txt Thu Jan 1 01:00:00 1970
+++ linux/Documentation/kiobuf.txt Sun Oct 29 21:38:20 2000
@@ -0,0 +1,100 @@
+ Abstract Kernel IO Buffers
+ Under Linux
+
+ Christoph Hellwig <[email protected]>
+
+
+This document describes the kiobuf concept used in the Linux Kernel
+IO/memory subsystem. It describes its uses, functions working
+with kernel IO buffers, and shows some examples of kiobuf usage.
+
+
+The main reason for implementing kernel IO buffers (by Stephen Tweedie)
+was the lack of raw device support in Linux kernels <= 2.2. Raw devices
+are the character devices that AT&T-derived UNIX versions implement to
+allow character-based, uncached access to mass storage devices. In
+Linux kernels <= 2.2 all blockdevice IO goes either through the buffer-
+or pagecache, so that applications like databases cannot get full
+control over their data.
+
+The solution in Linux 2.3 and higher is that the new raw device driver
+locks down the virtual memory it gets passed by the ->read and ->write
+methods and does physical page io on them, bypassing the caches.
+NOTE: the physical memory referenced by kiobufs does - unlike nearly
+everything else in the Linux memory management - not have reasonable COW
+semantics. So don't even try to fork when doing rawio or using
+user-space memory in kiobufs in any other way.
+
+
+To use iobufs in this way you need to allocate one or more kiobufs (an
+array of kiobufs is called a kiovec - do not confuse those with BSD iovecs).
+
+ err = alloc_kiovec (count, iovec);
+
+This allocates the memory for the wanted number of kiobufs (and adds them
+to a cache) and initializes some variables - in an OO language this would be
+the constructor. Then you force the virtual memory to be faulted in and
+locked into physical memory and reference it by the kiobuf. (NOTE: this must
+be done for each iobuf, not for the whole iovec).
+
+ err = map_user_kiobuf (rw, iobuf, address, len);
+
+After that you request IO against the wanted device. For the case of
+raw devices where IO should be requested against a blockdevice, there
+is a function in fs/buffer.c that does exactly this. (The parameter
+'blocks' is an array of the block numbers the IO should be requested
+against.)
+
+ err = brw_kiovec (rw, count, iovec, dev, blocks, sector_size);
+
+After the IO for this iobuf is done, unmap the virtual memory.
+
+ unmap_kiobuf (iobuf);
+
+And when we are finished with the iovec, free it.
+
+ free_kiovec (count, iovec);
+
+
+Locking down user memory and doing mass storage device IO with it is not
+the only purpose of kiobufs. Another use for kiobufs is allowing
+user space to mmap DMA memory, e.g. in sound drivers. To do so you
+need to lock down kernel virtual memory and reference it using kiobufs.
+The code that does exactly this is not yet in the kernel - get Stephen
+Tweedie's kiobuf patchset if you want to use this.
+
+
+In the long term it looks like all blockdev IO will be done using
+kiobufs. In the SGI XFS tree there is code that allows passing kiovecs
+to the individual low-level block drivers. There are lots of advantages
+of doing it this way: the page cache doesn't need to fit the outstanding
+IO into lots of bufferheads, passing each bufferhead to ll_rw_block()
+where the elevator merges some of them together for better device usage
+and submits them to the drivers. Instead the cache locks down the pages
+and submits the kiovec to the low-level driver. The low-level driver knows
+better how the request should be split for DMA or whatever. On the
+other hand software RAID or LVM get more complicated: instead of just
+doing block-remapping they must split the kiobufs and - in case of LVM -
+find ways to do efficient IO on contiguous areas.
+
+
+
+References:
+
+ Linux Kernel Sourcecode
+ (fs/buffer.c, fs/iobufs.c, mm/memory.c, drivers/char/raw.c)
+
+ SGI XFS for Linux
+ (http://oss.sgi.com/projects/linux-xfs/)
+
+ Stephen Tweedie's kiobuf patchset
+ (ftp://ftp.linux.org.uk/pub/linux/sct/fs/raw-io/)
+
+ Linux MM mailinglist
+ (http://humbolt.geo.uu.nl/Linux-MM/linux-mm.html)
+
+
+Thanks to Arjan van de Ven, Daniel Phillips and Marcelo Tosatti for
+proofreading this document and giving useful hints.
+
+$Id: kiobuf.txt,v 1.2 2000/10/29 20:37:54 hch Exp hch $
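
Putting the calls from the walkthrough above together, a rough sketch of
the whole sequence (not part of the patch; the device, block list and
error handling are placeholders, prototypes as in the 2.4 fs/buffer.c and
mm/memory.c):

    #include <linux/fs.h>
    #include <linux/iobuf.h>

    /* Read 'len' bytes of block IO into the user buffer at 'address'. */
    static int kiobuf_read_example(kdev_t dev, unsigned long address,
                                   size_t len, unsigned long *blocks,
                                   int sector_size)
    {
        struct kiobuf *iobuf;
        int err;

        /* allocate a kiovec consisting of a single kiobuf */
        err = alloc_kiovec(1, &iobuf);
        if (err)
            return err;

        /* fault in and lock down the user pages behind 'address' */
        err = map_user_kiobuf(READ, iobuf, address, len);
        if (err)
            goto out_free;

        /* do the actual block IO against the locked pages */
        err = brw_kiovec(READ, 1, &iobuf, dev, blocks, sector_size);

        /* drop the page references again */
        unmap_kiobuf(iobuf);
    out_free:
        free_kiovec(1, &iobuf);
        return err;
    }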

2000-10-30 17:19:45

by Jeff Garzik

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

Christoph Hellwig wrote:
> +Locking down user memory and doing mass storage device IO with it is not
> +the only purpose of kiobufs. Another use for kiobufs is allowing
> +user-space mmaping dma memory, e.g in sound drivers. To do so you
> +need to lock-down kernel virtual memory and refernece it using kiobufs.
> +The code that does exactly this is not yet in the kernel - get Stephen
> +Tweedie's kiobuf patchset if you want to use this.

Take a look at drivers/sound/via82cxxx_audio.c. How can that mmap be
improved by using kiobufs?

It seems like there is less overhead to mmap(2) DMA memory the way I do
it currently -- without kiobufs...

Honestly interested,

Jeff


--
Jeff Garzik | "Mind if I drive?" -Sam
Building 1024 | "Not if you don't mind me clawing at the
MandrakeSoft | dash and shrieking like a cheerleader."
| -Max

2000-10-30 18:18:15

by Christoph Hellwig

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

On Mon, Oct 30, 2000 at 12:19:21PM -0500, Jeff Garzik wrote:
> Christoph Hellwig wrote:
> > +Locking down user memory and doing mass storage device IO with it is not
> > +the only purpose of kiobufs. Another use for kiobufs is allowing
> > +user-space mmaping dma memory, e.g in sound drivers. To do so you
> > +need to lock-down kernel virtual memory and refernece it using kiobufs.
> > +The code that does exactly this is not yet in the kernel - get Stephen
> > +Tweedie's kiobuf patchset if you want to use this.
>
> Take a look at drivers/sound/via82cxxx_audio.c. How can that mmap be
> improved by using kiobufs?

I think so - but you need Stephen's kvmap patch, which is in the same
patchset as the forward-ported fixes
(at ftp://ftp.linux.org.uk/pub/linux/sct/fs/raw-io/)

A very nice example is included.

Christoph

--
Always remember that you are unique. Just like everyone else.

2000-10-30 18:57:21

by Jeff Garzik

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

Christoph Hellwig wrote:
> On Mon, Oct 30, 2000 at 12:19:21PM -0500, Jeff Garzik wrote:
> > Take a look at drivers/sound/via82cxxx_audio.c. How can that mmap be
> > improved by using kiobufs?
>
> I think so - but you need Stephen's kvmap patch, that is in the same
> patchset the forward-ported fixes are
> (at ftp://ftp.linux.org.uk/pub/linux/sct/fs/raw-io/)
>
> An very nice example is included.

Seen it, re-read my question...

I keep seeing "audio drivers' mmap" used as a specific example of a place
that would benefit from kiobufs. The current via audio mmap looks quite
a bit like mmap_kiobuf and its support code... except without all the
kiobuf overhead.

My question from above is: how can the via audio mmap in test10-preXX
be improved by using kiobufs? I am not a kiobuf expert, but AFAICS a
non-kiobuf implementation is better for audio drivers. (and the via
audio mmap implementation is what some other audio drivers are about to
start using...)

I can clearly see that many applications will find kiobufs quite useful
(learned another from alan just now...), but I do not see that audio
drivers can benefit from kiobufs at all. Corrections on this fact are
requested, as I am hacking audio drivers right now and want to make sure
I pick the best course of action for the long term.

Regards,

Jeff


--
Jeff Garzik | "Mind if I drive?" -Sam
Building 1024 | "Not if you don't mind me clawing at the
MandrakeSoft | dash and shrieking like a cheerleader."
| -Max

2000-10-30 19:45:03

by Christoph Hellwig

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

On Mon, Oct 30, 2000 at 01:56:07PM -0500, Jeff Garzik wrote:
> My question from above is: how can the via audio mmap in test10-preXX
> be improved by using kiobufs? I am not a kiobuf expert, but AFAICS a
> non-kiobuf implementation is better for audio drivers. (and the via
> audio mmap implementation is what some other audio drivers are about to
> start using...)

I think the biggest advantage is that you actually get the list of pages
when you perform the mmap instead of doing virt_to_page on every ->nopage.
That should speed up the operations on the mmap'ed area a bit.

The other strong argument for the kiobuf solution is code-sharing. Instead
of having every (sound) driver playing with the vm, there is one central
place when you use kvmaps.
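
Roughly like this (only a sketch of the idea, not code from either
driver; the example_dev structure and its iobuf member are made up) -
the nopage handler just indexes the page list that was built when the
area was set up:

    #include <linux/mm.h>
    #include <linux/iobuf.h>

    struct example_dev {                /* made-up driver state */
        struct kiobuf *iobuf;           /* pages collected at mmap time */
    };

    /* 2.4-style nopage handler: no virt_to_page() per fault, just an
     * index into the precomputed maplist[]. */
    static struct page *example_nopage(struct vm_area_struct *vma,
                                       unsigned long address, int write)
    {
        struct example_dev *dev = vma->vm_private_data;
        unsigned long pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
        struct page *page;

        if (pgoff >= dev->iobuf->nr_pages)
            return NULL;                /* outside the buffer */

        page = dev->iobuf->maplist[pgoff];
        get_page(page);                 /* extra reference for the mapping */
        return page;
    }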

Christoph

--
Always remember that you are unique. Just like everyone else.

2000-10-30 20:09:31

by Jeff Garzik

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

Index: drivers/sound/via82cxxx_audio.c
===================================================================
RCS file: /cvsroot/gkernel/linux_2_4/drivers/sound/via82cxxx_audio.c,v
retrieving revision 1.1.1.6.4.1
diff -u -r1.1.1.6.4.1 via82cxxx_audio.c
--- drivers/sound/via82cxxx_audio.c 2000/10/27 08:21:41 1.1.1.6.4.1
+++ drivers/sound/via82cxxx_audio.c 2000/10/30 19:57:21
@@ -226,6 +226,7 @@
struct via_sgd_data {
dma_addr_t handle;
void *cpuaddr;
+ struct page *page;
};


@@ -626,6 +627,7 @@

if (!chan->sgbuf[i].cpuaddr)
goto err_out_nomem;
+ chan->sgbuf[i].page = virt_to_page (chan->sgbuf[i].cpuaddr);

if (i < (VIA_DMA_BUFFERS - 1))
chan->sgtable[i].count = cpu_to_le32 (VIA_DMA_BUF_SIZE | VIA_FLAG);
@@ -722,6 +724,7 @@
chan->sgbuf[i].handle);
chan->sgbuf[i].cpuaddr = NULL;
chan->sgbuf[i].handle = 0;
+ chan->sgbuf[i].page = NULL;
}

if (chan->sgtable) {
@@ -1717,9 +1720,11 @@
} else if (!wr)
chan = &card->ch_in;

+ assert (chan->sgbuf[pgoff].cpuaddr != NULL);
+ assert (chan->sgbuf[pgoff].page != NULL);
assert ((((unsigned long)chan->sgbuf[pgoff].cpuaddr) % PAGE_SIZE) == 0);

- dmapage = virt_to_page (chan->sgbuf[pgoff].cpuaddr);
+ dmapage = chan->sgbuf[pgoff].page;
DPRINTK ("EXIT, returning page %p for cpuaddr %lXh\n",
dmapage, (unsigned long) chan->sgbuf[pgoff].cpuaddr);
get_page (dmapage);


Attachments:
via.patch (1.35 kB)

2000-10-30 20:33:24

by Christoph Hellwig

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

On Mon, Oct 30, 2000 at 03:08:31PM -0500, Jeff Garzik wrote:
> Actually, I wonder if its even possible for mmap_kiobuf to support audio
> -- full duplex requires that both record and playback buffer(s),
> theoretically two separate sets of kiobufs, to be presented as one space
> (with playback always presented before record).

kvmaps take kiovecs, which are multiple kiobufs ...

Christoph

--
Always remember that you are unique. Just like everyone else.

2000-10-30 21:52:51

by Jeff Garzik

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

Christoph Hellwig wrote:
>
> On Mon, Oct 30, 2000 at 03:08:31PM -0500, Jeff Garzik wrote:
> > Actually, I wonder if its even possible for mmap_kiobuf to support audio
> > -- full duplex requires that both record and playback buffer(s),
> > theoretically two separate sets of kiobufs, to be presented as one space
> > (with playback always presented before record).
>
> kvmaps take kiovecs, which are multiple kiobufs ...

s/sets of kiobufs/kiovecs/ in my message and re-read :)

<thinks> Ok kiobuf mmap in OSS audio is possible, but at that point
using kiobufs is still 100% overhead, because you still have to allocate
and manage DMA buffers separately due to read(2) and write(2).

--
Jeff Garzik | "Mind if I drive?" -Sam
Building 1024 | "Not if you don't mind me clawing at the
MandrakeSoft | dash and shrieking like a cheerleader."
| -Max

2000-10-31 02:09:09

by Andrea Arcangeli

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

On Mon, Oct 30, 2000 at 12:45:13PM +0100, Christoph Hellwig wrote:
> @@ -393,10 +396,15 @@
> pmd = pmd_offset(pgd, address);
> if (pmd) {
> pte_t * pte = pte_offset(pmd, address);
> - if (pte && pte_present(*pte))
> + if (pte && pte_present(*pte)) {
> + if (writeacc && !pte_write(*pte))
> + goto retry;
> return pte_page(*pte);
> + }
> }

It should also make sure the pte is dirty before starting the read-from-disk
I/O and then things will currently break in the swapout because the page is not
locked (see discussion of last week). The fix for that problem proposed by SCT
and Linus is that the page (not pte) will be marked dirty during swapout and
written back to disk _only_ once reference count is 1 (btw I now noticed
invalidate_inode_pages+MAP_SHARED will mess with that fix and it will trigger a
BUG() in free_pages).

> +
> +faultin:
> if (handle_mm_fault(current->mm, vma, ptr, datain) <= 0)
> goto out_unlock;
> spin_lock(&mm->page_table_lock);
> - map = follow_page(ptr);
> - if (!map) {
> + map = follow_page(ptr, datain, &failed);
> + if (failed) {
> + /*
> + * Page got stolen before we could lock it down.
> + * Retry.
> + */
> spin_unlock(&mm->page_table_lock);
> - dprintk (KERN_ERR "Missing page in map_user_kiobuf\n");
> - goto out_unlock;
> + goto faultin;

This is suboptimal (it walks the pagetables twice if the page is already
mapped). It should do the follow_page first and call handle_mm_fault only
if follow_page failed.
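
Something like this inside the map_user_kiobuf() loop (only a sketch of
the suggested ordering, untested):

    /* look the page up first; only fault when the lookup fails */
    while (1) {
        failed = 0;
        spin_lock(&mm->page_table_lock);
        map = follow_page(ptr, datain, &failed);
        if (!failed)
            break;      /* present (and writable if we need it) */
        spin_unlock(&mm->page_table_lock);
        /* not mapped yet, or stolen by kswapd: fault it in and retry */
        if (handle_mm_fault(current->mm, vma, ptr, datain) <= 0)
            goto out_unlock;
    }
    /* page_table_lock is still held here, as in the current code */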

Andrea

2000-11-01 11:16:53

by Christoph Hellwig

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

Andrea Arcangeli <[email protected]> wrote:
>> + map = follow_page(ptr, datain, &failed);
>> + if (failed) {
>> + /*
>> + * Page got stolen before we could lock it down.
>> + * Retry.
>> + */
>> spin_unlock(&mm->page_table_lock);
>> - dprintk (KERN_ERR "Missing page in map_user_kiobuf\n");
>> - goto out_unlock;
>> + goto faultin;

> This is suboptimal (walks the pagetables twice if the page is just mapped). It
> should be a follow page first and handle_mm_fault only if follow page failed.

I only forward-ported the fixes from Stephen's 2.3.99pre2 patchset
because no one else seemed to be interested. If someone with more
vm-experience (e.g. you) gets interested because of this patch: fine.

Christoph


--
Always remember that you are unique. Just like everyone else.

2000-11-01 13:35:08

by Stephen C. Tweedie

Subject: Re: [PATCH] kiobuf/rawio fixes for 2.4.0-test10-pre6

Hi,

On Mon, Oct 30, 2000 at 01:56:07PM -0500, Jeff Garzik wrote:
>
> Seen it, re-read my question...
>
> I keep seeing "audio drivers' mmap" used a specific example of a place
> that would benefit from kiobufs. The current via audio mmap looks quite
> a bit like mmap_kiobuf and its support code... except without all the
> kiobuf overhead.
>
> My question from above is: how can the via audio mmap in test10-preXX
> be improved by using kiobufs? I am not a kiobuf expert, but AFAICS a
> non-kiobuf implementation is better for audio drivers.

Code reuse. You may not need every single thing that the kvmap api
gives you --- for example, you may not need the per-mmap refcounting,
because you might be associating your dma buffer with the file
descriptor, not the mmap region --- but if you implement the same
nopage code in every single sound driver, then you end up with a lot
of duplication and you increase (enormously) the number of places you
have to touch if anything ever changes in the vma management code.

Cheers,
Stephen