Hi, everyone:
When I turn FORCE_DEBUG on in mm/slab.c, the usb-uhci driver stops
working. It turned out that its DMA headers must be aligned on a
16-byte boundary, and slab poisoning violates that assumption.
I have come up with a fix which the USB folks did not like, but
they did not object to discussing it on linux-kernel. Here it is.
The current code does something like this:
struct dmahdr {
__u32 head;
};
struct desc {
struct dmahdr h;
struct desc *next;
int last_used;
};
struct desc *p;
unsigned long busaddr;
p = kmalloc_z(sizeof(struct desc), GFP_SOMETHING);
busaddr = virt_to_bus(p);
writel(busaddr, ioremap_cookie + UHCI_SOME_REGISTER);
I changed it to this:
struct dmahdr {
__u32 head;
};
struct desc {
struct dmahdr *hp;
struct desc *next;
int last_used;
};
struct desc *p;
void *dp;
unsigned long busaddr;
p = kmalloc_z(sizeof(struct desc), GFP_SOMETHING);
dp = kmalloc_z(sizeof(struct dmahdr) + 15, GFP_SOMETHING);
dp = (void *)(((unsigned long)dp + 15) & ~15);
p->hp = dp;
busaddr = virt_to_bus(p->hp);
writel(busaddr, ioremap_cookie + UHCI_SOME_REGISTER);
This way, after testing, we may replace the second kmalloc and virt_to_bus
with pci_alloc_consistent.
Actual patch is more complicated due to checks, zeroing, and so on.
It is attached below.
Here is a mail that sums up the disagreement:
Date: Sun, 04 Mar 2001 13:10:26 -0800
From: David Brownell <[email protected]>
Subject: Re: [linux-usb-devel] Patch for usb-uhci and poisoned slab (2.4.2-ac7)
To: Peter Zaitcev <[email protected]>
Cc: [email protected]
> I found that usb-uhci fails when FORCE_DEBUG is set in mm/slab.c
> because it expects 16 bytes alignment for structures it allocates.
And mm/slab.c changes semantics when CONFIG_SLAB_DEBUG
is set: it ignores SLAB_HWCACHE_ALIGN. That seems more like
the root cause of the problem to me!
It's a lot simpler to patch mm/slab.c so its semantics don't change.
That is, don't resolve clashes between HWCACHE_ALIGN and
automagic redzoning in favor of redzoning any more.
> I did not go all the way to using pci_alloc_single,
> pci_alloc_consistent and friends, because I am not too sure
> of my hand, and also because I believe in gradual change
> (in this case).
That big a patch is rather non-gradual ... :-)
I think that the pci_alloc_consistent patch that Johannes sent
by for "uhci.c" would be a better approach. Though I'd like
to see that be more general ... say, making mm/slab.c know
about such things. Add a simple abstraction, and that should
be it -- right? :-)
- Dave
I see Dave's argument as: 1. the slab must honor HWCACHE_ALIGN
even when debugging; 2. my patch is too big for too little gain.
It is understandable, but I find it hard to agree. First,
mixing the controller-imposed alignment with cache alignment
is quite misleading. Second is more "philosophical": yes,
I can work without global poisoning, I just do want to use it.
I fancy forced slab poisoning; is that so wrong?
I need our benevolent dictatorship to judge. Or else ... umm...
well, I guess, nothing will happen; we would just have broken
code as we always did :)
-- Pete
diff -ur -X ../dontdiff linux-2.4.2-ac7/drivers/usb/usb-uhci-debug.h linux-2.4.2-ac7-p3/drivers/usb/usb-uhci-debug.h
--- linux-2.4.2-ac7/drivers/usb/usb-uhci-debug.h Sat Jul 8 19:38:16 2000
+++ linux-2.4.2-ac7-p3/drivers/usb/usb-uhci-debug.h Sat Mar 3 11:27:45 2001
@@ -1,25 +1,25 @@
-#ifdef DEBUG
+#ifdef DEBUG_DUMP
static void __attribute__((__unused__)) uhci_show_qh (puhci_desc_t qh)
{
if (qh->type != QH_TYPE) {
dbg("qh has not QH_TYPE");
return;
}
- dbg("QH @ %p/%08lX:", qh, virt_to_bus (qh));
+ dbg("QH @ %p->%p/%08lX:", qh, qh->hwp, virt_to_bus (qh->hwp));
- if (qh->hw.qh.head & UHCI_PTR_TERM)
+ if (qh->hwp->qh.head & UHCI_PTR_TERM)
dbg(" Head Terminate");
else
dbg(" Head: %s @ %08X",
- (qh->hw.qh.head & UHCI_PTR_QH?"QH":"TD"),
- qh->hw.qh.head & ~UHCI_PTR_BITS);
+ (qh->hwp->qh.head & UHCI_PTR_QH?"QH":"TD"),
+ qh->hwp->qh.head & ~UHCI_PTR_BITS);
- if (qh->hw.qh.element & UHCI_PTR_TERM)
+ if (qh->hwp->qh.element & UHCI_PTR_TERM)
dbg(" Element Terminate");
else
dbg(" Element: %s @ %08X",
- (qh->hw.qh.element & UHCI_PTR_QH?"QH":"TD"),
- qh->hw.qh.element & ~UHCI_PTR_BITS);
+ (qh->hwp->qh.element & UHCI_PTR_QH?"QH":"TD"),
+ qh->hwp->qh.element & ~UHCI_PTR_BITS);
}
#endif
@@ -27,7 +27,7 @@
{
char *spid;
- switch (td->hw.td.info & 0xff) {
+ switch (td->hwp->td.info & 0xff) {
case USB_PID_SETUP:
spid = "SETUP";
break;
@@ -42,50 +42,52 @@
break;
}
- warn(" TD @ %p/%08lX, MaxLen=%02x DT%d EP=%x Dev=%x PID=(%s) buf=%08x",
- td, virt_to_bus (td),
- td->hw.td.info >> 21,
- ((td->hw.td.info >> 19) & 1),
- (td->hw.td.info >> 15) & 15,
- (td->hw.td.info >> 8) & 127,
+ warn(" TD @ %p->%p/%08lX, MaxLen=%02x DT%d EP=%x Dev=%x PID=(%s) buf=%08x",
+ td, td->hwp, virt_to_bus (td->hwp),
+ td->hwp->td.info >> 21,
+ ((td->hwp->td.info >> 19) & 1),
+ (td->hwp->td.info >> 15) & 15,
+ (td->hwp->td.info >> 8) & 127,
spid,
- td->hw.td.buffer);
+ td->hwp->td.buffer);
warn(" Len=%02x e%d %s%s%s%s%s%s%s%s%s%s",
- td->hw.td.status & 0x7ff,
- ((td->hw.td.status >> 27) & 3),
- (td->hw.td.status & TD_CTRL_SPD) ? "SPD " : "",
- (td->hw.td.status & TD_CTRL_LS) ? "LS " : "",
- (td->hw.td.status & TD_CTRL_IOC) ? "IOC " : "",
- (td->hw.td.status & TD_CTRL_ACTIVE) ? "Active " : "",
- (td->hw.td.status & TD_CTRL_STALLED) ? "Stalled " : "",
- (td->hw.td.status & TD_CTRL_DBUFERR) ? "DataBufErr " : "",
- (td->hw.td.status & TD_CTRL_BABBLE) ? "Babble " : "",
- (td->hw.td.status & TD_CTRL_NAK) ? "NAK " : "",
- (td->hw.td.status & TD_CTRL_CRCTIMEO) ? "CRC/Timeo " : "",
- (td->hw.td.status & TD_CTRL_BITSTUFF) ? "BitStuff " : ""
+ td->hwp->td.status & 0x7ff,
+ ((td->hwp->td.status >> 27) & 3),
+ (td->hwp->td.status & TD_CTRL_SPD) ? "SPD " : "",
+ (td->hwp->td.status & TD_CTRL_LS) ? "LS " : "",
+ (td->hwp->td.status & TD_CTRL_IOC) ? "IOC " : "",
+ (td->hwp->td.status & TD_CTRL_ACTIVE) ? "Active " : "",
+ (td->hwp->td.status & TD_CTRL_STALLED) ? "Stalled " : "",
+ (td->hwp->td.status & TD_CTRL_DBUFERR) ? "DataBufErr " : "",
+ (td->hwp->td.status & TD_CTRL_BABBLE) ? "Babble " : "",
+ (td->hwp->td.status & TD_CTRL_NAK) ? "NAK " : "",
+ (td->hwp->td.status & TD_CTRL_CRCTIMEO) ? "CRC/Timeo " : "",
+ (td->hwp->td.status & TD_CTRL_BITSTUFF) ? "BitStuff " : ""
);
- if (td->hw.td.link & UHCI_PTR_TERM)
+ if (td->hwp->td.link & UHCI_PTR_TERM)
warn(" TD Link Terminate");
else
warn(" Link points to %s @ %08x, %s",
- (td->hw.td.link & UHCI_PTR_QH?"QH":"TD"),
- td->hw.td.link & ~UHCI_PTR_BITS,
- (td->hw.td.link & UHCI_PTR_DEPTH ? "Depth first" : "Breadth first"));
+ (td->hwp->td.link & UHCI_PTR_QH?"QH":"TD"),
+ td->hwp->td.link & ~UHCI_PTR_BITS,
+ (td->hwp->td.link & UHCI_PTR_DEPTH ? "Depth first" : "Breadth first"));
}
-#ifdef DEBUG
+#ifdef DEBUG_DUMP
static void __attribute__((__unused__)) uhci_show_td_queue (puhci_desc_t td)
{
- //dbg("uhci_show_td_queue %p (%08lX):", td, virt_to_bus (td));
+ uhci_desc_u_t *h;
+ //dbg("uhci_show_td_queue %p (%08lX):", td, virt_to_bus (td->hwp));
while (1) {
uhci_show_td (td);
- if (td->hw.td.link & UHCI_PTR_TERM)
+ if (td->hwp->td.link & UHCI_PTR_TERM)
break;
- if (td != bus_to_virt (td->hw.td.link & ~UHCI_PTR_BITS))
- td = bus_to_virt (td->hw.td.link & ~UHCI_PTR_BITS);
- else {
- dbg("td points to itself!");
+ h = bus_to_virt (td->hwp->td.link & ~UHCI_PTR_BITS);
+ if (td->hwp != h) {
+ td = h->td.backp;
+ } else {
+ dbg("td %p points to itself!", td);
break;
}
}
@@ -94,21 +96,23 @@
static void __attribute__((__unused__)) uhci_show_queue (puhci_desc_t qh)
{
uhci_desc_t *start_qh=qh;
+ uhci_desc_u_t *h;
dbg("uhci_show_queue %p:", qh);
while (1) {
uhci_show_qh (qh);
- if (!(qh->hw.qh.element & UHCI_PTR_TERM))
- uhci_show_td_queue (bus_to_virt (qh->hw.qh.element & ~UHCI_PTR_BITS));
+ if (!(qh->hwp->qh.element & UHCI_PTR_TERM))
+ uhci_show_td_queue (((uhci_desc_u_t *)bus_to_virt (qh->hwp->qh.element & ~UHCI_PTR_BITS))->td.backp);
- if (qh->hw.qh.head & UHCI_PTR_TERM)
+ if (qh->hwp->qh.head & UHCI_PTR_TERM)
break;
- if (qh != bus_to_virt (qh->hw.qh.head & ~UHCI_PTR_BITS))
- qh = bus_to_virt (qh->hw.qh.head & ~UHCI_PTR_BITS);
+ h = bus_to_virt (qh->hwp->qh.head & ~UHCI_PTR_BITS);
+ if (qh->hwp != h)
+ qh = h->qh.backp;
else {
- dbg("qh points to itself!");
+ dbg("qh %p points to itself!", qh);
break;
}
diff -ur -X ../dontdiff linux-2.4.2-ac7/drivers/usb/usb-uhci.c linux-2.4.2-ac7-p3/drivers/usb/usb-uhci.c
--- linux-2.4.2-ac7/drivers/usb/usb-uhci.c Thu Mar 1 15:08:47 2001
+++ linux-2.4.2-ac7-p3/drivers/usb/usb-uhci.c Sat Mar 3 11:44:37 2001
@@ -44,7 +44,7 @@
//#define ISO_SANITY_CHECK
/* This enables debug printks */
-#define DEBUG
+#define DEBUG_DUMP
/* This enables all symbols to be exported, to ease debugging oopses */
//#define DEBUG_SYMBOLS
@@ -58,7 +58,6 @@
#include "usb-uhci.h"
#include "usb-uhci-debug.h"
-#undef DEBUG
#undef dbg
#define dbg(format, arg...) do {} while (0)
#define DEBUG_SYMBOLS
@@ -128,17 +127,17 @@
{
if (!list_empty(&s->urb_unlinked)) {
- s->td1ms->hw.td.status |= TD_CTRL_IOC;
+ s->td1ms->hwp->td.status |= TD_CTRL_IOC;
}
else {
- s->td1ms->hw.td.status &= ~TD_CTRL_IOC;
+ s->td1ms->hwp->td.status &= ~TD_CTRL_IOC;
}
if (s->timeout_urbs) {
- s->td32ms->hw.td.status |= TD_CTRL_IOC;
+ s->td32ms->hwp->td.status |= TD_CTRL_IOC;
}
else {
- s->td32ms->hw.td.status &= ~TD_CTRL_IOC;
+ s->td32ms->hwp->td.status &= ~TD_CTRL_IOC;
}
wmb();
@@ -153,7 +152,7 @@
return;
spin_lock_irqsave (&s->qh_lock, flags);
- s->chain_end->hw.qh.head&=~UHCI_PTR_TERM;
+ s->chain_end->hwp->qh.head &= ~UHCI_PTR_TERM;
mb();
s->loop_usage++;
((urb_priv_t*)urb->hcpriv)->use_loop=1;
@@ -172,7 +171,7 @@
s->loop_usage--;
if (!s->loop_usage) {
- s->chain_end->hw.qh.head|=UHCI_PTR_TERM;
+ s->chain_end->hwp->qh.head|=UHCI_PTR_TERM;
mb();
}
((urb_priv_t*)urb->hcpriv)->use_loop=0;
@@ -228,20 +227,42 @@
/*-------------------------------------------------------------------*/
_static int alloc_td (uhci_desc_t ** new, int flags)
{
+ uhci_desc_t *u;
+ void *p;
#ifdef DEBUG_SLAB
- *new= kmem_cache_alloc(uhci_desc_kmem, SLAB_FLAG);
+ u = kmem_cache_alloc(uhci_desc_kmem, SLAB_FLAG);
#else
- *new = (uhci_desc_t *) kmalloc (sizeof (uhci_desc_t), KMALLOC_FLAG);
+ u = (uhci_desc_t *) kmalloc (sizeof (uhci_desc_t), KMALLOC_FLAG);
#endif
- if (!*new)
+ *new = u;
+ if (u == NULL)
return -ENOMEM;
memset (*new, 0, sizeof (uhci_desc_t));
- (*new)->hw.td.link = UHCI_PTR_TERM | (flags & UHCI_PTR_BITS); // last by default
- (*new)->type = TD_TYPE;
+
+ /* XXX Replace this with a layer over pci_alloc_consistent(). */
+ /* Do not slabify (will not be able to after pci_alloc_consistent()). */
+ if ((p = kmalloc(UHCI_HWDESC_SZ + 0xf, KMALLOC_FLAG)) == NULL) {
+#ifdef DEBUG_SLAB
+ kmem_cache_free(uhci_desc_kmem, u);
+#else
+ kfree (u);
+#endif
+ *new = NULL;
+ return -ENOMEM;
+ }
+ u->mmp = p;
+ u->hwp = (void *)(((unsigned long)p + 0xf) & ~0xf);
+ memset(u->hwp, 0, UHCI_HWDESC_SZ);
+
+#ifdef DEBUG_DUMP
+ u->hwp->td.backp = u;
+#endif
+ u->hwp->td.link = UHCI_PTR_TERM | (flags & UHCI_PTR_BITS); // last by default
+ u->type = TD_TYPE;
mb();
- INIT_LIST_HEAD (&(*new)->vertical);
- INIT_LIST_HEAD (&(*new)->horizontal);
-
+ INIT_LIST_HEAD (&u->vertical);
+ INIT_LIST_HEAD (&u->horizontal);
+
return 0;
}
/*-------------------------------------------------------------------*/
@@ -252,7 +273,7 @@
spin_lock_irqsave (&s->td_lock, xxx);
- td->hw.td.link = virt_to_bus (qh) | (flags & UHCI_PTR_DEPTH) | UHCI_PTR_QH;
+ td->hwp->td.link = virt_to_bus (qh->hwp) | (flags & UHCI_PTR_DEPTH) | UHCI_PTR_QH;
mb();
spin_unlock_irqrestore (&s->td_lock, xxx);
@@ -272,11 +293,11 @@
if (qh == prev ) {
// virgin qh without any tds
- qh->hw.qh.element = virt_to_bus (new) | UHCI_PTR_TERM;
+ qh->hwp->qh.element = virt_to_bus (new->hwp) | UHCI_PTR_TERM;
}
else {
// already tds inserted, implicitely remove TERM bit of prev
- prev->hw.td.link = virt_to_bus (new) | (flags & UHCI_PTR_DEPTH);
+ prev->hwp->td.link = virt_to_bus (new->hwp) | (flags & UHCI_PTR_DEPTH);
}
mb();
spin_unlock_irqrestore (&s->td_lock, xxx);
@@ -294,8 +315,8 @@
next = list_entry (td->horizontal.next, uhci_desc_t, horizontal);
list_add (&new->horizontal, &td->horizontal);
- new->hw.td.link = td->hw.td.link;
- td->hw.td.link = virt_to_bus (new);
+ new->hwp->td.link = td->hwp->td.link;
+ td->hwp->td.link = virt_to_bus (new->hwp);
mb();
spin_unlock_irqrestore (&s->td_lock, flags);
@@ -322,9 +343,9 @@
if (phys_unlink) {
// really remove HW linking
if (prev->type == TD_TYPE)
- prev->hw.td.link = element->hw.td.link;
+ prev->hwp->td.link = element->hwp->td.link;
else
- prev->hw.qh.element = element->hw.td.link;
+ prev->hwp->qh.element = element->hwp->td.link;
}
mb ();
@@ -342,6 +363,7 @@
/*-------------------------------------------------------------------*/
_static int delete_desc (uhci_desc_t *element)
{
+ kfree (element->mmp);
#ifdef DEBUG_SLAB
kmem_cache_free(uhci_desc_kmem, element);
#else
@@ -353,23 +375,45 @@
// Allocates qh element
_static int alloc_qh (uhci_desc_t ** new)
{
+ uhci_desc_t *u;
+ void *p;
#ifdef DEBUG_SLAB
- *new= kmem_cache_alloc(uhci_desc_kmem, SLAB_FLAG);
+ u = kmem_cache_alloc(uhci_desc_kmem, SLAB_FLAG);
#else
- *new = (uhci_desc_t *) kmalloc (sizeof (uhci_desc_t), KMALLOC_FLAG);
+ u = (uhci_desc_t *) kmalloc (sizeof (uhci_desc_t), KMALLOC_FLAG);
#endif
- if (!*new)
+ *new = u;
+ if (u == NULL)
return -ENOMEM;
memset (*new, 0, sizeof (uhci_desc_t));
- (*new)->hw.qh.head = UHCI_PTR_TERM;
- (*new)->hw.qh.element = UHCI_PTR_TERM;
- (*new)->type = QH_TYPE;
+
+ /* XXX Replace this with a layer over pci_alloc_consistent(). */
+ /* Do not slabify (will not be able to after pci_alloc_consistent()). */
+ if ((p = kmalloc(UHCI_HWDESC_SZ + 0xf, KMALLOC_FLAG)) == NULL) {
+#ifdef DEBUG_SLAB
+ kmem_cache_free(uhci_desc_kmem, u);
+#else
+ kfree (u);
+#endif
+ *new = NULL;
+ return -ENOMEM;
+ }
+ u->mmp = p;
+ u->hwp = (void *)(((unsigned long)p + 0xf) & ~0xf);
+ memset(u->hwp, 0, UHCI_HWDESC_SZ);
+
+#ifdef DEBUG_DUMP
+ u->hwp->qh.backp = u;
+#endif
+ u->hwp->qh.head = UHCI_PTR_TERM;
+ u->hwp->qh.element = UHCI_PTR_TERM;
+ u->type = QH_TYPE;
mb();
- INIT_LIST_HEAD (&(*new)->horizontal);
- INIT_LIST_HEAD (&(*new)->vertical);
+ INIT_LIST_HEAD (&u->horizontal);
+ INIT_LIST_HEAD (&u->vertical);
- dbg("Allocated qh @ %p", *new);
+ dbg("Allocated qh @ %p", u);
return 0;
}
@@ -387,16 +431,16 @@
// (OLD) (POS) -> (OLD) (NEW) (POS)
old = list_entry (pos->horizontal.prev, uhci_desc_t, horizontal);
list_add_tail (&new->horizontal, &pos->horizontal);
- new->hw.qh.head = MAKE_QH_ADDR (pos) ;
- if (!(old->hw.qh.head & UHCI_PTR_TERM))
- old->hw.qh.head = MAKE_QH_ADDR (new) ;
+ new->hwp->qh.head = MAKE_QH_ADDR (pos->hwp) ;
+ if (!(old->hwp->qh.head & UHCI_PTR_TERM))
+ old->hwp->qh.head = MAKE_QH_ADDR (new->hwp) ;
}
else {
// (POS) (OLD) -> (POS) (NEW) (OLD)
old = list_entry (pos->horizontal.next, uhci_desc_t, horizontal);
list_add (&new->horizontal, &pos->horizontal);
- new->hw.qh.head = MAKE_QH_ADDR (old);
- pos->hw.qh.head = MAKE_QH_ADDR (new) ;
+ new->hwp->qh.head = MAKE_QH_ADDR (old->hwp);
+ pos->hwp->qh.head = MAKE_QH_ADDR (new->hwp) ;
}
mb ();
@@ -415,10 +459,10 @@
spin_lock_irqsave (&s->qh_lock, flags);
prev = list_entry (element->horizontal.prev, uhci_desc_t, horizontal);
- prev->hw.qh.head = element->hw.qh.head;
+ prev->hwp->qh.head = element->hwp->qh.head;
dbg("unlink qh %p, pqh %p, nxqh %p, to %08x", element, prev,
- list_entry (element->horizontal.next, uhci_desc_t, horizontal),element->hw.qh.head &~15);
+ list_entry (element->horizontal.next, uhci_desc_t, horizontal),element->hwp->qh.head &~15); /* XXX Why x&~15 here anymore? */
list_del(&element->horizontal);
@@ -466,9 +510,10 @@
/*-------------------------------------------------------------------*/
_static void fill_td (uhci_desc_t *td, int status, int info, __u32 buffer)
{
- td->hw.td.status = status;
- td->hw.td.info = info;
- td->hw.td.buffer = buffer;
+ uhci_td_t *p = &td->hwp->td;
+ p->status = status;
+ p->info = info;
+ p->buffer = buffer;
}
/*-------------------------------------------------------------------*/
// Removes ALL qhs in chain (paranoia!)
@@ -564,12 +609,12 @@
if (ret)
goto init_skel_cleanup;
s->iso_td[n] = td;
- s->framelist[n] = ((__u32) virt_to_bus (td));
+ s->framelist[n] = (__u32) virt_to_bus (td->hwp);
}
dbg("allocating qh: chain_end");
ret = alloc_qh (&qh);
-
+
if (ret)
goto init_skel_cleanup;
@@ -582,7 +627,7 @@
fill_td (td, 0 * TD_CTRL_IOC, 0, 0); // generate 1ms interrupt (enabled on demand)
insert_td (s, qh, td, 0);
- qh->hw.qh.element &= ~UHCI_PTR_TERM; // remove TERM bit
+ qh->hwp->qh.element &= ~UHCI_PTR_TERM; // remove TERM bit
s->td1ms=td;
dbg("allocating qh: bulk_chain");
@@ -603,7 +648,7 @@
#ifdef CONFIG_USB_UHCI_HIGH_BANDWIDTH
// disabled reclamation loop
- s->chain_end->hw.qh.head=virt_to_bus(s->control_chain) | UHCI_PTR_QH | UHCI_PTR_TERM;
+ s->chain_end->hwp->qh.head = virt_to_bus(s->control_chain->hwp) | UHCI_PTR_QH | UHCI_PTR_TERM;
#endif
dbg("allocating qh: ls_control_chain");
@@ -627,10 +672,10 @@
goto init_skel_cleanup;
s->int_chain[n] = td;
if (n == 0) {
- s->int_chain[0]->hw.td.link = virt_to_bus (s->ls_control_chain) | UHCI_PTR_QH;
+ s->int_chain[0]->hwp->td.link = virt_to_bus (s->ls_control_chain->hwp) | UHCI_PTR_QH;
}
else {
- s->int_chain[n]->hw.td.link = virt_to_bus (s->int_chain[0]);
+ s->int_chain[n]->hwp->td.link = virt_to_bus (s->int_chain[0]->hwp);
}
}
@@ -641,11 +686,11 @@
int m, o;
dbg("framelist[%i]=%x",n,s->framelist[n]);
if ((n&127)==127)
- ((uhci_desc_t*) s->iso_td[n])->hw.td.link = virt_to_bus(s->int_chain[0]);
+ ((uhci_desc_t*) s->iso_td[n])->hwp->td.link = virt_to_bus(s->int_chain[0]->hwp);
else
for (o = 1, m = 2; m <= 128; o++, m += m)
if ((n & (m - 1)) == ((m - 1) / 2))
- ((uhci_desc_t*) s->iso_td[n])->hw.td.link = virt_to_bus (s->int_chain[o]);
+ ((uhci_desc_t*) s->iso_td[n])->hwp->td.link = virt_to_bus (s->int_chain[o]->hwp);
}
ret = alloc_td (&td, 0);
@@ -783,7 +828,7 @@
urb->status = -EINPROGRESS;
queue_urb (s, urb); // queue before inserting in desc chain
- qh->hw.qh.element &= ~UHCI_PTR_TERM;
+ qh->hwp->qh.element &= ~UHCI_PTR_TERM;
//uhci_show_queue(qh);
/* Start it up... put low speed first */
@@ -862,8 +907,8 @@
}
return -ENOMEM;
}
- bqh->hw.qh.element = UHCI_PTR_TERM;
- bqh->hw.qh.head = virt_to_bus(nqh) | UHCI_PTR_QH; // element
+ bqh->hwp->qh.element = UHCI_PTR_TERM;
+ bqh->hwp->qh.head = virt_to_bus(nqh->hwp) | UHCI_PTR_QH; // element
upriv->bottom_qh = bqh;
}
queue_dbg("uhci_submit_bulk: qh %p bqh %p nqh %p",qh, bqh, nqh);
@@ -904,7 +949,7 @@
last = (len == 0 && (usb_pipein(pipe) || pktsze < maxsze || !(urb->transfer_flags & USB_DISABLE_SPD)));
if (last)
- td->hw.td.status |= TD_CTRL_IOC; // last one generates INT
+ td->hwp->td.status |= TD_CTRL_IOC; // last one generates INT
insert_td (s, qh, td, UHCI_PTR_DEPTH * depth_first);
if (!first_td)
@@ -925,9 +970,9 @@
queue_urb_unlocked (s, urb);
if (urb->transfer_flags & USB_QUEUE_BULK)
- qh->hw.qh.element = virt_to_bus (first_td);
+ qh->hwp->qh.element = virt_to_bus (first_td->hwp);
else
- qh->hw.qh.element &= ~UHCI_PTR_TERM; // arm QH
+ qh->hwp->qh.element &= ~UHCI_PTR_TERM; // arm QH
if (!bulk_urb) { // new bulk queue
if (urb->transfer_flags & USB_QUEUE_BULK) {
@@ -996,7 +1041,7 @@
spin_lock_irqsave (&s->qh_lock, flags);
prevqh = list_entry (ppriv->desc_list.next, uhci_desc_t, desc_list);
prevtd = list_entry (prevqh->vertical.prev, uhci_desc_t, vertical);
- prevtd->hw.td.link = virt_to_bus(priv->bottom_qh) | UHCI_PTR_QH; // skip current qh
+ prevtd->hwp->td.link = virt_to_bus(priv->bottom_qh->hwp) | UHCI_PTR_QH; // skip current qh
mb();
queue_dbg("uhci_clean_transfer: relink pqh %p, ptd %p",prevqh, prevtd);
spin_unlock_irqrestore (&s->qh_lock, flags);
@@ -1039,7 +1084,7 @@
if (!priv->prev_queued_urb) { // top QH
prevqh = list_entry (qh->horizontal.prev, uhci_desc_t, horizontal);
- prevqh->hw.qh.head = virt_to_bus(bqh) | UHCI_PTR_QH;
+ prevqh->hwp->qh.head = virt_to_bus(bqh->hwp) | UHCI_PTR_QH;
list_del (&qh->horizontal); // remove this qh form horizontal chain
list_add (&bqh->horizontal, &prevqh->horizontal); // insert next bqh in horizontal chain
}
@@ -1052,7 +1097,7 @@
ppriv->bottom_qh = bnqh;
ppriv->next_queued_urb = nurb;
prevqh = list_entry (ppriv->desc_list.next, uhci_desc_t, desc_list);
- prevqh->hw.qh.head = virt_to_bus(bqh) | UHCI_PTR_QH;
+ prevqh->hwp->qh.head = virt_to_bus(bqh->hwp) | UHCI_PTR_QH;
}
mb();
@@ -2274,7 +2319,7 @@
*/
if (urb_priv->flags &&
- ((qh->hw.qh.element == UHCI_PTR_TERM) ||(!(last_desc->hw.td.status & TD_CTRL_ACTIVE))))
+ ((qh->hwp->qh.element == UHCI_PTR_TERM) ||(!(last_desc->hwp->td.status & TD_CTRL_ACTIVE))))
goto transfer_finished;
urb->actual_length=0;
@@ -2282,12 +2327,12 @@
for (; p != &qh->vertical; p = p->next) {
desc = list_entry (p, uhci_desc_t, vertical);
- if (desc->hw.td.status & TD_CTRL_ACTIVE) // do not process active TDs
+ if (desc->hwp->td.status & TD_CTRL_ACTIVE) // do not process active TDs
return ret;
- actual_length = (desc->hw.td.status + 1) & 0x7ff; // extract transfer parameters from TD
- maxlength = (((desc->hw.td.info >> 21) & 0x7ff) + 1) & 0x7ff;
- status = uhci_map_status (uhci_status_bits (desc->hw.td.status), usb_pipeout (urb->pipe));
+ actual_length = (desc->hwp->td.status + 1) & 0x7ff; // extract transfer parameters from TD
+ maxlength = (((desc->hwp->td.info >> 21) & 0x7ff) + 1) & 0x7ff;
+ status = uhci_map_status (uhci_status_bits (desc->hwp->td.status), usb_pipeout (urb->pipe));
if (status == -EPIPE) { // see if EP is stalled
// set up stalled condition
@@ -2301,7 +2346,7 @@
urb->error_count++;
break;
}
- else if ((desc->hw.td.info & 0xff) != USB_PID_SETUP)
+ else if ((desc->hwp->td.info & 0xff) != USB_PID_SETUP)
urb->actual_length += actual_length;
// got less data than requested
@@ -2314,9 +2359,9 @@
// short read during control-IN: re-start status stage
if ((usb_pipetype (urb->pipe) == PIPE_CONTROL)) {
- if (uhci_packetid(last_desc->hw.td.info) == USB_PID_OUT) {
+ if (uhci_packetid(last_desc->hwp->td.info) == USB_PID_OUT) {
- qh->hw.qh.element = virt_to_bus (last_desc); // re-trigger status stage
+ qh->hwp->qh.element = virt_to_bus (last_desc->hwp); // re-trigger status stage
dbg("short packet during control transfer, retrigger status stage @ %p",last_desc);
//uhci_show_td (desc);
//uhci_show_td (last_desc);
@@ -2325,14 +2370,14 @@
}
}
// all other cases: short read is OK
- data_toggle = uhci_toggle (desc->hw.td.info);
+ data_toggle = uhci_toggle (desc->hwp->td.info);
break;
}
else if (status)
goto is_error;
- data_toggle = uhci_toggle (desc->hw.td.info);
- queue_dbg("process_transfer: len:%d status:%x mapped:%x toggle:%d", actual_length, desc->hw.td.status,status, data_toggle);
+ data_toggle = uhci_toggle (desc->hwp->td.info);
+ queue_dbg("process_transfer: len:%d status:%x mapped:%x toggle:%d", actual_length, desc->hwp->td.status,status, data_toggle);
}
@@ -2369,20 +2414,20 @@
{
desc = list_entry (p, uhci_desc_t, desc_list);
- if (desc->hw.td.status & TD_CTRL_ACTIVE) {
+ if (desc->hwp->td.status & TD_CTRL_ACTIVE) {
// do not process active TDs
- //dbg("TD ACT Status @%p %08x",desc,desc->hw.td.status);
+ //dbg("TD ACT Status @%p %08x",desc,desc->hwp->td.status);
break;
}
- if (!desc->hw.td.status & TD_CTRL_IOC) {
+ if (!desc->hwp->td.status & TD_CTRL_IOC) {
// do not process one-shot TDs, no recycling
break;
}
// extract transfer parameters from TD
- actual_length = (desc->hw.td.status + 1) & 0x7ff;
- status = uhci_map_status (uhci_status_bits (desc->hw.td.status), usb_pipeout (urb->pipe));
+ actual_length = (desc->hwp->td.status + 1) & 0x7ff;
+ status = uhci_map_status (uhci_status_bits (desc->hwp->td.status), usb_pipeout (urb->pipe));
// see if EP is stalled
if (status == -EPIPE) {
@@ -2422,23 +2467,23 @@
// Recycle INT-TD if interval!=0, else mark TD as one-shot
if (urb->interval) {
- desc->hw.td.info &= ~(1 << TD_TOKEN_TOGGLE);
+ desc->hwp->td.info &= ~(1 << TD_TOKEN_TOGGLE);
if (status==0) {
((urb_priv_t*)urb->hcpriv)->started=jiffies;
- desc->hw.td.info |= (usb_gettoggle (urb->dev, usb_pipeendpoint (urb->pipe),
+ desc->hwp->td.info |= (usb_gettoggle (urb->dev, usb_pipeendpoint (urb->pipe),
usb_pipeout (urb->pipe)) << TD_TOKEN_TOGGLE);
usb_dotoggle (urb->dev, usb_pipeendpoint (urb->pipe), usb_pipeout (urb->pipe));
} else {
- desc->hw.td.info |= (!usb_gettoggle (urb->dev, usb_pipeendpoint (urb->pipe),
+ desc->hwp->td.info |= (!usb_gettoggle (urb->dev, usb_pipeendpoint (urb->pipe),
usb_pipeout (urb->pipe)) << TD_TOKEN_TOGGLE);
}
- desc->hw.td.status= (urb->pipe & TD_CTRL_LS) | TD_CTRL_ACTIVE | TD_CTRL_IOC |
+ desc->hwp->td.status= (urb->pipe & TD_CTRL_LS) | TD_CTRL_ACTIVE | TD_CTRL_IOC |
(urb->transfer_flags & USB_DISABLE_SPD ? 0 : TD_CTRL_SPD) | (3 << 27);
mb();
}
else {
uhci_unlink_urb_async(s, urb);
- desc->hw.td.status &= ~TD_CTRL_IOC; // inactivate TD
+ desc->hwp->td.status &= ~TD_CTRL_IOC; // inactivate TD
}
}
}
@@ -2456,23 +2501,23 @@
uhci_desc_t *desc = list_entry (urb_priv->desc_list.prev, uhci_desc_t, desc_list);
dbg("urb contains iso request");
- if ((desc->hw.td.status & TD_CTRL_ACTIVE) && !mode)
+ if ((desc->hwp->td.status & TD_CTRL_ACTIVE) && !mode)
return -EXDEV; // last TD not finished
urb->error_count = 0;
urb->actual_length = 0;
urb->status = 0;
dbg("process iso urb %p, %li, %i, %i, %i %08x",urb,jiffies,UHCI_GET_CURRENT_FRAME(s),
- urb->number_of_packets,mode,desc->hw.td.status);
+ urb->number_of_packets,mode,desc->hwp->td.status);
for (i = 0; p != &urb_priv->desc_list; i++) {
desc = list_entry (p, uhci_desc_t, desc_list);
//uhci_show_td(desc);
- if (desc->hw.td.status & TD_CTRL_ACTIVE) {
+ if (desc->hwp->td.status & TD_CTRL_ACTIVE) {
// means we have completed the last TD, but not the TDs before
- desc->hw.td.status &= ~TD_CTRL_ACTIVE;
- dbg("TD still active (%x)- grrr. paranoia!", desc->hw.td.status);
+ desc->hwp->td.status &= ~TD_CTRL_ACTIVE;
+ dbg("TD still active (%x)- grrr. paranoia!", desc->hwp->td.status);
ret = -EXDEV;
urb->iso_frame_desc[i].status = ret;
unlink_td (s, desc, 1);
@@ -2489,15 +2534,15 @@
goto err;
}
- if (urb->iso_frame_desc[i].offset + urb->transfer_buffer != bus_to_virt (desc->hw.td.buffer)) {
+ if (urb->iso_frame_desc[i].offset + urb->transfer_buffer != bus_to_virt (desc->hwp->td.buffer)) {
// Hm, something really weird is going on
- dbg("Pointer Paranoia: %p!=%p", urb->iso_frame_desc[i].offset + urb->transfer_buffer, bus_to_virt (desc->hw.td.buffer));
+ dbg("Pointer Paranoia: %p!=%p", urb->iso_frame_desc[i].offset + urb->transfer_buffer, bus_to_virt (desc->hwp->td.buffer));
ret = -EINVAL;
urb->iso_frame_desc[i].status = ret;
goto err;
}
- urb->iso_frame_desc[i].actual_length = (desc->hw.td.status + 1) & 0x7ff;
- urb->iso_frame_desc[i].status = uhci_map_status (uhci_status_bits (desc->hw.td.status), usb_pipeout (urb->pipe));
+ urb->iso_frame_desc[i].actual_length = (desc->hwp->td.status + 1) & 0x7ff;
+ urb->iso_frame_desc[i].status = uhci_map_status (uhci_status_bits (desc->hwp->td.status), usb_pipeout (urb->pipe));
urb->actual_length += urb->iso_frame_desc[i].actual_length;
err:
@@ -2507,7 +2552,7 @@
urb->status = urb->iso_frame_desc[i].status;
}
dbg("process_iso: %i: len:%d %08x status:%x",
- i, urb->iso_frame_desc[i].actual_length, desc->hw.td.status,urb->iso_frame_desc[i].status);
+ i, urb->iso_frame_desc[i].actual_length, desc->hwp->td.status,urb->iso_frame_desc[i].status);
list_del (p);
p = p->next;
@@ -2939,6 +2984,9 @@
{
int i;
+ /* disable legacy emulation */
+ pci_write_config_word (dev, USBLEGSUP, 0);
+
if (pci_enable_device(dev) < 0)
return -ENODEV;
@@ -2954,8 +3002,6 @@
/* Is it already in use? */
if (check_region (io_addr, io_size))
break;
- /* disable legacy emulation */
- pci_write_config_word (dev, USBLEGSUP, 0);
pci_set_master(dev);
return alloc_uhci(dev, dev->irq, io_addr, io_size);
@@ -3005,7 +3051,7 @@
#ifdef DEBUG_SLAB
uhci_desc_kmem = kmem_cache_create("uhci_desc", sizeof(uhci_desc_t), 0, SLAB_HWCACHE_ALIGN, NULL, NULL);
-
+
if(!uhci_desc_kmem) {
err("kmem_cache_create for uhci_desc failed (out of memory)");
return -ENOMEM;
diff -ur -X ../dontdiff linux-2.4.2-ac7/drivers/usb/usb-uhci.h linux-2.4.2-ac7-p3/drivers/usb/usb-uhci.h
--- linux-2.4.2-ac7/drivers/usb/usb-uhci.h Mon May 15 12:05:15 2000
+++ linux-2.4.2-ac7-p3/drivers/usb/usb-uhci.h Sat Mar 3 11:29:36 2001
@@ -100,7 +100,7 @@
#define uhci_status_bits(ctrl_sts) (ctrl_sts & 0xFE0000)
#define uhci_actual_length(ctrl_sts) ((ctrl_sts + 1) & TD_CTRL_ACTLEN_MASK) /* 1-based */
-#define uhci_ptr_to_virt(x) bus_to_virt(x & ~UHCI_PTR_BITS)
+/* #define uhci_ptr_to_virt(x) bus_to_virt(x & ~UHCI_PTR_BITS) */ /* unused */
/*
* for TD <flags>:
@@ -124,6 +124,10 @@
/* ------------------------------------------------------------------------------------
New TD/QH-structures
------------------------------------------------------------------------------------ */
+
+/* One size for TD & QH. Aligning to 16 would eat any gain from two sizes; one size fragments less, too. */
+#define UHCI_HWDESC_SZ 16
+
typedef enum {
TD_TYPE, QH_TYPE
} uhci_desc_type_t;
@@ -133,18 +137,29 @@
__u32 status;
__u32 info;
__u32 buffer;
-} uhci_td_t, *puhci_td_t;
+#ifdef DEBUG_DUMP
+ struct uhci_desc *backp;
+#endif
+} uhci_td_t, *puhci_td_t; /* __attribute__((packed)) perhaps? XXX */
typedef struct {
__u32 head;
__u32 element; /* Queue element pointer */
+#ifdef DEBUG_DUMP
+ __u32 fill_08;
+ __u32 fill_0c;
+ struct uhci_desc *backp;
+#endif
} uhci_qh_t, *puhci_qh_t;
-typedef struct {
- union {
- uhci_td_t td;
- uhci_qh_t qh;
- } hw;
+typedef union uhci_desc_u { /* uncached, dma-able part of descriptor */
+ uhci_td_t td;
+ uhci_qh_t qh;
+} uhci_desc_u_t;
+
+typedef struct uhci_desc { /* cached, or software, descriptor */
+ uhci_desc_u_t *hwp;
+ void *mmp;
uhci_desc_type_t type;
struct list_head horizontal;
struct list_head vertical;
> And mm/slab.c changes semantics when CONFIG_SLAB_DEBUG
> is set: it ignores SLAB_HWCACHE_ALIGN. That seems more like
> the root cause of the problem to me!
>
HWCACHE_ALIGN does not guarantee a certain byte alignment. And
additionally it's not even guaranteed that kmalloc() uses that
HWCACHE_ALIGN.
Uhci is broken, not my slab code ;-)
> I think that the pci_alloc_consistent patch that Johannes sent
>by for "uhci.c" would be a better approach. Though I'd like
>to see that be more general ... say, making mm/slab.c know
>about such things. Add a simple abstraction, and that should
>be it -- right? :-)
I looked at it, and there are 2 problems that make it virtually
impossible to integrate kmem_cache_alloc() with pci memory alloc without
a major redesign:
* pci_alloc_consistent returns 2 values, kmem_cache_alloc() only one.
This one would be possible to work around.
* the slab allocator heavily relies on the 'struct page' structure, but
it's not guaranteed that it exists for pci_alloced memory.
I'd switch to pci_alloc_consistent with some optimizations to avoid
wasting a complete page for each DMA header. (I haven't seen Johannes
patch, but we discussed the problem 6 weeks ago and that proposal was
the end of the thread)
--
Manfred
> > And mm/slab.c changes semantics when CONFIG_SLAB_DEBUG
> > is set: it ignores SLAB_HWCACHE_ALIGN. That seems more like
> > the root cause of the problem to me!
>
> HWCACHE_ALIGN does not guarantee a certain byte alignment. And
> additionally it's not even guaranteed that kmalloc() uses that
> HWCACHE_ALIGN.
> Uhci is broken, not my slab code ;-)
If so, the slab code docs/specs are broken too ... I just checked in
my development tree (2.4.2-ac7 plus goodies) and what's written
is that HWCACHE_ALIGN is to "Align the objects in this cache to
a hardware cacheline." Which contradicts what you just wrote;
it's supposed to always align, not do so only when convenient.
Clearly, something in mm/slab.c needs to change even if it's just
changing the spec for kmem_cache_create() behavior.
> I'd switch to pci_alloc_consistent with some optimizations to avoid
> wasting a complete page for each DMA header. (I haven't seen Johannes
> patch, but we discussed the problem 6 weeks ago and that proposal was
> the end of the thread)
I didn't see that thread. I agree, pci_alloc_consistent() already has
a signature that's up to the job. The change you suggest would need
to affect every architecture's implementation of that code ... which
somehow seems not the best solution at this time.
Perhaps we should have different functions for pci consistent alloc
at the "hardware" level (what we have now) and at the "slab" level.
That's sort of what Johannes' patch did, except it was specific to
that one driver rather than being a generic utility. Of course, I'd
rather that such a generic package have all the debug support
(except broken redzoning :-) that the current slab code does.
- Dave
On Mon, Mar 05, 2001 at 02:52:24PM -0800, David Brownell wrote:
> I didn't see that thread. I agree, pci_alloc_consistent() already has
> a signature that's up to the job. The change you suggest would need
> to affect every architecture's implementation of that code ... which
> somehow seems not the best solution at this time.
Needless to say, USB is currently broken for the architectures that
need pci_alloc_consistent.
A while ago, I looked at what was required to convert the OHCI driver
to pci_alloc_consistent, and it turns out that the current interface is
highly sub-optimal. It looks good on the face of it, but it _really_
does need sub-page allocations to make sense for USB.
At the time, I didn't feel like creating a custom sub-allocator just
for USB, and since then I haven't had the inclination nor motivation
to go back to trying to get my USB mouse or iPAQ communicating via USB.
(I've not used this USB port for 3 years anyway).
It'd be good to get it done "properly" at some point though.
--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
> At the time, I didn't feel like creating a custom sub-allocator just
> for USB, and since then I haven't had the inclination nor motivation
> to go back to trying to get my USB mouse or iPAQ communicating via USB.
> (I've not used this USB port for 3 years anyway).
>
> It'd be good to get it done "properly" at some point though.
Something like
struct pci_pool *pci_alloc_consistent_pool(int objectsize, int align)
pci_alloc_pool_consistent(pool,..
pci_free_pool_consistent(pool,..
??
Where the pool allocator does page grabbing and chaining?
Russell King writes:
> A while ago, I looked at what was required to convert the OHCI driver
> to pci_alloc_consistent, and it turns out that the current interface is
> highly sub-optimal. It looks good on the face of it, but it _really_
> does need sub-page allocations to make sense for USB.
>
> At the time, I didn't feel like creating a custom sub-allocator just
> for USB, and since then I haven't had the inclination nor motivation
> to go back to trying to get my USB mouse or iPAQ communicating via USB.
> (I've not used this USB port for 3 years anyway).
Gerard Roudier wrote for the sym53c8xx driver the exact thing
UHCI/OHCI need for this.
I think people are pissing their pants over the pci_alloc_consistent
interface for no reason. It gives PAGE<<order sized/aligned chunks
back to the caller at the request of Linus so that drivers did not
have to guess "is this 16-byte aligned..." etc.
Later,
David S. Miller
[email protected]
> From: "David S. Miller" <[email protected]>
> Date: Mon, 5 Mar 2001 20:53:21 -0800 (PST)
>[...]
> Gerard Roudier wrote for the sym53c8xx driver the exact thing
> UHCI/OHCI need for this.
Thanks for the hint.
***
Anyways, is this the end of the discussion regarding my patch?
If it goes in, then fine. If not, fine too... Just
let me know so that I can close the bug properly.
Manfred said plainly "usb-uhci is broken", Alan kinda
maneuvered around my small problem, Dave Brownell looks
unconvinced. So?
-- Pete
> Anyways, is this the end of the discussion regarding my patch?
I think one of the maintainers for usb-uhci (Georg) said he'd
want the general fix ...
> Manfred said plainly "usb-uhci is broken", Alan kinda
> maneuvered around my small problem, Dave Brownell looks
> unconvinced. So?
There are two problems I see.
(1) CONFIG_SLAB_DEBUG breaks the documented
requirement that the slab cache return adequately aligned
data ... which the appended patch should probably handle
nicely (something like it sure did :-) and with less danger
than the large patch you posted.
(2) The USB host controller drivers all need something
like a pci_consistent slab cache, which doesn't currently
exist. I have something like that in the works, and David
Miller noted one driver that I may steal from.
- Dave
--- slab.c-orig Tue Mar 6 15:01:26 2001
+++ slab.c Tue Mar 6 15:05:58 2001
@@ -676,12 +676,10 @@
}
#if DEBUG
+ /* redzoning would break cache alignment requirements */
+ if (flags & SLAB_HWCACHE_ALIGN)
+ flags &= ~SLAB_RED_ZONE;
if (flags & SLAB_RED_ZONE) {
- /*
- * There is no point trying to honour cache alignment
- * when redzoning.
- */
- flags &= ~SLAB_HWCACHE_ALIGN;
size += 2*BYTES_PER_WORD; /* words for redzone */
}
#endif
> > At the time, I didn't feel like creating a custom sub-allocator just
> > for USB, ...
> >
> > It'd be good to get it done "properly" at some point though.
>
> Something like
>
> struct pci_pool *pci_alloc_consistent_pool(int objectsize, int align)
struct pci_pool *
pci_create_consistent_pool (struct pci_dev *dev, int size, int align)
and similar for freeing the pool ... pci_alloc_consistent() needs the device,
presumably since some devices may need to dma into specific memory.
I'd probably want at least "flags" from kmem_cache_create().
> pci_alloc_pool_consistent(pool,..
> pci_free_pool_consistent(pool,..
These should have signatures just like pci_alloc_consistent() and
pci_free_consistent() except they take the pci_pool, not a pci_dev.
Oh, and likely GFP_ flags to control blocking.
> Where the pool allocator does page grabbing and chaining
Given an agreement on API, I suspect Johannes' patch could get
quickly generalized. Then debugging support (like in slab.c) could
be added later.
- Dave
David Brownell wrote:
>
> There are two problems I see.
>
> (1) CONFIG_SLAB_DEBUG breaks the documented
> requirement that the slab cache return adequately aligned
> data ...
adequately aligned for the _cpu_, not for some controllers. It's neither
documented that HW_CACHEALIGN aligns to 16 byte boundaries nor that
kmalloc uses HW_CACHEALIGN.
> which the appended patch should probably handle
> nicely (something like it sure did :-) and with less danger
> than the large patch you posted.
>
> (2) The USB host controller drivers all need something
> like a pci_consistent slab cache, which doesn't currently
> exist. I have something like that in the works, and David
> Miller noted one driver that I may steal from.
>
> - Dave
>
> --- slab.c-orig Tue Mar 6 15:01:26 2001
> +++ slab.c Tue Mar 6 15:05:58 2001
> @@ -676,12 +676,10 @@
> }
>
> #if DEBUG
> + /* redzoning would break cache alignment requirements */
> + if (flags & SLAB_HWCACHE_ALIGN)
> + flags &= ~SLAB_RED_ZONE;
The problem is that you've just disabled red zoning for kmalloc. And
kmalloc is the only case where redzoning is important: If a caller uses
kmem_cache_alloc() for a structure then he won't write beyond the end of
the structure.
I think everyone agrees that (2) is the correct fix.
I see 2 temporary workarounds: either your patch or
+ #ifdef CONFIG_SLAB_DEBUG
+ #error
+ #endif
in uhci.c.
--
Manfred
> > (1) CONFIG_SLAB_DEBUG breaks the documented
> > requirement that the slab cache return adequately aligned
> > data ...
>
> adequately aligned for the _cpu_, not for some controllers. It's neither
> documented that HW_CACHEALIGN aligns to 16 byte boundaries
It's documented in mm/slab.c (line 612 in my ac7+tweaks).
"Aligns to hardware cache line." The only cacheline define
I've found is in <asm/cache.h> for L1_CACHE_BYTES,
which is often more than 16 (not on x86).
If you don't like the patch I forwarded, then please submit
one that changes the kmem_cache_create() API spec ...
one or the other is needed. (I'd not change that API!!)
> nor that kmalloc uses HW_CACHEALIGN. ...
>
> > + /* redzoning would break cache alignment requirements */
> > + if (flags & SLAB_HWCACHE_ALIGN)
> > + flags &= ~SLAB_RED_ZONE;
>
> The problem is that you've just disabled red zoning for kmalloc. And
> kmalloc is the only case where redzoning is important: ...
If kmalloc wants to get auto redzoning, then I think it shouldn't be
using SLAB_HWCACHE_ALIGN when CONFIG_SLAB_DEBUG.
That'd be another simple fix. (Or, make it use some new flag that's
just a performance hint that can be ignored for debugging.)
> I think everyone agrees that (2) is the correct fix.
I was saying that there were two bugs (two fixes needed), and you're
saying that there's only one ... despite the evidence of the API spec.
But you could persuade me there's a third bug: kmalloc misuse of
that kmem_cache API.
> I see 2 temporary workarounds: either your patch or
>
> + #ifdef CONFIG_SLAB_DEBUG
> + #error
> + #endif
>
> in uhci.c.
Better in linux/drivers/usb/Config.in instead. All the host controller
drivers rely on the kmem_cache_create() API spec to be followed.
(Even the OHCI driver, when using kmalloc not kmem_cache.)
- Dave
David S. Miller writes:
> Russell King writes:
> > A while ago, I looked at what was required to convert the OHCI driver
> > to pci_alloc_consistent, and it turns out that the current interface is
> > highly sub-optimal. It looks good on the face of it, but it _really_
> > does need sub-page allocations to make sense for USB.
> >
> > At the time, I didn't feel like creating a custom sub-allocator just
> > for USB, ...
>
> Gerard Roudier wrote for the sym53c8xx driver the exact thing
> UHCI/OHCI need for this.
I looked at that, and it didn't seem it'd drop in very easily.
I'd be interested in feedback on the API below. It can
handle the USB host controllers (EHCI/OHCI/UHCI) on
currently unsupported hardware (the second bug I noted).
Questions: is it general enough for other drivers to use?
Should I package it as a patch for 2.4?
There's also a simple implementation, which works (under limited
testing) but would IMO better be replaced by something using
the innards of the slab allocator (mm/slab.c). That'd need new
APIs inside that allocator ... those changes seem like 2.5 fixes,
unlike the slab allocator bug(s) I pointed out. (And which
Manfred seems to have gone silent on.)
- Dave
/* LOGICALLY: <linux/pci.h> */
/*
* This works like a pci-dma-consistent kmem_cache_t would, and handles
* alignment guarantees for its (small) memory blocks. Many PCI device
* drivers need such functionality.
*/
struct pci_pool;
/* Create a pool - flags are SLAB_* */
extern struct pci_pool *
pci_pool_create (const char *name, struct pci_dev *pdev,
int size, int align, int flags);
/* Get memory from a pool - flags are GFP_KERNEL or GFP_ATOMIC */
extern void *
pci_pool_alloc (struct pci_pool *pool, int flags, dma_addr_t *handle);
/* Return memory to a pool */
extern void
pci_pool_free (struct pci_pool *pool, void *vaddr, dma_addr_t handle);
/* Destroy a pool */
extern void
pci_pool_destroy (struct pci_pool *pool);
/* Convert a DMA mapping to its cpu address (as returned by pci_pool_alloc).
* Don't assume this is cheap, although on some platforms it may be simple
* macros adding a constant to the DMA handle.
*/
extern void *
pci_pool_dma_to_cpu (struct pci_pool *pool, dma_addr_t handle);
/* LOGICALLY: drivers/linux/pci.c */
/*
There should be some memory pool allocator that this code
can use ... it'd be a layer (just like kmem_cache_alloc and
kmalloc) with an API that exposes PCI devices, dma address
mappings, and hardware alignment requirements.
Until that happens, this quick'n'dirty implementation of the
API should work, and should be fast enough for small pools
that use at most a few pages (and small allocation units).
*/
struct pci_pool {
char name [32];
struct pci_dev *dev;
struct list_head pages;
int size;
int align;
int flags;
int blocks_offset;
int blocks_per_page;
};
/* pci consistent pages allocated in units of LOGICAL_PAGE_SIZE, layout:
* - pci_page (always in the 'slab')
* - bitmap (with blocks_per_page bits)
* - blocks (starting at blocks_offset)
*
* this can easily be optimized, but the best fix would be to
* make this just a bus-specific front end to mm/slab.c logic.
*/
struct pci_page {
struct list_head pages;
dma_addr_t dma;
unsigned long bitmap [0];
};
#define LOGICAL_PAGE_SIZE PAGE_SIZE
/* irq-safe; protects any pool's pages and bitmaps */
static spinlock_t pci_page_lock = SPIN_LOCK_UNLOCKED;
#define POISON_BYTE 0xa5
/**
* pci_pool_create - Creates a pool of pci consistent memory blocks, for dma.
* @name: name of pool, for diagnostics
* @pdev: pci device that will be doing the DMA
* @size: size of the blocks in this pool.
* @align: alignment requirement for blocks; must be a power of two
* @flags: SLAB_* (or GFP_*) flags (not all are supported).
*
* Returns a pci allocation pool with the requested characteristics, or
* null if one can't be created.
*
* Memory from the pool will always be aligned to at least L1_CACHE_BYTES,
or more strictly if requested. The actual size may be larger than requested.
* Such memory will all have "consistent" DMA mappings, accessible by both
* devices and their drivers without using cache flushing primitives.
*
* FIXME: probably need a way to specify "objects must not cross boundaries
* of N Bytes", required by some device interfaces.
*/
extern struct pci_pool *
pci_pool_create (const char *name, struct pci_dev *pdev,
int size, int align, int flags)
{
struct pci_pool *retval;
size_t temp;
if ((LOGICAL_PAGE_SIZE - sizeof (struct pci_page)) < size)
return 0;
if (!(retval = kmalloc (sizeof *retval, flags)))
return retval;
if (align < L1_CACHE_BYTES)
align = L1_CACHE_BYTES;
if (size < align)
size = align;
else if (size % align)
size = (size + align - 1) & ~(align - 1);
strncpy (retval->name, name, sizeof retval->name);
retval->name [sizeof retval->name - 1] = 0;
retval->dev = pdev;
INIT_LIST_HEAD (&retval->pages);
retval->size = size;
retval->align = align;
retval->flags = flags;
temp = sizeof (struct pci_page);
if (temp % align)
temp = (temp + align - 1) & ~(align - 1);
retval->blocks_offset = temp;
retval->blocks_per_page = (LOGICAL_PAGE_SIZE - temp) / size;
temp -= sizeof (struct pci_page); /* bitmap size */
while ((temp * 8) < retval->blocks_per_page) {
temp += align;
retval->blocks_offset += align;
retval->blocks_per_page =
(LOGICAL_PAGE_SIZE - retval->blocks_offset)
/ size;
}
dbg ("create %s/%s, size %d align %d offset %d per-page %d",
pdev->slot_name, retval->name, size, align,
retval->blocks_offset, retval->blocks_per_page);
return retval;
}
EXPORT_SYMBOL (pci_pool_create);
static struct pci_page *
pci_alloc_page (struct pci_pool *pool)
{
void *vaddr;
struct pci_page *page;
dma_addr_t dma;
vaddr = pci_alloc_consistent (pool->dev, LOGICAL_PAGE_SIZE, &dma);
page = (struct pci_page *) vaddr;
if (page) {
page->dma = dma;
memset (page->bitmap, 0,
sizeof (long) *
((pool->blocks_per_page + BITS_PER_LONG - 1)
/ BITS_PER_LONG));
if (pool->flags & SLAB_POISON)
memset (vaddr + pool->blocks_offset, POISON_BYTE,
pool->size * pool->blocks_per_page);
list_add (&page->pages, &pool->pages);
}
return page;
}
static inline int
is_page_busy (int blocks, struct pci_page *page)
{
int i, bit;
for (i = 0; i < blocks; i += BITS_PER_LONG) {
bit = ffs (page->bitmap [i / BITS_PER_LONG]);
if (!bit || (i + bit) > blocks)
continue;
return 1;
}
return 0;
}
static void
pci_free_page (struct pci_pool *pool, struct pci_page *page)
{
dma_addr_t dma = page->dma;
#ifdef CONFIG_SLAB_DEBUG
if (is_page_busy (pool->blocks_per_page, page)) {
printk (KERN_ERR "pcipool %s/%s, free_page %p leak\n",
pool->dev->slot_name, pool->name, page);
return;
}
#endif
if (pool->flags & SLAB_POISON)
memset (page, POISON_BYTE, LOGICAL_PAGE_SIZE);
pci_free_consistent (pool->dev, LOGICAL_PAGE_SIZE, (void *) page, dma);
}
/**
* pci_pool_destroy - destroys a pool of pci memory blocks.
* @pool: pci pool that will be freed
*
* Caller guarantees no more memory from the pool is in use,
* and nothing will try to use the pool after this call.
*/
extern void
pci_pool_destroy (struct pci_pool *pool)
{
unsigned long flags;
struct pci_page *page;
dbg ("destroy %s/%s", pool->dev->slot_name, pool->name);
spin_lock_irqsave (&pci_page_lock, flags);
while (!list_empty (&pool->pages)) {
page = list_entry (pool->pages.next, struct pci_page, pages);
list_del (&page->pages);
pci_free_page (pool, page);
}
spin_unlock_irqrestore (&pci_page_lock, flags);
kfree (pool);
}
EXPORT_SYMBOL (pci_pool_destroy);
/**
* pci_pool_alloc - allocate a block of consistent memory
* @pool: pci pool that will produce the block
* @mem_flags: GFP_KERNEL or GFP_ATOMIC
* @handle: pointer to dma address of block
*
* This returns the kernel virtual address of the block, and reports
* its dma address through the handle.
*/
extern void *
pci_pool_alloc (struct pci_pool *pool, int mem_flags, dma_addr_t *handle)
{
unsigned long flags;
struct list_head *entry;
struct pci_page *page;
int map, block;
size_t offset;
void *retval;
spin_lock_irqsave (&pci_page_lock, flags);
list_for_each (entry, &pool->pages) {
int i;
page = list_entry (entry, struct pci_page, pages);
for (map = 0, i = 0;
i < pool->blocks_per_page;
i += BITS_PER_LONG, map++) {
if (page->bitmap [map] == ~0UL)
continue;
block = ffz (page->bitmap [map]);
if ((i + block) < pool->blocks_per_page) {
set_bit (block, &page->bitmap [map]);
offset = (BITS_PER_LONG * map) + block;
offset *= pool->size;
goto ready;
}
}
}
if (!(page = pci_alloc_page (pool))) {
if (mem_flags == GFP_KERNEL) {
/* FIXME: drop spinlock, block till woken
* or timeout, then restart
*/
printk (KERN_WARNING
"pci_pool_alloc %s/%s, can't block yet\n",
pool->dev->slot_name, pool->name);
}
retval = 0;
goto done;
}
offset = 0;
set_bit (0, &page->bitmap [0]);
ready:
offset += pool->blocks_offset;
retval = offset + (void *) page;
*handle = offset + page->dma;
done:
spin_unlock_irqrestore (&pci_page_lock, flags);
return retval;
}
EXPORT_SYMBOL (pci_pool_alloc);
static struct pci_page *
find_page (struct pci_pool *pool, dma_addr_t dma)
{
unsigned long flags;
struct list_head *entry;
struct pci_page *page;
spin_lock_irqsave (&pci_page_lock, flags);
list_for_each (entry, &pool->pages) {
page = list_entry (entry, struct pci_page, pages);
if (dma < page->dma)
continue;
if (dma < (page->dma + LOGICAL_PAGE_SIZE))
goto done;
}
page = 0;
done:
spin_unlock_irqrestore (&pci_page_lock, flags);
return page;
}
#ifdef pci_pool_dma_to_cpu
# undef pci_pool_dma_to_cpu
#endif
/**
* pci_pool_dma_to_cpu - maps a block's dma address to a kernel virtual address.
* @pool: the pci pool holding the block
* @dma: dma address of the allocated block
*
* Convert a DMA mapping to its cpu address (as returned by pci_pool_alloc), or
* to null if the pool does not have a mapping for that particular dma address.
*
* Don't assume this is cheap, although on some platforms it can be optimized
* into macros that add some constant to the DMA address.
*/
extern void *
pci_pool_dma_to_cpu (struct pci_pool *pool, dma_addr_t dma)
{
struct pci_page *page;
if ((page = find_page (pool, dma)) == 0)
return 0;
return (dma - page->dma) + (void *)page;
}
EXPORT_SYMBOL (pci_pool_dma_to_cpu);
/**
* pci_pool_free - free an entry from a pci pool
* @pool: the pci pool holding the block
* @vaddr: virtual address of block
* @dma: dma address of block
*
* Caller promises neither device nor driver will touch this block unless
it is re-allocated.
*/
extern void
pci_pool_free (struct pci_pool *pool, void *vaddr, dma_addr_t dma)
{
struct pci_page *page;
unsigned long flags;
int block;
if ((page = find_page (pool, dma)) == 0) {
printk (KERN_ERR "pci_pool_free %s/%s, invalid dma %x\n",
pool->dev->slot_name, pool->name, dma);
return;
}
#ifdef CONFIG_SLAB_DEBUG
if (((dma - page->dma) + (void *)page) != vaddr) {
printk (KERN_ERR "pci_pool_free %s/%s, invalid vaddr %p\n",
pool->dev->slot_name, pool->name, vaddr);
return;
}
#endif
if (pool->flags & SLAB_POISON)
memset (vaddr, POISON_BYTE, pool->size);
block = dma - page->dma - pool->blocks_offset;
block /= pool->size;
spin_lock_irqsave (&pci_page_lock, flags);
clear_bit (block % BITS_PER_LONG,
&page->bitmap [block / BITS_PER_LONG]);
/* FIXME: if someone's waiting here, wakeup ... else */
if (!is_page_busy (pool->blocks_per_page, page)) {
list_del (&page->pages);
pci_free_page (pool, page);
}
spin_unlock_irqrestore (&pci_page_lock, flags);
}
EXPORT_SYMBOL (pci_pool_free);
On Fri, Mar 09, 2001, David S. Miller <[email protected]> wrote:
> Manfred Spraul writes:
> > Do lots of drivers need the reverse mapping? It wasn't on my todo list
> > yet.
>
> I am against any API which provides this. It can be extremely
> expensive to do this on some architectures, and since the rest
> of the PCI dma API does not provide such an interface neither
> should the pool routines.
The API I hacked together for uhci.c didn't have this.
> Drivers can keep track of this kind of information themselves,
> and that is what I tell every driver author to do who complains
> of a lack of a "bus_to_virt()" type thing, it's just lazy
> programming.
Once I worked around not having a "bus_to_virt()" type thing I was happier
with the resulting code. I completely agree.
JE
> > unlike the slab allocator bug(s) I pointed out. (And which
> > Manfred seems to have gone silent on.)
>
> which bugs?
See my previous email ... its behavior contradicts its spec,
and I'd sent a patch. You said you wanted kmalloc to have
an "automagic redzoning" feature, which would involve one
more change (to the flags used in kmalloc init when magic
redzoning is in effect). I'd expected a response.
> > * this can easily be optimized, but the best fix would be to
> > * make this just a bus-specific front end to mm/slab.c logic.
> ^^^^
>
> Adding that new front end was already on my todo list for 2.5, but it
> means modifying half of mm/slab.c.
Exactly why I think we need a usable solution changing that
half! And why I asked for feedback about the API, not a
focus on this particular implementation.
> > if (align < L1_CACHE_BYTES)
> > align = L1_CACHE_BYTES;
>
> Why?
To see who was awake, of course! That shouldn't be there.
> > /* Convert a DMA mapping to its cpu address (as returned by pci_pool_alloc).
> > * Don't assume this is cheap, although on some platforms it may be simple
> > * macros adding a constant to the DMA handle.
> > */
> > extern void *
> > pci_pool_dma_to_cpu (struct pci_pool *pool, dma_addr_t handle);
>
> Do lots of drivers need the reverse mapping? It wasn't on my todo list
> yet.
Some hardware (like OHCI) talks to drivers using those dma handles.
- Dave
> Date: Fri, 09 Mar 2001 10:29:22 -0800
> From: David Brownell <[email protected]>
> > > extern void *
> > > pci_pool_dma_to_cpu (struct pci_pool *pool, dma_addr_t handle);
> >
> > Do lots of drivers need the reverse mapping? It wasn't on my todo list
> > yet.
>
> Some hardware (like OHCI) talks to drivers using those dma handles.
I wonder if it may be feasible to allocate a bunch of contiguous
pages. Then, whenever the hardware returns a bus address, subtract
the remembered bus address of the zone start, add the offset to
the virtual and voila.
-- Pete
> > > > extern void *
> > > > pci_pool_dma_to_cpu (struct pci_pool *pool, dma_addr_t handle);
> > >
> > > Do lots of drivers need the reverse mapping? It wasn't on my todo list
> > > yet.
> >
> > Some hardware (like OHCI) talks to drivers using those dma handles.
>
> I wonder if it may be feasible to allocate a bunch of contiguous
> pages. Then, whenever the hardware returns a bus address, subtract
> the remembered bus address of the zone start, add the offset to
> the virtual and voila.
That's effectively what the implementation I posted is doing.
Simple math ... as soon as you get the right "logical page",
and that page size could become a per-pool tunable. Currently
one logical page is PAGE_SIZE; there are some issues to
deal with in terms of not crossing page boundaries.
There can be multiple such pages, known to the pool allocator
and hidden from the device drivers. I'd expect most USB host
controllers wouldn't allocate more than one or two pages, so
the cost of this function would typically be small.
- Dave
> > > Do lots of drivers need the reverse mapping? It wasn't on my todo list
> > > yet.
> >
> > I am against any API which provides this. It can be extremely
> > expensive to do this on some architectures,
The implementation I posted needed no architecture-specific
knowledge. If cost is the issue, fine -- this makes it finite (not
infinite), and some drivers can eliminate that cost.
> > and since the rest
> > of the PCI dma API does not provide such an interface neither
> > should the pool routines.
>
> The API I hacked together for uhci.c didn't have this.
But it didn't handle the OHCI done-list processing, and we've heard
a lot more about pci_*_consistent being needed with OHCI than
with UHCI; it's more common on non-Intel architectures.
Given that some hardware must return the dma addresses, why
should it be a good thing to have an API that doesn't expose
the notion of a reverse mapping? At this level -- not the lower
level code touching hardware PTEs.
- Dave
Pete Zaitcev writes:
> > Some hardware (like OHCI) talks to drivers using those dma handles.
>
> I wonder if it may be feasible to allocate a bunch of contiguous
> pages. Then, whenever the hardware returns a bus address, subtract
> the remembered bus address of the zone start, add the offset to
> the virtual and voila.
Thankfully, some people are not lazy :-)
Later,
David S. Miller
[email protected]
David Brownell writes:
> > Do lots of drivers need the reverse mapping? It wasn't on my todo list
> > yet.
>
> Some hardware (like OHCI) talks to drivers using those dma handles.
Drivers for such hardware will thus keep track of the information
necessary to make this reverse mapping.
Later,
David S. Miller
[email protected]
David Brownell writes:
> Given that some hardware must return the dma addresses, why
> should it be a good thing to have an API that doesn't expose
> the notion of a reverse mapping? At this level -- not the lower
> level code touching hardware PTEs.
Because it's _very_ expensive on certain machines. You have to do
1 or more I/O accesses to get at the PTEs.
If you add this reverse notion to just one API (the dma pool one) then
people will complain (rightly) that there is not orthogonality in the
API since the other mapping functions do not provide it.
No, it is unacceptable.
Later,
David S. Miller
[email protected]
On Fri, 9 Mar 2001, David Brownell wrote:
> > > > > extern void *
> > > > > pci_pool_dma_to_cpu (struct pci_pool *pool, dma_addr_t handle);
> > > >
> > > > Do lots of drivers need the reverse mapping? It wasn't on my todo list
> > > > yet.
> > >
> > > Some hardware (like OHCI) talks to drivers using those dma handles.
> >
> > I wonder if it may be feasible to allocate a bunch of contiguous
> > pages. Then, whenever the hardware returns a bus address, subtract
> > the remembered bus address of the zone start, add the offset to
> > the virtual and voila.
>
> That's effectively what the implementation I posted is doing.
>
> Simple math ... as soon as you get the right "logical page",
> and that page size could become a per-pool tunable. Currently
> one logical page is PAGE_SIZE; there are some issues to
> deal with in terms of not crossing page boundaries.
>
> There can be multiple such pages, known to the pool allocator
> and hidden from the device drivers. I'd expect most USB host
> controllers wouldn't allocate more than one or two pages, so
> the cost of this function would typically be small.
Just for information to people that want to complexify the
pci_alloc_consistent() interface that looks simple and elegant to me:
(Hopefully, I am not off topic here)
1) The sym53c8xx driver and friends expect this simple interface to
return naturally aligned memory chunks. It mostly allocates 1 page
at a time.
2) The sym* drivers use a very simple allocator that keeps track of bus
addresses for each chunk (page sized).
The object file of the allocator as seen in sym2 is as tiny as 3.4K
unstripped and 2.5K stripped.
3) The drivers need reverse virtual addresses for the DSA bus addresses
and implement a simplistic hashed list that costs no more
than 10 lines of trivial C code.
Btw, as a result, if pci_alloc_consistent() ever returns addresses
that are not naturally aligned, the sym* drivers will be broken for
Linux.
This leaves me very surprised by all this noise given the _no_ issue I
have seen using the pci_alloc_consistent() interface. It looked to me a
_lot_ more simple to use than the equivalent interfaces I have had to
suffer with other O/Ses.
Now, if modern programmers are expecting Java-like interfaces for writing
kernel software, it is indeed another story. :-)
Gérard.
> > Given that some hardware must return the dma addresses, why
> > should it be a good thing to have an API that doesn't expose
> > the notion of a reverse mapping? At this level -- not the lower
> > level code touching hardware PTEs.
>
> Because its' _very_ expensive on certain machines. You have to do
> 1 or more I/O accesses to get at the PTEs.
Except, I said this was NOT at that level. Those costs don't
need to be incurred, but you are reasoning as if they did.
> If you add this reverse notion to just one API (the dma pool one) then
> people will complain (rightly) that there is not orthogonality in the
> API since the other mapping functions do not provide it.
"Orthogonality" is the wrong word there. In fact, this is a highly
orthogonal approach: each layer deals with distinct problems.
(Which is why I'd ignore that complaint.)
There's a bunch of functionality drivers need to have, and which
the pci_*_consistent() layer APIs (rightly) don't provide. Just
like a kmem_cache provides functionality that's not visible
through the generic page allocator code; except that this needs
to work with the pci-specific page allocator.
It feels to me like you're being inconsistent here, objecting
to a library API for some functionality (mapping) yet not for
any of the other functionality (alignment, small size, poisoning
and so on). And yet when Pete Zaitcev described what that
mapping code actually involved, you didn't object. So you've
succeeded in confusing me. Care to unconfuse?
- Dave
> I wonder if it may be feasible to allocate a bunch of contiguous
> pages. Then, whenever the hardware returns a bus address, subtract
> the remembered bus address of the zone start, add the offset to
> the virtual and voila.
Even if not, you can hash by page number rather than the low bits, so
the hash is way smaller. You can also (in most cases) write the entry
number in the relevant tables onto the end of the object in spare space
(or in front of it).
Something as trivial as
struct usb_thingy
{
u32 thing_id;
u32 flags;
struct usb_thingy *next;
#ifndef __LP64__
u32 pad;
#endif
struct usb_controller_goo goo;
}
Alan
> Date: Fri, 09 Mar 2001 13:14:03 -0800
> From: David Brownell <[email protected]>
>[...]
> It feels to me like you're being inconsistent here, objecting
> to a library API for some functionality (mapping) yet not for
> any of the other functionality (alignment, small size, poisoning
> and so on). And yet when Pete Zaitcev described what that
> mapping code actually involved, you didn't object. So you've
> succeeded in confusing me. Care to unconfuse?
I did not propose an API or library which would be equal among equals
with first-rate citizens like pci_alloc_xxx and friends. I pointed out
that a driver can track reverse mappings at very little cost by using
offsets [Alan remarked on how a hash can use the page number]; so, one
may say that I supported DaveM's viewpoint. No wonder he did not object.
-- Pete
Gérard --
> Just for information to people that want to complexify the
> pci_alloc_consistent() interface that looks simple and elegant to me:
I certainly didn't propose that! Just a layer on top of the
pci_alloc_consistent code -- used as a page allocator, just
like you used it.
> The object file of the allocator as seen in sym2 is as tiny as 3.4K
> unstripped and 2.5K stripped.
What I sent along just compiled to 2.3 KB ... stripped, and "-O".
Maybe smaller with normal kernel flags. The reverse mapping
code has to be less than 0.1 KB.
I looked at your code, but it didn't seem straightforward to reuse.
I think the allocation and deallocation costs can be pretty comparable
in the two implementations. Your implementation might even fit behind
the API I sent. They're both layers over pci_*_consistent (and both
have address-to-address mappings, implemented much the same).
> Now, if modern programmers are expecting Java-like interfaces for writing
> kernel software, it is indeed another story. :-)
Only if when you wrote "Java-like" you really meant "reusable"! :)
- Dave
On Fri, 9 Mar 2001, David Brownell wrote:
> Gérard --
>
> > Just for information to people that want to complexify the
> > pci_alloc_consistent() interface that looks simple and elegant to me:
>
> I certainly didn't propose that! Just a layer on top of the
> pci_alloc_consistent code -- used as a page allocator, just
> like you used it.
>
>
> > The object file of the allocator as seen in sym2 is as tiny as 3.4K
> > unstripped and 2.5K stripped.
>
> What I sent along just compiled to 2.3 KB ... stripped, and "-O".
> Maybe smaller with normal kernel flags. The reverse mapping
> code has to be less than 0.1 KB.
If reverse mapping means bus_to_virt(), then I would suggest not
providing it, since it is a confusing interface. OTOH, only a few drivers
need or want to retrieve the virtual address that led to some bus DMA
address, and they should check that this virtual address is still valid
prior to using it. As I wrote, some trivial hashed list can be used by
such drivers (as sym* do).
> I looked at your code, but it didn't seem straightforward to reuse.
> I think the allocation and deallocation costs can be pretty comparable
> in the two implementations. Your implementation might even fit behind
> the API I sent. They're both layers over pci_*_consistent (and both
> have address-to-address mappings, implemented much the same).
I wanted the code as short as possible, since the driver code is already
very large. On the other hand, there are bunches of #ifdefs to deal with
all the kernel versions still alive. As a result, the code may well not be
general or clean enough to be moved into the kernel. Still, what it
actually does is fairly simple.
> > Now, if modern programmers are expecting Java-like interfaces for writing
> > kernel software, it is indeed another story. :-)
>
> Only if when you wrote "Java-like" you really meant "reusable"! :)
Hmmm... 'reusable' implies 'usable'...
Does 'usable' apply to Java applications ? :-)
Gérard.
> > The reverse mapping
> > code has to be less than 0.1 KB.
>
> If reverse mapping means bus_to_virt(), then I would suggest not to
> provide it since it is a confusing interface. OTOH, only a few drivers
> need or want to retrieve the virtual address that led to some bus DMA
Your SCSI code went the other way; the logic is about the same.
That's easy enough ... I'm not going to argue that point any longer.
The driver might even have Real Intelligence to apply. But I wonder
how many assumptions drivers will end up making about those dma
mappings. It may be important to expose the "logical" page size to
the driver ("don't cross 4k boundaries"); currently it's hidden.
Other than that L1_CACHE goof, it seems this was the main thing
needing to change in the API I sent. Sound right? Implementation
would be a different question. I'm not in the least attached to what
I sent, but some implementation is needed. Slab-like, or buddy? :)
> Does 'usable' apply to Java applications ? :-)
Servers and other non-gui tools? I don't see why not. You can make
good systems software in many languages. There are advantages to
not having those classes of memory-related bugs. I'm looking forward
to GCC 3.0 with GCJ, compiling Java just like C. But that's OT.
- Dave