diff -urpN -X /home/fletch/.diff.exclude 12-boot_error/drivers/net/starfire.c 19-fix_starfire_warning/drivers/net/starfire.c
--- 12-boot_error/drivers/net/starfire.c Fri Dec 13 23:17:59 2002
+++ 19-fix_starfire_warning/drivers/net/starfire.c Thu Jan 2 22:18:18 2003
@@ -1847,15 +1847,15 @@ static int netdev_close(struct net_devic
#ifdef __i386__
if (debug > 2) {
- printk("\n"KERN_DEBUG" Tx ring at %8.8x:\n",
- np->tx_ring_dma);
+ printk("\n"KERN_DEBUG" Tx ring at %9.9Lx:\n",
+ (u64) np->tx_ring_dma);
for (i = 0; i < 8 /* TX_RING_SIZE is huge! */; i++)
printk(KERN_DEBUG " #%d desc. %8.8x %8.8x -> %8.8x.\n",
i, le32_to_cpu(np->tx_ring[i].status),
le32_to_cpu(np->tx_ring[i].first_addr),
le32_to_cpu(np->tx_done_q[i].status));
- printk(KERN_DEBUG " Rx ring at %8.8x -> %p:\n",
- np->rx_ring_dma, np->rx_done_q);
+ printk(KERN_DEBUG " Rx ring at %9.9Lx -> %p:\n",
+ (u64) np->rx_ring_dma, np->rx_done_q);
if (np->rx_done_q)
for (i = 0; i < 8 /* RX_RING_SIZE */; i++) {
printk(KERN_DEBUG " #%d desc. %8.8x -> %8.8x\n",
On Sat, 04 Jan 2003 21:08:33 -0800, Martin J. Bligh <[email protected]> wrote:
> diff -urpN -X /home/fletch/.diff.exclude 12-boot_error/drivers/net/starfire.c 19-fix_starfire_warning/drivers/net/starfire.c
> --- 12-boot_error/drivers/net/starfire.c Fri Dec 13 23:17:59 2002
> +++ 19-fix_starfire_warning/drivers/net/starfire.c Thu Jan 2 22:18:18 2003
Fix the compiler warning, yes; fix the driver for 64-bit dma_addr_t, no.
It may work with PAE, by chance, if all addresses returned by pci_map_single
and friends are < (1 << 32), but not otherwise.
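The problem is in the descriptor setup, not in the printk: somewhere the
driver does the equivalent of

        np->tx_ring[entry].first_addr = cpu_to_le32(mapping);

(illustrative; the exact line in 1.3.6 may differ), and the cpu_to_le32()
silently drops bits 32-63 of the mapping. That's fine as long as every
address pci_map_single returns fits in 32 bits, and silently wrong the
first time one doesn't.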
Jeff already has an updated starfire driver in his queue, complete with
full 64-bit support.
[the cc: to the maintainer is always appreciated...]
Ion
[starfire driver maintainer]
--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
> Fix the compiler warning, yes; fix the driver for 64-bit dma_addr_t, no.
> It may work with PAE, by chance, if all addresses returned by pci_map_single
> and friends are < (1 << 32), but not otherwise.
Odd. It seems to work with PAE now. pci_map_single just casts an address
though ... are the things you're passing it always allocated from ZONE_NORMAL?
I run these all the time on 16Gb machines with 16 processors (ia32 NUMA-Q).
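(By "just casts an address" I mean that, if I remember right, the i386
version without highmem I/O is essentially

        static inline dma_addr_t
        pci_map_single(struct pci_dev *hwdev, void *ptr, size_t size, int direction)
        {
                return virt_to_bus(ptr);        /* physical address of a lowmem buffer */
        }

so whatever it returns is bounded by wherever the buffer was allocated.)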
> Jeff already has an updated starfire driver in his queue, complete with
> full 64-bit support.
Cool! Sorry, I've just been seeing that warning for about 1 year, and was
sick of it.
> [the cc: to the maintainer is always appreciated...]
Sorry, missed that. But in that case, I have another question for you ;-)
Why do I get weird errors like this:
Jan 6 10:09:53 larry kernel: eth0: Increasing Tx FIFO threshold to 80 bytes
Jan 6 10:09:56 larry kernel: eth0: Increasing Tx FIFO threshold to 96 bytes
Jan 6 10:09:56 larry kernel: eth0: Increasing Tx FIFO threshold to 112 bytes
Jan 6 10:10:09 larry kernel: eth0: Increasing Tx FIFO threshold to 128 bytes
Jan 6 10:10:09 larry kernel: eth0: Increasing Tx FIFO threshold to 144 bytes
Jan 6 10:10:12 larry kernel: eth0: Increasing Tx FIFO threshold to 160 bytes
Jan 6 10:10:12 larry kernel: eth0: Increasing Tx FIFO threshold to 176 bytes
Jan 6 10:10:14 larry kernel: eth0: Increasing Tx FIFO threshold to 192 bytes
Jan 6 10:10:30 larry kernel: eth0: Increasing Tx FIFO threshold to 208 bytes
Jan 6 10:10:32 larry kernel: eth0: Increasing Tx FIFO threshold to 224 bytes
Jan 6 10:10:39 larry kernel: eth0: Increasing Tx FIFO threshold to 240 bytes
Jan 6 10:10:39 larry kernel: eth0: Increasing Tx FIFO threshold to 256 bytes
Jan 6 10:10:39 larry kernel: eth0: Increasing Tx FIFO threshold to 272 bytes
Jan 6 10:10:46 larry kernel: eth0: Increasing Tx FIFO threshold to 288 bytes
Jan 6 10:10:46 larry kernel: eth0: Increasing Tx FIFO threshold to 304 bytes
Jan 6 10:10:47 larry kernel: eth0: Increasing Tx FIFO threshold to 320 bytes
Jan 6 10:10:47 larry kernel: eth0: Increasing Tx FIFO threshold to 336 bytes
Jan 6 10:10:57 larry kernel: eth0: Increasing Tx FIFO threshold to 352 bytes
Jan 6 10:10:57 larry kernel: eth0: Increasing Tx FIFO threshold to 368 bytes
Jan 6 10:11:22 larry kernel: eth0: Increasing Tx FIFO threshold to 384 bytes
Jan 6 10:11:37 larry kernel: eth0: Increasing Tx FIFO threshold to 400 bytes
Jan 6 10:11:58 larry kernel: eth0: Increasing Tx FIFO threshold to 416 bytes
Jan 6 10:12:29 larry kernel: eth0: Increasing Tx FIFO threshold to 432 bytes
Jan 6 10:12:29 larry kernel: eth0: Increasing Tx FIFO threshold to 448 bytes
Jan 6 10:12:29 larry kernel: eth0: Increasing Tx FIFO threshold to 464 bytes
I also recall getting errors like "Something Wicked happened", but I
don't seem to be able to find them in the log right now.
Thanks,
M.
On Mon, 6 Jan 2003, Martin J. Bligh wrote:
> > Fix the compiler warning, yes; fix the driver for 64-bit dma_addr_t, no.
> > It may work with PAE, by chance, if all addresses returned by pci_map_single
> > and friends are < (1 << 32), but not otherwise.
>
> Odd. It seems to work with PAE now. pci_map_single just casts an address
> though ... are the things you're passing it always allocated from ZONE_NORMAL?
> I run these all the time on 16Gb machines with 16 processors (ia32 NUMA-Q).
I get them from the network stack, and I suppose they're guaranteed to be
in ZONE_NORMAL as long as the adapter doesn't list NETIF_F_HIGHDMA among
its features. So yes, you're right that it will probably work with PAE,
but it won't work with a real 64-bit box, methinks.
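For comparison, a driver that really can do 64-bit DMA advertises it at
probe time, roughly like this (a sketch, not the actual starfire code):

        /* claim 64-bit addressing; only then may the stack hand us highmem skbs */
        if (pci_set_dma_mask(pdev, 0xffffffffffffffffULL) == 0)
                dev->features |= NETIF_F_HIGHDMA;
        else
                pci_set_dma_mask(pdev, 0xffffffffULL);  /* fall back to 32-bit */

Without NETIF_F_HIGHDMA the stack copies any highmem skb down into lowmem
before handing it to the driver.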
> Cool! Sorry, I've just been seeing that warning for about 1 year, and was
> sick of it.
I was kinda busy and didn't get a chance to do much with the driver over
the last 10 months or so... I also didn't have any boxes which could do
real PAE, so I didn't try very hard -- until recently, that is. :-)
> Sorry, missed that. But in that case, I have another question for you ;-)
> Why do I get weird errors like this:
>
> Jan 6 10:09:53 larry kernel: eth0: Increasing Tx FIFO threshold to 80 bytes
> Jan 6 10:09:56 larry kernel: eth0: Increasing Tx FIFO threshold to 96 bytes
A few of these are mostly normal, it's the card signalling the driver that
it is getting a Tx fifo underrun, and the driver responds by increasing
the threshold at which the card starts transmitting the packet.
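The interrupt handler logic is essentially this (paraphrased from memory,
register and field names approximate):

        /* Tx FIFO underrun: tell the card to start transmitting later, i.e.
         * only after more of the packet has been fetched into the FIFO. */
        if (intr_status & IntrTxDataLow) {
                if (np->tx_threshold <= PKT_BUF_SZ / 16) {
                        np->tx_threshold++;
                        writel(np->tx_threshold, ioaddr + TxThreshold);
                }
                printk(KERN_NOTICE "%s: Increasing Tx FIFO threshold to %d bytes\n",
                       dev->name, np->tx_threshold * 16);
        }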
> Jan 6 10:12:29 larry kernel: eth0: Increasing Tx FIFO threshold to 448 bytes
> Jan 6 10:12:29 larry kernel: eth0: Increasing Tx FIFO threshold to 464 bytes
These are very high, however; it could be that there really is very high
contention on the PCI bus, but otherwise I can't explain them.
If they stop before reaching 1500, then it's probably ok and just
something you're gonna have to live with. Otherwise it's a bug of some
sort.
> I also recall getting errors like "Something Wicked happened", but I
> don't seem to be able to find them in the log right now.
Yeah, the interrupt status (printed right after the messages) would have
been helpful...
That said, there is a known race condition in v1.3.6 of the driver,
which could cause timeouts and errors under certain circumstances. It's all
fixed in 1.3.9 (the full 64-bit support version) and 1.4.0 (NAPI support).
Both of those will do real 64-bit transfers, without the need for double
buffering, so it should help on your 16GB boxes.
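"Real 64-bit transfers" just means the full bus address from pci_map_single
goes straight into the descriptor, along the lines of (field names are
illustrative, not the actual descriptor layout):

        dma_addr_t mapping = pci_map_single(np->pci_dev, skb->data, skb->len,
                                            PCI_DMA_TODEVICE);
        np->tx_ring[entry].addr = cpu_to_le64(mapping); /* full 64-bit address */

instead of bouncing the data through a 32-bit-addressable buffer first.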
I could forward you one of those versions, if you want to test it. In
fact, I'd appreciate some testing! :-)
Thanks,
Ion
--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
>> Odd. It seems to work with PAE now. pci_map_single just casts an address
>> though ... are the things you're passing it always allocated from ZONE_NORMAL?
>> I run these all the time on 16Gb machines with 16 processors (ia32 NUMA-Q).
>
> I get them from the network stack, and I suppose they're guaranteed to be
> in ZONE_NORMAL as long as the adapter doesn't list NETIF_F_HIGHDMA among
> its features. So yes, you're right that it will probably work with PAE,
> but it won't work with a real 64-bit box, methinks.
Well, it seems like the Right Thing To Do anyway to make it all 64-bit clean ;-)
> A few of these are mostly normal, it's the card signalling the driver that
> it is getting a Tx fifo underrun, and the driver responds by increasing
> the threshold at which the card starts transmitting the packet.
Can we not print them onto the console if they're normal then?
>> Jan 6 10:12:29 larry kernel: eth0: Increasing Tx FIFO threshold to 448 bytes
>> Jan 6 10:12:29 larry kernel: eth0: Increasing Tx FIFO threshold to 464 bytes
>
> These are very high, however; it could be that there really is very high
> contention on the PCI bus, but otherwise I can't explain them.
>
> If they stop before reaching 1500, then it's probably ok and just
> something you're gonna have to live with. Otherwise it's a bug of some
> sort.
I think the card took itself offline at this point, so it smells like a bug.
That's only been happening recently though (I've only noticed it in the last
week, out of a year or two of use).
>> I also recall getting errors like "Something Wicked happened", but I
>> don't seem to be able to find them in the log right now.
>
> Yeah, the interrupt status (printed right after the messages) would have
> been helpful...
OK, will try to grab that.
> That said, there is a known race condition in v1.3.6 of the driver,
> which could cause timeouts and errors under certain circumstances. It's all
> fixed in 1.3.9 (the full 64-bit support version) and 1.4.0 (NAPI support).
> Both of those will do real 64-bit transfers, without the need for double
> buffering, so it should help on your 16GB boxes.
>
> I could forward you one of those versions, if you want to test it. In
> fact, I'd appreciate some testing! :-)
Sure, send me the patch, these boxes bring out races like dying rich aunts
bring out friendly relatives. And I have a cabinet drawer full of starfire
cards ;-)
M.
On Mon, 6 Jan 2003, Martin J. Bligh wrote:
> > A few of these are mostly normal, it's the card signalling the driver that
> > it is getting a Tx fifo underrun, and the driver responds by increasing
> > the threshold at which the card starts transmitting the packet.
>
> Can we not print them onto the console if they're normal then?
They're only semi-normal, since they signal some unusual contention on the
PCI bus... but yeah, I guess we could lower their priority to KERN_INFO.
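I.e. something like

        printk(KERN_INFO "%s: Increasing Tx FIFO threshold to %d bytes\n",
               dev->name, np->tx_threshold * 16);

so it still ends up in the log but stays off the console at the default
loglevel.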
> I think the card took itself offline at this point, so it smells like a bug.
> That's only been happening recently though (I've only noticed it in the last
> week, out of a year or two of use).
It could definitely be a bug (known or not). Anyway, it would be good to
test it with the latest version of the driver.
> Sure, send me the patch, these boxes bring out races like dying rich aunts
> bring out friendly relatives. And I have a cabinet drawer full of starfire
> cards ;-)
All right, I'll forward it off-list.
Thanks,
Ion
--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
I've been using your new starfire driver for a couple of weeks now,
and it's in 2.5-mjb ... been testing on the 16x NUMA-Q w/16Gb RAM.
Not only does it work just fine with no problems, all the weird error
messages I had before went away ;-)
Seems cool to me, are you ready to push this to Linus?
M.
On Sun, 9 Feb 2003, Martin J. Bligh wrote:
> I've been using your new starfire driver for a couple of weeks now,
> and it's in 2.5-mjb ... been testing on the 16x NUMA-Q w/16Gb RAM.
>
> Not only does it work just fine with no problems, all the weird error
> messages I had before went away ;-)
Good to hear that...
Which version did you end up using: 1.4.0 or 1.3.9?
> Seems cool to me, are you ready to push this to Linus?
Well, 1.3.9 is already in Marcelo's tree. I'll push 1.4.1 -- which is
1.4.0 plus a compile option for enabling/disabling NAPI -- to Linus as
soon as I finish testing with both compile options.
Thanks,
Ion
--
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.
>> I've been using your new starfire driver for a couple of weeks now,
>> and it's in 2.5-mjb ... been testing on the 16x NUMA-Q w/16Gb RAM.
>>
>> Not only does it work just fine with no problems, all the weird error
>> messages I had before went away ;-)
>
> Good to hear that...
>
> Which version did you end up using: 1.4.0 or 1.3.9?
1.4.0
>> Seems cool to me, are you ready to push this to Linus?
>
> Well, 1.3.9 is already in Marcelo's tree. I'll push 1.4.1 -- which is
> 1.4.0 plus a compile option for enabling/disabling NAPI -- to Linus as
> soon as I finish testing with both compile options.
Sounds good to me! Thanks for fixing all this,
M.