On 2.4.18-rc2 with the Dolphin SCI adapters installed and running
with 2 GB of physical memory installed in the system, ioremap() fails
to properly map the SCI adapter PREFETCH space. This error does not
occur on 2.4.18-rc2 when the physical memory installed in the system
is less than 1 GB. The error is preceded by a failure in the PCI
subsystem to properly allocate resources, which reports the following
message in the /var/log/messages file:
Feb 20 09:49:38 lnx1 kernel: PCI: PCI BIOS revision 2.10 entry at 0xfdbb1, last bus=2
Feb 20 09:49:38 lnx1 kernel: PCI: Using configuration type 1
Feb 20 09:49:38 lnx1 kernel: PCI: Probing PCI hardware
Feb 20 09:49:38 lnx1 kernel: PCI: Unable to handle 64-bit address for device 00:05.0
Feb 20 09:49:38 lnx1 kernel: PCI: Unable to handle 64-bit address for device 00:05.1
Feb 20 09:49:38 lnx1 kernel: PCI: Discovered primary peer bus 02 [IRQ]
Feb 20 09:49:38 lnx1 kernel: PCI: Using IRQ router ServerWorks [1166/0200] at 00:0f.0
Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B0,I2,P0) -> 18
Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B0,I6,P0) -> 31
Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B0,I15,P0) -> 10
Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B1,I0,P0) -> 30
Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B2,I2,P0) -> 24
=====>
Feb 20 09:49:38 lnx1 kernel: PCI: Cannot allocate resource region 0 of device 00:05.0
Feb 20 09:49:38 lnx1 kernel: PCI: Cannot allocate resource region 0 of device 00:05.1
Feb 20 09:49:38 lnx1 kernel: PCI: Failed to allocate resource 1(0-ffffffff) for 00:05.0
Feb 20 09:49:38 lnx1 kernel: PCI: Failed to allocate resource 1(0-ffffffff) for 00:05.1
This message does not occur on systems running 2.4.18-rc2 if they have
less than 1 GB of physical memory installed. The address ranges being
mapped are as follows, and the SCI adapter reports the attached errors:
Feb 20 09:49:45 lnx1 sci: Reading IRM driver configuration informaiton from /opt/DIS/sbin/../lib/modules/pcisci.conf
Feb 20 09:49:45 lnx1 kernel: SCI Driver : Linux SMP support disabled
Feb 20 09:49:45 lnx1 kernel: SCI Driver : using MTRR
Feb 20 09:49:45 lnx1 kernel: PCI SCI Bridge - device id 0xd667 found
Feb 20 09:49:45 lnx1 kernel: 1 supported PCI-SCI bridges (PSB's) found on the system
Feb 20 09:49:45 lnx1 kernel: Define PSB 1 key: Bus: 2 DevFn: 16
Feb 20 09:49:45 lnx1 kernel: Key 1: Key: (Bus: 2,DevFn: 16), Device No. 1, irq 24
Feb 20 09:49:45 lnx1 kernel: Mapping address space non cacheable CSR space: phaddr febe0000 sz 131072 out of 131072
Feb 20 09:49:45 lnx1 kernel: Mapping address space non cacheable CSR space: vaddr f8914000
Feb 20 09:49:45 lnx1 kernel: Mapping address space non cacheable IO space: phaddr fd000000 sz 16777216 out of 16777216
Feb 20 09:49:45 lnx1 kernel: Mapping address space non cacheable IO space: vaddr f8935000
Feb 20 09:49:45 lnx1 kernel: SCI Adapter 0 : User request to reduce prefetchspace size from 0x20000000 to 0x1c000000
=====>
Feb 20 09:49:45 lnx1 kernel: Mapping address space PREFETCH space: phaddr c0000000 sz 469762048 out of 469762048
Feb 20 09:49:45 lnx1 kernel: Failed to map address space PREFETCH space
Feb 20 09:49:45 lnx1 kernel: SCI Driver : Adapter init failed!!
Feb 20 09:49:45 lnx1 kernel: SCI Driver : init failed!
Feb 20 09:49:45 lnx1 sci: ERROR: IRM Driver failed to load
Feb 20 09:49:45 lnx1 sci: Check /var/log/scilog for details
Feb 20 09:49:45 lnx1 rc: Starting sci: failed
/proc/iomem on the system reports the following hardware devices
and their address mappings:
00000000-0009f3ff : System RAM
0009f400-0009f7ff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000cc000-000ccfff : Extension ROM
000f0000-000fffff : System ROM
00100000-3fffffff : System RAM
00100000-002881d5 : Kernel code
002881d6-00319977 : Kernel data
b1fff000-b1ffffff : ServerWorks CNB20HE Host Bridge
b2000000-b3ffffff : ServerWorks CNB20HE Host Bridge
b4100000-bc1fffff : PCI Bus #01
b8000000-bbffffff : ATI Technologies Inc Rage 128 PF
=========> we are mapping this address range
c0000000-dfffffff : Dolphin Interconnect Solutions AS PSB66 SCI-Adapter D33x
fc400000-fc4fffff : PCI Bus #01
fc4fc000-fc4fffff : ATI Technologies Inc Rage 128 PF
fc900000-fc9fffff : Intel Corp. 82557 [Ethernet Pro 100]
fcac0000-fcadffff : Intel Corp. 82543GC Gigabit Ethernet Controller
fcae0000-fcaeffff : Intel Corp. 82543GC Gigabit Ethernet Controller
fcafe000-fcafefff : Intel Corp. 82557 [Ethernet Pro 100]
fcafe000-fcafefff : eepro100
fcaff000-fcafffff : ServerWorks OSB4/CSB5 OHCI USB Controller
fd000000-fdffffff : Dolphin Interconnect Solutions AS PSB66 SCI-Adapter D33x
febe0000-febfffff : Dolphin Interconnect Solutions AS PSB66 SCI-Adapter D33x
fec00000-fec01fff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved
The specific code that is failing is attached. Any ideas as to what
is broken here? It looks like the mapping code is broken if the system
has over 1 GB of physical memory with certain hardware. The window
size we are attempting to map is 512 MB. This works with less than
1 GB of memory; I have tried both the 4GB and 64GB highmem compile
options, and neither works.
Jeff
Here's the code fragment in question:
/*
* ----------------------------------------------------------------------------
* ( U N ) M A P P S B A D D R S P A C E
* ----------------------------------------------------------------------------
*/
static int
MapPsbAddrSpace(Sci_p up, memarea_t *as)
{
	char *id = as->id ? as->id : "UNKNOWN";

	if (!as->ioaddr) {
		osif_warn("Tried to map phaddr 0x%x space %s unit %d\n",
			  as->ioaddr, id, up->os.unit_no);
		return -1;
	}
	if (as->vaddr) {
		osif_warn("Tried to map already mapped space %s unit %d\n",
			  id, up->os.unit_no);
		return -1;
	}
#if 1
	osif_note("Mapping address space %s: phaddr %x sz %u out of %u",
		  id, as->ioaddr, as->msize, as->rsize);
#endif
#ifdef CPU_ARCH_IS_ALPHA
#warning This looks quite suspect
	if ((as->vaddr = (vkaddr_t)(dense_mem((unsigned)as->ioaddr) +
				    (unsigned)as->ioaddr)) == 0) {
#else
=====> we are failing at this point
	if ((as->vaddr = (vkaddr_t)ioremap((unsigned)as->ioaddr, as->msize)) == 0) {
#endif
		osif_warn("Failed to map address space %s\n", id);
		return -1;
	}
	as->ph_base_addr = virt_to_phys(bus_to_virt(as->ioaddr));
#if 1
	osif_note("Mapping address space %s: vaddr %lx", id, as->vaddr);
#endif
	return 0;
}
Jeff
On Wed, Feb 20, 2002 at 10:33:20AM -0700, Jeff V. Merkey wrote:
>
>
> On 2.4.18-rc2 with the Dolphin SCI adapters installed and running
> with 2 GB of physical memory installed in the system, ioremap() fails
> to properly map the SCI adapter PREFETCH space. This error does not
> occur opn 2.4.18-rc2 when the physical memory installed in the system
> is less than 1 GB. This error is proceeded by a failure in the PCI
> subsystem to properly allocate resources and reports the following
> message in the /var/log/messages file:
>
>
> Feb 20 09:49:38 lnx1 kernel: PCI: PCI BIOS revision 2.10 entry at 0xfdbb1, last bus=2
> Feb 20 09:49:38 lnx1 kernel: PCI: Using configuration type 1
> Feb 20 09:49:38 lnx1 kernel: PCI: Probing PCI hardware
> Feb 20 09:49:38 lnx1 kernel: PCI: Unable to handle 64-bit address for device 00:05.0
> Feb 20 09:49:38 lnx1 kernel: PCI: Unable to handle 64-bit address for device 00:05.1
> Feb 20 09:49:38 lnx1 kernel: PCI: Discovered primary peer bus 02 [IRQ]
> Feb 20 09:49:38 lnx1 kernel: PCI: Using IRQ router ServerWorks [1166/0200] at 00:0f.0
> Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B0,I2,P0) -> 18
> Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B0,I6,P0) -> 31
> Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B0,I15,P0) -> 10
> Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B1,I0,P0) -> 30
> Feb 20 09:49:38 lnx1 kernel: PCI->APIC IRQ transform: (B2,I2,P0) -> 24
>
> =====>
>
> Feb 20 09:49:38 lnx1 kernel: PCI: Cannot allocate resource region 0 of device 00:05.0
> Feb 20 09:49:38 lnx1 kernel: PCI: Cannot allocate resource region 0 of device 00:05.1
> Feb 20 09:49:38 lnx1 kernel: PCI: Failed to allocate resource 1(0-ffffffff) for 00:05.0
> Feb 20 09:49:38 lnx1 kernel: PCI: Failed to allocate resource 1(0-ffffffff) for 00:05.1
>
> This message does not occur on systems running 2.4.18-rc2 if they have
> less than 1 GB of physical memory installed. The address ranges being
> mapped are as follows, and the SCI adapter reports the attached errors:
>
>
> Feb 20 09:49:45 lnx1 sci: Reading IRM driver configuration informaiton from /opt/DIS/sbin/../lib/modules/pcisci.conf
> Feb 20 09:49:45 lnx1 kernel: SCI Driver : Linux SMP support disabled
> Feb 20 09:49:45 lnx1 kernel: SCI Driver : using MTRR
> Feb 20 09:49:45 lnx1 kernel: PCI SCI Bridge - device id 0xd667 found
> Feb 20 09:49:45 lnx1 kernel: 1 supported PCI-SCI bridges (PSB's) found on the system
> Feb 20 09:49:45 lnx1 kernel: Define PSB 1 key: Bus: 2 DevFn: 16
> Feb 20 09:49:45 lnx1 kernel: Key 1: Key: (Bus: 2,DevFn: 16), Device No. 1, irq 24
> Feb 20 09:49:45 lnx1 kernel: Mapping address space non cacheable CSR space: phaddr febe0000 sz 131072 out of 131072
> Feb 20 09:49:45 lnx1 kernel: Mapping address space non cacheable CSR space: vaddr f8914000
> Feb 20 09:49:45 lnx1 kernel: Mapping address space non cacheable IO space: phaddr fd000000 sz 16777216 out of 16777216
> Feb 20 09:49:45 lnx1 kernel: Mapping address space non cacheable IO space: vaddr f8935000
> Feb 20 09:49:45 lnx1 kernel: SCI Adapter 0 : User request to reduce prefetchspace size from 0x20000000 to 0x1c000000
>
> =====>
>
> Feb 20 09:49:45 lnx1 kernel: Mapping address space PREFETCH space: phaddr c0000000 sz 469762048 out of 469762048
> Feb 20 09:49:45 lnx1 kernel: Failed to map address space PREFETCH space
> Feb 20 09:49:45 lnx1 kernel: SCI Driver : Adapter init failed!!
> Feb 20 09:49:45 lnx1 kernel: SCI Driver : init failed!
> Feb 20 09:49:45 lnx1 sci: ERROR: IRM Driver failed to load
> Feb 20 09:49:45 lnx1 sci: Check /var/log/scilog for details
> Feb 20 09:49:45 lnx1 rc: Starting sci: failed
>
>
> /proc/iomem on the system is reporting the following attached
> hardware devices and their address mappings:
>
>
> 00000000-0009f3ff : System RAM
> 0009f400-0009f7ff : reserved
> 000a0000-000bffff : Video RAM area
> 000c0000-000c7fff : Video ROM
> 000cc000-000ccfff : Extension ROM
> 000f0000-000fffff : System ROM
> 00100000-3fffffff : System RAM
> 00100000-002881d5 : Kernel code
> 002881d6-00319977 : Kernel data
> b1fff000-b1ffffff : ServerWorks CNB20HE Host Bridge
> b2000000-b3ffffff : ServerWorks CNB20HE Host Bridge
> b4100000-bc1fffff : PCI Bus #01
> b8000000-bbffffff : ATI Technologies Inc Rage 128 PF
>
> =========> we are mapping this address range
> c0000000-dfffffff : Dolphin Interconnect Solutions AS PSB66 SCI-Adapter D33x
> fc400000-fc4fffff : PCI Bus #01
> fc4fc000-fc4fffff : ATI Technologies Inc Rage 128 PF
> fc900000-fc9fffff : Intel Corp. 82557 [Ethernet Pro 100]
> fcac0000-fcadffff : Intel Corp. 82543GC Gigabit Ethernet Controller
> fcae0000-fcaeffff : Intel Corp. 82543GC Gigabit Ethernet Controller
> fcafe000-fcafefff : Intel Corp. 82557 [Ethernet Pro 100]
> fcafe000-fcafefff : eepro100
> fcaff000-fcafffff : ServerWorks OSB4/CSB5 OHCI USB Controller
> fd000000-fdffffff : Dolphin Interconnect Solutions AS PSB66 SCI-Adapter D33x
> febe0000-febfffff : Dolphin Interconnect Solutions AS PSB66 SCI-Adapter D33x
> fec00000-fec01fff : reserved
> fee00000-fee00fff : reserved
> fff80000-ffffffff : reserved
>
> The specific code that is failing is attached. Any ideas as to what
> is broken here? Looks like the mapping code is busted if the system
> has over 1 GB of physical memory with certain hardware. The window
> size we are attempting to map is 512 MB. This works with less than
> 1 GB of memory and I have attempted 4GB and 64GB compile options and
> it does not work.
>
> Jeff
>
"Jeff V. Merkey" wrote:
> #ifdef CPU_ARCH_IS_ALPHA
> #warning This looks quite suspect out
> if ((as->vaddr = (vkaddr_t)(dense_mem((unsigned)as->ioaddr)+(unsigned)as->ioaddr)) == 0) {
> #else
>
> =====> we are failing at this point
>
> if ((as->vaddr = (vkaddr_t)ioremap((unsigned)as->ioaddr, as->msize)) == 0) {
> #endif
ioremap works just fine on alpha.
type abuse aside, and alpha bugs aside, this looks ok... what is the
value of as->msize?
--
Jeff Garzik | "Why is it that attractive girls like you
Building 1024 | always seem to have a boyfriend?"
MandrakeSoft | "Because I'm a nympho that owns a brewery?"
| - BBC TV show "Coupling"
On Wed, Feb 20, 2002 at 12:26:12PM -0500, Jeff Garzik wrote:
> "Jeff V. Merkey" wrote:
> > #ifdef CPU_ARCH_IS_ALPHA
> > #warning This looks quite suspect out
> > if ((as->vaddr = (vkaddr_t)(dense_mem((unsigned)as->ioaddr)+(unsigned)as->ioaddr)) == 0) {
> > #else
> >
> > =====> we are failing at this point
> >
> > if ((as->vaddr = (vkaddr_t)ioremap((unsigned)as->ioaddr, as->msize)) == 0) {
> > #endif
>
> ioremap works just fine on alpha.
>
> type abuse aside, and alpha bugs aside, this looks ok... what is the
> value of as->msize?
Size is "sz" as reported in the sci adapter log file.
Feb 20 09:49:45 lnx1 kernel: Mapping address space PREFETCH space: phaddr c0000000 sz 469762048 out of 469762048.
size = 469762048.
Jeff
>
> --
> Jeff Garzik | "Why is it that attractive girls like you
> Building 1024 | always seem to have a boyfriend?"
> MandrakeSoft | "Because I'm a nympho that owns a brewery?"
> | - BBC TV show "Coupling"
From: Jeff Garzik <[email protected]>
Date: Wed, 20 Feb 2002 12:26:12 -0500
type abuse aside, and alpha bugs aside, this looks ok... what is the
value of as->msize?
Jeff and Jeff, the problem is one of two things:
1) when you have ~2GB of memory the vmalloc pool is very small,
and this is the same place ioremap allocations come from
2) the BIOS or Linus is not assigning resources of the device
properly, or it simply can't, because the available PCI MEM space
with this much memory is too small
I note that one of the resources of the card is 16MB or so.
David,
Someone had a thought that perhaps the ServerWorks chipset is mapping
addresses above the 4GB boundary. Any thoughts on how to get around
this problem?
Jeff
On Wed, Feb 20, 2002 at 09:30:34AM -0800, David S. Miller wrote:
> From: Jeff Garzik <[email protected]>
> Date: Wed, 20 Feb 2002 12:26:12 -0500
>
> type abuse aside, and alpha bugs aside, this looks ok... what is the
> value of as->msize?
>
> Jeff and Jeff, the problem is one of two things:
>
> 1) when you have ~2GB of memory the vmalloc pool is very small
> and this it the same place ioremap allocations come from
>
> 2) the BIOS or Linus is not assigning resources of the device
> properly, or it simple can't because the available PCI MEM space
> with this much memory is too small
>
> I note that one of the resources of the card is 16MB or so.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Wed, 20 Feb 2002, David S. Miller wrote:
> From: Jeff Garzik <[email protected]>
> Date: Wed, 20 Feb 2002 12:26:12 -0500
>
> type abuse aside, and alpha bugs aside, this looks ok... what is the
> value of as->msize?
>
> Jeff and Jeff, the problem is one of two things:
>
> 1) when you have ~2GB of memory the vmalloc pool is very small
> and this it the same place ioremap allocations come from
>
> 2) the BIOS or Linus is not assigning resources of the device
> properly, or it simple can't because the available PCI MEM space
> with this much memory is too small
>
> I note that one of the resources of the card is 16MB or so.
Hi guys,
There is actually no need to have all three regions mapped at all
times, is there, Jeff? In the Scali ICM driver we don't ioremap() the
prefetchable space at all, because that is done with the mmap() method
for the userspace clients. If you have a kernel space client, ioremap()
is used, but only for the parts that are needed (based on the number of
nodes in the cluster and the shared memory size per node).
Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency
On Wed, Feb 20, 2002 at 07:44:58PM +0100, Steffen Persvold wrote:
> On Wed, 20 Feb 2002, David S. Miller wrote:
>
> > From: Jeff Garzik <[email protected]>
> > Date: Wed, 20 Feb 2002 12:26:12 -0500
> >
> > type abuse aside, and alpha bugs aside, this looks ok... what is the
> > value of as->msize?
> >
> > Jeff and Jeff, the problem is one of two things:
> >
> > 1) when you have ~2GB of memory the vmalloc pool is very small
> > and this it the same place ioremap allocations come from
> >
> > 2) the BIOS or Linus is not assigning resources of the device
> > properly, or it simple can't because the available PCI MEM space
> > with this much memory is too small
> >
> > I note that one of the resources of the card is 16MB or so.
>
> Hi guys,
>
> There is actually no need to have all three regions mapped at all times is
> there Jeff ? In the Scali ICM driver we actually doesn't ioremap() the
> prefetchable space at all because this is done with the mmap() method to
> the userspace clients. If you have a kernel space client though ioremap()
> is used, but only the parts of it that is needed (based on the number of
> nodes in the cluser and the shared memory size per node).
>
> Regards,
>
I am not using the adapters in user space, I am using them in kernel
space with a distributed RAID agent and file system. This is a general
issue with Hugo's SISCI and IRM drivers and Linux. They all need to work
in every configuration. If it works with less than 1 GB it should work
with > 1 GB of memory.
I am looking through get_vm_area() since this is where the bug is. Your
Scali drivers are not the Dolphin-released IRM/SISCI but custom drivers
you sell with **YOUR** software versions, and they are far from
general purpose.
Jeff
> --
> Steffen Persvold | Scalable Linux Systems | Try out the world's best
> mailto:[email protected] | http://www.scali.com | performing MPI implementation:
> Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
> Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency
The following information is submitted regarding this problem:
I have corrected the prefetch allocation problem by reducing
the prefetch address space size from 512 MB down to 32 MB. The
failing code is in get_vm_area(). This bug effectively limits
Linux, relative to other OSes, in running scalable SCI-based
applications on large clusters where the nodes have more than
1 GB of memory each.
My RAID/FS application does not need a huge prefetch address space, so I
can get around this problem easily, but some SCI applications do,
and this problem will relegate Linux to back-seat status for supercomputer
applications that use this technology if Linux is unable to map larger
regions of memory. I would propose that the maintainer of
vmalloc.c look at using 48-bit PTEs or some other solution
as a way to allocate larger virtual address frames when the system has
a lot of physical memory. It seems pretty lame for a machine
with 2 GB of physical memory not to have at least 256 MB of address space
left over for address mapping.
Offending code attached. Please advise as to what a proposed solution
could be, if any is possible with this problem. I am happy to adjust
the Dolphin IRM driver behavior to accommodate Linux, but I think some
larger clusters (i.e. > 1000 nodes) may not work with Linux if there's
not enough address space to map remotely across the cluster.
Jeff
struct vm_struct * get_vm_area(unsigned long size, unsigned long flags)
{
	unsigned long addr;
	struct vm_struct **p, *tmp, *area;

	area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL);
	if (!area)
		return NULL;
	size += PAGE_SIZE;
	addr = VMALLOC_START;
	write_lock(&vmlist_lock);
	for (p = &vmlist; (tmp = *p) ; p = &tmp->next) {
===============> we barf here since the size + addr wraps
		if ((size + addr) < addr)
			goto out;
		if (size + addr <= (unsigned long) tmp->addr)
			break;
		addr = tmp->size + (unsigned long) tmp->addr;
		if (addr > VMALLOC_END-size)
			goto out;
	}
	area->flags = flags;
	area->addr = (void *)addr;
	area->size = size;
	area->next = *p;
	*p = area;
	write_unlock(&vmlist_lock);
	return area;

out:
	write_unlock(&vmlist_lock);
	kfree(area);
	return NULL;
}
On Wed, Feb 20, 2002 at 11:00:04AM -0700, Jeff V. Merkey wrote:
>
>
> David,
>
> Someone had a thought that perhaps the Serverworks chipset is mapping
> addresses above the 4GB boundry. Any thoughts on how to get around
> this problem?
>
> Jeff
>
>
> On Wed, Feb 20, 2002 at 09:30:34AM -0800, David S. Miller wrote:
> > From: Jeff Garzik <[email protected]>
> > Date: Wed, 20 Feb 2002 12:26:12 -0500
> >
> > type abuse aside, and alpha bugs aside, this looks ok... what is the
> > value of as->msize?
> >
> > Jeff and Jeff, the problem is one of two things:
> >
> > 1) when you have ~2GB of memory the vmalloc pool is very small
> > and this it the same place ioremap allocations come from
> >
> > 2) the BIOS or Linus is not assigning resources of the device
> > properly, or it simple can't because the available PCI MEM space
> > with this much memory is too small
> >
> > I note that one of the resources of the card is 16MB or so.
"Jeff V. Merkey" wrote:
> regions of memory. I would propose that the maintainer of
> vmalloc.c look at using 48 bit PTE entries or some other solution
> as a way to alloc larger virtual address frames when the system has
> a lot of physical memory. It's seems pretty lame to me for a machine
> with 2 GB of physical memory not to have at lest 256 MB of address space
> left over for address mapping.
Instead of constantly trying to map >32-bit addresses onto 32-bit
processors, why not just get a 64-bit processor?
One constantly runs into limitations with highmem...
Jeff
--
Jeff Garzik | "Why is it that attractive girls like you
Building 1024 | always seem to have a boyfriend?"
MandrakeSoft | "Because I'm a nympho that owns a brewery?"
| - BBC TV show "Coupling"
On Wed, Feb 20, 2002 at 02:54:49PM -0700, Jeff V. Merkey wrote:
> struct vm_struct * get_vm_area(unsigned long size, unsigned long flags)
> {
> unsigned long addr;
> struct vm_struct **p, *tmp, *area;
>
> area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL);
> if (!area)
> return NULL;
> size += PAGE_SIZE;
> addr = VMALLOC_START;
> write_lock(&vmlist_lock);
> for (p = &vmlist; (tmp = *p) ; p = &tmp->next) {
>
> ===============> we barf here since the size + addr wraps
>
Also, this function should be moved to the arch/i386/mm area, since
it is doing pointer arithmetic with 32-bit assumptions (i.e.
unsigned long + unsigned long). Last time I checked, unsigned long
was a construct for a 32-bit value in any gcc compiler version, ia64
or not.
Jeff
> if ((size + addr) < addr)
> goto out;
>
> if (size + addr <= (unsigned long) tmp->addr)
> break;
> addr = tmp->size + (unsigned long) tmp->addr;
> if (addr > VMALLOC_END-size)
> goto out;
> }
> area->flags = flags;
> area->addr = (void *)addr;
> area->size = size;
> area->next = *p;
> *p = area;
> write_unlock(&vmlist_lock);
> return area;
>
> out:
> write_unlock(&vmlist_lock);
> kfree(area);
> return NULL;
> }
>
> On Wed, Feb 20, 2002 at 11:00:04AM -0700, Jeff V. Merkey wrote:
> >
> >
> > David,
> >
> > Someone had a thought that perhaps the Serverworks chipset is mapping
> > addresses above the 4GB boundry. Any thoughts on how to get around
> > this problem?
> >
> > Jeff
> >
> >
> > On Wed, Feb 20, 2002 at 09:30:34AM -0800, David S. Miller wrote:
> > > From: Jeff Garzik <[email protected]>
> > > Date: Wed, 20 Feb 2002 12:26:12 -0500
> > >
> > > type abuse aside, and alpha bugs aside, this looks ok... what is the
> > > value of as->msize?
> > >
> > > Jeff and Jeff, the problem is one of two things:
> > >
> > > 1) when you have ~2GB of memory the vmalloc pool is very small
> > > and this it the same place ioremap allocations come from
> > >
> > > 2) the BIOS or Linus is not assigning resources of the device
> > > properly, or it simple can't because the available PCI MEM space
> > > with this much memory is too small
> > >
> > > I note that one of the resources of the card is 16MB or so.
On Wed, Feb 20, 2002 at 04:51:00PM -0500, Jeff Garzik wrote:
> "Jeff V. Merkey" wrote:
> > regions of memory. I would propose that the maintainer of
> > vmalloc.c look at using 48 bit PTE entries or some other solution
> > as a way to alloc larger virtual address frames when the system has
> > a lot of physical memory. It's seems pretty lame to me for a machine
> > with 2 GB of physical memory not to have at lest 256 MB of address space
> > left over for address mapping.
>
> Instead of constantly trying to map >32-bit addresses onto 32-bit
> processors, why not just get a 64-bit processor?
>
> One constantly runs into limitations with highmem...
>
> Jeff
Sigh .... I am only using 2 GB on a 4GB capable processor (actually
a 64 GB capable processor). Looks like a patch is needed. Who is
maintaining vmalloc.c at present so I know who to submit a patch
to?
Jeff
>
>
>
> --
> Jeff Garzik | "Why is it that attractive girls like you
> Building 1024 | always seem to have a boyfriend?"
> MandrakeSoft | "Because I'm a nympho that owns a brewery?"
> | - BBC TV show "Coupling"
In article <[email protected]> you wrote:
> unsigned long + unsigned long) Last time I checked, unsigned long
> was a construct for a 32 bit value in any gcc compiler version, ia64
> or not.
16-bit MSDOS mode doesn't count; otherwise it really is 64-bit on
64-bit machines, as per the ANSI C definition: unsigned long is big
enough to hold a data pointer.
On Wed, Feb 20, 2002 at 03:20:11PM -0700, Jeff V. Merkey wrote:
> On Wed, Feb 20, 2002 at 04:51:00PM -0500, Jeff Garzik wrote:
> > "Jeff V. Merkey" wrote:
> > > regions of memory. I would propose that the maintainer of
> > > vmalloc.c look at using 48 bit PTE entries or some other solution
> > > as a way to alloc larger virtual address frames when the system has
> > > a lot of physical memory. It's seems pretty lame to me for a machine
> > > with 2 GB of physical memory not to have at lest 256 MB of address space
> > > left over for address mapping.
> >
> > Instead of constantly trying to map >32-bit addresses onto 32-bit
> > processors, why not just get a 64-bit processor?
> >
> > One constantly runs into limitations with highmem...
> >
> > Jeff
>
> Sigh .... I am only using 2 GB on a 4GB capable processor (actually
> a 64 GB capable processor). Looks like a patch is needed. Who is
> maintaining vmalloc.c at present so I know who to submit a patch
> to?
>
> Jeff
Jeff,
Is it a waste of time to attempt to fix this?
Jeff
>
>
> >
> >
> >
> > --
> > Jeff Garzik | "Why is it that attractive girls like you
> > Building 1024 | always seem to have a boyfriend?"
> > MandrakeSoft | "Because I'm a nympho that owns a brewery?"
> > | - BBC TV show "Coupling"
> Sigh .... I am only using 2 GB on a 4GB capable processor (actually
> a 64 GB capable processor). Looks like a patch is needed. Who is
> maintaining vmalloc.c at present so I know who to submit a patch
> to?
Actually you are using a 64Gb capable processor that is only capable of
sanely addressing 4Gb at a time, total across both user and kernel space
and takes a hefty hit whenever you switch which 4Gb you are peering at.
If you want to make sensible use of even 4Gb user/4Gb kernel you need to
take a page table reload at syscall time and deal with quite messy handling
for copy to/from user.
[If someone from Intel disagrees please do so publicly - I'd love to have
someone prove the limit can be dealt with 8)]
On Wed, Feb 20, 2002 at 11:06:17PM +0000, Alan Cox wrote:
> > Sigh .... I am only using 2 GB on a 4GB capable processor (actually
> > a 64 GB capable processor). Looks like a patch is needed. Who is
> > maintaining vmalloc.c at present so I know who to submit a patch
> > to?
>
> Actually you are using a 64Gb capable processor that is only capable of
> sanely addressing 4Gb at a time, total across both user and kernel space
> and takes a hefty hit whenever you switch which 4Gb you are peering at.
>
> If you want to make sensible use of even 4Gb user/4Gb kernel you need to
> take a page table reload at syscall time and deal with quite messy handling
> for copy to/from user.
>
> [If someone from Intel disagrees please do so publically - I'd love to have
> someone prove the limit can be dealt with 8)]
I'll get to work on it. The 4-bit PTE extension to enable 48-bit
addresses is quite ugly, and I agree -- heavy. There may, however,
be no other way to make this work properly. The SCI address space is
explicitly addressable, so it may be possible to enable this on
the PCI-SCI adapters **ONLY** (not ccNUMA) without getting into
general user/kernel copy issues, since the SCI address space is
typically assumed in this model to be an explicit entity for most
apps. :-)
Jeff
"Jeff V. Merkey" wrote:
>
> On Wed, Feb 20, 2002 at 02:54:49PM -0700, Jeff V. Merkey wrote:
>
> > struct vm_struct * get_vm_area(unsigned long size, unsigned long flags)
> > {
> > unsigned long addr;
> > struct vm_struct **p, *tmp, *area;
> >
> > area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL);
> > if (!area)
> > return NULL;
> > size += PAGE_SIZE;
> > addr = VMALLOC_START;
> > write_lock(&vmlist_lock);
> > for (p = &vmlist; (tmp = *p) ; p = &tmp->next) {
> >
> > ===============> we barf here since the size + addr wraps
> >
>
> > Also, this function should be moved to the /arch/i386/mm area since
> > it is doing pointer arithmetic with 32-bit assumptions (i.e.
> > unsigned long + unsigned long). Last time I checked, unsigned long
> > was a construct for a 32-bit value in any gcc compiler version, ia64
> > or not.
>
Jeff,
I think you'll have to check again. In LP64 programming models (used on most
64-bit OSes) 'long' is 64 bits. Thus an 'unsigned long' is always safe to use
for pointer arithmetic, since it will be 32 bits on 32-bit machines and 64 bits
on 64-bit machines.
Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency
"Jeff V. Merkey" wrote:
>
> On Wed, Feb 20, 2002 at 07:44:58PM +0100, Steffen Persvold wrote:
> > On Wed, 20 Feb 2002, David S. Miller wrote:
> >
> > > From: Jeff Garzik <[email protected]>
> > > Date: Wed, 20 Feb 2002 12:26:12 -0500
> > >
> > > type abuse aside, and alpha bugs aside, this looks ok... what is the
> > > value of as->msize?
> > >
> > > Jeff and Jeff, the problem is one of two things:
> > >
> > > 1) when you have ~2GB of memory the vmalloc pool is very small
> > > and this is the same place ioremap allocations come from
> > >
> > > 2) the BIOS or Linux is not assigning resources of the device
> > > properly, or it simply can't because the available PCI MEM space
> > > with this much memory is too small
> > >
> > > I note that one of the resources of the card is 16MB or so.
> >
> > Hi guys,
> >
> > There is actually no need to have all three regions mapped at all times, is
> > there, Jeff? In the Scali ICM driver we actually don't ioremap() the
> > prefetchable space at all, because this is done with the mmap() method for
> > the userspace clients. If you have a kernel space client, though, ioremap()
> > is used, but only for the parts of it that are needed (based on the number of
> > nodes in the cluster and the shared memory size per node).
> >
> > Regards,
> >
>
> I am not using the adapters in user space, I am using them in kernel
> space with a distributed RAID agent and file system. This is a general
> issue with Hugo's SISCI and IRM drivers and Linux. They all need to work
> in every configuration. If it works with less than 1 GB it should work
> with > 1 GB of memory.
>
> I am looking through get_vm_area() since this is where the bug is. Your
> Scali drivers are not the Dolphin released IRM/SISCI but custom drivers
> you guys sell with **YOUR** software versions, and they are far from
> general purpose.
>
Jeff,
I really don't think you're in a position to say whether the Scali driver is
general purpose or not, but in any case this issue is OT. My point is that it
is not a good idea to keep the prefetchable area mapped at all times. The ICM
driver also has kernel clients (e.g. an Ethernet emulation driver), and they only
ioremap() the areas that are needed based upon the number of nodes in the
cluster they communicate with (and only a few Kbytes are mapped from each node).
Regards,
--
Steffen Persvold | Scalable Linux Systems | Try out the world's best
mailto:[email protected] | http://www.scali.com | performing MPI implementation:
Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency
On Fri, Feb 22, 2002 at 12:08:25PM +0100, Steffen Persvold wrote:
> "Jeff V. Merkey" wrote:
> >
> > On Wed, Feb 20, 2002 at 02:54:49PM -0700, Jeff V. Merkey wrote:
> >
> > > struct vm_struct * get_vm_area(unsigned long size, unsigned long flags)
> > > {
> > > unsigned long addr;
> > > struct vm_struct **p, *tmp, *area;
> > >
> > > area = (struct vm_struct *) kmalloc(sizeof(*area), GFP_KERNEL);
> > > if (!area)
> > > return NULL;
> > > size += PAGE_SIZE;
> > > addr = VMALLOC_START;
> > > write_lock(&vmlist_lock);
> > > for (p = &vmlist; (tmp = *p) ; p = &tmp->next) {
> > >
> > > ===============> we barf here since the size + addr wraps
> > >
> >
> > Also, this function should be moved to the /arch/i386/mm area since
> > it is doing pointer arithmetic with 32-bit assumptions (i.e.
> > unsigned long + unsigned long). Last time I checked, unsigned long
> > was a construct for a 32-bit value in any gcc compiler version, ia64
> > or not.
> >
>
> Jeff,
>
> I think you'll have to check again. In LP64 programming models (used on most
> 64-bit OSes) 'long' is 64 bits. Thus an 'unsigned long' is always safe to use
> for pointer arithmetic, since it will be 32 bits on 32-bit machines and 64 bits
> on 64-bit machines.
>
> Regards,
> --
Hi Steffen,
On early IA64, long long was assumed to be 64 bits and long 32 bits. After
emailing some folks offline I realize this may not be the case
any longer, but it still is under some compiler options.
Say hi to Hugo for me next time you see him. :-) I have reviewed this,
and we will need some changes to the page table setup in Linux
and to the copy to/from user macros in order to get around this problem
for preallocating prefetch space.
Any ideas and joint work on this problem is appreciated.
Jeff
> Steffen Persvold | Scalable Linux Systems | Try out the world's best
> mailto:[email protected] | http://www.scali.com | performing MPI implementation:
> Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
> Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency
On Fri, Feb 22, 2002 at 12:49:10PM +0100, Steffen Persvold wrote:
> "Jeff V. Merkey" wrote:
> >
> > On Wed, Feb 20, 2002 at 07:44:58PM +0100, Steffen Persvold wrote:
> > > On Wed, 20 Feb 2002, David S. Miller wrote:
> > >
> > > > From: Jeff Garzik <[email protected]>
> > > > Date: Wed, 20 Feb 2002 12:26:12 -0500
> > > >
> > > > type abuse aside, and alpha bugs aside, this looks ok... what is the
> > > > value of as->msize?
> > > >
> > > > Jeff and Jeff, the problem is one of two things:
> > > >
> > > > 1) when you have ~2GB of memory the vmalloc pool is very small
> > > > and this is the same place ioremap allocations come from
> > > >
> > > > 2) the BIOS or Linux is not assigning resources of the device
> > > > properly, or it simply can't because the available PCI MEM space
> > > > with this much memory is too small
> > > >
> > > > I note that one of the resources of the card is 16MB or so.
> > >
> > > Hi guys,
> > >
> > > There is actually no need to have all three regions mapped at all times, is
> > > there, Jeff? In the Scali ICM driver we actually don't ioremap() the
> > > prefetchable space at all, because this is done with the mmap() method for
> > > the userspace clients. If you have a kernel space client, though, ioremap()
> > > is used, but only for the parts of it that are needed (based on the number of
> > > nodes in the cluster and the shared memory size per node).
> > >
> > > Regards,
> > >
> >
> > I am not using the adapters in user space, I am using them in kernel
> > space with a distributed RAID agent and file system. This is a general
> > issue with Hugo's SISCI and IRM drivers and Linux. They all need to work
> > in every configuration. If it works with less than 1 GB it should work
> > with > 1 GB of memory.
> >
> > I am looking through get_vm_area() since this is where the bug is. Your
> > Scali drivers are not the Dolphin released IRM/SISCI but custom drivers
> > you guys sell with **YOUR** software versions, and they are far from
> > general purpose.
> >
>
> Jeff,
>
> I really don't think you're in a position to say whether the Scali driver is
Why don't you guys just support SISCI so there aren't two driver
trees? I have never seen your source code made public. If it's
general, where can I download your code? I also noticed your
implementation of the D335 drivers has some problems with our
applications.
> general purpose or not, but in any case this issue is OT. My point is that it
> is not a good idea to keep the prefetchable area mapped at all times. The ICM
> driver also has kernel clients (e.g. an Ethernet emulation driver), and they only
> ioremap() the areas that are needed based upon the number of nodes in the
> cluster they communicate with (and only a few Kbytes are mapped from each node).
>
> Regards,
Not everyone uses the same model you do for cluster membership. I am
primarily concerned with keeping the same features/behaviors as the
SISCI driver base. It's the one I use.
Jeff
> --
> Steffen Persvold | Scalable Linux Systems | Try out the world's best
> mailto:[email protected] | http://www.scali.com | performing MPI implementation:
> Tel: (+47) 2262 8950 | Olaf Helsets vei 6 | - ScaMPI 1.13.8 -
> Fax: (+47) 2262 8951 | N0621 Oslo, NORWAY | >320MBytes/s and <4uS latency
>>>>> On Fri, 22 Feb 2002 11:17:56 -0700, "Jeff V. Merkey" <[email protected]> said:
Jeff> On early IA64, long long was assumed to be 64 bits and long
Jeff> 32 bits. After emailing some folks offline I realize this may
Jeff> not be the case any longer, but it still is under some compiler
Jeff> options.
In the context of Linux, this is certainly not true. Linux/ia64
always has been LP64 (i.e., sizeof(long)=8). Perhaps you're confusing
this with the hp-ux C compiler, which defaults to ILP32? Another
potential source of confusion is Windows, which uses the P64 data
model (only pointers and "long long" are 64 bits).
--david
David Mosberger wrote:
> In the context of Linux, this is certainly not true. Linux/ia64
> always has been LP64 (i.e., sizeof(long)=8). Perhaps you're confusing
> this with the hp-ux C compiler, which defaults to ILP32? Another
> potential source of confusion is Windows, which uses the P64 data
> model (only pointers and "long long" are 64 bits).
Tru64's vendor compiler has similar features, though I'm not sure if
32-bit mode is enabled by default. Notably, Netscape for Tru64 is
compiled with this 32-bit mode, IIRC.
People would be surprised how much ground alpha axp broke in userland,
years ago, simply by being one of the first Linux platforms where long
!= int.
Jeff
--
Jeff Garzik | "UNIX enhancements aren't."
Building 1024 | -- says /usr/games/fortune
MandrakeSoft |
>>>>> On Fri, 22 Feb 2002 13:51:17 -0500, Jeff Garzik <[email protected]> said:
Jeff> People would be surprised how much ground alpha axp broke in
Jeff> userland, years ago, simply by being one of the first Linux
Jeff> platforms where long != int
Yes. And you don't want to know how many hours I personally spent on
this (along with several other folks).
--david
On Fri, Feb 22, 2002 at 10:42:49AM -0800, David Mosberger wrote:
> >>>>> On Fri, 22 Feb 2002 11:17:56 -0700, "Jeff V. Merkey" <[email protected]> said:
>
> Jeff> On early IA64, long long was assumed to be 64 bits and long
> Jeff> 32 bits. After emailing some folks offline I realize this may
> Jeff> not be the case any longer, but it still is under some compiler
> Jeff> options.
>
> In the context of Linux, this is certainly not true. Linux/ia64
> always has been LP64 (i.e., sizeof(long)=8). Perhaps you're confusing
> this with the hp-ux C compiler, which defaults to ILP32? Another
Correct.
Jeff
> potential source of confusion is Windows, which uses the P64 data
> model (only pointers and "long long" are 64 bits).
>
> --david
On Fri, Feb 22, 2002 at 12:08:25PM +0100, Steffen Persvold wrote:
> I think you'll have to check again. In LP64 programming models (used on most
> 64-bit OS'es) 'long' is 64 bit. Thus a 'unsigned long' is always safe to use
> for pointer arithmetic since it will be 32 bit on 32bit machines and 64bit on
> 64bit machines.
This isn't something you should count on universally. There
are targets that have a 64-bit long, but a 32-bit pointer.
Seems unlikely to show up on Linux, but...
r~