Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262430AbVDLPBr (ORCPT ); Tue, 12 Apr 2005 11:01:47 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262344AbVDLPBJ (ORCPT ); Tue, 12 Apr 2005 11:01:09 -0400 Received: from vanessarodrigues.com ([192.139.46.150]:50073 "EHLO jaguar.mkp.net") by vger.kernel.org with ESMTP id S262445AbVDLOuP (ORCPT ); Tue, 12 Apr 2005 10:50:15 -0400 To: Andrew Morton Cc: linux-kernel@vger.kernel.org Subject: Re: [patch] mspec driver for 2.6.12-rc2-mm3 References: <16987.39773.267117.925489@jaguar.mkp.net> <20050412032747.51c0c514.akpm@osdl.org> From: Jes Sorensen Date: 12 Apr 2005 10:50:08 -0400 In-Reply-To: <20050412032747.51c0c514.akpm@osdl.org> Message-ID: User-Agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.4 MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 29642 Lines: 1056 >>>>> "Andrew" == Andrew Morton writes: Andrew> Does this compile without CONFIG_EFI? Andrew> (It seems that ia64 Kconfig tries to turn on EFI always, but I Andrew> thing allnoconfig will turn it off) Nope it doesn't. I don't think the ia64 port can operate without EFI at all. >> --- linux-2.6.12-rc2-mm3-vanilla/include/asm-ia64/sn/fetchop.h >> 2005-03-01 23:38:12 -08:00 +++ >> linux-2.6.12-rc2-mm3/include/asm-ia64/sn/fetchop.h 1969-12-31 >> 16:00:00 -08:00 Andrew> Did you mean to remove fetchop.h? Yep, it's obsoleted by mspec.h Fixed the malloc locking bug and atomic_dev issue as well. Thanks for the comments! Cheers, Jes Memory special driver for cached, uncached and 'fetchop' (SGI SN2 specific) memory mappings, formerly known as fetchop. Mostly Used by parallel appplictions. This patch relies on the PG_uncached support patch and the generic allocator patch (genalloc). Signed-off-by: Jes Sorensen diff -urN -X /usr/people/jes/exclude-linux linux-2.6.12-rc2-mm3-vanilla/arch/ia64/Kconfig linux-2.6.12-rc2-mm3/arch/ia64/Kconfig --- linux-2.6.12-rc2-mm3-vanilla/arch/ia64/Kconfig 2005-04-12 02:09:02 -07:00 +++ linux-2.6.12-rc2-mm3/arch/ia64/Kconfig 2005-04-12 02:14:06 -07:00 @@ -217,6 +217,16 @@ If you are compiling a kernel that will run under SGI's IA-64 simulator (Medusa) then say Y, otherwise say N. +config MSPEC + tristate "Special Memory support" + select GENERIC_ALLOCATOR + help + This driver allows for cached and uncached mappings of memory + to user processes. On SGI SN hardware it will also export the + special fetchop memory facility. + Fetchops are atomic memory operations that are implemented in the + memory controller on SGI SN hardware. + config FORCE_MAX_ZONEORDER int default "18" diff -urN -X /usr/people/jes/exclude-linux linux-2.6.12-rc2-mm3-vanilla/arch/ia64/configs/sn2_defconfig linux-2.6.12-rc2-mm3/arch/ia64/configs/sn2_defconfig --- linux-2.6.12-rc2-mm3-vanilla/arch/ia64/configs/sn2_defconfig 2005-04-12 02:09:02 -07:00 +++ linux-2.6.12-rc2-mm3/arch/ia64/configs/sn2_defconfig 2005-04-12 02:14:06 -07:00 @@ -82,6 +82,7 @@ # CONFIG_IA64_CYCLONE is not set CONFIG_IOSAPIC=y CONFIG_IA64_SGI_SN_SIM=y +CONFIG_MSPEC=m CONFIG_FORCE_MAX_ZONEORDER=18 CONFIG_SMP=y CONFIG_NR_CPUS=512 diff -urN -X /usr/people/jes/exclude-linux linux-2.6.12-rc2-mm3-vanilla/arch/ia64/defconfig linux-2.6.12-rc2-mm3/arch/ia64/defconfig --- linux-2.6.12-rc2-mm3-vanilla/arch/ia64/defconfig 2005-04-12 02:09:02 -07:00 +++ linux-2.6.12-rc2-mm3/arch/ia64/defconfig 2005-04-12 02:14:06 -07:00 @@ -80,6 +80,7 @@ CONFIG_ARCH_DISCONTIGMEM_ENABLE=y CONFIG_IA64_CYCLONE=y CONFIG_IOSAPIC=y +CONFIG_MSPEC=m CONFIG_FORCE_MAX_ZONEORDER=18 CONFIG_SMP=y CONFIG_NR_CPUS=512 diff -urN -X /usr/people/jes/exclude-linux linux-2.6.12-rc2-mm3-vanilla/arch/ia64/kernel/Makefile linux-2.6.12-rc2-mm3/arch/ia64/kernel/Makefile --- linux-2.6.12-rc2-mm3-vanilla/arch/ia64/kernel/Makefile 2005-03-01 23:38:33 -08:00 +++ linux-2.6.12-rc2-mm3/arch/ia64/kernel/Makefile 2005-04-12 02:14:06 -07:00 @@ -20,6 +20,7 @@ obj-$(CONFIG_PERFMON) += perfmon_default_smpl.o obj-$(CONFIG_IA64_CYCLONE) += cyclone.o obj-$(CONFIG_IA64_MCA_RECOVERY) += mca_recovery.o +obj-$(CONFIG_MSPEC) += mspec.o mca_recovery-y += mca_drv.o mca_drv_asm.o # The gate DSO image is built using a special linker script. diff -urN -X /usr/people/jes/exclude-linux linux-2.6.12-rc2-mm3-vanilla/arch/ia64/kernel/mspec.c linux-2.6.12-rc2-mm3/arch/ia64/kernel/mspec.c --- linux-2.6.12-rc2-mm3-vanilla/arch/ia64/kernel/mspec.c 1969-12-31 16:00:00 -08:00 +++ linux-2.6.12-rc2-mm3/arch/ia64/kernel/mspec.c 2005-04-12 07:39:19 -07:00 @@ -0,0 +1,798 @@ +/* + * Copyright (C) 2001-2005 Silicon Graphics, Inc. All rights + * reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2 of the GNU General Public License + * as published by the Free Software Foundation. + */ + +/* + * SN Platform Special Memory (mspec) Support + * + * This driver exports the SN special memory (mspec) facility to user processes. + * There are three types of memory made available thru this driver: + * fetchops, uncached and cached. + * + * Fetchops are atomic memory operations that are implemented in the + * memory controller on SGI SN hardware. + * + * Uncached are used for memory write combining feature of the ia64 + * cpu. + * + * Cached are used for areas of memory that are used as cached addresses + * on our partition and used as uncached addresses from other partitions. + * Due to a design constraint of the SN2 Shub, you can not have processors + * on the same FSB perform both a cached and uncached reference to the + * same cache line. These special memory cached regions prevent the + * kernel from ever dropping in a TLB entry and therefore prevent the + * processor from ever speculating a cache line from this page. + */ + + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + + +#define DEBUG 0 + +#define FETCHOP_DRIVER_ID_STR "MSPEC Fetchop Device Driver" +#define CACHED_DRIVER_ID_STR "MSPEC Cached Device Driver" +#define UNCACHED_DRIVER_ID_STR "MSPEC Uncached Device Driver" +#define REVISION "3.0" +#define MSPEC_BASENAME "mspec" + + +#define BTE_ZERO_BLOCK(_maddr, _len) \ + bte_copy(0, _maddr - __IA64_UNCACHED_OFFSET, _len, BTE_WACQUIRE | BTE_ZERO_FILL, NULL) + +static int fetchop_mmap(struct file *file, struct vm_area_struct *vma); +static int cached_mmap(struct file *file, struct vm_area_struct *vma); +static int uncached_mmap(struct file *file, struct vm_area_struct *vma); +static void mspec_open(struct vm_area_struct *vma); +static void mspec_close(struct vm_area_struct *vma); +static struct page * mspec_nopage(struct vm_area_struct *vma, + unsigned long address, int *unused); + +/* + * Page types allocated by the device. + */ +enum { + MSPEC_FETCHOP = 1, + MSPEC_CACHED, + MSPEC_UNCACHED +}; + +static struct file_operations fetchop_fops = { + .owner THIS_MODULE, + .mmap fetchop_mmap +}; +static struct miscdevice fetchop_miscdev = { + .minor MISC_DYNAMIC_MINOR, + .name "sgi_fetchop", + .fops &fetchop_fops +}; + + +static struct file_operations cached_fops = { + .owner THIS_MODULE, + .mmap cached_mmap +}; +static struct miscdevice cached_miscdev = { + .minor MISC_DYNAMIC_MINOR, + .name "sgi_cached", + .fops &cached_fops +}; + + +static struct file_operations uncached_fops = { + .owner THIS_MODULE, + .mmap uncached_mmap +}; +static struct miscdevice uncached_miscdev = { + .minor MISC_DYNAMIC_MINOR, + .name "sgi_uncached", + .fops &uncached_fops +}; + + +static struct vm_operations_struct mspec_vm_ops = { + .open mspec_open, + .close mspec_close, + .nopage mspec_nopage +}; + +/* + * There is one of these structs per node. It is used to manage the mspec + * space that is available on the node. Current assumption is that there is + * only 1 mspec block of memory per node. + */ +struct node_mspecs { + long maddr; /* phys addr of start of mspecs. */ + int count; /* Total number of mspec pages. */ + atomic_t free; /* Number of pages currently free. */ + unsigned long bits[1]; /* Bitmap for managing pages. */ +}; + + +/* + * One of these structures is allocated when an mspec region is mmaped. The + * structure is pointed to by the vma->vm_private_data field in the vma struct. + * This structure is used to record the addresses of the mspec pages. + */ +struct vma_data { + atomic_t refcnt; /* Number of vmas sharing the data. */ + spinlock_t lock; /* Serialize access to the vma. */ + int count; /* Number of pages allocated. */ + int type; /* Type of pages allocated. */ + unsigned long maddr[1]; /* Array of MSPEC addresses. */ +}; + + +/* + * Memory Special statistics. + */ +struct mspec_stats { + atomic_t map_count; /* Number of active mmap's */ + atomic_t pages_in_use; /* Number of mspec pages in use */ + unsigned long pages_total; /* Total number of mspec pages */ +}; + +static struct mspec_stats mspec_stats; +static struct node_mspecs *node_mspecs[MAX_NUMNODES]; + +#define MAX_UNCACHED_GRANULES 5 +static int allocated_granules; + +struct gen_pool *mspec_pool[MAX_NUMNODES]; + +static void mspec_ipi_visibility(void *data) +{ + int status; + + status = ia64_pal_prefetch_visibility(PAL_VISIBILITY_PHYSICAL); + if ((status != PAL_VISIBILITY_OK) && + (status != PAL_VISIBILITY_OK_REMOTE_NEEDED)) + printk(KERN_DEBUG "pal_prefetch_visibility() returns %i on " + "CPU %i\n", status, get_cpu()); +} + + +static void mspec_ipi_mc_drain(void *data) +{ + int status; + status = ia64_pal_mc_drain(); + if (status) + printk(KERN_WARNING "ia64_pal_mc_drain() failed with %i on " + "CPU %i\n", status, get_cpu()); +} + + +static unsigned long +mspec_get_new_chunk(struct gen_pool *poolp) +{ + struct page *page; + void *tmp; + int status, node, i; + unsigned long addr; + + if (allocated_granules >= MAX_UNCACHED_GRANULES) + return 0; + + node = (int)poolp->private; + page = alloc_pages_node(node, GFP_KERNEL, + IA64_GRANULE_SHIFT-PAGE_SHIFT); + +#if DEBUG + printk(KERN_INFO "get_new_chunk page %p, addr %lx\n", + page, (unsigned long)(page-vmem_map) << PAGE_SHIFT); +#endif + + /* + * Do magic if no mem on local node! XXX + */ + if (!page) + return 0; + tmp = page_address(page); + memset(tmp, 0, IA64_GRANULE_SIZE); + + /* + * There's a small race here where it's possible for someone to + * access the page through /dev/mem halfway through the conversion + * to uncached - not sure it's really worth bothering about + */ + for (i = 0; i < (IA64_GRANULE_SIZE / PAGE_SIZE); i++) + SetPageUncached(&page[i]); + + flush_tlb_kernel_range(tmp, tmp + IA64_GRANULE_SIZE); + + status = ia64_pal_prefetch_visibility(PAL_VISIBILITY_PHYSICAL); +#if DEBUG + printk(KERN_INFO "pal_prefetch_visibility() returns %i on cpu %i\n", + status, get_cpu()); +#endif + if (!status) { + status = smp_call_function(mspec_ipi_visibility, NULL, 0, 1); + if (status) + printk(KERN_WARNING "smp_call_function failed for " + "mspec_ipi_visibility! (%i)\n", status); + } + + sn_flush_all_caches((unsigned long)tmp, IA64_GRANULE_SIZE); + ia64_pal_mc_drain(); + status = smp_call_function(mspec_ipi_mc_drain, NULL, 0, 1); + if (status) + printk(KERN_WARNING "smp_call_function failed for " + "mspec_ipi_mc_drain! (%i)\n", status); + + addr = (unsigned long)tmp - PAGE_OFFSET + __IA64_UNCACHED_OFFSET; + + allocated_granules++; + return addr; +} + + +/* + * mspec_alloc_page + * + * Allocate 1 mspec page. Allocates on the requested node. If no + * mspec pages are available on the requested node, roundrobin starting + * with higher nodes. + */ +static unsigned long +mspec_alloc_page(int nid, int type) +{ + unsigned long maddr; + + maddr = gen_pool_alloc(mspec_pool[nid], PAGE_SIZE); +#if DEBUG + printk(KERN_DEBUG "mspec_alloc_page returns %lx on node %i\n", + maddr, nid); +#endif + + /* + * If no memory is availble on our local node, try the + * remaining nodes in the system. + */ + if (!maddr) { + int i; + + for (i = MAX_NUMNODES - 1; i >= 0; i--) { + if (i == nid || !node_online(i)) + continue; + maddr = gen_pool_alloc(mspec_pool[i], PAGE_SIZE); +#if DEBUG + printk(KERN_DEBUG "mspec_alloc_page alternate search " + "returns %lx on node %i\n", maddr, i); +#endif + if (maddr) { + break; + } + } + } + + if (maddr) + atomic_inc(&mspec_stats.pages_in_use); + + return maddr; +} + + +/* + * mspec_free_page + * + * Free a single mspec page. + */ +static void +mspec_free_page(unsigned long maddr) +{ + int node; + + node = nasid_to_cnodeid(NASID_GET(maddr)); +#if DEBUG + printk(KERN_DEBUG "mspec_free_page(%lx) on node %i\n", maddr, node); +#endif + if ((maddr & (0XFUL << 60)) != __IA64_UNCACHED_OFFSET) + panic("mspec_free_page invalid address %lx\n", maddr); + + atomic_dec(&mspec_stats.pages_in_use); + gen_pool_free(mspec_pool[node], maddr, PAGE_SIZE); +} + + +/* + * mspec_mmap + * + * Called when mmaping the device. Initializes the vma with a fault handler + * and private data structure necessary to allocate, track, and free the + * underlying pages. + */ +static int +mspec_mmap(struct file *file, struct vm_area_struct *vma, int type) +{ + struct vma_data *vdata; + int pages; + + if (vma->vm_pgoff != 0) + return -EINVAL; + + if ((vma->vm_flags & VM_WRITE) == 0) + return -EPERM; + + pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + if (!(vdata = vmalloc(sizeof(struct vma_data)+(pages-1)*sizeof(long)))) + return -ENOMEM; + memset(vdata, 0, sizeof(struct vma_data)+(pages-1)*sizeof(long)); + + vdata->type = type; + vdata->lock = SPIN_LOCK_UNLOCKED; + vdata->refcnt = ATOMIC_INIT(1); + vma->vm_private_data = vdata; + + vma->vm_flags |= (VM_IO | VM_SHM | VM_LOCKED); + if (vdata->type == MSPEC_FETCHOP || vdata->type == MSPEC_UNCACHED) + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_ops = &mspec_vm_ops; + + atomic_inc(&mspec_stats.map_count); + return 0; +} + +static int +fetchop_mmap(struct file *file, struct vm_area_struct *vma) +{ + return mspec_mmap(file, vma, MSPEC_FETCHOP); +} + +static int +cached_mmap(struct file *file, struct vm_area_struct *vma) +{ + return mspec_mmap(file, vma, MSPEC_CACHED); +} + +static int +uncached_mmap(struct file *file, struct vm_area_struct *vma) +{ + return mspec_mmap(file, vma, MSPEC_UNCACHED); +} + +/* + * mspec_open + * + * Called when a device mapping is created by a means other than mmap + * (via fork, etc.). Increments the reference count on the underlying + * mspec data so it is not freed prematurely. + */ +static void +mspec_open(struct vm_area_struct *vma) +{ + struct vma_data *vdata; + + vdata = vma->vm_private_data; + atomic_inc(&vdata->refcnt); +} + +/* + * mspec_close + * + * Called when unmapping a device mapping. Frees all mspec pages + * belonging to the vma. + */ +static void +mspec_close(struct vm_area_struct *vma) +{ + struct vma_data *vdata; + int i, pages; + bte_result_t br; + + vdata = vma->vm_private_data; + if (atomic_dec_and_test(&vdata->refcnt) == 0) { + pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + for (i = 0; i < pages; i++) { + if (vdata->maddr[i] != 0) { + /* + * Clear the page before sticking it back + * into the pool. + */ + br = BTE_ZERO_BLOCK(vdata->maddr[i], PAGE_SIZE); + if (br == BTE_SUCCESS) + mspec_free_page(vdata->maddr[i]); + else + printk(KERN_WARNING "mspec_close(): BTE failed to zero page\n"); + } + } + if (vdata->count) + atomic_dec(&mspec_stats.map_count); + vfree(vdata); + } +} + +/* + * mspec_get_one_pte + * + * Return the pte for a given mm and address. + */ +static __inline__ int +mspec_get_one_pte(struct mm_struct *mm, u64 address, pte_t **pte) +{ + pgd_t *pgd; + pmd_t *pmd; + pud_t *pud; + + pgd = pgd_offset(mm, address); + if (pgd_present(*pgd)) { + pud = pud_offset(pgd, address); + if (pud_present(*pud)) { + pmd = pmd_offset(pud, address); + if (pmd_present(*pmd)) { + *pte = pte_offset_map(pmd, address); + if (pte_present(**pte)) { + return 0; + } + } + } + } + + return -1; +} + +/* + * mspec_nopage + * + * Creates a mspec page and maps it to user space. + */ +static struct page * +mspec_nopage(struct vm_area_struct *vma, unsigned long address, int *unused) +{ + unsigned long paddr, maddr = 0; + unsigned long pfn; + int index; + pte_t *pte; + struct page *page; + struct vma_data *vdata = vma->vm_private_data; + + spin_lock(&vdata->lock); + + index = (address - vma->vm_start) >> PAGE_SHIFT; + if (vdata->maddr[index] == 0) { + vdata->count++; + maddr = mspec_alloc_page(numa_node_id(), vdata->type); + if (maddr == 0) + BUG(); + vdata->maddr[index] = maddr; + } else if (mspec_get_one_pte(vma->vm_mm, address, &pte) == 0) { + printk(KERN_ERR "page already mapped\n"); + /* + * The page may have already been faulted by another + * pthread. If so, we need to avoid remapping the + * page or we will trip a BUG check in the + * remap_page_range() path. + */ + goto getpage; + } + + if (vdata->type == MSPEC_FETCHOP) + paddr = TO_AMO(vdata->maddr[index]); + else + paddr = __pa(TO_CAC(vdata->maddr[index])); + + /* + * XXX - is this correct? + */ + pfn = paddr >> PAGE_SHIFT; + if (remap_pfn_range(vma, address, pfn, PAGE_SIZE, vma->vm_page_prot)) { + printk(KERN_ERR "remap_pfn_range failed!\n"); + goto error; + } + + /* + * The kernel requires a page structure to be returned upon + * success, but there are no page structures for low granule pages. + * remap_page_range() creates the pte for us and we return a + * bogus page back to the kernel fault handler to keep it happy + * (the page is freed immediately there). + */ + if (mspec_get_one_pte(vma->vm_mm, address, &pte) == 0) { + spin_lock(&vma->vm_mm->page_table_lock); + inc_mm_counter(vma->vm_mm, rss); + spin_unlock(&vma->vm_mm->page_table_lock); + + set_pte(pte, pte_mkwrite(pte_mkdirty(*pte))); + } + getpage: + /* + * Is this really correct? + */ + spin_unlock(&vdata->lock); + page = alloc_pages(GFP_USER, 0); + return page; + + error: + if (maddr) { + mspec_free_page(vdata->maddr[index]); + vdata->maddr[index] = 0; + vdata->count--; + } + spin_unlock(&vdata->lock); + return NOPAGE_SIGBUS; +} + + +#ifdef CONFIG_PROC_FS +static void * +mspec_seq_start(struct seq_file *file, loff_t *offset) +{ + if (*offset < MAX_NUMNODES) + return offset; + return NULL; +} + +static void * +mspec_seq_next(struct seq_file *file, void *data, loff_t *offset) +{ + (*offset)++; + if (*offset < MAX_NUMNODES) + return offset; + return NULL; +} + +static void +mspec_seq_stop(struct seq_file *file, void *data) +{ +} + +static int +mspec_seq_show(struct seq_file *file, void *data) +{ + struct node_mspecs *mspecs; + int i; + + i = *(loff_t *)data; + + if (!i) { + seq_printf(file, "mappings : %i\n", + atomic_read(&mspec_stats.map_count)); + seq_printf(file, "current mspec pages : %i\n", + atomic_read(&mspec_stats.pages_in_use)); + seq_printf(file, "%4s %7s %7s\n", "node", "total", "free"); + } + + if (i < MAX_NUMNODES) { + int free, count; + mspecs = node_mspecs[i]; + if (mspecs) { + free = atomic_read(&mspecs->free); + count = mspecs->count; + seq_printf(file, "%4d %7d %7d\n", i, count, free); + } + } + + return 0; +} + + +static struct seq_operations mspec_seq_ops = { + .start = mspec_seq_start, + .next = mspec_seq_next, + .stop = mspec_seq_stop, + .show = mspec_seq_show +}; + +int +mspec_proc_open(struct inode *inode, struct file *file) +{ + return seq_open(file, &mspec_seq_ops); +} + +static struct file_operations proc_mspec_operations = { + .open = mspec_proc_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + + +static struct proc_dir_entry *proc_mspec; + +#endif /* CONFIG_PROC_FS */ + +/* + * mspec_build_memmap, + * + * Called at boot time to build a map of pages that can be used for + * memory special operations. + */ +static int __init +mspec_build_memmap(unsigned long start, unsigned long end) +{ + long length; + bte_result_t br; + unsigned long vstart, vend; + int node; + + length = end - start; + vstart = start + __IA64_UNCACHED_OFFSET; + vend = end + __IA64_UNCACHED_OFFSET; + +#if DEBUG + printk(KERN_ERR "mspec_build_memmap(%lx %lx)\n", start, end); +#endif + + br = BTE_ZERO_BLOCK(vstart, length); + if (br != BTE_SUCCESS) + panic("BTE Failed while trying to zero mspec page. bte_result_t = %d\n", (int) br); + + node = nasid_to_cnodeid(NASID_GET(start)); + + for (; vstart < vend ; vstart += PAGE_SIZE) { +#if DEBUG + printk(KERN_INFO "sticking %lx into the pool!\n", vstart); +#endif + gen_pool_free(mspec_pool[node], vstart, PAGE_SIZE); + } + + return 0; +} + +/* + * Walk the EFI memory map to pull out leftover pages in the lower + * memory regions which do not end up in the regular memory map and + * stick them into the uncached allocator + */ +static void __init +mspec_walk_efi_memmap_uc (void) +{ + void *efi_map_start, *efi_map_end, *p; + efi_memory_desc_t *md; + u64 efi_desc_size, start, end; + + efi_map_start = __va(ia64_boot_param->efi_memmap); + efi_map_end = efi_map_start + ia64_boot_param->efi_memmap_size; + efi_desc_size = ia64_boot_param->efi_memdesc_size; + + for (p = efi_map_start; p < efi_map_end; p += efi_desc_size) { + md = p; + if (md->attribute == EFI_MEMORY_UC) { + start = PAGE_ALIGN(md->phys_addr); + end = PAGE_ALIGN((md->phys_addr+(md->num_pages << EFI_PAGE_SHIFT)) & PAGE_MASK); + if (mspec_build_memmap(start, end) < 0) + return; + } + } +} + + + +/* + * mspec_init + * + * Called at boot time to initialize the mspec facility. + */ +static int __init +mspec_init(void) +{ + int i, ret; + + /* + * The fetchop device only works on SN2 hardware, uncached and cached + * memory drivers should both be valid on all ia64 hardware + */ + if (ia64_platform_is("sn2")) { + if ((ret = misc_register(&fetchop_miscdev))) { + printk(KERN_ERR "%s: failed to register device %i\n", + FETCHOP_DRIVER_ID_STR, ret); + return ret; + } + } + if ((ret = misc_register(&cached_miscdev))) { + printk(KERN_ERR "%s: failed to register device %i\n", + CACHED_DRIVER_ID_STR, ret); + misc_deregister(&fetchop_miscdev); + return ret; + } + if ((ret = misc_register(&uncached_miscdev))) { + printk(KERN_ERR "%s: failed to register device %i\n", + UNCACHED_DRIVER_ID_STR, ret); + misc_deregister(&cached_miscdev); + misc_deregister(&fetchop_miscdev); + return ret; + } + + /* + * /proc code needs to be updated to work with the new + * allocation scheme + */ +#ifdef CONFIG_PROC_FS + if (!(proc_mspec = create_proc_entry(MSPEC_BASENAME, 0444, NULL))){ + printk(KERN_ERR "%s: unable to create proc entry", + FETCHOP_DRIVER_ID_STR); + misc_deregister(&uncached_miscdev); + misc_deregister(&cached_miscdev); + misc_deregister(&fetchop_miscdev); + return -EINVAL; + } + proc_mspec->proc_fops = &proc_mspec_operations; +#endif /* CONFIG_PROC_FS */ + + for (i = 0; i < MAX_NUMNODES; i++) { + if (!node_online(i)) + continue; + mspec_pool[i] = gen_pool_create(0, IA64_GRANULE_SHIFT, + &mspec_get_new_chunk, i); + } + + mspec_walk_efi_memmap_uc(); + + printk(KERN_INFO "%s: v%s\n", FETCHOP_DRIVER_ID_STR, REVISION); + printk(KERN_INFO "%s: v%s\n", CACHED_DRIVER_ID_STR, REVISION); + printk(KERN_INFO "%s: v%s\n", UNCACHED_DRIVER_ID_STR, REVISION); + + return 0; +} + + +static void __exit +mspec_exit(void) +{ + BUG_ON(atomic_read(&mspec_stats.pages_in_use) > 0); + +#ifdef CONFIG_PROC_FS + remove_proc_entry(MSPEC_BASENAME, NULL); +#endif + misc_deregister(&uncached_miscdev); + misc_deregister(&cached_miscdev); + misc_deregister(&fetchop_miscdev); +} + + +unsigned long +mspec_kalloc_page(int nid) +{ + return TO_AMO(mspec_alloc_page(nid, MSPEC_FETCHOP)); +} + + +void +mspec_kfree_page(unsigned long maddr) +{ + mspec_free_page(TO_PHYS(maddr) + __IA64_UNCACHED_OFFSET); +} +EXPORT_SYMBOL(mspec_kalloc_page); +EXPORT_SYMBOL(mspec_kfree_page); + + +module_init(mspec_init); +module_exit(mspec_exit); + + +MODULE_AUTHOR("Silicon Graphics, Inc."); +MODULE_DESCRIPTION("Driver for SGI SN special memory operations"); +MODULE_LICENSE("GPL"); diff -urN -X /usr/people/jes/exclude-linux linux-2.6.12-rc2-mm3-vanilla/include/asm-ia64/sn/fetchop.h linux-2.6.12-rc2-mm3/include/asm-ia64/sn/fetchop.h --- linux-2.6.12-rc2-mm3-vanilla/include/asm-ia64/sn/fetchop.h 2005-03-01 23:38:12 -08:00 +++ linux-2.6.12-rc2-mm3/include/asm-ia64/sn/fetchop.h 1969-12-31 16:00:00 -08:00 @@ -1,85 +0,0 @@ -/* - * - * This file is subject to the terms and conditions of the GNU General Public - * License. See the file "COPYING" in the main directory of this archive - * for more details. - * - * Copyright (c) 2001-2004 Silicon Graphics, Inc. All rights reserved. - */ - -#ifndef _ASM_IA64_SN_FETCHOP_H -#define _ASM_IA64_SN_FETCHOP_H - -#include - -#define FETCHOP_BASENAME "sgi_fetchop" -#define FETCHOP_FULLNAME "/dev/sgi_fetchop" - - - -#define FETCHOP_VAR_SIZE 64 /* 64 byte per fetchop variable */ - -#define FETCHOP_LOAD 0 -#define FETCHOP_INCREMENT 8 -#define FETCHOP_DECREMENT 16 -#define FETCHOP_CLEAR 24 - -#define FETCHOP_STORE 0 -#define FETCHOP_AND 24 -#define FETCHOP_OR 32 - -#define FETCHOP_CLEAR_CACHE 56 - -#define FETCHOP_LOAD_OP(addr, op) ( \ - *(volatile long *)((char*) (addr) + (op))) - -#define FETCHOP_STORE_OP(addr, op, x) ( \ - *(volatile long *)((char*) (addr) + (op)) = (long) (x)) - -#ifdef __KERNEL__ - -/* - * Convert a region 6 (kaddr) address to the address of the fetchop variable - */ -#define FETCHOP_KADDR_TO_MSPEC_ADDR(kaddr) TO_MSPEC(kaddr) - - -/* - * Each Atomic Memory Operation (AMO formerly known as fetchop) - * variable is 64 bytes long. The first 8 bytes are used. The - * remaining 56 bytes are unaddressable due to the operation taking - * that portion of the address. - * - * NOTE: The AMO_t _MUST_ be placed in either the first or second half - * of the cache line. The cache line _MUST NOT_ be used for anything - * other than additional AMO_t entries. This is because there are two - * addresses which reference the same physical cache line. One will - * be a cached entry with the memory type bits all set. This address - * may be loaded into processor cache. The AMO_t will be referenced - * uncached via the memory special memory type. If any portion of the - * cached cache-line is modified, when that line is flushed, it will - * overwrite the uncached value in physical memory and lead to - * inconsistency. - */ -typedef struct { - u64 variable; - u64 unused[7]; -} AMO_t; - - -/* - * The following APIs are externalized to the kernel to allocate/free pages of - * fetchop variables. - * fetchop_kalloc_page - Allocate/initialize 1 fetchop page on the - * specified cnode. - * fetchop_kfree_page - Free a previously allocated fetchop page - */ - -unsigned long fetchop_kalloc_page(int nid); -void fetchop_kfree_page(unsigned long maddr); - - -#endif /* __KERNEL__ */ - -#endif /* _ASM_IA64_SN_FETCHOP_H */ - diff -urN -X /usr/people/jes/exclude-linux linux-2.6.12-rc2-mm3-vanilla/include/asm-ia64/sn/mspec.h linux-2.6.12-rc2-mm3/include/asm-ia64/sn/mspec.h --- linux-2.6.12-rc2-mm3-vanilla/include/asm-ia64/sn/mspec.h 1969-12-31 16:00:00 -08:00 +++ linux-2.6.12-rc2-mm3/include/asm-ia64/sn/mspec.h 2005-04-12 02:14:06 -07:00 @@ -0,0 +1,72 @@ +/* + * + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + * + * Copyright (c) 2001-2004 Silicon Graphics, Inc. All rights reserved. + */ + +#ifndef _ASM_IA64_SN_MSPEC_H +#define _ASM_IA64_SN_MSPEC_H + +#define FETCHOP_VAR_SIZE 64 /* 64 byte per fetchop variable */ + +#define FETCHOP_LOAD 0 +#define FETCHOP_INCREMENT 8 +#define FETCHOP_DECREMENT 16 +#define FETCHOP_CLEAR 24 + +#define FETCHOP_STORE 0 +#define FETCHOP_AND 24 +#define FETCHOP_OR 32 + +#define FETCHOP_CLEAR_CACHE 56 + +#define FETCHOP_LOAD_OP(addr, op) ( \ + *(volatile long *)((char*) (addr) + (op))) + +#define FETCHOP_STORE_OP(addr, op, x) ( \ + *(volatile long *)((char*) (addr) + (op)) = (long) (x)) + +#ifdef __KERNEL__ + +/* + * Each Atomic Memory Operation (AMO formerly known as fetchop) + * variable is 64 bytes long. The first 8 bytes are used. The + * remaining 56 bytes are unaddressable due to the operation taking + * that portion of the address. + * + * NOTE: The AMO_t _MUST_ be placed in either the first or second half + * of the cache line. The cache line _MUST NOT_ be used for anything + * other than additional AMO_t entries. This is because there are two + * addresses which reference the same physical cache line. One will + * be a cached entry with the memory type bits all set. This address + * may be loaded into processor cache. The AMO_t will be referenced + * uncached via the memory special memory type. If any portion of the + * cached cache-line is modified, when that line is flushed, it will + * overwrite the uncached value in physical memory and lead to + * inconsistency. + */ +typedef struct { + u64 variable; + u64 unused[7]; +} AMO_t; + + +/* + * The following APIs are externalized to the kernel to allocate/free pages of + * fetchop variables. + * mspec_kalloc_page - Allocate/initialize 1 fetchop page on the + * specified cnode. + * mspec_kfree_page - Free a previously allocated fetchop page + */ + +extern unsigned long mspec_kalloc_page(int); +extern void mspec_kfree_page(unsigned long); + + +#endif /* __KERNEL__ */ + +#endif /* _ASM_IA64_SN_MSPEC_H */ + - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/