2005-10-14 19:22:14

by Robin Holt

[permalink] [raw]
Subject: [Patch 0/3] SGI Altix and ia64 special memory support.


SGI hardware supports a special type of memory called fetchop or atomic
memory. This memory does atomic operations at the memory controller
instead of using the processor.

This patch set introduces a driver so user land can map the devices and
fault pages of the appropriate type. Pages are inserted on first touch.
The reason for that was hashed out earlier on the lists, but can be
distilled to node locality, node resource limitation, and application
performance.

Since a typical ia64 uncached page does not have a page struct backing it,
we first modify do_no_page to handle a new return type of NOPAGE_FAULTED.
This indicates to the nopage handler that the desired operation is
complete and should be treated as a minor fault. This is a result of a
discussion which Jes Sorenson started on the the ia64 mailing list and
Christoph Hellwig carried over to the linux-mm mailing list.

The second patch introduces the mspec driver.

I am reposting these today. The last version went out in a rush last
night and I did not take the time to notify the people that were part
of the earlier discussion.

Additionally, the version which Jes posted last April was using
remap_pfn_range(). This version uses set_pte(). I realize that is
probably the wrong thing to do. Unfortunately, we need this to be
thread-safe. With remap_pfn_range() there is a BUG_ON(!pte_none(*pte));
in remap_pte_range() which would trip if there were multiple threads
faulting at the same time. To work around that, I started looking at
breaking remap_pfn_range() into an _remap_pfn_range() which assumed
the locks were already held. At that point, it became apparent I
was stretching the use of remap_pfn_range beyond its original intent.
For this driver, we are inserting a single pte, the page tables have
already been put in place by the caller's chain, why not just insert
the pte directly. That is what I finally did.

Thanks,
Robin Holt


2005-10-14 19:22:50

by Robin Holt

[permalink] [raw]
Subject: [Patch 2/3] Export get_one_pte_map.


Add an export of get_one_pte_map.

Signed-off-by: Robin Holt <[email protected]>


Index: linux-2.6/mm/mremap.c
===================================================================
--- linux-2.6.orig/mm/mremap.c 2005-09-28 15:05:56.000000000 -0500
+++ linux-2.6/mm/mremap.c 2005-10-14 11:32:11.276747477 -0500
@@ -12,6 +12,7 @@
#include <linux/slab.h>
#include <linux/shm.h>
#include <linux/mman.h>
+#include <linux/module.h>
#include <linux/swap.h>
#include <linux/fs.h>
#include <linux/highmem.h>
@@ -50,7 +51,7 @@ end:
return pte;
}

-static pte_t *get_one_pte_map(struct mm_struct *mm, unsigned long addr)
+pte_t *get_one_pte_map(struct mm_struct *mm, unsigned long addr)
{
pgd_t *pgd;
pud_t *pud;
@@ -70,6 +71,7 @@ static pte_t *get_one_pte_map(struct mm_

return pte_offset_map(pmd, addr);
}
+EXPORT_SYMBOL(get_one_pte_map);

static inline pte_t *alloc_one_pte_map(struct mm_struct *mm, unsigned long addr)
{
Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h 2005-10-12 11:21:54.701122262 -0500
+++ linux-2.6/include/linux/mm.h 2005-10-14 09:04:29.191902399 -0500
@@ -733,6 +733,7 @@ int FASTCALL(set_page_dirty(struct page
int set_page_dirty_lock(struct page *page);
int clear_page_dirty_for_io(struct page *page);

+extern pte_t *get_one_pte_map(struct mm_struct *mm, unsigned long addr);
extern unsigned long do_mremap(unsigned long addr,
unsigned long old_len, unsigned long new_len,
unsigned long flags, unsigned long new_addr);

2005-10-14 19:22:07

by Robin Holt

[permalink] [raw]
Subject: [Patch 1/3] Add a NOPAGE_FAULTED flag.


Introduce a NOPAGE_FAULTED flag. This flag is returned from a drivers
nopage handler to indicate the desired pte has been inserted and should
be handled as a minor fault.

Signed-off-by: [email protected]


Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h 2005-10-11 21:15:45.147011169 -0500
+++ linux-2.6/include/linux/mm.h 2005-10-11 21:17:15.814561844 -0500
@@ -619,6 +619,7 @@ static inline int page_mapped(struct pag
*/
#define NOPAGE_SIGBUS (NULL)
#define NOPAGE_OOM ((struct page *) (-1))
+#define NOPAGE_FAULTED ((struct page *) (-2))

/*
* Different kinds of faults, as returned by handle_mm_fault().
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c 2005-10-11 21:15:45.162634582 -0500
+++ linux-2.6/mm/memory.c 2005-10-11 21:17:15.849714525 -0500
@@ -1862,6 +1862,14 @@ retry:
return VM_FAULT_SIGBUS;
if (new_page == NOPAGE_OOM)
return VM_FAULT_OOM;
+ if (new_page == NOPAGE_FAULTED) {
+ spin_lock(&mm->page_table_lock);
+ page_table = pte_offset_map(pmd, address);
+ pte_unmap(page_table);
+ spin_unlock(&mm->page_table_lock);
+
+ return VM_FAULT_MINOR;
+ }

/*
* Should we do an early C-O-W break?

2005-10-14 19:23:12

by Robin Holt

[permalink] [raw]
Subject: [Patch 3/3] Special Memory (mspec) driver.


Introduce the special memory (mspec) driver. This is used to allow
userland to map fetchop, etc pages

Signed-off-by: [email protected]



Index: linux-2.6/arch/ia64/Kconfig
===================================================================
--- linux-2.6.orig/arch/ia64/Kconfig 2005-10-14 13:05:33.061934274 -0500
+++ linux-2.6/arch/ia64/Kconfig 2005-10-14 13:06:30.140195032 -0500
@@ -231,6 +231,16 @@ config IA64_SGI_SN_XP
this feature will allow for direct communication between SSIs
based on a network adapter and DMA messaging.

+config MSPEC
+ tristate "Special Memory support"
+ select IA64_UNCACHED_ALLOCATOR
+ help
+ This driver allows for cached and uncached mappings of memory
+ to user processes. On SGI SN hardware it will also export the
+ special fetchop memory facility.
+ Fetchops are atomic memory operations that are implemented in the
+ memory controller on SGI SN hardware.
+
config FORCE_MAX_ZONEORDER
int
default "18"
Index: linux-2.6/arch/ia64/kernel/Makefile
===================================================================
--- linux-2.6.orig/arch/ia64/kernel/Makefile 2005-10-14 13:05:33.062910739 -0500
+++ linux-2.6/arch/ia64/kernel/Makefile 2005-10-14 13:06:30.145077354 -0500
@@ -23,6 +23,7 @@ obj-$(CONFIG_IA64_CYCLONE) += cyclone.o
obj-$(CONFIG_CPU_FREQ) += cpufreq/
obj-$(CONFIG_IA64_MCA_RECOVERY) += mca_recovery.o
obj-$(CONFIG_KPROBES) += kprobes.o jprobes.o
+obj-$(CONFIG_MSPEC) += mspec.o
obj-$(CONFIG_IA64_UNCACHED_ALLOCATOR) += uncached.o
mca_recovery-y += mca_drv.o mca_drv_asm.o

Index: linux-2.6/arch/ia64/kernel/mspec.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6/arch/ia64/kernel/mspec.c 2005-10-14 14:02:08.473328177 -0500
@@ -0,0 +1,388 @@
+/*
+ * Copyright (C) 2001-2005 Silicon Graphics, Inc. All rights
+ * reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ */
+
+/*
+ * SN Platform Special Memory (mspec) Support
+ *
+ * This driver exports the SN special memory (mspec) facility to user processes.
+ * There are three types of memory made available thru this driver:
+ * fetchops, uncached and cached.
+ *
+ * Fetchops are atomic memory operations that are implemented in the
+ * memory controller on SGI SN hardware.
+ *
+ * Uncached are used for memory write combining feature of the ia64
+ * cpu.
+ *
+ * Cached are used for areas of memory that are used as cached addresses
+ * on our partition and used as uncached addresses from other partitions.
+ * Due to a design constraint of the SN2 Shub, you can not have processors
+ * on the same FSB perform both a cached and uncached reference to the
+ * same cache line. These special memory cached regions prevent the
+ * kernel from ever dropping in a TLB entry and therefore prevent the
+ * processor from ever speculating a cache line from this page.
+ */
+
+#include <linux/config.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/errno.h>
+#include <linux/miscdevice.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/vmalloc.h>
+#include <linux/bitops.h>
+#include <linux/string.h>
+#include <linux/slab.h>
+#include <linux/seq_file.h>
+#include <linux/efi.h>
+#include <asm/page.h>
+#include <asm/pal.h>
+#include <asm/system.h>
+#include <asm/pgtable.h>
+#include <asm/atomic.h>
+#include <asm/tlbflush.h>
+#include <asm/uncached.h>
+#include <asm/sn/addrs.h>
+#include <asm/sn/arch.h>
+#include <asm/sn/mspec.h>
+#include <asm/sn/sn_cpuid.h>
+#include <asm/sn/io.h>
+#include <asm/sn/bte.h>
+#include <asm/sn/shubio.h>
+
+
+#define FETCHOP_ID "Fetchop,"
+#define CACHED_ID "Cached,"
+#define UNCACHED_ID "Uncached"
+#define REVISION "3.0"
+#define MSPEC_BASENAME "mspec"
+
+/*
+ * Page types allocated by the device.
+ */
+enum {
+ MSPEC_FETCHOP = 1,
+ MSPEC_CACHED,
+ MSPEC_UNCACHED
+};
+
+/*
+ * One of these structures is allocated when an mspec region is mmaped. The
+ * structure is pointed to by the vma->vm_private_data field in the vma struct.
+ * This structure is used to record the addresses of the mspec pages.
+ */
+struct vma_data {
+ atomic_t refcnt; /* Number of vmas sharing the data. */
+ spinlock_t lock; /* Serialize access to the vma. */
+ int count; /* Number of pages allocated. */
+ int type; /* Type of pages allocated. */
+ unsigned long maddr[0]; /* Array of MSPEC addresses. */
+};
+
+/* used on shub2 to clear FOP cache in the HUB */
+static unsigned long uncached_scratch_page;
+static void *fetchop_cache_clear;
+#define SH2_AMO_CACHE_ENTRIES 4
+
+static inline int
+mspec_zero_block(unsigned long addr, int len)
+{
+ int status;
+
+ if (ia64_platform_is("sn2")) {
+ if (fetchop_cache_clear) {
+ void *p = fetchop_cache_clear;
+ int i;
+
+ for (i=0; i < SH2_AMO_CACHE_ENTRIES; i++) {
+ FETCHOP_LOAD_OP(p, FETCHOP_LOAD);
+ p += FETCHOP_VAR_SIZE;
+ }
+ }
+ status = bte_copy(0, addr & ~__IA64_UNCACHED_OFFSET, len,
+ BTE_WACQUIRE | BTE_ZERO_FILL, NULL);
+ } else {
+ memset((char *) addr, 0, len);
+ status = 0;
+ }
+ return status;
+}
+
+/*
+ * mspec_open
+ *
+ * Called when a device mapping is created by a means other than mmap
+ * (via fork, etc.). Increments the reference count on the underlying
+ * mspec data so it is not freed prematurely.
+ */
+static void
+mspec_open(struct vm_area_struct *vma)
+{
+ struct vma_data *vdata;
+
+ vdata = vma->vm_private_data;
+ atomic_inc(&vdata->refcnt);
+}
+
+/*
+ * mspec_close
+ *
+ * Called when unmapping a device mapping. Frees all mspec pages
+ * belonging to the vma.
+ */
+static void
+mspec_close(struct vm_area_struct *vma)
+{
+ struct vma_data *vdata;
+ int i, pages, result;
+
+ vdata = vma->vm_private_data;
+ if (!atomic_dec_and_test(&vdata->refcnt))
+ return;
+
+ pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+ for (i = 0; i < pages; i++) {
+ if (vdata->maddr[i] == 0)
+ continue;
+ /*
+ * Clear the page before sticking it back
+ * into the pool.
+ */
+ result = mspec_zero_block(vdata->maddr[i], PAGE_SIZE);
+ if (!result)
+ uncached_free_page(vdata-> maddr[i]);
+ else
+ printk(KERN_WARNING "mspec_close(): "
+ "failed to zero page %i\n",
+ result);
+ }
+
+ vfree(vdata);
+}
+
+/*
+ * mspec_nopage
+ *
+ * Creates a mspec page and maps it to user space.
+ */
+static struct page *
+mspec_nopage(struct vm_area_struct *vma,
+ unsigned long address, int *unused)
+{
+ unsigned long paddr, maddr = 0;
+ unsigned long pfn;
+ int index;
+ pte_t *page_table, pte;
+ struct vma_data *vdata = vma->vm_private_data;
+
+ index = (address - vma->vm_start) >> PAGE_SHIFT;
+ maddr = (volatile unsigned long) vdata->maddr[index];
+ if (maddr == 0) {
+ maddr = uncached_alloc_page(numa_node_id());
+ if (maddr == 0)
+ return NOPAGE_OOM;
+
+ spin_lock(&vdata->lock);
+ if (vdata->maddr[index] == 0) {
+ vdata->count++;
+ vdata->maddr[index] = maddr;
+ } else {
+ uncached_free_page(maddr);
+ maddr = vdata->maddr[index];
+ }
+ spin_unlock(&vdata->lock);
+ }
+
+ spin_lock(&vma->vm_mm->page_table_lock);
+ page_table = get_one_pte_map(vma->vm_mm, address);
+ if (page_table != NULL && !pte_present(*page_table)) {
+ if (vdata->type == MSPEC_FETCHOP)
+ paddr = TO_AMO(maddr);
+ else
+ paddr = __pa(TO_CAC(maddr));
+
+ pfn = paddr >> PAGE_SHIFT;
+ pte = pfn_pte(pfn, vma->vm_page_prot);
+ pte = pte_mkwrite(pte_mkdirty(pte));
+ set_pte(page_table, pte);
+ }
+ spin_unlock(&vma->vm_mm->page_table_lock);
+
+ return NOPAGE_FAULTED;
+}
+
+static struct vm_operations_struct mspec_vm_ops = {
+ .open mspec_open,
+ .close mspec_close,
+ .nopage mspec_nopage
+};
+
+/*
+ * mspec_mmap
+ *
+ * Called when mmaping the device. Initializes the vma with a fault handler
+ * and private data structure necessary to allocate, track, and free the
+ * underlying pages.
+ */
+static int
+mspec_mmap(struct file *file, struct vm_area_struct *vma, int type)
+{
+ struct vma_data *vdata;
+ int pages;
+
+ if (vma->vm_pgoff != 0)
+ return -EINVAL;
+
+ if ((vma->vm_flags & VM_WRITE) == 0)
+ return -EPERM;
+
+ pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+ vdata = vmalloc(sizeof(struct vma_data) + pages * sizeof(long));
+ if (!vdata)
+ return -ENOMEM;
+ memset(vdata, 0, sizeof(struct vma_data) + pages * sizeof(long));
+
+ vdata->type = type;
+ spin_lock_init(&vdata->lock);
+ vdata->refcnt = ATOMIC_INIT(1);
+ vma->vm_private_data = vdata;
+
+ vma->vm_flags |= (VM_IO | VM_SHM | VM_LOCKED | VM_RESERVED);
+ if (vdata->type == MSPEC_FETCHOP || vdata->type == MSPEC_UNCACHED)
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+ vma->vm_ops = &mspec_vm_ops;
+
+ return 0;
+}
+
+static int
+fetchop_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ return mspec_mmap(file, vma, MSPEC_FETCHOP);
+}
+
+static int
+cached_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ return mspec_mmap(file, vma, MSPEC_CACHED);
+}
+
+static int
+uncached_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ return mspec_mmap(file, vma, MSPEC_UNCACHED);
+}
+
+static struct file_operations fetchop_fops = {
+ .owner THIS_MODULE,
+ .mmap fetchop_mmap
+};
+
+static struct miscdevice fetchop_miscdev = {
+ .minor MISC_DYNAMIC_MINOR,
+ .name "sgi_fetchop",
+ .fops &fetchop_fops
+};
+
+static struct file_operations cached_fops = {
+ .owner THIS_MODULE,
+ .mmap cached_mmap
+};
+
+static struct miscdevice cached_miscdev = {
+ .minor MISC_DYNAMIC_MINOR,
+ .name "sgi_cached",
+ .fops &cached_fops
+};
+
+static struct file_operations uncached_fops = {
+ .owner THIS_MODULE,
+ .mmap uncached_mmap
+};
+
+static struct miscdevice uncached_miscdev = {
+ .minor MISC_DYNAMIC_MINOR,
+ .name "sgi_uncached",
+ .fops &uncached_fops
+};
+
+/*
+ * mspec_init
+ *
+ * Called at boot time to initialize the mspec facility.
+ */
+static int __init
+mspec_init(void)
+{
+ int ret;
+
+ /*
+ * The fetchop device only works on SN2 hardware, uncached and cached
+ * memory drivers should both be valid on all ia64 hardware
+ */
+ if (ia64_platform_is("sn2")) {
+ if (is_shub2()) {
+ uncached_scratch_page = uncached_alloc_page(0);
+ fetchop_cache_clear = (void *)TO_AMO(uncached_scratch_page);
+ }
+
+ ret = misc_register(&fetchop_miscdev);
+ if (ret) {
+ printk(KERN_ERR
+ "%s: failed to register device %i\n",
+ FETCHOP_ID, ret);
+ return ret;
+ }
+ }
+ ret = misc_register(&cached_miscdev);
+ if (ret) {
+ printk(KERN_ERR "%s: failed to register device %i\n",
+ CACHED_ID, ret);
+ misc_deregister(&fetchop_miscdev);
+ return ret;
+ }
+ ret = misc_register(&uncached_miscdev);
+ if (ret) {
+ printk(KERN_ERR "%s: failed to register device %i\n",
+ UNCACHED_ID, ret);
+ misc_deregister(&cached_miscdev);
+ misc_deregister(&fetchop_miscdev);
+ return ret;
+ }
+
+ printk(KERN_INFO "%s %s initialized devices: %s %s %s\n",
+ MSPEC_BASENAME, REVISION,
+ ia64_platform_is("sn2") ? FETCHOP_ID : "",
+ CACHED_ID, UNCACHED_ID);
+
+ return 0;
+}
+
+static void __exit
+mspec_exit(void)
+{
+ misc_deregister(&uncached_miscdev);
+ misc_deregister(&cached_miscdev);
+ if (ia64_platform_is("sn2")) {
+ misc_deregister(&fetchop_miscdev);
+ if (uncached_scratch_page)
+ uncached_free_page(uncached_scratch_page);
+ }
+}
+
+module_init(mspec_init);
+module_exit(mspec_exit);
+
+MODULE_AUTHOR("Silicon Graphics, Inc.");
+MODULE_DESCRIPTION("Driver for SGI SN special memory operations");
+MODULE_LICENSE("GPL");

2005-10-14 21:31:07

by Greg KH

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> +EXPORT_SYMBOL(get_one_pte_map);

EXPORT_SYMBOL_GPL() ?

thanks,

greg k-h

2005-10-17 11:32:29

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Fri, Oct 14, 2005 at 02:30:38PM -0700, Greg KH wrote:
> On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> > +EXPORT_SYMBOL(get_one_pte_map);
>
> EXPORT_SYMBOL_GPL() ?

Not sure why it would fall that way. Looking at the directory,
I get:

[holt@lnx-holt mm]$ grep -c 'EXPORT_SYMBOL(' *.c | egrep -v ":0$"
bootmem.c:1
filemap.c:34
fremap.c:1
highmem.c:4
hugetlb.c:1
memory.c:12
mempolicy.c:1
mempool.c:8
mmap.c:10
nommu.c:13
page_alloc.c:15
page-writeback.c:11
readahead.c:2
slab.c:16
sparse.c:1
swap.c:6
swapfile.c:1
swap_state.c:1
truncate.c:2
vmalloc.c:6
vmscan.c:2
[holt@lnx-holt mm]$ grep -c 'EXPORT_SYMBOL_GPL(' *.c | egrep -v ":0$"
filemap_xip.c:5
readahead.c:1
slab.c:1
truncate.c:2


I will happily change it, but that seems inconsistent with the
majority of the exports from that subsystem.

Robin

2005-10-17 11:36:10

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 06:31:31AM -0500, Robin Holt wrote:
> On Fri, Oct 14, 2005 at 02:30:38PM -0700, Greg KH wrote:
> > On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> > > +EXPORT_SYMBOL(get_one_pte_map);
> >
> > EXPORT_SYMBOL_GPL() ?
>
> Not sure why it would fall that way. Looking at the directory,
> I get:

This is a very lowlevel export for things that poke deep into VM
internals, so _GPL makes sense. In fact not allowing modular builds
of the mspec driver might make even more sense.

2005-10-17 11:39:35

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 12:36:01PM +0100, Christoph Hellwig wrote:
> On Mon, Oct 17, 2005 at 06:31:31AM -0500, Robin Holt wrote:
> > On Fri, Oct 14, 2005 at 02:30:38PM -0700, Greg KH wrote:
> > > On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> > > > +EXPORT_SYMBOL(get_one_pte_map);
> > >
> > > EXPORT_SYMBOL_GPL() ?
> >
> > Not sure why it would fall that way. Looking at the directory,
> > I get:
>
> This is a very lowlevel export for things that poke deep into VM
> internals, so _GPL makes sense. In fact not allowing modular builds
> of the mspec driver might make even more sense.

That would be acceptable as well. I was just looking for a way to
minimize kernel sizes for ia64 machines that don't need these
devices and therefore would not load the module. Just looking at
it from a distro perspective. What is the concensus?

Thanks,
Robin

2005-10-17 11:42:22

by Dave Hansen

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, 2005-10-17 at 06:31 -0500, Robin Holt wrote:
> On Fri, Oct 14, 2005 at 02:30:38PM -0700, Greg KH wrote:
> > On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> > > +EXPORT_SYMBOL(get_one_pte_map);
> >
> > EXPORT_SYMBOL_GPL() ?
>
> Not sure why it would fall that way. Looking at the directory,
> I get:

Most of the VM stuff in those directories that you're referring to are
old, crusty exports, from the days before _GPL. We've left them to be
polite, but if many of them were recreated today, they'd certainly be
_GPL.

We do not want random external modules poking at PTEs, nor should a
module need to know such kernel internals as the semantics of
PTE_HIGHMEM. I think it needs _GPL.

BTW, your new patches look much nicer than the last set. Thanks for
making the changes I suggested.

-- Dave

2005-10-17 11:47:59

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 01:41:52PM +0200, Dave Hansen wrote:
> On Mon, 2005-10-17 at 06:31 -0500, Robin Holt wrote:
> > On Fri, Oct 14, 2005 at 02:30:38PM -0700, Greg KH wrote:
> > > On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> > > > +EXPORT_SYMBOL(get_one_pte_map);
> > >
> > > EXPORT_SYMBOL_GPL() ?
> >
> > Not sure why it would fall that way. Looking at the directory,
> > I get:
>
> Most of the VM stuff in those directories that you're referring to are
> old, crusty exports, from the days before _GPL. We've left them to be
> polite, but if many of them were recreated today, they'd certainly be
> _GPL.

I got a little push from our internal incident tracking system for
this being a module. _GPL it will be.

Thanks,
Robin

2005-10-17 12:34:52

by Hugh Dickins

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, 17 Oct 2005, Robin Holt wrote:
> On Mon, Oct 17, 2005 at 01:41:52PM +0200, Dave Hansen wrote:
> > On Mon, 2005-10-17 at 06:31 -0500, Robin Holt wrote:
> > > On Fri, Oct 14, 2005 at 02:30:38PM -0700, Greg KH wrote:
> > > > On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> > > > > +EXPORT_SYMBOL(get_one_pte_map);
>
> I got a little push from our internal incident tracking system for
> this being a module. _GPL it will be.

Sorry, Robin, I've not been following your patches. But if you look
at 2.6.14-rc4-mm1, you'll find that there isn't even a get_one_pte_map
there. Though there's no certainty yet that my pt locking changes, or
Nick's PageReserved changes, will actually go forward, there's a lot of
work queued up in -mm that is likely to affect your code. And I don't
think exporting internal functions from mremap.c, _GPL or otherwise,
is the way to go.

Moving useful functions to a more central location might be. But I'm
very dubious about your doing this kind of pte stuff deep down in an
architecture-specific driver. You're not the only one interested in
this kind of functionality: we were thinking of providing it via an
alternative to the ->nopage method, which deals in pfns rather than
struct pages (I think that was wli's suggestion originally; Carsten
has an interest in it on s390, and I bet there are others). There
may be excellent reasons why that wouldn't be good enough for you,
and your retcode method may be a better idea: I don't know yet.

Please rebase your work to 2.6.14-rc4-mm1 (but I won't get to look
at the result for a few days: perhaps others will).

The big question has to be: what are you expecting to happen for
PROT_WRITE, MAP_PRIVATE?

Hugh

2005-10-17 15:15:18

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 01:33:53PM +0100, Hugh Dickins wrote:
> On Mon, 17 Oct 2005, Robin Holt wrote:
> > On Mon, Oct 17, 2005 at 01:41:52PM +0200, Dave Hansen wrote:
> > > On Mon, 2005-10-17 at 06:31 -0500, Robin Holt wrote:
> > > > On Fri, Oct 14, 2005 at 02:30:38PM -0700, Greg KH wrote:
> > > > > On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> > > > > > +EXPORT_SYMBOL(get_one_pte_map);
> >
> > I got a little push from our internal incident tracking system for
> > this being a module. _GPL it will be.
>
> Sorry, Robin, I've not been following your patches. But if you look
> at 2.6.14-rc4-mm1, you'll find that there isn't even a get_one_pte_map
> there. Though there's no certainty yet that my pt locking changes, or
> Nick's PageReserved changes, will actually go forward, there's a lot of
> work queued up in -mm that is likely to affect your code. And I don't
> think exporting internal functions from mremap.c, _GPL or otherwise,
> is the way to go.

I am currently getting pressure from my management to get something
checked into the tree for 2.6.15. Would it be reasonable to ask
that the current patch be included and then I work up another patch
which introduces a ->nopfn type change for the -mm tree?

Thanks,
Robin

2005-10-17 15:21:16

by Greg KH

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 10:14:30AM -0500, Robin Holt wrote:
> On Mon, Oct 17, 2005 at 01:33:53PM +0100, Hugh Dickins wrote:
> > On Mon, 17 Oct 2005, Robin Holt wrote:
> > > On Mon, Oct 17, 2005 at 01:41:52PM +0200, Dave Hansen wrote:
> > > > On Mon, 2005-10-17 at 06:31 -0500, Robin Holt wrote:
> > > > > On Fri, Oct 14, 2005 at 02:30:38PM -0700, Greg KH wrote:
> > > > > > On Fri, Oct 14, 2005 at 02:22:25PM -0500, Robin Holt wrote:
> > > > > > > +EXPORT_SYMBOL(get_one_pte_map);
> > >
> > > I got a little push from our internal incident tracking system for
> > > this being a module. _GPL it will be.
> >
> > Sorry, Robin, I've not been following your patches. But if you look
> > at 2.6.14-rc4-mm1, you'll find that there isn't even a get_one_pte_map
> > there. Though there's no certainty yet that my pt locking changes, or
> > Nick's PageReserved changes, will actually go forward, there's a lot of
> > work queued up in -mm that is likely to affect your code. And I don't
> > think exporting internal functions from mremap.c, _GPL or otherwise,
> > is the way to go.
>
> I am currently getting pressure from my management to get something
> checked into the tree for 2.6.15.

Your management seems to not understand how kernel development works :)

> Would it be reasonable to ask that the current patch be included and
> then I work up another patch which introduces a ->nopfn type change
> for the -mm tree?

The stuff in -mm is what is going to be in .15, so you have to work off
of that patchset if you wish to have something for .15.

Good luck,

greg k-h

2005-10-17 15:56:55

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 08:20:34AM -0700, Greg KH wrote:
> > Would it be reasonable to ask that the current patch be included and
> > then I work up another patch which introduces a ->nopfn type change
> > for the -mm tree?
>
> The stuff in -mm is what is going to be in .15, so you have to work off
> of that patchset if you wish to have something for .15.

Is everything in the mm/ directory from the -mm tree going into .15 or
is there a planned subset? What should I develop against to help ensure
I match up with the community?

Thanks,
Robin

2005-10-17 16:00:16

by Hugh Dickins

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, 17 Oct 2005, Robin Holt wrote:
>
> I am currently getting pressure from my management to get something
> checked into the tree for 2.6.15.

I'm sorry to hear that, but it's not a kernel development priority.

And since I'm nearing completion of a task which we expect to satisfy
what SGI's been asking for almost a year, which I had been obstructing,
I'm disinclined to drop it now in order to help meet their latest fancy.

> Would it be reasonable to ask
> that the current patch be included and then I work up another patch
> which introduces a ->nopfn type change for the -mm tree?

I'm definitely not in charge here, and cannot answer that. But I
think it's unlikely, unless Linus and Andrew are pretty sure that
what you have now is really the way they want to go in the long term.

You will, as ever, be entitled to apply your patch on top of 2.6.15
(but it may not apply without some changes).

Repeating a technical question (sorry, that now seems off-topic!):
what do you expect to happen with PROT_WRITE, MAP_PRIVATE?

Hugh

2005-10-17 16:07:09

by Greg KH

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 10:56:05AM -0500, Robin Holt wrote:
> On Mon, Oct 17, 2005 at 08:20:34AM -0700, Greg KH wrote:
> > > Would it be reasonable to ask that the current patch be included and
> > > then I work up another patch which introduces a ->nopfn type change
> > > for the -mm tree?
> >
> > The stuff in -mm is what is going to be in .15, so you have to work off
> > of that patchset if you wish to have something for .15.
>
> Is everything in the mm/ directory from the -mm tree going into .15 or
> is there a planned subset? What should I develop against to help ensure
> I match up with the community?

-mm is "the community" :)

But Hugh would have the best answer for this, as he knows what he will
be sending in for .15, so at the least, work off of his patches in
there.

Good luck,

greg k-h

2005-10-17 16:10:11

by Dave Hansen

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, 2005-10-17 at 10:56 -0500, Robin Holt wrote:
> On Mon, Oct 17, 2005 at 08:20:34AM -0700, Greg KH wrote:
> > > Would it be reasonable to ask that the current patch be included and
> > > then I work up another patch which introduces a ->nopfn type change
> > > for the -mm tree?
> >
> > The stuff in -mm is what is going to be in .15, so you have to work off
> > of that patchset if you wish to have something for .15.
>
> Is everything in the mm/ directory from the -mm tree going into .15 or
> is there a planned subset? What should I develop against to help ensure
> I match up with the community?

I'd just work on top of the entire -mm tree for now. Or, check out the
patch-series file and the broken-out/ directory in the tree's directory
on kernel.org. You can apply only the patches that are in the mm
section (it's commented), they're just about first after the -git trees.

-- Dave

2005-10-17 16:15:31

by Hugh Dickins

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, 17 Oct 2005, Robin Holt wrote:
> On Mon, Oct 17, 2005 at 08:20:34AM -0700, Greg KH wrote:
> >
> > The stuff in -mm is what is going to be in .15, so you have to work off
> > of that patchset if you wish to have something for .15.

The stuff in -mm is a best guess at what's going to be in 2.6.15,
but it's far from exact and certain.

> Is everything in the mm/ directory from the -mm tree going into .15 or
> is there a planned subset? What should I develop against to help ensure
> I match up with the community?

That's a question for Andrew (added to CC in case he's not receiving
enough copies of this mail ;-), and it's too soon to tell - changes
which have only just been exposed in 2.6.14-rc4-mm1 are not yet
mature enough for a judgement.

I've still got stuff to come, to make full sense of what's already in:
if I were Andrew, given the faster releases we're going for now, I'd
currently be in some doubt about whether to push that for 2.6.15.

Perhaps he'll be more comfortable with yours, feel it's the right
way to go, and want to put it ahead of what's presently in -mm.
Or perhaps not.

Hugh

2005-10-17 20:25:31

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 04:59:20PM +0100, Hugh Dickins wrote:
> Repeating a technical question (sorry, that now seems off-topic!):
> what do you expect to happen with PROT_WRITE, MAP_PRIVATE?

That would end up with a MAP_PRIVATE, PROT_WRITE, VM_RESERVED
mapping. That does not make sense for this device, so I added
the following check to mspec_mmap()

if ((vma->vm_flags & VM_SHARED) == 0)
return -EINVAL;

Thanks,
Robin

2005-10-17 20:57:48

by Andrew Morton

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

Hugh Dickins <[email protected]> wrote:
>
> On Mon, 17 Oct 2005, Robin Holt wrote:
> > On Mon, Oct 17, 2005 at 08:20:34AM -0700, Greg KH wrote:
> > >
> > > The stuff in -mm is what is going to be in .15, so you have to work off
> > > of that patchset if you wish to have something for .15.
>
> The stuff in -mm is a best guess at what's going to be in 2.6.15,
> but it's far from exact and certain.
>
> > Is everything in the mm/ directory from the -mm tree going into .15 or
> > is there a planned subset? What should I develop against to help ensure
> > I match up with the community?
>
> That's a question for Andrew (added to CC in case he's not receiving
> enough copies of this mail ;-), and it's too soon to tell - changes
> which have only just been exposed in 2.6.14-rc4-mm1 are not yet
> mature enough for a judgement.
>
> I've still got stuff to come, to make full sense of what's already in:
> if I were Andrew, given the faster releases we're going for now, I'd
> currently be in some doubt about whether to push that for 2.6.15.

Ther are nearly 100 mm patches in -mm. I need to do a round of discussion
with the originators to work out what's suitable for 2.6.15. For "Hugh
stuff" I'm thinking maybe the first batch
(mm-hugetlb-truncation-fixes.patch to mm-m68k-kill-stram-swap.patch) and
not the second batch. But we need to think about it.

> Perhaps he'll be more comfortable with yours, feel it's the right
> way to go, and want to put it ahead of what's presently in -mm.
> Or perhaps not.

The mspec driver? That work should be based on the mm patches in rc4-mm1,
I guess. Its impact on core mm is small, so we should be able to get it
into 2.6.15.

2005-10-17 21:00:18

by Hugh Dickins

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, 17 Oct 2005, Robin Holt wrote:
> On Mon, Oct 17, 2005 at 04:59:20PM +0100, Hugh Dickins wrote:
> > Repeating a technical question (sorry, that now seems off-topic!):
> > what do you expect to happen with PROT_WRITE, MAP_PRIVATE?
>
> That would end up with a MAP_PRIVATE, PROT_WRITE, VM_RESERVED
> mapping. That does not make sense for this device, so I added
> the following check to mspec_mmap()
>
> if ((vma->vm_flags & VM_SHARED) == 0)
> return -EINVAL;

Great, that saves us (or at least postpones) some nastiness!

Thanks,
Hugh

2005-10-18 08:48:12

by Carsten Otte

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

Andrew Morton wrote:
> Ther are nearly 100 mm patches in -mm. I need to do a round of discussion
> with the originators to work out what's suitable for 2.6.15. For "Hugh
> stuff" I'm thinking maybe the first batch
> (mm-hugetlb-truncation-fixes.patch to mm-m68k-kill-stram-swap.patch) and
> not the second batch. But we need to think about it.
We tested Hugh's stuff that is currently in -mm, mainly from the xip
perspecive. Seems to work fine for 390.

2005-10-18 23:29:04

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [Patch 2/3] Export get_one_pte_map.

On Mon, Oct 17, 2005 at 06:31:31AM -0500, Robin Holt wrote:
> Not sure why it would fall that way. Looking at the directory,
> I get:
> [holt@lnx-holt mm]$ grep -c 'EXPORT_SYMBOL(' *.c | egrep -v ":0$"
[...]
> I will happily change it, but that seems inconsistent with the
> majority of the exports from that subsystem.

I'm a bit more laissez-faire on this: I don't mind the export either
way. That said, I'm aware that the prevailing opinion is in favor of
EXPORT_SYMBOL_GPL().


-- wli