2004-03-20 21:02:22

by Andrea Arcangeli

Subject: 2.6.5-rc1-aa3

Fixed the sigbus in nopage and improved the page_t layout per Hugh's
suggestion. BUG() with discontigmem disabled if somebody returns non-ram
via do_no_page, that cannot work right on numa anyways.

http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.6/2.6.5-rc1-aa3.gz
http://www.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.6/2.6.5-rc1-aa3/

Diff between 2.6.5-rc1-aa2 and 2.6.5-rc1-aa3.

Only in 2.6.5-rc1-aa2: 00000_extraversion-2
Only in 2.6.5-rc1-aa3: extraversion
Only in 2.6.5-rc1-aa2: 00100_objrmap-core-1.gz
Only in 2.6.5-rc1-aa3: objrmap-core.gz

Rediffed.

Only in 2.6.5-rc1-aa2: 00000_twofish-2.6.gz
Only in 2.6.5-rc1-aa3: twofish-2.6.gz
Only in 2.6.5-rc1-aa2: 00200_kgdb-ga-1.gz
Only in 2.6.5-rc1-aa2: 00201_kgdb-ga-recent-gcc-fix-1.gz
Only in 2.6.5-rc1-aa2: 00201_kgdb-THREAD_SIZE-fixes-1.gz
Only in 2.6.5-rc1-aa2: 00201_kgdb-x86_64-support-1.gz
Only in 2.6.5-rc1-aa3: kgdb-ga.gz
Only in 2.6.5-rc1-aa3: kgdb-ga-recent-gcc-fix.gz
Only in 2.6.5-rc1-aa3: kgdb-THREAD_SIZE-fixes.gz
Only in 2.6.5-rc1-aa3: kgdb-x86_64-support.gz

Renamed.

Only in 2.6.5-rc1-aa2: 00101_anon_vma-2.gz
Only in 2.6.5-rc1-aa3: anon_vma.gz

Change mapcount to an unsigned int, and move it near
the atomic_t to save 8 bytes per page on 64bit archs,
from Hugh Dickins.
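To make the 8-byte saving concrete, here is a userspace sketch of the padding effect on an LP64 target. The structures are hypothetical, heavily simplified stand-ins for struct page. The real field order in 2.6.5-rc1-aa2 and the previous type of mapcount are assumptions; the "before" layout models mapcount as an unsigned long kept away from the atomic_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's atomic_t (a struct wrapping an int). */
typedef struct { volatile int counter; } model_atomic_t;

/* Assumed "before" layout: the 4-byte count is followed by 4 bytes of
 * padding on LP64, and mapcount occupies its own 8-byte slot. */
struct model_page_before {
	unsigned long flags;
	model_atomic_t count;      /* 4 bytes + 4 bytes padding on LP64 */
	void *mapping;
	unsigned long mapcount;    /* 8 bytes on LP64 */
	void *lru_next, *lru_prev;
};

/* "After" layout: mapcount becomes an unsigned int placed next to the
 * atomic_t, so the two 4-byte fields share one 8-byte slot. */
struct model_page_after {
	unsigned long flags;
	model_atomic_t count;
	unsigned int mapcount;
	void *mapping;
	void *lru_next, *lru_prev;
};

/* Bytes saved per page by the reordering (8 on LP64, 0 on ILP32). */
static size_t layout_savings(void)
{
	return sizeof(struct model_page_before) - sizeof(struct model_page_after);
}
```

On a 32-bit target both layouts have the same size, which matches the patch description limiting the saving to 64-bit archs.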

Fixed a bug in do_no_page that was crashing if
->nopage returned a sigbus or oom error.
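A userspace sketch of the ordering that matters here: the ->nopage() sentinels are not real pages, so they must be filtered out before anything dereferences the result. The MODEL_* names are hypothetical. The sentinel encodings (NULL for SIGBUS, -1 cast to a pointer for OOM) are assumed to follow the 2.6-era headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_page { int reserved; };   /* hypothetical, minimal */

/* Assumed sentinel encodings, modelled on 2.6 NOPAGE_SIGBUS/NOPAGE_OOM. */
#define MODEL_NOPAGE_SIGBUS ((struct model_page *)NULL)
#define MODEL_NOPAGE_OOM    ((struct model_page *)(intptr_t)-1)

enum model_fault {
	MODEL_FAULT_OOM = -1,
	MODEL_FAULT_SIGBUS = 0,
	MODEL_FAULT_MINOR = 1,
};

/* Classify a ->nopage() result before touching it. Reading a field of a
 * sentinel (e.g. ->reserved) would crash, so the sentinels are checked first. */
static enum model_fault classify_nopage(struct model_page *p)
{
	if (p == MODEL_NOPAGE_SIGBUS)
		return MODEL_FAULT_SIGBUS;
	if (p == MODEL_NOPAGE_OOM)
		return MODEL_FAULT_OOM;
	(void)p->reserved;   /* only now is it safe to inspect the page */
	return MODEL_FAULT_MINOR;
}
```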

Only in 2.6.5-rc1-aa3: linus.patch.gz

Linus's patch from 2.6.5-rc1-mm2.

Only in 2.6.5-rc1-aa3: laptop-mode-2.patch.gz

laptop mode from 2.6.5-rc1-mm2.

Only in 2.6.5-rc1-aa3: clear_page_dirty_for_io.patch.gz
Only in 2.6.5-rc1-aa3: compound-pages-stop-using-lru.patch.gz
Only in 2.6.5-rc1-aa3: hugetlb-stop-using-page-list.patch.gz
Only in 2.6.5-rc1-aa3: irq-safe-pagecache-lock.patch.gz
Only in 2.6.5-rc1-aa3: page_alloc-stop-using-page-list.patch.gz
Only in 2.6.5-rc1-aa3: pageattr-stop-using-page-list.patch.gz
Only in 2.6.5-rc1-aa3: radix-tree-tagging.patch.gz
Only in 2.6.5-rc1-aa3: readahead-stop-using-page-list.patch.gz
Only in 2.6.5-rc1-aa3: remove-page-list.patch.gz
Only in 2.6.5-rc1-aa3: slab-stop-using-page-list.patch.gz
Only in 2.6.5-rc1-aa3: stop-using-clean-pages.patch.gz
Only in 2.6.5-rc1-aa3: stop-using-dirty-pages.patch.gz
Only in 2.6.5-rc1-aa3: stop-using-io-pages.patch.gz
Only in 2.6.5-rc1-aa3: stop-using-locked-pages-fix-2.patch.gz
Only in 2.6.5-rc1-aa3: stop-using-locked-pages-fix.patch.gz
Only in 2.6.5-rc1-aa3: stop-using-locked-pages.patch.gz
Only in 2.6.5-rc1-aa3: tag-dirty-pages.patch.gz
Only in 2.6.5-rc1-aa3: tag-writeback-pages-fix.patch.gz
Only in 2.6.5-rc1-aa3: tag-writeback-pages-missing-filesystems.patch.gz
Only in 2.6.5-rc1-aa3: tag-writeback-pages.patch.gz
Only in 2.6.5-rc1-aa3: unslabify-pgds-and-pmds.patch.gz

Writeback changes from 2.6.5-rc1-mm2 to reduce the difference
with other trees, and to avoid having to maintain significantly
different versions of anon_vma.


2004-03-20 22:06:03

by Andrew Morton

Subject: Re: 2.6.5-rc1-aa3

Andrea Arcangeli <[email protected]> wrote:
>
> Writeback changes from 2.6.5-rc1-mm2 to reduce the difference
> with other trees, and to avoid having to maintain significantly
> different versions of anon_vma.

yup. I'd hope to get these merged up post-2.6.5. There's possibly one
little kupdate problem which I need to look into today.

Daniel is still showing a once-per-three-hours data exposure leak with
O_DIRECT-versus-buffered on 8-way on ext3 (not on ext2) which I need to
think about a bit.

2004-03-21 06:17:12

by Martin J. Bligh

Subject: Re: 2.6.5-rc1-aa3

> Fixed the sigbus in nopage and improved the page_t layout per Hugh's
> suggestion. BUG() with discontigmem disabled if somebody returns non-ram
> via do_no_page, that cannot work right on numa anyways.

OK, well it doesn't oops any more. But sshd still dies as soon as it starts,
so accessing the box is tricky ;-) And now I have no obvious diagnostics
either ...

M.

2004-03-21 10:07:10

by Marc-Christian Petersen

Subject: Re: 2.6.5-rc1-aa3

On Saturday 20 March 2004 22:03, Andrea Arcangeli wrote:

Hey Andrea,

> Fixed the sigbus in nopage and improved the page_t layout per Hugh's
> suggestion. BUG() with discontigmem disabled if somebody returns non-ram
> via do_no_page, that cannot work right on numa anyways.

I thought trying out your new -aa3 on my desktop is a good idea ;) ... As soon as
I start VMware 4 (any-any update 53) I get the attached oops. .config also
attached.

ciao, Marc


Attachments:
(No filename) (455.00 B)
2.6.5-rc1-aa3-oops.log (1.80 kB)
.config (35.65 kB)
dmesg (13.22 kB)

2004-03-21 11:48:55

by Andrea Arcangeli

Subject: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Sun, Mar 21, 2004 at 11:05:05AM +0100, Marc-Christian Petersen wrote:
> On Saturday 20 March 2004 22:03, Andrea Arcangeli wrote:
>
> Hey Andrea,
>
> > Fixed the sigbus in nopage and improved the page_t layout per Hugh's
> > suggestion. BUG() with discontigmem disabled if somebody returns non-ram
> > via do_no_page, that cannot work right on numa anyways.
>
> I thought trying out your new -aa3 on my desktop is a good idea ;) ... As soon as
> I start VMware 4 (any-any update 53) I get the attached oops. .config also
> attached.

This is easy to fix in the vmmon module: you can simply add ->vm_flags |=
VM_RESERVED somewhere in the vmware kernel modules (you will not find any
VM_RESERVED in that kernel module today).

I'm forcing modules that use ->nopage to map non-VM-pageable memory
to set VM_RESERVED, as they must already do in 2.4 to be safe. That way
we still enforce an API abstraction that, in theory, would allow us to go
back to doing the pagetable walk if we wanted to (one obvious thing the
pagetable walk avoids is the lru_cache_add, with its spinlock, for
anonymous memory pagefaults). I don't think we'll ever go back, but
adding |= VM_RESERVED is little effort and, last but not least, it adds
some robustness to the kernel as well.

I'd like to have feedback on this point, and on whether people think I'm
making a mistake by still forcing drivers to use VM_RESERVED in 2.6. If we
give up the ability to ever do a pagetable walk again, and we don't care
about the additional robustness, you could remove the BUG_ON in memory.c at
line 1432 (in 2.6.5-rc1-aa3 of course) and the VM would still work perfectly,
but you would no longer catch drivers that use ->nopage to fill up pagetables
with non-pageable memory without setting VM_RESERVED.
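The invariant enforced by BUG_ON(reserved == pageable) fits in a small truth table. This is a hypothetical userspace model (MODEL_VM_RESERVED and struct model_page are illustrative, not kernel definitions) that mirrors the pageable/reserved computation in do_no_page: a page counts as pageable only if it is not PageReserved and has a ->mapping, and the vma must be VM_RESERVED exactly when the page is not pageable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; the real VM_RESERVED value is not assumed. */
#define MODEL_VM_RESERVED 0x1u

struct model_page {
	bool page_reserved;   /* stands in for PageReserved() */
	const void *mapping;  /* non-NULL for pagecache pages */
};

/* Returns true when the BUG_ON(reserved == pageable) condition would fire. */
static bool reserved_mismatch(unsigned int vm_flags, const struct model_page *p)
{
	bool pageable = !p->page_reserved && p->mapping != NULL;
	bool reserved = (vm_flags & MODEL_VM_RESERVED) != 0;

	return reserved == pageable;
}
```

A driver page with no ->mapping in a vma lacking VM_RESERVED (the vmmon case) is a mismatch; setting the flag makes it consistent.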

So it's up to you: if you prefer, you can remove the BUG_ON, but I'd rather
you added the one-liner fix to vmmon.

If you fixup vmware as I suggested please send me a patch too, thanks!

comments welcome.

[This is my current do_no_page for you to review. I believe my robustness
BUG_ONs are correct and this is not a false positive. If we want to kill
VM_RESERVED I can remove the BUG_ON(reserved == pageable), which is the one
triggering with vmware right now.]

static int
do_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
	unsigned long address, int write_access, pte_t *page_table, pmd_t *pmd)
{
	struct page * new_page;
	struct address_space *mapping = NULL;
	pte_t entry;
	int sequence = 0, reserved, anon, pageable, as;
	int ret = VM_FAULT_MINOR;

	if (!vma->vm_ops || !vma->vm_ops->nopage)
		return do_anonymous_page(mm, vma, page_table,
					pmd, write_access, address);
	pte_unmap(page_table);
	spin_unlock(&mm->page_table_lock);

	if (vma->vm_file) {
		mapping = vma->vm_file->f_mapping;
		sequence = atomic_read(&mapping->truncate_count);
	}
	smp_rmb();  /* Prevent CPU from reordering lock-free ->nopage() */
retry:
	new_page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret);

	/* no page was available -- either SIGBUS or OOM */
	if (new_page == NOPAGE_SIGBUS)
		return VM_FAULT_SIGBUS;
	if (new_page == NOPAGE_OOM)
		return VM_FAULT_OOM;

#ifndef CONFIG_DISCONTIGMEM
	/* this check is unreliable with numa enabled */
	BUG_ON(!pfn_valid(page_to_pfn(new_page)));
#endif
	pageable = !PageReserved(new_page);
	as = !!new_page->mapping;

	BUG_ON(!pageable && as);

	pageable &= as;

	/* ->nopage cannot return swapcache */
	BUG_ON(PageSwapCache(new_page));
	/* ->nopage cannot return anonymous pages */
	BUG_ON(PageAnon(new_page));

	/*
	 * This is the entry point for memory under VM_RESERVED vmas.
	 * That memory will not be tracked by the vm. These aren't
	 * real anonymous pages, they're "device" reserved pages instead.
	 */
	reserved = !!(vma->vm_flags & VM_RESERVED);
	BUG_ON(reserved == pageable);

	/*
	 * Should we do an early C-O-W break?
	 */
	anon = 0;
	if (write_access && !(vma->vm_flags & VM_SHARED)) {
		struct page * page;
		if (unlikely(anon_vma_prepare(vma)))
			goto oom;
		page = alloc_page(GFP_HIGHUSER);
		if (!page)
			goto oom;
		copy_user_highpage(page, new_page, address);
		page_cache_release(new_page);
		lru_cache_add_active(page);
		new_page = page;
		anon = 1;
	}

	spin_lock(&mm->page_table_lock);
	/*
	 * For a file-backed vma, someone could have truncated or otherwise
	 * invalidated this page. If invalidate_mmap_range got called,
	 * retry getting the page.
	 */
	if (mapping &&
	      (unlikely(sequence != atomic_read(&mapping->truncate_count)))) {
		sequence = atomic_read(&mapping->truncate_count);
		spin_unlock(&mm->page_table_lock);
		page_cache_release(new_page);
		goto retry;
	}
	page_table = pte_offset_map(pmd, address);

	/*
	 * This silly early PAGE_DIRTY setting removes a race
	 * due to the bad i386 page protection. But it's valid
	 * for other architectures too.
	 *
	 * Note that if write_access is true, we either now have
	 * an exclusive copy of the page, or this is a shared mapping,
	 * so we can make it writable and dirty to avoid having to
	 * handle that later.
	 */
	/* Only go through if we didn't race with anybody else... */
	if (pte_none(*page_table)) {
		if (!PageReserved(new_page))
			++mm->rss;
		flush_icache_page(vma, new_page);
		entry = mk_pte(new_page, vma->vm_page_prot);
		if (write_access)
			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
		set_pte(page_table, entry);
		if (likely(pageable))
			page_add_rmap(new_page, vma, address, anon);
		pte_unmap(page_table);
	} else {
		/* One of our sibling threads was faster, back out. */
		pte_unmap(page_table);
		page_cache_release(new_page);
		spin_unlock(&mm->page_table_lock);
		goto out;
	}

	/* no need to invalidate: a not-present page shouldn't be cached */
	update_mmu_cache(vma, address, entry);
	spin_unlock(&mm->page_table_lock);
out:
	return ret;

oom:
	page_cache_release(new_page);
	ret = VM_FAULT_OOM;
	goto out;
}
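The truncate_count handling in do_no_page is an optimistic-retry pattern: sample a counter, do the slow ->nopage() work without the lock, then recheck the counter under the lock and retry if a truncate slipped in. Below is a single-threaded userspace sketch with C11 atomics. Every name is hypothetical, an atomic_flag spinlock stands in for page_table_lock, and the race is simulated by a callback rather than a second CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_mapping {
	atomic_uint truncate_count;   /* bumped by every truncate/invalidate */
	atomic_flag lock;             /* stands in for mm->page_table_lock */
	int installed;                /* stands in for the pte being filled */
};

static void model_mapping_init(struct model_mapping *m)
{
	atomic_init(&m->truncate_count, 0);
	atomic_flag_clear(&m->lock);
	m->installed = 0;
}

static void model_lock(struct model_mapping *m)
{
	while (atomic_flag_test_and_set(&m->lock))
		;   /* spin */
}

static void model_unlock(struct model_mapping *m)
{
	atomic_flag_clear(&m->lock);
}

/* Simulates a truncate racing with the lock-free lookup. */
static void racing_truncate(struct model_mapping *m)
{
	atomic_fetch_add(&m->truncate_count, 1);
}

/* Stand-in for ->nopage(): runs without the lock. */
static int model_nopage(struct model_mapping *m, void (*race)(struct model_mapping *))
{
	if (race)
		race(m);
	return 42;
}

/* Sample, look up unlocked, recheck under the lock, retry if the count moved. */
static int model_fault(struct model_mapping *m,
		       void (*race)(struct model_mapping *), int *retries)
{
	unsigned int seq = atomic_load(&m->truncate_count);
	int value;

	*retries = 0;
	for (;;) {
		value = model_nopage(m, race);
		model_lock(m);
		if (seq == atomic_load(&m->truncate_count))
			break;                    /* no race: install under the lock */
		seq = atomic_load(&m->truncate_count);
		model_unlock(m);
		(*retries)++;
		race = NULL;                      /* the simulated race happens once */
	}
	m->installed = value;
	model_unlock(m);
	return value;
}
```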

2004-03-21 11:50:31

by Andrea Arcangeli

Subject: Re: 2.6.5-rc1-aa3

On Sat, Mar 20, 2004 at 10:17:16PM -0800, Martin J. Bligh wrote:
> > Fixed the sigbus in nopage and improved the page_t layout per Hugh's
> > suggestion. BUG() with discontigmem disabled if somebody returns non-ram
> > via do_no_page, that cannot work right on numa anyways.
>
> OK, well it doesn't oops any more. But sshd still dies as soon as it starts,
> so accessing the box is tricky ;-) And now I have no obvious diagnostics
> either ...

no surprise, you correctly get a sigbus now that kills sshd. Can you try
with mainline 2.6.5-rc1 to see if it works there?

2004-03-21 11:59:15

by Andrea Arcangeli

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Sun, Mar 21, 2004 at 12:49:39PM +0100, Andrea Arcangeli wrote:
> the additional hardness you could remove the BUG_ON on in memory.c at line

I just discovered that WARN_ON exists too, so probably the best thing is
simply to change that to a WARN_ON (or to a printk). If anyone ever does a
pagetable walk again, they will have to change it back to a BUG_ON by then.
The kernel will keep working stably regardless of whether that condition triggers.

--- x/mm/memory.c.~1~	2004-03-20 22:12:43.000000000 +0100
+++ x/mm/memory.c	2004-03-21 12:59:05.331923016 +0100
@@ -1429,7 +1429,7 @@ retry:
 	 * real anonymous pages, they're "device" reserved pages instead.
 	 */
 	reserved = !!(vma->vm_flags & VM_RESERVED);
-	BUG_ON(reserved == pageable);
+	WARN_ON(reserved == pageable);
 
 	/*
 	 * Should we do an early C-O-W break?
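A userspace sketch of the difference between the two checks: a BUG_ON-style check stops execution (the kernel would oops), while a WARN_ON-style check logs a "Badness in ..." line and lets the fault path continue. Both macros are hypothetical analogues, not the kernel definitions, and the message texts are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Fatal check: report and abort, roughly what an oops means for the caller. */
#define MODEL_BUG_ON(cond) do { \
	if (cond) { \
		fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
		abort(); \
	} \
} while (0)

static int warnings;   /* how many times a warning fired */

/* Non-fatal check: report and keep going. */
#define MODEL_WARN_ON(cond) do { \
	if (cond) { \
		fprintf(stderr, "Badness in %s at %s:%d\n", \
			__func__, __FILE__, __LINE__); \
		warnings++; \
	} \
} while (0)

/* With the warning, the fault is still handled. */
static int lenient_check(int reserved, int pageable)
{
	MODEL_WARN_ON(reserved == pageable);
	return 1;
}

/* With the fatal check, a mismatch never returns. */
static int strict_check(int reserved, int pageable)
{
	MODEL_BUG_ON(reserved == pageable);
	return 1;
}
```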

2004-03-21 12:14:37

by Andrea Arcangeli

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Sun, Mar 21, 2004 at 01:00:05PM +0100, Andrea Arcangeli wrote:
> On Sun, Mar 21, 2004 at 12:49:39PM +0100, Andrea Arcangeli wrote:
> > the additional hardness you could remove the BUG_ON on in memory.c at line
>
> I now discovered that WARN_ON exists too, so probably the best is to
> simply change that to a WARN_ON (or to a printk). If one will ever do a
> pagetable walk again, one has to change that to a BUG_ON by that time.
> Kernel will work stable regardless of that condition triggering.
>
> --- x/mm/memory.c.~1~	2004-03-20 22:12:43.000000000 +0100
> +++ x/mm/memory.c	2004-03-21 12:59:05.331923016 +0100
> @@ -1429,7 +1429,7 @@ retry:
>  	 * real anonymous pages, they're "device" reserved pages instead.
>  	 */
>  	reserved = !!(vma->vm_flags & VM_RESERVED);
> -	BUG_ON(reserved == pageable);
> +	WARN_ON(reserved == pageable);
>
>  	/*
>  	 * Should we do an early C-O-W break?

and here the vmware proper fix:

--- vmmon-only/linux/driver.c.~1~	2004-03-21 13:07:02.869326296 +0100
+++ vmmon-only/linux/driver.c	2004-03-21 13:07:28.320457136 +0100
@@ -1083,6 +1083,7 @@ static int LinuxDriverMmap(struct file *
    }
    /* Clear VM_IO, otherwise SuSE's kernels refuse to do get_user_pages */
    vma->vm_flags &= ~VM_IO;
+   vma->vm_flags |= VM_RESERVED;
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 2, 3)
    vma->vm_file = filp;
    filp->f_count++;


You should apply both (though just applying one of the two will fix it).

2004-03-21 13:25:41

by Andrea Arcangeli

Subject: Re: 2.6.5-rc1-aa3

On Sat, Mar 20, 2004 at 10:17:16PM -0800, Martin J. Bligh wrote:
> > Fixed the sigbus in nopage and improved the page_t layout per Hugh's
> > suggestion. BUG() with discontigmem disabled if somebody returns non-ram
> > via do_no_page, that cannot work right on numa anyways.
>
> OK, well it doesn't oops any more. But sshd still dies as soon as it starts,
> so accessing the box is tricky ;-) And now I have no obvious diagnostics
> either ...

Jens sent me the perfect strace log; with his help it was not
difficult to spot the bug. This incremental patch should fix it.
MAP_SHARED|MAP_ANONYMOUS isn't very common and my userspace never
triggered it. I had placed the anon pgoff setting in the shared-memory
path too, and that generated the SIGBUS. Leaving the setting only in the
MAP_PRIVATE path should fix it, since anonymous memory is only MAP_PRIVATE.
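A minimal userspace reproducer, assuming a Linux system where fork() is available: it maps two pages MAP_SHARED|MAP_ANONYMOUS, has a child write into the second page, and reads the value back in the parent. On the broken kernel the child's first touch would presumably have raised SIGBUS, as it did for sshd, and the function would return -1; on a correct kernel the write is shared and the parent sees it.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the parent reads back, or -1 on failure
 * (including the child being killed, e.g. by SIGBUS). */
static int shared_anon_roundtrip(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	size_t len = 2 * (size_t)pagesz;
	size_t idx = (size_t)pagesz / sizeof(int);   /* first int of page 2 */
	int *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	pid_t pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		buf[idx] = 42;   /* first touch happens in the child */
		_exit(0);
	}

	int status = 0;
	int value = -1;
	if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		value = buf[idx];
	munmap(buf, len);
	return value;
}
```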

The patch is untested at the moment; as soon as I get confirmation I'll
upload an update.

thanks!

--- x/mm/mmap.c.~1~	2004-03-20 22:12:43.000000000 +0100
+++ x/mm/mmap.c	2004-03-21 14:15:17.269882800 +0100
@@ -622,11 +622,11 @@ unsigned long do_mmap_pgoff(struct file 
 			return -EINVAL;
 		case MAP_PRIVATE:
 			vm_flags &= ~(VM_SHARED | VM_MAYSHARE);
-			/* fall through */
+			pgoff = addr >> PAGE_SHIFT;
+			break;
 		case MAP_SHARED:
 			break;
 		}
-		pgoff = addr >> PAGE_SHIFT;
 	}
 
 	error = security_file_mmap(file, prot, flags);
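Why the misplaced pgoff produced SIGBUS can be sketched as follows, under the assumption that a shared anonymous mapping is backed by a shmem object sized to the mapping, whose fault handler rejects file indexes past the end of that object. All names and the 4 KiB page size are hypothetical; the bounds check is a simplification of the shmem fault path.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12   /* assumed 4 KiB pages */

enum model_map_type { MODEL_MAP_SHARED, MODEL_MAP_PRIVATE };

/* Model of the anonymous branch of do_mmap_pgoff(). 'fixed' selects the
 * patched behaviour (pgoff set only for MAP_PRIVATE); otherwise the old
 * fall-through also applied it to MAP_SHARED. */
static unsigned long anon_pgoff(enum model_map_type type, unsigned long addr,
				unsigned long pgoff, bool fixed)
{
	switch (type) {
	case MODEL_MAP_PRIVATE:
		return addr >> MODEL_PAGE_SHIFT;
	case MODEL_MAP_SHARED:
		return fixed ? pgoff : addr >> MODEL_PAGE_SHIFT;
	}
	return pgoff;
}

/* Simplified shmem-style check: the file index (vm_pgoff plus the page
 * offset inside the vma) must be below the object's size in pages. */
static bool shmem_index_valid(unsigned long vm_pgoff, unsigned long vm_start,
			      unsigned long address, unsigned long npages)
{
	unsigned long idx = vm_pgoff + ((address - vm_start) >> MODEL_PAGE_SHIFT);

	return idx < npages;
}
```

With the old fall-through, a shared mapping at a typical user address gets a pgoff far beyond the size of its backing object, so even the first page fails the check.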

2004-03-21 16:23:18

by Martin J. Bligh

Subject: Re: 2.6.5-rc1-aa3

--Andrea Arcangeli <[email protected]> wrote (on Sunday, March 21, 2004 14:26:30 +0100):

> On Sat, Mar 20, 2004 at 10:17:16PM -0800, Martin J. Bligh wrote:
>> > Fixed the sigbus in nopage and improved the page_t layout per Hugh's
>> > suggestion. BUG() with discontigmem disabled if somebody returns non-ram
>> > via do_no_page, that cannot work right on numa anyways.
>>
>> OK, well it doesn't oops any more. But sshd still dies as soon as it starts,
>> so accessing the box is tricky ;-) And now I have no obvious diagnostics
>> either ...
>
> Jens sent me the perfect strace log, after his help it has not been
> difficult to spot the bug. this incremental should fix it
> MAP_SHARED|MAP_ANONYMOUS isn't very common and my userspace never
> triggered it. I placed the pgoff anon setting in the path of the shared
> memory too, that generated the sigbus. Leaving the setting only in the
> MAP_PRIVATE should fix it, the anonymous memory is only MAP_PRIVATE.
>
> patch is untested at the moment, as soon as I get confirmation I'll
> upload an update.

Yup, that fixes mine up too - runs fine now.

M.

2004-03-21 19:44:50

by Marc-Christian Petersen

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Sunday 21 March 2004 13:15, Andrea Arcangeli wrote:

Hi Andrea,

first: many thanks for all your effort for objrmap and anon_vma.
I really appreciate it!

> and here the vmware proper fix:
> --- vmmon-only/linux/driver.c.~1~	2004-03-21 13:07:02.869326296 +0100
> +++ vmmon-only/linux/driver.c	2004-03-21 13:07:28.320457136 +0100
> @@ -1083,6 +1083,7 @@ static int LinuxDriverMmap(struct file *
>     }
>     /* Clear VM_IO, otherwise SuSE's kernels refuse to do get_user_pages */
>     vma->vm_flags &= ~VM_IO;
> +   vma->vm_flags |= VM_RESERVED;
>  #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 2, 3)
>     vma->vm_file = filp;
>     filp->f_count++;
> You should apply both (though just applying one of the two will fix it).

ok, without the VMware fix, see attached oops report.

With the VMware fix, it works fine.

Both, for sure, with 2.6.5-rc2-aa1.

What I have noticed is this from VMware _without_ the VMware fix:

Mar 21 20:23:56 codeman kernel: /dev/vmnet: hub 8 does not exist, allocating memory.
Mar 21 20:23:56 codeman kernel: /dev/vmnet: port on hub 8 successfully opened
Mar 21 20:23:56 codeman VMware[init]: Unable to sendto: Operation not permitted <------
Mar 21 20:23:56 codeman VMware[init]:
Mar 21 20:23:56 codeman kernel: /dev/vmnet: open called by PID 10497 (vmnet-netifup)
Mar 21 20:23:56 codeman kernel: /dev/vmnet: port on hub 8 successfully opened


With the VMware fix applied, the "Unable to sendto..." line disappears.

ciao, Marc


Attachments:
(No filename) (1.43 kB)
.config (35.69 kB)
dmesg (14.00 kB)
2.6.5-rc2-aa1-oops-wo-vmware-fix.log (4.05 kB)

2004-03-21 23:24:33

by Andrew Morton

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

Andrea Arcangeli <[email protected]> wrote:
>
> believe my robustness
> BUG_ON are correct and this is not a false positive, if we want to kill
> VM_RESERVED I can remove the BUG_ON(reserved == pageable) which is the one
> triggering with vmware right now]

I'd prefer to retain VM_RESERVED and work toward removing PageReserved().
The latter has a real and measurable cost in put_page().
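A simplified model of the cost Andrew mentions: every put_page() has to test the reserved bit before it may drop the reference, because reserved pages are not refcounted or freed. The model is hypothetical and only shows the extra branch on this hot path; it does not measure anything.

```c
#include <assert.h>
#include <stdbool.h>

struct model_page {
	bool reserved;   /* stands in for PageReserved() */
	int count;       /* stands in for the page reference count */
	bool freed;      /* set when the last reference goes away */
};

/* Every caller pays for the reserved test, even for ordinary pages. */
static void model_put_page(struct model_page *p)
{
	if (p->reserved)
		return;              /* reserved pages ignore the refcount */
	if (--p->count == 0)
		p->freed = true;     /* stands in for releasing the page */
}
```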

2004-03-22 00:09:34

by Andrea Arcangeli

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Sun, Mar 21, 2004 at 08:42:18PM +0100, Marc-Christian Petersen wrote:
> On Sunday 21 March 2004 13:15, Andrea Arcangeli wrote:
>
> Hi Andrea,
>
> first: many thanks for all your effort for objrmap and anon_vma.
> I really appreciate it!

You're welcome, you should also thank Dave and Hugh even before you
thank me ;), since they solved many of the problems to make this
possible years ago even before I started working on the objrmap myself.

> > and here the vmware proper fix:
> > --- vmmon-only/linux/driver.c.~1~	2004-03-21 13:07:02.869326296 +0100
> > +++ vmmon-only/linux/driver.c	2004-03-21 13:07:28.320457136 +0100
> > @@ -1083,6 +1083,7 @@ static int LinuxDriverMmap(struct file *
> >     }
> >     /* Clear VM_IO, otherwise SuSE's kernels refuse to do get_user_pages */
> >     vma->vm_flags &= ~VM_IO;
> > +   vma->vm_flags |= VM_RESERVED;
> >  #if LINUX_VERSION_CODE < KERNEL_VERSION(2, 2, 3)
> >     vma->vm_file = filp;
> >     filp->f_count++;
> > You should apply both (though just applying one of the two will fix it).
>
> ok, without the VMware fix, see attached oops report.

It's not an oops report, it's only a warning and it should not affect
functionality in any way (vmware should still work). The vmware fix will
silence the warning so you won't be annoyed by it anymore ;)

> With the VMware fix, it works fine.

Good.

> Both, for sure, with 2.6.5-rc2-aa1.
>
> What I have noticed is this from VMware _without_ the VMware fix:
>
> Mar 21 20:23:56 codeman kernel: /dev/vmnet: hub 8 does not exist, allocating memory.
> Mar 21 20:23:56 codeman kernel: /dev/vmnet: port on hub 8 successfully opened
> Mar 21 20:23:56 codeman VMware[init]: Unable to sendto: Operation not permitted <------
> Mar 21 20:23:56 codeman VMware[init]:
> Mar 21 20:23:56 codeman kernel: /dev/vmnet: open called by PID 10497 (vmnet-netifup)
> Mar 21 20:23:56 codeman kernel: /dev/vmnet: port on hub 8 successfully opened
>
>
> With the VMware fix applied, the "Unable to sendto..." line disappears.

Maybe it's a delay caused by the printk; I'm not sure why the two would be
related, or whether it's only a coincidence. A WARN_ON that triggers
should only cause a delay, with no other effects.

thanks!

2004-03-22 00:50:35

by Andrea Arcangeli

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Sun, Mar 21, 2004 at 03:24:27PM -0800, Andrew Morton wrote:
> I'd prefer to retain VM_RESERVED and work toward removing PageReserved().
> The latter has a real and measurable cost in put_page().

Agreed. Unfortunately removing PageReserved won't be so trivial, since
it will be a change to the userspace API too: today a PageReserved page
behaves like MAP_SHARED across fork() even when it is mapped
MAP_PRIVATE, so some subtle userspace code could break.
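The fork() behaviour described here can be modelled per pte: a PageReserved page has its pte copied verbatim, write permission included, so parent and child keep writing the same physical page; a normal page in a private writable mapping is write-protected in both processes, so the first write triggers copy-on-write. This is a hypothetical sketch of that rule, not the actual copy_page_range code.

```c
#include <assert.h>
#include <stdbool.h>

struct model_pte {
	int page_id;     /* which physical page the pte points to */
	bool writable;   /* write permission in the pte */
};

/* Copy one pte at fork time. cow_mapping means a private writable vma. */
static void model_fork_copy_pte(struct model_pte *parent, struct model_pte *child,
				bool page_reserved, bool cow_mapping)
{
	if (!page_reserved && cow_mapping)
		parent->writable = false;   /* write-protect both sides for COW */
	*child = *parent;
}

/* A pte that is still writable after fork takes no COW fault, so the
 * child's writes land directly in the page the parent also maps. */
static bool child_write_is_shared(const struct model_pte *child)
{
	return child->writable;
}
```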

2004-03-22 12:16:07

by Marc-Christian Petersen

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Monday 22 March 2004 01:10, Andrea Arcangeli wrote:

Hi Andrea,

> You're welcome, you should also thank Dave and Hugh even before you
> thank me ;), since they solved many of the problems to make this
> possible years ago even before I started working on the objrmap myself.

Okay :)

> it's not an oops report, it's a warning only and it should not affect
> functionality in any way (vmware should still work). The vmware fix will
> shutdown the warning so you won't be annoyed anymore by it ;)

well, ok. The first two things are warnings, the last is a kernel bug.

And VMware won't work at all. Booting a VMware Image triggers the 2 warnings
and the kernel BUG and the screen stays black in VMware.

ciao, Marc

2004-03-22 12:42:09

by Andrea Arcangeli

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Mon, Mar 22, 2004 at 01:10:38PM +0100, Marc-Christian Petersen wrote:
> And VMware won't work at all. Booting a VMware Image triggers the 2 warnings
> and the kernel BUG and the screen stays black in VMware.

I see, the below patch will avoid your oops (I also removed the stack
trace dump from memory.c since it's useless to get the stack trace from
there and this will reduce the noise).

--- x/mm/memory.c.~1~	2004-03-21 15:21:42.000000000 +0100
+++ x/mm/memory.c	2004-03-22 13:40:26.852849384 +0100
@@ -324,9 +324,11 @@ skip_copy_pte_range:
 					 * Device driver pages must not be
 					 * tracked by the VM for unmapping.
 					 */
-					BUG_ON(!page_mapped(page));
-					BUG_ON(!page->mapping);
-					page_add_rmap(page, vma, address, PageAnon(page));
+					if (likely(page_mapped(page) && page->mapping))
+						page_add_rmap(page, vma, address, PageAnon(page));
+					else
+						printk("Badness in %s at %s:%d\n",
+						       __FUNCTION__, __FILE__, __LINE__);
 				} else {
 					BUG_ON(page_mapped(page));
 					BUG_ON(page->mapping);
@@ -1429,7 +1431,9 @@ retry:
 	 * real anonymous pages, they're "device" reserved pages instead.
 	 */
 	reserved = !!(vma->vm_flags & VM_RESERVED);
-	WARN_ON(reserved == pageable);
+	if (unlikely(reserved == pageable))
+		printk("Badness in %s at %s:%d\n",
+		       __FUNCTION__, __FILE__, __LINE__);
 
 	/*
 	 * Should we do an early C-O-W break?

many thanks for the help!

2004-03-23 09:55:13

by Marc-Christian Petersen

Subject: Re: do we want to kill VM_RESERVED or not? [was Re: 2.6.5-rc1-aa3]

On Monday 22 March 2004 13:42, Andrea Arcangeli wrote:

Hi Andrea,

> I see, the below patch will avoid your oops (I also removed the stack
> trace dump from memory.c since it's useless to get the stack trace from
> there and this will reduce the noise).
> --- x/mm/memory.c.~1~	2004-03-21 15:21:42.000000000 +0100
> +++ x/mm/memory.c	2004-03-22 13:40:26.852849384 +0100
> @@ -324,9 +324,11 @@ skip_copy_pte_range:
>  					 * Device driver pages must not be
>  					 * tracked by the VM for unmapping.
>  					 */
> -					BUG_ON(!page_mapped(page));
> -					BUG_ON(!page->mapping);
> -					page_add_rmap(page, vma, address, PageAnon(page));
> +					if (likely(page_mapped(page) && page->mapping))
> +						page_add_rmap(page, vma, address, PageAnon(page));
> +					else
> +						printk("Badness in %s at %s:%d\n",
> +						       __FUNCTION__, __FILE__, __LINE__);
>  				} else {
>  					BUG_ON(page_mapped(page));
>  					BUG_ON(page->mapping);
> @@ -1429,7 +1431,9 @@ retry:
>  	 * real anonymous pages, they're "device" reserved pages instead.
>  	 */
>  	reserved = !!(vma->vm_flags & VM_RESERVED);
> -	WARN_ON(reserved == pageable);
> +	if (unlikely(reserved == pageable))
> +		printk("Badness in %s at %s:%d\n",
> +		       __FUNCTION__, __FILE__, __LINE__);
>
>  	/*
>  	 * Should we do an early C-O-W break?

Perfect. Thanks a lot.


> many thanks for the help!

You're welcome. Thanks for your help!

ciao, Marc