2006-02-23 03:21:18

by Yanmin Zhang

[permalink] [raw]
Subject: [PATCH] Enable mprotect on huge pages

From: Zhang, Yanmin <[email protected]>

2.6.16-rc3 uses hugetlb on-demand paging, but it doesn?t support hugetlb
mprotect. My patch against 2.6.16-rc3 enables this capability.

Signed-off-by: Zhang Yanmin <[email protected]>

Discussion is welcomed.

---

diff -Nraup linux-2.6.16-rc3/include/linux/hugetlb.h linux-2.6.16-rc3_mprotect/include/linux/hugetlb.h
--- linux-2.6.16-rc3/include/linux/hugetlb.h 2006-02-14 16:43:50.000000000 +0800
+++ linux-2.6.16-rc3_mprotect/include/linux/hugetlb.h 2006-02-22 00:12:35.000000000 +0800
@@ -41,6 +41,8 @@ struct page *follow_huge_pmd(struct mm_s
pmd_t *pmd, int write);
int is_aligned_hugepage_range(unsigned long addr, unsigned long len);
int pmd_huge(pmd_t pmd);
+void hugetlb_change_protection(struct vm_area_struct *vma,
+ unsigned long address, unsigned long end, pgprot_t newprot);

#ifndef ARCH_HAS_HUGEPAGE_ONLY_RANGE
#define is_hugepage_only_range(mm, addr, len) 0
@@ -101,6 +103,8 @@ static inline unsigned long hugetlb_tota
#define free_huge_page(p) ({ (void)(p); BUG(); })
#define hugetlb_fault(mm, vma, addr, write) ({ BUG(); 0; })

+#define hugetlb_change_protection(vma, address, end, newprot)
+
#ifndef HPAGE_MASK
#define HPAGE_MASK PAGE_MASK /* Keep the compiler happy */
#define HPAGE_SIZE PAGE_SIZE
diff -Nraup linux-2.6.16-rc3/mm/hugetlb.c linux-2.6.16-rc3_mprotect/mm/hugetlb.c
--- linux-2.6.16-rc3/mm/hugetlb.c 2006-02-14 16:43:50.000000000 +0800
+++ linux-2.6.16-rc3_mprotect/mm/hugetlb.c 2006-02-22 00:12:35.000000000 +0800
@@ -572,3 +572,32 @@ int follow_hugetlb_page(struct mm_struct

return i;
}
+
+void hugetlb_change_protection(struct vm_area_struct *vma,
+ unsigned long address, unsigned long end, pgprot_t newprot)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ unsigned long start = address;
+ pte_t *ptep;
+ pte_t pte;
+
+ BUG_ON(address >= end);
+ flush_cache_range(vma, address, end);
+
+ spin_lock(&mm->page_table_lock);
+ for (; address < end; address += HPAGE_SIZE) {
+ ptep = huge_pte_offset(mm, address);
+ if (!ptep)
+ continue;
+ if (pte_present(*ptep)) {
+ pte = ptep_get_and_clear(mm, address, ptep);
+ pte = pte_modify(pte, newprot);
+ set_huge_pte_at(mm, addr, ptep, pte);
+ lazy_mmu_prot_update(pte);
+ }
+ }
+ spin_unlock(&mm->page_table_lock);
+
+ flush_tlb_range(vma, start, end);
+}
+
diff -Nraup linux-2.6.16-rc3/mm/mprotect.c linux-2.6.16-rc3_mprotect/mm/mprotect.c
--- linux-2.6.16-rc3/mm/mprotect.c 2006-01-25 21:54:15.000000000 +0800
+++ linux-2.6.16-rc3_mprotect/mm/mprotect.c 2006-02-22 00:13:34.000000000 +0800
@@ -124,7 +124,7 @@ mprotect_fixup(struct vm_area_struct *vm
* a MAP_NORESERVE private mapping to writable will now reserve.
*/
if (newflags & VM_WRITE) {
- if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_SHARED|VM_HUGETLB))) {
+ if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_SHARED))) {
charged = nrpages;
if (security_vm_enough_memory(charged))
return -ENOMEM;
@@ -166,7 +166,10 @@ success:
*/
vma->vm_flags = newflags;
vma->vm_page_prot = newprot;
- change_protection(vma, start, end, newprot);
+ if (is_vm_hugetlb_page(vma))
+ hugetlb_change_protection(vma, start, end, newprot);
+ else
+ change_protection(vma, start, end, newprot);
vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);
vm_stat_account(mm, newflags, vma->vm_file, nrpages);
return 0;
@@ -240,11 +243,6 @@ sys_mprotect(unsigned long start, size_t

/* Here we know that vma->vm_start <= nstart < vma->vm_end. */

- if (is_vm_hugetlb_page(vma)) {
- error = -EACCES;
- goto out;
- }
-
newflags = vm_flags | (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));

/* newflags >> 4 shift VM_MAY% in place of VM_% */
@@ -260,6 +258,12 @@ sys_mprotect(unsigned long start, size_t
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
+ if (is_vm_hugetlb_page(vma) &&
+ is_aligned_hugepage_range(nstart, tmp - nstart)) {
+ error = -EINVAL;
+ goto out;
+ }
+
error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
if (error)
goto out;



2006-02-24 22:27:08

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

"Zhang, Yanmin" <[email protected]> wrote:
>
> From: Zhang, Yanmin <[email protected]>
>
> 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> mprotect. My patch against 2.6.16-rc3 enables this capability.
>

Well I suppose that makes sense. It does assume that the normal pte
protection-changing APIs do the right thing on all architectures which
implement huge pages. That's quite possibly the case, but we should
confirm that.

> +
> +void hugetlb_change_protection(struct vm_area_struct *vma,
> + unsigned long address, unsigned long end, pgprot_t newprot)
> +{
> + struct mm_struct *mm = vma->vm_mm;
> + unsigned long start = address;
> + pte_t *ptep;
> + pte_t pte;
> +
> + BUG_ON(address >= end);
> + flush_cache_range(vma, address, end);
> +
> + spin_lock(&mm->page_table_lock);
> + for (; address < end; address += HPAGE_SIZE) {
> + ptep = huge_pte_offset(mm, address);
> + if (!ptep)
> + continue;
> + if (pte_present(*ptep)) {
> + pte = ptep_get_and_clear(mm, address, ptep);
> + pte = pte_modify(pte, newprot);
> + set_huge_pte_at(mm, addr, ptep, pte);
> + lazy_mmu_prot_update(pte);
> + }
> + }
> + spin_unlock(&mm->page_table_lock);
> +
> + flush_tlb_range(vma, start, end);
> +}

2006-02-25 08:54:12

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Thu, Feb 23, 2006 at 11:19:40AM +0800, Zhang, Yanmin wrote:
> From: Zhang, Yanmin <[email protected]>
>
> 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn?t support hugetlb
> mprotect. My patch against 2.6.16-rc3 enables this capability.

Adding another special case for hugetlb pages sounds rather bad. Could
you try adding a mrotect vm_area_operation if that works out cleaner?

2006-02-25 10:08:21

by Paul Rolland

[permalink] [raw]
Subject: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

Hello,

This patch is based on Linux 2.4.32, and I've verified the same problem
exists on 2.6.15.4.
Working on a machine with a 2.4.32 kernel, I was surprised to see the driver
complaining when setting the speed to 100FD using mii-tool, but accepting
the setting with ethtool.
Digging into the code, I found that there is some confusion with :
- DUPLEX_FULL and FULL_DUPLEX,
- DUPLEX_HALF and HALF_DUPLEX
in the code :
...
spddplx += (mii_reg & 0x100)
? FULL_DUPLEX :
HALF_DUPLEX;
retval = e1000_set_spd_dplx(adapter,
spddplx);
...
and
int
e1000_set_spd_dplx(struct e1000_adapter *adapter, uint16_t spddplx)
{
adapter->hw.autoneg = 0;

switch(spddplx) {
case SPEED_10 + DUPLEX_HALF:
adapter->hw.forced_speed_duplex = e1000_10_half;
break;
....
when the constants don't have the same value.

This patch is simply changing the code in the e1000_set_spd_dplx to use the
same constants as does the caller of the function : FULL_DUPLEX and
HALF_DUPLEX
whose values are not 0, to make sure we have had a successfull init
(DUPLEX_HALF value is 0, and the DUPLEX_xxx are defined in ethtool.h, thus
are probably not meant to be used in the mii interface).

Signed-off-by: Paul Rolland <[email protected]>

diff -urN linux-2.4.32-orig/drivers/net/e1000/e1000_main.c
linux-2.4.32/drivers/net/e1000/e1000_main.c
--- linux-2.4.32-orig/drivers/net/e1000/e1000_main.c Mon Apr 4 01:42:19
2005
+++ linux-2.4.32/drivers/net/e1000/e1000_main.c Sat Feb 25 09:36:23 2006
@@ -2944,23 +2944,23 @@
adapter->hw.autoneg = 0;

switch(spddplx) {
- case SPEED_10 + DUPLEX_HALF:
+ case SPEED_10 + HALF_DUPLEX:
adapter->hw.forced_speed_duplex = e1000_10_half;
break;
- case SPEED_10 + DUPLEX_FULL:
+ case SPEED_10 + FULL_DUPLEX:
adapter->hw.forced_speed_duplex = e1000_10_full;
break;
- case SPEED_100 + DUPLEX_HALF:
+ case SPEED_100 + HALF_DUPLEX:
adapter->hw.forced_speed_duplex = e1000_100_half;
break;
- case SPEED_100 + DUPLEX_FULL:
+ case SPEED_100 + FULL_DUPLEX:
adapter->hw.forced_speed_duplex = e1000_100_full;
break;
- case SPEED_1000 + DUPLEX_FULL:
+ case SPEED_1000 + FULL_DUPLEX:
adapter->hw.autoneg = 1;
adapter->hw.autoneg_advertised = ADVERTISE_1000_FULL;
break;
- case SPEED_1000 + DUPLEX_HALF: /* not supported */
+ case SPEED_1000 + HALF_DUPLEX: /* not supported */
default:
DPRINTK(PROBE, ERR,
"Unsupported Speed/Duplexity configuration\n");


Paul Rolland, rol(at)as2917.net
ex-AS2917 Network administrator and Peering Coordinator

--

Please no HTML, I'm not a browser - Pas d'HTML, je ne suis pas un navigateur
"Some people dream of success... while others wake up and work hard at it"

"I worry about my child and the Internet all the time, even though she's too
young to have logged on yet. Here's what I worry about. I worry that 10 or 15
years from now, she will come to me and say 'Daddy, where were you when they
took freedom of the press away from the Internet?'"
--Mike Godwin, Electronic Frontier Foundation


Attachments:
e1000.patch (1.42 kB)

2006-02-26 10:42:19

by Willy Tarreau

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

Hello Paul,

On Sat, Feb 25, 2006 at 11:08:49AM +0100, Paul Rolland wrote:
> Hello,
>
> This patch is based on Linux 2.4.32, and I've verified the same problem
> exists on 2.6.15.4.

it's mangled, tabs have been turned into whitespaces. I fixed it so please
use the appended one.

> Working on a machine with a 2.4.32 kernel, I was surprised to see the driver
> complaining when setting the speed to 100FD using mii-tool, but accepting
> the setting with ethtool.
> Digging into the code, I found that there is some confusion with :
> - DUPLEX_FULL and FULL_DUPLEX,
> - DUPLEX_HALF and HALF_DUPLEX
> in the code :
> ...
> spddplx += (mii_reg & 0x100)
> ? FULL_DUPLEX :
> HALF_DUPLEX;
> retval = e1000_set_spd_dplx(adapter,
> spddplx);
> ...
> and
> int
> e1000_set_spd_dplx(struct e1000_adapter *adapter, uint16_t spddplx)
> {
> adapter->hw.autoneg = 0;
>
> switch(spddplx) {
> case SPEED_10 + DUPLEX_HALF:
> adapter->hw.forced_speed_duplex = e1000_10_half;
> break;
> ....
> when the constants don't have the same value.
>
> This patch is simply changing the code in the e1000_set_spd_dplx to use the
> same constants as does the caller of the function : FULL_DUPLEX and
> HALF_DUPLEX
> whose values are not 0, to make sure we have had a successfull init
> (DUPLEX_HALF value is 0, and the DUPLEX_xxx are defined in ethtool.h, thus
> are probably not meant to be used in the mii interface).
>
> Signed-off-by: Paul Rolland <[email protected]>
>
> diff -urN linux-2.4.32-orig/drivers/net/e1000/e1000_main.c
> linux-2.4.32/drivers/net/e1000/e1000_main.c
> --- linux-2.4.32-orig/drivers/net/e1000/e1000_main.c Mon Apr 4 01:42:19
> 2005
> +++ linux-2.4.32/drivers/net/e1000/e1000_main.c Sat Feb 25 09:36:23 2006
> @@ -2944,23 +2944,23 @@
> adapter->hw.autoneg = 0;
>
> switch(spddplx) {
> - case SPEED_10 + DUPLEX_HALF:
> + case SPEED_10 + HALF_DUPLEX:
> adapter->hw.forced_speed_duplex = e1000_10_half;
> break;
> - case SPEED_10 + DUPLEX_FULL:
> + case SPEED_10 + FULL_DUPLEX:
> adapter->hw.forced_speed_duplex = e1000_10_full;
> break;
> - case SPEED_100 + DUPLEX_HALF:
> + case SPEED_100 + HALF_DUPLEX:
> adapter->hw.forced_speed_duplex = e1000_100_half;
> break;
> - case SPEED_100 + DUPLEX_FULL:
> + case SPEED_100 + FULL_DUPLEX:
> adapter->hw.forced_speed_duplex = e1000_100_full;
> break;
> - case SPEED_1000 + DUPLEX_FULL:
> + case SPEED_1000 + FULL_DUPLEX:
> adapter->hw.autoneg = 1;
> adapter->hw.autoneg_advertised = ADVERTISE_1000_FULL;
> break;
> - case SPEED_1000 + DUPLEX_HALF: /* not supported */
> + case SPEED_1000 + HALF_DUPLEX: /* not supported */
> default:
> DPRINTK(PROBE, ERR,
> "Unsupported Speed/Duplexity configuration\n");
>
>
> Paul Rolland, rol(at)as2917.net
> ex-AS2917 Network administrator and Peering Coordinator

Regards,
Willy


diff -urN linux-2.4.32-orig/drivers/net/e1000/e1000_main.c linux-2.4.32/drivers/net/e1000/e1000_main.c
--- linux-2.4.32-orig/drivers/net/e1000/e1000_main.c Mon Apr 4 01:42:19 2005
+++ linux-2.4.32/drivers/net/e1000/e1000_main.c Sat Feb 25 09:36:23 2006
@@ -2944,23 +2944,23 @@
adapter->hw.autoneg = 0;

switch(spddplx) {
- case SPEED_10 + DUPLEX_HALF:
+ case SPEED_10 + HALF_DUPLEX:
adapter->hw.forced_speed_duplex = e1000_10_half;
break;
- case SPEED_10 + DUPLEX_FULL:
+ case SPEED_10 + FULL_DUPLEX:
adapter->hw.forced_speed_duplex = e1000_10_full;
break;
- case SPEED_100 + DUPLEX_HALF:
+ case SPEED_100 + HALF_DUPLEX:
adapter->hw.forced_speed_duplex = e1000_100_half;
break;
- case SPEED_100 + DUPLEX_FULL:
+ case SPEED_100 + FULL_DUPLEX:
adapter->hw.forced_speed_duplex = e1000_100_full;
break;
- case SPEED_1000 + DUPLEX_FULL:
+ case SPEED_1000 + FULL_DUPLEX:
adapter->hw.autoneg = 1;
adapter->hw.autoneg_advertised = ADVERTISE_1000_FULL;
break;
- case SPEED_1000 + DUPLEX_HALF: /* not supported */
+ case SPEED_1000 + HALF_DUPLEX: /* not supported */
default:
DPRINTK(PROBE, ERR,
"Unsupported Speed/Duplexity configuration\n");



2006-02-26 11:39:32

by Paul Rolland

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

> it's mangled, tabs have been turned into whitespaces. I fixed
> it so please
> use the appended one.

Sorry about that, thanks for the fix.

Paul

2006-02-26 12:59:37

by Jesper Juhl

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

On 2/25/06, Paul Rolland <[email protected]> wrote:
> Hello,
>
> This patch is based on Linux 2.4.32, and I've verified the same problem
> exists on 2.6.15.4.

are you planning a 2.6 patch as well ?

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-26 14:55:25

by Paul Rolland

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

Hello,

> are you planning a 2.6 patch as well ?
>
I'm preparing it, and I'll be carefull with Tab/space ;)

Paul

2006-02-26 15:00:05

by Jesper Juhl

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

On 2/26/06, Paul Rolland <[email protected]> wrote:
> Hello,
>
> > are you planning a 2.6 patch as well ?
> >
> I'm preparing it, and I'll be carefull with Tab/space ;)
>
Ok, great, I was just wondering since I would have made one if you had
no plans to do so.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-26 15:12:27

by Paul Rolland

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

Hello,

> Ok, great, I was just wondering since I would have made one if you had
> no plans to do so.

Well, I was just waiting to make sure it was interesting for someone ;)

Here is it, verified with tab and not spaces... but attached as my mailer
is likely to cripple anything I try to inline...

Signed-off-by: Paul Rolland <[email protected]>

Cheers,
Paul


Attachments:
e1000.patch-2.6.15.4 (1.43 kB)

2006-02-26 23:09:41

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Fri, Feb 24, 2006 at 02:28:44PM -0800, Andrew Morton wrote:
> "Zhang, Yanmin" <[email protected]> wrote:
> >
> > From: Zhang, Yanmin <[email protected]>
> >
> > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > mprotect. My patch against 2.6.16-rc3 enables this capability.
> >
>
> Well I suppose that makes sense. It does assume that the normal pte
> protection-changing APIs do the right thing on all architectures which
> implement huge pages. That's quite possibly the case, but we should
> confirm that.

Well, it will need to be huge_ptep_get_and_clear() below, not the
normal version. But pte_modify should be ok. I'm not sure
pte_present() is safe, either, !pte_none() is what we use elsewhere in
hugetlb.c.

And.. looks like lazy_mmu_prot_update() is unsafe, too. The only arch
which has something here (ia64) has a function which does icache
flushes on PAGE_SIZE only.

> > +
> > +void hugetlb_change_protection(struct vm_area_struct *vma,
> > + unsigned long address, unsigned long end, pgprot_t newprot)
> > +{
> > + struct mm_struct *mm = vma->vm_mm;
> > + unsigned long start = address;
> > + pte_t *ptep;
> > + pte_t pte;
> > +
> > + BUG_ON(address >= end);
> > + flush_cache_range(vma, address, end);
> > +
> > + spin_lock(&mm->page_table_lock);
> > + for (; address < end; address += HPAGE_SIZE) {
> > + ptep = huge_pte_offset(mm, address);
> > + if (!ptep)
> > + continue;
> > + if (pte_present(*ptep)) {
> > + pte = ptep_get_and_clear(mm, address, ptep);
> > + pte = pte_modify(pte, newprot);
> > + set_huge_pte_at(mm, addr, ptep, pte);
> > + lazy_mmu_prot_update(pte);
> > + }
> > + }
> > + spin_unlock(&mm->page_table_lock);
> > +
> > + flush_tlb_range(vma, start, end);
> > +}
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2006-02-27 05:12:08

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Sat, 2006-02-25 at 16:54, Christoph Hellwig wrote:
> On Thu, Feb 23, 2006 at 11:19:40AM +0800, Zhang, Yanmin wrote:
> > From: Zhang, Yanmin <[email protected]>
> >
> > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn?t support hugetlb
> > mprotect. My patch against 2.6.16-rc3 enables this capability.
>
> Adding another special case for hugetlb pages sounds rather bad. Could
> you try adding a mrotect vm_area_operation if that works out cleaner?
I don't quite understand your idea. vm_operations_struct has no mprotect function
pointer. Could you elaborate it?
In current kernel, sys_mprotect would bypass hugetlb if (is_vm_hugetlb_page(vma)).


2006-02-27 05:40:30

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Mon, 2006-02-27 at 07:09, David Gibson wrote:
> On Fri, Feb 24, 2006 at 02:28:44PM -0800, Andrew Morton wrote:
> > "Zhang, Yanmin" <[email protected]> wrote:
> > >
> > > From: Zhang, Yanmin <[email protected]>
> > >
> > > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > > mprotect. My patch against 2.6.16-rc3 enables this capability.
> > >
> >
> > Well I suppose that makes sense. It does assume that the normal pte
> > protection-changing APIs do the right thing on all architectures which
> > implement huge pages. That's quite possibly the case, but we should
> > confirm that.
>
> Well, it will need to be huge_ptep_get_and_clear() below, not the
> normal version.
I will change it.


> But pte_modify should be ok. I'm not sure
> pte_present() is safe, either, !pte_none() is what we use elsewhere in
> hugetlb.c.
pte_present is used in some files while !pte_none is used
in other files. Anyway, I will change it to !pte_none.


>
> And.. looks like lazy_mmu_prot_update() is unsafe, too. The only arch
> which has something here (ia64) has a function which does icache
> flushes on PAGE_SIZE only.
I already sent another patch to ia64 maillist to fix the issue.
See http://marc.theaimsgroup.com/?l=linux-ia64&m=114066414720468&w=2

Thanks.


2006-02-27 06:37:22

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Mon, 2006-02-27 at 13:36, Zhang, Yanmin wrote:
> On Mon, 2006-02-27 at 07:09, David Gibson wrote:
> > On Fri, Feb 24, 2006 at 02:28:44PM -0800, Andrew Morton wrote:
> > > "Zhang, Yanmin" <[email protected]> wrote:
> > > >
> > > > From: Zhang, Yanmin <[email protected]>
> > > >
> > > > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > > > mprotect. My patch against 2.6.16-rc3 enables this capability.

Based on David's comments, I worked out a new patch against 2.6.16-rc4.
Thank David.

Signed-off-by: Zhang Yanmin <[email protected]>

I tested it on i386/x86_64/ia64. Who could help test it on other
platforms, such like PPC64?

---

diff -Nraup linux-2.6.16-rc4/include/linux/hugetlb.h linux-2.6.16-rc4_mprotect/include/linux/hugetlb.h
--- linux-2.6.16-rc4/include/linux/hugetlb.h 2006-02-22 19:19:04.000000000 +0800
+++ linux-2.6.16-rc4_mprotect/include/linux/hugetlb.h 2006-02-27 20:58:33.000000000 +0800
@@ -41,6 +41,8 @@ struct page *follow_huge_pmd(struct mm_s
pmd_t *pmd, int write);
int is_aligned_hugepage_range(unsigned long addr, unsigned long len);
int pmd_huge(pmd_t pmd);
+void hugetlb_change_protection(struct vm_area_struct *vma,
+ unsigned long address, unsigned long end, pgprot_t newprot);

#ifndef ARCH_HAS_HUGEPAGE_ONLY_RANGE
#define is_hugepage_only_range(mm, addr, len) 0
@@ -101,6 +103,8 @@ static inline unsigned long hugetlb_tota
#define free_huge_page(p) ({ (void)(p); BUG(); })
#define hugetlb_fault(mm, vma, addr, write) ({ BUG(); 0; })

+#define hugetlb_change_protection(vma, address, end, newprot)
+
#ifndef HPAGE_MASK
#define HPAGE_MASK PAGE_MASK /* Keep the compiler happy */
#define HPAGE_SIZE PAGE_SIZE
diff -Nraup linux-2.6.16-rc4/mm/hugetlb.c linux-2.6.16-rc4_mprotect/mm/hugetlb.c
--- linux-2.6.16-rc4/mm/hugetlb.c 2006-02-22 19:19:05.000000000 +0800
+++ linux-2.6.16-rc4_mprotect/mm/hugetlb.c 2006-02-27 20:57:17.000000000 +0800
@@ -572,3 +572,32 @@ int follow_hugetlb_page(struct mm_struct

return i;
}
+
+void hugetlb_change_protection(struct vm_area_struct *vma,
+ unsigned long address, unsigned long end, pgprot_t newprot)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ unsigned long start = address;
+ pte_t *ptep;
+ pte_t pte;
+
+ BUG_ON(address >= end);
+ flush_cache_range(vma, address, end);
+
+ spin_lock(&mm->page_table_lock);
+ for (; address < end; address += HPAGE_SIZE) {
+ ptep = huge_pte_offset(mm, address);
+ if (!ptep)
+ continue;
+ if (!pte_none(*ptep)) {
+ pte = huge_ptep_get_and_clear(mm, address, ptep);
+ pte = pte_modify(pte, newprot);
+ set_huge_pte_at(mm, addr, ptep, pte);
+ lazy_mmu_prot_update(pte);
+ }
+ }
+ spin_unlock(&mm->page_table_lock);
+
+ flush_tlb_range(vma, start, end);
+}
+
diff -Nraup linux-2.6.16-rc4/mm/mprotect.c linux-2.6.16-rc4_mprotect/mm/mprotect.c
--- linux-2.6.16-rc4/mm/mprotect.c 2006-02-22 19:18:21.000000000 +0800
+++ linux-2.6.16-rc4_mprotect/mm/mprotect.c 2006-02-27 20:55:10.000000000 +0800
@@ -124,7 +124,7 @@ mprotect_fixup(struct vm_area_struct *vm
* a MAP_NORESERVE private mapping to writable will now reserve.
*/
if (newflags & VM_WRITE) {
- if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_SHARED|VM_HUGETLB))) {
+ if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_SHARED))) {
charged = nrpages;
if (security_vm_enough_memory(charged))
return -ENOMEM;
@@ -166,7 +166,10 @@ success:
*/
vma->vm_flags = newflags;
vma->vm_page_prot = newprot;
- change_protection(vma, start, end, newprot);
+ if (is_vm_hugetlb_page(vma))
+ hugetlb_change_protection(vma, start, end, newprot);
+ else
+ change_protection(vma, start, end, newprot);
vm_stat_account(mm, oldflags, vma->vm_file, -nrpages);
vm_stat_account(mm, newflags, vma->vm_file, nrpages);
return 0;
@@ -240,11 +243,6 @@ sys_mprotect(unsigned long start, size_t

/* Here we know that vma->vm_start <= nstart < vma->vm_end. */

- if (is_vm_hugetlb_page(vma)) {
- error = -EACCES;
- goto out;
- }
-
newflags = vm_flags | (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));

/* newflags >> 4 shift VM_MAY% in place of VM_% */
@@ -260,6 +258,12 @@ sys_mprotect(unsigned long start, size_t
tmp = vma->vm_end;
if (tmp > end)
tmp = end;
+ if (is_vm_hugetlb_page(vma) &&
+ is_aligned_hugepage_range(nstart, tmp - nstart)) {
+ error = -EINVAL;
+ goto out;
+ }
+
error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
if (error)
goto out;


2006-02-27 07:42:38

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Mon, Feb 27, 2006 at 01:36:32PM +0800, Zhang, Yanmin wrote:
> On Mon, 2006-02-27 at 07:09, David Gibson wrote:
> > On Fri, Feb 24, 2006 at 02:28:44PM -0800, Andrew Morton wrote:
> > > "Zhang, Yanmin" <[email protected]> wrote:
> > > >
> > > > From: Zhang, Yanmin <[email protected]>
> > > >
> > > > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > > > mprotect. My patch against 2.6.16-rc3 enables this capability.
> > > >
> > >
> > > Well I suppose that makes sense. It does assume that the normal pte
> > > protection-changing APIs do the right thing on all architectures which
> > > implement huge pages. That's quite possibly the case, but we should
> > > confirm that.
> >
> > Well, it will need to be huge_ptep_get_and_clear() below, not the
> > normal version.
> I will change it.
>
>
> > But pte_modify should be ok. I'm not sure
> > pte_present() is safe, either, !pte_none() is what we use elsewhere in
> > hugetlb.c.
> pte_present is used in some files while !pte_none is used
> in other files. Anyway, I will change it to !pte_none.

But most importantly, pte_present() is not used in mm/hugetlb.c, the
generic part of the code.

> > And.. looks like lazy_mmu_prot_update() is unsafe, too. The only arch
> > which has something here (ia64) has a function which does icache
> > flushes on PAGE_SIZE only.
> I already sent another patch to ia64 maillist to fix the issue.
> See http://marc.theaimsgroup.com/?l=linux-ia64&m=114066414720468&w=2
>
> Thanks.
>
>

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2006-02-27 19:26:14

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

On 2/26/06, Paul Rolland <[email protected]> wrote:
> Hello,
>
> > Ok, great, I was just wondering since I would have made one if you had
> > no plans to do so.
>
> Well, I was just waiting to make sure it was interesting for someone ;)
>
> Here is it, verified with tab and not spaces... but attached as my mailer
> is likely to cripple anything I try to inline...
>
> Signed-off-by: Paul Rolland <[email protected]>

I've got an issue with this, as the same function is called in
e1000_ethtool.c. I think the correct fix is to fix the caller in the
mii-tool case, but I am working on verifiying my assumptions.

In the meantime can you send the exact command you were having the problem with?

2006-02-28 01:36:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

"Zhang, Yanmin" <[email protected]> wrote:
>
> > > > > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > > > > mprotect. My patch against 2.6.16-rc3 enables this capability.
>
> Based on David's comments, I worked out a new patch against 2.6.16-rc4.
> Thank David.
>

Please always send an updated changelog when sending an updated patch.
Otherwise I have to go trolling back through the email thread to find it,
then work out what needs to be changed.

>
> I tested it on i386/x86_64/ia64. Who could help test it on other
> platforms, such like PPC64?

I can do that - please send me your test app?

2006-02-28 02:32:07

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface


> From: Paul Rolland <[email protected]>
>
> Hello,
>
> This patch is based on Linux 2.4.32, and I've verified the same problem
> exists on 2.6.15.4.
> Working on a machine with a 2.4.32 kernel, I was surprised to see the driver
> complaining when setting the speed to 100FD using mii-tool, but accepting
> the setting with ethtool.
> Digging into the code, I found that there is some confusion with :
> - DUPLEX_FULL and FULL_DUPLEX,
> - DUPLEX_HALF and HALF_DUPLEX
> in the code :
> ...
> spddplx += (mii_reg & 0x100)
> ? FULL_DUPLEX :
> HALF_DUPLEX;
> retval = e1000_set_spd_dplx(adapter,
> spddplx);

Please try this patch:

e1000: fix mii-tool access to setting speed and duplex

Paul Rolland reported that e1000 was having a hard time using mii-tool to
set speed and duplex. This patch fixes the issue on both newer hardware as
well as fixing the code issue that originally caused the problem.

Signed-off-by: Jesse Brandeburg <[email protected]>
CC: Paul Rolland <[email protected]>

---

drivers/net/e1000/e1000_main.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 31e3329..9730c2e 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -4269,7 +4269,7 @@ e1000_mii_ioctl(struct net_device *netde
spin_unlock_irqrestore(&adapter->stats_lock, flags);
return -EIO;
}
- if (adapter->hw.phy_type == e1000_phy_m88) {
+ if (adapter->hw.media_type == e1000_media_type_copper) {
switch (data->reg_num) {
case PHY_CTRL:
if (mii_reg & MII_CR_POWER_DOWN)
@@ -4285,8 +4285,8 @@ e1000_mii_ioctl(struct net_device *netde
else
spddplx = SPEED_10;
spddplx += (mii_reg & 0x100)
- ? FULL_DUPLEX :
- HALF_DUPLEX;
+ ? DUPLEX_FULL :
+ DUPLEX_HALF;
retval = e1000_set_spd_dplx(adapter,
spddplx);
if (retval) {

2006-02-28 03:27:29

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Tue, 2006-02-28 at 09:34, Andrew Morton wrote:
> "Zhang, Yanmin" <[email protected]> wrote:
> >
> > > > > > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > > > > > mprotect. My patch against 2.6.16-rc3 enables this capability.
> >
> > Based on David's comments, I worked out a new patch against 2.6.16-rc4.
> > Thank David.
> >
>
> Please always send an updated changelog when sending an updated patch.
> Otherwise I have to go trolling back through the email thread to find it,
> then work out what needs to be changed.
Thanks for your kind reminder. I would do so next time.

>
> >
> > I tested it on i386/x86_64/ia64. Who could help test it on other
> > platforms, such like PPC64?
>
> I can do that - please send me your test app?
I attach a test case. It will create directory /mnt/hugepages and delete it
after testing automatically.

To run it by user root:
#gcc -o mprotect_testcase mprotect_testcase.c
#echo "5">/proc/sys/vm/nr_hugepages
#./mprotect_testcase

You could use gdb to step it to see the changing of the process vma maps.

Thanks.



Attachments:
mprotect_testcase.c (3.43 kB)

2006-02-28 03:33:01

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Tue, Feb 28, 2006 at 11:23:54AM +0800, Zhang, Yanmin wrote:
> On Tue, 2006-02-28 at 09:34, Andrew Morton wrote:
> > "Zhang, Yanmin" <[email protected]> wrote:
> > >
> > > > > > > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > > > > > > mprotect. My patch against 2.6.16-rc3 enables this capability.
> > >
> > > Based on David's comments, I worked out a new patch against 2.6.16-rc4.
> > > Thank David.
> > >
> >
> > Please always send an updated changelog when sending an updated patch.
> > Otherwise I have to go trolling back through the email thread to find it,
> > then work out what needs to be changed.
> Thanks for your kind reminder. I would do so next time.
>
> >
> > >
> > > I tested it on i386/x86_64/ia64. Who could help test it on other
> > > platforms, such like PPC64?
> >
> > I can do that - please send me your test app?
> I attach a test case. It will create directory /mnt/hugepages and delete it
> after testing automatically.
>
> To run it by user root:
> #gcc -o mprotect_testcase mprotect_testcase.c
> #echo "5">/proc/sys/vm/nr_hugepages
> #./mprotect_testcase

If you could adapt this testcase to fit into the libhugetlbfs
testsuite, that would be really great (from
git://ozlabs.org/~dgibson/git/libhugetlbfs.git). Otherwise I guess I
will..

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2006-02-28 03:41:12

by Yanmin Zhang

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Tue, 2006-02-28 at 11:32, David Gibson wrote:
> On Tue, Feb 28, 2006 at 11:23:54AM +0800, Zhang, Yanmin wrote:
> > On Tue, 2006-02-28 at 09:34, Andrew Morton wrote:
> > > "Zhang, Yanmin" <[email protected]> wrote:
> > > >
> > > > > > > > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > > > > > > > mprotect. My patch against 2.6.16-rc3 enables this capability.
> > > >

> If you could adapt this testcase to fit into the libhugetlbfs
> testsuite, that would be really great (from
> git://ozlabs.org/~dgibson/git/libhugetlbfs.git). Otherwise I guess I
> will..
Frankly, I wrote a hugetlb test suite with dozens of test cases. It could
run on i386/x86_64/ia64 and caught many hugetlb bugs effectively. I am
not sure if I could distribute it out of intel.


2006-02-28 08:25:11

by David Gibson

[permalink] [raw]
Subject: Re: [PATCH] Enable mprotect on huge pages

On Tue, Feb 28, 2006 at 11:23:54AM +0800, Zhang, Yanmin wrote:
> On Tue, 2006-02-28 at 09:34, Andrew Morton wrote:
> > "Zhang, Yanmin" <[email protected]> wrote:
> > >
> > > > > > > 2.6.16-rc3 uses hugetlb on-demand paging, but it doesn_t support hugetlb
> > > > > > > mprotect. My patch against 2.6.16-rc3 enables this capability.
> > >
> > > Based on David's comments, I worked out a new patch against 2.6.16-rc4.
> > > Thank David.
> > >
> >
> > Please always send an updated changelog when sending an updated patch.
> > Otherwise I have to go trolling back through the email thread to find it,
> > then work out what needs to be changed.
> Thanks for your kind reminder. I would do so next time.
>
> >
> > >
> > > I tested it on i386/x86_64/ia64. Who could help test it on other
> > > platforms, such like PPC64?
> >
> > I can do that - please send me your test app?
> I attach a test case. It will create directory /mnt/hugepages and delete it
> after testing automatically.
>
> To run it by user root:
> #gcc -o mprotect_testcase mprotect_testcase.c
> #echo "5">/proc/sys/vm/nr_hugepages
> #./mprotect_testcase
>
> You could use gdb to step it to see the changing of the process vma maps.

I've just added a simpler testcase than this one to the libhugetlbfs
testsuite. Both R->RW and RW->R transitions appear to work (I haven't
tested EXEC transitions yet, that's hairier).

--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson

2006-02-28 10:46:08

by Paul Rolland

[permalink] [raw]
Subject: Re: [2.4.32 - 2.6.15.4] e1000 - Fix mii interface

Hello Jesse,

> spddplx += (mii_reg & 0x100)
> - ? FULL_DUPLEX :
> - HALF_DUPLEX;
> + ? DUPLEX_FULL :
> + DUPLEX_HALF;

As I said in my first mail, I didn't want to go that way as one of the
two DUPLEX_FULL or DUPLEX_HALF is defined as being 0, which prevents
detecting incorrect/incomplete initializations.

The problem I had came from :
mii-tool -F 100BaseTx-FD eth0
when at the same time the ethtool interface was OK.

But you are right, the change I made missed the second caller.

The correct change is yours, though for the reason I put above, I think
it'll make it harder to spot other bugs ;)

Well done,
Paul