2014-04-01 04:11:41

by Greg Kroah-Hartman

Subject: [PATCH 3.13 00/22] 3.13.9-stable review

This is the start of the stable review cycle for the 3.13.9 release.
There are 22 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Thu Apr 3 04:06:39 UTC 2014.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.13.9-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 3.13.9-rc1

Daniel Borkmann <[email protected]>
netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages

Li Zefan <[email protected]>
cgroup: protect modifications to cgroup_idr with cgroup_mutex

David Rientjes <[email protected]>
mm: close PageTail race

Al Viro <[email protected]>
switch mnt_hash to hlist

Al Viro <[email protected]>
don't bother with propagate_mnt() unless the target is shared

Al Viro <[email protected]>
keep shadowed vfsmounts together

Al Viro <[email protected]>
resizable namespace.c hashes

Sasha Levin <[email protected]>
random32: avoid attempt to late reseed if in the middle of seeding

Thomas Petazzoni <[email protected]>
net: mvneta: fix usage as a module on RGMII configurations

Thomas Petazzoni <[email protected]>
net: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE

Al Viro <[email protected]>
make prepend_name() work correctly when called with negative *buflen

Artem Fetishev <[email protected]>
x86: fix boot on uniprocessor systems

Daniel Vetter <[email protected]>
drm/i915: Undo gtt scratch pte unmapping again

Scott Wood <[email protected]>
i2c: cpm: Fix build by adding of_address.h and of_irq.h

David Vrabel <[email protected]>
Revert "xen: properly account for _PAGE_NUMA during xen pte translations"

Wei Liu <[email protected]>
xen/balloon: flush persistent kmaps in correct position

Hans de Goede <[email protected]>
Input: cypress_ps2 - don't report as a button pads

Hans de Goede <[email protected]>
Input: synaptics - add manual min/max quirk for ThinkPad X240

Benjamin Tissoires <[email protected]>
Input: synaptics - add manual min/max quirk

Dmitry Torokhov <[email protected]>
Input: mousedev - fix race when creating mixed device

Al Viro <[email protected]>
rcuwalk: recheck mount_lock after mountpoint crossing attempts

Theodore Ts'o <[email protected]>
ext4: atomically set inode->i_flags in ext4_set_inode_flags()


-------------

Diffstat:

Makefile | 4 +-
arch/x86/include/asm/pgtable.h | 14 +--
arch/x86/include/asm/topology.h | 3 +-
arch/x86/xen/mmu.c | 4 +-
drivers/block/aoe/aoecmd.c | 4 +-
drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
drivers/i2c/busses/i2c-cpm.c | 2 +
drivers/input/mouse/cypress_ps2.c | 1 -
drivers/input/mouse/synaptics.c | 55 ++++++++++
drivers/input/mousedev.c | 73 +++++++------
drivers/net/ethernet/marvell/mvneta.c | 43 ++------
drivers/vfio/vfio_iommu_type1.c | 4 +-
drivers/xen/balloon.c | 24 +++--
fs/dcache.c | 4 +-
fs/ext4/inode.c | 15 +--
fs/mount.h | 4 +-
fs/namei.c | 29 +++---
fs/namespace.c | 177 +++++++++++++++++++++-----------
fs/pnode.c | 26 +++--
fs/pnode.h | 4 +-
fs/proc/page.c | 2 +-
include/linux/bitops.h | 15 +++
include/linux/cgroup.h | 2 +
include/linux/huge_mm.h | 18 ----
include/linux/mm.h | 14 ++-
kernel/cgroup.c | 26 ++---
lib/random32.c | 13 ++-
mm/ksm.c | 2 +-
mm/memory-failure.c | 2 +-
mm/page_alloc.c | 4 +-
mm/swap.c | 4 +-
net/netfilter/nf_conntrack_proto_dccp.c | 6 +-
32 files changed, 362 insertions(+), 238 deletions(-)


2014-04-01 04:09:37

by Greg Kroah-Hartman

Subject: [PATCH 3.13 01/22] ext4: atomically set inode->i_flags in ext4_set_inode_flags()

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <[email protected]>

commit 00a1a053ebe5febcfc2ec498bd894f035ad2aa06 upstream.

Use cmpxchg() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
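
As a rough illustration for readers outside the kernel tree, here is a
minimal userspace sketch of the same pattern, with GCC's
__sync_val_compare_and_swap() standing in for the kernel's cmpxchg(); the
DEMO_* flag values are invented for the example.

#include <stdio.h>

/* Illustrative stand-ins for the S_* inode flags; not the kernel values. */
#define DEMO_SYNC      0x01
#define DEMO_IMMUTABLE 0x10

/* Userspace illustration of set_mask_bits(), not the kernel implementation. */
static unsigned int demo_set_mask_bits(unsigned int *ptr,
                                       unsigned int mask, unsigned int bits)
{
        unsigned int old, new;

        do {
                old = *(volatile unsigned int *)ptr;
                new = (old & ~mask) | bits;
                /* Retry if someone changed *ptr in between, so there is
                 * never a window where the masked bits are cleared but
                 * the new bits are not yet set. */
        } while (__sync_val_compare_and_swap(ptr, old, new) != old);

        return new;
}

int main(void)
{
        unsigned int i_flags = DEMO_IMMUTABLE;

        demo_set_mask_bits(&i_flags, DEMO_SYNC | DEMO_IMMUTABLE, DEMO_SYNC);
        printf("flags: %#x\n", i_flags);
        return 0;
}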

Reported-by: John Sullivan <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/ext4/inode.c | 15 +++++++++------
include/linux/bitops.h | 15 +++++++++++++++
2 files changed, 24 insertions(+), 6 deletions(-)

--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -38,6 +38,7 @@
#include <linux/slab.h>
#include <linux/ratelimit.h>
#include <linux/aio.h>
+#include <linux/bitops.h>

#include "ext4_jbd2.h"
#include "xattr.h"
@@ -3926,18 +3927,20 @@ int ext4_get_inode_loc(struct inode *ino
void ext4_set_inode_flags(struct inode *inode)
{
unsigned int flags = EXT4_I(inode)->i_flags;
+ unsigned int new_fl = 0;

- inode->i_flags &= ~(S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
if (flags & EXT4_SYNC_FL)
- inode->i_flags |= S_SYNC;
+ new_fl |= S_SYNC;
if (flags & EXT4_APPEND_FL)
- inode->i_flags |= S_APPEND;
+ new_fl |= S_APPEND;
if (flags & EXT4_IMMUTABLE_FL)
- inode->i_flags |= S_IMMUTABLE;
+ new_fl |= S_IMMUTABLE;
if (flags & EXT4_NOATIME_FL)
- inode->i_flags |= S_NOATIME;
+ new_fl |= S_NOATIME;
if (flags & EXT4_DIRSYNC_FL)
- inode->i_flags |= S_DIRSYNC;
+ new_fl |= S_DIRSYNC;
+ set_mask_bits(&inode->i_flags,
+ S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC, new_fl);
}

/* Propagate flags from i_flags to EXT4_I(inode)->i_flags */
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -196,6 +196,21 @@ static inline unsigned long __ffs64(u64

#ifdef __KERNEL__

+#ifndef set_mask_bits
+#define set_mask_bits(ptr, _mask, _bits) \
+({ \
+ const typeof(*ptr) mask = (_mask), bits = (_bits); \
+ typeof(*ptr) old, new; \
+ \
+ do { \
+ old = ACCESS_ONCE(*ptr); \
+ new = (old & ~mask) | bits; \
+ } while (cmpxchg(ptr, old, new) != old); \
+ \
+ new; \
+})
+#endif
+
#ifndef find_last_bit
/**
* find_last_bit - find the last set bit in a memory region

2014-04-01 04:10:15

by Greg Kroah-Hartman

Subject: [PATCH 3.13 02/22] rcuwalk: recheck mount_lock after mountpoint crossing attempts

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit b37199e626b31e1175fb06764c5d1d687723aac2 upstream.

We can get a false negative from __lookup_mnt() if an unrelated vfsmount
gets moved. In that case legitimize_mnt() is guaranteed to fail,
and we will fall back to non-RCU walk... unless we end up running
into a hard error on a filesystem object we wouldn't have reached
if not for that false negative. IOW, delaying that check until
the end of pathname resolution is wrong - we should recheck right
after we attempt to cross the mountpoint. We don't need to recheck
unless we see d_mountpoint() being true - in that case even if
we have just raced with mount/umount, we can simply go on as if
we'd come at the moment when the sucker wasn't a mountpoint; if we
run into a hard error as the result, it was a legitimate outcome.
__lookup_mnt() returning NULL is different in that respect, since
it might've happened due to an operation on a completely unrelated
mountpoint.
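
For illustration, a hedged C11 sketch of the seqcount read/retry pattern
this relies on (greatly simplified: the kernel's seqlock primitives also
handle writer serialization, and all demo_* names here are invented).

#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint demo_mount_seq;   /* even: stable, odd: write in progress */

static unsigned demo_read_seqbegin(void)
{
        unsigned seq;

        while ((seq = atomic_load_explicit(&demo_mount_seq,
                                           memory_order_acquire)) & 1)
                ;                     /* writer active: wait for it to finish */
        return seq;
}

/* True if a mount/umount happened since demo_read_seqbegin(), i.e. the
 * lockless walk must be retried or fall back to the locked slow path. */
static bool demo_read_seqretry(unsigned seq)
{
        atomic_thread_fence(memory_order_acquire);
        return atomic_load_explicit(&demo_mount_seq,
                                    memory_order_relaxed) != seq;
}

int main(void)
{
        unsigned seq = demo_read_seqbegin();

        /* ... lockless mount-table walk would go here ... */

        return demo_read_seqretry(seq) ? 1 : 0;
}

The patch moves exactly this kind of recheck to right after each mountpoint
crossing, instead of deferring it to the end of the pathname walk.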

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/namei.c | 29 +++++++++++++----------------
1 file changed, 13 insertions(+), 16 deletions(-)

--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1098,7 +1098,7 @@ static bool __follow_mount_rcu(struct na
return false;

if (!d_mountpoint(path->dentry))
- break;
+ return true;

mounted = __lookup_mnt(path->mnt, path->dentry);
if (!mounted)
@@ -1114,20 +1114,7 @@ static bool __follow_mount_rcu(struct na
*/
*inode = path->dentry->d_inode;
}
- return true;
-}
-
-static void follow_mount_rcu(struct nameidata *nd)
-{
- while (d_mountpoint(nd->path.dentry)) {
- struct mount *mounted;
- mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
- if (!mounted)
- break;
- nd->path.mnt = &mounted->mnt;
- nd->path.dentry = mounted->mnt.mnt_root;
- nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
- }
+ return !read_seqretry(&mount_lock, nd->m_seq);
}

static int follow_dotdot_rcu(struct nameidata *nd)
@@ -1155,7 +1142,17 @@ static int follow_dotdot_rcu(struct name
break;
nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
}
- follow_mount_rcu(nd);
+ while (d_mountpoint(nd->path.dentry)) {
+ struct mount *mounted;
+ mounted = __lookup_mnt(nd->path.mnt, nd->path.dentry);
+ if (!mounted)
+ break;
+ nd->path.mnt = &mounted->mnt;
+ nd->path.dentry = mounted->mnt.mnt_root;
+ nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
+ if (read_seqretry(&mount_lock, nd->m_seq))
+ goto failed;
+ }
nd->inode = nd->path.dentry->d_inode;
return 0;


2014-04-01 04:10:52

by Greg Kroah-Hartman

Subject: [PATCH 3.13 03/22] Input: mousedev - fix race when creating mixed device

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dmitry Torokhov <[email protected]>

commit e4dbedc7eac7da9db363a36f2bd4366962eeefcc upstream.

We should not be using the static variable mousedev_mix in methods that can
be called before that singleton gets assigned. While at it, let's add open
and close methods to the mousedev structure so that we do not need to test
whether we are dealing with the multiplexor or a normal device, and can
simply call the appropriate method directly.
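
Reduced to a hedged standalone sketch (all names below are invented; in the
driver the slots are filled with mousedev_open_device()/
mousedev_close_device() or mixdev_open_devices()/mixdev_close_devices()),
the shape of the fix is plain method dispatch:

#include <stdio.h>

struct mousedev_demo {
        const char *name;
        int (*open_device)(struct mousedev_demo *m);
        void (*close_device)(struct mousedev_demo *m);
};

static int plain_open(struct mousedev_demo *m)
{
        printf("open %s\n", m->name);
        return 0;
}

static void plain_close(struct mousedev_demo *m)
{
        printf("close %s\n", m->name);
}

static int mix_open(struct mousedev_demo *m)
{
        printf("open multiplexer %s\n", m->name);
        return 0;
}

static void mix_close(struct mousedev_demo *m)
{
        printf("close multiplexer %s\n", m->name);
}

int main(void)
{
        /* Each device carries its own methods, set once at creation time;
         * callers never test "is this the multiplexer?" again, and the
         * plain paths never touch the multiplexer singleton. */
        struct mousedev_demo mouse0 = { "mouse0", plain_open, plain_close };
        struct mousedev_demo mice   = { "mice",   mix_open,   mix_close };

        mouse0.open_device(&mouse0);
        mice.open_device(&mice);
        mice.close_device(&mice);
        mouse0.close_device(&mouse0);
        return 0;
}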

This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=71551

Reported-by: GiulioDP <[email protected]>
Tested-by: GiulioDP <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mousedev.c | 73 +++++++++++++++++++++++++++--------------------
1 file changed, 42 insertions(+), 31 deletions(-)

--- a/drivers/input/mousedev.c
+++ b/drivers/input/mousedev.c
@@ -67,7 +67,6 @@ struct mousedev {
struct device dev;
struct cdev cdev;
bool exist;
- bool is_mixdev;

struct list_head mixdev_node;
bool opened_by_mixdev;
@@ -77,6 +76,9 @@ struct mousedev {
int old_x[4], old_y[4];
int frac_dx, frac_dy;
unsigned long touch;
+
+ int (*open_device)(struct mousedev *mousedev);
+ void (*close_device)(struct mousedev *mousedev);
};

enum mousedev_emul {
@@ -116,9 +118,6 @@ static unsigned char mousedev_imex_seq[]
static struct mousedev *mousedev_mix;
static LIST_HEAD(mousedev_mix_list);

-static void mixdev_open_devices(void);
-static void mixdev_close_devices(void);
-
#define fx(i) (mousedev->old_x[(mousedev->pkt_count - (i)) & 03])
#define fy(i) (mousedev->old_y[(mousedev->pkt_count - (i)) & 03])

@@ -428,9 +427,7 @@ static int mousedev_open_device(struct m
if (retval)
return retval;

- if (mousedev->is_mixdev)
- mixdev_open_devices();
- else if (!mousedev->exist)
+ if (!mousedev->exist)
retval = -ENODEV;
else if (!mousedev->open++) {
retval = input_open_device(&mousedev->handle);
@@ -446,9 +443,7 @@ static void mousedev_close_device(struct
{
mutex_lock(&mousedev->mutex);

- if (mousedev->is_mixdev)
- mixdev_close_devices();
- else if (mousedev->exist && !--mousedev->open)
+ if (mousedev->exist && !--mousedev->open)
input_close_device(&mousedev->handle);

mutex_unlock(&mousedev->mutex);
@@ -459,21 +454,29 @@ static void mousedev_close_device(struct
* stream. Note that this function is called with mousedev_mix->mutex
* held.
*/
-static void mixdev_open_devices(void)
+static int mixdev_open_devices(struct mousedev *mixdev)
{
- struct mousedev *mousedev;
+ int error;
+
+ error = mutex_lock_interruptible(&mixdev->mutex);
+ if (error)
+ return error;

- if (mousedev_mix->open++)
- return;
+ if (!mixdev->open++) {
+ struct mousedev *mousedev;

- list_for_each_entry(mousedev, &mousedev_mix_list, mixdev_node) {
- if (!mousedev->opened_by_mixdev) {
- if (mousedev_open_device(mousedev))
- continue;
+ list_for_each_entry(mousedev, &mousedev_mix_list, mixdev_node) {
+ if (!mousedev->opened_by_mixdev) {
+ if (mousedev_open_device(mousedev))
+ continue;

- mousedev->opened_by_mixdev = true;
+ mousedev->opened_by_mixdev = true;
+ }
}
}
+
+ mutex_unlock(&mixdev->mutex);
+ return 0;
}

/*
@@ -481,19 +484,22 @@ static void mixdev_open_devices(void)
* device. Note that this function is called with mousedev_mix->mutex
* held.
*/
-static void mixdev_close_devices(void)
+static void mixdev_close_devices(struct mousedev *mixdev)
{
- struct mousedev *mousedev;
+ mutex_lock(&mixdev->mutex);

- if (--mousedev_mix->open)
- return;
+ if (!--mixdev->open) {
+ struct mousedev *mousedev;

- list_for_each_entry(mousedev, &mousedev_mix_list, mixdev_node) {
- if (mousedev->opened_by_mixdev) {
- mousedev->opened_by_mixdev = false;
- mousedev_close_device(mousedev);
+ list_for_each_entry(mousedev, &mousedev_mix_list, mixdev_node) {
+ if (mousedev->opened_by_mixdev) {
+ mousedev->opened_by_mixdev = false;
+ mousedev_close_device(mousedev);
+ }
}
}
+
+ mutex_unlock(&mixdev->mutex);
}


@@ -522,7 +528,7 @@ static int mousedev_release(struct inode
mousedev_detach_client(mousedev, client);
kfree(client);

- mousedev_close_device(mousedev);
+ mousedev->close_device(mousedev);

return 0;
}
@@ -550,7 +556,7 @@ static int mousedev_open(struct inode *i
client->mousedev = mousedev;
mousedev_attach_client(mousedev, client);

- error = mousedev_open_device(mousedev);
+ error = mousedev->open_device(mousedev);
if (error)
goto err_free_client;

@@ -861,16 +867,21 @@ static struct mousedev *mousedev_create(

if (mixdev) {
dev_set_name(&mousedev->dev, "mice");
+
+ mousedev->open_device = mixdev_open_devices;
+ mousedev->close_device = mixdev_close_devices;
} else {
int dev_no = minor;
/* Normalize device number if it falls into legacy range */
if (dev_no < MOUSEDEV_MINOR_BASE + MOUSEDEV_MINORS)
dev_no -= MOUSEDEV_MINOR_BASE;
dev_set_name(&mousedev->dev, "mouse%d", dev_no);
+
+ mousedev->open_device = mousedev_open_device;
+ mousedev->close_device = mousedev_close_device;
}

mousedev->exist = true;
- mousedev->is_mixdev = mixdev;
mousedev->handle.dev = input_get_device(dev);
mousedev->handle.name = dev_name(&mousedev->dev);
mousedev->handle.handler = handler;
@@ -919,7 +930,7 @@ static void mousedev_destroy(struct mous
device_del(&mousedev->dev);
mousedev_cleanup(mousedev);
input_free_minor(MINOR(mousedev->dev.devt));
- if (!mousedev->is_mixdev)
+ if (mousedev != mousedev_mix)
input_unregister_handle(&mousedev->handle);
put_device(&mousedev->dev);
}

2014-04-01 04:12:06

by Greg Kroah-Hartman

Subject: [PATCH 3.13 05/22] Input: synaptics - add manual min/max quirk for ThinkPad X240

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hans de Goede <[email protected]>

commit 8a0435d958fb36d93b8df610124a0e91e5675c82 upstream.

This extends Benjamin Tissoires' manual min/max quirk table with support for
the ThinkPad X240.

Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mouse/synaptics.c | 8 ++++++++
1 file changed, 8 insertions(+)

--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -1507,6 +1507,14 @@ static const struct dmi_system_id min_ma
.driver_data = (int []){1024, 5052, 2258, 4832},
},
{
+ /* Lenovo ThinkPad X240 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X240"),
+ },
+ .driver_data = (int []){1232, 5710, 1156, 4696},
+ },
+ {
/* Lenovo ThinkPad T440s */
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),

2014-04-01 04:12:32

by Greg Kroah-Hartman

Subject: [PATCH 3.13 06/22] Input: cypress_ps2 - don't report as a button pads

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hans de Goede <[email protected]>

commit 6797b39e6f6f34c74177736e146406e894b9482b upstream.

The cypress PS/2 trackpad models supported by the cypress_ps2 driver
emulate BTN_RIGHT events in firmware based on the finger position; as part
of this, no motion events are sent when the finger is in the button area.

The INPUT_PROP_BUTTONPAD property is there to indicate to userspace that
BTN_RIGHT events should be emulated in userspace, which is not necessary
in this case.

When INPUT_PROP_BUTTONPAD is advertised, userspace will wait for a motion
event before propagating the button event higher up the stack, as it needs
current abs x + y data for its BTN_RIGHT emulation. Since the cypress_ps2
pads don't report motion events in the button area, clicks in the button
area end up being ignored, so INPUT_PROP_BUTTONPAD actually causes
problems for these touchpads, and removing it fixes:

https://bugs.freedesktop.org/show_bug.cgi?id=76341

Reported-by: Adam Williamson <[email protected]>
Tested-by: Adam Williamson <[email protected]>
Reviewed-by: Peter Hutterer <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mouse/cypress_ps2.c | 1 -
1 file changed, 1 deletion(-)

--- a/drivers/input/mouse/cypress_ps2.c
+++ b/drivers/input/mouse/cypress_ps2.c
@@ -410,7 +410,6 @@ static int cypress_set_input_params(stru
__clear_bit(REL_X, input->relbit);
__clear_bit(REL_Y, input->relbit);

- __set_bit(INPUT_PROP_BUTTONPAD, input->propbit);
__set_bit(EV_KEY, input->evbit);
__set_bit(BTN_LEFT, input->keybit);
__set_bit(BTN_RIGHT, input->keybit);

2014-04-01 04:13:18

by Greg Kroah-Hartman

Subject: [PATCH 3.13 07/22] xen/balloon: flush persistent kmaps in correct position

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Liu <[email protected]>

commit 09ed3d5ba06137913960f9c9385f71fc384193ab upstream.

The Xen balloon driver will update ballooned-out pages' P2M entries to point
to the scratch page for PV guests. In commit 24f69373e2 ("xen/balloon: don't
alloc page while non-preemptible"), kmap_flush_unused() was moved after
updating the P2M table. In that case, for a 32-bit PV guest we might end up
with

P2M X -----> S (S is mfn of balloon scratch page)
M2P Y -----> X (Y is mfn in persistent kmap entry)

kmap_flush_unused() iterates through all the PTEs in the kmap address
space, using pte_to_page() to obtain the page. If the p2m and the m2p
are inconsistent the incorrect page is returned. This will clear
page->address on the wrong page which may cause subsequent oopses if
that page is currently kmap'ed.

Move the flush back between get_page and __set_phys_to_machine to fix
this.

Signed-off-by: Wei Liu <[email protected]>
Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/xen/balloon.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)

--- a/drivers/xen/balloon.c
+++ b/drivers/xen/balloon.c
@@ -406,11 +406,25 @@ static enum bp_state decrease_reservatio
state = BP_EAGAIN;
break;
}
+ scrub_page(page);

- pfn = page_to_pfn(page);
- frame_list[i] = pfn_to_mfn(pfn);
+ frame_list[i] = page_to_pfn(page);
+ }

- scrub_page(page);
+ /*
+ * Ensure that ballooned highmem pages don't have kmaps.
+ *
+ * Do this before changing the p2m as kmap_flush_unused()
+ * reads PTEs to obtain pages (and hence needs the original
+ * p2m entry).
+ */
+ kmap_flush_unused();
+
+ /* Update direct mapping, invalidate P2M, and add to balloon. */
+ for (i = 0; i < nr_pages; i++) {
+ pfn = frame_list[i];
+ frame_list[i] = pfn_to_mfn(pfn);
+ page = pfn_to_page(pfn);

#ifdef CONFIG_XEN_HAVE_PVMMU
/*
@@ -436,11 +450,9 @@ static enum bp_state decrease_reservatio
}
#endif

- balloon_append(pfn_to_page(pfn));
+ balloon_append(page);
}

- /* Ensure that ballooned highmem pages don't have kmaps. */
- kmap_flush_unused();
flush_tlb_all();

set_xen_guest_handle(reservation.extent_start, frame_list);

2014-04-01 04:13:38

by Greg Kroah-Hartman

Subject: [PATCH 3.13 08/22] Revert "xen: properly account for _PAGE_NUMA during xen pte translations"

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Vrabel <[email protected]>

commit 5926f87fdaad4be3ed10cec563bf357915e55a86 upstream.

This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.

PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
is set and pseudo-physical addresses if _PAGE_PRESENT is clear.

This is because during a domain save/restore (migration) the page
table entries are "canonicalised" and "uncanonicalised", i.e., MFNs are
converted to PFNs during domain save so that on a restore the page
table entries may be rewritten with the new MFNs on the destination.
This canonicalisation is only done for PTEs that are present.

This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
_PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be
migrated as-is which would result in unexpected behaviour in the
destination domain. Either a) the MFN would be translated to the
wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
because the MFN is no longer owned by the domain; or c) the present
bit would not get set.

Symptoms include "Bad page" reports when munmapping after migrating a
domain.

Signed-off-by: David Vrabel <[email protected]>
Acked-by: Konrad Rzeszutek Wilk <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/pgtable.h | 14 ++------------
arch/x86/xen/mmu.c | 4 ++--
2 files changed, 4 insertions(+), 14 deletions(-)

--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -445,20 +445,10 @@ static inline int pte_same(pte_t a, pte_
return a.pte == b.pte;
}

-static inline int pteval_present(pteval_t pteval)
-{
- /*
- * Yes Linus, _PAGE_PROTNONE == _PAGE_NUMA. Expressing it this
- * way clearly states that the intent is that protnone and numa
- * hinting ptes are considered present for the purposes of
- * pagetable operations like zapping, protection changes, gup etc.
- */
- return pteval & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA);
-}
-
static inline int pte_present(pte_t a)
{
- return pteval_present(pte_flags(a));
+ return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
+ _PAGE_NUMA);
}

#define pte_accessible pte_accessible
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -365,7 +365,7 @@ void xen_ptep_modify_prot_commit(struct
/* Assume pteval_t is equivalent to all the other *val_t types. */
static pteval_t pte_mfn_to_pfn(pteval_t val)
{
- if (pteval_present(val)) {
+ if (val & _PAGE_PRESENT) {
unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
unsigned long pfn = mfn_to_pfn(mfn);

@@ -381,7 +381,7 @@ static pteval_t pte_mfn_to_pfn(pteval_t

static pteval_t pte_pfn_to_mfn(pteval_t val)
{
- if (pteval_present(val)) {
+ if (val & _PAGE_PRESENT) {
unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
pteval_t flags = val & PTE_FLAGS_MASK;
unsigned long mfn;

2014-04-01 04:13:43

by Greg Kroah-Hartman

Subject: [PATCH 3.13 10/22] drm/i915: Undo gtt scratch pte unmapping again

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Vetter <[email protected]>

commit 8ee661b505613ef2747b350ca2871a31b3781bee upstream.

It apparently blows up on some machines. This functionally reverts

commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
Author: Ben Widawsky <[email protected]>
Date: Wed Oct 16 09:21:30 2013 -0700

drm/i915: Disable GGTT PTEs on GEN6+ suspend

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64841
Reported-and-Tested-by: Brad Jackson <[email protected]>
Cc: Takashi Iwai <[email protected]>
Cc: Paulo Zanoni <[email protected]>
Cc: Todd Previte <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -828,7 +828,7 @@ void i915_gem_suspend_gtt_mappings(struc
dev_priv->gtt.base.clear_range(&dev_priv->gtt.base,
dev_priv->gtt.base.start / PAGE_SIZE,
dev_priv->gtt.base.total / PAGE_SIZE,
- false);
+ true);
}

void i915_gem_restore_gtt_mappings(struct drm_device *dev)

2014-04-01 04:14:07

by Greg Kroah-Hartman

Subject: [PATCH 3.13 12/22] make prepend_name() work correctly when called with negative *buflen

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit e825196d48d2b89a6ec3a8eff280098d2a78207e upstream.

In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative
value. So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign of the
subtraction result, of course).
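
A small userspace sketch of that convention, mirroring what prepend() in
fs/dcache.c does (-1 stands in for -ENAMETOOLONG; the demo_* names are
invented):

#include <stdio.h>
#include <string.h>

static int demo_prepend(char **buffer, int *buflen,
                        const char *str, int namelen)
{
        *buflen -= namelen;             /* subtract first ... */
        if (*buflen < 0)                /* ... then test the sign */
                return -1;
        *buffer -= namelen;
        memcpy(*buffer, str, namelen);
        return 0;
}

int main(void)
{
        char buf[16];
        char *end = buf + sizeof(buf);
        int len = sizeof(buf);

        demo_prepend(&end, &len, "/name", 5);
        demo_prepend(&end, &len, "/dir", 4);
        /* Once len has gone negative it stays negative, so further calls
         * keep failing instead of comparing against an unsigned length. */
        printf("%.*s\n", (int)(buf + sizeof(buf) - end), end);
        return 0;
}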

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/dcache.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2833,9 +2833,9 @@ static int prepend_name(char **buffer, i
u32 dlen = ACCESS_ONCE(name->len);
char *p;

- if (*buflen < dlen + 1)
- return -ENAMETOOLONG;
*buflen -= dlen + 1;
+ if (*buflen < 0)
+ return -ENAMETOOLONG;
p = *buffer -= dlen + 1;
*p++ = '/';
while (dlen--) {

2014-04-01 04:14:17

by Greg Kroah-Hartman

Subject: [PATCH 3.13 13/22] net: mvneta: rename MVNETA_GMAC2_PSC_ENABLE to MVNETA_GMAC2_PCS_ENABLE

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Petazzoni <[email protected]>

commit a79121d3b57e7ad61f0b5d23eae05214054f3ccd upstream.

Bit 3 of the MVNETA_GMAC_CTRL_2 register is actually used to enable the
PCS, not the PSC: there was a typo in the name of the define, which this
commit fixes.

Signed-off-by: Thomas Petazzoni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/ethernet/marvell/mvneta.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -161,7 +161,7 @@
#define MVNETA_GMAC_MAX_RX_SIZE_MASK 0x7ffc
#define MVNETA_GMAC0_PORT_ENABLE BIT(0)
#define MVNETA_GMAC_CTRL_2 0x2c08
-#define MVNETA_GMAC2_PSC_ENABLE BIT(3)
+#define MVNETA_GMAC2_PCS_ENABLE BIT(3)
#define MVNETA_GMAC2_PORT_RGMII BIT(4)
#define MVNETA_GMAC2_PORT_RESET BIT(6)
#define MVNETA_GMAC_STATUS 0x2c10
@@ -729,7 +729,7 @@ static void mvneta_port_sgmii_config(str
u32 val;

val = mvreg_read(pp, MVNETA_GMAC_CTRL_2);
- val |= MVNETA_GMAC2_PSC_ENABLE;
+ val |= MVNETA_GMAC2_PCS_ENABLE;
mvreg_write(pp, MVNETA_GMAC_CTRL_2, val);

mvreg_write(pp, MVNETA_SGMII_SERDES_CFG, MVNETA_SGMII_SERDES_PROTO);

2014-04-01 04:14:41

by Greg Kroah-Hartman

Subject: [PATCH 3.13 14/22] net: mvneta: fix usage as a module on RGMII configurations

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Petazzoni <[email protected]>

commit e3a8786c10e75903f1269474e21fe8cb49c3a670 upstream.

Commit 5445eaf309ff ('mvneta: Try to fix mvneta when compiled as
module') fixed the mvneta driver to make it work properly when loaded
as a module in SGMII configuration, which was tested successfully by the
author on the Armada XP OpenBlocks AX3, which uses SGMII.

However, it turns out that the Armada XP GP, which uses RGMII, is
affected by a similar problem: its SERDES configuration is lost when
mvneta is loaded as a module, because this configuration is set by the
bootloader, and then lost because the clock is gated by the clock
framework until the mvneta driver is loaded again and the clock is
re-enabled.

However, it turns out that for the RGMII case, setting the SERDES
configuration is not sufficient: the PCS enable bit in the
MVNETA_GMAC_CTRL_2 register must also be set, like in the SGMII
configuration.

Therefore, this commit reworks the SGMII/RGMII initialization: the
only difference between the two now is a different SERDES
configuration, all the rest is identical.

In detail, to achieve this, the commit:

* Renames MVNETA_SGMII_SERDES_CFG to MVNETA_SERDES_CFG because it is
not specific to SGMII, but also used on RGMII configurations.

* Adds a MVNETA_RGMII_SERDES_PROTO definition, that must be used as
the MVNETA_SERDES_CFG value in RGMII configurations.

* Removes the mvneta_gmac_rgmii_set() and mvneta_port_sgmii_config()
functions, and instead does the SGMII/RGMII configuration directly in
mvneta_port_up(), from where those functions were called. It is
worth mentioning that mvneta_gmac_rgmii_set() had an 'enable'
parameter that was always passed as '1', so it was pretty useless.

* Reworks the mvneta_port_up() function to set the MVNETA_SERDES_CFG
register to the appropriate value depending on the RGMII vs. SGMII
configuration. It also unconditionally sets the PCS_ENABLE bit (this
was already done for SGMII, but is now also needed for RGMII), and sets
the PORT_RGMII bit (which was already done for both SGMII and
RGMII).

This commit was successfully tested with mvneta compiled as a module,
on both the OpenBlocks AX3 (SGMII configuration) and the Armada XP GP
(RGMII configuration).

Reported-by: Steve McIntyre <[email protected]>
Signed-off-by: Thomas Petazzoni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/net/ethernet/marvell/mvneta.c | 41 ++++++----------------------------
1 file changed, 8 insertions(+), 33 deletions(-)

--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -88,8 +88,9 @@
#define MVNETA_TX_IN_PRGRS BIT(1)
#define MVNETA_TX_FIFO_EMPTY BIT(8)
#define MVNETA_RX_MIN_FRAME_SIZE 0x247c
-#define MVNETA_SGMII_SERDES_CFG 0x24A0
+#define MVNETA_SERDES_CFG 0x24A0
#define MVNETA_SGMII_SERDES_PROTO 0x0cc7
+#define MVNETA_RGMII_SERDES_PROTO 0x0667
#define MVNETA_TYPE_PRIO 0x24bc
#define MVNETA_FORCE_UNI BIT(21)
#define MVNETA_TXQ_CMD_1 0x24e4
@@ -706,35 +707,6 @@ static void mvneta_rxq_bm_disable(struct
mvreg_write(pp, MVNETA_RXQ_CONFIG_REG(rxq->id), val);
}

-
-
-/* Sets the RGMII Enable bit (RGMIIEn) in port MAC control register */
-static void mvneta_gmac_rgmii_set(struct mvneta_port *pp, int enable)
-{
- u32 val;
-
- val = mvreg_read(pp, MVNETA_GMAC_CTRL_2);
-
- if (enable)
- val |= MVNETA_GMAC2_PORT_RGMII;
- else
- val &= ~MVNETA_GMAC2_PORT_RGMII;
-
- mvreg_write(pp, MVNETA_GMAC_CTRL_2, val);
-}
-
-/* Config SGMII port */
-static void mvneta_port_sgmii_config(struct mvneta_port *pp)
-{
- u32 val;
-
- val = mvreg_read(pp, MVNETA_GMAC_CTRL_2);
- val |= MVNETA_GMAC2_PCS_ENABLE;
- mvreg_write(pp, MVNETA_GMAC_CTRL_2, val);
-
- mvreg_write(pp, MVNETA_SGMII_SERDES_CFG, MVNETA_SGMII_SERDES_PROTO);
-}
-
/* Start the Ethernet port RX and TX activity */
static void mvneta_port_up(struct mvneta_port *pp)
{
@@ -2729,12 +2701,15 @@ static void mvneta_port_power_up(struct
mvreg_write(pp, MVNETA_UNIT_INTR_CAUSE, 0);

if (phy_mode == PHY_INTERFACE_MODE_SGMII)
- mvneta_port_sgmii_config(pp);
+ mvreg_write(pp, MVNETA_SERDES_CFG, MVNETA_SGMII_SERDES_PROTO);
+ else
+ mvreg_write(pp, MVNETA_SERDES_CFG, MVNETA_RGMII_SERDES_PROTO);

- mvneta_gmac_rgmii_set(pp, 1);
+ val = mvreg_read(pp, MVNETA_GMAC_CTRL_2);
+
+ val |= MVNETA_GMAC2_PCS_ENABLE | MVNETA_GMAC2_PORT_RGMII;

/* Cancel Port Reset */
- val = mvreg_read(pp, MVNETA_GMAC_CTRL_2);
val &= ~MVNETA_GMAC2_PORT_RESET;
mvreg_write(pp, MVNETA_GMAC_CTRL_2, val);


2014-04-01 04:15:01

by Greg Kroah-Hartman

Subject: [PATCH 3.13 15/22] random32: avoid attempt to late reseed if in the middle of seeding

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sasha Levin <[email protected]>

commit 05efa8c943b1d5d90fa8c8147571837573338bb6 upstream.

Commit 4af712e8df ("random32: add prandom_reseed_late() and call when
nonblocking pool becomes initialized") has added a late reseed stage
that happens as soon as the nonblocking pool is marked as initialized.

This fails in the case that the nonblocking pool gets initialized
during __prandom_reseed()'s call to get_random_bytes(). In that case
we'd double back into __prandom_reseed() in an attempt to do a late
reseed - deadlocking on 'lock' early on in the boot process.

Instead, just avoid even waiting to do a reseed if a reseed is already
occurring.
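
A hedged pthreads analogue of the fix (the kernel uses
spin_trylock_irqsave(); the demo_* names are invented):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

static void demo_reseed(int late)
{
        /* If the lock is already held, possibly by this very context
         * further up the call chain, skip the reseed rather than block
         * and deadlock on it. */
        if (pthread_mutex_trylock(&demo_lock) != 0)
                return;

        printf("reseed(late=%d) mixing entropy\n", late);

        pthread_mutex_unlock(&demo_lock);
}

int main(void)
{
        demo_reseed(0);
        return 0;
}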

Fixes: 4af712e8df99 ("random32: add prandom_reseed_late() and call when nonblocking pool becomes initialized")
Signed-off-by: Sasha Levin <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
lib/random32.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

--- a/lib/random32.c
+++ b/lib/random32.c
@@ -244,8 +244,19 @@ static void __prandom_reseed(bool late)
static bool latch = false;
static DEFINE_SPINLOCK(lock);

+ /* Asking for random bytes might result in bytes getting
+ * moved into the nonblocking pool and thus marking it
+ * as initialized. In this case we would double back into
+ * this function and attempt to do a late reseed.
+ * Ignore the pointless attempt to reseed again if we're
+ * already waiting for bytes when the nonblocking pool
+ * got initialized.
+ */
+
/* only allow initial seeding (late == false) once */
- spin_lock_irqsave(&lock, flags);
+ if (!spin_trylock_irqsave(&lock, flags))
+ return;
+
if (latch && !late)
goto out;
latch = true;

2014-04-01 04:15:27

by Greg Kroah-Hartman

Subject: [PATCH 3.13 16/22] resizable namespace.c hashes

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 0818bf27c05b2de56c5b2bd08cfae2a939bd5f52 upstream.

* switch allocation to alloc_large_system_hash()
* make sizes overridable by boot parameters (mhash_entries=, mphash_entries=)
* switch mountpoint_hashtable from list_head to hlist_head
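
For example, a kernel booted with something like mhash_entries=131072
mphash_entries=16384 on its command line would size the two tables
explicitly (the numbers here are purely illustrative); without the
parameters, alloc_large_system_hash() picks sizes scaled to the available
memory.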

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/mount.h | 2 -
fs/namespace.c | 81 ++++++++++++++++++++++++++++++++++++++++-----------------
2 files changed, 59 insertions(+), 24 deletions(-)

--- a/fs/mount.h
+++ b/fs/mount.h
@@ -19,7 +19,7 @@ struct mnt_pcp {
};

struct mountpoint {
- struct list_head m_hash;
+ struct hlist_node m_hash;
struct dentry *m_dentry;
int m_count;
};
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -23,11 +23,34 @@
#include <linux/uaccess.h>
#include <linux/proc_ns.h>
#include <linux/magic.h>
+#include <linux/bootmem.h>
#include "pnode.h"
#include "internal.h"

-#define HASH_SHIFT ilog2(PAGE_SIZE / sizeof(struct list_head))
-#define HASH_SIZE (1UL << HASH_SHIFT)
+static unsigned int m_hash_mask __read_mostly;
+static unsigned int m_hash_shift __read_mostly;
+static unsigned int mp_hash_mask __read_mostly;
+static unsigned int mp_hash_shift __read_mostly;
+
+static __initdata unsigned long mhash_entries;
+static int __init set_mhash_entries(char *str)
+{
+ if (!str)
+ return 0;
+ mhash_entries = simple_strtoul(str, &str, 0);
+ return 1;
+}
+__setup("mhash_entries=", set_mhash_entries);
+
+static __initdata unsigned long mphash_entries;
+static int __init set_mphash_entries(char *str)
+{
+ if (!str)
+ return 0;
+ mphash_entries = simple_strtoul(str, &str, 0);
+ return 1;
+}
+__setup("mphash_entries=", set_mphash_entries);

static int event;
static DEFINE_IDA(mnt_id_ida);
@@ -37,7 +60,7 @@ static int mnt_id_start = 0;
static int mnt_group_start = 1;

static struct list_head *mount_hashtable __read_mostly;
-static struct list_head *mountpoint_hashtable __read_mostly;
+static struct hlist_head *mountpoint_hashtable __read_mostly;
static struct kmem_cache *mnt_cache __read_mostly;
static DECLARE_RWSEM(namespace_sem);

@@ -55,12 +78,19 @@ EXPORT_SYMBOL_GPL(fs_kobj);
*/
__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);

-static inline unsigned long hash(struct vfsmount *mnt, struct dentry *dentry)
+static inline struct list_head *m_hash(struct vfsmount *mnt, struct dentry *dentry)
{
unsigned long tmp = ((unsigned long)mnt / L1_CACHE_BYTES);
tmp += ((unsigned long)dentry / L1_CACHE_BYTES);
- tmp = tmp + (tmp >> HASH_SHIFT);
- return tmp & (HASH_SIZE - 1);
+ tmp = tmp + (tmp >> m_hash_shift);
+ return &mount_hashtable[tmp & m_hash_mask];
+}
+
+static inline struct hlist_head *mp_hash(struct dentry *dentry)
+{
+ unsigned long tmp = ((unsigned long)dentry / L1_CACHE_BYTES);
+ tmp = tmp + (tmp >> mp_hash_shift);
+ return &mountpoint_hashtable[tmp & mp_hash_mask];
}

/*
@@ -575,7 +605,7 @@ bool legitimize_mnt(struct vfsmount *bas
*/
struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
{
- struct list_head *head = mount_hashtable + hash(mnt, dentry);
+ struct list_head *head = m_hash(mnt, dentry);
struct mount *p;

list_for_each_entry_rcu(p, head, mnt_hash)
@@ -590,7 +620,7 @@ struct mount *__lookup_mnt(struct vfsmou
*/
struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
{
- struct list_head *head = mount_hashtable + hash(mnt, dentry);
+ struct list_head *head = m_hash(mnt, dentry);
struct mount *p;

list_for_each_entry_reverse(p, head, mnt_hash)
@@ -633,11 +663,11 @@ struct vfsmount *lookup_mnt(struct path

static struct mountpoint *new_mountpoint(struct dentry *dentry)
{
- struct list_head *chain = mountpoint_hashtable + hash(NULL, dentry);
+ struct hlist_head *chain = mp_hash(dentry);
struct mountpoint *mp;
int ret;

- list_for_each_entry(mp, chain, m_hash) {
+ hlist_for_each_entry(mp, chain, m_hash) {
if (mp->m_dentry == dentry) {
/* might be worth a WARN_ON() */
if (d_unlinked(dentry))
@@ -659,7 +689,7 @@ static struct mountpoint *new_mountpoint

mp->m_dentry = dentry;
mp->m_count = 1;
- list_add(&mp->m_hash, chain);
+ hlist_add_head(&mp->m_hash, chain);
return mp;
}

@@ -670,7 +700,7 @@ static void put_mountpoint(struct mountp
spin_lock(&dentry->d_lock);
dentry->d_flags &= ~DCACHE_MOUNTED;
spin_unlock(&dentry->d_lock);
- list_del(&mp->m_hash);
+ hlist_del(&mp->m_hash);
kfree(mp);
}
}
@@ -739,8 +769,7 @@ static void attach_mnt(struct mount *mnt
struct mountpoint *mp)
{
mnt_set_mountpoint(parent, mp, mnt);
- list_add_tail(&mnt->mnt_hash, mount_hashtable +
- hash(&parent->mnt, mp->m_dentry));
+ list_add_tail(&mnt->mnt_hash, m_hash(&parent->mnt, mp->m_dentry));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
}

@@ -762,8 +791,8 @@ static void commit_tree(struct mount *mn

list_splice(&head, n->list.prev);

- list_add_tail(&mnt->mnt_hash, mount_hashtable +
- hash(&parent->mnt, mnt->mnt_mountpoint));
+ list_add_tail(&mnt->mnt_hash,
+ m_hash(&parent->mnt, mnt->mnt_mountpoint));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
touch_mnt_namespace(n);
}
@@ -2777,18 +2806,24 @@ void __init mnt_init(void)
mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct mount),
0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);

- mount_hashtable = (struct list_head *)__get_free_page(GFP_ATOMIC);
- mountpoint_hashtable = (struct list_head *)__get_free_page(GFP_ATOMIC);
+ mount_hashtable = alloc_large_system_hash("Mount-cache",
+ sizeof(struct list_head),
+ mhash_entries, 19,
+ 0,
+ &m_hash_shift, &m_hash_mask, 0, 0);
+ mountpoint_hashtable = alloc_large_system_hash("Mountpoint-cache",
+ sizeof(struct hlist_head),
+ mphash_entries, 19,
+ 0,
+ &mp_hash_shift, &mp_hash_mask, 0, 0);

if (!mount_hashtable || !mountpoint_hashtable)
panic("Failed to allocate mount hash table\n");

- printk(KERN_INFO "Mount-cache hash table entries: %lu\n", HASH_SIZE);
-
- for (u = 0; u < HASH_SIZE; u++)
+ for (u = 0; u <= m_hash_mask; u++)
INIT_LIST_HEAD(&mount_hashtable[u]);
- for (u = 0; u < HASH_SIZE; u++)
- INIT_LIST_HEAD(&mountpoint_hashtable[u]);
+ for (u = 0; u <= mp_hash_mask; u++)
+ INIT_HLIST_HEAD(&mountpoint_hashtable[u]);

err = sysfs_init();
if (err)

2014-04-01 04:15:35

by Greg Kroah-Hartman

Subject: [PATCH 3.13 18/22] don't bother with propagate_mnt() unless the target is shared

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 0b1b901b5a98bb36943d10820efc796f7cd45ff3 upstream.

If the dest_mnt is not shared, propagate_mnt() does nothing -
there are no mounts to propagate to and thus no copies to create.
We might as well not bother calling it in that case.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/namespace.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1653,16 +1653,14 @@ static int attach_recursive_mnt(struct m
err = invent_group_ids(source_mnt, true);
if (err)
goto out;
- }
- err = propagate_mnt(dest_mnt, dest_mp, source_mnt, &tree_list);
- if (err)
- goto out_cleanup_ids;
-
- lock_mount_hash();
-
- if (IS_MNT_SHARED(dest_mnt)) {
+ err = propagate_mnt(dest_mnt, dest_mp, source_mnt, &tree_list);
+ if (err)
+ goto out_cleanup_ids;
+ lock_mount_hash();
for (p = source_mnt; p; p = next_mnt(p, source_mnt))
set_mnt_shared(p);
+ } else {
+ lock_mount_hash();
}
if (parent_path) {
detach_mnt(source_mnt, parent_path);
@@ -1685,8 +1683,7 @@ static int attach_recursive_mnt(struct m
return 0;

out_cleanup_ids:
- if (IS_MNT_SHARED(dest_mnt))
- cleanup_group_ids(source_mnt, NULL);
+ cleanup_group_ids(source_mnt, NULL);
out:
return err;
}

2014-04-01 04:15:46

by Greg Kroah-Hartman

Subject: [PATCH 3.13 20/22] mm: close PageTail race

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: David Rientjes <[email protected]>

commit 668f9abbd4334e6c29fa8acd71635c4f9101caa7 upstream.

Commit bf6bddf1924e ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).

This results in a very rare NULL pointer dereference on the
aforementioned page_count(page). Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.

This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation. This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling. The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.

This is the safest way to ensure we see the head page that we are
expecting; PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.

Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.
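
A hedged C11 analogue of that barrier pairing (illustrative only: the
kernel code uses smp_wmb()/smp_rmb() on the real struct page, while the
demo_* types below are invented):

#include <stdatomic.h>
#include <stdbool.h>

struct demo_page {
        _Atomic(struct demo_page *) first_page;
        atomic_bool tail;
};

/* Writer, analogous to prep_compound_page(): publish first_page strictly
 * before the tail flag can be observed. */
static void demo_prep_tail(struct demo_page *p, struct demo_page *head)
{
        atomic_store_explicit(&p->first_page, head, memory_order_relaxed);
        atomic_thread_fence(memory_order_release);      /* smp_wmb() */
        atomic_store_explicit(&p->tail, true, memory_order_relaxed);
}

/* Reader, analogous to compound_head(): re-check the flag after loading
 * the pointer, so a stale first_page is never returned. */
static struct demo_page *demo_compound_head(struct demo_page *p)
{
        if (atomic_load_explicit(&p->tail, memory_order_relaxed)) {
                struct demo_page *head =
                        atomic_load_explicit(&p->first_page,
                                             memory_order_relaxed);
                atomic_thread_fence(memory_order_acquire);  /* smp_rmb() */
                if (atomic_load_explicit(&p->tail, memory_order_relaxed))
                        return head;
        }
        return p;
}

int main(void)
{
        struct demo_page head = { 0 }, tail = { 0 };

        demo_prep_tail(&tail, &head);
        return demo_compound_head(&tail) == &head ? 0 : 1;
}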

Signed-off-by: David Rientjes <[email protected]>
Cc: Holger Kiehl <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rafael Aquini <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>


---
drivers/block/aoe/aoecmd.c | 4 ++--
drivers/vfio/vfio_iommu_type1.c | 4 ++--
fs/proc/page.c | 2 +-
include/linux/huge_mm.h | 18 ------------------
include/linux/mm.h | 14 ++++++++++++--
mm/ksm.c | 2 +-
mm/memory-failure.c | 2 +-
mm/page_alloc.c | 4 +++-
mm/swap.c | 4 ++--
9 files changed, 24 insertions(+), 30 deletions(-)

--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -905,7 +905,7 @@ bio_pageinc(struct bio *bio)
/* Non-zero page count for non-head members of
* compound pages is no longer allowed by the kernel.
*/
- page = compound_trans_head(bv->bv_page);
+ page = compound_head(bv->bv_page);
atomic_inc(&page->_count);
}
}
@@ -918,7 +918,7 @@ bio_pagedec(struct bio *bio)
int i;

bio_for_each_segment(bv, bio, i) {
- page = compound_trans_head(bv->bv_page);
+ page = compound_head(bv->bv_page);
atomic_dec(&page->_count);
}
}
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -186,12 +186,12 @@ static bool is_invalid_reserved_pfn(unsi
if (pfn_valid(pfn)) {
bool reserved;
struct page *tail = pfn_to_page(pfn);
- struct page *head = compound_trans_head(tail);
+ struct page *head = compound_head(tail);
reserved = !!(PageReserved(head));
if (head != tail) {
/*
* "head" is not a dangling pointer
- * (compound_trans_head takes care of that)
+ * (compound_head takes care of that)
* but the hugepage may have been split
* from under us (and we may not hold a
* reference count on the head page so it can
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -121,7 +121,7 @@ u64 stable_page_flags(struct page *page)
* just checks PG_head/PG_tail, so we need to check PageLRU to make
* sure a given page is a thp, not a non-huge compound page.
*/
- else if (PageTransCompound(page) && PageLRU(compound_trans_head(page)))
+ else if (PageTransCompound(page) && PageLRU(compound_head(page)))
u |= 1 << KPF_THP;

/*
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -157,23 +157,6 @@ static inline int hpage_nr_pages(struct
return HPAGE_PMD_NR;
return 1;
}
-static inline struct page *compound_trans_head(struct page *page)
-{
- if (PageTail(page)) {
- struct page *head;
- head = page->first_page;
- smp_rmb();
- /*
- * head may be a dangling pointer.
- * __split_huge_page_refcount clears PageTail before
- * overwriting first_page, so if PageTail is still
- * there it means the head pointer isn't dangling.
- */
- if (PageTail(page))
- return head;
- }
- return page;
-}

extern int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pmd_t pmd, pmd_t *pmdp);
@@ -203,7 +186,6 @@ static inline int split_huge_page(struct
do { } while (0)
#define split_huge_page_pmd_mm(__mm, __address, __pmd) \
do { } while (0)
-#define compound_trans_head(page) compound_head(page)
static inline int hugepage_madvise(struct vm_area_struct *vma,
unsigned long *vm_flags, int advice)
{
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -389,8 +389,18 @@ static inline void compound_unlock_irqre

static inline struct page *compound_head(struct page *page)
{
- if (unlikely(PageTail(page)))
- return page->first_page;
+ if (unlikely(PageTail(page))) {
+ struct page *head = page->first_page;
+
+ /*
+ * page->first_page may be a dangling pointer to an old
+ * compound page, so recheck that it is still a tail
+ * page before returning.
+ */
+ smp_rmb();
+ if (likely(PageTail(page)))
+ return head;
+ }
return page;
}

--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -444,7 +444,7 @@ static void break_cow(struct rmap_item *
static struct page *page_trans_compound_anon(struct page *page)
{
if (PageTransCompound(page)) {
- struct page *head = compound_trans_head(page);
+ struct page *head = compound_head(page);
/*
* head may actually be splitted and freed from under
* us but it's ok here.
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1645,7 +1645,7 @@ int soft_offline_page(struct page *page,
{
int ret;
unsigned long pfn = page_to_pfn(page);
- struct page *hpage = compound_trans_head(page);
+ struct page *hpage = compound_head(page);

if (PageHWPoison(page)) {
pr_info("soft offline: %#lx page already poisoned\n", pfn);
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -369,9 +369,11 @@ void prep_compound_page(struct page *pag
__SetPageHead(page);
for (i = 1; i < nr_pages; i++) {
struct page *p = page + i;
- __SetPageTail(p);
set_page_count(p, 0);
p->first_page = page;
+ /* Make sure p->first_page is always valid for PageTail() */
+ smp_wmb();
+ __SetPageTail(p);
}
}

--- a/mm/swap.c
+++ b/mm/swap.c
@@ -84,7 +84,7 @@ static void put_compound_page(struct pag
{
if (unlikely(PageTail(page))) {
/* __split_huge_page_refcount can run under us */
- struct page *page_head = compound_trans_head(page);
+ struct page *page_head = compound_head(page);

if (likely(page != page_head &&
get_page_unless_zero(page_head))) {
@@ -222,7 +222,7 @@ bool __get_page_tail(struct page *page)
*/
unsigned long flags;
bool got = false;
- struct page *page_head = compound_trans_head(page);
+ struct page *page_head = compound_head(page);

if (likely(page != page_head && get_page_unless_zero(page_head))) {
/* Ref to put_compound_page() comment. */

2014-04-01 04:16:00

by Greg Kroah-Hartman

Subject: [PATCH 3.13 21/22] cgroup: protect modifications to cgroup_idr with cgroup_mutex

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Li Zefan <[email protected]>

commit 0ab02ca8f887908152d1a96db5130fc661d36a1e upstream.

Setup cgroupfs like this:
# mount -t cgroup -o cpuacct xxx /cgroup
# mkdir /cgroup/sub1
# mkdir /cgroup/sub2

Then run these two commands:
# for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /cgroup/sub1/tmp; } &
# for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /cgroup/sub2/tmp; } &

After seconds you may see this warning:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
idr_remove called for id=6 which is not allocated.
...
Call Trace:
[<ffffffff8156063c>] dump_stack+0x7a/0x96
[<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
[<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
[<ffffffff81300aa7>] sub_remove+0x87/0x1b0
[<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
[<ffffffff81300bf5>] idr_remove+0x25/0xd0
[<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
[<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
[<ffffffff8107cdbc>] process_one_work+0x26c/0x550
[<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
[<ffffffff81085f96>] kthread+0xe6/0xf0
[<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
---[ end trace 2d1577ec10cf80d0 ]---

This is because allocating/removing a cgroup ID is not properly
synchronized.

The bug was introduced when we converted cgroup_ida to cgroup_idr.
While synchronization is already done inside ida_simple_{get,remove}(),
users are responsible for synchronizing concurrent calls to
idr_{alloc,remove}() themselves.
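
As a hedged userspace analogue of that rule (all demo_* names invented):
an allocator whose internal state is not self-synchronizing must have every
alloc and remove wrapped in the same lock, which is the role cgroup_mutex
plays here.

#include <pthread.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned long demo_bitmap;       /* 64 ids; not itself thread-safe */

static int demo_id_alloc(void)
{
        int id;

        pthread_mutex_lock(&demo_mutex);
        for (id = 0; id < 64; id++)
                if (!(demo_bitmap & (1UL << id)))
                        break;
        if (id < 64)
                demo_bitmap |= 1UL << id;
        pthread_mutex_unlock(&demo_mutex);

        return id < 64 ? id : -1;
}

static void demo_id_remove(int id)
{
        pthread_mutex_lock(&demo_mutex);
        demo_bitmap &= ~(1UL << id);
        pthread_mutex_unlock(&demo_mutex);
}

int main(void)
{
        int a = demo_id_alloc();
        int b = demo_id_alloc();

        demo_id_remove(a);
        demo_id_remove(b);
        return 0;
}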

tj: Refreshed on top of b58c89986a77 ("cgroup: fix error return from
cgroup_create()").

[[email protected]: ported to 3.12]
Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr")
Cc: <[email protected]> #3.12+
Reported-by: Michal Hocko <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/cgroup.h | 2 ++
kernel/cgroup.c | 26 +++++++++++++-------------
2 files changed, 15 insertions(+), 13 deletions(-)

--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -169,6 +169,8 @@ struct cgroup {
*
* The ID of the root cgroup is always 0, and a new cgroup
* will be assigned with a smallest available ID.
+ *
+ * Allocating/Removing ID must be protected by cgroup_mutex.
*/
int id;

--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4363,16 +4363,6 @@ static long cgroup_create(struct cgroup
rcu_assign_pointer(cgrp->name, name);

/*
- * Temporarily set the pointer to NULL, so idr_find() won't return
- * a half-baked cgroup.
- */
- cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
- if (cgrp->id < 0) {
- err = -ENOMEM;
- goto err_free_name;
- }
-
- /*
* Only live parents can have children. Note that the liveliness
* check isn't strictly necessary because cgroup_mkdir() and
* cgroup_rmdir() are fully synchronized by i_mutex; however, do it
@@ -4381,7 +4371,7 @@ static long cgroup_create(struct cgroup
*/
if (!cgroup_lock_live_group(parent)) {
err = -ENODEV;
- goto err_free_id;
+ goto err_free_name;
}

/* Grab a reference on the superblock so the hierarchy doesn't
@@ -4391,6 +4381,16 @@ static long cgroup_create(struct cgroup
* fs */
atomic_inc(&sb->s_active);

+ /*
+ * Temporarily set the pointer to NULL, so idr_find() won't return
+ * a half-baked cgroup.
+ */
+ cgrp->id = idr_alloc(&root->cgroup_idr, NULL, 1, 0, GFP_KERNEL);
+ if (cgrp->id < 0) {
+ err = -ENOMEM;
+ goto err_unlock;
+ }
+
init_cgroup_housekeeping(cgrp);

dentry->d_fsdata = cgrp;
@@ -4491,11 +4491,11 @@ err_free_all:
ss->css_free(css);
}
}
+ idr_remove(&root->cgroup_idr, cgrp->id);
+err_unlock:
mutex_unlock(&cgroup_mutex);
/* Release the reference count that we took on the superblock */
deactivate_super(sb);
-err_free_id:
- idr_remove(&root->cgroup_idr, cgrp->id);
err_free_name:
kfree(rcu_dereference_raw(cgrp->name));
err_free_cgrp:

2014-04-01 04:16:25

by Greg Kroah-Hartman

Subject: [PATCH 3.13 17/22] keep shadowed vfsmounts together

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 1d6a32acd70ab18499829c0a9a5dbe2bace72a13 upstream.

Preparation for switching mnt_hash to an hlist.

Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/namespace.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -621,12 +621,20 @@ struct mount *__lookup_mnt(struct vfsmou
struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
{
struct list_head *head = m_hash(mnt, dentry);
- struct mount *p;
+ struct mount *p, *res = NULL;

- list_for_each_entry_reverse(p, head, mnt_hash)
+ list_for_each_entry(p, head, mnt_hash)
if (&p->mnt_parent->mnt == mnt && p->mnt_mountpoint == dentry)
- return p;
- return NULL;
+ goto found;
+ return res;
+found:
+ res = p;
+ list_for_each_entry_continue(p, head, mnt_hash) {
+ if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
+ break;
+ res = p;
+ }
+ return res;
}

/*
@@ -769,14 +777,14 @@ static void attach_mnt(struct mount *mnt
struct mountpoint *mp)
{
mnt_set_mountpoint(parent, mp, mnt);
- list_add_tail(&mnt->mnt_hash, m_hash(&parent->mnt, mp->m_dentry));
+ list_add(&mnt->mnt_hash, m_hash(&parent->mnt, mp->m_dentry));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
}

/*
* vfsmount lock must be held for write
*/
-static void commit_tree(struct mount *mnt)
+static void commit_tree(struct mount *mnt, struct mount *shadows)
{
struct mount *parent = mnt->mnt_parent;
struct mount *m;
@@ -791,7 +799,10 @@ static void commit_tree(struct mount *mn

list_splice(&head, n->list.prev);

- list_add_tail(&mnt->mnt_hash,
+ if (shadows)
+ list_add(&mnt->mnt_hash, &shadows->mnt_hash);
+ else
+ list_add(&mnt->mnt_hash,
m_hash(&parent->mnt, mnt->mnt_mountpoint));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
touch_mnt_namespace(n);
@@ -1659,12 +1670,15 @@ static int attach_recursive_mnt(struct m
touch_mnt_namespace(source_mnt->mnt_ns);
} else {
mnt_set_mountpoint(dest_mnt, dest_mp, source_mnt);
- commit_tree(source_mnt);
+ commit_tree(source_mnt, NULL);
}

list_for_each_entry_safe(child, p, &tree_list, mnt_hash) {
+ struct mount *q;
list_del_init(&child->mnt_hash);
- commit_tree(child);
+ q = __lookup_mnt_last(&child->mnt_parent->mnt,
+ child->mnt_mountpoint);
+ commit_tree(child, q);
}
unlock_mount_hash();


2014-04-01 04:16:44

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.13 22/22] netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <[email protected]>

commit b22f5126a24b3b2f15448c3f2a254fc10cbc2b92 upstream.

Some occurrences in the netfilter tree use skb_header_pointer() in
the following way ...

struct dccp_hdr _dh, *dh;
...
skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);

... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &_dh as the fourth argument so that
we're copying the data into an actual buffer that sits on the stack.

Currently, we probably could overwrite memory on the stack (e.g.
with a possibly malformed DCCP packet), but unintentionally, as
we only want the data to be placed into the _dh variable.
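
Reduced to a hedged userspace sketch, with header_pointer() as a simplified
stand-in for skb_header_pointer() (which copies into the supplied buffer
when the requested bytes are not linearly available):

#include <string.h>

struct demo_hdr {
        unsigned char type;
        unsigned char len;
};

static void *header_pointer(const void *pkt, size_t off, size_t len,
                            void *buf)
{
        memcpy(buf, (const char *)pkt + off, len);      /* simplified */
        return buf;
}

static int demo_parse(const void *pkt)
{
        struct demo_hdr _dh, *dh;

        dh = header_pointer(pkt, 0, sizeof(_dh), &_dh); /* correct: &_dh */
        /*
         * dh = header_pointer(pkt, 0, sizeof(_dh), &dh);
         *
         * BUG: that would copy sizeof(_dh) bytes over the pointer
         * variable itself (and whatever stack follows it), which is
         * exactly the misuse the patch removes.
         */
        return dh->type;
}

int main(void)
{
        unsigned char pkt[2] = { 7, 2 };

        return demo_parse(pkt) == 7 ? 0 : 1;
}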

Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
net/netfilter/nf_conntrack_proto_dccp.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -428,7 +428,7 @@ static bool dccp_new(struct nf_conn *ct,
const char *msg;
u_int8_t state;

- dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
+ dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);
BUG_ON(dh == NULL);

state = dccp_state_table[CT_DCCP_ROLE_CLIENT][dh->dccph_type][CT_DCCP_NONE];
@@ -486,7 +486,7 @@ static int dccp_packet(struct nf_conn *c
u_int8_t type, old_state, new_state;
enum ct_dccp_roles role;

- dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
+ dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);
BUG_ON(dh == NULL);
type = dh->dccph_type;

@@ -577,7 +577,7 @@ static int dccp_error(struct net *net, s
unsigned int cscov;
const char *msg;

- dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &dh);
+ dh = skb_header_pointer(skb, dataoff, sizeof(_dh), &_dh);
if (dh == NULL) {
msg = "nf_ct_dccp: short packet ";
goto out_invalid;
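
The bug class is easier to see outside the kernel. Below is a
userspace sketch, with header_pointer() as a hypothetical stand-in
for skb_header_pointer(): passing &dh hands the helper the address of
the pointer variable itself, so the copy would clobber the pointer
(and the stack around it) instead of filling the _dh buffer:

/* Userspace stand-in for skb_header_pointer(): copy 'need' bytes of
 * the packet into 'buffer' and return it, or NULL if too short. */
#include <stdio.h>
#include <string.h>

struct dccp_hdr { unsigned char raw[16]; };

static void *header_pointer(const void *pkt, size_t len, size_t need,
			    void *buffer)
{
	if (need > len)
		return NULL;
	memcpy(buffer, pkt, need);
	return buffer;
}

int main(void)
{
	unsigned char packet[64] = { 0xab };
	struct dccp_hdr _dh, *dh;

	/* Wrong (the pre-patch pattern): &dh is the address of the
	 * pointer itself (8 bytes on 64-bit), yet sizeof(_dh) == 16
	 * bytes would be copied over it:
	 * dh = header_pointer(packet, sizeof(packet), sizeof(_dh), &dh);
	 */

	/* Right: &_dh is the actual on-stack buffer. */
	dh = header_pointer(packet, sizeof(packet), sizeof(_dh), &_dh);
	printf("first byte: %#x\n", dh->raw[0]);
	return 0;
}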

2014-04-01 04:17:32

by Greg Kroah-Hartman

Subject: [PATCH 3.13 19/22] switch mnt_hash to hlist

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Al Viro <[email protected]>

commit 38129a13e6e71f666e0468e99fdd932a687b4d7e upstream.

Fixes an RCU bug: walking through an hlist is safe in the face of
element moves, since it is self-terminating. Cyclic lists are not -
if we end up jumping to another hash chain, we will loop infinitely
without ever hitting the original list head.

[fix for dumb braino folded]

Spotted by: Max Kellermann <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/mount.h | 2 -
fs/namespace.c | 79 +++++++++++++++++++++++++++++++--------------------------
fs/pnode.c | 26 ++++++++++--------
fs/pnode.h | 4 +-
4 files changed, 61 insertions(+), 50 deletions(-)

--- a/fs/mount.h
+++ b/fs/mount.h
@@ -25,7 +25,7 @@ struct mountpoint {
};

struct mount {
- struct list_head mnt_hash;
+ struct hlist_node mnt_hash;
struct mount *mnt_parent;
struct dentry *mnt_mountpoint;
struct vfsmount mnt;
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -59,7 +59,7 @@ static DEFINE_SPINLOCK(mnt_id_lock);
static int mnt_id_start = 0;
static int mnt_group_start = 1;

-static struct list_head *mount_hashtable __read_mostly;
+static struct hlist_head *mount_hashtable __read_mostly;
static struct hlist_head *mountpoint_hashtable __read_mostly;
static struct kmem_cache *mnt_cache __read_mostly;
static DECLARE_RWSEM(namespace_sem);
@@ -78,7 +78,7 @@ EXPORT_SYMBOL_GPL(fs_kobj);
*/
__cacheline_aligned_in_smp DEFINE_SEQLOCK(mount_lock);

-static inline struct list_head *m_hash(struct vfsmount *mnt, struct dentry *dentry)
+static inline struct hlist_head *m_hash(struct vfsmount *mnt, struct dentry *dentry)
{
unsigned long tmp = ((unsigned long)mnt / L1_CACHE_BYTES);
tmp += ((unsigned long)dentry / L1_CACHE_BYTES);
@@ -217,7 +217,7 @@ static struct mount *alloc_vfsmnt(const
mnt->mnt_writers = 0;
#endif

- INIT_LIST_HEAD(&mnt->mnt_hash);
+ INIT_HLIST_NODE(&mnt->mnt_hash);
INIT_LIST_HEAD(&mnt->mnt_child);
INIT_LIST_HEAD(&mnt->mnt_mounts);
INIT_LIST_HEAD(&mnt->mnt_list);
@@ -605,10 +605,10 @@ bool legitimize_mnt(struct vfsmount *bas
*/
struct mount *__lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
{
- struct list_head *head = m_hash(mnt, dentry);
+ struct hlist_head *head = m_hash(mnt, dentry);
struct mount *p;

- list_for_each_entry_rcu(p, head, mnt_hash)
+ hlist_for_each_entry_rcu(p, head, mnt_hash)
if (&p->mnt_parent->mnt == mnt && p->mnt_mountpoint == dentry)
return p;
return NULL;
@@ -620,20 +620,16 @@ struct mount *__lookup_mnt(struct vfsmou
*/
struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
{
- struct list_head *head = m_hash(mnt, dentry);
- struct mount *p, *res = NULL;
-
- list_for_each_entry(p, head, mnt_hash)
- if (&p->mnt_parent->mnt == mnt && p->mnt_mountpoint == dentry)
- goto found;
- return res;
-found:
- res = p;
- list_for_each_entry_continue(p, head, mnt_hash) {
+ struct mount *p, *res;
+ res = p = __lookup_mnt(mnt, dentry);
+ if (!p)
+ goto out;
+ hlist_for_each_entry_continue(p, mnt_hash) {
if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
break;
res = p;
}
+out:
return res;
}

@@ -750,7 +746,7 @@ static void detach_mnt(struct mount *mnt
mnt->mnt_parent = mnt;
mnt->mnt_mountpoint = mnt->mnt.mnt_root;
list_del_init(&mnt->mnt_child);
- list_del_init(&mnt->mnt_hash);
+ hlist_del_init_rcu(&mnt->mnt_hash);
put_mountpoint(mnt->mnt_mp);
mnt->mnt_mp = NULL;
}
@@ -777,7 +773,7 @@ static void attach_mnt(struct mount *mnt
struct mountpoint *mp)
{
mnt_set_mountpoint(parent, mp, mnt);
- list_add(&mnt->mnt_hash, m_hash(&parent->mnt, mp->m_dentry));
+ hlist_add_head_rcu(&mnt->mnt_hash, m_hash(&parent->mnt, mp->m_dentry));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
}

@@ -800,9 +796,9 @@ static void commit_tree(struct mount *mn
list_splice(&head, n->list.prev);

if (shadows)
- list_add(&mnt->mnt_hash, &shadows->mnt_hash);
+ hlist_add_after_rcu(&shadows->mnt_hash, &mnt->mnt_hash);
else
- list_add(&mnt->mnt_hash,
+ hlist_add_head_rcu(&mnt->mnt_hash,
m_hash(&parent->mnt, mnt->mnt_mountpoint));
list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
touch_mnt_namespace(n);
@@ -1193,26 +1189,28 @@ int may_umount(struct vfsmount *mnt)

EXPORT_SYMBOL(may_umount);

-static LIST_HEAD(unmounted); /* protected by namespace_sem */
+static HLIST_HEAD(unmounted); /* protected by namespace_sem */

static void namespace_unlock(void)
{
struct mount *mnt;
- LIST_HEAD(head);
+ struct hlist_head head = unmounted;

- if (likely(list_empty(&unmounted))) {
+ if (likely(hlist_empty(&head))) {
up_write(&namespace_sem);
return;
}

- list_splice_init(&unmounted, &head);
+ head.first->pprev = &head.first;
+ INIT_HLIST_HEAD(&unmounted);
+
up_write(&namespace_sem);

synchronize_rcu();

- while (!list_empty(&head)) {
- mnt = list_first_entry(&head, struct mount, mnt_hash);
- list_del_init(&mnt->mnt_hash);
+ while (!hlist_empty(&head)) {
+ mnt = hlist_entry(head.first, struct mount, mnt_hash);
+ hlist_del_init(&mnt->mnt_hash);
if (mnt->mnt_ex_mountpoint.mnt)
path_put(&mnt->mnt_ex_mountpoint);
mntput(&mnt->mnt);
@@ -1233,16 +1231,19 @@ static inline void namespace_lock(void)
*/
void umount_tree(struct mount *mnt, int how)
{
- LIST_HEAD(tmp_list);
+ HLIST_HEAD(tmp_list);
struct mount *p;
+ struct mount *last = NULL;

- for (p = mnt; p; p = next_mnt(p, mnt))
- list_move(&p->mnt_hash, &tmp_list);
+ for (p = mnt; p; p = next_mnt(p, mnt)) {
+ hlist_del_init_rcu(&p->mnt_hash);
+ hlist_add_head(&p->mnt_hash, &tmp_list);
+ }

if (how)
propagate_umount(&tmp_list);

- list_for_each_entry(p, &tmp_list, mnt_hash) {
+ hlist_for_each_entry(p, &tmp_list, mnt_hash) {
list_del_init(&p->mnt_expire);
list_del_init(&p->mnt_list);
__touch_mnt_namespace(p->mnt_ns);
@@ -1260,8 +1261,13 @@ void umount_tree(struct mount *mnt, int
p->mnt_mp = NULL;
}
change_mnt_propagation(p, MS_PRIVATE);
+ last = p;
+ }
+ if (last) {
+ last->mnt_hash.next = unmounted.first;
+ unmounted.first = tmp_list.first;
+ unmounted.first->pprev = &unmounted.first;
}
- list_splice(&tmp_list, &unmounted);
}

static void shrink_submounts(struct mount *mnt);
@@ -1645,8 +1651,9 @@ static int attach_recursive_mnt(struct m
struct mountpoint *dest_mp,
struct path *parent_path)
{
- LIST_HEAD(tree_list);
+ HLIST_HEAD(tree_list);
struct mount *child, *p;
+ struct hlist_node *n;
int err;

if (IS_MNT_SHARED(dest_mnt)) {
@@ -1671,9 +1678,9 @@ static int attach_recursive_mnt(struct m
commit_tree(source_mnt, NULL);
}

- list_for_each_entry_safe(child, p, &tree_list, mnt_hash) {
+ hlist_for_each_entry_safe(child, n, &tree_list, mnt_hash) {
struct mount *q;
- list_del_init(&child->mnt_hash);
+ hlist_del_init(&child->mnt_hash);
q = __lookup_mnt_last(&child->mnt_parent->mnt,
child->mnt_mountpoint);
commit_tree(child, q);
@@ -2818,7 +2825,7 @@ void __init mnt_init(void)
0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL);

mount_hashtable = alloc_large_system_hash("Mount-cache",
- sizeof(struct list_head),
+ sizeof(struct hlist_head),
mhash_entries, 19,
0,
&m_hash_shift, &m_hash_mask, 0, 0);
@@ -2832,7 +2839,7 @@ void __init mnt_init(void)
panic("Failed to allocate mount hash table\n");

for (u = 0; u <= m_hash_mask; u++)
- INIT_LIST_HEAD(&mount_hashtable[u]);
+ INIT_HLIST_HEAD(&mount_hashtable[u]);
for (u = 0; u <= mp_hash_mask; u++)
INIT_HLIST_HEAD(&mountpoint_hashtable[u]);

--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -220,14 +220,14 @@ static struct mount *get_source(struct m
* @tree_list : list of heads of trees to be attached.
*/
int propagate_mnt(struct mount *dest_mnt, struct mountpoint *dest_mp,
- struct mount *source_mnt, struct list_head *tree_list)
+ struct mount *source_mnt, struct hlist_head *tree_list)
{
struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
struct mount *m, *child;
int ret = 0;
struct mount *prev_dest_mnt = dest_mnt;
struct mount *prev_src_mnt = source_mnt;
- LIST_HEAD(tmp_list);
+ HLIST_HEAD(tmp_list);

for (m = propagation_next(dest_mnt, dest_mnt); m;
m = propagation_next(m, dest_mnt)) {
@@ -246,27 +246,29 @@ int propagate_mnt(struct mount *dest_mnt
child = copy_tree(source, source->mnt.mnt_root, type);
if (IS_ERR(child)) {
ret = PTR_ERR(child);
- list_splice(tree_list, tmp_list.prev);
+ tmp_list = *tree_list;
+ tmp_list.first->pprev = &tmp_list.first;
+ INIT_HLIST_HEAD(tree_list);
goto out;
}

if (is_subdir(dest_mp->m_dentry, m->mnt.mnt_root)) {
mnt_set_mountpoint(m, dest_mp, child);
- list_add_tail(&child->mnt_hash, tree_list);
+ hlist_add_head(&child->mnt_hash, tree_list);
} else {
/*
* This can happen if the parent mount was bind mounted
* on some subdirectory of a shared/slave mount.
*/
- list_add_tail(&child->mnt_hash, &tmp_list);
+ hlist_add_head(&child->mnt_hash, &tmp_list);
}
prev_dest_mnt = m;
prev_src_mnt = child;
}
out:
lock_mount_hash();
- while (!list_empty(&tmp_list)) {
- child = list_first_entry(&tmp_list, struct mount, mnt_hash);
+ while (!hlist_empty(&tmp_list)) {
+ child = hlist_entry(tmp_list.first, struct mount, mnt_hash);
umount_tree(child, 0);
}
unlock_mount_hash();
@@ -338,8 +340,10 @@ static void __propagate_umount(struct mo
* umount the child only if the child has no
* other children
*/
- if (child && list_empty(&child->mnt_mounts))
- list_move_tail(&child->mnt_hash, &mnt->mnt_hash);
+ if (child && list_empty(&child->mnt_mounts)) {
+ hlist_del_init_rcu(&child->mnt_hash);
+ hlist_add_before_rcu(&child->mnt_hash, &mnt->mnt_hash);
+ }
}
}

@@ -350,11 +354,11 @@ static void __propagate_umount(struct mo
*
* vfsmount lock must be held for write
*/
-int propagate_umount(struct list_head *list)
+int propagate_umount(struct hlist_head *list)
{
struct mount *mnt;

- list_for_each_entry(mnt, list, mnt_hash)
+ hlist_for_each_entry(mnt, list, mnt_hash)
__propagate_umount(mnt);
return 0;
}
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -36,8 +36,8 @@ static inline void set_mnt_shared(struct

void change_mnt_propagation(struct mount *, int);
int propagate_mnt(struct mount *, struct mountpoint *, struct mount *,
- struct list_head *);
-int propagate_umount(struct list_head *);
+ struct hlist_head *);
+int propagate_umount(struct hlist_head *);
int propagate_mount_busy(struct mount *, int);
void mnt_release_group_id(struct mount *);
int get_dominating_id(struct mount *mnt, const struct path *root);
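
The changelog's termination argument can be sketched in a few lines
of plain C (simplified types, no actual RCU): an hlist chain ends in
NULL, so a lockless walker that races with an element being moved to
another chain still terminates, whereas a circular list_head walk
terminates only on getting back to the head it started from - a head
it may never reach again after such a move:

/* Two terminator conventions, simplified.  hnode chains end in NULL
 * (like hlist); cnode chains are circular (like list_head). */
#include <stdio.h>

struct hnode { struct hnode *next; int v; };
struct cnode { struct cnode *next; int v; };

static void walk_hlist(struct hnode *first)
{
	/* Stops at NULL no matter which chain we ended up on. */
	for (struct hnode *p = first; p; p = p->next)
		printf("h %d\n", p->v);
}

static void walk_clist(struct cnode *head)
{
	/* Stops only at 'head'; if a racing move strands us on a
	 * different chain, this condition never becomes false. */
	for (struct cnode *p = head->next; p != head; p = p->next)
		printf("c %d\n", p->v);
}

int main(void)
{
	struct hnode h2 = { NULL, 2 }, h1 = { &h2, 1 };
	struct cnode head = { NULL, 0 }, c1 = { &head, 1 };

	head.next = &c1;
	walk_hlist(&h1);
	walk_clist(&head);
	return 0;
}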

2014-04-01 04:15:32

by Greg Kroah-Hartman

Subject: [PATCH 3.13 11/22] x86: fix boot on uniprocessor systems

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Artem Fetishev <[email protected]>

commit 825600c0f20e595daaa7a6dd8970f84fa2a2ee57 upstream.

On x86 uniprocessor systems topology_physical_package_id() returns -1,
which causes rapl_cpu_prepare() to leave the rapl_pmu variable
uninitialized, which in turn leads to a GPF in rapl_pmu_init().

See arch/x86/kernel/cpu/perf_event_intel_rapl.c.

It turns out that physical_package_id and core_id can actually be
retrieved for uniprocessor systems too. Enabling them also fixes the
rapl_pmu code.

Signed-off-by: Artem Fetishev <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/topology.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -119,9 +119,10 @@ static inline void setup_node_to_cpumask

extern const struct cpumask *cpu_coregroup_mask(int cpu);

-#ifdef ENABLE_TOPO_DEFINES
#define topology_physical_package_id(cpu) (cpu_data(cpu).phys_proc_id)
#define topology_core_id(cpu) (cpu_data(cpu).cpu_core_id)
+
+#ifdef ENABLE_TOPO_DEFINES
#define topology_core_cpumask(cpu) (per_cpu(cpu_core_map, cpu))
#define topology_thread_cpumask(cpu) (per_cpu(cpu_sibling_map, cpu))
#endif
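
A compressed userspace sketch of the failure mode (the fallback value
mirrors the generic one in include/linux/topology.h; the array is a
made-up stand-in for per-package RAPL state):

#include <stdio.h>

/* Without the arch-provided macro, the generic fallback is -1. */
#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu)	(-1)
#endif

int main(void)
{
	int pkg = topology_physical_package_id(0);
	int pmu_map[4] = { 0 };

	if (pkg < 0) {
		/* rapl_cpu_prepare() hit this -1 and skipped its
		 * setup, leaving rapl_pmu uninitialized; here we
		 * simply bail out instead. */
		fprintf(stderr, "no package id for cpu 0\n");
		return 1;
	}
	printf("pmu for package %d: %d\n", pkg, pmu_map[pkg]);
	return 0;
}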

2014-04-01 04:14:05

by Greg Kroah-Hartman

Subject: [PATCH 3.13 04/22] Input: synaptics - add manual min/max quirk

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Benjamin Tissoires <[email protected]>

commit 421e08c41fda1f0c2ff6af81a67b491389b653a5 upstream.

The new Lenovo Haswell series (-40's) contains a new Synaptics touchpad.
However, these new Synaptics devices report bad axis ranges.
Under Windows, it is not a problem because the Windows driver uses RMI4
over SMBus to talk to the device. Under Linux, we are using the PS/2
fallback interface, and it turns out the reported ranges are wrong.

Of course, it would be too easy to have only one range for the whole
series; each touchpad seems to be calibrated in a different way.

We cannot use SMBus to get the actual range because I suspect the firmware
will switch into the SMBus mode and stop talking through PS/2 (this is the
case for hybrid HID over I2C / PS/2 Synaptics touchpads).

So as a temporary solution (until RMI4 lands upstream), start a new
list of quirks with the min/max manually set.

Signed-off-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/input/mouse/synaptics.c | 47 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)

--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -265,11 +265,22 @@ static int synaptics_identify(struct psm
* Read touchpad resolution and maximum reported coordinates
* Resolution is left zero if touchpad does not support the query
*/
+
+static const int *quirk_min_max;
+
static int synaptics_resolution(struct psmouse *psmouse)
{
struct synaptics_data *priv = psmouse->private;
unsigned char resp[3];

+ if (quirk_min_max) {
+ priv->x_min = quirk_min_max[0];
+ priv->x_max = quirk_min_max[1];
+ priv->y_min = quirk_min_max[2];
+ priv->y_max = quirk_min_max[3];
+ return 0;
+ }
+
if (SYN_ID_MAJOR(priv->identity) < 4)
return 0;

@@ -1485,10 +1496,46 @@ static const struct dmi_system_id olpc_d
{ }
};

+static const struct dmi_system_id min_max_dmi_table[] __initconst = {
+#if defined(CONFIG_DMI)
+ {
+ /* Lenovo ThinkPad Helix */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad Helix"),
+ },
+ .driver_data = (int []){1024, 5052, 2258, 4832},
+ },
+ {
+ /* Lenovo ThinkPad T440s */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T440"),
+ },
+ .driver_data = (int []){1024, 5112, 2024, 4832},
+ },
+ {
+ /* Lenovo ThinkPad T540p */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T540"),
+ },
+ .driver_data = (int []){1024, 5056, 2058, 4832},
+ },
+#endif
+ { }
+};
+
void __init synaptics_module_init(void)
{
+ const struct dmi_system_id *min_max_dmi;
+
impaired_toshiba_kbc = dmi_check_system(toshiba_dmi_table);
broken_olpc_ec = dmi_check_system(olpc_dmi_table);
+
+ min_max_dmi = dmi_first_match(min_max_dmi_table);
+ if (min_max_dmi)
+ quirk_min_max = min_max_dmi->driver_data;
}

static int __synaptics_init(struct psmouse *psmouse, bool absolute_mode)
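
For reference, the quirk mechanism added above reduces to a
sentinel-terminated match table scanned once at init time, with
driver_data carrying the per-model values. A userspace sketch of the
same pattern (the types and the strstr() match are simplified
stand-ins for the DMI machinery; DMI_MATCH also matches substrings,
which is why "ThinkPad T440" covers the T440s):

#include <stdio.h>
#include <string.h>

struct quirk {
	const char *product;	/* stand-in for the DMI match strings */
	const int *min_max;	/* x_min, x_max, y_min, y_max */
};

static const struct quirk quirk_table[] = {
	{ "ThinkPad T440", (const int []){ 1024, 5112, 2024, 4832 } },
	{ "ThinkPad T540", (const int []){ 1024, 5056, 2058, 4832 } },
	{ NULL, NULL }		/* sentinel, like the trailing { } above */
};

static const int *quirk_min_max;

static void init_quirks(const char *product)
{
	const struct quirk *q;

	for (q = quirk_table; q->product; q++)
		if (strstr(product, q->product)) {
			quirk_min_max = q->min_max;
			return;
		}
}

int main(void)
{
	init_quirks("ThinkPad T440s");
	if (quirk_min_max)
		printf("x: %d..%d y: %d..%d\n",
		       quirk_min_max[0], quirk_min_max[1],
		       quirk_min_max[2], quirk_min_max[3]);
	return 0;
}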

2014-04-01 04:19:20

by Greg Kroah-Hartman

Subject: [PATCH 3.13 09/22] i2c: cpm: Fix build by adding of_address.h and of_irq.h

3.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Scott Wood <[email protected]>

commit 5f12c5eca6e6b7aeb4b2028d579f614b4fe7a81f upstream.

Fixes a build break due to the undeclared use of irq_of_parse_and_map()
and of_iomap(). This build break was apparently introduced while the
driver was unbuildable due to the bug fixed by
62c19c9d29e65086e5ae76df371ed2e6b23f00cd ("i2c: Remove usage of
orphaned symbol OF_I2C"). When 62c19c was added in v3.14-rc7,
the driver was enabled again, breaking the powerpc mpc85xx_defconfig
and mpc85xx_smp_defconfig.

62c19c is marked for stable, so this should go there as well.

Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Scott Wood <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/i2c/busses/i2c-cpm.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/i2c/busses/i2c-cpm.c
+++ b/drivers/i2c/busses/i2c-cpm.c
@@ -40,7 +40,9 @@
#include <linux/i2c.h>
#include <linux/io.h>
#include <linux/dma-mapping.h>
+#include <linux/of_address.h>
#include <linux/of_device.h>
+#include <linux/of_irq.h>
#include <linux/of_platform.h>
#include <sysdev/fsl_soc.h>
#include <asm/cpm.h>

2014-04-02 00:03:41

by Guenter Roeck

Subject: Re: [PATCH 3.13 00/22] 3.13.9-stable review

On 03/31/2014 09:08 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.13.9 release.
> There are 22 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Apr 3 04:06:39 UTC 2014.
> Anything received after that time might be too late.
>

Build results:
total: 126 pass: 122 skipped: 4 fail: 0

Qemu tests all passed. The previously observed powerpc failures are gone
with this version.

Details are available at http://server.roeck-us.net:8010/builders.

Guenter

2014-04-03 22:43:01

by Greg Kroah-Hartman

Subject: Re: [PATCH 3.13 00/22] 3.13.9-stable review

On Tue, Apr 01, 2014 at 05:03:34PM -0700, Guenter Roeck wrote:
> On 03/31/2014 09:08 PM, Greg Kroah-Hartman wrote:
> >This is the start of the stable review cycle for the 3.13.9 release.
> >There are 22 patches in this series, all will be posted as a response
> >to this one. If anyone has any issues with these being applied, please
> >let me know.
> >
> >Responses should be made by Thu Apr 3 04:06:39 UTC 2014.
> >Anything received after that time might be too late.
> >
>
> Build results:
> total: 126 pass: 122 skipped: 4 fail: 0
>
> Qemu tests all passed. The previously observed powerpc failures are gone
> with this version.
>
> Details are available at http://server.roeck-us.net:8010/builders.

Thanks for testing all of these.

greg k-h

2014-04-04 13:31:04

by Shuah Khan

Subject: Re: [PATCH 3.13 00/22] 3.13.9-stable review

On 04/03/2014 04:45 PM, Greg Kroah-Hartman wrote:
> On Tue, Apr 01, 2014 at 05:03:34PM -0700, Guenter Roeck wrote:
>> On 03/31/2014 09:08 PM, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 3.13.9 release.
>>> There are 22 patches in this series, all will be posted as a response
>>> to this one. If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Thu Apr 3 04:06:39 UTC 2014.
>>> Anything received after that time might be too late.
>>>
>>
>> Build results:
>> total: 126 pass: 122 skipped: 4 fail: 0
>>
>> Qemu tests all passed. The previously observed powerpc failures are gone
>> with this version.
>>
>> Details are available at http://server.roeck-us.net:8010/builders.
>
> Thanks for testing all of these.
>
> greg k-h
>

Looks like I missed responding to this one. 3.13.9 worked on all my test
systems.

-- Shuah
--
Shuah Khan
Senior Linux Kernel Developer - Open Source Group
Samsung Research America(Silicon Valley)
[email protected] | (970) 672-0658

2014-04-09 20:17:48

by Sasha Levin

Subject: Re: [PATCH 3.13 12/22] make prepend_name() work correctly when called with negative *buflen

On 04/01/2014 12:08 AM, Greg Kroah-Hartman wrote:
> 3.13-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: Al Viro <[email protected]>
>
> commit e825196d48d2b89a6ec3a8eff280098d2a78207e upstream.
>
> In all callchains leading to prepend_name(), the value left in *buflen
> is eventually discarded unused if prepend_name() has returned a negative.
> So we are free to do what prepend() does, and subtract from *buflen
> *before* checking for underflow (which turns into checking the sign
> of subtraction result, of course).
>
> Signed-off-by: Al Viro <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Hi Al, Greg,

We're working on creating Ksplice updates for the last -stable tree,
and stumbled on this patch.

The log message isn't too clear on what actually gets fixed, and
auditing the code didn't reveal the answer for that.

Could someone please describe what's the issue being solved with
this patch?


Thanks,
Sasha
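
A minimal sketch of the subtract-first pattern the quoted commit
message refers to (modeled on prepend() in fs/dcache.c; the types and
error handling are simplified): the length is subtracted from *buflen
before the bounds check, so a negative remainder signals overflow,
which is safe only because every caller discards *buflen once a
negative return is seen:

#include <stdio.h>
#include <string.h>

#define ENAMETOOLONG 36

static int prepend(char **buffer, int *buflen, const char *str,
		   int namelen)
{
	*buflen -= namelen;		/* subtract first ... */
	if (*buflen < 0)		/* ... the sign check is the bounds check */
		return -ENAMETOOLONG;
	*buffer -= namelen;
	memcpy(*buffer, str, namelen);
	return 0;
}

int main(void)
{
	char buf[16];
	char *end = buf + sizeof(buf);
	int buflen = sizeof(buf);

	if (prepend(&end, &buflen, "/etc", 4) == 0)
		printf("%.*s (room left: %d)\n", 4, end, buflen);
	return 0;
}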

2014-04-10 10:05:22

by Steven Noonan

Subject: Re: [PATCH 3.13 08/22] Revert "xen: properly account for _PAGE_NUMA during xen pte translations"

I realize it's late to protest on this given that 3.13.9 is out, but
what is the path forward for those experiencing the original issue
that the reverted commit was intended to correct?

http://marc.info/?l=linux-kernel&m=139034684731087&w=2

On Mon, Mar 31, 2014 at 9:08 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> 3.13-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
>
> From: David Vrabel <[email protected]>
>
> commit 5926f87fdaad4be3ed10cec563bf357915e55a86 upstream.
>
> This reverts commit a9c8e4beeeb64c22b84c803747487857fe424b68.
>
> PTEs in Xen PV guests must contain machine addresses if _PAGE_PRESENT
> is set and pseudo-physical addresses if _PAGE_PRESENT is clear.
>
> This is because during a domain save/restore (migration) the page
> table entries are "canonicalised" and "uncanonicalised", i.e., MFNs are
> converted to PFNs during domain save so that on a restore the page
> table entries may be rewritten with the new MFNs on the destination.
> This canonicalisation is only done for PTEs that are present.
>
> This change resulted in writing PTEs with MFNs if _PAGE_PROTNONE (or
> _PAGE_NUMA) was set but _PAGE_PRESENT was clear. These PTEs would be
> migrated as-is, which would result in unexpected behaviour in the
> destination domain. Either a) the MFN would be translated to the
> wrong PFN/page; b) setting the _PAGE_PRESENT bit would clear the PTE
> because the MFN is no longer owned by the domain; or c) the present
> bit would not get set.
>
> Symptoms include "Bad page" reports when munmapping after migrating a
> domain.
>
> Signed-off-by: David Vrabel <[email protected]>
> Acked-by: Konrad Rzeszutek Wilk <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> arch/x86/include/asm/pgtable.h | 14 ++------------
> arch/x86/xen/mmu.c | 4 ++--
> 2 files changed, 4 insertions(+), 14 deletions(-)
>
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -445,20 +445,10 @@ static inline int pte_same(pte_t a, pte_
> return a.pte == b.pte;
> }
>
> -static inline int pteval_present(pteval_t pteval)
> -{
> - /*
> - * Yes Linus, _PAGE_PROTNONE == _PAGE_NUMA. Expressing it this
> - * way clearly states that the intent is that protnone and numa
> - * hinting ptes are considered present for the purposes of
> - * pagetable operations like zapping, protection changes, gup etc.
> - */
> - return pteval & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_NUMA);
> -}
> -
> static inline int pte_present(pte_t a)
> {
> - return pteval_present(pte_flags(a));
> + return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE |
> + _PAGE_NUMA);
> }
>
> #define pte_accessible pte_accessible
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -365,7 +365,7 @@ void xen_ptep_modify_prot_commit(struct
> /* Assume pteval_t is equivalent to all the other *val_t types. */
> static pteval_t pte_mfn_to_pfn(pteval_t val)
> {
> - if (pteval_present(val)) {
> + if (val & _PAGE_PRESENT) {
> unsigned long mfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
> unsigned long pfn = mfn_to_pfn(mfn);
>
> @@ -381,7 +381,7 @@ static pteval_t pte_mfn_to_pfn(pteval_t
>
> static pteval_t pte_pfn_to_mfn(pteval_t val)
> {
> - if (pteval_present(val)) {
> + if (val & _PAGE_PRESENT) {
> unsigned long pfn = (val & PTE_PFN_MASK) >> PAGE_SHIFT;
> pteval_t flags = val & PTE_FLAGS_MASK;
> unsigned long mfn;
>
>

2014-04-10 10:08:46

by David Vrabel

Subject: Re: [PATCH 3.13 08/22] Revert "xen: properly account for _PAGE_NUMA during xen pte translations"

On 10/04/14 11:05, Steven Noonan wrote:
> I realize it's late to protest on this given that 3.13.9 is out, but
> what is the path forward for those experiencing the original issue
> that the reverted commit was intended to correct?

Disable NUMA balancing (CONFIG_NUMA_BALANCING).

Or apply

https://lkml.org/lkml/2014/4/8/173

David