2015-07-08 07:34:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 00/30] 3.14.48-stable review

This is the start of the stable review cycle for the 3.14.48 release.
There are 30 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Jul 10 07:31:09 UTC 2015.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.14.48-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 3.14.48-rc1

David E. Box <[email protected]>
x86/iosf: Add Kconfig prompt for IOSF_MBI selection

Christoffer Dall <[email protected]>
arm/arm64: KVM: Keep elrsr/aisr in sync with software model

Marc Zyngier <[email protected]>
arm64: KVM: Do not use pgd_index to index stage-2 pgd

Marc Zyngier <[email protected]>
arm64: KVM: Fix HCR setting for 32bit guests

Marc Zyngier <[email protected]>
arm64: KVM: Fix TLB invalidation by IPA/VMID

Christoffer Dall <[email protected]>
arm/arm64: KVM: Require in-kernel vgic for the arch timers

Eric W. Biederman <[email protected]>
vfs: Ignore unlocked mounts in fs_fully_visible

Eric W. Biederman <[email protected]>
vfs: Remove incorrect debugging WARN in prepend_path

Jan Kara <[email protected]>
fs: Fix S_NOSEC handling

Radim Krčmář <[email protected]>
KVM: x86: make vapics_in_nmi_mode atomic

James Hogan <[email protected]>
MIPS: Fix KVM guest fixmap address

Bjorn Helgaas <[email protected]>
x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A

Bjorn Helgaas <[email protected]>
x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing

Anton Blanchard <[email protected]>
powerpc/perf: Fix book3s kernel to userspace backtraces

Marc Zyngier <[email protected]>
arm: KVM: force execution of HCPTR access on VM exit

Joe Konno <[email protected]>
intel_pstate: set BYT MSR with wrmsrl_on_cpu()

Joerg Roedel <[email protected]>
iommu/amd: Handle large pages correctly in free_pagetable

Horia Geant? <[email protected]>
Revert "crypto: talitos - convert to use be16_add_cpu()"

Horia Geant? <[email protected]>
crypto: talitos - avoid memleak in talitos_alg_alloc()

Alexander Sverdlin <[email protected]>
sctp: Fix race between OOTB responce and route removal

Mugunthan V N <[email protected]>
net: phy: fix phy link up when limiting speed via device tree

Christoph Paasch <[email protected]>
tcp: Do not call tcp_fastopen_reset_cipher from interrupt context

Julian Anastasov <[email protected]>
neigh: do not modify unlinked entries

Willem de Bruijn <[email protected]>
packet: avoid out of bounds read in round robin fanout

Eric Dumazet <[email protected]>
packet: read num_members once in packet_rcv_fanout()

Nikolay Aleksandrov <[email protected]>
bridge: fix br_stp_set_bridge_priority race conditions

Marcelo Ricardo Leitner <[email protected]>
sctp: fix ASCONF list handling

Shaohua Li <[email protected]>
net: don't wait for order-3 page allocation

Nikolay Aleksandrov <[email protected]>
bridge: fix multicast router rlist endless loop

Sowmini Varadhan <[email protected]>
sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context


-------------

Diffstat:

Makefile | 4 +--
arch/arm/include/asm/kvm_mmu.h | 3 +-
arch/arm/kvm/arm.c | 13 +++++++--
arch/arm/kvm/interrupts.S | 10 +++----
arch/arm/kvm/interrupts_head.S | 20 ++++++++++++--
arch/arm/kvm/mmu.c | 6 ++--
arch/arm64/include/asm/kvm_emulate.h | 2 ++
arch/arm64/include/asm/kvm_mmu.h | 2 ++
arch/arm64/kvm/hyp.S | 1 +
arch/arm64/kvm/reset.c | 1 -
arch/mips/include/asm/mach-generic/spaces.h | 4 +++
arch/powerpc/perf/core-book3s.c | 11 +++++++-
arch/sparc/kernel/ldc.c | 2 +-
arch/x86/Kconfig | 14 ++++++++--
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/i8254.c | 2 +-
arch/x86/kvm/lapic.c | 4 +--
arch/x86/pci/acpi.c | 17 ++++++++++--
drivers/cpufreq/intel_pstate.c | 2 +-
drivers/crypto/talitos.c | 4 ++-
drivers/iommu/amd_iommu.c | 6 ++++
drivers/net/phy/phy_device.c | 5 ++--
fs/dcache.c | 11 --------
fs/inode.c | 4 +--
fs/namespace.c | 8 ++++--
include/kvm/arm_arch_timer.h | 10 +++----
include/net/netns/sctp.h | 1 +
include/net/sctp/structs.h | 4 +++
net/bridge/br_ioctl.c | 2 --
net/bridge/br_multicast.c | 7 ++---
net/bridge/br_stp_if.c | 4 ++-
net/core/neighbour.c | 13 +++++++++
net/core/skbuff.c | 4 ++-
net/core/sock.c | 4 ++-
net/ipv4/af_inet.c | 2 ++
net/ipv4/tcp.c | 7 +++--
net/ipv4/tcp_fastopen.c | 2 --
net/packet/af_packet.c | 20 ++------------
net/sctp/output.c | 4 ++-
net/sctp/socket.c | 43 +++++++++++++++++++++--------
virt/kvm/arm/arch_timer.c | 30 ++++++++++++++------
virt/kvm/arm/vgic.c | 10 +++++++
42 files changed, 223 insertions(+), 102 deletions(-)


2015-07-08 07:35:08

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 01/30] sparc: Use GFP_ATOMIC in ldc_alloc_exp_dring() as it can be called in softirq context

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sowmini Varadhan <[email protected]>

Upstream commit 671d773297969bebb1732e1cdc1ec03aa53c6be2

Since it is possible for vnet_event_napi to end up doing
vnet_control_pkt_engine -> ... -> vnet_send_attr ->
vnet_port_alloc_tx_ring -> ldc_alloc_exp_dring -> kzalloc()
(i.e., in softirq context), kzalloc() should be called with
GFP_ATOMIC from ldc_alloc_exp_dring.

Signed-off-by: Sowmini Varadhan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/kernel/ldc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/sparc/kernel/ldc.c
+++ b/arch/sparc/kernel/ldc.c
@@ -2307,7 +2307,7 @@ void *ldc_alloc_exp_dring(struct ldc_cha
if (len & (8UL - 1))
return ERR_PTR(-EINVAL);

- buf = kzalloc(len, GFP_KERNEL);
+ buf = kzalloc(len, GFP_ATOMIC);
if (!buf)
return ERR_PTR(-ENOMEM);


2015-07-08 08:29:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 02/30] bridge: fix multicast router rlist endless loop

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nikolay Aleksandrov <[email protected]>

[ Upstream commit 1a040eaca1a22f8da8285ceda6b5e4a2cb704867 ]

Since the addition of sysfs multicast router support if one set
multicast_router to "2" more than once, then the port would be added to
the hlist every time and could end up linking to itself and thus causing an
endless loop for rlist walkers.
So to reproduce just do:
echo 2 > multicast_router; echo 2 > multicast_router;
in a bridge port and let some igmp traffic flow, for me it hangs up
in br_multicast_flood().
Fix this by adding a check in br_multicast_add_router() if the port is
already linked.
The reason this didn't happen before the addition of multicast_router
sysfs entries is because there's a !hlist_unhashed check that prevents
it.

Signed-off-by: Nikolay Aleksandrov <[email protected]>
Fixes: 0909e11758bd ("bridge: Add multicast_router sysfs entries")
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bridge/br_multicast.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1086,6 +1086,9 @@ static void br_multicast_add_router(stru
struct net_bridge_port *p;
struct hlist_node *slot = NULL;

+ if (!hlist_unhashed(&port->rlist))
+ return;
+
hlist_for_each_entry(p, &br->router_list, rlist) {
if ((unsigned long) port >= (unsigned long) p)
break;
@@ -1113,12 +1116,8 @@ static void br_multicast_mark_router(str
if (port->multicast_router != 1)
return;

- if (!hlist_unhashed(&port->rlist))
- goto timer;
-
br_multicast_add_router(br, port);

-timer:
mod_timer(&port->multicast_router_timer,
now + br->multicast_querier_interval);
}

2015-07-08 08:29:24

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 03/30] net: dont wait for order-3 page allocation

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shaohua Li <[email protected]>

[ Upstream commit fb05e7a89f500cfc06ae277bdc911b281928995d ]

We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f7685831e0
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.

alloc_skb_with_frags is the same.

The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Debabrata Banerjee <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/skbuff.c | 4 +++-
net/core/sock.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -368,9 +368,11 @@ refill:
for (order = NETDEV_FRAG_PAGE_MAX_ORDER; ;) {
gfp_t gfp = gfp_mask;

- if (order)
+ if (order) {
gfp |= __GFP_COMP | __GFP_NOWARN |
__GFP_NOMEMALLOC;
+ gfp &= ~__GFP_WAIT;
+ }
nc->frag.page = alloc_pages(gfp, order);
if (likely(nc->frag.page))
break;
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1914,8 +1914,10 @@ bool skb_page_frag_refill(unsigned int s
do {
gfp_t gfp = prio;

- if (order)
+ if (order) {
gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NORETRY;
+ gfp &= ~__GFP_WAIT;
+ }
pfrag->page = alloc_pages(gfp, order);
if (likely(pfrag->page)) {
pfrag->offset = 0;

2015-07-08 08:29:16

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 04/30] sctp: fix ASCONF list handling

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marcelo Ricardo Leitner <[email protected]>

[ Upstream commit 2d45a02d0166caf2627fe91897c6ffc3b19514c4 ]

->auto_asconf_splist is per namespace and mangled by functions like
sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.

Also, the call to inet_sk_copy_descendant() was backuping
->auto_asconf_list through the copy but was not honoring
->do_auto_asconf, which could lead to list corruption if it was
different between both sockets.

This commit thus fixes the list handling by using ->addr_wq_lock
spinlock to protect the list. A special handling is done upon socket
creation and destruction for that. Error handlig on sctp_init_sock()
will never return an error after having initialized asconf, so
sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
will be take on sctp_close_sock(), before locking the socket, so we
don't do it in inverse order compared to sctp_addr_wq_timeout_handler().

Instead of taking the lock on sctp_sock_migrate() for copying and
restoring the list values, it's preferred to avoid rewritting it by
implementing sctp_copy_descendant().

Issue was found with a test application that kept flipping sysctl
default_auto_asconf on and off, but one could trigger it by issuing
simultaneous setsockopt() calls on multiple sockets or by
creating/destroying sockets fast enough. This is only triggerable
locally.

Fixes: 9f7d653b67ae ("sctp: Add Auto-ASCONF support (core).")
Reported-by: Ji Jianwen <[email protected]>
Suggested-by: Neil Horman <[email protected]>
Suggested-by: Hannes Frederic Sowa <[email protected]>
Acked-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/netns/sctp.h | 1 +
include/net/sctp/structs.h | 4 ++++
net/sctp/socket.c | 43 ++++++++++++++++++++++++++++++++-----------
3 files changed, 37 insertions(+), 11 deletions(-)

--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -31,6 +31,7 @@ struct netns_sctp {
struct list_head addr_waitq;
struct timer_list addr_wq_timer;
struct list_head auto_asconf_splist;
+ /* Lock that protects both addr_waitq and auto_asconf_splist */
spinlock_t addr_wq_lock;

/* Lock that protects the local_addr_list writers */
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -219,6 +219,10 @@ struct sctp_sock {
atomic_t pd_mode;
/* Receive to here while partial delivery is in effect. */
struct sk_buff_head pd_lobby;
+
+ /* These must be the last fields, as they will skipped on copies,
+ * like on accept and peeloff operations
+ */
struct list_head auto_asconf_list;
int do_auto_asconf;
};
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1532,8 +1532,10 @@ static void sctp_close(struct sock *sk,

/* Supposedly, no process has access to the socket, but
* the net layers still may.
+ * Also, sctp_destroy_sock() needs to be called with addr_wq_lock
+ * held and that should be grabbed before socket lock.
*/
- local_bh_disable();
+ spin_lock_bh(&net->sctp.addr_wq_lock);
bh_lock_sock(sk);

/* Hold the sock, since sk_common_release() will put sock_put()
@@ -1543,7 +1545,7 @@ static void sctp_close(struct sock *sk,
sk_common_release(sk);

bh_unlock_sock(sk);
- local_bh_enable();
+ spin_unlock_bh(&net->sctp.addr_wq_lock);

sock_put(sk);

@@ -3511,6 +3513,7 @@ static int sctp_setsockopt_auto_asconf(s
if ((val && sp->do_auto_asconf) || (!val && !sp->do_auto_asconf))
return 0;

+ spin_lock_bh(&sock_net(sk)->sctp.addr_wq_lock);
if (val == 0 && sp->do_auto_asconf) {
list_del(&sp->auto_asconf_list);
sp->do_auto_asconf = 0;
@@ -3519,6 +3522,7 @@ static int sctp_setsockopt_auto_asconf(s
&sock_net(sk)->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
}
+ spin_unlock_bh(&sock_net(sk)->sctp.addr_wq_lock);
return 0;
}

@@ -4009,18 +4013,28 @@ static int sctp_init_sock(struct sock *s
local_bh_disable();
percpu_counter_inc(&sctp_sockets_allocated);
sock_prot_inuse_add(net, sk->sk_prot, 1);
+
+ /* Nothing can fail after this block, otherwise
+ * sctp_destroy_sock() will be called without addr_wq_lock held
+ */
if (net->sctp.default_auto_asconf) {
+ spin_lock(&sock_net(sk)->sctp.addr_wq_lock);
list_add_tail(&sp->auto_asconf_list,
&net->sctp.auto_asconf_splist);
sp->do_auto_asconf = 1;
- } else
+ spin_unlock(&sock_net(sk)->sctp.addr_wq_lock);
+ } else {
sp->do_auto_asconf = 0;
+ }
+
local_bh_enable();

return 0;
}

-/* Cleanup any SCTP per socket resources. */
+/* Cleanup any SCTP per socket resources. Must be called with
+ * sock_net(sk)->sctp.addr_wq_lock held if sp->do_auto_asconf is true
+ */
static void sctp_destroy_sock(struct sock *sk)
{
struct sctp_sock *sp;
@@ -6973,6 +6987,19 @@ void sctp_copy_sock(struct sock *newsk,
newinet->mc_list = NULL;
}

+static inline void sctp_copy_descendant(struct sock *sk_to,
+ const struct sock *sk_from)
+{
+ int ancestor_size = sizeof(struct inet_sock) +
+ sizeof(struct sctp_sock) -
+ offsetof(struct sctp_sock, auto_asconf_list);
+
+ if (sk_from->sk_family == PF_INET6)
+ ancestor_size += sizeof(struct ipv6_pinfo);
+
+ __inet_sk_copy_descendant(sk_to, sk_from, ancestor_size);
+}
+
/* Populate the fields of the newsk from the oldsk and migrate the assoc
* and its messages to the newsk.
*/
@@ -6987,7 +7014,6 @@ static void sctp_sock_migrate(struct soc
struct sk_buff *skb, *tmp;
struct sctp_ulpevent *event;
struct sctp_bind_hashbucket *head;
- struct list_head tmplist;

/* Migrate socket buffer sizes and all the socket level options to the
* new socket.
@@ -6995,12 +7021,7 @@ static void sctp_sock_migrate(struct soc
newsk->sk_sndbuf = oldsk->sk_sndbuf;
newsk->sk_rcvbuf = oldsk->sk_rcvbuf;
/* Brute force copy old sctp opt. */
- if (oldsp->do_auto_asconf) {
- memcpy(&tmplist, &newsp->auto_asconf_list, sizeof(tmplist));
- inet_sk_copy_descendant(newsk, oldsk);
- memcpy(&newsp->auto_asconf_list, &tmplist, sizeof(tmplist));
- } else
- inet_sk_copy_descendant(newsk, oldsk);
+ sctp_copy_descendant(newsk, oldsk);

/* Restore the ep value that was overwritten with the above structure
* copy.

2015-07-08 08:25:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 05/30] bridge: fix br_stp_set_bridge_priority race conditions

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nikolay Aleksandrov <[email protected]>

[ Upstream commit 2dab80a8b486f02222a69daca6859519e05781d9 ]

After the ->set() spinlocks were removed br_stp_set_bridge_priority
was left running without any protection when used via sysfs. It can
race with port add/del and could result in use-after-free cases and
corrupted lists. Tested by running port add/del in a loop with stp
enabled while setting priority in a loop, crashes are easily
reproducible.
The spinlocks around sysfs ->set() were removed in commit:
14f98f258f19 ("bridge: range check STP parameters")
There's also a race condition in the netlink priority support that is
fixed by this change, but it was introduced recently and the fixes tag
covers it, just in case it's needed the commit is:
af615762e972 ("bridge: add ageing_time, stp_state, priority over netlink")

Signed-off-by: Nikolay Aleksandrov <[email protected]>
Fixes: 14f98f258f19 ("bridge: range check STP parameters")
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bridge/br_ioctl.c | 2 --
net/bridge/br_stp_if.c | 4 +++-
2 files changed, 3 insertions(+), 3 deletions(-)

--- a/net/bridge/br_ioctl.c
+++ b/net/bridge/br_ioctl.c
@@ -247,9 +247,7 @@ static int old_dev_ioctl(struct net_devi
if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;

- spin_lock_bh(&br->lock);
br_stp_set_bridge_priority(br, args[1]);
- spin_unlock_bh(&br->lock);
return 0;

case BRCTL_SET_PORT_PRIORITY:
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -243,12 +243,13 @@ bool br_stp_recalculate_bridge_id(struct
return true;
}

-/* called under bridge lock */
+/* Acquires and releases bridge lock */
void br_stp_set_bridge_priority(struct net_bridge *br, u16 newprio)
{
struct net_bridge_port *p;
int wasroot;

+ spin_lock_bh(&br->lock);
wasroot = br_is_root_bridge(br);

list_for_each_entry(p, &br->port_list, list) {
@@ -266,6 +267,7 @@ void br_stp_set_bridge_priority(struct n
br_port_state_selection(br);
if (br_is_root_bridge(br) && !wasroot)
br_become_root_bridge(br);
+ spin_unlock_bh(&br->lock);
}

/* called under bridge lock */

2015-07-08 08:28:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 06/30] packet: read num_members once in packet_rcv_fanout()

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>

[ Upstream commit f98f4514d07871da7a113dd9e3e330743fd70ae4 ]

We need to tell compiler it must not read f->num_members multiple
times. Otherwise testing if num is not zero is flaky, and we could
attempt an invalid divide by 0 in fanout_demux_cpu()

Note bug was present in packet_rcv_fanout_hash() and
packet_rcv_fanout_lb() but final 3.1 had a simple location
after commit 95ec3eb417115fb ("packet: Add 'cpu' fanout policy.")

Fixes: dc99f600698dc ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1345,7 +1345,7 @@ static int packet_rcv_fanout(struct sk_b
struct packet_type *pt, struct net_device *orig_dev)
{
struct packet_fanout *f = pt->af_packet_priv;
- unsigned int num = f->num_members;
+ unsigned int num = ACCESS_ONCE(f->num_members);
struct packet_sock *po;
unsigned int idx;


2015-07-08 07:35:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 07/30] packet: avoid out of bounds read in round robin fanout

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Willem de Bruijn <[email protected]>

[ Upstream commit 468479e6043c84f5a65299cc07cb08a22a28c2b1 ]

PACKET_FANOUT_LB computes f->rr_cur such that it is modulo
f->num_members. It returns the old value unconditionally, but
f->num_members may have changed since the last store. Ensure
that the return value is always < num.

When modifying the logic, simplify it further by replacing the loop
with an unconditional atomic increment.

Fixes: dc99f600698d ("packet: Add fanout support.")
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/packet/af_packet.c | 18 ++----------------
1 file changed, 2 insertions(+), 16 deletions(-)

--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1264,16 +1264,6 @@ static void packet_sock_destruct(struct
sk_refcnt_debug_dec(sk);
}

-static int fanout_rr_next(struct packet_fanout *f, unsigned int num)
-{
- int x = atomic_read(&f->rr_cur) + 1;
-
- if (x >= num)
- x = 0;
-
- return x;
-}
-
static unsigned int fanout_demux_hash(struct packet_fanout *f,
struct sk_buff *skb,
unsigned int num)
@@ -1285,13 +1275,9 @@ static unsigned int fanout_demux_lb(stru
struct sk_buff *skb,
unsigned int num)
{
- int cur, old;
+ unsigned int val = atomic_inc_return(&f->rr_cur);

- cur = atomic_read(&f->rr_cur);
- while ((old = atomic_cmpxchg(&f->rr_cur, cur,
- fanout_rr_next(f, num))) != cur)
- cur = old;
- return cur;
+ return val % num;
}

static unsigned int fanout_demux_cpu(struct packet_fanout *f,

2015-07-08 08:25:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 08/30] neigh: do not modify unlinked entries

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Julian Anastasov <[email protected]>

[ Upstream commit 2c51a97f76d20ebf1f50fef908b986cb051fdff9 ]

The lockless lookups can return entry that is unlinked.
Sometimes they get reference before last neigh_cleanup_and_release,
sometimes they do not need reference. Later, any
modification attempts may result in the following problems:

1. entry is not destroyed immediately because neigh_update
can start the timer for dead entry, eg. on change to NUD_REACHABLE
state. As result, entry lives for some time but is invisible
and out of control.

2. __neigh_event_send can run in parallel with neigh_destroy
while refcnt=0 but if timer is started and expired refcnt can
reach 0 for second time leading to second neigh_destroy and
possible crash.

Thanks to Eric Dumazet and Ying Xue for their work and analyze
on the __neigh_event_send change.

Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Fixes: a263b3093641 ("ipv4: Make neigh lookups directly in output packet path.")
Fixes: 6fd6ce2056de ("ipv6: Do not depend on rt->n in ip6_finish_output2().")
Cc: Eric Dumazet <[email protected]>
Cc: Ying Xue <[email protected]>
Signed-off-by: Julian Anastasov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/neighbour.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -976,6 +976,8 @@ int __neigh_event_send(struct neighbour
rc = 0;
if (neigh->nud_state & (NUD_CONNECTED | NUD_DELAY | NUD_PROBE))
goto out_unlock_bh;
+ if (neigh->dead)
+ goto out_dead;

if (!(neigh->nud_state & (NUD_STALE | NUD_INCOMPLETE))) {
if (NEIGH_VAR(neigh->parms, MCAST_PROBES) +
@@ -1032,6 +1034,13 @@ out_unlock_bh:
write_unlock(&neigh->lock);
local_bh_enable();
return rc;
+
+out_dead:
+ if (neigh->nud_state & NUD_STALE)
+ goto out_unlock_bh;
+ write_unlock_bh(&neigh->lock);
+ kfree_skb(skb);
+ return 1;
}
EXPORT_SYMBOL(__neigh_event_send);

@@ -1095,6 +1104,8 @@ int neigh_update(struct neighbour *neigh
if (!(flags & NEIGH_UPDATE_F_ADMIN) &&
(old & (NUD_NOARP | NUD_PERMANENT)))
goto out;
+ if (neigh->dead)
+ goto out;

if (!(new & NUD_VALID)) {
neigh_del_timer(neigh);
@@ -1244,6 +1255,8 @@ EXPORT_SYMBOL(neigh_update);
*/
void __neigh_set_probe_once(struct neighbour *neigh)
{
+ if (neigh->dead)
+ return;
neigh->updated = jiffies;
if (!(neigh->nud_state & NUD_FAILED))
return;

2015-07-08 07:37:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 09/30] tcp: Do not call tcp_fastopen_reset_cipher from interrupt context

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christoph Paasch <[email protected]>

[ Upstream commit dfea2aa654243f70dc53b8648d0bbdeec55a7df1 ]

tcp_fastopen_reset_cipher really cannot be called from interrupt
context. It allocates the tcp_fastopen_context with GFP_KERNEL and
calls crypto_alloc_cipher, which allocates all kind of stuff with
GFP_KERNEL.

Thus, we might sleep when the key-generation is triggered by an
incoming TFO cookie-request which would then happen in interrupt-
context, as shown by enabling CONFIG_DEBUG_ATOMIC_SLEEP:

[ 36.001813] BUG: sleeping function called from invalid context at mm/slub.c:1266
[ 36.003624] in_atomic(): 1, irqs_disabled(): 0, pid: 1016, name: packetdrill
[ 36.004859] CPU: 1 PID: 1016 Comm: packetdrill Not tainted 4.1.0-rc7 #14
[ 36.006085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 36.008250] 00000000000004f2 ffff88007f8838a8 ffffffff8171d53a ffff880075a084a8
[ 36.009630] ffff880075a08000 ffff88007f8838c8 ffffffff810967d3 ffff88007f883928
[ 36.011076] 0000000000000000 ffff88007f8838f8 ffffffff81096892 ffff88007f89be00
[ 36.012494] Call Trace:
[ 36.012953] <IRQ> [<ffffffff8171d53a>] dump_stack+0x4f/0x6d
[ 36.014085] [<ffffffff810967d3>] ___might_sleep+0x103/0x170
[ 36.015117] [<ffffffff81096892>] __might_sleep+0x52/0x90
[ 36.016117] [<ffffffff8118e887>] kmem_cache_alloc_trace+0x47/0x190
[ 36.017266] [<ffffffff81680d82>] ? tcp_fastopen_reset_cipher+0x42/0x130
[ 36.018485] [<ffffffff81680d82>] tcp_fastopen_reset_cipher+0x42/0x130
[ 36.019679] [<ffffffff81680f01>] tcp_fastopen_init_key_once+0x61/0x70
[ 36.020884] [<ffffffff81680f2c>] __tcp_fastopen_cookie_gen+0x1c/0x60
[ 36.022058] [<ffffffff816814ff>] tcp_try_fastopen+0x58f/0x730
[ 36.023118] [<ffffffff81671788>] tcp_conn_request+0x3e8/0x7b0
[ 36.024185] [<ffffffff810e3872>] ? __module_text_address+0x12/0x60
[ 36.025327] [<ffffffff8167b2e1>] tcp_v4_conn_request+0x51/0x60
[ 36.026410] [<ffffffff816727e0>] tcp_rcv_state_process+0x190/0xda0
[ 36.027556] [<ffffffff81661f97>] ? __inet_lookup_established+0x47/0x170
[ 36.028784] [<ffffffff8167c2ad>] tcp_v4_do_rcv+0x16d/0x3d0
[ 36.029832] [<ffffffff812e6806>] ? security_sock_rcv_skb+0x16/0x20
[ 36.030936] [<ffffffff8167cc8a>] tcp_v4_rcv+0x77a/0x7b0
[ 36.031875] [<ffffffff816af8c3>] ? iptable_filter_hook+0x33/0x70
[ 36.032953] [<ffffffff81657d22>] ip_local_deliver_finish+0x92/0x1f0
[ 36.034065] [<ffffffff81657f1a>] ip_local_deliver+0x9a/0xb0
[ 36.035069] [<ffffffff81657c90>] ? ip_rcv+0x3d0/0x3d0
[ 36.035963] [<ffffffff81657569>] ip_rcv_finish+0x119/0x330
[ 36.036950] [<ffffffff81657ba7>] ip_rcv+0x2e7/0x3d0
[ 36.037847] [<ffffffff81610652>] __netif_receive_skb_core+0x552/0x930
[ 36.038994] [<ffffffff81610a57>] __netif_receive_skb+0x27/0x70
[ 36.040033] [<ffffffff81610b72>] process_backlog+0xd2/0x1f0
[ 36.041025] [<ffffffff81611482>] net_rx_action+0x122/0x310
[ 36.042007] [<ffffffff81076743>] __do_softirq+0x103/0x2f0
[ 36.042978] [<ffffffff81723e3c>] do_softirq_own_stack+0x1c/0x30

This patch moves the call to tcp_fastopen_init_key_once to the places
where a listener socket creates its TFO-state, which always happens in
user-context (either from the setsockopt, or implicitly during the
listen()-call)

Cc: Eric Dumazet <[email protected]>
Cc: Hannes Frederic Sowa <[email protected]>
Fixes: 222e83d2e0ae ("tcp: switch tcp_fastopen key generation to net_get_random_once")
Signed-off-by: Christoph Paasch <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/af_inet.c | 2 ++
net/ipv4/tcp.c | 7 +++++--
net/ipv4/tcp_fastopen.c | 2 --
3 files changed, 7 insertions(+), 4 deletions(-)

--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -228,6 +228,8 @@ int inet_listen(struct socket *sock, int
err = 0;
if (err)
goto out;
+
+ tcp_fastopen_init_key_once(true);
}
err = inet_csk_listen_start(sk, backlog);
if (err)
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2684,10 +2684,13 @@ static int do_tcp_setsockopt(struct sock

case TCP_FASTOPEN:
if (val >= 0 && ((1 << sk->sk_state) & (TCPF_CLOSE |
- TCPF_LISTEN)))
+ TCPF_LISTEN))) {
+ tcp_fastopen_init_key_once(true);
+
err = fastopen_init_queue(sk, val);
- else
+ } else {
err = -EINVAL;
+ }
break;
case TCP_TIMESTAMP:
if (!tp->repair)
--- a/net/ipv4/tcp_fastopen.c
+++ b/net/ipv4/tcp_fastopen.c
@@ -84,8 +84,6 @@ void tcp_fastopen_cookie_gen(__be32 src,
__be32 path[4] = { src, dst, 0, 0 };
struct tcp_fastopen_context *ctx;

- tcp_fastopen_init_key_once(true);
-
rcu_read_lock();
ctx = rcu_dereference(tcp_fastopen_ctx);
if (ctx) {

2015-07-08 07:36:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 10/30] net: phy: fix phy link up when limiting speed via device tree

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mugunthan V N <[email protected]>

[ Upstream commit eb686231fce3770299760f24fdcf5ad041f44153 ]

When limiting phy link speed using "max-speed" to 100mbps or less on a
giga bit phy, phy never completes auto negotiation and phy state
machine is held in PHY_AN. Fixing this issue by comparing the giga
bit advertise though phydev->supported doesn't have it but phy has
BMSR_ESTATEN set. So that auto negotiation is restarted as old and
new advertise are different and link comes up fine.

Signed-off-by: Mugunthan V N <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/phy/phy_device.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -765,10 +765,11 @@ static int genphy_config_advert(struct p
if (phydev->supported & (SUPPORTED_1000baseT_Half |
SUPPORTED_1000baseT_Full)) {
adv |= ethtool_adv_to_mii_ctrl1000_t(advertise);
- if (adv != oldadv)
- changed = 1;
}

+ if (adv != oldadv)
+ changed = 1;
+
err = phy_write(phydev, MII_CTRL1000, adv);
if (err < 0)
return err;

2015-07-08 07:36:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 11/30] sctp: Fix race between OOTB responce and route removal

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Sverdlin <[email protected]>

[ Upstream commit 29c4afc4e98f4dc0ea9df22c631841f9c220b944 ]

There is NULL pointer dereference possible during statistics update if the route
used for OOTB responce is removed at unfortunate time. If the route exists when
we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
ABORT, but in the meantime route is removed under our feet, we take "no_route"
path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).

But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with this
packet. Probably temporary asoc just for OOTB responces is overkill, so just
introduce a check like in all other places in sctp_packet_transmit(), where
"asoc" is dereferenced.

To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
ip route del [interface-specific route]
ip route add [interface-specific route]
done
3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
responce

On x86_64 the crash looks like this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
PGD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1
Hardware name: ...
task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
RIP: 0010:[<ffffffffa05ec9ac>] [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP: 0018:ffff880127c037b8 EFLAGS: 00010296
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
Stack:
ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
0000000000000000 0000000000000001 0000000000000000 0000000000000000
Call Trace:
<IRQ>
[<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
[<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
[<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
[<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
[<ffffffff810e0329>] ? update_process_times+0x59/0x60
[<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
[<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
[<ffffffff8101f599>] ? read_tsc+0x9/0x10
[<ffffffff8116d4b5>] ? put_page+0x55/0x60
[<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
[<ffffffff81462b68>] ? skb_free_head+0x58/0x80
[<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
[<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
[<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
[<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
[<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
[<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
[<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
[<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
[<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
[<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
[<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
[<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
[<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
[<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
[<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
[<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
[<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
[<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
[<ffffffff8147896a>] net_rx_action+0x21a/0x360
[<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
[<ffffffff8107912d>] irq_exit+0xad/0xb0
[<ffffffff8157d158>] do_IRQ+0x58/0xf0
[<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
<EOI>
[<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
[<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
[<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
[<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
[<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
[<ffffffff8156b365>] rest_init+0x85/0x90
[<ffffffff818eb035>] start_kernel+0x48b/0x4ac
[<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
[<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
[<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
RIP [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
RSP <ffff880127c037b8>
CR2: 0000000000000020
---[ end trace 5aec7fd2dc983574 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
drm_kms_helper: panic occurred, switching back to text console
---[ end Kernel panic - not syncing: Fatal exception in interrupt

Signed-off-by: Alexander Sverdlin <[email protected]>
Acked-by: Neil Horman <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/output.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -599,7 +599,9 @@ out:
return err;
no_route:
kfree_skb(nskb);
- IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);
+
+ if (asoc)
+ IP_INC_STATS(sock_net(asoc->base.sk), IPSTATS_MIB_OUTNOROUTES);

/* FIXME: Returning the 'err' will effect all the associations
* associated with a socket, although only one of the paths of the

2015-07-08 07:36:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 12/30] crypto: talitos - avoid memleak in talitos_alg_alloc()

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Horia Geant? <[email protected]>

commit 5fa7dadc898567ce14d6d6d427e7bd8ce6eb5d39 upstream.

Fixes: 1d11911a8c57 ("crypto: talitos - fix warning: 'alg' may be used uninitialized in this function")
Signed-off-by: Horia Geanta <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/crypto/talitos.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2563,6 +2563,7 @@ static struct talitos_crypto_alg *talito
break;
default:
dev_err(dev, "unknown algorithm type %d\n", t_alg->algt.type);
+ kfree(t_alg);
return ERR_PTR(-EINVAL);
}


2015-07-08 07:36:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 13/30] Revert "crypto: talitos - convert to use be16_add_cpu()"

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Horia Geant? <[email protected]>

commit 69d9cd8c592f1abce820dbce7181bbbf6812cfbd upstream.

This reverts commit 7291a932c6e27d9768e374e9d648086636daf61c.

The conversion to be16_add_cpu() is incorrect in case cryptlen is
negative due to premature (i.e. before addition / subtraction)
implicit conversion of cryptlen (int -> u16) leading to sign loss.

Cc: Wei Yongjun <[email protected]>
Signed-off-by: Horia Geanta <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/crypto/talitos.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -927,7 +927,8 @@ static int sg_to_link_tbl(struct scatter
sg_count--;
link_tbl_ptr--;
}
- be16_add_cpu(&link_tbl_ptr->len, cryptlen);
+ link_tbl_ptr->len = cpu_to_be16(be16_to_cpu(link_tbl_ptr->len)
+ + cryptlen);

/* tag end of link table */
link_tbl_ptr->j_extent = DESC_PTR_LNKTBL_RETURN;

2015-07-08 07:36:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 14/30] iommu/amd: Handle large pages correctly in free_pagetable

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joerg Roedel <[email protected]>

commit 0b3fff54bc01e8e6064d222a33e6fa7adabd94cd upstream.

Make sure that we are skipping over large PTEs while walking
the page-table tree.

Fixes: 5c34c403b723 ("iommu/amd: Fix memory leak in free_pagetable")
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/iommu/amd_iommu.c | 6 ++++++
1 file changed, 6 insertions(+)

--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1922,9 +1922,15 @@ static void free_pt_##LVL (unsigned long
pt = (u64 *)__pt; \
\
for (i = 0; i < 512; ++i) { \
+ /* PTE present? */ \
if (!IOMMU_PTE_PRESENT(pt[i])) \
continue; \
\
+ /* Large PTE? */ \
+ if (PM_PTE_LEVEL(pt[i]) == 0 || \
+ PM_PTE_LEVEL(pt[i]) == 7) \
+ continue; \
+ \
p = (unsigned long)IOMMU_PTE_PAGE(pt[i]); \
FN(p); \
} \

2015-07-08 07:35:51

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 15/30] intel_pstate: set BYT MSR with wrmsrl_on_cpu()

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joe Konno <[email protected]>

commit 0dd23f94251f49da99a6cbfb22418b2d757d77d6 upstream.

Commit 007bea098b86 (intel_pstate: Add setting voltage value for
baytrail P states.) introduced byt_set_pstate() with the assumption that
it would always be run by the CPU whose MSR is to be written by it. It
turns out, however, that is not always the case in practice, so modify
byt_set_pstate() to enforce the MSR write done by it to always happen on
the right CPU.

Fixes: 007bea098b86 (intel_pstate: Add setting voltage value for baytrail P states.)
Signed-off-by: Joe Konno <[email protected]>
Acked-by: Kristen Carlson Accardi <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/cpufreq/intel_pstate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -417,7 +417,7 @@ static void byt_set_pstate(struct cpudat

val |= vid;

- wrmsrl(MSR_IA32_PERF_CTL, val);
+ wrmsrl_on_cpu(cpudata->cpu, MSR_IA32_PERF_CTL, val);
}

#define BYT_BCLK_FREQS 5

2015-07-08 07:35:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 16/30] arm: KVM: force execution of HCPTR access on VM exit

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Zyngier <[email protected]>

commit 85e84ba31039595995dae80b277378213602891b upstream.

On VM entry, we disable access to the VFP registers in order to
perform a lazy save/restore of these registers.

On VM exit, we restore access, test if we did enable them before,
and save/restore the guest/host registers if necessary. In this
sequence, the FPEXC register is always accessed, irrespective
of the trapping configuration.

If the guest didn't touch the VFP registers, then the HCPTR access
has now enabled such access, but we're missing a barrier to ensure
architectural execution of the new HCPTR configuration. If the HCPTR
access has been delayed/reordered, the subsequent access to FPEXC
will cause a trap, which we aren't prepared to handle at all.

The same condition exists when trapping to enable VFP for the guest.

The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only takes place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.

The set_hcptr macro is modified to deal with both vmenter/vmexit and
vmtrap operations, and now takes an optional label that is branched to
when the guest hasn't touched the VFP registers.

Reported-by: Vikram Sethi <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/kvm/interrupts.S | 10 ++++------
arch/arm/kvm/interrupts_head.S | 20 ++++++++++++++++++--
2 files changed, 22 insertions(+), 8 deletions(-)

--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -159,13 +159,9 @@ __kvm_vcpu_return:
@ Don't trap coprocessor accesses for host kernel
set_hstr vmexit
set_hdcr vmexit
- set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11))
+ set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore

#ifdef CONFIG_VFPv3
- @ Save floating point registers we if let guest use them.
- tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
- bne after_vfp_restore
-
@ Switch VFP/NEON hardware state to the host's
add r7, vcpu, #VCPU_VFP_GUEST
store_vfp_state r7
@@ -177,6 +173,8 @@ after_vfp_restore:
@ Restore FPEXC_EN which we clobbered on entry
pop {r2}
VFPFMXR FPEXC, r2
+#else
+after_vfp_restore:
#endif

@ Reset Hyp-role
@@ -467,7 +465,7 @@ switch_to_guest_vfp:
push {r3-r7}

@ NEON/VFP used. Turn on VFP access.
- set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
+ set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))

@ Switch VFP/NEON hardware state to the guest's
add r7, r0, #VCPU_VFP_HOST
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -578,8 +578,13 @@ vcpu .req r0 @ vcpu pointer always in r
.endm

/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
- * (hardware reset value is 0). Keep previous value in r2. */
-.macro set_hcptr operation, mask
+ * (hardware reset value is 0). Keep previous value in r2.
+ * An ISB is emited on vmexit/vmtrap, but executed on vmexit only if
+ * VFP wasn't already enabled (always executed on vmtrap).
+ * If a label is specified with vmexit, it is branched to if VFP wasn't
+ * enabled.
+ */
+.macro set_hcptr operation, mask, label = none
mrc p15, 4, r2, c1, c1, 2
ldr r3, =\mask
.if \operation == vmentry
@@ -588,6 +593,17 @@ vcpu .req r0 @ vcpu pointer always in r
bic r3, r2, r3 @ Don't trap defined coproc-accesses
.endif
mcr p15, 4, r3, c1, c1, 2
+ .if \operation != vmentry
+ .if \operation == vmexit
+ tst r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
+ beq 1f
+ .endif
+ isb
+ .if \label != none
+ b \label
+ .endif
+1:
+ .endif
.endm

/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return

2015-07-08 08:24:46

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 17/30] powerpc/perf: Fix book3s kernel to userspace backtraces

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Anton Blanchard <[email protected]>

commit 72e349f1124a114435e599479c9b8d14bfd1ebcd upstream.

When we take a PMU exception or a software event we call
perf_read_regs(). This overloads regs->result with a boolean that
describes if we should use the sampled instruction address register
(SIAR) or the regs.

If the exception is in kernel, we start with the kernel regs and
backtrace through the kernel stack. At this point we switch to the
userspace regs and backtrace the user stack with perf_callchain_user().

Unfortunately these regs have not got the perf_read_regs() treatment,
so regs->result could be anything. If it is non zero,
perf_instruction_pointer() decides to use the SIAR, and we get issues
like this:

0.11% qemu-system-ppc [kernel.kallsyms] [k] _raw_spin_lock_irqsave
|
---_raw_spin_lock_irqsave
|
|--52.35%-- 0
| |
| |--46.39%-- __hrtimer_start_range_ns
| | kvmppc_run_core
| | kvmppc_vcpu_run_hv
| | kvmppc_vcpu_run
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | |
| | |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
| | | |
| | | --100.00%-- 0x7e714
| | | 0x7e714

Notice the bogus _raw_spin_irqsave when we transition from kernel
(system_call) to userspace (0x7e714). We inserted what was in the SIAR.

Add a check in regs_use_siar() to check that the regs in question
are from a PMU exception. With this fix the backtrace makes sense:

0.47% qemu-system-ppc [kernel.vmlinux] [k] _raw_spin_lock_irqsave
|
---_raw_spin_lock_irqsave
|
|--53.83%-- 0
| |
| |--44.73%-- hrtimer_try_to_cancel
| | kvmppc_start_thread
| | kvmppc_run_core
| | kvmppc_vcpu_run_hv
| | kvmppc_vcpu_run
| | kvm_arch_vcpu_ioctl_run
| | kvm_vcpu_ioctl
| | do_vfs_ioctl
| | sys_ioctl
| | system_call
| | __ioctl
| | 0x7e714
| | 0x7e714

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/powerpc/perf/core-book3s.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -124,7 +124,16 @@ static inline void power_pmu_bhrb_read(s

static bool regs_use_siar(struct pt_regs *regs)
{
- return !!regs->result;
+ /*
+ * When we take a performance monitor exception the regs are setup
+ * using perf_read_regs() which overloads some fields, in particular
+ * regs->result to tell us whether to use SIAR.
+ *
+ * However if the regs are from another exception, eg. a syscall, then
+ * they have not been setup using perf_read_regs() and so regs->result
+ * is something random.
+ */
+ return ((TRAP(regs) == 0xf00) && regs->result);
}

/*

2015-07-08 08:25:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 18/30] x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <[email protected]>

commit 3d9fecf6bfb8b12bc2f9a4c7109895a2a2bb9436 upstream.

We enable _CRS on all systems from 2008 and later. On older systems, we
ignore _CRS and assume the whole physical address space (excluding RAM and
other devices) is available for PCI devices, but on systems that support
physical address spaces larger than 4GB, it's doubtful that the area above
4GB is really available for PCI.

After d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible"), we
try to use that space above 4GB *first*, so we're more likely to put a
device there.

On Juan's Toshiba Satellite Pro U200, BIOS left the graphics, sound, 1394,
and card reader devices unassigned (but only after Windows had been
booted). Only the sound device had a 64-bit BAR, so it was the only device
placed above 4GB, and hence the only device that didn't work.

Keep _CRS enabled even on pre-2008 systems if they support physical address
space larger than 4GB.

Fixes: d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible")
Reported-and-tested-by: Juan Dayer <[email protected]>
Reported-and-tested-by: Alan Horsfield <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99221
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=907092
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/pci/acpi.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -124,8 +124,10 @@ void __init pci_acpi_crs_quirks(void)
{
int year;

- if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) && year < 2008)
- pci_use_crs = false;
+ if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) && year < 2008) {
+ if (iomem_resource.end <= 0xffffffff)
+ pci_use_crs = false;
+ }

dmi_check_system(pci_crs_quirks);


2015-07-08 08:25:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 19/30] x86/PCI: Use host bridge _CRS info on Foxconn K8M890-8237A

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <[email protected]>

commit 1dace0116d0b05c967d94644fc4dfe96be2ecd3d upstream.

The Foxconn K8M890-8237A has two PCI host bridges, and we can't assign
resources correctly without the information from _CRS that tells us which
address ranges are claimed by which bridge. In the bugs mentioned below,
we incorrectly assign a sound card address (this example is from 1033299):

bus: 00 index 2 [mem 0x80000000-0xfcffffffff]
ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-7f])
pci_root PNP0A08:00: host bridge window [mem 0x80000000-0xbfefffff] (ignored)
pci_root PNP0A08:00: host bridge window [mem 0xc0000000-0xdfffffff] (ignored)
pci_root PNP0A08:00: host bridge window [mem 0xf0000000-0xfebfffff] (ignored)
ACPI: PCI Root Bridge [PCI1] (domain 0000 [bus 80-ff])
pci_root PNP0A08:01: host bridge window [mem 0xbff00000-0xbfffffff] (ignored)
pci 0000:80:01.0: [1106:3288] type 0 class 0x000403
pci 0000:80:01.0: reg 10: [mem 0xbfffc000-0xbfffffff 64bit]
pci 0000:80:01.0: address space collision: [mem 0xbfffc000-0xbfffffff 64bit] conflicts with PCI Bus #00 [mem 0x80000000-0xfcffffffff]
pci 0000:80:01.0: BAR 0: assigned [mem 0xfd00000000-0xfd00003fff 64bit]
BUG: unable to handle kernel paging request at ffffc90000378000
IP: [<ffffffffa0345f63>] azx_create+0x37c/0x822 [snd_hda_intel]

We assigned 0xfd_0000_0000, but that is not in any of the host bridge
windows, and the sound card doesn't work.

Turn on pci=use_crs automatically for this system.

Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/931368
Link: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1033299
Signed-off-by: Bjorn Helgaas <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/pci/acpi.c | 11 +++++++++++
1 file changed, 11 insertions(+)

--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -84,6 +84,17 @@ static const struct dmi_system_id pci_cr
DMI_MATCH(DMI_BIOS_VENDOR, "Phoenix Technologies, LTD"),
},
},
+ /* https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/931368 */
+ /* https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1033299 */
+ {
+ .callback = set_use_crs,
+ .ident = "Foxconn K8M890-8237A",
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "Foxconn"),
+ DMI_MATCH(DMI_BOARD_NAME, "K8M890-8237A"),
+ DMI_MATCH(DMI_BIOS_VENDOR, "Phoenix Technologies, LTD"),
+ },
+ },

/* Now for the blacklist.. */


2015-07-08 08:24:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 20/30] MIPS: Fix KVM guest fixmap address

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: James Hogan <[email protected]>

commit 8e748c8d09a9314eedb5c6367d9acfaacddcdc88 upstream.

KVM guest kernels for trap & emulate run in user mode, with a modified
set of kernel memory segments. However the fixmap address is still in
the normal KSeg3 region at 0xfffe0000 regardless, causing problems when
cache alias handling makes use of them when handling copy on write.

Therefore define FIXADDR_TOP as 0x7ffe0000 in the guest kernel mapped
region when CONFIG_KVM_GUEST is defined.

Signed-off-by: James Hogan <[email protected]>
Cc: [email protected]
Patchwork: https://patchwork.linux-mips.org/patch/9887/
Signed-off-by: Ralf Baechle <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/mips/include/asm/mach-generic/spaces.h | 4 ++++
1 file changed, 4 insertions(+)

--- a/arch/mips/include/asm/mach-generic/spaces.h
+++ b/arch/mips/include/asm/mach-generic/spaces.h
@@ -94,7 +94,11 @@
#endif

#ifndef FIXADDR_TOP
+#ifdef CONFIG_KVM_GUEST
+#define FIXADDR_TOP ((unsigned long)(long)(int)0x7ffe0000)
+#else
#define FIXADDR_TOP ((unsigned long)(long)(int)0xfffe0000)
#endif
+#endif

#endif /* __ASM_MACH_GENERIC_SPACES_H */

2015-07-08 08:23:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 22/30] fs: Fix S_NOSEC handling

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <[email protected]>

commit 2426f3910069ed47c0cc58559a6d088af7920201 upstream.

file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.

Fix the bug by checking actual mode bits before setting S_NOSEC.

Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/inode.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1631,8 +1631,8 @@ int file_remove_suid(struct file *file)
error = security_inode_killpriv(dentry);
if (!error && killsuid)
error = __remove_suid(dentry, killsuid);
- if (!error && (inode->i_sb->s_flags & MS_NOSEC))
- inode->i_flags |= S_NOSEC;
+ if (!error)
+ inode_has_no_xattr(inode);

return error;
}

2015-07-08 08:23:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 23/30] vfs: Remove incorrect debugging WARN in prepend_path

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Eric W. Biederman" <[email protected]>

commit 93e3bce6287e1fb3e60d3324ed08555b5bbafa89 upstream.

The warning message in prepend_path is unclear and outdated. It was
added as a warning that the mechanism for generating names of pseudo
files had been removed from prepend_path and d_dname should be used
instead. Unfortunately the warning reads like a general warning,
making it unclear what to do with it.

Remove the warning. The transition it was added to warn about is long
over, and I added code several years ago which in rare cases causes
the warning to fire on legitimate code, and the warning is now firing
and scaring people for no good reason.

Reported-by: Ivan Delalande <[email protected]>
Reported-by: Omar Sandoval <[email protected]>
Fixes: f48cfddc6729e ("vfs: In d_path don't call d_dname on a mount point")
Signed-off-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/dcache.c | 11 -----------
1 file changed, 11 deletions(-)

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2905,17 +2905,6 @@ restart:
vfsmnt = &mnt->mnt;
continue;
}
- /*
- * Filesystems needing to implement special "root names"
- * should do so with ->d_dname()
- */
- if (IS_ROOT(dentry) &&
- (dentry->d_name.len != 1 ||
- dentry->d_name.name[0] != '/')) {
- WARN(1, "Root dentry has weird name <%.*s>\n",
- (int) dentry->d_name.len,
- dentry->d_name.name);
- }
if (!error)
error = is_mounted(vfsmnt) ? 1 : 2;
break;

2015-07-08 08:23:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 24/30] vfs: Ignore unlocked mounts in fs_fully_visible

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Eric W. Biederman" <[email protected]>

commit ceeb0e5d39fcdf4dca2c997bf225c7fc49200b37 upstream.

Limit the mounts fs_fully_visible considers to locked mounts.
Unlocked can always be unmounted so considering them adds hassle
but no security benefit.

Signed-off-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/namespace.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3031,11 +3031,15 @@ bool fs_fully_visible(struct file_system
if (mnt->mnt.mnt_root != mnt->mnt.mnt_sb->s_root)
continue;

- /* This mount is not fully visible if there are any child mounts
- * that cover anything except for empty directories.
+ /* This mount is not fully visible if there are any
+ * locked child mounts that cover anything except for
+ * empty directories.
*/
list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
struct inode *inode = child->mnt_mountpoint->d_inode;
+ /* Only worry about locked mounts */
+ if (!(mnt->mnt.mnt_flags & MNT_LOCKED))
+ continue;
if (!S_ISDIR(inode->i_mode))
goto next;
if (inode->i_nlink > 2)

2015-07-08 08:23:05

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 25/30] arm/arm64: KVM: Require in-kernel vgic for the arch timers

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christoffer Dall <[email protected]>

commit 05971120fca43e0357789a14b3386bb56eef2201 upstream.

[Note this patch is a bit different from the original one as the names of
vgic_initialized and kvm_vgic_init are different.]

It is curently possible to run a VM with architected timers support
without creating an in-kernel VGIC, which will result in interrupts from
the virtual timer going nowhere.

To address this issue, move the architected timers initialization to the
time when we run a VCPU for the first time, and then only initialize
(and enable) the architected timers if we have a properly created and
initialized in-kernel VGIC.

When injecting interrupts from the virtual timer to the vgic, the
current setup should ensure that this never calls an on-demand init of
the VGIC, which is the only call path that could return an error from
kvm_vgic_inject_irq(), so capture the return value and raise a warning
if there's an error there.

We also change the kvm_timer_init() function from returning an int to be
a void function, since the function always succeeds.

Reviewed-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Shannon Zhao <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/kvm/arm.c | 13 +++++++++++--
include/kvm/arm_arch_timer.h | 10 ++++------
virt/kvm/arm/arch_timer.c | 30 ++++++++++++++++++++++--------
3 files changed, 37 insertions(+), 16 deletions(-)

--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -441,6 +441,7 @@ static void update_vttbr(struct kvm *kvm

static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
{
+ struct kvm *kvm = vcpu->kvm;
int ret;

if (likely(vcpu->arch.has_run_once))
@@ -452,12 +453,20 @@ static int kvm_vcpu_first_run_init(struc
* Initialize the VGIC before running a vcpu the first time on
* this VM.
*/
- if (unlikely(!vgic_initialized(vcpu->kvm))) {
- ret = kvm_vgic_init(vcpu->kvm);
+ if (unlikely(!vgic_initialized(kvm))) {
+ ret = kvm_vgic_init(kvm);
if (ret)
return ret;
}

+ /*
+ * Enable the arch timers only if we have an in-kernel VGIC
+ * and it has been properly initialized, since we cannot handle
+ * interrupts from the virtual timer with a userspace gic.
+ */
+ if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
+ kvm_timer_enable(kvm);
+
return 0;
}

--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -60,7 +60,8 @@ struct arch_timer_cpu {

#ifdef CONFIG_KVM_ARM_TIMER
int kvm_timer_hyp_init(void);
-int kvm_timer_init(struct kvm *kvm);
+void kvm_timer_enable(struct kvm *kvm);
+void kvm_timer_init(struct kvm *kvm);
void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
const struct kvm_irq_level *irq);
void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
@@ -73,11 +74,8 @@ static inline int kvm_timer_hyp_init(voi
return 0;
};

-static inline int kvm_timer_init(struct kvm *kvm)
-{
- return 0;
-}
-
+static inline void kvm_timer_enable(struct kvm *kvm) {}
+static inline void kvm_timer_init(struct kvm *kvm) {}
static inline void kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu,
const struct kvm_irq_level *irq) {}
static inline void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu) {}
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -61,12 +61,14 @@ static void timer_disarm(struct arch_tim

static void kvm_timer_inject_irq(struct kvm_vcpu *vcpu)
{
+ int ret;
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;

timer->cntv_ctl |= ARCH_TIMER_CTRL_IT_MASK;
- kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
- timer->irq->irq,
- timer->irq->level);
+ ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+ timer->irq->irq,
+ timer->irq->level);
+ WARN_ON(ret);
}

static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
@@ -307,12 +309,24 @@ void kvm_timer_vcpu_terminate(struct kvm
timer_disarm(timer);
}

-int kvm_timer_init(struct kvm *kvm)
+void kvm_timer_enable(struct kvm *kvm)
{
- if (timecounter && wqueue) {
- kvm->arch.timer.cntvoff = kvm_phys_timer_read();
+ if (kvm->arch.timer.enabled)
+ return;
+
+ /*
+ * There is a potential race here between VCPUs starting for the first
+ * time, which may be enabling the timer multiple times. That doesn't
+ * hurt though, because we're just setting a variable to the same
+ * variable that it already was. The important thing is that all
+ * VCPUs have the enabled variable set, before entering the guest, if
+ * the arch timers are enabled.
+ */
+ if (timecounter && wqueue)
kvm->arch.timer.enabled = 1;
- }
+}

- return 0;
+void kvm_timer_init(struct kvm *kvm)
+{
+ kvm->arch.timer.cntvoff = kvm_phys_timer_read();
}

2015-07-08 08:22:58

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 26/30] arm64: KVM: Fix TLB invalidation by IPA/VMID

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Zyngier <[email protected]>

commit 55e858b75808347378e5117c3c2339f46cc03575 upstream.

It took about two years for someone to notice that the IPA passed
to TLBI IPAS2E1IS must be shifted by 12 bits. Clearly our reviewing
is not as good as it should be...

Paper bag time for me.

Reported-by: Mario Smarduch <[email protected]>
Tested-by: Mario Smarduch <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Shannon Zhao <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/kvm/hyp.S | 1 +
1 file changed, 1 insertion(+)

--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -629,6 +629,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
* Instead, we invalidate Stage-2 for this IPA, and the
* whole of Stage-1. Weep...
*/
+ lsr x1, x1, #12
tlbi ipas2e1is, x1
/*
* We have to ensure completion of the invalidation at Stage-2,

2015-07-08 07:36:57

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 27/30] arm64: KVM: Fix HCR setting for 32bit guests

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Zyngier <[email protected]>

commit 801f6772cecea6cfc7da61aa197716ab64db5f9e upstream.

Commit b856a59141b1 (arm/arm64: KVM: Reset the HCR on each vcpu
when resetting the vcpu) moved the init of the HCR register to
happen later in the init of a vcpu, but left out the fixup
done in kvm_reset_vcpu when preparing for a 32bit guest.

As a result, the 32bit guest is run as a 64bit guest, but the
rest of the kernel still manages it as a 32bit. Fun follows.

Moving the fixup to vcpu_reset_hcr solves the problem for good.

Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
Signed-off-by: Shannon Zhao <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm64/include/asm/kvm_emulate.h | 2 ++
arch/arm64/kvm/reset.c | 1 -
2 files changed, 2 insertions(+), 1 deletion(-)

--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -41,6 +41,8 @@ void kvm_inject_pabt(struct kvm_vcpu *vc
static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
{
vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
+ if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
+ vcpu->arch.hcr_el2 &= ~HCR_RW;
}

static inline unsigned long *vcpu_pc(const struct kvm_vcpu *vcpu)
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -90,7 +90,6 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu
if (!cpu_has_32bit_el1())
return -EINVAL;
cpu_reset = &default_regs_reset32;
- vcpu->arch.hcr_el2 &= ~HCR_RW;
} else {
cpu_reset = &default_regs_reset;
}

2015-07-08 07:37:06

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 28/30] arm64: KVM: Do not use pgd_index to index stage-2 pgd

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marc Zyngier <[email protected]>

commit 04b8dc85bf4a64517e3cf20e409eeaa503b15cc1 upstream.

[Since we don't backport commit c647355 (KVM: arm: Add initial dirty page
locking support) for linux-3.14.y, there is no stage2_wp_range in
arch/arm/kvm/mmu.c. So ignore the change in stage2_wp_range introduced
by this patch.]

The kernel's pgd_index macro is designed to index a normal, page
sized array. KVM is a bit diffferent, as we can use concatenated
pages to have a bigger address space (for example 40bit IPA with
4kB pages gives us an 8kB PGD.

In the above case, the use of pgd_index will always return an index
inside the first 4kB, which makes a guest that has memory above
0x8000000000 rather unhappy, as it spins forever in a page fault,
whist the host happilly corrupts the lower pgd.

The obvious fix is to get our own kvm_pgd_index that does the right
thing(tm).

Tested on X-Gene with a hacked kvmtool that put memory at a stupidly
high address.

Reviewed-by: Christoffer Dall <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Signed-off-by: Christoffer Dall <[email protected]>
Signed-off-by: Shannon Zhao <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/arm/include/asm/kvm_mmu.h | 3 ++-
arch/arm/kvm/mmu.c | 6 +++---
arch/arm64/include/asm/kvm_mmu.h | 2 ++
3 files changed, 7 insertions(+), 4 deletions(-)

--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -117,13 +117,14 @@ static inline void kvm_set_s2pmd_writabl
(__boundary - 1 < (end) - 1)? __boundary: (end); \
})

+#define kvm_pgd_index(addr) pgd_index(addr)
+
static inline bool kvm_page_empty(void *ptr)
{
struct page *ptr_page = virt_to_page(ptr);
return page_count(ptr_page) == 1;
}

-
#define kvm_pte_table_empty(ptep) kvm_page_empty(ptep)
#define kvm_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
#define kvm_pud_table_empty(pudp) (0)
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -194,7 +194,7 @@ static void unmap_range(struct kvm *kvm,
phys_addr_t addr = start, end = start + size;
phys_addr_t next;

- pgd = pgdp + pgd_index(addr);
+ pgd = pgdp + kvm_pgd_index(addr);
do {
next = kvm_pgd_addr_end(addr, end);
if (!pgd_none(*pgd))
@@ -264,7 +264,7 @@ static void stage2_flush_memslot(struct
phys_addr_t next;
pgd_t *pgd;

- pgd = kvm->arch.pgd + pgd_index(addr);
+ pgd = kvm->arch.pgd + kvm_pgd_index(addr);
do {
next = kvm_pgd_addr_end(addr, end);
stage2_flush_puds(kvm, pgd, addr, next);
@@ -649,7 +649,7 @@ static pmd_t *stage2_get_pmd(struct kvm
pud_t *pud;
pmd_t *pmd;

- pgd = kvm->arch.pgd + pgd_index(addr);
+ pgd = kvm->arch.pgd + kvm_pgd_index(addr);
pud = pud_offset(pgd, addr);
if (pud_none(*pud)) {
if (!cache)
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -69,6 +69,8 @@
#define PTRS_PER_S2_PGD (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
#define S2_PGD_ORDER get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))

+#define kvm_pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))
+
int create_hyp_mappings(void *from, void *to);
int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
void free_boot_hyp_pgd(void);

2015-07-08 07:35:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 3.14 30/30] x86/iosf: Add Kconfig prompt for IOSF_MBI selection

3.14-stable review patch. If anyone has any objections, please let me know.

------------------

From: "David E. Box" <[email protected]>

commit aa8e4f22ab7773352ba3895597189b8097f2c307 upstream.

Fixes an error in having the iosf build as 'default m'. On X86 SoC's the iosf
sideband is the only way to access information for some registers, as opposed to
through MSR's on other Intel architectures. While selecting IOSF_MBI is
preferred, it does mean carrying extra code on non-SoC architectures. This
exports the selection to the user, allowing those driver writers to compile out
iosf code if it's not being built.

Signed-off-by: David E. Box <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: H. Peter Anvin <[email protected]>
Cc: William Dauchy <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/Kconfig | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2440,9 +2440,19 @@ config X86_DMA_REMAP
depends on STA2X11

config IOSF_MBI
- tristate
- default m
+ tristate "Intel System On Chip IOSF Sideband support"
depends on PCI
+ ---help---
+ Enables sideband access to mailbox registers on SoC's. The sideband is
+ available on the following platforms. This list is not meant to be
+ exclusive.
+ - BayTrail
+ - Cherryview
+ - Braswell
+ - Quark
+
+ You should say Y if you are running a kernel on one of these
+ platforms.

source "net/Kconfig"


2015-07-08 14:08:10

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 3.14 00/30] 3.14.48-stable review

On 07/08/2015 12:33 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.14.48 release.
> There are 30 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Jul 10 07:31:09 UTC 2015.
> Anything received after that time might be too late.
>

Build results:
total: 120 pass: 120 fail: 0
Qemu test results:
total: 30 pass: 30 fail: 0

Details are available at http://server.roeck-us.net:8010/builders.

Guenter

2015-07-08 14:57:32

by Sudip Mukherjee

[permalink] [raw]
Subject: Re: [PATCH 3.14 00/30] 3.14.48-stable review

On Wed, Jul 08, 2015 at 12:33:59AM -0700, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.14.48 release.
> There are 30 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Jul 10 07:31:09 UTC 2015.
> Anything received after that time might be too late.
Compiled and booted on x86_32. No errors in dmesg.

regards
sudip

2015-07-08 16:33:49

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 3.14 00/30] 3.14.48-stable review

On 07/08/2015 01:33 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.14.48 release.
> There are 30 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Fri Jul 10 07:31:09 UTC 2015.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.14.48-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah


--
Shuah Khan
Sr. Linux Kernel Developer
Open Source Innovation Group
Samsung Research America (Silicon Valley)
[email protected] | (970) 217-8978