2013-04-29 19:02:30

by Greg Kroah-Hartman

Subject: [ 00/42] 3.8.11-stable review

This is the start of the stable review cycle for the 3.8.11 release.
There are 42 patches in this series; all will be posted as responses
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed May 1 18:47:07 UTC 2013.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.8.11-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 3.8.11-rc1

Aaro Koskinen <[email protected]>
ARM: 7692/1: iop3xx: move IOP3XX_PERIPHERAL_VIRT_BASE

Stephen Boyd <[email protected]>
ARM: 7699/1: sched_clock: Add more notrace to prevent recursion

Steven Rostedt <[email protected]>
tracing: Fix selftest function recursion accounting

Eric Dumazet <[email protected]>
net: drop dst before queueing fragments

Linus Torvalds <[email protected]>
net: fix incorrect credentials passing

Ben Greear <[email protected]>
net: rate-limit warn-bad-offload splats.

Eric Dumazet <[email protected]>
tcp: call tcp_replace_ts_recent() from tcp_ack()

Bjørn Mork <[email protected]>
net: cdc_mbim: remove bogus sizeof()

Willy Tarreau <[email protected]>
net: mvneta: fix improper tx queue usage in mvneta_tx()

Wei Yongjun <[email protected]>
esp4: fix error return code in esp_output()

Thomas Petazzoni <[email protected]>
net: mvmdio: add select PHYLIB

Thomas Graf <[email protected]>
tcp: Reallocate headroom if it would overflow csum_start

Dmitry Popov <[email protected]>
tcp: incoming connections might use wrong route under synflood

Michael Riesch <[email protected]>
rtnetlink: Call nlmsg_parse() with correct header length

Christoph Paasch <[email protected]>
ipv6/tcp: Stop processing ICMPv6 redirect messages

Patrick McHardy <[email protected]>
netfilter: don't reset nf_trace in nf_reset()

Eric W. Biederman <[email protected]>
af_unix: If we don't care about credentials coallesce all messages

Eric Dumazet <[email protected]>
bonding: fix l23 and l34 load balancing in forwarding path

[email protected] <[email protected]>
bonding: IFF_BONDING is not stripped on enslave failure

[email protected] <[email protected]>
bonding: fix bonding_masters race condition in bond unloading

Hannes Frederic Sowa <[email protected]>
atl1e: limit gso segment size to prevent generation of wrong ip length fields

Vlad Yasevich <[email protected]>
net: count hw_addr syncs so that unsync works properly.

Balakumaran Kannan <[email protected]>
net IPv6 : Fix broken IPv6 routing table after loopback down-up

Vasily Averin <[email protected]>
cbq: incorrect processing of high limits

Mathias Krause <[email protected]>
tipc: fix info leaks via msg_name in recv_msg/recv_stream

Mathias Krause <[email protected]>
rose: fix info leak via msg_name in rose_recvmsg()

Mathias Krause <[email protected]>
NFC: llcp: fix info leaks via msg_name in llcp_sock_recvmsg()

Mathias Krause <[email protected]>
netrom: fix info leak via msg_name in nr_recvmsg()

Mathias Krause <[email protected]>
llc: Fix missing msg_namelen update in llc_ui_recvmsg()

Mathias Krause <[email protected]>
l2tp: fix info leak in l2tp_ip6_recvmsg()

Mathias Krause <[email protected]>
iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()

Mathias Krause <[email protected]>
irda: Fix missing msg_namelen update in irda_recvmsg_dgram()

Mathias Krause <[email protected]>
caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg()

Mathias Krause <[email protected]>
Bluetooth: SCO - Fix missing msg_namelen update in sco_sock_recvmsg()

Mathias Krause <[email protected]>
Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()

Mathias Krause <[email protected]>
Bluetooth: fix possible info leak in bt_sock_recvmsg()

Mathias Krause <[email protected]>
ax25: fix info leak via msg_name in ax25_recvmsg()

Mathias Krause <[email protected]>
atm: update msg_namelen in vcc_recvmsg()

David S. Miller <[email protected]>
sparc64: Fix race in TLB batch processing.

Jiri Slaby <[email protected]>
TTY: fix atime/mtime regression

Jiri Slaby <[email protected]>
TTY: do not update atime/mtime on read/write

Zhao Hongjiang <[email protected]>
aio: fix possible invalid memory access when DEBUG is enabled


-------------

Diffstat:

Makefile | 4 +-
arch/arm/include/asm/hardware/iop3xx.h | 2 +-
arch/arm/kernel/sched_clock.c | 4 +-
arch/sparc/include/asm/pgtable_64.h | 1 +
arch/sparc/include/asm/switch_to_64.h | 3 +-
arch/sparc/include/asm/tlbflush_64.h | 37 ++++++--
arch/sparc/kernel/smp_64.c | 41 +++++++-
arch/sparc/mm/tlb.c | 39 +++++++-
arch/sparc/mm/tsb.c | 57 +++++++++---
arch/sparc/mm/ultra.S | 119 +++++++++++++++++++-----
drivers/net/bonding/bond_main.c | 65 ++++++++-----
drivers/net/ethernet/atheros/atl1e/atl1e.h | 2 +-
drivers/net/ethernet/atheros/atl1e/atl1e_main.c | 1 +
drivers/net/ethernet/marvell/Kconfig | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 9 +-
drivers/net/usb/cdc_mbim.c | 2 +-
drivers/tty/tty_io.c | 14 ++-
fs/aio.c | 2 +-
include/linux/netdevice.h | 2 +-
include/linux/skbuff.h | 7 ++
include/net/scm.h | 4 +-
kernel/trace/trace_selftest.c | 16 +---
net/atm/common.c | 2 +
net/ax25/af_ax25.c | 1 +
net/bluetooth/af_bluetooth.c | 4 +-
net/bluetooth/rfcomm/sock.c | 1 +
net/bluetooth/sco.c | 1 +
net/caif/caif_socket.c | 2 +
net/core/dev.c | 4 +
net/core/dev_addr_lists.c | 6 +-
net/core/rtnetlink.c | 4 +-
net/ipv4/esp4.c | 6 +-
net/ipv4/ip_fragment.c | 15 ++-
net/ipv4/syncookies.c | 4 +-
net/ipv4/tcp_input.c | 64 ++++++-------
net/ipv4/tcp_output.c | 8 +-
net/ipv6/addrconf.c | 27 ++++++
net/ipv6/reassembly.c | 13 ++-
net/ipv6/tcp_ipv6.c | 1 +
net/irda/af_irda.c | 2 +
net/iucv/af_iucv.c | 2 +
net/l2tp/l2tp_ip6.c | 1 +
net/llc/af_llc.c | 2 +
net/netrom/af_netrom.c | 1 +
net/nfc/llcp/sock.c | 3 +
net/rose/af_rose.c | 1 +
net/sched/sch_cbq.c | 5 +-
net/tipc/socket.c | 7 ++
net/unix/af_unix.c | 2 +-
49 files changed, 455 insertions(+), 167 deletions(-)


2013-04-29 19:02:57

by Greg Kroah-Hartman

Subject: [ 40/42] tracing: Fix selftest function recursion accounting

3.8-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt <[email protected]>

commit 05cbbf643b8eea1be21082c53cdb856d1dc6d765 upstream.

The test that checks function recursion does things differently
if the arch does not support all ftrace features. But that really
doesn't make a difference with how the test runs, and either way
the count variable should be 2 at the end.

Currently the test wrongly fails for archs that don't support all
the ftrace features.

Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
kernel/trace/trace_selftest.c | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)

--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -452,7 +452,6 @@ trace_selftest_function_recursion(void)
char *func_name;
int len;
int ret;
- int cnt;

/* The previous test PASSED */
pr_cont("PASSED\n");
@@ -510,19 +509,10 @@ trace_selftest_function_recursion(void)

unregister_ftrace_function(&test_recsafe_probe);

- /*
- * If arch supports all ftrace features, and no other task
- * was on the list, we should be fine.
- */
- if (!ftrace_nr_registered_ops() && !FTRACE_FORCE_LIST_FUNC)
- cnt = 2; /* Should have recursed */
- else
- cnt = 1;
-
ret = -1;
- if (trace_selftest_recursion_cnt != cnt) {
- pr_cont("*callback not called expected %d times (%d)* ",
- cnt, trace_selftest_recursion_cnt);
+ if (trace_selftest_recursion_cnt != 2) {
+ pr_cont("*callback not called expected 2 times (%d)* ",
+ trace_selftest_recursion_cnt);
goto out;
}


2013-04-29 19:32:38

by Greg Kroah-Hartman

Subject: [ 36/42] tcp: call tcp_replace_ts_recent() from tcp_ack()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Eric Dumazet <[email protected]>

[ Upstream commit 12fb3dd9dc3c64ba7d64cec977cca9b5fb7b1d4e ]

commit bd090dfc634d (tcp: tcp_replace_ts_recent() should not be called
from tcp_validate_incoming()) introduced a TS ecr bug in slow path
processing.

1 A > B P. 1:10001(10000) ack 1 <nop,nop,TS val 1001 ecr 200>
2 B < A . 1:1(0) ack 1 win 257 <sack 9001:10001,TS val 300 ecr 1001>
3 A > B . 1:1001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>
4 A > B . 1001:2001(1000) ack 1 win 227 <nop,nop,TS val 1002 ecr 200>

(ecr 200 should be ecr 300 in packets 3 & 4)

The problem is that tcp_ack() can trigger the sending of new packets
(retransmits) that reflect the prior TSval instead of the TSval
contained in the currently processed incoming packet.

Fix this by calling tcp_replace_ts_recent() from tcp_ack() after the
checks, but before the actions.

Reported-by: Yuchung Cheng <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neal Cardwell <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_input.c | 64 ++++++++++++++++++++++++---------------------------
1 file changed, 31 insertions(+), 33 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -116,6 +116,7 @@ int sysctl_tcp_early_retrans __read_most
#define FLAG_DSACKING_ACK 0x800 /* SACK blocks contained D-SACK info */
#define FLAG_NONHEAD_RETRANS_ACKED 0x1000 /* Non-head rexmitted data was ACKed */
#define FLAG_SACK_RENEGING 0x2000 /* snd_una advanced to a sacked seq */
+#define FLAG_UPDATE_TS_RECENT 0x4000 /* tcp_replace_ts_recent() */

#define FLAG_ACKED (FLAG_DATA_ACKED|FLAG_SYN_ACKED)
#define FLAG_NOT_DUP (FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -3572,6 +3573,27 @@ static void tcp_send_challenge_ack(struc
}
}

+static void tcp_store_ts_recent(struct tcp_sock *tp)
+{
+ tp->rx_opt.ts_recent = tp->rx_opt.rcv_tsval;
+ tp->rx_opt.ts_recent_stamp = get_seconds();
+}
+
+static void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
+{
+ if (tp->rx_opt.saw_tstamp && !after(seq, tp->rcv_wup)) {
+ /* PAWS bug workaround wrt. ACK frames, the PAWS discard
+ * extra check below makes sure this can only happen
+ * for pure ACK frames. -DaveM
+ *
+ * Not only, also it occurs for expired timestamps.
+ */
+
+ if (tcp_paws_check(&tp->rx_opt, 0))
+ tcp_store_ts_recent(tp);
+ }
+}
+
/* This routine deals with incoming acks, but not outgoing ones. */
static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
{
@@ -3624,6 +3646,12 @@ static int tcp_ack(struct sock *sk, cons
prior_fackets = tp->fackets_out;
prior_in_flight = tcp_packets_in_flight(tp);

+ /* ts_recent update must be made after we are sure that the packet
+ * is in window.
+ */
+ if (flag & FLAG_UPDATE_TS_RECENT)
+ tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
+
if (!(flag & FLAG_SLOWPATH) && after(ack, prior_snd_una)) {
/* Window is constant, pure forward advance.
* No more checks are required.
@@ -3940,27 +3968,6 @@ const u8 *tcp_parse_md5sig_option(const
EXPORT_SYMBOL(tcp_parse_md5sig_option);
#endif

-static inline void tcp_store_ts_recent(struct tcp_sock *tp)
-{
- tp->rx_opt.ts_recent = tp->rx_opt.rcv_tsval;
- tp->rx_opt.ts_recent_stamp = get_seconds();
-}
-
-static inline void tcp_replace_ts_recent(struct tcp_sock *tp, u32 seq)
-{
- if (tp->rx_opt.saw_tstamp && !after(seq, tp->rcv_wup)) {
- /* PAWS bug workaround wrt. ACK frames, the PAWS discard
- * extra check below makes sure this can only happen
- * for pure ACK frames. -DaveM
- *
- * Not only, also it occurs for expired timestamps.
- */
-
- if (tcp_paws_check(&tp->rx_opt, 0))
- tcp_store_ts_recent(tp);
- }
-}
-
/* Sorry, PAWS as specified is broken wrt. pure-ACKs -DaveM
*
* It is not fatal. If this ACK does _not_ change critical state (seqs, window)
@@ -5556,14 +5563,9 @@ slow_path:
return 0;

step5:
- if (tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
+ if (tcp_ack(sk, skb, FLAG_SLOWPATH | FLAG_UPDATE_TS_RECENT) < 0)
goto discard;

- /* ts_recent update must be made after we are sure that the packet
- * is in window.
- */
- tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
-
tcp_rcv_rtt_measure_ts(sk, skb);

/* Process urgent data. */
@@ -5997,7 +5999,8 @@ int tcp_rcv_state_process(struct sock *s

/* step 5: check the ACK field */
if (true) {
- int acceptable = tcp_ack(sk, skb, FLAG_SLOWPATH) > 0;
+ int acceptable = tcp_ack(sk, skb, FLAG_SLOWPATH |
+ FLAG_UPDATE_TS_RECENT) > 0;

switch (sk->sk_state) {
case TCP_SYN_RECV:
@@ -6148,11 +6151,6 @@ int tcp_rcv_state_process(struct sock *s
}
}

- /* ts_recent update must be made after we are sure that the packet
- * is in window.
- */
- tcp_replace_ts_recent(tp, TCP_SKB_CB(skb)->seq);
-
/* step 6: check the URG bit */
tcp_urg(sk, skb, th);


2013-04-29 19:02:54

by Greg Kroah-Hartman

Subject: [ 35/42] net: cdc_mbim: remove bogus sizeof()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Bjørn Mork <[email protected]>

[ Upstream commit 32b161aa88aa40a83888a995c6e2ef81140219b1 ]

The intention was to test against the constant, not the size of
the constant.

Signed-off-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/usb/cdc_mbim.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/usb/cdc_mbim.c
+++ b/drivers/net/usb/cdc_mbim.c
@@ -134,7 +134,7 @@ static struct sk_buff *cdc_mbim_tx_fixup
goto error;

if (skb) {
- if (skb->len <= sizeof(ETH_HLEN))
+ if (skb->len <= ETH_HLEN)
goto error;

/* mapping VLANs to MBIM sessions:

2013-04-29 19:33:00

by Greg Kroah-Hartman

Subject: [ 41/42] ARM: 7699/1: sched_clock: Add more notrace to prevent recursion

3.8-stable review patch. If anyone has any objections, please let me know.

------------------

From: Stephen Boyd <[email protected]>

commit cea15092f098b7018e89f64a5a14bb71955965d5 upstream.

cyc_to_sched_clock() is called by sched_clock() and cyc_to_ns()
is called by cyc_to_sched_clock(). I suspect that some compilers
inline both of these functions into sched_clock() and so we've
been getting away without having a notrace marking. It seems that
my compiler isn't inlining cyc_to_sched_clock() though, so I'm
hitting a recursion bug when I enable the function graph tracer,
causing my system to crash. Marking these functions notrace fixes
it. Technically cyc_to_ns() doesn't need the notrace because it's
already marked inline, but let's just add it so that if we ever
remove inline from that function it doesn't blow up.

Signed-off-by: Stephen Boyd <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Jonghwan Choi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/kernel/sched_clock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -45,12 +45,12 @@ static u32 notrace jiffy_sched_clock_rea

static u32 __read_mostly (*read_sched_clock)(void) = jiffy_sched_clock_read;

-static inline u64 cyc_to_ns(u64 cyc, u32 mult, u32 shift)
+static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
{
return (cyc * mult) >> shift;
}

-static unsigned long long cyc_to_sched_clock(u32 cyc, u32 mask)
+static unsigned long long notrace cyc_to_sched_clock(u32 cyc, u32 mask)
{
u64 epoch_ns;
u32 epoch_cyc;

2013-04-29 19:32:58

by Greg Kroah-Hartman

Subject: [ 42/42] ARM: 7692/1: iop3xx: move IOP3XX_PERIPHERAL_VIRT_BASE

3.8-stable review patch. If anyone has any objections, please let me know.

------------------

From: Aaro Koskinen <[email protected]>

commit f5d6a1441a5045824f36ff7c6b6bbae0373472a6 upstream.

Currently IOP3XX_PERIPHERAL_VIRT_BASE conflicts with PCI_IO_VIRT_BASE:

address size
PCI_IO_VIRT_BASE 0xfee00000 0x200000
IOP3XX_PERIPHERAL_VIRT_BASE 0xfeffe000 0x2000

Fix by moving IOP3XX_PERIPHERAL_VIRT_BASE below PCI_IO_VIRT_BASE.

The patch fixes the following kernel panic with 3.9-rc1 on iop3xx boards:

[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.9.0-rc1-iop32x (aaro@blackmetal) (gcc version 4.7.2 (GCC) ) #20 PREEMPT Tue Mar 5 16:44:36 EET 2013
[ 0.000000] bootconsole [earlycon0] enabled
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] kernel BUG at mm/vmalloc.c:1145!
[ 0.000000] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 Not tainted (3.9.0-rc1-iop32x #20)
[ 0.000000] PC is at vm_area_add_early+0x4c/0x88
[ 0.000000] LR is at add_static_vm_early+0x14/0x68
[ 0.000000] pc : [<c03e74a8>] lr : [<c03e1c40>] psr: 800000d3
[ 0.000000] sp : c03ffee4 ip : dfffdf88 fp : c03ffef4
[ 0.000000] r10: 00000002 r9 : 000000cf r8 : 00000653
[ 0.000000] r7 : c040eca8 r6 : c03e2408 r5 : dfffdf60 r4 : 00200000
[ 0.000000] r3 : dfffdfd8 r2 : feffe000 r1 : ff000000 r0 : dfffdf60
[ 0.000000] Flags: Nzcv IRQs off FIQs off Mode SVC_32 ISA ARM Segment kernel
[ 0.000000] Control: 0000397f Table: a0004000 DAC: 00000017
[ 0.000000] Process swapper (pid: 0, stack limit = 0xc03fe1b8)
[ 0.000000] Stack: (0xc03ffee4 to 0xc0400000)
[ 0.000000] fee0: 00200000 c03fff0c c03ffef8 c03e1c40 c03e7468 00200000 fee00000
[ 0.000000] ff00: c03fff2c c03fff10 c03e23e4 c03e1c38 feffe000 c0408ee4 ff000000 c0408f04
[ 0.000000] ff20: c03fff3c c03fff30 c03e2434 c03e23b4 c03fff84 c03fff40 c03e2c94 c03e2414
[ 0.000000] ff40: c03f8878 c03f6410 ffff0000 000bffff 00001000 00000008 c03fff84 c03f6410
[ 0.000000] ff60: c04227e8 c03fffd4 a0008000 c03f8878 69052e30 c02f96eb c03fffbc c03fff88
[ 0.000000] ff80: c03e044c c03e268c 00000000 0000397f c0385130 00000001 ffffffff c03f8874
[ 0.000000] ffa0: dfffffff a0004000 69052e30 a03f61a0 c03ffff4 c03fffc0 c03dd5cc c03e0184
[ 0.000000] ffc0: 00000000 00000000 00000000 00000000 00000000 c03f8878 0000397d c040601c
[ 0.000000] ffe0: c03f8874 c0408674 00000000 c03ffff8 a0008040 c03dd558 00000000 00000000
[ 0.000000] Backtrace:
[ 0.000000] [<c03e745c>] (vm_area_add_early+0x0/0x88) from [<c03e1c40>] (add_static_vm_early+0x14/0x68)

Tested-by: Mikael Pettersson <[email protected]>
Signed-off-by: Aaro Koskinen <[email protected]>
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Jonghwan Choi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/arm/include/asm/hardware/iop3xx.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/include/asm/hardware/iop3xx.h
+++ b/arch/arm/include/asm/hardware/iop3xx.h
@@ -37,7 +37,7 @@ extern int iop3xx_get_init_atu(void);
* IOP3XX processor registers
*/
#define IOP3XX_PERIPHERAL_PHYS_BASE 0xffffe000
-#define IOP3XX_PERIPHERAL_VIRT_BASE 0xfeffe000
+#define IOP3XX_PERIPHERAL_VIRT_BASE 0xfedfe000
#define IOP3XX_PERIPHERAL_SIZE 0x00002000
#define IOP3XX_PERIPHERAL_UPPER_PA (IOP3XX_PERIPHERAL_PHYS_BASE +\
IOP3XX_PERIPHERAL_SIZE - 1)

2013-04-29 19:33:35

by Greg Kroah-Hartman

Subject: [ 39/42] net: drop dst before queueing fragments

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Eric Dumazet <[email protected]>

[ Upstream commit 97599dc792b45b1669c3cdb9a4b365aad0232f65 ]

Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, as non refcounted
dst could escape an RCU protected section.

Commit 64f3b9e203bd068 (net: ip_expire() must revalidate route) fixed
the case of timeouts, but not the general problem.

Tom Parkin noticed crashes in UDP stack and provided a patch,
but further analysis permitted us to pinpoint the root cause.

Before queueing a packet into a frag list, we must drop its dst,
as this dst has a limited lifetime (RCU protected).

When/if a packet is finally reassembled, we use the dst of the very
last skb, still protected by RCU and valid, as the dst of the
reassembled packet.

Use same logic in IPv6, as there is no need to hold dst references.

Reported-by: Tom Parkin <[email protected]>
Tested-by: Tom Parkin <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_fragment.c | 15 +++++++++++----
net/ipv6/reassembly.c | 13 +++++++++++--
2 files changed, 22 insertions(+), 6 deletions(-)

--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -255,8 +255,7 @@ static void ip_expire(unsigned long arg)
if (!head->dev)
goto out_rcu_unlock;

- /* skb dst is stale, drop it, and perform route lookup again */
- skb_dst_drop(head);
+ /* skb has no dst, perform route lookup again */
iph = ip_hdr(head);
err = ip_route_input_noref(head, iph->daddr, iph->saddr,
iph->tos, head->dev);
@@ -525,8 +524,16 @@ found:
qp->q.max_size = skb->len + ihl;

if (qp->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
- qp->q.meat == qp->q.len)
- return ip_frag_reasm(qp, prev, dev);
+ qp->q.meat == qp->q.len) {
+ unsigned long orefdst = skb->_skb_refdst;
+
+ skb->_skb_refdst = 0UL;
+ err = ip_frag_reasm(qp, prev, dev);
+ skb->_skb_refdst = orefdst;
+ return err;
+ }
+
+ skb_dst_drop(skb);

write_lock(&ip4_frags.lock);
list_move_tail(&qp->q.lru_list, &qp->q.net->lru_list);
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -342,8 +342,17 @@ found:
}

if (fq->q.last_in == (INET_FRAG_FIRST_IN | INET_FRAG_LAST_IN) &&
- fq->q.meat == fq->q.len)
- return ip6_frag_reasm(fq, prev, dev);
+ fq->q.meat == fq->q.len) {
+ int res;
+ unsigned long orefdst = skb->_skb_refdst;
+
+ skb->_skb_refdst = 0UL;
+ res = ip6_frag_reasm(fq, prev, dev);
+ skb->_skb_refdst = orefdst;
+ return res;
+ }
+
+ skb_dst_drop(skb);

write_lock(&ip6_frags.lock);
list_move_tail(&fq->q.lru_list, &fq->q.net->lru_list);

2013-04-29 19:34:05

by Greg Kroah-Hartman

Subject: [ 38/42] net: fix incorrect credentials passing

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Linus Torvalds <[email protected]>

[ Upstream commit 83f1b4ba917db5dc5a061a44b3403ddb6e783494 ]

Commit 257b5358b32f ("scm: Capture the full credentials of the scm
sender") changed the credentials passing code to pass in the effective
uid/gid instead of the real uid/gid.

Obviously this doesn't matter most of the time (since normally they are
the same), but it results in differences for suid binaries when the wrong
uid/gid ends up being used.

This just undoes that (presumably unintentional) part of the commit.

Reported-by: Andy Lutomirski <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Serge E. Hallyn <[email protected]>
Cc: David S. Miller <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/scm.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -56,8 +56,8 @@ static __inline__ void scm_set_cred(stru
scm->pid = get_pid(pid);
scm->cred = cred ? get_cred(cred) : NULL;
scm->creds.pid = pid_vnr(pid);
- scm->creds.uid = cred ? cred->euid : INVALID_UID;
- scm->creds.gid = cred ? cred->egid : INVALID_GID;
+ scm->creds.uid = cred ? cred->uid : INVALID_UID;
+ scm->creds.gid = cred ? cred->gid : INVALID_GID;
}

static __inline__ void scm_destroy_cred(struct scm_cookie *scm)

2013-04-29 19:02:52

by Greg Kroah-Hartman

Subject: [ 32/42] net: mvmdio: add select PHYLIB

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Thomas Petazzoni <[email protected]>

[ Upstream commit 2e0cbf2cc2c9371f0aa198857d799175ffe231a6 ]

The mvmdio driver uses the phylib API, so it should select the PHYLIB
symbol, otherwise, a build with mvmdio (but without mvneta) fails to
build with undefined symbols such as mdiobus_unregister, mdiobus_free,
etc.

The mvneta driver does not use the phylib API directly, so it does not
need to select PHYLIB. It already selects the mvmdio driver anyway.

Historically, this problem is due to the fact that the PHY handling
was originally part of mvneta, and was later moved to a separate
driver, without updating the Kconfig select statements
accordingly. And since there was no functional reason to use mvmdio
without mvneta, this case was not tested.

Signed-off-by: Thomas Petazzoni <[email protected]>
Reported-by: Fengguang Wu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/marvell/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -33,6 +33,7 @@ config MV643XX_ETH

config MVMDIO
tristate "Marvell MDIO interface support"
+ select PHYLIB
---help---
This driver supports the MDIO interface found in the network
interface units of the Marvell EBU SoCs (Kirkwood, Orion5x,
@@ -45,7 +46,6 @@ config MVMDIO
config MVNETA
tristate "Marvell Armada 370/XP network interface support"
depends on MACH_ARMADA_370_XP
- select PHYLIB
select MVMDIO
---help---
This driver supports the network interface units in the

2013-04-29 19:34:28

by Greg Kroah-Hartman

Subject: [ 37/42] net: rate-limit warn-bad-offload splats.

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Ben Greear <[email protected]>

[ Upstream commit c846ad9b880ece01bb4d8d07ba917734edf0324f ]

If one does do something unfortunate and allow a
bad offload bug into the kernel, then
skb_warn_bad_offload() can effectively live-lock the
system, filling the logs with the same error over
and over.

Add rate limiting to this so that the box remains otherwise
functional in this case.

Signed-off-by: Ben Greear <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/dev.c | 3 +++
1 file changed, 3 insertions(+)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2018,6 +2018,9 @@ static void skb_warn_bad_offload(const s
struct net_device *dev = skb->dev;
const char *driver = "";

+ if (!net_ratelimit())
+ return;
+
if (dev && dev->dev.parent)
driver = dev_driver_string(dev->dev.parent);


2013-04-29 19:34:47

by Greg Kroah-Hartman

Subject: [ 27/42] netfilter: don't reset nf_trace in nf_reset()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Patrick McHardy <[email protected]>

[ Upstream commit 124dff01afbdbff251f0385beca84ba1b9adda68 ]

Commit 130549fe ("netfilter: reset nf_trace in nf_reset") added code
to reset nf_trace in nf_reset(). This is wrong and unnecessary.

nf_reset() is used in the following cases:

- when passing packets up to the socket layer, at which point we want to
release all netfilter references that might keep modules pinned while
the packet is queued. nf_trace doesn't matter anymore at this point.

- when encapsulating or decapsulating IPsec packets. We want to continue
tracing these packets after IPsec processing.

- when passing packets through virtual network devices. Only devices
that encapsulate in IPv4/v6 matter since otherwise nf_trace is not
used anymore. It's not entirely clear whether those packets should
be traced after that, however we've always done that.

- when passing packets through virtual network devices that make the
packet cross network namespace boundaries. This is the only case
where we clearly want to reset nf_trace and is also what the
original patch intended to fix.

Add a new function nf_reset_trace() and use it in dev_forward_skb() to
fix this properly.

Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/skbuff.h | 7 +++++++
net/core/dev.c | 1 +
2 files changed, 8 insertions(+)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2597,6 +2597,13 @@ static inline void nf_reset(struct sk_bu
#endif
}

+static inline void nf_reset_trace(struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE)
+ skb->nf_trace = 0;
+#endif
+}
+
/* Note: This doesn't put any conntrack and bridge info in dst. */
static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src)
{
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1737,6 +1737,7 @@ int dev_forward_skb(struct net_device *d
skb->mark = 0;
secpath_reset(skb);
nf_reset(skb);
+ nf_reset_trace(skb);
return netif_rx(skb);
}
EXPORT_SYMBOL_GPL(dev_forward_skb);

2013-04-29 19:35:04

by Greg Kroah-Hartman

Subject: [ 34/42] net: mvneta: fix improper tx queue usage in mvneta_tx()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Willy Tarreau <[email protected]>

[ Upstream commit ee40a116ebf139f900c3d2e6febb8388738e96d0 ]

mvneta_tx() was using a static tx queue number causing crashes as
soon as a little bit of traffic was sent via the interface, because
it is normally expected that the same queue should be used as in
dev_queue_xmit().

As suggested by Ben Hutchings, let's use skb_get_queue_mapping() to
get the proper Tx queue number, and use alloc_etherdev_mqs() instead
of alloc_etherdev_mq() to create the queues.

Both my Mirabox and my OpenBlocks AX3 used to crash without this patch
and don't anymore with it. The issue appeared in 3.8 but became more
visible after the fix allowing GSO to be enabled.

Original work was done by Dmitri Epshtein and Thomas Petazzoni. I
just adapted it to take care of Ben's comments.

Signed-off-by: Willy Tarreau <[email protected]>
Cc: Dmitri Epshtein <[email protected]>
Cc: Thomas Petazzoni <[email protected]>
Cc: Gregory CLEMENT <[email protected]>
Cc: Ben Hutchings <[email protected]>
Tested-by: Gregory CLEMENT <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -375,7 +375,6 @@ static int rxq_number = 8;
static int txq_number = 8;

static int rxq_def;
-static int txq_def;

#define MVNETA_DRIVER_NAME "mvneta"
#define MVNETA_DRIVER_VERSION "1.0"
@@ -1476,7 +1475,8 @@ error:
static int mvneta_tx(struct sk_buff *skb, struct net_device *dev)
{
struct mvneta_port *pp = netdev_priv(dev);
- struct mvneta_tx_queue *txq = &pp->txqs[txq_def];
+ u16 txq_id = skb_get_queue_mapping(skb);
+ struct mvneta_tx_queue *txq = &pp->txqs[txq_id];
struct mvneta_tx_desc *tx_desc;
struct netdev_queue *nq;
int frags = 0;
@@ -1486,7 +1486,7 @@ static int mvneta_tx(struct sk_buff *skb
goto out;

frags = skb_shinfo(skb)->nr_frags + 1;
- nq = netdev_get_tx_queue(dev, txq_def);
+ nq = netdev_get_tx_queue(dev, txq_id);

/* Get a descriptor for the first part of the packet */
tx_desc = mvneta_txq_next_desc_get(txq);
@@ -2690,7 +2690,7 @@ static int mvneta_probe(struct platform_
return -EINVAL;
}

- dev = alloc_etherdev_mq(sizeof(struct mvneta_port), 8);
+ dev = alloc_etherdev_mqs(sizeof(struct mvneta_port), txq_number, rxq_number);
if (!dev)
return -ENOMEM;

@@ -2844,4 +2844,3 @@ module_param(rxq_number, int, S_IRUGO);
module_param(txq_number, int, S_IRUGO);

module_param(rxq_def, int, S_IRUGO);
-module_param(txq_def, int, S_IRUGO);

2013-04-29 19:02:50

by Greg Kroah-Hartman

Subject: [ 31/42] tcp: Reallocate headroom if it would overflow csum_start

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Thomas Graf <[email protected]>

[ Upstream commit 50bceae9bd3569d56744882f3012734d48a1d413 ]

If a TCP retransmission gets partially ACKed and collapsed multiple
times it is possible for the headroom to grow beyond 64K which will
overflow the 16bit skb->csum_start which is based on the start of
the headroom. It has been observed rarely in the wild with IPoIB due
to the 64K MTU.

Verify if the acking and collapsing resulted in a headroom exceeding
what csum_start can cover and reallocate the headroom if so.

A big thank you to Jim Foraker <[email protected]> and the team at
LLNL for helping out with the investigation and testing.

Reported-by: Jim Foraker <[email protected]>
Signed-off-by: Thomas Graf <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_output.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2388,8 +2388,12 @@ int __tcp_retransmit_skb(struct sock *sk
*/
TCP_SKB_CB(skb)->when = tcp_time_stamp;

- /* make sure skb->data is aligned on arches that require it */
- if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
+ /* make sure skb->data is aligned on arches that require it
+ * and check if ack-trimming & collapsing extended the headroom
+ * beyond what csum_start can cover.
+ */
+ if (unlikely((NET_IP_ALIGN && ((unsigned long)skb->data & 3)) ||
+ skb_headroom(skb) >= 0xFFFF)) {
struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
GFP_ATOMIC);
return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
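
The wrap-around that the commit message describes can be reproduced in user space. The following is only a sketch and not kernel code: `store_offset16()` and `headroom_needs_realloc()` are hypothetical helpers, and only the 0xFFFF threshold is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for storing an offset in a 16-bit field such as
 * skb->csum_start: anything above 0xFFFF is silently truncated. */
static uint16_t store_offset16(size_t offset)
{
    return (uint16_t)offset;
}

/* Same threshold as the patch: reallocate once headroom reaches 0xFFFF. */
static int headroom_needs_realloc(size_t headroom)
{
    return headroom >= 0xFFFF;
}

static void csum_start_demo(void)
{
    /* 64K of headroom plus a 40-byte offset wraps around to 40. */
    assert(store_offset16(65536 + 40) == 40);
    assert(headroom_needs_realloc(65536));
    assert(!headroom_needs_realloc(1024));
}
```

Copying the skb with a fresh, small headroom keeps the offset well inside the range that a 16-bit field can represent.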

2013-04-29 19:35:40

by Greg Kroah-Hartman

Subject: [ 33/42] esp4: fix error return code in esp_output()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Wei Yongjun <[email protected]>

[ Upstream commit 06848c10f720cbc20e3b784c0df24930b7304b93 ]

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun <[email protected]>
Acked-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/esp4.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -139,8 +139,6 @@ static int esp_output(struct xfrm_state

/* skb is pure payload to encrypt */

- err = -ENOMEM;
-
esp = x->data;
aead = esp->aead;
alen = crypto_aead_authsize(aead);
@@ -176,8 +174,10 @@ static int esp_output(struct xfrm_state
}

tmp = esp_alloc_tmp(aead, nfrags + sglists, seqhilen);
- if (!tmp)
+ if (!tmp) {
+ err = -ENOMEM;
goto error;
+ }

seqhi = esp_tmp_seqhi(tmp);
iv = esp_tmp_iv(aead, tmp, seqhilen);
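
The error-code pattern being fixed can be illustrated with a small user-space sketch. It assumes that an earlier successful call overwrites `err` before the allocation is attempted; `earlier_step()`, `output_old()` and `output_new()` are hypothetical names, not functions from esp4.c.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical earlier step; on success it returns a non-negative value
 * that replaces whatever err held before. */
static int earlier_step(void)
{
    return 0;
}

/* Old pattern: err is preset once, then clobbered before the allocation. */
static int output_old(int alloc_ok)
{
    int err = -ENOMEM;

    err = earlier_step();
    if (err < 0)
        goto error;
    if (!alloc_ok)
        goto error;        /* err is 0 here, so failure looks like success */
    err = 0;
error:
    return err;
}

/* New pattern: the error code is set at the point of failure. */
static int output_new(int alloc_ok)
{
    int err;

    err = earlier_step();
    if (err < 0)
        goto error;
    if (!alloc_ok) {
        err = -ENOMEM;
        goto error;
    }
    err = 0;
error:
    return err;
}
```

With a failed allocation, `output_old(0)` returns 0 while `output_new(0)` returns `-ENOMEM`.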

2013-04-29 19:02:47

by Greg Kroah-Hartman

Subject: [ 18/42] tipc: fix info leaks via msg_name in recv_msg/recv_stream

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit 60085c3d009b0df252547adb336d1ccca5ce52ec ]

The code in set_orig_addr() does not initialize all of the members of
struct sockaddr_tipc when filling the sockaddr info -- namely the union
is only partly filled. This will make recv_msg() and recv_stream() --
the only users of this function -- leak kernel stack memory as the
msg_name member is a local variable in net/socket.c.

In addition, both recv_msg() and recv_stream() fail to set the
msg_namelen member to 0 while otherwise returning with 0, i.e.
"success". This is the case for, e.g., non-blocking sockets. This will
lead to a 128 byte kernel stack leak in net/socket.c.

Fix the first issue by initializing the memory of the union with
memset(0). Fix the second one by setting msg_namelen to 0 early as it
will be updated later if we're going to fill the msg_name member.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Jon Maloy <[email protected]>
Cc: Allan Stephens <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/tipc/socket.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -806,6 +806,7 @@ static void set_orig_addr(struct msghdr
if (addr) {
addr->family = AF_TIPC;
addr->addrtype = TIPC_ADDR_ID;
+ memset(&addr->addr, 0, sizeof(addr->addr));
addr->addr.id.ref = msg_origport(msg);
addr->addr.id.node = msg_orignode(msg);
addr->addr.name.domain = 0; /* could leave uninitialized */
@@ -920,6 +921,9 @@ static int recv_msg(struct kiocb *iocb,
goto exit;
}

+ /* will be updated in set_orig_addr() if needed */
+ m->msg_namelen = 0;
+
timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
restart:

@@ -1029,6 +1033,9 @@ static int recv_stream(struct kiocb *ioc
goto exit;
}

+ /* will be updated in set_orig_addr() if needed */
+ m->msg_namelen = 0;
+
target = sock_rcvlowat(sk, flags & MSG_WAITALL, buf_len);
timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
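
Why a partly filled union leaks data can be shown with a user-space sketch. The layout below is hypothetical (it is not struct sockaddr_tipc); it only assumes that the union is larger than the member being assigned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address layout: the union is larger than its "id" member. */
struct demo_addr {
    uint16_t family;
    union {
        struct { uint32_t ref, node; } id;   /* 8 bytes */
        uint32_t raw[4];                     /* 16 bytes */
    } addr;
};

/* Fill only the id member, optionally zeroing the whole union first. */
static void fill_id(struct demo_addr *a, int zero_first)
{
    if (zero_first)
        memset(&a->addr, 0, sizeof(a->addr));
    a->addr.id.ref = 1;
    a->addr.id.node = 2;
}

/* Count union bytes that still hold the stale fill pattern. */
static size_t stale_bytes(const struct demo_addr *a, unsigned char pattern)
{
    const unsigned char *p = (const unsigned char *)&a->addr;
    size_t i, n = 0;

    for (i = 0; i < sizeof(a->addr); i++)
        n += (p[i] == pattern);
    return n;
}
```

If the structure is pre-filled with 0xAA to stand in for old stack contents, 8 stale bytes survive without the memset() and none survive with it.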


2013-04-29 19:35:57

by Greg Kroah-Hartman

Subject: [ 30/42] tcp: incoming connections might use wrong route under synflood

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Dmitry Popov <[email protected]>

[ Upstream commit d66954a066158781ccf9c13c91d0316970fe57b6 ]

There is a bug in cookie_v4_check (net/ipv4/syncookies.c):
flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
RT_SCOPE_UNIVERSE, IPPROTO_TCP,
inet_sk_flowi_flags(sk),
(opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
ireq->loc_addr, th->source, th->dest);

Here we do not respect sk->sk_bound_dev_if, so the wrong dst_entry may be
selected. This dst_entry is used by the new socket (get_cookie_sock ->
tcp_v4_syn_recv_sock), so its packets may take the wrong path.

Signed-off-by: Dmitry Popov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/syncookies.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -348,8 +348,8 @@ struct sock *cookie_v4_check(struct sock
* hasn't changed since we received the original syn, but I see
* no easy way to do this.
*/
- flowi4_init_output(&fl4, 0, sk->sk_mark, RT_CONN_FLAGS(sk),
- RT_SCOPE_UNIVERSE, IPPROTO_TCP,
+ flowi4_init_output(&fl4, sk->sk_bound_dev_if, sk->sk_mark,
+ RT_CONN_FLAGS(sk), RT_SCOPE_UNIVERSE, IPPROTO_TCP,
inet_sk_flowi_flags(sk),
(opt && opt->srr) ? opt->faddr : ireq->rmt_addr,
ireq->loc_addr, th->source, th->dest);
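
The effect of passing 0 instead of the bound device index can be sketched with a toy lookup. This is a heavily simplified model under stated assumptions: `struct demo_route` and `demo_lookup()` are hypothetical, and the real routing lookup is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry: destination address and outgoing interface. */
struct demo_route {
    uint32_t dst;
    int ifindex;
};

/* Return the first route to dst; oif == 0 means "any interface". */
static const struct demo_route *demo_lookup(const struct demo_route *tbl,
                                            size_t n, uint32_t dst, int oif)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tbl[i].dst != dst)
            continue;
        if (oif != 0 && tbl[i].ifindex != oif)
            continue;
        return &tbl[i];
    }
    return NULL;
}
```

With two routes to the same destination on interfaces 2 and 3, a lookup with oif 0 returns the route on interface 2 even if the socket is bound to interface 3. Passing the bound index returns the route on interface 3.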

2013-04-29 19:36:26

by Greg Kroah-Hartman

Subject: [ 28/42] ipv6/tcp: Stop processing ICMPv6 redirect messages

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Christoph Paasch <[email protected]>

[ Upstream commit 50a75a8914539c5dcd441c5f54d237a666a426fd ]

Tetja Rediske found that if the host receives an ICMPv6 redirect message
after sending a SYN+ACK, the connection will be reset.

He bisected it down to 093d04d (ipv6: Change skb->data before using
icmpv6_notify() to propagate redirect), but the origin of the bug comes
from ec18d9a26 (ipv6: Add redirect support to all protocol icmp error
handlers.). The bug simply did not trigger prior to 093d04d, because
skb->data did not point to the inner IP header and thus icmpv6_notify
did not call the correct err_handler.

This patch adds the missing "goto out;" in tcp_v6_err. After receiving
an ICMPv6 Redirect, we should not continue processing the ICMP in
tcp_v6_err, as this may trigger the removal of request-socks or setting
sk_err(_soft).

Reported-by: Tetja Rediske <[email protected]>
Signed-off-by: Christoph Paasch <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/tcp_ipv6.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -386,6 +386,7 @@ static void tcp_v6_err(struct sk_buff *s

if (dst)
dst->ops->redirect(dst, sk, skb);
+ goto out;
}

if (type == ICMPV6_PKT_TOOBIG) {
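
The control-flow change can be sketched as follows. The type and action constants are hypothetical, and the function only models the fall-through described in the commit message, not tcp_v6_err() itself.

```c
#include <assert.h>

enum { DEMO_REDIRECT = 1, DEMO_OTHER = 2 };               /* hypothetical ICMP types */
enum { ACT_REDIRECT = 1 << 0, ACT_REPORT_ERR = 1 << 1 };  /* actions taken */

/* Returns a bitmask of the actions the handler performed. */
static int demo_err_handler(int type, int patched)
{
    int actions = 0;

    if (type == DEMO_REDIRECT) {
        actions |= ACT_REDIRECT;     /* update the route */
        if (patched)
            goto out;                /* the added "goto out;" */
    }
    actions |= ACT_REPORT_ERR;       /* e.g. set sk_err or drop a request sock */
out:
    return actions;
}
```

Without the early exit, a redirect is also treated as an error, which matches the reported connection reset.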

2013-04-29 19:36:24

by Greg Kroah-Hartman

Subject: [ 29/42] rtnetlink: Call nlmsg_parse() with correct header length

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Michael Riesch <[email protected]>

[ Upstream commit 88c5b5ce5cb57af6ca2a7cf4d5715fa320448ff9 ]

Signed-off-by: Michael Riesch <[email protected]>
Cc: Jiri Benc <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Acked-by: Mark Rustad <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/core/rtnetlink.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1068,7 +1068,7 @@ static int rtnl_dump_ifinfo(struct sk_bu
rcu_read_lock();
cb->seq = net->dev_base_seq;

- if (nlmsg_parse(cb->nlh, sizeof(struct rtgenmsg), tb, IFLA_MAX,
+ if (nlmsg_parse(cb->nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
ifla_policy) >= 0) {

if (tb[IFLA_EXT_MASK])
@@ -1924,7 +1924,7 @@ static u16 rtnl_calcit(struct sk_buff *s
u32 ext_filter_mask = 0;
u16 min_ifinfo_dump_size = 0;

- if (nlmsg_parse(nlh, sizeof(struct rtgenmsg), tb, IFLA_MAX,
+ if (nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb, IFLA_MAX,
ifla_policy) >= 0) {
if (tb[IFLA_EXT_MASK])
ext_filter_mask = nla_get_u32(tb[IFLA_EXT_MASK]);

2013-04-29 19:02:45

by Greg Kroah-Hartman

Subject: [ 24/42] bonding: IFF_BONDING is not stripped on enslave failure

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: "[email protected]" <[email protected]>

[ Upstream commit b6a5a7b9a528a8b4c8bec940b607c5dd9102b8cc ]

If enslaving a new device fails after the IFF_BONDING flag has been set,
the flag is not stripped from the device's priv_flags during cleanup,
which could lead to other problems.
The flag is cleared at err_close because it is set after dev_open().

v2: no change

Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/bonding/bond_main.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1888,6 +1888,7 @@ err_detach:
write_unlock_bh(&bond->lock);

err_close:
+ slave_dev->priv_flags &= ~IFF_BONDING;
dev_close(slave_dev);

err_unset_master:

2013-04-29 19:39:47

by Greg Kroah-Hartman

Subject: [ 26/42] af_unix: If we dont care about credentials coallesce all messages

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: "Eric W. Biederman" <[email protected]>

[ Upstream commit 0e82e7f6dfeec1013339612f74abc2cdd29d43d2 ]

It was reported that the following LSB test case failed
https://lsbbugs.linuxfoundation.org/attachment.cgi?id=2144 because we
were not coalescing unix stream messages when the application was
expecting us to.

The problem was that the first send was before the socket was accepted
and thus sock->sk_socket was NULL in maybe_add_creds, and the second
send after the socket was accepted had a non-NULL value for sk->socket
and thus we could tell the credentials were not needed so we did not
bother.

The unnecessary credentials on the first message cause
unix_stream_recvmsg to start verifying that all messages had the same
credentials before coalescing and then the coalescing failed because
the second message had no credentials.

Ignoring credentials when we don't care in unix_stream_recvmsg fixes a
long standing pessimization which would fail to coalesce messages when
reading from a unix stream socket if the senders were different even if
we did not care about their credentials.

I have tested this and verified that in the LSB test case mentioned
above the messages now coalesce, whereas they failed to coalesce
without this change.

Reported-by: Karel Srot <[email protected]>
Reported-by: Ding Tianhong <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/unix/af_unix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1995,7 +1995,7 @@ again:
if ((UNIXCB(skb).pid != siocb->scm->pid) ||
(UNIXCB(skb).cred != siocb->scm->cred))
break;
- } else {
+ } else if (test_bit(SOCK_PASSCRED, &sock->flags)) {
/* Copy credentials */
scm_set_cred(siocb->scm, UNIXCB(skb).pid, UNIXCB(skb).cred);
check_creds = 1;
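
The coalescing decision can be modelled in a few lines. This is a sketch under assumptions: each queued message carries a hypothetical integer credential id (0 meaning none attached), and `coalesced()` only mirrors the branch structure visible in the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* creds[i] is a hypothetical credential id attached to queued message i
 * (0 means "no credentials attached"). Returns how many messages a single
 * read would merge. */
static size_t coalesced(const int *creds, size_t n, int passcred, int patched)
{
    int check_creds = 0, first_cred = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (check_creds) {
            if (creds[i] != first_cred)
                break;              /* different sender: stop merging */
        } else if (!patched || passcred) {
            first_cred = creds[i];  /* copy credentials */
            check_creds = 1;
        }
    }
    return i;
}
```

For two messages where only the first carries credentials, the old logic merges one message. The new logic merges both unless the reader asked for credentials.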

2013-04-29 19:40:06

by Greg Kroah-Hartman

Subject: [ 25/42] bonding: fix l23 and l34 load balancing in forwarding path

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Eric Dumazet <[email protected]>

[ Upstream commit 4394542ca4ec9f28c3c8405063d200b1e7c347d7 ]

Since commit 6b923cb7188d46 (bonding: support for IPv6 transmit hashing)
bonding doesn't properly hash traffic in forwarding setups.

Vitaly V. Bursov diagnosed that skb_network_header_len() returned 0 in
this case.

More generally, the transport header might not be in the skb head.

Use pskb_may_pull() & skb_header_pointer() to get it right, and use
proto_ports_offset() in bond_xmit_hash_policy_l34() to get support for
more protocols than TCP and UDP.

Reported-by: Vitaly V. Bursov <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jay Vosburgh <[email protected]>
Cc: Andy Gospodarek <[email protected]>
Cc: John Eaglesham <[email protected]>
Tested-by: Vitaly V. Bursov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/bonding/bond_main.c | 55 +++++++++++++++++++++-------------------
1 file changed, 30 insertions(+), 25 deletions(-)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3380,20 +3380,22 @@ static int bond_xmit_hash_policy_l2(stru
*/
static int bond_xmit_hash_policy_l23(struct sk_buff *skb, int count)
{
- struct ethhdr *data = (struct ethhdr *)skb->data;
- struct iphdr *iph;
- struct ipv6hdr *ipv6h;
+ const struct ethhdr *data;
+ const struct iphdr *iph;
+ const struct ipv6hdr *ipv6h;
u32 v6hash;
- __be32 *s, *d;
+ const __be32 *s, *d;

if (skb->protocol == htons(ETH_P_IP) &&
- skb_network_header_len(skb) >= sizeof(*iph)) {
+ pskb_network_may_pull(skb, sizeof(*iph))) {
iph = ip_hdr(skb);
+ data = (struct ethhdr *)skb->data;
return ((ntohl(iph->saddr ^ iph->daddr) & 0xffff) ^
(data->h_dest[5] ^ data->h_source[5])) % count;
} else if (skb->protocol == htons(ETH_P_IPV6) &&
- skb_network_header_len(skb) >= sizeof(*ipv6h)) {
+ pskb_network_may_pull(skb, sizeof(*ipv6h))) {
ipv6h = ipv6_hdr(skb);
+ data = (struct ethhdr *)skb->data;
s = &ipv6h->saddr.s6_addr32[0];
d = &ipv6h->daddr.s6_addr32[0];
v6hash = (s[1] ^ d[1]) ^ (s[2] ^ d[2]) ^ (s[3] ^ d[3]);
@@ -3412,33 +3414,36 @@ static int bond_xmit_hash_policy_l23(str
static int bond_xmit_hash_policy_l34(struct sk_buff *skb, int count)
{
u32 layer4_xor = 0;
- struct iphdr *iph;
- struct ipv6hdr *ipv6h;
- __be32 *s, *d;
- __be16 *layer4hdr;
+ const struct iphdr *iph;
+ const struct ipv6hdr *ipv6h;
+ const __be32 *s, *d;
+ const __be16 *l4 = NULL;
+ __be16 _l4[2];
+ int noff = skb_network_offset(skb);
+ int poff;

if (skb->protocol == htons(ETH_P_IP) &&
- skb_network_header_len(skb) >= sizeof(*iph)) {
+ pskb_may_pull(skb, noff + sizeof(*iph))) {
iph = ip_hdr(skb);
- if (!ip_is_fragment(iph) &&
- (iph->protocol == IPPROTO_TCP ||
- iph->protocol == IPPROTO_UDP) &&
- (skb_headlen(skb) - skb_network_offset(skb) >=
- iph->ihl * sizeof(u32) + sizeof(*layer4hdr) * 2)) {
- layer4hdr = (__be16 *)((u32 *)iph + iph->ihl);
- layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1));
+ poff = proto_ports_offset(iph->protocol);
+
+ if (!ip_is_fragment(iph) && poff >= 0) {
+ l4 = skb_header_pointer(skb, noff + (iph->ihl << 2) + poff,
+ sizeof(_l4), &_l4);
+ if (l4)
+ layer4_xor = ntohs(l4[0] ^ l4[1]);
}
return (layer4_xor ^
((ntohl(iph->saddr ^ iph->daddr)) & 0xffff)) % count;
} else if (skb->protocol == htons(ETH_P_IPV6) &&
- skb_network_header_len(skb) >= sizeof(*ipv6h)) {
+ pskb_may_pull(skb, noff + sizeof(*ipv6h))) {
ipv6h = ipv6_hdr(skb);
- if ((ipv6h->nexthdr == IPPROTO_TCP ||
- ipv6h->nexthdr == IPPROTO_UDP) &&
- (skb_headlen(skb) - skb_network_offset(skb) >=
- sizeof(*ipv6h) + sizeof(*layer4hdr) * 2)) {
- layer4hdr = (__be16 *)(ipv6h + 1);
- layer4_xor = ntohs(*layer4hdr ^ *(layer4hdr + 1));
+ poff = proto_ports_offset(ipv6h->nexthdr);
+ if (poff >= 0) {
+ l4 = skb_header_pointer(skb, noff + sizeof(*ipv6h) + poff,
+ sizeof(_l4), &_l4);
+ if (l4)
+ layer4_xor = ntohs(l4[0] ^ l4[1]);
}
s = &ipv6h->saddr.s6_addr32[0];
d = &ipv6h->daddr.s6_addr32[0];
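
The IPv4 layer3+4 hash formula used above can be tried out in user space. The sketch assumes all inputs are already in host byte order and uses a hypothetical `have_ports` flag for the case where the ports cannot be read (fragments, unsupported protocol).

```c
#include <assert.h>
#include <stdint.h>

/* Returns the slave index for a flow. have_ports is 0 when the L4 ports
 * could not be read, in which case only the addresses are hashed. */
static int demo_hash_l34(uint32_t saddr, uint32_t daddr,
                         uint16_t sport, uint16_t dport,
                         int have_ports, int count)
{
    uint32_t layer4_xor = have_ports ? (uint32_t)(sport ^ dport) : 0;

    assert(count > 0);
    return (int)((layer4_xor ^ ((saddr ^ daddr) & 0xffff)) % (uint32_t)count);
}
```

For 192.168.0.1:1001 to 192.168.0.2:80 with four slaves this gives index 2, and index 3 when the ports are unavailable. When the ports were silently not found, as in the forwarding case, flows between the same two hosts all landed on one slave.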

2013-04-29 19:02:43

by Greg Kroah-Hartman

Subject: [ 16/42] NFC: llcp: fix info leaks via msg_name in llcp_sock_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit d26d6504f23e803824e8ebd14e52d4fc0a0b09cb ]

The code in llcp_sock_recvmsg() does not initialize all the members of
struct sockaddr_nfc_llcp when filling the sockaddr info. Nor does it
initialize the padding bytes of the structure inserted by the compiler
for alignment.

Also, if the socket is in state LLCP_CLOSED or is shutting down during
receive the msg_namelen member is not updated to 0 while otherwise
returning with 0, i.e. "success". The msg_namelen update is also
missing for stream and seqpacket sockets which don't fill the sockaddr
info.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix the first issue by initializing the memory used for sockaddr info
with memset(0). Fix the second one by setting msg_namelen to 0 early.
It will be updated later if we're going to fill the msg_name member.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Lauro Ramos Venancio <[email protected]>
Cc: Aloisio Almeida Jr <[email protected]>
Cc: Samuel Ortiz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/nfc/llcp/sock.c | 3 +++
1 file changed, 3 insertions(+)

--- a/net/nfc/llcp/sock.c
+++ b/net/nfc/llcp/sock.c
@@ -644,6 +644,8 @@ static int llcp_sock_recvmsg(struct kioc

pr_debug("%p %zu\n", sk, len);

+ msg->msg_namelen = 0;
+
lock_sock(sk);

if (sk->sk_state == LLCP_CLOSED &&
@@ -684,6 +686,7 @@ static int llcp_sock_recvmsg(struct kioc

pr_debug("Datagram socket %d %d\n", ui_cb->dsap, ui_cb->ssap);

+ memset(&sockaddr, 0, sizeof(sockaddr));
sockaddr.sa_family = AF_NFC;
sockaddr.nfc_protocol = NFC_PROTO_NFC_DEP;
sockaddr.dsap = ui_cb->dsap;

2013-04-29 19:40:52

by Greg Kroah-Hartman

Subject: [ 23/42] bonding: fix bonding_masters race condition in bond unloading

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: "[email protected]" <[email protected]>

[ Upstream commit 69b0216ac255f523556fa3d4ff030d857eaaa37f ]

While the bonding module is unloading, it is assumed that all bond
devices have been destroyed after rtnl_link_unregister, but since no
synchronization mechanism exists, a new bond device can be created
via bonding_masters before unregister_pernet_subsys, which can
lead to multiple problems (e.g. NULL pointer dereference, wrong RIP,
list corruption).

This patch fixes the issue by removing any bond devices left in the
netns after bonding_masters is removed from sysfs.

Signed-off-by: Nikolay Aleksandrov <[email protected]>
Acked-by: Veaceslav Falico <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/bonding/bond_main.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4919,9 +4919,18 @@ static int __net_init bond_net_init(stru
static void __net_exit bond_net_exit(struct net *net)
{
struct bond_net *bn = net_generic(net, bond_net_id);
+ struct bonding *bond, *tmp_bond;
+ LIST_HEAD(list);

bond_destroy_sysfs(bn);
bond_destroy_proc_dir(bn);
+
+ /* Kill off any bonds created after unregistering bond rtnl ops */
+ rtnl_lock();
+ list_for_each_entry_safe(bond, tmp_bond, &bn->dev_list, bond_list)
+ unregister_netdevice_queue(bond->dev, &list);
+ unregister_netdevice_many(&list);
+ rtnl_unlock();
}

static struct pernet_operations bond_net_ops = {

2013-04-29 19:41:16

by Greg Kroah-Hartman

Subject: [ 22/42] atl1e: limit gso segment size to prevent generation of wrong ip length fields

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Hannes Frederic Sowa <[email protected]>

[ Upstream commit 31d1670e73f4911fe401273a8f576edc9c2b5fea ]

The limit of 0x3c00 is taken from the Windows driver.

Suggested-by: Huang, Xiong <[email protected]>
Cc: Huang, Xiong <[email protected]>
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Hannes Frederic Sowa <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/atheros/atl1e/atl1e.h | 2 +-
drivers/net/ethernet/atheros/atl1e/atl1e_main.c | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/net/ethernet/atheros/atl1e/atl1e.h
+++ b/drivers/net/ethernet/atheros/atl1e/atl1e.h
@@ -186,7 +186,7 @@ struct atl1e_tpd_desc {
/* how about 0x2000 */
#define MAX_TX_BUF_LEN 0x2000
#define MAX_TX_BUF_SHIFT 13
-/*#define MAX_TX_BUF_LEN 0x3000 */
+#define MAX_TSO_SEG_SIZE 0x3c00

/* rrs word 1 bit 0:31 */
#define RRS_RX_CSUM_MASK 0xFFFF
--- a/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
+++ b/drivers/net/ethernet/atheros/atl1e/atl1e_main.c
@@ -2332,6 +2332,7 @@ static int atl1e_probe(struct pci_dev *p

INIT_WORK(&adapter->reset_task, atl1e_reset_task);
INIT_WORK(&adapter->link_chg_task, atl1e_link_chg_task);
+ netif_set_gso_max_size(netdev, MAX_TSO_SEG_SIZE);
err = register_netdev(netdev);
if (err) {
netdev_err(netdev, "register netdevice failed\n");

2013-04-29 19:02:41

by Greg Kroah-Hartman

Subject: [ 14/42] llc: Fix missing msg_namelen update in llc_ui_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit c77a4b9cffb6215a15196ec499490d116dfad181 ]

For stream sockets the code fails to set the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.

Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/llc/af_llc.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -720,6 +720,8 @@ static int llc_ui_recvmsg(struct kiocb *
int target; /* Read at least this many bytes */
long timeo;

+ msg->msg_namelen = 0;
+
lock_sock(sk);
copied = -ENOTCONN;
if (unlikely(sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN))
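
Why msg_namelen has to be cleared before any early return can be seen in a small model. `struct demo_msghdr` and `demo_recvmsg()` are hypothetical, and the value 16 is a made-up address length.

```c
#include <assert.h>

struct demo_msghdr {
    int msg_namelen;   /* preset by the caller to the size of its buffer */
};

/* Hypothetical receive path: early_exit models a return before the
 * address is filled in (stream socket, shutdown, ...). */
static int demo_recvmsg(struct demo_msghdr *m, int early_exit, int patched)
{
    if (patched)
        m->msg_namelen = 0;    /* set early; updated below if needed */
    if (early_exit)
        return 0;              /* "success" without touching msg_name */
    m->msg_namelen = 16;       /* address was filled in (made-up size) */
    return 0;
}
```

If the caller presets msg_namelen to 128, an unpatched early exit leaves it at 128, so the caller would copy 128 bytes of an address buffer that was never written.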

2013-04-29 19:41:53

by Greg Kroah-Hartman

Subject: [ 21/42] net: count hw_addr syncs so that unsync works properly.

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Vlad Yasevich <[email protected]>

[ Upstream commit 4543fbefe6e06a9e40d9f2b28d688393a299f079 ]

A few drivers use dev_uc_sync/unsync to synchronize the
address lists from master down to slave/lower devices. In
some cases (bond/team) a single address list is synched down
to multiple devices. At the time of unsync, we have a leak
in these lower devices, because "synced" is treated as a
boolean and the address will not be unsynced for anything after
the first device/call.

Treat "synced" as a count (same as refcount) and allow all
unsync calls to work.

Signed-off-by: Vlad Yasevich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/netdevice.h | 2 +-
net/core/dev_addr_lists.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -208,9 +208,9 @@ struct netdev_hw_addr {
#define NETDEV_HW_ADDR_T_SLAVE 3
#define NETDEV_HW_ADDR_T_UNICAST 4
#define NETDEV_HW_ADDR_T_MULTICAST 5
- bool synced;
bool global_use;
int refcount;
+ int synced;
struct rcu_head rcu_head;
};

--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -38,7 +38,7 @@ static int __hw_addr_create_ex(struct ne
ha->type = addr_type;
ha->refcount = 1;
ha->global_use = global;
- ha->synced = false;
+ ha->synced = 0;
list_add_tail_rcu(&ha->list, &list->list);
list->count++;

@@ -166,7 +166,7 @@ int __hw_addr_sync(struct netdev_hw_addr
addr_len, ha->type);
if (err)
break;
- ha->synced = true;
+ ha->synced++;
ha->refcount++;
} else if (ha->refcount == 1) {
__hw_addr_del(to_list, ha->addr, addr_len, ha->type);
@@ -187,7 +187,7 @@ void __hw_addr_unsync(struct netdev_hw_a
if (ha->synced) {
__hw_addr_del(to_list, ha->addr,
addr_len, ha->type);
- ha->synced = false;
+ ha->synced--;
__hw_addr_del(from_list, ha->addr,
addr_len, ha->type);
}
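
The difference between a boolean and a counter can be shown with a simplified model of the behaviour the commit message describes. It does not reproduce the conditions in __hw_addr_sync(); `struct demo_hw_addr` and the helpers are hypothetical.

```c
#include <assert.h>

struct demo_hw_addr {
    int refcount;
    int synced;
};

/* Record one sync of the address down to a lower device. */
static void demo_sync(struct demo_hw_addr *ha, int counted)
{
    if (counted)
        ha->synced++;
    else
        ha->synced = 1;      /* boolean: "synced at least once" */
    ha->refcount++;
}

/* Undo one sync; does nothing if the address is not marked as synced. */
static void demo_unsync(struct demo_hw_addr *ha, int counted)
{
    if (!ha->synced)
        return;
    if (counted)
        ha->synced--;
    else
        ha->synced = 0;
    ha->refcount--;
}

/* Sync to two lower devices, unsync from both, return the final refcount. */
static int refcount_after_two_devices(int counted)
{
    struct demo_hw_addr ha = { 1, 0 };

    demo_sync(&ha, counted);
    demo_sync(&ha, counted);
    demo_unsync(&ha, counted);
    demo_unsync(&ha, counted);
    return ha.refcount;
}
```

In this model the boolean variant ends with a refcount of 2, because the second unsync is skipped. The counted variant returns to the initial value of 1.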

2013-04-29 19:42:14

by Greg Kroah-Hartman

Subject: [ 20/42] net IPv6 : Fix broken IPv6 routing table after loopback down-up

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Balakumaran Kannan <[email protected]>

[ Upstream commit 25fb6ca4ed9cad72f14f61629b68dc03c0d9713f ]

The IPv6 routing table breaks after an ifdown/ifup cycle of the loopback (lo)
interface. After the down-up, the routes to other interfaces' IPv6 addresses
through 'lo' are lost.

IPv6 addresses assigned to all interfaces are routed through 'lo' for internal
communication. Once 'lo' is down, those routing entries are removed from the
routing table, but they are not re-created properly when 'lo' is brought up.
As a result, the IPv6 addresses of other interfaces become unreachable from the
same machine. This also breaks communication with other machines because
NDISC packet processing fails.

This patch fixes the issue by reading the IPv6 addresses of all interfaces and
adding them to the IPv6 routing table while bringing up 'lo'.

==Testing==
Before applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$

After applying the patch:
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$ sudo ifdown lo
$ sudo ifup lo
$ route -A inet6
Kernel IPv6 routing table
Destination Next Hop Flag Met Ref Use If
2000::20/128 :: U 256 0 0 eth0
fe80::/64 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
::1/128 :: Un 0 1 0 lo
2000::20/128 :: Un 0 1 0 lo
fe80::xxxx:xxxx:xxxx:xxxx/128 :: Un 0 1 0 lo
ff00::/8 :: U 256 0 0 eth0
::/0 :: !n -1 1 1 lo
$

Signed-off-by: Balakumaran Kannan <[email protected]>
Signed-off-by: Maruthi Thotad <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/addrconf.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)

--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2525,6 +2525,9 @@ static void sit_add_v4_addrs(struct inet
static void init_loopback(struct net_device *dev)
{
struct inet6_dev *idev;
+ struct net_device *sp_dev;
+ struct inet6_ifaddr *sp_ifa;
+ struct rt6_info *sp_rt;

/* ::1 */

@@ -2536,6 +2539,30 @@ static void init_loopback(struct net_dev
}

add_addr(idev, &in6addr_loopback, 128, IFA_HOST);
+
+ /* Add routes to other interface's IPv6 addresses */
+ for_each_netdev(dev_net(dev), sp_dev) {
+ if (!strcmp(sp_dev->name, dev->name))
+ continue;
+
+ idev = __in6_dev_get(sp_dev);
+ if (!idev)
+ continue;
+
+ read_lock_bh(&idev->lock);
+ list_for_each_entry(sp_ifa, &idev->addr_list, if_list) {
+
+ if (sp_ifa->flags & (IFA_F_DADFAILED | IFA_F_TENTATIVE))
+ continue;
+
+ sp_rt = addrconf_dst_alloc(idev, &sp_ifa->addr, 0);
+
+ /* Failure cases are ignored */
+ if (!IS_ERR(sp_rt))
+ ip6_ins_rt(sp_rt);
+ }
+ read_unlock_bh(&idev->lock);
+ }
}

static void addrconf_add_linklocal(struct inet6_dev *idev, const struct in6_addr *addr)

2013-04-29 19:02:40

by Greg Kroah-Hartman

Subject: [ 15/42] netrom: fix info leak via msg_name in nr_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commits 3ce5efad47b62c57a4f5c54248347085a750ce0e and
c802d759623acbd6e1ee9fbdabae89159a513913 ]

In case msg_name is set the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of
struct sockaddr_ax25 inserted by the compiler for alignment. Also
the sax25_ndigis member does not get assigned, leaking four more
bytes.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Signed-off-by: Mathias Krause <[email protected]>
Cc: Ralf Baechle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/netrom/af_netrom.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1177,6 +1177,7 @@ static int nr_recvmsg(struct kiocb *iocb
}

if (sax != NULL) {
+ memset(sax, 0, sizeof(*sax));
sax->sax25_family = AF_NETROM;
skb_copy_from_linear_data_offset(skb, 7, sax->sax25_call.ax25_call,
AX25_ADDR_LEN);
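
The length passed to memset() matters here: the size of a pointer only covers the first few bytes of the structure it points to. The sketch below uses a hypothetical `struct demo_sax` (not struct sockaddr_ax25) that is assumed to be larger than a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address structure, clearly larger than a pointer. */
struct demo_sax {
    unsigned short family;
    char call[7];
    int ndigis;
    char digi[32];
};

/* Zero the first len bytes of a pattern-filled structure and count how
 * many bytes (members and padding alike) still hold the stale pattern. */
static size_t stale_after_zeroing(size_t len)
{
    struct demo_sax s;
    const unsigned char *p = (const unsigned char *)&s;
    size_t i, n = 0;

    assert(len <= sizeof(s));
    memset(&s, 0xAA, sizeof(s));   /* stand-in for old stack contents */
    memset(&s, 0, len);
    for (i = 0; i < sizeof(s); i++)
        n += (p[i] == 0xAA);
    return n;
}
```

Zeroing `sizeof(struct demo_sax)` bytes leaves nothing stale, while zeroing only `sizeof(struct demo_sax *)` bytes leaves everything past the first pointer-sized chunk untouched.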

2013-04-29 19:42:39

by Greg Kroah-Hartman

Subject: [ 19/42] cbq: incorrect processing of high limits

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Vasily Averin <[email protected]>

[ Upstream commit f0f6ee1f70c4eaab9d52cf7d255df4bd89f8d1c2 ]

Currently cbq works incorrectly for limits > 10% of the real link bandwidth,
and practically does not work for limits > 50% of the real link bandwidth.
Below are the results of experiments on a 1 Gbit link:

In shaper | Actual Result
-----------+---------------
100M | 108 Mbps
200M | 244 Mbps
300M | 412 Mbps
500M | 893 Mbps

This happens because q->now is updated incorrectly in cbq_dequeue():
when it is called before the real end of packet transmission,
L2T is greater than the real time delay, so q->now gets an extra boost
that is never compensated.

To fix this problem, prevent q->now from changing until it is synchronized
with real time.

Signed-off-by: Vasily Averin <[email protected]>
Reviewed-by: Alexey Kuznetsov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sched/sch_cbq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -962,8 +962,11 @@ cbq_dequeue(struct Qdisc *sch)
cbq_update(q);
if ((incr -= incr2) < 0)
incr = 0;
+ q->now += incr;
+ } else {
+ if (now > q->now)
+ q->now = now;
}
- q->now += incr;
q->now_rt = now;

for (;;) {
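
The effect of the hunk can be sketched in isolation. The model below is only an illustration: demo_clock, demo_advance(), tx_pending and l2t are hypothetical names, and the step that charges the estimated transmit time to the virtual clock lies outside the hunk shown above, so it is an assumption based on the commit description.

```c
#include <assert.h>

/* Hypothetical model of the two clocks kept by the scheduler. */
struct demo_clock {
	long now;     /* virtual time (q->now in the patch) */
	long now_rt;  /* real time of the previous dequeue (q->now_rt) */
};

/*
 * Advance the virtual clock the way the patched hunk does.
 * tx_pending and l2t (estimated transmit time of the last packet)
 * are illustrative parameters, not kernel names.
 */
static void demo_advance(struct demo_clock *c, long now, int tx_pending, long l2t)
{
	long incr = now - c->now_rt;

	if (tx_pending) {
		c->now += l2t;          /* assumed: charge the estimated transmit time */
		incr -= l2t;
		if (incr < 0)
			incr = 0;
		c->now += incr;         /* moved inside this branch by the patch */
	} else if (now > c->now) {
		c->now = now;           /* resynchronize with real time */
	}
	c->now_rt = now;
}
```

With a packet estimated at 10 ticks and a dequeue at real time 5, the virtual clock jumps to 10. A later idle dequeue at real time 8 leaves it at 10 instead of adding another 3 ticks, and it only moves again once real time passes it.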

2013-04-29 19:42:56

by Greg Kroah-Hartman

Subject: [ 17/42] rose: fix info leak via msg_name in rose_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit 4a184233f21645cf0b719366210ed445d1024d72 ]

The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.

Fix the issue by initializing the memory used for sockaddr info with
memset(0).

Signed-off-by: Mathias Krause <[email protected]>
Cc: Ralf Baechle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/rose/af_rose.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1257,6 +1257,7 @@ static int rose_recvmsg(struct kiocb *io
skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);

if (srose != NULL) {
+ memset(srose, 0, msg->msg_namelen);
srose->srose_family = AF_ROSE;
srose->srose_addr = rose->dest_addr;
srose->srose_call = rose->dest_call;

2013-04-29 19:02:37

by Greg Kroah-Hartman

Subject: [ 12/42] iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit a5598bd9c087dc0efc250a5221e5d0e6f584ee88 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Ursula Braun <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/iucv/af_iucv.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1331,6 +1331,8 @@ static int iucv_sock_recvmsg(struct kioc
struct sk_buff *skb, *rskb, *cskb;
int err = 0;

+ msg->msg_namelen = 0;
+
if ((sk->sk_state == IUCV_DISCONN) &&
skb_queue_empty(&iucv->backlog_skb_q) &&
skb_queue_empty(&sk->sk_receive_queue) &&
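
The mechanism behind this class of leak can be modeled in userspace. The sketch below is loosely based on the description in the commit message, not on the actual net/socket.c code: demo_msghdr, demo_receive() and both handlers are hypothetical, and a fixed byte pattern stands in for stale stack contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_STORAGE_SIZE 128  /* the 128 bytes named in the commit message */

/* Hypothetical, simplified message header; not the kernel's struct msghdr. */
struct demo_msghdr {
	void *msg_name;
	int msg_namelen;
};

/* Protocol handler that neither fills msg_name nor resets msg_namelen. */
static int demo_recvmsg_buggy(struct demo_msghdr *msg)
{
	(void)msg;
	return 0;
}

/* Protocol handler with the fix: report that no address was written. */
static int demo_recvmsg_fixed(struct demo_msghdr *msg)
{
	msg->msg_namelen = 0;
	return 0;
}

/*
 * Models the generic receive path: point msg_name at local storage,
 * call the handler, then copy msg_namelen bytes out to the caller.
 * Returns the number of bytes copied.
 */
static size_t demo_receive(int (*handler)(struct demo_msghdr *),
			   unsigned char *user_buf)
{
	unsigned char storage[DEMO_STORAGE_SIZE];
	struct demo_msghdr msg;

	memset(storage, 0x5A, sizeof(storage));  /* stand-in for stale stack data */
	msg.msg_name = storage;
	msg.msg_namelen = sizeof(storage);

	handler(&msg);

	memcpy(user_buf, storage, (size_t)msg.msg_namelen);
	return (size_t)msg.msg_namelen;
}
```

With the buggy handler all 128 pattern bytes reach the caller's buffer. With the fixed handler nothing is copied.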

2013-04-29 19:02:35

by Greg Kroah-Hartman

Subject: [ 10/42] caif: Fix missing msg_namelen update in caif_seqpkt_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit 2d6fbfe733f35c6b355c216644e08e149c61b271 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about caif_seqpkt_recvmsg() not filling the msg_name in case it was
set.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Sjur Braendeland <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/caif/caif_socket.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -286,6 +286,8 @@ static int caif_seqpkt_recvmsg(struct ki
if (m->msg_flags&MSG_OOB)
goto read_error;

+ m->msg_namelen = 0;
+
skb = skb_recv_datagram(sk, flags, 0 , &ret);
if (!skb)
goto read_error;

2013-04-29 19:43:34

by Greg Kroah-Hartman

Subject: [ 13/42] l2tp: fix info leak in l2tp_ip6_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit b860d3cc62877fad02863e2a08efff69a19382d2 ]

The L2TP code for IPv6 fails to initialize the l2tp_conn_id member of
struct sockaddr_l2tpip6 and therefore leaks four bytes of kernel stack
in l2tp_ip6_recvmsg() in case msg_name is set.

Initialize l2tp_conn_id with 0 to avoid the info leak.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/l2tp/l2tp_ip6.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -684,6 +684,7 @@ static int l2tp_ip6_recvmsg(struct kiocb
lsa->l2tp_addr = ipv6_hdr(skb)->saddr;
lsa->l2tp_flowinfo = 0;
lsa->l2tp_scope_id = 0;
+ lsa->l2tp_conn_id = 0;
if (ipv6_addr_type(&lsa->l2tp_addr) & IPV6_ADDR_LINKLOCAL)
lsa->l2tp_scope_id = IP6CB(skb)->iif;
}

2013-04-29 19:43:52

by Greg Kroah-Hartman

Subject: [ 11/42] irda: Fix missing msg_namelen update in irda_recvmsg_dgram()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit 5ae94c0d2f0bed41d6718be743985d61b7f5c47d ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Samuel Ortiz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/irda/af_irda.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1386,6 +1386,8 @@ static int irda_recvmsg_dgram(struct kio

IRDA_DEBUG(4, "%s()\n", __func__);

+ msg->msg_namelen = 0;
+
skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
flags & MSG_DONTWAIT, &err);
if (!skb)

2013-04-29 19:44:16

by Greg Kroah-Hartman

Subject: [ 09/42] Bluetooth: SCO - Fix missing msg_namelen update in sco_sock_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit c8c499175f7d295ef867335bceb9a76a2c3cdc38 ]

If the socket is in state BT_CONNECT2 and BT_SK_DEFER_SETUP is set in
the flags, sco_sock_recvmsg() returns early with 0 without updating the
possibly set msg_namelen member. This, in turn, leads to a 128 byte
kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_recvmsg().

Signed-off-by: Mathias Krause <[email protected]>
Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bluetooth/sco.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -667,6 +667,7 @@ static int sco_sock_recvmsg(struct kiocb
test_bit(BT_SK_DEFER_SETUP, &bt_sk(sk)->flags)) {
hci_conn_accept(pi->conn->hcon, 0);
sk->sk_state = BT_CONFIG;
+ msg->msg_namelen = 0;

release_sock(sk);
return 0;

2013-04-29 19:02:33

by Greg Kroah-Hartman

Subject: [ 02/42] TTY: do not update atime/mtime on read/write

3.8-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <[email protected]>

commit b0de59b5733d18b0d1974a060860a8b5c1b36a2e upstream.

On http://vladz.devzero.fr/013_ptmx-timing.php, we can see how to find
out length of a password using timestamps of /dev/ptmx. It is
documented in "Timing Analysis of Keystrokes and Timing Attacks on
SSH". To avoid that problem, do not update time when reading
from/writing to a TTY.

I am afraid of regressions as this is a behavior we have since 0.97
and apps may expect the time to be current, e.g. for monitoring
whether there was a change on the TTY. Now, there is no change. So
this would better have a lot of testing before it goes upstream.

References: CVE-2013-0160

Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/tty/tty_io.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -977,8 +977,7 @@ static ssize_t tty_read(struct file *fil
else
i = -EIO;
tty_ldisc_deref(ld);
- if (i > 0)
- inode->i_atime = current_fs_time(inode->i_sb);
+
return i;
}

@@ -1079,11 +1078,8 @@ static inline ssize_t do_tty_write(
break;
cond_resched();
}
- if (written) {
- struct inode *inode = file->f_path.dentry->d_inode;
- inode->i_mtime = current_fs_time(inode->i_sb);
+ if (written)
ret = written;
- }
out:
tty_write_unlock(tty);
return ret;

2013-04-29 19:44:42

by Greg Kroah-Hartman

Subject: [ 08/42] Bluetooth: RFCOMM - Fix missing msg_namelen update in rfcomm_sock_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit e11e0455c0d7d3d62276a0c55d9dfbc16779d691 ]

If RFCOMM_DEFER_SETUP is set in the flags, rfcomm_sock_recvmsg() returns
early with 0 without updating the possibly set msg_namelen member. This,
in turn, leads to a 128 byte kernel stack leak in net/socket.c.

Fix this by updating msg_namelen in this case. For all other cases it
will be handled in bt_sock_stream_recvmsg().

Signed-off-by: Mathias Krause <[email protected]>
Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bluetooth/rfcomm/sock.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -610,6 +610,7 @@ static int rfcomm_sock_recvmsg(struct ki

if (test_and_clear_bit(RFCOMM_DEFER_SETUP, &d->flags)) {
rfcomm_dlc_accept(d);
+ msg->msg_namelen = 0;
return 0;
}


2013-04-29 19:45:10

by Greg Kroah-Hartman

Subject: [ 07/42] Bluetooth: fix possible info leak in bt_sock_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit 4683f42fde3977bdb4e8a09622788cc8b5313778 ]

In case the socket is already shutting down, bt_sock_recvmsg() returns
with 0 without updating msg_namelen leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.

Fix this by moving the msg_namelen assignment in front of the shutdown
test.

Signed-off-by: Mathias Krause <[email protected]>
Cc: Marcel Holtmann <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Johan Hedberg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/bluetooth/af_bluetooth.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -230,6 +230,8 @@ int bt_sock_recvmsg(struct kiocb *iocb,
if (flags & (MSG_OOB))
return -EOPNOTSUPP;

+ msg->msg_namelen = 0;
+
skb = skb_recv_datagram(sk, flags, noblock, &err);
if (!skb) {
if (sk->sk_shutdown & RCV_SHUTDOWN)
@@ -237,8 +239,6 @@ int bt_sock_recvmsg(struct kiocb *iocb,
return err;
}

- msg->msg_namelen = 0;
-
copied = skb->len;
if (len < copied) {
msg->msg_flags |= MSG_TRUNC;

2013-04-29 19:02:31

by Greg Kroah-Hartman

Subject: [ 01/42] aio: fix possible invalid memory access when DEBUG is enabled

3.8-stable review patch. If anyone has any objections, please let me know.

------------------

From: Zhao Hongjiang <[email protected]>

commit 91d80a84bbc8f28375cca7e65ec666577b4209ad upstream.

dprintk() shouldn't access @ring after it's unmapped.

Signed-off-by: Zhao Hongjiang <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/aio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1027,9 +1027,9 @@ static int aio_read_evt(struct kioctx *i
spin_unlock(&info->ring_lock);

out:
- kunmap_atomic(ring);
dprintk("leaving aio_read_evt: %d h%lu t%lu\n", ret,
(unsigned long)ring->head, (unsigned long)ring->tail);
+ kunmap_atomic(ring);
return ret;
}


2013-04-29 19:47:44

by Greg Kroah-Hartman

Subject: [ 06/42] ax25: fix info leak via msg_name in ax25_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit ef3313e84acbf349caecae942ab3ab731471f1a1 ]

When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Signed-off-by: Mathias Krause <[email protected]>
Cc: Ralf Baechle <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ax25/af_ax25.c | 1 +
1 file changed, 1 insertion(+)

--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1647,6 +1647,7 @@ static int ax25_recvmsg(struct kiocb *io
ax25_address src;
const unsigned char *mac = skb_mac_header(skb);

+ memset(sax, 0, sizeof(struct full_sockaddr_ax25));
ax25_addr_parse(mac + 1, skb->data - mac - 1, &src, NULL,
&digi, NULL, NULL);
sax->sax25_family = AF_AX25;
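
Why member-by-member assignment is not enough can be illustrated with a small userspace sketch. demo_addr and the helpers are hypothetical, and the exact amount of padding depends on the ABI, so the numbers are an assumption about common platforms rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure with alignment padding after `tag` on common ABIs. */
struct demo_addr {
	char tag;
	int value;
};

/* Assigns the members only; bytes between them are left as they were. */
static void demo_fill_members_only(struct demo_addr *a)
{
	a->tag = 1;
	a->value = 2;
}

/* Zeroes the whole object first, as the patch does, then assigns members. */
static void demo_fill_zeroed(struct demo_addr *a)
{
	memset(a, 0, sizeof(*a));
	a->tag = 1;
	a->value = 2;
}

/* Counts bytes that lie outside both members and still hold `pattern`. */
static size_t demo_stale_padding(const struct demo_addr *a, unsigned char pattern)
{
	const unsigned char *p = (const unsigned char *)a;
	const size_t value_off = offsetof(struct demo_addr, value);
	size_t i, n = 0;

	for (i = 0; i < sizeof(*a); i++) {
		int in_tag = i < sizeof(a->tag);
		int in_value = i >= value_off && i < value_off + sizeof(a->value);

		if (!in_tag && !in_value && p[i] == pattern)
			n++;
	}
	return n;
}
```

After pre-filling an instance with 0xAA, demo_fill_members_only() typically leaves three pattern bytes behind on ABIs with a 4-byte aligned int, whereas demo_fill_zeroed() leaves none.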

2013-04-29 19:48:08

by Greg Kroah-Hartman

Subject: [ 04/42] sparc64: Fix race in TLB batch processing.

3.8-stable review patch. If anyone has any objections, please let me know.

------------------

From: "David S. Miller" <[email protected]>

[ Commits f36391d2790d04993f48da6a45810033a2cdf847 and
f0af97070acbad5d6a361f485828223a4faaa0ee upstream. ]

As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.

So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.

Fix this by using generic infrastructure to synchronize on the
completion of the cross call.

This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().

We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a
region, we flush TLBs synchronously.

1) Get rid of xcall_flush_tlb_pending and per-cpu type
implementations.

2) Do TLB batch cross calls instead via:

smp_call_function_many()
tlb_pending_func()
__flush_tlb_pending()

3) Batch only in lazy mmu sequences:

a) Add 'active' member to struct tlb_batch
b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
c) Set 'active' in arch_enter_lazy_mmu_mode()
d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
e) Check 'active' in tlb_batch_add_one() and do a synchronous
flush if it's clear.

4) Add infrastructure for synchronous TLB page flushes.

a) Implement __flush_tlb_page and per-cpu variants, patch
as needed.
b) Likewise for xcall_flush_tlb_page.
c) Implement smp_flush_tlb_page() to invoke the cross-call.
d) Wire up global_flush_tlb_page() to the right routine based
upon CONFIG_SMP

5) It turns out that singleton batches are very common, 2 out of every
3 batch flushes have only a single entry in them.

The batch flush waiting is very expensive, both because of the poll
on sibling cpu completion, as well as because passing the tlb batch
pointer to the sibling cpus invokes a shared memory dereference.

Therefore, in flush_tlb_pending(), if there is only one entry in
the batch perform a completely asynchronous global_flush_tlb_page()
instead.

Reported-by: Dave Kleikamp <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Acked-by: Dave Kleikamp <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
arch/sparc/include/asm/pgtable_64.h | 1
arch/sparc/include/asm/switch_to_64.h | 3
arch/sparc/include/asm/tlbflush_64.h | 37 ++++++++--
arch/sparc/kernel/smp_64.c | 41 ++++++++++-
arch/sparc/mm/tlb.c | 39 ++++++++++-
arch/sparc/mm/tsb.c | 57 ++++++++++++----
arch/sparc/mm/ultra.S | 119 +++++++++++++++++++++++++++-------
7 files changed, 242 insertions(+), 55 deletions(-)

--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -915,6 +915,7 @@ static inline int io_remap_pfn_range(str
return remap_pfn_range(vma, from, phys_base >> PAGE_SHIFT, size, prot);
}

+#include <asm/tlbflush.h>
#include <asm-generic/pgtable.h>

/* We provide our own get_unmapped_area to cope with VA holes and
--- a/arch/sparc/include/asm/switch_to_64.h
+++ b/arch/sparc/include/asm/switch_to_64.h
@@ -18,8 +18,7 @@ do { \
* and 2 stores in this critical code path. -DaveM
*/
#define switch_to(prev, next, last) \
-do { flush_tlb_pending(); \
- save_and_clear_fpu(); \
+do { save_and_clear_fpu(); \
/* If you are tempted to conditionalize the following */ \
/* so that ASI is only written if it changes, think again. */ \
__asm__ __volatile__("wr %%g0, %0, %%asi" \
--- a/arch/sparc/include/asm/tlbflush_64.h
+++ b/arch/sparc/include/asm/tlbflush_64.h
@@ -11,24 +11,40 @@
struct tlb_batch {
struct mm_struct *mm;
unsigned long tlb_nr;
+ unsigned long active;
unsigned long vaddrs[TLB_BATCH_NR];
};

extern void flush_tsb_kernel_range(unsigned long start, unsigned long end);
extern void flush_tsb_user(struct tlb_batch *tb);
+extern void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr);

/* TLB flush operations. */

-extern void flush_tlb_pending(void);
+static inline void flush_tlb_mm(struct mm_struct *mm)
+{
+}
+
+static inline void flush_tlb_page(struct vm_area_struct *vma,
+ unsigned long vmaddr)
+{
+}
+
+static inline void flush_tlb_range(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end)
+{
+}
+
+#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE

-#define flush_tlb_range(vma,start,end) \
- do { (void)(start); flush_tlb_pending(); } while (0)
-#define flush_tlb_page(vma,addr) flush_tlb_pending()
-#define flush_tlb_mm(mm) flush_tlb_pending()
+extern void flush_tlb_pending(void);
+extern void arch_enter_lazy_mmu_mode(void);
+extern void arch_leave_lazy_mmu_mode(void);
+#define arch_flush_lazy_mmu_mode() do {} while (0)

/* Local cpu only. */
extern void __flush_tlb_all(void);
-
+extern void __flush_tlb_page(unsigned long context, unsigned long vaddr);
extern void __flush_tlb_kernel_range(unsigned long start, unsigned long end);

#ifndef CONFIG_SMP
@@ -38,15 +54,24 @@ do { flush_tsb_kernel_range(start,end);
__flush_tlb_kernel_range(start,end); \
} while (0)

+static inline void global_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr)
+{
+ __flush_tlb_page(CTX_HWBITS(mm->context), vaddr);
+}
+
#else /* CONFIG_SMP */

extern void smp_flush_tlb_kernel_range(unsigned long start, unsigned long end);
+extern void smp_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr);

#define flush_tlb_kernel_range(start, end) \
do { flush_tsb_kernel_range(start,end); \
smp_flush_tlb_kernel_range(start, end); \
} while (0)

+#define global_flush_tlb_page(mm, vaddr) \
+ smp_flush_tlb_page(mm, vaddr)
+
#endif /* ! CONFIG_SMP */

#endif /* _SPARC64_TLBFLUSH_H */
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -849,7 +849,7 @@ void smp_tsb_sync(struct mm_struct *mm)
}

extern unsigned long xcall_flush_tlb_mm;
-extern unsigned long xcall_flush_tlb_pending;
+extern unsigned long xcall_flush_tlb_page;
extern unsigned long xcall_flush_tlb_kernel_range;
extern unsigned long xcall_fetch_glob_regs;
extern unsigned long xcall_fetch_glob_pmu;
@@ -1074,22 +1074,55 @@ local_flush_and_out:
put_cpu();
}

+struct tlb_pending_info {
+ unsigned long ctx;
+ unsigned long nr;
+ unsigned long *vaddrs;
+};
+
+static void tlb_pending_func(void *info)
+{
+ struct tlb_pending_info *t = info;
+
+ __flush_tlb_pending(t->ctx, t->nr, t->vaddrs);
+}
+
void smp_flush_tlb_pending(struct mm_struct *mm, unsigned long nr, unsigned long *vaddrs)
{
u32 ctx = CTX_HWBITS(mm->context);
+ struct tlb_pending_info info;
int cpu = get_cpu();

+ info.ctx = ctx;
+ info.nr = nr;
+ info.vaddrs = vaddrs;
+
if (mm == current->mm && atomic_read(&mm->mm_users) == 1)
cpumask_copy(mm_cpumask(mm), cpumask_of(cpu));
else
- smp_cross_call_masked(&xcall_flush_tlb_pending,
- ctx, nr, (unsigned long) vaddrs,
- mm_cpumask(mm));
+ smp_call_function_many(mm_cpumask(mm), tlb_pending_func,
+ &info, 1);

__flush_tlb_pending(ctx, nr, vaddrs);

put_cpu();
}
+
+void smp_flush_tlb_page(struct mm_struct *mm, unsigned long vaddr)
+{
+ unsigned long context = CTX_HWBITS(mm->context);
+ int cpu = get_cpu();
+
+ if (mm == current->mm && atomic_read(&mm->mm_users) == 1)
+ cpumask_copy(mm_cpumask(mm), cpumask_of(cpu));
+ else
+ smp_cross_call_masked(&xcall_flush_tlb_page,
+ context, vaddr, 0,
+ mm_cpumask(mm));
+ __flush_tlb_page(context, vaddr);
+
+ put_cpu();
+}

void smp_flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -24,11 +24,17 @@ static DEFINE_PER_CPU(struct tlb_batch,
void flush_tlb_pending(void)
{
struct tlb_batch *tb = &get_cpu_var(tlb_batch);
+ struct mm_struct *mm = tb->mm;

- if (tb->tlb_nr) {
- flush_tsb_user(tb);
+ if (!tb->tlb_nr)
+ goto out;

- if (CTX_VALID(tb->mm->context)) {
+ flush_tsb_user(tb);
+
+ if (CTX_VALID(mm->context)) {
+ if (tb->tlb_nr == 1) {
+ global_flush_tlb_page(mm, tb->vaddrs[0]);
+ } else {
#ifdef CONFIG_SMP
smp_flush_tlb_pending(tb->mm, tb->tlb_nr,
&tb->vaddrs[0]);
@@ -37,12 +43,30 @@ void flush_tlb_pending(void)
tb->tlb_nr, &tb->vaddrs[0]);
#endif
}
- tb->tlb_nr = 0;
}

+ tb->tlb_nr = 0;
+
+out:
put_cpu_var(tlb_batch);
}

+void arch_enter_lazy_mmu_mode(void)
+{
+ struct tlb_batch *tb = &__get_cpu_var(tlb_batch);
+
+ tb->active = 1;
+}
+
+void arch_leave_lazy_mmu_mode(void)
+{
+ struct tlb_batch *tb = &__get_cpu_var(tlb_batch);
+
+ if (tb->tlb_nr)
+ flush_tlb_pending();
+ tb->active = 0;
+}
+
static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr,
bool exec)
{
@@ -60,6 +84,12 @@ static void tlb_batch_add_one(struct mm_
nr = 0;
}

+ if (!tb->active) {
+ global_flush_tlb_page(mm, vaddr);
+ flush_tsb_user_page(mm, vaddr);
+ goto out;
+ }
+
if (nr == 0)
tb->mm = mm;

@@ -68,6 +98,7 @@ static void tlb_batch_add_one(struct mm_
if (nr >= TLB_BATCH_NR)
flush_tlb_pending();

+out:
put_cpu_var(tlb_batch);
}

--- a/arch/sparc/mm/tsb.c
+++ b/arch/sparc/mm/tsb.c
@@ -7,11 +7,10 @@
#include <linux/preempt.h>
#include <linux/slab.h>
#include <asm/page.h>
-#include <asm/tlbflush.h>
-#include <asm/tlb.h>
-#include <asm/mmu_context.h>
#include <asm/pgtable.h>
+#include <asm/mmu_context.h>
#include <asm/tsb.h>
+#include <asm/tlb.h>
#include <asm/oplib.h>

extern struct tsb swapper_tsb[KERNEL_TSB_NENTRIES];
@@ -46,23 +45,27 @@ void flush_tsb_kernel_range(unsigned lon
}
}

-static void __flush_tsb_one(struct tlb_batch *tb, unsigned long hash_shift,
- unsigned long tsb, unsigned long nentries)
+static void __flush_tsb_one_entry(unsigned long tsb, unsigned long v,
+ unsigned long hash_shift,
+ unsigned long nentries)
{
- unsigned long i;
+ unsigned long tag, ent, hash;

- for (i = 0; i < tb->tlb_nr; i++) {
- unsigned long v = tb->vaddrs[i];
- unsigned long tag, ent, hash;
+ v &= ~0x1UL;
+ hash = tsb_hash(v, hash_shift, nentries);
+ ent = tsb + (hash * sizeof(struct tsb));
+ tag = (v >> 22UL);

- v &= ~0x1UL;
+ tsb_flush(ent, tag);
+}

- hash = tsb_hash(v, hash_shift, nentries);
- ent = tsb + (hash * sizeof(struct tsb));
- tag = (v >> 22UL);
+static void __flush_tsb_one(struct tlb_batch *tb, unsigned long hash_shift,
+ unsigned long tsb, unsigned long nentries)
+{
+ unsigned long i;

- tsb_flush(ent, tag);
- }
+ for (i = 0; i < tb->tlb_nr; i++)
+ __flush_tsb_one_entry(tsb, tb->vaddrs[i], hash_shift, nentries);
}

void flush_tsb_user(struct tlb_batch *tb)
@@ -88,6 +91,30 @@ void flush_tsb_user(struct tlb_batch *tb
}
#endif
spin_unlock_irqrestore(&mm->context.lock, flags);
+}
+
+void flush_tsb_user_page(struct mm_struct *mm, unsigned long vaddr)
+{
+ unsigned long nentries, base, flags;
+
+ spin_lock_irqsave(&mm->context.lock, flags);
+
+ base = (unsigned long) mm->context.tsb_block[MM_TSB_BASE].tsb;
+ nentries = mm->context.tsb_block[MM_TSB_BASE].tsb_nentries;
+ if (tlb_type == cheetah_plus || tlb_type == hypervisor)
+ base = __pa(base);
+ __flush_tsb_one_entry(base, vaddr, PAGE_SHIFT, nentries);
+
+#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
+ if (mm->context.tsb_block[MM_TSB_HUGE].tsb) {
+ base = (unsigned long) mm->context.tsb_block[MM_TSB_HUGE].tsb;
+ nentries = mm->context.tsb_block[MM_TSB_HUGE].tsb_nentries;
+ if (tlb_type == cheetah_plus || tlb_type == hypervisor)
+ base = __pa(base);
+ __flush_tsb_one_entry(base, vaddr, HPAGE_SHIFT, nentries);
+ }
+#endif
+ spin_unlock_irqrestore(&mm->context.lock, flags);
}

#define HV_PGSZ_IDX_BASE HV_PGSZ_IDX_8K
--- a/arch/sparc/mm/ultra.S
+++ b/arch/sparc/mm/ultra.S
@@ -53,6 +53,33 @@ __flush_tlb_mm: /* 18 insns */
nop

.align 32
+ .globl __flush_tlb_page
+__flush_tlb_page: /* 22 insns */
+ /* %o0 = context, %o1 = vaddr */
+ rdpr %pstate, %g7
+ andn %g7, PSTATE_IE, %g2
+ wrpr %g2, %pstate
+ mov SECONDARY_CONTEXT, %o4
+ ldxa [%o4] ASI_DMMU, %g2
+ stxa %o0, [%o4] ASI_DMMU
+ andcc %o1, 1, %g0
+ andn %o1, 1, %o3
+ be,pn %icc, 1f
+ or %o3, 0x10, %o3
+ stxa %g0, [%o3] ASI_IMMU_DEMAP
+1: stxa %g0, [%o3] ASI_DMMU_DEMAP
+ membar #Sync
+ stxa %g2, [%o4] ASI_DMMU
+ sethi %hi(KERNBASE), %o4
+ flush %o4
+ retl
+ wrpr %g7, 0x0, %pstate
+ nop
+ nop
+ nop
+ nop
+
+ .align 32
.globl __flush_tlb_pending
__flush_tlb_pending: /* 26 insns */
/* %o0 = context, %o1 = nr, %o2 = vaddrs[] */
@@ -203,6 +230,31 @@ __cheetah_flush_tlb_mm: /* 19 insns */
retl
wrpr %g7, 0x0, %pstate

+__cheetah_flush_tlb_page: /* 22 insns */
+ /* %o0 = context, %o1 = vaddr */
+ rdpr %pstate, %g7
+ andn %g7, PSTATE_IE, %g2
+ wrpr %g2, 0x0, %pstate
+ wrpr %g0, 1, %tl
+ mov PRIMARY_CONTEXT, %o4
+ ldxa [%o4] ASI_DMMU, %g2
+ srlx %g2, CTX_PGSZ1_NUC_SHIFT, %o3
+ sllx %o3, CTX_PGSZ1_NUC_SHIFT, %o3
+ or %o0, %o3, %o0 /* Preserve nucleus page size fields */
+ stxa %o0, [%o4] ASI_DMMU
+ andcc %o1, 1, %g0
+ be,pn %icc, 1f
+ andn %o1, 1, %o3
+ stxa %g0, [%o3] ASI_IMMU_DEMAP
+1: stxa %g0, [%o3] ASI_DMMU_DEMAP
+ membar #Sync
+ stxa %g2, [%o4] ASI_DMMU
+ sethi %hi(KERNBASE), %o4
+ flush %o4
+ wrpr %g0, 0, %tl
+ retl
+ wrpr %g7, 0x0, %pstate
+
__cheetah_flush_tlb_pending: /* 27 insns */
/* %o0 = context, %o1 = nr, %o2 = vaddrs[] */
rdpr %pstate, %g7
@@ -269,6 +321,20 @@ __hypervisor_flush_tlb_mm: /* 10 insns *
retl
nop

+__hypervisor_flush_tlb_page: /* 11 insns */
+ /* %o0 = context, %o1 = vaddr */
+ mov %o0, %g2
+ mov %o1, %o0 /* ARG0: vaddr + IMMU-bit */
+ mov %g2, %o1 /* ARG1: mmu context */
+ mov HV_MMU_ALL, %o2 /* ARG2: flags */
+ srlx %o0, PAGE_SHIFT, %o0
+ sllx %o0, PAGE_SHIFT, %o0
+ ta HV_MMU_UNMAP_ADDR_TRAP
+ brnz,pn %o0, __hypervisor_tlb_tl0_error
+ mov HV_MMU_UNMAP_ADDR_TRAP, %o1
+ retl
+ nop
+
__hypervisor_flush_tlb_pending: /* 16 insns */
/* %o0 = context, %o1 = nr, %o2 = vaddrs[] */
sllx %o1, 3, %g1
@@ -339,6 +405,13 @@ cheetah_patch_cachetlbops:
call tlb_patch_one
mov 19, %o2

+ sethi %hi(__flush_tlb_page), %o0
+ or %o0, %lo(__flush_tlb_page), %o0
+ sethi %hi(__cheetah_flush_tlb_page), %o1
+ or %o1, %lo(__cheetah_flush_tlb_page), %o1
+ call tlb_patch_one
+ mov 22, %o2
+
sethi %hi(__flush_tlb_pending), %o0
or %o0, %lo(__flush_tlb_pending), %o0
sethi %hi(__cheetah_flush_tlb_pending), %o1
@@ -397,10 +470,9 @@ xcall_flush_tlb_mm: /* 21 insns */
nop
nop

- .globl xcall_flush_tlb_pending
-xcall_flush_tlb_pending: /* 21 insns */
- /* %g5=context, %g1=nr, %g7=vaddrs[] */
- sllx %g1, 3, %g1
+ .globl xcall_flush_tlb_page
+xcall_flush_tlb_page: /* 17 insns */
+ /* %g5=context, %g1=vaddr */
mov PRIMARY_CONTEXT, %g4
ldxa [%g4] ASI_DMMU, %g2
srlx %g2, CTX_PGSZ1_NUC_SHIFT, %g4
@@ -408,20 +480,16 @@ xcall_flush_tlb_pending: /* 21 insns */
or %g5, %g4, %g5
mov PRIMARY_CONTEXT, %g4
stxa %g5, [%g4] ASI_DMMU
-1: sub %g1, (1 << 3), %g1
- ldx [%g7 + %g1], %g5
- andcc %g5, 0x1, %g0
+ andcc %g1, 0x1, %g0
be,pn %icc, 2f
-
- andn %g5, 0x1, %g5
+ andn %g1, 0x1, %g5
stxa %g0, [%g5] ASI_IMMU_DEMAP
2: stxa %g0, [%g5] ASI_DMMU_DEMAP
membar #Sync
- brnz,pt %g1, 1b
- nop
stxa %g2, [%g4] ASI_DMMU
retry
nop
+ nop

.globl xcall_flush_tlb_kernel_range
xcall_flush_tlb_kernel_range: /* 25 insns */
@@ -656,15 +724,13 @@ __hypervisor_xcall_flush_tlb_mm: /* 21 i
membar #Sync
retry

- .globl __hypervisor_xcall_flush_tlb_pending
-__hypervisor_xcall_flush_tlb_pending: /* 21 insns */
- /* %g5=ctx, %g1=nr, %g7=vaddrs[], %g2,%g3,%g4,g6=scratch */
- sllx %g1, 3, %g1
+ .globl __hypervisor_xcall_flush_tlb_page
+__hypervisor_xcall_flush_tlb_page: /* 17 insns */
+ /* %g5=ctx, %g1=vaddr */
mov %o0, %g2
mov %o1, %g3
mov %o2, %g4
-1: sub %g1, (1 << 3), %g1
- ldx [%g7 + %g1], %o0 /* ARG0: virtual address */
+ mov %g1, %o0 /* ARG0: virtual address */
mov %g5, %o1 /* ARG1: mmu context */
mov HV_MMU_ALL, %o2 /* ARG2: flags */
srlx %o0, PAGE_SHIFT, %o0
@@ -673,8 +739,6 @@ __hypervisor_xcall_flush_tlb_pending: /*
mov HV_MMU_UNMAP_ADDR_TRAP, %g6
brnz,a,pn %o0, __hypervisor_tlb_xcall_error
mov %o0, %g5
- brnz,pt %g1, 1b
- nop
mov %g2, %o0
mov %g3, %o1
mov %g4, %o2
@@ -757,6 +821,13 @@ hypervisor_patch_cachetlbops:
call tlb_patch_one
mov 10, %o2

+ sethi %hi(__flush_tlb_page), %o0
+ or %o0, %lo(__flush_tlb_page), %o0
+ sethi %hi(__hypervisor_flush_tlb_page), %o1
+ or %o1, %lo(__hypervisor_flush_tlb_page), %o1
+ call tlb_patch_one
+ mov 11, %o2
+
sethi %hi(__flush_tlb_pending), %o0
or %o0, %lo(__flush_tlb_pending), %o0
sethi %hi(__hypervisor_flush_tlb_pending), %o1
@@ -788,12 +859,12 @@ hypervisor_patch_cachetlbops:
call tlb_patch_one
mov 21, %o2

- sethi %hi(xcall_flush_tlb_pending), %o0
- or %o0, %lo(xcall_flush_tlb_pending), %o0
- sethi %hi(__hypervisor_xcall_flush_tlb_pending), %o1
- or %o1, %lo(__hypervisor_xcall_flush_tlb_pending), %o1
+ sethi %hi(xcall_flush_tlb_page), %o0
+ or %o0, %lo(xcall_flush_tlb_page), %o0
+ sethi %hi(__hypervisor_xcall_flush_tlb_page), %o1
+ or %o1, %lo(__hypervisor_xcall_flush_tlb_page), %o1
call tlb_patch_one
- mov 21, %o2
+ mov 17, %o2

sethi %hi(xcall_flush_tlb_kernel_range), %o0
or %o0, %lo(xcall_flush_tlb_kernel_range), %o0

2013-04-29 19:48:07

by Greg Kroah-Hartman

Subject: [ 05/42] atm: update msg_namelen in vcc_recvmsg()

3.8-stable review patch. If anyone has any objections, please let me know.

------------------


From: Mathias Krause <[email protected]>

[ Upstream commit 9b3e617f3df53822345a8573b6d358f6b9e5ed87 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about vcc_recvmsg() not filling the msg_name in case it was set.

Signed-off-by: Mathias Krause <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/atm/common.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -532,6 +532,8 @@ int vcc_recvmsg(struct kiocb *iocb, stru
struct sk_buff *skb;
int copied, error = -EINVAL;

+ msg->msg_namelen = 0;
+
if (sock->state != SS_CONNECTED)
return -ENOTCONN;
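The leak described in the commit message can be modelled in userspace. This is a minimal sketch, not kernel code: the struct and every `model_*` name are hypothetical, and the 128-byte buffer stands in for the `sockaddr_storage` that the commit message says net/socket.c keeps on its stack. It assumes the generic layer copies `msg_namelen` bytes back to the caller regardless of what the protocol handler did with `msg_name`.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_ADDR_SIZE 128  /* size quoted in the commit message */

/* Hypothetical stand-in for the two msghdr fields that matter here. */
struct model_msghdr {
    void  *msg_name;
    size_t msg_namelen;
};

/* Models a recvmsg handler that never fills msg_name. */
static void model_recvmsg(struct model_msghdr *msg, int fixed)
{
    if (fixed)
        msg->msg_namelen = 0;  /* the one-line fix from the patch */
}

/* Models the generic layer: copies msg_namelen bytes back to the caller
 * and returns how many bytes were copied. */
static size_t model_copy_name_to_user(const struct model_msghdr *msg,
                                      unsigned char *user, size_t user_len)
{
    size_t n = msg->msg_namelen < user_len ? msg->msg_namelen : user_len;

    memcpy(user, msg->msg_name, n);
    return n;
}

/* Returns the number of stale "stack" bytes that reach the caller. */
static size_t model_leaked_bytes(int fixed)
{
    unsigned char stack_addr[MODEL_ADDR_SIZE];
    unsigned char user[MODEL_ADDR_SIZE];
    struct model_msghdr msg = { stack_addr, sizeof(stack_addr) };

    memset(stack_addr, 0xAA, sizeof(stack_addr));  /* stale stack contents */
    model_recvmsg(&msg, fixed);
    return model_copy_name_to_user(&msg, user, sizeof(user));
}
```

In this model `model_leaked_bytes(0)` reports all 128 bytes copied out, while `model_leaked_bytes(1)` reports none.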


2013-04-29 19:49:03

by Greg Kroah-Hartman

Subject: [ 03/42] TTY: fix atime/mtime regression

3.8-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Slaby <[email protected]>

commit 37b7f3c76595e23257f61bd80b223de8658617ee upstream.

In commit b0de59b5733d ("TTY: do not update atime/mtime on read/write")
we stopped updating timestamps on tty inodes to fix a security issue and
waited to see whether anything would break. Well, 'w', the utility that
lists logged-in users and their idle time, broke. It now shows users as
idle since the time they logged in.

To restore the old behaviour while still preventing attackers from
guessing the password length, this patch updates the timestamps in
one-minute intervals.

Signed-off-by: Jiri Slaby <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/tty/tty_io.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -941,6 +941,14 @@ void start_tty(struct tty_struct *tty)

EXPORT_SYMBOL(start_tty);

+static void tty_update_time(struct timespec *time)
+{
+ unsigned long sec = get_seconds();
+ sec -= sec % 60;
+ if ((long)(sec - time->tv_sec) > 0)
+ time->tv_sec = sec;
+}
+
/**
* tty_read - read method for tty device files
* @file: pointer to tty file
@@ -978,6 +986,9 @@ static ssize_t tty_read(struct file *fil
i = -EIO;
tty_ldisc_deref(ld);

+ if (i > 0)
+ tty_update_time(&inode->i_atime);
+
return i;
}

@@ -1078,8 +1089,11 @@ static inline ssize_t do_tty_write(
break;
cond_resched();
}
- if (written)
+ if (written) {
+ struct inode *inode = file->f_path.dentry->d_inode;
+ tty_update_time(&inode->i_mtime);
ret = written;
+ }
out:
tty_write_unlock(tty);
return ret;
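The rounding done by tty_update_time() above can be tried outside the kernel. A minimal userspace sketch, assuming the current time is passed in as a parameter instead of coming from get_seconds(); `model_update_time` is a hypothetical name:

```c
#include <time.h>

/* Userspace model of tty_update_time(): the stored stamp only ever moves
 * forward, and only to a whole-minute boundary. */
static void model_update_time(time_t *stamp, time_t now)
{
    time_t sec = now - (now % 60);  /* round down to the start of the minute */

    if ((long)(sec - *stamp) > 0)
        *stamp = sec;
}
```

Any number of reads or writes inside the same minute leave the stamp unchanged, so an observer sees at most one update per minute.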

2013-04-30 00:21:22

by Greg Kroah-Hartman

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On Mon, Apr 29, 2013 at 05:14:45PM -0700, Simon Kirby wrote:
> On Mon, Apr 29, 2013 at 12:01:44PM -0700, Greg Kroah-Hartman wrote:
>
> > 3.8-stable review patch. If anyone has any objections, please let me know.
>
> I object. This breaks functionality I use every day (seeing who else is
> working on stuff with "w").
>
> Furthermore, the patch does not actually fix the hole referenced (see
> ptmx-keystroke-latency.c on http://vladz.devzero.fr/013_ptmx-timing.php).
> I can still reproduce the timing capture even with this patch applied
> (in 3.9-rc8).

How? There are no keystrokes being reported to other users, or did we
miss something with this patch?

> The grsec patch instead introduces another test within the inotify code
> (is_sidechannel_device()-related bits) -- untested by me, but probably
> more relevant.
>
> Even 37b7f3c76595e23257f61bd80b223de8658617ee, the "regression fix",
> which Linus merged in for the 3.9 release, is still a regression for me.

And I applied that one as well.

> 60 seconds means somebody is asleep in my environment, and so is still
> the kind of thing that just pisses me off. I'd rather revert this whole
> thing.

Users taking a break for longer than a minute upset you? What are you
really trying to keep track of here?

> I could stand maybe 1 second as maximum granularity. You could do that with
> less code and no test.

Patch to show this?

thanks,

greg k-h

2013-04-30 00:33:12

by Simon Kirby

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On Mon, Apr 29, 2013 at 12:01:44PM -0700, Greg Kroah-Hartman wrote:

> 3.8-stable review patch. If anyone has any objections, please let me know.

I object. This breaks functionality I use every day (seeing who else is
working on stuff with "w").

Furthermore, the patch does not actually fix the hole referenced (see
ptmx-keystroke-latency.c on http://vladz.devzero.fr/013_ptmx-timing.php).
I can still reproduce the timing capture even with this patch applied
(in 3.9-rc8).

The grsec patch instead introduces another test within the inotify code
(is_sidechannel_device()-related bits) -- untested by me, but probably
more relevant.

Even 37b7f3c76595e23257f61bd80b223de8658617ee, the "regression fix",
which Linus merged in for the 3.9 release, is still a regression for me.
60 seconds means somebody is asleep in my environment, and so is still
the kind of thing that just pisses me off. I'd rather revert this whole
thing.

I could stand maybe 1 second as maximum granularity. You could do that with
less code and no test.

"watch -n.1 ls --full-time /dev/pts/1" shows that the exposed resolution
(without inotify) is to the nanosecond.
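The same polling can be done from C instead of `watch` and `ls`. A minimal sketch, assuming a POSIX.1-2008 system where `struct stat` carries `st_mtim`; `read_mtime` and `mtime_changed` are hypothetical helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <time.h>

/* Fetch the modification time of `path`, including nanoseconds.
 * Returns 0 on success, -1 if stat() fails. */
static int read_mtime(const char *path, struct timespec *out)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *out = st.st_mtim;  /* tv_sec plus tv_nsec */
    return 0;
}

/* Nonzero if the timestamp moved between two samples. */
static int mtime_changed(const struct timespec *a, const struct timespec *b)
{
    return a->tv_sec != b->tv_sec || a->tv_nsec != b->tv_nsec;
}
```

Calling `read_mtime("/dev/pts/1", &ts)` in a loop and comparing successive samples with `mtime_changed()` gives the same information as the `watch` command line.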

Simon-

> ------------------
>
> From: Jiri Slaby <[email protected]>
>
> commit b0de59b5733d18b0d1974a060860a8b5c1b36a2e upstream.
>
> On http://vladz.devzero.fr/013_ptmx-timing.php, we can see how to find
> out length of a password using timestamps of /dev/ptmx. It is
> documented in "Timing Analysis of Keystrokes and Timing Attacks on
> SSH". To avoid that problem, do not update time when reading
> from/writing to a TTY.
>
> I am afraid of regressions as this is a behavior we have since 0.97
> and apps may expect the time to be current, e.g. for monitoring
> whether there was a change on the TTY. Now, there is no change. So
> this would better have a lot of testing before it goes upstream.
>
> References: CVE-2013-0160
>
> Signed-off-by: Jiri Slaby <[email protected]>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> ---
> drivers/tty/tty_io.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -977,8 +977,7 @@ static ssize_t tty_read(struct file *fil
> else
> i = -EIO;
> tty_ldisc_deref(ld);
> - if (i > 0)
> - inode->i_atime = current_fs_time(inode->i_sb);
> +
> return i;
> }
>
> @@ -1079,11 +1078,8 @@ static inline ssize_t do_tty_write(
> break;
> cond_resched();
> }
> - if (written) {
> - struct inode *inode = file->f_path.dentry->d_inode;
> - inode->i_mtime = current_fs_time(inode->i_sb);
> + if (written)
> ret = written;
> - }
> out:
> tty_write_unlock(tty);
> return ret;
>
>

2013-04-30 00:36:42

by Simon Kirby

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On Mon, Apr 29, 2013 at 05:21:17PM -0700, Greg Kroah-Hartman wrote:

> On Mon, Apr 29, 2013 at 05:14:45PM -0700, Simon Kirby wrote:
> > On Mon, Apr 29, 2013 at 12:01:44PM -0700, Greg Kroah-Hartman wrote:
> >
> > > 3.8-stable review patch. If anyone has any objections, please let me know.
> >
> > I object. This breaks functionality I use every day (seeing who else is
> > working on stuff with "w").
> >
> > Furthermore, the patch does not actually fix the hole referenced (see
> > ptmx-keystroke-latency.c on http://vladz.devzero.fr/013_ptmx-timing.php).
> > I can still reproduce the timing capture even with this patch applied
> > (in 3.9-rc8).
>
> How? There are no keystrokes being reported to other users, or did we
> miss something with this patch?

wget http://vladz.devzero.fr/svn/codes/PoC/ptmx-keystroke-latency.c
gcc -o ptmx-keystroke-latency ptmx-keystroke-latency.c
./ptmx-keystroke-latency

Log in to another tty, as another user. See keystroke timing. 3.9-rc8.

Seems like it was missed. Meanwhile, idle times in "w" do not update.

> > The grsec patch instead introduces another test within the inotify code
> > (is_sidechannel_device()-related bits) -- untested by me, but probably
> > more relevant.
> >
> > Even 37b7f3c76595e23257f61bd80b223de8658617ee, the "regression fix",
> > which Linus merged in for the 3.9 release, is still a regression for me.
>
> And I applied that one as well.

Right, so this restores updates but coarsens the granularity to 60
seconds. I'm complaining that this still affects my occupational
performance.

> > 60 seconds means somebody is asleep in my environment, and so is still
> > the kind of thing that just pisses me off. I'd rather revert this whole
> > thing.
>
> Users taking a break for longer than a minute upset you? What are you
> really trying to keep track of here?

Really? In a team environment, a person idle for 30 seconds means they've
stopped to look at something else. Now we have to wait 2 minutes to know
if this has happened or not. Now it becomes faster to interrupt somebody
to ask them if maintenance can be done, etc.

> > I could stand maybe 1 second as maximum granularity. You could do that with
> > less code and no test.
>
> Patch to show this?

I was thinking of just updating the seconds field of the timespec struct,
or leaving this particular part and setting sb->s_time_gran to 100000000,
though that would probably break other things. Since I've never looked at
this stuff before, I'm not sure I should make a patch, but I can...

Simon-

2013-04-30 01:37:44

by Greg Kroah-Hartman

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On Mon, Apr 29, 2013 at 05:36:40PM -0700, Simon Kirby wrote:
> On Mon, Apr 29, 2013 at 05:21:17PM -0700, Greg Kroah-Hartman wrote:
>
> > On Mon, Apr 29, 2013 at 05:14:45PM -0700, Simon Kirby wrote:
> > > On Mon, Apr 29, 2013 at 12:01:44PM -0700, Greg Kroah-Hartman wrote:
> > >
> > > > 3.8-stable review patch. If anyone has any objections, please let me know.
> > >
> > > I object. This breaks functionality I use every day (seeing who else is
> > > working on stuff with "w").
> > >
> > > Furthermore, the patch does not actually fix the hole referenced (see
> > > ptmx-keystroke-latency.c on http://vladz.devzero.fr/013_ptmx-timing.php).
> > > I can still reproduce the timing capture even with this patch applied
> > > (in 3.9-rc8).
> >
> > How? There are no keystrokes being reported to other users, or did we
> > miss something with this patch?
>
> wget http://vladz.devzero.fr/svn/codes/PoC/ptmx-keystroke-latency.c
> gcc -o ptmx-keystroke-latency ptmx-keystroke-latency.c
> ./ptmx-keystroke-latency
>
> Log in to another tty, as another user. See keystroke timing. 3.9-rc8.
>
> Seems like it was missed. Meanwhile, idle times in "w" do not update.

Ah, it's using inotify on the /dev/ptmx device. Jiri, your change
really doesn't affect that at all :(
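For reference, this is roughly what watching a path through inotify looks like, independent of any timestamp on the inode. A minimal sketch, assuming Linux with inotify available; `watch_path` and `drain_events` are hypothetical helper names, and this is not the code from ptmx-keystroke-latency.c:

```c
#include <sys/inotify.h>
#include <unistd.h>

/* Returns a non-blocking inotify fd watching `path` for reads and writes,
 * or -1 on failure. */
static int watch_path(const char *path)
{
    int fd = inotify_init1(IN_NONBLOCK);

    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, path, IN_ACCESS | IN_MODIFY) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Drains all queued events and returns how many were seen. An observer
 * would record the arrival time of each one. */
static int drain_events(int fd)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len;
    int count = 0;

    while ((len = read(fd, buf, sizeof(buf))) > 0) {
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;

            count++;
            p += sizeof(*ev) + ev->len;  /* advance past header and name */
        }
    }
    return count;
}
```

Because the events arrive as the accesses happen, coarsening atime/mtime does nothing to hide their timing from such a watcher.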

Simon, you mention a grsec change somewhere that addresses this issue.
Any hints on where that would be?

thanks,

greg k-h

2013-04-30 01:54:40

by Shuah Khan

Subject: Re: [ 00/42] 3.8.11-stable review

On Mon, 2013-04-29 at 12:01 -0700, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.8.11 release.
> There are 42 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed May 1 18:47:07 UTC 2013.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.8.11-rc1.gz
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Patches applied cleanly to 3.0.75, 3.4.42, and 3.8.10

Compiled and booted on the following systems:

Samsung Series 9 Intel Core i5 (3.4.43-rc1 and 3.8.11-rc1)
HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics

dmesgs for all releases look good. No regressions compared to the
previous dmesgs for each of these releases.

Cross-compile tests results:

alpha: defconfig passed on all
arm: defconfig passed on all
arm64: not applicable to 3.0.y, 3.4.y. defconfig passed on 3.8.y
c6x: not applicable to 3.0.y, defconfig passed on 3.4.y, and 3.8.y.
mips: defconfig passed on all
mipsel: defconfig passed on all
powerpc: wii_defconfig passed on all
sh: defconfig passed on all
sparc: defconfig passed on all
tile: tilegx_defconfig passed on all

-- Shuah

2013-04-30 02:02:22

by Greg Kroah-Hartman

Subject: Re: [ 00/42] 3.8.11-stable review

On Tue, Apr 30, 2013 at 01:54:37AM +0000, Shuah Khan wrote:
> On Mon, 2013-04-29 at 12:01 -0700, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 3.8.11 release.
> > There are 42 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Wed May 1 18:47:07 UTC 2013.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.8.11-rc1.gz
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
>
> Patches applied cleanly to 3.0.75, 3.4.42, and 3.8.10
>
> Compiled and booted on the following systems:
>
> Samsung Series 9 Intel Core i5 (3.4.43-rc1 and 3.8.11-rc1)
> HP ProBook 6475b AMD A10-4600M APU with Radeon(tm) HD Graphics
>
> dmesgs for all releases look good. No regressions compared to the
> previous dmesgs for each of these releases.

Great, thanks for testing and letting us know.

greg k-h

2013-04-30 12:02:21

by Wolfram Gloger

Subject: Re: [ 03/42] TTY: fix atime/mtime regression

Hi,

>To restore the old behaviour while still preventing attackers from
>guessing the password length, this patch updates the timestamps in
>one-minute intervals.

Sorry if I'm missing something, but isn't this an issue that should very
obviously be fixed in user space? Only user space knows whether the
atime/mtime updates on a device are security-sensitive or not.

The sshd process and/or the login process could easily perform randomly
timed, dummy utime() calls on the tty around and within the password
typing, making this attack infeasible. I faintly remember sshd _already
does this_ for the network packets anyway by exchanging dummy packets.

Regards,
Wolfram.

2013-04-30 23:50:55

by Simon Kirby

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On Mon, Apr 29, 2013 at 06:37:24PM -0700, Greg Kroah-Hartman wrote:

> On Mon, Apr 29, 2013 at 05:36:40PM -0700, Simon Kirby wrote:
> > On Mon, Apr 29, 2013 at 05:21:17PM -0700, Greg Kroah-Hartman wrote:
> >
> > > On Mon, Apr 29, 2013 at 05:14:45PM -0700, Simon Kirby wrote:
> > > > On Mon, Apr 29, 2013 at 12:01:44PM -0700, Greg Kroah-Hartman wrote:
> > > >
> > > > > 3.8-stable review patch. If anyone has any objections, please let me know.
> > > >
> > > > I object. This breaks functionality I use every day (seeing who else is
> > > > working on stuff with "w").
> > > >
> > > > Furthermore, the patch does not actually fix the hole referenced (see
> > > > ptmx-keystroke-latency.c on http://vladz.devzero.fr/013_ptmx-timing.php).
> > > > I can still reproduce the timing capture even with this patch applied
> > > > (in 3.9-rc8).
> > >
> > > How? There are no keystrokes being reported to other users, or did we
> > > miss something with this patch?
> >
> > wget http://vladz.devzero.fr/svn/codes/PoC/ptmx-keystroke-latency.c
> > gcc -o ptmx-keystroke-latency ptmx-keystroke-latency.c
> > ./ptmx-keystroke-latency
> >
> > Log in to another tty, as another user. See keystroke timing. 3.9-rc8.
> >
> > Seems like it was missed. Meanwhile, idle times in "w" do not update.
>
> Ah, it's using inotify on the /dev/ptmx device. Jiri, your change
> really doesn't affect that at all :(
>
> Simon, you mention a grsec change somewhere that addresses this issue.
> Any hints on where that would be?

Yes, see Jiri's comments in the original patch (b0de59b5733d):

http://vladz.devzero.fr/013_ptmx-timing.php

The grsec patch is linked from there:

http://grsecurity.net/~spender/sidechannel.diff

Simon-

2013-05-01 00:57:48

by Linus Torvalds

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On Mon, Apr 29, 2013 at 6:37 PM, Greg Kroah-Hartman
<[email protected]> wrote:
>
> Ah, it's using inotify on the /dev/ptmx device. Jiri, your change
> really doesn't affect that at all :(

Hmm. Maybe something like the appended? Together with making the time
modification be 10 seconds to make Simon happy (that's what Jiri
originally did, it was me who said "why not make it a natural human
timeframe"). In fact, maybe we should just make it a power-of-two (8?
4?) and avoid the nasty division..

Patch is whitespace-damaged and totally untested! Caveat applicator.

Linus

--- snip snip ---

drivers/tty/pty.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index a62798fcc014..59bfaecc4e14 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -681,6 +681,9 @@ static int ptmx_open(struct inode *inode, struct
file *filp)

nonseekable_open(inode, filp);

+ /* We refuse fsnotify events on ptmx, since it's a shared resource */
+ filp->f_mode |= FMODE_NONOTIFY;
+
retval = tty_alloc_file(filp);
if (retval)
return retval;

2013-05-01 01:41:52

by Linus Torvalds

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On Tue, Apr 30, 2013 at 5:57 PM, Linus Torvalds
<[email protected]> wrote:
>
> Patch is whitespace-damaged and totally untested! Caveat applicator.

Ok, so it's still whitespace-damaged, but it seems to work. The
appended has the "8 second rule" too..

Comments? Simon?

Linus

--- snip snip ---
drivers/tty/pty.c | 3 +++
drivers/tty/tty_io.c | 4 ++--
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
index a62798fcc014..59bfaecc4e14 100644
--- a/drivers/tty/pty.c
+++ b/drivers/tty/pty.c
@@ -681,6 +681,9 @@ static int ptmx_open(struct inode *inode, struct file *filp)

nonseekable_open(inode, filp);

+ /* We refuse fsnotify events on ptmx, since it's a shared resource */
+ filp->f_mode |= FMODE_NONOTIFY;
+
retval = tty_alloc_file(filp);
if (retval)
return retval;
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 97ebc8c5864e..6464029e4860 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -988,10 +988,10 @@ void start_tty(struct tty_struct *tty)

EXPORT_SYMBOL(start_tty);

+/* We limit tty time update visibility to every 8 seconds or so. */
static void tty_update_time(struct timespec *time)
{
- unsigned long sec = get_seconds();
- sec -= sec % 60;
+ unsigned long sec = get_seconds() & ~7;
if ((long)(sec - time->tv_sec) > 0)
time->tv_sec = sec;
}
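The `& ~7` in the hunk above rounds down to a multiple of 8 without a division, which is why a power of two was suggested. A minimal userspace sketch comparing it with the modulo form from the one-minute version; both function names are hypothetical:

```c
/* Round down with the mask from the patch: clears the low three bits. */
static unsigned long round_down_8(unsigned long sec)
{
    return sec & ~7UL;
}

/* Round down with the modulo form used by the one-minute version. */
static unsigned long round_down_step(unsigned long sec, unsigned long step)
{
    return sec - sec % step;
}
```

For a step of 8 the two functions agree on every input; the mask form only works when the step is a power of two.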

2013-05-01 05:23:13

by Jiri Slaby

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On 05/01/2013 03:41 AM, Linus Torvalds wrote:
> On Tue, Apr 30, 2013 at 5:57 PM, Linus Torvalds
> <[email protected]> wrote:
>>
>> Patch is whitespace-damaged and totally untested! Caveat applicator.
>
> Ok, so it's still whitespace-damaged, but it seems to work. The
> appended has the "8 second rule" too..
>
> Comments?

Yeah, looks good to me.

> --- snip snip ---
> drivers/tty/pty.c | 3 +++
> drivers/tty/tty_io.c | 4 ++--
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
> index a62798fcc014..59bfaecc4e14 100644
> --- a/drivers/tty/pty.c
> +++ b/drivers/tty/pty.c
> @@ -681,6 +681,9 @@ static int ptmx_open(struct inode *inode, struct file *filp)
>
> nonseekable_open(inode, filp);
>
> + /* We refuse fsnotify events on ptmx, since it's a shared resource */
> + filp->f_mode |= FMODE_NONOTIFY;
> +
> retval = tty_alloc_file(filp);
> if (retval)
> return retval;
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 97ebc8c5864e..6464029e4860 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -988,10 +988,10 @@ void start_tty(struct tty_struct *tty)
>
> EXPORT_SYMBOL(start_tty);
>
> +/* We limit tty time update visibility to every 8 seconds or so. */
> static void tty_update_time(struct timespec *time)
> {
> - unsigned long sec = get_seconds();
> - sec -= sec % 60;
> + unsigned long sec = get_seconds() & ~7;
> if ((long)(sec - time->tv_sec) > 0)
> time->tv_sec = sec;
> }
>

thanks,
--
js
suse labs

2013-05-01 13:06:13

by Wolfram Gloger

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

Hi,

> --- snip snip ---
> drivers/tty/pty.c | 3 +++
> drivers/tty/tty_io.c | 4 ++--
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
> index a62798fcc014..59bfaecc4e14 100644
> --- a/drivers/tty/pty.c
> +++ b/drivers/tty/pty.c
> @@ -681,6 +681,9 @@ static int ptmx_open(struct inode *inode, struct file *filp)
>
> nonseekable_open(inode, filp);
>
> + /* We refuse fsnotify events on ptmx, since it's a shared resource */
> + filp->f_mode |= FMODE_NONOTIFY;
> +
> retval = tty_alloc_file(filp);
> if (retval)
> return retval;

This is definitely good. But of course you can still poll on mtime.

> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 97ebc8c5864e..6464029e4860 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -988,10 +988,10 @@ void start_tty(struct tty_struct *tty)
>
> EXPORT_SYMBOL(start_tty);
>
> +/* We limit tty time update visibility to every 8 seconds or so. */
> static void tty_update_time(struct timespec *time)
> {
> - unsigned long sec = get_seconds();
> - sec -= sec % 60;
> + unsigned long sec = get_seconds() & ~7;
> if ((long)(sec - time->tv_sec) > 0)
> time->tv_sec = sec;
> }

I still find this mildly ugly. I would prefer this:

--- linux-3.8.10/drivers/tty/tty_io.c~ 2013-02-19 00:58:34.000000000 +0100
+++ linux-3.8.10/drivers/tty/tty_io.c 2013-05-01 13:46:16.000000000 +0200
@@ -1080,8 +1080,11 @@
cond_resched();
}
if (written) {
+ if (tty->driver->type != TTY_DRIVER_TYPE_PTY ||
+ tty->driver->subtype != PTY_TYPE_MASTER) {
struct inode *inode = file->f_path.dentry->d_inode;
inode->i_mtime = current_fs_time(inode->i_sb);
+ }
ret = written;
}
out:

(without the tty_update_time change). This prevents polling on
/dev/ptmx, but not on /dev/pts/*. The latter seems unnecessary to me,
because during password entry, echo mode is off so no bytes are read,
and canonical mode is ON so no bytes are written until NL is entered.
So you can only obtain the total time taken to enter the password, not
individual keystrokes.
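The terminal state this argument relies on (canonical mode on, echo off) can be checked from userspace through the termios flags. A minimal sketch; `looks_like_password_prompt` is a hypothetical helper name:

```c
#include <termios.h>

/* Nonzero when the settings match the password-prompt situation described
 * above: canonical (line-buffered) input with echo switched off. */
static int looks_like_password_prompt(const struct termios *t)
{
    return (t->c_lflag & ICANON) != 0 && (t->c_lflag & ECHO) == 0;
}
```

A caller would fill the struct with tcgetattr() on the tty's file descriptor before passing it in.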

The canonical mode is also the reason why my suggestion to fix this in
userspace (in the other subthread) is quite problematic (I tried to
change PAM) as it looks impossible to delay or obfuscate the write
events on /dev/ptmx.

Regards,
Wolfram.

2013-05-02 16:11:50

by Simon Kirby

Subject: Re: [ 02/42] TTY: do not update atime/mtime on read/write

On Tue, Apr 30, 2013 at 06:41:44PM -0700, Linus Torvalds wrote:

> On Tue, Apr 30, 2013 at 5:57 PM, Linus Torvalds
> <[email protected]> wrote:
> >
> > Patch is whitespace-damaged and totally untested! Caveat applicator.
>
> Ok, so it's still whitespace-damaged, but it seems to work. The
> appended has the "8 second rule" too..
>
> Comments? Simon?

Tested -- both hunks seem to work as intended. Thanks!

Simon-

The patch below became commit b0b885657b6c8ef63a46bc9299b2a7715d19acde.

> Linus
>
> --- snip snip ---
> drivers/tty/pty.c | 3 +++
> drivers/tty/tty_io.c | 4 ++--
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/tty/pty.c b/drivers/tty/pty.c
> index a62798fcc014..59bfaecc4e14 100644
> --- a/drivers/tty/pty.c
> +++ b/drivers/tty/pty.c
> @@ -681,6 +681,9 @@ static int ptmx_open(struct inode *inode, struct file *filp)
>
> nonseekable_open(inode, filp);
>
> + /* We refuse fsnotify events on ptmx, since it's a shared resource */
> + filp->f_mode |= FMODE_NONOTIFY;
> +
> retval = tty_alloc_file(filp);
> if (retval)
> return retval;
> diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
> index 97ebc8c5864e..6464029e4860 100644
> --- a/drivers/tty/tty_io.c
> +++ b/drivers/tty/tty_io.c
> @@ -988,10 +988,10 @@ void start_tty(struct tty_struct *tty)
>
> EXPORT_SYMBOL(start_tty);
>
> +/* We limit tty time update visibility to every 8 seconds or so. */
> static void tty_update_time(struct timespec *time)
> {
> - unsigned long sec = get_seconds();
> - sec -= sec % 60;
> + unsigned long sec = get_seconds() & ~7;
> if ((long)(sec - time->tv_sec) > 0)
> time->tv_sec = sec;
> }