2017-09-18 09:10:13

by Greg Kroah-Hartman

Subject: [PATCH 4.13 00/52] 4.13.3-stable review

This is the start of the stable review cycle for the 4.13.3 release.
There are 52 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Sep 20 09:08:28 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.13.3-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.13.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <[email protected]>
Linux 4.13.3-rc1

Song Liu <[email protected]>
md/raid5: release/flush io in raid5_do_work()

Shaohua Li <[email protected]>
md/raid1/10: reset bio allocated from mempool

Pan Bian <[email protected]>
xfs: use kmem_free to free return value of kmem_zalloc

Christoph Hellwig <[email protected]>
xfs: open code end_buffer_async_write in xfs_finish_page_writeback

Christoph Hellwig <[email protected]>
xfs: don't set v3 xflags for v2 inodes

Amir Goldstein <[email protected]>
xfs: fix incorrect log_flushed on fsync

Christoph Hellwig <[email protected]>
xfs: disable per-inode DAX flag

Brian Foster <[email protected]>
xfs: relog dirty buffers during swapext bmbt owner change

Brian Foster <[email protected]>
xfs: disallow marking previously dirty buffers as ordered

Brian Foster <[email protected]>
xfs: move bmbt owner change to last step of extent swap

Brian Foster <[email protected]>
xfs: skip bmbt block ino validation during owner change

Brian Foster <[email protected]>
xfs: don't log dirty ranges for ordered buffers

Brian Foster <[email protected]>
xfs: refactor buffer logging into buffer dirtying helper

Brian Foster <[email protected]>
xfs: ordered buffer log items are never formatted

Brian Foster <[email protected]>
xfs: remove unnecessary dirty bli format check for ordered bufs

Brian Foster <[email protected]>
xfs: open-code xfs_buf_item_dirty()

Omar Sandoval <[email protected]>
xfs: check for race with xfs_reclaim_inode() in xfs_ifree_cluster()

Darrick J. Wong <[email protected]>
xfs: evict all inodes involved with log redo item

Carlos Maiolino <[email protected]>
xfs: stop searching for free slots in an inode chunk when there are none

Brian Foster <[email protected]>
xfs: handle -EFSCORRUPTED during head/tail verification

Brian Foster <[email protected]>
xfs: fix log recovery corruption error due to tail overwrite

Brian Foster <[email protected]>
xfs: always verify the log tail during recovery

Brian Foster <[email protected]>
xfs: fix recovery failure when log record header wraps log end

Carlos Maiolino <[email protected]>
xfs: Properly retry failed inode items in case of error during buffer writeback

Carlos Maiolino <[email protected]>
xfs: Add infrastructure needed for error propagation during buffer IO failure

Eric Sandeen <[email protected]>
xfs: toggle readonly state around xfs_log_mount_finish

Eric Sandeen <[email protected]>
xfs: write unmount record for ro mounts

Dan Williams <[email protected]>
libnvdimm: fix integer overflow static analysis warning

Christophe Jaillet <[email protected]>
libnvdimm, btt: check memory allocation failure

Eric Biggers <[email protected]>
idr: remove WARN_ON_ONCE() when trying to replace negative ID

Miklos Szeredi <[email protected]>
fuse: allow server to run in different pid_ns

Amir Goldstein <[email protected]>
ovl: fix false positive ESTALE on lookup

Tony Luck <[email protected]>
x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages

Andy Lutomirski <[email protected]>
x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs

Andy Lutomirski <[email protected]>
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps

Andy Lutomirski <[email protected]>
x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common

Bernat, Yehezkel <[email protected]>
thunderbolt: Allow clearing the key

Bernat, Yehezkel <[email protected]>
thunderbolt: Make key root-only accessible

Bernat, Yehezkel <[email protected]>
thunderbolt: Remove superfluous check

Jaegeuk Kim <[email protected]>
f2fs: check hot_data for roll-forward recovery

Jaegeuk Kim <[email protected]>
f2fs: let fill_super handle roll-forward errors

Haishuang Yan <[email protected]>
ip_tunnel: fix setting ttl and tos value in collect_md mode

Eric Dumazet <[email protected]>
tcp: fix a request socket leak

Marcelo Ricardo Leitner <[email protected]>
sctp: fix missing wake ups in some situations

Eric Dumazet <[email protected]>
ipv6: fix typo in fib6_net_exit()

Sabrina Dubroca <[email protected]>
ipv6: fix memory leak with multiple tables during netns destruction

Paolo Abeni <[email protected]>
udp: drop head states only when all skb references are gone

Xin Long <[email protected]>
ip6_gre: update mtu properly in ip6gre_err

Jason Wang <[email protected]>
vhost_net: correctly check tx avail during rx busy polling

Claudiu Manoil <[email protected]>
gianfar: Fix Tx flow control deactivation

Jesper Dangaard Brouer <[email protected]>
Revert "net: fix percpu memory leaks"

Jesper Dangaard Brouer <[email protected]>
Revert "net: use lib/percpu_counter API for fragmentation mem accounting"


-------------

Diffstat:

Documentation/ABI/testing/sysfs-bus-thunderbolt | 2 +
Makefile | 4 +-
arch/x86/include/asm/elf.h | 5 +-
arch/x86/include/asm/page_64.h | 4 +
arch/x86/kernel/cpu/mcheck/mce.c | 43 +++++
arch/x86/kernel/process_64.c | 236 +++++++++++++-----------
drivers/md/raid1.c | 19 +-
drivers/md/raid10.c | 35 +++-
drivers/md/raid5.c | 4 +
drivers/net/ethernet/freescale/gianfar.c | 2 +-
drivers/nvdimm/btt.c | 2 +
drivers/nvdimm/bus.c | 17 +-
drivers/thunderbolt/switch.c | 20 +-
drivers/vhost/net.c | 7 +-
fs/f2fs/recovery.c | 4 +-
fs/fuse/dev.c | 13 +-
fs/fuse/file.c | 3 -
fs/inode.c | 1 +
fs/internal.h | 1 -
fs/overlayfs/inode.c | 11 +-
fs/xfs/libxfs/xfs_bmap_btree.c | 1 +
fs/xfs/libxfs/xfs_btree.c | 27 ++-
fs/xfs/libxfs/xfs_btree.h | 3 +-
fs/xfs/libxfs/xfs_ialloc.c | 57 +++---
fs/xfs/xfs_aops.c | 71 ++++---
fs/xfs/xfs_bmap_util.c | 93 +++++++---
fs/xfs/xfs_buf_item.c | 135 +++++++++-----
fs/xfs/xfs_buf_item.h | 5 +-
fs/xfs/xfs_icache.c | 10 +-
fs/xfs/xfs_inode.c | 23 ++-
fs/xfs/xfs_inode_item.c | 47 ++++-
fs/xfs/xfs_ioctl.c | 41 ++--
fs/xfs/xfs_log.c | 33 +++-
fs/xfs/xfs_log_recover.c | 155 ++++++++++------
fs/xfs/xfs_super.c | 2 +-
fs/xfs/xfs_trace.h | 1 -
fs/xfs/xfs_trans.h | 14 +-
fs/xfs/xfs_trans_ail.c | 3 +-
fs/xfs/xfs_trans_buf.c | 79 +++++---
fs/xfs/xfs_trans_priv.h | 31 ++++
include/linux/fs.h | 1 +
include/linux/mm_inline.h | 6 +
include/linux/skbuff.h | 2 +-
include/net/inet_frag.h | 35 +---
lib/idr.c | 2 +-
mm/memory-failure.c | 2 +
net/core/skbuff.c | 9 +-
net/ieee802154/6lowpan/reassembly.c | 11 +-
net/ipv4/inet_fragment.c | 4 +-
net/ipv4/ip_fragment.c | 12 +-
net/ipv4/ip_tunnel.c | 4 +-
net/ipv4/tcp_ipv4.c | 6 +-
net/ipv4/udp.c | 5 +-
net/ipv6/ip6_fib.c | 25 ++-
net/ipv6/ip6_gre.c | 4 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 12 +-
net/ipv6/reassembly.c | 12 +-
net/ipv6/tcp_ipv6.c | 6 +-
net/sctp/ulpqueue.c | 3 +-
59 files changed, 918 insertions(+), 507 deletions(-)



2017-09-18 09:10:18

by Greg Kroah-Hartman

Subject: [PATCH 4.13 01/52] Revert "net: use lib/percpu_counter API for fragmentation mem accounting"

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jesper Dangaard Brouer <[email protected]>


[ Upstream commit fb452a1aa3fd4034d7999e309c5466ff2d7005aa ]

This reverts commit 6d7b857d541ecd1d9bd997c97242d4ef94b19de2.

There is a bug in the fragmentation code's use of the percpu_counter API
that can cause issues on systems with many CPUs.

frag_mem_limit() just reads the global counter (fbc->count) without
considering that other CPUs can each hold up to the batch size (130K)
that hasn't been subtracted yet. Due to the 3 MBytes lower thresh limit,
this becomes dangerous at >= 24 CPUs (3*1024*1024/130000 = 24).

The correct API usage would be to use __percpu_counter_compare(), which
does the right thing: it takes the number of (online) CPUs and the batch
size into account and calls __percpu_counter_sum() when needed.
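
For illustration only, a batch-aware check could look like the following
(frag_mem_over_limit() is a hypothetical helper built on the pre-revert
percpu_counter field, not something this revert adds):

        static inline bool
        frag_mem_over_limit(struct netns_frags *nf, int thresh)
        {
                /* accounts for up to batch-size error per online CPU;
                 * falls back to __percpu_counter_sum() only when needed */
                return __percpu_counter_compare(&nf->mem, thresh,
                                                frag_percpu_counter_batch) > 0;
        }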

We choose to revert the use of the lib/percpu_counter API for frag
memory accounting for several reasons:

1) On systems with more than 24 CPUs, the heavier, fully locked
__percpu_counter_sum() is always invoked, which will be more
expensive than the atomic_t that is reverted to.

Given that systems with more than 24 CPUs are becoming common, this
doesn't seem like a good option. To mitigate this, the batch size could
be decreased and the thresh increased.

2) The add_frag_mem_limit+sub_frag_mem_limit pairs happen on the RX
CPU, before SKBs are pushed into sockets on remote CPUs. Given that
NICs can only hash on the L2 part of the IP-header, the NIC RX queues
will likely be limited. Thus, there is a fair chance that the atomic
add+dec happen on the same CPU.

Revert note: commit 1d6119baf061 ("net: fix percpu memory leaks")
removed init_frag_mem_limit() and instead used inet_frags_init_net().
After this revert, inet_frags_uninit_net() becomes empty.

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 30 +++++++++---------------------
net/ipv4/inet_fragment.c | 4 +---
2 files changed, 10 insertions(+), 24 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -1,14 +1,9 @@
#ifndef __NET_FRAG_H__
#define __NET_FRAG_H__

-#include <linux/percpu_counter.h>
-
struct netns_frags {
- /* The percpu_counter "mem" need to be cacheline aligned.
- * mem.count must not share cacheline with other writers
- */
- struct percpu_counter mem ____cacheline_aligned_in_smp;
-
+ /* Keep atomic mem on separate cachelines in structs that include it */
+ atomic_t mem ____cacheline_aligned_in_smp;
/* sysctls */
int timeout;
int high_thresh;
@@ -110,11 +105,11 @@ void inet_frags_fini(struct inet_frags *

static inline int inet_frags_init_net(struct netns_frags *nf)
{
- return percpu_counter_init(&nf->mem, 0, GFP_KERNEL);
+ atomic_set(&nf->mem, 0);
+ return 0;
}
static inline void inet_frags_uninit_net(struct netns_frags *nf)
{
- percpu_counter_destroy(&nf->mem);
}

void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);
@@ -140,31 +135,24 @@ static inline bool inet_frag_evicting(st

/* Memory Tracking Functions. */

-/* The default percpu_counter batch size is not big enough to scale to
- * fragmentation mem acct sizes.
- * The mem size of a 64K fragment is approx:
- * (44 fragments * 2944 truesize) + frag_queue struct(200) = 129736 bytes
- */
-static unsigned int frag_percpu_counter_batch = 130000;
-
static inline int frag_mem_limit(struct netns_frags *nf)
{
- return percpu_counter_read(&nf->mem);
+ return atomic_read(&nf->mem);
}

static inline void sub_frag_mem_limit(struct netns_frags *nf, int i)
{
- percpu_counter_add_batch(&nf->mem, -i, frag_percpu_counter_batch);
+ atomic_sub(i, &nf->mem);
}

static inline void add_frag_mem_limit(struct netns_frags *nf, int i)
{
- percpu_counter_add_batch(&nf->mem, i, frag_percpu_counter_batch);
+ atomic_add(i, &nf->mem);
}

-static inline unsigned int sum_frag_mem_limit(struct netns_frags *nf)
+static inline int sum_frag_mem_limit(struct netns_frags *nf)
{
- return percpu_counter_sum_positive(&nf->mem);
+ return atomic_read(&nf->mem);
}

/* RFC 3168 support :
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -234,10 +234,8 @@ evict_again:
cond_resched();

if (read_seqretry(&f->rnd_seqlock, seq) ||
- percpu_counter_sum(&nf->mem))
+ sum_frag_mem_limit(nf))
goto evict_again;
-
- percpu_counter_destroy(&nf->mem);
}
EXPORT_SYMBOL(inet_frags_exit_net);



2017-09-18 09:10:24

by Greg Kroah-Hartman

Subject: [PATCH 4.13 04/52] vhost_net: correctly check tx avail during rx busy polling

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jason Wang <[email protected]>


[ Upstream commit 8b949bef9172ca69d918e93509a4ecb03d0355e0 ]

In the past we checked tx avail through vhost_enable_notify(), which is
wrong since it only checks whether the guest has filled more available
buffers since the last avail idx synchronization, which was just done by
vhost_vq_avail_empty() before. What we really want is to check for
pending buffers in the avail ring. Fix this by calling
vhost_vq_avail_empty() instead.
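
Restated as a plain sketch (the hunk below is the authoritative change):
poll right away if the avail ring already holds buffers; otherwise enable
notification, and if buffers snuck in meanwhile, disable it again and poll:

        if (!vhost_vq_avail_empty(&net->dev, vq)) {
                /* pending tx buffers: queue the poll immediately */
                vhost_poll_queue(&vq->poll);
        } else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
                /* buffers arrived while we enabled notification */
                vhost_disable_notify(&net->dev, vq);
                vhost_poll_queue(&vq->poll);
        }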

This issue can be noticed by running the netperf TCP_RR benchmark as a
client from the guest (but not the host). With this fix, TCP_RR from the
guest to localhost recovers from 1375.91 to 55235.28 transactions per
second on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).

Fixes: 030881372460 ("vhost_net: basic polling support")
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/vhost/net.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -634,8 +634,13 @@ static int vhost_net_rx_peek_head_len(st

preempt_enable();

- if (vhost_enable_notify(&net->dev, vq))
+ if (!vhost_vq_avail_empty(&net->dev, vq))
vhost_poll_queue(&vq->poll);
+ else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+ vhost_disable_notify(&net->dev, vq);
+ vhost_poll_queue(&vq->poll);
+ }
+
mutex_unlock(&vq->mutex);

len = peek_head_len(rvq, sk);


2017-09-18 09:10:27

by Greg Kroah-Hartman

Subject: [PATCH 4.13 15/52] thunderbolt: Make key root-only accessible

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bernat, Yehezkel <[email protected]>

commit 0956e41169222822d3557871fcd1d32e4fa7e934 upstream.

A non-root user could read the key back after root wrote it there.
Remove read access for everyone but root.

Signed-off-by: Yehezkel Bernat <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/thunderbolt/switch.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -826,7 +826,7 @@ static ssize_t key_store(struct device *
mutex_unlock(&switch_lock);
return ret;
}
-static DEVICE_ATTR_RW(key);
+static DEVICE_ATTR(key, 0600, key_show, key_store);

static ssize_t nvm_authenticate_show(struct device *dev,
struct device_attribute *attr, char *buf)


2017-09-18 09:10:32

by Greg Kroah-Hartman

Subject: [PATCH 4.13 16/52] thunderbolt: Allow clearing the key

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bernat, Yehezkel <[email protected]>

commit e545f0d8a54a9594fe604d67d80ca6fddf72ca59 upstream.

If secure authentication of a device fails, either because the device
already has another key uploaded or because some other error occurs while
sending the challenge to the device, the user may want to approve the
device just once (without uploading a new key to the device). The current
implementation does not allow this because the key cannot be cleared once
set, even though we allow it to be changed.

Make this scenario possible and allow clearing the key by writing an
empty string to the key sysfs file.
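
For example, from userspace (the device path is illustrative):

        # clear a previously set key, then approve the device once
        echo "" > /sys/bus/thunderbolt/devices/0-1/key
        echo 1 > /sys/bus/thunderbolt/devices/0-1/authorized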

Signed-off-by: Yehezkel Bernat <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
Documentation/ABI/testing/sysfs-bus-thunderbolt | 2 ++
drivers/thunderbolt/switch.c | 15 +++++++++++----
2 files changed, 13 insertions(+), 4 deletions(-)

--- a/Documentation/ABI/testing/sysfs-bus-thunderbolt
+++ b/Documentation/ABI/testing/sysfs-bus-thunderbolt
@@ -45,6 +45,8 @@ Contact: [email protected]
Description: When a devices supports Thunderbolt secure connect it will
have this attribute. Writing 32 byte hex string changes
authorization to use the secure connection method instead.
+ Writing an empty string clears the key and regular connection
+ method can be used again.

What: /sys/bus/thunderbolt/devices/.../device
Date: Sep 2017
--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -807,8 +807,11 @@ static ssize_t key_store(struct device *
struct tb_switch *sw = tb_to_switch(dev);
u8 key[TB_SWITCH_KEY_SIZE];
ssize_t ret = count;
+ bool clear = false;

- if (hex2bin(key, buf, sizeof(key)))
+ if (!strcmp(buf, "\n"))
+ clear = true;
+ else if (hex2bin(key, buf, sizeof(key)))
return -EINVAL;

if (mutex_lock_interruptible(&switch_lock))
@@ -818,9 +821,13 @@ static ssize_t key_store(struct device *
ret = -EBUSY;
} else {
kfree(sw->key);
- sw->key = kmemdup(key, sizeof(key), GFP_KERNEL);
- if (!sw->key)
- ret = -ENOMEM;
+ if (clear) {
+ sw->key = NULL;
+ } else {
+ sw->key = kmemdup(key, sizeof(key), GFP_KERNEL);
+ if (!sw->key)
+ ret = -ENOMEM;
+ }
}

mutex_unlock(&switch_lock);


2017-09-18 09:10:36

by Greg Kroah-Hartman

Subject: [PATCH 4.13 18/52] x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit 9584d98bed7a7a904d0702ad06bbcc94703cb5b4 upstream.

In ELF_CORE_COPY_REGS, we're copying from the current task, so
accessing thread.fsbase and thread.gsbase makes no sense. Just read
the values from the CPU registers.

In practice, the old code would have been correct most of the time
simply because thread.fsbase and thread.gsbase usually matched the
CPU registers.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chang Seok <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/elf.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -204,6 +204,7 @@ void set_personality_ia32(bool);

#define ELF_CORE_COPY_REGS(pr_reg, regs) \
do { \
+ unsigned long base; \
unsigned v; \
(pr_reg)[0] = (regs)->r15; \
(pr_reg)[1] = (regs)->r14; \
@@ -226,8 +227,8 @@ do { \
(pr_reg)[18] = (regs)->flags; \
(pr_reg)[19] = (regs)->sp; \
(pr_reg)[20] = (regs)->ss; \
- (pr_reg)[21] = current->thread.fsbase; \
- (pr_reg)[22] = current->thread.gsbase; \
+ rdmsrl(MSR_FS_BASE, base); (pr_reg)[21] = base; \
+ rdmsrl(MSR_KERNEL_GS_BASE, base); (pr_reg)[22] = base; \
asm("movl %%ds,%0" : "=r" (v)); (pr_reg)[23] = v; \
asm("movl %%es,%0" : "=r" (v)); (pr_reg)[24] = v; \
asm("movl %%fs,%0" : "=r" (v)); (pr_reg)[25] = v; \


2017-09-18 09:10:44

by Greg Kroah-Hartman

Subject: [PATCH 4.13 07/52] ipv6: fix memory leak with multiple tables during netns destruction

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sabrina Dubroca <[email protected]>


[ Upstream commit ba1cc08d9488c94cb8d94f545305688b72a2a300 ]

fib6_net_exit only frees the main and local tables. If another table was
created with fib6_alloc_table, we leak it when the netns is destroyed.

Fix this in the same way ip_fib_net_exit cleans up tables, by walking
through the whole hashtable of fib6_tables. We can get rid of the
special cases for local and main, since they're also part of the
hashtable.

Reproducer:
ip netns add x
ip -net x -6 rule add from 6003:1::/64 table 100
ip netns del x

Reported-by: Jianlin Shi <[email protected]>
Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
Signed-off-by: Sabrina Dubroca <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_fib.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)

--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -198,6 +198,12 @@ static void rt6_release(struct rt6_info
}
}

+static void fib6_free_table(struct fib6_table *table)
+{
+ inetpeer_invalidate_tree(&table->tb6_peers);
+ kfree(table);
+}
+
static void fib6_link_table(struct net *net, struct fib6_table *tb)
{
unsigned int h;
@@ -1915,15 +1921,22 @@ out_timer:

static void fib6_net_exit(struct net *net)
{
+ unsigned int i;
+
rt6_ifdown(net, NULL);
del_timer_sync(&net->ipv6.ip6_fib_timer);

-#ifdef CONFIG_IPV6_MULTIPLE_TABLES
- inetpeer_invalidate_tree(&net->ipv6.fib6_local_tbl->tb6_peers);
- kfree(net->ipv6.fib6_local_tbl);
-#endif
- inetpeer_invalidate_tree(&net->ipv6.fib6_main_tbl->tb6_peers);
- kfree(net->ipv6.fib6_main_tbl);
+ for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
+ struct hlist_head *head = &net->ipv6.fib_table_hash[i];
+ struct hlist_node *tmp;
+ struct fib6_table *tb;
+
+ hlist_for_each_entry_safe(tb, tmp, head, tb6_hlist) {
+ hlist_del(&tb->tb6_hlist);
+ fib6_free_table(tb);
+ }
+ }
+
kfree(net->ipv6.fib_table_hash);
kfree(net->ipv6.rt6_stats);
}


2017-09-18 09:10:48

by Greg Kroah-Hartman

Subject: [PATCH 4.13 11/52] ip_tunnel: fix setting ttl and tos value in collect_md mode

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Haishuang Yan <[email protected]>


[ Upstream commit 0f693f1995cf002432b70f43ce73f79bf8d0b6c9 ]

The ttl and tos variables are declared and assigned, but key->tos and
key->ttl are passed to iptunnel_xmit() instead, so the computed values
are never used.

Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Cc: Alexei Starovoitov <[email protected]>
Signed-off-by: Haishuang Yan <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/ip_tunnel.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -618,8 +618,8 @@ void ip_md_tunnel_xmit(struct sk_buff *s
ip_rt_put(rt);
goto tx_dropped;
}
- iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, key->tos,
- key->ttl, df, !net_eq(tunnel->net, dev_net(dev)));
+ iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, tos, ttl,
+ df, !net_eq(tunnel->net, dev_net(dev)));
return;
tx_error:
dev->stats.tx_errors++;


2017-09-18 09:10:52

by Greg Kroah-Hartman

Subject: [PATCH 4.13 12/52] f2fs: let fill_super handle roll-forward errors

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jaegeuk Kim <[email protected]>

commit afd2b4da40b3b567ef8d8e6881479345a2312a03 upstream.

If we set CP_ERROR_FLAG on a roll-forward error, f2fs can no longer
proceed with any IO due to f2fs_cp_error(). But if, for example, some
stale data is involved in the roll-forward process, we can get -ENOENT
and the fs gets stuck. If we get any error, let fill_super set
SBI_NEED_FSCK and try to recover back to a stable point.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/f2fs/recovery.c | 2 --
1 file changed, 2 deletions(-)

--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -599,8 +599,6 @@ out:
}

clear_sbi_flag(sbi, SBI_POR_DOING);
- if (err)
- set_ckpt_flags(sbi, CP_ERROR_FLAG);
mutex_unlock(&sbi->cp_mutex);

/* let's drop all the directory inodes for clean checkpoint */


2017-09-18 09:10:56

by Greg Kroah-Hartman

Subject: [PATCH 4.13 14/52] thunderbolt: Remove superfluous check

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bernat, Yehezkel <[email protected]>

commit 8fdd6ab36197ad891233572c57781b1f537da0ac upstream.

The key size is tested by hex2bin() already (as '\0' isn't a hex digit).

Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Yehezkel Bernat <[email protected]>
Acked-by: Mika Westerberg <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/thunderbolt/switch.c | 3 ---
1 file changed, 3 deletions(-)

--- a/drivers/thunderbolt/switch.c
+++ b/drivers/thunderbolt/switch.c
@@ -808,9 +808,6 @@ static ssize_t key_store(struct device *
u8 key[TB_SWITCH_KEY_SIZE];
ssize_t ret = count;

- if (count < 64)
- return -EINVAL;
-
if (hex2bin(key, buf, sizeof(key)))
return -EINVAL;



2017-09-18 09:11:01

by Greg Kroah-Hartman

Subject: [PATCH 4.13 05/52] ip6_gre: update mtu properly in ip6gre_err

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <[email protected]>


[ Upstream commit 5c25f30c93fdc5bf25e62101aeaae7a4f9b421b3 ]

Currently, when processing ICMPV6_PKT_TOOBIG, ip6gre_err only subtracts
the offset of the gre header from the mtu info. The expected mtu of the
gre device should also have the gre header subtracted. Otherwise, the
next packets still can't be sent out.

Jianlin found this issue when using the topo:
client(ip6gre)<---->(nic1)route(nic2)<----->(ip6gre)server

After reducing nic2's mtu, both tcp and sctp performance with
large-sized data dropped to 0.

This patch fixes it by also subtracting the grehdr (t->tun_hlen) from
the mtu info when updating the gre device's mtu in ip6gre_err(). It
also needs to subtract ETH_HLEN if the gre device's type is ARPHRD_ETHER.
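
As an illustrative example (the numbers are hypothetical): if the router
reports info = 1400 and the gre header sits at offset = 40, the old code
set the device mtu to 1400 - 40 = 1360, so a 1360-byte payload still
became a 1404-byte packet after encapsulation and kept exceeding the
path. With a 4-byte tun_hlen, the device mtu now becomes 1400 - 40 - 4 =
1356 (minus a further ETH_HLEN of 14 for ARPHRD_ETHER devices).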

Reported-by: Jianlin Shi <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_gre.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -432,7 +432,9 @@ static void ip6gre_err(struct sk_buff *s
}
break;
case ICMPV6_PKT_TOOBIG:
- mtu = be32_to_cpu(info) - offset;
+ mtu = be32_to_cpu(info) - offset - t->tun_hlen;
+ if (t->dev->type == ARPHRD_ETHER)
+ mtu -= ETH_HLEN;
if (mtu < IPV6_MIN_MTU)
mtu = IPV6_MIN_MTU;
t->dev->mtu = mtu;


2017-09-18 09:11:05

by Greg Kroah-Hartman

Subject: [PATCH 4.13 30/52] xfs: fix recovery failure when log record header wraps log end

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 284f1c2c9bebf871861184b0e2c40fa921dd380b upstream.

The high-level log recovery algorithm consists of two loops that
walk the physical log and process log records from the tail to the
head. The first loop handles the case where the tail is beyond the
head and processes records up to the end of the physical log. The
subsequent loop processes records from the beginning of the physical
log to the head.

Because log records can wrap around the end of the physical log, the
first loop mentioned above must handle this case appropriately.
Records are processed from in-core buffers, which means that this
algorithm must split the reads of such records into two partial
I/Os: 1.) from the beginning of the record to the end of the log and
2.) from the beginning of the log to the end of the record. This is
further complicated by the fact that the log record header and log
record data are read into independent buffers.

The current handling of each buffer correctly splits the reads when
either the header or data starts before the end of the log and wraps
around the end. The data read does not correctly handle the case
where the prior header read wrapped or ends on the physical log end
boundary. blk_no is incremented to or beyond the log end after the
header read to point to the record data, but the split data read
logic triggers, attempts to read from an invalid log block and
ultimately causes log recovery to fail. This can be reproduced
fairly reliably via xfstests tests generic/047 and generic/388 with
large iclog sizes (256k) and small (10M) logs.

If the record header read has pushed beyond the end of the physical
log, the subsequent data read is actually contiguous. Update the
data read logic to detect the case where blk_no has wrapped, mod it
against the log size to read from the correct address and issue one
contiguous read for the log data buffer. The log record is processed
as normal from the buffer(s), the loop exits after the current
iteration and the subsequent loop picks up with the first new record
after the start of the log.
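
As a worked example (illustrative numbers): a 10M log has l_logBBsize =
10 * 1024 * 1024 / 512 = 20480 basic blocks. If the header read ends at
the physical log end and leaves blk_no at 20485, the record data actually
lives at do_mod(20485, 20480) = block 5 and can be read with a single
contiguous xlog_bread() call.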

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_log_recover.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -5218,7 +5218,7 @@ xlog_do_recovery_pass(
xfs_daddr_t *first_bad) /* out: first bad log rec */
{
xlog_rec_header_t *rhead;
- xfs_daddr_t blk_no;
+ xfs_daddr_t blk_no, rblk_no;
xfs_daddr_t rhead_blk;
char *offset;
xfs_buf_t *hbp, *dbp;
@@ -5371,9 +5371,19 @@ xlog_do_recovery_pass(
bblks = (int)BTOBB(be32_to_cpu(rhead->h_len));
blk_no += hblks;

- /* Read in data for log record */
- if (blk_no + bblks <= log->l_logBBsize) {
- error = xlog_bread(log, blk_no, bblks, dbp,
+ /*
+ * Read the log record data in multiple reads if it
+ * wraps around the end of the log. Note that if the
+ * header already wrapped, blk_no could point past the
+ * end of the log. The record data is contiguous in
+ * that case.
+ */
+ if (blk_no + bblks <= log->l_logBBsize ||
+ blk_no >= log->l_logBBsize) {
+ /* mod blk_no in case the header wrapped and
+ * pushed it beyond the end of the log */
+ rblk_no = do_mod(blk_no, log->l_logBBsize);
+ error = xlog_bread(log, rblk_no, bblks, dbp,
&offset);
if (error)
goto bread_err2;


2017-09-18 09:11:12

by Greg Kroah-Hartman

Subject: [PATCH 4.13 21/52] ovl: fix false positive ESTALE on lookup

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Amir Goldstein <[email protected]>

commit 939ae4efd51c627da270af74ef069db5124cb5b0 upstream.

Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin")
verifies that the origin lower inode stored in the overlayfs inode matches
the inode of a copy up origin dentry found by lookup.

There is a false positive result in that check when lower fs does not
support file handles and copy up origin cannot be followed by file handle
at lookup time.

The false positive happens when finding an overlay inode in cache on a
copied up overlay dentry lookup. The overlay inode still 'remembers' the
copy up origin inode, but the copy up origin dentry is not available for
verification.

Relax the check in case copy up origin dentry is not available.

Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...")
Reported-by: Jordi Pujol <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/overlayfs/inode.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -576,10 +576,13 @@ static int ovl_inode_set(struct inode *i
static bool ovl_verify_inode(struct inode *inode, struct dentry *lowerdentry,
struct dentry *upperdentry)
{
- struct inode *lowerinode = lowerdentry ? d_inode(lowerdentry) : NULL;
-
- /* Lower (origin) inode must match, even if NULL */
- if (ovl_inode_lower(inode) != lowerinode)
+ /*
+ * Allow non-NULL lower inode in ovl_inode even if lowerdentry is NULL.
+ * This happens when finding a copied up overlay inode for a renamed
+ * or hardlinked overlay dentry and lower dentry cannot be followed
+ * by origin because lower fs does not support file handles.
+ */
+ if (lowerdentry && ovl_inode_lower(inode) != d_inode(lowerdentry))
return false;

/*


2017-09-18 09:11:16

by Greg Kroah-Hartman

Subject: [PATCH 4.13 22/52] fuse: allow server to run in different pid_ns

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Miklos Szeredi <[email protected]>

commit 5d6d3a301c4e749e04be6fcdcf4cb1ffa8bae524 upstream.

Commit 0b6e9ea041e6 ("fuse: Add support for pid namespaces") broke
Sandstorm.io development tools, which have been sending FUSE file
descriptors across PID namespace boundaries since early 2014.

The above patch added a check that prevented I/O on the fuse device file
descriptor if the pid namespace of the reader/writer was different from the
pid namespace of the mounter. With this change passing the device file
descriptor to a different pid namespace simply doesn't work. The check was
added because pids are transferred to/from the fuse userspace server in the
namespace registered at mount time.

To fix this regression, remove the checks and do the following:

1) The pid in the request header (the pid of the task that initiated the
filesystem operation) is translated to the reader's pid namespace. If a
mapping doesn't exist for this pid, then a zero pid is used. Note: even if
a mapping would exist between the initiator task's pid namespace and the
reader's pid namespace, the pid will be zero if either the mapping from
the initiator's to the mounter's namespace or the mapping from the
mounter's to the reader's namespace doesn't exist. (A sketch of the
translation follows this list.)

2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone.
Userspace should not interpret this value anyway. Also allow the
setlk/setlkw operations if the pid of the task cannot be represented in the
mounter's namespace (pid being zero in that case).
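
A rough sketch of the translation in 1), shaped after the hunk below:
the numeric pid stored in the header is resolved in the mounter's
namespace and re-expressed in the reader's namespace, with pid_vnr()
returning 0 when no mapping exists:

        rcu_read_lock();
        /* in->h.pid holds the pid as seen by fc->pid_ns (the mounter's
         * namespace); rewrite it as seen by the reading task */
        in->h.pid = pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns));
        rcu_read_unlock();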

Reported-by: Kenton Varda <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Fixes: 0b6e9ea041e6 ("fuse: Add support for pid namespaces")
Cc: Eric W. Biederman <[email protected]>
Cc: Seth Forshee <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/fuse/dev.c | 13 +++++++------
fs/fuse/file.c | 3 ---
2 files changed, 7 insertions(+), 9 deletions(-)

--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1222,9 +1222,6 @@ static ssize_t fuse_dev_do_read(struct f
struct fuse_in *in;
unsigned reqsize;

- if (task_active_pid_ns(current) != fc->pid_ns)
- return -EIO;
-
restart:
spin_lock(&fiq->waitq.lock);
err = -EAGAIN;
@@ -1262,6 +1259,13 @@ static ssize_t fuse_dev_do_read(struct f

in = &req->in;
reqsize = in->h.len;
+
+ if (task_active_pid_ns(current) != fc->pid_ns) {
+ rcu_read_lock();
+ in->h.pid = pid_vnr(find_pid_ns(in->h.pid, fc->pid_ns));
+ rcu_read_unlock();
+ }
+
/* If request is too large, reply with an error and restart the read */
if (nbytes < reqsize) {
req->out.h.error = -EIO;
@@ -1823,9 +1827,6 @@ static ssize_t fuse_dev_do_write(struct
struct fuse_req *req;
struct fuse_out_header oh;

- if (task_active_pid_ns(current) != fc->pid_ns)
- return -EIO;
-
if (nbytes < sizeof(struct fuse_out_header))
return -EINVAL;

--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2181,9 +2181,6 @@ static int fuse_setlk(struct file *file,
if ((fl->fl_flags & FL_CLOSE_POSIX) == FL_CLOSE_POSIX)
return 0;

- if (pid && pid_nr == 0)
- return -EOVERFLOW;
-
fuse_lk_fill(&args, file, fl, opcode, pid_nr, flock, &inarg);
err = fuse_simple_request(fc, &args);



2017-09-18 09:11:24

by Greg Kroah-Hartman

Subject: [PATCH 4.13 26/52] xfs: write unmount record for ro mounts

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <[email protected]>

commit 757a69ef6cf2bf839bd4088e5609ddddd663b0c4 upstream.

There are dueling comments in the xfs code about intent
for log writes when unmounting a readonly filesystem.

In xfs_mountfs, we see the intent:

/*
* Now the log is fully replayed, we can transition to full read-only
* mode for read-only mounts. This will sync all the metadata and clean
* the log so that the recovery we just performed does not have to be
* replayed again on the next mount.
*/

and it calls xfs_quiesce_attr(), but by the time we get to
xfs_log_unmount_write(), it returns early for a RDONLY mount:

* Don't write out unmount record on read-only mounts.

Because of this, sequential ro mounts of a filesystem with
a dirty log will replay the log each time, which seems odd.

Fix this by writing an unmount record even for RO mounts, as long
as norecovery wasn't specified (don't write a clean log record
if a dirty log may still be there!) and the log device is
writable.

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_log.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -812,11 +812,14 @@ xfs_log_unmount_write(xfs_mount_t *mp)
int error;

/*
- * Don't write out unmount record on read-only mounts.
+ * Don't write out unmount record on norecovery mounts or ro devices.
* Or, if we are doing a forced umount (typically because of IO errors).
*/
- if (mp->m_flags & XFS_MOUNT_RDONLY)
+ if (mp->m_flags & XFS_MOUNT_NORECOVERY ||
+ xfs_readonly_buftarg(log->l_mp->m_logdev_targp)) {
+ ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
return 0;
+ }

error = _xfs_log_force(mp, XFS_LOG_SYNC, NULL);
ASSERT(error || !(XLOG_FORCED_SHUTDOWN(log)));


2017-09-18 09:11:28

by Greg Kroah-Hartman

Subject: [PATCH 4.13 27/52] xfs: toggle readonly state around xfs_log_mount_finish

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <[email protected]>

commit 6f4a1eefdd0ad4561543270a7fceadabcca075dd upstream.

When we do log recovery on a readonly mount, unlinked inode
processing does not happen due to the readonly checks in
xfs_inactive(), which are trying to prevent any I/O on a
readonly mount.

This is misguided - we do I/O on readonly mounts all the time,
for consistency; for example, log recovery. So do the same
RDONLY flag twiddling around xfs_log_mount_finish() as we
do around xfs_log_mount(), for the same reason.

This all cries out for a big rework but for now this is a
simple fix to an obvious problem.

Signed-off-by: Eric Sandeen <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_log.c | 7 +++++++
1 file changed, 7 insertions(+)

--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -743,10 +743,14 @@ xfs_log_mount_finish(
struct xfs_mount *mp)
{
int error = 0;
+ bool readonly = (mp->m_flags & XFS_MOUNT_RDONLY);

if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
ASSERT(mp->m_flags & XFS_MOUNT_RDONLY);
return 0;
+ } else if (readonly) {
+ /* Allow unlinked processing to proceed */
+ mp->m_flags &= ~XFS_MOUNT_RDONLY;
}

/*
@@ -764,6 +768,9 @@ xfs_log_mount_finish(
xfs_log_work_queue(mp);
mp->m_super->s_flags &= ~MS_ACTIVE;

+ if (readonly)
+ mp->m_flags |= XFS_MOUNT_RDONLY;
+
return error;
}



2017-09-18 09:11:34

by Greg Kroah-Hartman

Subject: [PATCH 4.13 19/52] x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit e137a4d8f4dd2e277e355495b6b2cb241a8693c3 upstream.

Switching FS and GS is a mess, and the current code is still subtly
wrong: it assumes that "Loading a nonzero value into FS sets the
index and base", which is false on AMD CPUs if the value being
loaded is 1, 2, or 3.

(The current code came from commit 3e2b68d752c9 ("x86/asm,
sched/x86: Rewrite the FS and GS context switch code"), which made
it better but didn't fully fix it.)

Rewrite it to be much simpler and more obviously correct. This
should fix it fully on AMD CPUs and shouldn't adversely affect
performance.

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chang Seok <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/process_64.c | 227 +++++++++++++++++++++++--------------------
1 file changed, 122 insertions(+), 105 deletions(-)

--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -149,6 +149,123 @@ void release_thread(struct task_struct *
}
}

+enum which_selector {
+ FS,
+ GS
+};
+
+/*
+ * Saves the FS or GS base for an outgoing thread if FSGSBASE extensions are
+ * not available. The goal is to be reasonably fast on non-FSGSBASE systems.
+ * It's forcibly inlined because it'll generate better code and this function
+ * is hot.
+ */
+static __always_inline void save_base_legacy(struct task_struct *prev_p,
+ unsigned short selector,
+ enum which_selector which)
+{
+ if (likely(selector == 0)) {
+ /*
+ * On Intel (without X86_BUG_NULL_SEG), the segment base could
+ * be the pre-existing saved base or it could be zero. On AMD
+ * (with X86_BUG_NULL_SEG), the segment base could be almost
+ * anything.
+ *
+ * This branch is very hot (it's hit twice on almost every
+ * context switch between 64-bit programs), and avoiding
+ * the RDMSR helps a lot, so we just assume that whatever
+ * value is already saved is correct. This matches historical
+ * Linux behavior, so it won't break existing applications.
+ *
+ * To avoid leaking state, on non-X86_BUG_NULL_SEG CPUs, if we
+ * report that the base is zero, it needs to actually be zero:
+ * see the corresponding logic in load_seg_legacy.
+ */
+ } else {
+ /*
+ * If the selector is 1, 2, or 3, then the base is zero on
+ * !X86_BUG_NULL_SEG CPUs and could be anything on
+ * X86_BUG_NULL_SEG CPUs. In the latter case, Linux
+ * has never attempted to preserve the base across context
+ * switches.
+ *
+ * If selector > 3, then it refers to a real segment, and
+ * saving the base isn't necessary.
+ */
+ if (which == FS)
+ prev_p->thread.fsbase = 0;
+ else
+ prev_p->thread.gsbase = 0;
+ }
+}
+
+static __always_inline void save_fsgs(struct task_struct *task)
+{
+ savesegment(fs, task->thread.fsindex);
+ savesegment(gs, task->thread.gsindex);
+ save_base_legacy(task, task->thread.fsindex, FS);
+ save_base_legacy(task, task->thread.gsindex, GS);
+}
+
+static __always_inline void loadseg(enum which_selector which,
+ unsigned short sel)
+{
+ if (which == FS)
+ loadsegment(fs, sel);
+ else
+ load_gs_index(sel);
+}
+
+static __always_inline void load_seg_legacy(unsigned short prev_index,
+ unsigned long prev_base,
+ unsigned short next_index,
+ unsigned long next_base,
+ enum which_selector which)
+{
+ if (likely(next_index <= 3)) {
+ /*
+ * The next task is using 64-bit TLS, is not using this
+ * segment at all, or is having fun with arcane CPU features.
+ */
+ if (next_base == 0) {
+ /*
+ * Nasty case: on AMD CPUs, we need to forcibly zero
+ * the base.
+ */
+ if (static_cpu_has_bug(X86_BUG_NULL_SEG)) {
+ loadseg(which, __USER_DS);
+ loadseg(which, next_index);
+ } else {
+ /*
+ * We could try to exhaustively detect cases
+ * under which we can skip the segment load,
+ * but there's really only one case that matters
+ * for performance: if both the previous and
+ * next states are fully zeroed, we can skip
+ * the load.
+ *
+ * (This assumes that prev_base == 0 has no
+ * false positives. This is the case on
+ * Intel-style CPUs.)
+ */
+ if (likely(prev_index | next_index | prev_base))
+ loadseg(which, next_index);
+ }
+ } else {
+ if (prev_index != next_index)
+ loadseg(which, next_index);
+ wrmsrl(which == FS ? MSR_FS_BASE : MSR_KERNEL_GS_BASE,
+ next_base);
+ }
+ } else {
+ /*
+ * The next task is using a real segment. Loading the selector
+ * is sufficient.
+ */
+ loadseg(which, next_index);
+ }
+}
+
int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
{
@@ -286,7 +403,6 @@ __switch_to(struct task_struct *prev_p,
struct fpu *next_fpu = &next->fpu;
int cpu = smp_processor_id();
struct tss_struct *tss = &per_cpu(cpu_tss, cpu);
- unsigned prev_fsindex, prev_gsindex;

switch_fpu_prepare(prev_fpu, cpu);

@@ -295,8 +411,7 @@ __switch_to(struct task_struct *prev_p,
*
* (e.g. xen_load_tls())
*/
- savesegment(fs, prev_fsindex);
- savesegment(gs, prev_gsindex);
+ save_fsgs(prev_p);

/*
* Load TLS before restoring any segments so that segment loads
@@ -335,108 +450,10 @@ __switch_to(struct task_struct *prev_p,
if (unlikely(next->ds | prev->ds))
loadsegment(ds, next->ds);

- /*
- * Switch FS and GS.
- *
- * These are even more complicated than DS and ES: they have
- * 64-bit bases are that controlled by arch_prctl. The bases
- * don't necessarily match the selectors, as user code can do
- * any number of things to cause them to be inconsistent.
- *
- * We don't promise to preserve the bases if the selectors are
- * nonzero. We also don't promise to preserve the base if the
- * selector is zero and the base doesn't match whatever was
- * most recently passed to ARCH_SET_FS/GS. (If/when the
- * FSGSBASE instructions are enabled, we'll need to offer
- * stronger guarantees.)
- *
- * As an invariant,
- * (fsbase != 0 && fsindex != 0) || (gsbase != 0 && gsindex != 0) is
- * impossible.
- */
- if (next->fsindex) {
- /* Loading a nonzero value into FS sets the index and base. */
- loadsegment(fs, next->fsindex);
- } else {
- if (next->fsbase) {
- /* Next index is zero but next base is nonzero. */
- if (prev_fsindex)
- loadsegment(fs, 0);
- wrmsrl(MSR_FS_BASE, next->fsbase);
- } else {
- /* Next base and index are both zero. */
- if (static_cpu_has_bug(X86_BUG_NULL_SEG)) {
- /*
- * We don't know the previous base and can't
- * find out without RDMSR. Forcibly clear it.
- */
- loadsegment(fs, __USER_DS);
- loadsegment(fs, 0);
- } else {
- /*
- * If the previous index is zero and ARCH_SET_FS
- * didn't change the base, then the base is
- * also zero and we don't need to do anything.
- */
- if (prev->fsbase || prev_fsindex)
- loadsegment(fs, 0);
- }
- }
- }
- /*
- * Save the old state and preserve the invariant.
- * NB: if prev_fsindex == 0, then we can't reliably learn the base
- * without RDMSR because Intel user code can zero it without telling
- * us and AMD user code can program any 32-bit value without telling
- * us.
- */
- if (prev_fsindex)
- prev->fsbase = 0;
- prev->fsindex = prev_fsindex;
-
- if (next->gsindex) {
- /* Loading a nonzero value into GS sets the index and base. */
- load_gs_index(next->gsindex);
- } else {
- if (next->gsbase) {
- /* Next index is zero but next base is nonzero. */
- if (prev_gsindex)
- load_gs_index(0);
- wrmsrl(MSR_KERNEL_GS_BASE, next->gsbase);
- } else {
- /* Next base and index are both zero. */
- if (static_cpu_has_bug(X86_BUG_NULL_SEG)) {
- /*
- * We don't know the previous base and can't
- * find out without RDMSR. Forcibly clear it.
- *
- * This contains a pointless SWAPGS pair.
- * Fixing it would involve an explicit check
- * for Xen or a new pvop.
- */
- load_gs_index(__USER_DS);
- load_gs_index(0);
- } else {
- /*
- * If the previous index is zero and ARCH_SET_GS
- * didn't change the base, then the base is
- * also zero and we don't need to do anything.
- */
- if (prev->gsbase || prev_gsindex)
- load_gs_index(0);
- }
- }
- }
- /*
- * Save the old state and preserve the invariant.
- * NB: if prev_gsindex == 0, then we can't reliably learn the base
- * without RDMSR because Intel user code can zero it without telling
- * us and AMD user code can program any 32-bit value without telling
- * us.
- */
- if (prev_gsindex)
- prev->gsbase = 0;
- prev->gsindex = prev_gsindex;
+ load_seg_legacy(prev->fsindex, prev->fsbase,
+ next->fsindex, next->fsbase, FS);
+ load_seg_legacy(prev->gsindex, prev->gsbase,
+ next->gsindex, next->gsbase, GS);

switch_fpu_finish(next_fpu, cpu);



2017-09-18 09:11:38

by Greg Kroah-Hartman

Subject: [PATCH 4.13 35/52] xfs: evict all inodes involved with log redo item

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: "Darrick J. Wong" <[email protected]>

commit 799ea9e9c59949008770aab4e1da87f10e99dbe4 upstream.

When we introduced the bmap redo log items, we set MS_ACTIVE on the
mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
from being truncated prematurely during log recovery. This also had the
effect of putting linked inodes on the lru instead of evicting them.

Unfortunately, we neglected to find all those unreferenced lru inodes
and evict them after finishing log recovery, which means that we leak
them if anything goes wrong in the rest of xfs_mountfs, because the lru
is only cleaned out on unmount.

Therefore, evict unreferenced inodes in the lru list immediately
after clearing MS_ACTIVE.

Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <[email protected]>
Cc: [email protected]
Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/inode.c | 1 +
fs/internal.h | 1 -
fs/xfs/xfs_log.c | 12 ++++++++++++
include/linux/fs.h | 1 +
4 files changed, 14 insertions(+), 1 deletion(-)

--- a/fs/inode.c
+++ b/fs/inode.c
@@ -637,6 +637,7 @@ again:

dispose_list(&dispose);
}
+EXPORT_SYMBOL_GPL(evict_inodes);

/**
* invalidate_inodes - attempt to free all inodes on a superblock
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -132,7 +132,6 @@ static inline bool atime_needs_update_rc
extern void inode_io_list_del(struct inode *inode);

extern long get_nr_dirty_inodes(void);
-extern void evict_inodes(struct super_block *);
extern int invalidate_inodes(struct super_block *, bool);

/*
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -761,12 +761,24 @@ xfs_log_mount_finish(
* inodes. Turn it off immediately after recovery finishes
* so that we don't leak the quota inodes if subsequent mount
* activities fail.
+ *
+ * We let all inodes involved in redo item processing end up on
+ * the LRU instead of being evicted immediately so that if we do
+ * something to an unlinked inode, the irele won't cause
+ * premature truncation and freeing of the inode, which results
+ * in log recovery failure. We have to evict the unreferenced
+ * lru inodes after clearing MS_ACTIVE because we don't
+ * otherwise clean up the lru if there's a subsequent failure in
+ * xfs_mountfs, which leads to us leaking the inodes if nothing
+ * else (e.g. quotacheck) references the inodes before the
+ * mount failure occurs.
*/
mp->m_super->s_flags |= MS_ACTIVE;
error = xlog_recover_finish(mp->m_log);
if (!error)
xfs_log_work_queue(mp);
mp->m_super->s_flags &= ~MS_ACTIVE;
+ evict_inodes(mp->m_super);

if (readonly)
mp->m_flags |= XFS_MOUNT_RDONLY;
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2831,6 +2831,7 @@ static inline void lockdep_annotate_inod
#endif
extern void unlock_new_inode(struct inode *);
extern unsigned int get_next_ino(void);
+extern void evict_inodes(struct super_block *sb);

extern void __iget(struct inode * inode);
extern void iget_failed(struct inode *);


2017-09-18 09:11:44

by Greg Kroah-Hartman

Subject: [PATCH 4.13 37/52] xfs: open-code xfs_buf_item_dirty()

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit a4f6cf6b2b6b60ec2a05a33a32e65caa4149aa2b upstream.

It checks a single flag and has one caller. It probably isn't worth
its own function.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_buf_item.c | 11 -----------
fs/xfs/xfs_buf_item.h | 1 -
fs/xfs/xfs_trans_buf.c | 2 +-
3 files changed, 1 insertion(+), 13 deletions(-)

--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -945,17 +945,6 @@ xfs_buf_item_log(
}


-/*
- * Return 1 if the buffer has been logged or ordered in a transaction (at any
- * point, not just the current transaction) and 0 if not.
- */
-uint
-xfs_buf_item_dirty(
- xfs_buf_log_item_t *bip)
-{
- return (bip->bli_flags & XFS_BLI_DIRTY);
-}
-
STATIC void
xfs_buf_item_free(
xfs_buf_log_item_t *bip)
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -64,7 +64,6 @@ typedef struct xfs_buf_log_item {
int xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
void xfs_buf_item_relse(struct xfs_buf *);
void xfs_buf_item_log(xfs_buf_log_item_t *, uint, uint);
-uint xfs_buf_item_dirty(xfs_buf_log_item_t *);
void xfs_buf_attach_iodone(struct xfs_buf *,
void(*)(struct xfs_buf *, xfs_log_item_t *),
xfs_log_item_t *);
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -435,7 +435,7 @@ xfs_trans_brelse(xfs_trans_t *tp,
if (XFS_FORCED_SHUTDOWN(tp->t_mountp) && freed) {
xfs_trans_ail_remove(&bip->bli_item, SHUTDOWN_LOG_IO_ERROR);
xfs_buf_item_relse(bp);
- } else if (!xfs_buf_item_dirty(bip)) {
+ } else if (!(bip->bli_flags & XFS_BLI_DIRTY)) {
/***
ASSERT(bp->b_pincount == 0);
***/


2017-09-18 09:11:51

by Greg Kroah-Hartman

Subject: [PATCH 4.13 44/52] xfs: disallow marking previously dirty buffers as ordered

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit a5814bceea48ee1c57c4db2bd54b0c0246daf54a upstream.

Ordered buffers are used in situations where the buffer is not
physically logged but must pass through the transaction/logging
pipeline for a particular transaction. As a result, ordered buffers
are not unpinned and written back until the transaction commits to
the log. Ordered buffers have a strict requirement that the target
buffer must not be currently dirty and resident in the log pipeline
at the time it is marked ordered. If a dirty+ordered buffer is
committed, the buffer is reinserted to the AIL but not physically
relogged at the LSN of the associated checkpoint. The buffer log
item is assigned the LSN of the latest checkpoint and the AIL
effectively releases the previously logged buffer content from the
active log before the buffer has been written back. If the tail
pushes forward and a filesystem crash occurs while in this state, an
inconsistent filesystem could result.

It is currently the caller's responsibility to ensure an ordered
buffer is not already dirty from a previous modification. This is
unclear and error prone outside of situations where it is guaranteed
that a buffer has not been previously modified (such as new metadata
allocations).

To facilitate general purpose use of ordered buffers, update
xfs_trans_ordered_buf() to conditionally order the buffer based on
state of the log item and return the status of the result. If the
bli is dirty, do not order the buffer and return false. The caller
must either physically log the buffer (having acquired the
appropriate log reservation) or push it from the AIL to clean it
before it can be marked ordered in the current transaction.
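
A sketch of the caller-side pattern this enables (illustrative only; the
real callers are updated separately, see the note below):

        if (!xfs_trans_ordered_buf(tp, bp)) {
                /* bli was already dirty: fall back to physical logging so
                 * previously dirtied ranges are relogged in this transaction */
                xfs_trans_log_buf(tp, bp, 0, BBTOB(bp->b_length) - 1);
        }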

Note that ordered buffers are currently only used in two situations:
1.) inode chunk allocation where previously logged buffers are not
possible and 2.) extent swap which will be updated to handle ordered
buffer failures in a separate patch.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_trans.h | 2 +-
fs/xfs/xfs_trans_buf.c | 7 +++++--
2 files changed, 6 insertions(+), 3 deletions(-)

--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -212,7 +212,7 @@ void xfs_trans_bhold_release(xfs_trans_
void xfs_trans_binval(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_inode_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *);
-void xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
+bool xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint);
void xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -724,7 +724,7 @@ xfs_trans_inode_alloc_buf(
* transactions rather than the physical changes we make to the buffer without
* changing writeback ordering constraints of metadata buffers.
*/
-void
+bool
xfs_trans_ordered_buf(
struct xfs_trans *tp,
struct xfs_buf *bp)
@@ -734,7 +734,9 @@ xfs_trans_ordered_buf(
ASSERT(bp->b_transp == tp);
ASSERT(bip != NULL);
ASSERT(atomic_read(&bip->bli_refcount) > 0);
- ASSERT(!xfs_buf_item_dirty_format(bip));
+
+ if (xfs_buf_item_dirty_format(bip))
+ return false;

bip->bli_flags |= XFS_BLI_ORDERED;
trace_xfs_buf_item_ordered(bip);
@@ -744,6 +746,7 @@ xfs_trans_ordered_buf(
* to be marked dirty and that it has been logged.
*/
xfs_trans_dirty_buf(tp, bp);
+ return true;
}

/*


2017-09-18 09:11:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 45/52] xfs: relog dirty buffers during swapext bmbt owner change

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 2dd3d709fc4338681a3aa61658122fa8faa5a437 upstream.

The owner change bmbt scan that occurs during extent swap operations
does not handle ordered buffer failures. Buffers that cannot be
marked ordered must be physically logged so previously dirty ranges
of the buffer can be relogged in the transaction.

Since the bmbt scan may need to process and potentially log a large
number of blocks, we can't expect to complete this operation in a
single transaction. Update extent swap to use a permanent
transaction with enough log reservation to physically log a buffer.
Update the bmbt scan to physically log any buffers that cannot be
ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the
caller rolls the transaction and restarts the scan. Finally, update
the bmbt scan helper function to skip bmbt blocks that already match
the expected owner so they are not reprocessed after scan restarts.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
[darrick: fix the xfs_trans_roll call]
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/libxfs/xfs_btree.c | 26 ++++++++++++++------
fs/xfs/xfs_bmap_util.c | 57 +++++++++++++++++++++++++++++++++++++---------
2 files changed, 65 insertions(+), 18 deletions(-)

--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4452,10 +4452,15 @@ xfs_btree_block_change_owner(

/* modify the owner */
block = xfs_btree_get_block(cur, level, &bp);
- if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
+ if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
+ if (block->bb_u.l.bb_owner == cpu_to_be64(bbcoi->new_owner))
+ return 0;
block->bb_u.l.bb_owner = cpu_to_be64(bbcoi->new_owner);
- else
+ } else {
+ if (block->bb_u.s.bb_owner == cpu_to_be32(bbcoi->new_owner))
+ return 0;
block->bb_u.s.bb_owner = cpu_to_be32(bbcoi->new_owner);
+ }

/*
* If the block is a root block hosted in an inode, we might not have a
@@ -4464,14 +4469,19 @@ xfs_btree_block_change_owner(
* block is formatted into the on-disk inode fork. We still change it,
* though, so everything is consistent in memory.
*/
- if (bp) {
- if (cur->bc_tp)
- xfs_trans_ordered_buf(cur->bc_tp, bp);
- else
- xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
- } else {
+ if (!bp) {
ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
ASSERT(level == cur->bc_nlevels - 1);
+ return 0;
+ }
+
+ if (cur->bc_tp) {
+ if (!xfs_trans_ordered_buf(cur->bc_tp, bp)) {
+ xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
+ return -EAGAIN;
+ }
+ } else {
+ xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
}

return 0;
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1929,6 +1929,48 @@ xfs_swap_extent_forks(
return 0;
}

+/*
+ * Fix up the owners of the bmbt blocks to refer to the current inode. The
+ * change owner scan attempts to order all modified buffers in the current
+ * transaction. In the event of ordered buffer failure, the offending buffer is
+ * physically logged as a fallback and the scan returns -EAGAIN. We must roll
+ * the transaction in this case to replenish the fallback log reservation and
+ * restart the scan. This process repeats until the scan completes.
+ */
+static int
+xfs_swap_change_owner(
+ struct xfs_trans **tpp,
+ struct xfs_inode *ip,
+ struct xfs_inode *tmpip)
+{
+ int error;
+ struct xfs_trans *tp = *tpp;
+
+ do {
+ error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK, ip->i_ino,
+ NULL);
+ /* success or fatal error */
+ if (error != -EAGAIN)
+ break;
+
+ error = xfs_trans_roll(tpp, NULL);
+ if (error)
+ break;
+ tp = *tpp;
+
+ /*
+ * Redirty both inodes so they can relog and keep the log tail
+ * moving forward.
+ */
+ xfs_trans_ijoin(tp, ip, 0);
+ xfs_trans_ijoin(tp, tmpip, 0);
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ xfs_trans_log_inode(tp, tmpip, XFS_ILOG_CORE);
+ } while (true);
+
+ return error;
+}
+
int
xfs_swap_extents(
struct xfs_inode *ip, /* target inode */
@@ -1943,7 +1985,7 @@ xfs_swap_extents(
int lock_flags;
struct xfs_ifork *cowfp;
uint64_t f;
- int resblks;
+ int resblks = 0;

/*
* Lock the inodes against other IO, page faults and truncate to
@@ -1991,11 +2033,8 @@ xfs_swap_extents(
XFS_SWAP_RMAP_SPACE_RES(mp,
XFS_IFORK_NEXTENTS(tip, XFS_DATA_FORK),
XFS_DATA_FORK);
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks,
- 0, 0, &tp);
- } else
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0,
- 0, 0, &tp);
+ }
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_write, resblks, 0, 0, &tp);
if (error)
goto out_unlock;

@@ -2087,14 +2126,12 @@ xfs_swap_extents(
* inode number of the current inode.
*/
if (src_log_flags & XFS_ILOG_DOWNER) {
- error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
- ip->i_ino, NULL);
+ error = xfs_swap_change_owner(&tp, ip, tip);
if (error)
goto out_trans_cancel;
}
if (target_log_flags & XFS_ILOG_DOWNER) {
- error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
- tip->i_ino, NULL);
+ error = xfs_swap_change_owner(&tp, tip, ip);
if (error)
goto out_trans_cancel;
}


2017-09-18 09:12:03

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 40/52] xfs: refactor buffer logging into buffer dirtying helper

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 9684010d38eccda733b61106765e9357cf436f65 upstream.

xfs_trans_log_buf() is responsible for logging the dirty segments of
a buffer along with setting all of the necessary state on the
transaction, buffer, bli, etc., to ensure that the associated items
are marked as dirty and prepared for I/O. We have a couple of use
cases that need to dirty a buffer in a transaction without actually
logging dirty ranges of the buffer. One existing use case is
ordered buffers, which are currently logged with arbitrary ranges to
accomplish this even though the content of ordered buffers is never
written to the log. Another pending use case is to relog an already
dirty buffer across rolled transactions within the deferred
operations infrastructure. This is required to prevent a held
(XFS_BLI_HOLD) buffer from pinning the tail of the log.

Refactor xfs_trans_log_buf() into a new function that contains all
of the logic responsible for dirtying the transaction, lidp, buffer and
bli. This new function can be used in the future for the use cases
outlined above. This patch does not introduce functional changes.
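
As a hypothetical sketch of the pending relog use case (the call site
and the bjoin step here are assumptions, not part of this patch), the
new helper would let a held buffer stay dirty across a roll without
logging any range:

        error = xfs_trans_roll(&tp, NULL);
        if (error)
                return error;
        xfs_trans_bjoin(tp, bp);        /* rejoin the held buffer */
        xfs_trans_dirty_buf(tp, bp);    /* redirty it; no dirty range logged */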

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_trans.h | 4 +++-
fs/xfs/xfs_trans_buf.c | 46 ++++++++++++++++++++++++++++++----------------
2 files changed, 33 insertions(+), 17 deletions(-)

--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -217,7 +217,9 @@ void xfs_trans_dquot_buf(xfs_trans_t *,
void xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *);
void xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int);
void xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint);
-void xfs_trans_log_buf(xfs_trans_t *, struct xfs_buf *, uint, uint);
+void xfs_trans_log_buf(struct xfs_trans *, struct xfs_buf *, uint,
+ uint);
+void xfs_trans_dirty_buf(struct xfs_trans *, struct xfs_buf *);
void xfs_trans_log_inode(xfs_trans_t *, struct xfs_inode *, uint);

void xfs_extent_free_init_defer_op(void);
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -493,25 +493,17 @@ xfs_trans_bhold_release(xfs_trans_t *tp,
}

/*
- * This is called to mark bytes first through last inclusive of the given
- * buffer as needing to be logged when the transaction is committed.
- * The buffer must already be associated with the given transaction.
- *
- * First and last are numbers relative to the beginning of this buffer,
- * so the first byte in the buffer is numbered 0 regardless of the
- * value of b_blkno.
+ * Mark a buffer dirty in the transaction.
*/
void
-xfs_trans_log_buf(xfs_trans_t *tp,
- xfs_buf_t *bp,
- uint first,
- uint last)
+xfs_trans_dirty_buf(
+ struct xfs_trans *tp,
+ struct xfs_buf *bp)
{
- xfs_buf_log_item_t *bip = bp->b_fspriv;
+ struct xfs_buf_log_item *bip = bp->b_fspriv;

ASSERT(bp->b_transp == tp);
ASSERT(bip != NULL);
- ASSERT(first <= last && last < BBTOB(bp->b_length));
ASSERT(bp->b_iodone == NULL ||
bp->b_iodone == xfs_buf_iodone_callbacks);

@@ -531,8 +523,6 @@ xfs_trans_log_buf(xfs_trans_t *tp,
bp->b_iodone = xfs_buf_iodone_callbacks;
bip->bli_item.li_cb = xfs_buf_iodone;

- trace_xfs_trans_log_buf(bip);
-
/*
* If we invalidated the buffer within this transaction, then
* cancel the invalidation now that we're dirtying the buffer
@@ -545,15 +535,39 @@ xfs_trans_log_buf(xfs_trans_t *tp,
bp->b_flags &= ~XBF_STALE;
bip->__bli_format.blf_flags &= ~XFS_BLF_CANCEL;
}
+ bip->bli_flags |= XFS_BLI_DIRTY | XFS_BLI_LOGGED;

tp->t_flags |= XFS_TRANS_DIRTY;
bip->bli_item.li_desc->lid_flags |= XFS_LID_DIRTY;
+}
+
+/*
+ * This is called to mark bytes first through last inclusive of the given
+ * buffer as needing to be logged when the transaction is committed.
+ * The buffer must already be associated with the given transaction.
+ *
+ * First and last are numbers relative to the beginning of this buffer,
+ * so the first byte in the buffer is numbered 0 regardless of the
+ * value of b_blkno.
+ */
+void
+xfs_trans_log_buf(
+ struct xfs_trans *tp,
+ struct xfs_buf *bp,
+ uint first,
+ uint last)
+{
+ struct xfs_buf_log_item *bip = bp->b_fspriv;
+
+ ASSERT(first <= last && last < BBTOB(bp->b_length));
+
+ xfs_trans_dirty_buf(tp, bp);

/*
* If we have an ordered buffer we are not logging any dirty range but
* it still needs to be marked dirty and that it has been logged.
*/
- bip->bli_flags |= XFS_BLI_DIRTY | XFS_BLI_LOGGED;
+ trace_xfs_trans_log_buf(bip);
if (!(bip->bli_flags & XFS_BLI_ORDERED))
xfs_buf_item_log(bip, first, last);
}


2017-09-18 09:12:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 47/52] xfs: fix incorrect log_flushed on fsync

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Amir Goldstein <[email protected]>

commit 47c7d0b19502583120c3f396c7559e7a77288a68 upstream.

When calling into _xfs_log_force{,_lsn}() with a pointer
to a log_flushed variable, log_flushed will be set to 1 if:
1. xlog_sync() is called to flush the active log buffer
AND/OR
2. xlog_wait() is called to wait on a syncing log buffer

xfs_file_fsync() checks the value of log_flushed after
_xfs_log_force_lsn() call to optimize away an explicit
PREFLUSH request to the data block device after writing
out all the file's pages to disk.

This optimization is incorrect in the following sequence of events:

Task A                    Task B
-------------------------------------------------------
xfs_file_fsync()
  _xfs_log_force_lsn()
    xlog_sync()
       [submit PREFLUSH]
                          xfs_file_fsync()
                            file_write_and_wait_range()
                              [submit WRITE X]
                              [endio WRITE X]
                          _xfs_log_force_lsn()
                            xlog_wait()
       [endio PREFLUSH]

The write X is not guaranteed to be on persistent storage
when the PREFLUSH request is completed, because write X was submitted
after the PREFLUSH request, but xfs_file_fsync() of task A will
be notified of log_flushed=1 and will skip the explicit flush.

If the system crashes after fsync of task A, write X may not be
present on disk after reboot.
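
The caller-side optimization defeated by this race looks roughly like
the following (a simplified sketch of the xfs_file_fsync() logic; the
exact guard conditions are elided here):

        error = _xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);

        /*
         * If the log force claims it flushed, the explicit PREFLUSH to
         * the data device is skipped -- which is only safe if that flush
         * really covered all previously submitted data writes.
         */
        if (!log_flushed && mp->m_logdev_targp == mp->m_ddev_targp)
                xfs_blkdev_issue_flush(mp->m_ddev_targp);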

This bug was discovered and demonstrated using Josef Bacik's
dm-log-writes target, which can be used to record block io operations
and then replay a subset of these operations onto the target device.
The test goes something like this:
- Use fsx to execute ops of a file and record ops on log device
- Every now and then fsync the file, store md5 of file and mark
the location in the log
- Then replay log onto device for each mark, mount fs and compare
md5 of file to stored value

Cc: Christoph Hellwig <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_log.c | 7 -------
1 file changed, 7 deletions(-)

--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3375,8 +3375,6 @@ maybe_sleep:
*/
if (iclog->ic_state & XLOG_STATE_IOERROR)
return -EIO;
- if (log_flushed)
- *log_flushed = 1;
} else {

no_sleep:
@@ -3480,8 +3478,6 @@ try_again:

xlog_wait(&iclog->ic_prev->ic_write_wait,
&log->l_icloglock);
- if (log_flushed)
- *log_flushed = 1;
already_slept = 1;
goto try_again;
}
@@ -3515,9 +3511,6 @@ try_again:
*/
if (iclog->ic_state & XLOG_STATE_IOERROR)
return -EIO;
-
- if (log_flushed)
- *log_flushed = 1;
} else { /* just return */
spin_unlock(&log->l_icloglock);
}


2017-09-18 09:12:15

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 49/52] xfs: open code end_buffer_async_write in xfs_finish_page_writeback

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <[email protected]>

commit 8353a814f2518dcfa79a5bb77afd0e7dfa391bb1 upstream.

Our loop in xfs_finish_page_writeback, which iterates over all buffer
heads in a page and then calls end_buffer_async_write, which also
iterates over all buffers in the page to check if any I/O is in flight,
is not only inefficient, but also potentially dangerous, as
end_buffer_async_write can cause the page and all buffers to be freed.

Replace it with a single loop that does the work of end_buffer_async_write
on a per-page basis.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_aops.c | 71 +++++++++++++++++++++++++++++++++++-------------------
1 file changed, 47 insertions(+), 24 deletions(-)

--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -85,11 +85,11 @@ xfs_find_bdev_for_inode(
* associated buffer_heads, paying attention to the start and end offsets that
* we need to process on the page.
*
- * Landmine Warning: bh->b_end_io() will call end_page_writeback() on the last
- * buffer in the IO. Once it does this, it is unsafe to access the bufferhead or
- * the page at all, as we may be racing with memory reclaim and it can free both
- * the bufferhead chain and the page as it will see the page as clean and
- * unused.
+ * Note that we open code the action in end_buffer_async_write here so that we
+ * only have to iterate over the buffers attached to the page once. This is not
+ * only more efficient, but also ensures that we only call end_page_writeback
+ * at the end of the iteration, and thus avoids the pitfall of having the page
+ * and buffers potentially freed after every call to end_buffer_async_write.
*/
static void
xfs_finish_page_writeback(
@@ -97,29 +97,44 @@ xfs_finish_page_writeback(
struct bio_vec *bvec,
int error)
{
- unsigned int end = bvec->bv_offset + bvec->bv_len - 1;
- struct buffer_head *head, *bh, *next;
+ struct buffer_head *head = page_buffers(bvec->bv_page), *bh = head;
+ bool busy = false;
unsigned int off = 0;
- unsigned int bsize;
+ unsigned long flags;

ASSERT(bvec->bv_offset < PAGE_SIZE);
ASSERT((bvec->bv_offset & (i_blocksize(inode) - 1)) == 0);
- ASSERT(end < PAGE_SIZE);
+ ASSERT(bvec->bv_offset + bvec->bv_len <= PAGE_SIZE);
ASSERT((bvec->bv_len & (i_blocksize(inode) - 1)) == 0);

- bh = head = page_buffers(bvec->bv_page);
-
- bsize = bh->b_size;
+ local_irq_save(flags);
+ bit_spin_lock(BH_Uptodate_Lock, &head->b_state);
do {
- if (off > end)
- break;
- next = bh->b_this_page;
- if (off < bvec->bv_offset)
- goto next_bh;
- bh->b_end_io(bh, !error);
-next_bh:
- off += bsize;
- } while ((bh = next) != head);
+ if (off >= bvec->bv_offset &&
+ off < bvec->bv_offset + bvec->bv_len) {
+ ASSERT(buffer_async_write(bh));
+ ASSERT(bh->b_end_io == NULL);
+
+ if (error) {
+ mark_buffer_write_io_error(bh);
+ clear_buffer_uptodate(bh);
+ SetPageError(bvec->bv_page);
+ } else {
+ set_buffer_uptodate(bh);
+ }
+ clear_buffer_async_write(bh);
+ unlock_buffer(bh);
+ } else if (buffer_async_write(bh)) {
+ ASSERT(buffer_locked(bh));
+ busy = true;
+ }
+ off += bh->b_size;
+ } while ((bh = bh->b_this_page) != head);
+ bit_spin_unlock(BH_Uptodate_Lock, &head->b_state);
+ local_irq_restore(flags);
+
+ if (!busy)
+ end_page_writeback(bvec->bv_page);
}

/*
@@ -133,8 +148,10 @@ xfs_destroy_ioend(
int error)
{
struct inode *inode = ioend->io_inode;
- struct bio *last = ioend->io_bio;
- struct bio *bio, *next;
+ struct bio *bio = &ioend->io_inline_bio;
+ struct bio *last = ioend->io_bio, *next;
+ u64 start = bio->bi_iter.bi_sector;
+ bool quiet = bio_flagged(bio, BIO_QUIET);

for (bio = &ioend->io_inline_bio; bio; bio = next) {
struct bio_vec *bvec;
@@ -155,6 +172,11 @@ xfs_destroy_ioend(

bio_put(bio);
}
+
+ if (unlikely(error && !quiet)) {
+ xfs_err_ratelimited(XFS_I(inode)->i_mount,
+ "writeback error on sector %llu", start);
+ }
}

/*
@@ -423,7 +445,8 @@ xfs_start_buffer_writeback(
ASSERT(!buffer_delay(bh));
ASSERT(!buffer_unwritten(bh));

- mark_buffer_async_write(bh);
+ bh->b_end_io = NULL;
+ set_buffer_async_write(bh);
set_buffer_uptodate(bh);
clear_buffer_dirty(bh);
}


2017-09-18 09:12:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 52/52] md/raid5: release/flush io in raid5_do_work()

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Song Liu <[email protected]>

commit 9c72a18e46ebe0f09484cce8ebf847abdab58498 upstream.

In raid5, there are scenarios where some IOs are deferred to a later
time, and some IOs need a flush to complete. To make sure we make
progress with these IOs, we need to call the following functions:

flush_deferred_bios(conf);
r5l_flush_stripe_to_raid(conf->log);

Both of these functions are called in raid5d(), but are missing from
raid5_do_work(). As a result, these functions are not called
when multi-threading (group_thread_cnt > 0) is enabled. This patch
adds calls to these functions in raid5_do_work().

Note for stable branches:

r5l_flush_stripe_to_raid(conf->log) is needed for 4.4+
flush_deferred_bios(conf) is only needed for 4.11+

Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/raid5.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6235,6 +6235,10 @@ static void raid5_do_work(struct work_st

spin_unlock_irq(&conf->device_lock);

+ flush_deferred_bios(conf);
+
+ r5l_flush_stripe_to_raid(conf->log);
+
async_tx_issue_pending_all();
blk_finish_plug(&plug);



2017-09-18 09:19:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 20/52] x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Tony Luck <[email protected]>

commit ce0fa3e56ad20f04d8252353dcd24e924abdafca upstream.

Speculative processor accesses may reference any memory that has a
valid page table entry. While a speculative access won't generate
a machine check, it will log the error in a machine check bank. That
could cause escalation of a subsequent error since the overflow bit
will then be set in the machine check bank status register.

Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual
address of the page we want to map out, otherwise we may trigger the
very problem we are trying to avoid. We use a non-canonical address
that passes through the usual Linux table walking code to get to the
same "pte".

Thanks to Dave Hansen for reviewing several iterations of this.

Also see:

http://marc.info/?l=linux-mm&m=149860136413338&w=2
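
The decoy arithmetic can be sanity-checked in isolation. The userspace
sketch below (PAGE_OFFSET is an assumed example value here, since the
real base is boot/KASLR dependent) shows that the decoy differs from
the canonical 1:1 address only in bit 63:

        #include <stdio.h>
        #include <stdint.h>

        #define BIT(n)          (1ULL << (n))
        #define PAGE_SHIFT      12
        #define PAGE_OFFSET     0xffff888000000000ULL   /* assumed example */

        int main(void)
        {
                uint64_t pfn = 0x12345;
                uint64_t kaddr = (pfn << PAGE_SHIFT) + PAGE_OFFSET;
                uint64_t decoy = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));

                /* xor prints 0x8000000000000000: only bit 63 differs */
                printf("kaddr=%#llx decoy=%#llx xor=%#llx\n",
                       (unsigned long long)kaddr,
                       (unsigned long long)decoy,
                       (unsigned long long)(kaddr ^ decoy));
                return 0;
        }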

Signed-off-by: Tony Luck <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Elliott, Robert (Persistent Memory) <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/include/asm/page_64.h | 4 +++
arch/x86/kernel/cpu/mcheck/mce.c | 43 +++++++++++++++++++++++++++++++++++++++
include/linux/mm_inline.h | 6 +++++
mm/memory-failure.c | 2 +
4 files changed, 55 insertions(+)

--- a/arch/x86/include/asm/page_64.h
+++ b/arch/x86/include/asm/page_64.h
@@ -51,6 +51,10 @@ static inline void clear_page(void *page

void copy_page(void *to, void *from);

+#ifdef CONFIG_X86_MCE
+#define arch_unmap_kpfn arch_unmap_kpfn
+#endif
+
#endif /* !__ASSEMBLY__ */

#ifdef CONFIG_X86_VSYSCALL_EMULATION
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -51,6 +51,7 @@
#include <asm/mce.h>
#include <asm/msr.h>
#include <asm/reboot.h>
+#include <asm/set_memory.h>

#include "mce-internal.h"

@@ -1051,6 +1052,48 @@ static int do_memory_failure(struct mce
return ret;
}

+#if defined(arch_unmap_kpfn) && defined(CONFIG_MEMORY_FAILURE)
+
+void arch_unmap_kpfn(unsigned long pfn)
+{
+ unsigned long decoy_addr;
+
+ /*
+ * Unmap this page from the kernel 1:1 mappings to make sure
+ * we don't log more errors because of speculative access to
+ * the page.
+ * We would like to just call:
+ * set_memory_np((unsigned long)pfn_to_kaddr(pfn), 1);
+ * but doing that would radically increase the odds of a
+ * speculative access to the poison page because we'd have
+ * the virtual address of the kernel 1:1 mapping sitting
+ * around in registers.
+ * Instead we get tricky. We create a non-canonical address
+ * that looks just like the one we want, but has bit 63 flipped.
+ * This relies on set_memory_np() not checking whether we passed
+ * a legal address.
+ */
+
+/*
+ * Build time check to see if we have a spare virtual bit. Don't want
+ * to leave this until run time because most developers don't have a
+ * system that can exercise this code path. This will only become a
+ * problem if/when we move beyond 5-level page tables.
+ *
+ * Hard code "9" here because cpp doesn't grok ilog2(PTRS_PER_PGD)
+ */
+#if PGDIR_SHIFT + 9 < 63
+ decoy_addr = (pfn << PAGE_SHIFT) + (PAGE_OFFSET ^ BIT(63));
+#else
+#error "no unused virtual bit available"
+#endif
+
+ if (set_memory_np(decoy_addr, 1))
+ pr_warn("Could not invalidate pfn=0x%lx from 1:1 map\n", pfn);
+
+}
+#endif
+
/*
* The actual machine check handler. This only handles real
* exceptions when something got corrupted coming in through int 18.
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -126,4 +126,10 @@ static __always_inline enum lru_list pag

#define lru_to_page(head) (list_entry((head)->prev, struct page, lru))

+#ifdef arch_unmap_kpfn
+extern void arch_unmap_kpfn(unsigned long pfn);
+#else
+static __always_inline void arch_unmap_kpfn(unsigned long pfn) { }
+#endif
+
#endif
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1146,6 +1146,8 @@ int memory_failure(unsigned long pfn, in
return 0;
}

+ arch_unmap_kpfn(pfn);
+
orig_head = hpage = compound_head(p);
num_poisoned_pages_inc();



2017-09-18 09:12:28

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 46/52] xfs: disable per-inode DAX flag

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <[email protected]>

commit 742d84290739ae908f1b61b7d17ea382c8c0073a upstream.

Currently flag switching can be used to easily crash the kernel. Disable
the per-inode DAX flag until that is sorted out.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_ioctl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1008,11 +1008,12 @@ xfs_diflags_to_linux(
inode->i_flags |= S_NOATIME;
else
inode->i_flags &= ~S_NOATIME;
+#if 0 /* disabled until the flag switching races are sorted out */
if (xflags & FS_XFLAG_DAX)
inode->i_flags |= S_DAX;
else
inode->i_flags &= ~S_DAX;
-
+#endif
}

static int


2017-09-18 09:12:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 51/52] md/raid1/10: reset bio allocated from mempool

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Shaohua Li <[email protected]>

commit 208410b546207cfc4c832635fa46419cfa86b4cd upstream.

Data allocated from a mempool doesn't always get initialized; this happens when
the data is reused instead of freshly allocated. In the raid1/10 case, we must
reinitialize the bios.

Reported-by: Jonathan G. Underwood <[email protected]>
Fixes: f0250618361d ("md: raid10: don't use bio's vec table to manage resync pages")
Fixes: 98d30c5812c3 ("md: raid1: don't use bio's vec table to manage resync pages")
Cc: Ming Lei <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/md/raid1.c | 19 ++++++++++++++++++-
drivers/md/raid10.c | 35 ++++++++++++++++++++++++++++++++---
2 files changed, 50 insertions(+), 4 deletions(-)

--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2564,6 +2564,23 @@ static int init_resync(struct r1conf *co
return 0;
}

+static struct r1bio *raid1_alloc_init_r1buf(struct r1conf *conf)
+{
+ struct r1bio *r1bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO);
+ struct resync_pages *rps;
+ struct bio *bio;
+ int i;
+
+ for (i = conf->poolinfo->raid_disks; i--; ) {
+ bio = r1bio->bios[i];
+ rps = bio->bi_private;
+ bio_reset(bio);
+ bio->bi_private = rps;
+ }
+ r1bio->master_bio = NULL;
+ return r1bio;
+}
+
/*
* perform a "sync" on one "block"
*
@@ -2649,7 +2666,7 @@ static sector_t raid1_sync_request(struc

bitmap_cond_end_sync(mddev->bitmap, sector_nr,
mddev_is_clustered(mddev) && (sector_nr + 2 * RESYNC_SECTORS > conf->cluster_sync_high));
- r1_bio = mempool_alloc(conf->r1buf_pool, GFP_NOIO);
+ r1_bio = raid1_alloc_init_r1buf(conf);

raise_barrier(conf, sector_nr);

--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2798,6 +2798,35 @@ static int init_resync(struct r10conf *c
return 0;
}

+static struct r10bio *raid10_alloc_init_r10buf(struct r10conf *conf)
+{
+ struct r10bio *r10bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
+ struct resync_pages *rp;
+ struct bio *bio;
+ int nalloc;
+ int i;
+
+ if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery) ||
+ test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
+ nalloc = conf->copies; /* resync */
+ else
+ nalloc = 2; /* recovery */
+
+ for (i = 0; i < nalloc; i++) {
+ bio = r10bio->devs[i].bio;
+ rp = bio->bi_private;
+ bio_reset(bio);
+ bio->bi_private = rp;
+ bio = r10bio->devs[i].repl_bio;
+ if (bio) {
+ rp = bio->bi_private;
+ bio_reset(bio);
+ bio->bi_private = rp;
+ }
+ }
+ return r10bio;
+}
+
/*
* perform a "sync" on one "block"
*
@@ -3027,7 +3056,7 @@ static sector_t raid10_sync_request(stru
atomic_inc(&mreplace->nr_pending);
rcu_read_unlock();

- r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
+ r10_bio = raid10_alloc_init_r10buf(conf);
r10_bio->state = 0;
raise_barrier(conf, rb2 != NULL);
atomic_set(&r10_bio->remaining, 0);
@@ -3236,7 +3265,7 @@ static sector_t raid10_sync_request(stru
}
if (sync_blocks < max_sync)
max_sync = sync_blocks;
- r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
+ r10_bio = raid10_alloc_init_r10buf(conf);
r10_bio->state = 0;

r10_bio->mddev = mddev;
@@ -4360,7 +4389,7 @@ static sector_t reshape_request(struct m

read_more:
/* Now schedule reads for blocks from sector_nr to last */
- r10_bio = mempool_alloc(conf->r10buf_pool, GFP_NOIO);
+ r10_bio = raid10_alloc_init_r10buf(conf);
r10_bio->state = 0;
raise_barrier(conf, sectors_done != 0);
atomic_set(&r10_bio->remaining, 0);


2017-09-18 09:51:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 50/52] xfs: use kmem_free to free return value of kmem_zalloc

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Pan Bian <[email protected]>

commit 6c370590cfe0c36bcd62d548148aa65c984540b7 upstream.

In function xfs_test_remount_options(), kfree() is used to free memory
allocated by kmem_zalloc(), but it is better to use the matching kmem_free().

Signed-off-by: Pan Bian <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1220,7 +1220,7 @@ xfs_test_remount_options(
tmp_mp->m_super = sb;
error = xfs_parseargs(tmp_mp, options);
xfs_free_fsname(tmp_mp);
- kfree(tmp_mp);
+ kmem_free(tmp_mp);

return error;
}


2017-09-18 09:52:10

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 48/52] xfs: don't set v3 xflags for v2 inodes

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <[email protected]>

commit dd60687ee541ca3f6df8758f38e6f22f57c42a37 upstream.

Reject attempts to set XFLAGS that correspond to di_flags2 inode flags
if the inode isn't a v3 inode, because di_flags2 only exists on v3.
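
From userspace the new behavior should be observable via the
FS_IOC_FSSETXATTR ioctl; a minimal sketch (illustrative usage only,
not part of this patch):

        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/fs.h>

        /* On a v2 inode, setting a di_flags2-backed flag such as
         * FS_XFLAG_DAX should now fail with EINVAL. */
        int main(int argc, char **argv)
        {
                struct fsxattr fa;
                int fd;

                if (argc != 2)
                        return 2;
                fd = open(argv[1], O_RDONLY);
                if (fd < 0 || ioctl(fd, FS_IOC_FSGETXATTR, &fa) < 0) {
                        perror(argv[1]);
                        return 1;
                }
                fa.fsx_xflags |= FS_XFLAG_DAX;
                if (ioctl(fd, FS_IOC_FSSETXATTR, &fa) < 0)
                        perror("FS_IOC_FSSETXATTR");    /* expect EINVAL */
                close(fd);
                return 0;
        }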

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_ioctl.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)

--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -931,16 +931,15 @@ xfs_ioc_fsgetxattr(
return 0;
}

-STATIC void
-xfs_set_diflags(
+STATIC uint16_t
+xfs_flags2diflags(
struct xfs_inode *ip,
unsigned int xflags)
{
- unsigned int di_flags;
- uint64_t di_flags2;
-
/* can't set PREALLOC this way, just preserve it */
- di_flags = (ip->i_d.di_flags & XFS_DIFLAG_PREALLOC);
+ uint16_t di_flags =
+ (ip->i_d.di_flags & XFS_DIFLAG_PREALLOC);
+
if (xflags & FS_XFLAG_IMMUTABLE)
di_flags |= XFS_DIFLAG_IMMUTABLE;
if (xflags & FS_XFLAG_APPEND)
@@ -970,19 +969,24 @@ xfs_set_diflags(
if (xflags & FS_XFLAG_EXTSIZE)
di_flags |= XFS_DIFLAG_EXTSIZE;
}
- ip->i_d.di_flags = di_flags;

- /* diflags2 only valid for v3 inodes. */
- if (ip->i_d.di_version < 3)
- return;
+ return di_flags;
+}
+
+STATIC uint64_t
+xfs_flags2diflags2(
+ struct xfs_inode *ip,
+ unsigned int xflags)
+{
+ uint64_t di_flags2 =
+ (ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK);

- di_flags2 = (ip->i_d.di_flags2 & XFS_DIFLAG2_REFLINK);
if (xflags & FS_XFLAG_DAX)
di_flags2 |= XFS_DIFLAG2_DAX;
if (xflags & FS_XFLAG_COWEXTSIZE)
di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;

- ip->i_d.di_flags2 = di_flags2;
+ return di_flags2;
}

STATIC void
@@ -1023,6 +1027,7 @@ xfs_ioctl_setattr_xflags(
struct fsxattr *fa)
{
struct xfs_mount *mp = ip->i_mount;
+ uint64_t di_flags2;

/* Can't change realtime flag if any extents are allocated. */
if ((ip->i_d.di_nextents || ip->i_delayed_blks) &&
@@ -1053,7 +1058,14 @@ xfs_ioctl_setattr_xflags(
!capable(CAP_LINUX_IMMUTABLE))
return -EPERM;

- xfs_set_diflags(ip, fa->fsx_xflags);
+ /* diflags2 only valid for v3 inodes. */
+ di_flags2 = xfs_flags2diflags2(ip, fa->fsx_xflags);
+ if (di_flags2 && ip->i_d.di_version < 3)
+ return -EINVAL;
+
+ ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
+ ip->i_d.di_flags2 = di_flags2;
+
xfs_diflags_to_linux(ip);
xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG);
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);


2017-09-18 09:12:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 42/52] xfs: skip bmbt block ino validation during owner change

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 99c794c639a65cc7b74f30a674048fd100fe9ac8 upstream.

Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block
owners on v5 (!rmapbt) filesystems. The bmbt scan uses
xfs_btree_lookup_get_block() to read bmbt blocks which verifies the
current owner of the block against the parent inode of the bmbt.
This works during extent swap because the bmbt owners are updated to
the opposite inode number before the inode extent forks are swapped.

The modified bmbt blocks are marked as ordered buffers which allows
everything to commit in a single transaction. If the transaction
commits to the log and the system crashes such that recovery of the
extent swap is required, log recovery restarts the bmbt scan to fix
up any bmbt blocks that may have not been written back before the
crash. The log recovery bmbt scan occurs after the inode forks have
been swapped, however. This causes the bmbt block owner verification
to fail, which leads to log recovery failure and requires xfs_repair to
zap the log to recover.

Define a new invalid inode owner flag to inform the btree block
lookup mechanism that the current inode may be invalid with respect
to the current owner of the bmbt block. Set this flag on the cursor
used for change owner scans to allow this operation to work at
runtime and during log recovery.

Signed-off-by: Brian Foster <[email protected]>
Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers")
Cc: [email protected]
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/libxfs/xfs_bmap_btree.c | 1 +
fs/xfs/libxfs/xfs_btree.c | 1 +
fs/xfs/libxfs/xfs_btree.h | 3 ++-
3 files changed, 4 insertions(+), 1 deletion(-)

--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -858,6 +858,7 @@ xfs_bmbt_change_owner(
cur = xfs_bmbt_init_cursor(ip->i_mount, tp, ip, whichfork);
if (!cur)
return -ENOMEM;
+ cur->bc_private.b.flags |= XFS_BTCUR_BPRV_INVALID_OWNER;

error = xfs_btree_change_owner(cur, new_owner, buffer_list);
xfs_btree_del_cursor(cur, error ? XFS_BTREE_ERROR : XFS_BTREE_NOERROR);
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -1791,6 +1791,7 @@ xfs_btree_lookup_get_block(

/* Check the inode owner since the verifiers don't. */
if (xfs_sb_version_hascrc(&cur->bc_mp->m_sb) &&
+ !(cur->bc_private.b.flags & XFS_BTCUR_BPRV_INVALID_OWNER) &&
(cur->bc_flags & XFS_BTREE_LONG_PTRS) &&
be64_to_cpu((*blkp)->bb_u.l.bb_owner) !=
cur->bc_private.b.ip->i_ino)
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -233,7 +233,8 @@ typedef struct xfs_btree_cur
short forksize; /* fork's inode space */
char whichfork; /* data or attr fork */
char flags; /* flags */
-#define XFS_BTCUR_BPRV_WASDEL 1 /* was delayed */
+#define XFS_BTCUR_BPRV_WASDEL (1<<0) /* was delayed */
+#define XFS_BTCUR_BPRV_INVALID_OWNER (1<<1) /* for ext swap */
} b;
} bc_private; /* per-btree type data */
} xfs_btree_cur_t;


2017-09-18 09:52:59

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 41/52] xfs: don't log dirty ranges for ordered buffers

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 8dc518dfa7dbd079581269e51074b3c55a65a880 upstream.

Ordered buffers are attached to transactions and pushed through the
logging infrastructure just like normal buffers with the exception
that they are not actually written to the log. Therefore, we don't
need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is
called on ordered buffers to set up all of the dirty state on the
transaction, buffer and log item and prepare the buffer for I/O.

Now that xfs_trans_dirty_buf() is available, call it from
xfs_trans_ordered_buf() so the latter is now mutually exclusive with
xfs_trans_log_buf(). This reflects the implementation of ordered
buffers and helps eliminate confusion over the need to log ranges of
ordered buffers just to set up internal log state.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Allison Henderson <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/libxfs/xfs_btree.c | 6 ++----
fs/xfs/libxfs/xfs_ialloc.c | 2 --
fs/xfs/xfs_trans_buf.c | 26 ++++++++++++++------------
3 files changed, 16 insertions(+), 18 deletions(-)

--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -4464,12 +4464,10 @@ xfs_btree_block_change_owner(
* though, so everything is consistent in memory.
*/
if (bp) {
- if (cur->bc_tp) {
+ if (cur->bc_tp)
xfs_trans_ordered_buf(cur->bc_tp, bp);
- xfs_btree_log_block(cur, bp, XFS_BB_OWNER);
- } else {
+ else
xfs_buf_delwri_queue(bp, bbcoi->buffer_list);
- }
} else {
ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE);
ASSERT(level == cur->bc_nlevels - 1);
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -378,8 +378,6 @@ xfs_ialloc_inode_init(
* transaction and pin the log appropriately.
*/
xfs_trans_ordered_buf(tp, fbuf);
- xfs_trans_log_buf(tp, fbuf, 0,
- BBTOB(fbuf->b_length) - 1);
}
} else {
fbuf->b_flags |= XBF_DONE;
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -560,16 +560,12 @@ xfs_trans_log_buf(
struct xfs_buf_log_item *bip = bp->b_fspriv;

ASSERT(first <= last && last < BBTOB(bp->b_length));
+ ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED));

xfs_trans_dirty_buf(tp, bp);

- /*
- * If we have an ordered buffer we are not logging any dirty range but
- * it still needs to be marked dirty and that it has been logged.
- */
trace_xfs_trans_log_buf(bip);
- if (!(bip->bli_flags & XFS_BLI_ORDERED))
- xfs_buf_item_log(bip, first, last);
+ xfs_buf_item_log(bip, first, last);
}


@@ -722,12 +718,11 @@ xfs_trans_inode_alloc_buf(
}

/*
- * Mark the buffer as ordered for this transaction. This means
- * that the contents of the buffer are not recorded in the transaction
- * but it is tracked in the AIL as though it was. This allows us
- * to record logical changes in transactions rather than the physical
- * changes we make to the buffer without changing writeback ordering
- * constraints of metadata buffers.
+ * Mark the buffer as ordered for this transaction. This means that the contents
+ * of the buffer are not recorded in the transaction but it is tracked in the
+ * AIL as though it was. This allows us to record logical changes in
+ * transactions rather than the physical changes we make to the buffer without
+ * changing writeback ordering constraints of metadata buffers.
*/
void
xfs_trans_ordered_buf(
@@ -739,9 +734,16 @@ xfs_trans_ordered_buf(
ASSERT(bp->b_transp == tp);
ASSERT(bip != NULL);
ASSERT(atomic_read(&bip->bli_refcount) > 0);
+ ASSERT(!xfs_buf_item_dirty_format(bip));

bip->bli_flags |= XFS_BLI_ORDERED;
trace_xfs_buf_item_ordered(bip);
+
+ /*
+ * We don't log a dirty range of an ordered buffer but it still needs
+ * to be marked dirty and that it has been logged.
+ */
+ xfs_trans_dirty_buf(tp, bp);
}

/*


2017-09-18 09:12:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 39/52] xfs: ordered buffer log items are never formatted

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit e9385cc6fb7edf23702de33a2dc82965d92d9392 upstream.

Ordered buffers pass through the logging infrastructure without ever
being written to the log. The way this works is that the ordered
buffer status is transferred to the log vector at commit time via
the ->iop_size() callback. In xlog_cil_insert_format_items(),
ordered log vectors bypass ->iop_format() processing altogether.

Therefore it is unnecessary for xfs_buf_item_format() to handle
ordered buffers. Remove the unnecessary logic and assert that an
ordered buffer never reaches this point.
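
For context, the CIL-side handling this relies on reduces to roughly
the following (a condensed sketch of xlog_cil_insert_format_items(),
not part of this patch):

        if (ordered) {
                /* ordered items carry no payload and are never formatted */
                lv->lv_buf_len = XFS_LOG_VEC_ORDERED;
                goto insert;
        }
        lip->li_ops->iop_format(lip, lv);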

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_buf_item.c | 12 ++----------
fs/xfs/xfs_trace.h | 1 -
2 files changed, 2 insertions(+), 11 deletions(-)

--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -323,6 +323,8 @@ xfs_buf_item_format(
ASSERT((bip->bli_flags & XFS_BLI_STALE) ||
(xfs_blft_from_flags(&bip->__bli_format) > XFS_BLFT_UNKNOWN_BUF
&& xfs_blft_from_flags(&bip->__bli_format) < XFS_BLFT_MAX_BUF));
+ ASSERT(!(bip->bli_flags & XFS_BLI_ORDERED) ||
+ (bip->bli_flags & XFS_BLI_STALE));


/*
@@ -347,16 +349,6 @@ xfs_buf_item_format(
bip->bli_flags &= ~XFS_BLI_INODE_BUF;
}

- if ((bip->bli_flags & (XFS_BLI_ORDERED|XFS_BLI_STALE)) ==
- XFS_BLI_ORDERED) {
- /*
- * The buffer has been logged just to order it. It is not being
- * included in the transaction commit, so don't format it.
- */
- trace_xfs_buf_item_format_ordered(bip);
- return;
- }
-
for (i = 0; i < bip->bli_format_count; i++) {
xfs_buf_item_format_segment(bip, lv, &vecp, offset,
&bip->bli_formats[i]);
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -517,7 +517,6 @@ DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size)
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_size_stale);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format);
-DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_format_stale);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_ordered);
DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pin);


2017-09-18 09:53:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 38/52] xfs: remove unnecessary dirty bli format check for ordered bufs

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 6453c65d3576bc3e602abb5add15f112755c08ca upstream.

xfs_buf_item_unlock() historically checked the dirty state of the
buffer by manually checking the buffer log formats for dirty
segments. The introduction of ordered buffers invalidated this check
because ordered buffers have dirty bli's but no dirty (logged)
segments. The check was updated to accommodate ordered buffers by
looking at the bli state first and considering the blf only if the
bli is clean.

This logic is safe but unnecessary. There is no valid case where the
bli is clean yet the blf has dirty segments. The bli is set dirty
whenever the blf is logged (via xfs_trans_log_buf()) and the blf is
cleared in the only place BLI_DIRTY is cleared (xfs_trans_binval()).

Remove the conditional blf dirty checks and replace with an assert
that should catch any discrepancies between bli and blf dirty
states. Refactor the old blf dirty check into a helper function to
be used by the assert.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_buf_item.c | 62 +++++++++++++++++++++++++-------------------------
fs/xfs/xfs_buf_item.h | 1
2 files changed, 33 insertions(+), 30 deletions(-)

--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -575,26 +575,18 @@ xfs_buf_item_unlock(
{
struct xfs_buf_log_item *bip = BUF_ITEM(lip);
struct xfs_buf *bp = bip->bli_buf;
- bool clean;
- bool aborted;
- int flags;
+ bool aborted = !!(lip->li_flags & XFS_LI_ABORTED);
+ bool hold = !!(bip->bli_flags & XFS_BLI_HOLD);
+ bool dirty = !!(bip->bli_flags & XFS_BLI_DIRTY);
+ bool ordered = !!(bip->bli_flags & XFS_BLI_ORDERED);

/* Clear the buffer's association with this transaction. */
bp->b_transp = NULL;

/*
- * If this is a transaction abort, don't return early. Instead, allow
- * the brelse to happen. Normally it would be done for stale
- * (cancelled) buffers at unpin time, but we'll never go through the
- * pin/unpin cycle if we abort inside commit.
- */
- aborted = (lip->li_flags & XFS_LI_ABORTED) ? true : false;
- /*
- * Before possibly freeing the buf item, copy the per-transaction state
- * so we can reference it safely later after clearing it from the
- * buffer log item.
+ * The per-transaction state has been copied above so clear it from the
+ * bli.
*/
- flags = bip->bli_flags;
bip->bli_flags &= ~(XFS_BLI_LOGGED | XFS_BLI_HOLD | XFS_BLI_ORDERED);

/*
@@ -602,7 +594,7 @@ xfs_buf_item_unlock(
* unlock the buffer and free the buf item when the buffer is unpinned
* for the last time.
*/
- if (flags & XFS_BLI_STALE) {
+ if (bip->bli_flags & XFS_BLI_STALE) {
trace_xfs_buf_item_unlock_stale(bip);
ASSERT(bip->__bli_format.blf_flags & XFS_BLF_CANCEL);
if (!aborted) {
@@ -620,20 +612,11 @@ xfs_buf_item_unlock(
* regardless of whether it is dirty or not. A dirty abort implies a
* shutdown, anyway.
*
- * Ordered buffers are dirty but may have no recorded changes, so ensure
- * we only release clean items here.
+ * The bli dirty state should match whether the blf has logged segments
+ * except for ordered buffers, where only the bli should be dirty.
*/
- clean = (flags & XFS_BLI_DIRTY) ? false : true;
- if (clean) {
- int i;
- for (i = 0; i < bip->bli_format_count; i++) {
- if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
- bip->bli_formats[i].blf_map_size)) {
- clean = false;
- break;
- }
- }
- }
+ ASSERT((!ordered && dirty == xfs_buf_item_dirty_format(bip)) ||
+ (ordered && dirty && !xfs_buf_item_dirty_format(bip)));

/*
* Clean buffers, by definition, cannot be in the AIL. However, aborted
@@ -652,11 +635,11 @@ xfs_buf_item_unlock(
ASSERT(XFS_FORCED_SHUTDOWN(lip->li_mountp));
xfs_trans_ail_remove(lip, SHUTDOWN_LOG_IO_ERROR);
xfs_buf_item_relse(bp);
- } else if (clean)
+ } else if (!dirty)
xfs_buf_item_relse(bp);
}

- if (!(flags & XFS_BLI_HOLD))
+ if (!hold)
xfs_buf_relse(bp);
}

@@ -945,6 +928,25 @@ xfs_buf_item_log(
}


+/*
+ * Return true if the buffer has any ranges logged/dirtied by a transaction,
+ * false otherwise.
+ */
+bool
+xfs_buf_item_dirty_format(
+ struct xfs_buf_log_item *bip)
+{
+ int i;
+
+ for (i = 0; i < bip->bli_format_count; i++) {
+ if (!xfs_bitmap_empty(bip->bli_formats[i].blf_data_map,
+ bip->bli_formats[i].blf_map_size))
+ return true;
+ }
+
+ return false;
+}
+
STATIC void
xfs_buf_item_free(
xfs_buf_log_item_t *bip)
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -64,6 +64,7 @@ typedef struct xfs_buf_log_item {
int xfs_buf_item_init(struct xfs_buf *, struct xfs_mount *);
void xfs_buf_item_relse(struct xfs_buf *);
void xfs_buf_item_log(xfs_buf_log_item_t *, uint, uint);
+bool xfs_buf_item_dirty_format(struct xfs_buf_log_item *);
void xfs_buf_attach_iodone(struct xfs_buf *,
void(*)(struct xfs_buf *, xfs_log_item_t *),
xfs_log_item_t *);


2017-09-18 09:11:50

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 43/52] xfs: move bmbt owner change to last step of extent swap

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 6fb10d6d22094bc4062f92b9ccbcee2f54033d04 upstream.

The extent swap operation currently resets bmbt block owners before
the inode forks are swapped. The bmbt buffers are marked as ordered
so they do not have to be physically logged in the transaction.

This use of ordered buffers is not safe as bmbt buffers may have
been previously physically logged. The bmbt owner change algorithm
needs to be updated to physically log buffers that are already dirty
when/if they are encountered. This means that an extent swap will
eventually require multiple rolling transactions to handle large
btrees. In addition, all inode related changes must be logged before
the bmbt owner change scan begins and can roll the transaction for
the first time to preserve fs consistency via log recovery.

In preparation for such fixes to the bmbt owner change algorithm,
refactor the bmbt scan out of the extent fork swap code to the last
operation before the transaction is committed. Update
xfs_swap_extent_forks() to only set the inode log flags when an
owner change scan is necessary. Update xfs_swap_extents() to trigger
the owner change based on the inode log flags. Note that since the
owner change now occurs after the extent fork swap, the inode btrees
must be fixed up with the inode number of the current inode (similar
to log recovery).

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_bmap_util.c | 44 ++++++++++++++++++++++++++------------------
1 file changed, 26 insertions(+), 18 deletions(-)

--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1840,29 +1840,18 @@ xfs_swap_extent_forks(
}

/*
- * Before we've swapped the forks, lets set the owners of the forks
- * appropriately. We have to do this as we are demand paging the btree
- * buffers, and so the validation done on read will expect the owner
- * field to be correctly set. Once we change the owners, we can swap the
- * inode forks.
+ * Btree format (v3) inodes have the inode number stamped in the bmbt
+ * block headers. We can't start changing the bmbt blocks until the
+ * inode owner change is logged so recovery does the right thing in the
+ * event of a crash. Set the owner change log flags now and leave the
+ * bmbt scan as the last step.
*/
if (ip->i_d.di_version == 3 &&
- ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+ ip->i_d.di_format == XFS_DINODE_FMT_BTREE)
(*target_log_flags) |= XFS_ILOG_DOWNER;
- error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
- tip->i_ino, NULL);
- if (error)
- return error;
- }
-
if (tip->i_d.di_version == 3 &&
- tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
+ tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
(*src_log_flags) |= XFS_ILOG_DOWNER;
- error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
- ip->i_ino, NULL);
- if (error)
- return error;
- }

/*
* Swap the data forks of the inodes
@@ -2092,6 +2081,25 @@ xfs_swap_extents(
xfs_trans_log_inode(tp, tip, target_log_flags);

/*
+ * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems
+ * have inode number owner values in the bmbt blocks that still refer to
+ * the old inode. Scan each bmbt to fix up the owner values with the
+ * inode number of the current inode.
+ */
+ if (src_log_flags & XFS_ILOG_DOWNER) {
+ error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
+ ip->i_ino, NULL);
+ if (error)
+ goto out_trans_cancel;
+ }
+ if (target_log_flags & XFS_ILOG_DOWNER) {
+ error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
+ tip->i_ino, NULL);
+ if (error)
+ goto out_trans_cancel;
+ }
+
+ /*
* If this is a synchronous mount, make sure that the
* transaction goes to disk before returning to the user.
*/


2017-09-18 09:54:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 36/52] xfs: check for race with xfs_reclaim_inode() in xfs_ifree_cluster()

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Omar Sandoval <[email protected]>

commit f2e9ad212def50bcf4c098c6288779dd97fff0f0 upstream.

After xfs_ifree_cluster() finds an inode in the radix tree and verifies
that the inode number is what it expected, xfs_reclaim_inode() can swoop
in and free it. xfs_ifree_cluster() will then happily continue working
on the freed inode. Most importantly, it will mark the inode stale,
which will probably be overwritten when the inode slab object is
reallocated, but if it has already been reallocated then we can end up
with an inode spuriously marked stale.

In 8a17d7ddedb4 ("xfs: mark reclaimed inodes invalid earlier") we added
a second check to xfs_iflush_cluster() to detect this race, but the
similar RCU lookup in xfs_ifree_cluster() needs the same treatment.

Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_icache.c | 10 +++++-----
fs/xfs/xfs_inode.c | 23 ++++++++++++++++++-----
2 files changed, 23 insertions(+), 10 deletions(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1124,11 +1124,11 @@ reclaim:
* Because we use RCU freeing we need to ensure the inode always appears
* to be reclaimed with an invalid inode number when in the free state.
* We do this as early as possible under the ILOCK so that
- * xfs_iflush_cluster() can be guaranteed to detect races with us here.
- * By doing this, we guarantee that once xfs_iflush_cluster has locked
- * XFS_ILOCK that it will see either a valid, flushable inode that will
- * serialise correctly, or it will see a clean (and invalid) inode that
- * it can skip.
+ * xfs_iflush_cluster() and xfs_ifree_cluster() can be guaranteed to
+ * detect races with us here. By doing this, we guarantee that once
+ * xfs_iflush_cluster() or xfs_ifree_cluster() has locked XFS_ILOCK that
+ * it will see either a valid inode that will serialise correctly, or it
+ * will see an invalid inode that it can skip.
*/
spin_lock(&ip->i_flags_lock);
ip->i_flags = XFS_IRECLAIM;
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2359,11 +2359,24 @@ retry:
* already marked stale. If we can't lock it, back off
* and retry.
*/
- if (ip != free_ip &&
- !xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
- rcu_read_unlock();
- delay(1);
- goto retry;
+ if (ip != free_ip) {
+ if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
+ rcu_read_unlock();
+ delay(1);
+ goto retry;
+ }
+
+ /*
+ * Check the inode number again in case we're
+ * racing with freeing in xfs_reclaim_inode().
+ * See the comments in that function for more
+ * information as to why the initial check is
+ * not sufficient.
+ */
+ if (ip->i_ino != inum + i) {
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ continue;
+ }
}
rcu_read_unlock();



2017-09-18 09:54:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 33/52] xfs: handle -EFSCORRUPTED during head/tail verification

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit a4c9b34d6a17081005ec459b57b8effc08f4c731 upstream.

Torn write and tail overwrite detection both trigger only on
-EFSBADCRC errors. While this is the most likely failure scenario
for each condition, -EFSCORRUPTED is still possible in certain cases
depending on what ends up on disk when a torn write or partial tail
overwrite occurs. For example, an invalid log record h_len can lead
to an -EFSCORRUPTED error when running the log recovery CRC pass.

Therefore, update log head and tail verification to trigger the
associated head/tail fixups in the event of -EFSCORRUPTED errors
along with -EFSBADCRC. Also, -EFSCORRUPTED can currently be returned
from xlog_do_recovery_pass() before rhead_blk is initialized if the
first record encountered happens to be corrupted. This leads to an
incorrect 'first_bad' return value. Initialize rhead_blk earlier in
the function to address that problem as well.
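
Condensed from the hunks below, the two post-patch changes look roughly
like this (the elided retry body is unchanged):

        /* tail verification now fixes up on either error */
        while ((error == -EFSBADCRC || error == -EFSCORRUPTED) && first_bad) {
                /* ... walk the tail forward and retry, as before ... */
        }

        /* xlog_do_recovery_pass(): rhead_blk valid before any early return */
        ASSERT(head_blk != tail_blk);
        blk_no = rhead_blk = tail_blk;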

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_log_recover.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1102,7 +1102,7 @@ xlog_verify_tail(
first_bad = 0;
error = xlog_do_recovery_pass(log, head_blk, *tail_blk,
XLOG_RECOVER_CRCPASS, &first_bad);
- while (error == -EFSBADCRC && first_bad) {
+ while ((error == -EFSBADCRC || error == -EFSCORRUPTED) && first_bad) {
int tail_distance;

/*
@@ -1188,7 +1188,7 @@ xlog_verify_head(
*/
error = xlog_do_recovery_pass(log, *head_blk, tmp_rhead_blk,
XLOG_RECOVER_CRCPASS, &first_bad);
- if (error == -EFSBADCRC) {
+ if ((error == -EFSBADCRC || error == -EFSCORRUPTED) && first_bad) {
/*
* We've hit a potential torn write. Reset the error and warn
* about it.
@@ -5257,7 +5257,7 @@ xlog_do_recovery_pass(
LIST_HEAD (buffer_list);

ASSERT(head_blk != tail_blk);
- rhead_blk = 0;
+ blk_no = rhead_blk = tail_blk;

for (i = 0; i < XLOG_RHASH_SIZE; i++)
INIT_HLIST_HEAD(&rhash[i]);
@@ -5335,7 +5335,6 @@ xlog_do_recovery_pass(
}

memset(rhash, 0, sizeof(rhash));
- blk_no = rhead_blk = tail_blk;
if (tail_blk > head_blk) {
/*
* Perform recovery around the end of the physical log.


2017-09-18 09:55:11

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 34/52] xfs: stop searching for free slots in an inode chunk when there are none

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Carlos Maiolino <[email protected]>

commit 2d32311cf19bfb8c1d2b4601974ddd951f9cfd0b upstream.

In a filesystem without finobt, the space manager selects an AG to allocate a
new inode in, and xfs_dialloc_ag_inobt() then searches that AG for a free
inode slot.

When the new inode is in the same AG as its parent, the btree is searched
starting at the parent's record, and then retried from the top if no slot is
available beyond the parent's record.

To exit this loop though, xfs_dialloc_ag_inobt() relies on the btree having a
free slot available, since its callers consulted agi->freecount when deciding
how/where to allocate this new inode.

When agi->freecount is corrupted and shows available inodes in an AG where in
fact there are none, this becomes an infinite loop.

Add a way to stop the loop when a free slot is not found in the btree, making
the function fall back to the whole-AG scan, which is then able to detect the
corruption and shut the filesystem down.

As pointed out by Brian, this might impact performance: since we no longer
reset the search distance when we reach the end of the tree, we get fewer
tries before falling back to the whole-AG search. However, this only affects
searches that start within 10 records of the end of the tree.
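
The resulting control flow, heavily condensed from the hunks below:

        while (--searchdistance > 0 && (!doneleft || !doneright)) {
                /* probe the closer of the left/right records for a free slot */
        }

        if (searchdistance <= 0) {
                /* out of tries: save the last search location and fall
                 * through to the new-inode path */
        } else {
                /* genuinely walked off both ends of the btree: restart */
        }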

Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/libxfs/xfs_ialloc.c | 55 ++++++++++++++++++++++-----------------------
1 file changed, 27 insertions(+), 28 deletions(-)

--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -1133,6 +1133,7 @@ xfs_dialloc_ag_inobt(
int error;
int offset;
int i, j;
+ int searchdistance = 10;

pag = xfs_perag_get(mp, agno);

@@ -1159,7 +1160,6 @@ xfs_dialloc_ag_inobt(
if (pagno == agno) {
int doneleft; /* done, to the left */
int doneright; /* done, to the right */
- int searchdistance = 10;

error = xfs_inobt_lookup(cur, pagino, XFS_LOOKUP_LE, &i);
if (error)
@@ -1220,21 +1220,9 @@ xfs_dialloc_ag_inobt(
/*
* Loop until we find an inode chunk with a free inode.
*/
- while (!doneleft || !doneright) {
+ while (--searchdistance > 0 && (!doneleft || !doneright)) {
int useleft; /* using left inode chunk this time */

- if (!--searchdistance) {
- /*
- * Not in range - save last search
- * location and allocate a new inode
- */
- xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
- pag->pagl_leftrec = trec.ir_startino;
- pag->pagl_rightrec = rec.ir_startino;
- pag->pagl_pagino = pagino;
- goto newino;
- }
-
/* figure out the closer block if both are valid. */
if (!doneleft && !doneright) {
useleft = pagino -
@@ -1278,26 +1266,37 @@ xfs_dialloc_ag_inobt(
goto error1;
}

- /*
- * We've reached the end of the btree. because
- * we are only searching a small chunk of the
- * btree each search, there is obviously free
- * inodes closer to the parent inode than we
- * are now. restart the search again.
- */
- pag->pagl_pagino = NULLAGINO;
- pag->pagl_leftrec = NULLAGINO;
- pag->pagl_rightrec = NULLAGINO;
- xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
- xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
- goto restart_pagno;
+ if (searchdistance <= 0) {
+ /*
+ * Not in range - save last search
+ * location and allocate a new inode
+ */
+ xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
+ pag->pagl_leftrec = trec.ir_startino;
+ pag->pagl_rightrec = rec.ir_startino;
+ pag->pagl_pagino = pagino;
+
+ } else {
+ /*
+ * We've reached the end of the btree. because
+ * we are only searching a small chunk of the
+ * btree each search, there is obviously free
+ * inodes closer to the parent inode than we
+ * are now. restart the search again.
+ */
+ pag->pagl_pagino = NULLAGINO;
+ pag->pagl_leftrec = NULLAGINO;
+ pag->pagl_rightrec = NULLAGINO;
+ xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
+ xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
+ goto restart_pagno;
+ }
}

/*
* In a different AG from the parent.
* See if the most recently allocated block has any free.
*/
-newino:
if (agi->agi_newino != cpu_to_be32(NULLAGINO)) {
error = xfs_inobt_lookup(cur, be32_to_cpu(agi->agi_newino),
XFS_LOOKUP_EQ, &i);


2017-09-18 09:55:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 28/52] xfs: Add infrastructure needed for error propagation during buffer IO failure

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Carlos Maiolino <[email protected]>

commit 0b80ae6ed13169bd3a244e71169f2cc020b0c57a upstream.

With the current code, XFS never resubmits a failed buffer for IO,
because the failed item in the buffer is kept in the flush-locked state
forever.

To be able to resubmit a log item for IO, we need a way to mark an item
as failed if, for any reason, the buffer which the item belongs to
failed during writeback.

Add a new log item callback to be used after an IO completion failure
and make the needed cleanups.
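
For context, the follow-up patch in this series ("xfs: Properly retry
failed inode items in case of error during buffer writeback") wires the
new callback up for inode items roughly as:

        STATIC void
        xfs_inode_item_error(
                struct xfs_log_item     *lip,
                struct xfs_buf          *bp)
        {
                ASSERT(xfs_isiflocked(INODE_ITEM(lip)->ili_inode));
                /* flag the item as failed and hold the buffer for the retry */
                xfs_set_li_failed(lip, bp);
        }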

Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_buf_item.c | 32 +++++++++++++++++++++++++++++++-
fs/xfs/xfs_trans.h | 7 +++++--
2 files changed, 36 insertions(+), 3 deletions(-)

--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -29,6 +29,7 @@
#include "xfs_error.h"
#include "xfs_trace.h"
#include "xfs_log.h"
+#include "xfs_inode.h"


kmem_zone_t *xfs_buf_item_zone;
@@ -1054,6 +1055,31 @@ xfs_buf_do_callbacks(
}
}

+/*
+ * Invoke the error state callback for each log item affected by the failed I/O.
+ *
+ * If a metadata buffer write fails with a non-permanent error, the buffer is
+ * eventually resubmitted and so the completion callbacks are not run. The error
+ * state may need to be propagated to the log items attached to the buffer,
+ * however, so the next AIL push of the item knows how to handle it correctly.
+ */
+STATIC void
+xfs_buf_do_callbacks_fail(
+ struct xfs_buf *bp)
+{
+ struct xfs_log_item *next;
+ struct xfs_log_item *lip = bp->b_fspriv;
+ struct xfs_ail *ailp = lip->li_ailp;
+
+ spin_lock(&ailp->xa_lock);
+ for (; lip; lip = next) {
+ next = lip->li_bio_list;
+ if (lip->li_ops->iop_error)
+ lip->li_ops->iop_error(lip, bp);
+ }
+ spin_unlock(&ailp->xa_lock);
+}
+
static bool
xfs_buf_iodone_callback_error(
struct xfs_buf *bp)
@@ -1123,7 +1149,11 @@ xfs_buf_iodone_callback_error(
if ((mp->m_flags & XFS_MOUNT_UNMOUNTING) && mp->m_fail_unmount)
goto permanent_error;

- /* still a transient error, higher layers will retry */
+ /*
+ * Still a transient error, run IO completion failure callbacks and let
+ * the higher layers retry the buffer.
+ */
+ xfs_buf_do_callbacks_fail(bp);
xfs_buf_ioerror(bp, 0);
xfs_buf_relse(bp);
return true;
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -64,11 +64,13 @@ typedef struct xfs_log_item {
} xfs_log_item_t;

#define XFS_LI_IN_AIL 0x1
-#define XFS_LI_ABORTED 0x2
+#define XFS_LI_ABORTED 0x2
+#define XFS_LI_FAILED 0x4

#define XFS_LI_FLAGS \
{ XFS_LI_IN_AIL, "IN_AIL" }, \
- { XFS_LI_ABORTED, "ABORTED" }
+ { XFS_LI_ABORTED, "ABORTED" }, \
+ { XFS_LI_FAILED, "FAILED" }

struct xfs_item_ops {
void (*iop_size)(xfs_log_item_t *, int *, int *);
@@ -79,6 +81,7 @@ struct xfs_item_ops {
void (*iop_unlock)(xfs_log_item_t *);
xfs_lsn_t (*iop_committed)(xfs_log_item_t *, xfs_lsn_t);
void (*iop_committing)(xfs_log_item_t *, xfs_lsn_t);
+ void (*iop_error)(xfs_log_item_t *, xfs_buf_t *);
};

void xfs_log_item_init(struct xfs_mount *mp, struct xfs_log_item *item,


2017-09-18 09:11:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 24/52] libnvdimm, btt: check memory allocation failure

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christophe Jaillet <[email protected]>

commit ed36b4dba54a421ce5551638f6a9790b2c2116b1 upstream.

Check memory allocation failures and return -ENOMEM in such cases, as is
already done a few lines below for another memory allocation.

This avoids a NULL pointer dereference.

Fixes: 14e494542636 ("libnvdimm, btt: BTT updates for UEFI 2.7 format")
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Vishal Verma <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/nvdimm/btt.c | 2 ++
1 file changed, 2 insertions(+)

--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1429,6 +1429,8 @@ int nvdimm_namespace_attach_btt(struct n
}

btt_sb = devm_kzalloc(&nd_btt->dev, sizeof(*btt_sb), GFP_KERNEL);
+ if (!btt_sb)
+ return -ENOMEM;

/*
* If this returns < 0, that is ok as it just means there wasn't


2017-09-18 09:56:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 25/52] libnvdimm: fix integer overflow static analysis warning

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Williams <[email protected]>

commit 58738c495e15badd2015e19ff41f1f1ed55200bc upstream.

Dan reports:
The patch 62232e45f4a2: "libnvdimm: control (ioctl) messages for
nvdimm_bus and nvdimm devices" from Jun 8, 2015, leads to the
following static checker warning:

drivers/nvdimm/bus.c:1018 __nd_ioctl()
warn: integer overflows 'buf_len'

From a casual review, this seems like it might be a real bug. On
the first iteration we load some data into in_env[]. On the second
iteration we read a user controlled "in_size" from nd_cmd_in_size().
It can go up to UINT_MAX - 1. A high number means we will fill the
whole in_env[] buffer. But we potentially keep looping and adding
more to in_len so now it can be any value.

It's simple enough to change, but it feels weird that we keep looping
even though in_env is totally full. Shouldn't we just return an
error if we don't have space for desc->in_num?

We keep looping because the size of the total input is allowed to be
bigger than the 'envelope' which is a subset of the payload that tells
us how much data to expect. For safety explicitly check that buf_len
does not overflow which is what the checker flagged.
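
The core of the fix, condensed from the hunks below: accumulate the two
32-bit lengths into a 64-bit total so the bound check sees the true sum.

        u32 in_len = 0, out_len = 0;    /* grown in the copy loops above */
        u64 buf_len;

        /* widen before adding so the sum cannot wrap a 32-bit type */
        buf_len = (u64) out_len + (u64) in_len;
        if (buf_len > ND_IOCTL_MAX_BUFLEN)
                return -EINVAL;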

Fixes: 62232e45f4a2: "libnvdimm: control (ioctl) messages for nvdimm_bus..."
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
drivers/nvdimm/bus.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -905,19 +905,20 @@ static int __nd_ioctl(struct nvdimm_bus
int read_only, unsigned int ioctl_cmd, unsigned long arg)
{
struct nvdimm_bus_descriptor *nd_desc = nvdimm_bus->nd_desc;
- size_t buf_len = 0, in_len = 0, out_len = 0;
static char out_env[ND_CMD_MAX_ENVELOPE];
static char in_env[ND_CMD_MAX_ENVELOPE];
const struct nd_cmd_desc *desc = NULL;
unsigned int cmd = _IOC_NR(ioctl_cmd);
- unsigned int func = cmd;
- void __user *p = (void __user *) arg;
struct device *dev = &nvdimm_bus->dev;
- struct nd_cmd_pkg pkg;
+ void __user *p = (void __user *) arg;
const char *cmd_name, *dimm_name;
+ u32 in_len = 0, out_len = 0;
+ unsigned int func = cmd;
unsigned long cmd_mask;
- void *buf;
+ struct nd_cmd_pkg pkg;
int rc, i, cmd_rc;
+ u64 buf_len = 0;
+ void *buf;

if (nvdimm) {
desc = nd_cmd_dimm_desc(cmd);
@@ -977,7 +978,7 @@ static int __nd_ioctl(struct nvdimm_bus

if (cmd == ND_CMD_CALL) {
func = pkg.nd_command;
- dev_dbg(dev, "%s:%s, idx: %llu, in: %zu, out: %zu, len %zu\n",
+ dev_dbg(dev, "%s:%s, idx: %llu, in: %u, out: %u, len %llu\n",
__func__, dimm_name, pkg.nd_command,
in_len, out_len, buf_len);

@@ -1007,9 +1008,9 @@ static int __nd_ioctl(struct nvdimm_bus
out_len += out_size;
}

- buf_len = out_len + in_len;
+ buf_len = (u64) out_len + (u64) in_len;
if (buf_len > ND_IOCTL_MAX_BUFLEN) {
- dev_dbg(dev, "%s:%s cmd: %s buf_len: %zu > %d\n", __func__,
+ dev_dbg(dev, "%s:%s cmd: %s buf_len: %llu > %d\n", __func__,
dimm_name, cmd_name, buf_len,
ND_IOCTL_MAX_BUFLEN);
return -EINVAL;


2017-09-18 09:56:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 23/52] idr: remove WARN_ON_ONCE() when trying to replace negative ID

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <[email protected]>

commit a47f68d6a944113bdc8097db6f933c2e17c27bf9 upstream.

IDR only supports non-negative IDs. There used to be a 'WARN_ON_ONCE(id <
0)' in idr_replace(), but it was intentionally removed by commit
2e1c9b286765 ("idr: remove WARN_ON_ONCE() on negative IDs").

Then it was added back by commit 0a835c4f090a ("Reimplement IDR and IDA
using the radix tree"). However it seems that adding it back was a
mistake, given that some users such as drm_gem_handle_delete()
(DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
allowing the WARN_ON_ONCE to be triggered. drm_gem_handle_delete()
actually just wants idr_replace() to return an error code if the ID is
not allocated, including in the case where the ID is invalid (negative).

So once again remove the bogus WARN_ON_ONCE().

This bug was found by syzkaller, which encountered the following
warning:

WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
Kernel panic - not syncing: panic_on_warn set ...

CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe

Here is a C reproducer:

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>

int main(void)
{
        int cardfd = open("/dev/dri/card0", O_RDONLY);

        ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
              &(struct drm_gem_close) { .handle = -1 } );
}

Link: http://lkml.kernel.org/r/[email protected]
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Signed-off-by: Eric Biggers <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
lib/idr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/lib/idr.c
+++ b/lib/idr.c
@@ -154,7 +154,7 @@ void *idr_replace(struct idr *idr, void
void __rcu **slot = NULL;
void *entry;

- if (WARN_ON_ONCE(id < 0))
+ if (id < 0)
return ERR_PTR(-EINVAL);
if (WARN_ON_ONCE(radix_tree_is_internal_node(ptr)))
return ERR_PTR(-EINVAL);


2017-09-18 09:11:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 32/52] xfs: fix log recovery corruption error due to tail overwrite

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 4a4f66eac4681378996a1837ad1ffec3a2e2981f upstream.

If we consider the case where the tail (T) of the log is pinned long
enough for the head (H) to push and block behind the tail, we can
end up blocked in the following state without enough free space (f)
in the log to satisfy a transaction reservation:

0 phys. log N
[-------HffT---H'--T'---]

The last good record in the log (before H) refers to T. The tail
eventually pushes forward (T') leaving more free space in the log
for writes to H. At this point, suppose space frees up in the log
for the maximum of 8 in-core log buffers to start flushing out to
the log. If this pushes the head from H to H', these next writes
overwrite the previous tail T. This is safe because the items logged
from T to T' have been written back and removed from the AIL.

If the next log writes (H -> H') happen to fail and result in
partial records in the log, the filesystem shuts down having
overwritten T with invalid data. Log recovery correctly locates H on
the subsequent mount, but H still refers to the now corrupted tail
T. This results in log corruption errors and recovery failure.

Since the tail overwrite results from otherwise correct runtime
behavior, it is up to log recovery to try and deal with this
situation. Update log recovery tail verification to run a CRC pass
from the first record past the tail to the head. This facilitates
error detection at T and moves the recovery tail to the first good
record past H' (similar to truncating the head on torn write
detection). If corruption is detected beyond the range possibly
affected by the max number of iclogs, the log is legitimately
corrupted and log recovery failure is expected.
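
Condensed from the hunks below, the new tail verification loop has this
shape:

        first_bad = 0;
        error = xlog_do_recovery_pass(log, head_blk, *tail_blk,
                                      XLOG_RECOVER_CRCPASS, &first_bad);
        while (error == -EFSBADCRC && first_bad) {
                /* out of range of a possible head overwrite? give up */
                tail_distance = xlog_tail_distance(log, head_blk, first_bad);
                if (tail_distance > BTOBB(XLOG_MAX_ICLOGS * hsize))
                        break;

                /* walk the tail forward to the next record and re-verify */
        }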

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_log_recover.c | 108 +++++++++++++++++++++++++++++++++--------------
1 file changed, 77 insertions(+), 31 deletions(-)

--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1029,61 +1029,106 @@ out_error:
}

/*
- * Check the log tail for torn writes. This is required when torn writes are
- * detected at the head and the head had to be walked back to a previous record.
- * The tail of the previous record must now be verified to ensure the torn
- * writes didn't corrupt the previous tail.
+ * Calculate distance from head to tail (i.e., unused space in the log).
+ */
+static inline int
+xlog_tail_distance(
+ struct xlog *log,
+ xfs_daddr_t head_blk,
+ xfs_daddr_t tail_blk)
+{
+ if (head_blk < tail_blk)
+ return tail_blk - head_blk;
+
+ return tail_blk + (log->l_logBBsize - head_blk);
+}
+
+/*
+ * Verify the log tail. This is particularly important when torn or incomplete
+ * writes have been detected near the front of the log and the head has been
+ * walked back accordingly.
+ *
+ * We also have to handle the case where the tail was pinned and the head
+ * blocked behind the tail right before a crash. If the tail had been pushed
+ * immediately prior to the crash and the subsequent checkpoint was only
+ * partially written, it's possible it overwrote the last referenced tail in the
+ * log with garbage. This is not a coherency problem because the tail must have
+ * been pushed before it can be overwritten, but appears as log corruption to
+ * recovery because we have no way to know the tail was updated if the
+ * subsequent checkpoint didn't write successfully.
*
- * Return an error if CRC verification fails as recovery cannot proceed.
+ * Therefore, CRC check the log from tail to head. If a failure occurs and the
+ * offending record is within max iclog bufs from the head, walk the tail
+ * forward and retry until a valid tail is found or corruption is detected out
+ * of the range of a possible overwrite.
*/
STATIC int
xlog_verify_tail(
struct xlog *log,
xfs_daddr_t head_blk,
- xfs_daddr_t tail_blk)
+ xfs_daddr_t *tail_blk,
+ int hsize)
{
struct xlog_rec_header *thead;
struct xfs_buf *bp;
xfs_daddr_t first_bad;
- int count;
int error = 0;
bool wrapped;
- xfs_daddr_t tmp_head;
+ xfs_daddr_t tmp_tail;
+ xfs_daddr_t orig_tail = *tail_blk;

bp = xlog_get_bp(log, 1);
if (!bp)
return -ENOMEM;

/*
- * Seek XLOG_MAX_ICLOGS + 1 records past the current tail record to get
- * a temporary head block that points after the last possible
- * concurrently written record of the tail.
+ * Make sure the tail points to a record (returns positive count on
+ * success).
*/
- count = xlog_seek_logrec_hdr(log, head_blk, tail_blk,
- XLOG_MAX_ICLOGS + 1, bp, &tmp_head, &thead,
- &wrapped);
- if (count < 0) {
- error = count;
+ error = xlog_seek_logrec_hdr(log, head_blk, *tail_blk, 1, bp,
+ &tmp_tail, &thead, &wrapped);
+ if (error < 0)
goto out;
- }
+ if (*tail_blk != tmp_tail)
+ *tail_blk = tmp_tail;

/*
- * If the call above didn't find XLOG_MAX_ICLOGS + 1 records, we ran
- * into the actual log head. tmp_head points to the start of the record
- * so update it to the actual head block.
+ * Run a CRC check from the tail to the head. We can't just check
+ * MAX_ICLOGS records past the tail because the tail may point to stale
+ * blocks cleared during the search for the head/tail. These blocks are
+ * overwritten with zero-length records and thus record count is not a
+ * reliable indicator of the iclog state before a crash.
*/
- if (count < XLOG_MAX_ICLOGS + 1)
- tmp_head = head_blk;
-
- /*
- * We now have a tail and temporary head block that covers at least
- * XLOG_MAX_ICLOGS records from the tail. We need to verify that these
- * records were completely written. Run a CRC verification pass from
- * tail to head and return the result.
- */
- error = xlog_do_recovery_pass(log, tmp_head, tail_blk,
+ first_bad = 0;
+ error = xlog_do_recovery_pass(log, head_blk, *tail_blk,
XLOG_RECOVER_CRCPASS, &first_bad);
+ while (error == -EFSBADCRC && first_bad) {
+ int tail_distance;
+
+ /*
+ * Is corruption within range of the head? If so, retry from
+ * the next record. Otherwise return an error.
+ */
+ tail_distance = xlog_tail_distance(log, head_blk, first_bad);
+ if (tail_distance > BTOBB(XLOG_MAX_ICLOGS * hsize))
+ break;
+
+ /* skip to the next record; returns positive count on success */
+ error = xlog_seek_logrec_hdr(log, head_blk, first_bad, 2, bp,
+ &tmp_tail, &thead, &wrapped);
+ if (error < 0)
+ goto out;
+
+ *tail_blk = tmp_tail;
+ first_bad = 0;
+ error = xlog_do_recovery_pass(log, head_blk, *tail_blk,
+ XLOG_RECOVER_CRCPASS, &first_bad);
+ }

+ if (!error && *tail_blk != orig_tail)
+ xfs_warn(log->l_mp,
+ "Tail block (0x%llx) overwrite detected. Updated to 0x%llx",
+ orig_tail, *tail_blk);
out:
xlog_put_bp(bp);
return error;
@@ -1187,7 +1232,8 @@ xlog_verify_head(
if (error)
return error;

- return xlog_verify_tail(log, *head_blk, *tail_blk);
+ return xlog_verify_tail(log, *head_blk, tail_blk,
+ be32_to_cpu((*rhead)->h_size));
}

/*


2017-09-18 09:57:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 31/52] xfs: always verify the log tail during recovery

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Brian Foster <[email protected]>

commit 5297ac1f6d7cbf45464a49b9558831f271dfc559 upstream.

Log tail verification currently only occurs when torn writes are
detected at the head of the log. This was introduced because a
change in the head block due to torn writes can lead to a change in
the tail block (each log record header references the current tail)
and the tail block should be verified before log recovery proceeds.

Tail corruption is possible outside of torn write scenarios,
however. For example, partial log writes can be detected and cleared
during the initial head/tail block discovery process. If the partial
write coincides with a tail overwrite, the log tail is corrupted and
recovery fails.

To facilitate correct handling of log tail overwites, update log
recovery to always perform tail verification. This is necessary to
detect potential tail overwrite conditions when torn writes may not
have occurred. This changes normal (i.e., no torn writes) recovery
behavior slightly to detect and return CRC related errors near the
tail before actual recovery starts.

Signed-off-by: Brian Foster <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_log_recover.c | 26 +++-----------------------
1 file changed, 3 insertions(+), 23 deletions(-)

--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -1183,31 +1183,11 @@ xlog_verify_head(
ASSERT(0);
return 0;
}
-
- /*
- * Now verify the tail based on the updated head. This is
- * required because the torn writes trimmed from the head could
- * have been written over the tail of a previous record. Return
- * any errors since recovery cannot proceed if the tail is
- * corrupt.
- *
- * XXX: This leaves a gap in truly robust protection from torn
- * writes in the log. If the head is behind the tail, the tail
- * pushes forward to create some space and then a crash occurs
- * causing the writes into the previous record's tail region to
- * tear, log recovery isn't able to recover.
- *
- * How likely is this to occur? If possible, can we do something
- * more intelligent here? Is it safe to push the tail forward if
- * we can determine that the tail is within the range of the
- * torn write (e.g., the kernel can only overwrite the tail if
- * it has actually been pushed forward)? Alternatively, could we
- * somehow prevent this condition at runtime?
- */
- error = xlog_verify_tail(log, *head_blk, *tail_blk);
}
+ if (error)
+ return error;

- return error;
+ return xlog_verify_tail(log, *head_blk, *tail_blk);
}

/*


2017-09-18 09:57:56

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 29/52] xfs: Properly retry failed inode items in case of error during buffer writeback

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Carlos Maiolino <[email protected]>

commit d3a304b6292168b83b45d624784f973fdc1ca674 upstream.

When a buffer has failed during writeback, the inode items in it are
kept flush locked and are never resubmitted due to the flush lock, so if
any buffer fails to be written, the items in the AIL are never written
to disk and never unlocked.

This causes the unmount operation to hang due to these flush-locked
items in the AIL, but it also means the items in the AIL are never
written back, even when the IO device comes back to normal.

I've been testing this patch with a DM-thin device, creating a
filesystem larger than the real device.

When writing enough data to fill the DM-thin device, XFS receives ENOSPC
errors from the device and keeps spinning on xfsaild (when the 'retry
forever' configuration is set).

At this point, the filesystem cannot be unmounted because of the
flush-locked items in the AIL, but worse, the items in the AIL are never
retried at all (since xfs_inode_item_push() will skip items that are
flush locked), even if the underlying DM-thin device is expanded to the
proper size.

This patch fixes both cases, retrying any item that has failed
previously, using the infrastructure provided by the previous patch.
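
The heart of the retry, condensed from the xfs_inode_item_push() hunk
below: failed items short-circuit straight to a buffer resubmit instead
of being skipped for holding the flush lock.

        if (lip->li_flags & XFS_LI_FAILED) {
                if (!xfs_buf_trylock(bp))
                        return XFS_ITEM_LOCKED;

                /* clear XFS_LI_FAILED and re-queue for delayed write */
                if (!xfs_buf_resubmit_failed_buffers(bp, lip, buffer_list))
                        rval = XFS_ITEM_FLUSHING;

                xfs_buf_unlock(bp);
                return rval;
        }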

Reviewed-by: Brian Foster <[email protected]>
Signed-off-by: Carlos Maiolino <[email protected]>
Reviewed-by: Darrick J. Wong <[email protected]>
Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
fs/xfs/xfs_buf_item.c | 28 ++++++++++++++++++++++++++++
fs/xfs/xfs_buf_item.h | 3 +++
fs/xfs/xfs_inode_item.c | 47 +++++++++++++++++++++++++++++++++++++++++++----
fs/xfs/xfs_trans.h | 1 +
fs/xfs/xfs_trans_ail.c | 3 ++-
fs/xfs/xfs_trans_priv.h | 31 +++++++++++++++++++++++++++++++
6 files changed, 108 insertions(+), 5 deletions(-)

--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -1234,3 +1234,31 @@ xfs_buf_iodone(
xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
xfs_buf_item_free(BUF_ITEM(lip));
}
+
+/*
+ * Requeue a failed buffer for writeback
+ *
+ * Return true if the buffer has been re-queued properly, false otherwise
+ */
+bool
+xfs_buf_resubmit_failed_buffers(
+ struct xfs_buf *bp,
+ struct xfs_log_item *lip,
+ struct list_head *buffer_list)
+{
+ struct xfs_log_item *next;
+
+ /*
+ * Clear XFS_LI_FAILED flag from all items before resubmit
+ *
+ * XFS_LI_FAILED set/clear is protected by xa_lock; the caller of this
+ * function must already have it acquired
+ */
+ for (; lip; lip = next) {
+ next = lip->li_bio_list;
+ xfs_clear_li_failed(lip);
+ }
+
+ /* Add this buffer back to the delayed write list */
+ return xfs_buf_delwri_queue(bp, buffer_list);
+}
--- a/fs/xfs/xfs_buf_item.h
+++ b/fs/xfs/xfs_buf_item.h
@@ -70,6 +70,9 @@ void xfs_buf_attach_iodone(struct xfs_bu
xfs_log_item_t *);
void xfs_buf_iodone_callbacks(struct xfs_buf *);
void xfs_buf_iodone(struct xfs_buf *, struct xfs_log_item *);
+bool xfs_buf_resubmit_failed_buffers(struct xfs_buf *,
+ struct xfs_log_item *,
+ struct list_head *);

extern kmem_zone_t *xfs_buf_item_zone;

--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -27,6 +27,7 @@
#include "xfs_error.h"
#include "xfs_trace.h"
#include "xfs_trans_priv.h"
+#include "xfs_buf_item.h"
#include "xfs_log.h"


@@ -475,6 +476,23 @@ xfs_inode_item_unpin(
wake_up_bit(&ip->i_flags, __XFS_IPINNED_BIT);
}

+/*
+ * Callback used to mark a buffer with XFS_LI_FAILED when items in the buffer
+ * have been failed during writeback
+ *
+ * This informs the AIL that the inode is already flush locked on the next push,
+ * and acquires a hold on the buffer to ensure that it isn't reclaimed before
+ * dirty data makes it to disk.
+ */
+STATIC void
+xfs_inode_item_error(
+ struct xfs_log_item *lip,
+ struct xfs_buf *bp)
+{
+ ASSERT(xfs_isiflocked(INODE_ITEM(lip)->ili_inode));
+ xfs_set_li_failed(lip, bp);
+}
+
STATIC uint
xfs_inode_item_push(
struct xfs_log_item *lip,
@@ -484,13 +502,28 @@ xfs_inode_item_push(
{
struct xfs_inode_log_item *iip = INODE_ITEM(lip);
struct xfs_inode *ip = iip->ili_inode;
- struct xfs_buf *bp = NULL;
+ struct xfs_buf *bp = lip->li_buf;
uint rval = XFS_ITEM_SUCCESS;
int error;

if (xfs_ipincount(ip) > 0)
return XFS_ITEM_PINNED;

+ /*
+ * The buffer containing this item failed to be written back
+ * previously. Resubmit the buffer for IO.
+ */
+ if (lip->li_flags & XFS_LI_FAILED) {
+ if (!xfs_buf_trylock(bp))
+ return XFS_ITEM_LOCKED;
+
+ if (!xfs_buf_resubmit_failed_buffers(bp, lip, buffer_list))
+ rval = XFS_ITEM_FLUSHING;
+
+ xfs_buf_unlock(bp);
+ return rval;
+ }
+
if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
return XFS_ITEM_LOCKED;

@@ -622,7 +655,8 @@ static const struct xfs_item_ops xfs_ino
.iop_unlock = xfs_inode_item_unlock,
.iop_committed = xfs_inode_item_committed,
.iop_push = xfs_inode_item_push,
- .iop_committing = xfs_inode_item_committing
+ .iop_committing = xfs_inode_item_committing,
+ .iop_error = xfs_inode_item_error
};


@@ -710,7 +744,8 @@ xfs_iflush_done(
* the AIL lock.
*/
iip = INODE_ITEM(blip);
- if (iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn)
+ if ((iip->ili_logged && blip->li_lsn == iip->ili_flush_lsn) ||
+ lip->li_flags & XFS_LI_FAILED)
need_ail++;

blip = next;
@@ -718,7 +753,8 @@ xfs_iflush_done(

/* make sure we capture the state of the initial inode. */
iip = INODE_ITEM(lip);
- if (iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn)
+ if ((iip->ili_logged && lip->li_lsn == iip->ili_flush_lsn) ||
+ lip->li_flags & XFS_LI_FAILED)
need_ail++;

/*
@@ -739,6 +775,9 @@ xfs_iflush_done(
if (INODE_ITEM(blip)->ili_logged &&
blip->li_lsn == INODE_ITEM(blip)->ili_flush_lsn)
mlip_changed |= xfs_ail_delete_one(ailp, blip);
+ else {
+ xfs_clear_li_failed(blip);
+ }
}

if (mlip_changed) {
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -49,6 +49,7 @@ typedef struct xfs_log_item {
struct xfs_ail *li_ailp; /* ptr to AIL */
uint li_type; /* item type */
uint li_flags; /* misc flags */
+ struct xfs_buf *li_buf; /* real buffer pointer */
struct xfs_log_item *li_bio_list; /* buffer item list */
void (*li_cb)(struct xfs_buf *,
struct xfs_log_item *);
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -687,12 +687,13 @@ xfs_trans_ail_update_bulk(
bool
xfs_ail_delete_one(
struct xfs_ail *ailp,
- struct xfs_log_item *lip)
+ struct xfs_log_item *lip)
{
struct xfs_log_item *mlip = xfs_ail_min(ailp);

trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn);
xfs_ail_delete(ailp, lip);
+ xfs_clear_li_failed(lip);
lip->li_flags &= ~XFS_LI_IN_AIL;
lip->li_lsn = 0;

--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -164,4 +164,35 @@ xfs_trans_ail_copy_lsn(
*dst = *src;
}
#endif
+
+static inline void
+xfs_clear_li_failed(
+ struct xfs_log_item *lip)
+{
+ struct xfs_buf *bp = lip->li_buf;
+
+ ASSERT(lip->li_flags & XFS_LI_IN_AIL);
+ lockdep_assert_held(&lip->li_ailp->xa_lock);
+
+ if (lip->li_flags & XFS_LI_FAILED) {
+ lip->li_flags &= ~XFS_LI_FAILED;
+ lip->li_buf = NULL;
+ xfs_buf_rele(bp);
+ }
+}
+
+static inline void
+xfs_set_li_failed(
+ struct xfs_log_item *lip,
+ struct xfs_buf *bp)
+{
+ lockdep_assert_held(&lip->li_ailp->xa_lock);
+
+ if (!(lip->li_flags & XFS_LI_FAILED)) {
+ xfs_buf_hold(bp);
+ lip->li_flags |= XFS_LI_FAILED;
+ lip->li_buf = bp;
+ }
+}
+
#endif /* __XFS_TRANS_PRIV_H__ */


2017-09-18 09:58:26

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 13/52] f2fs: check hot_data for roll-forward recovery

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jaegeuk Kim <[email protected]>

commit 125c9fb1ccb53eb2ea9380df40f3c743f3fb2fed upstream.

We need to check HOT_DATA to truncate any previous data block when doing
roll-forward recovery.

Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
fs/f2fs/recovery.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -291,7 +291,7 @@ static int check_index_in_prev_nodes(str
return 0;

/* Get the previous summary */
- for (i = CURSEG_WARM_DATA; i <= CURSEG_COLD_DATA; i++) {
+ for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
struct curseg_info *curseg = CURSEG_I(sbi, i);
if (curseg->segno == segno) {
sum = curseg->sum_blk->entries[blkoff];


2017-09-18 09:59:04

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 10/52] tcp: fix a request socket leak

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 1f3b359f1004bd34b7b0bad70b93e3c7af92a37b ]

While the cited commit fixed a possible deadlock, it added a leak
of the request socket, since reqsk_put() must be called if the BPF
filter decided the ACK packet must be dropped.
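
The fix takes the same shape in both the IPv4 and IPv6 paths (condensed
from the hunks below):

        nsk = NULL;
        if (!tcp_filter(sk, skb))
                nsk = tcp_check_req(sk, skb, req, false);
        if (!nsk) {
                /* the request ref is now dropped on the BPF-drop path too */
                reqsk_put(req);
                goto discard_and_relse;
        }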

Fixes: d624d276d1dd ("tcp: fix possible deadlock in TCP stack vs BPF filter")
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv4/tcp_ipv4.c | 6 +++---
net/ipv6/tcp_ipv6.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)

--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1722,9 +1722,9 @@ process:
*/
sock_hold(sk);
refcounted = true;
- if (tcp_filter(sk, skb))
- goto discard_and_relse;
- nsk = tcp_check_req(sk, skb, req, false);
+ nsk = NULL;
+ if (!tcp_filter(sk, skb))
+ nsk = tcp_check_req(sk, skb, req, false);
if (!nsk) {
reqsk_put(req);
goto discard_and_relse;
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1456,9 +1456,9 @@ process:
}
sock_hold(sk);
refcounted = true;
- if (tcp_filter(sk, skb))
- goto discard_and_relse;
- nsk = tcp_check_req(sk, skb, req, false);
+ nsk = NULL;
+ if (!tcp_filter(sk, skb))
+ nsk = tcp_check_req(sk, skb, req, false);
if (!nsk) {
reqsk_put(req);
goto discard_and_relse;


2017-09-18 09:10:43

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 08/52] ipv6: fix typo in fib6_net_exit()

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <[email protected]>


[ Upstream commit 32a805baf0fb70b6dbedefcd7249ac7f580f9e3b ]

IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.

Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns destruction")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/ipv6/ip6_fib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1926,7 +1926,7 @@ static void fib6_net_exit(struct net *ne
rt6_ifdown(net, NULL);
del_timer_sync(&net->ipv6.ip6_fib_timer);

- for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
+ for (i = 0; i < FIB6_TABLE_HASHSZ; i++) {
struct hlist_head *head = &net->ipv6.fib_table_hash[i];
struct hlist_node *tmp;
struct fib6_table *tb;


2017-09-18 09:59:49

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 09/52] sctp: fix missing wake ups in some situations

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Marcelo Ricardo Leitner <[email protected]>


[ Upstream commit 7906b00f5cd1cd484fced7fcda892176e3202c8a ]

Commit fb586f25300f ("sctp: delay calls to sk_data_ready() as much as
possible") minimized the number of wake ups that are triggered in case
the association receives a packet with multiple data chunks on it and/or
when io_events are enabled and then commit 0970f5b36659 ("sctp: signal
sk_data_ready earlier on data chunks reception") moved the wake up to as
soon as possible. It thus relies on the state machine running later to
clean the flag that the event was already generated.

The issue is that there are 2 call paths that call
sctp_ulpq_tail_event() outside of the state machine, causing the flag to
linger and possibly omitting a needed wake up in the sequence.

One of the call paths is when enabling SCTP_SENDER_DRY_EVENTS via
setsockopt(SCTP_EVENTS), as noticed by Harald Welte. The other is when
partial reliability triggers removal of chunks from the send queue when
the application calls sendmsg().

This commit fixes it by not setting the flag in case the socket is
owned by the user, as it won't be cleaned later. This works for
user-initiated calls and also for rx path processing.
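
Condensed from the hunk below, the wake up still fires, but the flag is
only latched when the socket is not owned by the user:

        if (queue == &sk->sk_receive_queue && !sp->data_ready_signalled) {
                if (!sock_owned_by_user(sk))
                        sp->data_ready_signalled = 1;
                sk->sk_data_ready(sk);
        }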

Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
Reported-by: Harald Welte <[email protected]>
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
net/sctp/ulpqueue.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

--- a/net/sctp/ulpqueue.c
+++ b/net/sctp/ulpqueue.c
@@ -265,7 +265,8 @@ int sctp_ulpq_tail_event(struct sctp_ulp
sctp_ulpq_clear_pd(ulpq);

if (queue == &sk->sk_receive_queue && !sp->data_ready_signalled) {
- sp->data_ready_signalled = 1;
+ if (!sock_owned_by_user(sk))
+ sp->data_ready_signalled = 1;
sk->sk_data_ready(sk);
}
return 1;


2017-09-18 10:00:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 17/52] x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <[email protected]>

commit 767d035d838f4fd6b5a5bbd7a3f6d293b7f65a49 upstream.

execve used to leak FSBASE and GSBASE on AMD CPUs. Fix it.

The security impact of this bug is small but not quite zero -- it
could weaken ASLR when a privileged task execs a less privileged
program, but only if program changed bitness across the exec, or the
child binary was highly unusual or actively malicious. A child
program that was compromised after the exec would not have access to
the leaked base.
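
The fix, condensed from the hunk below: on CPUs with the NULL segment
bug, write a real selector first so the stale base is replaced, then
load the zero selector as before.

        if (static_cpu_has(X86_BUG_NULL_SEG)) {
                /* Loading zero below won't clear the base. */
                loadsegment(fs, __USER_DS);
                load_gs_index(__USER_DS);
        }

        loadsegment(fs, 0);
        load_gs_index(0);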

Signed-off-by: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Chang Seok <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

---
arch/x86/kernel/process_64.c | 9 +++++++++
1 file changed, 9 insertions(+)

--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -229,10 +229,19 @@ start_thread_common(struct pt_regs *regs
unsigned long new_sp,
unsigned int _cs, unsigned int _ss, unsigned int _ds)
{
+ WARN_ON_ONCE(regs != current_pt_regs());
+
+ if (static_cpu_has(X86_BUG_NULL_SEG)) {
+ /* Loading zero below won't clear the base. */
+ loadsegment(fs, __USER_DS);
+ load_gs_index(__USER_DS);
+ }
+
loadsegment(fs, 0);
loadsegment(es, _ds);
loadsegment(ds, _ds);
load_gs_index(0);
+
regs->ip = new_ip;
regs->sp = new_sp;
regs->cs = _cs;


2017-09-18 10:01:32

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 06/52] udp: drop head states only when all skb references are gone

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Abeni <[email protected]>


[ Upstream commit ca2c1418efe9f7fe37aa1f355efdf4eb293673ce ]

After commit 0ddf3fb2c43d ("udp: preserve skb->dst if required
for IP options processing") we clear the skb head state as soon
as the skb carrying them is first processed.

Since the same skb can be processed several times when MSG_PEEK
is used, we can end up lacking the required head states, and
eventually oopsing.

Fix this by clearing the skb head state only when processing the
last skb reference.
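
Condensed from the skb_consume_udp() hunk below, the release is now
gated on the reference count:

        if (!skb_unref(skb))
                return;         /* MSG_PEEK users still hold references */

        /* last reference: safe to drop the head states and free */
        if (unlikely(udp_skb_has_head_state(skb)))
                skb_release_head_state(skb);
        __consume_stateless_skb(skb);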

Reported-by: Eric Dumazet <[email protected]>
Fixes: 0ddf3fb2c43d ("udp: preserve skb->dst if required for IP options processing")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/linux/skbuff.h | 2 +-
net/core/skbuff.c | 9 +++------
net/ipv4/udp.c | 5 ++++-
3 files changed, 8 insertions(+), 8 deletions(-)

--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -885,7 +885,7 @@ void kfree_skb(struct sk_buff *skb);
void kfree_skb_list(struct sk_buff *segs);
void skb_tx_error(struct sk_buff *skb);
void consume_skb(struct sk_buff *skb);
-void consume_stateless_skb(struct sk_buff *skb);
+void __consume_stateless_skb(struct sk_buff *skb);
void __kfree_skb(struct sk_buff *skb);
extern struct kmem_cache *skbuff_head_cache;

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -753,14 +753,11 @@ EXPORT_SYMBOL(consume_skb);
* consume_stateless_skb - free an skbuff, assuming it is stateless
* @skb: buffer to free
*
- * Works like consume_skb(), but this variant assumes that all the head
- * states have been already dropped.
+ * Alike consume_skb(), but this variant assumes that this is the last
+ * skb reference and all the head states have been already dropped
*/
-void consume_stateless_skb(struct sk_buff *skb)
+void __consume_stateless_skb(struct sk_buff *skb)
{
- if (!skb_unref(skb))
- return;
-
trace_consume_skb(skb);
if (likely(skb->head))
skb_release_data(skb);
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1386,12 +1386,15 @@ void skb_consume_udp(struct sock *sk, st
unlock_sock_fast(sk, slow);
}

+ if (!skb_unref(skb))
+ return;
+
/* In the more common cases we cleared the head states previously,
* see __udp_queue_rcv_skb().
*/
if (unlikely(udp_skb_has_head_state(skb)))
skb_release_head_state(skb);
- consume_stateless_skb(skb);
+ __consume_stateless_skb(skb);
}
EXPORT_SYMBOL_GPL(skb_consume_udp);



2017-09-18 09:10:21

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 03/52] gianfar: Fix Tx flow control deactivation

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Claudiu Manoil <[email protected]>


[ Upstream commit 5d621672bc1a1e5090c1ac5432a18c79e0e13e03 ]

The wrong register is checked for the Tx flow control bit; it should
have been maccfg1, not maccfg2.
This went unnoticed for so long probably because the impact is
hardly visible, not to mention the tangled code from adjust_link().
First, link flow control (i.e. handling of Rx/Tx link-level pause frames)
is disabled by default (it needs to be enabled via 'ethtool -A').
Secondly, maccfg2 always returns 0 for tx_flow_oldval (except on a few
old boards), which results in Tx flow control remaining always on
once activated.

Fixes: 45b679c9a3ccd9e34f28e6ec677b812a860eb8eb ("gianfar: Implement PAUSE frame generation support")
Signed-off-by: Claudiu Manoil <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
drivers/net/ethernet/freescale/gianfar.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/net/ethernet/freescale/gianfar.c
+++ b/drivers/net/ethernet/freescale/gianfar.c
@@ -3687,7 +3687,7 @@ static noinline void gfar_update_link_st
u32 tempval1 = gfar_read(&regs->maccfg1);
u32 tempval = gfar_read(&regs->maccfg2);
u32 ecntrl = gfar_read(&regs->ecntrl);
- u32 tx_flow_oldval = (tempval & MACCFG1_TX_FLOW);
+ u32 tx_flow_oldval = (tempval1 & MACCFG1_TX_FLOW);

if (phydev->duplex != priv->oldduplex) {
if (!(phydev->duplex))


2017-09-18 10:02:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: [PATCH 4.13 02/52] Revert "net: fix percpu memory leaks"

4.13-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jesper Dangaard Brouer <[email protected]>


[ Upstream commit 5a63643e583b6a9789d7a225ae076fb4e603991c ]

This reverts commit 1d6119baf0610f813eb9d9580eb4fd16de5b4ceb.

After reverting commit 6d7b857d541e ("net: use lib/percpu_counter API
for fragmentation mem accounting"), there is no need for this
fix-up patch. As percpu_counter is no longer used, it can no longer
leak memory.

Fixes: 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting")
Fixes: 1d6119baf061 ("net: fix percpu memory leaks")
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
---
include/net/inet_frag.h | 7 +------
net/ieee802154/6lowpan/reassembly.c | 11 +++--------
net/ipv4/ip_fragment.c | 12 +++---------
net/ipv6/netfilter/nf_conntrack_reasm.c | 12 +++---------
net/ipv6/reassembly.c | 12 +++---------
5 files changed, 13 insertions(+), 41 deletions(-)

--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -103,15 +103,10 @@ struct inet_frags {
int inet_frags_init(struct inet_frags *);
void inet_frags_fini(struct inet_frags *);

-static inline int inet_frags_init_net(struct netns_frags *nf)
+static inline void inet_frags_init_net(struct netns_frags *nf)
{
atomic_set(&nf->mem, 0);
- return 0;
}
-static inline void inet_frags_uninit_net(struct netns_frags *nf)
-{
-}
-
void inet_frags_exit_net(struct netns_frags *nf, struct inet_frags *f);

void inet_frag_kill(struct inet_frag_queue *q, struct inet_frags *f);
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -580,19 +580,14 @@ static int __net_init lowpan_frags_init_
{
struct netns_ieee802154_lowpan *ieee802154_lowpan =
net_ieee802154_lowpan(net);
- int res;

ieee802154_lowpan->frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
ieee802154_lowpan->frags.low_thresh = IPV6_FRAG_LOW_THRESH;
ieee802154_lowpan->frags.timeout = IPV6_FRAG_TIMEOUT;

- res = inet_frags_init_net(&ieee802154_lowpan->frags);
- if (res)
- return res;
- res = lowpan_frags_ns_sysctl_register(net);
- if (res)
- inet_frags_uninit_net(&ieee802154_lowpan->frags);
- return res;
+ inet_frags_init_net(&ieee802154_lowpan->frags);
+
+ return lowpan_frags_ns_sysctl_register(net);
}

static void __net_exit lowpan_frags_exit_net(struct net *net)
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -844,8 +844,6 @@ static void __init ip4_frags_ctl_registe

static int __net_init ipv4_frags_init_net(struct net *net)
{
- int res;
-
/* Fragment cache limits.
*
* The fragment memory accounting code, (tries to) account for
@@ -871,13 +869,9 @@ static int __net_init ipv4_frags_init_ne

net->ipv4.frags.max_dist = 64;

- res = inet_frags_init_net(&net->ipv4.frags);
- if (res)
- return res;
- res = ip4_frags_ns_ctl_register(net);
- if (res)
- inet_frags_uninit_net(&net->ipv4.frags);
- return res;
+ inet_frags_init_net(&net->ipv4.frags);
+
+ return ip4_frags_ns_ctl_register(net);
}

static void __net_exit ipv4_frags_exit_net(struct net *net)
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -622,18 +622,12 @@ EXPORT_SYMBOL_GPL(nf_ct_frag6_gather);

static int nf_ct_net_init(struct net *net)
{
- int res;
-
net->nf_frag.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
net->nf_frag.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
net->nf_frag.frags.timeout = IPV6_FRAG_TIMEOUT;
- res = inet_frags_init_net(&net->nf_frag.frags);
- if (res)
- return res;
- res = nf_ct_frag6_sysctl_register(net);
- if (res)
- inet_frags_uninit_net(&net->nf_frag.frags);
- return res;
+ inet_frags_init_net(&net->nf_frag.frags);
+
+ return nf_ct_frag6_sysctl_register(net);
}

static void nf_ct_net_exit(struct net *net)
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -714,19 +714,13 @@ static void ip6_frags_sysctl_unregister(

static int __net_init ipv6_frags_init_net(struct net *net)
{
- int res;
-
net->ipv6.frags.high_thresh = IPV6_FRAG_HIGH_THRESH;
net->ipv6.frags.low_thresh = IPV6_FRAG_LOW_THRESH;
net->ipv6.frags.timeout = IPV6_FRAG_TIMEOUT;

- res = inet_frags_init_net(&net->ipv6.frags);
- if (res)
- return res;
- res = ip6_frags_ns_sysctl_register(net);
- if (res)
- inet_frags_uninit_net(&net->ipv6.frags);
- return res;
+ inet_frags_init_net(&net->ipv6.frags);
+
+ return ip6_frags_ns_sysctl_register(net);
}

static void __net_exit ipv6_frags_exit_net(struct net *net)


2017-09-18 19:29:16

by Guenter Roeck

[permalink] [raw]
Subject: Re: [PATCH 4.13 00/52] 4.13.3-stable review

On Mon, Sep 18, 2017 at 11:09:28AM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.13.3 release.
> There are 52 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Sep 20 09:08:28 UTC 2017.
> Anything received after that time might be too late.
>

Build results:
total: 145 pass: 145 fail: 0
Qemu test results:
total: 122 pass: 122 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

2017-09-18 20:17:53

by Shuah Khan

[permalink] [raw]
Subject: Re: [PATCH 4.13 00/52] 4.13.3-stable review

On 09/18/2017 03:09 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.13.3 release.
> There are 52 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Sep 20 09:08:28 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.13.3-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.13.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

2017-09-19 06:33:20

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH 4.13 00/52] 4.13.3-stable review

On Mon, Sep 18, 2017 at 02:17:49PM -0600, Shuah Khan wrote:
> On 09/18/2017 03:09 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.13.3 release.
> > There are 52 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Wed Sep 20 09:08:28 UTC 2017.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.13.3-rc1.gz
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.13.y
> > and the diffstat can be found below.
> >
> > thanks,
> >
> > greg k-h
> >
>
> Compiled and booted on my test system. No dmesg regressions.

Thanks for testing all of these and letting me know.

greg k-h