Hi all,
While fuzzing with trinity inside a KVM tools guest running -next I've stumbled on the following:
[64092.216447] ==================================================================
[64092.217840] BUG: KASan: out of bounds on stack in iov_iter_advance+0x3b7/0x480 at addr ffff88040506fd48
[64092.219314] Read of size 8 by task trinity-c194/11387
[64092.220114] page:ffffea0010141bc0 count:0 mapcount:0 mapping: (null) index:0x2
[64092.221354] flags: 0x46fffff80000000()
[64092.221998] page dumped because: kasan: bad access detected
[64092.222879] CPU: 4 PID: 11387 Comm: trinity-c194 Not tainted 4.2.0-rc6-next-20150810-sasha-00040-g12ad0db3-dirty #2427
[64092.224537] ffff88040506fd30 ffff88040506fa88 ffffffff9ce7763b ffff88040506fb10
[64092.225763] ffff88040506fb00 ffffffff9376b1be 0000000000000000 ffff880270108600
[64092.226992] 0000000000000282 0000000000000000 0000000000000000 0000000000000000
[64092.228221] Call Trace:
[64092.228679] dump_stack (lib/dump_stack.c:52)
[64092.231252] kasan_report_error (mm/kasan/report.c:132 mm/kasan/report.c:193)
[64092.232219] __asan_report_load8_noabort (mm/kasan/report.c:251)
[64092.234167] iov_iter_advance (lib/iov_iter.c:511)
[64092.235105] generic_file_read_iter (mm/filemap.c:1743)
[64092.241532] blkdev_read_iter (fs/block_dev.c:1649)
[64092.242448] __vfs_read (fs/read_write.c:423 fs/read_write.c:434)
[64092.246949] vfs_read (fs/read_write.c:454)
[64092.247743] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594)
[64092.250445] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[64092.251440] Memory state around the buggy address:
[64092.252221] ffff88040506fc00: 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f4 f4 f4 f3
[64092.253340] ffff88040506fc80: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
[64092.254456] >ffff88040506fd00: 00 00 f1 f1 f1 f1 00 00 f4 f4 f2 f2 f2 f2 00 00
[64092.255566] ^
[64092.256432] ffff88040506fd80: 00 00 00 f4 f4 f4 f2 f2 f2 f2 00 00 00 00 00 f4
[64092.257557] ffff88040506fe00: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
[64092.258684] ==================================================================
Thanks,
Sasha
On Wed, 12 Aug 2015 10:13:24 -0400
Sasha Levin <[email protected]> wrote:
> While fuzzing with trinity inside a KVM tools guest running -next I've stumbled on the following:
>
> [64092.216447] ==================================================================
> [64092.217840] BUG: KASan: out of bounds on stack in iov_iter_advance+0x3b7/0x480 at addr ffff88040506fd48
> [64092.219314] Read of size 8 by task trinity-c194/11387
> [64092.220114] page:ffffea0010141bc0 count:0 mapcount:0 mapping: (null) index:0x2
> [64092.221354] flags: 0x46fffff80000000()
> [64092.221998] page dumped because: kasan: bad access detected
> [64092.222879] CPU: 4 PID: 11387 Comm: trinity-c194 Not tainted 4.2.0-rc6-next-20150810-sasha-00040-g12ad0db3-dirty #2427
> [64092.224537] ffff88040506fd30 ffff88040506fa88 ffffffff9ce7763b ffff88040506fb10
> [64092.225763] ffff88040506fb00 ffffffff9376b1be 0000000000000000 ffff880270108600
> [64092.226992] 0000000000000282 0000000000000000 0000000000000000 0000000000000000
> [64092.228221] Call Trace:
> [64092.228679] dump_stack (lib/dump_stack.c:52)
> [64092.231252] kasan_report_error (mm/kasan/report.c:132 mm/kasan/report.c:193)
> [64092.232219] __asan_report_load8_noabort (mm/kasan/report.c:251)
> [64092.234167] iov_iter_advance (lib/iov_iter.c:511)
> [64092.235105] generic_file_read_iter (mm/filemap.c:1743)
> [64092.241532] blkdev_read_iter (fs/block_dev.c:1649)
> [64092.242448] __vfs_read (fs/read_write.c:423 fs/read_write.c:434)
> [64092.246949] vfs_read (fs/read_write.c:454)
> [64092.247743] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594)
> [64092.250445] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
> [64092.251440] Memory state around the buggy address:
> [64092.252221] ffff88040506fc00: 00 00 00 f1 f1 f1 f1 00 00 00 00 00 f4 f4 f4 f3
> [64092.253340] ffff88040506fc80: f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00
> [64092.254456] >ffff88040506fd00: 00 00 f1 f1 f1 f1 00 00 f4 f4 f2 f2 f2 f2 00 00
> [64092.255566] ^
> [64092.256432] ffff88040506fd80: 00 00 00 f4 f4 f4 f2 f2 f2 f2 00 00 00 00 00 f4
> [64092.257557] ffff88040506fe00: f4 f4 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
> [64092.258684] ==================================================================
>
I tried to debug this but kasan doesn't print much useful information
for stack out of bounds access. It shows the address that's being
accessed but it doesn't show the value of the boundary that was
exceeded. And the stack dump doesn't show any addresses either - just
contents. It would be nice to see a full stack frame dump showing
where all the parent frames are too. Also, the file and line number
(lib/iov_iter.c:511) are completely useless because of inlining,
though that's not kasan's fault.
On 08/15/2015 11:13 PM, Chuck Ebbert wrote:
> I tried to debug this but kasan doesn't print much useful information
> for stack out of bounds access. It shows the address that's being
> accessed but it doesn't show the value of the boundary that was
> exceeded.
This could be estimated by looking at the shadow memory:
ffff88040506fd00: 00 00 f1 f1 f1 f1 00 00 f4 [f4] f2 f2 f2 f2 00 00
Each shadow byte represents 8 bytes of memory, so f1 is the left redzone of the stack frame.
The two zero bytes are probably 'struct iovec iov' defined in new_sync_read(). The two f4 bytes after them are a redzone.
We hit the second f4, which means that we accessed iov[1].iov_len.
This bug is similar to recently found bug in 9p: http://thread.gmane.org/gmane.linux.kernel/1931799/focus=1936542
Such a report could be produced if retval > count.
generic_file_read_iter():
	...
	size_t count = iov_iter_count(iter);
	...
	if (!count)
		goto out; /* skip atime */
	size = i_size_read(inode);
	retval = filemap_write_and_wait_range(mapping, pos,
				pos + count - 1);
	if (!retval) {
		struct iov_iter data = *iter;
		retval = mapping->a_ops->direct_IO(iocb, &data, pos);
	}
	if (retval > 0) {
		*ppos = pos + retval;
		iov_iter_advance(iter, retval);
So either filemap_write_and_wait_range() or mapping->a_ops->direct_IO() returned more
than 'count'.
> And the stack dump doesn't show any addresses either - just
> contents. It would be nice to see a full stack frame dump showing
> where all the parent frames are too.
Yes, I think it might be helpful to dump some portion of stack around the access address.
> Also too the file and line number
> (lib/iov_iter.c:511) are completely useless because of inlining,
> though that's not kasan's fault.
>
On Mon, Aug 17, 2015 at 12:18:12PM +0300, Andrey Ryabinin wrote:
> This bug is similar to recently found bug in 9p: http://thread.gmane.org/gmane.linux.kernel/1931799/focus=1936542
Ow. For those who'd missed that fun: the bug in question had turned out to
be caused by improper reuse of request ids, _not_ in the call chain of
the triggering syscall.
> if (!retval) {
> struct iov_iter data = *iter;
> retval = mapping->a_ops->direct_IO(iocb, &data, pos);
> }
>
> if (retval > 0) {
> *ppos = pos + retval;
> iov_iter_advance(iter, retval);
>
>
> So either filemap_write_and_wait_range()
Shouldn't - it's supposed to return 0 or -E...
> or mapping->a_ops->direct_IO() returned more
> than 'count'.
Was there DAX involved? ->direct_IO() in there is blkdev_direct_IO(),
which takes rather different paths in those cases...
> > Also too the file and line number
> > (lib/iov_iter.c:511) are completely useless because of inlining,
> > though that's not kasan's fault.
Might make sense to slap
	if (WARN_ON(size > iov_iter_count(i)))
		print size and *i
and see if it triggers...
On Wed, Sep 30, 2015 at 05:30:17PM -0400, Sasha Levin wrote:
> > So I've traced this all the way back to dax_io(). I can trigger this with:
> >
> > diff --git a/fs/dax.c b/fs/dax.c
> > index 93bf2f9..2cdb8a5 100644
> > --- a/fs/dax.c
> > +++ b/fs/dax.c
> > @@ -178,6 +178,7 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
> > if (need_wmb)
> > wmb_pmem();
> >
> > + WARN_ON((pos == start) && (pos - start > iov_iter_count(iter)));
> > return (pos == start) ? retval : pos - start;
> > }
> >
> > So it seems that iter gets moved twice here: once in dax_io(), and once again
> > back at generic_file_read_iter().
> >
> > I don't see how it ever worked. Am I missing something?
This:
		struct iov_iter data = *iter;
		retval = mapping->a_ops->direct_IO(iocb, &data, pos);
	}
	if (retval > 0) {
		*ppos = pos + retval;
		iov_iter_advance(iter, retval);
The iterator advanced in ->direct_IO() is a _copy_, not the original.
The contents of *iter as seen by generic_file_read_iter() is not
modifiable by ->direct_IO(), simply because its address is nowhere to
be found. And checking iov_iter_count(iter) at the end of dax_io() is
pointless - from the POV of generic_file_read_iter() it's data.count,
and while it used to be equal to iter->count, it's already been modified.
By the time we call iov_iter_advance() in generic_file_read_iter(), that
value will already have been discarded, along with the rest of struct iov_iter data.
Wait a minute - you are triggering _what_???
> > + WARN_ON((pos == start) && (pos - start > iov_iter_count(iter)));
With '&&'? iov_iter_count() is size_t, while pos and start are loff_t,
so you are seeing equal values in pos and start (as integers) *and*
(loff_t)0 > (size_t)something. loff_t is a signed type, size_t - unsigned.
6.3.1.8[1] of the C standard (usual arithmetic conversions) says that
* if rank of size_t is greater or equal to rank of loff_t, the
latter gets converted to size_t. And conversion of zero should be zero,
i.e. (size_t) 0 > (size_t)something, which is impossible (we compare them
as non-negative integers).
* if loff_t can represent all values of size_t, size_t value gets
converted to loff_t. Result of conversion should have the same (in particular,
non-negative) value. Again, comparison can't be true.
* otherwise both values are converted to unsigned counterpart of
loff_t. Again, zero converts to 0 and in any unsigned type 0 > x is
impossible.
I don't see any way for that condition to evaluate true.
Assuming that it's a misquoted ||... I don't see any way for pos to
get greater than start + original iov_iter_count(). However, I *do*
see a way for bad things to happen in a different way. Look:
// first pass through the loop, pos == start (and so's max)
				retval = dax_get_addr(bh, &addr, blkbits);
// got a positive value
				if (retval < 0)
					break;
// nope, keep going
				if (buffer_unwritten(bh) || buffer_new(bh)) {
					dax_new_buf(addr, retval, first, pos,
									end);
					need_wmb = true;
				}
				addr += first;
				size = retval - first;
// OK...
			}
			max = min(pos + size, end);
// OK...
		}
		if (iov_iter_rw(iter) == WRITE) {
			len = copy_from_iter_pmem(addr, max - pos, iter);
			need_wmb = true;
		} else if (!hole)
			len = copy_to_iter((void __force *)addr, max - pos,
					iter);
		else
			len = iov_iter_zero(max - pos, iter);
// too bad - we'd hit an unmapped memory area. len is 0...
// and retval is fucking positive.
		if (!len)
			break;
	return (pos == start) ? retval : pos - start;
// will return a bloody big positive value
Could you try to reproduce it with this:
dax_io(): don't let non-error value escape via retval instead of EFAULT
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/fs/dax.c b/fs/dax.c
index a86d3cc..7b653e9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
else
len = iov_iter_zero(max - pos, iter);
- if (!len)
+ if (!len) {
+ retval = -EFAULT;
break;
+ }
pos += len;
addr += len;
On Fri, Nov 06, 2015 at 01:34:02AM +0000, Al Viro wrote:
> Could you try to reproduce it with this:
>
> dax_io(): don't let non-error value escape via retval instead of EFAULT
>
> Signed-off-by: Al Viro <[email protected]>
> ---
> diff --git a/fs/dax.c b/fs/dax.c
> index a86d3cc..7b653e9 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
> else
> len = iov_iter_zero(max - pos, iter);
>
> - if (!len)
> + if (!len) {
> + retval = -EFAULT;
> break;
> + }
>
> pos += len;
> addr += len;
>
PS: "block, dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()"
Dan Williams had posted a while ago does change the things a bit, but
AFAICS only in turning "return a bogus positive value" into "return an
uninitialized value"; if applying that one after it, s/retval/rc/ in
the above. And whether or not it fixes the bug Sasha had been able to trigger,
the bug is real and needs fixing - it's been there since 4.0 when fs/dax.c
went into the tree.
How are we going to handle that one? I can put it into mainline pull
request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
via the block tree, I'll be glad to leave it for him to deal with.
On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <[email protected]> wrote:
>
> How are we going to handle that one? I can put it into mainline pull
> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
> via the block tree, I'll be glad to leave it for him to deal with.
Put it in the vfs tree (I'm hoping for a pull request soon..)
I pulled the block trees from Jens yesterday, so there is presumably
nothing pending there right now.
Linus
On 11/05/2015 08:38 PM, Linus Torvalds wrote:
> On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <[email protected]> wrote:
>>
>> How are we going to handle that one? I can put it into mainline pull
>> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
>> via the block tree, I'll be glad to leave it for him to deal with.
>
> Put it in the vfs tree (I'm hoping for a pull request soon..)
>
> I pulled the block trees from Jens yesterday, so there is presumably
> nothing pending there right now.
Either way is obviously fine with me. I have 4 patches pending, but
unless more urgent things show up, I was going to continue collecting
fixes and submit that post -rc1.
--
Jens Axboe
Al, ping?
On Thu, Nov 5, 2015 at 7:38 PM, Linus Torvalds
<[email protected]> wrote:
> On Thu, Nov 5, 2015 at 6:19 PM, Al Viro <[email protected]> wrote:
>>
>> How are we going to handle that one? I can put it into mainline pull
>> request via vfs.git, with Cc: stable, but if e.g. Jens prefers to take it
>> via the block tree, I'll be glad to leave it for him to deal with.
>
> Put it in the vfs tree (I'm hoping for a pull request soon..)
>
> I pulled the block trees from Jens yesterday, so there is presumably
> nothing pending there right now.
Apparently my "hoping for a pull request soon" was ridiculously optimistic.
Al, looking at the most recent linux-next, most of the vfs commits
there seem to be committed in the last day or two. I'm getting the
feeling that that is all 4.5 material by now.
Should I just take the iov patch as-is, since apparently no vfs pull
request is happening this merge cycle? And no, I'm not taking
"developed during the second week of the merge window, and sent in the
last few days of it". I'm done with that.
Linus
On Tue, Nov 10 2015, Linus Torvalds wrote:
> Al, ping?
>
> Apparently my "hoping for a pull request soon" was ridiculously optimistic.
>
> Al, looking at the most recent linux-next, most of the vfs commits
> there seem to be committed in the last day or two. I'm getting the
> feeling that that is all 4.5 material by now.
>
> Should I just take the iov patch as-is, since apparently no vfs pull
> request is happening this merge cycle? And no, I'm not taking
> "developed during the second week of the merge window, and sent in the
> last few days of it". I'm done with that.
I've got 8 other patches pending for a post core merge, just waiting for
the last core pull request to go in. I haven't seen this iov iter fix,
though.
git://git.kernel.dk/linux-block.git for-linus
----------------------------------------------------------------
Jan Kara (1):
brd: Refuse improperly aligned discard requests
Jens Axboe (2):
MAINTAINERS: add reference to new linux-block list
blk-mq: mark __blk_mq_complete_request() static
Randy Dunlap (1):
block: fix blk-core.c kernel-doc warning
Sathyavathi M (1):
NVMe: Increase the max transfer size when mdts is 0
Stephan Günther (2):
NVMe: use split lo_hi_{read,write}q
NVMe: add support for Apple NVMe controller
Vivek Goyal (1):
fs/block_dev.c: Remove WARN_ON() when inode writeback fails
MAINTAINERS | 1 +
block/blk-core.c | 3 +++
block/blk-mq.c | 2 +-
block/blk-mq.h | 1 -
drivers/block/brd.c | 3 +++
drivers/nvme/host/pci.c | 15 +++++++++------
fs/block_dev.c | 15 ++++++++++++---
7 files changed, 29 insertions(+), 11 deletions(-)
--
Jens Axboe
On Tue, Nov 10, 2015 at 6:25 PM, Jens Axboe <[email protected]> wrote:
> I've got 8 other patches pending for a post core merge, just waiting for
> the last core pull request to go in. I haven't seen this iov iter fix,
> though.
It was in this thread, looked like this (without the whitespace damage):
dax_io(): don't let non-error value escape via retval instead of EFAULT
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/fs/dax.c b/fs/dax.c
index a86d3cc..7b653e9 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode,
struct iov_iter *iter,
else
len = iov_iter_zero(max - pos, iter);
- if (!len)
+ if (!len) {
+ retval = -EFAULT;
break;
+ }
pos += len;
addr += len;
although I don't think I saw a confirmation that that was what Sasha
actually hit (but Sasha had narrowed it down to DAX, so it looks
possible/likely)
Linus
On 11/10/2015 07:31 PM, Linus Torvalds wrote:
> It was in this thread, looked like this (without the whitespace damage):
>
> dax_io(): don't let non-error value escape via retval instead of EFAULT
>
> Signed-off-by: Al Viro <[email protected]>
> ---
> diff --git a/fs/dax.c b/fs/dax.c
> index a86d3cc..7b653e9 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -169,8 +169,10 @@ static ssize_t dax_io(struct inode *inode,
> struct iov_iter *iter,
> else
> len = iov_iter_zero(max - pos, iter);
>
> - if (!len)
> + if (!len) {
> + retval = -EFAULT;
> break;
> + }
>
> pos += len;
> addr += len;
>
>
> although I don't think I saw a confirmation that that was what Sasha
> actually hit (but Sasha had narrowed it down to DAX, so it looks
> possible/likely)
I found it right after sending that email. Patch looks pretty
straightforward, at least for the case of max - pos != 0 and len == 0 on
return. Might be cleaner to add a
	if (retval < 0)
		break;
check, that should be the case where max == pos anyway. But we'd
potentially turn -Exx into -EFAULT for that case with the patch.
Hmm?
--
Jens Axboe
On 11/10/2015 07:40 PM, Jens Axboe wrote:
> I found it right after sending that email. Patch looks pretty straight
> forward, at least from the case of max - pos != 0 and len == 0 on
> return. Might be cleaner to add a
>
> if (retval < 0)
> break;
>
> check, that should be the case where max == pos anyway. But we'd
> potentially return -Exx into -EFAULT for that case with the patch.
>
> Hmm?
So we already do that, in the 'if' above. I think the patch looks fine.
--
Jens Axboe
On 11/10/2015 07:41 PM, Jens Axboe wrote:
> On 11/10/2015 07:40 PM, Jens Axboe wrote:
>> I found it right after sending that email. Patch looks pretty straight
>> forward, at least from the case of max - pos != 0 and len == 0 on
>> return. Might be cleaner to add a
>>
>> if (retval < 0)
>> break;
>>
>> check, that should be the case where max == pos anyway. But we'd
>> potentially return -Exx into -EFAULT for that case with the patch.
>>
>> Hmm?
>
> So we already do that, in the 'if' above. I think the patch looks fine.
Queued up. Unless Al objects, it'll be part of the 'for-linus' pull
later this week.
--
Jens Axboe
On Tue, Nov 10, 2015 at 06:21:47PM -0800, Linus Torvalds wrote:
> Al, looking at the most recent linux-next, most of the vfs commits
> there seem to be committed in the last day or two. I'm getting the
> feeling that that is all 4.5 material by now.
>
> Should I just take the iov patch as-is, since apparently no vfs pull
> request is happening this merge cycle? And no, I'm not taking
> "developed during the second week of the merge window, and sent in the
> last few days of it". I'm done with that.
s/developed/rebased/, actually, but... point taken. Mea culpa, and what
to do with those patches is for you to decide; some of those are simply
-stable fodder and probably ought to go one-by-one at any point you would
consider convenient, some are of the "remove stale comment" variety (obviously
can sit around until the next cycle, or go in one-by-one at any point - the
things like
-
- /* WARNING: probably going away soon, do not use! */
in inode_operations; the comment used to be about the method removed last
cycle and should've been gone with it; etc.)
There's a large pile not in those two classes - xattr+richacl stuff. I'm more
confident about the first part, but strictly speaking neither qualifies as
fixes.
FWIW, the stuff that had been _developed_ during the merge window is not there
- a patch series around the descriptor bitmaps. Doesn't change the situation;
I'd fucked up this cycle ;-/
On Tue, Nov 10, 2015 at 07:44:14PM -0700, Jens Axboe wrote:
> Queued up. Unless Al objects, it'll be part of the 'for-linus' pull
> later this week.
Reported-by: Sasha Levin <[email protected]>
Cc: [email protected] # 4.0+
probably ought to be there...
On 11/10/2015 08:06 PM, Al Viro wrote:
> On Tue, Nov 10, 2015 at 07:44:14PM -0700, Jens Axboe wrote:
>
>> Queued up. Unless Al objects, it'll be part of the 'for-linus' pull
>> later this week.
>
> Reported-by: Sasha Levin <[email protected]>
> Cc: [email protected] # 4.0+
>
> probably ought to be there...
Agree, done.
--
Jens Axboe
On 11/10/2015 09:31 PM, Linus Torvalds wrote:
> although I don't think I saw a confirmation that that was what Sasha
> actually hit (but Sasha had narrowed it down to DAX, so it looks
> possible/likely)
Yup, that indeed fixed the problem I was seeing.
Thanks,
Sasha
On Wed, Nov 11, 2015 at 02:56:47AM +0000, Al Viro wrote:
> s/developed/rebased/, actually, but... point taken. Mea culpa, and what
> to do with those patches is for you to decide; some of those are simply
> -stable fodder and probably ought to go one-by-one at any point you would
> consider convenient, some are of the "remove stale comment" variety (obviously
> can sit around until the next cycle, or go in one-by-one at any point - the
> things like
> -
> - /* WARNING: probably going away soon, do not use! */
> in inode_operations; the comment used to be about the method removed last
> cycle and should've been gone with it; etc.)
FWIW, here's what's in there:
dax_io fix
Jens has just taken it
fs: fix inode.c kernel-doc warning
fs: fix writeback.c kernel-doc warnings
trivial comment patches
overlayfs: move super block magic number to magic.h
got picked into overlayfs tree yesterday
debugfs: fix refcount imbalance in start_creating
old fix, -stable fodder (had been first posted in October, IIRC)
vfs: Check attribute names in posix acl xattr handers
vfs: Fix the posix_acl_xattr_list return value
ubifs: Remove unused security xattr handler
hfsplus: Remove unused xattr handler list operations
jffs2: Add missing capability check for listing trusted xattrs
xattr handlers: Pass handler to operations instead of flags
9p: xattr simplifications
squashfs: xattr simplifications
f2fs: xattr simplifications
xattr series; the first two are arguably fixes, and whatever happens in this
window, I'm taking the rest into -next for 4.5. Series makes sense and
cleans the things nicely, IMO.
FS-Cache: Increase reference of parent after registering, netfs success
FS-Cache: Don't override netfs's primary_index if registering failed
cachefiles: perform test on s_blocksize when opening cache file.
FS-Cache: Handle a write to the page immediately beyond the EOF marker
1, 2 and 4 are simply -stable fodder, 3 is an obvious optimization.
binfmt_elf: Don't clobber passed executable's file header
binfmt_elf: Correct `arch_check_elf's description
-stable fodder.
fs/pipe.c: preserve alloc_file() error code
fs/pipe.c: return error code rather than 0 in pipe_write()
-stable fodder.
vfs: remove unused wrapper block_page_mkwrite()
vfs: remove stale comment in inode_operations
dead code and stale comment removal. Can go at any point.
fs: 9p: cache.h: Add #define of include guard
trivial, can go at any point, or stay until the next cycle.
richacl series
probably misses the window - I'd really like to hear more detailed variant
of Christoph's objections in any case.
Again, my apologies to everyone involved - I'd fucked up, badly. The only
question is how much PITA it will end up causing. I can put those into
separate branches and/or mail directly; what ends up missing the window
will go into vfs.git#for-next as soon as -rc1 is out there (with the
possible exception of richacl stuff - I really want to hear from Christoph
and in more details than "it's all been said some iterations ago").
Linus, what would be your preference wrt that stuff? Besides the "don't
ever do that kind of shit again", that is - that much is obvious.
On Tue, Nov 10, 2015 at 7:30 PM, Al Viro <[email protected]> wrote:
>
> Linus, what would be your preference wrt that stuff?
If you can just create a branch with the stuff that is obvious and
clearly worth it (ie stuff that would basically be stable material
anyway), I'll just merge it. Assuming it's all done in some
reasonable timeframe..
Linus
On Tue, Nov 10, 2015 at 08:36:48PM -0800, Linus Torvalds wrote:
> On Tue, Nov 10, 2015 at 7:30 PM, Al Viro <[email protected]> wrote:
> >
> > Linus, what would be your preference wrt that stuff?
>
> If you can just create a branch with the stuff that is obvious and
> clearly worth it (ie stuff that would basically be stable material
> anyway), I'll just merge it. Assuming it's all done in some
> reasonable timeframe..
OK... Right now I have #for-linus-stable and #for-linus-2 on top
of it, the latter adding several comment fixes, etc., the most serious
change among which is the removal of never used block_page_mkwrite().
dax_io fix isn't there, neither is overlayfs magic.h patch - both are
already in other trees. I would like to get xattr series in as well,
but that's a separate pull request, if you'd accept them in this window in
the first place. richacl stuff isn't there as well, and I think that one
is clear "leave it for 4.5" fodder.
Anyway, for
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus-2
(both -stable fodder and trivial patches)
Shortlog:
Daniel Borkmann (1):
debugfs: fix refcount imbalance in start_creating
David Howells (1):
FS-Cache: Handle a write to the page immediately beyond the EOF marker
Eric Biggers (2):
fs/pipe.c: preserve alloc_file() error code
fs/pipe.c: return error code rather than 0 in pipe_write()
Kinglong Mee (2):
FS-Cache: Increase reference of parent after registering, netfs success
FS-Cache: Don't override netfs's primary_index if registering failed
Maciej W. Rozycki (2):
binfmt_elf: Don't clobber passed executable's file header
binfmt_elf: Correct `arch_check_elf's description
NeilBrown (1):
cachefiles: perform test on s_blocksize when opening cache file.
Randy Dunlap (2):
fs: fix inode.c kernel-doc warning
fs: fix writeback.c kernel-doc warnings
Ross Zwisler (2):
vfs: remove unused wrapper block_page_mkwrite()
vfs: remove stale comment in inode_operations
Tzvetelin Katchov (1):
fs: 9p: cache.h: Add #define of include guard
Diffstat:
fs/9p/cache.h | 1 +
fs/binfmt_elf.c | 12 ++++----
fs/buffer.c | 24 ++-------------
fs/cachefiles/namei.c | 2 ++
fs/cachefiles/rdwr.c | 73 +++++++++++++++++++++++----------------------
fs/debugfs/inode.c | 6 +++-
fs/ext4/inode.c | 4 +--
fs/fs-writeback.c | 3 +-
fs/fscache/netfs.c | 38 +++++++++++------------
fs/fscache/page.c | 2 +-
fs/inode.c | 1 +
fs/nilfs2/file.c | 2 +-
fs/pipe.c | 18 ++++++-----
fs/xfs/xfs_file.c | 2 +-
include/linux/buffer_head.h | 2 --
include/linux/fs.h | 2 --
16 files changed, 89 insertions(+), 103 deletions(-)
If you'd prefer to do that in two separate pulls - yell (or just pull
#for-linus-stable first, then this on top of it). I'd reordered
#for-next so that it continues #for-linus-2; tree of its tip being the
same as yesterday.
Hi Al,
On Wed, 11 Nov 2015 07:43:30 +0000 Al Viro <[email protected]> wrote:
>
> dax_io fix isn't there, neither is overlayfs magic.h patch - both are
> already in other trees. I would like to get xattr series in as well,
> but that's a separate pull request, if you'd accept them in this window in
> the first place. richacl stuff isn't there as well, and I think that one
> is clear "leave it for 4.5" fodder.
So could you please remove the 4.5 stuff from your for-next branch
until after the merge window closes.
Also, I noticed these new warnings today:
fs/orangefs/xattr.c:509:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
.get = pvfs2_xattr_get_trusted,
^
fs/orangefs/xattr.c:509:9: note: (near initialization for 'pvfs2_xattr_trusted_handler.get')
fs/orangefs/xattr.c:510:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
.set = pvfs2_xattr_set_trusted,
^
fs/orangefs/xattr.c:510:9: note: (near initialization for 'pvfs2_xattr_trusted_handler.set')
fs/orangefs/xattr.c:520:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
.get = pvfs2_xattr_get_default,
^
fs/orangefs/xattr.c:520:9: note: (near initialization for 'pvfs2_xattr_default_handler.get')
fs/orangefs/xattr.c:521:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
.set = pvfs2_xattr_set_default,
^
fs/orangefs/xattr.c:521:9: note: (near initialization for 'pvfs2_xattr_default_handler.set')
--
Cheers,
Stephen Rothwell [email protected]
On Wed, Nov 11, 2015 at 07:16:36PM +1100, Stephen Rothwell wrote:
> Hi Al,
>
> On Wed, 11 Nov 2015 07:43:30 +0000 Al Viro <[email protected]> wrote:
> >
> > dax_io fix isn't there, neither is overlayfs magic.h patch - both are
> > already in other trees. I would like to get xattr series in as well,
> > but that's a separate pull request, if you'd accept them in this window in
> > the first place. richacl stuff isn't there as well, and I think that one
> > is clear "leave it for 4.5" fodder.
>
> So could you please remove the 4.5 stuff from your for-next branch
> until after the merge window closes.
Done.
> Also, I noticed these new warnings today:
>
> fs/orangefs/xattr.c:509:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
> .get = pvfs2_xattr_get_trusted,
> ^
> fs/orangefs/xattr.c:509:9: note: (near initialization for 'pvfs2_xattr_trusted_handler.get')
> fs/orangefs/xattr.c:510:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
> .set = pvfs2_xattr_set_trusted,
> ^
> fs/orangefs/xattr.c:510:9: note: (near initialization for 'pvfs2_xattr_trusted_handler.set')
> fs/orangefs/xattr.c:520:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
> .get = pvfs2_xattr_get_default,
> ^
> fs/orangefs/xattr.c:520:9: note: (near initialization for 'pvfs2_xattr_default_handler.get')
> fs/orangefs/xattr.c:521:9: warning: initialization from incompatible pointer type [-Wincompatible-pointer-types]
> .set = pvfs2_xattr_set_default,
> ^
> fs/orangefs/xattr.c:521:9: note: (near initialization for 'pvfs2_xattr_default_handler.set')
That's "xattr handlers: Pass handler to operations instead of flags" fallout,
trivially adjusted (typical change is
-ext2_xattr_security_list(struct dentry *dentry, char *list, size_t list_size,
- const char *name, size_t name_len, int type)
+ext2_xattr_security_list(const struct xattr_handler *handler,
+ struct dentry *dentry, char *list, size_t list_size,
+ const char *name, size_t name_len)
with type replaced with handler->flags if it's used anywhere in the body;
AFAICS, none of orangefs instances use it at all, so it's just a matter of
changing the argument lists in pvfs2_xattr_[gs]et_{default,trusted},
adding const struct xattr_handler *handler in the beginning and removing
the last argument; callers in pvfs2_ioctl() should simply use
pvfs2_inode_[gs]etxattr()).
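
To illustrate the shape of that conversion, here is a minimal userspace sketch. It assumes simplified stand-in types (demo_dentry, demo_xattr_handler) that are NOT the kernel definitions, and the demo_* function names are hypothetical; it only shows a trailing int argument being replaced by a leading handler pointer, with the body reading handler->flags where it used to read the int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for illustration only, not the kernel structures. */
struct demo_dentry {
	const char *name;
};

struct demo_xattr_handler {
	const char *prefix;
	int flags;	/* per-handler value formerly passed as a trailing int */
	int (*get)(const struct demo_xattr_handler *handler,
		   struct demo_dentry *dentry, const char *name,
		   void *buffer, size_t size);
};

/* Old-style callback: the per-handler value arrives as the last argument. */
static int demo_get_old(struct demo_dentry *dentry, const char *name,
			void *buffer, size_t size, int type)
{
	(void)dentry;
	(void)name;
	if (size < sizeof(type))
		return -1;
	/* Copy the value out so a caller can observe which one was used. */
	memcpy(buffer, &type, sizeof(type));
	return (int)sizeof(type);
}

/* New-style callback: handler comes first, the trailing int is gone. */
static int demo_get_new(const struct demo_xattr_handler *handler,
			struct demo_dentry *dentry, const char *name,
			void *buffer, size_t size)
{
	assert(handler != NULL);
	/* Wherever the body used "type", it now uses handler->flags. */
	return demo_get_old(dentry, name, buffer, size, handler->flags);
}
```

A callback that never looked at the old trailing argument needs only the argument-list change, which is the situation described above for the orangefs instances.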
Note, however, that orangefs in linux-next lacks a lot of fixes (see
vfs.git#orangefs-untested for some; AFAICS, those are missing from all
branches in orangefs git tree) and there are problems I don't know
how to fix, mostly due to the lack of documentation. The last I've
heard from them was that they were putting such docs together; hopefully
once that gets done we'll be able to sort the rest of that thing out.
It'll be after -rc1, though.
So xattr conflicts are the least of the problems there; those are easy
to adjust for, there are more serious issues in the entire thing ;-/
BTW, while we are at it - pvfs2_listxattr() doesn't even validate
resp.listxattr.returned_count, so a bogus response from buggered
server will do really interesting things to the kernel.
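
The kind of check that is missing might look roughly like the following userspace sketch. Everything in it is an assumption made for illustration: the response layout (demo_listxattr_resp), the DEMO_MAX_KEYS capacity and the per-key length limit are hypothetical and are not taken from the orangefs code. The point is only that a count coming from the server gets bounded before it is used as a loop limit or array index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical capacity of the fixed-size arrays in a server response. */
#define DEMO_MAX_KEYS 8

/* Hypothetical response layout; the real orangefs structure may differ. */
struct demo_listxattr_resp {
	int returned_count;		/* supplied by the server, untrusted */
	int lengths[DEMO_MAX_KEYS];	/* per-key lengths, also untrusted */
};

/*
 * Return the total number of bytes of key data described by the response,
 * or -EIO if the count or any length is outside what the arrays can hold.
 */
static long demo_listxattr_total(const struct demo_listxattr_resp *resp,
				 size_t max_key_len)
{
	long total = 0;
	int i;

	assert(resp != NULL);

	/* Reject counts that would index past the end of lengths[]. */
	if (resp->returned_count < 0 || resp->returned_count > DEMO_MAX_KEYS)
		return -EIO;

	for (i = 0; i < resp->returned_count; i++) {
		/* Reject empty, negative or oversized key lengths. */
		if (resp->lengths[i] <= 0 ||
		    (size_t)resp->lengths[i] > max_key_len)
			return -EIO;
		total += resp->lengths[i];
	}
	return total;
}
```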
I'll cook the minimal fixup for API change after I get some sleep and
send it your way, unless somebody gets there first...
Hi Al,
On Wed, 11 Nov 2015 10:19:48 +0000 Al Viro <[email protected]> wrote:
>
> On Wed, Nov 11, 2015 at 07:16:36PM +1100, Stephen Rothwell wrote:
> >
> > So could you please remove the 4.5 stuff from your for-next branch
> > until after the merge window closes.
>
> Done.
Thanks.
> > Also, I noticed these new warnings today:
> >
> I'll cook the minimal fixup for API change after I get some sleep and
> send it your way, unless somebody gets there first...
Thanks again.
--
Cheers,
Stephen Rothwell [email protected]
I'm the Orangefs guy...
If the orangefs warnings that people see because of what's in
linux-next are annoying, I could focus on quieting them down...
We've been focusing on code review and documentation ever
since our last big exchange with Al and Linus...
-Mike
On Wed, Nov 11, 2015 at 5:28 AM, Stephen Rothwell <[email protected]> wrote:
> Hi Al,
>
> On Wed, 11 Nov 2015 10:19:48 +0000 Al Viro <[email protected]> wrote:
>>
>> On Wed, Nov 11, 2015 at 07:16:36PM +1100, Stephen Rothwell wrote:
>> >
>> > So could you please remove the 4.5 stuff from your for-next branch
>> > until after the merge window closes.
>>
>> Done.
>
> Thanks.
>
>> > Also, I noticed these new warnings today:
>> >
>> I'll cook the minimal fixup for API change after I get some sleep and
>> send it your way, unless somebody gets there first...
>
> Thanks again.
>
> --
> Cheers,
> Stephen Rothwell [email protected]
On Wed, Nov 11, 2015 at 10:19:48AM +0000, Al Viro wrote:
> I'll cook the minimal fixup for API change after I get some sleep and
> send it your way, unless somebody gets there first...
This should do it - switches ->ioctl() to pvfs2_inode_[gs]etxattr() and
converts xattr_handler ->[gs]et() to new API.
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index feb1764..3d6ffe0 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -793,11 +793,10 @@ static long pvfs2_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
*/
if (cmd == FS_IOC_GETFLAGS) {
val = 0;
- ret = pvfs2_xattr_get_default(file->f_path.dentry,
- "user.pvfs2.meta_hint",
- &val,
- sizeof(val),
- 0);
+ ret = pvfs2_inode_getxattr(file_inode(file),
+ PVFS2_XATTR_NAME_DEFAULT_PREFIX,
+ "user.pvfs2.meta_hint",
+ &val, sizeof(val));
if (ret < 0 && ret != -ENODATA)
return ret;
else if (ret == -ENODATA)
@@ -827,12 +826,10 @@ static long pvfs2_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
gossip_debug(GOSSIP_FILE_DEBUG,
"pvfs2_ioctl: FS_IOC_SETFLAGS: %llu\n",
(unsigned long long)val);
- ret = pvfs2_xattr_set_default(file->f_path.dentry,
- "user.pvfs2.meta_hint",
- &val,
- sizeof(val),
- 0,
- 0);
+ ret = pvfs2_inode_setxattr(file_inode(file),
+ PVFS2_XATTR_NAME_DEFAULT_PREFIX,
+ "user.pvfs2.meta_hint",
+ &val, sizeof(val), 0);
}
return ret;
diff --git a/fs/orangefs/pvfs2-kernel.h b/fs/orangefs/pvfs2-kernel.h
index 29b4a48..43339c6 100644
--- a/fs/orangefs/pvfs2-kernel.h
+++ b/fs/orangefs/pvfs2-kernel.h
@@ -237,19 +237,6 @@ extern const struct xattr_handler *pvfs2_xattr_handlers[];
extern struct posix_acl *pvfs2_get_acl(struct inode *inode, int type);
extern int pvfs2_set_acl(struct inode *inode, struct posix_acl *acl, int type);
-int pvfs2_xattr_set_default(struct dentry *dentry,
- const char *name,
- const void *buffer,
- size_t size,
- int flags,
- int handler_flags);
-
-int pvfs2_xattr_get_default(struct dentry *dentry,
- const char *name,
- void *buffer,
- size_t size,
- int handler_flags);
-
/*
* Redefine xtvec structure so that we could move helper functions out of
* the define
diff --git a/fs/orangefs/xattr.c b/fs/orangefs/xattr.c
index 227eaa4..b683daa 100644
--- a/fs/orangefs/xattr.c
+++ b/fs/orangefs/xattr.c
@@ -447,12 +447,12 @@ out_unlock:
return ret;
}
-int pvfs2_xattr_set_default(struct dentry *dentry,
- const char *name,
- const void *buffer,
- size_t size,
- int flags,
- int handler_flags)
+static int pvfs2_xattr_set_default(const struct xattr_handler *handler,
+ struct dentry *dentry,
+ const char *name,
+ const void *buffer,
+ size_t size,
+ int flags)
{
return pvfs2_inode_setxattr(dentry->d_inode,
PVFS2_XATTR_NAME_DEFAULT_PREFIX,
@@ -462,11 +462,11 @@ int pvfs2_xattr_set_default(struct dentry *dentry,
flags);
}
-int pvfs2_xattr_get_default(struct dentry *dentry,
- const char *name,
- void *buffer,
- size_t size,
- int handler_flags)
+static int pvfs2_xattr_get_default(const struct xattr_handler *handler,
+ struct dentry *dentry,
+ const char *name,
+ void *buffer,
+ size_t size)
{
return pvfs2_inode_getxattr(dentry->d_inode,
PVFS2_XATTR_NAME_DEFAULT_PREFIX,
@@ -476,12 +476,12 @@ int pvfs2_xattr_get_default(struct dentry *dentry,
}
-static int pvfs2_xattr_set_trusted(struct dentry *dentry,
- const char *name,
- const void *buffer,
- size_t size,
- int flags,
- int handler_flags)
+static int pvfs2_xattr_set_trusted(const struct xattr_handler *handler,
+ struct dentry *dentry,
+ const char *name,
+ const void *buffer,
+ size_t size,
+ int flags)
{
return pvfs2_inode_setxattr(dentry->d_inode,
PVFS2_XATTR_NAME_TRUSTED_PREFIX,
@@ -491,11 +491,11 @@ static int pvfs2_xattr_set_trusted(struct dentry *dentry,
flags);
}
-static int pvfs2_xattr_get_trusted(struct dentry *dentry,
- const char *name,
- void *buffer,
- size_t size,
- int handler_flags)
+static int pvfs2_xattr_get_trusted(const struct xattr_handler *handler,
+ struct dentry *dentry,
+ const char *name,
+ void *buffer,
+ size_t size)
{
return pvfs2_inode_getxattr(dentry->d_inode,
PVFS2_XATTR_NAME_TRUSTED_PREFIX,
On Wed, Nov 11, 2015 at 11:25:17AM -0500, Mike Marshall wrote:
> I'm the Orangefs guy...
>
> If the orangefs warnings that people see because of what's in
> linux-next are annoying, I could focus on quieting them down...
See the fixup just posted in this thread.
> We've been focusing on code review and documentation ever
> since our last big exchange with Al and Linus...
BTW, could you put the current state of the docs someplace public?
> BTW, could you put the current state of the docs someplace public?
The documentation will eventually end up in
Documentation/filesystems/orangefs.txt.
This part about the creation of the shared memory between userspace and
the kernel module seems complete and accurate to me so far. This "bufmap"
data structure is central to the protocol between userspace and the kernel
module. This describes the creation of the bufmap; details on how it is used
in exchanges are what I am working on now...
-----------------------------------------------------------------------------------------------------------
Orangefs is a user space filesystem and an associated kernel module.
We'll just refer to the user space part of Orangefs as "userspace"
from here on out...
The kernel module implements a pseudo device that userspace
can read from and write to. Userspace can also manipulate the
kernel module through the pseudo device with ioctl.
At startup userspace allocates two page-size-aligned (posix_memalign)
mlocked memory blocks, one is used for IO and one is used for readdir
operations. The IO block is 41943040 bytes and the readdir block is
4194304 bytes. Each block contains logical chunks, and a pointer to each
block is added to its own PVFS_dev_map_desc structure which also describes
its total size, as well as the size and number of the logical chunks.
A pointer to the IO block's PVFS_dev_map_desc structure is sent to a
mapping routine in the kernel module with an ioctl. The structure is
copied from user space to kernel space with copy_from_user and is used
to initialize the kernel module's "bufmap" (struct pvfs2_bufmap), which
then contains:
* refcnt - a reference counter
* desc_size - PVFS2_BUFMAP_DEFAULT_DESC_SIZE (4194304) the IO block's
logical chunk size, which represents the filesystem's block size and
is used for s_blocksize in super blocks.
* desc_count - PVFS2_BUFMAP_DEFAULT_DESC_COUNT (10) the number of
logical chunks in the IO block.
* desc_shift - log2(desc_size), used for s_blocksize_bits in super blocks.
* total_size - the total size of the IO block.
* page_count - the number of 4096 byte pages in the IO block.
* page_array - a pointer to page_count * (sizeof(struct page*)) bytes
of kcalloced memory. This memory is used as an array of pointers
to each of the pages in the IO block through a call to get_user_pages.
* desc_array - a pointer to desc_count * (sizeof(struct pvfs_bufmap_desc))
bytes of kcalloced memory. This memory is further initialized:
user_desc is the kernel's copy of the IO block's PVFS_dev_map_desc
structure. user_desc->ptr points to the IO block.
pages_per_desc = bufmap->desc_size / PAGE_SIZE
offset = 0
bufmap->desc_array[0].page_array = &bufmap->page_array[offset]
bufmap->desc_array[0].array_count = pages_per_desc = 1024
bufmap->desc_array[0].uaddr = (user_desc->ptr) + (0 * 1024 * 4096)
offset += 1024
.
.
.
bufmap->desc_array[9].page_array = &bufmap->page_array[offset]
bufmap->desc_array[9].array_count = pages_per_desc = 1024
bufmap->desc_array[9].uaddr = (user_desc->ptr) +
(9 * 1024 * 4096)
offset += 1024
* buffer_index_array - a desc_count sized array of ints, used to
indicate which of the IO block's chunks are available to use.
* buffer_index_lock - a spinlock to protect buffer_index_array during update.
* readdir_index_array - a five (PVFS2_READDIR_DEFAULT_DESC_COUNT) element
int array used to indicate which of the readdir block's chunks are
available to use.
* readdir_index_lock - a spinlock to protect readdir_index_array during
update.
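
The arithmetic in the list above can be checked with a small userspace sketch. It is not the kernel module's code: the demo_* names and the struct layout are made up for illustration, page_array is modelled as an index (first_page) rather than a pointer into an array of struct page pointers, and only the sizes quoted in the text (4096-byte pages, 4194304-byte chunks, 10 chunks) are taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values quoted in the description above. */
#define DEMO_PAGE_SIZE	4096u
#define DEMO_DESC_SIZE	4194304u	/* PVFS2_BUFMAP_DEFAULT_DESC_SIZE */
#define DEMO_DESC_COUNT	10u		/* PVFS2_BUFMAP_DEFAULT_DESC_COUNT */

/* Illustrative stand-in for one desc_array entry. */
struct demo_desc {
	size_t first_page;	/* index into page_array where the chunk starts */
	size_t array_count;	/* pages in this chunk */
	uintptr_t uaddr;	/* userspace address of the chunk */
};

/* Illustrative stand-in for the bufmap fields that are pure arithmetic. */
struct demo_bufmap {
	size_t desc_size;
	size_t desc_count;
	unsigned desc_shift;
	size_t total_size;
	size_t page_count;
	struct demo_desc desc_array[DEMO_DESC_COUNT];
};

/* Integer log2 for a power-of-two value. */
static unsigned demo_log2(size_t v)
{
	unsigned shift = 0;

	while (v >>= 1)
		shift++;
	return shift;
}

static void demo_bufmap_init(struct demo_bufmap *b, uintptr_t user_ptr)
{
	size_t pages_per_desc, offset = 0, i;

	/* Chunks must be a whole number of pages for the layout to work. */
	assert(DEMO_DESC_SIZE % DEMO_PAGE_SIZE == 0);

	b->desc_size = DEMO_DESC_SIZE;
	b->desc_count = DEMO_DESC_COUNT;
	b->desc_shift = demo_log2(b->desc_size);
	b->total_size = b->desc_size * b->desc_count;
	b->page_count = b->total_size / DEMO_PAGE_SIZE;

	pages_per_desc = b->desc_size / DEMO_PAGE_SIZE;
	for (i = 0; i < b->desc_count; i++) {
		b->desc_array[i].first_page = offset;
		b->desc_array[i].array_count = pages_per_desc;
		b->desc_array[i].uaddr =
			user_ptr + i * pages_per_desc * DEMO_PAGE_SIZE;
		offset += pages_per_desc;
	}
}
```

With these inputs the sketch gives total_size = 41943040, page_count = 10240 and desc_shift = 22, which agrees with the block size and per-chunk page count stated in the text.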
On Wed, Nov 11, 2015 at 11:36 AM, Al Viro <[email protected]> wrote:
> On Wed, Nov 11, 2015 at 11:25:17AM -0500, Mike Marshall wrote:
>> I'm the Orangefs guy...
>>
>> If the orangefs warnings that people see because of what's in
>> linux-next are annoying, I could focus on quieting them down...
>
> See the fixup just posted in this thread.
>
>> We've been focusing on code review and documentation ever
>> since our last big exchange with Al and Linus...
>
> BTW, could you put the current state of the docs someplace public?
Hi Al,
On Wed, 11 Nov 2015 16:33:39 +0000 Al Viro <[email protected]> wrote:
>
> On Wed, Nov 11, 2015 at 10:19:48AM +0000, Al Viro wrote:
>
> > I'll cook the minimal fixup for API change after I get some sleep and
> > send it your way, unless somebody gets there first...
>
> This should do it - switches ->ioctl() to pvfs2_inode_[gs]etxattr() and
> converts xattr_handler ->[gs]et() to new API.
Thanks, I will use that as a merge conflict fix patch from today.
--
Cheers,
Stephen Rothwell [email protected]