by kernel test robot

[permalink] [raw]

Subject: Re: [PATCH v2 5/5] dax: handle media errors in dax_do_io

Hi Vishal,

[auto build test WARNING on linux-nvdimm/libnvdimm-for-next]
[also build test WARNING on v4.6-rc1 next-20160329]
[cannot apply to xfs/for-next]
[if your patch is applied to the wrong git tree, please drop us a note to help improving the system]

url: https://github.com/0day-ci/linux/commits/Vishal-Verma/dax-handling-of-media-errors/20160330-100409
base: https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm libnvdimm-for-next
config: x86_64-randconfig-i0-03300245 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

fs/ext4/indirect.c: In function 'ext4_ind_direct_IO':
>> fs/ext4/indirect.c:719:11: warning: 'ret_saved' may be used uninitialized in this function [-Wmaybe-uninitialized]
ssize_t ret_saved = 0;
^

vim +/ret_saved +719 fs/ext4/indirect.c

703 smp_mb();
704 if (unlikely(ext4_test_inode_state(inode,
705 EXT4_STATE_DIOREAD_LOCK))) {
706 inode_dio_end(inode);
707 goto locked;
708 }
709 if (IS_DAX(inode))
710 ret = dax_do_io(iocb, inode, iter, offset,
711 ext4_dio_get_block, NULL, 0);
712 else
713 ret = __blockdev_direct_IO(iocb, inode,
714 inode->i_sb->s_bdev, iter,
715 offset, ext4_dio_get_block,
716 NULL, NULL, 0);
717 inode_dio_end(inode);
718 } else {
> 719 ssize_t ret_saved = 0;
720
721 locked:
722 if (IS_DAX(inode)) {
723 ret = dax_do_io(iocb, inode, iter, offset,
724 ext4_dio_get_block, NULL, DIO_LOCKING);
725 if (ret == -EIO && iov_iter_rw(iter) == WRITE)
726 ret_saved = ret;
727 else

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation

Attachments:

(No filename) (1.98 kB)
.config.gz (28.49 kB)
Download all attachments

2016-03-30 04:21:01

On Tue, Apr 26, 2016 at 8:31 AM, Jan Kara <[email protected]> wrote:
> On Tue 26-04-16 07:59:10, Dan Williams wrote:
>> On Tue, Apr 26, 2016 at 1:27 AM, Dave Chinner <[email protected]> wrote:
>> > On Mon, Apr 25, 2016 at 09:18:42PM -0700, Dan Williams wrote:
>> [..]
>> > It seems to me you are focussing on code/technologies that exist
>> > today instead of trying to define an architecture that is more
>> > optimal for pmem storage systems. Yes, working code is great, but if
>> > you can't tell people how things like robust error handling and
>> > redundancy are going to work in future then it's going to take
>> > forever for everyone else to handle such errors robustly through the
>> > storage stack...
>>
>> Precisely because higher order redundancy is built on top this baseline.
>>
>> MD-RAID can't do it's error recovery if we don't have -EIO and
>> clear-error-on-write. On the other hand, you're absolutely right that
>> we have a gaping hole on top of the SIGBUS recovery model, and don't
>> have a kernel layer we can interpose on top of DAX to provide some
>> semblance of redundancy.
>>
>> In the meantime, a handful of applications with a team of full-time
>> site-reliability-engineers may be able to plug in external redundancy
>> infrastructure on top of what is defined in these patches. For
>> everyone else, the hard problem, we need to do a lot more thinking
>> about a trap and recover solution.
>
> So we could actually implement some kind of redundancy with DAX with
> reasonable effort. We already do track dirty storage PFNs in the radix
> tree. After DAX locking patches get merged we also have a reliable way to
> write-protect them when we decide to do 'writeback' (translates to flushing
> CPU caches) for them. When we do that, we have all the infrastructure in
> place to provide 'stable pages' while some mirroring or other redundancy
> mechanism in kernel works with the data.
>
> But as Dave said, we should do some writeup of how this is all supposed to
> work and e.g. which layer is going to be responsible for the redundancy. Do
> we want to have that in DAX code? Or just provide stable page guarantees
> from DAX and do the redundancy from device mapper? This needs more
> thought...
>

[ adding Mike, since his ears are likely burning by this point ]

If we had the ability to specify a range or list of ranges to
blkdev_issue_flush() that would allow the driver level to implement
redundancy at sync time. And no, before someone flies off the handle,
this isn't rehashing the same argument I lost about where to track
dirty pfns. Rather this relies on the radix to track dirty pfns, but
asks the driver to do the flush operation. In the nominal case this
is a clflush / clwb loop or wbinvd in the pmem driver, in the
redundancy case the pmem driver is swapped out for a driver that uses
the flush request as a trigger point to synchronize redundant data.

We want this at the driver level to take advantage of standard
asynchronous completions, and make it administratively equivalent to
the dm/md layering people are used to using.