2012-03-15 20:23:03

by Bruno Silva

[permalink] [raw]
Subject: About pNFS installation process.

Hello,

I am a PhD student and I intend to conduct performance experiments
with pNFS. First of all, I need to build the pNFS Block Layout
environment. I apologize for the long mail, but I need to explain all
the steps I took to build the environment. I followed the steps
presented at http://wiki.linux-nfs.org/wiki/index.php/PNFS_Block_Server_Setup_Instructions.

Steps.

(1) Build and install the kernel. I cloned this tree:

git clone git://linux-nfs.org/~bhalevy/linux-pnfs.git

and set these options in the .config file:

CONFIG_NFSD=m
CONFIG_NFSD_V4=y
CONFIG_PNFSD=y
# CONFIG_PNFSD_LOCAL_EXPORT is not set
CONFIG_PNFSD_BLOCK=y
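
(For reference, a typical build/install sequence for such a tree is
roughly the following; the exact make targets and paths depend on your
distribution, so treat this as a rough guide:)

cd linux-pnfs
make oldconfig            # confirm the pNFS options listed above
make -j4 bzImage modules  # build the kernel image and modules
make modules_install      # install modules under /lib/modules/<version>
make install              # install the kernel and update the bootloader
reboot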

(2) Build nfs-utils and utils/blkmapd.

git clone git://linux-nfs.org/~bhalevy/pnfs-nfs-utils.git
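
(A rough build sketch; the script names below follow stock nfs-utils
and may differ in this tree:)

cd pnfs-nfs-utils
sh autogen.sh             # generate ./configure (needs autoconf/automake/libtool)
./configure
make
ls utils/blkmapd/blkmapd  # the block layout mapping daemon ends up here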

(3) Export the file system.

-----------------------------------------------------------------
For the block access to work properly, the disks must have a signature.
Partition the disks using "parted"; disks partitioned with "fdisk"
don't have the signatures.

I followed the steps below.

parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart 1 <provide start and end of the partition>
(parted) print
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 53.7GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name  Flags
 1      17.4kB  53.7GB  53.7GB  ext3         1     msftres
I have tested with the ext4 file system, created with a 4K block size:

 # mkfs.ext4 -b 4096 /dev/sdb1
-----------------------------------------------------------------

I performed steps 1, 2 and 3 on all machines (data servers, metadata
server and client).

I used the following export option in the metadata server's exports file:
/mnt *(rw,sync,fsid=0,insecure,no_subtree_check,no_root_squash,pnfs)
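
(After editing /etc/exports, the export can be refreshed and verified
with, for example:)

exportfs -ra                  # re-read /etc/exports
exportfs -v                   # list the active exports and their options
grep pnfs /var/lib/nfs/etab   # confirm the pnfs option is in effect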

And I run the following script to start the server:

#!/bin/bash
# unmount /mnt
umount /mnt
# restart the iSCSI target service
service tgtd restart
sleep 8
# Create iSCSI target
tgtadm --lld iscsi --op new --mode target --tid 1 \
    -T iqn.1992-05.com.emc:openblock
# Expose LUN as iSCSI target
tgtadm --lld iscsi --mode logicalunit --op new --tid 1 --lun 1 \
    --backing-store /dev/sdb
# Allow access from all initiators
tgtadm --lld iscsi --mode target --op bind --tid 1 --initiator-address ALL
# show all the details
tgtadm --lld iscsi --op show --mode target
# mount the partition
mount /dev/sdb1 /mnt
sleep 3
# start the NFS server
service nfs restart
sleep 3
# start the ctl daemon
cd <CTL_SRC>/ctl/
./ctl -u &

If the data server and metadata server run on the same machine,
everything works normally. My question is: how do I add other pNFS
data servers to the environment? I know this is related to creating
iSCSI targets, but how are the data servers linked with the metadata
server? Is there some configuration file to inform the metadata
server, as in spNFS? How does it work?

Thanks in advance.


--
Bruno Silva
Computer Engineer
Modcs Group
---------------------------------------------------------------------
Facebook goo.gl/QHaZx
Twitter goo.gl/yk4jf
Google+ goo.gl/xIbgk


2012-03-18 10:51:41

by Lev Solomonov

[permalink] [raw]
Subject: Re: About pNFS installation process.

On Thu, Mar 15, 2012 at 22:23, Bruno Silva <[email protected]> wrote:
<snip>
> 1 17.4kB 53.7GB 53.7GB ext3 1 msftres
> I have tested with ext4 file system, create ext4 file system with
> 4K block size.
>
> # mkfs.ext4 -b 4096 /dev/sdb1
> -----------------------------------------------------------------

i suspect you might encounter some issues with ext4, once you get past
the initial setup problems.

> The steps 1, 2 and 3 i did to all machines (data servers, metadata
> server and client).

assuming iSCSI as the underlying storage model, generally speaking:
* blkmapd and blocklayoutdriver are client-specific (as jim mentioned).
* ctl daemon is MDS-specific.
* iSCSI target is DS-specific.
* both MDS and client will need iSCSI initiator.

you'll need to discover and login to the iSCSI target on the DS from
both the MDS and the client for proper pNFS operation.
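
for example, something along these lines on both boxes (the target
name and portal address are placeholders):

iscsiadm -m discovery -t sendtargets -p <ds-address>:3260
iscsiadm -m node -T <target-iqn> -p <ds-address>:3260 --login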

security often gets in the way, mind the ACLs on the iSCSI targets and
firewall settings on all three (client/MDS/DS) and between them.

> I adopted the export option in metadata server exports file.
> /mnt *(rw,sync,fsid=0,insecure,no_subtree_check,no_root_squash,pnfs)
<snip>
> If the data server and metadata server have been run on the same
> machine everything works normally.

see above. once you set everything up make sure you're actually running
in pNFS mode, i.e. that client sends/receives file data directly from
the DS over iSCSI, not as a fallback through MDS.

solo.

2012-03-16 18:14:05

by Alexandre Depoutovitch

[permalink] [raw]
Subject: About Direct I/O

Hello,
I am trying to do random sector-aligned writes to an NFS-mounted disk. The
performance is an order of magnitude worse than with 4K (file system block
size) aligned I/O.
The reason is that the NFS daemon (Linux kernel 2.6.32) on the server side
always does buffered I/O, which behaves poorly for block-unaligned
requests.
Is there a way to tell the NFS daemon to use direct I/O?
If not, is it an implementation limitation, or is there a fundamental
problem with using direct I/O in the NFS server?
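
(To illustrate the workload: a single sector-aligned but not
4K-aligned write into an existing file on the NFS mount can be
produced with something like the following; the path and offsets are
arbitrary:)

# 512-byte write at offset 12345*512 (sector-aligned, not 4K-aligned);
# O_DIRECT on the client so the request is sent to the server immediately
dd if=/dev/zero of=/mnt/nfs/testfile bs=512 count=1 seek=12345 conv=notrunc oflag=direct
# for comparison: a file-system-block-sized write at a 4K-aligned offset
dd if=/dev/zero of=/mnt/nfs/testfile bs=4096 count=1 seek=1543 conv=notrunc oflag=direct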

Thank you,

Alex

2012-03-23 18:02:52

by Jim Rees

[permalink] [raw]
Subject: Re: About pNFS installation process.

Blkmapd is finding your iscsi devices, but can't create the mapped device
for some reason. I suspect the geometry is wrong but you'd have to do some
debugging to figure out exactly why.

I guess we should fix or get rid of pretty_sig.

Date: Thu, 2 Dec 2010 09:41:26 -0500
From: Jim Rees <[email protected]>
Subject: Re: [PATCH 4/5] various minor cleanups
To: Benny Halevy <[email protected]>
Cc: [email protected], peter honeyman <[email protected]>

...

I am glad you are paying attention! I am aware of the shortcomings of
pretty_sig(). In addition to the problems you noted, it also assumes that a
signature over 8 bytes long is representable as a text string, which is not
guaranteed. The code it replaced was worse.

I put this in because for debugging I need to be able to follow a signature
all the way from my EMC server to the devmapper. pretty_sig() simply prints
the signature in a way that I can match it up with the signature on the
server.

I don't want to spend a lot of time on this, but I also am uneasy leaving
EMC-specific code in nfs-utils, especially since it can blow up if you use
it against a non-EMC server. My inclination is to remove this debugging
code when I no longer need it. I guess at the very least I should put in a
comment. I am open to suggestions.

2012-03-16 20:35:22

by J. Bruce Fields

[permalink] [raw]
Subject: Re: About Direct I/O

On Fri, Mar 16, 2012 at 11:14:04AM -0700, Alexandre Depoutovitch wrote:
> Hello,
> I am trying to do random sector aligned writes to an NFS mounted disk. The
> performance is order of magnitude worse than 4K (file system block size)
> aligned I/O.
> The reason is that NFS demon (Linux kernel 2.6.32) on the server side
> always does buffered I/O, which behaves poorly for block unaligned
> requests.
> Is there a way to tell NFS daemon to use direct I/O?

No.

> If not, is it an implementation limitation or there is a fundamental
> problem with using direct I/O in NFS server?

I'm shamefully ignorant of Direct IO....

If we supported Direct IO, are there heuristics that would let the
server figure out on its own when it helped and when it didn't? Or
would the administrator be stuck trying to figure that out?

Is Direct IO possible from kernel buffers these days? Are there
alignment restrictions?

--b.

2012-03-16 20:58:16

by Myklebust, Trond

[permalink] [raw]
Subject: Re: About Direct I/O

On Fri, 2012-03-16 at 16:35 -0400, J. Bruce Fields wrote:
> On Fri, Mar 16, 2012 at 11:14:04AM -0700, Alexandre Depoutovitch wrote:
> > Hello,
> > I am trying to do random sector aligned writes to an NFS mounted disk. The
> > performance is order of magnitude worse than 4K (file system block size)
> > aligned I/O.
> > The reason is that NFS demon (Linux kernel 2.6.32) on the server side
> > always does buffered I/O, which behaves poorly for block unaligned
> > requests.
> > Is there a way to tell NFS daemon to use direct I/O?
>
> No.
>
> > If not, is it an implementation limitation or there is a fundamental
> > problem with using direct I/O in NFS server?
>
> I'm shamefully ignorant of Direct IO....
>
> If we supported Direct IO, are there heuristics that would let the
> server figure out on its own when it helped and when it didn't?  Or
> would the administrator be stuck trying to figure that out?
>
> Is Direct IO possible from kernel buffers these days?  Are there
> alignment restrictions?

Work is on its way to allow direct i/o from kernel buffers, but it is
not possible with existing kernels.

Have patience.
--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-03-23 17:21:45

by Bruno Silva

[permalink] [raw]
Subject: Re: About pNFS installation process.

First of all, thanks for your replies.

I still have a few questions:

1. What file system do you suggest? Why is ext4 not recommended?
2. How do I know whether pNFS is correctly set up? The blkmapd output follows:

[bruno@fedora blkmapd]$ sudo ./blkmapd -f
blkmapd: process_deviceinfo: 2 vols ***** POINT 1 *******
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string
%�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument ***** POINT 2 *******
blkmapd: Create device pnfs_vol_0 failed
blkmapd: dm_device_create: 1 pnfs_vol_0 0:0
blkmapd: process_deviceinfo: 2 vols
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string
%�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument
blkmapd: Create device pnfs_vol_1 failed
blkmapd: dm_device_create: 1 pnfs_vol_1 0:0
blkmapd: process_deviceinfo: 2 vols
blkmapd: decode_blk_signature: si_comps[0]: bs_length 16, bs_string
%�c,�kkG�+ΉJ�2
blkmapd: read_cmp_blk_sig: /dev/sde sig %�c,�kkG�+ΉJ�2� at 568
blkmapd: decode_blk_volume: simple 0
blkmapd: decode_blk_volume: slice 1
device-mapper: reload ioctl failed: Invalid argument
blkmapd: Create device pnfs_vol_2 failed
blkmapd: dm_device_create: 1 pnfs_vol_2 0:0

***** POINT 1 ******* I connected to four data servers using the
command "iscsiadm -m discovery -t SendTargets -p <IP of data server> -l".
But note the blkmapd output "blkmapd: process_deviceinfo: 2 vols". I
believe four volumes should be listed, not two;

***** POINT 2 ******* What does this message mean? "device-mapper:
reload ioctl failed: Invalid argument"

The output of "grep nfs /proc/self/mountstats" follows:

[bruno@fedora ~]$ grep nfs /proc/self/mountstats
device sunrpc mounted on /var/lib/nfs/rpc_pipefs with fstype rpc_pipefs
device 192.168.0.203:/ mounted on /home/bruno/shared with fstype nfs4
statvers=1.0
nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,acl=0x3,sessions,pnfs=not configured
RPC iostats version: 1.0 p/v: 100003/4 (nfs)

My question is: how can I add other data servers to the structure,
and how do I find out where the problem is?

Thanks in advance.

Regards,

On 18 March 2012 at 07:51, Lev Solomonov <[email protected]> wrote:
>
> On Thu, Mar 15, 2012 at 22:23, Bruno Silva <[email protected]> wrote:
>    <snip>
> > 1      17.4kB  53.7GB  53.7GB  ext3         1     msftres
> > I have tested with ext4 file system, create ext4 file system with
> > 4K block size.
> >
> > # mkfs.ext4 -b 4096 /dev/sdb1
> > -----------------------------------------------------------------
>
> i suspect you might encounter some issues with ext4, once you get past
> the initial setup problems.
>
> > The steps 1, 2 and 3 i did to all machines (data servers, metadata
> > server and client).
>
> assuming iSCSI as the underlying storage model, generally speaking:
> * blkmapd and blocklayoutdriver are client-specific (as jim mentioned).
> * ctl daemon is MDS-specific.
> * iSCSI target is DS-specific.
> * both MDS and client will need iSCSI initiator.
>
> you'll need to discover and login to the iSCSI target on the DS from
> both the MDS and the client for proper pNFS operation.
>
> security often gets in the way, mind the ACLs on the iSCSI targets and
> firewall settings on all three (client/MDS/DS) and between them.
>
> > I adopted the export option in metadata server exports file.
> > /mnt *(rw,sync,fsid=0,insecure,no_subtree_check,no_root_squash,pnfs)
>    <snip>
> > If the data server and metadata server have been run on the same
> > machine everything works normally.
>
> see above. once you set everything up make sure you're actually running
> in pNFS mode, i.e. that client sends/receives file data directly from
> the DS over iSCSI, not as a fallback through MDS.
>
> solo.




--
Bruno Silva
Computer Engineer
Modcs Group
---------------------------------------------------------------------
Facebook goo.gl/QHaZx
Twitter goo.gl/yk4jf
Google+ goo.gl/xIbgk

2012-03-16 00:16:51

by Jim Rees

[permalink] [raw]
Subject: Re: About pNFS installation process.

Bruno Silva wrote:

The steps 1, 2 and 3 i did to all machines (data servers, metadata
server and client).

You only need blkmapd on the client. Client setup instructions are at
http://wiki.linux-nfs.org/wiki/index.php/Fedora_pNFS_Client_Setup . I added
a link to that page.
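
(Once the client pieces are in place, a minimal NFSv4.1 mount of the
MDS export would look roughly like this; the server address and mount
point are placeholders:)

modprobe blocklayoutdriver   # block layout client module; blkmapd must also be running
mount -t nfs4 -o minorversion=1 <mds-address>:/ /mnt/pnfs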

2012-04-04 00:21:02

by Lev Solomonov

[permalink] [raw]
Subject: Re: About pNFS installation process.

sorry for the delay.

On Fri, Mar 23, 2012 at 19:21, Bruno Silva <[email protected]> wrote:
> Fist of all, thanks for your replies.
>
> I still have a few questions:
>
> 1. What file system do you suggest? Why ext4 is not recommended?

i'm not aware of a single FS that will work perfectly out-of-the-box
with the pNFS block layout on linux. in the case of ext4/btrfs, there's
tension between the newer, advanced FS features and the underlying-FS
requirements of the pNFS block layout.

> 2. How know whether the pNFS is correctly set? Follows the blkmapd output:

IIRC neither client nor MDS nor DS emit blatantly bogus error messages,
so if you see any errors (e.g. 'failed' below) - that's a bad sign.

once you've cleared those, make sure that MDS exports pNFS, e.g.:
grep pnfs /var/lib/nfs/etab
or similar shows your expected exports, and that client mounts those as
v4.1, e.g.:
grep minorversion=1 /proc/mounts
or similar shows the expected mounts. then use the mounts on the client
while sniffing the traffic; you'll want to see the file data move
around over iSCSI client<->DS rather than over NFS client<->MDS.
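
something like this on the client makes the split visible (interface
and addresses are placeholders):

# iSCSI to the DS (port 3260) -- bulk file data should show up here
tcpdump -ni eth0 host <ds-address> and port 3260
# NFS to the MDS (port 2049) -- should carry metadata, not the bulk data
tcpdump -ni eth0 host <mds-address> and port 2049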

<snip>
> ***** POINT 1 ******* I connected with four data servers using
> the command iscsiadm-m discovery-t-p SendTargets <IP DATE OF
> server>-l. But, note the output blkmapd "blkmapd: process_deviceinfo:
> 2 vols". I believe that should be listed four volumes, not two;

nope, the "2 vols" is just the single device info: simple+slice, and you
appear to have several of such devices. take a peek around GETDEVICEINFO
in RFC 5661 + 5663. regardless, DM barfed.

what export topology are you aiming for with those 4 volumes?

<snip>
> Follows the output of command of "grep nfs /proc/self/mountstats"
>
> [bruno@fedora ~]$ grep nfs /proc/self/mountstats
> device sunrpc mounted on /var/lib/nfs/rpc_pipefs with fstype rpc_pipefs
> device 192.168.0.203:/ mounted on /home/bruno/shared with fstype nfs4
> statvers=1.0
> nfsv4: bm0=0xfdffbfff,bm1=0x40f9be3e,acl=0x3,sessions,pnfs=not configured
> RPC iostats version: 1.0 p/v: 100003/4 (nfs)

the "pnfs=not configured" is bad news (no active pNFS layout driver),
you'll want that to say "pnfs=LAYOUT_BLOCK_VOLUME".
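
for instance (module name as above, server and path taken from your
mountstats output; a sketch, not a guaranteed fix):

# on the client: load the block layout driver, then remount as v4.1
modprobe blocklayoutdriver
umount /home/bruno/shared
mount -t nfs4 -o minorversion=1 192.168.0.203:/ /home/bruno/shared
grep pnfs /proc/self/mountstats   # should now report LAYOUT_BLOCK_VOLUME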

> My question is: How can i add other data servers in the structure and

did you manage to successfully export a single DS iSCSI LUN through MDS
over pNFS?

> how do I find out where the problem is?

DM is likely to have left something in /var/log/messages (or wherever)
on failed reloads. any 'device-mapper' entries there?
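
e.g. (log path varies by distro):

# device-mapper complaints around the failed reload
grep -i device-mapper /var/log/messages | tail -20
dmesg | grep -i device-mapper | tail -20
# what DM devices (if any) blkmapd managed to set up
dmsetup ls
dmsetup table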

solo.