################################################################################
#                                                                              #
#                               NFS/RDMA README                                #
#                                                                              #
################################################################################

 Author: NetApp and Open Grid Computing
 Date: May 29, 2008

Table of Contents
~~~~~~~~~~~~~~~~~
 - Overview
 - Getting Help
 - Installation
 - Check RDMA and NFS Setup
 - NFS/RDMA Setup

Overview
~~~~~~~~

  This document describes how to install and set up the Linux NFS/RDMA client
  and server software.

  The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
  was first included in the following release, Linux 2.6.25.

  In our testing, we have obtained excellent performance results (full 10Gbit
  wire bandwidth at minimal client CPU) under many workloads. The code passes
  the full Connectathon test suite and operates over both InfiniBand and iWARP
  RDMA adapters.

Getting Help
~~~~~~~~~~~~

  If you get stuck, you can ask questions on the

                nfs-rdma-devel@lists.sourceforge.net

  mailing list.

Installation
~~~~~~~~~~~~

  These instructions are a step-by-step guide to building a machine for
  use with NFS/RDMA.

  - Install an RDMA device

    Any device supported by the drivers in drivers/infiniband/hw is acceptable.

    Testing has been performed using several Mellanox-based IB cards, the
    Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

  - Install a Linux distribution and tools

    The first kernel release to contain both the NFS/RDMA client and server was
    Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
    Linux kernel releases should be installed.

    The procedures described in this document have been tested with
    distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

  - Install nfs-utils-1.1.2 or greater on the client

    An NFS/RDMA mount point can be obtained by using the mount.nfs command in
    nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
    version with support for NFS/RDMA mounts, but for various reasons we
    recommend using nfs-utils-1.1.2 or greater). To see which version of
    mount.nfs you are using, type:

    $ /sbin/mount.nfs -V

    If the version is less than 1.1.2 or the command does not exist,
    you should install the latest version of nfs-utils.
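
    The version check above can be scripted. The following is a minimal
    sketch, not part of nfs-utils: the function names are hypothetical, it
    assumes GNU sort (for the -V option), and it simply takes the first dotted
    number in the "mount.nfs -V" output as the nfs-utils version, which is an
    assumption about the output format.

```shell
# Hypothetical helpers; assumes GNU grep and GNU sort (-V).

nfs_utils_version() {
    # Print the first dotted version number found in the arguments.
    printf '%s\n' "$*" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

version_at_least() {
    # Succeed when version $1 is greater than or equal to version $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_mount_nfs() {
    # $1: text as printed by "/sbin/mount.nfs -V".
    # Succeed when it names nfs-utils 1.1.2 or greater.
    found=$(nfs_utils_version "$1")
    [ -n "$found" ] && version_at_least "$found" 1.1.2
}
```

    For example:

    $ check_mount_nfs "$(/sbin/mount.nfs -V 2>&1)" || echo "upgrade nfs-utils"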

    Download the latest package from:

    http://www.kernel.org/pub/linux/utils/nfs

    Uncompress the package and follow the installation instructions.

    If you will not need the idmapper and gssd executables (you do not need
    these to create an NFS/RDMA enabled mount command), the installation
    process can be simplified by disabling these features when running
    configure:

    $ ./configure --disable-gss --disable-nfsv4

    To build nfs-utils you will need the tcp_wrappers package installed. For
    more information on this see the package's README and INSTALL files.

    After building the nfs-utils package, there will be a mount.nfs binary in
    the utils/mount directory. This binary can be used to initiate NFS v2, v3,
    or v4 mounts. To initiate a v4 mount, the binary must be called
    mount.nfs4. The standard technique is to create a symlink called
    mount.nfs4 to mount.nfs.

    This mount.nfs binary should be installed at /sbin/mount.nfs as follows:

    $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs

    In this location, mount.nfs will be invoked automatically for NFS mounts
    by the system mount command.
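
    The copy and the mount.nfs4 symlink described above can be combined in one
    step. This is only a sketch: install_mount_nfs is a hypothetical helper,
    and the destination directory is a parameter so that it can be tried on a
    scratch directory before touching /sbin.

```shell
install_mount_nfs() {
    # $1: path to the freshly built mount.nfs binary
    # $2: destination directory (normally /sbin)
    # Copy the binary, then create mount.nfs4 as a relative symlink to it.
    cp "$1" "$2/mount.nfs" &&
    ln -sf mount.nfs "$2/mount.nfs4"
}
```

    Run as root from the top of the nfs-utils build tree, this would be:

    $ install_mount_nfs utils/mount/mount.nfs /sbin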

    NOTE: mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
    on the NFS client machine. You do not need this specific version of
    nfs-utils on the server. Furthermore, only the mount.nfs command from
    nfs-utils-1.1.2 is needed on the client.

  - Install a Linux kernel with NFS/RDMA

    The NFS/RDMA client and server are both included in the mainline Linux
    kernel version 2.6.25 and later. This and other versions of the 2.6 Linux
    kernel can be found at:

    ftp://ftp.kernel.org/pub/linux/kernel/v2.6/

    Download the sources and place them in an appropriate location.

  - Configure the RDMA stack

    Make sure your kernel configuration has RDMA support enabled. Under
    Device Drivers -> InfiniBand support, update the kernel configuration
    to enable InfiniBand support [NOTE: the option name is misleading. Enabling
    InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].

    Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
    iWARP adapter support (amso, cxgb3, etc.).

    If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

  - Configure the NFS client and server

    Your kernel configuration must also have NFS file system support and/or
    NFS server support enabled. These and other NFS related configuration
    options can be found under File Systems -> Network File Systems.

  - Build, install, reboot

    The NFS/RDMA code will be enabled automatically if NFS and RDMA
    are turned on. The NFS/RDMA client and server are configured via the hidden
    SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
    value of SUNRPC_XPRT_RDMA will be:

     - N if either SUNRPC or INFINIBAND is N. In this case the NFS/RDMA client
       and server will not be built.
     - M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M.
       In this case the NFS/RDMA client and server will be built as modules.
     - Y if both SUNRPC and INFINIBAND are Y. In this case the NFS/RDMA client
       and server will be built into the kernel.

    Therefore, if you have followed the steps above and turned on NFS and RDMA,
    the NFS/RDMA client and server will be built.
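
    Put another way, SUNRPC_XPRT_RDMA takes the more restrictive of the two
    values it depends on. The three cases above can be sketched as a small
    shell function. xprt_rdma_value is a hypothetical name used only for
    illustration; it is not a kernel build tool.

```shell
xprt_rdma_value() {
    # $1: value of SUNRPC (y, m or n)
    # $2: value of INFINIBAND (y, m or n)
    # Print the resulting value of SUNRPC_XPRT_RDMA.
    case "$1$2" in
        yy)       echo y ;;   # both built in: NFS/RDMA is built in
        ym|my|mm) echo m ;;   # both on, at least one modular: built as modules
        *)        echo n ;;   # either one off: NFS/RDMA is not built
    esac
}
```

    For example, "xprt_rdma_value y m" prints "m".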

    Build a new kernel, install it, boot it.

Check RDMA and NFS Setup
~~~~~~~~~~~~~~~~~~~~~~~~

    Before configuring the NFS/RDMA software, it is a good idea to test
    your new kernel to ensure that the kernel is working correctly.
    In particular, it is a good idea to verify that the RDMA stack
    is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
    is working properly.

  - Check RDMA Setup

    If you built the RDMA components as modules, load them at
    this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
    card:

    $ modprobe ib_mthca
    $ modprobe ib_ipoib

    If you are using InfiniBand, make sure there is a Subnet Manager (SM)
    running on the network. If your IB switch has an embedded SM, you can
    use it. Otherwise, you will need to run an SM, such as OpenSM, on one
    of your end nodes.

    If an SM is running on your network, you should see the following:

    $ cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

    where driverX is mthca0, ipath5, ehca3, etc.
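
    If you do not know the driverX name in advance, every port of every device
    can be listed at once. The following is a sketch under the assumption that
    all devices use the same ports/N/state layout as the path shown above;
    list_port_states is a hypothetical helper name.

```shell
list_port_states() {
    # $1: sysfs directory to scan (defaults to /sys/class/infiniband)
    base=${1:-/sys/class/infiniband}
    for state in "$base"/*/ports/*/state; do
        # Skip the unexpanded pattern when no device is present.
        [ -r "$state" ] || continue
        port=${state%/state}      # .../driverX/ports/N
        dev=${port%/ports/*}      # .../driverX
        printf '%s port %s: %s\n' "${dev##*/}" "${port##*/}" "$(cat "$state")"
    done
}
```

    A port that has been brought up by the SM is reported with the state
    "4: ACTIVE", as in the example above.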

    To further test the InfiniBand software stack, use IPoIB (this
    assumes you have two IB hosts named host1 and host2):

    host1$ ifconfig ib0 a.b.c.x
    host2$ ifconfig ib0 a.b.c.y
    host1$ ping a.b.c.y
    host2$ ping a.b.c.x

    For other device types, follow the appropriate procedures.

  - Check NFS Setup

    For the NFS components enabled above (client and/or server),
    test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
~~~~~~~~~~~~~~

  We recommend that you use two machines, one to act as the client and
  one to act as the server.

  One time configuration:

  - On the server system, configure the /etc/exports file and
    start the NFS/RDMA server.

    Exports entries with the following formats have been tested:

    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

    The IP addresses are the client's IPoIB addresses for an InfiniBand
    HCA or the client's iWARP addresses for an RNIC.

    NOTE: The "insecure" option must be used because the NFS/RDMA client does
    not use a reserved port.
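
    Because a missing "insecure" option is easy to overlook, the exports file
    can be checked before the server is started. This is a rough sketch:
    exports_missing_insecure is a hypothetical helper, and it checks each line
    as a whole, so a line that lists several clients with different option
    lists is not handled precisely.

```shell
exports_missing_insecure() {
    # $1: exports file to check (defaults to /etc/exports)
    # Print every entry (skipping comments and blank lines) that does not
    # carry the "insecure" option.
    grep -vE '^[[:space:]]*(#|$)' "${1:-/etc/exports}" |
        grep -vE '[(,]insecure[,)]'
}
```

    No output means that every entry carries the option.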

  Each time a machine boots:

  - Load and configure the RDMA drivers

    For InfiniBand using a Mellanox adapter:

    $ modprobe ib_mthca
    $ modprobe ib_ipoib
    $ ifconfig ib0 a.b.c.d

    NOTE: use unique addresses for the client and server

  - Start the NFS server

    If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
    kernel config), load the RDMA transport module:

    $ modprobe svcrdma

    Regardless of how the server was built (module or built-in), start the
    server:

    $ /etc/init.d/nfs start

    or

    $ service nfs start

    Instruct the server to listen on the RDMA transport:

    $ echo rdma 20049 > /proc/fs/nfsd/portlist
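
    To confirm that the listener was added, the same file can be read back.
    The sketch below assumes that reading portlist returns one
    "transport port" pair per line (for example "rdma 20049");
    rdma_listener_present is a hypothetical helper name.

```shell
rdma_listener_present() {
    # $1: port expected for the rdma transport (20049 in this README)
    # $2: portlist file to read (defaults to /proc/fs/nfsd/portlist)
    # Succeed when a line reading exactly "rdma <port>" is present.
    grep -qx "rdma $1" "${2:-/proc/fs/nfsd/portlist}"
}
```

    For example:

    $ rdma_listener_present 20049 && echo "server is listening on RDMA"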

  - On the client system

    If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
    kernel config), load the RDMA client module:

    $ modprobe xprtrdma

    Regardless of how the client was built (module or built-in), use this
    command to mount the NFS/RDMA server:

    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

    To verify that the mount is using RDMA, run "cat /proc/mounts" and check
    the "proto" field for the given mount.
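
    The "proto" check can also be scripted. This sketch assumes the usual
    /proc/mounts layout, with the mount point in the second field and the
    comma-separated options in the fourth; mount_proto is a hypothetical
    helper name.

```shell
mount_proto() {
    # $1: mount point to look up
    # $2: mounts file to read (defaults to /proc/mounts)
    # Print the value of the proto= option for that mount point.
    awk -v mp="$1" '$2 == mp {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^proto=/)
                print substr(opts[i], 7)   # text after "proto="
    }' "${2:-/proc/mounts}"
}
```

    For the mount command above, "mount_proto /mnt" should print "rdma".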

  Congratulations! You're using NFS/RDMA!