Evgeniy Polyakov | b8523c4 | 2009-02-09 17:02:34 +0300 | 1 | POHMELFS network protocol. |
| 2 | |
| 3 | The basic structure used in network communication is the following command: |
| 4 | |
| 5 | struct netfs_cmd |
| 6 | { |
| 7 | __u16 cmd; /* Command number */ |
| 8 | __u16 csize; /* Attached crypto information size */ |
| 9 | __u16 cpad; /* Attached padding size */ |
| 10 | __u16 ext; /* External flags */ |
| 11 | __u32 size; /* Size of the attached data */ |
| 12 | __u32 trans; /* Transaction id */ |
| 13 | __u64 id; /* Object ID to operate on. Used for feedback.*/ |
| 14 | __u64 start; /* Start of the object. */ |
| 15 | __u64 iv; /* IV sequence */ |
| 16 | __u8 data[0]; |
| 17 | }; |
| 18 | |
| 19 | Commands can be embedded into a transaction command (which in turn has its own command), |
| 20 | so the protocol can be extended as needed without breaking backward compatibility as long |
| 21 | as old commands are supported. All string lengths include the trailing 0 byte. |
| 22 | |
| 23 | All commands are transferred over the network in big-endian byte order. CPU endianness is used at the end peers. |
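The byte-order rule above can be sketched in userspace C. This is only an illustration: it serializes the header into a separate buffer byte by byte, which works on any host endianness, and the names put_be(), cmd_to_wire() and NETFS_CMD_HDR_SIZE are hypothetical, not taken from the POHMELFS sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of struct netfs_cmd using fixed-width types. */
struct netfs_cmd {
	uint16_t cmd;
	uint16_t csize;
	uint16_t cpad;
	uint16_t ext;
	uint32_t size;
	uint32_t trans;
	uint64_t id;
	uint64_t start;
	uint64_t iv;
	uint8_t  data[];
};

/* Hypothetical name: size of the fixed header, 4*2 + 2*4 + 3*8 bytes. */
#define NETFS_CMD_HDR_SIZE 40

/* Store the low nbytes of v at p, most significant byte first. */
static uint8_t *put_be(uint8_t *p, uint64_t v, size_t nbytes)
{
	for (size_t i = nbytes; i-- > 0; v >>= 8)
		p[i] = (uint8_t)(v & 0xff);
	return p + nbytes;
}

/* Hypothetical helper: write the header fields in big-endian wire order. */
static void cmd_to_wire(const struct netfs_cmd *c,
			uint8_t out[NETFS_CMD_HDR_SIZE])
{
	uint8_t *p = out;

	p = put_be(p, c->cmd, 2);
	p = put_be(p, c->csize, 2);
	p = put_be(p, c->cpad, 2);
	p = put_be(p, c->ext, 2);
	p = put_be(p, c->size, 4);
	p = put_be(p, c->trans, 4);
	p = put_be(p, c->id, 8);
	p = put_be(p, c->start, 8);
	p = put_be(p, c->iv, 8);
	assert(p == out + NETFS_CMD_HDR_SIZE);
}
```

The receiving peer would apply the reverse conversion before interpreting any field.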
| 24 | |
| 25 | @cmd - command number, which specifies the command to be processed. The following |
| 26 | commands are currently used: |
| 27 | |
| 28 | NETFS_READDIR = 1, /* Read directory for given inode number */ |
| 29 | NETFS_READ_PAGE, /* Read data page from the server */ |
| 30 | NETFS_WRITE_PAGE, /* Write data page to the server */ |
| 31 | NETFS_CREATE, /* Create directory entry */ |
| 32 | NETFS_REMOVE, /* Remove directory entry */ |
| 33 | NETFS_LOOKUP, /* Lookup single object */ |
| 34 | NETFS_LINK, /* Create a link */ |
| 35 | NETFS_TRANS, /* Transaction */ |
| 36 | NETFS_OPEN, /* Open intent */ |
| 37 | NETFS_INODE_INFO, /* Metadata cache coherency synchronization message */ |
| 38 | NETFS_PAGE_CACHE, /* Page cache invalidation message */ |
| 39 | NETFS_READ_PAGES, /* Read multiple contiguous pages in one go */ |
| 40 | NETFS_RENAME, /* Rename object */ |
| 41 | NETFS_CAPABILITIES, /* Capabilities of the client, for example supported crypto */ |
| 42 | NETFS_LOCK, /* Distributed lock message */ |
| 43 | NETFS_XATTR_SET, /* Set extended attribute */ |
| 44 | NETFS_XATTR_GET, /* Get extended attribute */ |
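Since the list starts at 1 and every later entry takes the next value, the numeric command codes follow directly. A minimal sketch of the list as a C enum, with a range check; NETFS_CMD_LIMIT and netfs_cmd_is_known() are hypothetical names added for illustration only.

```c
#include <assert.h>
#include <stdint.h>

enum {
	NETFS_READDIR = 1,
	NETFS_READ_PAGE,
	NETFS_WRITE_PAGE,
	NETFS_CREATE,
	NETFS_REMOVE,
	NETFS_LOOKUP,
	NETFS_LINK,
	NETFS_TRANS,
	NETFS_OPEN,
	NETFS_INODE_INFO,
	NETFS_PAGE_CACHE,
	NETFS_READ_PAGES,
	NETFS_RENAME,
	NETFS_CAPABILITIES,
	NETFS_LOCK,
	NETFS_XATTR_SET,
	NETFS_XATTR_GET,
	NETFS_CMD_LIMIT		/* hypothetical sentinel, one past the last command */
};

/* Counting from NETFS_READDIR = 1, the last listed command is 17. */
static_assert(NETFS_XATTR_GET == 17, "command list out of sync");

/* Hypothetical helper: reject command numbers outside the list. */
static int netfs_cmd_is_known(uint16_t cmd)
{
	return cmd >= NETFS_READDIR && cmd < NETFS_CMD_LIMIT;
}
```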
| 45 | |
| 46 | @ext - external flags. Used by different commands to specify extra arguments, |
| 47 | such as the partial size of embedded objects or creation flags. |
| 48 | |
| 49 | @size - size of the attached data. For NETFS_READ_PAGE and NETFS_READ_PAGES no data is attached, |
| 50 | but the size of the requested data is stored here. It does not include the size of the command |
| 51 | header (struct netfs_cmd) itself. |
| 52 | |
| 53 | @id - id of the object this command operates on. Each command can use it for its own purpose. |
| 54 | |
| 55 | @start - start of the object this command operates on. Each command can use it for its own purpose. |
| 56 | |
| 57 | @csize, @cpad - size and padding size of the (attached if needed) crypto information. |
| 58 | |
| 59 | Command specifications. |
| 60 | |
| 61 | @NETFS_READDIR |
| 62 | This command is used to sync content of the remote dir to the client. |
| 63 | |
| 64 | @ext - length of the path to object. |
| 65 | @size - the same. |
| 66 | @id - local inode number of the directory to read. |
| 67 | @start - zero. |
| 68 | |
| 69 | |
| 70 | @NETFS_READ_PAGE |
| 71 | This command is used to read data from remote server. |
| 72 | Data size does not exceed local page cache size. |
| 73 | |
| 74 | @id - inode number. |
| 75 | @start - first byte offset. |
| 76 | @size - number of bytes to read plus length of the path to object. |
| 77 | @ext - object path length. |
| 78 | |
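The field rules for NETFS_READ_PAGE can be sketched as follows. This assumes the path is a NUL-terminated string whose length counts the trailing 0 byte, as stated for all strings in this protocol; fill_read_page() and the example path are hypothetical, and byte-order conversion is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NETFS_READ_PAGE 2	/* second entry in the command list */

struct netfs_cmd {
	uint16_t cmd;
	uint16_t csize;
	uint16_t cpad;
	uint16_t ext;
	uint32_t size;
	uint32_t trans;
	uint64_t id;
	uint64_t start;
	uint64_t iv;
	uint8_t  data[];
};

/* Hypothetical helper: fill a NETFS_READ_PAGE header in CPU byte order. */
static void fill_read_page(struct netfs_cmd *cmd, uint64_t ino,
			   uint64_t offset, uint32_t nbytes, const char *path)
{
	size_t path_len = strlen(path) + 1;	/* includes the trailing 0 byte */

	assert(path_len <= UINT16_MAX);		/* @ext is only 16 bits wide */

	memset(cmd, 0, sizeof(*cmd));
	cmd->cmd = NETFS_READ_PAGE;
	cmd->id = ino;				/* inode number */
	cmd->start = offset;			/* first byte offset */
	cmd->ext = (uint16_t)path_len;		/* object path length */
	cmd->size = nbytes + (uint32_t)path_len; /* bytes to read plus path length */
}
```

For a 4096-byte read of "/a/b" this gives @ext = 5 and @size = 4101.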
| 79 | |
| 80 | @NETFS_CREATE |
| 81 | Used to create object. |
| 82 | It does not require that all directories above the object have |
| 83 | already been created; it will create them automatically. Each object has |
| 84 | an associated @netfs_path_entry data structure, which contains the creation |
| 85 | mode (permissions and type) and the length of the name as well as the name itself. |
| 86 | |
| 87 | @start - 0 |
| 88 | @size - size of all the data structures needed to create a path |
| 89 | @id - local inode number |
| 90 | @ext - 0 |
| 91 | |
| 92 | |
| 93 | @NETFS_REMOVE |
| 94 | Used to remove object. |
| 95 | |
| 96 | @ext - length of the path to object. |
| 97 | @size - the same. |
| 98 | @id - local inode number. |
| 99 | @start - zero. |
| 100 | |
| 101 | |
| 102 | @NETFS_LOOKUP |
| 103 | Lookup information about object on server. |
| 104 | |
| 105 | @ext - length of the path to object. |
| 106 | @size - the same. |
| 107 | @id - local inode number of the directory to look the object up in. |
| 108 | @start - local inode number of the object to look at. |
| 109 | |
| 110 | |
| 111 | @NETFS_LINK |
| 112 | Create a hard link or a symlink. |
| 113 | Command is sent as "object_path|target_path". |
| 114 | |
| 115 | @size - size of the above string. |
| 116 | @id - parent local inode number. |
| 117 | @start - 1 for symlink, 0 for hardlink. |
| 118 | @ext - size of the "object_path" above. |
| 119 | |
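Building the attached string for NETFS_LINK might look like the sketch below. The text above does not say whether the '|' separator counts toward @ext; this sketch assumes it does not, and assumes @size includes the trailing 0 byte. build_link_payload() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "object_path|target_path" into buf.
 * Returns the payload size including the trailing 0 byte (candidate
 * for @size), or 0 if buf is too small. *obj_len receives the length
 * of the object_path part (candidate for @ext, separator excluded).
 */
static size_t build_link_payload(char *buf, size_t buflen,
				 const char *object_path,
				 const char *target_path, size_t *obj_len)
{
	int n;

	assert(buf && object_path && target_path && obj_len);

	n = snprintf(buf, buflen, "%s|%s", object_path, target_path);
	if (n < 0 || (size_t)n >= buflen)
		return 0;	/* formatting error or truncated output */

	*obj_len = strlen(object_path);
	return (size_t)n + 1;
}
```

The same layout applies to the "old_path|new_path" string of NETFS_RENAME.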
| 120 | |
| 121 | @NETFS_TRANS |
| 122 | Transaction header. |
| 123 | |
| 124 | @size - incorporates all embedded command sizes including their header sizes. |
| 125 | @start - transaction generation number - unique id used to find transaction. |
| 126 | @ext - transaction flags. Unused at the moment. |
| 127 | @id - 0. |
| 128 | |
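A rough sketch of how the transaction @size could be accumulated: each embedded command contributes its header plus the bytes actually attached to it. Crypto data is ignored here, and trans_size() with its parameters is a hypothetical helper rather than code from the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct netfs_cmd {
	uint16_t cmd;
	uint16_t csize;
	uint16_t cpad;
	uint16_t ext;
	uint32_t size;
	uint32_t trans;
	uint64_t id;
	uint64_t start;
	uint64_t iv;
	uint8_t  data[];
};

/*
 * Hypothetical helper: attached[i] is the number of bytes attached to
 * embedded command i. The result is the sum of all headers and payloads.
 */
static uint32_t trans_size(const uint32_t *attached, size_t n)
{
	uint32_t total = 0;

	assert(attached != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		total += (uint32_t)sizeof(struct netfs_cmd) + attached[i];
	return total;
}
```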
| 129 | |
| 130 | @NETFS_OPEN |
| 131 | Open intent for given transaction. |
| 132 | |
| 133 | @id - local inode number. |
| 134 | @start - 0. |
| 135 | @size - path length to the object. |
| 136 | @ext - open flags (O_RDWR and so on). |
| 137 | |
| 138 | |
| 139 | @NETFS_INODE_INFO |
| 140 | Metadata update command. |
| 141 | It is sent to servers when attributes of the object are changed and received |
| 142 | when data or metadata were updated. It operates with the following structure: |
| 143 | |
| 144 | struct netfs_inode_info |
| 145 | { |
| 146 | unsigned int mode; |
| 147 | unsigned int nlink; |
| 148 | unsigned int uid; |
| 149 | unsigned int gid; |
| 150 | unsigned int blocksize; |
| 151 | unsigned int padding; |
| 152 | __u64 ino; |
| 153 | __u64 blocks; |
| 154 | __u64 rdev; |
| 155 | __u64 size; |
| 156 | __u64 version; |
| 157 | }; |
| 158 | |
| 159 | It effectively mirrors stat(2) returned data. |
| 160 | |
| 161 | |
| 162 | @ext - path length to the object. |
| 163 | @size - the same plus size of the netfs_inode_info structure. |
| 164 | @id - local inode number. |
| 165 | @start - 0. |
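The @ext and @size values for NETFS_INODE_INFO follow from the path length and the structure size. A minimal sketch, assuming a NUL-terminated path whose length counts the trailing 0 byte; inode_info_sizes() is a hypothetical helper. With a 32-bit unsigned int the structure occupies 64 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct netfs_inode_info {
	unsigned int mode;
	unsigned int nlink;
	unsigned int uid;
	unsigned int gid;
	unsigned int blocksize;
	unsigned int padding;
	uint64_t ino;
	uint64_t blocks;
	uint64_t rdev;
	uint64_t size;
	uint64_t version;
};

/* Hypothetical helper: stores @ext in *ext and returns @size. */
static uint32_t inode_info_sizes(const char *path, uint16_t *ext)
{
	size_t path_len = strlen(path) + 1;	/* includes the trailing 0 byte */

	assert(path_len <= UINT16_MAX);		/* @ext is only 16 bits wide */

	*ext = (uint16_t)path_len;
	return (uint32_t)(path_len + sizeof(struct netfs_inode_info));
}
```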
| 166 | |
| 167 | |
| 168 | @NETFS_PAGE_CACHE |
| 169 | Command is only received by clients. It contains information about |
| 170 | page to be marked as not up-to-date. |
| 171 | |
| 172 | @id - client's inode number. |
| 173 | @start - last byte of the page to be invalidated. If it is not equal to |
| 174 | current inode size, it will be vmtruncated(). |
| 175 | @size - 0 |
| 176 | @ext - 0 |
| 177 | |
| 178 | |
| 179 | @NETFS_READ_PAGES |
| 180 | Used to read multiple contiguous pages in one go. |
| 181 | |
| 182 | @start - first byte of the contiguous region to read. |
| 183 | @size - consists of two fields: the lower 8 bits represent the page cache shift |
| 184 | used by the client, the remaining 3 bytes hold the number of pages. |
| 185 | @id - local inode number. |
| 186 | @ext - path length to the object. |
| 187 | |
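The @size packing for NETFS_READ_PAGES can be sketched with a few bit operations. The helper names are hypothetical; the layout (shift in the low 8 bits, page count in the upper 24 bits) is taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack page count and page cache shift into @size. */
static uint32_t read_pages_pack(uint32_t num_pages, uint8_t page_shift)
{
	assert(num_pages < (1u << 24));	/* the page count must fit in 3 bytes */
	return (num_pages << 8) | page_shift;
}

/* Upper 3 bytes: number of pages. */
static uint32_t read_pages_num(uint32_t size)
{
	return size >> 8;
}

/* Lower 8 bits: page cache shift used by the client. */
static uint8_t read_pages_shift(uint32_t size)
{
	return (uint8_t)(size & 0xff);
}
```

For example, 16 pages with a shift of 12 (4096-byte pages) pack to 0x100c, and the requested region is 16 << 12 = 65536 bytes long.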
| 188 | |
| 189 | @NETFS_RENAME |
| 190 | Used to rename object. |
| 191 | Attached data is formed into following string: "old_path|new_path". |
| 192 | |
| 193 | @id - local inode number. |
| 194 | @start - parent inode number. |
| 195 | @size - length of the above string. |
| 196 | @ext - length of the old path part. |
| 197 | |
| 198 | |
| 199 | @NETFS_CAPABILITIES |
| 200 | Used to exchange crypto capabilities with server. |
| 201 | If crypto capabilities are not supported by the server, then the client will disable them |
| 202 | or fail (if the 'crypto_fail_unsupported' mount option was specified). |
| 203 | |
| 204 | @id - superblock index. Used to specify crypto information for group of servers. |
| 205 | @size - size of the attached capabilities structure. |
| 206 | @start - 0. |
| 207 | @ext - 0. |
| 208 | @csize - 0. |
| 209 | |
| 210 | @NETFS_LOCK |
| 211 | Used to send lock request/release messages. Although it sends byte range request |
| 212 | and is capable of flushing pages based on that, it is not used, since all Linux |
| 213 | filesystems lock the whole inode. |
| 214 | |
| 215 | @id - lock generation number. |
| 216 | @start - start of the locked range. |
| 217 | @size - size of the locked range. |
| 218 | @ext - lock type: read/write. Not actually used. The 15th bit determines |
| 219 | whether it is a lock request (1) or release (0). |
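The request/release flag in @ext can be sketched as below. This assumes "15th bit" means bit index 15 counted from zero, i.e. the top bit of the 16-bit field; LOCK_REQUEST_BIT and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: bit index 15 (zero-based), the top bit of the 16-bit @ext. */
#define LOCK_REQUEST_BIT (1u << 15)

/* Hypothetical helper: combine the lock type with the request/release flag. */
static uint16_t lock_ext(uint16_t type, int request)
{
	assert((type & LOCK_REQUEST_BIT) == 0);	/* type must not use the flag bit */
	return (uint16_t)(request ? (type | LOCK_REQUEST_BIT) : type);
}

/* Returns 1 for a lock request, 0 for a release. */
static int lock_is_request(uint16_t ext)
{
	return (ext & LOCK_REQUEST_BIT) != 0;
}
```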
| 220 | |
| 221 | @NETFS_XATTR_SET |
| 222 | @NETFS_XATTR_GET |
| 223 | Used to set/get extended attributes for given inode. |
| 224 | @id - attribute generation number or xattr setting type |
| 225 | @start - size of the attribute (request or attached) |
| 226 | @size - name length, path len and data size for given attribute |
| 227 | @ext - path length for given object |