Blame - Documentation/filesystems/ext4.txt - kernel/msm-4.9

blob: 0d5394920a31c146ef85cc4df64107e37a6ae386

Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	1
				2	Ext4 Filesystem
				3	===============
				4
				5	This is a development version of the ext4 filesystem, an advanced level
				6	of the ext3 filesystem which incorporates scalability and reliability
				7	enhancements for supporting large filesystems (64 bit) in keeping with
				8	increasing disk capacities and state-of-the-art feature requirements.
				9
				10	Mailing list: linux-ext4@vger.kernel.org
				11
				12
				13	1. Quick usage instructions:
				14	===========================
				15
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	16	- Compile and install the latest version of e2fsprogs (as of this
				17	writing version 1.41) from:
				18
				19	http://sourceforge.net/project/showfiles.php?group_id=2406
				20
				21	or
				22
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	23	ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
				24
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	25	or grab the latest git repository from:
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	26
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	27	git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	28
Theodore Ts'o	4537398	2008-07-27 19:59:21 -0400	[diff] [blame]	29	- Note that it is highly important to install the mke2fs.conf file
				30	that comes with the e2fsprogs 1.41.x sources in /etc/mke2fs.conf. If
				31	you have edited the /etc/mke2fs.conf file installed on your system,
				32	you will need to merge your changes with the version from e2fsprogs
				33	1.41.x.
				34
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	35	- Create a new filesystem using the ext4dev filesystem type:
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	36
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	37	# mke2fs -t ext4dev /dev/hda1
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	38
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	39	Or configure an existing ext3 filesystem to support extents and set
				40	the test_fs flag to indicate that it's ok for an in-development
				41	filesystem to touch this filesystem:
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	42
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	43	# tune2fs -O extents -E test_fs /dev/hda1
				44
				45	If the filesystem was created with 128 byte inodes, it can be
				46	converted to use 256 byte inodes for greater efficiency via:
				47
				48	# tune2fs -I 256 /dev/hda1
				49
				50	(Note: we currently do not have tools to convert an ext4dev
				51	filesystem back to ext3; so please do not try this on production
				52	filesystems.)
				53
				54	- Mounting:
				55
				56	# mount -t ext4dev /dev/hda1 /wherever
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	57
				58	- When comparing performance with other filesystems, remember that
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	59	ext3/4 by default offers higher data integrity guarantees than most.
				60	So when comparing with a metadata-only journalling filesystem, such
				61	as XFS, JFS, or ReiserFS, use `mount -o data=writeback'. And you might as well use
				62	`mount -o nobh' too along with it. Making the journal larger than
				63	the mke2fs default often helps performance with metadata-intensive
				64	workloads.
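
The inode-size conversion step above can be guarded with a small check. The
following is a minimal sketch, not part of e2fsprogs: the helper names
inode_size and needs_inode_resize are hypothetical, and it assumes that
`tune2fs -l' prints a line beginning with "Inode size:".

```shell
# Read `tune2fs -l` style output on stdin and print the inode size in bytes.
# Assumes the listing contains a line of the form "Inode size:   <bytes>".
inode_size() {
    awk -F: '/^Inode size:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed (exit status 0) only when the reported inode size is below 256,
# i.e. when the `tune2fs -I 256` conversion described above would apply.
needs_inode_resize() {
    size=$(inode_size)
    [ -n "$size" ] && [ "$size" -lt 256 ]
}

# Example use (as root, on a test filesystem only):
#   tune2fs -l /dev/hda1 | needs_inode_resize && echo "run: tune2fs -I 256 /dev/hda1"
```

The helpers only read text, so they can be tried on saved `tune2fs -l' output
before touching a real device.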
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	65
				66	2. Features
				67	===========
				68
				69	2.1 Currently available
				70
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	71	* ability to use filesystems > 16TB (e2fsprogs support not available yet)
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	72	* extent format reduces metadata overhead (RAM, IO for access, transactions)
				73	* extent format more robust in face of on-disk corruption due to magics,
				74	  internal redundancy in tree
Mingming Cao	49f1487	2008-07-11 19:27:31 -0400	[diff] [blame]	75	* improved file allocation (multi-block alloc)
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	76	* fix 32000 subdirectory limit
				77	* nsec timestamps for mtime, atime, ctime, create time
				78	* inode version field on disk (NFSv4, Lustre)
				79	* reduced e2fsck time via uninit_bg feature
				80	* journal checksumming for robustness, performance
				81	* persistent file preallocation (e.g. for streaming media, databases)
				82	* ability to pack bitmaps and inode tables into larger virtual groups via the
				83	flex_bg feature
				84	* large file support
				85	* Inode allocation using large virtual block groups via flex_bg
Mingming Cao	49f1487	2008-07-11 19:27:31 -0400	[diff] [blame]	86	* delayed allocation
				87	* large block (up to pagesize) support
				88	* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
				89	the ordering)
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	90
				91	2.2 Candidate features for future inclusion
				92
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	93	* Online defrag (patches available but not well tested)
				94	* reduced mke2fs time via lazy itable initialization in conjunction with
				95	the uninit_bg feature (capability to do this is available in e2fsprogs
				96	but a kernel thread to do lazy zeroing of unused inode table blocks
				97	after filesystem is first mounted is required for safety)
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	98
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	99	There are several others under discussion; whether they all make it in is
				100	partly a function of how much time everyone has to work on them. Features like
				101	metadata checksumming have been discussed and planned for a bit but no patches
				102	exist yet so I'm not sure they're in the near-term roadmap.
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	103
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	104	The big performance win will come with mballoc, delalloc and flex_bg
				105	grouping of bitmaps and inode tables. Some test results available here:
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	106
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	107	- http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
				108	- http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	109
				110	3. Options
				111	==========
				112
				113	When mounting an ext4 filesystem, the following options are accepted:
				114	(*) == default
				115
Alex Tomas	c9de560	2008-01-29 00:19:52 -0500	[diff] [blame]	116	extents (*) ext4 will use extents to address file data. The
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	117	file system will no longer be mountable by ext3.
				118
Alex Tomas	c9de560	2008-01-29 00:19:52 -0500	[diff] [blame]	119	noextents ext4 will not use extents for newly created files
				120
Girish Shilamkar	818d276	2008-01-28 23:58:27 -0500	[diff] [blame]	121	journal_checksum Enable checksumming of the journal transactions.
				122	This will allow the recovery code in e2fsck and the
				123	kernel to detect corruption in the journal. It is a
				124	compatible change and will be ignored by older kernels.
				125
				126	journal_async_commit Commit block can be written to disk without waiting
				127	for descriptor blocks. If enabled, older kernels cannot
				128	mount the device. This will enable 'journal_checksum'
				129	internally.
				130
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	131	journal=update Update the ext4 file system's journal to the current
				132	format.
				133
				134	journal=inum When a journal already exists, this option is ignored.
				135	Otherwise, it specifies the number of the inode which
				136	will represent the ext4 file system's journal file.
				137
				138	journal_dev=devnum When the external journal device's major/minor numbers
				139	have changed, this option allows the user to specify
				140	the new journal location. The journal device is
				141	identified through its new major/minor numbers encoded
				142	in devnum.
				143
				144	noload Don't load the journal on mounting.
				145
				146	data=journal All data are committed into the journal prior to being
				147	written into the main file system.
				148
				149	data=ordered (*) All data are forced directly out to the main file
				150	system prior to its metadata being committed to the
				151	journal.
				152
				153	data=writeback Data ordering is not preserved, data may be written
				154	into the main file system after its metadata has been
				155	committed to the journal.
				156
				157	commit=nrsec (*) Ext4 can be told to sync all its data and metadata
				158	every 'nrsec' seconds. The default value is 5 seconds.
				159	This means that if you lose your power, you will lose
				160	as much as the latest 5 seconds of work (your
				161	filesystem will not be damaged though, thanks to the
				162	journaling). This default value (or any low value)
				163	will hurt performance, but it's good for data-safety.
				164	Setting it to 0 will have the same effect as leaving
				165	it at the default (5 seconds).
				166	Setting it to very large values will improve
				167	performance.
				168
Eric Sandeen	571640c	2008-05-26 12:29:46 -0400	[diff] [blame]	169	barrier=<0\|1(*)> This enables/disables the use of write barriers in
				170	the jbd code. barrier=0 disables, barrier=1 enables.
				171	This also requires an IO stack which can support
				172	barriers, and if jbd gets an error on a barrier
				173	write, it will disable again with a warning.
				174	Write barriers enforce proper on-disk ordering
				175	of journal commits, making volatile disk write caches
				176	safe to use, at some performance penalty. If
				177	your disks are battery-backed in one way or another,
				178	disabling barriers may safely improve performance.
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	179
				180	orlov (*) This enables the new Orlov block allocator. It is
				181	enabled by default.
				182
				183	oldalloc This disables the Orlov block allocator and enables
				184	the old block allocator. Orlov should have better
				185	performance - we'd like to get some feedback if it's
				186	the contrary for you.
				187
				188	user_xattr Enables Extended User Attributes. Additionally, you
				189	need to have extended attribute support enabled in the
				190	kernel configuration (CONFIG_EXT4_FS_XATTR). See the
				191	attr(5) manual page and http://acl.bestbits.at/ to
				192	learn more about extended attributes.
				193
				194	nouser_xattr Disables Extended User Attributes.
				195
				196	acl Enables POSIX Access Control Lists support.
				197	Additionally, you need to have ACL support enabled in
				198	the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL).
				199	See the acl(5) manual page and http://acl.bestbits.at/
				200	for more information.
				201
				202	noacl This option disables POSIX Access Control List
				203	support.
				204
				205	reservation
				206
				207	noreservation
				208
				209	bsddf (*) Make 'df' act like BSD.
				210	minixdf Make 'df' act like Minix.
				211
				212	check=none Don't do extra checking of bitmaps on mount.
				213	nocheck
				214
				215	debug Extra debugging information is sent to syslog.
				216
				217	errors=remount-ro(*) Remount the filesystem read-only on an error.
				218	errors=continue Keep going on a filesystem error.
				219	errors=panic Panic and halt the machine if an error occurs.
				220
				221	grpid Give objects the same group ID as their parent directory.
				222	bsdgroups
				223
				224	nogrpid (*) New objects have the group ID of their creator.
				225	sysvgroups
				226
				227	resgid=n The group ID which may use the reserved blocks.
				228
				229	resuid=n The user ID which may use the reserved blocks.
				230
				231	sb=n Use alternate superblock at this location.
				232
				233	quota
				234	noquota
				235	grpquota
				236	usrquota
				237
				238	bh (*) ext4 associates buffer heads to data pages to
				239	nobh (a) cache disk block mapping information
				240	(b) link pages into transaction to provide
				241	ordering guarantees.
				242	"bh" option forces use of buffer heads.
				243	"nobh" option tries to avoid associating buffer
				244	heads (supported only for "writeback" mode).
				245
Alex Tomas	c9de560	2008-01-29 00:19:52 -0500	[diff] [blame]	246	mballoc (*) Use the multiple block allocator for block allocation
				247	nomballoc Disable the multiple block allocator for block allocation.
				248	stripe=n Number of filesystem blocks that mballoc will try
				249	to use for allocation size and alignment. For RAID5/6
				250	systems this should be the number of data
				251	disks * RAID chunk size in file system blocks.
Mingming Cao	49f1487	2008-07-11 19:27:31 -0400	[diff] [blame]	252	delalloc (*) Defer block allocation until write-out time.
				253	nodelalloc Disable delayed allocation. Blocks are allocated
				254	when data is copied from user to page cache.
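
Two of the options above take computed numeric values. The sketch below shows
one way to derive them. The helper names are hypothetical. The stripe formula
follows the description of stripe=n above. The journal_dev layout is an
assumption: it follows the kernel's "new" dev_t encoding as the author of this
example understands it, so verify it against your kernel before relying on it.

```shell
# stripe_blocks DATA_DISKS CHUNK_KIB BLOCK_BYTES
# Print a stripe=n value: number of data disks times the RAID chunk size
# expressed in filesystem blocks.
stripe_blocks() {
    chunk_blocks=$(( $2 * 1024 / $3 ))
    echo $(( $1 * chunk_blocks ))
}

# journal_devnum MAJOR MINOR
# Print a journal_dev=devnum value.
# ASSUMPTION: devnum holds the low 8 minor bits in bits 0-7, the major
# number starting at bit 8, and the remaining minor bits starting at bit 20.
journal_devnum() {
    low=$(( $2 & 255 ))
    high=$(( $2 - low ))
    echo $(( low | ($1 << 8) | (high << 12) ))
}

# Example use (/dev/md0 is a hypothetical RAID5 array with 3 data disks,
# 64 KiB chunks and 4096 byte filesystem blocks):
#   mount -t ext4dev -o stripe=$(stripe_blocks 3 64 4096) /dev/md0 /wherever
```
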
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	255	Data Mode
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	256	=========
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	257	There are 3 different data modes:
				258
				259	* writeback mode
				260	In data=writeback mode, ext4 does not journal data at all. This mode provides
				261	a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
				262	mode - metadata journaling. A crash+recovery can cause incorrect data to
				263	appear in files which were written shortly before the crash. This mode will
				264	typically provide the best ext4 performance.
				265
				266	* ordered mode
				267	In data=ordered mode, ext4 only officially journals metadata, but it logically
Mingming Cao	49f1487	2008-07-11 19:27:31 -0400	[diff] [blame]	268	groups metadata information related to data changes with the data blocks into a
				269	single unit called a transaction. When it's time to write the new metadata
				270	out to disk, the associated data blocks are written first. In general,
				271	this mode performs slightly slower than writeback but significantly faster than journal mode.
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	272
				273	* journal mode
				274	data=journal mode provides full data and metadata journaling. All new data is
				275	written to the journal first, and then to its final location.
				276	In the event of a crash, the journal can be replayed, bringing both data and
				277	metadata into a consistent state. This mode is the slowest except when data
				278	needs to be read from and written to disk at the same time where it
Mingming Cao	49f1487	2008-07-11 19:27:31 -0400	[diff] [blame]	279	outperforms all other modes. Currently ext4 does not have delayed
				280	allocation support if this data journalling mode is selected.
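
The constraints between data modes and other options stated in this document
(nobh only with writeback, no delayed allocation with data journalling) can be
checked before mounting. This is a minimal sketch with hypothetical helper
names. It inspects only the option string it is given and does not query the
kernel.

```shell
# data_mode OPTIONS
# Print the data mode named in a comma-separated option string, falling
# back to "ordered", which the option list marks as the default.
data_mode() {
    mode=ordered
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case $opt in
            data=*) mode=${opt#data=} ;;
        esac
    done
    IFS=$old_ifs
    echo "$mode"
}

# has_opt OPTIONS NAME
# Succeed when NAME appears as a whole option in OPTIONS.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
    esac
    return 1
}

# check_opts OPTIONS
# Report combinations that this document describes as unsupported.
# Only explicitly listed options are checked, not defaults.
check_opts() {
    mode=$(data_mode "$1")
    case $mode in
        journal|ordered|writeback) ;;
        *) echo "unknown data mode: $mode"; return 1 ;;
    esac
    if has_opt "$1" nobh && [ "$mode" != writeback ]; then
        echo "nobh is only supported with data=writeback"
        return 1
    fi
    if has_opt "$1" delalloc && [ "$mode" = journal ]; then
        echo "delalloc is not available with data=journal"
        return 1
    fi
    echo "ok: data=$mode"
}

# Example use:
#   check_opts "data=writeback,nobh" && mount -t ext4dev -o data=writeback,nobh /dev/hda1 /wherever
```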
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	281
				282	References
				283	==========
				284
				285	kernel source: <file:fs/ext4/>
				286	<file:fs/jbd2/>
				287
				288	programs: http://e2fsprogs.sourceforge.net/
Dave Kleikamp	fc513a3	2006-10-11 01:21:25 -0700	[diff] [blame]	289
				290	useful links: http://fedoraproject.org/wiki/ext3-devel
				291	http://www.bullopensource.org/ext4/
Jose R. Santos	93e3270	2008-07-11 19:27:31 -0400	[diff] [blame]	292	http://ext4.wiki.kernel.org/index.php/Main_Page
				293	http://fedoraproject.org/wiki/Features/Ext4