Ext4 Filesystem
===============

This is a development version of the ext4 filesystem, an advanced successor
to the ext3 filesystem that incorporates scalability and reliability
enhancements for supporting large (64-bit) filesystems, in keeping with
increasing disk capacities and state-of-the-art feature requirements.

Mailing list: linux-ext4@vger.kernel.org


1. Quick usage instructions:
===========================

- Compile and install the latest version of e2fsprogs (as of this
  writing version 1.41) from:

    http://sourceforge.net/project/showfiles.php?group_id=2406

  or

    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

  or grab the latest git repository from:

    git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

- Create a new filesystem using the ext4dev filesystem type:

    # mke2fs -t ext4dev /dev/hda1

  Or configure an existing ext3 filesystem to support extents and set
  the test_fs flag to indicate that it's OK for an in-development
  filesystem to touch this filesystem:

    # tune2fs -O extents -E test_fs /dev/hda1

  If the filesystem was created with 128-byte inodes, it can be
  converted to use 256-byte inodes for greater efficiency via:

    # tune2fs -I 256 /dev/hda1

  (Note: we currently do not have tools to convert an ext4dev
  filesystem back to ext3, so please do not try this on production
  filesystems.)

- Mounting:

    # mount -t ext4dev /dev/hda1 /wherever

- When comparing performance with other filesystems, remember that
  ext3/4 by default offers higher data integrity guarantees than most.
  So when comparing with a metadata-only journalling filesystem, such
  as XFS, JFS or ReiserFS, use 'mount -o data=writeback', and you might
  as well use 'mount -o nobh' along with it. Making the journal larger
  than the mke2fs default often helps performance with
  metadata-intensive workloads.
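
The 128-to-256-byte inode conversion step above can be preceded by a check of the current inode size. The helpers below are a minimal sketch of that check. They are hypothetical (not part of e2fsprogs) and assume the "Inode size:" line that 'dumpe2fs -h' prints. They only print the tune2fs command and never run it:

```shell
#!/bin/sh
# Hypothetical helpers (not part of e2fsprogs), written as a dry run.

# Read 'dumpe2fs -h' output on stdin and print the inode size in bytes.
# Assumes a header line of the form "Inode size:          128".
inode_size() {
    awk -F: '/^Inode size:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Print (but do not run) the tune2fs conversion command when the
# inodes are smaller than 256 bytes; print nothing otherwise.
suggest_inode_resize() {
    size=$1
    dev=$2
    if [ "$size" -lt 256 ]; then
        echo "tune2fs -I 256 $dev"
    fi
}

# Example with a canned dumpe2fs header line instead of a real device.
size=$(printf 'Inode size:\t          128\n' | inode_size)
suggest_inode_resize "$size" /dev/hda1
```

This prints "tune2fs -I 256 /dev/hda1". On a real system the input would come from 'dumpe2fs -h /dev/hda1'. The canned line here just keeps the example self-contained.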

2. Features
===========

2.1 Currently available

* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in the face of on-disk corruption due to magic
  numbers and internal redundancy in the tree
* improved file allocation (multi-block alloc)
* removal of ext3's 32000 subdirectory limit
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoids using buffer heads to
  force the ordering)
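
The 16TB figure in the first item comes from block-number arithmetic. Assuming 4 KiB blocks, 32-bit block numbers (as in ext3) can address 2^32 x 4 KiB = 16 TiB. The 48-bit physical block numbers in ext4's extent format can address 2^48 x 4 KiB = 1 EiB. The sketch below checks this with shell integer arithmetic and assumes 64-bit arithmetic, which bash and dash provide on common Linux systems:

```shell
#!/bin/sh
# Maximum addressable filesystem size for a given block-number width.
# Assumes 4 KiB filesystem blocks and 64-bit shell arithmetic.
block_size=4096

# Print 2^bits * block_size, in bytes.
max_bytes() {
    bits=$1
    echo $(( (1 << bits) * block_size ))
}

echo "32-bit block numbers: $(( $(max_bytes 32) >> 40 )) TiB"
echo "48-bit block numbers: $(( $(max_bytes 48) >> 60 )) EiB"
```

This prints "16 TiB" and "1 EiB". Shifting right by 40 and 60 bits converts bytes to TiB and EiB, respectively.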

2.2 Candidate features for future inclusion

* online defrag (patches available but not well tested)
* reduced mke2fs time via lazy itable initialization in conjunction with
  the uninit_bg feature (the capability to do this is available in
  e2fsprogs, but for safety a kernel thread is required to lazily zero
  unused inode table blocks after the filesystem is first mounted)

Several others are under discussion; whether they all make it in is
partly a function of how much time everyone has to work on them. Features
like metadata checksumming have been discussed and planned for a while,
but no patches exist yet, so I'm not sure they're on the near-term roadmap.

The big performance win will come with mballoc, delalloc and flex_bg
grouping of bitmaps and inode tables. Some test results are available here:

 - http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
 - http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html

3. Options
==========

When mounting an ext4 filesystem, the following options are accepted:
(*) == default

extents         (*)     ext4 will use extents to address file data. The
                        file system will no longer be mountable by ext3.

noextents               ext4 will not use extents for newly created files.

journal_checksum        Enable checksumming of the journal transactions.
                        This will allow the recovery code in e2fsck and the
                        kernel to detect corruption in the journal. It is a
                        compatible change and will be ignored by older kernels.

journal_async_commit    Commit block can be written to disk without waiting
                        for descriptor blocks. If enabled, older kernels
                        cannot mount the device. This will enable
                        'journal_checksum' internally.

journal=update          Update the ext4 file system's journal to the current
                        format.

journal=inum            When a journal already exists, this option is ignored.
                        Otherwise, it specifies the number of the inode which
                        will represent the ext4 file system's journal file.

journal_dev=devnum      When the external journal device's major/minor numbers
                        have changed, this option allows the user to specify
                        the new journal location. The journal device is
                        identified through its new major/minor numbers encoded
                        in devnum.

noload                  Don't load the journal on mounting.

data=journal            All data are committed into the journal prior to being
                        written into the main file system.

data=ordered    (*)     All data are forced directly out to the main file
                        system prior to its metadata being committed to the
                        journal.

data=writeback          Data ordering is not preserved; data may be written
                        into the main file system after its metadata has been
                        committed to the journal.

commit=nrsec    (*)     Ext4 can be told to sync all its data and metadata
                        every 'nrsec' seconds. The default value is 5 seconds.
                        This means that if you lose power, you will lose
                        at most the latest 5 seconds of work (your
                        filesystem will not be damaged though, thanks to the
                        journaling). This default value (or any low value)
                        will hurt performance, but it's good for data safety.
                        Setting it to 0 will have the same effect as leaving
                        it at the default (5 seconds).
                        Setting it to very large values will improve
                        performance.

barrier=<0|1(*)>        This enables/disables the use of write barriers in
                        the jbd2 code. barrier=0 disables, barrier=1 enables.
                        This also requires an IO stack which can support
                        barriers, and if jbd2 gets an error on a barrier
                        write, it will disable barriers again with a warning.
                        Write barriers enforce proper on-disk ordering
                        of journal commits, making volatile disk write caches
                        safe to use, at some performance penalty. If
                        your disks are battery-backed in one way or another,
                        disabling barriers may safely improve performance.

orlov           (*)     This enables the new Orlov block allocator. It is
                        enabled by default.

oldalloc                This disables the Orlov block allocator and enables
                        the old block allocator. Orlov should have better
                        performance; we'd like to get some feedback if that
                        is not the case for you.

user_xattr              Enables Extended User Attributes. Additionally, you
                        need to have extended attribute support enabled in the
                        kernel configuration (CONFIG_EXT4_FS_XATTR). See the
                        attr(5) manual page and http://acl.bestbits.at/ to
                        learn more about extended attributes.

nouser_xattr            Disables Extended User Attributes.

acl                     Enables POSIX Access Control Lists support.
                        Additionally, you need to have ACL support enabled in
                        the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL).
                        See the acl(5) manual page and http://acl.bestbits.at/
                        for more information.

noacl                   This option disables POSIX Access Control List
                        support.

reservation

noreservation

bsddf           (*)     Make 'df' act like BSD.
minixdf                 Make 'df' act like Minix.

check=none              Don't do extra checking of bitmaps on mount.
nocheck

debug                   Extra debugging information is sent to syslog.

errors=remount-ro(*)    Remount the filesystem read-only on an error.
errors=continue         Keep going on a filesystem error.
errors=panic            Panic and halt the machine if an error occurs.

grpid                   Give objects the same group ID as their parent.
bsdgroups

nogrpid         (*)     New objects have the group ID of their creator.
sysvgroups

resgid=n                The group ID which may use the reserved blocks.

resuid=n                The user ID which may use the reserved blocks.

sb=n                    Use alternate superblock at this location.

quota
noquota
grpquota
usrquota

bh              (*)     ext4 associates buffer heads with data pages to
nobh                    (a) cache disk block mapping information, and
                        (b) link pages into the transaction to provide
                            ordering guarantees.
                        The "bh" option forces use of buffer heads.
                        The "nobh" option tries to avoid associating buffer
                        heads (supported only for "writeback" mode).

mballoc         (*)     Use the multiple block allocator for block allocation.
nomballoc               Disable the multiple block allocator for block
                        allocation.

stripe=n                Number of filesystem blocks that mballoc will try
                        to use for allocation size and alignment. For RAID5/6
                        systems this should be the number of data
                        disks * RAID chunk size in file system blocks.

delalloc        (*)     Defer block allocation until write-out time.
nodelalloc              Disable delayed allocation. Blocks are allocated
                        when data is copied from user space to the page cache.
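
Two of the options above, stripe=n and journal_dev=devnum, take numeric values that usually have to be worked out by hand. The helpers below are a hypothetical sketch, not a kernel or e2fsprogs interface.

ext4_stripe follows the stripe= rule stated above: data disks times the RAID chunk size in filesystem blocks. It assumes that a RAID5 array gives up one disk's worth of capacity to parity and RAID6 gives up two, and that the chunk size is a multiple of the block size.

ext4_devnum assumes that the kernel decodes devnum with the same bit layout as its new_encode_dev()/new_decode_dev() helpers. Verify that against your kernel before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers for computing numeric ext4 mount option values.

# stripe=n: number of data disks * RAID chunk size in filesystem blocks.
# Usage: ext4_stripe <raid-level 5|6> <total-disks> <chunk-bytes> <block-bytes>
ext4_stripe() {
    level=$1 disks=$2 chunk=$3 block=$4
    case $level in
        5) data=$(( disks - 1 )) ;;   # RAID5: one disk's worth of parity
        6) data=$(( disks - 2 )) ;;   # RAID6: two disks' worth of parity
        *) echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
    echo $(( data * chunk / block ))
}

# journal_dev=devnum: pack major/minor numbers, assuming the same
# layout as the kernel's new_encode_dev() helper.
# Usage: ext4_devnum <major> <minor>
ext4_devnum() {
    major=$1 minor=$2
    echo $(( (minor & 0xff) | (major << 8) | ((minor & ~0xff) << 12) ))
}

# 6-disk RAID6 with 64 KiB chunks and 4 KiB blocks
echo "stripe=$(ext4_stripe 6 6 65536 4096)"
# external journal device with major 8, minor 17
echo "journal_dev=$(ext4_devnum 8 17)"
```

This prints "stripe=64" and "journal_dev=2065". The major and minor numbers of a device node are shown by 'ls -l /dev/...'.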

Data Mode
=========

There are 3 different data modes:

* writeback mode
In data=writeback mode, ext4 does not journal data at all. This mode provides
a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
mode: metadata journaling. A crash followed by recovery can cause incorrect
data to appear in files which were written shortly before the crash. This
mode will typically provide the best ext4 performance.

* ordered mode
In data=ordered mode, ext4 only officially journals metadata, but it logically
groups metadata information related to data changes with the data blocks into
a single unit called a transaction. When it's time to write the new metadata
out to disk, the associated data blocks are written first. In general, this
mode performs slightly slower than writeback but significantly faster than
journal mode.

* journal mode
data=journal mode provides full data and metadata journaling. All new data is
written to the journal first, and then to its final location.
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state. This mode is the slowest except when data
needs to be read from and written to disk at the same time, where it
outperforms all other modes. Currently ext4 does not have delayed
allocation support if this data journalling mode is selected.
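
Several rules above tie other options to the data mode. The hypothetical shell helper below condenses them into a small checker; it is not shipped with the kernel or e2fsprogs. It encodes only what this document states:

* data=ordered, delalloc and bh are the defaults.
* Delayed allocation is not available with data=journal.
* nobh is supported only in writeback mode.

It reports what this document says should take effect for a given -o string, not what a particular kernel actually does:

```shell
#!/bin/sh
# Hypothetical checker: print the data mode, delalloc and nobh settings
# that this document says apply for a comma-separated ext4 option string.
ext4_data_mode_report() {
    opts=$1
    mode=ordered    # data=ordered is the default (*)
    delalloc=yes    # delalloc is the default (*)
    nobh=no         # bh is the default (*)

    # Walk the options in order; a later option overrides an earlier one.
    old_ifs=$IFS
    IFS=,
    for opt in $opts; do
        case $opt in
            data=journal|data=ordered|data=writeback) mode=${opt#data=} ;;
            delalloc) delalloc=yes ;;
            nodelalloc) delalloc=no ;;
            nobh) nobh=yes ;;
            bh) nobh=no ;;
        esac
    done
    IFS=$old_ifs

    # Delayed allocation is not supported in data=journal mode.
    if [ "$mode" = journal ]; then
        delalloc=no
    fi
    # nobh is supported only for data=writeback.
    if [ "$mode" != writeback ]; then
        nobh=no
    fi
    echo "data=$mode delalloc=$delalloc nobh=$nobh"
}

ext4_data_mode_report "data=journal,delalloc"
ext4_data_mode_report "data=writeback,nobh"
ext4_data_mode_report "nobh"
```

The three calls print "data=journal delalloc=no nobh=no", "data=writeback delalloc=yes nobh=yes" and "data=ordered delalloc=yes nobh=no".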

References
==========

kernel source:  <file:fs/ext4/>
                <file:fs/jbd2/>

programs:       http://e2fsprogs.sourceforge.net/

useful links:   http://fedoraproject.org/wiki/ext3-devel
                http://www.bullopensource.org/ext4/
                http://ext4.wiki.kernel.org/index.php/Main_Page
                http://fedoraproject.org/wiki/Features/Ext4