EXT4(5) File Formats Manual EXT4(5)
NAME
ext2 - the second extended file system
ext3 - the third extended file system
ext4 - the fourth extended file system
DESCRIPTION
The second, third, and fourth extended file systems, or ext2, ext3, and
ext4 as they are commonly known, are Linux file systems that have his-
torically been the default file system for many Linux distributions.
They are general purpose file systems that have been designed for
extensibility and backwards compatibility. In particular, file systems
previously intended for use with the ext2 and ext3 file systems can be
mounted using the ext4 file system driver, and indeed in many modern
Linux distributions, the ext4 file system driver has been configured
to handle mount requests for ext2 and ext3 file systems.
FILE SYSTEM FEATURES
A file system formatted for ext2, ext3, or ext4 can have some collection
of the following file system feature flags enabled. Some of these
features are not supported by all implementations of the ext2, ext3,
and ext4 file system drivers, depending on the Linux kernel version in use.
On other operating systems, such as the GNU/HURD or FreeBSD, only a
very restrictive set of file system features may be supported in their
implementations of ext2.
64bit
Enables the file system to be larger than 2^32
blocks. This feature is set automatically, as
needed, but it can be useful to specify this feature
explicitly if the file system might need to be
resized larger than 2^32 blocks, even if it was
smaller than that threshold when it was originally
created. Note that some older kernels and older
versions of e2fsprogs will not support file systems
with this ext4 feature enabled.
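As a rough illustration of the 2^32-block threshold, the following Python sketch computes the size of a file system with exactly 2^32 blocks for a few common block sizes. The function name is hypothetical and is not part of e2fsprogs.

```python
def size_at_block_limit(block_size_bytes):
    """Size in bytes of a file system that has exactly 2**32 blocks."""
    return (2 ** 32) * block_size_bytes


for block_size in (1024, 2048, 4096):
    tib = size_at_block_limit(block_size) // 2 ** 40
    print(f"{block_size}-byte blocks: {tib} TiB")
```

A file system that may one day be resized past the figure for its block size is a candidate for enabling 64bit at creation time.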
bigalloc
This ext4 feature enables clustered block allocation,
so that the unit of allocation is a power of
two number of blocks. That is, each bit in what
had traditionally been known as the block allocation
bitmap now indicates whether a cluster is in use or
not, where a cluster is by default composed of 16
blocks. This feature can decrease the time spent on
block allocation and reduce fragmentation,
especially for large files. The cluster size can be
specified using the -C option of mke2fs(8).
Warning: The bigalloc feature is still under development,
and may not be fully supported with your
kernel or may have various bugs. Please see the web
page http://ext4.wiki.kernel.org/index.php/Bigalloc
for details. It may clash with delayed allocation (see
the nodelalloc mount option).
This feature requires that the extent feature be
enabled.
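To make the cluster as the unit of allocation concrete, here is a minimal Python sketch that rounds a file's block count up to whole clusters. The helper name is hypothetical, and the default of 16 blocks per cluster is taken from the description above.

```python
def clusters_for_file(file_blocks, blocks_per_cluster=16):
    """Number of clusters needed to hold file_blocks blocks."""
    # The cluster size must be a power of two number of blocks.
    if blocks_per_cluster < 1 or blocks_per_cluster & (blocks_per_cluster - 1):
        raise ValueError("blocks_per_cluster must be a power of two")
    # Integer ceiling division: a partly used cluster is still allocated whole.
    return (file_blocks + blocks_per_cluster - 1) // blocks_per_cluster


clusters = clusters_for_file(100)
print(clusters, "clusters,", clusters * 16, "blocks allocated")
```

A 100-block file therefore occupies 7 clusters, so the bitmap tracks 7 bits for it instead of 100, at the cost of some unused blocks in the last cluster.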
dir_index
Use hashed b-trees to speed up name lookups in large
directories. This feature is supported by ext3 and
ext4 file systems, and is ignored by ext2 file sys-
tems.
dir_nlink
Normally ext4 allows an inode to have no more than
65,000 hard links. This applies to files as well as
directories, which means that there can be no more
than 64,998 subdirectories in a directory (because
the directory's own '.' entry and its entry in the
parent directory each count as a hard link, as does
the '..' entry of every subdirectory).
This feature lifts this limit by causing ext4 to use
a links count of 1 to indicate that the number of
hard links to a directory is not known.
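The subdirectory limit follows from simple arithmetic on the link count, sketched here in Python. The constant and function names are hypothetical.

```python
LINK_LIMIT = 65000  # per-inode hard link limit without dir_nlink


def max_subdirectories(link_limit=LINK_LIMIT):
    """Largest number of subdirectories a directory can hold."""
    # Two links exist before any subdirectory is created: the directory's
    # name in its parent and its own '.' entry. Every subdirectory then
    # adds one link through its '..' entry.
    return link_limit - 2


print(max_subdirectories())
```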
extent
This ext4 feature allows the mapping of logical
block numbers for a particular inode to physical
blocks on the storage device to be stored using an
extent tree, which is a more efficient data struc-
ture than the traditional indirect block scheme used
by the ext2 and ext3 file systems. The use of the
extent tree decreases metadata block overhead,
improves file system performance, and decreases the time
needed to run e2fsck(8) on the file system. (Note:
both extent and extents are accepted as valid names
for this feature for historical/backwards compati-
bility reasons.)
extra_isize
This ext4 feature reserves a specific amount of
space in each inode for extended metadata such as
nanosecond timestamps and file creation time, even
if the current kernel does not currently need to
reserve this much space. Without this feature, the
kernel will reserve only the amount of space for the
features it currently needs, and the rest may be
consumed by extended attributes.
For this feature to be useful the inode size must be
256 bytes or larger.
ext_attr
This feature enables the use of extended attributes.
This feature is supported by ext2, ext3, and ext4.
filetype
This feature enables the storage of file type information
in directory entries. This feature is supported
by ext2, ext3, and ext4.
flex_bg
This ext4 feature allows the per-
block group metadata (allocation
bitmaps and inode tables) to be
placed anywhere on the storage
media. In addition, mke2fs will
place the per-block group meta-
data together starting at the
first block group of each
"flex_bg group". The size of
the flex_bg group can be speci-
fied using the -G option.
has_journal
Create a journal to ensure
filesystem consistency even
across unclean shutdowns. Set-
ting the filesystem feature is
equivalent to using the -j
option. This feature is sup-
ported by ext3 and ext4, and
ignored by the ext2 file system
driver.
huge_file
This ext4 feature allows files to
be larger than 2 terabytes in
size.
journal_dev
This feature is enabled on the
superblock found on an external
journal device. The block size
for the external journal must be
the same as that of the file system
which uses it.
The external journal device can
be used by a file system by spec-
ifying the -J device=<external-
device> option to mke2fs(8) or
tune2fs(8).
large_file
This feature flag is set automat-
ically by modern kernels when a
file larger than 2 gigabytes is
created. Very old kernels could
not handle large files, so this
feature flag was used to prohibit
those kernels from mounting file
systems that they could not
understand.
meta_bg
This ext4 feature allows file
systems to be resized on-line
without explicitly needing to
reserve space for growth in the
size of the block group descrip-
tors. This scheme is also used
to resize file systems which are
larger than 2^32 blocks. It is
not recommended that this feature
be set when a file system is cre-
ated, since this alternate method
of storing the block group
descriptor will slow down the
time needed to mount the file
system, and newer kernels can
automatically set this feature as
necessary when doing an online
resize and no more reserved space
is available in the resize inode.
mmp
This ext4 feature provides multi-
ple mount protection (MMP). MMP
helps to protect the filesystem
from being multiply mounted and
is useful in shared storage envi-
ronments.
resize_inode
This file system feature indicates
that space has been
reserved so that the block group
descriptor table can be extended
while resizing a mounted file
system.
The online resize operation is
carried out by the kernel, triggered
by resize2fs(8). By
default mke2fs will attempt to
reserve enough space so that the
filesystem may grow to 1024 times
its initial size. This can be
changed using the resize extended
option.
This feature requires that the
sparse_super feature be enabled.
sparse_super
This file system feature is set
on all modern ext2, ext3, and
ext4 file systems. It indicates
that backup copies of the
superblock and block group
descriptors are present only in a
few block groups, and not all of
them.
uninit_bg
This ext4 file system feature
indicates that the block group
descriptors will be protected
using checksums, making it safe
for mke2fs(8) to create a file
system without initializing all
of the block groups. The kernel
will keep a high watermark of
unused inodes, and initialize
inode tables and blocks lazily.
This feature speeds up the time
to check the file system using
e2fsck(8), and it also speeds up
the time required for mke2fs(8)
to create the file system.
MOUNT OPTIONS
This section describes mount options which are spe-
cific to ext2, ext3, and ext4. Other generic mount
options may be used as well; see mount(8) for
details.
Mount options for ext2
The `ext2' filesystem is the standard Linux filesys-
tem. Since Linux 2.5.46, for most mount options the
default is determined by the filesystem superblock.
Set them with tune2fs(8).
acl|noacl
Support POSIX Access Control Lists (or not).
bsddf|minixdf
Set the behavior for the statfs system call.
The minixdf behavior is to return in the
f_blocks field the total number of blocks of
the filesystem, while the bsddf behavior
(which is the default) is to subtract the
overhead blocks used by the ext2 filesystem
and not available for file storage. Thus
% mount /k -o minixdf; df /k; umount /k
Filesystem  1024-blocks   Used  Available  Capacity  Mounted on
/dev/sda6       2630655  86954    2412169        3%  /k
% mount /k -o bsddf; df /k; umount /k
Filesystem  1024-blocks   Used  Available  Capacity  Mounted on
/dev/sda6       2543714     13    2412169        0%  /k
(Note that this example shows that one can
add command line options to the options given
in /etc/fstab.)
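The two df listings above can be cross-checked with a few lines of Python: the overhead that bsddf subtracts from the total should equal the difference in the Used column. The dictionary layout is only an illustration; the numbers are copied from the example.

```python
# Values copied from the df output in the example above (1024-byte blocks).
minixdf = {"blocks": 2630655, "used": 86954, "available": 2412169}
bsddf = {"blocks": 2543714, "used": 13, "available": 2412169}

# bsddf hides the file system overhead from both the total and the used count.
overhead_from_total = minixdf["blocks"] - bsddf["blocks"]
overhead_from_used = minixdf["used"] - bsddf["used"]

print(overhead_from_total, overhead_from_used)
```

Both differences come to the same number of blocks, and the Available column is unchanged between the two modes.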
check=none or nocheck
No checking is done at mount time. This is
the default. This is fast. It is wise to
invoke e2fsck(8) every now and then, e.g. at
boot time. The non-default behavior is unsup-
ported (check=normal and check=strict options
have been removed). Note that these mount
options may not be supported if the ext4
kernel driver is used for ext2 and ext3
filesystems.
debug
Print debugging info upon each (re)mount.
errors={continue|remount-ro|panic}
Define the behavior when an error is encoun-
tered. (Either ignore errors and just mark
the filesystem erroneous and continue, or
remount the filesystem read-only, or panic
and halt the system.) The default is set in
the filesystem superblock, and can be changed
using tune2fs(8).
grpid|bsdgroups and nogrpid|sysvgroups
These options define what group id a newly
created file gets. When grpid is set, it
takes the group id of the directory in which
it is created; otherwise (the default) it
takes the fsgid of the current process,
unless the directory has the setgid bit set,
in which case it takes the gid from the par-
ent directory, and also gets the setgid bit
set if it is a directory itself.
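The group selection rules above can be summarized as a small decision function. This Python sketch models only what the text describes; the function and parameter names are hypothetical, and the grpid branch assumes no setgid propagation because the text does not describe any for that case.

```python
def new_inode_group(grpid, parent_gid, parent_setgid, process_fsgid, is_directory):
    """Return (gid, setgid_bit) for a newly created file or directory."""
    if grpid:
        # grpid / bsdgroups: take the group of the containing directory.
        # Assumption: the setgid bit is not propagated in this mode.
        return parent_gid, False
    if parent_setgid:
        # nogrpid / sysvgroups in a setgid directory: inherit the group,
        # and pass the setgid bit on if the new inode is a directory.
        return parent_gid, is_directory
    # Default: use the fsgid of the creating process.
    return process_fsgid, False


print(new_inode_group(False, 100, True, 1000, True))
```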
grpquota|noquota|quota|usrquota
The usrquota (same as quota) mount option
enables user quota support on the filesystem.
grpquota enables group quota support. You
need the quota utilities to actually enable
and manage the quota system.
nouid32
Disables 32-bit UIDs and GIDs. This is for
interoperability with older kernels which
only store and expect 16-bit values.
oldalloc or orlov
Use the old allocator or the Orlov allocator for new
inodes. Orlov is the default.
resgid=n and resuid=n
The ext2 filesystem reserves a certain per-
centage of the available space (by default
5%, see mke2fs(8) and tune2fs(8)). These
options determine who can use the reserved
blocks. (Roughly: whoever has the specified
uid, or belongs to the specified group.)
sb=n
Instead of block 1, use block n as the
superblock. This could be useful when the
filesystem has been damaged. (Earlier,
copies of the superblock would be made every
8192 blocks: in block 1, 8193, 16385, ...
(and one got thousands of copies on a big
filesystem). Since version 1.08, mke2fs has a
-s (sparse superblock) option to reduce the
number of backup superblocks, and since ver-
sion 1.15 this is the default. Note that this
may mean that ext2 filesystems created by a
recent mke2fs cannot be mounted r/w under
Linux 2.0.*.) The block number here uses 1 k
units. Thus, if you want to use logical block
32768 on a filesystem with 4 k blocks, use
"sb=131072".
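The unit conversion for sb= is easy to get wrong, so here is a minimal Python sketch of it. The helper name is hypothetical; it only restates the rule that the option counts in 1 k units.

```python
def sb_mount_value(logical_block, fs_block_size_bytes):
    """Convert a file system logical block number to the 1 k units of sb=."""
    if fs_block_size_bytes % 1024 != 0:
        raise ValueError("block size must be a multiple of 1024 bytes")
    return logical_block * (fs_block_size_bytes // 1024)


# Logical block 32768 on a file system with 4 k blocks, as in the text.
print(f"sb={sb_mount_value(32768, 4096)}")
```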
user_xattr|nouser_xattr
Support "user." extended attributes (or not).
Mount options for ext3
The ext3 filesystem is a version of the ext2
filesystem which has been enhanced with journaling.
It supports the same options as ext2 as well as the
following additions:
journal=update
Update the ext3 filesystem's journal to the
current format.
journal=inum
When a journal already exists, this option is
ignored. Otherwise, it specifies the number
of the inode which will represent the ext3
filesystem's journal file; ext3 will create a
new journal, overwriting the old contents of
the file whose inode number is inum.
journal_dev=devnum/journal_path=path
When the external journal device's
major/minor numbers have changed, these
options allow the user to specify the new
journal location. The journal device is
identified either through its new major/minor
numbers encoded in devnum, or via a path to
the device.
norecovery/noload
Don't load the journal on mounting. Note
that if the filesystem was not unmounted
cleanly, skipping the journal replay will
lead to the filesystem containing inconsis-
tencies that can lead to any number of prob-
lems.
data={journal|ordered|writeback}
Specifies the journaling mode for file data.
Metadata is always journaled. To use modes
other than ordered on the root filesystem,
pass the mode to the kernel as a boot parameter,
e.g. rootflags=data=journal.
journal
All data is committed into the journal
prior to being written into the main
filesystem.
ordered
This is the default mode. All data is
forced directly out to the main file
system prior to its metadata being
committed to the journal.
writeback
Data ordering is not preserved: data
may be written into the main filesystem
after its metadata has been committed
to the journal. This is
rumoured to be the highest-throughput
option. It guarantees internal
filesystem integrity; however, it can
allow old data to appear in files
after a crash and journal recovery.
data_err=ignore
Just print an error message if an error
occurs in a file data buffer in ordered mode.
data_err=abort
Abort the journal if an error occurs in a
file data buffer in ordered mode.
barrier=0 / barrier=1
This disables / enables the use of write bar-
riers in the jbd code. barrier=0 disables,
barrier=1 enables (default). This also
requires an IO stack which can support barri-
ers, and if jbd gets an error on a barrier
write, it will disable barriers again with a
warning. Write barriers enforce proper on-
disk ordering of journal commits, making
volatile disk write caches safe to use, at
some performance penalty. If your disks are
battery-backed in one way or another, dis-
abling barriers may safely improve perfor-
mance.
commit=nrsec
Sync all data and metadata every nrsec sec-
onds. The default value is 5 seconds. Zero
means default.
user_xattr
Enable Extended User Attributes. See the
attr(5) manual page.
acl
Enable POSIX Access Control Lists. See the
acl(5) manual page.
usrjquota=aquota.user|grpjquota=aquota.group|jqfmt=vfsv0
Apart from the old quota system (as in ext2,
jqfmt=vfsold aka version 1 quota) ext3 also
supports journaled quotas (version 2 quota).
jqfmt=vfsv0 enables journaled quotas. For
journaled quotas the mount options
usrjquota=aquota.user and grpjquota=aquota.group
are required to tell the quota system which
quota database files to use. Journaled quotas
have the advantage that even after a crash no
quota check is required.
Mount options for ext4
The ext4 filesystem is an advanced level of the ext3
filesystem which incorporates scalability and reliability
enhancements for supporting large filesystems.
The options journal_dev, norecovery, noload, data,
commit, orlov, oldalloc, [no]user_xattr, [no]acl,
bsddf, minixdf, debug, errors, data_err, grpid,
bsdgroups, nogrpid, sysvgroups, resgid, resuid, sb,
quota, noquota, grpquota, usrquota, usrjquota,
grpjquota, and jqfmt are backward compatible with
ext3 or ext2.
journal_checksum
Enable checksumming of the journal transactions.
This will allow the recovery code in
e2fsck and the kernel to detect corruption in
the journal. It is a compatible change and
will be ignored by older kernels.
journal_async_commit
The commit block can be written to disk without
waiting for descriptor blocks. If enabled,
older kernels cannot mount the device. This
will enable 'journal_checksum' internally.
barrier=0 / barrier=1 / barrier / nobarrier
These mount options have the same effect as
in ext3. The mount options "barrier" and
"nobarrier" are added for consistency with
other ext4 mount options.
The ext4 filesystem enables write barriers by
default.
inode_readahead_blks=n
This tuning parameter controls the maximum
number of inode table blocks that ext4's
inode table readahead algorithm will pre-read
into the buffer cache. The value must be a
power of 2. The default value is 32 blocks.
stripe=n
Number of filesystem blocks that mballoc will
try to use for allocation size and alignment.
For RAID5/6 systems this should be the number
of data disks * RAID chunk size in filesystem
blocks.
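The stripe value for a RAID array can be worked out as follows. This is a Python sketch with a hypothetical helper name; the array geometry in the example (four-disk RAID5, 64 KiB chunks, 4 KiB blocks) is assumed purely for illustration.

```python
def stripe_blocks(data_disks, chunk_size_bytes, fs_block_size_bytes):
    """Number of data disks times the RAID chunk size in file system blocks."""
    if chunk_size_bytes % fs_block_size_bytes != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    chunk_in_blocks = chunk_size_bytes // fs_block_size_bytes
    return data_disks * chunk_in_blocks


# Assumed example: a 4-disk RAID5 array has 3 data disks per stripe.
print(f"stripe={stripe_blocks(3, 64 * 1024, 4096)}")
```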
delalloc
Defer block allocation until write-out
time.
nodelalloc
Disable delayed allocation. Blocks are allo-
cated when data is copied from user to page
cache.
max_batch_time=usec
Maximum amount of time ext4 should wait for
additional filesystem operations to be batched
together with a synchronous write operation.
Since a synchronous write operation is going
to force a commit and then wait for the I/O
to complete, waiting briefly costs little and
can be a huge throughput win, so ext4 waits for
a small amount of time to see if any other
transactions can piggyback on the synchronous write.
The algorithm used is designed to automati-
cally tune for the speed of the disk, by mea-
suring the amount of time (on average) that
it takes to finish committing a transaction.
Call this time the "commit time". If the
time that the transaction has been running is
less than the commit time, ext4 will try
sleeping for the commit time to see if other
operations will join the transaction. The
commit time is capped by the max_batch_time,
which defaults to 15000 us (15 ms). This
optimization can be turned off entirely by
setting max_batch_time to 0.
min_batch_time=usec
This parameter sets the commit time (as
described above) to be at least
min_batch_time. It defaults to zero microsec-
onds. Increasing this parameter may improve
the throughput of multi-threaded, synchronous
workloads on very fast disks, at the cost of
increasing latency.
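The interaction between max_batch_time and min_batch_time described above can be sketched in Python as follows. This models only the prose description, not the kernel source; the function and parameter names are hypothetical, and all times are in microseconds.

```python
def batch_sleep_usec(avg_commit_usec, txn_running_usec,
                     min_batch_time=0, max_batch_time=15000):
    """How long to wait for other operations to join the transaction."""
    # Raise the measured commit time to at least min_batch_time ...
    commit_time = max(avg_commit_usec, min_batch_time)
    # ... and cap it at max_batch_time (0 turns the optimization off).
    commit_time = min(commit_time, max_batch_time)
    # Only sleep if the transaction is younger than the commit time.
    if txn_running_usec < commit_time:
        return commit_time
    return 0


print(batch_sleep_usec(avg_commit_usec=8000, txn_running_usec=2000))
print(batch_sleep_usec(avg_commit_usec=20000, txn_running_usec=2000))
print(batch_sleep_usec(avg_commit_usec=8000, txn_running_usec=2000,
                       max_batch_time=0))
```

On a slow disk the measured commit time hits the 15 ms cap, while on a very fast disk a nonzero min_batch_time keeps the wait from shrinking to nothing.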
journal_ioprio=prio
The I/O priority (from 0 to 7, where 0 is the
highest priority) which should be used for
I/O operations submitted by kjournald2 during
a commit operation. This defaults to 3,
which is a slightly higher priority than the
default I/O priority.
abort
Simulate the effects of calling ext4_abort()
for debugging purposes. This is normally
used while remounting a filesystem which is
already mounted.
auto_da_alloc|noauto_da_alloc
Many broken applications don't use fsync()
when replacing existing files via patterns
such as
fd = open("foo.new")/write(fd,...)/close(fd)/
rename("foo.new", "foo")
or worse yet
fd = open("foo",
O_TRUNC)/write(fd,...)/close(fd).
If auto_da_alloc is enabled, ext4 will detect
the replace-via-rename and replace-via-trun-
cate patterns and force that any delayed
allocation blocks are allocated such that at
the next journal commit, in the default
data=ordered mode, the data blocks of the new
file are forced to disk before the rename()
operation is committed. This provides
roughly the same level of guarantees as ext3,
and avoids the "zero-length" problem that can
happen when a system crashes before the
delayed allocation blocks are forced to disk.
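For contrast with the patterns above, an application that does not want to depend on auto_da_alloc can flush the new file itself before renaming it. The following Python sketch shows one common way to do that; the function name and the ".new" suffix are illustrative assumptions, not part of any ext4 interface.

```python
import os


def replace_file_with_fsync(path, data):
    """Replace path with data, forcing the data to disk before the rename."""
    tmp_path = path + ".new"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # force the data blocks to stable storage
    # Only after the data is on disk is the old name switched over.
    os.rename(tmp_path, path)
```

With the explicit fsync() the new contents are on disk before the rename is committed, so a crash leaves either the old file or the complete new one.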
noinit_itable
Do not initialize any uninitialized inode ta-
ble blocks in the background. This feature
may be used by installation CDs so that the
install process can complete as quickly as
possible; the inode table initialization
process would then be deferred until the next
time the filesystem is mounted.
init_itable=n
The lazy itable init code will wait n times
the number of milliseconds it took to zero
out the previous block group's inode table.
This minimizes the impact on system perfor-
mance while the filesystem's inode table is
being initialized.
discard/nodiscard
Controls whether ext4 should issue dis-
card/TRIM commands to the underlying block
device when blocks are freed. This is useful
for SSD devices and sparse/thinly-provisioned
LUNs, but it is off by default until suffi-
cient testing has been done.
nouid32
Disables 32-bit UIDs and GIDs. This is for
interoperability with older kernels which
only store and expect 16-bit values.
block_validity/noblock_validity
These options enable or disable the
in-kernel facility for tracking filesystem
metadata blocks within internal data structures.
This allows the multi-block allocator and
other routines to quickly locate extents
which might overlap with filesystem metadata
blocks. This option is intended for debugging
purposes and, since it negatively affects
performance, it is off by default.
dioread_lock/dioread_nolock
Controls whether or not ext4 should use the
DIO read locking. If the dioread_nolock
option is specified, ext4 will allocate an
uninitialized extent before a buffer write and
convert the extent to initialized after the I/O
completes. This approach allows ext4 code to
avoid using the inode mutex, which improves
scalability on high-speed storage. However, this
does not work with data journaling, and the
dioread_nolock option will be ignored with a
kernel warning. Note that the dioread_nolock
code path is only used for extent-based
files. Because of the restrictions this
option imposes, it is off by default (i.e.
dioread_lock is in effect).
max_dir_size_kb=n
This limits the size of the directories so
that any attempt to expand them beyond the
specified limit in kilobytes will cause an
ENOSPC error. This is useful in memory-con-
strained environments, where a very large
directory can cause severe performance prob-
lems or even provoke the Out Of Memory
killer. (For example, if there is only 512 MB
memory available, a 176 MB directory may
seriously cramp the system's style.)
i_version
Enable 64-bit inode version support. This
option is off by default.
FILE ATTRIBUTES
The ext2, ext3, and ext4 filesystems support setting
the following file attributes on Linux systems using
the chattr(1) utility:
a - append only
A - no atime updates
d - no dump
D - synchronous directory updates
i - immutable
S - synchronous updates
u - undeletable
In addition, the ext3 and ext4 filesystems support
the following flag:
j - data journaling
Finally, the ext4 filesystem also supports the fol-
lowing flag:
e - extents format
For descriptions of these attribute flags, please
refer to the chattr(1) man page.
SEE ALSO
mke2fs(8), mke2fs.conf(5), e2fsck(8), dumpe2fs(8),
tune2fs(8), debugfs(8), mount(8), chattr(1)
E2fsprogs version 1.42.9 December 2013 EXT4(5)