[mythtv-users] XFS: options when running mkfs.xfs
John Goerzen
jgoerzen at complete.org
Wed Sep 5 14:45:46 UTC 2007
On Tue September 4 2007 7:51:03 pm Phill Edwards wrote:
> > I don't know of any mkfs changes, but Henk Schoneveld told me once
> > that using the mount option "allocsize=512m" reduces fragmentation.
> > I've been meaning to try it but haven't gotten around to it yet.
> > MythTV tends to keep filesystems very close to full, and I've found
> > xfs with default options will fragment and suffer reduced performance
> > under these conditions if xfs_fsr isn't run periodically.
>
> Uh-oh - urban myth flame coming here! I had always been led to believe
> that unlike their Windows counterparts "unix file systems never need
> defragging". I've always been a bit suss of that but as I love Linux I
> was prepared to have blind faith. But it sounds like you do have to
> defrag certain unix file system types then? I also found this -
> https://sourceforge.net/projects/defragfs/
Existence of a project does not imply need ;-)
There are several points to make here.
First, modern *nix filesystems are very good at avoiding performance problems
caused by fragmentation in general. This is distinct from avoiding
fragmentation itself. After all, what is the point of defragging a 10GB
file that is split into three contiguous extents?
Secondly, as a filesystem starts to get full -- say, 80-90% or more -- no
filesystem is going to be able to avoid fragmentation of large (relative to
the size of the FS) files. That's because at this capacity, large holes are
unlikely to exist. That doesn't necessarily imply poor performance,
however.
Third, I think that fragmentation is a complete non-issue for MythTV users.
Even a fragmented disk can deliver data faster than the video stream consumes
it.
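To put rough numbers on that (the figures below are my own assumptions, not measurements):

```python
# Back-of-the-envelope check with assumed, typical figures:
# an ATSC HDTV stream tops out around 19.4 Mbit/s, while even a badly
# fragmented disk that is seeking constantly can usually sustain well
# over 10 MB/s.
stream_mbit_s = 19.4                  # assumed peak HDTV bitrate
stream_mbyte_s = stream_mbit_s / 8    # ~2.4 MB/s
fragmented_disk_mbyte_s = 10.0        # pessimistic assumed throughput

headroom = fragmented_disk_mbyte_s / stream_mbyte_s
print(round(stream_mbyte_s, 3), round(headroom, 1))  # ~2.4 MB/s, ~4x margin
```

Even with those deliberately pessimistic disk numbers, there is roughly 4x headroom over the stream's data rate.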
I know little of NTFS, but I do know about FAT and various Unix filesystems.
FAT was prone to fragmentation for several reasons. One is the typically
stupid block allocator that was used. If memory serves, whenever a new
block ("cluster") was needed for writing file data, it picked the first free
one starting from the beginning of the disk. This almost guaranteed
fragmentation, since it would try to fill up all the "holes" left over from
deleted files quickly.
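That first-fit behaviour is easy to sketch (toy Python of my own, not actual FAT code):

```python
def first_fit_allocate(free, n):
    """Pick the first n free clusters from the start of the disk,
    the way the classic FAT allocator described above did."""
    picked = sorted(free)[:n]
    for c in picked:
        free.discard(c)
    return picked

def count_extents(clusters):
    """Number of contiguous runs a file is split into."""
    clusters = sorted(clusters)
    return 1 + sum(1 for a, b in zip(clusters, clusters[1:]) if b != a + 1)

# Toy 20-cluster disk: clusters 0-9 were in use, then files occupying
# clusters 2-3 and 6-7 were deleted, leaving two small holes up front.
free = set(range(10, 20)) | {2, 3, 6, 7}

# Writing a new 6-cluster file fills the holes first...
new_file = first_fit_allocate(free, 6)
print(new_file)                  # [2, 3, 6, 7, 10, 11]
print(count_extents(new_file))   # 3 extents -- fragmented from day one
```

A smarter allocator would have put the whole file in the big free region starting at cluster 10 and left the small holes for small files.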
Another problem with FAT that defragmentation typically does NOT resolve is
the distance from the file's metadata to the file's content. FAT stores
metadata about files (name, attributes/permissions, date, size) in a data
area for the *directory*. It is simply not possible to have this close to
each file, and it must be accessed before working with files -- at least the
size must be known.
Now, let's look at how Linux/Unix filesystems make this situation better.
Traditional Unix filesystems used cylinder groups (good explanation at [1])
for allocation. The disk was divided into groups. Each group contained the
metadata and content of the files stored within it. Of course, files can be
large enough to span groups. But the simple act of using groups
accomplishes several things. By trying to keep files within a group, even
if the file gets fragmented, the disk heads will not have to travel all over
the disk to hunt down its pieces. Also, it is easier and more efficient to
find large contiguous regions to use for data.
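A toy sketch of group-local allocation (invented group size and block numbers, not real ext2/XFS code):

```python
GROUP_SIZE = 100  # blocks per cylinder group (toy number)

def group_allocate(free, group, n):
    """Prefer free blocks inside the file's own group, like a
    cylinder-group allocator; spill to other groups only if needed."""
    lo, hi = group * GROUP_SIZE, (group + 1) * GROUP_SIZE
    local = sorted(b for b in free if lo <= b < hi)
    picked = local[:n] + sorted(free - set(local))[:max(0, n - len(local))]
    for b in picked:
        free.discard(b)
    return picked

# Group 3 has scattered free blocks; a new 4-block file lands entirely
# inside it, even though free blocks exist elsewhere on the disk.
free = {305, 306, 310, 311, 340, 7, 8, 900}
blocks = group_allocate(free, 3, 4)
print(blocks)                      # [305, 306, 310, 311]
print(max(blocks) - min(blocks))   # worst-case seek span: 6 blocks
```

The file is still fragmented (two extents), but all the pieces sit within one group, so the head barely moves to collect them.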
All Unix filesystems also have the notion of an inode. In Unix, the inode
contains the metadata about the file: permissions, size, datestamps, and the
exact blocks on disk that make up the file. The inode does not contain a
name; Unix directories are simply maps from names to inodes. These inodes
are typically put in the same cylinder group as the file content.
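In toy Python terms (a sketch of the concept, not a real on-disk layout):

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    """Toy inode: everything about the file *except* its name."""
    mode: int                                    # permissions
    size: int
    mtime: float
    blocks: list = field(default_factory=list)   # data block numbers

# The inode table, indexed by inode number.
inodes = {7: Inode(mode=0o644, size=4096, mtime=0.0, blocks=[305, 306])}

# A directory is just a map from names to inode numbers -- which is
# also why hard links work: two names can point at the same inode.
directory = {"recording.mpg": 7, "alias.mpg": 7}

ino = inodes[directory["recording.mpg"]]
print(ino.size)    # 4096
print(ino.blocks)  # [305, 306]
```

Contrast that with FAT, where the name, size, and attributes all live in the directory's own data area, far from the file contents.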
Pretty much every filesystem uses a more modern version of cylinder groups.
ext2/3 calls them block groups. XFS calls them allocation groups.
So, even if your file gets fragmented, the fragments tend to be close to each
other.
XFS also has some even nicer optimizations. The main one is delayed
allocation. When you write to a file on Unix, the data typically sits in
the write cache for a while before being flushed out to disk. "A while" is
an eternity in CPU time -- maybe even many seconds. Anyhow, most filesystems
reserve specific blocks on the disk when data enters the write cache. XFS
just reserves space in general. XFS does not reserve particular disk areas
until the last possible moment before the data must be written to the disk.
This allows it to find a chunk of disk space of the optimal size to hold the
data being written. A classic problem with fragmentation is predicting the
size of a file to be written (OS/2 actually had a system call that
programmers could call to inform it of the estimated size in advance). By
using delayed allocation, XFS can simply reserve blocks based on the entire
size of the write that it knows is coming because of the cache. So, having
more RAM available could actually reduce your fragmentation on XFS.
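Here is a toy simulation of the difference (my own sketch, not XFS's actual allocator), with two files being written in alternating small chunks:

```python
def count_extents(blocks):
    blocks = sorted(blocks)
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)

def eager(writes):
    """Reserve blocks the moment each small write arrives: two files
    written at the same time end up interleaved on disk."""
    next_block, layout = 0, {}
    for name, nblocks in writes:
        layout.setdefault(name, [])
        layout[name] += range(next_block, next_block + nblocks)
        next_block += nblocks
    return layout

def delayed(writes):
    """Delayed allocation: only totals are reserved while data sits in
    cache; real blocks are picked per file at flush time."""
    totals = {}
    for name, nblocks in writes:
        totals[name] = totals.get(name, 0) + nblocks
    next_block, layout = 0, {}
    for name, n in totals.items():       # one contiguous run per file
        layout[name] = list(range(next_block, next_block + n))
        next_block += n
    return layout

# Two recordings written in alternating one-block chunks.
writes = [("a", 1), ("b", 1)] * 4
print(count_extents(eager(writes)["a"]))    # 4 extents -- interleaved
print(count_extents(delayed(writes)["a"]))  # 1 extent
```

Same data, same arrival order; only the moment of block assignment differs, and the eager allocator fragments both files while the delayed one keeps each contiguous.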
Finally, Unix is a multiuser, multitasking, multithreaded operating system.
The days when only one program is accessing the disk are long gone. Even on
a desktop PC, chances are that more than one program is going to be
accessing the disk at once. Fragmentation means much less when the disk is
in almost constant use, with multiple processes demanding multiple files
simultaneously. In this situation, block schedulers make much more of a
difference. For instance, if the OS can reorder read requests such that the
disk can make one head sweep and fulfill a dozen requests at once, then
sweep back in the other direction to fulfill more, that will be more
efficient than seeking back and forth over the disk at random.
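A quick sketch of why the sweep wins (toy track numbers of my own, not a real scheduler):

```python
def head_travel(start, order):
    """Total distance the head moves servicing requests in this order."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel

requests = [95, 3, 60, 12, 88, 30]   # tracks, in arrival order
start = 0

fifo = head_travel(start, requests)           # service in arrival order
sweep = head_travel(start, sorted(requests))  # one elevator pass upward
print(fifo, sweep)   # 426 vs 95
```

Servicing the same six requests in one sorted sweep moves the head less than a quarter as far as taking them in arrival order.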
So I think the bottom line holds in almost all cases: fragmentation is
effectively irrelevant on Linux.
[1] http://8help.osu.edu/wks/sysadm_course/html/sysadm-31.html