[mythtv-users] XFS: options when running mkfs.xfs
John Goerzen
jgoerzen at complete.org
Wed Sep 5 14:45:46 UTC 2007
On Tue September 4 2007 7:51:03 pm Phill Edwards wrote:
> > I don't know of any mkfs changes, but Henk Schoneveld told me once
> > that using the mount option "allocsize=512m" reduces fragmentation.
> > I've been meaning to try it but haven't gotten around to it yet.
> > MythTV tends to keep filesystems very close to full, and I've found
> > xfs with default options will fragment and suffer reduced performance
> > under these conditions if xfs_fsr isn't run periodically.
>
> Uh-oh - urban myth flame coming here! I had always been led to believe
> that unlike their Windows counterparts "unix file systems never need
> defragging". I've always been a bit suss of that but as I love Linux I
> was prepared to have blind faith. But it sounds like you do have to
> defrag certain unix file system types then? I also found this -
> https://sourceforge.net/projects/defragfs/
Existence of a project does not imply need ;-)
There are several points to make here.
First, modern *nix filesystems are very good at avoiding performance problems
caused by fragmentation in general. This is distinct from avoiding
fragmentation itself. After all, what is the point of defragging a 10GB
file that is split into three contiguous extents?
Secondly, as a filesystem starts to get full -- say, 80-90% or more -- no
filesystem is going to be able to avoid fragmentation of large (relative to
the size of the FS) files. That's because at this capacity, large holes are
unlikely to exist. That doesn't necessarily imply poor performance,
however.
Third, I think that fragmentation is a complete non-issue for MythTV users.
Even a fragmented disk can deliver data faster than the video stream consumes
it.
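To put rough numbers on that (the figures below are my own assumptions, not measurements):

```python
# Back-of-the-envelope check with assumed, typical figures:
# an ATSC HDTV stream tops out around 19.4 Mbit/s, while even a badly
# fragmented disk that is seeking constantly can usually sustain well
# over 10 MB/s.
stream_mbit_s = 19.4                  # assumed peak HDTV bitrate
stream_mbyte_s = stream_mbit_s / 8    # ~2.4 MB/s
fragmented_disk_mbyte_s = 10.0        # pessimistic assumed throughput

headroom = fragmented_disk_mbyte_s / stream_mbyte_s
print(round(stream_mbyte_s, 3), round(headroom, 1))  # ~2.4 MB/s, ~4x margin
```

Even with those deliberately pessimistic disk numbers, there is roughly 4x headroom over the stream's data rate.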
I know little of NTFS, but I do know about FAT and various Unix filesystems.
FAT was prone to fragmentation for several reasons. One is the typically
stupid block allocator that was used. If memory serves, whenever a new
block ("cluster") was needed for writing file data, it picked the first free
one starting from the beginning of the disk. This almost guaranteed
fragmentation, since it would try to fill up all the "holes" left over from
deleted files quickly.
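That first-fit behaviour is easy to sketch (toy Python of my own, not actual FAT code):

```python
def first_fit_allocate(free, n):
    """Pick the first n free clusters from the start of the disk,
    the way the classic FAT allocator described above did."""
    picked = sorted(free)[:n]
    for c in picked:
        free.discard(c)
    return picked

def count_extents(clusters):
    """Number of contiguous runs a file is split into."""
    clusters = sorted(clusters)
    return 1 + sum(1 for a, b in zip(clusters, clusters[1:]) if b != a + 1)

# Toy 20-cluster disk: clusters 0-9 were in use, then files occupying
# clusters 2-3 and 6-7 were deleted, leaving two small holes up front.
free = set(range(10, 20)) | {2, 3, 6, 7}

# Writing a new 6-cluster file fills the holes first...
new_file = first_fit_allocate(free, 6)
print(new_file)                  # [2, 3, 6, 7, 10, 11]
print(count_extents(new_file))   # 3 extents -- fragmented from day one
```

A smarter allocator would have put the whole file in the big free region starting at cluster 10 and left the small holes for small files.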
Another problem with FAT that defragmentation typically does NOT resolve is
the distance from the file's metadata to the file's content. FAT stores
metadata about files (name, attributes/permissions, date, size) in a data
area for the *directory*. It is simply not possible to have this close to
each file, and it must be accessed before working with files -- at least the
size must be known.
Now, let's look at how Linux/Unix filesystems make this situation better.
Traditional Unix filesystems used cylinder groups (good explanation at [1])
for allocation. The disk was divided into groups. Each group contained the
metadata and content of the files stored within it. Of course, files can be
large enough to span groups. But the simple act of using groups
accomplishes several things. By trying to keep files within a group, even
if the file gets fragmented, the disk heads will not have to travel all over
the disk to hunt down its pieces. Also, it is easier and more efficient to
find large contiguous regions to use for data.
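A toy sketch of group-local allocation (invented group size and block numbers, not real ext2/XFS code):

```python
GROUP_SIZE = 100  # blocks per cylinder group (toy number)

def group_allocate(free, group, n):
    """Prefer free blocks inside the file's own group, like a
    cylinder-group allocator; spill to other groups only if needed."""
    lo, hi = group * GROUP_SIZE, (group + 1) * GROUP_SIZE
    local = sorted(b for b in free if lo <= b < hi)
    picked = local[:n] + sorted(free - set(local))[:max(0, n - len(local))]
    for b in picked:
        free.discard(b)
    return picked

# Group 3 has scattered free blocks; a new 4-block file lands entirely
# inside it, even though free blocks exist elsewhere on the disk.
free = {305, 306, 310, 311, 340, 7, 8, 900}
blocks = group_allocate(free, 3, 4)
print(blocks)                      # [305, 306, 310, 311]
print(max(blocks) - min(blocks))   # worst-case seek span: 6 blocks
```

The file is still fragmented (two extents), but all the pieces sit within one group, so the head barely moves to collect them.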
All Unix filesystems also have the notion of an inode. In Unix, the inode
contains the metadata about the file: permissions, size, datestamps, and the
exact blocks on disk that make up the file. The inode does not contain a
name; Unix directories are simply maps from names to inodes. These inodes
are typically put in the same cylinder group as the file content.
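In toy Python terms (a sketch of the concept, not a real on-disk layout):

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    """Toy inode: everything about the file *except* its name."""
    mode: int                                    # permissions
    size: int
    mtime: float
    blocks: list = field(default_factory=list)   # data block numbers

# The inode table, indexed by inode number.
inodes = {7: Inode(mode=0o644, size=4096, mtime=0.0, blocks=[305, 306])}

# A directory is just a map from names to inode numbers -- which is
# also why hard links work: two names can point at the same inode.
directory = {"recording.mpg": 7, "alias.mpg": 7}

ino = inodes[directory["recording.mpg"]]
print(ino.size)    # 4096
print(ino.blocks)  # [305, 306]
```

Contrast that with FAT, where the name, size, and attributes all live in the directory's own data area, far from the file contents.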
Pretty much every filesystem uses a more modern version of cylinder groups.
ext2/3 calls them block groups. XFS calls them allocation groups.
So, even if your file gets fragmented, the fragments tend to be close to each
other.
XFS also has some even nicer optimizations. The main one is delayed
allocation. When you write to a file on Unix, the data typically sits in
the write cache for a while before being flushed out to disk. "A while" is
an eternity in CPU time -- maybe even many seconds. Anyhow, most filesystems
reserve specific blocks on the disk when data enters the write cache. XFS
just reserves space in general. XFS does not reserve particular disk areas
until the last possible moment before the data must be written to the disk.
This allows it to find a chunk of disk space of the optimal size to hold the
data being written. A classic problem with fragmentation is predicting the
size of a file to be written (OS/2 actually had a system call that
programmers could call to inform it of the estimated size in advance). By
using delayed allocation, XFS can simply reserve blocks based on the entire
size of the write that it knows is coming because of the cache. So, having
more RAM available could actually reduce your fragmentation on XFS.
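Here is a toy simulation of the difference (my own sketch, not XFS's actual allocator), with two files being written in alternating small chunks:

```python
def count_extents(blocks):
    blocks = sorted(blocks)
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)

def eager(writes):
    """Reserve blocks the moment each small write arrives: two files
    written at the same time end up interleaved on disk."""
    next_block, layout = 0, {}
    for name, nblocks in writes:
        layout.setdefault(name, [])
        layout[name] += range(next_block, next_block + nblocks)
        next_block += nblocks
    return layout

def delayed(writes):
    """Delayed allocation: only totals are reserved while data sits in
    cache; real blocks are picked per file at flush time."""
    totals = {}
    for name, nblocks in writes:
        totals[name] = totals.get(name, 0) + nblocks
    next_block, layout = 0, {}
    for name, n in totals.items():       # one contiguous run per file
        layout[name] = list(range(next_block, next_block + n))
        next_block += n
    return layout

# Two recordings written in alternating one-block chunks.
writes = [("a", 1), ("b", 1)] * 4
print(count_extents(eager(writes)["a"]))    # 4 extents -- interleaved
print(count_extents(delayed(writes)["a"]))  # 1 extent
```

Same data, same arrival order; only the moment of block assignment differs, and the eager allocator fragments both files while the delayed one keeps each contiguous.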
Finally, Unix is a multiuser, multitasking, multithreaded operating system.
The days when only one program is accessing the disk are long gone. Even on
a desktop PC, chances are that more than one program is going to be
accessing the disk at once. Fragmentation means much less when the disk is
in almost constant use, with multiple processes demanding multiple files
simultaneously. In this situation, block schedulers make much more of a
difference. For instance, if the OS can reorder read requests such that the
disk can make one head sweep and fulfill a dozen requests at once, then
sweep back in the other direction to fulfill more, that will be more
efficient than seeking back and forth over the disk at random.
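A quick sketch of why the sweep wins (toy track numbers of my own, not a real scheduler):

```python
def head_travel(start, order):
    """Total distance the head moves servicing requests in this order."""
    travel, pos = 0, start
    for track in order:
        travel += abs(track - pos)
        pos = track
    return travel

requests = [95, 3, 60, 12, 88, 30]   # tracks, in arrival order
start = 0

fifo = head_travel(start, requests)           # service in arrival order
sweep = head_travel(start, sorted(requests))  # one elevator pass upward
print(fifo, sweep)   # 426 vs 95
```

Servicing the same six requests in one sorted sweep moves the head less than a quarter as far as taking them in arrival order.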
So I think the bottom line holds in almost all cases: fragmentation is
effectively irrelevant on Linux.
[1] http://8help.osu.edu/wks/sysadm_course/html/sysadm-31.html