jyavenard at gmail.com
Fri Feb 14 22:13:53 UTC 2014
On 15 February 2014 03:04, Warpme <warpme at o2.pl> wrote:
> Ah - maybe it is related to Jens Axboe's work on the new, more
> effective writeback mechanism in Linux kernel version 2.6.32?
> It was about per-backing-device based writeback - so since 2.6.32, every
> block device has its own pdflush thread ensuring that dirty pages are
> periodically written to the underlying storage device.
I'm currently experiencing the issue in the following configurations:
FE -> BE with local RAID5 array (mdadm+LVM+JFS) with one disk showing
a dozen unreadable sectors (and increasing)
FE -> BE -> FreeNAS File Server (RAIDZ2)
in both cases the frontend accesses the BE using a storage group and the
FE -> Freenas <- BE
Where the FE mounts the same NFS file system as the backend, and as
such, there's no use of the myth protocol for live TV recording.
Then it's perfect.
Starting LiveTV shows that it's always at 5s behind live.
NFS directories are mounted with:
so my understanding is that a sync to the NFS file server is done
every second (actimeo=1)
I've started writing unit tests on the ringbuffer...
So I simulate a 15Mbit/s write using a ThreadedFileWriter doing 188-byte
writes, and a few ms later I open a RingBuffer also pulling at
15Mbit/s, reading 457kB at a time (which is what my FE is usually up to
before it stalls)
And after about 20s, I get a lock. That was last night right before
going to bed; I haven't analysed what's going on yet.
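
For reference, the pacing implied by that test can be worked out with a few lines of Python. This is only arithmetic on the figures quoted above, not MythTV code; the function name is made up for illustration, and "457kB" is assumed to mean 457,000 bytes.

```python
# Pacing arithmetic for the writer/reader test described above.
# A sketch only: the names here are hypothetical, not MythTV identifiers.
BITRATE_BPS = 15_000_000   # 15 Mbit/s, as stated in the message
WRITE_SIZE = 188           # bytes per write (one MPEG-TS packet)
READ_SIZE = 457_000        # "457kB"; assumed to mean 457,000 bytes


def pacing(bitrate_bps, write_size, read_size):
    """Return (bytes_per_sec, writes_per_sec, seconds_between_reads)."""
    bytes_per_sec = bitrate_bps / 8
    writes_per_sec = bytes_per_sec / write_size
    seconds_between_reads = read_size / bytes_per_sec
    return bytes_per_sec, writes_per_sec, seconds_between_reads


if __name__ == "__main__":
    bps, wps, gap = pacing(BITRATE_BPS, WRITE_SIZE, READ_SIZE)
    print(f"{bps:.0f} bytes/s, {wps:.0f} writes/s, one read every {gap:.3f} s")
```

Under those assumptions the writer issues roughly ten thousand small writes per second while the reader wakes up about four times per second.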
> Right. OS "stalls" for systems with huge RAM is well known "problem". AFAIK
> issue is with default pdflush settings.
my BE has 16GB of ECC RAM.
> Looking on defaults:
> dirty_background_ratio (default 10):
> Maximum percentage of active memory that can be filled with dirty pages before
> pdflush begins to writeback page cache to mass storage.
is that independent of the actimeo=1 mount option?
> This means the page cache can accommodate up to 10% of data before the flusher
> thread triggers writeback. So if there is 16G of RAM, up to 1.6G can be
> written in one step by the pdflush thread running at top system priority (and
> of course causing the famous "write hog")
this would explain the 20+ second delay... however, wouldn't a read be
able to read from the cache at that time if used on the same machine?
in which case it doesn't really matter what is being written and how
slow it is... the reader should only access cache data until it's
flushed to disk
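
A rough back-of-the-envelope check of those numbers, as a Python sketch. It assumes the 10% threshold applies to the full 16 GB (the kernel actually computes it against available memory, so this is an upper bound), that the backlog drains at the roughly 68 MB/s NFS throughput mentioned in this message, and that writeback happens in one uninterrupted burst. The function name is hypothetical.

```python
# Estimate how much dirty data the 10% threshold allows and how long it
# would take to fill and to drain. A simplification, not a kernel model.
RAM_BYTES = 16e9               # 16 GB, decimal, to match the "1.6G" figure
DIRTY_BACKGROUND_RATIO = 10    # percent, the quoted default
STREAM_BPS = 15_000_000        # one 15 Mbit/s stream
NFS_BYTES_PER_SEC = 68e6       # "about 68MB/s over NFS" (assumed drain rate)


def writeback_estimate(ram_bytes, ratio_pct, write_bps, drain_bytes_per_sec):
    """Return (threshold_bytes, seconds_to_fill, seconds_to_drain)."""
    threshold = ram_bytes * ratio_pct / 100
    fill_seconds = threshold / (write_bps / 8)
    drain_seconds = threshold / drain_bytes_per_sec
    return threshold, fill_seconds, drain_seconds


if __name__ == "__main__":
    thr, fill, drain = writeback_estimate(
        RAM_BYTES, DIRTY_BACKGROUND_RATIO, STREAM_BPS, NFS_BYTES_PER_SEC)
    print(f"threshold {thr / 1e9:.1f} GB, fill {fill:.0f} s, drain {drain:.1f} s")
```

With these inputs a full 1.6 GB backlog would take on the order of 23 to 24 seconds to drain, which is in the same range as the stall, but a single 15 Mbit/s stream would need about 14 minutes to dirty that much memory, so the ratio alone may not be what triggers the flush.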
> dirty_expire_centisecs (default 3000):
> In hundredths of a second, how long data can be in the page cache before
> it's considered expired and must be written at the next opportunity. Note
> that this default is very long: a full 30 seconds. That means that under
> normal circumstances, unless you write enough to trigger the other pdflush
> method, Linux won't actually commit anything you write until 30 seconds later.
in any case, if you call sync, do you force the data to be written to
disk regardless of the kernel parameter?
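
To see what a given backend is actually running with, the writeback tunables discussed in this thread can be read from /proc/sys/vm on Linux. The sketch below is a small helper written for illustration (the function name and the `base` parameter are assumptions); note that the kernel file for the expiry setting is named `dirty_expire_centisecs`.

```python
# Read the Linux writeback tunables from /proc/sys/vm.
# Missing or unreadable files are reported as None instead of raising.
from pathlib import Path

TUNABLES = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_tunables(base="/proc/sys/vm", names=TUNABLES):
    """Return {name: int or None} for each requested tunable."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_tunables().items():
        print(f"{name} = {value}")
```

The `base` argument exists only so the helper can be pointed at a test directory; on a real system the default path is the one to use.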
> So data written to disk will sit in memory until either:
> a) they're more than 30 seconds old, or
> b) the dirty pages have consumed more than 10% of the active, working memory.
> Maybe a) explains JYA's observation that the read thread sees data with a 25 sec
> delay compared to the writer thread - assuming writeback to mass storage is
> delayed by the default 30 sec?
oh I'm not seeing 25s delay between writes and reads.. I see safe_read
*locking* for 25+ seconds...
> I haven't looked at the Myth code, but a quick google-fu says:
> "If you do need guarantees about the consistency of your data on disk or the
> order in which it hits disk, there are several solutions: For file-based
> I/O, you can pass O_SYNC to open(2) or use the fsync(2), fdatasync(2), or
> sync_file_range(2) system calls. For mapped I/O, use msync(2)."
there are already fsync or fdatasync methods in the TFW class; however,
I don't know if they are called and how often.
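
As an illustration of what a periodic sync in a writer looks like, here is a minimal Python sketch. It is not the TFW implementation: the function name and the `sync_every` interval are made up for the example, and it simply calls fdatasync(2) (falling back to fsync(2) where fdatasync is unavailable) after every N packets so that the amount of dirty data stays bounded.

```python
# Write fixed-size packets and force them to storage every `sync_every`
# packets. A sketch under stated assumptions, not MythTV's writer.
import os


def write_with_periodic_sync(path, packets, sync_every=1000):
    """Write each packet to `path`; return the total bytes written."""
    # os.fdatasync is missing on some platforms, so fall back to os.fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    try:
        for i, packet in enumerate(packets, start=1):
            view = memoryview(packet)
            while len(view) > 0:          # os.write may write fewer bytes
                n = os.write(fd, view)
                view = view[n:]
                written += n
            if i % sync_every == 0:
                sync(fd)                  # flush dirty pages for this file
        sync(fd)                          # final flush before closing
    finally:
        os.close(fd)
    return written
```

Syncing more often trades throughput for smaller, more frequent flushes; the right interval for a recorder would have to be measured.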
> I wonder - are we using any of the above in the reader thread?
no, only in the writer thread.
> When I had an old, 512-byte-sector HDD, the following settings allowed me to have
> zero "TFW(/myth/tv/8027_20140214090200.mpg:384): write(57528) cnt 38 total
> 2259196 -- took a long time, 1702 ms" during tests with 16 concurrent HD
> streams on a single SATA HDD.
I'm seeing this when recording 12 streams at once, which takes about
60Mbit/s on the network card to the NFS server.
FWIW, I can do about 68MB/s over NFS.
> BTW2: I would love to see this thread in MythTV forums - so I can
> read/replay anywhere via browser - instead of only in mailer program :-p
me too... I think it would make for an easier read...
In fact I suggested a troubleshooting thread specifically with this
topic in mind