[mythtv] ringbuffer.cpp

Fri Feb 14 16:04:03 UTC 2014

On 14/02/14 10:24, Henk D. Schoneveld wrote:
>
> Maybe I’m totally wrong but could it be that this is the symptom of an underlying problem ?
> In the past before kernel 2.6.33 I didn’t have these problems, later on sometimes more or less. What I discovered then was that kswapd was using 100% of CPU because of waiting for I/O.
Ah - maybe it is related to Jens Axboe's work on the new, more 
effective writeback mechanism in Linux kernel 2.6.32?
That was per-backing-device writeback - so since 2.6.32, every 
block device has its own flusher thread (replacing the old shared 
pdflush threads), ensuring that dirty pages are periodically written 
to the underlying storage device.
> There was plenty of memory, I even disabled swap, but the problem persisted. By disabling swap kswapd in theory has no function at all, nevertheless it ‘halted’ the system for several seconds. Going to a pre 2.6.33 kernel solved my problems. My conclusion FWIW is that kswapd also is only the messenger, not the cause.
>>
Right. OS "stalls" on systems with a lot of RAM are a well-known problem. 
AFAIK the issue is with the default pdflush settings.
Looking at the defaults:

dirty_background_ratio (default 10):
Maximum percentage of active memory that can be filled with dirty pages 
before pdflush begins to write back the page cache to mass storage.

This means the page cache can accumulate dirty data up to 10% of memory 
before the flusher thread triggers writeback. So with 16G RAM, as much as 
1.6G can be written in one step by the pdflush thread working at top system 
priority (and of course causing the famous "write hog").
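
A minimal sketch of that arithmetic in Python. The function name is hypothetical, and it simplifies by applying the ratio to a single memory figure; the kernel applies it to available (free plus reclaimable) memory, not total RAM, so the real threshold is usually somewhat lower.

```python
GIB = 1024 ** 3


def background_writeback_threshold_bytes(available_bytes: int,
                                         ratio_percent: int = 10) -> int:
    """Amount of dirty data at which background writeback would start.

    Simplified model: ratio_percent of available_bytes (integer arithmetic).
    """
    return available_bytes * ratio_percent // 100


if __name__ == "__main__":
    threshold = background_writeback_threshold_bytes(16 * GIB)
    # For 16 GiB and the default ratio of 10 this is roughly 1.6 GiB.
    print(f"{threshold / GIB:.1f} GiB may be dirty before writeback starts")
```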

dirty_expire_centisecs (default 3000):
In hundredths of a second, how long data can be in the page cache before 
it's considered expired and must be written at the next opportunity. 
Note that this default is very long: a full 30 seconds. That means that 
under normal circumstances, unless you write enough to trigger the other 
pdflush method, Linux won't actually commit anything you write until 30 
seconds later.

So data written to disk will sit in memory until either:
a) it is more than 30 seconds old, or
b) the dirty pages have consumed more than 10% of the active, working 
memory.

Maybe a) explains JYA's observation that the read thread sees data with 
a 25 sec delay compared to the writer thread - assuming writeback to mass 
storage is delayed by the default 30 sec?
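
The two conditions a) and b) can be sketched as a small predicate. This is only a model of the rules as described above, not kernel code; the function and parameter names are made up for illustration.

```python
def writeback_due(age_seconds: float, dirty_bytes: int, available_bytes: int,
                  expire_centisecs: int = 3000,
                  background_ratio: int = 10) -> bool:
    """Return True if dirty data would be written back under rule a) or b)."""
    # a) the data has been dirty for longer than dirty_expire_centisecs
    expired = age_seconds * 100 > expire_centisecs
    # b) dirty data exceeds dirty_background_ratio percent of memory
    over_ratio = dirty_bytes * 100 > available_bytes * background_ratio
    return expired or over_ratio
```

With the defaults, data that has been dirty for 25 seconds and is well under the 10% limit is not yet due, which is consistent with the delay discussed above.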

I haven't looked at the MythTV code, but some quick google-fu says:

"If you do need guarantees about the consistency of your data on disk or 
the order in which it hits disk, there are several solutions: For 
file-based I/O, you can pass O_SYNC to open(2) or use the fsync(2), 
fdatasync(2), or sync_file_range(2) system calls. For mapped I/O, use 
msync(2)."

I wonder - are we using any of the above in the reader thread?
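
For reference, a minimal sketch of the fsync(2) route from the quote above, written in Python rather than MythTV's C++. It only shows how the call is used; it is not taken from the MythTV code, and the helper name is hypothetical.

```python
import os


def write_and_sync(path: str, data: bytes) -> None:
    """Append data to path and block until the kernel has flushed it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push this file's dirty pages to the device now,
        # instead of waiting for the periodic flusher.
        os.fsync(fd)
    finally:
        os.close(fd)
```

os.fdatasync(fd) could be used in place of os.fsync(fd) on Linux when metadata such as timestamps does not need to be flushed; passing os.O_SYNC to os.open is the third option named in the quote.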

BTW:
When I had an old, 512-byte-sector HDD, the following settings allowed me 
to have zero "TFW(/myth/tv/8027_20140214090200.mpg:384): write(57528) cnt 
38 total 2259196 -- took a long time, 1702 ms" warnings during tests with 
16 concurrent HD streams on a single SATA HDD.

# ==============================================================
#
# dirty_writeback_centisecs
#
# The kernel flusher threads will periodically wake up and write `old' data
# out to disk.  This tunable expresses the interval between those wakeups,
# in 100'ths of a second.
#
# Setting this to zero disables periodic writeback altogether.
# See https://bugzilla.kernel.org/show_bug.cgi?id=12309
# By default the kernel checks for dirty data every 5 sec.
# This setting is for smoother writeback. Maybe 100 will be
# even better.
# echo 300    > /proc/sys/vm/dirty_writeback_centisecs
# Default is 500

vm.dirty_writeback_centisecs = 100

# ==============================================================

# ==============================================================
#
# dirty_background_bytes
#
# Contains the amount of dirty memory at which the background kernel
# flusher threads will start writeback.
#
# Note: dirty_background_bytes is the counterpart of dirty_background_ratio.
# Only one of them may be specified at a time. When one sysctl is written it is
# immediately taken into account to evaluate the dirty memory limits and the
# other appears as 0 when read.
#
# Default is 0 (dirty_background_ratio is used instead)

vm.dirty_background_bytes = 102400

# ==============================================================

# ==============================================================
#
# dirty_expire_centisecs
#
# This tunable is used to define when dirty data is old enough to be eligible
# for writeout by the kernel flusher threads.  It is expressed in 100'ths
# of a second.  Data which has been dirty in-memory for longer than this
# interval will be written out next time a flusher thread wakes up.
#
# Default is 3000

vm.dirty_expire_centisecs = 864000

# ==============================================================

# ==============================================================
#
# dirty_bytes
#
# Contains the amount of dirty memory at which a process generating disk writes
# will itself start writeback.
#
# Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be
# specified at a time. When one sysctl is written it is immediately taken into
# account to evaluate the dirty memory limits and the other appears as 0 when
# read.
#
# Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
# value lower than this limit will be ignored and the old configuration will be
# retained.

# vm.dirty_bytes = 16777216

# ==============================================================

# ==============================================================
#
# dirty_ratio
#
# Contains, as a percentage of total available memory that contains free pages
# and reclaimable pages, the number of pages at which a process which is
# generating disk writes will itself start writing out dirty data.
#
# The total available memory is not equal to total system memory.
#
# Default is 20

vm.dirty_ratio = 2

# ==============================================================

# ==============================================================
#
# swappiness
#
# This control is used to define how aggressively the kernel will swap
# memory pages.  Higher values will increase aggressiveness, lower values
# decrease the amount of swap.
#
# The default value is 60.

vm.swappiness = 0

# ==============================================================
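
To make the units in the settings above easier to read, here is a small Python sketch that converts them to seconds, hours and KiB. The helper names are hypothetical; the values are the ones from the config above.

```python
def centisecs_to_seconds(centisecs: int) -> float:
    """Convert a *_centisecs sysctl value to seconds."""
    return centisecs / 100


def describe(settings: dict) -> list:
    """Return one human-readable line per setting, with converted units."""
    lines = []
    for name, value in settings.items():
        if name.endswith("_centisecs"):
            seconds = centisecs_to_seconds(value)
            lines.append(
                f"{name} = {value} -> {seconds:g} s ({seconds / 3600:g} h)")
        elif name.endswith("_bytes"):
            lines.append(f"{name} = {value} -> {value / 1024:g} KiB")
        else:
            lines.append(f"{name} = {value}")
    return lines


if __name__ == "__main__":
    for line in describe({
        "vm.dirty_writeback_centisecs": 100,
        "vm.dirty_background_bytes": 102400,
        "vm.dirty_expire_centisecs": 864000,
        "vm.dirty_ratio": 2,
        "vm.swappiness": 0,
    }):
        print(line)
```

In these units the flusher wakes every second, background writeback starts at 100 KiB of dirty data, and the expiry age is 8640 seconds (2.4 hours), so with these settings the byte threshold rather than the age is what triggers writeback in practice.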

Now that I have moved to a 4k-sector HDD, the default kernel settings seem 
to be OK. Honestly speaking, I don't believe in a correlation between sector 
size and pdflush efficiency - so maybe it is pure coincidence between the 
HDD change and good performance on VM defaults. But anyway - you can try to 
play with the above knobs...
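
Before changing anything it helps to record what the system currently uses. A minimal Python sketch, assuming a Linux box where the knobs are exposed under /proc/sys/vm; the function name and the `base` parameter are made up for illustration.

```python
from pathlib import Path

KNOBS = (
    "dirty_writeback_centisecs",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
    "dirty_ratio",
    "swappiness",
)


def read_vm_knobs(base="/proc/sys/vm", names=KNOBS) -> dict:
    """Read the current value of each knob; None if it cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_vm_knobs().items():
        print(f"vm.{name} = {value}")
```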

BTW2: I would love to see this thread in the MythTV forums - so I can 
read/reply anywhere via a browser - instead of only in a mail program :-p