[mythtv-users] myth hangs + locks whole system

Joseph Fry joe at thefrys.com
Thu Aug 23 02:58:09 UTC 2012


>
> Finally on 0.25.2 I've had myth hang the entire system twice today.  The
> system eventually (c 20 minutes) came back.  I was hoping that the
> upgrade would fix such problems, but it seems not.
>
> The videos were recorded with 0.24.2 over the air in the US, and the
> signal quality is quite poor.  It would be very nice if myth did not
> react to bad inputs by hanging the system.
>
> Any advice, aside from getting better inputs?  I'm prepared to do some
> debugging.
>
> I'm running a chroot Debian testing with Marillat's myth package.  The
> host system is Debian lenny with linux-image-2.6.26-2-686
> 2.6.26-29 .
> Yes, upgrading RSN.  I'm probably going to do that before attempting
> debugging.
>
> First incident I was watching a video, which stuttered and froze.  The
> frontend had many messages like
> 2012-08-22 14:23:19.339406 E  AudioOutput Error: WriteAudio: device is in
> a bad state (state = 1)
> I also saw this problem in 0.24.2.
> The kernel started producing messages with
> Aug 22 14:32:28 corn kernel: [1307862.925078] pageout: orphaned page
> (many of them).  I've received such messages in other contexts and been
> told they are harmless;
> I think they are some debugging code that was accidentally left in the
> kernel (linux 2.6.26).
> Then comes
> Aug 22 14:33:33 corn kernel: [1307902.369300] INFO: task mythbackend:14587
> blocked for more than 120 seconds.
> Aug 22 14:33:33 corn kernel: [1307902.369308] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 22 14:33:33 corn kernel: [1307902.369314] mythbackend   D 02dce5eb
> 0 14587      1
> Aug 22 14:33:33 corn kernel: [1307902.369322]        c7ce8980 00000082
> f310f954 02dce5eb 00000000 f7fed660 f7fed7ec c201bfc0
> Aug 22 14:33:33 corn kernel: [1307902.369338]        00000001 c2019380
> 00000000 c20192f8 c201c3f4 c20192f8 fffeffff f7da89b8
> Aug 22 14:33:33 corn kernel: [1307902.369352]        b739885f f7fed660
> c02b9f50 d9cfdf5c f7da89c0 16423cd4 f7da89b8 f7da89b8
> Aug 22 14:33:33 corn kernel: [1307902.369365] Call Trace:
> Aug 22 14:33:33 corn kernel: [1307902.369463]  [<c02b9f50>]
> rwsem_down_failed_common+0x14c/0x167
> Aug 22 14:33:33 corn kernel: [1307902.369505]  [<c02b9fa4>]
> rwsem_down_read_failed+0x1a/0x22
> Aug 22 14:33:33 corn kernel: [1307902.369536]  [<c02b9fdf>]
> call_rwsem_down_read_failed+0x7/0xc
> Aug 22 14:33:33 corn kernel: [1307902.369554]  [<c02b95cf>]
> down_read+0xa/0xb
> Aug 22 14:33:33 corn kernel: [1307902.369561]  [<c0115b1c>]
> do_page_fault+0x214/0x5c0
> Aug 22 14:33:33 corn kernel: [1307902.369610]  [<c0115908>]
> do_page_fault+0x0/0x5c0
> Aug 22 14:33:33 corn kernel: [1307902.369622]  [<c02ba3a2>]
> error_code+0x72/0x78
> Aug 22 14:33:33 corn kernel: [1307902.369702]  =======================
> Aug 22 14:33:33 corn kernel: [1307902.373206] INFO: task mythbackend:24297
> blocked for more than 120 seconds.
> Aug 22 14:33:33 corn kernel: [1307902.373206] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Aug 22 14:33:33 corn kernel: [1307902.373206] mythbackend   D 00000004
> 0 24297      1
> Aug 22 14:33:33 corn kernel: [1307902.373206]        cccba500 00000082
> e0a7e90c 00000004 00000000 f701b560 f701b6ec c2011fc0
> Aug 22 14:33:33 corn kernel: [1307902.373206]        00000000 c0162ef2
> 000009e8 f22a9f54 00000000 007fb000 ffffffff f7da89b8
> Aug 22 14:33:33 corn kernel: [1307902.373206]        a13febd0 f701b560
> c02b9f50 f22a9f88 f7da89c0 f7da89b8 f7da89b8 f7da89b8
> Aug 22 14:33:33 corn kernel: [1307902.373206] Call Trace:
> Aug 22 14:33:33 corn kernel: [1307902.373206]  [<c0162ef2>]
> zap_page_range+0x7b/0xbd
> Aug 22 14:33:33 corn kernel: [1307902.373206]  [<c02b9f50>]
> rwsem_down_failed_common+0x14c/0x167
> Aug 22 14:33:33 corn kernel: [1307902.373206]  [<c02b9f83>]
> rwsem_down_write_failed+0x18/0x1f
> Aug 22 14:33:33 corn kernel: [1307902.373206]  [<c02b9fea>]
> call_rwsem_down_write_failed+0x6/0x8
> Aug 22 14:33:33 corn kernel: [1307902.373206]  [<c02b95e2>]
> down_write+0x12/0x13
> Aug 22 14:33:33 corn kernel: [1307902.373206]  [<c016659b>]
> sys_munmap+0x21/0x3e
> Aug 22 14:33:33 corn kernel: [1307902.373206]  [<c0103857>]
> sysenter_past_esp+0x78/0xb1
> Aug 22 14:33:33 corn kernel: [1307902.373206]  [<c02b0000>]
> quirk_nopciamd+0x10/0x4a
> Aug 22 14:33:33 corn kernel: [1307902.373206]  =======================
> and more like that.  I'm guessing that in "mythbackend:24297" 24297 is a
> process number and the
> backend is spawning processes trying to complete some task.
>
> Eventually
> Aug 22 14:38:36 corn kernel: [1308078.477758] python invoked oom-killer:
> gfp_mask=0x1201d2, order=0, oomkilladj=0
> Aug 22 14:38:36 corn kernel: [1308078.477766] Pid: 29463, comm: python
> Tainted: P          2.6.26-2-686 #1
> Aug 22 14:38:36 corn kernel: [1308078.477800]  [<c0159252>]
> oom_kill_process+0x4f/0x195
> Aug 22 14:38:36 corn kernel: [1308078.477859]  [<c015967c>]
> out_of_memory+0x14e/0x17f
> Aug 22 14:38:36 corn kernel: [1308078.477915]  [<c015b5e4>]
> __alloc_pages_internal+0x2b8/0x34e
> Aug 22 14:38:36 corn kernel: [1308078.477974]  [<c015b686>]
> __alloc_pages+0x7/0x9
> Aug 22 14:38:36 corn kernel: [1308078.477985]  [<c015cffc>]
> __do_page_cache_readahead+0x86/0x163
> Aug 22 14:38:36 corn kernel: [1308078.478030]  [<c015d413>]
> do_page_cache_readahead+0x3d/0x4a
> Aug 22 14:38:36 corn kernel: [1308078.478058]  [<c0158af2>]
> filemap_fault+0x164/0x35a
> Aug 22 14:38:36 corn kernel: [1308078.478118]  [<c0161f4e>]
> __do_fault+0x42/0x34b
> Aug 22 14:38:36 corn kernel: [1308078.481642]  [<c016413d>]
> handle_mm_fault+0x30c/0x6dd
> Aug 22 14:38:36 corn kernel: [1308078.481642]  [<c017766c>]
> sys_fstat64+0x1e/0x23
> Aug 22 14:38:36 corn kernel: [1308078.481642]  [<c0115bab>]
> do_page_fault+0x2a3/0x5c0
> Aug 22 14:38:36 corn kernel: [1308078.481642]  [<c0115908>]
> do_page_fault+0x0/0x5c0
> Aug 22 14:38:36 corn kernel: [1308078.481642]  [<c02ba3a2>]
> error_code+0x72/0x78
> Aug 22 14:38:36 corn kernel: [1308078.481642]  [<c02b0000>]
> quirk_nopciamd+0x10/0x4a
> Aug 22 14:38:36 corn kernel: [1308078.481642]  =======================
> Aug 22 14:38:36 corn kernel: [1308078.481642] Mem-info:
> Aug 22 14:38:36 corn kernel: [1308078.481642] DMA per-cpu:
> Aug 22 14:38:36 corn kernel: [1308078.481642] CPU    0: hi:    0, btch:
> 1 usd:   0
> Aug 22 14:38:36 corn kernel: [1308078.481642] CPU    1: hi:    0, btch:
> 1 usd:   0
> Aug 22 14:38:36 corn kernel: [1308078.481642] Normal per-cpu:
> Aug 22 14:38:36 corn kernel: [1308078.481642] CPU    0: hi:  186, btch:
>  31 usd: 128
> Aug 22 14:38:36 corn kernel: [1308078.481642] CPU    1: hi:  186, btch:
>  31 usd: 180
> Aug 22 14:38:36 corn kernel: [1308078.481642] HighMem per-cpu:
> Aug 22 14:38:36 corn kernel: [1308078.481642] CPU    0: hi:  186, btch:
>  31 usd: 169
> Aug 22 14:38:36 corn kernel: [1308078.481642] CPU    1: hi:  186, btch:
>  31 usd: 185
> Aug 22 14:38:36 corn kernel: [1308078.481642] Active:59905 inactive:386043
> dirty:0 writeback:0 unstable:0
> Aug 22 14:38:36 corn kernel: [1308078.481642]  free:12187 slab:12329
> mapped:2315 pagetables:2615 bounce:0
> Aug 22 14:38:36 corn kernel: [1308078.481642] DMA free:8116kB min:68kB
> low:84kB high:100kB active:2564kB inactive:1808kB present:16256kB
> pages_scanned:8115 all_unreclaimable? yes
> Aug 22 14:38:36 corn kernel: [1308078.481642] lowmem_reserve[]: 0 873 2015
> 2015
> Aug 22 14:38:36 corn kernel: [1308078.481642] Normal free:40232kB
> min:3744kB low:4680kB high:5616kB active:76612kB inactive:549052kB
> present:894080kB pages_scanned:1019923 all_unreclaimable? yes
> Aug 22 14:38:36 corn kernel: [1308078.481642] lowmem_reserve[]: 0 0 9136
> 9136
> Aug 22 14:38:36 corn kernel: [1308078.481642] HighMem free:400kB min:512kB
> low:1736kB high:2960kB active:160444kB inactive:993312kB present:1169416kB
> pages_scanned:8566436 all_unreclaimable? yes
> Aug 22 14:38:36 corn kernel: [1308078.481642] lowmem_reserve[]: 0 0 0 0
> Aug 22 14:38:37 corn kernel: [1308078.481642] DMA: 51*4kB 63*8kB 67*16kB
> 42*32kB 16*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 8116kB
> Aug 22 14:38:37 corn kernel: [1308078.481642] Normal: 9050*4kB 19*8kB
> 7*16kB 6*32kB 5*64kB 3*128kB 3*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB =
> 40176kB
> Aug 22 14:38:37 corn kernel: [1308078.481642] HighMem: 12*4kB 10*8kB
> 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB =
> 400kB
> Aug 22 14:38:37 corn kernel: [1308078.481642] 32630 total pagecache pages
> Aug 22 14:38:37 corn kernel: [1308078.481642] Swap cache: add 11301606,
> delete 11301580, find 10602710/11733324
> Aug 22 14:38:37 corn kernel: [1308078.481642] Free swap  = 0kB
> Aug 22 14:38:37 corn kernel: [1308078.481642] Total swap = 2097144kB
> Aug 22 14:38:37 corn kernel: [1308078.489642] 524032 pages of RAM
> Aug 22 14:38:37 corn kernel: [1308078.489642] 294656 pages of HIGHMEM
> Aug 22 14:38:37 corn kernel: [1308078.489642] 38685 reserved pages
> Aug 22 14:38:37 corn kernel: [1308078.489642] 13575 pages shared
> Aug 22 14:38:37 corn kernel: [1308078.489642] 26 pages swap cached
> Aug 22 14:38:37 corn kernel: [1308078.489642] 0 pages dirty
> Aug 22 14:38:37 corn kernel: [1308078.489642] 0 pages writeback
> Aug 22 14:38:37 corn kernel: [1308078.489642] 2315 pages mapped
> Aug 22 14:38:37 corn kernel: [1308078.489642] 12329 pages slab
> Aug 22 14:38:38 corn kernel: [1308078.489642] 2615 pages pagetables
> Aug 22 14:38:38 corn kernel: [1308078.489642] Out of memory: kill process
> 13889 (sh) score 3162192 or a child
> Aug 22 14:38:38 corn kernel: [1308078.489672] Killed process 14638
> (mythfrontend)
>
> The backend log has no indication of trouble that I can see until 14:39:
> Aug 22 14:24:54 corn mythbackend[14575]: I ProcessRequest
> mainserver.cpp:1360 (HandleAnnounce) MainServer::ANN Monitor
> Aug 22 14:24:54 corn mythbackend[14575]: I ProcessRequest
> mainserver.cpp:1362 (HandleAnnounce) adding: corn as a client (events: 0)
> Aug 22 14:24:54 corn mythbackend[14575]: I ProcessRequest
> mainserver.cpp:1360 (HandleAnnounce) MainServer::ANN Monitor
> Aug 22 14:24:54 corn mythbackend[14575]: I ProcessRequest
> mainserver.cpp:1362 (HandleAnnounce) adding: corn as a client (events: 1)
> Aug 22 14:24:58 corn mythbackend[14575]: E PreviewGenerator
> previewgenerator.cpp:254 (Run) Preview: Encountered problems running
> '/usr/bin/mythpreviewgen --size 0x0 --chanid 2051 --starttime
> 20120501220000 --verbose general --loglevel info --quiet --syslog local6'
> (128)
> Aug 22 14:27:54 corn mythbackend[14575]: I HouseKeeping
> housekeeper.cpp:225 (RunHouseKeeping) Running housekeeping thread
> Aug 22 14:39:17 corn mythbackend[14575]: W ProcessRequest
> mainserver.cpp:5801 (connectionClosed) MainServer: Unknown socket closing
> MythSocket(0x88bcf78)
> Aug 22 14:39:18 corn mythbackend[14575]: E ProcessRequest
> mythsocket.cpp:358 (writeStringList) MythSocket(88bcf78:-1):
> writeStringList: Error, socket went unconnected.#012#011#011#011We wrote 0
> of 188 bytes with 1 errors#012#011#011#011starts with: 180
> corn[]:[]corn:/usr/local/var/media/free[]:[]1[]:[]1[...
> Aug 22 14:39:19 corn mythbackend[14575]: N Expire autoexpire.cpp:263
> (CalcParams) AutoExpire: CalcParams(): Max required Free Space: 1.0 GB
> w/freq: 15 min
>
> The second time I was doing lossless transcoding to remove commercials,
> and making a cutlust for a different show at the same time.  The
> mythtranscode logs have over 400,000 lines, mostly
> Aug 22 17:53:42 corn mythtranscode[6891]: E CoreContext mpeg2fix.cpp:1211
> (GetPoolFrame) No more queue slots!
> Also an error I saw with 0.24.2.  The first message from the kernel is a
> kill:
> Aug 22 18:05:40 corn kernel: [1321754.707709] ksysguardd invoked
> oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
> Aug 22 18:06:32 corn kernel: [1321754.707709] Pid: 26472, comm: ksysguardd
> Tainted: P          2.6.26-2-686 #1
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c0159252>]
> oom_kill_process+0x4f/0x195
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c015967c>]
> out_of_memory+0x14e/0x17f
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c015b5e4>]
> __alloc_pages_internal+0x2b8/0x34e
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c015b686>]
> __alloc_pages+0x7/0x9
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c015cffc>]
> __do_page_cache_readahead+0x86/0x163
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<f950e6f9>]
> dm_table_any_congested+0x32/0x4e [dm_mod]
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c015d413>]
> do_page_cache_readahead+0x3d/0x4a
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c0158af2>]
> filemap_fault+0x164/0x35a
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c02ba3a2>]
> error_code+0x72/0x78
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c0161f4e>]
> __do_fault+0x42/0x34b
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c016413d>]
> handle_mm_fault+0x30c/0x6dd
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c0115bab>]
> do_page_fault+0x2a3/0x5c0
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c01263c6>]
> sys_gettimeofday+0x27/0x53
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c0115908>]
> do_page_fault+0x0/0x5c0
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c02ba3a2>]
> error_code+0x72/0x78
> Aug 22 18:06:32 corn kernel: [1321754.707709]  [<c02b0000>]
> quirk_nopciamd+0x10/0x4a
> Aug 22 18:06:32 corn kernel: [1321754.707709]  =======================
>
>
Not an expert, but looks like some sort of memory leak or simple out of
memory issue.  I would have imagined that the dev's would have seen this
long before it made it to you... perhaps its not mythtv itself?  You do
have swap space configured?  Are you running with /tmp in ram... anything
non-standard like that?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.mythtv.org/pipermail/mythtv-users/attachments/20120822/323109dd/attachment-0001.html>


More information about the mythtv-users mailing list