[mythtv-users] Kernel Crashes using anything over 2.6.35

Sat Aug 20 23:49:28 UTC 2011

On 08/20/11 16:04, Rob Davis wrote:
> I seem to remember reading something on the list here about people
> having problems with the rtl8139 gigabit driver and thought that might
> be the reason why my system won't work with anything over 2.6.35.
>
My primary back-end and my test back-end are both running Gentoo with
2.6.38 kernels and are quite stable.  I don't have an RTL8139, though.
> I am running a bunch of media cards, HVR 1250, HVR 1850, PVR500 and an
> HDPVR in USB, using Gentoo X86 and Myth.  If I use a kernel above this
> one then the system will lock up after a few days.  With my 2.6.35 kernel
> it'll go until the next storm comes through and knocks out the power
> (Thanks ComEd)..
>

> Aug 17 08:12:36 oac kernel: *pde = 00000000
Was there something deleted just before this line?  There's usually
something more informative just before it.
> Aug 17 08:12:36 oac kernel: Modules linked in: cx18_alsa cs5345 cx18
> videobuf_vmalloc mt2131 cx23885 altera_stapl(C) videobuf_dma_sg
> videobuf_dvb dvb_core videobuf_core btcx_risc wm8775 tda8290 tea5767
> tuner cx25840 ivtv cx2341x i2c_algo_bit tveeprom hdpvr v4l2_common
> videodev media vmnet vmblock vsock vmci vmmon s5h1409 tuner_simple
> tuner_types tda9887 lirc_serial(C) cpufreq_powersave powernow_k8
> freq_table mperf snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_hda_intel
> bluetooth nfsd ipv6 snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
> snd_seq_device snd_pcm_oss snd_mixer_oss mxl5005s ir_lirc_codec lirc_dev
> ir_sony_decoder ir_jvc_decoder ir_rc6_decoder ir_rc5_decoder
> ir_nec_decoder rc_core snd_hda_codec_hdmi nvidia(P) firewire_ohci
> firewire_core snd_hda_codec_via i2c_piix4 snd_hda_codec snd_hwdep
> i2c_core serio_raw snd_pcm ppdev parport_pc snd_timer button floppy
> hid_ortek pcspkr ftdi_sio usbserial snd soundcore snd_page_alloc r8168
> crc_itu_t ati_agp agpgart iscsi_tcp libiscsi_tcp libiscsi
> scsi_transport_iscsi tg3 libphy e1000 fuse xfs exportfs nfs nfs_acl
> auth_rpcgss lockd sunrpc jfs reiserfs ext4 crc16 jbd2 ext2 raid10
> raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor
> async_tx raid1 raid0 dm_snapshot dm_crypt dm_mirror dm_region_hash
> dm_log dm_mod scsi_wait_scan sl811_hcd ohci_hcd uhci_hcd usb_storage
> ehci_hcd aic94xx libsas scsi_transport_sas lpfc qla2xxx
> scsi_transport_fc megaraid_sas megaraid_mbox megaraid_mm megaraid
> aacraid sx8 DAC960 cciss 3w_9xxx 3w_xxxx atp870u dc395x qla1280 imm
> parport sym53c8xx gdth advansys initio BusLogic aic7xxx aic79xx
> scsi_transport_spi sr_mod cdrom sg sd_mod pdc_adma sata_inic162x sata_mv
> ata_piix ahci libahci sata_qstor sata_vsc sata_uli sata_sis sata_sx4
> sata_nv sata_via sata_svw sata_sil24 sata_sil sata_promise pata_sl82c105
> pata_cs5535 pata_cs5530 pata_cs5520 pata_via pata_jmicron pata_marvell
> pata_sis pata_netcell pata_sc1200 pata_pdc202xx_old pata_triflex
> pata_atiixp pata_opti pata_amd pata_ali pata_it8213 pata_pcmcia
> pata_ns87415 pata_ns87410 pata_serverworks pata_cypress pata_oldpiix
> pata_artop pata_it821x pata_optidma pata_hpt3x2n pata_hpt3x3 pata_hpt37x
> pata_hpt366 pata_cmd64x pata_efar pata_rz1000 pata_sil680 pata_radisys
> pata_pdc2027x pata_mpiix libata scsi_mod [last unloaded: media]
You have a *lot* of modules.  Do you really need all these (e.g., all
the pata* ones)?  A good first debugging step would be to build the
kernel with just those drivers you need for your hardware.
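To get a feel for how much there is to prune, here is a rough Python
sketch that pulls the module list out of a "Modules linked in:" line and
counts the pata_/sata_ entries.  The function names and the shortened
sample line are made up for illustration; only the line format comes from
the oops above.

```python
import re


def parse_linked_modules(log_text):
    """Return (module_name, flags) pairs from a 'Modules linked in:' line.

    Works on wrapped, '>'-quoted text too: stray '>' tokens are skipped.
    """
    _, found, rest = log_text.partition("Modules linked in:")
    if not found:
        return []
    # Drop the trailing "[last unloaded: ...]" note if present.
    rest = rest.split("[last unloaded:")[0]
    modules = []
    for token in rest.split():
        # A token is a module name, optionally followed by flags such as (P).
        match = re.fullmatch(r"([\w-]+)(?:\(([^)]*)\))?", token)
        if match:
            modules.append((match.group(1), match.group(2) or ""))
    return modules


def count_by_prefix(modules, prefixes=("pata_", "sata_")):
    """Count how many module names start with each prefix."""
    return {p: sum(1 for name, _ in modules if name.startswith(p))
            for p in prefixes}


if __name__ == "__main__":
    # Shortened, hypothetical sample in the same format as the real log.
    sample = ("Aug 17 08:12:36 oac kernel: Modules linked in: cx18 "
              "altera_stapl(C) nvidia(P) pata_via pata_amd sata_nv ahci "
              "[last unloaded: media]")
    mods = parse_linked_modules(sample)
    print(len(mods), "modules loaded")
    print(count_by_prefix(mods))
    print("flagged:", [name for name, flags in mods if flags])
```

Feeding it the full line from your log should make it obvious which
driver families are loaded wholesale rather than for hardware you own.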
> Aug 17 08:12:36 oac kernel:
> Aug 17 08:12:36 oac kernel: Pid: 9519, comm: receiver on dev Tainted: P
>         C  3.0.1-gentoo #1 System manufacturer System Product Name/M3A78
Can you eliminate the drivers that taint the kernel (P is the
proprietary nvidia module, C is the staging drivers) and still support
your hardware?  If so, you'll be much more likely to get help from the
kernel developers (who tend to shun problems on systems using tainted
code).
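For reference, the letters after "Tainted:" can be decoded with
something like the sketch below.  The meanings are the ones I know from
the kernel's oops-tracing documentation for kernels of this era; newer
kernels add more letters, so treat the table as incomplete.  The function
name is just for illustration.

```python
# Taint letters as documented for kernels of this era; later kernels
# define additional ones, so unknown letters are reported as such.
TAINT_FLAGS = {
    "G": "all loaded modules have GPL-compatible licenses (not a taint)",
    "P": "proprietary module loaded",
    "F": "module was force-loaded",
    "S": "SMP kernel on CPUs not designed for SMP",
    "R": "module was force-unloaded",
    "M": "machine check exception occurred",
    "B": "bad page reference or unexpected page flags",
    "U": "taint requested by userspace",
    "D": "kernel has already oopsed or hit a BUG",
    "A": "ACPI table overridden",
    "W": "kernel issued a warning",
    "C": "staging driver loaded",
    "I": "working around a firmware bug",
}


def decode_taint(flags):
    """Map each non-blank character of a taint string to its meaning."""
    return [(ch, TAINT_FLAGS.get(ch, "unknown flag"))
            for ch in flags if not ch.isspace()]


if __name__ == "__main__":
    # Taint string from the later apcupsd report, with the line wrap undone.
    for letter, meaning in decode_taint("P      D  C"):
        print(letter, "-", meaning)
```

Note that the later oom-killer report carries a D that the first oops
does not, which just records that the kernel had already oopsed by then.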
> Aug 17 08:12:36 oac kernel: EIP: 0060:[<c11101b5>] EFLAGS: 00210286 CPU: 0
> Aug 17 08:12:36 oac kernel: EIP is at selinux_socket_unix_may_send+0x17/0x4e
You may want to reconfigure without selinux support unless you really
need it.
> Aug 17 08:12:36 oac kernel: EAX: e5d38040 EBX: e12a6d00 ECX: 00000019
> EDX: 00000000
> Aug 17 08:12:36 oac kernel: ESI: dea2fdb4 EDI: dea2fdb4 EBP: df13eb40
> ESP: dea2fdb4
> Aug 17 08:12:36 oac kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Aug 17 08:12:36 oac kernel: 00000000 dea2fdd4 c100330b e12a6f00 e5d38040
> dea2fe98 df13eb40 c12812a9
> Aug 17 08:12:36 oac kernel: e12a6f00 c14078b0 00000504 e5d38040 dea2fe98
> df13eb40 00000404 0000007b
> Aug 17 08:12:36 oac kernel: 0000007b 000000d8 00000000 ffffffc5 c1143ce9
> 00000060 00200246 e12a6d00
> Aug 17 08:12:36 oac kernel: [<c100330b>] ? do_IRQ+0x73/0x84
> Aug 17 08:12:36 oac kernel: [<c12812a9>] ? common_interrupt+0x29/0x30
> Aug 17 08:12:36 oac kernel: [<c1143ce9>] ? do_raw_spin_lock+0x5c/0x11d
> Aug 17 08:12:36 oac kernel: [<c110dfa1>] ? security_unix_may_send+0xc/0xd
> Aug 17 08:12:36 oac kernel: [<c1261c6d>] ? unix_dgram_sendmsg+0x35b/0x4ed
> Aug 17 08:12:36 oac kernel: [<c11f6b21>] ? sock_aio_write+0xf9/0x102
> Aug 17 08:12:36 oac kernel: [<c1113a45>] ? inode_has_perm.clone.19+0x2b/0x31
> Aug 17 08:12:36 oac kernel: [<c10a3ad5>] ? do_sync_write+0x9e/0xd3
> Aug 17 08:12:36 oac kernel: [<c110e5cc>] ?
> security_file_permission+0x14/0x6e
>
> After that things start to get killed off:
> Aug 17 08:34:42 oac -- MARK --
> Aug 17 08:54:42 oac -- MARK --
> Aug 17 09:14:42 oac -- MARK --
> Aug 17 09:34:42 oac -- MARK --
> Aug 17 09:54:42 oac -- MARK --
> Aug 17 10:14:42 oac -- MARK --
> Aug 17 10:34:42 oac -- MARK --
> Aug 17 10:54:42 oac -- MARK --
> Aug 17 11:02:33 oac kernel: apcupsd invoked oom-killer: gfp_mask=0xd0,
> order=1, oom_adj=0, oom_score_adj=0
> Aug 17 11:02:33 oac kernel: Pid: 16837, comm: apcupsd Tainted: P      D
>  C  3.0.1-gentoo #1
This rather looks like some of the kernel code started consuming a lot
of memory, resulting in overall memory pressure on the system.  At this
point, the kernel out-of-memory killer was activated and started killing
things.  apcupsd was the first target. 
> Aug 17 11:02:33 oac kernel: Call Trace:
> (snipped...)
>
> etc...
>
> Then the whole thing locked up and the reset button had to be hit..
>
> Any ideas?  Nvidia/Motherboard corruption?
When the oom killer activates, it can be rather indiscriminate and
kill things that are somewhat important.  That's probably the proximate
cause of the lock-up.  The root cause looks more like a memory leak in
some kernel code.  Try pruning your module list and removing selinux
support if possible.  If the problem still occurs, I would be suspicious
of the proprietary driver (nvidia) and/or one of the two staging drivers
you have enabled (altera_stapl, lirc_serial).
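
If you want to watch for a leak before the oom killer fires, one
option is to log the slab figures from /proc/meminfo over a few days.
The sketch below is only a starting point: the helper names and the
sample numbers are made up, and it assumes the leak shows up in the slab
counters, which not every kernel leak does.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kilobytes} dict."""
    values = {}
    for line in text.splitlines():
        name, found, rest = line.partition(":")
        parts = rest.split()
        if found and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def slab_report(meminfo):
    """Return (slab_kb, unreclaimable_kb, unreclaimable % of MemTotal)."""
    total = meminfo.get("MemTotal", 0)
    slab = meminfo.get("Slab", 0)
    unreclaim = meminfo.get("SUnreclaim", 0)
    percent = 100.0 * unreclaim / total if total else 0.0
    return slab, unreclaim, percent


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the affected machine.
    sample = ("MemTotal:        2000000 kB\n"
              "MemFree:          150000 kB\n"
              "Slab:             400000 kB\n"
              "SReclaimable:     100000 kB\n"
              "SUnreclaim:       300000 kB\n")
    slab, unreclaim, percent = slab_report(parse_meminfo(sample))
    print("Slab %d kB, SUnreclaim %d kB (%.1f%% of RAM)"
          % (slab, unreclaim, percent))
```

Calling slab_report(read_meminfo()) from cron every few minutes and
logging the result should show whether SUnreclaim climbs steadily between
boot and the lock-up.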

Keith