Laptop too hot - CPU (almost) always at 100%
Lately my laptop has been overheating, to the point where I had to lift it up with some books on the sides, so that there is better air circulation.
When trying to see what could be causing it, I noticed that the system monitor shows 100% CPU usage. But in the process tab, I see only two or three programs using CPU, namely:
gnome-system-monitor; cpu 5-10; memory around 14MiB
firefox; cpu 5-10; memory around 300MiB
updater; cpu 1-5; memory around 2MiB
All other programs are not using any CPU (at least they show 0, and I have the list sorted by CPU). It's strange: since the "firefox" is actually Tor Browser, I thought I would see a "tor" process using CPU too.
What could be causing this?
If any other information is needed, please ask :)
Thanks in advance.
Do you see all processes on your computer, or only your own processes?
Is the system monitor set to also display root-owned tasks?
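A terminal check sidesteps the monitor's view setting. A minimal sketch, assuming the procps `ps` that ships with most GNU/Linux systems; the function name `top_cpu` is just for illustration:

```shell
# List the N busiest processes from every user, root included.
# -e selects all processes; --sort=-pcpu orders by CPU share, highest first.
top_cpu() {
    ps -eo pid,user,pcpu,pmem,comm --sort=-pcpu | head -n "$(( ${1:-10} + 1 ))"
}

top_cpu 5
```

Note that `ps` reports CPU use averaged over each process's lifetime, so for a spike that started recently, `top` (which shows processes from all users) gives a better picture.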
Hey guys, thanks for all the replies.
Well, turns out I was only seeing my own processes.
After a little testing, what I see is that processes called "kworker/0:0", "kworker/0:3" and "kworker/0:4" consume a lot of CPU power while basically doing nothing.
Xorg also appeared but it takes nothing major.
I will investigate this "kworker" process, but any help is welcome as always :)
I will also try and run some root kit check ups to see if there is anything weird. I don't think that is the case, but better be safe than sorry.
EDIT: kworker runs as root.
Well, I found this:
https://askubuntu.com/questions/33640/kworker-what-is-it-and-why-is-it-hogging-so-much-cpu
But it seems to be way too technical for me... I will try and follow the advice on that page but it really is a little way beyond my usual understanding of linux systems. :(
In a terminal:
sudo -s
echo l > /proc/sysrq-trigger
dmesg
At the end of the dmesg output, you should have a short section that says "NMI backtrace" for each of your CPU cores. Look for any messages at the end of the backtrace sections to give you a clue as to whether you have some process that's malfunctioning.
In the example on the askubuntu.com page you linked to, the poster had a malfunctioning kernel module for his Intel ethernet port ("e1000e" is a Linux kernel driver for Intel network adapters). Therefore, he had a series of "e1000e" messages at the end of his backtrace section on dmesg.
If you copy and paste the backtrace section at the bottom of your dmesg output, we may be able to help you.
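To cut just the backtrace out of a long dmesg log, something like the following may help. It is a sketch that assumes each section starts with a line containing "NMI backtrace for cpu" (as in the askubuntu example) and that everything from the first such line to the end of the log is wanted; `nmi_backtrace` is a made-up helper name.

```shell
# Print everything from the first "NMI backtrace for cpu" line to the end of input.
nmi_backtrace() {
    awk '/NMI backtrace for cpu/ { found = 1 } found'
}

# Usage (as root, while the CPU is busy):
#   echo l > /proc/sysrq-trigger
#   dmesg | nmi_backtrace > backtrace.txt
```

If the log already holds backtraces from earlier attempts, those are included too, so the newest section is the one at the very end.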
Hey, I am having the same issue again.
I will have to try your method when the problem is occurring again (right now all is working well), since I think it will not be much good posting that output when the system is normal, right?
Also, if I may ask, what marks the beginning and ending of the backtrace?
The beginning is the "NMI backtrace" line, right? Where does it end, so I know what to copy and paste?
Thanks.
Here, I think this is it. The problem is happening right now, and again, I had left the laptop alone for about half an hour. When I got back the CPU was at 100%, and after I entered my password again (to unlock the OS) and ran the commands you posted, here is the output:
[ 6929.704528] NMI backtrace for cpu 0
[ 6929.704533] CPU 0
[ 6929.704536] Modules linked in: bnep rfcomm bluetooth binfmt_misc xfs ip6t_REJECT xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT xt_LOG xt_limit xt_tcpudp xt_addrtype xt_conntrack acer_wmi sparse_keymap ip6table_filter ip6_tables snd_hda_codec_hdmi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 snd_hda_codec_realtek nf_conntrack_ftp nf_conntrack iptable_filter snd_hda_intel snd_hda_codec snd_hwdep ip_tables x_tables snd_pcm uvcvideo videobuf2_vmalloc videobuf2_memops dm_multipath microcode snd_page_alloc videobuf2_core videodev snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq arc4 k10temp joydev scsi_dh edac_core serio_raw edac_mce_amd snd_seq_device snd_timer snd ath9k soundcore ath9k_common ath9k_hw ath mac80211 cfg80211 sp5100_tco i2c_piix4 shpchp mac_hid parport_pc ppdev lp parport btrfs libcrc32c zlib_deflate xts gf128mul dm_crypt raid10 raid456 async_memcpy async_raid6_recov async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear dm_mirror dm_region_hash dm_log ums_realtek usb_storage radeon psmouse tg3 i2c_algo_bit ttm drm_kms_helper drm video wmi
[ 6929.704705]
[ 6929.704711] Pid: 3062, comm: bash Not tainted 3.4.112-gnu1 #1
[ 6929.704723] RIP: 0010:[] [] __bitmap_empty+0x4/0x90
[ 6929.704742] RSP: 0018:ffff88007c623e10 EFLAGS: 00000096
[ 6929.704747] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000003ffff
[ 6929.704753] RDX: 0000000000000000 RSI: 0000000000000100 RDI: ffffffff81cd3660
[ 6929.704759] RBP: ffff88007c623e28 R08: 000000000000013f R09: 000000000000000a
[ 6929.704765] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c79fa0
[ 6929.704770] R13: 0000000000000286 R14: 0000000000000004 R15: 0000000000000000
[ 6929.704777] FS: 00007f5c6315e740(0000) GS:ffff88010fc00000(0000) knlGS:00000000f7409700
[ 6929.704784] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6929.704789] CR2: 00007f5c627dc000 CR3: 00000000325ca000 CR4: 00000000000007f0
[ 6929.704795] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6929.704801] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 6929.704807] Process bash (pid: 3062, threadinfo ffff88007c622000, task ffff8800b2ce8000)
[ 6929.704812] Stack:
[ 6929.704816] ffffffff81034d26 0000000000000000 000000000000006c ffff88007c623e38
[ 6929.704825] ffffffff813e110e ffff88007c623e78 ffffffff813e14f9 0000000d718c45e3
[ 6929.704833] 0000000000000002 fffffffffffffffb ffff8800cf2d7d00 ffffffff813e1560
[ 6929.704842] Call Trace:
[ 6929.704854] [] ? arch_trigger_all_cpu_backtrace+0x86/0xa0
[ 6929.704866] [] sysrq_handle_showallcpus+0xe/0x10
[ 6929.704875] [] __handle_sysrq+0x129/0x190
[ 6929.704884] [] ? __handle_sysrq+0x190/0x190
[ 6929.704893] [] write_sysrq_trigger+0x4a/0x50
[ 6929.704902] [] proc_reg_write+0x84/0xc0
[ 6929.704911] [] vfs_write+0xc8/0x190
[ 6929.704918] [] sys_write+0x51/0x90
[ 6929.704928] [] system_call_fastpath+0x16/0x1b
[ 6929.704932] Code: 89 45 f0 48 89 45 b8 48 8d 45 d0 4c 89 4d f8 c7 45 b0 10 00 00 00 48 89 45 c0 e8 38 ff ff ff c9 c3 90 90 90 90 90 90 44 8d 46 3f <85> f6 55 44 0f 49 c6 31 d2 48 89 e5 41 c1 f8 06 45 85 c0 7e 24
[ 6929.704998] Call Trace:
[ 6929.705005] [] ? arch_trigger_all_cpu_backtrace+0x86/0xa0
[ 6929.705015] [] sysrq_handle_showallcpus+0xe/0x10
[ 6929.705023] [] __handle_sysrq+0x129/0x190
[ 6929.705032] [] ? __handle_sysrq+0x190/0x190
[ 6929.705040] [] write_sysrq_trigger+0x4a/0x50
[ 6929.705048] [] proc_reg_write+0x84/0xc0
[ 6929.705055] [] vfs_write+0xc8/0x190
[ 6929.705062] [] sys_write+0x51/0x90
[ 6929.705070] [] system_call_fastpath+0x16/0x1b
I don't see why this is happening, since there is nothing appearing "repeated".
Do you only have an NMI Backtrace for CPU 0? My dmesg gives an output with backtrace for 4 CPU cores, some will have 2 (or more). Just do me a favor and check to be sure. Please post them if you have additional backtraces for additional CPU cores.
Also another question, do you have KDE installed?
Yes. I think that is because my cpu only has one core... Old laptop remember? :)
No, I use the standard Gnome that comes with Trisquel.
I would go back to your previous link: https://askubuntu.com/questions/33640/kworker-what-is-it-and-why-is-it-hogging-so-much-cpu
There was a good set of instructions there, that should be relatively simple to carry out:
"If you find the system unusable due to excessive kworker activity, I would recommend trying to do fewer things. If you think you're not doing anything, try shutting down long-running services or timers (RSS readers, mail readers, file indexers, activity trackers, etc.). If this doesn't work, try restarting. If your system allows you to enable or disable hardware in a pre-boot environment, try turning off hardware you aren't using. If it happens on every restart before you do anything, you could try uninstalling things, but at this point you'll want to be running syscall profiling tools to track down specific applications that seem to be causing this overload."
Check your system. Are you running any email programs, RSS news readers, any programs like "recoll" that index files and keywords? (Run the "ps -A" command and look for "recoll" or "recollindex"). Any "activity tracking" programs?
Is there any hardware you can disable without harming the functioning of the way you use your system?
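The check for indexers can be narrowed with a filter instead of reading the whole `ps -A` listing. A rough sketch; the name list is only a guess at common indexers (recoll from above, plus tracker, baloo and updatedb as assumptions) and should be adjusted to what is actually installed:

```shell
# Filter a list of process names (one per line) down to likely file indexers.
# The pattern is an assumed, non-exhaustive list.
filter_indexers() {
    grep -E -i 'recoll|tracker|baloo|updatedb'
}

# Show matching command names among all running processes.
# "|| true" keeps the exit status clean when nothing matches.
ps -A -o comm= | filter_indexers || true
```

No output means none of the listed names is running; it does not rule out other background services.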
As far as I am aware, no, I don't have any other stuff running in the background. And truth be told, I only see kworker eating my CPU, no other processes.
I ran the command you suggested and there was no "recoll" or "recollindex" on the list. I will try to run that again when I have the issue happening.
I have decided that for now I will simply not use suspend/hibernate/whatever since that apparently prevents the issue from happening. Also, I was offered a cooler pad recently and it should alleviate the heating up if and when it happens.
As for hardware I don't know... I don't use the ethernet port, so I shouldn't have to disable it right? Same for webcam?
So this is a "suspend/hibernate/whatever" bug.
The default kernel comes with a ton of drivers. Many of these will be as removable modules and not baked in. (And for many devices having a driver active means they will be powered on.)
You could induce the 100% cpu condition and then lsmod to see what modules there are and then try removing (rmmod) those one by one to see if you can find the culprit that way. If you do, then add the offending module back (modprobe) and see if the condition persists or not. If not, then you could modify the "suspend/hibernate/whatever" scripts to do the module un/loading for you.
Note that if module A is used by module B, you must remove B before you can remove A.
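Since `rmmod` refuses modules that are in use, it can save time to list only the ones whose use count is zero. A sketch, assuming the usual `lsmod` column layout (Module, Size, Used by); `unused_modules` is a hypothetical helper name:

```shell
# Print modules whose "Used by" count is 0, i.e. candidates rmmod will accept.
# Reads lsmod-formatted text on stdin and skips the header line.
unused_modules() {
    awk 'NR > 1 && $3 == 0 { print $1 }'
}

# Usage:
#   lsmod | unused_modules
#   sudo modprobe -r <module>   # like rmmod, but also removes now-unused dependencies
```

A count of zero only means no other module depends on it; removing the driver for hardware that is in use (disk, wifi, graphics) will still take that hardware down.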
Thanks, I will give that a try when I get the same behavior again. It doesn't happen every time, and I have been trying to avoid it honestly ^_^
It happened again and I tried what you suggested. But there were too many modules loaded and I never got any to unload (trying sudo rmmod always gave an error saying that the module was in use). But I got the idea of using iotop and noticed there was a thing called kswapd0 running. What I read online stated that it is agnostic to processes: any and all processes could be using it at the same time, so I don't know how to kill it. I suppose it is something unnecessary that is launched to do some kind of "indexing" when the computer is left unused for some time (maybe to try and make it faster when you need it), but it is not being properly stopped when I start using the PC again.
So... how to kill this kswapd0??
kswapd0 is in kernel space. I do not think you can 'kill' it. This should calm it down though:
$ echo 1 > /proc/sys/vm/drop_caches
Thanks.
So, that is supposed to be done when the problem is happening right? Or should I do it now and it will prevent the problem from happening?
In other words, what does that do exactly? :)
Thanks in advance for the help.
Yes: when the problem is happening.
drop_caches
Writing to this will cause the kernel to drop clean caches, as well as
reclaimable slab objects like dentries and inodes. Once dropped, their
memory becomes free.
To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
To free reclaimable slab objects (includes dentries and inodes):
echo 2 > /proc/sys/vm/drop_caches
To free slab objects and pagecache:
echo 3 > /proc/sys/vm/drop_caches
This is a non-destructive operation and will not free any dirty objects.
To increase the number of objects freed by this operation, the user may run
`sync' prior to writing to /proc/sys/vm/drop_caches. This will minimize the
number of dirty objects on the system and create more candidates to be
dropped.
This file is not a means to control the growth of the various kernel caches
(inodes, dentries, pagecache, etc...) These objects are automatically
reclaimed by the kernel when memory is needed elsewhere on the system.
Use of this file can cause performance problems. Since it discards cached
objects, it may cost a significant amount of I/O and CPU to recreate the
dropped objects, especially if they were under heavy use. Because of this,
use outside of a testing or debugging environment is not recommended.
https://www.kernel.org/doc/Documentation/sysctl/vm.txt
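To see what the write actually frees, the page cache size can be read from /proc/meminfo before and after. A small sketch; `cached_kb` is a made-up helper, and the drop itself still needs root:

```shell
# Print the size of the page cache in kB from meminfo-formatted text on stdin.
cached_kb() {
    awk '$1 == "Cached:" { print $2 }'
}

# Usage (as root):
#   cached_kb < /proc/meminfo
#   sync
#   echo 1 > /proc/sys/vm/drop_caches
#   cached_kb < /proc/meminfo
```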
I tried it and always says "bash: /proc/sys/vm/drop_caches: permission denied".
I tried with sudo, tried with sudo and "", always get the permission denied error.
That command works if you're super user (i.e. after sudo su). If you want to use sudo with the command as one liner, you need to also use tee because currently only echo gets run as super user but the redirect which would need it doesn't get the sudo power. Why that's so I don't remember or understand.
Indeed. Sorry about that. Here is the proper command:
$ echo 1 | sudo tee /proc/sys/vm/drop_caches
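The reason is that the shell opens the redirect target itself, with the caller's privileges, before `sudo` even starts; only the command to the left of `>` is elevated. A sketch of a reusable wrapper, where `write_as_root` and the `SUDO` override variable are inventions for this example:

```shell
# Write a value to a root-owned file. tee runs under sudo, so it (not the
# calling shell) opens the file. Set SUDO="" to skip sudo when already root.
write_as_root() {
    value=$1
    path=$2
    printf '%s\n' "$value" | ${SUDO-sudo} tee "$path" > /dev/null
}

# Usage:
#   write_as_root 1 /proc/sys/vm/drop_caches
# Equivalent one-liner that runs the whole redirect in a root shell:
#   sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
```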
Thanks! I will try it next time :)
Hey, it seems to solve the problem. I was getting the same behavior once again and I tried it. At first it seemed to do nothing (CPU was still at 100% and all kworker processes were still up), but a few seconds later, when I ran "sudo iotop" to see whether kswapd0 was there or not, it seemed to have stopped and kswapd0 was nowhere to be found in iotop. (It was probably a matter of timing: it would have stopped anyway, and it was mere luck that it stopped when I ran iotop.)
So, yes, it seems to alleviate the issue; I will probably have to do it every time it happens. Maybe I will create a quick-launch button for it. I don't think it can be made to run automatically... can it?
Thanks everyone for the comments.
Well, as I was reading the comments on those pages and some others, I noticed that my laptop was indeed heating up more in the place where the disk is. So I went to the BIOS and changed from ide to adci (or something like that). And so far it has worked! No more kworker processes coming up, no more overheating, the CPU is normal... I think some recent kernel update may have changed something that made this setting necessary.
For now I would mark this as solved :)
If anything goes wrong I will let you guys know :)
THANKS
That would be AHCI mode. AHCI is a modern controller interface that the computer uses to communicate with the hard drive. IDE is the legacy mode. Sounds to me like a good option: you probably won't see a decrease in performance, and may see a slight increase. You should be fine. If your system wasn't going to work with AHCI mode enabled, you would have run into a huge error immediately. Since you were apparently able to boot up and log back into Trisquel, I would assume you're good to go now.
Yes, my thoughts exactly.
And apparently it solved the issues I had. So all is good now :)
Hey guys.
I have been getting similar behavior from my machine again (not as frequently, but it happens), so I am "re-opening" the issue.
I noticed it happens when the laptop is left unused for a long period of time; when I use it again, the kworker processes are hogging the CPU again.
I will try the methods suggested above, but I ask anyone who can help to provide input here.
Thanks
If anyone else is interested in helping me out, the dmesg output is the same "NMI backtrace for cpu 0" section I posted above.
Actually I think it would be solved if using a newer kernel, but I have to rely on the jxself 3.4 kernel, as it's the only way my graphics card works properly (it's an ATI Mobility).
You would get better performance from a laptop five years older with an integrated Intel GPU. How about selling that laptop for cheap and buying an old one with that money?
Actually many reasons not to...
This laptop is actually a nice one (for its price at the time) and freedom-wise the only issue is the GPU. Other than that it has a lot of memory, a good-sized screen, and everything works very well, free wifi and all. I actually have some attachment to this machine and don't look forward to getting rid of it. Also, it's the only machine available at the moment, so the family would be without a computer while I sell the old one and buy the new one.
I doubt I could sell this one for enough money to buy another one and would be afraid of something not working properly with the new one. For now this one actually serves me.
Since we are talking about this, do you have any information if the issue with the AMD in newer kernels has been resolved? 3.4 is going EOL in a couple months.
The easiest way to discover it is to install the latest Linux-libre kernel from Jxself's repository, boot it and see: https://jxself.org/linux-libre/
Do you have a different hard drive you could try? My Mrs had a similar issue and changing the hd sorted it right out.