{{tag>Brouillon Matériel CPU Mémoire Arch}}

= Memory / CPU hardware problems: mcelog

**The mcelog package is no longer supported on kernels 4.12 and later. rasdaemon can be used as a replacement.**

rasdaemon: utility to receive RAS error tracings.

rasdaemon is a RAS (Reliability, Availability and Serviceability) logging tool. It currently records memory errors, using the EDAC tracing events. EDAC is the set of Linux kernel drivers that handle detection of ECC errors from memory controllers for most chipsets on x86 and ARM architectures. This userspace component consists of an init script, which makes sure EDAC drivers and DIMM labels are loaded at system startup, and a utility for reporting current error counts from the EDAC sysfs files.

== I

Enable memory error reporting.

http://www.mcelog.org/faq.html

  chkconfig mcelog on
  rcmcelog start

''/etc/cron.hourly/mcelog.cron''

  #!/bin/bash
  
  # is mcelog supported?
  /usr/sbin/mcelog --supported >& /dev/null
  if [ $? -eq 1 ]; then
      exit 1
  fi
  
  # append newly decoded events to the log file
  /usr/sbin/mcelog --ignorenodev --filter >> /var/log/mcelog

http://askubuntu.com/questions/605369/mce-hardware-error-machine-check-events-logged-appears-in-syslog-what-sho

  sudo apt-get install mcelog

The events will be logged to ''/var/log/mcelog''. You can also run:

  sudo mcelog --client

== II

  # mcelog
  mcelog: AMD Processor family 18: Please use the edac_mce_amd module instead.
  : Success
  CPU is unsupported

Check whether the ''edac_mce_amd'' module is loaded, load it, and have it loaded at every boot:

  lsmod | grep edac_mce_amd
  modprobe edac_mce_amd
  echo edac_mce_amd >> /etc/modules

== III

http://www.advancedclustering.com/act-kb/what-are-machine-check-exceptions-or-mce/

Paste or type the error message into a file, then run it through mcelog, for example:

  /usr/sbin/mcelog --k8 --ascii < myerror

Use the ''--k8'' option if you are using an AMD Opteron or Athlon 64 processor, or replace it with ''--p4'' for a Pentium 4 or Xeon.

Here is the output from the previous MCE error:

  HARDWARE ERROR. This is *NOT* a software problem!
  Please contact your hardware vendor
  CPU 1 4 northbridge TSC b0ce27165dd3
    Northbridge Chipkill ECC error
    Chipkill ECC syndrome = 3700
         bit32 = err cpu0
         bit45 = uncorrected ecc error
         bit57 = processor context corrupt
         bit61 = error uncorrected
         bit62 = error overflow (multiple errors)
    bus error 'local node origin, request didn't time out
        generic read mem transaction
        memory access, level generic'
  STATUS f600200137080813 MCGSTATUS 4

This output reports an uncorrected ECC error, which means that one of your memory modules has failed. For further analysis, please submit a support ticket with the complete MCE error message and the output of mcelog.
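
The ''bitNN'' lines in the output above correspond to bits set in the upper half of the ''STATUS'' value. As a cross-check, here is a minimal sketch that lists those bits directly from the hexadecimal value. It assumes a POSIX shell and a status given as exactly 16 hex digits without a ''0x'' prefix; ''mce_high_bits'' is a hypothetical helper name, not part of mcelog or rasdaemon.

```shell
#!/bin/sh
# Print the numbers of the bits set in the upper half (bits 32..63) of a
# 64-bit machine-check STATUS value.
# Assumption: $1 is exactly 16 hexadecimal digits, without a "0x" prefix.
# mce_high_bits is a hypothetical helper, not an mcelog command.
mce_high_bits() {
    # Keep only the first 8 hex digits (bits 63..32) so the value fits in
    # 32 bits and the shell never has to parse a full 64-bit constant.
    hi_hex=$(printf '%s' "$1" | cut -c1-8)
    hi=$((0x$hi_hex))
    out=""
    i=0
    while [ "$i" -lt 32 ]; do
        # Bit i of the upper half is bit i+32 of the full register.
        if [ $(( (hi >> i) & 1 )) -eq 1 ]; then
            out="$out${out:+ }$((i + 32))"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}

# STATUS value taken from the mcelog output above
mce_high_bits f600200137080813
```

For the value above, the list contains the five bits that mcelog annotates (32, 45, 57, 61, 62) plus bits 58, 60 and 63, which the quoted output does not describe. In the x86 machine-check architecture these are the "address valid", "error reporting enabled" and "register valid" flags.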