Mobo and other HW suggestions wanted for new OMV box with ECC

  • While it's not the case here, I'm currently working with an MSI motherboard (AM1I) that listed ECC memory on the tested list but doesn't support ECC. I thought it odd to see ECC listed on such a low-end consumer board, so I put some in and did a test install of Ubuntu. dmidecode reported it properly, but the EDAC modules wouldn't load, telling me "BIOS Blocked module from loading". MSI confirmed that ECC memory will work, but the ECC function won't. Which is... strange.

    I think ryecoaaron is confusing registered ECC with unbuffered ECC. Registered RAM is supported only by servers and by newer high-end chipsets like X99; it won't run in normal desktop boards.
    Unbuffered ECC can run in most boards I have tried, but of course without ECC enabled.
    I assume that if you put registered ECC in an X99 board with an i7 or with a BIOS that does not support it, it will run without ECC as well, but I can't say for sure.


    Anyway, AM1 SoCs (because it is a SoC, not just a CPU) have ECC memory controllers.


    It would sell tons if just ONE damn OEM did enable ECC in the BIOS.


    Afaik the only board where some claim ECC is working (unknown if it really is) is ASUS AM1M-A
    http://www.overclock.net/t/1495837/ecc-works-on-am1


    Aaand... it's not miniITX. :thumbdown:

    • Official post

    Actually, I am quite familiar with registered ECC and unbuffered ECC. I was saying I have never seen a board that *supports* ECC but disables it. Not just lets you use unbuffered ECC with ECC disabled. I would say a board like that does not support ECC.

    omv 7.0.5-1 sandworm | 64 bit | 6.8 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.13 | compose 7.1.4 | k8s 7.1.0-3 | cputemp 7.0.1 | mergerfs 7.0.4


    omv-extras.org plugins source code and issue tracker - github - changelogs


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • EDAC support for Haswell processors was added in kernel 3.17, which explains why you aren't seeing anything.
    https://www.thomas-krenn.com/d…DAC_Driver_Feature_Matrix


    E5 processors are covered by the sb_edac.c module.


    So if you want ECC logging you may need kernel version 3.17 or higher.


    I think that ECC should work anyway though, but I'd still prefer having it enabled.


    If you get EDAC working, can you post the result of edac-util -s ? (package edac-utils)


    It would be very interesting to see what it says on that X99 board.


    Quote

    I was saying I have never seen a board that *supports* ECC but disables it.

    I've seen way too many boards of the DDR1 and DDR2 variety that supported ECC and had ECC DIMMs installed but were running without ECC.
    Thankfully their BIOS was crying about that, and I promptly swapped the ECC modules with others until it found some it liked and stopped crying.


    Using memory from QVL is usually best, but it's not always easy to find them.


    Supermicro refuses to boot if the RAM isn't ECC afaik, which is better in some respects, and worse in others. If it does boot at least I know that nothing is broken.


    Workstation boards from ASRock (non-registered RAM only) have a line on boot saying "ECC memory detected" or something like that, but work fine with non-ECC RAM too, and if they don't like a module they will run it in non-ECC mode.

    • Official post

    I had three Supermicro H8QGL-IF quad Socket G34 boards (supports ECC Registered) that worked just fine with 16 sticks of non-ECC memory (G.Skill F3-10666CL7Q-8GBXH) each.

  • I couldn't find a reliable method to verify ECC. Even with AIDA64, I'm not totally sure. A friend of mine swears by it, but I don't like things that cannot be proven. I'll be pleased once I can switch to BTRFS for added peace of mind. Development is rife, but I think it'll be a long time before RAID5/6 is usable under Debian. The addition of EDAC for Haswell is a welcome sight, but it'll be a case of waiting for OMV 3 for that. I suppose I could fire up a Jessie install (and use backports? Not sure what kernel version Jessie has) to see if that shows anything.

    • Official post

    The Jessie kernel is the same as the Wheezy Backports kernel: 3.16. There is no Jessie Backports yet, but it looks like it will be 4.0.

  • theoretically, with a eidac-util -s you should get decent evidence.


    It gives you the bit size of the "word" (the exact meaning of which escapes me atm, google it): in normal RAM it is 64 bits, in ECC RAM it is 72 because of the 8 extra check bits.


    If that is detected, there are strong indications that it is enabled, at least imho. I've seen ECC not running as ECC in ECC supporting boards where the word was back to 64 bits.
    But quite frankly the entire thing is pissing me off to no end. I'm paying premium prices and I even have to do tricks to get the damn thing to show it is worth what I paid for it?


    You can try booting an Arch-based Linux live CD like Manjaro, which has kernel 3.18.something, just to see that.
    Must install the eidac-utils first, so you need to look up their wiki on how to use pacman, their package manager.
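
    As a side note on the 64 versus 72 bit figure above: the 8 extra bits match what a single-error-correct, double-error-detect (SECDED) Hamming code needs for a 64-bit word, which is the scheme ECC DIMMs typically use. A minimal sketch of that arithmetic follows; the function name is made up for illustration and nothing here talks to real hardware.

```shell
#!/bin/sh
# secded_check_bits DATA_BITS
# Prints how many check bits a SECDED Hamming code needs for DATA_BITS
# bits of data. (Hypothetical helper, for illustration only.)
secded_check_bits() {
    m=$1
    r=1
    # Hamming bound: r check bits are enough when 2^r >= m + r + 1
    while [ $((1 << r)) -lt $((m + r + 1)) ]; do
        r=$((r + 1))
    done
    # One extra overall parity bit adds double-error detection
    echo $((r + 1))
}

# Example: secded_check_bits 64   prints 8, so 64 + 8 = 72 bits per word
```

    This only explains where the number 72 comes from; it says nothing about whether a given board actually uses those bits.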

  • I think you have a small typo there, eidac instead of edac. However, checking on my end... proof is in the pudding. I would say this is pretty concrete...



    1 MC detected surely means one qualifying memory controller (i.e. banks with ECC in them), and regardless, the EDAC module is loaded, so it must have detected everything OK.


    Thanks - this is actually the most reassuring evidence to date.


    By the way, I've not used Manjaro before. I quite like it. Whenever I've needed a live Arch I've used ArchBang, but I think I might switch :) I know it's just pretty themes, but they've done a really nice job and it's pleasing to use. :D


  • I'd like to give my thanks to @bobafetthotmail (and @ryecoaaron ) for all the clarifications. Great learning for the less experienced like myself. Not to hijack @ellnic 's string, but if you could clarify... are you of the opinion that if the operating system detects a 72 bit width on the memory (vs 64 bit for non-ECC), ECC is probably functioning? In my AM1 example above, dmidecode reports "Error Correction Type: Multi-bit ECC", and for each memory module "Total Width: 72 bits". But as I mentioned, EDAC is blocked from loading by the BIOS. I'm not thoroughly familiar with EDAC though, so I'm not sure if it's essential to ECC function, or if it's there to make the ECC function visible to users via logging etc. If the former, well, it's still a bust. If the latter, I'd still tinker with it. The interesting ZFS Plug-in string in the Storage Forum makes me want to put ZFS on this box, but only if I can figure out whether ECC is working or not.


    @ellnic a nice feature of Manjaro as an easily created live distro over Arch/ArchBang/Antergos, etc., is that Manjaro holds updated packages back a couple weeks before releasing to their repos to ensure there's no reported breakage in the mainstream. That way, you still get an up-to-date rolling experience on a live stick, while minimizing the chances you have to tinker with your live install after updating/installing packages.

  • @Markess yeah I noticed that. It's a good thing really. Rolling release is all well and good if you like to be up to date, but they are high maintenance systems. I did the Gentoo thing for a while and portage is great, but sooner or later, something breaks in a bad bad way. Same with Arch, but not quite as badly broken, and a lot less time consuming because you aren't waiting for everything to re-emerge.


    Just a note on the 72 bit width: in my opinion it's no indication at all. The system sees that the width is 72, but is it using it? The board and memory are capable of identifying the extra 8 bits, but if it were that easy to tell whether ECC was enabled, there would be a lot fewer posts/sites/blogs on trying to verify it. That's just me though, and I could be totally wrong.

  • What is the reason for you guys questioning the ECC being active?
    From what I have read it is very rare that it is not used when CPU, mobo and RAM do support it.


    I actually trusted my BIOS with that: it says ECC on POST, it is enabled in the memory options, and it shows up as enabled in the summary that lists the parameters in use.


    So what is the reason not to trust that info?

  • Just a note on the 72 bit width: in my opinion it's no indication at all. The system sees that the width is 72, but is it using it? The board and memory are capable of identifying the extra 8 bits, but if it were that easy to tell whether ECC was enabled, there would be a lot fewer posts/sites/blogs on trying to verify it. That's just me though, and I could be totally wrong.


    I have to agree, once you mention it, recognizing ECC is installed just means that the necessary memory traces are present on the PCB. Which is a step in the right direction.


    Being someone that can't leave well enough alone and wants to understand EVERYTHING, I did a bit more searching. A post on Red Hat's bugzilla indicates that EDAC as an external module will fail to load on recent AM3+ or newer AMD processors, and that a loadable kernel module, edac_mce_amd replaces it. If I boot a live distro, lsmod shows that the module is loaded and is running. Definitely not the, relatively, straightforward confirmation you finally arrived at above. :( I think I won't pull the FreeNAS dinosaur down just yet.

    • Official post

    I think I won't pull the FreeNAS dinosaur down just yet.


    How does FreeNAS help you? ECC is a hardware issue. Pretty sure you can't disable ECC on the motherboard from software.


  • How does FreeNAS help you? ECC is a hardware issue. Pretty sure you can't disable ECC on the motherboard from software.


    Sorry, I gave no context there. I was referring to my current home NAS solution, which is running FreeNAS. It's got a Supermicro X7SBL-LN2 with a power-hungry Intel 3200 chipset. Even with a modest 2 disk mirror, it draws almost 70 watts at idle. All my networked hardware tends to reside in my home office, which gets pretty hot in central California, so I wanted to replace it with something more efficient.


    I thought if I switched to OMV I could take advantage of Debian's much broader support for low power hardware. Low power solutions that work well with FreeNAS's BSD base are limited and tend to come at a price premium. I'm still playing with hardware I had on hand before deciding what to go with for a more permanent solution for OMV.


    Plus OMV has so many more plug-ins, and the forum seems a much happier place!

    • Official post

    What are you waiting for? :) Of course, you would be happier with OMV :)

  • dmidecode is a tool that gives you hardware specs, as described here http://www.nongnu.org/dmidecode/
    (the same information is available if you run "man dmidecode"; on Linux systems in general, terminal-only programs have decent documentation and are much more talkative than Windows terminal-only programs)


    So it will tell you if the RAM is ECC or not, regardless of electrical connections, as it just asks for a spec dump (it reads the SMBIOS tables the firmware provides). Which is a bit useless, because you can use your eyes and detect the additional memory chip and the additional small controller on the RAM board already in the real world.
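
    For anyone who wants to pull just the ECC-related fields out of dmidecode, here is a rough sketch. It filters the output of "dmidecode -t memory" for the "Error Correction Type", "Total Width" and "Data Width" fields quoted earlier in this thread. The function name is made up for illustration, and the result only reflects what the firmware reports, not whether ECC is actually active.

```shell
#!/bin/sh
# summarize_dmi_memory: reads `dmidecode -t memory` output on stdin and
# prints the ECC-related fields. (Hypothetical helper name.)
summarize_dmi_memory() {
    awk -F': *' '
        /Error Correction Type:/ { print "ecc_type=" $2 }
        /Total Width:/ { split($2, a, " "); total = a[1] }
        /Data Width:/ {
            split($2, a, " "); data = a[1]
            # Empty slots report "Unknown", so only compare real numbers
            if (total ~ /^[0-9]+$/ && data ~ /^[0-9]+$/)
                print "total=" total " data=" data " extra=" (total - data)
        }
    '
}

# Usage (needs root):
#   dmidecode -t memory | summarize_dmi_memory
```

    An "extra" value of 8 per populated slot means the modules identify themselves as 72-bit ECC parts.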


    edac-utils interrogates the edac drivers that talk with ECC hardware so they can get logs you can see about any hardware-corrected ram errors.


    The EDAC driver runs only if the hardware has ECC logging capability (assuming there is a driver for that hardware).
    Otherwise it returns various errors about something missing or options being disabled or whatever.
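
    Once an EDAC driver is loaded, the counters it keeps can also be read straight from sysfs without edac-utils. A minimal sketch, assuming the usual sysfs layout under /sys/devices/system/edac/mc (one mcN directory per memory controller, with ce_count for corrected and ue_count for uncorrected errors); the function name is made up for illustration.

```shell
#!/bin/sh
# edac_counts [BASE_DIR]
# Prints corrected (ce) and uncorrected (ue) error counts for every EDAC
# memory controller found under BASE_DIR.
# (Hypothetical helper; BASE_DIR defaults to the usual sysfs location.)
edac_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    found=0
    for mc in "$base"/mc*; do
        # Skip anything that is not a memory controller directory
        [ -r "$mc/ce_count" ] || continue
        found=1
        printf '%s ce=%s ue=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no EDAC memory controller found under $base"
        return 1
    fi
}

# Usage:
#   edac_counts
```

    If no controller shows up, that usually just means no EDAC driver is loaded for the hardware, which by itself does not tell you whether ECC is on or off.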


    Quote

    But as I mentioned EDAC is blocked from loading by BIOS. I'm not
    thoroughly familiar with EDAC though, so I'm not sure if it's essential
    to ECC function, or if it's to make the ECC function visible to users via
    logging etc.? If the former, well it's still a bust. If the latter,
    I'd still tinker with it.

    There is some decent hope that it is just a hidden "ECC disabled because my manufacturer does not like you" setting in BIOS you can't change. That's enough to trick Windows and maybe BSD (don't know), but linux is master race, it can ignore that.


    You can try force load of the edac drivers ignoring BIOS's opinion, which you do AT YOUR OWN RISK, but if you want to test and see what happens, it might enable ECC, or at least allow you to see if it is capable of running ECC or not.


    See this blog post for details http://thetechskinny.blogspot.…ory-in-linux-without.html
    - adding the kernel boot parameter ecc_enable_override in GRUB (did not work for him; for me it sometimes worked)
    - running modprobe -v amd64_edac_mod ecc_enable_override=1


    From the link https://www.thomas-krenn.com/d…DAC_Driver_Feature_Matrix
    they say that up to AMD 16h processors are supported by that driver.
    Yours is an AM1 socket, so the processor is a Kabini https://en.wikipedia.org/wiki/…ni.22_.282014.2C_28_nm.29
    From a quick googling I get a list here https://www.kernel.org/doc/Documentation/hwmon/k10temp
    that says 16h are Kabini, so they should be theoretically supported.


    The edac_mce_amd is a compatibility module that allows the edac logging system to run on AMD ECC hardware, but alone it is not enough, there must be an actual driver loaded. You must load the driver the guy wrote in his blog and I pasted here.
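
    To check which of these modules are actually loaded, a small sketch along these lines may help. It lists every loaded module with "edac" in its name and tells apart the case where only edac_mce_amd (and possibly the generic edac_core) is present. The helper names are made up for illustration, and module names differ between kernel versions, so treat the exclusion list as an assumption.

```shell
#!/bin/sh
# edac_modules [MODULES_FILE]
# Lists loaded modules whose name contains "edac".
# MODULES_FILE defaults to /proc/modules, whose first column is the name.
edac_modules() {
    file=${1:-/proc/modules}
    awk '$1 ~ /edac/ { print $1 }' "$file"
}

# has_edac_driver [MODULES_FILE]
# Succeeds when some EDAC module other than the edac_mce_amd decoder and
# the generic edac_core is loaded, i.e. an actual hardware driver.
# (Hypothetical helper; the exclusion list is an assumption.)
has_edac_driver() {
    edac_modules "$1" | grep -v -x -e 'edac_mce_amd' -e 'edac_core' | grep -q .
}

# Usage:
#   edac_modules
#   has_edac_driver && echo "hardware EDAC driver loaded"
```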


    Please post any result, because if it enables ECC it is big news. AM1 boards are dirt cheap and there is a total lack of mini-ITX boards that support ECC and also have an HDMI port (for NAS + mediacenter use).


    I'm half sure that the AM1 SoCs do support AES acceleration, which is also critical for any kind of ZFS work if your processor is weak.
    Ah yes, it is supported but seems to be run on its iGPU, no idea on how good linux support for that is http://www.tomshardware.com/re…atform-review,3801-4.html


    Quote

    1 MC detected surely means one qualifying memory controller (ie. Banks
    with ECC in them) and regardless, the EDAC module is loaded so it must
    have detected everything ok.

    Yep. It loads only if it finds a working ECC hardware logging system afaik (not necessarily ECC ram). If something isn't running it reports errors.
    Heh, that's good news for everyone. There isn't 100% certainty (mildly amusing) but there are strong indications that it does run ECC properly.


    Since you have a friend in the RMA business, ask him to lend you a bad DDR4R ECC RAM stick when he finds one, so you can test it for sure. They have lifetime warranty so it's guaranteed to get RMA'd.


    Quote

    By the way, not used Manjaro before. I quite like it.

    Yeah, it's the Arch's Ubuntu lol.
    The main feature that made me "choose" it was that both Evo/Lution (installs plain Arch) and Antergos were netinstalls, and since my connection sucks they took a while and eventually an error popped up and the installer crashed. Multiple tries led to the same result. Ragequit.


    Manjaro follows the long-time tradition of installing from CD and then updating from a working system, something I do appreciate.


    Quote

    What is the reason for you guys questioning the ECC being active?


    From what I have read it is very rare that it is not used when CPU, mobo and RAM do support it.

    Because I know the people writing the BIOS are a bunch of exploited and underpaid slaves under the whip of cash-grabbing capitalist pigs that don't care. (slight exaggeration for lulz)


    Quite a few brands manage to consistently break BIOS features with updates (Hi ASUS!) or pull BIOS features away, just as they pull away hardware without telling anyone and without changing the product ID, only bumping revision numbers, which makes it a PITA to buy the right board (Hi Gigabyte!). Most others made mistakes in the past but were relatively quick to fix the issue once reported (Hi ASRock, Dell!), some took a while to fix it (Hi Supermicro!), and some told me a bunch of nonsense when I reported the issue (Hi HP!).


    I never used MSI that much so I can't say anything about them. The above is ASUS's and Gigabyte's track record with consumer boards; I never tried their server offerings, for obvious reasons.


    And I'm talking of ECC, some board-specific setting needed to do something that matters (enable/disable/set controllers or features), and VT-d, where passthrough has been hit-and-miss for years.


    I'm paying premium prices for this stuff because I need it or I'm wasting my time to troubleshoot/RMA/change the board (for real work or to build PC/workstations for customers, not for fun and games), a sticker "running ECC ram/VT-d/whatever" isn't enough.


    Quote

    What are you waiting for? Of course, you would be happier with OMV

    Seconding this.
    Debian blasts away FreeBSD anytime as far as versatility and hardware compatibility go.
    Now that there are ZFS drivers for Linux too, there is basically no reason to stay with FreeBSD at all. :)

  • What is the reason for you guys questioning the ECC being active?
    From what I have read it is very rare that it is not used when CPU, mobo and RAM do support it.


    I actually trusted my BIOS with that: it says ECC on POST, it is enabled in the memory options, and it shows up as enabled in the summary that lists the parameters in use.


    So what is the reason not to trust that info?


    Neither the Supermicro UEFI nor the ASRock UEFI mentions anything to do with ECC. I have seen screenshots suggesting that some of the Aptio 'UEFI's say something like 'ECC: Enabled', but I think it's pretty rare.


    I have to agree, once you mention it, recognizing ECC is installed just means that the necessary memory traces are present on the PCB. Which is a step in the right direction.


    Being someone that can't leave well enough alone and wants to understand EVERYTHING, I did a bit more searching. A post on Red Hat's bugzilla indicates that EDAC as an external module will fail to load on recent AM3+ or newer AMD processors, and that a loadable kernel module, edac_mce_amd replaces it. If I boot a live distro, lsmod shows that the module is loaded and is running. Definitely not the, relatively, straightforward confirmation you finally arrived at above. :( I think I won't pull the FreeNAS dinosaur down just yet.


    I'm pretty much the same. I like to see things are working. It did look so promising, but hey ho.



    This is what @ryecoaaron managed to find out. EDAC loads for other things, not just RAM. He got a driver loaded result on a 1st gen i7 system with non-ECC memory.


    Since you have a friend in the RMA business, ask him to lend you a bad DDR4R ECC RAM stick when he finds one, so you can test it for sure. They have lifetime warranty so it's guaranteed to get RMA'd.


    One step ahead of you... but he doesn't have anything at the moment. It's surprising: when you don't want bad RAM you'll find some, but when you want just one bad stick, RAM is remarkably resilient.


    Yeah, it's the Arch's Ubuntu lol.
    The main feature that made me "choose" it was that both Evo/Lution (installs plain Arch) and Antergos were netinstalls, and since my connection sucks they took a while and eventually an error popped up and the installer crashed. Multiple tries led to the same result. Ragequit.


    Manjaro follows the long-time tradition of installing from CD and then updating from a working system, something I do appreciate.


    I pretty much always do net installs, but I get why that might suck on a poor connection. The last time I looked, the main thing that put me off Evo/Lution was that it wouldn't accept BTRFS as the root FS, even if you manually formatted first, then chose the option to skip, it wouldn't mount.


    Quote

    Seconding this.
    Debian blasts away FreeBSD anytime as far as versatility and hardware compatibility go.
    Now that there are ZFS drivers for Linux too, there is basically no reason to stay with FreeBSD at all. :)


    @Markess +1 :) ECC is not affected by OS, if it does or doesn't work in FreeBSD, it will be the same in Linux. Make the jump! :D

  • This is what @ryecoaaron managed to find out. EDAC loads for other things, not just RAM. He got a driver loaded result on a 1st gen i7 system with non-ECC memory.

    Kinda. http://ark.intel.com/products/…20-GHz-6_40-GTs-Intel-QPI
    EDAC_I7CORE
    tristate "Intel i7 Core (Nehalem) processors"
    depends on EDAC_MM_EDAC && PCI && X86 && X86_MCE_INTEL
    help
    Support for error detection and correction the Intel
    i7 Core (Nehalem) Integrated Memory Controller that exists on
    newer processors like i7 Core, i7 Core Extreme, Xeon 35xx
    and Xeon 55xx processors.


    Did you check the blog post linked above?


    dmesg | grep -i edac


    was the command I was supposed to remember and tell you. :/


    It shows what devices EDAC is controlling and it says if ECC is being enabled or not.
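
    If the output is long, the same check can be narrowed down a little. The sketch below keeps the EDAC lines and then, of those, only the ones that also mention ECC together with "enabled" or "disabled". The exact wording of the kernel messages depends on the driver and kernel version, so the second filter is an assumption and may come back empty even on a working system. The function names are made up for illustration.

```shell
#!/bin/sh
# edac_log_lines: filter kernel log text on stdin down to EDAC messages
# (same idea as `dmesg | grep -i edac`).
edac_log_lines() {
    grep -i 'edac'
}

# ecc_state_lines: of the EDAC messages, keep only those that mention ECC
# together with "enabled" or "disabled".
# (Assumption: the driver words its status this way.)
ecc_state_lines() {
    edac_log_lines | grep -i 'ecc' | grep -i -E 'enabled|disabled'
}

# Usage:
#   dmesg | edac_log_lines
#   dmesg | ecc_state_lines
```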

  • Well, this is what I get from that:



    I don't specifically see anything that says ECC but there's quite a bit of output.


    What do you make of that?


    Can we get some outputs from others with 'known' ECC?
