New IBM POWER8 CPU

Discussion in 'All non-AMD/Intel CPUs' started by Red Falcon, Sep 9, 2013.

Thread Status:
Not open for further replies.
  1. mikeblas

    mikeblas [H]ard|DCer of the Month - May 2006

    Messages:
    12,775
    Joined:
    Jun 26, 2004
    My calculations were normalized to be per-core. 48 MB of L3 cache across 6 cores means 8 MB per core. Of course, a single core might be using more, and reuse can happen per core.

    That's because there are six sockets per book and six cores per socket: 384 MB / 6 = 64 MB per socket, and 64 / 6 is about 10.67 megabytes per core.
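
    Spelled out as a quick sketch (a toy Python snippet, not from the original post, using only the per-book and per-socket figures quoted in this thread):

    Code:
    # zEC12 numbers as quoted in this thread: 384 MB of shared L4 per book,
    # six sockets per book, six cores per socket, 48 MB of L3 per socket.
    l4_per_book_mb = 384.0
    sockets_per_book = 6
    cores_per_socket = 6
    l3_per_socket_mb = 48.0

    l4_per_socket = l4_per_book_mb / sockets_per_book   # 64.0 MB
    l4_per_core = l4_per_socket / cores_per_socket      # ~10.67 MB
    l3_per_core = l3_per_socket_mb / cores_per_socket   # 8.0 MB

    print(f"L4 per socket: {l4_per_socket:.2f} MB")
    print(f"L4 per core:   {l4_per_core:.2f} MB")
    print(f"L3 per core:   {l3_per_core:.2f} MB")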

    Sure. Either way, each CPU socket doesn't have 200-some megabytes of cache available to it.

    Yep, that's closer to the truth. Thing is, only 10,400 KB is in the processor package, and it's about 1/20th of the 200-something MB number you were using when you made your claim about IBM's engineers and their budget management. Now that you've learned some of the facts, I wonder if you're thinking of re-evaluating your position.

    This is a very strong assertion for which you offer no real evidence.

    There are six sockets per book, and the number of books in the system depends on how many were ordered or installed in the machine. The Redbook for the system describes its architecture; I'm surprised to think you haven't read it.

    You're putting words into my mouth in support of your own position. I've made no claim about the success or failure of zEC12 machines. In fact, I'm asking you why you think they're such a failure to educate myself.


    I've written plenty of code -- certainly enough to know that someone who's trying to minimize lines of code is working to minimize the wrong thing. (And we're talking about hardware anyway, not software.) You'll want to familiarize yourself with my resume before you make further foolish presumptions.

    I'm not trying to understand the paper at AnandTech -- there's no paper at AnandTech, just an article reporting that someone made a blog post. I'm trying to understand the claims you've made in your posts. I'm disappointed that your responses to my questions don't bring any clarity.


    What is "the IBM mainframe myth"? I guess I'm not asking about that because it hasn't come up before this point.
     
    Last edited: Dec 29, 2013
  2. niomosy

    niomosy Limp Gawd

    Messages:
    247
    Joined:
    Nov 21, 2005
    A lot of your mainframe software cost is going to be in software licensing given the way IBM and most every mainframe software vendor licenses their mainframe software. You're paying per MIPS for a lot of this software so there's a huge desire to not get into a higher MIPS quantity than you really need for business needs.

    As for the mainframe itself, it's been more I/O focused than specifically CPU focused for some time. Most shops will use the mainframe as the system of record with other systems pulling from and putting into the mainframe.
     
  3. King of Heroes

    King of Heroes [H]ard|Gawd

    Messages:
    2,006
    Joined:
    Mar 26, 2008
    This. I work at an IBM i shop, where I started as an intern, and an important thing that I've learned is that IBM machines are built to be I/O monsters, NOT computation monsters. We use IBM i because its integration with DB2 makes it an enterprise database that's only seriously rivaled by Oracle.
     
  4. niomosy

    niomosy Limp Gawd

    Messages:
    247
    Joined:
    Nov 21, 2005
    Interesting as I see fewer and fewer i (AS/400) shops out there. Our i hardware is pretty old and handles specific apps that eventually feed to the mainframe. Same as our UNIX/Linux databases.

    Funny thing, we almost ended up with the i servers since they can run on the pSeries hardware. My boss ended up suggesting the Mainframe/Tandem team would be better suited to handling it.
     
  5. brutalizer

    brutalizer [H]ard|Gawd

    Messages:
    1,593
    Joined:
    Oct 23, 2010
    Sure, 30 million transistors is not a lot, but it adds up. There are at least two different schools out there. Microsoft belongs to one school, which says that bloat is not something terrible: the cpus are so fast today, and we have so much RAM, that it does not matter if a piece of software uses 100KB more or less. And another 30 million transistors does not matter.

    The other school says that bloat is a bad thing: you should try to minimize the code and keep it lean and mean. This is best from an engineering viewpoint.

    If you have two pieces of code, one using 1000 LoC and another using 600 LoC - and they are doing the same thing - then you should almost always choose the 600 LoC alternative. The difference between an experienced programmer and a beginner is that the experienced programmer will use much less code to do the same thing as the beginner. And the risk of bugs is lower, as a side effect. The fewer LoC, the fewer bugs you will see. Complexity is a bad thing.
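
    As a toy illustration of that kind of comparison (a hypothetical Python example, not taken from any codebase discussed here), both functions below do the same thing, one in a longer hand-rolled form and one in a shorter form:

    Code:
    # Hypothetical example: two functions with the same behavior but
    # different line counts (sum of squares of the even numbers).
    def sum_even_squares_long(values):
        # Longer, step-by-step version.
        result = 0
        for v in values:
            if v % 2 == 0:
                square = v * v
                result = result + square
        return result

    def sum_even_squares_short(values):
        # Shorter version doing the same thing.
        return sum(v * v for v in values if v % 2 == 0)

    # Both give 0 + 4 + 16 + 36 + 64 = 120.
    assert sum_even_squares_long(range(10)) == sum_even_squares_short(range(10)) == 120

    The point being illustrated is only that both produce the same output while the second uses fewer lines.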

    And this must apply to cpus too. If you have one cpu using 1,000 million transistors and another using 600 million transistors doing the same thing - you should choose the cpu with fewer transistors. More unnecessary transistors just eat up more watts, and you will see more bugs too.

    In circuit complexity it is very important to try to reduce the number of transistors. If you can reduce 5% of all transistors - that is better than not trying to optimize at all.

    Obviously, you belong to the same school as MS.

    How will bloat affect the architecture? What kind of question is that? Are you serious with that question?

    Well, first of all, we all know that the x86 is buggy, with lots of bugs. For instance, the F00F and FDIV bugs.
    http://en.wikipedia.org/wiki/Pentium_F00F_bug
    http://en.wikipedia.org/wiki/Pentium_FDIV_bug
    There have been many stories about Intel and AMD having bugs in x86, sometimes forcing them to replace cpus at a huge cost. Is this not a problem? This is a consequence of being bloated, and you don't see it as a problem? If you do admit it is a problem, how can you ask me how being bloated is a problem?

    Second, if you compare something truly bloated, such as the Windows kernel, to a less bloated kernel, such as a Linux/Unix kernel - we all know that bloated Windows has problems with instability and performance, in comparison to a Linux/Unix kernel. In WinXP, Microsoft had no idea what was going on in the kernel. No one knew what was happening in that huge mess, causing lots of instability and performance problems. It was Sinofsky at MS who first started to try to understand the Windows kernel. And he created the MinWin kernel, the "tiny" Windows kernel, which was only 200MB or so. And after his work, the Win7 kernel is much more stable and better structured than the WinXP kernel ever was. Win7 has a more stripped-down kernel, trying to remove bloat, improving stability and performance. I heard that the Windows 2000 kernel had something like... 1,500 API calls or even more? Many of them legacy and not in use anymore, but you still have to have them in Windows. In comparison, the Linux kernel has something like ~150 API calls. So the effect of bloat is instability and performance problems. And talking about CPUs, more transistors eat more wattage.

    Really, I don't get it. How can anyone (except people at MS) ask me why bloat is a problem? Are you serious? o_O


    ...


    In software development, reducing LoC is important. And so it is in circuitry too. Reducing transistors is a big field of research: if you have all these transistors, can you find a smaller set producing the same output? Have you missed this? You never try to optimize anything you construct, because removing 30 million transistors here and there is not a problem?
    http://en.wikipedia.org/wiki/Circuit_complexity


    And there are developers at Microsoft saying that producing code with low LoC is not important, because Windows is so big in the first place that another 1000 LoC does not matter. And what has this resulted in? Windows being unstable and having low performance. And yes, it IS a problem, no matter how much the MS developers assure the customers it is not a problem.

    And yes, the x86 architecture IS buggy and inefficient - and that IS a problem.
     
  6. brutalizer

    brutalizer [H]ard|Gawd

    Messages:
    1,593
    Joined:
    Oct 23, 2010
    Normalized per core? Why? Are we discussing how slow the Mainframe cpus are, or how slow the Mainframe cores are? Why are you shifting focus from cpu to core? That is something IBM is very fond of: distorting facts.

    Here is an example of this thing that IBMers love to do:
    http://whywebsphere.com/2013/04/29/...uble-the-cost-of-the-websphere-on-ibm-power7/
    "Oracle announced their new SPARC T5 processor with much fanfare and claiming it to be the “fastest processor in the world”. Well, perhaps it is the fastest processor that Oracle has produced, but certainly not the fastest in the world. You see, when you publish industry benchmarks, people may actually compare your results to other vendor’s results. This is exactly what I would like to do in this article."

    And he "analyzes" (or twists the facts) and concludes that IBM POWER7 is a faster cpu at WebSphere, because POWER7 has higher EjOPS per core. There is a slight problem. SPARC T5 do has the world record. Each SPARC T5 core might be less powerful than POWER7, but T5 has many more cores - so the end result is T5 is the worlds fastest cpu on this.

    IBM says:
    POWER7 has faster cores than SPARC T5 -> might be true
    And therefore:
    POWER7 is a faster cpu than SPARC T5 -> not true.

    So, this is a lie. By shifting focus from CPUs to cores, IBM draws a conclusion, and applies the same conclusion back onto the cpus again - where it is no longer true. Besides, SPARC T5 often has faster cores as well, and many more of them.

    So I don't understand why you are talking about cores. I am talking about cpus.


    I said that I would like you to help me figure out the exact numbers. And that means I am willing to re-evaluate my position, yes. I said so. The thing is, when I try to ask IBM people about this, they never help me. They duck and evade all my questions. I never get a clear answer.

    So, maybe you can help me. Again, how many books does the largest zEC12 have? I am comparing the largest and most powerful IBM Mainframe to x86 servers here. One book apparently has six sockets. And one book has 384 MB of L4 cpu cache. Each cpu has 48MB of L3 cache. Are these facts correct?

    NB: when I say "cpu" I mean one socket, not one core. If I mean "core", I say "core". I will not say "cpu" and mean "core". That would only confuse people.

    And no, I have not read the Redbook; I don't have time to read books on this. Now that I have you, an IBM Mainframe person, online, maybe you can help me figure this out. Neither you nor I would want me to say untrue things. So let us figure out how much cpu cache each Mainframe cpu has.


    So, IBM is not trying to hide and conceal facts, to make it difficult to assess performance of Mainframes vs x86?

    Well, how do you expect me to prove this? By linking to a document by IBM where they confessed they do try to hide facts? What kind of proof do you want?

    The fact is, IBM doesn't ever release benchmarks comparing x86 to Mainframes. There are no benchmarks with, say, SPECint2006 of zEC12 vs x86 cpus. Why is that? Of course, if you can show me such a benchmark, then I was wrong on this and will stop saying it. But the fact is, IBM will not allow you to compare Mainframes to x86 or SPARC. No way. And I will not be able to show a written confession from IBM on this. I just state facts: IBM has not released benchmarks vs x86 or SPARC.


    They are a failure because you can software-emulate a big Mainframe on an x86. If Mainframes were that much faster, then it would not be possible to use software emulation to replace a Mainframe.

    You should know that an experienced developer writes less code while achieving the same result as a beginner. And the less code, the better - says every developer I have met.


    Please ask your questions again, and I will try to answer them. Maybe you can answer my questions so we can finally figure out how much cpu cache there is in total in the largest configured Mainframe. How many books is the largest supported configuration?

    The myth is that they are so much more powerful, the most powerful SMP computers in the world. Well, they are not. They are in fact very slow cpu-wise - worst in class.
     
  7. brutalizer

    brutalizer [H]ard|Gawd

    Messages:
    1,593
    Joined:
    Oct 23, 2010
    I have never denied that Mainframes are superior at I/O. This IS true, they are superior at I/O, and that is why many use them. One Mainframe can have 296,000 I/O channels, I heard. And they have lots of I/O co-processors. The question is, how much I/O would you get from an x86 server if you used that many I/O co-processors?

    But it seems that you agree with me: they are not good cpu-wise, but very good at I/O?
     
  8. mikeblas

    mikeblas [H]ard|DCer of the Month - May 2006

    Messages:
    12,775
    Joined:
    Jun 26, 2004
    This is a pretty vast oversimplification. Arbitrary complexity is bad, but users demand features -- and features add complexity. It's rare that simply useless code exists; code exists because someone wanted it and added it to the project. It serves some purpose, for someone.

    No user is concerned with lines of code; they're concerned with features, performance, and stability. The correlation between lines of code and any of those deliverables is indirect at best.

    Why must it apply? Just because you have said so?

    Except: you won't. If the code (or circuitry) is bloat, then it's not used. If it's not used, then the bugs won't be exercised and end up being irrelevant. If the code (or circuitry) is useful, then it's not bloat and the bugs have come from required complexity and aren't a result of bloat, anyway.


    Most people who buy processors are concerned with performance, not any notion of "bloat". A design that uses more transistors and performs better is more desirable than a design that uses fewer transistors and is slower.


    Are there any processors which are completely bug-free? Will you enumerate them?

    Bugs aren't a consequence of bloat; they're a consequence of complexity. The instructions you cite aren't unnecessary waste; they're useful and necessary. Their implementation isn't trivial. Intel and AMD (and all other successful processor manufacturers) do incredible verification and validation testing. (Lots of LoC in the tests -- that's bloat too, right?)

    Really, it isn't. No developer will spend a few days deleting code. They might re-factor code, but that usually ends up with more lines of code, not fewer. There are other metrics in software development that are more important.


     
  9. Red Falcon

    Red Falcon [H]ardForum Junkie

    Messages:
    9,833
    Joined:
    May 7, 2007
    brutalizer hasn't made any personal attacks, and has been quite logical and reasonable throughout this whole discussion.
    It would be nice if you actually answered his questions, as I am interested in this information as well. :)
     
  10. jimmyb

    jimmyb 2[H]4U

    Messages:
    3,165
    Joined:
    May 24, 2006
    Again, we're talking about hardware design here.

    1) The extra logic to support decoding legacy instructions is doing something.
    2) A lower transistor count is not necessarily the better design, and does not necessarily use less power. A very big example of this is in digital logic, where the industry has almost universally moved to CMOS implementations due to their lower power, despite typically having twice as many transistors as their equivalent NMOS implementations (a rough count is sketched just after this list).
    3) Having smart designs, BIST components, modern DFT techniques, and thorough verification reduces bugs more than *anything* else; usually this means using more transistors than the smallest design possible.
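
    As a back-of-the-envelope sketch of that transistor count, using textbook gate structures (my own illustration, not tied to any specific product in this thread):

    Code:
    # Rough transistor counts for an N-input NAND gate:
    #   static CMOS:         N pull-down NMOS + N pull-up PMOS   = 2N
    #   depletion-load NMOS: N pull-down NMOS + 1 depletion load = N + 1
    def cmos_nand_transistors(n_inputs: int) -> int:
        return 2 * n_inputs

    def nmos_nand_transistors(n_inputs: int) -> int:
        return n_inputs + 1

    for n in (2, 3, 4):
        print(n, "inputs:", cmos_nand_transistors(n), "CMOS vs", nmos_nand_transistors(n), "NMOS")
    # CMOS uses roughly twice the transistors, yet won out because it burns
    # almost no static power -- more transistors is not automatically worse.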

    It's a very legitimate question. Tell me specifically how "bloat" in the decode logic affects the microarchitecture. Tell me why it's making x86 slow, buggy, and inefficient - don't just make general statements, tell me something specific.

    Is the decode logic:
    1) A critical timing path in the design? This would put an upper bound on the clock frequency. If it's not the critical timing path it's not slowing down the design.
    2) Is it necessitating an extra pipeline stage that would otherwise not be required? If it's not, then it isn't affecting the decode latency.
    3) Is it so slow and large that it has caused significant design decisions to accommodate it? What were these decisions specifically?

    If the "bloat" isn't slowing down the design and is providing some function, then it's not really bloat.

    All ASICs have bugs. Neither of the bugs you mentioned was due to "bloat" in the decode logic. They don't support your point.

    Intel and AMD's x86 processors generally have fewer errata than other vendors in my experience.


    Your argument conflates features you don't find useful with "bloat", and then generalizes all "bloat" as having specifically detrimental effects. Some "bloat" causes issues, but not all does. You are claiming that supporting legacy instructions *specifically* is causing significant increases in die area, but your "proof" is only through generalizations.

    If they aren't slowing down the design, aren't consuming significant area, don't have any known bugs, and are providing a useful (albeit infrequently used) feature, why would you "optimize" them out of the design?

    I don't need a Wikipedia link to a page on circuit complexity. How much experience do you have doing integrated circuit design? My designs have been taped out in dozens of billion-plus-transistor products - ranging from CPUs, GPUs, supercomputers, telecom, datacom, etc. - in multiple leading-edge submicron processes.

    All your arguments are generalizations without any specific data or analysis.


    What do you mean buggy and inefficient? Provide some data on this please.

    How many errata do Intel's and AMD's x86 microarchitectures have compared to ARM, PowerPC, etc.? How severe are the errata? What is the comparative efficiency per watt? Per area? And so on. You have provided no data on this, and yet you are making claims.
     
  11. jimmyb

    jimmyb 2[H]4U

    Messages:
    3,165
    Joined:
    May 24, 2006
    He also hasn't substantiated any of his claims about hardware design.
     
  12. mikeblas

    mikeblas [H]ard|DCer of the Month - May 2006

    Messages:
    12,775
    Joined:
    Jun 26, 2004
    Imagine that you work (or have worked) at IBM or Microsoft, and re-read his posts. Then, tell me what you think. People who don't agree with him "aren't experienced", on top of it. Rhetoric like this might not be precisely what you think of as a personal attack, but to people who are personally involved in the technologies or companies in question (as I have been) it certainly is an unpleasant and ineffective way to make a point.

    The questions about cache size, book architecture, and book count are all in the redbook. The redbook can be freely downloaded from the IBM website. It's quite well written, though it uses terms that most PC users aren't too familiar with (like "book" for a CPU card, and a stack of acronyms specific to the Z-series architecture).

    Section 2.2.1 explains that four books can be in the processor cage. Section 2.2 explains how physical memory is arranged, and shows a block diagram of a book. Section 2.4 draws a functional diagram of the processors and their cache, plus the off-chip cache on the book itself.

    Section 2.4.2 explains some of the processor's technical design characteristics, including the chip-level features that x86-class machines simply don't have.

    Section 2.4.3 describes Processor Units (cores, essentially -- this is another reason why it's natural to assume "CPU" means "core" instead of "socket") that are available on the different configurations.

    The off-package L4 cache is discussed in Sections 2.4.4 and 2.4.5.

    Section 2.5 explains how the system addresses memory. Note that the systems can be configured with 3 terabytes of memory.
     
  13. Red Falcon

    Red Falcon [H]ardForum Junkie

    Messages:
    9,833
    Joined:
    May 7, 2007
    ^ Great info, thank you for the link.
     
  14. niomosy

    niomosy Limp Gawd

    Messages:
    247
    Joined:
    Nov 21, 2005
    I want to see a bit more of a deep dive on this from our mainframe-familiar people. Yes, you can emulate a mainframe on PC hardware and get CPU-based performance easily (I've run Hercules with the available OSes at home to play around). I work with an ex-mainframer who put VSE on Hercules and supported it in production for a shop that needed a small mainframe for some code still there.

    Yet what I recall about mainframes is that the CPU matters less than the I/O. How well is the PC going to handle the I/O loads that a mainframe would handle for large financial organizations, for example?
     
  15. mikeblas

    mikeblas [H]ard|DCer of the Month - May 2006

    Messages:
    12,775
    Joined:
    Jun 26, 2004
    x86 hardware can't handle the IO as well. It also doesn't have the redundancy, fail-safe, and security features the mainframe hardware has. There are some Itanium systems that are competitive, but I don't know of any contemporary x86 gear that's close. (There have been some entries into the market -- the last I know about was the HP Superdome boxes, and the Unisys ES7000-series, but I don't keep up to date on that market much anymore.)

    I think the conclusion that the ability to emulate one on the other means the other is superior in all aspects is quite flawed.
     
  16. FLECOM

    FLECOM Modder(ator) & [H]ardest Folder Evar Staff Member

    Messages:
    15,569
    Joined:
    Jun 27, 2001
    that's all nice and generic but name an ACTUAL task that these things do that a linux cluster cannot do (and that apparently would not require redundancy)

    not "SMP" tasks vs non-smp tasks that's nice, but not an answer, I mean REAL applications
     
  17. niomosy

    niomosy Limp Gawd

    Messages:
    247
    Joined:
    Nov 21, 2005
    I would guess OLTP databases as an opener since he's going for HPC versus non-HPC.
     
  18. Red Falcon

    Red Falcon [H]ardForum Junkie

    Messages:
    9,833
    Joined:
    May 7, 2007
    I too would like to know this, as it is an interesting point.
     
  19. Ins0mnyteq

    Ins0mnyteq Gawd

    Messages:
    982
    Joined:
    Nov 11, 2013
    I just started working with a guy here in NC who works for IBM directly on this project. Some of the info he gave me led me to believe these chips are going to be the future of computing, citing various facts and math that I don't know anything about. He seemed super passionate about it. He told me that they were planning on selling the x86 side of IBM so they could compete with Intel.
     
  20. Red Falcon

    Red Falcon [H]ardForum Junkie

    Messages:
    9,833
    Joined:
    May 7, 2007
    ^ If that's true, that would make things very interesting.
    If you find out anything more, let us know!
     
  21. FrankD400

    FrankD400 Limp Gawd

    Messages:
    147
    Joined:
    Jan 31, 2013
    I used to test/diagnose POWER6/POWER7 mainframes (which are NOT all on zArchitecture, though I did work with z9/z10/zEnterprise with z196 MCMs) .. the specs on those chips were insane, especially POWER7. I'd kill for a POWER8 box at home!

    I did mainly 9119-FHA, 9119-FHB and 9125-F2C. More commonly known as Power595, Power795, and Power 775. 775 was especially ridiculous. I really should've been mining BTC on them.. they did have OpenCL libraries.

    iSeries software on Power595/795 was a pain in the junk. How do people use that crap?

    The big caches on z196 are eDRAM by the way.
     
    Last edited: Feb 17, 2014
  22. FrankD400

    FrankD400 Limp Gawd

    Messages:
    147
    Joined:
    Jan 31, 2013
    Edit: IBMer informed me I was wrong about the L4, it's on the SC chips at 96MB per. I have to bug him for some documentation :) He says the L3 is eDRAM but he's not 100% sure about the L4 on the SCs. I did waaaay more pSeries/775 than I did with zSeries, guess my memory got a little foggy!

    http://www.vosizneias.com/wp-content/uploads/2010/09/savbg.jpg

    There's a picture from Poughkeepsie, that looks like either the node room in Building 007 or Mariner test in Building 052. The 775 supercomputer is/was referred to internally as 'Mariner'.

    Possibly interesting side note: We tested the 775 nodes (a super wide/long 2RU high box with water-cooled RAM, MCMs, and optical hubs) with 8400GS cards in the PCIE slots, save for 2 slots with SAS cards which were hooked up to drives bolted to a shelf powered by a Corsair PSU of all things. No disk IO drawers at that stage.. :)
     
    Last edited: Feb 17, 2014
  23. Ins0mnyteq

    Ins0mnyteq Gawd

    Messages:
    982
    Joined:
    Nov 11, 2013
    I have another meeting with him tomorrow to plan the up-fit on his new building (he runs a property management company as a 2nd job... wtf). I planned on asking him a bunch about it. The way he described it reminded me of when I discovered computers as a kid. I'll try to pry some hard facts from him, so that I can actually add to the thread lol
     
  24. FLECOM

    FLECOM Modder(ator) & [H]ardest Folder Evar Staff Member

    Messages:
    15,569
    Joined:
    Jun 27, 2001
    is this post in english?
     
  25. Red Falcon

    Red Falcon [H]ardForum Junkie

    Messages:
    9,833
    Joined:
    May 7, 2007
    It's nerdy as hell, but the guy is totally [H]ard, nonetheless! :cool:
     
  26. FrankD400

    FrankD400 Limp Gawd

    Messages:
    147
    Joined:
    Jan 31, 2013
  27. niomosy

    niomosy Limp Gawd

    Messages:
    247
    Joined:
    Nov 21, 2005
    As a Unix guy, I would never call any POWER-based system a mainframe. I reserve that term, within IBM, for the z/Series servers. It's really odd to hear a non-z/Series IBM system referred to as a mainframe unless they're referring to z/Series predecessors like the 390, 370, and 360 architectures.

    Those POWER7 systems can get really massive but even at that, this is the first time I've heard them called mainframes.
     
  28. FrankD400

    FrankD400 Limp Gawd

    Messages:
    147
    Joined:
    Jan 31, 2013
    IBM calls them mainframes internally in most departments, even though they're not of 360/390 descent. I think it kind of sticks in IBM because of the culture and the fact that the computational/IO heart sits in a big rack (a frame). At a high level they're much the same: a large midplane for inter-book IO, very high availability, hot-swappable books, and an IO-oriented bias. They're all expensive as hell, even if the z is pricier :)

    What POWER systems do you work with?
     
  29. niomosy

    niomosy Limp Gawd

    Messages:
    247
    Joined:
    Nov 21, 2005
    That seems really odd that IBM thinks of them internally as mainframes. Now if they'd just release a mainframe ADCD hobbyist package so I could play with z/OS and z/VM on an officially blessed mainframe emulator, that would be great.

    These days, mostly p740 and p750 systems running AIX. We've still got some P6 595s left but are migrating to p750 systems mostly. I'm hoping we pick up some P8s eventually. Then we'll go through moving all the P7 stuff to P8 hardware.
     
  30. FrankD400

    FrankD400 Limp Gawd

    Messages:
    147
    Joined:
    Jan 31, 2013
    Well if you compare a zEnterprise and a 795 at a physical level they're not all that different. Nodes/books with some MCMs and RAM that communicate over a wide midplanar and connect to a crapload of IO via Infiniband. You use Tres drawers on your 595s?
     
  31. niomosy

    niomosy Limp Gawd

    Messages:
    247
    Joined:
    Nov 21, 2005
    No tres drawers for us. Storage is fibre channel to NetApp mostly.
     
  32. gigatexal

    gigatexal [H]ardness Supreme

    Messages:
    7,120
    Joined:
    Jun 22, 2004
  33. niomosy

    niomosy Limp Gawd

    Messages:
    247
    Joined:
    Nov 21, 2005
    Unseating Intel from the data center is going to be a rather difficult thing. Good luck, IBM, you're going to need it.
     
  34. brutalizer

    brutalizer [H]ard|Gawd

    Messages:
    1,593
    Joined:
    Oct 23, 2010
    I talked about reading a link about Mainframe cpus using lots of cache, and I stumbled on this link again! Here it is! So now we can continue the discussion. Could someone help me interpret these numbers - and please, no "read this book if you want to know". Has somebody read the book who can help me out here?
    http://www.tomshardware.com/news/z196-mainframe-microprocessor,11168.html
    "Additionally, a 4-node system uses 19.5MB of SRAM for L1 private cache, 144MB for L2 private cache, 576MB of eDRAM for L3 cache, and 768MB of eDRAM for a level-4 cache."

    I interpret this "4-node system" as a Mainframe with four sockets. Hence it has
    19.5 + 144 + 576 + 768 = 1507.5MB cache in total. That is, roughly 377MB of cache per socket. It all revolves around what a "4-node system" is. If it is a 4-socket system, then I think most people agree that closer to half a GB of cache per socket is hilarious. But if it is not 4 sockets, I am willing to re-evaluate my standpoint. No problem.
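
    The same arithmetic as a quick sketch (a toy Python snippet; note it assumes a "4-node system" really means four sockets, which is exactly the assumption in question):

    Code:
    # z196 figures as quoted from the Tom's Hardware article above.
    l1_mb, l2_mb, l3_mb, l4_mb = 19.5, 144.0, 576.0, 768.0
    nodes = 4  # assumption: one "node" = one socket

    total_mb = l1_mb + l2_mb + l3_mb + l4_mb   # 1507.5 MB
    per_node_mb = total_mb / nodes             # ~376.9 MB

    print(f"total cache: {total_mb} MB, per node: {per_node_mb:.1f} MB")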


    My claims about hardware design are as follows. In engineering, you should try to avoid bloat. That is a general principle, and it is very important to keep unnecessary bloat down. Any programmer knows that lean source code is superior to another source that uses way more code to do the same thing. Among other things, the attack surface for compromising the system is larger. There are more bugs in more code, etc. These things are obvious to any engineer.


    I posted a link from SGI. Read that again? Here it is for your convenience:
    http://www.realworldtech.com/sgi-interview/6/
    "...However, scientific applications (HPC) have very different operating characteristics from commercial applications (SMP). Typically, much of the work in scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a SMP workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time..."

    So you will not see benchmarks of large enterprise systems performed on a cluster. For instance, take this SAP benchmark done on an Oracle M6-32 SMP server:
    https://blogs.oracle.com/BestPerf/entry/20140327_m6_32_sap_sd
    All these servers on the benchmark are SMP servers, no clusters.


    The reason the x86 hardware can not handle IO as well as Mainframes is simple: Mainframes have lots of helper IO cpus. Imagine if you put that same number of helper IO cpus on an x86 server...

    Well, in math it is not. I have a heavy math background, and in math it is like this: if you can emulate a machine with another well enough - but not the other way around - then one system is superior, and the other inferior. I think this sounds reasonable?
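
    One way to write that claim down formally (my own sketch of the argument in terms of a simulation preorder, not a theorem from any of the sources linked here):

    Code:
    % A \succeq B  iff  A can simulate every computation of B (with acceptable overhead).
    % The claim of strict superiority is then:
    \[
      \bigl(A \succeq B\bigr) \wedge \bigl(B \not\succeq A\bigr) \;\Longrightarrow\; A \succ B .
    \]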

    Here is the Mainframe myth; I've posted this link before, and I post it again for your convenience:
    "Perpetuing myths about the zSeries"
    http://www.mail-archive.com/linux-390@vm.marist.edu/msg18587.html

    Forgive me for saying this, but I don't believe you have been developing a lot. It seems you only have some basic understanding of development. If we are going to talk about our resumes, as you do: I have a double Masters, one in theoretical computer science from a top-ranked university and another in Math (not as good a university, but one of the best in my country), and I work in high-frequency trading as a researcher at an investment bank.

    And I can promise you, no developer I know would say the things you say. And I know that you have not studied higher math, because of the things you say. No mathematician would, for instance, consider a more bloated proof as good as a smaller proof - mathematicians (and developers) consider bloat a very, very bad thing. You do not. Hence you are not a real mathematician nor a real developer.

    All developers try to refactor code to make it smaller whenever they can. Here is, for instance, a _real_ developer who deleted code at Apple. The Apple managers didn't consider less code better; instead they encouraged bloat and made no fuss about it. Just like you, who don't consider bloat a very bad thing. Maybe you are more of a manager these days? You are not a developer, that I can tell with certainty. This real developer considers more LoC a bad thing, just as I have said all along. Talk to a real developer and they all say the same thing as I do - bloat is a very bad thing. Anyone saying the opposite is not a real developer.
    http://www.folklore.org/StoryView.py?story=Negative_2000_Lines_Of_Code.txt

    "In early 1982, the Lisa software team was trying to buckle down for the big push to ship the software within the next six months. Some of the managers decided that it would be a good idea to track the progress of each individual engineer in terms of the amount of code that they wrote from week to week. They devised a form that each engineer was required to submit every Friday, which included a field for the number of lines of code that were written that week.

    Bill Atkinson, the author of Quickdraw and the main user interface designer, who was by far the most important Lisa implementor, thought that lines of code was a silly measure of software productivity. He thought his goal was to write as small and fast a program as possible, and that the lines of code metric only encouraged writing sloppy, bloated, broken code.

    He recently was working on optimizing Quickdraw's region calculation machinery, and had completely rewritten the region engine using a simpler, more general algorithm which, after some tweaking, made region operations almost six times faster. As a by-product, the rewrite also saved around 2,000 lines of code.

    He was just putting the finishing touches on the optimization when it was time to fill out the management form for the first time. When he got to the lines of code part, he thought about it for a second, and then wrote in the number: -2000.

    I'm not sure how the managers reacted to that, but I do know that after a couple more weeks, they stopped asking Bill to fill out the form, and he gladly complied."
     
  35. mikeblas

    mikeblas [H]ard|DCer of the Month - May 2006

    Messages:
    12,775
    Joined:
    Jun 26, 2004
    I guess the problem is that I have a more practical definition of superiority.

    Your car has an ECU on it which controls engine function. That ECU contains a computer which listens to sensors, makes some decisions, and controls the engine. That ECU can't emulate a large computing cluster, but the large computing cluster can certainly emulate the ECU.

    Is the ECU superior to the computing cluster? It certainly is: nobody wants to buy a car that requires a genset to run a computing cluster that weighs a few orders of magnitude more than the car itself.

    I won't dispute that the cluster is more capable in most respects, but even then it's not capable of being portable or running in constrained heat or power environments. Things like capability and superiority are situational, not absolute; a fact that makes your comparisons of minicomputers to mainframes very brittle.

    I never said that bloat wasn't a bad thing. I just said that you're wrong in claiming that all developers try to actively refactor code; and that you're wrong when you claim smaller code is always better; and that you're wrong in your belief that minimization is most important.
     
  36. fixedmy7970

    fixedmy7970 Gawd

    Messages:
    524
    Joined:
    Jul 20, 2013
    wait a sec.

    when is larger code better than smaller code?

    i was under the impression that size doesn't matter.
     
  37. mikeblas

    mikeblas [H]ard|DCer of the Month - May 2006

    Messages:
    12,775
    Joined:
    Jun 26, 2004
    You're asking about code size, but I think brutalizer is making claims about code complexity.

    Code size matters at a small level. Larger code runs slower when it starts not to fit in cache. OTOH, larger code might be faster because it accesses memory less, implements a more efficient algorithm, or offers more features.

    Some of those changes also involve (or, at least, imply) adding complexity. While we measure code size in bytes (or maybe lines of code), there's no real consensus about how to measure complexity, so comparisons of complexity are harder than comparisons of size. More complex code is better when the complexity is required, and worse when the complexity is arbitrary or not required.

    Claims like "The difference between an experience programmer and a beginner, is that the experienced will use much less code to do the same thing as the beginner" or "All developers tries to refactor the code to make it smaller whenever they can" simply don't reflect the reality of commercial software development.
     
  38. jimmyb

    jimmyb 2[H]4U

    Messages:
    3,165
    Joined:
    May 24, 2006
    It is a *general* principle. That means you can't use it to back *specific* claims and comparisons about hardware.

    You made a claim about how legacy instruction support was significantly slowing down mainframe and x86 CPUs. "Bloat" is not an argument. There are thousands of engineers doing actual analysis and research on CPU design and verification, and they aren't basing their arguments solely (or even substantially) on the concept of bloat.
     
  39. jimmyb

    jimmyb 2[H]4U

    Messages:
    3,165
    Joined:
    May 24, 2006
    This just isn't true. Sometimes a longer and more complex mathematical proof provides greater intuition into *why* something is true. Lots of mathematicians will avoid proofs by exhaustion for this very reason, even though they can be quite concise.
     
  40. brutalizer

    brutalizer [H]ard|Gawd

    Messages:
    1,593
    Joined:
    Oct 23, 2010
    You are dead wrong on this. Smaller is better. Talk to any mathematician; you don't have to take my word for it. Just ask anyone who has studied math at a high level, not some college or undergrad level.

    A shorter proof is considered more elegant than a bloated longer proof. A short clever proof is more beautiful, and math is all about beauty. Just ask any serious mathematician what beautiful math is. Mathematicians are obsessed with beauty. All serious mathematicians are.

    This also applies to source code.
     