TAT vs. speedfan inconsistencies between cores

Discussion in 'Overclocking & Cooling' started by graysky, May 18, 2007.

  1. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    If I run both TAT and speedfan, the Intel core temps are different. Yeah, yeah, I know, I read the thread that explains they should be 15 °C off. On my system, I can calculate an average difference, and it differs for each core. Not only that, but the difference changes depending on whether the machine is idle or under load.

    What I did was simply run 2x orthos, then log the temps in both speedfan and TAT (you can write a log in TAT after the time period is up). After I parsed out the non-temp data from the logs, I simply averaged the numbers per core. If I subtract the TAT temps from the speedfan temps, I get the average differences, which as I said are both different per core, and different for an idle vs. load state.
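The averaging described above can be sketched in a few lines of Python instead of a spreadsheet. This is only a rough illustration: the function names and the sample numbers are made up, and it assumes the non-temp data has already been stripped so that each log is a list of rows with one temperature per core.

```python
from statistics import mean

def per_core_mean(rows):
    """Average each core's column; rows is a list of per-core temperature lists."""
    return [mean(core) for core in zip(*rows)]

def per_core_offset(speedfan_rows, tat_rows):
    """SpeedFan average minus TAT average, one value per core."""
    sf = per_core_mean(speedfan_rows)
    tat = per_core_mean(tat_rows)
    return [s - t for s, t in zip(sf, tat)]

# Hypothetical two-core logs, two samples each (not real measurements).
speedfan_rows = [[40, 41], [42, 43]]
tat_rows = [[55, 57], [57, 58]]

offsets = per_core_offset(speedfan_rows, tat_rows)
print(offsets)
```

Run it once on the idle logs and once on the load logs to compare the two states.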

    For my setup (P5B-Del and Q6600 @ 9x333) I get the following:

    [​IMG]

    Anyone else willing to try this experiment? If you don't want to, or don't know how to, do the averaging in a spreadsheet, you can email your logs to me and I'll do it for you.
     
  2. unclewebb

    unclewebb [H]ard|Gawd

    Messages:
    1,799
    Joined:
    Jun 21, 2006
    I used CrystalCPUID which lets you directly read the digital thermal sensor ( DTS ) info from the Core2Duo. The readings reported by CoreTemp and SpeedFan are directly based on the readings from this sensor without any lag whatsoever.

    TAT is inconsistent. I ran TAT one day and got a DTS reading. The next day the DTS was reporting exactly the same value, but TAT was reporting a temperature that was 2 degrees different.

    Both times the temperature was allowed to stabilize at a fixed value but the temperatures reported by TAT were not consistent. You can't depend on the temperatures reported by TAT to be 100% accurate because they're not. Any comparisons between TAT and CoreTemp or SpeedFan will end up being meaningless because of TAT.

    CoreTemp 0.95 works great. The only problem is that it sometimes uses the wrong Tjunction when it tries to calculate an absolute temperature.

    Reported Temp = Tjunction - DTS

    The E4300/E6300/E6400/E6600 all come in two different versions. Some have a Tjunction of 85C and some have a Tjunction of 100C, and Intel has released no documented way for software to tell the difference. ALL programs are left guessing.

    If they guess wrong, the reported temperature will be off by exactly 15C one way or the other.
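A quick way to see the 15C error is to plug the same DTS reading into the formula with both Tjunction values. A minimal sketch in Python; the function name is hypothetical and the DTS value of 46 is just an example reading.

```python
def reported_temp(tjunction_c, dts):
    """Reported Temp = Tjunction - DTS, both in degrees C."""
    return tjunction_c - dts

dts = 46                              # example DTS reading
right = reported_temp(85, dts)        # program guesses the correct Tjunction
wrong = reported_temp(100, dts)       # program guesses the other Tjunction
error = wrong - right

print(right, wrong, error)
```

The error does not depend on the DTS value, which is why a wrong guess shows up as a fixed offset at every temperature.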
     
  3. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    I d/l'ed crystalCPUID but I don't see the DTS data... where is it displayed?
     
  4. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    I just discovered RightMark CPU Clock Utility which seems much closer to +15 and more consistent than TAT when compared to speedfan. I'm doing some tests now and will post the results when finished.
     
  5. unclewebb

    unclewebb [H]ard|Gawd

    Messages:
    1,799
    Joined:
    Jun 21, 2006
    Using CrystalCPUID to extract temperature data from a Core2Duo is not for the faint of heart, but it allows you to get the exact data in real time without having to depend on some other programmer's interpretation.

    Intel refers to the register in the C2D as IA32_THERM_STATUS. It is located at position 0x19C.

    Within that register it uses bits [22..16] where it stores the reading from the DTS sensor.

    CrystalCPUID allows you to read any register within the C2D by using menu item:
    Function -> MSR Editor

    MSR stands for Model Specific Register.

    In the top box, MSR Number, enter the register you would like to check, which for DTS info is 0x19C.

    Click on the RDMSR button which reads this register.

    [​IMG]

    The raw DTS data is then transferred into the EAX register.
    In my case the data returned is 0x882E0000

    Each hexadecimal digit equals 4 bits. Bits [15..00] are the 4 zero digits on the right.
    Bits [22..16] are represented by 2E in this example.

    2E hexadecimal equals 46 in decimal.
    Use the MS Calculator program if you need a simple program to convert.

    The formula is:
    Core Temperature = Tjunction - DTS

    In my case Tjunction is 85 so
    Core Temperature = 85 - 46 = 39C

    Which is exactly what CoreTemp and SpeedFan report so they can be fully trusted.
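If you would rather not do the hex conversion by hand, the same steps can be sketched in Python. The function name is made up for illustration, and the Tjunction of 85 is the value assumed for this particular processor.

```python
def dts_from_eax(eax):
    """Extract bits [22..16] of the value RDMSR returns in EAX for MSR 0x19C."""
    return (eax >> 16) & 0x7F   # shift the field down, keep 7 bits

eax = 0x882E0000                # value read in the example above
dts = dts_from_eax(eax)         # 0x2E
tjunction = 85                  # assumed for this processor
core_temp = tjunction - dts

print(hex(dts), dts, core_temp)
```

The mask is 0x7F rather than 0xFF because the field is 7 bits wide; bit 23 is not part of the reading.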

    If either program guesses wrong at Tjunction they might be off by +/- 15C but there's a way to come up with a good guess at the Tjunction for your processor so you will be able to calibrate an absolute temperature from one of these programs.

    The TAT readings do not exactly correlate to the DTS so TAT temperature readings are wrong and meaningless. The DTS is the only Intel, fully documented, temperature sensor within the Core 2 Duo.

    RMClock also displays temperature data with a resolution of 0.2 degrees. It is using the DTS which is an integer value and then averaging out 5 samples to produce its results. Averaged results are not real time results so any comparison between two different ways to interpret DTS data won't have much meaning.
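To illustrate why averaging 5 integer samples gives 0.2 degree steps and lags the real reading, here is a small sketch. It assumes a simple moving average over the last 5 samples; how RMClock actually averages internally is not documented in this thread, and the readings are made up.

```python
from collections import deque

class MovingAverage:
    """Average of the most recent `window` samples."""
    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def update(self, value):
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)

avg = MovingAverage(5)
readings = [39, 39, 40, 40, 40, 45]   # hypothetical integer temperatures
smoothed = [avg.update(r) for r in readings]

print(smoothed[-2], smoothed[-1])
```

Once the window is full, every result is an integer sum divided by 5, so it can only land on multiples of 0.2. When the last reading jumps to 45, the average has only moved to about 40.8, which is the lag mentioned above.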
     
  6. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    Thanks for the detailed information. That seems like too much for me :) Plus I'm guessing it can't write all that to a log file for data manipulation after the fact, right? RMClock has a very advanced logging feature and seems to give tighter data than TAT does (this may be a function of the small logging interval). TAT logs quad core data somewhat randomly, whereas RMClock does it about once a second.
     
  7. unclewebb

    unclewebb [H]ard|Gawd

    Messages:
    1,799
    Joined:
    Jun 21, 2006
    CoreTemp also has a logging function and you can adjust the interval duration as well.
     
  8. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
  9. unclewebb

    unclewebb [H]ard|Gawd

    Messages:
    1,799
    Joined:
    Jun 21, 2006
    Are you using Windows Vista?

    The reading of the MSR register is a privileged instruction so the programming tricks that work in XP may not work in other versions of Windows.
     
  10. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    Hhehe, no way would I put that o/s on my machine. At least not until the first major SP is released. It's XP.
     
  11. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    Ok.. I did this on mine and got: 0x883E0000, so I look at the 3E, which is 62, and 100-62=38? I believe, by cross-referencing with RMClock, that that value is for my core 0. I have a Q6600. How can I get at the temps for all 4 cores via this method?

    Thanks!
     
  12. unclewebb

    unclewebb [H]ard|Gawd

    Messages:
    1,799
    Joined:
    Jun 21, 2006
    The MSR Editor is useful to compare the DTS value directly to what other temperature monitoring programs are putting out.

    Start up CrystalCPUID and then do a CTRL+ALT+DEL and bring up the Task Manager. Click on the Processes tab, then right click on the process for CrystalCPUID and set the Processor Affinity to whichever core you would like to check. On a quad core system there would normally be 4 check marks, indicating that this process is allowed to run on any one of the 4 cores. By unchecking 3 of the cores you will be forcing it to run on only one core.

    I just noticed. The main page of CrystalCPUID also lets you select which core you are presently investigating near the top right just below the Close and Minimize buttons.

    Now when you open up MSR Editor and do a RDMSR you will be getting the temperature data for that core.
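For anyone who wants to script the per-core read rather than click through the GUI, here is a rough sketch for Linux, not Windows. It assumes the `msr` kernel module is loaded and the script runs as root, in which case each core exposes its registers through `/dev/cpu/<n>/msr`, so no affinity trick is needed. The helper names are made up, and this is untested on the processors discussed here.

```python
import os
import struct

IA32_THERM_STATUS = 0x19C   # register address given earlier in the thread

def msr_path(cpu):
    """Device node for one core's MSRs (Linux msr driver, assumed loaded)."""
    return f"/dev/cpu/{cpu}/msr"

def read_msr(cpu, register):
    """Read one 64-bit MSR from the given core; needs root."""
    fd = os.open(msr_path(cpu), os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, register)   # the file offset is the MSR address
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]

def dts_for_core(cpu):
    """DTS reading, bits [22..16], for one core."""
    return (read_msr(cpu, IA32_THERM_STATUS) >> 16) & 0x7F

if __name__ == "__main__":
    for cpu in range(os.cpu_count() or 1):
        try:
            print(cpu, dts_for_core(cpu))
        except (OSError, struct.error) as exc:
            print(cpu, "unavailable:", exc)
```

Subtract each printed DTS value from your Tjunction to get the core temperature, same as with the MSR Editor.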

    CoreTemp never causes a problem on my computer running XP so I'm not sure what it's doing on your computer. SpeedFan is also very accurate but if your processor has a Tjunction of 100C I think it assumes it is only 85C so it might read 15C too low.

    If you click on the Configure button within SpeedFan and then select the Advanced tab you can set it to an offset of 15 to correct for this. People who have a processor with a Tjunction of 85C that is being misread as a Tjunction of 100C could enter -15 in here and correct for that as well.

    [​IMG]
     
  13. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    Thanks for the info.. I'll try it with the CPUx button enabled. As to speedfan, I actually knew about the offsets, but what I found when comparing the SF numbers (no offset) to RMClock or TAT, is that they tend to vary like the first post of this thread states. So I don't feel comfortable setting them unless I can get them to agree.
     
  14. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    Can you post a link to the Intel document that shows this relationship? Also, have a look at this thread. Is he using TJunction correctly? As I understand it, TJunction is a constant, yet I have seen countless threads (my own included) where people are calling TJunction what you're calling "Reported Temp."

    Thanks!
     
  15. BillParrish

    BillParrish [H]ardness Supreme

    Messages:
    7,976
    Joined:
    Aug 25, 2006
    This is the most accurate and technically based post I have seen on core temp. Excellent. The CrystalCPUID find/info is gold !!! Thanks unclewebb

    This post, the one you are reading now, is all about Quads. Do NOT infer anything about other processors. If yours is something other than a Q6600 or QX6700, look up its specs; it's not hard. (Hint: it's made by Intel, it's a processor, it's a technical document, and they have a website.)

    Since I have it bookmarked.

    Read 4.2.10 in this document: http://download.intel.com/design/processor/designex/31559402.pdf


    Bah here,
    The information below is the property of Intel and copied from the above link. Highlights are mine. You should read the entire document, there is more than what is below and the actual data sheet for the CPU as well.

    From:
    Intel® Core™2 Extreme Quad-
    Core Processor QX6700Δ and
    Intel® Core™2 Quad Processor
    Q6000 Δ Sequence
    Thermal and Mechanical Design Guidelines


    4.2.10 Digital Thermal Sensor
    The processor uses the Digital Thermal Sensor (DTS) as the on-die sensor to use for
    fan speed control (FSC). The DTS replaces the on-die thermal diode used in previous
    product. The DTS is monitoring the same sensor that activates the TCC
    (see Section 4.2.2). Readings from the DTS are relative to the activation of the TCC.
    The DTS value where TCC activation occurs is 0 (zero).
    The DTS can be accessed by two methods. The first is via a MSR. The value read via
    the MSR is an unsigned number of degrees C away from TCC activation. The second
    method, which is expected to be the primary method for FSC, is via the PECI
    interface. The value of the DTS when read via the PECI interface is always negative
    and again is degrees C away from TCC activation.
    A TCONTROL value will be provided for use with DTS. The usage model for TCONTROL with
    the DTS is the same as with the on-die thermal diode:
    • If the Digital Thermometer is less than TCONTROL, the fan speed can be reduced.
    • If the Digital Thermometer is greater than or equal to TCONTROL, then TC must be
    maintained at or below the Thermal Profile for the measured power dissipation.
    The calculation of TCONTROL is slightly different from previous product. There is no base
    value to sum with the TOFFSET located in the same MSR as used in previous processors.
    The BIOS only needs to read the TOFFSET MSR and provide this value to the fan speed
    control device. Multiple digital thermal sensors can be implemented within the
    package without adding a pair of signal pins per sensor as required with the thermal
    diode. The digital thermal sensor is easier to place in thermally sensitive locations
    of the processor than the thermal diode. This is achieved due to a smaller foot print and decreased
    sensitivity to noise. Since the DTS is factory set on a per-part basis there is no need
    for the health monitor components to be updated at each processor family.
    Note: The Intel® Core™2 Extreme quad-core processor QX6700 and Intel® Core™2
    quad processor Q6600 sequence do not have an on-die thermal diode. The
    TCONTROL in the MSR is relevant only to the DTS.



    Despite your measurements being off, they are consistently off, as the measurements were made the same way on the same processor, and while the actual numeric results can be debated, the trend you see cannot. Trouble is, duh! The cores are not going to run at the same temp either at idle or under load (assuming you could load each core exactly the same, and you cannot with any software I know about). These are individual cores, 4 of them, not specifically matched and certainly not monolithic silicon, slapped in a tin can. Temps are going to vary core to core. Even if they were matched somehow, you could never get a perfectly uniform pattern of heat dissipation across the core-to-heat-spreader and heat-spreader-to-heatsink interfaces, and they would still vary.

    Hmm, maybe I am missing the point. A database of many CPUs, with reference to stepping, date codes, etc., showing general trends in temperature differences across the cores correlated to manufacturing runs might be interesting. But I don't see how you could eliminate the data skew created by the inconsistencies of heat sinking.

    But don't let me stop you, have fun !!! :D
     
  16. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    Thanks for the link, man. The document is technical in nature and not an easy read. One thing is obvious: there is no mention of TJunction, nor of the relationship that unclewebb quoted (Core temp = TJunction - DTS).
     
  17. unclewebb

    unclewebb [H]ard|Gawd

    Messages:
    1,799
    Joined:
    Jun 21, 2006
    Now that I've found some people interested in this topic I'll see if I can post some more info tonight with references.

    In my own testing I've found that when DTS=2, Thermal Monitor 2 becomes active. It is designed to intermittently and rapidly drop the CPU multiplier down to 6 on all C2D processors and is also supposed to drop the core voltage as well to control the heat.

    Because it cycles so rapidly at first trying to maintain full power for the user, it can be difficult to detect. CPUz reports the multiplier dropping to 6X but I'm not sure if the processor voltage is actually changing or not. On my motherboard it's possible that when the voltage is set to a fixed value and not set to AUTO that when the processor asks for a voltage change, that request gets ignored by the motherboard to maintain stability.

    If the temperatures get high enough, DTS=0, CoreTemp reports that the VID of the processor has been dropped to 1.16 volts but I believe SpeedFan was still reporting full voltage of ~1.40 volts which is my normal full load voltage. CPUz isn't too trustworthy when reporting core voltage so I generally ignore it.
     
  18. BillParrish

    BillParrish [H]ardness Supreme

    Messages:
    7,976
    Joined:
    Aug 25, 2006

    I don't know how to say it, so I just will: it's the highlighted bit in red, and it concerns me that you cannot discern the formula from the written description of said formula, so I will attempt to explain it. Bear with me, I am not a teacher.

    As uncle explained above and is documented in the document there is no Tjunction in these processors. You are getting tripped up by loose usage of terminology by enthusiasts. Intel engineers talk a different language.

    For purposes of this discussion only, TCC = TJunction = The temp the computer will thermally trip (shutdown to protect itself) at, either 85 or 100C (and actually can be set to anything per individual processor at the factory so I am not sure those two numbers are written in stone. )


    With understanding of the above the formula then becomes
    Core temp = TCC - DTS

    or in written form:
    The value read via the MSR is an unsigned number of degrees C away from TCC activation.

    To further digress:
    if TCC is known/assumed to be either 85 or 100 C and the DTS value is known from reading the MSR with CrystalCPUID or whatever:

    actual core temp = 85C (or 100C) - DTS value.


    get it ?



    Note: it is unfortunate that the author of CoreTemp (a superb effort and a great program, I hate to say anything negative about it) chose the label Tjunction for the maximum core temp referred to above as TCC. I don't know if the terminology has changed or what, but it is clear from the document that Intel is moving to DTS (digital thermal sensors) placed in the cores instead of thermal diode readings, and that is causing confusion.
     
  19. unclewebb

    unclewebb [H]ard|Gawd

    Messages:
    1,799
    Joined:
    Jun 21, 2006
    I have tended to use Tjunction because I thought users would be able to relate that to what is shown in CoreTemp.

    One person from Intel corrected that and used the term Tj(Max) to refer to the maximum temperature the processor is designed to operate at.

    Here's a list of Intel manuals for those that like to read:
    http://www.intel.com/products/processor/manuals/

    If you download volume 3A and do a search for IA32_THERM_STATUS you will end up in chapter 13 where most of my info has come from.
     
  20. unclewebb

    unclewebb [H]ard|Gawd

    Messages:
    1,799
    Joined:
    Jun 21, 2006
    Good news Bill. I found this in the Intel manuals:

    "MSRs with an “IA32_” prefix are designated as “architectural.” This means that
    the functions of these MSRs and their addresses remain the same for succeeding
    families of IA-32 processors."

    The technique of using CrystalCPUID to extract the temperature data out of your processor, as discussed above, seems to go back a few generations of Intel processors. The Intel manual says IA32_THERM_STATUS applies to the Core 2 Duo including the Quads as well as, Core 2 Xeon, Core Duo, Core Solo, the most recent Pentium 4 models and even some of the Pentium M line of processors.

    It's all just a simple matter of reading location 0x19C and extracting the data.

    Appendix B of Volume 3B of the Intel manuals lists all sorts of information you can get out of your processor by using the RDMSR feature of CrystalCPUID.
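As a rough sketch of what else can be pulled out of the value at 0x19C, the snippet below decodes a few more fields. Only bits [22..16] come from this thread; the other bit positions (reading valid at bit 31, resolution at bits [30..27], thermal status at bit 0 and its log at bit 1) are my reading of the Intel manual and should be checked against volume 3 before relying on them. The function name is made up.

```python
def decode_therm_status(value):
    """Split the low 32 bits of IA32_THERM_STATUS into named fields."""
    return {
        "reading_valid": bool((value >> 31) & 1),      # assumed: bit 31
        "resolution_c": (value >> 27) & 0xF,           # assumed: bits [30..27]
        "dts": (value >> 16) & 0x7F,                   # bits [22..16], per the thread
        "thermal_status": bool(value & 1),             # assumed: bit 0
        "thermal_status_log": bool((value >> 1) & 1),  # assumed: bit 1
    }

fields = decode_therm_status(0x882E0000)   # example value from earlier in the thread
print(fields)
```

Under those assumptions, the leading 88 in the example value works out to a valid reading with a resolution of 1 degree.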
     
  21. graysky

    graysky Gawd

    Messages:
    620
    Joined:
    May 6, 2007
    I figured out what my problem was. TAT sucks for reading temps on C2D chips. I should have taken a hint from the application itself. If you launch it and look in the upper left under Processor Details, you'll see, "Processor: Pentium M." I just ran RMClock and Speedfan and had both apps log a set of load temps. Then I meticulously analyzed the log files and averaged exactly the same time points (down to the last second) from each log to ensure a fair comparison. If I take the difference of the average RMClock core temp numbers and the average Speedfan numbers, I arrive at:

    Code:
    Core 0: 15.002
    Core 1: 15.069
    Core 2: 15.049
    Core 3: 14.979
    Conclusion: Speedfan can log temps as precisely as RMClock can and the offset is indeed -15 °C. Also, don't use TAT for a C2D or Quad C2D!