Interesting Problem in Visual Basic 2005

Solaris · Mar 26, 2008

Hi guys, I have a software program that has a C++ executable as well as a visual basic DLL which is used to handle all kinds of engineering calculations....

A strange problem I have come across is a bug reported by a user where a variable input from the user, which should only change the final answer by the difference of the input, causes a jump in our calculated result. Let me put it like this: you have 3.00mm as an input, and the output is 68.99. When 3.01 is put in as an input we get 77.03, and when 3.02 is put in we get 77.04, etc.... (This 68.99 number is just WRONG)

So trying to track down this bug, I open Visual Studio and step into this god-awful iterative routine for solving this stuff, and the ERROR DOES NOT OCCUR. OK, but it still occurs on my pre-compiled version?

Next, I tried adding a bunch of lines to write all of the variables pertinent to this section out to a debug log, since the error doesn't happen when I'm in "debug", yet I still need a way to track it down. Well, I still haven't been able to track down where this problem is originating, but I do know that after I log about 15 lines of data, the problem with this variable somehow recorrects itself.

Does anyone have any clue wtf might be the problem? I am not a computer scientist, but rather an engineer, and this doesn't make any sense to me.

All help appreciated,
Solaris

Fryguy8 · Mar 26, 2008

An overoptimizing compiler that causes different results when debugging symbols are stripped out is my first guess.

generelz · Mar 26, 2008

Since you are using an iterative solving technique in conjunction with floating point arithmetic, it sounds like you could be running into some rounding errors. However, a discrepancy that large seems quite strange indeed. I would also point to the compiler possibly optimizing out some of the nuances of your floating point code.

One test would be to make two release builds - one with your current release settings and another with optimization disabled. See if the problem still exists. That might be a good first step to see if your compiler is optimizing away some crucial part of your iterative solver.

Here is a Microsoft article which seems like it might be fairly relevant:

http://msdn2.microsoft.com/en-us/library/aa289157(vs.71).aspx
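
To make the floating point concern concrete, here is a minimal Python sketch. It is not the VB code from this thread, which was never posted. It assumes one plausible mechanism: an intermediate value is kept at higher precision in one build but rounded to 32 bits (a VB `Single`) in another. The helper names `to_single` and `exceeds_limit` are hypothetical.

```python
import struct


def to_single(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def exceeds_limit(tenth, limit, store_as_single):
    """Hypothetical branch test inside an iterative routine."""
    scaled = tenth * 10.0  # intermediate computed at higher precision
    if store_as_single:
        # Assumption: this build writes the value back to a 32-bit variable.
        scaled = to_single(scaled)
    return scaled > limit


tenth = to_single(0.1)  # 0.1 as a Single is slightly above one tenth
kept_wide = exceeds_limit(tenth, 1.0, store_as_single=False)
stored_narrow = exceeds_limit(tenth, 1.0, store_as_single=True)
print(kept_wide, stored_narrow)
```

The same source takes a different branch depending only on where the intermediate is stored, which is the kind of difference an optimizer could introduce between builds.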

Solaris · Mar 26, 2008

Thanks for the input...

Interesting read in the article, I'll have to look into what settings are available in the VB compiler. Getting late though, don't feel like opening that box again today... Will look into it tomorrow, thanks

Solaris

mikeblas · Mar 26, 2008

The presence (or absence) of debugging symbols has zero impact on code execution.

generelz · Mar 27, 2008

mikeblas said:
The presence (or absence) of debugging symbols has zero impact on code execution.

More to the point, isn't it the case that in general (or by default in VS 2005) building for debug would eschew the sorts of optimizations a release build would typically have?

mikeblas · Mar 27, 2008

In general, yes; debug builds aren't optimized. (That's independent of having symbols.) Most professional developers ship symbols with their retail builds. Otherwise, debugging crashes in the field is really tough. We don't know if the standalone version of the program is a debug build or a retail build. We further don't know if Solaris is optimizing his debug builds or turning optimizations off in his retail build.

Solaris, you'll have to do more debugging yourself -- or, post more, clearer information for us.

Obviously, there's a difference between the code you're testing standalone (which exhibits the problem) and the code you're running in the debugger. That difference can be any of a zillion things.

Maybe you're loading a different version of a DLL because of the debugger running in a different path. Maybe it's a different version of the executable. What have you done to demonstrate that you're using the same code in both the debugger and the standalone test?

There are lots of other problems you might be running into, but given the skimpy description of your scenario, it's hard to determine which might be most likely.

Your description with the specific numbers doesn't tell us anything; why is 68.99 "just WRONG"? What should the result be when the input is 3.00 millimeters? Are you returning this number from a function, or printing it to the screen, or showing it in a field?

Does the problem happen only on the first call to this function?

"Somehow recorrects itself"? What does that mean? So, you've added debugging code and now you get the correct answers? Always, or just sometimes? I guess you mean something corrects itself again. Are you saying after adding the debugging code, you get the expected values, then get the erroneous values, then get correct values; but before adding the debugging code you always got unexpected results? Are you talking about the result, or the value of some variable during the iteration in the calculation you're doing?

Can you show any code at all?

Solaris · Mar 27, 2008

The standalone code is exactly the same as the debug version. I am testing this by using an identical executable. I will go to the debug directory, run the .exe after I build the DLL to this dir, and then the error occurs. However, if I click the little "play" icon in Visual Studio and then proceed to step through it, the error does not occur. This problem seems to occur for this input going back several release builds.

When I say I enter 3.00 and I get 68.99, but 3.01 gives me 77.03, 3.00 should thus give me an output of 77.02. This 3.00 is a corrosion allowance that gets subtracted from a nominal thickness for calculation purposes and then re-added on the end such that the user knows what thickness of material to order.

I can't post the source code here, this class alone is over 33kloc and holds too much private info. This "returned thickness" is sent back for display in the GUI dialogues on the C++ side, but in addition to this, full calculation HTML reports are also generated on the visual basic dll side, which confirms something is messed up.

To the question: Does it happen only on the first call? No, the entry of "3.00" seems to be some kind of singularity point: if you enter it, it causes problems, but if you slightly increase it (say to 3.00000001), the problem disappears. I can confirm by hand calculations that the results from entry of 3.00 are incorrect.

I have tracked the issue down to an iterative, non-recursive routine, or something that is called from within this routine. Inside the looped portion of code, I had it log all important variables for each iteration. But once I get up to, say, 20 lines of code to log, the issue disappears. By that I mean it is gone completely, and will not return until I remove the debugging lines.

Still looking into this matter...

Solaris
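
A side note on the 3.00 versus 3.00000001 observation, sketched in Python under the assumption that the input ends up in a 32-bit `Single` (the thread does not say where any conversion happens). Near 3.0, adjacent `Single` values are about 2.4e-7 apart, so 3.00000001 would round to exactly 3.0. If the two inputs really do behave differently, the difference presumably arises before any conversion to `Single`. The helper names `to_single` and `next_single_after` are hypothetical.

```python
import struct


def to_single(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_single_after(x):
    """Next representable 32-bit float above a positive, finite x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


# Gap between 3.0 and the next Single above it (2**-22, about 2.4e-7).
gap = next_single_after(3.0) - 3.0

# Does the slightly larger input survive conversion to Single?
collapses = to_single(3.00000001) == 3.0
print(gap, collapses)
```

By contrast, 3.01 is far enough from 3.0 to remain a distinct `Single` value, although it is not stored exactly either.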

mikeblas · Mar 27, 2008

Solaris said:
The standalone code is exactly the same as the debug version. I am testing this by using an identical executable. I will go to the debug directory, run the .exe after I build the DLL to this dir, and then the error occurs. However, if I click the little "play" icon in Visual Studio and then proceed to step through it, the error does not occur.

Have you confirmed that the debugger is loading the same DLL as the executable is, when run from the command line? How?

We need to be perfectly sure the same code is really running. If that's true, for certain, then we can investigate why the same code behaves differently. Investigating why different code behaves differently would be a waste of time.
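
One way to back up a claim that both runs use the same binaries is to compare file hashes. The Python sketch below only compares files on disk; it does not prove which copy a running process actually loaded (the Visual Studio debugger's Modules window lists the path of each loaded module). The function names and the paths in the usage note are hypothetical.

```python
import hashlib


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a, path_b):
    """True when both files have byte-for-byte identical contents."""
    return file_sha256(path_a) == file_sha256(path_b)
```

For example, `same_binary(r"C:\build\debug\calc.dll", r"C:\install\calc.dll")` would compare a freshly built DLL against the one next to the standalone executable, where both paths are placeholders.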

Solaris · Mar 27, 2008

Yes--- I'm pretty sure that the debugger is using the exact same DLL as the standalone. I say this because our executable automatically uses the DLL in the same directory as itself... My working directory in Visual Studio is, say, C:\my projects\project1\, so when I debug through the development environment, it's going to use my .exe from this folder and also build a new DLL to put in this folder. The standalone I am comparing results with is the same .exe file, utilizing the same DLL as previously mentioned.

So-- I'm almost 100% positive we are running the exact same code here. Does this make sense to you?

ty again

Solaris

Solaris · Mar 27, 2008

Quick update--

I took out all the debugging lines-- issue resurfaces. I then decided to replace the 2000+ occurrences of "Single" with "Double" in my class, and the problem is gone.

Not quite satisfied with this, however, but it's nice to see something that worked.
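
Why switching from `Single` to `Double` can change an iterative result is easy to illustrate, though this Python sketch is only an analogy and not the routine discussed here. It repeats a small addition and rounds the running total to 32 bits on each pass to mimic a `Single` variable. The names `to_single` and `accumulate` are hypothetical.

```python
import struct


def to_single(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step, count, single):
    """Add `step` to a running total `count` times.

    When `single` is true, the total is rounded to 32 bits after every
    addition, as if it were stored in a Single variable.
    """
    total = 0.0
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)
    return total


as_single = accumulate(0.01, 1_000_000, single=True)
as_double = accumulate(0.01, 1_000_000, single=False)
print(as_single, as_double)
```

Adding 0.01 one million times should give 10000. The `Single`-style total drifts visibly away from that, while the `Double` total stays close. That shows precision can matter in a loop, but it does not by itself explain why adding log lines changes the answer.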

mikeblas · Mar 27, 2008

Using Double instead of Single implies that it is a rounding error.

Changing the debugging code implies that it is not a rounding error.

Solaris · Mar 27, 2008

I'm not sure I follow your second statement Mike, although the first one makes sense....

You are saying by adding/removing my ~20 fileput(..) calls which result in a different output when running standalone implies that there is likely another underlying issue?

I just want to make sure I fully understand what changes are made and why, as changes to 1 portion of code in a DLL of over 20 million lines can often have drastic and unexpected changes elsewhere.

Solaris

mikeblas · Mar 27, 2008

Solaris said:
You are saying by adding/removing my ~20 fileput(..) calls which result in a different output when running standalone implies that there is likely another underlying issue?

Yes, that's what I'm saying.

Khanmots · Mar 27, 2008

Let's see if I've got this right: adding lines of code that don't modify variables can cause the problem to go away?

Changing the size of data members can cause the problem to go away?

I'm wondering about the possibility of heap corruption. Those can be a real pain to track down though. (if you don't know where to start... well... start with examining memory allocation and deallocation looking for common errors such as using delete[] when the new didn't allocate an array, deallocating the wrong pointer type, deleting statically allocated memory, etc)

If you could be very specific in what you're describing, it'd help. Pronouns and such are evil.

Also... does the problem only arise when you're looking at a whole number?

mikeblas... keep up the good work

Solaris · Mar 27, 2008

Given that it's Visual Basic, there isn't much low-level memory allocation stuff....

The only instance I have found of this issue coming up is when 3.00 is used, so I'm still very confused. Yes, just logging enough of the variables in the iteration makes the problem go away. This code has been in use for years and nobody has come across this strange instance. I do have a second degree in applied mathematics, so perhaps it would be wise for me to take a look at the numerical method applied and any possible answer DOE issues.

And as for the underlying issue... I'm still a little bit unsure why (ignoring the lines of code that generate debug logs) I'm getting different results when in the development environment. Like I said, I'm 99.9% sure that all the code being run is a carbon-copy...

Solaris

Khanmots · Mar 27, 2008

Thought you were using a C++ program which utilized a VB DLL?

Anyways, while I can't speak as to Visual Basic, the default Visual Studio C++ debug and release compilations are quite different beasts. Your source code might not change, but there'll be drastic differences in the generated machine code. Throw conditional compilation into the mix and you'll start seeing differences in behavior as well.

Personally I think I'd start by using your favorite debugger to perform a trace, and stepping through the code line by line watching which variables change, and determining if the change is expected and correct.

pxc · Mar 27, 2008

I have seen this happen in VB going back to 3.0, where single stepping does not exhibit the error, but running at full speed inside or outside the IDE does. There is probably some difference that you need to track down like mikeblas suggested above. I have not run into this using VB 2005, but I had solved it before by just manipulating the code until it worked.

Wingy · Mar 28, 2008

I think Mike nailed it here. Without knowing how your calculation works, it's hard to tell where you may have lost precision. But from what you've described, you most certainly lost precision at some point. Since you're talking about a recursive algorithm, and given my experience in numerical analysis with recursive algorithms, I believe you found your solution.
