VB6 to .NET extremely slow

M76 · [H]F Junkie · Joined: Jun 12, 2012 · Messages: 14,030
I've just converted an old VB6 program, written years ago and full of mathematical calculations, to VS2015 .NET.

I haven't changed anything about the code itself, just adjusted the syntax and declarations to be compatible, and added a framework to run it in parallel threads using BackgroundWorkers.

I wrote and compiled it on a Core 2 Duo E7400 (2 cores, 2 threads), so the calculations run in 2 parallel threads there, and it seems relatively OK; it runs as I'd expect on this CPU.

Then I took the compiled program to a computer with a Xeon W3690 (6 cores, 12 threads), and the program runs like a dead sloth on it. It starts running on 12 threads all right, and CPU usage is over 90%, but each thread takes about 15x longer to process the same amount of data than it does on the Core 2. And it's not the IO operations, those take very little time; it's the actual calculations that take this long.

What could be the issue?
 
I have a suspicion about what could cause the slowdown when there are more threads.

There are large two-dimensional arrays declared at the project level, populated with reference data needed for the calculations, so each thread reads data from these arrays. Would it be better to pass the arrays to each worker as an argument?
 
I have a suspicion about what could cause the slowdown when there are more threads.

There are large two-dimensional arrays declared at the project level, populated with reference data needed for the calculations, so each thread reads data from these arrays. Would it be better to pass the arrays to each worker as an argument?

If the arrays were not being written to by the threads, then there should not be any locks put on them for multithreading.

If done right, with the arrays read-only after the initial population, it should not slow things down at all.

If not done right, every time something is read from the array, it will lock out the other threads until the thread that is using it has released the lock.
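To illustrate the read-only-after-population pattern described above, here is a minimal VB.NET sketch (all names are hypothetical, not from the poster's code): the table is filled once before any worker starts, and afterwards workers only read it, so no locks are needed.

```vbnet
' Hypothetical sketch: a shared lookup table populated once at startup,
' then read freely by worker threads with no locking at all.
Module RefData
    ' Populated once, before any workers start.
    Public LookupTable(999, 999) As Double

    Public Sub Populate()
        For i = 0 To 999
            For j = 0 To 999
                LookupTable(i, j) = i * 0.001 + j
            Next
        Next
    End Sub
End Module

' Inside a worker's DoWork handler, reads need no SyncLock, because
' nothing writes to the array after Populate() has run:
'     Dim v = RefData.LookupTable(row, col)
```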
 
M76,
Do you have the code anywhere for review?
 
If the arrays were not being written to by the threads, then there should not be any locks put on them for multithreading.

If done right, with the arrays read-only after the initial population, it should not slow things down at all.

If not done right, every time something is read from the array, it will lock out the other threads until the thread that is using it has released the lock.

Then I think that could be the problem, but how do you prevent threads from locking the arrays?
 
OK, suddenly it started working. And I don't even know what did the trick.
 
Then I think that could be the problem, but how do you prevent threads from locking the arrays?

Passing copies of them as args to the worker threads might be the simplest solution, assuming they're not written to after you initially populate them.
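A minimal sketch of that idea, assuming the reference data is a `Double(,)` array named `sharedTable` (a hypothetical name) that is never written after startup: each worker gets its own copy via `RunWorkerAsync`, so no shared state is touched while the workers run.

```vbnet
' Sketch: hand each worker its own copy of the reference data via
' RunWorkerAsync. 'sharedTable' is a hypothetical name for the
' project-level array.
Dim refCopy(UBound(sharedTable, 1), UBound(sharedTable, 2)) As Double
Array.Copy(sharedTable, refCopy, sharedTable.Length)

Dim worker As New ComponentModel.BackgroundWorker()
AddHandler worker.DoWork,
    Sub(sender, e)
        ' Each worker reads only its private copy.
        Dim table = DirectCast(e.Argument, Double(,))
        ' ... run the calculations against 'table' ...
    End Sub
worker.RunWorkerAsync(refCopy)
```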
 
Then I think that could be the problem, but how do you prevent threads from locking the arrays?

Can you give some pseudo-code that shows your flow, if you can't share the code itself? Array access isn't synchronized at all; it shouldn't lock anything. You can see the reference source here: http://referencesource.microsoft.com/#mscorlib/system/array.cs,4f49b6bfd66eb1e5

There's likely something else going on, e.g. your setup/teardown of background workers is incorrect, you're thrashing memory (you don't say how large this data is or how much memory each thread uses), etc.
 
Can you give some pseudo-code that shows your flow, if you can't share the code itself? Array access isn't synchronized at all; it shouldn't lock anything. You can see the reference source here: http://referencesource.microsoft.com/#mscorlib/system/array.cs,4f49b6bfd66eb1e5

There's likely something else going on, e.g. your setup/teardown of background workers is incorrect, you're thrashing memory (you don't say how large this data is or how much memory each thread uses), etc.

It didn't seem to have memory problems; memory usage remains steady, fluctuating only a few percent while running a heavy workload for an hour.

As I've mentioned, it started working for some reason. But it's still not perfect: I'd like to see CPU usage pegged at 100%, but it's not; it's fluctuating heavily.

What I'm doing is this: I have one BackgroundWorker created at design time that handles any preparation and then reads the source file. Then I generate as many additional BackgroundWorkers as the computer has threads, pass 2,000,000 lines of the input file to each, and repeat until the file is done. Currently I'm reusing the BackgroundWorkers over and over again.

The writing of the output is actually done by the individual workers, using SyncLock so they don't try to write to the output file at the same time. But that part is definitely not a problem. The problem is in the calculating code itself, which is very crude; I've already fixed a ton of potential problems with it and deleted a lot of dead code and unused or redundant variables. It was written by one of my colleagues, who always thinks the work is done as soon as the code runs without errors, no matter how crude it is or how user-unfriendly the finished application.
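A hedged sketch of that output path, with hypothetical names: all the math happens lock-free, and the lock is held only for the brief append to the shared writer.

```vbnet
' Sketch: workers finish all calculations first, then hold a single
' lock only while appending their results to the shared output.
Module OutputSink
    Private ReadOnly outputLock As New Object()

    Public Sub WriteResults(writer As IO.TextWriter,
                            results As IEnumerable(Of String))
        SyncLock outputLock
            For Each line In results
                writer.WriteLine(line)
            Next
        End SyncLock
    End Sub
End Module
```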

Right now I'm taking a few days off, so I don't want to see the code until Tuesday.
 
My bet is the disk I/O is really being limited here - I know you say it's not, but I bet it is.

Post some code.
 
Without seeing the code, it is impossible to have any meaningful insight. Suggestions of threading APIs and such are not going to be useful. You can write perfectly good multi-threaded code in standard .NET even without using a lot of fancy .NET features. It is likely a symptom of poorly written or structured code and locking.
 
My bet is the disk I/O is really being limited here - I know you say it's not, but I bet it is.

Post some code.

That's my guess too; if that machine isn't using SSDs, then it's probably thrashing away.

How many worker processes are being spun up?
Why are you using a framework to handle the threading? The newer versions of .NET have made it much easier to write multithreaded code.

People moan about applications not being multi threaded, but too many threads can be worse.
 
My bet is the disk I/O is really being limited here - I know you say it's not, but I bet it is.

Post some code.

This is what I was going to ask: whether the code was doing any disk I/O, or any kind of trace debugging that could be writing to the disk. Once the disk gets involved, things slow down A LOT.

I've written some pretty data-heavy processes as Windows services that don't kill the server, but turn on some basic I/O output so I can debug with them and they kill the machines. These also use threading, and some use background worker processes.
 
It's definitely not IO, no need to go down that alley, the IO operations are done entirely in a separate thread after the calculations are completed.

What I've done now is create local copies of each large variable on every thread before using it 10 million times. Now I'm running tests to see if this actually makes any difference in performance.

I also have about two dozen constants declared at design time for the application. But I assume constants are handled differently and don't present a performance penalty when referenced from different threads, right?

Edit: the above made no difference; it's about 30 seconds faster in a 30-minute process.

But I guess it's fast enough. The original slowness was already fixed in December, but I don't know what fixed it.

The application runs 3 times faster than the original VB6.
 
I've benchmarked VB6 against VB.NET with a test app parsing millions of strings, and VB6 wins by a wide margin.
 
I've benchmarked VB6 against VB.NET with a test app parsing millions of strings, and VB6 wins by a wide margin.

There are specific string-handling functions you should avoid in VB.NET because they are much slower there; you need to use alternatives.
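One classic example of this in .NET (a general illustration, not specific to the benchmark above): building a large string with `&` in a loop reallocates the whole string on every pass, while `System.Text.StringBuilder` appends in place.

```vbnet
' Fast: StringBuilder appends in place, amortized O(1) per append.
Dim sb As New System.Text.StringBuilder()
For i = 1 To 100000
    sb.Append(i).Append(","c)
Next
Dim result = sb.ToString()

' Slow equivalent, quadratic in total length, because each '&='
' copies the entire string so far:
'     Dim s = ""
'     For i = 1 To 100000 : s &= i & "," : Next
```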
 
A few things could be going on here:

The most obvious is that short-lived kernel threads are bumping your worker threads, which then get reassigned by bumping another worker thread, which bumps another, and so on. Because your work is computationally heavy, the threads bumping between cores will cause the data in the CPU's L2 cache to be copied (L2 is per-core on Intel processors), further reducing throughput.

Another issue could be locking between threads, although if you haven't set this up yourself, it shouldn't be coming into play in a way that affects performance. This does raise the question, though: are you sure you are performing your calculations in a thread-safe manner? If any of your worker threads WRITE to the same location, you have to introduce some locking mechanism to ensure the data is always correct. (READ operations are fine; no locks needed.)

It's certainly not an IO problem if it's math-heavy, since pretty much everything will reside in main memory, if not directly in the CPU cache.
 
A few things could be going on here:

The most obvious is that short-lived kernel threads are bumping your worker threads, which then get reassigned by bumping another worker thread, which bumps another, and so on. Because your work is computationally heavy, the threads bumping between cores will cause the data in the CPU's L2 cache to be copied (L2 is per-core on Intel processors), further reducing throughput.

Another issue could be locking between threads, although if you haven't set this up yourself, it shouldn't be coming into play in a way that affects performance. This does raise the question, though: are you sure you are performing your calculations in a thread-safe manner? If any of your worker threads WRITE to the same location, you have to introduce some locking mechanism to ensure the data is always correct. (READ operations are fine; no locks needed.)

It's certainly not an IO problem if it's math-heavy, since pretty much everything will reside in main memory, if not directly in the CPU cache.

The worker threads write directly to the output file only after they have completed all the calculations on their assigned batch of elements, and of course I'm using SyncLock while writing to the file. So each worker performs all the math-heavy calculations, then writes the data to the output file quickly.

I'm currently using 2 million per worker; that seems a good balance to conserve memory. Currently this uses about 8 GB of memory on 12 threads. Maybe I should change the number dynamically depending on how much memory the computer has; fewer work units should be quicker.

How would I even know about the kernel threads? I'm new to multithreaded applications; I've never needed to worry about the performance of my applications thus far.
 
The worker threads write directly to the output file only after they have completed all the calculations on their assigned batch of elements, and of course I'm using SyncLock while writing to the file. So each worker performs all the math-heavy calculations, then writes the data to the output file quickly.

I was referring to the data internal to the program.

Whenever you have multiple threads accessing the same object, if ANY thread performs a write operation, that object has to be protected by software locks to prevent more than one thread from accessing that data at the same time. Failure to do so WILL result in incorrect data being read or written, and can potentially corrupt your data.

Note that having multiple threads read the data at the same time is OK.
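A small sketch of the two usual ways to protect a shared write, with hypothetical names: a `SyncLock` around the read-modify-write, or `Interlocked` for simple numeric updates.

```vbnet
' Sketch: two safe ways for workers to update a shared total.
Module SharedState
    Private ReadOnly totalLock As New Object()
    Public total As Long

    ' Option 1: SyncLock protects the read-modify-write sequence.
    Public Sub AddToTotal(amount As Long)
        SyncLock totalLock
            total += amount
        End SyncLock
    End Sub

    ' Option 2: for a single numeric field, Interlocked avoids the
    ' lock entirely.
    Public Sub AddToTotalLockFree(amount As Long)
        Threading.Interlocked.Add(total, amount)
    End Sub
End Module
```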
 
I was referring to the data internal to the program.

Whenever you have multiple threads accessing the same object, if ANY thread performs a write operation, that object has to be protected by software locks to prevent more than one thread from accessing that data at the same time. Failure to do so WILL result in incorrect data being read or written, and can potentially corrupt your data.

Note that having multiple threads read the data at the same time is OK.

There is no writing to the same location.
 