Which technology might be best for.....

I have a question as to which technology would be the most ideal / efficient.

We have a coldfusion/java webserver running on IIS.

For reasons of politics we've been given the mandate to adopt an external webservice that was very poorly implemented.

What this means is that instead of getting only the records that have been updated, they are sending us their entire list of records, which they expect us to process to look for changes. The core of our webservices as they stand has been constructed to update files / xml / data coming in.

So this pre-process will involve access to our database to compare records and to compile a list of records that have actually been updated. :confused: :mad:

Question: I have heard Perl might be good at processing large strings/data sets, so I am considering going this route. I do question its database efficiency, though, having not had much experience with Perl.

I may also consider C++ for this pre-processing. :eek:

Sort of a dichotomy, though: everyone expects this to demonstrate how bad the system they're making us use is... as an example of how it can't work because it's so ill-conceived... but I'm supposed to build it to work long term. :rolleyes:
 
There is no technology that is "ideal". For anything, period.

The language that you decide to use is secondary. What you need to do is come up with an algorithm that helps you identify the records that are new. The candidate records probably have a timestamp, and might have an identifier that lets you decide which records are new since the last time you synced up.

If you don't have a way to identify a new record, you're going to get all the records from the source database and compare them for existence in the new database. You'll do this every time, and the list of records on both sides will keep growing, so the comparison will take longer and longer. That will be true regardless of the language you choose.

If you don't have a way to identify a new record, you need to add one. Otherwise, you don't have a reasonable solution.

It would be important to know how many records are in the database, and how many you expect to change between each sync. If it's only a handful of changes in a database of a few hundred records, maybe things aren't so bad. But, who knows? You haven't said anything about it -- maybe you have billions of records, and tens of millions change each day.

Are you after records that are new, or records that have changed in any way?
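Either way, the split Mike describes can be sketched in a few lines of a scripting language. This is Python purely for illustration; the `id` field name and the in-memory `known` lookup are assumptions, not your actual schema:

```python
def split_new_and_changed(incoming, known):
    """Split a full feed into new and changed records.

    incoming: iterable of dicts, each with an 'id' key (assumed field name)
    known:    dict mapping id -> the record as we last stored it
    """
    new, changed = [], []
    for record in incoming:
        previous = known.get(record["id"])
        if previous is None:
            new.append(record)          # never seen this id before
        elif record != previous:
            changed.append(record)      # same id, different contents
    return new, changed

# Hypothetical feed: two known records, one modified, one brand new.
known = {
    1: {"id": 1, "name": "alpha"},
    2: {"id": 2, "name": "beta"},
}
feed = [
    {"id": 1, "name": "alpha"},      # unchanged -> dropped
    {"id": 2, "name": "beta v2"},    # changed
    {"id": 3, "name": "gamma"},      # new
]
new, changed = split_new_and_changed(feed, known)
print(new)      # [{'id': 3, 'name': 'gamma'}]
print(changed)  # [{'id': 2, 'name': 'beta v2'}]
```

In a real run `known` would come from your own database rather than a literal dict, but the shape of the loop is the same: one pass over the feed, one keyed lookup per record.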
 
To add to what Mike said, you need an algorithm, requirements, specifications. Not tech, at least not yet.

Let me re-iterate what I have gathered from your post:

We have a coldfusion/java webserver running on IIS. (....) an external webservice that was very poorly implemented.
Is the former your target or development platform and the latter the original service that feeds the database?
From what I've gathered everything seems to eventually interface to everything (i.e. IIS works with Perl).

What this means is that instead of getting only the records that have been updated, they are sending us their entire list of records, which they expect us to process to look for changes.
So what you're implying is - the old webservice is still feeding data that needs to be stored in the new format / new server (what RDBMS?).

The core of our webservices as they stand has been constructed to update files / xml / data coming in.
As in - you have already set up the part of the 'new' system to the point where it can convert between the two databases?

So this pre-process will involve access to our database to compare records and to compile a list of records that have actually been updated.
Again - what do you mean by 'update'? updating actual records? converting records? deleting records? adding new records?
Are there duplicate records? Like, identical?

Question: I have heard Perl might be good at processing large strings/data sets, so I am considering going this route. I do question its database efficiency, though, having not had much experience with Perl.

I may also consider C++ for this pre-processing.

Are you proficient with C++? If you haven't mastered it, you are in for creating a secondary problem by choosing this path.
What you're trying to do sounds like a problem that can just as easily be solved by a scripting language like Perl; maybe throw hardware at it.

Sort of a dichotomy, though: everyone expects this to demonstrate how bad the system they're making us use is... as an example of how it can't work because it's so ill-conceived... but I'm supposed to build it to work long term.
This sentence, combined with the phrase 'reasons of politics', somehow makes me think the generals are looking for scapegoats and are prepared to throw fancy words around to either win a contract or destroy someone.

If I understand you correctly, the solution would be to
1. Ready access to both databases, with any conversion features loaded up and ready. Establish fast network links, or move the source and target physically so that a fast connection can be established.
2. Lock all databases (this will of course be scripted, with strict logging).
3.1 The comparison part. Assuming you have no unique keys or timestamp:
3.2 Decide to either process the 'updates' list and merge it with the existing old records, or find another way to handle this duality.
3.3 If the number of tables and columns is sane, simply compare record-by-record, field-by-field, minding the original and target data types so as not to perform a type cast that would disrupt foreign key relationships.
3.4 Assuming both databases are still locked, at the very least the number of records in the two DBs should match. If you want to go crazy, and the volume of data allows it, re-read all records from the old database, cast them to something like strings, and make a hash. Do the same with the newly created database and compare hashes.
4. File notice, flee state :D

Of course you should perform test runs to determine the approximate time the lock will have to be in place.
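The hash check in step 3.4 could look roughly like this. Python for illustration only; the row-to-string casting and the sample rows are assumptions, not your real schema:

```python
import hashlib

def table_fingerprint(rows):
    """Hash an entire table: cast each row to a canonical string,
    then feed the sorted row strings into one digest. Sorting makes
    the fingerprint independent of the order rows come back in."""
    digest = hashlib.sha256()
    for row_str in sorted("|".join(str(v) for v in row) for row in rows):
        digest.update(row_str.encode("utf-8"))
    return digest.hexdigest()

# Hypothetical extracts from the old and new databases.
old_rows = [(1, "alpha"), (2, "beta")]
new_rows = [(2, "beta"), (1, "alpha")]   # same data, different order

print(table_fingerprint(old_rows) == table_fingerprint(new_rows))  # True
```

One fingerprint per table tells you only *that* something differs, not *what*; hashing per record (or per chunk of records) narrows the search at the cost of more hashes to store and compare.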
There are lots of people here (me excluded) who would surely guide you if you broke the problem down further and described things: how many records, etc.
 