(...talk about unclear title....)
I have a tough task given to me at work:
I have a database dump with over 100,000 records of clients' contact information (email, name, company) and the services they subscribe to. Because each service has its own signup, there are many duplicates in the data.
My job consists of:
- clean up the data: find all unique individuals (eliminate duplicates). While email address is the best clue, it should not be the only criterion, because people use multiple email addresses. The cleanup should be manual, that is, I make the decisions about who is who, but I need to build a system that automatically searches through the data and makes intelligent recommendations to me. I don't want to scroll all day in Notepad
- keep track of what gets changed in the data (and why), and of where each change sits in the original database, so that it can be referenced later
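For illustration, the recommendation step described above might look something like the following Python sketch. It is only a rough outline under stated assumptions: the sample records, the field names, the 0.6/0.4 weights and the 0.8 threshold are all made up, and the standard-library difflib module stands in for a proper fuzzy-matching tool.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical sample records; the real ones would come from the database dump.
records = [
    {"id": 1, "email": "john.smith@acme.com", "name": "John Smith", "company": "Acme Inc"},
    {"id": 2, "email": "jsmith@gmail.com", "name": "Jon Smith", "company": "ACME Inc."},
    {"id": 3, "email": "mary.jones@globex.com", "name": "Mary Jones", "company": "Globex"},
]


def similarity(a, b):
    """Return a 0..1 similarity ratio for two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def score(r1, r2):
    """Score a pair of records: identical email wins outright, otherwise
    combine name and company similarity (the weights are an assumption)."""
    if r1["email"].lower() == r2["email"].lower():
        return 1.0
    return 0.6 * similarity(r1["name"], r2["name"]) + 0.4 * similarity(r1["company"], r2["company"])


def suggest_duplicates(records, threshold=0.8):
    """Return (score, id_a, id_b) for every pair at or above the threshold,
    best matches first. The human reviewer still makes the final decision."""
    pairs = []
    for r1, r2 in combinations(records, 2):
        s = score(r1, r2)
        if s >= threshold:
            pairs.append((round(s, 2), r1["id"], r2["id"]))
    return sorted(pairs, reverse=True)


suggestions = suggest_duplicates(records)
for s, id_a, id_b in suggestions:
    print(f"Records {id_a} and {id_b} may be the same person (score {s})")
```

Comparing every pair like this would not scale to 100,000 records (roughly 5 billion comparisons), so a real run would probably first group records by something cheap, such as email domain or the first letters of the surname, and only compare within each group.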
--------------------------------
I tried working in Excel, but it was too limited in power: 100,000 clients x 60 services = 6,000,000 cells, and Excel freezes for a few seconds whenever I make any change in the worksheet. In addition, I don't think there is an easy way to build a workflow automation system inside Excel.
Then I decided to put the data in a MySQL database. It is fast, but I don't know any scripting/programming (I am learning Perl as fast as I can). I plan to set up an additional database that will keep track of the changes I make to the data and why I make them.
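As a rough idea of what that additional change-tracking database could hold, here is a minimal Python sketch. Everything in it is an assumption for illustration: the change_log table, its column names, the contacts table name and the log_change helper are hypothetical, and the standard-library sqlite3 module is used as a stand-in so the example runs without a MySQL server.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the separate MySQL tracking database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE change_log (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        source_table  TEXT NOT NULL,     -- table in the original database
        source_row_id INTEGER NOT NULL,  -- row in the original database
        column_name   TEXT NOT NULL,     -- what was changed
        old_value     TEXT,
        new_value     TEXT,
        reason        TEXT NOT NULL,     -- why the change was made
        changed_at    TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")


def log_change(conn, source_table, source_row_id, column_name, old_value, new_value, reason):
    """Record one change, with a pointer back to the original row and the reason."""
    conn.execute(
        "INSERT INTO change_log "
        "(source_table, source_row_id, column_name, old_value, new_value, reason) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (source_table, source_row_id, column_name, old_value, new_value, reason),
    )
    conn.commit()


# Example: record that contact 2 was judged to be the same person as contact 1.
log_change(conn, "contacts", 2, "merged_into", None, "1",
           "Same person as record 1: near-identical name and company")

rows = conn.execute(
    "SELECT source_table, source_row_id, column_name, new_value, reason FROM change_log"
).fetchall()
for row in rows:
    print(row)
```

Keeping the original table and row id in every log entry is what makes it possible to trace a decision back to the source data later.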
---------------------------------
QUESTION:
Since I don't know much programming and cannot write a program myself, could you recommend any kind of program/platform where I can set up tables, buttons, shortcuts, and macros driving them, so that I can have an automated system that searches through the database and gives me the closest matches, while I make the decision about which contacts are the same person? The program, obviously, should be able to read from and write to MySQL.
Sorry for the long post.
Any help will be greatly appreciated!
Thanks!