Coldblackice
[H]ard|Gawd
This is a simple project, but I feel that my methods are inefficient and kludgy:
(EDIT: I'm also open to suggestions of using other languages, it doesn't have to just be PHP)
(EDIT pt II: To anyone anxious to call foul over the scraping and suggest a weather service API instead -- weather providers' data is irrelevant here; these are local data-collection posts provided by schools in the town, and that's specifically what matters. Additionally, I have full permission to collect the data from all involved, and it was decided at the project's start that a scraping approach would be easiest, since the framework is already in place, with the respective schools/posts continuously updating their temperature data via their own websites.)
-A town's current weather stats are continuously updated on a few different websites
-I've written a PHP script that resides on my desktop and is run in a browser; it scrapes the stats off these remote sites and dumps them into a local MySQL database, also on my desktop.
-To keep this process looping, I've altered my main PHP configuration to set the max script execution time to 10 minutes, and in the head section of the HTML page that wraps this PHP script I've set <meta http-equiv="refresh" content="348;url=test.php">. So the PHP script loops for a certain time (divided up using sleep()'s), with the HTML refreshing at a separate interval.
(Besides the poor coordination between these two separate "timers", this feels like a really kludgy and half-cocked approach to me, but I don't know what a more veteran programmer would do. The end result works fine, but for my own programming development I'd like to see how more seasoned programming minds would approach this.)
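For reference, here's a stripped-down sketch of the kind of loop I mean (the actual URLs, the regex, and the table/column names below are placeholders, not the real sites or schema):

[code]
<?php
// Rough sketch of the scrape-and-store loop described above.
// URLs, the regex, and the table/column names are placeholders.
set_time_limit(600); // mirrors the 10-minute max execution time

// The surrounding page re-triggers the script on its own timer.
echo '<html><head><meta http-equiv="refresh" content="348;url=test.php"></head><body>';

$sources = [
    'school_a' => 'http://example.edu/weather.html',   // placeholder URL
    'school_b' => 'http://example.org/conditions.php', // placeholder URL
];

$db = new PDO('mysql:host=localhost;dbname=weather', 'user', 'pass');
$insert = $db->prepare(
    'INSERT INTO readings (source, temperature, recorded_at) VALUES (?, ?, NOW())'
);

// Loop a few passes within the script's allowed run time, sleeping
// between passes, then let the meta refresh restart the whole thing.
for ($pass = 0; $pass < 5; $pass++) {
    foreach ($sources as $name => $url) {
        $html = @file_get_contents($url);
        if ($html === false) {
            continue; // skip a source that didn't respond this pass
        }
        // Placeholder pattern: grab a number like "72.4" following "Temp:"
        if (preg_match('/Temp:\s*([\d.]+)/i', $html, $m)) {
            $insert->execute([$name, (float) $m[1]]);
        }
    }
    sleep(60); // wait a minute between passes
}

echo '</body></html>';
[/code]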
-The scraped stats are stored in a local MySQL server
-(Not yet implemented) Now the stored data needs to be displayed on an external web server, in table format. Initially, I just set the local PHP script to access a remote SQL server instead, so it would store the scraped stats remotely. Then, any time the main index page on that server is accessed, it would poll its SQL server for the stats and render them into a table in the browser for the user, on request.
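The display side would be something simple along these lines (again, the host, credentials, and table/column names here are just placeholders):

[code]
<?php
// Minimal sketch of the "display on request" page: pull the latest
// readings from the remote MySQL server and print them as an HTML table.
// Host, credentials, and table/column names are placeholders.
$db = new PDO('mysql:host=db.example.com;dbname=weather', 'user', 'pass');

$rows = $db->query(
    'SELECT source, temperature, recorded_at
       FROM readings
      ORDER BY recorded_at DESC
      LIMIT 50'
)->fetchAll(PDO::FETCH_ASSOC);

echo '<table border="1"><tr><th>Station</th><th>Temp</th><th>Time</th></tr>';
foreach ($rows as $row) {
    printf(
        '<tr><td>%s</td><td>%.1f</td><td>%s</td></tr>',
        htmlspecialchars($row['source']),
        $row['temperature'],
        htmlspecialchars($row['recorded_at'])
    );
}
echo '</table>';
[/code]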
It seems a bit half-cocked to have this program split in "two", with the scraper running locally while the database is remote. However, from my understanding, it wouldn't really be possible to have the scraper continuously running on a remote hosting server -- unless some local machine/client were continuously re-requesting a PHP file on the server. Any help/clarity/insight on this?
A bit more specific of a summary of the two hiccups I'm needing veteran insight on:
1. Without some manner of special access/permissions, is it typically possible to have a personal program/script running continuously on a hosting company's server? Or is the only way to run a script/PHP file to have a browser manually "trigger" it by accessing the file?
2. What workflow would you implement in this situation -- specifically, which parts would you do locally, and which remotely?
e.g., local scraper/local database/remote publishing -- have a local scraper store into a local SQL database, have the scraper copy its local db up to the server's db, and then have the server retrieve and tabulate the data on request (a rough sketch of that sync step is below, after these questions)?
Would you have a local scraper bypass any local databasing, and instead update a remote database directly?
Would a process that needs to run/loop continually be best done on a local machine (with full access/control), or would it be better (or even possible) to do it on a hosting account somehow? I'm not aware of how one could keep something continually running on a host without some kind of unadvertised special permissions or access -- maybe through an SSH session, possibly?
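Here's roughly what I'm picturing for the local-to-remote copy step from question 2 -- just a sketch, with the table/column names and the "newest timestamp" approach being assumptions rather than a settled design:

[code]
<?php
// Sketch of the "copy local db to the server's db" idea: push any rows
// the remote database hasn't seen yet. Connection details, table/column
// names, and the high-water-mark approach are placeholders/assumptions.
$local  = new PDO('mysql:host=localhost;dbname=weather', 'user', 'pass');
$remote = new PDO('mysql:host=db.example.com;dbname=weather', 'user', 'pass');

// Find the newest reading the remote side already has.
$since = $remote->query('SELECT MAX(recorded_at) FROM readings')->fetchColumn();
$since = $since ?: '1970-01-01 00:00:00';

// Pull anything newer from the local copy and push it up.
$pull = $local->prepare(
    'SELECT source, temperature, recorded_at FROM readings WHERE recorded_at > ?'
);
$pull->execute([$since]);

$push = $remote->prepare(
    'INSERT INTO readings (source, temperature, recorded_at) VALUES (?, ?, ?)'
);
foreach ($pull->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $push->execute([$row['source'], $row['temperature'], $row['recorded_at']]);
}
[/code]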