Data Mining Setup - Cluster or Not?

Isara (n00b, joined Jan 29, 2007, 7 messages)
[H]ardForum,

I run a charity organization in Thailand, and we want to build a small search engine (local students will maintain it). To start, we will build a setup that just data mines 24/7. Once that's stable, we will build a second setup to handle search queries. The problem is that I'm not sure which design would give the best results. Would a cluster be better, or would one or two Xeon servers be best? Here's what I was planning:

Data Mining - 10 nodes (each with a PIII 1 GHz, 512 MB RAM, a 250 GB disk, and a 10/100 NIC) in a Linux cluster, crawling the web with Nutch.

Search Queries - 30 nodes (each with a PIII 1 GHz, 1 GB RAM, a 250 GB disk, and a gigabit NIC) in a cluster.
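
For reference, here is a rough tally of what the two planned clusters add up to. It is only arithmetic on the figures above, assuming the 250 GB figure is disk per node and using decimal units (1 TB = 1000 GB). The `ClusterPlan` class and its field names are made up for this sketch.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical helper: per-node specs taken from the post."""
    name: str
    nodes: int
    ram_gb_per_node: float
    disk_gb_per_node: int  # assumption: the 250 GB figure is disk space

    def total_ram_gb(self) -> float:
        return self.nodes * self.ram_gb_per_node

    def total_disk_tb(self) -> float:
        # Decimal units: 1 TB = 1000 GB
        return self.nodes * self.disk_gb_per_node / 1000


crawl = ClusterPlan("data mining", nodes=10, ram_gb_per_node=0.5, disk_gb_per_node=250)
search = ClusterPlan("search queries", nodes=30, ram_gb_per_node=1.0, disk_gb_per_node=250)

for plan in (crawl, search):
    print(f"{plan.name}: {plan.total_ram_gb():.0f} GB RAM, "
          f"{plan.total_disk_tb():.1f} TB raw disk")
```

These are raw totals only; any replication or spare capacity would reduce the usable space.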

The data mining is probably the easiest part of the problem, since speed is not an issue there. My main concern is how to get the fastest results for queries. Would a cluster, with all the data spread evenly across 30 nodes, be quicker than a HUGE disk array of 3-4 TB?
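
To make the cluster option concrete, here is a minimal sketch of what I mean by "data spread evenly across 30 nodes": each page is assigned to one node by hashing its URL, a query is sent to every node in parallel, and the partial results are merged. Everything here is hypothetical (the `Shard` class, `shard_for`, the term-count scoring), and the shards live in one process for illustration. It is not how Nutch itself does it, just the general scatter-gather idea.

```python
import heapq
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 30  # size of the planned search cluster


def shard_for(url: str) -> int:
    """Pick a node for a page by hashing its URL (stable across runs)."""
    return zlib.crc32(url.encode("utf-8")) % NUM_NODES


class Shard:
    """Stand-in for one node: a tiny in-memory inverted index."""

    def __init__(self):
        self.index = {}  # term -> {url: occurrences of term in that page}

    def add(self, url, text):
        for term in text.lower().split():
            postings = self.index.setdefault(term, {})
            postings[url] = postings.get(url, 0) + 1

    def search(self, term, k):
        # Local top-k, scored by raw term count (a deliberately crude score).
        postings = self.index.get(term.lower(), {})
        return heapq.nlargest(k, ((count, url) for url, count in postings.items()))


def build(docs):
    """Distribute pages over the shards; each page lands on exactly one."""
    shards = [Shard() for _ in range(NUM_NODES)]
    for url, text in docs.items():
        shards[shard_for(url)].add(url, text)
    return shards


def search(shards, term, k=10):
    """Scatter the query to all shards in parallel, then merge the top-k."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda shard: shard.search(term, k), shards)
        hits = [hit for part in partials for hit in part]
    return heapq.nlargest(k, hits)


docs = {
    "http://example.org/a": "rice rice rice harvest",
    "http://example.org/b": "rice farm",
    "http://example.org/c": "school library",
}
shards = build(docs)
print(search(shards, "rice", k=2))
```

In this layout each node only scans its own slice of the index, so the per-query work is divided by the node count, at the cost of one network round trip to every node and a merge step. The single big array avoids the fan-out but puts all the disk reads behind one machine.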

With a small budget, how would you do this? Thanks for any advice you can provide,
PK
 