[mythtv-users] Cheap SCSI scanners - was:The Bigger... Disk contest, Fall 2007 edition

Sun Oct 21 13:43:40 UTC 2007

On Sun, Oct 21, 2007 at 11:56:11AM +1300, Steve Hodge wrote:
> On 10/21/07, Jay R. Ashworth <jra at baylink.com> wrote:
> >
> > That datacenter buyers still use SCSI (and shortly, SAS) drives
> > exclusively, even over that 6:1 price disadvantage -- and the need to
> > sell it to PHBs -- tells me that the price differential exists for a
> > reason, or those guys would lose their jobs.
> 
> 
> Yet Google use IDE. That tells me that a lot of the people choosing SCSI are
> doing so purely because "that's the way it's always been done". Admittedly
> if you want 15k drives you have to choose SCSI. But in terms of reliability
> if there was really much evidence that SCSI was better we'd be able to point
> to studies that show that. Instead all we've got is a lot of anecdotes.

That is a severe oversimplification of a very complex topic. At a very
high, generic level there are two ways to build a highly available, very
scalable application. You can have a few pieces of really big,
expensive, reliable hardware, or you can have a large number of cheap,
less-reliable machines.

The second approach is what Google uses. They have so many machines that
if one fails (for any reason) there probably doesn't need to be a huge
rush to fix it. IDE/SATA is really the only way for them to go because
the per-unit price is much more significant than the cost of downtime
for a single unit. That's the whole reason their findings are of
interest: they have lots of drives from lots of manufacturers.

Not everybody can use that method. Not all applications can be run
across parallel machines - for example, many web applications can run
that way with little effort, but it's vastly more complex to make a
database run across a cluster. Even when your application can run across
a cluster, that creates other costs because you have to pay for space,
power, and cooling, which are very serious limiting factors for a lot of
datacenters today.

The most common level of redundancy I see is two power supplies, two
drives in a RAID1 mirror, two HBAs, two network cards, and two servers.
In many cases there is only one HBA and one network card, since those
rarely fail and a second server will keep you online while you get a
replacement. With only two servers, the cost (risk) of a downed server
is much more significant than it would be if you had a few racks' worth
of failover servers, so you spend extra money and get servers from a
big-name vendor that offers a 4-hour on-site support contract.
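
To put rough numbers on that, here is a minimal Python sketch. It is
only an illustration: the function names and the 99% per-server uptime
are my own made-up assumptions, and it assumes servers fail
independently of each other, which real hardware sharing a rack or a
power feed does not.

```python
from math import comb


def capacity_lost_by_one_failure(n_servers: int) -> float:
    """Fraction of total capacity lost when one of n identical servers fails."""
    if n_servers < 1:
        raise ValueError("need at least one server")
    return 1.0 / n_servers


def prob_at_least_k_up(n_servers: int, k_needed: int, p_up: float) -> float:
    """Probability that at least k of n servers are up at a given moment.

    Assumes every server is up with the same probability p_up and that
    failures are independent (a simplification).
    """
    return sum(
        comb(n_servers, i) * p_up**i * (1 - p_up) ** (n_servers - i)
        for i in range(k_needed, n_servers + 1)
    )


if __name__ == "__main__":
    P_UP = 0.99  # hypothetical per-server uptime, not a measured figure

    # A two-server pair loses half its capacity when one box dies;
    # a 100-machine farm loses one percent.
    print(capacity_lost_by_one_failure(2))
    print(capacity_lost_by_one_failure(100))

    # Chance that a two-server pair still has at least one box up.
    print(prob_at_least_k_up(2, 1, P_UP))
    # Chance that a 100-machine farm still has at least 90 boxes up.
    print(prob_at_least_k_up(100, 90, P_UP))
```

The point is not the exact figures but the shape: with two servers a
single failure takes away half your capacity and all of your remaining
redundancy, so paying to make each box more reliable (and faster to
repair) is easy to justify. With a hundred, one dead box barely
registers.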

SCSI vs. IDE is really a pretty large can of worms, and when it's all
said and done many companies don't actually have a choice. The rackmount
servers you buy from a big-name vendor frequently only support IDE or
SATA drives if they're at the bottom of the product line. In many cases
those won't be hot-swap and won't have hardware RAID, meaning a single
drive failure has a much greater chance of causing downtime and will
always require downtime to replace. IDE/SATA drives are simply not an
option unless you're willing to accept the much larger sacrifices that
come with those low-end servers. If you have a dozen more machines to
take the load, that's not a big deal. If you have one or none, it's a
lot more significant.

-- 
Michael Heironimus