1 June 1999
What's Missing from Mac OS X Server?
Why Mac OS X isn't yet ready for High Availability applications.
First posted on the Macintouch web site.
Summary
With the recent advent of Mac OS X Server, combined with an emerging need where I work, I decided to take a look at file server options. We currently have a peer-to-peer system, without a dedicated central file server. I considered Mac OS X Server, SGI, Sun, Linux, and NT Server for providing file services. As it turns out, in my particular application, a small engineering office, the most important aspect of choosing a file server was addressing my superiors' legitimate concerns about centralizing the data. Our data is our productivity, and they wanted the data to always be available. The technical term for this requirement is High Availability. A critical part of high availability is reliability, and reliability depends on both the hardware and the operating system. Not surprisingly, cost was also critical to the selection of a server. My task was to find a system that met all our needs for less than $10,000. We do not technically require a 24x7 solution, but neither can we afford to have the server go down during the 10 hours a day, 5 days a week that it would be used. My employer will not tolerate the oh-too-common phrase, "the server is down." In short, reliability, not sheer speed, is king. After examining all the facts, I concluded that if the company cannot afford a proper file server, it is better to stick with the distributed arrangement we currently have. For various reasons that I will explain below, only the commercial Unix-based machines met all the requirements of a proper file server.
Driving all this research was the recent failure of one of the NT workstations, which had its first hard drive replaced just six months before. The estimated cost in lost productivity and retasking of personnel to fix the hardware can quickly exceed $1,000 per incident, even for a small office like the one where I work.
Before anyone gets upset, please note that several of the shortcomings in Linux and Mac OS X Server could be remedied by various open source software packages. Others could be solved by enterprising companies. Unfortunately, most of the software packages are not yet complete. Even if they were, the support issue becomes crucial in high availability: whom are you going to call when the software has a problem, as unlikely as that may be? And none of the microcomputers (Apple or Intel) have the inherent redundancy found in commercial Unix hardware. After all, the point is to avoid having people on staff who do nothing but play with the computers; we are a small office. For Mac OS X Server or Linux to seriously compete with the commercial Unix offerings, they still need significant work.
Hardware: The Missing Pieces
The benefit of having a file server is that it contains all your data in one place. The bad news is that you have all your data in one place. This means that if the server has a problem, you have the entire office down, not just one or two people, as in the case of a failed workstation. One of the worst assumptions that a company can make, in my opinion, is that the server will never have problems, or at least not the problems seen on workstations. Tape backups provide a safety net for when the server does go down, but it can take hours or days to diagnose, replace, and reinstall everything back to its original condition. The best solution for this company's demands is to do everything possible to keep the server from going down in the first place. The challenge is doing it without having to hire an entire support crew and while keeping it within the budget.
Mac OS X Server runs on readily available hardware, which helps keep initial costs down. However, what I found is that, as one of Murphy's laws says, you really do get what you pay for. The new G3 Power Macintosh computers are powerful and affordable, and they make wonderful workstations. But they lack some critical items found in purpose-built servers. Of note is that the G3 Server is really a desktop machine despite its billing as a file server. I've attempted to highlight some of the items where I feel the G3 Server is still lacking. Perhaps Apple or an enterprising third party will seize the opportunity to address these issues. Until they do, I cannot recommend OS X Server for this office.
1. Hardware Redundancy. In contrast to Apple's AIX-based servers of years ago, the new G3 servers do not have redundant power supplies, nor are their power supplies easily swappable. On some of the commercial Unix servers, in addition to quickly replaceable power supplies, there are environmental sensors and redundant, variable-speed cooling fans. No such options exist for the G3 hardware.
2. RAID. As far as I have been able to determine, there is no support for RAID built into the Mac OS X Server software. While it is true that external hardware-controlled, platform-independent arrays can be purchased, these add several thousand dollars to the cost of the server. Furthermore, there are questions as to whether OS X will be able to boot from external RAIDs; if not, then there is the question of ensuring the integrity of the system (OS and boot) disk. With the commercial Unix vendors, the systems provide for hot-swappable internal drives, and RAID in the OS. This allows us to make a clone of the system disk, once configured, and then store it in a safe location. Should the system disk go bad, the system can be shut down, the cloned drive installed, and the system be back online in less than half an hour. OS X Server lacks this feature.
3. MP/CPU. There is currently no multiprocessor option available for the OS X Server platform. This affects both scalability (see item 4) and reliability. The purpose-built Unix servers, when configured with multiple processors, will detect CPU errors and restart the system with only a single processor, all automatically. In a single-processor configuration, if the CPU goes down, the entire system goes down.
4. Scalability. Scalability refers to the ability of the system to increase in capability as the company's needs grow. With the G4 chip on the horizon, there are questions about whether just a CPU swap will increase the capabilities of an OS X Server machine, or whether the entire machine will have to be replaced. There also appears to be a lack of clustering software which might allow two machines to act as a pseudo-multiprocessing machine. There are open source projects attempting to provide clustering solutions (http://eddieware.serc.rmit.edu.au/ and http://www.mosix.cs.huji.ac.il/), but neither appears finished, and neither is available for OS X Server.
5. Failsafe/Failover. At least one of the commercial Unix companies, SGI, offers a failsafe system. This allows two independent servers to be connected via a high speed, dedicated cable, and act as one machine. Either one of the two machines could completely fail, but there would be no interruption in the network services. This approach is beyond the budget for our company, but it could be added in the future, under different budget years, without obsoleting what we purchase this year. There is an open source project attempting to do something similar (http://eddieware.serc.rmit.edu.au/), but it is still in development and it is specific to serving web sites, not files. The difference is that whereas a file server may be updating hundreds or thousands of files at any given moment, this open source solution doesn't seem to address the instantaneous update of the other servers. In a failover situation, it appears that at least some data would be lost.
6. Cost. Since this company is installing a new file server, as opposed to replacing one that has reached capacity, there is no reason for us not to consider purchasing used equipment, as long as it will be fully supported by the manufacturer. If we add the missing items to the G3/OS X Server solution, the cost quickly approaches the $10,000 limit we have set. There is also the issue of support (see item 7, below).
7. Support. Since the company cannot afford a failsafe system, the next best thing is a guaranteed turn-around time for service. In the unlikely event that the hardware should fail, the commercial Unix vendors offer service contracts where they will be on-site in less than 4 hours and repair the problem. The contract also includes the OS upgrades, and is priced at less than $1,700 per year, much less than the cost of employing a full-time support person. This approach also allows us to devote our computer-related time to installing software and maintaining the workstations.
8. ECC RAM. Error-checking and correcting memory. Data integrity is of utmost importance for our company. With ECC RAM in the commercial Unix servers, the computer can not only verify the integrity of information in RAM, but it can block off any portions of memory that it determines are bad, while continuing to serve data. There is no mention of ECC RAM support for the G3 hardware that I could find.
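The redundancy items in the list above (power supplies, mirrored disks, failover pairs) rest on the same arithmetic: a redundant set is down only when every member is down at once. The following is a minimal Python sketch of that reasoning, not a vendor calculation. It assumes failures are independent, and the 99% figure is a hypothetical illustration, not a measured value for any system discussed here.

```python
def parallel_availability(unit_availability: float, copies: int) -> float:
    """Availability of `copies` redundant units, any one of which suffices.

    Assumes independent failures (a simplification: a shared power
    outage or a fire would take out every copy at once).
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one unit")
    # The set is unavailable only if every copy is down at the same time.
    all_down = (1.0 - unit_availability) ** copies
    return 1.0 - all_down


# Hypothetical figure: a single component that is up 99% of the time.
single = parallel_availability(0.99, 1)
mirrored = parallel_availability(0.99, 2)

print(f"single unit:   {single:.4%}")
print(f"mirrored pair: {mirrored:.4%}")
```

Under these assumptions, each added copy multiplies the chance of a total outage by the single-unit failure probability, which is why purpose-built servers duplicate the parts most likely to fail.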
As impressed as I am with OS X Server and Apple, I cannot recommend OS X Server as a solution for file serving at a company where time is billed hourly on projects, and downtime means lost revenues. Each of the items I have discussed may seem minor, but they are really a testament to the level of detail incorporated into commercial Unix servers. Worth noting is that none of the Intel-based hardware I briefly examined offers any of these items, regardless of the operating system, also disqualifying them for file serving. Each of the items above must be present to ensure maximum reliability of the server, which is critical in a High Availability application.
Conclusion
There is a whole variety of options available to us. It all comes down to how much reliability you need and how much you can afford. For my office, the recommendation I made was for this system:
Used SGI Origin 200 (demo equipment)
Twin 180MHz processors, each with 1MB cache
256MB ECC RAM
4GB system disk
2x9GB data disks, to be mirrored in software
IRIX 6.5 Server OS
FullCare maintenance contract (first year of on-site support)
The cost? Well, the total is confidential, but it was under the $10,000 limit we had set. To be fair, the cost of this equipment new would be substantially more. But there seems to be a good selection of used equipment. However, if I were restricted to new only, I could still probably keep the total under $15,000. Not exactly chump change, but when you consider the cost of downtime and full-time IT personnel, it is a very competitive offering. If the company ultimately decides that this sort of setup is unaffordable, I have recommended that we retain the existing peer-to-peer network. This is preferable to placing all our eggs in one basket by purchasing a lesser system that presents a greater risk of downtime and data loss, and is potentially higher maintenance. OS X Server was disqualified from this budget-limited High Availability application because of its missing hardware features, its cost, and the OS itself. When considering the package as a whole, OS X Server isn't yet ready to compete against the commercial Unix vendors, even on the basis of price. I've explained what I see as the obstacles. Now it's up to Apple and Macintosh developers to solve them.
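The "cost of downtime" argument can be made concrete with the two figures quoted earlier: a support contract at less than $1,700 per year, and lost productivity that can quickly exceed $1,000 per incident. The sketch below is only a rough break-even estimate in Python. It assumes, as a simplification, that the contract avoids the full cost of each incident, and it treats both quoted bounds as if they were exact prices.

```python
import math

# Figures taken from the article; both are rough bounds, not exact prices.
CONTRACT_COST_PER_YEAR = 1700   # on-site support contract, "less than $1,700"
COST_PER_INCIDENT = 1000        # lost productivity, "can quickly exceed $1,000"


def break_even_incidents(contract_cost: float, incident_cost: float) -> int:
    """Smallest whole number of incidents whose cost covers the contract."""
    if incident_cost <= 0:
        raise ValueError("incident cost must be positive")
    return math.ceil(contract_cost / incident_cost)


incidents = break_even_incidents(CONTRACT_COST_PER_YEAR, COST_PER_INCIDENT)
print(f"Contract pays for itself at {incidents} incident(s) per year")
```

Because the contract figure is an upper bound and the incident figure a lower bound, the real break-even point would be at or below this estimate.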
3 June 1999
High Availability and Mac OS X Server: Follow Up
First off, I would like to express my appreciation for all the thoughtful, informed comments I received. I believe this has been a good opportunity to both hear and be heard. The responses I received tended to fall into two categories, although it seemed that everyone agreed that the current Apple hardware isn't really appropriate for server duty. The responses were split evenly: some expressed agreement with the assessment, while others wondered why I hadn't looked further into certain combinations. I have attempted to clarify below the areas that seemed to be lacking in the original report.
High Availability
This is probably the foremost area where I should have given more explanation. High availability is about maximizing survivability and reliability. It is about paranoia. An operating system or platform's historic record for reliability is only a part of the equation. A more important part of high availability is planning for failures and how to deal with them. Some pointed out that being able to survive a CPU failure in an MP system sounded extreme. I agree, but that's part of the paranoia. The question isn't whether recovery from a CPU failure is necessary for an application, but only whether it is something that qualifies a system as being highly available.
When an employer expresses a desire for a system that has the highest survivability and reliability, it places a lot of responsibility on the employee. If I were a full-time IT person, I could specify lesser hardware running Linux (for example), because I would be available to service the system. However, I am first an engineer, and the management wants me spending as little time as possible on the computers. As a result, I have to specify the most reliable, most survivable system I can find within the budget constraints. This doesn't mean that other systems aren't reliable, just that they're not high availability systems. The other major issue for us was that purchasing a high availability system was going to cost approximately the same as the best offerings from Intel, which incidentally don't claim to be high availability systems.
Linux
The reservation I have about Linux is that I would be too tempted to be constantly tinkering with it. I also don't relish the thought of having to rebuild the kernel to incorporate new drivers, although I understand that is in the process of changing. Linux is a wonderful OS. I'm just not sure about purchasing $7,000-$10,000 worth of Intel hardware to run Linux instead of Novell or Solaris x86, which are much more mature OSes. There's also not much in the way of commercial support.
NT
The only reason I mentioned the NT Workstation failure was to point out that we needed a file server. In that particular case, the failure had nothing to do with the OS, but only with the hardware. There are already raging debates about NT versus Unix, and I don't want to start such a war here. The decision to disqualify NT was based in part on our own observations of NT, which we feel is something less than bulletproof. One professional mentioned to me that NT's failure rate is 3% per month. That's livable for a workstation, but not for a server, in my opinion. There are other NT versus Unix issues, but I leave them up to the individual reader to discover (some are detailed at this site: http://www.unix-vs-nt.org/kirch/). In short, we elected to disqualify NT as an option.
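To see why a 3% monthly failure rate matters more for a server than it might sound, it helps to compound it over a year. The Python sketch below does that arithmetic. It assumes the secondhand 3% figure means an independent 3% chance of at least one failure in any given month, which is one interpretation and not something the source of the figure confirmed.

```python
def prob_at_least_one_failure(monthly_rate: float, months: int) -> float:
    """Chance of one or more failures over `months`, assuming each month
    is an independent trial with the same failure probability."""
    if not 0.0 <= monthly_rate <= 1.0:
        raise ValueError("monthly_rate must be between 0 and 1")
    if months < 0:
        raise ValueError("months must be non-negative")
    # Probability of surviving every month, then take the complement.
    survive_all = (1.0 - monthly_rate) ** months
    return 1.0 - survive_all


yearly = prob_at_least_one_failure(0.03, 12)
print(f"Chance of at least one failure in a year: {yearly:.1%}")
```

Under that assumption the yearly figure comes out at roughly 30%. On a single workstation that affects one person; on a central file server it affects the whole office.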
Solaris x86
The reservation I have about Solaris x86 is that I get a lukewarm feeling from Sun about it. My impression is that Sun would prefer you purchase their (very reputable, purpose-built) hardware rather than Intel. And in the price range in question, I would prefer to do that.
Novell
Novell is a wonderful NOS. Its stability and reputation are tremendous. My reservations about Novell stemmed partly from the Intel hardware issues. I also had concerns about specifying a system that might require a CNA or CNE for maintenance, especially if I ever moved on. The commercial Unix systems, as far as I can tell, look to be the lowest maintenance while providing the highest survivability.
Compaq or other Intel Hardware
As far as I can tell, Compaq's ProLiant server, which I will use for discussion purposes, seems to meet many of the hardware requirements I listed, including a 4-hour support option at a competitive cost. However, the support is for the hardware, not the system as a whole (which includes the OS). The initial cost and support for the OS would be additional costs. Is it good hardware? It sure looks like it. Is it high availability? I don't think it completely meets the requirements, specifically in the CPU and failover categories. Here too, a well-configured ProLiant system will fall in the $7,000+ category, which puts it in the same price category as the Origin 200. Again, reliability is the foremost consideration, with cost and performance being secondary requirements. Given these facts, the Origin wins out over the Intel hardware.
Summary
I originally wrote the article to share the experiences I had while researching the topic of high availability servers, and I directed myself at evaluating some of the areas in OS X Server that need improvement. I still don't have a clear idea where OS X Server will be best deployed, given the fundamental hardware issues. As one person pointed out, it's probably not in Apple's best interest to take on the high availability market we are discussing, so we may never see this sort of dedicated server hardware from Apple.
Now to the Intel option. There isn't necessarily anything wrong with the Intel solutions that many people pointed out. They simply aren't high availability systems, and they don't claim to be, from what I've seen. Nevertheless, I appreciate those people who have taken the time to kindly express their differences and further enlighten me. Thanks for all the input.
Page Last Updated Thu, Dec 23, 1999