Red Hat High Availability (in Toronto)

CN Tower
Last week I had the opportunity to attend a session of the four-day class, Red Hat Enterprise Clustering and Storage Management (RH436), held in Toronto.  It was a busy week, and that lovely view of the CN Tower above, as seen from my hotel room window, had to suffice for experiencing the city.  Fortunately I’ve been here before, looked through the Glass Floor, and generally done the tourist thing.  So let’s get down to business.

Purpose Of The Trip
At the radiology practice where I work, we’ve long relied on Red Hat Enterprise Linux as the operating system underpinning our PACS (Picture Archiving and Communication System), which is the heart of our medical imaging business.  For a while, most of the rest of our back-end systems ran atop the Microsoft family of server products, just as our workstations and laptops run Microsoft Windows Professional.  But over the last couple of years, the Microsoft-centric approach has gradually started to shift for us, as we build and deploy additional solutions on Linux.  (The reasons for this change have a lot to do with the low cost and abundance of various open-source infrastructure technologies as compared to their Microsoft licensed equivalents.)  As we build out and begin to rely on additional applications running on Linux, we have to invest time in making these platforms as reliable and fault-tolerant as possible.

Fault Tolerance, Generally Speaking
The term ‘fault tolerance’ is fairly self-explanatory, though in practice it can cover a substantial amount of ground where technical implementations are concerned.  Perhaps it’s best thought of as eliminating single points of failure everywhere we can.  At my employer, and perhaps for the majority of businesses our size and larger, there’s already a great deal of fault tolerance underneath any new ‘server’ that we deploy today.  For starters, our SAN storage environment includes fault tolerant RAID disk groups, redundant storage processors, redundant Fibre Channel paths to the storage, redundant power supplies on redundant electrical circuits, etc.  Connected to the storage is a chassis containing multiple redundant physical blade servers, all running VMware’s virtualization software, including their High Availability (HA) and Distributed Resource Scheduler (DRS) features.  Finally, we create virtual Microsoft and Linux servers on top of all this infrastructure.  Those virtual servers get passed around from one physical host to another – seamlessly – as the workload demands, or in the event of a hardware component failure.  That’s a lot of redundancy.  But what if we want to take this a step further, and implement fault tolerance at the operating system or application level, in this case leveraging Red Hat Enterprise Linux?  That is where Red Hat clustering comes into play.

Caveat Emptor
Before we go any further, we should note that Red Hat lists the following prerequisites in their RH436 course textbook: “Students who are senior Linux systems administrators with at least five years of full-time Linux experience, preferably using Red Hat Enterprise Linux,” and “Students should enter the class with current RHCE credentials.”  Neither of those applies to me, so what you’re about to read is filtered through the lens of someone who is arguably not in the same league as the intended audience.  Then again, we’re all here to learn.

What Red Hat Clustering Is…
In Red Hat parlance, the term ‘clustering’ can refer to multiple scenarios, including simple load-balancing, high-performance computing clusters, and finally, high availability clusters.  Today we’ll focus on the last of these, provided by Red Hat’s High Availability Add-On, an extra-cost module that starts at $399/year per 2-processor host.  With Red Hat’s HA add-on, we’re able to cluster instances of Apache web server, a file system, an IP address, MySQL, an NFS client or server, an NFS/CIFS file system, OpenLDAP, Oracle 10g, PostgreSQL, Samba, a SAP database, Sybase, Tomcat or a virtual machine.  We’re also able to cluster any custom service that launches via an init script, and which returns status appropriately.  Generally speaking, a clustered resource will run in an active-passive configuration, with one node holding the resource until it fails, at which time another node will take over.
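
For anyone who thinks better in code, here is a toy Python sketch of that active-passive idea.  It is only an illustration of the concept, not how Red Hat’s cluster software is implemented, and the class, service and node names are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ActivePassiveService:
    """Toy model of one clustered resource: one node owns it at a time."""
    name: str
    nodes: list                      # failover order (hypothetical node names)
    failed: set = field(default_factory=set)

    @property
    def owner(self):
        # The owner is the first node in the failover order that has not failed.
        for node in self.nodes:
            if node not in self.failed:
                return node
        return None  # no healthy node left, so the service is down

    def fail(self, node):
        # Mark a node as failed; ownership moves to the next healthy node.
        self.failed.add(node)


web = ActivePassiveService("apache", ["node1", "node2", "node3"])
print(web.owner)   # node1 holds the resource
web.fail("node1")
print(web.owner)   # node2 has taken over
```

The passive nodes do nothing with the resource until the active one fails, which is why the shared storage discussed below matters so much: whichever node takes over needs to see the same data.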

…And What Red Hat HA Clustering Is Not
Less than two weeks prior to the RH436 class, I somehow managed to get through a half-hour phone conversation with a Red Hat Engineer without touching on one fundamental requirement of HA that, when later identified, shaped my understanding of Red Hat clustering going forward.  So perhaps the following point merits particular attention: Any service clustered via Red Hat’s HA add-on that also uses storage – say Apache or MySQL – requires that the cluster nodes have shared access to block-level storage.  Let’s read it again: Red Hat’s HA clustering requires that all nodes have shared access to block-level storage, the type typically provided by an iSCSI or Fibre Channel SAN.  Red Hat HA passes control of this shared storage back and forth among nodes as needed, rather than having some built-in facility for replicating a cluster’s user-facing content from one node to another.  For this reason and others, we can’t simply create discrete Red Hat servers here and there and combine them into a cluster, with no awareness of, nor regard for, our underlying storage and network infrastructure.  Yet before anyone goes dismissing any potential use cases out of hand, remember that like much of life and technology, the full story is always just a bit more complicated.

Traditional Cluster
Let’s begin by talking about how we might implement a traditional Red Hat HA cluster.  The following steps are vastly oversimplified, as a lot of planning is required around many of these actions prior to execution.  We’re not going to get into any command-line detail in today’s discussion, though that would make for an interesting post down the road.

  • We’ll begin with between two and sixteen physical or virtual servers running Red Hat Enterprise Linux with the HA add-on license.  The servers must support power fencing, a technology that allows a surviving node to keep a failed node from writing to shared storage by shutting the failed node down.  This is supported on physical servers by Cisco, Dell, HP, IBM and others, and is also supported on VMware.
  • We’ll need one or more shared block-level storage instances accessible to all nodes, though only one node uses a given instance at a time.  In a traditional cluster, we’d make this available via an iSCSI or Fibre Channel SAN.
  • All nodes must be on the same network segment in the same address space, though it’s wise to isolate cluster communication to a separate VLAN from published services.  Multicast, IGMP and gratuitous ARP must be supported on these segments, and there can be no traditional layer 3 routing separating one cluster node from another.
  • We’d install a web-based cluster management application called Luci on a non-cluster node.  We’re not concerned about fault-tolerance of this management tool, as a new one can be spun up at a moment’s notice and pointed at an existing cluster.
  • Then we’d install a corresponding agent called Ricci (or likely the more all-encompassing “High Availability” and “Resilient Storage” groups from the Yum repository) on each cluster node, assign passwords, and set them to start on boot.
  • At this point we’d likely log into the Luci web interface, create a cluster, add nodes, set up fencing, set up failover, create shared resources (like an IP address, a file system or an Apache web service) and add those resources to a service group.  If that sounds like a lot, you’re right.  We could spend hours or days on this one bullet the first time around.
  • Before we declare Mission Accomplished, we’ll want to restart each node in the cluster and test every failover scenario that we can think of.  We don’t want to assume that we’ve got a functional cluster without proving it.
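
A few of the prerequisites in the list above are mechanical enough to check before anyone opens Luci.  The following Python sketch is one way that might look.  It is not a Red Hat tool; the function name, the dictionary keys and the addresses are all hypothetical, and it only covers node count, network segment and fencing support.

```python
import ipaddress


def preflight(nodes, cluster_network):
    """Return a list of problems; an empty list means these basic checks passed.

    nodes: list of dicts with hypothetical keys "name", "ip" and "fencing".
    cluster_network: CIDR string for the shared segment, e.g. "10.0.0.0/24".
    """
    problems = []
    network = ipaddress.ip_network(cluster_network)

    # The cluster described above has between two and sixteen nodes.
    if not 2 <= len(nodes) <= 16:
        problems.append(f"need 2-16 nodes, got {len(nodes)}")

    for node in nodes:
        # Every node sits on the same segment, with no layer 3 routing between.
        if ipaddress.ip_address(node["ip"]) not in network:
            problems.append(f"{node['name']} is outside {cluster_network}")
        # Every node must support power fencing.
        if not node["fencing"]:
            problems.append(f"{node['name']} has no power fencing")

    return problems


# Made-up inventory: node3 is on the wrong segment and cannot be fenced.
nodes = [
    {"name": "node1", "ip": "10.0.0.11", "fencing": True},
    {"name": "node2", "ip": "10.0.0.12", "fencing": True},
    {"name": "node3", "ip": "10.0.1.13", "fencing": False},
]
for problem in preflight(nodes, "10.0.0.0/24"):
    print(problem)
```

A clean result here says nothing about multicast, shared storage or whether failover actually works, so it complements the testing in the last bullet rather than replacing it.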

What About Small Environments Without a SAN?
It’s conceivable that someone might want to cluster Red Hat servers in an environment without a SAN at all.  Or perhaps one has a SAN, but they’ve already provisioned the entire thing for use by VMware, and they’d rather not start carving out LUNs to present directly to every new clustered scenario that they deploy.  What then?  Well, there are certainly free and non-free virtual iSCSI SAN products including FreeNAS, Openfiler and others.  Some are offered in several forms including a VMware VMDK file or virtual appliance.  They can be installed and sharing iSCSI targets in minutes, where previously we had none.  Some virtual iSCSI solutions even offer replication from one instance to another, analogous to an EMC MirrorView or similar.  In addition to eliminating yet another single point of failure, SAN replication provides a bit of a segue into what we’re going to talk about next.

What About Geographic Fault Tolerance?
As mentioned early on, at my office we already have several layers of fault tolerance built into our computing environment at our primary data center.  When looking into Red Hat HA, our ideal scenario might involve clustering a service or application across two data centers, separated in our case by around 25 miles, 1 Gbit/s of network bandwidth and a 1 ms response time.  Can we do it, and what about the shared storage requirement?  Fortunately Red Hat supports certain scenarios of Multi-Site Disaster Recovery Clusters and Stretch Clusters.  Let’s take a look at a few of the things involved.  Be aware that there are other requirements.

  • A Stretch Cluster, for instance, requires the data volumes to be replicated via hardware or 3rd-party software so that the nodes at each site have access to a replica.
  • Further, a Stretch Cluster must span no more than two sites, and must have the same number of nodes at each location.
  • Both sites must share the same logical network, and routing between the two physical sites is not supported.  The network must also offer LAN-like latency that is less than or equal to 2 ms.
  • In the event of a site failure, human intervention is required to continue cluster operation, since a link failure would prevent the remaining site from initiating fencing.
  • Finally, all Stretch Clusters are subject to a Red Hat Architecture Review before they’ll be supported.  In fact, an Architecture Review might be a good idea in any cluster deployment, stretch or not.
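
To see how our own two-site scenario stacks up against the points above, here is a small Python sketch that encodes just those listed requirements.  The function and parameter names are hypothetical, the node counts are invented for the example, and passing these checks would not make a design supported: the other requirements and the Architecture Review still apply.

```python
def stretch_cluster_issues(nodes_per_site, latency_ms,
                           routed_between_sites, replicated_storage):
    """Check a design against the Stretch Cluster points listed above.

    nodes_per_site: dict mapping a site name to its node count.
    Returns a list of issues; an empty list means none of these checks failed.
    """
    issues = []
    if len(nodes_per_site) > 2:
        issues.append("spans more than two sites")
    if len(set(nodes_per_site.values())) > 1:
        issues.append("sites have unequal node counts")
    if latency_ms > 2:
        issues.append("latency exceeds 2 ms")
    if routed_between_sites:
        issues.append("routing between sites is not supported")
    if not replicated_storage:
        issues.append("data volumes are not replicated to each site")
    return issues


# Our two data centers: about 1 ms apart, with an assumed two nodes at each
# site, one shared logical network and replicated storage.
print(stretch_cluster_issues(
    nodes_per_site={"primary": 2, "secondary": 2},
    latency_ms=1,
    routed_between_sites=False,
    replicated_storage=True,
))
```

Under those assumptions the function returns an empty list, which is encouraging but only a starting point for the conversation with Red Hat.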

While many enterprise computing environments already contain a great deal of fault tolerance these days, the clustering in Red Hat’s High Availability Add-On is one more tool that Systems Administrators may take advantage of as the need dictates.  Though generally designed around increasing the availability of enterprise workloads within a single data center, it can be scaled down to use virtual iSCSI storage, or stretched under certain specific circumstances to provide geographic fault tolerance.  In today’s 24×7 world, it’s good to have options.

Transformers in the Data Center

EMC VNX5400 Powered By 120 Volts AC

At my day job, we recently purchased an EMC VNX5400 SAN disk array for use at a secondary data center.  For any readers who aren’t aware, the VNX5400 is a complex and highly customizable piece of equipment that typically costs tens of thousands of dollars or more.

In an attempt to do proper due diligence, we asked well in advance about the specific power requirements of this system.  And we were told that two pre-existing 120 Volt, 20 Amp circuits were more than sufficient for the purpose.  The array is capable of running on 200 – 240 Volts as well, but those circuits have a higher recurring monthly cost.  We don’t use them unless we need to, and we haven’t needed to yet in this secondary location.

Installation day came.  Our EMC partner for this project – Par 4 Technology Group – arrived to rack, connect, power on and help configure the equipment.  Everything went predictably well as we mounted the equipment, save for a bent corner bracket that required straightening.  We ran into a problem, however, when we went to power on what EMC calls the Disk Processor Enclosure (DPE).  It simply wouldn’t power on.

This VNX5400 is a recently-released model, so the Par 4 team began looking for a power switch that might have been added to the configuration, or any other obvious explanation for what might be going on.  After all, the accompanying Disk Array Enclosures (DAEs) lit up just fine using the same circuits.  Soon it was time to call EMC.

As it turns out, EMC has an internal bulletin – not meant for end customers like us – in which they mention that, “our installation guides incorrectly stated that 100 – 120 Volts AC can be used.”  This DPE requires 200 – 240 V AC.

Now when I hear that we don’t meet the power requirements of an enterprise-class device, my first thought (and my only thought, prior to this experience) is that we need to get the higher-voltage circuits installed.  This would require an electrician, several hundred dollars in installation fees, and hundreds of dollars a month in recurring cost for the extra 200 V circuit(s).  Typically we’d install a pair of circuits for fault tolerance.  In addition to the unanticipated costs, we wouldn’t be able to get the SAN deployed on schedule, and would have to get the EMC partner back out at a later date.

But EMC’s internal bulletin didn’t end with the news of the change in power requirements.  It went on to specify that EMC had validated a particular power transformer in their lab and was recommending it to customers for use with certain VNX models like ours on 120 V circuits.

In this particular instance, EMC recommends a Hammond Manufacturing 176G autotransformer, available from Mouser for $181.24 each at the time of this writing, and less in quantity.  I placed an order from Mouser at 4:07 PM that day for a pair of Hammond Manufacturing 176G units with overnight shipping.  Total cost was $406, one time only.  By comparison, a new pair of 200 V circuits in this facility might cost our company between $10,000 and $15,000 for the first year, and every year after that.
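
The arithmetic is worth spelling out.  The short Python sketch below uses only the figures quoted above; the variable names are mine, and it assumes that whatever the order cost beyond the two transformers was shipping and related charges, and that the quoted circuit cost recurs unchanged each year.

```python
UNIT_PRICE = 181.24                 # one Hammond 176G, as quoted above
UNITS = 2
ORDER_TOTAL = 406.00                # total paid for the order
CIRCUITS_PER_YEAR = (10_000, 15_000)  # quoted yearly range for two new circuits

transformers = UNIT_PRICE * UNITS
# Assumption: the remainder of the order is shipping and related charges.
shipping_and_other = ORDER_TOTAL - transformers


def savings_after(years):
    """Low and high savings versus new circuits after a number of years."""
    low, high = CIRCUITS_PER_YEAR
    return (low * years - ORDER_TOTAL, high * years - ORDER_TOTAL)


print(f"Transformers: ${transformers:.2f}")
print(f"Shipping and other charges: ${shipping_and_other:.2f}")
for years in (1, 3, 5):
    low, high = savings_after(years)
    print(f"After {years} year(s): ${low:,.0f} to ${high:,.0f} saved")
```

For the first year alone that works out to between $9,594 and $14,594 saved, and the gap widens every year the circuits would otherwise have been billed.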

Anyone purchasing a step-up transformer should check its output receptacle in advance, as it may not match their existing power cord, so as not to be caught scrambling for adapters at the last minute.  Each 176G has a single NEMA 6-15R receptacle, so we needed a NEMA 6-15P to IEC C13 cable for each of our connections to the DPE.  We scrambled to accommodate this need too, before discovering that EMC had proactively shipped us two of these cables with the VNX5400 in a separate box.

Now I’m not about to suggest that there’s not a place for 200 – 240 V power in a data center.  I’ve been told by two data center operations guys along the way that they prefer to run their entire facility on 200 – 240 V, as it’s more efficient overall.  But in this particular instance, with a single device requiring the higher voltage, step-up transformers proved to be an inexpensive solution that was suitable for the task.  We got our new EMC VNX5400 SAN up and running on schedule in the two days allotted, using the existing 120 V circuits, with only a minimal additional one-time cost of $406.  I’m sure that our company won’t mind saving the $10,000 – $15,000 a year over what might otherwise be necessary.  And from this perspective, perhaps the lesson on transformers alone was $406 well spent.