Just to be clear what I'm talking about

Of course there's a variety of virtualization technology - including hardware abstractions like Java and .NET, disk emulation like TrueCrypt, and even the memory virtualization implemented by most operating systems.  However, I am looking at whole-system virtualization, which allows a complete operating environment to run on a simulated computer.  As such, I sometimes refer to hypervisors that do this as "operating systems for operating systems".

What does an operating system do anyway?

Conveniently, Wikipedia defines an operating system exactly the same way I was taught: “An operating system (OS) is software, consisting of programs and data, that runs on computers and manages the computer hardware and provides common services for efficient execution of various application software.” (The italics are mine, by the way.)  The idea is that applications can’t just grab control of the hardware whenever they like.  Instead, resources are doled out judiciously by an OS that tries to arbitrate simultaneous requests and restrain poorly written applications.  In return for a little less responsiveness, an OS increases overall efficiency and reliability.

Some OSes are better than others at avoiding computational anarchy: DOS was pretty awful, Windows (since 2002) has been pretty good, and an RTOS like QNX* is probably the best.  Depending on an application’s requirements, a particular OS may be more appropriate. Even DOS is good enough for single-task “Live CD” utilities.  QNX is probably best for software that makes your car go (and stop).  General-purpose operating systems like Unix, Linux, and Windows are more complicated, though.  They need to be updated, optimized, and reconfigured on a regular basis - and each one has a different set of applications written for it.  Therein lies the root of the problem.  If the latest versions of Windows or Linux ran every application perfectly, virtualization wouldn't really be necessary - after all, operating systems are supposed to be good at isolating and managing applications.  But no OS runs everything, and isolation can be difficult when an operating system actually encourages inter-process communication and resource sharing.

Dealing with multiple operating systems

While desktop software has mostly standardized on Windows and applications are usually forward-compatible, this is decidedly not the case in the datacenter.  Linux servers often host web sites, databases, message stores, and other services that may not exist, perform badly, or cost too much on Windows.  Custom software may also be “frozen” on a particular OS version due to special dependencies or a lack of updates.  Since PC-based systems are relatively inexpensive, single-purpose hosts have often been the way to deal with that: file & print services on Windows, a DBMS on a Unix host, a web server on a Linux box, etc.  It can even be argued (as I have) that this form of isolation and independence can offer the most reliability.  But there is a problem...

The problem is progress

Actually, progress is just fine; however, its two side effects are the problem:

The first is the effect of new hardware development: old hardware. Old hardware isn’t a problem itself - except that it tends to eventually become broken hardware that is hard to replace.  Software that ran perfectly for years may suddenly become homeless in such an event.  Even worse, it may have nowhere else to go if it relied on deprecated functions, unsupported interfaces, or a completely obsolete OS.

The second is Moore's Law*.  Having CPU resources increase geometrically doesn’t seem like a problem, except that you can only buy fast hardware.  Even if that 20-year-old OS/2 payroll system ran fine on a Pentium Pro system, the slowest replacement server will have 4 cores that each run over 10 times as fast.  Even assuming you got OS/2 to work on modern hardware, at least 90% of the processing power would be wasted - not to mention the acres of disk space and memory that would go unused.  Unless you want to develop more software for your OS/2 box, there must be another use for all that extra computing power.  A good answer is to run OS/2 in a virtual machine that emulates 20-year-old hardware - while the rest of the system's resources are used for modern applications.
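
To make that concrete, here's a rough sketch of how you might carve out such a machine with VirtualBox's VBoxManage command-line tool (one of the products I mention below), driven from a few lines of Python.  The VM name, OS type identifier, and sizes are illustrative assumptions - "VBoxManage list ostypes" will show what your installation actually supports:

    import subprocess

    VM_NAME = "legacy-os2"    # hypothetical name for the legacy guest
    OS_TYPE = "OS2Warp45"     # assumed VirtualBox OS type ID; verify with "VBoxManage list ostypes"

    def vbox(*args):
        # Run one VBoxManage subcommand and stop if it reports an error.
        subprocess.run(["VBoxManage", *args], check=True)

    # Register a new VM for the legacy guest...
    vbox("createvm", "--name", VM_NAME, "--ostype", OS_TYPE, "--register")
    # ...and give it only a sliver of a modern host: one CPU and 256 MB of RAM.
    vbox("modifyvm", VM_NAME, "--memory", "256", "--cpus", "1")

The remaining cores, memory, and disk stay available to the host for modern applications.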

An old invention lives on

As of 2017, the first commercial virtual machine is 45 years old. In fact, VM/370’s* successor, z/VM, is still running on modern IBM mainframes that host AIX, Linux, and MVS as guest OSes.  VM was a success mostly because big iron is expensive.  Nobody wants to buy a second mainframe for development, plus a third for testing. Low-cost isolation was important back in 1972. Even in production, virtualization made better use of hardware by letting it host multiple operating systems - each with an independent purpose. In the last decade, this rationale has made its way to PC-based systems.  If several different systems can be condensed onto a single new host, operating costs (power, HVAC, space, etc.) can be reduced by up to (n-1)/n of the previous hardware's costs - condensing five hosts onto one, for example, saves up to 4/5, or 80%.  Assuming old hardware platforms have to be replaced anyway, condensing their OSes and software can save capital costs by the same formula.  Just as on a System/370, testing and staging updates on PC virtual machines can ensure production-identical environments without buying identical hardware.  These are some of the reasons products like VMware, Hyper-V, VirtualBox, and Xen have become popular.

Isolation - it's not just for hermits

While I've mentioned isolation in the last few paragraphs, I didn't actually say why it's important.  The problem is that modern hardware and operating systems can run many applications simultaneously, but they also encourage resource sharing to optimize memory, storage utilization, and administration.  As long as every application uses the same version of Java, the same DLLs, the same network address, the same security credentials, etc., everything will run OK.  But should you happen to upgrade just ONE application that also updates a shared DLL, all hell could break loose.  That's not to say you can't localize the components being updated.  Similarly, you can use separate credentials for each application's admin, different addresses for each application, and private versions of other shared resources.  The problem is that it's just not "natural" for a general-purpose operating system; you have to forcibly isolate your applications - and then keep them that way.  Sometimes it's not even possible - for instance, there's only one Windows registry used by many applications.  A virtual machine, on the other hand, can run a dedicated operating system with a single application that can't be "stepped on" by other applications or their administrators.  It's the next best thing to having dedicated hardware when you want to protect critical software from interference.

The OS for OSes

Here is a recap of the forces driving virtualization:

  • Multiple operating systems are often found side-by-side in the datacenter.
  • Regular operating systems can't really isolate applications (and their administrators) from each other.
  • Old software needs to migrate to new hosts - but can’t.
  • New hardware tends to be powerful but underutilized.
  • People have always wanted to save money.

While this list is far from complete, I believe these are five “keystone” motivators.  Together, they have made the seemingly redundant "operating system for operating systems" worthwhile.  While most of us left “make believe time” back in nursery school, software virtual machines regularly pretend that they are pieces of hardware that almost any OS can use.  Hardware companies like Intel and AMD provided their “V” processor extensions (VT-x and AMD-V) to make this imitation hardware more efficient.  Many Linux distributions include a Xen hypervisor, at least partly because Linux users tend to be budget-conscious and need to maximize each server's usage.  Even Microsoft, a company with no great love for anything but Windows, gives away its Linux-capable Hyper-V to avoid being left out of this drive for optimum hardware utilization.
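
If you're curious whether your own machine advertises those extensions, here's a small sketch (assuming a Linux host) that just looks for the "vmx" (Intel VT-x) or "svm" (AMD-V) flags in /proc/cpuinfo:

    # Report which hardware virtualization extension, if any, the CPU advertises.
    # Assumes a Linux host, where the CPU flags appear in /proc/cpuinfo.
    def virtualization_extension(cpuinfo_path="/proc/cpuinfo"):
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags = line.split(":", 1)[1].split()
                    if "vmx" in flags:
                        return "Intel VT-x"
                    if "svm" in flags:
                        return "AMD-V"
        return None

    if __name__ == "__main__":
        print(virtualization_extension() or "No hardware virtualization extensions reported")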

So, it seems that the definition of an OS has changed; it only thinks it manages the hardware.  The hypervisor (or a type 2 hypervisor plus its host OS) is the real operating system - and it serves exactly one type of app: a delusional guest OS.  With the proliferation of hypervisors, perhaps 20 years from now the same forces pushing operating systems away from the real hardware will do the same to legacy hypervisors - resulting in virtual machines for old hypervisors to run on... OK, that's just silly.

Pros and Cons

Having looked at an absurd implication as well as valid reasons for virtualization, I offer a comparative look at some of virtualization's good and bad points (actually, the bad and then good points):

  • Problem: If you use a commercial virtualization product, you have to pay for it.
    On the other hand: If you can consolidate software without re-development, you can save on both capital and operating costs.
  • Problem: Guest operating systems may have different ideas about optimization than their hosts. For example, a defragmented virtual disk may actually be fragmented on the real disk.
    On the other hand: Guest OSes can be made to run more efficiently than native ones. For example, host disk caching can let guest operating systems boot much faster than normal.
  • Problem: Virtualization adds all the bugs and quirks of another "meta-operating system" to your application host.
    On the other hand: Virtual machines are more portable and more easily recoverable than real machines - sometimes able to fail over automatically.
  • Problem: Virtual machines tend to have lowest-common-denominator emulated hardware.
    On the other hand: A virtual machine can let the guest OS use stable, well-tested drivers that worked reliably with older (real) hardware.
  • Problem: Hypervisors need to be supported and updated like any other operating system.
    On the other hand: Hypervisors (type 1's, at least) are thinner than a typical, full-featured OS and consequently need less updating.
  • Problem: When para-virtualization isn't possible, emulation has significant performance penalties.
    On the other hand: Legacy hardware emulation makes it possible to run old, unsupported OSes and software.
  • Problem: Running more applications on a host means there is a greater possibility of resource contention.
    On the other hand: The guest OS/software mix can be arranged so that each virtual machine uses resources the others do not.
  • Problem: Like other OSes, you have to have faith that your application's guest OS will work with future generations of hypervisor.
    On the other hand: If good virtual machines are available, OS producers can relegate old versions to virtualization and not have to worry about supporting legacy apps or unpopular hardware.
  • Problem: Hypervisors themselves incur some resource overhead.
    On the other hand: Hypervisors allow detailed administration of the virtual hardware that is often not possible with real hardware.
  • Problem: Administering each host is more complicated: access to each VM console, run-time prediction for guest processes, networking on local virtual segments, etc.
    On the other hand: Concentrating virtual machines reduces datacenter clutter, LAN wiring, KVM units, power connections, etc.

What would I do?

Although I've been skeptical in the past, I've found a lot of good uses for virtualization.  As with any tool, however, I wouldn’t use it everywhere...

Where I might use virtualization:

  • Testing OS and software upgrades prior to production release.
  • Backward compatibility testing with older OS and software installations.
  • Isolation from malware when testing unknown software.
  • Separate “work” and “personal” virtual machines on employee laptops.
  • Workstation OS updates distributed as virtual machine images.
  • True multi-user workstations with virtual machine images for each user.
  • Application virtualization for software that needs central config and/or license control.
  • Production servers that need live backups but are incapable of doing it on their own.
  • Production servers that have no failover capabilities of their own.
  • Production servers where maintenance and restart downtime must be minimized.
  • Replacing significantly aged hardware with more reliable virtual machines.
  • Archiving old applications and their OS environment for future reference.
  • And of course, playing old DOS games.

Where I might not:

  • Systems that would be independently bound by any of a host’s resources (CPU, Memory, I/O).
  • Applications that have their own reliability (failover/backup) mechanisms.
  • Systems where individual host uptime is more important than recoverability.
  • When resource availability must be as independent and predictable as possible.
  • Systems where special hardware is required for operation or is critical for good performance.
  • Applications that require maximum isolation for security reasons.

Conclusion

Use of any technology always depends on the context. Resources, risks, timetables, and other constraints have to be measured when implementing and operating IT services.  Virtualization plans also have to consider application porting possibilities, blade server options, and even cloud computing alternatives.  Only when it comes to trivial questions like “how would you run old DOS games?” is there an easy, definitive answer: DOSBox.