Microwulf Is The World’s Cheapest Supercomputer, A Personal, Portable Beowulf Cluster
The system is a Beowulf cluster that runs at 26.25 Gigaflops and costs $1,256 (the builders' write-up below quotes a price of $2470). That may not qualify as a budget computer, but as a budget supercomputer it sure is a heck of a solid candidate. The figure of 26.25 Gigaflops is striking when you look at it from a price/performance standpoint.
Sun’s Sparc Enterprise M9000 supercomputer costs $511,385 and is capable of 1.03 Teraflops. Broken down into dollars per Gigaflop, the M9000 costs $496 per Gigaflop, whereas Microwulf costs $48 per Gigaflop.
You can find out how to make your own mini-supercomputer by checking Cluster Monkey out!
Microwulf is designed to be a cost-efficient, high performance, portable, "personal" Beowulf cluster. The basic idea is to pack a lot of processing power into a small volume using multicore CPUs.
To do so, we use motherboards with a smaller form-factor (like Little Fe) than the usual ATX size, and we space them using threaded rods (like this cluster) and scrap plexiglass, to minimize "packaging" costs.
By building a "double decker sandwich" of four microATX motherboards, each with a dual core CPU and 2 GB RAM (1 GB/core), we can build a 4-node, 8-core, 8GB multiprocessor small enough to fit on one's desktop, powerful enough to do useful work, and inexpensive enough that anyone can afford one.
Since our microATX motherboards have an on-board Gigabit Ethernet adaptor, that is the least expensive way for the processors to communicate. To keep the two cores from competing for this adaptor, we add a second Gigabit Ethernet adaptor in each motherboard's PCI-Express slot. We then rely on Open MPI (see below) to spread the communication load across these two adaptors. Finally, we connect all the adaptors via an inexpensive 8-port Gigabit Ethernet switch. This provides a Gigabit Ethernet link's worth of bandwidth for each core.
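The node counts above can be checked with a little arithmetic. The following is a minimal sketch, not part of the Microwulf build itself: the `Node` class and `cluster_totals` function are hypothetical names introduced only for illustration, and the per-node figures are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One microATX motherboard, using the per-node figures from the text."""
    cores: int = 2        # one dual core CPU per motherboard
    ram_gb: int = 2       # 2 GB RAM per node (1 GB/core)
    gige_ports: int = 2   # on-board adaptor plus one PCI-Express adaptor


def cluster_totals(nodes):
    """Sum the resources of all nodes (hypothetical helper for illustration)."""
    cores = sum(n.cores for n in nodes)
    ports = sum(n.gige_ports for n in nodes)
    return {
        "nodes": len(nodes),
        "cores": cores,
        "ram_gb": sum(n.ram_gb for n in nodes),
        "switch_ports_needed": ports,
        "links_per_core": ports / cores,
    }


microwulf = [Node() for _ in range(4)]  # four motherboards
totals = cluster_totals(microwulf)
print(totals)
```

Under these assumptions the four nodes need eight switch ports in total, which is why an 8-port switch is exactly enough and each core ends up with one Gigabit Ethernet link's worth of bandwidth.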
The following schematic diagram shows the interconnections between Microwulf's components:
[Figure: schematic diagram of Microwulf's interconnections]
[Photos: Tim Brom and Microwulf; Microwulf "west", "southwest", "south", and "southeast" views]
Microwulf: Cost Efficiency
When you have measured a supercomputer's performance using HPL, and know its price, you can measure its cost efficiency by computing its price/performance ratio. By computing the number of dollars you are paying for each floating point operation per second (flops), you can compare one supercomputer's cost-efficiency against others.
With a price of just $2470 and performance of 26.25 Gflops, Microwulf's price/performance ratio (PPR) is $94.10/Gflop, or less than $0.10/Mflop! This makes Microwulf the first general-purpose Beowulf cluster to break the $100/Gflop (or $0.10/Mflop) threshold for measured double-precision floating point performance.
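As a quick check of the figures above, here is a minimal sketch of the price/performance calculation. The function name `price_per_gflop` is a hypothetical helper chosen for this example; the price and Gflops values are the ones quoted in the text.

```python
def price_per_gflop(price_usd, gflops):
    """Price/performance ratio in dollars per Gflop (lower is better)."""
    if gflops <= 0:
        raise ValueError("performance must be positive")
    return price_usd / gflops


ppr_gflop = price_per_gflop(2470, 26.25)  # Microwulf's price and HPL result
ppr_mflop = ppr_gflop / 1000              # 1 Gflop = 1000 Mflops

print(f"${ppr_gflop:.2f}/Gflop = ${ppr_mflop:.4f}/Mflop")
print("Below $100/Gflop:", ppr_gflop < 100)
```

Dividing by 1000 converts dollars per Gflop into dollars per Mflop, which is how the $100/Gflop and $0.10/Mflop thresholds turn out to be the same line.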
For comparison purposes:
- In 1976, the Cray-1 cost more than 8 million dollars and had a peak (theoretical maximum) performance of 250 Mflops, making its PPR more than $32,000/Mflop. Since peak performance exceeds measured performance, its PPR using measured performance (estimated at 160 Mflops) would be much higher.
- In 1985, the Cray-2 cost more than 17 million dollars and had a peak performance of 3.9 Gflops, making its PPR more than $4,350/Mflop ($4,358,974/Gflop).
- In 1997, IBM's Deep Blue defeated world chess champion Garry Kasparov. Its price has been estimated at 5 million dollars, and it produced 11.38 Gflops of measured performance, making its PPR more than $439,367/Gflop.
- In 2003, the University of Kentucky's Beowulf cluster KASY0 cost $39,454 to build, and produced 187.3 Gflops on the double-precision version of HPL, giving it a PPR of about $210/Gflop.
- Also in 2003, the University of Illinois at Urbana-Champaign's National Center for Supercomputing Applications built the PS 2 Cluster for about $50,000. No measured performance numbers are available, which isn't surprising, since the PS-2 has no hardware support for double precision floating point operations. This cluster's theoretical peak performance is about 500 Gflops (single-precision); however, one study showed that the PS-2's double-precision computations took over 17 times as long as its single-precision ones. Even using the inflated single-precision peak performance value, its PPR is about $100/Gflop; its measured double-precision PPR is probably more than 17 times that.
- In 2004, Virginia Tech built System X, which cost 5.7 million dollars, and produced 12.25 Tflops of measured performance, giving it a PPR of about $465/Gflop.
- In 2007, Sun's Sparc Enterprise M9000, with a base price of $511,385, produced 1.03 Tflops of measured performance, making its PPR more than $496/Gflop. (The base price is for the 32-CPU model; the benchmark was run using a 64-CPU model, which is presumably more expensive.)
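The comparison above can be reproduced by ranking the systems on dollars per Gflop. This is a rough sketch using only the prices and performance figures quoted in the list; the `SYSTEMS` table and `rank_by_ppr` function are hypothetical names for this example. Several prices are stated as "more than" or estimated, so the results are approximate, and the Cray figures use peak rather than measured performance. The PS 2 Cluster is left out because no measured double-precision figure is available.

```python
# (name, price in USD, performance in Gflops, basis of the performance figure)
# Values are taken from the list above; "more than" prices are lower bounds.
SYSTEMS = [
    ("Cray-1", 8_000_000, 0.25, "peak"),        # 250 Mflops
    ("Cray-2", 17_000_000, 3.9, "peak"),
    ("Deep Blue", 5_000_000, 11.38, "measured"),
    ("KASY0", 39_454, 187.3, "measured"),
    ("System X", 5_700_000, 12_250.0, "measured"),  # 12.25 Tflops
    ("Sun M9000", 511_385, 1_030.0, "measured"),    # 1.03 Tflops
    ("Microwulf", 2_470, 26.25, "measured"),
]


def rank_by_ppr(systems):
    """Return (name, dollars per Gflop, basis) rows, cheapest first."""
    rows = [(name, price / gflops, basis)
            for name, price, gflops, basis in systems]
    return sorted(rows, key=lambda row: row[1])


ranked = rank_by_ppr(SYSTEMS)
for name, ppr, basis in ranked:
    print(f"{name:<10} ${ppr:>14,.2f}/Gflop ({basis})")
```

Sorting on the ratio rather than on raw price or raw speed is the point of the metric: System X is far faster than Microwulf, but it lands lower in this ranking because each Gflop costs more.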