Posts
So much #fail in the RHEL init process
It’s borked so incredibly badly that, in order to support what we need, we have to hack around all its brokenness. Dracut is a step up, but pretty much everything else (and this may be a dracut issue) is borked. We want one initramfs to support software RAID1 boot, network boot, and iSCSI boot. But you have to pull in so many modules to get this to work … we have a gigantic initramfs that takes forever to assemble.
We built that: 10 years in business
[warning: longer post] I mentioned this on Twitter (@sijoe). The day job has been in business for 10 years. We’ve not taken outside investment to date, and we’ve not sold the company yet. We’ve been profitable and growing continuously during our lifetime. The past 3 years have seen that growth accelerating hard. The company was built starting with a conviction that practitioners and users of HPC systems needed better designs and better systems than were being pushed out by traditional vendors in the early 2000s.
the mystery of the week
Customer has had a machine for a while. Generally stable. Followed our advice on doing a reboot recently. Unit started crashing Monday. Then today. Hard to stay up and stable. I asked if anything had changed, and haven’t gotten anything conclusive … mostly “we don’t think so”. About the crashes: Nothing in the logs. Not a thing. No hardware subsystem that has logging enabled (RAID, motherboard, PCIe, IPMI, … ) reports an error.
... and Oracle snarfs up Xsigo ...
Xsigo makes virtual network connectivity systems. Basically letting you build a virtual network, in a software stack, so you can avoid spending so much money on a fixed (and inflexible) network stack. It’s a neat concept, but its utility is focused elsewhere than HPC. Even though they talk storage, I’d argue it’s a fairly expensive way to build a network for storage as well … though if you are going to be changing your network all the time, it actually might be a win.
... and he's back!
with a good article on a new license formulated for genomic code being distributed by a university research center. Glad to see the blog back up! Or rebooted … and +10 on your article. It (that license) is the wrong direction IMO. Goes against what publicly funded scientific code should be distributed as (IMO).
More M&A?
I’ve heard OCZ being looked at by Seagate and others. That would make sense. Honestly I think my expectations are not that companies have fire sales going on … but that areas where some sort of force multiplication is possible … these companies will be snapped up to help grow larger companies. Acquirers are after a few things. Value in terms of market, products, people, technology and capability, fit, etc. I do expect to see a few fire sales, but not many.
A question a customer asked relative to Lustre and the Whamcloud acquisition
What’s to become of Chroma (from Whamcloud)? I know it’s early, and I am sure that there won’t be answers just yet. Intel acquired Cilk, and it’s now available (and being integrated into gcc!). Intel acquired many others, and their bits are available. I’d expect Chroma to be made into an offering from Intel, along the lines of their cluster suite. Fully integrated stack. I know some folks are nervous about the acquisition.
Some kernels don't like having non-assemble-able software RAIDs
This one took me a while to figure out. I had to start probing why a system would crash the MD stack shortly after booting, but not in single user mode. So I started delving into the RAID. And found that the folks who set this unit up had a RAID0 with 0.90 metadata on the devices, and then 1.2 metadata on the MDS. So along comes the Lustre-ized kernel, and whammo.
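A minimal sketch of how one might spot this kind of metadata mismatch before the kernel trips over it. It assumes you have already captured the text of `mdadm --examine` for each device, and that the text carries a `Version : ...` line; the helper names and the sample device names in the usage note are hypothetical, not anything from the system described here.

```python
import re
from collections import defaultdict

# Assumption: the examine text contains a line such as "Version : 1.2".
VERSION_RE = re.compile(r"^\s*Version\s*:\s*(\S+)", re.MULTILINE)


def metadata_version(examine_output):
    """Return the metadata version string found in the text, or None."""
    match = VERSION_RE.search(examine_output)
    return match.group(1) if match else None


def group_by_version(examine_outputs):
    """Map each metadata version to the sorted devices that report it.

    examine_outputs is a dict of device name -> captured examine text.
    """
    groups = defaultdict(list)
    for device, text in examine_outputs.items():
        groups[metadata_version(text)].append(device)
    return {version: sorted(devices) for version, devices in groups.items()}


def has_mixed_metadata(examine_outputs):
    """True when the devices do not all report the same metadata version."""
    return len(group_by_version(examine_outputs)) > 1
```

Feed it something like `{"/dev/sdb1": text_b, "/dev/sdc1": text_c}` and check `has_mixed_metadata(...)` before letting anything try to assemble the array. It only compares version strings; it does not tell you which metadata is the right one to keep.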
ahh grub 0.97 + ext4 ... how I loathe thee
I had forgotten that some combinations of grub + file system could be rendered unbootable without lots of additional help. Grub is annoying. This is Grub legacy. Grub current tries to fix the mess, but fails as it is overly complex. And it appears to omit PXE and network boot options. Well iPXE helps us there. This is why we like tiburon so much. No installation. No problem. No grub to worry about.
bad design + bad implementation = company success ??? Seriously ???
We are often hired to work on existing systems, to see if we can help make them faster and better. I am working on such a project now, but this post is not about this project. I’ve noticed a tendency in the market to shoehorn a set of designs for storage/computing systems into areas they weren’t designed for. Moreover, these designs would have been right at home 15 years ago. Since then, far better scale-out designs have come along that do a far better job than the older designs.