Posts
there are times
that try my patience. Usually they involve poorly implemented filtering tools of one form or another. SPF is meant to provide an anti-spoofing mechanism: it identifies which machines are allowed to send email for your domain name. The tools that purport to test it? Not so good. I get conflicting answers from various tools for a simple SPF record. The online (interactive) tester seems to work, and shows me that my config is working nicely.
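For cross-checking what the testers report, here is a minimal sketch of how an SPF record can be evaluated locally. It is an illustration, not a full RFC 7208 implementation: it only handles the `ip4`, `ip6`, and `all` mechanisms, it silently skips anything that needs DNS (`a`, `mx`, `include`, `redirect`), and the function name `spf_allows` and the sample record are made up for this example.

```python
import ipaddress

# Map SPF qualifiers to the result they produce when a mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_allows(record: str, sender_ip: str) -> str:
    """Evaluate a record against one IP using only ip4/ip6/all (hypothetical helper)."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
        elif name == "all":
            return QUALIFIERS[qualifier]
        # Mechanisms that need DNS lookups are ignored in this sketch.
    return "neutral"  # nothing matched


record = "v=spf1 ip4:192.0.2.0/24 -all"  # example record, documentation address range
print(spf_allows(record, "192.0.2.10"))
print(spf_allows(record, "198.51.100.7"))
```

Mechanisms are evaluated left to right and the first match wins, which is why the trailing `-all` only applies to senders outside the listed network.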
Of course, this means more work ahead
Our client code that pulls configuration bits from a boot server works great. But the config it pulls is distribution specific. Where we need to be is distribution/OS agnostic, and set things in a document database. Let the client convert the configuration into something OS specific. This is, to a degree, a solved problem. Indeed, etcd is just a modern reworking of what we did with the client code … using a fixed client (e.
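A rough sketch of the "agnostic document in, OS-specific config out" idea follows. Everything in it is an assumption for illustration: the document layout, the `render_network` name, and the choice of a static network interface as the thing being configured. It is not our client code. The two output formats are the classic RHEL-style `ifcfg-*` file and the Debian-style `/etc/network/interfaces` stanza.

```python
# A distribution-agnostic description, as it might come back from a document database.
# The field names here are hypothetical.
node_doc = {
    "hostname": "node01",
    "interface": {"name": "eth0", "address": "10.0.0.5", "netmask": "255.255.255.0"},
}


def render_network(doc: dict, os_family: str) -> str:
    """Convert the agnostic document into OS-specific network config text."""
    iface = doc["interface"]
    if os_family == "rhel":
        # Contents for /etc/sysconfig/network-scripts/ifcfg-<name>
        lines = [
            f"DEVICE={iface['name']}",
            "BOOTPROTO=static",
            f"IPADDR={iface['address']}",
            f"NETMASK={iface['netmask']}",
            "ONBOOT=yes",
        ]
    elif os_family == "debian":
        # Stanza for /etc/network/interfaces
        lines = [
            f"auto {iface['name']}",
            f"iface {iface['name']} inet static",
            f"    address {iface['address']}",
            f"    netmask {iface['netmask']}",
        ]
    else:
        raise ValueError(f"no renderer for OS family: {os_family}")
    return "\n".join(lines) + "\n"


print(render_network(node_doc, "rhel"))
print(render_network(node_doc, "debian"))
```

The point of the split is that the stored document never changes when a new distribution shows up; only a new renderer on the client side is needed.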
Very preliminary RHEL7/CentOS7 SIOS base support
This is rebasing our SIOS tech atop RHEL7/CentOS7. Very early stage, pre-alpha, lots of debugger windows open … but …
[root@usn-ramboot ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@usn-ramboot ~]# uname -r
4.4.6.scalable
[root@usn-ramboot ~]# df -h /
Filesystem Size Used Avail Use% Mounted on
tmpfs 8.0G 4.7G 3.4G 59% /
Dracut is giving me a few fits, but I’ve finished that side for the most part, and am now into debugging the post-pivot environment.
Best practice or random rule ... diagnosing problems and running into annoyances
As often as not, I’ll hear someone talk about a “best practice” that they are implementing or have implemented. Things that run counter to these “best practices” are obviously, by definition, “not best”. What I find sometimes amusing, often alarming, is that the “best practices” are often disconnected from reality in specific ways. This is not a bash on all best practices; some of them are sane and real. Like not allowing plain text passwords for logins.
Attempting, and to some degree, failing, to prevent a user from accruing technical debt
We strive to do right by our customers. Sometimes this involves telling them unpleasant truths about choices they are going to make in the future, or have made in the past. I try not to overly sugar coat things … I won’t be judgemental … but I will be frank, and sometimes this doesn’t go over well. During these discussions, I often see people insisting that their goal is X, but the steps Y they take to get there will lead them to Z, which is not coincident with X.
When spam bots attack
I’ve been fixing up a few mail servers to be more discriminating over their connections. And I’ve noted that I didn’t have any automated tooling to block the spammers. I have lots of tooling to filter and control things. So I wrote a quick log -> ban list generator. Not perfect, but it seems to work nicely. Like I don’t have enough to do this week. /sigh Meetings tomorrow starting at 8am.
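The general shape of a log -> ban list generator is small enough to sketch. This is not the tool I wrote, just a minimal illustration under stated assumptions: the log is assumed to be Postfix-style, with lines containing `reject: RCPT from host[ip]`, and the names `build_ban_list` and `threshold` are invented for the example.

```python
import re
from collections import Counter

# Assumed log format: Postfix smtpd reject lines, e.g.
#   ... NOQUEUE: reject: RCPT from unknown[203.0.113.5]: 554 5.7.1 ...
REJECT_RE = re.compile(r"reject: \w+ from \S+\[([0-9A-Fa-f.:]+)\]")


def build_ban_list(log_lines, threshold=3):
    """Return sorted client IPs that were rejected at least `threshold` times."""
    hits = Counter()
    for line in log_lines:
        match = REJECT_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return sorted(ip for ip, count in hits.items() if count >= threshold)


sample_log = [
    "postfix/smtpd[411]: NOQUEUE: reject: RCPT from unknown[203.0.113.5]: 554 5.7.1 rejected",
    "postfix/smtpd[411]: NOQUEUE: reject: RCPT from unknown[203.0.113.5]: 554 5.7.1 rejected",
    "postfix/smtpd[412]: NOQUEUE: reject: RCPT from mail.example.org[198.51.100.2]: 450 4.7.1 deferred",
    "postfix/smtpd[413]: NOQUEUE: reject: RCPT from unknown[203.0.113.5]: 554 5.7.1 rejected",
    "postfix/smtpd[414]: connect from mail.example.org[198.51.100.2]",
]

for ip in build_ban_list(sample_log):
    print(ip)
```

The threshold keeps a single bounced message from a legitimate host off the list; the resulting addresses can then be fed to whatever blocking mechanism the mail server or firewall already uses.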
Why sticking with distro packages can be (very) bad for your security
I’ve been keeping a variety of systems up to date, updating security and other bits with zealous fervor. Security is never far from my mind, as I’ve watched bad practices at customers result in any number of things … from minor probes through, in one case (involving a grad student hit by a Windows key logger), the takedown of a Linux cluster, but not before the university was temporarily knocked off the internet.
Not-so-modern file system errors in modern file systems
On a system in heavy production use, using an underlying file system for metadata service, we see this:
kernel: EXT4-fs warning: ext4_dx_add_entry:1992: Directory index full!
Ok, where does this come from? Ext3 had a limit of 32000 directory entries per directory, unless you turned on the dir_index feature. Ext4 theoretically has no limit. Well, it’s 64000 if you don’t use dir_index. Which we do use. Really the feature you want is dir_nlink.
SIOS-metrics being updated soon with our process table sampler
I needed to look at the processes on the machine I’d been spending time debugging: what was running, and each one’s state, allocations, IO, etc. Something was causing a hard panic, and it seemed correlated with an application issue. I didn’t have a process space sampler, so I wrote one. It takes one sample per second right now (configurable) across the whole process space, and normally uses 1% CPU or so.
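For a feel of what one sampling pass over the process space involves, here is a minimal sketch. It is not the SIOS-metrics sampler: it assumes Linux, reads only `/proc/<pid>/stat` (so state, CPU ticks, and memory, but not IO, which would need `/proc/<pid>/io` and usually more privileges), and the function names are invented for the example.

```python
import os


def parse_stat(text: str) -> dict:
    """Parse the contents of /proc/<pid>/stat into a few fields of interest."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    left = text.index("(")
    right = text.rindex(")")
    rest = text[right + 1:].split()  # rest[0] is field 3 (state) in proc(5)
    return {
        "pid": int(text[:left]),
        "comm": text[left + 1:right],
        "state": rest[0],
        "utime": int(rest[11]),      # field 14: user-mode clock ticks
        "stime": int(rest[12]),      # field 15: kernel-mode clock ticks
        "vsize": int(rest[20]),      # field 23: virtual memory size in bytes
        "rss_pages": int(rest[21]),  # field 24: resident set size in pages
    }


def sample_processes(proc_root: str = "/proc") -> list:
    """Take one sample of every process visible under proc_root."""
    samples = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                samples.append(parse_stat(handle.read()))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited or is unreadable; skip it
    return samples


if __name__ == "__main__" and os.path.isdir("/proc"):
    for proc in sample_processes()[:5]:
        print(proc["pid"], proc["comm"], proc["state"], proc["rss_pages"])
```

Calling `sample_processes()` in a loop with a one second sleep gives the per-second sampling described above; differencing `utime` and `stime` between consecutive samples yields per-process CPU use.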
Caught a not-so-cool bug in a hypervisor running on a production machine
Not naming names. It’s a good product. It just gives up the ghost when you request 1.5x available memory, and the OS actually tries … tries … to fulfill the request. I thought I had set the maximum oversubscription amount to 85% of swap + physical. Yet, along came a nice spike and WHAMMO. Down the machine went. That this was a high visibility production machine, with hard uptime requirements … not so good.