Infinitus Incognita

The Infinite Unknown

I’ve written before about the world being held together by duct tape… and it seems there are more people lately who have decided to rip off the covers and go looking for it.  The latest headline comes from the world of SCADA systems: Researchers Lay Bare Woeful SCADA Security.  SCADA systems are small embedded computers that help guide various kinds of industrial processes.. manufacturing, power plants and water systems. Basically anything where you have sensors, motors, pumps etc. that have to be monitored and controlled.  Iran learned all about lax SCADA security over the last couple of years and now everyone else is finding out about it too.  The dirty little secret is that most of these systems haven’t fundamentally changed in the last 20 years… despite huge improvements in the sophistication of what’s available now, even to hobbyists.  Something like the Arduino platform costs an order of magnitude less than commercial systems and can perform many of the same jobs.  Actually, it’s not quite true that nothing has changed.. SCADA systems have changed in one very important way.. people started plugging them into a network.  Once you do that.. you are opening yourself up to a world of hurt if those systems were not designed to operate in a hostile environment.  As the researchers in the linked story found out.. some of them can’t even be probed without crashing.. never mind standing up to direct attacks.

I was fortunate enough to take the SANS security course on Wireless Ethical Hacking, Penetration Testing, and Defenses a few years ago. I totally recommend the SANS courses.. they are really top notch in the world of tech training. One of the things I learned from that course, though, is that very few people/organizations take security seriously.   Security should be thought of as existing on a continuum along with ease of use.  That is.. something could be totally secure and totally unusable, or very easy to use and totally insecure.  SCADA systems have been operating at the easy-to-use, insecure end of the scale for decades now and I doubt very seriously that’s going to change anytime soon.  If the customers who buy these systems cared at all about security they would demand that the systems actually be more secure.  That doesn’t happen though.. and I blame human nature.

Incidentally… you may think your world isn’t personally touched by these systems but you would be wrong.  In fact.. in some areas you may already have a vulnerable SCADA component bolted right onto your own home.  Heard of the SmartGrid?  The very same researcher who taught my wireless hacking class has found some serious issues with the power meters used in smart grid systems.  Imagine a worm that could infect a network of power company smart meters.. giving control over the power they regulate to some 3rd party.  At that point it would be trivial to crash the regional electrical grid on demand.. and we know from what happened accidentally in the Northeast a few years ago that this can take days to recover from. Sleep tight!

As part of a new and fairly large project I need to partition a few Postgres tables and keep a rolling daily window.  That is.. I want to organize data by timestamp, storing each day in its own partition and maintaining 90 days of historical data.  Doing this is possible in PostgreSQL but it’s not pretty or very clean to set up.  To simplify the process I wrote this Perl script that (when run daily) will pre-create a certain number of empty partitions into the future and remove the oldest partitions from your window.

The script is generalized so as to be easy to modify and there isn’t much here that’s specific to Postgres.. so it could easily be adapted for use with other systems like Oracle. You will need to put in the DDL for the child tables you will create but otherwise it’s pretty straightforward.  Please let me know if you find this useful as I couldn’t find anything else out there like it.
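To make the rolling window idea concrete, here is a minimal sketch in Python (not the actual Perl script, which lives on the project page). It only works out which daily partitions to create and which to drop. The `parent_YYYYMMDD` naming scheme, the function names and the 7 day look-ahead are assumptions made for illustration; the DDL for the child tables is deliberately left out since that is specific to your schema.

```python
from datetime import date, timedelta


def partition_name(parent: str, day: date) -> str:
    """Hypothetical naming scheme: one child table per day, e.g. events_20240310."""
    return f"{parent}_{day:%Y%m%d}"


def plan_window(parent, today, existing, keep_days=90, precreate_days=7):
    """Return (to_create, to_drop) for a rolling daily partition window.

    existing       -- set of child table names that already exist for this parent
    keep_days      -- days of history to keep, counting today
    precreate_days -- how many future days to create ahead of time (assumed value)
    """
    # Today plus the next `precreate_days` days should all have a partition.
    wanted = [
        partition_name(parent, today + timedelta(days=n))
        for n in range(precreate_days + 1)
    ]
    to_create = [name for name in wanted if name not in existing]

    # Anything older than the oldest day we keep falls out of the window.
    # Names share a prefix and end in YYYYMMDD, so string order is date order.
    oldest_kept = partition_name(parent, today - timedelta(days=keep_days - 1))
    to_drop = sorted(name for name in existing if name < oldest_kept)
    return to_create, to_drop
```

Run once a day, the `to_create` list would feed your CREATE TABLE statements and the `to_drop` list your DROP TABLE statements.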

Visit the project page for details and the download.

Update: Several important updates to the code and my examples since I first published this.  Be sure to grab the latest version which is starting to behave reasonably now.

I’ve been using a Mac for a while now and I recently decided to dump Entourage and switch to the native Mail.app.  I noticed a problem though.. within minutes of starting up it would consume several hundred megs of RAM and have frequent CPU spikes of 80 to 100%.  If Mail was left open, memory usage would climb above 2 GB with continued CPU spikes.  After much digging I finally found the problem and fixed it.

I’d been using iSync back before I got an iPhone..  and had never thought to disable it.  It looks like what was happening is that Mail and other apps (iCal, Address Book) had been building a huge database of stuff that needed to be synced to my old phone.  Once I went in and reset the sync history and disabled iSync.. everything calmed down and now a running instance of Mail.app with 3 IMAP accounts and an Exchange account is using about 60 MB of RAM.. and it’s not steadily climbing as it had been before.  This may have also been the source of the problems I’d been having with Entourage.

Now and then I’m called on to help interview candidates for Linux admin/engineer slots, and as I’ve been doing some of that lately I thought I’d share the way I go about doing a technical interview. This approach seems to work equally well over the phone or in person.

I’m big on understanding the fundamentals of Linux. If someone comes to me with a resume showing 10+ years of experience building and managing production Unix/Linux systems there are certain things I’d expect them to know.. and to a certain depth. If they obviously don’t.. then I have to question the validity of what’s on the resume. So what I’ll usually do is pick a few key areas and start off with some general (easy) questions and then drill down a bit to discover the level of understanding on that particular topic. As an example.. I’ll share one of my favorites and lay it out the way I might do it during an interview.

Q: If I wanted to know who was logged in, how busy a system was and how long it had been up what command would tell me all that?

(Assuming they get that I’m looking for the ‘w’ command and mention the system load)

Q: Why are there three numbers for the system load?

(Assuming they know about the 3 time periods)

Q: What is the system load.. what do those averages actually represent?

(This usually starts to trip up the junior people but assuming they know it’s the run queue length)

Q: How does a multi-cpu system affect your interpretation of system load?

(Assuming they say something about dividing load by CPU count)

Q: Describe the relationship between the load average measurement and the percentage busy you might get from ‘top’.

This is usually about as far as I’ll take something like this.. but it can lead to a discussion about things like different kernel schedulers and how they can be tweaked etc. If a person can answer these and have that sort of discussion it tells me they have the depth of understanding a senior person should have… at least in this area. Someone who has been an admin (but not what I’d classify as an engineer) should be able to answer at least the first 2 questions.
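For anyone who wants to see the multi-CPU question worked through, here is a small Python sketch of the "divide load by CPU count" answer. The function names are made up for illustration. Keep in mind it is only a rough rule of thumb: on Linux the load average also counts tasks in uninterruptible sleep (usually waiting on I/O), which is one reason it doesn’t map neatly onto the CPU percentage you see in ‘top’.

```python
import os


def load_per_cpu(load_averages, cpu_count):
    """Scale the 1, 5 and 15 minute load averages by the number of CPUs.

    A result near 1.0 means roughly one runnable task per CPU over that
    period; well above 1.0 means tasks were queueing.
    """
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(load / cpu_count for load in load_averages)


def current_load_per_cpu():
    """Read the live values from the OS (os.getloadavg is Unix only)."""
    return load_per_cpu(os.getloadavg(), os.cpu_count() or 1)
```

So a 1 minute load of 4.0 is a fully busy box on 4 CPUs, but a badly backed up one on a single CPU.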

I was called on to provide a method of alerting from within Nagios that was more active and direct than the usual email or SMS messages.  So I came up with a simple way to have a Nagios notification place a phone call to our off-hours tier 3 support line to report certain very rare but serious problems.   This was actually a two part solution.  We were interested in looking for certain things coming out of the Broadsoft audit log that were important enough to wake someone up in the middle of the night.  So I wrote a daemonized script that tails the Broadsoft audit log and interprets it, looking for these config changes, and then reports them to a special RT3 queue.  It also notifies Nagios (a push notification to a passive service check) over a socket connection. A listener script on the Nagios box (using Net::Server) validates the syntax of the alert and lets Nagios know.. which in turn triggers (and manages the scheduling of) the outgoing phone call(s) through Asterisk.
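The listener’s job boils down to checking an incoming alert and handing Nagios a passive check result. As a rough Python sketch of that last step (the real listener is Perl and its alert syntax isn’t shown here), the function below builds the standard PROCESS_SERVICE_CHECK_RESULT external command line. The function name and the validation rules are assumptions for illustration.

```python
import time

# Nagios service states: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
VALID_STATES = {0, 1, 2, 3}


def format_passive_result(host, service, state, output, now=None):
    """Build one external command line for a passive service check result."""
    if state not in VALID_STATES:
        raise ValueError(f"invalid state: {state!r}")
    for field in (host, service):
        # Semicolons separate fields in the command, so reject them here.
        if not field or ";" in field or "\n" in field:
            raise ValueError(f"invalid field: {field!r}")
    # Keep the plugin output on a single line.
    output = output.replace("\n", " ")
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{state};{output}\n"
    )
```

The resulting line gets written to the Nagios external command file (whatever `command_file` points to in nagios.cfg), and from there the normal notification and escalation logic takes over.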

I used the Google TTS engine to record certain fixed statements that would be common across calls.. converting the audio to the proper format for Asterisk (SLN16) with sox.  For the specific alert text I’m using a simplified version of festival from the command line called ‘flite’.   The Asterisk part is done entirely in a Perl AGI script and allows the called person to repeat the alert or acknowledge it within Nagios. If they don’t answer or don’t ack the alert, Nagios will initiate another call in a few minutes.. and I’m able to use service escalations to notify different people if it goes too long without a response.
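For the audio conversion step, SLN16 is headerless 16-bit signed linear audio at 16 kHz, mono. A sketch of how the sox call might be assembled from Python is below. The exact options used for this project aren’t recorded here, so treat the flags and the file names as an assumption about one reasonable way to do it.

```python
def sox_to_sln16_args(src, dst):
    """Argument list for converting an audio file to raw 16 kHz signed linear.

    dst would normally end in .sln16 so Asterisk picks the right format.
    """
    return [
        "sox", src,
        "-t", "raw",             # headerless output
        "-r", "16000",           # 16 kHz sample rate
        "-e", "signed-integer",  # signed linear encoding
        "-b", "16",              # 16 bits per sample
        "-c", "1",               # mono
        dst,
    ]


# Example (requires sox to be installed; file names are placeholders):
#   import subprocess
#   subprocess.run(sox_to_sln16_args("greeting.wav", "greeting.sln16"), check=True)
```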

This project has more moving pieces than I usually like to use but it was interesting just how easy it was to get working.  It’s gotten me thinking about doing a more full-featured voice front end to Nagios that I could release.

See the code on page 2
