An incomplete list of my excellent teachers

It’s just about that time to get back to school, so I wanted to thank some of the people who have taught me, from preschool through to mentors in my career:

  • Mrs. Hsu
  • Miss Wheeler
  • Mrs. Owens
  • Mrs. Hood
  • Mike Gajewski
  • Wade Jackson
  • Ken Drury
  • Mr. Anderson
  • Mr. Evans
  • Mrs. Chilson
  • Mr. Chilson
  • Nancy Black
  • Joan Twillman
  • Leigh Reisinger
  • Donna Kleinschmidt
  • Becky Minor
  • Bob Waggoner
  • Art Wash
  • Sanjay Patel
  • David Nicol
  • Paul Kwiat
  • Tony Liss
  • Lenny Pitt
  • Cinda Heeren
  • Adel Ibrahim
  • Matt Bross
  • Ted Bross
  • Mike Hoeflich
  • Bill Skepnek
  • Reza Garrow
  • Linda Boone

Brother Printers on Raspbian

It turns out that Brother doesn’t seem to be very serious about publishing driver packages that work for ARM machines. The easiest way I’ve found to deal with this so far, while still getting quick, good-quality color printing, is to use the MFC-9840CDW PPD. Verified working on the HL-3170CDW and MFC-8480DN; I’ll test it on HL-2170DW and HL-4150CDN printers soon. I also set up Samba and avahi-discover so I can have AirPrint, and although fidgety to set up, it does seem to work.
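For reference, here’s a sketch of how you might point CUPS at the borrowed PPD. The queue name, printer address, and PPD path are all hypothetical, so substitute your own:

```shell
# Hypothetical queue name, address, and PPD location - substitute your own.
# The whole trick is just handing CUPS the MFC-9840CDW PPD for a different model.
lpadmin -p Brother-HL3170CDW \
        -E \
        -v "ipp://192.168.1.50/ipp/print" \
        -P /usr/share/cups/model/Brother-MFC-9840CDW.ppd
# Share the queue so Samba and avahi can advertise it
lpadmin -p Brother-HL3170CDW -o printer-is-shared=true
```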

Oddly missing core functions

MSSQL is terrible for lacking some seemingly straightforward functionality, like the ability to drop a table only if it exists and has no rows. Salt to taste:

-- TODO: accept an array of tables, check all of them for existence and zero rows, then perform drop
CREATE PROCEDURE DropTableIfEmpty(@t nvarchar(128))
AS
BEGIN
    -- Only drop when the table exists and every partition reports zero rows
    IF OBJECT_ID(@t, 'U') IS NOT NULL
       AND NOT EXISTS (SELECT 1 FROM sys.dm_db_partition_stats
                       WHERE object_id = OBJECT_ID(@t) AND row_count > 0)
    BEGIN
        DECLARE @q nvarchar(max) = N'DROP TABLE ' + QUOTENAME(@t);
        EXEC sp_executesql @q;
    END
END

To use: EXEC DropTableIfEmpty 'yourTableName'

Strong self-signed SSL certificates for development on IIS

# Generate a 4096-bit key and a self-signed SHA-256 cert valid for 10 years
openssl req -newkey rsa:4096 -nodes -keyout key.pem -x509 -days 3650 -out certificate.pem -sha256
# Inspect the certificate
openssl x509 -text -noout -in certificate.pem
# Bundle key + cert into a PKCS#12 file that IIS can import
openssl pkcs12 -inkey key.pem -in certificate.pem -export -out certificate.pfx
# Verify the bundle
openssl pkcs12 -in certificate.pfx -noout -info


Splitting LDIF files

I went to the St. Louis PowerShell meetup last night, and someone had an interesting issue. They were dealing with a 4.5-gigabyte LDIF file that needed to be processed; the processing itself had already been worked out, but PowerShell wasn’t cranking through the file fast enough. Most computing work is single-threaded by default, so the simple approach here is to split the file up and process the pieces in parallel.

What is LDIF, anyway? It’s a file format used to store LDAP entries. Happily, it’s simple: entries are just separated by a double newline. The lowest-common-denominator approach for a straightforward text file like this is to use something like sed or awk. I don’t know a whole lot about actually using either of these, but I knew that awk was really meant for splitting up “fields” contained within “records”.

First things first: grab some sample data.
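If you don’t have a real export handy, you can fake one. This here-doc writes a three-entry Example.ldif (the names are made up) that’s enough to exercise the commands that follow:

```shell
# Write a tiny fake LDIF file: one comment line, then three entries,
# each entry separated by a blank line (i.e., a double newline).
cat > Example.ldif <<'EOF'
# sample export
dn: uid=alice,dc=example,dc=com
uid: alice
cn: Alice Adams

dn: uid=bob,dc=example,dc=com
uid: bob
cn: Bob Brown

dn: uid=carol,dc=example,dc=com
uid: carol
cn: Carol Clark
EOF
```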

My first concept was to use awk to grab some part of the first field and use that as a key. So for example, I might grab the fifth character from the first line, say it’s “q”, and then append that record to a new file, say file-q.ldif. But you always want to iterate and start from what you can get working, so I started with an example I grabbed, probably from some Stack Overflow answer, though I didn’t keep track:

awk -v RS= '/^#/ {next} {print > ("whatever-" NR ".txt")}' Example.ldif

The concept here was to also just skip comments, which is why the /^#/ {next} is in there: it matches records that begin with a pound sign and skips them. Everything else gets dumped into its own file: whatever-1.txt, whatever-2.txt, and so on.

It turns out that I’m on a Mac, and awk on my system is the BSD variant, which errors out when too many files are open at once. No matter; we’ll close each file as we go.

awk -v RS= '{print > ("whatever-" NR ".txt"); close("whatever-" NR ".txt")}' Example.ldif

Really though, GNU awk is where it’s at, so I’ll be calling gawk from here. More generally, note that sed, awk, lex, and yacc all have serious cross-platform compatibility issues.  It’s easiest to standardize on GNU variants gsed, gawk, flex, and bison for all of these commands.  I’ve historically shied away from these tools because the syntax brings back bad memories of Perl, although in fact Perl grew out of these commands and not the other way around. I don’t know that I’ll use these tools every day, but it is certainly a handy approach when the “big guns” seem to be letting me down.

Back to the command above. It has a bug: it omits the double newline between entries as it writes the output files, so the LDIF entries get glommed into one big entry. To fix this, I need to set the output record separator variable, called ORS. This version grabs a character (the ninth) from the first line of the input record to produce the output filename:

gawk 'BEGIN{RS="\n\n";FS="\n";ORS="\n\n"}{print >> ("file-" substr($1,9,1) ".txt")}' Example.ldif

The above approach has some merit if I wanted to group files by some content in the data. Here though, I only want to split the large file into smaller files of roughly equal sizes. The way to do it is to rotate systematically through the output files. Thankfully, this is EASY. The variable NR is the “record number”, so I can just take the modulus against however many files I want to split into. On this data sample, this approach results in much more even output file sizes than the substring hack above, and I don’t run the risk of getting weird garbage strings in the output filenames. I’m splitting into 6 files here, but salt to taste.

gawk 'BEGIN{RS="\n\n";FS="\n";ORS="\n\n"}{print >> ("file-" NR % 6 ".txt")}' Example.ldif

There’s one final refinement- because the record separator RS and output record separator ORS are equal, we can write this as:

gawk 'BEGIN{ORS=RS="\n\n";FS="\n"}{print >> ("file-" NR % 6 ".txt")}' Example.ldif


Embroidery the Hard Way

With tens of thousands of dollars of equipment on hand, one would think that the seemingly simple would be possible.  Oh dear.  Not even close.

For years I’ve been putting binding on blankets and embroidering them with, generally, just a couple words- a name or something.  I wanted to do a bigger project- a quote that ran the whole length of the blanket.  I can’t even count the number of ways this went wrong.

I have access to a Babylock Ellisimo, a Babylock Ellyse, and a Bernina Artista 170.  The Ellisimo, being the biggest and baddest of these, seemed like the logical place to start. So….

  • Babylock doesn’t label their frames.  At all.
    • No size indication.
    • No useful indication of zero.
    • No model name or number you can search.
    • Not even an easy way to ask the sewing machine what size frames it knows about.
    • To add insult to injury, you can measure the frame, but the result is NOT the frame size.
    • Finally, I found out that the sizing is printed on the clear plastic insert pieces, which are also useful for rough alignment of your workpiece.  However, if you have the machine trace out the pattern perimeter, the trace doesn’t align with those inserts.  None of the pieces are actually labeled with part numbers or sizes, but at least you can count the square centimeters on the clear inserts.
  • Poor discoverability all around on the Babylock.
    • Insert the frame, remove the frame, try not to be upset when the moving thing breaks a needle….
    • As noted above, no way to know what size frame is loaded or what frames the machine knows about.
    • If the embroidery file on your flash drive is too large for any frame the machine knows about, it simply DOESN’T DISPLAY ANY INDICATION that the file exists!
      • Please, if there’s something wrong with the file, show some sign of life!
      • If the file is corrupt or a format you don’t understand, say so.
      • If it’s too big, say so.
      • Too many stitches?  You get the idea.
  • No way to index multiple designs together across frame positions, or at least not as far as I could tell.
    • Apparently some machines have some kind of machine vision.  I don’t need anywhere near that level of precision- just let me index off a known point.
  • Stitchcode
  • SewWhat-Pro
    • Some cool features.  Not worth $65.
  • The funny thing is that some of the hard stuff was remarkably easy.
    • The “Letterworks Pro III” software from 20+ years ago worked flawlessly to take an arbitrary TrueType font and turn it into a stitch path.
    • The pattern it generated was pretty clearly a raster, but it does work.
    • Some of the embroidery fonts have rather annoying bugs- like the “LaurenScript” font, which will have big gaps if you blow it up too much.  These don’t show up in Letterworks but are apparent on the Ellisimo display and in SewWhat.

I would much rather publish a HOWTO- but I’m in the quagmire for the time being.  There are days I wonder if a needle and thimble would be the easier thing to do….but then, this project is on the order of 50,000 stitches…..per side…..

PS It seems to me that machining, printing (2D and 3D), pick and place, and PCB manufacture all involve fundamentally the same operations, but they run on totally different stacks.  I wish we could’ve all agreed on HPGL, G-code, PostScript, or SOMETHING back in the day.

Metrics for sunsetting systems

Really, these are similar metrics to what you would use to choose systems- except they’re often on the tail end of a choice made years ago.

I frequently see multiple systems run in parallel that really should be integrated (e.g., why would one have separate time and access cards?).  If you’re going to choose between systems, score them both and compare their relative risks and benefits.

This is a good time to read my previous post about sunk cost bias and risk adjustment.

I propose the following categories and considerations for evaluating systems for sunset:

  • Longevity
    • How long do you expect to need the system?
    • What would you do if you had to replace the system?
      • What’s the horizon on the subsystems? (e.g., if it only runs on SPARC, it’s probably time to look at sunsetting it.)
      • What if the vendor goes out of business?
      • What if there’s some horrific reliability or security problem and you had to take it down?
    • Is there any reason you would have to upgrade or modify the system in the foreseeable future to meet some need?  If so, what’s the cost and associated risk?
  • Data Synergy
    • Does the system tie to your strategic data goals?
    • How easy is it to import, export, back up, and manipulate data stored in the system?
    • Is the system tied to a core competence? (e.g., if your main customer CRM is very siloed, that probably matters a lot more than if your timecard system is).
  • Finance and Risk
    • Operating and maintenance costs.  Don’t forget that periodic downtime has some associated cost, and it’s really not acceptable for any system to be down for, say, 6 hours every night.
    • Patch intervals, speed of patch issuance, difficulty, likelihood of breakage.
    • How does the system score for PCI and other security frameworks?
    • Does the system integrate nicely with both your core systems and your customers’?
    • Does the system support accretive data?

Assessing risk (and pitfalls)

I’ve already mused a bit on project estimation, and I have an upcoming post about how to think about sunsetting systems.  It occurs to me that there are at least two more fundamental questions to address:  what does it mean to talk about the risk in a system, and what biases might we have when making that estimate?

What is risk?

Finance types define risk as a standard deviation: a measure of how far actual outcomes might stray from your estimate when things don’t go strictly according to plan. If you say the project’s estimated cost is $100 with no risk, you’re saying the cost will be exactly $100. But a cost of $100 with a risk of $50 (we’ll naively assume a normal distribution) implies the actual cost could land quite far from that $100 estimate.

I often see risk discussed by technical people in terms of roadblocks: “it might be really slow when we hook widget X up to database Y.”  “We might not be able to translate records directly from System Q to System W.”  These are fine things to know about, but if you can quantify them in terms of time lost, revenue lost, or raw dollars, that will get you closer to being able to put a bottom line on your estimates.

When doing estimates, teams often use a sort of placeholder.  Instead of estimating that a task will cost, say, $1000, we tend to say it’s a “Small” task, and other tasks might be “Medium”, “Large”, or “Extra-Large”.  This is fine.  However, I also think it would be useful to score each task for risk.  For example, I think most developers would agree that “fetch a web page” is a relatively low-risk task, while doing bleeding-edge machine vision is riskier.  From here, perhaps you extrapolate that the “fetch a web page” task is estimated to cost $800-1200, while a riskier task of similar size might be $500-2000.


What are our biases?

We all have biases, and there are some tropes in industry, such as “not invented here” syndrome.  Most technical folks, I think, have some bias either toward platforms they’ve already used and understand, or toward the latest “sexy” technology.  I suffer from a variant of this: I prefer open-source products.

There’s another bias inherent to humans: the sunk cost fallacy.  We tend to impart value to things we’ve paid for, even when that past transaction has nothing to do with the current decision.  Sunk cost bias is fairly intuitive once you think about it, but it’s difficult to consciously minimize when making decisions.  The essence: it doesn’t matter that you spent a million dollars last year.  The question is the cost of a new system versus the ongoing cost of the current one.


Algorithms that are 80% good aren’t scary.

Leaving aside the difference between sensitivity and specificity: if people know the machine is sometimes wrong, it’s not so bad.  Things get scary when the machine is 99% or 99.9% accurate, and you’re caught on the wrong end of a presumption.

I’ve often wondered, for example, how many people get picked up on warrants for someone else.  I think it’s important that we have systems for review of the totality of the circumstances by a human.

Vexing little bugs

I find that, particularly with geek stuff, I get hung up on tiny little details.  For example, I did a deep dive with JavaScript and CSS the other week, trying to find out why I couldn’t get an input field to select all of its text when I clicked on it.  This ties back to Spolsky’s thoughts on craftsmanship.

But if I channel Godin for just a bit, I’d remember that the whole product is important, not just one tiny detail.  It’s fine to care about little details once the building is built, but while you’re building, keep going.