Security for self-patching software

For Chrome, I had to create a network update facility. On the surface this is a pretty simple task, but it has some interesting security implications. Specifically, what happens if someday my website gets hacked and someone replaces my “patch” with a virus? Would all my customers then auto-install the virus? That would be awful, so I want to protect against it.

Keep in mind, however, that all these other applications you are running on your desktop probably have no security against this. They happily connect up to some server somewhere and download a program. If the right evil-doer gets into that website, who knows what your system might download.

So, how to protect against this? Well, the obvious solution is to sign your code. Code signing generally relies on asymmetric cryptography – algorithms like RSA or ECC that use a public/private key pair.

Microsoft has a technology called Authenticode which they use in Internet Explorer to help you avoid downloading bad programs. (Most users probably read “this code is not authenticated” but click right through and install anyway.) But for my purposes, I want to verify a signature within my own code, and Authenticode doesn’t let me do that easily. And even if it did, I’d have to go get a certificate from Verisign or Thawte – and that would cost me $150 or more.

So, I came up with the following solution. Thanks to CodeProject for a couple of sample source code projects that helped along the way.

Solution
The idea is that when my software wants to do an upgrade, it downloads a small XML file from the server which tells the application what new version is available, where to download it from, etc. The XML file also contains a signature section, which was created by signing the entire XML file with my private key.

So, I create a private key/public key pair using Microsoft .NET’s SN.EXE program, and I store that away in a safe place. I extract the public key, and hard code that into my program file which I ship to customers.

When the program checks for updates, it downloads the XML file. It can now verify the authenticity of the XML file by checking that it was signed with my private key. Inside the signed portion of the XML is an MD5 hash – the hash of the to-be-downloaded patch. Assuming the signature on the XML file matched, we save away this MD5 to compare against the MD5 of the file we actually download.

Now the application proceeds to the downloading step. After that completes, it verifies the MD5. Since the XML file signature matched, I know that this MD5 hash is “approved” by me. And since the MD5 of what I actually downloaded matches the hash in the signed XML file, I know I can safely install the file.
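Here’s a rough sketch of that flow. My real implementation is in C#.NET with a real key pair; this Python version uses a textbook-toy RSA key (tiny primes, utterly insecure) and a made-up manifest format, purely to show how the signed MD5 ties the manifest to the patch:

```python
import hashlib

# Toy RSA keypair (p=61, q=53): modulus, public exponent, private exponent.
# Real code would use a crypto library and proper key sizes.
N, E, D = 3233, 17, 2753

def sign_manifest(manifest: bytes) -> list[int]:
    # Publisher side: hash the manifest, then "sign" each digest byte
    # with the private exponent.
    digest = hashlib.md5(manifest).digest()
    return [pow(b, D, N) for b in digest]

def manifest_is_authentic(manifest: bytes, sig: list[int]) -> bool:
    # Client side: recover each digest byte with the hard-coded public
    # exponent and compare against a fresh hash of the manifest.
    digest = hashlib.md5(manifest).digest()
    return len(sig) == len(digest) and all(
        pow(s, E, N) == b for s, b in zip(sig, digest))

def patch_is_authentic(patch: bytes, approved_md5: str) -> bool:
    # After downloading, check the patch against the MD5 carried
    # inside the signed manifest.
    return hashlib.md5(patch).hexdigest() == approved_md5

patch = b"...binary patch contents..."
manifest = (b"<update><version>0.7</version><md5>"
            + hashlib.md5(patch).hexdigest().encode() + b"</md5></update>")
sig = sign_manifest(manifest)                       # done once, at release time
print(manifest_is_authentic(manifest, sig))         # True
print(manifest_is_authentic(manifest + b"x", sig))  # False: tampered manifest
```

Note that the client only ever needs the public half of the key, which is why it’s safe to hard-code it into the shipped binary.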

Amazingly, in C#.NET, the entire signing/verification process compiles down to an executable of less than 40KB.

Benefits
The great thing about this approach is that it’s quite safe and hard to break. And I didn’t have to go get a certificate and shell out bucks to anyone.

Risks
One risk would be if my public/private key pair were ever stolen or compromised. If so, I’d have no way to update existing clients. But this problem exists with real certificates today too. CRLs (Certificate Revocation Lists) can be used, when you have a real certificate, to find out if a public key you once trusted is still valid. But most software doesn’t implement CRL checking anyway 🙂 I suppose that will change someday.

And I think this is better than Microsoft’s Authenticode. Authenticode will still allow the poor user (who didn’t know why he got some funky security warning) to install the code. My approach flat-out rejects anything which doesn’t pass my signed-code test.

Well, that’s my solution. Hopefully I didn’t forget anything silly. And I hope my software is a lot more secure against downloading bad patches than most of the software out there.

Chrome, Installers, and NSIS

Good news on Chrome this week. We released our 0.7 versions of the product, and they seem to be doing really well in the field. Many of the weird stability problems are gone. There have been a few minor bugs to fix. We’re so lucky to have users that are patient and helpful. They’ve been just wonderful at helping us get the product to work.

OK – about installers. So far, we’ve used the Microsoft Installer to build our packages. It’s been pretty nice. The way the Microsoft Installer works is that most of the program is resident in Windows already. Windows XP, for instance, ships with version 2.0 of the Microsoft Installer Service built in. Whenever you launch an MSI file, it’s really the built-in installer service which is doing the work. The contents of the MSI file are basically a big database, and the Microsoft Installer software knows how to read this database and act appropriately. Microsoft ships a database-editing tool for MSIs called Orca (it’s in the developer’s SDK) which is worth downloading if you spend much time with an MSI.

For simple installs, MSIs are great. But I’ve found they have a number of drawbacks for us:
1. Installation is slow. Our 2MB application takes about 30 seconds to install. Why?
2. Upgrading existing software is tricky. You have to get a whole bunch of options right in Visual Studio for this to work. I ran into one bug where, because my product is versioned “0.70.0”, upgrades just don’t work. Turns out (if you inspect the MSI with the Orca database tool) that there is an auto-generated upgrade rule covering a minimum version of 1.0.0.0 up to a maximum of 0.70.0. Well, of course that won’t work – my version is below 1! I think it’s just a bug, but I wrestled with that forever.
3. Some customers are reporting a really weird error which says the MSI file is corrupt. It’s intermittent, and only some users see it. But it’s happening at a rate that’s high enough that I don’t believe it’s corrupt downloads. Search the net for these errors and you’ll find zillions of other people with the same problem.
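For what it’s worth, the versioning bug in drawback 2 boils down to an empty range check. A tiny illustration (hypothetical – just comparing version tuples the way an upgrade rule’s min/max bounds would):

```python
def parse_version(v: str) -> tuple[int, ...]:
    # Compare versions numerically, field by field.
    return tuple(int(part) for part in v.split("."))

# The auto-generated upgrade rule effectively says:
#   minimum <= installed_version <= maximum
minimum = parse_version("1.0.0.0")
maximum = parse_version("0.70.0")

installed = parse_version("0.70.0")
matches = minimum <= installed <= maximum
print(matches)  # False – the range is empty, so the upgrade never fires
```

Any version below 1.0.0.0 fails the lower bound, so the rule can never match anything.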

Solution: use a different installer. I found NSIS (from NullSoft, the makers of WinAmp), which I like. You definitely don’t get any of the gravy that comes along with a Visual Studio-like environment – scripting NSIS is back to unix-land. I saw there are some visual script editors for NSIS available, but I didn’t try them. It took me a good couple of days to really clean up the installer, but in the end it was worth it for reasons I hadn’t originally anticipated:
– The size of my download decreased from 1.25MB to 0.65MB. I don’t know what MSIs are doing, but they are just too big. It’s the same files in each, but NSIS is able to make downloads HALF the size.
– Install is amazingly quick.
– Uninstall is amazingly quick.
– You get full control over what happens on install, uninstall, upgrade, etc. There is no “behind the scenes” magic. Of course, this is a blessing and a curse.
– You get to customize the graphics in a much nicer way.

So, I recommend NSIS to anyone looking for a nice installer.

There are some features which you lose when moving away from MSIs. For instance, MSIs come with built-in administrative controls which allow IT folks to determine which applications the end user can install. I’m not sure how that all works, but I suspect the MSI is spending a lot of time figuring it out.

Go OpenOffice

This is really wordy, but it’s got a great point at the bottom. If you can’t get through this wordy doc, then just click right now over to OpenOffice and check it out. Otherwise, read on about how I found it… This is a real-life story of why software piracy is good for Microsoft, and how Microsoft is making a big mistake right now.

I bought a new PC this week. It’s nice. I decided to do some video editing, so I got a 2.8GHz, 800MHz-FSB, 512MB system with an ATI 9000 All-In-Wonder graphics card. That’s a pretty decent card for video input. I ended up buying the thing in parts, because it was substantially cheaper than the assembled systems. Normally, assembled systems are cheaper these days, but with my fussiness over the DVD writer and the ATI graphics card, I couldn’t find one with the stuff I wanted.

Anyway, it turns out that installing the hardware was the easy part. The hard part was getting Microsoft Windows to work. I wanted to upgrade to Windows XP, and then I stumbled across the anti-piracy stuff that’s in there. With XP, it turns out that Microsoft requires activation over the network. They know if your key has been used before, and they won’t let you activate if it has. Of course, I was up late at night trying to get the thing installed, so I was using a CD that I got with my laptop (a legal copy, just not for my new machine!). Well, Microsoft was successful in preventing me from getting my installation done. They will let you try the OS for 30 days before they lock you out, but curious me actually set the clock forward 3 months to see what happened when the 30 days were up. Yup – they locked me out!!!

So I got on the phone to Microsoft, calling the number given to me by the activation application. I told them I wanted to buy two copies of WinXP Pro. The nasty woman on the other end of the phone was clearly angry with me. She kept going on about how my license didn’t allow me to install the software. I told her I knew that, and that I wanted to give her my credit card number so I could pay for two copies. Needless to say, that was beyond her competence.

So, the next day, I took a trek to Fry’s and bought an OEM copy of Windows. The OEM version costs $99, while the regular version costs $150. But you have to buy hardware in order to qualify for the OEM version, and my system, purchased the day before, didn’t qualify. The nice customer service person directed me to their screw aisle (no pun intended), where I picked up a $0.99 bag of computer case screws to qualify as the hardware purchase to go with my OEM version of XP.

Where does this leave Microsoft? Did they win? Did they lose? Well, in the short term, they clearly won. I purchased two copies of Windows XP that I probably wouldn’t have otherwise purchased. In the long term, however, they lost. I’m definitely not going to be upgrading to any new versions in the future unless I absolutely have to. And, I’m going to start looking for alternatives to Microsoft more seriously now too. From what I’ve read, they’ve locked down Microsoft Office tightly with keys just like they did for Microsoft Windows. So, this means I’m looking for alternatives to both.

I’ve never been a linux or open source bigot like many software developers are. I like linux, of course (the server which is serving this page is running Linux), but I don’t want linux on my desktop. So for now I still need Windows as my operating system. And I will admit it – I like Windows! Microsoft has really done a great job with XP, and I have no major complaints about it. In the long run, buying it, even for home, may be okay. But I’m going to look a lot more at the competition if I *have* to pay for it rather than *elect* to pay for it.

So, I think this anti-piracy crusade could be long term trouble for Microsoft. If they had just let me have it for free at home, I’d use it at work (and pay for it). Why would I look for anything else? I think the software is good – I learned it at home (for free) and I want it in the office too, because that makes my job easy.

But instead of having me use Office at home, I’m now out looking for something else that’s cheaper. And the only reason is that Microsoft wanted to get $99 from me. (Actually, Office costs more – $200.) I’m really liking the alternative I’ve found so far. Yes, I’ve still got XP, of course, but Microsoft Office is NOT on my systems anymore.

I found OpenOffice – an open source alternative to Microsoft Office. Bundled in the package are equivalents of Excel, Word, and PowerPoint – oh, they call them “Spreadsheet”, “Document”, and “Presentation”. So far, it’s awesome. The install was incredibly smooth and the visual presentation was great (although it did give me some too-techie gripe about not having Java installed?). And so far it’s worked with every Word doc and spreadsheet I have. Literally – I see no bugs, and I’m absolutely blown away. This is really good software.

So, Microsoft, we’ll see who has the last laugh. I’m a small business owner, and I obviously hope my business grows. In the future, I’ll need to purchase Office-like software for my own employees. Had you not added this anti-piracy thing to my Windows XP install, I’d absolutely be purchasing Microsoft Office. Why would I look for anything else? I know how to use it, I like it, and it’s cheaper not to have to train employees to use something else. Now, however, I’ve discovered OpenOffice, and I’m going to use it for a while. Other home users are doing the same thing. And the next time I need to purchase a word processor, you may not get my business. While I may not have been paying when I was at home before, at least you were getting my mindshare. Now, I’m still not paying, and I’m not even running your products.

Getting even with Spam.

Let’s get back at spammers. I have an idea how. Read on…

Spam is bugging me today. Did it bug you?

I have to admit that I’m hot and cold about taking action on spam. Some days, like today, I feel real angst and want to get back at the spammers and actually stop it. But on most days I just think, “What’s the point? We all just need to deal with spam and move on. There is no recourse that’s useful.”

So, I have a new idea. To implement, we need a large body of computers willing to help. It goes something like this:

All spam is basically to sell something. In order for someone to sell you something, they have to identify themselves. This can be via a phone number, a snail mail address, an email address, or a website. But somehow or other, they need to leave a mechanism by which they can be contacted. We’re going to leverage that fact.

We write a spam filter. Unlike most spam filters, however, this one does something extra. In addition to hiding the spam from the user, the filter combs through the spam message and finds any identifying marks which point back to the seller – websites, email addresses, or phone numbers. Each spam filter then publishes these identifying marks up to a central server.

The central server basically collects “votes” for who is a spammer. Each night, it then publishes a “black list” – the top 20 spammers for the day. Each of the spam filters downloads this list and starts attacking the spammers.
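A sketch of the filter/server plumbing might look like this. The regexes are deliberately simplistic, and the “central server” is just an in-memory tally standing in for a real service:

```python
import re
from collections import Counter

# Simplistic patterns – a real filter would need far more robust extraction.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
URL_RE = re.compile(r"https?://[\w./-]+")

def extract_marks(message: str) -> set[str]:
    # Pull out every contact point the spammer left behind.
    return set(EMAIL_RE.findall(message)) | set(URL_RE.findall(message))

class CentralServer:
    def __init__(self):
        self.votes = Counter()

    def submit(self, marks):
        # Each client's filter "votes" for the marks it saw in spam.
        self.votes.update(marks)

    def blacklist(self, n=20):
        # Nightly: publish the top-n most-reported spammers.
        return [mark for mark, _ in self.votes.most_common(n)]

server = CentralServer()
for _ in range(3):  # three clients report the same spam
    server.submit(extract_marks(
        "BUY NOW!!! http://cheap-pills.example "
        "reply to sales@cheap-pills.example"))
server.submit(extract_marks("hi, see http://a-friend.example"))
print(sorted(server.blacklist(2)))
# ['http://cheap-pills.example', 'sales@cheap-pills.example']
```

The voting threshold is what keeps a one-off mention (like the friend’s link above) off the black list.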

How to attack the spammer? Here is how.

1. If you have the spammer’s email address, each spam filter starts sending “don’t spam me” messages. They are sent to the “abuse@” and “webmaster@” addresses as well, and emails are cc’d to the FTC.

2. If you have the spammer’s website, each spam filter starts auto-posting bogus data back to the website. This will drive most spammers nuts – now their databases are filled with junk, and they’ve got more crap in their responses than legitimate responses! (Hey, they spammed me first!)

3. If you have the spammer’s phone number – well, this one might be hard to attack. I guess we could all let our long distance bills go through the roof and use our modems to attack.

Etc, etc, etc. Well, this idea has probably been considered before. It’s not new.

Sigh.

Garbage In/Garbage Out

Lots of programmers have moved from languages that don’t do Garbage Collection to languages that do. In fact, I’m probably a latecomer to using it seriously. Sure, I’ve used some amounts of Java on the side over the past few years – enough to be dangerous, at least. But I haven’t used it enough to really care how the GC was working, or even to notice bugs where the GC was masking things for me.

In the good old C++ days, every major programming effort I’ve been involved with had lots of memory-allocator debugging techniques employed. We’d use macros for malloc/free, override new/delete, run Purify, zero memory when it’s deallocated, create safety zones on each side of buffers, etc. Once you’d done it for a while, these techniques served you pretty well, and with very little effort you could debug all your memory usage patterns.

Now, fast forward to the land of Garbage Collection. With the language figuring out what you intended to free, you shouldn’t need any of these tools, right? Well, sort of. So far, in my short experience with GC’d languages, it seems pretty common that you need to reference *something* that isn’t written in the GC’d language – for example, Java calling out to C++. In this case, you are passing objects back and forth – sometimes pointers, sometimes not. Either way, you’ve got references to objects that are not going to be GC’d held by objects that are GC’d. Unless you have a perfectly neat little program that can be 100% Java, you may run into this. And debugging it is a pain!

Why is it hard to debug? Well, in C and C++, you can employ all sorts of tricks to allocate/deallocate memory differently. But in the GC’d world, once you drop your references to an object, it’s going to get cleaned up eventually. And you don’t know when! When does the GC run? When does it not run? Not much you can do.

Finally, I found one trick which helped a bit. That was to create a simple thread that sits in the background (development mode only) and initiates the GC collection process every second or so. This way, if I’ve got some dangling reference somewhere, the GC will collect the object, and I’ll notice the bug a *lot* sooner than I would have otherwise.
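In Python terms (my actual code was Java, so this is just an analogous sketch), the trick looks something like this – a daemon thread forcing a full collection on a short interval, plus a weak reference to show the effect:

```python
import gc
import threading
import time
import weakref

def start_gc_thread(interval: float = 1.0) -> threading.Thread:
    # Development-only helper: force a full GC pass every `interval`
    # seconds so dangling-reference bugs surface quickly.
    def loop() -> None:
        while True:
            gc.collect()
            time.sleep(interval)
    t = threading.Thread(target=loop, daemon=True, name="gc-hammer")
    t.start()
    return t

class Node:
    def __init__(self):
        self.ref = self  # deliberate reference cycle

node = Node()
probe = weakref.ref(node)
del node                 # the cycle keeps the object alive until a GC pass
start_gc_thread(0.1)
time.sleep(0.5)          # give the background thread time to collect
print(probe() is None)   # True – the cycle was collected almost immediately
```

Without the background thread, that cycle might linger until some unrelated allocation finally triggered a collection, and the bug would show up far from its cause.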

Anyway, this probably isn’t interesting to most folks, but I found it an interesting problem. I like not having to worry about memory management. But my stodgy old C++ side really likes understanding exactly when my objects are coming and going. Maybe I’m a control freak.

Lucene

I’ve recently been working with Lucene, an open source full-text search engine. I hadn’t been looking at search engine technology for a while, but all of a sudden I keep hearing about it in all different contexts.

It’s a really nice index. It’s amazingly simple to use, and it appears to be blazingly fast – reasonable for both writes and reads of the index.

My Assistant

OK. Well, this is really silly, but it was kind of fun. I ran across the Microsoft Agent Wizard today, and so I had to create my own wizard. You can find him here; click on the link which says “Have my 24×7 assistant guide you through this page”. It’s cute. You’ll need to be running IE for it to work. It may take a minute to load. But, hopefully, it is worth the wait!

Building blocks for RDF

If you were going to create the RDF classifieds, there are some RDF building blocks you’d like to have.

Each for-sale item will have:
– A price.
This is a semi-complex item. What is the actual price? What currency is it in?
– Shipping terms
Paid for by seller? How much? Paid for by buyer?
– Category
Presumably, robots will want to know how to categorize this. There is an RDF Taxonomy module which leverages the DMOZ categorization scheme. I hope DMOZ promises to never change the taxonomy? 🙂
– Contact info for seller
Contact him by phone? By email? FOAF is probably the answer for this one.
– Location
Where is the item?
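To make that concrete, here’s a hypothetical shape for one for-sale item. The element names are invented for illustration – they aren’t drawn from any real RDF vocabulary – but they cover each building block above:

```python
import xml.etree.ElementTree as ET

def build_item(title, price, currency, shipping, category, contact, location):
    # Assemble one for-sale item with the building blocks listed above.
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    price_el = ET.SubElement(item, "price", currency=currency)
    price_el.text = str(price)
    ET.SubElement(item, "shipping", paidBy=shipping)
    ET.SubElement(item, "category").text = category   # e.g. a DMOZ path
    ET.SubElement(item, "contact").text = contact     # e.g. a FOAF reference
    ET.SubElement(item, "location").text = location
    return item

item = build_item("Mountain bike", 250, "USD", "buyer",
                  "Top/Shopping/Sports/Cycling",
                  "mailto:seller@example.com", "Palo Alto, CA")
print(ET.tostring(item, encoding="unicode"))
```

The point of standardizing these pieces is that a robot could parse the price, currency, and category without guessing – which is exactly what free-text classifieds make impossible.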

And, these aren’t new ideas.

I wish there were standard building blocks for things like prices and such though.

RSS Classifieds

I’ve been discussing the “RSS classifieds” idea (for lack of a better name) more with a few colleagues. I think the idea has merit.

The basic concept:

A rich RDF format is developed for specifying “I have something to sell”.
People that are selling can create this once and register with search engines. Search engines look at the item, determine if it’s worth “accepting” into their system, and then post it for sale.

Why do I want this? Can’t I do it on EBay? No, you can’t. EBay, for all its greatness, is a closed system. Fortunately for most sellers, EBay is currently the largest online marketplace, so it’s a safe bet. But if you want to advertise your for-sale item elsewhere, you have to do that manually – reposting your data into each system separately. You probably have to register on each system separately, etc, etc. It’s a real pain in the neck. And in exchange for locking in exclusively to EBay, EBay also takes a fee from you!

OK. So, moving on. What we need to have such a system:

1. A format to specify for-sale items in RDF and RSS
2. Search engines that recognize the format. An ability to make sure that search engines “stay fresh” with the current status of the item

Well, that’s it to get to phase I.

But there is more. One really handy thing about EBay is its rating service for users. With RSS, I think this is reproducible in a distributed way. Let’s say Bob and Charlie are about to engage in a transaction, where Bob is selling to Charlie. After the transaction, Bob puts a review of Charlie into his feed which says “Good”. Charlie puts a review into his feed which says Bob was “Bad – late with payment”. As Bob and Charlie enter into many transactions over time, each will be reviewed by others many times. These reviews can be found by search engines, and an overall composite score can be generated for each user – in effect replicating what EBay has done. Of course, as with everything on the web, we’ll have to take some time building anti-spam features. We don’t want Bob to be able to boost his ratings by just creating lots of fake reviews of himself. That is probably solvable with a few heuristics, much like what search engines use today. The harder case is the one where Bob wants to maliciously accuse Charlie of being “Bad”. That may be solvable by using anti-spam techniques and also by allowing Charlie to post his own “review rebuttal” within his own feed. Lastly, this mechanism has problems with individuals changing their reviews over time. Sure, Bob initially gave Charlie a good review. But after Charlie gives Bob a bad review, Bob goes and changes his review of Charlie to say “Bad”. It may be that a web service is in order here for verifying the authenticity of reviews.
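The composite-score idea can be sketched pretty simply. Here the crawled reviews are faked up as tuples, and the only anti-spam heuristic shown is “one counted review per reviewer/subject pair” – a real system would need much more:

```python
from collections import defaultdict

# Stand-in for what a search engine would have crawled out of feeds:
# (reviewer, subject, verdict) triples.
reviews = [
    ("bob", "charlie", "good"),
    ("alice", "charlie", "good"),
    ("dave", "charlie", "bad"),
    ("charlie", "bob", "bad"),
]

def composite_scores(reviews):
    tally = defaultdict(lambda: [0, 0])   # subject -> [good, bad]
    seen = set()
    for reviewer, subject, verdict in reviews:
        # Count one review per (reviewer, subject) pair, and ignore
        # self-reviews, so nobody can stuff the ballot box trivially.
        if (reviewer, subject) in seen or reviewer == subject:
            continue
        seen.add((reviewer, subject))
        tally[subject][0 if verdict == "good" else 1] += 1
    return {s: good / (good + bad) for s, (good, bad) in tally.items()}

print(composite_scores(reviews))  # charlie ≈ 0.67, bob = 0.0
```

Note that nothing here solves the malicious-accusation or review-editing problems – those need the rebuttal and verification mechanisms described above.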

One other interesting point is anonymous email. I don’t think blogs will be using open email addresses forever. Once the spam bots figure out how to parse these little gems, we’ll all be spammed in our mailboxes. Plus, we really don’t want the general public to browse and see that Bill Lee is selling his collection of fancy dolls. So each for-sale item that gets posted in RDF/RSS format may want to include contact info which goes through a one-way email anonymizer service. Many (most?) of the classifieds services online today already provide this; Craigslist is a great example.

From what I’ve seen, nobody has really done this so far. (Let me know if you’ve seen otherwise.) I’m not sure why. Maybe there is no money in it. Also, I think the RDF required for this type of thing is substantially more complex than anything in any of the RSS specifications today. Today’s RSS is about as simple as it gets.

Bots

A friend pointed me at a Scientific American article today, titled “Baffling the Bots”. It’s a fun read, I suppose. It sort of credits Yahoo with having pioneered this stuff in 2000. But we totally did this at Remarq in 1998/1999.

The reason we had to do it was because we had tons of images on our site, and people were sending bots to go find them all. This ate up a fair amount of bandwidth. So, we just required users to type in the number in the picture before they could view pictures, and they had to redo this every 50 pictures or so. Interestingly, we just displayed a simple 3 digit number. The implementation was cheap – we pre-generated the numbers and actually only ever displayed about 50 different numbers. As far as we knew, though, it worked 🙂