Corporate Email Users, Email Compliance and Google’s Desktop Search

Now that Google’s product has been out a few days, people are starting to write about it. One concern coming up is security, which you can read about in many articles, such as this one.

One topic not yet mentioned is that for corporate users, Google’s product may also have severe legal implications. It’s one thing to keep cached copies of webpages. But keeping cached copies of email, which can contain important proprietary information, is another story. And as it turns out, if you delete an email in Outlook, Google’s product still keeps its stealthy little hands on its own copy anyway. This means users may *think* they deleted an email when they actually did not. Unfortunately, I think this is a clear sign that Google still “doesn’t get it” when it comes to enterprise users and their search needs.

In 2004, many companies are spending huge dollars making sure they are in compliance with new information-responsibility laws. Companies are more responsible than ever for making sure that information about their customers is protected and not leaked. If your employees install Google’s software and receive a transient email containing confidential information, Google caches it in the background and won’t let you delete it without going back and manually looking for it. Most users won’t have any idea this is happening, let alone know what to do about it. And once that information is cached in a hidden place, it could accidentally slip into the wrong hands.

Likewise, if you archive your email in Exchange, or have corporate retention policies to delete emails that are older than 90 or 180 days, Google pays no attention. It happily caches your private information with no regard for your information lifecycle.

Lastly, there is the complaint that many users have already pointed out: once the information is cached, anyone logging into the machine as another user can also see the cached copies of that email. So, no more lending your machine to your office mate while you are out to lunch.

If I were in charge of an IT department, I would be very concerned about employees installing Google’s product. It’s a lot of legal risk.

Spyware Popularity

I was visiting the Alexa site today and looking at the top-10 most popular sites on the web. The list is mostly what you’d expect. But who the heck is OfferOptimizer?

Well, it turns out they are the recipient of traffic from some of the most commonly found spyware. They collect information off your machine and use it for their own sinister purposes.

I guess they must be happy – 8th most commonly accessed website on the net. Sigh. How sad!

Strings in C++ in 2004

I wonder how many people are dealing with strings today. How many programmers are stuck because they have one library using a std::string, another using an ATL::CString, and a third using an MFC::CString? Or maybe you are trying to interface to a BSTR? How about an LPWSTR? Or even a plain old char *? Maybe you’ve got your own whiz-bang String class that’s even better….

As software designers in C++, we’ve just unequivocally and totally failed. C++ was heralded as the object-oriented language that would bring us interoperability and code reuse. Say what? How can we even fantasize about being interoperable when we can’t even agree on how to format a string?

I think Java and C# have two major advantages. One is garbage collection; the second is a standard definition of strings.

How is it that here in 2004, we can’t figure out how to put an array of characters together in a way that works in all languages, and is interoperable with other programs? This is absolutely ridiculous.

Of course this is all well known and has been for years. As a struggling programmer sent back to the dark ages of C++ after having been spoiled by managed code, it just gets me particularly grumpy.

Your Anti Virus Program is a Virus

I had a couple of reports over the last few days that the Lookout installer was infected with some sort of trojan or virus. This is very alarming, of course! So we looked into it seriously.

What we found is a bug in Symantec’s product. On Aug 9th, the corporate edition of their anti-virus software published a new virus-definition file which incorrectly diagnosed the Lookout installer as containing a virus. This has apparently been fixed in their Aug 10, rev 23 update of that file.

The particular file that was declared a virus was “nsisdl.dll”. It’s part of the NSIS installer, which is used by Lookout but was written by the WinAmp team. From reading around the net, you can see that their product (as well as every other product that uses NSIS) was suddenly hit by the antivirus product.

What the antivirus product does is delete the files which contain “bad stuff” – and it does so automatically. And the definition of “bad stuff” is auto-updated behind your back. I sure hope they don’t make mistakes like this very often. What would happen if your trusted anti-virus folks made a more serious blunder? What would happen if some hacker figured out how to edit that file (it’s probably signed to avoid tampering)? Shoot – with this powerful antivirus software running on your system, who needs a virus program? If I were a hacker, I’d spend all my time dissecting the virus-definition file from Symantec and trying to change it on their site. It would be hard work, but if you were successful, it would be the worst nightmare ever. Symantec has taken care of the distribution problem for you – just flip a couple of bits and that “anti” virus becomes the virus itself.

But you know, I’m paranoid. I guess false positives are part of the world we live in. Sucks.

Blog Spam

It’s a shame, but everyone seems to be doing their best to spam lately. I got two comments today (from the same person, pretending to be two people) with a link to their own advertisement for junk. I guess they are under the mistaken belief that Google will give them better placement if they use my site as a link to their spam? I guess they think Google can’t figure that out? Hmm. Spammers.

Anyway, if you think you’ll get away with spamming here, think again. I will crush you.

Stealing keystrokes

The “Scob” virus that was swimming around the web last week reportedly captured keystrokes and sent them back to a Russian site somewhere. It’s unclear whether the recipients of this data were sophisticated enough to actually use what they stole, but if they were, the poor souls who lost their data could be in for real trouble.

The notion of “stealing keystrokes” has me thinking. I’m no expert on keyboard drivers in Windows, but why is it that a 3rd-party application can be installed on Windows and steal keystrokes which were intended for a different application? Just like we have segmented memory today, shouldn’t we have segmented keystrokes? In Unix, when I press a key on my virtual terminal, you’d have to be super-user to steal it. (X Windows probably opens up a whole set of holes, but let’s not talk about that yet.)

So, why not write some sort of driver inside Windows to protect the keyboard messages? Maybe some types of keys, like control keys, shift keys, etc., would still be sent around globally so that hotkeys and such can work. But for regular old keypresses, is there a way we can make sure they arrive only at the intended application?

We know that these virus attacks are going to get worse. I think we need to start protecting even the individual subsystems in Windows.

I wish I were proposing a better “solution” here, rather than just complaining about shortcomings…. I’ll do some research and see if I can’t learn anything.

Blogs as a technical resource

I love blogs. There are a lot of smart people out there writing really great stuff about technical topics – especially people who have actually *used* the technology, rather than just documented it…. But since anyone can write a blog, and everyone has a slightly different standard for what is “publish worthy”, there is a fair amount of misinformation out there!

As I dive into more obscure topics in MC++, C#, and Office/Outlook, there are fewer and fewer resources to draw on. As such, when there is misinformation out there, it becomes all the more apparent! I’m seeing a lot of it lately.

I wish I had a good answer. I’ve written to a few authors – and they are generally very receptive to cleaning up mistakes. But, boy, be careful.

But I’m probably guilty of it too. Who knows how much misinformation exists in this blog!

Security for self-patching software

For Chrome, I had to create a network-update facility. This is a pretty simple task on the surface, but it has some interesting security implications. Specifically, what happens if someday my website gets hacked and someone replaces my “patch” with a virus? Would all my customers then auto-upgrade to a virus? That would be awful, so I want to protect against it.

Keep in mind, however, that all these other applications you are running on your desktop probably have no security against this. They happily connect up to some server somewhere and download a program. If the right evil-doer gets into that website, who knows what your system might download.

So, how to protect against this? Well, the obvious solution is to sign your code. Code signing generally relies on asymmetric cryptography (algorithms that use a public/private key pair, like RSA or ECC).

Microsoft has a technology called Authenticode, which they use in Internet Explorer to help you avoid downloading bad programs. (Most users probably read “this code is not authenticated” and click right through to install anyway.) But for my purposes, I want to verify a signature within my own code, and Authenticode doesn’t let me do that easily. And even if it did, I’d have to go get a certificate from Verisign or Thawte – and that would cost me $150 or more.

So I came up with the following solution. Thanks to CodeProject for a couple of sample source-code projects.

Solution
The idea is that when my software wants to do an upgrade, it downloads a small XML file from the server which tells the application what new version is available, where to download it from, etc. The XML file also contains a signature section, which was created by signing the entire XML file with my private key.

So, I create a private/public key pair using Microsoft .NET’s SN.EXE program and store it away in a safe place. I extract the public key and hard-code it into the program file that I ship to customers.

When the program checks for updates, it downloads the XML file. It can now verify the authenticity of the XML file by checking that it was signed with my key. Inside the signed portion of the XML is an MD5 hash of the to-be-downloaded patch. Assuming the signature of the XML file matched, we save away this MD5 to compare against the MD5 of the file we actually download.

Now the application proceeds to the downloading step. After that completes, it verifies the MD5. Since the XML file signature matched, I know that this MD5 hash is “approved” by me. And since the MD5 of what I actually downloaded matches the signed XML file, I know I can safely install the file.
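The two-step check described above can be sketched out in a few lines. This is a minimal Python sketch rather than the C#.NET code the real tool uses; the manifest element names (update, md5) are illustrative assumptions, and the RSA signature check is stubbed out since that part depends on the hard-coded public key:

```python
import hashlib
import xml.etree.ElementTree as ET


def verify_signature(manifest_xml: bytes, public_key) -> bool:
    """Stub: in the real design, this checks the manifest's signature
    block against the public key hard-coded into the shipped binary
    (e.g. RSA signature verification). Always 'passes' in this sketch."""
    return True


def check_update(manifest_xml: bytes, downloaded_patch: bytes, public_key=None) -> bool:
    # Step 1: the manifest itself must carry a valid signature,
    # proving it came from the publisher's private key.
    if not verify_signature(manifest_xml, public_key):
        return False
    # Step 2: the signed manifest contains the MD5 of the patch;
    # the bytes we actually downloaded must hash to the same value.
    root = ET.fromstring(manifest_xml)
    expected_md5 = root.findtext("md5")
    actual_md5 = hashlib.md5(downloaded_patch).hexdigest()
    return actual_md5 == expected_md5


# Example: a manifest whose <md5> matches the patch bytes is accepted;
# a tampered download is rejected even though the manifest is signed.
patch = b"fake patch bytes"
manifest = (
    "<update><version>1.2</version>"
    "<url>http://example.com/patch.exe</url>"
    "<md5>%s</md5></update>" % hashlib.md5(patch).hexdigest()
).encode()
print(check_update(manifest, patch))             # True
print(check_update(manifest, b"tampered bits"))  # False
```

The key point is that only the small XML manifest needs to be signed; the (possibly large) patch is protected transitively by the hash inside it.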

Amazingly, in C#.NET, the entire signing/verification process compiles to an executable of less than 40KB.

Benefits
The great thing about this approach is that it’s quite safe and hard to break. And I didn’t have to go get a certificate and shell out bucks to anyone.

Risks
One risk would be if my public/private key pair were ever stolen or compromised. If so, I’d have no way to update existing clients. But this problem exists in the real world today, too. CRLs (Certificate Revocation Lists) can be used, when you have a real certificate, to find out whether a public key that you once used is still valid. But most software doesn’t implement CRL checking anyway 🙂 I suppose that will change someday.

And I think this is better than Microsoft’s Authenticode. Authenticode will still allow the poor user (who didn’t know why he got some funky security warning) to install the code. My approach flat-out rejects anything which doesn’t pass my signed-code test.

Well, that’s my solution. Hopefully I didn’t forget anything silly. And I hope my software is a lot more secure against downloading bad patches than most of the software out there.

Getting Even with Spam

Let’s get back at spammers. I have an idea how. Read on….

Spam is bugging me today. Did it bug you?

I have to admit that I’m hot and cold about taking action on spam. Some days, like today, I feel angst and want to get back at the spammers to actually stop it. But on most days I just think, “What’s the point? We all just need to deal with spam and move on. There is no recourse that’s useful.”

So, I have a new idea. To implement, we need a large body of computers willing to help. It goes something like this:

All spam is basically trying to sell something. In order for someone to sell you something, they have to identify themselves. This can be via a phone number, a snail-mail address, an email address, or a website. But somehow or other, they need to leave a mechanism by which they can be contacted. We’re going to leverage that fact.

We write a spam filter. Unlike most spam filters, however, this one does something else. In addition to filtering the spam from the user, it combs through the spam message and finds any identifying marks that point back to the seller: websites, email addresses, or phone numbers. Each spam filter then publishes the identifying marks back up to a central server.

The central server basically collects “votes” for who is a spammer. Each night, it publishes a “black list”: the top 20 spammers out there for the day. Each of the spam filters downloads this list and starts attacking the spammers.
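The extract-and-vote half of the scheme is straightforward to sketch. Here is a minimal Python illustration, where the regex patterns, the CentralServer class, and the sample messages are all hypothetical placeholders for whatever the real filters and server would use:

```python
import re
from collections import Counter

# Hypothetical patterns for the "identifying marks" a filter would report:
# website hosts, email addresses, and phone numbers found in a spam message.
URL_RE = re.compile(r"https?://([\w.-]+)")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.\w+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_marks(message: str) -> set:
    """Pull every contact mechanism out of one spam message."""
    marks = set(URL_RE.findall(message))
    marks |= set(EMAIL_RE.findall(message))
    marks |= set(PHONE_RE.findall(message))
    return marks


class CentralServer:
    """Collects 'votes' from the distributed filters and publishes
    the nightly black list of the most-reported spammers."""

    def __init__(self):
        self.votes = Counter()

    def report(self, marks):
        # Each filter's report counts as one vote per identifying mark.
        self.votes.update(marks)

    def black_list(self, n=20):
        return [mark for mark, _ in self.votes.most_common(n)]


# Example: three spam reports from different filters.
server = CentralServer()
server.report(extract_marks("Buy pills now! http://pills.example"))
server.report(extract_marks("Cheap pills: http://pills.example or call 555-123-4567"))
server.report(extract_marks("Win big! contact junk@lotto.example"))
print(server.black_list(3)[0])  # pills.example
```

Because each filter reports a set of marks per message, a spammer who reuses the same website across millions of messages quickly floats to the top of the nightly list.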

How to attack the spammer? Here are a few ways.

1. If you have the spammer’s email address, each spam filter starts sending “don’t spam me” messages. They are sent to the “abuse@” and “webmaster@” addresses as well, and emails are cc’d to the FTC.

2. If you have the spammer’s website, each spam filter starts auto-posting bogus data back to the website. This will drive most spammers nuts: now their databases are filled with junk information, and they’ve got more crap in their responses than legitimate ones! (Hey, they spammed me first!)

3. If you have the spammer’s phone number – well, this one might be hard to attack. I guess we could all let our long-distance bills go through the roof and use our modems to attack.

Etc., etc. Well, this idea has probably been considered before. It’s not new.

Sigh.