Wednesday, December 14, 2016

Don't be Afraid of SingleOrDefault - Much Worse, Performance Problem Edition

I found another example at work of someone not taking advantage of SingleOrDefault when making Linq-to-Sql calls, but in a much worse way. I previously mentioned the use of Any() and First() in https://jamesmclachlan.blogspot.ca/2014/05/dont-be-afraid-of-singleordefault.html.

The new example constructs a query and calls Any() to test for the presence of at least 1 result. Any() is very efficient on its own because it uses SQL's EXISTS to just check for at least 1 record without reading anything about the record. This isn't a problem on its own.

Unfortunately, this was followed by a call to ToList() and then [0] to get the first item, instead of First(). The effect of ToList()[0] is to run the query and pull every one of the matching records into memory and then take the first item. First() at least tells SQL to only return the TOP 1 item.

Even worse is that because of faulty logic the code fails to add any query parameters, loading ALL of the records in the system of a particular type. Production has many tens of thousands of such records. Luckily, it only does this in a very specific case. If anyone has ever seen a problem they haven't reported it.

So it's not enough to hunt down uses of First() and FirstOrDefault(). We also need to look for ToList()[0], or perhaps all uses of All().


Friday, October 14, 2016

Handling User Data Securely - Really, Really Securely

Imagine you're building a web app that requires a password from a user for encryption. This password is not the same as their login one, it's used only for encrypting and decrypting their data in the browser. The user enters the password, the app uses it to generate an encryption key that the browser can use and then encrypts and uploads data. Later, the user types the password in again so that they can work with the encrypted data.

One of the things your app promises is that it never stores the password on the server in any form that another party could intercept or crack. And if someone compromises the user's computer or it gets stolen or seized the app promises that there is no trace of the password left behind.

This means never including the password or a hash of it on any posts to the server, neither in form submissions nor AJAX calls nor in something like ASP.NET Webforms view state. Similarly, the app must not save the password in any form to a persistent store like the browser's local storage or one of the database systems that some browsers support.

What options do you have for building such a system with a typical web system that involves form posts like ASP.NET MVC or WebForms?

There are 3 ways a web app typically persists data that the user enters from one page to another:
  1. cookies written to disk by the browser and sent back to the server with the page request
  2. a view state system that includes bits of data in hidden form fields
  3. session storage in the browser to leave data behind for the resulting page to read
The first 2 obviously send data to the server, so they are not solutions. The third one does't send data to the server, and it might seem as good as being only in memory. However different browsers treat it differently. They don't all clear it when you close the browser, and if the browser dies or the power goes out it may be left on disk.

You might be tempted to encrypt the value before writing it to view state or session storage, but who controls that encryption key? If you do, then you have to manage it and may have to reveal it under court order. If you generate a key in the user's browser, then you have the same problem of storing something between page views.

You conclude that the you can only hold the user's password in memory, meaning a JavaScript variable. The problem is that they disappear every time you visit another page. The easiest solution to program is for the app to ask the user to enter their password on every page that needs it. Not exactly a delightful experience.

Obviously you want to minimize the number of times the user enters their password. One approach would be for the app to have a field that collects the password, perhaps at the top. The rest of the page performs AJAX-y form posts that submit encrypted data, receives HTML from the server and updates the DOM with the response. Something like WebForms' UpdatePanel or Ajax.BeginForm in MVC.

If the user navigates to another part of the app that doesn't need the password, though, they have to reenter it when they return. This might be an advantage because the app only asks for the password in applicable contexts. Users would learn to do everything they need to before navigating, or they'd use 2 browser windows or tabs.

You could get the number of times the user needs to enter the password down to once per use session by using a single page app (SPA). If the app is all JavaScript and AJAX calls for data then it can keep the variable holding the password alive for the entire duration of use. If the user closes the window the password evaporates. (Well, it might still be sitting in RAM or on disk in a virtual memory file, but this seems beyond our control in a browser.)

SPAs have an entirely different development process that you may have to learn, especially if you are used to server side web development (ask me how I know :). Putting your time in to learn will be worth it, though, if you want to give the user the best possible experience of entering a secure value exactly once.

Sunday, July 17, 2016

Code Learnability vs. Feature Addability

I'd like to talk about made up words.

No, really I'd like to talk about the tension between making a code base easy to learn vs. the ease of adding things to it, specifically about web applications.

Imagine you had a slider control for your code that went from "easy to learn" to "easy to maintain".

When you slide the control all the way over to "easy to learn", most of the classes disappear along with all of the interfaces and many if statements appear. Methods balloon in size and contain duplicate logic because there are so few classes to encapsulate it. It's very easy to trace data and logic through calls up and down the stack. Anyone reading the code will come across all of the decisions it makes. The reader won't have a good sense of why the code does what it does, though. The lack of little methods like DetermineRanking or classes like RankingEngine mean that such code lives all over the place.

At "easy to learn", you can give a new developer your codebase and they'll be productive very quickly with little mentoring. Need to add a new data field when adding a customer? Just find the giant AddCustomer method or click button event handler and change line 654. A quick bit of functional testing shows that more related logic lives down on line 1233.

Oops, the same logic appears in SaveCustomer and now you can add a customer with a new field but users can't update it. QA or, even worse, customer support opens a bug, new developer finds duplicate logic, copy, paste, fixed. Again and again.

In some organizations, this is totally acceptable to devs, managers and customers. They can hire less skilled, cheaper developers and undercut the competition. Their customers are probably buying the software for some cost center in their business, so spending as little as possible is the goal. Everyone here operates at the low end of some market, which might make sense at the moment. As the dev shop or the customer grows they'll try to level up the software they make or use.

(The field of Economics contains an idea that ideally every product exists at every quality and price point so that everyone can afford the version of a product that fits precisely with their income. If there were only shacks and mansions to live in you'd have to live in a shack until you can afford a mansion. Better to have a wide range of housing options, especially within a neighbourhood.)

I'll bet the development shop's employee turnover is pretty high, though. As soon as a dev learns a little more about development they look for a more advanced employer. Again, some companies accept this and keep hiring from the ever growing pool of junior developers, which is fine.

I'll also bet that successfully adding features takes weeks, including writing code and back and forth with QA. Such code is not unit testable, so people find all bugs during end to end manual or automated testing.

At the other end of the slider, "easy to maintain", classes and interfaces abound, if statements largely disappear, but the structure obfuscates the code's flow. Frameworks and libraries, like inversion of control (IOC) containers and request pipeline handlers, automatically set up actions and dependencies. This means that for the app to know what to do with a class or method you only have to put a specific interface or attribute on it.

The experienced developer's productivity goes way up and bugs go way down because of unit testability and the inherent design (e.g. code no longer allows the string "0" or "1" for values, it uses booleans). However, the major trade off is the learning curve and cognitive load of the code base. The concept count goes through the roof. With so many classes and seemingly magic behaviour, new developers need a lot of guidance with the design. That could be in person or in a recorded video or design docs.

(I recently experienced this myself looking at code samples that rely exclusively on IOC containers to resolve dependencies. Some containers use an easy to follow "map interface IFoo to class Foo" syntax, while others use "map all of Foo's interfaces to Foo". The latter makes it harder to find the thing that maps IFoo to Foo with Visual Studio's Find All References because IFoo doesn't appear in the mapping. I even found one IOC container with a convention-based mapping method like "map all classes in assembly X to the interfaces matching their names". This meant that neither the classes nor the interfaces appear in the container setup code. Unless you find the conference video or series of blog posts where the author goes from original code to the heavily refactored version, you can end up puzzling over the approaches you see.)

Adding features now takes days and involves mostly one way trips through QA.Production bug counts due to actual code problems go down.

The code now requires more experienced developers who demand higher pay. This causes the software to cost more. Customers are happy to pay, though, because the software "just works", which they need to keep their profit centers working. Turnover drops.

There is a joke (belief?) in programming that the more complicated and obscure the code the more job security you have because only you know how to update it. This depends on what makes the code complicated, though, and what the benefit is to the company. Are you using well documented but advanced libraries that do a lot of heavy lifting, or are you writing 1000 line methods of bit shifting code with single letter variables?

Friday, April 1, 2016

It Doesn't Matter How They Get The Data

I was reading about using the SecureString class. This is a class that encrypts a string so that it is readable only on the current machine so that it does not hang around in plain text. Strings in C# are immutable, so they stick around until they are garbage collected, even if you set your reference to a string to String.Blank (""). The typical use is to collect the value from the user directly into a SecureString and shuttle it around on the current machine. As soon as it leaves the machine it must be turned back into a plain string, which is not a best practice.

I was looking into using SecureString in ASP.NET to handle a password. That password is passed in in plain text, and probably again when you actually use it, so there would be more than 0 instances in clear text in memory. Most StackOverflow answers point this out, but it would at least cut down on the number of instances as it is passed from method to method. Strings are a value type, meaning that every time you call another method with one as an argument .NET creates another copy.

In addition, I kept coming across the same kind of comment:

Yeah, no one has ever been able to read a remote, web-accessible machine's memory. Until they could.

This reminded me of a comment a co-worker made about something you could do with a value from one of our databases: "How could anyone get it?"

It doesn't matter.

How did they get the Ashley Madison data? How did they get the Sony data? How did they get the U.S. Office of Personnel Management data?

It doesn't matter.

What matters is what more they can do with it. This is called "pivoting", or using one breach to hop a level deeper. Security through obscurity works until your whole database appears on the web. Any SQL system is one SQL injection vulnerability away from giving up everything. Any software involved in serving web requests is a buffer-overflow bug away from barfing memory.

Practice defense in depth to limit a small compromise from becoming much bigger.