Sunday, July 9, 2017

Dependency Inversion and Microsoft Web Technologies

When I started learning C# and ASP.NET about 10 or 15 years ago (!!), we rendered HTML with Web Forms and published services with ASMX Web Services. A little while later we moved on to WCF services.

One of the limitations of those technologies was that the framework would only call the default, parameterless constructor of the page and service classes. To allow unit testing, I used "poor man's" dependency injection (DI), which I learned from Jean-Paul Boodhoo's videos on DnRTV. This is the approach where you have 2 constructors: a default one that calls the other one, which takes instances of all needed dependencies. The default constructor passes in instances of concrete classes that implement the needed interfaces. E.g.:
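
(The class and interface names in this sketch are invented for illustration, but this is the shape of it.)

    public class Customer
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public interface ICustomerRepository
    {
        Customer GetById(int id);
    }

    // The concrete class the default constructor news up.
    public class SqlCustomerRepository : ICustomerRepository
    {
        public Customer GetById(int id)
        {
            // ... ADO.NET / stored procedure call ...
            return null;
        }
    }

    public partial class CustomerPage : System.Web.UI.Page
    {
        private readonly ICustomerRepository _customers;

        public string CustomerNameText { get; private set; }

        // The framework can only call this one...
        public CustomerPage()
            : this(new SqlCustomerRepository())
        {
        }

        // ...but tests can call this one and pass in a fake.
        public CustomerPage(ICustomerRepository customers)
        {
            _customers = customers;
        }

        public void DisplayCustomer(int id)
        {
            CustomerNameText = _customers.GetById(id).Name;
        }
    }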

I don't remember using, or even looking into, mock frameworks. I think we created custom mocks by writing test classes that implemented the interfaces and allowed us to customize responses. I'm not sure which inversion of control (IoC) containers or mock frameworks even existed for C# then. If they did, they were probably open source, which was frowned on at the bank at the time, even just for testing. If they were allowed at all it involved some paperwork, anyway. (Banks like to have big companies they can call when they need support and who will be around a while.) Our pages and unit tests looked something like this:
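
(Again, invented names, building on the page sketch above - no mock framework, just a hand-rolled fake.)

    using NUnit.Framework;

    public class FakeCustomerRepository : ICustomerRepository
    {
        public Customer CustomerToReturn { get; set; }

        public Customer GetById(int id)
        {
            return CustomerToReturn;
        }
    }

    [TestFixture]
    public class CustomerPageTests
    {
        [Test]
        public void DisplayCustomer_ShowsTheCustomerName()
        {
            var fake = new FakeCustomerRepository
            {
                CustomerToReturn = new Customer { Id = 1, Name = "Alice" }
            };
            var page = new CustomerPage(fake);

            page.DisplayCustomer(1);

            Assert.AreEqual("Alice", page.CustomerNameText);
        }
    }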

This is fine if all you want to do is unit test. But what if you want to do integration testing, where you mock out the layer where you cross into other people's code, e.g. the database or an external service? You'd have to do something like this:
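
(Invented layer names again, reusing the Customer class from the first sketch. The point is that the only seam is the constructor chain, so every intermediate layer gets built by hand just to swap out the bottom one.)

    using NUnit.Framework;

    // The boundary with other people's code.
    public interface ICustomerDatabase
    {
        void Save(Customer customer);
    }

    public class FakeCustomerDatabase : ICustomerDatabase
    {
        public bool SaveWasCalled { get; private set; }

        public void Save(Customer customer)
        {
            SaveWasCalled = true;
        }
    }

    // An intermediate layer between the page and the database.
    public class CustomerWriter
    {
        private readonly ICustomerDatabase _database;

        public CustomerWriter(ICustomerDatabase database)
        {
            _database = database;
        }

        public void Save(Customer customer)
        {
            _database.Save(customer);
        }
    }

    public class CustomerEditPage : System.Web.UI.Page
    {
        private readonly CustomerWriter _writer;

        public CustomerEditPage(CustomerWriter writer)
        {
            _writer = writer;
        }

        public void SaveCustomer(Customer customer)
        {
            _writer.Save(customer);
        }
    }

    [TestFixture]
    public class CustomerEditPageIntegrationTests
    {
        [Test]
        public void SaveCustomer_ReachesTheDatabaseLayer()
        {
            var fakeDatabase = new FakeCustomerDatabase();

            // Build the whole graph by hand just to replace the bottom layer.
            var writer = new CustomerWriter(fakeDatabase);
            var page = new CustomerEditPage(writer);

            page.SaveCustomer(new Customer { Id = 1, Name = "Alice" });

            Assert.IsTrue(fakeDatabase.SaveWasCalled);
        }
    }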

This wasn't something we tried to do at the time. We were a pretty inexperienced bunch. (Programming at a bank vs. programming at a software company is like practicing law at a bank vs. at a law firm.)

If I'd known more about the dependency inversion principle then, I would have been very skeptical about putting up with this limitation. I would have immediately gone searching for ways to insert something into the web request pipeline to control page and service creation. If I need to replace an implementation at the bottom of a dependency graph, it should be as easy as replacing one interface registration in an IoC container.

Searching the web now, it looks like it was possible, but not in a way that made you feel good. The solutions look like hacks, or the domain of .NET experts. If you wanted to stick to the SOLID principles, though, it's what you should have done.

I do find it surprising that Microsoft gave us frameworks based primarily on object-oriented languages (C#, VB.NET) that didn't let you observe object-oriented (OO) practices. Every example from that time involved creating instances of concrete classes in the page or service classes themselves. I even remember one example that told developers to drag and drop a new SqlConnection object onto every page. Not very maintainable. Perhaps Microsoft didn't think the existing MS web developers of the day, so many of them ASP/VB6 devs, could embrace these concepts.

Whatever the reason, these limitations led to some particularly gnarly code, completely lacking in abstractions and injection points. Take, for example, the WCF service I'm currently tearing apart to allow mocking out the bottom-most layer. It goes something like this:
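
(Names changed and the supporting types boiled down to stubs, but this is the shape of it.)

    using System.ServiceModel;

    public class Order { }

    public class OrderDatabase
    {
        public Order LoadOrder(int id)
        {
            // ... LINQ to SQL / ADO.NET ...
            return new Order();
        }
    }

    public class PricingServiceClient
    {
        // ... stands in for a generated WCF client to an external service ...
    }

    // The bag of concrete dependencies held in the static singleton.
    public class OrderComponents
    {
        public OrderDatabase Database { get; set; }
        public PricingServiceClient Pricing { get; set; }
    }

    [ServiceContract]
    public interface IOrderService
    {
        [OperationContract]
        Order GetOrder(int id);
    }

    // The WCF service class just forwards to static methods.
    public class OrderService : IOrderService
    {
        public Order GetOrder(int id)
        {
            return OrderLogic.GetOrder(id);
        }
    }

    public static class OrderLogic
    {
        private static OrderComponents _components;

        private static OrderComponents Components
        {
            get
            {
                // Lazily initialized singleton. A test that swaps out one of
                // these instances has to set _components back to null when
                // it's done, or set it up before every run.
                if (_components == null)
                {
                    _components = new OrderComponents
                    {
                        Database = new OrderDatabase(),
                        Pricing = new PricingServiceClient()
                    };
                }
                return _components;
            }
        }

        public static Order GetOrder(int id)
        {
            return Components.Database.LoadOrder(id);
        }
    }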

The WCF class calls static methods. The first time one of them is called, it initializes a static instance, that is, a singleton, of an object that has instances of database and service classes. If you replace one of those instances during a test, you must set it back to null for reinitialization when you're done, or have every test set it before it runs. This is what you get when you ignore the SOLID principles.

It wasn't until ASP.NET MVC 3 that Microsoft baked IoC support into the framework itself, allowing controllers to take dependencies in their constructors. Until then, people used IoC containers that implemented the complex-looking plumbing needed to make DI work in ASP.NET. I'm still surprised that it took so long.
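
For comparison, this is roughly the shape MVC 3 enabled. The controller and repository names here are mine, and a real app would hand resolution off to an IoC container rather than the toy resolver below:

    using System;
    using System.Linq;
    using System.Web.Mvc;

    public interface IProductRepository { /* ... */ }
    public class SqlProductRepository : IProductRepository { }

    public class ProductController : Controller
    {
        private readonly IProductRepository _products;

        // No parameterless constructor needed any more.
        public ProductController(IProductRepository products)
        {
            _products = products;
        }
    }

    public static class IocConfig
    {
        // Called from Application_Start in Global.asax.
        public static void Register()
        {
            DependencyResolver.SetResolver(
                type => type == typeof(ProductController)
                    ? new ProductController(new SqlProductRepository())
                    : null,                             // fall back to MVC's defaults
                type => Enumerable.Empty<object>());
        }
    }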

The next time I look at a technology that is based primarily on an OO language, I'll be looking for the injection points, no matter how complex they are to use. If they don't exist, the technology will need a very compelling reason for me to use it.

Saturday, February 18, 2017

WebAuthenticationBroker and OAuth2 UserCancel Error

Windows Phone 8.1 programming can be a little... opaque. Yes, I was working on a Phone 8.1 project. My phone can't be upgraded to 10. I'm in the less than 1% of phone users.

I wanted to build an app that can do OAuth2 authentication, so I started with an example from IdentityServer3.Samples on GitHub. It uses WebAuthenticationBroker to present the OAuth server's UI to the user. A client with an ID of "implicitclient" was missing from Clients.cs, but I just copied one of the JavaScript implicit flow clients. The sample worked - I could get both an ID token and an access token at the same time.

I created a similar Client.cs in my existing project and pretty much copied the sample WinPhone example. I could get an ID token from my server and I could get an access token. But not both at the same time.

The WebAuthenticationResult.ResponseStatus value was WebAuthenticationStatus.UserCancel, a very generic error that has many sources. The ResponseErrorDetail property had a more specific error number, 2148270093. I couldn't find many references to this number on the web, but in its hex form, 0x800C000D, I found results. It's a "URL moniker code" produced by IE meaning INET_E_UNKNOWN_PROTOCOL. The description is "The protocol is not known and no pluggable protocols have been entered that match." Still not much to go on.
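
For reference, this is roughly where I was looking. (Getting the result on Phone 8.1 goes through the broker's AndContinue/continuation pattern, which I've left out here.)

    using System.Diagnostics;
    using Windows.Security.Authentication.Web;

    public static class AuthResultInspector
    {
        public static void Inspect(WebAuthenticationResult result)
        {
            if (result.ResponseStatus == WebAuthenticationStatus.Success)
            {
                // The full callback URL, tokens and all, comes back here.
                string callbackUrl = result.ResponseData;
                Debug.WriteLine(callbackUrl);
            }
            else if (result.ResponseStatus == WebAuthenticationStatus.UserCancel)
            {
                // In my case: 2148270093, i.e. 0x800C000D, INET_E_UNKNOWN_PROTOCOL.
                uint errorDetail = result.ResponseErrorDetail;
                Debug.WriteLine(errorDetail);
            }
        }
    }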

The callback URL for Phone 8.1 apps starts with ms-app://, which I thought maybe wasn't being recognized. But when I pointed my Phone app at the sample server I could get both tokens at the same time, so the scheme itself wasn't the problem.

I debugged the Phone app and grabbed the callback URL from both my auth server and the sample one. The sample server's callback URL was quite a bit shorter. It started to dawn on me that Phone 8.1 uses IE11 and that IE might have a fairly conservative URL size limit. It turns out it's 2083 characters, which is not long enough to hold both of my server's tokens. My signing certificate's key is twice the size of the sample server's, making the token signatures twice as long.

So, how to shorten the URL?

I was needlessly including some claims, so I cut them out. I read that elliptic curve keys are shorter, which makes for shorter signatures, but IdentityServer3 doesn't support EC certificates out of the box, so that would have been some work.

Then I finally stumbled across the idea of reference tokens. It turns out that they are the typical way to shorten an OAuth2 callback URL. Instead of the entire access token the URL contains a short identifier. Clients send the identifier to the server and the server looks it up from its database.
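
In IdentityServer3, reference tokens are switched on per client with the AccessTokenType property. A sketch - everything other than AccessTokenType is a placeholder value rather than my real configuration:

    using System.Collections.Generic;
    using IdentityServer3.Core.Models;

    public static class Clients
    {
        public static IEnumerable<Client> Get()
        {
            return new List<Client>
            {
                new Client
                {
                    ClientId = "implicitclient",
                    ClientName = "WinPhone Client",
                    Flow = Flows.Implicit,

                    // The important line: put a short reference token in the
                    // callback URL instead of a full, signed JWT.
                    AccessTokenType = AccessTokenType.Reference,

                    RedirectUris = new List<string> { "ms-app://placeholder-app-sid/" },
                    AllowedScopes = new List<string> { "openid", "profile", "myapi" }
                }
            };
        }
    }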

After 3 or 4 days of beating my head against the wall, problem solved. Now I can log in.

Wednesday, December 14, 2016

Don't be Afraid of SingleOrDefault - Much Worse, Performance Problem Edition

I found another example at work of someone not taking advantage of SingleOrDefault when making Linq-to-Sql calls, but in a much worse way. I previously mentioned the use of Any() and First() in https://jamesmclachlan.blogspot.ca/2014/05/dont-be-afraid-of-singleordefault.html.

The new example constructs a query and calls Any() to test for the presence of at least 1 result. Any() is efficient because it translates to SQL's EXISTS, which checks for at least 1 matching record without reading any of its data. That isn't a problem by itself.

Unfortunately, this was followed by a call to ToList() and then [0] to get the first item, instead of First(). The effect of ToList()[0] is to run the query, pull every one of the matching records into memory and then take the first item. First() at least tells SQL to return only the TOP 1 row.
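
In simplified form - MyDataContext and its Orders table are invented here - the difference looks like this:

    using System.Linq;

    public static class Example
    {
        // MyDataContext stands in for the real LINQ to SQL data context.
        public static void ShowTheDifference(MyDataContext db, int customerId)
        {
            var query = db.Orders.Where(o => o.CustomerId == customerId);

            // What the code did: an EXISTS round trip, then a second query
            // that pulls every matching row into memory to take the first one.
            if (query.Any())
            {
                var firstByList = query.ToList()[0];
                // ...
            }

            // What it should do: one SELECT TOP (1) round trip.
            var first = query.FirstOrDefault();
            if (first != null)
            {
                // ...
            }
        }
    }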

Even worse, because of faulty logic the code sometimes fails to add any filters to the query, loading ALL of the records of a particular type in the system. Production has many tens of thousands of such records. Luckily, it only does this in a very specific case, and if anyone has ever seen a problem they haven't reported it.

So it's not enough to hunt down uses of First() and FirstOrDefault(). We also need to look for ToList()[0], or perhaps audit all uses of Any().


Friday, October 14, 2016

Handling User Data Securely - Really, Really Securely

Imagine you're building a web app that requires a password from a user for encryption. This password is not the same as their login one; it's used only for encrypting and decrypting their data in the browser. The user enters the password, the app uses it to generate an encryption key that the browser can use, and then encrypts and uploads data. Later, the user types the password in again so that they can work with the encrypted data.

One of the things your app promises is that it never stores the password on the server in any form that another party could intercept or crack. And if someone compromises the user's computer, or it gets stolen or seized, the app promises that there is no trace of the password left behind.

This means never including the password or a hash of it in any posts to the server, neither in form submissions nor AJAX calls nor in something like ASP.NET WebForms view state. Similarly, the app must not save the password in any form to a persistent store like the browser's local storage or one of the database systems that some browsers support.

What options do you have for building such a system with a typical form-posting web framework like ASP.NET MVC or WebForms?

There are 3 ways a web app typically persists data that the user enters from one page to another:
  1. cookies written to disk by the browser and sent back to the server with the page request
  2. a view state system that includes bits of data in hidden form fields
  3. session storage in the browser to leave data behind for the resulting page to read
The first 2 obviously send data to the server, so they are not solutions. The third one doesn't send data to the server, and it might seem as good as being only in memory. However, different browsers treat it differently. They don't all clear it when you close the browser, and if the browser dies or the power goes out it may be left on disk.

You might be tempted to encrypt the value before writing it to view state or session storage, but who controls that encryption key? If you do, then you have to manage it and may have to reveal it under court order. If you generate a key in the user's browser, then you have the same problem of storing something between page views.

You conclude that you can only hold the user's password in memory, meaning in a JavaScript variable. The problem is that those variables disappear every time the user visits another page. The easiest solution to program is for the app to ask the user to enter their password on every page that needs it. Not exactly a delightful experience.

Obviously you want to minimize the number of times the user enters their password. One approach would be for the app to have a field that collects the password, perhaps at the top. The rest of the page performs AJAX-y form posts that submit encrypted data, receives HTML from the server and updates the DOM with the response. Something like WebForms' UpdatePanel or Ajax.BeginForm in MVC.

If the user navigates to another part of the app that doesn't need the password, though, they have to reenter it when they return. This might be an advantage because the app only asks for the password in applicable contexts. Users would learn to do everything they need to before navigating, or they'd use 2 browser windows or tabs.

You could get the number of times the user needs to enter the password down to once per use session by using a single page app (SPA). If the app is all JavaScript and AJAX calls for data then it can keep the variable holding the password alive for the entire duration of use. If the user closes the window the password evaporates. (Well, it might still be sitting in RAM or on disk in a virtual memory file, but this seems beyond our control in a browser.)

SPAs have an entirely different development process that you may have to learn, especially if you are used to server side web development (ask me how I know :). Putting your time in to learn will be worth it, though, if you want to give the user the best possible experience of entering a secure value exactly once.

Sunday, July 17, 2016

Code Learnability vs. Feature Addability

I'd like to talk about made up words.

No, really, I'd like to talk about the tension between making a code base easy to learn and making it easy to add things to, specifically in web applications.

Imagine you had a slider control for your code that went from "easy to learn" to "easy to maintain".

When you slide the control all the way over to "easy to learn", most of the classes disappear, along with all of the interfaces, and many if statements appear. Methods balloon in size and contain duplicate logic because there are so few classes to encapsulate it. It's very easy to trace data and logic through calls up and down the stack. Anyone reading the code will come across all of the decisions it makes. The reader won't have a good sense of why the code does what it does, though. The lack of little methods like DetermineRanking or classes like RankingEngine means that such logic lives all over the place.

At "easy to learn", you can give a new developer your codebase and they'll be productive very quickly with little mentoring. Need to add a new data field when adding a customer? Just find the giant AddCustomer method or click button event handler and change line 654. A quick bit of functional testing shows that more related logic lives down on line 1233.

Oops, the same logic appears in SaveCustomer and now you can add a customer with a new field but users can't update it. QA or, even worse, customer support opens a bug, new developer finds duplicate logic, copy, paste, fixed. Again and again.

In some organizations, this is totally acceptable to devs, managers and customers. They can hire less skilled, cheaper developers and undercut the competition. Their customers are probably buying the software for some cost center in their business, so spending as little as possible is the goal. Everyone here operates at the low end of some market, which might make sense at the moment. As the dev shop or the customer grows they'll try to level up the software they make or use.

(The field of Economics contains an idea that ideally every product exists at every quality and price point so that everyone can afford the version of a product that fits precisely with their income. If there were only shacks and mansions to live in you'd have to live in a shack until you can afford a mansion. Better to have a wide range of housing options, especially within a neighbourhood.)

I'll bet the development shop's employee turnover is pretty high, though. As soon as a dev learns a little more about development they look for a more advanced employer. Again, some companies accept this and keep hiring from the ever growing pool of junior developers, which is fine.

I'll also bet that successfully adding features takes weeks, including writing code and back and forth with QA. Such code is not unit testable, so people find all bugs during end to end manual or automated testing.

At the other end of the slider, "easy to maintain", classes and interfaces abound, if statements largely disappear, but the structure obfuscates the code's flow. Frameworks and libraries, like inversion of control (IOC) containers and request pipeline handlers, automatically set up actions and dependencies. This means that for the app to know what to do with a class or method you only have to put a specific interface or attribute on it.

The experienced developer's productivity goes way up and bugs go way down because of unit testability and the inherent design (e.g. code no longer allows the strings "0" or "1" for flag values; it uses booleans). However, the major trade-off is the learning curve and cognitive load of the code base. The concept count goes through the roof. With so many classes and so much seemingly magic behaviour, new developers need a lot of guidance with the design. That could be in person, in a recorded video or in design docs.

(I recently experienced this myself looking at code samples that rely exclusively on IOC containers to resolve dependencies. Some containers use an easy to follow "map interface IFoo to class Foo" syntax, while others use "map all of Foo's interfaces to Foo". The latter makes it harder to find the thing that maps IFoo to Foo with Visual Studio's Find All References because IFoo doesn't appear in the mapping. I even found one IOC container with a convention-based mapping method like "map all classes in assembly X to the interfaces matching their names". This meant that neither the classes nor the interfaces appear in the container setup code. Unless you find the conference video or series of blog posts where the author goes from original code to the heavily refactored version, you can end up puzzling over the approaches you see.)
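
To make that concrete, here are the three registration styles side by side. I'm using Autofac purely as an example; the containers in the samples I was reading varied:

    using System.Reflection;
    using Autofac;

    public interface IFoo { }
    public class Foo : IFoo { }

    public static class ContainerSetup
    {
        public static IContainer Build()
        {
            var builder = new ContainerBuilder();

            // Style 1: easy to follow - Find All References on IFoo finds this line.
            builder.RegisterType<Foo>().As<IFoo>();

            // Style 2: "map all of Foo's interfaces to Foo" - IFoo never appears,
            // so Find All References on IFoo misses it.
            builder.RegisterType<Foo>().AsImplementedInterfaces();

            // Style 3: convention-based scanning - neither Foo nor IFoo appears
            // anywhere in the setup code.
            builder.RegisterAssemblyTypes(Assembly.GetExecutingAssembly())
                   .AsImplementedInterfaces();

            return builder.Build();
        }
    }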

Adding features now takes days and involves mostly one-way trips through QA. Production bug counts due to actual code problems go down.

The code now requires more experienced developers who demand higher pay. This causes the software to cost more. Customers are happy to pay, though, because the software "just works", which they need to keep their profit centers working. Turnover drops.

There is a joke (belief?) in programming that the more complicated and obscure the code the more job security you have because only you know how to update it. This depends on what makes the code complicated, though, and what the benefit is to the company. Are you using well documented but advanced libraries that do a lot of heavy lifting, or are you writing 1000 line methods of bit shifting code with single letter variables?

Friday, April 1, 2016

It Doesn't Matter How They Get The Data

I was reading about using the SecureString class. This is a class that encrypts a string so that it is readable only on the current machine and does not hang around in memory in plain text. Strings in C# are immutable, so they stick around until they are garbage collected, even if you set your reference to String.Empty (""). The typical use is to collect the value from the user directly into a SecureString and shuttle it around on the current machine. As soon as it leaves the machine it must be turned back into a plain string, which is not a best practice.

I was looking into using SecureString in ASP.NET to handle a password. That password arrives in plain text, and probably becomes plain text again when you actually use it, so there would always be more than 0 clear-text instances in memory. Most StackOverflow answers point this out, but SecureString would at least cut down on the number of copies floating around: strings are immutable, so every operation that transforms one (concatenating, trimming, formatting) quietly creates yet another instance that lingers until the garbage collector gets to it.
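
For reference, a minimal sketch of the SecureString round trip. Note the last step: to actually use the value you end up right back in an ordinary, immutable string.

    using System;
    using System.Runtime.InteropServices;
    using System.Security;

    public static class SecureStringDemo
    {
        public static SecureString Capture(char[] passwordChars)
        {
            var secure = new SecureString();
            foreach (var c in passwordChars)
            {
                secure.AppendChar(c);   // stored encrypted as it's appended
            }
            secure.MakeReadOnly();

            // We can at least zero out the char array we collected.
            Array.Clear(passwordChars, 0, passwordChars.Length);
            return secure;
        }

        public static string ToPlainText(SecureString secure)
        {
            var unmanaged = IntPtr.Zero;
            try
            {
                unmanaged = Marshal.SecureStringToGlobalAllocUnicode(secure);
                return Marshal.PtrToStringUni(unmanaged);   // plain text again
            }
            finally
            {
                if (unmanaged != IntPtr.Zero)
                {
                    Marshal.ZeroFreeGlobalAllocUnicode(unmanaged);
                }
            }
        }
    }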

In addition, I kept coming across the same kind of comment: that there's no point protecting a value in memory, because an attacker would need access to the machine's memory to read it anyway.

Yeah, no one has ever been able to read a remote, web-accessible machine's memory. Until they could.

This reminded me of a comment a co-worker made about something you could do with a value from one of our databases: "How could anyone get it?"

It doesn't matter.

How did they get the Ashley Madison data? How did they get the Sony data? How did they get the U.S. Office of Personnel Management data?

It doesn't matter.

What matters is what more they can do with it. This is called "pivoting", or using one breach to hop a level deeper. Security through obscurity works until your whole database appears on the web. Any SQL system is one SQL injection vulnerability away from giving up everything. Any software involved in serving web requests is a buffer-overflow bug away from barfing memory.

Practice defense in depth to limit a small compromise from becoming much bigger.



Friday, March 13, 2015

Assert.DoesNotThrow Does Not Preserve Stack Traces

Just a short post about an issue with an NUnit feature.

It has an assertion called DoesNotThrow that takes a delegate as an argument. If the delegate throws an exception then NUnit throws its own exception. The method is great at pointing out exactly which line exercises the code under test. However, if an exception does occur, NUnit reports the exception type but loses the original stack trace. There have been enough occurrences of lost stack traces that I've stopped using it. Now I just call the code under test directly, with a comment marking it as the test line, and let any exception bubble up to the test runner.
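
A before-and-after sketch, with the class under test invented:

    using NUnit.Framework;

    public class WidgetService
    {
        public void Process()
        {
            // ... the code under test ...
        }
    }

    [TestFixture]
    public class WidgetServiceTests
    {
        [Test]
        public void Process_WrappedInDoesNotThrow()
        {
            var service = new WidgetService();

            // If Process() throws, NUnit reports its own assertion failure
            // and, as described above, the original stack trace is gone.
            Assert.DoesNotThrow(() => service.Process());
        }

        [Test]
        public void Process_LetAnyExceptionBubbleUp()
        {
            var service = new WidgetService();

            // The line under test. Any exception reaches the test runner
            // with its full stack trace intact.
            service.Process();
        }
    }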