Internet Security

I always get distressed when I read about the latest example of someone cracking a commercial system, stealing credit card numbers, etc. There's absolutely no excuse for this! We in the industry should have the knowledge and skills which preclude the possibility of anyone hacking into back-end systems. Those of us who've been at it for a number of years have learned the architectures and techniques for protecting information. I can only surmise that the people responsible for some of these systems have not had the benefit of experience. I don't personally know anyone who would make the kind of obvious mistakes we read about on a frighteningly frequent basis.

Firewalls are powerful tools which can be used to limit the port numbers which are exposed to the Internet. In the case of a WWW server, the only port which should be accessible is port 80, plus port 443 if you're using HTTPS. Some servers expose a wide variety of ports and protocols by default. While I consider this to be a huge mistake from an architectural standpoint, using a firewall can prevent external access to these ports. While internal servers are reasonably safe from hacking, external-facing servers need to be hardened against any possible attacks.

I should admit my bias at the outset. I don't consider Microsoft's products to be very secure. One simply has to look at the frequency of security updates to the underlying software and applications. I wouldn't even dream of using IIS as an external server, for example. All I have to do is look at my Apache webserver logs to see the typical attacks which are designed to exploit security holes in IIS. Microsoft programmers also don't seem to have learned how to handle buffer overruns. That's one of the advantages of using Java over languages such as C and C++. The platform handles strings rather than requiring the programmer to allocate and manage a fixed-length buffer.
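To illustrate the point about buffers, here's a minimal sketch. The class and method names (BoundsDemo, overrunIsRejected, repeat) are hypothetical and exist only for this example. It shows that the JVM checks every array write against the array's length, and that a StringBuilder grows as needed rather than relying on a buffer the programmer sized in advance.

```java
// Hypothetical demo class; the names are illustrative only.
final class BoundsDemo {

    // Attempts a write at writeIndex into a buffer of bufferSize bytes.
    // Returns true if the JVM rejected the write with an exception.
    static boolean overrunIsRejected(int bufferSize, int writeIndex) {
        byte[] buffer = new byte[bufferSize];
        try {
            buffer[writeIndex] = 1;
            return false; // the index was in bounds, so the write succeeded
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;  // the out-of-bounds write never touched memory
        }
    }

    // Builds a string of arbitrary length; the builder resizes itself,
    // so there is no fixed-length buffer for the caller to manage.
    static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }
}
```

A write one element past the end, such as overrunIsRejected(8, 8), is stopped with an exception instead of silently overwriting adjacent memory, which is what a C program can do if the programmer forgets the check.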

I use an Apache/Tomcat combination as my front-line server. While it doesn't offer full J2EE functionality, it's a high-performance HTTP server combined with the reference implementation of a servlet container. You can use JDBC in order to interface to a database or RMI/IIOP to communicate with a back-end J2EE server. And that's one of the keys to designing a secure network architecture. Using a second firewall between your front-end and back-end servers, and limiting the ports which are exposed, ensures that there's no direct access to the back-end servers from the 'net. Front-end servers typically reside in what is commonly referred to as the "demilitarized zone" or DMZ. Here's a diagram.

In this architecture, the external firewall would typically only pass requests for ports 80 and 443. If the back-end server was running Oracle and you were using JDBC and a flat application architecture, then only port 1521 would be enabled on the internal firewall. Since the front-end servers should be equipped with dual network interfaces, it's easy to also limit access by IP address on the internal firewall. Finally, on robust systems such as Red Hat Linux, you can use iptables to serve as the firewall. Since it functions at the network interface layer, you can actually combine the firewall and front-end functionality in a single server. The disadvantage is that you can't use load balancing in that scenario.
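The two rule sets can be modelled in a few lines of Java. This is only a sketch of the policy, not of how a real firewall is configured: the FirewallPolicy class is hypothetical, and the front-end addresses used with it (192.168.1.10 and 192.168.1.11) are assumptions made up for the example.

```java
import java.util.Set;

// Hypothetical model of the external and internal firewall rules.
final class FirewallPolicy {
    private final Set<Integer> allowedPorts;
    private final Set<String> allowedSources; // an empty set means "any source"

    FirewallPolicy(Set<Integer> allowedPorts, Set<String> allowedSources) {
        this.allowedPorts = allowedPorts;
        this.allowedSources = allowedSources;
    }

    // A packet passes only if both its destination port and its source are permitted.
    boolean allows(String sourceIp, int destPort) {
        boolean portOk = allowedPorts.contains(destPort);
        boolean sourceOk = allowedSources.isEmpty() || allowedSources.contains(sourceIp);
        return portOk && sourceOk;
    }

    // External firewall: HTTP and HTTPS from anywhere, nothing else.
    static FirewallPolicy external() {
        return new FirewallPolicy(Set.of(80, 443), Set.of());
    }

    // Internal firewall: Oracle's listener port, and only from the front-end servers.
    static FirewallPolicy internal(Set<String> frontEndAddresses) {
        return new FirewallPolicy(Set.of(1521), frontEndAddresses);
    }
}
```

With an internal policy built from the two assumed front-end addresses, a request to port 1521 from 192.168.1.10 passes, while the same request from any other address, or a request from a front-end server to any other port, is dropped.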

The experienced among you will note that the application architecture in the diagram above is n-tier. There are a few ways to stratify an application but generally you have the servlet/JSP web interface layer, the business logic layer (typically implemented as EJBs) and the persistence layer. Of course you can add more layers to the architecture as required. If the webservers and appservers were physically separate machines then you could add a service layer on the webserver which would handle the communications with the EJBs on the appserver, performing lookups, caching home references, etc. Similarly, rather than using entity EJBs, you could utilize a persistence layer such as Hibernate.
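As a rough idea of what "caching home references" in a service layer might look like, here's a sketch. The HomeCache class is hypothetical, and the lookup function passed to it stands in for whatever remote lookup (a JNDI call, for instance) the real service layer would perform. No actual EJB API is used here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical service-layer helper: performs an expensive lookup once per
// name and hands back the cached reference on every later request.
final class HomeCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> lookup; // stand-in for the remote lookup

    HomeCache(Function<String, T> lookup) {
        this.lookup = lookup;
    }

    // Returns the cached reference, invoking the lookup only on a cache miss.
    T get(String name) {
        return cache.computeIfAbsent(name, lookup);
    }
}
```

The point of the design is that servlets never talk to the appserver directly. They ask the service layer, which pays the lookup cost once and reuses the result.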

Load balancing is another complex topic and actually impacts how one designs an application. There are two types of load balancers: stateful and stateless. In the first, the load balancer examines the source of IP packets and routes requests from a given client to the same server every time. This can make it easier for developers to maintain session data on a single server. The drawback is that if a server goes down then the session data is lost and the users routed to that machine will have to log in to the application again. What I consider to be a better approach is one which permits users to access any front-line server without regard to session persistence.
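The difference between the two routing styles can be sketched as follows. The Balancers class and the server names are hypothetical. Hashing the client address is just one assumed way of getting the "same server every time" behaviour, and round-robin is one assumed way of spreading requests without regard to who sent them.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the two routing strategies.
final class Balancers {

    // Source-affinity routing: the same client address always maps to the
    // same server, for as long as the server list does not change.
    static String sticky(List<String> servers, String clientIp) {
        int index = Math.floorMod(clientIp.hashCode(), servers.size());
        return servers.get(index);
    }

    // Session-agnostic routing: each request goes to the next server in turn.
    static final class RoundRobin {
        private final List<String> servers;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(List<String> servers) {
            this.servers = List.copyOf(servers);
        }

        String pick() {
            int index = Math.floorMod(next.getAndIncrement(), servers.size());
            return servers.get(index);
        }
    }
}
```

Note what happens in the first style when a server is removed from the list: the hash now maps some clients to a different machine, and any session data held only on the old machine is gone.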

And this is where load balancing impacts the application architecture. J2EE servers have the ability to persist session data to a database. Whether using URL rewriting or cookies, a session reference can be maintained by the client and servers can retrieve the data from the back-end database. If a developer opts to maintain a lot of data in the session then there's obviously going to be a performance impact with my preferred approach. The solution is simple, however: keep your session data to a minimum. The caching capabilities of a modern RDBMS should be able to serve up the data in a fraction of a second. What I like about stateless load balancers is the ability to add front-end servers according to load or take them down for required maintenance with no impact to users.
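Here's a small sketch of the idea. Everything in it is hypothetical: SharedSessionStore uses an in-memory map as a stand-in for the back-end database table, and FrontEndServer is a toy, not a servlet container. It only shows that when the client carries a session reference and the data lives behind the front line, any server can handle any request.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the database table that holds session data.
final class SharedSessionStore {
    private final Map<String, String> rows = new ConcurrentHashMap<>();

    // Stores a minimal session (just the user id) and returns its reference.
    String create(String userId) {
        String sessionId = UUID.randomUUID().toString();
        rows.put(sessionId, userId);
        return sessionId;
    }

    Optional<String> findUser(String sessionId) {
        return Optional.ofNullable(rows.get(sessionId));
    }
}

// Toy front-end server that keeps no session data of its own.
final class FrontEndServer {
    private final String name;
    private final SharedSessionStore store;

    FrontEndServer(String name, SharedSessionStore store) {
        this.name = name;
        this.store = store;
    }

    // The returned reference is what the client would carry in a cookie or URL.
    String login(String userId) {
        return store.create(userId);
    }

    String handle(String sessionId) {
        return store.findUser(sessionId)
                .map(user -> name + " served " + user)
                .orElse(name + " requires login");
    }
}
```

A user who logs in through one server and is routed to another on the next request is still recognized, because the second server reads the same store. Taking the first server down for maintenance costs the user nothing.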

So now we have no direct access to the internal servers from the 'net. They're blocked by both port numbers and IP addresses. Using a reserved IP address range (such as 192.168.x.x) means that nothing coming from the 'net can masquerade as one of your front-end servers. Since only ports 80 and 443 are allowed in through the external firewall, outside traffic couldn't reach port 1521 on the internal firewall anyway. One of the dangers of not limiting access via the external firewall is that other protocols could be used to breach security on your front-end servers. Skilled hackers could then inject traffic on the second network interface to gain access to back-end servers.

Security needs to be an overarching concern when building web applications. From using HTTPS when requesting personally-identifiable information to using encryption to store credit card numbers, people have a right to expect that a company is doing everything in its power to secure information. While some might suggest that the hackers are more capable than the custodians of information, I would disagree. We have learned much about ways to secure information. Tools and techniques exist to prevent access to sensitive data. We need to ensure that we apply them intelligently or else run the risk of a loss of faith in the entire e-commerce industry.
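As one example of such a technique, here's a minimal sketch of encrypting a card number before it's stored, using the standard javax.crypto API. The CardVault class is hypothetical, and the choice of AES in GCM mode with a 256-bit key is my assumption rather than a requirement. Key storage, which is the hard part in practice, is deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper; where the key lives is outside the scope of this sketch.
final class CardVault {
    private static final int IV_BYTES = 12;  // IV length used for GCM here
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    // Returns the random IV followed by the ciphertext and tag, ready to store.
    static byte[] encrypt(SecretKey key, String cardNumber) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv); // a fresh IV for every record
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(cardNumber.getBytes(StandardCharsets.UTF_8));
        byte[] stored = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, stored, 0, iv.length);
        System.arraycopy(ciphertext, 0, stored, iv.length, ciphertext.length);
        return stored;
    }

    // Splits off the IV, then decrypts; fails if the stored bytes were altered.
    static String decrypt(SecretKey key, byte[] stored) throws GeneralSecurityException {
        byte[] iv = Arrays.copyOfRange(stored, 0, IV_BYTES);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] plain = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        return new String(plain, StandardCharsets.UTF_8);
    }
}
```

Someone who steals the database rows without the key gets nothing usable, which is only true if the key is kept somewhere other than the database server.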

Finally, it surprises me how many companies require skills in technologies which I consider to be obsolete. Perl was the original language used to generate dynamic content on webservers. It was also used by UNIX system administrators for generating reports. It's not a strongly typed language so it's easy to make mistakes. PHP originally stood for Personal Home Page and was designed along similar lines. ColdFusion was an attempt to improve the situation but ultimately couldn't support applications at the enterprise scale, IMHO. The volume of concurrent requests on the 'net these days can tax even server solutions like J2EE. But at least J2EE is scalable and, with appropriate knowledge of how to architect solutions, can support incredible request volumes.

Of course not everyone will agree with these conclusions. Some can reasonably suggest that solutions like WebSphere and WebLogic are too expensive, even though OSS solutions like Glassfish are readily available. My response would likely be that these same people might try to use something like MySQL rather than a solid RDBMS solution like Oracle or DB2. There's a big difference between designing solutions which will serve hundreds or thousands of requests per day as opposed to millions or tens of millions. If your website is vital to your company and your cashflow, and has to run 24x7x365, then you should select the most robust solutions available. As always, YMMV.

June 1st, 2009

Copyright © 2009, 2010, 2011, 2012, 2013, 2014, 2015 by Phil Selby