Ideas for a faster Web (Draft)

How the web could be made faster

Background

I am a software developer working in the shipping industry. Most of my time is spent developing solutions that improve the Internet communication of vessels in terms of bandwidth and speed. Most of the modern web is built on the assumption that Internet access is a commodity. In the shipping industry, however, Internet access is a resource that needs to be managed carefully. Communication costs a lot of money, speeds are relatively slow, and the latency is brutal since most traffic has to bounce off a satellite.

Disclaimer

I do not think I have the precise language required to explain the issue and the proposed solution, so if I have failed to explain anything clearly, please send me an e-mail with suggestions. Feedback is always welcome.

The issue

In all of our installations we provide a local (on-vessel) HTTP cache proxy that is responsible for caching content locally, so that computers accessing the same website can pull static content from the local cache and avoid downloading the same resources twice. We also run a remote proxy server that minimizes (compresses) content before forwarding it to the vessel. Since most websites these days use HTTPS for content delivery, this technique is rendered useless. One possible workaround is SSL bumping, but that is a bad idea for several reasons, not least that it may be illegal in many countries.

So where does the problem lie? HTTPS assumes that content encryption and integrity are always desirable. This is not true for most static resources. For example, many sites use a specific version of some JavaScript library or CSS framework. Take Bootstrap: how many sites use it? Do you really have to download its CSS and JS files from every one of those sites? You probably visited another site that required the same resources not long ago. Now, you may say that most of those sites use a common CDN to deliver those files, so a user does not have to re-download them every time. True, but the browser still has to check whether the content has changed by sending a conditional request for that resource and receiving a 304 (Not Modified) response if the version of the file it has is still current. Moreover, this does little to help users on the same network conserve bandwidth, since every browser on every PC still has to download the same file at least once, and that is assuming that most of the sites use the same CDN.
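To make that revalidation cost concrete, here is a minimal sketch in TypeScript of the round trip described above (runnable in Node 18+ or a browser). The URL and ETag value are made up for illustration; only the If-None-Match / 304 mechanism itself is real HTTP.

```typescript
// Sketch of the revalidation round trip: even when the file is cached,
// the client still pays a full round trip just to learn that nothing changed.
// On a satellite link that round trip alone can take over half a second.
async function revalidate(url: string, cachedEtag: string): Promise<boolean> {
  const response = await fetch(url, {
    headers: { "If-None-Match": cachedEtag },
  });
  // 304 means "Not Modified": the cached copy is still valid,
  // but we still spent one high-latency round trip to find out.
  return response.status === 304;
}

// Example: checking a pinned Bootstrap build against a hypothetical CDN URL.
revalidate("https://cdn.example.com/bootstrap/5.3.0/bootstrap.min.css", '"abc123"')
  .then((stillFresh) => console.log(stillFresh ? "cache still fresh (304)" : "content changed"));
```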

What a solution could look like

A potential solution would be to add a signature to the HTML tag of a resource that is intended to be public. This has already been done by the Subresource Integrity specification. One small addition I would make to that specification is a content-size attribute; this would further limit the probability of collisions in case a hashing algorithm is compromised in the future. Unfortunately, the goal of that specification, as far as I understand it, was to provide additional security in the scenario where a CDN is compromised and starts delivering malicious content. Fortunately, the same feature can also be leveraged by browsers to improve caching.

The assumption here is that two resources with the same hash (and, ideally, the same size) can be used interchangeably, provided the hashing algorithm is secure. So if a browser sees a hash it already knows, it can use the copy that is already in its cache. If not, it can try to fetch the resource over plain HTTP, adding the hash of the resource to the request headers. This allows any cache proxy server to intercept the request and either deliver a file with the same signature or download it, cache it, and then deliver it to the browser. If the file delivered to the browser does not match the expected signature, the browser can fall back to downloading it over HTTPS. If every attempt fails to deliver the expected resource, the browser can refuse to run the page and issue a security warning.
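Here is a rough browser-side sketch of that flow in TypeScript. The X-Resource-Hash header, the hash-keyed cache, and the function names are all my own inventions for illustration; nothing here is part of any current specification, and a real implementation would live inside the browser's fetch machinery rather than in page script.

```typescript
// Hypothetical browser-side lookup: cache by hash, try plain HTTP so a proxy
// can answer, verify the digest, and fall back to HTTPS on any mismatch.
const hashCache = new Map<string, ArrayBuffer>(); // keyed by "sha256-<base64>"

async function sha256Base64(data: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", data);
  return "sha256-" + btoa(String.fromCharCode(...new Uint8Array(digest)));
}

async function fetchWithIntegrity(httpsUrl: string, expectedHash: string): Promise<ArrayBuffer> {
  // 1. Reuse any locally cached copy with the same hash, regardless of origin URL.
  const cached = hashCache.get(expectedHash);
  if (cached) return cached;

  // 2. Try plain HTTP so an on-path cache proxy can serve the resource by hash.
  const httpUrl = httpsUrl.replace(/^https:/, "http:");
  try {
    const body = await (await fetch(httpUrl, {
      headers: { "X-Resource-Hash": expectedHash }, // hypothetical header
    })).arrayBuffer();
    if (await sha256Base64(body) === expectedHash) {
      hashCache.set(expectedHash, body);
      return body;
    }
  } catch {
    // Network error or tampered content: fall through to HTTPS.
  }

  // 3. Fall back to HTTPS and verify again; give up if the signature still does not match.
  const body = await (await fetch(httpsUrl)).arrayBuffer();
  if (await sha256Base64(body) !== expectedHash) {
    throw new Error("Integrity check failed; refusing to use the resource.");
  }
  hashCache.set(expectedHash, body);
  return body;
}
```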

This would allow good old-fashioned cache proxy servers to keep working without any changes, and it could significantly improve performance if they leverage the hash header sent with the HTTP request to eliminate duplicate resources and duplicate traffic.
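A hash-aware proxy could be quite small. The sketch below uses Node's built-in http module and the same hypothetical X-Resource-Hash header; it serves any previously seen body whose hash matches, no matter which URL originally produced it.

```typescript
// Rough sketch of a hash-aware cache proxy (Node 18+, listens on port 3128).
import * as http from "http";

const bodyByHash = new Map<string, Buffer>(); // hash -> cached response body

http.createServer(async (req, res) => {
  const wantedHash = req.headers["x-resource-hash"] as string | undefined;

  // Serve straight from the hash-keyed cache when possible: zero upstream traffic.
  if (wantedHash && bodyByHash.has(wantedHash)) {
    res.writeHead(200, { "X-Cache": "HIT-BY-HASH" });
    res.end(bodyByHash.get(wantedHash));
    return;
  }

  // Otherwise fetch from the origin, remember the body under the requested hash,
  // and pass it on. (A real proxy would verify the hash before caching it.)
  const upstream = await fetch(`http://${req.headers.host}${req.url}`);
  const body = Buffer.from(await upstream.arrayBuffer());
  if (wantedHash) bodyByHash.set(wantedHash, body);
  res.writeHead(upstream.status);
  res.end(body);
}).listen(3128);
```

Note that the proxy never needs to understand the page itself; the browser still performs the final integrity check, so a misbehaving or out-of-date proxy only costs a fallback to HTTPS.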

Advantages

If all ISPs (Internet Service Providers) installed proxies like this within their networks, it would significantly improve speed and eliminate wasted bandwidth. It would also benefit the large sites that currently use CDNs to deliver content, since some of that content could be delivered by a general-purpose proxy server on the ISP side.