Rules to Scale, Optimize Performance and Speed Up Your Website (Web Infrastructure)

Do you fancy a website (or web infrastructure) built on simple commodity hardware that can really handle millions of users?
(That is, no sweet and cute supercomputers, and yet so powerful!)

Really? So let's jump in and discuss how you can optimize your site to deliver and render content at blazing speed to huge volumes of users (which, of course, also reduces the load on the server in many ways):

(Note: All the maths below is hypothetical, not exact figures; it may vary with your use case.)

1. Minimize HTTP Requests: For each of your images, CSS files and JS files, an HTTP request is made to the server, which in turn responds with the required file, and each HTTP request takes anywhere from a few milliseconds to a few seconds to come back! (Heck of a lot of time spent!) So, to reduce that, you can do the following:

a). Combine files together, e.g. merge multiple JS files into one, multiple CSS files into one, and likewise.

b). Use image sprites and image maps, i.e. combine multiple images into one and show each part by specifying a background position, instead of making separate calls for n images.


What you get in return:

a). Cuts page load time by up to about 80% (yes, 80%, you heard it right)

b). Decreases the connection pool load on the server (fewer HTTP requests mean less exhaustion of sockets and fewer request threads), i.e. added power to serve many more people with the same CPU and bandwidth!
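
Combining files, as in point a) of the list of steps, can be done with a small build script. The following is only a minimal sketch in Python, not a full build tool: the function name combine_files and the file names in the usage note are made up for illustration, and it does no minification.

```python
from pathlib import Path


def combine_files(sources, target, separator="\n"):
    """Concatenate the text files in `sources`, in order, into `target`.

    Returns the number of files that were merged.
    """
    parts = [Path(src).read_text(encoding="utf-8") for src in sources]
    # Order matters: later scripts may depend on earlier ones.
    Path(target).write_text(separator.join(parts), encoding="utf-8")
    return len(parts)
```

For example, combine_files(["menu.js", "slider.js"], "all.js", separator=";\n") would leave the page with one script request instead of two; the ";\n" separator guards against a script that does not end with a semicolon.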

2. Reduce DNS Lookups: Hosting files on multiple hosts and using them in your HTML costs you some 20-120 milliseconds of DNS lookup for each host. You can really get away from that by using a single host as your content provider (since browsers cache, i.e. remember, the host name -> DNS -> IP lookup for about 30 minutes or more), so sticking to one host saves you that DNS lookup time.
For example, if you are reading images from three different hosts and some other files from a fourth, then you spend about 4 × 120 = 480 milliseconds on DNS lookups alone before anything can be downloaded from these hosts.
Moreover, suppose that files from different hosts do not download in parallel but host after host (a pessimistic assumption, since real browsers do overlap downloads across hosts, so treat the result as a worst case). So finally, if you are downloading 20 images from 5 hosts (each serving 4 images), and each image takes 5 seconds to download, your total download time will be something like:
DNS Lookup for Host 1 : 0.4 seconds
Download 4 images from Host 1 : 5 seconds (all 4 images from host 1 downloaded in parallel)
DNS Lookup for Host 2 : 0.4 seconds
Download 4 images from Host 2 : 5 seconds
DNS Lookup for Host 3 : 0.4 seconds
Download 4 images from Host 3 : 5 seconds
DNS Lookup for Host 4 : 0.4 seconds
Download 4 images from Host 4 : 5 seconds
DNS Lookup for Host 5 : 0.4 seconds
Download 4 images from Host 5 : 5 seconds
So, Total time taken: 27 seconds
... Having said that, if you were using only one host, this might have been:
DNS Lookup for Host 1 : 0.4 seconds
Download 20 images from Host 1: 5 seconds
So, Total time taken: 5.4 seconds
Compare the difference yourself.
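
The arithmetic above can be checked with a few lines of Python. This is just the same hypothetical model written out (hosts handled one after another, all files from one host downloaded in parallel); the function name is made up for illustration.

```python
def sequential_hosts_time(hosts, dns_lookup_s, download_s):
    """Total time when hosts are fetched one after another.

    Within one host, all files are assumed to download in parallel,
    so each host costs one DNS lookup plus one download window.
    """
    return hosts * (dns_lookup_s + download_s)


many_hosts = sequential_hosts_time(hosts=5, dns_lookup_s=0.4, download_s=5)
one_host = sequential_hosts_time(hosts=1, dns_lookup_s=0.4, download_s=5)
print(f"5 hosts: {many_hosts:.1f} s, 1 host: {one_host:.1f} s")
# prints: 5 hosts: 27.0 s, 1 host: 5.4 s
```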

3. Avoid Redirects: Redirects, whether 301 or 302, consume time! The request headers are sent to the server, the server answers with a redirect, and the browser then has to resend the headers to the new URL and repeat the whole process. Moreover, search engines “hate” needless redirects: your SEO value drops to the floor, and you lose rank and the trust of search engines. Do you want that to happen? Search engines are like girlfriends, you should not do things like these to piss them off.

4. Flush the Buffer: Generally, any dynamic scripting language takes about 400-500 milliseconds to respond with any HTML to the browser, and during this time the browser sits idle. However, we can avoid that by sending the output of the partly executed PHP script to the browser, so it has something to start rendering with. You can accomplish this by calling <?php flush(); ?>
So, if you do that, you give the browser something to start with while you generate the rest of your page. Say the browser waits 500 milliseconds to start receiving your HTML and then takes 5 seconds to render: your total time spent is 5 seconds + 500 milliseconds. But if you flush the top of the page early, you save those 500 milliseconds. Worth the saving for one extra line of code ( <?php flush(); ?> ), isn't it?
This comes in really handy when your server is under heavy load and processing a page takes quite some time!
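
The same idea is not tied to PHP. As a rough sketch, here it is as a Python WSGI application that hands the top of the page to the server before the slow part is ready. The stylesheet name style.css, the 0.5 second delay and the build_body helper are assumptions made up for illustration, and whether each chunk really leaves the machine right away also depends on the web server and any proxy in front of it.

```python
import time


def build_body():
    """Stand-in for slow page generation (database queries, templates, ...)."""
    time.sleep(0.5)
    return b"<body><h1>Hello</h1></body></html>"


def app(environ, start_response):
    """WSGI application that sends the <head> before the body is ready."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    # First chunk: handed over immediately so the browser can start on the CSS.
    yield b"<html><head><link rel='stylesheet' href='style.css'></head>"
    # Second chunk: produced only after the slow work finishes.
    yield build_body()
```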

5. Use a CDN (Content Delivery Network): Generally, you serve content to users from your own server, and your server is based in, say, the US, while the person requesting the page can be in Korea, India or anywhere else. So for bigger files such as images, JS and CSS, network latency is a big factor. This is where a CDN comes into play. A CDN is nothing but a set of servers spread across the earth, ideally one in every country or so, meant for serving content. So now, when a user opens your website, you serve the simple HTML from your server (in the US) and the images, CSS and JS through the CDN, where the CDN server nearest to the user is picked to deliver the content, which greatly reduces network latency. Now, the big question: having so many servers across the world is a huge undertaking, so how will you get them? (It's okay for firms like Yahoo, Facebook, etc. to have that, but how will you do it?) Well, the answer is as simple as: upload your images, CSS and other files as a Google application on Google App Engine (Google effectively serves App Engine files CDN-style, since it has a set of such servers).
Thus, you leverage the power of Google to serve content, and that too for free (within its free quota)!
(And hey, did you realize you will also save on the bandwidth allocated to you by your hosting provider, since you are not serving images and other big files from your own server?)

Another positive of having a CDN is that it caches the files (some do, some do not). That is, it does not read from the file system (its hard disk) each time a user requests a file; instead it keeps the file in memory (RAM) and delivers it from there. Owing to the limited RAM size, only the most popular files are cached, but it really speeds up the whole thing to a great extent!
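
How a particular CDN decides what stays in RAM is its own business, but the general idea can be sketched with a small least-recently-used (LRU) cache in Python. Note that LRU is only a common stand-in for "most popular", and the class name FileCache and the load_from_disk callback are made up for illustration.

```python
from collections import OrderedDict


class FileCache:
    """Keep the most recently requested files in memory, up to `capacity`."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity
        self.load_from_disk = load_from_disk  # hypothetical loader: path -> bytes
        self.files = OrderedDict()

    def get(self, path):
        if path in self.files:
            self.files.move_to_end(path)  # mark as most recently used
            return self.files[path]
        data = self.load_from_disk(path)  # cache miss: go to disk
        self.files[path] = data
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)  # evict the least recently used file
        return data
```

With a capacity of 2, requesting a, b, a, c evicts b (the file untouched for longest), so a later request for a is still served from memory.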

6. Compress Your Content: As you know, when you archive (compress) text, it really shrinks to a very small chunk of data. All modern browsers are capable of receiving HTML and text in compressed form and uncompressing it on the client side. They pass the following header to the server:
Accept-Encoding: gzip, deflate
If the web server sees this header in the request, it may compress the response using one of the methods listed by the client. The web server notifies the web client of this via the Content-Encoding header in the response.
Content-Encoding: gzip
Yet another way to speed up the website!
(Note: text compression can turn 100 KB into, say, 3-4 KB. That said, you cut your hosting bandwidth for that content by roughly 30 times, while you also deliver it to the user about 30 times faster.)
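
To see the header exchange and the size difference in practice, here is a minimal Python sketch using the standard gzip module. The function name maybe_compress and the sample HTML are made up for illustration, and the header parsing is deliberately simplified (it ignores quality values such as q=0); real web servers handle all of this for you through configuration.

```python
import gzip


def maybe_compress(body, accept_encoding):
    """Return (payload, headers), gzip-compressed if the client allows it."""
    encodings = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" in encodings:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


# Highly repetitive markup, which compresses far better than typical pages.
html = b"<li class='item'>Hello, world</li>\n" * 3000
payload, headers = maybe_compress(html, "gzip, deflate")
print(len(html), "->", len(payload), headers)
```

The exact ratio depends heavily on the content: repetitive markup like this sample shrinks dramatically, while already compressed files such as JPEG images barely shrink at all.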

7. Stylesheets at Top and Scripts at Bottom: As a good development practice, we should include all the stylesheets at the start of an HTML page and all the JavaScript at the bottom. The reason is very simple: scripts add functionality and run code in the browser, which tends to slow things down, so it's a good idea to run them once the page is already visible in a good format, so that the user doesn't get a bad impression of your website. Also, to let all the HTML show up in its best (visual) format, it's good to have all the styles available before rendering starts. So include your CSS files first, i.e. within <head></head>, and include your JavaScript files in the footer (or at the bottom of your page).

8. MySQL Scaling: With all the stuff said above, we haven't talked about how to scale the database so that it can handle huge loads. The site will still hit the db a million times for a million users, with thousands of requests per second ... gosh ... it will be a brutal murder of the MySQL db!!
So, to scale the MySQL db, we should run multiple database hosts on different machines, keeping a couple of them as masters and the rest as slaves. Whenever we have to write anything to the db, it should be written to the master dbs only! And whenever we want to read anything, it should be read from a slave. Meanwhile, the masters should keep replicating their current data state to the slaves (with a small delay), so that the slaves have up-to-date data!
What this gives you is that instead of one db handling everything, the load is distributed amongst multiple dbs: the two master dbs only write and are busy with that, while the slaves fetch the data when it is requested.
It shares the load and lets MySQL scale easily to handle huge traffic (notice that with more dbs we have a bigger connection pool and more execution power).
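
In application code, this split usually means deciding per query which server to talk to. Below is a minimal Python sketch of such routing, assuming simple round-robin selection; the class name ReplicatedDatabase and the server names are made up, and plain strings stand in for real database connections. Replication itself is configured in MySQL, not here.

```python
import itertools


class ReplicatedDatabase:
    """Route writes to a master and spread reads across the slaves."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, masters, slaves):
        # Cycle through the servers so the load is shared round-robin.
        self.masters = itertools.cycle(masters)
        self.slaves = itertools.cycle(slaves)

    def pick_server(self, sql):
        """Return the server that should run this SQL statement."""
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return next(self.masters)
        return next(self.slaves)


db = ReplicatedDatabase(["master1", "master2"], ["slave1", "slave2", "slave3"])
print(db.pick_server("INSERT INTO comments VALUES (1)"))  # a master
print(db.pick_server("SELECT * FROM articles"))           # a slave
```

One design consequence worth keeping in mind: because slaves lag slightly behind the masters, a read issued right after a write may briefly return the old data.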

9. Caching MySQL Reads: It has been noticed that reads are many times more frequent than writes (e.g. only 10 of the users reading an article may post a comment, while the other 100 will just read the article and move on, which translates into 10 writes and 110 reads on the db). So the slaves may get exhausted too! What to do now? Well, we have a solution based on the observation that 100 of those 110 reads got the same content back, so it makes sense to have that content cached somewhere (in RAM) instead of querying the db each time. So we write “intelligent” PHP scripts that read the content from the db only if it is not cached and, once it is read, cache it in RAM, so that the next time someone tries to read it, the content is served without even touching the database! Well, we just reduced those 110 reads to 10 reads from the db.
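
The read-only-if-not-cached logic can be sketched as follows. This is a toy Python simulation rather than the PHP scripts mentioned above: two dictionaries stand in for the MySQL table and the in-RAM cache (in practice something like memcached), and the class name CachedArticles and the key "article-1" are made up for illustration.

```python
class CachedArticles:
    """Read-through cache in front of a (simulated) database."""

    def __init__(self):
        self.db = {}       # stand-in for the MySQL table
        self.cache = {}    # stand-in for an in-RAM cache
        self.db_reads = 0  # how often we really had to query the db

    def read(self, article_id):
        if article_id not in self.cache:
            self.db_reads += 1  # cache miss: query the db
            self.cache[article_id] = self.db[article_id]
        return self.cache[article_id]

    def write(self, article_id, content):
        self.db[article_id] = content
        self.cache.pop(article_id, None)  # drop the stale cached copy


store = CachedArticles()
for comment in range(10):  # 10 writes ...
    store.write("article-1", f"article with {comment + 1} comments")
    for _ in range(11):    # ... and 11 reads after each one
        store.read("article-1")
print(store.db_reads)  # prints 10: only 10 of the 110 reads hit the db
```

Dropping the cached copy on every write is what keeps readers from seeing stale content; only the first read after each write pays for a database query.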

10. Use Clean URLs (almost always): Instead of having URLs like .. have them like .. This is not related to speeding up the website or to scalability; the simple reason is SEO. Clean URLs let search engines index your pages and tag them using the names in the URL (such as categories -> books), giving you full SEO value (much more than stuffing keywords or a super meta description into your page).

11. URL Rewrites in Apache Config Rather Than .htaccess: If you can change Apache's httpd.conf, or link your rewrite file to it somehow, do it! Always avoid keeping URL rewrite rules in an .htaccess file. The reason is the performance hit: rules in httpd.conf are loaded once, when the server starts, while .htaccess files are looked up and parsed again on every request, causing a severe performance hit!

I will update my article with some links as well, maybe at a later time.