Archive for the ‘SEO Tips & Tricks’ Category

Google Trends

Friday, May 14th, 2010
Google Trends offers a good look at year-over-year data for search volume.

Google Trends can be a very effective tool for planning your SEO campaign if you know how to use it.  The tool is a great way to look for seasonality and “burstiness” of topics online, which is especially important to review if your client has season-related products.  Using Google Trends you can establish a target date where your campaign needs to come to fruition, which allows you to allocate your resources appropriately in order to reach your target.

As an example, I’ve used the terms “lawn mower” and “snow blower” because they obviously have very seasonal sales.  The image below shows the year-over-year search volume for the two terms.  Searches for “lawn mower” peak around May of every year and then drop off as the season wears on.  If a client sells several seasonal items, you can see the importance of using the time leading up to May to prepare for the peak lawn mower season.  Searches for “snow blower” peak in December and January and are subject to bursts of traffic related to major snowfall.  Another good use of this tool, beyond seasonality, is also illustrated below: the ability to measure one phrase’s popularity against another.  This is a good way to evaluate some of your long-tail phrases.

Google Trends offers a good look at year-over-year data for search volume.

In the image above you will also notice a couple of other really important things: rankings of states and cities.  This information can help you target your PPC campaigns specifically to the cities and states where the most searches for your items occur.  Another important feature is the ability to export the results to a CSV file, which allows you to present your own graphics and charts outside of the Google tool.  This is great if you are doing research for a client and want to brand your results as your own.  Don’t worry too much about the alphabetized article results displayed to the left when you are looking at a huge period of time, as in the image above.  These links point to news stories that give you only a very small glimpse into the news around these items, and on this scale much of the content is dead.

In the image below we’ll look at 2009’s trends related to these categories.  You can get a little deeper look at a single year’s trends like this as well as looking at which cities and states performed the best for a given year.  You can get a solid peak month for your campaign by using the single year view.

The Google Trends yearly view of results shows seasonality in detail.

The single-year results can also be exported to a CSV file to allow you to play with them in Excel.  Earlier I said you should ignore the alphabetized article results on the left, but in this case these articles can be of use.  If you click the “more news results” link you’ll be taken to a page where you can see the articles that trended throughout the year.  This will help you determine the market factors that drove the number of searches and find out what was going on in the industry to help explain the search volume.  You’ll be able to find seasonal trends like the effect of huge snow storms on snow blower searches, and the effect of drought in regions on lawn mower sales.  When the grass doesn’t grow, sales go down.
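If you would rather script the peak-month check than eyeball it in Excel, something like the sketch below could work.  It assumes the export has a `date` column in YYYY-MM-DD form and one column per search term; the real Google Trends export layout may differ, and the sample numbers are made up purely for illustration.

```python
import csv
import io
from collections import defaultdict


def peak_month(csv_text, term):
    """Return the month number (1-12) with the highest average value for `term`.

    Assumes a header row with a `date` column (YYYY-MM-DD) and a column
    named after the search term. This layout is an assumption, not the
    documented Google Trends export format.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = int(row["date"].split("-")[1])
        totals[month] += float(row[term])
        counts[month] += 1
    if not totals:
        raise ValueError("no data rows found")
    averages = {month: totals[month] / counts[month] for month in totals}
    return max(averages, key=averages.get)


# Hypothetical sample data, invented for this example only.
SAMPLE_EXPORT = """date,lawn mower,snow blower
2009-01-04,20,90
2009-03-01,45,30
2009-05-03,100,5
2009-05-17,90,4
2009-08-02,50,3
2009-12-06,10,80
"""

if __name__ == "__main__":
    print("lawn mower peak month:", peak_month(SAMPLE_EXPORT, "lawn mower"))
    print("snow blower peak month:", peak_month(SAMPLE_EXPORT, "snow blower"))
```

Averaging by month rather than taking the single highest week keeps one snowstorm-driven burst from deciding the target date on its own.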

One last thing I’ll mention about the “more results” is that there is more data here than meets the eye.  Since Google has been indexing offline content such as newspapers and books, you have a deep look into information going all the way back to the 1800s in some cases.  Though data from that far back is probably not relevant to today’s market, it is very interesting to see.

Google Trends shows mentions of items in offline material.

To sum up, there is a TON of data available to the SEO in Google Trends if you know what to look for.  You can offer a wealth of information to your clients to help them see why you want to focus on certain campaigns leading up to the busy season.  They will better understand why you want to focus so much effort on organic SEO for “snow blowers” in the summer instead of when they are selling.  The more the client understands the data behind your online strategy, the more likely they will value it as a part of their company’s strategy, instead of just a “bolt on” curiosity.

Using Subdomains to Speed Your Site

Tuesday, April 27th, 2010

Using subdomains effectively on your website can decrease your page load time.  If your site is image-heavy, your pages probably load slowly.  Web browsers open only a limited number of simultaneous TCP connections to any one hostname, and each connection only gets so much bandwidth.  For example, if a page has seventy images to load (including your logo, nav items, etc.), plus HTML content, JavaScript files, and Flash elements, a web browser could need 100+ requests to a single hostname, and most of them have to wait their turn.  One major benefit of using multiple subdomains to call specific elements is that it allows content to load out of sequence.  You might have seen sites that stall until a Flash piece is loaded before displaying the rest of the page content.  If the site had called the Flash piece from a subdomain, it would have been loaded separately from the HTML and wouldn’t have stalled the page load.

A good way to set this up is to write your pages to call items from subdomains.  Subdomains count as separate hostnames in browsers and are treated as such.  Upon the initial load of the website, a DNS query is made to get the IP address of each hostname and then it’s cached.  If your homepage loads elements from multiple hostnames, all of those IPs are cached and no further DNS queries need to be made.  If page load time is an issue for your site you could set up a subdomain for each of JavaScript, images, Flash, etc.  In this example HTML is loaded from the main URL, JavaScript from js.example.com, images from i.example.com, and Flash from f.example.com.  If your site requires 100 connections to one hostname, the browser opens only a limited number at a time and queues the rest.  The first set is loaded and as an element is completed, the next item in line fills the slot.  Splitting up the calls between hostnames divides the number of connections in each queue.  In this case you might have 12 connections to js.example.com, 5 connections to example.com (for content), 70 connections to i.example.com, and 2 connections to f.example.com.  Each hostname’s set of connections is loaded simultaneously, letting each connection take as long as needed without delaying the elements farther down the list.  This is a bit complex so I’ve created an image to illustrate the point.

[Images: loading from a single URL vs. loading from multiple subdomains]
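To make the setup concrete, here is a minimal sketch of a helper a page template could call to point each asset at its subdomain.  The hostnames come from the example above; the extension-to-host mapping and the `asset_url` function name are my own illustration, not part of any particular framework.

```python
import posixpath

# Hypothetical mapping from file extension to the hostname that serves it.
ASSET_HOSTS = {
    ".js": "js.example.com",
    ".png": "i.example.com",
    ".jpg": "i.example.com",
    ".gif": "i.example.com",
    ".swf": "f.example.com",
}


def asset_url(path, main_host="example.com"):
    """Build an absolute URL for `path` on the hostname assigned to its type.

    Anything without a mapped extension (HTML pages, for instance) stays
    on the main hostname.
    """
    extension = posixpath.splitext(path)[1].lower()
    host = ASSET_HOSTS.get(extension, main_host)
    return "http://{}/{}".format(host, path.lstrip("/"))


if __name__ == "__main__":
    for example in ("/img/logo.png", "scripts/menu.js", "/media/intro.swf", "/about.html"):
        print(asset_url(example))
```

Keeping the mapping in one place means every page sends the same file to the same hostname, so the browser can reuse its cached copy instead of downloading the file again from a different subdomain.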

I have set up sites in this manner before to help overcome slow page-load times and saw a great improvement.  The site was an ecommerce site with lots of thumbnails, featured products, logos, etc. and it was nice to be able to call them from a separate hostname so loading the images didn’t slow down the site.  Most current browsers open only a handful of simultaneous connections per hostname (commonly around six), with a larger overall cap across all hostnames.  You should take advantage of as many as possible.

Too Many Webmasters Are Webamateurs!

Thursday, April 22nd, 2010

It’s happened to all of us.  We’ve been to a website that we expected to work correctly, but of course, it didn’t.  Who should we blame for these functionality problems?  The webmaster.  A webmaster is defined as “A technician who designs or maintains a website.”  Notice in that definition the mention of “designs.”  Almost all web designers forget about an extremely important part of their job: browser ubiquity testing.  I discussed this topic before, but I see so many silly mistakes on the web that the topic is always at the front of my mind.

A great example of a major web faux pas can be seen below.  Can you believe that Merrill Lynch calls Google Chrome an “incompatible browser?”  They say their compatible browsers are IE 5+, Netscape 7+ and Firefox 0.8+!  How long has it been since their webmaster actually thought about their site’s compatibility?  The Netscape browser isn’t available anymore, IE 5 is history, and Firefox has been through ELEVEN version updates since this message was implemented.  Where is the webmaster?!

Merrill Lynch has a webamateur!

This is just one example of a lazy or inept webmaster.  I’d estimate that 80% or more of the websites on the internet have some browser compatibility issues.  Website troubles are not limited to the small-budget businesses that throw together sites, as you can see in the example.  I wonder how much money Merrill Lynch spent on their website.  I wonder if they know their site still does browser detection for browsers that were released in early 2004 – SIX years ago.

The Importance of Valid HTML

Tuesday, March 30th, 2010

Valid HTML is the best practice when it comes to websites.  Though it can be difficult, it is worth the time.   If your code is valid, it won’t have to be auto-corrected by browsers and it won’t cause problems with search engine robots.

I’ve talked about browser ubiquity testing before and the importance of working out problems with code to make sure a site renders correctly in browsers.  To sum up, when code is written correctly a web browser doesn’t have to interpret and correct it.

If web browsers can do their own code correction, why is validating the code important?  Browsers can correct code; search engine robots cannot be counted on to do the same.  They are left to fend for themselves without that advanced feature.  This can cause problems with them parsing your site in full and finding all the content on a page.  The spiders may get stuck on a piece of messed-up code and stall, which will eat up the time they have to spend on your site.

There are several tools available to help validate your code and the best one comes from the W3C (World Wide Web Consortium).   The W3C is in charge of writing standards for web code and has played a key role in the development of the internet as we know it.  They know what they are talking about.
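For a quick first pass before running a page through the W3C validator, a rough check for unbalanced tags can be sketched with Python’s standard `html.parser` module.  This is only an illustration and nowhere near real validation: it knows nothing about doctypes, attributes, or nesting rules, and it flags closing tags that HTML allows you to omit (such as `</p>` or `</li>`).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records mismatches."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.stack:
            self.problems.append("</{}> has no matching opening tag".format(tag))
            return
        # Everything opened after the matching tag was left unclosed.
        while self.stack[-1] != tag:
            self.problems.append("<{}> was never closed".format(self.stack.pop()))
        self.stack.pop()


def find_tag_problems(html):
    """Return a list of human-readable tag balance problems in `html`."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    problems = list(checker.problems)
    for tag in reversed(checker.stack):
        problems.append("<{}> was never closed".format(tag))
    return problems


if __name__ == "__main__":
    print(find_tag_problems("<div><p><b>Bold text</p></div>"))
```

Anything this turns up is worth fixing, but a clean result here does not mean the page validates; the W3C tool is still the real test.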

Correcting the code can be a real headache if you’re not really good at coding so I recommend contracting a web developer to help.  A good web developer is an expert in this kind of thing and the process will take a lot less time than a novice trying to work it out.

Broken Links = Broken Site

Friday, March 26th, 2010

Another new client comes on and reminds me of the importance of the basics.  The site I’m reviewing for the first time has almost 500 broken links.  That’s right, five HUNDRED.  The client came on reporting virtually no representation in the search engines.  I’m starting to see why.  Code that doesn’t validate, content in iframes, and a ton of broken links.  Though there are a lot of factors that I think are keeping them down, I’m just going to focus on broken links in this post.

What are broken links and how do you find them?  Broken links are any kind of link that doesn’t deliver the expected page.  The link could fail because of a typo or because the target page has moved.  There are two major problems with having a lot of broken links on a website: (1) users can’t get to the content you want them to find, and (2) the search engines get stuck in these “spider traps” and burn all their time waiting for a 404, 403, or 500 response.  Both problems affect a site’s positioning in the index because the search engines want to present their users with sites that work.  Also, the search engine bots will never discover all of your site’s content if they are stuck waiting for an error from a broken link.

In the case of my new client, there’s no way to go through it by hand to find every broken link.  It would take weeks.  The best way to find them is to run a spider against the site, like Xenu, and let it do all the checking for you.  (I’ll write another post about Xenu at some point with tips on its operation, but for now I’ll say that you shouldn’t run it full-throttle against a site because your IP might get blocked for a DoS attack.)  Xenu will check every link to see if it works and will report the response.  After the report runs, you’ll have a nice list of broken links and the pages where they are listed.  Keep in mind though that Xenu doesn’t click on every link.  Some older versions won’t work with JavaScript onclick attributes and Xenu certainly won’t click in Flash elements.
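To show the idea behind a link spider (this is not how Xenu itself works, just a sketch), the Python below pulls the links out of one page and checks each with a pause between requests so the target server isn’t hammered.  The function names and the one-second default delay are my own choices for illustration.

```python
import time
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)
        if urlsplit(absolute).scheme in ("http", "https"):
            self.links.append(absolute)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def fetch_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if no response came back."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 404, 403, 500, and so on
    except OSError:
        return None  # DNS failure, refused connection, timeout


def find_broken_links(urls, fetch=fetch_status, delay_seconds=1.0):
    """Check each URL in turn, pausing between requests; return (url, status) pairs."""
    broken = []
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)  # throttle so the check doesn't look like an attack
        status = fetch(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Some servers reject HEAD requests, so a link that reports 405 here is worth re-checking by hand before you call it broken.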

Now that you’ve got your broken link report, what should you do?  Fix them!  You have to go through each link and try to figure out what went wrong with that link.  Was the HTML fat-fingered?  Was the link copied incorrectly from a browser address bar?  Was the page moved?  What error does the server deliver?  Once you’ve figured out what is wrong with the broken link, you can try to fix it.  If the HTML was fat-fingered, just correct the problem and you should be good.  If something else happened to the link, try to find the correct target and replace it.  If you were deep-linking and the target no longer exists, try to find it on the site in an archive.  If you were citing another site’s article and that article no longer exists, it’s still good practice to give a link to their homepage as credit for the citation.

If you’re an SEO correcting links for a client, it’s important to have a good report for them including recommendations for what they should change in their site editing / page creation policies to prevent it from happening again.  Oh, and don’t forget to run Xenu against the site after the corrections have been made.  If they got messed up one time, they could get messed up again.

Link Campaign Sustainability

Thursday, March 18th, 2010

When planning a link campaign it’s important to set some goals for the campaign so you can accurately track progress.  You can’t just wing it.  You need to be precise in how you target your campaign or else it will look artificial, or worse yet, spammy.

The best method for tracking your link campaign, as far as I’m concerned, is a good Excel spreadsheet.  You should track where you found the link partner, what date you requested a link, when the link went live, the URL of the actual linking page, link type, the advertiser network (if relevant), and the last date you checked the link.  I found this to be quite useful when I had a text link campaign from a third party.  We would periodically review the campaign to make sure the links were live and found that we were paying a lot of money every month for links that were not on the sites, even though the third party’s system reported that they were.

As the link campaign builds steam it’s important that you maintain the same pace throughout the campaign.  The number of link acquisitions in a period is tracked by Google and it’s best to keep the linking campaign on a steady pace with no spikes or lags in activity.  This is why I keep a separate spreadsheet containing the number of links for a site on a given day and update that sheet daily.  Using the data I’ve collected I have Excel create a line chart of the number of links over time, then apply a trend-line to it to set a goal for the campaign.  The trend-line will show whether or not you are making the progress needed.
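The same trend-line can be worked out without Excel.  The sketch below fits a straight line to the daily link counts by ordinary least squares and projects it forward; I’m assuming that is comparable to the linear trend-line option in the chart, and the function names are just for illustration.

```python
def fit_trend_line(days, link_counts):
    """Least-squares straight line through (day, link count) points.

    Returns (slope, intercept), where slope is links gained per day.
    """
    if len(days) != len(link_counts) or len(days) < 2:
        raise ValueError("need at least two matching (day, count) points")
    count = len(days)
    mean_day = sum(days) / count
    mean_links = sum(link_counts) / count
    spread = sum((day - mean_day) ** 2 for day in days)
    if spread == 0:
        raise ValueError("all observations fall on the same day")
    covariance = sum(
        (day - mean_day) * (links - mean_links)
        for day, links in zip(days, link_counts)
    )
    slope = covariance / spread
    intercept = mean_links - slope * mean_day
    return slope, intercept


def projected_links(slope, intercept, day):
    """Link count the trend line predicts for a given day number."""
    return slope * day + intercept


if __name__ == "__main__":
    # Made-up daily totals for illustration only.
    slope, intercept = fit_trend_line([0, 1, 2, 3], [10, 12, 14, 16])
    print("links per day:", slope)
    print("projected total on day 10:", projected_links(slope, intercept, 10))
```

If the newest counts keep landing well above or below the projected value, the campaign has picked up a spike or a lag and the pace needs adjusting.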

Another important thing to look at is the number and types of links that your competitors have.  Once you’ve established what they have you can try to build a similar campaign.  You should evaluate the number of links from each PageRank (PR) level, e.g., 340 links from PR 2 pages, 400 from PR 0 pages, 65 from PR 5 pages, etc.  The goal is to make your link base look as natural as possible, yet still competitive, to avoid unwanted attention from Google.  Remember:  All sites will have a number of “nofollow” links pointing to them.  Best to keep that number on par also.

The last factor I’ll mention here is link decay.  A Google patent application I read mentioned the fact that Google evaluates the decay rate of a site’s link base.  You need to make sure that your decay rate isn’t too high.  Link decay is when Google can’t find a link on a page anymore.  This can happen when a blog entry that links to you moves off of the homepage and Google doesn’t see it anymore.

A sustainable link campaign is critical to long-term success of your SEO efforts.

Content Management Systems are supposed to make life easy. . .

Thursday, March 4th, 2010

Content Management Systems (CMS) can make life easier for businesses that have no webmaster, or have thousands of pages that they need to organize.  However, a CMS comes at a steep price, both monetarily, and in restrictions.  There are so many inherent problems that choosing the right one is critical for the success of your website. I could go on and on and on about the problems I’ve had with CMSs over the years, but suffice it to say they are a total pain in the neck.

In the interest of keeping this somewhat short and not a soapbox rant, I will stick to a bullet-pointed list of things to think about when selecting a CMS.

  • Look for hidden fees for:
    • Page Editing
    • New Page Creation
    • File Uploads
    • Image Creation
    • Phone Tech Support
    • Email Tech Support
    • Yearly Licensing Fees
    • Software Update Fees
    • Bandwidth Overages
  • Do not work with a company that refuses to give you full admin rights.
  • Demand FTP access.
  • You must be able to edit the HTML of the pages and the page components. Period.  The only reason they wouldn’t want you to is so they can charge you to do it for you.
  • Can you create AND manage multiple section templates?
  • Can you have page-level access to meta tags?
  • If you create a static homepage, how do you upload it and how do you edit it?
  • Do they own the artwork they create?
  • Ask for examples of current clients and have an SEO inspect their HTML.
  • Test the page-load times of the example websites.  If you don’t know how, your SEO will.
  • Don’t sign anything unless you get everything you want in writing.
  • Remember, if you design a site in a custom CMS, they’ve got you. There’s no leaving without a complete recoding of the site and potential loss of database info.
  • If your site is small, will the CMS cost more than retaining a good webmaster?
  • If your CMS is a hosted solution, make them give you a Service Level Agreement (SLA) in writing to guarantee uptime.
  • Beware of built-in SEO packages. They are usually designed by a programmer that has “some” SEO knowledge and not a true expert.  Bolt-on SEO packages never, ever, work well.
  • Ask for a demo of their product as you would see it and in a working environment. A CMS company will often let you test a feature-rich version of the software on their fastest server. The version and server you get may be significantly different.
  • Do you really need a CMS or are you being sold a CMS? Ask a professional SEO.
  • Have an IT guy check out their system. If you don’t have an IT guy, get one. A real one. Not your nephew.
  • You should have contracted a professional SEO by now, get them involved before you buy.
  • Ask about guaranteed response times to support issues. If they don’t guarantee a resolution time, your issue may languish in tech support for months.
  • Ask who your main point of contact will be. It sure won’t be the salesman you’re talking to. Ask to speak with that rep and see if you can understand them. You know what I mean.
  • Ask what tier of datacenter they have. There’s more info about datacenter tiers at Wikipedia.
  • Ask about the project workflow for the site design.
  • Get a guaranteed completion date.
  • Ask to see their designer’s other work.
  • Make sure you know the number of design concepts you can go through. You may only get two, and if two isn’t enough, you will have to pay more.
  • Ask how their testing environment works. You need a staging area to test pages before pushing them live.

There are good CMSs out there, but they are hard for a layperson to identify.  A CMS may be a very good fit for your company, but too often the decision to buy a CMS is made without consulting anyone “technical.”  If you are relying on the CMS company to be your technical person, remember, they are selling you a product.  You need an independent opinion.  When you’re talking to the CMS company you’re not talking to a technical person looking out for your company’s best interests; you’re talking to a salesman working for a commission.

What You Can Learn From Your (IP) Neighbors

Tuesday, March 2nd, 2010

What exactly are IP Neighbors?  An IP neighborhood of sites is made up of every website that shares an IP.  If you have a dedicated server that just hosts your site, there’s not much to worry about, but if, like most people, you have shared hosting, there are some things you need to know.

First, you have to find your neighbors.  There are several tools online to find out what sites share an IP.  My favorite search is provided by DnsQueries.
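Once a reverse-IP tool has given you a list of candidate neighbors, you can confirm which of them really resolve to the same address as your site.  The sketch below uses Python’s standard `socket.gethostbyname`; it cannot discover neighbors on its own, and the `group_by_ip` helper is a name I made up for this example.

```python
import socket
from collections import defaultdict


def group_by_ip(domains, resolve=socket.gethostbyname):
    """Group domain names by the IPv4 address they resolve to.

    Domains that fail to resolve are collected under the key None.
    """
    groups = defaultdict(list)
    for domain in domains:
        try:
            address = resolve(domain)
        except OSError:  # covers socket.gaierror for failed lookups
            address = None
        groups[address].append(domain)
    return dict(groups)


if __name__ == "__main__":
    # Replace these placeholder names with your site and its suspected neighbors.
    for address, names in group_by_ip(["example.com", "example.org"]).items():
        print(address, names)
```

Any group with more than one name in it is a shared address, and those are the sites worth a closer look.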

If you have a shared server you need to find out what other sites share that IP with you.  If some other site is malicious and spamming or distributing viruses, spyware or malware, ISPs or search engines might block the whole IP and your site would be affected.  If you find that the other sites on your shared server are spamming or doing other unsavory things, you need to move hosts.

Another reason it’s good to know who your neighbors are is because of the load they put on the server and the internet connection.  If you share a server with video sites or super-flash heavy sites with lots of traffic, your site’s performance may be reduced.  If the server admin hasn’t properly set resource limits for each site on the server, the other sites will hog the box and slow down delivery of your website.  That’s not fair.  You’re not getting what you pay for.

One final thing that I’ll say about the IP neighbor tool is that you can use it to “spy” on your competition.  If your SEO / design competition also provides hosting services, you can very quickly get a deep look into their client base.  That’s all I have to say about that.  ;-)

Most hosts won’t move your site to another server unless you really holler and complain.  Your best bet is to present strong documented proof of why you want to move and tell them that if they don’t move you, you’re changing hosts.

In order to really understand what your IP neighbors are doing, I recommend that you hire an expert SEO to conduct the evaluation.  The SEO you choose should be able to do “forensic” style studies on your neighboring sites.  If you’re going to pin your hopes on the success of your beautiful website, you’d better have a clean place to put it.

Using Google Search Suggest for a Long Term Study

Friday, February 26th, 2010

If you have a client that depends on seasonal trends, Google Search Suggest can be quite valuable.  Google Search Suggest gets some media attention from time to time for its strange suggestions, so you might have heard of it.  You can see Suggest in action by starting to type a query in the Google search box and watching what phrases are suggested.  This can be entertaining, but there is a good use for it.

In the beginning, Search Suggest had a lag of about 30 days, but now the suggestions seem to be more real-time.  Since SEO efforts take a lot of time to achieve positioning, the search suggest can be a good tool to use for identifying phrases to promote for next year’s season.  Of course, you can’t promote for things that are flash-in-the-pan results, but overall you can identify new phrases that customers are using RIGHT NOW to look for the products you offer.  This can help you create contextually targeted content for the next season based on more than just your historical phrase usage research.

A good example that you can currently see in Google is Valentine’s Day.  Beyond the phrase research you did to prepare your client for the Valentine’s buying frenzy, you can look at the search suggest box to see what phrases are being searched in the season.  You might find some gems that are popularly searched, but not often SEO’d.  This year’s trends show searches for homemade gifts, how to write poems, dinner ideas, etc. that you might otherwise not have considered in your optimization strategy.

When to Stop Caching

Tuesday, February 23rd, 2010

We’ve all seen the little “cached” link in Google results.  This link can give some useful information for an SEO such as the most recent cache date which can be nice to keep track of so you can trend when Google caches your site.  If you know how often Google caches your site, you can judge the effectiveness of new campaigns and new content based on caching frequency.  A cache is also important so you can check out the text version of your site that Google has saved.  Looking at the text cache can help you troubleshoot content presentation problems, and see what content is not visible to Google because of frames, flash etc.  You can also identify what Google and other search engines think about your site navigation, internal linking structure, and outbound links.  Basically, there is a wealth of information to a skilled SEO.

If there’s so much good info, why would anyone want to block caching?  One reason: content ownership.  If your site offers a service that is required to use certain phrasing in disclaimers, disclosures, etc., the Google cache can be dangerous because it can retain stale information after your terms change.  The real danger here is quite limited, but I once worked with a client where cached legal info was a problem and their attorneys needed to know how to get rid of it.

Another problem can arise from content scrapers.  There isn’t much you can do to prevent a content scraper from grabbing text from your site, but there are a few tricks you can use.  For the client in the example above, I came up with a system in which all of the client’s sites and pages contained a call to another server that dynamically populated the legal disclaimers for each page.  We configured the server that provided the text to accept requests only from certain IPs (the website IPs) to prevent that include from working on other websites.  Most content scrapers just pull the HTML from a site, and with this call the legal info was contained in something that didn’t present scrapeable content.  The problem this client had was that scrapers were stealing and reproducing entire websites and impersonating the company.  This led to legal confusion, and the lawyers wanted a way to protect against future scraping of the legal info.  Not only did the solution above ensure every one of the sites had up-to-date legal disclaimers, but it also prevented scrapers from getting the content.
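
The IP restriction on the disclaimer server can be sketched roughly like this in Python.  This is not the client’s actual code; the IP addresses, the disclaimer text, and the function name are all placeholders I’ve invented to show the idea of answering only allowlisted callers.

```python
from ipaddress import ip_address

# Hypothetical addresses of the web servers allowed to request the text
# (these are documentation-range IPs, not real servers).
ALLOWED_IPS = {ip_address("203.0.113.10"), ip_address("203.0.113.11")}

# Single copy of the legal text, so every site shows the same current wording.
# Placeholder wording for the example.
DISCLAIMER = "Example legal disclaimer text."

def disclaimer_for(remote_ip: str) -> str:
    """Return the disclaimer only to allowlisted callers, otherwise an empty string."""
    try:
        caller = ip_address(remote_ip)
    except ValueError:
        # Malformed address: treat it as not allowed.
        return ""
    return DISCLAIMER if caller in ALLOWED_IPS else ""

if __name__ == "__main__":
    print(repr(disclaimer_for("203.0.113.10")))
    print(repr(disclaimer_for("198.51.100.7")))
```

A scraped copy of a page hosted on some other server would make the same call from an IP that isn’t on the list and get nothing back.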

Another related challenge is a site called the Internet Archive that keeps a record of your site’s changes.  This site also contains a wealth of information and content that could be used against you.  A skilled SEO can look through the history of your site and reverse-engineer all the improvements you made to increase user conversion.  It’s really not all that difficult.  I could go to a competitor’s site in the Internet Archive and look at design changes they made to improve user conversion.  If I know the competitor and their target client well, I can find out all kinds of valuable information about user conversion improvement that the competitor probably spent a lot of money and time learning.  When I conduct User Funnel Improvement Research and Conversion Improvement Studies, I usually start by digging through the competition to see how they drive traffic into their high-value conversion pages.  I look at their current site design as well as their change history in the Internet Archive.  Starting from scratch on a multivariate testing (MVT) campaign can be quite expensive, so I take what I learn from competitors, improve on it, and start my MVT from there.  By this time in my career, most conversion improvements are intuitive for me, but it’s still nice to look at where other companies are focusing their efforts.

The Internet Archive is especially useful for keeping tabs on other SEO firms that like to brag about their big clients and their latest client acquisitions.  If I know XYZ company has a big new client, I can watch the changes they make to see if there’s anything I can learn about their methods.  A tip for you: don’t advertise your new clients until AFTER you have your SEO improvements in place and the Internet Archive blocked.  This will make it more difficult for other SEO firms to track your changes.  Better still, keep under the radar and don’t talk about clients until you’re way into their campaign.

So, how do you stop Google and the Internet Archive from caching your critical and expensive information?  Simple.  To stop the Internet Archive from keeping a record of your site, simply block their user-agent (ia_archiver) in your robots.txt.  This will also remove any previous records for your site from their archive.  The fix for Google is also pretty easy and can be controlled at the page level with a robots “noarchive” meta tag in the HEAD section of your page.  This should really only be used on pages that contain legally sensitive information that you don’t want cached, such as a “terms & conditions” page.
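
Here is a minimal sketch of both settings, wrapped in a little Python so the robots.txt rule can be checked with the standard library’s parser.  The “ia_archiver” user-agent and the “noarchive” directive are the values commonly documented for the Internet Archive and Google respectively, but verify both against their current documentation before relying on them; the example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# robots.txt rule that blocks only the Internet Archive crawler.
# "ia_archiver" is the commonly documented user-agent (assumption: still current).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

# Page-level tag for the HEAD section; "noarchive" asks search engines
# not to offer a cached copy of this page.
NOARCHIVE_META = '<meta name="robots" content="noarchive">'

def is_allowed(user_agent: str, url: str) -> bool:
    """Check whether ROBOTS_TXT lets the given user-agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "http://www.example.com/terms.html"  # placeholder URL
    print("ia_archiver allowed:", is_allowed("ia_archiver", url))
    print("Googlebot allowed:", is_allowed("Googlebot", url))
    print(NOARCHIVE_META)
```

The rule leaves every other crawler alone, so blocking the archive this way doesn’t affect normal search engine crawling.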

It’s important to protect your site from legal problems by taking every measure you can against scrapers.  This helps ensure the most up-to-date version of your information is available online.  Blocking the Internet Archive helps erase the bread crumb trail of improvements you have spent so much time and money implementing.

