Optimize PDF Meta Info to Stand Out

It’s always good practice to give Google as much information as possible about your documents, no matter the type.  PDFs are often overlooked, and taking some time to optimize them will really make them stand out among the rest.  The same principles apply to PDFs as to webpages regarding tag relevance.  Don’t just put your company’s slogan in the title and/or description tag, because Google chooses to display title and description information that is relevant to the searcher’s query.  The best thing to do is be specific.  Use part numbers and exact part nomenclature wherever possible, because the results for those types of searches are where PDFs are most likely to perform well.

Free PDF makers support the addition of meta information, but in this post I will highlight Adobe Acrobat because that’s what we use.  Open a PDF in Acrobat and right-click the document to bring up the pop-up menu.  Select “Document Properties” from the list.  This brings up a window that allows you to enter a lot of meta information.  Enter a title for the PDF that is about 55 characters long, including spaces.  When properly optimized, this will control the blue link in Google.  Next, check out the “author” box.  Depending on the program you used to generate the PDF, this box will sometimes contain your name or initials, or nothing at all.  It’s best to enter your company’s name in this box.  The subject is also very important to displaying a good listing in the results.  When properly optimized, the information in the subject box will control the two-line blurb in Google.  Try to include relevant part numbers and specific nomenclature in the subject while keeping it to 150 characters or less (including spaces).  Making this tag relevant to the user’s search will help ensure the subject tag is displayed in the results instead of whatever Google chooses on its own.
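
If you have a lot of PDFs to go through, it can help to check the fields before you type them into Acrobat.  Here is a minimal Python sketch of that check, using only the standard library.  The function name `build_pdf_info`, the sample part number, and the company name are all made up for illustration, and treating 55 characters as a hard upper limit for the title is my assumption (the advice above is "about 55").  The `/Title`, `/Author`, and `/Subject` keys are the standard names of these fields in a PDF's document information dictionary.  The sketch does not write anything to a PDF; it only prepares and checks the values.

```python
TITLE_MAX = 55     # assumption: treat "about 55 characters" as an upper limit
SUBJECT_MAX = 150  # "150 characters or less", spaces included


def build_pdf_info(title, author, subject):
    """Return (info, warnings) for the three Document Properties fields.

    Hypothetical helper: it trims whitespace and flags values that are
    longer than the limits suggested in the post.
    """
    info = {
        "/Title": title.strip(),
        "/Author": author.strip(),
        "/Subject": subject.strip(),
    }
    warnings = []

    title_len = len(info["/Title"])
    if title_len > TITLE_MAX:
        warnings.append(f"title is {title_len} characters, over {TITLE_MAX}")

    subject_len = len(info["/Subject"])
    if subject_len > SUBJECT_MAX:
        warnings.append(f"subject is {subject_len} characters, over {SUBJECT_MAX}")

    if not info["/Author"]:
        warnings.append("author is empty; use the company name")

    return info, warnings


if __name__ == "__main__":
    # Made-up example values: a specific part number beats a slogan.
    info, warnings = build_pdf_info(
        title="AX-200 Hydraulic Pump Installation Guide",
        author="Example Corp",
        subject="Installation steps and torque specs for the AX-200 hydraulic pump",
    )
    print(info)
    print(warnings)
```

The returned dictionary is in the shape that PDF libraries generally expect for document metadata, but check your own tool's documentation before relying on that.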

Document Properties box in Adobe Acrobat

If you don’t enter this information into your PDFs, Google will try to index the content of the PDF and display what it thinks is relevant information.  When the document doesn’t contain indexable text, Google will do their best to run OCR (optical character recognition) and extract text to use.  That’s how you end up with descriptions like the one you see in the screenshot below.  The client had about 150 PDFs that would rank for some queries and all of them had this wonky capitalization, but the best part was that each was different!  The PDF had the text on a dark background and it seems to have wreaked havoc on Google’s OCR.  (The screenshot of the full listing is below.)

Google's attempt at OCR for this PDF yielded some awesome capitalization!

As an interesting side note, back in 2008 I pointed out how capitalization was impacting search results, after which Google did some live index editing to “fix” it.  According to Google, their searches are generally not case sensitive.  The set of results shown in the screenshot below is definitely case sensitive, because it changes depending on which letters I capitalize in the query.  Interesting.

Full results to show the OCR capitalization craziness, redacted to protect my client.

Matt Cutts and SEO Myths

Matt Cutts posted another video answering a question from a webmaster.  Ryan from Michigan asked him to list the biggest SEO myths.  Matt jumped right in laughing and saying that there is no boost in organic ranking from running an AdWords campaign.  Weeelllll… you don’t get a direct boost, but you do get what I would call “indirect extra credit”.

Here’s why: as soon as you turn on AdWords, you’re getting traffic from a specific keyword hitting a page that is optimized for conversion.  Users spend time on your site, conversion rate goes up, pageviews go up, time on site goes up, etc.  All the little things that tell Google your site is relevant and useful to users searching for the keyword improve, and so your organic ranking gets a boost.  I know this because it happened to a client of mine, and then we repeated it with a few others to prove it (no, I’m not going to name them).  The first time I saw it was with a new client that wasn’t getting the kind of traffic they needed.  We did all of our optimization stuff and waited “long enough” for it to kick in.  We were seeing the progress we expected from those efforts: a slow but steady increase in ranking and traffic.  A few months into the campaign we hadn’t cracked page one for some big keywords, so we decided to start an AdWords campaign to get some extra traffic while we waited for organic ranking to improve.  Well, guess what – within a couple of days the client blasted up to #2 for their main keyword, which just happened to be where most of their PPC money was focused.  There was no algo change announced.  There was no major shift in the results for that keyword, and there was no across-the-board increase in ranking for my client, just the one keyword.  This was very unexpected, so I tried it again and again, and it worked every time.

Aside from the indirect “extra credit” boost, the only times we see sudden jumps in organic ranking are when we fix technical problems that otherwise held the site back from achieving its “true” ranking.  It’s easy to point to that kind of a change by analyzing server logs.  The AdWords “extra credit” does not have the same signature.  It is just a boost from running a good AdWords campaign.  Period.

Regarding AdWords, Matt also said “We wanna return really good search results to users so that they’re happy so that they’ll keep coming back.”  He then said “We’re not going to make an algorithmic change to drive people to ads.”  That’s true: Google doesn’t change the organic ranking algorithm to drive people to ads; they change the layout of the results pages to make paid ads more prominent.  Check out my post that shows only 5/41 links above the fold take you off Google’s site for a very clear example of what I’m talking about.  It’s real hard not to click a link that makes money for Google there.  Go ahead, try and convince me that wasn’t intended to drive people to click paid ads.  Does Matt Cutts have any control over page layout?  I seriously doubt it.  He’s head of the “Defense Against Spammers” team, not the “Change the Layout to Maximize Profits from Paid Advertisements” team.  I think that’s headed by Larry.

You have to take what Matt says with a grain of salt. He’s a master of misdirection and ambiguity. Don’t let his smooth smile or charming demeanor fool you – he is extremely intelligent and chooses his words very carefully.  He is a very nice guy and I really believe he thinks what he is doing is, first and foremost, in the best interest of the user.  (Good user experience = more pageviews & more ad clicks.)  During the “pre-Cuttlet” era, Matt was always lovely to talk to. He’s witty, funny, and generally a good guy to be around.  He never struck me as the malicious type.  He has to tread lightly so he doesn’t expose unpatched weaknesses in the ranking algo.  I don’t blame him for that at all – I’d do the same.

Ok, so I made up the screenshot - oh well.

Yes, Photoshop was involved…


Google Error

I saw something the other day that made me smile.  Whoever made the Google 404 page went a little fast on the title tag and released the Shift key before they were done entering exclamation points.

Check out the extra 1 on the end.


Google Hates Sharing Data

I missed this article during the hustle and bustle of Christmas, and I’m thinking that was what Google hoped would happen.  I’m sure they hoped people would be too distracted to see the news that Google was going to ruin email marketing to Gmail users.  Here’s the important part of the article:

“…Google has just announced a move that will [...] cache all images for Gmail users. Embedded images will now be saved by Google, and the e-mail content will be modified to display those images from Google’s cache, instead of from a third-party server. E-mail marketers will no longer be able to get any information from images…”

So what?  Well, just like [not provided], Google is disconnecting marketers from their target audience and effectively making them blind to a very important piece of data: open rate.  When a marketing email is opened, its images are downloaded from the email marketing company’s server with a special image call.  The special image call sends back information about you, including your IP, email address, a timestamp, etc.  That tells the email marketing company that you opened an email and viewed their offer, which gives them an “open rate”.  Since Google is going to cache all images on its own servers and display the cached images in Gmail, the email marketing company will no longer receive information about who has opened their emails and will no longer get good “open rate” information.  With an email campaign it’s really important to understand the number of people who opened an email vs. the number of people who clicked a link.  With cached images, the only way the email marketing company can tell that someone has opened their email is if that person clicks a link, which makes Gmail’s open rate equal to its click-through rate.  If those two are equal, it’s difficult to say whether the email marketing company provided a compelling message, because they won’t really know how many people opened an email vs. how many people engaged with it.
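
To make the “special image call” and the open rate problem concrete, here is a rough Python sketch using only the standard library.  Everything in it is hypothetical: the `mail.example.com` URL, the `c` and `r` parameter names, and the campaign numbers are invented for illustration, and real email platforms will differ in the details.  It shows a per-recipient image URL, how the server could read the recipient back out of the request, and what happens to the numbers when opens can only be inferred from clicks, as described above.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def tracking_pixel_url(base_url, campaign_id, recipient_id):
    """Build an image URL that is unique to one recipient of one campaign."""
    query = urlencode({"c": campaign_id, "r": recipient_id})
    return f"{base_url}?{query}"


def record_open(request_url, opens):
    """Simulate the server side: note which recipient requested the image."""
    params = parse_qs(urlparse(request_url).query)
    opens.add((params["c"][0], params["r"][0]))


def rates(sent, opened, clicked):
    """Return (open rate, click-through rate, clicks per open)."""
    open_rate = opened / sent
    click_rate = clicked / sent
    clicks_per_open = clicked / opened if opened else 0.0
    return open_rate, click_rate, clicks_per_open


if __name__ == "__main__":
    opens = set()
    url = tracking_pixel_url(
        "https://mail.example.com/open.gif", "spring-sale", "user42"
    )
    record_open(url, opens)  # happens when the mail client loads the image
    print(url)
    print(opens)

    # Invented numbers: 1000 emails sent, 250 opened, 50 clicked.
    print(rates(sent=1000, opened=250, clicked=50))
    # If an open is only visible when someone clicks, "opened" collapses
    # to the click count and the last figure is always 100%.
    print(rates(sent=1000, opened=50, clicked=50))
```

In the second case the marketer can no longer tell a strong subject line with a weak offer from the opposite, which is the whole complaint.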

In a digital age, those who control the data control the world.

Yep, that's Larry.