Why isn't The Oil Drum indexed by Google? UPDATE: We are!

If you've ever tried to search for this site using Google, you may have noticed you only get hits for the old Blogger site. In fact, it turns out that www.theoildrum.com is not indexed by Google at all! If you want to know why this is happening and maybe help us solve the problem, read below the fold.

Update [2005-9-30 10:20:36 by Super G]: Progress has been made! Now a search of site:theoildrum.com brings us up. We are also indexed by Google Blog search. If someone out there helped us out, thanks!

I thought that it was strange that we weren't indexed, so I sent Google tech support the following message:

My website, www.theoildrum.com , has been live since August 20 but it does not come up in Google searches.

We have plenty of incoming links. According to our logs, the site gets crawled by Googlebot daily. Further, I am participating in the Google Sitemaps program.

I do have a robots.txt file to exclude certain admin pages from being crawled, but I believe it still permits the crawling of all of the content pages, because those content pages are getting hits from Google's bot.

Here are the contents of my robots.txt file:

User-agent: *
Disallow: /~
Disallow: /my
Disallow: /user
Disallow: /poll
Disallow: /search
Disallow: /newuser
Disallow: /comments
Disallow: /?op=

Any ideas?
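
The robots.txt rules quoted in the message above can be sanity-checked offline. The following is a minimal sketch using Python's standard `urllib.robotparser`; the sample paths (such as `/story/2005/9/29/example`) and the helper name `allowed_for` are made up for illustration, and Python's parser only approximates how Googlebot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as quoted in the message to Google.
ROBOTS_TXT = """\
User-agent: *
Disallow: /~
Disallow: /my
Disallow: /user
Disallow: /poll
Disallow: /search
Disallow: /newuser
Disallow: /comments
Disallow: /?op=
"""

def allowed_for(agent, paths, robots_txt=ROBOTS_TXT,
                base="http://www.theoildrum.com"):
    """Return {path: True/False} saying whether `agent` may fetch each path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, base + path) for path in paths}

# Hypothetical paths: one front page, one story page, two admin-style pages.
SAMPLE_PATHS = ["/", "/story/2005/9/29/example", "/user/example", "/search"]

result = allowed_for("Googlebot", SAMPLE_PATHS)
for path, ok in result.items():
    print(f"{path}: {'allowed' if ok else 'blocked'}")
```

If the content paths come back as allowed, as they should with prefix rules like these, the robots.txt file is unlikely to be what keeps the site out of the index.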

This was their reply:

Thank you for your note. We understand your concern about your site's not appearing in our index. Your page has been blocked from our index because it does not meet the quality standards necessary to assign accurate PageRank. We cannot comment on the individual reasons your page was removed. However, certain actions such as cloaking, writing text in such a way that it can be seen by search engines but not by users, or setting up pages/links with the sole purpose of fooling search engines may result in permanent removal from our index. Please read our Webmaster Guidelines at http://www.google.com/webmasters/guidelines.html for more information.

Regards,
The Google Team

So I came back with:

To whom it may concern:

I assure you that our site complies with Google's webmaster guidelines. We do not use any SEO techniques. I invite you to verify this by inspecting our website at http://www.theoildrum.com .

Our site is a weblog that uses the Scoop platform. Several major websites, such as Kuro5hin and DailyKos, use this platform. They have no trouble being indexed by Google.

We launched our domain in August, and redirected traffic from our old webpage on Blogger at that time. We, like many sites, participate in a blogroll provided by Blogrolling.com. This program provides ~50 inbound links. An acquaintance warned me that an influx of links to a new domain could raise a red flag with Google, but I saw no mention of this policy in the Google documentation. Is this a possibility? If so, what can be done to rectify the situation?

I look forward to your response. Thank you.

Their response this time:

Thank you for your reply. We appreciate your efforts to comply with Google's quality standards. We have sent your request on to our engineering team who will review your site to determine if it is eligible for re-inclusion. If the website is reinstated, it will appear in Google's search results sometime in the next few months. We appreciate your patience during this process.

Regards,
The Google Team

So, they say we will appear in Google's search results sometime in the next few months! I think we can do better than that. Are there any Google insiders among our readers that can help us out?

You have so many links to advertising etc. that I wonder if some of the destinations are causing you issues. I don't know whether Google treats sites that use its competitors for ad sales any worse.

PS: Does your site "ping" weblogs.com etc on every new post?

Last night, our ISP admin configured our site to ping pingomatic.com on every new post. pingomatic.com then pings weblogs.com, Technorati, and a few other services.

I read that Google Blog Search collects data from weblogs.com. We'll see if their ban on our site extends to the blog search.
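
For readers curious what such a ping looks like on the wire, here is a rough sketch using Python's standard `xmlrpc.client`. It assumes the conventional `weblogUpdates.ping(name, url)` XML-RPC call and assumes `http://rpc.pingomatic.com/` as the endpoint; neither detail comes from our ISP's actual configuration, so treat both as illustrative.

```python
import xmlrpc.client

# Assumed endpoint; check the ping service's own documentation before use.
PING_ENDPOINT = "http://rpc.pingomatic.com/"

def build_ping_request(blog_name, blog_url):
    """Build the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")

def send_ping(blog_name, blog_url, endpoint=PING_ENDPOINT):
    """Send the ping over the network (not called in this sketch)."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)

# Build (but do not send) the request a blog platform would post on a new story.
payload = build_ping_request("The Oil Drum", "http://www.theoildrum.com/")
print(payload)
```

Only `build_ping_request` runs here, so the sketch can be tried without touching the network; `send_ping` would be the part hooked into the "new post" event.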

i'd agree. the number of outbound links probably set off some filter. theoildrum.com might appear to be a link farm to the filter.

a manual review of the site should prove that the site is not a spamblog ... but a source of real content.

check this article out. it was written by matt cutts, google engineer. topic: file a reinclusion request
The bulk of the outbound links come from the blogroll. Yet lots of sites have blogrolls, and they don't have trouble. Still, it seems that that's the most plausible explanation.

That reinclusion request link was very enlightening. Thanks!

According to some of the comments given on that page, it seems as if our reinclusion process is underway. But it's good to know there's an automated way to initiate it.

I've noticed lots of blogs putting their longer blogrolls onto separate pages and keeping only a very short list on the front page.  Perhaps that would help?
I did some quick analysis of your front page:


(x contains all urls):
>>> internal = [u for u in x if not u.startswith('http')]
>>> external = [u for u in x if u.startswith('http')]
>>> really_external = [u for u in external if 'theoildrum' not in u]
>>> len(internal)
83
>>> len(external)
54
>>> len(really_external)
35
>>> len(x)
137

Total URLs on page: 137
Total URLs referencing your own site as http://www.theoildrum.com/: 54 - 35 = 19
Total relative (on site / style urls):  83
Total external URLs: 35

Note that this doesn't count the JavaScript-created links in the ads. Google may parse them out, although I doubt it.
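
For anyone who wants to repeat the count on another page, here is a self-contained sketch of the same classification. How the list `x` was originally gathered isn't stated, so collecting every `href` and `src` attribute with the standard `html.parser` is an assumption, and `LinkCollector`, `classify`, and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)

def classify(urls, own_domain="theoildrum"):
    """Split URLs the same way as the interactive session above."""
    internal = [u for u in urls if not u.startswith("http")]
    external = [u for u in urls if u.startswith("http")]
    really_external = [u for u in external if own_domain not in u]
    return {
        "relative": len(internal),
        "absolute_own_site": len(external) - len(really_external),
        "external": len(really_external),
        "total": len(urls),
    }

# Tiny made-up page standing in for the real front page.
SAMPLE_HTML = """
<a href="/story/1">story</a>
<img src="/images/masthead.gif">
<a href="http://www.theoildrum.com/section/news">news</a>
<a href="http://www.example.org/">blogroll entry</a>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
counts = classify(collector.urls)
print(counts)
```

Like the original count, this only sees links present in the static HTML, so links that ad scripts generate at load time are not included.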

can we all just pile on, with multiple of us asking google to add theoildrum.com?

I can't figure out where to email/post on google

tho i saw this for adding URL
http://www.google.com/addurl/?continue=/addurl

            - lorax73

I've already used their addurl page to add our site. Because our site gets hit by their bot, we know that they know about it.

Let's hold off on the pile-on for the time being.

Well.. I've tried to do the very same thing for a Dutch Peak-oil website. I followed every rule, put in all the lines needed for search engines and robots, used only plain HTML, submitted to every search engine I could think of, and did not link to any dubious websites. But Google still doesn't index it at all.

It almost seems they deliberately block Peak-oil websites:

"We cannot comment on the individual reasons your page was removed." ...........

There's always a chance that this is a vast conspiracy to prevent people from finding out about peak oil, but I doubt it. Plenty of peak oil sites come up in Google, just not ours.
I've watched Google carefully on this. They really don't appear to intentionally block sites for political or censorship reasons.

If they think you are spamming or trying to trick them into getting a higher ranking, then they will refuse to index you.

I was just kidding here about censorship ;-)
But still, I got the same problem and am still not able to solve it..
i would recommend that you start running AdSense ... but then i realized TOD runs AdSense and it's obviously not helping out over here

i hate it when Google bans the wrong type of site and then won't let you buy your way back in ;)
Does the site have FRAMEs? If so, Google can't index it.
I do a lot of search engine stuff, and am baffled by Google's response.

It might be the redirect from the old site. Is this a forced redirect, ie, it automatically takes the person from that old site to here? If so, you might want to put a link on the old site saying "click here" for new site.

Blogrolls are fine, Google shouldn't care about those. Every blog has them. They are more concerned with circular links, i.e. several mostly identical sites linking to each other, which they consider to be spamming them. You're not doing that either.

Other stuff to look for and change

text that is almost the same color as the background
itty-bitty text

There is no automatic redirect on the old site. Just a link to the new site that must be followed manually.
Another thing that stops Google cold is FRAMEs. It can't get past them to read the site. You aren't using them here but something for other sites to keep in mind.
I wonder if they have confused http://www.theoildrum.com/ with a similarly-named site www.oildrum.com (intentionally not a link).  I made that mistake once when I first started reading here.  Don't make that mistake in front of others...

Dear Mom, please check out this great website...  :-)

I'm still wondering if they've lumped you in with oildrum.com.  It isn't found if you do a site:oildrum.com search either.

I really do wonder if it was a typo that got theoildrum on the list of misbehaving sites because of oildrum.  It would be an easy mistake to make.

I have been reading TOD since before it launched at the new URL, and I have never been able to properly highlight text on the website for cut-and-pasting when using IE.  It works fine with Firefox.  What gives?  This is the only website I know of that I cannot highlight text on.  (When I try to highlight text, ALL the content before the point at which I tried to start highlighting is highlighted.)  Please fix this issue.  I really would like to be able to cut-and-paste text from this website.  I read it every day, and there are times that functionality would be useful.  This problem exists on both my work and home computer, and I just hopped on my colleague's PC and it's the same on his.  As far as I can tell, it is part of the website.
This problem is due to a bug in Internet Explorer. You can read about it here.

I may get around to fixing it eventually, but I suggest you use Firefox instead.

Hi there,

Any new site launched is going to take time to rank well in Google these days. I usually tell people it will take 6 - 12 months before you see any action. And as you have changed your site's domain name, you are effectively starting from scratch. I have changed a few sites to new domain names and it is a very painful process.

Looking at your site, it does look like there is an issue, as there are no pages indexed by Google, which is strange. So hopefully they fix this. That being said, it will still take time for you to get the new domain up and ranking like the old site probably was. Ranking well in Google requires two main factors:

1. content / site optimisation
2. lots of incoming links

1. Site optimisation

You need lots of content related to your primary keywords and updated often - which you have in spades! You also need the right keywords within key areas of your pages to help establish your overall theme. Keywords in the page title, domain name, heading tags (h1) are most important. Looking at your page code I have a few small recommendations assuming that "peak oil" is your primary phrase:

Title: move "peak oil" to the start of the title tag
H1 Tag: make sure "peak oil" is in this
Masthead Image: change the alt tag to include "peak oil"
There is some HTML commented out within the head tags of your pages - I would remove that as it could be considered a spam technique by Google

2. Incoming links.

Links are king. Links determine your page rank and page rank (to a degree) determines how well you will rank. In fact, links can alter how a site ranks irrespective of content. E.g. Do a search for "miserable failure" OR "failure" in Google and George W Bush comes up #1!! This is simply because a few people linked to his site many times using these words in the links. As you now have a new domain name, most of your links are to your old site.

Here is your link status and page rank:

www.theoildrum.com
Google - 0
MSN - 3972
Yahoo - 4730

Google Page Rank: 0/10

theoildrum.blogspot.com
Google - 2880
MSN - 5736
Yahoo - 22900

Google Page Rank: 5/10

you can check links at http://www.marketleap.com

Note: Google never reports all links it knows about and only updates its link count every few months.

So, my recommendations here are:

1. Try to get as many as possible of the sites linking to your old domain to change their links to your new domain. This takes time, so you should focus on changing the quality links. These are links from pages that have a good page rank, e.g. anything over 3/10 (not on the home page but the page your link is on).

If you use Firefox, there is a great little extension called "search status" that will show you page rank and alexa rank.

Also, if you can, make sure that the words "peak oil" are in the link text to your site as this will help your ranking for that term. Make sure that your links have some variety. If they are all the same, this will look artificial to Google and you may incur a penalty.

Use this in the search engines to find links to your old site: "link:theoildrum.blogspot.com"

2. If you can set it up, you can try a permanent redirect from your old site to the new one. This is called a "301 redirect" and may help.
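
To make the idea concrete, here is a minimal sketch of a permanent redirect written as a bare WSGI application in Python. It assumes the old site could run such a handler at all (a hosted service may not allow it) and that every old path maps to the same path on the new domain, which is unlikely to hold exactly when the two sites use different platforms. The names `redirect_app` and `simulate` are invented for this example.

```python
NEW_SITE = "http://www.theoildrum.com"

def redirect_app(environ, start_response):
    """Answer every request with a 301 pointing at the same path on NEW_SITE."""
    path = environ.get("PATH_INFO", "/") or "/"
    query = environ.get("QUERY_STRING", "")
    location = NEW_SITE + path + ("?" + query if query else "")
    start_response("301 Moved Permanently",
                   [("Location", location), ("Content-Length", "0")])
    return [b""]

def simulate(path, query=""):
    """Call the app directly (no network) and return (status, Location)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    redirect_app({"PATH_INFO": path, "QUERY_STRING": query}, start_response)
    return captured["status"], captured["headers"]["Location"]

# Hypothetical old-site path, used only to show the shape of the response.
print(simulate("/2005/08/example.html"))
```

The status line is the important part: a 301 tells crawlers the move is permanent, whereas a plain link or a temporary (302) redirect leaves the old address as the canonical one.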

Final Note:

There is also a suspected "sandbox" that new sites can be caught in which can last for many months as well. It is believed that if the rate of new links appearing to a new site is large and sudden, this can trigger a penalty in Google. Others think there is an aging filter that applies to new links. It's probably a bit of both.

Hope that helps...

Any new site launched is going to take time to rank well in Google these days. I usually tell people it will take 6 - 12 months before you see any action.

Unless it's a blog.  6 - 12 hours is slow to be indexed on google.  Maybe they are sad that you ditched Blogspot? (a google company)

I started getting google hits the day after I started my blog from scratch.

I have a site http://www.naturalhub.com

I wanted to discuss sustainability issues (including peak oil), a completely separate topic to natural foods, but couldn't afford a new hosting, so I bought the facility of cloaking http://www.sustainableliving.info onto the naturalhub site.

I was unaware google considers this 'illegal'!

Since 'cloaking' to this site, and submitting the 'cloaked' name, my sustainableliving pages have stopped being updated. That's not so bad, but the URL given in the google results always resolves to a very old google cached version from when I first put the site up.

So the link that google gives is as if it is my page, but is in fact googles ancient cached version.

I consider this at least immoral.

I have emailed google and had zero response.

My page now carries the tiny URL address
http://tinyurl.com/72myz

and advises people that the google cached version is well out of date - in other words, go back to bookmarks, don't rely on google, google may well mis-inform you!

If this trend increases, I can see google shooting itself in the foot. People will understand it is unreliable, and go back to bookmarks and portal pages.

As for solutions, I don't know. Google has the power, and is misusing it.

As for conspiracy theories, please do review what Kunstler writes about his visit to Google in his blog. The kewl hax0rs (cool hacker spelling) at Google laughed at the peak oil issue, "it's like, you know, we have technology, duh!" or something like that.
Google can be very slow, months slow.

Make sure the redirect from the old site to the new one is a 301 (permanent redirect)

Changing domains can be tricky with search engines, and sometimes, things go wrong for no apparent reason. Usually related to the search engine thinking it's duplicate content and banning one of them.

Linking to wrong people can cause problems, but as your problems began with the domain change, I think it's related to that.

It's fairly common to hear about problems with domain changes (and with new sites).  Be patient, it can take a few months.  Very possible you didn't do anything wrong.

Keep adding good content, as you are doing now, and visitors and links will come, google will follow.

Something else that could help is contacting the webmasters of sites that link to the old domain and asking them to update those links to the new one.

Some progress.  While yesterday "site:theoildrum.com" returned nada, today it returns the home page.

When you say that google's bot drops by, does that mean it fetches the robots.txt file and then leaves, or that it crawls the majority of the site and then leaves?  If it's the first, then you can contrast your robots.txt file with that of other scoop sites that are well indexed. I doubt it's that.

Google's policy and procedures for building the index are as obscure as the Saudi oil reserves.

Ping:

Does their bot sweep the whole site, or just the front page and the robots.txt file?

I'll note that your robots.txt is different from that of most scoop sites.

I'd block the "~" files for all clients in the apache config, not in the robots.txt.

I just want to say thanks to Richard Heinberg and the Oil Drum for having the interview.  Nice to know something about the man behind his books.
I'd like to refer you guys to an article by an Austin blogger, Nick Lewis, who describes "malicious intent by posters" and how it can, very simply, remove an undesirable page from the internet:  

http://nicklewis.smartcampaigns.com/blogging/cnn-guerrillas-in-the-midst-a-viral-marketing-campaign-exposed

This is a long report but a brilliant discovery, and I recommend that you read it in its entirety.