Crawling Dead Ends

Saturday, April 12th, 2008

Search engine dead ends are something you have to remind site owners about. They tend to think about the web in terms of how people navigate a site, particularly themselves, rather than how a search engine does. Their thinking is that if they can navigate to a page, then everybody can.

Google recently announced they are experimenting with crawling through web forms on a small, select group of high quality web sites. Is this notice that soon any site owner will be able to hide content behind web forms and expect it to be crawled? No, I don’t think so. First, there are other search engines to think about; Google is not the only engine. Second, the coverage from Googlebot crawling through web forms certainly will not be as extensive as link crawling.

While this search advancement is welcome, I don’t think it should change how good, accessible sites are designed. The advice that Matt gives in his fictional site review still stands. If you need to place a selection of some kind in front of content that you need crawled (like your whole site), it is best to do this with links rather than a web form. There are also other reasons to maximize the content on the site’s main page rather than making it a simple portal with an image and a region dropdown.
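To illustrate why links are safer than a form, here is a minimal sketch of how a simple link-following crawler discovers pages. It is not how Googlebot works internally; it only assumes a crawler that collects href values from anchor tags. The two portal pages, their URLs, and the names LinkCollector and discover_links are made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags, as a basic link-following crawler might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchors with an href give the crawler somewhere to go next.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def discover_links(html):
    """Return the list of URLs a link-only crawler would find in the page."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical portal page that hides the regional sites behind a form dropdown.
FORM_PORTAL = """
<form action="/go" method="post">
  <select name="region">
    <option value="ca">Canada</option>
    <option value="us">United States</option>
  </select>
  <input type="submit" value="Enter">
</form>
"""

# The same choice offered as plain links.
LINK_PORTAL = """
<ul>
  <li><a href="/ca/">Canada</a></li>
  <li><a href="/us/">United States</a></li>
</ul>
"""

print(discover_links(FORM_PORTAL))  # []
print(discover_links(LINK_PORTAL))  # ['/ca/', '/us/']
```

The form version gives a link-only crawler nothing to follow, so everything behind it is a dead end. The link version exposes both regional sections.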

Google Summary of Paid Links Policy

Sunday, December 2nd, 2007

Published today was a summary of Google’s consistent policy on buying and selling links. It is worth pointing out to site owners. A question frequently asked by someone with a “big idea” who wants their site up and running in two weeks, with loads of search engine traffic on launch day, is “Can I buy links for search engine placement?” Of course they can, but it is against my recommendation. If they choose to do so, I choose not to get involved with them. Some people just see money as a shortcut and understand once they read the webmaster guidelines. Others could not care less about the rules; they are more interested in gaming the system and trying to get away with it. I think that effort is much better spent elsewhere.

I believe much of the anti-Google sentiment on the web comes from site owners who provide no value whatsoever and are still trying to eke out revenue from link doping or other paid link strategies like those employed over five years ago. (Remember searching on a topic and ending up on a page full of links that had no relevance to your search in terms of actual content? See examples of today’s methods.) Google is not, as is often portrayed, a monopolistic dictator protecting big business or itself in this regard. There is a universal desire among all search engines to protect their indexes’ PageRank or equivalent from manipulation by paid links. I commend the search engines’ proactive work in this area, keeping their indexes unpolluted and trustworthy for users. I feel the whole Internet would become much less useful if the search engines were ever contaminated this way.

Getting a Site Indexed

Thursday, August 16th, 2007

Something I have noticed over the past few years while talking to friends and clients about new or existing web sites is a lack of understanding of how search engines index a site’s content and how that content shows up in search results. Of course, that is the basis of the Search Engine Optimization (SEO) industry, of which I am not a big fan (that explanation is for a later post). A recent post by Matt Cutts of Google on the changes in the frequency of index updates is very interesting and shows how far the industry has come in just the last seven years.

Matt states that in 2000 when he joined Google there was a 3-4 month period where they did not update their index at all and another search engine went for over a year without updating their index (perhaps one of the casualties of the search engine shake up). This would mean that no new sites or new content from existing sites would show up in searches until the index update. It was mid-2000 when Google started regular monthly index updates, driving the search engine industry to provide accurate and fresh results for searchers.

Since then Google has been improving their index updates to the point where things can appear in the index only minutes after being posted. Of course, other search engines have had to follow suit. This is where I appreciate Google’s focus on their search customers (although content owners love fresh results as well).

Changes in technology have helped Google reach these new levels of freshness. Instead of Google spiders having to crawl each site daily (which is impossible when they are indexing billions of sites), sites can ping Google when they have updates. This is possible now with the rise of RSS and sitemaps. Sites do not have to be a traditional blog to use these techniques either.
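For sites that are not blogs, a sitemap can be generated with a few lines of code. The sketch below builds a small sitemap in the sitemaps.org XML format. The function name build_sitemap, the example.com URLs, and the dates are placeholders for illustration. The ping step itself is left out because the notification address differs by search engine and should be taken from each engine's own documentation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod tells the crawler when the page last changed (YYYY-MM-DD).
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Placeholder pages; a real site would list its own URLs and dates.
pages = [
    ("http://example.com/", "2007-08-16"),
    ("http://example.com/about/", "2007-08-01"),
]

sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Regenerating this file whenever content changes, and then notifying the search engines, is what lets them pick up updates without crawling the whole site on a schedule.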

In the past, clients would bring new projects and expect a site to be created and launched in two months, as well as indexed by all the major search engines with a high ranking on key search terms on launch day. I would explain that sites had to be submitted for crawling by the search engines, and that there was then a waiting period before they would be added to the index and available in search results, for a total wait time of 4-6 months. This often opened their eyes to the search engine industry. Many wanted to pay to be included in the index and listed as the #1 result, but after some explanation they would understand the reality of the web. As all were small organizations or individuals, I stressed the importance of focusing on their content and doing what they could with the search engines, but not obsessing over their initial rankings. Some dropped their site project with this news; others went forward and discovered that the wait to be crawled and to show up in the index was not as detrimental as they thought it would be.

Now it seems very easy to set up a site with feeds and sitemap pinging capabilities and be discovered and indexed in days or hours. Then you can immediately work on building content and incoming links from valuable resources (not link exchanges) to increase your visibility.

Some more good general advice is provided in a Google Webmaster Central post on getting indexed (English at bottom). It was written for the Portuguese market, but it is relevant to every site. The top two points are critical: be a subject authority (write good content people are looking for) and keep the search engines informed of your site updates, which are hopefully frequent. If you are not checking off these two points, then all the other optimization will do little to gain and maintain visitors, no matter how high you get your site to rank.

  • About the Author

    Jon Fedyk is an IT professional in Regina, Saskatchewan, Canada. He specializes in the creation and management of highly available systems. He is interested in open data, statistics and data presentation.
