Footprint Bitesize: An Introduction to Screaming Frog

Published on 11th May 2018

What is Screaming Frog?

Screaming Frog is a desktop tool that we use a lot. It is a website crawler that simulates how a search engine bot such as Googlebot would crawl a website.

You simply plug in a URL (any URL) and it will look for links on the page and follow them in the same way that a search engine would. It will then provide you with lots and lots (and lots!) of data, which is great for SEO and content strategising.
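If you're curious what "look for links on the page and follow them" means in practice, here is a minimal Python sketch of the link-finding step. It is only an illustration of the general idea, not how Screaming Frog itself is built. The names LinkCollector and internal_links, and the example.com URLs, are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/about" against the page URL
            self.links.append(urljoin(self.base_url, href))


def internal_links(page_url, html):
    """Return the links on a page that stay on the same host."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    host = urlparse(page_url).netloc
    return [link for link in collector.links if urlparse(link).netloc == host]


# Illustrative HTML only; a real crawler would download each page first
sample_html = '<a href="/about">About</a> <a href="https://other.example/x">Elsewhere</a>'
found = internal_links("https://www.example.com/", sample_html)
print(found)
```

A crawler repeats this for every new internal link it finds until there are no unseen pages left.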

You can look at almost anything on a website by using Screaming Frog, including:

  1. Hreflang
  2. Meta descriptions
  3. On-page content
  4. Canonical tags
  5. Images

By clicking on ‘Configuration’ > ‘API Access’ you can also connect it to Google Analytics, Search Console, Moz, or Majestic and pull in things like page view data or link data. The possibilities are endless!

What Are Some Simple Things I Can Use It For?

Because Screaming Frog can do almost anything, it’s hard to know where to start. One of the simplest things you can do is to plug in your own website’s URL, let it crawl, and then look at your:

  1. H1s
  2. Page Titles
  3. Meta Descriptions
  4. Response codes

Each of these has its own tab at the top of the screen. In the bottom right-hand corner, you will see a graph or pie chart that shows whether you have a problem with any of these elements of your site. For example, ‘H1’ will tell you how many of your pages are missing a main heading, how many of your main headings are duplicated, and how many are overly long.
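To make the H1 checks concrete, here is a rough Python sketch of the same three tests (missing, duplicated, overly long) run over a handful of pages. The function name audit_h1s, the sample data, and the 70-character limit are all assumptions for illustration, so adjust the limit to whatever suits your site.

```python
from collections import Counter

MAX_H1_LENGTH = 70  # assumed threshold for "overly long"; not an official figure


def audit_h1s(h1_by_url):
    """Group URLs into missing, duplicate and over-long H1 buckets."""
    # Count how often each non-empty heading appears across the site
    counts = Counter(h1 for h1 in h1_by_url.values() if h1)
    report = {"missing": [], "duplicate": [], "too_long": []}
    for url, h1 in h1_by_url.items():
        if not h1:
            report["missing"].append(url)
            continue
        if counts[h1] > 1:
            report["duplicate"].append(url)
        if len(h1) > MAX_H1_LENGTH:
            report["too_long"].append(url)
    return report


# Made-up crawl results: URL -> main heading found on that page
crawl = {
    "https://www.example.com/": "Welcome",
    "https://www.example.com/about": "Welcome",
    "https://www.example.com/contact": "",
    "https://www.example.com/blog": "A" * 80,
}
report = audit_h1s(crawl)
print(report)
```

The same pattern works for page titles and meta descriptions; only the length limit changes.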

The ‘Response Codes’ tab is a little different. It tells you whether any pages are blocked by robots.txt, which pages have been redirected, whether there are server errors, and so on.
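If you export your crawl and want to group the response codes yourself, something like this Python sketch would do it. The bucket names and the classify_status function are our own invention, and we use None to stand in for a page that never responded.

```python
from collections import Counter


def classify_status(code):
    """Map an HTTP status code to a broad bucket (hypothetical labels)."""
    if code is None:
        return "no response"
    if 200 <= code < 300:
        return "success (2xx)"
    if 300 <= code < 400:
        return "redirection (3xx)"
    if 400 <= code < 500:
        return "client error (4xx)"
    if 500 <= code < 600:
        return "server error (5xx)"
    return "other"


# Made-up crawl results: URL -> status code (None = timed out)
crawled = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page": 301,
    "https://www.example.com/missing": 404,
    "https://www.example.com/broken": 500,
    "https://www.example.com/slow": None,
}
summary = Counter(classify_status(code) for code in crawled.values())
for bucket, total in summary.items():
    print(bucket, total)
```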

Once you’ve looked at all of this information, you can start making real changes to your website – exciting stuff!

Different Ways to Crawl a Site

The easiest way to crawl a website is simply to type in the homepage URL and click ‘Start’. It will then crawl the website like a spider would, following links.

However, there are also different ‘modes’. ‘Spider’ is the default, then there is ‘List’ and ‘SERP’.

‘List’ mode is a better way to crawl specific pages of a website. You can paste in a list of URLs and it will just look at them, rather than following links elsewhere.

Robots.txt, Noindex, and Canonicals

Like Google, Screaming Frog will respect robots.txt and won’t crawl pages that it’s told not to. However, you can override this.
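To see what "respecting robots.txt" looks like, here is a small Python sketch using the standard library's urllib.robotparser. The robots.txt rules, the ExampleBot user agent and the URLs are invented for the example, and this shows the general mechanism rather than Screaming Frog's own code.

```python
from urllib.robotparser import RobotFileParser

# An invented robots.txt that blocks one folder for every crawler
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "ExampleBot" is a made-up user agent name
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/private/page")
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/blog/")
print(blocked, allowed)
```

A polite crawler runs a check like this before requesting each URL and skips anything that comes back as not allowed.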

By default, Screaming Frog won’t respect canonicals, next/prev, or noindex directives; it only does so if you tell it to.

You can tell the crawler to respect these by going to:

‘Configuration’ > ‘Spider’ > ‘Advanced’ and then ticking ‘Respect Noindex’, ‘Respect Canonical’, and ‘Respect Next/Prev’.
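If you're wondering what the crawler is actually looking for here, noindex and canonical are both hints in a page's HTML. The Python sketch below picks them out of a made-up page. The class name IndexingHints and the sample markup are assumptions for illustration, and it only covers the HTML tags, not hints sent in HTTP headers.

```python
from html.parser import HTMLParser


class IndexingHints(HTMLParser):
    """Finds a robots noindex directive and a canonical URL in a page."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # e.g. <meta name="robots" content="noindex, follow">
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            # e.g. <link rel="canonical" href="...">
            self.canonical = attrs.get("href")


# Invented page head for the example
sample = """
<head>
  <meta name="robots" content="noindex, follow">
  <link rel="canonical" href="https://www.example.com/shoes">
</head>
"""
hints = IndexingHints()
hints.feed(sample)
print(hints.noindex, hints.canonical)
```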

A Word of Warning

Screaming Frog can break a website. In 99% of cases it will work fine, but it sends a lot of requests in quick succession, and most web servers are configured to block repeated requests. If a server sees lots of requests coming in from one place, it could temporarily block your IP address. If the server’s bad, you could overload it and take the website down temporarily.

Smaller websites should be fine, but if you want to crawl an absolutely massive website, then proceed with caution. If you start to see 403s or connection timeouts then stop the crawl!

If you think that crawling might be a problem, but you really need to do it, then you can turn the spider speed configuration down and limit the number of URLs it requests at any one time. This will make it take a lot longer to crawl the site but is less likely to break it.
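The idea behind slowing a crawl down is simple enough to sketch in Python: pause between requests, and stop as soon as the server starts refusing you. Everything here (crawl_politely, fake_fetch, the speed setting, the example URLs) is hypothetical and only meant to show the principle, not Screaming Frog's own settings.

```python
import time


def crawl_politely(urls, fetch, max_urls_per_second=2.0, stop_on=(403,)):
    """Request URLs one at a time, pausing between them.

    fetch is any function that takes a URL and returns a status code.
    The crawl stops early if a status in stop_on comes back.
    """
    delay = 1.0 / max_urls_per_second
    results = {}
    for url in urls:
        status = fetch(url)
        results[url] = status
        if status in stop_on:
            break  # the server is pushing back, so stop the crawl
        time.sleep(delay)  # wait before sending the next request
    return results


def fake_fetch(url):
    """Stand-in for a real HTTP request so the example runs offline."""
    return 403 if url.endswith("/c") else 200


urls = [
    "https://www.example.com/a",
    "https://www.example.com/b",
    "https://www.example.com/c",
    "https://www.example.com/d",
]
# A high rate keeps the demo quick; a real crawl of a fragile site would go much lower
results = crawl_politely(urls, fake_fetch, max_urls_per_second=50)
print(results)
```

In this run the crawl stops at the third URL, so the fourth is never requested.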

Get Going with Screaming Frog! 

If you want to find out more about using Screaming Frog, or download it for yourself, then head over to their website. They have a huge amount of information in their user guides that can help you with anything from checking CSS to exporting your crawls. There are also guides to all of their tabs.

 

Written by Alexandra Eade, Content Manager