Feeling Lucky? An Open Source, Stress-Free Approach to SEO
Search Engine Optimisation is vital for any public-facing website, but often labour-intensive. This was something that became acutely apparent to us over the last few months, as we launched several websites and web applications that aim to give Winton’s clients access to the best possible information and analysis.
The experience prompted us to create an open source SEO library for the ASP.NET Core web development framework that cuts down on the hassle.
The library essentially provides a solution for three parts of the SEO process: communication with the web-crawling bots that index websites; the listing of the URLs used by a website; and the creation of metadata – data about data – that determines how a site will be displayed on social media.
Moreover, some dynamic aspects of the library reduce the amount of ongoing maintenance that is necessary. The sections that follow elaborate on these features.
Spur to Action
Search engines crawl the web in order to build an index of all public websites and the links between them. It follows, therefore, that ensuring a site is indexed by a search engine is the first step towards getting it to appear in search results. When building a website, there are often externally accessible development and staging versions of the site that are used for testing by developers and project stakeholders. It is usually undesirable for search engines to crawl these test sites, since they may detract from related search results.
The most robust solution would be to restrict access entirely to the development and staging sites using something like a .htaccess file, but this is not always viable. Sometimes it may be helpful to allow users to view the site, or even for it to show up in search results.
Fortunately, there is a defined robots exclusion standard that all good search engines respect when crawling a site. If you place a robots.txt file at the root of your website, a bot will first read the contents of this file to determine which parts of the site it may crawl. The file can also be used to block all bots.
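For illustration, a robots.txt file that asks every bot to stay away from the whole site needs only two lines under the robots exclusion standard:

```text
User-agent: *
Disallow: /
```

Here `*` matches any crawler and `/` covers every path on the site.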
It is cumbersome to maintain separate files for each environment and frustrating that these files are static, since any change requires the application to be re-deployed. Even with automated, one-click build and deployment processes, it is time-consuming to build and deploy the entire application just to change the contents of a robots.txt file in a particular environment. A better solution would be to externalise the robots.txt rules so that they can be defined in config instead. This was our initial motivation for creating an SEO library for ASP.NET Core, but we can also supply our sites with other metadata to help improve the information in our search result listing. So why stop there?
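To make the idea of config-driven rules concrete, here is a minimal, language-agnostic sketch written in Python rather than C#. It is not the library's API: the config shape (`user_agents`, `name`, `disallow`, `allow`) and the function name are assumptions made purely for illustration.

```python
import json


def render_robots_txt(config: dict) -> str:
    """Render robots.txt text from a config dictionary (hypothetical shape)."""
    lines = []
    for group in config.get("user_agents", []):
        lines.append(f"User-agent: {group['name']}")
        for path in group.get("disallow", []):
            lines.append(f"Disallow: {path}")
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        lines.append("")  # a blank line separates one group from the next
    # Normalise the ending so the file finishes with exactly one newline.
    return "\n".join(lines).rstrip("\n") + "\n"


# Hypothetical staging config: in practice this would be read from a
# settings file at runtime rather than hard-coded.
STAGING_CONFIG = json.loads('{"user_agents": [{"name": "*", "disallow": ["/"]}]}')

print(render_robots_txt(STAGING_CONFIG))
```

Because the rules live in data rather than in a static file, changing what a given environment serves only requires a change to its config.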
The obvious example is a sitemap.xml file, which should also be placed at the root of the website. It is an open standard for telling bots which URLs a site contains and, optionally, the relative priority of its pages. A minimal sitemap.xml file might list nothing more than the home page of www.winton.com.
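A minimal file of this kind, listing only the home page, might look like the following. The `https` scheme is an assumption; the namespace is the one defined by the Sitemaps protocol.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.winton.com/</loc>
  </url>
</urlset>
```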
The difficulty with defining this file statically is that the URL for each route in the application must be an absolute URL. Most of the time it is probably clear what the absolute URL of the deployed site will be, but wouldn’t it be better if this could just be figured out at runtime from the hosting environment?
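As a rough sketch of what working this out at runtime could mean, the Python snippet below resolves relative routes against a base URL that is only known when the application is running. The function name and the staging host are hypothetical; in ASP.NET Core the base URL might instead be derived from the scheme and host of the incoming request.

```python
import xml.etree.ElementTree as ET
from typing import Iterable
from urllib.parse import urljoin

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, routes: Iterable[str]) -> str:
    """Build sitemap XML, resolving each relative route against base_url."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for route in routes:
        url = ET.SubElement(urlset, "url")
        # urljoin turns a relative route such as "/about" into an absolute URL.
        ET.SubElement(url, "loc").text = urljoin(base_url, route)
    return ET.tostring(urlset, encoding="unicode")


# The same route list produces different absolute URLs per environment.
print(build_sitemap("https://www.winton.com", ["/"]))
print(build_sitemap("https://staging.example.com", ["/"]))  # hypothetical host
```

The route list stays the same in every environment; only the base URL supplied at runtime changes.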
It is also possible to define metadata about a site that determines how it will be displayed when shared on a social network. The Open Graph protocol is one such standard for doing this that is backed by Facebook. The site owner defines several <meta> tags in the <head> of their pages, such as <meta property="og:title" content="Winton" />, and when someone shares that page on Facebook it will use this information for the title of the link that is displayed. Wouldn’t it be nicer if, rather than having to remember the format of these tags every time for each website developed, the metadata for the site could just be defined in advance and code could then handle the creation of the tags?
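The snippet below sketches that idea in Python: the metadata is defined once as plain data and a small function takes care of the tag format. The function name is hypothetical and this is not the library's view component, just an illustration of the principle.

```python
from html import escape


def render_open_graph_tags(metadata: dict) -> str:
    """Render one <meta> tag per Open Graph property, escaping the values."""
    tags = []
    for name, value in metadata.items():
        # escape() protects against quotes or ampersands breaking the markup.
        tags.append(
            f'<meta property="og:{escape(name)}" content="{escape(value)}" />'
        )
    return "\n".join(tags)


print(render_open_graph_tags({"title": "Winton", "type": "website"}))
```

Escaping the values matters because titles and descriptions often contain quotes or ampersands, which would otherwise produce invalid markup.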
Finally, it is likely that there is even more metadata that can be defined for a website that has not been mentioned here. Keeping up to date with all of these protocols and ensuring your site conforms is an additional and unnecessary burden for web developers, and repeating the work across multiple sites is not much fun. It’s not uncommon to find SEO libraries for other web development frameworks and platforms that make it easier to work with robots.txt and sitemap.xml files, but we could not find one for ASP.NET Core. That’s why Winton built a new library to solve these problems. Simply define the metadata for a given site and the library takes care of the rendering.
When using ASP.NET Core to run websites, it should be easy enough to build these files server-side at runtime. The ASP.NET Core framework already provides services such as IHostingEnvironment that can be leveraged to determine the environment in which the application is hosted. The routing is also extensible, so it is possible to define a Controller to serve the robots.txt and sitemap.xml files and easily register it with the MVC router. It is also possible to create reusable view components that define an HTML partial and have a strongly typed model. These can be utilised to define the markup for the social <meta> tags, giving the client a strongly typed model with IntelliSense to let them know what data they need to provide.
To that end, we created a library that defines: a set of services for generating these files; an SeoController that serves these files under the correct routes; and a ViewComponent that can be added to any cshtml page to generate the social <meta> tags.
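The pattern of choosing the response from the hosting environment can be sketched in a few lines. The Python below is a simplified stand-in, not the library's SeoController: the function names are hypothetical, and the only assumption about ASP.NET Core is its conventional environment names (Development, Staging, Production).

```python
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"  # an empty Disallow permits everything


def robots_txt_for(environment: str) -> str:
    """Return permissive rules in production and block-all rules elsewhere."""
    is_production = environment.strip().lower() == "production"
    return ALLOW_ALL if is_production else BLOCK_ALL


def handle_request(path: str, environment: str):
    """Tiny stand-in for a controller: map a route to (content type, body)."""
    if path == "/robots.txt":
        return ("text/plain", robots_txt_for(environment))
    return None  # any other route is not handled by this sketch


print(handle_request("/robots.txt", "Staging"))
```

Defaulting to the block-all rules for any environment other than production is a deliberately cautious choice in this sketch, so a newly added test environment is not crawled by accident.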
Using this library offers several benefits:
- Files can be generated dynamically on the server-side using runtime information.
- Disabling bots in development and staging environments is trivial.
- It becomes possible to define robots.txt and sitemap.xml data in config, so files can be changed without re-deployment of the application.
- The burden of having to learn several protocols is removed.
- It presents a friendly, well-documented and type-safe interface.
A robots.txt file does not block anyone from accessing your site; it simply asks ‘good’ search engine bots not to crawl it. Bad bots can choose to ignore a robots.txt file and crawl your site anyway. If you want to block access to your site then you need to look at other approaches, which will likely depend on the web server you are using.
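To see why compliance is voluntary, it helps to look at the check from the bot's side. The sketch below uses Python's standard `urllib.robotparser` module to do what a well-behaved crawler does before fetching a page; the helper name and the bot name are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check, as a polite bot would, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# A polite bot runs this check and backs off; nothing on the server
# enforces it, so a bot that skips the check can still request the page.
print(is_allowed(BLOCK_ALL, "ExampleBot", "https://www.winton.com/"))
```

The decision is made entirely inside the crawler, which is why robots.txt is no substitute for real access control.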
Open Source Details
After using the library for several website projects at Winton, it became clear that it was a prime candidate for open source collaboration. The project is now up on GitHub and we have pushed the latest version to NuGet, so it is ready for use today. The project README contains all the information required to start using the library immediately in an application.
We believe this library offers benefits to any developer building websites using ASP.NET Core and would be thrilled to see it grow further. We welcome external cooperation, not least in terms of support for other types of search engine and social network metadata. If you spot anything missing, especially if it relates to something you already add to your own websites, then please open an issue or a pull request. All contributions will be gratefully received.