Keira Davidson (00:16):

Hi, and welcome to the TechSEO Podcast. Today with me, I’ve got Rick, who’s out in Denver. Would you like to give a bit of background information on yourself?

Rick Ramos (00:32):

Sure. My name is Rick Ramos. I’ve been doing SEO professionally since 2008. I’ve been through it all, as far as digital marketing is concerned. I’ve done PPC, I’ve done social, paid and organic. I’ve been HubSpot certified, inbound marketing certified, done the keywords and content thing. And technical SEO is my favorite, always has been my favorite for the last 13 or so years that I’ve been doing this work. So technical, that’s been my lean, and also e-commerce has been my lean as well. I just feel like you learn how the robots work, you kind of understand that the robot wants to think just as much like a human as it can, and just build a website to suit that and also suit the people that are coming to visit.

Keira Davidson (01:23):

And from that, based on what you were saying about how robots are similar to humans, in that they have to be able to access the site just as humans do, that brings us onto the question: how best should we optimize a website for Crawl Budget?

Rick Ramos (01:42):

Oh wow, that’s a very, very broad question. I’m sure that everybody listening already understands the concept of Crawl Budget, where Google has a finite amount of time and resources that it is willing to spend crawling and understanding a website. And so maximizing that time that it’s willing to spend is really the name of the game, as far as I’m concerned, when I first get my hands on a client website. So I guess, bringing it back to paint a mental picture, I like to think of technical SEO as a garden. I’m a gardener, I have a very ambitious garden in my backyard over here. And so I’d like to start from the very beginning, from the garden box, from the plot of land that you’re actually going to grow in. So you have your garden box, say, you’ve got your four sides, you fill that with soil. That’s going to be your CMS, or your code structure. And then the soil inside is going to be that code, and you want to make sure that you have the right kind of code for the right kind of website that you’re trying to build.

If we’re talking about gardening, you want to make sure that you have the right soil for your hardiness zone, for the types of fruits and vegetables that you’re going to grow. So that’s really where I spend most of my time day to day, in that garden box. If we’re talking about the Crawl Budget, the finite amount of time and resources, then the overall health of the soil, the overall health of your code, is really the place to start. You want to make sure that everything that’s important is crawled, and that the things that are not important are not being crawled. A great place to start there is the XML sitemap: your XML sitemap is going to be your inventory.

So say I have a brick-and-mortar business, or some kind of wholesale or even retail business, and I have all of the products that I want to sell in the warehouse: I know exactly what products I have on hand, I know exactly how many of them there are, and I know exactly how much money I could potentially make on all of them. That’s your XML sitemap right there. You have an inventory of all of the pieces of the website, every single page that you want to express to your organic search strategy. You want to make sure that’s there. This is going to be helpful for Google, or for search engine robots no matter where they’re coming from, to know what you intend to be in the index. And then it’s also helpful for diagnostics. When we go in there and look, we know exactly what’s supposed to be there, we know which pages are important and which pages we want to have in our organic strategy.

From there, we want to talk about how users and also search robots are finding that content. Are they starting at the homepage and then going from there? Are they starting from a different point, on a different funnel perhaps? Where are they coming in, and how are they getting to the content? Just as we do with keywords and content SEO, we want to make sure the most important parts of any funnel are reachable, whether we’re talking about top-of-funnel content for people with lower awareness, who are looking just for an answer to their problem and may not even know what that problem is, all the way to the folks who already know about your brand, or already know about what you’re offering and how you can solve their problem, and are perhaps just looking for a place to input their credit card information. I guess that was a little verbose for just one question, but it was a very broad question. Was that okay?

Keira Davidson (05:53):

No, I really appreciated the garden box analogy. It really helped me get my head around it. And the likes of a sitemap is so important to help search engines know what they can crawl, what they can’t crawl, where to look, where not to look. Especially if you’re a large e-comm site, it is vital to have that correct and up to date. So it’s really important that it’s dynamic, so it’s basically a replica of what’s being shown on the site.

Rick Ramos (06:24):

Right, mm-hmm.

Keira Davidson (06:30):

And I think, [inaudible 00:06:30]. Sorry, go on.

Rick Ramos (06:30):

Oh, I was just going to say the dynamic nature of that is super-duper important. From time to time, I’ll be working with a client who doesn’t have the infrastructure to create a dynamic XML sitemap, and they end up just crawling the website and producing one out of Screaming Frog, or things like that. But I find that causes a lot more housekeeping issues than it really solves in the long run.
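
To make that concrete, here is a minimal Python sketch of what a dynamically generated XML sitemap involves. In a real setup the page list would be pulled from the CMS database on every regeneration, so the sitemap always mirrors what is actually live; the URLs and dates below are invented for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build an XML sitemap string from (url, lastmod) pairs.

    In a dynamic sitemap, `pages` would be queried from the CMS
    database, so the output always reflects the live site.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages, standing in for a CMS query result.
xml_out = build_sitemap([
    ("https://www.example.com/", "2021-11-01"),
    ("https://www.example.com/products/widget", "2021-11-03"),
])
print(xml_out)
```

A static export from a crawler produces the same shape of file, but it freezes the inventory at crawl time, which is the housekeeping problem Rick describes.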

Keira Davidson (06:58):

Yeah, yeah. Geez, imagine if you didn’t do that regularly. Your sitemap could be so many versions old compared to your site, and you’d almost be telling Googlebot to crawl a completely different site to what’s actually live. And that definitely wouldn’t be well optimized at all.

Rick Ramos (07:19):

Yeah. And that’ll happen more often than you’d like to see.

Keira Davidson (07:24):

Oh god, that isn’t good at all. So once you’ve taken a look at the sitemap and the CMS, and seen how they interact and what languages are being used, what would you do from there, once they’re all okay and set? Would you look at log files to see how the site’s interacting, or what would you do?

Rick Ramos (07:53):

I love looking at log files; log files are one of my favorite pieces of data to look at. However, it’s difficult to get log files from clients, so I don’t always have them on hand. When I do, though, I like to use them. If I’m working with a client and they don’t have a log file analysis in their statement of work or in the contract, I’ll still ask and see if I can get them, because I like to use those log files as just another data point. So if we’re looking at the website, we’re looking at all the URLs on the website, and this is all of our owned content, our owned data. We can put that together with things like analytics, with Search Console, with things like your keyword footprint or backlink profile. And so we can see on a URL level, on a directory level, or really on a site-wide level, what that value is. And we can go ahead and add log files into that.

And rather than analyzing the log files separately, we can just use them as another performance metric, say, “You know what? We have this page right here that has X number of referring domains, and it has Y number of keywords.” And then we can also look at the crawl rate and say, “You know what? During the sample period we had, Google made this many GET requests to this directory or to this URL.” As we’re doing an analysis, it just gives us another thing to look at, rather than taking those log files all by themselves. However, looking at the log files by themselves can be helpful as well. And really, the biggest benefit those log files have given me has been clarifying a hunch. Here’s a situation that I ran into a number of years ago. I had a client whose internal search results were all indexable and crawlable, and everything like that. And we all know that those shouldn’t be there.
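
As a rough illustration of that kind of count, here is a Python sketch that tallies Googlebot GET requests per top-level directory from a sample of access log lines. The log format, paths, and user-agent strings here are hypothetical; real logs vary by server configuration, and verifying that a "Googlebot" user agent really is Google requires a reverse DNS check.

```python
import re
from collections import Counter

# Minimal pattern for a combined-format log line: method, path, user agent.
LOG_RE = re.compile(r'"(GET|POST) (?P<path>\S+) HTTP/[^"]*".*"(?P<ua>[^"]*)"$')

def googlebot_hits_by_directory(log_lines):
    """Count Googlebot GET requests per top-level directory."""
    counts = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        # "/search/shoes?q=x" -> "/search/"
        top = "/" + m.group("path").lstrip("/").split("/", 1)[0].split("?")[0] + "/"
        counts[top] += 1
    return counts

# Invented sample lines; the third is a non-Googlebot visitor and is ignored.
sample = [
    '1.2.3.4 - - [01/Nov/2021] "GET /search/shoes HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '1.2.3.4 - - [01/Nov/2021] "GET /products/widget HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '5.6.7.8 - - [01/Nov/2021] "GET /search/boots HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
hits = googlebot_hits_by_directory(sample)
print(hits)
```

Dividing a directory's count by the total gives the share-of-crawl figure Rick uses later (the "over 80% in internal search" number).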

But I had a hard time convincing the company leadership that this is something we wanted to do, that we wanted to get that out of there. The company leadership were convinced that Google needed to crawl all of those pages, and that it was the only way Google was going to be able to get to the content. So I asked for some log files, we got some log files, and found out that Google was spending over 80% of its time in internal search. I said, “You know what? Yes, okay, those pages may be helping Google to get to this other content, but Google is spending its time in these low-value pages, and it’s not reading the stuff that you actually want people to land on when they do a Google Search.” And so I used that number from the log files as a confirmation, so that I could sell this strategy to company leadership. They bought in, and we went ahead and removed all of the internal search results from Google. We used a three-step process.

First, we put a noindex tag on those pages, and we created an XML sitemap that just had a few samples of popular internal searches, so Google was able to go ahead and crawl those. Then we let that sit and let Google chew on it for about six to eight weeks. And then we disallowed the entire internal search directory using robots.txt. When we started doing this, it was right before Thanksgiving. We went ahead and put those noindex tags in the head of all of the internal search results, and Google started taking them out of the index. And within a week, we saw about a 300% increase in clicks and impressions inside of Google Search Console. So just an example of: we already knew the answer, the answer was taking the internal search results out. However, company leadership was not convinced. So we used that number, I think it was 83 to 85% or something like that, to say, “Let’s pull back on this.” And luckily, we won really big on that.
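
For reference, the three steps Rick describes boil down to something like the following. The `/search/` directory name is just an example; the ordering matters, because a robots.txt disallow applied first would stop Google from ever seeing the noindex tags.

```
# Step 1: noindex the internal search results (in each page's <head>):
<meta name="robots" content="noindex">

# Step 2: a temporary XML sitemap listing a few popular internal
# searches, so Google recrawls them and sees the noindex. Wait
# roughly 6-8 weeks for the pages to drop out of the index.

# Step 3: only then, block the whole directory in robots.txt so the
# crawl budget stops being spent there:
User-agent: *
Disallow: /search/
```
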

Keira Davidson (12:38):

That’s crazy that it managed to increase so significantly for both clicks and impressions. Geez, I bet they’re really pleased that they did that.

Rick Ramos (12:51):

Yeah, I was really pleased too. That is definitely one of the greatest quick increases in organic traffic that I’ve ever accomplished using only technical SEO.

Keira Davidson (12:59):

And I guess something that would be in the back of my mind when you were suggesting this strategy to them: were they concerned that, “Geez, we’ve got all these search query pages that people are looking for, and we’re removing them from search and not having them indexed”? Was it a battle of them being concerned about no longer ranking for as many keywords? Was that ever an issue, or was it mainly just trying to convince them that it’s the right thing to do overall, to maximize Google seeing the more valuable pages on the site?

Rick Ramos (13:46):

To be perfectly honest, I’m not really sure what the specifics of their pushback were.

However, what I think is that they had an idea of how Google Search, as a piece of software, works, and they really thought that they needed to have all of these pages, these search facets basically, crawlable, otherwise Google wouldn’t find all of their content.

And so they were really concerned that Google was using these as a pathway, as a crawl path, to find all of the content. The content was job listings. It was a job board, so there were lots and lots of them. It’s similar to an e-commerce situation where you have that category model: category, subcategory, and then your PDP if it were e-commerce, but your job page here, since it was a jobs listing. And really, the only difference between that jobs structure and an e-commerce structure is that the jobs only last about 30 to 90 days, and then they’re pulled down. You know what? I apologize. I went off on that tangent and I totally lost the question.

Keira Davidson (15:12):

It’s fine, don’t worry. You pretty much answered it anyway. It was, like you said, they had an idea of how Google interacted with the site.

And I guess it’s quite easy to get confused about how Google interacts. I know I’ve been looking quite a lot at how Google gets onto a page, how it moves, how it renders. It’s quite interesting how a second crawl might happen, but ultimately, you’d rather Google got all the information in the first crawl, just to prevent resources being wasted. That was something I was looking into for a client using infinite scroll: the issues around bots crawling the site, and the duplicate content that can happen. So yeah, it’s really, really important to make sure that Google’s got the easiest possible chance of seeing each page, understanding it, and then being able to get to the next one. And I’m sure you’ve had many of those conversations as well.

Rick Ramos (16:20):

Yeah, absolutely, absolutely. So it’s easy to say, the more often a page is crawled, the higher it’s going to rank. However, I mean, it is really easy to say that, but it’s really hard to prove that. So what I like to tell my clients is that the more often a page is crawled, the more potential that it has to rank in a competitive way.

Keira Davidson (16:47):

Of course.

Rick Ramos (16:48):

And that’s a correlation, not a causation. However, if we look in log files, we do see that pages that rank more highly are typically pages that are crawled more often. And there are other correlations we can make to that. The pages that are crawled more often are also closer to the hostname in the directory structure. That’s not to say that if your page is closer, it’s going to get crawled more; it’s just one of those correlations that we look at and try to understand, and kind of bring into the bigger picture. Because even with that one little bit of information, there are still a lot of different things that go into it.

So say we’ve got six to eight items in our top menu; some of them are crawled more often, some less often. Well, let’s see how many internal links or external links we have to each one of those six to eight pages. Distilling something down as simply as possible, those are the types of things that we want to find.
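
A check like that can be sketched in a few lines of Python: given crawl hits from a log sample and internal link counts per page, compute a plain Pearson correlation. The numbers below are invented, and as Rick notes, a high correlation here is only suggestive, not proof of causation.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical figures for six top-menu pages: Googlebot hits in a
# two-week log sample vs. internal links pointing at each page.
crawl_hits     = [120, 95, 60, 44, 30, 12]
internal_links = [310, 240, 150, 120, 90, 40]

r = pearson(crawl_hits, internal_links)
print(round(r, 3))
```
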

Keira Davidson (18:02):

Yeah, I definitely think internal links are quite often overlooked, but they are so valuable. And once they’re implemented correctly across the board, it can have such a halo effect, if the content is really good as well, to get bots moving around all of the site.

Rick Ramos (18:20):

Yeah, absolutely. At Seer, on the Seer Interactive technical SEO team, one of our senior managers, Allison, created an equation that is supposed to mimic PageRank. And so we have this really great internal link analysis that we built based on that, where we’ll crawl the site and put this PageRank script through the crawl we’ve done, basically counting the internal links and estimating how we think that PageRank might be passed. And then the output will be a list of pages: we want you to link to these pages, we can link to them from these pages, and these are the keywords that we would like to express in the anchor text. And yeah, that’s a fun analysis to do. And it does bring results.
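
Seer's actual equation isn't public, but a toy version of the idea, iteratively passing link equity around an internal-link graph, might look like this in Python. The graph and page names are invented for illustration.

```python
def simple_pagerank(links, damping=0.85, iterations=50):
    """Iterative PageRank over an internal-link graph.

    `links` maps each page to the pages it links to. This is a toy
    sketch of the kind of equation described above, not Seer's
    actual script; it ignores dangling pages with no outlinks.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new[target] += share
        rank = new
    return rank

# Tiny hypothetical site: every page links back to the homepage.
graph = {
    "/": ["/products/", "/blog/"],
    "/products/": ["/"],
    "/blog/": ["/", "/products/"],
}
ranks = simple_pagerank(graph)
order = sorted(ranks, key=ranks.get, reverse=True)
print(order)
```

Sorting the scores gives the "link to these pages from these pages" priority list Rick mentions: here the homepage comes out on top because everything links to it.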

Keira Davidson (19:23):

I guess, once you’ve taken a look at the log files, conducting that analysis afterwards to really encourage bots to crawl the higher-value pages will really help.

Rick Ramos (19:38):

Yeah, absolutely. And if I’m doing an internal link analysis and making recommendations there, I love to have a sample of log files from before the links were implemented, and then also after. So we’ll have, say, two weeks of log files from before, and we’ll take a look at that. We implement those internal link recommendations, and then after about four to six weeks, we take another two-week sample of log files. And we can watch how that changes. A lot of times, the changes we recommended in the internal link analysis are expressed in higher crawl rates.
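
A before-and-after comparison like that is straightforward once you have per-URL hit counts from each two-week sample; here is a small Python sketch with hypothetical URLs and counts.

```python
from collections import Counter

def crawl_rate_change(before, after):
    """Percent change in crawl hits per URL between two log samples.

    URLs with no hits in the "before" sample get None, since a
    percent change from zero is undefined.
    """
    changes = {}
    for url in set(before) | set(after):
        b, a = before.get(url, 0), after.get(url, 0)
        changes[url] = None if b == 0 else round(100 * (a - b) / b, 1)
    return changes

# Invented counts: two weeks before vs. two weeks after the
# internal link recommendations were implemented.
before = Counter({"/products/widget": 10, "/blog/guide": 4})
after = Counter({"/products/widget": 25, "/blog/guide": 6})
changes = crawl_rate_change(before, after)
print(changes)
```
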

Keira Davidson (20:28):

I think that would be really interesting to see. I’ve only ever done log files once; like you said, it’s so difficult to get them from clients. But maybe next time, I’ll see if I can do it again after we’ve made changes, just to see how the interaction differs from making relatively small changes to pages, by adding a link in here and there. So I think that’d be really interesting.

Rick Ramos (21:00):

Super interesting. And then we, as in SEO professionals, have access to tools that monitor that for you: your Deepcrawl, your Oncrawl, your Botify. I’ve found the logistics of this are a lot easier to manage if you’re in-house rather than agency. You can go ahead and plug your log files directly into that software, and you can look at crawl rate correlations together with all of the other things that those pieces of software analyze. So you can look at log files over a long stretch of time, say, a quarter, or half a year or something like that, which can be really, really helpful. If we do make any kind of SEO change, then we can also look, in a very specific way, at how Google is crawling the website. If we don’t have log files, then it’s really nice, now that Google Search Console has put the little crawl rate graphs into Search Console; we can look at that and get some anecdotal information.

However, if you say, “You know what? We launched a content campaign on this day, and we want to look at it at this day in the future,” you can correlate crawl rate with that too, when you’re using tools that ingest and parse those log files on a continuous basis, rather than doing a one-off log file analysis with just a sample of a few weeks.

Keira Davidson (22:45):

Do you often find there are any common things that are overlooked when it comes to Crawl Budget? For example, someone might be doing a migration, and they are completely changing the way the site is set up, let’s say internationally. Do you find that they will often overlook the redirects when changing, like going from a subdomain to a subfolder? Because I’m guessing it’s probably really important to look at those redirects and update them so they go directly from A to B, because otherwise we’d have so many chains that a bot would have to crawl. Are there any common mistakes that are often overlooked, or issues that arise?

Rick Ramos (23:38):

Oh, in all of my years of being an SEO consultant, I’ve been in-house and I’ve been in agency, but I’ve done most of my work at agency, so I’ve had my hands on a lot of different websites. The one thing that comes to my mind when you ask that question is disallowing the whole website in robots.txt when the site is launched.

That’s just something that’s so simple, and it’s something that we all know. It’s a reflex, but it’s not a reflex for everyone. So that’s something I always make sure to mention to my clients. No matter what their level of technical understanding is, I like to mention it more than once, because you know what? Accidents happen, and I’ve seen that accident happen a lot. I’ve been at agencies that have gotten clients because they had done something like that. Another one, one that I ran into just once: this company was using a third party to launch. They were migrating all of their blog content to another platform, a proprietary code base that belonged to the vendor. They did the migration and they tanked. Well, first of all, they didn’t do their redirects correctly. They had an agency partner that they were working with at the time, but they didn’t tell the agency partner that they were doing a migration.

And so the agency wasn’t involved; they just wanted to do it themselves and save some money, I guess. So they didn’t do their redirects, and they lost all of their traffic. Then they figured out what they needed to do and went ahead and fixed all those things. But even months to a year or more after that, they never recovered to a quarter of the traffic that they had in the past. So I got in there, put my hands on it, and realized that this third party they had migrated to was sending an X-Robots-Tag response header that was noindexing all of their content. And they didn’t know to look for that. I found it by accident, and it was the very first time I’d seen that issue, where they’re using an X-Robots-Tag in the header response. So yeah, that was tough too. I guess, if there is one little bit of advice if you’re migrating: just don’t deindex your whole site.
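
Checking for that header is quick once you think to look. Here is a small Python helper; the `headers` dict is an invented example, and in practice it might come from something like `requests.head(url).headers`.

```python
def xrobots_noindex(headers):
    """Return True if an X-Robots-Tag response header noindexes the page.

    `headers` is a dict of HTTP response headers. The header name is
    case-insensitive, and the value can carry several comma-separated
    directives ("noindex, nofollow"); "none" implies noindex as well.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives or "none" in directives:
                return True
    return False

# Hypothetical response headers like the ones Rick describes finding.
print(xrobots_noindex({"Content-Type": "text/html",
                       "X-Robots-Tag": "noindex, nofollow"}))
```

Unlike a meta robots tag, this directive never appears in the page source, which is why it is so easy to miss during a migration audit.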

Keira Davidson (26:21):

Yeah. That is definitely a really good one to point out. But it’s probably quite a common accidental issue. And you’re probably racking your brains, “What the hell’s gone on? Why has it absolutely tanked?” And then you realize, “Oh God, I’ve gone and done this.” And then to rectify it, you can remove the tag, but it’s a case of trying to get Google back on it quickly, to take a look over all the pages and rank and index them again. It’s probably just an absolute nightmare.

Rick Ramos (26:58):

Yeah. And after a year of being deindexed and not knowing what was going on, they never recovered to where they were before. I also have some critical words for that company, but that’s all I’ll say there.

Keira Davidson (27:20):

So it would be: make sure you’re not disallowing the whole site, and that no directive tags have been applied, on additional content or on a third-party platform. Is there any piece of advice you could offer someone like me, as a beginner? Where would you say is the best place to start learning about Crawl Budget and how to optimize for it? What advice would you give a newer SEO?

Rick Ramos (27:55):

Well, I’m old school, so I’d say Moz. The Moz intro-to-SEO set of content is awesome; it’s second to none. It’s a really, really, really good place to start. On top of that, use Twitter. I met all the people that helped me start my career through Twitter. And the SEO industry on Twitter, you know what? They have been very kind to me, though they can be very unkind.

However, a lot of the people, the influencers that you see who are writing content, are actively talking to other people in the industry, and people like John Mueller will absolutely go and reply to their tweets. Find the folks that are doing the type of SEO that you like to do, whether it’s technical, content, local or anything else, and try to find where they’re talking, whether that’s a conference, an article in Search Engine Journal, their own blog, or just expressing their knowledge on Twitter. So yeah, it’s not always the best place for everything, but I’ve had a lot of luck and learned a lot of things just from SEO Twitter.

Keira Davidson (29:36):

Yeah. No, I agree with you on Twitter. There are some really nice and helpful people, and definitely a big community out there that’s willing to help. So I agree that it’s a good place to go. And lots of content, updates, and important information are often shared on that platform. So yeah, I totally agree with that.

Rick Ramos (30:01):

Yeah. Anytime there’s an algorithm update, somebody figures it out, Marie Haynes typically, or somebody like Lily Ray. And they’ll maybe write something about it, or they’ll be talking about it, and really all the hard work is done for you.

These folks, and who else? Ashley Berman Hale, or Jamie Alberico. They do the work, and really, they make our lives easy for us.

Keira Davidson (30:32):

No, I agree. I really appreciate you joining me today and sharing lots of insightful information. And thanks very much for taking part and being a guest on The TechSEO Podcast.

Rick Ramos (30:45):

Thank you, I’m honored to be here. And if there’s anything that I could do for you in the future, please let me know.

Keira Davidson (30:50):

That’s great, thank you.
