So, in a minute we're going to walk through a brief history of search
engines, just to see some of the main ideas behind them. Let's start off by
saying the World Wide Web, or WWW for short, was introduced in 1989. So, it's
a relatively new thing.
And you can imagine that at that time, there weren't a lot of web pages on the
internet. So, if you did a search of all the web pages, it wouldn't be that
many. But today, as of 2012, there are maybe 60 billion pages. That's just an
estimate.
It's hard to get an exact count of the number of pages on the internet,
because there are just so many hidden pages that are hard for any crawler to
reach. A crawler is a program that crawls the space of the internet, tries to
grab all the pages it can, and returns everything it finds. So, it's a hard
task to do, but the estimate as of 2012 was 60 billion pages on the internet.
And Google, around 2012, had indexed 40 billion web pages. So, when you search
on Google, you have access to, I'm just going to say, about two-thirds of the
pages on the internet. So, there are some that aren't known even by Google.
And that's probably interesting to you. But what you have to understand is that
even 40 billion is remarkable. It's impressive that they were able to index
that many pages. That's just a lot of pages.
And so, here's a graph of the number of pages indexed by Google, in billions,
by year. So, starting back down in 1997, when Google was first coming out,
there were 24 million pages that Google had indexed.
And by indexed, that means that they could return the page as a result because
they had it in their database. So, they keep a collection of all the pages
that they've indexed, and they have all the connectivity between them, which
we'll look at in a minute. But the number of pages that they had indexed and
could return for a search, back when Google started, was 24 million. So, we
can say indexed really means to know of. They knew of 24 million, back down
here, so this is 24 million. And you can see how much that climbed by 2005.
It was a little over 8 billion, so by here it was greater than 8 billion.
Just orders of magnitude greater, you
know? So, an interesting chart that any of you could do for a Wiki entry, and
we'd love to see this, is to plot the estimated number of web pages by year
against Google's index count by year. So, you could add another chart on the
side over here, or just graph them together, to say: on the one hand, this is
how many pages Google had indexed each year, and on the other, this is how
many web pages were estimated to be out there in total. So, we can do that.
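As a starting point for that kind of comparison, here is a minimal sketch that
tabulates only the figures quoted in this lecture. The variable and function
names are made up for illustration, the 2005 index count is a lower bound, and
total-web estimates for 1997 and 2005 are not given here, so they are left
unknown rather than guessed.

```python
# Figures quoted in the lecture, in billions of pages.
# The 2005 index count is a lower bound ("a little over 8 billion").
indexed_by_google = {1997: 0.024, 2005: 8.0, 2012: 40.0}

# Estimated size of the whole web. None marks a year for which the
# lecture gives no estimate; fill these in from your own sources.
estimated_total = {1997: None, 2005: None, 2012: 60.0}


def coverage_rows(indexed, estimated):
    """Return (year, indexed, estimated_total, coverage) tuples by year.

    coverage is indexed / estimated_total, or None when no estimate exists.
    """
    rows = []
    for year in sorted(indexed):
        total = estimated.get(year)
        share = indexed[year] / total if total else None
        rows.append((year, indexed[year], total, share))
    return rows


rows = coverage_rows(indexed_by_google, estimated_total)
for year, idx, total, share in rows:
    total_text = "unknown" if total is None else f"{total:g}B"
    share_text = "unknown" if share is None else f"{share:.0%}"
    print(f"{year}: indexed {idx:g}B, estimated total {total_text}, "
          f"coverage {share_text}")
```

Once the missing estimates are filled in, the same rows could be passed to any
plotting tool to draw the two curves side by side.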
But the reason we stop here, at 2005, is because that's when Google stopped
publicizing the number of pages they had indexed on their front page. So,
initially, they showed it year after year, so every time you went onto Google,
you'd be able to see how many pages they had in their index. But as of 2005,
they stopped doing that. So, they've just released that they had about 40
billion, at least, in 2012. That's how many they had indexed.
So, an interesting point is, if they have 40 billion pages in their index, how
is it that they're able to return what you need within the first few results?
You rarely have to go beyond the first page, provided you do the search
correctly. They're able to give you the best hits within the first page, and
you're usually able to find the information you need there. So, how is that
possible? They have to sift through those 40 billion pages, find that ranking
consensus, and return that. And that's an interesting problem.
So, now let's take a look at search engines and how they came to be.
And in doing this, we'll see how Google is really distinct from other search
engines, what they're able to do, and what they offer.
So, 1994 is when the idea of full-text search started to happen. And the first
to offer that was called WebCrawler, where you could enter a full-text search
and it would return a list of pages that had content similar to that text. So,
that's when this whole concept of full-text search came about, where you could
enter some amount of text into a search page, and it would actually return
pages based upon your search.
So then, in '95 to '96, that's when you had the emergence of search engines
like AltaVista, Yahoo, and Ask Jeeves, and they had what's called a relevance
score. That's what they based their ranking on. And that's probably the most
intuitive, obvious way that you would go about ranking the web pages. They
would look through each of the web pages and basically keep a count of the
number of times what you searched for appeared. So, you searched some query,
and it would have a certain number of words in it. And they would take each of
those words and phrases and see how many times it appeared in each of those
web pages. They'd look at each web page and count the total number of times.
And so, maybe for this one it was ten, and then maybe for this one it was
three, and so on. And that's what they would use, that relevance score, to
actually rank the web pages in that way.
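The counting idea just described can be sketched in a few lines. This is only
an illustration under simplifying assumptions, not how AltaVista, Yahoo, or
Ask Jeeves actually implemented it: words are lowercased and split on
non-alphanumeric characters, phrases are ignored, and the page names and
contents are made up for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance_score(query, page_text):
    """Total number of times any distinct query word appears in the page."""
    counts = Counter(tokenize(page_text))
    return sum(counts[word] for word in set(tokenize(query)))


def rank_pages(query, pages):
    """Return (page_name, score) pairs, highest relevance score first."""
    scored = [(name, relevance_score(query, text))
              for name, text in pages.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical pages, invented purely for this example.
pages = {
    "page_a": "search engines rank pages; a search engine counts search terms",
    "page_b": "cooking pasta at home",
    "page_c": "how a search works",
}

ranking = rank_pages("search engine", pages)
for name, score in ranking:
    print(name, score)
```

Notice that the score depends entirely on the query: change the query and
every page's score changes with it.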
Then, Google came around in 1997, and they had a bit of a different idea.
Well, the first way that they changed the landscape is they had a different
way of doing the relevance score. But really, their main contribution was in
defining what's called an importance score. So, the importance score is
independent of the search query that you make, okay? The importance score for
a web page doesn't depend upon what you search at all. It depends on the
structure of the web pages themselves. And we have to come up with an idea of
which web pages are the most important. And that itself is a question that
doesn't have a right or wrong answer. But what they said was, let's also take
into account how important the web pages are, rather than just the hits we
have on the web page. Can we look at the connectivity of the web pages and see
which ones seem to be more important than others? And those should be weighted
more heavily in the answer. So now, here's a search engine market
share taken in May 2012. It was showing that Google had about 73 to 74% of the
market. Bing is another big name in search, definitely. They had 11 or 12%.
Yahoo was a little above them, at this time at least, with 12.47%.
And Ask and AOL were still around too.
And now we'll start looking at how the importance score has an impact on
the search result page that you see when you do a Google search.
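Before getting into that, one property of the importance score stated so far
can already be illustrated: it comes from how pages link to each other, not
from the query. The lecture has not yet said how the score is actually
computed, so the sketch below swaps in the simplest possible stand-in, a count
of inbound links. That is an assumption for illustration only, not Google's
formula, and the link graph is made up.

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "page_a": ["page_b", "page_c"],
    "page_b": ["page_c"],
    "page_c": ["page_a"],
    "page_d": ["page_c"],
}


def inbound_link_counts(links):
    """Count how many links point at each page.

    Used here as a stand-in importance score (an assumption). Note that
    no search query is involved anywhere in this computation.
    """
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


importance = inbound_link_counts(links)
most_important = max(importance, key=importance.get)
print(importance)
print(most_important)
```

Whatever you type into the search box, these numbers stay the same, which is
exactly what makes them different from a relevance score.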