search engines, therefore, would be ones capable of understanding the meaning of
content for which they search. We define meaning as the message inherently
intended, expressed or signified in symbols, words, phrases, sentences and
larger blocks of text.
A semantic search engine would need to understand not only the
meaning of the data but that of the question being asked. And it would need to
do this instantly or automatically, returning only results that match and none
that have a meaning different from what the asker intended.
For example, a semantic search engine could disambiguate results
that lead to people’s resumes or profiles versus results that lead to
employment advertisements. Perhaps an even simpler example would be to tell the
difference between Apple the fruit, Apple the company (or its products), and Apple
the record label. Search for just Apple in Google and most of the top results
will be about the company, not the other two, because most people on Google are
searching and clicking on results about Apple Inc. (and/or its products). This
is how statistically based, popularity-driven ranking schemes like Google’s
PageRank work. This is not semantic search at all. Popular pages are not
necessarily credible, and credible sources are not always incredibly popular.
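To make the point concrete, here is a minimal sketch of popularity-only ranking. It is a toy, not Google’s actual algorithm (PageRank itself works from the link structure between pages); the click log and the function name rank_by_popularity are made up for illustration. Notice that nothing in it looks at what the searcher meant.

```python
from collections import Counter

# Hypothetical click log: each entry is the result a user clicked
# after searching for "apple". The data is invented for illustration.
CLICK_LOG = [
    "Apple Inc.", "Apple Inc.", "Apple Inc.", "apple (fruit)",
    "Apple Inc.", "Apple Records", "Apple Inc.", "apple (fruit)",
]


def rank_by_popularity(click_log):
    """Order results purely by how often they were clicked."""
    counts = Counter(click_log)
    # most_common() sorts from the most clicked to the least clicked.
    return [result for result, _ in counts.most_common()]


ranking = rank_by_popularity(CLICK_LOG)
print(ranking)
```

Whoever searches for the fruit or the record label gets the company first anyway, simply because the company gets the most clicks.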
One of the largest problems with implementing true semantic search has been that it is difficult for the computer to know who you are. In the example above it would have to tell whether you are a job seeker or a recruiter. Unless the search engine can learn from your past search behavior, or your previous selections, you would have to manually indicate a category for it to categorize results. Some search engines approximate semantic search by asking you to tag, catalog, sort and otherwise try to “train” the search engine, which is too time consuming for the average user.
So why is this important to you?
Well, if a computer knows what you mean right away, without having to learn from you or be trained by you, it would give you only relevant results and not show you all that other junk you have to manually sift through in today’s search engines. Technology is getting there, but we’re not close enough yet. In this author’s opinion there is still no search tool that comes close to understanding meaning and context, much less subtext, but there are a few getting close enough to be worth exploring.
Where is Semantic Search Today?
Semantic search promises that we should be able to search content on the Internet without needing to be experts in search. To do this, it needs to be automatic, and it must not require us to go around tagging and cataloging content to make it acceptable for computers to “understand.” We just don’t have time to waste when a computer should be smart enough to derive context, subtext and meaning for us.
Tagging and annotating is not the answer either. Another part of the problem is
the lack of sheer computing power. There are limits to what we can compute
today, and search engines are no more than racks full of computers running
You see, search problems that have an exponential number of possible solutions can’t be solved by merely analyzing relational data. Think of all the possible meanings of a simple word like “well.” It could be a hole in the ground for extracting water, oil, gas, or brine. It could be a container, such as an inkwell; health related, as in “I’m not well”; or an interjection in conversation, as in “Well, then what I suggest is…” Perhaps it signifies abundance, as in “a well of information.” It could even mean one of the Internet’s original communities, The WELL. And those are just some of the variations of the word as a noun. There are several others, like the open space running through all the floors of a building (stairwell), nautical (anchor well) and aeronautical (wheel well) uses, or, in British English, the space in front of the judge’s bench. Then if you consider all the verb and adjective variations, and idioms, well… you see?
When a human reads “stairwell” they don’t imagine an oil well inside a staircase; they automatically know what it means. Computers, on the other hand, have to calculate dozens of variations and probabilities to arrive at a best guess. I’m sure you’ve looked up words in the dictionary only to find they have three or four completely different definitions, sometimes even more! Disambiguating them is easy for us humans because of context and subtext, but not very easy for computers.
Context is the physical text surrounding a word, sentence or paragraph. In other words, it’s explicit: you can physically read context, and thus a search engine can index keywords to interpret it. This is mostly how major search engines work today. Subtext, however, is the implicit or underlying meaning of text. Machines have not yet been able to “read” subtext; it must be interpreted through inference, intuition, knowledge of the stated subject matter, educated guessing, or assumptions made as a result of logical leaps. To do this, machines would have to use massive computing power to establish all possible relationships between words.
This kind of soft information is easily interpreted by a human with just enough knowledge of the landscape to be able to make logical leaps. For example, if you read the words “windows” and “vista” on a page that has other words that look like they are related to computers, you immediately know that it is a page about the Microsoft Windows Vista operating system. But if a computer picks up those two words in a page, it could associate them with concepts such as a view out of a window, and not really understand the underlying meaning.
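The “windows” and “vista” example can be sketched as a keyword-overlap guess. This is only an illustration of reading explicit context, not how any particular search engine works; the cue-word sets, the sample pages and the function name guess_sense are all assumptions invented for this sketch.

```python
# Hypothetical cue words for two readings of "windows" + "vista".
# These hand-picked sets are assumptions, not a real lexicon.
SENSE_CUES = {
    "Windows Vista (operating system)": {
        "microsoft", "install", "driver", "software", "pc", "upgrade",
    },
    "view through a window": {
        "mountain", "scenery", "curtains", "balcony", "sunset", "room",
    },
}


def guess_sense(page_text):
    """Return (best_sense, scores) based on cue-word overlap with the page."""
    words = set(page_text.lower().split())
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no overlapping cue words there is no evidence either way.
    if scores[best] == 0:
        return None, scores
    return best, scores


tech_page = "How to install a driver after the Windows Vista upgrade on your PC"
travel_page = "Large windows open onto a mountain vista at sunset"
bare_page = "windows vista"

for page in (tech_page, travel_page, bare_page):
    sense, _ = guess_sense(page)
    print(f"{page!r} -> {sense}")
```

When the surrounding words are present, counting them is enough to pick a reading. When they are missing, as in the last page, the program has nothing to go on, while a human would still make the logical leap.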
Someday, search engines may be able to infer meaning from the pages they index. I’m waiting, with bated breath, for a solution that approaches the artificial intelligence needed to successfully extract this from pages. Semantic search technology has set our expectations too high. We have been misled by countless articles from experts, and by the marketing of new search engines (remember Accoona and Cuil?), touting that this technology will dethrone Google by giving us much faster and more accurate search results. However, that is just not true. I do believe that semantic search is going to be a big deal someday, and that it will help us find data on the web in ways we just can’t do today, by treating the entire worldwide content of the public web as a gigantic database and inferring meaning from our queries. Just not today.