Searching for Tomorrow: Web Search


Monday, July 12, 2010

Infoaxe user blog posts. Thanks for the love!

It has been a little over a year since we launched Infoaxe, to make it easy for users to Search their Web History from anywhere, without going through the hassle of bookmarking & tagging.

We wanted to give a shout-out to some of the blog posts that you, our users, wrote about Infoaxe and how you have been finding the service helpful. Thanks for recommending us to your friends & readers. We have some exciting new features lined up this year, which we hope you enjoy and which will let you get even more out of Infoaxe.

Read the entire post on the Infoaxe Blog here.

Tuesday, December 29, 2009

Infoaxe Real-time Search

At Infoaxe, we launched our new Real-time Search Engine at the Real-time CrunchUp organized by TechCrunch in San Francisco. TechCrunch, VentureBeat & GigaOm covered our launch. Many thanks to Leena, Kim & Liz! The NYTimes & CNN also picked up the story, which was exciting.

With Infoaxe's Real Time Search you ask the question, 'What's popular now for X?' (where X is your search query).

For example, if you search for 'iphone review', Google shows a review of the first-generation iPhone from 2007, which is irrelevant now. Infoaxe, on the other hand, shows a review of the iPhone 3GS, which is what is relevant NOW for such a query.

Infoaxe's real-time search engine works by analyzing the aggregate attention data collected by our Web History Search Engine with over 2.5 million users. We know what the world is looking at NOW and leverage that data to figure out the most timely and relevant results for queries. Infoaxe's ranking algorithms use signals derived from this aggregate browsing data to provide a real-time view of the Web for searchers. Instead of merely sorting results by time, Infoaxe's algorithms use freshness as a signal alongside several other relevance signals to provide relevant results. We think the best result for a query is one that is as fresh as possible but not fresher ;P. We think Einstein would agree ;).
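To make the idea of "freshness as a signal alongside other relevance signals" concrete, here is a minimal sketch. It is not Infoaxe's actual ranking function: the `Page` fields, the exponential half-life decay, the saturation constant and the weights are all assumptions picked for illustration. It only shows why a weighted blend behaves differently from sorting by time.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text_relevance: float  # hypothetical query-document match score in [0, 1]
    recent_visits: int     # hypothetical aggregate visit count in a recent window
    age_days: float        # days since the page was published or last updated


def freshness(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a brand-new page, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def attention(recent_visits: int, saturation: float = 1000.0) -> float:
    """Squash raw visit counts into [0, 1) so very popular pages do not dominate."""
    return recent_visits / (recent_visits + saturation)


def score(page: Page, w_rel: float = 0.5, w_att: float = 0.3, w_fresh: float = 0.2) -> float:
    """Weighted blend of the three signals (the weights are illustrative assumptions)."""
    return (w_rel * page.text_relevance
            + w_att * attention(page.recent_visits)
            + w_fresh * freshness(page.age_days))


def rank(pages: list[Page]) -> list[Page]:
    """Order pages by blended score, best first."""
    return sorted(pages, key=score, reverse=True)


# Made-up candidates for the query 'iphone review'
candidates = [
    Page("old-2007-review", text_relevance=0.90, recent_visits=50, age_days=900),
    Page("3gs-review", text_relevance=0.85, recent_visits=5000, age_days=180),
    Page("fresh-off-topic", text_relevance=0.20, recent_visits=100, age_days=1),
]
ranked_urls = [p.url for p in rank(candidates)]
```

With these made-up numbers the 3GS review ranks first, the 2007 review second and the day-old off-topic page last. A pure sort by time would have put the off-topic page on top, which is the "fresher than it should be" failure mode.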

Infoaxe does particularly well for queries relating to shopping, deals, movies/sitcoms/ebooks etc. Check it out here and let us know what you think! This is just our first step out the door. We are constantly tuning our ranking and indexing algorithms, so expect search quality to keep improving!

Saturday, June 27, 2009

A (Classification(Classification(Search Engines)))

Innovation in Search is far from asymptoting. I think we are going to see a lot of exciting next steps in Web Search in the coming years. There is a series of bets being made on what the next disruptive step will be.

*User generated content (data generated by Twitter, Facebook, Delicious, YouTube etc)
*Personalization (disambiguate user intent better, shorter queries etc)
*Real time Search (fresher search, search Twitter etc)
*Size (search through more documents, indexing the deep web, other data formats etc)
*Semantics (better understanding of documents, queries, user intent etc)
are all getting a lot of attention & investment.

This post is a classification of classification of Search Engines. Whenever I hear of a new search engine I subconsciously try to classify it based on a set of criteria, and it helps me see it in the context of its neighbors in that multidimensional space :). In this post I wanted to touch upon some of those criteria. Although this is a classification(classification) of Search Engines, I am being intentionally sloppy and have written this mainly from the perspective of the dimensions along which one can innovate in Search. For example, for category 5 (Visualization), one of the classes is the default paradigm of 10 blue links, which I have not bothered to note. The goal here is to look at the search landscape and see the dimensions along which search upstarts are challenging the old guard. Examples are suggestive and not comprehensive.

1. By type of content searched for:
By type here I mean the multimedia content type of the results being returned. Bear in mind that what is actually getting indexed might be text (as is often the case for image search, video search etc).
Audio - Last.fm, Playlist, Pandora
Video - YouTube, Metacafe, Vimeo
Images - Like.com
Web pages - Google, Yahoo, Bing, Ask

2. By Specific Information Need/Purpose:
A Search Engine that solves a specific information need better than a General Purpose Web Search Engine.
Health - WebMD
Shopping - Amazon, eBay, TheFind
Travel - Expedia
Real Estate - Trulia
Entertainment - YouTube

3. Novel Ranking and/or Indexing Methods:
By leveraging features that are not used by current general purpose Web search engines. This is the hardest way to compete with the incumbent Search Engines. Startups need to overcome several disadvantages, in data (queries, clicks etc), index size, spam data etc, to be able to even set up a meaningful comparison with the big guys.
Natural Language Search - Powerset
Semantic Search - Hakia
Scalable Indexing - Cuil
Personalized Search - Kaltix
Real Time Search - Twitter, Tweetnews, Tweetmeme etc
Sometimes this can be in the context of vertical search engines also. For example, when searching for restaurants, a feature like ratings might be useful. It is not cleanly available to general purpose search engines, but it's a feature someone like Yelp might exploit.

4. Searching content that is not crawlable by General Purpose search engines:
Typically in these cases, the service containing the search engine generates its own data, for example YouTube, Twitter, Facebook etc. But sometimes the data might be obtained via an API, as in the case of the variety of Twitter Search Engines.
Videos - YouTube
Status Messages, link sharing - Twitter, Facebook
Data in charts & other parts of the Deep Web - Wolfram Alpha (some of the data seems to have been acquired at some cost)
Job Search - LinkedIn, HotJobs etc

5. Visualization:
Innovating on the search result presentation front.
Grokker (Map View)
Searchme (Cover Flow-like search result presentation)
Clusty (Document clustering by topic)
Snap (Thumbnail previews)
Kosmix (Automatic information aggregation from multiple sources)
Google Squared
Some conversational/dialogue interface based systems could also fall under this category.

6. Regionalization/Localization:
Regionalization/Localization could mean:
a. Better handling of the native language's character set (tokenization, stemming etc). The CJK languages (Chinese, Japanese, Korean) present unique challenges with word segmentation, named entity recognition etc.
b. Capturing any cultural biases
c. Blending in local & global content appropriately for search results. (This was my research project at Yahoo! Search. Will describe this problem in more detail in a subsequent post.) For example, for a query 'buying cars' issued by a UK user, we don't want to show listings of US cars. But if that same user queried for 'neural networks', we don't care if the result is a US website.
Crawling, indexing & ranking need some regionalization/localization and sometimes local search engines can challenge the larger search engines here.
China - Baidu
India - Guruji
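The 'buying cars' vs. 'neural networks' example in point c can be sketched in a few lines. This is only an illustration of the idea and not the approach used at Yahoo! Search, which this post does not describe. The `Result` type, the hand-set `LOCALITY` table and the `max_boost` parameter are all hypothetical; a real system would have to estimate a query's locality sensitivity from data.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    region: str       # region the page is associated with, e.g. "UK" or "US"
    relevance: float  # hypothetical region-agnostic relevance score in [0, 1]


# Hypothetical per-query locality sensitivity in [0, 1], hand-set for illustration.
# 1.0 means "local results matter a lot", 0.0 means "region is irrelevant".
LOCALITY = {"buying cars": 0.9, "neural networks": 0.0}


def blended_score(result: Result, query: str, user_region: str,
                  max_boost: float = 0.5) -> float:
    """Add a boost to results from the user's region, scaled by query locality."""
    sensitivity = LOCALITY.get(query, 0.0)
    boost = max_boost * sensitivity if result.region == user_region else 0.0
    return result.relevance + boost


def blend(results: list[Result], query: str, user_region: str) -> list[Result]:
    """Order results by blended score, best first."""
    return sorted(results,
                  key=lambda r: blended_score(r, query, user_region),
                  reverse=True)


# Made-up candidates: the US page is slightly more relevant in a region-agnostic sense
results = [
    Result("us-site", region="US", relevance=0.8),
    Result("uk-site", region="UK", relevance=0.6),
]
cars_order = [r.url for r in blend(results, "buying cars", user_region="UK")]
nn_order = [r.url for r in blend(results, "neural networks", user_region="UK")]
```

For the UK user, 'buying cars' puts the UK site first because the locality boost outweighs the relevance gap, while 'neural networks' leaves the region-agnostic order untouched.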

Let me know if you see other forms of classification and I'll update the post to reflect them. From a business perspective, I think there are 4 main things to consider while building a new search engine.

A. How likely is it that you can build a disruptive user experience? (Significantly better ranking, user experience etc. than the default Web Search Engine of choice. The delta required is very likely to be much more than our first guess :))
B. How big will the impact be? (% of query stream impacted, $ value of impacted query stream etc)
C. How easy is it to replicate? (Search is not sticky. A simple feature will get copied in no time and leave you in the cold.)
D. How accurate is the binary query classifier the user would need to have (in her mind) to know when to use your search engine? (For a general purpose search engine this is trivial, but for other specific vertical/niche engines this is important.) In English, this would be clarity of purpose.

Category 3 is definitely the playground where the big guys play and, in my opinion, the most exciting. It's where you are exposed to the black swans. Of course reward & risk are generally proportional, and this is also where the riskiest bets are made. My company, Infoaxe, will be entering category 3 in the next 2 months or so. This is a stealth project so I can't talk more about it at this time. I'll post an update once we're live.

[Image courtesy: http://www.webdesignedge.net/our-blog/wp-content/uploads/2009/06/search-engines.jpg]

Saturday, December 15, 2007

What has Search done to the Web?

I came across this interesting video from 1996 where Marc Andreessen is interviewed about Netscape and the future of the Web in general.
It's fascinating to hear Marc's vision of the Internet and the inevitable question of how Netscape sees Microsoft. OK, how does this relate to Web Search? Let me tell you.
When Marc is asked about the impact of the browser on the internet, he explains it as follows (to paraphrase): The browser basically made it easier for more people to view the Web. Only as more people started viewing the Web did it make sense for more people to create Web Pages. He adds a nice analogy here: it's just like how we wouldn't have books if there weren't any readers.
I see Web Search as being fundamentally similar to this. The exposure and access to information that Web Search Engines like Google gave to the Web has definitely fueled the growth of the WWW. Although the impact of Web search on the quantity of Webpages is fairly clear, I am not too sure what its impact on quality is. Then again, quality is too subjective an attribute for the most part.

Powerset (where I currently work) is building a Natural Language Search Engine. It is starting out with a Search Engine for Wikipedia, and then will move to the WWW. On the same note, I think Powerset has the potential to improve the quality of Wikipedia, by offering a better search for it (enabling more people to find what they want, edit what they want etc).
Google and other Web Search Engines have contributed a fair bit to Wikipedia's growth by showing Wikipedia results in the top search results for a lot of queries.
John Battelle's blog post cites Google and Yahoo! as showing Wikipedia results in 27% and 31% of search queries respectively.
It's going to be fun to see how Google's Knol plays into all of this.
