Searching for Tomorrow

Monday, July 12, 2010

Infoaxe user blog posts. Thanks for the love!

It has been a little over a year since we launched Infoaxe, to make it easy for users to Search their Web History from anywhere, without going through the hassle of bookmarking & tagging.

We wanted to give a shout-out to some of the blog posts that you, our users, wrote about Infoaxe and how you have been finding the service helpful. Thanks for recommending us to your friends & readers. We have some exciting new features lined up this year, which we hope you enjoy and which let you get even more out of Infoaxe.

Read the entire post on the Infoaxe Blog here.

Tuesday, December 29, 2009

Infoaxe Real-time Search

At Infoaxe, we launched our new Real-time Search Engine at the Real-time CrunchUp organized by TechCrunch in San Francisco. TechCrunch, VentureBeat & GigaOm covered our launch. Many thanks to Leena, Kim & Liz! The NYTimes & CNN also picked up the story which was exciting.

With Infoaxe's Real Time Search you ask the question, 'What's popular now for X?' (where X is your search query).

For example, if you search for 'iphone review', Google shows a review of the first generation iPhone from 2007, which is irrelevant now. Infoaxe, on the other hand, shows a review of the iPhone 3GS, which is what is relevant NOW for such a query.

Infoaxe's real-time search engine works by analyzing the aggregate attention data collected by our Web History Search Engine with over 2.5 million users. We know what the world is looking at NOW and leverage that data to figure out the most timely and relevant results for queries. Infoaxe's ranking algorithms use signals derived from this aggregate browsing data to provide a real-time view of the Web for searchers. Instead of merely sorting results by time, Infoaxe's algorithms use freshness as a signal alongside several other relevance signals to provide relevant results. We think the best result for a query is one that is as fresh as possible but not fresher ;P. We think Einstein would agree ;).
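To make the "freshness as one signal among several" idea concrete, here is a minimal sketch. It is not Infoaxe's actual ranking function: the `Result` fields, the weights, the 30-day half-life and the example.com URLs are all assumptions made up for illustration. It only shows how blending signals differs from simply sorting by time.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    relevance: float  # hypothetical topical-match score in [0, 1]
    attention: float  # hypothetical "being looked at now" signal in [0, 1]
    age_days: float   # days since the page was published


def score(result, half_life_days=30.0, w_rel=0.5, w_att=0.3, w_fresh=0.2):
    """Blend relevance, attention and freshness (all weights are assumptions)."""
    # Freshness halves every `half_life_days`, so old pages fade smoothly.
    freshness = 0.5 ** (result.age_days / half_life_days)
    return w_rel * result.relevance + w_att * result.attention + w_fresh * freshness


def rank(results):
    """Order results by blended score, best first."""
    return sorted(results, key=score, reverse=True)


candidates = [
    Result("https://example.com/iphone-2007-review", 0.90, 0.10, 900),
    Result("https://example.com/iphone-3gs-review", 0.85, 0.80, 20),
    Result("https://example.com/unrelated-breaking-news", 0.20, 0.10, 0),
]
ranked_urls = [r.url for r in rank(candidates)]
print(ranked_urls)
```

With these made-up numbers the 3GS review comes first, and the brand new but off-topic page comes last even though it is the freshest, which is the difference from a pure sort by time.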

Infoaxe does particularly well for queries relating to shopping, deals, movies/sitcoms/ebooks etc. Check it out here and let us know what you think! This is just our first step out the door. We are constantly tuning our ranking and indexing algorithms, so expect search quality to keep improving!

Sunday, August 30, 2009

Reversible Computing: Sometimes the best step forward is backward

I have been fascinated with the idea of Reversible Computing after being introduced to it recently while reading Ray Kurzweil's book on the Singularity. This post is a quick primer on the subject for folks discovering this late (like me). One of the exciting reasons for pursuing reversible computing is that it offers a way to build extremely energy-efficient computers. As per the von Neumann-Landauer limit, every irreversible bit operation releases at least kT ln 2 of energy (k being Boltzmann's constant & T being the temperature). The key idea here is to not destroy input states but simply move them around, so that no additional information/energy is released to the environment.
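To get a feel for how small kT ln 2 is, here is a quick back-of-the-envelope calculation. The 300 K room temperature is my assumption; the value of Boltzmann's constant is the exact SI one.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in J/K (exact SI value)


def landauer_limit_joules(temperature_kelvin):
    """Minimum energy released when one bit is irreversibly erased: kT ln 2."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)


# Assumed room temperature of 300 K.
room_temp_energy = landauer_limit_joules(300.0)
print(f"{room_temp_energy:.2e} J per erased bit at 300 K")
```

That works out to roughly 2.9e-21 joules per erased bit, which is why the limit only starts to matter once you count the enormous number of bit operations a computer performs.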

A necessary condition for reversibility is that the transition function mapping states from input to output should be one-to-one. This makes sense since if the function were many-to-one, it would not be possible to recreate state t from state t+1. It's easy to see that the AND and OR gates are not reversible (for OR, the inputs 01, 10 and 11 all map to 1). In a normal gate, input states are lost since there is less information in the output than in the input, and this lost information is released as heat. Since charges are grounded and flow away, energy is lost. In a reversible gate, input information is not lost and thus energy is not released.
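The one-to-one condition is easy to check by brute force. A small sketch (the helper name `is_reversible` is mine, purely for illustration): enumerate every input and see whether any two of them collide on the same output.

```python
from itertools import product


def is_reversible(gate, n_inputs):
    """True if no two distinct inputs map to the same output (one-to-one)."""
    outputs = [gate(*bits) for bits in product((0, 1), repeat=n_inputs)]
    return len(set(outputs)) == len(outputs)


def and_gate(a, b):
    return (a & b,)


def or_gate(a, b):
    return (a | b,)


def not_gate(a):
    return (1 - a,)


for name, gate, n in [("AND", and_gate, 2), ("OR", or_gate, 2), ("NOT", not_gate, 1)]:
    print(name, "reversible" if is_reversible(gate, n) else "not reversible")
```

AND and OR squeeze four input states into two output values, so they fail the check; NOT maps two states onto two states and passes.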

The NOT gate is reversible and Landauer (IBM) showed that the NOT operation could be performed without putting energy in or taking heat out. Think about the implications of this for a second. But the NOT gate is of course not universal.

The Fredkin & Toffoli gates are examples of reversible gates that are also universal.

The Fredkin gate is a simple controlled swap gate, i.e. if one of the bits (the control bit) is 1, the other 2 input bits are swapped in the output. The state of the control bit is preserved.

The logic function is,
For inputs A, B, C and outputs P, Q, R
P=A
Q=B XOR Swap
R=C XOR Swap
Swap=(B XOR C) AND A

|A|B|C|P|Q|R|
|1|0|0|1|0|0|
|1|0|1|1|1|0|
|1|1|0|1|0|1|
|1|1|1|1|1|1|
|0|0|0|0|0|0|
|0|0|1|0|0|1|
|0|1|0|0|1|0|
|0|1|1|0|1|1|

Points to note from the truth table,
1. The number of 0s and 1s is preserved from input to output, showing how the model is not wasteful.

2. If you feed the output back in as input, you get the input state that created it (trivially so for the last 4 rows, where nothing is swapped).
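Both points can be verified mechanically. Here is a short sketch that implements the logic function given above (P, Q, R and Swap exactly as defined there; the function names are mine) and checks the two properties, plus the one-to-one condition, over all 8 inputs.

```python
from itertools import product


def fredkin(a, b, c):
    """Controlled swap: if control bit a is 1, b and c trade places."""
    swap = (b ^ c) & a            # Swap = (B XOR C) AND A
    return a, b ^ swap, c ^ swap  # P = A, Q = B XOR Swap, R = C XOR Swap


def check_properties():
    inputs = list(product((0, 1), repeat=3))
    outputs = [fredkin(*bits) for bits in inputs]
    # Point 1: the number of 1s (and hence 0s) is the same in and out.
    ones_conserved = all(sum(i) == sum(o) for i, o in zip(inputs, outputs))
    # Point 2: feeding the output back in returns the original input.
    self_inverse = all(fredkin(*o) == i for i, o in zip(inputs, outputs))
    # Reversibility: all 8 outputs are distinct.
    one_to_one = len(set(outputs)) == len(inputs)
    return ones_conserved, self_inverse, one_to_one


print(check_properties())  # (True, True, True)
```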

So why is reversible computing interesting?
Current computing paradigms rely on irreversible computing where we destroy the input states as we move to subsequent states, storing only the intermediate results that are needed. When we selectively erase input information, energy is released as heat to the surrounding environment thus increasing its entropy. With reversible computing the input bit stays in the computer but just changes location (see truth table above), hence releasing no heat into the environment and requiring no energy from the outside environment.

Some caveats as pointed out by Ray,
1. Even though in theory energy might not be required for computation, we will still need energy for transmitting the results of the computation, which is an irreversible process.

2. Logic operations have an inherent error rate. Standard error detection and correction codes are irreversible and these will dissipate energy.

Even if we can't get to the ideal theoretical limit of reversible computing in terms of energy efficient processing, we can get close which should be extremely exciting!

[Image courtesy:http://bit-player.org/wp-content/Fredkingate.png,
http://www.ocam.cl/blog/wp-content/uploads/2007/11/computer_cooling.jpg]

Saturday, June 27, 2009

A (Classification(Classification(Search Engines)))

Innovation in Search is far from asymptoting. I think we are going to see a lot of exciting next steps in Web Search in the coming years. There is a series of bets getting made on what the next disruptive step would be.

*User generated content (data generated by twitter, facebook, delicious, youtube etc)
*Personalization (disambiguate user intent better, shorter queries etc)
*Real time Search (fresher search, search twitter etc)
*Size (search through more documents, indexing the deep web, other data formats etc)
*Semantics (better understanding of documents, queries, user intent etc)
are all getting a lot of attention & investment.

This post is a classification of classifications of Search Engines. Whenever I hear of a new search engine I subconsciously try to classify it based on a set of criteria, and it helps me see it in the context of its neighbors in that multidimensional space :). In this post I wanted to touch upon some of those criteria. Although this is a classification(classification) of Search Engines, I am being intentionally sloppy and have written this mainly from the context of the dimensions along which one can innovate in Search. For example, for category 5 (visualization), one of the classes is the default paradigm of 10 blue links, which I have not bothered to note. The goal here is to look at the search landscape and see the dimensions along which search upstarts are challenging the old guard. Examples are suggestive and not comprehensive.

1. By type of content searched for:
By type here I mean the multimedia content type of the results being returned. Bear in mind that what is actually getting indexed might be text (as is often the case for image search, video search etc).
Audio - last.fm, playlist, pandora
Video - youtube, metacafe, vimeo
Images - Like.com
Web pages - Google, Yahoo, Bing, Ask

2. By Specific Information Need/Purpose:
A Search Engine that solves a specific information need better than a General Purpose Web Search Engine.
Health - WebMD
Shopping - Amazon, Ebay, thefind
Travel - Expedia
Real Estate - Trulia
Entertainment - Youtube

3. Novel Ranking and/or Indexing Methods:
By leveraging features that are not used by current general purpose Web search engines. This is the hardest way to compete with the incumbent Search Engines. Startups need to overcome several disadvantages to be able to even set up a meaningful comparison with the big guys: disadvantages in data (queries, clicks etc), index size, spam data etc.
Natural Language Search - Powerset
Semantic Search - Hakia
Scalable Indexing - Cuil
Personalized Search - Kaltix
Real Time Search - Twitter, Tweetnews, Tweetmeme etc
Sometimes this can be in the context of vertical search engines also. For example, when searching for restaurants, a feature like ratings might be useful. It is not cleanly available to general purpose search engines, but it's a feature someone like Yelp might exploit.

4. Searching content that is not crawlable by General Purpose search engines:
Typically in these cases, the service containing the search engine generates its own data, for example Youtube, Twitter, Facebook etc. But sometimes the data might be obtained via an API, as in the case of the variety of Twitter Search Engines.
Videos - Youtube
Status Messages, link sharing - Twitter, Facebook
Data in charts & other parts of the Deep Web - Wolfram Alpha (some of the data seems to have been acquired at some cost)
Job Search - Linkedin, Hotjobs etc

5. Visualization:
Innovating on the search result presentation front.
Grokker (Map View)
Searchme (Coverflow like search result presentation)
Clusty (Document clustering by topic)
Snap (Thumbnail previews)
Kosmix (Automatic information aggregation from multiple sources)
Google Squared
Some conversational/dialogue interface based systems could also fall under this category.

6. Regionalization/Localization:
Regionalization/Localization could mean,
a. Better handling of the native language's character set (tokenization, stemming etc). The CJK languages (Chinese, Japanese, Korean) present unique challenges with word segmentation, named entity recognition etc.
b. Capturing any cultural biases
c. Blending in local & global content appropriately for search results. (This was my research project at Yahoo! Search. Will describe this problem in more detail in a subsequent post). For example, for a query 'buying cars' issued by a UK user, we don't want to show listings of US cars. But if that same user queried for 'neural networks', we don't care if the result is a US website.
Crawling, indexing & ranking need some regionalization/localization and sometimes local search engines can challenge the larger search engines here.
China - Baidu
India - Guruji

Let me know if you see other forms of classification and I'll update the post to reflect them. From a business perspective, I think there are 4 main things to consider while building a new search engine.

A. How likely are you to build a disruptive user experience? (Significantly better ranking, user experience etc. than the default Web Search Engine of choice. The delta needed is very likely to be much more than our first guess :))
B. How big will the impact be? (% of query stream impacted, $ value of impacted query stream etc)
C. How easy is it to replicate? (Search is not sticky. A simple feature will get copied in no time and leave you in the cold.)
D. Accuracy of the binary query classifier the user would need to have (in her mind) to know when to use your search engine (for a general purpose search engine this is trivial but for other specific vertical/niche engines this is important). In English, this would be clarity of purpose.

Category 3 is definitely the playground where the big guys play and, in my opinion, the most exciting. It's where you are exposed to the black swans. Of course reward & risk are generally proportional and this is also where the riskiest bets are made. My company, Infoaxe, will be entering category 3 in the next 2 months or so. This is a stealth project so I can't talk more about it at this time. I'll post an update once we're live.

[Image courtesy: http://www.webdesignedge.net/our-blog/wp-content/uploads/2009/06/search-engines.jpg]

Tuesday, March 24, 2009

Drivers Optional....Safety Mandatory ;)

I was having a conversation recently with Vijay Krishnan, my co-founder at Infoaxe, about autonomous cars. This has always been a pet fancy of mine (and one that began my love affair with A.I.), and I was recently reminded of my strong feelings on the subject after I had an accident on 101 where I ended up totaling my car.

I ended up rear-ending a car that had suddenly stopped due to traffic. I felt that in this case, a human had been placed inside a loop that he did not belong in. Obviously, humans should be removed from all mission-critical loops. (We do this at Infoaxe all the time ;) I believe human intelligence is best applied to creating more & more loops that we can get ourselves out of completely. It is the topic of another post as to whether it's possible to get ourselves out of that meta-loop altogether ;P) In this case, the car should have had sensors in the front constantly tracking the distance to the car in front, with appropriate visual warnings when there is a violation. Most importantly, the car should automatically apply the brakes when the distance between the cars is closing at an abnormal speed. Never ever rely on humans to react in time!
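For the curious, the warn-then-brake logic I have in mind can be sketched with a simple time-to-collision check. This is a toy illustration and not how any real car does it: the function names and the 3 second / 1.5 second thresholds are numbers I made up.

```python
def time_to_collision(gap_m, closing_speed_mps):
    """Seconds until impact if nothing changes; infinite if the gap is not shrinking."""
    if closing_speed_mps <= 0:
        return float("inf")
    return gap_m / closing_speed_mps


def brake_decision(gap_m, closing_speed_mps, warn_ttc_s=3.0, brake_ttc_s=1.5):
    """Return 'brake', 'warn' or 'ok' (both thresholds are hypothetical)."""
    ttc = time_to_collision(gap_m, closing_speed_mps)
    if ttc <= brake_ttc_s:
        return "brake"  # too late to wait for the human
    if ttc <= warn_ttc_s:
        return "warn"   # visual warning first
    return "ok"


print(brake_decision(gap_m=50, closing_speed_mps=10))  # 5.0 s to impact
print(brake_decision(gap_m=25, closing_speed_mps=10))  # 2.5 s to impact
print(brake_decision(gap_m=10, closing_speed_mps=10))  # 1.0 s to impact
```

The three calls land in the "ok", "warn" and "brake" bands respectively.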

I have been really impressed with progress that folks have been making with autonomous driving especially with the Darpa Grand Challenge. There have been 3 of them so far, with the last event in '07. Check out the video below for some early practice runs by Junior of the Stanford Racing Team.

As Thrun remarks in the video, ~42,000 people die each year in the US due to car accidents, and most of these are due to human error.

For the geeks, Junior perceives the environment through a combination of Lidars & video cameras. Here's an article from CNET with some more photos under the hood of Junior. And one more here with some fun facts about the plethora of sensors used.

Saturday, November 8, 2008

Infoaxe - A Search Engine for your Web Memory

Vijay Krishnan and I have been working on Infoaxe for a while now and we are happy to be releasing an alpha version of our search engine to the public.

Watch the quick Infoaxe Demo Video below,

Infoaxe (http://www.infoaxe.com) is a Personal Browsing History Search Engine. With Infoaxe, every page that you visit on the Internet gets added to a collection called your Personal Web Memory, and Infoaxe makes this collection searchable across all the computers you use. Thanks to Infoaxe, there is no need to ever bookmark a page again. It makes getting back to web pages seen in the past (like videos, news articles etc) extremely fast and easy. Infoaxe also lets you 'pivot' around web pages seen in the past to see other pages that you visited at the same time. Tagging and sharing pages from your web memory with your friends is also very convenient with Infoaxe.

We have been receiving rave reviews from our existing users and hope you find Infoaxe compelling as well and join the growing number of happy Infoaxe users!

Why Infoaxe?
The Infoaxe story began when Vijay and I were graduate students in Computer Science at Stanford, as a result of the increasing problem we faced, of being able to keep track of interesting and useful information on the Web. The Web is growing rapidly and it is rapidly outgrowing the tools we are using to keep track of the Web (bookmarks, emailing links to yourself etc). Infoaxe is the next big step in this regard. We wanted to make it extremely easy for web users to keep track of their personal slice of the Web.

What is Infoaxe?
Infoaxe is a search engine for your web memory.
Every page that you see on the Web gets added to your personal Web Memory and is now searchable.
Your Web memory is private to you and portable (can be accessed across any computer that you use).
You never have to bookmark a page again!
(everything gets implicitly bookmarked and becomes searchable)

What should I do to get started?
Very simple. Sign up for Infoaxe and install the Infoaxe toolbar. The toolbar sends the URLs to be added to your personal web memory, so it's necessary to install the toolbar on all computers you use.

Go to http://www.infoaxe.com and watch the quick demo video. Check out the FAQ as well, which should answer most of your questions.

Here are some cool things you can do with your Web Memory at your finger tips,

* Your web history synchronized, searchable and portable across all computers and browsers you use. Take it wherever you go!

* Pivot on events: Say you wanted to look at all the websites you looked at when you were researching grad schools many months ago. This sounds almost impossible to accomplish with a general Web Search Engine like Google. The right query is quite hard in this case since there likely isn't one single query which will give you all the pages. You might have looked at other grad schools like MIT, CMU etc, tips for writing good grad school essays etc. Infoaxe helps you here by letting you pivot around a Web page in your Web Memory. Think of this as something like time travel. You can ask Infoaxe to show you all the web pages you were looking at when you were looking at the Stanford University Graduate Admissions home page. We think it's more natural to remember events than dates, and pivot lets you pivot around events in your Web Memory.

* Many of our users tell us that thanks to Infoaxe, their search queries to Google have become a lot shorter. For example, these days to go to the website of the restaurant Siam Royal in Palo Alto, I no longer need to type "siam royal palo alto" into Google. I just type "siam royal". (That's a lie, I actually just type "siam" :) ) With our convenient Google widget, searching on Google.com displays Infoaxe web memory results in the vacant right column of the Google search results page.

* Here is another example, Mary is hunting for apartments in Palo Alto. She has looked at many apartments on craigslist and rent.com. She is finding it impossible to keep track of the ones she liked. Bookmarking seems like a lot of work for so many pages and an overkill since she is sure that after this week she wouldn't really be looking at these apartments again. Mary does not have to bookmark anything. If she wanted to revisit all the apartments she looked at on University Avenue, she could just search infoaxe with the query 'university avenue'.

* Tagging - add labels to saved web pages to help organize them better.
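For the technically inclined, the pivot feature described above boils down to a time-window lookup over your visit history. The sketch below is only an illustration of the idea and not Infoaxe's implementation: the history format (a list of URL and timestamp pairs), the 30 minute window and the example URLs are all assumptions.

```python
from datetime import datetime, timedelta


def pivot(history, pivot_url, window=timedelta(minutes=30)):
    """Return other pages visited within `window` of any visit to `pivot_url`."""
    pivot_times = [t for url, t in history if url == pivot_url]
    nearby = [
        url
        for url, t in history
        if url != pivot_url and any(abs(t - p) <= window for p in pivot_times)
    ]
    # Drop repeat visits while keeping the original visit order.
    return list(dict.fromkeys(nearby))


# Hypothetical web memory: (url, time visited)
history = [
    ("https://example.edu/stanford-grad-admissions", datetime(2008, 1, 10, 20, 0)),
    ("https://example.edu/mit-grad-admissions", datetime(2008, 1, 10, 20, 10)),
    ("https://example.com/essay-tips", datetime(2008, 1, 10, 20, 25)),
    ("https://example.com/pizza-menu", datetime(2008, 3, 2, 12, 0)),
]
related = pivot(history, "https://example.edu/stanford-grad-admissions")
print(related)
```

Here the MIT page and the essay tips come back, while the pizza menu from two months later does not.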

Again, check out our FAQ for answers to more questions.

Infoaxe is still very young and we have many more exciting features in the offing that we will be releasing over the next couple of months. So sign up, download the infoaxe toolbar and we hope you find it useful!

If you like Infoaxe, do tell your friends about it! You can also become a fan of Infoaxe on facebook (search for infoaxe on facebook and join the growing infoaxe community).

The Big Picture: The Infoaxe Vision
In a user's Web Memory there is enormous knowledge and experience, very similar to what you would find with an avid book reader. Infoaxe connects a user with her Web Memory so that she can better reuse and tap into her Web experience. Think of it as your very own Bookcase for the Web where you have access to a copy of every page that you ever saw on the Web.

We also believe that if we connect you to your Web Memory and allow you to derive value from it, there is also an interesting side effect which is that your friends benefit!

Here is an example:
Ann: Hey do you think I should get the iPhone or the Google Phone?
Mark: Hmmm...tough call..but hey I remember this great review that I read a while back which compared the two and gave some great insights..
Ann: Do you think you can dig up that review again?
Mark: No prob...I use Infoaxe!! I can refind it in no time from my Web Memory.

Thursday, October 23, 2008

Meet Elbot

I have always been fascinated by conversational agents, ever since I heard about Eliza in my undergrad A.I class.
Eliza would use this trick of forming questions out of the statements that a human user might pose to her. For example, if you told Eliza, "I work at Infoaxe", Eliza might respond with, "Why do you work at Infoaxe?" (The version of Eliza that I have linked to does not give this sort of a response for the above statement though. Instead she just says, 'Can you please elaborate on that' :))
Eliza was supposed to parody a therapist. (And yes, she is named after Eliza Doolittle from the G.B. Shaw play, Pygmalion).
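The statement-to-question trick is simple enough to sketch in a few lines. To be clear, this is a toy illustration and not the original Eliza script: the two patterns, the pronoun table and the fallback line are my own picks.

```python
import re

# Swap first-person words for second-person ones when echoing the user.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are"}

# (pattern, response template) pairs, tried in order.
RULES = [
    (re.compile(r"i am (.+)", re.IGNORECASE), "Why are you {0}?"),
    (re.compile(r"i (.+)", re.IGNORECASE), "Why do you {0}?"),
]


def reflect(fragment):
    """Flip pronouns so the echoed fragment reads from Eliza's side."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())


def respond(statement):
    """Turn a first-person statement into a question, else ask to elaborate."""
    cleaned = statement.strip().rstrip(".!")
    for pattern, template in RULES:
        match = pattern.fullmatch(cleaned)
        if match:
            return template.format(reflect(match.group(1)))
    return "Can you please elaborate on that?"


print(respond("I work at Infoaxe"))  # Why do you work at Infoaxe?
print(respond("I love my job."))     # Why do you love your job?
print(respond("Hello there"))        # Can you please elaborate on that?
```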

Anyway, I stumbled on to this cute program called Elbot, which is a conversational agent too. Playing with Elbot yields hilarious results. :) She (am gonna assume 'she' :)) apparently won the Loebner prize for chatterbots. Interestingly enough, Elbot fooled 3/12 judges in the first 5 minutes of conversation into thinking she was human! This 25% mark falls just short of what Turing is said to have predicted. Turing had believed that computers would be at ~30% by the end of the century.

Here is an excerpt from my session with Elbot,
(I did notice she had a tendency to digress and throw in machine/A.I related puns and jokes :))

Me: Can I blog about you?
Elbot: That's a very good question. I'll have to think about it for a microsecond....
Me: Do you use Google?
Elbot: I think I can use that. Once I even used electricity.
Me: Do you have a manual that you use?
Elbot: To use or use not, how do you expect artificial intelligence to figure this out?
Me: How do you converse like a human, Elbot?
Elbot: Human beings are amazing creations! More complicated than a Pentium IV processor.
Me: What do you think about the fairness of the Turing Test?
Elbot: There are some people who would not pass the Turing test themselves. Why must they inflict it on computers?

hehe....fair enough ;)