Ask HN: Is artificial intelligence/natural language processing a futile pursuit?
19 points by fezzl on March 22, 2010 | 24 comments
Hi. I am a sophomore with a start-up that builds sentiment analysis technologies. We currently work on sentiment summarization and classification. Using papers and patents as a starting point, we are trying to create our own algorithm that we hope will outperform state-of-the-art approaches. We have been working on it for the past 2 months. Call it an irrational mindset, but we're adamant about building something that is not only tedious to copy but also fundamentally hard to duplicate.

We have concerns regarding what we are doing, namely: technology and market risks.

1) Technology risk. We are worried that we will hit a dead end and not be able to build what we set out to build. AI is a hard technical problem, and Marc Andreessen has described AI as the "equivalent" of rocket science. We are making progress, but our chances of technologically outperforming Google or other AI-based companies are inherently not good. We fear that we might have picked the wrong beast to mess with, but we're a little too involved right now to switch paths.

2) Market risk. Other than social media monitoring tools, I haven't come across any solution that deals with sentiment analytics. The social media monitoring scene is crowded, so we hope to apply what we build in other unexplored areas. Our vision for now is to tackle the problem of information overload within consumer review platforms: think Amazon reviews, Yelp reviews, IMDb reviews, etc. We hope to offer an on-site (within-the-page-itself) dashboard that categorises raw data by mentioned concepts and their frequencies, the sentiment (positive/negative) toward each concept, and selected quotes that incorporate those aspects, along with sentiment insights in visualised form (e.g. pie charts). End-users can interact with the dashboard and use any variable as an anchor to obtain insights (e.g. give me the raw data that mentions concepts 1 and 2 in a positive light). This, we hope, would help people consume content faster and more representatively, without having to read through every single review from top-left to bottom-right. Hopefully, this would increase user engagement and shorten sales cycles. Naturally, the fear is that nobody wants our product, much less would pay for it. Personally, as an end-user, I would find something like that useful, but I'm obviously biased.
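
To make that concrete, here is a rough sketch of the aggregation and query we have in mind (an illustration only; the data and names are made up, not our actual implementation):

    from collections import defaultdict

    # Toy records as a (hypothetical) extraction stage might emit them:
    # (review_text, [(concept, sentiment), ...])
    reviews = [
        ("Battery life is great but the screen scratches easily.",
         [("battery", "positive"), ("screen", "negative")]),
        ("Love the screen, hate the battery.",
         [("screen", "positive"), ("battery", "negative")]),
    ]

    # Aggregate concept frequencies and per-concept sentiment for the dashboard.
    stats = defaultdict(lambda: {"positive": 0, "negative": 0, "quotes": []})
    for text, mentions in reviews:
        for concept, sentiment in mentions:
            stats[concept][sentiment] += 1
            stats[concept]["quotes"].append(text)

    def query(concepts, sentiment):
        """Reviews that mention all the given concepts with the given polarity."""
        return [text for text, mentions in reviews
                if all((c, sentiment) in mentions for c in concepts)]

    for concept, s in stats.items():
        print(concept, s["positive"], "pos /", s["negative"], "neg")
    print(query(["battery"], "positive"))  # the "anchor" interaction described above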

Looking forward to some input on my current situation.



We got rocket science pretty much right 50 years ago. Don't let the fact that it sounds hard discourage you. Somebody's going to get this right eventually - why not you?

There have been a great many successes in the AI field. It's easy for people to forget though; once something works, nobody calls it AI anymore.


I don't think NLP and AI are dead ends; however, I agree with you that sentiment analysis is a crowded area.

Personally, I find I need sophisticated domain-specific heuristics to evaluate consumer reviews in particular spaces. For instance, when I buy a lens for my camera, I'm going to look at reviews, but it's tricky because there are always good and bad reviews for any lens. Some of the people who leave bad reviews had a camera with a screwed-up AF, or they never really understood how to use the lens or what its limitations were. Then you look at, say, a Sigma lens for the Canon platform, and you'll see that different people are having wildly different results and you'd better just forget about it.

Dashboards and stuff like that are a waste of time. What I really want is something that creates a super-expert opinion that's maybe 3 sentences to a paragraph long. A bit beyond the state of the art.

----

More generally, I think Doug Lenat had the right idea with Cyc, but he went about it the wrong way. Had Doug not been able to make a comfortable living doing work for the government, he would have been forced to produce a revolutionary product, but he wasn't.

I think that the linked data space around the semantic web is going to explode and ultimately produce the "commonsense" knowledgebase that it takes to build real NLP systems.

Most of the people I know in the knowledge-management space are trying to develop expensive projects for government, pharma, legal discovery and such. I think, however, they are in "pay a lot, get a little" businesses that are going to be disrupted by the next wave. On one hand you've got Google, Microsoft and a few biggies that are going to develop large-scale but low-margin products. On the other side, a vast, largely low-margin market of operators who pull from and add to the great linked data pool... which is going to grow like a Katamari ball until we reach the Singularity, maybe around 2025 or so.


Hi,

I have decades of experience with old-style AI and a decade of the new kind :-). I specialize in language understanding algorithms, and near-perfect sentiment analysis is something I expect we'll eventually be able to do using the methods I've invented.

The top-level bit to worry about is whether you are attempting to re-do something that is already known not to work. Litmus tests: Are you using models of language, such as grammars, that are explicitly programmed in? Is your system specific to a single language, so that switching to another would require a complete re-coding? Do you employ linguists? If you answer yes to any of these, then you are in trouble.
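
For contrast, here is a toy sketch of what "language-agnostic" looks like in practice: character n-grams feeding a generic classifier, with no grammar and no linguists. (An illustration of the general idea only, and emphatically not my own methods; the data and labels are made up.)

    # Nothing here is specific to English, so the same code handles any
    # language without re-coding.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["great product, love it", "terrible, broke in a day",
             "produit génial, je l'adore", "affreux, cassé en un jour"]
    labels = ["pos", "neg", "pos", "neg"]

    model = make_pipeline(
        CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        MultinomialNB(),
    )
    model.fit(texts, labels)
    print(model.predict(["love this", "vraiment affreux"]))  # outputs illustrative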

I discussed how to get a modern AI education in an early blog entry at http://monicasmind.com

I propose a shift in direction of AI research in the second video at http://videos.syntience.com and explain why that's needed in the first one. Three more videos discuss details.

I have a theory/motivational site (6 pages or so) at http://artificial-intuition.com

I'm available for high-level consultation on these issues. I worked at Google (I quit in 2006), and although I cannot talk about what they do, I certainly will have an idea about what will be required to outperform them, both short term and long term.

If anyone wants to support Syntience Inc. in our effort to get true understanding to computers, please get in touch.

  - Monica Anderson
  http://syntience.com


As a non-specialist, I enjoyed your blog and site writing, thanks for sharing. In particular, your classification of problems and problem domains is compelling, and another confirmation that reductionist approaches seem to be hitting diminishing returns in understanding the world.


The fundamental problem with AI is the lack of parallelization. There are more connections in the human brain than there are atoms in the universe.

I am sure someone can write an AI program that is comparable to human intelligence on paper at this moment in time. But the complexity of the algorithm would probably be exponential, hence massive parallelization is necessary.

I think the IBM Blue Brain Project has the right approach for AI. http://en.wikipedia.org/wiki/Blue_Brain_Project


Uh. Wouldn't those connections need to be made of atoms?


Good point. Though I'm still sure that he's right in that there are many, many connections in the human brain.


There are more potential connections in between neurons in the human brain than there are atoms in the universe.

Or at least that is the way I heard it.


There are more potential connections between my cat and your dog than there are atoms in the universe. It's meaningless; just exponentiate something big enough and you get over the atom count :-)
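
To put rough numbers on it (commonly cited orders of magnitude, nothing precise): the actual counts in the brain are nowhere near the atom count; only the combinatorial space of possible wirings is.

    import math

    neurons = 10**11   # ~100 billion neurons, rounded
    atoms = 10**80     # observable universe, rough estimate

    pairs = neurons * (neurons - 1) // 2   # possible neuron pairs: ~5e21
    print(pairs < atoms)                   # True: even all possible pairs << atoms

    # Distinct wiring patterns = 2**pairs; compare magnitudes via log10.
    log10_patterns = pairs * math.log10(2)  # ~1.5e21 digits in that number
    print(log10_patterns > 80)              # True: *patterns* dwarf the atom count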


I work on NLP for Chinese text with the Adso project (http://popupchinese.com/tools/downloads). This is a natural language processing engine that handles segmentation, sense analysis and semantic regexp for Chinese text.

In my experience, the complexity of most NLP applications works against them, in the sense that they're competing with simpler approaches that are less computationally intensive. Good technology will let you do things other people can't do, but if you want to make it a business you'll need to know how to compete against Google doing 80% of what you can do with simple pattern matching. Your results will need to be orders of magnitude better before you have something people will use, let alone pay for. And be careful not to get trapped in a field where you'll be competing against companies paying serious cash for commercial databases to which you do not have access.
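
To make "simple pattern matching" concrete, this is roughly the baseline you have to clearly beat before anyone pays for the heavier machinery (a toy sketch with made-up word lists, not Adso code):

    POSITIVE = {"good", "great", "excellent", "love", "sharp"}
    NEGATIVE = {"bad", "poor", "terrible", "hate", "blurry"}

    def lexicon_sentiment(text):
        """Crude polarity by counting lexicon hits; no parsing, no real NLP."""
        words = text.lower().replace(",", " ").split()
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return "positive" if score > 0 else "negative" if score < 0 else "neutral"

    print(lexicon_sentiment("Great lens, sharp images"))    # positive
    print(lexicon_sentiment("Terrible autofocus, blurry"))  # negative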

Incidentally, I'm personally skeptical there is much of a market for sentiment analysis, but the same tools are pretty useful for search (preprocessing, etc.). I think you'll find it difficult to get third-party adoption unless your product drives direct revenue for someone or very visibly improves their product. But the problems are important and worth solving!


1) Trying to solve any hard problem bears the risk of failure. But as you already mentioned, if you succeed, you might have a competitive advantage, because it can't be easily duplicated. I think the risk of Google entering your market affects every internet-related venture. But at the same time, the fear is often unjustified. How often has that happened? Orkut did not kill Facebook, Buzz did not kill Twitter, etc.

2) I would build the technology and a cool showcase, but I would not target the consumer directly; rather, I'd target someone who can use your technology to make money.

Your technology might be interesting for advertisers who want to ensure that ads are only displayed next to articles with a positive sentiment toward the product, e.g. to avoid a Toyota ad next to the latest news on stuck gas pedals. Maybe show a Volvo ad instead. I don't know if something like this is done at present.
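
Roughly, the placement logic could look like the sketch below. classify_sentiment is a trivial stand-in for whatever engine you actually build; every name and word list here is hypothetical.

    NEGATIVE_CUES = {"recall", "stuck", "defect", "lawsuit"}

    def classify_sentiment(article, brand):
        """Stand-in for a real engine: flag a brand mentioned amid bad news."""
        words = set(article.lower().split())
        if brand.lower() in words and words & NEGATIVE_CUES:
            return "negative"
        return "neutral"

    def choose_ad(article, candidates):
        """Return the first ad whose brand the article does not cast negatively."""
        for brand, ad in candidates:
            if classify_sentiment(article, brand) != "negative":
                return ad
        return "house-ad"  # safer than a bad pairing

    article = "toyota announces recall over stuck gas pedals"
    print(choose_ad(article, [("Toyota", "toyota-ad"), ("Volvo", "volvo-ad")]))
    # -> volvo-ad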


Hi danielh,

Yes, there are already advertising networks that do semantic-targeted contextual advertising, e.g. peer39.com, collective.com. Not sure how they are doing though. Either way, it is something that we will look into. Thanks.


Cupitor impossibilium (a desirer of the impossible).

I've been working on alternative pricing algorithms for 8 years. Most people call this a waste of time. Recently, one of my algorithms started showing great promise. (I'll know within two months.) Do what fascinates you.


I think it's quite a leap to go from worrying about the marketability of your specific idea to asking about the futility of two very broad fields of research.

I work in NLP, my company has a sentiment analysis product. It's a very small part of what we do, and it's focused on a particular application. NLP itself has a very wide range of applications, some theoretical, some practical, some already in use all day long at very popular websites. Two months is a very short time for complex problems in NLP, believe me.

Regarding tech risk, I think you're taking your particular product niche and casting it as something unnecessarily broad. Polarity detection is just one possible task in NLP, which is itself a sub-domain of AI. It can even be done reasonably well without using any real NLP techniques (hence the crowded social media monitoring scene). That paragraph made it sound to me like you are stressing yourself out unnecessarily about 'solving AI'. It's sentiment analysis (which can be a challenge to get right); focus on that.

Regarding market risk, marketing new technology is a challenge for any startup, and any given startup will have the same questions that must be asked and answered. The application of your idea may be marketable or it may not be; that has little bearing on the viability of AI/NLP. The best you can do is make sure the company has done due diligence and has a plan, hopefully with some market research to back it up.


I'd start by trying to validate your business model.

1) You need to find out who the paying customers are in this space and what features they really want. It seems very possible that average consumers would have no interest in your service but that power users, marketers or some other segment might. Once you know where the interest is you can make better feature decisions. For example, you might find that reporting, not AI, is what drives marketing sales.

2) Fundamental improvements in AI are really hard. Look at the results of the first Netflix Prize: lots of world-class teams worked on that problem, and the winning solution only produced something like a 10% improvement in the results. If that's the only edge your business has over the competition, I doubt consumers would even notice. On the other hand, AI has matured to the point where it's pretty easy to produce good results. I'd rather bet my business on predictable, good results and treat it as a pleasant surprise if we make a fundamental breakthrough.

Good luck!


"We are worried that we would hit a dead-end and not be able to build what we set out to build."

Listen attentively to your intuition. Your "worry" can stop you faster than any statistic or third-party opinion about whether AI is a hard problem. That doesn't mean you should suppress the worry and press on regardless. Spend time imagining your ideal solution in great detail: touch, smell and taste it if you can. Judge based on how you feel about that end point.

I heard someone say, "AI is what hasn't been done yet." You don't need to prove your worthiness to anyone or any community by "solving a hard problem". Instead, as chasingsparks put it, "do what fascinates you".


Don't worry about the technology risk. People already do this, so it isn't impossible. They probably don't do it well, but you only have to do better than random to provide benefit.

Market risk is more of a concern to me. If you're presenting results directly to users, they'll probably care greatly about the quality of your algorithm. If you're just using aggregated data to drive, for example, marketing campaigns, then the noise will be washed out given enough data.
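
A toy simulation shows why volume washes the noise out: with a classifier that's right only 70% of the time, the observed positive rate settles at a predictable value that you can invert back to the true rate (all numbers illustrative):

    import random
    random.seed(0)

    true_rate, accuracy, n = 0.60, 0.70, 100_000

    def observed_label():
        truth = random.random() < true_rate    # review is genuinely positive?
        correct = random.random() < accuracy   # classifier gets it right?
        return truth if correct else not truth

    obs = sum(observed_label() for _ in range(n)) / n
    est = (obs - (1 - accuracy)) / (2 * accuracy - 1)  # standard noise correction
    print(round(obs, 3), round(est, 3))  # ~0.54 observed, ~0.60 recovered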


No, it's not futile! Finding ways to get good information fast amid the data explosion is one of the challenges we face in computer science. Anyway, the good folks at GATE have been building a great set of tools for NLP for several years. The tools are similar to what you are doing. You are not crazy, and we need more people like you to advance computer science! http://gate.ac.uk/


I'd like to address the market risks:

Did you know that large retail sites increasingly employ people to post unmerited positive and no-merit negative reviews to funnel consumers into a buying decision? And, by economy of scale, they do this in volume, usually massively overwhelming genuine on-site reviews. Working out the impact is left as an exercise for the reader.


I was probably careless when I did my research, but I just found a company that does exactly the type of sentiment analysis on Amazon reviews that I had independently envisioned:

http://techcrunch.com/2008/06/30/pluribo-is-cliffsnotes-for-...

Not sure how to react to this, but... sigh.


Hmm, you should probably look at http://adaptivesemantics.com/. I just heard their founder present at SXSW; he was introduced as a "machine learning guru". They do machine learning for sentiment analysis.


You need a catchy application that lets people understand the benefits of this technique. Why not use it to rank, in real time, what celebrities on Twitter are talking about... or something like that.


Certainly they're not futile to pursue in limited domains.

There's a paper I'm trying to find for you about analyzing affect in news articles for the purpose of trading stocks; I think they got things working fairly well.


This sounds like a really neat product and you sound really smart. I would love to chat with you because I have some (hopefully) unique ideas about how to tackle this problem, and would love to share. My email is zackster@gmåil.com



