We currently use a handful of EC2 instances to run the app server, the search server, and the crawling jobs, with load balancers readily available when we need to scale.
Try it out for yourself! Curious about real estate prices in China? The economics of, or outlook for, Netflix? Or maybe you want to read about tech valuations in Knowledge@Wharton, Wharton's online publication. All of it is searchable!
As a former Learnstreamer nicely summed it up:
> "The greatest web framework on Earth! Next to all of the others. Play is our go-to web framework. We feel like it is one of the only web frameworks done right. The JSON and templating libraries are fantastic. And it is built on top of Akka, so it has concurrency in mind from the very beginning. One last mention: the Iteratees library is cool as heck. The learning curve is very manageable. And there is a Java version if you are not interested in working with Scala. I will say, if we weren't using Play we would be using Spray."
Scrapers! Node runs crawling jobs written with PhantomJS and then sends the scraped results to the Play API for ingestion into Elasticsearch.
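The scraper-to-API handoff can be sketched roughly as follows. It is written in Python for brevity rather than Node, and it assumes a hypothetical JSON endpoint (`/api/ingest` on Play's default dev port 9000) and a hypothetical envelope with `source`, `count`, and `records` fields; the post does not document the actual route or payload. The function only builds the POST request, so it can be inspected without a running server.

```python
import json
import urllib.request

# Hypothetical ingest route on the Play app server (not taken from the post).
INGEST_URL = "http://localhost:9000/api/ingest"


def build_ingest_request(source, records, url=INGEST_URL):
    """Wrap a batch of scraped records in a JSON envelope and prepare a POST.

    The request is built but not sent; call urllib.request.urlopen(req) to send it.
    """
    payload = {
        "source": source,       # which scraper produced this batch (hypothetical field)
        "count": len(records),  # lets the API sanity-check the batch size
        "records": records,     # raw scraped fields, passed through untouched
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_ingest_request(
        "example-source",
        [{"title": "Tech valuations", "url": "https://example.com/a"}],
    )
    print(req.get_method(), req.full_url)
```

Keeping the scraper's job limited to "collect and POST" means the Play side stays the single place that talks to Elasticsearch.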
The Elasticsearch use case for Knowledge Now is unique for us in that Elasticsearch is our database of record (DBOR) and our sole data store. Both raw scraped data and normalized data pass through Elasticsearch pipelines before being stored in a display index that serves web queries.
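As a rough illustration of the raw-to-display flow, the sketch below normalizes scraped documents and builds a request body for Elasticsearch's `_bulk` API, which expects newline-delimited JSON: an action line, then the document source, with a trailing newline. The `normalize` rules, the `display` index name, and the use of `url` as the document `_id` are all hypothetical. Normalization happens in application code here purely for illustration; the post describes that processing as Elasticsearch pipelines, which this sketch does not model.

```python
import json


def normalize(raw):
    """Hypothetical normalization: trim and lowercase field names, trim string values."""
    return {
        key.strip().lower(): (val.strip() if isinstance(val, str) else val)
        for key, val in raw.items()
    }


def bulk_body(index, docs, id_field="url"):
    """Build an NDJSON body for the Elasticsearch _bulk API.

    Each document contributes an action line followed by its source line,
    and the whole body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    raw_docs = [{" Title ": " Netflix outlook ", "url": "https://example.com/n"}]
    display_docs = [normalize(d) for d in raw_docs]
    print(bulk_body("display", display_docs), end="")
```

Using a stable field such as the source URL as `_id` makes re-crawls overwrite the previous copy instead of piling up duplicates, which matters when Elasticsearch is the only store.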
Choosing Elasticsearch over Solr (both are built on Lucene) was crucial to our iteration speed: schema changes and enhancements required minimal configuration. That mattered because new sources of scraped data kept introducing new field requirements, so we had to update our schemas on a recurring basis.
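Much of that low-configuration flexibility comes from Elasticsearch's dynamic mapping, which adds unseen fields to an index's mapping as documents arrive. The sketch below approximates the default rules for JSON values (integers to `long`, floating-point numbers to `float`, booleans to `boolean`, objects to `object`, strings to `text`) to preview which fields a newly scraped document would add. It is a simplification, not Elasticsearch's implementation: date detection on strings and the default `.keyword` sub-field are ignored, only top-level fields are checked, and the helper names are hypothetical.

```python
def es_type(value):
    """Approximate Elasticsearch's default dynamic mapping type for a JSON value.

    Returns None where Elasticsearch would add no mapping (null or empty array).
    """
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # Arrays are mapped by their elements; this sketch uses the first one.
        return es_type(value[0]) if value else None
    if isinstance(value, str):
        return "text"  # real ES may detect dates and also adds a .keyword sub-field
    return None


def new_fields(properties, doc):
    """Return {field: guessed_type} for top-level fields missing from a mapping."""
    added = {}
    for field, value in doc.items():
        if field in properties:
            continue
        guessed = es_type(value)
        if guessed is not None:
            added[field] = guessed
    return added


if __name__ == "__main__":
    mapping = {"title": {"type": "text"}, "url": {"type": "keyword"}}
    scraped = {"title": "Netflix outlook", "url": "https://example.com/n", "ticker": "NFLX"}
    print(new_fields(mapping, scraped))
```

Previewing new fields this way can flag when a source's fields would be better served by an explicit mapping than by the dynamic default.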
Clean, easy-to-use selectors and headless browsing. PhantomJS is at the core of every scraper we manage for Wharton.
Once again, a fantastic go-to language for tackling polymorphic pipeline development.