Gray Box Model

Cloud computing has the advantage of being much more flexible than comparable hardware-based services.  However, cloud services tend to fall behind when it comes to database-intensive applications because of limits on hard drive speed.  Updating data on a hard drive is the bottleneck for most database servers today, since the operation is bounded by how quickly the drive’s read/write head can commit the information to the disk.

MIT’s news article on the work, “Making Cloud Computing More Efficient,” which profiles research led by Barzan Mozafari, explains that “updating data stored on a hard drive is time-consuming, so most database servers will try to postpone that operation as long as they can, instead storing data modifications in the much faster – but volatile – main memory.”
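To make that strategy concrete, here is a minimal Python sketch of a write-back buffer: updates land in fast, volatile memory and are pushed to the slow, durable disk only in batches. The class name and flush threshold are hypothetical, purely for illustration; no real database server works exactly like this.

```python
import json

class WriteBackBuffer:
    """Illustrative write-back buffer: hold updates in fast, volatile
    memory and defer the slow, durable disk write as long as possible."""

    def __init__(self, path, flush_threshold=1000):
        self.path = path                  # slow, durable storage (disk)
        self.flush_threshold = flush_threshold
        self.pending = {}                 # fast, volatile storage (RAM)

    def put(self, key, value):
        # Updating memory is cheap; no disk I/O happens here.
        self.pending[key] = value
        if len(self.pending) >= self.flush_threshold:
            self.flush()                  # amortize disk cost over many writes

    def flush(self):
        # One batched disk write instead of one write per update.
        # If the machine loses power before this runs, the pending
        # updates are lost -- the "volatile" trade-off in the quote.
        with open(self.path, "a") as f:
            for key, value in self.pending.items():
                f.write(json.dumps({key: value}) + "\n")
        self.pending.clear()

buffer = WriteBackBuffer("data.log", flush_threshold=3)
for i in range(5):
    buffer.put(f"row{i}", i * i)          # only one disk flush happens in this loop
buffer.flush()                            # push the remaining updates
```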

At the SIGMOD conference, MIT researchers will present a new system called DBSeer, built around a “gray box model” (one that combines knowledge of a database’s internal workings with patterns learned by observing it), that should help solve this problem.  DBSeer uses machine-learning techniques to predict the resource usage and needs of individual database-driven application servers.  Cloud computing servers are often divided into multiple “virtual machines”: partitions of a server, each allocated a set amount of processing power, memory, and so on.  The hope is that DBSeer can learn a database’s unique needs and idiosyncrasies well enough to predict whether it is viable to allocate additional resources from other partitions to complete a task.  If a virtual machine is sitting idle, DBSeer can assess whether it is prudent for it to stay idle or to spend its allocated resources completing a task on another partition.
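The article does not spell out DBSeer’s models, but the flavor of the approach (learn a statistical mapping from workload features to resource demand, informed by what we know about how the database behaves) can be sketched in a few lines of Python. Everything below, from the feature choice to the made-up training data and the simple linear fit, is an illustrative assumption, not DBSeer’s actual implementation.

```python
import numpy as np

# Hypothetical training data: each row counts the transactions of two
# types a virtual machine handled in a one-minute window, plus a bias term.
#                    [reads, writes, 1]
workload = np.array([
    [200.0, 10.0, 1.0],
    [150.0, 40.0, 1.0],
    [300.0,  5.0, 1.0],
    [ 50.0, 90.0, 1.0],
])
# Observed disk I/O (MB) for those same windows (made-up numbers).
disk_io = np.array([22.0, 47.0, 21.0, 83.0])

# "Black-box" half: fit a statistical model, resource usage ~ workload mix.
coef, *_ = np.linalg.lstsq(workload, disk_io, rcond=None)

# "White-box" half (the gray in gray-box): the features were chosen
# because we know writes dominate disk cost, per the quote above.
def predict_disk_io(reads, writes):
    return coef @ np.array([reads, writes, 1.0])

# Should this idle VM take on a task of 250 reads / 60 writes?
needed = predict_disk_io(250, 60)
print(f"predicted disk I/O: {needed:.1f} MB")  # compare against the VM's allocation
```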

Ultimately, this should allow servers to become much more efficient without further investment in hardware.  The broader trend accompanying Big Data is that computer scientists are questioning whether there are more efficient ways to handle our problems with the hardware we already have.  It is all about maximizing productivity by questioning our own methods, rather than simply investing in more hardware.

Data Driven Media

The uses of Big Data are expanding beyond the technological and business worlds into the realm of entertainment. Addressing this expansion, The New York Times ran an article by David Carr, “Giving Viewers What They Want,” on the growing uses of Big Data within the media industry. There is debate about whether data from Netflix users can reliably predict the success or failure of a new program, but “House of Cards” appears to be the Netflix success story.

Netflix recently used Big Data to analyze information gleaned from its 33 million subscribers worldwide to develop a concept for its new original program “House of Cards.” Based on the data, it combined well-reviewed actors, themes and directors to create a show its viewers would love. “Film and television producers have always used data…, but as a technology company that distributes and now produces content, Netflix has mind-boggling access to consumer sentiment in real time,” Carr writes.
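As a toy illustration of that kind of analysis (emphatically not Netflix’s actual method), one could score a candidate concept by averaging how viewers rated its ingredients in past programs. All names and numbers below are invented.

```python
# Hypothetical viewing history: each record is one viewer rating of a
# past program, tagged with the attributes a new concept might reuse.
viewing_history = [
    {"director": "Fincher", "actor": "Spacey", "genre": "political drama", "rating": 4.5},
    {"director": "Fincher", "actor": "Other",  "genre": "thriller",        "rating": 4.2},
    {"director": "Other",   "actor": "Spacey", "genre": "political drama", "rating": 4.0},
]

def attribute_score(attribute, value):
    # Average rating of every past program sharing this attribute value.
    ratings = [v["rating"] for v in viewing_history if v[attribute] == value]
    return sum(ratings) / len(ratings) if ratings else 0.0

# Score a candidate concept by combining its attributes' track records.
concept = {"director": "Fincher", "actor": "Spacey", "genre": "political drama"}
score = sum(attribute_score(k, v) for k, v in concept.items()) / len(concept)
print(f"predicted appeal: {score:.2f} / 5")
```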

Despite the apparent success, some, including John Landgraf, president of FX, are skeptical that data can indicate the response to innovative programs. Landgraf is quoted: “Data can only tell you what people have liked before.” Alternatively, Rick Smolan, author of “The Human Face of Big Data,” was quoted as saying that knowing what viewers are watching gives Netflix a competitive edge when developing programming.

If Big Data becomes the trend for developing television concepts, we may see a rise in consumer-driven decisions in other media, affecting everything from design to content, writer or medium. This is already happening on a smaller scale, but as the available data grows, so does the possibility for input and innovation.