Three related factors appear relevant to understanding online behavior: attention (the time that individuals and groups expend), influence (the relationships between ideas, products and behaviors) and affect (the emotions and sentiments that are expressed in relation to ideas and products). Extracting accurate data, analyzing these factors in online behavior, and charting and representing the relationships between them pose a significant challenge. For one thing, these factors need to be related to specific content. Data analytics and visualization tools are needed to represent each factor and to chart these relationships. Very little research to date works across these fields. This large-scale project seeks to shed light on each element of online social media practice and then to draw relationships between these elements.
People perform topic-based content exploration on large-scale social media systems. Such sites continue to expand rapidly. For example, Twitter continues to grow around the globe at a record pace. Just a year ago, it delivered 65 million Tweets a day. Today, its users generate over 200 million Tweets per day. One year ago, there were approximately 150,000 registered Twitter apps. Now, there are more than one million. Facebook has more than 800 million active users, more than 50% of whom log on in any given day; the average user has 130 friends. Seventy-seven percent of active Internet users read blogs. Meanwhile, specialized media companies, brand development agencies and brands have developed social media applications that allow their users to communicate and allow the companies to track the resulting data.
Editorial and business leaders see value in understanding the emotional tone, influences, attention span and diversity of their various sections and offerings, contributors and readers. Attention and influence, for example, directly affect advertisers' interest in an article. In going digital, media publications have added commentary in the form of opinion blogs by their core writers, as well as ample opportunity for readers to vote and comment. A majority of online media now allow readers to express their thoughts and opinions on content through social media commentary. This information can affect advertising sales, decisions on style, design and the relatedness of writers, and even the kind of influence that different sections, authors or columns may have. Editorial leadership is eager to better manage the means for reader commentary. At the same time, it is valuable to understand any underlying patterns that suggest reasons for a specific emotional tone. Discovering sentiments, patterns and relationships embedded in articles as well as comments is important for tracking a newspaper’s role in shaping public opinion on contemporary issues and the ways that readers interact with these opinions. It can help media analysts better understand the impact of sentiments on news events. What is more, new tools can be developed on multiple platforms that allow media users to shape their emotional content, respond to others, and chart the influence of their ideas, media patterns and behaviors.
For almost a decade, contemporary brands have relied on a growing direct dialogue with their consumer base through social media and gamification (direct play as a means of polling). These relationships engender loyalty and provide a rich source of data for understanding and predicting consumer behaviors. Consumer opinion expressed in response to new offerings, system breakdowns, or customer service is of critical importance in a world where viral trends erupt quickly and with significant impact. In a social media era, events and opinion outside of an immediate enterprise can also have a direct impact on it. Marketing and advertising companies analyze consumer attitudes and relationships to brands for trend analysis and product development. The technology of “predictive analytics” is being fine-tuned by digital media and ICT companies with new offerings such as inferSYTEMS. While the technology of monitoring is becoming more sophisticated, the underlying assumptions of analysis have not changed dramatically for many years and continue to rely on twentieth-century psychological models. Brands and media analysis companies seek to bring together social media data with data that tracks consumer behaviors: specifically, consumers' attention to media, products and services, and their consumption patterns.
In some areas, such as healthcare, free-form text is the most common form of valuable data. These data range from doctors’ notes and descriptions of patient histories to healthcare-related messages posted by patients on social media such as blogs, bulletin boards, and discussion forums. Such narrative text contains the most valuable information for physicians to use in their practice and for public and government agencies to use in making healthcare-related decisions. Recently, the New York Times reported on a study by MIT researchers showing that the companies studied that had adopted data-driven decision-making achieved 5-6% higher productivity than those that had not.
Because data are generated continuously and in large volumes, the sheer amount is too great for humans to read and analyze manually. Automatic text analysis tools are greatly needed to discover the information trapped inside free-form texts. For example, a tool that identifies and analyzes healthcare-related posts in social media can detect public opinions, activities and preferences on healthcare-related issues.
Understanding consumer opinion of reliability and service quality across an industry like banking can improve a specific company’s quality of service and can enable an entire industry to improve. Natural language analysis, data mining and information retrieval are key techniques that can be used to build such text analysis tools.
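As a rough illustration of the kind of tool described above, the following minimal sketch filters posts for a topic and tallies a coarse sentiment label for each. It is not the project's method: the keyword lists, the function names and the simple word-counting score are all assumptions made for the example, and a practical tool would rely on far richer natural language analysis.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for illustration.
HEALTH_TERMS = {"doctor", "hospital", "clinic", "medication", "nurse", "treatment"}
POSITIVE = {"good", "great", "helpful", "kind", "better"}
NEGATIVE = {"bad", "slow", "rude", "worse", "painful"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_health_related(tokens):
    """Retrieval step: keep a post if it mentions any topic keyword."""
    return any(token in HEALTH_TERMS for token in tokens)


def sentiment_score(tokens):
    """Count of positive words minus count of negative words."""
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def summarize(posts):
    """Tally sentiment labels over the healthcare-related posts only."""
    counts = Counter()
    for post in posts:
        tokens = tokenize(post)
        if not is_health_related(tokens):
            continue  # off-topic post, ignored
        score = sentiment_score(tokens)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


if __name__ == "__main__":
    sample_posts = [
        "The nurse was kind and helpful",
        "Waited hours at the hospital, staff were rude",
        "Great pizza tonight",
        "New clinic opened downtown",
    ]
    print(summarize(sample_posts))
```

The aggregate counts, rather than any single post, are what a later visualization step would chart.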
It is difficult to discern meaning by extracting information piece by piece. We hypothesize that taking a data-driven design approach to visualizing content would make the aggregate meanings more apparent. One advantage of working with this partially processed data is that issues of confidentiality do not arise, since any confidential or client information has been abstracted from the media. A second advantage is that research can focus on visualization and design issues rather than duplicating commercially available linguistic parsing capabilities.