The Digital Humanities Observatory is running a week-long series of workshops on TEI, XSLT, Data Modelling and Data Visualisation from Monday the 13th to Friday the 17th of July. As part of this, I was invited to present a lecture to all the workshop participants. I entitled my talk "Visualisation as an analytical tool, from networks to data streams. 7 Key Challenges we face." Michael Maguire gave a very flattering description of my talk on his blog (thanks).
As I said during my talk, I normally give a talk like this as an introductory session on information visualisation or visual analytics. This time, however, I structured it around what I see as the 7 key challenges that we (or anyone interested in visualising data) face. This blog post summarises the 110+ slides I presented (sans examples and mathematics!).
The ideas I presented are my view on the world of information visualisation and visual analytics. The key challenges were not presented in order of importance (as their relative importance is problem- or domain-dependent). There are also a number of challenges (including multi-device and small-screen visualisation) that I personally feel are crucial, but I realise they are not as pressing as the mainstream issues people face.
My ideas are informed by my ongoing research in InfoVis and by keynotes, lectures, online talks, toolkits and blogs that I've read or seen. Useful (and insightful) sources include the visualizeit blog, the infosthetics blog by Andrew at the University of Sydney, the keynote Peter Eades gave at InfoVis 2006, the keynote Christian Chabot gave at IEEE VAST 2008 and the ideas I could glean online from the VisWeek 2008 Panel on Grand Challenges for Information Visualization. If I've missed anything you feel is important, do let me know!
So the 7 key challenges I see include:
- Empower: We must ensure the person using visualisation to understand data is empowered to gain insight, save time, etc. To achieve this, focus (long and hard) on identifying the questions that you need to answer with your visualisation. Do not just think about the data. If you think you have a tool, method or technique to help empower a person (yourself or another) to gain insight or save time, can you validate this? What validation methods can you employ to ensure you are not just toying with pretty pictures?
- Connect: Based on the question at hand, ensure you help the person using the visualisation build a connection between the data, any processing/analysis, and the visual form presented. The question at hand, and hence the data, drives what is an appropriate visualisation. Also, if you are using a particular visual form (e.g. maps), how far can you stretch the metaphor or connection between data and display before it breaks?
- Volume: If the data needed to help answer the question at hand has many elements, ensure that your visualisation method, tool or technique can support this. Voluminous datasets can break many desktop tools simply due to the time/memory/bandwidth needed to "load" the dataset. There are many sources of data with numerous individual elements to consider. For the 304,059,724 people in the USA (source: US Census Bureau), data on age, gender, ethnicity, household make-up, home structure, income, farms, business and sales is available. In July 2008 Google found 1 trillion (1,000,000,000,000) unique URLs on the web at once, and this number keeps increasing with user-generated and automatically created content. One of our recent studies on extracting social networks from non-social-network data started with 9,468,460 one-way flight passenger records. Clearly there are large datasets one might be faced with. Another problem (often overstated) is the dimensionality of the data (each element having multiple attributes to consider).
- Heterogeneity: If the data needed to help answer the question at hand consists of heterogeneous data from multiple different sources or of “variant types”, ensure that your visualisation method, tool or technique can support this. If you need to consider a heterogeneous data space, then ensure the datasets interlock so that coupled or co-ordinated views are meaningful (and possible to display).
- Audience: Suit the word (display) to the audience. Ensure you match the visualisations to your questions and your audience. Know your user and don’t explore visualisation questions in a bubble. Engage and explore! Some methods, tools and techniques do not suit particular audiences. "You haven’t made impact with visual analytics until you help people with their own data", and I would add to this: "in the particular sociotechnical context where they will use your tools, methods or techniques".
- Dynamism: Data isn’t static. If the data needed to help answer the question at hand comes from a live source, or the display is expected to highlight changes over time, ensure that your visualisation method, tool or technique can support this.
- Discovery: Discover the new world once! Ensure that your tools can capture, store and automate the process of pattern identification for subsequent data exploration. Convert identified patterns into “alerts”, and turn stepwise mining, analysis, query and refinement into workflows.
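To make the Volume point a little more concrete, here is a minimal sketch (not a method from the talk) of one common way around the "load" problem: stream the records one at a time and aggregate them into bins, so memory grows with the number of bins rather than the number of records. It assumes a CSV file with a numeric column; the function names `read_records` and `binned_counts` and the `person_id,age` columns are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO


def read_records(stream: TextIO) -> Iterator[dict]:
    """Yield one CSV row at a time instead of loading the whole file into memory."""
    yield from csv.DictReader(stream)


def binned_counts(records: Iterable[dict], field: str, bin_width: float) -> Counter:
    """Aggregate a numeric field into fixed-width bins in a single pass.

    Memory grows with the number of distinct bins, not the number of records,
    so the aggregate can still be drawn when the raw records would not fit.
    """
    counts: Counter = Counter()
    for row in records:
        value = float(row[field])
        # Lower edge of the bin this value falls into, e.g. 37 -> 30.0 for width 10.
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical tiny sample standing in for a much larger file on disk.
    sample = io.StringIO("person_id,age\n1,23\n2,37\n3,31\n4,68\n5,35\n")
    histogram = binned_counts(read_records(sample), "age", bin_width=10)
    for bin_start in sorted(histogram):
        print(int(bin_start), histogram[bin_start])
```

With a real file you would pass `open(path, newline="")` instead of the `StringIO` sample. The trade-off is that only the aggregate, not the individual elements, reaches the display.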
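For the Discovery point, one possible reading of "convert identified patterns into alerts" is sketched below: a pattern spotted during exploration is saved as a named predicate and re-checked automatically against each newly arriving record. This is an illustrative assumption, not a description of any particular tool. `AlertRule`, `triggered_alerts` and the record fields (`passengers`, `departure_hour`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Record = dict


@dataclass(frozen=True)
class AlertRule:
    """A pattern spotted during exploration, saved so it can be re-checked automatically."""
    name: str
    predicate: Callable[[Record], bool]


def triggered_alerts(rules: Iterable[AlertRule], record: Record) -> List[str]:
    """Return the names of every saved rule that matches a newly arriving record."""
    return [rule.name for rule in rules if rule.predicate(record)]


# Hypothetical patterns found while exploring: large group bookings and overnight departures.
saved_rules = [
    AlertRule("large-group-booking", lambda r: r["passengers"] >= 20),
    AlertRule("overnight-departure", lambda r: r["departure_hour"] < 5),
]

if __name__ == "__main__":
    incoming = [
        {"booking": "A1", "passengers": 2, "departure_hour": 9},
        {"booking": "B7", "passengers": 25, "departure_hour": 3},
    ]
    for record in incoming:
        hits = triggered_alerts(saved_rules, record)
        if hits:
            print(record["booking"], hits)
```

Keeping each rule as data (a name plus a predicate) means the same saved patterns could also be chained into a longer workflow, though how that is done is left open here.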