Our aim is to pursue research in the visualization and sonification of large portions of the CBC Newsworld corpus, the collected and digitized 24‐hour air‐check videos from the last 23 years (back to 1989), and, more generally, to enable spoken phrase and keyword search, information seeking, display of search results, and segment review within this corpus.
For the past 73 years, the CBC has given voice to our unique Canadian perspective on the world, producing a phenomenally rich, multimedia record of our social, political and cultural heritage. The CBC Newsworld archives consist of a very large and valuable collection of daily broadcast “air checks” (Newsworld content together with advertisements, etc.) recorded on VHS videotapes and DVDs.
These types of broadcast media were not designed for long‐term preservation or for reuse, so such material is difficult to access and subject to deterioration, as evidenced by major European initiatives for digital preservation of broadcast media and other cultural heritage materials, e.g., The Presto Project and The Digital Preservation Coalition. The need to develop effective tools to interact with and use such large multimedia collections is both an important research problem and a practical concern, as speech applications become ubiquitous and the stores of recorded audio and associated video content grow exponentially. After thousands of years in which written texts have been the primary means of transmitting stories and knowledge across cultures and generations, we are now at the point where spoken language can be recorded and passed on just as easily, by anyone with a computer, a cellphone or a digital recorder. Systems that can process, manage and retrieve spoken language content will be essential in the very near future (Goldman et al., 2005).
This project has two inter‐connected goals. The first is practical: to digitize, visualize, and make available this collection of more than 20 years of Canadian news broadcasts through a state‐of‐the‐art multimedia search and browsing system. This will ensure the preservation and use of this valuable material and open it up as a source of data for researchers in fields as diverse as linguistics, journalism, communication, political science, art, and culture. The second goal is to use this real‐world project as an arena in which to conduct exploratory research and to develop and test new technologies for visualization and for spoken document and video retrieval. This project will leverage the collaborative and multi‐disciplinary approach of GRAND, drawing upon knowledge from archival and information science, computational linguistics, journalism and computer science to develop innovative, open and user‐centred approaches to providing access to digital audio and video content.