Intelligent Speech

A number of speech-to-text services can be accessed online, each demonstrating how effective the technology can be when applied to a particular collection.

  •  Democracy Live (http://www.bbc.co.uk/democracylive)

This BBC News site offers live and on-demand video coverage of the UK’s national political institutions and the European Parliament. Its search engine includes a speech-to-text system built by Autonomy and Blinkx. This enables word-searching across automatically generated transcripts (in English and Welsh), with each result showing a line of text that highlights the search term and a link to the corresponding point in time in the relevant video. The site claims an accuracy rate of above 80% and has been in operation for four years.

This site offers podcasts of public lectures, teaching material, interviews with leading academics and information about applying to the University. The archive has been subject-indexed with keywords generated automatically using CMU Sphinx, a popular open source speech-to-text toolkit. The service was developed by Oxford University Computing Services and the University's Phonetics Laboratory as the JISC-funded SPINDLE project.

ScienceCinema features videos on research from the US Department of Energy and the European Organization for Nuclear Research (CERN). It uses the MAVIS audio indexing and speech recognition technology from Microsoft Research, enabling users to search for specific words and phrases spoken within video files.

Voxalead is a multimedia news test service that searches across freely available web news sites from around the world, bringing together programme descriptions, subtitles and speech-to-text transcripts. It has been developed by Dassault Systèmes as an offshoot of its Exalead search engine. The speech-to-text element uses tools developed by Vocapia Research, which are combined with subtitle search and other metadata harvesting. Outputs such as the map and chart views demonstrate how powerful such technology can be in opening up and visualising audiovisual content.

This innovative demonstration service from BBC Research & Development and BBC World Service is an experiment in how to put large media archives online using a combination of algorithms and people. It includes over 50,000 English-language radio programmes from the World Service radio archive spanning the past 45 years, which have been categorised automatically using the CMU Sphinx open source tool to generate keywords. These are then reviewed and amended through crowdsourcing. The service is available to registered users only.
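The transcript search described for Democracy Live above, where each result shows a line of text with the search term highlighted and a link to a point in time in the video, can be illustrated with a minimal sketch. This is not how any of the services listed here is implemented. The `Segment` structure, the function names and the sample transcript lines are all hypothetical, and the sketch assumes a speech-to-text engine has already produced transcript lines with start times.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    start_seconds: float  # offset of this line within the video (assumed field)
    text: str             # automatically generated transcript line (assumed field)


def search_transcript(segments, term):
    """Return (start_seconds, highlighted_text) for each segment containing term."""
    # Match the term as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for seg in segments:
        if pattern.search(seg.text):
            # Mark every occurrence of the term so it can be highlighted on display.
            highlighted = pattern.sub(lambda m: "**" + m.group(0) + "**", seg.text)
            hits.append((seg.start_seconds, highlighted))
    return hits


def format_timestamp(seconds):
    """Format an offset in seconds as H:MM:SS, suitable for a link into the video."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Invented sample data, for illustration only.
SAMPLE = [
    Segment(12.0, "The committee will now consider the budget."),
    Segment(95.5, "Members raised questions about transport."),
    Segment(3725.0, "The Budget debate resumes after the recess."),
]

if __name__ == "__main__":
    for start, line in search_transcript(SAMPLE, "budget"):
        print(format_timestamp(start), line)
```

Keeping the start time alongside each transcript line is what allows a search hit to point to a moment in the recording rather than to the whole programme.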

Speech-to-text technologies have reached an exciting stage where they are close to being adopted widely for large-scale operations. The recent International Federation of Television Archives (FIAT/IFTA) seminar on ‘Metadata as the Cornerstone of Digital Archive’, held in Hilversum, the Netherlands, in May 2013, showed how many broadcasters are now starting to think seriously about adopting speech-to-text systems to facilitate the subject-indexing of their audiovisual archives. Swiss-Italian broadcaster Radiotelevisione Svizzera has adopted speech-to-text to improve in-house indexing of its programmes, while Belgian broadcaster RTBF demonstrated the hugely impressive GEMS, a prototype for a semantic-based multimedia browser. GEMS links data extracted from both traditional sources and speech-to-text, then combines it with Linked Open Data via a smart graphical user interface to connect the content to external services (such as Wikipedia pages). It demonstrates that speech-to-text applications are not simply about enhanced searching, but about extracting richer information from digital content.

