Personalised and Context-based Access to Corporate
Knowledge: a Multi-modal and Multi-model Solution
Approach for Learning Activities
Victor Manuel García-Barrios
(Institute for Information Systems and Computer Media (IICM) - Faculty of Computer Science
at Graz University of Technology, Graz, Austria
vgarcia@iicm.edu)
Abstract: This paper focuses on two technological aspects. Firstly, the MISTRAL system is
briefly introduced in order to give some insights into one possible way of gaining and
managing multi-modal information that is extracted from a meeting corpus and semantically
enriched. Secondly, a multi-model approach is analysed and explained for the purpose of
enabling different personalised and context-dependent views on the gained information spaces.
As the system consists of several specialised sub-systems, the focus is set on the Semantic
Applications Unit (SemAU), which represents the front-end sub-system to external clients. The
SemAU is responsible for the main user-centred functions of MISTRAL: Search & Retrieval,
Modelling & Adaptation, and Multiple Visualisations.
Keywords: meeting information, multi-modal data, adaptive e-learning, user modelling
Categories: (H.4, J.4, K.3, I.5)
1 Introduction
In general terms, the global economy and companies world-wide are influenced by
political, economic, social and technological forces. This explains their continuously
changing environment, which in turn leads to a recurrent impact on society. Focusing
on the technological viewpoint, modern companies are becoming more knowledge-intensive and focus their investments on technology-based services. Thus, workers are put under enormous pressure to improve, because of (1) the expected higher performance after introducing e.g. e-learning, (2) the belief that access to digitally shared corporate knowledge will accelerate and enhance the outcome of business processes and decisions, and (3) the hope of finding the key to long-term market advantages when developing new competences through e.g. knowledge management systems. Simplifying, let us state that the primary trait of ‘key-players’
in the business world is ‘talent’, meaning the individual potential for high business productivity based on being either highly experienced or exceptionally gifted. Yet, although most critical decisions and main responsibilities rest with single key-players, the process of finding efficient solutions mostly implies the cooperation and collaboration of many workers. The best and simplest reason for working in groups is the following: two or more pairs of eyes are better than one. And the scenario for
coming together is always the same: a meeting. Face-to-face and virtual meetings
increasingly take place in today’s business processes. As stated in [Romano et al. 01],
managers and knowledge workers spend between 25% and 80% of their working time in meetings. Further, the median number of participants in the analysed meetings was
nine. From our literature survey we identified the following most common meeting
purposes: reconciliation of conflicts, facilitating communication, decision making,
problem solving, learning and training, knowledge exchange, reaching a common
understanding, exploration of new ideas and concepts (see e.g. [Romano et al. 01] and
[Whiteside et al. 88]). Thus, substantial financial and human resources are invested in creating knowledge or transferring it among meeting participants.
Based on these findings, we identify an economisation potential by increasing the efficiency of meetings through improved software methods and support, e.g. meeting
systems, group support systems, e-conference tools, meeting browsers ([Antunes et al.
03], [Lalanne et al. 05]). Further, knowledge addressed and created in meetings
should be preserved and made accessible to all company members (attendees and
absentees!). Indeed, multi-modal meeting recording applications and meeting
information systems are of emerging interest. Thus, it is not surprising to find several
research projects being conducted in this context. [Gütl and García-Barrios 05]
Despite this increasing research activity, our survey has shown that there is still a lack of facilities for integration into knowledge management and e-learning systems, e.g. within the context of life-long learning. This fact motivated us to focus the Semantic Applications Unit (SemAU) of the MISTRAL system on Learning-On-the-Job. SemAU is the main subject of this paper. First, a brief
description of the MISTRAL system is given. Next, aspects and requirements of
SemAU are depicted from the viewpoint of learner roles and needs. Finally, based on
these requirements, the general architecture of SemAU is introduced and explained.
2 The MISTRAL System – A Brief Overview
The need for improved methods for multi-modal information processing and semantic
annotation motivated the Faculty of Computer Science at Graz University of
Technology to initiate the research project MISTRAL. The project aims at enhanced
semi-automatic procedures for semantic annotation and enrichment of multi-modal
data from meeting recordings and meeting-related documents. Further, ‘meeting’ is understood in a broad sense, encompassing various scenarios, e.g. face-to-face or virtual meetings,
workshops or conferences. Thus, the system must be flexible enough to allow the
treatment of different types of multi-modal data. [Gütl and García-Barrios 05]
In order to process these multi-modal data, the system consists of sequentially
ordered ‘conceptual units’ for (1) uni-modal data stream processing, (2) multi-modal
merging of extracted features, (3) semantic enrichment of concepts, and (4) semantic
applications. Thus, the architecture of the MISTRAL system results from the
composition and interaction of its conceptual units (in addition, a benchmarking
framework and a Data Management Unit are also provided). Please refer to [Mistral
06] for detailed information about the MISTRAL research project and system.
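To make the sequential composition of these conceptual units more tangible, the following minimal Python sketch models the processing chain as four functions applied in order. It is purely illustrative: the function names, signatures and toy feature extraction are assumptions made for this sketch and do not reflect the actual MISTRAL implementation.

    # Illustrative sketch of the MISTRAL processing chain; names and logic are
    # assumptions made for illustration, not the actual MISTRAL API.
    from typing import Dict, List

    def unimodal_processing(raw_streams: Dict[str, str]) -> Dict[str, List[str]]:
        # (1) Process each uni-modal stream (video, audio, speech-to-text,
        #     text, sensory) into per-modality features; here simply tokenised.
        return {modality: data.split() for modality, data in raw_streams.items()}

    def multimodal_merging(features: Dict[str, List[str]]) -> List[str]:
        # (2) Merge the extracted uni-modal features into one multi-modal view.
        merged: List[str] = []
        for modality, items in features.items():
            merged.extend(f"{modality}:{item}" for item in items)
        return merged

    def semantic_enrichment(merged: List[str]) -> List[str]:
        # (3) Enrich the merged features with (here: dummy) semantic concepts.
        return [f"{item} -> concept({item.split(':')[-1]})" for item in merged]

    def semantic_applications(enriched: List[str]) -> None:
        # (4) Front-end stage (SemAU): search & retrieval, modelling &
        #     adaptation, and visualisation would operate on this output.
        for annotation in enriched:
            print(annotation)

    if __name__ == "__main__":
        streams = {"audio": "phone rings", "text": "agenda item"}
        semantic_applications(
            semantic_enrichment(multimodal_merging(unimodal_processing(streams))))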
The Uni-modal Unit consists of five interoperable modules: Video, Audio,
Speech-to-Text, Text and Sensory Modules. Respectively, they process image data
(e.g. detection and recognition of persons, movement tracking), sound data (e.g. voice
characteristics, phone ringing), textual transcriptions from talks, text documents (e.g.
key-words or content clusters from agenda, presentation slides or lecture notes), and
multi-modal sensor data (e.g. interactions with a presentation device like selected or