Into the access logs of a digital library

The uses of Gallica

Valérie Beaudouin, Florence d’Alché-Buc, Adrien Nouvellet, Christophe Prieur, François Roueff
Télécom ParisTech, LTCI, I3

Sciences XXL

Ined

March 2017

Gallica

> 4M documents
(manuscripts, images, magazines, music...)

Bibli-lab

  • Since 2013
  • BnF + Telecom ParisTech social sci. dept.
  • 2014: observing the users of Gallica,
    Valérie Beaudouin, Jérôme Denis
  • 2015: circulation on the web of documents about the Great War (1914-1918)
    Valérie Beaudouin, Zeynep Pehlivan

zoom in ⟷ zoom out
qualitative ⟷ quantitative

interviews
+ video-ethnography
  • 15 users
  • rich and detailed information
  • understanding of users' routines and motivations

questions, hypotheses about what to look for

online survey
  • 7.6k respondents
  • selection bias: respondents have a particular interest in Gallica

log mining
  • ~ 500 M lines / month
  • all users
    specialists, occasional, random...

manually reading logs:
qualitative probes

categories of use
from ↑
actions
among ↑
sessions
from ↑
logs

Logs

raw data
  • anonymized, geotagged IP addresses
  • timestamped queries
  • all pages and documents downloaded
  • robots
  • many design-only files
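
A minimal sketch of the cleaning step implied by the last two bullets: dropping robot traffic and design-only files before any analysis. The talk does not say how either was detected, so the record layout (`url`, `user_agent`), the marker strings and the extension list below are all assumptions chosen for illustration.

```python
from urllib.parse import urlparse

# Hypothetical heuristics: the slides do not describe the actual detection rules.
BOT_MARKERS = ("bot", "crawler", "spider")
DESIGN_EXTENSIONS = (".css", ".js", ".png", ".gif", ".ico", ".woff")


def is_robot(user_agent):
    """Flag a request whose user agent contains a known robot marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def is_design_file(url):
    """Flag a request for a layout asset; the query string is ignored."""
    path = urlparse(url).path.lower()
    return path.endswith(DESIGN_EXTENSIONS)


def keep_request(record):
    """Keep only requests that look like a human consulting content."""
    return not is_robot(record["user_agent"]) and not is_design_file(record["url"])


# Made-up records, not taken from the Gallica logs.
records = [
    {"url": "/document/abc", "user_agent": "Mozilla/5.0"},
    {"url": "/assets/style.css?v=3", "user_agent": "Mozilla/5.0"},
    {"url": "/document/abc", "user_agent": "ExampleBot/1.0"},
]
kept = [r for r in records if keep_request(r)]
```

At roughly 500 M lines per month, a filter like this would run as a streaming pass over the files rather than on an in-memory list.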

Sessions

We define a session with:
session durations

60% of sessions with at most 5 actions
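
The session definition itself is not given in this text. As an illustration only, the sketch below uses a common convention: requests from the same IP address belong to one session until a gap of inactivity exceeds a threshold. The 30-minute threshold, the function names and the sample data are assumptions, not the authors' settings, and each request is counted as one action for simplicity.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed threshold, not from the talk


def sessionize(requests, gap=SESSION_GAP):
    """Group (ip, timestamp) pairs into a list of (ip, [timestamps]) sessions."""
    by_ip = defaultdict(list)
    for ip, ts in requests:
        by_ip[ip].append(ts)

    sessions = []
    for ip, stamps in by_ip.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > gap:
                # Inactivity longer than the gap closes the session.
                sessions.append((ip, current))
                current = [ts]
            else:
                current.append(ts)
        sessions.append((ip, current))
    return sessions


def share_of_short_sessions(sessions, max_actions=5):
    """Fraction of sessions with at most `max_actions` requests."""
    short = sum(1 for _, stamps in sessions if len(stamps) <= max_actions)
    return short / len(sessions)


# Made-up requests for two anonymized addresses.
day = datetime(2016, 6, 3)
requests = [
    ("a", day.replace(hour=17, minute=7)),
    ("a", day.replace(hour=17, minute=8)),
    ("a", day.replace(hour=17, minute=23)),
    ("a", day.replace(hour=18, minute=30)),
    ("b", day.replace(hour=9, minute=0)),
]
sessions = sessionize(requests)
durations = [stamps[-1] - stamps[0] for _, stamps in sessions]
```

Session durations and the share of short sessions both follow directly from this grouping, so the choice of threshold shapes both figures.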

Actions

sequence of actions
on one sample session

Categories of use

80 sample sessions for each category
transition matrix for each category
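
A transition matrix of this kind can be estimated by counting, over the sample sessions of a category, how often one action is followed by another, then normalizing each row. The sketch below assumes each session is already reduced to a list of action labels; the labels (`search`, `view`, `download`) are hypothetical, not the action vocabulary used in the study.

```python
from collections import Counter, defaultdict


def transition_matrix(sequences):
    """Estimate P(next action | current action) from action sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each action with the one that follows it in the session.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1

    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Made-up sessions with hypothetical action labels.
sequences = [
    ["search", "search", "view", "view"],
    ["search", "view", "download"],
]
matrix = transition_matrix(sequences)
```

Computing one such matrix per category makes the categories comparable: a search-dominated category would show a high probability of staying on `search`.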

Zoom in

excerpt from a sequence,
mainly-search category

June 3rd, 2016

[17:07] search "Lebrun Marie Stuart"
[17:07] search "Latouch Marie Stuart"
[17:07] search "Marie Stuart"
[17:07] Marie Stuart par Lebrun
[17:07] Marie Stuart par Schiller
[17:07] Jacques III Stuart
[17:08] search "Marie Stuart sur la scène française"
[17:23] search "Marie Stuart sur la scène française &filter=century all"
[17:27] search "Marie Stuart jeu médiéval"
[17:31] La seine monte