Annota's Dataset

About the Dataset

This dataset was collected by users of the web-based bookmarking service Annota over four years (2012 to 2016). Annota was developed at the Slovak University of Technology in Bratislava as part of the research project TraDiCe (Cognitive Travelling in Digital Space of the Web and Digital Libraries). The system was built by members of the PeWe research group, led by prof. Maria Bielikova.

If you use this dataset, please let us know at annota(at)fiit[dot]stuba[dot]sk and cite the following works:

Holub, M., Moro, R., Sevcech, J., Liptak, M. & Bielikova, M., 2014. Annota: Towards Enriching Scientific Publications with Semantics and User Annotations. D-Lib Magazine, 20(11/12). DOI: 10.1045/november14-holub

BibTeX entry:

    author = {Holub, Michal and Moro, Robert and Sevcech, Jakub and Liptak, Martin and Bielikova, Maria},
    doi = {10.1045/november14-holub},
    issn = {1082-9873},
    journal = {D-Lib Magazine},
    keywords = {annota,annotation,digital library,domain model,navigation,scientific publication,search,semantics,user model},
    number = {11/12},
    title = {{Annota: Towards Enriching Scientific Publications with Semantics and User Annotations}},
    url = {},
    volume = {20},
    year = {2014}

Ševcech, J., Móro, R., Holub, M. & Bieliková, M., 2014. User Annotations as a Context for Related Document Search on the Web and Digital Libraries. Informatica, 38(1), pp.21–30.

BibTeX entry:

    author = {{\v{S}}evcech, Jakub and M{\'{o}}ro, R{\'{o}}bert and Holub, Michal and Bielikov{\'{a}}, M{\'{a}}ria},
    journal = {Informatica},
    keywords = {annotation,query construction,related document,search},
    number = {1},
    pages = {21--30},
    title = {{User Annotations as a Context for Related Document Search on the Web and Digital Libraries}},
    url = {},
    volume = {38},
    year = {2014}

Dataset Overview

The Annota dataset consists of 33 tables stored as CSV files. The principal entities (tables) are documents, papers, and users, each of which has many associated tables (e.g., paper_has_keywords). Further tables store relations among these entities (e.g., user_has_documents, relationships, or references).
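A minimal sketch of loading the tables with Python's standard library. It assumes, without confirmation from the dataset itself, that each table lives in its own comma-separated, UTF-8 file named after the table (e.g., documents.csv) and that the first row holds the column names. The function name load_tables is hypothetical.

```python
import csv
from pathlib import Path


def load_tables(directory):
    """Read every *.csv file in `directory` into a list of row dictionaries.

    Assumes one file per table, a header row, and UTF-8 encoding.
    """
    tables = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" lets the csv module handle line breaks inside quoted fields.
        with path.open(newline="", encoding="utf-8") as handle:
            # The file name without its extension is used as the table name.
            tables[path.stem] = list(csv.DictReader(handle))
    return tables
```

Calling load_tables on the directory that holds the files returns a dictionary keyed by table name. All values are read as strings, so numeric identifiers need to be converted explicitly if required.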

Annota is a bookmarking service. Users store bookmarks (documents), and each stored bookmark is recorded in user_has_documents. A document can be a research paper (stored in papers); every paper is also a document (it has a document_id), but not vice versa.
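The one-way relation between papers and documents can be illustrated with a small sketch. The sample rows below are invented for illustration; only the document_id column is named in the description above, and the id and title columns are assumptions about the schema.

```python
import csv
import io

# Hypothetical sample data; the real column names may differ.
DOCUMENTS_CSV = """id,title
1,A bookmarked blog post
2,A research paper
3,Another web page
"""

PAPERS_CSV = """id,document_id
10,2
"""


def read_rows(text):
    """Parse CSV text with a header row into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def split_documents(documents, papers, key="id"):
    """Split documents into those backed by a paper and plain bookmarks."""
    paper_document_ids = {row["document_id"] for row in papers}
    with_paper = [d for d in documents if d[key] in paper_document_ids]
    without_paper = [d for d in documents if d[key] not in paper_document_ids]
    return with_paper, without_paper


documents = read_rows(DOCUMENTS_CSV)
papers = read_rows(PAPERS_CSV)
paper_docs, other_docs = split_documents(documents, papers)
print(len(paper_docs), "document(s) are papers;", len(other_docs), "are not")
```

Every row in papers points at a document, while most documents have no matching paper row, which is why the lookup runs from papers to documents.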

On top of the basic model, there is Graphene, which aggregates all the relationships in the model and stores them in two tables: graphene_entities and graphene_relations. It is up to you whether you use Graphene or the original underlying entities. Because graphene_entities always keeps the reference to the original entity, every Graphene entity can be traced back to its source.

The data were entered by the users and also parsed automatically from the Web as the users visited web pages and various digital libraries. In some cases the data are incomplete; in others there may be duplicates. Please always perform basic sanity checks when you use the data.
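As a starting point, basic sanity checks might look like the following sketch. It works on rows represented as dictionaries (such as those produced by csv.DictReader). The function names and the example column names (id, url) are hypothetical and not taken from the dataset schema.

```python
from collections import Counter


def find_duplicates(rows, key_fields):
    """Return key tuples that occur more than once, mapped to their counts."""
    counts = Counter(tuple(row.get(f, "") for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def count_missing(rows):
    """Count empty or absent values per column."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if field is None:
                # csv.DictReader stores surplus cells under the key None; skip them.
                continue
            if value is None or value.strip() == "":
                missing[field] += 1
    return dict(missing)


def dangling_references(child_rows, field, parent_ids):
    """Return child rows whose reference does not match any known parent id."""
    return [row for row in child_rows if row.get(field) not in parent_ids]


# Hypothetical rows used only to demonstrate the checks.
rows = [
    {"id": "1", "url": "http://example.org/a"},
    {"id": "2", "url": "http://example.org/a"},
    {"id": "3", "url": ""},
]
print(find_duplicates(rows, ["url"]))
print(count_missing(rows))
```

Which columns identify a duplicate depends on the table, so the key fields are passed in explicitly rather than guessed.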

How to Get the Dataset

If you are interested in the dataset, please send an email to annota(at)fiit[dot]stuba[dot]sk and we will be happy to help.