Here you will find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessor and sister projects. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement about whether a tool is suitable for you.

This list is automatically harvested from the tool producers and providers themselves, and updated daily.

Are you a CLARIAH developer and is your tool not yet included in the index, or do you have questions or comments on the metadata? Please read our contribution guidelines.

  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

Alpino Webservice 2.4

  •   Rijksuniversiteit Groningen (backend), Radboud Universiteit Nijmegen (webservice)
  •   KNAW Humanities Cluster & CLST, Radboud University
Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. You can upload either tokenised or untokenised files (untokenised files are tokenised automatically using ucto). The output is a zip file containing XML files, one for each sentence in the input document.
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
  • dependency parsing
  • folia
  • linguistics
  • nlp
  • syntax
Created: 2015-09-08
Modified: 2023-11-01
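
A minimal sketch of how the zip output described for the Alpino webservice might be unpacked. It assumes each XML file holds one sentence whose leaf nodes carry `word` and `begin` attributes; the in-memory demo archive, the file name `1.xml` and both function names are made up for illustration and are not part of the webservice.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def sentences_from_zip(zip_bytes: bytes) -> dict:
    """Map each XML file name in the archive to the sentence it contains."""
    sentences = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".xml"):
                continue
            root = ET.fromstring(archive.read(name))
            # Assumption: only leaf nodes have a 'word' attribute,
            # and 'begin' gives the token position in the sentence.
            leaves = [n for n in root.iter("node") if "word" in n.attrib]
            leaves.sort(key=lambda n: int(n.get("begin", "0")))
            sentences[name] = " ".join(n.get("word") for n in leaves)
    return sentences


def make_demo_zip() -> bytes:
    """Build a tiny stand-in archive so the sketch runs without the webservice."""
    xml = (
        '<alpino_ds><node rel="top" cat="top" begin="0" end="2">'
        '<node rel="su" pt="vnw" word="Ik" begin="0" end="1"/>'
        '<node rel="hd" pt="ww" word="lees" begin="1" end="2"/>'
        '</node></alpino_ds>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("1.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    print(sentences_from_zip(make_demo_zip()))
```

With a real download, the bytes of the zip file would be passed to `sentences_from_zip` in place of the demo archive.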
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

AlpinoGraph 1.0.5

AlpinoGraph is a tool for searching syntactically annotated corpora. The tool uses AgensGraph, which combines database technology (PostgreSQL) with Cypher, the standard query language for graphs. The search queries you can use in AlpinoGraph are therefore a mix of SQL and Cypher. AlpinoGraph adds several extensions of its own, such as a simple but handy macro system and visualisation of the results.
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
  • Alpino
  • Cypher
  • Dependency parsing
  • SPOD: Syntactic profiler of Dutch
  • UD: Universal Dependencies
Created: 2020-03-25
Modified: 2024-04-24
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Automatic Speech Recognition Service 0.3

An Automatic Speech Recognition Service for a variety of languages, powered by WhisperX.
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
  • clam
  • webservice
  • rest
  • nlp
  • computational_linguistics
Created: 2024-02-16
Modified: 2024-04-12
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

Automatic Transcription of Dutch Speech Recordings 0.6.1

  •   Centre for Language and Speech Technology, Radboud University
This webservice uses automatic speech recognition to transcribe recordings of spoken Dutch. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.
  • Software for humanities
  • Speech Recognizing
  • dutch
  • nlp
  • speech recognition
Created: 2017-04-02
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

FCS Aggregator 0.1

The Aggregator application is part of the CLARIN-FCS common federated content search infrastructure. It serves as a user interface for querying CLARIN resources and displaying search results. The Aggregator communicates with components called endpoints, which are provided as a service by all centres that participate in the federated content search. Each endpoint provides access to one or more searchable resources. The user can select one or more specific resources, based on the resource name or on the language, or search through all of them. The content of these resources is searched with the query supplied to the endpoint. Each endpoint returns its results for the query, and the Aggregator collects the responses from all endpoints and displays them to the user.
  • BlackLab
  • CLARIN
  • corpus search
  • FCS 2.0
  • Federated Content Search
  • Nederlab
Created: 2016-09-11
Modified: 2023-05-10
  • Active: The project has reached a stable, usable state and is being actively developed.
Created: 2014-03-19
Modified: 2024-02-02
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

CLARIAH Tools 1.6.4

This is a web portal where you can find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessor and sister projects. This list is automatically harvested from the tool producers and providers themselves, and updated daily. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement about whether a tool is suitable for you.
  • Browsing
  • Databases for humanities
  • Discovering
  • Exploration
  • Gathering
  • Software for humanities
  • codemeta
  • harvester
  • linked data
  • metadata
  • rdf
  • schema.org
  • software metadata
Created: 2022-01-05
Modified: 2024-06-04
  • Active: The project has reached a stable, usable state and is being actively developed.

RU-Cesar unknown

  •   Erwin Komen
Django web application that communicates with the CorpusStudioWeb back-end 'Crpp'. It has two main purposes: (1) browsing texts and (2) conducting syntactic searches with definable output per hit. Searches are translated to XQuery 'under the hood'.
  • syntax
  • xquery
Created: 2018
  • Active: The project has reached a stable, usable state and is being actively developed.

e-WALD unknown

  •   Erwin Komen
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the regions 'Achterhoek' and 'Liemers'.
  • dialect
  • dictionary
  • dutch
Created: 2019
  • Active: The project has reached a stable, usable state and is being actively developed.

e-WGD unknown

  •   Erwin Komen
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the province 'Gelderland'.
  • dialect
  • dictionary
  • dutch
Created: 2019
  • Active: The project has reached a stable, usable state and is being actively developed.

e-WBD unknown

  •   Erwin Komen
Django web application that facilitates viewing and searching a dictionary of dialects from the Dutch province 'Noord-Brabant' as well as the Belgian provinces of Antwerpen, Vlaams-Brabant and Brussels.
  • dialect
  • dictionary
  • dutch
Created: 2017
  • Active: The project has reached a stable, usable state and is being actively developed.

e-WLD unknown

  •   Erwin Komen
Django web application that facilitates viewing and searching a dictionary of the Dutch Limburgian dialects.
  • dialect
  • dictionary
  • dutch
Created: 2016
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

FLAT: the FoLiA Linguistic Annotation Tool 0.11.4

  •   KNAW Humanities Cluster & CLST, Radboud University
FLAT is a web-based linguistic annotation environment based around the FoLiA format (https://proycon.github.io/folia), a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.
  • Text Processing > Linguistic
  • annotation
  • computational linguistics
  • folia
  • linguistics
  • nlp
Created: 2014-01-02
Modified: 2024-02-07
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

Piereling 0.4

  •   Centre for Language and Speech Technology, Radboud University
  •   KNAW Humanities Cluster & CLST, Radboud University
Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
  • webservice
  • nlp
  • computational_linguistics
  • rest
  • folia
  • conversion
Created: 2019-10-18
Modified: 2023-11-01
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

Frog Webservice 2.7

  •   Centre for Language and Speech Technology, Radboud University and KNAW Humanities Cluster
Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch.
  • Annotating
  • Contextualizing
  • Linguistics
  • Named Entity Recognition
  • POS-Tagging
  • Segmenting
  • Tagging
  • Textual and content analysis
  • Tree-Tagging
  • clam
  • webservice
  • rest
  • nlp
  • computational_linguistics
Created: 2022-02-17
Modified: 2023-12-05
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

Grapheme to Phoneme converter 0.3.4

Grapheme to Phoneme (G2P) conversion. Input is a list of words (UTF-8, one word per line). The G2P will output the best guess for the phonetic transcription of each word. The system is trained on existing dictionaries. Please choose a language option. The system is a demo version; please contact CLST about using G2P for long word lists.
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
  • speech
  • transcription
Created: 2019-02-25
Modified: 2023-05-12
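
A minimal sketch of preparing input in the shape the G2P description asks for (UTF-8, one word per line). The function name is hypothetical, and trimming whitespace and dropping empty entries are assumptions about sensible clean-up, not documented requirements of the service.

```python
def format_word_list(words):
    """Return the words as text with one word per line, skipping blank entries."""
    cleaned = [word.strip() for word in words]
    cleaned = [word for word in cleaned if word]
    return "".join(word + "\n" for word in cleaned)


if __name__ == "__main__":
    text = format_word_list([" fiets ", "", "café"])
    # Encode explicitly as UTF-8 before writing or uploading the list.
    payload = text.encode("utf-8")
    print(payload)
```

The resulting bytes could then be saved to a file and uploaded through the web interface.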
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

GaLAHaD 1.1.0

GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Comparing
  • Computational linguistics and philology
  • Converting
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • Merging
  • POS-Tagging
  • Software for humanities
  • Tagging
  • Textual and linguistic corpora
Created: 2024-05-31
Modified: 2024-06-12
  • https://w3id.org/research-technology-readiness-level#Level8Complete
    Warning: Status is not expressed in a known vocabulary
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

Glem 1.3.1

  •   Faculty of Philosophy, Theology and Religious Studies and Centre for Language and Speech Technology, Radboud University Nijmegen
GLEM is a lemmatizer for Ancient Greek.
  • Annotating
  • Computational linguistics and philology
  • Greek and Latin philology and literature
  • ancient greek
  • greek
  • lemma
  • lemmatisation
  • natural language processing
  • nlp
Created: 2017-04-09
Modified: 2023-10-05
  • Active: The project has reached a stable, usable state and is being actively developed.

I-Analyzer 5.3.0

  •   Research Software Lab, Centre for Digital Humanities, Utrecht University
I-analyzer is a tool for exploring corpora (large collections of texts). You can use I-analyzer to find relevant documents, or to make visualisations to understand broader trends in the corpus. The interface is designed to be accessible for users of all skill levels. I-analyzer is primarily intended for academic research and higher education. We focus on data that is relevant for the humanities, but we are open to datasets that are relevant for other fields.
  • corpus research
  • data visualization
  • elasticsearch
  • natural language processing
  • text-mining
Created: 2016-09-01
Modified: 2023-12-08
  • Active: The project has reached a stable, usable state and is being actively developed.

Golden Agents | lenticularlens.org 1.17

Lenticular Lens is a tool that allows users to construct linksets between entities from different Timbuctoo datasets (so-called data alignment or reconciliation). Lenticular Lens tracks the configuration and the algorithms used in the alignment, and can also report on manual corrections and the amount of manual validation done.
Created: 2019-01-16
Modified: 2022-10-26
  • Active: The project has reached a stable, usable state and is being actively developed.

Lenticular Lens 1.0.0

Lenticular Lens is a tool that allows users to construct linksets between entities from different Timbuctoo datasets (so-called data alignment or reconciliation). Lenticular Lens tracks the configuration and the algorithms used in the alignment, and can also report on manual corrections and the amount of manual validation done.
Created: 2019-01-16
Modified: 2022-10-26
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

CLARIAH Media Suite 6.10

  •   Jaap Blom
The CLARIAH Media Suite is a research environment in which researchers can search, bookmark, annotate and compare items from a number of cultural heritage collections.
  • collection analysis
  • cultural heritage
  • data portal
  • faceted search
  • scholarly annotation
  • virtual workspace
Created: 2023-11-21
Modified: 2023-11-21
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

Network of Terms GraphQL API

GraphQL API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists).
  • Identifying
  • graphql
  • linked-data
  • search
Created: 2020-04-17
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

Network of Terms Reconciliation API

Reconciliation API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists).
  • Identifying
  • graphql
  • linked-data
  • search
Created: 2020-04-17
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

PaQu 1.0.5

With PaQu (Parse & Query) you can search syntactically annotated Dutch-language corpora. PaQu supports two ways of searching. The first, simple way lets you search for word pairs, optionally together with their syntactic relation. The second, more complex way uses the query language XPath. A number of syntactically annotated corpora are available in PaQu by default, but you can also supply your own texts. These texts are then analysed by the automatic parser and stored, after which you can search them in the same way.
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
  • Alpino
  • Dependency parsing
  • SPOD: Syntactic profiler of Dutch
  • UD: Universal Dependencies
  • XPath
Created: 2014-05-21
Modified: 2024-04-24
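
To give a feel for the XPath style of search mentioned in the PaQu description, here is a minimal sketch that runs an XPath expression over a hand-written parse tree. The sample sentence, the function name and the attribute names (`rel`, `cat`, `pt`, `word`) are assumptions modelled on Alpino-style XML. Python's ElementTree supports only a subset of XPath, so the two conditions are written as chained predicates rather than with `and`.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for one parsed sentence ("De kat slaapt").
SAMPLE = """
<alpino_ds>
  <node rel="top" cat="top">
    <node rel="--" cat="smain">
      <node rel="su" cat="np">
        <node rel="det" pt="lid" word="De"/>
        <node rel="hd" pt="n" word="kat"/>
      </node>
      <node rel="hd" pt="ww" word="slaapt"/>
    </node>
  </node>
</alpino_ds>
"""


def find_words(xml_text, xpath):
    """Return the 'word' attribute of every node matched by the XPath expression."""
    root = ET.fromstring(xml_text.strip())
    return [node.get("word") for node in root.findall(xpath)]


if __name__ == "__main__":
    # Nodes that are the head of their phrase and tagged as a noun.
    print(find_words(SAMPLE, ".//node[@rel='hd'][@pt='n']"))
```

The same idea carries over to the tool itself, where the expression is entered in the search interface rather than evaluated locally.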
  • Active: The project has reached a stable, usable state and is being actively developed.

SHEBANQ v4.2z

Search engine for biblical Hebrew based on the Biblia Hebraica Stuttgartensia (Amstelodamensis) database (formerly known as ETCBC, historically known as WIVU).
  • Annotation
  • BHS
  • BHSA
  • Bible
  • Biblia Hebraica
  • Biblia Hebraica Stuttgartensia
  • Biblia Hebraica Stuttgartensia Amstelodamensis
  • Data Science
  • ETCBC
  • Hebrew
  • Hebrew Bible Reader
  • Hebrew Bible Research
  • Hebrew Bible Search
  • Hebrew Online Bible
  • Linguistic Queries
  • Online Bible Hebrew
  • Online Hebrew Bible
  • Query
  • Text Database
  • WIVU
Created: 2017-10-19
Modified: 2022-10-12
  • Active: The project has reached a stable, usable state and is being actively developed.

SHEBANQ v4.2z

Exposing the Hebrew Text Database of the ETCBC.
  • annotation
  • etcbc
  • etcbc-data
  • hebrew
  • hebrew-bible
  • search-engine
  • text-fabric
Created: 2017-10-19
Modified: 2022-10-12
  • Active: The project has reached a stable, usable state and is being actively developed.

T-scan 0.9.8

  •   Utrecht University
T-Scan is an analysis tool for Dutch text, mainly focusing on text complexity. It was initially conceptualized by Rogier Kraf and Henk Pander Maat. Rogier Kraf also programmed the first versions. From 2012 on, Henk Pander Maat supervised the development of the extended versions of the tool. These versions were programmed by Maarten van Gompel, Ko van der Sloot, Martijn van der Klis, Sheean Spoel and Luka van der Plas.
  • dutch
  • feature extraction
  • natural language processing
  • nlp
  • readability
Created: 2012-09-12
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

Ucto Webservice 2.5.2

  •   Centre for Language and Speech Technology, Radboud University and KNAW Humanities Cluster
  •   KNAW Humanities Cluster & CLST, Radboud University
Ucto is a Unicode-compliant tokeniser. It takes one or more untokenised texts as input and tokenises them. Several languages are supported, and the software can be extended to other languages.
  • Annotating
  • Linguistics
  • Tagging
  • Textual and content analysis
  • clam
  • webservice
  • rest
  • nlp
  • computational_linguistics
Created: 2022-04-08
Modified: 2024-03-14
  • Active: The project has reached a stable, usable state and is being actively developed.

Service to tokenize, lemmatize, pos-tag and dependency parse using udpipe 4.10

A REST service for an R / udpipe based tokenizer, lemmatizer, pos-tagger and dependency parser. See https://bitbucket.org/fryske-akademy/udpipe for (docker) setup.
Created: 2020-11-18
Modified: 2023-11-26