Here you will find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessors and sister projects. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement about whether a tool might be suitable for you.

This list is automatically harvested from the tool producers and providers themselves, and updated daily.

Are you a CLARIAH developer whose tool is not yet included in the index, or do you have questions or comments on the metadata? Please read our contribution guidelines.

Name Version Interface type Description Links Status Maintainer Authors Producer/Provider
Alpino
Alpino 0.0.0
  • Command-line Application
Alpino parser and related tools for Dutch
Category:
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
Alpino
  • Command-line Application
Alpino-Webservice 2.4.1 2024-10-17 17:01:23 +0200
  • Web Application
Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. This is the webservice for it. You can upload either tokenised or untokenised files (the latter will be automatically tokenised for you using ucto). The output consists of a zip file containing XML files, one for each sentence in the input document.
Category:
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
Keywords:
  • dependency parsing
  • folia
  • linguistics
  • nlp
  • syntax
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
Alpino Webservice 2.4.1
  • Web Application
Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. You can upload either tokenised or untokenised files (the latter will be automatically tokenised for you using ucto). The output consists of a zip file containing XML files, one for each sentence in the input document.
AlpinoGraph 1.0.5 2024-04-24
  • Web Application
AlpinoGraph is a tool for searching syntactically annotated corpora. It uses AgensGraph, which combines database technology (PostgreSQL) with Cypher, the standard query language for graphs. The search queries you can use in AlpinoGraph are therefore a mix of SQL and Cypher. AlpinoGraph adds a few extensions of its own, such as a simple but handy macro system and visualisation of the results.
Category:
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
Keywords:
  • Alpino
  • Cypher
  • Dependency parsing
  • SPOD: Syntactic profiler of Dutch
  • UD: Universal Dependencies
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
AlpinoGraph
  • Web Application
alud 2.14.0 2024-04-24
  • Command-line Application
  • Software Library
A Go package for deriving Universal Dependencies from Dutch sentences parsed with Alpino
Category:
  • Linguistics
  • nwo:ComputationalLinguisticsandPhilology
  • Software for humanities
  • Structural Analysis
Keywords:
  • Alpino
  • UD: Universal Dependencies
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
alud
  • Command-line Application
github.com/rug-compling/alud
  • Software Library
analiticcl 0.4.7 2024-10-16 11:59:28 +0200
  • Unknown
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation
Keywords:
  • linguistics
  • nlp
  • spellcheck
  • spelling-correction
  • text-processing
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
AnnoRepo
AnnoRepo 0.6.3 2024-04-03 17:17:29 +0200
  • Web API
Implementation of W3C Web Annotation Protocol (root project)
Keywords:
  • web-annotation
  • web-annotation-protocol
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
annorepo-client 0.1.3 2023-11-29 16:33:58 +0100
  • Command-line Application
A Python client for accessing an AnnoRepo server
  • Planning: The technology is in an initial planning stage (pre-alpha), no implementation is available yet
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
version
  • Command-line Application
asrservice 0.3 2024-04-12 10:39:45 +0200
  • Web Application
An Automatic Speech Recognition Service for a variety of languages, powered by WhisperX
Category:
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
Keywords:
  • clam webservice rest nlp computational_linguistics rest
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

★ ★ ★ ☆ ☆
Automatic Speech Recognition Service 0.3
  • Web Application
asrservice
  • Unknown
auchann 0.2.0 2023-08-21 12:04:40 +0200
  • Command-line Application
The AuChAnn (Automatic CHAT Annotation) package can generate CHAT annotations based on transcript-correction pairs of utterances.
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ☆ ☆
auchann
  • Command-line Application
Automatic Speech Recognition for Dutch 0.6.2
  • Web Application
This is a web-based automatic speech recogniser for Dutch, capable of transcribing Dutch speech recordings using multiple models.
Category:
  • Software for humanities
  • Speech Recognizing
Keywords:
  • dutch
  • nlp
  • speech recognition
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
Automatic Transcription of Dutch Speech Recordings 0.6.1
  • Web Application
This webservice uses automatic speech recognition to provide the transcriptions of recordings spoken in Dutch. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.
Automatic Speech Recognition for Dutch
  • Unknown
Blacklab & Corpus Search
A Blacklab Server CLARIN FCS 2.0 endpoint 0.1 2023-05-10 15:46:27 +0200
  • Web Application
CLARIAH Federated content search corpora, developed by the Dutch Language Institute (INT), is a service to enable searching in multiple Dutch corpora at the same time. This application implements the CLARIN FCS 2.0 specification on top of Dutch language corpora. This repository hosts the source code.
Keywords:
  • BlackLab
  • CLARIN
  • corpus search
  • FCS 2.0
  • Federated Content Search
  • Nederlab
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ★ ☆
Dutch FCS endpoints hosted at INT
  • Unknown
CLARIAH Federated content search backends - instances for several Dutch corpora
FCS Aggregator
  • Web Application
The Aggregator application is a part of the CLARIN-FCS common federated content search infrastructure. It serves as a user interface to perform queries to CLARIN-resources and display search results. The Aggregator communicates with components called endpoints, which are provided as a service by all centres who participate in the federated content search. Each endpoint provides access to one or more searchable resources. The user can select a specific resource or resources, based on the resource name or on the language, or search through all of them. The content of these resources is searched with the query supplied to the endpoint. The endpoint returns results to this query and the aggregator collects the responses from all the endpoints and displays them to the user.
BlackLab Corpus Search 3.0.1 2022-10-06 13:08:42 +0200
  • Unknown
The parent project for BlackLab Core and BlackLab Server.
Keywords:
  • corpus
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
INT Corpus Frontend 3.1.1 2024-02-02 16:25:03 +0300
  • Web Application
A web application to search corpora through the BlackLab Server web service.
Keywords:
  • corpus
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
Brieven als Buit search
  • Web Application
Brieven als Buit, provided by the Dutch Language Institute in Leiden.
Corpus Hedendaags Nederlands
  • Web Application
CHN, provided by the Dutch Language Institute in Leiden.
OpenSoNaR
  • Web Application
OpenSoNaR, provided by the Dutch Language Institute in Leiden.
Broccoli 0.40.2 2024-11-27 11:38:27 +0100
  • Unknown
Da Broker 🥦
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
burgerLinker 0.0.1-SNAPSHOT 2022-09-21 11:03:01 +0200
  • Command-line Application
Command line tool for linking civil registries
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ☆ ☆ ☆ ☆
burgerLinker
  • Command-line Application
CHAMD 0.5.12 2024-03-13 11:22:57 +0100
  • Command-line Application
Conversion and cleaning of CHILDES CHA files into PaQu Plaintext Metadata Format (to convert to Alpino).
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   Jan Odijk
  •   Jan Odijk
  •   Sheean Spoel
  •   Jelte van Boheemen
chamd
  • Command-line Application
CLAM 3.2.10 2024-03-14
  • Command-line Application
  • Server Application
  • Software Library
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice.
Keywords:
  • natural language processing
  • nlp
  • rest
  • webservice
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ★
clamnewproject
  • Command-line Application
Developer tool to create a new CLAM project
clam
  • Software Library
CLAM Data & Client API - programming library for Python
clamservice
  • Server Application
Webservice daemon, the core component of CLAM. May be invoked directly in development, often invoked indirectly via WSGI in production environments.
CLARIAH LD Proxy 1.0-SNAPSHOT 2021-06-28 21:21:07 +0200
  • Unknown
Keep your LD URIs resolvable
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ☆ ☆ ☆ ☆
CLARIAH Tool Discovery
CLARIAH Tool Discovery 1.6.4 2024-06-04 13:07:44
  • Web Application
This is the over-arching project for CLARIAH Tool Discovery. Its components harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process. This project holds the Tool Source Registry, pointing to all the tools that are to be harvested. It also holds the validation schema.
Category:
  • Browsing
  • Databases for humanities
  • Discovering
  • Exploration
  • Gathering
  • Software for humanities
Keywords:
  • codemeta
  • harvester
  • linked data
  • metadata
  • rdf
  • schema.org
  • software metadata
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
CLARIAH Tools
  • Web Application
This is a web portal where you can find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessors and sister projects. This list is automatically harvested from the tool producers and providers themselves, and updated daily. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement about whether a tool might be suitable for you.
codemeta-harvester 0.4.0 2024-06-03 11:36:44
  • Command-line Application
Harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process
Keywords:
  • codemeta
  • harvester
  • linked data
  • metadata
  • rdf
  • schema.org
  • software metadata
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
codemeta-harvester
  • Command-line Application
codemeta-lod-to-cmdi 1.0-SNAPSHOT 2023-05-15 11:54:11 +0200
  • Unknown
CLARIAH Tool Discovery output (LOD -> CMDI conversion)
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ☆ ☆ ☆ ☆
codemeta-server 0.4.1 2023-11-24
  • Server Application
Web API serving codemeta software metadata using codemeta and schema.org, provides a SPARQL endpoint and also offers a human web-interface
Category:
  • Software Development
Keywords:
  • codemeta
  • linked data
  • metadata
  • rdf
  • schema.org
  • scientific
  • software metadata
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
codemeta-server
  • Server Application
codemeta2html 0.1.0 2023-05-15 18:08:47 +0100
  • Command-line Application
  • Software Library
Convert software metadata in codemeta to HTML for visualisation, can generate fully-fledged static sites that serve well as a portal for a collection of software
Category:
  • Software Development
Keywords:
  • codemeta
  • linked data
  • metadata
  • rdf
  • schema.org
  • scientific
  • software metadata
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
codemeta2html
  • Command-line Application
codemeta2html
  • Software Library
CodeMetaPy 2.5.3 2024-06-14 11:33:47 +0200
  • Command-line Application
  • Software Library
Codemetapy is a command-line tool and Python library to work with the codemeta software metadata standard. Codemeta builds upon schema.org and defines a vocabulary for describing software source code. It maps various existing metadata standards to a unified vocabulary. Codemetapy allows you to generate codemeta from various sources.
Category:
  • Computer science
  • Converting
  • Software Development
Keywords:
  • codemeta
  • linked data
  • metadata
  • metadata-extractor
  • rdf
  • schema.org
  • scientific
  • software metadata
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
codemeta
  • Software Library
codemetapy
  • Command-line Application
CMD2RDF 1.0.1 2021-03-07 18:35:08 +0100
  • Unknown
No description provided
Keywords:
  • cmdi
  • linked-data
  • metadata-conversion
  • rdf
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ☆ ☆ ☆ ☆
COBALT unknown 2020-07-17 09:55:43 +0200
  • Unknown
Corpus annotation tool
Keywords:
  • corpus
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

☆ ☆ ☆ ☆ ☆
Colibri Core 2.5.9
  • Command-line Application
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way.
Keywords:
  • language modelling
  • natural language processing
  • ngrams
  • nlp
  • pattern recognition
  • skipgrams
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
colibri-reverseindex
  • Command-line Application
Computes and prints a reverse index of the corpus: for each token position in the corpus, all patterns that start at that position are shown. This is a high-level convenience script over underlying tools.
colibri-classdecode
  • Command-line Application
Decodes a binary encoded corpus and a class file to a plain text corpus
colibri-histogram
  • Command-line Application
Computes a histogram for ngram occurrences (and optionally skipgrams) in the corpus. This is a high-level convenience script over underlying tools.
colibri-ngramstats
  • Command-line Application
Computes a summary report on the count of ngrams (and optionally skipgrams) in the corpus. This is a high-level convenience script over underlying tools.
colibri-patternmodeller
  • Command-line Application
Extract, model and compare recurring patterns (n-grams, skipgrams, flexgrams) and their frequencies in text corpus data. This is the main tool of Colibri Core.
colibri-queryngrams
  • Command-line Application
Interactive command-line tool to query n-grams with their counts from one or more plain-text corpus files. This is a high-level convenience script over underlying tools.
colibri-coverage
  • Command-line Application
Computes the coverage of a training/background corpus on a particular test/foreground corpus, i.e. how many of the patterns in the test corpus were found during training, how many tokens are covered, and how this is all distributed. This is a high-level convenience script over underlying tools.
colibri-classencode
  • Command-line Application
Encodes a plain text corpus to a binary encoded corpus and a class file
colibri-freqlist
  • Command-line Application
Extract n-grams (and optionally skipgrams) with their counts from one or more plain-text corpus files. This is a high-level convenience script over underlying tools.
colibri-cooc
  • Command-line Application
Computes co-occurrence statistics (absolute co-occurrence or pointwise mutual information) between patterns in a corpus
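As a rough illustration of the pointwise mutual information measure named in this description, the sketch below computes PMI from raw counts. It is not Colibri Core's implementation; the function name `pmi` and its parameters are hypothetical and exist only for this example.

```python
import math


def pmi(pair_count: int, x_count: int, y_count: int, total: int) -> float:
    """Pointwise mutual information in bits (hypothetical helper).

    pair_count: number of contexts in which x and y co-occur
    x_count, y_count: number of contexts containing x resp. y
    total: total number of contexts observed
    """
    p_xy = pair_count / total
    p_x = x_count / total
    p_y = y_count / total
    # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
    return math.log2(p_xy / (p_x * p_y))


if __name__ == "__main__":
    # Two patterns that each occur in 2 of 4 contexts and co-occur once
    # are statistically independent, so their PMI is zero.
    print(pmi(pair_count=1, x_count=2, y_count=2, total=4))
```

A positive value means the two patterns co-occur more often than independence would predict; a negative value means less often.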
colibri-ngrams
  • Command-line Application
Extract n-grams of a particular size by moving a sliding window over the corpus. This is a high-level convenience script over underlying tools.
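The sliding-window idea mentioned in this description can be sketched in a few lines of Python. This is only an illustration of the general technique on a whitespace-tokenised string, not the code used by colibri-ngrams; the helper names `ngrams` and `ngram_counts` are made up for this example.

```python
from collections import Counter
from typing import Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every n-gram by moving a window of size n over the tokens."""
    for start in range(len(tokens) - n + 1):
        yield tuple(tokens[start:start + n])


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count how often each n-gram occurs (hypothetical helper)."""
    return Counter(ngrams(tokens, n))


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    for gram, count in ngram_counts(tokens, 2).most_common():
        print(" ".join(gram), count)
```

A sequence shorter than `n` simply yields no n-grams, because the range of start positions is empty.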
colibri-loglikelihood
  • Command-line Application
Compares the frequency of patterns between two or more corpus files (plain text) by computing log likelihood, following the methodology of Rayson and Garside (2000), Comparing corpora using frequency profiling. In proceedings of the workshop on Comparing Corpora, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000). 1-8 October 2000, Hong Kong, pp. 1 - 6: http://www.comp.lancs.ac.uk/~paul/publications/rg_acl2000.pdf. This is a high-level convenience script over underlying tools.
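For a single pattern and two corpora, the log-likelihood comparison can be sketched as follows. The sketch assumes the commonly used formulation of the Rayson and Garside statistic: expected frequencies are derived from the corpus sizes, and the score is twice the sum of observed × ln(observed / expected). The function name and arguments are hypothetical; this is not the colibri-loglikelihood code.

```python
import math


def log_likelihood(freq1: int, freq2: int, size1: int, size2: int) -> float:
    """Log-likelihood (G2) of one pattern's frequency across two corpora.

    freq1, freq2: observed frequency of the pattern in corpus 1 and 2
    size1, size2: total number of tokens in corpus 1 and 2
    """
    combined_rate = (freq1 + freq2) / (size1 + size2)
    expected1 = size1 * combined_rate
    expected2 = size2 * combined_rate

    score = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq1 > 0:
        score += freq1 * math.log(freq1 / expected1)
    if freq2 > 0:
        score += freq2 * math.log(freq2 / expected2)
    return 2 * score


if __name__ == "__main__":
    # Same relative frequency in both corpora: no difference, score is 0.
    print(log_likelihood(10, 10, 1000, 1000))
    # Clearly more frequent in corpus 1: a larger score.
    print(log_likelihood(20, 5, 1000, 1000))
```

Higher scores indicate a larger difference in relative frequency between the two corpora; the score does not say which corpus has the higher frequency, so that has to be checked separately.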
colibri-findpatterns
  • Command-line Application
Find patterns in corpus data based on a presupplied list of patterns (one per line). This is a high-level convenience script over underlying tools.
Corpus Editor for Syntactically Annotated Resources (Cesar) unknown
  • Web Application
Django web application that communicates with the CorpusStudioWeb back-end 'Crpp'. Two main purposes: (1) browse texts, (2) conduct syntactic searches with definable output per hit. Searches are translated to XQuery 'under the hood'.
Keywords:
  • syntax
  • xquery
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
  •   Erwin Komen
  •   Erwin Komen
RU-Cesar
  • Web Application
cow_csvw 1.21 2024-03-08 16:02:10 +0100
  • Command-line Application
Integrated CSV to RDF converter, using CSVW and nanopublications
Keywords:
  • csv
  • csvw
  • rdf
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ★ ☆
cow_tool
  • Command-line Application
cow_tool_cli
  • Command-line Application
DANE
DANE 0.4.3 2024-05-13 10:13:48 +0200
  • Unknown
Utils for working with the Distributed Annotation and Enrichment system
Category:
  • Multimedia > Video
  • Scientific/Engineering > Artificial Intelligence
  • Software Development > Libraries > Python Modules
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

★ ★ ★ ★ ☆
dane-asr-worker 0.1.0
  • Unknown
Automatic speech recognition through an external service. Depends on DANE-server
Category:
  • Multimedia processing
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

★ ☆ ☆ ☆ ☆
  •   Nanne van Noord
  •   Nanne van Noord
  •   Jaap Blom
dane-download-worker 0.9.0
  • Unknown
Basic "DANE worker" that downloads input data via HTTP(s) URLs for further processing by other DANE workers. Depends on DANE-server
Category:
  • Multimedia processing
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

★ ☆ ☆ ☆ ☆
  •   Nanne van Noord
  •   Nanne van Noord
  •   Jaap Blom
DANE-server 0.3.1 2023-06-19 09:07:32 +0200
  • Unknown
Back-end for the Distributed Annotation 'n' Enrichment (DANE) system
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

★ ★ ★ ★ ☆
DANE-server
  • Unknown
dane-workflows 0.9.0
  • Software Library
Python library for setting up simple data processing workflows (using DANE)
Category:
  • Multimedia processing
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

★ ☆ ☆ ☆ ☆
  •   Jaap Blom
  •   Jaap Blom
  •   The Netherlands Institute for Sound and Vision
dane-workflows
  • Software Library
deepfrog 0.2.1 2021-04-11 15:29:58 +0200
  • Unknown
A deep learning NLP suite (PoS, lemmatiser, NER) with FoLiA XML support
Category:
  • science
  • text-processing
Keywords:
  • annotation
  • linguistics
  • nlp
  • text-processing
  • xml
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

★ ★ ★ ☆ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
Dexter v0.15.0 2024-05-22 11:15:55 +0200
  • Unknown
No description provided
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
did-summarizer unknown 2024-02-02 13:28:23 +0100
  • Unknown
Linked Data summarizer driven by Decentralized Identifiers (DIDs)
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

★ ☆ ☆ ☆ ☆
Dutch_FrameNet_Lexicon unknown 2020-07-08 09:32:55 +0200
  • Unknown
No description provided
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

☆ ☆ ☆ ☆ ☆
Electronisch woordenboek van de Achterhoekse en Liemerse dialecten unknown
  • Web Application
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the regions 'Achterhoek' and 'Liemers'
Keywords:
  • dialect
  • dictionary
  • dutch
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
  •   Erwin Komen
  •   Erwin Komen
e-WALD
  • Web Application
Electronisch woordenboek van de Brabantse dialecten unknown
  • Web Application
Django web application that facilitates viewing and searching a dictionary of dialects from the Dutch province 'Noord-Brabant' as well as the Belgian provinces of Antwerpen, Vlaams-Brabant and Brussels
Keywords:
  • dialect
  • dictionary
  • dutch
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
  •   Erwin Komen
  •   Erwin Komen
e-WBD
  • Web Application
Electronisch woordenboek van de Gelderse dialecten unknown
  • Web Application
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the province 'Gelderland'
Keywords:
  • dialect
  • dictionary
  • dutch
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
  •   Erwin Komen
  •   Erwin Komen
e-WGD
  • Web Application
Electronisch woordenboek van de Limburgse dialecten unknown
  • Web Application
Django web application that facilitates viewing and searching a dictionary of the Dutch Limburgian dialects
Keywords:
  • dialect
  • dictionary
  • dutch
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
  •   Erwin Komen
  •   Erwin Komen
e-WLD
  • Web Application
FLAT
FoLiA-Linguistic-Annotation-Tool 0.11.5 2024-07-05 13:27:34 +0200
  • Web Application
FLAT is a web-based linguistic annotation environment based around the FoLiA format (https://proycon.github.io/folia), a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations; a wide variety of linguistic annotation types is supported through the FoLiA paradigm.
Category:
  • Text Processing > Linguistic
Keywords:
  • annotation
  • computational linguistics
  • folia
  • linguistics
  • nlp
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
FLAT: the FoLiA Linguistic Annotation Tool
  • Web Application
FoLiA-Linguistic-Annotation-Tool
  • Unknown
foliadocserve 0.7.8 2024-02-07 16:51:51 +0100
  • Command-line Application
The FoLiA Document Server is a backend HTTP service to interact with documents in the FoLiA format, a rich XML-based format for linguistic annotation (http://proycon.github.io/folia). It provides an interface to efficiently edit FoLiA documents through the FoLiA Query Language (FQL).
Category:
  • Text Processing > Linguistic
Keywords:
  • nlp computational_linguistics rest database document server
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
foliadocserve
  • Command-line Application
FoLiA
folia 0.0.6 2020-11-16 14:24:33 +0100
  • Software Library
High-performance library for handling the FoLiA XML format (Format for Linguistic Annotation)
Category:
  • science
  • text-processing
Keywords:
  • annotation
  • linguistics
  • nlp
  • text-processing
  • xml
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ☆ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
folia
  • Software Library
FoLiA tools 2.5.8 2024-10-17 16:51:49 +0200
  • Command-line Application
FoLiA-tools contains various Python-based command line tools for working with FoLiA XML (Format for Linguistic Annotation)
Category:
  • Annotating
  • https://w3id.org/nwo-research-fields#ComputationalLinguisticsandPhilology
  • Textual and linguistic corpora
Keywords:
  • annotation
  • computational linguistics
  • folia
  • nlp
  • search
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
alpino2folia
  • Command-line Application
conllu2folia
  • Command-line Application
dcoi2folia
  • Command-line Application
folia2annotatedtxt
  • Command-line Application
folia2columns
  • Command-line Application
folia2dcoi
  • Command-line Application
folia2html
  • Command-line Application
folia2rst
  • Command-line Application
folia2salt
  • Command-line Application
folia2stam
  • Command-line Application
folia2txt
  • Command-line Application
foliabench
  • Command-line Application
foliacat
  • Command-line Application
foliacorrect
  • Command-line Application
foliacount
  • Command-line Application
foliaerase
  • Command-line Application
foliaeval
  • Command-line Application
foliafreqlist
  • Command-line Application
foliaid
  • Command-line Application
folialangid
  • Command-line Application
foliamerge
  • Command-line Application
foliaquery
  • Command-line Application
foliaquery1
  • Command-line Application
foliasetdefinition
  • Command-line Application
foliaspec
  • Command-line Application
foliaspec2json
  • Command-line Application
foliaspec2rdf
  • Command-line Application
foliasplit
  • Command-line Application
foliatextcontent
  • Command-line Application
foliatree
  • Command-line Application
foliaupgrade
  • Command-line Application
foliavalidator
  • Command-line Application
rst2folia
  • Command-line Application
tei2folia
  • Command-line Application
transcribedspeech2folia
  • Command-line Application
txt2folia
  • Command-line Application
FoLiApy 2.5.12 2024-10-11 18:28:11 +0200
  • Software Library
An extensive library for processing FoLiA documents. FoLiA stands for Format for Linguistic Annotation and is a very rich XML-based format used by various Natural Language Processing tools.
Category:
  • Annotating
  • Computational linguistics and philology
  • Textual and linguistic corpora
Keywords:
  • annotation
  • computational linguistics
  • folia
  • format
  • nlp
  • xml
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
FoLiApy
  • Software Library
foliautils 0.22
  • Command-line Application
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA).
Keywords:
  • folia
  • linguistic annotation
  • natural language processing
  • nlp
  • xml
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
FoLiA-2text
  • Command-line Application
Convert FoLiA documents into plain text
FoLiA-clean
  • Command-line Application
FoLiA-clean will produce a cleaned up version of a FoLiA file, or a whole directory of FoLiA files, removing specified annotation types and specified text classes
FoLiA-stats
  • Command-line Application
Gather n-gram statistics over a series of FoLiA documents
FoLiA-langcat
  • Command-line Application
Language Identification using textcat.
FoLiA-correct
  • Command-line Application
Correct FoLiA documents using correction candidates generated by TICCL-rank (from ticcltools)
FoLiA-alto
  • Command-line Application
Convert ALTO DIDL files into a series of FoLiA documents
FoLiA-txt
  • Command-line Application
Convert plain text to FoLiA. The output will contain only <p> and <str> nodes. See ucto or rst2folia (FoLiA-tools) for alternatives.
FoLiA-page
  • Command-line Application
Convert PAGE XML to FoLiA
FoLiA-wordtranslate
  • Command-line Application
Simple word-by-word translator on the basis of a dictionary and/or rewrite rules
FoLiA-pm
  • Command-line Application
Convert Political Mashup XML to FoLiA
FoLiA-idf
  • Command-line Application
Count words in a series of FoLiA documents and compute IDF statistics, which are written to a TSV file
FoLiA-hocr
  • Command-line Application
Convert hOCR (as output by Tesseract) to FoLiA
FoLiA-collect
  • Command-line Application
Collect n-gram statistics from tsv files produced by FoLiA-stats, aggregating results.
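The n-gram counting and aggregation that FoLiA-stats and FoLiA-collect describe can be illustrated with a minimal Python sketch. It is not the foliautils implementation: it assumes the documents have already been reduced to plain token lists, and the function names `ngrams`, `count_ngrams` and `aggregate` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    """Yield every run of n consecutive tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def count_ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of one tokenised document."""
    return Counter(ngrams(tokens, n))


def aggregate(counts: Iterable[Counter]) -> Counter:
    """Sum per-document counts into a single frequency table."""
    total: Counter = Counter()
    for c in counts:
        total.update(c)
    return total


# Two toy documents (hypothetical data, already tokenised).
doc_a = "de kat zit op de mat".split()
doc_b = "de kat slaapt".split()

totals = aggregate([count_ngrams(doc_a, 2), count_ngrams(doc_b, 2)])
print(totals[("de", "kat")])  # 2
```

Counting per document first and summing afterwards mirrors the split between a statistics step and a collection step, so the per-document tables can be produced independently.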
libfolia 2.20
  • Command-line Application
  • Software Library
This is a C++ Library for working with the Format for Linguistic Annotation (FoLiA).
Keywords:
  • folia
  • linguistic annotation
  • natural language processing
  • nlp
  • xml
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
folialint
  • Command-line Application
FoLiA validation tool
libfolia
  • Software Library
FoLiA Library with API for C++
piereling 0.4 2023-11-01 11:43:34 +0100
  • Web Application
Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines.
Category:
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
Keywords:
  • webservice
  • nlp
  • computational_linguistics
  • rest
  • folia
  • conversion
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ☆ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
Piereling 0.4
  • Web Application
Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.
piereling
  • Unknown
Forced Alignment 2 0.3.1
  • Web Application
This webservice takes a Dutch (NL) speech recording and its transcription and produces an output file with word alignments.
Keywords:
  • alignment
  • speech recognition
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
ForcedAlignment2 0.3.1
  • Web Application
Forced Alignment of text and audio files
Forced Alignment 2
  • Unknown
Frog
Frog 0.33 2023-12-05 15:43:06 +0100
  • Command-line Application
  • Software Library
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It performs automatic linguistic enrichment such as part of speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis. All NLP modules are based on TiMBL.
Category:
  • Annotating
  • Contextualizing
  • Linguistics
  • Named Entity Recognition
  • POS-Tagging
  • Segmenting
  • Tagging
  • Textual and content analysis
  • Tree-Tagging
Keywords:
  • dependency parsing
  • dutch
  • lemma
  • lemmatisation
  • natural language processing
  • ner
  • nlp
  • parser
  • part-of-speech tagging
  • pos
  • shallow parsing
  • tagger
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ★
mbma
  • Command-line Application
Memory-based Morphological Analysis (standalone)
frog
  • Command-line Application
Command-line interface to the full NLP suite
mblem
  • Command-line Application
Memory-based Lemmatiser (standalone)
libfrog
  • Software Library
Frog Library with API for C++
ner
  • Command-line Application
Named Entity Recogniser (standalone)
Frog-Webservice 2.7 2023-12-05 16:06:08 +0100
  • Web Application
Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch. This is the webservice for it, for both humans and machines.
Category:
  • Annotating
  • Contextualizing
  • Linguistics
  • Named Entity Recognition
  • POS-Tagging
  • Segmenting
  • Tagging
  • Textual and content analysis
  • Tree-Tagging
Keywords:
  • clam
  • webservice
  • rest
  • nlp
  • computational_linguistics
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
Frog Webservice 2.7
  • Web Application
Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch.
python-frog 0.6.10 2023-12-05 15:47:42 +0100
  • Unknown
Python binding to Frog, an NLP suite for Dutch doing part-of-speech tagging, lemmatisation, morphological analysis, named-entity recognition, shallow parsing, and dependency parsing.
Category:
  • Annotating
  • Contextualizing
  • Linguistics
  • Named Entity Recognition
  • POS-Tagging
  • Segmenting
  • Tagging
  • Textual and content analysis
  • Tree-Tagging
Keywords:
  • nlp
  • computational_linguistics
  • dutch
  • pos
  • lemmatizer
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
toad v0.8 2023-02-22 17:02:10 +0100
  • Unknown
Toad: Trainer Of All Data, the Frog training collection
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   Antal van de Bosch
fusus 0.0.2 2023-04-11 19:46:15 +0200
  • Unknown
Workflow for converting Arabic scanned pages into readable text
Category:
  • Religion
  • Scientific/Engineering > Information Analysis
  • Sociology > History
  • Text Processing
  • Text Processing > Fonts
  • Text Processing > Markup
Keywords:
  • arabic
  • image processing
  • islam
  • medieval
  • OCR
  • text
  • Proof of Concept: An initial proof-of-concept implementation of the technology is available (alpha). It is not mature enough for end-users yet.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

★ ★ ★ ☆ ☆
  •   Among, A Community for DH and MS
g2pservice 0.3.4 2023-05-12 13:09:12 +0200
  • Web Application
Grapheme to Phoneme converter. Input is a list of words (utf8). Choose one of the language options.
Category:
  • Internet > WWW/HTTP > WSGI > Application
  • Text Processing > Linguistic
Keywords:
  • speech
  • transcription
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   Louis ten Bosch
  •   Louis ten Bosch
Grapheme to Phoneme converter 0.3.4
  • Web Application
Grapheme to Phoneme (G2P) conversion. Input is a list of words (UTF-8, one word per line). For each word, the G2P outputs its best guess at the phonetic transcription. The system is trained on existing dictionaries. Please choose a language option. This is a demo version; please contact CLST if you want to use G2P for long word lists.
g2pservice
  • Unknown
GaLAHaD
GaLAHaD 1.2.5 2024-11-28 11:59:28 +0100
  • Server Application
  • Software Image
  • Web API
  • Web Application
GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.
Category:
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Comparing
  • Computational linguistics and philology
  • Converting
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • Merging
  • POS-Tagging
  • Software for humanities
  • Tagging
  • Textual and linguistic corpora
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
1
GaLAHaD proxy
  • Server Application
GaLAHaD API
  • Web API
GaLAHaD
  • Web Application
GaLAHaD server Docker image
  • Software Image
GaLAHaD proxy Docker image
  • Software Image
GaLAHaD client Docker image
  • Software Image
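Comparing taggers and evaluating their output, as the GaLAHaD description mentions, comes down to token-level comparisons between tag sequences. The sketch below is an illustration only, not GaLAHaD's API or its evaluation method. The functions `accuracy` and `disagreements`, the tag names and the sample data are all hypothetical.

```python
from typing import List, Tuple


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("tag sequences must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def disagreements(a: List[str], b: List[str]) -> List[Tuple[int, str, str]]:
    """Token positions where two taggers assign different tags."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Hypothetical output of two taggers on the same four tokens.
gold = ["DET", "NOUN", "VERB", "ADP"]
tagger_a = ["DET", "NOUN", "VERB", "ADV"]
tagger_b = ["DET", "ADJ", "VERB", "ADP"]

print(accuracy(tagger_a, gold))           # 0.75
print(disagreements(tagger_a, tagger_b))  # [(1, 'NOUN', 'ADJ'), (3, 'ADV', 'ADP')]
```

Two taggers can reach the same accuracy while making different errors, which is why the list of disagreements is worth inspecting alongside the score.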
GaLAHaD Train Battery 1.0.0 2024-06-04
  • Unknown
Python program for training linguistic annotation taggers based on a configuration file and a list of datasets. It prepares the resulting trained models for dockerization and adds relevant metadata. It is agnostic to the tagger software, as long as a simple Python shell is built around it.
Category:
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Linguistics
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
GaLAHaD Train Battery - Trainer
  • Unknown
GaLAHaD Train Battery - Dockerizer
  • Unknown
int-pie 1.0.0 2024-06-05 15:51:23 +0200
  • Unknown
The PIE tagger with custom modifications by the Dutch Language Institute (INT).
Category:
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • POS-Tagging
  • Tagging
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
1
  •   Enrique Manjavacas
  •   Mike Kestemont
  •   Thibault Clerice
tag
  • Unknown
evaluate
  • Unknown
train
  • Unknown
Gecco 0.3.0 2020-07-11 13:28:04 +0200
  • Command-line Application
Generic Environment for Context-Aware Correction of Orthography
Category:
  • Text Processing > Linguistic
Keywords:
  • spelling corrector spell check nlp computational_linguistics rest
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

★ ★ ★ ☆ ☆
1
  •   KNAW Humanities Cluster & CLST, Radboud University
gecco
  • Command-line Application
Generale Missieven in Text-Fabric v1.1e 2024-03-27 08:09:41 +0100
  • Command-line Application
Conversion of Generale Missieven to Text-Fabric, with a tutorial on how to work with the result
Keywords:
  • corpus-data
  • corpus-linguistics
  • corpus-processing
  • corpus-tools
  • dutch
  • history
  • nlp
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
converter and tutorial notebooks
  • Command-line Application
Glem 1.3.1 2023-10-05 14:28:06 +0200
  • Command-line Application
  • Web Application
GLEM is a lemmatizer for Ancient Greek.
Category:
  • Annotating
  • Computational linguistics and philology
  • Greek and Latin philology and literature
Keywords:
  • ancient greek
  • greek
  • lemma
  • lemmatisation
  • natural language processing
  • nlp
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ☆ ☆
  •   Corien Bary
  •   Peter Berck
  •   Iris Hendrickx
  •   Wessel Stoop
Glem 1.3.1
  • Web Application
glem
  • Command-line Application
Command-line interface to GLEM
gretel 4.2.4 2022-09-16 15:00:36 +0200
  • Web Application
GrETEL4 (fork from CCL-KULeuven)
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ☆ ☆
GrETEL 4
  • Web Application
grlc: the git repository linked data API constructor 1.3.7 2022-03-21 22:20:00 +0100
  • Unknown
grlc, the git repository linked data API constructor, automatically builds Web APIs using SPARQL queries stored in git repositories.
Keywords:
  • linked-data
  • linked-data-api
  • semantic-web
  • sparql
  • swagger-ui
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ★ ☆
  •   Albert Meroño-Peñuela
  •   Albert Meroño-Peñuela
  •   Carlos Martinez
grlc: the git repository linked data API constructor
  • Unknown
hypodisc 0.1.0 2024-10-17 17:10:45 +0200
  • Unknown
Hypothesis Discovery on RDF Knowledge Graphs
Keywords:
  • hypothesis generation
  • knowledge graphs
  • pattern discovery
  • rdf
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   VU University Amsterdam
hypodisc
  • Unknown
I-Analyzer 5.3.0 2023-12-08 11:23:35 +0100
  • Web Application
I-analyzer is a tool for exploring corpora (large collections of texts). You can use I-analyzer to find relevant documents, or to make visualisations to understand broader trends in the corpus. The interface is designed to be accessible for users of all skill levels. I-analyzer is primarily intended for academic research and higher education. We focus on data that is relevant for the humanities, but we are open to datasets that are relevant for other fields.
Keywords:
  • corpus research
  • data visualization
  • elasticsearch
  • natural language processing
  • text-mining
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   Research Software Lab, Centre for Digital Humanities, Utrecht University
  •   Research Software Lab, Centre for Digital Humanities, Utrecht University
I-Analyzer
  • Web Application
ineo unknown
  • Web Application
No description provided unknown
☆ ☆ ☆ ☆ ☆
Ineo - Start using digital humanities resources - Ineo
  • Web Application
Ineo lets you search, browse, find and select digital resources for your research in humanities and social sciences. The platform is already fully functional, but is still being filled with resource content. At the end of 2023, it will offer access to many tools, datasets, workflows, standards and educational material.
ineo-collaboration unknown 2024-09-19 11:39:21 +0200
  • Unknown
How to get metadata into INEO
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

★ ★ ☆ ☆ ☆
Kaldi_NL v0.4.3 2023-11-01 12:10:26 +0100
  • Unknown
Code related to the Dutch instance and user groups of the KALDI speech recognition toolkit
Keywords:
  • dutch
  • kaldi
  • speech-recognition
  • speech-recognition-model
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ☆ ☆ ☆ ☆
LaMachine 2.28
  • Unknown
LaMachine is a unified software distribution for Natural Language Processing. We integrate numerous open-source NLP tools, programming libraries, web-services, and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines.
Keywords:
  • installer
  • natural language processing
  • nlp
  • python
  • software distribution
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

★ ★ ★ ☆ ☆
Lenticular Lens
lenticular-lens 1.18 2024-10-08 10:01:25 +0200
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ☆ ☆ ☆
Golden Agents | lenticularlens.org
  • Web Application
Lenticular Lens 1.0.0
  • Web Application
lenticular-lens
  • Unknown
lenticular-lens 1.0.0 2023-01-25 09:39:33 +0100
  • Web Application
Vue frontend for Lenticular Lens
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ☆ ☆ ☆
Lenticular Lens 1.0.0
  • Web Application
lenticular-lens-postgresql 1.4 2023-10-06 15:57:59 +0200
  • Unknown
PostgreSQL extension for Lenticular Lens
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ☆ ☆ ☆
lingua-cli 0.4.1 2024-10-12 23:10:48 +0200
  • Software Library
Lingua-cli is a command line tool for language classification, using the lingua-rs library.
Keywords:
  • languagedetection
  • nlp
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
lingua-cli
  • Software Library
mbt 3.10
  • Command-line Application
  • Software Library
MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech.
Keywords:
  • machine learning
  • memory based learning
  • natural language processing
  • nlp
  • tagger
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
mbt
  • Command-line Application
Memory-based tagger, command-line tool
libmbt
  • Software Library
Memory-based Tagging Library with API for C++
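The idea behind a memory-based tagger, as described for MBT, is to store the tagged training instances and tag new tokens by looking up the most similar stored ones. The following is a toy sketch of that principle, not MBT's API or algorithm: the three-word context window, the simple overlap metric and the names `features`, `train` and `tag` are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Tuple

Features = Tuple[str, str, str]
Instance = Tuple[Features, str]  # (context features, tag)


def features(words: List[str], i: int) -> Features:
    """Previous word, focus word and next word, padded at the edges."""
    prev = words[i - 1] if i > 0 else "<s>"
    nxt = words[i + 1] if i + 1 < len(words) else "</s>"
    return (prev, words[i], nxt)


def train(tagged_sentences: List[List[Tuple[str, str]]]) -> List[Instance]:
    """'Training' only stores every instance in memory."""
    memory: List[Instance] = []
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        for i, (_, tag_label) in enumerate(sentence):
            memory.append((features(words, i), tag_label))
    return memory


def overlap(a: Features, b: Features) -> int:
    """Number of feature positions on which two instances agree."""
    return sum(x == y for x, y in zip(a, b))


def tag(words: List[str], memory: List[Instance]) -> List[str]:
    """Give each token the majority tag of its nearest stored instances."""
    tags = []
    for i in range(len(words)):
        f = features(words, i)
        best = max(overlap(f, m) for m, _ in memory)
        votes = Counter(t for m, t in memory if overlap(f, m) == best)
        tags.append(votes.most_common(1)[0][0])
    return tags


# Hypothetical training data: two tagged Dutch sentences.
memory = train([
    [("de", "DET"), ("kat", "NOUN"), ("slaapt", "VERB")],
    [("de", "DET"), ("hond", "NOUN"), ("blaft", "VERB")],
])
print(tag(["de", "hond", "slaapt"], memory))  # ['DET', 'NOUN', 'VERB']
```

The unseen sentence is tagged correctly because each token shares context features with stored instances, even though the sentence as a whole never occurred in the training data.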
Media Suite
CLARIAH Media Suite 6.10 2023-11-21
  • Web Application
The CLARIAH Media Suite is a research environment in which researchers can search, bookmark, annotate and compare items from a number of cultural heritage collections
Keywords:
  • collection analysis
  • cultural heritage
  • data portal
  • faceted search
  • scholarly annotation
  • virtual workspace
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   Jaap Blom
  •   Jaap Blom
CLARIAH Media Suite
  • Web Application
Nederlab
MTAS 8.11.1.0 2022-01-14 11:51:15 +0100
  • Software Library
Multi Tier Annotation Search, a Solr/Lucene based library and plugin providing search and analysis on annotated and structured text.
Keywords:
  • annotations
  • big-data
  • cql
  • distributed
  • lucene
  • search
  • search-engine
  • search-in-text
  • solr
  • structure
  • text
  • text-analysis
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
MTAS
  • Software Library
Nederlab Pipeline 0.8.0
  • Unknown
A set of workflows for linguistic enrichment of historical Dutch
Keywords:
  • natural language processing
  • nlp
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ☆ ☆
nederlab-portal unknown
  • Web Application
No description provided unknown
☆ ☆ ☆ ☆ ☆
nederlab onderzoeksportaal
  • Web Application
Netwerk Digitaal Erfgoed (NDE)
Dataset Register unknown
  • Web Application
Live index of heritage datasets
Category:
  • Discovering
Keywords:
  • datasets
  • nde
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ☆ ☆ ☆
Dataset Register OpenAPI
  • Web Application
Network of Terms GraphQL API unknown
  • Web Application
GraphQL API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists)
Category:
  • Identifying
Keywords:
  • graphql
  • linked-data
  • search
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ☆ ☆ ☆
Network of Terms GraphQL API
  • Web Application
Network of Terms Reconciliation API unknown
  • Web Application
Reconciliation API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists)
Category:
  • Identifying
Keywords:
  • graphql
  • linked-data
  • search
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ☆ ☆ ☆
Network of Terms Reconciliation API
  • Web Application
OpenDutchWordnet unknown 2021-05-11 16:38:00 +0200
  • Software Library
This repo provides a Python module to work with Open Dutch WordNet. It was created using Python 3.4.
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

☆ ☆ ☆ ☆ ☆
OpenDutchWordnet
  • Software Library
pagexml-tools 0.5.0 2024-03-18 14:49:12 +0100
  • Command-line Application
Utility functions for reading PageXML files
Category:
  • Scientific/Engineering
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   Marijn Koolen
  •   Marijn Koolen
  •   Bram Buitendijk
version
  • Command-line Application
PaQu 1.0.5 2024-04-24
  • Web Application
With PaQu (Parse & Query) you can search syntactically annotated Dutch-language corpora. PaQu supports two ways of searching. The first, simple way lets you search for word pairs, optionally together with their syntactic relation. The second, more complex way uses the XPath query language. A number of syntactically annotated corpora are available in PaQu by default, but you can also submit your own texts. These texts are then analysed by the automatic parser and stored, after which you can search them in the same way.
Category:
  • Linguistics
  • Computational linguistics and philology
  • Software for humanities
  • Structural Analysis
Keywords:
  • Alpino
  • Dependency parsing
  • SPOD: Syntactic profiler of Dutch
  • UD: Universal Dependencies
  • XPath
  • Complete: The technology is complete, stable and deployed in production scenarios for end-users
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
PaQu
  • Web Application
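To give an impression of XPath search over a syntactically annotated sentence, as mentioned in the PaQu description, here is a small Python sketch. It assumes Alpino-style XML in which every `node` element carries attributes such as `rel`, `cat` and `word`. The sample tree is hand-written and simplified, and Python's ElementTree supports only a subset of XPath, so this is not how PaQu itself runs queries.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified parse of "Jan leest een boek" (illustrative only).
SAMPLE = """<alpino_ds>
  <node cat="top" rel="top">
    <node cat="smain" rel="--">
      <node rel="su" pos="noun" word="Jan" lemma="Jan"/>
      <node rel="hd" pos="verb" word="leest" lemma="lezen"/>
      <node rel="obj1" cat="np">
        <node rel="det" pos="det" word="een" lemma="een"/>
        <node rel="hd" pos="noun" word="boek" lemma="boek"/>
      </node>
    </node>
  </node>
  <sentence>Jan leest een boek</sentence>
</alpino_ds>"""

root = ET.fromstring(SAMPLE)

# All nodes that function as a subject.
subjects = [n.get("word") for n in root.findall(".//node[@rel='su']")]

# The head word of every noun phrase.
np_heads = [n.get("word")
            for n in root.findall(".//node[@cat='np']/node[@rel='hd']")]

print(subjects)  # ['Jan']
print(np_heads)  # ['boek']
```

The second query shows why a path language suits treebanks: it combines a condition on a phrase (`cat='np'`) with a condition on one of its children (`rel='hd'`).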
pure3dtools 0.0.4 2024-10-30 16:37:28 +0100
  • Unknown
Pure3D tools
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Moved: The project has been moved to a new location, and the version at that location should be considered authoritative.

★ ★ ★ ☆ ☆
pure3dtools
  • Unknown
Ricgraph - Research in context graph 2.7 2024-12-03 13:36:22 +0100
  • Command-line Application
  • Software Library
  • Web Application
Ricgraph, also known as Research in context graph, enables the exploration of researchers, teams, their results, collaborations, skills, projects, and the relations between these items. Ricgraph can store many types of items into a single graph. These items can be obtained from various systems and from multiple organizations. Ricgraph facilitates reasoning about these items because it infers new relations between items, relations that are not present in any of the separate source systems. It is flexible and extensible, and can be adapted to new application areas. Throughout this text, we illustrate how Ricgraph works by applying it to the application area research information.

Motivation

Ricgraph, also known as Research in context graph, is software that is about relations between items. These items can be collected from various source systems and from multiple organizations. We explain how Ricgraph works by applying it to the application area research information. We show the insights that can be obtained by combining information from various source systems, insight arising from new relations that are not present in each separate source system.

Research information is about anything related to research: research results, the persons in a research team, their collaborations, their skills, projects in which they have participated, as well as the relations between these entities. Examples of research results are publications, data sets, and software.

Example use cases from the application area research information are:
(1) As a journalist, I want to find researchers with a certain skill and their publications, so that I can interview them for a newspaper article.
(2) As a librarian, I want to enrich my local research information system with research results that are in other systems but not in ours, so that we have a more complete view of research at our university.
(3) As a researcher, I want to find researchers from other universities that have co-authored publications written by the co-authors of my own publications, so that I can read their publications to find out if we share common research interests.

These use cases use different types of information (called items): researchers, skills, publications, etc. Most often, these types of information are not stored in one system, so the use cases may be difficult or time-consuming to answer. However, by using Ricgraph, these use cases (and many others) are easy to answer. Although this text illustrates Ricgraph in the application area research information, the principle "relations between items from various source systems" is general, so Ricgraph can be used in other application areas.

Main contributions of Ricgraph

(1) Ricgraph can store many types of items in a single graph.
(2) Ricgraph harvests multiple source systems into a single graph.
(3) Ricgraph Explorer is the exploration tool for Ricgraph.
(4) Ricgraph facilitates reasoning about items because it infers new relations between items.
(5) Ricgraph can be tailored for an application area.

Read more about Ricgraph

For a gentle introduction in Ricgraph, read the reference publication: Rik D.T. Janssen (2024). Ricgraph: A flexible and extensible graph to explore research in context from various systems. SoftwareX, 26(101736). https://doi.org/10.1016/j.softx.2024.101736
Extensive documentation, publications, videos and source code can be found in the GitHub repository https://github.com/UtrechtUniversity/ricgraph
The website for Ricgraph can be found at https://www.ricgraph.eu
Category:
  • Analyzing
  • Browsing
  • Capturing
  • Discovering
  • Enriching
  • Exploration
  • Information Retrieval
  • Storing
Keywords:
  • Data enrichment
  • Data harvesting
  • Data linking
  • Graph
  • Graph database
  • Knowledge graph
  • Linked data
  • Metadata
  • Research in context graph
  • Ricgraph
  • Ricgraph Explorer
  • Ricgraph REST API
  • Utrecht University
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
1
  •   Rik D.T. Janssen
  •   Rik D.T. Janssen
Script to harvest the Research Information System Pure for Ricgraph
  • Command-line Application
Script to harvest OpenAlex for Ricgraph
  • Command-line Application
Ricgraph
  • Software Library
Script to harvest the data repository Yoda for Ricgraph
  • Command-line Application
Script to call all harvest scripts
  • Command-line Application
Ricgraph REST API
  • Web Application
Script to harvest the Research Software Directory for Ricgraph
  • Command-line Application
Ricgraph Explorer
  • Web Application
Script to harvest the Utrecht University staff pages for Ricgraph
  • Command-line Application
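The third use case in the Ricgraph description (finding the co-authors of one's co-authors) is a two-step traversal over relations between persons and publications. The sketch below shows that traversal on a toy dictionary. It does not use Ricgraph's API or data model; the publication identifiers, the author names and both function names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy data: publication identifier -> set of authors.
PUBLICATIONS: Dict[str, Set[str]] = {
    "pub1": {"me", "alice"},
    "pub2": {"alice", "bob"},
    "pub3": {"me", "carol"},
    "pub4": {"carol", "dave", "alice"},
}


def coauthors(person: str, pubs: Dict[str, Set[str]]) -> Set[str]:
    """Everyone who shares at least one publication with `person`."""
    found: Set[str] = set()
    for authors in pubs.values():
        if person in authors:
            found |= authors
    found.discard(person)
    return found


def coauthors_of_coauthors(person: str, pubs: Dict[str, Set[str]]) -> Set[str]:
    """People two steps away: co-authors of co-authors, minus direct ones."""
    direct = coauthors(person, pubs)
    second: Set[str] = set()
    for colleague in direct:
        second |= coauthors(colleague, pubs)
    return second - direct - {person}


print(sorted(coauthors("me", PUBLICATIONS)))               # ['alice', 'carol']
print(sorted(coauthors_of_coauthors("me", PUBLICATIONS)))  # ['bob', 'dave']
```

In practice the publications would come from several source systems, which is exactly where a single combined graph saves the effort of joining the data by hand.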
sastadev 0.2.3 2024-06-18 16:31:39 +0200
  • Command-line Application
  • Web Application
Linguistic functions for SASTA tool
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
SASTA
  • Web Application
sastadev
  • Command-line Application
search-ui 1.0.0 2022-12-21 08:48:13 +0100
  • Web Application
This repository contains the code for a Search UI to test the functionality of the basic vocabulary-recommender.
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ☆ ☆ ☆
@triply/search-ui 1.0.0
  • Web Application
shebanq v4.2z 2022-10-12 10:12:53 +0200
  • Web Application
Exposing the Hebrew Text Database of the ETCBC
Keywords:
  • annotation
  • etcbc
  • etcbc-data
  • hebrew
  • hebrew-bible
  • search-engine
  • text-fabric
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
SHEBANQ
  • Web Application
Search engine for biblical Hebrew based on the Biblia Hebraica Stuttgartensia (Amstelodamensis) database (formerly known as ETCBC, historically known as WIVU)
SHEBANQ
  • Web Application
SPAQ unknown 2024-02-14 17:18:20 +0100
  • Unknown
SPAQ (speech acquisition using Surveys) [view more]
  • Suspended: Initial development has started, but there has not yet been a stable, usable release; work has been stopped for the time being but the author(s) intend on resuming work.

★ ☆ ☆ ☆ ☆
STAM
stam v1.1.1 2024-09-17 13:18:57 +0200
  • Unknown
Stand-off Text Annotation Model (STAM) is a data model for stand-off text annotation in which any information on a text is represented as an annotation. This repository contains the model's full specification, extensions, schemas, examples and documentation. [view more]
Category:
  • Annotating
  • Textual and content analysis
  • Textual and linguistic corpora
Keywords:
  • annotation
  • linguistics
  • stand-off
  • text
  • text-annotation
  • webannotation
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
stam 0.10.1 2024-10-18 11:22:13 +0200
  • Software Library
STAM is a library for dealing with stand-off annotations on text. This is the Python binding. [view more]
Category:
  • Annotating
  • Textual and content analysis
  • Textual and linguistic corpora
Keywords:
  • annotation
  • linguistics
  • nlp
  • standoff
  • text-processing
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
stam
  • Software Library
stam 0.16.5 2024-11-18 13:15:25 +0100
  • Software Library
STAM is a powerful library for dealing with stand-off annotations on text. This is the Rust library. [view more]
Category:
  • Annotating
  • Textual and content analysis
  • Textual and linguistic corpora
Keywords:
  • annotation
  • linguistics
  • nlp
  • standoff
  • text-processing
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
stam
  • Software Library
stam-tools 0.9.2 2024-11-18 16:51:34 +0100
  • Command-line Application
Command-line tools for working with stand-off annotations on text (STAM) [view more]
Category:
  • Annotating
  • Textual and content analysis
  • Textual and linguistic corpora
Keywords:
  • annotation
  • linguistics
  • nlp
  • standoff
  • text-processing
  • 7 - Release Candidate: Technology ready enough and in initial use by end-users in intended scholarly environments. Further validation may be in progress.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
stam-tools
  • Command-line Application
T-Scan 0.10.0
  • Web Application
T-Scan is an analysis tool for assessing the complexity of Dutch texts, based on original work by Rogier Kraf. [view more]
Keywords:
  • dutch
  • feature extraction
  • natural language processing
  • nlp
  • readability
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
T-scan
  • Web Application
T-Scan is an analysis tool for Dutch text, mainly focusing on text complexity. It was initially conceptualized by Rogier Kraf and Henk Pander Maat. Rogier Kraf also programmed the first versions. From 2012 on, Henk Pander Maat supervised the development of the extended versions of the tool. These versions were programmed by Maarten van Gompel, Ko van der Sloot, Martijn van der Klis, Sheean Spoel and Luka van der Plas.
text-fabric 12.6.2 2024-11-04 10:25:44 +0100
  • Command-line Application
  • Software Library
  • Web Application
Processor and browser for annotated text corpora [view more]
Category:
  • Archiving
  • Bible studies
  • Commenting
  • Computational linguistics and philology
  • Highlighting
  • Information Retrieval
  • Interpreting
  • Religious studies and theology
  • Rhetorical Analysis
  • Sharing
  • Structural Analysis
  • Textual and content analysis
  • Textual and linguistic corpora
Keywords:
  • akkadian
  • babylonian
  • bible
  • cuneiform
  • database
  • graph
  • greek
  • hebrew
  • linguistics
  • peshitta
  • quran
  • syriac
  • text
  • uruk
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   Dirk Roorda
  •   Dirk Roorda
Text-Fabric
  • Software Library
Text-Fabric Browser
  • Web Application
Text-Fabric
  • Command-line Application
textannoviz 0.16.1 2024-12-04 15:29:11 +0100
  • Web Application
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   Sebastiaan van Daalen
  •   Sebastiaan van Daalen
textannoviz 0.16.1
  • Web Application
TextRepo
textrepo v1.19.0 2022-03-15 14:51:17 +0100
  • Unknown
Text Repository [view more]
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ☆ ☆ ☆ ☆
textrepo-client 0.5.1 2022-04-08 23:52:20 +0200
  • Command-line Application
A Python client to access a textrepo server [view more]
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
version
  • Command-line Application
TICCL & PICCL
PICCL 0.9.5
  • Unknown
A set of workflows for corpus building through OCR, post-correction, and normalisation. [view more]
Keywords:
  • natural language processing
  • nlp
  • ocr
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

★ ★ ★ ☆ ☆
TICCLTools 0.10
  • Software Library
TicclTools is a collection of programs for processing data files; together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. The software consists of individual modules that are invoked by the pipeline system PICCL. [view more]
Keywords:
  • natural language processing
  • nlp
  • normalization
  • ocr
  • Unsupported: The project has reached a stable, usable state but the author(s) have ceased all work on it. A new maintainer may be desired.

★ ★ ★ ☆ ☆
TICCLTools
  • Software Library
TiMBL
python3-timbl 2024.10.29 2024-10-29 15:22:27 +0100
  • Unknown
Python 3 language binding for the Tilburg Memory-Based Learner [view more]
Category:
  • Scientific/Engineering
  • Text Processing > Linguistic
Keywords:
  • k-nearest-neighbours
  • knn
  • machine-learning
  • python
  • timbl
  • Experimental: The technology is implemented and ready for experimental settings (beta), but requires further work and validation.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
TiMBL 6.9
  • Command-line Application
  • Software Library
TiMBL is an open source software package implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms have in common that they store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases. [view more]
Keywords:
  • decision tree
  • k-nearest neighbours
  • knn
  • machine learning
  • memory based learning
  • natural language processing
  • nlp
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
timbl
  • Command-line Application
Memory-based learner, command-line tool
libtimbl
  • Software Library
Memory-based Learning Library with API for C++
Timbuctoo 7.15 2024-03-01 10:25:59 +0100
  • Unknown
An RDF datastore that gives researchers control over the sharing of data between datasets [view more]
Keywords:
  • berkeley-db
  • graphql
  • humanities
  • java
  • r2rml
  • rdf
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   Ronald Haentjens Dekker
  •   Ronald Haentjens Dekker
  •   Pratham Joshi
  •   Meindert Kroese
  •   Martijn Maas
  •   Kerim Meijer
  •   Jauco Noordzij
  •   Walter Ravenek
  •   Henk van den Berg
  •   René van der Ark
Ucto
python-ucto 0.6.8 2024-09-12 14:10:03 +0200
  • Unknown
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto). [view more]
Category:
  • Text Processing > Linguistic
Keywords:
  • tokenizer
  • tokenization
  • tokeniser
  • tokenisation
  • nlp
  • computational_linguistics
  • ucto
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
ucto 0.34 2023-02-22 12:17:06 +0100
  • Command-line Application
  • Software Library
Ucto tokenizes text files: it separates words from punctuation, and splits sentences. This is one of the first tasks for almost any Natural Language Processing application. Ucto also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. [view more]
Category:
  • Annotating
  • Linguistics
  • Tagging
  • Textual and content analysis
Keywords:
  • natural language processing
  • nlp
  • tokenization
  • tokenizer
  • 9 - Proven: Technology complete and proven in practice by real users.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ★ ☆
ucto
  • Command-line Application
Command-line interface to the tokenizer
libucto
  • Software Library
Ucto Library with API for C++
Ucto-Webservice 2.5.2 2024-03-14 21:54:52 +0100
  • Web Application
Ucto is a rule-based tokeniser for multiple languages. This is the webservice for it, for both humans and machines. [view more]
Category:
  • Annotating
  • Linguistics
  • Tagging
  • Textual and content analysis
Keywords:
  • clam
  • webservice
  • rest
  • nlp
  • computational_linguistics
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
  •   KNAW Humanities Cluster & CLST, Radboud University
Ucto Webservice 2.5.2
  • Web Application
Ucto is a Unicode-compliant tokeniser. It takes input in the form of one or more untokenised texts, and subsequently tokenises them. Several languages are supported, and the software is extensible to other languages.
udpipe-service 4.10 2023-11-26 09:23:18 +0100
  • Web Application
A REST service for an R/udpipe-based tokenizer, lemmatizer, POS tagger and dependency parser. See https://bitbucket.org/fryske-akademy/udpipe for (Docker) setup. [view more]
  • Active: The project has reached a stable, usable state and is being actively developed.

★ ★ ★ ☆ ☆
Service to tokenize, lemmatize, POS-tag and dependency-parse using udpipe
  • Web Application
vocabulary-recommender 2.0.0 2022-12-23 13:24:13 +0100
  • Software Library
A generic linked data vocabulary recommender library that provides recommendation functions for various backends. [view more]
Keywords:
  • lod
  • namespace
  • recommender
  • vocabulary
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ☆ ☆
vocabulary-recommender
  • Software Library
vurmpipe 3.0 2019-03-24 22:55:06 +0100
  • Unknown
VU Reading Machine Pipeline [view more]
  • Inactive: The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

★ ★ ★ ☆ ☆