Here you will find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessor and sister projects. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement on whether a tool might be suitable for you.
This list is automatically harvested from the tool producers and providers themselves, and updated daily.
Are you a CLARIAH developer whose tool is not yet included in the index, or do you have questions or comments on the metadata? Please read our contribution guidelines.
Alpino
Alpino-Webservice 2.4
- KNAW Humanities Cluster & CLST, Radboud University
Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. This is the webservice for it. You can upload either tokenised or untokenised files (the latter will be tokenised automatically using ucto). The output is a zip file containing XML files, one for each sentence in the input document.
- Internet > WWW/HTTP > WSGI > Application
- Text Processing > Linguistic
- dependency parsing
- folia
- linguistics
- nlp
- syntax
- Bsd
- Linux
- Macos
- Python
- Source code
- Go to Alpino Webservice (WebApplication) https://webservices.cls.ru.nl/alpino Alpino is a dependency parser for Dutch, developed by Gertjan van Noord at the University of Groningen in the context of the PIONIER project Algorithms for Linguistic Processing. You can upload either tokenised or untokenised files (the latter will be tokenised automatically using ucto). The output is a zip file containing XML files, one for each sentence in the input document.
Created: 2015-09-08
Modified: 2023-11-01
AlpinoGraph 1.0.5
AlpinoGraph is a tool for searching syntactically annotated corpora. The tool uses AgensGraph, which combines database technology (PostgreSQL) with Cypher, the standard query language for graphs. The search queries you can use in AlpinoGraph are therefore a mix of SQL and Cypher. AlpinoGraph adds a few extensions of its own, such as a simple but handy macro system and visualisation of the results.
- Linguistics
- nwo:ComputationalLinguisticsandPhilology
- Software for humanities
- Structural Analysis
- Alpino
- Cypher
- Dependency parsing
- SPOD: Syntactic profiler of Dutch
- UD: Universal Dependencies
- Docker
- Linux
Created: 2020-03-25
Modified: 2024-04-24
alud 2.14.0
A Go package for deriving Universal Dependencies from Dutch sentences parsed with Alpino [view more]
- Linguistics
- nwo:ComputationalLinguisticsandPhilology
- Software for humanities
- Structural Analysis
- Alpino
- UD: Universal Dependencies
- Aix
- Android
- Darwin
- Dragonfly
- Freebsd
- Illumos
- Ios
- Js
- Linux
- Netbsd
- Openbsd
- Plan9
- Solaris
- Windows
Created: 2019-06-30
Modified: 2024-04-24
analiticcl 0.4.6
- KNAW Humanities Cluster & CLST, Radboud University
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation [view more]
- linguistics
- nlp
- spellcheck
- spelling-correction
- text-processing
Created: 2021-04-13
Modified: 2024-04-22
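To give a feel for what fuzzy matching for spelling correction involves, here is a minimal sketch that uses Python's standard `difflib` module. It does not use analiticcl, whose algorithm and API are different; the lexicon, the `find_variants` helper and the cutoff value are all assumptions made up for this illustration.

```python
from difflib import get_close_matches

# Hypothetical lexicon of valid word forms (an assumption for illustration).
LEXICON = ["language", "linguistics", "normalisation", "spelling", "variant"]


def find_variants(word, lexicon=LEXICON, max_results=3, cutoff=0.7):
    """Return lexicon entries that approximately match `word`, best first.

    `cutoff` is a similarity threshold between 0 and 1; candidates scoring
    below it are discarded.
    """
    return get_close_matches(word.lower(), lexicon, n=max_results, cutoff=cutoff)


if __name__ == "__main__":
    for misspelling in ["langauge", "speling", "lingiustics"]:
        print(misspelling, "->", find_variants(misspelling))
```

A dedicated system would typically add frequency information and context to rank candidates, which this sketch leaves out.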
AnnoRepo
Created: 2022-03-24
Modified: 2024-04-03
Created: 2022-04-07
Modified: 2023-11-29
asrservice 0.3
An Automatic Speech Recognition Service for a variety of languages, powered by WhisperX [view more]
- Internet > WWW/HTTP > WSGI > Application
- Text Processing > Linguistic
- clam
- webservice
- rest
- nlp
- computational_linguistics
- Bsd
- Linux
- Macos
- Python
- Source code
- Go to Automatic Speech Recognition Service (WebApplication) https://webservices2.cls.ru.nl/asrservice An Automatic Speech Recognition Service for a variety of languages, powered by WhisperX
Created: 2024-02-16
Modified: 2024-04-12
Created: 2022-01-12
Modified: 2023-08-21
Automatic Speech Recognition for Dutch 0.6.2
This is a web-based automatic speech recogniser for Dutch, capable of transcribing Dutch speech recordings using multiple models.
- Software for humanities
- Speech Recognizing
- dutch
- nlp
- speech recognition
- Linux
- Source code
- Go to Automatic Transcription of Dutch Speech Recordings (WebApplication) https://webservices.cls.ru.nl/asr_nl This webservice uses automatic speech recognition to provide the transcriptions of recordings spoken in Dutch. You can upload and process only one file per project. For bulk processing and other questions, please contact Henk van den Heuvel at h.vandenheuvel@let.ru.nl.
Created: 2017-04-02
Blacklab & Corpus Search
A Blacklab Server CLARIN FCS 2.0 endpoint 0.1
CLARIAH Federated content search corpora, developed by the Dutch Language Institute (INT), is a service to enable searching in multiple Dutch corpora at the same time. This application implements the CLARIN FCS 2.0 specification on top of Dutch language corpora. This repository hosts the source code. [view more]
- BlackLab
- CLARIN
- corpus search
- FCS 2.0
- Federated Content Search
- Nederlab
- Source code
- Go to FCS Aggregator (WebApplication) https://spraakbanken.gu.se/ws/fcs/2.0/aggregator/ The Aggregator application is a part of the CLARIN-FCS common federated content search infrastructure. It serves as a user interface to perform queries to CLARIN-resources and display search results. The Aggregator communicates with components called endpoints, which are provided as a service by all centres who participate in the federated content search. Each endpoint provides access to one or more searchable resources. The user can select a specific resource or resources, based on the resource name or on the language, or search through all of them. The content of these resources is searched with the query supplied to the endpoint. The endpoint returns results to this query and the aggregator collects the responses from all the endpoints and displays them to the user.
Created: 2016-09-11
Modified: 2023-05-10
BlackLab Corpus Search 3.0.1
The parent project for BlackLab Core and BlackLab Server. [view more]
- corpus
Created: 2012-10-04
Modified: 2022-10-06
INT Corpus Frontend 3.1.1
A web application to search corpora through the BlackLab Server web service. [view more]
- corpus
- Source code
- Go to Brieven als Buit search (WebApplication) https://brievenalsbuit.ivdnt.org Brieven als Buit provided by the Dutch Language Institute in Leiden.
- Go to Corpus Hedendaags Nederlands (WebApplication) https://chn.ivdnt.org/ CHN, provided by the Dutch Language Institute in Leiden.
- Go to OpenSoNaR (WebApplication) https://opensonar.ivdnt.org/ OpenSoNaR, provided by the Dutch Language Institute in Leiden.
Created: 2014-03-19
Modified: 2024-02-02
Created: 2022-06-29
Modified: 2024-09-02
Created: 2021-02-16
Modified: 2022-09-21
Created: 2017-03-15
Modified: 2024-03-13
CLAM 3.2.10
Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your command line application, its input, output and parameters, and CLAM wraps around your application to form a fully fledged RESTful webservice. [view more]
- natural language processing
- nlp
- rest
- webservice
- Linux
Created: 2010-03-21
Modified: 2024-03-14
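The general idea of driving a command-line program from a declarative specification can be sketched in a few lines of Python. This is not CLAM's API or configuration format: the `ToolSpec` class, the `run_tool` function and the `UPPER` example are hypothetical names invented for this illustration, and the HTTP layer that a real webservice adds is omitted.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical description of a command-line tool (not CLAM's format)."""
    command: list                                    # base command and fixed arguments
    parameters: dict = field(default_factory=dict)   # parameter name -> command-line flag


def run_tool(spec, input_text, **params):
    """Build the command line from the spec, feed input on stdin, return stdout."""
    args = list(spec.command)
    for name, value in params.items():
        if name not in spec.parameters:
            raise ValueError(f"unknown parameter: {name}")
        args += [spec.parameters[name], str(value)]
    result = subprocess.run(args, input=input_text, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Example tool: a Python one-liner that upper-cases whatever it reads on stdin.
UPPER = ToolSpec(command=[
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
])

if __name__ == "__main__":
    print(run_tool(UPPER, "hello"))
```

A wrapper of this kind validates parameters against the specification before anything is executed, which is the part a web front-end would rely on.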
Created: 2021-04-18
Modified: 2021-06-28
CLARIAH Tool Discovery
CLARIAH Tool Discovery 1.6.4
This is the over-arching project for CLARIAH Tool Discovery. Its components harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process. This project holds the Tool Source Registry, pointing to all the tools that are to be harvested. It also holds the validation schema.
- Browsing
- Databases for humanities
- Discovering
- Exploration
- Gathering
- Software for humanities
- codemeta
- harvester
- linked data
- metadata
- rdf
- schema.org
- software metadata
- Source code
- Go to CLARIAH Tools (WebApplication) https://tools.clariah.nl This is a web portal where you can find all tools (i.e. software and software services) developed in the CLARIAH project, as well as some tools from predecessor and sister projects. This list is automatically harvested from the tool producers and providers themselves, and updated daily. Our tools are designed for researchers and developers in the Humanities and Social Sciences. Not all tools are suitable for all audiences, and not all tools are mature and stable. This information should be clearly indicated for each tool, so you can make an informed judgement on whether a tool might be suitable for you.
Created: 2022-01-05
Modified: 2024-06-04
codemeta-harvester 0.4.0
Harvest and aggregate codemeta from source repositories and service endpoints, automatically converting known metadata schemes in the process [view more]
- codemeta
- harvester
- linked data
- metadata
- rdf
- schema.org
- software metadata
Created: 2022-01-05
Modified: 2024-06-03
codemeta-lod-to-cmdi 1.0-SNAPSHOT
CLARIAH Tool Discovery output (LOD -> CMDI conversion) [view more]
Created: 2023-02-01
Modified: 2023-05-15
codemeta-server 0.4.1
Web API serving software metadata using codemeta and schema.org. It provides a SPARQL endpoint and also offers a web interface for human users.
- Software Development
- codemeta
- linked data
- metadata
- rdf
- schema.org
- scientific
- software metadata
- Bsd
- Linux
- Macos
- Python
Created: 2022-03-22
Modified: 2023-11-24
codemeta2html 0.1.0
Convert software metadata in codemeta to HTML for visualisation. It can generate fully-fledged static sites that serve well as a portal for a collection of software.
- Software Development
- codemeta
- linked data
- metadata
- rdf
- schema.org
- scientific
- software metadata
- Bsd
- Linux
- Macos
- Python
Created: 2023-05-06
Modified: 2023-05-15
CodeMetaPy 2.5.3
Codemetapy is a command-line tool and python library to work with the codemeta software metadata standard. Codemeta builds upon schema.org and defines a vocabulary for describing software source code. It maps various existing metadata standards to a unified vocabulary. Codemetapy allows you to generate codemeta from various sources. [view more]
- Computer science
- Converting
- Software Development
- codemeta
- linked data
- metadata
- metadata-extractor
- rdf
- schema.org
- scientific
- software metadata
- Bsd
- Linux
- Macos
- Python
Created: 2018-04-16
Modified: 2024-06-14
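As an illustration of what mapping existing metadata to a unified vocabulary looks like, the sketch below hand-rolls a tiny conversion to a codemeta-style JSON-LD document. It does not use Codemetapy; the `to_codemeta` function, the input keys and the example package are hypothetical, and the context URL is assumed to be the one used by codemeta 2.0.

```python
import json

# JSON-LD context assumed here to be the codemeta 2.0 one.
CODEMETA_CONTEXT = "https://doi.org/10.5063/schema/codemeta-2.0"

# Hypothetical mapping from package-style keys to codemeta/schema.org properties.
KEY_MAPPING = {
    "name": "name",
    "version": "version",
    "summary": "description",
    "repository": "codeRepository",
    "license": "license",
}


def to_codemeta(package):
    """Map a dict of package metadata to a minimal codemeta-style document."""
    doc = {"@context": CODEMETA_CONTEXT, "@type": "SoftwareSourceCode"}
    for source_key, target_key in KEY_MAPPING.items():
        if source_key in package:  # skip fields the source does not provide
            doc[target_key] = package[source_key]
    return doc


if __name__ == "__main__":
    example = {"name": "example-tool", "version": "0.1.0",
               "summary": "A made-up tool used only for this illustration."}
    print(json.dumps(to_codemeta(example), indent=2))
```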
Created: 2014-08-07
Modified: 2021-03-07
Created: 2014-04-14
Modified: 2020-07-17
Colibri Core 2.5.9
Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way.
- language modelling
- natural language processing
- ngrams
- nlp
- pattern recognition
- skipgrams
- Bsd
- Linux
- Macos
Created: 2013-09-15
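The notions of n-gram and skipgram mentioned above can be made concrete with a small plain-Python sketch. It does not use the Colibri Core library and makes no attempt at memory efficiency; the function names and the `{*}` gap marker are choices made for this illustration, and only single gaps of fixed size are covered.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, gap="{*}"):
    """Return patterns of length n with one fixed-size gap.

    For every n-gram, each contiguous run of interior positions (never the
    first or last token) is replaced by one gap marker per skipped token.
    """
    patterns = []
    for gram in ngrams(tokens, n):
        for start in range(1, n - 1):
            for end in range(start + 1, n):
                patterns.append(gram[:start] + (gap,) * (end - start) + gram[end:])
    return patterns


if __name__ == "__main__":
    words = "to be or not to be".split()
    print(ngrams(words, 2))
    print(skipgrams(words, 3))
```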
Corpus Editor for Syntactically Annotated Resources (Cesar) unknown
Django web application that communicates with the CorpusStudioWeb back-end 'Crpp'. It has two main purposes: (1) browsing texts, and (2) conducting syntactic searches with definable output per hit. Searches are translated to XQuery 'under the hood'.
- syntax
- xquery
- Posix
Created: 2018
Created: 2023-11-09
Modified: 2024-03-08
DANE
Created: 2019-11-25
Modified: 2024-05-13
dane-asr-worker 0.1.0
Automatic speech recognition through an external service. Depends on DANE-server [view more]
- Multimedia processing
Created: 2022-02-15
dane-download-worker 0.9.0
Basic "DANE worker" that downloads input data via HTTP(s) URLs for further processing by other DANE workers. Depends on DANE-server [view more]
- Multimedia processing
Created: 2022-02-08
DANE-server 0.3.1
Back-end for the Distributed Annotation 'n' Enrichment (DANE) system [view more]
Created: 2020-01-22
Modified: 2023-06-19
dane-workflows 0.9.0
- The Netherlands Institute for Sound and Vision
Python library for setting up simple data processing workflows (using DANE) [view more]
- Multimedia processing
Created: 2022-07-18
Created: 2020-02-08
Modified: 2021-04-11
Created: 2022-07-19
Modified: 2024-05-22
did-summarizer
Linked Data summarizer driven by Decentralized Identifiers (DIDs) [view more]
Created: 2022-11-25
Modified: 2024-02-02
Electronisch woordenboek van de Achterhoekse en Liemerse dialecten unknown
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the regions 'Achterhoek' and 'Liemers'.
- dialect
- dictionary
- dutch
- Posix
Created: 2019
Electronisch woordenboek van de Gelderse dialecten unknown
Django web application that facilitates viewing and searching a dictionary of Dutch dialects from the province 'Gelderland'.
- dialect
- dictionary
- dutch
- Posix
Created: 2019
Electronisch woordenboek van de Brabantse dialecten unknown
Django web application that facilitates viewing and searching a dictionary of dialects from the Dutch province 'Noord-Brabant' as well as the Belgian provinces of Antwerpen, Vlaams-Brabant and Brussels.
- dialect
- dictionary
- dutch
- Posix
Created: 2017
Electronisch woordenboek van de Limburgse dialecten unknown
Django web application that facilitates viewing and searching a dictionary of the Dutch Limburgian dialects.
- dialect
- dictionary
- dutch
- Posix
Created: 2016
FLAT
FoLiA-Linguistic-Annotation-Tool 0.11.5
- KNAW Humanities Cluster & CLST, Radboud University
FLAT is a web-based linguistic annotation environment based around the FoLiA format (https://proycon.github.io/folia), a rich XML-based format for linguistic annotation. FLAT allows users to view annotated FoLiA documents and enrich these documents with new annotations. A wide variety of linguistic annotation types is supported through the FoLiA paradigm.
- Text Processing > Linguistic
- annotation
- computational linguistics
- folia
- linguistics
- nlp
- Bsd
- Linux
- Macos
- Python
Created: 2014-01-02
Modified: 2024-07-05
foliadocserve 0.7.8
- KNAW Humanities Cluster & CLST, Radboud University
The FoLiA Document Server is a backend HTTP service to interact with documents in the FoLiA format, a rich XML-based format for linguistic annotation (http://proycon.github.io/folia). It provides an interface to efficiently edit FoLiA documents through the FoLiA Query Language (FQL). [view more]
- Text Processing > Linguistic
- nlp computational_linguistics rest database document server
- Bsd
- Linux
- Macos
- Python
Created: 2015-02-12
Modified: 2024-02-07
FoLiA
Created: 2019-06-08
Modified: 2020-11-16
FoLiA tools 2.5.7
- KNAW Humanities Cluster & CLST, Radboud University
FoLiA-tools contains various Python-based command line tools for working with FoLiA XML (Format for Linguistic Annotation) [view more]
- Annotating
- https://w3id.org/nwo-research-fields#ComputationalLinguisticsandPhilology
- Textual and linguistic corpora
- annotation
- computational linguistics
- folia
- nlp
- search
- Bsd
- Linux
- Macos
- Python
Created: 2011-01-14
Modified: 2024-05-14
FoLiApy 2.5.11
- KNAW Humanities Cluster & CLST, Radboud University
An extensive library for processing FoLiA documents. FoLiA stands for Format for Linguistic Annotation and is a very rich XML-based format used by various Natural Language Processing tools. [view more]
- Annotating
- https://w3id.org/nwo-research-fields#ComputationalLinguisticsandPhilology
- Textual and linguistic corpora
- annotation
- computational linguistics
- folia
- format
- nlp
- xml
- Bsd
- Linux
- Macos
- Python
Created: 2010-05-27
Modified: 2024-03-28
foliautils 0.22
Command-line utilities for working with the Format for Linguistic Annotation (FoLiA). [view more]
- folia
- linguistic annotation
- natural language processing
- nlp
- xml
- Posix
piereling 0.4
- KNAW Humanities Cluster & CLST, Radboud University
Piereling is a webservice and web-application to convert between a variety of document formats, mostly from and to FoLiA XML. It is intended for NLP pipelines. [view more]
- Internet > WWW/HTTP > WSGI > Application
- Text Processing > Linguistic
- webservice nlp computational_linguistics rest folia conversion
- Bsd
- Linux
- Macos
- Python
- Source code
- Go to Piereling (WebApplication) https://webservices.cls.ru.nl/piereling Piereling can convert a wide variety of document formats to FoLiA XML, and from FoLiA XML to various formats. Data conversions such as these provide the groundwork for Natural Language Processing pipelines. It relies on numerous specialised conversion tools in combination with notable third-party tools such as pandoc.
Created: 2019-10-18
Modified: 2023-11-01
Forced Alignment 2 0.3.1
Given a Dutch (NL) speech recording and its transcription, this webservice provides an output file with word alignments.
- alignment
- speech recognition
- Linux
- Website
- Source code
- Go to ForcedAlignment2 (WebApplication) https://webservices.cls.ru.nl/forcedalignment2 Forced Alignment of text and audio files
Created: 2020-03
Frog
Frog 0.33
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. It performs automatic linguistic enrichment such as part of speech tagging, lemmatisation, named entity recognition, shallow parsing, dependency parsing and morphological analysis. All NLP modules are based on TiMBL. [view more]
- Annotating
- Contextualizing
- Linguistics
- Named Entity Recognition
- POS-Tagging
- Segmenting
- Tagging
- Textual and content analysis
- Tree-Tagging
- dependency parsing
- dutch
- lemma
- lemmatisation
- natural language processing
- ner
- nlp
- parser
- part-of-speech tagging
- pos
- shallow parsing
- tagger
- Bsd
- Linux
- Macos
Created: 2011-03-31
Modified: 2023-12-05
Frog-Webservice 2.7
Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch. This is the webservice for it, for both humans and machines. [view more]
- Annotating
- Contextualizing
- Linguistics
- Named Entity Recognition
- POS-Tagging
- Segmenting
- Tagging
- Textual and content analysis
- Tree-Tagging
- clam
- webservice
- rest
- nlp
- computational_linguistics
- Bsd
- Linux
- Macos
- Python
- Website
- Source code
- Go to Frog Webservice (WebApplication) https://webservices.cls.ru.nl/frog Frog is a suite containing a tokeniser, Part-of-Speech tagger, lemmatiser, morphological analyser, shallow parser, and dependency parser for Dutch.
Created: 2022-02-17
Modified: 2023-12-05
python-frog 0.6.10
Python binding to Frog, an NLP suite for Dutch doing part-of-speech tagging, lemmatisation, morphological analysis, named-entity recognition, shallow parsing, and dependency parsing. [view more]
- Annotating
- Contextualizing
- Linguistics
- Named Entity Recognition
- POS-Tagging
- Segmenting
- Tagging
- Textual and content analysis
- Tree-Tagging
- nlp computational_linguistics dutch pos lemmatizer
- Bsd
- Cython
- Linux
- Macos
- Python
Created: 2014-09-07
Modified: 2023-12-05
fusus 0.0.2
- Among, A Community for DH and MS
Workflow for converting Arabic scanned pages into readable text [view more]
- Religion
- Scientific/Engineering > Information Analysis
- Sociology > History
- Text Processing
- Text Processing > Fonts
- Text Processing > Markup
- arabic
- image processing
- islam
- medieval
- OCR
- text
- Macos
- Microsoft
- Posix
- Python
Created: 2020-03-03
Modified: 2023-04-11
g2pservice 0.3.4
Grapheme to Phoneme converter. Input is a list of words (utf8). Choose one of the language options. [view more]
- Internet > WWW/HTTP > WSGI > Application
- Text Processing > Linguistic
- speech
- transcription
- Bsd
- Linux
- Macos
- Python
- Website
- Source code
- Go to Grapheme to Phoneme converter (WebApplication) https://webservices.cls.ru.nl/g2pservice Grapheme to Phoneme (G2P) conversion. Input is a list of words (utf-8, one word per line). The G2P will output the best guess for the phonetic transcription per word. The system is trained on existing dictionaries. Please choose a language option. The system is a demo-version --- please refer to CLST for using G2P for long word lists.
Created: 2019-02-25
Modified: 2023-05-12
GaLAHaD
GaLAHaD 1.2.2
GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents. [view more]
- Analyzing
- Annotating
- Artificial intelligence, expert systems
- Comparing
- Computational linguistics and philology
- Converting
- Enriching
- Lemmatizing
- Linguistics
- Machine Learning
- Merging
- POS-Tagging
- Software for humanities
- Tagging
- Textual and linguistic corpora
- Jvm
- Linux
- Node
Created: 2024-05-31
Modified: 2024-08-30
GaLAHaD Train Battery 1.0.0
Python program for training linguistic annotation taggers based on a configuration file and list of datasets. It prepares the resulting trained models for dockerization and adds relevant metadata. It is tagger software agnostic as long as a simple python shell is built around it. [view more]
- Artificial intelligence, expert systems
- Computational linguistics and philology
- Linguistics
- Linux
- Python
Created: 2024-05-31
Modified: 2024-06-04
Created: 2024-05-31
Modified: 2024-06-05
Created: 2015-01-08
Modified: 2020-07-11
Generale Missieven in Text-Fabric v1.1e
Conversion of the Generale Missieven to Text-Fabric, with a tutorial on how to work with the result.
- corpus-data
- corpus-linguistics
- corpus-processing
- corpus-tools
- dutch
- history
- nlp
- Linux
- Macos
- Python
- Windows
Created: 2020-09-02
Modified: 2024-03-27
Glem 1.3.1
GLEM is a lemmatizer for Ancient Greek. [view more]
- Annotating
- Computational linguistics and philology
- Greek and Latin philology and literature
- ancient greek
- greek
- lemma
- lemmatisation
- natural language processing
- nlp
- Posix
- Website
- Source code
- Go to Glem (WebApplication) https://webservices.cls.ru.nl/glem GLEM is a lemmatizer for Ancient Greek.
Created: 2017-04-09
Modified: 2023-10-05
grlc: the git repository linked data API constructor 1.3.7
grlc, the git repository linked data API constructor, automatically builds Web APIs using SPARQL queries stored in git repositories. [view more]
- linked-data
- linked-data-api
- semantic-web
- sparql
- swagger-ui
Created: 2015-11-13
Modified: 2022-03-21
Created: 2023-07-27
Modified: 2024-06-03
I-Analyzer 5.3.0
I-analyzer is a tool for exploring corpora (large collections of texts). You can use I-analyzer to find relevant documents, or to make visualisations to understand broader trends in the corpus. The interface is designed to be accessible for users of all skill levels.
I-analyzer is primarily intended for academic research and higher education. We focus on data that is relevant for the humanities, but we are open to datasets that are relevant for other fields. [view more]
- corpus research
- data visualization
- elasticsearch
- natural language processing
- text-mining
Created: 2016-09-01
Modified: 2023-12-08
- Go to Ineo - Start using digital humanities resources - Ineo (WebApplication) https://ineo.tools/ Ineo lets you search, browse, find and select digital resources for your research in humanities and social sciences. The platform is already fully functional, but is still being filled with resource content. At the end of 2023, it will offer access to many tools, datasets, workflows, standards and educational material.
Created: 2023-05-17
Modified: 2024-09-06
Created: 2016-04-22
Modified: 2023-11-01
LaMachine 2.28
LaMachine is a unified software distribution for Natural Language Processing. We integrate numerous open-source NLP tools, programming libraries, web-services, and web-applications in a single Virtual Research Environment that can be installed on a wide variety of machines. [view more]
- installer
- natural language processing
- nlp
- python
- software distribution
- Posix
Created: 2015-05-17
Lenticular Lens
lenticular-lens 1.17
Lenticular Lens is a tool that allows users to construct linksets between entities from different Timbuctoo datasets (so-called data alignment or reconciliation). Lenticular Lens tracks the configuration and the algorithms used in the alignment and is also able to report on manual corrections and the amount of manual validation done.
Created: 2019-01-16
Modified: 2022-10-26
Created: 2021-06-16
Modified: 2023-01-25
lenticular-lens-postgresql 1.3
PostgreSQL extension for Lenticular Lens [view more]
Created: 2021-01-22
Modified: 2023-08-16
lingua-cli 0.4.0
- KNAW Humanities Cluster & CLST, Radboud University
Lingua-cli is a command line tool for language classification, using the lingua-rs library. [view more]
Created: 2022-04-16
Modified: 2024-06-10
mbt 3.10
MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech. [view more]
- machine learning
- memory based learning
- natural language processing
- nlp
- tagger
- Bsd
- Linux
- Macos
Media Suite
CLARIAH Media Suite 6.10
The CLARIAH Media Suite is a research environment in which researchers can search, bookmark, annotate and compare items from a number of cultural heritage collections [view more]
- collection analysis
- cultural heritage
- data portal
- faceted search
- scholarly annotation
- virtual workspace
- Linux
Created: 2023-11-21
Modified: 2023-11-21
Nederlab
Created: 2016-07-11
Modified: 2022-01-14
Nederlab Pipeline 0.8.0
A set of workflows for linguistic enrichment of historical Dutch.
- natural language processing
- nlp
- Posix
Created: 2017
Netwerk Digitaal Erfgoed (NDE)
Created: 2020-12-14
Network of Terms GraphQL API
GraphQL API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists).
- Identifying
- graphql
- linked-data
- search
Created: 2020-04-17
Network of Terms Reconciliation API
Reconciliation API for the Network of Terms, a search engine for finding terms in terminology sources (such as thesauri, classification systems and reference lists).
- Identifying
- graphql
- linked-data
- search
Created: 2020-04-17
OpenDutchWordnet
This repository provides a Python module for working with Open Dutch WordNet. It was created using Python 3.4.
Created: 2015-09-01
Modified: 2021-05-11
pagexml-tools 0.5.0
Utility functions for reading PageXML files [view more]
- Scientific/Engineering
- Os
- Python
Created: 2021-05-07
Modified: 2024-03-18
PaQu 1.0.5
With PaQu (Parse & Query) you can search syntactically annotated Dutch-language corpora.
PaQu supports two ways of searching. The first, simple way lets you search for word pairs, optionally together with their syntactic relation. The second, more complex way uses the XPath query language.
A number of syntactically annotated corpora are available in PaQu by default. It is also possible to submit your own texts. These texts are then analysed by the automatic parser and stored, after which you can search your own texts in the same way.
- Linguistics
- nwo:ComputationalLinguisticsandPhilology
- Software for humanities
- Structural Analysis
- Alpino
- Dependency parsing
- SPOD: Syntactic profiler of Dutch
- UD: Universal Dependencies
- XPath
- Docker
- Linux
Created: 2014-05-21
Modified: 2024-04-24
Created: 2022-04-19
Modified: 2024-04-11
Ricgraph - Research in context graph v2.4
Ricgraph, also known as Research in context graph, enables the exploration of researchers, teams, their results, collaborations, skills, projects, and the relations between these items.
Ricgraph can store many types of items into a single graph. These items can be obtained from various systems and from multiple organizations. Ricgraph facilitates reasoning about these items because it infers new relations between items, relations that are not present in any of the separate source systems. It is flexible and extensible, and can be adapted to new application areas.
Throughout this text, we illustrate how Ricgraph works by applying it to the application area of research information.
Motivation
Ricgraph, also known as Research in context graph, is software that is about relations between items. These items can be collected from various source systems and from multiple organizations. We explain how Ricgraph works by applying it to the application area of research information. We show the insights that can be obtained by combining information from various source systems: insights arising from new relations that are not present in any separate source system.
Research information is about anything related to research: research results, the persons in a research team, their collaborations, their skills, projects in which they have participated, as well as the relations between these entities. Examples of research results are publications, data sets, and software.
Example use cases from the application area of research information are:
(1) As a journalist, I want to find researchers with a certain skill and their publications, so that I can interview them for a newspaper article.
(2) As a librarian, I want to enrich my local research information system with research results that are in other systems but not in ours, so that we have a more complete view of research at our university.
(3) As a researcher, I want to find researchers from other universities that have co-authored publications written by the co-authors of my own publications, so that I can read their publications to find out if we share common research interests.
These use cases use different types of information (called items): researchers, skills, publications, etc. Most often, these types of information are not stored in one system, so the use cases may be difficult or time-consuming to answer. However, by using Ricgraph, these use cases (and many others) are easy to answer.
Although this text illustrates Ricgraph in the application area of research information, the principle "relations between items from various source systems" is general, so Ricgraph can be used in other application areas.
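Use case (3) is essentially a two-step walk through a graph of researchers and publications. The sketch below shows that idea in plain Python with made-up data. It is not Ricgraph's data model or API: the `PUBLICATIONS` dictionary and both function names are assumptions for this illustration, and the "from other universities" condition is left out for brevity.

```python
from collections import defaultdict

# Hypothetical harvested items: publication -> set of researchers (made-up data).
PUBLICATIONS = {
    "pub-1": {"alice", "bob"},
    "pub-2": {"bob", "carol"},
    "pub-3": {"carol", "dave"},
    "pub-4": {"alice", "erin"},
}


def coauthor_graph(publications):
    """Link every researcher to the people they share a publication with."""
    graph = defaultdict(set)
    for authors in publications.values():
        for author in authors:
            graph[author] |= authors - {author}
    return graph


def coauthors_of_coauthors(graph, researcher):
    """Return researchers two steps away who are not already direct co-authors."""
    direct = graph.get(researcher, set())
    indirect = set()
    for coauthor in direct:
        indirect |= graph.get(coauthor, set())
    return indirect - direct - {researcher}


if __name__ == "__main__":
    graph = coauthor_graph(PUBLICATIONS)
    print(sorted(coauthors_of_coauthors(graph, "alice")))
```

The relation between the starting researcher and the result is not stored in any single publication record; it only appears once the records are combined in one graph.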
Main contributions of Ricgraph
(1) Ricgraph can store many types of items in a single graph.
(2) Ricgraph harvests multiple source systems into a single graph.
(3) Ricgraph Explorer is the exploration tool for Ricgraph.
(4) Ricgraph facilitates reasoning about items because it infers new relations between items.
(5) Ricgraph can be tailored for an application area.
Read more about Ricgraph
For a gentle introduction to Ricgraph, read the reference publication: Rik D.T. Janssen (2024). Ricgraph: A flexible and extensible graph to explore research in context from various systems. SoftwareX, 26(101736). https://doi.org/10.1016/j.softx.2024.101736
Extensive documentation, publications, videos and source code can be found in the GitHub repository https://github.com/UtrechtUniversity/ricgraph
The website for Ricgraph can be found at https://www.ricgraph.eu [view more]
- Analyzing
- Browsing
- Capturing
- Discovering
- Enriching
- Exploration
- Information Retrieval
- Storing
- Data enrichment
- Data harvesting
- Data linking
- Graph
- Graph database
- Knowledge graph
- Linked data
- Metadata
- Research in context graph
- Ricgraph
- Ricgraph Explorer
- Ricgraph REST API
- Utrecht University
Created: 2023-01-10
Modified: 2024-09-10
Created: 2021-11-18
Modified: 2024-06-18
Created: 2022-11-15
Modified: 2022-12-21
shebanq v4.2z
Exposing the Hebrew Text Database of the ETCBC
- annotation
- etcbc
- etcbc-data
- hebrew
- hebrew-bible
- search-engine
- text-fabric
- Linux
- Macos
- Python
- Selinux
- Windows
- Website
- Source code
- Go to SHEBANQ (WebApplication) https://shebanq.ancient-data.org Search engine for biblical Hebrew based on the Biblia Hebraica Stuttgartensia (Amstelodamensis) database (formerly known as ETCBC, historically known as WIVU)
Created: 2017-10-19
Modified: 2022-10-12
Created: 2021-01-26
Modified: 2024-02-14
STAM
Created: 2023-01-31
Modified: 2024-08-29
Created: 2023-01-03
Modified: 2024-08-29
stam v1.1.0
Stand-off Text Annotation Model (STAM) is a data model for stand-off text annotation where any information on a text is represented as an annotation. This repository contains the model's full specification, extensions, schemas, examples and documentation.
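The stand-off idea can be sketched as follows: the text itself is never modified, and each annotation points at a span of it by character offsets. This is a minimal illustration in plain Python, not the API of any STAM implementation; the class names, the `data` dictionary and the sample text are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TextSelector:
    """A span of the text: `begin` is inclusive, `end` is exclusive."""
    begin: int
    end: int


@dataclass
class Annotation:
    """Information about a span, stored separately from the text."""
    target: TextSelector
    data: dict = field(default_factory=dict)

    def text(self, resource: str) -> str:
        """Resolve the selector against the unmodified text."""
        return resource[self.target.begin:self.target.end]


# The text resource stays untouched; annotations live beside it.
resource = "Hallo wereld"
annotations = [
    Annotation(TextSelector(0, 5), {"pos": "interjection"}),
    Annotation(TextSelector(6, 12), {"pos": "noun"}),
]

for annotation in annotations:
    print(annotation.text(resource), annotation.data)
```

Because annotations only reference offsets, several independent annotation layers can target the same or overlapping spans without conflicting with each other.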
- Annotating
- Textual and content analysis
- Textual and linguistic corpora
- annotation
- linguistics
- stand-off
- text
- text-annotation
- webannotation
Created: 2021-09-09
Modified: 2024-08-23
stam-tools 0.8.0
Command-line tools for working with stand-off annotations on text (STAM)
- Annotating
- Textual and content analysis
- Textual and linguistic corpora
- annotation
- linguistics
- nlp
- standoff
- text-processing
Created: 2023-03-21
Modified: 2024-08-29
T-Scan 0.10.0
T-Scan is an analysis tool that assesses the complexity of Dutch texts. It is based on original work by Rogier Kraf.
- dutch
- feature extraction
- natural language processing
- nlp
- readability
- Posix
- Website
- Source code
- Go to T-scan (WebApplication) https://tscan.hum.uu.nl/tscan T-Scan is an analysis tool for Dutch text, mainly focusing on text complexity. It was initially conceptualized by Rogier Kraf and Henk Pander Maat. Rogier Kraf also programmed the first versions. From 2012 on, Henk Pander Maat supervised the development of the extended versions of the tool. These versions were programmed by Maarten van Gompel, Ko van der Sloot, Martijn van der Klis, Sheean Spoel and Luka van der Plas.
Created: 2012-09-12
text-fabric 12.5.3
Processor and browser for annotated text corpora
- Archiving
- Bible studies
- Commenting
- Computational linguistics and philology
- Highlighting
- Information Retrieval
- Interpreting
- Religious studies and theology
- Rhetorical Analysis
- Sharing
- Structural Analysis
- Textual and content analysis
- Textual and linguistic corpora
- akkadian
- babylonian
- bible
- cuneiform
- database
- graph
- greek
- hebrew
- linguistics
- peshitta
- quran
- syriac
- text
- uruk
- Javascript
- Macos
- Microsoft
- Posix
- Python
Created: 2017-10-19
Modified: 2024-07-05
Created: 2022-03-08
Modified: 2024-09-13
TextRepo
Created: 2019-08-07
Modified: 2022-03-15
Created: 2021-03-08
Modified: 2022-04-08
TICCL & PICCL
Created: 2015
TICCLTools 0.10
TicclTools is a collection of programs to process data files; together they constitute the bulk of TICCL: Text Induced Corpus-Cleanup. This software consists of individual modules that are invoked by the pipeline system PICCL.
- natural language processing
- nlp
- normalization
- ocr
- Posix
Created: 2015
TiMBL
python3-timbl 2020.6.8
Python 3 language binding for the Tilburg Memory-Based Learner
- Scientific/Engineering
- Text Processing > Linguistic
- k-nearest-neighbours
- knn
- machine-learning
- python
- timbl
- Bsd
- Linux
- Macos
- Python
Created: 2013-02-11
Modified: 2020-06-08
TiMBL 6.9
TiMBL is an open source software package implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suitable for symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms have in common that they store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases.
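The idea behind k-nearest neighbor classification with feature weighting on symbolic features can be sketched in a few lines. This is a simplified illustration in plain Python, not TiMBL's implementation or API: it assumes an overlap distance in which each mismatching feature costs its information gain, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(instances, labels, feature_index):
    """Reduction in class entropy obtained by knowing one feature's value."""
    by_value = {}
    for instance, label in zip(instances, labels):
        by_value.setdefault(instance[feature_index], []).append(label)
    remainder = sum(
        len(subset) / len(labels) * entropy(subset) for subset in by_value.values()
    )
    return entropy(labels) - remainder


def classify(instances, labels, query, k=1):
    """Majority vote among the k stored instances closest to `query`."""
    weights = [information_gain(instances, labels, i) for i in range(len(query))]

    def distance(instance):
        # Weighted overlap: a mismatching feature adds its weight.
        return sum(w for w, a, b in zip(weights, instance, query) if a != b)

    ranked = sorted(zip(instances, labels), key=lambda pair: distance(pair[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set held in memory: (weather, temperature) -> activity.
train = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild")]
labels = ["out", "out", "in", "in"]

print(classify(train, labels, ("sunny", "cold")))
```

In the toy data only the first feature predicts the class, so it receives all the weight and the unseen value `"cold"` in the second feature does not affect the outcome.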
- decision tree
- k-nearest neighbours
- knn
- machine learning
- memory based learning
- natural language processing
- nlp
- Bsd
- Linux
- Macos
Created: 1998
Created: 2012-08-15
Modified: 2024-03-01
Ucto
python-ucto 0.6.8
- KNAW Humanities Cluster & CLST, Radboud University
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost any Natural Language Processing task, yet it is not always as trivial a task as it appears to be. This binding makes the power of the ucto tokeniser available to Python. Ucto itself is a regular-expression based, extensible, and advanced tokeniser written in C++ (https://languagemachines.github.io/ucto).
- Text Processing > Linguistic
- tokenizer tokenization tokeniser tokenisation nlp computational_linguistics ucto
- Bsd
- Cython
- Linux
- Macos
- Python
Created: 2014-05-21
Modified: 2024-09-12
ucto 0.34
Ucto tokenizes text files: it separates words from punctuation, and splits sentences. This is one of the first tasks for almost any Natural Language Processing application. Ucto also offers several other basic preprocessing steps, such as changing case, all of which you can use to make your text suited for further processing such as indexing, part-of-speech tagging, or machine translation.
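What separating words from punctuation and splitting sentences involves can be sketched with a single regular expression. This is a deliberately naive illustration in plain Python, not Ucto's rule sets or its API; the pattern, the `tokenize` function and its `lowercase` option are assumptions made for this example.

```python
import re

# A word (optionally with internal hyphens or apostrophes) or one punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")

# Naive assumption: these tokens always end a sentence.
SENTENCE_END = {".", "!", "?"}


def tokenize(text, lowercase=False):
    """Return a list of sentences, each a list of tokens."""
    sentences, current = [], []
    for match in TOKEN_PATTERN.finditer(text):
        token = match.group()
        current.append(token.lower() if lowercase else token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(current)
    return sentences


for sentence in tokenize("Hello, world! How are you?"):
    print(sentence)
```

A rule this simple wrongly splits abbreviations such as "e.g." into separate sentences, which illustrates why tokenisation is less trivial than it appears and why a dedicated, language-aware tokeniser is useful.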
- Annotating
- Linguistics
- Tagging
- Textual and content analysis
- natural language processing
- nlp
- tokenization
- tokenizer
- Bsd
- Linux
- Macos
Created: 2011-03-27
Modified: 2023-02-22
Ucto-Webservice 2.5.2
- KNAW Humanities Cluster & CLST, Radboud University
Ucto is a rule-based tokeniser for multiple languages. This is the webservice for it, for both humans and machines.
- Annotating
- Linguistics
- Tagging
- Textual and content analysis
- clam webservice rest nlp computational_linguistics
- Bsd
- Linux
- Macos
- Python
- Website
- Source code
- Go to Ucto Webservice (WebApplication) https://webservices.cls.ru.nl/ucto Ucto is a unicode-compliant tokeniser. It takes input in the form of one or more untokenised texts, and subsequently tokenises them. Several languages are supported, but the software is extensible to other languages.
Created: 2022-04-08
Modified: 2024-03-14
udpipe-service 4.10
A REST service for an R / udpipe based tokenizer, lemmatizer, POS tagger and dependency parser.
See https://bitbucket.org/fryske-akademy/udpipe for (Docker) setup.
Created: 2020-11-18
Modified: 2023-11-26
vocabulary-recommender 2.0.0
A generic linked data vocabulary recommender library that provides recommendation functions for various backends.
- lod
- namespace
- recommender
- vocabulary
Created: 2022-09-05
Modified: 2022-12-23