GaLAHaD

GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.

Provided tools & services

GaLAHaD API

Note: No URL was registered for this service (yet)
Type
  • Web API
Documentation
Service Provider
Input data
Type
TextDigitalDocument
Encoding Format
https://github.com/newsreader/NAF
Type
TextDigitalDocument
Encoding Format
text/tab-separated-values
Type
TextDigitalDocument
Encoding Format
https://universaldependencies.org/format.html
Type
TextDigitalDocument
Encoding Format
application/folia+xml
Type
TextDigitalDocument
Encoding Format
application/tei+xml
Type
TextDigitalDocument
Encoding Format
text/plain
Output data
Type
CreativeWork
Encoding Format
application/zip
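
The input and output encoding formats listed above can be summarised as a small lookup table. The sketch below is illustrative only: the file-extension keys and the `guess_encoding_format` helper are assumptions made for this example and are not part of the GaLAHaD API, whose endpoints are not described on this page.

```python
from pathlib import Path
from typing import Optional

# Encoding formats copied from the "Input data" list above.
# The file extensions used as keys are assumptions for illustration.
INPUT_FORMATS = {
    ".naf": "https://github.com/newsreader/NAF",
    ".tsv": "text/tab-separated-values",
    ".conllu": "https://universaldependencies.org/format.html",
    ".folia.xml": "application/folia+xml",
    ".tei.xml": "application/tei+xml",
    ".txt": "text/plain",
}

# Encoding format listed under "Output data" above.
OUTPUT_FORMAT = "application/zip"


def guess_encoding_format(filename: str) -> Optional[str]:
    """Return the listed encoding format for a file name, or None if unknown."""
    name = Path(filename).name.lower()
    # Try longer suffixes first so ".folia.xml" is matched as a whole.
    for suffix in sorted(INPUT_FORMATS, key=len, reverse=True):
        if name.endswith(suffix):
            return INPUT_FORMATS[suffix]
    return None
```

A caller could use such a table to reject unsupported files before uploading anything, for example by checking that `guess_encoding_format(path)` does not return `None`.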

GaLAHaD client Docker image

Type
  • Software Image

GaLAHaD proxy

Type
  • Server Application

GaLAHaD proxy Docker image

Type
  • Software Image

GaLAHaD server Docker image

Type
  • Software Image

Tool suite: GaLAHaD

The following closely related tools are in a tool suite together with GaLAHaD:

  • Command-line Application
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

GaLAHaD Train Battery 1.1.0

A Python program for training linguistic annotation taggers based on a configuration file and a list of datasets. It prepares the resulting trained models for dockerization and adds relevant metadata. It is agnostic to the tagger software, as long as a simple Python shell is built around it.
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Linguistics
  • Machine Learning
  • Linux
  • Python
Created: 2024-05-31
Modified: 2025-04-07
  • Command-line Application
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.

int-pie 1.1.0

  •   Enrique Manjavacas
  •   Mike Kestemont
  •   Thibault Clerice
The PIE tagger with custom modifications by the Dutch Language Institute (INT).
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • POS-Tagging
  • Tagging
  • Linux
  • Python
Created: 2024-05-31
Modified: 2025-04-07

Citation

You can cite this software using the following citation generated from its metadata:

(2025) GaLAHaD 1.2.8. Instituut voor de Nederlandse taal.

Logs & Reviews

Name
Automatic software metadata validation report for GaLAHaD 1.2.8
Author
  • codemetapy validator using software.ttl
Date
2026-02-07 03:14:46
Review
Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems

Validation of GaLAHaD 1.2.8 was successful (score=4/5), but there are some remarks which you may or may not want to address:

1. Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)
Rating
★ ★ ★ ★ ☆
There was 1 error harvesting this metadata; please inspect the log.
(log file starts at Sat Feb  7 03:14:30 UTC 2026)

[harvester info] --> Processing galahad (https://github.com/instituutnederlandsetaal/galahad) [Sat Feb  7 03:14:30 UTC 2026]

[harvester info] Git updating cached clone of https://github.com/instituutnederlandsetaal/galahad...

[harvester info] Found release 1.2.8

[harvester info] Using '1.2.8'

[harvester info] Git reference: 1.2.8

[harvester info] Scanning directory /tmp/codemeta-harvester.cache/galahad for harvestable resources...

[harvester info] found codemeta-harvest.json for galahad (md5sum 6a1e01599a462c3e65c902c213911ef8); values in here take precendence over (override) those in later detection stages

[harvester info] Looking for license....

[harvester info] Found license Apache-2.0

[harvester info] Getting contributors from git...

[harvester info] Getting top contributor from git...

[harvester info] Git top contributor Vincent Prins <vincent.prins@ivdnt.org> will be assigned as author (and maintainer) if none are found in the metadata

[harvester info] Extracting last and first commit date from git log....

[harvester info] Date created: 2024-05-31T16:59:02Z+0200, date modified: 2025-04-09T11:37:17Z+0200

[harvester info] Querying Github/GitLab API (https://github.com/instituutnederlandsetaal/galahad)

[harvester info] Adding URL for found README: readme.md

[harvester info] Found releaseNotes

[harvester info] Querying Zenodo API for DOI (access token provided)...

[harvester info] Looking for TRL information in readme.md...

[harvester info] Looking for repostatus information in readme.md...

[harvester info] Looking for continuous integration information in readme.md...

[harvester info] Found CI https://github.com/INL/Galahad/actions/

[harvester info] Looking for documentation links in readme.md...

[harvester info] Falling back to git tag (1.2.8) if no version number is specified...

[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...

[harvester info] Inferred repostatus https://www.repostatus.org/#inactive

[harvester info] Looking for repostatus information in readme.md in master branch...

[harvester info] Setting group GaLAHaD

[harvester info] Reconciliating: codemetapy  --baseuri https://tools.clariah.nl --baseuri https://tools.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "galahad" --codeRepository "https://github.com/instituutnederlandsetaal/galahad" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/32-contributors.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json 

-- begin log --

/usr/lib/python3.12/site-packages/pyshacl/extras/__init__.py:6: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.

  import pkg_resources

Passed 12 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/32-contributors.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json', 'json')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.clariah.nl/galahad

Processing source #1 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 2

Processing source #2 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 3

Processing source #3 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json

    Found main resource with URI https://tools.clariah.nl/galahad.topcontributor/snapshot

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 8 new triples, total is now 10

Processing source #4 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 2 new triples, total is now 12

Processing source #5 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 13

Processing source #6 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json

    Found main resource with URI https://tools.clariah.nl/galahad/snapshot

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 27 new triples, total is now 39

Processing source #7 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] overriding old http://schema.org/dateCreated (2024-05-31T14:57:58Z -> 2024-05-31T16:59:02Z+0200)

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] overriding old http://schema.org/dateModified (2026-02-06T06:08:39Z -> 2025-04-09T11:37:17Z+0200)

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 2 new triples, total is now 39

Processing source #8 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/32-contributors.galahad.codemeta.json

    Found main resource with URI https://tools.clariah.nl/galahad.contributors/snapshot

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 8 new triples, total is now 40

Processing source #9 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] overriding old http://schema.org/license (http://spdx.org/licenses/Apache-2.0 -> Apache-2.0)

[CODEMETA CORRECTION (https://tools.clariah.nl/galahad)] automatically converting license to spdx URI

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 40

Processing source #10 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 41

Processing source #11 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/dateCreated (2024-05-31T16:59:02Z+0200 -> 2024-05-31)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/author (https://tools.clariah.nl/stub/H3fa17592f34c2df9 -> https://tools.clariah.nl/stub/H60fccf409fa6196c)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/contributor (https://tools.clariah.nl/stub/H3fa17592f34c2df9 -> https://tools.clariah.nl/stub/H-6150866bf929e008)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/codeRepository (https://github.com/instituutnederlandsetaal/galahad -> git+https://github.com/INL/galahad.git)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/description ("Galahad". Goal: enable linguists to experiment with different taggers and use the result in other INT products  -> GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/downloadUrl (https://github.com/instituutnederlandsetaal/galahad/archive/refs/tags/1.2.8.zip -> https://github.com/INL/galahad)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/name (galahad -> GaLAHaD)

[CODEMETA COMPOSITION (galahad)] overriding old https://codemeta.github.io/terms/continuousIntegration (https://github.com/INL/Galahad/actions/ -> https://github.com/INL/galahad/actions)

[CODEMETA COMPOSITION (galahad)] overriding old https://codemeta.github.io/terms/developmentStatus (https://www.repostatus.org/#inactive -> https://www.repostatus.org/#active)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/producer (https://tools.clariah.nl/org/dutch-language-institute -> https://www.ivdnt.org)

[CODEMETA COMPOSITION (galahad)] overriding old https://codemeta.github.io/terms/issueTracker (https://github.com/instituutnederlandsetaal/galahad/issues -> https://github.com/INL/galahad/issues)

[CODEMETA COMPOSITION (galahad)] overriding old https://codemeta.github.io/terms/readme (https://github.com/instituutnederlandsetaal/galahad/blob/1.2.8//readme.md -> https://github.com/INL/Galahad/blob/release/readme.md)

[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/releaseNotes (https://github.com/instituutnederlandsetaal/galahad/releases/tag/1.2.8 -> https://github.com/INL/Galahad/releases)

[CODEMETA COMPOSITION (galahad)] processed 301 new triples, total is now 327

Processing source #12 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA COMPOSITION (galahad)] processed 1 new triples, total is now 328

Remapping URI to (possibly) new identifier and version component: https://tools.clariah.nl/galahad -> https://tools.clariah.nl/galahad/1.2.8

[CODEMETA VALIDATION (galahad)] done

VALIDATION https://tools.clariah.nl/galahad/1.2.8 #1: Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)

-- end log --

[harvester info] Output written to /tmp/out/galahad.codemeta.json

[harvester info] Harvesting remote service URL https://portal.clarin.ivdnt.org/galahad for galahad: codemetapy  --baseuri https://tools.clariah.nl --baseuri https://tools.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/galahad.codemeta.json" "/tmp/out/galahad.codemeta.json" "https://portal.clarin.ivdnt.org/galahad"

-- begin log --

/usr/lib/python3.12/site-packages/pyshacl/extras/__init__.py:6: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.

  import pkg_resources

Passed 2 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/out/galahad.codemeta.json', 'json'), ('https://portal.clarin.ivdnt.org/galahad', 'web')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.clariah.nl/galahad

Processing source #1 of 2

Parsing json-ld file from /tmp/out/galahad.codemeta.json

    Found main resource with URI https://tools.clariah.nl/galahad/1.2.8

    Injected (possibly temporary) URI https://tools.clariah.nl/galahad

[CODEMETA 2 TO 3] Updating contIntegration -> continuousIntegration

[CODEMETA COMPOSITION (galahad)] processed 956 new triples, total is now 956

Processing source #2 of 2

Fallback: Obtaining metadata from remote URL https://portal.clarin.ivdnt.org/galahad

    Service replied with content-type text/html

Traceback (most recent call last):

  File "/usr/bin/codemetapy", line 8, in <module>

    sys.exit(main())

             ^^^^^^

  File "/usr/lib/python3.12/site-packages/codemeta/codemeta.py", line 339, in main

    g, res, args, contextgraph = build(**args.__dict__)

                                 ^^^^^^^^^^^^^^^^^^^^^^

  File "/usr/lib/python3.12/site-packages/codemeta/codemeta.py", line 692, in build

    for targetres in codemeta.parsers.web.parse_web(

                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/usr/lib/python3.12/site-packages/codemeta/parsers/web.py", line 132, in parse_web

    raise MiddlewareObstructionException(

codemeta.parsers.web.MiddlewareObstructionException: Unable to extract metadata from https://portal.clarin.ivdnt.org/galahad because it immediately redirects to an external (SSO) login page rather than a proper landing page

-- end log --

[harvester error] Failed to obtain or process metadata from remote service URL https://portal.clarin.ivdnt.org/galahad for galahad

[harvester info] <-- Finished processing galahad (https://github.com/instituutnederlandsetaal/galahad) [Sat Feb  7 03:14:50 UTC 2026]
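
The "overriding old" lines in the log above show sources being processed in order, with a value from a later source replacing the value set by an earlier one. The sketch below illustrates that precedence rule on plain dictionaries. It is not codemetapy's implementation (which, per the log, composes RDF triples); the `compose` function and the dictionary layout are assumptions for illustration, and only the example values are taken from the log.

```python
from typing import Any, Dict, List, Tuple


def compose(sources: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], List[str]]:
    """Merge metadata dicts in order; later sources override earlier ones."""
    merged: Dict[str, Any] = {}
    overrides: List[str] = []
    for source in sources:
        for key, value in source.items():
            # Record a message whenever an existing value is replaced.
            if key in merged and merged[key] != value:
                overrides.append(f"overriding old {key} ({merged[key]} -> {value})")
            merged[key] = value
    return merged, overrides


# Values as detected by earlier stages in the log (hypothetical layout).
earlier_stages = {
    "name": "galahad",
    "developmentStatus": "https://www.repostatus.org/#inactive",
}
# Values supplied by codemeta-harvest.json, processed later in the log.
harvest_json = {
    "name": "GaLAHaD",
    "developmentStatus": "https://www.repostatus.org/#active",
}

merged, overrides = compose([earlier_stages, harvest_json])
for message in overrides:
    print(message)
```

Under this rule the order of the source list is the whole precedence policy: placing the hand-written `codemeta-harvest.json` values late in the list is what lets them win over automatically detected ones.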

        

Metadata Properties

Version
1.2.8 (release notes)
Interface types
  • Server Application
  • Software Image
  • Web API
  • Web Application
Software website
Source code repository
https://github.com/instituutnederlandsetaal/galahad
Category
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Comparing
  • Computational linguistics and philology
  • Converting
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • Merging
  • POS-Tagging
  • Software for humanities
  • Tagging
  • Textual and linguistic corpora
Keywords
  • conll-u
  • conllu
  • evaluation
  • evaluation-metrics
  • folia
  • kotlin
  • linguistics
  • naf
  • tagger
  • tagging
  • tei
  • tei-xml
Development Status
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.
Issue Tracker (Support)
https://github.com/INL/galahad/issues
Documentation
License
Apache-2.0
Author(s)
Maintainer(s)
Contributor(s)
Producer
https://www.ivdnt.org
Programming Language
  • JavaScript
  • Kotlin
  • TypeScript
Continuous Integration Tests
None
Runtime Platform
  • JVM
  • Node
Operating System
  • Linux
Software dependencies
  • @typescript-eslint/parser
  • mutationobserver-shim
  • springdoc-openapi-starter-webmvc-ui
  • @vue/eslint-config-typescript
  • node-sass
  • js-yaml
  • @vitejs/plugin-vue
  • axios
  • @typescript-eslint/eslint-plugin
  • kotlinx-serialization-json-jvm
  • klaxon
  • vue-slider-component
  • kotlin-reflect
  • kotlinx-coroutines-core-jvm
  • spring-boot-starter-web
  • vue-router
  • snakeyaml
  • @types/js-yaml
  • safe-buffer
  • eslint-plugin-vue
  • uuid
  • vue
  • json-loader
  • kotlin-stdlib
  • content-disposition
  • vite
  • log4j-api-kotlin
  • pinia
  • eslint
  • sass
  • buffer
  • @types/uuid
  • typescript
  • @types/jest
  • spring-boot-devtools
  • @rollup/plugin-yaml
Metadata validation
★ ★ ★ ★ ☆
Created
2024-05-31
Last modified
2025-04-09 11:37:17 +0200