int-pie

The PIE tagger with custom modifications by the Dutch Language Institute (INT).

Provided tools & services

evaluate

Type
  • Command-line Application
Service Provider
Input data
Type
TextDigitalDocument
Encoding Format
text/tab-separated-values

tag

Type
  • Command-line Application
Service Provider
Input data
Type
TextDigitalDocument
Encoding Format
text/plain
Output data
Type
TextDigitalDocument
Encoding Format
text/tab-separated-values

train

Type
  • Command-line Application
Service Provider
Input data
Type
TextDigitalDocument
Encoding Format
text/tab-separated-values
Type
TextDigitalDocument
Encoding Format
application/json
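
The tab-separated inputs and outputs listed above can be read with Python's standard `csv` module. The following is a minimal sketch, not the tool's own reader: the function name `read_tsv` and the `token`/`lemma`/`pos` columns in the sample are assumptions made for illustration, since the actual column layout is not documented on this page.

```python
import csv
import io


def read_tsv(stream):
    """Yield each non-empty row of a tab-separated stream as a list of fields."""
    # QUOTE_NONE keeps quote characters in tokens as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # csv.reader returns [] for blank lines; skip them
            yield row


# Hypothetical sample data; the column names are assumed, not taken from int-pie.
sample = "token\tlemma\tpos\nhuys\thuis\tNOUN\n\ndat\tdat\tPRON\n"
rows = list(read_tsv(io.StringIO(sample)))
```

When reading from disk, opening the file with `newline=""` lets the `csv` module handle line endings itself.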

Tool suite: GaLAHaD

The following closely related tools are in a tool suite together with int-pie:

  • Server Application
  • Software Image
  • Web API
  • Web Application
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

GaLAHaD 1.2.8

GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Comparing
  • Computational linguistics and philology
  • Converting
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • Merging
  • POS-Tagging
  • Software for humanities
  • Tagging
  • Textual and linguistic corpora
  • conll-u
  • conllu
  • evaluation
  • evaluation-metrics
  • folia
  • kotlin
  • linguistics
  • naf
  • tagger
  • tagging
  • tei
  • tei-xml
  • Jvm
  • Linux
  • Node
Created: 2024-05-31
Modified: 2025-04-09
  • Command-line Application
  • 6 - Late prototype: Technology demonstrated in target setting, end-users adopt it for testing purposes.
  • Active: The project has reached a stable, usable state and is being actively developed.

GaLAHaD Train Battery 1.1.0

Python program for training linguistic annotation taggers based on a configuration file and a list of datasets. It prepares the resulting trained models for dockerization and adds relevant metadata. It is agnostic to the tagger software, as long as a simple Python shell is built around the tagger.
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Linguistics
  • Machine Learning
  • Linux
  • Python
Created: 2024-05-31
Modified: 2025-04-07

Citation

You can cite this software using the following citation generated from its metadata:

Manjavacas, Enrique; Kestemont, Mike; Clerice, Thibault (2025) int-pie 1.1.0. Instituut voor de Nederlandse taal.
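
As an illustration of how such a citation could be assembled from metadata fields, here is a minimal sketch. The function `format_citation`, the semicolon separator and the dictionary layout are assumptions; the catalogue's actual citation generator is not shown on this page.

```python
def format_citation(authors, year, name, version, producer):
    """Join metadata fields into a single-line citation string."""
    # Render each author as "Family, Given" and separate authors with semicolons.
    names = "; ".join(f"{a['familyName']}, {a['givenName']}" for a in authors)
    return f"{names} ({year}) {name} {version}. {producer}."


# Field values are taken from this page; the dict structure is hypothetical.
authors = [
    {"givenName": "Enrique", "familyName": "Manjavacas"},
    {"givenName": "Mike", "familyName": "Kestemont"},
    {"givenName": "Thibault", "familyName": "Clerice"},
]
citation = format_citation(
    authors, 2025, "int-pie", "1.1.0", "Instituut voor de Nederlandse taal"
)
```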

Logs & Reviews

Name
Automatic software metadata validation report for int-pie 1.1.0
Author
  • codemetapy validator using software.ttl
Date
2026-02-07 03:19:34
Review
Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems

Validation of int-pie 1.1.0 was successful (score=3/5), but there are some warnings which should be addressed:

1. Info: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)
2. Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)
3. Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)
4. Info: The funder *SHOULD* be acknowledged (This is missing in the metadata)
Rating
★ ★ ★ ☆ ☆
There was 1 error harvesting this metadata; please inspect the log.
(log file starts at Sat Feb  7 03:19:16 UTC 2026)
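
The one harvesting error in the log below is a `FileNotFoundError`: `setup.py` opens `int_pie/__version__.py`, which does not exist in the harvested checkout, so `egg_info` generation fails. One possible way to make such a version lookup tolerant of a missing file is sketched here. The helper `read_version` and its `default` parameter are hypothetical and do not come from the int-pie repository.

```python
import os


def read_version(here, project_slug, default="0.0.0"):
    """Return __version__ from <here>/<project_slug>/__version__.py, or a default."""
    path = os.path.join(here, project_slug, "__version__.py")
    namespace = {}
    try:
        with open(path, encoding="utf-8") as f:
            # The file is expected to contain an assignment to __version__.
            exec(f.read(), namespace)
    except FileNotFoundError:
        # Hypothetical fallback; the setup.py in the traceback raises here instead.
        return default
    return namespace.get("__version__", default)
```

Whether a silent fallback is desirable is a design choice: failing loudly, as the current `setup.py` does, makes a missing version file visible immediately.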

[harvester info] --> Processing int-pie (https://github.com/instituutnederlandsetaal/int-pie) [Sat Feb  7 03:19:16 UTC 2026]

[harvester info] Git updating cached clone of https://github.com/instituutnederlandsetaal/int-pie...

[harvester info] Found release 1.1.0

[harvester info] Using '1.1.0'

[harvester info] Git reference: 1.1.0

[harvester info] Scanning directory /tmp/codemeta-harvester.cache/int-pie for harvestable resources...

[harvester info] found codemeta-harvest.json for int-pie (md5sum f53f916722cf625fe77622ff6d0d3f46); values in here take precendence over (override) those in later detection stages

[harvester info] found python setup for int-pie, converting to codemeta

-- begin log --

/usr/lib/python3.12/site-packages/pyshacl/extras/__init__.py:6: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.

  import pkg_resources

No input files specified, but found python project (setup.py) in current dir, using that...

Generating egg_info

Traceback (most recent call last):

  File "/tmp/codemeta-harvester.cache/int-pie/setup.py", line 52, in <module>

    with open(os.path.join(here, project_slug, '__version__.py')) as f:

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/codemeta-harvester.cache/int-pie/int_pie/__version__.py'

Traceback (most recent call last):

  File "/usr/bin/codemetapy", line 8, in <module>

    sys.exit(main())

             ^^^^^^

  File "/usr/lib/python3.12/site-packages/codemeta/codemeta.py", line 339, in main

    g, res, args, contextgraph = build(**args.__dict__)

                                 ^^^^^^^^^^^^^^^^^^^^^^

  File "/usr/lib/python3.12/site-packages/codemeta/codemeta.py", line 596, in build

    raise Exception(

Exception: Could not generate egg_info (is python3 pointing to the right interpreter?)

-- end log --

[harvester error] python setup.py to codemeta conversion failed for int-pie (codemetapy failed)

[harvester info] Looking for license....

[harvester info] Found license MIT

[harvester info] Getting contributors from git...

[harvester info] Getting top contributor from git...

[harvester info] Git top contributor Enrique Manjavacas <enrique.manjavacas@gmail.com> will be assigned as author (and maintainer) if none are found in the metadata

[harvester info] Extracting last and first commit date from git log....

[harvester info] Date created: 2018-04-25T15:52:42Z+0200, date modified: 2025-04-07T16:43:42Z+0200

[harvester info] Querying Github/GitLab API (https://github.com/instituutnederlandsetaal/int-pie)

[harvester info] Adding URL for found README: README.md

[harvester info] Found releaseNotes

[harvester info] Querying Zenodo API for DOI (access token provided)...

[harvester info] Looking for TRL information in README.md...

[harvester info] Mapping repostatus https://www.repostatus.org/#wip to trl:Stage3Experimental

[harvester info] Looking for repostatus information in README.md...

[harvester info] Looking for continuous integration information in README.md...

[harvester info] Looking for documentation links in README.md...

[harvester info] Falling back to git tag (1.1.0) if no version number is specified...

[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...

[harvester info] Inferred repostatus https://www.repostatus.org/#inactive

[harvester info] Looking for repostatus information in README.md in master branch...

[harvester info] Setting group GaLAHaD

[harvester info] Reconciliating: codemetapy  --baseuri https://tools.clariah.nl --baseuri https://tools.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "int-pie" --codeRepository "https://github.com/instituutnederlandsetaal/int-pie" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-version.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/90-authors.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/43-releasenotes.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/40-gitapi.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/32-contributors.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/29-license.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/11-trl.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/10-harvest.int-pie.codemeta.json /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.int-pie.codemeta.json 

-- begin log --

/usr/lib/python3.12/site-packages/pyshacl/extras/__init__.py:6: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.

  import pkg_resources

Passed 12 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-version.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/99-repostatus.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/90-authors.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/43-releasenotes.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/41-readme.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/40-gitapi.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/32-contributors.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/29-license.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/11-trl.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/10-harvest.int-pie.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/04-applicationSuite.int-pie.codemeta.json', 'json')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.clariah.nl/int-pie

Processing source #1 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-version.int-pie.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 1 new triples, total is now 2

Processing source #2 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.int-pie.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 1 new triples, total is now 3

Processing source #3 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/90-authors.int-pie.codemeta.json

    Found main resource with URI https://tools.clariah.nl/int-pie.topcontributor/snapshot

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 8 new triples, total is now 10

Processing source #4 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/43-releasenotes.int-pie.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 2 new triples, total is now 12

Processing source #5 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.int-pie.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 1 new triples, total is now 13

Processing source #6 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/40-gitapi.int-pie.codemeta.json

    Found main resource with URI https://tools.clariah.nl/int-pie/snapshot

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 14 new triples, total is now 26

Processing source #7 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/39-gitdate.int-pie.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] overriding old http://schema.org/dateCreated (2024-04-19T07:44:21Z -> 2018-04-25T15:52:42Z+0200)

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] overriding old http://schema.org/dateModified (2025-04-07T14:45:10Z -> 2025-04-07T16:43:42Z+0200)

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 2 new triples, total is now 26

Processing source #8 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/32-contributors.int-pie.codemeta.json

    Found main resource with URI https://tools.clariah.nl/int-pie.contributors/snapshot

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 26 new triples, total is now 47

Processing source #9 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/29-license.int-pie.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] overriding old http://schema.org/license (http://spdx.org/licenses/MIT -> MIT)

[CODEMETA CORRECTION (https://tools.clariah.nl/int-pie)] automatically converting license to spdx URI

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 1 new triples, total is now 47

Processing source #10 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/11-trl.int-pie.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (https://tools.clariah.nl/int-pie)] processed 1 new triples, total is now 48

Processing source #11 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/10-harvest.int-pie.codemeta.json

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA 2 TO 3] Updating targetProduct -> isSourceCodeOf

[CODEMETA COMPOSITION (int-pie)] overriding old http://schema.org/dateCreated (2018-04-25T15:52:42Z+0200 -> 2024-05-31)

[CODEMETA COMPOSITION (int-pie)] overriding old http://schema.org/author (https://tools.clariah.nl/stub/H6692f51fb1dfc237 -> https://tools.clariah.nl/stub/H-271682d99591239c)

[CODEMETA COMPOSITION (int-pie)] overriding old http://schema.org/contributor (https://tools.clariah.nl/stub/H4a465f5fc7bdc4e3 -> https://tools.clariah.nl/stub/H2efdc8a9f3282a20)

[CODEMETA COMPOSITION (int-pie)] overriding old http://schema.org/codeRepository (https://github.com/instituutnederlandsetaal/int-pie -> git+https://github.com/INL/int-pie.git)

[CODEMETA COMPOSITION (int-pie)] overriding old http://schema.org/description (https://github.com/emanjavacas/pie with custom modifications -> The PIE tagger with custom modifications by the Dutch Language Institute (INT).)

[CODEMETA COMPOSITION (int-pie)] overriding old http://schema.org/downloadUrl (https://github.com/instituutnederlandsetaal/int-pie/archive/refs/tags/1.1.0.zip -> https://github.com/INL/int-pie)

[CODEMETA COMPOSITION (int-pie)] overriding old https://codemeta.github.io/terms/developmentStatus (https://www.repostatus.org/#inactive -> https://www.repostatus.org/#active)

[CODEMETA COMPOSITION (int-pie)] overriding old https://codemeta.github.io/terms/developmentStatus (https://w3id.org/research-technology-readiness-levels#Stage3Experimental -> https://w3id.org/research-technology-readiness-levels#Level8Complete)

[CODEMETA COMPOSITION (int-pie)] overriding old http://schema.org/producer (https://tools.clariah.nl/org/dutch-language-institute -> https://www.ivdnt.org)

[CODEMETA COMPOSITION (int-pie)] overriding old https://codemeta.github.io/terms/issueTracker (https://github.com/instituutnederlandsetaal/int-pie/issues -> https://github.com/INL/int-pie/issues)

[CODEMETA COMPOSITION (int-pie)] overriding old https://codemeta.github.io/terms/readme (https://github.com/instituutnederlandsetaal/int-pie/blob/1.1.0//README.md -> https://github.com/INL/int-pie/blob/release/README.md)

[CODEMETA COMPOSITION (int-pie)] overriding old http://schema.org/releaseNotes (https://github.com/instituutnederlandsetaal/int-pie/releases/tag/1.1.0 -> https://github.com/INL/int-pie/releases)

[CODEMETA COMPOSITION (int-pie)] processed 214 new triples, total is now 247

Processing source #12 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.int-pie.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/int-pie

[CODEMETA COMPOSITION (int-pie)] processed 1 new triples, total is now 248

Remapping URI to (possibly) new identifier and version component: https://tools.clariah.nl/int-pie -> https://tools.clariah.nl/int-pie/1.1.0

[CODEMETA VALIDATION (int-pie)] done

VALIDATION https://tools.clariah.nl/int-pie/1.1.0 #1: Info: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/int-pie/1.1.0 #2: Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/int-pie/1.1.0 #3: Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/int-pie/1.1.0 #4: Info: The funder *SHOULD* be acknowledged (This is missing in the metadata)

-- end log --

[harvester info] Output written to /tmp/out/int-pie.codemeta.json

[harvester info] <-- Finished processing int-pie (https://github.com/instituutnederlandsetaal/int-pie) [Sat Feb  7 03:19:34 UTC 2026]
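
The composition steps in the log above process the sources in order, and a value from a later source replaces the value set by an earlier one (the "overriding old ..." lines). That behaviour can be approximated with a small sketch. This is not codemetapy's implementation, which operates on RDF triples; the function `compose` and the plain-dict representation are assumptions for illustration only.

```python
def compose(sources):
    """Merge metadata dicts in order; later sources override earlier values."""
    merged, overrides = {}, []
    for source in sources:
        for key, value in source.items():
            if key in merged and merged[key] != value:
                # Record (property, old value, new value), like the log's override lines.
                overrides.append((key, merged[key], value))
            merged[key] = value
    return merged, overrides


# dateCreated values as reported in the log; the dict layout is hypothetical.
sources = [
    {"dateCreated": "2024-04-19T07:44:21Z"},       # 40-gitapi
    {"dateCreated": "2018-04-25T15:52:42Z+0200"},  # 39-gitdate
    {"dateCreated": "2024-05-31"},                 # 10-harvest (codemeta-harvest.json)
]
merged, overrides = compose(sources)
```

Under this ordering the value from `codemeta-harvest.json` is the one that ends up in the output, which matches the creation date shown on this page.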

        

Metadata Properties

Version
1.1.0 (release notes)
Interface types
  • Command-line Application
Source code repository
https://github.com/instituutnederlandsetaal/int-pie
Category
  • Analyzing
  • Annotating
  • Artificial intelligence, expert systems
  • Computational linguistics and philology
  • Enriching
  • Lemmatizing
  • Linguistics
  • Machine Learning
  • POS-Tagging
  • Tagging
Development Status
  • 8 - Complete: Technology complete and qualified, released for all end-users in scholarly environments.
  • Active: The project has reached a stable, usable state and is being actively developed.
Issue Tracker (Support)
https://github.com/INL/int-pie/issues
Documentation
License
MIT
Author(s)
  •   Enrique Manjavacas
  •   Mike Kestemont
  •   Thibault Clerice
Maintainer(s)
Contributor(s)
  •   Enrique Manjavacas
  •   Mike Kestemont
  •   Thibault Clerice
  •   Vincent Prins
Producer
Programming Language
  • Python
Runtime Platform
  • Python 3.10
Operating System
  • Linux
Software dependencies
  • filelock
  • smart-open
  • lxml
  • scikit-learn
  • torch
  • scipy
  • fsspec
  • threadpoolctl
  • markupsafe
  • joblib
  • pyyaml
  • click
  • torchaudio
  • sympy
  • networkx
  • terminaltables
  • jinja2
  • torchvision
  • termcolor
  • json-minify
  • typing-extensions
  • typing
  • mpmath
  • gensim
  • numpy
  • tqdm
Metadata validation
★ ★ ★ ☆ ☆
Created
2024-05-31
Last modified
2025-04-07 16:43:42 +0200