(log file starts at Wed Sep 18 03:08:12 UTC 2024)
[harvester info] --> Processing galahad (https://github.com/INL/galahad) [Wed Sep 18 03:08:12 UTC 2024]
[harvester info] Git updating cached clone of https://github.com/INL/galahad...
[harvester info] Found release 1.2.2
[harvester info] Using '1.2.2'
[harvester info] Git reference: 1.2.2
[harvester info] Scanning directory /tmp/codemeta-harvester.cache/galahad for harvestable resources...
[harvester info] found codemeta-harvest.json for galahad (md5sum 6a1e01599a462c3e65c902c213911ef8); values in here take precendence over (override) those in later detection stages
[harvester info] Looking for license....
[harvester info] Found license Apache-2.0
[harvester info] Getting contributors from git...
[harvester info] No git contributors found
[harvester info] Getting top contributor from git...
[harvester info] Git top contributor will be assigned as author (and maintainer) if none are found in the metadata
[harvester info] Extracting last and first commit date from git log....
[harvester info] Date created: 2024-05-31T16:59:02Z+0200, date modified: 2024-08-30T14:38:25Z+0200
[harvester info] Querying Github/GitLab API (https://github.com/INL/galahad)
[harvester info] Adding URL for found README: readme.md
[harvester info] Found releaseNotes
[harvester info] Querying Zenodo API for DOI (access token provided)...
[harvester info] Looking for TRL information in readme.md...
[harvester info] Looking for repostatus information in readme.md...
[harvester info] Looking for continuous integration information in readme.md...
[harvester info] Found CI https://github.com/INL/Galahad/actions/
[harvester info] Looking for documentation links in readme.md...
[harvester info] Falling back to git tag (1.2.2) if no version number is specified...
[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...
[harvester info] Inferred repostatus https://www.repostatus.org/#active
[harvester info] Looking for repostatus information in readme.md in master branch...
[harvester info] Setting group GaLAHaD
[harvester info] Reconciliating: codemetapy --baseuri https://tools.clariah.nl --baseuri https://tools.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "galahad" --codeRepository "https://github.com/INL/galahad" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json
-- begin log --
Passed 11 files/sources but specified 0 input types! Automatically guessing types...
Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json', 'json')]
Adding to contextgraph: /tmp/turtle
Initial URI automatically generated, may be overriden later: https://tools.clariah.nl/galahad
Processing source #1 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-version.galahad.codemeta.json
NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 2
Processing source #2 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.galahad.codemeta.json
NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 3
Processing source #3 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/90-authors.galahad.codemeta.json
Found main resource with URI https://tools.clariah.nl/galahad.topcontributor/snapshot
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 3
Processing source #4 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/43-releasenotes.galahad.codemeta.json
NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 2 new triples, total is now 5
Processing source #5 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.galahad.codemeta.json
NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 6
Processing source #6 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/40-gitapi.galahad.codemeta.json
Found main resource with URI https://tools.clariah.nl/galahad/snapshot
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 13 new triples, total is now 18
Processing source #7 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/39-gitdate.galahad.codemeta.json
NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] overriding old http://schema.org/dateCreated (2024-05-31T14:57:58Z -> 2024-05-31T16:59:02Z+0200)
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] overriding old http://schema.org/dateModified (2024-09-17T22:01:41Z -> 2024-08-30T14:38:25Z+0200)
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 2 new triples, total is now 18
Processing source #8 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/29-license.galahad.codemeta.json
NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] overriding old http://schema.org/license (http://spdx.org/licenses/Apache-2.0 -> Apache-2.0)
[CODEMETA CORRECTION (https://tools.clariah.nl/galahad)] automatically converting license to spdx URI
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 18
Processing source #9 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/12-ci.galahad.codemeta.json
NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (https://tools.clariah.nl/galahad)] processed 1 new triples, total is now 19
Processing source #10 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/10-harvest.galahad.codemeta.json
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/dateCreated (2024-05-31T16:59:02Z+0200 -> 2024-05-31)
[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/codeRepository (https://github.com/INL/galahad -> git+https://github.com/INL/galahad.git)
[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/description ("Galahad". Goal: enable linguists to experiment with different taggers and use the result in other INT products -> GaLAHaD (Generating Linguistic Annotations for Historical Dutch) allows linguists to compare taggers, tag their own corpora, evaluate the results and export their tagged documents.)
[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/downloadUrl (https://github.com/INL/galahad/archive/refs/tags/1.2.2.zip -> https://github.com/INL/galahad)
[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/name (galahad -> GaLAHaD)
[CODEMETA COMPOSITION (galahad)] overriding old https://codemeta.github.io/terms/contIntegration (https://github.com/INL/Galahad/actions/ -> https://github.com/INL/galahad/actions)
[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/producer (https://tools.clariah.nl/org/dutch-language-institute -> https://www.ivdnt.org)
[CODEMETA COMPOSITION (galahad)] overriding old https://codemeta.github.io/terms/readme (https://github.com/INL/galahad/blob/1.2.2//readme.md -> https://github.com/INL/Galahad/blob/release/readme.md)
[CODEMETA COMPOSITION (galahad)] overriding old http://schema.org/releaseNotes (https://github.com/INL/galahad/releases/tag/1.2.2 -> https://github.com/INL/Galahad/releases)
[CODEMETA COMPOSITION (galahad)] processed 301 new triples, total is now 307
Processing source #11 of 11
Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/04-applicationSuite.galahad.codemeta.json
NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (galahad)] processed 1 new triples, total is now 308
Remapping URI to (possibly) new identifier and version component: https://tools.clariah.nl/galahad -> https://tools.clariah.nl/galahad/1.2.2
[CODEMETA VALIDATION (galahad)] done
VALIDATION https://tools.clariah.nl/galahad/1.2.2 #1: Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)
-- end log --
[harvester info] Output written to /tmp/out/galahad.codemeta.json
[harvester info] Harvesting remote service URL https://portal.clarin.ivdnt.org/galahad for galahad: codemetapy --baseuri https://tools.clariah.nl --baseuri https://tools.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/galahad.codemeta.json" "/tmp/out/galahad.codemeta.json" "https://portal.clarin.ivdnt.org/galahad"
-- begin log --
Passed 2 files/sources but specified 0 input types! Automatically guessing types...
Detected input types: [('/tmp/out/galahad.codemeta.json', 'json'), ('https://portal.clarin.ivdnt.org/galahad', 'web')]
Adding to contextgraph: /tmp/turtle
Initial URI automatically generated, may be overriden later: https://tools.clariah.nl/galahad
Processing source #1 of 2
Parsing json-ld file from /tmp/out/galahad.codemeta.json
Found main resource with URI https://tools.clariah.nl/galahad/1.2.2
Injected (possibly temporary) URI https://tools.clariah.nl/galahad
[CODEMETA COMPOSITION (galahad)] processed 477 new triples, total is now 477
Processing source #2 of 2
Fallback: Obtaining metadata from remote URL https://portal.clarin.ivdnt.org/galahad
Service replied with content-type text/html
Traceback (most recent call last):
  File "/usr/bin/codemetapy", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.12/site-packages/codemeta/codemeta.py", line 335, in main
    g, res, args, contextgraph = build(**args.__dict__)
                                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/codemeta/codemeta.py", line 688, in build
    for targetres in codemeta.parsers.web.parse_web(
  File "/usr/lib/python3.12/site-packages/codemeta/parsers/web.py", line 132, in parse_web
    raise MiddlewareObstructionException(
codemeta.parsers.web.MiddlewareObstructionException: Unable to extract metadata from https://portal.clarin.ivdnt.org/galahad because it immediately redirects to an external (SSO) login page rather than a proper landing page
-- end log --
[harvester error] Failed to obtain or process metadata from remote service URL https://portal.clarin.ivdnt.org/galahad for galahad
[harvester info] <-- Finished processing galahad (https://github.com/INL/galahad) [Wed Sep 18 03:08:25 UTC 2024]