T-Scan

T-Scan is an analysis tool that assesses the complexity of Dutch texts. It is based on original work by Rogier Kraf.

Provided tools & services

T-Scan

T-Scan is an analysis tool for Dutch text that focuses mainly on text complexity. It was initially conceptualized by Rogier Kraf and Henk Pander Maat; Rogier Kraf also programmed the first versions. From 2012 onward, Henk Pander Maat supervised the development of the extended versions of the tool, which were programmed by Maarten van Gompel, Ko van der Sloot, Martijn van der Klis, Sheean Spoel and Luka van der Plas.
Type
  • Web Application
Service Provider
      Utrecht University
Input data
Name    Description               Type                 Encoding Format
*.data  Stoplist                  DigitalDocument      text/plain
*.xml   Alpino XML                TextDigitalDocument  text/xml
*.data  Noun Classification       DigitalDocument      text/plain
*.txt   Text Input                DigitalDocument      text/plain
*.data  Adjective Classification  DigitalDocument      text/plain
*.data  Intensifying words        DigitalDocument      text/plain
*.data  Own Classification        DigitalDocument      text/plain
Output data
Name              Description                            Type                        Encoding Format
total.sen.csv     Aggregated statistics, per sentence    SpreadsheetDigitalDocument  text/csv
*.xml             Output analysis                        TextDigitalDocument         text/xml
error.log         Log file with (standard) error output  DigitalDocument             text/plain
total.par.csv     Aggregated statistics, per paragraph   SpreadsheetDigitalDocument  text/csv
*.words.csv       Document statistics, per word          SpreadsheetDigitalDocument  text/csv
total.doc.csv     Aggregated statistics, per document    SpreadsheetDigitalDocument  text/csv
tscanview.xsl     Stylesheet for Visualisation           DigitalDocument             application/xslt+xml
total.word.csv    Aggregated statistics, per word        SpreadsheetDigitalDocument  text/csv
*.sentences.csv   Document statistics, per sentence      SpreadsheetDigitalDocument  text/csv
*.paragraphs.csv  Document statistics, per paragraph     SpreadsheetDigitalDocument  text/csv
*.document.csv    Document statistics, entire document   SpreadsheetDigitalDocument  text/csv
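
The CSV outputs listed above can be inspected with any CSV reader. The following Python sketch loads one of them into a list of dictionaries. It assumes that the file starts with a header row and uses commas as separators, neither of which is stated on this page, so the delimiter is left as a parameter. The names `read_stats` and `column_names` are hypothetical helpers for illustration and are not part of T-Scan.

```python
import csv
from pathlib import Path


def read_stats(path, delimiter=","):
    """Read one CSV output file into a list of row dictionaries.

    Assumption: the first row is a header. The default delimiter is a
    guess and can be overridden by the caller.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def column_names(rows):
    """Return the column names of the first row, or an empty list."""
    return list(rows[0].keys()) if rows else []
```

A call such as `read_stats("total.doc.csv")` would then give one dictionary per row, keyed by whatever column names the file actually contains; all values are returned as strings and need converting before numeric analysis.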

References

Citation

Please use one of the above reference publications to cite the software. If you want to cite the software directly, you can use the following citation generated from the metadata:

T-Scan 0.9.8. Centre for Language and Speech Technology, Utrecht Institute of Linguistics OTS.

Logs & Reviews

Name
Automatic software metadata validation report for T-Scan 0.9.8
Author
  • codemetapy validator using software.ttl
Date
2024-02-06 03:27:32
Review
Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems

Validation of T-Scan 0.9.8 was successful (score=3/5), but there are some warnings which should be addressed:

1. Info: An interface type *SHOULD* be expressed: Software source code should define one or more target products that are the resulting software applications offering specific interfaces (This is missing in the metadata)
2. Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)
3. Info: The technology readiness level *SHOULD* be expressed (This is missing in the metadata)
Rating
★ ★ ★ ☆ ☆
(log file starts at Tue Feb  6 03:27:30 UTC 2024)

[harvester info] --> Processing tscan (https://github.com/UUDigitalHumanitieslab/tscan) [Tue Feb  6 03:27:30 UTC 2024]

[harvester info] Git updating cached clone of https://github.com/UUDigitalHumanitieslab/tscan...

[harvester info] Found release v0.9.8

[harvester info] Using 'v0.9.8'

[harvester info] Git reference: v0.9.8

[harvester info] Scanning directory /tmp/codemeta-harvester.cache/tscan for harvestable resources...

[harvester info] found codemeta.json for tscan (md5sum c7e5627ab22f22efe757b5c90e827caa); **NOTE: this is considered authoritative and most other detection methods will be skipped now!**

[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...

[harvester info] Inferred repostatus https://www.repostatus.org/#active

[harvester info] Looking for repostatus information in README.md in master branch...

[harvester info] Looking for repostatus information in README in master branch...

[harvester info] Reconciliating: codemetapy  --baseuri https://tools.clariah.nl --baseuri https://tools.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "tscan" --codeRepository "https://github.com/UUDigitalHumanitieslab/tscan" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/tscan.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.tscan.codemeta.json /tmp/codemeta-harvester.cache//tmp/10-jsonld.tscan.codemeta.json 

-- begin log --

Passed 2 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-repostatus.tscan.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/10-jsonld.tscan.codemeta.json', 'json')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.clariah.nl/tscan

Processing source #1 of 2

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.tscan.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/tscan

[CODEMETA COMPOSITION (https://tools.clariah.nl/tscan)] processed 1 new triples, total is now 2

Processing source #2 of 2

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/10-jsonld.tscan.codemeta.json

    Injected (possibly temporary) URI https://tools.clariah.nl/tscan

[CODEMETA COMPOSITION (tscan)] overriding old https://codemeta.github.io/terms/developmentStatus (https://www.repostatus.org/#active -> active)

[CODEMETA CORRECTION (tscan)] automatically converting status active to repostatus URI

[CODEMETA COMPOSITION (tscan)] processed 139 new triples, total is now 139

Remapping URI to (possibly) new identifier and version component: https://tools.clariah.nl/tscan -> https://tools.clariah.nl/tscan/0.9.8

[CODEMETA VALIDATION (tscan)] done

[CODEMETA ENRICHMENT (tscan)] adding author https://tools.clariah.nl/stub/H-c44c5653b60ed4e as contributor

[CODEMETA ENRICHMENT (tscan)] adding author https://tools.clariah.nl/stub/H-663774814e655a03 as contributor

[CODEMETA ENRICHMENT (tscan)] adding author https://orcid.org/0000-0002-1046-0006 as contributor

[CODEMETA ENRICHMENT (tscan)] considering first author as maintainer

VALIDATION https://tools.clariah.nl/tscan/0.9.8 #1: Info: An interface type *SHOULD* be expressed: Software source code should define one or more target products that are the resulting software applications offering specific interfaces (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/tscan/0.9.8 #2: Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/tscan/0.9.8 #3: Info: The technology readiness level *SHOULD* be expressed (This is missing in the metadata)

-- end log --

[harvester info] Output written to /tmp/out/tscan.codemeta.json

[harvester info] Harvesting remote service URL https://tscan.hum.uu.nl/tscan/ for tscan: codemetapy  --baseuri https://tools.clariah.nl --baseuri https://tools.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl -O "/tmp/codemeta-harvester.cache//tmp/tscan.codemeta.json" "/tmp/out/tscan.codemeta.json" "https://tscan.hum.uu.nl/tscan/"

[harvester info] <-- Finished processing tscan (https://github.com/UUDigitalHumanitieslab/tscan) [Tue Feb  6 03:27:38 UTC 2024]
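
The `VALIDATION` lines in the log above share a regular shape (URI, sequence number, severity, message), so they can be collected programmatically. The following is a minimal Python sketch, assuming that shape holds for every finding. The names `parse_validation` and `count_by_severity` are hypothetical and are not part of codemetapy or the harvester.

```python
import re
from collections import Counter

# Matches lines such as:
# VALIDATION https://tools.clariah.nl/tscan/0.9.8 #1: Info: An interface type ...
VALIDATION_LINE = re.compile(
    r"^VALIDATION (?P<uri>\S+) #(?P<index>\d+): "
    r"(?P<severity>\w+): (?P<message>.+)$"
)


def parse_validation(log_text):
    """Extract validation findings from harvester log text.

    Lines that do not start with 'VALIDATION <uri> #<n>:' are ignored,
    including the '[CODEMETA VALIDATION (...)] done' status line.
    """
    findings = []
    for line in log_text.splitlines():
        match = VALIDATION_LINE.match(line.strip())
        if match:
            findings.append(
                {
                    "uri": match.group("uri"),
                    "index": int(match.group("index")),
                    "severity": match.group("severity"),
                    "message": match.group("message"),
                }
            )
    return findings


def count_by_severity(findings):
    """Count findings per severity label (for example Info or Warning)."""
    return Counter(finding["severity"] for finding in findings)
```

Applied to the log above, this would pick up the three numbered findings and leave every `[harvester info]` and `[CODEMETA ...]` line untouched.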

        

Metadata Properties

Version
0.9.8 (release notes)
Interface types
  • Web Application
Software website
Source code repository
https://github.com/UUDigitalHumanitieslab/tscan
Keywords
  • dutch
  • feature extraction
  • natural language processing
  • nlp
  • readability
Development Status
  • Active: The project has reached a stable, usable state and is being actively developed.
Issue Tracker (Support)
https://github.com/proycon/tscan/issues
Documentation
License
Author(s)
Maintainer(s)
Contributor(s)
Producer
Funder
  • Leesbaarheids-Index Nederlands (LIN) (NWO grant)
Programming Language
  • C++
Continuous Integration Tests
https://travis-ci.org/proycon/tscan
Operating System
  • POSIX
Software dependencies
  • Wopr
  • frog
  • ucto
  • libfolia
  • Alpino
Metadata validation
★ ★ ★ ☆ ☆
Created
2012-09-12