fusus

Workflow for converting Arabic scanned pages into readable text

Citation

You can cite this software using the following citation generated from its metadata:

Among, A Community for DH and MS (2023). fusus 0.0.2.

Logs & Reviews

Name
Automatic software metadata validation report for fusus 0.0.2
Author
  • codemetapy validator using software.ttl
Date
2024-06-15 03:07:59
Review
Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems

Validation of fusus 0.0.2 was successful (score=3/5), but there are some warnings which should be addressed:

1. Info: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)
2. Info: An interface type *SHOULD* be expressed: Software source code should define one or more target products that are the resulting software applications offering specific interfaces (This is missing in the metadata)
3. Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)
4. Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)
5. Info: The funder *SHOULD* be acknowledged (This is missing in the metadata)
6. Info: A research domain *SHOULD* be expressed as a category using the NWO Research Fields vocabulary, if applicable (This is missing in the metadata)
7. Info: A research activity *SHOULD* be expressed as a category using the TaDiRaH vocabulary (This is missing in the metadata)
Rating
★ ★ ★ ☆ ☆
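The seven findings above correspond to properties that the validator expected but did not find in the harvested codemeta record. The sketch below is a simplified, hypothetical stand-in for that check: the real validator applies SHACL shapes from software.ttl, while this version only tests whether plain JSON-LD keys are present. The property names (`continuousIntegration`, `targetProduct`, `softwareHelp`, `referencePublication`, `funder`, `applicationCategory`) and the two vocabulary prefixes are assumptions about how the requirements map onto codemeta/schema.org terms, not values read from software.ttl.

```python
from typing import Any, Callable

# Assumed vocabulary prefixes (illustrative, not taken from software.ttl).
NWO_PREFIX = "https://w3id.org/nwo-research-fields"
TADIRAH_PREFIX = "https://vocabs.dariah.eu/tadirah/"

Check = Callable[[dict[str, Any]], bool]


def has(key: str) -> Check:
    """True when the property is present and non-empty."""
    return lambda doc: bool(doc.get(key))


def category_from(prefix: str) -> Check:
    """True when some applicationCategory value is an IRI under `prefix`."""
    def check(doc: dict[str, Any]) -> bool:
        cats = doc.get("applicationCategory", [])
        if not isinstance(cats, list):
            cats = [cats]
        # Values may be plain strings or JSON-LD node references {"@id": ...}.
        ids = [c.get("@id", "") if isinstance(c, dict) else str(c) for c in cats]
        return any(i.startswith(prefix) for i in ids)
    return check


# (level, description, check) triples mirroring the seven findings in the report.
RULES: list[tuple[str, str, Check]] = [
    ("Info", "continuous integration", has("continuousIntegration")),
    ("Info", "interface type", has("targetProduct")),
    ("Warning", "documentation", has("softwareHelp")),
    ("Info", "reference publications", has("referencePublication")),
    ("Info", "funder", has("funder")),
    ("Info", "research domain (NWO Research Fields)", category_from(NWO_PREFIX)),
    ("Info", "research activity (TaDiRaH)", category_from(TADIRAH_PREFIX)),
]


def validate(doc: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (level, description) for every rule the document fails."""
    return [(level, what) for level, what, ok in RULES if not ok(doc)]


if __name__ == "__main__":
    # Free-text categories (like the ones listed for fusus) satisfy neither vocabulary rule.
    record = {"name": "fusus", "applicationCategory": ["Text Processing"]}
    for level, what in validate(record):
        print(f"{level}: {what} is missing")
```

Note that a plain "Text Processing" category does not count towards the research-domain or research-activity rules in this sketch, which is consistent with fusus listing categories on this page yet still receiving findings 6 and 7.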
(log file starts at Sat Jun 15 03:07:46 UTC 2024)

[harvester info] --> Processing fusus (https://github.com/among/fusus) [Sat Jun 15 03:07:46 UTC 2024]

[harvester info] Git updating cached clone of https://github.com/among/fusus...

[harvester info] Found release v0.8

[harvester info] Using 'v0.8'

[harvester info] Git reference: v0.8

[harvester info] Scanning directory /tmp/codemeta-harvester.cache/fusus for harvestable resources...

[harvester info] found python setup for fusus, converting to codemeta

[harvester info] Looking for license....

[harvester info] Found license MIT

[harvester info] Getting contributors from git...

[harvester info] No git contributors found

[harvester info] Getting top contributor from git...

[harvester info] Git top contributor  will be assigned as author (and maintainer) if none are found in the metadata

[harvester info] Extracting last and first commit date from git log....

[harvester info] Date created: 2020-03-03T10:02:46Z+0100, date modified: 2023-04-11T19:46:15Z+0200

[harvester info] Querying Github/GitLab API (https://github.com/among/fusus)

[harvester info] Adding URL for found README: README.md

[harvester info] Found releaseNotes

[harvester info] Querying Zenodo API for DOI (access token provided)...

[harvester info] Found DOI https://doi.org/10.5281/zenodo.7818766

[harvester info] Looking for TRL information in README.md...

[harvester info] Looking for repostatus information in README.md...

[harvester info] Found repostatus https://www.repostatus.org/#wip

[harvester info] Looking for continuous integration information in README.md...

[harvester info] Looking for documentation links in README.md...

[harvester info] Falling back to git tag (v0.8) if no version number is specified...

[harvester info] Inferring repostatus information from git activity (used only as a fallback if not explicitly provided)...

[harvester info] Inferred repostatus https://www.repostatus.org/#active

[harvester info] Looking for repostatus information in README.md in master branch...

[harvester info] Found repostatus (master branch) https://www.repostatus.org/#wip

[harvester info] Reconciliating: codemetapy  --baseuri https://tools.clariah.nl --baseuri https://tools.clariah.nl --includecontext --addcontext https://w3id.org/nwo-research-fields --addcontext https://w3id.org/research-technology-readiness-levels --addcontextgraph https://vocabs.dariah.eu/rest/v1/tadirah/data?format=text/turtle --trl --identifier "fusus" --codeRepository "https://github.com/among/fusus" --validate /etc/software.ttl --released --enrich --textv "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems" -O /tmp/out/fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-version.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/99-repostatus.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/90-authors.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/43-releasenotes.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/41-readme.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/40-gitapi.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/39-gitdate.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/29-license.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/20-python.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/11-repostatus.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/05-repostatus.fusus.codemeta.json /tmp/codemeta-harvester.cache//tmp/05-doi.fusus.codemeta.json 

-- begin log --

Passed 12 files/sources but specified 0 input types! Automatically guessing types...

Detected input types: [('/tmp/codemeta-harvester.cache//tmp/99-version.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/99-repostatus.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/90-authors.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/43-releasenotes.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/41-readme.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/40-gitapi.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/39-gitdate.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/29-license.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/20-python.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/11-repostatus.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/05-repostatus.fusus.codemeta.json', 'json'), ('/tmp/codemeta-harvester.cache//tmp/05-doi.fusus.codemeta.json', 'json')]

Adding to contextgraph: /tmp/turtle

Initial URI automatically generated, may be overriden later: https://tools.clariah.nl/fusus

Processing source #1 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-version.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] processed 1 new triples, total is now 2

Processing source #2 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/99-repostatus.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] processed 1 new triples, total is now 3

Processing source #3 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/90-authors.fusus.codemeta.json

    Found main resource with URI https://tools.clariah.nl/fusus.topcontributor/snapshot

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] processed 1 new triples, total is now 3

Processing source #4 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/43-releasenotes.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] processed 2 new triples, total is now 5

Processing source #5 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/41-readme.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] processed 1 new triples, total is now 6

Processing source #6 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/40-gitapi.fusus.codemeta.json

    Found main resource with URI https://tools.clariah.nl/fusus/snapshot

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] processed 24 new triples, total is now 29

Processing source #7 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/39-gitdate.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] overriding old http://schema.org/dateCreated (2020-03-03T09:02:45Z -> 2020-03-03T10:02:46Z+0100)

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] overriding old http://schema.org/dateModified (2023-11-14T16:08:47Z -> 2023-04-11T19:46:15Z+0200)

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] processed 2 new triples, total is now 29

Processing source #8 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/29-license.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] overriding old http://schema.org/license (http://spdx.org/licenses/MIT -> MIT)

[CODEMETA CORRECTION (https://tools.clariah.nl/fusus)] automatically converting license to spdx URI

[CODEMETA COMPOSITION (https://tools.clariah.nl/fusus)] processed 1 new triples, total is now 29

Processing source #9 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/20-python.fusus.codemeta.json

    Found main resource with URI https://tools.clariah.nl/fusus/0.0.2

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/description (a workflow to transform Arabic classical works in printed form to structured text -> Workflow for converting Arabic scanned pages into readable text)

[CODEMETA COMPOSITION (fusus)] overriding old https://codemeta.github.io/terms/developmentStatus (https://www.repostatus.org/#active -> https://www.repostatus.org/#wip)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (ocr -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (text-processing -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (kraken -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (image-processing -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (text-fabric -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (digital-humanities -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (opencv -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (python -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (wisdom -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/keywords (workflow -> OCR)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/url (https://among.github.io/fusus/fusus/index.html -> https://github.com/among/fusus)

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/version (v0.8 -> 0.0.2)

[CODEMETA COMPOSITION (fusus)] processed 121 new triples, total is now 130

Processing source #10 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/11-repostatus.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (fusus)] processed 1 new triples, total is now 130

Processing source #11 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/05-repostatus.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (fusus)] processed 1 new triples, total is now 130

Processing source #12 of 12

Parsing json-ld file from /tmp/codemeta-harvester.cache//tmp/05-doi.fusus.codemeta.json

    NOTE: Not a valid JSON-LD document, @context missing! Attempting to inject automatically...

    Injected (possibly temporary) URI https://tools.clariah.nl/fusus

[CODEMETA COMPOSITION (fusus)] overriding old http://schema.org/identifier (fusus -> )

[CODEMETA COMPOSITION (fusus)] processed 5 new triples, total is now 134

Remapping URI to (possibly) new identifier and version component: https://tools.clariah.nl/fusus -> https://tools.clariah.nl/fusus/0.0.2

[CODEMETA VALIDATION (fusus)] done

[CODEMETA ENRICHMENT (fusus)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (fusus)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (fusus)] automatically adding programmingLanguage Python derived from runtimePlatform Python

[CODEMETA ENRICHMENT (fusus)] adding author https://tools.clariah.nl/person/cornelis-van-lit as contributor

[CODEMETA ENRICHMENT (fusus)] adding author https://tools.clariah.nl/person/dirk-roorda as contributor

[CODEMETA ENRICHMENT (fusus)] considering first author as maintainer

VALIDATION https://tools.clariah.nl/fusus/0.0.2 #1: Info: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/fusus/0.0.2 #2: Info: An interface type *SHOULD* be expressed: Software source code should define one or more target products that are the resulting software applications offering specific interfaces (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/fusus/0.0.2 #3: Warning: Documentation *SHOULD* be expressed (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/fusus/0.0.2 #4: Info: Reference publications *SHOULD* be expressed, if any (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/fusus/0.0.2 #5: Info: The funder *SHOULD* be acknowledged (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/fusus/0.0.2 #6: Info: A research domain *SHOULD* be expressed as a category using the NWO Research Fields vocabulary, if applicable (This is missing in the metadata)

VALIDATION https://tools.clariah.nl/fusus/0.0.2 #7: Info: A research activity *SHOULD* be expressed as a category using the TaDiRaH vocabulary (This is missing in the metadata)

-- end log --

[harvester info] Output written to /tmp/out/fusus.codemeta.json

[harvester info] <-- Finished processing fusus (https://github.com/among/fusus) [Sat Jun 15 03:07:59 UTC 2024]
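The log shows the harvester writing one codemeta fragment per source, prefixed with a number (99-version, 40-gitapi, 20-python, 05-doi, ...), and passing them to codemetapy from the highest prefix to the lowest. Each "overriding old" line shows a later source replacing a value set by an earlier one: for example, 20-python replaced the version v0.8 taken from the git tag with 0.0.2. The sketch below reproduces that precedence at the level of plain dictionaries. It is a simplification under stated assumptions: codemetapy actually merges RDF triples and handles multi-valued properties differently, and `priority` and `compose` are hypothetical helper names.

```python
import re
from typing import Any


def priority(path: str) -> int:
    """Numeric prefix of a fragment file name, e.g. '.../20-python.fusus.codemeta.json' -> 20."""
    name = path.rsplit("/", 1)[-1]
    match = re.match(r"(\d+)-", name)
    return int(match.group(1)) if match else 0


def compose(fragments: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge fragments from the highest prefix to the lowest.

    Later fragments override earlier values, so the lowest-numbered
    fragment that sets a property wins. Fragments with equal prefixes
    keep their input order because sorted() is stable, also with reverse=True.
    """
    merged: dict[str, Any] = {}
    for path in sorted(fragments, key=priority, reverse=True):
        merged.update(fragments[path])  # override, as in the "overriding old" log lines
    return merged


if __name__ == "__main__":
    # Illustrative fragments mirroring values seen in the log above.
    fragments = {
        "/tmp/99-version.fusus.codemeta.json": {"version": "v0.8"},
        "/tmp/99-repostatus.fusus.codemeta.json": {"developmentStatus": "https://www.repostatus.org/#active"},
        "/tmp/20-python.fusus.codemeta.json": {"version": "0.0.2", "developmentStatus": "https://www.repostatus.org/#wip"},
    }
    print(compose(fragments))
```

Under this ordering, explicitly declared metadata (the Python setup, the DOI) takes precedence over values inferred as fallbacks from git tags or git activity.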


Metadata Properties

Version
0.0.2 (release notes)
Interface types
  • Unknown
Software website
Source code repository
 https://github.com/among/fusus
Category
  • Religion
  • Scientific/Engineering > Information Analysis
  • Sociology > History
  • Text Processing
  • Text Processing > Fonts
  • Text Processing > Markup
Keywords
  • arabic
  • image processing
  • islam
  • medieval
  • OCR
  • text
Development Status
  • Proof of Concept: An initial proof-of-concept implementation of the technology is available (alpha). It is not mature enough for end-users yet.
  • WIP: Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.
Issue Tracker (Support)
https://github.com/among/fusus/issues
Documentation
License
  • MIT
Author(s)
  • Cornelis van Lit
  • Dirk Roorda
Maintainer(s)
Contributor(s)
  • Cornelis van Lit
  • Dirk Roorda
Producer
  • Among, A Community for DH and MS
Programming Language
  • Python
Runtime Platform
  • Python 3
  • Python 3 Only
  • Python Implementation CPython
Operating System
  • MacOS > MacOS X
  • Microsoft > Windows > Windows 10
  • POSIX > Linux
Software dependencies
  • PyMuPDF
  • ipython
  • kraken
  • numpy
  • opencv-contrib-python
  • pdoc3
  • pillow
  • python-Levenshtein
  • pyyaml
  • text-fabric
Metadata validation
★ ★ ★ ☆ ☆
Created
2020-03-03 10:02:46 +0100
Last modified
2023-04-11 19:46:15 +0200
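Several of the properties above (Category, Operating System, Runtime Platform) read like Python trove classifiers from the package setup, and the log records programmingLanguage Python being derived from runtimePlatform. The sketch below shows one plausible way such a conversion could work. It is an assumption for illustration only: the fusus classifiers are not shown on this page, the inputs are hypothetical examples chosen to match the displayed values, and `PREFIX_MAP` and `classifiers_to_codemeta` are invented names, not codemetapy's API.

```python
# Assumed mapping from trove-classifier prefixes to codemeta properties,
# plus the separator used to join the remaining classifier parts.
PREFIX_MAP: dict[str, tuple[str, str]] = {
    "Topic": ("applicationCategory", " > "),
    "Operating System": ("operatingSystem", " > "),
    "Programming Language": ("runtimePlatform", " "),
}


def classifiers_to_codemeta(classifiers: list[str]) -> dict[str, list[str]]:
    """Convert trove classifiers into (a subset of) codemeta properties."""
    out: dict[str, list[str]] = {}
    for classifier in classifiers:
        head, *rest = [part.strip() for part in classifier.split("::")]
        if head not in PREFIX_MAP or not rest:
            continue  # other classifier families (e.g. License) are ignored here
        prop, sep = PREFIX_MAP[head]
        out.setdefault(prop, []).append(sep.join(rest))
    # Enrichment step, as in the log: derive programmingLanguage from runtimePlatform.
    if any(p.startswith("Python") for p in out.get("runtimePlatform", [])):
        out["programmingLanguage"] = ["Python"]
    return out


if __name__ == "__main__":
    print(classifiers_to_codemeta([
        "Topic :: Text Processing :: Fonts",
        "Operating System :: POSIX :: Linux",
        "Programming Language :: Python :: 3 :: Only",
    ]))
```

Joining topic and operating-system parts with " > " but runtime parts with spaces is chosen only so that the output matches how the values are displayed above (e.g. "Text Processing > Fonts", "Python 3 Only").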