ISCC - Codec & Algorithms#

iscc-core is a Python library that implements the core algorithms of the ISCC (International Standard Content Code).

What is an ISCC#

The ISCC is a similarity-preserving identifier for digital media assets.

ISCCs are generated algorithmically from digital content, just like cryptographic hashes. However, instead of relying on a single cryptographic hash function that only identifies exact data, the ISCC combines several algorithms into a composite identifier that exhibits similarity-preserving properties (soft hash).

The component-based structure of the ISCC identifies content at multiple levels of abstraction. Each component is self-describing, modular, and can be used separately or with others to aid in various content identification tasks. The algorithmic design supports content deduplication, database synchronization, indexing, integrity verification, timestamping, versioning, data provenance, similarity clustering, anomaly detection, usage tracking, allocation of royalties, fact-checking and general digital asset management use-cases.
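To make the "similarity-preserving" idea concrete, here is a minimal sketch that compares two soft-hash digests by counting the bits in which they differ (Hamming distance). It assumes that a smaller distance indicates more similar content. The helper `hamming_distance` and all three digests are hypothetical and are not part of iscc-core.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the differing bits between two equal-length hex digests."""
    if len(hex_a) != len(hex_b):
        raise ValueError("digests must have the same length")
    # XOR leaves a 1 bit wherever the two digests disagree.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical 64-bit digests, invented for illustration only.
original = "3e103656bffdfc7a"
near_duplicate = "3e103656bffdfc78"  # differs from original in one bit
unrelated = "c1efc9a940020385"       # differs from original in every bit

print(hamming_distance(original, near_duplicate))  # 1
print(hamming_distance(original, unrelated))       # 64
```

A cryptographic hash gives no such gradient: changing a single input bit is expected to flip about half of the output bits, so only exact equality is meaningful.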

What is iscc-core#

iscc-core is a Python library that implements the core algorithms needed to create standard-compliant ISCC codes. It also serves as a reference for porting ISCC to other programming languages.

Tip

This is a low-level reference implementation. iscc-core does not support content/metadata detection, extraction, or preprocessing. For easy generation of ISCC codes see: iscc-cli

ISCC Architecture#

[Figure: ISCC Architecture]

ISCC MainTypes#

Idx  Slug      Bits  Purpose
0    META      0000  Match on metadata similarity
1    SEMANTIC  0001  Match on semantic content similarity
2    CONTENT   0010  Match on perceptual content similarity
3    DATA      0011  Match on data similarity
4    INSTANCE  0100  Match on data identity
5    ISCC      0101  Composite of two or more components with common header
6    ID        0110  Short unique identifier bound to ISCC, timestamp, pubkey
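The following sketch shows how the MainType could be read from an ISCC string using only the Python standard library. It assumes that the part after the `ISCC:` prefix is unpadded base32 and that the MainType occupies the first four bits of the decoded bytes, as the Bits column suggests. The helpers `decode_iscc` and `peek_maintype` are hypothetical and not part of iscc-core. The sample codes are taken from the Quick Start output in this README.

```python
import base64

# MainType names indexed by their 4-bit value, copied from the table above.
MAINTYPES = ("META", "SEMANTIC", "CONTENT", "DATA", "INSTANCE", "ISCC", "ID")


def decode_iscc(iscc: str) -> bytes:
    """Decode the base32 body of an ISCC string to raw bytes (assumed encoding)."""
    body = iscc.split(":", 1)[-1].upper()
    # base64.b32decode requires input padded to a multiple of 8 characters.
    padding = "=" * (-len(body) % 8)
    return base64.b32decode(body + padding)


def peek_maintype(iscc: str) -> str:
    """Return the MainType slug encoded in the first 4 bits of the header."""
    first_nibble = decode_iscc(iscc)[0] >> 4
    return MAINTYPES[first_nibble]


print(peek_maintype("ISCC:AAAT4EBWK27737D2"))    # META
print(peek_maintype("ISCC:EAAQMBEYQF6457DP"))    # CONTENT
print(peek_maintype("ISCC:MEASAHQADTLH37X4A4"))  # ID
print(decode_iscc("ISCC:AAAT4EBWK27737D2").hex())  # 00013e103656bffdfc7a
```

In the last line, the first two bytes are the header and the remaining eight bytes match the digest shown by `iscc_core.explain` for the Meta-Code. For real work, prefer `iscc_core.explain`, which also decodes the remaining header fields.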

Installation#

Use the package manager pip to install iscc-core.

pip install iscc-core

Quick Start#

import iscc_core


meta_code = iscc_core.gen_meta_code(name="ISCC Test Document!")

print(f"Meta-Code:     {meta_code.iscc}")
print(f"Structure:     {iscc_core.explain(meta_code.iscc)}\n")

# Extract text from file
with open("demo.txt", "rt", encoding="utf-8") as stream:
    text = stream.read()
    text_code = iscc_core.gen_text_code_v0(text)
    print(f"Text-Code:     {text_code.iscc}")
    print(f"Structure:     {iscc_core.explain(text_code.iscc)}\n")

# Process raw bytes of textfile
with open("demo.txt", "rb") as stream:
    data_code = iscc_core.gen_data_code(stream)
    print(f"Data-Code:     {data_code.iscc}")
    print(f"Structure:     {iscc_core.explain(data_code.iscc)}\n")

    stream.seek(0)
    instance_code = iscc_core.gen_instance_code(stream)
    print(f"Instance-Code: {instance_code.iscc}")
    print(f"Structure:     {iscc_core.explain(instance_code.iscc)}\n")

iscc_code = iscc_core.gen_iscc_code(
    (meta_code.iscc, text_code.iscc, data_code.iscc, instance_code.iscc)
)
print(f"ISCC-CODE:     {iscc_code.iscc}")
print(f"Structure:     {iscc_core.explain(iscc_code.iscc)}")
print(f"Multiformat:   {iscc_code.code_obj.mf_base32}\n")

iscc_id = iscc_core.gen_iscc_id(chain=1, iscc_code=iscc_code.iscc, uc=7)
print(f"ISCC-ID:       {iscc_id.iscc}")
print(f"Structure:     {iscc_core.explain(iscc_id.iscc)}")
print(f"Multiformat:   {iscc_id.code_obj.mf_base32}")

The output of this example is as follows:

Meta-Code:     ISCC:AAAT4EBWK27737D2
Structure:     META-NONE-V0-64-3e103656bffdfc7a

Text-Code:     ISCC:EAAQMBEYQF6457DP
Structure:     CONTENT-TEXT-V0-64-060498817dcefc6f

Data-Code:     ISCC:GAAZ5SQ47ZQ34A3V
Structure:     DATA-NONE-V0-64-9eca1cfe61be0375

Instance-Code: ISCC:IAASQF7FY2TLVFRC
Structure:     INSTANCE-NONE-V0-64-2817e5c6a6ba9622

ISCC-CODE:     ISCC:KACT4EBWK27737D2AYCJRAL5Z36G7HWKDT7GDPQDOUUBPZOGU25JMIQ
Structure:     ISCC-TEXT-V0-MCDI-3e103656bffdfc7a060498817dcefc6f9eca1cfe61be03752817e5c6a6ba9622
Multiformat:   bzqavabj6ca3fnp757r5ambeyqf6457dpt3fbz7tbxybxkkax4xdknouwei

ISCC-ID:       ISCC:MEASAHQADTLH37X4A4
Structure:     ID-BITCOIN-V0-72-201e001cd67dfefc-7
Multiformat:   bzqawcajadyabzvt5736ao
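The Multiformat lines can be reproduced from the canonical form with the standard library alone. The sketch below assumes that the multiformat representation is the multibase prefix `b` (lowercase base32) followed by the base32 encoding of the two-byte multicodec prefix `0xcc01` plus the raw ISCC bytes. This layout is inferred from the sample output above, and `to_mf_base32` is a hypothetical helper, not the library's API.

```python
import base64

# Assumed multicodec prefix for ISCC, consistent with the sample output above.
ISCC_MULTICODEC_PREFIX = bytes.fromhex("cc01")


def to_mf_base32(iscc: str) -> str:
    """Convert a canonical ISCC string to its multiformat base32 form."""
    body = iscc.split(":", 1)[-1].upper()
    # Re-add the padding that the canonical form omits before decoding.
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    encoded = base64.b32encode(ISCC_MULTICODEC_PREFIX + raw).decode("ascii")
    # Multibase "b" marks lowercase base32 without padding.
    return "b" + encoded.rstrip("=").lower()


print(to_mf_base32("ISCC:MEASAHQADTLH37X4A4"))  # bzqawcajadyabzvt5736ao
```

In application code, use the `mf_base32` attribute shown in the Quick Start example instead of re-implementing the conversion.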

Documentation#

https://core.iscc.codes

Project Status#

ISCC is in the process of being standardized within ISO/TC 46/SC 9.

Maintainers#

@titusz

Contributing#

Pull requests are welcome. For significant changes, please open an issue first to discuss your plans. Please make sure to update tests as appropriate.

You may also want to join our developer chat on Telegram at https://t.me/iscc_dev.

Changelog#

0.2.0 - Unreleased#

  • Code cleanup

0.1.9 - 2021-12-17#

  • Added warning on non-standard options
  • Added multiformats support
  • Added uri representation
  • Removed redundant cdc_avg_chunk_size option
  • Updated codec format documentation

0.1.8 - 2021-12-12#

  • Added conformance tests for all top level functions
  • Added conformance tests to source dir
  • Added conformance module with selftest function
  • Changed gen_image_code to accept normalized pixels instead of stream
  • Changed opts to core_opts
  • Removed image pre-processing and Pillow dependency
  • Fixed readability of conformance tests
  • Fixed soft_hash_video_v0 to accept non-tuple sequences
  • Updated example code

0.1.7 - 2021-12-09#

  • Add dotenv for environment based configuration
  • Cleanup package toplevel imports
  • Return schema objects for iscc_code and iscc_id
  • Exclude unset and none values from result dicts
  • Add support for multiple code combinations for ISCC-CODE
  • Add support for ISCC-ID based on singular Instance-Code
  • Add initial conformance test system

0.1.6 - 2021-11-29#

  • Show counter for ISCC-ID in Code.explain

0.1.5 - 2021-11-28#

  • Fix documentation
  • Change metahash creation logic
  • Refactor models
  • Add Content-Code-Mixed
  • Add ISCC-ID
  • Refactor compose to gen_iscc_code
  • Refactor models to schema

0.1.4 - 2021-11-17#

  • Simplified options
  • Optimize video WTA-hash for use with 64-bit granular features

0.1.3 - 2021-11-15#

  • Try to compile Cython/C accelerator modules when installing via pip
  • Simplify soft_hash api return values
  • Add .code() method to InstanceHasher, DataHasher
  • Remove granular fingerprint calculation
  • Add more top-level imports

0.1.2 - 2021-11-14#

  • Export more functions to toplevel
  • Return schema-driven objects from ISCC code generators

0.1.1 - 2021-11-14#

  • Fix packaging problems

0.1.0 - 2021-11-13#

  • Initial release