ISCC - Codec
This module (`iscc_core\codec.py`) implements the encoding, decoding, and transcoding functions of ISCC.
Codec Overview
Codec Functions
encode_component(mtype, stype, version, bit_length, digest)
Encode an ISCC-UNIT, including header and body, with standard base32 encoding.
Note
The `bit_length` value must be the length in number of bits for the component.
If `digest` has more bits than specified by `bit_length`, it will be truncated.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mtype | MainType | MainType of unit (0-6) | required |
stype | SubType | SubType of unit depending on MainType (0-5) | required |
version | Version | Version of unit algorithm (0). | required |
bit_length | Length | Length of unit in number of bits (multiple of 32) | required |
digest | bytes | The hash digest of the unit. | required |
Returns:
Type | Description |
---|---|
str | Base32 encoded ISCC-UNIT. |
encode_header(mtype, stype, version = 0, length = 1)
Encodes header values with nibble-sized (4-bit) variable-length encoding.
The result is minimum 2 and maximum 8 bytes long. If the final count of nibbles
is odd, it is padded with 4-bit `0000` at the end.
Warning
The length value must be encoded beforehand because its semantics depend on
the MainType (see the `encode_length` function).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mtype | MainType | MainType of unit. | required |
stype | SubType | SubType of unit. | required |
version | Version | Version of component algorithm. | 0 |
length | Length | Length value of unit (1 means 64 bits for standard units) | 1 |
Returns:
Type | Description |
---|---|
bytes | Varnibble stream encoded ISCC header as bytes. |
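To make the header layout concrete, here is a minimal sketch of how `encode_header` and `encode_component` could fit together: four varnibbles (MainType, SubType, Version, length), a `0000` pad when the nibble count is odd, the digest truncated to `bit_length`, then base32 without padding. It is not the iscc_core source. The `_sketch` functions and `encode_varnibble_bits` are hypothetical names, `'0'`/`'1'` strings stand in for `bitarray`, and the length step assumes one of the standard MainTypes (META to INSTANCE), where a body of `bit_length` bits is stored as `bit_length // 32 - 1`.

```python
import base64

# (prefix, data bits, offset) for each varnibble size, per the encode_varnibble table
VARNIBBLE_SIZES = (("0", 3, 0), ("10", 6, 8), ("110", 9, 72), ("1110", 12, 584))


def encode_varnibble_bits(n: int) -> str:
    """Hypothetical helper: encode 0-4679 as a string of '0'/'1' characters."""
    for prefix, data_bits, offset in VARNIBBLE_SIZES:
        if offset <= n < offset + 2**data_bits:
            return prefix + format(n - offset, f"0{data_bits}b")
    raise ValueError(f"{n} is outside the varnibble range 0-4679")


def encode_header_sketch(mtype: int, stype: int, version: int = 0, length: int = 1) -> bytes:
    """Concatenate four varnibbles and pad to a whole number of bytes."""
    bits = "".join(encode_varnibble_bits(v) for v in (mtype, stype, version, length))
    if len(bits) % 8:  # odd number of nibbles -> append 4-bit 0000
        bits += "0000"
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def encode_component_sketch(mtype: int, stype: int, version: int, bit_length: int, digest: bytes) -> str:
    """Header + truncated digest, base32 encoded without padding."""
    if bit_length < 32 or bit_length % 32:
        raise ValueError("bit_length must be a positive multiple of 32")
    body = digest[: bit_length // 8]  # longer digests are truncated
    if len(body) * 8 < bit_length:
        raise ValueError("digest is shorter than bit_length")
    # Assumption: standard MainTypes store the length as 32-bit chunks minus one
    header = encode_header_sketch(mtype, stype, version, bit_length // 32 - 1)
    return base64.b32encode(header + body).decode("ascii").rstrip("=")


# META unit (MainType 0) with a 64-bit body cut from a longer digest
digest = bytes.fromhex("657fe7cafe9791bb") + bytes(24)
print(encode_component_sketch(0, 0, 0, 64, digest))  # AAAWK77HZL7JPEN3
```

The ten bytes encoded here (header `0001` followed by `657fe7cafe9791bb`) are the same bytes that follow the `cc01` prefix in the multiformat input `fcc010001657fe7cafe9791bb` listed under `iscc_normalize`.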
decode_header(data)
Decodes a varnibble-encoded header and returns it together with the `tail data`.
Tail data is included to enable decoding of sequential ISCCs. The returned tail data must be truncated to `decode_length(r[0], r[3])` bits (where `r` is the returned tuple) to recover the actual hash bytes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | ISCC bytes | required |
Returns:
Type | Description |
---|---|
IsccTuple | (MainType, SubType, Version, length, TailData) |
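The reverse direction can be sketched the same way. This hypothetical `decode_header_sketch` reads four varnibbles, skips the 4-bit pad when an odd number of nibbles was consumed, and returns whatever follows as tail bytes. As before, `'0'`/`'1'` strings replace `bitarray`, and the helper names are illustrative only, not the library's API.

```python
from typing import Tuple

# (prefix, data bits, offset) for each varnibble size, per the encode_varnibble table
VARNIBBLE_SIZES = (("0", 3, 0), ("10", 6, 8), ("110", 9, 72), ("1110", 12, 584))


def read_varnibble_bits(bits: str) -> Tuple[int, str]:
    """Hypothetical helper: decode the leading varnibble of a '0'/'1' string."""
    for prefix, data_bits, offset in VARNIBBLE_SIZES:
        if bits.startswith(prefix):
            end = len(prefix) + data_bits
            if len(bits) < end:
                raise ValueError("truncated varnibble")
            return int(bits[len(prefix):end], 2) + offset, bits[end:]
    raise ValueError("invalid varnibble prefix")


def decode_header_sketch(data: bytes) -> Tuple[int, int, int, int, bytes]:
    """Return (mtype, stype, version, length, tail) from header-prefixed bytes."""
    bits = "".join(format(byte, "08b") for byte in data)
    rest = bits
    values = []
    for _ in range(4):  # MainType, SubType, Version, length
        value, rest = read_varnibble_bits(rest)
        values.append(value)
    if (len(bits) - len(rest)) % 8:  # odd nibble count: skip the 4-bit padding
        rest = rest[4:]
    tail = int(rest, 2).to_bytes(len(rest) // 8, "big") if rest else b""
    mtype, stype, version, length = values
    return mtype, stype, version, length, tail


print(decode_header_sketch(bytes.fromhex("0001657fe7cafe9791bb")))
```

For this META example, `length` is 1, which corresponds to a 64-bit body, so the 8-byte tail is exactly the digest. For a sequence of concatenated ISCCs the tail would also contain the following units and would need truncating as described above.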
encode_varnibble(n)
Writes an integer to a variable-length sequence of 4-bit chunks.
Variable-length encoding scheme:
prefix bits | nibbles | data bits | unsigned range |
---|---|---|---|
0 | 1 | 3 | 0 - 7 |
10 | 2 | 6 | 8 - 71 |
110 | 3 | 9 | 72 - 583 |
1110 | 4 | 12 | 584 - 4679 |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n | int | Non-negative integer to be encoded as varnibble (0-4679) | required |
Returns:
Type | Description |
---|---|
bitarray | Varnibble encoded integer |
decode_varnibble(b)
Reads the first varnibble and returns its integer value and the remaining bits.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
b | bitarray | Array of header bits | required |
Returns:
Type | Description |
---|---|
Tuple[int, bitarray] | A tuple of the integer value of the first varnibble and the remaining bits. |
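Because each row of the varnibble table uses a distinct prefix (`0`, `10`, `110`, `1110`), a varnibble can be encoded and decoded with a small lookup over the rows. The sketch below is an illustration under the assumption that a string of `'0'`/`'1'` characters is an acceptable stand-in for `bitarray`; the function names are hypothetical.

```python
from typing import Tuple

# (prefix, data bits, offset) rows of the varnibble table
VARNIBBLE_SIZES = (("0", 3, 0), ("10", 6, 8), ("110", 9, 72), ("1110", 12, 584))


def encode_varnibble_sketch(n: int) -> str:
    """Pick the first row whose range contains n and append n - offset in binary."""
    for prefix, data_bits, offset in VARNIBBLE_SIZES:
        if offset <= n < offset + 2**data_bits:
            return prefix + format(n - offset, f"0{data_bits}b")
    raise ValueError(f"{n} is outside the range 0-4679")


def decode_varnibble_sketch(bits: str) -> Tuple[int, str]:
    """Match the prefix, read the data bits, and return (value, remaining bits)."""
    for prefix, data_bits, offset in VARNIBBLE_SIZES:
        if bits.startswith(prefix):
            end = len(prefix) + data_bits
            if len(bits) < end:
                raise ValueError("truncated varnibble")
            return int(bits[len(prefix):end], 2) + offset, bits[end:]
    raise ValueError("invalid varnibble prefix")


for n in (7, 8, 71, 72, 583, 584, 4679):
    print(n, encode_varnibble_sketch(n))
```

The range limits follow from the offsets: each row starts where the previous row's `2**data_bits` values end, so 8 is the smallest two-nibble value and 4679 (584 + 4095) is the largest four-nibble value.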
encode_units(units)
Encodes a combination of ISCC-UNITs as an integer between 0 and 7, to be used as the length value for the final encoding of `MT.ISCC`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
units | Tuple | A tuple of a MainType combination (can be empty) | required |
Returns:
Type | Description |
---|---|
int | Integer value to be used as length value for header encoding |
decode_units(unit_id)
Decodes an ISCC header length value that encodes a unit combination (a `unit_id`) into an ordered tuple of MainTypes.
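The examples under `encode_length` suggest that a unit combination behaves like a 3-bit mask (CONTENT = 1, SEMANTIC = 2, META = 4). The sketch below encodes and decodes combinations on that basis. The `MT` enum and its numeric values are a hypothetical stand-in for the library's MainType, and the bitmask reading is an inference from those examples, not a documented algorithm.

```python
from enum import IntEnum
from typing import Tuple


class MT(IntEnum):
    """Hypothetical stand-in for the MainType enum (values assumed 0-6)."""
    META = 0
    SEMANTIC = 1
    CONTENT = 2
    DATA = 3
    INSTANCE = 4
    ISCC = 5
    ID = 6


# Bit weights inferred from the encode_length examples: CONTENT=1, SEMANTIC=2, META=4
UNIT_BITS = {MT.META: 0b100, MT.SEMANTIC: 0b010, MT.CONTENT: 0b001}


def encode_units_sketch(units: Tuple[MT, ...]) -> int:
    """OR together the bit of each optional unit (empty tuple -> 0)."""
    unit_id = 0
    for unit in units:
        if unit not in UNIT_BITS:
            raise ValueError(f"{unit.name} is not an optional unit")
        unit_id |= UNIT_BITS[unit]
    return unit_id


def decode_units_sketch(unit_id: int) -> Tuple[MT, ...]:
    """Return the set units in a fixed META, SEMANTIC, CONTENT order."""
    if not 0 <= unit_id <= 7:
        raise ValueError("unit_id must be between 0 and 7")
    return tuple(mt for mt in UNIT_BITS if unit_id & UNIT_BITS[mt])


print(encode_units_sketch((MT.SEMANTIC, MT.CONTENT)))  # 3
print(decode_units_sketch(5))
```

Decoding always returns units in the same order, which is what makes the result an ordered tuple regardless of how the combination was originally written.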
encode_length(mtype, length)
Encode a length to an integer value for header encoding.
The `length` value has MainType-specific semantics:
For MainTypes `META`, `SEMANTIC`, `CONTENT`, `DATA`, `INSTANCE`:
Length means the number of bits for the body.
Length is encoded as the number of 32-bit chunks minus one (0 means 32 bits).
Examples: 32 -> 0, 64 -> 1, 96 -> 2 ...
For MainType `ISCC`:
MainTypes `DATA` and `INSTANCE` are mandatory for ISCC-CODEs; all others are
optional. Length means the composition of optional 64-bit units included
in the ISCC composite.
Examples:
No optional units -> 0000 -> 0
CONTENT -> 0001 -> 1
SEMANTIC -> 0010 -> 2
SEMANTIC, CONTENT -> 0011 -> 3
META -> 0100 -> 4
META, CONTENT -> 0101 -> 5
...
For MainType `ID`:
Length means the number of bits for the body, including the counter.
Length is encoded as the number of bytes of the counter (the 64-bit body is implicit).
Examples:
64 -> 0 (No counter)
72 -> 1 (One byte counter)
80 -> 2 (Two byte counter)
...
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mtype | MainType | The MainType for which to encode the length value. | required |
length | Length | The length expressed according to the semantics of the type | required |
Returns:
Type | Description |
---|---|
int | The length value encoded as integer for use with write_header. |
decode_length(mtype, length)
Decode a raw length value from an ISCC header to the length of the digest in number of bits.
Decodes a raw header integer value into its semantically meaningful value (e.g. number of bits).
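The three length rules described under `encode_length` can be sketched as a pair of inverse functions. This is an illustration, not the iscc_core code: `MT` is a hypothetical MainType stand-in with assumed values, the `ISCC` branch assumes the caller passes an already encoded unit combination (0-7), and `decode_length_sketch` assumes the composite body is 64 bits per unit, with the mandatory DATA and INSTANCE units contributing 128 bits.

```python
from enum import IntEnum


class MT(IntEnum):
    """Hypothetical stand-in for the MainType enum (values assumed 0-6)."""
    META = 0
    SEMANTIC = 1
    CONTENT = 2
    DATA = 3
    INSTANCE = 4
    ISCC = 5
    ID = 6


STANDARD = {MT.META, MT.SEMANTIC, MT.CONTENT, MT.DATA, MT.INSTANCE}


def encode_length_sketch(mtype: MT, length: int) -> int:
    if mtype in STANDARD:  # body bits -> number of 32-bit chunks minus one
        if length < 32 or length % 32:
            raise ValueError("length must be a positive multiple of 32")
        return length // 32 - 1
    if mtype == MT.ISCC:  # assumption: length already is the 0-7 unit combination
        if not 0 <= length <= 7:
            raise ValueError("unit combination must be between 0 and 7")
        return length
    if mtype == MT.ID:  # body bits including counter -> counter bytes
        if length < 64 or (length - 64) % 8:
            raise ValueError("ID length must be 64 plus whole counter bytes")
        return (length - 64) // 8
    raise ValueError(f"unsupported MainType {mtype!r}")


def decode_length_sketch(mtype: MT, length: int) -> int:
    """Return the body length in bits."""
    if mtype in STANDARD:
        return (length + 1) * 32
    if mtype == MT.ISCC:
        # Assumption: DATA + INSTANCE (128 bits) plus 64 bits per optional unit
        return 128 + 64 * bin(length).count("1")
    if mtype == MT.ID:
        return 64 + length * 8
    raise ValueError(f"unsupported MainType {mtype!r}")


print(encode_length_sketch(MT.CONTENT, 64))  # 1
print(decode_length_sketch(MT.ID, 2))  # 80
```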
encode_base32(data)
decode_base32(code)
Standard RFC4648 base32 decoding without padding and with casefolding.
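Python's standard `base64` module already implements RFC 4648 base32; the only extra work is stripping the `=` padding on encoding and restoring it before decoding. `encode_base32` has no description here, so this sketch assumes it is the padding-free counterpart of `decode_base32`. The function names are hypothetical.

```python
import base64


def encode_base32_sketch(data: bytes) -> str:
    """RFC 4648 base32 with the trailing '=' padding removed."""
    return base64.b32encode(data).decode("ascii").rstrip("=")


def decode_base32_sketch(code: str) -> bytes:
    """Restore padding to a multiple of 8 characters and decode case-insensitively."""
    padded = code + "=" * (-len(code) % 8)
    return base64.b32decode(padded, casefold=True)


raw = bytes.fromhex("0001657fe7cafe9791bb")
print(encode_base32_sketch(raw))  # AAAWK77HZL7JPEN3
print(decode_base32_sketch("aaawk77hzl7jpen3") == raw)  # True
```

`casefold=True` is what lets lowercase input such as `iscc:maagztfqttvizpjr` decode without first being upper-cased.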
iscc_decompose(iscc_code)
Decompose a normalized ISCC-CODE or any valid ISCC sequence into a list of ISCC-UNITs.
A valid ISCC sequence is a string concatenation of ISCC-UNITs, optionally separated by hyphens.
iscc_normalize(iscc_code)
Normalize an ISCC to its canonical form.
The canonical form of an ISCC is its shortest base32-encoded representation
prefixed with the string `ISCC:`.
Possible valid inputs:
MEACB7X7777574L6
ISCC:MEACB7X7777574L6
fcc010001657fe7cafe9791bb
iscc:maagztfqttvizpjr
Iscc:Maagztfqttvizpjr
Info
A concatenated sequence of codes will be composed into a single ISCC of MainType
`MT.ISCC` if possible.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iscc_code | str | Any valid ISCC string | required |
Returns:
Type | Description |
---|---|
str | Normalized ISCC |
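For the four base32 inputs listed under "Possible valid inputs", normalization amounts to removing any `ISCC:` scheme (in any letter case), removing hyphens, upper-casing, and re-adding `ISCC:`. The sketch below covers only that case. It does not decode the multiformat input `fcc010001657fe7cafe9791bb` and does not compose concatenated units into `MT.ISCC`. The function name is hypothetical.

```python
def iscc_normalize_sketch(iscc_code: str) -> str:
    """Canonical form for single base32 codes only (no multiformat, no composing)."""
    code = iscc_code.strip()
    if code[:5].upper() == "ISCC:":  # drop the scheme regardless of case
        code = code[5:]
    code = code.replace("-", "").upper()
    return "ISCC:" + code


for example in ("MEACB7X7777574L6", "ISCC:MEACB7X7777574L6",
                "iscc:maagztfqttvizpjr", "Iscc:Maagztfqttvizpjr"):
    print(iscc_normalize_sketch(example))
```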
Alternate Encodings
encode_base64(data)
decode_base64(code)
Standard RFC4648 base64url decoding without padding.
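A padding-free base64url codec is a thin wrapper around `base64.urlsafe_b64encode` and `base64.urlsafe_b64decode`. `encode_base64` has no description in this reference, so the sketch assumes it mirrors `decode_base64`; the function names are hypothetical.

```python
import base64


def encode_base64_sketch(data: bytes) -> str:
    """base64url (RFC 4648, section 5) with '=' padding stripped."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def decode_base64_sketch(code: str) -> bytes:
    """Re-add padding to a multiple of 4 characters before decoding."""
    return base64.urlsafe_b64decode(code + "=" * (-len(code) % 4))


print(encode_base64_sketch(b"\xfb\xff"))  # -_8
```

The URL-safe alphabet uses `-` and `_` where standard base64 uses `+` and `/`, which is why the example bytes were chosen to produce those two characters.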
encode_base32hex(data)
RFC4648 Base32hex encoding without padding
decode_base32hex(code)
RFC4648 Base32hex decoding without padding
see: https://tools.ietf.org/html/rfc4648#page-10
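Base32hex groups bits exactly like base32 but uses the alphabet `0-9A-V` instead of `A-Z2-7`. Python 3.10 added `base64.b32hexencode`/`b32hexdecode`; to stay version-neutral, this sketch instead translates between the two alphabets around the standard base32 functions. The function names are hypothetical; the expected values are the RFC 4648 base32hex test vectors with padding removed.

```python
import base64

STD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
HEX_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
TO_HEX = str.maketrans(STD_ALPHABET, HEX_ALPHABET)
TO_STD = str.maketrans(HEX_ALPHABET, STD_ALPHABET)


def encode_base32hex_sketch(data: bytes) -> str:
    """Standard base32, padding stripped, then mapped onto the hex alphabet."""
    return base64.b32encode(data).decode("ascii").rstrip("=").translate(TO_HEX)


def decode_base32hex_sketch(code: str) -> bytes:
    """Map back to the standard alphabet, restore padding, and decode."""
    std = code.translate(TO_STD)
    return base64.b32decode(std + "=" * (-len(std) % 8))


print(encode_base32hex_sketch(b"foobar"))  # CPNMUOJ1E8
```

One property of base32hex is that encoded strings sort in the same order as the underlying bytes, which standard base32 does not guarantee.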
normalize_multiformat(iscc_code)
Normalize a multiformat-encoded ISCC to standard base32 encoding. Returns the input unchanged (but cleaned) if it is not multiformat-encoded.
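A rough sketch of the idea, under several assumptions: the two-byte multicodec prefix `cc01` is read off the example input `fcc010001657fe7cafe9791bb` under `iscc_normalize`; only the multibase prefixes `f` (base16), `b` (base32), and `u` (base64url) are handled; and "cleaned" is reduced to stripping whitespace (hyphens are kept because `-` is a valid base64url character). `MC_PREFIX`, `DECODERS`, and `normalize_multiformat_sketch` are hypothetical names.

```python
import base64

# Assumption: two-byte ISCC multicodec prefix, read off the example "fcc01..."
MC_PREFIX = b"\xcc\x01"


def _pad(code: str, block: int) -> str:
    return code + "=" * (-len(code) % block)


# Only three multibase prefixes are handled in this sketch
DECODERS = {
    "f": bytes.fromhex,                                          # base16
    "b": lambda s: base64.b32decode(_pad(s, 8), casefold=True),  # base32
    "u": lambda s: base64.urlsafe_b64decode(_pad(s, 4)),         # base64url
}


def normalize_multiformat_sketch(iscc_code: str) -> str:
    code = iscc_code.strip()  # minimal cleanup (assumption)
    decoder = DECODERS.get(code[:1])
    if decoder is not None:
        try:
            data = decoder(code[1:])
        except ValueError:  # binascii.Error is a subclass of ValueError
            data = b""
        if data.startswith(MC_PREFIX):
            body = data[len(MC_PREFIX):]
            return base64.b32encode(body).decode("ascii").rstrip("=")
    return code  # not multiformat: returned unchanged (apart from cleanup)


print(normalize_multiformat_sketch("fcc010001657fe7cafe9791bb"))  # AAAWK77HZL7JPEN3
```

Checking for the multicodec prefix after decoding, rather than trusting the first character alone, keeps ordinary base32 codes that happen to start with a multibase letter from being misread.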
Helper Functions
iscc_decode(iscc)
Decode an ISCC to an IsccTuple.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iscc | str | ISCC string | required |
Returns:
Type | Description |
---|---|
IsccTuple | ISCC decoded to a tuple |
iscc_explain(iscc)
Convert an ISCC to a human-readable representation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iscc | str | ISCC string | required |
Returns:
Type | Description |
---|---|
str | Human-readable representation of ISCC |
iscc_type_id(iscc)
Extract the ISCC header and convert it to a readable Type-ID string.
Type-IDs can be used as names in databases to index ISCC-UNITs separately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iscc | str | ISCC string | required |
Returns:
Type | Description |
---|---|
str | Unique Type-ID string |
iscc_validate(iscc, strict = True)
Validate that a given string is a strictly well-formed ISCC.
A strictly well-formed ISCC:
- is an ISCC-CODE or ISCC-UNIT
- is encoded in uppercase base32 without padding
- has a valid combination of header values
- is represented in its canonical form
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iscc | str | ISCC string | required |
strict | bool | Raise an exception if validation fails (default True) | True |
Returns:
Type | Description |
---|---|
bool | True if string is valid, else False (raises ValueError in strict mode) |
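Only part of the strict definition is easy to check without decoding the header. The sketch below tests the canonical `ISCC:` prefix, the uppercase base32 alphabet without padding, and that the body decodes to at least the 2-byte minimum header; checking the header value combination (for example with `decode_header`) is deliberately left out. The function name is hypothetical, and the `strict` handling simply follows the behavior described in the table.

```python
import base64
import re

BASE32_UPPER = re.compile(r"[A-Z2-7]+")


def iscc_validate_sketch(iscc: str, strict: bool = True) -> bool:
    """Partial check: prefix, uppercase unpadded base32, decodable body.

    Header values are NOT checked here; that would need decode_header.
    """
    try:
        if not iscc.startswith("ISCC:"):
            raise ValueError("missing canonical 'ISCC:' prefix")
        code = iscc[5:]
        if not BASE32_UPPER.fullmatch(code):
            raise ValueError("not uppercase base32 without padding")
        # b32decode rejects impossible lengths (e.g. 1 character) as bad padding
        data = base64.b32decode(code + "=" * (-len(code) % 8))
        if len(data) < 2:
            raise ValueError("too short to contain a header")
    except ValueError as e:  # binascii.Error is a subclass of ValueError
        if strict:
            raise ValueError(f"invalid ISCC: {e}") from e
        return False
    return True


print(iscc_validate_sketch("ISCC:AAAWK77HZL7JPEN3"))  # True
print(iscc_validate_sketch("iscc:aaawk77hzl7jpen3", strict=False))  # False
```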
iscc_validate_mf(iscc, strict = True)
Validate that a given string is a well-formed ISCC in any supported encoding format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iscc | str | ISCC string in any supported encoding | required |
strict | bool | Raise an exception if validation fails (default True) | True |
|
Returns:
Type | Description |
---|---|
bool | True if string is valid, else False (raises ValueError in strict mode) |
iscc_clean(iscc)
Clean up an ISCC string.
Removes the leading scheme, dashes, and leading/trailing whitespace.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
iscc | str | Any valid ISCC string | required |
Returns:
Type | Description |
---|---|
str | Cleaned ISCC string. |
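A minimal sketch of the cleanup described above, assuming the scheme is everything up to the first `:`. The function name is hypothetical.

```python
def iscc_clean_sketch(iscc: str) -> str:
    """Strip whitespace, a leading 'scheme:' prefix, and dashes."""
    code = iscc.strip()
    if ":" in code:  # assumption: everything before the first ':' is the scheme
        code = code.split(":", 1)[1]
    return code.replace("-", "").strip()


print(iscc_clean_sketch("  ISCC:MEAC-B7X7-7775-74L6 "))  # MEACB7X7777574L6
```

Note that this only removes formatting; it does not change letter case, which is left to normalization.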