A multi-component identifier for digital media assets.
An ISCC-CODE can be generated from the concatenation of the digests of the following
five ISCC-UNITs together with a single common header:
- Meta-Code - Encodes metadata similarity
- Semantic-Code - Encodes semantic content similarity (to be developed)
- Content-Code - Encodes syntactic/perceptual similarity
- Data-Code - Encodes raw bitstream similarity
- Instance-Code - Data checksum
The following sequences of ISCC-UNITs are possible:
- Data, Instance
- Content, Data, Instance
- Semantic, Data, Instance
- Semantic, Content, Data, Instance
- Meta, Data, Instance
- Meta, Content, Data, Instance
- Meta, Semantic, Data, Instance
- Meta, Semantic, Content, Data, Instance
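The list above can be read as "any subset of Meta, Semantic, Content, followed by the mandatory Data and Instance units". The following is a minimal sketch of that rule, not part of the library: the names `OPTIONAL_UNITS`, `MANDATORY_UNITS`, `valid_unit_sequences`, and `is_valid_sequence` are hypothetical, and the unit order (Meta, Semantic, Content, Data, Instance) is assumed from the last entry of the list.

```python
from itertools import product

# Optional units, in the order assumed for an ISCC-CODE (hypothetical names).
OPTIONAL_UNITS = ("Meta", "Semantic", "Content")
# Data and Instance are always present and always come last.
MANDATORY_UNITS = ("Data", "Instance")


def valid_unit_sequences():
    """Return every allowed unit sequence as a tuple of unit names."""
    sequences = []
    for flags in product((False, True), repeat=len(OPTIONAL_UNITS)):
        chosen = tuple(u for u, keep in zip(OPTIONAL_UNITS, flags) if keep)
        sequences.append(chosen + MANDATORY_UNITS)
    return sequences


def is_valid_sequence(units):
    """Check a sequence of unit names against the allowed combinations."""
    return tuple(units) in valid_unit_sequences()


for seq in valid_unit_sequences():
    print(", ".join(seq))
```

Three optional units give 2^3 = 8 combinations, which matches the eight entries listed.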
gen_iscc_code_v0(codes, wide = False)
Combine multiple ISCC-UNITs into an ISCC-CODE with a common header using
algorithm v0.
Parameters:
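| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `codes` | `Sequence[str]` | A valid sequence of singular ISCC-UNITs. | required |
| `wide` | `bool` | If True, use 128-bit digests for Data and Instance codes. This applies only when exactly the Data and Instance units are supplied and both are at least 128-bit; otherwise digests are truncated to 64-bit. | `False` |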
Returns:
| Type | Description |
| --- | --- |
| `dict` | An ISCC object with ISCC-CODE |
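To illustrate what the `wide` parameter changes, here is a simplified sketch of how the body of the code could be assembled from unit digests. It is an approximation based on the source listing for this function, not the library's implementation: `assemble_body` is a hypothetical helper, the digests are placeholder bytes, and the input is assumed to be already validated and ordered with Data and Instance last.

```python
def assemble_body(digests, wide=False):
    """Concatenate unit digests, truncated to 8 bytes (16 in the wide case)."""
    # Wide mode only applies to exactly two units of at least 128 bits each;
    # in every other case the sketch falls back to 64-bit truncation.
    use_wide = wide and len(digests) == 2 and all(len(d) >= 16 for d in digests)
    size = 16 if use_wide else 8
    return b"".join(d[:size] for d in digests)


# Placeholder 256-bit digests; real ones come from decoded ISCC-UNITs.
data_digest = bytes(range(32))
instance_digest = bytes(range(32, 64))

standard = assemble_body([data_digest, instance_digest])
wide_body = assemble_body([data_digest, instance_digest], wide=True)
three_units = assemble_body([data_digest] * 3, wide=True)

# Body sizes in bits: 128 (standard), 256 (wide), 192 (three units, wide ignored)
print(len(standard) * 8, len(wide_body) * 8, len(three_units) * 8)
```

In this sketch, passing `wide=True` with more than two units does not raise an error; the flag is ignored and each digest is truncated to 64 bits.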
Source code in iscc_core\iscc_code.py
from operator import itemgetter

import iscc_core as ic


def gen_iscc_code_v0(codes, wide=False):
    # type: (Sequence[str], bool) -> dict
    """
    Combine multiple ISCC-UNITS to an ISCC-CODE with a common header using
    algorithm v0.

    :param Sequence[str] codes: A valid sequence of singular ISCC-UNITS.
    :param bool wide: If True, use 128-bit digests for Data and Instance codes (requires both to be at least 128-bit)
    :return: An ISCC object with ISCC-CODE
    :rtype: dict
    """

    codes = [ic.iscc_clean(code) for code in codes]

    # Check basic constraints
    if len(codes) < 2:
        raise ValueError("Minimum two ISCC units required to generate valid ISCC-CODE")
    for code in codes:
        if len(code) < 16:
            raise ValueError(f"Cannot build ISCC-CODE from units shorter than 64-bits: {code}")

    # Decode units and sort by MainType
    decoded = sorted(
        [ic.decode_header(ic.decode_base32(code)) for code in codes], key=itemgetter(0)
    )
    main_types = tuple(d[0] for d in decoded)
    if main_types[-2:] != (ic.MT.DATA, ic.MT.INSTANCE):
        raise ValueError("ISCC-CODE requires at least MT.DATA and MT.INSTANCE units.")

    # Check if this is a special case of 128-bit Data+Instance composite
    is_wide_composite = (
        wide
        and len(codes) == 2
        and main_types == (ic.MT.DATA, ic.MT.INSTANCE)
        and all(
            ic.decode_length(t[0], t[3]) >= 128 for t in decoded
        )  # Check if both units are at least 128-bit
    )

    # Determine SubType (generic mediatype)
    if is_wide_composite:
        st = ic.ST_ISCC.WIDE
    else:
        sub_types = [t[1] for t in decoded if t[0] in {ic.MT.SEMANTIC, ic.MT.CONTENT}]
        if len(set(sub_types)) > 1:
            raise ValueError("Semantic-Code and Content-Code must be of same SubType")
        st = (
            sub_types.pop() if sub_types else ic.ST_ISCC.SUM if len(codes) == 2 else ic.ST_ISCC.NONE
        )

    # Encode unit combination
    encoded_length = ic.encode_units(main_types[:-2])

    # Collect unit digests
    if is_wide_composite:
        # For wide case, use full 128-bit digests
        digest = b"".join([t[-1][:16] for t in decoded])
    else:
        # For standard case, truncate unit digests to 64-bit
        digest = b"".join([t[-1][:8] for t in decoded])

    header = ic.encode_header(ic.MT.ISCC, st, ic.VS.V0, encoded_length)

    code = ic.encode_base32(header + digest)
    iscc = "ISCC:" + code
    return dict(iscc=iscc)
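The source above rejects units shorter than 16 characters and describes them as shorter than 64-bit. The arithmetic behind that threshold can be sketched as follows, assuming a 2-byte unit header and unpadded RFC 4648 base32 encoding (both are assumptions here, and `base32_length` is a hypothetical helper, not a library function).

```python
import base64
import math


def base32_length(num_bytes):
    """Number of base32 characters (padding stripped) needed for num_bytes."""
    return math.ceil(num_bytes * 8 / 5)


HEADER_BYTES = 2  # assumed header size of a standard ISCC-UNIT
DIGEST_BYTES = 8  # 64-bit digest

unit = bytes(HEADER_BYTES + DIGEST_BYTES)  # all-zero placeholder unit
encoded = base64.b32encode(unit).decode("ascii").rstrip("=")

print(len(encoded), base32_length(len(unit)))  # prints: 16 16
```

Under these assumptions a unit with a 64-bit digest occupies 10 bytes (80 bits), which is exactly 16 base32 characters, while a unit with a 32-bit digest would need only 10 characters and would fail the length check.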