ISCC - Content Defined Chunking
Compatible with fastcdc.
alg_cdc_chunks(data, utf32, avg_chunk_size = ic.core_opts.data_avg_chunk_size)
A generator that yields data-dependent chunks for `data`.
Usage Example:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | bytes | Raw data for variable-sized chunking. | required |
utf32 | bool | If true, assume the data is UTF-32 encoded text. | required |
avg_chunk_size | int | Target average chunk size in bytes. | ic.core_opts.data_avg_chunk_size |
Returns:
Type | Description |
---|---|
Generator[bytes] | A generator that yields data chunks of variable sizes. |
Source code in iscc_core\cdc.py
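The driving loop behind a chunk generator like this can be sketched as follows. This is a minimal illustration, not the library's code: `cdc_chunks_sketch`, its `find_cut` callback and the toy `fixed_cut_10` cutter are hypothetical names, and the cut-point search is passed in so the loop and the `utf32` handling can be shown on their own. The sketch assumes that `utf32=True` means every cut is rounded down to a multiple of 4 bytes so no 4-byte UTF-32 code unit is split across two chunks; that behavior is inferred from the parameter description, not confirmed by it.

```python
from typing import Callable, Iterator


def cdc_chunks_sketch(
    data: bytes, utf32: bool, find_cut: Callable[[memoryview], int]
) -> Iterator[bytes]:
    """Yield consecutive chunks of `data`, cutting where `find_cut` says.

    `find_cut` is a hypothetical stand-in for a cut-point search such as
    alg_cdc_offset; it receives the unprocessed tail of the data and returns
    the length of the next chunk.
    """
    buffer = memoryview(data)
    while buffer:
        # Clamp to [1, len(buffer)] so the loop always makes progress.
        cut = max(1, min(find_cut(buffer), len(buffer)))
        if utf32:
            # Assumed: round down to a multiple of 4 so no UTF-32 code unit is
            # split across chunks; keep the raw cut if rounding would give 0.
            cut = (cut - cut % 4) or cut
        yield bytes(buffer[:cut])
        buffer = buffer[cut:]


def fixed_cut_10(buffer: memoryview) -> int:
    """Toy cutter for the demo: fixed 10-byte cuts, NOT content-defined."""
    return min(len(buffer), 10)


text = "hello world".encode("utf-32-le")  # 11 code units * 4 bytes = 44 bytes
sizes = [len(c) for c in cdc_chunks_sketch(text, True, fixed_cut_10)]
# sizes == [8, 8, 8, 8, 8, 4]
```

With the toy cutter this is plain fixed-size chunking; substituting a content-defined search such as alg_cdc_offset is what makes the boundaries depend on the bytes themselves.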
alg_cdc_offset(buffer, mi, ma, cs, mask_s, mask_l)
Find breakpoint offset for a given buffer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
buffer | Data | The data to be chunked. | required |
mi | int | Minimum chunk size in bytes. | required |
ma | int | Maximum chunk size in bytes. | required |
cs | int | Center size in bytes. | required |
mask_s | int | Small mask. | required |
mask_l | int | Large mask. | required |
Returns:
Type | Description |
---|---|
int | Offset of the dynamic cut point in bytes. |
Source code in iscc_core\cdc.py
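The min/max/center sizes and the two masks match FastCDC-style normalized chunking, where the cut-point search runs in two phases: from the minimum size up to the center size with a stricter mask, then up to the maximum size with a looser one. The following is a minimal sketch of such a search, assuming a Gear rolling hash (`fp = (fp << 1) + GEAR[byte]`, truncated to 32 bits) tested against low-order mask bits. `cdc_offset_sketch` and the seeded `GEAR` table are hypothetical; iscc_core's own gear table and hash update may differ.

```python
import random

# Hypothetical gear table: 256 pseudo-random 32-bit values, seeded for
# reproducibility. The library ships its own table; these values are made up.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def cdc_offset_sketch(buffer, mi, ma, cs, mask_s, mask_l):
    """Return the cut-point offset (in bytes) for the start of `buffer`."""
    fp = 0  # rolling Gear fingerprint
    size = len(buffer)
    i = min(mi, size)  # skip the first `mi` bytes: no cut before the minimum size
    # Phase 1: up to the center size, test against mask_s (assumed to have
    # more bits set, so a match, and thus a cut, is less likely).
    barrier = min(cs, size)
    while i < barrier:
        fp = ((fp << 1) + GEAR[buffer[i]]) & 0xFFFFFFFF
        if not fp & mask_s:
            return i + 1
        i += 1
    # Phase 2: up to the maximum size, test against the looser mask_l.
    barrier = min(ma, size)
    while i < barrier:
        fp = ((fp << 1) + GEAR[buffer[i]]) & 0xFFFFFFFF
        if not fp & mask_l:
            return i + 1
        i += 1
    # No boundary found: cut at the maximum size, or at the end of a short buffer.
    return i


# Usage on 20,000 seeded random bytes with mi=256, ma=8192, cs=640.
_data_rng = random.Random(1)
sample = bytes(_data_rng.getrandbits(8) for _ in range(20000))
cut = cdc_offset_sketch(sample, 256, 8192, 640, 2047, 511)
```

Because the cut depends only on the bytes inside the window being hashed, an edit early in a file tends to disturb only the chunks near it, which is the property that makes content-defined chunking useful for similarity hashing.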
alg_cdc_params(avg_size: int) -> tuple
Calculate the CDC parameters for a target average chunk size.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
avg_size | int | Target average chunk size in bytes. | required |
Returns:
Type | Description |
---|---|
tuple | Tuple of (min_size, max_size, center_size, mask_s, mask_l). |
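The returned tuple has exactly the shape that alg_cdc_offset consumes. The following sketch shows one plausible derivation using ratios common in FastCDC-style normalized chunking. Every constant here is an assumption for illustration, not a statement of iscc_core's exact formula: min = avg // 4, max = avg * 8, a center size pulled below avg, and masks with one bit more and one bit fewer than round(log2(avg)). `cdc_params_sketch` is a hypothetical name.

```python
import math


def cdc_params_sketch(avg_size: int) -> tuple:
    """Derive (min_size, max_size, center_size, mask_s, mask_l) from avg_size.

    All ratios below are assumed FastCDC-style choices; the library's exact
    constants may differ.
    """

    def ceil_div(x: int, y: int) -> int:
        return (x + y - 1) // y

    def mask(bits: int) -> int:
        return (1 << bits) - 1  # `bits` low-order one bits

    min_size = avg_size // 4
    max_size = avg_size * 8
    # Assumed formula for the point where the search switches from mask_s to mask_l.
    offset = min_size + ceil_div(min_size, 2)
    center_size = avg_size - offset
    bits = round(math.log2(avg_size))
    mask_s = mask(bits + 1)  # one extra bit: harder to match before center_size
    mask_l = mask(bits - 1)  # one bit fewer: easier to match after center_size
    return min_size, max_size, center_size, mask_s, mask_l


params = cdc_params_sketch(1024)
# params == (256, 8192, 640, 2047, 511)
```

Making cuts less likely before the center size and more likely after it narrows the spread of chunk sizes around the target average, which is the idea behind normalized chunking.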