chunk_data
- crispy.scms.chunk_data(ncpu, data_list, data_size)
Divide data into chunks for multiprocessing.
This function splits data into approximately equal-sized chunks to facilitate parallel processing across multiple CPUs.
- Parameters:
ncpu (int) – Number of CPUs to use for parallel processing. If ncpu is negative, the entire dataset is treated as a single chunk.
data_list (list of ndarray) – List of data arrays to be chunked. Each array should have the same size along the first axis (data_size).
data_size (int) – The total number of data points (size of the first dimension of arrays in data_list).
- Returns:
chunks – A tuple where each element corresponds to a list of chunks for a particular array in data_list. The total number of chunks is determined by ncpu.
- Return type:
tuple of lists of ndarray
Notes
The function computes the chunk size as data_size // ncpu to ensure chunks are of approximately equal size.
If ncpu is negative, the entire dataset is returned as a single chunk.
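The chunk-size arithmetic described in these notes can be sketched as follows. This is a minimal illustration and not crispy's actual source: the name `chunk_data_sketch` is hypothetical, and both the handling of a remainder (left as a shorter final chunk) and the handling of `ncpu == 0` (rejected) are assumptions, since the documentation above does not specify either.

```python
import numpy as np


def chunk_data_sketch(ncpu, data_list, data_size):
    """Hypothetical re-implementation for illustration; not crispy's code."""
    if ncpu < 0:
        # Documented behavior: a negative ncpu yields one chunk per array.
        return tuple([arr] for arr in data_list)
    if ncpu == 0:
        # Assumption: the documentation does not cover ncpu == 0.
        raise ValueError("ncpu must be nonzero")

    # Chunk size as stated in the Notes. The max(1, ...) guard is an
    # assumption that avoids a zero step when data_size < ncpu.
    chunk_size = max(1, data_size // ncpu)

    # Assumption: leftover rows form a shorter final chunk.
    starts = range(0, data_size, chunk_size)
    return tuple(
        [arr[start:start + chunk_size] for start in starts]
        for arr in data_list
    )


data1 = np.random.random((100, 3))
data2 = np.random.random((100, 3))
chunks = chunk_data_sketch(4, [data1, data2], data_size=100)
print(len(chunks[0]), chunks[0][0].shape)  # 4 (25, 3)
```

Under these assumptions, a `data_size` that is not a multiple of `ncpu` produces one extra, shorter chunk (for example, 10 rows with `ncpu = 3` gives chunks of 3, 3, 3, and 1 rows). The real function may distribute the remainder differently.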
Examples
Divide data into chunks for parallel processing:
>>> import numpy as np
>>> from crispy import scms
>>> data1 = np.random.random((100, 3))  # Dataset 1
>>> data2 = np.random.random((100, 3))  # Dataset 2
>>> ncpu = 4
>>> chunks = scms.chunk_data(ncpu, [data1, data2], data_size=100)
>>> for chunk1, chunk2 in zip(*chunks):
...     print(chunk1.shape, chunk2.shape)