chunk_data

crispy.scms.chunk_data(ncpu, data_list, data_size)

Divide data into chunks for multiprocessing.

This function splits each array in data_list along its first axis into approximately equal-sized chunks so that the work can be distributed across multiple CPUs.

Parameters:
  • ncpu (int) – Number of CPUs to use for parallel processing. If ncpu is negative, the entire dataset is treated as a single chunk.

  • data_list (list of ndarray) – List of data arrays to be chunked. Each array should have the same size along the first axis (data_size).

  • data_size (int) – The total number of data points (size of the first dimension of arrays in data_list).

Returns:

chunks – A tuple with one element per array in data_list; each element is the list of chunks for that array. The number of chunks per array is determined by ncpu.

Return type:

tuple of lists of ndarray

Notes

  • The function computes the chunk size as data_size // ncpu to ensure chunks are of approximately equal size.

  • If ncpu is negative, the entire dataset is returned as a single chunk.
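
The two notes above can be illustrated with a minimal sketch. This is not the crispy implementation: chunk_data_sketch is a hypothetical stand-in, and two of its behaviours are assumptions that the notes do not specify, namely that the final chunk absorbs any remainder left by the integer division and that ncpu = 0 is rejected.

```python
import numpy as np


def chunk_data_sketch(ncpu, data_list, data_size):
    """Illustrative stand-in for chunk_data; not the crispy implementation."""
    if ncpu < 0:
        # Documented behaviour: a negative ncpu yields a single chunk per array.
        return tuple([data] for data in data_list)
    if ncpu == 0:
        # Assumption: the documentation does not cover ncpu == 0.
        raise ValueError("ncpu must be non-zero")

    # Documented chunk size: integer division of data_size by ncpu.
    chunk_size = data_size // ncpu

    chunked = []
    for data in data_list:
        pieces = []
        for i in range(ncpu):
            start = i * chunk_size
            # Assumption: the final chunk runs to the end of the array,
            # so it absorbs any remainder of the division.
            stop = data_size if i == ncpu - 1 else start + chunk_size
            pieces.append(data[start:stop])
        chunked.append(pieces)
    return tuple(chunked)


data = np.arange(30).reshape(10, 3)
chunks = chunk_data_sketch(4, [data, data * 2], data_size=10)
print([c.shape for c in chunks[0]])  # [(2, 3), (2, 3), (2, 3), (4, 3)]
```

With data_size = 10 and ncpu = 4, the chunk size is 10 // 4 = 2, so under the remainder assumption the last chunk holds the final four rows.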

Examples

Divide data into chunks for parallel processing:

>>> import numpy as np
>>> from crispy import scms
>>> data1 = np.random.random((100, 3))  # Dataset 1
>>> data2 = np.random.random((100, 3))  # Dataset 2
>>> ncpu = 4
>>> chunks = scms.chunk_data(ncpu, [data1, data2], data_size=100)
>>> for chunk1, chunk2 in zip(*chunks):
...     print(chunk1.shape, chunk2.shape)
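
As a hedged sketch of how the paired chunks might then be consumed, the snippet below applies a per-chunk task and reassembles the results. To stay self-contained it does not call crispy: np.array_split is used only as a stand-in that builds a tuple of chunk lists with the documented layout, and worker is a hypothetical task.

```python
import numpy as np

# Stand-in for the output of chunk_data: one list of chunks per input array.
data1 = np.arange(300, dtype=float).reshape(100, 3)
data2 = np.ones((100, 3))
chunks = tuple(np.array_split(d, 4) for d in (data1, data2))


def worker(chunk1, chunk2):
    """Hypothetical per-chunk task: element-wise sum of the paired chunks."""
    return chunk1 + chunk2


# zip(*chunks) pairs the i-th chunk of every array.
results = [worker(c1, c2) for c1, c2 in zip(*chunks)]

# Stitch the per-chunk results back together along the first axis.
combined = np.concatenate(results, axis=0)
print(combined.shape)  # (100, 3)
```

The list comprehension runs serially; the same iterable could instead be passed to multiprocessing.Pool.starmap(worker, zip(*chunks)), provided worker is defined at module level so that it can be pickled.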