Source code for elephant.cell_assembly_detection

"""
CAD :cite:`cad-Russo2017_e19428` is a method aimed at capturing structures of
higher-order correlation in massively parallel spike trains. In particular, it
is able to extract patterns of spikes with arbitrary configurations of time
lags (the time intervals between spikes in a pattern) and at multiple time
scales, e.g. from synchronous patterns to firing-rate co-modulations.

CAD consists of statistical parametric testing at the level of pairs of
neurons, followed by an agglomerative recursive algorithm, in order to detect
and statistically test precise repetitions of spikes in the data.
In particular, pairs of neurons are tested for significance under the null
hypothesis of independence, and the significant pairs are then agglomerated
into higher-order patterns.
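
As a rough illustration (not part of the API), the significance level used in
the first pairwise-testing loop is Bonferroni-corrected by the number of tests
performed in that loop; for instance, with 6 neurons and a maximal lag of
2 bins, the corrected threshold would be:

>>> n_neurons, max_lag, alpha = 6, 2, 0.05
>>> n_tests = n_neurons * (n_neurons - 1) * (2 * max_lag + 1)
>>> round(alpha * 2 / n_tests, 6)
0.000667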

Given a list of spike trains discretized (binned) at a given temporal scale
(bin_size) and assumed to be recorded in parallel, the CAD analysis can be
applied as demonstrated in the short toy example below, in which the parallel
spike trains exhibit occasional fully synchronous events.

.. autosummary::
    :toctree: _toctree/cell_assembly_detection

    cell_assembly_detection


Visualization
-------------
Visualization of the CAD method is covered in Viziphant:
:func:`viziphant.patterns.plot_patterns`.
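
A minimal usage sketch (assuming Viziphant is installed and that
`plot_patterns` accepts the analyzed spike trains together with the pattern
dictionaries returned by `cell_assembly_detection`; `spiketrains` and
`patterns` refer to the objects created in the Examples section below):

>>> import viziphant  # doctest: +SKIP
>>> viziphant.patterns.plot_patterns(spiketrains, patterns)  # doctest: +SKIP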


See Also
--------
:ref:`spade` : advanced detection of synchronous spike patterns

Examples
--------
>>> import quantities as pq
>>> import numpy as np
>>> from elephant.cell_assembly_detection import cell_assembly_detection
>>> from elephant.spike_train_generation import compound_poisson_process
>>> from elephant.conversion import BinnedSpikeTrain

Generate correlated data and bin it with a bin_size of 10 ms.

>>> np.random.seed(30)
>>> spiketrains = compound_poisson_process(rate=15*pq.Hz,
...     amplitude_distribution=[0, 0.95, 0, 0, 0, 0, 0.05], t_stop=5*pq.s)
>>> bst = BinnedSpikeTrain(spiketrains, bin_size=10 * pq.ms)
>>> bst.rescale('ms')

Call the method.

>>> patterns = cell_assembly_detection(bst, max_lag=2)
>>> patterns[0] # doctest: +SKIP
{'neurons': [0, 2],
 'lags': array([0.]) * ms,
 'pvalue': [5.3848138041122556e-05],
 'times': array([  90.,  160.,  170.,  550.,  790.,  910.,  930., 1420., 1470.,
        1480., 1650., 2030., 2220., 2570., 3130., 3430., 3480., 3610.,
        3800., 3830., 3930., 4080., 4560., 4600., 4670.]) * ms,
 'signature': [[1, 83], [2, 25]]}
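
Each detected pattern is a plain dictionary, so the output can be inspected
directly. For instance, a small sketch that lists, for every assembly, the
participating units, their lags and the occurrence count (the last entry of
its 'signature'):

>>> for pattern in patterns:  # doctest: +SKIP
...     size, n_occ = pattern['signature'][-1]
...     print(pattern['neurons'], pattern['lags'], n_occ)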

Refer to the Viziphant documentation regarding the visualization of this
example.

"""

from __future__ import division, print_function, unicode_literals

import copy
import math
import time

import numpy as np
from scipy.stats import f

import elephant.conversion as conv

__all__ = [
    "cell_assembly_detection"
]


def cell_assembly_detection(binned_spiketrain, max_lag, reference_lag=2,
                            alpha=0.05, min_occurrences=1, size_chunks=100,
                            max_spikes=np.inf, significance_pruning=True,
                            subgroup_pruning=True,
                            same_configuration_pruning=False,
                            verbose=False):
    """
    Perform the CAD analysis :cite:`cad-Russo2017_e19428` for the binned
    (discretized) spike trains given as input.

    The method looks for candidate significant patterns with lags (number of
    bins between successive spikes in the pattern) ranging from `-max_lag`
    to `max_lag` (the second parameter of the function). Thus, between two
    successive spikes in the pattern there can be at most
    `max_lag`*`bin_size` units of time.

    The method agglomerates pairs of units (or a unit and a preexisting
    assembly), tests their significance by a statistical test and stops when
    the detected assemblies reach their maximal dimension (parameter
    `max_spikes`).

    At every agglomeration size step (e.g. from triplets to quadruplets), the
    method filters patterns having the same neurons involved, and keeps only
    the most significant one. This pruning is optional and is controlled by
    the parameter `significance_pruning`. Assemblies already included in a
    bigger assembly are eliminated in a final pruning step. This pruning is
    also optional and is controlled by the parameter `subgroup_pruning`.

    Parameters
    ----------
    binned_spiketrain : elephant.conversion.BinnedSpikeTrain
        Binned spike trains containing data to be analyzed.
    max_lag : int
        Maximal lag to be tested. For a binning dimension of bin_size the
        method will test all pair configurations with a time shift between
        `-max_lag` and `max_lag`.
    reference_lag : int, optional
        Reference lag (in bins) for the non-stationarity correction in the
        statistical test.
        Default: 2
    alpha : float, optional
        Significance level for the statistical test.
        Default: 0.05
    min_occurrences : int, optional
        Minimal number of occurrences required for an assembly (all
        assemblies, even if significant, with fewer occurrences than
        `min_occurrences` are discarded).
        Default: 1
    size_chunks : int, optional
        Size (in bins) of chunks in which the spike trains are divided to
        compute the variance (to reduce non-stationarity effects on variance
        estimation).
        Default: 100
    max_spikes : int, optional
        Maximal assembly order (the algorithm will return assemblies composed
        of at most `max_spikes` elements).
        Default: np.inf
    significance_pruning : bool, optional
        If True, the method performs significance pruning among the detected
        assemblies.
        Default: True
    subgroup_pruning : bool, optional
        If True, the method performs subgroup pruning among the detected
        assemblies.
        Default: True
    same_configuration_pruning : bool, optional
        If True, performs an additional pruning step (not present in the
        original code, more efficient): assemblies that have already been
        formed are not tested again if they appear in the very same
        configuration.
        Default: False
    verbose : bool, optional
        Regulates the number of prints given by the method. If True, all
        prints are given; otherwise the method does not print anything.
        Default: False

    Returns
    -------
    assembly : list of dict
        Contains the assemblies detected for the bin size chosen. Each
        assembly is a dictionary with attributes:

        'neurons' : list
            Vector of units taking part in the assembly (unit order
            corresponds to the agglomeration order).
        'lags' : pq.Quantity
            Vector of time lags. `lags[z]` is the activation delay between
            `neurons[1]` and `neurons[z+1]`.
        'pvalue' : list
            Vector containing p-values.
            `pvalue[z]` is the p-value of the statistical test performed when
            adding `neurons[z+1]` to `neurons[1:z]`.
        'times' : pq.Quantity
            Assembly activation times in the units of `binned_spiketrain`.
        'signature' : np.ndarray
            Array of two entries `(z, c)`. The first is the number of neurons
            participating in the assembly (size), and the second is the
            number of assembly occurrences.

    Raises
    ------
    TypeError
        If `binned_spiketrain` is not an instance of
        `elephant.conversion.BinnedSpikeTrain`.
    ValueError
        If the parameters are out of bounds.

    Notes
    -----
    Alias: cad

    """
    initial_time = time.time()

    # check parameter input and raise errors if necessary
    _raise_errors(binned_spiketrain=binned_spiketrain,
                  max_lag=max_lag,
                  alpha=alpha,
                  min_occurrences=min_occurrences,
                  size_chunks=size_chunks,
                  max_spikes=max_spikes)

    bin_size = binned_spiketrain.bin_size
    t_start = binned_spiketrain.t_start

    # transform the binned spiketrain into array
    binned_spiketrain = binned_spiketrain.to_array()

    # zero order
    n_neurons = len(binned_spiketrain)

    # initialize empty assembly
    assembly_in = [{'neurons': None,
                    'lags': None,
                    'pvalue': None,
                    'times': None,
                    'signature': None}
                   for _ in range(n_neurons)]

    # initializing the dictionaries
    if verbose:
        print('Initializing the dictionaries...')

    for w1 in range(n_neurons):
        assembly_in[w1]['neurons'] = [w1]
        assembly_in[w1]['lags'] = []
        assembly_in[w1]['pvalue'] = []
        assembly_in[w1]['times'] = binned_spiketrain[w1]
        assembly_in[w1]['signature'] = [[1, sum(binned_spiketrain[w1])]]

    # first order = test over pairs
    # denominator of the Bonferroni correction:
    # divide alpha by the number of tests performed in the first
    # pairwise testing loop
    number_test_performed = n_neurons * (n_neurons - 1) * (2 * max_lag + 1)
    # Fix Issue #443
    # keep alph as the original input for alpha, this was done following
    # the MATLAB implementation by Eleonora Russo, see:
    # https://github.com/DurstewitzLab/Cell-Assembly-Detection
    alph = alpha
    alpha = alph * 2 / float(number_test_performed)
    if verbose:
        print('actual significance_level', alpha)

    # sign_pairs_matrix is the matrix with entry 1 for the significant pairs
    sign_pairs_matrix = np.zeros((n_neurons, n_neurons), dtype=int)
    assembly = []
    if verbose:
        print('Testing on pairs...')
    # nns: count of the existing assemblies
    nns = 0

    # initialize the structure existing_patterns, storing the patterns
    # determined by neurons and lags:
    # if the pattern already exists, don't do the test
    existing_patterns = []

    # for loop for the pairwise testing
    for w1 in range(n_neurons - 1):
        for w2 in range(w1 + 1, n_neurons):
            spiketrain2 = binned_spiketrain[w2]
            n2 = w2
            assembly_flag = 0
            # call of the function that does the pairwise testing
            call_tp = _test_pair(
                ensemble=assembly_in[w1],
                spiketrain2=spiketrain2,
                n2=n2,
                max_lag=max_lag,
                size_chunks=size_chunks,
                reference_lag=reference_lag,
                existing_patterns=existing_patterns,
                same_configuration_pruning=same_configuration_pruning)
            if same_configuration_pruning:
                assem_tp = call_tp[0]
            else:
                assem_tp = call_tp
            # if the assembly given in output is significant and the number
            # of occurrences is higher than the minimum requested number
            if assem_tp['pvalue'][-1] < alpha and \
                    assem_tp['signature'][-1][1] > min_occurrences:
                # save the assembly in the output
                assembly.append(assem_tp)
                sign_pairs_matrix[w1][w2] = 1
                assembly_flag = 1  # flag: it is indeed an assembly
                # put the item_candidate into the existing_patterns list
                if same_configuration_pruning:
                    item_candidate = call_tp[1]
                    if not existing_patterns:
                        existing_patterns = [item_candidate]
                    else:
                        existing_patterns.append(item_candidate)
            if assembly_flag:
                nns += 1  # count of the existing assemblies

    # making sign_pairs_matrix symmetric
    sign_pairs_matrix = sign_pairs_matrix + sign_pairs_matrix.T
    sign_pairs_matrix[sign_pairs_matrix == 2] = 1
    # print(sign_pairs_matrix)

    # second order and more: increase the assembly size by adding a new unit;
    # the algorithm will return assemblies composed of
    # at most max_spikes elements
    if verbose:
        print('\nTesting on higher order assemblies...\n')

    # keep the count of the current size of the assembly
    current_size_agglomeration = 2

    # number of groups previously found
    n_as = len(assembly)

    # w2_to_test_v : contains the elements to test with the elements that are
    # in the assembly in input
    w2_to_test_v = np.zeros(n_neurons)

    # testing for higher order assemblies
    w1 = 0

    while w1 < n_as:
        w1_elements = assembly[w1]['neurons']
        # Add only neurons that have significant first order
        # co-occurrences with members of the assembly.
        # Find indices and values of nonzero elements
        for i in range(len(w1_elements)):
            w2_to_test_v += sign_pairs_matrix[w1_elements[i]]
        # w2_to_test_p : vector with the indices of nonzero elements
        w2_to_test_p = np.nonzero(w2_to_test_v)[0]
        # list with the elements to test
        # that are not already in the assembly
        w2_to_test = [item for item in w2_to_test_p
                      if item not in w1_elements]
        pop_flag = 0

        # check that there are candidate neurons for agglomeration
        if w2_to_test:
            # bonferroni correction only for the tests actually performed
            alpha = alph / float(len(w2_to_test) * n_as * (2 * max_lag + 1))

            # testing for the elements in w2_to_test
            for ww2 in range(len(w2_to_test)):
                w2 = w2_to_test[ww2]
                spiketrain2 = binned_spiketrain[w2]
                assembly_flag = 0
                pop_flag = max(assembly_flag, 0)
                # testing for the assembly and the new neuron
                call_tp = _test_pair(
                    ensemble=assembly[w1],
                    spiketrain2=spiketrain2,
                    n2=w2,
                    max_lag=max_lag,
                    size_chunks=size_chunks,
                    reference_lag=reference_lag,
                    existing_patterns=existing_patterns,
                    same_configuration_pruning=same_configuration_pruning)
                if same_configuration_pruning:
                    assem_tp = call_tp[0]
                else:
                    assem_tp = call_tp

                # if it is significant and
                # the number of occurrences is sufficient and
                # the length of the assembly is less than the input limit
                if assem_tp['pvalue'][-1] < alpha and \
                        assem_tp['signature'][-1][1] > min_occurrences and \
                        assem_tp['signature'][-1][0] <= max_spikes:
                    # the assembly is saved in the output list of
                    # assemblies
                    assembly.append(assem_tp)
                    assembly_flag = 1
                    if len(assem_tp['neurons']) > current_size_agglomeration:
                        # up to the next agglomeration level
                        current_size_agglomeration += 1
                        # Pruning step 1
                        # between two assemblies with the same unit set
                        # arranged into different configurations,
                        # choose the most significant one
                        if significance_pruning is True and \
                                current_size_agglomeration > 3:
                            assembly, n_filtered_assemblies = \
                                _significance_pruning_step(
                                    pre_pruning_assembly=assembly)
                    if same_configuration_pruning:
                        item_candidate = call_tp[1]
                        existing_patterns.append(item_candidate)
                if assembly_flag:
                    # count one more assembly
                    nns += 1
                    n_as = len(assembly)

        # if at least once the assembly was agglomerated to a bigger one,
        # pop the smaller one
        if pop_flag:
            assembly.pop(w1)

        w1 = w1 + 1

    # Pruning step 1
    # between two assemblies with the same unit set arranged into different
    # configurations, choose the most significant one.
    # Last call for pruning of the last order of agglomeration
    if significance_pruning:
        assembly = _significance_pruning_step(
            pre_pruning_assembly=assembly)[0]

    # Pruning step 2
    # Remove assemblies whose elements are already
    # ALL included in a bigger assembly
    if subgroup_pruning:
        assembly = _subgroup_pruning_step(pre_pruning_assembly=assembly)

    # Reformat of the activation times
    for pattern in assembly:
        times = np.where(pattern['times'] > 0)[0] * bin_size + t_start
        pattern['times'] = times
        pattern['lags'] = pattern['lags'] * bin_size
        pattern['signature'] = np.array(pattern['signature'], dtype=int)

    # Give as output only the maximal groups
    if verbose:
        print('\nGiving outputs of the method...\n')
        print('final_assembly')
        for item in assembly:
            print(item['neurons'], item['lags'], item['signature'])

    # Time needed for the computation
    if verbose:
        print('\ntime', time.time() - initial_time)

    return assembly


def _chunking(binned_pair, size_chunks, max_lag, best_lag):
    """
    Chunks the object `binned_pair` into parts of the same bin length.

    Parameters
    ----------
    binned_pair : np.ndarray
        Vector of the binned spike trains for the pair being analyzed.
    size_chunks : int
        Size of the desired chunks.
    max_lag : int
        Maximal number of lags for the bin_size chosen.
    best_lag : int
        Lag with the highest number of coincidences.

    Returns
    -------
    chunked : list
        List with the object `binned_pair` cut into `n_chunks` parts.
    n_chunks : int
        Number of chunks.
    """
    length = len(binned_pair[0])

    # number of chunks
    n_chunks = math.ceil((length - max_lag) / size_chunks)

    # new chunk size, this is to have all chunks of roughly the same size
    size_chunks = math.floor((length - max_lag) / n_chunks)

    n_chunks = int(n_chunks)
    size_chunks = int(size_chunks)

    chunked = [[[], []] for _ in range(n_chunks)]

    # cut the time series according to best_lag
    binned_pair_cut = np.array([np.zeros(length - max_lag, dtype=int),
                                np.zeros(length - max_lag, dtype=int)])

    # choose which entries to consider according to the best lag chosen
    if best_lag == 0:
        binned_pair_cut[0] = binned_pair[0][0:length - max_lag]
        binned_pair_cut[1] = binned_pair[1][0:length - max_lag]
    elif best_lag > 0:
        binned_pair_cut[0] = binned_pair[0][0:length - max_lag]
        binned_pair_cut[1] = binned_pair[1][
            best_lag:length - max_lag + best_lag]
    else:
        binned_pair_cut[0] = binned_pair[0][
            -best_lag:length - max_lag - best_lag]
        binned_pair_cut[1] = binned_pair[1][0:length - max_lag]

    # put the cut data into the chunked object
    for iii in range(n_chunks - 1):
        chunked[iii][0] = binned_pair_cut[0][
            size_chunks * iii:size_chunks * (iii + 1)]
        chunked[iii][1] = binned_pair_cut[1][
            size_chunks * iii:size_chunks * (iii + 1)]

    # the last chunk can be of slightly different size
    chunked[n_chunks - 1][0] = binned_pair_cut[0][
        size_chunks * (n_chunks - 1):length]
    chunked[n_chunks - 1][1] = binned_pair_cut[1][
        size_chunks * (n_chunks - 1):length]

    return chunked, n_chunks


def _assert_same_pattern(item_candidate, existing_patterns, max_lag):
    """
    Tests if a particular pattern has already been tested and retrieved as
    significant.

    Parameters
    ----------
    item_candidate : list of list
        List with two components: the first contains the neurons involved in
        the assembly, the second the corresponding lags.
    existing_patterns : list
        List of the already significant patterns.
    max_lag : int
        Maximum lag to be tested.

    Returns
    -------
    bool
        True if the pattern was already tested and retrieved as significant,
        False otherwise.
    """
    # unique representation of the pattern in terms of lags, max_lag and the
    # participating neurons
    item_candidate = sorted(item_candidate[0] * 2 * max_lag +
                            item_candidate[1] + max_lag)
    if item_candidate in existing_patterns:
        return True
    else:
        return False


def _test_pair(ensemble, spiketrain2, n2, max_lag, size_chunks, reference_lag,
               existing_patterns, same_configuration_pruning):
    """
    Tests if two spike trains have repetitive patterns occurring more
    frequently than chance.

    Parameters
    ----------
    ensemble : dict
        Structure with the previously formed assembly and its spike train.
    spiketrain2 : list
        Spike train of the new unit to be tested for significance (candidate
        to be a new assembly member).
    n2 : int
        New unit to be tested.
    max_lag : int
        Maximum lag to be tested.
    size_chunks : int
        Size (in bins) of chunks in which the spike trains are divided to
        compute the variance (to reduce non-stationarity effects on variance
        estimation).
    reference_lag : int
        Lag of reference; if zero or negative, the reference lag is `-l`.
    existing_patterns : list
        List of the already significant patterns.
    same_configuration_pruning : bool
        If True (not present in the original code and more efficient), does
        not test assemblies already formed if they appear in the very same
        configuration.
        Default: False

    Returns
    -------
    assembly : dict
        Assembly formed by the method (can be empty), with attributes:

        'elements' : vector of units taking part in the assembly (unit order
            corresponds to the agglomeration order)
        'lag' : vector of time lags (`lag[z]` is the activation delay between
            `elements[1]` and `elements[z+1]`)
        'pvalue' : vector of p-values. `pr[z]` is the p-value of the
            statistical test performed when adding `elements[z+1]` to
            `elements[1:z]`
        'times' : assembly activation time. It reports how many times the
            complete assembly activates in that bin. The time always refers
            to the activation of the first listed assembly element
            (`elements[1]`), which does not necessarily correspond to the
            first unit firing.
        'signature' : array of two entries (z, c). The first is the number of
            neurons participating in the assembly (size), while the second is
            the number of assembly occurrences.
    item_candidate : list of list
        List with two components: the first contains the neurons involved in
        the assembly, the second the corresponding lags.
    """
    # list with the binned spike trains of the two neurons
    binned_pair = [ensemble['times'], spiketrain2]

    # For large bin_sizes, the binned spike counts may potentially fluctuate
    # around a high mean level and never fall below some minimum count
    # considerably larger than zero for the whole time series.
    # Entries up to this minimum count would contribute
    # to the coincidence count although they are completely
    # uninformative, so we subtract the minima.
    binned_pair = np.array([binned_pair[0] - min(binned_pair[0]),
                            binned_pair[1] - min(binned_pair[1])])

    ntp = len(binned_pair[0])  # trial length

    # Divide in parallel trials with 0/1 elements
    # max number of spikes in one bin for both neurons
    maxrate = int(max(max(binned_pair[0]), max(binned_pair[1])))

    # creation of the parallel processes, one for each rate up to maxrate,
    # and computation of the coincidence count for both neurons
    par_processes = np.zeros((maxrate, 2, ntp), dtype=int)
    par_proc_expectation = np.zeros(maxrate, dtype=int)

    for i in range(maxrate):
        par_processes[i] = np.array(binned_pair > i, dtype=int)
        par_proc_expectation[i] = (np.sum(par_processes[i][0]) * np.sum(
            par_processes[i][1])) / float(ntp)

    # Decide which is the lag with most coincidences (l_ : best lag).
    # We are calculating the joint spike count of units A and B at lag l.
    # It is computed by counting the number of times we have a spike in A
    # and a corresponding spike in unit B l bins later. For every lag,
    # we select the one corresponding to the highest count.

    # structure with the coincidence counts for each lag
    fwd_coinc_count = np.array([0 for _ in range(max_lag + 1)])
    bwd_coinc_count = np.array([0 for _ in range(max_lag + 1)])

    for lag in range(max_lag + 1):
        time_fwd_cc = np.array(
            [binned_pair[0][0:len(binned_pair[0]) - max_lag],
             binned_pair[1][lag:len(binned_pair[1]) - max_lag + lag]])

        time_bwd_cc = np.array(
            [binned_pair[0][lag:len(binned_pair[0]) - max_lag + lag],
             binned_pair[1][0:len(binned_pair[1]) - max_lag]])

        # taking the minimum, place by place, for the coincidences
        fwd_coinc_count[lag] = np.sum(np.minimum(time_fwd_cc[0],
                                                 time_fwd_cc[1]))
        bwd_coinc_count[lag] = np.sum(np.minimum(time_bwd_cc[0],
                                                 time_bwd_cc[1]))

    # choice of the best lag, taking into account the reference lag
    if reference_lag <= 0:
        # if the global maximum is in the forward process (A to B)
        if np.amax(fwd_coinc_count) > np.amax(bwd_coinc_count):
            # fwd_flag indicates whether we are in time_fwd_cc or time_bwd_cc
            fwd_flag = 1
            global_maximum_index = np.argmax(fwd_coinc_count)
        else:
            fwd_flag = 2
            global_maximum_index = np.argmax(bwd_coinc_count)
        best_lag = (fwd_flag == 1) * global_maximum_index - \
            (fwd_flag == 2) * global_maximum_index
        max_coinc_count = max(np.amax(fwd_coinc_count),
                              np.amax(bwd_coinc_count))
    else:
        # reverse the backward coincidence counts and do not take into
        # account the first entry
        bwd_coinc_count_rev = bwd_coinc_count[1:len(bwd_coinc_count)][::-1]
        hab_l = np.append(bwd_coinc_count_rev, fwd_coinc_count)
        lags = range(-max_lag, max_lag + 1)
        max_coinc_count = np.amax(hab_l)
        best_lag = lags[np.argmax(hab_l)]
        if best_lag < 0:
            lag_ref = best_lag + reference_lag
            coinc_count_ref = hab_l[lags.index(lag_ref)]
        else:
            lag_ref = best_lag - reference_lag
            coinc_count_ref = hab_l[lags.index(lag_ref)]

    # now check whether the pattern, with those neurons and that particular
    # configuration of lags, is already in the list of the significant
    # patterns: if it is, don't do the testing; if it is not, continue
    previous_neu = ensemble['neurons']
    pattern_candidate = copy.copy(previous_neu)
    pattern_candidate.append(n2)
    pattern_candidate = np.array(pattern_candidate)

    # add both the new lag and zero
    previous_lags = ensemble['lags']
    lags_candidate = copy.copy(previous_lags)
    lags_candidate.append(best_lag)
    lags_candidate[:0] = [0]
    pattern_candidate = list(pattern_candidate)
    lags_candidate = list(lags_candidate)
    item_candidate = [[pattern_candidate], [lags_candidate]]

    if same_configuration_pruning:
        if _assert_same_pattern(item_candidate=item_candidate,
                                existing_patterns=existing_patterns,
                                max_lag=max_lag):
            en_neurons = copy.copy(ensemble['neurons'])
            en_neurons.append(n2)
            en_lags = copy.copy(ensemble['lags'])
            en_lags.append(np.inf)
            en_pvalue = copy.copy(ensemble['pvalue'])
            en_pvalue.append(1)
            en_n_occ = copy.copy(ensemble['signature'])
            en_n_occ.append([0, 0])
            item_candidate = []
            assembly = {'neurons': en_neurons,
                        'lags': en_lags,
                        'pvalue': en_pvalue,
                        'times': [],
                        'signature': en_n_occ}
            return assembly, item_candidate
    else:
        # I go on with the testing
        pair_expectation = np.sum(par_proc_expectation)

        # case of no coincidences, or limit for the asymptotic F
        # distribution (too few coincidences)
        if max_coinc_count == 0 or pair_expectation <= 5 or \
                pair_expectation >= (min(np.sum(binned_pair[0]),
                                         np.sum(binned_pair[1])) - 5):
            en_neurons = copy.copy(ensemble['neurons'])
            en_neurons.append(n2)
            en_lags = copy.copy(ensemble['lags'])
            en_lags.append(np.inf)
            en_pvalue = copy.copy(ensemble['pvalue'])
            en_pvalue.append(1)
            en_n_occ = copy.copy(ensemble['signature'])
            en_n_occ.append([0, 0])
            assembly = {'neurons': en_neurons,
                        'lags': en_lags,
                        'pvalue': en_pvalue,
                        'times': [],
                        'signature': en_n_occ}
            if same_configuration_pruning:
                item_candidate = []
                return assembly, item_candidate
            else:
                return assembly
        else:
            # construct the activation series for binned_pair
            length = len(binned_pair[0])  # trial length
            activation_series = np.zeros(length)
            if reference_lag <= 0:
                if best_lag == 0:  # synchrony case
                    for i in range(maxrate):
                        # for all parallel processes
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series = np.add(
                            activation_series,
                            np.multiply(par_processes_a, par_processes_b))
                    coinc_count_matrix = np.array(
                        [[0, fwd_coinc_count[0]],
                         [bwd_coinc_count[2], 0]])
                    # matrix with #AB and #BA; here we specifically choose
                    # 'l* = -2' for the synchrony case
                elif best_lag > 0:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        # multiplication between the two binned time series
                        # shifted by best_lag
                        activation_series[0:length - best_lag] = np.add(
                            activation_series[0:length - best_lag],
                            np.multiply(par_processes_a[0:length - best_lag],
                                        par_processes_b[best_lag:length]))
                    coinc_count_matrix = np.array(
                        [[0, fwd_coinc_count[global_maximum_index]],
                         [bwd_coinc_count[global_maximum_index], 0]])
                else:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series[-best_lag:length] = np.add(
                            activation_series[-best_lag:length],
                            np.multiply(par_processes_a[-best_lag:length],
                                        par_processes_b[0:length + best_lag]))
                    coinc_count_matrix = np.array(
                        [[0, fwd_coinc_count[global_maximum_index]],
                         [bwd_coinc_count[global_maximum_index], 0]])
            else:
                if best_lag == 0:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series = np.add(
                            activation_series,
                            np.multiply(par_processes_a, par_processes_b))
                elif best_lag > 0:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series[0:length - best_lag] = np.add(
                            activation_series[0:length - best_lag],
                            np.multiply(par_processes_a[0:length - best_lag],
                                        par_processes_b[best_lag:length]))
                else:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series[-best_lag:length] = np.add(
                            activation_series[-best_lag:length],
                            np.multiply(par_processes_a[-best_lag:length],
                                        par_processes_b[0:length + best_lag]))
                coinc_count_matrix = np.array([[0, max_coinc_count],
                                               [coinc_count_ref, 0]])

            # chunking
            chunked, nch = _chunking(binned_pair=binned_pair,
                                     size_chunks=size_chunks,
                                     max_lag=max_lag,
                                     best_lag=best_lag)

            marginal_counts = np.zeros((nch, maxrate, 2), dtype=int)
            # for every chunk, a vector with, in each entry, the sum of the
            # elements in each parallel binary process, for each unit

            # maxrate_t : contains the maxrates for both neurons in each chunk
            maxrate_t = np.zeros(nch, dtype=int)

            # ch_nn : contains the length of the different chunks
            ch_nn = np.zeros(nch, dtype=int)

            count_sum = 0

            # for every chunk, build the parallel processes
            # and the coincidence counts
            for iii in range(nch):
                binned_pair_chunked = np.array(chunked[iii])
                maxrate_t[iii] = max(max(binned_pair_chunked[0]),
                                     max(binned_pair_chunked[1]))
                ch_nn[iii] = len(chunked[iii][0])
                par_processes_chunked = [None
                                         for _ in range(int(maxrate_t[iii]))]
                for i in range(int(maxrate_t[iii])):
                    par_processes_chunked[i] = np.zeros(
                        (2, len(binned_pair_chunked[0])), dtype=int)
                    par_processes_chunked[i] = np.array(
                        binned_pair_chunked > i, dtype=int)

                for i in range(int(maxrate_t[iii])):
                    par_processes_a = par_processes_chunked[i][0]
                    par_processes_b = par_processes_chunked[i][1]
                    marginal_counts[iii][i][0] = int(np.sum(par_processes_a))
                    marginal_counts[iii][i][1] = int(np.sum(par_processes_b))
                    count_sum = count_sum + min(marginal_counts[iii][i][0],
                                                marginal_counts[iii][i][1])
                    # marginal_counts[iii][i] has in its entries
                    # '[ #_a^{\alpha,c} , #_b^{\alpha,c}]'
                    # where '\alpha' goes from 1 to maxrate and c goes
                    # from 1 to nch

            # calculation of the variance for each chunk
            n = ntp - max_lag  # used in the calculation of the p-value

            var_x = [np.zeros((2, 2)) for _ in range(nch)]
            var_tot = 0
            cov_abab = [0 for _ in range(nch)]
            cov_abba = [0 for _ in range(nch)]
            var_t = [np.zeros((2, 2)) for _ in range(nch)]
            cov_x = [np.zeros((2, 2)) for _ in range(nch)]

            for iii in range(nch):  # for every chunk
                ch_size = ch_nn[iii]

                # evaluation of AB + variance and covariance
                cov_abab[iii] = [[0 for _ in range(maxrate_t[iii])]
                                 for _ in range(maxrate_t[iii])]

                # for every rate up to the maxrate in that chunk
                for i in range(maxrate_t[iii]):
                    par_marg_counts_i = np.outer(marginal_counts[iii][i],
                                                 np.ones(2))
                    cov_abab[iii][i][i] = np.multiply(
                        np.multiply(par_marg_counts_i,
                                    par_marg_counts_i.T) / float(ch_size),
                        np.multiply(ch_size - par_marg_counts_i,
                                    ch_size - par_marg_counts_i.T) /
                        float(ch_size * (ch_size - 1)))
                    # calculation of the variance
                    var_t[iii] = var_t[iii] + cov_abab[iii][i][i]

                    # cross covariance terms
                    if maxrate_t[iii] > 1:
                        for j in range(i + 1, maxrate_t[iii]):
                            par_marg_counts_j = np.outer(
                                marginal_counts[iii][j], np.ones(2))
                            cov_abab[iii][i][j] = 2 * np.multiply(
                                np.multiply(par_marg_counts_j,
                                            par_marg_counts_j.T) /
                                float(ch_size),
                                np.multiply(ch_size - par_marg_counts_i,
                                            ch_size - par_marg_counts_i.T) /
                                float(ch_size * (ch_size - 1)))
                            # update of the variance
                            var_t[iii] = var_t[iii] + cov_abab[iii][i][j]

                # evaluation of coinc_count_matrix = #AB - #BA
                cov_abba[iii] = [[0 for _ in range(maxrate_t[iii])]
                                 for _ in range(maxrate_t[iii])]

                for i in range(maxrate_t[iii]):
                    par_marg_counts_i = np.outer(marginal_counts[iii][i],
                                                 np.ones(2))
                    cov_abba[iii][i][i] = np.multiply(
                        np.multiply(par_marg_counts_i,
                                    par_marg_counts_i.T) / float(ch_size),
                        np.multiply(ch_size - par_marg_counts_i,
                                    ch_size - par_marg_counts_i.T) /
                        float(ch_size * (ch_size - 1) ** 2))
                    cov_x[iii] = cov_x[iii] + cov_abba[iii][i][i]

                    if maxrate_t[iii] > 1:
                        for j in range((i + 1), maxrate_t[iii]):
                            par_marg_counts_j = np.outer(
                                marginal_counts[iii][j], np.ones(2))
                            cov_abba[iii][i][j] = 2 * np.multiply(
                                np.multiply(par_marg_counts_j,
                                            par_marg_counts_j.T) /
                                float(ch_size),
                                np.multiply(ch_size - par_marg_counts_i,
                                            ch_size - par_marg_counts_i.T) /
                                float(ch_size * (ch_size - 1) ** 2))
                            cov_x[iii] = cov_x[iii] + cov_abba[iii][i][j]

                var_x[iii] = var_t[iii] + var_t[iii].T - \
                    cov_x[iii] - cov_x[iii].T
                var_tot = var_tot + var_x[iii]

            # Yates correction
            coinc_count_matrix = coinc_count_matrix - coinc_count_matrix.T
            if abs(coinc_count_matrix[0][1]) > 0:
                coinc_count_matrix = abs(coinc_count_matrix) - 0.5

            if var_tot[0][1] == 0:
                pr_f = 1
            else:
                # p-value obtained through approximation to a Fisher F
                # distribution (here we employ the survival function)
                fstat = coinc_count_matrix ** 2 / var_tot
                pr_f = f.sf(fstat[0][1], 1, n)

            # Creation of the dictionary with the results
            en_neurons = copy.copy(ensemble['neurons'])
            en_neurons.append(n2)
            en_lags = copy.copy(ensemble['lags'])
            en_lags.append(best_lag)
            en_pvalue = copy.copy(ensemble['pvalue'])
            en_pvalue.append(pr_f)
            en_n_occ = copy.copy(ensemble['signature'])
            en_n_occ.append([len(en_neurons), sum(activation_series)])
            assembly = {'neurons': en_neurons,
                        'lags': en_lags,
                        'pvalue': en_pvalue,
                        'times': activation_series,
                        'signature': en_n_occ}

            if same_configuration_pruning:
                return assembly, item_candidate
            else:
                return assembly


def _significance_pruning_step(pre_pruning_assembly):
    """
    Between two assemblies with the same unit set arranged into different
    configurations, the most significant one is chosen.

    Parameters
    ----------
    pre_pruning_assembly : list
        Contains the whole set of significant assemblies (unfiltered).

    Returns
    -------
    assembly : list
        Contains the filtered assemblies.
    n_filtered_assemblies : int
        Number of assemblies filtered out by the function.
    """
    # number of assemblies before pruning
    nns = len(pre_pruning_assembly)

    # list with the (sorted) unit sets of the assemblies selected so far
    selection = []

    # list storing the found assemblies
    assembly = []

    for i in range(nns):
        # sorted list of neurons, so that membership can be checked
        elem = sorted(pre_pruning_assembly[i]['neurons'])

        if elem in selection:
            # find the element that was already in the list
            pre = selection.index(elem)
            if pre_pruning_assembly[i]['pvalue'][-1] <= \
                    assembly[pre]['pvalue'][-1]:
                # the new element has a p-value that is smaller than the one
                # stored previously: substitute the previous element in the
                # selection and in the list of the assemblies with the new one
                selection[pre] = elem
                assembly[pre] = pre_pruning_assembly[i]
        if elem not in selection:
            selection.append(elem)
            assembly.append(pre_pruning_assembly[i])

    # the number of assemblies filtered out is equal to the difference
    # between the pre- and post-pruning size
    n_filtered_assemblies = nns - len(assembly)

    return assembly, n_filtered_assemblies


def _subgroup_pruning_step(pre_pruning_assembly):
    """
    Removes assemblies whose elements are all already included in a bigger
    assembly.

    Parameters
    ----------
    pre_pruning_assembly : list
        Contains the assemblies filtered by the significance value.

    Returns
    -------
    assembly : list
        Contains the assemblies filtered by inclusion.
    """
    # reversing the pre_pruning_assembly makes the computation quicker,
    # since the assemblies are formed by agglomeration
    pre_pruning_assembly_r = list(reversed(pre_pruning_assembly))
    nns = len(pre_pruning_assembly_r)

    # boolean list with the selected assemblies
    selection = [True for _ in range(nns)]

    for i in range(nns):
        # check only in the range of the already selected assemblies
        if selection[i]:
            a = pre_pruning_assembly_r[i]['neurons']
            for j in range(i + 1, nns):
                if selection[j]:
                    b = pre_pruning_assembly_r[j]['neurons']
                    # check if a is included in b or vice versa
                    if set(a).issuperset(set(b)):
                        selection[j] = False
                    if set(b).issuperset(set(a)):
                        selection[i] = False
                    # only for the case in which significance_pruning=False
                    if set(a) == set(b):
                        selection[i] = True
                        selection[j] = True

    assembly_r = []

    # put into the final assembly only the selected ones
    for i in range(nns):
        if selection[i]:
            assembly_r.append(pre_pruning_assembly_r[i])

    assembly = list(reversed(assembly_r))

    return assembly


def _raise_errors(binned_spiketrain, max_lag, alpha, min_occurrences,
                  size_chunks, max_spikes):
    """
    Raises errors if the parameters given as input are not correct.

    Parameters
    ----------
    binned_spiketrain : elephant.conversion.BinnedSpikeTrain
        Binned spike trains containing data to be analyzed.
    max_lag : int
        Maximal lag to be tested. For a binning dimension of bin_size the
        method will test all pair configurations with a time shift between
        -max_lag and max_lag.
    alpha : float
        Significance level.
    min_occurrences : int
        Minimal number of occurrences required for an assembly (all
        assemblies, even if significant, with fewer occurrences than
        min_occurrences are discarded).
    size_chunks : int
        Size (in bins) of chunks in which the spike trains are divided to
        compute the variance (to reduce non-stationarity effects on variance
        estimation).
    max_spikes : int
        Maximal assembly order (the algorithm will return assemblies composed
        of at most max_spikes elements).

    Raises
    ------
    TypeError
        If the data is not an elephant.conversion.BinnedSpikeTrain object.
    ValueError
        If the maximum lag considered is 1 or less.
        If the significance level is not in [0, 1].
        If the minimal number of occurrences for an assembly is less than 1.
        If the length of the chunks for the variance computation is 1 or
        less.
        If the maximal assembly order is less than 2.
        If the time series is too short (less than 100 bins).
    """
    if not isinstance(binned_spiketrain, conv.BinnedSpikeTrain):
        raise TypeError(
            'data must be in BinnedSpikeTrain format')

    if max_lag < 2:
        raise ValueError('max_lag value cannot be less than 2')

    if alpha < 0 or alpha > 1:
        raise ValueError('significance level has to be in interval [0,1]')

    if min_occurrences < 1:
        raise ValueError('minimal number of occurrences for an assembly '
                         'must be at least 1')

    if size_chunks < 2:
        raise ValueError('length of the chunks cannot be 1 or less')

    if max_spikes < 2:
        raise ValueError('maximal assembly order must be at least 2')

    if binned_spiketrain.shape[1] - max_lag < 100:
        raise ValueError('The time series is too short, consider '
                         'taking a longer portion of spike train '
                         'or diminish the bin size to be tested')


# alias for the function
cad = cell_assembly_detection