Source code for elephant.cell_assembly_detection

"""
CAD :cite:`cad-Russo2017_e19428` is a method aimed at capturing structures of
higher-order correlation in massively parallel spike trains. In particular, it
is able to extract patterns of spikes with arbitrary configurations of time
lags (the time intervals between spikes in a pattern) and at multiple time
scales, e.g. from synchronous patterns to firing-rate co-modulations.

CAD consists of statistical parametric testing at the level of pairs of
neurons, followed by an agglomerative recursive algorithm, in order to detect
and statistically test precise repetitions of spikes in the data.
In particular, pairs of neurons are tested for significance under the null
hypothesis of independence, and the significant pairs are then agglomerated
into higher-order patterns.
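
As a rough illustration (not part of the API), the significance level used in
the first pairwise-testing loop is Bonferroni-corrected by the number of tests
performed in that loop; for instance, with 6 neurons and a maximal lag of
2 bins, the corrected threshold would be:

>>> n_neurons, max_lag, alpha = 6, 2, 0.05
>>> n_tests = n_neurons * (n_neurons - 1) * (2 * max_lag + 1)
>>> round(alpha * 2 / n_tests, 6)
0.000667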

Given a list of spike trains discretized (binned) at a given temporal scale
(bin_size) and assumed to be recorded in parallel, the CAD analysis can be
applied as demonstrated in the short toy example below, in which the parallel
spike trains exhibit occasional fully synchronous events.

.. autosummary::
    :toctree: _toctree/cell_assembly_detection

    cell_assembly_detection


Visualization
-------------
Visualization of the CAD method is covered in Viziphant:
:func:`viziphant.patterns.plot_patterns`.
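
A minimal usage sketch (assuming Viziphant is installed and that
`plot_patterns` accepts the analyzed spike trains together with the pattern
dictionaries returned by `cell_assembly_detection`; `spiketrains` and
`patterns` refer to the objects created in the Examples section below):

>>> import viziphant  # doctest: +SKIP
>>> viziphant.patterns.plot_patterns(spiketrains, patterns)  # doctest: +SKIP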


See Also
--------
:ref:`spade` : advanced detection of synchronous spike patterns

Examples
--------
>>> import quantities as pq
>>> import numpy as np
>>> from elephant.cell_assembly_detection import cell_assembly_detection
>>> from elephant.spike_train_generation import compound_poisson_process
>>> from elephant.conversion import BinnedSpikeTrain

Generate correlated data and bin it with a bin_size of 10 ms.

>>> np.random.seed(30)
>>> spiketrains = compound_poisson_process(rate=15*pq.Hz,
...     amplitude_distribution=[0, 0.95, 0, 0, 0, 0, 0.05], t_stop=5*pq.s)
>>> bst = BinnedSpikeTrain(spiketrains, bin_size=10 * pq.ms)
>>> bst.rescale('ms')

Call the method.

>>> patterns = cell_assembly_detection(bst, max_lag=2)
>>> patterns[0] # doctest: +SKIP
{'neurons': [0, 2],
 'lags': array([0.]) * ms,
 'pvalue': [5.3848138041122556e-05],
 'times': array([  90.,  160.,  170.,  550.,  790.,  910.,  930., 1420., 1470.,
        1480., 1650., 2030., 2220., 2570., 3130., 3430., 3480., 3610.,
        3800., 3830., 3930., 4080., 4560., 4600., 4670.]) * ms,
 'signature': [[1, 83], [2, 25]]}
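
Each detected pattern is a plain dictionary, so the output can be inspected
directly. For instance, a small sketch that lists, for every assembly, the
participating units, their lags and the occurrence count (the last entry of
its 'signature'):

>>> for pattern in patterns:  # doctest: +SKIP
...     size, n_occ = pattern['signature'][-1]
...     print(pattern['neurons'], pattern['lags'], n_occ)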

Refer to the Viziphant documentation regarding the visualization of this
example.

"""

from __future__ import division, print_function, unicode_literals

import copy
import math
import time

import numpy as np
from scipy.stats import f

import elephant.conversion as conv

__all__ = [
    "cell_assembly_detection"
]


def cell_assembly_detection(binned_spiketrain, max_lag, reference_lag=2,
                            alpha=0.05, min_occurrences=1, size_chunks=100,
                            max_spikes=np.inf, significance_pruning=True,
                            subgroup_pruning=True,
                            same_configuration_pruning=False,
                            verbose=False):
    """
    Perform the CAD analysis :cite:`cad-Russo2017_e19428` for the binned
    (discretized) spike trains given as input.

    The method looks for candidate significant patterns with lags (number of
    bins between successive spikes in the pattern) ranging from `-max_lag`
    to `max_lag` (the second parameter of the function). Thus, between two
    successive spikes in the pattern there can be at most
    `max_lag`*`bin_size` units of time.

    The method agglomerates pairs of units (or a unit and a preexisting
    assembly), tests their significance by a statistical test and stops when
    the detected assemblies reach their maximal dimension (parameter
    `max_spikes`).

    At every agglomeration size step (e.g. from triplets to quadruplets), the
    method filters patterns having the same neurons involved, and keeps only
    the most significant one. This pruning is optional and is controlled by
    the parameter `significance_pruning`. Assemblies already included in a
    bigger assembly are eliminated in a final pruning step. This pruning is
    also optional and is controlled by the parameter `subgroup_pruning`.

    Parameters
    ----------
    binned_spiketrain : elephant.conversion.BinnedSpikeTrain
        Binned spike trains containing data to be analyzed.
    max_lag : int
        Maximal lag to be tested. For a binning dimension of bin_size the
        method will test all pair configurations with a time shift between
        `-max_lag` and `max_lag`.
    reference_lag : int, optional
        Reference lag (in bins) for the non-stationarity correction in the
        statistical test.
        Default: 2
    alpha : float, optional
        Significance level for the statistical test.
        Default: 0.05
    min_occurrences : int, optional
        Minimal number of occurrences required for an assembly (all
        assemblies, even if significant, with fewer occurrences than
        `min_occurrences` are discarded).
        Default: 1
    size_chunks : int, optional
        Size (in bins) of chunks in which the spike trains are divided to
        compute the variance (to reduce non-stationarity effects on variance
        estimation).
        Default: 100
    max_spikes : int, optional
        Maximal assembly order (the algorithm will return assemblies composed
        of at most `max_spikes` elements).
        Default: np.inf
    significance_pruning : bool, optional
        If True, the method performs significance pruning among the detected
        assemblies.
        Default: True
    subgroup_pruning : bool, optional
        If True, the method performs subgroup pruning among the detected
        assemblies.
        Default: True
    same_configuration_pruning : bool, optional
        If True, performs an additional pruning step (not present in the
        original code, more efficient): assemblies that have already been
        formed are not tested again if they appear in the very same
        configuration.
        Default: False
    verbose : bool, optional
        Regulates the number of prints given by the method. If True, all
        prints are given; otherwise the method does not print anything.
        Default: False

    Returns
    -------
    assembly : list of dict
        Contains the assemblies detected for the bin size chosen. Each
        assembly is a dictionary with attributes:

        'neurons' : list
            Vector of units taking part in the assembly (unit order
            corresponds to the agglomeration order).
        'lags' : pq.Quantity
            Vector of time lags. `lags[z]` is the activation delay between
            `neurons[1]` and `neurons[z+1]`.
        'pvalue' : list
            Vector containing p-values.
            `pvalue[z]` is the p-value of the statistical test performed when
            adding `neurons[z+1]` to `neurons[1:z]`.
        'times' : pq.Quantity
            Assembly activation times in the units of `binned_spiketrain`.
        'signature' : np.ndarray
            Array of two entries `(z, c)`. The first is the number of neurons
            participating in the assembly (size), and the second is the
            number of assembly occurrences.

    Raises
    ------
    TypeError
        If `binned_spiketrain` is not an instance of
        `elephant.conversion.BinnedSpikeTrain`.
    ValueError
        If the parameters are out of bounds.

    Notes
    -----
    Alias: cad

    """
    initial_time = time.time()

    # check parameter input and raise errors if necessary
    _raise_errors(binned_spiketrain=binned_spiketrain,
                  max_lag=max_lag,
                  alpha=alpha,
                  min_occurrences=min_occurrences,
                  size_chunks=size_chunks,
                  max_spikes=max_spikes)

    bin_size = binned_spiketrain.bin_size
    t_start = binned_spiketrain.t_start

    # transform the binned spiketrain into array
    binned_spiketrain = binned_spiketrain.to_array()

    # zero order
    n_neurons = len(binned_spiketrain)

    # initialize empty assembly
    assembly_in = [{'neurons': None,
                    'lags': None,
                    'pvalue': None,
                    'times': None,
                    'signature': None}
                   for _ in range(n_neurons)]

    # initializing the dictionaries
    if verbose:
        print('Initializing the dictionaries...')

    for w1 in range(n_neurons):
        assembly_in[w1]['neurons'] = [w1]
        assembly_in[w1]['lags'] = []
        assembly_in[w1]['pvalue'] = []
        assembly_in[w1]['times'] = binned_spiketrain[w1]
        assembly_in[w1]['signature'] = [[1, sum(binned_spiketrain[w1])]]

    # first order = test over pairs
    # denominator of the Bonferroni correction:
    # divide alpha by the number of tests performed in the first
    # pairwise testing loop
    number_test_performed = n_neurons * (n_neurons - 1) * (2 * max_lag + 1)
    # Fix Issue #443
    # keep alph as the original input for alpha, this was done following
    # the MATLAB implementation by Eleonora Russo, see:
    # https://github.com/DurstewitzLab/Cell-Assembly-Detection
    alph = alpha
    alpha = alph * 2 / float(number_test_performed)
    if verbose:
        print('actual significance_level', alpha)

    # sign_pairs_matrix is the matrix with entry 1 for the significant pairs
    sign_pairs_matrix = np.zeros((n_neurons, n_neurons), dtype=int)
    assembly = []
    if verbose:
        print('Testing on pairs...')
    # nns: count of the existing assemblies
    nns = 0

    # initialize the structure existing_patterns, storing the patterns
    # determined by neurons and lags:
    # if the pattern already exists, don't do the test
    existing_patterns = []

    # for loop for the pairwise testing
    for w1 in range(n_neurons - 1):
        for w2 in range(w1 + 1, n_neurons):
            spiketrain2 = binned_spiketrain[w2]
            n2 = w2
            assembly_flag = 0
            # call of the function that does the pairwise testing
            call_tp = _test_pair(
                ensemble=assembly_in[w1],
                spiketrain2=spiketrain2,
                n2=n2,
                max_lag=max_lag,
                size_chunks=size_chunks,
                reference_lag=reference_lag,
                existing_patterns=existing_patterns,
                same_configuration_pruning=same_configuration_pruning)
            if same_configuration_pruning:
                assem_tp = call_tp[0]
            else:
                assem_tp = call_tp
            # if the assembly given in output is significant and the number
            # of occurrences is higher than the minimum requested number
            if assem_tp['pvalue'][-1] < alpha and \
                    assem_tp['signature'][-1][1] > min_occurrences:
                # save the assembly in the output
                assembly.append(assem_tp)
                sign_pairs_matrix[w1][w2] = 1
                assembly_flag = 1  # flag: it is indeed an assembly
                # put the item_candidate into the existing_patterns list
                if same_configuration_pruning:
                    item_candidate = call_tp[1]
                    if not existing_patterns:
                        existing_patterns = [item_candidate]
                    else:
                        existing_patterns.append(item_candidate)
            if assembly_flag:
                nns += 1  # count of the existing assemblies

    # making sign_pairs_matrix symmetric
    sign_pairs_matrix = sign_pairs_matrix + sign_pairs_matrix.T
    sign_pairs_matrix[sign_pairs_matrix == 2] = 1
    # print(sign_pairs_matrix)

    # second order and more: increase the assembly size by adding a new unit;
    # the algorithm will return assemblies composed of
    # at most max_spikes elements
    if verbose:
        print('\nTesting on higher order assemblies...\n')

    # keep the count of the current size of the assembly
    current_size_agglomeration = 2

    # number of groups previously found
    n_as = len(assembly)

    # w2_to_test_v : contains the elements to test with the elements that are
    # in the assembly in input
    w2_to_test_v = np.zeros(n_neurons)

    # testing for higher order assemblies
    w1 = 0

    while w1 < n_as:
        w1_elements = assembly[w1]['neurons']
        # Add only neurons that have significant first order
        # co-occurrences with members of the assembly.
        # Find indices and values of nonzero elements
        for i in range(len(w1_elements)):
            w2_to_test_v += sign_pairs_matrix[w1_elements[i]]
        # w2_to_test_p : vector with the indices of nonzero elements
        w2_to_test_p = np.nonzero(w2_to_test_v)[0]
        # list with the elements to test
        # that are not already in the assembly
        w2_to_test = [item for item in w2_to_test_p
                      if item not in w1_elements]
        pop_flag = 0

        # check that there are candidate neurons for agglomeration
        if w2_to_test:
            # bonferroni correction only for the tests actually performed
            alpha = alph / float(len(w2_to_test) * n_as * (2 * max_lag + 1))

            # testing for the elements in w2_to_test
            for ww2 in range(len(w2_to_test)):
                w2 = w2_to_test[ww2]
                spiketrain2 = binned_spiketrain[w2]
                assembly_flag = 0
                pop_flag = max(assembly_flag, 0)
                # testing for the assembly and the new neuron
                call_tp = _test_pair(
                    ensemble=assembly[w1],
                    spiketrain2=spiketrain2,
                    n2=w2,
                    max_lag=max_lag,
                    size_chunks=size_chunks,
                    reference_lag=reference_lag,
                    existing_patterns=existing_patterns,
                    same_configuration_pruning=same_configuration_pruning)
                if same_configuration_pruning:
                    assem_tp = call_tp[0]
                else:
                    assem_tp = call_tp

                # if it is significant and
                # the number of occurrences is sufficient and
                # the length of the assembly is less than the input limit
                if assem_tp['pvalue'][-1] < alpha and \
                        assem_tp['signature'][-1][1] > min_occurrences and \
                        assem_tp['signature'][-1][0] <= max_spikes:
                    # the assembly is saved in the output list of
                    # assemblies
                    assembly.append(assem_tp)
                    assembly_flag = 1
                    if len(assem_tp['neurons']) > current_size_agglomeration:
                        # up to the next agglomeration level
                        current_size_agglomeration += 1
                        # Pruning step 1
                        # between two assemblies with the same unit set
                        # arranged into different configurations,
                        # choose the most significant one
                        if significance_pruning is True and \
                                current_size_agglomeration > 3:
                            assembly, n_filtered_assemblies = \
                                _significance_pruning_step(
                                    pre_pruning_assembly=assembly)
                    if same_configuration_pruning:
                        item_candidate = call_tp[1]
                        existing_patterns.append(item_candidate)
                if assembly_flag:
                    # count one more assembly
                    nns += 1
                    n_as = len(assembly)

        # if at least once the assembly was agglomerated to a bigger one,
        # pop the smaller one
        if pop_flag:
            assembly.pop(w1)

        w1 = w1 + 1

    # Pruning step 1
    # between two assemblies with the same unit set arranged into different
    # configurations, choose the most significant one.
    # Last call for pruning of the last order of agglomeration
    if significance_pruning:
        assembly = _significance_pruning_step(
            pre_pruning_assembly=assembly)[0]

    # Pruning step 2
    # Remove assemblies whose elements are already
    # ALL included in a bigger assembly
    if subgroup_pruning:
        assembly = _subgroup_pruning_step(pre_pruning_assembly=assembly)

    # Reformat of the activation times
    for pattern in assembly:
        times = np.where(pattern['times'] > 0)[0] * bin_size + t_start
        pattern['times'] = times
        pattern['lags'] = pattern['lags'] * bin_size
        pattern['signature'] = np.array(pattern['signature'], dtype=int)

    # Give as output only the maximal groups
    if verbose:
        print('\nGiving outputs of the method...\n')
        print('final_assembly')
        for item in assembly:
            print(item['neurons'], item['lags'], item['signature'])

    # Time needed for the computation
    if verbose:
        print('\ntime', time.time() - initial_time)

    return assembly


def _chunking(binned_pair, size_chunks, max_lag, best_lag):
    """
    Chunks the object `binned_pair` into parts of the same bin length.

    Parameters
    ----------
    binned_pair : np.ndarray
        Vector of the binned spike trains for the pair being analyzed.
    size_chunks : int
        Size of the desired chunks.
    max_lag : int
        Maximal number of lags for the bin_size chosen.
    best_lag : int
        Lag with the highest number of coincidences.

    Returns
    -------
    chunked : list
        List with the object `binned_pair` cut into `n_chunks` parts.
    n_chunks : int
        Number of chunks.
    """
    length = len(binned_pair[0])

    # number of chunks
    n_chunks = math.ceil((length - max_lag) / size_chunks)

    # new chunk size, this is to have all chunks of roughly the same size
    size_chunks = math.floor((length - max_lag) / n_chunks)

    n_chunks = int(n_chunks)
    size_chunks = int(size_chunks)

    chunked = [[[], []] for _ in range(n_chunks)]

    # cut the time series according to best_lag
    binned_pair_cut = np.array([np.zeros(length - max_lag, dtype=int),
                                np.zeros(length - max_lag, dtype=int)])

    # choose which entries to consider according to the best lag chosen
    if best_lag == 0:
        binned_pair_cut[0] = binned_pair[0][0:length - max_lag]
        binned_pair_cut[1] = binned_pair[1][0:length - max_lag]
    elif best_lag > 0:
        binned_pair_cut[0] = binned_pair[0][0:length - max_lag]
        binned_pair_cut[1] = binned_pair[1][
            best_lag:length - max_lag + best_lag]
    else:
        binned_pair_cut[0] = binned_pair[0][
            -best_lag:length - max_lag - best_lag]
        binned_pair_cut[1] = binned_pair[1][0:length - max_lag]

    # put the cut data into the chunked object
    for iii in range(n_chunks - 1):
        chunked[iii][0] = binned_pair_cut[0][
            size_chunks * iii:size_chunks * (iii + 1)]
        chunked[iii][1] = binned_pair_cut[1][
            size_chunks * iii:size_chunks * (iii + 1)]

    # the last chunk can be of slightly different size
    chunked[n_chunks - 1][0] = binned_pair_cut[0][
        size_chunks * (n_chunks - 1):length]
    chunked[n_chunks - 1][1] = binned_pair_cut[1][
        size_chunks * (n_chunks - 1):length]

    return chunked, n_chunks


def _assert_same_pattern(item_candidate, existing_patterns, max_lag):
    """
    Tests if a particular pattern has already been tested and retrieved as
    significant.

    Parameters
    ----------
    item_candidate : list of list
        List with two components: the first contains the neurons involved in
        the assembly, the second the corresponding lags.
    existing_patterns : list
        List of the already significant patterns.
    max_lag : int
        Maximum lag to be tested.

    Returns
    -------
    bool
        True if the pattern was already tested and retrieved as significant,
        False otherwise.
    """
    # unique representation of the pattern in terms of lags, max_lag and the
    # participating neurons
    item_candidate = sorted(item_candidate[0] * 2 * max_lag +
                            item_candidate[1] + max_lag)
    if item_candidate in existing_patterns:
        return True
    else:
        return False


def _test_pair(ensemble, spiketrain2, n2, max_lag, size_chunks, reference_lag,
               existing_patterns, same_configuration_pruning):
    """
    Tests if two spike trains have repetitive patterns occurring more
    frequently than chance.

    Parameters
    ----------
    ensemble : dict
        Structure with the previously formed assembly and its spike train.
    spiketrain2 : list
        Spike train of the new unit to be tested for significance (candidate
        to be a new assembly member).
    n2 : int
        New unit to be tested.
    max_lag : int
        Maximum lag to be tested.
    size_chunks : int
        Size (in bins) of chunks in which the spike trains are divided to
        compute the variance (to reduce non-stationarity effects on variance
        estimation).
    reference_lag : int
        Lag of reference; if zero or negative, the reference lag is `-l`.
    existing_patterns : list
        List of the already significant patterns.
    same_configuration_pruning : bool
        If True (not present in the original code and more efficient), does
        not test assemblies already formed if they appear in the very same
        configuration.
        Default: False

    Returns
    -------
    assembly : dict
        Assembly formed by the method (can be empty), with attributes:

        'elements' : vector of units taking part in the assembly (unit order
            corresponds to the agglomeration order)
        'lag' : vector of time lags (`lag[z]` is the activation delay between
            `elements[1]` and `elements[z+1]`)
        'pvalue' : vector of p-values. `pr[z]` is the p-value of the
            statistical test performed when adding `elements[z+1]` to
            `elements[1:z]`
        'times' : assembly activation time. It reports how many times the
            complete assembly activates in that bin. The time always refers
            to the activation of the first listed assembly element
            (`elements[1]`), which does not necessarily correspond to the
            first unit firing.
        'signature' : array of two entries (z, c). The first is the number of
            neurons participating in the assembly (size), while the second is
            the number of assembly occurrences.
    item_candidate : list of list
        List with two components: the first contains the neurons involved in
        the assembly, the second the corresponding lags.
    """
    # list with the binned spike trains of the two neurons
    binned_pair = [ensemble['times'], spiketrain2]

    # For large bin_sizes, the binned spike counts may potentially fluctuate
    # around a high mean level and never fall below some minimum count
    # considerably larger than zero for the whole time series.
    # Entries up to this minimum count would contribute
    # to the coincidence count although they are completely
    # uninformative, so we subtract the minima.
    binned_pair = np.array([binned_pair[0] - min(binned_pair[0]),
                            binned_pair[1] - min(binned_pair[1])])

    ntp = len(binned_pair[0])  # trial length

    # Divide in parallel trials with 0/1 elements
    # max number of spikes in one bin for both neurons
    maxrate = int(max(max(binned_pair[0]), max(binned_pair[1])))

    # creation of the parallel processes, one for each rate up to maxrate,
    # and computation of the coincidence count for both neurons
    par_processes = np.zeros((maxrate, 2, ntp), dtype=int)
    par_proc_expectation = np.zeros(maxrate, dtype=int)

    for i in range(maxrate):
        par_processes[i] = np.array(binned_pair > i, dtype=int)
        par_proc_expectation[i] = (np.sum(par_processes[i][0]) * np.sum(
            par_processes[i][1])) / float(ntp)

    # Decide which is the lag with most coincidences (l_ : best lag).
    # We are calculating the joint spike count of units A and B at lag l.
    # It is computed by counting the number of times we have a spike in A
    # and a corresponding spike in unit B l bins later. For every lag,
    # we select the one corresponding to the highest count.

    # structure with the coincidence counts for each lag
    fwd_coinc_count = np.array([0 for _ in range(max_lag + 1)])
    bwd_coinc_count = np.array([0 for _ in range(max_lag + 1)])

    for lag in range(max_lag + 1):
        time_fwd_cc = np.array(
            [binned_pair[0][0:len(binned_pair[0]) - max_lag],
             binned_pair[1][lag:len(binned_pair[1]) - max_lag + lag]])

        time_bwd_cc = np.array(
            [binned_pair[0][lag:len(binned_pair[0]) - max_lag + lag],
             binned_pair[1][0:len(binned_pair[1]) - max_lag]])

        # taking the minimum, place by place, for the coincidences
        fwd_coinc_count[lag] = np.sum(np.minimum(time_fwd_cc[0],
                                                 time_fwd_cc[1]))
        bwd_coinc_count[lag] = np.sum(np.minimum(time_bwd_cc[0],
                                                 time_bwd_cc[1]))

    # choice of the best lag, taking into account the reference lag
    if reference_lag <= 0:
        # if the global maximum is in the forward process (A to B)
        if np.amax(fwd_coinc_count) > np.amax(bwd_coinc_count):
            # fwd_flag indicates whether we are in time_fwd_cc or time_bwd_cc
            fwd_flag = 1
            global_maximum_index = np.argmax(fwd_coinc_count)
        else:
            fwd_flag = 2
            global_maximum_index = np.argmax(bwd_coinc_count)
        best_lag = (fwd_flag == 1) * global_maximum_index - \
            (fwd_flag == 2) * global_maximum_index
        max_coinc_count = max(np.amax(fwd_coinc_count),
                              np.amax(bwd_coinc_count))
    else:
        # reverse the backward coincidence counts and do not take into
        # account the first entry
        bwd_coinc_count_rev = bwd_coinc_count[1:len(bwd_coinc_count)][::-1]
        hab_l = np.append(bwd_coinc_count_rev, fwd_coinc_count)
        lags = range(-max_lag, max_lag + 1)
        max_coinc_count = np.amax(hab_l)
        best_lag = lags[np.argmax(hab_l)]
        if best_lag < 0:
            lag_ref = best_lag + reference_lag
            coinc_count_ref = hab_l[lags.index(lag_ref)]
        else:
            lag_ref = best_lag - reference_lag
            coinc_count_ref = hab_l[lags.index(lag_ref)]

    # now check whether the pattern, with those neurons and that particular
    # configuration of lags, is already in the list of the significant
    # patterns: if it is, don't do the testing; if it is not, continue
    previous_neu = ensemble['neurons']
    pattern_candidate = copy.copy(previous_neu)
    pattern_candidate.append(n2)
    pattern_candidate = np.array(pattern_candidate)

    # add both the new lag and zero
    previous_lags = ensemble['lags']
    lags_candidate = copy.copy(previous_lags)
    lags_candidate.append(best_lag)
    lags_candidate[:0] = [0]
    pattern_candidate = list(pattern_candidate)
    lags_candidate = list(lags_candidate)
    item_candidate = [[pattern_candidate], [lags_candidate]]

    if same_configuration_pruning:
        if _assert_same_pattern(item_candidate=item_candidate,
                                existing_patterns=existing_patterns,
                                max_lag=max_lag):
            en_neurons = copy.copy(ensemble['neurons'])
            en_neurons.append(n2)
            en_lags = copy.copy(ensemble['lags'])
            en_lags.append(np.inf)
            en_pvalue = copy.copy(ensemble['pvalue'])
            en_pvalue.append(1)
            en_n_occ = copy.copy(ensemble['signature'])
            en_n_occ.append([0, 0])
            item_candidate = []
            assembly = {'neurons': en_neurons,
                        'lags': en_lags,
                        'pvalue': en_pvalue,
                        'times': [],
                        'signature': en_n_occ}
            return assembly, item_candidate
    else:
        # I go on with the testing
        pair_expectation = np.sum(par_proc_expectation)

        # case of no coincidences, or limit for the asymptotic F
        # distribution (too few coincidences)
        if max_coinc_count == 0 or pair_expectation <= 5 or \
                pair_expectation >= (min(np.sum(binned_pair[0]),
                                         np.sum(binned_pair[1])) - 5):
            en_neurons = copy.copy(ensemble['neurons'])
            en_neurons.append(n2)
            en_lags = copy.copy(ensemble['lags'])
            en_lags.append(np.inf)
            en_pvalue = copy.copy(ensemble['pvalue'])
            en_pvalue.append(1)
            en_n_occ = copy.copy(ensemble['signature'])
            en_n_occ.append([0, 0])
            assembly = {'neurons': en_neurons,
                        'lags': en_lags,
                        'pvalue': en_pvalue,
                        'times': [],
                        'signature': en_n_occ}
            if same_configuration_pruning:
                item_candidate = []
                return assembly, item_candidate
            else:
                return assembly
        else:
            # construct the activation series for binned_pair
            length = len(binned_pair[0])  # trial length
            activation_series = np.zeros(length)
            if reference_lag <= 0:
                if best_lag == 0:  # synchrony case
                    for i in range(maxrate):
                        # for all parallel processes
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series = np.add(
                            activation_series,
                            np.multiply(par_processes_a, par_processes_b))
                    coinc_count_matrix = np.array(
                        [[0, fwd_coinc_count[0]],
                         [bwd_coinc_count[2], 0]])
                    # matrix with #AB and #BA; here we specifically choose
                    # 'l* = -2' for the synchrony case
                elif best_lag > 0:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        # multiplication between the two binned time series
                        # shifted by best_lag
                        activation_series[0:length - best_lag] = np.add(
                            activation_series[0:length - best_lag],
                            np.multiply(par_processes_a[0:length - best_lag],
                                        par_processes_b[best_lag:length]))
                    coinc_count_matrix = np.array(
                        [[0, fwd_coinc_count[global_maximum_index]],
                         [bwd_coinc_count[global_maximum_index], 0]])
                else:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series[-best_lag:length] = np.add(
                            activation_series[-best_lag:length],
                            np.multiply(par_processes_a[-best_lag:length],
                                        par_processes_b[0:length + best_lag]))
                    coinc_count_matrix = np.array(
                        [[0, fwd_coinc_count[global_maximum_index]],
                         [bwd_coinc_count[global_maximum_index], 0]])
            else:
                if best_lag == 0:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series = np.add(
                            activation_series,
                            np.multiply(par_processes_a, par_processes_b))
                elif best_lag > 0:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series[0:length - best_lag] = np.add(
                            activation_series[0:length - best_lag],
                            np.multiply(par_processes_a[0:length - best_lag],
                                        par_processes_b[best_lag:length]))
                else:
                    for i in range(maxrate):
                        par_processes_a = par_processes[i][0]
                        par_processes_b = par_processes[i][1]
                        activation_series[-best_lag:length] = np.add(
                            activation_series[-best_lag:length],
                            np.multiply(par_processes_a[-best_lag:length],
                                        par_processes_b[0:length + best_lag]))
                coinc_count_matrix = np.array([[0, max_coinc_count],
                                               [coinc_count_ref, 0]])

            # chunking
            chunked, nch = _chunking(binned_pair=binned_pair,
                                     size_chunks=size_chunks,
                                     max_lag=max_lag,
                                     best_lag=best_lag)

            marginal_counts = np.zeros((nch, maxrate, 2), dtype=int)
            # for every chunk, a vector with, in each entry, the sum of the
            # elements in each parallel binary process, for each unit

            # maxrate_t : contains the maxrates for both neurons in each chunk
            maxrate_t = np.zeros(nch, dtype=int)

            # ch_nn : contains the length of the different chunks
            ch_nn = np.zeros(nch, dtype=int)

            count_sum = 0

            # for every chunk, build the parallel processes
            # and the coincidence counts
            for iii in range(nch):
                binned_pair_chunked = np.array(chunked[iii])
                maxrate_t[iii] = max(max(binned_pair_chunked[0]),
                                     max(binned_pair_chunked[1]))
                ch_nn[iii] = len(chunked[iii][0])
                par_processes_chunked = [None
                                         for _ in range(int(maxrate_t[iii]))]
                for i in range(int(maxrate_t[iii])):
                    par_processes_chunked[i] = np.zeros(
                        (2, len(binned_pair_chunked[0])), dtype=int)
                    par_processes_chunked[i] = np.array(
                        binned_pair_chunked > i, dtype=int)

                for i in range(int(maxrate_t[iii])):
                    par_processes_a = par_processes_chunked[i][0]
                    par_processes_b = par_processes_chunked[i][1]
                    marginal_counts[iii][i][0] = int(np.sum(par_processes_a))
                    marginal_counts[iii][i][1] = int(np.sum(par_processes_b))
                    count_sum = count_sum + min(marginal_counts[iii][i][0],
                                                marginal_counts[iii][i][1])
                    # marginal_counts[iii][i] has in its entries
                    # '[ #_a^{\alpha,c} , #_b^{\alpha,c}]'
                    # where '\alpha' goes from 1 to maxrate and c goes
                    # from 1 to nch

            # calculation of the variance for each chunk
            n = ntp - max_lag  # used in the calculation of the p-value

            var_x = [np.zeros((2, 2)) for _ in range(nch)]
            var_tot = 0
            cov_abab = [0 for _ in range(nch)]
            cov_abba = [0 for _ in range(nch)]
            var_t = [np.zeros((2, 2)) for _ in range(nch)]
            cov_x = [np.zeros((2, 2)) for _ in range(nch)]

            for iii in range(nch):  # for every chunk
                ch_size = ch_nn[iii]

                # evaluation of AB + variance and covariance
                cov_abab[iii] = [[0 for _ in range(maxrate_t[iii])]
                                 for _ in range(maxrate_t[iii])]

                # for every rate up to the maxrate in that chunk
                for i in range(maxrate_t[iii]):
                    par_marg_counts_i = np.outer(marginal_counts[iii][i],
                                                 np.ones(2))
                    cov_abab[iii][i][i] = np.multiply(
                        np.multiply(par_marg_counts_i,
                                    par_marg_counts_i.T) / float(ch_size),
                        np.multiply(ch_size - par_marg_counts_i,
                                    ch_size - par_marg_counts_i.T) /
                        float(ch_size * (ch_size - 1)))
                    # calculation of the variance
                    var_t[iii] = var_t[iii] + cov_abab[iii][i][i]

                    # cross covariance terms
                    if maxrate_t[iii] > 1:
                        for j in range(i + 1, maxrate_t[iii]):
                            par_marg_counts_j = np.outer(
                                marginal_counts[iii][j], np.ones(2))
                            cov_abab[iii][i][j] = 2 * np.multiply(
                                np.multiply(par_marg_counts_j,
                                            par_marg_counts_j.T) /
                                float(ch_size),
                                np.multiply(ch_size - par_marg_counts_i,
                                            ch_size - par_marg_counts_i.T) /
                                float(ch_size * (ch_size - 1)))
                            # update of the variance
                            var_t[iii] = var_t[iii] + cov_abab[iii][i][j]

                # evaluation of coinc_count_matrix = #AB - #BA
                cov_abba[iii] = [[0 for _ in range(maxrate_t[iii])]
                                 for _ in range(maxrate_t[iii])]

                for i in range(maxrate_t[iii]):
                    par_marg_counts_i = np.outer(marginal_counts[iii][i],
                                                 np.ones(2))
                    cov_abba[iii][i][i] = np.multiply(
                        np.multiply(par_marg_counts_i,
                                    par_marg_counts_i.T) / float(ch_size),
                        np.multiply(ch_size - par_marg_counts_i,
                                    ch_size - par_marg_counts_i.T) /
                        float(ch_size * (ch_size - 1) ** 2))
                    cov_x[iii] = cov_x[iii] + cov_abba[iii][i][i]

                    if maxrate_t[iii] > 1:
                        for j in range((i + 1), maxrate_t[iii]):
                            par_marg_counts_j = np.outer(
                                marginal_counts[iii][j], np.ones(2))
                            cov_abba[iii][i][j] = 2 * np.multiply(
                                np.multiply(par_marg_counts_j,
                                            par_marg_counts_j.T) /
                                float(ch_size),
                                np.multiply(ch_size - par_marg_counts_i,
                                            ch_size - par_marg_counts_i.T) /
                                float(ch_size * (ch_size - 1) ** 2))
                            cov_x[iii] = cov_x[iii] + cov_abba[iii][i][j]

                var_x[iii] = var_t[iii] + var_t[iii].T - \
                    cov_x[iii] - cov_x[iii].T
                var_tot = var_tot + var_x[iii]

            # Yates correction
            coinc_count_matrix = coinc_count_matrix - coinc_count_matrix.T
            if abs(coinc_count_matrix[0][1]) > 0:
                coinc_count_matrix = abs(coinc_count_matrix) - 0.5

            if var_tot[0][1] == 0:
                pr_f = 1
            else:
                # p-value obtained through approximation to a Fisher F
                # distribution (here we employ the survival function)
                fstat = coinc_count_matrix ** 2 / var_tot
                pr_f = f.sf(fstat[0][1], 1, n)

            # Creation of the dictionary with the results
            en_neurons = copy.copy(ensemble['neurons'])
            en_neurons.append(n2)
            en_lags = copy.copy(ensemble['lags'])
            en_lags.append(best_lag)
            en_pvalue = copy.copy(ensemble['pvalue'])
            en_pvalue.append(pr_f)
            en_n_occ = copy.copy(ensemble['signature'])
            en_n_occ.append([len(en_neurons), sum(activation_series)])
            assembly = {'neurons': en_neurons,
                        'lags': en_lags,
                        'pvalue': en_pvalue,
                        'times': activation_series,
                        'signature': en_n_occ}

            if same_configuration_pruning:
                return assembly, item_candidate
            else:
                return assembly


def _significance_pruning_step(pre_pruning_assembly):
    """
    Between two assemblies with the same unit set arranged into different
    configurations, the most significant one is chosen.

    Parameters
    ----------
    pre_pruning_assembly : list
        Contains the whole set of significant assemblies (unfiltered).

    Returns
    -------
    assembly : list
        Contains the filtered assemblies.
    n_filtered_assemblies : int
        Number of assemblies filtered out by the function.
    """
    # number of assemblies before pruning
    nns = len(pre_pruning_assembly)

    # list with the (sorted) unit sets of the assemblies selected so far
    selection = []

    # list storing the found assemblies
    assembly = []

    for i in range(nns):
        # sorted list of neurons, so that membership can be checked
        elem = sorted(pre_pruning_assembly[i]['neurons'])

        if elem in selection:
            # find the element that was already in the list
            pre = selection.index(elem)
            if pre_pruning_assembly[i]['pvalue'][-1] <= \
                    assembly[pre]['pvalue'][-1]:
                # the new element has a p-value that is smaller than the one
                # stored previously: substitute the previous element in the
                # selection and in the list of the assemblies with the new one
                selection[pre] = elem
                assembly[pre] = pre_pruning_assembly[i]
        if elem not in selection:
            selection.append(elem)
            assembly.append(pre_pruning_assembly[i])

    # the number of assemblies filtered out is equal to the difference
    # between the pre- and post-pruning size
    n_filtered_assemblies = nns - len(assembly)

    return assembly, n_filtered_assemblies


def _subgroup_pruning_step(pre_pruning_assembly):
    """
    Removes assemblies whose elements are all already included in a bigger
    assembly.

    Parameters
    ----------
    pre_pruning_assembly : list
        Contains the assemblies filtered by the significance value.

    Returns
    -------
    assembly : list
        Contains the assemblies filtered by inclusion.
    """
    # reversing the pre_pruning_assembly makes the computation quicker,
    # since the assemblies are formed by agglomeration
    pre_pruning_assembly_r = list(reversed(pre_pruning_assembly))
    nns = len(pre_pruning_assembly_r)

    # boolean list with the selected assemblies
    selection = [True for _ in range(nns)]

    for i in range(nns):
        # check only in the range of the already selected assemblies
        if selection[i]:
            a = pre_pruning_assembly_r[i]['neurons']
            for j in range(i + 1, nns):
                if selection[j]:
                    b = pre_pruning_assembly_r[j]['neurons']
                    # check if a is included in b or vice versa
                    if set(a).issuperset(set(b)):
                        selection[j] = False
                    if set(b).issuperset(set(a)):
                        selection[i] = False
                    # only for the case in which significance_pruning=False
                    if set(a) == set(b):
                        selection[i] = True
                        selection[j] = True

    assembly_r = []

    # put into the final assembly only the selected ones
    for i in range(nns):
        if selection[i]:
            assembly_r.append(pre_pruning_assembly_r[i])

    assembly = list(reversed(assembly_r))

    return assembly


def _raise_errors(binned_spiketrain, max_lag, alpha, min_occurrences,
                  size_chunks, max_spikes):
    """
    Raises errors if the parameters given as input are not correct.

    Parameters
    ----------
    binned_spiketrain : elephant.conversion.BinnedSpikeTrain
        Binned spike trains containing data to be analyzed.
    max_lag : int
        Maximal lag to be tested. For a binning dimension of bin_size the
        method will test all pair configurations with a time shift between
        -max_lag and max_lag.
    alpha : float
        Significance level.
    min_occurrences : int
        Minimal number of occurrences required for an assembly (all
        assemblies, even if significant, with fewer occurrences than
        min_occurrences are discarded).
    size_chunks : int
        Size (in bins) of chunks in which the spike trains are divided to
        compute the variance (to reduce non-stationarity effects on variance
        estimation).
    max_spikes : int
        Maximal assembly order (the algorithm will return assemblies composed
        of at most max_spikes elements).

    Raises
    ------
    TypeError
        If the data is not an elephant.conversion.BinnedSpikeTrain object.
    ValueError
        If the maximum lag considered is 1 or less.
        If the significance level is not in [0, 1].
        If the minimal number of occurrences for an assembly is less than 1.
        If the length of the chunks for the variance computation is 1 or
        less.
        If the maximal assembly order is less than 2.
        If the time series is too short (less than 100 bins).
    """
    if not isinstance(binned_spiketrain, conv.BinnedSpikeTrain):
        raise TypeError(
            'data must be in BinnedSpikeTrain format')

    if max_lag < 2:
        raise ValueError('max_lag value cannot be less than 2')

    if alpha < 0 or alpha > 1:
        raise ValueError('significance level has to be in interval [0,1]')

    if min_occurrences < 1:
        raise ValueError('minimal number of occurrences for an assembly '
                         'must be at least 1')

    if size_chunks < 2:
        raise ValueError('length of the chunks cannot be 1 or less')

    if max_spikes < 2:
        raise ValueError('maximal assembly order must be at least 2')

    if binned_spiketrain.shape[1] - max_lag < 100:
        raise ValueError('The time series is too short, consider '
                         'taking a longer portion of spike train '
                         'or diminish the bin size to be tested')


# alias for the function
cad = cell_assembly_detection