Source code for elephant.spade

"""
SPADE :cite:`spade-Torre2013_132,spade-Quaglio2017_41,spade-Stella2019_104022`
is the combination of a mining technique and multiple statistical tests to
detect and assess the statistical significance of repeated occurrences of spike
sequences (spatio-temporal patterns, STP).


.. autosummary::
    :toctree: _toctree/spade

    spade
    concepts_mining
    pvalue_spectrum
    test_signature_significance
    approximate_stability
    pattern_set_reduction
    concept_output_to_patterns


Visualization
-------------
Visualization of the SPADE analysis results is covered in Viziphant, e.g.:
:func:`viziphant.patterns.plot_patterns_statistics_all`

See Viziphant for more information:

https://viziphant.readthedocs.io/en/latest/modules.html
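
For example, the results of the SPADE analysis can be passed directly to
Viziphant for plotting. The following is a minimal sketch: it assumes
Viziphant is installed, that `spiketrains` is a list of parallel
:class:`neo.core.SpikeTrain` objects as in the Examples section below, and
that the plotting function accepts the pattern list returned by
:func:`spade` (consult the Viziphant documentation for the exact signature).

>>> from viziphant.patterns import plot_patterns_statistics_all  # doctest: +SKIP
>>> patterns = spade(spiketrains, bin_size=10 * pq.ms, winlen=1,
...                  n_surr=100)['patterns']  # doctest: +SKIP
>>> plot_patterns_statistics_all(patterns)  # doctest: +SKIP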


Notes
-----

This module relies on the C++ implementation of the FP-growth
algorithm developed by Florian Porrmann (available at
https://github.com/fporrmann/FPG). It replaces the more generic
implementation of the algorithm by Christian Borgelt
(http://www.borgelt.net/pyfim.html) that was used in previous versions of
Elephant.

If the module ``fim.so`` (``fim.pyd`` on Windows) is not available in
precompiled form (currently Linux/Windows) or cannot be compiled on a given
system during installation, SPADE falls back to a pure Python implementation
of the fast FCA algorithm contained in `elephant/spade_src/fast_fca.py`,
which is significantly slower.

See :ref:`no-compile-spade` for how to install Elephant without compiling the
``fim.so`` module.
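
To check which implementation will be used on a given installation, one can
attempt to import the compiled module directly. This is a minimal sketch that
mirrors the check performed when this module is imported:

>>> try:  # doctest: +SKIP
...     from elephant.spade_src import fim
...     print("compiled FIM module available")
... except ImportError:
...     print("falling back to the pure Python fast FCA implementation")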


See Also
--------
elephant.cell_assembly_detection.cell_assembly_detection : another method for
    detecting synchronous spike patterns


Examples
--------
Given a list of Neo SpikeTrain objects, assumed to be recorded in parallel,
the SPADE analysis can be applied as demonstrated in this short toy example
of artificial spike trains exhibiting occasional fully synchronous events.

>>> import quantities as pq
>>> import numpy as np
>>> from elephant.spike_train_generation import compound_poisson_process
>>> from elephant.spade import spade

Generate correlated spiketrains.

>>> np.random.seed(30)
>>> spiketrains = compound_poisson_process(rate=15*pq.Hz,
...     amplitude_distribution=[0, 0.95, 0, 0, 0, 0, 0.05], t_stop=5*pq.s)

Mine patterns with SPADE using a `bin_size` of 10 ms and a window length of 1
bin (i.e., detecting only synchronous patterns).

>>> patterns = spade(spiketrains, bin_size=10 * pq.ms, winlen=1,
...                  dither=5 * pq.ms, min_spikes=6, n_surr=10,
...                  psr_param=[0, 0, 3])['patterns']  # doctest:+ELLIPSIS
Time for data mining: ...
Time for pvalue spectrum computation: ...

>>> patterns[0] # doctest: +SKIP
{'itemset': (4, 3, 0, 2, 5, 1),
 'windows_ids': (9,
  16,
  55,
  91,
  ...,
  393,
  456,
  467),
 'neurons': [4, 3, 0, 2, 5, 1],
 'lags': array([0., 0., 0., 0., 0.]) * ms,
 'times': array([  90.,  160.,  550.,  910.,  930., 1420., 1480., 1650., 2570.,
        3130., 3430., 3480., 3610., 3800., 3830., 3930., 4560., 4670.]) * ms,
 'signature': (6, 18),
 'pvalue': 0.0}
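
Each entry of `patterns` is a dictionary. The participating neurons, the
pattern signature and the p-value, for instance, can be accessed by key
(a minimal sketch, assuming at least one pattern was found above):

>>> for pattern in patterns:  # doctest: +SKIP
...     print(pattern['neurons'], pattern['signature'], pattern['pvalue'])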


Refer to the Viziphant documentation for how to visualize such patterns.

:copyright: Copyright 2014-2024 by the Elephant team, see `doc/authors.rst`.
:license: BSD, see LICENSE.txt for details.
"""
from __future__ import division, print_function, unicode_literals

import operator
import time
import warnings
from collections import defaultdict
from functools import reduce
from itertools import chain, combinations

import numpy as np
from scipy import sparse

import quantities as pq

import neo
from neo.core.spiketrainlist import SpikeTrainList

import elephant.conversion as conv
import elephant.spike_train_surrogates as surr
from elephant.spade_src import fast_fca
from elephant.utils import deprecated_alias

__all__ = [
    "spade",
    "concepts_mining",
    "pvalue_spectrum",
    "test_signature_significance",
    "approximate_stability",
    "pattern_set_reduction",
    "concept_output_to_patterns"
]

warnings.simplefilter('once', UserWarning)

try:
    from mpi4py import MPI  # for parallelized routines

    HAVE_MPI = True
except ImportError:  # pragma: no cover
    HAVE_MPI = False

try:
    from elephant.spade_src import fim

    HAVE_FIM = True
except ImportError:  # pragma: no cover
    HAVE_FIM = False


[docs] @deprecated_alias(binsize='bin_size') def spade(spiketrains, bin_size, winlen, min_spikes=2, min_occ=2, max_spikes=None, max_occ=None, min_neu=1, approx_stab_pars=None, n_surr=0, dither=15 * pq.ms, spectrum='#', alpha=None, stat_corr='fdr_bh', surr_method='dither_spikes', psr_param=None, output_format='patterns', **surr_kwargs): r""" Perform the SPADE :cite:`spade-Torre2013_132`, :cite:`spade-Quaglio2017_41`, :cite:`spade-Stella2019_104022` analysis for the parallel input `spiketrains`. They are discretized with a temporal resolution equal to `bin_size` in a sliding window of `winlen*bin_size`. First, spike patterns are mined from the `spiketrains` using a technique called frequent itemset mining (FIM) or formal concept analysis (FCA). In this framework, a particular spatio-temporal spike pattern is called a "concept". It is then possible to compute the stability and the p-value of all pattern candidates. In a final step, concepts are filtered according to a stability threshold and a significance level `alpha`. Parameters ---------- spiketrains : list of :class:`neo.core.SpikeTrain` List containing the parallel spike trains to analyze bin_size : pq.Quantity The time precision used to discretize the spiketrains (binning). winlen : int The size (number of bins) of the sliding window used for the analysis. The maximal length of a pattern (delay between first and last spike) is then given by winlen*bin_size min_spikes : int, optional Minimum number of spikes of a sequence to be considered a pattern. Default: 2 min_occ : int, optional Minimum number of occurrences of a sequence to be considered as a pattern. Default: 2 max_spikes : int, optional Maximum number of spikes of a sequence to be considered a pattern. If None no maximal number of spikes is considered. Default: None max_occ : int, optional Maximum number of occurrences of a sequence to be considered as a pattern. If None, no maximal number of occurrences is considered. Default: None min_neu : int, optional Minimum number of neurons in a sequence to consider a pattern. Default: 1 approx_stab_pars : dict or None, optional Parameter values for approximate stability computation. 'n_subsets': int Number of subsets of a concept used to approximate its stability. If `n_subsets` is 0, it is calculated according to the formula given in Babin, Kuznetsov (2012), proposition 6: .. math:: n_{\text{subset}} = \frac{1}{2 \cdot \epsilon^2} \ln{\left( \frac{2}{\delta} \right)} +1 'delta' : float, optional delta: probability with at least :math:`1-\delta` 'epsilon' : float, optional epsilon: absolute error 'stability_thresh' : None or list of float List containing the stability thresholds used to filter the concepts. If `stability_thresh` is None, then the concepts are not filtered. Otherwise, only concepts with * intensional stability is greater than `stability_thresh[0]` or * extensional stability is greater than `stability_thresh[1]` are further analyzed. n_surr : int, optional Number of surrogates to generate to compute the p-value spectrum. This number should be large (`n_surr >= 1000` is recommended for 100 spike trains in `spiketrains`). If `n_surr == 0`, then the p-value spectrum is not computed. Default: 0 dither : pq.Quantity, optional Amount of spike time dithering for creating the surrogates for filtering the pattern spectrum. A spike at time `t` is placed randomly within `[t-dither, t+dither]` (see also :func:`elephant.spike_train_surrogates.dither_spikes`). 
Default: 15*pq.ms spectrum : {'#', '3d#'}, optional Define the signature of the patterns. '#': pattern spectrum using the as signature the pair: (number of spikes, number of occurrences) '3d#': pattern spectrum using the as signature the triplets: (number of spikes, number of occurrence, difference between last and first spike of the pattern) Default: '#' alpha : float, optional The significance level of the hypothesis tests performed. If `alpha is None`, all the concepts are returned. If `0.<alpha<1.`, the concepts are filtered according to their signature in the p-value spectrum. Default: None stat_corr : str Method used for multiple testing. See: :func:`test_signature_significance` Default: 'fdr_bh' surr_method : str Method to generate surrogates. You can use every method defined in :func:`elephant.spike_train_surrogates.surrogates`. Default: 'dither_spikes' psr_param : None or list of int or tuple of int This list contains parameters used in the pattern spectrum filtering: `psr_param[0]`: correction parameter for subset filtering (see `h_subset_filtering` in :func:`pattern_set_reduction`). `psr_param[1]`: correction parameter for superset filtering (see `k_superset_filtering` in :func:`pattern_set_reduction`). `psr_param[2]`: correction parameter for covered-spikes criterion (see `l_covered_spikes` in :func:`pattern_set_reduction`). output_format : {'concepts', 'patterns'} Distinguish the format of the output (see Returns). Default: 'patterns' surr_kwargs Keyword arguments that are passed to the surrogate methods. Returns ------- output : dict 'patterns': If `output_format` is 'patterns', see the return of :func:`concept_output_to_patterns` If `output_format` is 'concepts', then `output['patterns']` is a tuple of patterns which in turn are tuples of 1. spikes in the pattern 2. occurrences of the pattern For details see :func:`concepts_mining`. if stability is calculated, there are also: 3. intensional stability 4. extensional stability For details see :func:`approximate_stability`. 'pvalue_spectrum' (only if `n_surr > 0`): A list of signatures in tuples format: * size * number of occurrences * duration (only if `spectrum=='3d#'`) * p-value 'non_sgnf_sgnt': list Non-significant signatures of 'pvalue_spectrum'. Notes ----- If detected, this function will use MPI to parallelize the analysis. Examples -------- The following example applies SPADE to `spiketrains` (list of `neo.SpikeTrain`). >>> from elephant.spade import spade >>> import quantities as pq >>> from elephant.spike_train_generation import compound_poisson_process >>> np.random.seed(30) >>> spiketrains = compound_poisson_process(rate=15*pq.Hz, ... amplitude_distribution=[0, 0.95, 0, 0, 0, 0, 0.05], t_stop=5*pq.s) >>> bin_size = 3 * pq.ms # time resolution to discretize the spiketrains >>> winlen = 10 # maximal pattern length in bins (i.e., sliding window) >>> result_spade = spade( ... spiketrains, bin_size, winlen) # doctest: +ELLIPSIS Time for data mining: ... 
""" if HAVE_MPI: # pragma: no cover comm = MPI.COMM_WORLD # create MPI communicator rank = comm.Get_rank() # get rank of current MPI task else: rank = 0 compute_stability = _check_input( spiketrains=spiketrains, bin_size=bin_size, winlen=winlen, min_spikes=min_spikes, min_occ=min_occ, max_spikes=max_spikes, max_occ=max_occ, min_neu=min_neu, approx_stab_pars=approx_stab_pars, n_surr=n_surr, dither=dither, spectrum=spectrum, alpha=alpha, stat_corr=stat_corr, surr_method=surr_method, psr_param=psr_param, output_format=output_format) time_mining = time.time() if rank == 0 or compute_stability: # Mine the spiketrains for extraction of concepts concepts, rel_matrix = concepts_mining( spiketrains, bin_size, winlen, min_spikes=min_spikes, min_occ=min_occ, max_spikes=max_spikes, max_occ=max_occ, min_neu=min_neu, report='a') time_mining = time.time() - time_mining print(f"Time for data mining: {time_mining}") # Decide if compute the approximated stability if compute_stability: if 'stability_thresh' in approx_stab_pars.keys(): stability_thresh = approx_stab_pars.pop('stability_thresh') else: stability_thresh = None # Computing the approximated stability of all the concepts time_stability = time.time() concepts = approximate_stability( concepts, rel_matrix, **approx_stab_pars) time_stability = time.time() - time_stability print(f"Time for stability computation: {time_stability}") # Filtering the concepts using stability thresholds if stability_thresh is not None: concepts = [concept for concept in concepts if _stability_filter(concept, stability_thresh)] output = {} pv_spec = None # initialize pv_spec to None # Decide whether compute pvalue spectrum if n_surr > 0: # Compute pvalue spectrum time_pvalue_spectrum = time.time() pv_spec = pvalue_spectrum( spiketrains, bin_size, winlen, dither=dither, n_surr=n_surr, min_spikes=min_spikes, min_occ=min_occ, max_spikes=max_spikes, max_occ=max_occ, min_neu=min_neu, spectrum=spectrum, surr_method=surr_method, **surr_kwargs) time_pvalue_spectrum = time.time() - time_pvalue_spectrum print(f"Time for pvalue spectrum computation: {time_pvalue_spectrum}") # Storing pvalue spectrum output['pvalue_spectrum'] = pv_spec # rank!=0 returning None if rank != 0: warnings.warn('Returning None because executed on a process != 0') return None # Initialize non-significant signatures as empty list: ns_signatures = [] # Decide whether filter concepts with psf if n_surr > 0: if len(pv_spec) > 0 and alpha is not None: # Computing non-significant entries of the spectrum applying # the statistical correction ns_signatures = test_signature_significance( pv_spec, concepts, alpha, winlen, corr=stat_corr, report='non_significant', spectrum=spectrum) # Storing non-significant entries of the pvalue spectrum output['non_sgnf_sgnt'] = ns_signatures # Filter concepts with pvalue spectrum (psf) if len(ns_signatures) > 0: concepts = [concept for concept in concepts if _pattern_spectrum_filter(concept, ns_signatures, spectrum, winlen)] # Decide whether to filter concepts using psr if psr_param is not None: # Filter using conditional tests (psr) concepts = pattern_set_reduction( concepts, ns_signatures, winlen=winlen, spectrum=spectrum, h_subset_filtering=psr_param[0], k_superset_filtering=psr_param[1], l_covered_spikes=psr_param[2], min_spikes=min_spikes, min_occ=min_occ) # Storing patterns for output format concepts if output_format == 'concepts': output['patterns'] = concepts else: # output_format == 'patterns': # Transforming concepts to dictionary containing pattern's infos output['patterns'] 
= concept_output_to_patterns( concepts, winlen, bin_size, pv_spec, spectrum, spiketrains[0].t_start) return output
def _check_input( spiketrains, bin_size, winlen, min_spikes=2, min_occ=2, max_spikes=None, max_occ=None, min_neu=1, approx_stab_pars=None, n_surr=0, dither=15 * pq.ms, spectrum='#', alpha=None, stat_corr='fdr_bh', surr_method='dither_spikes', psr_param=None, output_format='patterns'): """ Checks all input given to SPADE Parameters ---------- see :`func`:`spade` Returns ------- compute_stability: bool if the stability calculation is used """ # Check spiketrains if not all([isinstance(elem, neo.SpikeTrain) for elem in spiketrains]): raise TypeError( 'spiketrains must be a list of SpikeTrains') # Check that all spiketrains have same t_start and same t_stop if not all([spiketrain.t_start == spiketrains[0].t_start for spiketrain in spiketrains]) or \ not all([spiketrain.t_stop == spiketrains[0].t_stop for spiketrain in spiketrains]): raise ValueError( 'All spiketrains must have the same t_start and t_stop') # Check bin_size if not isinstance(bin_size, pq.Quantity): raise TypeError('bin_size must be a pq.Quantity') # Check winlen if not isinstance(winlen, int): raise TypeError('winlen must be an integer') # Check min_spikes if not isinstance(min_spikes, int): raise TypeError('min_spikes must be an integer') # Check min_occ if not isinstance(min_occ, int): raise TypeError('min_occ must be an integer') # Check max_spikes if not (isinstance(max_spikes, int) or max_spikes is None): raise TypeError('max_spikes must be an integer or None') # Check max_occ if not (isinstance(max_occ, int) or max_occ is None): raise TypeError('max_occ must be an integer or None') # Check min_neu if not isinstance(min_neu, int): raise TypeError('min_neu must be an integer') # Check approx_stab_pars compute_stability = False if isinstance(approx_stab_pars, dict): if 'n_subsets' in approx_stab_pars.keys() or\ ('epsilon' in approx_stab_pars.keys() and 'delta' in approx_stab_pars.keys()): compute_stability = True else: raise ValueError( 'for approximate stability computation you need to ' 'pass n_subsets or epsilon and delta.') # Check n_surr if not isinstance(n_surr, int): raise TypeError('n_surr must be an integer') # Check dither if not isinstance(dither, pq.Quantity): raise TypeError('dither must be a pq.Quantity') # Check spectrum if spectrum not in ('#', '3d#'): raise ValueError("spectrum must be '#' or '3d#'") # Check alpha if isinstance(alpha, (int, float)): # Check redundant use of alpha if 0. < alpha < 1. and n_surr == 0: warnings.warn('0.<alpha<1. but p-value spectrum has not been ' 'computed (n_surr==0)') elif alpha is not None: raise TypeError('alpha must be an integer, a float or None') # Check stat_corr: if stat_corr not in \ ('bonferroni', 'sidak', 'holm-sidak', 'holm', 'simes-hochberg', 'hommel', 'fdr_bh', 'fdr_by', 'fdr_tsbh', 'fdr_tsbky', '', 'no'): raise ValueError("Parameter stat_corr not recognized") # Check surr_method if surr_method not in surr.SURR_METHODS: raise ValueError( f'specified surr_method (={surr_method}) not valid') # Check psr_param if psr_param is not None: if not isinstance(psr_param, (list, tuple)): raise TypeError('psr_param must be None or a list or tuple of ' 'integer') if not all(isinstance(param, int) for param in psr_param): raise TypeError('elements of psr_param must be integers') # Check output_format if output_format not in ('concepts', 'patterns'): raise ValueError("The output_format value has to be" "'patterns' or 'concepts'") return compute_stability
[docs] @deprecated_alias(binsize='bin_size') def concepts_mining(spiketrains, bin_size, winlen, min_spikes=2, min_occ=2, max_spikes=None, max_occ=None, min_neu=1, report='a'): """ Find pattern candidates extracting all the concepts of the context, formed by the objects defined as all windows of length `winlen*bin_size` slided along the discretized `spiketrains` and the attributes as the spikes occurring in each of the windows. Hence, the output are all the repeated sequences of spikes with maximal length `winlen`, which are not trivially explained by the same number of occurrences of a superset of spikes. Parameters ---------- spiketrains : list of :class:`neo.core.SpikeTrain` or :class:`elephant.conversion.BinnedSpikeTrain` Either list of the spiketrains to analyze or BinningSpikeTrain object containing the binned spiketrains to analyze bin_size : pq.Quantity The time precision used to discretize the `spiketrains` (clipping). winlen : int The size (number of bins) of the sliding window used for the analysis. The maximal length of a pattern (delay between first and last spike) is then given by `winlen*bin_size` min_spikes : int, optional Minimum number of spikes of a sequence to be considered a pattern. Default: 2 min_occ : int, optional Minimum number of occurrences of a sequence to be considered as a pattern. Default: 2 max_spikes : int, optional Maximum number of spikes of a sequence to be considered a pattern. If None no maximal number of spikes is considered. Default: None max_occ : int, optional Maximum number of occurrences of a sequence to be considered as a pattern. If None, no maximal number of occurrences is considered. Default: None min_neu : int, optional Minimum number of neurons in a sequence to be considered a pattern. Default: 1 report : {'a', '#', '3d#'}, optional Indicates the output of the function. 'a': All the mined patterns '#': Pattern spectrum using as signature the pair: (number of spikes, number of occurrence) '3d#': Pattern spectrum using as signature the triplets: (number of spikes, number of occurrence, difference between the times of the last and the first spike of the pattern) Default: 'a' Returns ------- mining_results : :class:`numpy.ndarray` If report == 'a': Numpy array of all the pattern candidates (concepts) found in the `spiketrains`. Each pattern is represented as a tuple containing (spike IDs, discrete times (window position) of the occurrences of the pattern). The spike IDs are defined as: `spike_id=neuron_id*bin_id` with `neuron_id` in `[0, len(spiketrains)]` and `bin_id` in `[0, winlen]`. If report == '#': The pattern spectrum is represented as a numpy array of triplets (pattern size, number of occurrences, number of patterns). If report == '3d#': The pattern spectrum is represented as a numpy array of quadruplets (pattern size, number of occurrences, difference between last and first spike of the pattern, number of patterns) rel_matrix : :class:`scipy.sparse.coo_matrix` A binary matrix of shape (number of windows, winlen*len(spiketrains)). Each row corresponds to a window (order according to their position in time). Each column corresponds to one bin and one neuron, and it is 0 if no spikes or 1 if one or more spikes occurred in that bin for that particular neuron. For example, the entry [0,0] of this matrix corresponds to the first bin of the first window position for the first neuron, the entry `[0,winlen]` to the first bin of the first window position for the second neuron. 
""" if report not in ('a', '#', '3d#'): raise ValueError( "report has to assume of the following values:" + f" 'a', '#' and '3d#,' got {report} instead") # if spiketrains is list of neo.SpikeTrain convert to conv.BinnedSpikeTrain if isinstance(spiketrains, (list, SpikeTrainList)) and \ isinstance(spiketrains[0], neo.SpikeTrain): spiketrains = conv.BinnedSpikeTrain( spiketrains, bin_size=bin_size, tolerance=None) if not isinstance(spiketrains, conv.BinnedSpikeTrain): raise TypeError( 'spiketrains must be either a list of neo.SpikeTrain or ' 'a conv.BinnedSpikeTrain object') # Clipping the spiketrains and (binary matrix) binary_matrix = spiketrains.to_sparse_bool_array().tocoo(copy=False) # Computing the context and the binary matrix encoding the relation between # objects (window positions) and attributes (spikes, # indexed with a number equal to neuron idx*winlen+bin idx) context, transactions, rel_matrix = _build_context(binary_matrix, winlen) # By default, set the maximum pattern size to the maximum number of # spikes in a window if max_spikes is None: max_spikes = binary_matrix.shape[0] * winlen # By default, set maximum number of occurrences to number of non-empty # windows if max_occ is None: max_occ = int(np.sum(np.sum(rel_matrix, axis=1) > 0)) # Check if fim.so available and use it if HAVE_FIM: # Return the output mining_results = _fpgrowth( transactions, rel_matrix=rel_matrix, min_c=min_occ, min_z=min_spikes, max_z=max_spikes, max_c=max_occ, winlen=winlen, min_neu=min_neu, report=report) return mining_results, rel_matrix # Otherwise use fast_fca python implementation warnings.warn( 'Optimized C implementation of FCA (fim.so/fim.pyd) not found ' + 'in elephant/spade_src folder, or not compatible with this ' + 'Python version. You are using the pure Python implementation ' + 'of fast fca.') # Return output mining_results = _fast_fca( context, min_c=min_occ, min_z=min_spikes, max_z=max_spikes, max_c=max_occ, winlen=winlen, min_neu=min_neu, report=report) return mining_results, rel_matrix
def _build_context(binary_matrix, winlen): """ Building the context given a matrix (number of trains x number of bins) of binned spike trains Parameters ---------- binary_matrix : sparse.coo_matrix Binary matrix containing the binned spike trains winlen : int Length of the bin_size used to bin the spiketrains Returns ------- context : list of tuple List of tuples containing one object (window position idx) and one of the correspondent spikes idx (bin idx * neuron idx) transactions : list List of all transactions, each element of the list contains the attributes of the corresponding object. rel_matrix : sparse.coo_matrix A binary matrix with shape (number of windows, winlen*len(spiketrains)). Each row corresponds to a window (order according to their position in time). Each column corresponds to one bin and one neuron, and it is 0 if no spikes or 1 if one or more spikes occurred in that bin for that particular neuron. E.g. the entry [0,0] of this matrix corresponds to the first bin of the first window position for the first neuron, the entry [0,winlen] to the first bin of the first window position for the second neuron. """ # Initialization of the outputs context = [] transactions = [] num_neurons, num_bins = binary_matrix.shape indices = np.argsort(binary_matrix.col) binary_matrix.row = binary_matrix.row[indices] binary_matrix.col = binary_matrix.col[indices] # out of all window positions # get all non-empty first bins unique_cols, unique_col_idx = np.unique( binary_matrix.col, return_index=True) unique_col_idx = np.concatenate((unique_col_idx, [len(binary_matrix.col)])) windows_row = [] windows_col = [] # all non-empty bins are starting positions for windows for idx, window_idx in enumerate(unique_cols): # find the end of the current window in unique_cols end_of_window = np.searchsorted(unique_cols, window_idx + winlen) # loop over all non-empty bins in the current window for rel_idx, col in enumerate(unique_cols[idx:end_of_window]): # get all occurrences of the current col in binary_matrix.col spike_indices_in_window = np.arange( unique_col_idx[idx + rel_idx], unique_col_idx[idx + rel_idx + 1]) # get the binary_matrix.row entries matching the current col # prepare the row of rel_matrix matching the current window # spikes are indexed as (neuron_id * winlen + bin_id) windows_col.extend( binary_matrix.row[spike_indices_in_window] * winlen + (col - window_idx)) windows_row.extend([window_idx] * len(spike_indices_in_window)) # Shape of the rel_matrix: # (total number of bins, # number of bins in one window * number of neurons) rel_matrix = sparse.coo_matrix( (np.ones((len(windows_col)), dtype=bool), (windows_row, windows_col)), shape=(num_bins, winlen * num_neurons), dtype=bool).A # Array containing all the possible attributes (each spike is indexed by # a number equal to neu idx*winlen + bin_idx) attributes = np.array( [s * winlen + t for s in range(binary_matrix.shape[0]) for t in range(winlen)]) # Building context and rel_matrix # Looping all the window positions w for window in unique_cols: # spikes in the current window times = rel_matrix[window] current_transactions = attributes[times] # adding to the context the window positions and the correspondent # attributes (spike idx) (fast_fca input) context.extend( (window, transaction) for transaction in current_transactions) # appending to the transactions spike idx (fast_fca input) of the # current window (fpgrowth input) transactions.append(list(current_transactions)) # Return context and rel_matrix return context, transactions, 
rel_matrix def _fpgrowth(transactions, min_c=2, min_z=2, max_z=None, max_c=None, rel_matrix=None, winlen=1, min_neu=1, target='c', report='a'): """ Find frequent item sets with the fpgrowth algorithm. Parameters ---------- transactions: tuple Transactions database to mine. The database must be an iterable of transactions; each transaction must be an iterable of items; each item must be a hashable object. If the database is a dictionary, the transactions are the keys, the values their (integer) multiplicities. target: str type of frequent item sets to find s/a sets/all all frequent item sets c closed closed frequent item sets m maximal maximal frequent item sets g gens generators Default:'c' min_c: int minimum support of an item set Default: 2 min_z: int minimum number of items per item set Default: 2 max_z: None/int maximum number of items per item set. If max_c==None no maximal size required Default: None max_c: None/int maximum support per item set. If max_c==None no maximal support required Default: None report: str 'a': all the mined patterns '#': pattern spectrum using as signature the pair: (number of spikes, number of occurrence) '3d#': pattern spectrum using as signature the triplets: (number of spikes, number of occurrence, difference between the times of the last and the first spike of the pattern) Default: 'a' rel_matrix : None or sparse.coo_matrix A binary matrix with shape (number of windows, winlen*len(spiketrains)). Each row corresponds to a window (order according to their position in time). Each column corresponds to one bin and one neuron, and it is 0 if no spikes or 1 if one or more spikes occurred in that bin for that particular neuron. E.g. the entry [0,0] of this matrix corresponds to the first bin of the first window position for the first neuron, the entry [0,winlen] to the first bin of the first window position for the second neuron. If == None only the closed frequent itemsets (intent) are returned and not which the index of their occurrences (extent) Default: None The following parameters are specific to Massive parallel SpikeTrains winlen: int The size (number of bins) of the sliding window used for the analysis. The maximal length of a pattern (delay between first and last spike) is then given by winlen*bin_size Default: 1 min_neu: int Minimum number of neurons in a sequence to be considered a potential pattern. Default: 1 Returns ------- If report == 'a': numpy array of all the pattern candidates (concepts) found in the spiketrains. Each pattern is represented as a tuple containing (spike IDs, discrete times (window position) of the occurrences of the pattern). The spike IDs are defined as: spike_id=neuron_id*bin_id; with neuron_id in [0, len(spiketrains)] and bin_id in [0, winlen]. 
If report == '#': The pattern spectrum is represented as a numpy array of triplets each formed by: (pattern size, number of occurrences, number of patterns) If report == '3d#': The pattern spectrum is represented as a numpy array of quadruplets each formed by: (pattern size, number of occurrences, difference between last and first spike of the pattern, number of patterns) """ if min_neu < 1: raise ValueError('min_neu must be an integer >=1') # By default, set the maximum pattern size to the number of spiketrains if max_z is None: max_z = max(max(map(len, transactions)), min_z + 1) # By default, set maximum number of data to number of bins if max_c is None: max_c = len(transactions) # Initializing outputs concepts = [] if report == '#': spec_matrix = np.zeros((max_z + 1, max_c + 1)) if report == '3d#': spec_matrix = np.zeros((max_z + 1, max_c + 1, winlen)) spectrum = [] # check whether all transactions are identical # in that case FIM would not find anything, # so we need to create the output manually # for optimal performance, # we do the check sequentially and immediately break # once we find a second unique transaction first_transaction = transactions[0] for transaction in transactions[1:]: if transaction != first_transaction: # Mining the spiketrains with fpgrowth algorithm fpgrowth_output = fim.fpgrowth( tracts=transactions, target=target, supp=-min_c, zmin=min_z, zmax=max_z, report='a', algo='s', winlen=winlen, min_neu=min_neu, threads=0, verbose=4) break else: fpgrowth_output = [(tuple(transactions[0]), len(transactions))] # Applying min/max conditions and computing extent (window positions) # fpgrowth_output = [concept for concept in fpgrowth_output # if _fpgrowth_filter(concept, winlen, max_c, min_neu)] # filter out subsets of patterns that are found as a side effect # of using the moving window strategy fpgrowth_output = _filter_for_moving_window_subsets( fpgrowth_output, winlen) for (intent, supp) in fpgrowth_output: if report == 'a': if rel_matrix is not None: # Computing the extent of the concept (patterns # occurrences), checking in rel_matrix in which windows # the intent occurred extent = tuple( np.nonzero( np.all(rel_matrix[:, intent], axis=1) )[0]) concepts.append((intent, extent)) # Computing 2d spectrum elif report == '#': spec_matrix[len(intent) - 1, supp - 1] += 1 # Computing 3d spectrum elif report == '3d#': spec_matrix[len(intent) - 1, supp - 1, max( np.array(intent) % winlen)] += 1 del fpgrowth_output if report == 'a': return concepts if report == '#': for (size, occurrences) in np.transpose(np.where(spec_matrix != 0)): spectrum.append( (size + 1, occurrences + 1, int(spec_matrix[size, occurrences]))) elif report == '3d#': for (size, occurrences, duration) in\ np.transpose(np.where(spec_matrix != 0)): spectrum.append( (size + 1, occurrences + 1, duration, int(spec_matrix[size, occurrences, duration]))) del spec_matrix if len(spectrum) > 0: spectrum = np.array(spectrum) elif report == '#': spectrum = np.zeros(shape=(0, 3)) elif report == '3d#': spectrum = np.zeros(shape=(0, 4)) return spectrum def _rereference_to_last_spike(transactions, winlen): """ Converts transactions from the default format neu_idx * winlen + bin_idx (relative to window start) into the format neu_idx * winlen + bin_idx (relative to last spike) """ len_transactions = len(transactions) neurons = np.zeros(len_transactions, dtype=int) bins = np.zeros(len_transactions, dtype=int) # extract neuron and bin indices for idx, attribute in enumerate(transactions): neurons[idx] = attribute // winlen 
bins[idx] = attribute % winlen # reference bins to last spike bins = bins.max() - bins # calculate converted transactions converted_transactions = neurons * winlen + bins return converted_transactions def _filter_for_moving_window_subsets(concepts, winlen): """ Since we're using a moving window, sub patterns starting from subsequent spikes after the first pattern spike will also be found. This filter removes them if they do not occur on their own in addition to the occurrences explained by their superset. Uses a reverse map with a set representation. """ # don't do anything if the input list is empty if len(concepts) == 0: return concepts # don't do anything if winlen is one if winlen == 1: return concepts if hasattr(concepts[0], 'intent'): # fca format # sort the concepts by (decreasing) support concepts.sort(key=lambda c: -len(c.extent)) support = np.array([len(c.extent) for c in concepts]) # convert transactions relative to last pattern spike converted_transactions = [_rereference_to_last_spike(concept.intent, winlen=winlen) for concept in concepts] else: # fim.fpgrowth format # sort the concepts by (decreasing) support concepts.sort(key=lambda concept: -concept[1]) support = np.array([concept[1] for concept in concepts]) # convert transactions relative to last pattern spike converted_transactions = [_rereference_to_last_spike(concept[0], winlen=winlen) for concept in concepts] output = [] for current_support in np.unique(support): support_indices = np.nonzero(support == current_support)[0] # construct reverse map reverse_map = defaultdict(set) for map_idx, i in enumerate(support_indices): for window_bin in converted_transactions[i]: reverse_map[window_bin].add(map_idx) for i in support_indices: intersection = reduce( operator.and_, (reverse_map[window_bin] for window_bin in converted_transactions[i])) if len(intersection) == 1: output.append(concepts[i]) return output def _fast_fca(context, min_c=2, min_z=2, max_z=None, max_c=None, report='a', winlen=1, min_neu=1): """ Find concepts of the context with the fast-fca algorithm. Parameters ---------- context : list List of tuples containing one object and one the correspondent attribute min_c: int minimum support of an item set Default: 2 min_z: int minimum number of items per item set Default: 2 max_z: None/int maximum number of items per item set. If max_c==None no maximal size required Default: None max_c: None/int maximum support per item set. If max_c==None no maximal support required Default: None report: str 'a': all the mined patterns '#': pattern spectrum using as signature the pair: (number of spikes, number of occurrence) '3d#': pattern spectrum using as signature the triplets: (number of spikes, number of occurrence, difference between the times of the last and the first spike of the pattern) Default: 'a' The following parameters are specific to Massive parallel SpikeTrains winlen: int The size (number of bins) of the sliding window used for the analysis. The maximal length of a pattern (delay between first and last spike) is then given by winlen*bin_size Default: 1 min_neu: int Minimum number of neurons in a sequence to be considered a potential pattern. Default: 1 Returns ------- If report == 'a': All the pattern candidates (concepts) found in the spiketrains. Each pattern is represented as a tuple containing (spike IDs, discrete times (window position) of the occurrences of the pattern). The spike IDs are defined as: spike_id=neuron_id*bin_id; with neuron_id in [0, len(spiketrains)] and bin_id in [0, winlen]. 
If report == '#': The pattern spectrum is represented as a list of triplets each formed by: (pattern size, number of occurrences, number of patterns) If report == '3d#': The pattern spectrum is represented as a list of quadruplets each formed by: (pattern size, number of occurrences, difference between last and first spike of the pattern, number of patterns) """ # Initializing outputs concepts = [] # Check parameters if min_neu < 1: raise ValueError('min_neu must be an integer >=1') # By default, set maximum number of attributes if max_z is None: max_z = len(context) # By default, set maximum number of data to number of bins if max_c is None: max_c = len(context) if report == '#': spec_matrix = np.zeros((max_z, max_c)) elif report == '3d#': spec_matrix = np.zeros((max_z, max_c, winlen)) spectrum = [] # Mining the spiketrains with fast fca algorithm fca_out = fast_fca.FormalConcepts(context) fca_out.computeLattice() fca_concepts = fca_out.concepts fca_concepts = [concept for concept in fca_concepts if _fca_filter(concept, winlen, min_c, min_z, max_c, max_z, min_neu)] fca_concepts = _filter_for_moving_window_subsets(fca_concepts, winlen) # Applying min/max conditions for fca_concept in fca_concepts: intent = tuple(fca_concept.intent) extent = tuple(fca_concept.extent) concepts.append((intent, extent)) # computing spectrum if report == '#': spec_matrix[len(intent) - 1, len(extent) - 1] += 1 elif report == '3d#': spec_matrix[len(intent) - 1, len(extent) - 1, max( np.array(intent) % winlen)] += 1 if report == 'a': return concepts del concepts # returning spectrum if report == '#': for (size, occurrence) in np.transpose(np.where(spec_matrix != 0)): spectrum.append( (size + 1, occurrence + 1, int(spec_matrix[size, occurrence]))) if report == '3d#': for (size, occurrence, duration) in\ np.transpose(np.where(spec_matrix != 0)): spectrum.append( (size + 1, occurrence + 1, duration, int(spec_matrix[size, occurrence, duration]))) del spec_matrix if len(spectrum) > 0: spectrum = np.array(spectrum) elif report == '#': spectrum = np.zeros(shape=(0, 3)) elif report == '3d#': spectrum = np.zeros(shape=(0, 4)) return spectrum def _fca_filter(concept, winlen, min_c, min_z, max_c, max_z, min_neu): """ Filter to select concepts with minimum/maximum number of spikes and occurrences and first spike in the first bin position """ intent = tuple(concept.intent) extent = tuple(concept.extent) keep_concepts = \ min_z <= len(intent) <= max_z and \ min_c <= len(extent) <= max_c and \ len(np.unique(np.array(intent) // winlen)) >= min_neu and \ min(np.array(intent) % winlen) == 0 return keep_concepts
[docs] @deprecated_alias(binsize='bin_size') def pvalue_spectrum( spiketrains, bin_size, winlen, dither, n_surr, min_spikes=2, min_occ=2, max_spikes=None, max_occ=None, min_neu=1, spectrum='#', surr_method='dither_spikes', **surr_kwargs): """ Compute the p-value spectrum of pattern signatures extracted from surrogates of parallel spike trains, under the null hypothesis of independent spiking. * n_surr surrogates are obtained from each spike train by spike dithering * pattern candidates (concepts) are collected from each surrogate data * the signatures (number of spikes, number of occurrences) of all patterns are computed, and their occurrence probability estimated by their occurrence frequency (p-value spectrum) Parameters ---------- spiketrains : list of :class:`neo.core.SpikeTrain` List containing the parallel spike trains to analyze bin_size : pq.Quantity The time precision used to discretize the `spiketrains` (binning). winlen : int The size (number of bins) of the sliding window used for the analysis. The maximal length of a pattern (delay between first and last spike) is then given by `winlen*bin_size` dither : pq.Quantity Amount of spike time dithering for creating the surrogates for filtering the pattern spectrum. A spike at time t is placed randomly within `[t-dither, t+dither]` (see also :func:`elephant.spike_train_surrogates.dither_spikes`). Default: 15*pq.s n_surr : int Number of surrogates to generate to compute the p-value spectrum. This number should be large (`n_surr>=1000` is recommended for 100 spike trains in spiketrains). If `n_surr` is 0, then the p-value spectrum is not computed. Default: 0 min_spikes : int, optional Minimum number of spikes of a sequence to be considered a pattern. Default: 2 min_occ : int, optional Minimum number of occurrences of a sequence to be considered as a pattern. Default: 2 max_spikes : int, optional Maximum number of spikes of a sequence to be considered a pattern. If None no maximal number of spikes is considered. Default: None max_occ : int, optional Maximum number of occurrences of a sequence to be considered as a pattern. If None, no maximal number of occurrences is considered. Default: None min_neu : int, optional Minimum number of neurons in a sequence to be considered a pattern. Default: 1 spectrum : {'#', '3d#'}, optional Defines the signature of the patterns. '#': pattern spectrum using the as signature the pair: (number of spikes, number of occurrence) '3d#': pattern spectrum using the as signature the triplets: (number of spikes, number of occurrence, difference between last and first spike of the pattern) Default: '#' surr_method : str Method that is used to generate the surrogates. You can use every method defined in :func:`elephant.spike_train_surrogates.dither_spikes`. Default: 'dither_spikes' surr_kwargs Keyword arguments that are passed to the surrogate methods. Returns ------- pv_spec : list if spectrum == '#': A list of triplets (z,c,p), where (z,c) is a pattern signature and p is the corresponding p-value (fraction of surrogates containing signatures (z*,c*)>=(z,c)). if spectrum == '3d#': A list of triplets (z,c,l,p), where (z,c,l) is a pattern signature and p is the corresponding p-value (fraction of surrogates containing signatures (z*,c*,l*)>=(z,c,l)). Signatures whose empirical p-value is 0 are not listed. 
""" # Initializing variables for parallel computing if HAVE_MPI: # pragma: no cover comm = MPI.COMM_WORLD # create MPI communicator rank = comm.Get_rank() # get rank of current MPI task size = comm.Get_size() # get tot number of MPI tasks else: rank = 0 size = 1 # Check on number of surrogates if n_surr <= 0: raise ValueError('n_surr has to be >0') if surr_method not in surr.SURR_METHODS: raise ValueError( f'specified surr_method (={surr_method}) not valid') if spectrum not in ('#', '3d#'): raise ValueError(f"Invalid spectrum: '{spectrum}'") len_partition = n_surr // size # length of each MPI task len_remainder = n_surr % size add_remainder = rank < len_remainder if max_spikes is None: # if max_spikes not defined, set it to the number of spiketrains times # number of bins per window. max_spikes = len(spiketrains) * winlen if spectrum == '#': max_occs = np.empty(shape=(len_partition + add_remainder, max_spikes - min_spikes + 1), dtype=np.uint16) else: # spectrum == '3d#': max_occs = np.empty(shape=(len_partition + add_remainder, max_spikes - min_spikes + 1, winlen), dtype=np.uint16) for surr_id, binned_surrogates in _generate_binned_surrogates( spiketrains, bin_size=bin_size, dither=dither, surr_method=surr_method, n_surrogates=len_partition+add_remainder, **surr_kwargs): # Find all pattern signatures in the current surrogate data set surr_concepts = concepts_mining( binned_surrogates, bin_size, winlen, min_spikes=min_spikes, max_spikes=max_spikes, min_occ=min_occ, max_occ=max_occ, min_neu=min_neu, report=spectrum)[0] # The last entry of the signature is the number of times the # signature appeared. This entry is not needed here. surr_concepts = surr_concepts[:, :-1] max_occs[surr_id] = _get_max_occ( surr_concepts, min_spikes, max_spikes, winlen, spectrum) # Collecting results on the first PCU if size != 1: max_occs = comm.gather(max_occs, root=0) if rank != 0: # pragma: no cover return [] # The gather operator gives a list out. This is rearranged as a 2 resp. # 3-dimensional numpy-array. max_occs = np.vstack(max_occs) # Compute the p-value spectrum, and return it return _get_pvalue_spec(max_occs, min_spikes, max_spikes, min_occ, n_surr, winlen, spectrum)
def _generate_binned_surrogates( spiketrains, bin_size, dither, surr_method, n_surrogates, **surr_kwargs): if surr_method == 'bin_shuffling': binned_spiketrains = [ conv.BinnedSpikeTrain( spiketrain, bin_size=bin_size, tolerance=None) for spiketrain in spiketrains] max_displacement = int(dither.rescale(pq.ms).magnitude / bin_size.rescale(pq.ms).magnitude) elif surr_method in ('joint_isi_dithering', 'isi_dithering'): isi_dithering = surr_method == 'isi_dithering' joint_isi_instances = \ [surr.JointISI(spiketrain, dither=dither, isi_dithering=isi_dithering, **surr_kwargs) for spiketrain in spiketrains] for surr_id in range(n_surrogates): if surr_method == 'bin_shuffling': binned_surrogates = \ [surr.bin_shuffling(binned_spiketrain, max_displacement=max_displacement, **surr_kwargs)[0] for binned_spiketrain in binned_spiketrains] binned_surrogates = np.array( [binned_surrogate.to_bool_array()[0] for binned_surrogate in binned_surrogates]) binned_surrogates = conv.BinnedSpikeTrain( binned_surrogates, bin_size=bin_size, t_start=spiketrains[0].t_start, t_stop=spiketrains[0].t_stop, tolerance=None) elif surr_method in ('joint_isi_dithering', 'isi_dithering'): surrs = [instance.dithering()[0] for instance in joint_isi_instances] elif surr_method == 'dither_spikes_with_refractory_period': # The initial refractory period is set to the bin size in order to # prevent that spikes fall into the same bin, if the spike trains # are sparse (min(ISI)>bin size). surrs = \ [surr.dither_spikes( spiketrain, dither=dither, n_surrogates=1, refractory_period=bin_size, **surr_kwargs)[0] for spiketrain in spiketrains] else: surrs = \ [surr.surrogates( spiketrain, n_surrogates=1, method=surr_method, dt=dither, **surr_kwargs)[0] for spiketrain in spiketrains] if surr_method != 'bin_shuffling': binned_surrogates = conv.BinnedSpikeTrain( surrs, bin_size=bin_size, tolerance=None) yield surr_id, binned_surrogates def _get_pvalue_spec(max_occs, min_spikes, max_spikes, min_occ, n_surr, winlen, spectrum): """ This function converts the list of maximal occurrences into the corresponding p-value spectrum. Parameters ---------- max_occs: np.ndarray min_spikes: int max_spikes: int min_occ: int n_surr: int winlen: int spectrum: {'#', '3d#'} Returns ------- if spectrum == '#': List[List]: each entry has the form: [pattern_size, pattern_occ, p_value] if spectrum == '3d#': List[List]: each entry has the form: [pattern_size, pattern_occ, pattern_dur, p_value] """ if spectrum not in ('#', '3d#'): raise ValueError(f"Invalid spectrum: '{spectrum}'") pv_spec = [] if spectrum == '#': max_occs = np.expand_dims(max_occs, axis=2) winlen = 1 for size_id, pt_size in enumerate(range(min_spikes, max_spikes + 1)): for dur in range(winlen): max_occs_size_dur = max_occs[:, size_id, dur] counts, occs = np.histogram( max_occs_size_dur, bins=np.arange(min_occ, np.max(max_occs_size_dur) + 2)) occs = occs[:-1].astype(np.uint16) pvalues = np.cumsum(counts[::-1])[::-1] / n_surr for occ_id, occ in enumerate(occs): if spectrum == '#': pv_spec.append([pt_size, occ, pvalues[occ_id]]) else: # spectrum == '3d#': pv_spec.append([pt_size, occ, dur, pvalues[occ_id]]) return pv_spec def _get_max_occ(surr_concepts, min_spikes, max_spikes, winlen, spectrum): """ This function takes from a list of surrogate_concepts those concepts which have the highest occurrence for a given pattern size and duration. Parameters ---------- surr_concepts: List[List] min_spikes: int max_spikes: int winlen: int spectrum: {'#', '3d#'} Returns ------- np.ndarray Two-dimensional array. 
Each element corresponds to the highest occurrence for a specific pattern size (which range from min_spikes to max_spikes) and pattern duration (which range from 0 to winlen-1). The first axis corresponds to the pattern size the second to the duration. """ if spectrum == '#': winlen = 1 max_occ = np.zeros(shape=(max_spikes - min_spikes + 1, winlen)) for size_id, pt_size in enumerate(range(min_spikes, max_spikes + 1)): concepts_for_size = surr_concepts[ surr_concepts[:, 0] == pt_size][:, 1:] for dur in range(winlen): if spectrum == '#': occs = concepts_for_size[:, 0] else: # spectrum == '3d#': occs = concepts_for_size[concepts_for_size[:, 1] == dur][:, 0] max_occ[size_id, dur] = np.max(occs, initial=0) for pt_size in range(max_spikes - 1, min_spikes - 1, -1): size_id = pt_size - min_spikes max_occ[size_id] = np.max(max_occ[size_id:size_id + 2], axis=0) if spectrum == '#': max_occ = np.squeeze(max_occ, axis=1) return max_occ def _stability_filter(concept, stability_thresh): """Criteria by which to filter concepts from the lattice""" # stabilities larger than stability_thresh keep_concept = \ concept[2] > stability_thresh[0]\ or concept[3] > stability_thresh[1] return keep_concept def _mask_pvalue_spectrum(pv_spec, concepts, spectrum, winlen): """ The function filters the pvalue spectrum based on the number of the statistical tests to be done. Only the entries of the pvalue spectrum that coincide with concepts found in the original data are kept. Moreover, entries of the pvalue spectrum with a value of 1 (all surrogates datasets containing at least one mined pattern with that signature) are discarded as well and considered trivial. Parameters ---------- pv_spec: List[List] concepts: List[Tuple] spectrum: {'#', '3d#'} winlen: int Returns ------- mask : np.array An array of boolean values, indicating if a signature of p-value spectrum is also in the mined concepts of the original data. """ if spectrum == '#': signatures = {(len(concept[0]), len(concept[1])) for concept in concepts} else: # spectrum == '3d#': # third entry of signatures is the duration, fixed as the maximum lag signatures = {(len(concept[0]), len(concept[1]), max(np.array(concept[0]) % winlen)) for concept in concepts} mask = np.zeros(len(pv_spec), dtype=bool) for index, pv_entry in enumerate(pv_spec): if tuple(pv_entry[:-1]) in signatures \ and not np.isclose(pv_entry[-1], [1]): # select the highest number of occurrences for size and duration mask[index] = True if mask[index-1]: if spectrum == '#': size = pv_spec[index][0] prev_size = pv_spec[index-1][0] if prev_size == size: mask[index-1] = False else: size, duration = pv_spec[index][[0, 2]] prev_size, prev_duration = pv_spec[index-1][[0, 2]] if prev_size == size and duration == prev_duration: mask[index-1] = False return mask
[docs] def test_signature_significance(pv_spec, concepts, alpha, winlen, corr='fdr_bh', report='spectrum', spectrum='#'): """ Compute the significance spectrum of a pattern spectrum. Given pvalue_spectrum `pv_spec` as a list of triplets (z,c,p), where z is pattern size, c is pattern support and p is the p-value of the signature (z,c), this routine assesses the significance of (z,c) using the confidence level alpha. Bonferroni or FDR statistical corrections can be applied. Parameters ---------- pv_spec : list A list of triplets (z,c,p), where z is pattern size, c is pattern support and p is the p-value of signature (z,c) concepts : list of tuple Output of the concepts mining for the original data. alpha : float Significance level of the statistical test winlen : int Size (number of bins) of the sliding window used for the analysis corr : str, optional Method used for testing and adjustment of pvalues. Can be either the full name or initial letters. Available methods are: 'bonferroni' : one-step correction 'sidak' : one-step correction 'holm-sidak' : step down method using Sidak adjustments 'holm' : step-down method using Bonferroni adjustments 'simes-hochberg' : step-up method (independent) 'hommel' : closed method based on Simes tests (non-negative) 'fdr_bh' : Benjamini/Hochberg (non-negative) 'fdr_by' : Benjamini/Yekutieli (negative) 'fdr_tsbh' : two stage fdr correction (non-negative) 'fdr_tsbky' : two stage fdr correction (non-negative) '' or 'no': no statistical correction For further description see: https://www.statsmodels.org/stable/generated/statsmodels.stats.multitest.multipletests.html Default: 'fdr_bh' report : {'spectrum', 'significant', 'non_significant'}, optional Format to be returned for the significance spectrum: 'spectrum': list of triplets (z,c,b), where b is a boolean specifying whether signature (z,c) is significant (True) or not (False) 'significant': list containing only the significant signatures (z,c) of pvalue_spectrum 'non_significant': list containing only the non-significant signatures spectrum : {'#', '3d#'}, optional Defines the signature of the patterns. 
'#': pattern spectrum using the as signature the pair: (number of spikes, number of occurrence) '3d#': pattern spectrum using the as signature the triplets: (number of spikes, number of occurrence, difference between last and first spike of the pattern) Default: '#' Returns ------- sig_spectrum : list Significant signatures of pvalue_spectrum, in the format specified by `report` """ # If alpha == 1 all signatures are significant if alpha == 1: return [] if spectrum not in ('#', '3d#'): raise ValueError("spectrum must be either '#' or '3d#', " + f"got {spectrum} instead") if report not in ('spectrum', 'significant', 'non_significant'): raise ValueError("report must be either 'spectrum'," + " 'significant' or 'non_significant'," + f"got {report} instead") if corr not in ('bonferroni', 'sidak', 'holm-sidak', 'holm', 'simes-hochberg', 'hommel', 'fdr_bh', 'fdr_by', 'fdr_tsbh', 'fdr_tsbky', '', 'no'): raise ValueError("Parameter corr not recognized") pv_spec = np.array(pv_spec) mask = _mask_pvalue_spectrum(pv_spec, concepts, spectrum, winlen) pvalues = pv_spec[:, -1] pvalues_totest = pvalues[mask] # Initialize test array to False tests = [False] * len(pvalues) if len(pvalues_totest) > 0: # Compute significance for only the non-trivial tests if corr in ['', 'no']: # ...without statistical correction tests_selected = pvalues_totest <= alpha else: try: import statsmodels.stats.multitest as sm except ModuleNotFoundError: raise ModuleNotFoundError( "Please run 'pip install statsmodels' if you " "want to use multiple testing correction") tests_selected = sm.multipletests(pvalues_totest, alpha=alpha, method=corr)[0] # assign each corrected pvalue to its corresponding entry # this breaks for index, value in zip(mask.nonzero()[0], tests_selected): tests[index] = value # Return the specified results: if spectrum == '#': if report == 'spectrum': sig_spectrum = [(size, occ, test) for (size, occ, pv), test in zip(pv_spec, tests)] elif report == 'significant': sig_spectrum = [(size, occ) for ((size, occ, pv), test) in zip(pv_spec, tests) if test] else: # report == 'non_significant' sig_spectrum = [(size, occ) for ((size, occ, pv), test) in zip(pv_spec, tests) if not test] else: # spectrum == '3d#' if report == 'spectrum': sig_spectrum =\ [(size, occ, l, test) for (size, occ, l, pv), test in zip(pv_spec, tests)] elif report == 'significant': sig_spectrum = [(size, occ, l) for ((size, occ, l, pv), test) in zip(pv_spec, tests) if test] else: # report == 'non_significant' sig_spectrum =\ [(size, occ, l) for ((size, occ, l, pv), test) in zip(pv_spec, tests) if not test] return sig_spectrum
def _pattern_spectrum_filter(concept, ns_signatures, spectrum, winlen): """ Filter for significant concepts """ if spectrum == '#': keep_concept = (len(concept[0]), len(concept[1])) not in ns_signatures else: # spectrum == '3d#': # duration is fixed as the maximum lag duration = max(np.array(concept[0]) % winlen) keep_concept = (len(concept[0]), len(concept[1]), duration) not in ns_signatures return keep_concept

def approximate_stability(concepts, rel_matrix, n_subsets=0,
                          delta=0., epsilon=0.):
    r"""
    Approximate the stability of concepts. Uses the algorithm described in
    Babin, Kuznetsov (2012): Approximating Concept Stability

    Parameters
    ----------
    concepts : list
        All the pattern candidates (concepts) found in the spiketrains.
        Each pattern is represented as a tuple containing (spike IDs,
        discrete times (window position) of the occurrences of the pattern).
        The spike IDs are defined as `spike_id=neuron_id*winlen+bin_id`,
        with `neuron_id` in `[0, len(spiketrains))` and `bin_id` in
        `[0, winlen)`.
    rel_matrix : :class:`scipy.sparse.coo_matrix`
        A binary matrix with shape (number of windows,
        winlen*len(spiketrains)). Each row corresponds to a window (ordered
        according to its position in time). Each column corresponds to one
        bin and one neuron, and it is 0 if no spikes or 1 if one or more
        spikes occurred in that bin for that particular neuron. For example,
        the entry [0, 0] of this matrix corresponds to the first bin of the
        first window position for the first neuron, the entry `[0, winlen]`
        to the first bin of the first window position for the second neuron.
    n_subsets : int
        Number of subsets of a concept used to approximate its stability.
        If `n_subsets` is 0, it is calculated according to the formula given
        in Babin, Kuznetsov (2012), proposition 6:

        .. math::
                n_{\text{subset}} = \frac{1}{2 \cdot \epsilon^2}
                 \ln{\left( \frac{2}{\delta} \right)} +1

        Default: 0
    delta : float, optional
        The stability is approximated within the absolute error `epsilon`
        with probability of at least :math:`1-\delta`.
        Default: 0.0
    epsilon : float, optional
        Absolute error of the stability approximation.
        Default: 0.0

    Returns
    -------
    output : list
        List of all the pattern candidates (concepts) given in input, each
        with the corresponding intensional and extensional stability. Each
        pattern is represented as a tuple:
        (spike IDs, discrete times of the occurrences of the pattern,
        intensional stability of the pattern, extensional stability of the
        pattern).
        The spike IDs are defined as `spike_id=neuron_id*winlen+bin_id`,
        with `neuron_id` in `[0, len(spiketrains))` and `bin_id` in
        `[0, winlen)`.

    Notes
    -----
    If `n_subsets` exceeds the number of possible subsets, all subsets are
    calculated directly. Otherwise, for a very small extent the random
    subset generation could enter an infinite loop, since the random draws
    would keep producing the same subsets while the algorithm searches for
    new ones.
    """
    if HAVE_MPI:  # pragma: no cover
        comm = MPI.COMM_WORLD  # create MPI communicator
        rank = comm.Get_rank()  # get rank of current MPI task
        size = comm.Get_size()  # get tot number of MPI tasks
    else:
        rank = 0
        size = 1
    if not (isinstance(n_subsets, int) and n_subsets >= 0):
        raise ValueError('n_subsets must be an integer >=0')
    if n_subsets == 0 and not (isinstance(delta, float) and delta > 0. and
                               isinstance(epsilon, float) and epsilon > 0.):
        raise ValueError('delta and epsilon must be floats > 0., '
                         'given that n_subsets = 0')

    if len(concepts) == 0:
        return []
    if len(concepts) <= size:
        rank_idx = [0] * (size + 1) + [len(concepts)]
    else:
        rank_idx = list(
            range(0, len(concepts) - len(concepts) % size + 1,
                  len(concepts) // size)) + [len(concepts)]
    # Calculate optimal n
    if n_subsets == 0:
        n_subsets = int(round(np.log(2. / delta) / (2 * epsilon ** 2) + 1))

    if rank == 0:
        concepts_on_partition = concepts[rank_idx[rank]:rank_idx[rank + 1]] + \
            concepts[rank_idx[-2]:rank_idx[-1]]
    else:
        concepts_on_partition = concepts[rank_idx[rank]:rank_idx[rank + 1]]

    output = []
    for concept in concepts_on_partition:
        intent, extent = np.array(concept[0]), np.array(concept[1])
        stab_int = _calculate_single_stability_parameter(
            intent, extent, n_subsets, rel_matrix, look_at='intent')
        stab_ext = _calculate_single_stability_parameter(
            intent, extent, n_subsets, rel_matrix, look_at='extent')
        output.append((intent, extent, stab_int, stab_ext))

    if size != 1:
        recv_list = comm.gather(output, root=0)
        if rank == 0:
            for i in range(1, len(recv_list)):
                output.extend(recv_list[i])

    return output
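
# Illustrative sketch (not part of the Elephant API): a minimal call of
# approximate_stability() on a single synchronous concept. The toy rel_matrix
# uses 2 neurons, winlen=1 and 4 windows; a dense NumPy array is used here for
# simplicity instead of a sparse matrix, and all values are invented.
def _example_approximate_stability():  # hypothetical helper
    toy_rel_matrix = np.array([[1, 1],
                               [1, 1],
                               [0, 1],
                               [1, 0]])
    # One concept: both neurons spike in bin 0 (intent = spike IDs 0 and 1),
    # jointly occurring in windows 0 and 1 (extent).
    toy_concepts = [((0, 1), (0, 1))]
    # delta/epsilon are chosen so that the computed n_subsets exceeds the
    # number of possible subsets and all subsets are enumerated exhaustively;
    # for this toy input the result should be intensional stability 0.25 and
    # extensional stability 1.0.
    return approximate_stability(toy_concepts, toy_rel_matrix,
                                 n_subsets=0, delta=0.1, epsilon=0.1)
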

def _calculate_single_stability_parameter(intent, extent,
                                          n_subsets, rel_matrix,
                                          look_at='intent'):
    """
    Calculates the stability parameter for extent or intent.

    For a detailed description see approximate_stability.

    Parameters
    ----------
    intent : np.array
        1st element of concept
    extent : np.array
        2nd element of concept
    n_subsets : int
        See approximate_stability
    rel_matrix : sparse.coo_matrix
        See approximate_stability
    look_at : {'extent', 'intent'}
        whether to determine stability for extent or intent.
        Default: 'intent'

    Returns
    -------
    stability : float
        Stability parameter for the given extent or intent, depending on
        `look_at`
    """
    if look_at == 'intent':
        element_1, element_2 = intent, extent
    else:  # look_at == 'extent':
        element_1, element_2 = extent, intent

    if n_subsets > 2 ** len(element_1):
        subsets = chain.from_iterable(
            combinations(element_1, subset_index)
            for subset_index in range(len(element_1) + 1))
    else:
        subsets = _select_random_subsets(element_1, n_subsets)

    stability = 0
    excluded_subsets = []
    for subset in subsets:
        if any([set(subset).issubset(excluded_subset)
                for excluded_subset in excluded_subsets]):
            continue

        # computation of the ' operator for the subset
        if look_at == 'intent':
            subset_prime = \
                np.where(np.all(rel_matrix[:, subset], axis=1) == 1)[0]
        else:  # look_at == 'extent':
            subset_prime = \
                np.where(np.all(rel_matrix[subset, :], axis=0) == 1)[0]

        # Condition holds if the closure of the subset of element_1 given in
        # element_2 is equal to element_2 given in input
        if set(subset_prime) == set(element_2):
            stability += 1
        else:
            excluded_subsets.append(subset)
    stability /= min(n_subsets, 2 ** len(element_1))
    return stability


def _select_random_subsets(element_1, n_subsets):
    """
    Creates a list of random_subsets of element_1.

    Parameters
    ----------
    element_1 : np.array
        intent or extent
    n_subsets : int
        see approximate_stability

    Returns
    -------
    subsets : list
        each element a subset of element_1
    """
    # one set of already drawn index tuples per subset size
    subsets_indices = [set() for _ in range(len(element_1) + 1)]
    subsets = []

    while len(subsets) < n_subsets:
        num_indices = np.random.binomial(n=len(element_1), p=1 / 2)
        random_indices = sorted(np.random.choice(
            len(element_1), size=num_indices, replace=False))
        random_tuple = tuple(random_indices)
        if random_tuple not in subsets_indices[num_indices]:
            subsets_indices[num_indices].add(random_tuple)
            subsets.append(element_1[random_indices])

    return subsets
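
# Illustrative sketch (not part of the Elephant API): the "prime" (')
# derivation operator used in _calculate_single_stability_parameter, written
# out for a toy dense matrix. The subset of spike IDs below is invented.
def _example_prime_operator():  # hypothetical helper
    toy_rel_matrix = np.array([[1, 1],
                               [1, 1],
                               [0, 1],
                               [1, 0]])
    subset = [0, 1]  # spike IDs (columns) whose common windows we look for
    # windows (rows) in which *all* spikes of the subset occur;
    # for this toy matrix the result should be array([0, 1])
    subset_prime = np.where(np.all(toy_rel_matrix[:, subset], axis=1) == 1)[0]
    return subset_prime
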

def pattern_set_reduction(concepts, ns_signatures, winlen, spectrum,
                          h_subset_filtering=0, k_superset_filtering=0,
                          l_covered_spikes=0, min_spikes=2, min_occ=2):
    r"""
    Takes a list of concepts and performs pattern set reduction (PSR).

    PSR determines which patterns in concepts_psf are statistically
    significant given any other pattern, on the basis of the pattern size and
    occurrence count ("support"). Only significant patterns are retained.
    The significance of a pattern A is evaluated through its signature
    :math:`(z_A, c_A)`, where :math:`z_A = |A|` is the size and :math:`c_A`
    the support of A, by either of:

    * subset filtering: any pattern B is discarded if *concepts* contains a
      superset A of B such that
      :math:`(z_B, c_B - c_A + h) \in \text{ns}_{\text{signatures}}`
    * superset filtering: any pattern A is discarded if *concepts* contains a
      subset B of A such that
      :math:`(z_A - z_B + k, c_A) \in \text{ns}_{\text{signatures}}`
    * covered-spikes criterion: for any two patterns A, B with
      :math:`A \subset B`, B is discarded if
      :math:`(z_B - l) \cdot c_B \le c_A \cdot (z_A - l)`, A is discarded
      otherwise;
    * combined filtering: combines the three procedures above, based on the
      signature (z, c) of each pattern, where z is the pattern size and c the
      pattern support.

    For any two patterns A and B in concepts_psf such that
    :math:`B \subset A`, check:

    1) :math:`(z_B, c_B - c_A + h) \in \text{ns}_{\text{signatures}}`, and

    2) :math:`(z_A - z_B + k, c_A) \in \text{ns}_{\text{signatures}}`.

    Then:

    * if 1) and not 2): discard B
    * if 2) and not 1): discard A
    * if 1) and 2): discard B if
      :math:`c_B \cdot (z_B - l) \le c_A \cdot (z_A - l)`, otherwise discard A
    * if neither 1) nor 2): keep both patterns

    Assumptions/Approximations:

    * a pair of concepts cannot cause one another to be rejected
    * if two concepts overlap more than min_occ times, one of them can
      account for all occurrences of the other one if it passes the filtering

    Parameters
    ----------
    concepts : list
        List of concepts, each consisting of its intent and extent
    ns_signatures : list
        A list of non-significant pattern signatures (z, c)
    winlen : int
        The size (number of bins) of the sliding window used for the
        analysis. The maximal length of a pattern (delay between first and
        last spike) is then given by `winlen*bin_size`.
    spectrum : {'#', '3d#'}
        Defines the signature of the patterns.

        '#': pattern spectrum using as signature the pair:
        (number of spikes, number of occurrences)

        '3d#': pattern spectrum using as signature the triplet:
        (number of spikes, number of occurrences, difference between last
        and first spike of the pattern)
    h_subset_filtering : int, optional
        Correction parameter for subset filtering
        Default: 0
    k_superset_filtering : int, optional
        Correction parameter for superset filtering
        Default: 0
    l_covered_spikes : int, optional
        Correction parameter for covered-spikes criterion
        Default: 0
    min_spikes : int, optional
        Minimum pattern size
        Default: 2
    min_occ : int, optional
        Minimum number of pattern occurrences
        Default: 2

    Returns
    -------
    tuple
        A tuple containing the elements of the input argument that are
        significant according to combined filtering.
""" additional_measures = [] # Extracting from the extent and intent the spike and window times for concept in concepts: intent = concept[0] extent = concept[1] additional_measures.append((len(extent), len(intent))) # by default, select all elements in conc to be returned in the output selected = [True] * len(concepts) # scan all conc and their subsets for id1, id2 in combinations(range(len(concepts)), r=2): # immediately continue if both concepts have already been rejected if not selected[id1] and not selected[id2]: continue intent1, extent1 = concepts[id1][:2] intent2, extent2 = concepts[id2][:2] occ1, size1 = additional_measures[id1] occ2, size2 = additional_measures[id2] dur1 = max(np.array(intent1) % winlen) dur2 = max(np.array(intent2) % winlen) intent2 = set(intent2) # Collecting all the possible distances between the windows # of the two concepts time_diff_all = np.array( [w2 - w1 for w2 in extent2 for w1 in extent1]) # sort time differences by ascending absolute value time_diff_sorting = np.argsort(np.abs(time_diff_all)) sorted_time_diff, sorted_time_diff_occ = np.unique( time_diff_all[time_diff_sorting], return_counts=True) # only consider time differences that are smaller than winlen # and that correspond to intersections that occur at least min_occ # times time_diff_mask = np.logical_and( np.abs(sorted_time_diff) < winlen, sorted_time_diff_occ >= min_occ) # Rescaling the spike times to realign to real time for time_diff in sorted_time_diff[time_diff_mask]: intent1_new = [t_old - time_diff for t_old in intent1] # from here on we will only need the intents as sets intent1_new = set(intent1_new) # if intent1 and intent2 are disjoint, skip this step if len(intent1_new & intent2) == 0: continue # Test the case intent1 is a superset of intent2 if intent1_new.issuperset(intent2): reject1, reject2 = _perform_combined_filtering( occ_superset=occ1, size_superset=size1, dur_superset=dur1, occ_subset=occ2, size_subset=size2, dur_subset=dur2, spectrum=spectrum, ns_signatures=ns_signatures, h_subset_filtering=h_subset_filtering, k_superset_filtering=k_superset_filtering, l_covered_spikes=l_covered_spikes, min_spikes=min_spikes, min_occ=min_occ) elif intent2.issuperset(intent1_new): reject2, reject1 = _perform_combined_filtering( occ_superset=occ2, size_superset=size2, dur_superset=dur2, occ_subset=occ1, size_subset=size1, dur_subset=dur1, spectrum=spectrum, ns_signatures=ns_signatures, h_subset_filtering=h_subset_filtering, k_superset_filtering=k_superset_filtering, l_covered_spikes=l_covered_spikes, min_spikes=min_spikes, min_occ=min_occ) else: # none of the intents is a superset of the other one # we compare both concepts to the intersection # if one of them is not significant given the # intersection, it is rejected inter_size = len(intent1_new & intent2) reject1 = _superset_filter( occ_superset=occ1, size_superset=size1, dur_superset=dur1, size_subset=inter_size, spectrum=spectrum, ns_signatures=ns_signatures, k_superset_filtering=k_superset_filtering, min_spikes=min_spikes) reject2 = _superset_filter( occ_superset=occ2, size_superset=size2, dur_superset=dur2, size_subset=inter_size, spectrum=spectrum, ns_signatures=ns_signatures, k_superset_filtering=k_superset_filtering, min_spikes=min_spikes) # Reject accordingly: if reject1 and reject2: reject1, reject2 = _covered_spikes_criterion( occ_superset=occ1, size_superset=size1, occ_subset=occ2, size_subset=size2, l_covered_spikes=l_covered_spikes) selected[id1] &= not reject1 selected[id2] &= not reject2 # skip remaining time-shifts if both 
concepts have been rejected if (not selected[id1]) and (not selected[id2]): break # Return the selected concepts return [p for i, p in enumerate(concepts) if selected[i]]
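
# Illustrative sketch (not part of the Elephant API): calling
# pattern_set_reduction() on two overlapping toy concepts, where the second
# is a sub-pattern of the first. All values are invented. With an empty
# ns_signatures list, the larger pattern should still be discarded here,
# because it adds only one spike beyond its sub-pattern, which is below
# min_spikes; the function should therefore return only concept_b.
def _example_pattern_set_reduction():  # hypothetical helper
    winlen = 5
    # intent entries are spike IDs (= neuron_id * winlen + bin_id),
    # extent entries are window positions
    concept_a = ((0, 5, 10), (2, 7, 12, 20))       # 3 spikes, 4 occurrences
    concept_b = ((0, 5), (2, 7, 12, 20, 30, 41))   # 2 spikes, 6 occurrences
    return pattern_set_reduction([concept_a, concept_b],
                                 ns_signatures=[], winlen=winlen,
                                 spectrum='#', l_covered_spikes=0,
                                 min_spikes=2, min_occ=2)
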

def _perform_combined_filtering(occ_superset, size_superset, dur_superset,
                                occ_subset, size_subset, dur_subset,
                                spectrum, ns_signatures,
                                h_subset_filtering, k_superset_filtering,
                                l_covered_spikes, min_spikes, min_occ):
    """
    perform combined filtering
    (see pattern_set_reduction)
    """
    reject_subset = _subset_filter(
        occ_superset=occ_superset,
        occ_subset=occ_subset,
        size_subset=size_subset,
        dur_subset=dur_subset,
        spectrum=spectrum,
        ns_signatures=ns_signatures,
        h_subset_filtering=h_subset_filtering,
        min_occ=min_occ)
    reject_superset = _superset_filter(
        occ_superset=occ_superset,
        size_superset=size_superset,
        dur_superset=dur_superset,
        size_subset=size_subset,
        spectrum=spectrum,
        ns_signatures=ns_signatures,
        k_superset_filtering=k_superset_filtering,
        min_spikes=min_spikes)
    # Reject the superset and/or the subset accordingly:
    if reject_superset and reject_subset:
        reject_superset, reject_subset = _covered_spikes_criterion(
            occ_superset=occ_superset,
            size_superset=size_superset,
            occ_subset=occ_subset,
            size_subset=size_subset,
            l_covered_spikes=l_covered_spikes)
    return reject_superset, reject_subset


def _subset_filter(occ_superset, occ_subset, size_subset, dur_subset,
                   spectrum, ns_signatures=None, h_subset_filtering=0,
                   min_occ=2):
    """
    perform subset filtering
    (see pattern_set_reduction)
    """
    if ns_signatures is None:
        ns_signatures = []
    occ_diff = occ_subset - occ_superset + h_subset_filtering
    if spectrum == '#':
        signature_to_test = (size_subset, occ_diff)
    else:  # spectrum == '3d#':
        signature_to_test = (size_subset, occ_diff, dur_subset)
    reject_subset = occ_diff < min_occ or signature_to_test in ns_signatures
    return reject_subset


def _superset_filter(occ_superset, size_superset, dur_superset, size_subset,
                     spectrum, ns_signatures=None, k_superset_filtering=0,
                     min_spikes=2):
    """
    perform superset filtering
    (see pattern_set_reduction)
    """
    if ns_signatures is None:
        ns_signatures = []
    size_diff = size_superset - size_subset + k_superset_filtering
    if spectrum == '#':
        signature_to_test = (size_diff, occ_superset)
    else:  # spectrum == '3d#':
        signature_to_test = (size_diff, occ_superset, dur_superset)
    reject_superset = \
        size_diff < min_spikes or signature_to_test in ns_signatures
    return reject_superset


def _covered_spikes_criterion(occ_superset, size_superset,
                              occ_subset, size_subset,
                              l_covered_spikes):
    """
    evaluate covered spikes criterion
    (see pattern_set_reduction)
    """
    reject_superset = True
    reject_subset = True
    score_superset = (size_superset - l_covered_spikes) * occ_superset
    score_subset = (size_subset - l_covered_spikes) * occ_subset
    if score_superset >= score_subset:
        reject_superset = False
    else:
        reject_subset = False
    return reject_superset, reject_subset
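
# Illustrative sketch (not part of the Elephant API): the covered-spikes
# score is simply (size - l) * occurrences, and the pattern with the lower
# score is rejected. The numbers below are invented.
def _example_covered_spikes():  # hypothetical helper
    # superset: 4 spikes occurring 10 times -> score (4 - 0) * 10 = 40
    # subset:   3 spikes occurring 12 times -> score (3 - 0) * 12 = 36
    # The subset has the lower score, so the result should be (False, True),
    # i.e. the superset is kept and the subset is rejected.
    return _covered_spikes_criterion(occ_superset=10, size_superset=4,
                                     occ_subset=12, size_subset=3,
                                     l_covered_spikes=0)
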

def concept_output_to_patterns(concepts, winlen, bin_size, pv_spec=None,
                               spectrum='#', t_start=0 * pq.ms):
    """
    Construction of dictionaries containing all the information about a
    pattern starting from a list of concepts and its associated
    pvalue_spectrum.

    Parameters
    ----------
    concepts : tuple
        Each element of the tuple corresponds to a pattern which, in turn, is
        a tuple of (spikes in the pattern, occurrences of the pattern)
    winlen : int
        Length (in bins) of the sliding window used for the analysis.
    bin_size : pq.Quantity
        The time precision used to discretize the `spiketrains` (binning).
    pv_spec : None or tuple
        Contains a tuple of signatures and the corresponding p-value. If
        equal to None all p-values are set to -1.
    spectrum : {'#', '3d#'}
        '#': pattern spectrum using as signature the pair:
        (number of spikes, number of occurrences)

        '3d#': pattern spectrum using as signature the triplet:
        (number of spikes, number of occurrences, difference between last
        and first spike of the pattern)

        Default: '#'
    t_start : pq.Quantity
        t_start of the analyzed spike trains

    Returns
    -------
    output : list
        List of dictionaries. Each dictionary corresponds to a pattern and
        has the following entries:

        'itemset':
            A list of the spikes in the pattern, expressed in the form of an
            itemset; each spike is encoded by
            `spiketrain_id * winlen + bin_id`.
        'windows_ids':
            The ids of the windows in which the pattern occurred in
            discretized time (given by the binning).
        'neurons':
            An array containing the idx of the neurons of the pattern.
        'lags':
            An array containing the lags (integers corresponding to the
            number of bins) between the spikes of the pattern. The first lag
            is always assumed to be 0 and corresponds to the first spike.
        'times':
            An array containing the times (integers corresponding to the bin
            idx) of the occurrences of the pattern.
        'signature':
            A tuple containing two integers:
            (number of spikes of the pattern,
            number of occurrences of the pattern).
        'pvalue':
            The p-value corresponding to the pattern. If `n_surr==0`, all
            p-values are set to -1.
""" if pv_spec is not None: pvalue_dict = defaultdict(float) # Creating a dictionary for the pvalue spectrum for entry in pv_spec: if spectrum == '3d#': pvalue_dict[(entry[0], entry[1], entry[2])] = entry[-1] if spectrum == '#': pvalue_dict[(entry[0], entry[1])] = entry[-1] # Initializing list containing all the patterns t_start = t_start.rescale(bin_size.units) output = [] for concept in concepts: itemset, window_ids = concept[:2] # Vocabulary for each of the patterns, containing: # - The pattern expressed in form of Itemset, each spike in the pattern # is represented as spiketrain_id * winlen + bin_id # - The ids of the windows in which the pattern occurred in discretized # time (clipping) output_dict = {'itemset': itemset, 'windows_ids': window_ids} # Bins relative to the sliding window in which the spikes of patt fall itemset = np.array(itemset) bin_ids_unsort = itemset % winlen order_bin_ids = np.argsort(bin_ids_unsort) bin_ids = bin_ids_unsort[order_bin_ids] # id of the neurons forming the pattern output_dict['neurons'] = list(itemset[order_bin_ids] // winlen) # Lags (in bin_sizes units) of the pattern output_dict['lags'] = bin_ids[1:] * bin_size # Times (in bin_size units) in which the pattern occurs output_dict['times'] = sorted(window_ids) * bin_size + t_start # pattern dictionary appended to the output if spectrum == '#': # Signature (size, n occ) of the pattern signature = (len(itemset), len(window_ids)) else: # spectrum == '3d#': # Signature (size, n occ, duration) of the pattern # duration is position of the last bin signature = (len(itemset), len(window_ids), bin_ids[-1]) output_dict['signature'] = signature # If None is given in input to the pval spectrum the pvalue # is set to -1 (pvalue spectrum not available) if pv_spec is None: output_dict['pvalue'] = -1 else: # p-value assigned to the pattern from the pvalue spectrum output_dict['pvalue'] = pvalue_dict[signature] output.append(output_dict) return output