# The Poisson-Influenced K-Means Algorithm

We present an implementation of the Poisson-Influenced -Means Algorithm (PIKA), first developed to characterize the output of a superconducting transition edge sensor (TES) in the few-photon-counting regime. The algorithm seeks to group data into several clusters that minimize their distances from their means, as in classical -means clustering, but with the added knowledge that the cluster sizes should follow a Poisson distribution.

### 1. Run PIKA Here

The algorithm proper is run when it is submitted using the button near the lower-right corner of the form. You also have the option to use a separate input file to manually override the form, which may be more useful for automated runs on multiple datasets. This function launches the program; evaluating it generates the form and then executes PIKA. The function is defined and documented in Section 7.9. After taking input through the form, `pika` calls `runPIKA`, the *de facto* main function for the program, defined in Section 3. You can also specify a separate options file in the form that overrides the variable assignments that the form makes. The first command ensures that the paclet for forms is current.

If you get a popup that asks, Do you want to automatically evaluate all the initialization cells…?, answer yes. When you get the form, it is necessary to replace [directory] with the pathname of the location of the data; you might also have to change the backslash to a forward slash.

### 2. General Formulation (No Code)

The Poisson-Influenced -means Algorithm (PIKA) was first described in [1] as a way of calibrating a transition edge sensor (TES), a superconducting few-photon detector. A TES can discern the number of photons in a very weak pulse of light, but it must be calibrated in order to do so. Our implementation deals with photon counting, but many of its features are applicable to more general probability-assisted -means clustering situations.

#### 2.1. Background

A TES is a superconductor kept in its transition from its superconducting phase to the normal regime, where it loses its superconducting properties. Photons incident on the sensor heat it, causing its resistance to rise sharply and then slowly fall to superconducting levels as the heat dissipates. A current is run through the TES, and the change in resistance is captured by the voltage signal of a superconducting quantum interference device (SQUID) inductively coupled to the TES circuit.

Several groups of TES signal waveforms are shown in Figure 1 (each graph shows the set of signals elicited by an ensemble of laser pulses with an average number of photons per pulse given by ). For , one can clearly distinguish the different photon numbers and their relative frequencies; for higher numbers this is harder. Higher photon numbers create higher signal amplitudes, but at a certain point the TES saturates in the normal regime and additional photons change the signal maximum very little.

**Figure 1.** Several collections of TES waveforms resulting from pulses with particular mean photon numbers given by , from [2].

The goal of PIKA is to characterize individual TES waveforms by the integer photon numbers of the pulses that cause them. The photon numbers of individual pulses cannot be determined directly; we can only estimate the average photon number of all of the pulses, based on the nominal laser and attenuator parameters of the light source.

-means clustering, upon which PIKA is based, is a fundamental part of unsupervised machine learning. PIKA extends the -means algorithm to scenarios in which the ideal distribution that the clusters should follow is known, and though some of the implementation is specific to the context of TES calibration (e.g. the use of the Poisson distribution, the idea of ordering observations by photon number), much of it can be generalized without much difficulty to other situations with known probability distributions.

#### 2.2. -means Clustering and the Poisson Distribution

Traditional -means clustering consists of taking some amount of data and organizing it into clusters that minimize their members’ distance from the cluster mean. Essentially this is a minimization of an objective function, the sum over each piece of data of its deviation from its cluster mean (where deviation is measured by some relevant definition of distance). We can use a similar approach by considering each signal as a high-dimensional vector and its deviation from some mean as squared Euclidean distance. Then the -means component of the objective function becomes

(1) |

where is the signal vector for observation ( is the element of the vector ), is the mean of the cluster , and is the number of time points; and give the first cluster’s photon number and number of clusters, respectively, and are determined by which photon numbers we expect to be associated with at least one pulse based on the Poisson distribution. More physically, is an individual waveform and is the average of the waveforms with an assigned photon number .

To account for the Poisson-distributed cluster sizes, we introduce another term, , where is the likelihood, according to the Poisson distribution, of a group of clusters associated with a group of photon numbers having the particular sizes that a given clustering asserts that they do, given the mean photon number of the ensemble of pulses. The likelihood of a particular sequence of photon numbers occurring in an ensemble with mean is

(2) |

where is the number of waveforms in cluster . We also need a combinatorial component, since different photon-number sequences can yield the same eventual cluster sizes:

(3) |

where . Then , and the PIKA objective function is

(4) |

The constant relating the two terms can be estimated from the data, since the objective function is itself the negative log-likelihood of a normal distribution. (This also means that minimizing is equivalent to maximizing the product of two likelihoods.) PIKA minimizes the objective function by moving waveforms to neighboring clusters.

Once the clusters are optimized, each waveform is assigned an effective photon number by a linear interpolation between the two closest cluster means. First, we find the value that minimizes the root mean square deviation of from , where and are the closest and second-closest mean waveforms to . In practice, and . One can easily show that

(5) |

for each . The effective photon number is then given by

(6) |

#### 2.3. Initial Clustering

PIKA needs an initial clustering upon which to improve. Random cluster assignment is an option, but a better alternative is to give the observations a rough order by photon number, so that our initial guess is actually a meaningful estimate. This is done via the dot product method: we assign each observation an initial effective photon number

(7) |

where is the entire ensemble’s mean, not a cluster mean . The initial clusters are sized to fit each observation and conform to the Poisson distribution, and the observations are placed in the clusters by order of effective photon number.

The geometric interpretation of PIKA and the dot product method is a curve and a line, respectively, evolving through hyperspace (shown in Figure 2). The dot product method projects each observation onto the mean waveform vector (a line) and then assumes that photon number scales linearly with distance along the mean vector (which is not actually true, but suffices for a first guess); that converts distance relative to the mean to photon number relative to the mean. PIKA, in contrast, finds a piecewise linear approximation of a curve that passes through the cluster means and projects each observation onto that. Both essentially measure photon number by progress along a one-dimensional path through high-dimensional space.

**Figure 2.** An illustration of the geometric differences between the dot product method (green arrow) and PIKA (blue curve). The 3D space here stands in for a high-dimensional space.

#### 2.4. Choice of Mean Photon Number

PIKA requires knowledge of the ensemble’s mean photon number in addition to an initial clustering. There are two ways of supplying that knowledge. The first is simply to give the exact mean photon number of the incoming pulses, if it is known; then PIKA clusters the data accordingly.

The second is to test several mean photon numbers on the data if the true mean is not known exactly. The test means should be close together in some range around a rough estimate of the true mean; PIKA clusters the data once for each value, returning a new (usually better) estimate of the mean based on each optimized clustering, as well as the value of the objective function associated with each new estimate. (In addition, since the test means are close together so that adjacent distributions should be very similar, the effective photon numbers for the waveforms from one round are used as the initial ordering for the next, instead of the dot product method.)

The optimized means are usually closer to the true mean than their initial seeds, but, depending on the structure of the probability distribution underlying the objective function, some optimized means may be moved farther away due to attraction to local minima. In the Poisson case, we have observed the objective function to have secondary minima at integer differences in mean photon number from the primary minimum and convex regions of width 1 centered around each minimum. Thus, test means that land in the same region as the true mean should be optimized toward the true mean, but those outside are diverted by secondary minima. Reference [3] estimates the range of test means to within half of a photon number from the nominal laser power and attenuation.

#### 2.5. PIKA’s Results

The following results and images (in Section 2.5) except Figure 6 come from [1].

Figure 3(a) shows the optimized cluster means from two PIKA runs (, solid red, and , dotted blue). The shapes of the mean waveforms appear independent of —reassuring, since the average photon number of an ensemble should not affect the shape of individual waveforms. This independence indicates that PIKA is properly identifying actual photon numbers in the data, and that the average photon number with which it is supplied is not unduly affecting the results.

Figure 3(b) is a histogram of the optimized effective photon numbers from (bin widths of 0.05), which follow a Poisson distribution but with some Gaussian spread around the integers (the red curves are Gaussians centered on the integers and fitted to the data), resulting in a comb-like shape.

**Figure 3.** (a) Optimized cluster mean waveforms for (solid red) and (dotted blue). (b) Optimized effective photon numbers for (black) and integer-centered Gaussians fitted to the data (red).

Figure 4 shows a similar comb structure for for both the dot product method and PIKA. As the photon number increases, the teeth of the comb grow less defined—that is, the peak visibility falls, and with it the photon-resolving capability. Figure 5 shows this drop in visibility. The power of PIKA is that it retains nonzero visibility (i.e. the uncertainty does not include 0) through , whereas the dot product method alone loses it after . PIKA has been used to explore the regime just beyond the loss of peak visibility [4].

**Figure 4.** Effective photon numbers from the dot product method and from PIKA for .

**Figure 5.** Peak visibility for the dot product method (blue) and PIKA (red) through (both lose visibility altogether after that point). Uncertainties are given at two standard deviations and are purely statistical.

Figure 6 illustrates the loss of visibility with high photon numbers. As increases, cluster means become closer to each other (saturation in the normal regime) and individual waveforms overlap with each other more and more, making it more difficult to differentiate between adjacent photon numbers.

**Figure 6.** Illustration of the loss of visibility as increases. Cluster means become closer together, so individual waveforms overlap more and more.

### 3. Terminology and Main Loop (`runPika`)

The code and documentation are written in the context of few-photon-counting with a TES, so variables and functions are often named for physical quantities and concepts specific to the original purpose of the algorithm instead of more abstract ones (e.g. `readTES` instead of `readData`). In addition, the terms “trace,” “observation,” and “waveform” are used interchangeably and synonymously to mean a single detector response in the set of signals—that is, a regular sampling of voltage over time that records the response to a single optical pulse fired at the TES. In the program a trace is a list of voltage values.

Variable and function names also follow certain conventions:

- A variable beginning with
`i`or`j`is an index (or list of indices) on the remainder of its name.`iObs`is single index or list of indices on a set of observations, for example.

- A variable beginning with
`n`is a number or size of the remainder of its name (or a list of numbers or sizes). - A variable beginning with
`m`is a maximum number of something.`mSample`gives the maximum allowed number of waveforms to sample for graphical output, but fewer are taken if the population of waveforms is smaller than`mSample`.

- A variable of the form
`xOfY`lists`x`for each`Y`.`iObsOfClust`gives a list of indices`"iObs"`corresponding to`"Of"`each cluster`"Clust"`.

`nClust`gives the number of clusters, but`nOfClust`lists the size of each cluster.

- A function beginning with
`get`returns something described by the rest of its name.`getIObsOfClust`returns`iObsOfClust`.

- A variable such as
`p1`is a plot.

This section describes the overarching form of the algorithm. Here, the process is broken into a series of more isolated procedures, whose implementation details are described in the next section. Note that the functions referenced in this section and defined in the next take no arguments and return no output—they merely break the whole algorithm into smaller pieces of code that operate on global variables.

Below is the skeleton of the process. We begin by defining constants and options (which can be overridden as necessary by the user), and then reading in the data with a noise filter. One can optionally filter out observations contaminated by background radiation; by default the algorithm does not do this. We then proceed to iterate (Loop 1) over the elements of `nPhotonAvgList`, a list of hypothetical mean photon numbers for the set of pulses incident on the sensor, running PIKA once on the dataset for each element. This lets us compare the optimized objective functions for each hypothesized mean and estimate the actual mean from the runs with the smallest objective functions. The dot-product method supplies the initial effective photon numbers (which create the initial clustering) for the first mean photon number; after that we use the optimized effective photon numbers from the previous mean on the list as our initial guess for the next mean (adjacent elements of `nPhotonAvgList` should be close together, so one round’s optimized ordering of the observations should be a reasonable seed for the next).

In Loop 1 (over `nPhotonAvgList`), we organize the traces by their clusters and then find the initial value of the objective function to be minimized. We then enter the actual optimization loop (Loop 2), which repeats some preset number of times, trying to lower the objective function by moving observations between clusters. For each iteration of Loop 2, the nested Loop 3 examines each observation and considers moving it to a neighboring cluster (one with a photon number one greater or less). The greedy algorithm [5] (the default option) accepts any move that reduces the objective function and rejects any that does not, while the simulated annealing algorithm [6] may accept disadvantageous moves so that it can find global minima instead of just local ones. If a move is accepted, the relevant pieces of the objective function are updated. When Loop 2 finishes, the observations have been organized into optimized clusters, and we create various graphical and textual outputs.

### 4. Implementation

Here we define the functions referenced in the previous section. The structure of the following subsections mirrors that of the algorithm skeleton above: Subsection 1 defines the procedures called before Loop 1 begins, Subsections 2 through 4 define those used in Loop 1 (with Subsection 3 containing the functions for Loops 2 and 3), and Subsection 5 contains the (very minimal) end of the algorithm after exiting Loop 1. The functions are defined in the order in which they are called, and each is called exactly once. The functions called inside the bodies of these skeleton functions (i.e. those that take arguments, unlike the functions defined in this section) are defined in Sections 5

and 7.

#### 4.1. Set Options and Read Data

The main execution function, `pika`, assigns values to the following variables based on the user’s response to the input form:

`iNPhoton`—the index of the dataset to read (assuming that multiple datasets are stored in the same location)`iDataSet`—a list of indices of parts of the dataset to read (assuming that datasets are split over multiple files)`nTime`—the number of time points to keep after filtration`mSample`—the maximum number of traces to randomly sample when outputting samples`nDatUse`—the maximum number of traces to read in (if this is equal to 0, we use all of the traces)`backgroundReject`—a Boolean, true if PIKA should reject response waveforms with background radiation`peakValCut`—the voltage cutoff for background rejection`peakPosCut`—the peak location cutoff for background rejection`peakNumCut`—the cutoff for number of peaks for background rejection`nCool`—the number of optimizing rounds to perform`nGreedy`—the number of rounds that should be run with the greedy algorithm`coolConst`—the cooling constant for simulated annealing`tAnneal`—the starting annealing temperature`probDistName`—the name of the probability distribution`nSigma`—the number of standard deviations from the mean of the probability distribution to generate`binFract`—the histogram bin size (fraction of a photon number)`outputImageSize`—the default graphics size for output`nPhotonAvgList`—a list of (close together) mean photon numbers with which to run PIKA on the dataset`partialFilePath`—the directory and the beginning of the name of the data files, without the`iNPhoton`or`iDataSet`markers`fileExt`—the file extension of the data files (without the`iDataSet`marker)`nSamplePerTrace`—the number of samples per trace`nTracePerFile`—the number of traces per file`useInputFile`—a Boolean, true if there is another options file to read in and override settings from the form`pikaInput`—the path to the options override file

`runPIKA` (defined in Section 3 with implementation in this section) begins with some initial setup, processing some of the input given in `pika` and setting up tables to store graphical output.

Now we read in the data and apply a noise filter. `readTESandFilter`, defined in Section 7.7, takes the data from the set of files specified in the options above and applies a Hanning filter to it. We can then take a random sample and reduce the dataset to a smaller size, if desired, but by default we keep all of the observations. `dat` is a list of lists—each sublist corresponds to a waveform in the data and lists its values at regular time points.

In addition to the noise filter, we can remove observations from the dataset that have been contaminated by background radiation. `getIObsKeep` returns the indices of the observations in `dat` that should not be thrown out, based on the parameters `peakPosCut`, `peakValCut`, and `peakNumCut` that characterize waveforms unduly influenced by background radiation. The graphical output is a random sample of rejected observations.

`meanTrace` is the average of all of the waveforms in `dat`. The output here is a graph of `meanTrace` with a sample of accepted traces underlaid.

For the first mean photon number in `nPhotonAvgList`, the effective photon number for observation `i` is found via the dot-product method. For subsequent means, the optimized effective photon number from the previous item on `nPhotonAvgList` serves as the initial estimate for the next mean, since the elements of `nPhotonAvgList` are expected to be in order and close together. In the beginning of Loop 1, we create the initial clustering based on these effective photon numbers.

#### 4.2. Preprocessing (Loop 1)

Now we enter Loop 1, whose index, `iNPhotonAvgList`, is the position in `nPhotonAvgList` of the current mean photon number. Each iteration of Loop 1 runs PIKA on the given dataset with the assumption that the waveforms in the data come from pulses incident on the sensor with an average of `nPhotonAvgList` photons per pulse. The runs are independent of each other apart from the fact that one round’s optimized effective photon numbers are used as initial effective photon numbers for the next.

The initial clusters are formed from the effective photon numbers of the individual observations. The effective photon numbers order the observations, but the relative sizes of the clusters and the photon number to which each one corresponds are determined by the mean photon number, `nPhotonAvg`, and the Poisson distribution. `groupProbability`, defined and described in more detail in Section 7.4, returns:

`prob`—a list of ordered pairs, the first a photon number and the second the Poisson probability mass function at that photon number.`nPhotonOfClust`—a list of photon numbers for the clusters created.`iObsOfClust`—a list of lists. Each sublist corresponds to a cluster and contains the indices in`dat`of the waveforms in that cluster. The cluster has photon number .`clustMeanTrace`—a list of waveforms. is the average waveform of the cluster .

`nClust` is then the number of clusters, which `groupProbability` decides based on which photon numbers we expect to see in at least one observation.

The following code graphs samples of waveforms from the middle three clusters.

Here we graph the average waveform of each cluster.

The initial, unoptimized value of the objective function, from equation (4) above, is . Here, `sqDevOfClust` is a table containing the square deviation of each cluster from its own mean (each element is for cluster ), so the table’s sum is , the -means term of the objective function. `nOfClust` is a list of cluster sizes, and `logLikeProb` is , the Poisson-combinatorial term. `sigmaObjFtn` is , the regularization parameter for the objective function based on the deviations of the initial clusters (unlike , does not change as the objective function is minimized).

This graphs the root mean square (RMS) deviations of each cluster to its mean.

The optimization loop (Loop 2) is now ready to begin, apart from a few organizational tasks. First, since the optimization moves waveforms between clusters many times, we need a more convenient way to keep track of the clusters. `getIClustOfObs` converts `iObsOfClust`, a list of lists, into `iClustOfObs`, a one-dimensional list, each element of which corresponds to an observation in `dat` and gives the index of the cluster to which that observation belongs.

Next, for each cluster, `neighborOfClust` lists the acceptable clusters to transfer to during the optimization loop. By default, the neighbors for cluster are the clusters and , since the initial clustering (from the dot-product method or the previous round of PIKA) is expected to be at least a somewhat accurate arrangement, and so long-distance transfers should not be necessary.

Finally, a system for keeping track of transfer history lets us reduce unnecessary computation. `kMove` counts the number of transfers that have been made in the loop. `birthOfClust` is a table that records the move number when each cluster was last changed, and `birthOfObsClust` is a two-dimensional table that records, for each observation-cluster pair, the last time that the deviation from the observation to the cluster mean was calculated. `sqDevOfObsClust` is a table with the same dimensions that records the actual deviations so that they can be used later in the loop if they are still up to date.

#### 4.3. Optimization Loop (Loop 2)

Loop 2, nested inside Loop 1, begins here. Its index, `iCool`, specifies how many rounds of optimization have already passed. The optimization loop runs for a preset number of iterations (given by `nCool` in the initial setup), the first `nGreedy` of which employs the greedy algorithm. The remaining iterations are run with the simulated annealing algorithm. By default, `nGreedy` is equal to `nCool`, so all of the iterations use the greedy algorithm (i.e. simulated annealing is never used).

The bulk of Loop 2 is contained in the nested Loop 3, the waveform transfer loop that passes over all of the observations in `dat` and decides whether to move them to neighboring clusters. Before starting Loop 3, we randomly order the indices in `dat` of the waveforms upon which Loop 3 is to operate, so that the original ordering of the data does not bias the algorithm in any systematic way.

##### 4.3.1. Waveform Transfer Loop (Loop 3)

The index of Loop 3, `jObs`, is the current position of the loop in `iObsAll`, the random permutation of the observation indices. `iObs` is then set to the index of the observation itself, rather than the index of the index. `iClust` is the index of the associated cluster. If the cluster has only one observation then we skip to the next iteration of Loop 3, since no cluster is allowed to become empty. We then randomly pick a cluster `jClust` from the list of neighbors, and we consider moving the waveform from its current cluster to the new one.

To reduce the computation time, `birthOfObsClust` and `birthOfClust` record the “ages” of observation-to-mean deviation calculations and changes to cluster contents, respectively, so this step checks whether `iClust` and `jClust` have been modified since we last calculated the deviation of `iObs` from their means. If they are unchanged, then `sqDevOfObsClust` is up to date; this condition becomes increasingly common as the calculation approaches convergence. If they are changed, we need to recalculate one or both deviations and store them in `sqDevOfObsClust`.

We need to know how a proposed move would change the objective function before deciding whether to accept it. Section 5 gives more detail on the functions called here that efficiently compute the change in the objective function.

If the transfer would decrease the objective function, we automatically accept it, and if we use the greedy algorithm, we reject any transfer that does not decrease the objective function. If we use simulated annealing, however, we may accept a transfer that increases the objective function for the purpose of exploring a greater number of assignments and possibly avoiding local minima that would trap the greedy algorithm. In simulated annealing, it grows harder for a disadvantageous transfer to occur as time goes on and the “temperature” drops.

If the move is accepted, then we need to move the observation from one cluster to the other and update the clusters’ deviations and mean traces. Both clusters have been changed in this move, so we update `birthOfClust` accordingly.

After making a transfer or deciding not to, Loop 3 moves to the next waveform in the randomly ordered list or finishes if there is none.

##### 4.3.2. Annealing Update and End of Loop 2

Loop 3 is now over, after examining all of the observations and moving some to neighboring clusters. If we use the simulated annealing algorithm, the annealing temperature decreases after Loop 3, making it more difficult for a move that increases the objective function to be accepted.

Loop 2 repeats a preset number of times, randomly ordering the waveforms, running Loop 3 on each one, and then decreasing the annealing temperature in each iteration.

#### 4.4. Data Analysis and Output (Loop 1)

When Loop 2 is finished, PIKA has run for many rounds on the data and the observations should be arranged into optimal clusters. This subsection concerns itself primarily with creating graphical and numerical outputs to understand and visualize the clustering. Now that the optimization is finished, `iObsOfClust` (the list of lists) is a more useful format than `iClustOfObs` (the simple list). In order to compare the new clustering to the old, we generate `iObsOfClustNew` from `iClustOfObs`, which reflects the optimized clusters, and simply copy `iObsOfClustOld` from `iObsOfClust`, which has not been changed since before Loop 2. `freqNew` and `freqOld` are lists of the new and old cluster sizes, respectively, and `nPhotonAvgNew` is an estimate of the actual mean photon number based on the results of PIKA.

As before, we graph some sample waveforms from middle clusters.

Here we numerically output and then graph the optimized cluster mean waveforms, with a graph of both the optimized and initial means as well.

This graph compares the initial and final cluster counts to a Poisson distribution.

is the average mean square deviation of the observations in cluster `i` to the mean waveform in cluster `j`. This graph is of the RMS deviations of the traces in clusters to adjacent means.

is the mean square deviation of the mean of cluster `i` to that of cluster `j`. This graph is of the RMS deviations of cluster means to means one, two, and three clusters away.

`nPhotonEff` lists the effective photon number of each observation (`getNPhotonEff` is implemented in Section 7.2). The rest of this code creates a histogram of `nPhotonEff` with bin widths of `binFract`: `binEdge` specifies where the edges of the bins should be, `binCnt` counts the number of waveforms in each bin, and `binCtr` lists the centers of the bins.

Loop 1 runs PIKA on the data once with each mean photon number in `nPhotonAvgList` and then exits.

#### 4.5. Show Output

After Loop 1 finishes, the algorithm outputs the computation time used and displays the output from PIKA.

### 5. Updates to the Objective Function

A naive implementation of the transfer of waveforms between clusters would recalculate the various components of the objective function from scratch with each change, resulting in a computationally expensive process whose running time would depend on the size of the clusters in question. We can avoid this excessive calculation by noticing that, given some basic summary information about the clusters and waveform to which a given transfer pertains, the changes to the objective function and the cluster means are very easy to compute without any knowledge of the other waveforms in the clusters. The running times of the updates in this section are independent of cluster size.

#### 5.1. Updating the -Means Objective Function

From equation (1), we can decompose the -means term of the objective function as

(8) |

where

(9) |

If we transfer a waveform into the cluster , the new cluster members form the set . It can be shown that

(10) |

If we transfer a waveform out of the cluster , the new cluster members form the set

, with

(11) |

In equations (10) and (11), and refer to the original clusters and , before the transfer of . See below for a proof that these equations indeed give the proper changes in cluster deviation. Equation (11) corrects equation (A4) in [1]. After the transfer, the cluster means for and become

(12) |

with plus signs for and minus signs for . No cluster is ever allowed to become empty (in such a case we would have not but means), so the denominator never becomes zero. Therefore, we can determine the appropriate change to the means and square deviations of the clusters between which the transfer takes place.

##### 5.1.1. Example: The -Means* Update*

Suppose that cluster has nine waveforms and cluster has 14 (and each has some mean waveform in addition), and we want to move a waveform from to . When we make this transfer, changes by the following amount.

The mean waveforms of clusters and become the following.

Of course, the cluster sizes change as well: now has 10 waveforms, while has 13.

##### 5.1.2. Derivation of the Update Formulas

The derivation for equations (10) and (11) is slightly involved and requires some preliminaries. Voltage signals are considered as vectors for the derivation, denoted , where the discrete time index is not given explicitly in this section.

**Lemma:**

**Proof:**

which is equivalent to the first expression. (The final step is valid because .) □

**Lemma:**

**Proof:**

From the proof of the previous lemma, we have

and thus

Finally,

and the lemma follows. □

Now we can prove that equations (10) and (11) hold true.

**Theorem:**

**Proof:**

(Note the summation over , not .) Since is the observation to add to or remove from , we can write

By the second lemma, we have

and the theorem follows. □

#### 5.2. Updating the Poisson Log-Likelihood

From equation (2), the Poisson log-likelihood is

(13) |

where . If some waveform is transferred from the to the cluster, then the change in the objective function consists of a term from the Poisson factor and a term from the combinatorial factor from equation (3). The term due to the Poisson factor is

(14) |

(The notation suggests , but the same formula applies for the case.)

The combinatorial factor’s treatment is similar. The logarithm of the factor is

(15) |

does not change with a transfer, so only the second term contributes to the change:

(16) |

The algorithm only considers moves to adjacent clusters, so we only need to calculate the change due to transfers to immediately preceding and succeeding clusters. These two functions give the change in the probabilistic/combinatorial component of the objective function for moves to higher and lower photon numbers.

Bear in mind that the term that contributes to the overall objective function is the negation of its logarithm, so a positive decreases .

##### 5.2.1. Example: The Poisson Update

Suppose we have a set of clusters, among them an cluster with five waveforms and an cluster with seven waveforms, with a mean of 4.4 photons. We are interested in moving a waveform from the cluster to the cluster. Then the change in the Poisson log-likelihood is given by `deltaPoissonUp`.

### 6. Acknowledgments

We would like to thank Boris L. Glebov and Alan L. Migdall for providing the TES data. Figure 1 courtesy Thomas Gerrits and the Optical Society of America.

### 7. Appendix: Function Definitions

The functions in this section either are used often, and so are given their own names, or perform very particular procedures (such as data reading) whose implementation details are tangential to the core operation of the algorithm, and so have been separated from the main body of the code.

#### 7.1. General Purpose

`boundList` confines the elements of a list between two inclusive bounds.

`getDotToIdeal` takes a single trace or a set of traces and finds its or their dot product with an ideal trace, normalized by the square of the ideal trace’s magnitude. If the function is given a single trace as its first argument, the operation in the numerator is a simple dot product; if it is given a list of traces, the operation is matrix-vector multiplication, which returns a list of dot products.

`getMeanOfEachTime` averages all of the traces in `dat` at each time point, returning the overall mean waveform.

`getSquareDiff` returns a table that lists the squared magnitude of the difference between each observation and each cluster mean.

`groupCommon` is a small function that rearranges the result of `GatherBy`. It is best understood from its definition.

`meanSquare` takes a vector and returns its mean square (magnitude squared divided by length).

`pair` takes two lists (of equal length) and pairs elements at the same position, returning a list of pairs.

`randomSample` is a wrapper for `RandomSample` that returns a random sample of some size from a list, reducing the sample size if it is greater than the length of the list.

`twoMin` takes a list and returns the positions of the smallest and second-smallest elements, in that order.

#### 7.2. Effective Photon Number

The three-argument version of `getNPhotonEff` takes a list of observations, a list of photon numbers associated with clusters, and a list of cluster means. It returns a list specifying the effective photon number of each observation, found via a linear interpolation between the two cluster means closest to a given observation.

`alphaByClustPair` is a list that gives the value of the interpolation parameter for each observation, organized by the two cluster means to which each observation is closest. `getNPhotonEff` (the two-argument version) takes that information and finds the effective photon number for each observation.

`getAlphaByClustPair` returns a list of ordered triples that give the cluster means closest to a group of observations and the alpha values for those observations. is the mean square deviation of observation `i` to cluster mean `j`, and is a pair of indices giving the two cluster means with the lowest mean square deviations to observation `i` (i.e. a pair of indices giving the two columns with the smallest elements in row `i` of `msDev`). `iObsTwoMinIndex` pairs the relevant observation index with each pair in `twoMinIndex`. In the next line, the `GatherBy` groups the elements of `iObsTwoMinIndex` by their lowest-deviation index pairs, so that each element of `iObsTwoMinIndexGroup` is a list of elements of `iObsTwoMinIndex` that all have the same pair of nearest cluster means. Mapping `groupCommon` onto each of these lists in `iObsTwoMinIndexGroup` converts them into a more useful form; each element of `twoMinIndexIObsList` is now a list of length two, with the first entry being a pair of nearest-mean indices and the second being a list of observation indices whose nearest two means correspond to the pair in the first entry. The output is a list with an entry for each distinct ordered pair of nearest means. Each entry contains a pair of nearest-mean indices, a list of observations whose nearest means are those two, and a list giving the alpha value for each of those observations.

`getAlpha` takes a list of observations and a pair of ideal waveforms and finds the value of alpha for each observation. Alpha for a waveform is the value that minimizes the RMS deviations of from , where is the closest mean waveform (the first entry in the second argument of `getAlpha`) and is the second closest. `getAlpha` finds the minimizing value from equation (5) for each observation in `traces`. A value of 0 indicates perfect agreement with the first ideal trace, and a value of 1 indicates perfect agreement with the second.

The two-argument `getNPhotonEff` takes the formatted list of triples `alphaByClustPair` and a list of cluster photon numbers and determines the effective photon number of each observation. `alpha` is a list of alpha values for the observations, `iObs` is a list of observation indices corresponding to those alpha values, and `clustPair` is a list of nearest-mean-index pairs for those observations. `alpha` and `clustPair` are in the same order as `iObs`, but we want them in the same order as the waveforms from the original dataset, so we reorder them with `iiObs`. `nPhotonPair` takes the cluster indices in `clustPair` and converts them to the actual photon numbers associated with those clusters. The output is a list specifying the effective photon number for each observation, derived from a linear interpolation between the closest () and second-closest () photon numbers.

#### 7.3. Background Rejection

If background rejection is enabled, `getIObsKeep` returns the indices of the observations that should stay in the dataset, based on the preset parameters for rejection. `peakPosList` and `peakValList` are lists specifying the position and value of the maximum in each waveform in `dat`. `getIObsDrop` returns the indices of the waveforms that should be removed because their peak positions and values exceed their respective cutoff values, and `getIObsDropA` lists those whose endpoints are greater than the value cutoff, indicating that the sensor was registering a background photon at the beginning or end of the pulse. These two lists to remove are combined into `iObsDrop`. Each element of `datPeakLoc` lists the positions of the local maxima of a waveform in `dat`, `datPeakLocVal` pairs each of those positions with the waveform’s value there, and `datPeakLocValBig` reduces `datPeakLocVal` to the maxima that exceed the peak value cutoff. `datPeakNum` gives the number of maxima in each waveform, and `getIObsKeepA` returns the indices that should stay in the dataset (i.e. those that are not in `iObsDrop` and that have fewer maxima than `peakNumCut`).

`posMax` returns the position of the first occurrence of a list’s maximum.

`getMaxAndPosList` returns two lists, the first giving the position of each waveform’s maximum and the second giving the maxima themselves.

`getIObsDrop` returns the indices of the waveforms whose peaks have values and positions greater than the cutoffs.

`getIObsDropA` returns the indices of the waveforms whose endpoints have values greater than the cutoff, indicating that a background photon may have registered with the sensor at the beginning or end of the pulse.

`peakList` finds the local maxima of a list by comparing each element to the ones before and after it.

`getDatPeakLoc` takes a list of waveforms and returns the locations of the maxima for each.

`getDatPeakLocVal` pairs each of the locations returned by `getDatPeakLoc` with the value of the corresponding waveform at that point.

`getDatPeakLocValBig` finds the maxima that exceed the cutoff among those returned by `getDatPeakLocVal`.

`getIObsKeepA` takes the list of waveforms rejected because of misplaced peaks and too-large endpoints and combines that with the information on each waveform’s number of maxima to return a list of the indices of all of the waveforms that should be rejected due to background radiation.

#### 7.4. Cluster Organization

These two functions convert between `iObsOfClust` and `iClustOfObs`, two different ways of organizing clusters of waveforms. `iObsOfClust` is a list of lists. Each sublist is associated with a cluster and has the indices (in `dat`) of the traces in that cluster. `iClustOfObs` is a simple list, each of whose entries corresponds to a trace and states what cluster that trace is in (0 if the trace is not in any cluster). `getIClustOfObs` takes `iObsOfClust` and returns `iClustOfObs`, and `getIObsOfClust` is its inverse function.

`getIClustOfObs` creates a table with an entry for each observation and then iterates through the clusters. In the table, it assigns the contents of each cluster to the appropriate cluster number.

`getIObsOfClust` creates a list of observation number/cluster number pairs and then applies a `GatherBy` to group waveforms in the same clusters together. Mapping `groupCommon` onto the list creates a list of lists, each of whose first element is a cluster number and each of whose second element is a list of the indices of the waveforms in that cluster. We then sort the overall list by cluster number and extract the second entries in the sublists to create a list of lists of observation indices.

`groupProbability` takes the data, the hypothesized mean photon number for the current round, and the effective-photon-number ordering of the data, and returns a list of four things: the Poisson probability distribution (ordered pairs of photon numbers and probabilities), a list of photon numbers for the clustering it creates, a list of lists of constituent waveforms’ indices for the clusters (`iObsOfClust`), and a list containing the average waveform in each cluster. `groupProbability` finds the photon number/value pairs for the Poisson distribution with the given mean photon number and then calls `groupProb` to organize the clusters.

`groupProb` processes the data into clusters based on the given effective photon number list and the Poisson distribution. First we find the proper ordering of the effective photon numbers by size (`nPhotonEffSortIndex`); this will be needed once we have determined the sizes of the clusters to be created. The probability distribution `prob` is broken up into its component parts: `nPhoton` lists photon numbers, and `probList` lists their corresponding probabilities. `probCum` is the cumulative distribution function of `probList`, which we normalize to ensure that the final element is one. gives the index (in the sorted list of effective photon numbers, `nPhotonEffSortIndex`) where cluster `i` should start; gives the index where it should stop. Both indices are inclusive, which is why `probIndexStart` is offset by one from `probIndexStop`. (`boundList` ensures that the starting and stopping indices stay within the confines of the number of observations available to group.) `keep` weeds out the zero-length bins, and we use it to reduce `nPhoton`, `probIndexStart`, and `probIndexStop` down to only the nontrivial clusters. We then form `iObsOfClust` by finding the range of *sorted* effective photon numbers that each cluster should encompass and then using `nPhotonEffSortIndex` to find the observation numbers to which that range corresponds. `clustMeanTrace` is formed by taking the waveforms in `dat` of each cluster in `iObsOfClust` and averaging them. We then return `nPhotonUse` (the photon numbers of the nonempty clusters), `iObsOfClust`, and `clustMeanTrace`.

#### 7.5. Probability and Log-Likelihood

This is the combinatorial term of the log-likelihood objective function. From equation (15) above, , where the are the cluster sizes and .

The Poisson log-likelihood term is .

`probComboLogLikelihood` finds the Poisson/combinatorial log-likelihood term of the objective function.

`poisson` generates about `nSigma` standard deviations of a Poisson distribution on either side of the given mean. The output is a list of ordered pairs of photon numbers and associated values of the Poisson probability mass function. The output is normalized so that it sums to one.

#### 7.6. Deviation and Objective Function Measurement

`getSqDevClustObs` takes a list of observation traces and a list of cluster mean traces and returns a table whose element in position is the mean square deviation of observation `j` from the mean trace of cluster `i`.

`getSqDevClustClust` returns a table whose element in position is the average of the mean square deviations of the traces in cluster `i` from the mean trace of cluster `j`.

`getObjFtn` takes a list of each cluster’s square deviation from its mean, the constant relating the -means and probabilistic components of the objective function (which can be a scalar or a list that takes on a different value for each cluster), and the log-likelihood of the Poisson distribution (with the combinatorial term included). It returns the value of the objective function.

`getSqDevInClust` takes a list of observations and a mean waveform and totals up the mean square deviations of the observations to the mean. (It returns , where is the list of observations, is the number of time points, the are the observations, and is the mean.)

`getSqDevInClust` returns a list that gives , the total mean square deviation of a cluster to its mean, for each cluster . The list returned becomes the first argument of `getObjFtn`.

#### 7.7. Reading and Filtering Data

`readTES` reads in the data from a set of files specified in the options, returning `dat`, a table whose rows give the values of particular waveforms at regular time intervals. `readTES` assumes that several different datasets may share a directory and that each dataset is split over some number of files, each of which consists of an equal number of unsigned 16-bit integers concatenated together in string form. `iNPhoton` gives the numeric label of the dataset to read (it is not equal to the mean photon number of the dataset). `iDataSet` lists the indices of the particular files in the dataset that should be read in. `fileInfo` is a list of three things: `fileNamePart`, a list of two strings that combine with the dataset label and the file index to create the whole filename; `nSamplePerTrace`, the number of time points in each waveform; and `nTracePerFile`, the number of waveforms in each file of the dataset.

For example, if `iNPhoton` were `7`, `iDataSet` were , and `fileInfo` were , then we would be looking for the files , , and in , with 200 time points in each waveform and 512 waveforms in each of the three files.

For each file, we import the data as a list of samples, organize the samples into sublists (waveforms) of length `nSamplePerTrace`, and assign the list of waveforms to the appropriate section of `dat`. When all of the files have been read, `dat` is full and properly formatted.

We can also filter the data as we read it in. `readTES` applies a “short” filter to a list, one that omits half of the frequency domain.

This generates a Hanning filter of a given length.

`readTESandFilter` reads in the dataset, applying a filter to each file’s data before it is stored. This halves the memory consumption of the reading process, since filtration reduces the amount of information to store by half.

#### 7.8. Output Organization

`outputCreateTabs` creates several list structures (global variables) of the appropriate dimensions to accommodate output from the specified number of runs of PIKA. The output is organized into a `TabView`, with each run receiving a tab that contains a nested view with graphical and textual output. `outputNameList` stores the names of the labels in the subview (which are the same for all runs), and `outputContentTable` stores the output content for each run and sublabel. (Content is stored as a list that later becomes a `Column`.) Each run’s tab also has a space above its subview for other information, which is stored in `outputTopList`. Finally, there is another tab on the level of the runs with general textual (`outputGeneralLog`) and graphical (`outputGeneralGraphics`) information about the data read in.

`outputAdd` adds some expression to the content list for a particular name and run, creating the name’s content list if it has no content already. A run number of 0 indicates output to the general information tab, and a blank name indicates output to the top space of a run’s tab.

`outputShowTabView` is called at the end of PIKA, taking the name and content lists and formatting them into a nested `TabView`.

`print` takes any number of arguments (`args`), turns them into strings, concatenates them, and prints the result to the log. `printSp` prints to an arbitrary section of output, not just the log. `iRun` and `name` indicate the run number and tab name, respectively, to which to print.

#### 7.9. Form Input and Runner

`pika` creates a form with a field for each constant and option that needs to be set for PIKA to run. It then uses `runFromAssociation` to assign the proper values to the proper variables and run PIKA.

`makeField` creates a rule that, when used in a `FormObject`, generates a field to set the variable with name `name` and type `type` to the entered value, with descriptive text `labelPart` and default value `defaultContents`. `args` takes optional extra arguments that specify additional options for the field.

`runFromAssociation` takes an association between variable names (strings) and the values that should be assigned to them and assigns the proper value to the symbolic variable associated with each name. It then runs PIKA (using the main function from Section 3) with those global variables set.

### References

[1] | Z. H. Levine, T. Gerrits, A. L. Migdall, D. V. Samarov, B. Calkins, A. E. Lita, and S. W. Nam, “Algorithm for Finding Clusters with a Known Distribution and Its Application to Photon-Number Resolution Using a Superconducting Transition-Edge Sensor,” Journal of the Optical Society of America B, 29(8), 2012 pp. 2066–2073. doi:10.1364/JOSAB.29.002066. |

[2] | T. Gerrits, B. Calkins, N. Tomlin, A. E. Lita, A. Migdall, R. Mirin, and S. W. Nam, “Extending Single-Photon Optimized Superconducting Transition Edge Sensors beyond the Single-Photon Counting Regime,” Optics Express, 20(21), 2021 pp. 23798–23810. doi:10.1364/OE.20.023798. |

[3] | Z. H. Levine, B. L. Glebov, A. L. Pintar, and A. L. Migdall, “Absolute Calibration of a Variable Attenuator Using Few-Photon Pulses,” Optics Express, 23(12), 2015 pp. 16372–16382. doi:10.1364/OE.23.016372. |

[4] | Z. H. Levine, B. L. Glebov, A. L. Migdall, T. Gerrits, B. Calkins, A. E. Lita, and S. W. Nam, “Photon-Number Uncertainty in a Superconducting Transition Edge Sensor beyond Resolved-Photon-Number Determination,” Journal of the Optical Society of America B, 31(10), 2014 pp. B20–B24. doi:10.1364/JOSAB.31.000B20. |

[5] | P. E. Black, “Greedy Algorithm,” in Dictionary of Algorithms and Data Structures (V. Pieterse and P. E. Black, eds.), Feb 2, 2005. www.nist.gov/dads/HTML/greedyalgo.html. |

[6] | S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, “Optimization by Simulated Annealing,” Science, 220, 1983 pp. 671–680. doi:10.1126/science.220.4598.671. |

B. P. M. Morris and Z. H. Levine, “The Poisson-Influenced -Means Algorithm,” The Mathematica Journal, 2016. dx.doi.org/doi:10.3888/tmj.18-3. |

### Additional Material

www.mathematica-journal.com/data/uploads/2016/03/TES_E0019-16.daq00.zip

www.mathematica-journal.com/data/uploads/2016/03/TES_E0019-16.daq01.zip

www.mathematica-journal.com/data/uploads/2016/03/TES_E0019-16.daq02.zip

www.mathematica-journal.com/data/uploads/2016/03/TES_E0019-16.daq03.zip

www.mathematica-journal.com/data/uploads/2016/03/TES_E0019-19.daq00.zip

www.mathematica-journal.com/data/uploads/2016/03/TES_E0019-19.daq01.zip

www.mathematica-journal.com/data/uploads/2016/03/TES_E0019-19.daq02.zip

www.mathematica-journal.com/data/uploads/2016/03/TES_E0019-19.daq03.zip

### About the Authors

Brian P. M. Morris is a senior in the Science, Mathematics, and Computer Science Magnet Program at Montgomery Blair High School in Silver Spring, Maryland, was in the Summer High School Intern Program at the National Institute of Standards and Technology, and is a 2016 Intel Science Talent Search Semifinalist.

Zachary H. Levine is a physicist at the National Institute of Standards and Technology in the Infrared Technology Group and is a Fellow of the American Physical Society. He is a graduate of MIT and the University of Pennsylvania.

**Brian P. M. Morris**

*Montgomery Blair High School
Silver Spring, Maryland 20901*

**Zachary H. Levine**

*National Institute of Standards and Technology
Gaithersburg, Maryland 20899-8441
*

*zlevine@nist.gov*

*mathematica-journal.com*