4.1. An Optimal Key Enumeration Algorithm
We study the optimal key enumeration algorithm (OKEA) introduced in the research paper [18]. We first present the basic idea behind the algorithm, assuming the encoding of the secret key is represented as two chunks; hence, we have access to two lists of chunk candidates.
4.1.1. Setup
Let ${L}^{0}=[{c}_{0}^{0},{c}_{1}^{0},\dots ,{c}_{{m}_{0}-1}^{0}]$ and ${L}^{1}=[{c}_{0}^{1},{c}_{1}^{1},\dots ,{c}_{{m}_{1}-1}^{1}]$ be the two lists, respectively. Each list is in decreasing order based on the score component of its chunk candidates. Let us define an extended candidate as a 4-tuple of the form $\mathtt{C}:=({\mathtt{c}}_{{j}_{0}}^{0},{\mathtt{c}}_{{j}_{1}}^{1},{j}_{0},{j}_{1})$ and its score as ${\mathtt{c}}_{{j}_{0}}^{0}.score+{\mathtt{c}}_{{j}_{1}}^{1}.score$. Additionally, let $\mathtt{Q}$ be a priority queue that stores extended candidates in decreasing order based on their score.
This data structure $\mathtt{Q}$ supports three methods. Firstly, the method $\mathtt{Q}.\mathtt{poll}()$ retrieves and removes the head of the queue $\mathtt{Q}$ or returns $\mathtt{null}$ if the queue is empty. Secondly, the method $\mathtt{Q}.\mathtt{add}(e)$ inserts the specified element $e$ into the priority queue $\mathtt{Q}$. Thirdly, the method $\mathtt{Q}.\mathtt{clear}()$ removes all the elements from the queue $\mathtt{Q}$; the queue will be empty after this method returns. By making use of a heap, we can support any priority-queue operation on a set of size $n$ in $\mathcal{O}({\mathrm{log}}_{2}(n))$ time.
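As an illustration, such a max-priority queue can be sketched on top of Python's standard heapq module (a min-heap) by negating the scores; the class and method names below simply mirror the operations just described and are our own:

```python
import heapq

class MaxPriorityQueue:
    """Max-priority queue over (score, payload) pairs, built on Python's
    heapq min-heap by negating the scores."""

    def __init__(self):
        self._heap = []

    def add(self, score, payload):
        # Negate the score so the largest score sits at the heap root.
        heapq.heappush(self._heap, (-score, payload))

    def poll(self):
        # Retrieve and remove the head, or return None if the queue is empty.
        if not self._heap:
            return None
        neg_score, payload = heapq.heappop(self._heap)
        return (-neg_score, payload)

    def clear(self):
        self._heap = []
```

Both add and poll run in $\mathcal{O}({\mathrm{log}}_{2}(n))$ time on a queue of size $n$, matching the bound above.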
Furthermore, let $\mathtt{X}$ and $\mathtt{Y}$ be two vectors of bits that grow as needed. These are employed to track whether an extended candidate $\mathtt{C}$ is in $\mathtt{Q}$: $\mathtt{C}$ is in $\mathtt{Q}$ only if both ${\mathtt{X}}_{{j}_{0}}$ and ${\mathtt{Y}}_{{j}_{1}}$ are set to 1. By default, all bits in a vector initially have the value 0.
4.1.2. Basic Algorithm
At the initial stage, queue Q will be created. Next, the extended candidate $({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{0}^{1},0,0)$ will be inserted into the priority queue and both ${\mathtt{X}}_{0}$ and ${\mathtt{Y}}_{0}$ will be set to 1. In order to generate a new key candidate, the routine nextCandidate, defined in Algorithm 1, should be executed.
Let us assume that ${m}_{0},{m}_{1}>1$. First, the extended candidate $({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{0}^{1},0,0)$ will be retrieved and removed from $\mathtt{Q}$, and then, ${\mathtt{X}}_{0}$ and ${\mathtt{Y}}_{0}$ will be set to 0. The two $\mathbf{if}$ blocks of instructions will then be executed, meaning that the extended candidates $({\mathtt{c}}_{1}^{0},{\mathtt{c}}_{0}^{1},1,0)$ and $({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{1}^{1},0,1)$ will be inserted into $\mathtt{Q}$. Moreover, the entries ${\mathtt{X}}_{0},{\mathtt{X}}_{1},{\mathtt{Y}}_{0}$, and ${\mathtt{Y}}_{1}$ will be set to 1, while the other entries of $\mathtt{X}$ and $\mathtt{Y}$ will remain 0. The routine nextCandidate will then return ${\mathtt{c}}_{0,0}=\mathtt{combine}({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{0}^{1})$, which is the highest-scoring key candidate, since ${L}^{0}$ and ${L}^{1}$ are in decreasing order. At this point, the two extended candidates $({\mathtt{c}}_{1}^{0},{\mathtt{c}}_{0}^{1},1,0)$ and $({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{1}^{1},0,1)$ (both in $\mathtt{Q}$) are the only ones that can have the second highest score. Therefore, if Algorithm 1 is called again, the first instruction will retrieve and remove the extended candidate with the second highest score, say $({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{1}^{1},0,1)$, from $\mathtt{Q}$, and then the second instruction will set ${\mathtt{X}}_{0}$ and ${\mathtt{Y}}_{1}$ to 0. The first $\mathbf{if}$ condition will be attempted, but this time, it will be false since ${\mathtt{X}}_{1}$ is set to 1. However, the second $\mathbf{if}$ condition will be satisfied, and therefore, $({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{2}^{1},0,2)$ will be inserted into $\mathtt{Q}$ and the entries ${\mathtt{X}}_{0}$ and ${\mathtt{Y}}_{2}$ will be set to 1. The method will then return ${\mathtt{c}}_{0,1}=\mathtt{combine}({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{1}^{1})$, which is the second highest-scoring key candidate.
Algorithm 1 outputs the next highest-scoring key candidate from ${L}^{0}$ and ${L}^{1}$.
1: function nextCandidate($\mathtt{Q}$)
2:   $({\mathtt{c}}_{{j}_{0}}^{0},{\mathtt{c}}_{{j}_{1}}^{1},{j}_{0},{j}_{1})\leftarrow \mathtt{Q}.\mathtt{poll}()$;
3:   ${\mathtt{X}}_{{j}_{0}}\leftarrow 0$; ${\mathtt{Y}}_{{j}_{1}}\leftarrow 0$;
4:   if $({j}_{0}+1)<{L}^{0}.\mathtt{size}()$ and ${\mathtt{X}}_{{j}_{0}+1}=0$ then
5:     ${\mathtt{c}}_{{j}_{0}+1}^{0}\leftarrow {L}^{0}.\mathtt{get}({j}_{0}+1)$;
6:     $\mathtt{Q}.\mathtt{add}(({\mathtt{c}}_{{j}_{0}+1}^{0},{\mathtt{c}}_{{j}_{1}}^{1},{j}_{0}+1,{j}_{1}))$;
7:     ${\mathtt{X}}_{{j}_{0}+1}\leftarrow 1$; ${\mathtt{Y}}_{{j}_{1}}\leftarrow 1$;
8:   end if
9:   if $({j}_{1}+1)<{L}^{1}.\mathtt{size}()$ and ${\mathtt{Y}}_{{j}_{1}+1}=0$ then
10:    ${\mathtt{c}}_{{j}_{1}+1}^{1}\leftarrow {L}^{1}.\mathtt{get}({j}_{1}+1)$;
11:    $\mathtt{Q}.\mathtt{add}(({\mathtt{c}}_{{j}_{0}}^{0},{\mathtt{c}}_{{j}_{1}+1}^{1},{j}_{0},{j}_{1}+1))$;
12:    ${\mathtt{X}}_{{j}_{0}}\leftarrow 1$; ${\mathtt{Y}}_{{j}_{1}+1}\leftarrow 1$;
13:  end if
14:  return ${\mathtt{c}}_{{j}_{0},{j}_{1}}=\mathtt{combine}({\mathtt{c}}_{{j}_{0}}^{0},{\mathtt{c}}_{{j}_{1}}^{1})$;
15: end function

At this point, the two extended candidates $({\mathtt{c}}_{1}^{0},{\mathtt{c}}_{0}^{1},1,0)$ and $({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{2}^{1},0,2)$ (both in $\mathtt{Q}$) are the only ones that can have the third highest score. To see why, note that the algorithm has generated ${\mathtt{c}}_{0,0}$ and ${\mathtt{c}}_{0,1}$ so far. Since ${L}^{0}$ and ${L}^{1}$ are in decreasing order, we have that either ${\mathtt{c}}_{0,0}.score\ge {\mathtt{c}}_{0,1}.score\ge {\mathtt{c}}_{1,0}.score\ge {\mathtt{c}}_{0,2}.score$ or ${\mathtt{c}}_{0,0}.score\ge {\mathtt{c}}_{0,1}.score\ge {\mathtt{c}}_{0,2}.score\ge {\mathtt{c}}_{1,0}.score$. Also, any other extended candidate yet to be inserted into $\mathtt{Q}$ cannot have the third highest score, for the same reason. Consider, for example, $({\mathtt{c}}_{1}^{0},{\mathtt{c}}_{1}^{1},1,1)$: this extended candidate will be inserted into $\mathtt{Q}$ only after $({\mathtt{c}}_{1}^{0},{\mathtt{c}}_{0}^{1},1,0)$ has been retrieved and removed from $\mathtt{Q}$. Therefore, if Algorithm 1 is executed again, it will return the third highest-scoring key candidate and leave the extended candidate with the fourth highest score at the head of $\mathtt{Q}$. In general, the manner in which this algorithm travels through the ${m}_{0}\times {m}_{1}$ matrix of key candidates guarantees that key candidates are output in decreasing order of their total accumulated score, i.e., this algorithm is an optimal key enumeration algorithm.
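For concreteness, the two-list procedure can be sketched in Python as a generator. The representation is our own: each list holds (score, value) pairs already sorted by decreasing score, and, since heapq is a min-heap, total scores are negated:

```python
import heapq

def enumerate_keys(L0, L1):
    """Yield (total score, chunk0 value, chunk1 value) in decreasing order of
    total score, given two lists of (score, value) pairs sorted by decreasing
    score (two-chunk OKEA)."""
    Q = [(-(L0[0][0] + L1[0][0]), 0, 0)]   # min-heap on the negated total score
    X = [False] * len(L0)                  # X[j0]: a candidate with first index j0 is in Q
    Y = [False] * len(L1)                  # Y[j1]: a candidate with second index j1 is in Q
    X[0] = Y[0] = True
    while Q:
        neg_total, j0, j1 = heapq.heappop(Q)
        X[j0] = Y[j1] = False
        # Right neighbour in the m0 x m1 matrix of key candidates.
        if j0 + 1 < len(L0) and not X[j0 + 1]:
            heapq.heappush(Q, (-(L0[j0 + 1][0] + L1[j1][0]), j0 + 1, j1))
            X[j0 + 1] = Y[j1] = True
        # Lower neighbour.
        if j1 + 1 < len(L1) and not Y[j1 + 1]:
            heapq.heappush(Q, (-(L0[j0][0] + L1[j1 + 1][0]), j0, j1 + 1))
            X[j0] = Y[j1 + 1] = True
        yield (-neg_total, L0[j0][1], L1[j1][1])
```

Exhausting the generator produces all ${m}_{0}\cdot {m}_{1}$ combinations in non-increasing score order.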
Regarding how fast the queue $\mathtt{Q}$ grows, let ${N}_{\mathtt{Q}}^{s}$ be the number of extended candidates in $\mathtt{Q}$ after the function nextCandidate has been called $s\ge 0$ times. Clearly, ${N}_{\mathtt{Q}}^{0}=1$, since $\mathtt{Q}$ only contains the extended candidate $({\mathtt{c}}_{0}^{0},{\mathtt{c}}_{0}^{1},0,0)$ after initialisation. Also, ${N}_{\mathtt{Q}}^{{m}_{0}\cdot {m}_{1}}=0$ because, after ${m}_{0}\cdot {m}_{1}$ calls to the function, there are no more key candidates to be enumerated. Note that, during each execution of the function nextCandidate, one extended candidate is removed from $\mathtt{Q}$ and at most two new extended candidates are inserted into $\mathtt{Q}$. Considering the way in which an extended candidate is inserted into the queue, $\mathtt{Q}$ may contain at most one element in each row and column at any stage; hence, ${N}_{\mathtt{Q}}^{s}\le \mathrm{min}({m}_{0},{m}_{1})$ for $0\le s\le {m}_{0}\cdot {m}_{1}$.
4.1.3. Complete Algorithm
Note that Algorithm 1 works properly as long as both input lists are in decreasing order. Hence, it may be generalized to more than two lists by employing a divide-and-conquer approach, which works by recursively breaking the problem down into two or more sub-problems of the same or a related type until these become simple enough to be solved directly; the solutions to the sub-problems are then combined to give a solution to the original problem [30]. To explain the complete algorithm, let us consider the case of five chunks as an example. We have access to five lists of chunk candidates ${L}^{i}$, $0\le i<5$, each of which has a size of ${m}_{i}$. We first call $\mathtt{initialise}(0,4)$, as defined in Algorithm 2. This function builds a tree-like structure from the five given lists (see Figure 1).
Each node ${\mathtt{N}}^{i,\dots ,f}$ is a 6-tuple of the form $({\mathtt{N}}^{i,\dots ,q},{\mathtt{N}}^{q+1,\dots ,f},{\mathtt{Q}}^{i,\dots ,f},{\mathtt{X}}^{i,\dots ,f},{\mathtt{Y}}^{i,\dots ,f},{L}^{i,\dots ,f})$, where ${\mathtt{N}}^{i,\dots ,q}$ and ${\mathtt{N}}^{q+1,\dots ,f}$ are the children nodes, ${\mathtt{Q}}^{i,\dots ,f}$ is a priority queue, ${\mathtt{X}}^{i,\dots ,f}$ and ${\mathtt{Y}}^{i,\dots ,f}$ are bit vectors, and ${L}^{i,\dots ,f}$ is a list of chunk candidates. Additionally, this data structure supports the method $\mathtt{size}()$, which returns the maximum number of chunk candidates that this node can generate. This method is easily defined recursively: if ${\mathtt{N}}^{i,\dots ,f}$ is a leaf node, the method returns ${L}^{i,\dots ,f}.\mathtt{size}()$; otherwise, it returns ${\mathtt{N}}^{i,\dots ,q}.\mathtt{size}()\times {\mathtt{N}}^{q+1,\dots ,f}.\mathtt{size}()$. To avoid computing this value on every call, a node internally stores the value once it has been computed for the first time; hence, the method simply returns the stored value from the second call onwards. Furthermore, the function $\mathtt{getCandidate}({\mathtt{N}}^{i,\dots ,f},j)$, as defined in Algorithm 3, returns the $j$th best chunk candidate (the chunk candidate whose score rank is $j$) from the node ${\mathtt{N}}^{i,\dots ,f}$.
In order to generate the first $N$ best key candidates from the root node $\mathtt{R}$, with $\mathtt{R}:={\mathtt{N}}^{0,\dots ,4}$, we simply run $\mathtt{nextCandidate}(\mathtt{R})$, as defined in Algorithm 4, $N$ times. This function internally calls the function $\mathtt{getCandidate}$ with suitable parameters whenever required. Calling $\mathtt{getCandidate}({\mathtt{N}}^{i,\dots ,f},j)$ may cause this function to internally invoke $\mathtt{nextCandidate}({\mathtt{N}}^{i,\dots ,f})$ to generate ordered chunk candidates from the inner node ${\mathtt{N}}^{i,\dots ,f}$ on the fly. Therefore, any inner node ${\mathtt{N}}^{i,\dots ,f}$ should keep track of the chunk candidates returned by $\mathtt{getCandidate}({\mathtt{N}}^{i,\dots ,f},j)$ when called by its parent; otherwise, the $j$ best chunk candidates from ${\mathtt{N}}^{i,\dots ,f}$ would have to be regenerated each time such a call is made, which is inefficient. To keep track of the returned chunk candidates, each node ${\mathtt{N}}^{i,\dots ,f}$ updates its internal list ${L}^{i,\dots ,f}$ (see lines 5 to 7 in Algorithm 3).
Algorithm 2 creates and initialises each node of the tree-like structure.
1: function initialise($i,f$)
2:   if $f=i$ then
3:     ${\mathtt{L}}^{i}\leftarrow (\mathtt{null},\mathtt{null},\mathtt{null},\mathtt{null},\mathtt{null},{L}^{i})$;
4:     return ${\mathtt{L}}^{i}$;
5:   else
6:     $q\leftarrow \lfloor \frac{i+f}{2}\rfloor$;
7:     ${\mathtt{N}}^{i,\dots ,q}\leftarrow \mathtt{initialise}(i,q)$;
8:     ${\mathtt{N}}^{q+1,\dots ,f}\leftarrow \mathtt{initialise}(q+1,f)$;
9:     ${\mathtt{c}}_{0}^{i,\dots ,q}\leftarrow \mathtt{getCandidate}({\mathtt{N}}^{i,\dots ,q},0)$;
10:    ${\mathtt{c}}_{0}^{q+1,\dots ,f}\leftarrow \mathtt{getCandidate}({\mathtt{N}}^{q+1,\dots ,f},0)$;
11:    ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{0}^{i,\dots ,q},{\mathtt{c}}_{0}^{q+1,\dots ,f},0,0))$;
12:    ${\mathtt{X}}_{0}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{0}^{i,\dots ,f}\leftarrow 1$;
13:    ${\mathtt{N}}^{i,\dots ,f}\leftarrow ({\mathtt{N}}^{i,\dots ,q},{\mathtt{N}}^{q+1,\dots ,f},{\mathtt{Q}}^{i,\dots ,f},{\mathtt{X}}^{i,\dots ,f},{\mathtt{Y}}^{i,\dots ,f},{L}^{i,\dots ,f})$;
14:    return ${\mathtt{N}}^{i,\dots ,f}$;
15:  end if
16: end function

Algorithm 3 outputs the $j$th best chunk candidate from the node ${\mathtt{N}}^{i,\dots ,f}$.
1: function getCandidate(${\mathtt{N}}^{i,\dots ,f},j$)
2:   if ${\mathtt{N}}^{i,\dots ,f}$ is a leaf then
3:     return ${L}^{i,\dots ,f}.\mathtt{get}(j)$;
4:   end if
5:   if $j\ge {L}^{i,\dots ,f}.\mathtt{size}()$ then
6:     ${L}^{i,\dots ,f}.\mathtt{add}(\mathtt{nextCandidate}({\mathtt{N}}^{i,\dots ,f}))$;
7:   end if
8:   return ${L}^{i,\dots ,f}.\mathtt{get}(j)$;
9: end function

Algorithm 4 outputs the next highest-scoring chunk candidate from the node ${\mathtt{N}}^{i,\dots ,f}$.
1: function nextCandidate(${\mathtt{N}}^{i,\dots ,f}$)
2:   $({\mathtt{c}}_{{j}_{x}}^{x},{\mathtt{c}}_{{j}_{y}}^{y},{j}_{x},{j}_{y})\leftarrow {\mathtt{Q}}^{i,\dots ,f}.\mathtt{poll}()$; $(x=\{i,\dots ,q\},y=\{q+1,\dots ,f\})$
3:   ${\mathtt{X}}_{{j}_{x}}^{i,\dots ,f}\leftarrow 0$; ${\mathtt{Y}}_{{j}_{y}}^{i,\dots ,f}\leftarrow 0$;
4:   if $({j}_{x}+1)<{\mathtt{N}}^{i,\dots ,q}.\mathtt{size}()$ and ${\mathtt{X}}_{{j}_{x}+1}^{i,\dots ,f}=0$ then
5:     ${\mathtt{c}}_{{j}_{x}+1}^{x}\leftarrow \mathtt{getCandidate}({\mathtt{N}}^{i,\dots ,q},{j}_{x}+1)$;
6:     ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{{j}_{x}+1}^{x},{\mathtt{c}}_{{j}_{y}}^{y},{j}_{x}+1,{j}_{y}))$;
7:     ${\mathtt{X}}_{{j}_{x}+1}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{{j}_{y}}^{i,\dots ,f}\leftarrow 1$;
8:   end if
9:   if $({j}_{y}+1)<{\mathtt{N}}^{q+1,\dots ,f}.\mathtt{size}()$ and ${\mathtt{Y}}_{{j}_{y}+1}^{i,\dots ,f}=0$ then
10:    ${\mathtt{c}}_{{j}_{y}+1}^{y}\leftarrow \mathtt{getCandidate}({\mathtt{N}}^{q+1,\dots ,f},{j}_{y}+1)$;
11:    ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{{j}_{x}}^{x},{\mathtt{c}}_{{j}_{y}+1}^{y},{j}_{x},{j}_{y}+1))$;
12:    ${\mathtt{X}}_{{j}_{x}}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{{j}_{y}+1}^{i,\dots ,f}\leftarrow 1$;
13:  end if
14:  return $\mathtt{combine}({\mathtt{c}}_{{j}_{x}}^{x},{\mathtt{c}}_{{j}_{y}}^{y})$;
15: end function
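A minimal Python sketch of this tree-based generalisation is given below. For brevity it replaces the X/Y bit vectors with a single set of already-queued index pairs; this preserves the output order, although the queue may grow larger than with the original bookkeeping. Lists hold (score, value) pairs in decreasing score order, and combine is taken to add scores and concatenate values; all names are illustrative:

```python
import heapq

class Leaf:
    """Leaf node: wraps one list of (score, value) pairs, decreasing by score."""
    def __init__(self, L):
        self.L = L
    def size(self):
        return len(self.L)
    def get(self, j):
        return self.L[j]

class Node:
    """Inner node: lazily merges its two children's ordered candidate streams."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.cache = []                    # chunk candidates produced so far (list L)
        self.Q = [(-(left.get(0)[0] + right.get(0)[0]), 0, 0)]
        self.seen = {(0, 0)}               # index pairs ever pushed into Q
    def size(self):
        return self.left.size() * self.right.size()
    def get(self, j):
        # getCandidate: extend the cache until the rank-j candidate exists.
        while j >= len(self.cache):
            self.cache.append(self.next_candidate())
        return self.cache[j]
    def next_candidate(self):
        neg, jx, jy = heapq.heappop(self.Q)
        for nx, ny in ((jx + 1, jy), (jx, jy + 1)):   # right and lower neighbours
            if nx < self.left.size() and ny < self.right.size() \
                    and (nx, ny) not in self.seen:
                self.seen.add((nx, ny))
                heapq.heappush(
                    self.Q,
                    (-(self.left.get(nx)[0] + self.right.get(ny)[0]), nx, ny))
        x, y = self.left.get(jx), self.right.get(jy)
        return (x[0] + y[0], x[1] + y[1])  # combine: add scores, concatenate values

def initialise(lists, i, f):
    """Build the tree over lists[i..f] by halving the index range."""
    if i == f:
        return Leaf(lists[i])
    q = (i + f) // 2
    return Node(initialise(lists, i, q), initialise(lists, q + 1, f))
```

Calling next_candidate on the root repeatedly then yields key candidates in non-increasing total-score order, with each inner node caching what its parent has requested.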

4.1.4. Memory Consumption
Let us suppose that the encoding of a secret key is $W={2}^{a+b}$ bits in size and that we set $w={2}^{a}$; therefore, $\mathcal{N}={2}^{b}$. Hence, we have access to $\mathcal{N}$ lists ${L}^{i}$, $0\le i<{2}^{b}$, each of which has ${m}_{i}$ chunk candidates. Suppose we would like to generate the first $N$ best key candidates. We first invoke $\mathtt{initialise}(0,\mathcal{N}-1)$ (Algorithm 2). This call will create a tree-like structure with $b+1$ levels, numbered from 0:
- the root node $\mathtt{R}:={\mathtt{N}}^{0,\dots ,{2}^{b}-1}$ at level 0;
- the inner nodes ${\mathtt{N}}^{{I}_{d}}:={\mathtt{N}}_{\lambda}^{{i}_{d}}$ with ${I}_{d}=\{{i}_{d}\cdot {2}^{b-\lambda},\dots ,({i}_{d}+1)\cdot {2}^{b-\lambda}-1\}$, where $\lambda$, $0<\lambda <b$, is the level and ${i}_{d}$, $0\le {i}_{d}<{2}^{\lambda}$, is the node identifier at level $\lambda$;
- the leaf nodes ${\mathtt{L}}^{i}$ at level $b$ for $0\le i<{2}^{b}$.
This tree will have ${2}^{0}+{2}^{1}+\cdots +{2}^{b}={2}^{b+1}-1$ nodes.
Let ${M}_{k}$ be the number of bits consumed by chunk candidates stored in memory after calling the function nextCandidate with $\mathtt{R}$ as a parameter $k$ times. A chunk candidate at level $0\le \lambda \le b$ is of the form $(score,[{e}_{0},\dots ,{e}_{{2}^{b-\lambda}-1}])$, with $score$ being a real number and the ${e}_{l}$ being bit strings. Let ${B}_{\lambda}$ be the number of bits a chunk candidate at level $\lambda$ occupies in memory.
First, note that invoking $\mathtt{initialise}(0,\mathcal{N}-1)$ causes each internal node's list to grow:
- at the creation of the leaf nodes ${\mathtt{L}}^{i}$ (lines 2 to 4), ${\mathtt{L}}^{i}$ is created by setting ${\mathtt{L}}^{i}$'s internal list to ${L}^{i}$ and by setting ${\mathtt{L}}^{i}$'s other components to null;
- at the creation of both $\mathtt{R}$ and the nodes ${\mathtt{N}}_{\lambda}^{{i}_{d}}$, for $0<\lambda <b-1$ and $0\le {i}_{d}<{2}^{\lambda}$, the execution of the function getCandidate (lines 9 to 10) makes the corresponding left child (right child) store a new chunk candidate in its internal list. That is, for $0<\lambda \le b-1$, $0\le {i}_{d}<{2}^{\lambda}$, the internal list of ${\mathtt{N}}_{\lambda}^{{i}_{d}}$ gains a new element.
Therefore, ${M}_{0}={\sum}_{\lambda =1}^{b-1}{2}^{\lambda}{B}_{\lambda}+{B}_{b}({\sum}_{i=0}^{{2}^{b}-1}{m}_{i})$.
Suppose the best key candidate is about to be generated; then, $\mathtt{nextCandidate}(\mathtt{R})$ will be executed for the first time. This routine will remove the extended candidate $({\mathtt{c}}_{0}^{x},{\mathtt{c}}_{0}^{y},0,0)$ from $\mathtt{R}$'s priority queue. If it enters the first if (lines 4 to 8), it will make the call $\mathtt{getCandidate}({\mathtt{N}}_{1}^{0},1)$ (line 5), which may cause each node of the left subtree, except for the leaf nodes, to store at most one new chunk candidate in its internal list. Hence, retrieving the chunk candidate ${\mathtt{c}}_{1}^{x}$ may cause at most ${2}^{\lambda -1}$ chunk candidates per level $\lambda$, $1\le \lambda <b$, to be stored. Likewise, if it enters the second if (lines 9 to 13), it will call the function $\mathtt{getCandidate}({\mathtt{N}}_{1}^{1},1)$ (line 10), which may cause each node of the right subtree, except for the leaf nodes, to store at most one new chunk candidate in its internal list. Therefore, retrieving the chunk candidate ${\mathtt{c}}_{1}^{y}$ (line 10) may cause at most ${2}^{\lambda -1}$ chunk candidates per level $\lambda$, $1\le \lambda <b$, to be stored. Therefore, after generating the best key candidate, ${p}_{\lambda}^{(1)}\le {2}^{\lambda}$ chunk candidates per level $\lambda$, $1\le \lambda <b$, will be stored in memory; hence, ${M}_{0}\le {M}_{1}={M}_{0}+{\sum}_{\lambda =1}^{b-1}{p}_{\lambda}^{(1)}{B}_{\lambda}\le 2{\sum}_{\lambda =1}^{b-1}{2}^{\lambda}{B}_{\lambda}+{B}_{b}({\sum}_{i=0}^{{2}^{b}-1}{m}_{i})$ bits are consumed by chunk candidates stored in memory.
Let us assume that $k-1$ key candidates have already been generated; therefore, ${M}_{k-1}$ bits are consumed by chunk candidates in memory, with ${M}_{k-1}={M}_{0}+{\sum}_{d=1}^{k-1}{\sum}_{\lambda =1}^{b-1}{p}_{\lambda}^{(d)}{B}_{\lambda}$. Let us now suppose the $k$th best key candidate is about to be generated; then, the method $\mathtt{nextCandidate}(\mathtt{R})$ will be executed for the $k$th time. This routine will remove the best extended candidate $({\mathtt{c}}_{{j}_{x}}^{x},{\mathtt{c}}_{{j}_{y}}^{y},{j}_{x},{j}_{y})$ from $\mathtt{R}$'s priority queue. It will then attempt to insert two new extended candidates into $\mathtt{R}$'s priority queue. As seen previously, retrieving the chunk candidate ${\mathtt{c}}_{{j}_{x}+1}^{x}$ may cause at most ${2}^{\lambda -1}$ chunk candidates per level $\lambda$, $1\le \lambda <b$, to be stored. Likewise, retrieving the chunk candidate ${\mathtt{c}}_{{j}_{y}+1}^{y}$ may also cause at most ${2}^{\lambda -1}$ chunk candidates per level $\lambda$, $1\le \lambda <b$, to be stored. Therefore, after generating the $k$th best key candidate, ${p}_{\lambda}^{(k)}\le {2}^{\lambda}$ chunk candidates per level $\lambda$, $1\le \lambda <b$, will be stored in memory; hence, ${M}_{k}={M}_{k-1}+{\sum}_{\lambda =1}^{b-1}{p}_{\lambda}^{(k)}{B}_{\lambda}$ bits are consumed by chunk candidates stored in memory.
It follows that, if $N$ key candidates are generated, then ${M}_{N}={M}_{0}+{\sum}_{d=1}^{N}{\sum}_{\lambda =1}^{b-1}{p}_{\lambda}^{(d)}{B}_{\lambda}\le {M}_{0}+N{\sum}_{\lambda =1}^{b-1}{2}^{\lambda}{B}_{\lambda}$ bits are consumed by chunk candidates stored in memory, in addition to the extended candidates stored internally in the priority queues of the nodes $\mathtt{R}$ and ${\mathtt{N}}_{\lambda}^{{i}_{d}}$. Therefore, this algorithm may consume a large amount of memory if it is used to generate a large number of key candidates, which may be problematic.
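For intuition, the bounds above can be evaluated numerically. The sketch below is ours, and the per-level candidate sizes ${B}_{\lambda}$ passed in are hypothetical inputs, not values fixed by the algorithm:

```python
def memory_bound_bits(b, m, B, N):
    """Upper bound on bits consumed by stored chunk candidates after
    enumerating N key candidates:
        M_N <= M_0 + N * sum_{lam=1}^{b-1} 2^lam * B[lam],
    with M_0 = sum_{lam=1}^{b-1} 2^lam * B[lam] + B[b] * sum(m).
    b: depth of the tree, m: the 2^b list sizes m_i,
    B: dict mapping level lam to the bits B_lam one candidate occupies."""
    per_call = sum(2 ** lam * B[lam] for lam in range(1, b))  # levels 1..b-1
    M0 = per_call + B[b] * sum(m)                             # after initialise()
    return M0 + N * per_call
```

The linear growth in $N$ makes the memory issue discussed above visible: every additional key candidate may add up to ${\sum}_{\lambda =1}^{b-1}{2}^{\lambda}{B}_{\lambda}$ bits.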
4.4. Complete Algorithm
When the number of chunks is greater than 2, the algorithm applies a recursive decomposition of the problem (similar to OKEA). Whenever a new chunk candidate is inserted into the candidate set, its value is obtained by applying the enumeration algorithm at the lower level. We use an example to convey the idea of the general algorithm. Let us suppose the encoding of the secret key is divided into 4 chunks; then, we have access to 4 lists of chunk candidates, each of size ${m}_{i}$ with $\omega \mid {m}_{i}$.
To generate key candidates, we need to generate the two lists of chunk candidates for the lower level, ${L}^{0,1}$ and ${L}^{2,3}$, on the fly as far as required. For this, we maintain a set of next potential candidates for each dimension, ${\mathtt{Q}}^{0,1}$ and ${\mathtt{Q}}^{2,3}$, so that each next chunk candidate obtained from ${\mathtt{Q}}^{0,1}$ (or ${\mathtt{Q}}^{2,3}$) is stored in the list ${L}^{0,1}$ (or ${L}^{2,3}$). Because the enumeration is performed by layers, the sizes of the data structures ${\mathtt{Q}}^{0,1}$ and ${\mathtt{Q}}^{2,3}$ are bounded by $2\cdot \omega$. However, this is not the case for the lists ${L}^{0,1}$ and ${L}^{2,3}$, which grow as the number of enumerated candidates grows, hence becoming problematic, as seen in Section 4.1.4.
To handle this, each $laye{r}_{k}^{\omega}$ is partitioned into squares of size $\omega \times \omega$. The algorithm still enumerates the key candidates in $laye{r}_{1}^{\omega}$ first, then in $laye{r}_{2}^{\omega}$, and so on, but within each $laye{r}_{k}^{\omega}$, the enumeration proceeds square by square. Figure 3 depicts the geometric representation of the key enumeration within $laye{r}_{3}^{3}$, where a square (strong shade of blue) within a layer represents the square being processed by the enumeration algorithm. More specifically, for given non-negative integers $I$ and $J$, let us define ${S}_{I,J}^{\omega}$ as
${S}_{I,J}^{\omega}=\{({j}_{x},{j}_{y}):I\cdot \omega \le {j}_{x}<(I+1)\cdot \omega ,\ J\cdot \omega \le {j}_{y}<(J+1)\cdot \omega \}.$
Let us set ${m}_{min}=\mathrm{min}({m}_{0}\cdot {m}_{1},{m}_{2}\cdot {m}_{3})$; hence, $laye{r}_{k}^{\omega}={\bigcup}_{\mathrm{max}(I,J)=k-1}{S}_{I,J}^{\omega}$ for $1\le k\le \frac{{m}_{min}}{\omega}$. The remaining layers, if any, are partitioned in a similar way.
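As a small illustration of this partition, under the L-shaped layer convention used here ($laye{r}_{k}^{\omega}$ consisting of the squares ${S}_{I,J}^{\omega}$ with $\mathrm{max}(I,J)=k-1$), the following sketch lists the square indices of a layer (the function name is ours):

```python
def squares_in_layer(k):
    """Indices (I, J) of the omega x omega squares making up layer_k:
    the squares whose larger coordinate equals k - 1 (an L-shaped band).
    The corner squares S_{k-1,0} and S_{0,k-1} sit at the two ends."""
    e = k - 1
    # Row I = e (including the diagonal square S_{e,e}), then column J = e.
    return [(e, J) for J in range(e + 1)] + [(I, e) for I in range(e)]
```

A layer thus contains $2k-1$ squares, of which at most two are enumerated at any time.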
The in-layer algorithm then proceeds as follows. For each $laye{r}_{k}^{\omega}$, $1\le k\le \frac{{m}_{min}}{\omega}$, the in-layer algorithm first enumerates the candidates in the two corner squares $S={S}_{k-1,0}^{\omega}\cup {S}_{0,k-1}^{\omega}$ by applying OKEA on $S$. At some point, one of the two squares is completely enumerated. Assume this is ${S}_{k-1,0}^{\omega}$. At this point, the only square that can contain the next key candidates after ${S}_{k-1,0}^{\omega}$ is its successor ${S}_{k-1,1}^{\omega}$. Therefore, when one of the squares is completely enumerated, its successor is inserted into $S$, as long as $S$ does not contain a square in the same row or column. For the remaining layers, if any, the in-layer algorithm first enumerates the candidates in the square $S={S}_{k-1,0}^{\omega}$ (or ${S}_{0,k-1}^{\omega}$) by applying OKEA on it; once the square is completely enumerated, its successor is inserted into $S$, and so on. This in-layer partition into squares reduces the space complexity, since instead of storing the full lists of chunk candidates of the lower levels, only the chunk candidates relevant to enumerating the two current squares are stored.
Because this in-layer algorithm enumerates at most two squares at any time in a layer, the tree-like structure is no longer a binary tree. A node ${\mathtt{N}}^{i,\dots ,f}$ is now extended to an 8-tuple of the form $({\mathtt{N}}_{0}^{i,\dots ,q},{\mathtt{N}}_{0}^{q+1,\dots ,f},{\mathtt{N}}_{1}^{i,\dots ,q},{\mathtt{N}}_{1}^{q+1,\dots ,f},{\mathtt{Q}}^{i,\dots ,f},{\mathtt{X}}^{i,\dots ,f},{\mathtt{Y}}^{i,\dots ,f},{L}^{i,\dots ,f})$, where ${\mathtt{N}}_{b}^{i,\dots ,q}$ and ${\mathtt{N}}_{b}^{q+1,\dots ,f}$ for $b=0,1$ are the children nodes used to enumerate at most two squares in a particular layer, ${\mathtt{Q}}^{i,\dots ,f}$ is a priority queue, ${\mathtt{X}}^{i,\dots ,f}$ and ${\mathtt{Y}}^{i,\dots ,f}$ are bit vectors, and ${L}^{i,\dots ,f}$ is a list of chunk candidates. Hence, the function that initialises the tree-like structure is adjusted to create the two additional children for a given node (see Algorithm 5).
Algorithm 5 creates and initialises each node of the tree-like structure.
1: function initialise($i,f$)
2:   if $f=i$ then
3:     ${\mathtt{L}}^{i}\leftarrow (\mathtt{null},\mathtt{null},\mathtt{null},\mathtt{null},\mathtt{null},\mathtt{null},\mathtt{null},{L}^{i})$;
4:     return ${\mathtt{L}}^{i}$;
5:   else
6:     $q\leftarrow \lfloor \frac{i+f}{2}\rfloor$;
7:     ${\mathtt{N}}_{0}^{i,\dots ,q}\leftarrow \mathtt{initialise}(i,q)$;
8:     ${\mathtt{N}}_{0}^{q+1,\dots ,f}\leftarrow \mathtt{initialise}(q+1,f)$;
9:     ${\mathtt{N}}_{1}^{i,\dots ,q}\leftarrow \mathtt{initialise}(i,q)$;
10:    ${\mathtt{N}}_{1}^{q+1,\dots ,f}\leftarrow \mathtt{initialise}(q+1,f)$;
11:    ${\mathtt{c}}_{0}^{i,\dots ,q}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{0}^{i,\dots ,q},0,2)$;
12:    ${\mathtt{c}}_{0}^{q+1,\dots ,f}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{1}^{q+1,\dots ,f},0,2)$;
13:    ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{0}^{i,\dots ,q},{\mathtt{c}}_{0}^{q+1,\dots ,f},0,0))$;
14:    ${\mathtt{X}}_{0}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{0}^{i,\dots ,f}\leftarrow 1$;
15:    ${\mathtt{N}}^{i,\dots ,f}\leftarrow ({\mathtt{N}}_{0}^{i,\dots ,q},{\mathtt{N}}_{0}^{q+1,\dots ,f},{\mathtt{N}}_{1}^{i,\dots ,q},{\mathtt{N}}_{1}^{q+1,\dots ,f},{\mathtt{Q}}^{i,\dots ,f},{\mathtt{X}}^{i,\dots ,f},{\mathtt{Y}}^{i,\dots ,f},{L}^{i,\dots ,f})$;
16:    return ${\mathtt{N}}^{i,\dots ,f}$;
17:  end if
18: end function

Algorithm 6 outputs the $j$th chunk candidate from the node ${\mathtt{N}}^{i,\dots ,f}$.
1: function getCandidate(${\mathtt{N}}^{i,\dots ,f},j,sw$)
2:   if ${\mathtt{N}}^{i,\dots ,f}$ is a leaf then
3:     return ${L}^{i,\dots ,f}.\mathtt{get}(j)$;
4:   end if
5:   if $sw=0$ then
6:     $\mathtt{restart}({\mathtt{N}}^{i,\dots ,f})$;
7:   else
8:     if $sw=1$ then
9:       ${L}^{i,\dots ,f}.\mathtt{clear}()$;
10:    end if
11:  end if
12:  $j\leftarrow j\bmod \omega$;
13:  if $j\ge {L}^{i,\dots ,f}.\mathtt{size}()$ then
14:    ${L}^{i,\dots ,f}.\mathtt{add}(\mathtt{nextCandidate}({\mathtt{N}}^{i,\dots ,f}))$;
15:  end if
16:  return ${L}^{i,\dots ,f}.\mathtt{get}(j)$;
17: end function

Algorithm 7 outputs the next chunk candidate from the node ${\mathtt{N}}^{i,\dots ,f}$.
1: function nextCandidate(${\mathtt{N}}^{i,\dots ,f}$)
2:   $({\mathtt{c}}_{{j}_{x}}^{x},{\mathtt{c}}_{{j}_{y}}^{y},{j}_{x},{j}_{y})\leftarrow {\mathtt{Q}}^{i,\dots ,f}.\mathtt{poll}()$; $(x=\{i,\dots ,q\},y=\{q+1,\dots ,f\})$
3:   ${\mathtt{X}}_{{j}_{x}}^{i,\dots ,f}\leftarrow 0$; ${\mathtt{Y}}_{{j}_{y}}^{i,\dots ,f}\leftarrow 0$;
4:   $I\leftarrow \lfloor \frac{{j}_{x}}{\omega}\rfloor$; $J\leftarrow \lfloor \frac{{j}_{y}}{\omega}\rfloor$; $b\leftarrow (I\ge J)\ ?\ 0:1$;
5:   if ${S}_{I,J}$ is completely enumerated then
6:     $las{t}_{I}\leftarrow {\mathtt{N}}_{0}^{i,\dots ,q}.\mathtt{size}()/\omega -1$;
7:     $las{t}_{J}\leftarrow {\mathtt{N}}_{1}^{q+1,\dots ,f}.\mathtt{size}()/\omega -1$;
8:     if $I=J$ or ($I>las{t}_{J}$ and $J=las{t}_{J}$) or ($J>las{t}_{I}$ and $I=las{t}_{I}$) then
9:       if $({j}_{x}+1)<(las{t}_{I}+1)\cdot \omega$ then
10:        ${\mathtt{c}}_{{j}_{x}+1}^{x}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{0}^{i,\dots ,q},{j}_{x}+1,1)$;
11:        ${\mathtt{c}}_{0}^{y}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{0}^{q+1,\dots ,f},0,0)$;
12:        ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{{j}_{x}+1}^{x},{\mathtt{c}}_{0}^{y},{j}_{x}+1,0))$;
13:        ${\mathtt{X}}_{{j}_{x}+1}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{0}^{i,\dots ,f}\leftarrow 1$;
14:      end if
15:      if $({j}_{y}+1)<(las{t}_{J}+1)\cdot \omega$ then
16:        ${\mathtt{c}}_{0}^{x}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{1}^{i,\dots ,q},0,0)$;
17:        ${\mathtt{c}}_{{j}_{y}+1}^{y}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{1}^{q+1,\dots ,f},{j}_{y}+1,1)$;
18:        ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{0}^{x},{\mathtt{c}}_{{j}_{y}+1}^{y},0,{j}_{y}+1))$;
19:        ${\mathtt{X}}_{0}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{{j}_{y}+1}^{i,\dots ,f}\leftarrow 1$;
20:      end if
21:    else
22:      if no candidates in the same row/column as $\mathtt{Successor}({S}_{I,J})$ then
23:        $({\mathtt{c}}_{k}^{x},{\mathtt{c}}_{l}^{y},k,l)\leftarrow \mathtt{getHighestScoreCandidate}(\mathtt{Successor}({S}_{I,J}))$;
24:        ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{k}^{x},{\mathtt{c}}_{l}^{y},k,l))$;
25:        ${\mathtt{X}}_{k}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{l}^{i,\dots ,f}\leftarrow 1$;
26:      end if
27:    end if
28:  else
29:    if $({j}_{x}+1,{j}_{y})\in {S}_{I,J}$ and ${\mathtt{X}}_{{j}_{x}+1}^{i,\dots ,f}=0$ then
30:      ${\mathtt{c}}_{{j}_{x}+1}^{x}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{b}^{i,\dots ,q},{j}_{x}+1,2)$;
31:      ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{{j}_{x}+1}^{x},{\mathtt{c}}_{{j}_{y}}^{y},{j}_{x}+1,{j}_{y}))$;
32:      ${\mathtt{X}}_{{j}_{x}+1}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{{j}_{y}}^{i,\dots ,f}\leftarrow 1$;
33:    end if
34:    if $({j}_{x},{j}_{y}+1)\in {S}_{I,J}$ and ${\mathtt{Y}}_{{j}_{y}+1}^{i,\dots ,f}=0$ then
35:      if $I=J$ then
36:        ${\mathtt{c}}_{{j}_{y}+1}^{y}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{1}^{q+1,\dots ,f},{j}_{y}+1,2)$;
37:      else
38:        ${\mathtt{c}}_{{j}_{y}+1}^{y}\leftarrow \mathtt{getCandidate}({\mathtt{N}}_{b}^{q+1,\dots ,f},{j}_{y}+1,2)$;
39:      end if
40:      ${\mathtt{Q}}^{i,\dots ,f}.\mathtt{add}(({\mathtt{c}}_{{j}_{x}}^{x},{\mathtt{c}}_{{j}_{y}+1}^{y},{j}_{x},{j}_{y}+1))$;
41:      ${\mathtt{X}}_{{j}_{x}}^{i,\dots ,f}\leftarrow 1$; ${\mathtt{Y}}_{{j}_{y}+1}^{i,\dots ,f}\leftarrow 1$;
42:    end if
43:  end if
44:  return $\mathtt{combine}({\mathtt{c}}_{{j}_{x}}^{x},{\mathtt{c}}_{{j}_{y}}^{y})$;
45: end function

Moreover, the function $\mathtt{getCandidate}({\mathtt{N}}^{i,\dots ,f},j,sw)$ is also adjusted so that each node's internal list ${L}^{i,\dots ,f}$ holds at most $\omega$ chunk candidates at any stage of the algorithm (see Algorithm 6). This function internally calls $\mathtt{restart}({\mathtt{N}}^{i,\dots ,f})$ if $sw=0$. The call to $\mathtt{restart}({\mathtt{N}}^{i,\dots ,f})$ causes ${\mathtt{N}}^{i,\dots ,f}$ to restart its enumeration, i.e., after $\mathtt{restart}({\mathtt{N}}^{i,\dots ,f})$ has been invoked, calling $\mathtt{nextCandidate}({\mathtt{N}}^{i,\dots ,f})$ will return the first chunk candidate from ${\mathtt{N}}^{i,\dots ,f}$ again. Also, the function $\mathtt{getHighestScoreCandidate}({S}_{I,J}^{\omega})$ returns the highest-scoring extended candidate from the square ${S}_{I,J}^{\omega}$; it is called to get the highest-scoring extended candidate from the successor of ${S}_{I,J}^{\omega}$. At this point, the internal list of ${\mathtt{N}}_{0}^{q+1,\dots ,f}$ is cleared if $b=0$; otherwise ($b=1$), the internal list of ${\mathtt{N}}_{1}^{i,\dots ,q}$ is cleared. Finally, Algorithm 7 precisely describes the manner in which this enumeration works.
4.4.1. Parallelization
The original authors of the research paper [13] suggest having OKEA run in parallel per square within a layer, but this has a negative effect on the algorithm's near-optimality property and even on its overall performance, since some squares within a layer depend strongly on others: for the algorithm to enumerate the successor square, say ${S}_{I,J+1}$, within a layer, it requires information obtained during the enumeration of ${S}_{I,J}$. Hence, this strategy may incur extra computation and is also difficult to implement.
4.4.2. Variant
As a variant of this algorithm, we propose to slightly change the definition of a layer. Here, a layer consists of all the squares within a secondary diagonal, as shown in Figure 4. The variant follows the same process as the original algorithm, i.e., enumeration layer by layer, starting at the first secondary diagonal. Within each layer, it first enumerates the two corner squares $S={S}_{k-1,0}\cup {S}_{0,k-1}$ by applying OKEA on $S$. Once one of the two squares is enumerated, say ${S}_{k-1,0}$, its successor ${S}_{k-2,1}$ is inserted into $S$ as long as such an insertion is possible. The algorithm continues the enumeration by applying OKEA on the updated $S$, and so on. This variant is motivated by the intuition that enumerating secondary diagonals may improve the quality of the order of the output key candidates, i.e., it may be closer to optimal. This variant, however, has a potential disadvantage in the multidimensional case, because it strongly depends on having all the previously enumerated chunk candidates of both dimensions $x$ and $y$ stored. To illustrate this, suppose that the square ${S}_{k-2,1}$ is to be inserted. Then, the algorithm needs to insert its highest-scoring extended candidate, $({\mathtt{c}}_{(k-2)\cdot \omega}^{x},{\mathtt{c}}_{\omega}^{y},(k-2)\cdot \omega ,\omega )$, into the queue. Hence, the algorithm needs to have both ${\mathtt{c}}_{(k-2)\cdot \omega}^{x}$ and ${\mathtt{c}}_{\omega}^{y}$ readily accessible when needed, which implies storing them when they are enumerated (in previous layers). By comparison, the original algorithm only requires the $\omega$ previously generated chunk candidates of each of the dimensions $x$ and $y$ to be stored, which is advantageous in terms of memory consumption.
4.5. A Simple Stack-Based, Depth-First Key Enumeration Algorithm
We next present a memory-efficient, non-optimal key enumeration algorithm that generates key candidates whose total scores lie within a given interval $[{B}_{1},{B}_{2}]$; it is based on the algorithm introduced by Martin et al. in the research paper [16]. We note that the original algorithm is fairly efficient while generating a new key candidate; however, its overall performance may be negatively affected by its use of memory, since it was originally designed to store each newly generated key candidate, each of which is tested only once the algorithm has completed the enumeration. Our variant, however, makes use of a stack (last-in-first-out queue) during the enumeration process. This helps in maintaining the state of the algorithm. Each newly generated key candidate may be tested immediately, and there is no need for candidates to be stored for future processing.
Our variant basically performs a depth-first search in an undirected graph G originating from the $\mathcal{N}$ lists of chunk candidates ${L}^{i}=[{c}_{0}^{i},{c}_{1}^{i},\dots ,{c}_{{m}_{i}-1}^{i}]$. This graph G has ${\sum}_{i=0}^{\mathcal{N}-1}{m}_{i}$ vertices, each of which represents a chunk candidate. Each vertex ${v}_{j}^{i}$ is connected to the vertices ${v}_{k}^{i+1}$, $0\le i<\mathcal{N}-1,0\le j<{m}_{i},0\le k<{m}_{i+1}$. At any vertex ${v}_{j}^{i}$, the algorithm checks whether ${c}_{j}^{i}.score$ plus an accumulated score is within the given interval $[{B}_{1},{B}_{2}]$. If so, it selects the chunk candidate ${c}_{j}^{i}$ for chunk i and travels forward to the vertex ${v}_{0}^{i+1}$; or else, it continues exploring and attempts to travel to the vertex ${v}_{j+1}^{i}$. When there is no suitable chunk candidate for the current chunk i, it travels backwards to a vertex from the previous chunk, ${v}_{k}^{i-1}$, $0\le k<{m}_{i-1}$.
As can be noted, this variant uses a simple backtracking strategy. In order to speed up the pruning process, we make use of two precomputed tables, $\mathtt{minArray}$ ($\mathtt{maxArray}$). The entry $\mathtt{minArray}\left[i\right]$ ($\mathtt{maxArray}\left[i\right]$) holds the global minimum (maximum) value that can be reached from chunk i to chunk $\mathcal{N}-1$. In other words,

$\mathtt{minArray}\left[i\right]={\sum}_{k=i}^{\mathcal{N}-1}{min}_{0\le j<{m}_{k}}\phantom{\rule{3.33333pt}{0ex}}{\mathtt{c}}_{j}^{k}.score$ and $\mathtt{maxArray}\left[i\right]={\sum}_{k=i}^{\mathcal{N}-1}{max}_{0\le j<{m}_{k}}\phantom{\rule{3.33333pt}{0ex}}{\mathtt{c}}_{j}^{k}.score$,

with $\mathtt{minArray}\left[\mathcal{N}\right]=\mathtt{maxArray}\left[\mathcal{N}\right]=0.$ Additionally, note that when each list of chunk candidates ${L}^{i}=[{\mathtt{c}}_{0}^{i},{\mathtt{c}}_{1}^{i},\dots ,{\mathtt{c}}_{{m}_{i}-1}^{i}],0\le i<\mathcal{N}$, is in decreasing order based on the score component of its chunk candidates, we can compute the entry $\mathtt{minArray}\left[i\right]$ ($\mathtt{maxArray}\left[i\right]$) by computing

$\mathtt{minArray}\left[i\right]={\mathtt{c}}_{{m}_{i}-1}^{i}.score+\mathtt{minArray}\left[i+1\right]$

and

$\mathtt{maxArray}\left[i\right]={\mathtt{c}}_{0}^{i}.score+\mathtt{maxArray}\left[i+1\right].$

Therefore, the basic variant is sped up by computing maxS (minS), which is the maximum (minimum) score that can be obtained from the current chunk candidate onwards, and then by checking that the intersection of the intervals $[\mathtt{minS},\mathtt{maxS}]$ and $[{B}_{1},{B}_{2}]$ is not empty.
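To make the table construction concrete, here is a minimal Python sketch (our own naming and data representation, not the paper’s code) that precomputes minArray and maxArray from decreasingly sorted candidate lists and performs the interval-intersection check just described:

```python
# Sketch: chunk candidates are (value, score) pairs; each list is sorted
# in decreasing score order, as assumed in Section 4.5.

def precompute_bounds(lists):
    """Return (min_array, max_array), each with a sentinel 0 at index N.

    min_array[i] (max_array[i]) is the minimum (maximum) total score
    reachable from chunk i through chunk N-1.
    """
    n = len(lists)
    min_array = [0] * (n + 1)
    max_array = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        # Decreasing order: last entry holds the minimum score, first the maximum.
        min_array[i] = lists[i][-1][1] + min_array[i + 1]
        max_array[i] = lists[i][0][1] + max_array[i + 1]
    return min_array, max_array

def may_reach(u_score, i, min_array, max_array, b1, b2):
    """True iff [minS, maxS], computed from chunk i+1 onwards, meets [B1, B2]."""
    max_s = u_score + max_array[i + 1]
    min_s = u_score + min_array[i + 1]
    return max_s >= b1 and min_s <= b2
```

For example, with two chunks whose score extremes are (5, 2) and (4, 1), min_array is [3, 1, 0] and max_array is [9, 4, 0].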
4.5.1. Setup
We now introduce a couple of tools that we will use to describe the algorithm, using the following notations. $\mathtt{S}$ will denote a stack. This data structure supports two basic methods [30]. Firstly, the method $\mathtt{S}.\mathtt{pop}()$ removes the element at the top of this stack and returns that element as the value of this function. Secondly, the method $\mathtt{S}.\mathtt{push}(e)$ pushes e onto the top of this stack. This stack S will store 4-tuples of the form $(score,i,j,indices)$, where $score$ is the accumulated score at any stage of the algorithm, i and j are the indices for the chunk candidate ${c}_{j}^{i}$, and $indices$ is an array of positive integers holding the indices of the selected chunk candidates, i.e., the chunk candidate ${c}_{indices\left[k\right]}^{k}$ is assigned to chunk k, for each k with $0\le k\le i$.
4.5.2. Complete Algorithm
Firstly, at the initialisation stage, the 4-tuple $(0,0,0,[])$ will be inserted into the stack $\mathtt{S}$. The main loop of this algorithm will call the function $\mathtt{nextCandidate}(\mathtt{S},{B}_{1},{B}_{2})$, defined in Algorithm 8, as long as the stack $\mathtt{S}$ is not empty. Specifically, the main loop will call this function to obtain a key candidate whose score is in the range $[{B}_{1},{B}_{2}]$. Algorithm 8 will then attempt to find such a candidate, and once it has found one, it will return the candidate to the main loop (at this point, $\mathtt{S}$ may not be empty). The main loop will get the key candidate, process or test it, and continue calling the function $\mathtt{nextCandidate}(\mathtt{S},{B}_{1},{B}_{2})$ as long as S is not empty. Because of the use of the stack S, the state of Algorithm 8 will not be lost; therefore, each time the main loop calls it, it will return a new key candidate whose score lies in the interval $[{B}_{1},{B}_{2}]$. The main loop will terminate once all possible key candidates whose scores are within the interval $[{B}_{1},{B}_{2}]$ have been generated, which will happen once the stack is empty.
Algorithm 8 outputs a key candidate in the interval $[{B}_{1},{B}_{2}]$.

1: function nextCandidate($\mathtt{S},{B}_{1},{B}_{2}$)
2:  while S is not empty do
3:   $(aScore,i,j,indices)\leftarrow \mathtt{S}.\mathtt{pop}()$;
4:   if $j<{L}^{i}.\mathtt{size}()-1$ then
5:    $\mathtt{S}.\mathtt{push}((aScore,i,j+1,indices))$;
6:   end if
7:   $uScore\leftarrow aScore+{\mathtt{c}}_{j}^{i}.score$;
8:   $\mathtt{maxS}\leftarrow uScore+\mathtt{maxArray}[i+1]$;
9:   $\mathtt{minS}\leftarrow uScore+\mathtt{minArray}[i+1]$;
10:   if $\mathtt{maxS}\ge {B}_{1}$ and $\mathtt{minS}\le {B}_{2}$ then
11:    if $uScore\le {B}_{2}$ then
12:     if $i=\mathcal{N}-1$ then
13:      if ${B}_{1}\le uScore$ then
14:       $indices\leftarrow indices\text{}\Vert \text{}\left[j\right]$;
15:       $\mathtt{c}\leftarrow \mathtt{combine}({\mathtt{c}}_{indices\left[0\right]}^{0},\dots ,{\mathtt{c}}_{indices[\mathcal{N}-1]}^{\mathcal{N}-1})$;
16:       $\mathbf{break}$;
17:      end if
18:     else
19:      $\mathtt{S}.\mathtt{push}((uScore,i+1,0,indices\text{}\Vert \text{}\left[j\right]))$;
20:     end if
21:    end if
22:   end if
23:  end while
24:  return $\mathtt{c}$;
25: end function

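The pseudocode can be turned into a short, runnable Python sketch. The function name and the representation of chunk candidates as (value, score) pairs are our own choices; one call returns one in-interval key candidate, and the stack passed between calls preserves the algorithm’s state. Note that, when descending to chunk i+1, the updated accumulated score uScore (which includes chunk i’s score) is what gets pushed:

```python
# Sketch of the stack-based, depth-first enumeration. min_array/max_array
# carry a sentinel 0 at index N, as in Section 4.5.

def next_candidate(stack, lists, min_array, max_array, b1, b2):
    """Pop/push (acc_score, i, j, indices) 4-tuples until a candidate is found.

    Returns a tuple of selected chunk values, or None when enumeration ends.
    """
    n = len(lists)
    while stack:
        acc, i, j, indices = stack.pop()
        if j < len(lists[i]) - 1:           # keep the sibling path alive
            stack.append((acc, i, j + 1, indices))
        u_score = acc + lists[i][j][1]
        max_s = u_score + max_array[i + 1]
        min_s = u_score + min_array[i + 1]
        if max_s >= b1 and min_s <= b2 and u_score <= b2:
            if i == n - 1:
                if b1 <= u_score:           # complete candidate inside [B1, B2]
                    return tuple(lists[k][idx][0]
                                 for k, idx in enumerate(indices + [j]))
            else:                           # descend to chunk i + 1
                stack.append((u_score, i + 1, 0, indices + [j]))
    return None
```

The main loop initialises the stack to [(0, 0, 0, [])] and keeps calling next_candidate until it returns None.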
4.5.3. Memory Consumption
We claim that, at any stage of the algorithm, there are at most $\mathcal{N}$ 4-tuples stored in the stack S. Indeed, after the stack is initialised, it only contains the 4-tuple $(0,0,0,[])$. Note that, during the execution of a while iteration, a 4-tuple is removed from the stack and two new 4-tuples might be inserted. Hence, after s while iterations have been completed, there will be ${N}_{\mathtt{S}}^{s}=1+(-1+{l}_{1})+(-1+{l}_{2})+(-1+{l}_{3})+(-1+{l}_{4})+\cdots +(-1+{l}_{s})$ 4-tuples, where $0\le {l}_{r}\le 2$, for $1\le r\le s$.
Suppose now that the algorithm is about to execute the $k$-th while iteration, during which the first valid key candidate will be returned. Therefore, ${N}_{\mathtt{S}}^{k-1}=1+(-1+{l}_{1})+(-1+{l}_{2})+(-1+{l}_{3})+(-1+{l}_{4})+\cdots +(-1+{l}_{k-1})\le \mathcal{N}.$ During the execution of the $k$-th while iteration, a 4-tuple will be removed and only one new 4-tuple will be considered for insertion in the stack. Therefore, we have that ${N}_{\mathtt{S}}^{k}={N}_{\mathtt{S}}^{k-1}-1+{l}_{k}\le \mathcal{N}-1+{l}_{k}\le \mathcal{N}$, since $0\le {l}_{k}\le 1$. Applying similar reasoning, we have ${N}_{\mathtt{S}}^{n}\le \mathcal{N}$ for $n>k$.
4.5.4. Parallelization
One of the most interesting features of the previous algorithm is that it is parallelizable. The original authors suggested, as a parallelization method, running instances of the algorithm over different disjoint intervals [16]. Although this method is effective and has a potential advantage, since the different instances will produce non-overlapping lists of key candidates, with the instance searching over the first interval producing the most-likely key candidates, it is not efficient, because each instance will inevitably repeat a lot of the work done by the other instances. Here, we propose another parallelization method that partitions the search space to avoid the repetition of work.
Suppose that we want to have t parallel, independent tasks ${T}_{1},{T}_{2},{T}_{3},\dots ,{T}_{t}$ search over a given interval. Let ${L}^{i}=[{c}_{0}^{i},{c}_{1}^{i},\dots ,{c}_{{m}_{i}-1}^{i}]$ be the list of chunk candidates for chunk i, $0\le i\le \mathcal{N}-1$.

We first assume that $t\le {m}_{0}$, where ${m}_{0}$ is the size of ${L}^{0}$. In order to construct these tasks, we partition ${L}^{0}$ into t disjoint, roughly equal-sized sublists ${L}_{j}^{0},1\le j\le t$. We set each task ${T}_{j}$ to perform its enumeration over the given interval but only consider the lists of chunk candidates ${L}_{j}^{0},{L}^{1},\dots ,{L}^{\mathcal{N}-1}$.

Note that the previous strategy can be easily generalised for ${m}_{0}<t\ll {\prod}_{k=0}^{\mathcal{N}-1}{m}_{k}$. Indeed, first, find the smallest integer l, with $0<l<\mathcal{N}-1$, such that ${\prod}_{k=0}^{l-1}{m}_{k}<t\le {\prod}_{k=0}^{l}{m}_{k}$. We then construct the list of chunk candidates ${L}^{0,\dots ,l}$ as follows. For each $(l+1)$-tuple $({\mathtt{c}}_{{j}_{0}}^{0},{\mathtt{c}}_{{j}_{1}}^{1},\dots ,{\mathtt{c}}_{{j}_{l}}^{l})$, with ${\mathtt{c}}_{{j}_{k}}^{k}\in {L}^{k},0\le {j}_{k}<{m}_{k},0\le k\le l$, the chunk candidate ${\mathtt{c}}^{{j}_{0},{j}_{1},\dots ,{j}_{l}}$ is constructed by calculating ${\mathtt{c}}^{{j}_{0},{j}_{1},\dots ,{j}_{l}}.score={\sum}_{k=0}^{l}{\mathtt{c}}_{{j}_{k}}^{k}.score$ and by setting ${\mathtt{c}}^{{j}_{0},{j}_{1},\dots ,{j}_{l}}.value=[{c}_{{j}_{0}}^{0}.value,\dots ,{c}_{{j}_{l}}^{l}.value]$, and then, ${\mathtt{c}}^{{j}_{0},{j}_{1},\dots ,{j}_{l}}$ is added to ${L}^{0,\dots ,l}$. We then partition ${L}^{0,\dots ,l}$ into t disjoint, roughly equal-sized sublists ${L}_{j}^{0,\dots ,l},1\le j\le t$, and finally set each task ${T}_{j}$ to perform its enumeration over the given interval but only consider the lists of chunk candidates ${L}_{j}^{0,\dots ,l},{L}^{l+1},\dots ,{L}^{\mathcal{N}-1}$. Note that the workload assigned to each enumerating task is a consequence of the selected method for partitioning the list ${L}^{0,\dots ,l}$.
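As an illustration, the generalised partitioning step can be sketched in Python as follows; build_tasks and the (value, score) representation are our own, hypothetical names for this sketch:

```python
# Sketch: merge the first l+1 chunk lists into one product list L^{0..l},
# then split it into t roughly equal-sized sublists, one per parallel task.
from itertools import product

def build_tasks(lists, t):
    """Return t task descriptions of the form (sublist, remaining_lists)."""
    # Find the smallest l with prod(m_0..m_{l-1}) < t <= prod(m_0..m_l).
    l, size = 0, len(lists[0])
    while size < t:
        l += 1
        size *= len(lists[l])
    # Build the merged list L^{0..l}: combined value list plus summed score.
    merged = [
        ([c[0] for c in combo], sum(c[1] for c in combo))
        for combo in product(*lists[: l + 1])
    ]
    # Partition it into t roughly equal-sized contiguous sublists.
    q, r = divmod(len(merged), t)
    tasks, start = [], 0
    for j in range(t):
        end = start + q + (1 if j < r else 0)
        tasks.append((merged[start:end], lists[l + 1:]))
        start = end
    return tasks
```

Each task then runs the enumeration over its own sublist together with the untouched lists for chunks l+1 through N-1.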
Additionally, both parallelization methods can be combined by partitioning the given interval $[{B}_{1},{B}_{2}]$ into ${n}_{s}$ disjoint subintervals and by searching each such subinterval with ${t}_{k}$ tasks, hence amounting to ${\sum}_{k=1}^{{n}_{s}}{t}_{k}$ enumerating tasks.
4.6. Threshold Algorithm
Algorithm 8 shares some similarities with the algorithm Threshold introduced in the research paper [14], since Threshold also makes use of an array (partialSum), similar to the array minArray, to speed up the pruning process. However, Threshold works with non-negative integer values (weights) rather than scores. Threshold converts scores into weights by making use of a function such that the smallest weight corresponds to the likeliest score [14].
Assuming the scores have already been converted to weights, Threshold first sorts each list of chunk candidates ${L}^{i}=[{\mathtt{c}}_{0}^{i},{\mathtt{c}}_{1}^{i},\dots ,{\mathtt{c}}_{{m}_{i}-1}^{i}],0\le i<\mathcal{N},$ in ascending order based on the score/weight component of its chunk candidates. It then computes the entries of partialSum by first setting $\mathtt{partialSum}[\mathcal{N}-1]=0$ and then by computing

$\mathtt{partialSum}\left[i\right]=\mathtt{partialSum}[i+1]+{\mathtt{c}}_{0}^{i+1}.score$, for $i=\mathcal{N}-2,\dots ,0$.
Threshold then enumerates all the key candidates whose accumulated total weight lies in a range of the form $[0,{W}_{t})$, where ${W}_{t}$ is a parameter. To do so, it performs a process similar to Algorithm 8, using its precomputed table (partialSum) to avoid useless paths, hence improving the pruning process. The enumeration process performed by Threshold is described in Algorithm 9.
According to its designers, this algorithm may perform a non-optimal enumeration to a depth of ${2}^{40}$ if some adjustments are made to the data structure L used to store the key candidates. However, its primary drawback is that it must always start enumerating from the most likely key. Consequently, whilst the simplicity and relatively strong time complexity of Threshold are desirable, in a parallelized environment, it can only serve as the first enumeration algorithm (or can only be used in the first search task). Threshold, therefore, was not implemented and, hence, is not included in the comparison made in Section 5.
Algorithm 9 enumerates all key candidates in the interval $[0,{W}_{t}]$.

1: function threshold($i,w,\mathtt{K},{W}_{t},L$)
2:  for $j=0$ to ${m}_{i}-1$ do
3:   $newW\leftarrow w+{\mathtt{c}}_{j}^{i}.score$;
4:   if $(newW+\mathtt{partialSum}\left[i\right])>{W}_{t}$ then
5:    $\mathbf{break}$;
6:   else
7:    if $i=\mathcal{N}-1$ then
8:     ${\mathtt{K}}_{i}\leftarrow j$;
9:     $\mathtt{c}\leftarrow \mathtt{combine}({\mathtt{c}}_{\mathtt{K}\left[0\right]}^{0},{\mathtt{c}}_{\mathtt{K}\left[1\right]}^{1},\dots ,{\mathtt{c}}_{\mathtt{K}[\mathcal{N}-1]}^{\mathcal{N}-1})$;
10:     $L\leftarrow L.\mathtt{add}(\mathtt{c})$;
11:    else
12:     ${\mathtt{K}}_{i}\leftarrow j$;
13:     $L\leftarrow \mathtt{threshold}(i+1,newW,\mathtt{K},{W}_{t},L)$;
14:    end if
15:   end if
16:  end for
17:  return L;
18: end function

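A runnable Python rendering of the Threshold recursion may look as follows, assuming integer weights and each chunk list sorted in ascending weight order; the names are ours, not those of [14]:

```python
# Sketch: chunk candidates are (value, weight) pairs, ascending by weight.
# partial_sum[i] holds the minimal additional weight needed to complete a
# candidate from chunk i+1 through chunk N-1 (partial_sum[N-1] = 0).

def threshold(i, w, k, w_t, out, lists, partial_sum):
    """Append to `out` every key candidate whose total weight is <= w_t."""
    for j, (value, weight) in enumerate(lists[i]):
        new_w = w + weight
        # Lists are sorted ascending, so once the cheapest completion
        # overshoots the bound, every later j overshoots as well.
        if new_w + partial_sum[i] > w_t:
            break
        k[i] = j
        if i == len(lists) - 1:
            out.append(tuple(lists[c][k[c]][0] for c in range(len(lists))))
        else:
            threshold(i + 1, new_w, k, w_t, out, lists, partial_sum)
    return out
```

A top-level call threshold(0, 0, [0] * N, W_t, [], lists, partial_sum) collects the full enumeration, mirroring how the recursive pseudocode threads L through the calls.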
4.7. A WeightBased Key Enumeration Algorithm
In this subsection, we will describe a non-optimal enumeration algorithm based on the algorithm introduced in the research paper [12]. This algorithm differs from the original algorithm in the manner in which it builds a precomputed table (iRange) and uses it during execution to construct key candidates whose total accumulated score is equal to a certain accumulated score. This algorithm shares similarities with the stack-based, depth-first key enumeration algorithm described in Section 4.5, because both algorithms essentially perform a depth-first search in the undirected graph G. However, this algorithm controls pruning by the accumulated total score that a key candidate must reach to be accepted. To achieve this, the scores are restricted to positive integer values (weights), which may be derived from a correlation value in a side-channel analysis attack.

This algorithm starts off by generating all key candidates with the largest possible accumulated total weight ${W}_{1}$ and then proceeds to generate all key candidates whose accumulated total weight is equal to the second largest possible accumulated total weight ${W}_{2}$, and so forth, until it generates all key candidates with the minimum possible accumulated total weight ${W}_{N}$. To find a key candidate with a weight equal to a certain accumulated weight, this algorithm makes use of a simple backtracking strategy, which is efficient because impossible paths can be pruned early. The pruning is controlled by the accumulated weight that must be reached for the solution to be accepted. To achieve a fast decision process during backtracking, this algorithm precomputes tables for the minimal and maximal accumulated total weights that can be reached by completing a path to the right, like the tables minArray and maxArray introduced in Section 4.5. Additionally, this algorithm precomputes an additional table, iRange.
Given $0\le i\le \mathcal{N}$ and $\mathtt{minArray}\left[i\right]\le w\le \mathtt{maxArray}\left[i\right]$, the entry $\mathtt{iRange}\left[i\right]\left[w\right]$ points to a list of integers ${L}^{(i,w)}=[{k}_{0}^{(i,w)},{k}_{1}^{(i,w)},\dots ,{k}_{n}^{(i,w)}]$, where each entry represents a distinct index of the list ${L}^{i}$, i.e., $0\le {k}_{j}^{(i,w)}\ne {k}_{l}^{(i,w)}<{m}_{i}$ for $j\ne l$. The algorithm uses these indices to construct a chunk candidate with an accumulated score w from chunk i to chunk $\mathcal{N}-1$.

In order to compute this table, we use the observation that, for a given entry ${k}_{j}^{(i,w)}$ of $\mathtt{iRange}\left[i\right]\left[w\right]$, the list $\mathtt{iRange}[i+1]\left[cw\right]$, with $cw=w-{\mathtt{c}}_{{k}_{j}^{(i,w)}}^{i}.score$, must be defined and be nonempty. So we first set the entry $\mathtt{iRange}\left[\mathcal{N}\right]\left[0\right]$ to $\left[0\right]$ and then proceed to compute the entries for $i=\mathcal{N}-1,\dots ,0$ and $w=\mathtt{minArray}\left[i\right],\dots ,\mathtt{maxArray}\left[i\right]$. Algorithm 10 describes precisely how this table is precomputed.
Algorithm 10 precomputes the table iRange.

1: function PrecomputeIRange( )
2:  $\mathtt{iRange}\left[\mathcal{N}\right]\left[0\right]\leftarrow \left[0\right]$;
3:  for $i=\mathcal{N}-1$ to 0 do
4:   for $w=\mathtt{minArray}\left[i\right]$ to $\mathtt{maxArray}\left[i\right]$ do
5:    ${L}^{(i,w)}\leftarrow \left[\right]$;
6:    for $k=0$ to ${m}_{i}-1$ do
7:     $cw\leftarrow w-{\mathtt{c}}_{k}^{i}.score$;
8:     if $\mathtt{iRange}[i+1]\left[cw\right].\mathtt{size}()>0$ then
9:      ${L}^{(i,w)}.\mathtt{add}(k)$;
10:     end if
11:    end for
12:    if ${L}^{(i,w)}.\mathtt{size}()>0$ then
13:     $\mathtt{iRange}\left[i\right]\left[w\right]\leftarrow {L}^{(i,w)}$;
14:    end if
15:   end for
16:  end for
17:  return $\mathtt{iRange}$;
18: end function

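The same precomputation can be sketched in Python with dictionaries standing in for the table rows, so only nonempty entries are ever stored. Instead of scanning every w in [minArray[i], maxArray[i]], this sketch (our formulation) iterates over the weights already reachable in iRange[i+1], which yields the same nonempty index lists:

```python
# Sketch: lists[i] holds (value, weight) pairs for chunk i. irange[i] maps
# a reachable accumulated weight w to the indices k of lists[i] from which
# a completion of weight exactly w exists (irange[N] = {0: [0]}).

def precompute_irange(lists):
    n = len(lists)
    irange = [dict() for _ in range(n)] + [{0: [0]}]
    for i in range(n - 1, -1, -1):
        for k, (_value, weight) in enumerate(lists[i]):
            for cw in irange[i + 1]:
                # weight + cw is reachable from chunk i via candidate k.
                irange[i].setdefault(weight + cw, []).append(k)
    return irange
```

Because the outer loop runs over k in increasing order, each list irange[i][w] keeps its indices sorted, matching the order produced by Algorithm 10.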
4.7.1. Complete Algorithm
Algorithm 11 describes the backtracking strategy more precisely, making use of the precomputed tables for pruning impossible paths. The integer array TWeights contains accumulated weights in a selected order, where an entry $w\in \mathtt{TWeights}$ must satisfy that the list $\mathtt{iRange}\left[0\right]\left[w\right]$ is nonempty, i.e., $\mathtt{iRange}\left[0\right]\left[w\right].\mathtt{size}()>0$. This guarantees that a key candidate with an accumulated score w can be constructed from chunk 0 to chunk $\mathcal{N}-1$. In particular, TWeights may be set to $[{W}_{1},{W}_{2},\dots ,{W}_{N}]$, i.e., the array containing all possible accumulated scores that can be reached from chunk 0 to chunk $\mathcal{N}-1$.

Furthermore, the order in which the elements in the array TWeights are arranged is important. For the array $[{W}_{1},{W}_{2},\dots ,{W}_{N}]$, for example, the algorithm will first enumerate all key candidates with accumulated weight ${W}_{1}$, then all those with accumulated weight ${W}_{2}$, and so on. This guarantees a certain quality, since good key candidates will be enumerated earlier than worse ones. However, key candidates with the same accumulated weight will be generated in no particular order, so a lack of precision in converting scores to weights will lead to some decrease in quality.
Algorithm 11 enumerates key candidates for given weights.

1: function KeyEnumeration(TWeights, iRange)
2:  for $w\in \mathtt{TWeights}$ do
3:   $i\leftarrow 0$;
4:   $\mathtt{k}\left[0\right]\leftarrow (0,\mathtt{iRange}[0\left]\right[w].\mathtt{get}(0))$;  ▷ 2-tuple $({e}_{1},{e}_{2})$
5:   $cw\leftarrow w$;
6:   while $i\ge 0$ do
7:    while $i<\mathcal{N}-1$ do
8:     $cw\leftarrow cw-{\mathtt{c}}_{\mathtt{k}\left[i\right].{e}_{2}}^{i}.score$;
9:     $i\leftarrow i+1$;
10:     $\mathtt{k}\left[i\right]\leftarrow (0,\mathtt{iRange}[i\left]\right[cw].\mathtt{get}(0))$;
11:    end while
12:    $\mathtt{c}\leftarrow \mathtt{combine}({\mathtt{c}}_{\mathtt{k}\left[0\right].{e}_{2}}^{0},{\mathtt{c}}_{\mathtt{k}\left[1\right].{e}_{2}}^{1},\dots ,{\mathtt{c}}_{\mathtt{k}[\mathcal{N}-1].{e}_{2}}^{\mathcal{N}-1})$;
13:    $\mathtt{Test}(\mathtt{c})$;
14:    while $i\ge 0$ and $\mathtt{k}\left[i\right].{e}_{1}\ge (\mathtt{iRange}\left[i\right]\left[cw\right].\mathtt{size}()-1)$ do
15:     $i\leftarrow i-1$;
16:     if $i\ge 0$ then
17:      $cw\leftarrow cw+{\mathtt{c}}_{\mathtt{k}\left[i\right].{e}_{2}}^{i}.score$;
18:     end if
19:    end while
20:    if $i\ge 0$ then
21:     $\mathtt{k}\left[i\right]\leftarrow (\mathtt{k}\left[i\right].{e}_{1}+1,\mathtt{iRange}\left[i\right]\left[cw\right].\mathtt{get}(\mathtt{k}\left[i\right].{e}_{1}+1))$;
22:    end if
23:   end while
24:  end for
25: end function

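For clarity, the enumeration step for a single weight w can also be expressed recursively; the following Python sketch is our own formulation, equivalent in output to the iterative pseudocode for one entry of TWeights:

```python
# Sketch: given the precomputed irange table (a list of dicts mapping a
# reachable weight to candidate indices, with irange[N] = {0: [0]}), yield
# every key candidate whose accumulated weight is exactly w.

def enumerate_weight(lists, irange, w, i=0, prefix=()):
    if i == len(lists):
        yield prefix                       # a complete candidate of weight w
        return
    for k in irange[i].get(w, []):
        value, weight = lists[i][k]
        # Candidate k consumes `weight`, leaving w - weight for chunks i+1..N-1.
        yield from enumerate_weight(lists, irange, w - weight,
                                    i + 1, prefix + (value,))
```

Iterating this generator over the weights in TWeights, in order, reproduces the output sequence of Algorithm 11.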
Algorithm 11 makes use of the table $\mathtt{k}$ with $\mathcal{N}$ entries, each of which is a 2-tuple of the form $({e}_{1},{e}_{2})$, with ${e}_{1}$ and ${e}_{2}$ integers. For a given tuple $\mathtt{k}\left[i\right]$, the component $\mathtt{k}\left[i\right].{e}_{1}$ is an index of some list $\mathtt{iRange}\left[i\right]\left[w\right]$, with $\mathtt{minArray}\left[i\right]\le w\le \mathtt{maxArray}\left[i\right]$, while $\mathtt{k}\left[i\right].{e}_{2}$ is the corresponding value, i.e., $\mathtt{k}\left[i\right].{e}_{2}=\mathtt{iRange}\left[i\right]\left[w\right].\mathtt{get}(\mathtt{k}\left[i\right].{e}_{1})$. The value of $\mathtt{k}\left[i\right].{e}_{1}$ allows the algorithm to track whether the list $\mathtt{iRange}\left[i\right]\left[w\right]$ has been traversed completely, while the second component allows the algorithm to retrieve the chunk candidate of index $\mathtt{k}\left[i\right].{e}_{2}$ of ${L}^{i}$. This is done to avoid recalculating $\mathtt{k}\left[i\right].{e}_{2}$ each time it is required during the execution of the algorithm.
We will now analyse Algorithm 11. Suppose that $w\in \mathtt{TWeights}$; hence, $\mathtt{iRange}\left[0\right]\left[w\right].\mathtt{size}()>0$. The algorithm will then set $\mathtt{k}\left[0\right]$ to $(0,{e}_{2}^{(0)})$, with ${e}_{2}^{(0)}$ being the integer from the entry of index 0 of $\mathtt{iRange}\left[0\right]\left[w\right]$, and then set $cw$ to w (lines 3 to 5). We claim that the main while loop (lines 6 to 23) at each iteration will compute $\mathtt{k}\left[i\right]$ for $0\le i\le \mathcal{N}1$ such that the key candidate $\mathtt{c}$ constructed at line 12 will have an accumulated score w.
Let us set $c{w}_{0}=w$. We know that the list $\mathtt{iRange}\left[0\right]\left[c{w}_{0}\right]$ is nonempty; hence, for any entry ${e}_{2}^{(0)}$ in the list $\mathtt{iRange}\left[0\right]\left[c{w}_{0}\right]$, the list $\mathtt{iRange}\left[1\right]\left[c{w}_{1}\right]$ is nonempty, where

$c{w}_{1}=c{w}_{0}-{\mathtt{c}}_{{e}_{2}^{(0)}}^{0}.score.$

Likewise, for any entry ${e}_{2}^{(1)}$ in the list $\mathtt{iRange}\left[1\right]\left[c{w}_{1}\right]$, the list $\mathtt{iRange}\left[2\right]\left[c{w}_{2}\right]$ is nonempty, where

$c{w}_{2}=c{w}_{1}-{\mathtt{c}}_{{e}_{2}^{(1)}}^{1}.score.$

Hence, for $0\le i<\mathcal{N}$, we have that, for any given entry ${e}_{2}^{(i)}$ in the list $\mathtt{iRange}\left[i\right]\left[c{w}_{i}\right]$, the list $\mathtt{iRange}[i+1]\left[c{w}_{i+1}\right]$ is nonempty, where

$c{w}_{i+1}=c{w}_{i}-{\mathtt{c}}_{{e}_{2}^{(i)}}^{i}.score.$

Note that, when $i=\mathcal{N}-1$, the list $\mathtt{iRange}[i+1]\left[0\right]=\left[0\right]$ is nonempty and $c{w}_{i+1}=0$.
Given that $\mathtt{k}\left[0\right],\mathtt{k}\left[1\right],\dots ,\mathtt{k}\left[j\right]$ are already set for some $0\le j<\mathcal{N}-1$, the first inner while loop (lines 7 to 11) will set $\mathtt{k}\left[i\right]=(0,{e}_{2}^{(i)})$, where ${e}_{2}^{(i)}$ holds the entry of index 0 of $\mathtt{iRange}\left[i\right]\left[c{w}_{i}\right]$, for $j<i\le \mathcal{N}-1$. Therefore, once the while loop ends, $i=\mathcal{N}-1$ and $c{w}_{i+1}=c{w}_{\mathcal{N}}=c{w}_{i}-{\mathtt{c}}_{{e}_{2}^{(i)}}^{i}.score=0$. Hence, the key candidate constructed from the second components $\mathtt{k}\left[l\right].{e}_{2},0\le l\le \mathcal{N}-1,$ will have an accumulated score w. In particular, the first time around, only $\mathtt{k}\left[0\right]$ is set, and so, the first inner while loop will calculate $\mathtt{k}\left[1\right],\dots ,\mathtt{k}[\mathcal{N}-1]$.
Since there may be more than one key candidate with an accumulated score w, the second inner while loop (lines 14 to 19) will backtrack to a chunk $0\le i<\mathcal{N}$ from which a new key candidate with accumulated score w can be constructed. This is done by simply moving backwards (line 15) and updating $c{w}_{i+1}$ to $c{w}_{i}=c{w}_{i+1}+{\mathtt{c}}_{\mathtt{k}\left[i\right].{e}_{2}}^{i}.score$ until there is an i, $0\le i<\mathcal{N}$, such that $\mathtt{k}\left[i\right].{e}_{1}<\mathtt{iRange}\left[i\right]\left[c{w}_{i}\right].\mathtt{size}()-1$.
If there is such an i, then the instruction at line 21 will update $\mathtt{k}\left[i\right]$ to $(\mathtt{k}\left[i\right].{e}_{1}+1,\mathtt{iRange}\left[i\right]\left[c{w}_{i}\right].\mathtt{get}(\mathtt{k}\left[i\right].{e}_{1}+1))$. This means that the updated value for the second component of $\mathtt{k}\left[i\right]$ will be a valid index in ${L}^{i}$, so ${\mathtt{c}}_{\mathtt{k}\left[i\right].{e}_{2}}^{i}$ will be the new chunk candidate for chunk i. Then, the first inner while loop (lines 7 to 11) will again execute and compute the indices for the remaining chunk candidates in the lists ${L}^{i+1},\dots ,{L}^{\mathcal{N}-1}$ such that the resulting key candidate will have the accumulated score w.
Otherwise, if $i<0$, then the main while loop (lines 6 to 23) will end and w will be set to a new value from TWeights, since all key candidates with an accumulated score w have just been enumerated.
4.7.2. Parallelization
Suppose we would like to have t tasks ${T}_{1},{T}_{2},{T}_{3},\cdots ,{T}_{t}$ executed in parallel to enumerate key candidates of which the accumulated total weights are equal to those in the array $\mathtt{TWeights}$. We can split the array $\mathtt{TWeights}$ into t disjoint subarrays ${\mathtt{TWeights}}_{i}$ and then set each task ${T}_{i}$ to run Algorithm 11 through the subarray ${\mathtt{TWeights}}_{i}$. As an example of a partition algorithm to distribute the workload among the tasks, we set the subarray ${\mathtt{TWeights}}_{i}$ to contain elements with indices congruent to i mod t from $\mathtt{TWeights}$. Additionally, note that, if we have access to the number of candidates to be enumerated for each score in the array $\mathtt{TWeights}$ beforehand, we may design a partition algorithm for distributing the workload among the tasks almost evenly.
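The round-robin split just described is a one-liner in Python; split_weights is a hypothetical name for this sketch:

```python
# Sketch: task i receives the entries of TWeights whose index is
# congruent to i modulo t, so consecutive (similarly sized) weight
# classes are spread across different tasks.

def split_weights(tweights, t):
    return [tweights[i::t] for i in range(t)]
```

Each task then runs Algorithm 11 over its own subarray, and the union of the outputs equals the sequential enumeration.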
4.7.3. Run Times
We assume each list of chunk candidates ${L}^{i}=[{\mathtt{c}}_{0}^{i},{\mathtt{c}}_{1}^{i},\dots ,{\mathtt{c}}_{{m}_{i}-1}^{i}],0\le i<\mathcal{N}$, is in decreasing order based on the score component of its chunk candidates. Regarding the run time for computing the tables maxArray and minArray, note that each entry of the table $\mathtt{minArray}$ ($\mathtt{maxArray}$) can be computed as explained in Section 4.5. Therefore, the run time of such an algorithm is $\mathrm{\Theta}(\mathcal{N})$.
Regarding the run time for computing iRange, we will analyse Algorithm 10. This algorithm is composed of three For blocks. For each i, $0\le i<\mathcal{N}$, the For loop from line 4 to line 15 will be executed ${r}_{i}$ times, where ${r}_{i}=\mathtt{maxArray}\left[i\right]-\mathtt{minArray}\left[i\right]+1$. For each iteration, the innermost For block (lines 6 to 11) will execute simple instructions ${m}_{i}$ times. Therefore, once the innermost block has finished, its run time will be ${T}_{3}\cdot {m}_{i}+{C}_{3}$, where ${T}_{3}$ and ${C}_{3}$ are constants. Then, the if block (lines 12 to 14) will be attempted and its run time will be ${C}_{2}$, where ${C}_{2}$ is another constant. Therefore, the run time for an iteration of the For loop (lines 4 to 15) will be ${T}_{3}\cdot {m}_{i}+{C}_{2}+{C}_{3}$. Therefore, the run time of Algorithm 10 is ${\sum}_{i=0}^{\mathcal{N}-1}{r}_{i}({T}_{3}\cdot {m}_{i}+{C}_{2}+{C}_{3})$. More specifically, it is

${T}_{3}\cdot {\sum}_{i=0}^{\mathcal{N}-1}{r}_{i}\cdot {m}_{i}+({C}_{2}+{C}_{3})\cdot {\sum}_{i=0}^{\mathcal{N}-1}{r}_{i}.$
As noted, this run time depends heavily on ${r}_{i}=\mathtt{maxArray}\left[i\right]-\mathtt{minArray}\left[i\right]+1$. Now, the size of the range $\left[\mathtt{minArray}\right[i],\mathtt{maxArray}[i\left]\right]$ relies on the scaling technique used to get a positive integer from a real number. The more accurate the scaling technique is, the more distinct integer scores there will be. Hence, if we use an accurate scaling technique, we will probably get larger ${r}_{i}$.
We will now analyse the run time for Algorithm 11 to generate all key candidates whose total accumulated weight is w. Let us assume there are ${N}_{w}$ key candidates whose total accumulated score is equal to w.
First, the run time for the instructions at lines 3 to 5 is constant. Therefore, we will only focus on the while loop (lines 6 to 23). In any iteration, the first inner while loop (lines 7 to 11) will execute and compute the indices for the remaining chunk candidates in the lists ${L}^{i},\dots ,{L}^{\mathcal{N}-1}$, with i starting at any number in $[0,\mathcal{N}-2]$, such that the resulting key candidate will have the accumulated score w. Therefore, its run time is at most $C\cdot (\mathcal{N}-1)$, where C is a constant, i.e., it is $\mathcal{O}(\mathcal{N})$. The instruction at line 12 will combine all chunks from 0 to $\mathcal{N}-1$, and hence, its run time is also $\mathcal{O}(\mathcal{N})$. The next instruction, $\mathtt{Test}(\mathtt{c})$, will test c, and its run time will depend on the scenario in which the algorithm is being run. Let us assume its run time is $\mathcal{O}(T(\mathcal{N}))$, where T is a function.
Regarding the second inner while loop (lines 14 to 19), this loop will backtrack to a chunk i, with $0\le i<\mathcal{N}$, from which a new key candidate with accumulated score w can be constructed. This is done by simply moving backwards while computing some simple operations. Therefore, the run time for the second inner while loop is at most $D\cdot (\mathcal{N}-1)$, where D is a constant, i.e., it is $\mathcal{O}(\mathcal{N})$. Therefore, the run time for generating all key candidates whose total accumulated score is w will be $\mathcal{O}({N}_{w}\cdot (\mathcal{N}+T(\mathcal{N})))$.
4.7.4. Memory Consumption
Besides the precomputed tables, it is easy to see that Algorithm 11 makes use of negligible memory while enumerating key candidates. Indeed, testing key candidates is done on the fly to avoid storing them during enumeration. However, the table iRange may have many entries.
Let ${N}_{e}$ be the number of entries of the table iRange. Line 2 of Algorithm 10 will create the entry $\mathtt{iRange}\left[\mathcal{N}\right]\left[0\right]$ that points to the list $\left[0\right]$. Hence, after the instruction at line 2 has been executed, ${N}_{e}=1$. Let us consider the For block from line 4 to line 15. For each i, $0\le i<\mathcal{N}$, let ${W}_{i}$ be the set of different values w in the range $\left[\mathtt{minArray}\right[i],\mathtt{maxArray}[i\left]\right]$ such that ${L}^{(i,w)}$ is nonempty. After the iteration for i has been executed, the table iRange will have $|{W}_{i}|$ new entries, each of which will point to a nonempty list, with $0<|{W}_{i}|\le {r}_{i}.$ Therefore, ${N}_{e}=1+{\sum}_{i=0}^{\mathcal{N}-1}|{W}_{i}|$ after Algorithm 10 has completed its execution.
Note that $|{W}_{i}|$ may increase if the range $\left[\mathtt{minArray}\right[i],\mathtt{maxArray}[i\left]\right]$ is large. The size of this interval relies on the scaling technique used to get a positive integer from a real number. The more accurate the scaling technique is, the more distinct integer scores there will be. Hence, if we use an accurate scaling technique, we will probably get larger ${r}_{i}$, making it likely for $|{W}_{i}|$ to increase. Therefore, the table iRange may have many entries.
Regarding the number of bits used in memory to store the table iRange, let us suppose that an integer is stored in ${B}_{int}$ bits and that a pointer is stored in ${B}_{p}$ bits. Once Algorithm 10 has completed its execution, we know that $\mathtt{iRange}\left[i\right]\left[w\right]$ will point to the list ${L}^{(i,w)}$, with $0\le i\le \mathcal{N}$ and $w\in {W}_{i}.$ Moreover, by definition, we know that the list ${L}^{(\mathcal{N},0)}$ will be the list $\left[0\right]$, while any other list ${L}^{(i,w)}$, $0\le i<\mathcal{N}$ and $w\in {W}_{i}$, will have ${n}^{(i,w)}$ entries, with $1\le {n}^{(i,w)}\le {m}_{i}$. Therefore, the number of bits iRange occupies in memory after Algorithm 10 has completed its execution is

${B}_{p}\cdot {N}_{e}+{B}_{int}\cdot (1+{\sum}_{i=0}^{\mathcal{N}-1}{\sum}_{w\in {W}_{i}}{n}^{(i,w)}).$

Since $1\le {n}^{(i,w)}\le {m}_{i}$, we have

${B}_{p}\cdot {N}_{e}+{B}_{int}\cdot (1+{\sum}_{i=0}^{\mathcal{N}-1}{\sum}_{w\in {W}_{i}}{n}^{(i,w)})\le {B}_{p}\cdot {N}_{e}+{B}_{int}\cdot (1+{\sum}_{i=0}^{\mathcal{N}-1}{m}_{i}\cdot |{W}_{i}|).$
4.8. A Key Enumeration Algorithm using Histograms
In this subsection, we will describe a nonoptimal key enumeration algorithm introduced in the research paper [17].
4.8.1. Setup
We now introduce a couple of tools that we will use to describe the subalgorithms used in the algorithm of the research paper [17], using the following notations:
H will denote a histogram,
${N}_{b}$ will denote a number of bins,
b will denote a bin, and
x will denote a bin index.
Linear Histograms
The function ${H}_{i}=\mathtt{createHist}({L}^{i},{N}_{b})$ creates a standard histogram from the list of chunk candidates ${L}^{i}$ with ${N}_{b}$ linearly spaced bins.
Given a list of chunk candidates ${L}^{i}$, the function createHist will first calculate both the minimum score $min$ and maximum score $max$ among all the chunk candidates in ${L}^{i}$. It will then partition the interval $I=[min,max]$ into subintervals ${I}_{0}=[min,min+\sigma ),{I}_{1}=[min+\sigma ,min+2\sigma ),\dots ,{I}_{{N}_{b}-1}=[min+({N}_{b}-1)\sigma ,max]$, where $\sigma =\frac{max-min}{{N}_{b}}.$ It will then proceed to build the list ${L}_{{H}_{i}}$ of size ${N}_{b}$. The entry x, $0\le x<{N}_{b}$, of ${L}_{{H}_{i}}$ will point to a list that contains all chunk candidates from ${L}^{i}$ whose scores lie in ${I}_{x}$. The returned standard histogram ${H}_{i}$ is therefore stored as the list ${L}_{{H}_{i}}$, whose entries point to lists of chunk candidates. For a given bin index x, ${L}_{{H}_{i}}.\mathtt{get}(x)$ outputs the list of chunk candidates contained in the bin of index x of ${H}_{i}.$ Therefore, ${H}_{i}[x]={L}_{{H}_{i}}.\mathtt{get}(x).\mathtt{size}()$ is the number of chunk candidates in the bin of index x of ${H}_{i}$. The run time of $\mathtt{createHist}({L}^{i},{N}_{b})$ is $\mathrm{\Theta}({m}_{i}+{N}_{b})$.
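The binning step above can be sketched as follows. This is a minimal illustration, assuming chunk candidates are stored as (score, value) pairs; the names are ours, not the implementation of [17]. The returned list plays the role of ${L}_{{H}_{i}}$: entry x holds the chunk candidates whose scores lie in ${I}_{x}$, so ${H}_{i}[x]$ is simply the length of that entry.

```python
def create_hist(L, n_bins):
    """Build a standard histogram with n_bins linearly spaced bins.

    L is a list of (score, value) chunk candidates. Returns a list of
    bins; bin x holds the candidates whose scores fall in interval I_x.
    """
    scores = [score for score, _ in L]
    lo, hi = min(scores), max(scores)
    sigma = (hi - lo) / n_bins  # bin width
    bins = [[] for _ in range(n_bins)]
    for score, value in L:
        # Map the score to its bin index; the maximum score is assigned
        # to the last bin, since I_{N_b - 1} is closed on the right.
        x = min(int((score - lo) / sigma), n_bins - 1) if sigma > 0 else 0
        bins[x].append((score, value))
    return bins
```

Building the histogram is a single pass over the ${m}_{i}$ candidates plus the allocation of ${N}_{b}$ bins, matching the stated $\mathrm{\Theta}({m}_{i}+{N}_{b})$ run time.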
Convolution
This is the usual convolution algorithm, which computes ${H}_{1:2}=\mathtt{conv}({H}_{1},{H}_{2})$ from two histograms ${H}_{1}$ and ${H}_{2}$ of sizes ${n}_{1}$ and ${n}_{2}$, respectively, where ${H}_{1:2}[k]={\sum}_{i=0}^{k}{H}_{1}[i]\cdot {H}_{2}[k-i].$ The computation of ${H}_{1:2}$ is done efficiently by using the Fast Fourier Transform (FFT) for polynomial multiplication. Indeed, the array $[{H}_{j}[0],{H}_{j}[1],\dots ,{H}_{j}[{n}_{j}-1]]$ is seen as the coefficient representation of ${P}_{j}={H}_{j}[0]+{H}_{j}[1]x+\dots +{H}_{j}[{n}_{j}-1]{x}^{{n}_{j}-1}$ for $j=1,2.$ In order to get ${H}_{1:2}$, we multiply the two polynomials of degree-bound $n=max({n}_{1},{n}_{2})$ in time $\mathrm{\Theta}(n\log n)$, with both the input and output representations in coefficient form [30]. The convoluted histogram ${H}_{1:2}$ is therefore stored as a list of integers.
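For reference, the defining sum can be evaluated directly. The sketch below is the naive $\mathcal{O}({n}_{1}\cdot {n}_{2})$ product, equivalent in output to the FFT-based multiplication used in the paper but simpler to read.

```python
def conv(H1, H2):
    """Convolve two histograms of counts: out[k] = sum_i H1[i] * H2[k-i].

    Equivalent to multiplying the polynomials whose coefficients are the
    bin counts; a real implementation would use FFT for Theta(n log n).
    """
    n1, n2 = len(H1), len(H2)
    out = [0] * (n1 + n2 - 1)
    for i, a in enumerate(H1):
        for j, b in enumerate(H2):
            out[i + j] += a * b
    return out
```

Note that the result has ${n}_{1}+{n}_{2}-1$ entries, a fact used later when analysing the memory taken by the convoluted histograms.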
Getting the Size of a Histogram
The method size() returns the number of bins of a histogram. This method simply returns $L.\mathtt{size}()$, where L is the underlying list used to represent the histogram.
Getting Chunk Candidates from a Bin
Given a standard histogram ${H}_{i}$ and an index $0\le x<{H}_{i}.\mathtt{size}()$, the method ${H}_{i}.\mathtt{get}(x)$ outputs the list of all chunk candidates contained in the bin of index x of ${H}_{i}$, i.e., this method simply returns the list ${L}_{{H}_{i}}.\mathtt{get}(x).$
4.8.2. Complete Algorithm
This key enumeration algorithm uses histograms to represent scores, and the first step of the key enumeration is a convolution of histograms modelling the distribution of the $\mathcal{N}$ lists of scores. This step is detailed in Algorithm 12.
Algorithm 12 computes standard and convoluted histograms.

1: function createHistograms(${L}^{0},{L}^{1},\dots ,{L}^{\mathcal{N}-1},{N}_{b}$)
2:  ${H}_{0}\leftarrow \mathtt{createHist}({L}^{0},{N}_{b})$;
3:  ${H}_{1}\leftarrow \mathtt{createHist}({L}^{1},{N}_{b})$;
4:  ${H}_{0:1}\leftarrow \mathtt{conv}({H}_{0},{H}_{1})$;
5:  for $i=2$ to $\mathcal{N}-1$ do
6:   ${H}_{i}\leftarrow \mathtt{createHist}({L}^{i},{N}_{b})$;
7:   ${H}_{0:i}\leftarrow \mathtt{conv}({H}_{i},{H}_{0:i-1})$;
8:  end for
9:  return $H=[{H}_{0},{H}_{1},\dots ,{H}_{\mathcal{N}-1},{H}_{0:1},\dots ,{H}_{0:\mathcal{N}-1}]$;
10: end function

Based on this first step, this key enumeration algorithm allows enumerating key candidates that are ranked between two bounds ${R}_{1}$ and ${R}_{2}$. In order to enumerate all keys ranked between the bounds ${R}_{1}$ and ${R}_{2}$, the corresponding indices of bins of ${H}_{0:\mathcal{N}-1}$ have to be computed, as described in Algorithm 13. It simply sums the number of key candidates contained in the bins, starting from the bin containing the highest-scoring key candidates, until ${R}_{1}$ and then ${R}_{2}$ are exceeded, and returns the corresponding indices ${x}_{start}$ and ${x}_{stop}.$
Algorithm 13 computes the indices’ bounds.

1: function computeBounds(${R}_{1},{R}_{2}$)
2:  $start\leftarrow {H}_{0:\mathcal{N}-1}.\mathtt{size}()$;
3:  $cn{t}_{start}\leftarrow 0$;
4:  while $cn{t}_{start}<{R}_{1}$ do
5:   $start\leftarrow start-1$;
6:   $cn{t}_{start}\leftarrow cn{t}_{start}+{H}_{0:\mathcal{N}-1}[start]$;
7:  end while
8:  ${x}_{start}\leftarrow start$;
9:  while $cn{t}_{start}<{R}_{2}$ do
10:   $start\leftarrow start-1$;
11:   $cn{t}_{start}\leftarrow cn{t}_{start}+{H}_{0:\mathcal{N}-1}[start]$;
12:  end while
13:  ${x}_{stop}\leftarrow start$;
14:  return ${x}_{start},{x}_{stop}$;
15: end function

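The bound computation translates almost line for line into code. This sketch assumes the convoluted histogram ${H}_{0:\mathcal{N}-1}$ is given as a plain list of bin counts and that $1\le {R}_{1}\le {R}_{2}\le$ the total count, as the algorithm itself implicitly does.

```python
def compute_bounds(H_conv, R1, R2):
    """Sketch of computeBounds (Algorithm 13).

    H_conv: list of bin counts of H_{0:N-1}. Walk from the highest-index
    (best-scoring) bin downwards, accumulating counts until R1, and then
    R2, is reached or exceeded. Assumes 1 <= R1 <= R2 <= sum(H_conv).
    """
    start = len(H_conv)
    cnt = 0
    while cnt < R1:
        start -= 1
        cnt += H_conv[start]
    x_start = start
    while cnt < R2:
        start -= 1
        cnt += H_conv[start]
    x_stop = start
    return x_start, x_stop
```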
Given the list of histograms of scores H and the indices of bins of ${H}_{0:\mathcal{N}-1}$ between which we want to enumerate, the enumeration simply consists of performing backtracking over all the bins between ${x}_{start}$ and ${x}_{stop}$. More precisely, during this phase, we recover the bins of the initial histograms (i.e., before convolution) that were used to build a bin of the convoluted histogram ${H}_{0:\mathcal{N}-1}$. For a given bin b with index x of ${H}_{0:\mathcal{N}-1}$, we have to run through all the nonempty bins ${b}_{0},\dots ,{b}_{\mathcal{N}-1}$ of indices ${x}_{0},\dots ,{x}_{\mathcal{N}-1}$ of ${H}_{0},\dots ,{H}_{\mathcal{N}-1}$ such that ${x}_{0}+\dots +{x}_{\mathcal{N}-1}=x$. Each ${b}_{i}$ will then contain at least one and at most ${m}_{i}$ chunk candidates of the list ${L}^{i}$ that we must enumerate. This leads to storing a table $\mathtt{kf}$ of $\mathcal{N}$ entries, each of which points to a list of chunk candidates. The list pointed to by the entry $\mathtt{kf}[i]$ holds at least one and at most ${m}_{i}$ chunk candidates contained in the bin ${b}_{i}$ of the histogram ${H}_{i}.$ Any combination of these $\mathcal{N}$ lists, i.e., picking an entry from each list, results in a key candidate.
Algorithm 14 describes this bin decomposition process more precisely. The algorithm simply follows a recursive decomposition. That is, in order to enumerate all the key candidates within a bin b of index x of ${H}_{0:\mathcal{N}-1}$, it first finds two nonempty bins of indices ${x}_{\mathcal{N}-1}$ and $x-{x}_{\mathcal{N}-1}$ of ${H}_{\mathcal{N}-1}$ and ${H}_{0:\mathcal{N}-2}$, respectively. All the chunk candidates in the bin of index ${x}_{\mathcal{N}-1}$ of ${H}_{\mathcal{N}-1}$ will be added to the key factorisation, i.e., the entry $\mathtt{kf}[\mathcal{N}-1]$ will point to the list of chunk candidates returned by ${H}_{\mathcal{N}-1}.\mathtt{get}({x}_{\mathcal{N}-1})$. It then continues the recursion with the bin of index $x-{x}_{\mathcal{N}-1}$ of ${H}_{0:\mathcal{N}-2}$ by finding two nonempty bins of indices ${x}_{\mathcal{N}-2}$ and $x-{x}_{\mathcal{N}-1}-{x}_{\mathcal{N}-2}$ of ${H}_{\mathcal{N}-2}$ and ${H}_{0:\mathcal{N}-3}$, respectively, and by adding all the chunk candidates in the bin of index ${x}_{\mathcal{N}-2}$ of ${H}_{\mathcal{N}-2}$ to the key factorisation, i.e., $\mathtt{kf}[\mathcal{N}-2]$ will now point to the list of chunk candidates returned by ${H}_{\mathcal{N}-2}.\mathtt{get}({x}_{\mathcal{N}-2})$, and so forth. Eventually, each time a factorisation is completed, Algorithm 14 calls the function processKF, which takes as input the table kf. The function processKF, as defined in Algorithm 15, will compute the key candidates from kf. This algorithm basically generates all the possible combinations from the $\mathcal{N}$ lists $\mathtt{kf}[i]$. Note that this algorithm may be seen as a particular case of Algorithm 11. Finally, the main loop of this key enumeration algorithm simply calls Algorithm 14 for all the bins of ${H}_{0:\mathcal{N}-1}$ that lie between the enumeration bounds ${x}_{start}$ and ${x}_{stop}$.
Algorithm 14 performs bin decomposition.

1: function DecomposeBin($H,i,{x}_{bin},\mathtt{kf}$)
2:  if $i=1$ then
3:   $x\leftarrow {H}_{0}.\mathtt{size}()-1$;
4:   while $(x\ge 0)$ and $(x+{H}_{1}.\mathtt{size}())\ge {x}_{bin}$ do
5:    if ${H}_{0}[x]>0$ and ${H}_{1}[{x}_{bin}-x]>0$ then
6:     $\mathtt{kf}[0]\leftarrow {H}_{0}.\mathtt{get}(x)$;
7:     $\mathtt{kf}[1]\leftarrow {H}_{1}.\mathtt{get}({x}_{bin}-x)$;
8:     $\mathtt{processKF}(\mathtt{kf})$;
9:    end if
10:    $x\leftarrow x-1$;
11:   end while
12:  else
13:   $x\leftarrow {H}_{i}.\mathtt{size}()-1$;
14:   while $(x\ge 0)$ and $(x+{H}_{0:i-1}.\mathtt{size}())\ge {x}_{bin}$ do
15:    if ${H}_{i}[x]>0$ and ${H}_{0:i-1}[{x}_{bin}-x]>0$ then
16:     $\mathtt{kf}[i]\leftarrow {H}_{i}.\mathtt{get}(x)$;
17:     $\mathtt{DecomposeBin}(H,i-1,{x}_{bin}-x,\mathtt{kf})$;
18:    end if
19:    $x\leftarrow x-1$;
20:   end while
21:  end if
22: end function

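The recursion can be sketched as follows. This is an illustration under our own representation, not the paper's code: each standard histogram H[i] is stored directly as a list of bins (each bin a list of chunk candidates), H_conv[i] is the list of bin counts of ${H}_{0:i}$, and the validity of the bin index ${x}_{bin}-x$ is checked explicitly rather than left implicit.

```python
def decompose_bin(H, H_conv, i, x_bin, kf, process_kf):
    """Sketch of DecomposeBin (Algorithm 14).

    H[i]: standard histogram of chunk i as a list of bins of candidates.
    H_conv[i]: list of bin counts of the convoluted histogram H_{0:i}.
    process_kf: callback consuming each completed key factorisation kf.
    """
    if i == 1:
        for x in range(len(H[0]) - 1, -1, -1):
            y = x_bin - x
            # Both bins must exist and be nonempty.
            if 0 <= y < len(H[1]) and H[0][x] and H[1][y]:
                kf[0], kf[1] = H[0][x], H[1][y]
                process_kf(kf)
    else:
        for x in range(len(H[i]) - 1, -1, -1):
            y = x_bin - x
            if 0 <= y < len(H_conv[i - 1]) and H[i][x] and H_conv[i - 1][y]:
                kf[i] = H[i][x]
                decompose_bin(H, H_conv, i - 1, y, kf, process_kf)
```

For two chunks ($i=1$), the recursion bottoms out and every split ${x}_{0}+{x}_{1}={x}_{bin}$ with two nonempty bins yields a factorisation.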
4.8.3. Parallelization
Suppose we would like to have t tasks ${T}_{1},{T}_{2},{T}_{3},\cdots ,{T}_{t}$ executing in parallel to enumerate key candidates that are ranked between two bounds ${R}_{1}$ and ${R}_{2}$. We can then calculate the indices ${x}_{start}$ and ${x}_{stop}$ and create the array $\mathtt{X}=[{x}_{start},{x}_{start}-1,\dots ,{x}_{stop}]$. We then partition the array $\mathtt{X}$ into t disjoint subarrays ${\mathtt{X}}_{i}$ and finally set each task ${T}_{i}$ to call the function decomposeBin for all the bins of ${H}_{0:\mathcal{N}-1}$ with indices in ${\mathtt{X}}_{i}$.
As has been noted previously, the algorithm employed to partition the array $\mathtt{X}$ directly determines how efficient the parallel key enumeration is, i.e., how well balanced the amount of computation performed by each task is. An example of a partition algorithm that can almost evenly distribute the workload among the tasks is as follows:
1. Set i to 0.
2. If $\mathtt{X}$ is nonempty, pick an index x in $\mathtt{X}$ such that ${H}_{0:\mathcal{N}-1}[x]$ is maximal; otherwise, return ${\mathtt{X}}_{1},{\mathtt{X}}_{2},\dots ,{\mathtt{X}}_{t}.$
3. Remove x from the array $\mathtt{X}$, and add it to the array ${\mathtt{X}}_{i+1}$.
4. Update i to $(i+1)\bmod t$, and go back to Step 2.
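The partition procedure above can be sketched as a greedy round-robin split; the function name and representation (the convoluted histogram as a list of counts) are ours.

```python
def partition_bins(X, H_conv, t):
    """Greedy round-robin partition of the bin indices in X over t tasks:
    repeatedly hand the remaining index with the largest bin count
    H_conv[x] to the next task, so workloads stay roughly balanced."""
    remaining = list(X)
    parts = [[] for _ in range(t)]
    i = 0
    while remaining:
        x = max(remaining, key=lambda idx: H_conv[idx])
        remaining.remove(x)
        parts[i].append(x)
        i = (i + 1) % t
    return parts
```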
Algorithm 15 processes table kf.

1: function processKF($\mathtt{kf}$)
2:  $i\leftarrow 0$;
3:  $I[i]\leftarrow 0$;
4:  while $i\ge 0$ do
5:   while $i<\mathcal{N}-1$ do
6:    $i\leftarrow i+1$;
7:    $I[i]\leftarrow 0$;
8:   end while
9:   $\mathtt{c}\leftarrow \mathtt{combine}(\mathtt{kf}[0].\mathtt{get}(I[0]),\dots ,\mathtt{kf}[\mathcal{N}-1].\mathtt{get}(I[\mathcal{N}-1]))$;
10:   $\mathtt{Test}(\mathtt{c})$;
11:   while $i\ge 0$ and $I[i]\ge (\mathtt{kf}[i].\mathtt{size}()-1)$ do
12:    $i\leftarrow i-1$;
13:   end while
14:   if $i\ge 0$ then
15:    $I[i]\leftarrow I[i]+1;$
16:   end if
17:  end while
18: end function

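Algorithm 15 is, in effect, an odometer over the index vector I that enumerates the Cartesian product of the $\mathcal{N}$ lists in kf. A compact sketch of the same behaviour, using the standard library rather than explicit backtracking:

```python
from itertools import product

def process_kf(kf, test):
    """Equivalent effect to Algorithm 15: take one chunk candidate from
    each of the N lists kf[0..N-1] in every possible way, and pass each
    resulting key candidate to the testing function."""
    for combo in product(*kf):
        test(list(combo))
```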
4.8.4. Memory Consumption
Besides the precomputed histograms, which are stored as arrays in memory, it is easy to see that this algorithm makes use of negligible memory (only the table kf) while enumerating key candidates. Additionally, it is important to note that each time the function $\mathtt{processKF}$ is called, it needs to generate all key candidates obtained by picking chunk candidates from the $\mathcal{N}$ lists pointed to by the entries of $\mathtt{kf}$ and to process all of them immediately, since the table $\mathtt{kf}$ may change afterwards. This implies that, if the processing of key candidates were left until after the complete enumeration has finished, each version of the table $\mathtt{kf}$ would need to be stored, which, again, might be problematic in terms of memory consumption.
Regarding how many bits in memory the precomputed histograms consume, we will analyse Algorithm 12. First, note that, for a given list of chunk candidates ${L}^{i}$ and ${N}_{b}$, the function $\mathtt{createHist}({L}^{i},{N}_{b})$ will return the standard histogram ${H}_{i}$. This standard histogram will be stored as the list ${L}_{{H}_{i}}$ of size ${N}_{b}$. An entry x of ${L}_{{H}_{i}}$ will point to a list of chunk candidates. The total number of chunk candidates held by all the lists pointed to by the entries of ${L}_{{H}_{i}}$ is ${m}_{i}$. Therefore, the number of bits to store the list ${L}_{{H}_{i}}$ is ${B}_{p}\cdot {N}_{b}+{B}_{c}\cdot {m}_{i}$, where ${B}_{p}$ is the number of bits to store a pointer and ${B}_{c}$ is the number of bits to store a chunk candidate $(score,[e])$. The total number of bits to store all lists ${L}_{{H}_{i}},0\le i<\mathcal{N},$ is
${\sum}_{i=0}^{\mathcal{N}-1}({B}_{p}\cdot {N}_{b}+{B}_{c}\cdot {m}_{i})=\mathcal{N}\cdot {B}_{p}\cdot {N}_{b}+{B}_{c}\cdot {\sum}_{i=0}^{\mathcal{N}-1}{m}_{i}.$
Concerning the convoluted histograms, let us first look at ${H}_{0:1}=\mathtt{conv}({H}_{0},{H}_{1})$. We know that ${H}_{0:1}$ is stored as a list of integers and that its entries can be seen as the coefficients of the polynomial resulting from multiplying the polynomial ${P}_{0}={H}_{0}[0]+{H}_{0}[1]x+\dots +{H}_{0}[{N}_{b}-1]{x}^{{N}_{b}-1}$ by ${P}_{1}={H}_{1}[0]+{H}_{1}[1]x+\dots +{H}_{1}[{N}_{b}-1]{x}^{{N}_{b}-1}$. Therefore, the list of integers used to store ${H}_{0:1}$ has $2\cdot {N}_{b}-1$ entries. Following a similar reasoning, we can conclude that the list of integers used to store ${H}_{0:2}=\mathtt{conv}({H}_{2},{H}_{0:1})$ has $3\cdot {N}_{b}-2$ entries. Therefore, for a given $i,1\le i\le \mathcal{N}-1$, the list of integers used to store ${H}_{0:i}=\mathtt{conv}({H}_{i},{H}_{0:i-1})$ has $(i+1)\cdot {N}_{b}-i$ entries. The total number of entries of all the convoluted histograms ${H}_{0:1},{H}_{0:2},\dots ,{H}_{0:\mathcal{N}-1}$ is
${\sum}_{i=1}^{\mathcal{N}-1}((i+1)\cdot {N}_{b}-i)={N}_{b}\cdot \left(\frac{\mathcal{N}(\mathcal{N}+1)}{2}-1\right)-\frac{\mathcal{N}(\mathcal{N}-1)}{2}.$
As expected, the total number of entries strongly depends on the values ${N}_{b}$ and $\mathcal{N}$. If an integer is stored in ${B}_{int}$ bits, then the number of bits for storing all the convoluted histograms is
${B}_{int}\cdot \left({N}_{b}\cdot \left(\frac{\mathcal{N}(\mathcal{N}+1)}{2}-1\right)-\frac{\mathcal{N}(\mathcal{N}-1)}{2}\right).$
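A quick numeric sanity check of the entry counts: summing the per-histogram count $(i+1)\cdot {N}_{b}-i$ over $i=1,\dots ,\mathcal{N}-1$ and comparing it against its closed form (the simplification below is our own arithmetic, not taken from the paper).

```python
def total_conv_entries(N, N_b):
    """Sum the entry counts (i+1)*N_b - i of H_{0:i} for i = 1..N-1."""
    return sum((i + 1) * N_b - i for i in range(1, N))

def closed_form(N, N_b):
    """Closed form of the same sum: N_b*(N*(N+1)/2 - 1) - N*(N-1)/2."""
    return N_b * (N * (N + 1) // 2 - 1) - N * (N - 1) // 2
```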
4.8.5. Equivalence with the PathCounting Approach
The stack-based key enumeration algorithm and the score-based key enumeration algorithm can also be used for rank computation (instead of enumerating each path, the rank version counts each path). Similarly, the histogram algorithm can also be used for rank computation by simply summing the sizes of the corresponding bins in ${H}_{0:\mathcal{N}-1}$. These two approaches were believed to be distinct from each other. However, Martin et al. in the research paper [31] showed that both approaches are mathematically equivalent, i.e., they both compute the exact same rank when their discretisation parameters are chosen correspondingly. In particular, the authors showed that the binning process in the histogram algorithm is equivalent to the “map to weight” float-to-integer conversion used prior to their path-counting algorithm (Forest) when the algorithms’ discretisation parameters are chosen carefully. Additionally, in this paper, a performance comparison between the enumeration versions of the two algorithms was carried out. The practical experiments indicated that Histogram performs best for low discretisation parameters and that Forest wins for higher ones.
4.8.6. Variant
A recent paper by Grosso [26] introduced a variant of the previous algorithm. Basically, the author of [26] makes a small adaptation of Algorithm 14 to take into account the tree-like structure used by their rank estimation algorithm. Also, the author claims that this variant has an advantage over the previous one when the memory needed to store the histograms is too large.
4.9. A Quantum Key Search Algorithm
In this subsection, we will describe, for the sake of completeness, a quantum key enumeration algorithm introduced in the research paper [29]. This algorithm is constructed from a nonoptimal key enumeration algorithm, which uses the key rank algorithm given by Martin et al. in the research paper [16] to return a single key candidate (the $r$-th) with a weight in a particular range. We will first describe the key rank algorithm. This algorithm restricts the scores to positive integer values (weights) such that the smallest weight corresponds to the likeliest score, by making use of a function that converts scores into weights [16].
Assuming the scores have already been converted to weights, the rank algorithm first constructs a matrix $\mathbf{b}$ of size $\mathcal{N}\times {W}_{2}$ for a given range $[{W}_{1},{W}_{2})$ as follows. For $i=\mathcal{N}-1$ and $0\le w<{W}_{2}$, the entry ${\mathbf{b}}_{i,w}$ contains the number of chunk candidates whose score plus w lies in the given range. Therefore, ${\mathbf{b}}_{i,w}$ is given by the number of chunk candidates ${\mathtt{c}}_{j}^{i},0\le j<{m}_{i}$, such that ${W}_{1}-w\le {\mathtt{c}}_{j}^{i}.score<{W}_{2}-w$.
On the other hand, for $i=\mathcal{N}-2,\mathcal{N}-3,\dots ,0,$ and $0\le w<{W}_{2}$, the entry ${\mathbf{b}}_{i,w}$ contains the number of partial key candidates that can be constructed from chunk i to chunk $\mathcal{N}-1$ such that their total weight plus w lies in the given range. Therefore, ${\mathbf{b}}_{i,w}$ may be calculated as follows: for $0\le j<{m}_{i}$, ${\mathbf{b}}_{i,w}\leftarrow {\mathbf{b}}_{i,w}+{\mathbf{b}}_{i+1,w+{\mathtt{c}}_{j}^{i}.score}$ if $w+{\mathtt{c}}_{j}^{i}.score<{W}_{2}.$
Algorithm 16 describes precisely the manner in which the matrix $\mathbf{b}$ is computed. Once matrix $\mathbf{b}$ is computed, the rank algorithm will calculate the number of key candidates in the given range by simply returning ${\mathbf{b}}_{0,0}$. Note that ${\mathbf{b}}_{0,0}$, by construction, contains the number of key candidates, with initial weight 0, that can be constructed from chunk 0 to chunk $\mathcal{N}-1$ such that their total weight lies in the given range. Algorithm 17 describes the rank algorithm.
Algorithm 16 creates the matrix $\mathbf{b}$.

1: function initialise(${W}_{1},{W}_{2}$)
2:  $i\leftarrow \mathcal{N}-1$;
3:  $\mathbf{b}\leftarrow {[{[0]}^{{W}_{2}}]}^{\mathcal{N}}$;
4:  for $w=0$ to ${W}_{2}-1$ do
5:   for $j=0$ to ${m}_{i}-1$ do
6:    if ${W}_{1}-w\le {\mathtt{c}}_{j}^{i}.score<{W}_{2}-w$ then
7:     ${\mathbf{b}}_{i,w}\leftarrow {\mathbf{b}}_{i,w}+1$;
8:    end if
9:   end for
10:  end for
11:  for $i=\mathcal{N}-2$ down to $0$ do
12:   for $w=0$ to ${W}_{2}-1$ do
13:    for $j=0$ to ${m}_{i}-1$ do
14:     if $w+{\mathtt{c}}_{j}^{i}.score<{W}_{2}$ then
15:      ${\mathbf{b}}_{i,w}\leftarrow {\mathbf{b}}_{i,w}+{\mathbf{b}}_{i+1,w+{\mathtt{c}}_{j}^{i}.score}$;
16:     end if
17:    end for
18:   end for
19:  end for
20:  return $\mathbf{b}$;
21: end function

Algorithm 17 returns the number of key candidates in a given range.

1: function rank(${W}_{1},{W}_{2}$)
2:  $\mathbf{b}\leftarrow \mathtt{initialise}({W}_{1},{W}_{2})$;
3:  return ${\mathbf{b}}_{0,0}$;
4: end function

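The recurrence behind $\mathbf{b}$ can be sketched directly. This illustration assumes each chunk's candidates are given simply as a list of integer weights (the candidate values themselves are not needed for counting); the names are ours.

```python
def initialise(weights, W1, W2):
    """Sketch of Algorithm 16. weights[i] lists the integer weights of
    the candidates for chunk i; b[i][w] counts the partial keys over
    chunks i..N-1 whose total weight plus w lies in [W1, W2)."""
    N = len(weights)
    b = [[0] * W2 for _ in range(N)]
    # Base case: last chunk alone.
    for w in range(W2):
        for s in weights[N - 1]:
            if W1 - w <= s < W2 - w:
                b[N - 1][w] += 1
    # Recurrence: extend partial keys chunk by chunk, from N-2 down to 0.
    for i in range(N - 2, -1, -1):
        for w in range(W2):
            for s in weights[i]:
                if w + s < W2:
                    b[i][w] += b[i + 1][w + s]
    return b

def rank(weights, W1, W2):
    """Algorithm 17: the number of full key candidates in [W1, W2)."""
    return initialise(weights, W1, W2)[0][0]
```

With two chunks whose candidate weights are both $\{1,2\}$, the four key candidates have total weights 2, 3, 3, and 4, which the rank function recovers for any subrange.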
With the help of Algorithm 17, an algorithm for requesting particular key candidates is introduced; it is described in Algorithm 18. It returns the $r$-th key candidate with weight between ${W}_{1}$ and ${W}_{2}$. Note that the correctness of the function getKey follows from the correctness of $\mathbf{b}$ and that the algorithm is deterministic, i.e., given the same r, it will return the same key candidate $\mathtt{k}$. Also, note that the $r$-th key candidate need not be the $r$-th most likely key candidate in the given range.
Equipped with the getKey algorithm, the authors of [29] introduced a nonoptimal key enumeration algorithm to enumerate and test all key candidates in a given range. This algorithm works by calling the function getKey to obtain a key candidate in the given range until there are no more key candidates in the range. Each obtained key candidate $\mathtt{k}$ is tested by using a testing function $\mathtt{T}$ returning either 1 or 0. Algorithm 19 precisely describes how this nonoptimal key enumeration algorithm works.
Algorithm 18 returns the $r$-th key candidate with weight between ${W}_{1}$ and ${W}_{2}.$

1: function getKey($\mathbf{b},{W}_{1},{W}_{2},r$)
2:  if $r>{\mathbf{b}}_{0,0}$ then
3:   return ⊥;
4:  end if
5:  $\mathtt{k}\leftarrow {[0]}^{\mathcal{N}}$;
6:  $w\leftarrow 0$;
7:  for $i=0$ to $\mathcal{N}-2$ do
8:   for $j=0$ to ${m}_{i}-1$ do
9:    if $r\le {\mathbf{b}}_{i+1,w+{\mathtt{c}}_{j}^{i}.score}$ then
10:     ${\mathtt{k}}_{i}\leftarrow j$;
11:     $w\leftarrow w+{\mathtt{c}}_{j}^{i}.score$;
12:     break j;
13:    end if
14:    $r\leftarrow r-{\mathbf{b}}_{i+1,w+{\mathtt{c}}_{j}^{i}.score}$;
15:   end for
16:  end for
17:  $i\leftarrow \mathcal{N}-1$;
18:  for $j=0$ to ${m}_{i}-1$ do
19:   $v\leftarrow ({W}_{1}-w\le {\mathtt{c}}_{j}^{i}.score<{W}_{2}-w)?1:0$;
20:   if $r\le v$ then
21:    ${\mathtt{k}}_{i}\leftarrow j$;
22:    break j;
23:   end if
24:   $r\leftarrow r-v$;
25:  end for
26:  return $\mathtt{k}$;
27: end function

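A sketch of getKey in code follows. It assumes r is 1-indexed (as in Algorithm 19, which starts from $r=1$) and therefore uses the comparison $r\le {\mathbf{b}}_{i+1,\cdot }$ throughout, consistently with the last-chunk test $r\le v$; it also treats columns $w+s\ge {W}_{2}$ of $\mathbf{b}$ as counting zero candidates, which the pseudocode leaves implicit. As before, weights[i] lists the integer candidate weights of chunk i, and b is the matrix produced by initialise (Algorithm 16).

```python
def get_key(b, weights, W1, W2, r):
    """Sketch of Algorithm 18: return the r-th (1-indexed) key candidate,
    as a list of per-chunk candidate indices, with total weight in
    [W1, W2), or None when r exceeds the count b[0][0]."""
    if r > b[0][0]:
        return None
    N = len(weights)
    k = [0] * N
    w = 0
    for i in range(N - 1):
        for j, s in enumerate(weights[i]):
            # Out-of-range columns of b count zero candidates.
            cnt = b[i + 1][w + s] if w + s < W2 else 0
            if r <= cnt:
                k[i] = j
                w += s
                break
            r -= cnt
    for j, s in enumerate(weights[N - 1]):
        v = 1 if W1 - w <= s < W2 - w else 0
        if r <= v:
            k[N - 1] = j
            break
        r -= v
    return k
```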
Combining the function keySearch with techniques for searching over partitions independently, the authors of the research paper [29] introduced a key search algorithm, described in Algorithm 20. The function KS works by partitioning the search space into sections whose sizes follow a geometrically increasing sequence controlled by a size parameter $a=\mathcal{O}(1)$. This parameter is chosen such that the number of loop iterations is balanced against the number of keys verified per block.
Algorithm 19 enumerates and tests key candidates with weight between ${W}_{1}$ and ${W}_{2}.$

1: function keySearch(${W}_{1},{W}_{2},\mathtt{T}$)
2:  $\mathbf{b}\leftarrow \mathtt{initialise}({W}_{1},{W}_{2})$;
3:  $r\leftarrow 1$;
4:  while $True$ do
5:   $\mathtt{k}\leftarrow \mathtt{getKey}(\mathbf{b},{W}_{1},{W}_{2},r)$;
6:   if $\mathtt{k}=\mathrm{\perp}$ then
7:    break;
8:   end if
9:   if $\mathtt{T}(\mathtt{k})=1$ then
10:    break;
11:   end if
12:   $r\leftarrow r+1$;
13:  end while
14:  return $\mathtt{k}$;
15: end function

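The search loop can be sketched with getKey abstracted as a callable, which keeps the illustration self-contained; this framing (and the None return for ⊥) is ours.

```python
def key_search(get_key_fn, test):
    """Sketch of Algorithm 19. get_key_fn(r) returns the r-th candidate
    in the range or None when the range is exhausted; try r = 1, 2, ...
    until the testing function accepts a candidate or none remain."""
    r = 1
    while True:
        k = get_key_fn(r)
        if k is None:
            return None
        if test(k):
            return k
        r += 1
```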
Algorithm 20 searches key candidates in a range of size approximately e.

1: function KS($e,\mathtt{T}$)
2:  ${W}_{1}\leftarrow {W}_{min}$;
3:  ${W}_{2}\leftarrow {W}_{min}+1$;
4:  $step\leftarrow 0$;
5:  Choose ${W}_{e}$ such that $\mathtt{rank}(0,{W}_{e})$ is approximately e;
6:  while ${W}_{1}\le {W}_{e}$ do
7:   $\mathtt{k}\leftarrow \mathtt{keySearch}({W}_{1},{W}_{2},\mathtt{T})$;
8:   if $\mathtt{k}\ne \mathrm{\perp}$ then
9:    return $\mathtt{k}$;
10:   end if
11:   $step\leftarrow step+1$;
12:   ${W}_{1}\leftarrow {W}_{2}$;
13:   Choose ${W}_{2}$ such that $\mathtt{rank}({W}_{1},{W}_{2})$ is approximately ${a}^{step}$;
14:  end while
15:  return ⊥;
16: end function

Having introduced the function KS, the authors of the research paper [29] transformed it into a quantum key search algorithm that relies heavily on Grover’s algorithm [32]. This is a quantum algorithm for solving the following problem: given a black-box function that returns 1 on a single input x and 0 on all other inputs, find x. Note that, if there are N possible inputs to the black-box function, a classical algorithm uses $\mathcal{O}(N)$ queries to the black-box function, since the correct input might be the very last input tested. However, in a quantum setting, a version of Grover’s algorithm solves the problem using $\mathcal{O}({N}^{1/2})$ queries, with certainty [32,33]. Algorithm 21 describes the quantum search algorithm, which achieves a quadratic speedup over the classical key search (Algorithm 20) [29]. However, it would require significant quantum memory and a deep quantum circuit, making its practical application in the near future rather unlikely.
Algorithm 21 performs a quantum search of key candidates in a range of size approximately e.

1: function QKS($e,\mathtt{T}$)
2:  ${W}_{1}\leftarrow {W}_{min}$;
3:  ${W}_{2}\leftarrow {W}_{min}+1$;
4:  $step\leftarrow 0$;
5:  Choose ${W}_{e}$ such that $\mathtt{rank}(0,{W}_{e})$ is approximately e;
6:  while ${W}_{1}\le {W}_{e}$ do
7:   $\mathbf{b}\leftarrow \mathtt{initialise}({W}_{1},{W}_{2})$;
8:   $\mathtt{f}(\cdot)\leftarrow \mathtt{T}(\mathtt{getKey}(\mathbf{b},{W}_{1},{W}_{2},\cdot))$;
9:   Call Grover using $\mathtt{f}$ for one or zero marked elements in range $[{W}_{1},{W}_{2})$;
10:   if marked element t found then
11:    return $\mathtt{getKey}(\mathbf{b},{W}_{1},{W}_{2},t)$;
12:   end if
13:   $step\leftarrow step+1$;
14:   ${W}_{1}\leftarrow {W}_{2}$;
15:   Choose ${W}_{2}$ such that $\mathtt{rank}({W}_{1},{W}_{2})$ is approximately ${a}^{step}$;
16:  end while
17:  return ⊥;
18: end function
