Indistinguishability and Pseudo-Randomness
Recall the perfectly secure encryption, OTP: we bitwise-XOR our message with a uniformly random string.
OTP is inefficient because the long random string must be shared between Alice and Bob in advance. Suppose that we have a (mathematical, deterministic) function that can expand a short truly random string into a long “random-looking” string. We could use the seemingly random string to encrypt messages as in OTP, yet efficiently.
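The OTP operation recalled above is just bitwise XOR; a minimal sketch in Python (the helper name `otp_encrypt` is ours, purely for illustration):

```python
import secrets

def otp_encrypt(key: bytes, message: bytes) -> bytes:
    """One-time pad: bitwise XOR of the message with a uniform key."""
    assert len(key) == len(message), "OTP needs a key as long as the message"
    return bytes(k ^ m for k, m in zip(key, message))

# Decryption is the same XOR, since (m XOR k) XOR k = m.
key = secrets.token_bytes(5)
ct = otp_encrypt(key, b"hello")
assert otp_encrypt(key, ct) == b"hello"
```

Note that the key is as long as the message, which is exactly the inefficiency the rest of this section addresses.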
Is that possible? How can we formally define “random-looking”?
We will introduce computational indistinguishability, and then define pseudo-random generators (PRGs) and pseudo-random functions (PRFs).
Computational Indistinguishability
Key Idea: If we have no way to show the difference, then we are satisfied. We call it indistinguishability.
Example: the Turing test. When a machine and a human are indistinguishable with respect to every human’s prompts, we call the machine AI.
Observation: they are not the same, not even close in any sense; however, the distinguisher “another human” cannot tell the difference due to its limited power.
Concept: we say a distribution is pseudorandom if no efficient algorithm can distinguish it from a (truly) uniform distribution.
We will formalize the concept asymptotically.
Definition: Ensembles of Probability Distributions
A sequence $\{X_n\}_{n \in \mathbb{N}}$
is called an ensemble if for each $n \in \mathbb{N}$, $X_n$ is a probability distribution over $\{0,1\}^*$. (We often write $\{X_n\}_n$ or $\{X_n\}$ when the context is clear.)
E.g., letting $X_n$ be the uniform distribution over $\{0,1\}^n$ for each $n$ gives an ensemble.
Definition: Computational Indistinguishability
Let $\{X_n\}_n$
and $\{Y_n\}_n$ be ensembles where $X_n, Y_n$ are distributions over $\{0,1\}^{\ell(n)}$ for some polynomial $\ell(\cdot)$. We say that $\{X_n\}_n$ and $\{Y_n\}_n$ are computationally indistinguishable (denoted by $\{X_n\}_n \approx \{Y_n\}_n$) if for all NUPPT $D$ (called the “distinguisher”), there exists a negligible function $\epsilon(\cdot)$ such that for all $n \in \mathbb{N}$,

$$\left| \Pr[t \gets X_n : D(1^n, t) = 1] - \Pr[t \gets Y_n : D(1^n, t) = 1] \right| < \epsilon(n).$$
Note:
- The “$=1$” is a convention in the literature.
- The absolute value is not necessary because the definition quantifies over all $D$ (a distinguisher can be negated).
This definition requires the two distributions to pass all efficient statistical tests, which include the following.
- Roughly as many 0 as 1.
- Roughly as many 01 as 10.
- Each sequence of bits occurs with roughly the same probability.
- Given any prefix, guessing the next bit correctly happens with roughly the same probability.
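The first two tests above can be written out directly; a toy sketch (the function names and the sample string are ours):

```python
def count_zeros_ones(bits: str):
    """Test 1: roughly as many 0s as 1s."""
    return bits.count("0"), bits.count("1")

def count_pairs(bits: str):
    """Test 2: roughly as many '01' substrings as '10' (overlapping count)."""
    n01 = sum(1 for i in range(len(bits) - 1) if bits[i:i + 2] == "01")
    n10 = sum(1 for i in range(len(bits) - 1) if bits[i:i + 2] == "10")
    return n01, n10

bits = "0110100110010110"  # a sample string to test
zeros, ones = count_zeros_ones(bits)
```

A real statistical test would compare the counts against the binomial concentration expected of a uniform string; the point of the definition is that a pseudorandom distribution must pass every such efficiently computable test at once.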
Lemma: Closure Under Efficient Operations
If the pair of ensembles satisfies
$\{X_n\}_n \approx \{Y_n\}_n$, then for any NUPPT $M$, $\{M(X_n)\}_n \approx \{M(Y_n)\}_n$.
(By standard reduction.)
Examples:
- $M_1$ ignores its input. Clearly, $\{M_1(X_n)\}_n \approx \{M_1(Y_n)\}_n$ for all ensembles (the outputs are identically distributed).
- $M_2$ is the identity, i.e., its output is exactly the input. Then $\{M_2(X_n)\}_n = \{X_n\}_n \approx \{Y_n\}_n = \{M_2(Y_n)\}_n$.
- $M_3$ outputs the first half of the input, i.e., $M_3(x) := x_1 \cdots x_{|x|/2}$. By the lemma, the first halves of $X_n$ and $Y_n$ are also comp. ind.
Lemma: Hybrid Lemma
Let $X^1, X^2, \dots, X^m$
be a sequence of probability distributions. Assume that the machine $D$ distinguishes $X^1$ and $X^m$ with probability $\epsilon$. Then there exists some $i \in \{1, \dots, m-1\}$ s.t. $D$ distinguishes $X^i$ and $X^{i+1}$ with probability $\epsilon / (m-1)$.
(By the triangle inequality.)
Notice that this lemma applies to distributions, not ensembles. Fortunately, it implies the following.
Corollary:
For any ensembles
$\{X_n\}_n, \{Y_n\}_n, \{Z_n\}_n$, if $\{X_n\}_n \approx \{Y_n\}_n$ and $\{Y_n\}_n \approx \{Z_n\}_n$, then it follows that $\{X_n\}_n \approx \{Z_n\}_n$.
(left as exercise)
Discuss: what if the number of hybrid distributions (between two ensembles) depends on the size $n$? The loss in the Hybrid Lemma grows with the number of hybrids, so that number must be polynomial in $n$ for the distinguishing probability to remain negligible.
Example: Let
Lemma: Prediction Lemma
Let $\{X_n^0\}_n$
and $\{X_n^1\}_n$ be two ensembles where $X_n^0, X_n^1$ are probability distributions over $\{0,1\}^{\ell(n)}$ for some polynomial $\ell(\cdot)$, and let $D$ be a NUPPT machine that distinguishes between $\{X_n^0\}_n$ and $\{X_n^1\}_n$ with probability $\geq \epsilon(n)$ for infinitely many $n \in \mathbb{N}$. Then there exists a NUPPT $A$ such that for infinitely many $n$,

$$\Pr\left[b \gets \{0,1\};\ t \gets X_n^b : A(1^n, t) = b\right] \geq \frac{1}{2} + \frac{\epsilon(n)}{2}.$$

(Proof idea: remove the absolute value in the definition of comp. ind. by negating the distinguisher $D$ if needed,
and then apply standard probability manipulations.)
Note: the converse is easier to prove. Hence, prediction and distinguishing are essentially equivalent.
Pseudo-Random Generator
Definition: Pseudo-random Ensembles.
The probability ensemble
$\{X_n\}_n$, where $X_n$ is a probability distribution over $\{0,1\}^{\ell(n)}$ for some polynomial $\ell(\cdot)$, is said to be pseudorandom if $\{X_n\}_n \approx \{U_{\ell(n)}\}_n$, where $U_m$ is the uniform distribution over $\{0,1\}^m$.
Note:
- This definition says that a pseudorandom distribution must pass all efficiently computable tests that the uniform distribution would have passed.
- It is hard to check or prove that a distribution is pseudorandom (due to the “for all” quantifier from comp. ind.).
Definition: Pseudo-random Generators.
A function $G : \{0,1\}^* \to \{0,1\}^*$
is a Pseudo-random Generator (PRG) if the following hold.
- (efficiency): $G$
can be computed in PPT.
- (expansion): $|G(x)| > |x|$ for all $x \in \{0,1\}^*$.
- (pseudorandomness): The ensemble
$\{x \gets U_n : G(x)\}_n$ is pseudorandom.
We sometimes say that the expansion of a PRG $G$ is $\ell(\cdot)$ if $|G(x)| = \ell(|x|)$ for all $x$.
Example: if
Lemma: Expansion of a PRG
Let $G : \{0,1\}^n \to \{0,1\}^{n+1}$
be a PRG for all $n \in \mathbb{N}$. For any polynomial $\ell(\cdot)$, define $G_\ell : \{0,1\}^n \to \{0,1\}^{\ell(n)}$ as follows: $G_\ell(s) := b_1 b_2 \cdots b_{\ell(n)}$, where
$s_0 := s$ and $s_i \| b_i := G(s_{i-1})$ for $i = 1, \dots, \ell(n)$ (i.e., $b_i$ is the last output bit and $s_i$ the first $n$ output bits of the $i$-th iteration). Then $G_\ell$ is a PRG.
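The iteration in the lemma can be sketched as follows. Here `prg_one_bit` is only a stand-in for a 1-bit-expansion PRG — we fake it with SHA-256 purely to make the sketch runnable; that substitution is an assumption, not a proven PRG:

```python
import hashlib

def prg_one_bit(seed: bytes) -> tuple[bytes, int]:
    """Stand-in G: {0,1}^n -> {0,1}^{n+1}, returned as (next seed s_i, output bit b_i).
    SHA-256 is a placeholder here; a real PRG needs a security proof."""
    digest = hashlib.sha256(seed).digest()
    return digest[:len(seed)], digest[-1] & 1

def prg_expand(seed: bytes, ell: int) -> list[int]:
    """G_ell(s) = b_1 ... b_ell, where s_0 = s and (s_i, b_i) = G(s_{i-1})."""
    bits, s = [], seed
    for _ in range(ell):
        s, b = prg_one_bit(s)
        bits.append(b)
    return bits

stream = prg_expand(b"\x00" * 16, 32)  # 32 pseudorandom bits from a 16-byte seed
```

The state carried between iterations is just the current seed $s_i$; each iteration emits one output bit, matching the lemma's construction.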
Proof, warmup:
Suppose first that we iterate $G$ only a constant number $c$ of times and output just the final seed, i.e., $F(s) := s_c$: there is no expansion, but we want to show pseudorandomness, namely $\{s \gets U_n : F(s)\}_n \approx \{U_n\}_n$. Define distributions $H_n^i := \{s \gets U_n : s_i\}$ for
$i = 0, 1, \dots, c$, so that $H_n^0 = U_n$ and $H_n^c$ is the output distribution of $F$. Since $s_i \| b_i = G(s_{i-1})$ and $G$ is pseudorandom, by closure (drop the last bit) we have $\{H_n^1\}_n \approx \{H_n^0\}_n$; applying the efficient iteration map and closure again yields $\{H_n^i\}_n \approx \{H_n^{i-1}\}_n$ for each $i$. By the corollary of the Hybrid Lemma (applied a constant number of times), we have $\{H_n^c\}_n \approx \{H_n^0\}_n = \{U_n\}_n$.
Proof of PRG Expansion
It is slightly tricky when $\ell$
depends on $n$. Define the prefix and last bit of iterating $G$ as before: $s_0 := s$ and $s_i \| b_i := G(s_{i-1})$, so that $G_\ell(s) = b_1 b_2 \cdots b_{\ell(n)}$.
We have to show
$\{s \gets U_n : G_\ell(s)\}_n \approx \{U_{\ell(n)}\}_n$, and we want to prove it through the Hybrid Lemma. Given $n$, define hybrid distributions $H_n^i$ for $i = 0, 1, \dots, \ell(n)$ as

$$H_n^i := \{u \gets U_i,\ s \gets U_n : u \| b'_1 b'_2 \cdots b'_{\ell(n)-i}\},$$

where the bits $b'_j$ are obtained by iterating $G$ from the fresh seed $s$, and $u \gets U_i$
denotes sampling an $i$-bit string uniformly at random. Observe that $H_n^0 = \{s \gets U_n : G_\ell(s)\}$, that $H_n^{\ell(n)} = U_{\ell(n)}$, and that for each $i$, $H_n^i$ and $H_n^{i+1}$ differ by one application of $G$: replacing the pair $s \| b'_1 = G(s')$ for a uniform $s'$ with a uniform sample from $U_{n+1}$ turns $H_n^i$ into $H_n^{i+1}$.
Assume for contra (AC), there exists NUPPT
$D$, poly $p$ s.t. for inf many $n$, $D$ distinguishes $H_n^0$ and $H_n^{\ell(n)}$ w.p. at least $1/p(n)$. The intuition is to apply the Hybrid Lemma so that there exists $i$ such that $H_n^i, H_n^{i+1}$ are distinguishable, and thus by the Closure Lemma $G(U_n)$ is distinguishable from uniform. We prove it formally by constructing $D'$
that aims to distinguish $\{G(U_n)\}_n$ from $\{U_{n+1}\}_n$. Given input $t = t_1 \cdots t_{n+1}$, $D'$ performs:
- Sample
$i \gets \{0, 1, \dots, \ell(n)-1\}$ and $u \gets U_i$ uniformly; let $s := t_1 \cdots t_n$ and $b := t_{n+1}$; compute $b'_2, \dots, b'_{\ell(n)-i}$ by iterating $G$ from the seed $s$
- Output
$D(u \| b \| b'_2 \cdots b'_{\ell(n)-i})$

To show that
$D'$ succeeds with non-negl. prob., we partition the event according to the sampled index
$i$. Notice that conditioned on
$i$ being any fixed value, if $t \gets G(U_n)$ then the distribution (given to $D$) is identical to $H_n^i$, and if $t \gets U_{n+1}$ then it is identical to $H_n^{i+1}$. That implies

$$\Pr[t \gets G(U_n) : D'(t) = 1] - \Pr[t \gets U_{n+1} : D'(t) = 1] = \frac{1}{\ell(n)} \sum_{i=0}^{\ell(n)-1} \left( \Pr[D(H_n^i) = 1] - \Pr[D(H_n^{i+1}) = 1] \right).$$

We thus have the summation telescoping,

$$= \frac{1}{\ell(n)} \left( \Pr[D(H_n^0) = 1] - \Pr[D(H_n^{\ell(n)}) = 1] \right) \geq \frac{1}{\ell(n) \cdot p(n)},$$

where the last inequality follows by (AC). That is,
$D'$ distinguishes $\{G(U_n)\}_n$ from $\{U_{n+1}\}_n$ w.p. at least $1/(\ell(n) p(n))$, contradicting that $G$ is a PRG.
Discuss: in the above, we proved it formally and preserved uniformity (if $D$ is a uniform PPT machine, so is the constructed $D'$, because the index $i$ is sampled at random instead of being hard-wired non-uniformly).
We proved in the above that a PRG with 1-bit expansion suffices to build a PRG with any polynomial expansion. We have not yet given any candidate construction of a PRG (even with 1-bit expansion), but it is useful to first see what we can achieve using PRGs.
Example: Now suppose that we have a PRG $G$ with expansion $\ell(n)$. We can encrypt a message $m \in \{0,1\}^{\ell(n)}$ using a short key $k \in \{0,1\}^n$ as in OTP, by XORing $m$ with $G(k)$; we can even encrypt multiple messages by using disjoint chunks of the output of $G$.
What’s the downside of the above multi-message encryption?
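The multi-message scheme just described can be sketched as below. `expand` is only a stand-in for a PRG stream (SHA-256 in counter mode, an assumption for runnability), and the downside shows up in the code: both parties must track the `offset` into the stream (state), and reusing an offset reuses pad bytes, breaking secrecy:

```python
import hashlib

def expand(seed: bytes, nbytes: int) -> bytes:
    """Stand-in PRG stream (SHA-256 in counter mode; an assumption, not a proof)."""
    out, ctr = b"", 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:nbytes]

def encrypt_at(seed: bytes, offset: int, msg: bytes) -> bytes:
    """XOR msg with the PRG output bytes [offset, offset + len(msg))."""
    pad = expand(seed, offset + len(msg))[offset:]
    return bytes(p ^ b for p, b in zip(pad, msg))

seed = b"shared-short-seed"
c1 = encrypt_at(seed, 0, b"first message")
c2 = encrypt_at(seed, 13, b"second message")  # sender must remember offset 13
# Decryption is the identical XOR at the same offset:
assert encrypt_at(seed, 0, c1) == b"first message"
```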
Pseudo-Random Functions
In order to apply PRGs more efficiently, we construct a tree structure and call the abstraction pseudo-random functions (PRFs). We begin by defining (truly) random functions.
Definition: Random Functions
A random function $f : \{0,1\}^n \to \{0,1\}^n$
is a random variable sampled uniformly from the set $\mathsf{RF}_n := \{f : \{0,1\}^n \to \{0,1\}^n\}$ of all functions from $n$-bit strings to $n$-bit strings.
We can view a random function in two ways. In the combinatorial view, any function $f$ is a table of $2^n$ entries, each an $n$-bit string; a random function is a uniformly sampled table.
In the computational view, a random function is the following stateful, lazily sampled procedure:
- initialize a map $M$
to empty before any query
- on query $x$: if $M(x)$
is undefined, then sample $y \gets \{0,1\}^n$ and set $M(x) := y$
- output $M(x)$

In both views, the random function needs
$n \cdot 2^n$ bits to describe in full (though the computational view samples only the queried entries).
Note: the random function is an exponentially large object; it cannot be written down in polynomial time, which is why the computational view is the useful one.
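The computational (lazy-sampling) view is directly implementable; only the queried entries are ever sampled:

```python
import secrets

class LazyRandomFunction:
    """A random function f: {0,1}^n -> {0,1}^n, sampled lazily query by query."""

    def __init__(self, n_bytes: int):
        self.n_bytes = n_bytes
        self.table = {}  # the map M, initialized to empty before any query

    def query(self, x: bytes) -> bytes:
        if x not in self.table:  # first time: sample a fresh uniform value
            self.table[x] = secrets.token_bytes(self.n_bytes)
        return self.table[x]     # repeated queries are answered consistently

f = LazyRandomFunction(16)
y = f.query(b"some input")
assert f.query(b"some input") == y  # consistency across repeated queries
assert len(f.table) == 1            # only the queried entry was sampled
```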
Intuitively, a pseudo-random function (PRF) shall look similar to a random function. That is, indistinguishable by any NUPPT Turing machine that is capable of interacting with the function.
Definition: Oracle Indistinguishability
Let $\{F_n\}_n$
and $\{G_n\}_n$ be ensembles where $F_n, G_n$ are probability distributions over functions. We say that $\{F_n\}_n$ and $\{G_n\}_n$ are computationally indistinguishable if for all NUPPT machines $D$ that are given oracle access to a function, there exists a negligible function $\epsilon(\cdot)$ such that for all $n \in \mathbb{N}$,

$$\left| \Pr[f \gets F_n : D^{f(\cdot)}(1^n) = 1] - \Pr[g \gets G_n : D^{g(\cdot)}(1^n) = 1] \right| < \epsilon(n).$$
Note:
It is easy to verify that oracle indistinguishability satisfies “closure under efficient operations”, the Hybrid Lemma, and the Prediction Lemma.
Also notice that we can transform a distribution over oracle functions into a distribution over strings using an efficient oracle operation; in that case, oracle indistinguishability translates into comp. indistinguishability of strings (see CPA-secure encryption below).
Definition: Pseudo-random Functions (PRFs)
A family of functions $\{f_s : \{0,1\}^{|s|} \to \{0,1\}^{|s|}\}_{s \in \{0,1\}^*}$
is pseudo-random if
- (Easy to compute): $f_s(x)$
can be computed by a PPT algo that is given input $s$ and $x$.
- (Pseudorandom):
$\{s \gets U_n : f_s\}_n \approx \{F \gets \mathsf{RF}_n : F\}_n$.
Note: similar to a PRG, the seed $s$ must be sampled uniformly and kept secret; given $s$, the function $f_s$ is deterministic.
Theorem: Construct PRF from PRG
If a pseudorandom generator exists, then pseudorandom functions exist.
We have shown that a PRG with 1-bit expansion implies a PRG with any poly expansion. So, let $G : \{0,1\}^n \to \{0,1\}^{2n}$ be a length-doubling PRG,
where we write $G(s) =: G_0(s) \| G_1(s)$ with $|G_0(s)| = |G_1(s)| = n$.
We define the family $\{f_s\}$ by

$$f_s(x_1 x_2 \cdots x_n) := G_{x_n}(\cdots G_{x_2}(G_{x_1}(s)) \cdots).$$

That is, we evaluate
$G$ iteratively, keeping the left or the right half of the output at the $i$-th iteration according to the bit $x_i$.
This constructs a binary tree. The intuition is from expanding the 1-bit PRG, but now we want that any sub-string of the expansion can be efficiently computed. (We CS people love binary trees.) Clearly, $f_s$ is easy to compute: each evaluation takes $n$ calls to $G$.
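The tree walk can be sketched as follows, with a stand-in length-doubling PRG (the SHA-256-based `prg_double` is an assumption for runnability, not a proven PRG):

```python
import hashlib

def prg_double(seed: bytes) -> tuple[bytes, bytes]:
    """Stand-in G: {0,1}^n -> {0,1}^{2n}, written as the pair (G_0(s), G_1(s))."""
    left = hashlib.sha256(b"L" + seed).digest()[:len(seed)]
    right = hashlib.sha256(b"R" + seed).digest()[:len(seed)]
    return left, right

def prf(s: bytes, x: str) -> bytes:
    """f_s(x_1...x_n): walk the binary tree from the root seed s,
    taking the x_i-th half of G's output at level i."""
    node = s
    for bit in x:
        node = prg_double(node)[int(bit)]
    return node

key = b"\x01" * 16
assert prf(key, "0110") == prf(key, "0110")  # deterministic given the seed
assert prf(key, "0110") != prf(key, "0111")  # sibling leaves differ
```

Each query touches only the $n$ nodes on one root-to-leaf path, which is exactly why the construction is efficient despite the tree having $2^n$ leaves.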
There are $2^n$
leaves in the tree, too many for the “switch one more PRG to uniform in each hybrid” technique used in expanding a PRG. The trick is that the distinguisher $D$ can only query at most polynomially many times since $D$ is poly-time. Each query corresponds to a path in the binary tree, and there are at most polynomially many nodes in all queries. Hence, we will switch the evaluations of $G$ from the root to the leaves of the tree and from the first query to the last query. Note: switching each instance of $G$
(for each hybrid) is a reduction that runs $D$ to distinguish one instance of $\{G(U_n)\}$ vs $\{U_{2n}\}$; therefore, we switch exactly one instance in each hybrid. More formally, assume for contra (AC), there exists NUPPT
$D$, poly $p$ s.t. for inf many $n$, $D$ distinguishes $\{s \gets U_n : f_s\}$ from $\mathsf{RF}_n$ (in the oracle interaction) w.p. at least $1/p(n)$. We want to construct $D'$ that distinguishes $\{G(U_n)\}$ from $\{U_{2n}\}$. We define hybrid oracles $H^i$ for $i = 0, 1, \dots, n$, each answering a query $x = x_1 \cdots x_n$ as follows:
- the map $M$
is initialized to empty
- if the prefix value $M(x_1 \cdots x_i)$
is undefined, then sample $y \gets U_n$ and set $M(x_1 \cdots x_i) := y$
- output
$G_{x_n}(\cdots G_{x_{i+1}}(M(x_1 \cdots x_i)) \cdots)$

Notice that $H^i$
is a function defined using the computational view: the first $i$ levels of the tree are a lazily sampled random function, and the remaining levels are evaluated by $G$. Let
$\mathcal{H}_n^i$ be the distribution of $H^i$ for short. We have $\mathcal{H}_n^0 = \{s \gets U_n : f_s\}$ and $\mathcal{H}_n^n = \mathsf{RF}_n$, but there are still too many switches between $\mathcal{H}_n^i$ and $\mathcal{H}_n^{i+1}$ (up to $2^i$ prefixes at level $i$). The key observation is that, given $D$ is PPT, we know a poly $T(n)$ that bounds the running time of $D$ on $1^n$, so $D$ makes at most $T(n)$ queries and we just need to switch at most $T(n)$ instances of $G$. That is, define sub-hybrids $H^{i,j}$ for $j = 0, 1, \dots, T(n)$, answering a query $x$ as:
- the map $M$
is initialized to empty; if the prefix value
$M(x_1 \cdots x_i)$ is undefined, then depending on the number of distinct level-$i$ prefixes queried so far, including the current query, do the following: if it is at most $j$, sample $y_0, y_1 \gets U_n$ and set $M(x_1 \cdots x_i 0) := y_0$ and $M(x_1 \cdots x_i 1) := y_1$ (the children are switched to uniform); otherwise, sample $y \gets U_n$, set $M(x_1 \cdots x_i) := y$, and set the children via $G$, i.e., $M(x_1 \cdots x_i b) := G_b(y)$ for $b \in \{0,1\}$
- output the evaluation continued by
$G$ from the deepest defined prefix, as before

We have
$H^{i,0} = H^i$. Moreover, for any $D$ that runs in time $T(n)$, $H^{i,T(n)}$ answers exactly as $H^{i+1}$ (their combinatorial views differ, but their computational views are identical for $T(n)$ queries). Now we have $n \cdot T(n)$ hybrids, so we can construct $D'$, which on input $t \in \{0,1\}^{2n}$:
- samples
$i \gets \{0, \dots, n-1\}$ and $j \gets \{1, \dots, T(n)\}$ uniformly at random
- defines an oracle
$\tilde{H}$ such that $\tilde{H}$ is similar to $H^{i,j-1}$ but “injects” $t$ into the map at the $j$-th distinct level-$i$ prefix, setting $M(x_1 \cdots x_i 0) \| M(x_1 \cdots x_i 1) := t$. (This is constructable and computable only lazily, when the queries come from $D$.)
- runs $D$ and outputs $D$’s answer
, that is, running $D$ on input $1^n$ while providing $D$ with oracle access to $\tilde{H}$

If $t \gets G(U_n)$, then $\tilde{H}$ is distributed as $H^{i,j-1}$; if $t \gets U_{2n}$, then $\tilde{H}$ is distributed as $H^{i,j}$. It remains to calculate the probabilities, namely, given (AC),
$D'$ distinguishes $\{G(U_n)\}$ from a uniformly sampled $2n$-bit string w.p. at least $1/(n \cdot T(n) \cdot p(n))$, a contradiction. The calculation is almost identical to the proof of PRG expansion and left as an exercise.
Secure Encryption Scheme
Perfect secrecy considers that the adversary gets the ciphertext only (but nothing else). However, there are other natural adversarial models in practical scenarios.
- Known-plaintext attack, KPA: The adversary may get to see pairs of the form $(m, \mathsf{Enc}_k(m))$ for messages $m$ that it does not choose.
- Chosen-plaintext attack, CPA: The adversary gets access to an encryption oracle before and after selecting messages.
- Chosen-ciphertext attack, CCA1: The adversary has access to an encryption oracle and to a decryption oracle before selecting the messages. [“lunch-time attack”, Naor and Yung]
- Chosen-ciphertext attack, CCA2: This is just like a CCA1 attack except that the adversary also has access to the decryption oracle after selecting the messages. It is not allowed to decrypt the challenge ciphertext, however. [Rackoff and Simon]
We formalize CPA-security next (but leave CCA1/CCA2 later in authentication).
Definition: Chosen-Plaintext-Attack Encryption (CPA)
Let $\Pi = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$
be an encryption scheme. For any NUPPT adversary $A$, for any $b \in \{0,1\}$, define the experiment $\mathsf{EXP}^b(1^n, \Pi, A)$ to be: Experiment
$\mathsf{EXP}^b(1^n, \Pi, A)$:
- $k \gets \mathsf{Gen}(1^n)$
- $(m_0, m_1, \mathsf{state}) \gets A^{\mathsf{Enc}_k(\cdot)}(1^n)$
- $c \gets \mathsf{Enc}_k(m_b)$
- Output
$A^{\mathsf{Enc}_k(\cdot)}(c, \mathsf{state})$

Then we say $\Pi$
is CPA-secure if for all NUPPT $A$,

$$\{\mathsf{EXP}^0(1^n, \Pi, A)\}_n \approx \{\mathsf{EXP}^1(1^n, \Pi, A)\}_n.$$
Note: the experiment outputs whatever $A$ outputs, so CPA security says the adversary’s view is essentially the same whether $m_0$ or $m_1$ was encrypted.
Compared to Shannon/perfect secrecy, what are the differences?
- the adversary is computationally bounded
- oracle access before choosing the messages
- oracle access after receiving the challenge
- the adversary chooses the two messages $m_0, m_1$
Suppose that we have a secure encryption even without the CPA oracle, but the key is shorter than the message. Can we get a PRG/PRF? Can we get an OWF?
Theorem: CPA-Secure Encryption from PRF
Let $\{f_s\}_s$
be a family of PRFs with $f_s : \{0,1\}^{|s|} \to \{0,1\}^{|s|}$. Then the following is a CPA-secure encryption scheme.
- $\mathsf{Gen}(1^n)$: sample $k \gets \{0,1\}^n$ and output it.
- $\mathsf{Enc}_k(m)$: given input $m \in \{0,1\}^n$, sample $r \gets \{0,1\}^n$, and then output $c := (r, f_k(r) \oplus m)$.
- $\mathsf{Dec}_k(c)$: given input $c = (r, c')$, output $m := f_k(r) \oplus c'$.
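The scheme can be sketched directly. The PRF `f` below is a stand-in keyed hash (an assumption purely for runnability, not a proven PRF):

```python
import hashlib
import secrets

def f(k: bytes, r: bytes) -> bytes:
    """Stand-in PRF f_k(r): a keyed hash (an assumption, not a proven PRF)."""
    return hashlib.sha256(k + r).digest()[:len(r)]

def gen(n: int) -> bytes:
    return secrets.token_bytes(n)            # k <- {0,1}^n

def enc(k: bytes, m: bytes) -> tuple[bytes, bytes]:
    r = secrets.token_bytes(len(m))          # fresh randomness per message
    pad = f(k, r)
    return r, bytes(p ^ b for p, b in zip(pad, m))   # c = (r, f_k(r) XOR m)

def dec(k: bytes, c: tuple[bytes, bytes]) -> bytes:
    r, body = c
    pad = f(k, r)
    return bytes(p ^ b for p, b in zip(pad, body))   # m = f_k(r) XOR body

k = gen(16)
assert dec(k, enc(k, b"attack at dawn!!")) == b"attack at dawn!!"
```

Note the contrast with the PRG stream: because a fresh $r$ is chosen per message and sent in the clear, the scheme is stateless and handles many messages.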
The correctness and efficiency of the construction follow directly from the PRF. It remains to prove CPA security.
To show that
$\{\mathsf{EXP}^0\}_n$ and $\{\mathsf{EXP}^1\}_n$ are comp. ind., we define hybrid experiments as follows. Hybrid
$H^b(1^n, \Pi, A)$:
- sample a random function $F \gets \mathsf{RF}_n$, and then let $O$ be the following oracle: on query $m$, sample $r \gets \{0,1\}^n$ and output $(r, F(r) \oplus m)$, where
$F$ is sampled uniformly (lazily)
- run the experiment as in $\mathsf{EXP}^b$ with $O$ replacing $\mathsf{Enc}_k$ everywhere (including the challenge), and output $A$’s output

By oracle indistinguishability of
$\{s \gets U_n : f_s\}$ and $\mathsf{RF}_n$ and closure under efficient operations, we have $\mathsf{EXP}^0 \approx H^0$ and $\mathsf{EXP}^1 \approx H^1$. (Notice that the PRF and $\mathsf{RF}$ are oracle ind., but the experiment outputs are comp. ind. as strings.) Hence, it suffices to prove that the ensembles
$\{H^0\}_n$ and $\{H^1\}_n$ are ind. They seem to be identically distributed (as in OTP). However, there is a difference: $A$ gets oracle access to $O$ (before and after choosing $m_0, m_1$), and $O$ could sample the same $r$ in the challenge cipher and in another oracle answer. Fortunately, hitting the same $r$ twice in polynomial time happens with negligible probability. We formally prove $\{H^0\}_n \approx \{H^1\}_n$
next. Define $R$ to be the set of $r$’s sampled by $O$ in the (non-challenge) oracle answers, and let
$r^*$ be the random variable $r$ sampled for the challenge cipher $c$. We want to show that $|\Pr[H^0 = 1] - \Pr[H^1 = 1]|$ is negligible for all NUPPT $A$. Let $E$ be the event $r^* \in R$, and $\neg E$ its complement, for short. We have

$$\Pr[H^b = 1] = \Pr[H^b = 1 \wedge E] + \Pr[H^b = 1 \wedge \neg E],$$

where, conditioned on $\neg E$, the value $F(r^*)$ is uniform and independent of the rest of the experiment, so $F(r^*) \oplus m_b$ acts as a one-time pad and $\Pr[H^0 = 1 \wedge \neg E] = \Pr[H^1 = 1 \wedge \neg E]$. We also have $\Pr[E] \leq |R| / 2^n$, thus

$$\left| \Pr[H^0 = 1] - \Pr[H^1 = 1] \right| \leq 2 \Pr[E] \leq \frac{2|R|}{2^n}.$$

Given that $|R|$
is polynomial in $n$ for any NUPPT $A$, it follows that the difference is negligible in $n$, which concludes the proof.
Notice that we could have constructed an efficient CPA-secure encryption from PRG, but using a PRF significantly simplified the construction and the proof.
Hard-Core Bits from any OWF
So far we do not yet have a candidate construction of a PRG (even with 1-bit expansion). We will next construct a PRG from one-way permutations.
The construction of a PRG relies on two properties of a OWF:
- The output of $f$
must be sufficiently random when the input is uniform; otherwise, if $f$ is (nearly) constant (for most $x$), then we can invert $f$ easily.
- A function with sufficiently random output
can still be easy to invert (such as the identity function). Since $f$ is hard to invert, there must be some bits of $x$ that are hard to guess when $f(x)$ is given. How many bits are hard to guess for any polynomial-time adversary? Must be $\omega(\log n)$: otherwise an adversary could guess them all with polynomial probability.
Suppose
Definition: One-way Permutations
An OWF $f : \{0,1\}^n \to \{0,1\}^n$
for all $n \in \mathbb{N}$ is called a one-way permutation if $f$ is a bijection.
Definition: Hard-core Bits
A predicate $h : \{0,1\}^* \to \{0,1\}$
is a hard-core predicate for $f(\cdot)$ if $h$ is efficiently computable given $x$, and for any NUPPT adversary $A$, there exists a negligible $\epsilon$ so that for all $n \in \mathbb{N}$,

$$\Pr[x \gets \{0,1\}^n : A(1^n, f(x)) = h(x)] \leq \frac{1}{2} + \epsilon(n).$$
This is indeed the case for some OWPs, such as RSA. If we construct a OWP from the RSA assumption, then the least significant bit of $x$ is a hard-core predicate for it (shown by a separate proof).
Theorem: PRG from OWP and hard-core predicate
Suppose that $f : \{0,1\}^n \to \{0,1\}^n$
is a OWP and $h$ is a hard-core predicate for $f$ (for all $n \in \mathbb{N}$). Then $G : \{0,1\}^n \to \{0,1\}^{n+1}$, defined below, is a PRG:

$$G(s) := f(s) \| h(s).$$

(The proof is a standard reduction: if there exists a NUPPT distinguisher
$D$ against $G$, then we can build a NUPPT adversary $A$ that predicts $h(s)$ given $f(s)$ by running $D$, contradicting that $h$ is hard-core for $f$.)
However, we want to obtain a PRG from any OWP or any OWF (without depending on specific assumptions). Whether every OWF has a hard-core predicate is unfortunately unclear.
Fortunately, Goldreich and Levin showed that for any OWF (or OWP) $f$, a slightly modified function $f'$ is still a OWF (resp. OWP) and has a hard-core predicate.
Theorem: Goldreich-Levin, Hard-Core Lemma
Let $f : \{0,1\}^n \to \{0,1\}^n$
for all $n \in \mathbb{N}$ be a OWF. Define functions $f' : \{0,1\}^{2n} \to \{0,1\}^{2n}$ and $h : \{0,1\}^{2n} \to \{0,1\}$ to be the following:

$$f'(x, r) := (f(x), r), \qquad h(x, r) := \langle x, r \rangle,$$

where $\langle x, r \rangle$
denotes the inner product modulo 2, i.e., $\langle x, r \rangle := \left(\sum_i x_i r_i\right) \bmod 2$ for any $x, r \in \{0,1\}^n$. Then, $f'$ is a OWF and $h$ is a hard-core predicate for $f'$.
Note: in the above definition of $f'$, the second half $r$ of the input is output in the clear; only $x$ is hidden by $f$.
Clearly $f'$ is a OWF (an inverter for $f'$ yields an inverter for $f$). It remains to show that $h$ is hard-core; assume for contradiction that it is not:
Full Assumption:
There exists NUPPT
$A$, polynomial $p$, such that for inf many $n$,

$$\Pr[x \gets \{0,1\}^n,\ r \gets \{0,1\}^n : A(f(x), r) = \langle x, r \rangle] \geq \frac{1}{2} + \frac{1}{p(n)}.$$
The construction and analysis of the inverter are involved, so we begin with two warmup assumptions that are stronger than the Full Assumption.
Warmup Assumption 1:
There exists NUPPT
$A$, such that for inf many $n$, for all $x \in \{0,1\}^n$,

$$\Pr[r \gets \{0,1\}^n : A(f(x), r) = \langle x, r \rangle] = 1.$$
To invert
$y = f(x)$, the construction of $B$ is simple:
- For
$i = 1, 2, \dots, n$, do the following
- Let $e^i$
be the $n$-bit string in which only the $i$-th bit is 1 (0 otherwise)
- Run
$x'_i := A(y, e^i)$
- Output
$x' := x'_1 x'_2 \cdots x'_n$

To see why $B$ inverts $y$, observe that $\langle x, e^i \rangle = x_i$, where $x_i$ is the $i$-th bit of $x$. Hence, $B$ succeeds w.p. 1, a contradiction.
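The warmup inverter is simple enough to simulate: below, a perfect inner-product oracle plays the role of $A(f(x), \cdot)$, and the inverter recovers $x$ bit by bit from the unit vectors (all names are ours, for illustration):

```python
def inner_product(x: list[int], r: list[int]) -> int:
    """<x, r> mod 2."""
    return sum(xi * ri for xi, ri in zip(x, r)) % 2

def invert_with_perfect_oracle(n: int, oracle) -> list[int]:
    """Recover x bit by bit, using <x, e^i> = x_i for the i-th unit vector e^i."""
    x_prime = []
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]
        x_prime.append(oracle(e_i))
    return x_prime

x = [1, 0, 1, 1, 0, 0, 1, 0]                # the hidden input
oracle = lambda r: inner_product(x, r)      # plays the role of A(f(x), .)
assert invert_with_perfect_oracle(len(x), oracle) == x
```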
Note: the above assumed “for all $x$” and success probability exactly 1, which is much stronger than the Full Assumption; we relax it next.
Warmup Assumption 2:
There exists NUPPT
$A$, polynomial $p$, such that for inf many $n$,

$$\Pr[x \gets \{0,1\}^n,\ r \gets \{0,1\}^n : A(f(x), r) = \langle x, r \rangle] \geq \frac{3}{4} + \frac{1}{p(n)}.$$
We would like to use $e^i$
as before, but now $A$ may always fail whenever the query is a unit vector such as $e^i$. Hence, we randomize each query to the pair $r$ and $r \oplus e^i$ and then recover the inner product (this is also called “self-correction”). Fact
For all $i$
and any strings $x, r \in \{0,1\}^n$, it holds that $\langle x, r \rangle \oplus \langle x, r \oplus e^i \rangle = \langle x, e^i \rangle = x_i$. To invert
$y = f(x)$, the construction of $B$ is below:
- For each
$i = 1, 2, \dots, n$, do
- For
$j = 1$ to $m$ (a parameter fixed below), do
- Run
$r \gets \{0,1\}^n$; $g_j := A(y, r) \oplus A(y, r \oplus e^i)$
- Let $x'_i$
be the majority of $g_1, \dots, g_m$
- Output
$x' := x'_1 x'_2 \cdots x'_n$
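The self-correction Fact used above is easy to check exhaustively for small $n$: XORing the two inner products cancels everything except the $i$-th coordinate (`ip` is a hypothetical helper name):

```python
from itertools import product

def ip(x, r):
    """<x, r> mod 2."""
    return sum(a * b for a, b in zip(x, r)) % 2

n = 4
for x in product([0, 1], repeat=n):
    for r in product([0, 1], repeat=n):
        for i in range(n):
            e_i = tuple(1 if j == i else 0 for j in range(n))
            r_flip = tuple(a ^ b for a, b in zip(r, e_i))
            # <x, r> XOR <x, r XOR e^i> = <x, e^i> = x_i
            assert ip(x, r) ^ ip(x, r_flip) == x[i]
```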
To prove that $B$
succeeds with high prob., we first prove that there are many good $x$’s. Good instances are plenty.
Define $G_n$
to be the set of good instances, where

$$G_n := \left\{ x \in \{0,1\}^n : \Pr[r \gets \{0,1\}^n : A(f(x), r) = \langle x, r \rangle] \geq \frac{3}{4} + \frac{1}{2p(n)} \right\}.$$

If Warmup Assumption 2 holds, then $|G_n| \geq 2^n / (2p(n))$. (This is actually a standard averaging argument or a Markov ineq.) Suppose not, i.e.,
$|G_n| < 2^n / (2p(n))$. Then

$$\Pr_{x, r}\left[A(f(x), r) = \langle x, r \rangle\right] < \frac{1}{2p(n)} \cdot 1 + 1 \cdot \left( \frac{3}{4} + \frac{1}{2p(n)} \right) = \frac{3}{4} + \frac{1}{p(n)},$$

which contradicts Warmup Assumption 2.
Now, suppose that
$x \in G_n$. $A$ fails to compute $\langle x, r \rangle$ or $\langle x, r \oplus e^i \rangle$ w.p. at most $2(1/4 - 1/(2p(n))) = 1/2 - 1/p(n)$ by the union bound. So, for any fixed $i$, $g_j = x_i$ w.p. at least $1/2 + 1/p(n)$, for each $j$ independently. By the Chernoff bound, the majority of $g_1, \dots, g_m$ is $x_i$ w.p. $1 - e^{-\Omega(m/p(n)^2)}$. Choosing $m := n \cdot p(n)^2$, the probability is exponentially close to 1. By a union bound over all $i$, $B$ recovers $x$ w.p. close to 1. Finally, $B$
succeeds w.p. roughly $1/(2p(n))$ over a uniformly sampled $x$ (even assuming it fails for all $x \notin G_n$), contradicting that $f$ is a OWF.
To complete the full proof, we want to lower the success probability of $A$ from $3/4 + 1/p(n)$ to $1/2 + 1/p(n)$. The challenge:
- The union bound over both queries
$A(y, r)$ and $A(y, r \oplus e^i)$: for a good $x$ with success probability $1/2 + 1/(2p(n))$, the probability that both answers are correct lowers to only $1/p(n)$, and that is too low for the following majority and Chernoff bound.
The first idea is to guess the inner product $b = \langle x, r \rangle$ uniformly at random instead of querying $A$ on $r$, and to compute $g := b \oplus A(y, r \oplus e^i)$; each guess is correct w.p. $1/2$, so we cannot afford many independent guesses.
The second idea is to use pairwise independent guesses. In particular, we have Chebychev’s bound for the measure concentration of pairwise indep. r.v.s (instead of the Chernoff bound for fully indep. ones).
Theorem: Chebychev’s inequality
Let $X_1, \dots, X_m$
be pairwise independent random variables such that for all $j$, $0 \leq X_j \leq 1$. Then,

$$\Pr\left[ \left| \sum_{j=1}^m X_j - \mu \right| \geq \delta m \right] \leq \frac{1}{4 \delta^2 m},$$

where
$\mu := E\left[\sum_j X_j\right]$ (pairwise independence makes the variances add, and each variance is at most $1/4$). [Ps, p189]
We can then reduce the number of guessed bits from $m$ fully independent ones to about $\log m$: from $\log m$ truly random strings we can derive $m$ pairwise independent ones.
Fact: Sampling pairwise independent random strings
For any $m \in \mathbb{N}$,
let $u_1, u_2, \dots, u_k \in \{0,1\}^n$ be strings independently sampled uniformly at random, where $k := \log(m+1)$ (we abuse notation and round up to the next integer). Define strings $r_I$ for each nonempty subset $I \subseteq \{1, \dots, k\}$ to be

$$r_I := \bigoplus_{j \in I} u_j.$$

The random variables
$\{r_I\}_I$ are pairwise independent, where $r_i$ denotes $r_{I_i}$ such that $I_i$ is the $i$-th nonempty subset of $\{1, \dots, k\}$. (The proof is left as exercise.)
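The subset-XOR construction in the Fact can be written out; for tiny parameters, pairwise independence can even be checked exhaustively (strings are modeled as ints, and `derive` is a hypothetical helper name):

```python
from collections import Counter
from itertools import product

def derive(u: list[int], k: int) -> list[int]:
    """From u_1..u_k, derive r_I := XOR of u_j over j in I, for each nonempty
    subset I of {1..k}. XOR of ints is the bitwise XOR of the bit strings."""
    rs = []
    for mask in range(1, 2 ** k):      # each mask encodes a nonempty subset I
        r = 0
        for j in range(k):
            if mask & (1 << j):
                r ^= u[j]
        rs.append(r)
    return rs

# Exhaustive check of pairwise independence for k = 2 and 1-bit strings:
# every pair (r_I, r_J) with I != J is uniform over {0,1}^2 as u varies.
for a in range(3):
    for b in range(a + 1, 3):
        counts = Counter()
        for u in product([0, 1], repeat=2):
            rs = derive(list(u), 2)
            counts[(rs[a], rs[b])] += 1
        assert len(counts) == 4 and all(c == 1 for c in counts.values())
```

Note that triples such as $(r_{\{1\}}, r_{\{2\}}, r_{\{1,2\}})$ are *not* mutually independent (the third is determined by the first two), which is exactly why only Chebychev, not Chernoff, applies.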
Now we are ready to prove the full theorem.
Proof of Hard-Core Lemma (Goldreich-Levin, 1989)
Given NUPPT
$A$ in the Full Assumption, we construct $B$ that inverts $y = f(x)$ as follows. Algorithm $B(y)$:
- Let
$m := 2n \cdot p(n)^2$ and $k := \lceil \log(m+1) \rceil$. Sample $u_1, \dots, u_k \gets \{0,1\}^n$ fully independently, and let $r_1, \dots, r_m$ be the pairwise independent $n$-bit random strings derived as in the Fact of pairwise indep.
- For each
$j \in \{1, \dots, k\}$, sample a guess bit $b_j \gets \{0,1\}$ uniformly (a guess of $\langle x, u_j \rangle$). For each nonempty $I \subseteq \{1, \dots, k\}$, compute the bit $b_I := \bigoplus_{j \in I} b_j$ from the $b_j$'s in the same way as $r_I$ (so that if $b_j = \langle x, u_j \rangle$ for all $j$, then $b_I = \langle x, r_I \rangle$ for all $I$).
- For each
$i \in \{1, \dots, n\}$,
- For each nonempty
$I \subseteq \{1, \dots, k\}$,
- Run
$g_{I,i} := b_I \oplus A(y, r_I \oplus e^i)$
- Let $x'_i$
be the majority of $\{g_{I,i}\}_I$
- Output
$x' := x'_1 x'_2 \cdots x'_n$

We begin with claiming the number of good instances of
$x$. Good instances are plenty.
Define $G_n$
to be the set of good instances, where

$$G_n := \left\{ x \in \{0,1\}^n : \Pr[r \gets \{0,1\}^n : A(f(x), r) = \langle x, r \rangle] \geq \frac{1}{2} + \frac{1}{2p(n)} \right\}.$$

If the Full Assumption holds, then $|G_n| \geq 2^n / (2p(n))$. (The proof is almost the same as before and omitted.)
We condition on the good event that
$x \in G_n$. Next, we condition on the “lucky event” that the guess $b_j$ equals $\langle x, u_j \rangle$ for all $j \in \{1, \dots, k\}$, which happens w.p. $2^{-k} \geq 1/(2(m+1))$. That implies the bits $b_I = \langle x, r_I \rangle$ are all correct. With the conditioning, for any $i$, the strings $r_I, r_{I'}$ are still pairwise indep., and thus the indicators $X_I := \mathbf{1}[A(y, r_I \oplus e^i) = \langle x, r_I \oplus e^i \rangle]$ are pairwise indep. as well, each with expectation at least $1/2 + 1/(2p(n))$. Therefore, by Chebychev’s ineq., the majority of $\{g_{I,i}\}_I$ equals $x_i$ w.p. at least

$$1 - \frac{1}{4 \delta^2 m} = 1 - \frac{p(n)^2}{m},$$

where
$\delta := 1/(2p(n))$, and let $C_i$ denote the event that the majority outputs $x_i$ correctly. Choosing $m := 2n \cdot p(n)^2$, we have $\Pr[\neg C_i] \leq 1/(2n)$. Taking a union bound over all $i$, $\Pr[x' = x] \geq 1/2$, conditioning on $x \in G_n$ and all $b_j$’s being correct. Removing the conditional events* costs factors $1/(2p(n))$ and $2^{-k}$, but $B$ still inverts $f$ w.p. at least

$$\frac{1}{2} \cdot \frac{1}{2(m+1)} \cdot \frac{1}{2p(n)},$$

which is non-negligible, contradicting that $f$ is a OWF. (*For any events
$E_1, E_2, E_3$, $\Pr[E_1] \geq \Pr[E_1 \mid E_2 \wedge E_3] \cdot \Pr[E_2 \wedge E_3]$, where $E_1$ is “$B$ inverts $y$”, $E_2$ is “$x \in G_n$”, $E_3$ is “all $b_j$’s are correct”.)
Discuss: the number of bits we guessed is only $k \approx \log m = O(\log n)$, so all guesses are correct simultaneously w.p. about $1/m$, which is polynomial; this is exactly why we replaced fully independent samples with pairwise independent ones.
Discuss How far does the Hard-core Lemma extend to? Suppose
- Is
a OWF? - Let
, and let . Is a OWF? If so, is a hard-core predicate for ? - Let
, and let . Is a OWF? If so, is a hard-core predicate for ?
The questions are highly relevant when we want to construct PRG from any one-way function.