Indistinguishability and Pseudo-Randomness

Recall the perfectly secure encryption, OTP. That is, we bitwise-XOR our message with a uniform random string.

$$m \oplus k, \quad |m| = |k|.$$

OTP is inefficient because the long random string must be shared between Alice and Bob in advance. Suppose that we have a (mathematical, deterministic) function that can extend a short truly random string to a long “random-looking” string. We can then use the random-looking string to encrypt messages as in OTP, while sharing only the short string, which is efficient.

Is that possible? How to formally define “random-looking”?

Let $g$ be the above function with short input $x$ and long output $g(x)$. We want Alice and Bob to share the same $g(x)$ so that they decrypt correctly, so $g$ must be deterministic. Mathematically, we have a definition of the distance between two probability distributions. However, whenever $|g(x)| > |x|$, the output distribution of $g$ is far from uniform in that sense, since $g$ has at most $2^{|x|}$ distinct outputs. So “random-looking” is the best we can hope for.

$$m \oplus g(s), \quad |m| = |g(s)|, \text{ but } |s| \ll |m|.$$
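The idea of encrypting with $m \oplus g(s)$ can be sketched in a few lines of Python. This is only an illustration: the function `g` below is a hypothetical stand-in built from SHAKE-256, chosen here for convenience and not taken from these notes, and nothing in this sketch proves that it is pseudorandom.

```python
import hashlib
import secrets


def g(seed: bytes, out_len: int) -> bytes:
    """Hypothetical stand-in for g: stretch a short seed to out_len bytes.

    SHAKE-256 is an assumption made for this sketch, not a proven PRG.
    """
    return hashlib.shake_256(seed).digest(out_len)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(seed: bytes, message: bytes) -> bytes:
    # c = m XOR g(s), where |g(s)| = |m| but |s| is much shorter than |m|.
    return xor_bytes(message, g(seed, len(message)))


# XORing with the same pad a second time recovers the message.
decrypt = encrypt

seed = secrets.token_bytes(16)  # short shared secret
message = b"attack at dawn, then retreat at dusk. " * 4
ciphertext = encrypt(seed, message)
recovered = decrypt(seed, ciphertext)
```

Both parties only need to share the 16-byte seed, because $g$ is deterministic and both can recompute the same pad.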

We will introduce computational indistinguishability, and then define pseudo-random generator (PRG) and pseudo-random function (PRF).

Computational Indistinguishability

Key Idea: If we have no way to show the difference, then we are satisfied. We call it indistinguishability.

Example: the Turing test. When a machine and a human are indistinguishable under every human’s prompts, we call the machine AI.

Observation: they are not the same, not even close in any sense; however, the distinguisher “another human” can not tell the difference due to a limited power.

Concept: we say a distribution is pseudorandom if no efficient algorithm can distinguish it from a (truly) uniform distribution.

We will formalize the concept asymptotically.

Definition: Ensembles of Probability Distributions

A sequence $\{X_n\}_{n \in \mathbb{N}}$ is called an ensemble if for each $n \in \mathbb{N}$, $X_n$ is a probability distribution over $\{0,1\}^*$. (We often write $X = \{X_n\}_n$ when the context is clear.)

E.g., supposing $X_n$ is a distribution over $n$-bit strings for all $n \in \mathbb{N}$, $\{X_n\}_{n \in \mathbb{N}}$ is an ensemble.

Definition: Computational Indistinguishability

Let $X = \{X_n\}_n$ and $Y = \{Y_n\}_n$ be ensembles where $X_n, Y_n$ are distributions over $\{0,1\}^{\ell(n)}$ for some polynomial $\ell(\cdot)$. We say that $X$ and $Y$ are computationally indistinguishable (denoted by $X \approx Y$) if for all NUPPT $D$ (called the “distinguisher”), there exists a negligible function $\epsilon$ such that $\forall n \in \mathbb{N}$,

$$\left| \Pr[t \gets X_n, D(t) = 1] - \Pr[t \gets Y_n, D(t) = 1] \right| < \epsilon(n).$$

Note:

  • “=1” is a convention in literature
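  • the absolute value is not necessary because of “for all $D$”: if the difference is negative for $D$, it is positive for the distinguisher that negates the output of $D$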

This definition requires the two distributions to pass all efficient statistical tests, which include the following.

  • Roughly as many 0 as 1.
  • Roughly as many 01 as 10.
  • Each sequence of bits occurs with roughly the same probability.
  • Given any prefix, guessing the next bit correctly happens with roughly the same probability.
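As a concrete instance of the first test in the list, here is a minimal sketch of a distinguisher that outputs 1 when the fraction of ones is far from 1/2. The threshold, the sample length, and the 60% bias of the second source are arbitrary choices made for this illustration.

```python
import random


def fraction_of_ones(bits):
    """Fraction of positions that hold a 1."""
    return sum(bits) / len(bits)


def monobit_distinguisher(bits, threshold=0.05):
    """Output 1 if the fraction of ones deviates from 1/2 by more than threshold."""
    return 1 if abs(fraction_of_ones(bits) - 0.5) > threshold else 0


rng = random.Random(0)  # fixed seed so the sketch is reproducible
uniform_sample = [rng.randrange(2) for _ in range(10_000)]
biased_sample = [1 if rng.random() < 0.6 else 0 for _ in range(10_000)]

verdict_uniform = monobit_distinguisher(uniform_sample)
verdict_biased = monobit_distinguisher(biased_sample)
```

A pseudorandom distribution must fool this test and every other efficient test at once, which is what the “for all $D$” quantifier demands.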

Lemma: Closure Under Efficient Operations

If the pair of ensembles $\{X_n\}_n \approx \{Y_n\}_n$, then for any NUPPT $M$, $\{M(X_n)\}_n \approx \{M(Y_n)\}_n$.

(By standard reduction)

Examples:

  1. $M$ ignores its input. Clearly, $M(X_n) \equiv M(Y_n)$ for all $n$.
  2. $M$ is identity, i.e., its output is exactly the input. $\{M(X_n) = X_n\}_n \approx \{M(Y_n) = Y_n\}_n$.
  3. $M$ outputs the first half of the input, i.e., $M(x) := x[1, \ldots, |x|/2]$.

Lemma: Hybrid Lemma

Let $X^{(1)}, X^{(2)}, \ldots, X^{(m)}$ be a sequence of probability distributions. Assume that the machine $D$ distinguishes $X^{(1)}$ and $X^{(m)}$ with probability $p$. Then there exists some $i \in \{1, \ldots, m-1\}$ s.t. $D$ distinguishes $X^{(i)}$ and $X^{(i+1)}$ with probability at least $p/m$.

(By the triangle inequality.)
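The triangle-inequality step can be spelled out as follows. The shorthand $q_i := \Pr[t \gets X^{(i)} : D(t) = 1]$ is introduced here only for this derivation.

$$p \le |q_1 - q_m| = \left| \sum_{i=1}^{m-1} (q_i - q_{i+1}) \right| \le \sum_{i=1}^{m-1} |q_i - q_{i+1}| \le (m-1) \cdot \max_{i \in \{1, \ldots, m-1\}} |q_i - q_{i+1}|.$$

Hence some adjacent pair has gap at least $p/(m-1) \ge p/m$.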

Notice that this lemma applies to distributions, not ensembles. Fortunately, it implies the following.

Corollary:

For any ensembles $X, Y, Z$, if $X \approx Y$ and $Y \approx Z$, then it follows that $X \approx Z$.

(left as exercise)

Discuss: If the number of hybrid distributions (between two ensembles) depends on the size $n$ (of the distributions in the ensembles), applying the above corollary is tricky. Consider two ensembles $X = \{X_n\}_n, Y = \{Y_n\}_n$, suppose that the machine $D$ distinguishes $X, Y$ w.p. $p(n)$ (which depends on $n$), and suppose that the sequence $(X_n = X^{(1)}_n, \ldots, X^{(m(n))}_n = Y_n)$ consists of $m(n)$ distributions, where $m$ depends on $n$. Then we cannot define $m(n)$ ensembles between $X$ and $Y$, because the length of the sequence depends on $n$. This is indeed the case when we have many hybrids, e.g., going from an $(n+1)$-bit PRG to a $2n$-bit PRG.

There are two ways to treat this case, the formal one and the more popular one. In the formal way, we assume for contradiction that there exist $D$ and $p(n)$ s.t. for infinitely many $n \in \mathbb{N}$, $D$ distinguishes $(X_n, Y_n)$ w.p. at least $p(n)$; we then construct a reduction $B$ that guesses an index $j \in [m(n)-1]$, hoping that $j = i$, where $i$ is the index given by the Hybrid Lemma, so that $B$ runs $D$ to distinguish and solve the challenge specified by the $j$-th hybrid. The popular way is less rigorous but more intuitive: we just claim that the two distributions $X^{(j)}_n, X^{(j+1)}_n$ are “indistinguishable” for each $j, n$, and thus $X_n, Y_n$ are “indistinguishable”; this is informal because fixing any $j$ means that $n$ is also fixed and $X_n, Y_n$ are two distributions (not ensembles), whereas indistinguishability is defined asymptotically on ensembles.

Example: Let $X, Y, Z, Z'$ be ensembles. Suppose that $(X, Z) \approx (X, Z')$ and $(Y, Z) \approx (Y, Z')$. Does $(X, Y, Z) \approx (X, Y, Z')$?

Lemma: Prediction Lemma

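Let $\{X^0_n\}_n$ and $\{X^1_n\}_n$ be two ensembles where $X^0_n, X^1_n$ are probability distributions over $\{0,1\}^{\ell(n)}$ for some polynomial $\ell(\cdot)$, and let $D$ be a NUPPT machine that distinguishes between $\{X^0_n\}_n$ and $\{X^1_n\}_n$ with probability $p(\cdot)$ for infinitely many $n \in \mathbb{N}$. Then there exists a NUPPT $A$ such that

$$\Pr[b \gets \{0,1\}, t \gets X^b_n : A(t) = b] \ge \frac{1}{2} + \frac{p(n)}{2}$$

for infinitely many $n \in \mathbb{N}$.

(Proof sketch: remove the absolute value in the definition of computational indistinguishability by negating the distinguisher $D$ if needed, and then apply a standard probability calculation.)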
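The probability calculation can be written out as below, under the assumption (made here without loss of generality, by negating $D$ where needed) that $\Pr[t \gets X^1_n : D(t) = 1] - \Pr[t \gets X^0_n : D(t) = 1] \ge p(n)$, and taking $A(t) := D(t)$.

$$\begin{aligned}
\Pr[b \gets \{0,1\}, t \gets X^b_n : A(t) = b]
&= \frac{1}{2} \Pr[t \gets X^1_n : D(t) = 1] + \frac{1}{2} \Pr[t \gets X^0_n : D(t) = 0] \\
&= \frac{1}{2} \Pr[t \gets X^1_n : D(t) = 1] + \frac{1}{2} \left(1 - \Pr[t \gets X^0_n : D(t) = 1]\right) \\
&= \frac{1}{2} + \frac{1}{2} \left(\Pr[t \gets X^1_n : D(t) = 1] - \Pr[t \gets X^0_n : D(t) = 1]\right) \\
&\ge \frac{1}{2} + \frac{p(n)}{2}.
\end{aligned}$$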

Note: the converse is easier to prove. Hence, prediction and distinguishing are essentially equivalent.

Pseudo-Random Generator

Definition: Pseudo-random Ensembles.

The probability ensemble $\{X_n\}_n$, where $X_n$ is a probability distribution over $\{0,1\}^{l(n)}$ for some polynomial $l(\cdot)$, is said to be pseudorandom if $\{X_n\}_n \approx \{U_{l(n)}\}_n$, where $U_m$ is the uniform distribution over $\{0,1\}^m$.

Note:

  • this definition says that a pseudorandom distribution must pass all efficiently computable tests that the uniform distribution would have passed.
  • it is hard to check or prove if a distribution is pseudorandom (due to the “for all” quantifier from comp. ind.)

Definition: Pseudo-random Generators.

A function $g : \{0,1\}^* \to \{0,1\}^*$ is a Pseudo-random Generator (PRG) if the following holds.

  1. (efficiency): $g$ can be computed in PPT.
  2. (expansion): $|g(x)| > |x|$
  3. The ensemble $\{x \gets U_n : g(x)\}_n$ is pseudorandom.

We sometimes say that the expansion of PRG $g$ is $t$ if $|g(x)| - |x| \ge t$ for all $x$.

Example: if $g : \{0,1\}^n \to \{0,1\}^{n+1}$ for all $n$ is a PRG, then $g$ is a OWF. (The proof is left as an exercise. Why is expansion necessary?)

Lemma: Expansion of a PRG

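Let $g : \{0,1\}^n \to \{0,1\}^{n+1}$ be a PRG for all $n \in \mathbb{N}$. For any polynomial $\ell(n) > n$, define $g' : \{0,1\}^n \to \{0,1\}^{\ell(n)}$ as follows:

$$g'(s) \gets b_1 b_2 \ldots b_\ell,$$

where $\ell := \ell(|s|)$, $x_0 \gets s$, $x_{i+1} \| b_{i+1} \gets g(x_i)$. Then $g'$ is a PRG.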
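The iteration that defines the expanded generator can be sketched as follows. Bit strings are represented as Python strings of `0`/`1` characters, and the one-bit-expanding `g` is a toy stand-in built from SHAKE-256, an assumption made only so that the sketch runs; it is not a proven PRG.

```python
import hashlib


def g(x: str) -> str:
    """Toy stand-in for a PRG with 1-bit expansion: n bits in, n+1 bits out.

    Built from SHAKE-256 purely for illustration (an assumption of this sketch).
    """
    n = len(x)
    digest = hashlib.shake_256(x.encode()).digest((n + 8) // 8)
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[: n + 1]


def g_expanded(s: str, ell: int) -> str:
    """Iterate g: x_0 = s, (x_{i+1} || b_{i+1}) = g(x_i); output b_1 ... b_ell."""
    x = s
    output_bits = []
    for _ in range(ell):
        y = g(x)
        x, b = y[:-1], y[-1]  # first n bits become the next state, last bit is output
        output_bits.append(b)
    return "".join(output_bits)


seed = "1011001110001111"
stream = g_expanded(seed, 40)
```

Each iteration keeps an $n$-bit state and emits one bit, so a longer output only requires more iterations, and shorter outputs are prefixes of longer ones.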

Proof, warmup:

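Suppose that $\ell = 2$, so there is no expansion, but we still want to show pseudorandomness. Define distributions

$$H^0_n := g'(s), \quad H^1_n := U_1 \| g(s)[n+1], \quad H^2_n := U_2$$

for $n \in \mathbb{N}$, and define $H^i := \{H^i_n\}_n$ for $i = 0, 1, 2$. Since $g'(s) = g(s)[n+1] \| g(g(s)[1 \ldots n])[n+1]$, by $g(s) \approx U_{n+1}$ and closure, we have $H^0 \approx H^1$. Since $g(x)$ is pseudorandom, by closure we have $g(U_n)[n+1] \approx U_1$, which implies $H^1 \approx H^2$. By the corollary of the Hybrid Lemma, we have $H^0 \approx H^2$.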

Proof of PRG Expansion

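It is slightly tricky when $\ell$ depends on $n$. Define the prefix $h$ and last bit $s$ of iterating $g$ as:

$$h_i(x) := \begin{cases} g(x)[1 \ldots n] & i = 1, \\ g(h_{i-1}(x))[1 \ldots n] & i > 1 \end{cases}$$

and

$$s_i := s_i(x) := \begin{cases} g(x)[n+1] & i = 1, \\ g(h_{i-1}(x))[n+1] & i > 1. \end{cases}$$

We have $g'(x) = s_1 s_2 \ldots s_\ell$, and we want to prove that it is pseudorandom through the Hybrid Lemma. Given $n$, define hybrid distributions $H_0 := g'(x)$, $H_\ell := U_\ell$, and define $H_i$ for $i = 1, \ldots, \ell-1$ as

$$H_i := U_i \| s_1 \ldots s_{\ell(n)-i},$$

where $U_i$ denotes sampling an $i$-bit string uniformly at random. Observe that for each $i = 0, 1, \ldots, \ell-1$, $H_i$ and $H_{i+1}$ differ by one evaluation of $g(x)$, that is,

$$H_{i+1} = U_i \| U_1 \| s_1 \ldots s_{\ell-i-1}, \text{ and } H_i = U_i \| s_1 s_2 \ldots s_{\ell-i} = U_i \| g(x)[n+1] \| s_1(g(x)[1 \ldots n]) \ldots s_{\ell-i-1}(g(x)[1 \ldots n])$$

for all $i = 0, 1, \ldots, \ell-1$.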

Assume for contradiction (AC) that there exist a NUPPT $D$ and a polynomial $p(n)$ s.t. for infinitely many $n \in \mathbb{N}$, $D$ distinguishes $\{x \gets \{0,1\}^n : g'(x)\}_n$ and $U_{\ell(n)}$ w.p. at least $1/p(n)$. The intuition is to apply the Hybrid Lemma so that there exists $j$ such that $H_j, H_{j+1}$ are distinguishable, and thus by the Closure Lemma $g(x)$ is distinguishable from uniform.

We prove it formally by constructing $D'$ that aims to distinguish $g(x)$. Given input $t \in \{0,1\}^{n+1}$, $D'$ performs:

  1. Sample $i \gets \{0, \ldots, \ell-1\}$ (where $\ell \gets \ell(n)$)
  2. $t_0 \gets U_i$, $t_1 \gets t[n+1]$, and $t_2 \gets s_1(t[1 \ldots n]) \| s_2(t[1 \ldots n]) \| \ldots \| s_{\ell-i-1}(t[1 \ldots n])$
  3. output $D(t_0 \| t_1 \| t_2)$

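To show that $D'$ succeeds with non-negligible probability, we partition the event as follows:

$$\begin{aligned}
& \Pr_{t \gets U_{n+1}, i}[D'(t) = 1] - \Pr_{x \gets U_n, i}[D'(g(x)) = 1] \\
={} & \sum_{j=0}^{\ell-1} \left( \Pr_{t, i}[D'(t) = 1 \cap i = j] - \Pr_{x, i}[D'(g(x)) = 1 \cap i = j] \right) \\
={} & \sum_{j=0}^{\ell-1} \left( \Pr_{t, i}[D'(t) = 1 | i = j] - \Pr_{x, i}[D'(g(x)) = 1 | i = j] \right) \cdot \Pr[i = j] \\
={} & \frac{1}{\ell} \sum_{j=0}^{\ell-1} \left( \Pr_{t, i}[D'(t) = 1 | i = j] - \Pr_{x, i}[D'(g(x)) = 1 | i = j] \right)
\end{aligned}$$

where the random variable $i \gets \{0, 1, \ldots, \ell-1\}$ is sampled exactly the same as in $D'$.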

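Notice that conditioned on $i = j$ for any fixed $j$, the distribution $t_0 \| t_1 \| t_2$ (given to $D$) is identical to

$$\begin{cases} H_{j+1} & \text{if } t \gets \{0,1\}^{n+1} \\ H_j & \text{if } x \gets \{0,1\}^n, t \gets g(x). \end{cases}$$

That implies

$$\Pr_{t, i}[D'(t) = 1 | i = j] = \Pr[t' \gets H_{j+1} : D(t') = 1], \quad \Pr_{x, i}[D'(g(x)) = 1 | i = j] = \Pr[t' \gets H_j : D(t') = 1].$$

We thus have the summations cancelling out (a telescoping sum),

$$\begin{aligned}
& \Pr_{t \gets U_{n+1}, i}[D'(t) = 1] - \Pr_{x \gets U_n, i}[D'(g(x)) = 1] \\
={} & \frac{1}{\ell} \sum_{j=0}^{\ell-1} \left( \Pr_{t' \gets H_{j+1}}[D(t') = 1] - \Pr_{t' \gets H_j}[D(t') = 1] \right) \\
={} & \frac{1}{\ell} \left( \Pr_{t' \gets H_\ell}[D(t') = 1] - \Pr_{t' \gets H_0}[D(t') = 1] \right) \ge \frac{1}{\ell} \cdot \frac{1}{p(n)},
\end{aligned}$$

where the last inequality follows by (AC). That is, $D'$ distinguishes $g(x)$ w.p. at least $\frac{1}{\ell(n) p(n)}$, contradicting that $g$ is a PRG.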

Discuss: In the above, we proved it formally and preserved the uniformity (if $D$ is a uniform TM, then $D'$ is also uniform). We did not apply the Hybrid Lemma (and no triangle inequality), nor did we use the Closure Lemma. Alternatively, after (AC), one may apply the Hybrid Lemma, which claims that there exists $j$ s.t. $H_j$ is distinguishable from $H_{j+1}$ w.p. at least $1/(\ell p)$, and then hardwire $j$ into $D'$ in order to distinguish $g(x)$. This would make $D'$ non-uniform because $j$ would depend on each $n$ and we would not have an efficient way to find $j$.

We proved above that a PRG with 1-bit expansion is sufficient to build a PRG with any polynomially long expansion. We have not yet given any candidate construction of a PRG (even with 1-bit expansion), but it is useful to first see what we can achieve using PRGs.

Example: Now suppose that we have a PRG $g$ that expands $n$ bits to $\ell(n)$ bits for any polynomial $\ell$. We can construct a computationally secure encryption by sampling the key $k$ as an $n$-bit string and then bitwise XORing $g(k)$ with the message. That is, $m \oplus g(k)$ encrypts one message. We can encrypt more messages by attaching to each message a sequence number, such as $(m_1 \oplus g(k)[1 \ldots n], 1), (m_2 \oplus g(k)[n+1 \ldots 2n], 2)$, and so on.

What’s the downside of the above multi-message encryption?

Pseudo-Random Functions

In order to apply PRGs more efficiently, we construct a tree structure and call the abstraction pseudo-random functions (PRFs). We begin with defining (truly) random function.

Definition: Random Functions

A random function $f : \{0,1\}^n \to \{0,1\}^n$ is a random variable sampled uniformly from the set $\mathsf{RF}_n := \{f : \{0,1\}^n \to \{0,1\}^n\}$.

We can view a random function in two ways. In the combinatorial view, any function $f : \{0,1\}^n \to \{0,1\}^n$ is described by a table of $2^n$ entries, where each entry is the $n$-bit string $f(x)$.

$$f(00 \ldots 00), f(00 \ldots 01), \ldots, f(11 \ldots 11)$$

In the computational view, a random function $f$ is a data structure that on any input $x$ performs the following:

  1. initialize a map $m$ to empty before any query
  2. if $x \notin m$, then sample $y \gets \{0,1\}^n$ and set $m[x] \gets y$
  3. output $m[x]$
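The computational view translates almost directly into code. The sketch below represents $n$-bit strings as integers in $[0, 2^n)$; the class name `RandomFunction` is chosen here for illustration.

```python
import secrets


class RandomFunction:
    """Lazily sampled random function from n-bit strings to n-bit strings."""

    def __init__(self, n: int):
        self.n = n
        self.table = {}  # step 1: the map starts empty before any query

    def __call__(self, x: int) -> int:
        if not 0 <= x < 2 ** self.n:
            raise ValueError("input must be an n-bit value")
        if x not in self.table:
            # step 2: first time x is queried, sample a fresh uniform output
            self.table[x] = secrets.randbits(self.n)
        # step 3: repeated queries return the stored answer
        return self.table[x]


F = RandomFunction(128)
first_answer = F(5)
second_answer = F(5)
```

Only the queried entries are ever stored, so the structure stays small even though the full table would have $2^n$ entries.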

In both views, the random function needs $2^n \cdot n$ bits to describe, and thus there are $2^{n 2^n}$ random functions in $\mathsf{RF}_n$.

Note: the random function $F \gets \mathsf{RF}_n$ is also known as a random oracle in the literature.

Intuitively, a pseudo-random function (PRF) shall look similar to a random function. That is, indistinguishable by any NUPPT Turing machine that is capable of interacting with the function.

Definition: Oracle Indistinguishability

Let $\{O_n\}_{n \in \mathbb{N}}$ and $\{O'_n\}_{n \in \mathbb{N}}$ be ensembles where $O_n, O'_n$ are probability distributions over functions. We say that $\{O_n\}_n$ and $\{O'_n\}_n$ are computationally indistinguishable if for all NUPPT machines $D$ that are given oracle access to a function, there exists a negligible function $\epsilon(\cdot)$ such that for all $n \in \mathbb{N}$,

$$\left| \Pr[F \gets O_n : D^{F(\cdot)}(1^n) = 1] - \Pr[F \gets O'_n : D^{F(\cdot)}(1^n) = 1] \right| \le \epsilon(n).$$

Note: $D^{F(\cdot)}$ denotes that the TM $D$ may interact with the function $F$ through black-box input and output, where each input and output takes time to read/write but computing $F$ takes 0 time.

It is easy to verify that oracle indistinguishability satisfies “closure under efficient operations”, the Hybrid Lemma, and the Prediction Lemma.

Also notice that we can transform a distribution of oracle functions to a distribution of strings using an efficient oracle operation, and in that case, oracle indistinguishability translates into computational indistinguishability of strings (see CPA-secure encryption below).

Definition: Pseudo-random Functions (PRFs)

A family of functions $\{f_s : \{0,1\}^{|s|} \to \{0,1\}^{|s|}\}_{s \in \{0,1\}^*}$ is pseudo-random if

  • (Easy to compute): $f_s(x)$ can be computed by a PPT algorithm that is given input $s, x$.
  • (Pseudorandom): $\{s \gets \{0,1\}^n : f_s\}_n \approx \{F \gets \mathsf{RF}_n : F\}_n$.

Note: similar to PRG, the seed s is not revealed to D (otherwise it is trivial to distinguish).

Theorem: Construct PRF from PRG

If a pseudorandom generator exists, then pseudorandom functions exist.

We have shown that a PRG with 1-bit expansion implies a PRG with any polynomial expansion. So, let $g$ be a length-doubling PRG, i.e., $|g(x)| = 2|x|$. Also, define $g_0, g_1$ to be

$$g_0(x) := g(x)[1 \ldots n], \text{ and } g_1(x) := g(x)[n+1 \ldots 2n],$$

where $n := |x|$ is the input length.

We define $f_s$ as follows to be a PRF:

$$f_s(b_1 b_2 \ldots b_n) := g_{b_n} \circ g_{b_{n-1}} \circ \cdots \circ g_{b_1}(s).$$

That is, we evaluate $g$ on $s$ but keep only one side of the output depending on $b_1$, then apply $g$ to the kept side and choose the side by $b_2$, and so on.

This constructs a binary tree. The intuition comes from expanding the 1-bit PRG, but now we want any sub-string of the expansion to be efficiently computable. (We CS people love binary trees.) Clearly, $f_s$ is easy to compute, and we want to prove that it is pseudorandom.
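The tree walk can be sketched as below. The length-doubling `g` is again a hypothetical stand-in based on SHAKE-256 (an assumption of this sketch, not a proven PRG), seeds are byte strings, and the input is given as a string of `0`/`1` characters. In the notes the input has exactly $|s|$ bits; the sketch accepts any length so that short paths can be inspected.

```python
import hashlib


def g(x: bytes) -> bytes:
    """Hypothetical length-doubling stand-in: |g(x)| = 2|x| (assumption)."""
    return hashlib.shake_256(x).digest(2 * len(x))


def g0(x: bytes) -> bytes:
    """Left half of g(x)."""
    return g(x)[: len(x)]


def g1(x: bytes) -> bytes:
    """Right half of g(x)."""
    return g(x)[len(x):]


def f(s: bytes, bits: str) -> bytes:
    """Walk the binary tree from the root s, branching on b_1 first."""
    node = s
    for b in bits:
        node = g1(node) if b == "1" else g0(node)
    return node


seed = bytes(16)  # fixed all-zero seed, only to make the sketch reproducible
leaf = f(seed, "01")
```

Evaluating one input costs one call to `g` per input bit, even though the tree has exponentially many leaves.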

There are $2^n$ leaves in the tree, too many for the “switch one more PRG to uniform in each hybrid” technique used in expanding the PRG. The trick is that the distinguisher $D$ can query $f_s$ at most polynomially many times since $D$ is poly-time. Each query corresponds to a path in the binary tree, and there are at most polynomially many nodes over all queries. Hence, we will switch the $g(x)$ evaluations from the root to the leaves of the tree and from the first query to the last query.

Note: switching each instance of g(x) (for each x) is a reduction that runs D to distinguish one instance of g(x); therefore, we switch exactly one in each hybrid.

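More formally, assume for contradiction (AC) that there exist a NUPPT $D$ and a polynomial $p$ s.t. for infinitely many $n \in \mathbb{N}$, $D$ distinguishes $f_s$ from RF (in the oracle interaction) w.p. at least $1/p(n)$. We want to construct $D'$ that distinguishes $g(x)$. We define hybrid oracles $H_i(b_1 \ldots b_n)$ as follows:

  1. the map $m$ is initialized to empty
  2. if the prefix $b_1 \ldots b_i \notin m$, then sample $s(b_i b_{i-1} \ldots b_1) \gets \{0,1\}^n$ and set $m[b_i \ldots b_1] \gets s(b_i b_{i-1} \ldots b_1)$
  3. output $g_{b_n} \circ g_{b_{n-1}} \circ \cdots \circ g_{b_{i+1}}(m[b_i \ldots b_1])$

Notice that $H_i$ is a function defined using the computational view.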

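Let $\mathsf{PRF}_n := \{f_s : s \gets \{0,1\}^n\}$ be the distribution of $f_s$ for short. We have $H_0 \equiv \mathsf{PRF}_n$ and $H_n \equiv \mathsf{RF}_n$, but there are still too many switches between $H_i$ and $H_{i+1}$. The key observation is that, given that $D$ is PPT, we know a polynomial $T(n)$ that bounds the running time of $D$ on $1^n$, and then we just need to switch at most $T(n)$ instances of $g(x)$. That is, we define sub-hybrids $H_{i,j}$ as follows:

  1. the map $m$ is initialized to empty
  2. if the prefix $b_1 \ldots b_i b_{i+1} \notin m$, then depending on the “number of queries” that are made to $H_{i,j}$ so far, including the current query, do the following: sample $s \gets \{0,1\}^n$, set

    $$m[b_{i+1} b_i \ldots b_1] \gets \begin{cases} \{0,1\}^n & \text{number of queries} \le j \\ g_{b_{i+1}}(s) & \text{otherwise,} \end{cases}$$

    and set

    $$m[\bar{b}_{i+1} b_i \ldots b_1] \gets \begin{cases} \{0,1\}^n & \text{number of queries} \le j \\ g_{\bar{b}_{i+1}}(s) & \text{otherwise,} \end{cases}$$

    where $\bar{b}_{i+1} := 1 - b_{i+1}$ denotes the sibling bit.
  3. output $g_{b_n} \circ g_{b_{n-1}} \circ \cdots \circ g_{b_{i+2}}(m[b_{i+1} \ldots b_1])$

We have $H_{i,0} \equiv H_i$. Moreover, for any $D$ that runs in time $T(n)$, we have $H_{i,T(n)} \equiv H_{i+1}$ (their combinatorial views differ, but their computational views are identical for $T(n)$ queries). Now we have $n \cdot T(n)$ hybrids, so we can construct $D'(t)$: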

  1. sample $i \gets \{0, 1, \ldots, n-1\}$ and $j \gets \{0, \ldots, T(n)-1\}$ uniformly at random
  2. define oracle $O_{i,j,t}(\cdot)$ that is similar to $H_{i,j}$ but “injects” $t$ into the map $m$ in the $j$-th query if the prefix $b_1 \ldots b_i b_{i+1} \notin m$. (This is constructable and computable only in the next step when queries come from $D$.)
  3. run and output $D^{O_{i,j,t}(\cdot)}(1^n)$, that is, run $D$ on input $1^n$ while answering the oracle queries of $D$ with $O_{i,j,t}$

It remains to calculate the probabilities, namely, given (AC), $D'$ distinguishes $g(x)$ from a uniformly sampled string w.p. at least $\frac{1}{n T(n) p(n)}$, a contradiction. The calculation is almost identical to the proof of PRG expansion and is left as an exercise.

Secure Encryption Scheme

Perfect secrecy considers that the adversary gets the ciphertext only (but nothing else). However, there are other natural adversarial models in practical scenarios.

  • Known plaintext attack: The adversary may get to see pairs of the form $(m_0, \mathsf{Enc}_k(m_0))$
  • Chosen plaintext attack, CPA: The adversary gets access to an encryption oracle before and after selecting messages.
  • Chosen ciphertext attack, CCA1: The adversary has access to an encryption oracle and to a decryption oracle before selecting the messages. [“lunch-time attack”, Naor and Yung]
  • Chosen ciphertext attack, CCA2: This is just like a CCA1 attack except that the adversary also has access to the decryption oracle after selecting the messages. It is not allowed to decrypt the challenge ciphertext, however. [Rackoff and Simon]

We formalize CPA-security next (but leave CCA1/CCA2 for later, in authentication).

Definition: Chosen-Plaintext-Attack Encryption (CPA)

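Let $\Pi = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ be an encryption scheme. For any NUPPT adversary $A$, for any $n \in \mathbb{N}, b \in \{0,1\}$, define the experiment $\mathsf{Expr}_b^{\Pi,A}(1^n)$ to be:

Experiment $\mathsf{Expr}_b^{\Pi,A}(1^n)$:

  1. $k \gets \mathsf{Gen}(1^n)$
  2. $(m_0, m_1, \mathsf{state}) \gets A^{\mathsf{Enc}_k(\cdot)}(1^n)$
  3. $c \gets \mathsf{Enc}_k(m_b)$
  4. Output $A^{\mathsf{Enc}_k(\cdot)}(c, \mathsf{state})$

Then we say $\Pi$ is CPA secure if for all NUPPT $A$,

$$\{\mathsf{Expr}_0^{\Pi,A}(1^n)\}_n \approx \{\mathsf{Expr}_1^{\Pi,A}(1^n)\}_n.$$

Note: the experiment $\mathsf{Expr}$ is often equivalently described as “the adversary $A$ interacts with a challenger $C$, where $C$ performs all the steps that do not belong to $A$ (such as $\mathsf{Gen}$, $\mathsf{Enc}$, and answering the queries to $\mathsf{Enc}_k(\cdot)$)”.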

Compared to Shannon/perfect secrecy, what are the differences?

  • comp. bounded
  • oracle before
  • oracle after
  • choose m

Suppose that we have a secure encryption even without CPA oracle but the key is shorter than the message. Can we get a PRG/PRF? Can we get a OWF?

Theorem: CPA-Secure Encryption from PRF

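Let $\mathsf{PRF} = \{f_s : \{0,1\}^{|s|} \to \{0,1\}^{|s|}\}_{s \in \{0,1\}^*}$ be a family of PRFs. Then the following $(\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is a CPA-secure encryption scheme.

  • $\mathsf{Gen}(1^n)$: sample and output $k \gets \{0,1\}^n$.
  • $\mathsf{Enc}_k(m)$: given input $m \in \{0,1\}^n$, sample $r \gets \{0,1\}^n$, and then output

    $$c := m \oplus f_k(r) \,\|\, r.$$
  • $\mathsf{Dec}_k(c)$: given input $c = c' \| r \in \{0,1\}^{2n}$, output

    $$m := c' \oplus f_k(r).$$

The correctness and efficiency of the construction follow from the PRF directly. It remains to prove CPA security.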
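A minimal sketch of the three algorithms, for checking correctness by hand. HMAC-SHA256 is used as a stand-in for the PRF $f_k$; that choice, the 32-byte block length, and the function names are assumptions of this sketch rather than part of the construction above.

```python
import hashlib
import hmac
import secrets

N = 32  # block length in bytes, fixed by the stand-in PRF's output size


def f(k: bytes, r: bytes) -> bytes:
    """Stand-in for the PRF f_k(r): HMAC-SHA256 (an assumption of this sketch)."""
    return hmac.new(k, r, hashlib.sha256).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def gen() -> bytes:
    """Gen: sample a uniform key."""
    return secrets.token_bytes(N)


def enc(k: bytes, m: bytes) -> bytes:
    """Enc_k(m): sample fresh r and output (m XOR f_k(r)) || r."""
    assert len(m) == N
    r = secrets.token_bytes(N)
    return xor_bytes(m, f(k, r)) + r


def dec(k: bytes, c: bytes) -> bytes:
    """Dec_k(c): parse c = c' || r and output c' XOR f_k(r)."""
    c_prime, r = c[:N], c[N:]
    return xor_bytes(c_prime, f(k, r))


key = gen()
message = b"A" * N
ciphertext = enc(key, message)
```

Because `enc` samples a fresh `r` on every call, encrypting the same message twice gives different ciphertexts, which is what lets the scheme tolerate encryption-oracle queries.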

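To show that $\mathsf{Expr}_0$ and $\mathsf{Expr}_1$ are comp. ind., we define hybrid experiments $H_0, H_1$ as follows.

Hybrid $H_b^A(1^n)$:

  1. $F \gets \mathsf{RF}_n$, and then let $O_F$ be the following oracle:

    $$O_F(x) := x \oplus F(r) \,\|\, r,$$

    where $r \gets \{0,1\}^n$ is sampled uniformly.

  2. $(m_0, m_1, \mathsf{state}) \gets A^{O_F(\cdot)}(1^n)$
  3. $r \gets \{0,1\}^n, c \gets m_b \oplus F(r) \,\|\, r$
  4. Output $A^{O_F(\cdot)}(c, \mathsf{state})$

By oracle indistinguishability of PRF and RF and closure under efficient operations, we have $\{\mathsf{Expr}_0^{\Pi,A}(1^n)\}_n \approx \{H_0^A(1^n)\}_n$ and $\{\mathsf{Expr}_1^{\Pi,A}(1^n)\}_n \approx \{H_1^A(1^n)\}_n$. (Notice that PRF and RF are oracle ind., but $\mathsf{Expr}$ and $H$ are comp. ind. of strings.)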

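Hence, it suffices to prove that the ensembles $H_0$ and $H_1$ are indistinguishable. They seem to be identically distributed (as in OTP). However, there is a difference: $A$ gets oracle access to $O_F$ (before and after choosing $m_b$), and $O_F$ could sample the same $r$ in the cipher $c$ and in another oracle access. Fortunately, hitting the same $r$ twice in polynomial time happens with negligible probability.

We formally prove $\{H_0^A(1^n)\}_n \approx \{H_1^A(1^n)\}_n$ next. Define $R$ to be the set

$$R := \{r \in \{0,1\}^n : r \text{ is sampled when } A \text{ queries } O_F(\cdot)\},$$

and let $r$ be the random variable sampled for the cipher $c$. We want to show that $|\Pr[H_0^A(1^n) = 1] - \Pr[H_1^A(1^n) = 1]|$ is negligible for all NUPPT $A$. Let $H_0$ and $H_1$ denote the events $H_0^A(1^n) = 1$ and $H_1^A(1^n) = 1$ for short.

$$\Pr[H_0] = \Pr[H_0 \cap r \in R] + \Pr[H_0 \cap r \notin R] \le \Pr[r \in R] + \Pr[H_0 | r \notin R] \Pr[r \notin R] = \gamma + \Pr[H_0 | r \notin R](1 - \gamma),$$

where $\gamma := |R| / 2^n$. We also have $\Pr[H_0 | r \notin R] = \Pr[H_1 | r \notin R]$, thus

$$\Pr[H_0] \le \gamma + \Pr[H_1 | r \notin R](1 - \gamma) = \gamma + \Pr[H_1 \cap r \notin R] \le \gamma + \Pr[H_1].$$

Given that $|R|$ is polynomial in $n$ for any NUPPT $A$, it follows that $\gamma$ is negligible in $n$. The symmetric argument gives $\Pr[H_1] \le \gamma + \Pr[H_0]$, which concludes the proof.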

Notice that we could have constructed an efficient CPA-secure encryption from PRG, but using a PRF significantly simplified the construction and the proof.

Hard-Core Bits from any OWF

So far we do not yet have a candidate construction of a PRG (even with 1-bit expansion). We will next construct a PRG from one-way permutations.

OWF vs others

The construction of a PRG comes from two properties of OWFs:

  • The output of $f(x)$ must be sufficiently random when the input $x$ is uniform; otherwise, $f$ is constant (for most $x$), and then we can invert easily.
  • A sufficiently random $f(x)$ can still be easily inverted (such as the identity function). For $f$ to be hard to invert, there must be some bits of $x$ that are hard to guess when $f(x)$ is given. How many bits are hard to guess for any polynomial-time adversary? It must be $\omega(\log n)$.

Suppose $f$ is a OWP; then we have a “fully random” $f(x)$ (which is stronger than the first property). Additionally utilizing the second property, it seems we can take just 1 bit from the “hard bits” of $x$ to obtain a PRG with 1-bit expansion.

Definition: One-way Permutations

An OWF $f : \{0,1\}^n \to \{0,1\}^n$ for all $n \in \mathbb{N}$ is called a one-way permutation if $f$ is a bijection.

Definition: Hard-core Bits

A predicate $h : \{0,1\}^* \to \{0,1\}$ is a hard-core predicate for $f(x)$ if $h$ is efficiently computable given $x$, and for any NUPPT adversary $A$, there exists a negligible $\epsilon$ so that for all $n \in \mathbb{N}$,

$$\Pr[x \gets \{0,1\}^n : A(1^n, f(x)) = h(x)] \le \frac{1}{2} + \epsilon(n).$$

This is indeed the case for some OWPs, such as RSA. If we construct a OWP from the RSA assumption, then the least significant bit of $x$ is such a “hard to guess” bit, and then we can obtain a PRG from the RSA assumption.

Theorem: PRG from OWP and hard-core predicate

Suppose that $f : \{0,1\}^n \to \{0,1\}^n$ is a OWP and $h : \{0,1\}^n \to \{0,1\}$ is a hard-core predicate for $f$ (for all $n \in \mathbb{N}$). Then, $g : \{0,1\}^n \to \{0,1\}^{n+1}$ defined below is a PRG:

$$g(x) := f(x) \| h(x).$$

(The proof is a standard reduction: if there exists a NUPPT distinguisher $D$ against $g$, then we can build a NUPPT adversary $A$ that predicts $h(x)$ given $f(x)$ by running $D$, contradicting that $h$ is hard-core.)

However, we want to obtain a PRG from any OWP or any OWF (without depending on specific assumptions). It is unfortunately unclear how to do that directly.

Fortunately, Goldreich-Levin showed that for any OWF $f$, we can obtain another OWF $f'$ whose hard-core predicate we know. The intuition is: given that $f$ is hard to invert, in the preimage of $f(x)$, there must be at least $\omega(\log n)$ bits that are hard to guess (otherwise, a poly-time adversary can invert). A hard-core predicate formalizes those bits. Even though we do not know which bits are hard, we can sample randomly and hope to hit one of them.

Theorem: Goldreich-Levin, Hard-Core Lemma

Let $f : \{0,1\}^n \to \{0,1\}^n$ for all $n \in \mathbb{N}$ be a OWF. Define functions $f' : \{0,1\}^{2n} \to \{0,1\}^{2n}, h' : \{0,1\}^{2n} \to \{0,1\}$ to be the following:

$$f'(x, r) := f(x) \| r, \text{ and } h'(x, r) := x \odot r,$$

where $\odot$ denotes the inner product modulo 2, i.e., $a \odot b := (\sum_{i \in [n]} a_i b_i) \bmod 2$ for any $a, b \in \{0,1\}^n$. Then, $f'$ is a OWF and $h'$ is a hard-core predicate for $f'$.

Note: in the above definition of $f'$ and $h'$, the theorem says that “even if we are given the subset $r$ and $f(x)$, because $f(x)$ is hard to invert, we still do not know the parity of $x$ over $r$”. Since the subset $r$ is chosen uniformly, even though we do not know where the “hard bits” are, $r$ hits some of them with overwhelming probability. This is indeed consistent with the earlier intuition.

Clearly $f'$ is a OWF, and $h'$ is easy to compute. The main challenge is to prove that $h'$ is hard-core. We assume for contradiction that $h'$ is not hard-core, as stated in the Full Assumption below, and then, to reach a contradiction, we want to construct another adversary $B$ that inverts $f$.

Full Assumption:

There exists NUPPT $A$, polynomial $p$, such that for infinitely many $n \in \mathbb{N}$,

$$\Pr[x \gets \{0,1\}^n, r \gets \{0,1\}^n : A(1^{2n}, f'(x, r)) = h'(x, r)] \ge 1/2 + 1/p(n).$$

The construction and analysis of $B$ are involved, so we will start with a couple of warmups.

Warmup Assumption 1:

There exists NUPPT $A$, such that for infinitely many $n \in \mathbb{N}$, for all $r \in \{0,1\}^n$,

$$\Pr_x[A(1^{2n}, f'(x, r)) = h'(x, r)] = 1.$$

To invert $y \gets f(x)$, the construction of $B_1(1^n, y)$ is simple:

  1. For $i = 1, 2, \ldots, n$, do the following
    1. Let $e_i$ be the $n$-bit string in which only the $i$-th bit is 1 (0 otherwise)
    2. Run $x'_i \gets A(1^{2n}, y \| e_i)$
  2. Output $x' := x'_1 x'_2 \ldots x'_n$

To see why $B_1$ inverts $y \gets f(x)$, observe that $x'_i = h'(x, e_i) = x \odot e_i = x_i$, where $x = x_1 x_2 \ldots x_n$. Hence, $B_1$ succeeds w.p. 1, a contradiction.
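The warmup inverter can be simulated in a toy setting. In the sketch below the predictor is a hypothetical perfect $A$ that is handed $x$ directly (it closes over $x$ instead of receiving $f(x)$), which is cheating and serves only to exercise the bit-by-bit recovery; bit strings are lists of 0/1 integers.

```python
import random


def inner_product(a, b):
    """Inner product modulo 2 of two equal-length bit lists."""
    return sum(ai & bi for ai, bi in zip(a, b)) % 2


def make_perfect_predictor(x):
    """Hypothetical A that always outputs <x, r> mod 2 (toy: it knows x)."""
    def A(r):
        return inner_product(x, r)
    return A


def B1(A, n):
    """Recover x bit by bit by querying A on the unit vectors e_1, ..., e_n."""
    recovered = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        recovered.append(A(e_i))  # <x, e_i> = x_i
    return recovered


rng = random.Random(1)
n = 16
x = [rng.randrange(2) for _ in range(n)]
recovered = B1(make_perfect_predictor(x), n)
```

The recovery uses exactly $n$ queries, one per unit vector.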

Note: the above assumed “for all r” and “w.p. =1”, both are much stronger than we wanted.

Warmup Assumption 2:

There exists NUPPT $A$, polynomial $p$, such that for infinitely many $n \in \mathbb{N}$,

$$\Pr_{x, r}[A(1^{2n}, f'(x, r)) = h'(x, r)] \ge 3/4 + 1/p(n).$$

We would like to use $e_i$ as before, but now $A$ may always fail whenever the suffix of $f'(x, r)$ is $e_i$. Hence, we randomize $e_i$ to $r$ and $r \oplus e_i$ and then recover the inner product (this is also called “self correction”).

Fact

For all $n$, any strings $x, a, b \in \{0,1\}^n$, it holds that $(x \odot a) \oplus (x \odot b) = x \odot (a \oplus b)$.

To invert $y \gets f(x)$, the construction of $B_2(1^n, y)$ is below:

  1. For each $i = 1, 2, \ldots, n$, do
    1. For $j = 1$ to $m$, do
      • $r \gets \{0,1\}^n$
      • Run $z_{i,j} \gets A(1^{2n}, y \| e_i \oplus r) \oplus A(1^{2n}, y \| r)$
    2. Let $x'_i$ be the majority of $\{z_{i,j}\}_{j \in [m]}$
  2. Output $x' := x'_1 x'_2 \ldots x'_n$

To prove B2 succeeds with high prob., we first prove that there are many good x’s.

Good instances are plenty.

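Define $G$ to be the set of good instances,

$$G := \{x \in \{0,1\}^n \mid \Pr_r[A(1^{2n}, f'(x, r)) = h'(x, r)] \ge 3/4 + \alpha/2\},$$

where $\alpha := 1/p(n)$.
If the Warmup Assumption 2 holds, then $|G| \ge 2^n \alpha/2$.

(This is actually a standard averaging argument or a Markov inequality.) Suppose not, i.e., $|G| < 2^n \alpha/2$. Then,

$$\Pr_{x, r}[A(f'(x, r)) = h'(x, r)] = \Pr[A = h' \cap x \in G] + \Pr[A = h' | x \notin G] \Pr[x \notin G] < \alpha/2 + \Pr[A = h' | x \notin G] < \alpha/2 + 3/4 + \alpha/2 = 3/4 + \alpha,$$

which contradicts Warmup Assumption 2.

Now, suppose that $x \in G$. Since $e_i \oplus r$ and $r$ are each uniform, $A$ fails on each of them w.p. at most $1/4 - \alpha/2$, so $A$ fails on $y \| e_i \oplus r$ or $y \| r$ w.p. at most $1/2 - \alpha$ by the union bound. So, for any fixed $i$, $\Pr[z_{i,j} = x_i] \ge 1/2 + \alpha$ for each $j$ independently. By the Chernoff bound, the majority of $z_{i,j}$ is $x_i$ w.p. at least $1 - e^{-m \alpha^2 / 2}$. Choosing $m = n p^2(n)$, the probability is exponentially close to 1. By the union bound over all $i \in [n]$, $B_2$ recovers $x$ w.p. close to 1.

Finally, $B_2$ succeeds w.p. at least $\alpha/4$ over uniformly sampled $x$, even if it fails for all $x \notin G$.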
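The self-correction step can be simulated in the same toy setting. The predictor here is a hypothetical $A$ that knows $x$ and flips its answer independently with probability `error_rate` on each call; a real $A$ is a fixed machine whose errors depend on $r$, so the independent noise, the rate 0.1, and the vote count 201 are simplifying assumptions of this sketch.

```python
import random


def inner_product(a, b):
    """Inner product modulo 2 of two equal-length bit lists."""
    return sum(ai & bi for ai, bi in zip(a, b)) % 2


def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    return [ai ^ bi for ai, bi in zip(a, b)]


def make_noisy_predictor(x, error_rate, rng):
    """Hypothetical A: returns <x, r>, flipped with probability error_rate."""
    def A(r):
        bit = inner_product(x, r)
        return bit ^ 1 if rng.random() < error_rate else bit
    return A


def B2(A, n, m, rng):
    """For each i, take a majority vote over m values A(e_i XOR r) XOR A(r)."""
    recovered = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]
        votes_for_one = 0
        for _ in range(m):
            r = [rng.randrange(2) for _ in range(n)]
            votes_for_one += A(xor_bits(e_i, r)) ^ A(r)
        recovered.append(1 if 2 * votes_for_one > m else 0)
    return recovered


rng = random.Random(7)
n = 16
x = [rng.randrange(2) for _ in range(n)]
A = make_noisy_predictor(x, error_rate=0.1, rng=rng)
recovered = B2(A, n, m=201, rng=rng)
```

Each vote is wrong only when exactly one of the two calls errs, so the per-vote error stays well below 1/2 and the majority concentrates on $x_i$.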

To complete the full proof, we want to lower the success probability of $A$ from $3/4$ to $1/2$. The “good set” claim still holds when modified to $1/2$ (since it is a simple averaging). The main challenge from the previous $3/4$ proof is:

  • The union bound over $A$ answering correctly on both $y \| e_i \oplus r$ and $y \| r$. For $1/2$, that lowers the success probability to only $\alpha$, which is too low for the subsequent majority vote and Chernoff bound.

The first idea is to guess the inner product $x \odot r$ uniformly at random, which is a correct guess w.p. $1/2$. Supposing that $p(n)$ is a constant, we can choose $m(n) = O(\log n)$, so that all $m$ guesses are correct w.p. $1/2^m = 1/\mathrm{poly}(n)$; then, conditioned on correct guesses, $A(y \| e_i \oplus r)$ is correct w.p. $1/2 + \alpha$ (when $x$ is good), and we can continue with the Chernoff bound (failing w.p. $1/\mathrm{poly}(n)$) and finish the proof. For large $p(n)$, the guesses are too many and $1/2^m$ is negligible.

The second idea is to use pairwise independent guesses. In particular, we have Chebychev’s bound for the measure concentration of pairwise independent r.v.s (instead of the Chernoff bound for fully independent ones).

Theorem: Chebychev’s inequality

Let $X_1, \ldots, X_m \in [0,1]$ be pairwise independent random variables such that for all $j$, $\mathbb{E}[X_j] = \mu$. Then,

$$\Pr[|X - m\mu| \ge m\delta] \le \frac{1 - \mu^2}{m \delta^2},$$

where $X := \sum_{j \in [m]} X_j$.

[Ps, p189]

We can then reduce the number of guesses from $m$ to $\log m$.

Fact: Sampling pairwise independent random strings

For any $n, m \in \mathbb{N}$, let $(u_i : u_i \gets \{0,1\}^n)_{i \in [\log m]}$ be strings independently sampled uniformly at random (we abuse notation and round up $\log m$ to the next integer). Define strings $r_I$ for each non-empty $I \subseteq [\log m]$ to be

$$r_I := \bigoplus_{i \in I} u_i.$$

The random variables $(r_1, r_2, \ldots, r_m)$ are pairwise independent, where $r_j$ denotes $r_I$ such that $I$ is the $j$-th non-empty subset of $[\log m]$.

(The proof is left as exercise.)
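The sampling procedure in the Fact can be sketched as follows. Strings are represented as $n$-bit integers, and the non-empty subsets $I$ are enumerated by bitmasks $1, 2, \ldots, 2^{\ell} - 1$; that enumeration order and the function name are choices made for this sketch.

```python
import secrets


def pairwise_independent_strings(n: int, ell: int):
    """Sample ell independent n-bit seeds and XOR them over every non-empty subset."""
    seeds = [secrets.randbits(n) for _ in range(ell)]
    strings = []
    for mask in range(1, 2 ** ell):  # each mask encodes one non-empty subset I
        r = 0
        for i in range(ell):
            if (mask >> i) & 1:
                r ^= seeds[i]  # r_I is the XOR of u_i over i in I
        strings.append(r)
    return seeds, strings


seeds, strings = pairwise_independent_strings(n=32, ell=4)
```

With only `ell` truly random seeds the procedure yields $2^{\ell} - 1$ strings, which is why guessing the $\ell$ inner products with the seeds suffices to derive guesses for all of the derived strings.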

Now we are ready to prove the full theorem.

Proof of Hard-Core Lemma (Goldreich-Levin, 1989)

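Given NUPPT $A$ in the Full Assumption, we construct $B$ that inverts $y \gets f(x)$ as follows.

Algorithm $B(1^n, y)$

  1. Let $\ell := \log m$, and let $(u_1, \ldots, u_\ell)$ be fully independent and $(r_1, \ldots, r_m)$ be pairwise independent $n$-bit random strings as in the Fact on pairwise independence.
  2. For each $k \in [\ell]$, sample a guess bit $b_k$ uniformly. For each $j \in [m]$, compute the bit $g_{i,j}$ from $(b_1, \ldots, b_\ell)$ in the same way as $r_j$ is computed from $(u_1, \ldots, u_\ell)$ (so that for any $x$, $g_{i,j} = x \odot r_j$ whenever $b_k = x \odot u_k$ for all $k$).
  3. For each $i = 1, 2, \ldots, n$,
    1. For each $j = 1, 2, \ldots, m$,
      • Run $z_{i,j} \gets A(1^{2n}, y \| e_i \oplus r_j) \oplus g_{i,j}$.

    2. Let $x'_i$ be the majority of $\{z_{i,j}\}_{j \in [m]}$

  4. Output $x' := x'_1 x'_2 \ldots x'_n$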

We begin with claiming the number of good instances of x.

Good instances are plenty.

Define $G$ to be the set of good instances,

$$G := \{x \in \{0,1\}^n \mid \Pr_r[A(1^{2n}, f'(x, r)) = h'(x, r)] \ge 1/2 + \alpha/2\},$$

where $\alpha := 1/p(n)$. If the Full Assumption holds, then $|G| \ge 2^n \alpha/2$.

(The proof is almost the same and omitted.)

We condition on the good event that $x \in G$. Next, we condition on the “lucky event” that the guess $b_k$ equals $x \odot u_k$ for all $k$, which happens w.p. $1/m$. That implies $(g_{i,1}, \ldots, g_{i,m})$ are all correct. With the conditioning, for any $j \neq j'$, $r_j$ and $r_{j'}$ are still pairwise independent, and thus the events that $A$ answers correctly on $y \| e_i \oplus r_j$ and on $y \| e_i \oplus r_{j'}$ are pairwise independent as well. Therefore, by Chebychev’s inequality, the majority of $z_{i,j}$ differs from $x \odot e_i = x_i$ w.p. at most

$$\Pr[m(1+\alpha)/2 - X \ge m\alpha/2] \le \frac{1}{m (\alpha/2)^2},$$

where $X = \sum_j X_j$, and $X_j$ denotes the event that $A$ outputs $x \odot (e_i \oplus r_j)$ correctly. Choosing $m(n) := 8 n p^2(n)$, we have that $\Pr[x'_i \neq x_i] \le 1/(2n)$. Taking a union bound over all $i$, $\Pr[x' = x] \ge 1/2$, conditioned on $x \in G$ and all $b_k$’s being correct. Removing the conditional events* costs factors of $\alpha/2$ and $1/m$, but $B$ still inverts $y$ w.p. at least $1/(4 p(n) m(n)) = 1/(32 n p^3(n))$, contradicting that $f$ is a OWF.

(*For any events $A, B, C$, $\Pr[A] \ge \Pr[A | B \cap C] \Pr[B \cap C]$, where $A$ is $x' = x$, $B$ is $x \in G$, and $C$ is that all $b_k$’s are correct.)

Discuss: The number of bits we guessed is $\log m = O(\log p(n)) = O(\log n)$, where $p(n)$ depends on the (hypothetical) NUPPT $A$. Since the guessed bits entail information about $x$, the proof implies (again) that there must be $\omega(\log n)$ bits that are hard to invert (from $f(x)$ to $x$). Still, having an efficient and uniform attack is non-trivial: since we do not know which are the hard bits, and the number of subsets $\binom{n}{c \log n} \ge (n / (c \log n))^{c \log n}$ is super-polynomial, it was unclear how to guess efficiently if we had not applied the Chebychev bound.

Discuss: How far does the Hard-core Lemma extend? Suppose $f$ is a OWF, and suppose $h$ is a hard-core predicate for $f$.

  • Is $f'(x) := f(x) \| h(x)$ a OWF?
  • Let $f'(x, t, r) := f(x) \| t \| r$, and let $h'(x, t, r) := x \odot r$. Is $f'$ a OWF? If so, is $h'$ a hard-core predicate for $f'$?
  • Let $f'(x, t, r) := f(x) \| t \| x \odot t \| r$, and let $h'(x, t, r) := x \odot r$. Is $f'$ a OWF? If so, is $h'$ a hard-core predicate for $f'$?

The questions are highly relevant when we want to construct PRG from any one-way function.