\newcommand{\RF}{\mathsf{RF}} \newcommand{\PRF}{\mathsf{PRF}} \newcommand{\Enc}{\mathsf{Enc}} \newcommand{\Dec}{\mathsf{Dec}} \newcommand{\Gen}{\mathsf{Gen}} \newcommand{\Expr}{\mathsf{Expr}} \newcommand{\state}{\mathsf{state}}

Construct PRG from OWF

We have shown that given a PRG, we can construct a CPA-secure encryption. We also showed that PRG, PRF, and CPA-secure encryption each imply OWF, so OWFs are necessary if we want cryptography. We also have a universal OWF that is as strong as any candidate OWF. It remains to construct a PRG from a OWF. The construction implies that OWF is necessary and sufficient for PRG, PRF, and encryption, and that we have an encryption (as well as a PRG and a PRF) that is as strong as any encryption (via the universal OWF).

OWF vs others

Hard-Core Bits from any OWF

So far we do not yet have a candidate construction of a PRG (even with 1-bit expansion). We will next construct a PRG from one-way permutations (which is easier than from general OWFs).

The construction of a PRG relies on two properties of a OWF:

  • The output $f(x)$ must be sufficiently random (given that the input $x$ is uniform).

    (If the random variable $y := f(x)$ is constant or has a small support for most $x$, then we can invert $y$ with good probability.)

  • Since $f$ is hard to invert, there must be some bits of $x$ that are hard to guess even when $f(x)$ is given.

    A sufficiently random $f(x)$ can still be easy to invert (e.g., the identity function). How many bits of $x$ must be hard to guess for any polynomial-time adversary? The answer is $\omega(\log n)$ (otherwise, they can be guessed correctly w.p. $1/\poly(n)$).

Suppose that $f$ is a OWP; then $f(x)$ is "fully random" (uniform when $x$ is). That is good in terms of the first property.

Definition: One-way Permutations

An OWF $f: \bit^n \to \bit^n$ for all $n\in\N$ is called a one-way permutation (OWP) if $f$ is a bijection.

To utilize the second property, we want to obtain some of the "hard bits" from the input $x$. If we can get one extra hard bit, we have a construction of a PRG by concatenating the output of the OWP and the extra hard bit. The hard bit must be mathematically determined by the input, but it shall be hard to compute even given the output. The hard bit is formalized below.

Definition: Hard-core bits

A predicate $h : \bit^\ast \to \bit$ is a hard-core predicate for $f(x)$ if $h$ is efficiently computable given $x$, and for any NUPPT adversary $A$, there exists a negligible $\eps$ so that for all $n\in\N$,

$$\Pr[x \gets \bit^n: A(1^n, f(x)) = h(x)] \le \frac{1}{2} + \eps(n).$$

Notice that here we are not restricting $f$ to be a OWP, or even a OWF. (If $f$ never uses some bit of its input, then that bit is hard-core. However, such an $f$ is not a permutation and is more challenging for us. We will focus on OWPs in this section.)

That is indeed the case for some OWPs, such as RSA: if we construct a OWP from the RSA assumption, then the least significant bit of $x$ is such a "hard to guess" bit, and we thus obtain a PRG from the RSA assumption.

Theorem: PRG from OWP and hard-core predicate

Suppose that $f: \bit^n \to \bit^n$ is a OWP and $h: \bit^n \to \bit$ is a hard-core predicate for $f$ (for all $n\in\N$). Then, $g: \bit^n \to \bit^{n+1}$ defined below is a PRG:

$$g(x) := f(x) \| h(x).$$

(The proof is a standard reduction: if there exists a NUPPT distinguisher $D$ against $g$, then we can build a NUPPT adversary $A$ that predicts $h(x)$ given $f(x)$, contradicting that $h$ is hard-core, by running $D$.)
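To make the construction concrete, below is a minimal Python sketch of $g(x) := f(x) \| h(x)$. The instantiations of `f` and `h` are toy placeholders of our own (a multiply-by-odd-constant permutation and a parity bit); they are neither one-way nor hard-core, and serve only to show the plumbing.

```python
def f(x: int, n: int) -> int:
    """Toy stand-in for a OWP on n-bit strings: multiplication by an odd
    constant mod 2^n is a permutation (but it is easy to invert!)."""
    return (x * 0x9E3779B1) % (1 << n)

def h(x: int) -> int:
    """Toy stand-in for a hard-core predicate: the parity of x (easy to
    compute from x; in reality NOT hard to guess given f(x))."""
    return bin(x).count("1") % 2

def g(x: int, n: int) -> int:
    """PRG candidate g: {0,1}^n -> {0,1}^{n+1}, g(x) = f(x) || h(x)."""
    return (f(x, n) << 1) | h(x)

n = 16
seed = 0xBEEF
print(f"{g(seed, n):0{n + 1}b}")  # n-bit seed in, (n+1)-bit output
```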

However, we want to obtain a PRG from any OWP or any OWF (without depending on specific assumptions), and it is unfortunately unclear whether an arbitrary OWP admits such a hard-core predicate directly.

Fortunately, Goldreich and Levin showed that for any OWF (or OWP) $f'$, we can obtain another OWF (or OWP) $f$ for which we know a hard-core predicate. The intuition is: since $f'$ is hard to invert, in the preimage of $f'(x)$, there must be at least $\omega(\log n)$ bits that are hard to guess (otherwise, a poly-time adversary could invert). The hard-core predicate formalizes those bits. Even though we do not know which bits are hard, we can sample a random subset and hope to catch at least one of them.

Theorem: Goldreich-Levin, Hard-Core Lemma

Let $f': \bit^n \to \bit^n$ for all $n\in\N$ be a OWF. Define the functions $f: \bit^{2n}\to \bit^{2n}$ and $h: \bit^{2n} \to \bit$ to be the following:

$$f(x,r) := f'(x) \| r, \text{ and } h(x,r) := x \odot r,$$

where $\odot$ denotes the inner product modulo 2, i.e., $a \odot b := \left(\sum_{i\in[n]} a_i \cdot b_i\right) \bmod 2$ for any $a,b\in\bit^n$. Then, $f$ is a OWF and $h$ is a hard-core predicate for $f$.

Note: in the above definition of $f$ and $h$, the theorem says that "even given the subset $r$ and $f'(x)$, because $f'(x)$ is hard to invert, we still do not know the parity of $x$ over $r$". Since the subset $r$ is chosen uniformly, even though we do not know where the hard bits are, $r$ hits some of them with overwhelming probability. This is indeed consistent with the earlier intuition.

Clearly $f$ is a OWF, and $h$ is easy to compute. The main challenge is to prove that $h$ is hard-core. We assume for contradiction that $h$ is not hard-core, stated formally below, and then, to reach a contradiction, we construct another adversary $B$ that inverts $f'$.

Full Assumption:

There exists a NUPPT $A$ and a polynomial $p$ such that for infinitely many $n\in\N$,

$$\Pr[x \gets \bit^n, r \gets \bit^n: A(1^{2n}, f(x,r)) = h(x,r)] \ge 1/2 + 1/p(n).$$

The construction and analysis of $B$ are involved, so we will start with a couple of warmups.

Warmup Assumption 1:

There exists a NUPPT $A$ such that for infinitely many $n\in\N$ and for all $r \in \bit^n$,

$$\Pr_{x}[A(1^{2n}, f(x,r)) = h(x,r)] = 1.$$

To invert $y \gets f'(x)$, the construction of $B_1(1^n, y)$ is simple:

  1. For $i = 1, 2, ..., n$, do the following
    1. Let $e_i$ be the $n$-bit string where only the $i$-th bit is 1 (and 0 otherwise)
    2. Run $x'_i \gets A(1^{2n}, y \| e_i)$
  2. Output $x' := x'_1 x'_2 ... x'_n$

To see why $B_1$ inverts $y \gets f'(x)$, observe that $x'_i = h(x, e_i) = x \odot e_i = x_i$, where $x = x_1 x_2 ... x_n$. Hence, $B_1$ succeeds w.p. 1, a contradiction.

Note: the above assumed "for all $r$" and "w.p. $= 1$", both of which are much stronger than we wanted.
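As a sanity check, here is a minimal Python sketch of $B_1$, assuming oracle access to an adversary `A` as in Warmup Assumption 1 (the calling convention `A(y, r)` returning a bit is our own modeling choice).

```python
def invert_B1(A, y, n: int):
    """Warmup 1: recover x from y = f'(x) given an oracle A that always
    outputs h(x, r) = <x, r> mod 2. Querying r = e_i reveals the bit
    x_i directly, since <x, e_i> = x_i."""
    bits = []
    for i in range(n):
        e_i = [0] * n
        e_i[i] = 1              # e_i: only the i-th bit is 1
        bits.append(A(y, e_i))  # A(y || e_i) = <x, e_i> = x_i
    return bits
```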

Warmup Assumption 2:

There exists a NUPPT $A$ and a polynomial $p$ such that for infinitely many $n\in\N$,

$$\Pr_{x,r}[A(1^{2n}, f(x,r)) = h(x,r)] \ge 3/4 + 1/p(n).$$

We would like to use $e_i$ as before, but now $A$ may always fail whenever the suffix of $f(x,r)$ is $e_i$. Hence, we randomize $e_i$ into the pair $r$ and $r \oplus e_i$ and then recover the inner product (this is also called "self-correction").

Fact

For all $n$ and any strings $x, a, b \in \bit^n$, it holds that $(x \odot a) \oplus (x \odot b) = x \odot (a \oplus b)$.

To invert $y \gets f'(x)$, the construction of $B_2(1^n, y)$ is below:

  1. For each $i = 1, 2, ..., n$, do
    1. For $j = 1$ to $m$, do
      1. Sample $r \gets \bit^n$
      2. Run $z_{i,j} \gets A(1^{2n}, y \| e_i\oplus r) \oplus A(1^{2n}, y \| r)$
    2. Let $x'_i$ be the majority of $\set{z_{i,j}}_{j\in[m]}$
  2. Output $x' := x'_1 x'_2 ... x'_n$
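A minimal Python sketch of $B_2$, again assuming an oracle `A(y, r)` as in Warmup Assumption 2; the self-correction step queries the pair $r$ and $e_i \oplus r$ so that neither query alone has the fixed suffix $e_i$.

```python
import random

def invert_B2(A, y, n: int, m: int):
    """Warmup 2: for a good x, each of the two queries below errs w.p.
    <= 1/4 - alpha/2, so by the Fact and a union bound,
    z = <x, e_i + r> XOR <x, r> = <x, e_i> = x_i w.p. >= 1/2 + alpha,
    and the majority over m independent trials is correct w.h.p."""
    bits = []
    for i in range(n):
        votes = 0
        for _ in range(m):
            r = [random.randint(0, 1) for _ in range(n)]
            r_ei = list(r)
            r_ei[i] ^= 1                      # e_i XOR r
            votes += A(y, r_ei) ^ A(y, r)     # self-correction
        bits.append(1 if 2 * votes > m else 0)  # majority vote
    return bits
```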

To prove that $B_2$ succeeds with high probability, we first prove that there are many good $x$'s.

Good instances are plenty.

Define $G$ to be the set of good instances,

$$G:= \set{ x \in \bit^n ~|~ \Pr_{r}[A(1^{2n}, f(x,r)) = h(x,r)] \ge 3/4 + \alpha / 2 },$$

where $\alpha := 1/p(n)$.
If Warmup Assumption 2 holds, then $|G| \ge 2^n \cdot \alpha / 2$.

This is a standard averaging argument (essentially Markov's inequality). Suppose not, i.e., $|G| < 2^n \cdot \alpha / 2$. Then,

$$\begin{align*} \Pr_{x,r}[A(f(x,r)) = h(x,r)] & = \Pr[A=h \cap x \in G] + \Pr[A=h \mid x\notin G] \cdot \Pr[x \notin G]\\ & < \alpha/2 + \Pr[A=h \mid x\notin G]\\ & < \alpha/2 + 3/4 + \alpha/2 = 3/4 + \alpha, \end{align*}$$

which contradicts Warmup Assumption 2.

Now, suppose that $x \in G$. By a union bound, $A$ errs on $y \| (e_i \oplus r)$ or on $y \| r$ w.p. $< 1/2 - \alpha$ (each query alone errs w.p. $\le 1/4 - \alpha/2$). So, for any fixed $i$, $\Pr[z_{i,j} = x_i] \ge 1/2 + \alpha$ for each $j$ independently. By the Chernoff bound, the majority of the $z_{i,j}$'s is $x_i$ w.p. $\ge 1 - e^{-m\alpha^2 /2}$. Choosing $m = np^2(n)$, this probability is at least $1 - e^{-\Omega(n)}$. By a union bound over all $i\in[n]$, $B_2$ recovers $x$ except w.p. $e^{-\Omega(n)}$.

Finally, $B_2$ succeeds w.p. $\ge \alpha / 4$ over a uniformly sampled $x$, even if it fails for every $x \notin G$ (since $\Pr[x \in G] \ge \alpha/2$).

To complete the full proof, we want to lower the probability from $3/4$ to $1/2$. The "good set" claim still holds when modified to $1/2$ (since it is a simple averaging). The main challenge compared to the previous $3/4$ proof is:

  • It is too weak to take the union bound over inverting both $y \| (e_i \oplus r)$ and $y \| r$. For $1/2$, the probability that both are correct drops to only $\alpha$, which is too low for the subsequent majority vote and Chernoff bound.

The first idea is to guess the inner product $x \odot r$ uniformly at random, which is a correct guess w.p. $1/2$. Supposing that $p(n)$ is a constant, we can choose $m(n) = O(\log n)$; all $m$ guesses are correct w.p. $1/2^m = 1/\poly(n)$, and then, conditioned on correct guesses, $A(y \| e_i \oplus r)$ is correct w.p. $1/2 + \alpha$ (when $x$ is good), so we can continue with the Chernoff bound (failing w.p. $1/\poly(n)$) and finish the proof. For large $p(n)$, the guesses are too many, and thus the success probability $1/2^m$ is negligible in $n$.

The second idea is to use pairwise independent guesses. In particular, we have Chebyshev's bound for the measure concentration of pairwise independent random variables (instead of the Chernoff bound for fully independent ones).

Theorem: Chebyshev's inequality

Let $X_1, ..., X_m \in [0,1]$ be pairwise independent random variables such that for all $j$, $\E[X_j] = \mu$. Then,

$$\Pr\left[ |X - m \mu| \ge m \delta \right] \le \frac{1-\mu^2}{m \delta^2},$$

where $X := \sum_{j\in[m]} X_j$.

[Ps, p189]

We can then reduce the number of guesses from $m$ to $\log m$.

Fact: Sampling pairwise independent random strings

For any $n, m \in \N$, let $(u_i : u_i \gets \bit^n)_{i \in [\log m]}$ be strings independently sampled uniformly at random (we abuse notation and round $\log m$ up to the next integer). Define strings $r_I$ for each nonempty $I \subseteq [\log m]$ to be

$$r_I := \bigoplus_{i \in I} u_i.$$

The random variables $(r_1, r_2, ..., r_m)$ are pairwise independent, where $r_j$ denotes $r_I$ such that $I$ is the $j$-th nonempty subset of $[\log m]$.

(The proof is left as exercise.)
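A short Python sketch of the subset-XOR sampler (seed strings are packed as integers for brevity):

```python
import random

def pairwise_independent_strings(n: int, m: int):
    """Expand l = ceil(log2(m+1)) uniform n-bit seeds u_1..u_l into m
    pairwise independent n-bit strings r_1..r_m, where r_j is the XOR
    of the seeds u_k selected by the bits of j (the j-th nonempty subset)."""
    l = m.bit_length()  # enough seeds so that 2^l - 1 >= m
    u = [random.getrandbits(n) for _ in range(l)]
    rs = []
    for j in range(1, m + 1):
        r = 0
        for k in range(l):
            if (j >> k) & 1:  # k-th bit of j selects seed u_k
                r ^= u[k]
        rs.append(r)
    return u, rs
```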

Now we are ready to prove the full theorem.

Proof of Hard-Core Lemma (Goldreich-Levin, 1989)

Given a NUPPT $A$ as in the Full Assumption, we construct $B$ that inverts $y \gets f'(x)$ as follows.

Algorithm $B(1^n, y)$

  1. Let $m(n)$ be a polynomial to choose later.
  2. Let $\ell := \log m$, and let $(u_1, ..., u_\ell)$ be fully independent and $(r_1,..., r_m)$ be pairwise independent $n$-bit random strings, sampled as in the fact above.
  3. Let $(b_1, ..., b_\ell)$ be fully independent and $(g_1,..., g_m)$ be pairwise independent random bits (sampled symmetrically to the previous step).
  4. For each $i=1,2, .., n$,
    1. For each $j=1,2,..., m$,
      • Run $z_{i,j} \gets A(1^{2n}, y \| e_i \oplus r_j) \oplus g_{j}$.
    2. Let $x'_i$ be the majority of $\set{z_{i,j}}_{j\in[m]}$
  5. Output $x' := x'_1 x'_2 ... x'_n$
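A Python sketch of the full reduction, assuming an oracle `A(y, r)` modeling the adversary's guess of $x \odot r$; note how the guesses $g_j$ are expanded from the $b_k$'s by exactly the same subset-XOR as the $r_j$'s, so that conditioned on the lucky event $b_k = x \odot u_k$ for all $k$, every $g_j$ equals $x \odot r_j$.

```python
import random

def invert_GL(A, y, n: int, m: int):
    """Goldreich-Levin inverter sketch: returns a candidate preimage as a
    bit list; the caller should verify it by checking f'(x') == y."""
    l = m.bit_length()
    u = [random.getrandbits(n) for _ in range(l)]  # fully independent seeds
    b = [random.getrandbits(1) for _ in range(l)]  # guesses for <x, u_k>
    r, g = [], []
    for j in range(1, m + 1):  # pairwise independent (r_j, g_j) pairs
        rj = gj = 0
        for k in range(l):
            if (j >> k) & 1:
                rj ^= u[k]
                gj ^= b[k]
        r.append(rj)
        g.append(gj)
    bits = []
    for i in range(n):
        e_i = 1 << (n - 1 - i)  # i-th unit vector, packed as an integer
        votes = sum(A(y, e_i ^ r[j]) ^ g[j] for j in range(len(r)))
        bits.append(1 if 2 * votes > len(r) else 0)  # majority
    return bits
```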

We begin by bounding the number of good instances $x$.

Good instances are plenty.

Define $G$ to be the set of good instances,

$$G:= \set{ x \in \bit^n ~|~ \Pr_{r}[A(1^{2n}, f(x,r)) = h(x,r)] \ge 1/2 + \alpha / 2 },$$

where $\alpha := 1/p(n)$. If the Full Assumption holds, then $|G| \ge 2^n \cdot \alpha / 2$.

(The proof is almost the same and omitted.)

We condition on the good event that $x \in G$. Next, we condition on the "lucky event" that for all $k$, the guess $b_k$ equals $x \odot u_k$, which happens w.p. $1/m$ (each of the $\ell = \log m$ independent guesses is correct w.p. $1/2$). That implies that for all $j$, we have the correct guess $g_j = x \odot r_j$; that is, for any $j \in [m]$, letting $S := \set{k : k\text{-th bit of } j \text{ is } 1}$, we have that

$$g_{j} = \bigoplus_{k \in S} b_{k} = \bigoplus_{k \in S} x \odot u_k = x \odot r_j.$$

With this conditioning, for any $j \neq j'$, $r_j$ and $r_{j'}$ are still pairwise independent, and so are $(A(y \| e_i \oplus r_j), A(y\|e_i \oplus r_{j'}))$. Therefore, by Chebyshev's inequality (with mean $\mu \ge (1+\alpha)/2$ conditioned on $x \in G$), the majority of the $z_{i,j}$'s differs from $x \odot e_i = x_i$ w.p.

$$\Pr[ m (1 + \alpha)/2 - X \ge m \alpha/2] \le \frac{1}{m \cdot (\alpha/2)^2},$$

where $X = \sum_j X_j$, and $X_j$ is the indicator of the event that $A$ outputs $x\odot(e_i \oplus r_j)$ correctly. Choosing a sufficiently large $m(n) := 8np^2(n)$, we have that $\Pr[x'_i \neq x_i] \le 1/(2n)$.

The above shows that each bit $x_i$ is correctly recovered with a good marginal probability. We want all bits $x'_i$ to be correct with a good joint probability, but the different $x'_i$'s come from the same randomness, the $u_k$'s and $b_k$'s. (It is necessary to reuse the same $b_k$'s because we want the number of guessed bits to be strictly logarithmic in $n$.)

Fortunately, the union bound works even for dependent events. Taking the union bound over all $i\in[n]$, $\Pr[x' = x] \ge 1/2$, conditioned on $x \in G$ and on all $b_k$'s being correct. Removing the conditioning* costs factors $\alpha/2$ and $1/m$, but $B$ still inverts $y$ w.p. $\ge 1/(4p(n)m(n)) = 1/(32 n p^3(n))$, contradicting that $f'$ is a OWF.

* For any events $A,B,C$, $\Pr[A] \ge \Pr[A \mid B \cap C] \Pr[B \cap C]$. The above applies this with $A$ the event $x' = x$, $B$ the event $x \in G$, and $C$ the event that the guess $b_k = x \odot u_k$ is correct for all $k \in [\ell]$.

Discuss The number of bits we guessed is $\log m = O(\log p(n)) = O(\log n)$, where $p(n)$ depends on the (hypothetical) NUPPT $A$. Since the guessed bits entail information about $x$, the proof formally implies that for any OWF, there must be $\omega(\log n)$ bits of $x$ that are hard to guess given $f(x)$. Still, having an efficient and uniform attack is non-trivial. Put another way: the output of the adversary $A$ gives a probabilistic bit conveying information about $x$, and the reduction learns $x$ by repeatedly querying $A$ on carefully chosen inputs, so that the probability of finding the correct $x$ is high while the time and query complexity stay small. Hence, the reduction is, at a high level, related to learning.

Discuss How far does the Hard-core Lemma extend? Suppose $f'$ is a OWF, and suppose $h'$ is a hard-core predicate for $f'$.

  • Is $f(x) := f'(x) \| h'(x)$ a OWF?
  • Let $f(x,t,r) := f'(x) \| t \| r$, and let $h(x,t,r) := x \odot r$. Is $f$ a OWF? If so, is $h$ a hard-core predicate for $f$?
  • Let $f(x,t,r) := f'(x) \| t \| x \odot t \| r$, and let $h(x,t,r) := x \odot r$. Is $f$ a OWF? If so, is $h$ a hard-core predicate for $f$?

The questions are highly relevant when we want to construct PRG from any one-way function.

Min-Entropy and Leftover Hash Lemma

We will use pairwise independent hash functions. The following facts are borrowed from the book of Salil Vadhan, Pseudorandomness, cited as [V].

Definition: statistical difference

For random variables $X$ and $Y$ taking values in $U$, their statistical difference (also known as variation distance) is $\Delta(X, Y) := \max_{T \subseteq U} |\Pr[X \in T] - \Pr[Y \in T]|$. We say that $X$ and $Y$ are $\eps$-close if $\Delta(X, Y) \le \eps$.

[V, Definition 6.2, p169]
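For finite supports, the maximum over tests $T$ is attained by $T = \set{u : \Pr[X=u] > \Pr[Y=u]}$, so $\Delta$ equals half the $\ell_1$ distance; a small sketch:

```python
def statistical_difference(p: dict, q: dict) -> float:
    """Delta(X, Y) = max_T |Pr[X in T] - Pr[Y in T]|
                   = (1/2) * sum_u |p(u) - q(u)| for finite supports."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(u, 0.0) - q.get(u, 0.0)) for u in support)

# Example: a fair coin and a 60/40 coin are 0.1-close.
print(statistical_difference({"H": 0.5, "T": 0.5}, {"H": 0.6, "T": 0.4}))
```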

Fact: (Statistically close to uniform, warmup)

Let $n \in \N$, $\eps \in [0,1]$, and let $X \in \bit^n$ be a random variable. If $X$ is $\eps$-close to $U_n$, then

$$\Pr[ y \gets U_n: y \in \Supp(X)] \ge 1-\eps,$$

where $\Supp(X) := \set{x : \Pr[X = x] > 0}$ denotes the support of $X$.

Definition: Pairwise independent hash family.

A family of functions $\cH = \set{h : \bit^n \to \bit^m}$ is pairwise independent if the following two conditions hold when $h \gets \cH$ is a function chosen uniformly at random from $\cH$:

  1. For all $x \in \bit^n$, the random variable $h(x)$ is uniform in $\bit^m$.
  2. For all $x_1\neq x_2 \in \bit^n$, the random variables $h(x_1)$ and $h(x_2)$ are independent.

[V, Definition 3.21, p64]

Lemma: Pairwise independent hash from linear mapping.

For any finite field $F$, define $\cH$ to be the following set:

$$\cH := \set{h_{a,b} : h_{a,b}(x) = a x + b,\ a,b\in F}.$$

$\cH$ is a pairwise independent hash family. We often abuse notation, denoting $h \in \cH$ to be the seed and $h(x)$ to be the evaluation of the hash function.

[V, Construction 3.23, p65]

If $m \ge n$, choosing the field to be $F_{2^m}$ gives a construction such that each function takes $2m$ bits to describe. If $m < n$, choosing $F_{2^n}$ and chopping the output to $m$ bits is still pairwise independent.
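A sketch of $h_{a,b}(x) = ax + b$ instantiated over $F_{2^8}$ (so $n = m = 8$), using the AES field polynomial $x^8 + x^4 + x^3 + x + 1$ for the field multiplication; addition in $F_{2^k}$ is XOR.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply in F_{2^8} modulo the irreducible polynomial
    x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return res

def h_ab(a: int, b: int, x: int) -> int:
    """h_{a,b}(x) = a*x + b over F_{2^8}; the seed (a, b) takes 2*8 bits,
    matching the 2*max(n, m) bound in the corollary below."""
    return gf256_mul(a, x) ^ b  # field addition is XOR

print(h_ab(0x57, 0x13, 0x83))
```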

Corollary:

For any $n,m\in\N$, there exists a pairwise independent hash family $\cH_{n,m}$ such that each $h \in \cH$ is $2 \max(n,m)$ bits.

[V, Theorem 3.26, p66]

Definition: Min-entropy

Let $X$ be a random variable. Then the min-entropy of $X$ is:

$$H_\infty(X) = \min_{x}\left\{\log \frac{1}{\Pr[X=x]}\right\},$$

where all logs are base 2.

[V, Definition 6.7, p171]
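A one-liner makes the definition concrete: min-entropy is determined by the single most likely outcome.

```python
import math

def min_entropy(pmf: dict) -> float:
    """H_inf(X) = min_x log2(1/Pr[X=x]) = -log2(max_x Pr[X=x])."""
    return -math.log2(max(pmf.values()))

print(min_entropy({"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}))  # 2.0
print(min_entropy({"00": 0.5, "01": 0.25, "10": 0.25}))               # 1.0
```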

Example:

We say a function $f: \bit^n \to \bit^n$ is $M$-regular if it is $M$-to-one, i.e., for all $y$ in the image of $f$, it holds that $|f^{-1}(y)| = M$. We have $H_\infty(f(U_n)) = \log(2^n / M)$. Let $k := H_\infty(f(U_n))$, and let $m := \log M$ (so that $n = k+m$).

Suppose someone secretly sampled $x \gets U_n$ and computed $f(x)$. We have the following:

  • Given nothing, we can only guess $x$ correctly w.p. $2^{-n}$.
  • Given nothing, we can only guess $f(x)$ correctly w.p. $2^{-k}$, by the min-entropy of $f(x)$.
  • Given $f(x)$, we can guess $x$ correctly w.p. $2^{-m}$. Thus, $m$ is viewed as the min-entropy of $x$ given $f(x)$ (we avoid formalizing conditional min-entropy).

Theorem: Leftover Hash Lemma

Let $n,m,k \in \N$ and $\eps > 0$. Suppose that $\cH = \set{h : \bit^n \to \bit^m}$ is a pairwise independent family of hash functions and that $X \in \bit^n$ is a random variable. If $H_\infty(X) \ge k$ and $m = k - 2 \log(1/\eps)$, then $(h, h(X))$ is $\eps$-close to a uniformly random string of $|h| + m$ bits.

[V, Theorem 6.18, p179]
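For a quick feel of the parameters: if $H_\infty(X) \ge k = 160$ and we target $\eps = 2^{-40}$, then we may extract $m = k - 2\log(1/\eps) = 160 - 80 = 80$ bits, and $(h, h(X))$ is $2^{-40}$-close to uniform even though the hash function $h$ itself is published alongside the output.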

Corollary: Guessing the hash value

Let $\cH$ be a pairwise independent hash family from $n$-bit strings to $n$-bit strings. For any random variable $X \in \bit^n$ such that $H_\infty(X) = k \le n$,

$$\Pr[h\gets \cH, y \gets U_{k-d}: \exists x \in \Supp(X),\ h_{k-d}(x) = y ] \ge 1-2^{-d/2},$$

where $h_t(x)$ denotes the $t$-bit prefix of $h(x)$.

Another look at Goldreich-Levin.

Theorem: Leftover Hash Lemma [Mazor-Pass 2023, Lemma 3.4]

Let $n\in\N$, $\eps \in [0,1]$, and let $X$ be a random variable over $\bit^n$. Let $M \gets \bit^{n\times n}$ be a uniformly random matrix, and let

$$\ell \le H_\infty(X) - 3 \log(1/\eps) - 4 \log n - 4.$$

Then,

$$\Delta((M,(M\odot X)_{1..\ell}), (M, U_\ell)) \le \eps,$$

where $M\odot x$ denotes the matrix-vector multiplication modulo 2, $(z)_{1..\ell}$ denotes the first $\ell$ bits of $z$, and $U_\ell$ denotes the uniform distribution over $\bit^\ell$.

(See [Mazor-Pass 2023] for a simple proof from Goldreich-Levin’s hard-core lemma.)

Example:

Consider a distribution X{0,1}nX \in \bit^n such that H(X)=kH_\infty(X) = k for some k[0,n]k \in [0,n]. Let MM be the matrix sampled as in LHL, and let \ell be defined w.r.t. kk as in LHL. Sample z{0,1}z \gets \bit^\ell uniformly at random. Then, for any ϵ[0,1]\eps \in [0,1], we have that

$$\Pr_{M,z}[\exists x \text{~s.t.~} p_X(x) > 0 \text{ and } (M\odot x)_{1..\ell} = z] \ge 1 - \eps,$$

where $p_X(\cdot)$ denotes the probability mass function of $X$.

Example:

Consider two random variables $x_1$ and $x_2$ over $\bit^n$, each with min-entropy at least as required by the LHL, together with the random matrix $M$ and the parameter $\ell$ defined as above. Then, for any $\eps \in [0,1]$, the following two distributions

  • $(M, (M\odot x_1)_{1..\ell}, (M\odot x_2)_{1..\ell})$, and
  • $(M, U_{1,\ell}, U_{2,\ell})$

are $2\eps$-close, where $U_{1,\ell}$ and $U_{2,\ell}$ are two independent and uniformly random $\ell$-bit strings.

(The proof is a simple hybrid.)

Notice that in the above example, $x_1$ and $x_2$ need not be identically distributed, and they need not even be independent.

Weak Pseudo-Entropy Generator (PEG)

The first step is to obtain a gap between Shannon entropy and pseudo-entropy.

Definition: weak PEG

A function $F : \bit^n \to \bit^m$ is called a weak pseudo-entropy generator (PEG) if there exists a $k$ such that

  1. $H(F(U_n)) \le k$.
  2. There exists $Y_n$ such that $\set{F(U_n)}_n \approx \set{Y_n}_n$ and $H(Y_n) \ge k + \frac{1}{100n}$. (This is called pseudo Shannon entropy.)

Discuss Is a weak PEG also a weak OWF?

Throughout the construction, it is easier to think of the given OWF as a $K$-to-one mapping for some known $K=2^k$.

Definition: kk-regular OWFs

Let $k := k(n)$. A OWF $f$ is called $k$-regular iff for all $x \in \bit^n$, it holds that $\log |f^{-1}(f(x))| = k$.

Theorem: Weak PEG from OWF

Let $f: \bit^n \to \bit^n$ for all $n\in\N$ be a OWF, and let $\cH$ be a pairwise independent hash family such that each $h \in \cH$ maps $h : \bit^n \to \bit^n$ and $|h| = 2n$. Define the function $F$ to be the following:

$$F(x,i,h,r) := (f(x), i, h, h_i(x), r, x \odot r),$$

where $h$ is abused to denote the description of $h \in \cH$, and $h_i(x)$ denotes the $i$-bit prefix of $h(x)$. Then, $F$ is a weak PEG.

Intuition for the construction:

  • $f(x), r, x \odot r$ is the GL hard-core lemma, the starting point to obtain a pseudo-random bit (i.e., pseudo-entropy). However, $x \odot r$ gives no extra pseudo-entropy for many-to-one $f$.
  • $f(x), r, x \odot r, h(x)$ attempts to obtain pseudo-entropy by "identifying" a unique $x' \in f^{-1}(f(x))$ through $h(x)$. However, with $h$ random and hidden, any $x''$ may map to $h(x)$ (like OTP), so identification fails.
  • $f(x), r, x \odot r, h, h(x)$ gives out $h$, and we get good identification. However, identification is too good: inverting becomes easy, since solving $h(x) = a x + b$ for $x$ is easy.
  • $f(x), r, x \odot r, h, h_{i^\ast}(x), i^\ast$ cuts $h(x)$ short to make inverting hard. For a proper choice of $i^\ast(x)$, this works, but $i^\ast(x)$ is hard to compute.
  • We end up with guessing a random $i$. It works; the proof follows below.

For each $x \in \bit^n$, let $i^\ast(x) := \ceil{\log |f^{-1}(f(x))|}+1$. We write $i^\ast$ for $i^\ast(x)$ and $Z'$ for $Z'_{i^\ast(x)}$ for short. Let random variables $Y, Z, Z'_{i^\ast}$ be the following:

$$Y := (f(x), i, h, h_i(x), r), \quad Z := x \odot r, \quad Z' := \begin{cases} \text{random bit} & \text{if } i = i^\ast\\ x \odot r & \text{otherwise.} \end{cases}$$

Claim: Low Shannon entropy

$$H(YZ') \ge H(YZ) + \frac{1}{2n}.$$

We have $H(YX) = H(Y) + H(X \mid Y)$ for all $X, Y$ (proof left as exercise). Hence, it suffices to show that $H(Z' \mid Y) \ge H(Z \mid Y) + \frac{1}{2n}$. Conditioned on $i \neq i^\ast$, we have $H(Z' \mid Y) = H(Z \mid Y)$. We want to show that conditioned on $i = i^\ast$, which happens w.p. $1/n$, we have $H(Z' \mid Y) = 1 \ge H(Z \mid Y) + \frac{1}{2}$. It remains to show that $H(Z \mid Y) \le 1/2$. We will show that given $(f(x), i^\ast, h, h_{i^\ast}(x), r)$, w.h.p. $x$ is uniquely determined, and thus $Z = x \odot r$ is also determined (and contributes 0 Shannon entropy).

For any $x \in \bit^n$ and $y := f(x)$, define the pre-image set $S := f^{-1}(f(x)) \setminus \set{x}$. Notice that $|S| \le 2^{i^\ast -1}$. Since $h$ is pairwise independent, for any $x' \in S$,

$$\Pr_h[h_{i^\ast}(x') = h_{i^\ast}(x)] = 1/2^{i^\ast}.$$

To see that $x$ is determined w.h.p. over the random $h$,

$$\begin{align*} \Pr_h[\nexists x' \in S \text{ s.t. } h_{i^\ast}(x') = h_{i^\ast}(x)] & = 1 - \Pr[\exists x' \in S \text{ s.t. } h_{i^\ast}(x') = h_{i^\ast}(x)] \\ & \ge 1 - |S|/2^{i^\ast} \ge 1/2, \end{align*}$$

by the union bound and then by $|S| \le 2^{i^\ast - 1}$ being small.

(The calculation of conditional entropy is left as exercise.)

Claim: High pseudo-entropy

$$\set{YZ}_n \approx \set{YZ'_{i^\ast(x)}}_n.$$

Proof Sketch.

Because $YZ$ and $YZ'$ differ only when $i = i^\ast$, assume for contradiction that there exists a NUPPT $A$ and a polynomial $p$ such that for infinitely many $n$,

$$\Pr_{x,h,r} [A(f(x),i^\ast,h,h_{i^\ast}(x),r) = x\odot r] \ge 1/2 + \alpha,$$

where $\alpha = 1/p(n)$.

We want to construct $B$ that inverts $y \gets f(x)$. We have a similar claim for good $x$'s: let $G$ be the set

$$G := \set{ x \in \bit^n ~|~ \Pr_{h,r}[A(f(x),i^\ast,h,h_{i^\ast}(x),r) = x \odot r] \ge 1/2 + \alpha / 2 }.$$

Then, $|G| \ge 2^n \cdot \alpha / 2$. We can next fix $h$ similarly: for each $x\in G$, let $\cH_x$ be the set

$$\cH_x := \set{ h \in \cH ~|~ \Pr_{r}[A(f(x),i^\ast,h,h_{i^\ast}(x),r) = x \odot r] \ge 1/2 + \beta }.$$

Then, $|\cH_x| \ge |\cH| \cdot \beta$, where $\beta := \alpha/4$.

Now, we can condition on $x \in G$ and $h \in \cH_x$. Namely, given $y \gets f(x)$, $B$ tries all $i \in [n]$ and samples $h \gets \cH$ uniformly, so that in the iteration with $i = i^\ast$, we have $h \in \cH_x$ w.p. $\beta$. It remains to find the correct prefix of $h(x)$ so that $B$ can run $A$ repeatedly using pairwise independent $r$'s.

Suppose that $x$ is fixed and $h$ is sampled uniformly and independently from $\cH$. Given $y = f(x)$, the min-entropy of $x$ is $\ge i^\ast(x)-1$ because each $x' \in f^{-1}(y)$ is mapped to the same $y$. By the corollary of the Leftover Hash Lemma, "guessing the hash value", the first $i^\ast - d$ bits of $h(x)$ are $2^{-(d-1)/2}$-close to uniform. This implies that we can hit the first $i^\ast - d$ bits of $h_{i^\ast}(x)$ w.p. $1 - 2^{-(d-1)/2}$ by sampling them uniformly at random.

However, to apply $A$, we also conditioned on $h \in \cH_x$ (instead of uniform $h$). Hence, we need to take a union bound:

  • $h \notin \cH_x$ w.p. $\le 1 - \beta$;
  • the guess differs from the first $i^\ast - d$ bits of $h(x')$ for all $x' \in f^{-1}(y)$ w.p. $\le 2^{-(d-1)/2}$.

Thus, choosing $d$ such that $2^{-(d-1)/2} \le \beta/2$, we sample a "good" input $(y = f(x), i^\ast, h, h_{i^\ast})$ for $A$ w.p. $\ge \beta/2$ (conditioned only on $x \in G$). With the above, we can try all of the remaining $d = O(\log(1/\beta)) = O(\log n)$ bits and then check whether the outcome $x'$ satisfies $f(x') = y$.

Algorithm $B(y)$:

  1. $h \gets \cH$
  2. $d := -\log (\alpha / 4) + 1$
  3. For each $i=1,...,n$,
    1. $t_1 \gets \bit^{i - d}$
    2. For each $t_2 \in \bit^d$,
      • Let $t := t_1 \| t_2$.
      • Run $x' \gets B_0(y, i, h, t)$.
      • Output $x'$ if $f(x') = y$.

The subroutine $B_0$ is almost identical to the standard Goldreich-Levin reduction; the only difference is that $A$ takes the additional inputs $(i, h, h_i)$.

Algorithm $B_0(y, i, h, h_i)$

  1. Let $\ell := \log m$, and let $(u_1, ..., u_\ell)$ be fully independent and $(r_1,..., r_m)$ be pairwise independent $n$-bit random strings.
  2. For each $k \in [\ell]$, sample a guess bit $b_k$ uniformly. For each $j \in [m]$, compute the bit $g_j$ from $(b_1, ..., b_\ell)$ in the same way as $r_j$ is computed from the $u_k$'s (so that whenever $b_k = x \odot u_k$ for all $k$, we have $g_j = x \odot r_j$).
  3. For each $i'=1,2, .., n$,
    1. For each $j=1,2,..., m$,
      • Run $z_{i',j} \gets A(y, i, h, h_i, e_{i'} \oplus r_j) \oplus g_{j}$.
    2. Let $x'_{i'}$ be the majority of $\set{z_{i',j}}_{j\in[m]}$
  4. Output $x' := x'_1 x'_2 ... x'_n$

The parameter $m$ is chosen according to the success probability of $A$ conditioned on $x \in G$, $h\in \cH_x$, and $(i, h_i)$ being consistent. Notice that conditioned on $x \in G$, the events "$h \in \cH_x$" and "$t_1$ hits $h_{i^\ast}(x)$" are independent, holding w.p. $\beta$ and $\ge 1/2$ respectively. Also, $B$ runs over all possible $i$ and $t_2$. Hence, the overall success probability is $\poly(1/n, 1/p(n))$.

Discuss Is $F$ a OWF? No: consider the case $i=n$, which happens w.p. $1/n$; then w.h.p. over $h$, we can solve for $x$ from $(h, h_n(x))$. However, the above claim also shows that $F$ is hard to invert when $i = i^\ast(x)$, i.e., $F$ is a weak OWF.

PRG from any OWF

We assume that the OWF is $f: \bit^n \to \bit^n$. This is w.l.o.g.: if the input is shorter, then we pad the input with unused random bits; if the output is shorter, then we pad the output with fixed bits. The same applies to the new notions in this section, namely weak PEG and PEG.

Historically, the first construction of PRG from OWF was given by HILL'99, initiated by IL'89 and ILL'89. The construction here is presented as in a lecture of Barak at Princeton, which followed the paper of Holenstein'06. Later, HRV'13 and VZ'12 improved the efficiency. Interestingly, the constructions of PRG introduced several novel tools that became useful later, e.g., the Leftover Hash Lemma, due to [ILL'89].

Even though the result is impactful, the full construction is often skipped in textbooks and graduate-level cryptography courses. Many books and courses cover the Goldreich-Levin hard-core lemma [Ps, KL], but only a few go beyond it (such as the lecture notes of Bellare, 1999). The book of Goldreich, Section 3.5, is one that goes much deeper, constructing a PRG from any "regular" OWF, where regular means that for all $x$ of the same length, the pre-image set $f^{-1}(f(x))$ has the same cardinality. Still, the full construction

… is even more complex and is not suitable for a book of this nature.

Goldreich, Section 3.5

The only teaching material we found is the lecture of Barak.

In this subsection, we describe the construction of Mazor and Pass [MP23].

Construction: Mazor-Pass PRG

Let $f: \bit^n \to \bit^n$ be a function for all $n\in\N$. Define the following functions.

  • $h_M(x) := M\odot f(x) \| M\odot x$, where $x\in\bit^n$, and $M\in \bit^{n\times n}$ is a matrix that will be given later. Looking forward, we will compute $h_M(x)$ on different $x$'s but with the same $M$.
  • $\bar h_M(x_1, ..., x_{t}) := h_M(x_1) \| ... \| h_M(x_{t})$, which simply applies $h_M$ to $t$ independent strings $x_i\in\bit^n$ and concatenates the outputs.

The function $g$ is defined by the following (deterministic) algorithm, where

  • $s, t = \poly(n)$ are parameters to be determined later,
  • $s' := \ceil{s \cdot \frac{n + \log n}{2n}}$ is also a parameter,
  • $M \in \bit^{n\times n}$, $x_{i,j} \in \bit^n$, $r_i \in [2n]$, and $R \in \bit^{s' \times s}$ are all inputs ($r_i$ is represented as a $\log(2n)$-bit binary string).

Function $g(M, (x_{i,j} : i \in [s], j\in [t]), (r_i : i \in [s]), R)$:

  1. For all $i \in [s]$, compute $y'_i \gets \bar h_M(x_{i,1}, ..., x_{i,t})$. This is called a "row", which consists of $2nt$ bits.
  2. For each $i \in [s]$, remove from $y'_i$ the prefix of $r_i$ bits and the suffix of $2n - r_i$ bits; call the resulting $2n(t-1)$-bit string $y_i$.
  3. Define $R(\hat y) := R \odot \hat y$ for any $s$-bit $\hat y$.
  4. For each $j \in [2n(t-1)]$, define $\hat y_j := y_{1,j} \| y_{2,j} \| ... \| y_{s,j}$. That is, $\hat y_j$ is the concatenation of the $j$-th bits of $y_1, y_2, ..., y_s$, and thus each $\hat y_j$ is $s$ bits.
  5. Output $M\| R\| R(\hat y_1) \| R(\hat y_2) \| ... \| R(\hat y_{2n(t-1)})$.
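A Python sketch of $g$ over bit lists, assuming `f` maps an $n$-bit list to an $n$-bit list; all matrix products are over GF(2).

```python
def mat_vec(M, v):
    """Matrix-vector product over GF(2)."""
    return [sum(row[k] & v[k] for k in range(len(v))) % 2 for row in M]

def g(f, M, xs, rs, R, n, s, t):
    """Mazor-Pass PRG sketch: M is n x n, xs[i][j] = x_{i,j} in {0,1}^n,
    rs[i] in [0, 2n), R is s' x s."""
    rows = []
    for i in range(s):
        # Row y'_i = h_M(x_{i,1}) || ... || h_M(x_{i,t}), 2nt bits.
        row = []
        for j in range(t):
            row += mat_vec(M, f(xs[i][j])) + mat_vec(M, xs[i][j])
        # Drop the prefix rs[i] bits and the suffix 2n - rs[i] bits.
        rows.append(row[rs[i]: rs[i] + 2 * n * (t - 1)])
    out = [bit for Mrow in M for bit in Mrow]   # output M ...
    out += [bit for Rrow in R for bit in Rrow]  # ... and R ...
    for j in range(2 * n * (t - 1)):
        y_hat = [rows[i][j] for i in range(s)]  # j-th column, s bits
        out += mat_vec(R, y_hat)                # ... and R(y_hat)
    return out
```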

We claim that if $f$ is a OWF, then the function $g$ defined above is a PRG. Clearly, $g$ is easy to compute if $f$ is. For expansion, the input and output sizes of $g$ are:

  • Input: $n^2 + nst + s(1+\log n) + ss'$ bits, for $M$, the $x$'s, the $r$'s, and $R$.
  • Output: $n^2 + ss' + 2n(t-1)s'$ bits. Since $s' \ge \frac{s}{2} + s \cdot \frac{\log (2n)}{2n}$, we have that $2n(t-1)s' \ge n(t-1)s + (t-1)s \log (2n)$.

Choosing a sufficiently large $t \ge 2n$, we obtain expansion, as the difference between the output and input sizes is at least (evaluating at $t = 2n$ and using $s(1+\log n) = s\log(2n)$)

$$n(t-1)s + (t-1)s \log (2n) - (nst + s(1+\log n)) = -ns + (2n-2)s \log (2n) > 1.$$

It remains to show pseudorandomness.