\newcommand{\Gen}{\mathsf{Gen}} \newcommand{\Tag}{\mathsf{Tag}} \newcommand{\Ver}{\mathsf{Ver}} \newcommand{\Sign}{\mathsf{Sign}} \newcommand{\taccept}{\text{accept}} \newcommand{\twin}{\text{ win}}

Authentication

When people communicate through written words, signatures and seals are applied so that the receiver can check that the message really comes from the claimed sender and has not been altered along the way. Notice that the sender must share some information with the receiver up front; for example, the receiver must already know what the sender's signature looks like.

In this section, we will discuss digital methods which make it difficult to “forge” a “signature.” Just as with encryption, there are two approaches to the problem, depending on whether verification uses a shared secret key or a public key: message authentication codes and digital signatures. Message Authentication Codes (MACs) are used in the private key setting. Only people who know the secret key can check if a message is valid. Digital Signatures extend this idea to the public key setting. Anyone who knows the public key of Alice can verify a signature issued by Alice, and only those who know the secret key can issue signatures. [Ps, p133]

Message Authentication Codes (MAC)

Definition: MAC

(Gen,Tag,Ver)(\Gen, \Tag, \Ver) is a message authentication code (MAC) over the message space {Mn}n\set{\cM_n}_n if the following syntax, correctness, and security hold:

  • Gen\Gen is a PPT algorithm that returns a key kGen(1n)k \gets \Gen(1^n).
  • Tag\Tag is a PPT algorithm that on input key kk and message mm outputs a tag σTagk(m)\sigma \gets \Tag_k(m).
  • Ver\Ver is a deterministic polynomial-time algorithm that on input k,mk, m and σ\sigma outputs “accept” or “reject”.

Correctness: For all nNn \in\N, for all mMnm \in \cM_n,

Pr[kGen(1n):Verk(m,Tagk(m))=accept]=1.\Pr[k \gets \Gen(1^n) : \Ver_k(m, \Tag_k(m)) = \taccept] = 1.

Security: for all NUPPT adversaries AA, there exists a negligible function ϵ(n)\eps(n) such that

Pr[kGen(1n);(m,σ)ATagk()(1n) : A did not query m, and Verk(m,σ)=accept]ϵ(n)\Pr\left[ \begin{array}{l} k \gets \Gen(1^n);\\ (m, \sigma) \gets A^{\Tag_k(\cdot)}(1^n) \end{array} ~:~ \begin{array}{l} A \text{ did not query } m \text{, and }\\ \Ver_k(m, \sigma) = \taccept \end{array} \right] \le \eps(n)
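
To make the security game concrete, the following is a minimal sketch of the forgery experiment in Python; `Gen`, `Tag`, `Ver`, and `adversary` are hypothetical callables standing for the algorithms above, not a fixed API.

```python
def mac_forgery_experiment(Gen, Tag, Ver, adversary, n):
    """Run the MAC forgery game; return True iff the adversary wins.

    `adversary` is any callable taking (n, tag_oracle) and returning a
    candidate forgery (m, sigma).  All names here are placeholders.
    """
    k = Gen(n)
    queried = set()

    def tag_oracle(m):
        queried.add(m)
        return Tag(k, m)

    m, sigma = adversary(n, tag_oracle)
    # A wins only if m was never queried and the tag verifies.
    return m not in queried and Ver(k, m, sigma) == "accept"
```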

Discuss:

  • The adversary AA is allowed to oracle query Tagk\Tag_k. Is that too strong as AA gets too much info? What if we totally remove the oracle query? What if the adversary gets some (x,Tagk(x))(x, \Tag_k(x)) pairs but only for challenger-chosen xx’s?
  • The adversary AA aims to forge a tag for arbitrary message mm. Is that too strong since useful messages are not arbitrary?
  • Can we change the definition so that AA wins only if it outputs the secret key?
  • Can this definition prevent an adversary that replays a (message, tag) pair? Replay can be a devastating attack in many applications, such as financial transactions.
  • If we additionally give the adversary oracle access to verification Verk\Ver_k, is the definition stronger?
  • Does the existence of MAC imply the existence of OWF? If so, what’s the construction?

The above questions provide intuition for our first MAC candidate.

Construction: Warmup, Information-Theoretic MAC

Define the following functions

  • Gen(1n):k{0,1}n\Gen(1^n): k \gets \bit^n.
  • Tagk(m)\Tag_k(m): Output mkm \oplus k.
  • Verk(m,σ)\Ver_k(m, \sigma): Output “accept” if and only if mk=σm \oplus k = \sigma.

This MAC is clearly insecure if the adversary can make even one oracle query to Tagk\Tag_k: given any pair (m,σ=mk)(m, \sigma = m \oplus k), it is trivial to recover kk. However, the MAC is actually information-theoretically (and perfectly) secure when the adversary cannot obtain any pair. Hence, it is called IT-MAC and is indeed used in some cryptographic protocols (which we do not have time to cover).

Noticing that the output of Tag\Tag above is uniformly random, we next construct a MAC whose output is a pseudorandom string.

Construction: MAC

Let F={fs}F = \set{ f_s} be a family of pseudorandom functions such that fs:{0,1}s{0,1}sf_s : \bit^{|s|} \to \bit^{|s|}.

  • Gen(1n):k{0,1}n\Gen(1^n): k \gets \bit^n.
  • Tagk(m)\Tag_k(m): Output fk(m)f_k(m).
  • Verk(m,σ)\Ver_k(m, \sigma): Output “accept” if and only if fk(m)=σf_k(m) = \sigma.
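
As a quick illustration, here is a minimal sketch of this construction in Python, using HMAC-SHA256 as a heuristic stand-in for the PRF f_k (the provable construction above uses an actual PRF on fixed-length inputs); all names are illustrative.

```python
import hashlib
import hmac
import os

def gen(n=32):
    # k <- {0,1}^n (n bytes here, for convenience)
    return os.urandom(n)

def tag(k, m):
    # sigma := f_k(m), with HMAC-SHA256 standing in for the PRF f_k
    return hmac.new(k, m, hashlib.sha256).digest()

def ver(k, m, sigma):
    # accept iff f_k(m) == sigma; compare_digest avoids timing leaks
    return hmac.compare_digest(tag(k, m), sigma)

# usage
k = gen()
m = b"example message"
assert ver(k, m, tag(k, m))
```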

Theorem:

If there exists a pseudorandom function, then the above scheme is a Message Authentication Code over the message space {0,1}n\bit^n.

Proof Sketch:

The correctness is direct, and the security is argued below.

Consider a hybrid scheme and the corresponding hybrid experiment in which a random function RFRF is used instead of the PRF fkf_k. The hybrid experiment is indistinguishable from the real one by the oracle indistinguishability of the PRF, via a standard reduction (for any NUPPT adversary). Next, in the hybrid experiment, we claim that any adversary wins with probability at most 2n2^{-n}: if the adversary outputs a message mm that it queried, it does not win by definition; otherwise, RF(m)RF(m) is uniform and independent of the adversary's view, so the adversary outputs σ=RF(m)\sigma = RF(m) w.p. 2n2^{-n} (even for unbounded adversaries).

From fixed-length to arbitrary length

The above definition and construction consider MAC for messages in a space Mn\cM_n of fixed length. Of course, to handle arbitrarily long messages, we can instantiate the PRF in the construction with a PRF that takes arbitrary-length inputs. However, existing practical pseudorandom functions (i.e., block ciphers) take short, fixed-length inputs. Handling longer messages on top of a fixed-length primitive therefore requires some care; this is called “domain extension” in general, and a naive approach combines the following steps (a sketch follows the list).

  1. Split a long message into a sequence of short “blocks” and then Tag\Tag each.
  2. Include a sequence number into each block (to prevent reordering attack).
  3. Append to every block the total length (to prevent truncation attack).
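
For concreteness, here is a minimal sketch (in Python) combining steps 1–3, with HMAC-SHA256 standing in for the fixed-length PRF-based Tag; all names are illustrative. Even with sequence numbers and the total length, this sketch is not a secure MAC as-is: blocks from two different same-length messages tagged under the same key could still be mixed, which is why standard treatments (e.g., [KL]) additionally include a fresh random message identifier in every block.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block (illustrative choice)

def tag_block(k, block):
    # stand-in for the fixed-length PRF-based Tag on a single block
    return hmac.new(k, block, hashlib.sha256).digest()

def tag_long(k, m):
    """Split m into blocks; include each block's sequence number and the
    total message length so reordering and truncation are detected."""
    blocks = [m[i:i + BLOCK] for i in range(0, len(m), BLOCK)] or [b""]
    total = len(m).to_bytes(8, "big")
    tags = []
    for seq, blk in enumerate(blocks):
        encoded = seq.to_bytes(8, "big") + total + blk
        tags.append(tag_block(k, encoded))
    return b"".join(tags)

def ver_long(k, m, sigma):
    return hmac.compare_digest(tag_long(k, m), sigma)
```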

There are other approaches, such as CBC-MAC (which is only secure when the message length is “fixed and agreed upon in advance”, see [KL, Exercise 4.13]). We skip the details of these constructions.

Cryptographic Hash Functions

Hash functions are used in many areas of computer science, and in general these applications require the function to be shrinking; that is, the output is shorter than the input of the function.

In cryptographic applications like MAC, we want a hash function hh to be collision-resistant. That is, given an input mm and its corresponding output h(m)h(m), it is computationally hard to find mmm' \neq m such that h(m)=h(m)h(m') = h(m). There are several formal definitions, and we discuss only two below.

Ref: [Ps 5.5] [KL 9.3]

Definition: Universal One-Way Hash Functions

A set of functions H={hk:{0,1}d(n){0,1}r(n),n=k}k{0,1}H = \set{h_k : \bit^{d(n)} \to \bit^{r(n)}, n = |k|}_{k \in \bit^\ast} is a family of universal one-way hash functions (UOWHF) if:

  • (compression) r(n)<d(n)r(n) \lt d(n).
  • (ease of evaluation) Given k{0,1}nk \in \bit^n and x{0,1}d(n)x \in \bit^{d(n)}, the computation of hk(x)h_k(x) can be done in PPT in nn.
  • (collision resistance) for all NUPPT AA, there exists a negligible ϵ\eps such that nN\forall n \in \N, AA wins the following game w.p. ϵ(n)\le \eps(n):

    1. (x,state)A(1n)(x, state) \gets A(1^n)
    2. k{0,1}nk \gets \bit^n
    3. xA(1n,k,state)x' \gets A(1^n, k, state)
    4. AA wins if xxx' \neq x and hk(x)=hk(x)h_k(x') = h_k(x)

Example:

Suppose H={hi:{0,1}{0,1}i}i{0,1}H = \set{h_i: \bit^* \to \bit^{|i|}}_{i \in \bit^*} is a family of UOWHF. Let (Gen,Tag,Ver)(\Gen, \Tag, \Ver) be a MAC for 2n2n bit messages, where nn is the security parameter (given to Gen\Gen). Then the following is also a MAC for arbitrarily long (polynomial in nn) messages:

  • Gen\Gen' is identical to Gen\Gen.
  • Tagk(m)\Tag'_k(m): sample i{0,1}ni \gets \bit^n, compute yhi(m)y \gets h_i(m), then output (iyTagk(iy))(i \| y \| \Tag_k(i \| y)).
  • Verk(m,σ=iyσ)\Ver'_k(m, \sigma = i' \| y' \| \sigma'): accept iff hi(m)=yh_{i'}(m) = y' and Verk(iy,σ)\Ver_k(i'\|y', \sigma') accepts.

Here, we upgraded the MAC to work for messages longer than 2n2n.

Question: in the definition of UOWHF, we allow AA to choose xx before seeing the key kk. Why, or where, do we use this property in the example above?

Definition: Collision-Resistant Hash Functions

A set of functions H={hi:DiRi}iIH = \set{h_i : D_i \to R_i}_{i\in I} and algorithm Gen\Gen is a family of collision-resistant hash functions (CRHF) if:

  • (ease of sampling) iGen(1n)i\gets \Gen(1^n) runs in PPT, iIi \in I.
  • (compression) Ri<Di|R_i| \lt |D_i|.
  • (ease of evaluation) Given iIi \in I and xDix \in D_i, the computation of hi(x)h_i(x) can be done in PPT.
  • (collision resistance) for all NUPPT AA, there exists a negligible ϵ\eps such that nN\forall n \in \N,

    Pr[iGen(1n);(x,x)A(1n,i) : hi(x)=hi(x)xx]ϵ(n).\Pr[i \gets \Gen(1^n); (x, x') \gets A(1^n, i) ~:~ h_i(x) = h_i(x') \wedge x \neq x'] \le \eps(n).

Note Because the adversary AA chooses both xx and xx', the sampled index ii (the key of the hash function) is necessary to defend against non-uniform adversaries; otherwise, a non-uniform AA could just remember a colliding pair (x,x)(x,x') for every problem size nNn\in \N. Many practical hash functions (such as SHA) are unkeyed and do not satisfy this definition.

A CRHF is a UOWHF and also a OWF, and we can construct UOWHF from OWF. However, it is a long-standing open problem whether we can get CRHF from OWF. Instead, CRHF is constructed from various concrete assumptions; CRHF can also be constructed from “trapdoor permutations”, which is yet another primitive that we do not know how to obtain from OWF. We will use the discrete logarithm assumption below.

Discuss: Suppose that HkH_k is a CRHF or UOWHF (for some corresponding key kk and key generation). Let HkH'_k be the function that truncates the last output bit of HkH_k. Is the resulting HH' (with the same key generation) still a good CRHF (or UOWHF)? Is there a counterexample such that HH is secure but HH' is not?

Hash-and-MAC

Using either collision-resistant hash functions (CRHF) or even UOWHF, we have a standard way to extend the message space of any MAC scheme (or digital signature, discussed later). That is, whenever we want to sign (or tag) a message mm, we sample a key ss of the hash function hh, compute the hash value v:=hs(m)v := h_s(m), and then compute the tag as σ:=(s,Tagk(sv))\sigma:=(s, \Tag_k(s \| v)), where Tag\Tag and kk are the tagging algorithm and the key of the given MAC scheme (as in the UOWHF example above, the fresh hash key ss is tagged together with the hash value). The verification of (m,σ)(m,\sigma) is constructed accordingly.
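
A minimal sketch of hash-and-MAC in Python: a salted SHA-256 stands in for the keyed hash h_s, and HMAC-SHA256 stands in for the underlying Tag (both are heuristic stand-ins; the fresh hash key s is tagged together with the digest, following the UOWHF example above).

```python
import hashlib
import hmac
import os

def tag_fixed(k, x):
    # the given MAC's Tag on short inputs (HMAC-SHA256 as a stand-in)
    return hmac.new(k, x, hashlib.sha256).digest()

def hash_and_tag(k, m):
    s = os.urandom(16)                    # fresh hash key s
    v = hashlib.sha256(s + m).digest()    # v := h_s(m); salted SHA-256 stands in for h_s
    return s, tag_fixed(k, s + v)         # sigma := (s, Tag_k(s || v))

def hash_and_ver(k, m, sigma):
    s, t = sigma
    v = hashlib.sha256(s + m).digest()
    return hmac.compare_digest(tag_fixed(k, s + v), t)
```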

Note: In the textbook [KL, Construction 6.5], the key ss of the hash function is generated once for all messages, and thus ss is kept secret from the adversary. (The CRHF and UOWHF definitions both give the key to the adversary, so that proof does not use the full strength of the property.)

Note: Practical hash functions are typically variants of SHA, which have been implemented directly in mainstream processors (AMD, Intel, and Arm) since roughly 2015–2017. Also notice that SHA is a fixed, unkeyed function, yet no collision has been found for SHA-2.

Digital Signature Schemes

With message authentication codes, both the signer and verifier need to share a secret key. In contrast, digital signatures mirror real-life signatures in that anyone who knows Alice (but not necessarily her secrets) can verify a signature generated by Alice. Moreover, digital signatures possess the property of non-repudiability, i.e., if Alice signs a message and sends it to Bob, then Bob can prove to a third party (who also knows Alice) the validity of the signature. Hence, digital signatures can be used as certificates in a public key infrastructure.

– [Ps, Section 5.3]

Ref: [KL 13.1, 13.2, 13.6]

Definition: Digital Signatures

(Gen,Sign,Ver)(\Gen, \Sign, \Ver) is a digital signature scheme over the message space {Mn}n\set{\cM_n}_n if the following syntax, correctness, and security are satisfied:

  • Gen(1n)\Gen(1^n) is a PPT algorithm which on input 1n1^n outputs a public key pk\pk and a secret key sk\sk, (pk,sk)Gen(1n)(\pk, \sk) \gets \Gen(1^n).
  • Sign\Sign is a PPT algorithm which on input a secret key sk\sk and message mMnm \in \cM_n outputs a signature σSignsk(m)\sigma \gets \Sign_\sk(m).
  • Ver\Ver is a deterministic poly-time algorithm which on input a public key pk\pk, a message mm and a signature σ\sigma returns either “accept” or “reject”.

Correctness: For all nNn \in \N, for all mMnm \in \cM_n,

Pr[pk,skGen(1n):Verpk(m,Signsk(m))=accept]=1\Pr[\pk, \sk \gets \Gen(1^n) : \Ver_\pk(m, \Sign_\sk(m)) = \taccept] = 1

Security: For all NUPPT adversaries AA, there exists a negligible function ϵ(n)\eps(n) such that nN\forall n \in \N,

Pr[pk,skGen(1n);(m,σ)ASignsk()(1n,pk) : A did not query m, and Verpk(m,σ)=accept]ϵ(n)\Pr\left[ \begin{array}{l} \pk,\sk \gets \Gen(1^n);\\ (m, \sigma) \gets A^{\Sign_\sk(\cdot)}(1^n, \pk) \end{array} ~:~ \begin{array}{l} A \text{ did not query } m \text{, and }\\ \Ver_\pk(m, \sigma) = \taccept \end{array} \right] \le \eps(n)

Discuss:

  • pk\pk must be sent to the verifier in an authenticated way (this is implicit in the definition above). If we already have such an authenticated channel, why don’t we just use a MAC?
  • Would it be meaningful if AA gets no oracle access to Signsk\Sign_\sk? The adversary AA is given pk\pk as input.
  • Would it be meaningful if AA gets oracle access to Signsk\Sign_\sk but only once?
  • The definition is a public-key version of MAC.
  • Since the verification uses only public key, AA can perform verification without oracle queries.
  • Hence, it is clear that DS implies MAC, and then it implies OWF.
  • Can we obtain a digital signature from (public-key) encryption?

The following theorem shows that signing nn-bit messages is sufficient because we can extend the message space through hashing.

Theorem: (Extending message space)

Suppose that (Gen,Sign,Ver)(\Gen, \Sign, \Ver) is a digital signature scheme over the message space Mn\cM_n such that nNn \in \N is the security parameter (given as input to Gen\Gen). Also suppose that {fk:{0,1}{0,1}n,n=k}nN\set{f_k : \bit^* \to \bit^n, n = |k|}_{n\in\N} is a family of UOWHF. Then, the following is a digital signature scheme over the message space Mpoly(n)\cM_{\poly(n)} for any polynomial poly\poly.

  • Gen\Gen' is the same as Gen\Gen
  • Signsk(m)\Sign'_\sk(m): sample k{0,1}n/2k\gets \bit^{n/2} uniformly at random, compute hfk(m)h \gets f_k(m) (note h=k=n/2|h| = |k| = n/2, so khk \| h is nn bits and fits the message space of the given scheme), run σSignsk(kh)\sigma' \gets \Sign_\sk(k \| h), and then output σ:=(kσ)\sigma := (k \| \sigma').
  • Verpk(m,σ)\Ver'_\pk(m, \sigma): parse σ=(kσ)\sigma = (k \| \sigma'), compute hfk(m)h \gets f_k(m), accept if Verpk(kh,σ)\Ver_\pk(k \| h, \sigma') accepts.
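
The same hash-then-sign pattern in code: a minimal sketch in which `sign_fixed` / `verify_fixed` are hypothetical algorithms for short fixed-length messages (e.g., the one-time scheme below), and a salted SHA-256 stands in for the UOWHF f_k.

```python
import hashlib
import os

def sign_long(sk, m, sign_fixed):
    """Hash-then-sign for long messages, following the theorem above."""
    k = os.urandom(16)                    # fresh hash key k
    h = hashlib.sha256(k + m).digest()    # h := f_k(m); salted SHA-256 as a stand-in
    return k, sign_fixed(sk, k + h)       # sigma := (k, Sign_sk(k || h))

def verify_long(pk, m, sigma, verify_fixed):
    k, s = sigma
    h = hashlib.sha256(k + m).digest()
    return verify_fixed(pk, k + h, s)
```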

Lamport’s Signature Scheme

Refs: [KL 14.4], Lamport’79, Goldwasser@Berkeley

Definition: One-Time Digital Signatures

(Gen,Sign,Ver)(\Gen, \Sign, \Ver) is a one-time digital signature scheme if the definition of DS is satisfied under the constraint that the adversary AA is only allowed to query the signing oracle once (in ASignsk()A^{\Sign_\sk(\cdot)}).

Construction: Lamport’s Digital Signature

Let f:{0,1}{0,1}f: \bit^\ast \to \bit^\ast be a OWF. Given nNn \in \N, (Gen,Sign,Ver)(\Gen, \Sign, \Ver) is constructed as follows for Mn:={0,1}n\cM_n := \bit^n.

Gen(1n)\Gen(1^n):

  1. Sample strings xbi{0,1}nx_b^i \gets \bit^n for i[n],b{0,1}i \in [n], b \in \bit.
  2. Compute ybi=f(xbi)y_b^i = f(x_b^i) for all i,bi, b
  3. Output pk:=(ybi)i,b\pk := (y_b^i)_{i,b} and sk:=(xbi)i,b\sk := (x_b^i)_{i,b}.

Signsk(m)\Sign_\sk(m):

  1. Output σ:=(xmii)i[n]\sigma := (x_{m_i}^i)_{i\in[n]}, where mim_i denotes the ii-th bit of mm.

Verpk(m,σ)\Ver_\pk(m, \sigma):

  1. Output accept\taccept iff for all i[n]i \in [n], it holds that f(xmii)=ymiif(x_{m_i}^i) = y_{m_i}^i, where xmiix_{m_i}^i comes from σ\sigma and ymiiy_{m_i}^i comes from pk\pk.
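
A minimal sketch of Lamport's scheme in Python, instantiating the one-way function f with SHA-256 as a heuristic stand-in (the theorem below only needs some OWF):

```python
import hashlib
import os

N = 256                                       # message length in bits
f = lambda x: hashlib.sha256(x).digest()      # stand-in one-way function

def gen():
    # sk: 2N random strings; pk: their images under f
    sk = [[os.urandom(32) for b in range(2)] for i in range(N)]
    pk = [[f(sk[i][b]) for b in range(2)] for i in range(N)]
    return pk, sk

def bits(m):
    # the N bits of the message m (given as 32 bytes)
    return [(m[i // 8] >> (7 - i % 8)) & 1 for i in range(N)]

def sign(sk, m):
    # reveal the pre-image selected by each message bit
    return [sk[i][b] for i, b in enumerate(bits(m))]

def verify(pk, m, sigma):
    return all(f(x) == pk[i][b] for i, (x, b) in enumerate(zip(sigma, bits(m))))

# usage: sign a single message; never reuse sk (it is one-time)
pk, sk = gen()
m = os.urandom(32)
assert verify(pk, m, sign(sk, m))
```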

Theorem:

If ff is a one-way function, then Lamport’s Signature is a secure one-time digital signature (for nn-bit messages).

As a corollary, the construction extends to messages of ℓ\ell bits such that ℓ:=ℓ(n)\ell := \ell(n) is a polynomial.

Note Many other public-key cryptographic schemes (such as public-key encryption) rely on stronger assumptions (such as PKE from the RSA assumption), and we do not know how to construct them from OWF. Digital signatures are a big surprise in that we obtain them from any OWF.

Proof sketch.

We prove by contradiction: if there exists AA that makes a single query mm' and then forges a message-signature pair (m,σ)(m, \sigma) with mmm \neq m', then we construct another adversary BB that inverts ff. The intuition is that, given mmm \neq m', there exists a position ii with mimim_i \neq m'_i, and in order to pass the verification of (m,σ)(m, \sigma), AA must find a pre-image of the corresponding entry of pk\pk, which is exactly inverting ff.

The tricky step is that in the reduction, we need to give pk\pk to AA up front. Since we have no idea about ii at that step, we are going to guess it.

More formally, assume for contradiction, there exists NUPPT adversary AA and polynomial pp such that for infinitely many nNn \in \N,

Pr[A win]1/p(n),\Pr[A \twin] \ge 1/p(n),

where A winA \twin denotes the event that mmm \neq m' and Verpk(m,σ)=\Ver_\pk(m, \sigma) = accept in the security game. We want to construct BB that inverts ff.

Let x{0,1}nx \gets \bit^n and z:=f(x)z := f(x). BB is constructed as below.

Algorithm B(1n,z)B(1^n, z):

  1. Sample pk:=(ybi)i,b\pk := (y_b^i)_{i,b} and sk:=(xbi)i,b\sk := (x_b^i)_{i,b} as per Lamport’s key generation.
  2. Sample b{0,1},i[n]b^\ast \gets \bit, i^\ast \gets [n] uniformly at random.
  3. Modify pk\pk by setting ybizy_{b^\ast}^{i^\ast} \gets z.
  4. Run ASignsk()(pk)A^{\Sign_\sk(\cdot)}(\pk): if AA queries mm' such that mi=bm'_{i^\ast} = b^\ast, then output \bot (“fail” symbol); otherwise, respond to AA with the signature computed as per Signsk\Sign_\sk. Let the result be (m,σ)(m, \sigma).
  5. If mimim_{i^\ast} \neq m'_{i^\ast}, then output the ii^\ast-th string in σ\sigma (as a candidate pre-image of zz); output \bot otherwise.

Notice that AA cannot learn bb^\ast or ii^\ast because all entries in pk\pk are identically distributed. Hence, Pr[mib]=1/2\Pr[m'_{i^\ast} \neq b^\ast] = 1/2, and, conditioned on AA winning, Pr[mimi]1/n\Pr[m_{i^\ast} \neq m'_{i^\ast}] \geq 1/n. Therefore, BB inverts ff with probability at least 1/(2np(n))1/(2n \cdot p(n)) for infinitely many nn, contradicting the one-wayness of ff.

Tree-based Signatures

Ref: [KL 14.4.2, 14.4.3]

How do we extend a one-time signature to sign many messages? When signing a message, we can additionally generate the next key pair (pk1,sk1)(\pk_1,\sk_1), then sign and send the new pk1\pk_1 together with the current message, and so on for subsequent messages. The verifier needs to verify and keep the latest pk1\pk_1. That is, both the signer and verifier need to keep state, or else the signing / verification time is linear in the number of messages signed so far.

To improve this, we use a tree-based approach. That is, for each pair (pk,sk)(\pk, \sk), we sign two public keys pk0,pk1\pk_0, \pk_1, and then each pkb\pk_b (together with the corresponding skb\sk_b) can further sign two keys pkb0,pkb1\pk_{b0}, \pk_{b1}, and so on. We build a tree of 2n2^n leaves so that we can sign up to 2n2^n messages, and the first signature consists of nn one-time signatures:

σ:=(pk,σ0,(pk0,pk1),σ1,(pk00,pk01),...,σn1,(pk0n,pk00...01),Signsk00...0(m)),\sigma := (\pk, \sigma_0, (\pk_0, \pk_1), \sigma_1, (\pk_{00},\pk_{01}), ..., \sigma_{n-1}, (\pk_{0^n},\pk_{00...01}), \Sign_{\sk_{00...0}}(m)),

where σiSignsk0i(pk0i0,pk0i1)\sigma_i \gets \Sign_{\sk_{0^i}}(\pk_{0^i 0},\pk_{0^i 1}). The second, third, and fourth signatures and so on are obtained by moving the leaf path from 000...00000...00 to 000...01000...01 and then to 000...10000...10, and so on. Because all the (pkx,skx)(\pk_x, \sk_x) pairs are one-time, the signer keeps a state that counts the number of messages signed so far and stores all the (pkx,skx)(\pk_x, \sk_x) pairs generated so far, so that the next message is signed consistently. The verifier is stateless and keeps only pk\pk.

The above scheme can be further improved to achieve a stateless signer by using a PRF to generate the randomness needed by each tree node, as sketched below. That is, let xx be a string of fewer than nn bits that indicates a tree node, let fkf_k be a PRF with key kk sampled once up front, and then use fk(x)f_k(x) as the randomness needed by node xx (we need only two Gen\Gen and one Sign\Sign at each tree node). Key kk is added to the secret key, and the signer can always regenerate the identical (pkx,skx)(\pk_x, \sk_x) at node xx.
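
A minimal sketch of this derandomization idea: HMAC-SHA256 stands in for the PRF f_k, and `gen_keypair` is a hypothetical one-time key generator that accepts an explicit seed.

```python
import hashlib
import hmac

def node_randomness(k, x: str) -> bytes:
    # r_x := f_k(x): all randomness for tree node x (a bit string like "0110")
    return hmac.new(k, x.encode(), hashlib.sha256).digest()

def node_keypair(k, x, gen_keypair):
    # Deterministically regenerate (pk_x, sk_x) from the master key k,
    # so the signer does not need to store any per-node state.
    seed = node_randomness(k, x)
    return gen_keypair(seed)   # hypothetical: key generation driven by the seed
```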

Discuss:

  • If the number of messages is not given in advance, how to build a DS?

Notice If we plug Lamport’s signature into the above composition, then the length of the signature grows exponentially, because the public key is much longer than the message (and we sign the pk\pk’s). An easier “fix” is to shrink the message to a shorter “hash” and then sign the hash. That is, the signer S and verifier V agree on a keyed hash function hkh_k; S computes the (shorter) hash hk(m)h_k(m) of the message mm, so that S can sign the hash and V can verify it. Of course, we need a secure hash function so that an adversary given kk cannot find another mmm' \neq m with hk(m)=hk(m)h_k(m') = h_k(m). This property is called targeted collision resistance, and such functions are also known as universal one-way hash functions (UOWHF). They can be constructed from one-way functions; we omit the proof here and construct CRHF below, since CRHF implies UOWHF. Historically, Naor and Yung formalized UOWHF and constructed it from one-way permutations, and then Rompel constructed UOWHF from any OWF. [Goldreich, FoC, Vol 2, Section 6.4.3] presents the result of Naor and Yung, and Rompel’s result was later proved formally by Katz and Koo. Similar to PRG, this is still an active research area; e.g., Mao, Mazor, and Zhang improved the construction to make non-adaptive (i.e., parallel) calls to the OWF.

Postscript: Leslie Lamport wrote a great note on “How To Present a Paper” in 1979. It is decades old and just two pages, but it is still inspiring and pure gold on giving good presentations.

Collision-Resistant Hash Functions

Definition: Generator of a Group

An element gg of a multiplicative group GG is a generator if the set {g,g2,g3,...}=G\set{g, g^2, g^3, ...} = G. We denote the set of all generators of a group GG by Gen(G)\Gen(G).

Assumption: Discrete Log

If GqG_q is a group of prime order qq, then for every NUPPT adversary AA, there exists a negligible function ϵ\eps such that

Pr[qΠn;gGen(Gq);xZq:A(q,g,gx)=x]<ϵ(n).\Pr [ q \gets \Pi_n; g \gets \Gen(G_q) ; x \gets \Z_q : A(q, g, g^x) = x ] \lt \eps(n).

In the above definition, the group GqG_q is not instantiated; the hardness of the DL problem depends on the choice of GqG_q, and indeed for some groups, such as (Zq,+)(\Z_q, +), it is trivial. In cryptography, DL is believed to be hard on the order-qq subgroup GqG_q of Zp\Z^\ast_p for a prime pp. Notice that the order of Zp\Z^\ast_p itself is p1p-1, which is even and hence not prime (so DL may be easier there). Therefore, we sample a prime of the form p=2q+1p = 2q + 1: first sample a prime qq, and let p=2q+1p = 2q + 1 if 2q+12q + 1 is also prime; otherwise repeatedly sample another qq. Such a qq is known as a Sophie Germain prime, and the corresponding pp is called a safe prime.

Unfortunately, even though this procedure always terminates in practice, its basic theoretical properties are unknown. It is unknown even (a) whether there are an infinite number of Sophie Germain primes, (b) and even so, whether this simple procedure for finding them continues to quickly succeed as the size of qq increases. – [Ps, Section 2.8.1]

Also notice that exponentiation xgxx \mapsto g^x is a one-to-one mapping, and the domain Zq\Z_q and the range GqG_q have the same cardinality. However, the representations of elements (in bits) can be quite different, and thus it is not exactly a one-way permutation (strictly speaking).

Sampling prime-order group and the generator

Theorem:

Let p=2q+1p = 2q + 1 with p,qp,q prime. Then

G:={h2mod  p:hZp}G := \set{ h^2 \mod p : h \in \Z^\ast_p}

is a subgroup of Zp\Z^\ast_p of order qq.

It is direct that GG is a subgroup. It remains to prove the order, and it suffices to show that f(h)=h2mod  pf(h) = h^2 \mod p is 2-to-1 (because then G=f(Zp)=(p1)/2=q|G| = |f(\Z^\ast_p)| = (p-1)/2 = q). Let gg be a generator of Zp\Z^\ast_p (we omit the proof that a generator of Zp\Z^\ast_p exists). Let hZph \in \Z^\ast_p with h=gth = g^t for some tp1t \le p-1. Consider any h2h_2 such that h22=h2mod  ph_2^2 = h^2 \mod p, where h2=gt2h_2 = g^{t_2}. We have g2t2=g2tg^{2t_2} = g^{2t}, so 2(t2t)=0mod  (p1)2(t_2 - t) = 0 \mod (p-1). Thus, by p1=2qp-1 = 2q, either t2=tt_2 = t or t2=t+qmod  (p1)t_2 = t + q \mod (p-1); that is, every square has exactly the two square roots gtg^t and gt+qg^{t+q}.

This proof extends to hrh^r for any even integer r2r \ge 2. Moreover, it also shows that sampling uniformly from GG is efficient: just sample hZph \gets \Z^\ast_p and compute h2mod  ph^2 \mod p.
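
A minimal sketch of this sampling procedure (using sympy's `randprime`/`isprime` purely for illustration; the parameter size is far too small for real use):

```python
import random
from sympy import isprime, randprime   # for illustration only

def sample_safe_prime(nbits):
    # repeatedly sample a prime q until p = 2q + 1 is also prime
    while True:
        q = randprime(2 ** (nbits - 1), 2 ** nbits)
        p = 2 * q + 1
        if isprime(p):
            return p, q

def sample_subgroup_element(p):
    # a uniform element of G = {h^2 mod p : h in Z_p^*}, the order-q subgroup
    h = random.randrange(1, p)
    return pow(h, 2, p)

# usage (tiny size so the example runs quickly)
p, q = sample_safe_prime(32)
g = sample_subgroup_element(p)      # a random element of G; any g != 1 generates G
assert pow(g, q, p) == 1            # every element of G has order dividing q
```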

Construction: CRHF from DL

Gen(1n)\Gen(1^n): output (p,q,g,y)(p,q,g,y), where

  • p,qp,q are primes such that p=2q+1p=2q+1 (with qq an nn-bit prime),
  • gg is a generator of GqG_q,
  • yGqy \gets G_q, and
  • GqG_q is order qq subgroup of Zp\Z^\ast_p.

hp,q,g,y(x,b)h_{p,q,g,y}(x, b): Output ybgxmod  py^b g^x \mod p, where

  • xx is nn-bit, bb is 1-bit
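
A minimal sketch of key generation and evaluation for this construction (reusing the safe-prime sampling sketched earlier; again, toy parameters only):

```python
import random

def crhf_gen(p, q):
    # pick a generator g of the order-q subgroup G (any square != 1 generates G)
    g = 1
    while g == 1:
        g = pow(random.randrange(2, p), 2, p)
    y = pow(g, random.randrange(q), p)   # y <- G uniformly
    return p, q, g, y

def crhf_eval(params, x, b):
    # h_{p,q,g,y}(x, b) := y^b * g^x mod p, with x an n-bit integer and b in {0,1}
    p, q, g, y = params
    return (pow(y, b, p) * pow(g, x, p)) % p

# usage, assuming sample_safe_prime from the earlier sketch
p, q = sample_safe_prime(32)
params = crhf_gen(p, q)
digest = crhf_eval(params, x=12345, b=1)
```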

Theorem:

Under the Discrete Logarithm assumption, the above construction is a collision-resistant hash function that compresses by 1 bit.

Proof sketch:

It is efficient to evaluate, and it compresses: the input (x,b)(x, b) has n+1n+1 bits while the output has nn bits (strictly speaking, the domain has size 2p2p while the range has size at most pp, since x[p]x \in [p]). To prove collision resistance, first consider the case h(x,b)=h(x,b)h(x,b) = h(x', b) with xxx \neq x'. Then, we have

gx=gxmod  pg^{x} = g^{x'} \mod p

which means that x=xx = x' (since gg has order qq and we take x,xZqx, x' \in \Z_q), a contradiction. Hence, it must be that bbb \neq b' while h(x,b)=h(x,b)h(x,b) = h(x',b'). Given bbb \neq b', suppose b=1,b=0b=1, b' = 0 (without loss of generality, as otherwise we swap the two inputs). Then,

y1gx=y0gxmod  py^1 g^x = y^0 g^{x'} \mod p

which implies that

y=gxxmod  p.y = g^{x'-x} \mod p.

Thus, we can compute the DL of yy using any adversary AA that on input yy outputs a collision (x,b)(x,b)(x,b) \neq (x',b').

Note: the above construction is based on the DL assumption w.r.t. the prime order group GqG_q. We can analogously construct CRHF from group Zp\Z^\ast_p based on the DL assumption w.r.t. Zp\Z^\ast_p.