Function crypto::sha2::sha256_digest_block

pub fn sha256_digest_block(state: &mut [u32; 8], block: &[u8])

Process a block with the SHA-256 algorithm.

Internally, this uses functions that resemble the new Intel SHA instruction set extensions, so its data locality properties may improve performance. To benefit the most from this implementation, replace these functions with the corresponding x86 intrinsics, which may give a further speed boost.
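As a rough usage sketch, the following shows how a caller might drive a function with this signature for a message that fits in one block. Everything here is an assumption for illustration: `digest_block_reference` is a hypothetical plain scalar stand-in (not this crate's implementation), `pad_single_block` and `digest_short` are made-up helper names, and the constants are the standard SHA-256 values from FIPS 180-4.

```rust
// SHA-256 round constants (FIPS 180-4).
const K32: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

// SHA-256 initial hash value (FIPS 180-4).
const H0: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];

/// Hypothetical scalar stand-in with the same signature as `sha256_digest_block`.
fn digest_block_reference(state: &mut [u32; 8], block: &[u8]) {
    assert_eq!(block.len(), 64, "a SHA-256 block is 64 bytes");

    // Load 16 big-endian words, then expand them to the 64-word schedule.
    let mut w = [0u32; 64];
    for (i, chunk) in block.chunks_exact(4).enumerate() {
        w[i] = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    for t in 16..64 {
        let s0 = w[t - 15].rotate_right(7) ^ w[t - 15].rotate_right(18) ^ (w[t - 15] >> 3);
        let s1 = w[t - 2].rotate_right(17) ^ w[t - 2].rotate_right(19) ^ (w[t - 2] >> 10);
        w[t] = s1.wrapping_add(w[t - 7]).wrapping_add(s0).wrapping_add(w[t - 16]);
    }

    // 64 rounds on working variables a..h.
    let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut h] = *state;
    for t in 0..64 {
        let t1 = h
            .wrapping_add(e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25))
            .wrapping_add((e & f) ^ (!e & g))
            .wrapping_add(K32[t])
            .wrapping_add(w[t]);
        let t2 = (a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22))
            .wrapping_add((a & b) ^ (a & c) ^ (b & c));
        h = g;
        g = f;
        f = e;
        e = d.wrapping_add(t1);
        d = c;
        c = b;
        b = a;
        a = t1.wrapping_add(t2);
    }

    // Add the working variables back into the chaining state.
    for (s, v) in state.iter_mut().zip([a, b, c, d, e, f, g, h].iter()) {
        *s = s.wrapping_add(*v);
    }
}

/// Pad a message shorter than 56 bytes into a single 64-byte block.
fn pad_single_block(msg: &[u8]) -> [u8; 64] {
    assert!(msg.len() < 56, "longer messages need more than one block");
    let mut block = [0u8; 64];
    block[..msg.len()].copy_from_slice(msg);
    block[msg.len()] = 0x80; // mandatory '1' bit after the message
    let bit_len = (msg.len() as u64) * 8;
    block[56..].copy_from_slice(&bit_len.to_be_bytes());
    block
}

/// Hash a short message: start from the initial state, process one block.
fn digest_short(msg: &[u8]) -> [u32; 8] {
    let mut state = H0;
    digest_block_reference(&mut state, &pad_single_block(msg));
    state
}
```

With the real function, only the call inside `digest_short` would change; the caller still owns padding and the initial state, since the function processes one block against an existing `state`.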

Implementation

The SHA-256 algorithm is implemented with functions that resemble the new Intel SHA instruction set extensions. These instructions fall into two categories: message schedule calculation and the 64-round message block digest calculation. The schedule-related instructions allow 4 rounds to be calculated as:

This example is not tested
use std::simd::u32x4;
use self::crypto::sha2::{
    sha256msg1,
    sha256msg2,
    sha256load
};

fn schedule4_data(work: &mut [u32x4], w: &[u32]) {

    // this is to illustrate the data order
    work[0] = u32x4(w[3], w[2], w[1], w[0]);
    work[1] = u32x4(w[7], w[6], w[5], w[4]);
    work[2] = u32x4(w[11], w[10], w[9], w[8]);
    work[3] = u32x4(w[15], w[14], w[13], w[12]);
}

fn schedule4_work(work: &mut [u32x4], t: usize) {

    // this is the core expression
    work[t] = sha256msg2(sha256msg1(work[t - 4], work[t - 3]) +
                         sha256load(work[t - 2], work[t - 1]),
                         work[t - 1])
}

instead of 4 rounds of:

This example is not tested
fn schedule_work(w: &mut [u32], t: usize) {
    // `+` denotes addition modulo 2^32 (`wrapping_add` on plain `u32` values)
    w[t] = sigma1!(w[t - 2]) + w[t - 7] + sigma0!(w[t - 15]) + w[t - 16];
}
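To make the scalar recurrence concrete, here is a minimal runnable sketch that expands a 16-word block into the full 64-word schedule. It assumes the standard FIPS 180-4 definitions of the small sigma functions; `sigma0`, `sigma1`, and `expand_schedule` are illustrative plain functions, not the macros or API of this crate. Additions use `wrapping_add`, because SHA-256 arithmetic is modulo 2^32.

```rust
/// Small sigma 0 from FIPS 180-4: ROTR 7 ^ ROTR 18 ^ SHR 3.
fn sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

/// Small sigma 1 from FIPS 180-4: ROTR 17 ^ ROTR 19 ^ SHR 10.
fn sigma1(x: u32) -> u32 {
    x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10)
}

/// Expand the 16 message words of one block into the 64-word schedule,
/// one word per step (the scalar form of the recurrence).
fn expand_schedule(block_words: &[u32; 16]) -> [u32; 64] {
    let mut w = [0u32; 64];
    w[..16].copy_from_slice(block_words);
    for t in 16..64 {
        w[t] = sigma1(w[t - 2])
            .wrapping_add(w[t - 7])
            .wrapping_add(sigma0(w[t - 15]))
            .wrapping_add(w[t - 16]);
    }
    w
}
```

Each new word depends on words 2, 7, 15, and 16 positions back, which is why four consecutive words can be produced from the previous four 4-word groups, as the packed form does.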

The digest-related instructions allow 4 rounds to be calculated as:

This example is not tested
use std::simd::u32x4;
use self::crypto::sha2::{
    K32X4,
    sha256rnds2,
    sha256swap
};

fn rounds4(state: &mut [u32; 8], work: &mut [u32x4], t: usize) {
    let [a, b, c, d, e, f, g, h]: [u32; 8] = *state;

    // this is to illustrate the data order
    let mut abef = u32x4(a, b, e, f);
    let mut cdgh = u32x4(c, d, g, h);
    let temp = K32X4[t] + work[t];

    // this is the core expression
    cdgh = sha256rnds2(cdgh, abef, temp);
    abef = sha256rnds2(abef, cdgh, sha256swap(temp));

    *state = [abef.0, abef.1, cdgh.0, cdgh.1,
              abef.2, abef.3, cdgh.2, cdgh.3];
}

instead of 4 rounds of:

This example is not tested
fn round(state: &mut [u32; 8], w: &[u32], t: usize) {
    let [a, b, c, mut d, e, f, g, mut h]: [u32; 8] = *state;

    // `+` and `+=` denote addition modulo 2^32 (`wrapping_add` on plain `u32` values)
    h += big_sigma1!(e) +   choose!(e, f, g) + K32[t] + w[t]; d += h;
    h += big_sigma0!(a) + majority!(a, b, c);

    *state = [h, a, b, c, d, e, f, g];
}
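The relationship between the packed 4-round form and the scalar round can be checked in plain Rust. The sketch below is an assumption-laden illustration, not this crate's emulation: `rounds2` is a hypothetical software stand-in for a two-round step on the `abef`/`cdgh` layout, the packed values are plain `[u32; 4]` arrays rather than `u32x4`, and each `kw` value stands for an already-summed `K32[t] + w[t]`.

```rust
fn big_sigma0(x: u32) -> u32 {
    x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22)
}

fn big_sigma1(x: u32) -> u32 {
    x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25)
}

fn choose(e: u32, f: u32, g: u32) -> u32 {
    (e & f) ^ (!e & g)
}

fn majority(a: u32, b: u32, c: u32) -> u32 {
    (a & b) ^ (a & c) ^ (b & c)
}

/// One scalar round; `kw` is the round constant plus the schedule word.
fn round(state: &mut [u32; 8], kw: u32) {
    let [a, b, c, mut d, e, f, g, mut h] = *state;
    h = h
        .wrapping_add(big_sigma1(e))
        .wrapping_add(choose(e, f, g))
        .wrapping_add(kw);
    d = d.wrapping_add(h);
    h = h
        .wrapping_add(big_sigma0(a))
        .wrapping_add(majority(a, b, c));
    *state = [h, a, b, c, d, e, f, g];
}

/// Hypothetical two-round step: takes [c, d, g, h] and [a, b, e, f],
/// returns the new [a, b, e, f]. The old [a, b, e, f] becomes the new
/// [c, d, g, h], so the caller keeps it unchanged.
fn rounds2(cdgh: [u32; 4], abef: [u32; 4], kw0: u32, kw1: u32) -> [u32; 4] {
    let [a, b, e, f] = abef;
    let [c, d, g, h] = cdgh;
    let mut s = [a, b, c, d, e, f, g, h];
    round(&mut s, kw0);
    round(&mut s, kw1);
    [s[0], s[1], s[4], s[5]]
}

/// Four rounds on the packed layout, mirroring the shape of `rounds4`.
fn rounds4_packed(state: &mut [u32; 8], kw: [u32; 4]) {
    let [a, b, c, d, e, f, g, h] = *state;
    let mut abef = [a, b, e, f];
    let mut cdgh = [c, d, g, h];

    // After this call the two variables have swapped roles.
    cdgh = rounds2(cdgh, abef, kw[0], kw[1]);
    // After this call the names match their contents again.
    abef = rounds2(abef, cdgh, kw[2], kw[3]);

    *state = [
        abef[0], abef[1], cdgh[0], cdgh[1],
        abef[2], abef[3], cdgh[2], cdgh[3],
    ];
}

/// Four scalar rounds, for comparison.
fn rounds4_scalar(state: &mut [u32; 8], kw: [u32; 4]) {
    for &x in kw.iter() {
        round(state, x);
    }
}
```

The packed form works because two rounds shift `a, b, e, f` into the `c, d, g, h` positions unchanged, so only one 4-word vector needs to be computed per two-round step.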

NOTE: These instructions are not implemented by any CPU (at the time of this writing), so they are emulated in this library until the instructions become more common and gain support in LLVM (and GCC, etc.).