How the L0pht (probably) optimized attack against the LanMan hash.

  1. Introduction
  2. Data Encryption Standard
  3. The LanMan Algorithm
  4. Brute Force Attack
  5. Version 1
  6. Precomputing Key Schedules 1
  7. Version 2
  8. Using Macros For The Key Schedule Algorithm
  9. Initial and Final Permutation
  10. Skipping Rounds
  11. Version 3
  12. Precomputing Key Schedules 2
  13. Version 4
  14. Results

1. Introduction

Some of you may remember a famous group of hackers that operated out of a loft (or attic) in Boston, Massachusetts, USA between 1992 and 2000 that called themselves L0pht Heavy Industries. Perhaps a defining moment in the group’s history was in May 1998 when they testified before the United States Congress, forewarning about the fragility of the Internet and how it could be shut down in 30 minutes using the Border Gateway Protocol (BGP). Most oldskool hackers will remember them for being some of the first security researchers to practice responsible disclosure of software vulnerabilities via advisories, aswell as maintaining a number of websites like HackerNews.com, the Black Crawling Systems Archives, the Whacked Mac Archives and Guerilla.net.

Like many people, I remember the group for writing L0phtCrack, a tool designed to recover passwords protected by the Windows operating system. L0phtCrack was originally published with an advisory almost 22 years ago in April 1997. In the year 2000, a now defunct company called @atstake acquired L0pht, including the ownership rights to L0phtCrack. In 2004, Symantec acquired @atstake before discontinuing development and distribution of L0phtCrack in 2005. In 2009, members of the original L0pht group (Zatko, Wysopal, and Rioux) reacquired ownership rights to L0phtCrack and continued with its development up to the present day. Those of you that want to know more about the group can read History of the L0pht.

This post suggests some ways the L0pht may have accelerated recovery of passwords protected by the LanMan (LM) hash that is derived from the Data Encryption Standard (DES). I don’t reveal any Top Secret technique for cracking DES that only L0pht or some alphabet agencies know about. Similar optimizations were implemented over twenty years ago by Alexander Peslyak and Roman Rusakov in another popular password recovery tool called John The Ripper.

l0pht

2. Data Encryption Standard

DES is a block cipher that operates on plaintext blocks of 64-bits and returns ciphertext blocks of the same size. Each key can be 56-bits in total giving us 2^{56} (72,057,594,037,927,936), or approximately 72 quadrillion possible keys. A 56-bit key is expanded into 16 subkeys or round keys, each of which is 48-bits long. It has for over 20 years been considered obsolete and insecure, but continues to be used mainly to support legacy systems.

What follows is a list of notable events around initial research into cracking DES, including the LanMan hash derived from DES.

January 1997 Cryptographer Eli Biham publishes his paper at Fast Software Encryption 4 titled A Fast New DES Implementation in Software. It describes a novel way to optimize the Data Encryption Standard using simple bitwise operations (XOR, AND, NOT, OR). Although unrelated to the development of L0phtCrack, the technique would later be used to optimize attacks against the LanMan hash in tools like John The Ripper.
January 1997 Rocke Verser launches DESCHALL in response to an offer by RSA to crack DES for a $10,000 prize.
March 1997 Samba developer Jeremy Allison releases pwdump. It enables Administrators to dump LM (derived from DES) and NTLM (derived from MD4) hashes stored in the Security Account Manager (SAM) database on Windows NT.
April 1997 L0phtCrack v1.0 released. It primarily exploits the poor design of the LanMan algorithm to recover plaintext passwords.
May 1997 Microsoft releases Service Pack 3 for Windows NT that includes “SYSKEY”; an optional component designed to prevent pwdump working properly.
June 1997 Rocke Verser announces the recovery a 56-bit DES key.
July 1997 L0phtCrack v1.5 released. Includes a much more detailed analysis of Server Message Block (SMB) authentications. Cryptographer David Wagner shares his observations on the the challenge response/pair and suggests ways to optimize attacks against it.
August 1997 L0pht attend the Beyond HOPE (The Hackers on Planet Earth) conference in New York city. Discuss the lack of adequate security provided by vendors in various technologies.
February 1998 L0phtCrack v2.0 released. Includes an SMB session network sniffer, a multithreaded brute force algorithm and faster search algorithm for large databases.
May 1998 Matthew Kwan releases his “bitslice” code based on the paper by Eli Biham.
July 1998 The Electronic Frontier Foundation build a DES cracker called “Deep Crack” and recover a 56-bit key in 56 hours using a device that costs $250,000.
January 1999 Deep Crack and distributed.net break a DES key in 22 hours and 15 minutes.
January 1999 L0phtCrack v2.5 released. The DES routines have been highly optimized in assembler for Pentium, Pentium MMX, Pentium Pro, and Pentium II specific processors. This results in a 450% speed increase. All alphanumeric passwords can be found in under 24 hours on a Pentium II/450.

What I try to focus on in this post is how the L0pht gained a “450% speed increase” over previous versions of the software, but first, how are LanMan hashes created?

3. The LanMan Algorithm

  1. The password is restricted to a maximum of fourteen characters. (null-padded if required)
  2. The password is converted to uppercase.
  3. The password is encoded in the system OEM code page.
  4. The password is split into 7-byte halves and used to create two DES keys.
  5. Each key is used to encrypt the string KGS!@#$% using DES in ECB mode, resulting in two 8-byte ciphertext values. The string “KGS!@#$% could possibly mean Key of Glen and Steve with the combination of Shift + 12345. Glen Zorn and Steve Cobb are the authors of RFC 2433 (Microsoft PPP CHAP Extensions).
  6. The two ciphertext values are concatenated to create a 16-byte value, which is the LM hash.

Using the above details, the following code uses OpenSSL to generate a LanMan hash. The only thing missing is the OEM encoding. For that reason, hashes generated by this code will not always match those generated by Windows itself. Internally, Windows originally used the CharToOem API before creating a DES key. This is important to remember because some passwords generated by Windows will simply not be recovered unless the cracker uses CharToOem or CharToOemBuff before hand.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#include <openssl/des.h>

void DES_str_to_key (char str[], uint8_t key[]) {
    int i;

    key[0] = str[0] >> 1;
    key[1] = ((str[0] & 0x01) << 6) | (str[1] >> 2);
    key[2] = ((str[1] & 0x03) << 5) | (str[2] >> 3);
    key[3] = ((str[2] & 0x07) << 4) | (str[3] >> 4);
    key[4] = ((str[3] & 0x0F) << 3) | (str[4] >> 5);
    key[5] = ((str[4] & 0x1F) << 2) | (str[5] >> 6);
    key[6] = ((str[5] & 0x3F) << 1) | (str[6] >> 7);
    key[7] = str[6] & 0x7F;

    for (i = 0;i < 8;i++) {
      key[i] = (key[i] << 1);
    }
    DES_set_odd_parity ((DES_cblock*)key);
}

char* lmhash(char *pwd) {
    DES_cblock       key1, key2;
    DES_key_schedule ks1, ks2;
    const char       ptext[]="KGS!@#$%";
    static char      hash[64], lm_pwd[16];
    uint8_t          ctext[16];
    size_t           i, pwd_len = strlen(pwd);
    
    // 1. zero-initialize local buffer
    memset(lm_pwd, 0, sizeof(lm_pwd));
    
    // 2. convert password to uppercase (restricted to 14 characters)
    for(i=0; i<pwd_len && i<14; i++) {
      lm_pwd[i] = toupper((int)pwd[i]);
    }
    
    // 3. create two DES keys
    DES_str_to_key(&lm_pwd[0], (uint8_t*)&key1);
    DES_str_to_key(&lm_pwd[7], (uint8_t*)&key2);
    DES_set_key(&key1, &ks1);
    DES_set_key(&key2, &ks2);
    
    // 4. encrypt plaintext
    DES_ecb_encrypt((const_DES_cblock*)ptext, 
      (DES_cblock*)&ctext[0], &ks1, DES_ENCRYPT);

    DES_ecb_encrypt((const_DES_cblock*)ptext, 
      (DES_cblock*)&ctext[8], &ks2, DES_ENCRYPT);
      
    // 5. convert ciphertext to string
    for(i=0; i<16; i++) {
      snprintf(&hash[i*2], 3, "%02X", ctext[i]);
    }
    return hash;
}


int main(int argc, char *argv[]) {
    if (argc!=2) {
      printf("usage: lmhash <password>\n");
      return 0;
    }
    
    printf("LM Hash: %s\n", lmhash(argv[1]));
    return 0;
}

We identify a number of weaknesses here based on the algorithm.

  1. Electronic Code Book (ECB) mode using plaintext that is always the same. Contrast this with unix crypt3() that encrypts a nonce/salt resulting in unique ciphertext. As a result of the LanMan algorithm encrypting the same plaintext, passwords seven characters or less will always include 0xAAD3B435B51404EE in the 16-byte LM hash.
  2. The fourteen character password is used to create two 7-byte or 56-bit DES keys. This means a brute force attack only requires 95^{7} attempts instead of 95^{14}
  3. Passwords are converted to uppercase, reducing the keyspace to 69^{7} attempts.

4. Brute Force Attack

Brute force has sometimes been referred to as “dumb mode” because rather than select passwords based on a predefined set of rules, it will simply attempt all possible combinations from a set of numbers and letters, including ones that are unlikely to be used in practice. Passwords should always be easy to recall, and even today it’s unusual for people to pick something that might be difficult to remember later. Having said that, rules are imperfect and sometimes only an exhaustive search of the keyspace will succeed.

The following screenshot is of L0phtcrack 2.5 running inside a virtual machine. As you can see, it averages around 5.64 million tries/keys per second.

The brute force cracker implemented in lmcrack is simply to demonstrate the overall gains achieved via simple optimizations and isn’t intended to be used for anything else. Those that want to recover passwords should use a fully functional password cracker.

5. Version 1

The first version is simply using the OpenSSL library. L0phtCrack v1.0 and v1.5 both used DES routines from the OpenSSL library.

static bool crack_lm1(void *param) {
    int              i;
    DES_cblock       deskey;
    DES_key_schedule ks;
    uint8_t          pwd[8]={0};
    const char       ptext[]="KGS!@#$%";
    uint8_t          ctext[8];
    crack_opt_t      *c=(crack_opt_t*)param;
    
    // initialize password
    for(i=0;i<7;i++) {
      if(c->pwd_idx[i] == ~0UL)break;
      pwd[i] = c->alphabet[c->pwd_idx[i]];
    }
      
    // while not stopped
    while(!c->stopped) {
      // convert password to DES odd parity key
      DES_str_to_key(pwd, deskey);
      // create DES subkeys 
      DES_set_key(&deskey, &ks);
      // encrypt plaintext
      DES_ecb_encrypt((const_DES_cblock*)ptext, 
        (DES_cblock*)ctext, &ks, DES_ENCRYPT);
      
      // increase how many passwords processed
      c->complete++;
      
      // if hashes match, set found and exit loop
      if(memcmp(ctext, c->hash.b, 8)==0) {
        c->found=true;
        return true;
      }
      // decrease total tried. if none left, exit
      if(--c->total_cbn == 0) {
        return false;
      }
      // update password index values
      for(i=0;;i++) {
        // increase one. if not length of alphabet, break.
        if(++c->pwd_idx[i] != c->alpha_len) {
          pwd[i] = c->alphabet[c->pwd_idx[i]];
          break;
        }  
        // reset index
        c->pwd_idx[i]=0;
        pwd[i] = c->alphabet[0];
      }
    }
    // we didn't find it
    return false;
}

The screenshot below shows 2.46 million keys per second are tested. It uses no optimization at all, apart from those used by the OpenSSL library.

6. Precomputing Key Schedules 1

The simple design of the DES key schedule algorithm makes both differential and linear attacks easier. That is not to imply the design was simplified to facilitate attacks. It was simplified to implement on 1970s hardware with wiring. The lack of non-linear operations means that no bits in a subkey overlap with another subkey. This allows us to use a bitwise OR / bitwise XOR to combine subkeys and generate completely new ones.

We can generate key schedules for each unique bit of a 56-bit key without requiring a large amount of storage. DES_init_keys() will perform this operation and only uses 229,376 bytes of RAM. That’s 256 key schedules (0x00-0xFF) for 7 bytes.

void DES_str_to_key (char str[], uint8_t key[]) {
    int i;

    key[0] = str[0] >> 1;
    key[1] = ((str[0] & 0x01) << 6) | (str[1] >> 2);
    key[2] = ((str[1] & 0x03) << 5) | (str[2] >> 3);
    key[3] = ((str[2] & 0x07) << 4) | (str[3] >> 4);
    key[4] = ((str[3] & 0x0F) << 3) | (str[4] >> 5);
    key[5] = ((str[4] & 0x1F) << 2) | (str[5] >> 6);
    key[6] = ((str[5] & 0x3F) << 1) | (str[6] >> 7);
    key[7] = str[6] & 0x7F;

    for (i = 0;i < 8;i++) {
      key[i] = (key[i] << 1);
    }
    DES_set_odd_parity ((DES_cblock*)key);
}

// initialize 7*256 key schedules
void DES_init_keys(DES_key_schedule ks_tbl[7][256]) {
    DES_cblock key;
    int        i, j;
    char       pwd[8];
    
    memset(pwd,0,sizeof(pwd));
    
    // for each byte of a 56-bit key
    for(i=0;i<7;i++) {
      // create 256 key schedules
      for(j=0;j<256;j++) {
        pwd[i]=j;
        DES_str_to_key(pwd, (uint8_t*)&key);
        DES_set_key(&key, &ks_tbl[i][j]);
      }
      // clear byte
      pwd[i]=0;
    }
}

DES_set_keyx() works in the same way DES_set_key() does except it uses a precomputed table. As you will see later, this approach is much faster than using the function provided by OpenSSL. We are exploiting the lack of non-linear operations in the key scheduling algorithm and the fact no bits overlap with one another. A bitwise OR is used here, but an XOR will work too.

/ generate DES key schedule from precomputed DES schedules
void DES_set_keyx(DES_cblock*key, 
  DES_key_schedule *ks, DES_key_schedule ks_tbl[7][256]) 
{
    uint64_t *s, *d;
    uint8_t  *k=(uint8_t*)key;
    size_t   i, j;
    
    d = (uint64_t*)ks;
    
    // zero initialize
    for(i=0; i<128/8; i++) 
      d[i]=0;
    
    // for each byte of a 56-bit key
    for(i=0; i<7; i++) {
      // get a key schedule
      s = (uint64_t*)&ks_tbl[i][k[i]];
      
      // perform a bitwise OR
      for(j=0; j<128/8; j++) 
        d[j] |= s[j];
    }
}

7. Version 2

This is similar to version 1 with the obvious difference of using precomputed DES schedules.

static bool crack_lm2(void *param) {
    int              i;
    DES_key_schedule ks;
    uint8_t          pwd[7+1]={0};
    const char       ptext[]="KGS!@#$%";
    uint8_t          ctext[8];
    DES_key_schedule ks_tbl[7][256];
    crack_opt_t      *c=(crack_opt_t*)param;
    
    // precompute key schedules
    DES_init_keys(ks_tbl);
    
    // initialize password
    for(i=0;i<7;i++) {
      if(c->pwd_idx[i] == ~0UL)break;
      pwd[i] = c->alphabet[c->pwd_idx[i]];
    }
      
    // while not stopped
    while(!c->stopped) {
      // create DES subkeys from index values
      DES_set_keyx((DES_cblock*)pwd, &ks, ks_tbl);
      // encrypt plaintext
      DES_ecb_encrypt((const_DES_cblock*)ptext, 
        (DES_cblock*)ctext, &ks, DES_ENCRYPT);
      
      // increase how many passwords processed
      c->complete++;
      
      // if hashes match, set found and exit loop
      if(memcmp(ctext, c->hash.b, 8)==0) {
        c->found=true;
        return true;
      }
      // decrease total tried. if none left, exit
      if(--c->total_cbn == 0) return false;
      // update password index values
      for(i=0;;i++) {
        // increase one. if not length of alphabet, break.
        if(++c->pwd_idx[i] != c->alpha_len) {
          pwd[i] = c->alphabet[c->pwd_idx[i]];
          break;
        }  
        // reset index
        c->pwd_idx[i]=0;
        pwd[i] = c->alphabet[0];
      }
    }
    // we didn't find it
    return false;
}

4.44 million keys per second are tested which is a distinct improvement over version 1.

8. Using Macros For The Key Schedule Algorithm

In a brute force attack, we only require changing one byte in the password string for each iteration. However, DES_set_keyx will derive a key schedule from all 7 bytes. DES_init_keys2() is a new function that will generate DES key schedules using an alphabet and order them in a way that allows us to use macros for creating new key schedules.

// initialize key schedules for alphabet
void DES_init_keys2(char alphabet[], 
  DES_key_schedule ks_tbl[7][256]) 
{
    DES_cblock key;
    char       pwd[7+1];
    size_t     i, j, alpha_len=strlen(alphabet);
    
    memset(pwd,0,sizeof(pwd));
    
    // for each byte of a 56-bit key
    for(i=0;i<7;i++) {
      // create key schedules for each character of the alphabet
      for(j=0;j<alpha_len;j++) {
        pwd[i] = alphabet[j];
        DES_str_to_key(pwd, (uint8_t*)&key);
        DES_set_key(&key, &ks_tbl[i][j]);
      }
      // clear byte
      pwd[i]=0;
    }
}

The following macros replace DES_set_keyx and use vector instructions provded by SSE2 and AVX2 to improve performance.

// create DES subkeys using precomputed schedules
// using AVX2 is slightly faster than SSE2, but not by much.
#if defined(AVX2)
#include <immintrin.h>

#define DES_SET_KEY(idx) { \
    __m256i *s = (__m256i*)&ks_tbl[idx-1][c->pwd_idx[idx-1]]; \
    __m256i *p = (__m256i*)&ks[idx]; \
    __m256i *d = (__m256i*)&ks[idx-1]; \
    if (idx == 7) { \
        d[0] = s[0]; d[1] = s[1]; \
        d[2] = s[2]; d[3] = s[3]; \
    } else { \
        d[0] = _mm256_or_si256(s[0], p[0]); \
        d[1] = _mm256_or_si256(s[1], p[1]); \
        d[2] = _mm256_or_si256(s[2], p[2]); \
        d[3] = _mm256_or_si256(s[3], p[3]); \
    } \
}
#elif defined(SSE2)
#include <emmintrin.h>

#define DES_SET_KEY(idx) { \
    __m128i *s = (__m128i*)&ks_tbl[idx-1][c->pwd_idx[idx-1]]; \
    __m128i *p = (__m128i*)&ks[idx]; \
    __m128i *d = (__m128i*)&ks[idx-1]; \
    if (idx == 7) {\
        d[0] = s[0]; d[1] = s[1]; \
        d[2] = s[2]; d[3] = s[3]; \
        d[4] = s[4]; d[5] = s[5]; \
        d[6] = s[6]; d[7] = s[7]; \
    } else { \
        d[0] = _mm_or_si128(s[0], p[0]); \
        d[1] = _mm_or_si128(s[1], p[1]); \
        d[2] = _mm_or_si128(s[2], p[2]); \
        d[3] = _mm_or_si128(s[3], p[3]); \
        d[4] = _mm_or_si128(s[4], p[4]); \
        d[5] = _mm_or_si128(s[5], p[5]); \
        d[6] = _mm_or_si128(s[6], p[6]); \
        d[7] = _mm_or_si128(s[7], p[7]); \
    } \
}
#else
#define DES_SET_KEY(idx) { \
    uint64_t *p = (uint64_t*)&ks[idx]; \
    uint64_t *s = (uint64_t*)&ks_tbl[idx-1][c->pwd_idx[idx-1]]; \
    uint64_t *d = (uint64_t*)&ks[idx-1]; \
    \
    d[ 0]=s[ 0]; d[ 1]=s[ 1]; d[ 2]=s[ 2]; d[ 3]=s[ 3]; \
    d[ 4]=s[ 4]; d[ 5]=s[ 5]; d[ 6]=s[ 6]; d[ 7]=s[ 7]; \
    d[ 8]=s[ 8]; d[ 9]=s[ 9]; d[10]=s[10]; d[11]=s[11]; \
    d[12]=s[12]; d[13]=s[13]; d[14]=s[14]; d[15]=s[15]; \
    \
    if(idx < 7) { \
      d[ 0] |= p[ 0]; d[ 1] |= p[ 1]; \
      d[ 2] |= p[ 2]; d[ 3] |= p[ 3]; \
      d[ 4] |= p[ 4]; d[ 5] |= p[ 5]; \
      d[ 6] |= p[ 6]; d[ 7] |= p[ 7]; \
      d[ 8] |= p[ 8]; d[ 9] |= p[ 9]; \
      d[10] |= p[10]; d[11] |= p[11]; \
      d[12] |= p[12]; d[13] |= p[13]; \
      d[14] |= p[14]; d[15] |= p[15]; \
    } \
}
#endif

This really speeds up an attack, but we’re not entirely finished yet.

9. Initial and Final Permutation

So far, we’ve focused primarily on the key scheduling algorithm, but now let’s examine the encryption algorithm and try to reduce the amount of code required for this process.

Before encryption, the 64-bit plaintext is remapped using something known as Initial Permutation (IP). After 16 rounds of encryption have been applied, the inverse known as Final Permutation (FP) is applied. Believe it or not, both IP and FP were made part of the DES specification simply because of how expensive it was to build hardware back in the 1970s. The designers identified an issue with the wiring of hardware after the project was completed and had the choice between building a new hardware device or changing the specification.

It was simply cheaper to change the specification and it’s widely accepted this additional process does not affect security of the cipher in any way. It is akin to a modern block cipher such as NOEKEON or SM4 that converts the plaintext to big-endian on little-endian machines. As you can see from the code below, it requires a lot of operations. By removing the permutation for both the plaintext and ciphertext, there is a significant increase in the speed of recovery.

#define ROTATE(a,n)(((a)>>(n))+((a)<<(32-(n))))

#define PERM_OP(a,b,t,n,m) ((t)=((((a)>>(n))^(b))&(m)),\
        (b)^=(t),\
        (a)^=((t)<<(n)))

#define IP(l,r) \
        { \
        register uint32_t tt; \
        PERM_OP(r,l,tt, 4,0x0f0f0f0fL); \
        PERM_OP(l,r,tt,16,0x0000ffffL); \
        PERM_OP(r,l,tt, 2,0x33333333L); \
        PERM_OP(l,r,tt, 8,0x00ff00ffL); \
        PERM_OP(r,l,tt, 1,0x55555555L); \
        }
        
    // perform initial permutation on ciphertext/hash
    h[0] = c->hash.w[0];
    h[1] = c->hash.w[1];
    IP(h[0], h[1]);
    h[0] = ROTATE(h[0], 29) & 0xffffffffL;
    h[1] = ROTATE(h[1], 29) & 0xffffffffL;

The plaintext KGS!@#$% in its hexadecimal representation is 0x4B47532140232425. Once the initial permutation has been applied, we end up with 0xAA1907472400B807 that gets loaded into L and R before applying each round of encryption.

10. Skipping Rounds

We can safely skip the last round of encryption by first checking the result of L with half of the LM hash we are trying to crack. If they are equal, only then do we apply the last round and check R.

                  // permuted plaintext
                  r = 0x2400B807; l = 0xAA190747;

                  k = (uint32_t*)&ks[0];

                  // encrypt
                  DES_F(l, r,  0); DES_F(r, l,  2);
                  DES_F(l, r,  4); DES_F(r, l,  6);     
                  DES_F(l, r,  8); DES_F(r, l, 10);    
                  DES_F(l, r, 12); DES_F(r, l, 14);   
                  DES_F(l, r, 16); DES_F(r, l, 18);    
                  DES_F(l, r, 20); DES_F(r, l, 22);    
                  DES_F(l, r, 24); DES_F(r, l, 26);    
                  DES_F(l, r, 28);   

                  c->complete++;

                  // do we have one half of the LM hash?
                  if (h[0] == l) {
                    // apply the last round
                    DES_F(r, l, 30);
                    // do we have the full hash?
                    if (h[1] == r) {
                      // ok, we found the key
                      c->found = true;
                      return true;
                    }
                  }

11. Version 3

Note how the key schedule buffers are aligned by 32 bytes. This is to enable using AVX2.

static bool crack_lm3(void *param) {
    uint32_t         h[2], l, r, t, u, *k;
    DES_key_schedule ks_tbl[7][256] __attribute__ ((aligned(32)));
    DES_key_schedule ks[7]          __attribute__ ((aligned(32)));
    crack_opt_t      *c=(crack_opt_t*)param;
    
    // precompute key schedules for alphabet
    DES_init_keys2(c->alphabet, ks_tbl);
        
    // perform initial permutation on ciphertext/hash
    h[0] = c->hash.w[0];
    h[1] = c->hash.w[1];
    IP(h[0], h[1]);
    h[0] = ROTATE(h[0], 29) & 0xffffffffL;
    h[1] = ROTATE(h[1], 29) & 0xffffffffL;

    // set the initial key schedules based on pwd_idx
    for (int i=7; i>0; i--) {
      // if not set, skip it
      if (c->pwd_idx[i-1] == ~0UL) continue;
      // set key schedule for this index
      DES_SET_KEY(i);
    }

    goto compute_lm;

    do {
      DES_SET_KEY(7);
      do {
        DES_SET_KEY(6);
        do {
          DES_SET_KEY(5);
          do {
            DES_SET_KEY(4);
            do {
              DES_SET_KEY(3);
              do {
                DES_SET_KEY(2);
                do {
                  DES_SET_KEY(1);
compute_lm:
                  // permuted plaintext
                  r = 0x2400B807; l = 0xAA190747;

                  k = (uint32_t*)&ks[0];

                  // encrypt
                  DES_F(l, r,  0); DES_F(r, l,  2);
                  DES_F(l, r,  4); DES_F(r, l,  6);     
                  DES_F(l, r,  8); DES_F(r, l, 10);    
                  DES_F(l, r, 12); DES_F(r, l, 14);   
                  DES_F(l, r, 16); DES_F(r, l, 18);    
                  DES_F(l, r, 20); DES_F(r, l, 22);    
                  DES_F(l, r, 24); DES_F(r, l, 26);    
                  DES_F(l, r, 28);   

                  c->complete++;

                  // do we have one half of the LM hash?
                  if (h[0] == l) {
                    // apply the last round
                    DES_F(r, l, 30);
                    // do we have the full hash?
                    if (h[1] == r) {
                      // ok, we found the key
                      c->found = true;
                      return true;
                    }
                  }

                  if (--c->total_cbn == 0) return false;
                  if (c->stopped) return false;

                } while (++c->pwd_idx[0] < c->alpha_len);
                c->pwd_idx[0] = 0;
              } while (++c->pwd_idx[1] < c->alpha_len);
              c->pwd_idx[1] = 0;
            } while (++c->pwd_idx[2] < c->alpha_len);
            c->pwd_idx[2] = 0;
          } while (++c->pwd_idx[3] < c->alpha_len);
          c->pwd_idx[3] = 0;
        } while (++c->pwd_idx[4] < c->alpha_len);
        c->pwd_idx[4] = 0;
      } while (++c->pwd_idx[5] < c->alpha_len);
      c->pwd_idx[5] = 0;
    } while (++c->pwd_idx[6] < c->alpha_len);
    return false;
}

Now we’re talking! Over 3.5 million keys per second more than version 2.

12. Precomputing Key Schedules 2

Our final optimization in C is the precomputation of key schedules for all 2-byte passwords and storing them in memory. For the alphabet A-Z, this requires 86,528 bytes of RAM. 69 characters would require 609,408 bytes of RAM. For devices that perform better with large blocks of memory, one might consider precomputing key schedules for all 3-byte passwords depending on the circumstances. Worst case scenario for 3-byte passwords is around 42MB. I’ve not tried using this amount, but it might be worth researching.

    // create key schedules for every two character password
    for(i=0;i<c->alpha_len;i++) {
      memset(pwd, 0, sizeof(pwd));
      pwd[0] = c->alphabet[i];

      for(j=0;j<c->alpha_len;j++) {
        pwd[1] = c->alphabet[j];
        DES_str_to_key((char*)pwd, (uint8_t*)&key);
        DES_set_key(&key, p);
        p++;
      }
    }

The F round is also changed to factor in the 2-byte key schedules. k1 points to the key schedule for bytes 3-7 while k2 points to the key schedules for every 2-byte combination.

#define LOAD_DATA_tmp(a,b,c,d,e,f) LOAD_DATA(a,b,c,d,e,f,g)
#define LOAD_DATA(R,S,u,t,E0,E1,tmp) \
        u=R^(k1[S  ] | k2[S  ]); \
        t=R^(k1[S+1] | k2[S+1]);

#define DES_F(LL,R,S) {\
        LOAD_DATA_tmp(R,S,u,t,E0,E1); \
        t=ROTATE(t,4); \
        LL^=DES_sbox[0][(u>> 2L)&0x3f]^ \
            DES_sbox[2][(u>>10L)&0x3f]^ \
            DES_sbox[4][(u>>18L)&0x3f]^ \
            DES_sbox[6][(u>>26L)&0x3f]^ \
            DES_sbox[1][(t>> 2L)&0x3f]^ \
            DES_sbox[3][(t>>10L)&0x3f]^ \
            DES_sbox[5][(t>>18L)&0x3f]^ \
            DES_sbox[7][(t>>26L)&0x3f]; }

13. Version 4

cbn contains the length of the alphabet squared. For [A,Z], that’s 26^{2} or 676 combinations. By keeping the code inside the CPU cache longer, this helps improve performance.

    k1 = (uint32_t*)&ks1[2];
    k2 = (uint32_t*)&ks2[0];
    
    k2 += ((c->pwd_idx[0] * c->alpha_len) + c->pwd_idx[1]) * 32;
    cbn = c->alpha_len * c->alpha_len;
    
    goto compute_lm;

    do {
      DES_SET_KEY(7);
      do {
        DES_SET_KEY(6);
        do {
          DES_SET_KEY(5);
          do {
            DES_SET_KEY(4);
            do {
              DES_SET_KEY(3);
              k2 = (uint32_t*)&ks2[0];
compute_lm:
              for(i=0;i<cbn;i++) {
                // permuted plaintext
                r = 0x2400B807; l = 0xAA190747;

                // encrypt
                DES_F(l, r,  0); 
                DES_F(r, l,  2); DES_F(l, r,  4); 
                DES_F(r, l,  6); DES_F(l, r,  8); 
                DES_F(r, l, 10); DES_F(l, r, 12); 
                DES_F(r, l, 14); DES_F(l, r, 16); 
                DES_F(r, l, 18); DES_F(l, r, 20); 
                DES_F(r, l, 22); DES_F(l, r, 24); 
                DES_F(r, l, 26); DES_F(l, r, 28); 
                
                if (h[0] == l) {
                  DES_F(r, l, 30);
                  if (h[1] == r) {
                    // yay, we found it.
                    c->pwd_idx[0] = (i / c->alpha_len);
                    c->pwd_idx[1] = (i % c->alpha_len);
                    c->found = true;
                    return true;
                  }
                }
                k2+=32;
              }
              c->complete += cbn;
              c->total_cbn -= cbn;
              if ((int64_t)c->total_cbn<0) return false;
              if (c->stopped) return false;
                
            } while (++c->pwd_idx[2] < c->alpha_len);
            c->pwd_idx[2] = 0;
          } while (++c->pwd_idx[3] < c->alpha_len);
          c->pwd_idx[3] = 0;
        } while (++c->pwd_idx[4] < c->alpha_len);
        c->pwd_idx[4] = 0;
      } while (++c->pwd_idx[5] < c->alpha_len);
      c->pwd_idx[5] = 0;
    } while (++c->pwd_idx[6] < c->alpha_len);
    return false;

We see a modest increase in speed for a single thread, but this will make more of a difference in multithreaded mode. This time we have 8.64 million keys per second.

14. Results

Here are all four routines running one after the other using multiple threads.

We achieve a 300% speed increase, but that’s significantly less than the 450% gain advertised by L0phtCrack twenty years ago. The only explanation I can think of at this point is that optimizers for compilers 22 years ago were not as good as they are today. Well written assembler routines would increase the speed further, but not by that much IMHO. What I’ve shown here may not be exactly how the L0pht did it, but i’d say it’s probably close enough.

Source code

This entry was posted in cryptography, passwords, programming, security, windows and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s