forked from csarron/mdict-analysis
-
Notifications
You must be signed in to change notification settings - Fork 0
/
pureSalsa20.py
365 lines (287 loc) · 13.9 KB
/
pureSalsa20.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
#!/usr/bin/env python
# coding: utf-8
"""
Copyright by https://github.com/zhansliu/writemdict
pureSalsa20.py -- a pure Python implementation of the Salsa20 cipher, ported to Python 3
v4.0: Added Python 3 support, dropped support for Python <= 2.5.
// zhansliu
Original comments below.
====================================================================
There are comments here by two authors about three pieces of software:
comments by Larry Bugbee about
Salsa20, the stream cipher by Daniel J. Bernstein
(including comments about the speed of the C version) and
pySalsa20, Bugbee's own Python wrapper for salsa20.c
(including some references), and
comments by Steve Witham about
pureSalsa20, Witham's pure Python 2.5 implementation of Salsa20,
which follows pySalsa20's API, and is in this file.
Salsa20: a Fast Streaming Cipher (comments by Larry Bugbee)
-----------------------------------------------------------
Salsa20 is a fast stream cipher written by Daniel Bernstein
that basically uses a hash function and XOR making for fast
encryption. (Decryption uses the same function.) Salsa20
is simple and quick.
Some Salsa20 parameter values...
design strength 128 bits
key length 128 or 256 bits, exactly
IV, aka nonce 64 bits, always
chunk size must be in multiples of 64 bytes
Salsa20 has two reduced versions, 8 and 12 rounds each.
One benchmark (10 MB):
1.5GHz PPC G4 102/97/89 MB/sec for 8/12/20 rounds
AMD Athlon 2500+ 77/67/53 MB/sec for 8/12/20 rounds
(no I/O and before Python GC kicks in)
Salsa20 is a Phase 3 finalist in the EU eSTREAM competition
and appears to be one of the fastest ciphers. It is well
documented so I will not attempt any injustice here. Please
see "References" below.
...and Salsa20 is "free for any use".
pySalsa20: a Python wrapper for Salsa20 (Comments by Larry Bugbee)
------------------------------------------------------------------
pySalsa20.py is a simple ctypes Python wrapper. Salsa20 is
as it's name implies, 20 rounds, but there are two reduced
versions, 8 and 12 rounds each. Because the APIs are
identical, pySalsa20 is capable of wrapping all three
versions (number of rounds hardcoded), including a special
version that allows you to set the number of rounds with a
set_rounds() function. Compile the version of your choice
as a shared library (not as a Python extension), name and
install it as libsalsa20.so.
Sample usage:
from pySalsa20 import Salsa20
s20 = Salsa20(key, IV)
dataout = s20.encryptBytes(datain) # same for decrypt
This is EXPERIMENTAL software and intended for educational
purposes only. To make experimentation less cumbersome,
pySalsa20 is also free for any use.
THIS PROGRAM IS PROVIDED WITHOUT WARRANTY OR GUARANTEE OF
ANY KIND. USE AT YOUR OWN RISK.
Enjoy,
Larry Bugbee
April 2007
References:
-----------
http://en.wikipedia.org/wiki/Salsa20
http://en.wikipedia.org/wiki/Daniel_Bernstein
http://cr.yp.to/djb.html
http://www.ecrypt.eu.org/stream/salsa20p3.html
http://www.ecrypt.eu.org/stream/p3ciphers/salsa20/salsa20_p3source.zip
Prerequisites for pySalsa20:
----------------------------
- Python 2.5 (haven't tested in 2.4)
pureSalsa20: Salsa20 in pure Python 2.5 (comments by Steve Witham)
------------------------------------------------------------------
pureSalsa20 is the stand-alone Python code in this file.
It implements the underlying Salsa20 core algorithm
and emulates pySalsa20's Salsa20 class API (minus a bug(*)).
pureSalsa20 is MUCH slower than libsalsa20.so wrapped with pySalsa20--
about 1/1000 the speed for Salsa20/20 and 1/500 the speed for Salsa20/8,
when encrypting 64k-byte blocks on my computer.
pureSalsa20 is for cases where portability is much more important than
speed. I wrote it for use in a "structured" random number generator.
There are comments about the reasons for this slowness in
http://www.tiac.net/~sw/2010/02/PureSalsa20
Sample usage:
from pureSalsa20 import Salsa20
s20 = Salsa20(key, IV)
dataout = s20.encryptBytes(datain) # same for decrypt
I took the test code from pySalsa20, added a bunch of tests including
rough speed tests, and moved them into the file testSalsa20.py.
To test both pySalsa20 and pureSalsa20, type
python testSalsa20.py
(*)The bug (?) in pySalsa20 is this. The rounds variable is global to the
libsalsa20.so library and not switched when switching between instances
of the Salsa20 class.
s1 = Salsa20( key, IV, 20 )
s2 = Salsa20( key, IV, 8 )
In this example,
with pySalsa20, both s1 and s2 will do 8 rounds of encryption.
with pureSalsa20, s1 will do 20 rounds and s2 will do 8 rounds.
Perhaps giving each instance its own nRounds variable, which
is passed to the salsa20wordtobyte() function, is insecure. I'm not a
cryptographer.
pureSalsa20.py and testSalsa20.py are EXPERIMENTAL software and
intended for educational purposes only. To make experimentation less
cumbersome, pureSalsa20.py and testSalsa20.py are free for any use.
Revisions:
----------
p3.2 Fixed bug that initialized the output buffer with plaintext!
Saner ramping of nreps in speed test.
Minor changes and print statements.
p3.1 Took timing variability out of add32() and rot32().
Made the internals more like pySalsa20/libsalsa .
Put the semicolons back in the main loop!
In encryptBytes(), modify a byte array instead of appending.
Fixed speed calculation bug.
Used subclasses instead of patches in testSalsa20.py .
Added 64k-byte messages to speed test to be fair to pySalsa20.
p3 First version, intended to parallel pySalsa20 version 3.
More references:
----------------
http://www.seanet.com/~bugbee/crypto/salsa20/ [pySalsa20]
http://cr.yp.to/snuffle.html [The original name of Salsa20]
http://cr.yp.to/snuffle/salsafamily-20071225.pdf [ Salsa20 design]
http://www.tiac.net/~sw/2010/02/PureSalsa20
THIS PROGRAM IS PROVIDED WITHOUT WARRANTY OR GUARANTEE OF
ANY KIND. USE AT YOUR OWN RISK.
Cheers,
Steve Witham sw at remove-this tiac dot net
February, 2010
"""
import sys
assert(sys.version_info >= (2, 6))
if sys.version_info >= (3,):
integer_types = (int,)
python3 = True
else:
integer_types = (int, long)
python3 = False
from struct import Struct
little_u64 = Struct( "<Q" ) # little-endian 64-bit unsigned.
# Unpacks to a tuple of one element!
little16_i32 = Struct( "<16i" ) # 16 little-endian 32-bit signed ints.
little4_i32 = Struct( "<4i" ) # 4 little-endian 32-bit signed ints.
little2_i32 = Struct( "<2i" ) # 2 little-endian 32-bit signed ints.
_version = 'p4.0'
#----------- Salsa20 class which emulates pySalsa20.Salsa20 ---------------
class Salsa20(object):
def __init__(self, key=None, IV=None, rounds=20 ):
self._lastChunk64 = True
self._IVbitlen = 64 # must be 64 bits
self.ctx = [ 0 ] * 16
if key:
self.setKey(key)
if IV:
self.setIV(IV)
self.setRounds(rounds)
def setKey(self, key):
assert type(key) == bytes
ctx = self.ctx
if len( key ) == 32: # recommended
constants = b"expand 32-byte k"
ctx[ 1],ctx[ 2],ctx[ 3],ctx[ 4] = little4_i32.unpack(key[0:16])
ctx[11],ctx[12],ctx[13],ctx[14] = little4_i32.unpack(key[16:32])
elif len( key ) == 16:
constants = b"expand 16-byte k"
ctx[ 1],ctx[ 2],ctx[ 3],ctx[ 4] = little4_i32.unpack(key[0:16])
ctx[11],ctx[12],ctx[13],ctx[14] = little4_i32.unpack(key[0:16])
else:
raise Exception( "key length isn't 32 or 16 bytes." )
ctx[0],ctx[5],ctx[10],ctx[15] = little4_i32.unpack( constants )
def setIV(self, IV):
assert type(IV) == bytes
assert len(IV)*8 == 64, 'nonce (IV) not 64 bits'
self.IV = IV
ctx=self.ctx
ctx[ 6],ctx[ 7] = little2_i32.unpack( IV )
ctx[ 8],ctx[ 9] = 0, 0 # Reset the block counter.
setNonce = setIV # support an alternate name
def setCounter( self, counter ):
assert( type(counter) in integer_types )
assert( 0 <= counter < 1<<64 ), "counter < 0 or >= 2**64"
ctx = self.ctx
ctx[ 8],ctx[ 9] = little2_i32.unpack( little_u64.pack( counter ) )
def getCounter( self ):
return little_u64.unpack( little2_i32.pack( *self.ctx[ 8:10 ] ) ) [0]
def setRounds(self, rounds, testing=False ):
assert testing or rounds in [8, 12, 20], 'rounds must be 8, 12, 20'
self.rounds = rounds
def encryptBytes(self, data):
assert type(data) == bytes, 'data must be byte string'
assert self._lastChunk64, 'previous chunk not multiple of 64 bytes'
lendata = len(data)
munged = bytearray(lendata)
for i in range( 0, lendata, 64 ):
h = salsa20_wordtobyte( self.ctx, self.rounds, checkRounds=False )
self.setCounter( ( self.getCounter() + 1 ) % 2**64 )
# Stopping at 2^70 bytes per nonce is user's responsibility.
for j in range( min( 64, lendata - i ) ):
if python3:
munged[ i+j ] = data[ i+j ] ^ h[j]
else:
munged[ i+j ] = ord(data[ i+j ]) ^ ord(h[j])
self._lastChunk64 = not lendata % 64
return bytes(munged)
decryptBytes = encryptBytes # encrypt and decrypt use same function
#--------------------------------------------------------------------------
def salsa20_wordtobyte( input, nRounds=20, checkRounds=True ):
""" Do nRounds Salsa20 rounds on a copy of
input: list or tuple of 16 ints treated as little-endian unsigneds.
Returns a 64-byte string.
"""
assert( type(input) in ( list, tuple ) and len(input) == 16 )
assert( not(checkRounds) or ( nRounds in [ 8, 12, 20 ] ) )
x = list( input )
def XOR( a, b ): return a ^ b
ROTATE = rot32
PLUS = add32
for i in range( nRounds // 2 ):
# These ...XOR...ROTATE...PLUS... lines are from ecrypt-linux.c
# unchanged except for indents and the blank line between rounds:
x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 0],x[12]), 7));
x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[ 4],x[ 0]), 9));
x[12] = XOR(x[12],ROTATE(PLUS(x[ 8],x[ 4]),13));
x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[12],x[ 8]),18));
x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 5],x[ 1]), 7));
x[13] = XOR(x[13],ROTATE(PLUS(x[ 9],x[ 5]), 9));
x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[13],x[ 9]),13));
x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 1],x[13]),18));
x[14] = XOR(x[14],ROTATE(PLUS(x[10],x[ 6]), 7));
x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[14],x[10]), 9));
x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 2],x[14]),13));
x[10] = XOR(x[10],ROTATE(PLUS(x[ 6],x[ 2]),18));
x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[15],x[11]), 7));
x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 3],x[15]), 9));
x[11] = XOR(x[11],ROTATE(PLUS(x[ 7],x[ 3]),13));
x[15] = XOR(x[15],ROTATE(PLUS(x[11],x[ 7]),18));
x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[ 0],x[ 3]), 7));
x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[ 1],x[ 0]), 9));
x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[ 2],x[ 1]),13));
x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[ 3],x[ 2]),18));
x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 5],x[ 4]), 7));
x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 6],x[ 5]), 9));
x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 7],x[ 6]),13));
x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 4],x[ 7]),18));
x[11] = XOR(x[11],ROTATE(PLUS(x[10],x[ 9]), 7));
x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[11],x[10]), 9));
x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 8],x[11]),13));
x[10] = XOR(x[10],ROTATE(PLUS(x[ 9],x[ 8]),18));
x[12] = XOR(x[12],ROTATE(PLUS(x[15],x[14]), 7));
x[13] = XOR(x[13],ROTATE(PLUS(x[12],x[15]), 9));
x[14] = XOR(x[14],ROTATE(PLUS(x[13],x[12]),13));
x[15] = XOR(x[15],ROTATE(PLUS(x[14],x[13]),18));
for i in range( len( input ) ):
x[i] = PLUS( x[i], input[i] )
return little16_i32.pack( *x )
#--------------------------- 32-bit ops -------------------------------
def trunc32( w ):
""" Return the bottom 32 bits of w as a Python int.
This creates longs temporarily, but returns an int. """
w = int( ( w & 0x7fffFFFF ) | -( w & 0x80000000 ) )
assert type(w) == int
return w
def add32( a, b ):
""" Add two 32-bit words discarding carry above 32nd bit,
and without creating a Python long.
Timing shouldn't vary.
"""
lo = ( a & 0xFFFF ) + ( b & 0xFFFF )
hi = ( a >> 16 ) + ( b >> 16 ) + ( lo >> 16 )
return ( -(hi & 0x8000) | ( hi & 0x7FFF ) ) << 16 | ( lo & 0xFFFF )
def rot32( w, nLeft ):
""" Rotate 32-bit word left by nLeft or right by -nLeft
without creating a Python long.
Timing depends on nLeft but not on w.
"""
nLeft &= 31 # which makes nLeft >= 0
if nLeft == 0:
return w
# Note: now 1 <= nLeft <= 31.
# RRRsLLLLLL There are nLeft RRR's, (31-nLeft) LLLLLL's,
# => sLLLLLLRRR and one s which becomes the sign bit.
RRR = ( ( ( w >> 1 ) & 0x7fffFFFF ) >> ( 31 - nLeft ) )
sLLLLLL = -( (1<<(31-nLeft)) & w ) | (0x7fffFFFF>>nLeft) & w
return RRR | ( sLLLLLL << nLeft )
# --------------------------------- end -----------------------------------