Free Essay

Hashing

In:

Submitted By mamolo102
Words 4332
Pages 18
Hashing hash functions collision resolution applications

References: Algorithms in Java, Chapter 14 http://www.cs.princeton.edu/introalgsds/42hash 1

Summary of symbol-table implementations

implementation unordered array ordered array unordered list ordered list BST randomized BST red-black tree

guarantee search N lg N N N N 7 lg N 3 lg N insert N N N N N 7 lg N 3 lg N delete N N N N N 7 lg N 3 lg N search N/2 lg N N/2 N/2 1.39 lg N 1.39 lg N lg N

average case insert N/2 N/2 N N/2 1.39 lg N 1.39 lg N lg N delete N/2 N/2 N/2 N/2 ? 1.39 lg N lg N

ordered iteration? no yes no yes yes yes yes

Can we do better?

2

Optimize Judiciously

More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason including blind stupidity. - William A. Wulf

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. - Donald E. Knuth

We follow two rules in the matter of optimization: Rule 1: Don't do it. Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution. - M. A. Jackson

Reference: Effective Java by Joshua Bloch.
3

Hashing: basic plan Save items in a key-indexed table (index is a function of the key). Hash function. Method for computing table index from key. hash(“it”) = 3 ?? hash(“times”) = 3
0 1 2 3 4 5 “it”

Issues. 1. Computing the hash function 2. Collision resolution: Algorithm and data structure to handle two keys that hash to the same index. 3. Equality test: Method for checking whether two keys are equal. Classic space-time tradeoff. No space limitation: trivial hash function with key as address. No time limitation: trivial collision resolution with sequential search. Limitations on both time and space: hashing (the real world).

• • •

4

hash functions collision resolution applications

5

Computing the hash function Idealistic goal: scramble the keys uniformly. Efficiently computable. Each table position equally likely for each key.

• •

thoroughly researched problem, still problematic in practical applications

Practical challenge: need different approach for each type of key Ex: Social Security numbers. Bad: first three digits. Better: last three digits. Ex: date of birth. Bad: birth year. Better: birthday. Ex: phone numbers. Bad: first three digits. Better: last three digits.

• • • • • •

573 = California, 574 = Alaska assigned in chronological order within a given geographic region

6

Hash Codes and Hash Functions Java convention: all classes implement hashCode() hashcode() returns a 32-bit int (between -2147483648 and 2147483647)

Hash function. An int between 0 and M-1 (for use as an array index) First try:
String s = "call"; int code = s.hashCode(); int hash = code % M;
7121 8191 3045982

Bug. Don't use (code % M) as array index

1-in-a billion bug. Don't use (Math.abs(code) % M) as array index.

OK. Safe to use ((code & 0x7fffffff) % M) as array index. hex literal 31-bit mask
7

Java’s hashCode() convention Theoretical advantages Ensures hashing can be used for every type of object Allows expert implementations suited to each type

• • • • • • • • •

Requirements: If x.equals(y) then x and y must have the same hash code. Repeated calls to x.hashCode() must return the same value. x y

Practical realities True randomness is hard to achieve Cost is an important consideration
x.hashCode() y.hashCode()

Available implementations default (inherited from Object): Memory address of x ( ! ! ! ) customized Java implementations: String, URL, Integer, Date. User-defined types: users are on their own that’s you!
8

A typical type Assumption when using hashing in Java: Key type has reasonable implementation of hashCode() and equals() Ex. Phone numbers: (609) 867-5309. exchange extension

public final class PhoneNumber { private final int area, exch, ext; public PhoneNumber(int area, int exch, int ext) { this.area = area; this.exch = exch; this.ext = ext; } public boolean equals(Object y) { // as before } public int hashCode() { return 10007 * (area + 1009 * exch) + ext; } }

sufficiently random?

Fundamental problem: Need a theorem for each data type to ensure reliability.

9

A decent hash code design Java 1.5 string library [see also Program 14.2 in Algs in Java]. public int hashCode() { int hash = 0; for (int i = 0; i < length(); i++) hash = s[i] + (31 * hash); return hash; } ith character of s char … 'a' 'b' 'c' … Unicode … 97 98 99 …

• Equivalent to h = 31 s + … + 31 s + 31 s + s . • Horner's method to hash string of length L: L multiplies/adds
L-1 0 2 L-3 L-2 L-1

Ex.

String s = "call"; int code = s.hashCode();
3045982 = 99 313 + 97 312 + 108 311 + 108 310 = 108 + 31 (108 + 31 (99 + 31 (97)))

Provably random? Well, no.

10

A poor hash code design Java 1.1 string library. For long strings: only examines 8-9 evenly spaced characters. Saves time in performing arithmetic…

• •

public { int int for

int hashCode()

hash = 0; skip = Math.max(1, length() / 8); (int i = 0; i < length(); i += skip) hash = (37 * hash) + s[i]; return hash;

}

but great potential for bad collision patterns. http://www.cs.princeton.edu/introcs/13loop/Hello.java http://www.cs.princeton.edu/introcs/13loop/Hello.class http://www.cs.princeton.edu/introcs/13loop/Hello.html http://www.cs.princeton.edu/introcs/13loop/index.html http://www.cs.princeton.edu/introcs/12type/index.html

Basic rule: need to use the whole key.

11

Digression: using a hash function for data mining Use content to characterize documents. Applications Search documents on the web for documents similar to a given one. Determine whether a new document belongs in one set or another

• • • • •

Approach Fix order k and dimension d • Compute hashCode() % d for all k-grams in the document Result: d-dimensional vector profile of each document To compare documents: Consider angle θ separating vectors cos θ close to 0: not similar cos θ close to 1: similar

a

b

θ cos θ = a |a||b|
12

b/

Digression: using a hash function for data mining k = 10 d = 65536
% more tale.txt it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness ... % more genome.txt CTTTCGGTTTGGAACC GAAGCCGCGCGTCT TGTCTGCTGCAGC ATCGTTC ... tale.txt i 10-grams with hashcode() i freq

genome.txt
10-grams with hashcode() i freq

0 1 2

0 0 0

0 0 0

435

best of ti foolishnes

2

TTTCGGTTTG TGTCTGCTGC

2

8999 ... 12122 ... 34543

it was the

8

0

0

CTTTCGGTTT

3

t was the b

5

ATGCGGTCGA

4

cos θ small: not similar

... 65535 65536 profiles
13

Digression: using a hash function to profile a document for data mining

public class Document { private String name; private double[] profile; public Document(String name, int k, int d) { this.name = name; String doc = (new In(name)).readAll(); int N = doc.length(); profile = new double[d]; for (int i = 0; i < N-k; i++) { int h = doc.substring(i, i+k).hashCode(); profile[Math.abs(h % d)] += 1; } } public double simTo(Document other) { // compute dot product and divide by magnitudes } }

14

Digression: using a hash function to compare documents

public class CompareAll { public static void main(String args[]) { int k = Integer.parseInt(args[0]); int d = Integer.parseInt(args[1]); int N = StdIn.readInt(); Document[] a = new Document[N]; for (int i = 0; i < N; i++) a[i] = new Document(StdIn.readString(), k, d); System.out.print(" "); for (int j = 0; j < N; j++) System.out.printf(" %.4s", a[j].name()); System.out.println(); for (int i = 0; i < N; i++) { System.out.printf("%.4s ", a[i].name()); for (int j = 0; j < N; j++) System.out.printf("%8.2f", a[i].simTo(a[j])); System.out.println(); } } }
15

Digression: using a hash function to compare documents
Cons TomS Huck Prej Pict DJIA Amaz ACTG
US Constitution “Tom Sawyer” “Huckleberry Finn” “Pride and Prejudice” a photograph financial data Amazon.com website .html source genome

% java CompareAll Cons Cons 1.00 TomS 0.89 Huck 0.87 Prej 0.88 Pict 0.35 DJIA 0.70 Amaz 0.63 ACTG 0.58

5 1000 < docs.txt TomS Huck Prej 0.89 0.87 0.88 1.00 0.98 0.96 0.98 1.00 0.94 0.96 0.94 1.00 0.34 0.32 0.34 0.75 0.74 0.76 0.66 0.65 0.67 0.62 0.61 0.63

Pict 0.35 0.34 0.32 0.34 1.00 0.29 0.48 0.24

DJIA 0.70 0.75 0.74 0.76 0.29 1.00 0.62 0.58

Amaz 0.63 0.66 0.65 0.67 0.48 0.62 1.00 0.45

ACTG 0.58 0.62 0.61 0.63 0.24 0.58 0.45 1.00
16

hash functions collision resolution applications

17

Helpful results from probability theory Bins and balls. Throw balls uniformly at random into M bins.

0

1

2

3

4

5

6

7

8

9

10

11

12

13 14

15

Birthday problem. Expect two balls in the same bin after Coupon collector. Expect every bin has
M / 2 tosses.

1 ball after

(M ln M) tosses.

Load balancing. After M tosses, expect most loaded bin has

(log M / log log M) balls.

18

Collisions Collision. Two distinct keys hashing to same index. Conclusion. Birthday problem a ridiculous amount of memory. can't avoid collisions unless you have

Challenge. Deal with collisions efficiently.

Approach 1: accept multiple collisions

Approach 2: minimize collisions

19

Collision resolution: two approaches 1. Separate chaining. [H. P. Luhn, IBM 1953] Put keys that collide in a list associated with index. 2. Open addressing. [Amdahl-Boehme-Rocherster-Samuel, IBM 1953] When a new key collides, find next empty slot, and put it there. st[0] st[1] st[0] st[1] st[2] st[3] jocularly listen null null suburban untravelled considerating seriously jocularly

null listen suburban

st[2] st[3]

st[30001] st[8190] browsing browsing

separate chaining (M = 8191, N = 15000) easy extension of linked list ST implementation

linear probing (M = 30001, N = 15000) easy extension of array ST implementation

20

Collision resolution approach 1: separate chaining good choice: M Use an array of M < N linked lists. Hash: map key to integer i between 0 and M-1. Insert: put at front of ith chain (if not already there). Search: only need to search ith chain.

• • •

N/10

key

hash 7121 3480 5017 0 3 3 .

st[0] st[1] st[2] st[3]

jocularly listen

seriously

call me ishmael

null suburban untravelled considerating

seriously untravelled suburban

st[8190]

browsing

. . .

21

Separate chaining ST implementation (skeleton) public class ListHashST { private int M = 8191; private Node[] st = new Node[M]; private class Node { Object key; Object val; Node next; Node(Key key, Value val, Node next) { this.key = key; this.val = val; this.next = next; } } private int hash(Key key) { return (key.hashcode() & 0x7ffffffff) % M; public void put(Key key, Value val) // see next slide public Val get(Key key) // see next slide }
22

could use doubling

compare with linked lists

no generics in arrays in Java

}

Separate chaining ST implementation (put and get)

public void put(Key key, Value val) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) { x.val = val; return; } st[i] = new Node(key, value, first); } public Value get(Key key) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) return (Value) x.val; return null; }

Identical to linked-list code, except hash to pick a list.

23

Analysis of separate chaining Separate chaining performance. Cost is proportional to length of list. Average length = N / M. Worst case: all keys hash to same list.

• • •

Theorem. Let = N / M > 1 be average length of list. For any t > 1, probability that list length > t is exponentially small in t. depends on hash map being random map

Parameters. M too large too many empty chains. M too small chains too long. Typical choice: = N / M 10 constant-time ops.

• • •

24

Collision resolution approach 2: open addressing good choice: M 2N Use an array of size M >> N. Hash: map key to integer i between 0 and M-1. Linear probing: Insert: put in slot i if free; if not try i+1, i+2, etc. Search: search slot i; if occupied but no match, try i+1, i+2, etc.

• • •

0

1

2

S
3

H
4

5

6

A
7

C
8

E
9

R
10

11

N
12

0

1

2

S
3

H
4

5

6

A
7

C
8

E
9

R
10

I
11

12

insert I hash(I) = 11

0

1

2

S
3

H
4

5

6

A
7

C
8

E
9

R
10

I
11

N
12

insert N hash(N) = 8

25

Linear probing ST implementation

public class ArrayHashST { standard ugly casts private int M = 30001; private Value[] vals = (Value[]) new Object[maxN]; private Key[] keys = (Key[]) new Object[maxN]; privat int hash(Key key) // as before public void put(Key key, Value val) { int i; for (i = hash(key); keys[i] != null; i = (i+1) % M) if (key.equals(keys[i])) break; vals[i] = val; keys[i] = key; } public Value get(Key key) { for (int i = hash(key); keys[i] != null; i = (i+1) % M) if (key.equals(keys[i])) return vals]i]; return null; } }

compare with elementary unordered array implementation standard array doubling code omitted (double when half full)

26

Clustering Cluster. A contiguous block of items. Observation. New keys likely to hash into middle of big clusters.

-

-

-

S

H

A

C

E

-

-

-

X

M

I

-

-

-

P

-

-

R

L

-

-

cluster

Knuth's parking problem. Cars arrive at one-way street with M parking spaces. Each desires a random space i: if space i is taken, try i+1, i+2, … What is mean displacement of a car?

Empty. With M/2 cars, mean displacement is about 3/2. Full. Mean displacement for the last car is about
M/2
27

Analysis of linear probing Linear probing performance. Insert and search cost depend on length of cluster. but keys more likely to Average length of cluster = = N / M. hash to big clusters Worst case: all keys hash to same cluster.

• • •

Theorem. [Knuth 1962] Let

= N / M < 1 be the load factor.

Average probes for insert/search miss
1 — 2

(1 + (1 +

1 ( 1 1 ( 1 α ) α )
2

Average probes for search hit
1 — 2

= )2

( 1 + α + 2α2 + 3α3 + 4α4 + . . . ) /

)=

1 + ( α + α2 + α3 + α4 + . . . ) /2

Parameters. Load factor too small Load factor too large Typical choice: M 2N

• • •

too many empty array entries. clusters coalesce. constant-time ops.
28

Hashing: variations on the theme Many improved versions have been studied: Ex: Two-probe hashing hash to two positions, put key in shorter of the two lists reduces average length of the longest list to log log N

• • • • •

Ex: Double hashing use linear probing, but skip a variable amount, not just 1 each time effectively eliminates clustering can allow table to become nearly full

29

Double hashing Idea Avoid clustering by using second hash to compute skip for search. Hash. Map key to integer i between 0 and M-1. Second hash. Map key to nonzero skip value k. Ex: k = 1 + (v mod 97). hashCode() Effect. Skip values give different search paths for keys that collide.

Best practices. Make k and M relatively prime.

30

Double Hashing Performance

Theorem. [Guibas-Szemerédi] Let
Average probes for insert/search miss
1 ( 1 α ) 1 ( 1 α )

= N / M < 1 be average length of list.

=

1 + α + α2 + α3 + α4 + . . .

Average probes for search hit
1 — α

ln

= 1 + α/2 + α2 /3 + α3 /4 + α4 /5 +...

Parameters. Typical choice: α

1.2

constant-time ops.

Disadvantage. Delete cumbersome to implement.

31

Hashing Tradeoffs Separate chaining vs. linear probing/double hashing. Space for links vs. empty table slots. Small table + linked allocation vs. big coherent array.

• •

Linear probing vs. double hashing. load factor 50% linear probing double hashing get put get put 1.5 2.5 1.4 1.5 66% 2.0 5.0 1.6 2.0 75% 3.0 8.5 1.8 3.0 90% 5.5 55.5 2.6 5.5

number of probes

32

Summary of symbol-table implementations

implementation unordered array ordered array unordered list ordered list BST randomized BST red-black tree hashing

guarantee search N lg N N N N 7 lg N 2 lg N 1* insert N N N N N 7 lg N 2 lg N 1* delete N N N N N 7 lg N 2 lg N 1* search N/2 lg N N/2 N/2 1.38 lg N 1.38 lg N lg N 1*

average case insert N/2 N/2 N N/2 1.38 lg N 1.38 lg N lg N 1* delete N/2 N/2 N/2 N/2 ? 1.38 lg N lg N 1*

ordered iteration? no yes no yes yes yes yes no

operations on keys equals() compareTo() equals() compareTo() compareTo() compareTo() compareTo() equals() hashCode()

* assumes random hash code

33

Hashing versus balanced trees

Hashing simpler to code no effective alternative for unordered keys faster for simple keys (a few arithmetic ops versus lg N compares) (Java) better system support for strings [cached hashcode] does your hash function produce random values for your key type??

• • • • • • • • • •

Balanced trees stronger performance guarantee can support many more operations for ordered keys easier to implement compareTo() correctly than equals() and hashCode()

Java system includes both red-black trees: java.util.TreeMap, java.util.TreeSet hashing: java.util.HashMap, java.util.IdentityHashMap
34

Typical “full” ST API public class *ST *ST() void Value boolean Key Key Key Key void Iterator put(Key key, Value val) get(Key key) contains(Key key) min() max() next(Key key) prev(Key key) remove(Key key) iterator() create a symbol table put key-value pair into the table return value paired with key (null if key is not in table) is there a value paired with key? smallest key largest key next largest key (null if key is max) next smallest key (null if key is min) remove key-value pair from table iterator through keys in table

Hashing is not suitable for implementing such an API (no order) BSTs are easy to extend to support such an API (basic tree ops)
Ex: Can use LLRB trees implement priority queues for distinct keys
35

hash functions collision resolution applications

36

Set ADT Set. Collection of distinct keys. public class *SET SET() void boolean void Iterator add(Key key) contains(Key key) remove(Key key) iterator() create a set put key into the set is there a value paired with key? remove key from the set iterator through all keys in the set

Normal mathematical assumption: collection is unordered Typical (eventual) client expectation: ordered iteration Q. How to implement? A0. Hashing (our ST code [value removed] or java.util.HashSet) A1. Red-black BST (our ST code [value removed] or java.util.TreeSet) ordered iterator O(log N) search unordered iterator O(1) search

37

SET client example 1: dedup filter Remove duplicates from strings in standard input Read a key. If key is not in set, insert and print it.

• •

No iterator needed. Output is in same order as input with dups removed.

public class DeDup { public static void main(String[] args) { SET set = new SET(); while (!StdIn.isEmpty()) { String key = StdIn.readString(); if (!set.contains(key)) { set.add(key); StdOut.println(key); } } } }

% more tale.txt it was the best of times it was the worst of times it was the age of wisdom it was the age of foolishness ... % java Dedup < tale.txt it was the best of times worst age wisdom foolishness ...

Simplified version of FrequencyCount (no iterator needed)

38

SET client example 2A: lookup filter Print words from standard input that are found in a list Read in a list of words from one file. Print out all words from standard input that are in the list.

• •

public class LookupFilter { public static void main(String[] args) { SET set = new SET(); In in = new In(args[0]); while (!in.isEmpty()) set.add(in.readString()); while (!StdIn.isEmpty()) { String word = StdIn.readString(); if (set.contains(word)) StdOut.println(word); } } }

create SET

process list

print words that are not in list

39

SET client example 2B: exception filter Print words from standard input that are not found in a list Read in a list of words from one file. Print out all words from standard input that are not in the list.

• •

public class LookupFilter { public static void main(String[] args) { SET set = new SET(); In in = new In(args[0]); while (!in.isEmpty()) set.add(in.readString()); while (!StdIn.isEmpty()) { String word = StdIn.readString(); if (!set.contains(word)) StdOut.println(word); } } }

create SET

process list

print words that are not in list

40

SET filter applications

application dedup spell checker browser chess spam filter trusty filter credit cards

purpose eliminate duplicates find misspelled words mark visited pages detect draw eliminate spam allow trusted mail check for stolen cards

key

type dedup

in list duplicates dictionary visited pages positions spam good mail stolen cards

not in list unique keys misspelled words

word URL board IP addr URL number

exception lookup lookup exception lookup exception

good mail

good cards

41

Searching challenge: Problem: Index for a PC or the web Assumptions: 1 billion++ words to index Which searching method to use? 1) hashing implementation of SET 2) hashing implementation of ST 3) red-black-tree implementation of ST 4) red-black-tree implementation of SET 5) doesn’t matter much

42

Index for search in a PC
ST st = new ST(); for (File f: filesystem) { In in = new In(f); String[] words = in.readAll().split("\\s+"); for (int i = 0; i < words.length; i++) { String s = words[i]; if (!st.contains(s)) st.put(s, new SET()); SET files = st.get(s); files.add(f); } }

build index

SET files = st.get(s); for (File f: files) ...

process lookup request

43

Searching challenge: Problem: Index for a book Assumptions: book has 100,000+ words

Which searching method to use? 1) hashing implementation of SET 2) hashing implementation of ST 3) red-black-tree implementation of ST 4) red-black-tree implementation of SET 5) doesn’t matter much

44

Index for a book

public class Index { public static void main(String[] args) { String[] words = StdIn.readAll().split("\\s+"); ST st; st = new ST(); for (int i = 0; i < words.length; i++) { String s = words[i]; if (!st.contains(s)) st.put(s, new SET()); SET pages = st.get(s); pages.add(page(i)); } for (String s : st) StdOut.println(s + ": " + st.get(s)); } }

read book and create ST

process all words

print index!

Requires ordered iterators (not hashing)
45

Hashing in the wild: Java implementations Java has built-in libraries for hash tables. • java.util.HashMap = separate chaining implementation. • java.util.IdentityHashMap = linear probing implementation. import java.util.HashMap; public class HashMapDemo { public static void main(String[] args) { HashMap st = new HashMap (); st.put("www.cs.princeton.edu", "128.112.136.11"); "128.112.128.15"); st.put("www.princeton.edu", StdOut.println(st.get("www.cs.princeton.edu")); } }

Null value policy. Java HashMap allows null values. Our implementation forbids null values.

• •

46

Using HashMap Implementation of our API with java.util.HashMap. import java.util.HashMap; import java.util.Iterator; public class ST implements Iterable { private HashMap st = new HashMap(); public void put(Key key, Value val) { if (val == null) st.remove(key); else st.put(key, val); } public Value get(Key key) { public Value remove(Key key) { public boolean contains(Key key) { public int size() contains(Key key) { public Iterator iterator() { }

return return return return return

st.get(key); st.remove(key); st.contains(key); st.size(); st.keySet().iterator();

} } } } }

47

Hashing in the wild: algorithmic complexity attacks Is the random hash map assumption important in practice? Obvious situations: aircraft control, nuclear reactor, pacemaker. Surprising situations: denial-of-service attacks.

• •

malicious adversary learns your ad hoc hash function (e.g., by reading Java API) and causes a big pile-up in single address that grinds performance to a halt

Real-world exploits. [Crosby-Wallach 2003] Bro server: send carefully chosen packets to DOS the server, using less bandwidth than a dial-up modem Perl 5.8.0: insert carefully chosen strings into associative array. Linux 2.4.20 kernel: save files with carefully chosen names.

• • •

Reference:

http://www.cs.rice.edu/~scrosby/hash

48

Algorithmic complexity attack on the Java Library Goal. Find strings with the same hash code. Solution. The base-31 hash code is part of Java's string API.
Key
Aa BB

hashCode()
2112 2112

Key
AaAaAaAa AaAaAaBB AaAaBBAa AaAaBBBB AaBBAaAa AaBBAaBB AaBBBBAa AaBBBBBB BBAaAaAa BBAaAaBB BBAaBBAa BBAaBBBB

hashCode()
-540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984 -540425984
49

2N strings of length 2N that hash to same value!

Does your hash function produce random values for your key type??

BBBBAaAa BBBBAaBB BBBBBBAa BBBBBBBB

One-Way Hash Functions One-way hash function. Hard to find a key that will hash to a desired value, or to find two keys that hash to same value. Ex. MD4, MD5, SHA-0, SHA-1, SHA-2, WHIRLPOOL, RIPEMD-160. insecure String password = args[0]; MessageDigest sha1 = MessageDigest.getInstance("SHA1"); byte[] bytes = sha1.digest(password); // prints bytes as hex string

Applications. Digital fingerprint, message digest, storing passwords. Too expensive for use in ST implementations (use balanced trees)
50

Similar Documents

Free Essay

Consistent Hashing

...1. Consistent Hashing Consider the following two scenarios. Describe in each case why consistent hashing is likely to perform better than hashing. Scenario 1: There is a fixed set of cache servers implementing consistent hashing and a population of clients who have incomplete views of the system i.e. each client only knows about a fraction of the servers Scenario 2: There is a set of cache servers that change i.e. nodes come and go. Answer 1: Hashing: Hashing is an easy to implement and quick to evaluate algorithm. Let us consider that the fixed set of cache servers are ‘n’, that is the number of nodes. Let us number the computers 0, 1, 2,…,n-1. According to the hashing algorithm, the key value pair (k, v) will be stored on the cache ‘hash(k) mod n’ where hash() is any function that converts the arbitrary string k to a non-negative integer. Keys are distributed evenly in a cluster for any reasonable number of keys, if the hash function being used in hash(k) mod n is a good hash function. Consistent Hashing: Consistent hashing is a cleverer algorithm compared to hashing. Here, the output range of the hash function is treated as a ring or fixed circular space. The largest hash value wraps around to the smallest hash value to form the ring. Each node in the system is assigned a random value which represents its ‘position’ on the ring. Each data item identified by a key k is assigned to a node by hashing data item’s key to yield a position on the ring. We walk clockwise...

Words: 1166 - Pages: 5

Free Essay

Algorithm

...1. Illustrate the operation of Radix_sort on the following list of English words: cow, dog, seq, rug, row, mob, box tab, bar ear, tar, dig, big, tea, now, fox. ANSWER: It is a sorting algorithm that is used to sort numbers. We sort numbers from least significant digit to most significant digit. In the following array of words, three is the maximum number of digits a word has, hence the number of passes will be three. In pass 1, sort the words alphabetically using first letter from the right. For eg, tea has “a” as the last letter, hence it comes first, similarly mob which has “b” as the last letter comes second. In this way the remaining words are sorted. In pass 2, sort the words alphabetically using second letter from the right. For eg, tab has “a” as its middle letter which comes first, then comes bar and so on. In pass 3, sort the words alphabetically using third letter from the right. For eg, bar has “b” as its first letter from left and since no word starts with “a”, bar will appear first. Similarly, big, box, cow and so on. UNSORTED ARRAY | PASS 1 | PASS 2 | PASS 3(SORTED ARRAY) | cow | tea | tab | bar | dog | mob | bar | big | seq | tab | ear | box | rug | rug | tar | cow | row | dog | tea | dig | mob | dig | seq | dog | box | big | dig | ear | tab | seq | big | fox | bar | bar | mob | mob | ear | ear | dog | now | tar | tar | cow | row | dig | cow | row | rug | ...

Words: 1470 - Pages: 6

Premium Essay

Assessing and Securing Systems on a Wan and Applying Encryption and Hashing Algorithms for Secure Communications

...Unit 1 Individual Project Danielle Hunker Ethical Hacking Colorado Technical University Online CSS280 February 22, 2016 Assessment Worksheet Assessing and Securing Systems on a Wide Area Network (WAN) Course Name and Number: Ethical Hacking CSS280 Student Name: Danielle Hunker Instructor Name: Jimmy Irwin Lab Due Date: February 22, 2016 Overview In this lab, a systems administrator for the securelabsondemand.com network has reported odd behavior on two servers that support legacy applications you first conducted internal penetration tests (also called a vulnerability scan) on each system and then helped secure those systems by configuring firewalls and removing vulnerable open ports. Lab Assessment Questions & Answers 1. What is the first Nmap command you ran in this lab? Explain the switches used. Nmap command: nmap –O –v 10.20.100.50 -O was the switch used to detect the operating system 10.20.100.50 -v was the switch used to show the detail of 10.20.100.50 2. What are the open ports when scanning 192.168.3.25 and their service names? * 80 HTTP services * 135 Microsoft EPMAP (End Point Mapper) * 139 NetBios session service * 445 Microsoft DS, SMB file sharing and CIFS (common internet file sharing) * 3389 RDP (Remote Desktop Protocol) * 5357 WSDAPI web services for devices * 49152 uo to 49157 DCOM or ephemeral ports 3. What is the command line syntax for running an SMB vulnerability scan...

Words: 832 - Pages: 4

Premium Essay

Lab 1

...Assessment Worksheet 111 LAB #7 – ASSESSMENT WORKSHEET Relate Windows Encryption and Hashing to Confidentiality and Integrity Course Name and Number: Student Name: Instructor Name: Lab Due Date: Overview This lab demonstrated how hashing tools can be used to ensure message and file transfer integrity and how encryption can be used to maximize confidentiality. Common hashing and encryption tools, including MD5, SHA1, and GnuPG, were used. You used GnuPG to generate both a public and private key and a secret key for encryption only. Lab Assessment Questions & Answers 1. If you and another person want to encrypt messages, should you provide that person with your public 7 Relate Windows Encryption and Hashing to Confidentiality and Integrity key, private key, or both? You should both provide each other with your public keys. 2. What does GPG allow you to do once it is installed? GPG allows you to encrypt and decrypt data and generate public and private keys. 3. Name two different types of encryption supported by GPG for your key. GPG supports symmetric ciphers DES and Blowfish as well as asymmetric ciphers ELGamal and RSA. 112 LAB #7 | Relate Windows Encryption and Hashing to Confidentiality and Integrity 4. What happens when you sign and trust a new key to your keychain? A new private and public key is created with a fingerprint for non repudiation. 5. If a user sends you his/her public key, will he/she be able to decrypt your encrypted...

Words: 472 - Pages: 2

Free Essay

Accounting

...24. Explain how a hashing structure works and why it is quicker than using an index. Give an example. If it so much faster, why isn't it used exclusively? ANS: Hashing is the transformation of a string of character s into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms. As a simple example of the using of hashing in databases, a group of people could be arranged in a database like this: Abernathy, Sara Epperdingle, Roscoe Moore, Wilfred Smith, David (and many more sorted into alphabetical order) Each of these names would be the key in the database for that person's data. A database search mechanism would first have to start looking character-by-character across the name for matches until it found the match (or ruled the other entries out). But if each of the names were hashed, it might be possible (depending on the number of names in the database) to generate a unique four-digit key for each name. For example: 7864 Abernathy, Sara 9802 Epperdingle, Roscoe 1990 Moore, Wilfred 8822 Smith, David (and so forth) A search for any name would first consist of computing the hash value (using the same hash function used to store the item) and then comparing for a match using that value. It would, in general, be much faster to find...

Words: 858 - Pages: 4

Premium Essay

Lab 7 Fundementals of Cyber Security

...Assessment Worksheet 111 LAB #7 – ASSESSMENT WORKSHEET Relate Windows Encryption and Hashing to Confidentiality and Integrity Course Name and Number: CSIA301 Overview This lab demonstrated how hashing tools can be used to ensure message and file transfer integrity and how encryption can be used to maximize confidentiality. Common hashing and encryption tools, including MD5, SHA1, and GnuPG, were used. You used GnuPG to generate both a public and private key and a secret key for encryption only. Lab Assessment Questions & Answers 1. If you and another person want to encrypt messages, should you provide that person with your public 7 Relate Windows Encryption and Hashing to Confidentiality and Integrity key, private key, or both? In theory you could, but I you are taking the time out to make in the encrypted messages I'm assuming you wouldn't want others to know, but I think you have to provide the person with both you need both to access the messages. 2. What does GPG allow you to do once it is installed? GPG is specifically a command line tool that enables you to encrypt and sign your data and communication and includes a key management system as well as access modules for all kind of public key directories. 3. Name two different types of encryption supported by GPG for your key. 112 LAB #7 | Relate Windows Encryption and Hashing to Confidentiality and Integrity 4. What happens when you sign and trust a new key to your keychain? ...

Words: 442 - Pages: 2

Premium Essay

Se 571 Principles of Information Security and Privacy Midterm Exam

...nd-privacy-midterm-exam/ SE 571 Principles of Information Security and privacy Midterm Exam 1. (TCO A) What are the three goals of security in computing? For each goal, list two controls that can be implemented to help achieve that goal. 2. (TCO A) List and define five desirable qualities in a process designed to evaluate the trustworthiness of an operating system 3. (TCO B) Suppose you have a high capacity network connection coming into your home, and you also have a wireless network access point. Also suppose you do not use the full capacity of your network connection. List three reasons you might still want to prevent an outsider obtaining free network access by intruding into your wireless network 4. (TCO C) Explain how a hashing algorithm works and how it can be used to provide authentication and data integrity 5. (TCO B) Which of the following is a correct statement? SE 571 Principles of Information Security and privacy Midterm Exam Follow Link Below To Get Tutorial https://homeworklance.com/downloads/se-571-principles-of-information-security-and-privacy-midterm-exam/ SE 571 Principles of Information Security and privacy Midterm Exam 1. (TCO A) What are the three goals of security in computing? For each goal, list two controls that can be implemented to help achieve that goal. 2. (TCO A) List and define five desirable qualities in a process designed to evaluate the trustworthiness of an operating system 3. (TCO B) Suppose you have a high capacity...

Words: 3561 - Pages: 15

Premium Essay

Nt1330 Unit 3

...Hashing values over single Encryption of a value When it comes to the common username and password verification, the most widely used mechanism for encryption is public and private key. While this level of encryption is good enough for protecting a password it does have a few downsides. 1. The Private key has to be kept confidential at all times if leaked all information is now accessible from any source. 2. A common username and password are contained in the payload of a packet that is encrypted. 3. Once the packet is decrypted, the server will store the credentials or compare them to previous credentials. 4. If a digital certificate is offered, is this a valid certificate or has it been tampered with in any way? With these four downsides identified it could be time to adopt what has been learned by FIDO. The main characteristics of FIDO are that your personal information is never exposed to a server. This is where FIDO has the edge over common login credentials, everyone is kept anonymous. The next stage is to develop a hybrid approach where the user has control of the information that is going to be used for login credentials. This could be done by saving the user’s first name, second name, age, address, country and email into a secure chip that can only be accessed using...

Words: 1229 - Pages: 5

Premium Essay

Nt1330 Unit 6 Assignment

...week’s lab dealt with the application of encryption and hashing algorithms to a test file on the Linux operating system. I chose this lab over the other lab option because I find this topic of particular interest and something that I can apply in my day to day duties at my current employer. I also enjoy working with the Linux operating system because I have always found it to be more challenging than Windows which I deal with daily. I had no trouble launching the virtual environment and accessing the Linux virtual machine. The lab used a text editing program called gedit which I am also familiar with and is similar to the notepad application sound withing Windows. The gedit application is launched by using the terminal window within the operating system which is a command line tool....

Words: 536 - Pages: 3

Free Essay

Unit 2 Assignment 2: Vulnerability of a Cryptosystem

...me to make recommendations on how to remedy the situation. There is a few websites that I have been advised to read as they may assist in my decision making process. After reading further I have been asked a large number of questions. I am planning to read up so I know about the cryptosystem then go into answering the provided questions. When we think about MD5 hashing we have to consider the hash and its long history of collisions on the network. When we were doing the practice labs in class the other night we say a number of student using the MD5 hashing and getting the same hash out of different text documents. This is not a good sign that this is the best type of hashing algorithm to use. I would advised using the latest greatest out with a known history of being secure. Asking if the threat is significant is an easy question to answer. Any organizations documentation at some level needs to be protected so it is not used in the wrong way. Yes, of course the cryptosystem being vulnerable is something that needs to be addresses right away. Modifying the hardware and software to provide a more secure hashing algorithm is on the top of the list. I need to continue doing research to find the best solution available for a price point that is reasonable for our university. When we think about how easy a system is to exploit I don’t think of a system being exploited as easy. I think of it like the attacker has some need for what you have and will work to find a way into...

Words: 1643 - Pages: 7

Premium Essay

Ethical Hacking Lab 2

...Lab #2 – Assessment Worksheet Applying Encryption and Hashing Algorithms for Secure Communications Ethical Hacking Course Name and Number: _____________________________________________________ Student Name: ________________________________________________________________ Instructor Name: ______________________________________________________________ Lab Due Date: ________________________________________________________________ Overview In this lab, you applied common cryptographic techniques to ensure confidentiality, integrity, and authentication. You created an MD5sum and SHA1 hash on a simple text file on a Linux virtual machine and compared the hash values of the original files with those generated after the file had been modified. Next, you used GnuPG to generate an encryption key pair and encrypted a message. Finally, you used the key pairs to send secure messages between two user accounts on the virtual machine and verified the integrity of the received files. Lab Assessment Questions & Answers 1. Compare the hash values calculated for Example.txt that you documented during this lab. Explain in your own words why the hash values will change when the data is modified. The harsh value would change because of course there is a change in data of the file "Example.txt" so if the file should be transfer from the source to the destination with different hash string, for example the source hash string is 3ddhyhhhs47878, and when it reach the destination the...

Words: 662 - Pages: 3

Free Essay

Csec 610 Lab 1

...1. Explain the two different types of attacks that can be performed in Cain and Abel to crack user account passwords. Which do you think is the most effective and why? Answer: The two different types of attacks that can be performed in Cain and Abel are Brute Force attack and a Dictionary attack. The Brute Force attack is a method of breaking a cipher in a word through every possible key. The extent of breaking the password depends greatly on the length of the password. Within the program Cain and Abel, Brute Force will look at all possible combinations of characters within the password to try and recover or crack the password than the dictionary attack. Brute Force cracking can take forever to find the password but it will eventually lead to a password being cracked (Ducklin, 2013). Dictionary attacks, also known as wordlist attacks, is a simple and more efficient way to crack passwords. Many people tend to use words listed in the dictionary for passwords. The program uses multiple dictionaries as well as technical and foreign language dictionaries as support to enable the cipher to be cracked. The downside to this type of password cracking is that if a word contains complex symbols, uppercase, lowercase, and numbers that are not in the dictionary, then the dictionary attack can be beat (Gibson, 2011). With working with Cain and Abel in class, I felt that the dictionary attack was more efficient in finding the password due to real life scenarios where individuals set passwords...

Words: 1190 - Pages: 5

Free Essay

Ethical Hacking Chapter 3

...* Question 1 5 out of 5 points | | | Symmetric encryption faces difficulty due to what issue? | | | | | Selected Answer: | Key exchange | Answers: | Security | | Key exchange | | Bit length | | Software expense | | | | | * Question 2 5 out of 5 points | | | Digital signatures are used for all but which one of the following purposes? | | | | | Selected Answer: | Availability | Answers: | Authentication | | Nonrepudiation | | Integrity | | Availability | | | | | * Question 3 5 out of 5 points | | | Which of the following is most likely to be broken using a birthday attack? | | | | | Selected Answer: | MD5 | Answers: | DES | | RSA | | PKI | | MD5 | | | | | * Question 4 0 out of 5 points | | | Attacks against ciphers that feed information into a system and observe output are: | | | | | Selected Answer: | Known plaintext | Answers: | Ciphertext only | | Known plaintext | | Chosen plaintext | | Chosen ciphertext | | | | | * Question 5 5 out of 5 points | | | Asymmetric encryption does not require ___________. | | | | | Selected Answer: | secure initial key exchange | Answers: | key exchange | | secret keys | | multiple keys | | secure initial key exchange | | | | | * Question 6 5 out of 5 points | | | Symmetric encryption requires which of the following? | | | | | Selected Answer: | Both the parties...

Words: 676 - Pages: 3

Free Essay

Aic Triad

...Introduction The AIC triad is one of the many approaches to secure networks in today's complex computing environments. What makes the AIC triad different from any other theory is that when it is used properly it forms the cornerstone of every aspect of computing and network security. Most IT security practices are focused on protecting systems from loss of confidentiality, loss of integrity, and loss of availability; these three together are referred to as the security triad, the CIA triad, and the AIC triad. Regardless of the order in which the letters are organized in the acronym, they refer to the same principles. Confidentiality, Integrity and Availability are the cornerstones to which a network is comprised. Each with its own independent yet very important role in networking. Confidentiality refers to access control and ensures that it is restricted to the individuals who have been previously authorized to access a network or one of its resources. Integrity addresses the validity of data and any networked object. It ensures that the unauthorized changes to the data or object is noticed so that appropriate actions can be taken. Availability’s meaning is essentially as simple as the word itself. It refers to the principle that addresses the need for an authorized user to have access to a resource as quickly as possible based off the networks functioning abilities. Availability In an information technology (IT) environment availability is one of the most important...

Words: 1508 - Pages: 7

Free Essay

Correlation Based Dynamic Clustering and Hash Based Retrieval for Large Datasets

...Correlation Based Dynamic Clustering and Hash Based Retrieval for Large Datasets ABSTRACT Automated information retrieval systems are used to reduce the overload of document retrieval. There is a need to provide an efficient method for storage and retrieval .This project proposes the use of dynamic clustering mechanism for organizing and storing the dataset according to concept based clustering. Also hashing technique will be used to retrieve the data from the dataset based on the association rules .Related documents are grouped into same cluster by k-means clustering algorithm. From each cluster important sentences are extracted by concept matching and also based on sentence feature score. Experiments are carried to analyze the performance of the proposed work with the existing techniques considering scientific articles and news tracks as data set .From the analysis it is inferred that our proposed technique gives better enhancement for the documents related to scientific terms. Keywords Document clustering, concept extraction, K-means algorithm, hash-based indexing, performance evaluation 1. INTRODUCTION Now-a-days online submission of documents has increased widely, which means large amount of documents are accumulated for a particular domain dynamically. Information retrieval [1] is the process of searching information within the documents. An information retrieval process begins when a user enters a query; queries are formal statements of...

Words: 2233 - Pages: 9