Monday, March 5, 2018

Threat Analysis and Risk Assessment Steps


  1. Vulnerability: A flaw or weakness in a system's design, implementation, or operation and management that could be exploited to violate the system's security policy
  2. Threat: A potential for violation of security, which exists when there is a circumstance, capability, action, or event that could breach security and cause harm; a threat is a possible danger that might exploit a vulnerability
  3. Attack: An assault on system security that derives from an intelligent threat to evade security services and violate the security policy of a system

Threat Analysis Process:

1. Threat modelling (design centric)
2. Exploit the vulnerabilities related to threats using an attacker model

Example: Electronic lock in a hotel

  1. Define sub-Components: lock can be opened by Guest card, Master key
  2. Define security Objectives
    1. Allow guest access to the room
    2. Allow service personnel access
    3. Prevent unauthorized access
    4. Every entry should be logged
  3. Define Work-flow 
    1. Guest is given card and uses it to open or lock the door
  4. Trust Boundaries 
    1. Central encoder in a secure location
    2. Card
    3. Interface between lock and lock encoder
  5. Security Controls:
    1. Central encoder is accessible to hotel staff
    2. Lock encoder is physically hard to modify
    3. Card data is encrypted
  6. Attacker Targets Assets
    1. Encoding master keys
    2. Card itself
    3. lock programmer
    4. Lock Encoder
  7. Threats
    1.   Integrity:
      1.   someone steals guest card
      2.   Sneaks into room when door is open
      3.   changing the audit log by physical access
      4.   Break lock
    2.   Confidentiality:
      1.   Exposure of audit log by physically accessing the log
    3.   Availability:
      1.   Central encoder is out of order, no way to unlock
      2.   Power to the lock is lost and no way to open
  8. Vulnerabilities
    1. Accessing encryption keys in lock programmer
    2. Crypto algorithm/key-size weakness
    3. Guest card easy to copy
    4. Lock Physical strength weakness

Threat Quantification:

Threat     | Threat consequence  | Probability of threat | Damage of threat | Attacker level
Steal card | Unauthorized access | Medium                | Medium           | Loner
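The quantification table can be turned into a crude numeric score; a sketch in Python (the level-to-number mapping below is my own assumption, not part of the notes):

```python
# Map the qualitative levels from the table to numbers (assumed mapping)
LEVELS = {"Low": 1, "Medium": 2, "High": 3}

def risk_score(probability, damage):
    """Crude risk score: probability of the threat times damage of the threat."""
    return LEVELS[probability] * LEVELS[damage]

# The one row from the table above
threats = [("Steal card", "Unauthorized access", "Medium", "Medium", "Loner")]
for name, _consequence, prob, dmg, _attacker in threats:
    print(name, risk_score(prob, dmg))   # Steal card 4
```

Real methodologies weight the attacker level as well; this only multiplies the two table columns.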

Risk Assessment Steps:

1. Define Scope

Identify what is covered and what is not covered
Agreement with senior management

2. Data Collection

Understand the policies and procedures currently in place. Interview key personnel, check documentation, and gather system and service information:
  • Services running
  • Network applications running
  • Physical location of systems
  • Access control permissions
  • Firewall testing

Gather information about specific systems and services:
  •   Security Focus - searchable databases of vulnerabilities and relevant news groups
  •   ( ) - information on current threat activities
  •   Packet Storm
  •   InfoSysSec
  •   SANS

3. Analysis of Policies and Procedures

Review and analyze existing policies and procedures and gauge the compliance level within the organization

  •   ISO 17799
  •   BS 7799
  •   Common Criteria - ISO/IEC 15408

4. Vulnerability Analysis

Test the systems for current exposure and safeguards in terms of confidentiality, integrity, and availability. Various tools can be used to identify vulnerabilities in the systems:
  •   Whisker
  •   Portscan
  •   IBM AppScan
  •   Parfait - static analysis tool
  •   Findbugs

Tests include penetration testing and zero-knowledge testing performed by external parties.
Provide a rating for the threats by severity and exposure:

  •   Severity - Minor, Moderate, High
  •   Exposure - Minor, Moderate, High

5. Threat Analysis

Threat agents are divided into human (hackers, theft, current or former employees, service personnel) and non-human (floods, lightning, plumbing leaks, viruses)

6. Analysis of Acceptable Risks

Assess whether existing policies, procedures, and protection items are adequate. Document findings and inform senior management.



Thursday, March 1, 2018

Apache Spark

Apache Spark
Data analysis using Scala
Spark-scala CLI, input CSV, simple operations

Scala language

Scalable and efficient language
Functions are first-class objects
Static methods are replaced with singletons, multiple inheritance with traits
Scala is good at pattern matching and comprehensions

Machine Learning, Python, Pandas

Python and DataScience:

Object oriented language
Dynamically typed language
Easy to learn
Suitable for Data Science, Scientific analysis
Pandas - Python Data Analysis Library - allows reading and manipulating data efficiently
Another alternative for data science analysis is R
R has a lot of support for data manipulation


Pandas is a library for data manipulation and analysis.
  • Data science or data analytics is the process of analyzing large sets of data points to get answers to questions related to that data set
  • The library provides data structures and operations for manipulating numerical tables and time series
  • If the dataset size is billions of records, then traditional tools such as Excel will not work
  • PyCharm community edition
  • Jupyter notebook to work with the Pandas library
  • df['Temparature'].max() - find max temperature
  • df['EST'][df['Events']=='Rain'] - days it rained
  • df['WindsppedMPH'].mean() - mean windspeed
  • Data munging or data wrangling - the process of cleaning messy data
  • Pandas comes with the Anaconda distribution, or pip install
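The snippets above can be tried on a tiny stand-in DataFrame (the values here are made up; the column names, including their original spellings, match the snippets so they run verbatim):

```python
import pandas as pd

# Tiny stand-in for the weather CSV used in the notes; values are invented,
# column names copied from the snippets above
df = pd.DataFrame({
    "EST": ["1/1/2016", "1/2/2016", "1/3/2016"],
    "Temparature": [38, 36, 40],
    "WindsppedMPH": [8, 7, 12],
    "Events": ["Rain", "", "Snow"],
})

print(df["Temparature"].max())             # max temperature: 40
print(df["EST"][df["Events"] == "Rain"])   # days it rained
print(df["WindsppedMPH"].mean())           # mean windspeed: 9.0
```

The boolean-mask pattern `df['EST'][df['Events']=='Rain']` selects rows of one column filtered by a condition on another column.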

Python Pandas Library -


  • Mean - the sum of all numbers divided by the total number of values
  • Median - the middle point of a number set
  • Standard deviation - a measure used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
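A quick check of these three definitions with Python's standard library (the data set here is made up):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)        # sum 40 / count 8 = 5.0
median = statistics.median(data)    # middle of the sorted set: 4.5
stdev = statistics.pstdev(data)     # population standard deviation: 2.0
```

Note that `pstdev` treats the list as the whole population; `statistics.stdev` would divide by n-1 to estimate from a sample.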

Jupyter Notebook:

Install Anaconda and it includes Jupyter

start Jupyter notebook:

>jupyter notebook - this starts the notebook server, e.g.: http://localhost:8888/notebooks/Pandas-First-Jupyter-notebook.ipynb#

>import pandas as pd
>df = pd.read_csv("/home/bhanu/github/MachineLearning-Pandas/Sample.csv")
>df['temperaturemin'].mean()  -- prints the mean of the column data

"Fitting a linear regression" - Linear regression is a powerful and commonly used machine learning algorithm. It predicts the target variable using linear combinations of the predictor variables. Let's say we have two values, 3 and 4. A linear combination would be 3 * 0.5 + 4 * 0.5. A linear combination involves multiplying each number by a constant and adding the results.

Linear regression only works well when the predictor variables and the target variable are linearly correlated. As we saw earlier, a few of the predictors are correlated with the target, so linear regression should work well for us.
Example data:

date        temperaturemin  temperaturemax  precipitation  snowfall
2007-01-13  48              69.1            0              0
2007-01-19  34              54              0              0
2007-01-21  28              35.1            0.8            0
2007-01-25  30.9            46.9            0              0
2007-01-27  32              64              0              0
2007-02-05  19.9            39.9            0              0
2007-02-08  27              48              0              0

Unsupervised learning algorithm:

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells
- unsupervised learning algorithm
The algorithm works as follows:

First we initialize k points, called means, randomly.
We categorize each item to its closest mean and we update the mean’s coordinates, which are the averages of the items categorized in that mean so far.
We repeat the process for a given number of iterations and at the end, we have our clusters.
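The three steps above can be sketched in plain Python (illustrative only; the data points and seed are made up):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: initialize k means randomly, then assign/update repeatedly."""
    rng = random.Random(seed)
    means = rng.sample(points, k)            # 1. initialize k means randomly
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # 2. assign each item to its closest mean
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, means[i])))
            clusters[j].append(p)
        for j, c in enumerate(clusters):     # 3. update each mean to its cluster's average
            if c:
                means[j] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return means, clusters

# Two well-separated groups of 2-D points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, clusters = kmeans(points, k=2)
```

With a bad random initialization k-means can converge to a poor local optimum, which is why production implementations typically restart from several random initializations and keep the best result.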

Supervised learning algorithm:

Linear regression is a statistical method of finding the relationship between independent and dependent variables.
Linear regression uses the slope-intercept form of a line. The equation of a line in slope-intercept form is given by:

y = m*x + b

'x' is our independent variable, and 'y' is the predicted value.
'm' is the slope. It is a measure of how steep our line is.
'b' is the intercept. It tells where the line crosses the y-axis.
The idea of linear regression is to find the combination of slope (m) and intercept (b) that minimizes the SSE (sum of squared errors).
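Minimizing the SSE over m and b has a closed-form solution for a single predictor; a minimal sketch (the sample data is made up, chosen so the fit is exact):

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b minimizing SSE for y = m*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - m * sx) / n                          # intercept
    return m, b

# Perfectly linear data recovers the exact line y = 2x + 1
m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

On noisy data such as the weather table, the same formulas return the line with the smallest SSE rather than an exact fit.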




Wednesday, February 14, 2018

TLS session hash and extended master secret extension support - RFC 7627

Added TLS session hash and extended master secret extension support
-new properties: jdk.tls.useExtendedMasterSecret, jdk.tls.allowLegacyResumption
-Browser, automated, interoperability tests, Upgrade and downgrade tests
-short-term solution: server certificate change is restricted if endpoint identification is not enabled
-Session renegotiation - can happen at any time, e.g. due to a different set of encryption keys or hash algorithms being used
-Session resumption - means previous session info is reused for the communication
-A malicious user, once a session is compromised, can perform DoS attacks; the server requires about 10 times more processing power
-Master secret is 48-byte length, Pre-master secret size depends on alg RSA (48-byte)
-pre_master_secret is the value generated by the client using a CSPRNG and some other variables
As described in [TRIPLE-HS], in both the RSA and DHE key exchanges, an active attacker can synchronize two TLS sessions so that they share the same “master_secret”. For an RSA key exchange where the client is unauthenticated, this is achieved as follows. Suppose a client C connects to a server A. C does not realize that A is malicious and that A connects in the background to an honest server S and completes both handshakes. For simplicity, assume that C and S only use RSA ciphersuites.
1. C sends a “ClientHello” to A, and A forwards it to S.
2. S sends a “ServerHello” to A, and A forwards it to C.
3. S sends a “Certificate”, containing its certificate chain, to A. A replaces it with its own certificate chain and sends it to C.
4. S sends a “ServerHelloDone” to A, and A forwards it to C.
5. C sends a “ClientKeyExchange” to A, containing the “pre_master_secret”, encrypted with A’s public key. A decrypts the “pre_master_secret”, re-encrypts it with S’s public key, and sends it on to S.
6. C sends a “Finished” to A. A computes a “Finished” for its connection with S and sends it to S.
7. S sends a “Finished” to A. A computes a “Finished” for its connection with C and sends it to C.
At this point, both connections (between C and A, and between A and S) have new sessions that share the same “pre_master_secret”, “ClientHello.random”, “ServerHello.random”, as well as other session parameters, including the session identifier and, optionally, the session ticket. Hence, the “master_secret” value will be equal for the two sessions and will be associated both at C and S with the same session ID, even though the server identities on the two connections are different. Recall that C only sees A’s certificate and is unaware of A’s connection with S. Moreover, the record keys on the two connections will also be the same.

This attack is possible because Master Secret only depends on ClientHello.random and ServerHello.random, the proposed fix was to include the certificate and other session dependent parameters in the formula to generate the master secret.
We define a session hash after completion of the handshake as follows:

session_hash = Hash(handshake_messages)

The extended master secret extension (extension type 0x0017) signals both the client and server that the master secret is generated with this formula:

master_secret = PRF(pre_master_secret, "extended master secret", session_hash)[0..47]

Solution for proxies to work: disable the extended master secret extension
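The RFC 7627 derivation can be sketched with Python's standard library, assuming an HMAC-SHA256-based PRF as in TLS 1.2 (RFC 5246); this is illustrative only, not a real TLS implementation:

```python
import hmac
import hashlib

def p_sha256(secret, seed, length):
    """P_hash from RFC 5246: expand secret+seed to `length` bytes via HMAC-SHA256."""
    out, a = b"", seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()   # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]

def prf(secret, label, seed, length=48):
    return p_sha256(secret, label + seed, length)

def master_secret(pre_master, client_random, server_random):
    """Classic derivation (RFC 5246): depends only on the two random values."""
    return prf(pre_master, b"master secret", client_random + server_random)

def extended_master_secret(pre_master, handshake_messages):
    """RFC 7627 derivation: binds the full handshake transcript via session_hash."""
    session_hash = hashlib.sha256(handshake_messages).digest()
    return prf(pre_master, b"extended master secret", session_hash)
```

With the classic derivation, the two synchronized connections in the triple-handshake attack share a master secret because it depends only on the randoms; with the extended derivation they cannot, because their transcripts contain different certificates and therefore different session hashes.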


Wednesday, February 7, 2018

Terracotta Ehcache vs Hazelcast

Terracotta Ehcache vs Hazelcast:
We tried both of them for one of the largest online classifieds and e-commerce platforms. We started with Ehcache/Terracotta (server array) because it is well known, backed by Terracotta, and has bigger community support than Hazelcast.
When we got it into a production environment (distributed, beyond a one-node cluster) things changed: our backend architecture became really expensive, so we decided to give Hazelcast a chance.

Hazelcast is dead simple; it does what it says and performs really well without any configuration overhead.

Lucene vs Solr

Lucene vs Solr:
A simple way to conceptualize the relationship between Solr and Lucene is that of a car and its engine. You can't drive an engine, but you can drive a car. Similarly, Lucene is a programmatic library which you can't use as-is, whereas Solr is a complete application which you can use out-of-box.

What is Solr?
Apache Solr is a web application built around Lucene with all kinds of goodies.

It adds functionality like

Hit highlighting
Faceted Search and Filtering
Geospatial Search
Fast Incremental Updates and Index Replication
Web administration interface etc
Unlike Lucene, Solr is a web application (WAR) which can be deployed in any servlet container, e.g. Jetty, Tomcat, Resin, etc.

Lucene: Lucene is an inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index for each word. Since the index is an exact string-match, unordered, it can be extremely fast. Hypothetically, an SQL unordered index on a varchar field could be just as fast, and in fact I think you'll find the big databases can do a simple string-equality query very quickly in that case.

Let's look at a sample corpus of five documents:
My sister is coming for the holidays.
The holidays are a chance for family meeting.
Who did your sister meet?
It takes an hour to make fudge.
My sister makes awesome fudge.

What does Lucene do? Lucene is a full text search library.
Search has two principal stages: indexing and "retrieval."

During indexing, each document is broken into words, and the list of documents containing each word is stored in a list called the "postings list".
The posting list for the word "My" is:
My --> 1,5
And the posting list for the word "fudge" is:
fudge --> 4,5
The index consists of all the posting lists for the words in the corpus.
Indexing must be done before retrieval, and we can only retrieve documents that were indexed.

Retrieval is the process starting with a query and ending with a ranked list of documents. Say the query is [my fudge]. (The brackets denote the borders of the query.) In order to find matches for the query, we break it into the individual words and go to the posting lists. The full list of documents containing the keywords is [1, 4, 5]. Because document 5 contains both words and documents 1 and 4 contain just a single word from the query, a possible ranking is: 5, 1, 4 (document 5 appears first, then document 1, then document 4).
In general, indexing is a batch, preprocessing stage, and retrieval is a quick online stage, but there are exceptions.
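The postings-list example above can be reproduced with a toy inverted index in Python (illustrative only; nothing like Lucene's real data structures):

```python
import re
from collections import defaultdict

# The five-document corpus from the text
docs = {
    1: "My sister is coming for the holidays.",
    2: "The holidays are a chance for family meeting.",
    3: "Who did your sister meet?",
    4: "It takes an hour to make fudge.",
    5: "My sister makes awesome fudge.",
}

# Indexing: build a postings list (word -> set of doc ids containing it)
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        index[word].add(doc_id)

# Retrieval: union the postings, rank by how many query words each doc contains
def search(query):
    scores = defaultdict(int)
    for word in re.findall(r"[a-z]+", query.lower()):
        for doc_id in index[word]:
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

print(search("my fudge"))   # [5, 1, 4]: doc 5 matches both words
```

Real engines rank with term weighting (e.g. TF-IDF or BM25) rather than a raw match count, but the index-then-retrieve structure is the same.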

This is the gist of Lucene. The rest of Lucene is (many important) specific bells and whistles for the indexing and retrieval processes.