Monday, March 5, 2018

Threat Analysis and Risk Assessment Steps


Terms:

  1. Vulnerability: A flaw or weakness in a system's design, implementation, or operation and management that could be exploited to violate the system's security policy
  2. Threat: A potential for violation of security, which exists when there is a circumstance, capability, action, or event that could breach security and cause harm. A threat is a possible danger that might exploit a vulnerability
  3. Attack: An assault on system security that derives from an intelligent threat to evade security services and violate the security policy of a system


Threat Analysis Process:

1. Threat Modelling (Design centric)
2. Exploit the vulnerabilities related to the threats using an attacker model

Example: Electronic lock in a hotel

  1. Define sub-Components: lock can be opened by Guest card, Master key
  2. Define security Objectives
    1. Allow guest access to the room
    2. Allow service personnel access
    3. Prevent unauthorized access
    4. Every entry should be logged
  3. Define Work-flow 
    1. Guest is given card and uses it to open or lock the door
  4. Trust Boundaries 
    1. Central encoder in a secure location
    2. Card
    3. Interface between lock and lock encoder
  5. Security Controls:
    1. Central encoder is accessible only to hotel staff
    2. Lock encoder is physically hard to modify
    3. Card data is encrypted
  6. Attacker Targets (Assets)
    1. Encoding master keys
    2. Card itself
    3. Lock programmer
    4. Lock encoder
  7. Threats
    1. Integrity:
      1. Someone steals a guest card
      2. Someone sneaks into the room while the door is open
      3. The audit log is changed through physical access
      4. The lock is broken
    2. Confidentiality:
      1. The audit log is exposed through physical access to the log
    3. Availability:
      1. The central encoder is out of order, so there is no way to unlock
      2. Power to the lock is lost, so there is no way to open it
  8. Vulnerabilities
    1. Accessing encryption keys in lock programmer
    2. Crypto algorithm/key-size weakness
    3. Guest card easy to copy
    4. Lock Physical strength weakness

Threat Quantification:


Threat | Threat consequence | Probability of threat | Damage of threat | Attacker level
Steal card | Unauthorized access | Medium | Medium | Loner
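
The qualitative ratings in the table can be turned into a rough ranking. The sketch below is only an illustration: the numeric scale (Low=1, Medium=2, High=3), the probability times damage score, and the ratings given to the second and third threats are assumptions, not values from the original assessment.

```python
from dataclasses import dataclass

# Assumed numeric scale for the qualitative ratings (not from the notes).
LEVELS = {"Low": 1, "Medium": 2, "High": 3}


@dataclass
class Threat:
    name: str
    consequence: str
    probability: str
    damage: str
    attacker_level: str

    def risk_score(self) -> int:
        # Simple product of the two ratings; other weightings are possible.
        return LEVELS[self.probability] * LEVELS[self.damage]


threats = [
    # Row taken from the table above.
    Threat("Steal card", "Unauthorized access", "Medium", "Medium", "Loner"),
    # Hypothetical ratings, added only to show the ranking.
    Threat("Break lock", "Unauthorized access", "Low", "High", "Loner"),
    Threat("Power to the lock is lost", "Room cannot be opened", "Low", "Medium", "None"),
]

# Highest score first.
ranked = sorted(threats, key=lambda t: t.risk_score(), reverse=True)
for t in ranked:
    print(f"{t.name}: {t.risk_score()}")
```

Sorting by score gives a first-pass priority list; the attacker level is carried along but not scored here.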


Risk Assessment Steps:

1. Define Scope

Identify what is covered and what is not covered
Agreement with senior management

2. Data Collection

Understand the policies and procedures currently in place. Interview key personnel and check documentation, system, and service information:
  • Services running
  • Network applications running
  • Physical location of systems
  • Access control permissions
  • Firewall testing


Gather information about specific systems and services:
  •   Security Focus (www.securityfocus.com) - searchable databases of vulnerabilities and relevant news groups.
  •   Incidents.org (www.incidents.org) - information on current threat activities.
  •   Packet Storm (packetstormsecurity.org)
  •   InfoSysSec (www.infosyssec.com)
  •   SANS (www.sans.org)


3. Analysis of Policies and Procedures

Review and analyze existing policies and procedures and gauge the compliance level within the organization.
Example:

  •   ISO 17799
  •   BS 7799
  •   Common Criteria - ISO/IEC 15408


4. Vulnerability Analysis

Test the systems for current exposure and safeguards in terms of confidentiality, integrity, and availability. Various tools can be used to identify vulnerabilities in the systems:
  •   Whisker
  •   Portscan
  •   IBM AppScan
  •   Parfait - static analysis tool
  •   Findbugs

Tests include penetration testing and zero-knowledge testing performed by external parties.
Rate the threats by severity and exposure:

  •   Severity - Minor, Moderate, High
  •   Exposure - Minor, Moderate, High


5. Threat Analysis

Threat agents are divided into human (hackers, thieves, current or former employees, service personnel) and non-human (floods, lightning, plumbing failures, viruses).


6. Analysis of Acceptable Risks

Assess whether existing policies, procedures, and protection items are adequate. Document the findings and inform senior management.

References: 

  1. https://downloads.cloudsecurityalliance.org/initiatives/top_threats/The_Notorious_Nine_Cloud_Computing_Top_Threats_in_2013.pdf
  2. https://www.sans.org/reading-room/whitepapers/auditing/overview-threat-risk-assessment-76

Thursday, March 1, 2018

Apache Spark

Data analysis using Scala
Spark-scala CLI, input CSV, simple operations
Examples: https://www.youtube.com/watch?v=HQTB3hlLD6E

Scala language

Scala:
Scalable and efficient language
Functions are first-class objects
Static methods are replaced with singleton objects, multiple inheritance with traits
Scala is good at pattern matching and comprehensions

Machine Learning, Python, Pandas

Python and DataScience:

Object oriented language
Dynamically typed language
Easy to learn
Suitable for Data Science, Scientific analysis
Pandas - Python Data Analysis Library - allows reading and manipulating data efficiently
Another option for data science analysis is R
R has a lot of support for data manipulation

Pandas

Pandas is a library for data manipulation and analysis.
-Data science or data analytics is the process of analyzing a large set of data points to answer questions about that data set
-The library provides data structures and operations for manipulating numerical tables and time series.
-If the dataset has billions of records, traditional tools such as Excel will not work
-PyCharm community edition
-Jupyter notebook to work with the Pandas library
-df['Temperature'].max() - find the maximum temperature
-df['EST'][df['Events']=='Rain'] - days on which it rained
-df['WindSpeedMPH'].mean() - mean wind speed
-Data munging or data wrangling - the process of cleaning messy data
-Pandas comes with the Anaconda distribution, or can be installed with pip install pandas
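
The three one-liners in the list can be tried on a tiny in-memory table. This is a sketch under stated assumptions: the column names follow the notes above (with spelling normalized), and the four rows are invented sample values, not the data set from the video.

```python
import pandas as pd

# Invented sample rows; only the column names come from the notes.
df = pd.DataFrame({
    "EST": ["1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"],
    "Temperature": [38, 36, 40, 25],
    "WindSpeedMPH": [4.0, 3.0, 5.0, 8.0],
    "Events": ["Sunny", "Rain", "Rain", "Snow"],
})

# Maximum temperature in the table.
max_temp = df["Temperature"].max()

# Dates on which it rained: boolean mask on Events, then pick the EST column.
rainy_days = df.loc[df["Events"] == "Rain", "EST"].tolist()

# Mean wind speed.
mean_wind = df["WindSpeedMPH"].mean()

print(max_temp, rainy_days, mean_wind)
```

Using `df.loc[mask, column]` selects the same values as `df['EST'][df['Events']=='Rain']` but does the row filter and column pick in one indexing step.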

Python Pandas Library - https://www.youtube.com/watch?v=F6kmIpWWEdU

Terms:

  • Mean - the sum of all numbers divided by the total number of values
  • Median - the middle value of a sorted set of numbers
  • Standard deviation - a measure used to quantify the amount of variation or dispersion of a set of data values. A low standard deviation indicates that the data points tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the data points are spread out over a wider range of values.
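
These three terms can be checked with Python's standard `statistics` module. As a small sketch, the values below are the temperaturemin column from the example data later in these notes; choosing the population form of the standard deviation (`pstdev`) over the sample form (`stdev`) is an assumption made for illustration.

```python
import statistics

# temperaturemin values from the example data in these notes.
temperature_min = [48, 34, 28, 30.9, 32, 19.9, 27]

mean_value = statistics.mean(temperature_min)      # sum / count
median_value = statistics.median(temperature_min)  # middle of the sorted values
spread = statistics.pstdev(temperature_min)        # population standard deviation

print(mean_value, median_value, spread)
```

With seven values the median is simply the fourth value after sorting; for an even count, `statistics.median` averages the two middle values.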

Jupyter Notebook:

Install Anaconda; Jupyter is included with it.

Start Jupyter notebook:

>jupyter notebook - this starts the notebook server, e.g. at http://localhost:8888/notebooks/Pandas-First-Jupyter-notebook.ipynb#

>import pandas as pd
>df = pd.read_csv("/home/bhanu/github/MachineLearning-Pandas/Sample.csv")
>df['temperaturemin'].mean()  -- prints the mean of the column

"Fitting a linear regression" - Linear regression is a powerful and commonly used machine learning algorithm. It predicts the target variable using linear combinations of the predictor variables. Let's say we have two values, 3 and 4. A linear combination would be 3 * 0.5 + 4 * 0.5. A linear combination involves multiplying each number by a constant and adding the results.

Linear regression only works well when the predictor variables and the target variable are linearly correlated. As we saw earlier, a few of the predictors are correlated with the target, so linear regression should work well for us.
Example data:

date temperaturemin temperaturemax precipitation snowfall
2007-01-13 48 69.1 0 0
2007-01-19 34 54 0 0
2007-01-21 28 35.1 0.8 0
2007-01-25 30.9 46.9 0 0
2007-01-27 32 64 0 0
2007-02-05 19.9 39.9 0 0
2007-02-08 27 48 0 0


Unsupervised learning algorithm:

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means is an unsupervised learning algorithm.
The algorithm works as follows:

First we initialize k points, called means, randomly.
We categorize each item to its closest mean and we update the mean’s coordinates, which are the averages of the items categorized in that mean so far.
We repeat the process for a given number of iterations and at the end, we have our clusters.
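
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: it uses the common batch variant (assign every point, then recompute each mean), it picks the initial means from the data points themselves, and the six 2-D points are invented.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, means):
    """Put each point into the cluster of its nearest mean."""
    clusters = [[] for _ in means]
    for p in points:
        nearest = min(range(len(means)), key=lambda i: squared_distance(p, means[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=10, seed=0):
    rng = random.Random(seed)
    means = rng.sample(points, k)  # random initial means, taken from the data
    for _ in range(iterations):
        clusters = assign(points, means)
        # New mean = coordinate-wise average; keep the old mean if a cluster is empty.
        means = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else means[i]
            for i, cluster in enumerate(clusters)
        ]
    return means, assign(points, means)


# Invented sample: two visibly separate groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
means, clusters = kmeans(points, k=2)
print(means)
print(clusters)
```

A fixed iteration count keeps the sketch short; library implementations usually also stop early once the assignments no longer change.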

Supervised learning algorithm:

Linear regression is a statistical method of finding the relationship between independent and dependent variables.
The Linear Regression uses Slope - Intercept form of a line. The equation of a line in slope intercept form is given by:

y=mx+b
'y' is our dependent variable.
'x' is our independent variable.
'm' is the slope. It measures how steep the line is.
'b' is the intercept. It tells where the line crosses the y-axis.
The idea of linear regression is to find the combination of slope (m) and intercept (b) that minimizes the SSE (sum of squared errors).
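
For a single independent variable, the slope and intercept that minimize the SSE have a closed-form solution. The sketch below applies it to the example data shown earlier; treating temperaturemin as x and temperaturemax as y is an arbitrary choice for illustration, and the helper names `fit_line` and `sse` are made up here.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


def sse(xs, ys, m, b):
    """Sum of squared errors of the line y = m*x + b on the data."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Columns from the example data in these notes.
temperature_min = [48, 34, 28, 30.9, 32, 19.9, 27]
temperature_max = [69.1, 54, 35.1, 46.9, 64, 39.9, 48]

m, b = fit_line(temperature_min, temperature_max)
print(f"slope={m:.3f} intercept={b:.3f} sse={sse(temperature_min, temperature_max, m, b):.3f}")
```

Nudging m or b away from the fitted values and calling `sse` again gives a larger error, which is a quick way to see what "minimizes the SSE" means.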

Github: https://github.com/gopularam/developer/tree/master/MachineLearning-Pandas

References:


  1. http://pandas.pydata.org
  2. https://www.dataquest.io/blog/machine-learning-python/ - the best one; includes examples of supervised and unsupervised learning
  3. https://www.quora.com/How-would-linear-regression-be-described-and-explained-in-laymans-terms
  4. https://www.youtube.com/watch?v=CmorAWRsCAw