Wednesday, February 14, 2018

TLS session hash and extended master secret extension support - RFC 7627

Added TLS session hash and extended master secret extension support
-new properties: jdk.tls.useExtendedMasterSecret, jdk.tls.allowLegacyResumption
-Browser, automated, interoperability tests, Upgrade and downgrade tests
-short-term solution: server certificate change is restricted if endpoint identification is not enabled
-Session renegotiation - can happen at any time, for example to switch to a different set of encryption keys or hash algorithm.
-Session resumption - previously negotiated session information is reused for a new connection
-A malicious user, once a session is compromised, can perform DoS attacks: the server needs roughly 10 times more processing power than the client for a handshake
-The master secret is always 48 bytes long; the pre-master secret size depends on the key exchange algorithm (48 bytes for RSA)
-pre_master_secret is the value generated by the client using a CSPRNG and some other variables
As described in [TRIPLE-HS], in both the RSA and DHE key exchanges, an active attacker can synchronize two TLS sessions so that they share the same “master_secret”. For an RSA key exchange where the client is unauthenticated, this is achieved as follows. Suppose a client C connects to a server A. C does not realize that A is malicious and that A connects in the background to an honest server S and completes both handshakes. For simplicity, assume that C and S only use RSA ciphersuites.
1. C sends a “ClientHello” to A, and A forwards it to S.
2. S sends a “ServerHello” to A, and A forwards it to C.
3. S sends a “Certificate”, containing its certificate chain, to A. A replaces it with its own certificate chain and sends it to C.
4. S sends a “ServerHelloDone” to A, and A forwards it to C.
5. C sends a “ClientKeyExchange” to A, containing the “pre_master_secret”, encrypted with A’s public key. A decrypts the “pre_master_secret”, re-encrypts it with S’s public key, and sends it on to S.
6. C sends a “Finished” to A. A computes a “Finished” for its connection with S and sends it to S.
7. S sends a “Finished” to A. A computes a “Finished” for its connection with C and sends it to C.
At this point, both connections (between C and A, and between A and S) have new sessions that share the same “pre_master_secret”, “ClientHello.random”, “ServerHello.random”, as well as other session parameters, including the session identifier and, optionally, the session ticket. Hence, the “master_secret” value will be equal for the two sessions and will be associated both at C and S with the same session ID, even though the server identities on the two connections are different. Recall that C only sees A’s certificate and is unaware of A’s connection with S. Moreover, the record keys on the two connections will also be the same.

This attack is possible because the master secret depends only on the pre_master_secret, ClientHello.random and ServerHello.random; it is not bound to the server certificate or other handshake parameters. The proposed fix is to include the certificate and other session-dependent parameters in the computation of the master secret.
We define a session hash after completion of the handshake as follows:

session_hash = Hash(handshake_messages)

The extended master secret extension (extension type 0x0017) signals to both the client and the server that the master secret is generated with this formula:

master_secret = PRF(pre_master_secret, "extended master secret", session_hash)[0..47]

Solution for proxies to work: disable the extended master secret extension
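The difference between the legacy and the extended derivation can be sketched in a few lines of Python. This is a minimal illustration, assuming TLS 1.2 with SHA-256 as the PRF hash; the secret, the random values and the handshake transcripts below are made-up placeholders, not real handshake bytes.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash expansion from the TLS 1.2 PRF, using HMAC-SHA256."""
    out = b""
    a = seed  # A(0)
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i)
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)


def legacy_master_secret(pms: bytes, client_random: bytes, server_random: bytes) -> bytes:
    # Depends only on the pre-master secret and the two random values.
    return prf(pms, b"master secret", client_random + server_random, 48)


def extended_master_secret(pms: bytes, handshake_messages: bytes) -> bytes:
    # Binds the master secret to the whole handshake, certificate included.
    session_hash = hashlib.sha256(handshake_messages).digest()
    return prf(pms, b"extended master secret", session_hash, 48)


# Placeholder values (assumptions): the attacker A relays the same secret
# and randoms on both connections, but the certificates differ.
pms = bytes(48)
client_random = b"\x01" * 32
server_random = b"\x02" * 32
transcript_c_a = b"ClientHello|ServerHello|Certificate(A)|ClientKeyExchange"
transcript_a_s = b"ClientHello|ServerHello|Certificate(S)|ClientKeyExchange"

legacy_c_a = legacy_master_secret(pms, client_random, server_random)
legacy_a_s = legacy_master_secret(pms, client_random, server_random)
ems_c_a = extended_master_secret(pms, transcript_c_a)
ems_a_s = extended_master_secret(pms, transcript_a_s)

print("legacy secrets equal:  ", legacy_c_a == legacy_a_s)
print("extended secrets equal:", ems_c_a == ems_a_s)
```

With the legacy formula both connections end up with the same master secret, while the session hash makes the two secrets diverge as soon as the certificates differ.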


Wednesday, February 7, 2018

Terracotta EHCache vs Hazelcast

Terracotta EHCache vs Hazelcast:
We tried both of them for one of the largest online classifieds and e-commerce platforms. We started with Ehcache/Terracotta (server array) because it's well known, backed by Terracotta, and has bigger community support than Hazelcast.
When we took it to the production environment (distributed, beyond a one-node cluster) things changed: our backend architecture became really expensive, so we decided to give Hazelcast a chance.

Hazelcast is dead simple, it does what it says and performs really well without any configuration overhead.

Lucene vs Solr

Lucene vs Solr:
A simple way to conceptualize the relationship between Solr and Lucene is that of a car and its engine. You can't drive an engine, but you can drive a car. Similarly, Lucene is a programmatic library which you can't use as-is, whereas Solr is a complete application which you can use out-of-box.

What is Solr?
Apache Solr is a web application built around Lucene with all kinds of goodies.

It adds functionality like

Hit highlighting
Faceted Search and Filtering
Geospatial Search
Fast Incremental Updates and Index Replication
Web administration interface etc
Unlike Lucene, Solr is a web application. Older releases (before 5.0) shipped as a WAR which can be deployed in any servlet container, e.g. Jetty, Tomcat, Resin, etc.; newer releases run as a standalone server.

Lucene: Lucene is an inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index for each word. Since the index is an exact string-match, unordered, it can be extremely fast. Hypothetically, an SQL unordered index on a varchar field could be just as fast, and in fact I think you'll find the big databases can do a simple string-equality query very quickly in that case.

Let's look at a sample corpus of five documents:
My sister is coming for the holidays.
The holidays are a chance for family meeting.
Who did your sister meet?
It takes an hour to make fudge.
My sister makes awesome fudge.

What does Lucene do? Lucene is a full text search library.
Search has two principal stages: indexing and "retrieval."

During indexing, each document is broken into words, and the list of documents containing each word is stored in a list called the "postings list".
The posting list for the word "My" is:
My --> 1,5
And the posting list for the word "fudge" is:
fudge --> 4,5
The index consists of all the posting lists for the words in the corpus.
Indexing must be done before retrieval, and we can only retrieve documents that were indexed.

Retrieval is the process starting with a query and ending with a ranked list of documents. Say the query is [my fudge]. (The brackets denote the borders of the query). In order to find matches for the query, we break it into the individual words, and go to the posting lists. The full list of documents containing the keywords is [1,4,5]. Because document 5 contains both words and documents 1 and 4 contain just a single word from the query, a possible ranking is: 5, 1, 4 (document 5 appears first, then document 1, then document 4).
In general, indexing is a batch, preprocessing stage, and retrieval is a quick online stage, but there are exceptions.
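The two stages can be sketched in plain Python using the five sample documents. This is only an illustration of the idea, not how Lucene is implemented: the tokenizer, the function names and the scoring (number of matching query words, ties broken by document number) are assumptions.

```python
import re
from collections import defaultdict

docs = {
    1: "My sister is coming for the holidays.",
    2: "The holidays are a chance for family meeting.",
    3: "Who did your sister meet?",
    4: "It takes an hour to make fudge.",
    5: "My sister makes awesome fudge.",
}


def tokenize(text):
    # Lowercase and keep alphabetic runs only, so "fudge." matches "fudge".
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Indexing: map each word to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Retrieval: rank documents by how many query words they contain."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(docs)
print(sorted(index["my"]))      # posting list for "my"
print(sorted(index["fudge"]))   # posting list for "fudge"
print(search(index, "my fudge"))
```

Indexing runs once over the corpus; each query then only touches the posting lists of its own words.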

This is the gist of Lucene. The rest of Lucene is (many important) specific bells and whistles for the indexing and retrieval processes.

Papers published and brief overview

Paper-1: Experimental Evaluation of Network Telemetry Anonymization for Cloud Based Security Analysis

-Format preserving encryption =
Length Preserving encryption +
Ranking functions
-Searchable Encryption:
clients can search using token called Trapdoor
Symmetric searchable encryption
-Data Anonymization and Analytics as a Service (DAAS): anonymizes network telemetry data and ensures the ability to perform analytics on the anonymized data
-Network Scanning Detection -

1) Direct: Data fields that are considered sensitive and directly needed for analytics, for example internal IP addresses and MAC addresses. Anonymization of these fields should facilitate correlation across multiple homogeneous sources (different telemetry collectors, devices) and/or multiple heterogeneous sources (different logs such as web server logs and NetFlow records).
-> Searchable Encryption: Data fields that are sensitive and identified as directly needed for analytics are encrypted using Searchable Encryption techniques.
-> Direct encryption on its own exposes value frequencies. To counter this we leverage the probabilistic nature of Searchable Encryption along with Format Preserving Encryption schemes in an onion-layered fashion similar to CryptDB.
2) Indirect: Data fields that are sensitive and not directly needed for analytics but need to be part of final results or reports. For example, a few use cases may not need the TIME STAMP of the flow for analytics, but it may be needed for further forensics in the Consumer network.
-> Data fields identified as indirectly needed are encrypted using randomized techniques (like different IVs or tweaks etc.)
3) Public: Data fields that are non-sensitive in nature are left unchanged. A few data fields may need to be sanitized before sharing. For example, DST IP can be left as is, as the server is public.
4) Removal (or Nullifying): Data fields needed neither for analytics nor for final results should be nullified or removed completely from the data records, for example protocol version. Failing to do so may leak unwanted information that adversaries can use for inference attacks.

Onion Layered Encryption ->
Format Preserving and Searchable : Ranking, SSE, LP-DET
SSE - Searchable: SWP scheme
FP DET: Ranking, LP-DET
DET: AES in ECB mode
RND: AES in CBC mode
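
Why the DET and RND layers behave differently can be shown with a small standard-library sketch. It does not use AES: HMAC-SHA256 stands in for the deterministic layer and a random nonce mixed into the HMAC stands in for the randomized layer, and both are one-way here, unlike real encryption. The key, the sample addresses and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import os
from collections import Counter

KEY = b"demo-key-not-for-production"  # hypothetical key


def det_token(value: str) -> str:
    """Deterministic stand-in: the same input always gives the same token."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:16]


def rnd_token(value: str) -> str:
    """Randomized stand-in: a fresh nonce makes every token different."""
    nonce = os.urandom(16)
    mac = hmac.new(KEY, nonce + value.encode(), hashlib.sha256).hexdigest()[:16]
    return nonce.hex() + ":" + mac


src_ips = ["10.0.0.5", "10.0.0.5", "10.0.0.9", "10.0.0.5"]

det = [det_token(ip) for ip in src_ips]
rnd = [rnd_token(ip) for ip in src_ips]

# Frequencies survive under DET (useful for correlation, but also a leak);
# under RND every token is unique, so nothing can be counted or joined.
det_counts = sorted(Counter(det).values())
rnd_counts = sorted(Counter(rnd).values())
print(det_counts, rnd_counts)
```

Equal inputs stay linkable under the deterministic layer, which is what analytics needs and also what an attacker can count; the randomized layer hides that at the cost of searchability.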

Usage of Metrics:
1) NetFlow Anonymization - Used in Security Information and Event Management (SIEM) or Intrusion Prevention Systems(IPS)

Test Data:
We further split these data sets into chunks of approximately 10000 records each carefully ensuring each such data set contains traces of scanning activity.
-Total Flows are the number of flows (flowing inside and outside) the network
-Internal IP Addresses are the number of IP Addresses in the internal network. These are the ones being targeted during recon in Scan Detection use case and/or of interest to forensic investigators during metrics use case.
-Matching Flows are the number of flows that are outbound and matching with the Internal IP Addresses.

Paper-2:  On the Optimization of Key Revocation Schemes for Network Telemetry Data Distribution

Attribute-based encryption is a type of public-key encryption in which the secret key of a user and the ciphertext are dependent upon attributes (e.g. the country in which he lives, or the kind of subscription he has). In such a system, the decryption of a ciphertext is possible only if the set of attributes of the user key matches the attributes of the ciphertext.

- Attribute-based encryption (ABE) can be used for log encryption. Instead of encrypting each part of a log with the keys of all recipients, it is possible to encrypt the log only with attributes which match recipients' attributes. This primitive can also be used for broadcast encryption in order to decrease the number of keys used.

Although the ABE concept is powerful and promising, ABE systems suffer mainly from two drawbacks: inefficiency and the lack of an attribute revocation mechanism.

Other main challenges are:
Key coordination
Key escrow
Key revocation

ID-based encryption:
ID-based encryption, or identity-based encryption (IBE), is an important primitive of ID-based cryptography. As such it is a type of public-key encryption in which the public key of a user is some unique information about the identity of the user (e.g. a user's email address). This means that a sender who has access to the public parameters of the system can encrypt a message using e.g. the text-value of the receiver's name or email address as a key. The receiver obtains its decryption key from a central authority, which needs to be trusted as it generates secret keys for every user

Elliptic-curve cryptography (ECC) is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields. ECC requires smaller keys compared to non-ECC cryptography (based on plain Galois fields) to provide equivalent security.

For elliptic-curve-based protocols, it is assumed that finding the discrete logarithm of a random elliptic curve element with respect to a publicly known base point is infeasible: this is the "elliptic curve discrete logarithm problem" (ECDLP). The security of elliptic curve cryptography depends on the ability to compute a point multiplication and the inability to compute the multiplicand given the original and product points. The size of the elliptic curve determines the difficulty of the problem. A 256-bit ECC key has security similar to a 3072-bit RSA key.
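
Point multiplication and the discrete logarithm problem can be tried out on a toy curve. The sketch below assumes the small textbook curve y^2 = x^3 + 2x + 2 over the integers modulo 17 with base point (5, 1); the curve, the secret value and the function names are illustrative choices, and a curve this small offers no security at all. It needs Python 3.8 or later for the modular inverse via pow.

```python
P_MOD = 17        # field prime (toy size)
A, B = 2, 2       # curve coefficients: y^2 = x^3 + A*x + B
G = (5, 1)        # base point; None represents the point at infinity


def on_curve(pt):
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x ** 3 + A * x + B)) % P_MOD == 0


def point_add(p, q):
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is the point at infinity
    if p == q:
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add: the easy direction."""
    result, addend = None, point
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def brute_force_log(target, base):
    """The hard direction: only feasible because the curve is tiny."""
    current = None
    for k in range(1, 2 * P_MOD + 3):
        current = point_add(current, base)
        if current == target:
            return k
    return None


secret = 7                       # hypothetical private scalar
public = scalar_mult(secret, G)  # public point
recovered = brute_force_log(public, G)
print(public, recovered)
```

On a 256-bit curve the same brute-force loop would have to try an astronomically large number of candidates, which is the asymmetry the scheme relies on.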


Attribute-based encryption (ABE) is a relatively recent approach that reconsiders the concept of public-key cryptography. In traditional public-key cryptography, a message is encrypted for a specific receiver using the receiver’s public-key. Identity-based cryptography and in particular identity-based encryption (IBE) changed the traditional understanding of public-key cryptography by allowing the public-key to be an arbitrary string, e.g., the email address of the receiver. ABE goes one step further and defines the identity not atomic but as a set of attributes, e.g., roles, and messages can be encrypted with respect to subsets of attributes (key-policy ABE - KP-ABE) or policies defined over a set of attributes (ciphertext-policy ABE - CP-ABE). The key issue is, that someone should only be able to decrypt a ciphertext if the person holds a key for "matching attributes" (more below) where user keys are always issued by some trusted party.

Ciphertext-Policy ABE

In ciphertext-policy attribute-based encryption (CP-ABE) a user’s private-key is associated with a set of attributes and a ciphertext specifies an access policy over a defined universe of attributes within the system. A user will be able to decrypt a ciphertext if and only if his attributes satisfy the policy of the respective ciphertext. Policies may be defined over attributes using conjunctions, disjunctions and (k,n)-threshold gates, i.e., k out of n attributes have to be present (there may also be non-monotone access policies with additional negations and meanwhile there are also constructions for policies defined as arbitrary circuits). For instance, let us assume that the universe of attributes is defined to be {A,B,C,D} and user 1 receives a key to attributes {A,B} and user 2 to attribute {D}. If a ciphertext is encrypted with respect to the policy (A∧C)∨D, then user 2 will be able to decrypt, while user 1 will not be able to decrypt.

CP-ABE thus makes it possible to realize implicit authorization, i.e., authorization is included in the encrypted data and only people who satisfy the associated policy can decrypt the data. Another nice feature is that users can obtain their private keys after data has been encrypted with respect to policies. So data can be encrypted without knowledge of the actual set of users that will be able to decrypt, only by specifying the policy which allows decryption. Any future users that will be given a key with respect to attributes such that the policy can be satisfied will then be able to decrypt the data.

Key-Policy ABE

KP-ABE is the dual to CP-ABE in the sense that an access policy is encoded into the user's secret key, e.g., (A∧C)∨D, and a ciphertext is computed with respect to a set of attributes, e.g., {A,B}. In this example the user would not be able to decrypt the ciphertext but would for instance be able to decrypt a ciphertext with respect to {A,C}.

An important property which has to be achieved by both, CP- and KP-ABE is called collusion resistance. This basically means that it should not be possible for distinct users to "pool" their secret keys such that they could together decrypt a ciphertext that neither of them could decrypt on their own (which is achieved by independently randomizing users' secret keys).
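
The matching rule shared by CP-ABE and KP-ABE can be sketched as a plain policy check. This models only the access structure, with no cryptography involved; the tuple encoding of policies and the function name are assumptions for illustration.

```python
def satisfies(policy, attrs):
    """Return True if the attribute set satisfies the policy tree.

    A policy is either an attribute name, ("and", ...), ("or", ...),
    or ("threshold", k, ...) meaning at least k children must hold.
    """
    if isinstance(policy, str):
        return policy in attrs
    op, *args = policy
    if op == "threshold":
        k, *children = args
        return sum(satisfies(c, attrs) for c in children) >= k
    results = [satisfies(c, attrs) for c in args]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown gate: {op}")


POLICY = ("or", ("and", "A", "C"), "D")   # (A AND C) OR D

# CP-ABE view: the policy sits in the ciphertext, attributes in the user key.
user1_can_decrypt = satisfies(POLICY, {"A", "B"})
user2_can_decrypt = satisfies(POLICY, {"D"})

# KP-ABE view: the policy sits in the user key, attributes in the ciphertext.
key_opens_ab = satisfies(POLICY, {"A", "B"})
key_opens_ac = satisfies(POLICY, {"A", "C"})

print(user1_can_decrypt, user2_can_decrypt, key_opens_ab, key_opens_ac)
```

In a real scheme this check is never evaluated in the clear; it is enforced by the mathematics of decryption, which is also what makes collusion resistance possible.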

Beyond ABE

ABE is just one type of the more general concept of functional encryption (FE) covering IBE, ABE and many other concepts such as inner product or hidden vector encryption (yielding e.g., searchable encryption) etc. It is a very active and young field of research and has many interesting applications (in particular in the field of cloud computing).

Broadcast Encryption
The BE scheme uses a bilinear group G of order p and takes identities in the range 1,...,n, where n is the number of users and r is the number of revoked users. The scheme provides a strong revocation mechanism requiring the ciphertext to have O(r) elements and public and private keys of constant size. The scheme comprises Setup(n), Key-Gen(MSK, ID), Encrypt(PK, M, S) and Decrypt(S, CT, ID, D); if ID is in the revocation list, the decryption algorithm aborts.

Optimizations in the scheme:
Right pairings and curves eg: SS512
1) Randomness reuse: ZR, G1, G2 - During encrypt operation, the random(ZR) is used in s1 variables and for each item in revocation list a random(ZR) is assigned.
2) Bilinear Pairing computation: During the decrypt operation, the bilinear pairing is computed for pair(c[i],d[i]) where i is in the range [1..7]. The pairing results are cached and reused when decrypting data for the same user, so the pairing computation is done once per user.
3) Indexing: The group.random values for G1, G2 and ZR are precomputed and stored in buffer, the group.hash value is computed for each user in revoked list during encrypt and decrypt execution. S hash = group.hash(S[i].upper()) for each in revoked list S. An index of revoked userlist and corresponding hash values is used as this is used frequently during encryption and decryption operations.
4) Reuse of intermediate results: During decrypt, the A4 computation required considerable amount of time, calculating it requires encrypted-text, user-secret key and hash values for users in revoked list, in circumstances where data is encrypted to large set of users the value is computed on periodic basis.

SSL Handshake:
1. Client initiates with Client Hello message, ciphers supported and random number#1
2. Server hello, cipher selected, random number#2, SSL cert
3. Client creates Pre-master secret
Pre-master secret encrypted with Server public key
4. Both client and server have random number#1, random number#2 and the pre-master secret, and generate the master secret
5. Client sends: SSL Finished
6. Server sends: SSL Finished

Paper: Experiments in Encrypted and Searchable Network Audit Logs
Advances in Identity Based Encryption and Attribute Based Encryption techniques look promising for providing privacy-preserving search over encrypted data.

Privacy Preserving Audit:
Searchable Encryption techniques allow a client to store encrypted records at a remote server. Subsequently the client can Search the server using a token called TrapDoor. Server uses it in order to match the encrypted records and returns the matching ones.

Setup: The algorithm initialization depends on bilinear pairing and elliptic curve used. The key server then generates master key MK, and public key PK.

-> Selection of bilinear pairing: a bilinear group G0 of prime order p with generator g. We used elliptic curves with bilinear maps (pairings): SS512, a symmetric curve with a 512-bit base field from which the private key is generated, and asymmetric curve pairings such as MNT159 and MNT224, with 159-bit and 224-bit base fields respectively.
-> Curve selection: We used a Type-A curve such as y^2 = x^3 + x to compute the pairings

Key generation and sharing: The user secret key is generated using (PK, MK, search keyword). Like in traditional CP-ABE scheme, attributes are associated with public and access policy is associated with ciphertext. Here instead in place of public identifiers the search keyword is used. The secret key is communicated to interested parties using a secure channel like TLS/SSL.

Encryption: The data records are read from the SiLK repository. Each log entry m comprises search keywords w_1, w_2, ..., w_n (keywords could be IP address, subnet mask, or protocol, by which the user would like to filter the data).

-> The server encrypts the entry using a random symmetric encryption key K to get E_K(m). For each keyword w_i, the server computes the CP-ABE encryption c_i of the string (FLAG|K) using the search keyword as access policy and the public key PK.
-> Symmetric encryption is chosen for data encryption because it exhibits high performance and is more suitable for encrypting large data. We have used AES in CBC mode with a 16-byte block size (with PKCS5 padding) and the HMAC SHA1 algorithm as a PRP generator.

Match and Decrypt:
A data owner may want to provide controlled access to a third-party auditor who wishes to search and retrieve particular data from the encrypted records. The data owner, with the help of the key authority, constructs a private key with that capability; then the MatchAndDecrypt operation is run for each encrypted record:
• As part of the match routine, the data record is decrypted using (PK, sk, ciphertext) and the decrypted text is checked for FLAG as its prefix.
• If the match returns true, the ciphertext c is decrypted using the previously generated secret key sk and public key PK. The symmetric encryption key is extracted from the decrypted text, and one more round of decryption is done, this time using the symmetric key.
• If match is false then the record is not processed further.

Reverse Proxy Vs Forward Proxy:
Forward Proxy: Acting on behalf of a requestor (or service consumer)
Reverse Proxy: Acting on behalf of service/content producer.

Reverse Proxy Gateway Functionality
Security: The reverse proxy gateway acts as the entry point. In case of secure Hypertext Transfer Protocol (HTTPS) requests, it decrypts the request contents and passes the decoded request to backend servers, which can handle only HTTP requests. An advantage on the security front is that the Secure Socket Layer (SSL/TLS) processing, which is CPU intensive, is handled at the gateway.
Centralized Logging Service: As incoming requests are routed through the reverse proxy gateway, it captures important events pertaining to traffic patterns and helps the security monitoring service.
Load balancing: The reverse proxy balances the incoming traffic across the available servers and distributes requests using strategies like sticky sessions (for stateful sessions) or round-robin selection of servers.
Caching and static content serving: This service is required for content-heavy applications like YouTube, where server responsiveness is improved by hosting static content on the gateway server.
Gzip compression and decompression: To speed up data transfer between client and server, the reverse proxy can compress data before a request is served and uncompress data uploaded by clients.

Challenges associated with Traditional Data Sharing Approaches
In many cases the cloud service provider needs to share sensitive data with cloud users. This kind of data sharing can happen periodically at stipulated intervals or on demand whenever a security investigation is necessary. With traditional security schemes such as PKI there are certain challenges which tend to become a bottleneck when used in a cloud environment.
1. Certificate Management: The existing mechanisms for secure content sharing largely rely on secure socket layers, which use certificates for trust establishment. Certificate management involves validation of certificates and frequent synchronization with certificate authority servers.
2. Validity of the certificate: The CA publishes certificate status information which client applications can validate. This information is published periodically, and clients synchronize it with the server.
The receiver needs to verify that the sender's certificate has not been revoked. The certificate status information is queried using a Certificate Revocation List (CRL) or the Online Certificate Status Protocol (OCSP).
3. How to trust the Certificate Authority: Sender and receiver may have different CAs. This leads to CA chain certification, or validation even of the CAs.
4. Fine-grained Access Control: The existing security mechanisms either provide complete access to data or completely restrict usage; they do not provide an easy way to share data selectively with other parties.

Paper-3: Improved Data Confidentiality of Audit Trail Data in Multi-Tenant Cloud

Typically the cloud providers have a demilitarized zone protecting the data center along with a reverse proxy setup. The reverse proxy gateway acts as initial access point and provides additional capabilities like load balancing, caching, security monitoring capturing events, syslogs related to hosts residing in the cloud. The audit-trail logs captured by reverse proxy server comprise important information related to all the tenants.

We provide a two-phase approach for sharing the audit-logs with users allowing fine-grained access. In this paper we evaluate certain Identity-Based and Attribute-Based Encryption schemes and provide detailed analysis on performance.

The internet-facing reverse proxy gateway provides protection such as intrusion detection and defense against denial-of-service attacks. Data collected by the reverse proxy includes system logs and alarms, and it can capture HTTP/REST requests and remote-service calls pertaining to tenants if it is configured as an SSL termination end-point.

Audit trail log structure:
http {
log_format compression '$remote_addr - $remote_user [$time_local] ' '"$request" $status $body_bytes_sent ' '"$http_referer" "$http_user_agent"';
}
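
A log line written in this format can be split back into its fields with a regular expression. The sketch below is one possible parser; the sample line, the field handling and the function name are assumptions for illustration, and real logs may need a more tolerant pattern.

```python
import re

# One named group per variable in the log_format above.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["body_bytes_sent"] = int(entry["body_bytes_sent"])
    return entry


# Made-up sample line using a documentation-range IP address.
sample = ('203.0.113.7 - alice [14/Feb/2018:10:15:32 +0000] '
          '"GET /api/items HTTP/1.1" 200 512 "-" "curl/7.58.0"')

entry = parse_line(sample)
print(entry["remote_addr"], entry["request"], entry["status"])
```

Fields such as remote_addr and remote_user are the ones an encryption policy would later treat as sensitive.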

Methodology: Consider a public cloud provider hosting tenants, with a reverse proxy server installed that captures audit-trail logs of incoming traffic pertaining to clients. We consider the role of the reverse proxy server extended as an SSL termination end-point so that it can intercept all HTTP/SSL traffic. The cloud provider has a Network Admin who has access to the entire logs, and cloud tenants have users with different roles like level-1, level-2, level-3 etc. Level-1 users are at the bottom of the organizational hierarchy and are monitored by level-2 users, and so on.
A. Privacy and Security of Audit logs - Objectives We divide the problem into two sub-domains –
1. Cloud Network Admin has access control on entire logs and can do operations like search, encryption, decryption,
2. Tenant users like the Network Admin can access all tenant-specific logs, and users of Level-1, Level-2 etc. have controlled access to data.
Users at higher level can oversee data pertaining to lower level that they are administering. It implies that user’s access to audit log contents is controlled using role-based access control policies.

Type-I Data Security
The Cloud Network Administrator being a super user has complete control on data. Log entries have unique attributes like TenantId, application-id or any public identifiers. The cloud provider repository contains data pertaining to all the tenants, the users can access content based on access restrictions implied by associated access structure according to RBAC policies. Type-I data protection scheme uses IBE or ID-PKC algorithm such as BB scheme. The Phase-1 uses ID-PKC scheme as it facilitates communication using public keys generated out of public identifiers like user-id, organization name, pin-code which provide advantage of less overhead in communication. We explore BB, BB-CKRS schemes for Type-1 data security implementation. For performance evaluation we evaluate combination of these schemes and assess those using large datasets in our experiments.
Type-II Data Security
Data security implemented using Type-II mechanism imposes access control restrictions on specific datasets or particular fields of a dataset tuple shared among the users. User has access to data based on access restrictions as per the access structure embedded in ciphertext. As an example the Level-1 users who are bottom of hierarchy can see data pertaining to their own activity and Level-2 can escrow or oversee all the Level-1 user data along with additional data pertaining to its activity. We modify the existing CP-ABE along with encrypted text with embedded role-based policy or access control data and user keys having descriptive attributes such as organization, division, manager. The main reason for choosing CP-ABE for Type-II security implementation is that it perfectly suits circumstances where user privileges (RBAC policies) determine the access and ensures granular access on data. For Type-II data security we assess BSW and Waters cryptographic schemes.
We use a key encapsulation mechanism, or Symmetric Crypto Abstraction, for content encryption, as used in practical web-based applications. Symmetric Crypto Abstraction is a combination of symmetric and asymmetric schemes for faster data encryption; the same approach is used in HTTPS and other secure internet communication in real-world applications. A symmetric key algorithm is chosen for large data encryption because it exhibits higher performance, which is useful for real-time communications. Only the key parameters, like the symmetric secret key or session-Id, are encrypted using the asymmetric key algorithm, in this case the Type-1 or Type-2 security scheme. So for practical applications that require large data encryption, a symmetric key algorithm like the Advanced Encryption Standard (AES) in CBC mode with 128-bit security is used; a key size of 128 bits is sufficient, but 192 or 256 bits is desirable.

Following are two use cases we can foresee in content sharing in a cloud scenario
The cloud service provider uses the Type-1 scheme for content encryption, and tenants or individual users use a proxy service to interpret the data and re-encrypt it using policy data or RBAC information of local users with the Type-2 scheme
Alternatively, the cloud service provider uses the Type-2 security mechanism with the RBAC policy tree (access structure) as input for content encryption and then re-encrypts using the Type-1 scheme. The consumers or tenant users initially decrypt the contents using secret keys of the Type-1 mechanism and apply one more round of decryption using the Type-2 scheme.



SQL (Digital) is preferred for clearly defined items, specifications.
No-SQL (Analog) is preferred with fluid

1. Contacts list: each contact can have 1 or more phones, email addresses, addresses
Creating SQL tables would be overhead and much of the data would be left NULL.

2. Social Network: relationship links, status updates, messaging, likes etc
With SQL, adding new fields can be overhead

3. Warehouse Management: Integrity and transaction support - SQL is good


So what sets NoSQL databases apart?
So we made clear that all those databases commonly referred to as NoSQL are too different to evaluate them together. Each of them needs to be evaluated separately to decide if they are a good fit to solve a specific problem. But where do we begin? Thankfully, NoSQL databases can be grouped into certain categories, which are suitable for different use-cases:


Document databases

Examples: MongoDB, CouchDB

Strengths: Heterogeneous data, working object-oriented, agile development

Their advantage is that they do not require a consistent data structure. They are useful when your requirements and thus your database layout changes constantly, or when you are dealing with datasets which belong together but still look very different. When you have a lot of tables with two columns called "key" and "value", then these might be worth looking into.

Graph databases

Examples: Neo4j, GiraffeDB.

Strengths: Data Mining

While most NoSQL databases abandon the concept of managing data relations, these databases embrace it even more than those so-called relational databases.

Their focus is at defining data by its relation to other data. When you have a lot of tables with primary keys which are the primary keys of two other tables (and maybe some data describing the relation between them), then these might be something for you.

Key-Value Stores

Examples: Redis, Cassandra, MemcacheDB

Strengths: Fast lookup of values by known keys

They are very simplistic, but that makes them fast and easy to use. When you have no need for stored procedures, constraints, triggers and all those advanced database features and you just want fast storage and retrieval of your data, then those are for you.

Unfortunately they assume that you know exactly what you are looking for. You need the profile of User157641? No problem, that will only take microseconds. But what if you want the names of all users who are aged between 16 and 24, have "waffles" as their favorite food and logged in during the last 24 hours? Tough luck. When you don't have a definite and unique key for a specific result, you can't get it out of your K-V store that easily.
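
The difference between the two access patterns can be sketched with a plain Python dict standing in for a key-value store. The user records are invented for illustration, and a real store would hold far more entries.

```python
# A plain dict stands in for a key-value store (hypothetical data).
store = {
    "User157641": {"name": "Kim", "age": 19, "favorite_food": "waffles"},
    "User157642": {"name": "Lee", "age": 31, "favorite_food": "waffles"},
    "User157643": {"name": "Sam", "age": 22, "favorite_food": "pizza"},
}

# Lookup by known key: a single hash lookup.
profile = store["User157641"]

# Query by value: there is no key to use, so every entry has to be inspected.
matches = [
    user["name"]
    for user in store.values()
    if 16 <= user["age"] <= 24 and user["favorite_food"] == "waffles"
]

print(profile["name"], matches)
```

The first access costs the same regardless of how many users are stored; the second grows with the size of the store.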

Is SQL obsolete?
Some NoSQL proponents claim that their favorite NoSQL database is the new way of doing things, and SQL is a thing of the past.

Are they right?

No, of course they aren't. While there are problems SQL isn't suitable for, it still has its strengths. Lots of data models are simply best represented as a collection of tables which reference each other. This is especially true because most database programmers were trained for decades to think of data in a relational way, and trying to press this mindset onto a new technology which wasn't made for it rarely ends well.

NoSQL databases aren't a replacement for SQL - they are an alternative.

Most software ecosystems around the different NoSQL databases aren't as mature yet. While there are advances, you still haven't got supplemental tools which are as mature and powerful as those available for popular SQL databases.

Also, there is much more know-how for SQL around. Generations of computer scientists have spent decades of their careers on research focusing on relational databases, and it shows: the literature written about SQL databases and relational data modelling, both practical and theoretical, could fill multiple libraries. How to build a relational database for your data is a topic so well-researched that it's hard to find a corner case where there isn't a generally accepted by-the-book best practice.

Most NoSQL databases, on the other hand, are still in their infancy. We are still figuring out the best way to use them.

NoSQL and Transaction Support:
CAP: A commonly cited result is the CAP theorem: a distributed data store cannot guarantee consistency, availability and partition tolerance all at the same time. SQL, NoSQL and NewSQL tools can be classified according to which of the three they give up.

BASE: A newer, weaker set of requirements replacing ACID is BASE ("basically available, soft state, eventual consistency"). However, eventually consistent tools ("eventually all accesses to an item will return the last updated value") are hardly acceptable in transactional applications like banking. Here a good idea would be to use in-memory, column-oriented and distributed SQL/ACID databases, for example VoltDB; I suggest looking at these "NewSQL" solutions.

MongoDB does not have any built-in features which ensure consistency (the only exception: uniqueness constraints can be enforced with unique indexes). The responsibility for not writing inconsistent data to the database is delegated to the application.
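
For contrast, this is roughly what the ACID side looks like in a banking-style transfer. It is a minimal sketch using Python's built-in sqlite3 module; the `accounts` table, the account names and the `transfer` helper are assumptions made for the example. The database itself enforces the balance constraint and undoes the half-finished transfer, which is the work a store without such guarantees leaves to the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer("alice", "bob", 500)  # would overdraw alice, so the CHECK fails
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, balances)
```

The credit to bob is applied first and then rolled back when the debit violates the constraint, so no partial transfer is ever visible.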

Friday, February 2, 2018

Top Threats in Cloud Computing: Cloud Security Alliance


The cloud provides quick infrastructure for businesses, but at the same time there is risk associated with data migrated to the cloud: the number of attack vectors increases manifold compared to traditional on-premise software solutions. An important source of attacks is insiders in the cloud ecosystem, because the cloud infrastructure is shared between multiple entities.

  • Threat-1: Abuse and Nefarious Use of Cloud Computing: IaaS and PaaS providers offer seemingly unlimited compute, network, storage and infrastructure to consumers. These can be abused thanks to the relative anonymity of users of cloud services. Reported incidents include IaaS offerings hosting the Zeus botnet and InfoStealer Trojan horses. As a defensive measure, entire blocks of IaaS network addresses have been publicly blacklisted. Remediation measures include stricter initial registration, enhanced credit card fraud monitoring and coordination, comprehensive inspection of customer network traffic, and monitoring of public blacklists.
  • Threat-2: Insecure Interfaces and APIs: Cloud service providers and third parties provide application programming interfaces that give customers cloud integration and accessibility. While these provide an automated way to access the cloud, their security depends on the providers' implementation. As a remediation, the cryptographic schemes used for authentication and authorization should be carefully analyzed for possible security breaches.
  • Threat-3: Malicious Insiders: The threat from insiders of an organization gets more complicated in the cloud context. A single malicious insider can create problems for all cloud tenants and can cause huge loss to the cloud provider in terms of finance and reputation. Unlike traditional businesses, cloud services require extra checks on access restrictions and personnel hiring practices.
  • Threat-4: Shared Technology Issues: IaaS and PaaS vendors provide their services by sharing infrastructure. Hardware, software, networks and even GPU resources are shared between businesses. Any vulnerability within the technology can create a huge problem for those businesses. A strong defense-in-depth strategy is recommended to remediate this problem, and strong compartmentalization is needed between tenants. Ensure that host machines run the latest software with security patches.
  • Threat-5: Data Loss or Leakage: Data residing in the cloud may be deleted or altered due to unforeseen events like machine failures and software crashes. Data leakage by internal or external personnel can happen due to weak authentication systems. Data needs to be encrypted both when it is moved from one point to another and when it is persisted, and the encryption keys need to be stored securely.
  • Threat-6: Account or Service Hijacking: Attackers usually use mechanisms like phishing and request forgery to lure users to malicious websites. In the cloud context the problem is magnified by the large user base. To remediate this, prohibit users from sharing account credentials with others, educate them about possible hazards and monitor SLAs.
  • Threat-7: Unknown Risk Profile: A public cloud hosts multiple businesses within the same data center or premises. A lot of sensitive data pertaining to business operations traverses the cloud network and is captured by intermediate devices like proxy servers, firewalls, load balancers and gateways. Sensitive data such as network telemetry and audit-trail logs should be carefully protected from unauthorized access. Disclosure of such information can cause a huge loss to business operations.

Data Security using Identity-Based Encryption Schemes and Key Management

Research Area: Data Security using Identity-Based Encryption Schemes and Key Management in Cloud Environment

Identity-based and attribute-based encryption schemes are a viable alternative to PKI-based cryptosystems, as they help in devising access control policies (ACP) using personal identifiers (PID), hierarchy and validity period. The multi-tenant cloud environment has a broader array of threat vectors than on-premise software. With present schemes, sharing and controlling data can be complex (if not limiting) because of the complexity and cost of certificate management. Our research work involved the study of ID-PKC and ABE schemes and related key management techniques. As part of the research we devised an access control mechanism suitable for the cloud environment. Techniques like searchable encryption help to leverage third-party services without compromising the privacy of data. We have improved methods for data transmission and the efficiency of key management schemes, and evaluated them using network telemetry datasets.

A.1 Symmetric Bilinear Groups
Let $G$ be a cyclic group of prime order $p$ generated by $g \in G$. Let $e : G \times G \to G_t$ be a function mapping pairs of elements of $G$ to elements of some group $G_t$, also of order $p$. We use multiplicative notation for the group operations in $G$ and $G_t$. In practice $G$ will be a subgroup of the group of points on a curve defined over some finite field, while $G_t$ will be a multiplicative subgroup in some extension of the field. Note that it will not be feasible to compute a homomorphism from $G_t$ to $G$ without violating complexity assumptions. Suppose that $e$ satisfies the bilinearity condition that $\forall u, v \in G$, $\forall a, b \in \mathbb{Z}$, $e(u^a, v^b) = e(u, v)^{ab}$, and the non-degeneracy condition that $e(g, g)$ generates $G_t$. Suppose also that the group operations in $G$ and $G_t$, as well as the pairing $e$, can all be computed efficiently (which requires that the elements of $G$ and $G_t$ have compact representations). In this case, it is said that $G$ is a bilinear group, and that the map $e$ is a symmetric bilinear map or pairing in the group $G$. The symmetry refers to the invariance of the bilinear map upon interchange of its arguments.
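
The bilinearity, symmetry and non-degeneracy conditions can be checked exhaustively on a toy example. The sketch below is an assumption-laden illustration, not a usable pairing: G is taken to be the additive group of integers modulo 11 with generator 1, and Gt is the order-11 subgroup of the integers modulo 23, generated by 2. Discrete logarithms in this G are trivial, so the map has no cryptographic value; real pairings are built on elliptic curves as described above. All names and parameters are chosen for the example.

```python
# Toy parameters, far too small for real use and chosen only for illustration.
p = 11        # prime order shared by G and Gt (the p of the text)
modulus = 23  # Gt lives inside the multiplicative group modulo 23
h = 2         # generator of the order-11 subgroup: 2**11 % 23 == 1

def e(u, v):
    """Toy symmetric pairing with G = (Z_p, +): e(u, v) = h^(u*v) in Gt."""
    return pow(h, (u * v) % p, modulus)

# G is written additively here, so the text's u^a corresponds to a*u mod p.
bilinear = all(
    e(a * u % p, b * v % p) == pow(e(u, v), a * b, modulus)
    for u in range(p) for v in range(p)
    for a in range(p) for b in range(p)
)
symmetric = all(e(u, v) == e(v, u) for u in range(p) for v in range(p))
non_degenerate = e(1, 1) != 1  # e(g, g) with g = 1 is not the identity of Gt

print(bilinear, symmetric, non_degenerate)
```

Because Gt has prime order, any element other than the identity generates it, so checking that e(g, g) is not 1 is enough for non-degeneracy.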
A.2 Asymmetric Bilinear Groups
Let $G$ and $\hat{G}$ be a pair of cyclic groups of prime order $p$, generated by $g \in G$ and $\bar{g} \in \hat{G}$ respectively. Let $e : G \times \hat{G} \to G_t$ be a function mapping pairs of elements in $(G, \hat{G})$ to elements of some group $G_t$, also of order $p$. Group operations are written multiplicatively in $G$, $\hat{G}$ and $G_t$. Suppose that $e$ satisfies the bilinearity condition that $\forall u \in G$, $\forall v \in \hat{G}$, $\forall a, b \in \mathbb{Z}$, $e(u^a, v^b) = e(u, v)^{ab}$, and the non-degeneracy condition that $e(g, \bar{g})$ generates $G_t$. Suppose also that the group operations in $G$, $\hat{G}$ and $G_t$, as well as the pairing $e$, are all efficiently computable. In this case, it is said that $(G, \hat{G})$ form a bilinear group pair, and that the map $e$ is an asymmetric bilinear map, or pairing, in $(G, \hat{G})$. The asymmetry refers to the non-interchangeability of the bilinear map's arguments. Finally, let $\varphi : \hat{G} \to G$ be the group homomorphism such that $\varphi(\bar{g}) = g$; this homomorphism always exists but is not always efficiently computable. When the homomorphism $\varphi$ is efficiently computable, the bilinear group pair $(G, \hat{G})$ is of type 2, otherwise it is of type 3.
A.3 Types of Security Attacks on ID-PKC schemes
There are six general categories of attacks that the use of encryption can protect against. In each of these cases, an attacker attempts to either determine a key needed to decrypt a message or the plaintext message that was encrypted.
  1. Ciphertext-only attack. A ciphertext-only attack is carried out by an adversary who has access to only ciphertext. This is the most difficult attack to carry out, and any system needs to be resistant to such an attack to provide any level of security at all.
  2. Known-plaintext attack. A known-plaintext attack is carried out by an adversary who has access to both plaintext and corresponding ciphertext. The matching plaintext and ciphertext need not comprise all of an encrypted message. This type of attack is very easy for an adversary to carry out, and protection against known-plaintext attacks is essential for any useful cryptographic system. Almost any type of information that is transmitted electronically has enough structure to guarantee some level of matching plaintext and ciphertext. Bytes representing ASCII text have some fixed bits while others can be guessed with a high probability, for example.
  3. Chosen-plaintext attack. A chosen-plaintext attack is carried out by an adversary who can select the plaintext and then be given the corresponding ciphertext. For example, the adversary could create a list of all possible plaintext-ciphertext pairs and then decrypt any other encrypted messages that he observes by looking up the correct plaintext in this table. One way to counter such a capability is to include random information with the plaintext that gets encrypted, so that a single plaintext message will get encrypted to a different ciphertext each time it is encrypted.
  4. Adaptive chosen-plaintext attack. In an adaptive chosen-plaintext attack, an adversary selects an initial plaintext message to encrypt and then selects the next plaintext messages that he encrypts based on the ciphertext that he receives from the previous encryption. He can repeat this process as often as needed to gather more information about the key being used. Otherwise, this attack has the same properties as a chosen-plaintext attack.
  5. Chosen-ciphertext attack. In a chosen-ciphertext attack, an adversary selects a ciphertext and is able to obtain the corresponding plaintext. If an algorithm encrypts a particular plaintext to the same ciphertext every time it is encrypted then it is vulnerable to a chosen-ciphertext attack, so many encryption algorithms add a random input to the plaintext to make such an attack infeasible. Portable devices like smartcards may be susceptible to chosen-ciphertext attacks, because they can often be obtained by an adversary. Being secure against chosen ciphertext attacks is the standard level of security that is currently expected of public-key systems.
  6. Adaptive chosen-ciphertext attack. In an adaptive chosen-ciphertext attack, an adversary selects an initial ciphertext message to decrypt and then selects the next ciphertext messages that he decrypts based on the plaintext that he receives from the previous decryption.
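
The table-lookup attack from item 3, and the randomization that counters it, can be sketched as follows. Both "ciphers" are toy constructions built from SHA-256 purely for illustration; they are not vetted encryption schemes, and the key, messages and function names are assumptions made for the example.

```python
import hashlib
import os

KEY = b"demo-key"  # hypothetical secret held by the encryption oracle

def _keystream(seed, length):
    """Derive length bytes from seed with SHA-256 (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt_deterministic(msg):
    """Same plaintext always yields the same ciphertext."""
    return bytes(m ^ k for m, k in zip(msg, _keystream(KEY, len(msg))))

def encrypt_randomized(msg):
    """A fresh random nonce makes each encryption of msg look different."""
    nonce = os.urandom(16)
    body = bytes(m ^ k for m, k in zip(msg, _keystream(KEY + nonce, len(msg))))
    return nonce + body

# Chosen-plaintext attack: query the oracle on every candidate, build a table.
candidates = [b"YES", b"NO!"]
table = {encrypt_deterministic(m): m for m in candidates}
observed = encrypt_deterministic(b"YES")  # ciphertext seen on the wire
recovered = table.get(observed)

# With a fresh nonce per message the precomputed table no longer matches.
table_r = {encrypt_randomized(m): m for m in candidates}
recovered_r = table_r.get(encrypt_randomized(b"YES"))

print(recovered, recovered_r)
```

The adversary never learns the key in either case; against the deterministic variant it does not need to.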
A.4 Criteria for Ideal Identity-Based Encryption scheme
  1. Data confidentiality: The data owner encrypts the data before uploading it to the cloud. Therefore unauthorized parties, including the cloud itself, cannot learn anything about the encrypted data.
  2. Fine-grained access control: The system can grant different access rights to individual users, even when those users belong to the same group.
  3. Scalability: The system keeps working efficiently as the number of authorized users increases, so the number of authorized users does not affect the performance of the system.
  4. User accountability: A dishonest authorized user might share his attribute private key with other users, so that illegal keys spread among unauthorized users. The scheme should make it possible to hold such a user accountable.
  5. User revocation: If a user quits the system, the scheme can revoke his access rights directly. The revoked user can no longer access any stored data.
  6. Collusion resistance: Users cannot combine their attributes to decipher the encrypted data. Since each attribute is tied to a polynomial or a random number, different users cannot collude with each other.
A.5 Identity-Based Encryption and Security Analysis
An IBE system consists of four algorithms: Setup, KeyGen, Encrypt and Decrypt.
Setup generates the Private Key Generator's (PKG) parameters params and a master key master-key. KeyGen uses master-key and a user identity to generate the private key for that identity. Encrypt takes a message, an identity and params as input and outputs a ciphertext, while Decrypt recovers the message from a ciphertext using the private key for the identity.
Boneh and Franklin define chosen-ciphertext security under a chosen-identity attack using the following game.
Setup: The challenger runs Setup and sends params to the adversary.
Phase 1: The adversary issues queries $q_1, q_2, q_3, \ldots, q_m$, where each $q_i$ is one of the following:
Key generation query $ID_i$: the challenger runs KeyGen on $ID_i$ and forwards the resulting private key to the adversary.
Decryption query $(ID_i, c_i)$: the challenger runs KeyGen on $ID_i$, decrypts $c_i$ using the resulting private key and sends the result to the adversary.
Challenge: The adversary submits two plaintexts $M_0, M_1 \in \mathcal{M}$ and an identity $ID$. The only condition is that $ID$ must not have been used in a Phase 1 key generation query. The challenger selects a random bit $b \in \{0, 1\}$, sets $C = \mathrm{Encrypt}(params, ID, M_b)$, and sends $C$ to the adversary as its challenge ciphertext.
Phase 2: Same as Phase 1, except that the adversary may not request a private key for $ID$ or the decryption of $(ID, C)$.
Guess: The adversary submits a guess $b' \in \{0, 1\}$ and wins if $b = b'$.
The adversary $A$ in the above game is called an IND-ID-CCA adversary.
The advantage of adversary $A$ is defined as
$\mathrm{Adv}(A) = \left| \Pr[b = b'] - \tfrac{1}{2} \right|$ …(A.1)
Definition A.3: An ID-PKC system is $(t, q_{ID}, q_C, \epsilon)$ IND-ID-CCA secure if all $t$-time IND-ID-CCA adversaries making at most $q_{ID}$ private key queries and at most $q_C$ chosen ciphertext queries have advantage at most $\epsilon$ in winning the game.
IND-ID-CPA security is defined similarly, but with the restriction that the adversary cannot make decryption queries.
Definition A.4: An ID-PKC system is $(t, q_{ID}, \epsilon)$ IND-ID-CPA secure if it is $(t, q_{ID}, 0, \epsilon)$ IND-ID-CCA secure.
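
The advantage from equation (A.1) can be estimated empirically for a given adversary. The sketch below is a heavily simplified illustration under stated assumptions: the "scheme" is a deliberately broken toy whose pad depends only on the public identity (it is not Boneh-Franklin or any real IBE), the identity and messages are fixed rather than chosen by the adversary, and Setup and the query phases are omitted because the toy has no keys. All function names are hypothetical.

```python
import hashlib
import secrets

def toy_encrypt(identity, message):
    """Deliberately broken 'IBE': the pad depends only on the public identity."""
    pad = hashlib.sha256(identity).digest()[: len(message)]
    return bytes(m ^ k for m, k in zip(message, pad))

def play_once(adversary):
    """One challenge round; returns True if the adversary guesses b."""
    identity, m0, m1 = b"alice@example.com", b"attack", b"retire"
    b = secrets.randbelow(2)
    challenge = toy_encrypt(identity, (m0, m1)[b])
    return adversary(identity, m0, m1, challenge) == b

def guessing_adversary(identity, m0, m1, challenge):
    """Ignores the challenge and guesses at random."""
    return secrets.randbelow(2)

def pad_adversary(identity, m0, m1, challenge):
    """Recomputes the pad from the public identity; no key query is needed."""
    return 0 if toy_encrypt(identity, challenge) == m0 else 1

def estimate_advantage(adversary, trials=2000):
    """Estimate |Pr[b = b'] - 1/2| over independent rounds."""
    wins = sum(play_once(adversary) for _ in range(trials))
    return abs(wins / trials - 0.5)

print(estimate_advantage(guessing_adversary))  # close to 0
print(estimate_advantage(pad_adversary))       # 0.5
```

A scheme meeting Definition A.3 or A.4 must keep every efficient adversary near the first number; the second adversary shows what a complete break looks like in terms of advantage.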
A.6 Ciphertext-Policy Attribute-Based Encryption
A ciphertext-policy attribute-based encryption (CP-ABE) scheme consists of four algorithms:
Setup: run by the PKG authority; generates the master public key PK and the master key MK.
KeyGeneration: run by the PKG authority; takes the attribute set S provided by the user and generates a secret key SK.
Encrypt: run by the sender or data owner; encrypts the plaintext message M under an access policy R and outputs the ciphertext CT.
Decrypt: run by the receiver; decrypts the ciphertext CT using the user's secret key SK.
The access policy R consists of expressions over boolean variables and threshold gates. AND and OR gates describe the relationship between the variables. The policy is evaluated against the attribute set S and returns 1 or 0 depending on whether S satisfies R or not.
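
Evaluating an access policy against an attribute set can be sketched as a small recursive function. This is only the boolean check, with no cryptography involved; the tuple encoding of the policy tree, the gate names and the sample attributes are assumptions made for the example rather than the format of any particular CP-ABE library.

```python
def satisfies(policy, attributes):
    """Return 1 if the attribute set satisfies the policy tree, else 0.

    A policy is an attribute name (leaf), a tuple ("AND", children),
    a tuple ("OR", children), or a tuple ("THRESHOLD", k, children).
    """
    if isinstance(policy, str):
        return int(policy in attributes)
    gate = policy[0]
    if gate == "THRESHOLD":
        _, k, children = policy
    elif gate in ("AND", "OR"):
        _, children = policy
        k = len(children) if gate == "AND" else 1  # AND is n-of-n, OR is 1-of-n
    else:
        raise ValueError(f"unknown gate: {gate}")
    return int(sum(satisfies(child, attributes) for child in children) >= k)

# Hypothetical policy R: doctor AND (cardiology OR 2-of-{senior, on_call, admin})
R = ("AND", ["doctor",
             ("OR", ["cardiology",
                     ("THRESHOLD", 2, ["senior", "on_call", "admin"])])])

print(satisfies(R, {"doctor", "cardiology"}))           # 1
print(satisfies(R, {"doctor", "senior", "on_call"}))    # 1
print(satisfies(R, {"doctor", "senior"}))               # 0
print(satisfies(R, {"cardiology", "senior", "admin"}))  # 0
```

Treating AND and OR as special cases of a k-of-n threshold gate keeps the evaluator to a single rule.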
A.7 CP-ABE Scheme Security Analysis
Definition: CPA security of a CP-ABE scheme. A CP-ABE scheme is said to be secure against chosen-plaintext attacks if no probabilistic polynomial-time adversary has a non-negligible advantage in the game below.
The security of a CP-ABE scheme is explained using a game between a challenger and an adversary.
Initial: The adversary chooses the access structure it wants to be challenged on and submits it to the challenger.
Setup: The challenger runs the Setup algorithm of CP-ABE.
Phase 1: The adversary makes secret key queries to the KeyGeneration algorithm using attribute sets S, with the restriction that S does not satisfy the access structure.
Challenge: The adversary submits two plaintexts $M_0$ and $M_1$ of equal length to the challenger. The challenger chooses $\mu \in \{0, 1\}$ at random and encrypts $M_\mu$ under the access structure to obtain the ciphertext CT. Finally, CT is passed to the adversary.
Phase 2: Same as Phase 1.
Guess: The adversary outputs a guess $\mu'$ for the value of $\mu$.
In the above CPA security game, the advantage of adversary $A$ is defined as
$\mathrm{Adv}^{\mathrm{CPA}}_{\mathrm{CP\text{-}ABE}}(A) = \left| \Pr[\mu = \mu'] - \tfrac{1}{2} \right|$ …(A.2)