From the Development Team

R is one of the primary programming languages for data science, with more than 10,000 packages. R is open source software that is widely taught in colleges and universities as part of statistics and computer science curricula. R uses the data frame as its API, which makes data manipulation convenient. R has a powerful visualization infrastructure, […]

Large-scale Machine Learning. Machine learning, the ability to learn without being explicitly programmed, has been around for a long time and is well understood. What is different is the relatively recent emergence of general-purpose tools, such as Apache Spark, that enable processing of very large datasets. Additionally, data scientists can now collaborate and rapidly […]

The 2014 Yahoo email hack is a good illustration of how a big data security analytics platform such as Apache Metron can make it easier to detect, investigate, assess, and remediate threats in your environment. In this article I will describe how to set up and configure Apache Metron to detect a recent cyber attack on Yahoo, […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we introduced what a Data Lake 3.0 is, and in part 2 of the series, we talked about how a multi-colored YARN will play a critical role in building a successful Data Lake 3.0. In this blog, we will take a […]

Thank you for reading our Data Lake 3.0 series! In part 1 of the series, we briefly introduced the power of leveraging prepackaged applications in Data Lake 3.0 and how the focus will shift from the platform management to solving the business problems. In this post, we further deliberate on this idea to help answer […]

The new year brings new innovation and collaborative efforts. Various teams from the Apache community have been working hard for the last eighteen months to bring the EZ button to Apache Hadoop technology and Data Lake. In the coming months, we will publish a series of blogs introducing our Data Lake 3.0 architecture and highlighting […]

Welcome back to my blog series, the CISO’s View. In my last article, CISO’s View: metrics part 1, we started looking at metrics and why they are the foundation of a successful security program. Today, we’ll look at how we derive metrics that communicate value in a way that’s tied to the company strategy. Hopefully […]

Welcome back to my blog series, the CISO’s View. In my last article, CISO’s View: Why an integrated approach matters, I stirred up the waters a bit by stating that the CISO’s first and most fundamental job is taking all this security data (threats, vulnerabilities, policy violations) and transforming it into business language that shows […]

We are very excited about the release of Apache Zeppelin 0.7.0 and want to thank the Apache Software Foundation along with the Apache Zeppelin community. The long-awaited release introduces several key features, highlighted below. The most notable improvements in this release are in the areas of multi-user enhancements, pluggable visualization, Apache Spark & security […]

Apache Spark 2.1 was recently released by the community. The main focus of this release was improvements in Structured Streaming and Machine Learning. Structured Streaming: Kafka 0.10 support, metrics and stability improvements. Machine Learning: SparkR improvements, including new ML algorithms for LDA, random forests, GMM, etc. Wanna try Spark 2.1 now? Well, you are in […]

We recently concluded our well-attended How to Get Started with Hortonworks Data Cloud for AWS webinars. Thank you to Jeff Sposetti and Sean Roberts for hosting the sessions. The webinars provided a very informative overview of the offering and included a detailed demonstration of how the product works. Some great questions came across during […]

Originally posted on HCC. 1. Introduction: NiFi is a powerful, easy-to-use technology for building dataflows from diverse sources to diverse targets while transforming and dynamically routing data in between. NiFi is packaged in HDF 2.0, which (in addition to bundling Kafka and Storm for a complete data movement platform) pushes NiFi to enterprise […]

Apache Spark has been open source’s new kid on the block. Companies are using Spark to develop sophisticated models that enable them to discover new opportunities or avoid risk. But what does the future, or at least the near future, hold for Spark? In this blog we have outlined five trends we see in […]

It has been another exciting week on Hortonworks Community Connection (HCC). We continue to see great activity and recommend the following assets from last week. Top Articles from HCC: One Way Trust – MIT KDC to Active Directory, by emaxwell. Many security environments have strict policies on […]

It has been another exciting week on Hortonworks Community Connection (HCC). We continue to see great activity and recommend the following assets from last week. Top Articles from HCC: Supporting Custom Properties for Expression Language in Apache NiFi, by ydavis. NiFi has previously supported the ability to refer to flow file attributes, system properties and environment […]