March 26, 2018

How Hotels.com Migrates Big Datasets to the Cloud

We had a very successful DataWorks Summit Berlin, highlighted by a number of impressive keynote and breakout speakers. These speakers included Adrian Woodhead, Principal Engineer, and Elliot West, Senior Engineer, at Hotels.com, who presented within the Data Processing and Warehousing track. Hotels.com is an affiliate of Expedia Inc. and is a website for booking hotel rooms online and by telephone. The company has 85 websites in 34 languages and lists over 325,000 hotels in approximately 19,000 locations. Its inventory includes everything from international chains and all-inclusive resorts to local favorites, bed & breakfasts, condos and other types of commercial lodging. The website provides all the information needed to book the perfect stay.

The title of Hotels.com’s breakout session was “Tools and Approaches for Migrating Big Datasets to the Cloud.” The presentation covers the journey taken by the big data platform team when it was tasked with migrating big data sets and pipelines from on-premises clusters to cloud-based platforms. It also introduces two open source tools that the team built to overcome the unexpected challenges it faced.

From the breakout session abstract:

“The first of these tools is Circus Train, a dataset replication tool that copies Hive tables between clusters and clouds. The second tool is Waggle Dance, a federated Hive query service that enables querying of data stored across multiple Hive metastores. Giving real world examples, we will describe how we’ve used these tools to successfully build a petabyte scale platform that is now also being used by other brands within the Expedia organisation.”

In the hospitality industry, building a 360-view of the customer is crucial. This enables organizations to interact with customers across multiple channels. Organizations use predictive analytics to glean information from their data to find connections and relationships in customer behavior, improve processes to more closely align with buyer patterns, and ultimately improve customer experiences.

Be sure to check out Hotels.com’s presentation to learn which technologies are in place and how the business continues its Big Data journey. The goal of the session was to help others who are early in their own journey build a solid foundation. It is a breakout session you will not want to miss!

To access the rest of the breakout sessions, visit:
For more customer use cases, visit:


When asked what they were looking forward to most before attending DataWorks Summit, the Hotels.com team said:

“Hotels.com’s data teams are engaged in an epic migration journey, moving our on-premises data processing to the cloud. Along the way we’ve learnt a lot and developed tools that have proven very useful. Our hope is that by open sourcing these and presenting them at the DataWorks Summit, we can encourage others in the big data community to join us by contributing code, ideas, comments and constructive criticism. We hope to engage with other cloud-bound travelers attending the summit and share war stories, good experiences and hopefully find common patterns and approaches that make all our lives easier.”
