{"id":297,"date":"2018-12-11T15:33:24","date_gmt":"2018-12-11T14:33:24","guid":{"rendered":"http:\/\/justmakeit.es\/?p=297"},"modified":"2018-12-11T15:33:24","modified_gmt":"2018-12-11T14:33:24","slug":"java-spark-hive","status":"publish","type":"post","link":"http:\/\/justmakeit.es\/?p=297","title":{"rendered":"Java + Spark + Hive"},"content":{"rendered":"<p>Lately I have been working on a Big Data project that required reading a set of Hive tables and converting them into other Hive tables containing only some of the original fields. Since I am still very green with Scala and its syntax still feels strange to me, I decided to do it in Java with Spark. I will probably try the same development in Scala once I have finished this one, so that I can compare the performance and behavior of the two.<br \/>\nSpecifically, we start from 14 source tables and have to build 40 new ones.<br \/>\nTo do this, we decided to create a JSON definition file describing the source and destination of each table; once processed, it lets us create and\/or alter the destination tables and insert the data from the sources.<\/p>\n<p>Below I will try to detail the steps followed to complete the development.<\/p>\n<p><strong>JSON definition file<\/strong><br \/>\n<code><br \/>\n{<br \/>\n  \"origin_db\": \"origin_db_name\",<br \/>\n  \"origin_tableName\" : \"origin_table_name\",<br \/>\n  \"l_origin_elements\": [<br \/>\n    { \"columnName\": \"field_1\" },<br \/>\n    { \"columnName\": \"field_2\" },<br \/>\n    { \"columnName\": \"field_3\" }<br \/>\n  ],<br \/>\n  \"where\": \"\",<br \/>\n  \"destination_db\": \"destination_db_name\",<br \/>\n  \"destination_table\": \"destination_table_name\",<br \/>\n  \"l_destination_elements\": [<br \/>\n    { \"columnName\": \"field_1\",<br \/>\n      \"alias\": 
\"destination_field_name\" },<br \/>\n    { \"columnName\": \"field_2\" },<br \/>\n    { \"columnName\": \"field_3\" }<br \/>\n  ]<br \/>\n}<br \/>\n<\/code><\/p>\n<p><strong>JSONReadObject.java<\/strong><br \/>\n<code><br \/>\nimport java.io.Serializable;<br \/>\nimport java.util.*;<br \/>\nimport com.fasterxml.jackson.annotation.JsonProperty;<\/p>\n<p>public class JSONReadObject implements Serializable{<\/p>\n<p>\t@JsonProperty<br \/>\n\tprivate String origin_db;<br \/>\n\t@JsonProperty<br \/>\n\tprivate String origin_tableName;<br \/>\n\t@JsonProperty<br \/>\n\tprivate List&lt;OriginElement&gt; l_origin_elements;<br \/>\n\t@JsonProperty<br \/>\n\tprivate String where;\t\/\/ conditions for filters<br \/>\n\t@JsonProperty<br \/>\n\tprivate String destination_db;<br \/>\n\t@JsonProperty<br \/>\n\tprivate String destination_table;<br \/>\n\t@JsonProperty<br \/>\n\tprivate List&lt;DestinationElement&gt; l_destination_elements;<\/p>\n<p>\t\/\/ These two attributes are NOT loaded from the JSON;<br \/>\n\t\/\/ they are filled with the structure (ID, XXXElement) after the JSON has been read<br \/>\n\tprivate SortedMap&lt;String, OriginElement&gt; m_origin;<br \/>\n\tprivate SortedMap&lt;String, DestinationElement&gt; m_destination;<\/p>\n<p>\tpublic JSONReadObject() {<br \/>\n\t\tsuper();<br \/>\n\t}<\/p>\n<p>\tpublic void createMaps(){<br \/>\n\t\t\/\/ build the origin and destination column maps,<br \/>\n\t\t\/\/ sorted and keyed by ID<br \/>\n\t\t\/\/ (destListToMap and oriListToMap are helper methods not shown in this post)<br \/>\n\t\tthis.m_destination = this.destListToMap(l_destination_elements);<br \/>\n\t\tthis.m_origin = this.oriListToMap(this.l_origin_elements);<br \/>\n\t}<\/p>\n<p>\tpublic SortedMap&lt;String,String&gt; getListColumns(){<br \/>\n\t\tSortedMap&lt;String, String&gt; columns = new TreeMap&lt;String, String&gt;();<br \/>\n\t\tfor(OriginElement origin: this.getL_origin_elements()) {<\/p>\n<p>\t\t\t\/\/ guard against origin columns that have no matching destination entry<br \/>\n\t\t\tDestinationElement dest = this.getM_destination().get(origin.getColumnName());<br \/>\n\t\t\tString alias = (dest != null) ? dest.getAlias() : null;<br \/>\n\t\t\tif( alias != null &amp;&amp; !alias.isEmpty()) {<br 
\/>\n\t\t\t\tcolumns.put(this.getM_destination().get(origin.getColumnName()).getAlias().toUpperCase(), origin.getDataType());<br \/>\n\t\t\t}else{<br \/>\n\t\t\t\tcolumns.put(origin.getColumnName().toUpperCase(), origin.getDataType());<br \/>\n\t\t\t}<br \/>\n\t\t}<br \/>\n\t\treturn columns;<br \/>\n\t}<\/p>\n<p>\t\/\/ @Override<br \/>\n\t\/\/ public String toString() {<\/p>\n<p>\t\/\/ getters and setters<br \/>\n}<br \/>\n<\/code><\/p>\n<p><b>OriginElement.java<\/b><br \/>\n<code><br \/>\nimport java.io.Serializable;<\/p>\n<p>public class OriginElement implements Serializable {<\/p>\n<p>\tprivate String columnName;<br \/>\n\tprivate String dataType;\t\t\/\/ this value is NOT read from the JSON; it is retrieved from the referenced table<\/p>\n<p>\t\/\/ getters and setters<br \/>\n\t\/\/ default constructor<br \/>\n}<br \/>\n<\/code><br \/>\n<b>DestinationElement.java<\/b><br \/>\n<code><br \/>\nimport java.io.Serializable;<\/p>\n<p>public class DestinationElement implements Serializable {<\/p>\n<p>\tprivate String columnName;<br \/>\n\tprivate String alias;<\/p>\n<p>\t\/\/ getters and setters<br \/>\n\t\/\/ default constructor<br \/>\n}<br \/>\n<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Lately I have been working on a Big Data project that required reading &hellip; <a href=\"http:\/\/justmakeit.es\/?p=297\" class=\"btn btn-readmore\">Read More <span class=\"screen-reader-text\"> \u00abJava + Spark + 
Hive\u00bb<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":304,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5,14],"tags":[36,35,34],"class_list":["post-297","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-java","category-programacion","tag-java","tag-json","tag-spark"],"_links":{"self":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/297","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=297"}],"version-history":[{"count":7,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/297\/revisions"}],"predecessor-version":[{"id":309,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/297\/revisions\/309"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/media\/304"}],"wp:attachment":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=297"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=297"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=297"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}