{"id":347,"date":"2019-12-26T13:06:27","date_gmt":"2019-12-26T12:06:27","guid":{"rendered":"http:\/\/justmakeit.es\/?p=347"},"modified":"2019-12-26T13:08:22","modified_gmt":"2019-12-26T12:08:22","slug":"modificar-el-schema-de-una-tabla-dinamicamente","status":"publish","type":"post","link":"http:\/\/justmakeit.es\/?p=347","title":{"rendered":"Modifying a table schema dynamically"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"269\" src=\"http:\/\/justmakeit.es\/wp-content\/uploads\/2019\/12\/mongo_spark-1024x269.png\" alt=\"\" class=\"wp-image-349\" srcset=\"http:\/\/justmakeit.es\/wp-content\/uploads\/2019\/12\/mongo_spark-1024x269.png 1024w, http:\/\/justmakeit.es\/wp-content\/uploads\/2019\/12\/mongo_spark-300x79.png 300w, http:\/\/justmakeit.es\/wp-content\/uploads\/2019\/12\/mongo_spark-768x202.png 768w, http:\/\/justmakeit.es\/wp-content\/uploads\/2019\/12\/mongo_spark.png 1351w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>The following exercise works in Spark 2.3.<\/p>\n\n\n\n<p>First, create a table whose schema declares every column as String:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>spark.sql(\"CREATE TABLE IF NOT EXISTS \" + databaseName + \".\" + tableName + \" (col1 String, col2 String, col3 String) STORED AS PARQUET\");<\/code><\/pre>\n\n\n\n<p>Running the CREATE TABLE statement does not give us the newly created schema: the result it returns has an empty schema, which is no use for building the Dataset. 
We therefore need to run something like the following next:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>StructType schema = spark.sql(\"SELECT * FROM \" + databaseName + \".\" + tableName + \" LIMIT 1\").schema();<\/code><\/pre>\n\n\n\n<p>With the schema we have just retrieved, we can build a Dataset from the rows loaded from whatever source we are using:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Dataset&lt;Row&gt; dataset = spark.createDataFrame(cols, schema);<\/code><\/pre>\n\n\n\n<p>In our case the cols object is a JavaRDD&lt;Row&gt; built from data read from MongoDB, but it could come from any other source.<\/p>\n\n\n\n<p>Once the Dataset has been created, we can change the data type of some of its columns as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dataset = dataset.withColumn(\"col1\", dataset.col(\"col1\").cast(DataTypes.LongType));\ndataset = dataset.withColumn(\"col2\", dataset.col(\"col2\").cast(DataTypes.IntegerType));<\/code><\/pre>\n\n\n\n<p>Finally, we persist the data to Hive, writing to the same table we created at the start:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>dataset.write().mode(SaveMode.Overwrite).format(\"parquet\").saveAsTable(databaseName + \".\" + tableName);<\/code><\/pre>\n\n\n\n<p>Separately, the connection from Spark to MongoDB is set up as follows:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>String spark_mongodb_input_uri = \"mongodb:\/\/\" + user + \":\" + password + \"@\" + host + \":\" + port + \"\/\" + authenticationDatabaseName;\n\nSparkSession spark = SparkSession.builder()\n   \/\/ .master(\"local[6]\") \/\/ only for debugging\n   .appName(\"mongo-db-connector\").enableHiveSupport()\n   .config(\"spark.mongodb.input.uri\", spark_mongodb_input_uri)\n   .config(\"spark.mongodb.input.database\", spark_mongodb_input_database)\n   .config(\"spark.mongodb.input.collection\", spark_mongodb_input_collection)\n   .getOrCreate();\n\nJavaSparkContext jsc = new 
JavaSparkContext(spark.sparkContext());\n\n\/* Start Example: Read data from MongoDB ************************\/\nJavaMongoRDD&lt;Document&gt; rdd = MongoSpark.load(jsc);\n\/* End Example **************************************************\/<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The following exercise works in Spark 2.3. First, create a table whose schema declares every column as String &hellip; <a href=\"http:\/\/justmakeit.es\/?p=347\" class=\"btn btn-readmore\">Read More <span class=\"screen-reader-text\"> \u00abModifying a table schema dynamically\u00bb<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[40,36,41,34],"class_list":["post-347","post","type-post","status-publish","format-standard","hentry","category-programacion","tag-bigdata","tag-java","tag-mongodb","tag-spark"],"_links":{"self":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/347","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=347"}],"version-history":[{"count":2,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/347\/revisions"}],"predecessor-version":[{"id":350,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/347\/revisions\/350"}],"wp:attachment":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=347"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fcategori
es&post=347"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=347"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}