{"id":428,"date":"2026-02-16T15:01:52","date_gmt":"2026-02-16T14:01:52","guid":{"rendered":"http:\/\/justmakeit.es\/?p=428"},"modified":"2026-02-16T15:01:52","modified_gmt":"2026-02-16T14:01:52","slug":"borrando-datos-en-hdfs-con-scala-spark","status":"publish","type":"post","link":"http:\/\/justmakeit.es\/?p=428","title":{"rendered":"Deleting data in HDFS with Scala\/Spark"},"content":{"rendered":"\n<p>How should we delete data in HDFS? <\/p>\n\n\n\n<p>It depends on whether we want to delete all the data in a partition or only part of it. In the first case, we should do something like this:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import org.apache.hadoop.fs.{FileSystem, Path}<br>import org.apache.spark.sql.SparkSession<br><br>\/\/ logger, TYPE_1 and TYPE_2 are assumed to be defined in the enclosing scope<br>def deleteHDFS(spark: SparkSession,<br>                   entityName: String,<br>                   configMap: Map[String, String],<br>                   partitionDate: String): Unit = {<br><br>    val hadoopConf = spark.sparkContext.hadoopConfiguration<br>    val fs = FileSystem.get(hadoopConf)<br><br>    \/\/ Build the array of table names from the value of entityName<br><br>    val tableName: Array[String] = entityName match {<br>    \/\/ If we need to delete a single \"table\"<br>      case TYPE_1 => Array(configMap(\"kuduTableName1\").split(\"\\\\.\")(1))<br>    \/\/ If we need to delete two or more \"tables\"<br>      case TYPE_2 => Array(configMap(\"kuduTableName1\").split(\"\\\\.\")(1),<br>                           configMap(\"kuduTableName2\").split(\"\\\\.\")(1))<br>      case _ => Array.empty[String] \/\/ Empty array if nothing matches<br>    }<br><br>    \/\/ For each table name to process, delete the partition for the given date<br>    tableName.foreach { name =><br>      \/\/ Path of the partition in HDFS. 
In our case, the paths match the names in tableName<br>      <strong>val partitionPath = new Path(configMap(\"pathRoot\") + \"\/\" + name.toUpperCase + \"\/date=\" + partitionDate)<\/strong><br>      logger.info(\"partitionPath = \" + partitionPath)<br><br>      try {<br>        logger.info(\"The folder will be deleted if it exists\")<br><br>        if (fs.exists(partitionPath)) {<br>          <strong>fs.delete(partitionPath, true)<\/strong> \/\/ true = recursive delete<br>          logger.info(s\"Partition deleted: $partitionPath\")<br>        } else {<br>          logger.info(s\"$partitionPath does not exist!\")<br>        }<br>      } catch {<br>        case ce: Exception =><br>          logger.error(\"An error has occurred trying to remove an HDFS path\", ce)<br>      }<br>    }<br>}<\/pre>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>How should we delete data in HDFS? 
It depends on whether we want to delete all the data in a partition &hellip; <a href=\"http:\/\/justmakeit.es\/?p=428\" class=\"btn btn-readmore\">Read More <span class=\"screen-reader-text\"> \u00abDeleting data in HDFS with Scala\/Spark\u00bb<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-428","post","type-post","status-publish","format-standard","hentry","category-cosos"],"_links":{"self":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/428","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=428"}],"version-history":[{"count":1,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/428\/revisions"}],"predecessor-version":[{"id":429,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/428\/revisions\/429"}],"wp:attachment":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=428"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=428"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=428"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}