{"id":399,"date":"2024-05-09T15:43:41","date_gmt":"2024-05-09T14:43:41","guid":{"rendered":"http:\/\/justmakeit.es\/?p=399"},"modified":"2024-05-09T15:44:51","modified_gmt":"2024-05-09T14:44:51","slug":"lectura-de-elastic-search-con-scala","status":"publish","type":"post","link":"http:\/\/justmakeit.es\/?p=399","title":{"rendered":"Reading from Elasticsearch with Scala"},
"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><a href=\"http:\/\/justmakeit.es\/wp-content\/uploads\/2024\/05\/EKS.png\"><img loading=\"lazy\" decoding=\"async\" width=\"480\" height=\"105\" src=\"http:\/\/justmakeit.es\/wp-content\/uploads\/2024\/05\/EKS.png\" alt=\"\" class=\"wp-image-402\" srcset=\"http:\/\/justmakeit.es\/wp-content\/uploads\/2024\/05\/EKS.png 480w, http:\/\/justmakeit.es\/wp-content\/uploads\/2024\/05\/EKS-300x66.png 300w\" sizes=\"auto, (max-width: 480px) 100vw, 480px\" \/><\/a><\/figure>\n\n\n\n<p>This post shows a method for querying data in Elasticsearch from Scala. You need to supply the query to run, which will look similar to the following:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\n\/\/ Query DSL kept in a triple-quoted String, so the inner double quotes need no escaping\nval query = \"\"\"{\"query\":\n  {\"bool\":\n    {\"must\": &#91;\n      { \"match\": { \"condicion1\": \"valor1\" }},\n      { \"match\": { \"condicion2\": \"valor2\" }},\n      { \"bool\": { \"should\": &#91; { \"match\": { \"condicion3\": \"valor3\" }},\n                              { \"match\": { \"condicion4\": \"valor4\" }}\n                            ]\n                }\n      }]\n    }\n  }\n}\"\"\"<\/code><\/pre>\n\n\n\n<p>Ideally, the query lives in a properties file. Otherwise it has to be embedded in the code as a String: a regular string literal requires escaping every double quote as \\\", whereas a triple-quoted literal like the one above needs no escaping.<\/p>\n\n\n\n<p>To build the Array[Column] with the columns to retrieve from the query, I have included an example. Its JSON String would again need its double quotes escaped to be valid as a regular literal, so it is also triple-quoted. The column names that the method logs would look like: col1;col2;col3;col4;col5;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import com.google.gson.{Gson, JsonArray, JsonObject}\nimport org.apache.spark.sql.{Column, DataFrame, SparkSession}\nimport org.apache.spark.sql.types.StringType\n\n\/\/ logger is assumed to be defined in the enclosing class (for example, an SLF4J Logger)\n\nval schema = getColumnsFromES(\"\"\"{\"properties\":  &#91;\n    {\"name\":\"col1\",\"type\":\"keyword\",\"alias\":\"alias1\",\"isArray\":\"false\"},\n    {\"name\":\"col2\",\"type\":\"keyword\",\"alias\":\"alias2\",\"isArray\":\"false\"},\n    {\"name\":\"col3\",\"type\":\"keyword\",\"alias\":\"alias3\",\"isArray\":\"false\"},\n    {\"name\":\"col4\",\"type\":\"keyword\",\"alias\":\"alias4\",\"isArray\":\"false\"},\n    {\"name\":\"col5\",\"type\":\"keyword\",\"alias\":\"alias5\",\"isArray\":\"false\"}]}\"\"\")\n\n\/\/ Builds the columns to select from a JSON description of the index fields\ndef getColumnsFromES(schemaES: String): Array&#91;Column] = {\n    var result = Array.empty&#91;Column]\n    val jsonObject = new Gson().fromJson(schemaES, classOf&#91;JsonObject])\n    val jsonArray: JsonArray = jsonObject.getAsJsonArray(\"properties\")\n    var logColumns = \"\"\n    jsonArray.forEach(\n      elem =&gt; {\n        val name = elem.getAsJsonObject.get(\"name\").getAsString\n        val alias = elem.getAsJsonObject.get(\"alias\").getAsString\n        logColumns = logColumns + name + \";\"\n        \/\/ Array fields keep their type; every other field is cast to String\n        if (elem.getAsJsonObject.get(\"isArray\").getAsString == \"true\") {\n          result = result :+ new Column(name).alias(alias)\n        } else {\n          result = result :+ new Column(name).cast(StringType).alias(alias)\n        }\n      }\n    )\n    logger.info(\"Columns for json metadata mapping:\" + logColumns)\n    result\n}\n\n\/\/ Reads the documents matching the query and returns the selected columns\ndef getInfoFromElasticSearch(\n      spark: SparkSession,\n      hostname: String,             \/\/ es.nodes: list of Elasticsearch hosts, e.g. eks_master\n      port: String,                 \/\/ es.port: connection port, e.g. \"9200\"\n      user: String,                 \/\/ es.net.http.auth.user: connection user\n      pass: String,                 \/\/ es.net.http.auth.pass: user password\n      query: String,                \/\/ es.query: query\n      schema: Array&#91;Column],        \/\/ columns to select\n      collections: Array&#91;String]    \/\/ es.read.field.as.array.include\n    ): DataFrame = {\n\n    var df: DataFrame = null\n    try {\n      \/\/ fill the options to read\n      var jMap = Map&#91;String, String]()\n      jMap += (\"es.nodes\" -&gt; hostname)\n      jMap += (\"es.port\" -&gt; port)\n      jMap += (\"es.net.http.auth.user\" -&gt; user)\n      jMap += (\"es.net.http.auth.pass\" -&gt; pass)\n      jMap += (\"es.read.field.empty.as.null\" -&gt; \"no\")\n      jMap += (\"es.nodes.wan.only\" -&gt; \"true\")\n      jMap += (\"es.mapping.date.rich\" -&gt; \"false\")\n      jMap += (\"es.scroll.size\" -&gt; \"5000\")\n      jMap += (\"es.input.max.docs.per.partition\" -&gt; \"10000\")\n      jMap += (\"es.query\" -&gt; query)\n      if (collections != null &amp;&amp; collections.length &gt; 0) {\n        jMap += (\"es.read.field.as.array.include\" -&gt; collections.mkString(\",\"))\n      }\n\n      df = spark\n        .read\n        .format(\"org.elasticsearch.spark.sql\")\n        .options(jMap)          \/\/ Map&#91;String, String]\n        .load(\"trace.form.cont.uti.corp.*\")\n        .select(schema: _*)     \/\/ Array&#91;Column]\n    } catch {\n      case ce: Exception =&gt;\n        \/\/ the error is logged and df stays null\n        logger.error(\"Error reading data from elasticsearch\", ce)\n    }\n\n    df\n}<\/code><\/pre>\n","protected":false},
"excerpt":{"rendered":"<p>Method for running queries against Elasticsearch with Scala<\/p>\n","protected":false},
"author":1,"featured_media":402,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,52],"tags":[53,54,34],"class_list":["post-399","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cosos","category-scala","tag-elastic-search","tag-scala","tag-spark"],"_links":{"self":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/399","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=399"}],"version-history":[{"count":8,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/399\/revisions"}],"predecessor-version":[{"id":408,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/posts\/399\/revisions\/408"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=\/wp\/v2\/media\/402"}],"wp:attachment":[{"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=399"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=399"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/justmakeit.es\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=399"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}