Spatial Modeler Discussions

Gain, exchange, and share knowledge, or discuss topics with other Hexagon Geospatial Product pioneers and experts to get the most out of Hexagon Geospatial products.
Showing results for 
Search instead for 
Do you mean 
Reply
Frequent Contributor
Posts: 138
Registered: ‎11-10-2015
Accepted Solution

Connect the Spatial Modeler to Excel file, and validate the content

Hi all,

 

A customer has some different Excel file as input, and their need is to be able to connect a tool to these files, extract the 'valid' elements from the input files, generate a copy of all the 'invalid' elements with the reason of the invalidity to be able to correct the data before a new integration

 

The first idea was to develop a full C# dev to achieve it, but if Spatial Modeler can be configured to do the job, it is better

 

Is it possible to confgure Spatial Modeler for :

  • Connect .xlsx files as input?
  • Configure a valid Feature Schema (types of the attributes, max length, ...) and validate a feature input to be able to extract the 'valid' (conform to the fetaure schema) features ?
  • Extract all the invalid features
  • Export the invalid features

I have only seen a GeoCSV data input, and it is not possible to make something if the input data are only alphanumeric data (without geometry)

 

Regards

Technical Evangelist
Technical Evangelist
Posts: 434
Registered: ‎02-03-2016

Re: Connect the Spatial Modeler to Excel file, and validate the content

I am not aware of any means by which to bring Excel data into Spatial Modeler.  This would be an excellent Idea on the Ideation board for SM.

 

Something valuable to know is that with the 2022 this October we will introduce a CREATEPOINT function for the expression system used with the Generate Functional Attributes operator.  This will allow you to take any coordinate data stored as numbers (e.g. X, Y, Z attributes) and create point geometries from them.  It would be natural to use if your spreadsheet contains coordinates.

 

We must not confuse ingest of XLSX files with their simultaneous validation.  Attribute validation is also a reasonable Idea for the Ideation board, but it would be a separate operator, not part of Features Input (which would presumedly be used for input of XLSX).  Such an operator would require, as you indicate, a set of rules and the ability to identify those Features that violate the rules so that they can be exported.

 

Another thing valuable to know is that this may be achievable (depending on the complexity of the rules) using a new operator that will be delivered with the 2022 release this October called Classify Features.  This operator allows you to build a set of classification expressions using true/false statements, and it assigns each feature to one output class (or more if desired).  So you could create a classifying filter such as "LEN(TextAttribute)>40" and it would assign all features with a text value length greater than 40 to be classified as invalid.  Or maybe a numeric field needs to be a positive value or zero, then you could have a classifying filter of "NumericAttribute < 0.0".

 

But you must keep in mind that the data must first be served in a valid schema, so if you're looking to flag as invalid some text that is stored in a numeric column in Excel (or any other true schema mismatch), it will never make it into the model - once the field is declared as being of type number, it cannot store text for downstream validation.  - Hal

Frequent Contributor
Posts: 138
Registered: ‎11-10-2015

Re: Connect the Spatial Modeler to Excel file, and validate the content

Hi Hal

 

Thanks for the different elements

 

Ok for the input problem, XSLX can be added in the ideation box

 

But for this case, a 'real' validation is needed, and it is not in the roadmap for the moment. I have seen a 'Features Information' operator, and in my mind it wasn't really hard to implement a simple comparison between 2 different 'Features Information' output. With this operator, we are able today to know what is the attributes type from an unknown source (I have saved a simple .XLSX file in .CSV and created a GeoCSV input based on this file -> the 'Features Information' operator returns in the FieldTypes output port the correct types of the attributes (numeric, string, ...) excepted for the dates).

 

The idea is to be able to create (by hand or based on a table in a DB) a FieldTypes collection and compare it with the input file FieldTypes result. Of course, it will be useful to be able to loop on the input lines to check line after line if the comparison is correct (loop on the lines, create a 'Features Information' result for each line, and compare it with the model

 

Regards

 

 

 

 

Technical Evangelist
Technical Evangelist
Posts: 434
Registered: ‎02-03-2016

Re: Connect the Spatial Modeler to Excel file, and validate the content

Features Information operator is a metadata interrogation tool.  Apart from its ability (if requested) to count the number of features and to compute their footprint, it looks only at metadata, not data.  The metadata it is looking at was already assigned by the Features Input operator.  So in your example where it correctly guessed numeric and string data but failed to correctly identify date data, those decisions were made in Features Input.

 

Sometimes when reading from a data source, that data source has a formal metadata system (as with a database), and sometimes it does not (as with a CSV file).  When confronted with a CSV file or any other kind of "unstructured" data lacking an enforced schema, the reading software can only guess what the user intended.  Many times such software simply samples the first few rows to make a determination of field type (e.g. this is often true with GDAL).  This can of course lead to errors, and those errors can be of two kinds (at least):

 

  1. It decides on a field type that is too broad.  For example, it's really a date field but it decides to present it as a text field.  Well date values can still be stored in a text field so there is "loss" only at the metadata level not at the data level.
  2. It decides on a field type that is too narrow.  For example, it's really a text field but it decides to present it as a date field.  Many strings cannot be stored as dates, so some values will fail to comply to the schema and some data will be lost as it is served (usually null values are served instead).

If you are wanting to compare schema information that is output from the Features Information operator to a "template" or "example" schema you have invented, that is fine, I'm sure you can do so using existing operator tools to compare string values and string lists.  But it will still only be a schema comparison, not a comparison of data values to the schema.  This sort of idea you gave isn't possible:

 

Of course, it will be useful to be able to loop on the input lines to check line after line if the comparison is correct (loop on the lines, create a 'Features Information' result for each line, and compare it with the model.

 

Features Information gives only one answer for the entire Features dataset schema, it does not (cannot) vary on a per-line / per-feature basis.

 

If you had a way to force all incoming values to be captured into text fields (the most generic type that can handle everything else) then hypothetically you could test individual values in each field/row combination against some desired data type for the field.  I suspect that could be achieved through the Generate Functional Attributes operator.  But I don't know how to keep the data source reading software within Features Input from interpreting data types as something other than text.  - Hal

Frequent Contributor
Posts: 138
Registered: ‎11-10-2015

Re: Connect the Spatial Modeler to Excel file, and validate the content

Thanks Hal for the detailed answer

 

I will use a custom C# dev do achieve this input data validation before integration