a week ago
A customer has several different Excel files as input. They need to connect a tool to these files, extract the 'valid' elements from the input files, and generate a copy of all the 'invalid' elements, with the reason for the invalidity, so that the data can be corrected before a new integration.
The first idea was to develop a full C# application to achieve this, but if Spatial Modeler can be configured to do the job, that would be better.
Is it possible to configure Spatial Modeler for this?
I have only seen a GeoCSV data input, and it does not seem possible to do anything if the input data are purely alphanumeric (without geometry).
a week ago
I am not aware of any means by which to bring Excel data into Spatial Modeler. This would be an excellent Idea on the Ideation board for SM.
Something valuable to know is that with the 2022 release this October we will introduce a CREATEPOINT function for the expression system used with the Generate Functional Attributes operator. This will allow you to take any coordinate data stored as numbers (e.g. X, Y, Z attributes) and create point geometries from them. It would be natural to use if your spreadsheet contains coordinates.
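To illustrate the idea (this is plain Python, not the Spatial Modeler expression language; the function name and record layout are illustrative, only the concept of building points from numeric X/Y/Z attributes comes from the post above):

```python
# Conceptual sketch of CREATEPOINT-style logic: turn numeric X/Y(/Z)
# attributes on each record into a point geometry. The names here are
# illustrative assumptions, not the Spatial Modeler API.

def create_point(record, x_field="X", y_field="Y", z_field=None):
    """Build a simple point tuple from numeric coordinate attributes."""
    x = float(record[x_field])
    y = float(record[y_field])
    if z_field is not None:
        return (x, y, float(record[z_field]))
    return (x, y)

rows = [
    {"X": "101.5", "Y": "47.2", "Z": "350.0", "Name": "SiteA"},
    {"X": "102.0", "Y": "46.8", "Z": "345.5", "Name": "SiteB"},
]
points = [create_point(r, z_field="Z") for r in rows]
print(points)  # [(101.5, 47.2, 350.0), (102.0, 46.8, 345.5)]
```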
We must not confuse ingest of XLSX files with their simultaneous validation. Attribute validation is also a reasonable Idea for the Ideation board, but it would be a separate operator, not part of Features Input (which would presumably be used for input of XLSX). Such an operator would require, as you indicate, a set of rules and the ability to identify those Features that violate the rules so that they can be exported.
Another thing valuable to know is that this may be achievable (depending on the complexity of the rules) using a new operator that will be delivered with the 2022 release this October called Classify Features. This operator allows you to build a set of classification expressions using true/false statements, and it assigns each feature to one output class (or more if desired). So you could create a classifying filter such as "LEN(TextAttribute)>40" and it would assign all features with a text value length greater than 40 to be classified as invalid. Or maybe a numeric field needs to be a positive value or zero, then you could have a classifying filter of "NumericAttribute < 0.0".
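The Classify Features behavior described above can be sketched as follows (plain Python standing in for the operator's expression system; the rule names and attribute names are illustrative, only the two example filters come from the post):

```python
# Illustrative sketch, not the Classify Features operator itself: a set of
# named true/false rules, with each feature assigned to every class whose
# rule matches, mirroring "LEN(TextAttribute)>40" and
# "NumericAttribute < 0.0" from the post above.

rules = {
    "text_too_long":  lambda f: len(f.get("TextAttribute", "")) > 40,
    "negative_value": lambda f: f.get("NumericAttribute", 0.0) < 0.0,
}

def classify(feature, rules):
    """Return the list of rule names (classes) the feature falls into."""
    return [name for name, rule in rules.items() if rule(feature)]

features = [
    {"TextAttribute": "ok", "NumericAttribute": 3.0},
    {"TextAttribute": "x" * 50, "NumericAttribute": -1.0},
]
results = [classify(f, rules) for f in features]
print(results)  # [[], ['text_too_long', 'negative_value']]
```

Any feature matching at least one rule would then be routed to the "invalid" output for export.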
But you must keep in mind that the data must first be served in a valid schema, so if you're looking to flag as invalid some text that is stored in a numeric column in Excel (or any other true schema mismatch), it will never make it into the model - once the field is declared as being of type number, it cannot store text for downstream validation. - Hal
a week ago
Thanks for these different elements.
OK for the input problem; XLSX support can be added to the Ideation board.
But for this case, a 'real' validation is needed, and it is not on the roadmap for the moment. I have seen a 'Features Information' operator, and to my mind it would not be very hard to implement a simple comparison between two different 'Features Information' outputs. With this operator, we are able today to determine the attribute types from an unknown source (I saved a simple .XLSX file as .CSV and created a GeoCSV input based on this file; the 'Features Information' operator returns the correct attribute types (numeric, string, ...) in the FieldTypes output port, except for the dates).
The idea is to be able to create (by hand or based on a table in a DB) a FieldTypes collection and compare it with the FieldTypes result from the input file. Of course, it would be useful to be able to loop over the input lines to check, line after line, whether the comparison is correct (loop over the lines, create a 'Features Information' result for each line, and compare it with the model).
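The schema-comparison part of this idea can be sketched like so (plain Python, not Spatial Modeler; the field names and type labels are illustrative assumptions, and this compares only schemas, not data values):

```python
# Sketch of comparing a reference FieldTypes collection (hand-made or from
# a DB table) against the FieldTypes observed on an input, reporting each
# difference with a reason.

def compare_schemas(expected, observed):
    """Return a list of human-readable schema mismatches."""
    problems = []
    for field, exp_type in expected.items():
        obs_type = observed.get(field)
        if obs_type is None:
            problems.append(f"{field}: missing from input")
        elif obs_type != exp_type:
            problems.append(f"{field}: expected {exp_type}, got {obs_type}")
    for field in observed:
        if field not in expected:
            problems.append(f"{field}: unexpected extra field")
    return problems

expected = {"Id": "numeric", "Name": "string", "Created": "date"}
observed = {"Id": "numeric", "Name": "string", "Created": "string"}
print(compare_schemas(expected, observed))
# ['Created: expected date, got string']
```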
a week ago
The Features Information operator is a metadata interrogation tool. Apart from its ability (if requested) to count the number of features and to compute their footprint, it looks only at metadata, not data. The metadata it looks at was already assigned by the Features Input operator. So in your example, where it correctly guessed numeric and string data but failed to correctly identify date data, those decisions were made in Features Input.
Sometimes when reading from a data source, that data source has a formal metadata system (as with a database), and sometimes it does not (as with a CSV file). When confronted with a CSV file or any other kind of "unstructured" data lacking an enforced schema, the reading software can only guess what the user intended. Many times such software simply samples the first few rows to make a determination of field type (e.g. this is often true with GDAL). This can of course lead to errors, and those errors can be of (at least) two kinds.
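The sample-the-first-rows behavior, and the kind of error it can produce, can be sketched like this (plain Python, not GDAL or Features Input; the sample size and type labels are illustrative assumptions):

```python
# Sketch of first-rows type guessing: sample the first N values of a
# column and call it "numeric" only if every sampled value parses as a
# number. A bad value appearing after the sampled rows goes undetected.

def guess_type(values, sample_size=5):
    sample = values[:sample_size]
    def is_number(v):
        try:
            float(v)
            return True
        except ValueError:
            return False
    return "numeric" if sample and all(is_number(v) for v in sample) else "string"

col = ["1", "2", "3", "4", "5", "oops"]  # text value past the sample window
print(guess_type(col))  # 'numeric' -- the late text value was never examined
```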
If you want to compare schema information output from the Features Information operator to a "template" or "example" schema you have invented, that is fine; I'm sure you can do so using existing operator tools to compare string values and string lists. But it will still only be a schema comparison, not a comparison of data values to the schema. This part of your idea isn't possible:
Of course, it will be useful to be able to loop on the input lines to check line after line if the comparison is correct (loop on the lines, create a 'Features Information' result for each line, and compare it with the model.
Features Information gives only one answer for the entire Features dataset schema; it does not (and cannot) vary on a per-line / per-feature basis.
If you had a way to force all incoming values to be captured into text fields (the most generic type, which can handle everything else), then hypothetically you could test individual values in each field/row combination against some desired data type for the field. I suspect that could be achieved through the Generate Functional Attributes operator. But I don't know how to keep the data-source reading software within Features Input from interpreting data types as something other than text. - Hal
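That last suggestion (read everything as text, then validate each value against the type the reference schema expects) can be sketched roughly as follows. This is plain Python, not Spatial Modeler; the schema, type labels, date format, and field names are all illustrative assumptions:

```python
# Sketch of per-value validation when every field arrives as text: each
# string value is tested against the type the reference schema says the
# field should hold, and failures are collected with their row and field.

from datetime import datetime

def value_matches(value, type_name):
    """True if the text value can be interpreted as the given type."""
    if type_name == "numeric":
        try:
            float(value)
            return True
        except ValueError:
            return False
    if type_name == "date":
        try:
            datetime.strptime(value, "%Y-%m-%d")  # assumed date format
            return True
        except ValueError:
            return False
    return True  # "string" accepts anything

schema = {"Id": "numeric", "Created": "date"}
rows = [
    {"Id": "12", "Created": "2021-10-01"},
    {"Id": "twelve", "Created": "2021-13-01"},
]
invalid = [
    (i, field, row[field])
    for i, row in enumerate(rows)
    for field, t in schema.items()
    if not value_matches(row[field], t)
]
print(invalid)  # [(1, 'Id', 'twelve'), (1, 'Created', '2021-13-01')]
```

The `invalid` list is exactly what the original question asked for: the offending rows plus the reason (field and value) so the source data can be corrected before re-integration.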