Thursday, 23 June 2016

Parsing fixed width text with Java

...the unnecessarily complicated way.

Recently I was asked to parse text files with fixed-width columns (see also ISAM).
The task is menial, but I had two different schemas and particularly large files (77 fields and about 21,000 records).
The schemas were described in a table with the name and the width of each field.
Here's what I did:

  1. pasted the table into Excel
  2. used a formula to clean up the field names (removing spaces, hyphens, parentheses, whatever)
  3. used another formula to build a Java declaration for each field
  4. created a class (with Lombok) with the record structure

Here's a (short) example.

field name   field width   field name (Java)   Java declaration
flg-store    1             flg_store           @Column(width=1, index=47) private String flg_store;
Code NEW     10            code_new            @Column(width=10, index=58) private String code_new;
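The two Excel formulas (name cleanup and declaration building) could also be written in Java. What follows is only a sketch of that idea, not the formulas actually used: the class and method names are hypothetical, and the cleanup rule (lowercase, every run of non-alphanumeric characters replaced by one underscore) is an assumption inferred from the two rows above.

```java
import java.util.Locale;

// Hypothetical helper mirroring steps 2 and 3 of the list above.
final class DeclarationBuilder {

    // Assumed rule: lowercase, collapse anything that is not a letter or digit
    // into a single underscore, then drop leading/trailing underscores.
    static String toJavaName(String fieldName) {
        return fieldName.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "_")
                .replaceAll("^_+|_+$", "");
    }

    // Builds the annotated field declaration for one row of the schema table.
    static String toDeclaration(String fieldName, int width, int index) {
        return "@Column(width=" + width + ", index=" + index + ") private String "
                + toJavaName(fieldName) + ";";
    }

    public static void main(String[] args) {
        System.out.println(toDeclaration("flg-store", 1, 47));
        System.out.println(toDeclaration("Code NEW", 10, 58));
    }
}
```

A field name that starts with a digit would still produce an invalid Java identifier, so a real schema may need one more rule.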

And here's the class I built, cut down to these two fields (with the indexes renumbered from 0):

import lombok.Data;

@Data
public class Item
{
 @Column(width=1, index=0) private String flg_store;
 @Column(width=10, index=1) private String code_new;
}

So the objects of this class will hold the data parsed from each record. The instructions for parsing are attached to each field via annotations.
Of course any other configuration method would do (property file, XML, JSON, even the Excel file itself), but I find that having the code and the schema all in the same place is very effective.
Here is the annotation:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Column
{

 public int width();
 public int index();
 
}

OK, so how do we use this information? I found it easier to first build a list of parsing instructions from the annotations, and then use that list to parse each line into an object.

package it.digiwrite.tiis.sap.parser;

import it.digiwrite.anoto.utility.ReflectionHelper;

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

import lombok.Data;

public class FixedWidthParser<T>
{
 
 @Data static class FieldDescriptor
 {
  int index;
  int width;
  String fieldName;
 }
 
 List<FixedWidthParser.FieldDescriptor> mapIndexFieldName;
 
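 public FixedWidthParser(Class<T> what)
 {
  mapIndexFieldName = new ArrayList<FixedWidthParser.FieldDescriptor>();
  for (Field field : what.getDeclaredFields())
  {
   Column column = field.getAnnotation(Column.class);
   if (column == null) continue; // skip fields without @Column
   FixedWidthParser.FieldDescriptor fd = new FixedWidthParser.FieldDescriptor();
   fd.setFieldName(field.getName());
   fd.setIndex(column.index());
   fd.setWidth(column.width());
   mapIndexFieldName.add(fd);
  }
  // getDeclaredFields() gives no ordering guarantee, and add(index, element)
  // fails when index > size, so append first and sort by column index afterwards
  mapIndexFieldName.sort((a, b) -> Integer.compare(a.index, b.index));
 }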

 public T parseLine(String line, Class<T> cla22)
 {
  try
  {
   int lastPosition = 0;
   T item = cla22.getDeclaredConstructor().newInstance(); // Class.newInstance() is deprecated since Java 9
   
   ReflectionHelper rh = new ReflectionHelper(item); // 1
   
   for (int i = 0; i<mapIndexFieldName.size(); i++)
   {
    FixedWidthParser.FieldDescriptor fd = mapIndexFieldName.get(i);
    String token = line.substring(lastPosition, lastPosition+fd.width);
    lastPosition = lastPosition+fd.width;
    rh.set(fd.fieldName, token.trim()); // 2
   }
   return item;
  }
  catch (Exception e)
  {
   throw new RuntimeException("Error parsing line: " + line, e); // keep the original cause
  }
 }
 
}
NOTE:
  1. ReflectionHelper is a utility class to wrap an object; here it is used only to...
  2. ...easily set a field via reflection
  3. this code uses generics
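For readers who want to try the idea without Lombok or ReflectionHelper, here is a condensed, self-contained sketch of the same approach. It is not the parser above: the names FixedWidthSketch, Col and Row are made up for the example, and it writes the fields directly with Field.set instead of going through setters.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FixedWidthSketch {

    // Stand-in for the @Column annotation shown earlier (hypothetical name).
    @Target(ElementType.FIELD)
    @Retention(RetentionPolicy.RUNTIME)
    @interface Col { int width(); int index(); }

    // Hypothetical two-field record, same layout as the Item example.
    static class Row {
        @Col(width = 1, index = 0) String flgStore;
        @Col(width = 10, index = 1) String codeNew;
    }

    static <T> T parse(String line, Class<T> type) {
        // Collect the annotated fields and order them by their declared index.
        List<Field> columns = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Col.class)) {
                f.setAccessible(true);
                columns.add(f);
            }
        }
        columns.sort(Comparator.comparingInt(f -> f.getAnnotation(Col.class).index()));
        try {
            T item = type.getDeclaredConstructor().newInstance();
            int pos = 0;
            for (Field f : columns) {
                // Cut the next `width` characters and store them, trimmed.
                int width = f.getAnnotation(Col.class).width();
                f.set(item, line.substring(pos, pos + width).trim());
                pos += width;
            }
            return item;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Error parsing line: " + line, e);
        }
    }

    public static void main(String[] args) {
        Row row = parse("Y0000011111", Row.class);
        System.out.println(row.flgStore + " / " + row.codeNew); // prints "Y / 0000011111"
    }
}
```

A line shorter than the sum of the widths makes substring throw a StringIndexOutOfBoundsException, so real input probably deserves a length check first.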

And how do we use this?

package it.digiwrite.tiis.sap;

import static org.junit.Assert.assertEquals;
import it.digiwrite.tiis.sap.bean.Item;
import it.digiwrite.tiis.sap.parser.FixedWidthParser;

import java.io.BufferedReader;
import java.io.FileReader;

import org.junit.Test;

public class FixedWidthParserTest
{

 @Test
 public void flusso()
 {
  String line = "Y0000011111";
  FixedWidthParser<Item> fwp = new FixedWidthParser<Item>(Item.class);
  Item item = fwp.parseLine(line, Item.class);
  assertEquals("Y", item.getFlg_store());
  assertEquals("0000011111", item.getCode_new());
 }
 
}


NOTE 2: here is the code for ReflectionHelper
package ...;

import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectionHelper
{

    private Object obj; // wrapped object
    private Class klass; // wrapped object Class
    private String className; // wrapped object class name

    public ReflectionHelper(Object obj)
    {
        this.obj = obj;
        this.klass = obj.getClass();
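        String className = klass.getName();
        // keep only the simple name: the part after the last dot, or the whole name if there is no dot
        this.className = className.substring(className.lastIndexOf(".") + 1);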
    }

    public String[] getFieldsNames()
    {
        Field[] fields = klass.getDeclaredFields();
        String[] fieldsNames = new String[fields.length];
        int i = 0;
        for (Field f : fields)
        {
            fieldsNames[i++] = f.getName();
        }
        return fieldsNames;
    }

    public String getClassName()
    {
        return className;
    }

    public Class getObjectClass()
    {
        return klass;
    }

    /**
     * Gets the property "itemName" from the wrapped object "obj", equivalent to: obj.getItemName()
     * @param itemName
     * @return obj.getItemName
     * @throws Exception
     */
    public Object get(String itemName) throws Exception
    {
        Method get;
        try
        {
            itemName = capitalize(itemName);
            try
            {
                get = klass.getMethod("get" + itemName, new Class[0]);
            }
            catch (NoSuchMethodException nsme)
            {
                get = klass.getMethod("is" + itemName, new Class[0]);
            }
            return get.invoke(obj, new Object[0]);
        }
        catch (Exception e)
        {
            throw e;
        }
    }

    /**
     * 
     * @param itemName
     * @return string with the first letter uppercase (ex. "example" --> "Example")
     */
    private String capitalize(String itemName)
    {
        return itemName.substring(0, 1).toUpperCase() + itemName.substring(1);
    }

    public Class fieldType(String fieldName) throws Exception
    {
        Class x = null;
        Class clazz = klass;
        while (clazz != null && (x == null))
        {
            try
            {
                Field field = clazz.getDeclaredField(fieldName);
                x = field.getType();
            }
            catch (NoSuchFieldException nsfe)
            {
                clazz = clazz.getSuperclass();
            }
        }
        if (x == null)
        {
            throw new RuntimeException("No such field: '" + fieldName + "'");
        }
        return x;
    }

    public void set(String fieldName, Object value) throws Exception
    {
        String methodName = "set" + capitalize(fieldName);
        Class class1 = null;
        try
        {
            class1 = fieldType(fieldName);
            Method set = klass.getMethod(methodName, new Class[] { class1 });
            Object[] args = new Object[] { value };
            set.invoke(obj, args);
        }
        catch (Exception e)
        {
            throw new RuntimeException("class: " + class1 + "; method: " + methodName,e);
        }
    }

    public String executeVoidToString(String methodName)
    {
        try
        {
            Method get = klass.getMethod(methodName, new Class[0]);
            return (String) get.invoke(obj, new Object[0]);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    /**
     * "RogerRabbit" --> "roger_rabbit"
     */
    public static String camelCaseToDb(String fieldName)
    {
        if ((fieldName==null)||(fieldName.length()==0)) return fieldName;
        String charE = fieldName.substring(0, 1);
        boolean lowerCase = charE.toLowerCase().equals(charE);
        String result = charE.toLowerCase(); // the first character must be lowercased too
        for (int i=1; i<fieldName.length(); i++)
        {
            charE=fieldName.substring(i,i+1);
            boolean isThisTheCase = charE.toLowerCase().equals(charE);
            if (lowerCase && !isThisTheCase) result+="_";
            result+=charE.toLowerCase();
            lowerCase=isThisTheCase;
        }
        return result;
    }

}

Tuesday, 28 January 2014

Learning regex [ENG]

This is wonderful for beginners.

And as a bonus I'll throw in a regexp replacement pattern for Notepad++ that converts a list of MIME types and file extensions into a Java map-loading snippet:

Find: ([^\s]*)(\s+)(.*)
Replace: mimemap.put(".\3", "\1");
transforms this:
application/envoy evy
application/fractals fif
application/futuresplash spl
application/hta hta
application/internet-property-stream acx
application/mac-binhex40 hqx
application/msword doc
application/msword dot
application/octet-stream *
application/octet-stream bin
...
into this:
mimemap.put(".evy", "application/envoy");
mimemap.put(".fif", "application/fractals");
mimemap.put(".spl", "application/futuresplash");
mimemap.put(".hta", "application/hta");
mimemap.put(".acx", "application/internet-property-stream");
mimemap.put(".hqx", "application/mac-binhex40");
mimemap.put(".doc", "application/msword");
mimemap.put(".dot", "application/msword");
mimemap.put(".*", "application/octet-stream");
mimemap.put(".bin", "application/octet-stream");
...
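The same substitution can be tried from Java itself. This is only a sketch with made-up names (MimeMapSnippet, toPutStatement): it applies the Find pattern one line at a time and uses Java's $1/$3 group references where Notepad++ uses \1/\3.

```java
final class MimeMapSnippet {

    // Turns one "mime/type ext" line into a mimemap.put(...) statement.
    // Applied per line on purpose: on a multi-line string, \s+ could also
    // match the line break and pair up the wrong tokens.
    static String toPutStatement(String line) {
        return line.replaceAll("([^\\s]*)(\\s+)(.*)", "mimemap.put(\".$3\", \"$1\");");
    }

    public static void main(String[] args) {
        String[] lines = { "application/envoy evy", "application/octet-stream *" };
        for (String line : lines) {
            System.out.println(toPutStatement(line));
        }
    }
}
```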

Thursday, 16 January 2014

Eclipse Regex to match simple getter/setter methods

To match simple getters/setters:
public .* [gs]et(.)*(\r\n)(.)*\{(\r\n).*(\r\n).*\}
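To see what the pattern accepts outside Eclipse, it can be run through java.util.regex. This is a sketch with hypothetical names. It uses the character class [gs], because [g|s] would also accept a literal "|", and it assumes Windows line endings (\r\n), which the pattern requires.

```java
import java.util.regex.Pattern;

final class SimpleAccessorRegex {

    // Matches a public get/set method whose body is exactly one line,
    // with the opening brace on its own line and CRLF line endings.
    static final Pattern SIMPLE_ACCESSOR = Pattern.compile(
            "public .* [gs]et(.)*(\\r\\n)(.)*\\{(\\r\\n).*(\\r\\n).*\\}");

    static boolean containsSimpleAccessor(String source) {
        return SIMPLE_ACCESSOR.matcher(source).find();
    }

    public static void main(String[] args) {
        String getter = "public String getName()\r\n{\r\n    return name;\r\n}";
        String twoLines = "public void setName(String n)\r\n{\r\n    check(n);\r\n    this.name = n;\r\n}";
        System.out.println(containsSimpleAccessor(getter));   // prints "true"
        System.out.println(containsSimpleAccessor(twoLines)); // prints "false"
    }
}
```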

Monday, 11 November 2013

Edit distance in Oracle [ENG]

Thanks to Paolo C. I just found out that Oracle has its own flavour of edit-distance functions. Very neat and mighty useful.
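
For anyone unsure what an edit distance measures, here is a plain Levenshtein sketch in Java. It is not Oracle's implementation, and the post does not say which variant Oracle's functions compute. It only illustrates the idea: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another.

```java
final class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance, keeping two rows only.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```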


Wednesday, 26 June 2013

Java: CLI arguments [ENG]

Something useful for when you have to juggle command line parameters in Java.

Friday, 22 March 2013

Web Form Autosave [ENG]

Modern browsers support a simple method to store local data across browser (and server) crashes.

The following script autosaves the form data every 10 seconds and, on load, asks whether you want to restore it.

NOTE: incomplete untested draft! Use at your own risk!

function storeFormFields(inputForm)
{
 for (var i = 0; i < inputForm.elements.length; i++)
 {
  var element = inputForm.elements[i];
  localStorage.setItem(inputForm.name+"."+element.name, element.value);
 }
}

function loadFormFields(inputForm)
{
 for (var i = 0; i < inputForm.elements.length; i++)
 {
  var element = inputForm.elements[i];
  var value = localStorage.getItem(inputForm.name+"."+element.name);
  if (value !== null) element.value = value; // keep the current value when nothing was stored
 }
}

setInterval(
 function()
 {
  storeFormFields(propAttForm);
  storeFormFields(ullForm);
  localStorage.setItem("commit_not_done", "true");
 },
 10000);
 
function onloadRestore()
{
 if (localStorage.getItem("commit_not_done")=="true")
 {
  if (confirm("Restore the unsaved data?"))
  {
   loadFormFields(propAttForm);
   loadFormFields(ullForm);
  }
 }
}

Thursday, 21 March 2013

Agile tips from Thoughtworks [ENG]

From the session "10 Things You Can Do to Better Lead Your Agile Team" by Jonathan Rasmusson:

  1. Ask the tough questions early ("inception deck")
    1. Why are we here?
    2. Elevator pitch (small card with some relevant points)
    3. Design a product box
    4. Create a NOT list
    5. Meet your neighbours
    6. Show the solution
    7. What keeps us up at night
    8. Size it up
    9. What's going to give
    10. What's it going to take
  2. Go Spartan: few key super important stories, build end-to-end in the first iteration, super bare-bones
  3. Make truth self-evident: put on a wall a done/to-do graph
  4. The burn down graph: show when requirement churn delays the deadline
  5. Goodwill and trust by delivering fiercely ("Yesterday you said tomorrow"): when you build trust you can spend it when you make mistakes
  6. Set the bar high in the beginning (practices) and keep it there
  7. Have teams demo their software: keep the team accountable
  8. Give purpose: explain the "why" of everything, get the people excited, develop a sense of gratitude ("you're lucky to have a job" or "you didn't have to dodge sniper fire to get to the office"), public recognition, pride in the work
  9. Give up control: good people crave three things: autonomy, mastery, purpose

    Good leaders:
    1. Share information
    2. Give away their best secrets
    3. Teach others everything they know
    4. Make those around them better
    5. Aren't afraid of becoming obsolete because they never are!
    6. Are not afraid, do not operate from a position of fear
  10. Deal with drama and dysfunction; three simple truths:
    1. It is impossible to gather all the requirements at the beginning of a project
    2. Whatever requirements you do gather are guaranteed to change
    3. There will always be more to do than time and money allow
  11. Follow your gut, serve your team and be prepared to get out of the way

Highlights:

  1. imagining a product box and the advertising on the box (very cool!)
  2. the project community is always bigger than you expect (i.e. many outsiders' influences are to be factored in)
  3. show the solution early even if you are not yet sure
  4. set the scope, budget, time and quality with the customer (who will almost every time agree that budget is king)
  5. put everything on the table with the team about technology, organization, who's who...
  6. "I'm writing these tests just because you ask": before the project starts say "I want to do things this way and here's why"
  7. You just don't worship, there is no "one way", you gotta be you, you know where you're strong and where you're not so strong

Wednesday, 2 January 2013

Swapping Fields with Autohotkey [ENG]

Let's say you have a really messed-up contact book on your phone and you decide, as a New Year's resolution, to tidy it up just a little bit. You unsheathe your Kies and find that many of your contacts have their first and last names swapped. You still have your New Year's dinner and alcohol in you, so you can't summon the strength for a super smart script that sorts the whole thing out for you. You settle for the bare minimum and whip up an Autohotkey script that just swaps the field in focus with the next one in the tab order:
;--------------------------------------------------------------
; Swap consecutive fields
^#tab::
{
 ; Select all, cut
 Send ^a
 clipboard=
 Send ^x
 ClipWait, 1 ; give up after 1 second if the field was empty
 firstField = %clipboard%
 ; Move to next field
 Send {Tab}
 ; Select all, cut
 Send ^a
 clipboard=
 Send ^x
 ClipWait, 1
 secondField = %clipboard%
 ; Insert the first field (SendRaw so that characters like ! + ^ # are typed literally)
 SendRaw %firstField%
 ; Get back on the previous field
 Send +{Tab}
 ; Insert the second field
 SendRaw %secondField%
}
return
A few keystrokes later and the contact book is back in shape.

Friday, 7 December 2012

CTRL+ENTER in Skype [ENG]

The new version of Skype lacks the option to set CTRL+ENTER as the standard hotkey to send a message. Why on earth they thought that removing an option could be a good idea is beyond me. Anyway, if you have Autohotkey, all you have to do is append these lines to your script:
#IfWinActive ahk_class tSkMainForm
^enter::
{
   Send {enter}
}
return
#IfWinActive ; reset the context so later hotkeys are not limited to Skype

Friday, 12 October 2012

Skipping tests and getting away with it

An insightful post about testing, iOS and quality software: It's not about the unit tests