Skills: Data Engineering - Spark - AWS - Cloudera - Databricks - semi-structured and structured data - Machine Learning

Our financial services client has multiple sources of unstructured, semi-structured and some structured data. It has an exciting initiative to architect, build, and operate an advanced unified data capability to be used by its Data Scientists in support of a data-driven decision-making strategy.

As part of this initiative, our client requires an experienced Data Engineer, ideally with some Machine Learning experience, on a contract basis for their London office. The successful candidate will be an early member of the team that will develop and maintain this advanced data capability.

The emphasis is on providing data for data science rather than for an enterprise data warehouse. Also, the unstructured data will predominantly be textual and not signal-, image- or sound-based at this initial stage.

You will be required to work on a hands-on basis from day one and will have demonstrable practical knowledge of, and experience in, areas such as:

  • working collaboratively with business users to understand & explore their actual and potential needs & data sources, and with the data scientists who will select / develop and execute the algorithms that deliver insights and correlations to support business decision making, using the data capability that you develop and maintain
  • planning & managing your (and potentially others') work / projects within a control framework, e.g. Agile, Scrum, in the context of overall governance, including the use of tools, e.g. Trello, and the maintenance of appropriate project artefacts
  • understanding & defining data, such as in data modelling, e.g. key / value pairs, tuples, graphs, relational models, as well as schema / schema-on-read approaches, data structures, ubiquitous language, etc.
  • acquiring, configuring, and managing the architecture and infrastructure required to support such a data capability, such as providers, e.g. AWS, Google, Cloudera, Databricks; containers, e.g. Docker; and resource management, e.g. YARN, etc.
  • acquiring and storing data, such as raw data acquisition, e.g. using APIs; various raw formats, e.g. PDF, CSV, JSON; storage, e.g. HDFS, S3; storage formats, e.g. Parquet, ORC, etc.
  • processing data, such as transforming data, e.g. Spark, including its RDDs & operations; machine learning, e.g. Spark MLlib, Python, Scala, Java; statistical analysis, e.g. R, SparkR; graph analytics, e.g. GraphX; structured queries, e.g. Spark SQL, Hive; and information extraction, e.g. Spark NLP, spaCy, ScalaNLP, etc.

You will probably have a minimum of 5 years' commercial experience in the above technologies and have a degree or MSc in Computer Science, Mathematics or Physics.

Your responsibilities, working either individually or as part of a team, will include:

  • initiating and participating in ‘projects’, and managing your own and others' work within them, including the use of tools and maintenance of artefacts
  • exploring, understanding & describing business requirements / use cases and existing & new sources of data
  • proposing, designing, and incrementally implementing an appropriate architecture & set of technologies as required for the advanced data capability
  • rapidly developing & executing PoCs, hacks, and solutions, driven by business requirements, for acquiring, storing and transforming data and making it available for subsequent processing, e.g. by data scientists
  • being involved in this subsequent processing by developing & executing end-point solutions, e.g. using algorithms and graph, statistical and / or SQL analytics, to be consumed by business users

