About me
Click here to go directly to my GitHub profile
Hi, I’m a Data Scientist with a product development and strategic analysis background. I hold a Master's degree in Mechanical Engineering and Aeronautics with a focus on numerical analysis and simulation. I have extensive experience in the automotive and energy sectors across product development, testing and strategic insight functions.
My commercial projects have included:
- Fleet telematics: Deployment of a vehicle fleet telematics system including ETL pipeline, data processing, visualisation and dashboarding.
- Battery health forecasting: Regression model to forecast EV battery failure, adding data-driven insight to support warranty analysis worth over £10 million.
- Pipe inspection tools: Developed data processing and reporting tools for two cutting-edge subsea pipe inspection products: laser bore scanning and 3D strain imaging.
My technical expertise is in data mining, visualisation and software development (Python, Matlab, etc.). I work on data analytics and machine learning projects in my free time. Some personal projects include:
Projects
Used car pricing analysis
- Built a used car valuation model to optimise the selling price for my own car.
- Web-scraped 1000+ adverts from Autotrader and analysed price trends against features like age, mileage, engine size etc.
- A support vector regression (SVR) model achieved an R^2 of 0.97 and a mean absolute error (MAE) of £961. The most influential features on price were age and mileage.
- The final result valued my car within £200 of Autotrader's own recommended selling price.
Tools: Python, Pandas, NumPy, Requests, BeautifulSoup4, Matplotlib, Seaborn, Scikit-learn
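For readers less familiar with the two metrics quoted above, here is a minimal sketch of how R^2 and MAE are calculated. It is not the project code, and the prices are made-up values chosen only to keep the arithmetic easy to follow.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up prices in GBP, for illustration only (not data from the project).
actual_prices = [5000, 7500, 10000, 12500]
predicted_prices = [5500, 7000, 10500, 12000]

mae = mean_absolute_error(actual_prices, predicted_prices)
r2 = r_squared(actual_prices, predicted_prices)
print(f"MAE: £{mae:.0f}, R^2: {r2:.3f}")
```

MAE is in the same units as the target (pounds here), which makes it easy to read as a typical pricing error, while R^2 is the share of price variance the model explains.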
Job market analysis (NLP)
- Created an NLP job title classifier with data scraped from indeed.com.
- Automating job titling could make recruitment more efficient and help adverts reach the most suitable candidates.
- Extracted 'skill tags' for each role (Python, Cloud, Machine Learning etc).
- Deployed a web-app using Streamlit, allowing anyone to classify a job as 'Data Scientist', 'Data Analyst', or 'Data Engineer'.
Libraries: Requests, BeautifulSoup4, Pandas, NLTK, Seaborn, Scikit-learn, Streamlit
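The 'skill tags' step can be illustrated with a simple keyword-matching sketch. This is an assumption about one way to do it, not the method used in the project: the `SKILL_PATTERNS` vocabulary and the `extract_skill_tags` helper are hypothetical names introduced for this example.

```python
import re

# Hypothetical tag vocabulary: tag name -> regex matched against lower-cased advert text.
SKILL_PATTERNS = {
    "Python": r"\bpython\b",
    "Cloud": r"\b(aws|azure|gcp|google cloud)\b",
    "Machine Learning": r"\bmachine learning\b",
}


def extract_skill_tags(description):
    """Return the sorted list of tags whose pattern appears in the text."""
    text = description.lower()
    return sorted(
        tag for tag, pattern in SKILL_PATTERNS.items() if re.search(pattern, text)
    )


example = "We need a Python developer with AWS experience."
print(extract_skill_tags(example))  # ['Cloud', 'Python']
```

Word boundaries (`\b`) stop short tokens such as "aws" from matching inside longer words.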
MOT data analysis
- Analysed 30 million MOT tests from GOV.UK for trends in vehicle ownership, pass/fail rates etc.
Libraries: SQLite, Pandas, Seaborn, Scikit-learn
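A pass-rate query of the kind described above can be sketched with Python's built-in `sqlite3` module. The `mot_tests` table, its `make` and `result` columns, and the rows below are hypothetical stand-ins for this example; the published MOT dataset has its own schema.

```python
import sqlite3

# In-memory database with a tiny, made-up sample (not real MOT records).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mot_tests (make TEXT, result TEXT)")
conn.executemany(
    "INSERT INTO mot_tests VALUES (?, ?)",
    [
        ("FORD", "P"), ("FORD", "F"), ("FORD", "P"), ("FORD", "P"),
        ("TOYOTA", "P"), ("TOYOTA", "F"),
    ],
)

# In SQLite, (result = 'P') evaluates to 1 or 0, so AVG gives the pass rate.
pass_rates = conn.execute(
    """
    SELECT make, AVG(result = 'P') AS pass_rate
    FROM mot_tests
    GROUP BY make
    ORDER BY make
    """
).fetchall()

for make, rate in pass_rates:
    print(f"{make}: {rate:.0%}")
conn.close()
```

Doing the aggregation in SQL means only the small summary table needs to be loaded into Pandas for plotting, which matters at tens of millions of rows.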
Energy Demand Forecasting
- Predicted survival probability for passengers on board the Titanic.
Libraries: …
Data science tools
- Languages: Python, SQL, Matlab
- Databases: SQLite, BigQuery
- Machine learning: sklearn
- Text analytics: nltk, spaCy
- Data manipulation: pandas, numpy, dask, scipy
- Visualisation: matplotlib, seaborn, plotly, Tableau, streamlit
Other skills and tools
- Google Cloud: BigQuery, Data Studio
- Web scraping: BeautifulSoup4
- Other: pptk (point cloud visualisation)
Currently learning
tensorflow/keras
Databricks
spark
Airflow