Meet the Team: CSDA’s Research Fellow Oleksandr Fialko
A background in physics is an essential tool for Oleksandr Fialko, who joined CSDA last year as a Research Fellow in Data Science and is working to improve the accuracy of models used to predict child welfare outcomes.
LinkedIn chief scientist DJ Patil once said that the best scientists tend to be hard scientists, particularly physicists, who come from a discipline where it is vital to get the most from the data.
Oleksandr says that since joining CSDA he has ‘realised how dirty the real data can be’, so his work is focused on understanding and cleaning the data.
“Recently, thorough cleaning of a dataset in our homelessness project in Allegheny led to a dramatic increase in accuracy.”
“For me, data analysis and machine learning are not only about cool techniques. More importantly, I believe they can produce useful predictions for our life from the data that has been collected.”
“Intensive human effort is needed to make the data tell a story and, in most cases, it is challenging. But, that is where it is intriguing. I am dedicated to hunting for the treasures in the data that can make a real difference.”
To help create the most accurate models, Oleksandr says, the CSDA team works through the data, applying different machine learning and statistical techniques to identify which is most accurate in each case.
In the case of CSDA’s child welfare and homelessness projects for Allegheny and Douglas County, Oleksandr says Extreme Gradient Boosting (or XGBoost) algorithms have proved the most accurate.
There are, however, challenges to overcome with XGBoost. Its accuracy is rooted in the complexity of the algorithm, which also makes it the least transparent option. It is also very new, having been released only in the last four years, so people are less familiar with it, and less trusting of it, than of options like Logistic Regression, which has been around since the late 1950s.
“It is difficult and time-consuming to train it. I would say that every machine learning algorithm is a black box, but XGBoost is the darkest one.”
Oleksandr says the way forward is to implement and test these models and validate the results over time, to demonstrate their accuracy and gain trust from the communities that stand to benefit from their implementation.
“Making a real difference is what I missed when I was a physicist. Now with the CSDA team, I try to help real people using data analytics and machine learning.”