
Imperfect Intelligence, Part 1 – Garbage Data

Artificial intelligence makes your life easier. It can recommend films you'll enjoy, drive you around in relative safety, diagnose your illnesses, and it can even predict what you're going to buy before you yourself know. In the right hands, AI will undoubtedly improve your life. But what happens when we use AI to support major financial decisions? Could an AI solution turn down a mortgage application for a first-time buyer simply because they've spent too wildly with Deliveroo in the previous month?

AI has a glaring Achilles heel: the data that feeds it only has the meaning we, humans, give it. Data cannot be objective. Rather, its meaning is constructed by humans, fed into machine programs written by humans, and its results are subsequently interpreted by humans.

As AI and machine learning creep into more and more business decisions, we need to be aware of their shortcomings and how to mitigate them. It's important to remember the old adage "garbage in, garbage out": a computer algorithm cannot produce useful outputs if the data it is fed is biased. While financial services firms can employ algorithms to help with many of their primary functions, such as deciding whom to lend to and where to invest, those algorithms cannot work unless they are trained on data that is accurate, truthful and rich.

I have narrowed down three of the key reasons why businesses end up creating "garbage" data that prevents their AI programs from being objective:

Framing the Problem

For every solution a deep learning program is trying to achieve, there must be a problem. When machines are trained to produce a certain output, they are at the mercy of how that problem has been defined by the business.

Take, for example, an algorithm designed to increase the profits of a retail credit division. Framed this way by the business, it is probable the solution will identify that those less likely to repay debts in a timely fashion are more immediately profitable, therefore making negligent recommendations about product suitability – an outcome that the business certainly wouldn't have been seeking.
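This failure mode can be made concrete with a toy calculation. The segment names and all figures below are fabricated for illustration: because late payers generate extra interest and penalty fees, a profit-only objective can rank them *above* reliable borrowers.

```python
# A minimal sketch (hypothetical numbers) of how a profit-only objective
# can favour customers who struggle to repay: late fees and extra interest
# make them look more profitable than borrowers who pay on time.

def expected_profit(interest_income, late_fees, default_loss):
    """Naive objective: total income minus expected credit losses."""
    return interest_income + late_fees - default_loss

segments = {
    # segment: (interest income, late-fee income, expected default loss)
    "repays_on_time": (100.0,  0.0,  5.0),
    "repays_late":    (140.0, 60.0, 40.0),  # penalty fees inflate income
}

best = max(segments, key=lambda s: expected_profit(*segments[s]))
print(best)  # the profit-only objective selects the struggling segment
```

Adding a customer-benefit term, or a constraint on product suitability, to the objective is what a clearer problem statement would force the team to do up front.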

The first step in creating a robust, useful and accurate AI solution is a clear and objective articulation of the business problem, one that considers the end customer benefit being aimed for. Without this, the final solution will be riddled with the biases and errors applied before the technical development even begins.

Data Collection

The selection and collation of the data being interpreted by machines hugely influences the results. If the data set doesn't reflect reality, it will give you distorted results, or worse, it will reinforce existing biases or barriers.

Last year, Amazon had to stop using an AI recruiting tool that reviewed job applications to improve its talent identification process. Because tech is, on the whole, a male-dominated field, the program taught itself that male candidates were preferable, while discriminating against female candidates. Any CV containing the word "women's", as in "women's tennis team" or "women's leadership group", was automatically penalised.
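One simple sanity check that can catch this kind of skew before training is to compare group proportions in the historical data against the population the model will actually score. The group labels and counts below are fabricated for illustration:

```python
# A toy representativeness check (fabricated numbers): compare the group
# mix in historical training data with the current applicant pool, and
# flag groups that are badly under-represented in the training set.
from collections import Counter

training_labels = ["male"] * 80 + ["female"] * 20   # historical records
applicant_pool  = ["male"] * 55 + ["female"] * 45   # population to score

def group_shares(labels):
    """Return each group's share of the total as a fraction."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

train_shares = group_shares(training_labels)
pool_shares = group_shares(applicant_pool)

# Flag any group whose training share is under half its real-world share.
under_represented = {
    g for g in pool_shares
    if train_shares.get(g, 0.0) < 0.5 * pool_shares[g]
}
print(under_represented)  # → {'female'}
```

A check like this doesn't fix the bias, but it forces the gap between training data and reality onto the table before the model is built.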

Moreover, the baseline for most statistical models is historical data, which helps create the trends against which a model can be effectively trained. When there isn't sufficient historical data, the output is inherently skewed. If there is nothing to compare your findings against, it's hard to say how accurate your model is.

Even if it seems you have the necessary quantity of data to train the machine, it’s important to scrutinise whether you have the right data to provide an accurate picture.

Most banks will be at the mercy of data points that have been captured by systems that weren’t originally designed to support a specific AI problem statement – potentially resulting in key information being neglected as an input into the algorithm.

Say you choose to investigate the main reasons why customers are unable to make their mortgage repayments using internal account and transaction data; it is plausible you may not have enough context to generate an accurate finding. You may have a customer's age, income and takeaway habits, but that may not give you the full picture. Are those most likely to miss a mortgage repayment also carers for an elderly parent? Did they go on holiday and forget to pay off their bills ahead of time? Has their relationship status changed?

Data Preparation


Feature selection is a key component of data mining, and as such has been described as the "art" of machine learning. Every data set is made up of different "attributes", each of which must be judged significant for consideration or not before being ingested into a computer algorithm.

The problem here arises when feature selection itself is subject to human bias, or even when the model is trained on features that are ethically inappropriate. For example, a computer algorithm used in the US to help predict the likelihood of a criminal re-offending erroneously flagged black defendants as being twice as likely to break the law again as white defendants.

If an AI/ML program indicates that age is the most important factor in determining credit worthiness – as the older you are the better you are at paying back loans – does that mean young people should be less eligible for a home loan?

Understanding Garbage Data

You don't need to be a data scientist or computer programmer to understand that if the data used to feed an AI program is flawed and skewed by human bias, then whatever it tells us will be equally flawed and skewed. If we don't account for these shortcomings, we can't be objective, and we may just be reinforcing the very biases we seek to eliminate.

With good data, AI can do impressive things, and I'm not just talking about recommending films on Netflix. However, even with good data, algorithms can be trained on hidden biases that continue to give us misleading results… More on this to come!
