Scoring of welfare beneficiaries: the indecency of CAF’s algorithm now undeniable

After more than a year of mobilization against the scoring of welfare beneficiaries by the CAF – the family branch of the French welfare system – alongside the Stop Contrôles and Changer de Cap collectives¹, and after having detailed how the CAF’s algorithm works and its political framework, we are today publishing the source code of this scoring algorithm. We also invite you to consult here our presentation page on the use of similar algorithms within other administrations.

Technical details of the algorithm (code, list of variables and their weightings) and the methodology used to construct profile-types are presented in this methodological appendix.

Little by little, light is being shed on a particularly pernicious mass surveillance system²: the CAF’s use of an algorithm for rating recipients, aimed at predicting which welfare recipients are (un)trustworthy and need to be monitored.

As a reminder, this algorithm, built from the analysis of hundreds of data records that the CAF holds on each recipient³, assigns a “suspicion score” to each recipient. This score, updated on the first of every month, ranges from zero to one. The closer it is to one, the more the algorithm judges a recipient to be suspicious: a check is triggered when it approaches its maximum value⁴.

Lifting opacity to end the media battle

Our criticisms focus as much on the nature of this predictive surveillance with its dystopian overtones as on the fact that the algorithm deliberately targets the most precarious⁵. Faced with mounting protests, CAF managers took refuge behind the opacity surrounding the algorithm to minimize both this state of affairs and their responsibility in establishing a deliberately discriminatory control policy. A CAF director went so far as to argue that “the algorithm is neutral” and would even be “the opposite of discrimination” since “no one can explain why a file is targeted”⁶.

This is why we battled for many months to get the CAF to give us access to the source code of the algorithm, i.e. the “formula” used by its managers to score recipients⁷. We hope that its publication will put an end to these untruths so that, finally, a debate can take place around the political abuses that have led a social institution to resort to such practices.

The algorithm of shame…

Reading the source code of the two models used between 2010 and 2018 – the CAF has refused to send us the current version of its algorithm – first confirms the scale of the surveillance system for detecting “suspect” claimants set up by the CAF.

Family, professional and financial situation, place of residence, type and amounts of benefits received, frequency of connections to the web space, time since last visit to reception, number of emails exchanged, time since last check, number and types of declarations: the list of some forty parameters taken into account by the algorithm, available here, reveals the degree of intrusiveness of the surveillance at work.

It focuses on data declared by a recipient, those linked to the management of his/her file and those linked to his/her interactions, in the broadest sense, with the CAF. Finally, each parameter is analyzed according to a history of variable duration. Targeting both recipients and their relatives, it covers the more than 32 million people, including 13 million children, living in a household receiving a CAF benefit.

As for the targeting of the most precarious, the publication of the source code provides definitive proof of the discriminatory nature of the criteria used. Among the variables that increase the “suspicion score” are:

Low income,
The fact of being unemployed,
The fact of being an RSA (French minimum income benefit) recipient,
Living in a “disadvantaged” neighborhood⁸,
Spending a significant portion of income on rent,
The fact of not having a job or stable income.

In a display of utter cynicism, the algorithm deliberately targets people with disabilities: receiving the Allocation Adulte Handicapé (AAH) while working is one of the parameters with the strongest upward impact on a recipient’s score.

In one graph

Of course, these factors are correlated and cannot be considered independently of each other. For example, a person with a low income is likely to have experienced periods of unemployment or to be receiving minimum social benefits.

Having both the parameters and their weightings at our disposal, we were able to construct different profile-types of recipients for whom we calculated suspicion scores⁹. Between the different profile-types, we only varied the parameters related to employment status, income, benefits received, marital status or disability.

We would like to point out that, to carry out these simulations, we had to make many assumptions, some of which are difficult to assess. The simulated scores below are therefore given for information only. However, our results are consistent with Vincent Dubois’s analyses based on aggregated statistics¹⁰. In the interests of transparency, we detail their construction – and its limitations – in a methodological appendix¹¹.

The profile-types all correspond to households with two dependent children and are meant to represent:

An “affluent” family with a stable, high income,
A “modest” family with both parents earning the SMIC (the French minimum wage),
A single parent also earning the SMIC,
A family where both parents are on minimum income benefits (RSA),
A family where one of the parents is a disabled worker: for this profile, we simulate the score of the person receiving the quarterly AAH.

The results are enlightening, as shown in the graph below. The “suspicion scores” of the most affluent households are much lower than those of households receiving minimum social benefits or the quarterly AAH.

We also observe the targeting of single-parent families, 80% of whom are headed by women¹². Our simulations indicate that this targeting is carried out indirectly – the CAF having perhaps judged that the inclusion of a “single mother” variable was too risky politically – by integrating variables such as total household income and the number of months of activity accumulated over one year by the heads of the household, which mechanically disadvantage households that do not comprise two parents¹³.
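The indirect mechanism described above can be made concrete with a minimal sketch. The function name, the field names and the wage figure are hypothetical and chosen only for illustration; the sketch merely shows that the two household-level variables named in the text are sums over adults, so a single parent can never reach the values a couple reaches at identical individual wages and activity.

```python
def household_features(adult_monthly_incomes, adult_months_worked):
    """Aggregate per-adult data into two household-level variables.

    Hypothetical helper: it only mirrors the two variables named in the
    text (total household income, months of activity over one year).
    """
    return {
        "total_income": sum(adult_monthly_incomes),
        "months_of_activity": sum(adult_months_worked),
    }


WAGE = 1400  # hypothetical monthly wage, identical for every adult

# Every adult works all 12 months at the same wage.
single_parent = household_features([WAGE], [12])
couple = household_features([WAGE, WAGE], [12, 12])

print(single_parent)  # {'total_income': 1400, 'months_of_activity': 12}
print(couple)         # {'total_income': 2800, 'months_of_activity': 24}
```

Any model that rewards higher values of these two variables therefore penalizes single-parent households without needing an explicit “single parent” variable.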

Threshold effects, discrimination and double punishment

A few months ago, the CAF sought to minimize the stigmatization of the most precarious generated by its algorithm by explaining that “the highest risk scores” do not “always concern the poorest” because “the risk score does not integrate financial situation as the only data”¹⁴. Our analyses demonstrate just how fallacious this reasoning is.

What our graph shows is precisely that socio-economic variables have a preponderant weight in the calculation of the score, structurally disadvantaging people in precarious situations. For example, the risk of being checked following an event considered a “risk factor” by the algorithm – moving house, separation, death – is non-existent for a well-off recipient, since his or her score is initially close to zero. Conversely, for an RSA (minimum income benefit) recipient whose score is already particularly high, the slightest of these events is likely to tip his or her score over the threshold at which a check is triggered.
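This threshold effect can be illustrated with a minimal sketch. It assumes, purely for illustration, that the score is a logistic function of a weighted sum of variables; the baselines, the weight of a “risk” event and the control threshold are all hypothetical values, not figures taken from the CAF’s code.

```python
import math

# All numbers below are hypothetical and serve only to illustrate the effect.
CONTROL_THRESHOLD = 0.8  # assumed score above which a check is triggered
EVENT_WEIGHT = 1.0       # assumed contribution of one "risk" event (e.g. moving house)


def score(linear_term: float) -> float:
    """Map a weighted sum of variables to a score between 0 and 1 (logistic function)."""
    return 1.0 / (1.0 + math.exp(-linear_term))


def is_checked(baseline: float, n_events: int) -> bool:
    """Return True if the score after n_events risk events exceeds the threshold."""
    return score(baseline + n_events * EVENT_WEIGHT) > CONTROL_THRESHOLD


# Hypothetical baselines produced by socio-economic variables alone.
well_off_baseline = -4.0    # score of about 0.02
precarious_baseline = 1.0   # score of about 0.73, just under the threshold

for label, baseline in [("well-off", well_off_baseline),
                        ("precarious", precarious_baseline)]:
    before = score(baseline)
    after = score(baseline + EVENT_WEIGHT)
    print(f"{label}: {before:.2f} -> {after:.2f}, checked: {is_checked(baseline, 1)}")
```

With these assumed values, the same event leaves the well-off recipient far below the threshold while pushing the precarious recipient above it, which is the asymmetry the paragraph describes.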

Worse, most of the non-financial variables are in fact linked to situations of instability and deviation from the norm – recent separation, moving house, multiple rent changes, repeated changes of professional activity, loss of income, errors in declarations, low number of web connections… – and everything suggests that these are themselves linked to situations of precariousness. Contrary to what the CAF would have us believe, everything indicates that this algorithm functions rather like a “double penalty”: it targets those who, among the most precarious, are going through a particularly complicated period.

Closing the (false) technical debate

As the CAF has refused to provide us with the most recent version of its algorithm, we expect its leaders to respond by arguing that they have a new, more “equitable” model. In anticipation, we would like to clarify a fundamental point: there can be no model of the algorithm that does not target the most disadvantaged, and more broadly those who deviate from the norm defined by its designers.

As we explained here in detail, while the CAF algorithm was promoted in the name of “fighting fraud”, it was actually designed to detect “indus” (overpayments). This choice was made for reasons of profitability: undue payments are more numerous and easier to detect than cases of fraud, the characterization of which requires, in theory, proof of intent¹⁵.

Yet these undue payments are mainly due to unintentional errors in declarations, which all studies show are concentrated among people on minimum social benefits and, more generally, among recipients in difficulty. This concentration is primarily due to the fact that these benefits are governed by complex rules – the fruit of successive policies to “combat assistance” – which multiply the risk of error. In the words of a CNAF anti-fraud director: “it is the social benefits themselves that generate the risk […] this is all the more true for benefits linked to precariousness […], which are highly dependent on the family, financial and professional situation of recipients.”¹⁶

So one doesn’t need to know the details of the algorithm’s formula to predict which populations will be targeted, because it’s the political objective of the algorithm – to detect overpayments – that determines it. This is why allowing a debate to develop around the inclusion of a given variable is a statistical fool’s game. The CAF will always be able to substitute a variable deemed politically “sensitive” with other criteria deemed “acceptable” to achieve the same result, as it already seems to be doing for single mothers¹⁷.

Police logic, managerial logic

To say this is finally to go beyond the technical debate and recognize that this algorithm is merely the reflection of the spread of managerial and police logics within our social administrations in the name of “anti-fraud” policies.

It is by transforming benefit recipients into “welfare dependents”, and then into risks to the survival of our social system, that the discourse of “the fight against assistance” has made their control an imperative of “good management”¹⁸. Never mind that all estimates show that “social fraud” is marginal, and that it is on the contrary non-take-up of benefits that is proving to be a massive phenomenon.

As an institutional objective, control must be rationalized. Digital technology then becomes the preferred tool in the “fight against social fraud”, thanks to the ability it offers managers to respond to results-based injunctions, while at the same time providing a technical alibi for the widespread discriminatory practices that meeting those injunctions entails.

These logics are salient in the response written by the CAF to oppose the transmission of the code of its algorithm, before it was forced to do so by the Commission d’Accès aux Documents Administratifs (CADA). The response openly embraces a police discourse, putting forward as its main argument that this communication would constitute a “breach of public security” because “by identifying the criteria constituting targeting factors, fraudsters could organize and assemble fraudulent files”.

Fight

Health insurance, old-age insurance, the Mutualités Sociales Agricoles or, to a lesser extent, Pôle Emploi: all use or are developing algorithms that are similar in every respect. At a time when these rating practices are becoming widespread, it seems necessary to think about a large-scale fight.

That’s why we’ve decided to make these algorithmic control practices a priority for the coming year. You’ll find here our page dedicated to this topic, which we’ll be updating regularly.

References
↑1	You can contact them at stop.controles@protonmail.com and contact@changerdecap.net.
↑2	The CAF is not the only administration to use this type of algorithm, but it was the first to do so. We’ll be back shortly with a more global view of the use of these kinds of algorithms by social administrations as a whole.
↑3	While the algorithm itself uses only a few dozen variables to calculate recipients’ scores, these are selected after a so-called “training” phase mobilizing over 1,000 pieces of information per recipient. For technical details see Pierre Collinet’s article “Le datamining dans les caf : une réalité, des perspectives”, written in 2013 and available here.
↑4	Controls at CAF are of three types. Automated checks are procedures for verifying recipients’ declarations (income, employment status, etc.), organized via the interconnection of administrative files (tax, employment office, etc.). They are by far the most numerous. Documentary checks involve requesting additional supporting documents from the claimant. Finally, on-site checks are the least numerous but the most intrusive. Carried out by a CAF controller, they involve an in-depth inspection of the claimant’s situation. These are the ones that are now overwhelmingly triggered by the algorithm following a deterioration in a recipient’s rating (See Vincent Dubois, “Contrôler les assistés”, p.258).
↑5	See above all Vincent Dubois’s book published in 2021, “Contrôler les assistés. Genèses et usages d’un mot d’ordre”. On the over-control of the most precarious populations, see chapter 10. On the political history of the “fight against welfare”, and the major role played in France by Nicolas Sarkozy, see chapter 2. On the evolution of control policies, their centralization following the introduction of the algorithm and the definition of targets, see pages 177 and 258. On the contestation of national targeting plans by local CAF directors, see page 250.
↑6	Extract from a CAF director’s response to the criticisms levelled by the Défenseur des Droits against the use of this algorithm.
↑7	The CAF initially provided us with “redacted” source code in which almost all variable names were hidden. We eventually obtained the code for two versions of the algorithm. The first was used between 2010 and 2014. The second between 2014 and 2018. Six variables were still hidden from the “2010” model and 3 from the “2014” model.
↑8	Concerning the variable linked to place of residence, the latter is a priori one of the variables masked in the code received. However, it is mentioned in the CAF’s response to CADA, which is why it seems reasonable to include it here. See our methodological appendix for a detailed discussion of the formula.
↑9	To do this, we simulated the necessary data – some thirty variables – for each “profile-type”, then used the algorithm to calculate their score. For details, see our methodological appendix.
↑10	The over-targeting of disabled people – AAH recipients – concerns only those with a job. As such, these results are compatible with the analyses in chapter 10 of Vincent Dubois’s book Contrôler les assistés, which includes all people with disabilities. See our methodological appendix for a detailed discussion of this point.
↑11	See in particular an alternative methodology used by LightHouse Reports in its article on Rotterdam for which journalists had not only the formula but also data on the people targeted. It is available here.
↑12	See the Insee note available here.
↑13	At equal individual incomes, a single-parent household earns less than a two-parent household. As for the number of months of activity per year, this will never exceed 12 for a single-parent family, but can be as high as 24 for a couple. This targeting is particularly strong in the months following a separation, as this type of event severely degrades a recipient’s score. See our additional analyses in the methodological appendix.
↑14	This is what it has already done in its “True/False” on datamining, where it explained that “the highest risk scores” do not always concern “the poorest people” because “the risk score does not integrate as the only data the financial situation”.
↑15	The testimonies collected by Stop Contrôles or Changer de Cap show that the need to prove intent in order to qualify an undue payment as fraud – the consequences of which for a recipient are more severe – is very regularly flouted.
↑16	See Daniel Buchet. 2006. “Du contrôle des risques à la maitrise des risques”. Available here.
↑17	It would thus be relatively easy for the CAF to remove direct reference to minimum social benefits or AAH in its algorithm by limiting itself to using the “quarterly generating facts” variable. This variable only concerns benefits requiring a quarterly declaration of resources: quarterly AAH, APL, RSA and prime d’activité. With regard to the targeting of RSA and AAH recipients, the CAF could thus claim, without losing too much precision, to have modified its algorithm by retaining in the calculation only this variable “faits générateurs trimestriels” while continuing to target people on minimum social benefits.
↑18	See above all Vincent Dubois’s book published in 2021, “Contrôler les assistés. Genèses et usages d’un mot d’ordre”. On the over-control of the most precarious populations, see chapter 10. On the political history of the “fight against welfare”, and the major role played in France by Nicolas Sarkozy, see chapter 2. On the evolution of control policies, their centralization following the introduction of the algorithm and the definition of targets, see pages 177 and 258. On the contestation of national targeting plans by local CAF directors, see page 250.