A Security Framework for Smart Metering withMultiple Data Consumers
Cristina Rottondi, Giacomo Verticale and Antonio Capone
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci, 32, Milano, Italyrottondi@elet.polimi.it, vertical@elet.polimi.it, capone@elet.polimi.it
Abstract
—The increasing diffusion of Automatic Meter Reading (AMR) has raised many concerns about the protection of personal data related to energy, water or gas consumption, fromwhich details about the habits of the users can be inferred. Onthe other hand, aggregated measurements about consumptionare crucial for several goals, including resource provisioning,forecasting, and monitoring.This paper proposes a framework for allowing informationConsumers, such as utilities and third parties, to collect data withdifferent levels of spatial and temporal aggregation from smartmeters without revealing information about individual customers.The proposed infrastructure introduces a new set of functionalnodes, namely the Privacy Preserving Nodes (PPNs), which collectcustomer data masked by means of a secret sharing schemewith homomorphic properties, and aggregate them directly in themasked domain, according to the Consumer’s needs and accessrights. The information Consumers can recover the aggregateddata by collecting multiple shares from the PPNs.The paper describes an Integer Linear Programming formulation and a greedy algorithm to address the problem of deployingthe information ﬂows between the information Producers (i.e.the customers), the PPNs, and the Consumers and evaluates thescalability of the infrastructure both under the assumption thatthe communication network is reliable and timely and in presenceof communication errors.
Index Terms
—Smart Metering; Homomorphic Encryption;Data Privacy;
I. I
NTRODUCTION
After a long time during which public utilities like electricity, gas and water have been provided by infrastructures unableto measure in realtime how and where they were consumedin the distribution networks, the new smart metering systemspromise to completely redesign the relationship between thecustomers and the utility companies. Furthermore, it is expected that new actors will play a role in the management of the services, the infrastructures, and the related information,with different companies as well as public/regulation authorities and end users being involved in the reshaped market of utilities [1].Therefore, the development of systems for Automatic MeterReading (AMR) and Automatic Meter Management (AMM)is being stimulated by many governments around the worldwith the goal of improving the overall efﬁciency in the useof energy and natural resources and of removing barriers andconstraints in the utility markets [2], [3].Notwithstanding the remarkable research and developmentefforts, several security and privacy issues are yet to be
Cristina Rottondi is funded by Fondazione Ugo Bordoni
solved, mainly because the smart grid scenario implies securityrequirements and assumptions that are different from thesecurity assumptions underlying the general purpose technologies considered for communication. In particular, accordingto the conceptual models of smart metering and smart gridsystems [4], we believe that that key element of the newsystem architecture is the service platform that can be open toapplications provided non only by traditional utility companiesbut also by third parties that can play a role in an openmarket of value added services. It is important to observe that,differently from traditional systems, it is not only the resourceitself (electricity, gas, water) but also the information on its useand production that has a direct economic value. Therefore, ina scenario where different actors can provide services basedon the information gathered by the smart metering system, itis of paramount importance to deﬁne a security infrastructureable to provide access to data with different levels of spatialand temporal aggregation. Moreover, due to privacy concernabout collected data that may reveal information on the userseven not related to the resource measured (presence at home,habits, etc.), it is important to be able to hide information onindividual customers and their identity.In this paper we propose an infrastructure for allowinginformation Consumers, such as utilities, companies and thirdparty service providers, to collect data only when aggregatedon a spatial and/or temporal basis according to the speciﬁcservice they are expected to provide, therefore preservingthe privacy of customers. We provide the following mainnovel contributions. (i) We design the general architectureof the privacy infrastructure introducing a new set of functional nodes in the smart grid, namely the Privacy PreservingNodes (PPNs), which collect and aggregate customer datamasked by means of a secret sharing scheme. (ii) We identifysome critical design problems addressing the allocation of information ﬂows and the dimensioning of the set of thePPNs and of their computational resources. (iii) We modelthe above mentioned problems by means of an Integer LinearProgramming formulation, we prove that the model is NPhard, and propose a greedy algorithm for tackling largeinstances in short computation time. (iv) We evaluate thescalability of the infrastructure, ﬁrst under the assumptionthat the communication network is reliable and timely, thenin presence of an unreliable communication network.The paper is structured as follows. Section II reviews theliterature on privacypreserving data aggregation and compares
our proposed framework to other solutions. Section III discusses our security assumptions and describes the functionalnodes of the architecture and the aggregation protocol. SectionIV formalizes the design problems that arise when implementing the architecture in a scenario with a large numberof Producers and Consumers. We also prove the NPhardnessof the model and present a greedy algorithm for solving itefﬁciently. In Section V we compare the performance of thegreedy algorithm to the optimal solutions and discuss thescalability of our architecture both assuming an errorfreescenario and a scenario in which protocol messages can belost. A conclusion is left for the ﬁnal Section.II. R
ELATED
W
ORK
MolinaMarkham
et al.
[5] propose a zeroknowledge protocol allowing the smart meters to calculate both the aggregateconsumption and the energy bill of each household withoutreleasing ﬁnegrained information to the utility companies.The protocol is computationally quite expensive and requires apotentially very large number of Neighborhood Gateways. Ourproposal is more scalable, requiring a small number of PrivacyPreserving Nodes, and has a lower computational complexity.Also, our proposal is robust to the loss of protocol messages.Paper [6] proposes an aggregation protocol using the homomorphic Pallier’s cryptosystem. Our protocol relies onShamir’s Secret Sharing, which has a lower computationalcomplexity, and also makes it possible to aggregate the samedata according to different rules with a sublinear increase inprotocol trafﬁc.Papers above adopt the honestbutcurious adversary model,in which the nodes honestly execute the protocol, but keepall inputs and try to infer individual measurements. Our paperassumes the same adversarial model. The following papers, instead, use a dishonestbutnonintrusive (DN) adversary, whichmay not follow the protocol and can provide false information,but cannot modify the communications infrastructure: Garciaand Jacobs [7] use a combination of Pallier’s scheme andsecret sharing; paper [8] proposes four different protocols withdifferent cryptographic properties and complexities.; paper [9]proposes a protocol based on the differential security model,which is robust to the temporary loss of connectivity to a node.Differently to these papers, our proposal requires a honestnode, the PPN, but is more scalable and allows aggregationboth in time and/or in space. Further, our architecture is theonly one that includes by design multiple data Consumers withdifferent time and space aggregation granularities.Finally, the idea of using a sharing scheme to divide themeasurements over multiple PPNs, which then can performhomomorphic operations, is borrowed from [10]. That paperproposes a privacy preserving aggregation scheme for network trafﬁc measurements. Apart from the completely different application scenario, our paper studies the optimization problemthat raises when multiple aggregation rules share the samearchitecture. Further, we extend the protocol robustness andscalability in case some protocol messages are lost.
Fig. 1. The functional nodes of the architecture. The Privacy PreservingNodes are indicated as PPNs.
w
is the minimum number of shares generatedby a Producer.
t
is the minimum number of shares necessary to recover theaggregated measurements.
III. T
HE
A
GGREGATION
A
RCHITECTURE AND
P
ROTOCOL
With reference to Figure 1, the architecture comprises threesets of nodes:
•
the set of information
Producers
,
P
, which represent thesmart meters;
•
the set of
Privacy Preserving Nodes
(PPN),
N
, whichare the new nodes that perform the aggregation in theencrypted domain;
•
the set of information
Consumers
,
C
, which receive timeand/or spaceaggregated information and represent theutilities or other third party services, such as billingcompanies or energy brokers.The architecture also includes a
Conﬁgurator
which receivesthe aggregation request from the Consumers, deﬁnes theinformation ﬂows between Producers, PPNs and Consumers,and conﬁgures the aggregation rules in the PPNs.The basic idea is the following. Each Producer usesShamir’s secret sharing scheme to divide its measurementsin multiple shares so that at least
t
shares are necessary torecover the measurement. Each share is sent to a differentPPN, therefore a collusion of at least
t
PPNs is required toobtain individual measurements. The PPNs can concurrentlysum their shares obtained from different Producers or from thesame Producer at different times. The summed shares are thensent to the Consumer, which, having received multiple shares,can run the recovery algorithms. By virtue of the homomorphicproperties of Shamir’s scheme, if the secret recovery algorithmis run over the sum of shares, the recovered secret is the sumof the srcinal secrets. Therefore, the Consumer obtains theaggregated measurements, but gets no information about theindividual measurements.In order to share a measurement with Shamir’s scheme,the Producer generates a random polynomial
f
(
·
)
of degree
t
−
1
over the ﬁeld
Z
q
, where
q
is a public system parameterthat is used by all the Producers. The prime number
q
mustbe larger than the number of PPNs and larger than themaximum individual or aggregated measurement that mustbe represented. The polynomial is built so that
f
(0)
is the
(secret) measurement. The share for the
n
th PPN is the couple
(
n,f
(
n
))
. When performing the aggregation, the PPNs are,in practice, generating a new random polynomial
f
(
·
)
withunknown coefﬁcients that, when evaluated in 0, yields theaggregated measurement. However, the
n
th PPN only knows
f
(
n
)
and has no information about
f
(0)
. In order to recoverthe secret, the Consumer exploits the wellknown principlethat
t
evaluations of a polynomial of degree
t
−
1
are enoughto uniquely identify the polynomial. Once the polynomial isknown, it is trivial to evaluate it in zero. There are severalalgorithms to recover the secret, with one of the most popularbeing the Lagrange interpolator.In detail, we assume that time is divided in rounds and allnodes have a common timereference, communications amongthe nodes is conﬁdential and authenticated, and each PPN isidentiﬁed by a progressive number.At the end of the
τ
th round, each Producer
p
generatesa measurement
µ
i
(
τ
)
. By exploiting Shamir’s scheme, theProducer divides its measurement in
w
i
shares and sends themto the
w
i
PPNs chosen by the Conﬁgurator.Each Producer sends data to at least
w
PPNs, where
w
isa system parameter. In case the communication network isperfectly reliable and timely,
w
can be as low as
t
. However,to increase reliability,
w
can be set to be higher than
t
. Further,some Producers send data to more than
w
PPNs if they areinvolved in more aggregation rules. Therefore, in general, the
w
i
can be different. We denote as
[
µ
i
(
τ
)]
n
the share of secret
µ
i
(
τ
)
sent to the
n
th PPN. The shares are calculated by theProducer by using the following random polynomial:
[
µ
i
(
τ
)]
n
=
µ
i
(
τ
) +
r
1
(
τ
)
n
+
···
+
r
t
−
1
(
τ
)
n
t
−
1
mod
q
(1)The integers
r
1
(
τ
)
,...,r
t
−
1
(
τ
)
are a set of random numbersuniformly distributed in the range
[0
,q
)
and change at eachround. Note that the powers of
n
can be computed in advanceand have no cost during the execution of the algorithm.Consumer
c
speciﬁes an aggregation rule, deﬁned as a set of Producers,
Π
c
, and a time window duration,
k
c
. Without lossof generality, we assume that each Consumer speciﬁes a singleaggregation rule. When
c
speciﬁes its rule, the Conﬁguratorchecks the conformance of the rule to the system policy andcommunicates to the Producers in
Π
c
the set of the
w
PPNsto which they must send a share of their measurements. Thesame share can be used within different aggregation rules.The Conﬁgurator can use different strategies for choosingthese
w
PPNs: the reader is referred to Section IV, in whichwe present the relevant optimization problems and introduceheuristic algorithms to solve them efﬁciently.With reference to the
c
th aggregation rule, each of theinvolved PPNs independently and concurrently performs aggregation on the masked data according to the rule, calculatingthe aggregated measurement. The
n
th PPN calculates itsaggregated share,
[
σ
c
(
τ
)]
n
, as:
[
σ
c
(
τ
)]
n
=
i
∈
S τ
τ
=
τ
−
k
c
+1
[
µ
i
(
τ
)]
n
mod
q
(2)where
S
is the set of the shares. Aggregation is performedonly at rounds
τ
that are integer multiples of
k
. In order toperform aggregation, the PPN must wait for all the necessarydata. As soon as the aggregated measurement is available, thePPN sends it to Consumer
c
.The Consumer collects the incoming aggregated shares. Assoon as
t
aggregated shares are available, the Consumer canrecover the value of the aggregate measurement.Like [10], [6], we assume an
honestbutcurious
securitymodel. In this model, the PPNs are assumed to follow theprotocol but they keep all the inputs from the Producers andtry to actively recover the values of the measurements. Underthis assumption, the aggregation procedure is
informationtheoretically
secure since no set of fewer than
t
PPNs can learnany information about the individual or aggregated measurements. A collusion of two or more Consumers can learn anyindividual or aggregated measurement that can be expressed asa function of the aggregated measurements known to that set of Consumers. It is responsibility of the Conﬁgurator to preventConsumers from specifying aggregation rules that could leadto information leakage.A passive intruder could collect multiple shares from agiven Producer and recover the individual measurements. Toprevent this attack, we assume that the communication channelbetween Producers and PPNs is conﬁdential and authenticated.Since practical systems for securing communication channelsare computationally secure, our architecture is also computationally secure against this kind of attack.Computational complexity of the proposed protocol is dominated, at each node, by the following operations:
•
At the Producer, the generation of the shares comprisesthe generation of
t
−
1
random numbers and
t
−
1
sumsmodulus
q
for each share. Therefore the computationalcomplexity is
O
(
wt
)
.
•
At the PPN, the aggregation is performed by means of (2), so the
c
th aggregation rule has complexity
O
(

Π

k
)
.
•
At the Consumer, the aggregated measurement must berecovered. The algorithm based on Lagrange interpolatorhas a complexity
O
(
t
2
)
, while other known algorithmshave complexity of
O
(
t
log
2
t
)
.IV. M
ODEL
D
ESIGN AND
O
PTIMIZATION
Given the large number of producers that can be monitored,and considering that each PPN manages multiple rules, computation at PPNs is the bottleneck of the system. In this Section,we provide an ILP formulation for the problem of minimizingthe number of installed PPNs, in case the maximum numberof sums that each PPN can perform is limited by a threshold.In the remainder of the paper, this problem will be named
minPPN
problem. We also prove that the problem is NPhard.
A. ILP Formulation
Let
P
,
C
, and
N
be the sets of the Producers, Consumers,and PPNs, respectively.
Parameters:
•
w
: number of shares used in the secret sharing scheme
•
A
pc
: boolean indicator, it is 1 if Producer
p
is monitoredby Consumer
c
, 0 otherwise
•
L
: maximum computational load (expressed in numberof sums) of each PPN
Variables:
•
x
n p
: boolean variable, it is 1 if Producer
p
sends a shareto PPN
n
, 0 otherwise
•
y
nc
: boolean variable, it is 1 if Consumer
c
receives anaggregated share from PPN
n
, 0 otherwise
•
z
n
: boolean variable, it is 1 if PPN
n
is activated, 0otherwise
Objective function:
min
n
∈
N
z
n
(3)
Constraints:
n
∈
N
y
nc
=
w
∀
c
∈
C
(4)
A
pc
y
nc
≤
x
n p
∀
p
∈
P,
∀
n
∈
N,
∀
c
∈
C
(5)
x
n p
≤
c
∈
C
A
pc
y
nc
∀
p
∈
P,
∀
n
∈
N
(6)
L
≥
c
∈
C
p
∈
P
A
pc
y
nc
∀
n
∈
N
(7)
y
nc
≤
z
n
∀
n
∈
N,
∀
c
∈
C
(8)
x
n p
≤
z
n
∀
p
∈
P,
∀
n
∈
N
(9)Constraint (4) imposes that each Consumer receives
w
aggregated shares, computed by different PPNs. The secretcan be reconstructed by the Consumer even if
w
−
t
shares arelost because of communication errors. The coherence betweenthe values of
x
n p
and
y
nc
variables is imposed by Constraints(5) and (6): (5) forces
y
nc
to 0 in case none of the Producersmonitored by Consumer
c
sends a share to PPN
n
, while (6)sets
x
n p
to 0 if none of the Consumers interested to the datagenerated by Producer
p
receives an aggregated share fromPPN
n
. The total number of sums performed by each PPN isforced by Constraint (7) to be inferior than the threshold
L
.Constraints 8 and 9 impose coherence between the values of
z
n
,
y
nc
and
x
n p
.
Theorem 1.
The
minPPN
problem is NPhard.Proof:
Consider the following problem where, with respect to the
minPPN
problem, we introduce a parameter
M
c
=
p
∈
P
A
pc
, which expresses the number of sumsnecessary to compute each share destined to Consumer
c
, andthe set of shares
S
. Furthermore, a binary variable
g
ncs
, whichis 1 in case the
s
th share (
1
≤
s
≤
w
) destined to Consumer
c
is computed by PPN
n
and 0 otherwise, is introduced. Theobjective function remains unvaried, while the Constraints are
1:
initialize
x
n p
and
y
nc
to 0
∀
(
p,n,c
)
∈
P
×
N
×
C
2:
for all
(
n,c
)
∈
N
×
C
do
3:
if
s
∈
S
g
ncs
≥
1
then
4:
y
nc
←
1
5:
end if
6:
end for
7:
for all
(
p,n,c
)
∈
P
×
N
×
C
such that
A
pc
= 1
do
8:
if
s
∈
S
g
ncs
≥
1
then
9:
x
n p
←
1
10:
end if
11:
end for
Fig. 2. Conversion Algorithm
replaced as follows:
n
∈
N
g
ncs
= 1
∀
s
∈
S,
∀
c
∈
C
(10)
s
∈
S,c
∈
C
M
c
g
ncs
≤
L
∀
n
∈
N
(11)
s
∈
S
g
ncs
≤
z
n
∀
n
∈
N,
∀
c
∈
C
(12)Constraint (10) ensures that each Consumer receives all theaggregated shares, while the computational burden at eachPPN is forced by Constraint (11) to be lower than
L
. Finally,Constraint 12 ensures that no aggregated shares are computedby a PPN that is not installed.In case

S

= 1
, the above problem is reduced to a binpacking problem, which is proved to be NPhard. A feasiblesolution can be converted to a solution of the
minPPN
problemwith the Algorithm in Figure 2. Consequently, the
minPPN
problem is NPhard.
B. Heuristic Approach
The Algorithm in Figure 3 is a greedy algorithm to ﬁndfeasible solutions for the the
minPPN
problem and can bedivided in two parts: the ﬁrst one (lines 111) is aimed atequally distributing the computational load among all theavailable PPNs, considering the threshold
L
imposed on themaximum number of sums that each of them can perform.Then, the second part of the algorithm (lines 1235) tries toeliminate some of the PPNs by redistributing their load amongthe others: in particular, the PPN
n
, which performs the lowestnumber of sums, is selected and for each Consumer
c
receivingan aggregated share from
n
, the computational load needed tocalculate the aggregated share is associated to another PPN,
j
, chosen among the ones that do not already provide andaggregated share to
c
. During this second phase, the auxiliaryvariables
y
nc
and
L
n
are introduced in order to record thechanges in the associations between Consumers and PPNs andin the computational burden of each PPN. The procedure isrepeated until the computational load of
n
becomes 0. In thatcase, the PPN is eliminated and the variables
y
nc
and
L
n
areupdated to the values of
y
nc
and
L
n
respectively. Finally, when
1:
initialize
x
n p
,
y
nc
,
L
n
and
z
n
to 0
∀
(
p,n,c
)
∈
P
×
N
×
C
2:
for all
c
∈
C
do
3:
M
c
←
p
∈
P
A
pc
4:
end for
5:
sort the elements of C in descending order of
M
c
6:
for all
c
∈
sorted
(
C
)
do
7:
while
n
∈
N
y
nc
< w
do
8:
n
←
argmin
n
∈
N
:
y
nc
=0
∧
L
n
+
M
c
≤
L
c
∈
C
M
c
y
nc
9:
L
n
←
L
n
+
M
c
,
y
nc
←
1
,
z
n
←
1
10:
end while
11:
end for
12:
for all
(
n,c
)
∈
N
×
C
do
13:
L
n
←
L
n
,
y
nc
←
y
nc
14:
end for
15:
flag
←
0
16:
while
flag
= 0
do
17:
n
←
argmin
n
∈
N
L
n
18:
for all
c
∈
C
do
19:
OK
←
0
20:
for all
j
∈
N
such that
(
j
=
n
)
∧
(
y
jc
= 0)
∧
(
y
nc
= 1)
∧
(
L
j
+
M
c
≤
L
)
do
21:
if
OK
= 0
then
22:
L
j
←
L
j
+
M
c
,
L
n
←
L
n
−
M
c
23:
y
nc
←
0
,
y
jc
←
1
,
OK
←
1
24:
end if
25:
end for
26:
end for
27:
if
L
n
= 0
then
28:
z
n
←
0
,
N
←
N
\{
n
}
29:
for all
c
∈
C
do
30:
L
n
←
L
n
,
y
nc
←
y
nc
31:
end for
32:
else
33:
flag
←
1
34:
end if
35:
end while
36:
for all
∀
(
p,n,c
)
∈
P
×
N
×
C
do
37:
if
A
pc
= 1
∧
y
nc
= 1
then
38:
x
c p
←
1
39:
end if
40:
end for
41:
return
n
∈
N
z
n
Fig. 3. Greedy algorithm for the
minPPN
problem
no more PPNs can be eliminated, the value of the variables
x
n p
is set according to
y
cn
and
A
pc
(lines 3640). The complexityof the algorithm is
O
(

C

N

P

)
.V. N
UMERICAL
R
ESULTS
This section compares the experimental results provided byAlgorithm 3 with the optimal solutions obtained by solvingthe ILP formulation. Results obtained with the greedy algorithm are analyzed under two different assumptions: weﬁrstly suppose that the communication of the shares betweenProducers and PPNs is not subject to transmission errors, thenwe assume that the communication can fail with probability
p
e
. The failures are assumed to be independent and can be dueto transmission delays or losses. Whatever the cause, if oneor more shares are not available at the PPN, the associatedProducers have to be excluded from the computation of theaggregated share at that PPN. Otherwise, the shares providedto the Consumer by the different PPNs would be inconsistentand unusable.In the remainder of the paper, if not stated differently thenumber of shares used by the protocol is assumed to be
w
=4
and the threshold for recovering the measurement is alsoassumed
t
= 4
shares. All the results have been averagedby running the greedy algorithms and the ILP solver over aset of 10 randomly generated instances of the problem: foreach instance, the parameter
A
pc
has been randomly computedassuming that each Producer
p
has probability
p
= 0
.
5
to bemonitored by Consumer
c
.
A. ILP model vs Greedy
Table I compare the performance of the Algorithm inFigure 3 in terms of results and computational time withrespect to the optimal solutions obtained by solving the ILP
minPPN
problem. For the comparison, we assume an errorfree scenario, where no communication errors occur in thetransmission of the shares. The maximum number of sums thateach PPN can perform is assumed to be
L
=
α

P

with
α
= 8
,where
α
is a parameter indicating the maximum number of sums per Producer. The number of Producers has been variedfrom 100 to 10000 for two possible sets of Consumers, of cardinality

C

= 10
and

C

= 50
respectively.There is a clear evidence that the results obtained by thegreedy are close to the optimum. Moreover, the running timeof our implementations is signiﬁcantly shorter than the timerequired by the ILP solver by several orders of magnitude.Therefore, the greedy algorithm is effective and scalable torealistic scenarios with millions of Producers monitored byhundreds of Consumers (simulations with

P

= 10
millionsand

C

= 100
provide a feasible solution in a few minutes).If not stated differently, all the results provided in the nextsections have to be intended as computed with the greedyalgorithm.
B. Scenario with Communication Errors
In this scenario the communication of the disaggregateddata between a Producer and a PPN can fail with probability
p
e
. The probability
P
T

M
c
that at least
t
aggregated sharesreceived by a given Consumer
c
monitoring
M
c
Producers arecorrect, so that it can reconstruct the aggregated data, can becomputed as follows:
P
T

M
c
=
w
i
=
t
wi
(1
−
p
e
)
k
c
M
c
i
(1
−
(1
−
p
e
)
k
c
M
c
)
w
−
i
(13)