Title: | Exploit Data Leakages in Time Series Forecasting Competitions |
---|---|
Description: | Forecasting competitions are of increasing importance as a means to learn best practices and gain knowledge. Data leakage is one of the most common issues found in competitions. Data leaks can happen when the training data contains information about the test data. Examples include: a new time series formed by concatenating randomly chosen blocks of other time series, scale-shifts, repeating patterns in time series, and a new time series formed by adding white noise to an original time series. The 'tsdataleaks' package can be used to detect data leakages in a collection of time series. |
Authors: | Thiyanga S. Talagala [aut, cre] |
Maintainer: | Thiyanga S. Talagala <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.1.1 |
Built: | 2024-11-15 04:52:39 UTC |
Source: | https://github.com/thiyangt/tsdataleaks |
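The leak types named in the description share one property: the leaked block is perfectly, or almost perfectly, linearly related to a block elsewhere in the collection. The following is a minimal base-R sketch of that idea and does not call the package; all variable names are illustrative assumptions.

```r
set.seed(1)
train <- rnorm(20)   # hypothetical training series
block <- train[6:10] # block that will leak into other series

# Three ways the block can reappear at the end of a "new" series
copied  <- c(rnorm(10), block)                        # concatenated block
shifted <- c(rnorm(10), 3 * block + 10)               # scale-shift
noisy   <- c(rnorm(10), block + rnorm(5, sd = 0.01))  # added white noise

# Pearson correlation between the original block and the tail of each series
r_copied  <- cor(block, tail(copied, 5))
r_shifted <- cor(block, tail(shifted, 5))
r_noisy   <- cor(block, tail(noisy, 5))
print(c(copied = r_copied, shifted = r_shifted, noisy = r_noisy))
```

Exact copies and scale-shifts give a correlation of 1 up to floating-point error, while the noisy copy falls slightly below 1, which suggests that detecting noisy leaks would need a cutoff a little under 1.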
Correlation calculation based on rolling window with overlapping observations.
find_dataleaks(lstx, h, cutoff = 1)
lstx | list of time series |
h | length of forecast horizon |
cutoff | threshold for the absolute value of the correlation; default 1 |
list of matching quantities
a <- rnorm(15)
lst <- list(
  a = a,
  b = c(a[10:15], rnorm(10), a[1:5], a[1:5]),
  c = c(rnorm(10), a[1:5])
)
find_dataleaks(lst, h = 5)

a <- rnorm(15)
lst <- list(
  x = a,
  y = c(rnorm(10), a[1:5])
)
find_dataleaks(lst, h = 5)

# List without naming elements
lst <- list(
  a,
  c(rnorm(10), a[1:5], a[1:5]),
  rnorm(10)
)
find_dataleaks(lst, h = 5)
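The search can be pictured as follows, assuming the function compares the last h observations of each series with every window of length h in the collection. This is a base-R illustration only, not the package's implementation: `sketch_find_leaks`, its `tol` argument, and its return format are hypothetical.

```r
# Hypothetical helper: for each series, take its last h observations and
# report every window of length h (in any series) whose absolute Pearson
# correlation with that tail reaches the cutoff.
sketch_find_leaks <- function(lstx, h, cutoff = 1, tol = 1e-8) {
  if (is.null(names(lstx))) names(lstx) <- as.character(seq_along(lstx))
  hits <- list()
  for (target in names(lstx)) {
    x <- lstx[[target]]
    tail_block <- x[(length(x) - h + 1):length(x)]
    for (other in names(lstx)) {
      y <- lstx[[other]]
      n_win <- length(y) - h + 1
      if (n_win < 1) next
      for (s in seq_len(n_win)) {
        # Skip the trivial match of a tail with itself
        if (other == target && s == n_win) next
        r <- cor(tail_block, y[s:(s + h - 1)])
        # tol absorbs floating-point error around a correlation of 1
        if (!is.na(r) && abs(r) >= cutoff - tol) {
          hits[[length(hits) + 1]] <- data.frame(
            target = target, matched_in = other,
            start = s, end = s + h - 1
          )
        }
      }
    }
  }
  if (length(hits) == 0) return(data.frame())
  do.call(rbind, hits)
}

set.seed(42)
a <- rnorm(15)
lst <- list(a = a, b = c(rnorm(10), a[1:5]))
leaks <- sketch_find_leaks(lst, h = 5)
print(leaks)
```

Here the last five observations of `b` are a copy of `a[1:5]`, so the sketch reports a single match for target `b` in series `a` at positions 1 to 5.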
Correlation calculation based on rolling window with overlapping observations.
reason_dataleaks(lstx, finddataleaksout, h, ang = 0)
lstx | list of time series |
finddataleaksout | list, the output generated by the find_dataleaks function |
h | window size (number of observations in each window) |
ang | angle at which the tick and axis labels should be displayed (default 0) |
matrix visualizing the output
a <- rnorm(15)
lst <- list(
  a = a,
  b = c(a[10:15], rnorm(10), a[1:5] + 10, a[1:5]),
  c = c(rnorm(10), a[1:5])
)
f1 <- find_dataleaks(lst, h = 5)
reason_dataleaks(lst, f1, h = 5)

# List without naming elements
lst <- list(
  a,
  c(rnorm(10), a[1:5], a[1:5]),
  rnorm(10)
)
f2 <- find_dataleaks(lst, h = 5)
reason_dataleaks(lst, f2, h = 5)

a <- rnorm(15)
lst <- list(
  a = a,
  b = c(a[10:15], rnorm(10), a[1:5], a[1:5]),
  c = c(rnorm(10), a[1:5])
)
f1 <- find_dataleaks(lst, h = 5)
reason_dataleaks(lst, f1, h = 5)
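Once two segments are known to be perfectly correlated, one plausible way to explain the match is to test which simple transformation links them, as with the `a[1:5] + 10` block in the first example. The sketch below is an assumption about how such a check could look; `sketch_reason` and the labels it returns are hypothetical and are not taken from the package.

```r
# Hypothetical helper: given two equal-length segments, suggest which
# simple transformation relates the matched segment to the original.
sketch_reason <- function(original, matched, tol = 1e-8) {
  d <- matched - original
  if (all(abs(d) < tol)) return("exact match")
  # A constant difference means a constant was added
  if (max(d) - min(d) < tol) return("constant added")
  # A constant ratio means the segment was multiplied by a constant
  if (all(abs(original) > tol)) {
    ratio <- matched / original
    if (max(ratio) - min(ratio) < tol) return("multiplied by a constant")
  }
  "other linear transformation"
}

set.seed(7)
a <- rnorm(5)
reasons <- c(
  same    = sketch_reason(a, a),
  shifted = sketch_reason(a, a + 10),
  scaled  = sketch_reason(a, -2 * a),
  both    = sketch_reason(a, 3 * a + 1)
)
print(reasons)
```

The tolerance `tol` is there because shifted or scaled copies rarely reproduce the original values bit for bit in floating-point arithmetic.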
Correlation calculation based on rolling window with overlapping observations.
ts.match(x, y, cutoff = 1)
x | time series |
y | subsection of the time series to map |
cutoff | benchmark value for corr, default 1 |
Pearson's correlation coefficient between x and y
x <- rnorm(15)
y <- -x[6:10]
x <- c(x, y)
ts.match(x, y, 1)

z <- rnorm(5)
ts.match(x, z)
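The rolling-window correlation described for this function can be sketched in base R as below. This is an illustration under stated assumptions rather than the package's code: `sketch_ts_match`, its `tol` argument, and its return format are hypothetical, and comparing the absolute correlation against the cutoff is assumed from the cutoff description given for find_dataleaks.

```r
# Hypothetical helper: slide a window of length(y) over x one observation at
# a time (so consecutive windows overlap) and return the windows whose
# absolute Pearson correlation with y reaches the cutoff.
sketch_ts_match <- function(x, y, cutoff = 1, tol = 1e-8) {
  h <- length(y)
  stopifnot(length(x) >= h)
  starts <- seq_len(length(x) - h + 1)
  r <- vapply(starts, function(s) cor(x[s:(s + h - 1)], y), numeric(1))
  keep <- which(!is.na(r) & abs(r) >= cutoff - tol)
  data.frame(start = starts[keep], end = starts[keep] + h - 1, cor = r[keep])
}

set.seed(123)
x <- rnorm(15)
y <- -x[6:10]
x <- c(x, y)
m <- sketch_ts_match(x, y)
print(m)
```

With this input the sketch finds two windows: positions 6 to 10, where the sign-flipped source of `y` sits, and positions 16 to 20, where `y` itself was appended.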
Correlation calculation based on rolling window with overlapping observations.
viz_dataleaks(finddataleaksout)
finddataleaksout | list, the output generated from find_dataleaks function |
matrix visualizing the output
a <- rnorm(15)
lst <- list(
  a = a,
  b = c(a[10:15] + rep(8, 6), rnorm(10), a[1:5], a[1:5]),
  c = c(rnorm(10), a[1:5]),
  d = rnorm(10)
)
f1 <- find_dataleaks(lst, h = 5)
viz_dataleaks(f1)

a <- rnorm(15)
lst <- list(
  x = a,
  y = c(rnorm(10), a[1:5])
)
f2 <- find_dataleaks(lst, h = 5)
viz_dataleaks(f2)

# List without naming elements
lst <- list(
  a,
  c(rnorm(10), a[1:5], a[1:5]),
  rnorm(10)
)
f3 <- find_dataleaks(lst, h = 5)
viz_dataleaks(f3)