
You can use either of the following two approaches to shuffle the data and the labels in the same order, so that each sample keeps its label.

**Approach 1:** Use NumPy's `np.random.permutation()` to generate a random ordering of the indices 0 to n-1, where n is the number of samples. Then index both the data and the labels with that same permutation.

```python
>>> import numpy as np
>>> X = np.array([[1.1, 2.2, 3.3, 4.4], [1.2, 2.3, 3.4, 4.5], [2.1, 2.2, 2.3, 2.4], [3.1, 3.2, 3.3, 3.4], [4.1, 4.2, 4.3, 4.4]])
>>> X
array([[1.1, 2.2, 3.3, 4.4],
       [1.2, 2.3, 3.4, 4.5],
       [2.1, 2.2, 2.3, 2.4],
       [3.1, 3.2, 3.3, 3.4],
       [4.1, 4.2, 4.3, 4.4]])
>>> y = np.array([0, 1, 2, 3, 4])
>>> p = np.random.permutation(len(y))  # random: your permutation will differ
>>> p
array([1, 0, 3, 4, 2])
>>> X_shuffled = X[p]  # reorder the rows of X by p
>>> X_shuffled
array([[1.2, 2.3, 3.4, 4.5],
       [1.1, 2.2, 3.3, 4.4],
       [3.1, 3.2, 3.3, 3.4],
       [4.1, 4.2, 4.3, 4.4],
       [2.1, 2.2, 2.3, 2.4]])
>>> y_shuffled = y[p]  # same permutation, so labels stay aligned with rows
>>> y_shuffled
array([1, 0, 3, 4, 2])
```
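If you need the shuffle to be reproducible, or want to undo it later, the same idea can be sketched with a seeded generator. This assumes NumPy 1.17 or newer (for `np.random.default_rng`); the seed 42 is an arbitrary choice. Since the inverse of a permutation `p` is `np.argsort(p)`, indexing the shuffled arrays with it restores the original order.

```python
import numpy as np

X = np.array([[1.1, 2.2, 3.3, 4.4],
              [1.2, 2.3, 3.4, 4.5],
              [2.1, 2.2, 2.3, 2.4],
              [3.1, 3.2, 3.3, 3.4],
              [4.1, 4.2, 4.3, 4.4]])
y = np.array([0, 1, 2, 3, 4])

# A seeded Generator gives the same permutation on every run
# (seed 42 is arbitrary, chosen only for illustration).
rng = np.random.default_rng(42)
p = rng.permutation(len(y))

# Apply one permutation to both arrays so rows and labels stay paired.
X_shuffled = X[p]
y_shuffled = y[p]

# In this toy data each label equals its row's original index,
# so X[y_shuffled] must reproduce X_shuffled exactly.
assert np.array_equal(X_shuffled, X[y_shuffled])

# np.argsort(p) is the inverse permutation: p[np.argsort(p)] == 0..n-1.
inv = np.argsort(p)
X_restored = X_shuffled[inv]
y_restored = y_shuffled[inv]
```

Recreating the generator with the same seed yields the same `p`, which is handy when you want identical train/test splits across runs.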

**Approach 2:** You can also use scikit-learn's `shuffle()` function from `sklearn.utils`, which shuffles several arrays consistently. Passing `random_state` makes the result reproducible.

```python
>>> from sklearn.utils import shuffle
>>> X_shuffled, y_shuffled = shuffle(X, y, random_state=0)
>>> X_shuffled
array([[2.1, 2.2, 2.3, 2.4],
       [1.1, 2.2, 3.3, 4.4],
       [1.2, 2.3, 3.4, 4.5],
       [3.1, 3.2, 3.3, 3.4],
       [4.1, 4.2, 4.3, 4.4]])
>>> y_shuffled
array([2, 0, 1, 3, 4])
```
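`shuffle()` accepts any number of arrays that share the same first dimension. If you would rather not depend on scikit-learn, the idea can be sketched in plain NumPy: draw one permutation and index every array with it. `shuffle_together` and `sample_ids` are hypothetical names used only for illustration; this is not scikit-learn's implementation, and for the same seed the order will generally differ from sklearn's, because `np.random.default_rng` uses a different generator than the `RandomState` sklearn creates from an integer `random_state`.

```python
import numpy as np


def shuffle_together(*arrays, seed=None):
    """Shuffle several arrays with one shared random permutation.

    Hypothetical helper (not part of NumPy or scikit-learn) that mimics
    the idea behind sklearn.utils.shuffle for array-like inputs.
    """
    if not arrays:
        raise ValueError("need at least one array")
    n = len(arrays[0])
    # Every array must have the same number of samples along axis 0.
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    p = np.random.default_rng(seed).permutation(n)
    # Index each array with the same permutation to keep samples aligned.
    return tuple(np.asarray(a)[p] for a in arrays)


X = np.array([[1.1, 2.2, 3.3, 4.4],
              [1.2, 2.3, 3.4, 4.5],
              [2.1, 2.2, 2.3, 2.4],
              [3.1, 3.2, 3.3, 3.4],
              [4.1, 4.2, 4.3, 4.4]])
y = np.array([0, 1, 2, 3, 4])
sample_ids = np.array(["a", "b", "c", "d", "e"])  # hypothetical extra array

X_s, y_s, ids_s = shuffle_together(X, y, sample_ids, seed=0)
```

Checking the lengths up front turns a silent misalignment into an immediate error.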