Data Source Analysis

IA1: [image.png]

For IA2 and IA3, data or a data source should have been provided.

File Format

When looking at the data, you should try to understand the file format being used. It will probably be CSV, JSON or XML. The data inside will generally consist of at least text and numbers.

Do you need all the data or only part of it? Are there issues with the data that need to be resolved?

You should look at the provided file format and explain how you will use it and any issues that could cause problems.
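A quick way to confirm which format a file uses is to try parsing a sample of it. The sketch below is only a best-effort check, and the function name `guess_format` is hypothetical. It assumes the file contents have already been read into a string, and it falls back to CSV because almost any text can be read as CSV.

```python
import json
import xml.etree.ElementTree as ET


def guess_format(text: str) -> str:
    """Return 'json', 'xml' or 'csv' for a sample of file contents (best effort)."""
    stripped = text.strip()

    # JSON is the strictest format, so try it first.
    try:
        json.loads(stripped)
        return "json"
    except json.JSONDecodeError:
        pass

    # Well-formed XML parses into an element tree.
    try:
        ET.fromstring(stripped)
        return "xml"
    except ET.ParseError:
        pass

    # Fallback guess: this is an assumption, not a real check.
    return "csv"
```

For example, `guess_format("Name,Username\nGmail,apple@gmail.com")` falls through both parsers and is reported as CSV. A file containing a single bare number would be reported as JSON, so the result should still be checked by eye.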

Source/Reliability

You should evaluate the source of the data and how reliable it is. Do you trust it? Is it likely to be biased? Is it limited or incomplete?

Sample Data

Provide some sample data.

Password DB Example - Data Source Analysis

File Format

The file is a CSV file, which means that the heading row needs to be skipped when the data is imported. Also, all of the fields are text.
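One way to handle the heading row is Python's `csv.DictReader`, which consumes the first row as field names instead of returning it as data. The sketch below is a minimal example. The CSV text in `SAMPLE_CSV` is an assumption about how the sample data is laid out in the real file (comma-separated, with an empty Comments column), and `load_rows` is a hypothetical helper name.

```python
import csv
import io

# Assumed layout of the CSV file, based on the sample data; the real file may differ.
SAMPLE_CSV = """Name,Username,Password,URL,Comments
Gmail,apple@gmail.com,123abc,https://www.gmail.com/,
Hotmail,bannanahotmail.com,Password1,Hot Mail,
NetFlix,apple@gmail.com,123abc,https://nextfix.com/,
"""


def load_rows(csv_text: str) -> list:
    """Parse CSV text into a list of dicts, one per data row."""
    # DictReader uses the heading row as keys, so it never appears as a record.
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


rows = load_rows(SAMPLE_CSV)
for row in rows:
    # Every value arrives as a string, matching the note that all fields are text.
    print(row["Name"], row["Username"])
```

With a real file, `io.StringIO(csv_text)` would be replaced by a file opened with `open(path, newline="")`.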

Source/Reliability

The data provided looks unreliable: one of the URLs is not formatted correctly. Validation should be used so that this value is rejected as a URL. One of the usernames looks like it should be an email address, but apart from providing a warning, it is not possible to know what the format should be.
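The two checks described above could be sketched as follows. This is a minimal example, not a complete validator: the function names are hypothetical, the URL check only requires an `http` or `https` scheme and a host name, and the username check is a rough heuristic that only produces a warning.

```python
from typing import Optional
from urllib.parse import urlparse


def is_valid_url(value: str) -> bool:
    """Accept only values with an http/https scheme and a host name."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def username_warning(username: str) -> Optional[str]:
    """Warn when a username contains a dot but no '@' (possible mistyped email)."""
    # Heuristic only: a username such as "john.smith" is valid but still triggers this.
    if "@" not in username and "." in username:
        return "Username looks like an email address without '@'."
    return None


print(is_valid_url("https://www.gmail.com/"))
print(is_valid_url("Hot Mail"))
print(username_warning("bannanahotmail.com"))
```

The URL check returns `False` for a value like `Hot Mail`, so that record can be rejected, while the username check only returns a message for the user to act on.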

The passwords provided are also weak. It might be good to provide a password generator or some kind of strength rating to encourage better passwords.
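A rating and a generator could look something like the sketch below. The scoring rules (one point each for length of 12 or more, lowercase, uppercase, digit and symbol) are an assumption chosen for illustration, not an established standard, and both function names are hypothetical. The generator uses the standard `secrets` module, which is intended for security-sensitive random choices.

```python
import secrets
import string


def rate_password(password: str) -> int:
    """Score a password from 0 to 5 using simple, assumed rules."""
    checks = [
        len(password) >= 12,
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return sum(checks)


def generate_password(length: int = 16) -> str:
    """Build a random password from letters, digits and punctuation."""
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(rate_password("123abc"))     # 2: lowercase and digit only
print(rate_password("Password1"))  # 3: lowercase, uppercase and digit
print(generate_password())
```

Under these rules both sample passwords score poorly, which supports showing a rating next to the password field.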

The data set is also very small. If more data were collected, other people might want additional information that isn't shown in this data set.

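Since the passwords are stored and need to be retrieved later, hashing cannot be used, because a hash cannot be turned back into the original password. In the future it would be better to encrypt the entire database to make it more secure. SQLite does not encrypt databases by default, but extensions such as SQLCipher or the SQLite Encryption Extension (SEE) add encrypted database files, which might be a solution when encryption is added.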
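The reason hashing does not suit a password manager can be seen in a small sketch. It uses SHA-256 from the standard `hashlib` module purely to illustrate that a hash is one-way, and `hash_password` is a hypothetical helper name. A real login system would use a salted, deliberately slow hash instead.

```python
import hashlib


def hash_password(password: str) -> str:
    """Return the SHA-256 digest of a password as hexadecimal text."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


digest = hash_password("123abc")

# A hash can confirm that a guess matches the stored digest...
print(hash_password("123abc") == digest)

# ...but there is no function that turns the digest back into "123abc",
# so the stored password could never be shown to the user again.
print(len(digest))
```

A password database needs to show the original password again, so it needs reversible encryption with a key rather than a hash.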

Sample Data
Name     Username            Password   URL                     Comments
Gmail    apple@gmail.com     123abc     https://www.gmail.com/
Hotmail  bannanahotmail.com  Password1  Hot Mail
NetFlix  apple@gmail.com     123abc     https://nextfix.com/