(Free) How Non-Engineers Can Sigodeki with Huge Data Analysis

Add this entry to Hatena Bookmarks
Big CSV to pass

Scenes from a large CSV

Need to work with CSV files that are too large to open in ExcelHave you ever come across such a situation?

For example,

  • Sales data in the hundreds of thousands to millions of lines

  • Survey results and log data

  • Report CSV spit out from the business system

...Cases where you want to look at this kind of data and consider or analyze it a little. 

I think it's a very common case if you work in the corporate world 📓.

How big are these files? I would say

for example

A company with annual sales of 500 million yen and a unit sales price of about 10,000 yen

...then there are 50,000 lines of basic sales data alone.

For 5 years, 250,000 lines.

When this is combined with data on product types, discounts, and other measures, the total number of rows can quickly grow to more than 1 million!

Naturally, if sales are large, such as 5 billion yen or 50 billion yen...

Data will increase proportionally.

That's just one example, but it's someone who works in a company,It is a rather common case that you want to analyze data from millions of rows of data.I think that's what it means.

DB icon

CSV file, not database.

In a company of a certain size, there is usually an IT department and an engineer, and the engineer may be someone who works with databases on a daily basis,

As an engineer, it is normal to deal with millions of rows of dataabout.

When analyzing data within the IT department,Set up and work on a database for this purpose.I know you do.

However, it is often the case that the people in a company who want to do data analysis in the first person are in the marketing department, product department, sales department, human resources department, etc. ・・・・

Sometimes they're not in the IT department.

And,

suchNo database for data analysis for non-IT departmentsI'm sure there are many companies that have

The alual in such cases is,IT department only gives me CSV.This is a pattern called

IT department engineer: "I've pulled 5 million rows of CSVs from the database, and you can do whatever analysis you want with them."

Engineer and marketer

He looks at the huge file he handed me and says, "Hmmm...

...I think there are quite a few phases like this.

What to do in such a situation

If possible, I wish I could analyze it in Excel, which I am familiar with, but unfortunately Excel is not.When the number of lines exceeds several tens of thousands (depending on the performance of the computer), the operation becomes very slow and tiring.

Also, if the number of lines exceeds 1,040,000, it cannot be opened in the normal way

The Excel Power Query method can be used to open the data, but it does not avoid the fact that the analysis will take time,Heavy operation and risk of crashes can be a bottleneck

However, uploading to some unknown cloud is not a security measure.

Of course, it's hard for me to say, "Well, I'll set up a database in-house," like an engineer.

Similarly, "Let's implement a BI tool such as Tableau or Power BI" is a good suggestion for a future plan, but not something you can do today or tomorrow on your own.

I need to analyze it today and I'm stuck...

We want to do it.I just want to handle large data quickly on my local PC! (If there were just a few more lines, I could do it in a snap with Excel...!)something like ...

 

Therefore, in this issue.DBBrowser for SQLiteto and AI (Chat GPT, CoPilot, Gemini, Claude.... Any AI that can be consulted in a chat format) approaches to


 

Why DB Browser for SQLite?

DB Browser fro SQLite is SQLite is an ultra-lightweight database with GUI operation and has the following features

  • ✅ fully localNo net connection or server required

  • ✅ High capacity OKLightweight operation even with hundreds of thousands to millions of lines

  • ✅ Free and easy to install

All of the following can be operated with a mouse

  • Data import (CSV → table)

  • Extraction by conditions (search filter)

  • Editing and deleting

  • Export results (e.g., re-CSV)

Only when we do the final analysis, we use a language called SQL, but this can be solved in an instant with the support of AI.

  • Analyze using SQL → Solve using AI

 

 

Operation Procedure (Preparation of DB Browser for SQLite)

1. download and install the tool

Let's download it first!

For Windows users, choose "DB.Browser.for.SQLite-*.win64.exe".

The screen images from here on out will be done on a Mac.

 Homebrew is a command line installation from the terminal

Here we will use the installer (.dmg file on Mac)

When you open the dmg file, you will see a screen like this, so copy it to Applications by drag & drop

 

Double-click DB Browser for SQLite in the Applications folder to open it

 

 You may get a screen like this, but it will open.
 DB Browser for SQLite has been opened. If you want to configure Japanese settings, please click  You can configure it by going to DB Browser for SQLite → Preferences
 Now ready for use

Create a new database file

 

Choose File→New Database
Choose where to save the database

 

Next is the table definition screen.
 

Import CSV files as tables

After this, import the CSV file so that it can be handled as a database (table)
A table is data in table form

File→Import→CSV File to Table

 

 

In this case, we have prepared two CSV files as samples
This is a sample, so it is 900 rows, but it works with CSVs more than 100 times larger than this.
  1. sales detail data
  2. Merchandise Cost Master

The sales detail data is an image of the data in the POS data or sales management system, with columns like this inside

  • SalesID (Serial number of the sales invoice)
  • SalesDate
  • ProductID (product ID of the sold product)
  • Category (Category of sold products)
  • Gender (whether the product sold is men's or women's)
  • Size (size of item sold)
  • SalesValue
The product cost master is an image of the master that manages the product cost of each product and contains columns such as
  • ProductID (Product ID)
  • ProductCost (the cost amount of the item)
This is a completely fictitious sample, but I am sure similar examples are common. 
If you open it in Excel, you will see an image like this