Overview

之前项目的人遗留的数据散落在多种数据库中,既有MySQLMongoDB,也有CassandraPostgreSQL。在Python3版本的jupyter中连接PostgreSQL需要安装psycopg2,而psycopg2Python2中则是已经集成好的。

1. 安装python3-psycopg2libpq-dev

先在Linux上安装好这两个包,

sudo apt-get install python3-psycopg2
sudo apt-get install libpq-dev

再安装psycopg2,

pip3 install psycopg2

否则就会报错如下:

ERROR: Command errored out with exit status 1:
     command: /home/yangbingjiao/anaconda3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-wxj7s4lx/psycopg2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-wxj7s4lx/psycopg2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
         cwd: /tmp/pip-install-wxj7s4lx/psycopg2/
    Complete output (23 lines):
    running egg_info
    creating pip-egg-info/psycopg2.egg-info
    writing pip-egg-info/psycopg2.egg-info/PKG-INFO
    writing dependency_links to pip-egg-info/psycopg2.egg-info/dependency_links.txt
    writing top-level names to pip-egg-info/psycopg2.egg-info/top_level.txt
    writing manifest file 'pip-egg-info/psycopg2.egg-info/SOURCES.txt'

    Error: pg_config executable not found.

    pg_config is required to build psycopg2 from source.  Please add the directory
    containing pg_config to the $PATH or specify the full executable path with the
    option:

        python setup.py build_ext --pg-config /path/to/pg_config build ...

    or with the pg_config option in 'setup.cfg'.

    If you prefer to avoid building psycopg2 from source, please install the PyPI
    'psycopg2-binary' package instead.

    For further information please check the 'doc/src/install.rst' file (also at
    <http://initd.org/psycopg/docs/install.html>).

    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

2.连接PostgreSQL数据库

import psycopg2
import pandas as pd

conn = psycopg2.connect(database="database", user="user", password="password", host="192.168.1.230", port="5432")
cur = conn.cursor()
pql = """
SELECT * 
FROM database.table
WHERE  create_time >= '2019-09-23' AND create_time < '2019-09-30'
order by create_time
limit 10
"""
cur.execute(pql)
rows = cur.fetchall()
df = pd.DataFrame(rows)
conn.close()

本文主要参考了以下文章:
Python3连接PostgreSQL(10.5)数据库