Optimize several details to support training on local and remote GPUs

master
詹力 2023-09-22 00:47:00 +08:00
parent cad4d97004
commit bdf17e8356
22 changed files with 63 additions and 8 deletions

3
.gitignore vendored Normal file
View File

@@ -0,0 +1,3 @@
**/__pycache__
*.pth
**/logs

View File

@@ -1,4 +1,4 @@
This repository hosts the source code of our paper: [[CVPR 2022] Cascade Transformers for End-to-End Person Search](https://arxiv.org/abs/2203.09642). In this work, we developed a novel Cascaded Occlusion-Aware Transformer (COAT) model for end-to-end person search. The COAT model outperforms **state-of-the-art** methods on the PRW benchmark dataset by a large margin and achieves state-of-the-art performance on the CUHK-SYSU dataset.
| Dataset | mAP | Top-1 | Model |
| --------- | ---- | ----- | ------------------------------------------------------------ |
@@ -43,12 +43,23 @@ conda activate coat
If you want to install another version of PyTorch, you can modify the versions in `coat_pt171.yml`. Just make sure the dependencies have the appropriate version.
## Experiments on CUHK-SYSU
**Training**: The code currently only supports a single GPU. The default training script for CUHK-SYSU is as follows:
**Training locally on an RTX 4090**
```bash
cd COAT
# Note: the RTX 4090 has relatively little VRAM, so the batch size can only be set to 2 (verified to run)
python train.py --cfg configs/cuhk_sysu-local.yaml INPUT.BATCH_SIZE_TRAIN 2 SOLVER.BASE_LR 0.003 SOLVER.MAX_EPOCHS 14 SOLVER.LR_DECAY_MILESTONES [11] MODEL.LOSS.USE_SOFTMAX True SOLVER.LW_RCNN_SOFTMAX_2ND 0.1 SOLVER.LW_RCNN_SOFTMAX_3RD 0.1 OUTPUT_DIR ./logs/cuhk-sysu
```
**Training on the UESTC server**
```bash
cd COAT
# Note: the RTX 8000 has 48 GB of VRAM, so the batch size can only be set to 3
python train.py --cfg configs/cuhk_sysu.yaml INPUT.BATCH_SIZE_TRAIN 2 SOLVER.BASE_LR 0.003 SOLVER.MAX_EPOCHS 14 SOLVER.LR_DECAY_MILESTONES [11] MODEL.LOSS.USE_SOFTMAX True SOLVER.LW_RCNN_SOFTMAX_2ND 0.1 SOLVER.LW_RCNN_SOFTMAX_3RD 0.1 OUTPUT_DIR ./logs/cuhk-sysu
```
Note that the dataset-specific parameters are defined in `configs/cuhk_sysu.yaml`. When the batch size (`INPUT.BATCH_SIZE_TRAIN`) is 3, the training will take about 23GB GPU memory, being suitable for GPUs like RTX6000. When the batch size is 5, the training will take about 38GB GPU memory, being able to run on A100 GPU. The larger batch size usually results in better performance on CUHK-SYSU.
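The memory figures above can be turned into a quick rule of thumb. The helper below is purely illustrative (it is not part of the repo) and hard-codes the thresholds quoted in this paragraph: roughly 23GB of free VRAM for batch size 3 and 38GB for batch size 5, falling back to 2 on smaller cards.

```shell
# Illustrative only, not part of COAT: pick INPUT.BATCH_SIZE_TRAIN from free VRAM (GB),
# using the thresholds quoted above (~23GB for batch 3, ~38GB for batch 5).
pick_batch_size() {
  local free_gb=$1
  if [ "$free_gb" -ge 38 ]; then
    echo 5    # e.g. A100
  elif [ "$free_gb" -ge 23 ]; then
    echo 3    # e.g. RTX6000
  else
    echo 2    # smaller cards, e.g. a 24GB RTX 4090 after OS/driver overhead
  fi
}

pick_batch_size 40   # prints 5
```

The free-VRAM number would come from `nvidia-smi --query-gpu=memory.free --format=csv` in practice; it is left as a manual input here.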
@@ -57,6 +68,8 @@ For the CUHK-SYSU dataset, we use a relative low weight for softmax loss (`SOLVE
**Testing**: The test script is very simple. You just need to add the flag `--eval` and provide the folder `--ckpt` where the [model](https://drive.google.com/file/d/1LkEwXYaJg93yk4Kfhyk3m6j8v3i9s1B7/view?usp=sharing) was saved.
```
python train.py --cfg ./configs/cuhk-sysu/config.yaml --eval --ckpt ./logs/cuhk-sysu/cuhk_COAT.pth
```
@@ -76,8 +89,19 @@ python train.py --cfg ./configs/cuhk-sysu/config.yaml --eval --ckpt ./logs/cuhk-
## Experiments on PRW
**Training**: The script is similar to CUHK-SYSU. The code currently only supports a single GPU. The default training script for PRW is as follows:
**Training locally on an RTX 4090**
```bash
cd COAT
# Note: the PRW dataset is smaller, so the batch size can be set to 3 on the RTX 4090
python train.py --cfg ./configs/prw-local.yaml INPUT.BATCH_SIZE_TRAIN 3 SOLVER.BASE_LR 0.003 SOLVER.MAX_EPOCHS 13 MODEL.LOSS.USE_SOFTMAX True OUTPUT_DIR ./logs/prw
```
**Training on the UESTC server**
```bash
cd COAT
# Note: the PRW dataset is smaller, so the batch size can be set to 3 here as well
python train.py --cfg ./configs/prw.yaml INPUT.BATCH_SIZE_TRAIN 3 SOLVER.BASE_LR 0.003 SOLVER.MAX_EPOCHS 13 MODEL.LOSS.USE_SOFTMAX True OUTPUT_DIR ./logs/prw
```

Binary file not shown.

Binary file not shown.

Binary file not shown.

View File

@@ -0,0 +1,15 @@
OUTPUT_DIR: "./logs/cuhk_coat"
INPUT:
  DATASET: "CUHK-SYSU"
  DATA_ROOT: "E:/DeepLearning/PersonSearch/COAT/datasets/CUHK-SYSU"
  BATCH_SIZE_TRAIN: 3
SOLVER:
  MAX_EPOCHS: 14
  BASE_LR: 0.003
  LW_RCNN_SOFTMAX_2ND: 0.1
  LW_RCNN_SOFTMAX_3RD: 0.1
MODEL:
  LOSS:
    LUT_SIZE: 5532
    CQ_SIZE: 5000
DISP_PERIOD: 100

13
configs/prw-local.yaml Normal file
View File

@@ -0,0 +1,13 @@
OUTPUT_DIR: "./logs/prw_coat"
INPUT:
  DATASET: "PRW"
  DATA_ROOT: "E:/DeepLearning/PersonSearch/COAT/datasets/PRW"
  BATCH_SIZE_TRAIN: 3
SOLVER:
  MAX_EPOCHS: 13
  BASE_LR: 0.003
MODEL:
  LOSS:
    LUT_SIZE: 482
    CQ_SIZE: 500
DISP_PERIOD: 100

View File

@@ -205,7 +205,7 @@ _C.DISP_PERIOD = 10
# Whether to use tensorboard for visualization
_C.TF_BOARD = True
# The device loading the model
_C.DEVICE = "cuda:0"
# Set seed to negative to fully randomize everything
_C.SEED = 1
# Directory where output files are written
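Hard-coding `cuda:0` suits single-GPU machines; on a multi-GPU server, the effect of the old `cuda:1` default can be recovered without editing the config by remapping devices through the standard `CUDA_VISIBLE_DEVICES` environment variable (a CUDA runtime feature, not specific to this repo):

```shell
# Expose only physical GPU 1 to the process; it then appears as cuda:0,
# so _C.DEVICE = "cuda:0" still selects it.
export CUDA_VISIBLE_DEVICES=1
echo "$CUDA_VISIBLE_DEVICES"   # prints 1
```

For a single run this could be prefixed inline, e.g. `CUDA_VISIBLE_DEVICES=1 python train.py --cfg configs/prw-local.yaml`, leaving other shells unaffected.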

Binary file not shown.

Binary file not shown.